Grundlehren der mathematischen Wissenschaften A Series of Comprehensive Studies in Mathematics
Editors
S.S. Chern B. Eckmann P. de la Harpe H. Hironaka F. Hirzebruch N. Hitchin L. Hörmander M.A. Knus A. Kupiainen J. Lannes G. Lebeau M. Ratner D. Serre Ya.G. Sinai N.J.A. Sloane J. Tits M. Waldschmidt S. Watanabe
Managing Editors
M. Berger J. Coates
Springer New York
Berlin Heidelberg Barcelona Budapest Hong Kong London
Milan Paris Singapore
Tokyo
S.R.S. Varadhan
260
Grundlehren der mathematischen Wissenschaften A Series of Comprehensive Studies in Mathematics
A Selection
200. Dold: Lectures on Algebraic Topology
201. Beck: Continuous Flows in the Plane
202. Schmetterer: Introduction to Mathematical Statistics
203. Schoeneberg: Elliptic Modular Functions
204. Popov: Hyperstability of Control Systems
205. Nikol'skii: Approximation of Functions of Several Variables and Imbedding Theorems
206. André: Homologie des Algèbres Commutatives
207. Donoghue: Monotone Matrix Functions and Analytic Continuation
208. Lacey: The Isometric Theory of Classical Banach Spaces
209. Ringel: Map Color Theorem
210. Gihman/Skorohod: The Theory of Stochastic Processes I
211. Comfort/Negrepontis: The Theory of Ultrafilters
212. Switzer: Algebraic Topology - Homotopy and Homology
213. Shafarevich: Basic Algebraic Geometry
214. van der Waerden: Group Theory and Quantum Mechanics
215. Schaefer: Banach Lattices and Positive Operators
216. Pólya/Szegő: Problems and Theorems in Analysis II
217. Stenström: Rings of Quotients
218. Gihman/Skorohod: The Theory of Stochastic Processes II
219. Duvaut/Lions: Inequalities in Mechanics and Physics
220. Kirillov: Elements of the Theory of Representations
221. Mumford: Algebraic Geometry I: Complex Projective Varieties
222. Lang: Introduction to Modular Forms
223. Bergh/Löfström: Interpolation Spaces. An Introduction
224. Gilbarg/Trudinger: Elliptic Partial Differential Equations of Second Order
225. Schütte: Proof Theory
226. Karoubi: K-Theory. An Introduction
227. Grauert/Remmert: Theorie der Steinschen Räume
228. Segal/Kunze: Integrals and Operators
229. Hasse: Number Theory
230. Klingenberg: Lectures on Closed Geodesics
231. Lang: Elliptic Curves: Diophantine Analysis
232. Gihman/Skorohod: The Theory of Stochastic Processes III
233. Stroock/Varadhan: Multidimensional Diffusion Processes
234. Aigner: Combinatorial Theory
235. Dynkin/Yushkevich: Controlled Markov Processes
236. Grauert/Remmert: Theory of Stein Spaces
237. Köthe: Topological Vector Spaces II
238. Graham/McGehee: Essays in Commutative Harmonic Analysis
239. Elliott: Probabilistic Number Theory I
240. Elliott: Probabilistic Number Theory II
241. Rudin: Function Theory in the Unit Ball of C^n
242. Huppert/Blackburn: Finite Groups II
243. Huppert/Blackburn: Finite Groups III
244. Kubert/Lang: Modular Units
245. Cornfeld/Fomin/Sinai: Ergodic Theory
246. Naimark/Stern: Theory of Group Representations
247. Suzuki: Group Theory I
248. Suzuki: Group Theory II
249. Chung: Lectures from Markov Processes to Brownian Motion
250. Arnold: Geometrical Methods in the Theory of Ordinary Differential Equations
251. Chow/Hale: Methods of Bifurcation Theory
252. Aubin: Nonlinear Analysis on Manifolds. Monge-Ampère Equations
253. Dwork: Lectures on p-adic Differential Equations
254. Freitag: Siegelsche Modulfunktionen
255. Lang: Complex Multiplication
256. Hörmander: The Analysis of Linear Partial Differential Operators I
257. Hörmander: The Analysis of Linear Partial Differential Operators II
(continued after index)
M.I. Freidlin A.D. Wentzell
Random Perturbations of Dynamical Systems
Second Edition
Translated by Joseph Szücs
With 33 Illustrations
Springer
M.I. Freidlin
Department of Mathematics
University of Maryland
College Park, MD 20742
USA

A.D. Wentzell
Department of Mathematics
Tulane University
New Orleans, LA 70118
USA

Joseph Szücs (Translator)
Texas A&M University at Galveston
P.O. Box 1675
Galveston, TX 77553
USA
Mathematics Subject Classification (1991): 58F30, 60F10, 60H25

Library of Congress Cataloging-in-Publication Data
Freidlin, M.I. (Mark Iosifovich)
Random perturbations of dynamical systems / Mark I. Freidlin, Alexander D. Wentzell; [Joseph Szücs (translator)]. - 2nd ed.
p. cm. - (Grundlehren der mathematischen Wissenschaften; 260)
"[Based on the] original Russian edition: Fluktuatsii v dinamicheskikh sistemakh pod deistviem malykh sluchainykh vozmushchenii, Nauka: Moscow, 1979" - T.p. verso.
Includes bibliographical references and index.
ISBN 0-387-98362-7 (hardcover)
1. Stochastic processes. 2. Perturbation (Mathematics) I. Venttsel', A.D. II. Venttsel', A.D. Fluktuatsii v dinamicheskikh sistemakh pod deistviem malykh sluchainykh vozmushchenii. III. Title. IV. Series.
QA274.F73 1998
519.2 dc21 98-4686

Printed on acid-free paper.
Original Russian edition: Fluktuatsii v Dinamicheskikh Sistemakh Pod Deistviem Malykh Sluchainykh Vozmushchenii, Nauka: Moscow, 1979.
© 1998, 1984 Springer-Verlag New York, Inc. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.
Production managed by Anthony Guardiola; manufacturing supervised by Joe Quatela. Typeset by Asco Trade Typesetting Ltd., Hong Kong. Printed and bound by Edwards Brothers Inc., Ann Arbor, MI. Printed in the United States of America.
9 8 7 6 5 4 3 2 1

ISBN 0-387-98362-7 Springer-Verlag New York Berlin Heidelberg SPIN 10647749
Preface to the Second Edition
The first edition of this book was published in 1979 in Russian. Most of the material presented was related to large-deviation theory for stochastic processes. This theory was developed more or less at the same time by different authors in different countries. This book was the first monograph in which large-deviation theory for stochastic processes was presented. Since then a number of books specially dedicated to large-deviation theory have been published, including S.R.S. Varadhan [4], A.D. Wentzell [9], J.-D. Deuschel and D.W. Stroock [1], A. Dembo and O. Zeitouni [1]. Just a few changes were made for this edition in the part where large deviations are treated. The most essential is the addition of two new sections in the last chapter. Large deviations for infinite-dimensional systems are briefly considered in one new section, and the applications of large-deviation theory to wave front propagation for reaction-diffusion equations are considered in another one.
Largedeviation theory is not the only class of limit theorems arising in the context of random perturbations of dynamical systems. We therefore included in the second edition a number of new results related to the averaging principle. Random perturbations of classical dynamical systems under certain conditions lead to diffusion processes on graphs. Such problems are considered in the new Chapter 8. Some new results concerning fast oscillating perturbations of dynamical systems with conservation laws are included in Chapter 7. A few small additions and corrections were made in the other chapters as well. We would like to thank Ruth Pfeiffer and Fred Torcaso for their help in the preparation of the second edition of this book. College Park, Maryland New Orleans, Louisiana
M.I. Freidlin A.D. Wentzell
Preface
Asymptotic problems have always played an important role in probability
theory. In classical probability theory dealing mainly with sequences of independent variables, theorems of the type of laws of large numbers, theorems of the type of the central limit theorem, and theorems on large deviations constitute a major part of all investigations. In recent years, when
random processes have become the main subject of study, asymptotic investigations have continued to play a major role. We can say that in the theory of random processes such investigations play an even greater role than in classical probability theory, because it is apparently impossible to obtain simple exact formulas in problems connected with large classes of random processes. Asymptotical investigations in the theory of random processes include results of the types of both the laws of large numbers and the central limit theorem and, in the past decade, theorems on large deviations. Of course, all these problems have acquired new aspects and new interpretations in the theory of random processes.
One of the important schemes leading to the study of various limit theorems for random processes is that of dynamical systems subject to the effect of random perturbations. Several theoretical and applied problems lead to this scheme. It is often natural to assume that, in one sense or another, the random perturbations are small compared to the deterministic constituents of the motion. The problem of studying small random perturbations of dynamical systems was posed in the paper by Pontrjagin, Andronov, and Vitt [1]. The results obtained in this article relate to one-dimensional and partly to two-dimensional dynamical systems and to perturbations leading to diffusion processes. Other types of random perturbations may also be considered; in particular, those arising in connection with the averaging principle. Here the smallness of the effect of the perturbations is ensured by the fact that they oscillate quickly. The contents of the book consist of various asymptotic problems arising as the parameter characterizing the smallness of the random perturbations converges to zero. Of course, the authors could not consider all conceivable schemes of small random perturbations of dynamical systems. In particular, the book does not consider at all dynamical systems generated by random
vector fields. Much attention is given to the study of the effect of perturbations on large time intervals. On such intervals small perturbations essentially influence the behavior of the system in general. In order to take account of this influence, we have to be able to estimate the probabilities of rare events, i.e., we need theorems on the asymptotics of probabilities of large deviations for random processes. The book studies these asymptotics and their applications to problems of the behavior of a random process on large time intervals, such as the problem of the limit behavior of the invariant measure, the problem of exit of a random process from a domain, and the problem of stability under random perturbations. Some of these problems were formulated long ago and others are comparatively new. The problems being studied can be considered as problems of the asymptotic study of integrals in a function space, and the fundamental method used can be considered as an infinite-dimensional generalization of the well-known method of Laplace. These constructions are linked to contemporary
research in asymptotic methods. In the cases where, as a result of the effect of perturbations, diffusion processes are obtained, we arrive at problems closely connected with elliptic and parabolic differential equations with a small parameter. Our investigations imply some new results concerning such equations. We are interested in these connections and, as a rule, include the corresponding formulations in terms of differential equations. We would like to note that this book is being written when the theory of large deviations for random processes is just being created. There have been a series of achievements but there is still much to be done. Therefore, the book treats some topics that have not yet taken their final form (part of the material is presented in survey form). At the same time, some new research is not reflected in the book at all. The authors have attempted to minimize the deficiencies connected with this. The book is written for mathematicians but can also be used by specialists in adjacent fields. The fact is that although the proofs use quite intricate mathematical constructions, the results admit a simple formulation as a rule.
Contents

Preface to the Second Edition
Preface
Introduction

CHAPTER 1
Random Perturbations
§1. Probabilities and Random Variables
§2. Random Processes. General Properties
§3. Wiener Process. Stochastic Integral
§4. Markov Processes and Semigroups
§5. Diffusion Processes and Differential Equations

CHAPTER 2
Small Random Perturbations on a Finite Time Interval
§1. Zeroth Approximation
§2. Expansion in Powers of a Small Parameter
§3. Elliptic and Parabolic Differential Equations with a Small Parameter at the Derivatives of Highest Order

CHAPTER 3
Action Functional
§1. Laplace's Method in a Function Space
§2. Exponential Estimates
§3. Action Functional. General Properties
§4. Action Functional for Gaussian Random Processes and Fields

CHAPTER 4
Gaussian Perturbations of Dynamical Systems. Neighborhood of an Equilibrium Point
§1. Action Functional
§2. The Problem of Exit from a Domain
§3. Properties of the Quasipotential. Examples
§4. Asymptotics of the Mean Exit Time and Invariant Measure for the Neighborhood of an Equilibrium Position
§5. Gaussian Perturbations of General Form

CHAPTER 5
Perturbations Leading to Markov Processes
§1. Legendre Transformation
§2. Locally Infinitely Divisible Processes
§3. Special Cases. Generalizations
§4. Consequences. Generalization of Results of Chapter 4

CHAPTER 6
Markov Perturbations on Large Time Intervals
§1. Auxiliary Results. Equivalence Relation
§2. Markov Chains Connected with the Process (X_t^ε, P_x^ε)
§3. Lemmas on Markov Chains
§4. The Problem of the Invariant Measure
§5. The Problem of Exit from a Domain
§6. Decomposition into Cycles. Sublimit Distributions
§7. Eigenvalue Problems

CHAPTER 7
The Averaging Principle. Fluctuations in Dynamical Systems with Averaging
§1. The Averaging Principle in the Theory of Ordinary Differential Equations
§2. The Averaging Principle when the Fast Motion is a Random Process
§3. Normal Deviations from an Averaged System
§4. Large Deviations from an Averaged System
§5. Large Deviations Continued
§6. The Behavior of the System on Large Time Intervals
§7. Not Very Large Deviations
§8. Examples
§9. The Averaging Principle for Stochastic Differential Equations

CHAPTER 8
Random Perturbations of Hamiltonian Systems
§1. Introduction
§2. Main Results
§3. Proof of Theorem 2.2
§4. Proofs of Lemmas 3.1 to 3.4
§5. Proof of Lemma 3.5
§6. Proof of Lemma 3.6
§7. Remarks and Generalizations

CHAPTER 9
Stability Under Random Perturbations
§1. Formulation of the Problem
§2. The Problem of Optimal Stabilization
§3. Examples

CHAPTER 10
Sharpenings and Generalizations
§1. Local Theorems and Sharp Asymptotics
§2. Large Deviations for Random Measures
§3. Processes with Small Diffusion with Reflection at the Boundary
§4. Wave Fronts in Semilinear PDEs and Large Deviations
§5. Random Perturbations of Infinite-Dimensional Systems

References
Index
Introduction
Let b(x) be a continuous vector field in R^r. First we discuss nonrandom perturbations of a dynamical system

ẋ_t = b(x_t).  (1)

We may consider the perturbed system

Ẋ_t = b(X_t, ψ_t),  (2)
where b(x, y) is a function jointly continuous in its two arguments and turning into b(x) for y = 0. We shall speak of small perturbations if the function ψ_t giving the perturbing effect is small in one sense or another. We may speak of problems of the following kind: the convergence of the solution X_t of the perturbed system to the solution x_t of the unperturbed system as the effect of the perturbation decreases; approximate expressions of various accuracies for the deviations X_t − x_t caused by the perturbations; and the same problems for various functionals of a solution (for example, the first exit time from a given domain D). In problems related to a finite time interval we require less of the function b(x, y) than in problems connected with an infinite interval (or a finite interval growing unboundedly as the perturbing effect decreases). The simplest result related to a finite interval is the following: if the solution of system (1) with initial condition x_0 at t = 0 is unique, then the solution X_t of system (2) with initial condition X_0 converges to x_t uniformly in t ∈ [0, T] as X_0 → x_0 and ‖ψ‖_{0T} = sup_{0≤t≤T} |ψ_t| → 0. If the function b(x, y) is differentiable with respect to the pair of its arguments, then
we can linearize it near the point x = x_t, y = 0 and obtain a linear approximation δ_t of X_t − x_t as the solution of the linear system

δ̇_t^i = Σ_j (∂b^i/∂x^j)(x_t, 0) δ_t^j + Σ_k (∂b^i/∂y^k)(x_t, 0) ψ_t^k;  (3)

under sufficiently weak conditions, the norm sup_{0≤t≤T} |X_t − x_t − δ_t| of the remainder will be o(|X_0 − x_0| + ‖ψ‖_{0T}). If b(x, y) is still smoother,
then we have the decomposition

X_t = x_t + δ_t + γ_t + o(|X_0 − x_0|² + ‖ψ‖²_{0T}),  (4)

in which γ_t depends quadratically on the perturbations of the initial conditions and the right side (the function γ_t can be determined from a system of linear differential equations with a quadratic function of ψ_t, δ_t on the right side), etc.
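The first-order approximation (3) and the quadratic smallness of the remainder in (4) can be checked numerically. The following is a minimal sketch of ours, not from the book; the right-hand side b(x, y) = −x + y + xy and the sinusoidal perturbation ψ_t = a sin t are hypothetical choices made only for the experiment.

```python
import math

# A hypothetical jointly smooth right-hand side with b(x, 0) = -x.
def b(x, y):
    return -x + y + x * y

def sup_remainder(a, T=2.0, dt=1e-4, x0=1.0):
    """Integrate the perturbed system (2), the unperturbed system (1), and
    the linearization (3) by Euler's method, with psi_t = a*sin(t), and
    return sup_t |X_t - x_t - delta_t| over [0, T]."""
    psi = lambda t: a * math.sin(t)
    X, x, d, t, sup = x0, x0, 0.0, 0.0, 0.0
    for _ in range(int(T / dt)):
        X += dt * b(X, psi(t))
        # system (3) for this b: db/dx(x_t, 0) = -1, db/dy(x_t, 0) = 1 + x_t
        d += dt * (-d + (1.0 + x) * psi(t))
        x += dt * b(x, 0.0)
        t += dt
        sup = max(sup, abs(X - x - d))
    return sup

# Halving the perturbation amplitude should divide the remainder by ~4,
# in agreement with the quadratic estimate in (4).
r1, r2 = sup_remainder(0.1), sup_remainder(0.05)
```

The ratio r1/r2 coming out close to 4 is exactly the quadratic dependence of the remainder on the size of the perturbation.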
We may consider a scheme

Ẋ_t^ε = b(X_t^ε, εψ_t)  (5)

depending on a small parameter ε, where ψ_t is a given function. In this case, for the solution X_t^ε with initial condition X_0^ε = x_0 we can obtain a decomposition

X_t^ε = x_t + εy_t^{(1)} + ε²y_t^{(2)} + ⋯ + εⁿy_t^{(n)}  (6)
in powers of ε, with the remainder infinitely small compared with εⁿ uniformly on any finite interval [0, T]. Under more stringent restrictions on the function b(x, y), results of this kind can be obtained for perturbations ψ_t which are not small in the norm of uniform convergence but rather, for example, in some L^p-norm or another.

As far as results connected with an infinite time interval are concerned, stability properties of the unperturbed system (1) as t → ∞ are essential. Let x* be an equilibrium position of system (1), i.e., let b(x*) = 0. Let this equilibrium position be asymptotically stable, i.e., for any neighborhood U ∋ x* let there exist a small neighborhood V of x* such that for any x_0 ∈ V the trajectory x_t starting at x_0 does not leave U for t ≥ 0 and converges to x* as t → ∞. Denote by G* the set of initial points x_0 from which there start solutions converging to x* as t → ∞. For any neighborhood U of x* and any point x_0 ∈ G* there exist δ > 0 and T > 0 such that for |X_0 − x_0| < δ and sup_{0≤t<∞} |ψ_t| < δ the perturbed trajectory X_t does not leave U for t ≥ T. If the equilibrium position is unstable, the situation is different: the unperturbed trajectory may stay in a domain for all t ≥ 0, but the solution X_t of the system obtained from the initial one by an arbitrarily small perturbation leaves the domain in finite time.
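The stable/unstable dichotomy is easy to see numerically. Here is a small sketch of ours (not from the book), using the hypothetical one-dimensional field b(x) = x − x³, which has stable equilibria at ±1 and an unstable one at 0: a small constant perturbation leaves trajectories near a stable equilibrium but drives them away from the unstable one.

```python
def flow(x0, psi, T=20.0, dt=1e-3):
    """Euler integration of the perturbed system x' = b(x) + psi(t),
    with the hypothetical field b(x) = x - x**3."""
    x, t = x0, 0.0
    for _ in range(int(T / dt)):
        x += dt * (x - x ** 3 + psi(t))
        t += dt
    return x

small = lambda t: 0.01  # a small constant perturbation
zero = lambda t: 0.0

# Near the stable equilibrium x* = 1 the perturbed trajectory stays close.
near_stable = flow(0.9, small)
# The unperturbed trajectory rests at the unstable equilibrium x* = 0,
# but an arbitrarily small perturbation drives the solution far away.
from_unstable = flow(0.0, small)
unperturbed = flow(0.0, zero)
```

Under the perturbation the trajectory started at the unstable equilibrium ends up near the stable point 1, while the trajectory started near 1 merely shifts to the slightly displaced equilibrium of the perturbed field.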
Some of these results also hold for trajectories attracted not to a point x* but rather to a compact set of limit points, for example, for trajectories winding onto a limit cycle.
There are situations where, besides the fact that the perturbations are small, we have sufficient information on their statistical character. In this case it is appropriate to develop various mathematical models of small random perturbations. The consideration of random perturbations extends the notion of perturbations considered in classical settings in at least two directions. Firstly, the requirements of smallness become less stringent: instead of absolute smallness for all t (or smallness in an integral norm) it may be assumed that the perturbations are small only in the mean over the ensemble of all possible perturbations. Small random perturbations may assume large values, but the probability of these large values is small. Secondly, the consideration of random processes as perturbations extends the notion of the stationary character of perturbations. Instead of assuming that the perturbations themselves do not change with time, we may assume that the factors which form the statistical structure of the perturbations are constant, i.e., that the perturbations are stationary as random processes. Such an extension of the notion of a perturbation leads to effects not characteristic of small deterministic perturbations. Especially important new properties occur in considering a long-lasting effect of small random perturbations. We shall see what models of small random perturbations may be like and what problems are natural to consider concerning them. We begin with perturbations of the form
Ẋ_t^ε = b(X_t^ε, εψ_t),  (7)
where ψ_t is a given random process, for example, a stationary Gaussian process with known correlation function. (Nonparametric problems connected with arbitrary random processes which belong to certain classes and are small in some sense are by far more complicated.) For the sake of simplicity, let the initial point X_0 not depend on ε: X_0 = x_0. If the solution of system (7) is unique, then the random perturbation εψ_t leads to a random process X_t^ε. The first problem which arises is the following: Will X_t^ε converge to the solution x_t of the unperturbed system as ε → 0? We may consider various kinds of probabilistic convergence: convergence with probability 1, in probability, and in mean. If sup_{0≤t≤T} |ψ_t| < ∞ with probability 1, then, ignoring the fact that the realization of ψ_t is random, we may apply the results presented above to perturbations of the form εψ_t and obtain, under various
conditions on b(x, y), that X_t^ε → x_t with probability 1, uniformly in t ∈ [0, T], and that

X_t^ε = x_t + εY_t^{(1)} + o(ε)  (8)

or

X_t^ε = x_t + εY_t^{(1)} + ⋯ + εⁿY_t^{(n)} + o(εⁿ)  (9)

(o(ε) and o(εⁿ) are understood as holding with probability 1, uniformly in t ∈ [0, T], as ε → 0). Nevertheless, it is not convergence with probability 1 which represents the main interest from the point of view of possible applications. In considering small random perturbations, perhaps we shall not have to do with X_t^ε for various ε simultaneously but only for one small ε. We shall be interested in questions such as: Can we guarantee with practical certainty that for a small ε the value of X_t^ε is close to x_t? What will the order of the deviation X_t^ε − x_t be? What can be said about the distribution of the values of the random process X_t^ε and functionals thereof? etc. Fortunately, convergence with probability 1 implies convergence in probability, so that X_t^ε converges to x_t in probability, uniformly in t ∈ [0, T], as ε → 0:

P{sup_{0≤t≤T} |X_t^ε − x_t| > δ} → 0  (10)

for every δ > 0. For convergence in mean we have to impose still further restrictions on b(x, y) and ψ_t; we shall not discuss this. From the sharper result (8) it follows that the random process
Y_t^ε = (X_t^ε − x_t)/ε

converges to a random process Y_t^{(1)} in the sense of distributions as ε → 0 (this latter process is connected with the random perturbing effect ψ_t through linear differential equations). In particular, this implies that if ψ_t is a Gaussian process, then in first approximation the random process X_t^ε will be Gaussian with mean x_t and correlation function proportional to ε². This implies the following result: if f is a smooth scalar-valued function in R^r and grad f(x_{t₀}) ≠ 0, then
P{f(X_{t₀}^ε) − f(x_{t₀}) < x} = Φ(x/(εσ)) + o(1)  (11)
as ε → 0, where Φ(y) = (1/√(2π)) ∫_{−∞}^y e^{−z²/2} dz is the Laplace function and σ is determined from grad f(x_{t₀}) and the value of the correlation function of Y_t^{(1)} at the point (t₀, t₀). We may obtain sharper results from (9): an expansion of the remainder o(1) in powers of ε. We may also obtain results on the asymptotic distributions of functionals of X_t^ε, 0 ≤ t ≤ T, and sharpenings of them connected with asymptotic expansions. Hence for random perturbations of the form (7) we may pose and solve a series of problems characteristic of the limit theorems of probability theory. Results on the convergence in probability of a random solution of the perturbed system to a nonrandom function correspond to laws of large numbers for sums of independent random variables. We can speak of the limit distribution under a suitable normalization; this corresponds to results of the type of the central limit theorem. As with sharpenings of the central limit theorem, we may obtain asymptotic expansions in powers of the parameter. In the limit theorems for sums of independent random variables there is still another direction: the study of probabilities of large deviations (after normalization) of a sum from the mean. Of course, all these probabilities converge to zero. Nevertheless we may study the problem of finding simple expressions equivalent to them or the problem of sharper (or rougher) asymptotics. The first general results concerning large deviations for sums of independent random variables were obtained by Cramér [1]. These results concern asymptotics, up to equivalence, of probabilities of the form
P{(ξ_1 + ⋯ + ξ_n)/√n > x}  (12)
as n → ∞, x → ∞, and also asymptotic expansions for such probabilities (under more stringent restrictions). We may be interested in analogous problems for a family of random processes X_t^ε arising as a result of small random perturbations of a dynamical system. For example, let A be a set in a function space on the interval [0, T] which does not contain the unperturbed trajectory x_t (and is at a positive distance from it). Then the probability

P{X^ε ∈ A}  (13)

of the event that the perturbed trajectory X^ε belongs to A of course converges to 0 as ε → 0; but what is the asymptotics of this infinitely small probability? It may seem that such digging into extremely rare events contradicts the general spirit of probability theory, which ignores events of small probability. Nevertheless, it is exactly the determination of which almost unlikely events related to the random process X_t^ε on a finite interval are "more improbable" and which are "less improbable" that, in several cases, serves as a key to the
question of what the behavior, with probability close to 1, of the process X_t^ε will be on an infinite time interval (or on an interval growing with decreasing ε).
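In the simplest Gaussian situation the asymptotics of such a probability can be computed in closed form. As a toy illustration of ours (not from the book), take the pure-noise case X_t^ε = εw_t (i.e., b ≡ 0, with w a standard Wiener process) and the event A = {φ : φ(1) ≥ 1}. Then P{X^ε ∈ A} = P{εw_1 ≥ 1}, and ε² log P{X^ε ∈ A} → −1/2, an exponential decay of the type exp{−Cε⁻²}.

```python
import math

def log_prob(eps):
    # P{eps * w_1 >= 1} for a standard Wiener process w_t:
    # w_1 is N(0, 1), so the probability is 0.5 * erfc(1 / (eps * sqrt(2))).
    return math.log(0.5 * math.erfc(1.0 / (eps * math.sqrt(2.0))))

# eps^2 * log P approaches -1/2 as eps decreases:
scaled = [eps ** 2 * log_prob(eps) for eps in (0.2, 0.1, 0.05)]
```

The probability itself is astronomically small already for ε = 0.05, yet the normalized logarithm is a perfectly tame number close to −1/2; this is exactly the kind of "rough asymptotics" the book studies.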
Indeed, for the sake of definiteness, we consider the particular case of perturbations of the form (7):

Ẋ_t^ε = b(X_t^ε) + εψ_t.  (14)
Furthermore, let ψ_t be a stationary Gaussian process. Assume that the trajectories of the unperturbed system (1), beginning at points of a bounded domain D, do not leave this domain for t > 0 and are attracted to a stable equilibrium position x* as t → ∞. Will the trajectories of the perturbed system (14) also have this property with probability near 1? The results above related to small nonrandom perturbations cannot help us answer this question, since the supremum of |ψ_t| over t ∈ [0, ∞) is infinite with probability 1 (if we exclude the case of "very degenerate" processes ψ_t). We have to approach this question differently. We divide the time axis [0, ∞) into a countable number of intervals of length T. On each of these intervals, for small ε, the most likely behavior of X_t^ε is such that the supremum of |X_t^ε − x_t| over the interval is small. (For intervals with large indices, X_t^ε will simply be close to x* with overwhelming probability.) All other ways of behavior, in particular the exit of X_t^ε from D on a given time interval, will have small probabilities for small ε. Nonetheless, these probabilities are positive for any ε > 0. (Again, we exclude from our considerations the class of "very degenerate" random processes ψ_t.) For a given ε > 0 the probability

P{X_t^ε ∉ D for some t ∈ [kT, (k + 1)T]}  (15)
will be almost the same for all intervals with large indices. If the events involving the behavior of our random process on different time intervals were independent, we would obtain from this that sooner or later, with probability 1, the process X_t^ε leaves D, and the first exit time τ^ε has an approximately exponential distribution with parameter T⁻¹ P{X_t^ε ∉ D for some t ∈ [kT, (k + 1)T]}. The same will happen if these events are not exactly independent but the dependence between them decreases in a certain manner for distant intervals. This can be ensured by weak dependence properties of the perturbing random process ψ_t. Hence for problems connected with the exit of X_t^ε from a domain for small ε, it is essential to know the asymptotics of the probabilities of improbable
events ("large deviations") involving the behavior of X_t^ε on finite time intervals. In the case of small Gaussian perturbations it turns out that these probabilities have asymptotics of the form exp{−Cε⁻²} as ε → 0 (rough asymptotics, i.e., determined not up to equivalence but up to logarithmic equivalence). It turns out that we can introduce a functional S(φ), defined on smooth functions (which are smoother than the trajectories of X^ε), such that
P{ρ(X^ε, φ) < δ} ≈ exp{−ε⁻² S(φ)}  (16)
for small positive δ and ε, where ρ is the distance in a function space (say, in the space of continuous functions on the interval from T_1 to T_2; for the precise meaning of formula (16), cf. Ch. 3). The value of the functional at a given function characterizes the difficulty of a passage of X^ε near that function. The probability of an unlikely event is made up of the contributions exp{−ε⁻² S(φ)} corresponding to neighborhoods of separate functions φ; as ε → 0, only the summand with smallest S(φ) remains essential. Therefore, it is natural that the constant C providing the asymptotics is determined as the infimum of S(φ) over the corresponding set of functions φ. Thus for the probability in formula (15) the infimum has to be taken over smooth functions φ leaving D for t ∈ [kT, (k + 1)T]. (Exact formulations and the form of the functional S(φ) may be found in §5, Ch. 4; there we discuss its application to finding the asymptotics of the exit time τ^ε as ε → 0.)

Another problem related to the behavior of X_t^ε on an infinite time interval is the problem of the limit behavior of the stationary distribution μ^ε of X_t^ε as ε → 0. This limit behavior is connected with the limit sets of the dynamical system (1). Indeed, the stationary distribution shows how much time the process spends in one set or another. It is plausible to expect that for small ε the process X_t^ε will spend an overwhelming amount of time near limit sets of the dynamical system and, most likely, near stable limit sets. If system (1) has only one stable limit set K, then the measure μ^ε converges weakly, as ε → 0, to a measure concentrated on K (we do not formulate our assertions so precisely as to take account of the possible existence of distinct limits for different sequences ε_i → 0). However, if there are several stable sets, even just two, K_1 and K_2, then the situation becomes unclear; it depends on the exact form of the small perturbations.
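For orientation only (the precise statements and hypotheses are in Chs. 3 and 4 of the book): for white-noise perturbations of the form Ẋ_t^ε = b(X_t^ε) + εẇ_t considered below, the functional S takes the explicit form of a normalized action, and (16) reads:

```latex
S_{0T}(\varphi) = \frac{1}{2}\int_0^T \bigl|\dot\varphi_t - b(\varphi_t)\bigr|^2\,dt,
\qquad
P\bigl\{\rho(X^\varepsilon,\varphi)<\delta\bigr\}
  \approx \exp\bigl\{-\varepsilon^{-2}\,S_{0T}(\varphi)\bigr\},
```

where S_{0T}(φ) is finite only for absolutely continuous φ and is set equal to +∞ otherwise. The unperturbed trajectories are exactly the functions with S_{0T}(φ) = 0, which is why only deviations from them carry an exponential cost.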
The problem of what happens to the stationary distribution of a random process arising as an effect of random perturbations of a dynamical system when these perturbations decrease was posed in the paper of Pontrjagin, Andronov, and Vitt [1]. The approach applied in that article relates not to perturbations of the form (14) but rather to perturbations under whose influence there arise diffusion processes (given by formulas (19) and (20) below). This approach is based on solving the Fokker-Planck differential equation; in the one-dimensional case the problem of finding the asymptotics of the stationary distribution has been solved completely (cf. also Bernstein's article [1], which appeared in the same period). Some results involving the stationary distribution in the two-dimensional case have also been obtained. Our approach is based not on equations for the probability density of the stationary distribution but rather on the study of probabilities of improbable events. We outline the scheme of application of this approach to the problem of the asymptotics of the stationary distribution. The process X_t^ε spends most of the time in neighborhoods of the stable limit sets K_1 and K_2; it occasionally moves to a significant distance from K_1 or K_2 and returns to the same set, and it very seldom passes from K_1 to K_2 or
conversely. If we establish that the probability of the passage of X_t^ε from K_1 to K_2 over a long time T (not depending on ε) converges to 0 with rate

exp{−V_{12}/ε²}

as ε → 0, and the probability of passage from K_2 to K_1 has the order exp{−V_{21}/ε²} with V_{12} < V_{21}, then it becomes plausible that for small ε the process spends most of the time in the neighborhood of K_2. This is so since a successful "attempt" at passage from K_1 to K_2 will fall on a smaller number of time intervals [kT, (k+1)T] spent by the process near K_1 than a successful attempt at passage from K_2 to K_1, with respect to the number of time intervals of length T spent near K_2. Then μ^ε will converge to a measure concentrated on K_2. The constants V_{12} and V_{21} can be determined as the infima of the functional S(φ) over the smooth functions φ passing from K_1 to K_2 and conversely on an interval of length T (more precisely, they can be determined as the limits of these infima as T → ∞). The program of the study of limit behavior which we have outlined here is carried out not for random perturbations of the form (14) but rather for perturbations leading to Markov processes; the exact formulations and results are given in §4, Ch. 6. As we have already noted, random perturbations of the form (14) do not represent the only scheme of random perturbations which we shall consider (and not even the scheme to which we shall pay the greatest attention). An immediate generalization of it may be considered, in which the random process ζ_t is replaced by a generalized random process, a "white noise," which can be defined as the derivative (in the sense of distributions) of the Wiener process w_t:

Ẋ_t^ε = b(X_t^ε) + εẇ_t.  (17)
Upon integrating, equation (17) takes the following form, which does not contain distributions:
X_t^ε = X_0 + ∫_0^t b(X_s^ε) ds + ε(w_t − w_0).  (18)
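Equation (18) lends itself directly to numerical simulation. A minimal Euler-Maruyama sketch follows; the drift b(x) = −x, the step count, and the value of ε are illustrative assumptions, not taken from the text:

```python
import numpy as np

def euler_maruyama(b, x0, eps, T=1.0, n=1000, rng=None):
    """Simulate X_t = x0 + int_0^t b(X_s) ds + eps * (w_t - w_0)
    on [0, T] with the Euler-Maruyama scheme."""
    rng = np.random.default_rng(rng)
    dt = T / n
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(n):
        dw = rng.normal(0.0, np.sqrt(dt))    # Wiener increment on one step
        x[k + 1] = x[k] + b(x[k]) * dt + eps * dw
    return x

# As eps -> 0 the path stays close to the unperturbed solution x_t = x0 * e^{-t}.
path = euler_maruyama(lambda x: -x, x0=1.0, eps=0.01, rng=0)
```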
For perturbations of this form we can solve a larger number of interesting problems than for perturbations of the form (14), since they lead to a Markov process X_t^ε. A further generalization is perturbations which depend on the point of the space and have the form

Ẋ_t^ε = b(X_t^ε) + εσ(X_t^ε)ẇ_t,  (19)
where σ(x) is a matrix-valued function. The precise meaning of equation (19) can be formulated in the language of stochastic integrals in the following way:
A sequence ξ_n converges to ξ in probability if lim_{n→∞} P{|ξ_n(ω) − ξ(ω)| > δ} = 0 for any δ > 0, where |ξ_n(ω) − ξ(ω)| is the Euclidean length of the vector ξ_n(ω) − ξ(ω). If lim_{n→∞} M|ξ_n(ω) − ξ(ω)|² = 0, then we say that ξ_n converges to ξ in mean square. Finally, a sequence ξ_n converges to ξ with probability 1, or almost surely, if P{lim_{n→∞} ξ_n(ω) = ξ(ω)} = 1. If ξ_n → ξ in mean square, then Mξ_n → Mξ, as follows from the Cauchy-Bunyakovsky inequality. If ξ_n → ξ almost surely or in probability, then for the convergence of Mξ_n to Mξ we must impose additional assumptions. For example, it is sufficient to assume that the absolute values of the variables ξ_n do not exceed a certain random variable η(ω) having finite mathematical expectation (Lebesgue's theorem). We shall use repeatedly this and other theorems on the passage to the limit under the mathematical expectation sign. All the needed information concerning this question can be found in the book by Kolmogorov and Fomin [1].

Let 𝒢 be a σ-subalgebra of the σ-algebra ℱ, and suppose 𝒢 is complete with respect to the measure P (this means that the σ-algebra 𝒢 contains, together with every set A, all sets from ℱ differing from A by a set of probability 0).
Let η be a one-dimensional random variable having finite mathematical expectation. The conditional mathematical expectation of η with respect to the σ-algebra 𝒢, denoted by M(η | 𝒢), is defined as the function on Ω, measurable with respect to the σ-algebra 𝒢, for which

∫_A M(η | 𝒢) P(dω) = ∫_A η(ω) P(dω)

for every A ∈ 𝒢. The existence of the random variable M(η | 𝒢) follows from the Radon-Nikodym theorem. The same theorem implies that any two such random variables coincide everywhere except, maybe, on a set of measure 0. If the random variable χ_A(ω) is equal to 1 on some set A ∈ ℱ and to 0 for ω ∉ A,
then M(χ_A | 𝒢) is called the conditional probability of the event A with respect to the σ-algebra 𝒢 and is denoted by P(A | 𝒢). We list the basic properties of conditional mathematical expectations.

1. M(η | 𝒢) ≥ 0 if η ≥ 0.
2. M(ξ + η | 𝒢) = M(ξ | 𝒢) + M(η | 𝒢) if each term on the right exists.
3. M(ξη | 𝒢) = ξ M(η | 𝒢) if Mη and M(ξη) are defined and ξ is measurable with respect to the σ-algebra 𝒢.
4. Let 𝒢_1 and 𝒢_2 be two σ-algebras such that 𝒢_1 ⊆ 𝒢_2 ⊆ ℱ. We have M(M(η | 𝒢_2) | 𝒢_1) = M(η | 𝒢_1).
5. Assume that the random variable ξ does not depend on the σ-algebra 𝒢, i.e., P({ξ ∈ D} ∩ A) = P{ξ ∈ D} P(A) for any Borel set D and any A ∈ 𝒢. Then M(ξ | 𝒢) = Mξ provided that the latter mathematical expectation exists.

We note that the conditional mathematical expectation is defined only up to its values on a set of probability 0, and that all equalities between conditional mathematical expectations are satisfied everywhere with the possible exception of a subset of the space Ω having probability 0. In those cases which do not lead to misunderstanding, we shall not state this explicitly. The proofs of Properties 1-5 and other properties of conditional mathematical expectations and conditional probabilities may be found in the book by Gikhman and Skorokhod [1].
§2. Random Processes. General Properties

Let us consider a probability space {Ω, ℱ, P}, a measurable space (X, ℬ), and a set T on the real line. A family of random variables ξ_t(ω), t ∈ T, with values in (X, ℬ) is called a random process. The parameter t is usually called time and the space X the phase space of the random process ξ_t(ω). Usually, we shall consider random processes whose phase space is either the Euclidean space R^r or a smooth manifold. For every fixed ω ∈ Ω we obtain a function ξ_t, t ∈ T, with values in X, which is called a trajectory, realization, or sample function of the process ξ_t(ω). The collection of the distributions of the random variables (ξ_{t_1}, ξ_{t_2}, …, ξ_{t_n}) for all n = 1, 2, 3, … and t_1, …, t_n ∈ T is called the family of finite-dimensional distributions of ξ_t. If T is a countable set, then the finite-dimensional distributions determine the random process to the degree of uniqueness usual in probability theory. In the case where T is an interval of the real line, as is known, there is essential nonuniqueness. For example, one can have two processes with the same finite-dimensional distributions yet one has continuous trajectories for almost all ω and the other has discontinuous trajectories. To avoid this nonuniqueness, the requirement of separability of
processes is introduced in the general theory. For all processes to be considered in this book there exist with probability one continuous variants or right continuous variants. Such a right continuous process is determined essentially uniquely by its finitedimensional distributions. We shall always consider continuous or right continuous modifications, without stating this explicitly.
With every random process ξ_t, t ∈ T, there are associated the σ-algebras ℱ_{≤t} = σ{ξ_s, s ≤ t} and ℱ_{≥t} = σ{ξ_s, s ≥ t}, which are the smallest σ-algebras with respect to which the random variables ξ_s(ω) are measurable for s ≤ t and s ≥ t, respectively. It is clear that for t_1 < t_2 we have the inclusion ℱ_{≤t_1} ⊆ ℱ_{≤t_2}. In what follows we often consider the conditional mathematical expectations M(η | ℱ_{≤t}), which will sometimes be denoted by M(η | ξ_s, s ≤ t). By M(η | ξ_t) we denote the conditional mathematical expectation with respect to the σ-algebra generated by the random variable ξ_t. Analogous notation will be used for conditional probabilities. Assume given a nondecreasing family of σ-algebras 𝒩_t: 𝒩_{t_1} ⊆ 𝒩_{t_2} for 0 ≤ t_1 < t_2; t_1, t_2 ∈ T. We denote by 𝒩 the smallest σ-algebra containing all the σ-algebras 𝒩_t for t ≥ 0. A random variable τ(ω) assuming nonnegative values and the value +∞ is called a Markov time (or a random variable not depending on the future) with respect to the family of σ-algebras 𝒩_t if {ω: τ(ω) ≤ t} ∈ 𝒩_t for every t ∈ T. An important example of a Markov time is the first hitting time of a closed set in R^r by a process ξ_t continuous with probability 1. Here the role of the σ-algebras 𝒩_t is played by the nondecreasing family of σ-algebras ℱ_{≤t} = σ{ξ_s, s ≤ t}. We denote by 𝒩_τ the collection of the sets A ∈ 𝒩 for which
A ∩ {τ ≤ t} ∈ 𝒩_t for every t ∈ T.

A Markov process (X_t, P_x) is called a strong Markov process if, for every Markov time τ, all t ≥ 0, x ∈ X, and Γ ∈ ℬ, the relation

P_x{X_{τ+t} ∈ Γ | 𝒩_τ} = P(t, X_τ, Γ)

is satisfied for almost all points of the set Ω_τ = {ω ∈ Ω: τ(ω) < ∞} with respect to the measure P_x. Conditions ensuring that a given Markov process is a strong Markov process, together with various properties of strong Markov processes, are discussed in detail in Dynkin's book [2]. We note that all Markov processes considered in the present book are strong Markov. As a simple important example of a Markov process may serve a Markov chain with a finite number of states. This is a Markov process in which the parameter t assumes the values 0, 1, 2, … and the phase space X consists of a finite number of points: X = {e_1, …, e_n}. A homogeneous Markov chain (we shall only encounter such chains in this book) is given by the square matrix P = (p_{ij}) (i, j = 1, …, n) of one-step transition probabilities: P_{e_i}{X_1 = e_j} = p_{ij}. It follows from the definition of a Markov process that if the row vector q(s) = (q_1(s), …, q_n(s)) describes the distribution of X_s(ω) (i.e., P_x{X_s = e_i} = q_i(s)), then q(t) = q(s)P^{t−s} for t ≥ s, where q_i ≥ 0, Σ_{i=1}^n q_i = 1. A row vector q = (q_1, …, q_n) for which qP = q is called an invariant distribution of the Markov process. Every chain with a finite number of states admits an invariant distribution. If all entries of P (or of a power of it) are different from zero, then the invariant distribution q is unique and lim_{t→∞} P_x{X_t(ω) = e_i} = q_i for all x and e_i ∈ X. This assertion, called the ergodic theorem for Markov chains, can be carried over to Markov processes of a general kind. In the forthcoming sections we shall return to Markov processes. Now we recall some more classes of processes. We say that a random process ξ_t(ω), t ≥ 0, in the phase space (R^l, ℬ^l) is a process with independent increments if the increments

ξ_{t_n} − ξ_{t_{n−1}}, ξ_{t_{n−1}} − ξ_{t_{n−2}}, …, ξ_{t_2} − ξ_{t_1}

are independent random variables for any t_n > t_{n−1} > ⋯ > t_1 ≥ 0.
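The invariant distribution of a finite chain can be computed as a left eigenvector of P for the eigenvalue 1. A sketch; the particular transition matrix is an illustrative assumption:

```python
import numpy as np

# One-step transition matrix of a chain with three states e1, e2, e3
# (the numbers are illustrative).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.8, 0.1],
              [0.2, 0.3, 0.5]])

# An invariant distribution solves q P = q with sum(q) = 1, i.e. q is a
# left eigenvector of P for the eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
q = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
q = q / q.sum()

# All entries of P are positive, so the ergodic theorem applies:
# the rows of P^t converge to q as t grows.
Pt = np.linalg.matrix_power(P, 50)
```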
The Poisson process is an example of such a process. It is a random process ν_t, t ≥ 0, assuming nonnegative integral values, having independent increments and right continuous trajectories with probability one, for which

P{ν_t − ν_s = k} = ([λ(t − s)]^k / k!) e^{−λ(t−s)},  0 ≤ s < t, k = 0, 1, 2, ….

Another fundamental example of a process with independent increments is the Wiener process: a Gaussian random process w_t, t ≥ 0, such that Mw_t = 0; Mw_s w_t = min(s, t); and, for almost all ω, the trajectories of w_t(ω) are continuous in t ∈ [0, ∞).
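The covariance identity Mw_s w_t = min(s, t) is easy to check by simulation, using the independence of the increments. A sketch; the time points and sample size are arbitrary choices:

```python
import numpy as np

# Monte Carlo check of M w_s w_t = min(s, t) for a Wiener process:
# w_s ~ N(0, s), and w_t = w_s + an independent N(0, t - s) increment.
rng = np.random.default_rng(0)
s, t, n = 0.4, 1.0, 200_000
w_s = rng.normal(0.0, np.sqrt(s), n)
w_t = w_s + rng.normal(0.0, np.sqrt(t - s), n)
cov = np.mean(w_s * w_t)      # should be close to min(s, t) = 0.4
```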
It can be proved (cf., for example, Gikhman and Skorokhod [1]) that a process with these properties exists on an appropriate probability space {Ω, ℱ, P}. From the Gaussianness of w_t it follows that for arbitrary moments of time t_n > t_{n−1} > ⋯ > t_1 ≥ 0 the random variables

w_{t_n} − w_{t_{n−1}}, w_{t_{n−1}} − w_{t_{n−2}}, …, w_{t_2} − w_{t_1}

have a joint Gaussian distribution, and from property (2) it follows that they are uncorrelated: M(w_{t_{i+1}} − w_{t_i})(w_{t_{j+1}} − w_{t_j}) = 0 for i ≠ j. We conclude from this that the increments of a Wiener process are independent. We note that the increment of a Wiener process from time s to time t, s < t, has a Gaussian distribution with M(w_t − w_s) = 0 and M(w_t − w_s)² = t − s. It can be calculated that M|w_t − w_s| = √(2(t − s)/π). We recall some properties of a Wiener process. The upper limits (lim sup) of almost all trajectories of a Wiener process are +∞ and the lower limits (lim inf) are −∞. From this it follows in particular that the trajectories of a Wiener process pass through zero infinitely many times with probability 1 and the set {t: w_t(ω) = 0} is unbounded for almost all ω. The realizations of a Wiener process are continuous by definition. Nevertheless, with probability 1 they are nowhere differentiable and have infinite variation on every time interval. It can be proved that with probability one the trajectories of a Wiener process satisfy a Hölder condition with any exponent α < 1/2 but do not satisfy it with exponents α > 1/2. We also note the following useful identity:
P{ sup_{0≤s≤T} w_s > a } = 2P{w_T > a}.
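This identity (the reflection principle) can be checked by Monte Carlo simulation of discretized Wiener paths. A sketch; T, a, and the discretization are illustrative choices, and the discrete maximum slightly underestimates the true supremum:

```python
import numpy as np

# Monte Carlo check of P{ sup_{0<=s<=T} w_s > a } = 2 P{ w_T > a }.
rng = np.random.default_rng(1)
T, a, n_paths, n_steps = 1.0, 1.0, 50_000, 2000
dt = T / n_steps
w = np.zeros(n_paths)
running_max = np.zeros(n_paths)
for _ in range(n_steps):
    w += rng.normal(0.0, np.sqrt(dt), n_paths)
    np.maximum(running_max, w, out=running_max)

lhs = np.mean(running_max > a)   # P{ sup w_s > a }, up to discretization
rhs = 2 * np.mean(w > a)         # 2 P{ w_T > a }
```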
§3. Wiener Process. Stochastic Integral
Every random process ξ_t(ω), t ∈ T, with values in a measurable space (X, ℬ) can be considered as a mapping of the space (Ω, ℱ) into a space of functions defined on T with values in X. In particular, a Wiener process w_t(ω), t ∈ [0, T], determines a mapping of Ω into the space C_{0T}(R¹) of continuous functions on [0, T] which are zero at t = 0. This mapping determines a probability measure μ in C_{0T}(R¹), which is called the Wiener measure. The support of the Wiener measure is the whole space C_{0T}(R¹). This means that an arbitrarily small neighborhood (in the uniform topology) of every function φ ∈ C_{0T}(R¹) has positive Wiener measure. A collection of r independent Wiener processes w_t¹(ω), w_t²(ω), …, w_t^r(ω) is called an r-dimensional Wiener process.
The important role of the Wiener process in the theory of random processes can be explained to a large degree by the fact that many classes of random processes with continuous trajectories admit a convenient representation in terms of a Wiener process. This representation is given by means
of the stochastic integral. We recall the construction and properties of the stochastic integral.
Suppose we are given a probability space {Ω, ℱ, P}, a nondecreasing family of σ-algebras 𝒩_t, t ≥ 0, 𝒩_t ⊆ ℱ, and a Wiener process w_t on {Ω, ℱ, P}. We assume that the σ-algebras 𝒩_t are such that ℱ_{≤t}^w ⊆ 𝒩_t for every t ≥ 0 and

M(w_t − w_s | 𝒩_s) = 0;  M(|w_t − w_s|² | 𝒩_s) = t − s

for every 0 ≤ s < t. This will be so, at any rate, if 𝒩_t = ℱ_{≤t}^w. We say that a random process f(t, ω), t ≥ 0, measurable in the pair (t, ω), does not depend on the future (with respect to the family of σ-algebras 𝒩_t) if f(t, ω) is measurable with respect to 𝒩_t for every t ≥ 0. We denote by H_{a,b} …
Conversely, if a nonnegative definite matrix (a^{ij}(x)) (i.e., Σ_{i,j} a^{ij}(x)λ_iλ_j ≥ 0 for any real λ_1, …, λ_r) and a vector b(x) are given with sufficiently smooth entries, then we can construct a diffusion process with diffusion coefficients a^{ij}(x) and drift b(x). This can be done, for example, by means of a stochastic differential equation: if the matrix σ(x) is such that σ(x)σ*(x) = (a^{ij}(x)), then the solutions of the equation Ẋ_t = b(X_t) + σ(X_t)ẇ_t form a diffusion process with diffusion coefficients a^{ij}(x) and drift b(x). For the existence and uniqueness of solutions of the stochastic differential equation it is necessary that the coefficients σ(x) and b(x) satisfy certain regularity requirements. For example, as has already been indicated, it is sufficient that σ(x) and b(x) satisfy a Lipschitz condition. A representation of (a^{ij}(x)) in the form (a^{ij}(x)) = σ(x)σ*(x) with entries σ(x) satisfying a Lipschitz condition is always possible whenever the functions a^{ij}(x) are twice continuously differentiable (Freidlin [5]). If det(a^{ij}(x)) ≠ 0, then for such a representation it is sufficient that the functions a^{ij}(x) satisfy a Lipschitz condition. Consequently, every operator

L = (1/2) Σ_{i,j=1}^r a^{ij}(x) ∂²/(∂x_i ∂x_j) + Σ_{i=1}^r b^i(x) ∂/∂x_i

with nonnegative definite matrix (a^{ij}(x)) and sufficiently smooth coefficients has a corresponding diffusion process. This diffusion process is determined essentially uniquely by its differential generator: any two processes with a common differential generator induce the same distribution in the space of
trajectories. This is true in all cases where the coefficients a^{ij}(x) and b(x) satisfy some weak regularity conditions, which are always satisfied in our investigations. In a majority of problems in probability theory we are interested in those properties of a random process which are determined by the corresponding
distribution in the space of trajectories and do not depend on the concrete representation of the process. In connection with this we shall often say: "Let us consider the diffusion process corresponding to the differential operator L," without specifying how this process is actually given.
A diffusion process corresponding to the operator L can be constructed without appealing to stochastic differential equations. For example, if the diffusion matrix is nondegenerate, then a corresponding process can be constructed starting from the existence theorem for solutions of the parabolic
equation ∂u/∂t = Lu(t, x). Relying on results of the theory of differential equations, we can establish a series of important properties of diffusion processes. For example, we can give conditions under which the transition function has a density. In many problems connected with degeneracies in one way or another, it seems to be more convenient to use stochastic differential equations.
We mention some particular cases. If a^{ij}(x) ≡ 0 for all i, j = 1, 2, …, r, then L turns into an operator of the first order:

L = Σ_{i=1}^r b^i(x) ∂/∂x_i.

In this case, the stochastic differential equations turn into the following system of ordinary differential equations:

ẋ_t = b(x_t),  x_0 = x.
Consequently, to any differential operator of the first order there corresponds a Markov process which represents a deterministic motion given by solutions of an ordinary differential equation. In the theory of differential equations, this ordinary differential equation is called the equation of characteristics and its solutions are the characteristics of the operator L.
Another particular case is that where all drift coefficients b^i(x) ≡ 0 and the diffusion coefficients form a unit matrix: a^{ij}(x) = δ^{ij}. Then L = Δ/2, where Δ is the Laplace operator. The corresponding Markov family has the form X_t^x = x + w_t, i.e., to the operator Δ/2 there corresponds the family of processes which are obtained by translating the Wiener process by the vector x ∈ R^r. For the sake of brevity, the Markov process (w_t, P_x) connected with this family will also be called a Wiener process. The index in the probability or the mathematical
§5. Diffusion Processes and Differential Equations
expectation will indicate that the trajectory w_t^x = x + w_t is considered. For example, P_x{w_t ∈ Γ} = P{x + w_t ∈ Γ}, M_x f(w_t) = Mf(x + w_t). It is easy to see that if all coefficients of L are constant, then the corresponding Markov family consists of Gaussian processes of the form

X_t^x = x + σw_t + bt.

The diffusion process will also be Gaussian if the diffusion coefficients are constant and the drift depends linearly on x. Now let (X_t, P_x) be a diffusion process, A the infinitesimal generator of the process, and L the corresponding differential operator. Let us consider the Cauchy problem

∂u(t, x)/∂t = Lu(t, x);  u(0, x) = f(x),  x ∈ R^r, t > 0.

A generalized solution of this problem is, by definition, a solution of the following abstract Cauchy problem:

du_t/dt = Au_t,  u_0 = f.
The operator A is an extension of L, so that this definition is unambiguous. As has been noted in §3, the solution of the abstract Cauchy problem exists in every case where f ∈ D_A and can be written in the form

u_t(x) = M_x f(X_t) = T_t f(x).

If the classical solution u(t, x) of the Cauchy problem exists, then, since Au = Lu for smooth functions u = u(t, x), the function u(t, x) is also a solution of the abstract Cauchy problem and u(t, x) = T_t f(x) by the uniqueness of the solution of the abstract problem. This representation can be extended to solutions of the Cauchy problem with an arbitrary bounded continuous initial function not necessarily belonging to D_A. This follows from the maximum principle for parabolic equations. If the matrix (a^{ij}(x)) is nondegenerate and the coefficients of L are sufficiently regular (for example, they satisfy a Lipschitz condition), then the equation ∂u/∂t = Lu has the fundamental solution p(t, x, y), i.e., the solution of the Cauchy problem with initial function δ(x − y). As can be seen easily, this fundamental solution is the density of the transition function:

P(t, x, Γ) = ∫_Γ p(t, x, y) dy.

The equation ∂u/∂t = Lu is called the backward Kolmogorov equation of the diffusion process (X_t, P_x).
Let c(x) be a bounded uniformly continuous function on R^r. Consider the family of operators

T̃_t f(x) = M_x f(X_t) exp{∫_0^t c(X_s) ds},  t ≥ 0,

in the space of bounded measurable functions on R^r. The operators T̃_t form a semigroup (cf., for example, Dynkin [2]). Taking into account that

exp{∫_0^t c(X_s) ds} = 1 + ∫_0^t c(X_s) ds + o(t)

as t ↓ 0, it is easy to prove that if f ∈ D_A and the coefficients of the operator L are uniformly bounded on R^r, then f belongs to the domain of the infinitesimal generator Ã of the semigroup T̃_t and Ãf(x) = Af(x) + c(x)f(x), where A is the infinitesimal generator of the semigroup T_t f(x) = M_x f(X_t). Using this observation, it can be proved that for a bounded continuous function f(x) the solution of the Cauchy problem

∂v(t, x)/∂t = Lv(t, x) + c(x)v(t, x),  x ∈ R^r, t > 0,  v(0, x) = f(x)

can be written in the form

v(t, x) = T̃_t f(x) = M_x f(X_t) exp{∫_0^t c(X_s) ds}.
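The representation above can be evaluated by Monte Carlo simulation of the paths. A sketch for the Wiener case L = ½ d²/dx² with c(x) = −x² and f ≡ 1; these particular choices are assumptions made for illustration, picked because the exact value is then known from the Cameron-Martin formula:

```python
import numpy as np

# Monte Carlo evaluation of v(t, x) = M_x f(X_t) exp{ int_0^t c(X_s) ds }
# for L = (1/2) d^2/dx^2 (so X_t = x + w_t), with c(x) = -x^2 and f = 1.
rng = np.random.default_rng(2)
t, x, n_paths, n_steps = 1.0, 0.0, 50_000, 1000
dt = t / n_steps

w = np.full(n_paths, x)
integral = np.zeros(n_paths)        # int_0^t c(X_s) ds, left-point rule
for _ in range(n_steps):
    integral += -w**2 * dt
    w += rng.normal(0.0, np.sqrt(dt), n_paths)
v = np.mean(np.exp(integral))       # f = 1, so there is no terminal factor
# For this c the Cameron-Martin formula gives v(1, 0) = cosh(sqrt(2))^(-1/2).
```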
A representation in the form of the expectation of a functional of the trajectories of the corresponding process can also be given for the solution of a nonhomogeneous equation: if

∂w/∂t = Lw + c(x)w + g(x),  w(0, x) = 0,

then

w(t, x) = M_x ∫_0^t g(X_s) exp{∫_0^s c(X_u) du} ds.
A probabilistic representation of solutions of an equation with coefficients depending on t and x can be given in terms of the mean value of functionals of trajectories of the process determined by the nonhomogeneous stochastic differential equation
Ẋ_t = b(t, X_t) + σ(t, X_t)ẇ_t,  X_{t_0} = x.

The solutions of this equation exist for any x ∈ R^r, t_0 ≥ 0 if the coefficients are continuous in t and x and satisfy a Lipschitz condition in x with a constant independent of t (Gikhman and Skorokhod [1]). The set of the processes X_t^{t_0,x} for all t_0 ≥ 0 and x ∈ R^r forms a nonhomogeneous Markov family (cf. Dynkin [1]). In the phase space R^r of a diffusion process (X_t, P_x) let a bounded domain D with a smooth boundary ∂D be given. Denote by τ the first exit time of the process from the domain D: τ = τ(ω) = inf{t: X_t ∉ D}. In many problems we are interested in the mean of functionals depending on the behavior of the process from time 0 to time τ, for example, expressions of the form
M_x ∫_0^τ f(X_s) ds,  M_x exp{∫_0^τ f(X_s) ds},  etc.
These expressions, as functions of the initial point x, are solutions of boundary value problems for the differential generator L of the process (X_t, P_x). In the domain D let us consider Dirichlet's problem

Lu(x) + c(x)u(x) = f(x),  x ∈ D;  u(x)|_{x∈∂D} = ψ(x).

It is assumed that c(x) and f(x), for x ∈ R^r, and ψ(x), for x ∈ ∂D, are bounded continuous functions and c(x) ≤ 0. Concerning the operator L we assume that it is uniformly nondegenerate in D ∪ ∂D, i.e., Σ a^{ij}(x)λ_iλ_j ≥ k Σ λ_i², k > 0, and that all coefficients satisfy a Lipschitz condition. Under these conditions, the function

u(x) = −M_x ∫_0^τ f(X_t) exp{∫_0^t c(X_s) ds} dt + M_x ψ(X_τ) exp{∫_0^τ c(X_s) ds}

is the unique solution of the above Dirichlet problem. In order to prove this, first we assume that the solution u(x) of Dirichlet's problem can be extended with preservation of smoothness to the whole space R^r. We write Y_t = ∫_0^t c(X_s) ds and apply Itô's formula to the function u(X_t)e^{Y_t}:
u(X_t) e^{Y_t} − u(x) = ∫_0^t (e^{Y_s} ∇u(X_s), σ(X_s) dw_s) + ∫_0^t e^{Y_s} f(X_s) ds + ∫_0^t e^{Y_s}[Lu(X_s) − f(X_s)] ds + ∫_0^t u(X_s) e^{Y_s} c(X_s) ds.
40
1. Random Perturbations
This equality is satisfied for all t ≥ 0 with probability 1. We now replace t by the random variable τ. For s < τ the trajectory X_s does not leave D, and therefore Lu(X_s) = f(X_s) − c(X_s)u(X_s). Hence for t = τ the last two terms on the right side of the equality obtained by means of Itô's formula cancel each other. The random variable τ is a Markov time with respect to the σ-algebras 𝒩_t. Under the assumptions made concerning the domain and the process, we have M_x τ ≤ K < ∞. Therefore

M_x ∫_0^τ (e^{Y_s} ∇u(X_s), σ(X_s) dw_s) = 0.

Utilizing these remarks, we obtain

M_x u(X_τ) exp{∫_0^τ c(X_s) ds} − u(x) = M_x ∫_0^τ f(X_s) exp{∫_0^s c(X_v) dv} ds.

Our assertion follows from this in the case where u(x) can be extended smoothly to the whole space R^r. In order to obtain a proof in the general case,
we have to approximate the domain D by an increasing sequence of domains D_n ⊂ D with sufficiently smooth boundaries ∂D_n. As boundary functions we have to choose the values, on ∂D_n, of the solution u(x) of Dirichlet's problem in D. We mention some special cases. If c(x) ≡ 0, ψ(x) ≡ 0 and f(x) ≡ −1, then u(x) = M_x τ. This function is the unique solution of the problem

Lu(x) = −1 for x ∈ D,  u(x)|_{∂D} = 0.
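For the Wiener process on an interval this boundary value problem can be solved explicitly and compared with a direct simulation of M_x τ. A sketch; the domain (0, 1) and the starting point are illustrative choices:

```python
import numpy as np

# For L = (1/2) d^2/dx^2 (Wiener process) on D = (0, 1), the mean exit
# time u(x) = M_x tau solves (1/2) u'' = -1, u(0) = u(1) = 0, whence
# u(x) = x (1 - x).  Monte Carlo check at x = 0.3.
rng = np.random.default_rng(3)
x0, dt, n_paths = 0.3, 1e-4, 20_000
w = np.full(n_paths, x0)
tau = np.zeros(n_paths)
alive = np.ones(n_paths, dtype=bool)
while alive.any():
    w[alive] += rng.normal(0.0, np.sqrt(dt), alive.sum())
    tau[alive] += dt
    alive &= (w > 0.0) & (w < 1.0)   # a path dies once it leaves (0, 1)
mean_tau = tau.mean()                # should be close to 0.3 * 0.7 = 0.21
```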
If c(x) ≡ 0, f(x) ≡ 0, then for u(x) = M_x ψ(X_τ) we obtain the problem

Lu(x) = 0,  x ∈ D;  u(x)|_{∂D} = ψ(x).

If c(x) > 0, then, as is well known, Dirichlet's problem can "go out to the spectrum"; the solution of the equation Lu + c(x)u = 0 with vanishing boundary values may not be unique in this case. On the other hand, if c(x) ≤ c_0 < ∞ and M_x e^{c_0 τ} < ∞ for x ∈ D, then it can be proved that the solution of Dirichlet's problem is unique and the formulas giving a representation of the solution in the form of the expectation of a functional of the corresponding process remain valid (Khasminskii [2]). It can be proved that sup{c: M_x e^{cτ} < ∞} = λ_1 is the smallest eigenvalue of the problem

Lu = −λu,  u|_{∂D} = 0.

We will need not only equations for expectations associated with diffusion processes, but also certain inequalities.
If u(x) is a function that is smooth in D and continuous in its closure D ∪ ∂D, and Lu(x) ≤ −c_0 < 0 for x ∈ D, then

M_x τ ≤ [u(x) − min{u(x): x ∈ ∂D}]/c_0;

if u(x) is positive in D ∪ ∂D and Lu(x) + λu(x) ≤ 0, x ∈ D, then

M_x e^{λτ} ≤ u(x)/min{u(x): x ∈ ∂D}.

The solution of the mixed problem

∂w/∂t = Lw(t, x), t > 0, x ∈ D;  w(0, x) = f(x), x ∈ D;  w(t, x)|_{t>0, x∈∂D} = ψ(x)

can be represented in the form

w(t, x) = M_x{τ > t; f(X_t)} + M_x{τ ≤ t; ψ(X_τ)},

under some regularity assumptions on the coefficients of the operator, the boundary of the domain D and the functions f(x) and ψ(x).
On the one hand, we can view the formulas mentioned in this section, which connect the expectations of certain functionals of a diffusion process with solutions of the corresponding boundary value problems, as a method of calculating these expectations by solving differential equations. On the other hand, we can study the properties of the functionals and their mathematical expectations by methods of probability theory in order to use this information for the study of solutions of boundary value problems. In our book the second point of view will be predominant. A representation in the form of the mathematical expectation of a functional of trajectories of the corresponding process can also be given for several other boundary value problems, for example, for Neumann's problem and the third boundary value problem (Freidlin [3], Ikeda [1]). Now we turn to the behavior of a diffusion process as t → ∞. For the sake of simplicity, we shall assume that the diffusion matrix and the drift vector consist of bounded entries satisfying a Lipschitz condition and

Σ_{i,j=1}^r a^{ij}(x)λ_iλ_j ≥ k Σ_{i=1}^r λ_i²,  k > 0,

for x ∈ R^r and for all real λ_1, …, λ_r. Such a nondegenerate diffusion process may have trajectories of two different types: either trajectories going out to
infinity as t → ∞ with probability P_x = 1 for any x ∈ R^r, or trajectories which return to a given bounded region after an arbitrarily large t with probability P_x = 1, x ∈ R^r, although P_x{lim sup_{t→∞} |X_t| = ∞} = 1. The diffusion processes which have trajectories of the second type are said to be recurrent. The processes for which P_x{lim_{t→∞} |X_t| = ∞} = 1 are said to be transient. It is easy to prove that the trajectories of a recurrent process hit every open set of the phase space with probability P_x = 1 for any x ∈ R^r. We denote by τ = inf{t: |X_t| ≤ 1} the first entrance time of the unit ball with center at the origin. For a recurrent process, P_x{τ < ∞} = 1. If M_x τ < ∞ for any x ∈ R^r, then the process (X_t, P_x) is said to be positively recurrent; otherwise it is said to be null recurrent. The Wiener process in R¹ or R² serves as an example of a null recurrent process. The Wiener process in R^r is transient for r ≥ 3. If, uniformly for all x ∈ R^r lying outside some ball, the projection of the drift b(x) onto the radius vector connecting the origin of coordinates with the point x is negative and bounded from below in its absolute value, then the process (X_t, P_x) is positively recurrent. It is possible to give stronger sufficient conditions for recurrence and positive recurrence in terms of so-called barriers: nonnegative functions V(x), x ∈ R^r, for which LV(x) has a definite sign and which behave in a certain way at infinity. The recurrence or transience of a diffusion process is closely connected
with the formulation of boundary value problems for the operator L in unbounded domains. For example, the exterior Dirichlet problem for the Laplace operator in R², where the corresponding process is recurrent, has a unique solution in the class of bounded functions, while in order to select the unique solution of the exterior Dirichlet problem for the operator Δ in R³, it is necessary to prescribe the limit of the solution as |x| → ∞. It can be proved that if a diffusion process (X_t, P_x) is positively recurrent, then it has a unique stationary probability distribution μ(Γ), Γ ∈ ℬ^r, i.e., a probability measure for which

∫_{R^r} μ(dx) P(t, x, Γ) = μ(Γ).

This measure has a density m(x) which is the unique solution of the problem

L*m(x) = 0 for x ∈ R^r,  m(x) > 0,  ∫_{R^r} m(x) dx = 1.
Here L* is the formal adjoint of L:

L*m(x) = (1/2) Σ_{i,j=1}^r ∂²/(∂x_i ∂x_j)(a^{ij}(x)m(x)) − Σ_{i=1}^r ∂/∂x_i (b^i(x)m(x)).
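For a concrete one-dimensional case the problem L*m = 0 can be checked against simulation. A sketch with drift b(x) = −x and unit diffusion, an illustrative choice:

```python
import numpy as np

# For L = (1/2) d^2/dx^2 - x d/dx (b(x) = -x, a(x) = 1), solving
# L*m = 0 with m > 0 and int m = 1 gives the Gaussian density
# m(x) proportional to exp(-x^2), whose variance is 1/2.  We check
# this variance by simulating the process into its stationary regime.
rng = np.random.default_rng(4)
dt, n_steps, n_paths = 2e-3, 10_000, 10_000
x = np.zeros(n_paths)
for _ in range(n_steps):
    x += -x * dt + rng.normal(0.0, np.sqrt(dt), n_paths)
var = x.var()        # stationary variance, exactly 1/2 for this process
```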
For positively recurrent diffusion processes the law of large numbers holds in the following form:

P_x{ lim_{T→∞} (1/T) ∫_0^T f(X_s) ds = ∫_{R^r} f(x)m(x) dx } = 1
for arbitrary x ∈ R^r and any bounded measurable function f(x) on R^r. The process X_t^x, t ∈ [0, T], defined by the stochastic differential equation

Ẋ_t^x = b(X_t^x) + σ(X_t^x)ẇ_t,  X_0^x = x,
as every other random process, induces a probability distribution in the space of trajectories. Since the trajectories of X_t^x are continuous with probability 1, this distribution is concentrated in the space C_{0T} of continuous functions assuming the value x at t = 0. We denote by μ_X the measure corresponding to X_t^x in C_{0T}. Together with X_t^x, we consider the process Y_t^x satisfying the stochastic differential equation

Ẏ_t^x = b(Y_t^x) + σ(Y_t^x)ẇ_t + f(t, Y_t^x),  Y_0^x = x.

The processes Y_t^x and X_t^x coincide for t = 0; they differ by the drift vector f(t, Y_t^x). Let μ_Y be the measure corresponding to the process Y_t^x in C_{0T}. We will be particularly interested in the question of when the measures μ_X and μ_Y are absolutely continuous with respect to each other and what the density of one
measure with respect to the other looks like. Suppose there exists an r-dimensional vector φ(t, x) with components bounded by an absolute constant and such that σ(x)φ(t, x) = f(t, x). Then μ_X and μ_Y are absolutely continuous with respect to each other and the density dμ_Y/dμ_X has the form

(dμ_Y/dμ_X)(X^x) = exp{ ∫_0^T (φ(t, X_t^x), dw_t) − (1/2) ∫_0^T |φ(t, X_t^x)|² dt }
(Girsanov [1], Gikhman and Skorokhod [1]). In particular, if the diffusion matrix a(x) = σ(x)σ*(x) is uniformly nondegenerate for x ∈ R^r, then μ_X and μ_Y are absolutely continuous with respect to each other for any bounded measurable f(t, x). If X_t^x = x + w_t is the Wiener process and

Y_t^x = x + w_t + ∫_0^t f(s) ds,

then

dμ_Y/dμ_X = exp{ ∫_0^T (f(s), dw_s) − (1/2) ∫_0^T |f(s)|² ds }.

The last equality holds provided that ∫_0^T |f(s)|² ds < ∞.
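The fact that this density has expectation 1 under the Wiener measure (it is an exponential martingale) can be checked by simulation in the simplest case of a constant drift. A sketch; c and T are illustrative choices:

```python
import numpy as np

# For a constant drift f(s) = c, the Girsanov density reduces to
# exp{c w_T - (1/2) c^2 T}, whose expectation under the Wiener
# measure must equal 1.
rng = np.random.default_rng(5)
c, T, n = 0.7, 1.0, 1_000_000
w_T = rng.normal(0.0, np.sqrt(T), n)
density = np.exp(c * w_T - 0.5 * c**2 * T)
mean_density = density.mean()        # should be close to 1
```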
Chapter 2
Small Random Perturbations on a Finite Time Interval
§1. Zeroth Order Approximation

In the space R^r we consider the following system of ordinary differential equations:

Ẋ_t^ε = b(X_t^ε, εζ_t),  X_0^ε = x.  (1.1)
Here ζ_t(ω), t ≥ 0, is a random process on a probability space {Ω, ℱ, P} with values in R^l, and ε is a small numerical parameter. We assume that the trajectories of ζ_t(ω) are right continuous, bounded, and have at most a finite number of points of discontinuity on every interval [0, T], T < ∞. At the points of discontinuity of ζ_t, where, as a rule, equation (1.1) cannot be satisfied, we impose the requirement of continuity of X_t^ε. The vector field b(x, y) = (b¹(x, y), …, bʳ(x, y)), x ∈ R^r, y ∈ R^l, is assumed to be jointly continuous in its variables. Under these conditions the solution of problem (1.1) exists for almost all ω ∈ Ω on a sufficiently small interval [0, T], T = T(ω).
Let $b(x, 0) = b(x)$. We consider the random process $X_t^\varepsilon$ as a result of small perturbations of the system

$$\dot x_t = b(x_t), \qquad x_0 = x. \qquad (1.2)$$
Theorem 1.1. Assume that the vector field $b(x, y)$, $x \in R^r$, $y \in R^l$, is continuous and that equation (1.2) has a unique solution on the interval $[0, T]$. Then, for sufficiently small $\varepsilon$, the solution of equation (1.1) is defined for $t \in [0, T]$ and

$$P\Big\{\lim_{\varepsilon \to 0}\, \max_{0 \le t \le T} |X_t^\varepsilon - x_t| = 0\Big\} = 1.$$

Theorem 1.2. Suppose that the function $b(x, y)$ satisfies a Lipschitz condition: there exists a $K > 0$ such that $|b(x, y) - b(x', y')| \le K(|x - x'| + |y - y'|)$ for all $x, x' \in R^r$, $y, y' \in R^l$. Then for any $t > 0$ and $\delta > 0$ we have

$$\lim_{\varepsilon \to 0} P\Big\{\max_{0 \le s \le t} |X_s^\varepsilon - x_s| > \delta\Big\} = 0, \qquad M|X_t^\varepsilon - x_t|^2 \le \varepsilon^2 a(t),$$

where $a(t)$ is a monotone increasing function, which is expressed in terms of $|x|$ and $K$.
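Theorem 1.2 can be sketched numerically. In the example below (a hypothetical choice of $b$ and $\xi$, not from the text) we take $b(x, y) = -x + y$, which is Lipschitz with $K = 1$, and a bounded piecewise-constant perturbation $\xi_t$; the deviation $\max_t |X_t^\varepsilon - x_t|$ then scales linearly in $\varepsilon$.

```python
import math

# Numerical sketch of Theorem 1.2 for a hypothetical choice of b and xi:
# b(x, y) = -x + y is Lipschitz with K = 1, and xi_t = sign(sin(7t)) is
# bounded with finitely many discontinuities on [0, T].  The deviation
# max_t |X^eps_t - x_t| should shrink linearly in eps.

def max_deviation(eps, x0=1.0, T=5.0, n=100_000):
    dt = T / n
    x_pert, x_unpert, dev = x0, x0, 0.0
    for k in range(n):
        xi = 1.0 if math.sin(7.0 * k * dt) >= 0.0 else -1.0
        x_pert += dt * (-x_pert + eps * xi)   # equation (1.1), b(x,y) = -x + y
        x_unpert += dt * (-x_unpert)          # unperturbed equation (1.2)
        dev = max(dev, abs(x_pert - x_unpert))
    return dev

d1, d2 = max_deviation(0.1), max_deviation(0.05)
print(d1, d2)   # d1 is about twice d2, and both stay below eps
```

Because the difference $X_t^\varepsilon - x_t$ satisfies a linear equation driven by $\varepsilon\xi_t$ here, halving $\varepsilon$ halves the deviation exactly.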
For the proof, we need the following lemma, which we shall use several times in what follows.
Lemma 1.1. Let $m(t)$, $t \in [0, T]$, be a nonnegative function satisfying the relation

$$m(t) \le C + a\int_0^t m(s)\, ds, \qquad t \in [0, T], \qquad (1.4)$$

with $C, a > 0$. Then

$$m(t) \le Ce^{at}$$

for $t \in [0, T]$.
Proof. From inequality (1.4) we obtain

$$m(t)\Big(C + a\int_0^t m(s)\, ds\Big)^{-1} \le 1.$$

Multiplying both sides by $a$ and integrating from $0$ to $t$, we obtain

$$\ln\Big(C + a\int_0^t m(s)\, ds\Big) - \ln C \le at,$$

so that $C + a\int_0^t m(s)\, ds \le Ce^{at}$, and, by (1.4), $m(t) \le Ce^{at}$. □
$$\le 4\delta^2\varepsilon^2 \int_0^t M[a(X_s^\varepsilon)]^2\, ds = \varepsilon^2\delta^2 a^2(t). \qquad (1.8)$$
Estimates (1.6)–(1.8) imply the last assertion of the theorem. □

In some respects we make more stringent assumptions in Theorem 1.2 than in Theorem 1.1: we assumed that the coefficients satisfy a Lipschitz condition instead of mere continuity. However, we obtained a stronger result, in that we not only proved that $X_t^\varepsilon$ converges to $x_t$ but also obtained estimates of the rate of convergence. If we make even more stringent assumptions concerning the smoothness of the coefficients, then the difference $X_t^\varepsilon - x_t$ can be estimated more accurately. We shall return to this question in the next section.

Now we obtain a result on the zeroth approximation for a differential equation with a right side of a sufficiently general form. We consider the differential equation

$$\dot X_t^\varepsilon = b(\varepsilon, t, X_t^\varepsilon, \omega), \qquad X_0^\varepsilon = x,$$

in $R^r$. Here $b(\varepsilon, t, x, \omega) = (b^1(\varepsilon, t, x, \omega), \ldots, b^r(\varepsilon, t, x, \omega))$ is an $r$-dimensional vector defined for $x \in R^r$, $t \ge 0$, $\varepsilon > 0$, and $\omega \in \Omega$. We assume that the field $b(\varepsilon, t, x, \omega)$ is continuous in $t$ and $x$ for almost all $\omega$ for any $\varepsilon > 0$,

$$\sup_{t \ge 0,\ \varepsilon \in (0, 1]} M|b(\varepsilon, t, x, \omega)|^2 < \infty$$

for every $x \in R^r$, and for some $K > 0$ we have

$$|b(\varepsilon, t, x, \omega) - b(\varepsilon, t, y, \omega)| \le K|x - y|$$

almost surely for any $x, y \in R^r$, $t \ge 0$, $\varepsilon > 0$. We note that continuity in $\varepsilon$ for fixed $t$, $x$, $\omega$ is not assumed.
Theorem 1.3. We assume that there exists a continuous function $\bar b(t, x)$, $t \ge 0$, $x \in R^r$, such that for any $\delta > 0$, $T > 0$, $x \in R^r$ we have

$$\lim_{\varepsilon \to 0} P\Big\{\Big|\int_{t_0}^{t_0 + T} b(\varepsilon, t, x, \omega)\, dt - \int_{t_0}^{t_0 + T} \bar b(t, x)\, dt\Big| > \delta\Big\} = 0 \qquad (1.9)$$

uniformly in $t_0 \ge 0$. Then the equation

$$\dot x_t = \bar b(t, x_t), \qquad x_0 = x, \qquad (1.10)$$

has a unique solution and

$$\lim_{\varepsilon \to 0} P\Big\{\max_{0 \le t \le T} |X_t^\varepsilon - x_t| > \delta\Big\} = 0$$

for every $T > 0$ and $\delta > 0$.
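The averaging effect expressed by condition (1.9) can be sketched numerically. In the following hypothetical example (our choice, not from the text), the right side $b(\varepsilon, t, x) = -x + \cos(t/\varepsilon)$ oscillates rapidly and its time averages converge, uniformly in $t_0$, to $\bar b(t, x) = -x$; the solution $X_t^\varepsilon$ then tracks the solution of the averaged equation.

```python
import math

# Numerical sketch of Theorem 1.3 for a hypothetical right side:
# b(eps, t, x) = -x + cos(t/eps) has time averages converging to
# b_bar(t, x) = -x (condition (1.9)), so X^eps should track the solution
# of the averaged equation dx/dt = -x.

def max_deviation(eps, x0=1.0, T=4.0, n=200_000):
    dt = T / n
    x_eps, x_bar, dev = x0, x0, 0.0
    for k in range(n):
        t = k * dt
        x_eps += dt * (-x_eps + math.cos(t / eps))  # perturbed equation
        x_bar += dt * (-x_bar)                      # averaged equation (1.10)
        dev = max(dev, abs(x_eps - x_bar))
    return dev

d1, d2 = max_deviation(0.1), max_deviation(0.01)
print(d1, d2)   # the deviation decreases roughly like eps
```

Here the perturbation is deterministic, so this only illustrates the averaging mechanism; in Theorem 1.3 convergence in (1.9) and in the conclusion is in probability.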
Proof. First we note that the function $\bar b(t, x)$ satisfies a Lipschitz condition in $x$ with the same constant as the function $b(\varepsilon, t, x, \omega)$. Indeed, since the function $\bar b(t, x)$ is continuous, by the mean value theorem we have

$$\int_t^{t+h} \bar b(s, x)\, ds = \bar b(t, x)h + o(h), \qquad h \to 0.$$

Taking account of (1.9), we obtain that

$$|\bar b(t, x) - \bar b(t, y)|\, h = \Big|\int_t^{t+h} \bar b(s, x)\, ds - \int_t^{t+h} \bar b(s, y)\, ds\Big| + o(h)$$

$$\le \Big|\int_t^{t+h} b(\varepsilon, s, x, \omega)\, ds - \int_t^{t+h} b(\varepsilon, s, y, \omega)\, ds\Big| + o(h) + 2\delta \le K|x - y|\, h + o(h) + 2\delta$$

with probability arbitrarily close to one for sufficiently small $\varepsilon$ (here we have used (1.9) twice). Since the left side does not depend on $\varepsilon$ or $\delta$, and $\delta > 0$ is arbitrary, dividing by $h$ and letting $h \to 0$ we conclude that $|\bar b(t, x) - \bar b(t, y)| \le K|x - y|$. In particular, equation (1.10) has a unique solution.
The proof of the theorem then reduces (cf. (1.12)–(1.14)) to the estimation of the sum of the two probabilities

$$P\Big\{\max_{0 \le k \le n}\Big|\int_0^{kT/n} [b(\varepsilon, s, x_s, \omega) - \bar b(s, x_s)]\, ds\Big| > \frac{\delta}{2}\Big\}$$

$$+\ P\Big\{\max_{0 \le k \le n}\int_{kT/n}^{(k+1)T/n} |b(\varepsilon, s, x_s, \omega) - \bar b(s, x_s)|\, ds > \frac{\delta}{2}\Big\}, \qquad (1.15)$$

where the interval $[0, T]$ has been divided into $n$ equal parts.
We estimate the last term by means of Chebyshev's inequality:

$$P\Big\{\max_{0 \le k \le n}\int_{kT/n}^{(k+1)T/n} |b(\varepsilon, s, x_s, \omega) - \bar b(s, x_s)|\, ds > \frac{\delta}{2}\Big\}$$

$$\le n \cdot \max_k P\Big\{\int_{kT/n}^{(k+1)T/n} |b(\varepsilon, s, x_s, \omega) - \bar b(s, x_s)|\, ds > \frac{\delta}{2}\Big\} \qquad (1.16)$$

$$\le \frac{4T^2}{\delta^2 n} \sup_{s \ge 0,\ \varepsilon \in (0, 1)} M|b(\varepsilon, s, x_s, \omega) - \bar b(s, x_s)|^2$$

$$\le \frac{4T^2}{n\delta^2} \sup_{s \ge 0,\ \varepsilon \in (0, 1)} M\big[\,|\bar b(s, x_s)| + |b(\varepsilon, s, 0, \omega)| + K|x_s|\,\big]^2 \le \frac{C_1}{n},$$

where $C_1$ is a constant. Here we have used the fact that

$$\sup_{s \ge 0,\ \varepsilon \in (0, 1)} M|b(\varepsilon, s, 0, \omega)|^2 < \infty.$$
It follows from (1.14)–(1.16) that the right side of (1.12) converges to zero in probability as $\varepsilon \to 0$. This completes the proof of Theorem 1.3. □

The random process $X_t^\varepsilon$ considered in Theorem 1.3 can be viewed as a result of random perturbations of system (1.10). We shall return to the study of similar perturbations in Ch. 7.
§2. Expansion in Powers of a Small Parameter

We return to the study of equations (1.1) and (1.3). In this section we obtain an expansion of $X_t^\varepsilon$ in powers of the small parameter $\varepsilon$ provided that the function $b(x, y)$ is sufficiently smooth. We follow the usual approach of perturbation theory to obtain an expansion

$$X_t^\varepsilon = X_t^{(0)} + \varepsilon X_t^{(1)} + \cdots + \varepsilon^k X_t^{(k)} + \cdots \qquad (2.1)$$

of $X_t^\varepsilon$ in powers of $\varepsilon$. We substitute this expansion with unknown coefficients $X_t^{(0)}, \ldots, X_t^{(k)}, \ldots$ into equation (1.1) and expand the right sides in powers of $\varepsilon$. Equating the coefficients of the same powers on the left and right, we obtain differential equations for the successive calculation of the coefficients $X_t^{(0)}, X_t^{(1)}, \ldots$ in (2.1).

We discuss how the right side of (1.1) is expanded in powers of $\varepsilon$. Let $X(\varepsilon)$ be any power series with coefficients from $R^r$:

$$X(\varepsilon) = c_0 + c_1\varepsilon + \cdots + c_k\varepsilon^k + \cdots.$$
We write

$$\Phi_k = \Phi_k(c_0, \ldots, c_k, y) = \frac{1}{k!}\,\frac{d^k b(X(\varepsilon), \varepsilon y)}{d\varepsilon^k}\Big|_{\varepsilon = 0}.$$

It is easy to see that $\Phi_k$ depends linearly on $c_k$ for $k \ge 1$ and $\Phi_k$ is a polynomial of degree $k$ in the variable $y$. In particular,

$$\Phi_0 = b(c_0, 0), \qquad \Phi_1 = B_1(c_0, 0)c_1 + B_2(c_0, 0)y,$$

where $B_1(x, y) = (\partial b^i(x, y)/\partial x^k)$ is a square matrix of order $r$ and $B_2(x, y) = (\partial b^i(x, y)/\partial y^k)$ is a matrix having $r$ rows and $l$ columns. It is clear from the definition of $\Phi_k$ that the difference $\Phi_k(c_0, \ldots, c_k, y) - B_1(c_0, 0)c_k$ is independent of $c_k$.
Carrying out the above program, we expand both sides of (1.1) in powers of $\varepsilon$:

$$\dot X_t^{(0)} + \varepsilon\dot X_t^{(1)} + \cdots + \varepsilon^k\dot X_t^{(k)} + \cdots = \Phi_0(X_t^{(0)}, \xi_t) + \varepsilon\Phi_1(X_t^{(0)}, X_t^{(1)}, \xi_t) + \cdots + \varepsilon^k\Phi_k(X_t^{(0)}, \ldots, X_t^{(k)}, \xi_t) + \cdots.$$

Hence we obtain the differential equations

$$\dot X_t^{(0)} = \Phi_0(X_t^{(0)}, \xi_t) = b(X_t^{(0)}, 0),$$
$$\dot X_t^{(1)} = \Phi_1(X_t^{(0)}, X_t^{(1)}, \xi_t) = B_1(X_t^{(0)}, 0)X_t^{(1)} + B_2(X_t^{(0)}, 0)\xi_t,$$
$$\cdots$$
$$\dot X_t^{(k)} = \Phi_k(X_t^{(0)}, \ldots, X_t^{(k)}, \xi_t) = B_1(X_t^{(0)}, 0)X_t^{(k)} + \Psi_k(X_t^{(0)}, \ldots, X_t^{(k-1)}, \xi_t). \qquad (2.2)$$

To these differential equations we add the initial conditions $X_0^{(0)} = x$, $X_0^{(1)} = 0, \ldots, X_0^{(k)} = 0, \ldots$. If $b(x, y)$ is sufficiently smooth, then equations (2.2), together with the initial conditions, determine the functions $X_t^{(0)}, X_t^{(1)}, \ldots, X_t^{(k)}, \ldots$ uniquely. The zeroth approximation is determined from the first equation of system (2.2), which coincides with (1.2). If $X_t^{(0)}$ is known, then the second equation in (2.2) is a linear equation in $X_t^{(1)}$. In general, if the functions $X_t^{(0)}, \ldots, X_t^{(k-1)}$ are known, then the equation for $X_t^{(k)}$ will be a nonhomogeneous linear equation with coefficients depending on time.
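The procedure above can be sketched numerically. In the following hypothetical scalar example (our choice, not from the text) we take $b(x, y) = -x^2 + y$, so that $b(x, 0) = -x^2$, $B_1(x, 0) = -2x$, $B_2(x, 0) = 1$, and compare the full solution with the first-order approximation $X_t^{(0)} + \varepsilon X_t^{(1)}$; the remainder should be of order $\varepsilon^2$.

```python
import math

# Numerical sketch of the expansion (2.1)-(2.2) for the hypothetical scalar
# example b(x, y) = -x**2 + y, for which system (2.2) reads
#   dX0/dt = -(X0)^2,            X0(0) = x,
#   dX1/dt = -2*X0*X1 + xi(t),   X1(0) = 0.
# The remainder X^eps - (X0 + eps*X1) should be of order eps^2.

def remainder(eps, x0=1.0, T=2.0, n=200_000):
    dt = T / n
    xe, X0, X1, err = x0, x0, 0.0, 0.0
    for k in range(n):
        xi = math.sin(3.0 * k * dt)                     # bounded perturbation
        xe, X0, X1 = (xe + dt * (-xe**2 + eps * xi),    # full equation (1.1)
                      X0 + dt * (-X0**2),               # zeroth approximation
                      X1 + dt * (-2.0 * X0 * X1 + xi))  # first-order equation
        err = max(err, abs(xe - (X0 + eps * X1)))
    return err

r1, r2 = remainder(0.2), remainder(0.1)
print(r1, r2, r1 / r2)   # halving eps divides the remainder by about 4
```

The ratio close to $4$ reflects the quadratic order of the remainder, up to higher-order corrections in $\varepsilon$ and the discretization error of the Euler scheme.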
Theorem 2.1. Suppose the trajectories of the process $\xi_t(\omega)$ are continuous with probability 1 and that the function $b(x, y)$, $x \in R^r$, $y \in R^l$, has $k + 1$ bounded continuous partial derivatives with respect to $x$ and $y$. Then

$$X_t^\varepsilon = X_t^{(0)} + \varepsilon X_t^{(1)} + \cdots + \varepsilon^k X_t^{(k)} + R_{k+1}^\varepsilon(t),$$

where the functions $X_t^{(i)}$, $i = 0, 1, \ldots, k$, are determined from system (2.2) and

$$\sup_{0 \le t \le T} |R_{k+1}^\varepsilon(t)| \le C\varepsilon^{k+1},$$

where $C = C(\omega) < \infty$ with probability 1.
First we obtain that $\tau^\varepsilon \to t_0$ as $\varepsilon \to 0$. From this we obtain

$$X_{\tau^\varepsilon}^\varepsilon = X_{\tau^\varepsilon}^{(0)} + \varepsilon X_{\tau^\varepsilon}^{(1)} + o(\varepsilon) = X_{t_0}^{(0)} + (\tau^\varepsilon - t_0)\dot X_{t_0}^{(0)} + o(\tau^\varepsilon - t_0) + \varepsilon X_{t_0}^{(1)} + o(\varepsilon). \qquad (2.16)$$

Taking the scalar product of (2.16) and $n$, we obtain

$$(X_{\tau^\varepsilon}^\varepsilon - X_{t_0}^{(0)}, n) = (\tau^\varepsilon - t_0)(\dot X_{t_0}^{(0)}, n) + o(\tau^\varepsilon - t_0) + \varepsilon(X_{t_0}^{(1)}, n) + o(\varepsilon). \qquad (2.17)$$

On the other hand, because of the smoothness of $\partial D$ at the point $X_{t_0}^{(0)}$, the scalar product on the left side of (2.17) is infinitesimal compared with $X_{\tau^\varepsilon}^\varepsilon - X_{t_0}^{(0)}$. It follows from this and from (2.16) that

$$(X_{\tau^\varepsilon}^\varepsilon - X_{t_0}^{(0)}, n) = o(\tau^\varepsilon - t_0) + o(\varepsilon). \qquad (2.18)$$

From (2.17) and (2.18) we obtain the expansion (2.14) for $\tau^\varepsilon$. Substituting this expansion in (2.16) again, we obtain (2.15). □
The coefficient of $\varepsilon$ in the expansion (2.15) can be obtained by projecting $X_{t_0}^{(1)}$ parallel to $\dot X_{t_0}^{(0)}$ onto the tangent hyperplane at $X_{t_0}^{(0)}$. If the expansion (2.8) holds with $k = 2$, the function $X_t^{(0)}$ is twice differentiable, and the random function $X_t^{(1)}$ is once differentiable, then we can obtain an expansion of $\tau^\varepsilon$ and $X_{\tau^\varepsilon}^\varepsilon$ to within $o(\varepsilon^2)$ (although the corresponding
functional is twice differentiable only on some subspace). On the other hand, if $X_t^{(1)}$ is not differentiable (this happens in the case of diffusion processes with small diffusion, considered in Theorem 2.2), then we do not obtain an expansion of $\tau^\varepsilon$ to within $o(\varepsilon^2)$. We explain why this is so. The fact is that in the proof of Theorem 2.3 we did not use the circumstance that $\tau^\varepsilon$ is exactly the first time of reaching the boundary but only that it is a time when $X_t$ is on the boundary, converging to $t_0$. If we consider a process $X_t$ of a simple form, $X_t = x_0 + t + \varepsilon w_t$, then the first time $\tau^\varepsilon$ of reaching a point $x_1 > x_0$ and the last time $\sigma^\varepsilon$ of being at $x_1$ differ by a quantity of order $\varepsilon^2$. Indeed, by virtue of the strong Markov property with respect to the Markov time $\tau^\varepsilon$, we obtain that the distribution of $\sigma^\varepsilon - \tau^\varepsilon$ is the same as that of the random variable $\zeta^\varepsilon = \max\{t: t + \varepsilon w_t = 0\}$. Then we use the fact that $\varepsilon^{-2}(t\varepsilon^2 + \varepsilon w_{t\varepsilon^2}) = t + \varepsilon^{-1}w_{t\varepsilon^2} = t + \tilde w_t$, where $\tilde w_t$ is again a Wiener process issued from zero, and $\zeta^\varepsilon = \varepsilon^2\zeta$, where $\zeta = \max\{t: t + \tilde w_t = 0\}$.
§3. Elliptic and Parabolic Differential Equations with a Small Parameter at the Derivatives of Highest Order

In the theory of differential equations of elliptic or parabolic type, much attention is devoted to the study of the behavior, as $\varepsilon \to 0$, of solutions of boundary value problems for equations of the form $L^\varepsilon u^\varepsilon + c(x)u^\varepsilon = f(x)$ or $\partial v^\varepsilon/\partial t = L^\varepsilon v^\varepsilon + c(x)v^\varepsilon + g(x)$, where $L^\varepsilon$ is an elliptic differential operator with a small parameter at the derivatives of highest order:

$$L^\varepsilon = \frac{\varepsilon^2}{2}\sum_{i, j = 1}^r a^{ij}(x)\frac{\partial^2}{\partial x^i \partial x^j} + \sum_{i = 1}^r b^i(x)\frac{\partial}{\partial x^i}.$$

As was said in Chapter 1, with every such operator $L^\varepsilon$ (whose coefficients are assumed to be sufficiently regular) there is associated a diffusion process $X_t^\varepsilon$. This diffusion process can be given by means of the stochastic equation

$$\dot X_t^{\varepsilon, x} = b(X_t^{\varepsilon, x}) + \varepsilon\sigma(X_t^{\varepsilon, x})\dot w_t, \qquad X_0^{\varepsilon, x} = x, \qquad (3.1)$$

where $\sigma(x)\sigma^*(x) = (a^{ij}(x))$, $b(x) = (b^1(x), \ldots, b^r(x))$. For this process we shall sometimes use the notation $X_t^{\varepsilon, x}$ or $X_t^\varepsilon(x)$ (in the framework of the notion of a Markov family) and sometimes $X_t^\varepsilon$, in which case we shall write the index $x$ in the probability and consider the Markov process $(X_t^\varepsilon, P_x)$. In the preceding two sections of this chapter we obtained several results concerning the behavior of the solutions $X_t^{\varepsilon, x}(\omega)$ of equations (3.1) as $\varepsilon \to 0$. Since the solutions of the boundary value problems for $L^\varepsilon$ can be written as mean values of certain functionals of the trajectories of the family $(X_t^{\varepsilon, x}, P)$, results concerning the behavior of solutions of boundary value problems as $\varepsilon \to 0$ can be obtained from the behavior of $X_t^{\varepsilon, x}$ as $\varepsilon \to 0$. The present section is devoted to these questions.
We consider the Cauchy problem

$$\frac{\partial v^\varepsilon(t, x)}{\partial t} = L^\varepsilon v^\varepsilon(t, x) + c(x)v^\varepsilon(t, x) + g(x), \quad t > 0,\ x \in R^r; \qquad v^\varepsilon(0, x) = f(x) \qquad (3.2)$$

for $\varepsilon > 0$ and together with it the problem for the first-order operator which is obtained for $\varepsilon = 0$:

$$\frac{\partial v^0(t, x)}{\partial t} = L^0 v^0 + c(x)v^0 + g(x), \quad t > 0,\ x \in R^r; \qquad v^0(0, x) = f(x). \qquad (3.3)$$

We assume that the following conditions are satisfied:

(1) the functions $c(x)$ and $g(x)$ are uniformly continuous and bounded for $x \in R^r$;

(2) the coefficients of $L^\varepsilon$ satisfy a Lipschitz condition;

(3) $k_1|\lambda|^2 \le \sum_{i, j = 1}^r a^{ij}(x)\lambda_i\lambda_j \le k_2|\lambda|^2$ for any real $\lambda_1, \lambda_2, \ldots, \lambda_r$ and $x \in R^r$, where $k_1, k_2$ are positive constants.

Under these conditions, the solutions of problems (3.2) and (3.3) exist and are unique.

All results of this section remain valid in the case where the form $\sum a^{ij}(x)\lambda_i\lambda_j$ is only nonnegative definite. However, in the case of degeneracies the formulation of boundary value problems has to be adjusted and the notion of a generalized solution has to be introduced. We shall make the adjustments necessary in the case of degeneracies after an analysis of the nondegenerate case.

Theorem 3.1. If conditions (1)–(3) are satisfied, then the limit $\lim_{\varepsilon \to 0} v^\varepsilon(t, x) = v^0(t, x)$ exists for every bounded continuous initial function $f(x)$, $x \in R^r$. The function $v^0(t, x)$ is a solution of problem (3.3).
For the proof we note first of all that if condition (3) is satisfied, then there exists a matrix $\sigma(x)$ with entries satisfying a Lipschitz condition for which $\sigma(x)\sigma^*(x) = (a^{ij}(x))$ (cf. §5, Ch. 1). The solution of equation (3.2) can be represented in the following way:

$$v^\varepsilon(t, x) = Mf(X_t^{\varepsilon, x})\exp\Big\{\int_0^t c(X_s^{\varepsilon, x})\, ds\Big\} + M\int_0^t g(X_s^{\varepsilon, x})\exp\Big\{\int_0^s c(X_u^{\varepsilon, x})\, du\Big\}\, ds, \qquad (3.4)$$

where $X_t^{\varepsilon, x}$ is the Markov family constructed by means of equation (3.1). It follows from Theorem 1.2 that the processes $X_s^{\varepsilon, x}(\omega)$ converge to $X_s^{0, x}$ (the solution of equation (1.2) with initial condition $X_0^{0, x} = x$) in probability uniformly on the interval $[0, t]$ as $\varepsilon \to 0$. Taking into account that there is a bounded continuous functional of $X^{\varepsilon, x}(\omega)$ under the sign of mathematical expectation in (3.4), by the Lebesgue dominated convergence theorem we conclude that

$$\lim_{\varepsilon \to 0} v^\varepsilon(t, x) = f(X_t^{0, x})\exp\Big\{\int_0^t c(X_s^{0, x})\, ds\Big\} + \int_0^t g(X_s^{0, x})\exp\Big\{\int_0^s c(X_u^{0, x})\, du\Big\}\, ds.$$
An easy substitution shows that the function on the right side of the equality is a solution of problem (3.3). Theorem 3.1 is proved.
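The representation (3.4) and the passage to the limit can be sketched by Monte Carlo simulation. The example below is a hypothetical one-dimensional instance of our own choosing (with $g \equiv 0$): $b(x) = -x$, $\sigma \equiv 1$, $c(x) = -x^2$, $f(x) = \cos x$; as $\varepsilon$ decreases, the Monte Carlo value of $v^\varepsilon(t, x)$ approaches the deterministic limit along the flow $\dot x_t = -x_t$.

```python
import numpy as np

# Monte Carlo sketch of (3.4) with g = 0 for a hypothetical 1-d example:
# b(x) = -x, sigma = 1, c(x) = -x**2, f(x) = cos(x).  As eps -> 0,
# v^eps(t, x) should approach f(x_t) * exp(int_0^t c(x_s) ds), where
# x_t = x0 * exp(-t) solves the unperturbed equation.

rng = np.random.default_rng(1)
t_end, n, n_paths, x0 = 1.0, 400, 20_000, 1.0
dt = t_end / n

def v_eps(eps):
    x = np.full(n_paths, x0)
    integral_c = np.zeros(n_paths)
    for _ in range(n):
        integral_c += -x**2 * dt                                 # int c(X_s) ds
        x += -x * dt + eps * rng.normal(0.0, np.sqrt(dt), n_paths)  # eq. (3.1)
    return np.mean(np.cos(x) * np.exp(integral_c))

# deterministic limit along x_t = x0 * exp(-t)
xs = x0 * np.exp(-dt * np.arange(n))
v0 = np.cos(x0 * np.exp(-t_end)) * np.exp(np.sum(-xs**2 * dt))

print(abs(v_eps(0.5) - v0), abs(v_eps(0.1) - v0))  # error shrinks with eps
```

Since $c \le 0$ and $|f| \le 1$ here, the functional under the expectation is bounded, which is exactly the situation in which dominated convergence applies in the proof.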
If we assume that the coefficients of $L^\varepsilon$ have bounded derivatives up to order $k + 1$ inclusive, then the matrix $\sigma(x)$ can be chosen so that its entries also have $k + 1$ bounded derivatives. In this case, by virtue of Theorem 2.2, we can write down an expansion of $X_t^{\varepsilon, x}$ in powers of $\varepsilon$ up to order $k$. If the functions $f(x)$, $c(x)$, and $g(x)$ have $k + 1$ bounded derivatives, then, as follows from (2.7), we have an expansion of $v^\varepsilon$ in powers of $\varepsilon$ up to order $k$ with remainder of order $\varepsilon^{k+1}$.

Hence, for example, if $g(x) \equiv c(x) \equiv 0$ and $r = 1$, then the solution of problem (3.2) can be written in the form

$$v^\varepsilon(t, x) = M_x f(X_t^\varepsilon) = M_x f\big(X_t^{(0)} + \varepsilon X_t^{(1)} + \cdots + \varepsilon^k X_t^{(k)} + R_{k+1}^\varepsilon(t)\big) = \sum_{i = 0}^k \varepsilon^i M_x G_i + O(\varepsilon^{k+1}), \qquad (3.5)$$

where $X_t^{(0)}, X_t^{(1)}, \ldots, X_t^{(k)}$ are the coefficients, mentioned in Theorem 2.2, of the expansion of $X_t^\varepsilon$ in powers of the small parameter, and $G_i = G_i(X_t^{(0)}, \ldots, X_t^{(i)})$.

For every $t > 0$ and $x \in D$ we have $\tau^\varepsilon(x) \to \tau(x)$ and

$$\sup_{0 \le s \le t} |X_s^\varepsilon(x) - x_s(x)| \to 0$$
in probability as $\varepsilon \to 0$, and we note that the mathematical expectation of the square of the random variable under the sign of mathematical expectation in (3.16) is bounded uniformly in $\varepsilon \le \varepsilon_0$ provided that $\varepsilon_0$ is sufficiently small.
Remark 1. If $\tau(x) < \infty$ but the trajectory $x_t(x)$ does not leave $D$ in a regular manner, then, as simple examples show, the limit function may have discontinuities on this trajectory.

Remark 2. It is easy to verify that the limit function $u^0(x)$ in Theorem 3.3 satisfies the following first-order equation obtained for $\varepsilon = 0$:

$$L^0 u^0(x) + c(x)u^0(x) = \sum_{i = 1}^r b^i(x)\frac{\partial u^0}{\partial x^i} + c(x)u^0(x) = g(x).$$

The function $u^0(x)$ is chosen from the solutions of this equation by the condition that it coincides with $\psi(x)$ at those points of the boundary of $D$ through which the trajectories $x_t(x)$ leave $D$.

Remark 3. Now let us allow the matrix $(a^{ij}(x))$ to have degeneracies. In this case problem (3.15) must be modified. First, we cannot prescribe boundary conditions at all points of the boundary; this is easily seen from the example of first-order equations: boundary conditions will not be assumed at some points of the boundary. Second, a classical solution may not exist even in the case of infinitely differentiable coefficients, and it is necessary to introduce the notion of a generalized solution. Third and finally, a generalized solution may not be unique without additional assumptions. To construct a theory of such equations with a nonnegative characteristic form, we can use methods of probability theory. The first results in this area were actually obtained in this way (Freidlin [1], [4], [6]). Some of these results were subsequently obtained by traditional methods of the theory of differential equations. If the entries of $(a^{ij}(x))$ have bounded second derivatives, then there exists a factorization $(a^{ij}(x)) = \sigma(x)\sigma^*(x)$, where the entries of $\sigma(x)$ satisfy a Lipschitz condition. In this case the process $(X_t^\varepsilon, P_x^\varepsilon)$ corresponding to the operator $L^\varepsilon$ is constructed by means of equations (3.1). In Freidlin's publications [1] and [4], this process is used to make precise the formulation of boundary value problems for $L^\varepsilon$, to introduce the notion of a generalized solution, to prove existence and uniqueness theorems, and to study the smoothness of a generalized solution. In particular, if the functions $a^{ij}(x)$ have bounded second derivatives and satisfy the hypotheses of Theorem 3.2 or Theorem 3.3, respectively (with the exception of nondegeneracy), then for every sufficiently small $\varepsilon$, the generalized solution exists, is unique, and satisfies equality (3.16). In this case the
assertion of Theorem 3.2 (Theorem 3.3) also holds if by $u^\varepsilon(x)$ we understand the generalized solution. After similar adjustments, Theorem 3.1 also remains valid.

Theorems 3.2 and 3.3 used results concerning the limit behavior of $X_t^\varepsilon$ which are of the type of the law of large numbers. From finer results (expansions in powers of $\varepsilon$) we can obtain finer consequences concerning the asymptotics of the solution of Dirichlet's problem. Concerning the expansion of the solution in powers of a small parameter (in the case of smooth boundary conditions), the best results are obtained not by methods of pure probability theory but rather by purely analytical or combined (cf. Holland [1]) methods. We consider an example with nonsmooth boundary conditions.

Let the characteristic $x_t(x)$, $t \ge 0$, issued from an interior point $x$ of a domain $D$ with a smooth boundary leave the domain, intersecting its boundary at the value $t_0$ of the parameter; at the point $y = x_{t_0}(x)$ the vector $b(y)$ is directed strictly outside the domain. Let $u^\varepsilon$ be a solution of the Dirichlet problem $L^\varepsilon u^\varepsilon = 0$, $u^\varepsilon \to 1$ as we approach some subdomain $\Gamma_1$ of the boundary, and $u^\varepsilon \to 0$ as we approach the interior points of $\partial D \setminus \Gamma_1$ ($u^\varepsilon$ is assumed to be bounded everywhere). Suppose that the surface area of the boundary of $\Gamma_1$ is equal to zero. Then the solution $u^\varepsilon(x)$ is unique and can be represented in the form $u^\varepsilon(x) = M_x\chi_{\Gamma_1}(X_{\tau^\varepsilon}^\varepsilon)$.

If $y$ is an interior point of $\Gamma_1$ or $\partial D \setminus \Gamma_1$, then the value of $u^\varepsilon$ at the point $x$ converges to 1 or 0, respectively, as $\varepsilon \to 0$ (results concerning the rate of convergence must rely on results of the type of large deviations; cf. Ch. 6, Theorems 2.1 and 2.2). On the other hand, if $y$ belongs to the boundary of the domain $\Gamma_1$, then the expansion (2.15) reduces the problem of the asymptotics of $u^\varepsilon(x)$ to the problem of the asymptotics of the probability that the Gaussian random vector $X_{t_0}^{(1)} - \dot X_{t_0}^{(0)}\big[(X_{t_0}^{(1)}, n)/(\dot X_{t_0}^{(0)}, n)\big]$ hits the $\varepsilon^{-1}$ times magnified projection of $\Gamma_1$ onto the tangent plane (tangent line in the two-dimensional case). In particular, in the two-dimensional case, if $\Gamma_1$ is a segment of an arc with $y$ as one endpoint, then $\lim_{\varepsilon \to 0} u^\varepsilon(x) = \frac{1}{2}$. The same is true in the higher-dimensional case provided that the boundary of $\Gamma_1$ is smooth at $y$. If this boundary has a "corner" at $y$, then the problem reduces to the problem of the probability that a normal random vector with mean zero falls into an angle (solid angle, cone) with vertex at zero. Using an affine transformation, one can calculate the angle (solid angle).
Chapter 3
Action Functional
§1. Laplace's Method in a Function Space

We consider a random process $X_t^\varepsilon = X_t^\varepsilon(x)$ in the space $R^r$ defined by the stochastic differential equation

$$\dot X_t^\varepsilon = b(X_t^\varepsilon) + \varepsilon\dot w_t, \qquad X_0^\varepsilon = x. \qquad (1.1)$$

Here, as usual, $w_t$ is a Wiener process in $R^r$ and the field $b(x)$ is assumed to be sufficiently smooth. As was shown in §1, Ch. 2, as $\varepsilon \to 0$ the trajectories of $X_t^\varepsilon$ converge in probability to the solution of the unperturbed equation

$$\dot x_t = b(x_t), \qquad x_0 = x, \qquad (1.2)$$

uniformly on every finite time interval. In the special case which we are considering, it is easy to give an estimate of the probability

$$P\Big\{\sup_{0 \le t \le T} |X_t^\varepsilon(x) - x_t(x)| > \delta\Big\}$$

which is sharper than that given in Ch. 2. Indeed, it follows from equations (1.1) and (1.2) that

$$X_t^\varepsilon(x) - x_t(x) = \int_0^t [b(X_s^\varepsilon(x)) - b(x_s(x))]\, ds + \varepsilon w_t. \qquad (1.3)$$

Assuming that $b(x)$ satisfies a Lipschitz condition with constant $K$, we obtain from (1.3) that

$$\sup_{0 \le t \le T} |X_t^\varepsilon - x_t| \le \varepsilon e^{KT}\sup_{0 \le t \le T} |w_t|. \qquad (1.4)$$
This implies that the probability of the deviation of $X_t^\varepsilon(x)$ from the trajectory of the dynamical system decreases exponentially with decreasing $\varepsilon$:

$$P\Big\{\sup_{0 \le t \le T} |X_t^\varepsilon(x) - x_t(x)| > \delta\Big\} \le P\Big\{\sup_{0 \le t \le T} |w_t| > \frac{\delta}{\varepsilon}e^{-KT}\Big\}$$

$$\le 2rP\Big\{|w_T^1| > \frac{\delta}{\varepsilon\sqrt r}e^{-KT}\Big\} = O\Big(\exp\Big\{-\frac{\delta^2 e^{-2KT}}{2rT\varepsilon^2}\Big\}\Big) = O(e^{-C\varepsilon^{-2}}),$$

where $C$ is a positive constant (depending on $\delta$, $K$, and $T$). This estimate means that if a subset $A$ of the space of continuous functions on the interval from $0$ to $T$ contains a function $x_t(x)$ together with its $\delta$-neighborhood in this space, then the main contribution to the probability $P\{X^\varepsilon(x) \in A\}$ is given by this $\delta$-neighborhood; the probability of the remaining part of $A$ is exponentially small.
In many problems we are interested in probabilities $P\{X^\varepsilon(x) \in A\}$ for sets $A$ not containing the function $x_t(x)$ together with a neighborhood of it. Such problems arise, for example, in connection with the study of stability under random perturbations, when we are mainly interested in the probability of exit from a neighborhood of a stable equilibrium position or of a stable limit cycle in a given time, or in the mean exit time from such a neighborhood. As we shall see, similar problems arise in the study of the limit behavior of an invariant measure of a diffusion process $X_t^\varepsilon$ as $\varepsilon \to 0$, in connection with the study of elliptic differential equations with a small parameter at the derivatives of highest order, and in other problems. If the function $x_t(x)$ together with some neighborhood of it is not contained in $A$, then $P\{X^\varepsilon(x) \in A\} \to 0$ as $\varepsilon \to 0$. It turns out that in this case, under certain assumptions on $A$, there exists a function $\varphi \in A$ such that the principal part of the probability measure of $A$ is concentrated near $\varphi$; more precisely, for any neighborhood $U(\varphi)$ of $\varphi$ we have $P\{X^\varepsilon(x) \in A \setminus U(\varphi)\} = o(P\{X^\varepsilon(x) \in U(\varphi)\})$ as $\varepsilon \to 0$.
A similar situation arises in applying Laplace's method to calculate the asymptotics, as $\varepsilon \to 0$, of integrals of the form $\int_a^b e^{-\varepsilon^{-1}f(x)}g(x)\, dx$. If $x_0$ is the only minimum point of the continuous function $f(x)$ on the interval $[a, b]$ and the function $g(x)$ is continuous and positive, then the major contribution to this integral is given by the neighborhood of $x_0$. Indeed, let $U_1$ be a neighborhood of $x_0$. Since $x_0$ is the only minimum point of $f$ in $[a, b]$, we have
$$\min_{x \in [a, b] \setminus U_1} f(x) \ge f(x_0) + \gamma,$$

where $\gamma$ is a positive number. Using this estimate, we obtain

$$\int_{[a, b] \setminus U_1} g(x)\exp\{-\varepsilon^{-1}f(x)\}\, dx \le (b - a)\max_{x \in [a, b]} g(x)\cdot\exp\{-\varepsilon^{-1}(f(x_0) + \gamma)\}. \qquad (1.5)$$

For the integral over the neighborhood of $x_0$, we obtain the following lower estimate:

$$\int_{U_1} g(x)\exp\{-\varepsilon^{-1}f(x)\}\, dx \ge \int_{x_0 - \delta}^{x_0 + \delta} g(x)\exp\{-\varepsilon^{-1}(f(x_0) + \gamma/2)\}\, dx$$

$$\ge 2\delta\min_{x \in [a, b]} g(x)\cdot\exp\{-\varepsilon^{-1}(f(x_0) + \gamma/2)\}, \qquad (1.6)$$

where $\delta$ is chosen from the conditions $\max_{|x - x_0| \le \delta} f(x) \le f(x_0) + \gamma/2$ and $[x_0 - \delta, x_0 + \delta] \subseteq U_1$. Comparing (1.5) and (1.6), we see that the integral over $[a, b] \setminus U_1$ is exponentially small compared with the integral over $U_1$.
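Estimates (1.5) and (1.6) are easy to see numerically. The following sketch uses a hypothetical example of our own choosing, $f(x) = (x - 0.3)^2$ on $[a, b] = [0, 1]$ with unique minimum $x_0 = 0.3$ and $g \equiv 1$, and compares the contribution of a neighborhood $U_1$ of $x_0$ with that of its complement.

```python
import numpy as np

# Numerical sketch of Laplace's method: f(x) = (x - 0.3)**2 on [0, 1],
# unique minimum x0 = 0.3, g = 1.  The part of int exp(-f(x)/eps) dx coming
# from outside a neighborhood U1 of x0 is exponentially small compared with
# the part coming from U1, and the ratio decays as eps decreases.

def inside_outside(eps, x0=0.3, halfwidth=0.1, n=200_001):
    x = np.linspace(0.0, 1.0, n)
    vals = np.exp(-((x - x0) ** 2) / eps)    # e^{-f(x)/eps}, g = 1
    mask = np.abs(x - x0) <= halfwidth       # the neighborhood U1
    dx = 1.0 / (n - 1)
    return vals[mask].sum() * dx, vals[~mask].sum() * dx

near1, far1 = inside_outside(0.01)
near2, far2 = inside_outside(0.001)
print(far1 / near1, far2 / near2)   # the ratio drops sharply as eps decreases
```

Here $\gamma$ from (1.5) equals $(0.1)^2 = 0.01$, so the ratio is of order $e^{-\gamma/(2\varepsilon)}$, in line with the two estimates.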
§2. Exponential Estimates

In this section we consider the family of the processes $X_t^\varepsilon = \varepsilon w_t$ in the space $C_{0T}$ of continuous functions on $[0, T]$ with the metric $\rho_{0T}(\varphi, \psi) = \max_{0 \le t \le T}|\varphi_t - \psi_t|$. For absolutely continuous functions $\varphi_t$, $0 \le t \le T$, with $\varphi_0 = 0$ we put

$$S_{0T}(\varphi) = \frac{1}{2}\int_0^T |\dot\varphi_s|^2\, ds,$$

and $S_{0T}(\varphi) = +\infty$ for all other $\varphi \in C_{0T}$; we write $\Phi(s) = \{\varphi: S_{0T}(\varphi) \le s\}$.

Theorem 2.1. For any $\delta > 0$ and $\gamma > 0$ there exists an $\varepsilon_0 > 0$ such that

$$P\{\rho_{0T}(X^\varepsilon, \varphi) < \delta\} \ge \exp\{-\varepsilon^{-2}(S_{0T}(\varphi) + \gamma)\}$$

for $0 < \varepsilon \le \varepsilon_0$.

Theorem 2.2. For any $\delta > 0$, $\gamma > 0$, and $s_0 > 0$ there exists an $\varepsilon_0 > 0$ such that for $0 < \varepsilon \le \varepsilon_0$ and $s \le s_0$ we have

$$P\{\rho_{0T}(X^\varepsilon, \Phi(s)) \ge \delta\} \le \exp\{-\varepsilon^{-2}(s - \gamma)\}.$$

Proof of Theorem 2.1. If $S_{0T}(\varphi) \le K < \infty$, then $\varphi$ is absolutely continuous and $\int_0^T |\dot\varphi_s|^2\, ds < \infty$. We consider the random process $Y_t^\varepsilon = X_t^\varepsilon - \varphi_t$, obtained from $X_t^\varepsilon = \varepsilon w_t$ by the shift by $\varphi$. A shift by a function having a square integrable derivative induces an absolutely continuous change of
measures in $C_{0T}$. If $\mu_{\varepsilon w}$ is the measure in $C_{0T}$ corresponding to the process $X_t^\varepsilon = \varepsilon w_t$ and $\mu_{Y^\varepsilon}$ is the measure corresponding to $Y_t^\varepsilon$, then the density of the second measure with respect to the first one has the form

$$\frac{d\mu_{Y^\varepsilon}}{d\mu_{\varepsilon w}}(\varepsilon w) = \exp\Big\{-\frac{1}{\varepsilon}\int_0^T (\dot\varphi_s,\, dw_s) - \frac{1}{2\varepsilon^2}\int_0^T |\dot\varphi_s|^2\, ds\Big\}.$$

Using this density, we can write

$$P\{\rho_{0T}(X^\varepsilon, \varphi) < \delta\} = P\{\rho_{0T}(Y^\varepsilon, 0) < \delta\} = \int_{\{\rho_{0T}(\varepsilon w, 0) < \delta\}} \frac{d\mu_{Y^\varepsilon}}{d\mu_{\varepsilon w}}(\varepsilon w)\, P(dw)$$

$$= \exp\{-\varepsilon^{-2}S_{0T}(\varphi)\}\int_{\{\rho_{0T}(\varepsilon w, 0) < \delta\}} \exp\Big\{-\frac{1}{\varepsilon}\int_0^T (\dot\varphi_s,\, dw_s)\Big\}\, P(dw). \qquad (2.2)$$

By Chebyshev's inequality,

$$P\Big\{\Big|\int_0^T (\dot\varphi_s,\, dw_s)\Big| > 2\sqrt 2\sqrt{S_{0T}(\varphi)}\Big\} \le \frac{M\big(\int_0^T (\dot\varphi_s,\, dw_s)\big)^2}{8S_{0T}(\varphi)} = \frac{2S_{0T}(\varphi)}{8S_{0T}(\varphi)} = \frac{1}{4},$$

and therefore

$$P\Big\{\exp\Big\{-\frac{1}{\varepsilon}\int_0^T (\dot\varphi_s,\, dw_s)\Big\} \ge \exp\{-2\sqrt 2\,\varepsilon^{-1}\sqrt{S_{0T}(\varphi)}\}\Big\} \ge \frac{3}{4}. \qquad (2.3)$$

Since $P\{\rho_{0T}(\varepsilon w, 0) < \delta\} \ge \frac{1}{2}$ for all sufficiently small $\varepsilon$, from estimates (2.2) and (2.3) we conclude that

$$\int_{\{\rho_{0T}(\varepsilon w, 0) < \delta\}} \exp\Big\{-\frac{1}{\varepsilon}\int_0^T (\dot\varphi_s,\, dw_s)\Big\}\, P(dw) \ge \frac{1}{4}\exp\{-2\sqrt 2\,\varepsilon^{-1}\sqrt{S_{0T}(\varphi)}\},$$

and therefore,

$$P\{\rho_{0T}(X^\varepsilon, \varphi) < \delta\} \ge \frac{1}{4}\exp\{-\varepsilon^{-2}S_{0T}(\varphi) - 2\sqrt 2\,\varepsilon^{-1}\sqrt{S_{0T}(\varphi)}\}.$$
This implies the assertion of Theorem 2.1. □

Proof of Theorem 2.2. We have to estimate the probability that the trajectories of our process move far from the set of small values of the functional $S_{0T}(\varphi)$. On the trajectories of $X_t^\varepsilon = \varepsilon w_t$ themselves, the action functional is equal to $+\infty$, and we approximate the functions of $X_t^\varepsilon$ by smoother functions. We denote by $\ell_t^\varepsilon$, $0 \le t \le T$, the random polygon with vertices at the points $(0, 0), (\Delta, X_\Delta^\varepsilon), (2\Delta, X_{2\Delta}^\varepsilon), \ldots, (T, X_T^\varepsilon)$. We shall choose $\Delta$ later; for now we only say that $T/\Delta$ is an integer. The event $\{\rho_{0T}(X^\varepsilon, \Phi(s)) \ge \delta\}$ may occur in two ways: either with $\rho_{0T}(X^\varepsilon, \ell^\varepsilon) < \delta$ or with $\rho_{0T}(X^\varepsilon, \ell^\varepsilon) \ge \delta$. In the first case we certainly have $\ell^\varepsilon \notin \Phi(s)$, i.e., $S_{0T}(\ell^\varepsilon) > s$. From this we obtain

$$P\{\rho_{0T}(X^\varepsilon, \Phi(s)) \ge \delta\} \le P\{S_{0T}(\ell^\varepsilon) > s\} + P\{\rho_{0T}(X^\varepsilon, \ell^\varepsilon) \ge \delta\}. \qquad (2.4)$$
To estimate the first probability, we transform $S_{0T}(\ell^\varepsilon)$:

$$S_{0T}(\ell^\varepsilon) = \frac{1}{2}\int_0^T |\dot\ell_t^\varepsilon|^2\, dt = \frac{\varepsilon^2}{2}\sum_{k=1}^{T/\Delta} \Delta^{-1}|w_{k\Delta} - w_{(k-1)\Delta}|^2.$$

As a consequence of the self-similarity and independence of the increments of a Wiener process, the sum $\sum_{k=1}^{T/\Delta} \Delta^{-1}|w_{k\Delta} - w_{(k-1)\Delta}|^2$ is distributed in the same way as $\sum_{i=1}^{rT/\Delta} \xi_i^2$, where the $\xi_i$ are independent random variables having normal distribution with parameters $(0, 1)$:

$$P\{S_{0T}(\ell^\varepsilon) > s\} = P\Big\{\sum_{i=1}^{rT/\Delta} \xi_i^2 > 2\varepsilon^{-2}s\Big\}.$$

We estimate the right side by means of Chebyshev's exponential inequality: since $M\exp\{\lambda\xi_i^2\} = (1 - 2\lambda)^{-1/2} < \infty$ for $0 < \lambda < \frac{1}{2}$, we have

$$P\Big\{\sum_{i=1}^{rT/\Delta} \xi_i^2 > 2\varepsilon^{-2}s\Big\} \le (1 - 2\lambda)^{-rT/2\Delta}\exp\{-2\lambda\varepsilon^{-2}s\},$$

and choosing $\lambda$ sufficiently close to $\frac{1}{2}$ we obtain, for all $s \le s_0$ and all sufficiently small $\varepsilon$,

$$P\{S_{0T}(\ell^\varepsilon) > s\} \le \tfrac{1}{2}\exp\{-\varepsilon^{-2}(s - \gamma)\}. \qquad (2.5)$$

The second probability in (2.4) does not exceed the probability that on at least one of the intervals of the partition the trajectory of $X_t^\varepsilon$ deviates from its value at the left endpoint by more than $\delta/2$:

$$P\{\rho_{0T}(X^\varepsilon, \ell^\varepsilon) \ge \delta\} \le \frac{T}{\Delta}P\Big\{\max_{0 \le t \le \Delta} |\varepsilon w_t| > \frac{\delta}{2}\Big\}. \qquad (2.6)$$

Continuing estimation (2.6) and taking account of the inequality

$$P\{w_\Delta^1 > z\} \le \frac{\sqrt\Delta}{z\sqrt{2\pi}}\exp\Big\{-\frac{z^2}{2\Delta}\Big\}, \qquad z > 0,$$

we obtain

$$P\{\rho_{0T}(X^\varepsilon, \ell^\varepsilon) \ge \delta\} \le \frac{T}{\Delta}\,4r\,P\Big\{w_\Delta^1 > \frac{\delta}{2r\varepsilon}\Big\} \le \frac{T}{\Delta}\cdot\frac{8r^2\varepsilon\sqrt\Delta}{\delta\sqrt{2\pi}}\exp\Big\{-\frac{\delta^2}{8r^2\varepsilon^2\Delta}\Big\}. \qquad (2.7)$$

Now it is sufficient to take $\Delta < \delta^2/4r^2 s_0$, and the right side of (2.7) will be smaller than $\frac{1}{2}\exp\{-\varepsilon^{-2}(s - \gamma)\}$ for sufficiently small $\varepsilon$ and $s \le s_0$. Together with (2.4) and (2.5) this completes the proof of Theorem 2.2. □

Theorem 2.3. For any $\gamma > 0$ and every sufficiently small $\delta_0 > 0$ there exists an $\varepsilon_0 > 0$ such that
$$\exp\{-\varepsilon^{-2}(S_{0T}(\varphi) - \gamma)\} \ge P\{\rho_{0T}(X^\varepsilon, \varphi) < \delta_0\} \ge \exp\{-\varepsilon^{-2}(S_{0T}(\varphi) + \gamma)\} \qquad (2.9)$$

for $0 < \varepsilon \le \varepsilon_0$. The right-hand inequality constitutes the assertion of Theorem 2.1. We prove the left-hand inequality. We choose $\delta_0 > 0$ so small that

$$\inf\{S_{0T}(\psi): \rho_{0T}(\varphi, \psi) \le \delta_0\} \ge S_{0T}(\varphi) - \gamma/4.$$

This can be done relying on Lemma 2.1. By the corollary to the same lemma, we have

$$\delta_1 = \rho_{0T}\big(\{\psi: \rho_{0T}(\varphi, \psi) \le \delta_0\},\ \Phi(S_{0T}(\varphi) - \gamma/2)\big) > 0.$$

We apply Theorem 2.2:

$$P\{\rho_{0T}(X^\varepsilon, \varphi) < \delta_0\} \le P\{\rho_{0T}(X^\varepsilon, \Phi(S_{0T}(\varphi) - \gamma/2)) \ge \delta_1\} \le \exp\{-\varepsilon^{-2}(S_{0T}(\varphi) - \gamma)\}$$

whenever $\varepsilon$ is sufficiently small. By the same token inequalities (2.9), and with them Theorem 2.3, are proved. □

The proofs of Theorems 2.1 and 2.2 reproduce, in the simplest case of the process $X_t^\varepsilon = \varepsilon w_t$, the construction used in the articles [1] and [4] by Wentzell and Freidlin. In this special case, analogous results were obtained in Schilder's article [1].
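The discretization at the heart of the proof of Theorem 2.2 can be illustrated numerically. The sketch below (one-dimensional case, $r = 1$, with parameter values of our own choosing) samples the normalized squared increments of a Wiener path and checks that the action of the polygon has the chi-square mean $\frac{\varepsilon^2}{2}\cdot\frac{T}{\Delta}$.

```python
import numpy as np

# The action of the polygon l^eps built on a 1-d path eps*w_t is
#   S_0T(l^eps) = (eps^2 / 2) * sum_k |w_{k Delta} - w_{(k-1) Delta}|^2 / Delta,
# i.e. (eps^2/2) times a chi-square sum with T/Delta degrees of freedom,
# so its mean is (eps^2 / 2) * (T / Delta).

rng = np.random.default_rng(2)
T, Delta, eps, n_samples = 1.0, 0.05, 0.3, 100_000
m = int(T / Delta)                                   # number of segments
incr = rng.normal(0.0, np.sqrt(Delta), size=(n_samples, m))  # Wiener increments
S_polygon = 0.5 * eps**2 * np.sum(incr**2, axis=1) / Delta

expected_mean = 0.5 * eps**2 * T / Delta
print(S_polygon.mean(), expected_mean)   # sample mean close to the exact mean
```

This is exactly the self-similarity reduction used in the proof: each increment $w_{k\Delta} - w_{(k-1)\Delta}$ has the law of $\sqrt\Delta\,\xi$ with $\xi$ standard normal.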
§3. Action Functional. General Properties

The content of §2 may be divided into two parts: one which is concerned with the Wiener process and the other which relates in general to ways of describing the rough (to within logarithmic equivalence) asymptotics of families of measures in metric spaces. We consider these questions separately.
Let $X$ be a metric space with metric $\rho$. On the $\sigma$-algebra of its Borel subsets let $\mu^h$ be a family of probability measures depending on a parameter $h > 0$. We shall be interested in the asymptotics of $\mu^h$ as $h \downarrow 0$ (the changes which have to be made if we consider convergence to another limit or to $\infty$, or if we consider a parameter not on the real line but in a more general set, are obvious). Let $\lambda(h)$ be a positive real-valued function going to $+\infty$ as $h \downarrow 0$ and let $S(x)$ be a function on $X$ assuming values in $[0, \infty]$. We shall say that $\lambda(h)S(x)$ is an action function for $\mu^h$ as $h \downarrow 0$ if the following assertions hold:

(0) the set $\Phi(s) = \{x: S(x) \le s\}$ is compact for every $s > 0$;

(I) for any $\delta > 0$, any $\gamma > 0$, and any $x \in X$ there exists an $h_0 > 0$ such that

$$\mu^h\{y: \rho(x, y) < \delta\} \ge \exp\{-\lambda(h)[S(x) + \gamma]\} \qquad (3.1)$$

for all $h \le h_0$;

(II) for any $\delta > 0$, any $\gamma > 0$, and any $s > 0$ there exists an $h_0 > 0$ such that

$$\mu^h\{y: \rho(y, \Phi(s)) \ge \delta\} \le \exp\{-\lambda(h)(s - \gamma)\} \qquad (3.2)$$

for $h \le h_0$.
If $\xi^h$ is a family of random elements of $X$ defined on probability spaces $(\Omega^h, \mathcal{F}^h, P^h)$, then the action function for the family of the distributions $\mu^h$, $\mu^h(A) = P^h\{\xi^h \in A\}$, is called the action function for the family $\xi^h$. In this case formulas (3.1) and (3.2) take the form

$$P^h\{\rho(\xi^h, x) < \delta\} \ge \exp\{-\lambda(h)[S(x) + \gamma]\}, \qquad (3.1')$$

$$P^h\{\rho(\xi^h, \Phi(s)) \ge \delta\} \le \exp\{-\lambda(h)(s - \gamma)\}. \qquad (3.2')$$

Separately, the function $S(x)$ will be called the normalized action function and $\lambda(h)$ the normalizing coefficient. It is clear that the decomposition of an action function into the two factors $\lambda(h)$ and $S(x)$ is not unique; moreover, $\lambda(h)$ can always be replaced by a function $\lambda_1(h) \sim \lambda(h)$. Nevertheless, we shall prove below that for a given normalizing coefficient, the normalized action function is uniquely defined.
EXAMPLE 3.1. $X = R^1$, $\mu^h$ is the Poisson distribution with parameter $h$ for every $h > 0$, and we are interested in the behavior of $\mu^h$ as $h \downarrow 0$. Here $\lambda(h) = \ln(1/h)$; $S(x) = x$ for every nonnegative integer $x$ and $S(x) = +\infty$ for the remaining $x$.
If $X$ is a function space, we shall use the term action functional. Hence for the family of the random processes $\varepsilon w_t$, where $w_t$ is a Wiener process, $t \in [0, T]$, $w_0 = 0$, a normalized action functional as $\varepsilon \to 0$ is $S(\varphi) = \frac{1}{2}\int_0^T |\dot\varphi_t|^2\, dt$ for absolutely continuous $\varphi_t$, $0 \le t \le T$, $\varphi_0 = 0$, and $S(\varphi) = +\infty$ for all other $\varphi$; the normalizing coefficient is equal to $\varepsilon^{-2}$ (as the space $X$ we take the space of continuous functions on the interval $[0, T]$ with the metric corresponding to uniform convergence).
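As a quick numerical check of this functional, we can evaluate $S(\varphi)$ for a hypothetical smooth test function of our own choosing, $\varphi_t = \sin t$ on $[0, 1]$, for which $S(\varphi) = \frac{1}{2}\int_0^1 \cos^2 t\, dt = \frac{1}{4} + \frac{\sin 2}{8}$.

```python
import numpy as np

# The normalized action functional of eps*w_t is S(phi) = 0.5 * int |phi'|^2 dt.
# For the hypothetical test function phi_t = sin(t) on [0, 1] this equals
#   0.5 * int_0^1 cos(t)^2 dt = 1/4 + sin(2)/8.

T, n = 1.0, 100_000
t = np.linspace(0.0, T, n + 1)
phi = np.sin(t)
dt = T / n
phi_dot = np.diff(phi) / dt              # finite-difference derivative
S_num = 0.5 * np.sum(phi_dot**2) * dt    # 0.5 * int |phi'|^2 dt

S_exact = 0.25 + np.sin(2.0) / 8.0
print(S_num, S_exact)   # the two values agree closely
```

Non-absolutely-continuous functions, by contrast, have $S(\varphi) = +\infty$; this is why the trajectories of $\varepsilon w_t$ themselves had to be replaced by polygons in the proof of Theorem 2.2.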
We note that condition (0) implies that $S(x)$ attains its minimum on every nonempty closed set. It is sufficient to consider only the case of a closed $A \subseteq X$ with $S_A = \inf\{S(x): x \in A\} < \infty$. We choose a sequence of points $x_n \in A$ such that $S(x_n) \downarrow S_A$. The nested compact sets $A \cap \Phi(S(x_n))$ are nonempty (since $x_n \in A \cap \Phi(S(x_n))$), and therefore their intersection is nonempty and contains a point $x_A$ with $S(x_A) = S_A$.

It would be desirable to obtain immediately a large number of examples of families of random processes for which we could determine an action functional. The following result (Freidlin [7]) helps us with that.

Theorem 3.1. Let $\lambda(h)S^X(x)$ be the action function for a family of measures $\mu^h$ on a space $X$ (with metric $\rho_X$) as $h \downarrow 0$. Let $\varphi$ be a continuous mapping of $X$ into a space $Y$ with metric $\rho_Y$, and let a measure $\nu^h$ on $Y$ be given by the formula $\nu^h(A) = \mu^h(\varphi^{-1}(A))$. Then the asymptotics of the family of measures $\nu^h$ as $h \downarrow 0$ is given by the action function $\lambda(h)S^Y(y)$, where $S^Y(y) = \min\{S^X(x): x \in \varphi^{-1}(y)\}$ (the minimum over the empty set is set to be equal to $+\infty$).
Proof. We introduce the following notation: $\Phi^X(s) = \{x: S^X(x) \le s\}$, $\Phi^Y(s) = \{y: S^Y(y) \le s\}$. It is easy to see that $\Phi^Y(s) = \varphi(\Phi^X(s))$, from which we obtain easily that $S^Y$ satisfies condition (0).

We prove condition (I). We fix an arbitrary $y \in Y$ and a neighborhood of it. If $S^Y(y) = \infty$, then there is nothing to prove. If $S^Y(y) < \infty$, then there exists an $x$ such that $\varphi(x) = y$, $S^Y(y) = S^X(x)$. We choose a neighborhood of $x$ whose image is contained in the selected neighborhood of $y$ and thus obtain the condition to be proved.

Now we pass to condition (II). The preimage of the set $\{y: \rho_Y(y, \Phi^Y(s)) \ge \delta\}$ under $\varphi$ is closed and does not intersect the compact set $\Phi^X(s)$. Therefore, we can choose a positive $\delta'$ such that the $\delta'$-neighborhood of $\Phi^X(s)$ does not intersect $\varphi^{-1}\{y: \rho_Y(y, \Phi^Y(s)) \ge \delta\}$. From inequality (3.2) with $\rho_X$, $\Phi^X$, and $\delta'$ in place of $\rho$, $\Phi$, and $\delta$, we obtain a similar inequality for $\nu^h$, $\rho_Y$, $\Phi^Y$, and $\delta$. □
Using this theorem in the special case where $X$ and $Y$ are the same space with distinct metrics, we obtain that if $\lambda(h)S(x)$ is an action function for a family of measures $\mu^h$ as $h \downarrow 0$ in a metric $\rho_1$, and another metric $\rho_2$ is such that $\rho_2(x, y) \to 0$ if $\rho_1(x, y) \to 0$, then $\lambda(h)S(x)$ is an action function in the
metric $\rho_2$ as well. Of course, this simple assertion can be obtained directly. From this we obtain, in particular, that $\varepsilon^{-2}S_{0T}(\varphi) = (1/2\varepsilon^2)\int_0^T |\dot\varphi_t|^2\, dt$ remains an action functional for the family of the processes $\varepsilon w_t$, $0 \le t \le T$, as $\varepsilon \downarrow 0$ if we consider the metric of the Hilbert space $L_{0T}^2$. The following examples are more interesting.

EXAMPLE 3.2. Let $G(s, t)$ be a $k$ times continuously differentiable function on the square $[0, T] \times [0, T]$, $k \ge 1$. In the space $C_{0T}$ we consider the operator $\mathfrak G$ defined by the formula

$$\mathfrak G\varphi_t = \int_0^T G(s, t)\, d\varphi_s.$$

Here the integral is understood in the sense of Stieltjes and, by the assumed smoothness of $G$, integration by parts is allowed:

$$\mathfrak G\varphi_t = G(T, t)\varphi_T - G(0, t)\varphi_0 - \int_0^T \frac{\partial G(s, t)}{\partial s}\varphi_s\, ds.$$

This equality shows that $\mathfrak G$ is a continuous mapping of $C_{0T}$ into the space $C_{0T}^{(k-1)}$ of functions having $k - 1$ continuous derivatives, with the metric

$$\rho^{(k-1)}(\varphi, \psi) = \max_{0 \le i \le k-1}\ \max_{0 \le t \le T}\Big|\frac{d^i(\varphi_t - \psi_t)}{dt^i}\Big|.$$
We calculate the action functional for the family of the random processes

$$X^\varepsilon_t = \mathscr G(\varepsilon w)_t = \varepsilon\int_0^T G(s,t)\,dw_s$$

in $C^{(k-1)}_{0T}$ as $\varepsilon \to 0$. By Theorem 3.1, the normalizing coefficient remains the same and the normalized action functional is given by the equality

$$S^X(\varphi) = \min\{S_{0T}(\psi)\colon \mathscr G\psi = \varphi\} = \min\left\{\frac12\int_0^T |\dot\psi_s|^2\,ds\colon \mathscr G\psi = \varphi\right\};$$

if there are no $\psi$ for which $\mathscr G\psi = \varphi$, then $S^X(\varphi) = +\infty$.

We introduce an auxiliary operator $G$ in $L^2_{0T}$, given by the formula

$$Gf(t) = \int_0^T G(s,t)f(s)\,ds,$$

and express $S^X$ in terms of the inverse of $G$. This will not be a one-to-one operator in general, since $G$ vanishes on some subspace $L_0 \subseteq L^2_{0T}$, which may be nontrivial. We make the inverse operator one-to-one artificially, by setting $G^{-1}\varphi = \psi$, where $\psi$ is the unique function in $L^2_{0T}$ orthogonal to $L_0$ and such that $G\psi = \varphi$. The operator $G^{-1}$ is defined on the range of $G$. If $S^X(\varphi) < \infty$, then there exists a function $\psi \in C_{0T}$ such that $\mathscr G\psi = \varphi$ and $S_{0T}(\psi) < \infty$. Then $\psi$ is absolutely continuous and $\mathscr G\psi = G\dot\psi$. Therefore,

$$S^X(\varphi) = \min\left\{\frac12\int_0^T|\dot\psi_s|^2\,ds\colon \mathscr G\psi = \varphi\right\} = \min\left\{\frac12\|f\|^2\colon Gf = \varphi\right\},$$

where $\|f\|$ is the norm of $f$ in $L^2_{0T}$. Any element $f$ for which $Gf = \varphi$ can be represented in the form $f = G^{-1}\varphi + f'$, where $f' \in L_0$. Taking into account that $G^{-1}\varphi$ is orthogonal to $L_0$, we obtain $\|f\|^2 = \|G^{-1}\varphi\|^2 + \|f'\|^2 \ge \|G^{-1}\varphi\|^2$. This means that $S^X(\varphi) = \frac12\|G^{-1}\varphi\|^2$ for $\varphi$ in the range of $G$. For the remaining $\varphi \in C^{(k-1)}_{0T}$ the functional assumes the value $+\infty$.

Example 3.3. We
consider a random process $X^\varepsilon_t$ on the interval $[0,T]$, satisfying the linear differential equation

$$P\!\left(\frac{d}{dt}\right)X^\varepsilon_t = \sum_{k=0}^{n} a_k\frac{d^k X^\varepsilon_t}{dt^k} = \varepsilon\,\dot w_t,$$

where $\dot w_t$ is a one-dimensional white noise process. In order to choose a unique solution, we have to prescribe $n$ boundary conditions; for the sake of simplicity, we assume them to be homogeneous, linear, and nonrandom. We denote by $G(s,t)$ the Green's function of the boundary value problem connected with the operator $P(d/dt)$ and our boundary conditions (cf. Coddington and Levinson [1]). The process $X^\varepsilon_t$ can be represented in the form $X^\varepsilon_t = \varepsilon\int_0^T G(s,t)\,dw_s$, i.e., $X^\varepsilon = \mathscr G(\varepsilon w)$. In this case the corresponding operator has the single-valued inverse $G^{-1} = P(d/dt)$, with domain consisting of the functions satisfying the boundary conditions. From this we conclude that the action functional for the family of the processes $X^\varepsilon_t$ as $\varepsilon \to 0$ is $\varepsilon^{-2}S_{0T}(\varphi)$, where

$$S_{0T}(\varphi) = \frac12\int_0^T\left|P\!\left(\frac{d}{dt}\right)\varphi_t\right|^2 dt,$$

and if $\varphi$ does not satisfy the boundary conditions or the derivative $d^{n-1}\varphi_t/dt^{n-1}$ is not absolutely continuous, then $S_{0T}(\varphi) = +\infty$. The action functional has an analogous form in the case of a family of multidimensional random processes which are the solution of a system of linear differential equations on the right side of which there is a white noise multiplied by a small parameter.
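For a minimal numerical illustration of Example 3.3 (with the assumed concrete choice $P(d/dt) = d^2/dt^2$ and Dirichlet conditions $\varphi_0 = \varphi_T = 0$, for which the Green's function is $G(t,s) = s(t-T)/T$ for $s \le t$ and $t(s-T)/T$ for $s \ge t$), one can check that the integral operator with kernel $G$ really is inverted by $P(d/dt)$ on functions satisfying the boundary conditions:

```python
import math

T = 2.0
N = 400                       # grid resolution
h = T / N
t = [i * h for i in range(N + 1)]

def G(ti, s):
    # Green's function of u'' = f, u(0) = u(T) = 0
    return s * (ti - T) / T if s <= ti else ti * (s - T) / T

def f(s):
    return math.sin(s)

def u(ti):
    # u(t) = int_0^T G(t, s) f(s) ds  (trapezoidal rule)
    vals = [G(ti, s) * f(s) for s in t]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

U = [u(ti) for ti in t]
print(U[0], U[-1])            # boundary conditions: both 0

# applying P(d/dt) = d^2/dt^2 (second difference) recovers f
i = N // 2
d2 = (U[i - 1] - 2 * U[i] + U[i + 1]) / (h * h)
print(d2, f(t[i]))            # should agree
```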
3. Action Functional
In the next chapter (§1), by means of the method based on Theorem 3.1, we establish that for a family of diffusion processes arising as a result of a perturbation of a dynamical system $\dot x_t = b(x_t)$ by adding to the right side a white noise multiplied by a small parameter, the action functional is equal to $(1/2\varepsilon^2)\int_0^T |\dot\varphi_t - b(\varphi_t)|^2\,dt$.

We consider the conditions (0), (I), and (II) in more detail. In the case of a complete space $X$, condition (0) can be split into two: lower semicontinuity of $S(x)$ on $X$ (which is equivalent to closedness of $\Phi(s)$ for every $s$) and relative compactness of $\Phi(s)$. Such a splitting is convenient in the verification of the condition (cf. Lemma 2.1). The condition of semicontinuity of $S$ is not stringent: it is easy to prove that if the functions $\lambda(h)$ and $S(x)$ satisfy conditions (I) and (II), then so do $\lambda(h)$ and the lower semicontinuous function $\tilde S(x) = S(x) \wedge \varliminf_{y\to x} S(y)$. (The method of redefining the normalized action functional by semicontinuity is used in a somewhat more refined form in the proof of Theorem 2.1 of Ch. 5.)

Theorem 3.2. Condition (I), together with the condition of relative compactness of $\Phi(s)$, is equivalent to the condition:

(I_eq) for any $\delta > 0$, $\gamma > 0$ and $s_0 > 0$ there exists an $h_0 > 0$ such that inequality (3.1) is satisfied for all $h \le h_0$ and all $x \in \Phi(s_0)$.

Condition (II) implies the following:

(II_eq) for any $\delta > 0$, $\gamma > 0$ and $s_0 > 0$ there exists an $h_0 > 0$ such that inequality (3.2) is satisfied for all $h \le h_0$ and $s \le s_0$.

Conditions (0) and (II) imply the following:

(II+) for any $\delta > 0$ and $s > 0$ there exist $\gamma > 0$ and $h_0 > 0$ such that

$$\mu^h\{y\colon \rho(y, \Phi(s)) \ge \delta\} \le \exp\{-\lambda(h)(s + \gamma)\} \tag{3.3}$$

for all $h \le h_0$. Indeed, on the set $A = \{y\colon \rho(y, \Phi(s)) \ge \delta\}$ we have $S(y) > s$; using the compactness of the level sets, choose $\gamma > 0$ so that $A \cap \Phi(s + 2\gamma) = \emptyset$. We select a positive $\delta'$ not exceeding $\rho(A, \Phi(s + 2\gamma))$ (this distance is positive by virtue of the compactness of the second set) and use inequality (3.2) with $\delta'$ in place of $\delta$ and $s + 2\gamma$ in place of $s$.
Conditions (I) and (II) were introduced, in describing the rough asymptotics of probabilities of large deviations, in the papers [1], [4] of Wentzell and Freidlin. There are other methods of description; however, under condition (0) they are equivalent to the one given here.
In Varadhan's paper [1] the following conditions occur instead of conditions (I) and (II):

(I$'$) for any open $A \subseteq X$ we have

$$\varliminf_{h\downarrow 0}\lambda(h)^{-1}\ln\mu^h(A) \ge -\inf\{S(x)\colon x \in A\};\tag{3.4}$$

(II$'$) for any closed $A \subseteq X$ we have

$$\varlimsup_{h\downarrow 0}\lambda(h)^{-1}\ln\mu^h(A) \le -\inf\{S(x)\colon x \in A\}.\tag{3.5}$$

Theorem 3.3. Conditions (I) and (I$'$) are equivalent. Condition (II$'$) implies (II), and conditions (0) and (II) imply (II$'$). Consequently, (I) $\Leftrightarrow$ (I$'$) and (II) $\Leftrightarrow$ (II$'$) under condition (0).
Proof. The implications (I$'$) $\Rightarrow$ (I), (I) $\Rightarrow$ (I$'$) and (II$'$) $\Rightarrow$ (II) can be proved very simply. We prove, for example, the last one. The set $A = \{y\colon \rho(y, \Phi(s)) \ge \delta\}$ is closed and $S(y) > s$ on it. Therefore, $\inf\{S(y)\colon y \in A\} \ge s$. From (II$'$) we obtain $\varlimsup_{h\downarrow 0}\lambda(h)^{-1}\ln\mu^h(A) \le -s$, which means that for every $\gamma > 0$ and $h$ sufficiently small we have $\lambda(h)^{-1}\ln\mu^h(A) \le -(s - \gamma)$, i.e., (3.2) is satisfied.

Now let (0) and (II) be satisfied. We prove (II$'$). Choose an arbitrary $\gamma > 0$ and put $s = \inf\{S(y)\colon y \in A\} - \gamma$. The closed set $A$ does not intersect the compact set $\Phi(s)$. Therefore, $\delta = \rho(A, \Phi(s)) > 0$. We use inequality (3.2) and obtain that

$$\mu^h(A) \le \mu^h\{y\colon \rho(y, \Phi(s)) \ge \delta\} \le \exp\{-\lambda(h)(s - \gamma)\} = \exp\{-\lambda(h)(\inf\{S(y)\colon y \in A\} - 2\gamma)\}$$

for sufficiently small $h$. Taking logarithms, dividing by the normalizing coefficient $\lambda(h)$ and passing to the limit, we obtain that $\varlimsup_{h\downarrow 0}\lambda(h)^{-1}\ln\mu^h(A) \le -\inf\{S(y)\colon y \in A\} + 2\gamma$, which implies (3.5), since $\gamma > 0$ is arbitrary.
In Borovkov's paper [1] the rough asymptotics of probabilities of large deviations is characterized by one condition instead of the two conditions (I) and (II) or (I$'$) and (II$'$). We shall say that a set $A \subseteq X$ is regular (with respect to the function $S$) if the infimum of $S$ on the closure of $A$ coincides with the infimum of $S$ on the set of interior points of $A$: $\inf\{S(x)\colon x \in [A]\} = \inf\{S(x)\colon x \in (A)\}$.
We introduce the following condition:

(I$_=$) for any regular Borel set $A \subseteq X$,

$$\lim_{h\downarrow 0}\lambda(h)^{-1}\ln\mu^h(A) = -\inf\{S(x)\colon x \in A\}.\tag{3.6}$$
Theorem 3.4. Conditions (0), (I) and (II) imply (I$_=$). Moreover, if $A$ is a regular set and $\min\{S(x)\colon x \in [A]\}$ is attained at a unique point $x_0$, then

$$\lim_{h\downarrow 0}\frac{\mu^h(A \cap \{x\colon \rho(x_0, x) < \delta\})}{\mu^h(A)} = 1\tag{3.7}$$

for every $\delta > 0$. Conversely, conditions (0) and (I$_=$) imply (I) and (II).

We note that in terms of random elements $\xi^h$ of $X$, (3.7) can be rewritten in the form

$$\lim_{h\downarrow 0}{\rm P}^h\{\rho(x_0, \xi^h) < \delta \mid \xi^h \in A\} = 1.\tag{3.7$'$}$$
Proof. We use the equivalences (I) $\Leftrightarrow$ (I$'$) and (II) $\Leftrightarrow$ (II$'$) already established (under condition (0)). That (I$'$) and (II$'$) imply (I$_=$) is obvious. Moreover, if $A$ is a non-Borel regular set, then (3.6) is satisfied if $\mu^h(A)$ is replaced by the corresponding inner and outer measures. To obtain relation (3.7), we observe that

$$\min\{S(x)\colon x \in [A],\ \rho(x, x_0) \ge \delta\} > S(x_0).$$

We obtain from (3.5) that

$$\varlimsup_{h\downarrow 0}\lambda(h)^{-1}\ln\mu^h\{x \in A\colon \rho(x_0, x) \ge \delta\} \le \varlimsup_{h\downarrow 0}\lambda(h)^{-1}\ln\mu^h\{x \in [A]\colon \rho(x_0, x) \ge \delta\} < -S(x_0).$$

This means that $\mu^h\{x \in A\colon \rho(x_0, x) \ge \delta\}$ converges to zero faster than $\mu^h(A) \asymp \exp\{-\lambda(h)S(x_0)\}$.

We show that (I$_=$) implies (I$'$) and (II$'$). For any positive $\delta$ and any set $A \subseteq X$ we denote by $A_{+\delta}$ the $\delta$-neighborhood of $A$ and by $A_{-\delta}$ the set of points which lie at a distance greater than $\delta$ from the complement of $A$. We set $s(\pm\delta) = \inf\{S(x)\colon x \in A_{\pm\delta}\}$ (the infimum over the empty set is assumed to be equal to $+\infty$). The function $s$ is defined for all real values of the argument; at zero we define it to be $\inf\{S(x)\colon x \in A\}$. It is easy to see that it is a nonincreasing function which is continuous except at possibly a countable number of points.
§3. Action Functional. General Properties
If $A$ is an open set, then $s$ is left continuous at zero. Therefore, for an arbitrarily small $\gamma > 0$ there exists a $\delta > 0$ such that $s(-\delta) < s(0) + \gamma$, and $s$ is continuous at $-\delta$. The latter ensures the applicability of (3.6) to $A_{-\delta}$. We obtain

$$\varliminf_{h\downarrow 0}\lambda(h)^{-1}\ln\mu^h(A) \ge \lim_{h\downarrow 0}\lambda(h)^{-1}\ln\mu^h(A_{-\delta}) \ge -s(0) - \gamma.$$

Since $\gamma$ is arbitrary, this implies (3.4). In the case of a closed set $A$, we use condition (0) to establish the right continuity of $s$ at zero, and then we repeat the same reasoning with $A_{+\delta}$ replacing $A_{-\delta}$.

Example 3.4. Let $A$ be the exterior of a ball in $L^2_{0T}$: $A = \{\varphi \in L^2_{0T}\colon \|\varphi\| > c\}$.
This set is regular with respect to the functional $S_{0T}$ considered in Example 3.3. To verify this, we multiply all elements of $A$ by a number $q$ smaller than, but close to, one. The open set $q \cdot (A)$ absorbs $[A]$, and the infimum of the functional, as well as all of its values, changes insignificantly (they are multiplied by $q^2$). Since $A$ is regular, we have

$$\lim_{\varepsilon\to 0}\varepsilon^2\ln{\rm P}\{\|X^\varepsilon\| > c\} = \lim_{\varepsilon\to 0}\varepsilon^2\ln{\rm P}\{\|X^\varepsilon\| \ge c\} = -\min\left\{\frac12\int_0^T\left|P\!\left(\frac{d}{dt}\right)\varphi_t\right|^2 dt\right\},\tag{3.8}$$

where the minimum is taken over all functions $\varphi$ satisfying the boundary conditions and equal to $c$ in norm.

We consider the special case where the operator $P(d/dt)$ with the boundary conditions is self-adjoint in $L^2_{0T}$. Then it has a complete orthonormal system of eigenfunctions $e_k(t)$, $k = 1, 2, \ldots$, corresponding to the eigenvalues $\lambda_k$, $k = 1, 2, \ldots$ (cf., for example, Coddington and Levinson [1]). If a function $\varphi$ in $L^2_{0T}$ is representable in the form $\sum c_k e_k$, then $P(d/dt)\varphi = \sum_{k=1}^\infty c_k\lambda_k e_k$ and $\|P(d/dt)\varphi\|^2 = \sum c_k^2\lambda_k^2$. This implies that the minimum in (3.8) is equal to $c^2/2$ multiplied by $\lambda_1^2$, the square of the eigenvalue with the smallest absolute value. Consequently,

$$\lim_{\varepsilon\to 0}\varepsilon^2\ln{\rm P}\{\|X^\varepsilon\| > c\} = -c^2\lambda_1^2/2.\tag{3.9}$$

The infimum of $S(\varphi)$ on the sphere of radius $c$ in $L^2_{0T}$ is attained on the eigenfunctions, multiplied by $c$, corresponding to the eigenvalue $\lambda_1$. If this eigenvalue is simple (and $-\lambda_1$ is not an eigenvalue), then there are only two such functions: $ce_1$ and $-ce_1$. Then for any $\delta > 0$ we have

$$\lim_{\varepsilon\downarrow 0}{\rm P}\{\|X^\varepsilon - ce_1\| < \delta \text{ or } \|X^\varepsilon + ce_1\| < \delta \mid \|X^\varepsilon\| > c\} = 1.$$
The same is true for the conditional probability under the condition $\|X^\varepsilon\| \ge c$.

A concrete example: $P(d/dt)\varphi = d^2\varphi/dt^2$, with boundary conditions $\varphi_0 = \varphi_T = 0$. This operator is self-adjoint. The equation $\varphi'' = \lambda\varphi$, $\varphi_0 = \varphi_T = 0$, for the eigenfunctions has the solutions $\lambda = \lambda_k = -k^2\pi^2/T^2$, $e_k(t) = \sqrt{2/T}\,\sin(k\pi t/T)$. The eigenvalues are simple. We have

$$\lim_{\varepsilon\to 0}\varepsilon^2\ln{\rm P}\{\|X^\varepsilon\| > c\} = -\frac{c^2\pi^4}{2T^4};$$

for every $\delta > 0$ the conditional probability that $X^\varepsilon$ is in the $\delta$-neighborhood of one of the functions $\pm c\sqrt{2/T}\,\sin(\pi t/T)$, under the condition that $\|X^\varepsilon\|$ is greater (not smaller) than $c$, converges to 1 as $\varepsilon \to 0$. If our differential operator with certain boundary conditions is not self-adjoint, then $\lambda_1^2$ in (3.9) must be replaced with the smallest eigenvalue of the product of the operator (with the boundary conditions) with its adjoint.
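The concrete example above is easy to check numerically; the sketch below (our own illustration) takes $\varphi = c\sqrt{2/T}\sin(\pi t/T)$, verifies $\|\varphi\| = c$, and confirms that $S(\varphi) = \frac12\|\varphi''\|^2$ agrees with the minimum $c^2\lambda_1^2/2 = c^2\pi^4/(2T^4)$ of (3.8)-(3.9):

```python
import math

T, c = 2.0, 1.5
N = 2000
h = T / N
t = [i * h for i in range(N + 1)]
phi = [c * math.sqrt(2.0 / T) * math.sin(math.pi * ti / T) for ti in t]

# ||phi|| should equal c  (phi = c * e_1, e_1 normalized in L^2[0, T])
norm2 = h * sum(v * v for v in phi)

# S(phi) = (1/2) int_0^T (phi'')^2 dt, via second differences
d2 = [(phi[i - 1] - 2 * phi[i] + phi[i + 1]) / (h * h) for i in range(1, N)]
S = 0.5 * h * sum(v * v for v in d2)

lam1 = math.pi ** 2 / T ** 2   # smallest |eigenvalue| of d^2/dt^2, phi(0)=phi(T)=0
print(norm2, c * c)            # both ~ 2.25
print(S, c * c * lam1 ** 2 / 2)  # both ~ c^2 pi^4 / (2 T^4)
```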
Example 3.5. Let $b(x)$ be a continuous function from $R^r$ into $R^r$. On the space $C_{0T}$ of continuous functions on the interval $[0,T]$ with values in $R^r$, consider the functional $S_{0T}(\varphi)$ equal to $\frac12\int_0^T |\dot\varphi_s - b(\varphi_s)|^2\,ds$ if $\varphi$ is absolutely continuous and $\varphi_0 = x_0$, and to $+\infty$ on the rest of $C_{0T}$. Let an open set $D \ni x_0$, $D \subseteq R^r$, be such that there exist interior points of the complement of $D$ arbitrarily close to every point of the boundary $\partial D$ of $D$ (i.e., $\partial D = \partial[D]$). Let us denote by $A_D$ the open set of continuous functions $\varphi_t$, $0 \le t \le T$, such that $\varphi_t \in D$ for all $t \in [0,T]$. We prove that $\bar A_D = C_{0T}\setminus A_D$ is a regular set with respect to $S_{0T}$.

Let the minimum of $S_{0T}$ on the closed set $\bar A_D$ be attained at the function $\hat\varphi_t$, $0 \le t \le T$, $\hat\varphi_0 = x_0$ (Figure 2). This function certainly reaches the boundary at some point $t_0 \ne 0$: $\hat\varphi_{t_0} \in \partial D$. The minimum is finite, because there exist arbitrarily smooth functions issued from $x_0$ and leaving $D$ in the time interval $[0,T]$; this implies that the function $\hat\varphi_t$ is absolutely continuous.

Figure 2.
For any $\delta > 0$ there exists an interior point $x^\circ$ of $R^r\setminus D$ in the $\delta$-neighborhood of the point $\hat\varphi_{t_0}$. We put

$$\varphi^\delta_t = \hat\varphi_t + \frac{t}{t_0}(x^\circ - \hat\varphi_{t_0}).$$

It follows from condition (I) that $\lim_{\delta\downarrow 0}\varliminf_{h\downarrow 0}\lambda(h)^{-1}\ln\mu^h\{y\colon \rho(x,y) < \delta\} \ge -S(x)$, and from conditions (II) and (0) that $\lim_{\delta\downarrow 0}\varlimsup_{h\downarrow 0}\lambda(h)^{-1}\ln\mu^h\{y\colon \rho(x,y) < \delta\} \le -S(x)$. We prove the second one. For any $\gamma > 0$, the compact set $\Phi(S(x) - \gamma)$ does not contain $x$. We choose a positive $\delta$ smaller than one half of the distance of $x$ from this compact set. We have $\mu^h\{y\colon \rho(x,y) < \delta\} \le \mu^h\{y\colon \rho(y, \Phi(S(x) - \gamma)) \ge \delta\}$. By (3.2), this does not exceed $\exp\{-\lambda(h)(S(x) - 2\gamma)\}$ for $h$ sufficiently small. This gives us the necessary assertion.
We note that we cannot replace conditions (I), (II) (under condition (0)) by relations (3.11). Relations (3.11) do not imply (II), as the following example shows: $X = R^1$, $\mu^h$ is a mixture, with equal weights, of the normal distribution with mean 0 and variance $h$ and the normal distribution with mean 0 and variance $\exp\{e^{1/h}h^2\}$, and $\lambda(h) = h^{-1}$. Here $\lim_{\delta\downarrow 0}\lim_{h\downarrow 0}\lambda(h)^{-1}\ln\mu^h\{y\colon \rho(x,y) < \delta\} = -x^2/2$ for all $x$; the function $x^2/2$ satisfies (0), but (II) is not satisfied.

We formulate, in a general language, the methods used for the verification of conditions (I) and (II) in §2 (they will be used in §4 and §§1 and 2 of Ch. 5 as well). To deduce (I), we select a function $g_x(y)$ on $X$ so that the family of measures $\tilde\mu^h(dy) = \exp\{\lambda(h)g_x(y)\}\mu^h(dy)$ converges to a measure concentrated at $x$; then we use the fact that
$$\mu^h\{y\colon \rho(x,y) < \delta\} = \int_{\{y\colon \rho(x,y) < \delta\}}\exp\{-\lambda(h)g_x(y)\}\,\tilde\mu^h(dy).$$

To verify (II), one uses bounds of the form

$$\mu^h\{y\colon \rho(y, \Phi(s)) \ge \delta\} \le \exp\{-(1 - \varkappa)\lambda(h)s\}\int_A \exp\{(1 - \varkappa)\lambda(h)S(z(x))\}\,\mu^h(dx).$$

Of course, the problem of estimating the integral on $A$ is solved separately in each case.
Finally, we discuss the peculiarities arising in considering families of measures depending, besides the main parameter $h$, on another parameter $x$ assuming values in a space $X$ ($x$ will have the meaning of the point from which the trajectory of the perturbed dynamical system is issued; we restrict ourselves to the case of perturbations which are homogeneous in time). First, we shall consider not just one space of functions and one metric; instead, for any $T > 0$, we shall consider a space of functions defined on the interval $[0,T]$ and a metric $\rho_{0T}$. Correspondingly, the normalized action functional will depend on the interval: $S = S_{0T}$. (In the case of the perturbations considered in Chs. 4, 5, and 7, this functional turns out to have the form $S_{0T}(\varphi) = \int_0^T L(\varphi_t, \dot\varphi_t)\,dt$, and it is convenient to define it analogously for all intervals $[T_1, T_2]$, $-\infty < T_1 < T_2 < +\infty$, on the real line.) Moreover, to every point $x \in X$ and every value of the parameter $h$ there will correspond its own measure in the space of functions on the interval $[0,T]$.

We can follow two routes: we either consider a whole family of functionals depending on the parameter $x$ (the functional corresponding to $x$ is assumed to be equal to $+\infty$ for all functions $\varphi$ for which $\varphi_0 \ne x$) or introduce a new definition of the action functional suitable for the situation. We follow the second route. Let $\{\Omega^h, \mathscr F^h, {\rm P}^h_x\}$ be a family of probability spaces depending on the parameters $h > 0$ and $x$ running through the values of a metric space $X$, and let $X^h_t$, $t \ge 0$, be a random process on this space with values in $X$. Let $\rho_{0T}$ be a metric in the space of functions on $[0,T]$ with values in $X$ and let $S_{0T}(\varphi)$ be a functional on this space. We say that $\lambda(h)S_{0T}(\varphi)$ is the action functional for the family of the random processes $(X^h, {\rm P}^h_x)$ uniformly in a class
$\mathscr A$ of subsets of $X$ if we have:

(0) the functional $S_{0T}$ is lower semicontinuous, and the union of the $\Phi_x(s)$ for $x \in K$ is compact for any compact set $K \subseteq X$, where $\Phi_x(s)$ is the set of functions $\varphi$ on $[0,T]$ such that $\varphi_0 = x$ and $S_{0T}(\varphi) \le s$;

(I) for any $\delta > 0$, any $\gamma > 0$, any $s_0 > 0$ and any $A \in \mathscr A$ there exists an $h_0 > 0$ such that

$${\rm P}^h_x\{\rho_{0T}(X^h, \varphi) < \delta\} \ge \exp\{-\lambda(h)[S_{0T}(\varphi) + \gamma]\}\tag{3.12}$$

for all $h \le h_0$, all $x \in A$ and all $\varphi \in \Phi_x(s_0)$;

(II) for any $\delta > 0$, $\gamma > 0$, $s_0 > 0$ and any $A \in \mathscr A$ there exists an $h_0 > 0$ such that

$${\rm P}^h_x\{\rho_{0T}(X^h, \Phi_x(s)) \ge \delta\} \le \exp\{-\lambda(h)(s - \gamma)\}\tag{3.13}$$
for all $h \le h_0$, all $x \in A$ and all $s \le s_0$.

§4. Action Functional for Gaussian Random Processes and Fields

We put $G(s,t) = \sum_k \lambda_k^{1/2}e_k(s)e_k(t)$. The operator $A$ has finite trace $\sum_k \lambda_k < \infty$. Therefore, the series $\sum_k \lambda_k^{1/2}e_k(s)e_k(t)$ is convergent in $L^2([0,T]\times[0,T])$. The function $G(s,t)$ is precisely the kernel of the integral operator $A^{1/2}$. In order to see this, it is sufficient to note that the eigenfunctions of the operator with kernel $G(s,t)$ coincide with the eigenfunctions of $A$, and its eigenvalues are square roots of the corresponding eigenvalues of $A$.
The random process $X_t$ admits an integral representation in terms of the kernel of $A^{1/2}$:

$$X_t = \int_0^T G(s,t)\,dw_s,$$

where $w_s$ is a Wiener process. We shall write this equality in the form $X = A^{1/2}\dot w$. In order to prove it, it is sufficient to note that $\tilde X = A^{1/2}\dot w$ is a Gaussian process with ${\rm M}\tilde X_t = 0$ and

$${\rm M}\tilde X_s\tilde X_t = \int_0^T G(s,u)G(t,u)\,du = a(s,t).$$
The operators $A$ and $A^{1/2}$ nullify some subspace $L_0 \subseteq L^2_{0T}$. This subspace is nontrivial in general, and therefore, for a definition of the inverses $A^{-1}$ and $A^{-1/2}$, additional agreements are needed. We define the inverses by setting $A^{-1}\varphi = \psi_1$, $A^{-1/2}\varphi = \psi_2$ if $A\psi_1 = \varphi$, $A^{1/2}\psi_2 = \varphi$ and $\psi_1, \psi_2$ are orthogonal to $L_0$. Consequently, $A^{-1}$ and $A^{-1/2}$ are defined uniquely on the ranges of $A$ and $A^{1/2}$, respectively. On $L^2_{0T}$ we consider the functional $S(\varphi)$ equal to $\frac12\|A^{-1/2}\varphi\|^2$; if $A^{-1/2}\varphi$ is not defined, we set $S(\varphi) = +\infty$. For functions $\varphi$ for which $A^{-1}\varphi$ is defined, we have $S(\varphi) = \frac12(A^{-1}\varphi, \varphi)$, i.e., $S(\varphi)$ is an extension of the functional $\frac12(A^{-1}\varphi, \varphi)$.

Theorem 4.1. The functional $S(\varphi)$ is a normalized action functional for the Gaussian processes $X^\varepsilon_t = \varepsilon X_t$ in $L^2_{0T}$ as $\varepsilon \to 0$. The normalizing coefficient is $\varepsilon^{-2}$.
Proof. Let $G_N(s,t)$ be a symmetric positive definite continuously differentiable kernel on the square $[0,T]\times[0,T]$ such that

$$\int_0^T\!\!\int_0^T[G(s,t) - G_N(s,t)]^2\,ds\,dt < \frac1N.$$

First we prove that

$${\rm P}\{\|\varepsilon X - \varphi\| < \delta\} \ge \exp\{-\varepsilon^{-2}(S(\varphi) + \gamma)\}\tag{4.1}$$

for any $\delta, \gamma > 0$ for sufficiently small positive $\varepsilon$. If $S(\varphi) = +\infty$, then (4.1) is obvious. Let $S(\varphi) < \infty$. There exists a $\psi \in L^2_{0T}$ orthogonal to $L_0$ and such
that $A^{1/2}\psi = \varphi$. We put $\varphi_N = G_N\psi$ and $X_N = G_N\dot w$. Choose $N_0$ so large that

$$\|\varphi_N - \varphi\| \le \|\psi\|\left(\int_0^T\!\!\int_0^T[G(s,t) - G_N(s,t)]^2\,ds\,dt\right)^{1/2} < \delta/3$$

for $N \ge N_0$. For such $N$ we have

$$\{\|\varepsilon X - \varphi\| < \delta\} \supseteq \{\|\varepsilon X - \varepsilon X_N\| < \delta/3,\ \|\varepsilon X_N - \varphi_N\| < \delta/3\}.$$

From this we obtain

$${\rm P}\{\|\varepsilon X - \varphi\| < \delta\} \ge {\rm P}\{\|\varepsilon X_N - \varphi_N\| < \delta/3\} - {\rm P}\{\|\varepsilon X - \varepsilon X_N\| \ge \delta/3\}.\tag{4.2}$$

Since $G_N(s,t)$ is a continuously differentiable kernel, in estimating the first term on the right side we can use the result of Example 3.2:

$${\rm P}\{\|\varepsilon X_N - \varphi_N\| < \delta/3\} \ge \exp\{-\varepsilon^{-2}(S(\varphi) + \gamma)\}\tag{4.3}$$

for $\varepsilon$ smaller than some $\varepsilon_1$. Here we have used the fact that

$$G_N^{-1}\varphi_N = \psi = A^{-1/2}\varphi,\qquad S(\varphi) = \tfrac12\|\psi\|^2 = \tfrac12\|G_N^{-1}\varphi_N\|^2.$$
Moreover, using Chebyshev's exponential inequality, we obtain

$${\rm P}\{\|\varepsilon X - \varepsilon X_N\| \ge \delta/3\} = {\rm P}\left\{\int_0^T\left[\int_0^T(G(s,t) - G_N(s,t))\,dw_s\right]^2 dt \ge \frac{\delta^2}{9\varepsilon^2}\right\}$$
$$\le e^{-a/\varepsilon^2}\,{\rm M}\exp\left\{\frac{9a}{\delta^2}\int_0^T\left[\int_0^T(G(s,t) - G_N(s,t))\,dw_s\right]^2 dt\right\}\tag{4.4}$$
for an arbitrary $a > 0$. For sufficiently large $N$, the mathematical expectation on the right side is finite. To see this, we introduce the eigenfunctions $\varphi_k$ and corresponding eigenvalues $\mu_k$, $k = 1, 2, \ldots$, of the symmetric square integrable kernel $\Gamma_N(s,t) = G(s,t) - G_N(s,t)$. It is easy to verify the following equalities:

$$\Gamma_N(s,t) = \sum_k \mu_k\varphi_k(s)\varphi_k(t),\qquad \sum_k\mu_k^2 = \int_0^T\!\!\int_0^T\Gamma_N(s,t)^2\,ds\,dt,$$
$$\int_0^T\Gamma_N(s,t)\,dw_s = \sum_k\mu_k\varphi_k(t)\int_0^T\varphi_k(s)\,dw_s,$$
$$\int_0^T\left[\int_0^T\Gamma_N(s,t)\,dw_s\right]^2 dt = \sum_k\mu_k^2\left[\int_0^T\varphi_k(s)\,dw_s\right]^2.$$
The random variables $\xi_k = \int_0^T\varphi_k(s)\,dw_s$ have normal distribution with mean zero and variance one, and they are independent for distinct $k$. Taking account of these properties, we obtain

$${\rm M}\exp\left\{\frac{9a}{\delta^2}\int_0^T\left[\int_0^T\Gamma_N(s,t)\,dw_s\right]^2 dt\right\} = {\rm M}\exp\left\{\frac{9a}{\delta^2}\sum_k\mu_k^2\xi_k^2\right\} = \prod_k\left(1 - \frac{18a}{\delta^2}\mu_k^2\right)^{-1/2}.\tag{4.5}$$

The last equality holds if $(18a/\delta^2)\mu_k^2 < 1$ for all $k$. Since

$$\sum_k\mu_k^2 = \int_0^T\!\!\int_0^T\Gamma_N(s,t)^2\,ds\,dt \to 0$$

as $N \to \infty$, relation (4.5) is satisfied for all $N$ larger than some $N_1 = N_1(a, \delta)$. The convergence of the series $\sum\mu_k^2$ implies the convergence of the infinite product in (4.5). Consequently, if $N \ge N_1$, then the mathematical expectation on the right side of (4.4) is finite, and for $a = S(\varphi) + \gamma$ and $\varepsilon$ sufficiently small we obtain

$${\rm P}\{\|\varepsilon X - \varepsilon X_N\| \ge \delta/3\} \le {\rm const}\cdot\exp\{-\varepsilon^{-2}(S(\varphi) + \gamma)\}.\tag{4.6}$$
Combining estimates (4.2), (4.3), and (4.6), we obtain (4.1). Now we prove that for any $s > 0$, $\gamma > 0$, $\delta > 0$ there exists an $\varepsilon_0$ such that

$${\rm P}\{\rho(\varepsilon X, \Phi(s)) \ge \delta\} \le \exp\{-\varepsilon^{-2}(s - \gamma)\}\tag{4.7}$$

for $\varepsilon \le \varepsilon_0$, where $\Phi(s) = \{\varphi \in L^2_{0T}\colon S(\varphi) \le s\}$.
Along with the image $\Phi(s)$ of the ball of radius $\sqrt{2s}$ in $L^2_{0T}$ under the mapping $A^{1/2}$, we consider the image $\Phi_N(s)$ of the same ball under $G_N$. Let $N > 6s/\delta$. Taking account of the definition of $G_N(s,t)$, we obtain

$$\sup_{\psi\colon\|\psi\|\le\sqrt{2s}}\|G\psi - G_N\psi\|^2 = \sup_\psi\int_0^T\left[\int_0^T(G(s,t) - G_N(s,t))\psi_s\,ds\right]^2 dt$$
$$\le \sup_\psi\int_0^T\!\!\int_0^T(G(s,t) - G_N(s,t))^2\,ds\,dt\cdot\|\psi\|^2 < \delta/3.$$
From this we obtain

$${\rm P}\{\rho(\varepsilon X, \Phi(s)) \ge \delta\} \le {\rm P}\{\|\varepsilon X - \varepsilon X_N\| \ge \delta/3\} + {\rm P}\{\rho(\varepsilon X_N, \Phi_N(s)) \ge \delta/3\}\tag{4.8}$$

for $N > 6s/\delta$. By virtue of (4.4), the first term on the right side can be made smaller than $\exp\{-\varepsilon^{-2}(s - \gamma/2)\}$ for $N$ sufficiently large. The estimate

$${\rm P}\{\rho(\varepsilon X_N, \Phi_N(s)) \ge \delta/3\} \le \exp\{-\varepsilon^{-2}(s - \gamma/2)\}$$

of the second term follows from Example 3.2. Relying on these estimates, we can derive (4.7) from (4.8). Theorem 4.1 is proved.

Remark. As was established in the preceding section, estimates (4.1) and (4.7) are satisfied for sufficiently small $\varepsilon$ uniformly in all functions $\varphi$ with $S(\varphi) \le {\rm const}$ and for all $s \le s_0 < \infty$, respectively.
In the above proof, taken from Freidlin [1], the main role was played by the representation $X = A^{1/2}\dot w$ of the Gaussian process $X_t$. This representation enabled us to reduce the calculation of the action functional for the family of processes $\varepsilon X_t$ in $L^2_{0T}$ to the estimates, obtained in §2, of the corresponding probabilities for the Wiener process (in the uniform metric). We are going to formulate and prove a theorem very close to Theorem 4.1; nevertheless, we shall not rely on the estimates for the Wiener process in the uniform metric, but rather reproduce the proofs of §2 in a Hilbert space setting (cf. Wentzell [4]). This enables us to write out an expression of the action functional for Gaussian random processes and fields in various Hilbert norms. Of course, a Gaussian random field $X_z$ can also be represented in the form $\int G(s,z)\,dw_s$, and we can use the arguments in the proof of Theorem 4.1. Nevertheless, this representation is not as natural for a random field as for a random process.

Let $H$ be a real Hilbert space. We preserve the notation $(\ ,\ )$ and $\|\cdot\|$ for the scalar product and norm of $H$. In $H$ we consider a Gaussian random element $X$ with mean 0 and correlation functional $B(f,g) = {\rm M}(X,f)(X,g)$. This bilinear functional can be represented in the form $B(f,g) = (Af, g)$, where $A$ is a self-adjoint linear operator, which turns out to be automatically nonnegative definite and completely continuous with a finite trace (Gikhman and Skorokhod [2], Ch. V, §5). As earlier, we write $S(\varphi) = \frac12\|A^{-1/2}\varphi\|^2$; if $A^{-1/2}\varphi$ is not defined, we set $S(\varphi) = +\infty$. In order to make $A^{-1/2}$ single-valued, as $A^{-1/2}\varphi$ we again choose that element $\psi$ which is orthogonal to the null space of $A$ and for which $A^{1/2}\psi = \varphi$.

Theorem 4.2. Let $s$, $\delta$ and $\gamma$ be arbitrary positive numbers. We have

$${\rm P}\{\|\varepsilon X - \varphi\| < \delta\} \ge \exp\{-\varepsilon^{-2}(S(\varphi) + \gamma)\}\tag{4.9}$$

for $\varepsilon > 0$ sufficiently small. Inequality (4.9) is satisfied uniformly for all $\varphi$ with $S(\varphi) \le s < \infty$. If $\Phi(s) = \{\varphi \in H\colon S(\varphi) \le s\}$, then

$${\rm P}\{\rho(\varepsilon X, \Phi(s)) \ge \delta\} \le \exp\{-\varepsilon^{-2}(s - \gamma)\}\tag{4.10}$$
for $\varepsilon > 0$ sufficiently small. Inequality (4.10) is satisfied uniformly for all $s \le s_0 < \infty$.

By Chebyshev's inequality,

$${\rm P}\{\|\varepsilon X\| < \delta\} \ge 1 - \varepsilon^2\delta^{-2}{\rm M}\|X\|^2 = 1 - \varepsilon^2\delta^{-2}\sum_{i=1}^\infty\lambda_i \ge 3/4$$

for $\varepsilon \le \frac{\delta}{2}\bigl(\sum_{i=1}^\infty\lambda_i\bigr)^{-1/2}$, and

$${\rm P}\left\{\Bigl|\sum_{i=1}^\infty\lambda_i^{-1}\varphi_i X_i\Bigr| < K\right\} \ge 1 - K^{-2}{\rm M}\Bigl|\sum_i\lambda_i^{-1}\varphi_i X_i\Bigr|^2 = 1 - K^{-2}\sum_{i=1}^\infty\lambda_i^{-1}\varphi_i^2 = 1 - 2K^{-2}S(\varphi) \ge 3/4$$

for $K \ge 2\sqrt{2S(\varphi)}$. It follows from these inequalities that the random variable under the sign of mathematical expectation in (4.11) is greater than $\exp\{-\varepsilon^{-2}S(\varphi) - \varepsilon^{-1}K\}$ with probability not smaller than $1/2$. This implies inequality (4.9).
Now we prove the second assertion of the theorem. We denote by $\tilde X$ the random vector with coordinates $(X_1, X_2, \ldots, X_{i_0}, 0, 0, \ldots)$. The choice of the index $i_0$ will be specified later. It is easy to verify that

$${\rm P}\{\rho(\varepsilon X, \Phi(s)) \ge \delta\} \le {\rm P}\{\varepsilon\tilde X \notin \Phi(s)\} + {\rm P}\{\rho(X, \tilde X) \ge \delta/\varepsilon\}.\tag{4.12}$$

The first probability is equal to

$${\rm P}\{S(\varepsilon\tilde X) > s\} = {\rm P}\{S(\tilde X) > s\varepsilon^{-2}\} = {\rm P}\left\{\sum_{i=1}^{i_0}\lambda_i^{-1}X_i^2 > 2s\varepsilon^{-2}\right\}.\tag{4.13}$$

The random variable $\sum_{i=1}^{i_0}\lambda_i^{-1}X_i^2$ is the sum of squares of $i_0$ independent normal random variables with parameters $(0,1)$, and consequently has a $\chi^2$-distribution with $i_0$ degrees of freedom. Using the expression for the density of a $\chi^2$-distribution, we obtain

$${\rm P}\{\varepsilon\tilde X \notin \Phi(s)\} = {\rm P}\left\{\sum_{i=1}^{i_0}\lambda_i^{-1}X_i^2 > 2s\varepsilon^{-2}\right\} = \int_{2s\varepsilon^{-2}}^\infty\frac{1}{\Gamma(i_0/2)2^{i_0/2}}\,x^{i_0/2-1}e^{-x/2}\,dx \le {\rm const}\cdot\varepsilon^{-i_0}\exp\{-\varepsilon^{-2}s\}\tag{4.14}$$

for $\varepsilon > 0$ sufficiently small. The second probability in (4.12) can be estimated by means of Chebyshev's exponential inequality:

$${\rm P}\{\rho(X, \tilde X) \ge \delta/\varepsilon\} \le \exp\left\{-\frac{c}{2}(\delta/\varepsilon)^2\right\}{\rm M}\exp\left\{\frac{c}{2}\rho(X, \tilde X)^2\right\}.\tag{4.15}$$

The mathematical expectation on the right side is finite for sufficiently large $i_0$. This can be proved in the same way as the finiteness of the mathematical expectation in (4.4); we have to take account of the convergence of the series $\sum\lambda_i$. Substituting $c = 2s\delta^{-2}$ in (4.15), we have

$${\rm P}\{\rho(X, \tilde X) \ge \delta/\varepsilon\} \le {\rm const}\cdot\exp\{-s\varepsilon^{-2}\}\tag{4.16}$$

for sufficiently large $i_0$. Combining formulas (4.13), (4.14) and (4.16), we obtain the last assertion of the theorem.
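The logarithmic rate in (4.14) is easy to check numerically; the sketch below (our own illustration, with the assumed choice $i_0 = 4$, for which the $\chi^2$ tail has a closed form) confirms that $\varepsilon^2\ln{\rm P}\{\chi^2_{i_0} > 2s\varepsilon^{-2}\} \to -s$:

```python
import math

# chi^2 with k = 4 degrees of freedom has the closed-form tail
# P{chi2_4 > z} = exp(-z/2) * (1 + z/2).
def chi2_4_tail(z):
    return math.exp(-z / 2.0) * (1.0 + z / 2.0)

# In (4.13)-(4.14), the first probability is P{chi2_{i0} > 2 s eps^{-2}};
# on the scale of the normalizing coefficient eps^{-2} it decays at rate s:
s = 1.0
for eps in (0.5, 0.2, 0.1, 0.05):
    z = 2.0 * s / (eps * eps)
    print(eps, eps * eps * math.log(chi2_4_tail(z)))   # -> -s = -1
```

The polynomial prefactor $\varepsilon^{-i_0}$ only contributes the vanishing correction $\varepsilon^2\ln(1 + s\varepsilon^{-2})$ visible in the slow convergence.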
This theorem enables us to calculate the action functional for Gaussian random processes and fields in arbitrary Hilbert norms. It is only required that the realizations of the process belong to the corresponding Hilbert space. In many examples, for example in problems concerning the crossing of a level by a random process or field, it is desirable to have estimates in the uniform norm. We can use imbedding theorems to obtain such estimates.

Let $D$ be a bounded domain in $R^r$ with smooth boundary. Let us denote by $W^l_2$ the function space on $D$ obtained from the space of infinitely differentiable functions in $D$ by completing it in the norm

$$\|u\|_{W^l_2} = \left(\sum_{|q|\le l}\int_D|D^q u|^2\,dx\right)^{1/2}.$$

§2. The Problem of Exit from a Domain

If, for every $a > 0$, the function $\varphi(t)$ assumes values inside $G$ for $t \in [\theta_{\partial G}(\varphi) - a, \theta_{\partial G}(\varphi))$, then we shall say that $\varphi(t)$ leaves $G$ in a regular manner.
Theorem 2.3. Suppose that the hypotheses of Theorem 2.1 are satisfied and there exists an extremal $\hat\varphi(t)$, $-\infty < t \le T$, unique up to translations, going from $O$ to $\partial D$ (namely, to the point $y_0$, Fig. 6). Let us extend $\hat\varphi(t)$ by continuity to $t > T$ in an arbitrary way. Let $G \subset D$ be a neighborhood, with smooth boundary, of the equilibrium position, and suppose that the extremal $\hat\varphi(t)$ leaves this neighborhood in a regular manner. Let us denote by $\theta^\varepsilon_{\partial G}$ the last moment of time at which the trajectory of $X^\varepsilon(t)$ is on $\partial G$ before exit onto $\partial D$: $\theta^\varepsilon_{\partial G} = \max\{t \le \tau^\varepsilon\colon X^\varepsilon(t) \in \partial G\}$. For any $\delta > 0$ and $x \in D$ we have

$$\lim_{\varepsilon\to 0}{\rm P}_x\left\{\max_t|X^\varepsilon(t) - \hat\varphi(t - \theta^\varepsilon_{\partial G} + \theta_{\partial G}(\hat\varphi))| < \delta\right\} = 1.$$

$\ldots \le V(O, y_0) + d$, where $d$ is a positive constant. Then we choose $\rho < d/(5L) \wedge \delta/2$ and the spheres $\gamma$ and $\Gamma'$. The functions $\varphi(t)$, $0 \le t \le T$, starting on $\Gamma'$, lying farther than $\delta/2$ from the translates of $\hat\varphi(t)$ and such that $S_{0T}(\varphi) \le V(O, y_0) + 0.65d$, do not reach the $\rho/2$-neighborhood of $\partial D$. As in the proof of Theorem 2.1, from this we can conclude that

$${\rm P}_x\left\{\tau(\gamma\cup\partial D) \le T,\ X^\varepsilon(\tau(\gamma\cup\partial D)) \in \partial D,\ \max_t|X^\varepsilon(t) - \hat\varphi(t - \theta^\varepsilon_{\partial G} + \theta_{\partial G}(\hat\varphi))| \ge \delta\right\}$$

$\ldots \le 2U(x)$.
On the other hand, if $\varphi_s$ is a solution of (3.2), then $\varphi_{-\infty} = 0$. This follows from the facts that $dU(\varphi_s)/ds = |\nabla U(\varphi_s)|^2 > 0$ for $\varphi_s \ne 0$ and that $O$ is the only zero of $U(x)$. From this we obtain

$$S_{-\infty,T}(\varphi) = 2\int_{-\infty}^T(\dot\varphi_s, \nabla U(\varphi_s))\,ds = 2[U(x) - U(O)] = 2U(x).$$

Consequently, for any curve $\varphi_s$ connecting $O$ and $x$ we have $S(\varphi) \ge 2U(x) = S(\hat\varphi)$. Therefore, $V(O, x) = \inf S(\varphi) = 2U(x)$, and $\hat\varphi_s$ is an extremal. If $U(x)$ is twice continuously differentiable, then the solution of equation (3.2) is unique, and consequently, the extremal from $O$ to $x$ normalized by the condition $\varphi_T = x$ is also unique.
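A quick numerical check of the equality $V(O,x) = 2U(x)$ (our own illustration, with the assumed one-dimensional field $b(x) = -x$, so $U(x) = x^2/2$): the action of the extremal $\varphi_s = x e^{s-T}$ over a long time window is close to $2U(x) = x^2$, while a comparison path costs strictly more:

```python
import math

# b(y) = -y, U(y) = y^2/2; the quasipotential should be V(O, x) = 2 U(x) = x^2
x, L, N = 1.0, 20.0, 20000
h = L / N

def action(phi, step):
    # S = (1/2) int |phi_dot - b(phi)|^2 ds with b(y) = -y
    total = 0.0
    for i in range(len(phi) - 1):
        dot = (phi[i + 1] - phi[i]) / step
        mid = 0.5 * (phi[i] + phi[i + 1])
        total += 0.5 * (dot + mid) ** 2 * step
    return total

# the extremal phi_s = x e^{s - T} (a solution of phi_dot = +U'(phi)) on [T - L, T]
ext = [x * math.exp(i * h - L) for i in range(N + 1)]
a_ext = action(ext, h)

# a straight line from 0 to x traversed in time 2 costs more
M = 20000
line = [x * i / M for i in range(M + 1)]
a_line = action(line, 2.0 / M)

print(a_ext, a_line)   # a_ext ~ x^2 = 1; a_line is larger
```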
It follows from the theorem just proved that if $b(x)$ has a potential, i.e., $b(x) = -\nabla U(x)$, then $V(O,x)$ differs from $U(x)$ only by a multiplicative constant. This is the reason why we call $V$ a quasipotential. If $b(x)$ has a decomposition (3.1), then from the orthogonality condition for $\nabla U(x)$ and $l(x) = b(x) + \nabla U(x)$ we obtain the equation for the quasipotential:

$$\tfrac12(\nabla V(O,x), \nabla V(O,x)) + (b(x), \nabla V(O,x)) = 0,\tag{3.4}$$

which is, in essence, Jacobi's equation for the variational problem defining the quasipotential. Theorem 3.1 says that the solution of this equation satisfying the conditions $V(O,O) = 0$, $V(O,x) > 0$ and $\nabla V(O,x) \ne 0$ for $x \ne O$ is the quasipotential. It can be proved that, conversely, if the quasipotential $V(O,x)$ is continuously differentiable, then it satisfies equation (3.4), i.e., we have decomposition (3.1). However, it is easy to find examples showing that $V(O,x)$ may not be differentiable.
The function $V(O,x)$ cannot be arbitrarily bad: it satisfies a Lipschitz condition; this can be derived easily from Lemma 2.3.

Now we pass to the study of extremals of $S(\varphi)$ going from $O$ to $x$. For them we can write Euler's equations in the usual way:

$$\ddot\varphi^k_t - \sum_{j=1}^r\left(\frac{\partial b^k}{\partial x^j} - \frac{\partial b^j}{\partial x^k}\right)\!(\varphi_t)\,\dot\varphi^j_t - \sum_{j=1}^r b^j(\varphi_t)\,\frac{\partial b^j}{\partial x^k}(\varphi_t) = 0,\qquad 1 \le k \le r.$$
4. Gaussian Perturbations of Dynamical Systems
If $b(x)$ has the decomposition (3.1), then we can write the simpler equations (3.2) for the extremals. From these equations we can draw a series of qualitative conclusions on the behavior of extremals. At every point of the domain, the velocity of the motion on an extremal is equal to $\nabla U(x) + l(x)$ according to (3.2), whereas $b(x) = -\nabla U(x) + l(x)$. From this we conclude that the velocity of the motion on an extremal is equal, in norm, to the velocity of the motion on trajectories of the system: $|\nabla U(x) + l(x)| = |-\nabla U(x) + l(x)| = |b(x)|$. The last assertion remains valid even if we do not assume the existence of the decomposition of $b(x)$. This follows from the lemma below.

Lemma 3.1. Let $S(\varphi) < \infty$. Let us denote by $\psi$ the function obtained from $\varphi$ by changing the parameter in such a way that $|\dot\psi_s| = |b(\psi_s)|$ for almost all $s$. Then $S(\psi) \le S(\varphi)$, where equality is attained only if $|\dot\varphi_s| = |b(\varphi_s)|$ for almost all $s$.
S((p) = 2 f
I b((cs)  c05I2 ds
T, 1
rtlTzl
2 JIIT,) t(
I b(ip,)  ip1s(t) ' Izs(t) dt
f (I b(w,) Izs(t) + I w, Izs(t)') dt  f ,(T2)
_
2
(b(ct), w,) dt
1(T1)
1(T11
t(Tz)
>t f(T,)
tlTz)
I b(1) I
I
,
I dt 
f
NT=1
dt,
OT I)
where t(s) is the inverse function of s(t). In (3.5) we have used the inequality axe + a'y2 > 2xy, true for any positive a. We define the function s(t) by the equality
t=
f
s(1)
10.1 Ib((P.,)I' du.
0
cps(,) is absolutely continuous This function is monotone increasing and in t. For this function we have I ip, I = I b((P,) I for almost all t and inequality (3.5) turns into the equality tlTz)
t(Tt)
This proves the lemma.
NTz)
fM)
§3. Properties of the Quasipotential. Examples
Now we consider some examples.

Example 3.1. Let us consider the dynamical system $\dot x_t = b(x_t)$ on the real line, where $b(0) = 0$, $b(x) > 0$ for $x < 0$ and $b(x) < 0$ for $x > 0$, and $D$ is an interval $(\alpha_1, \alpha_2) \subset R^1$ containing the point 0. If we write $U(x) = -\int_0^x b(y)\,dy$, then $b(x) = -dU/dx$, so that in the one-dimensional case every field has a potential. It follows from Theorems 2.1 and 3.1 that for small $\varepsilon$, with probability close to one, the first exit of the process $X^\varepsilon_t = x + \int_0^t b(X^\varepsilon_s)\,ds + \varepsilon w_t$ from the interval $(\alpha_1, \alpha_2)$ takes place through the endpoint at which $U$ assumes the smaller value. For the sake of definiteness, let $U(\alpha_1) < U(\alpha_2)$. The equation for the extremal has the form $\dot\varphi_s = -b(\varphi_s)$; we have to take that solution $\varphi_s$, $-\infty < s \le T$, of this equation which converges to zero as $s \to -\infty$ and for which $\varphi_T = \alpha_1$.
In the onedimensional case, the function v(x) = Px{XT,: = all can be calculated explicitly by solving the corresponding boundary value problem. We obtain 2
72
v`(x) _
t
ezc
Of course it is easy to see from this formula that lim,_0 Pc{XT, = at} = 1 if U(at) < U(a2). Using the explicit form of v(x), it is interesting to study how X, exits from (at, a2) if U(a,) = U(a2). In this case, Theorem 2.1 is not applicable. Using Laplace's method for the asymptotic estimation of integrals (cf., for example, Evgrafov [1]), we obtain that for x e (a,, az), r72e2e2u(,r)
4
ez exp{2r2U(012)} 2
1 U'(«z)1
ezt 2u(Y) dy  f_z exp{2E 2U(at)} I U'(«,) I 21 2
as r  0. From this we find that if U(a,) = U(a2) and U'(a;) 5e 0, i = 1, 2, then
Px{x;. _ «;}
IU'(«t)I' + IU'(«z)I
t
Consequently, in this case the trajectories of X, exit from the interval through both ends with positive probability as r r 0. The probability of exit through a; is inversely proportional to I U'(a;) I = b(a1) I
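The limiting exit law can be checked numerically from the explicit formula for $v^\varepsilon(x)$. The concrete potential below is an assumption of the sketch, not from the text: $U(y) = y^2/4$ on $[-2, 0]$ and $U(y) = y^2$ on $[0, 1]$, so the barrier heights are equal, $|U'(-2)| = 1$ and $|U'(1)| = 2$, and the predicted limit of the probability of exit through $\alpha_1 = -2$ is $1/(1 + 2) = 1/3$.

```python
import math

# Assumed illustrative potential with equal barrier heights U(-2) = U(1) = 1
# but different slopes |U'(-2)| = 1, |U'(1)| = 2.
def U(y):
    return y * y / 4 if y < 0 else y * y

def exit_prob_left(eps, a1=-2.0, a2=1.0, x=0.0, n=200001):
    """v^eps(x) = int_x^{a2} e^{2U/eps^2} dy / int_{a1}^{a2} e^{2U/eps^2} dy (Simpson)."""
    shift = 2.0 / eps ** 2                # subtract the max exponent to avoid overflow
    def integral(lo, hi):
        h = (hi - lo) / (n - 1)
        s = 0.0
        for i in range(n):
            w = 1 if i in (0, n - 1) else (4 if i % 2 else 2)
            s += w * math.exp(2 * U(lo + i * h) / eps ** 2 - shift)
        return s * h / 3
    right = integral(x, a2)
    return right / (integral(a1, x) + right)

print(exit_prob_left(0.1))   # close to 1/3: exit favors the endpoint with the larger |U'|
```

For decreasing $\varepsilon$ the computed value approaches the Laplace-method prediction; the exponent shift makes the quadrature stable even though the integrand itself is of order $e^{2/\varepsilon^2}$.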
4. Gaussian Perturbations of Dynamical Systems
EXAMPLE 3.2. Let us consider a homogeneous linear system $\dot x_t = Ax_t$ in $R^r$ with constant coefficients. Assume that the matrix $A$ is normal, i.e., $AA^* = A^*A$, and that the symmetric matrix $A + A^*$ is negative definite. In this case the origin of coordinates $O$ is an asymptotically stable equilibrium position. Indeed, the solution of our system can be represented in the form $x_t = e^{At}x_0$, where $x_0$ is the initial position. The normality of $e^{At}$ follows from that of $A$. Using this observation and the relation $e^{At}(e^{At})^* = e^{(A+A^*)t}$, we obtain

$$|x_t|^2 = (e^{At}x_0, e^{At}x_0) = (e^{(A+A^*)t}x_0, x_0) \le |x_0|\,|e^{(A+A^*)t}x_0|.$$

Since $A + A^*$ is negative definite, we have $|e^{(A+A^*)t}x_0| \to 0$ as $t \to \infty$. Consequently, $|x_t|^2 \to 0$ for any initial condition $x_0$. It is easy to verify by direct differentiation that the vector field $Ax$ admits the decomposition

$$Ax = -\nabla\bigl(-\tfrac14((A + A^*)x, x)\bigr) + \tfrac12(A - A^*)x.$$

Moreover, the vector fields $\nabla(-\tfrac14((A + A^*)x, x)) = -\tfrac12(A + A^*)x$ and $\tfrac12(A - A^*)x$ are orthogonal:

$$\bigl(\tfrac12(A + A^*)x, \tfrac12(A - A^*)x\bigr) = \tfrac14[(Ax, Ax) - (A^*x, A^*x)] = \tfrac14[(A^*Ax, x) - (AA^*x, x)] = 0.$$

Let the right sides of our system be perturbed by a white noise: $\dot X^\varepsilon_t = AX^\varepsilon_t + \varepsilon\dot w_t$. We are interested in how the trajectories of $X^\varepsilon_t$ exit from a bounded domain $D$ containing the equilibrium position $O$. From Theorem 3.1 and formula (3.6) we conclude that the quasipotential $V(O, x)$ of our dynamical system with respect to the equilibrium position $O$ is equal to $-\tfrac12((A + A^*)x, x)$. In order to find the point on the boundary $\partial D$ of $D$ near which the trajectories of $X^\varepsilon_t$ first leave $D$ with probability converging to 1 as $\varepsilon \to 0$, we have to find the minimum of $V(O, x)$ on $\partial D$. The equation for the extremals has the form

$$\dot\varphi_t = -\tfrac12(A + A^*)\varphi_t + \tfrac12(A - A^*)\varphi_t = -A^*\varphi_t.$$

If $y_0 \in \partial D$ is the unique point where $V(O, x)$ attains its smallest value on $\partial D$, then the last piece of a Markov trajectory is near the extremal entering this point. Up to a shift of time, the equation of this extremal can be written in the form $\varphi_t = e^{-A^*t}y_0$.
Figure 7.
For example, let the dynamical system

$$\dot x^1_t = -x^1_t + x^2_t, \qquad \dot x^2_t = -x^1_t - x^2_t$$

be given in the plane $R^2$. The matrix of this system is normal and the origin of coordinates is asymptotically stable. The trajectories of the system are logarithmic spirals winding onto the origin in the clockwise direction (Fig. 7). The quasipotential is equal to $(x^1)^2 + (x^2)^2$, so that with overwhelming probability for small $\varepsilon$, $X^\varepsilon_t$ exits from $D$ near the point $y_0 \in \partial D$ which is closest to the origin. It is easy to verify that the extremals are also logarithmic spirals, but this time unwinding from $O$ in the clockwise direction.
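A small numerical sketch of this example (the matrix is the one above; the integration step is an assumption of the illustration): it checks that $A$ is normal, that $A + A^* = -2I$, so that $V(O, x) = |x|^2$, and that along the extremal $\dot\varphi_t = -A^*\varphi_t$ the norm grows as $|\varphi_t| = e^t|\varphi_0|$, i.e., the quasipotential increases along the unwinding spiral.

```python
import math

A = [[-1.0, 1.0], [-1.0, -1.0]]            # the matrix of the system above

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

At = [[A[j][i] for j in range(2)] for i in range(2)]    # A* (the adjoint)
assert mat_mul(A, At) == mat_mul(At, A)                 # normality: A A* = A* A
assert all(A[i][j] + At[i][j] == (-2.0 if i == j else 0.0)
           for i in range(2) for j in range(2))         # A + A* = -2I

# Euler integration of the extremal phi' = -A* phi starting from y0 = (1, 0)
phi, dt, T = [1.0, 0.0], 1e-4, 1.0
for _ in range(int(T / dt)):
    phi = [phi[0] - dt * (At[0][0] * phi[0] + At[0][1] * phi[1]),
           phi[1] - dt * (At[1][0] * phi[0] + At[1][1] * phi[1])]
norm = math.hypot(phi[0], phi[1])
print(norm)     # approximately e = 2.718..., i.e. |phi_1| = e |phi_0|
```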
§4. Asymptotics of the Mean Exit Time and Invariant Measure for the Neighborhood of an Equilibrium Position

As in §2, let $D$ be a bounded domain in $R^r$, let $O \in D$ be a stable equilibrium position of system (1.1), and let $\tau^\varepsilon$ be the time of first exit of the process $X^\varepsilon_t$ from $D$. Under the assumption that $D$ is attracted to $O$, in this section we calculate the principal term of $\ln M_x\tau^\varepsilon$ as $\varepsilon \to 0$ and also that of $\ln \mu^\varepsilon(\tilde D)$, where $\mu^\varepsilon$ is the invariant measure of the process $X^\varepsilon_t$ and $\tilde D = R^r \setminus D$. Since the existence of an invariant measure and its properties depend on the behavior of $b(x)$ not only for $x \in D$, in the study of $\mu^\varepsilon(\tilde D)$ we have to make some assumptions concerning $b(x)$ in the whole space $R^r$.
Theorem 4.1. Let $O$ be an asymptotically stable equilibrium position of system (1.1) and assume that the domain $D \subset R^r$ is attracted to $O$. Furthermore, assume that the boundary $\partial D$ of $D$ is a smooth manifold and $(b(x), n(x)) < 0$ for $x \in \partial D$, where $n(x)$ is the exterior normal of the boundary of $D$. Then for $x \in D$ we have

$$\lim_{\varepsilon\to 0} \varepsilon^2 \ln M_x\tau^\varepsilon = \min_{y\in\partial D} V(O, y) = V_0. \eqno(4.1)$$

Here the function $V(O, y)$ is the quasipotential of the dynamical system (1.1) with respect to $O$.

The proof of this theorem uses arguments, constructions and notation employed in the proof of Theorem 2.1. To prove (4.1), it is sufficient to verify that for any $d > 0$ there exists $\varepsilon_0$ such that for $\varepsilon < \varepsilon_0$ we have

(a) $\varepsilon^2 \ln M_x\tau^\varepsilon < V_0 + d$;
(b) $\varepsilon^2 \ln M_x\tau^\varepsilon > V_0 - d$.
First we prove inequality (a). We choose positive numbers $\mu$, $h$, $T_1$, and $T_2$ such that the following conditions are satisfied: firstly, all trajectories of the unperturbed system, starting at points $x \in D \cup \partial D$, hit the $\mu/2$-neighborhood of the equilibrium position before time $T_1$ and after time $T_1$ they do not leave this neighborhood; secondly, for every point $x$ lying in the ball $G = \{x \in R^r: |x - O| < \mu\}$, there exists a function $\varphi^x_t$ such that $\varphi^x_0 = x$, $\varphi^x$ reaches the exterior of the $h$-neighborhood of $D$ at a time $T(x) \le T_2$, $\varphi^x$ does not hit the $\mu/2$-neighborhood of $O$ after its exit from $G$, and $S_{0T(x)}(\varphi^x) \le V_0 + d/2$.

The first of these assumptions can be satisfied because $O$ is an asymptotically stable equilibrium position, $D$ is attracted to $O$, and for $x \in \partial D$ we have the inequality $(b(x), n(x)) < 0$. The functions $\varphi^x$ occurring in the second condition were constructed in §2.

From the definition of the action functional, we obtain for $y \in G$ that

$$P_y\Bigl\{\sup_{0\le t\le T(y)} |X^\varepsilon_t - \varphi^y_t| < h\Bigr\} \ge \exp\Bigl\{-\varepsilon^{-2}\Bigl(S_{0T(y)}(\varphi^y) + \frac d2\Bigr)\Bigr\} \ge \exp\{-\varepsilon^{-2}(V_0 + d)\},$$

whenever $\varepsilon$ is sufficiently small. Since the point $\varphi^y_{T(y)}$ does not belong to the $h$-neighborhood of $D$, from the last inequality we conclude that

$$P_y\{\tau^\varepsilon < T_2\} \ge P_y\{\tau^\varepsilon \le T(y)\} \ge \exp\{-\varepsilon^{-2}(V_0 + d)\} \eqno(4.2)$$

for sufficiently small $\varepsilon$ and $y \in G$.
We denote by $\sigma$ the first entrance time of $G$: $\sigma = \min\{t: X^\varepsilon_t \in G\}$. Using the strong Markov property of $X^\varepsilon_t$, for any $x \in D$ and sufficiently small $\varepsilon$ we obtain

$$P_x\{\tau^\varepsilon < T_1 + T_2\} \ge M_x\{\sigma < T_1;\ P_{X^\varepsilon_\sigma}\{\tau^\varepsilon < T_2\}\} \ge P_x\{\sigma < T_1\}\exp\{-\varepsilon^{-2}(V_0 + d)\} \ge \tfrac12\exp\{-\varepsilon^{-2}(V_0 + d)\}. \eqno(4.3)$$

Here we have used inequality (4.2) for the estimation of $P_{X^\varepsilon_\sigma}\{\tau^\varepsilon < T_2\}$ and the fact that the trajectories of $X^\varepsilon_t$ converge in probability, uniformly on $[0, T_1]$, to the trajectories of the dynamical system, which reach $G$ before time $T_1$. Then, using the Markov property of $X^\varepsilon_t$, from (4.3) we obtain

$$M_x\tau^\varepsilon \le \sum_{n=0}^\infty (n + 1)(T_1 + T_2)P_x\{n(T_1 + T_2) < \tau^\varepsilon \le (n + 1)(T_1 + T_2)\} = (T_1 + T_2)\sum_{n=0}^\infty P_x\{\tau^\varepsilon > n(T_1 + T_2)\}$$
$$\le (T_1 + T_2)\sum_{n=0}^\infty \bigl[1 - \tfrac12\exp\{-\varepsilon^{-2}(V_0 + d)\}\bigr]^n = 2(T_1 + T_2)\exp\{\varepsilon^{-2}(V_0 + d)\}, \eqno(4.4)$$

which implies assertion (a).

We pass to the proof of inequality (b). As in the proof of Theorem 2.1, let $\gamma$ and $\Gamma$ be the spheres of radii $\mu/2$ and $\mu$ about $O$, with the corresponding cycle times $\tau_0 \le \sigma_0 \le \tau_1 \le \sigma_1 \le \cdots$. For $y \in \Gamma$ we write

$$P_y\{\tau^\varepsilon = \tau_1\} \le P_y\{\tau^\varepsilon = \tau_1 \le T\} + P_y\{\tau^\varepsilon = \tau_1 > T\}.$$
As follows from Lemma 2.2, $T$ can be chosen so large that for the second probability we have the estimate

$$P_y\{\tau^\varepsilon = \tau_1 > T\} \le \tfrac12\exp\{-\varepsilon^{-2}(V_0 - h)\}. \eqno(4.5)$$

In order to estimate the first probability, we note that the trajectories of $X^\varepsilon_t$, $0 \le t \le T$, for which $\tau^\varepsilon = \tau_1 \le T$, are a positive distance from the set $\{\varphi \in C_{0T}(R^r): \varphi_0 = y,\ S_{0T}(\varphi) \le V_0 - h/2\}$ of functions, provided that $h > 0$ is arbitrary and $\mu$ is sufficiently small. By virtue of the properties of the action functional, we obtain from this that

$$P_y\{\tau^\varepsilon = \tau_1 \le T\} \le \tfrac12\exp\{-\varepsilon^{-2}(V_0 - h)\}$$

for $y \in \Gamma$ and sufficiently small $\varepsilon, \mu > 0$. It follows from the last inequality and estimate (4.5) that

$$P(x, \partial D) \le \exp\{-\varepsilon^{-2}(V_0 - h)\}, \eqno(4.6)$$

whenever $\varepsilon$, $\mu$ are sufficiently small. As in Theorem 2.1, we denote by $\nu$ the smallest $n$ for which $Z_n = X^\varepsilon_{\tau_n} \in \partial D$. It follows from (4.6) that
$$P_x\{\nu > n\} \ge [1 - \exp\{-\varepsilon^{-2}(V_0 - h)\}]^n$$

for $x \in \gamma$. It is obvious that $\tau^\varepsilon = (\tau_1 - \tau_0) + (\tau_2 - \tau_1) + \cdots + (\tau_\nu - \tau_{\nu-1})$. Therefore, $M_x\tau^\varepsilon \ge \sum_{n=1}^\infty M_x\{\nu \ge n;\ \tau_n - \tau_{n-1}\}$. Using the strong Markov property of $X^\varepsilon_t$ with respect to the time $\sigma_{n-1}$, we obtain that

$$M_x\{\nu \ge n;\ \tau_n - \tau_{n-1}\} \ge M_x\{\nu \ge n;\ \tau_n - \sigma_{n-1}\} \ge P_x\{\nu \ge n\}\inf_{y\in\Gamma} M_y\min\{t: X^\varepsilon_t \in \gamma\}.$$

The last infimum is greater than some positive constant $t_1$ independent of $\varepsilon$. This follows from the circumstance that the trajectories of the dynamical system spend some positive time going from $\Gamma$ to $\gamma$. Combining all estimates obtained so far, we obtain that

$$M_x\tau^\varepsilon \ge t_1\sum_{n=1}^\infty P_x\{\nu \ge n\} \ge t_1\sum_{n=1}^\infty (1 - \exp\{-\varepsilon^{-2}(V_0 - h)\})^{n-1} = t_1\exp\{\varepsilon^{-2}(V_0 - h)\}$$

for sufficiently small $\varepsilon$ and $\mu$ and $x \in \gamma$. This implies assertion (b) for $x \in \gamma$. For any $x \in D$, taking into account that $P_x\{\tau^\varepsilon > \tau_1\} \to 1$ as $\varepsilon \to 0$, we have

$$M_x\tau^\varepsilon = M_x\{\tau^\varepsilon \le \tau_1;\ \tau^\varepsilon\} + M_x\{\tau^\varepsilon > \tau_1;\ \tau^\varepsilon\} \ge M_x\{\tau^\varepsilon > \tau_1;\ M_{X^\varepsilon_{\tau_1}}\tau^\varepsilon\} \ge t_1\exp\{\varepsilon^{-2}(V_0 - h)\}P_x\{\tau^\varepsilon > \tau_1\} \ge \tfrac{t_1}2\exp\{\varepsilon^{-2}(V_0 - h)\}.$$

By the same token, we have proved assertion (b) for any $x \in D$. $\square$
Remark. Analyzing the proof of this theorem, we can see that the assumptions that the manifold $\partial D$ is smooth and that $(b(x), n(x)) < 0$ for $x \in \partial D$ can be relaxed. It is sufficient to assume instead of them that the boundary of $D$ and that of the closure of $D$ coincide and that for any $x \in \partial D$, the trajectory $x_t(x)$ of the dynamical system is situated in $D$ for all $t > 0$.
We mention one more result relating to the distribution of the random variable $\tau^\varepsilon$, the first exit time from $D$.

Theorem 4.2. Suppose that the hypotheses of Theorem 4.1 are satisfied. Then for every $\alpha > 0$ and $x \in D$ we have

$$\lim_{\varepsilon\to 0} P_x\{e^{\varepsilon^{-2}(V_0-\alpha)} < \tau^\varepsilon < e^{\varepsilon^{-2}(V_0+\alpha)}\} = 1.$$

Proof. If

$$\varlimsup_{\varepsilon\to 0} P_x\{\tau^\varepsilon \ge e^{\varepsilon^{-2}(V_0+\alpha)}\} > 0$$

for some $\alpha > 0$, then $\varlimsup_{\varepsilon\to 0} \varepsilon^2\ln M_x\tau^\varepsilon \ge V_0 + \alpha$, which contradicts Theorem 4.1. Therefore, for any $\alpha > 0$ and $x \in D$ we have

$$\lim_{\varepsilon\to 0} P_x\{\tau^\varepsilon < \exp\{\varepsilon^{-2}(V_0 + \alpha)\}\} = 1. \eqno(4.7)$$
Further, using the notation introduced in the proof of Theorem 2.1, we can write

$$P_x\{\tau^\varepsilon \le e^{\varepsilon^{-2}(V_0-\alpha)}\} \le \sum_{n=1}^\infty P_x\{\nu = n,\ \tau^\varepsilon \le \exp\{\varepsilon^{-2}(V_0 - \alpha)\}\} + P_x\{\tau^\varepsilon = \tau_0\}. \eqno(4.8)$$

The last probability on the right side of (4.8) converges to zero. We estimate the remaining terms. Let $m_\varepsilon = [C\exp\{\varepsilon^{-2}(V_0 - \alpha)\}]$; we choose the constant $C$ later. For $x \in \gamma$ we have

$$\sum_{n=1}^\infty P_x\{\nu = n,\ \tau^\varepsilon \le \exp\{\varepsilon^{-2}(V_0 - \alpha)\}\} \le P_x\{\nu \le m_\varepsilon\} + \sum_{n=m_\varepsilon}^\infty P_x\{\nu = n,\ \tau_n \le \exp\{\varepsilon^{-2}(V_0 - \alpha)\}\}$$
$$\le P_x\{\nu \le m_\varepsilon\} + P_x\{\tau_{m_\varepsilon} \le \exp\{\varepsilon^{-2}(V_0 - \alpha)\}\}. \eqno(4.9)$$
Using the inequality $P_x\{\nu = 1\} \le \exp\{-\varepsilon^{-2}(V_0 - h)\}$, which holds for $x \in \gamma$, $h > 0$ and sufficiently small $\varepsilon$, we obtain that

$$P_x\{\nu \le m_\varepsilon\} \le 1 - (1 - \exp\{-\varepsilon^{-2}(V_0 - h)\})^{m_\varepsilon} \to 0 \eqno(4.10)$$

as $\varepsilon \to 0$, for any $C, \alpha > 0$ and $h$ sufficiently small. We estimate the second term on the right side of (4.9). There exists $\theta > 0$ such that $P_x\{\tau_1 > \theta\} > 1/2$ for all $x \in \gamma$ and $\varepsilon > 0$. For the number $S_m$ of successes in $m$ Bernoulli trials with probability of success $1/2$, we have the inequality $P\{S_m > m/3\} \ge 1 - \delta$ for $m \ge m_0$. Since $\tau_{m_\varepsilon} = (\tau_1 - \tau_0) + (\tau_2 - \tau_1) + \cdots + (\tau_{m_\varepsilon} - \tau_{m_\varepsilon-1})$ and each of these differences exceeds $\theta$ with probability at least $1/2$, using the strong Markov property of the process we obtain that

$$P_x\{\tau_{m_\varepsilon} \le e^{\varepsilon^{-2}(V_0-\alpha)}\} \le P\{S_{m_\varepsilon} \le m_\varepsilon/3\} \le \delta, \eqno(4.11)$$

whenever $\theta/3 > 1/C$ and $m_\varepsilon$ is sufficiently large. Combining estimates (4.8)-(4.11), we arrive at the relation $\lim_{\varepsilon\to 0} P_x\{\tau^\varepsilon \le e^{\varepsilon^{-2}(V_0-\alpha)}\} = 0$, which together with (4.7) proves the theorem. $\square$
(Fig. 8).
We outline the proof of this theorem. As we have already noted, condition A implies the existence and uniqueness of a finite invariant measure. To prove (4.16), it is sufficient to verify that for any $h > 0$ there exists $\varepsilon_0 = \varepsilon_0(h)$ such that for $\varepsilon < \varepsilon_0$ we have the inequalities

(a) $\mu^\varepsilon(\tilde D) \ge \exp\{-\varepsilon^{-2}(V_0 + h)\}$;
(b) $\mu^\varepsilon(\tilde D) \le \exp\{-\varepsilon^{-2}(V_0 - h)\}$,

where $V_0 = \inf_{x\in\tilde D} V(O, x)$.

If $V_0 = 0$, then inequalities (a) and (b) are obvious. We discuss the case $V_0 > 0$. It is clear that $V_0 > 0$ only if $\rho(O, \tilde D) = \rho_0 > 0$. For the proof of inequalities (a) and (b), we use the following representation of the invariant measure. As earlier, let $\gamma$ and $\Gamma$ be the spheres of radii $\rho/2$ and $\rho$, respectively, with center at the equilibrium position $O$, and let $\rho < \rho_0$. As in the proof of Theorem 2.1, we consider the increasing sequence of Markov times $\tau_0 \le \sigma_0 \le \tau_1 \le \sigma_1 \le \tau_2 \le \cdots$. Condition A implies that all these times are finite with probability 1.
The sequence $X^\varepsilon_{\tau_0}, X^\varepsilon_{\tau_1}, \ldots, X^\varepsilon_{\tau_n}, \ldots$ forms a Markov chain with compact phase space $\gamma$. The transition probabilities of this chain have positive density with respect to Lebesgue measure on $\gamma$. This implies that the chain has a unique normalized invariant measure $l^\varepsilon(dx)$. As follows, for example, from Khas'minskii [1], the normalized invariant measure $\mu^\varepsilon(\cdot)$ of $X^\varepsilon_t$ can be expressed in terms of the invariant measure $l^\varepsilon$ on $\gamma$ in the following way:

$$\mu^\varepsilon(\tilde D) = c_\varepsilon \int_\gamma M_x\Bigl[\int_0^{\tau_1} \chi_{\tilde D}(X^\varepsilon_s)\,ds\Bigr]\,l^\varepsilon(dx), \eqno(4.17)$$

where $\chi_{\tilde D}(x)$ is the indicator of $\tilde D$ and the factor $c_\varepsilon$ is determined from the normalization condition $\mu^\varepsilon(R^r) = 1$. We set $\tau_{\tilde D} = \min\{t: X^\varepsilon_t \in \tilde D \cup \partial\tilde D\}$. From (4.17) we obtain

$$\mu^\varepsilon(\tilde D) \le c_\varepsilon \max_{x\in\gamma} P_x\{\tau_{\tilde D} < \tau_1\}\,\max_{y\in\partial\tilde D} M_y\tau_1. \eqno(4.18)$$

It follows from condition A and the compactness of $\partial D$ that the quantities $M_y\tau_1$, $y \in \partial\tilde D$, are bounded uniformly in $\varepsilon$, and arguments similar to those used in the proof of Theorem 2.1 show that for sufficiently small $\varepsilon$,

$$\max_{x\in\gamma} P_x\{\tau_{\tilde D} < \tau_1\} \le \exp\{-\varepsilon^{-2}(V_0 - h/2)\}.$$

Since the normalizing factors $c_\varepsilon$ remain bounded as $\varepsilon \to 0$, this together with (4.18) implies assertion (b).

On the other hand,

$$\inf_{x\in\partial\tilde D} M_x\int_0^{\tau_1} \chi_{\tilde D}(X^\varepsilon_s)\,ds \ge a_2 > 0.$$

Combining these estimates, we obtain from (4.17) that for small $\varepsilon$,

$$\mu^\varepsilon(\tilde D) \ge c_\varepsilon \min_{x\in\gamma} P_x\{\min\{t: X^\varepsilon_t \in \tilde D\} < \tau_1\}\cdot a_2 \ge c_\varepsilon \exp\{-\varepsilon^{-2}(V_0 + h/2)\}\,a_2.$$

Taking into account that the factors $c_\varepsilon$ are also bounded away from zero, from this we obtain assertion (a).
We note that if b(x) has stable equilibrium positions other than zero, then the assertion of Theorem 4.3 is not true in general. The behavior of the invariant measure in the case of a field b(x) of a more general form will be considered in Ch.6.
§5. Gaussian Perturbations of General Form

Let the function $b(x, y)$, $x \in R^r$, $y \in R^l$, with values in $R^r$ be such that

$$|b(x_1, y_1) - b(x_2, y_2)| \le K(|x_1 - x_2| + |y_1 - y_2|).$$

We consider a random process $X^\varepsilon_t = X^\varepsilon_t(x)$, $t \ge 0$, satisfying the differential equation

$$\dot X^\varepsilon_t = b(X^\varepsilon_t, \varepsilon\zeta_t), \qquad X^\varepsilon_0 = x, \eqno(5.1)$$

where $\zeta_t$ is a Gaussian random process in $R^l$. We shall assume that $\zeta_t$ has zero mean and continuous trajectories with probability 1. As is known, for the continuity it is sufficient that the correlation matrix $a(s, t) = (a^{ij}(s, t))$, $a^{ij}(s, t) = M\zeta^i_s\zeta^j_t$, have twice differentiable entries. We write $b(x) = b(x, 0)$. The process $X^\varepsilon_t$ can be considered as a result of small perturbations of the dynamical system $x_t = x_t(x)$:

$$\dot x_t = b(x_t), \qquad x_0 = x.$$

It was proved in Ch. 2 that $X^\varepsilon_t \to x_t$ uniformly on every finite time interval as $\varepsilon \to 0$. In the same chapter we studied the expansion of $X^\varepsilon_t$ in powers of the small parameter $\varepsilon$. In the present section we deal with large deviations of $X^\varepsilon_t$ from the trajectories of the dynamical system $x_t$. For continuous functions $\varphi_s$, $s \ge 0$, with values in $R^l$ we define an operator $B_x(\varphi) = u$ by the formula

$$u = u_t = \int_0^t b(u_s, \varphi_s)\,ds + x.$$

In other words, $u_t = B_x(\varphi)$ is the solution of equation (5.1) in which $\varepsilon\zeta_t$ is replaced by $\varphi_t$, with the initial condition $u_0 = x$. In terms of $B_x$ we can write $X^\varepsilon_t = B_x(\varepsilon\zeta)_t$.
We shall denote by $\|\cdot\|_C$ and $\|\cdot\|_{L_2}$ the norms in $C_{0T}(R^r)$ and $L^2_{0T}(R^l)$, respectively.

Lemma 5.1. Suppose that the function $b(x, y)$ satisfies a Lipschitz condition with constant $K$ and $u = B_x(\varphi)$, $v = B_x(\psi)$, where $\varphi, \psi \in C_{0T}(R^l)$. Then

$$\|u - v\|_C \le K\sqrt T\,e^{KT}\|\varphi - \psi\|_{L_2}.$$

Proof. By the definition of $B_x$ we have

$$|u_t - v_t| = \Bigl|\int_0^t [b(u_s, \varphi_s) - b(v_s, \psi_s)]\,ds\Bigr| \le K\int_0^t |u_s - v_s|\,ds + K\int_0^t |\varphi_s - \psi_s|\,ds \le K\int_0^t |u_s - v_s|\,ds + K\sqrt T\,\|\varphi - \psi\|_{L_2},$$

and the required estimate follows from Gronwall's inequality. $\square$

Let $\tau^\varepsilon(x) = \min\{t \ge 0: X^\varepsilon_t(x) \notin D\}$ be the time of first exit from $D$. We may try to prove the following analogue of Theorem 4.1:

$$\lim_{\varepsilon\to 0} \varepsilon^2\ln M\tau^\varepsilon(x) = V_0 = \inf\{S_{0T}(\varphi): \varphi_0 = O,\ \varphi_T \in \partial D,\ T > 0\}. \eqno(5.3)$$
Nevertheless, it becomes clear immediately that this is not so simple. First of all, an analysis of the proposed plan of proof shows that the role of the limit of $\varepsilon^2\ln M\tau^\varepsilon(x)$ may also presumably be played by

$$V_0^* = \inf\{S_{-\infty\,T}(\varphi): \lim_{t\to-\infty}\varphi_t = O,\ \varphi_T \in \partial D\},$$

the infimum being taken over functions defined on half-lines $(-\infty, T]$. In the case of Markov perturbations, $V_0$ and $V_0^*$ obviously coincide, but in the non-Markov case they may not. Moreover, in the proofs of Theorems 2.1, 4.1, and 4.2 we used a construction involving cycles, dividing a trajectory of the Markov process $X^\varepsilon_t$ into parts, the dependence among which could be accounted for and turned out to be sufficiently small.
For an arbitrary stationary perturbation $\varepsilon\zeta_t$ we do not have anything similar: we have to impose on the stationary process $\zeta_t$ conditions ensuring a weakening of the dependence as time goes on. Since we are dealing with probabilities converging to zero (probabilities of large deviations), the strong mixing property

$$\sup\{|P(A \cap B) - P(A)P(B)|: A \in F_{\le 0},\ B \in F_{\ge t}\} \to 0 \quad \text{as } t \to \infty,$$

where $F_{\le 0}$ and $F_{\ge t}$ are the $\sigma$-algebras generated by the values of the process before time 0 and after time $t$, turns out to be insufficient; we need more precise conditions. These problems are considered in Grin's works [1], [2]; in particular, for a certain class of processes $X^\varepsilon_t$, the infima $V_0$ and $V_0^*$ coincide and (5.3) is satisfied.
Chapter 5

Perturbations Leading to Markov Processes

§1. Legendre Transformation

In this chapter we shall consider theorems on the asymptotics of probabilities of large deviations for Markov random processes. These processes can be viewed as generalizations of the scheme of summing independent random variables; the constructions used in the study of large deviations for Markov processes generalize constructions encountered in the study of sums of independent terms.

The first general limit theorems for probabilities of large deviations of sums of independent random variables are contained in Cramér's paper [1]. The basic assumption there is the finiteness of exponential moments; the results can be formulated in terms of the Legendre transforms of some convex functions connected with the exponential moments of the random variables. The families of random processes we are going to consider are analogues of the schemes of sums of random variables with finite exponential moments, so that the Legendre transformation turns out to be essential in our case as well. First we consider this transformation and its application to families of measures in finite-dimensional spaces.

Let $H(\alpha)$ be a function of an $r$-dimensional vector argument, assuming its values in $(-\infty, +\infty]$ and not identically equal to $+\infty$. Suppose that $H(\alpha)$ is convex and lower semicontinuous. (We note that the condition of semicontinuity, and even continuity, is satisfied automatically for all $\alpha$ with the exception of the boundary of the set $\{\alpha: H(\alpha) < \infty\}$.) To this function the Legendre transformation assigns the function defined by the formula

$$L(\beta) = \sup_\alpha\,[(\alpha, \beta) - H(\alpha)], \eqno(1.1)$$

where $(\alpha, \beta) = \sum_{i=1}^r \alpha_i\beta^i$ is the scalar product.

It is easy to prove that $L$ is again a function of the same class as $H$, i.e., it is convex, lower semicontinuous, assumes values in $(-\infty, +\infty]$ and is not identically equal to $+\infty$. The following properties of the Legendre transformation can be found in Rockafellar's book [1]. The inverse of the Legendre transformation is this transformation itself:

$$H(\alpha) = \sup_\beta\,[(\alpha, \beta) - L(\beta)] \eqno(1.2)$$

(Rockafellar [1], Theorem 12.2). The functions $L$ and $H$ coupled by relations (1.1) and (1.2) are said to be conjugate, which we shall denote in the following way: $H(\alpha) \leftrightarrow L(\beta)$. At points $\alpha_0$ interior for the set $\{\alpha: H(\alpha) < \infty\}$ with respect to its affine hull, $H$ is subdifferentiable, i.e., it has a (generally nonunique) subgradient, a vector $\beta_0$ such that for all $\alpha$,

$$H(\alpha) \ge H(\alpha_0) + (\alpha - \alpha_0, \beta_0) \eqno(1.3)$$

(Rockafellar [1], Theorem 23.4; geometrically speaking, a subgradient is the angular coefficient of a nonvertical supporting plane of the set of points above the graph of the function). The multivalued mapping assigning to every point the set of subgradients of the function $H$ at that point is the inverse of the same mapping for $L$, i.e., (1.3) for all $\alpha$ is equivalent to the inequality

$$L(\beta) \ge L(\beta_0) + (\alpha_0, \beta - \beta_0) \eqno(1.4)$$

for all $\beta$ (Rockafellar [1], Theorem 23.5, Corollary 23.5.1). We have $L(\beta)/|\beta| \to \infty$ as $|\beta| \to \infty$ if and only if $H(\alpha) < \infty$ in some neighborhood of $\alpha = 0$.

For functions $H$, $L$ which are smooth inside their domains of finiteness, the determination of the conjugate function reduces to the classical Legendre transformation: we have to find the solution $\alpha = \alpha(\beta)$ of the equation $\nabla H(\alpha) = \beta$, and $L(\beta)$ is determined from the formula

$$L(\beta) = (\alpha(\beta), \beta) - H(\alpha(\beta)); \eqno(1.5)$$

moreover, we have $\alpha(\beta) = \nabla L(\beta)$. If one of two functions conjugate to each other is continuously differentiable $n \ge 2$ times and the matrix of its second-order derivatives is positive definite, then the other function has the same smoothness, and the matrices of the second-order derivatives at corresponding points are inverses of each other:

$$\Bigl(\frac{\partial^2 L}{\partial\beta_i\,\partial\beta_j}(\beta_0)\Bigr) = \Bigl(\frac{\partial^2 H}{\partial\alpha_i\,\partial\alpha_j}(\alpha_0)\Bigr)^{-1}.$$

EXAMPLE 1.1. Let $H(\alpha) = r(e^\alpha - 1) + l(e^{-\alpha} - 1)$, $\alpha \in R^1$, $r, l > 0$. Upon solving the equation $H'(\alpha) = re^\alpha - le^{-\alpha} = \beta$, we find that $\alpha(\beta) = \ln[(\beta + \sqrt{\beta^2 + 4rl})/(2r)]$ and

$$L(\beta) = \beta\ln\frac{\beta + \sqrt{\beta^2 + 4rl}}{2r} - \sqrt{\beta^2 + 4rl} + r + l.$$
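The closed form of Example 1.1 can be checked against a direct numerical evaluation of the supremum in (1.1); the values of $r$, $l$ and the search grid below are assumptions of the sketch.

```python
import math

r, l = 2.0, 0.5                       # assumed rates for the illustration

def H(a):
    return r * (math.exp(a) - 1.0) + l * (math.exp(-a) - 1.0)

def L_closed(b):
    s = math.sqrt(b * b + 4 * r * l)
    return b * math.log((b + s) / (2 * r)) - s + r + l

def L_numeric(b, lo=-10.0, hi=10.0, n=100000):
    # sup_a [a*b - H(a)] over a fine grid
    best = -float("inf")
    for i in range(n + 1):
        a = lo + i * (hi - lo) / n
        best = max(best, a * b - H(a))
    return best

for b in (0.5, 1.5, 3.0):
    assert abs(L_numeric(b) - L_closed(b)) < 1e-3
print("closed form agrees with the numerical Legendre transform")
```

Note that $L(r - l) = 0$ (here $L(1.5) = 0$): the Legendre transform of a cumulant vanishes at the mean drift, in agreement with the role this function plays in §2 below.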
It turns out that the rough asymptotics of families of probability measures in $R^r$ can be connected with the Legendre transform of the logarithm of exponential moments. The following two theorems are borrowed from Gärtner [2], [3] with some changes. Let $\mu^h$ be a family of probability measures in $R^r$ and put

$$H^h(\alpha) = \ln\int_{R^r} \exp\{(\alpha, x)\}\,\mu^h(dx).$$

The function $H^h$ is convex: by the Hölder inequality, for $0 < c < 1$ we have

$$H^h(c\alpha_1 + (1 - c)\alpha_2) = \ln\int_{R^r} \exp\{c(\alpha_1, x)\}\exp\{(1 - c)(\alpha_2, x)\}\,\mu^h(dx)$$
$$\le \ln\Bigl[\Bigl(\int_{R^r} \exp\{(\alpha_1, x)\}\,\mu^h(dx)\Bigr)^c\Bigl(\int_{R^r} \exp\{(\alpha_2, x)\}\,\mu^h(dx)\Bigr)^{1-c}\Bigr] = cH^h(\alpha_1) + (1 - c)H^h(\alpha_2);$$
$H^h$ is lower semicontinuous (this can be proved easily by means of Fatou's lemma), assumes values in $(-\infty, +\infty]$ and is not identically equal to $+\infty$, since $H^h(0) = 0$. Let $\lambda(h)$ be a numerical-valued function converging to $+\infty$ as $h \downarrow 0$. We assume that the limit

$$H(\alpha) = \lim_{h\downarrow 0}\lambda(h)^{-1}H^h(\lambda(h)\alpha) \eqno(1.6)$$

exists for all $\alpha$. This function is also convex and $H(0) = 0$. We stipulate that it be lower semicontinuous, not assume the value $-\infty$ and be finite in some neighborhood of $\alpha = 0$. Let $L(\beta) \leftrightarrow H(\alpha)$.

Theorem 1.1. For the family of measures $\mu^h$ and the functions $\lambda$ and $L$, condition (II) of §3, Ch. 3 holds, i.e., for any $\delta > 0$, $\gamma > 0$, $s > 0$ there exists $h_0 > 0$ such that for all $h \le h_0$ we have

$$\mu^h\{y: \rho(y, \Phi(s)) \ge \delta\} \le \exp\{-\lambda(h)(s - \gamma)\}, \eqno(1.7)$$

where $\Phi(s) = \{\beta: L(\beta) \le s\}$.
Proof. The set $\Phi(s)$ can be represented as an uncountable intersection of half-spaces:

$$\Phi(s) = \bigcap_\alpha\{\beta: (\alpha, \beta) - H(\alpha) \le s\}.$$

One can choose finitely many vectors $\alpha_1, \ldots, \alpha_n$ so that the intersection of the corresponding half-spaces is contained in the $\delta$-neighborhood of $\Phi(s)$; then

$$\mu^h\{y: \rho(y, \Phi(s)) \ge \delta\} \le \mu^h\Bigl(\bigcup_{i=1}^n\{y: (\alpha_i, y) - H(\alpha_i) > s\}\Bigr) \le \sum_{i=1}^n \mu^h\{y: (\alpha_i, y) - H(\alpha_i) > s\}$$
$$\le \sum_{i=1}^n \int_{R^r} \exp\{\lambda(h)[(\alpha_i, y) - H(\alpha_i) - s]\}\,\mu^h(dy) = \sum_{i=1}^n \exp\{\lambda(h)[\lambda(h)^{-1}H^h(\lambda(h)\alpha_i) - H(\alpha_i)]\}\times\exp\{-\lambda(h)s\}.$$

We obtain (1.7) from this by taking account of (1.6). $\square$
We shall say that a convex function $L$ is strictly convex at a point $\beta_0$ if there exists $\alpha_0$ such that

$$L(\beta) > L(\beta_0) + (\alpha_0, \beta - \beta_0) \eqno(1.8)$$

for all $\beta \ne \beta_0$. For a function $L$ to be strictly convex at all points interior to the set $\{\beta: L(\beta) < \infty\}$ with respect to its affine hull (with the notation of Rockafellar's book [1], §§4, 6, at the points of the set $\mathrm{ri}(\mathrm{dom}\,L)$), it is sufficient that the function $H$ conjugate to $L$ be sufficiently smooth, i.e., that the set $\{\alpha: H(\alpha) < \infty\}$ have interior points, that $H$ be differentiable at them, and that if a sequence of points $\alpha_i$ converges to a boundary point of the set $\{\alpha: H(\alpha) < \infty\}$, then $|\nabla H(\alpha_i)| \to \infty$ (cf. Rockafellar [1], Theorem 26.3).
Theorem 1.2. Let the assumptions imposed on $\mu^h$, $H^h$ and $H$ earlier be satisfied. Moreover, let the function $L$ be strictly convex at the points of a dense subset of $\{\beta: L(\beta) < \infty\}$.

Then for the family of measures $\mu^h$ and the functions $\lambda$ and $L$, condition (I) of §3 of Ch. 3 is satisfied, i.e., for any $\delta > 0$, $\gamma > 0$ and $x \in R^r$ there exists $h_0 > 0$ such that for $h \le h_0$ we have

$$\mu^h\{y: \rho(y, x) < \delta\} \ge \exp\{-\lambda(h)[L(x) + \gamma]\}. \eqno(1.9)$$

Proof. It is sufficient to prove the assertion of the theorem for points $x$ at which $L$ is strictly convex. Indeed, the fulfillment of the assertion for such $x$ is equivalent to its fulfillment for all $x$, the same function $\lambda$ and the function $\tilde L(x)$ defined as $L(x)$ if $L$ is strictly convex at $x$ and as $+\infty$ otherwise. At the points where $L(x) < \tilde L(x)$, we have to use the circumstance that $L(x) = \lim_{y\to x} L(y)$, the limit being taken along points of strict convexity, and the remark made in §3 of Ch. 3.

Suppose that $L$ is strictly convex at $x$. We choose $\alpha_0$ so that $L(\beta) > L(x) + (\alpha_0, \beta - x)$ for $\beta \ne x$. Then we have

$$H(\alpha_0) = \sup_\beta\,[(\alpha_0, \beta) - L(\beta)] = (\alpha_0, x) - L(x). \eqno(1.10)$$

Since $H(\alpha_0)$ is finite, $H^h(\lambda(h)\alpha_0)$ is also finite for sufficiently small $h$. For such $h$ we consider the probability measure $\mu^{h,\alpha_0}$ defined by the relation

$$\mu^{h,\alpha_0}(\Gamma) = \int_\Gamma \exp\{\lambda(h)(\alpha_0, y) - H^h(\lambda(h)\alpha_0)\}\,\mu^h(dy).$$

We use the mutual absolute continuity of $\mu^h$ and $\mu^{h,\alpha_0}$:

$$\mu^h\{y: \rho(y, x) < \delta\} \ge \int_{\{\rho(y,x)<\delta'\}} \exp\{-\lambda(h)(\alpha_0, y) + H^h(\lambda(h)\alpha_0)\}\,\mu^{h,\alpha_0}(dy)$$

for $\delta' \le \delta$. By (1.6) and (1.10), on the set of integration the exponent differs from $-\lambda(h)L(x)$ by a quantity not exceeding $\lambda(h)[|\alpha_0|\delta' + o(1)]$, while $\mu^{h,\alpha_0}\{y: \rho(y, x) \ge \delta'\}$ converges to zero as $h \downarrow 0$; the latter is verified by means of Theorem 1.1 applied to the family $\mu^{h,\alpha_0}$, whose limiting rate function attains its minimum only at the point $x$ because of the strict convexity of $L$ at $x$. Choosing $\delta'$ so small that $|\alpha_0|\delta' < \gamma/2$, we obtain (1.9) for sufficiently small $h$. $\square$

Consequently, if the hypotheses of Theorems 1.1 and 1.2 are satisfied, then $\lambda(h)L(x)$ is the action function for the family of measures $\mu^h$ as $h \downarrow 0$.
The following example shows that the requirement of strict convexity of $L$ on a set dense in $\{\beta: L(\beta) < \infty\}$ cannot be omitted.

EXAMPLE 1.2. For the family of Poisson distributions, $\mu^h(\Gamma) = \sum_{k\in\Gamma} h^k e^{-h}/k!$, we have $H^h(\alpha) = h(e^\alpha - 1)$. If we are interested in values of $h$ converging to zero and put $\lambda(h) = -\ln h$, then we obtain

$$H(\alpha) = \lim_{h\downarrow 0}(-\ln h)^{-1}H^h(-\alpha\ln h) = \lim_{h\downarrow 0}\frac{h(h^{-\alpha} - 1)}{-\ln h} = \begin{cases} 0, & \alpha \le 1, \\ +\infty, & \alpha > 1; \end{cases}$$

$$L(\beta) = \begin{cases} +\infty, & \beta < 0, \\ \beta, & \beta \ge 0. \end{cases}$$

But the normalized action function found by us in §3, Ch. 3 is different from $+\infty$ only for nonnegative integral values of the argument.

Another example: we take a continuous finite function $S(x)$ which is not convex and is such that $S(x)/|x| \to \infty$ as $|x| \to \infty$ and $\min S(x) = 0$. As $\mu^h$ we take the probability measure with density $C(h)\exp\{-\lambda(h)S(x)\}$, where $\lambda(h) \to \infty$ as $h \downarrow 0$. Here the normalized action function will be $S(x)$, but the Legendre transform of the function

$$H(\alpha) = \lim_{h\downarrow 0}\lambda(h)^{-1}\ln\int\exp\{\lambda(h)(\alpha, x)\}\,\mu^h(dx)$$

will be equal not to $S(x)$ but rather to the convex hull $L(x)$ of $S(x)$. In those domains where $S$ is not convex, $L$ will be linear and consequently not strictly convex.
The following examples show how the theorems proved above can be applied to obtain rough limit theorems on large deviations for sums of independent random variables. Of course, they can also be derived from the precise results of Cramér [1] (at least in the one-dimensional case).

EXAMPLE 1.3. Let $\xi_1, \xi_2, \ldots, \xi_n, \ldots$ be a sequence of independent identically distributed random vectors and let

$$H_0(\alpha) = \ln M e^{(\alpha,\xi_1)}$$

be finite for sufficiently small $|\alpha|$. We are interested in the rough asymptotics as $n \to \infty$ of the distribution of the arithmetic mean $n^{-1}(\xi_1 + \cdots + \xi_n)$. We have

$$H^n(\alpha) = \ln M\exp\Bigl\{\Bigl(\alpha, \frac{\xi_1 + \cdots + \xi_n}{n}\Bigr)\Bigr\} = nH_0(n^{-1}\alpha).$$

We put $\lambda(n) = n$. Then not only does $\lambda(n)^{-1}H^n(\lambda(n)\alpha)$ converge to $H_0(\alpha)$ but it also coincides with it.

The function $H_0$ is infinitely differentiable at interior points of the set $\{\alpha: H_0(\alpha) < \infty\}$. If it also satisfies the condition $|\nabla H_0(\alpha_i)| \to \infty$ as the points $\alpha_i$ converge to a boundary point of the above set, then its Legendre transform $L_0$ is strictly convex and the asymptotics of the distribution of the arithmetic mean is given by the action function $nL_0(x)$.

EXAMPLE 1.4. Under the hypotheses of the preceding example, we consider the distributions of the random vectors $B_n^{-1}\sum_{k=1}^n(\xi_k - M\xi_k)$, where $B_n$ is a sequence going to $\infty$ faster than $\sqrt n$ but slower than $n$. We have

$$H^n(\alpha) = \ln M\exp\Bigl\{\Bigl(\alpha, B_n^{-1}\sum_{k=1}^n(\xi_k - M\xi_k)\Bigr)\Bigr\} = n[H_0(B_n^{-1}\alpha) - (B_n^{-1}\alpha, \nabla H_0(0))]$$
$$= n\Bigl[\frac12\sum_{i,j}\frac{\partial^2 H_0}{\partial\alpha_i\,\partial\alpha_j}(0)\,\alpha_i\alpha_j B_n^{-2} + O(|\alpha|^3 B_n^{-3})\Bigr].$$

If as the normalizing coefficient $\lambda(n)$ we take $B_n^2/n$, then we obtain

$$H(\alpha) = \lim_{n\to\infty}\lambda(n)^{-1}H^n(\lambda(n)\alpha) = \frac12\sum_{i,j}\frac{\partial^2 H_0}{\partial\alpha_i\,\partial\alpha_j}(0)\,\alpha_i\alpha_j.$$

If the matrix of this quadratic form, i.e., the covariance matrix of the random vector $\xi_1$, is nonsingular, then the Legendre transform of $H$ has the form

$$L(\beta) = \frac12\sum_{i,j} a_{ij}\beta^i\beta^j,$$

where $(a_{ij}) = \bigl((\partial^2 H_0/\partial\alpha_i\,\partial\alpha_j)(0)\bigr)^{-1}$. The action function for the family of random vectors under consideration is $(B_n^2/n)L(\beta)$. In particular, this means that

$$\lim_{\delta\downarrow 0}\,\lim_{n\to\infty}\frac{n}{B_n^2}\ln P\Bigl\{\Bigl|B_n^{-1}\sum_{k=1}^n(\xi_k - M\xi_k) - x\Bigr| < \delta\Bigr\} = -L(x).$$
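Returning to Example 1.3, the action function $nL_0(x)$ can be observed in a concrete case. The Bernoulli choice below, $P\{\xi = 0\} = P\{\xi = 1\} = 1/2$, is an assumption of the sketch; then $H_0(\alpha) = \ln\frac{1 + e^\alpha}{2}$, its Legendre transform is $L_0(x) = x\ln 2x + (1 - x)\ln 2(1 - x)$, and the exact binomial tail already exhibits the rate $L_0(x)$ for moderate $n$.

```python
import math

def L0(x):
    # Legendre transform of H0(a) = ln((1 + e^a)/2) for Bernoulli(1/2) summands
    return x * math.log(2 * x) + (1 - x) * math.log(2 * (1 - x))

def log_tail(n, x):
    """ln P{(xi_1 + ... + xi_n)/n >= x}, computed exactly from binomial weights."""
    k0 = math.ceil(n * x)
    logs = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            - n * math.log(2.0) for k in range(k0, n + 1)]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))

n, x = 4000, 0.7
rate = -log_tail(n, x) / n
print(rate, L0(x))    # the empirical rate is close to L0(0.7) = 0.0823...
```

The residual discrepancy is of order $n^{-1}\ln n$, coming from the polynomial prefactor that the rough (logarithmic) asymptotics ignores.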
§2. Locally Infinitely Divisible Processes

Discontinuous Markov processes which can be considered as a result of random perturbations of dynamical systems arise in various problems. We consider an example.

Let two nonnegative functions $l(x)$ and $r(x)$ be given on the real line. For every $h > 0$ we consider the following Markov process $X^h_t$ on those points of the real line which are multiples of $h$: if the process begins at a point $x$, then over time $dt$ it jumps the distance $h$ to the right with probability $h^{-1}r(x)\,dt$ (up to infinitesimals of higher order as $dt \to 0$) and to the left with probability $h^{-1}l(x)\,dt$ (it jumps more than once with probability $o(dt)$). For small $h$, in a first approximation the process can be described by the differential equation $\dot x_t = r(x_t) - l(x_t)$ (the exact meaning of this is as follows: under certain assumptions on $r$ and $l$ it can be proved that as $h \downarrow 0$, $X^h_t$ converges in probability to the solution of the differential equation with the same initial condition).

A more concrete version of this example is as follows: in a culture medium of volume $V$ there are bacteria whose rates $c_+$ and $c_-$ of division and death depend on the concentration of bacteria in the given volume. An appropriate mathematical model of the process of the variation of the concentration of bacteria with time is a Markov process $X^h_t$ of the form described above with $h = V^{-1}$, $r(x) = x\,c_+(x)$ and $l(x) = x\,c_-(x)$.
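The convergence of $X^h_t$ to the solution of $\dot x_t = r(x_t) - l(x_t)$ as $h \downarrow 0$ can be watched in a simulation. The rates below, $c_+(x) = 2 - x$ and $c_-(x) = 1$, i.e., $r(x) = x(2 - x)$ and $l(x) = x$, are assumptions of the sketch; the limiting equation is then the logistic equation $\dot x = x(1 - x)$, whose solution is known explicitly.

```python
import math
import random

def simulate(h, x0, T, rng):
    """One trajectory of X^h: jumps of +-h at rates r(x)/h and l(x)/h (assumed rates)."""
    x, t = x0, 0.0
    while t < T:
        r_rate = x * (2.0 - x) / h          # r(x)/h, rate of +h jumps
        l_rate = x / h                      # l(x)/h, rate of -h jumps
        total = r_rate + l_rate
        if total <= 0.0:
            break                           # absorbed at x = 0
        t += rng.expovariate(total)         # exponential waiting time to the next jump
        if t >= T:
            break
        x += h if rng.random() < r_rate / total else -h
    return x

rng = random.Random(7)
h, x0, T = 0.001, 0.1, 3.0
ode = 1.0 / (1.0 + (1.0 / x0 - 1.0) * math.exp(-T))     # exact logistic solution x_T
avg = sum(simulate(h, x0, T, rng) for _ in range(20)) / 20
print(avg, ode)      # the simulated values track the deterministic solution for small h
```

The fluctuations of a single path around the deterministic solution are of order $\sqrt h$, in agreement with the diffusion approximation mentioned below.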
It is natural to consider the process $X^h_t$ as a result of a random perturbation of the differential equation $\dot x_t = r(x_t) - l(x_t)$ (a result of a small random perturbation for small $h$). As in the case of perturbations of white noise type, we may be interested in the probabilities of events of the form $\{\rho_{0T}(X^h, \varphi) < \delta\}$, etc. (probabilities of "large deviations"). As we have already mentioned, the first approximation of $X^h$ for small $h$ is the solution of the differential equation; the second approximation is a diffusion process with drift $r(x) - l(x)$ and small local variance $h(r(x) + l(x))$. Nevertheless, this approximation does not work for large deviations: as we shall see, the probabilities of large deviations for the family of processes $X^h_t$ can be described by means of an action functional not coinciding with the action functional of the diffusion processes.

We describe a general scheme which includes the above example. In the space $R^r$ let there be given: a vector-valued function $b(x) = (b^1(x), \ldots, b^r(x))$; a matrix-valued function $(a^{ij}(x))$ (of order $r$, symmetric and nonnegative definite); and, for every $x \in R^r$, a measure $\mu_x$ on $R^r \setminus \{0\}$ such that
$$\int_{R^r\setminus\{0\}} |\beta|^2\,\mu_x(d\beta) < \infty.$$

For every $h > 0$ let $(X^h_t, P^h_x)$ be a Markov process in $R^r$ with right continuous trajectories and infinitesimal generator $A^h$ defined for twice continuously differentiable functions with compact support by the formula

$$A^h f(x) = \sum_i b^i(x)\frac{\partial f(x)}{\partial x^i} + \frac h2\sum_{i,j} a^{ij}(x)\frac{\partial^2 f(x)}{\partial x^i\,\partial x^j} + h^{-1}\int_{R^r\setminus\{0\}}\Bigl[f(x + h\beta) - f(x) - h\sum_i\beta^i\frac{\partial f(x)}{\partial x^i}\Bigr]\mu_x(d\beta). \eqno(2.1)$$

If $a^{ij}(x) \equiv 0$ and $\mu_x$ is finite for all $x$, then $X^h$ moves in the following way: it jumps a finite number of times over any finite time interval, and the density of jumps at $x$ is $h^{-1}\mu_x(R^r\setminus\{0\})$ (i.e., if the process is near $x$, then over time $dt$ it makes a jump with probability $h^{-1}\mu_x(R^r\setminus\{0\})\,dt$, up to infinitesimals of higher order as $dt \to 0$); the distribution of the size of a jump is given by the measure $\mu_x(R^r\setminus\{0\})^{-1}\mu_x(h^{-1}\,d\beta)$; between jumps the process moves in accordance with the dynamical system $\dot x_t = \tilde b(x_t)$, where

$$\tilde b(x) = b(x) - \int_{R^r\setminus\{0\}}\beta\,\mu_x(d\beta).$$

On the other hand, if $\mu_x(R^r\setminus\{0\}) = \infty$, then the process jumps infinitely many times over a finite time interval.

The process considered above is a special case of our scheme with $r = 1$, the measure $\mu_x$ concentrated at the points $\pm 1$, $\mu_x\{1\} = r(x)$, $\mu_x\{-1\} = l(x)$ and $b(x) = r(x) - l(x)$.
If the measure µx is concentrated at 0 for every x, then the integral term in formula (2.1) vanishes and A'' turns into a differential operator of the second order. In this case (X;, Px) is a family of diffusion processes with a small diffusion coefficient and the corresponding trajectories are continuous with probability one. In the general case (X;, Px) combines a continuous diffusion motion and jumps. The scheme introduced by us is a generalization of the scheme of processes
with independent incrementsthe continuous version of the scheme of sums of independent random variables. It is known that in the study of large
deviations for sums of independent random variables, an important role is
played by the condition of finiteness of exponential moments (cf., for example,
Cramér [1]). We introduce this condition for our scheme as well: we shall assume that for all α = (α_1, …, α_r) the expression

H(x, α) = Σ_i b^i(x)α_i + (1/2) Σ_{i,j} a^{ij}(x)α_i α_j + ∫_{R^r \ {0}} (e^{Σ_i β^i α_i} − 1 − Σ_i β^i α_i) µ_x(dβ)  (2.2)

is finite. The function H is convex and analytic in the second argument. It vanishes at zero.
The connection of H with the Markov process (X^h_t, P^h_x) can be described in the following way: if we apply the operator A^h defined by formula (2.1) to the function exp{Σ_i α_i x^i}, then we obtain h^{-1}H(x, hα) exp{Σ_i α_i x^i}. We denote by L(x, β) the Legendre transform of H(x, α) with respect to the second argument. The equality H(x, 0) = 0 implies that L is nonnegative; it vanishes at β = b(x). The function L may assume the value +∞; however, inside the domain where it is finite, L is smooth.
For the example considered above we have H(x, α) = r(x)(e^α − 1) + l(x)(e^{−α} − 1), and the function L has the form indicated in the preceding section, with r(x) and l(x) replacing r and l.
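The Legendre relation between this H and L can be checked numerically. The following sketch (with assumed constant rates r = 2, l = 1, and a crude grid supremum standing in for the exact transform) verifies that L vanishes at β = b(x) = r − l and is positive elsewhere:

```python
import math

# H(x, a) = r(x)(e^a - 1) + l(x)(e^{-a} - 1) at a fixed point x,
# with illustrative constant rates r = 2, l = 1 (an assumption)
def H(alpha, r=2.0, l=1.0):
    return r * (math.exp(alpha) - 1.0) + l * (math.exp(-alpha) - 1.0)

def legendre(beta, alphas):
    # crude Legendre transform L(beta) = sup_alpha [alpha*beta - H(alpha)]
    # computed over a finite grid of alpha values
    return max(a * beta - H(a) for a in alphas)

alphas = [-5.0 + 0.001 * i for i in range(10001)]  # grid on [-5, 5]
b = 2.0 - 1.0                                      # b(x) = r(x) - l(x)
L_at_b = legendre(b, alphas)                       # should be (close to) 0
L_away = legendre(2.5, alphas)                     # strictly positive
```

Since H(0) = 0 and H is convex, the supremum at β = b(x) is attained at α = 0, which is why L_at_b comes out (numerically) zero.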
For a function φ_t, T_1 ≤ t ≤ T_2, with values in R^r, we define a functional by the formula

S(φ) = S_{T_1 T_2}(φ) = ∫_{T_1}^{T_2} L(φ_t, φ̇_t) dt  (2.3)

if φ is absolutely continuous and the integral converges; otherwise we put S_{T_1 T_2}(φ) = +∞. This functional will be a normalized action functional (and the normalizing coefficient will be h^{-1}). In particular, if the measure µ_x is concentrated at 0 and (a^{ij}) is the identity matrix, then, as follows from results of Ch. 4, the action functional has the form (2.3) with L(φ, φ̇) = ½|φ̇ − b(φ)|². In Wentzell and Freidlin [4] the action functional is computed for a family of diffusion processes with an arbitrary matrix (a^{ij}) (cf. the next section).
5. Perturbations Leading to Markov Processes
Families of infinitely divisible processes belonging to our scheme have been considered in Borovkov [1]. We return to these classes of random processes in the next section. Now we formulate a result which generalizes results of both Wentzell and Freidlin [4] and Borovkov [1]. In order that h^{-1}S_{0T}(φ) be the action functional for the family of processes (X^h_t, P^h_x), it is, of course, necessary to impose some restrictions on this family. We formulate them in terms of the functions H and L.
I.
There exists an everywhere finite nonnegative convex function H̄(α) such that H̄(0) = 0 and H(x, α) ≤ H̄(α) for all x, α.
II.
The function L(x, β) is finite for all values of the arguments; for any R > 0 there exist positive constants M and m such that L(x, β) ≤ M, |∇_β L(x, β)| ≤ M and Σ_{ij} (∂²L/∂β^i ∂β^j)(x, β) c^i c^j ≥ m Σ_i (c^i)² for all x, all c ∈ R^r and all β with |β| ≤ R.
The following requirement is intermediate between simple continuity of H and L in the first argument, which is insufficient for our purposes, and uniform continuity, which is not satisfied even in the case of diffusion processes, i.e., for functions H and L quadratic in the second argument.
III.
Δ_L(δ') = sup_{|y' − y| ≤ δ'} sup_β [L(y', β) − L(y, β)] / [1 + L(y, β)] → 0 as δ' ↓ 0.
Condition III implies the following continuity condition for H:

H(y, (1 + Δ_L(δ'))^{-1} α) − (1 + Δ_L(δ'))^{-1} H(y', α) ≤ Δ_L(δ')(1 + Δ_L(δ'))^{-1}  (2.4)

for all α and |y − y'| ≤ δ', where Δ_L(δ') → 0 as δ' ↓ 0.
Theorem 2.1. Suppose that the functions H and L satisfy conditions I–III and the functional S(φ) is defined by formula (2.3). Then h^{-1}S(φ) is the action functional for the family of processes (X^h_t, P^h_x) in the metric ρ_{0T}(φ, ψ) = sup_{0≤t≤T} |φ_t − ψ_t|, uniformly with respect to the initial point as h ↓ 0.

It is sufficient to prove the following two assertions: for any δ > 0, γ > 0 and s_0 > 0 there exists a positive h_0 such that for all h ≤ h_0 we have

P_x{ρ_{0T}(X^h, φ) < δ} ≥ exp{−h^{-1}[S_{0T}(φ) + γ]},  (2.5)

where φ is an arbitrary function such that S_{0T}(φ) ≤ s_0, φ_0 = x, and

P_x{ρ_{0T}(X^h, Φ_x(s)) ≥ δ} ≤ exp{−h^{-1}(s − γ)}  (2.6)

for s ≤ s_0, where Φ_x(s) = {φ: φ_0 = x, S_{0T}(φ) ≤ s}. For the functions φ under consideration there exists Δt > 0 such that the oscillation of each of them on any interval of length not exceeding Δt is smaller than δ/2. We decrease this Δt if necessary so that for any polygon l with vertices (t_i, φ_{t_i}) such that max(t_{i+1} − t_i) ≤ Δt we have S_{0T}(l) ≤ S_{0T}(φ) + γ/2. We fix an equidistant partition of the interval from 0 to T into intervals of length not exceeding Δt. Then for all polygons l determined by functions φ with S_{0T}(φ) ≤ s_0 we have |l̇_t| ≤ δ/2Δt. For h smaller than some h_0, inequality (2.5) holds for these polygons with δ/2 instead of δ and γ/2 instead of γ. Using the circumstance that ρ_{0T}(φ, l) ≤ δ/2, we obtain

P_x{ρ_{0T}(X^h, φ) < δ} ≥ P_x{ρ_{0T}(X^h, l) < δ/2} ≥ exp{−h^{-1}[S_{0T}(φ) + γ]}.

Now we prove (2.6). Given a small positive number κ, we choose a positive
δ' < δ/2 such that the expression Δ_L(δ') in condition III does not exceed κ. We consider a random polygon l^h_t, 0 ≤ t ≤ T, with vertices at the points (0, X^h_0), (Δt, X^h_{Δt}), (2Δt, X^h_{2Δt}), …, (T, X^h_T), where the choice of Δt = T/n (n an integer) will be specified later. We introduce the events

A^h(i) = {sup_{0≤t≤Δt} |X^h_{iΔt+t} − X^h_{iΔt}| ≤ δ'}.
By virtue of a well-known estimate (cf. Dynkin [1], Lemma 6.3), the supremum of this probability does not exceed

2 sup_{t≤Δt} sup_y P_y{|X^h_t − y| > δ'/2}.
We estimate this probability by means of the exponential Chebyshev inequality: for any C > 0 we have

P_y{|X^h_t − y| > δ'/2} ≤ [M_y exp{h^{-1}C(X^h_t − y)} + M_y exp{−h^{-1}C(X^h_t − y)}] e^{−h^{-1}Cδ'/2}.  (2.12)

It follows from (2.8) that the mathematical expectation here does not exceed exp{h^{-1}t H̄(±C)} ≤ exp{h^{-1}Δt H̄(±C)}. Now we choose C = 4s_0/δ' and then choose Δt so that Δt H̄(±C) ≤ s_0. Then the right side of (2.12) does not exceed 2 exp{−h^{-1}s_0} and the corresponding terms in (2.11) are negligible
compared to exp{−h^{-1}(s − γ)}. The second term in (2.11) is estimated by means of the exponential Chebyshev inequality:

P_x{⋂_{i=0}^{n−1} A^h(i) ∩ {S_{0T}(l^h) > s}}
  ≤ M_x[⋂_{i=0}^{n−1} A^h(i); exp{h^{-1}(1 + κ)^{-2} S_{0T}(l^h)}] × exp{−h^{-1}(1 + κ)^{-2} s}.  (2.13)
Using the choice of δ', we obtain

S_{0T}(l^h) = Σ_{i=0}^{n−1} ∫_{iΔt}^{(i+1)Δt} L(l^h_t, (X^h_{(i+1)Δt} − X^h_{iΔt})/Δt) dt
  ≤ (1 + κ) Σ_{i=0}^{n−1} Δt L(X^h_{iΔt}, (X^h_{(i+1)Δt} − X^h_{iΔt})/Δt) + κT
for ω ∈ ⋂_{i=0}^{n−1} A^h(i). Taking account of this, we obtain from (2.13) that

P_x{⋂_{i=0}^{n−1} A^h(i) ∩ {l^h ∉ Φ_x(s)}}
  ≤ M_x[⋂_{i=0}^{n−1} A^h(i); exp{h^{-1}(1 + κ)^{-1} Σ_{i=0}^{n−1} Δt L(X^h_{iΔt}, (X^h_{(i+1)Δt} − X^h_{iΔt})/Δt)}]
  × exp{−h^{-1}[(1 + κ)^{-2} s − (1 + κ)^{-2} κT]}.  (2.14)
Using the Markov property with respect to the times Δt, 2Δt, …, (n − 1)Δt, we obtain that the mathematical expectation in this formula does not exceed

[sup_y M_y[A^h(0); exp{h^{-1}(1 + κ)^{-1} Δt L(y, (X^h_{Δt} − y)/Δt)}]]^n.  (2.15)
Here the second argument of L does not exceed δ'/Δt. For fixed y, we approximate the function L(y, β), convex on the interval |β| ≤ δ'/Δt, by means of a polygon L'(y, β) circumscribed from below to an accuracy of κ. This can be done by means of a polygon with a number N of links not depending on y, for example, in the following way: we denote by β_i (i = 0, ±1, ±2, …) the points at which L(y, β) = |i|κ (for i ≠ 0 there are two such points) and choose the polygon formed by the tangent lines to the graph of L(y, β) at these points, i.e., we put

L'(y, β) = max_{−i_0 ≤ i ≤ i_0} [L(y, β_i) + (∂L/∂β)(y, β_i)(β − β_i)].  (2.16)

Here the number N of points is equal to 2i_0 + 1, where i_0 is the integral part of κ^{-1} sup_y sup_{|β| ≤ δ'/Δt} L(y, β). It is easy to see that L(y, β) − L'(y, β) ≤ κ for |β| ≤ δ'/Δt (Fig. 9).

Figure 9.
Using the definition of L, the expression (2.16) can be rewritten in the form

L'(y, β) = max_{−i_0 ≤ i ≤ i_0} [α_i β − H(y, α_i)],  (2.17)

where α_i = (∂L/∂β)(y, β_i).
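The circumscribed tangent polygon of (2.16), and the accuracy-κ property noted above, can be illustrated on any smooth convex function. Here f(β) = β² is a hypothetical stand-in for L(y, ·), with tangency points spaced 0.5 apart (so the worst gap is (0.5)²/4 = 0.0625):

```python
def tangent_polygon(f, df, points):
    # lower polygon L'(beta) = max_i [f(b_i) + f'(b_i)(beta - b_i)]
    # formed by tangent lines of the convex function f, as in (2.16)
    return lambda beta: max(f(b) + df(b) * (beta - b) for b in points)

f = lambda b: b * b                       # convex stand-in for L(y, .)
df = lambda b: 2.0 * b                    # its derivative
pts = [-2.0 + 0.5 * k for k in range(9)]  # tangency points on [-2, 2]
approx = tangent_polygon(f, df, pts)

# tangents of a convex function never exceed it, and for f = b^2 the
# gap between f and the polygon is at most (spacing/2)^2 = 0.0625
gaps = [f(b) - approx(b) for b in (-1.9, -0.3, 0.0, 0.7, 1.8)]
```

The same construction works for any convex L(y, ·); only the spacing of the tangency points (and hence the accuracy κ) changes.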
§3. Special Cases. Generalizations

Suppose that Σ_{ij} a^{ij}(x) c_i c_j ≥ µ Σ_i c_i² for all x and c, where µ > 0. We calculate the characteristics of the family of processes (X^ε_t, P^ε_x): the infinitesimal generator for functions f ∈ C^{(2)} has the form

A^ε f(x) = Σ_i b^i(x) ∂f(x)/∂x^i + (ε²/2) Σ_{ij} a^{ij}(x) ∂²f(x)/∂x^i ∂x^j.  (3.2)
The role of the parameter h of the preceding section is played by ε² and the normalizing coefficient is ε^{-2}. Moreover, we have

H(x, α) = Σ_i b^i(x)α_i + (1/2) Σ_{ij} a^{ij}(x)α_i α_j;  (3.3)
the Legendre transform of this function is

L(x, β) = (1/2) Σ_{ij} a_{ij}(x)(β^i − b^i(x))(β^j − b^j(x)),  (3.4)

where (a_{ij}(x)) = (a^{ij}(x))^{-1}; the normalized action functional has the form (for absolutely continuous φ_t, T_1 ≤ t ≤ T_2)

S_{T_1 T_2}(φ) = (1/2) ∫_{T_1}^{T_2} Σ_{ij} a_{ij}(φ_t)(φ̇^i_t − b^i(φ_t))(φ̇^j_t − b^j(φ_t)) dt.  (3.5)
We verify conditions I–III of Theorem 2.1. As the function H̄ we may choose

H̄(α) = B|α| + (1/2)A|α|²,

where B and A are constants majorizing |b(x)| and the largest eigenvalue of the matrix (a^{ij}(x)). As the constant M = M(R) of condition II we may choose µ^{-1}(R + B + 1)², and as m we may take A^{-1}. Condition III is also satisfied: the function Δ_L(δ') can be expressed in terms of the moduli of continuity of the functions b^i(x), a^{ij}(x) and the constants µ, A and B. The following theorem deals with a slight generalization of this scheme; in this case the proof of Theorem 2.1 needs only minor changes.
Theorem 3.1. Let the functions b^i(x) and a^{ij}(x) be bounded and uniformly continuous in R^r, let the matrix (a^{ij}(x)) be symmetric for every x and let

Σ_{ij} a^{ij}(x) c_i c_j ≥ µ Σ_i c_i²,
where µ is a positive constant. Furthermore, suppose that b^{1,ε}(x), …, b^{r,ε}(x) converge uniformly to b^1(x), …, b^r(x), respectively, as ε → 0, that (X^ε_t, P^ε_x) is a diffusion process in R^r with drift b^ε(x) = (b^{1,ε}(x), …, b^{r,ε}(x)) and diffusion matrix ε²(a^{ij}(x)), and that the functional S is given by formula (3.5). Then ε^{-2}S_{0T}(φ) is the action functional for the family of processes (X^ε_t, P^ε_x) in the sense of the metric ρ_{0T}(φ, ψ) = sup_{0≤t≤T} |φ_t − ψ_t|, uniformly with respect to the initial point as ε → 0.

For the corresponding family of jump processes the function L has the form

L(β) = Σ_i [β^i ln(β^i/λ^i) − β^i + λ^i] if all β^i ≥ 0, and L(β) = +∞ if at least one of the β^i < 0;  (3.9)
the functional S(φ) is defined as the integral

S_{T_1 T_2}(φ) = ∫_{T_1}^{T_2} L(φ̇_t) dt  (3.10)

for absolutely continuous φ_t = (φ^1_t, …, φ^r_t) and as +∞ for all remaining φ (the integral is equal to +∞ for all φ for which at least one coordinate is not a nondecreasing function). The hypotheses of Theorem 2.1 are not applicable, since L equals +∞ outside the set {β^1 ≥ 0, …, β^r ≥ 0}. Nevertheless, the following theorem holds.

Theorem 3.3. The functional given by formulas (3.10) and (3.9) is the normalized action functional for the family of processes ξ^h_t, uniformly with respect to the initial point as h ↓ 0 (as normalizing coefficient we take, of course, h^{-1}).
This is a special case of Borovkov's results [1] (in fact, these results are concerned formally only with the one-dimensional case; for r > 1 we can refer to later work: Mogul'skii [1] and Wentzell [7]). The preceding results related to the case where, as the parameter increases, the jumps of the Markov process become smaller at the same rate at which they become more frequent. We formulate a result concerning the case where this is not so. As above, let n^1_t, …, n^r_t be independent Poisson processes with parameters λ^1, …, λ^r; for a > 0 and h > 0 we put

ξ^{h,a,i}_t = h(n^i_{at} − aλ^i t).
Theorem 3.4. Put

S_{T_1 T_2}(φ) = (1/2) ∫_{T_1}^{T_2} Σ_i (λ^i)^{-1} (φ̇^i_t)² dt  (3.11)
for absolutely continuous φ_t, T_1 ≤ t ≤ T_2, and S_{T_1 T_2}(φ) = +∞ for the remaining φ. Then (h²a)^{-1}S_{0T}(φ) is the action functional for the family of processes ξ^{h,a}_t = (ξ^{h,a,1}_t, …, ξ^{h,a,r}_t), 0 ≤ t ≤ T, uniformly with respect to the initial point as ha → ∞, h²a → 0.

This is a special case of one of the theorems of the same article [1] by Borovkov (the multidimensional case can be found in Mogul'skii [1]).
§4. Consequences. Generalization of Results of Chapter 4

Let us see whether the results obtained by us in §§2, 3 and 4 of Ch. 4 for small perturbations of "white noise" type of a dynamical system can be carried over to small jump-like perturbations (or to perturbations of diffusion type with varying diffusion).

Let ẋ_t = b(x_t) be a dynamical system with one stable equilibrium position O and let (X^h_t, P^h_x) be a family of Markov processes of the form described in §2. For this family we may pose problems on the limit behavior, as h ↓ 0, of the invariant measure µ^h, of the distribution of the point X^h_{τ^h} of exit from a domain, and of the mean exit time M_x τ^h. We may conjecture that the solutions will be connected with the function V(O, x) = inf{S_{T_1 T_2}(φ): φ_{T_1} = O, φ_{T_2} = x}. Namely, the invariant measure µ^h must be described by the action function h^{-1}V(O, x); the distribution of X^h_{τ^h} as h ↓ 0 must be concentrated near those points of the boundary at which min_{y∈∂D} V(O, y) is attained; M_x τ^h must be logarithmically equivalent to

exp{h^{-1} min_{y∈∂D} V(O, y)};

and for small h, the exit from a domain must take place, with overwhelming probability, along an extremal of S(φ) leading from O to the boundary, etc. The proofs in §§2, 4 of Ch. 4 have to be changed in the following way. Instead of the small spheres γ and Γ about the equilibrium position, we have to take a small ball γ containing the equilibrium position and the exterior Γ of a sphere of a somewhat larger radius (a jump-like process may simply jump over a sphere). Instead of the chain Z_n on the set γ ∪ ∂D, we consider a chain on the union of γ and the complement of D. A trajectory of X^h_t beginning at a point of γ is not necessarily on a sphere of small radius (the boundary of Γ) at the first entrance time of Γ. Nevertheless, the probability that at this time the process is at a distance larger than some δ > 0 from this sphere converges to zero faster than any exponential exp{−Kh^{-1}} as h ↓ 0. Theorems 2.1, 2.3, 2.4, 4.1 and 4.2 of Ch. 4 remain true for families (X^h_t, P^h_x) satisfying the hypotheses of Theorem 2.1. Of course, these theorems are also true for families of diffusion processes satisfying the hypotheses of Theorems 3.1 and 3.2. We give the corresponding
formulation in the language of differential equations, i.e., the generalization of Theorem 2.2 of Ch. 4.

Theorem 4.1. Let O be a stable equilibrium position of the dynamical system ẋ_t = b(x_t) on a manifold M and let D be a domain in M with compact closure and smooth boundary ∂D. Suppose that the trajectories of the dynamical system beginning at any point of D ∪ ∂D are attracted to O as t → ∞ and the vector b(x) is directed strictly inside D at every boundary point. Furthermore, let (X^ε_t, P^ε_x) be a family of diffusion processes on M with drift b^ε(x) (in local coordinates) converging to b(x) as ε → 0 and diffusion matrix ε²(a^{ij}(x)), and let the hypotheses of Theorem 3.2 be satisfied. For every ε > 0 let u^ε(x) be the solution of the Dirichlet problem

(ε²/2) Σ_{ij} a^{ij}(x) ∂²u^ε(x)/∂x^i ∂x^j + Σ_i b^{i,ε}(x) ∂u^ε(x)/∂x^i = 0,  x ∈ D,
u^ε(x) = g(x),  x ∈ ∂D,

with continuous boundary function g. Then lim_{ε→0} u^ε(x) = g(y_0), where y_0 is a point of the boundary at which min_{y∈∂D} V(O, y) is attained (it is assumed that this point is unique).
The proof of the theorem below is entirely analogous to that of Theorem 4.3 of Ch. 4.
Theorem 4.2. Let the dynamical system ẋ_t = b(x_t) in R^r have a unique equilibrium position O, which is stable and attracts the trajectories beginning at any point of R^r. For sufficiently large |x| let the inequality (x, b(x)) ≤ −c|x| be satisfied, where c is a positive constant. Suppose that the family of diffusion processes (X^ε_t, P^ε_x) with drift b^ε(x) → b(x) and diffusion matrix ε²(a^{ij}(x)) satisfies the hypotheses of Theorem 3.1. Then the normalized invariant measure µ^ε of (X^ε_t, P^ε_x) as ε → 0 is described by the action function ε^{-2}V(O, x).
The formulation and proof of the corresponding theorem for diffusion processes on a manifold are postponed until §4 of Ch. 6. The point is that this theorem is quite simple for compact manifolds, but the trajectories of a dynamical system on such a manifold cannot all be attracted to one stable equilibrium position. For families of jump-like processes the corresponding theorem will also be more complicated than Theorem 4.3 of Ch. 4 or Theorem 4.2 of this chapter.

We indicate the changes in Theorem 3.1 of Ch. 4 enabling us to determine the quasipotential. Let A be a subset of the domain D with boundary ∂D. The problem R_A for a first-order differential equation in D is, by definition, the problem of finding a function U continuous in D ∪ ∂D, vanishing on A and positive
outside A, continuously differentiable and satisfying the equation in question in (D ∪ ∂D) \ A, and such that ∇U(x) ≠ 0 for x ∈ (D ∪ ∂D) \ A.

Theorem 4.3. Let H(x, α) and L(x, β) be strictly convex functions, smooth in the second argument and coupled by the Legendre transformation. Let S_{T_1 T_2}(φ) = ∫_{T_1}^{T_2} L(φ_t, φ̇_t) dt (for absolutely continuous functions φ, and +∞ otherwise). Let A be a compact subset of D. Let us put

V(A, x) = inf{S_{T_1 T_2}(φ): φ_{T_1} ∈ A, φ_{T_2} = x; −∞ < T_1 < T_2 < ∞}.  (4.1)
Suppose that U is a solution of problem R_A for the equation

H(x, ∇U(x)) = 0  (4.2)

in D ∪ ∂D. Then U(x) = V(A, x) for all x for which U(x) < min{U(y): y ∈ ∂D}. The infimum in (4.1) is attained at the solution of the equation

φ̇_t = ∇_α H(φ_t, ∇U(φ_t))  (4.3)

with the final condition φ_{T_2} = x, for any T_2 < ∞ (the value of φ_t then automatically belongs to A for some T_1, −∞ ≤ T_1 < T_2).

Theorem 3.1 of Ch. 4 is a special case of this with H(x, α) = (b(x), α) + |α|²/2, A = {0}.
Proof. Let φ_t, T_1 ≤ t ≤ T_2, be any curve connecting A with x and lying entirely in D ∪ ∂D. We use the inequality L(x, β) ≥ Σ_i α_i β^i − H(x, α), following from definition (1.1). Substituting φ_t, φ̇_t and ∇U(φ_t) for x, β and α, we obtain

S_{T_1 T_2}(φ) = ∫_{T_1}^{T_2} L(φ_t, φ̇_t) dt ≥ ∫_{T_1}^{T_2} Σ_i (∂U/∂x^i)(φ_t) φ̇^i_t dt − ∫_{T_1}^{T_2} H(φ_t, ∇U(φ_t)) dt.

The second integral is equal to zero by (4.2). The first one is U(φ_{T_2}) − U(φ_{T_1}) = U(x). Hence S_{T_1 T_2}(φ) ≥ U(x). If U(x) < min{U(y): y ∈ ∂D}, then the consideration of curves going through the boundary before hitting x does not yield any advantage, and V(A, x) ≥ U(x).

Now we show that the value U(x) can be attained. For this we construct an extremal. We solve equation (4.3) for t ≤ T_2 with the condition φ_{T_2} = x. A solution exists as long as φ_t does not go out of (D ∪ ∂D) \ A, since the right side is continuous (the solution may not be unique). We have (d/dt)U(φ_t) = Σ_i (∂U/∂x^i)(φ_t) φ̇^i_t = L(φ_t, φ̇_t) + H(φ_t, ∇U(φ_t)). The second term is equal to zero. The first term is positive, since L(x, β) ≥ 0 and vanishes only for β = b(x) = ∇_α H(x, 0), which is not equal to
∇_α H(x, ∇U(x)) for x ∉ A (because ∇U(x) ≠ 0). Hence U(φ_t) decreases with decreasing t. Consequently, φ_t does not go out of D ∪ ∂D. We prove that for some T_1 < T_2 (maybe for T_1 = −∞) the curve φ_t enters A. If this were not so, then the solution φ_t would be defined for all t ≤ T_2 and the finite limit lim_{t→−∞} U(φ_t) ≥ 0 would exist. However, this would mean that φ_t does not hit some neighborhood of the compactum A. Outside this neighborhood we have (d/dt)U(φ_t) = L(φ_t, φ̇_t) ≥ const > 0, which contradicts the existence of a finite limit of U(φ_t) as t → −∞. Now we determine the value of S at the function we have just obtained:

S_{T_1 T_2}(φ) = ∫_{T_1}^{T_2} L(φ_t, φ̇_t) dt = ∫_{T_1}^{T_2} (d/dt)U(φ_t) dt = U(φ_{T_2}) − U(φ_{T_1}) = U(x).

The theorem is proved.

The hypotheses of this theorem can be satisfied only in the case where all trajectories of the dynamical system ẋ_t = b(x_t) beginning at points of D ∪ ∂D are attracted to A as t → ∞ (or enter A). The same is true for Theorem 3.1 of Ch. 4.
As an example, we consider the family of one-dimensional jump-like processes introduced at the beginning of §2. If l(x) > r(x) for x > x_0 and l(x) < r(x) for x < x_0, then the point x_0 is a stable equilibrium position for the equation ẋ_t = r(x_t) − l(x_t). We solve the problem H(x, U'(x)) = 0, where H(x, α) = r(x)(e^α − 1) + l(x)(e^{−α} − 1). For given x, the function H(x, ·) vanishes at α = 0 and at α = ln(l(x)/r(x)). For x ≠ x_0 we have to take the second value, because it is required that U'(x) ≠ 0 for x ≠ x_0. We find that U(x) = ∫_{x_0}^x ln(l(y)/r(y)) dy (this function is positive both for x > x_0 and for x < x_0). It is in fact the quasipotential. An extremal of the normalized action functional is nothing else but a solution of the differential equation φ̇_t = (∂H/∂α)(φ_t, U'(φ_t)) = l(φ_t) − r(φ_t). Now we obtain immediately the solution of a series of problems connected with the family of processes (X^h_t, P^h_x): the probability that the process leaves an interval (x_1, x_2) ∋ x_0 through the left endpoint converges to 1 as h ↓ 0 if U(x_1) < U(x_2) and to 0 if the opposite inequality holds (for U(x_1) = U(x_2) the problem remains open); the mathematical expectation of the first exit time is logarithmically equivalent to exp{h^{-1} min[U(x_1), U(x_2)]} (this result has been obtained by another method by Labkovskii [1]); and so on. The asymptotics of the invariant measure of (X^h_t, P^h_x) as h ↓ 0 is given by the action function h^{-1}U(x) (this does not follow from the results formulated by us; however, in this case, as in the potential case considered in §4 of Ch. 4, the invariant measure can easily be calculated explicitly).
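The computation of the quasipotential in this example is easy to replay numerically. The rates below, r(x) ≡ 1 and l(x) = e^x (so that x_0 = 0 and U(x) = x²/2), are illustrative assumptions; the sketch checks that U is positive on both sides of x_0 and that α = ln(l(x)/r(x)) indeed annihilates H:

```python
import math

# hypothetical intensities with l > r to the right of x0 = 0 and l < r to the left
r = lambda x: 1.0
l = lambda x: math.exp(x)

def U(x, n=1000):
    # quasipotential U(x) = integral_0^x ln(l(y)/r(y)) dy by the midpoint rule;
    # here ln(l/r)(y) = y, so U(x) = x^2/2
    dy = x / n
    return sum(math.log(l((k + 0.5) * dy) / r((k + 0.5) * dy)) * dy
               for k in range(n))

def H(x, a):
    return r(x) * (math.exp(a) - 1.0) + l(x) * (math.exp(-a) - 1.0)

# H(x, U'(x)) = 0 with U'(x) = ln(l(x)/r(x)), at several test points
residuals = [abs(H(x, math.log(l(x) / r(x)))) for x in (-1.0, 0.5, 2.0)]
```

With these rates U(0.5) = 0.125 and U(−1) = 0.5, matching the closed form x²/2 on both sides of the equilibrium.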
Chapter 6
Markov Perturbations on Large Time Intervals
§1. Auxiliary Results. Equivalence Relation

In this chapter we consider families of diffusion processes (X^ε_t, P^ε_x) on a connected manifold M. We shall assume that these families satisfy the hypotheses of Theorem 3.2 of Ch. 5 and that the behavior of the probabilities of large deviations from the "most probable" trajectory, the trajectory of the dynamical system ẋ_t = b(x_t), can be described, as ε → 0, by the action functional ε^{-2}S(φ) = ε^{-2}S_{T_1 T_2}(φ), where

S(φ) = (1/2) ∫_{T_1}^{T_2} Σ_{ij} a_{ij}(φ_t)(φ̇^i_t − b^i(φ_t))(φ̇^j_t − b^j(φ_t)) dt.
We set forth results of Wentzell and Freidlin [4], [5] and some generalizations thereof.
In problems connected with the behavior of X^ε_t in a domain D ⊆ M on large time intervals (problems of exit from a domain and, for D = M, the problem of the invariant measure; concerning special cases, cf. §§2, 4 of Ch. 4 and §4 of Ch. 5), an essential role is played by the function

V_D(x, y) = inf{S_{0T}(φ): φ_0 = x, φ_T = y, φ_t ∈ D ∪ ∂D for t ∈ [0, T]},

where the infimum is taken over intervals [0, T] of arbitrary length. For D = M we shall use the notation V_M(x, y) = V(x, y). For small ε, the function V_D(x, y) characterizes the difficulty of passage from x to a small neighborhood of y, without leaving D (or D ∪ ∂D), within a "reasonable" time. For D = M we can give this, for example, the following exact meaning: it can be proved that

V(x, y) = −lim_{T→∞} lim_{δ↓0} lim_{ε↓0} ε² ln P_x{τ_δ ≤ T},

where τ_δ is the first entrance time of the δ-neighborhood of y for the process X^ε_t.
It is easy to see that V_D is everywhere finite. Let us study its more complicated properties.
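Since V(x, y) is defined as an infimum of the action over connecting curves, it can be approximated from above by evaluating S on explicit candidate paths. In the sketch below we take the one-dimensional linear drift b(x) = −x with unit diffusion (an assumption; this is the potential case U(x) = x²/2 of Ch. 4, for which V(O, y) = 2U(y) = y²) and check that the time-reversed relaxation trajectory nearly attains this value:

```python
import math

def action(path, dt):
    # S(phi) = 1/2 * integral (phi' - b(phi))^2 dt with b(x) = -x
    s = 0.0
    for k in range(len(path) - 1):
        vel = (path[k + 1] - path[k]) / dt
        mid = 0.5 * (path[k] + path[k + 1])
        s += 0.5 * (vel + mid) ** 2 * dt
    return s

# candidate path from (a neighborhood of) the equilibrium O = 0 to y:
# the time-reversed relaxation trajectory y * e^{t - T}
y, T, n = 1.5, 8.0, 8000
dt = T / n
reversed_flow = [y * math.exp(k * dt - T) for k in range(n + 1)]
cost = action(reversed_flow, dt)   # approaches V(O, y) = y^2 as T grows
```

Longer candidate paths (larger T) start closer to O and bring the cost arbitrarily close to the infimum, which is why V is defined over intervals of arbitrary length.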
Lemma 1.1. There exists a constant L > 0 such that for any x, y ∈ M with ρ(x, y) ≤ Δ there exists a function φ_t, φ_0 = x, φ_T = y, T = ρ(x, y), for which S_{0T}(φ) ≤ L · ρ(x, y).
This is Lemma 2.3 of Ch. 4, adapted to manifolds. Here Δ > 0 is the radius of the neighborhoods in which the coordinate systems K_{x_0} are defined (cf. §3, Ch. 5).

This lemma implies that V_D(x, y) is continuous for x, y ∈ D and, if the boundary is smooth, for x, y ∈ D ∪ ∂D.
Lemma 1.2. For any γ > 0 and any compactum K ⊆ D (in the case of a smooth boundary, for K ⊆ D ∪ ∂D) there exists T_0 such that for any x, y ∈ K there exists a function φ_t, 0 ≤ t ≤ T, φ_0 = x, φ_T = y, with T ≤ T_0, such that S_{0T}(φ) ≤ V_D(x, y) + γ.

For the proof, we choose a sufficiently dense finite δ-net {x_i} of points in K; we connect them with curves at which the action functional assumes values differing from the infimum by less than γ/2, and complete them with end sections by using Lemma 1.1: from x to a point x_i near x, then from x_i to a point x_j near y, and from x_j to y.
Let φ_t, 0 ≤ t ≤ T, be a continuous curve in M and let 0 = t_0 < t_1 < ⋯ < t_n = T be a partition of the interval from 0 to T; denote by l the polygon with vertices (t_i, φ_{t_i}).

Lemma 1.3. For any K > 0 and any γ > 0 there exists a positive h such that for any function φ_t, 0 ≤ t ≤ T, with T + S_{0T}(φ) ≤ K, and any partition of the interval from 0 to T with max(t_{i+1} − t_i) ≤ h, we have S_{0T}(l) ≤ S_{0T}(φ) + γ.
The first step of the proof is the application of the equicontinuity of all such φ in order to establish that for sufficiently small h the values φ_t, t_i ≤ t ≤ t_{i+1}, lie in one coordinate neighborhood K_{x_i}, and the number of different coordinate systems K_{x_i} is bounded from above by a number depending only on K. This is the assertion of Lemma 2.1 of Ch. 5, concerning which we referred the reader to Wentzell [7] (however, for a function L(x, β) quadratic in β, the proof is simpler). We have already seen in the proof of Theorem 2.1 of Ch. 4 that in our arguments we have to go out to a small distance from the boundary of D or, on the other hand, we have to consider only points lying inside D at a positive
distance from the boundary. Let ρ(x, y) be a Riemannian distance corresponding to a smooth tensor g_{ij}(x): (dρ)² = Σ_{ij} g_{ij}(x) dx^i dx^j. We denote by D_{+δ} the δ-neighborhood of D and by D_{−δ} the set of points of D at a distance greater than δ from the boundary. In the case of a compact, twice continuously differentiable boundary ∂D, the boundaries ∂D_{+δ} and ∂D_{−δ} are also twice continuously differentiable for sufficiently small δ. For points x lying between ∂D_{−δ} and ∂D_{+δ} or on them, the following points are defined uniquely: (x)_0, the point on ∂D closest to x; (x)_{−δ}, the point on ∂D_{−δ} closest to x; (x)_{+δ}, the point on ∂D_{+δ} closest to x. The mappings x → (x)_0, x → (x)_{−δ} and x → (x)_{+δ} are twice continuously differentiable.

Lemma 1.4. Let ∂D be compact and twice continuously differentiable. For any K > 0 and any γ > 0 there exists δ_0 > 0 such that for any positive δ ≤ δ_0 and any function φ_t, 0 ≤ t ≤ T, with T + S_{0T}(φ) ≤ K, assuming its values in D ∪ ∂D,
there exists a function ψ_t, 0 ≤ t ≤ T, with values in D_{−δ} ∪ ∂D_{−δ} such that S_{0T}(ψ) ≤ S_{0T}(φ) + γ.

There exists h_0 > 0 such that the values of any of the φ on any interval of length not greater than h_0 lie in the Δ'-neighborhood of the left endpoint. We decrease h_0 if necessary so that for the "polygon" l inscribed in φ with time step not exceeding h_0 we have S_{0T}(l) ≤ S_{0T}(φ) + γ/3. For T/n ≤ h_0, we take the partition of the interval from 0 to T into n equal parts and consider the
corresponding polygon l_t, 0 ≤ t ≤ T. In each of the local coordinate systems chosen by us, the modulus of the derivative of l_t does not exceed R = nΔ'/T at any point. We choose a positive δ_0 < Δ' such that Δ_L(δ_0) ≤ γ/(3K + γ) (cf. condition III of Theorem 2.1, Ch. 5) and δ_0 · n · sup_{x, |β| ≤ R} |(∂L/∂β^1)(x, β)| ≤ γ/3 (cf. condition II).
For every δ ≤ δ_0 and each of the functions φ under consideration we define a function ψ_t, 0 ≤ t ≤ T:

ψ_t = l_t if l_t ∈ D_{−δ} ∪ ∂D_{−δ},  ψ_t = (l_t)_{−δ} otherwise.

This function satisfies the conditions concerning the initial and terminal points. We estimate S_{0T}(ψ). We have

S_{0T}(ψ) − S_{0T}(l) = ∫_0^T [L(ψ_t, ψ̇_t) − L(ψ_t, l̇_t)] dt + ∫_0^T [L(ψ_t, l̇_t) − L(l_t, l̇_t)] dt.  (1.1)
(The expression L(ψ_t, l̇_t) is meaningful since, because of the choice δ_0 ≤ Δ' < Δ/2, on the whole interval from iT/n to (i + 1)T/n the points ψ_t and l_t are in a neighborhood in which one of the local coordinate systems chosen by us acts.) The value ψ_t is different from l_t only when l_t is in the δ-strip along the boundary, i.e., on not more than n little intervals [t_i, t_{i+1}] ⊆ [iT/n, (i + 1)T/n]. Moreover, only the first coordinates may be different. In the first integral in (1.1) we have ψ̇_t and l̇_t; the first of these derivatives, just as the second one, does not exceed R in modulus (because it either coincides with l̇_t or differs from it by a vanishing first coordinate). We have

∫_{t_i}^{t_{i+1}} [L(ψ_t, ψ̇_t) − L(ψ_t, l̇_t)] dt = ∫_{t_i}^{t_{i+1}} (∂L/∂β^1)(ψ_t, β(t)) (ψ̇^1_t − l̇^1_t) dt.  (1.2)

Here β(t) is a point of the interval connecting ψ̇_t with l̇_t (which varies with t, in general); ψ̇^1_t − l̇^1_t does not depend on t and is equal to (ψ^1_{t_{i+1}} − l^1_{t_{i+1}} − ψ^1_{t_i} + l^1_{t_i})/(t_{i+1} − t_i). We pull this constant out of the integral. We use the facts that |∂L/∂β^1| ≤ γ/3nδ_0 and that 0 ≤ l^1_t − ψ^1_t ≤ δ for all t. We obtain that the integral (1.2) does not exceed γ/3n and the first integral in (1.1) is not greater than γ/3. The second integral in (1.1) does not exceed Δ_L(δ) ∫_0^T (1 + L(l_t, l̇_t)) dt ≤ Δ_L(δ)(T + S_{0T}(l)).
For any γ > 0 there exists δ > 0 such that for all sufficiently small ε and all x belonging to the closed δ-neighborhood g ∪ ∂g of K we have

M_x ∫_0^{τ_G} χ_g(X^ε_t) dt ≥ e^{−γε^{-2}}.  (1.4)
Proof. We connect the point x ∈ g ∪ ∂g and the closest point x' of K with a curve φ_t with the value of S not greater than γ/3 (this can be done for sufficiently small δ). We continue this curve by a solution of ẋ_t = b(x_t) until
exit from g ∪ ∂g in such a way that the interval on which the curve is defined has length not greater than some T < ∞ independent of x. The trajectory of X^ε_t is at a distance not greater than δ/2 from φ_t with probability not less than e^{−2γε^{-2}/3} (for small ε); moreover, it is in g until exit from G, it spends there a time bounded from below by a constant t_0, and the mathematical expectation in (1.4) is greater than t_0 e^{−2γε^{-2}/3} ≥ e^{−γε^{-2}}.
Now we prove a lemma which is a generalization of Lemma 2.2 of Ch. 4.
Lemma 1.9. Let K be a compact subset of M not containing any ω-limit set entirely. There exist positive constants c and T_0 such that for all sufficiently small ε, any T > T_0 and any x ∈ K we have

P_x{τ_K > T} ≤ exp{−ε^{-2} c (T − T_0)},

where τ_K is the time of first exit of X^ε_t from K. We use an analogous construction in the case of systems satisfying condition (A).
We introduce the following notation:

V_0(K_i, K_j) = inf{S_{0T}(φ): φ_0 ∈ K_i, φ_T ∈ K_j, φ_t ∈ (D ∪ ∂D) \ ⋃_{s≠i,j} K_s for 0 ≤ t ≤ T}.

We put τ_n = inf{t ≥ σ_{n−1}: X^ε_t ∈ C}, σ_n = inf{t ≥ τ_n: X^ε_t ∈ ∂g ∪ ∂D} and consider the Markov chain Z_n = X^ε_{σ_n}. From n = 1 on, Z_n belongs to ∂g ∪ ∂D. As far as the times σ_n are concerned, X_{σ_0} can be any point of C; all the following X_{σ_n}, until the time of exit of X^ε_t to ∂D, belong to one of the surfaces Γ_i, and after exit to the boundary we have τ_n = σ_n = σ_{n+1} = ⋯ and the chain Z_n stops. The estimates of the transition probabilities of the Z_n provide the following two lemmas.
Lemma 2.1. For any γ > 0 there exists ρ_0 > 0 (which can be chosen arbitrarily small) such that for any ρ_2, 0 < ρ_2 < ρ_0, there exists ρ_1, 0 < ρ_1 < ρ_2, such that for any δ_0 smaller than ρ_0 and sufficiently small ε, for all x in the ρ_2-neighborhood G_i of the compactum K_i (i = 1, …, l) the one-step transition probabilities of Z_n satisfy the inequalities

exp{−ε^{-2}(V_D(K_i, K_j) + γ)} ≤ P(x, ∂g_j) ≤ exp{−ε^{-2}(V_D(K_i, K_j) − γ)};  (2.3)

exp{−ε^{-2}(V_D(K_i, ∂D) + γ)} ≤ P(x, ∂D) ≤ exp{−ε^{-2}(V_D(K_i, ∂D) − γ)};  (2.4)
P_x{ρ_{0T_0}(X^ε, φ) < δ'}
  ≥ exp{−ε^{-2}(S_{0T_0}(φ) + 0.1γ)}
  ≥ exp{−ε^{-2}(V_D(K_i, K_j) + γ)}.
such that p(yk, y) < po; then p((yk)a, (y)a) < 2po. We connect x and x' a Ki with a curve with the value of S not greater than O.ly and x' and (Poj.'k e Ki with the "value" of S not greater than 0.1y. We complete the curve thus obtained with 9,K"',. Then we reach (yk)_a and the total value of S is not greater than VD(Ki, y) + 0.5y. We connect (yk)_s with (y)+a through (y)a, increasing the value of S by no more than 0.3y. Finally, we extend all functions constructed so far to the same interval [0, To] to be a solution of z, = b(x,) in such a way that SOT0(9) < VD(Ki, y) + 0.8y. Using Theorem 3.2 of Ch. 5 again, we obtain the lower estimate in (2.5). Now we obtain the upper estimates. It is sufficient to prove estimates (2.3) and (2.5) for x e F. (this follows from the strong Markov property). By virtue of the choice of po and 45', for
any curve (p 0 < r < T beginning on Fi, touching the S'neighborhood of ag; (the (60 + b')neighborhood of y e aD), not touching the compacta K5, s 0 i, j and not leaving D+a u aD +a, we have: S0T(cp) > VD(Ki, K)  0.3y(S0T(rp) > VD(Ki, y)  0.4y). Using Lemma 1.9, we choose T, such that for all sufficiently small E > 0 and x E (D u aD)\g we have Px{T, > T} < exp{E2V0}.
Any trajectory of X; beginning at a point x e 1i and being in ag; (in aD n 9ao(y)) at time T, either spends time T, without touching ag u OD or reaches ag; (OD n iaa(y)) over time T,; in this case PoT,(X`, IX(VD(Ki, K)  0.3y)) ? 6' (PoT1(X`, b (VD(Ki, y)  0.4y)) ? S').
Therefore, for any x e I'i we have Ps{X , a ag,} < Px{T, > T, } + Px{PoT,(X `, CDx(VD(Ki, K)  0.3y))
> 6');
(2.6)
PC {X', e OD u 1ffao(y)) < Px{T, > T, }
+ PX{PoT1(X`, 4x(VD(Ki y)  0.4y)) > 6'}. (2.7)
§2. Markov Chains Connected with the Process (X^ε_t, P^ε_x)
For small ε the first probability on the right sides of (2.6) and (2.7) does not exceed exp{−ε^{-2}V_0}, and by virtue of Theorem 3.2 of Ch. 5 the second one is smaller than exp{−ε^{-2}(V_D(K_i, K_j) − 0.5γ)} (respectively, exp{−ε^{-2}(V_D(K_i, y) − 0.5γ)}) for all x ∈ Γ_i. From this we obtain the upper estimates in (2.3) and (2.5).

Lemma 2.2. For any γ > 0 there exists ρ_0 > 0 (which can be chosen arbitrarily small) such that for any ρ_2, 0 < ρ_2 < ρ_0, there exists ρ_1, 0 < ρ_1 < ρ_2, such that for any δ_0, 0 < δ_0 < ρ_0, sufficiently small ε and all x outside the ρ_2-neighborhoods of the compacta K_i and the boundary ∂D, the one-step transition probabilities of the chain Z_n satisfy the inequalities

exp{−ε^{-2}(V_D(x, K_i) + γ)} ≤ P(x, ∂g_i) ≤ exp{−ε^{-2}(V_D(x, K_i) − γ)};  (2.8)

exp{−ε^{-2}(V_D(x, ∂D) + γ)} ≤ P(x, ∂D) ≤ exp{−ε^{-2}(V_D(x, ∂D) − γ)};  (2.9)

exp{−ε^{-2}(V_D(x, y) + γ)} ≤ P(x, ∂D ∩ U_{δ_0}(y)) ≤ exp{−ε^{-2}(V_D(x, y) − γ)},  (2.10)

where U_{δ_0}(y) denotes the δ_0-neighborhood of y.
Proof. We choose ρ₀ as in the proof of the preceding lemma. Let ρ₂, 0 < ρ₂ < ρ₀, be given. Denote by C the compactum consisting of the points of D ∪ ∂D lying outside the ρ₂-neighborhoods of the K_i and of ∂D. We choose δ, 0 < δ < ρ₀/2, such that for all x ∈ C_{ρ₂}, j = 1, …, l, and y ∈ ∂D we have

V_{D+δ}(x, K_j) ≥ V_D(x, K_j) − 0.1γ,  V_{D−δ}(x, K_j) ≤ V_D(x, K_j) + 0.1γ,
V_{D+δ}(x, y) ≥ V_D(x, y) − 0.1γ,  V_{D−δ}(x, (y)_δ) ≤ V_D(x, y) + 0.1γ.
We choose ρ₂-nets x₁, …, x_M in C_{ρ₂} and y₁, …, y_N on ∂D. For every pair x_i, K_j for which V_D(x_i, K_j) < ∞ we select a function φ_t^{x_i,K_j}, 0 ≤ t ≤ T = T(x_i, K_j), with φ₀^{x_i,K_j} = x_i and φ_T^{x_i,K_j} ∈ K_j, which does not touch ∪_{s≠j} K_s, does not leave D_{−δ} ∪ ∂D_{−δ}, and for which S_{0T}(φ^{x_i,K_j}) ≤ V_D(x_i, K_j) + 0.2γ. […] touches the boundary until exit from D ∪ ∂D. (Of course, this simple result can be obtained more easily.) The formulation of Theorem 2.1 in the language of differential equations reads as follows:
Theorem 2.2. Suppose that u^ε(x) is the solution of the Dirichlet problem L^ε u^ε(x) = 0 in D, u^ε(x) = exp{ε^{−2}F(x)} on ∂D, where in local coordinates we have L^ε = Σ_i b^{i,ε}(x)(∂/∂x^i) + (ε²/2) Σ_{i,j} a^{ij}(x)(∂²/∂x^i ∂x^j), b^{i,ε}(x) → b^i(x) uniformly in x and in the local coordinate systems as ε → 0, and F is a continuous function. If none of the ω-limit sets of the system ẋ_t = b(x_t) lies entirely in D ∪ ∂D, then u^ε(x) ≈ exp{ε^{−2} max_{y∈∂D} (F(y) − V_D(x, y))} as ε → 0, uniformly in x belonging to any compact subset of D, where V_D(x, y) is defined by the coefficients b^i(x), a^{ij}(x) in the same way as in §1.

In the case where there are ω-limit sets in D ∪ ∂D (namely, in D) and condition A is satisfied, for the study of problems connected with the behavior of (X_t^ε, P_x^ε) we first have to study the limit behavior of Markov chains with exponential asymptotics of the transition probabilities.
§3. Lemmas on Markov Chains

For finite Markov chains an invariant measure, the distribution at the time of exit from a set, the mean exit time, etc., can be expressed explicitly as ratios of determinants, i.e., of sums of products of transition
probabilities (since the values of the invariant measure and of the other quantities involved are positive, these sums only contain terms with a plus sign). Which products appear in the various sums can be described conveniently by means of graphs on the set of states of the chain. For chains on an infinite phase space divided into a finite number of parts, for which there are upper and lower estimates of the transition probabilities (of the type of those obtained in the preceding section), in this section we obtain estimates of the values of an invariant measure, of the probabilities of exit to one set sooner than to another (or others), and so on. In the application of these results to the chain Z_n with the estimates of the transition probabilities obtained in the preceding section, in each sum we select one or several terms decreasing more slowly than the remaining ones. The constant characterizing the rate of decrease of the sum of products is, of course, determined as the minimum of the sums of the constants characterizing the rate of decrease of each of the transition probabilities (cf. §§4, 5).

Let L be a finite set, whose elements will be denoted by the letters i, j, k, m, n, etc., and let a subset W be selected in L. A graph consisting of arrows m → n (m ∈ L\W, n ∈ L, n ≠ m) is called a W-graph if it satisfies the following conditions:

(1) every point m ∈ L\W is the initial point of exactly one arrow;
(2) there are no closed cycles in the graph.

We note that condition (2) can be replaced by the following condition:

(2') for any point m ∈ L\W there exists a sequence of arrows leading from it to some point n ∈ W.
We denote by G(W) the set of W-graphs; we shall use the letter g to denote graphs. If p_{ij} (i, j ∈ L, j ≠ i) are numbers, then ∏_{(m→n)∈g} p_{mn} will be denoted by π(g).
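The combinatorial objects just introduced are easy to enumerate by machine for small L. The following Python sketch (ours, not part of the original text; all function names are our own choices) lists the W-graphs of a finite set by brute force, checking condition (2) in the equivalent form (2'), and computes π(g) for given numbers p_{ij}:

```python
from itertools import product

def w_graphs(L, W):
    """All W-graphs on L: every m in L\\W carries exactly one arrow m -> n
    (n in L, n != m), and following arrows from any m leads into W
    (condition (2'), equivalent to the absence of closed cycles)."""
    free = [m for m in L if m not in W]
    graphs = []
    for targets in product(*[[n for n in L if n != m] for m in free]):
        g = dict(zip(free, targets))
        if all(reaches(g, W, m) for m in free):
            graphs.append(g)
    return graphs

def reaches(g, W, m):
    """Does the chain of arrows starting at m reach the set W?"""
    seen = set()
    while m not in W:
        if m in seen:        # we ran into a closed cycle
            return False
        seen.add(m)
        m = g[m]
    return True

def pi(g, p):
    """pi(g): the product of p[m][n] over the arrows m -> n of g."""
    result = 1.0
    for m, n in g.items():
        result *= p[m][n]
    return result
```

For L = {1, 2, 3, 4} and W = {4}, the {4}-graphs are exactly the spanning trees of the complete graph on four vertices oriented toward the root 4, of which there are 4² = 16 by Cayley's formula.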
Lemma 3.1. Let us consider a Markov chain with set of states L and transition probabilities p_{ij}, and assume that every state can be reached from any other state in a finite number of steps. Then the stationary distribution of the chain is {(Σ_{j∈L} Q_j)^{−1} Q_i, i ∈ L}, where

Q_i = Σ_{g∈G({i})} π(g).  (3.1)
Proof. The numbers Q_i are positive. It is sufficient to prove that they satisfy the system of equations

Q_i = Σ_{j∈L} Q_j p_{ji}  (i ∈ L),

since it is well known that the stationary distribution is the unique (up to a multiplicative constant) solution of this system. In the ith equation we carry
the ith term from the right side to the left side; we obtain that we have to verify the equality

Q_i Σ_{k≠i} p_{ik} = Σ_{j≠i} Q_j p_{ji}.  (3.2)
It is easy to see that if we substitute the numbers defined by formula (3.1) into (3.2), then on both sides we obtain the sum Σ π(g) over all graphs g satisfying the following conditions:

(1) every point m ∈ L is the initial point of exactly one arrow m → n (n ≠ m, n ∈ L);
(2) in the graph there is exactly one closed cycle and this cycle contains the point i.
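Lemma 3.1 can be checked numerically on a small chain. In the sketch below (our code and names, with arbitrarily chosen transition probabilities), the numbers Q_i of formula (3.1) are computed by enumerating {i}-graphs, and the resulting distribution is verified to be stationary:

```python
from itertools import product

def i_graphs(L, i):
    """All {i}-graphs: each m != i carries one arrow m -> n, no closed cycles."""
    free = [m for m in L if m != i]
    out = []
    for targets in product(*[[n for n in L if n != m] for m in free]):
        g = dict(zip(free, targets))
        ok = True
        for m in free:
            seen, cur = set(), m
            while cur != i and ok:       # chains must end at i
                if cur in seen:
                    ok = False           # closed cycle found
                seen.add(cur)
                cur = g[cur]
        if ok:
            out.append(g)
    return out

def stationary(p):
    """Lemma 3.1: Q_i = sum over {i}-graphs of pi(g), formula (3.1);
    the stationary distribution is Q_i normalized by sum_j Q_j."""
    L = sorted(p)
    Q = {}
    for i in L:
        Q[i] = 0.0
        for g in i_graphs(L, i):
            prod = 1.0
            for m, n in g.items():
                prod *= p[m][n]          # pi(g)
            Q[i] += prod
    s = sum(Q.values())
    return {i: q / s for i, q in Q.items()}
```

As a design note, this brute-force enumeration has factorial cost and is only meant to make the combinatorics of the lemma concrete on toy examples.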
Lemma 3.2. Let us be given a Markov chain on a phase space X divided into disjoint sets X_i, where i runs over a finite set L. Suppose that there exist nonnegative numbers p_{ij} (j ≠ i; i, j ∈ L) and a number a > 1 such that

a^{−1} p_{ij} ≤ P(x, X_j) ≤ a p_{ij}  (x ∈ X_i, i ≠ j)

for the transition probabilities of our chain. Furthermore, suppose that every set X_j can be reached sooner or later from any state x (for this it is necessary and sufficient that for any j there exist a {j}-graph g with π(g) > 0). Then
a^{−2l} (Σ_{j∈L} Q_j)^{−1} Q_i ≤ ν(X_i) ≤ a^{2l} (Σ_{j∈L} Q_j)^{−1} Q_i  (3.4)

for the invariant measure ν of the chain, where the Q_i are defined by formula (3.1) with the numbers p_{ij}. […]

§4. The Problem of the Invariant Measure

[…] Similarly to the equivalence of points, the property of stability depends only on the structure of the system ẋ_t = b(x_t). The example illustrated in Fig. 11 shows that there may exist a stable compact set not containing any stable ω-limit
set (i.e., such that any trajectory of the dynamical system beginning near this set does not leave a small neighborhood of the set). In the example in Fig. 12, K₂ and K₄ are stable and K₁ and K₃ are unstable.

Lemma 4.2. If a compactum K_i is unstable, then there exists a stable compactum K_j such that V(K_i, K_j) = 0.
Proof. There exists x ∉ K_i such that V(K_i, x) = 0. We issue a trajectory x_t(x), t ≥ 0, from x. It leads us to its ω-limit set, which is contained in one of the compacta K_j; furthermore, V(K_i, K_j) ≤ V(K_i, x) + V(x, K_j) = 0. The compactum K_j does not coincide with K_i, since otherwise x would have to lie in K_i; if K_j is unstable, in the same way we pass from it to another compactum, etc. Finally we arrive at a stable compactum.

Lemma 4.3.
(a) Among the {i}-graphs for which the minimum (4.3) is attained there is one in which from the index m, m ≠ i, of each unstable compactum an arrow m → j issues with V(K_m, K_j) = 0 and with K_j stable.
(b) For a stable compactum K_i, the value W(K_i) can be calculated according to (4.3), considering graphs on the set of indices of stable compacta only.
(c) If K_j is an unstable compactum, then

W(K_j) = min[W(K_i) + V(K_i, K_j)],  (4.4)

where the minimum is taken over all stable compacta K_i.
Proof. (a) For an {i}-graph at which the minimum (4.3) is attained we consider all m for which assertion (a) is not satisfied. Among these there are ones into which no arrow leads from the index of an unstable compactum.

If no arrow ends in m, we replace the arrow m → n by an arrow m → j with V(K_m, K_j) = 0 and K_j stable. Then the {i}-graph remains an {i}-graph and the sum of the values of V corresponding to the arrows does not increase.

If the arrows ending in m are s₁ → m, …, s_r → m and K_{s₁}, …, K_{s_r} are stable, we also replace m → n by m → j with V(K_m, K_j) = 0. If no cycle is formed, then we obtain an {i}-graph with a sum of values not exceeding the original one. However, a cycle m → j → ⋯ → s_k → m may be formed. Then we replace s_k → m by s_k → n; we have

V(K_m, K_j) + V(K_{s_k}, K_n) = V(K_{s_k}, K_n) ≤ V(K_{s_k}, K_m) + V(K_m, K_n),

so that the sum of the values of V corresponding to the arrows does not increase.

Repeating this operation, we get rid of all "bad" arrows.
(b) That the minimum over {i}-graphs on the set of indices of stable compacta is not greater than the former minimum follows from (a). The reverse inequality follows from the fact that every {i}-graph on the set of indices of stable compacta can be completed to an {i}-graph on the whole set {1, …, l} by adding arrows, with vanishing V, beginning at the indices of unstable compacta.

(c) For any i ≠ j we have W(K_j) ≤ W(K_i) + V(K_i, K_j). Indeed, on the right side we have the minimum of Σ V(K_m, K_n) over graphs in which every point is the initial point of exactly one arrow and there is exactly one closed cycle, and this cycle passes through i and j. Removing the arrow beginning at j, we obtain a {j}-graph without increasing the sum.

If K_j is an unstable compactum, then we choose a stable compactum K_s such that V(K_j, K_s) = 0, and a graph g for which the minimum Σ_{(m→n)∈g} V(K_m, K_n) defining W(K_j) is attained and in which the indices of unstable compacta are initial points only of arrows with V(K_m, K_n) = 0. We add the arrow j → s to this graph. Then a cycle j → s → ⋯ → j is formed. In this cycle we choose the last index i of a stable compactum before j. To the arrow i → k issuing from i (k may be equal to j) there corresponds a value V(K_i, K_k) ≥ V(K_i, K_j). We throw this arrow out and obtain an {i}-graph g' for which Σ_{(m→n)∈g'} V(K_m, K_n) ≤ W(K_j) − V(K_i, K_j). This implies that W(K_j) ≥ min[W(K_i) + V(K_i, K_j)] over all stable compacta K_i.
Formula (4.4) implies, among other things, that the minimum of W(K_i) can be attained only at stable compacta.
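The minimum in (4.3) can be computed by brute force over {i}-graphs. As an illustration (our code and names; the matrix (2.1) of the example discussed next is given earlier in the chapter and is not reproduced here), we use instead the 5×5 matrix of values V(K_i, K_j) from the example of §6 below, where all five compacta are stable:

```python
from itertools import product

# V(K_i, K_j) taken from the five-compacta example in Section 6 of this chapter
V = {
    1: {2: 4, 3: 9, 4: 13, 5: 12},
    2: {1: 7, 3: 5, 4: 10, 5: 11},
    3: {1: 6, 2: 8, 4: 17, 5: 15},
    4: {1: 3, 2: 6, 3: 8, 5: 2},
    5: {1: 5, 2: 7, 3: 10, 4: 3},
}

def W(i):
    """Formula (4.3): min over {i}-graphs g of the sum of V(K_m, K_n)
    over the arrows of g."""
    L = sorted(V)
    free = [m for m in L if m != i]
    best = float("inf")
    for targets in product(*[[n for n in L if n != m] for m in free]):
        g = dict(zip(free, targets))
        ok = True                     # reject graphs containing closed cycles
        for m in free:
            seen, cur = set(), m
            while cur != i and ok:
                if cur in seen:
                    ok = False
                seen.add(cur)
                cur = g[cur]
        if ok:
            best = min(best, sum(V[m][g[m]] for m in free))
    return best
```

For this matrix the smallest value is W(K₃), which agrees with the discussion in §6, where for large times the distribution of the process in this example concentrates near K₃ independently of the initial point.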
We consider the example of a dynamical system on a sphere whose trajectories, illustrated in the plane, have the form depicted in Fig. 12. Of course, the system has another singular point, not shown in the figure, which is unstable; we have to introduce the compactum K₅ consisting of this point. If the values V(K_i, K_j), 1 ≤ i, j ≤ 4, are given by the matrix (2.1), then the values W(K_i) for the stable compacta are W(K₂) = 6 and W(K₃) = 9. We obtain that as ε → 0, the invariant measure μ^ε is concentrated in a small neighborhood of the limit cycle K₂ and converges weakly to the unique invariant measure of the system ẋ_t = b(x_t) concentrated on K₂ (it is given by a density, with respect to arc length, inversely proportional to the length of the vector b(x)).

Theorem 4.3. For x ∈ M let us set
W(x) = min_i [W(K_i) + V(K_i, x)],  (4.5)

where the minimum can be taken over either all compacta or only the stable ones. Let γ be an arbitrary positive number. For any sufficiently small neighborhood 𝒰(x) of x there exists ε₀ > 0 such that for ε < ε₀ we have

exp{−ε^{−2}(W(x) − min_i W(K_i) + γ)} ≤ μ^ε(𝒰(x)) ≤ exp{−ε^{−2}(W(x) − min_i W(K_i) − γ)}.
Proof. For a point x not belonging to any of the compacta K_i we use the following device: to the compacta K₁, …, K_l we add another compactum {x}. The system of disjoint compacta thus obtained still satisfies condition A of §2, and we can apply Theorem 4.1. The compactum {x} is unstable. Therefore, the minimum of the values of W is attained at a compactum other than {x}, and W({x}) can be calculated according to (4.5). For a point x belonging to some K_i we have W(x) = W(K_i). The value of μ^ε(𝒰(x)) can be estimated from above by μ^ε(g_i), and a lower estimate can be obtained by means of Lemma 1.8.
Theorem 4.3 means that the asymptotics, as ε → 0, of the invariant measure μ^ε is given by the action function ε^{−2}(W(x) − min_i W(K_i)).

We consider the one-dimensional case, where everything can be calculated
to the end. Let the manifold M be the interval from 0 to 6 closed into a circle. Let us consider the family of diffusion processes on it with infinitesimal generators b(x)(d/dx) + (ε²/2)(d²/dx²), where b(x) = −U′(x) and the graph of U(x) is given in Fig. 14. The function U(x) has local extrema at the points 0, 1, 2, 3, 4, 5, 6, and its values at these points are 7, 1, 5, 0, 10, 2, 11, respectively. (This is not the case, considered in §3 of Ch. 4, of a potential field b(x), since U is not continuous on the circle M.) There are six compacta containing ω-limit sets of ẋ_t = b(x_t): the point 0 (which is the same as 6) and the points 1, 2, 3, 4 and 5; the points 1, 3 and 5 are stable. We can determine the values of V(1, x) for 0 ≤ x ≤ 2 by solving problem
R for the equation b(x)V′_x(1, x) + ½(V′_x(1, x))² = 0. We obtain V(1, x) = 2[U(x) − U(1)]. Analogously, V(3, x) = 2[U(x) − U(3)] for 2 ≤ x […]

§5. The Problem of Exit from a Domain

[…]

Theorem 5.1. […] For any γ > 0 and any δ > 0 there exist δ₀, 0 < δ₀ ≤ δ, and ε₀ > 0 such that for all ε < ε₀, x ∈ F and y ∈ ∂D we have

exp{−ε^{−2}(W_D(x, y) − W_D + γ)} ≤ P_x{X^ε_{τ^ε} ∈ 𝒢_{δ₀}(y)} ≤ exp{−ε^{−2}(W_D(x, y) − W_D − γ)},  (5.6)

where W_D is defined by formula (5.3) and W_D(x, y) by formulas (5.1), (5.2) or (5.5).

In other words, for a process beginning at the point x, the asymptotics, as ε → 0, of the distribution of the position at the time of exit to the boundary is given by the action function ε^{−2}(W_D(x, y) − W_D), uniformly in all initial points x strictly inside D.
Proof. We put γ′ = γ/(4l + 2) and choose a corresponding ρ₀ according to Lemmas 2.1 and 2.2. We choose a positive ρ₂ smaller than ρ₀ and than the distance between F and ∂D. According to the same lemmas, we choose a positive ρ₁ < ρ₂ and use the construction described in §2, involving the chain Z_n. For x ∈ ∪_i G_i we use Lemma 3.3 with L = {K₁, …, K_l, y, ∂D}, W = {y, ∂D} and the following sets X_α, α ∈ L: G₁, …, G_l, ∂D ∩ 𝒢_{δ₀}(y), ∂D\𝒢_{δ₀}(y) (beginning with n = 1, Z_n ∈ G_i implies Z_n ∈ ∂g_i). The estimates for the probabilities P(x, X_α) for x ∈ G_i are given by formulas (2.3)–(2.5); we have p_{αβ} = exp{−ε^{−2}V_D(α, β)} and a = exp{ε^{−2}γ′}. The sum Σ_{g∈G_{K_i}{y,∂D}} π(g) is
equivalent to a positive constant N multiplied by exp{−ε^{−2}W_D(K_i, y)}. (Here N is equal to the number of graphs g ∈ G_{K_i}{y, ∂D} at which the minimum of Σ_{(α→β)∈g} V_D(α, β) is attained. The denominator in (3.3) is equivalent to a positive constant multiplied by exp{−ε^{−2}W_D}.) Taking into account that for x ∈ G_i, W_D(x, y) differs from W_D(K_i, y) by not more than γ′, this implies the assertion of the theorem for x ∈ ∪_i G_i. If x ∈ F\∪_i G_i, we use the strong Markov property with respect to the Markov time τ₁:

P_x{X^ε_{τ^ε} ∈ 𝒢_{δ₀}(y)} = P_x{X^ε_{τ₁} ∈ 𝒢_{δ₀}(y)} + Σ_i M_x{X^ε_{τ₁} ∈ ∂g_i; P_{X^ε_{τ₁}}{X^ε_{τ^ε} ∈ 𝒢_{δ₀}(y)}}.  (5.7)
According to (2.10), the first probability is between exp{−ε^{−2}(V_D(x, y) ± γ′)}; and according to what has already been proved, the probability under the sign of mathematical expectation is between exp{−ε^{−2}(W_D(K_i, y) − W_D ± (4l + 1)γ′)}. Using estimate (2.8), we obtain that the ith mathematical expectation in (5.7) lies between exp{−ε^{−2}(V_D(x, K_i) + W_D(K_i, y) − W_D ± (4l + 1)γ′)}, and the whole sum (5.7) is between exp{−ε^{−2}(W_D(x, y) − W_D ± (4l + 2)γ′)}, where W_D(x, y) is given by the first of formulas (5.4). This proves the theorem.
Theorem 5.1 enables us to establish, in particular, the most probable place of exit of X_t^ε to the boundary for small ε.

Theorem 5.2 (Wentzell and Freidlin [3], [4]). For every j = 1, …, l let Y_j be the set of points y ∈ ∂D at which the minimum of V_D(K_j, y) is attained. Let the point x be such that the trajectory x_t(x), t ≥ 0, of the dynamical system issued from x does not leave D and is attracted to K_i. From the {∂D}-graphs on the set {K₁, …, K_l, ∂D} we select those at which the minimum (5.3) is attained. In each of these graphs we consider the chain of arrows leading from K_i into ∂D; let the last arrow in the chain be K_j → ∂D. We denote by M(i) the set of all such j in all the selected graphs. Then with probability converging to 1 as ε → 0, the first exit to the boundary of the trajectory of X_t^ε beginning at x takes place in a small neighborhood of the set ∪_{j∈M(i)} Y_j.

The assertion remains valid if all the V_D, including those in formula (5.3), are replaced by Ṽ_D.
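The recipe of Theorem 5.2 can be sketched in code (ours, with our own names; since the matrix (2.1) used in the book's example is given earlier in the section and not reproduced here, the test data below are small hypothetical values of the quasipotentials, not the book's):

```python
from itertools import product

def exit_sets(V, VbD):
    """Most probable exit indices M(i) in the spirit of Theorem 5.2.
    V[i][j]: value V_D(K_i, K_j); VbD[i]: value V_D(K_i, dD)."""
    ks = sorted(VbD)
    D = "dD"
    nodes = ks + [D]
    cost = {i: {**V[i], D: VbD[i]} for i in ks}
    best, best_graphs = float("inf"), []
    for targets in product(*[[n for n in nodes if n != m] for m in ks]):
        g = dict(zip(ks, targets))
        ok = True                 # accept only {dD}-graphs (no closed cycles)
        for m in ks:
            seen, cur = set(), m
            while cur != D and ok:
                if cur in seen:
                    ok = False
                seen.add(cur)
                cur = g[cur]
        if not ok:
            continue
        s = sum(cost[m][g[m]] for m in ks)
        if s < best:
            best, best_graphs = s, [g]
        elif s == best:
            best_graphs.append(g)     # keep all minimizing graphs
    M = {}
    for i in ks:
        M[i] = set()
        for g in best_graphs:
            cur = i
            while g[cur] != D:        # follow the chain of arrows from i
                cur = g[cur]
            M[i].add(cur)             # last compactum before exit to dD
    return M
```

With two compacta, V_D(K₁, K₂) = 1, V_D(K₂, K₁) = 5, V_D(K₁, ∂D) = 10, V_D(K₂, ∂D) = 2, the unique minimizing {∂D}-graph is K₁ → K₂, K₂ → ∂D, so exit occurs near Y₂ from both basins.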
We return to the example illustrated in Fig. 12. We reproduce this figure here (Fig. 16). Besides the V_D(K_i, K_j), let V_D(K₂, ∂D) = 8, V_D(K₃, ∂D) = 2, V_D(K₄, ∂D) = 1 be given; V_D(K₁, ∂D) is necessarily equal to +∞. On ∂D we single out the sets Y₂, Y₃ and Y₄. Now there are two {∂D}-graphs minimizing Σ_{(α→β)∈g} V_D(α, β): the first one consists of the arrows K₁ → K₂, K₂ → ∂D, K₃ → ∂D and K₄ → K₂, and the second one is the same with K₄ → K₃ replacing K₄ → K₂. Consequently, M(1) = M(2) = {2}, M(3) = {3}, and M(4) = {2, 3}. The trajectories of the dynamical system emanating from a point x in the left-hand part of D (to the left of the separatrices ending at the point K₄) are attracted to the cycle K₂, with the exception of the unstable equilibrium position K₁. The points of the right-hand part are attracted to K₃, and the points on the separating line to K₄. Consequently, for small ε, from points of the left half of D the process X_t^ε goes out to ∂D in a small neighborhood of
Figure 16.
Y₂; from points of the right half it hits ∂D near Y₃; and from points of the separating line, near Y₂ or Y₃.

If we increase the V_D(K_i, ∂D) so that V_D(K₂, ∂D) = 16, V_D(K₃, ∂D) = 10, V_D(K₄, ∂D) = 9, then again there are two {∂D}-graphs minimizing the sum, differing in the arrow K₄ → K₂ or K₄ → K₃; for all i we have M(i) = {3}. Consequently, for small ε, the exit to the boundary from all points of the domain will take place near Y₃.
Now we turn to the problem of the time spent by the process X_t^ε in D until exit to the boundary. We consider graphs on the set L = {K₁, …, K_l, x, ∂D}. We put

M_D(x) = min_{g∈G(x↛∂D)} Σ_{(α→β)∈g} V_D(α, β).  (5.8)
The notation G(α ↛ W) was introduced in §3 after the formulation of Lemma 3.4.

Lemma 5.3. The minimum (5.8) can also be written as

M_D(x) = min_{g∈G(…)} Σ_{(α→β)∈g} V_D(α, β);  (5.9)

M_D(x) = W_D ∧ min_i [V_D(x, K_i) + M_D(K_i)],  (5.10)

where M_D(K_i) is defined by the equality

M_D(K_i) = min_{g∈G(K_i ↛ ∂D)} Σ_{(α→β)∈g} V_D(α, β),  (5.11)

in which the minimum is taken over graphs on the set {K₁, …, K_l, ∂D} (and W_D is defined by formula (5.3)). In determining the minima (5.9) or (5.11), (5.10), one can omit all unstable compacta K_i.
The proof is again analogous to that of Lemmas 4.1 and 4.3.

Theorem 5.3. We have

lim_{ε→0} ε² ln M_x τ^ε = W_D − M_D(x)  (5.12)

uniformly in x belonging to any compact subset F of D.
Proof. We choose γ′, ρ₀, ρ₁, ρ₂ as in the proof of the preceding theorem, but with the additional condition that the mean exit time from the ρ₀-neighborhoods of the K_i does not exceed exp{ε^{−2}γ′} (cf. Lemma 1.7). We consider the Markov times τ₀ (= 0), τ₁, τ₂, … and the chain Z_n = X^ε_{τ_n}. We denote by ν
the index of the step at which Z_n first goes out to ∂D, i.e., the smallest n for which τ_n = τ^ε. Using the strong Markov property, we can write

M_x τ^ε = Σ_{n=0}^∞ M_x{ν > n; M_{Z_n} τ₁}.

Lemmas 1.7, 1.8 and 1.9 yield that each mathematical expectation M_{Z_n} τ₁ in this sum does not exceed 2 exp{ε^{−2}γ′} but is greater than exp{−ε^{−2}γ′} (for small ε). Hence, up to a factor lying between exp{−ε^{−2}γ′} and 2 exp{ε^{−2}γ′}, M_x τ^ε coincides with Σ_{n=0}^∞ P_x{Z_n ∉ ∂D} = M_x ν. This mathematical expectation can be estimated by means of Lemma 3.4.
First, for x ∈ ∪_i G_i we obtain, using estimates (2.3)–(2.5), that

exp{ε^{−2}(W_D − M_D(K_i) − (4l + 1)γ′)} ≤ M_x τ^ε ≤ 2 exp{ε^{−2}(W_D − M_D(K_i) + (4l + 1)γ′)}  (5.13)

for small ε. Then for an initial point x ∈ F\∪_i G_i we obtain

M_x τ^ε = M_x τ₁ + Σ_{i=1}^l M_x{Z₁ ∈ ∂g_i; M_{Z₁} τ^ε}.

Taking account of the inequality M_x τ₁ ≤ 2 exp{ε^{−2}γ′} and estimates (2.8)–(2.10) and (5.13), we obtain that M_x τ^ε lies between exp{ε^{−2}(W_D − M_D(x) ± γ)}. This is true for x ∈ ∪_i G_i as well. Since γ > 0 was arbitrarily small, we obtain the assertion of the theorem.
We return to the example considered above (with the V_D(K_i, K_j) given by the matrix (2.1) and V_D(K₂, ∂D) = 8, V_D(K₃, ∂D) = 2, V_D(K₄, ∂D) = 1). We calculate the asymptotics of the mathematical expectation of the time τ^ε of exit to the boundary for trajectories beginning at the stable equilibrium position K₃. We find that W_D = 10 (the minimum (5.3) is attained at the two graphs K₁ → K₂, K₂ → ∂D, K₃ → ∂D, K₄ → K₂ or K₄ → K₃) and M_D(K₃) = 6 (the minimum (5.11) is attained at the graphs K₁ → K₂, K₃ → K₄, K₄ → K₂ and K₁ → K₂, K₃ → K₂, K₄ → …).

Hence the mathematical expectation of the exit time is logarithmically equivalent to exp{ε^{−2}(W_D − M_D(K₃))} = exp{4ε^{−2}}. Considering Z_n, we can understand how this average exit time arises. Beginning near K₃, the chain Z_n makes a number of steps of order exp{ε^{−2}V_D(K₃, ∂D)} = exp{2ε^{−2}} in ∂g₃, spending an amount of time of the same order, with probability close to 1 for small ε. After this, with probability close to 1, it goes out to ∂D, and with probability of order

exp{−ε^{−2}(V_D(K₃, K₂) − V_D(K₃, ∂D))} = exp{−4ε^{−2}},
it passes to the stable cycle K₂ (it may be delayed for a relatively small number of steps near the unstable equilibrium position K₄). After this has taken place, the chain Z_n performs, with overwhelming probability, transitions within the limits of ∂g₂, and (approximately exp{ε^{−2}V_D(K₂, K₁)} = exp{ε^{−2}} times less often) of ∂g₁, over a time of order exp{ε^{−2}V_D(K₂, ∂D)} = exp{8ε^{−2}}. After this it goes out to the boundary. Hence a mathematical expectation of order exp{4ε^{−2}} arises due to the less likely values, of order exp{8ε^{−2}}, occurring with probability of order exp{−4ε^{−2}}.

We recall that in the case of a domain attracted to one stable equilibrium position, the average exit time M_x τ^ε has the same order as the boundaries of the range of the most probable values of τ^ε (Theorem 4.2 of Ch. 4). In particular, any quantile of the distribution of τ^ε is logarithmically equivalent to the average. Our example shows that this is not so in general in the case where there are several compacta K_i containing ω-limit sets in the domain D; the mathematical expectation may tend to infinity essentially faster than the median and the quartiles. This seriously restricts the value of the above theorem as a result characterizing the limit behavior of the distribution of τ^ε.
We mention the corresponding result formulated in the language of differential equations.
Theorem 5.4. Let g(x) be a positive continuous function on D ∪ ∂D and let v^ε(x) be the solution of the equation L^ε v^ε(x) = −g(x) in D with vanishing boundary conditions on ∂D. Then

v^ε(x) ≈ exp{ε^{−2}(W_D − M_D(x))}

uniformly in x belonging to any compact subset of D as ε → 0.
§6. Decomposition into Cycles. Sublimit Distributions

In the problems to which this chapter is devoted there are two large parameters: ε^{−2} and t, the time over which the perturbed dynamical system is considered. It is natural to study what happens when the convergence of these parameters to infinity is coordinated in one way or another. We shall be interested in the limit behavior of the measures P_x{X_t^ε ∈ Γ}; we restrict ourselves to the case of a compact manifold (as in §4).

The simplest case is where first ε^{−2} goes to infinity and then t does. Then everything is determined by the behavior of the unperturbed dynamical system. It is clear that lim_{t→∞} lim_{ε→0} P_x{X_t^ε ∈ Γ} = 1 if the open set Γ contains the whole
ω-limit set of the trajectory x_t(x) beginning at the point x₀(x) = x. This limit is equal to zero if Γ is at a positive distance from the ω-limit set.

In §4 we considered the case where first t goes to infinity and then ε^{−2} does. Theorem 4.1 gives an opportunity to establish that

lim_{ε→0} lim_{t→∞} P_x{X_t^ε ∈ Γ} = 1

for open sets Γ containing all compacta K_i at which the minimum of W(K_i) is attained. In the case of general position this minimum is attained at a single compactum. If on this compactum there is concentrated a unique normalized invariant measure μ₀ of the dynamical system, then

lim_{ε→0} lim_{t→∞} P_x{X_t^ε ∈ Γ} = μ₀(Γ)

for all Γ with boundary of μ₀-measure zero.

We study the behavior of X_t^ε on time intervals of length t(ε^{−2}), where t(ε^{−2}) is a function monotone increasing with ε^{−2}. It is clear that if t(ε^{−2}) increases sufficiently slowly, then over time t(ε^{−2}) the trajectory of X_t^ε cannot move far from the stable compactum in whose domain of attraction the initial point lies. Over larger time intervals there are passages from the neighborhood of this compactum to neighborhoods of others: first to the
"closest" compactum (in the sense of the action functional) and then to more and more "far away" ones. First of all we establish in which order X1 enters the neighborhoods of the compacta K. Theorem 6.1. Let L = { 1, 2, ... , I) and let Q be a subset of L. For the process
X; let us consider the first entrance time TQ of the boundaries ag, of the pneighborhoods g, of the K, with indices in L\Q. Let the process begin in g; u agi, i e Q. Then for sufficiently small p, with probability converging to 1 as e + 0, XL belongs to one of the sets ag, such that in one of the (L\Q)graphs g at which the minimum
A(Q) = min
Y V(Km, K,,)
(6.1)
geG(L\Q) (mn)eg
is attained, the chain of arrows beginning at i leads to j E L\Q.
The proof can be carried out easily by means of Lemmas 2.1 and 3.3. In this theorem Ṽ(K_m, K_n) can be replaced by V(K_m, K_n), and in this case we can also omit all unstable compacta and consider only passages from one stable compactum to another.
We consider an example. Let K_i, i = 1, 2, 3, 4, 5, be stable compacta containing ω-limit sets, and let the values V(K_i, K_j) be given by the matrix

0  4  9 13 12
7  0  5 10 11
6  8  0 17 15
3  6  8  0  2
5  7 10  3  0
Let the process begin near K₁. We determine the {2, 3, 4, 5}-graph minimizing the sum of the values of V. This graph consists of the single arrow 1 → 2. Therefore, the first of the compacta approached by the process will be K₂. Further, we see where we go from the neighborhood of K₂. We put Q = {2}. We find that the {1, 3, 4, 5}-graph 2 → 3 minimizes the sum (6.1). Consequently, the next compactum approached by the process will be K₃, with overwhelming probability. Then the graph 3 → 1 shows that the process returns to K₁. Passages from K₁ to K₂, from K₂ to K₃ and from K₃ back to K₁ take place many times, but ultimately the process reaches one of the compacta K₄, K₅. In order to see which one exactly, we use Theorem 6.1 for Q = {1, 2, 3}. We find that the minimum (6.1) is attained for the {4, 5}-graph 3 → 1, 1 → 2, 2 → 4. Hence, with probability close to 1 for small ε, if the process begins near K₁, K₂ or K₃, then it reaches the neighborhood of K₄ sooner than that of K₅. (We are not saying that until then the process performs passages between the compacta K₁, K₂ and K₃ only in the most probable order K₁ → K₂ → K₃ → K₁. What is more, in our case it can be proved that with probability close to 1, passages also take place in the reverse order before the neighborhood of K₄ is reached.) Afterwards the process goes from K₄ to K₅ and back (this is shown by the graphs 4 → 5 and 5 → 4). Then it returns to the neighborhoods of the compacta K₁, K₂ and K₃, most probably to K₁ first (as is shown by the graph 5 → 4, 4 → 1). On large time intervals passages take place between the "cycles" K₁ → K₂ → K₃ → K₁ and K₄ → K₅ → K₄; they occur most often in the way described above. Consequently, we obtain a "cycle of cycles," a cycle of rank two.

The passages between the K_i can be described by means of a hierarchy of cycles in the general case as well (Freidlin [9], [10]). Let K₁, …, K_l be stable compacta and let Q be a subset of L = {1, 2, …, l}. We assume that there exists a unique (L\Q)-graph g* for which the minimum (6.1) is attained. We define R_Q(i), i ∈ Q, as that element of L\Q which is the terminal point of the chain of arrows going from i to L\Q in the graph g*.

Now we describe the decomposition of L into a hierarchy of cycles. We begin with the cycles of rank one. For every i₀ ∈ L we consider the sequence i₀, i₁, i₂, …, in which i_{k+1} = R_{{i_k}}(i_k). Let n be the smallest index for which there is a repetition: i_n = i_m, 0 ≤ m < n, while all i_k with k < n are different. Then the cycles of rank one generated by the element
i₀ ∈ L are, by definition, the groups {i₀}, {i₁}, …, {i_{m−1}} and {i_m → i_{m+1} → ⋯ → i_{n−1} → i_m}, where the last group is considered with the indicated cyclic order. Cycles generated by distinct initial points i₀ ∈ L either do not intersect or coincide; in the latter case the cyclic order on them is one and the same. Hence the cycles of rank one have been selected (some of them consist of a single point).
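For the five-compacta example above, the selection of Theorem 6.1 and the construction of the cycles of rank one can be sketched as follows (our code, with our own function names; the matrix is the one given after Theorem 6.1):

```python
from itertools import product

# The matrix of values V(K_i, K_j) from the example above
V = {
    1: {2: 4, 3: 9, 4: 13, 5: 12},
    2: {1: 7, 3: 5, 4: 10, 5: 11},
    3: {1: 6, 2: 8, 4: 17, 5: 15},
    4: {1: 3, 2: 6, 3: 8, 5: 2},
    5: {1: 5, 2: 7, 3: 10, 4: 3},
}
L = sorted(V)

def minimizing_graphs(Q):
    """(L\\Q)-graphs attaining the minimum (6.1): one arrow from each m in Q,
    chains of arrows leading into L\\Q (no closed cycles)."""
    best, best_g = float("inf"), []
    for targets in product(*[[n for n in L if n != m] for m in Q]):
        g = dict(zip(Q, targets))
        ok = True
        for m in Q:
            seen, cur = set(), m
            while cur in g and ok:   # points of L\Q carry no arrow
                if cur in seen:
                    ok = False
                seen.add(cur)
                cur = g[cur]
        if not ok:
            continue
        s = sum(V[m][g[m]] for m in Q)
        if s < best:
            best, best_g = s, [g]
        elif s == best:
            best_g.append(g)
    return best_g

def next_compacta(i, Q):
    """Theorem 6.1: indices j in L\\Q reached from i along the chains of
    arrows of the minimizing (L\\Q)-graphs."""
    out = set()
    for g in minimizing_graphs(Q):
        cur = i
        while cur in g:
            cur = g[cur]
        out.add(cur)
    return out

def rank_one_cycle(i0):
    """Iterate i -> R(i), the most probable next compactum, until repetition;
    return the cyclic part of the sequence (general position assumed)."""
    seq = [i0]
    while True:
        (nxt,) = next_compacta(seq[-1], [seq[-1]])
        if nxt in seq:
            return seq[seq.index(nxt):]
        seq.append(nxt)
```

Running this reproduces the passages described in the example: 1 → 2 → 3 → 1, escape from {1, 2, 3} to K₄, and the two cycles of rank one, {1 → 2 → 3 → 1} and {4 → 5 → 4}.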
We continue the definition by recurrence. Let the cycles of rank k − 1 (briefly, (k − 1)-cycles) π₁^{k−1}, π₂^{k−1}, … already be defined; they are sets of (k − 2)-cycles equipped with a cyclic order. Ultimately, every cycle consists of points, elements of L; we shall denote the set of points constituting a cycle by the same symbol as the cycle itself. We shall say that a cycle π_j^{k−1} is a successor of π_i^{k−1}, and write π_i^{k−1} → π_j^{k−1}, if R_{π_i^{k−1}}(m) ∈ π_j^{k−1} for m ∈ π_i^{k−1}. It can be proved that the function R_{π_i^{k−1}}(m) assumes the same value for all m ∈ π_i^{k−1}, so that this definition is unambiguous.

Now we consider a cycle π_{i₀}^{k−1} and the sequence of cycles π_{i₀}^{k−1} → π_{i₁}^{k−1} → π_{i₂}^{k−1} → ⋯ beginning with it. In this sequence repetition begins from a certain index; let n be the smallest such index: π_{i_n}^{k−1} = π_{i_m}^{k−1}, 0 ≤ m < n. We shall say that the cycle π_{i₀}^{k−1} generates the cycles of rank k: {π_{i₀}^{k−1}}, …, {π_{i_{m−1}}^{k−1}} (cycles of rank k, each consisting of one cycle of the preceding rank) and {π_{i_m}^{k−1} → π_{i_{m+1}}^{k−1} → ⋯ → π_{i_{n−1}}^{k−1} → π_{i_m}^{k−1}}. Taking all initial (k − 1)-cycles, we decompose all cycles of rank k − 1 into k-cycles. The cycles of rank zero are the points of L; for some k all (k − 1)-cycles participate in one k-cycle, which exhausts the whole set L.

For small ε, the decomposition into cycles completely determines the most probable order of traversing the neighborhoods of the stable compacta by trajectories of X_t^ε (of course, all this concerns the case of "general position," where every minimum (6.1) is attained at only one graph). Now we turn our attention to the time spent by the process in one cycle or another.

Theorem 6.2. Let π be a cycle. Let us put
C(π) = A(π) − min_{i∈π} min_{g∈G_π{i}} Σ_{(m→n)∈g} V(K_m, K_n),  (6.2)

where A(π) is defined by formula (6.1) and G_π{i} is the set of {i}-graphs over the set π. Then for sufficiently small ρ > 0 we have

lim_{ε→0} ε² ln M_x τ_π = C(π)  (6.3)

uniformly in x belonging to the ρ-neighborhood of ∪_{i∈π} K_i, and for any γ > 0 we have

lim_{ε→0} P_x{exp{ε^{−2}(C(π) − γ)} < τ_π < exp{ε^{−2}(C(π) + γ)}} = 1  (6.4)

uniformly in all indicated x.
Proof. Relation (6.3) can be derived from Theorem 5.3. We recall that for any set Q ⊂ L which is not a cycle, lim_{ε→0} ε² ln M_x τ_Q depends in general on the choice of the point x ∈ ∪_{i∈Q} g_i (cf. §5). The proof of assertion (6.4) is analogous to that of Theorem 4.2 of Ch. 4, and we omit it.
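Formula (6.2) can be evaluated by brute force for the five-compacta example (our code and names; A(π) is computed from (6.1) by enumerating graphs whose arrows issue from the points of π):

```python
from itertools import product

V = {  # V(K_i, K_j) from the example following Theorem 6.1
    1: {2: 4, 3: 9, 4: 13, 5: 12},
    2: {1: 7, 3: 5, 4: 10, 5: 11},
    3: {1: 6, 2: 8, 4: 17, 5: 15},
    4: {1: 3, 2: 6, 3: 8, 5: 2},
    5: {1: 5, 2: 7, 3: 10, 4: 3},
}
L = sorted(V)

def min_graph_sum(nodes, Q):
    """min over W-graphs (W = nodes \\ Q) of the sum of V over the arrows:
    one arrow from every point of Q into `nodes`, no closed cycles."""
    best = float("inf")
    for targets in product(*[[n for n in nodes if n != m] for m in Q]):
        g = dict(zip(Q, targets))
        ok = True
        for m in Q:
            seen, cur = set(), m
            while cur in g and ok:   # chains must leave Q
                if cur in seen:
                    ok = False
                seen.add(cur)
                cur = g[cur]
        if ok:
            best = min(best, sum(V[m][g[m]] for m in Q))
    return best

def C(cycle):
    """Formula (6.2): A(pi) minus the smallest internal {i}-graph sum."""
    a = min_graph_sum(L, cycle)                 # A(pi), formula (6.1)
    internal = min(min_graph_sum(cycle, [m for m in cycle if m != i])
                   for i in cycle)
    return a - internal
```

The values C({4}) = 2, C({5}) = 3 and C({4, 5}) = 4 obtained in this way match the thresholds 2, 3 and 4 appearing in the discussion of Fig. 17 below.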
The decomposition into cycles and Theorem 6.2 enable us to answer the question of the behavior of X_t^ε on time intervals of length t(ε^{−2}), where t(ε^{−2}) has exponential asymptotics as ε^{−2} → ∞: t(ε^{−2}) ≍ exp{Cε^{−2}}. Let π, π′, …, π^{(s)} be the cycles of next to the last rank, unified into the last cycle, which exhausts L. If the constant C is greater than C(π), C(π′), …, C(π^{(s)}), then over a time of order exp{Cε^{−2}} the process can traverse all these cycles many times (and all cycles of smaller rank inside them), and for it the limit distribution found in §4 is established. In terms of the hierarchy of cycles, this limit distribution is concentrated on that one of the cycles π, π′, …, π^{(s)} for which the corresponding constant C(π), C(π′), … or C(π^{(s)}) is the greatest; within this cycle it is concentrated on that one of the subcycles for which the corresponding constant defined by (6.2) is the greatest, and so on, down to the points i and the compacta K_i corresponding to them. The same is true if we do not have t(ε^{−2}) ≍ exp{Cε^{−2}} but only

lim_{ε→0} ε² ln t(ε^{−2}) > max{C(π), C(π′), …, C(π^{(s)})}.
Over a time t(ε^{−2}) of smaller order, the process cannot traverse all cycles, and no limit distribution common to all initial points is established. Nevertheless, on the cycles of not maximal rank which the process can traverse, sublimit distributions, depending on the initial point, are established. We return to the example considered after Theorem 6.1. We illustrate the hierarchy of cycles in Fig. 17: the cycles of rank zero (the points) are
unified in the cycles {1 → 2 → 3 → 1} and {4 → 5 → 4} of rank one, and these are unified in the single cycle of rank two. On the arrow beginning at each cycle π we have indicated the value of the constant C(π). If lim_{ε→0} ε² ln t(ε^{−2}) is between 0 and 2, then over time t(ε^{−2}) the process does not succeed in moving away from the compactum K_i near which it began, and the sublimit

Figure 17.
distribution corresponding to the initial point $x$ is concentrated on that $K_i$ to whose domain of attraction $x$ belongs. Over times $t(\varepsilon^2)$ for which $\lim_{\varepsilon\to 0}\varepsilon^2\ln t(\varepsilon^2)$ is between 2 and 3, a moving away from the 0-cycle $\{4\}$ takes place; i.e., the only thing that happens is that if the process was near $K_4$, then it passes to $K_5$. Then the sublimit distribution is concentrated on $K_5$ for initial points in the domain of attraction of $K_4$ and in the domain of attraction of $K_5$. If $\lim_{\varepsilon\to 0}\varepsilon^2\ln t(\varepsilon^2)$ is between 3 and 4, then over time $t(\varepsilon^2)$ a limit distribution is established on the cycle $\{4 \to 5 \to 4\}$, but nothing else takes place. Since this distribution is concentrated at the point 5 corresponding to the compactum $K_5$, the result is the same as for $\lim_{\varepsilon\to 0}\varepsilon^2\ln t(\varepsilon^2)$ between 2 and 3. If $\lim_{\varepsilon\to 0}\varepsilon^2\ln t(\varepsilon^2)$ is between 4 and 5, a moving away from the cycle $\{4 \to 5 \to 4\}$ takes place: the process hits the neighborhood of $K_1$ and passes from there to $K_2$, but no farther; the sublimit distribution is concentrated on $K_2$ for initial points attracted to any compactum except $K_3$ (and on $K_3$ for points attracted to $K_3$). Finally, if $\lim_{\varepsilon\to 0}\varepsilon^2\ln t(\varepsilon^2) > 5$, then a limit distribution is established (although a moving away from the cycle $\{1 \to 2 \to 3 \to 1\}$ and even from the cycle $\{3\}$ may not have taken place).

Now we can determine the sublimit distributions for initial points belonging to the domain of attraction of any stable compactum. For example, for $x$ attracted to $K_4$ we have $\lim_{\varepsilon\to 0}\mathsf{P}_x\{X^\varepsilon_{t(\varepsilon^2)} \in \Gamma\} = 1$ for open $\Gamma \supset K_4$ provided that $0 < \varliminf_{\varepsilon\to 0}\varepsilon^2\ln t(\varepsilon^2) \le \varlimsup_{\varepsilon\to 0}\varepsilon^2\ln t(\varepsilon^2) < 2$; and $\lim_{\varepsilon\to 0}\mathsf{P}_x\{X^\varepsilon_{t(\varepsilon^2)} \in \Gamma\} = 1$ for open $\Gamma \supset K_5$ provided that $2 < \varliminf_{\varepsilon\to 0}\varepsilon^2\ln t(\varepsilon^2) \le \varlimsup_{\varepsilon\to 0}\varepsilon^2\ln t(\varepsilon^2) < 4$. If these lower and upper limits are greater than 4 and smaller than 5, then $\lim_{\varepsilon\to 0}\mathsf{P}_x\{X^\varepsilon_{t(\varepsilon^2)} \in \Gamma\} = 1$ for open sets $\Gamma$ containing $K_2$. Finally, if $\lim_{\varepsilon\to 0}\varepsilon^2\ln t(\varepsilon^2) > 5$, then the distribution of $X^\varepsilon_{t(\varepsilon^2)}$ is attracted to $K_3$, independently of the initial point $x$. A general formulation of the result and a scheme of proof can be found in Freidlin [10]. Sublimit distributions over an exponentially long time can also be established in the case of a process $X^\varepsilon_t$ stopped after reaching the boundary of the domain (the situation of §5). We note that the problem of the limit distribution of $X^\varepsilon_{t(\varepsilon^2)}$ is closely connected with the problem of stabilization of solutions of parabolic differential equations with a small parameter (cf. the same article).
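The bookkeeping in the example above can be condensed into a small lookup sketch (only an illustration of the statements in the text, for initial points attracted to $K_4$; the thresholds 2, 3, 4, 5 are the constants of the cycle hierarchy read off Fig. 17):

```python
# Sublimit distribution in the five-compacta example of Fig. 17, for an
# initial point attracted to K4; "c" stands for lim eps^2 ln t(eps^2).
def sublimit_support_from_K4(c):
    if 0 < c < 2:
        return "K4"   # no moving away from K4 yet
    if 2 < c < 4:
        return "K5"   # escape from the 0-cycle {4}; {4 -> 5 -> 4} sits at 5
    if 4 < c < 5:
        return "K2"   # escape from {4 -> 5 -> 4}, passage via K1 to K2
    if c > 5:
        return "K3"   # the true limit distribution, independent of the start
    return None       # exactly at a threshold the answer is not determined
```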
§7. Eigenvalue Problems

Let $L$ be an elliptic differential operator in a bounded domain $D$ with smooth boundary $\partial D$. As is known, the smallest eigenvalue $\lambda_1$ of the operator $-L$ with zero boundary conditions is real, positive, and simple. It admits the
6. Markov Perturbations on Large Time Intervals
following probability-theoretic characterization. Let $(X_t, \mathsf{P}_x)$ be the diffusion process with generator $L$ and let $\tau$ be the time of first exit of $X_t$ from $D$. Then $\lambda_1$ forms the division between those $\lambda$ for which $\mathsf{M}_x e^{\lambda\tau} < \infty$ and those for which $\mathsf{M}_x e^{\lambda\tau} = \infty$ (cf. Khas'minskii [2]). The results concerning the action functional for the family of processes $(X^\varepsilon_t, \mathsf{P}^\varepsilon_x)$ corresponding to the operators
$$L^\varepsilon = \frac{\varepsilon^2}{2}\sum_{i,j} a^{ij}(x)\frac{\partial^2}{\partial x^i\,\partial x^j} + \sum_i b^i(x)\frac{\partial}{\partial x^i}$$
can be applied to determine the asymptotics of the eigenvalues of $-L^\varepsilon$ as $\varepsilon \to 0$. Two qualitatively different cases arise according as all trajectories of the dynamical system $\dot x_t = b(x_t)$ leave $D \cup \partial D$ or there are stable $\omega$-limit sets of the system in $D$.
The first, simpler, case was considered in Wentzell [6]. As has been established in §1 (Lemma 1.9), the probability that the trajectory of $X^\varepsilon_t$ spends more than time $T$ in $D$ can be estimated from above by the expression $\exp\{-\varepsilon^{-2}c(T - T_0)\}$, $c > 0$. This implies that $\mathsf{M}_x e^{\lambda\tau^\varepsilon} < \infty$ for $\lambda < \varepsilon^{-2}c$, and therefore $\lambda_1^\varepsilon \ge \varepsilon^{-2}c$. The rate of convergence of $\lambda_1^\varepsilon$ to infinity is given more accurately by the following theorem.

Theorem 7.1. As $\varepsilon \to 0$ we have $\lambda_1^\varepsilon = (c_1 + o(1))\varepsilon^{-2}$, where
$$c_1 = \lim_{T\to\infty} T^{-1}\min\{S_{0T}(\varphi)\colon \varphi_t \in D \cup \partial D,\ 0 \le t \le T\} \tag{7.2}$$
and $S_{0T}$ is the normalized action functional for the family of diffusion processes $(X^\varepsilon_t, \mathsf{P}^\varepsilon_x)$.
Proof. First of all we establish the existence of the limit (7.2). We denote the minimum appearing in (7.2) by $a(T)$. Firstly, it is clear that $a(T/n) \le a(T)/n$ for any natural number $n$ (because over at least one of the intervals $[kT/n, (k+1)T/n]$ the action of $\varphi$ is less than or equal to $S_{0T}(\varphi)/n$). From this we obtain that for any $T, T' > 0$,
$$a(T') \ge [T'/T]\,a(T). \tag{7.3}$$
In order to obtain the opposite estimate, we use the fact that there exist positive $T_0$ and $A$ such that any two points $x$ and $y$ of $D \cup \partial D$ can be connected by a curve $\varphi_t(x, y)$, $0 \le t \le T(x, y) \le T_0$, such that $S_{0,T(x,y)}(\varphi(x, y)) \le A$.
Let now $b > c_1$; we show that $\mathsf{M}_x \exp\{\varepsilon^{-2}b\tau^\varepsilon\} = \infty$. For some large $T$ let the minimum $a(T)$ be attained at a function $\varphi$. We put $x = \varphi_T$, $y = \varphi_0$ and construct $\tilde\varphi$ from pieces: $\tilde\varphi_t = \varphi_t$ for $0 \le t \le T$, continued by a curve leading from $x$ back to $y$ over a time at most $T_0$ with action at most $A$, then by $\varphi$ again, and so on. Choosing $0 < \kappa < (b - c_1)/3$, we obtain for large $T$ and small $\varepsilon$ the lower estimate
$$\mathsf{P}_x\{\rho_{0T}(X^\varepsilon, \tilde\varphi) < \delta\} \ge \exp\{-\varepsilon^{-2}[a(T) + \kappa T]\} \ge \exp\{-\varepsilon^{-2}T(c_1 + 3\kappa)\}.$$
Successive application of the Markov property $(n - 1)$ times gives
$$\mathsf{P}_x\{\tau^\varepsilon > nT\} \ge \exp\{-n\varepsilon^{-2}T(c_1 + 3\kappa)\}.$$
From this we obtain for small $\varepsilon$ and all $n$ that
$$\mathsf{M}_x \exp\{\varepsilon^{-2}b\tau^\varepsilon\} \ge \exp\{\varepsilon^{-2}bnT\}\,\mathsf{P}_x\{\tau^\varepsilon > nT\} \ge \exp\{n\varepsilon^{-2}T(b - c_1 - 3\kappa)\},$$
which tends to infinity as $n \to \infty$. Therefore, $\mathsf{M}_x \exp\{\varepsilon^{-2}b\tau^\varepsilon\} = \infty$.
Now we prove that if $b < c_1$, then $\mathsf{M}_x \exp\{\varepsilon^{-2}b\tau^\varepsilon\} < \infty$ for small $\varepsilon$. Again we choose $0 < \kappa < (c_1 - b)/3$, and $T$ and $\delta > 0$ such that
$$\min\{S_{0T}(\varphi)\colon \varphi_t \in D_{+\delta} \cup \partial D_{+\delta},\ 0 \le t \le T\} > T(c_1 - 2\kappa) \tag{7.10}$$
(we have applied Lemma 1.4 once more). The distance between the set of functions $\psi$ lying entirely in $D$ for $0 \le t \le T$ and any of the sets
$$\Phi_x(T(c_1 - 2\kappa)) = \{\varphi\colon \varphi_0 = x,\ S_{0T}(\varphi) \le T(c_1 - 2\kappa)\}$$
is not less than $\delta$. Therefore, by the upper estimate in Theorem 3.2 of Ch. 5, for sufficiently small $\varepsilon$ and all $x \in D$ we have
$$\mathsf{P}_x\{\tau^\varepsilon > T\} \le \mathsf{P}_x\{\rho_{0T}(X^\varepsilon, \Phi_x(T(c_1 - 2\kappa))) \ge \delta\} \le \exp\{-\varepsilon^{-2}T(c_1 - 3\kappa)\}. \tag{7.11}$$
The Markov property gives us $\mathsf{P}_x\{\tau^\varepsilon > nT\} \le \exp\{-n\varepsilon^{-2}T(c_1 - 3\kappa)\}$, and
$$\mathsf{M}_x \exp\{\varepsilon^{-2}b\tau^\varepsilon\} \le \sum_{n=0}^{\infty} \exp\{\varepsilon^{-2}b(n+1)T\}\,\mathsf{P}_x\{\tau^\varepsilon > nT\} \le \exp\{\varepsilon^{-2}bT\}\sum_{n=0}^{\infty} \exp\{n\varepsilon^{-2}T(b - c_1 + 3\kappa)\} < \infty.$$
The theorem is proved. □
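Theorem 7.1 can be checked numerically in a one-dimensional example of our own (not from the text): on $D = (0, 1)$ with $a \equiv 1$ and constant drift $b \equiv 1$, every trajectory of $\dot x = 1$ leaves $\bar D$, and holding a path near a point costs $\frac12|\dot\varphi - 1|^2 = \frac12$ per unit time, so $c_1 = \frac12$ and $\lambda_1^\varepsilon$ should behave like $(\frac12 + o(1))\varepsilon^{-2}$. Below, $\lambda_1^\varepsilon$ is the smallest Dirichlet eigenvalue of $-L^\varepsilon = -(\varepsilon^2/2)\,d^2/dx^2 - d/dx$, approximated by central finite differences:

```python
import numpy as np

# Smallest Dirichlet eigenvalue of -(eps^2/2) u'' - u' on (0, 1),
# approximated on a uniform interior grid with central differences.
def smallest_dirichlet_eigenvalue(eps, n=400):
    h = 1.0 / (n + 1)
    diag = np.full(n, eps**2 / h**2)                              # -(eps^2/2) u'' part
    upper = np.full(n - 1, -eps**2 / (2 * h**2) - 1.0 / (2 * h))  # plus -u' part
    lower = np.full(n - 1, -eps**2 / (2 * h**2) + 1.0 / (2 * h))
    A = np.diag(diag) + np.diag(upper, 1) + np.diag(lower, -1)
    return np.linalg.eigvals(A).real.min()

eps = 0.3
lam = smallest_dirichlet_eigenvalue(eps)
# for this operator the exact eigenvalue is 1/(2 eps^2) + eps^2 pi^2 / 2,
# so eps^2 * lam should be close to c_1 = 1/2
```

The exact eigenvalue here is $1/(2\varepsilon^2) + \varepsilon^2\pi^2/2$, so $\varepsilon^2\lambda_1^\varepsilon = \frac12 + O(\varepsilon^4)$, in agreement with (7.2).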
The following theorem helps us to obtain a large class of examples in which the limit (7.2) can be calculated.

Theorem 7.2. Let the field $b(x)$ have a potential with respect to the Riemannian metric connected with the diffusion matrix, i.e., let there exist a function $U(x)$, smooth in $D \cup \partial D$, such that
$$b^i(x) = -\sum_j a^{ij}(x)\frac{\partial U(x)}{\partial x^j}. \tag{7.12}$$
Then $c_1$ is equal to
$$c_1^* = \min_{x\in D\cup\partial D}\Bigl\{\tfrac12\sum_{i,j} a_{ij}(x)\,b^i(x)\,b^j(x)\Bigr\}, \tag{7.13}$$
where $(a_{ij}(x)) = (a^{ij}(x))^{-1}$.
Proof. We denote by $c_1^*$ the minimum (7.13). Let the minimum $a(T)$ be attained at the function $\varphi$:
$$a(T) = \int_0^T \tfrac12\sum_{i,j} a_{ij}(\varphi_t)\bigl(\dot\varphi^i_t - b^i(\varphi_t)\bigr)\bigl(\dot\varphi^j_t - b^j(\varphi_t)\bigr)\,dt. \tag{7.14}$$
We transform the integral in the following way:
$$a(T) = \int_0^T \tfrac12\sum_{i,j} a_{ij}(\varphi_t)\,\dot\varphi^i_t\,\dot\varphi^j_t\,dt - \int_0^T \sum_{i,j} a_{ij}(\varphi_t)\,b^i(\varphi_t)\,\dot\varphi^j_t\,dt + \int_0^T \tfrac12\sum_{i,j} a_{ij}(\varphi_t)\,b^i(\varphi_t)\,b^j(\varphi_t)\,dt.$$
The first integral here is nonnegative, the second is equal to $U(\varphi_T) - U(\varphi_0)$ by (7.12), and the third is not smaller than $c_1^*$ multiplied by $T$. This implies that $T^{-1}a(T) \ge c_1^* - T^{-1}C$, where $C$ is the maximum of $U(x) - U(y)$ over $x, y \in D \cup \partial D$. Passing to the limit as $T \to \infty$, we obtain $c_1 \ge c_1^*$. The inequality $c_1 \le c_1^*$ follows from the fact that for the function $\varphi_t \equiv x_0$, where $x_0$ is a point at which the minimum (7.13) is attained, we have $T^{-1}S_{0T}(\varphi) = c_1^*$. □

Now we consider the opposite case: the case where $D$ contains $\omega$-limit sets of the system $\dot x_t = b(x_t)$. We assume that condition A of §2 is satisfied (there exist a finite number of compacta $K_1, \ldots, K_l$ consisting of equivalent points and containing all $\omega$-limit sets). As we have already said, in this case our process with a small diffusion can be approximated well by a Markov
chain with $(l + 1)$ states corresponding to the compacta $K_i$ and the boundary $\partial D$, having transition probabilities of order
$$\exp\{-\varepsilon^{-2}V_D(K_i, K_j)\}, \qquad \exp\{-\varepsilon^{-2}V_D(K_i, \partial D)\} \tag{7.15}$$
(the transition probabilities from $\partial D$ to the $K_i$ are assumed to be equal to 0, and the diagonal elements of the matrix of transition probabilities are such that the row sums are equal to 1). It seems plausible that the condition of finiteness of $\mathsf{M}_x e^{\lambda\tau^\varepsilon}$ is close to the condition that the mathematical expectation of $e^{\lambda\nu^\varepsilon}$ be finite, where $\nu^\varepsilon$ is the number of steps in the chain with the indicated transition probabilities until the first entrance into $\partial D$. The negative of the logarithm of the largest eigenvalue of the matrix of transition probabilities after the eigenvalue 1 turns out to be the divisor between those $\lambda$ for which the above mathematical expectation is finite and those for which it is infinite. The asymptotics of the eigenvalues of a matrix with entries of order (7.15) can be found in Wentzell's note [3]. The justification of the passage from the chain to the diffusion process $X^\varepsilon_t$ is discussed in Wentzell [2].

Theorem 7.3. Let $\lambda_1^\varepsilon, \lambda_2^\varepsilon, \ldots, \lambda_l^\varepsilon$ be the eigenvalues, except the eigenvalue 1, indexed in decreasing order of their absolute values, of a stochastic matrix
with entries logarithmically equivalent to the expressions (7.15) as $\varepsilon \to 0$. We define constants $V^{(k)}$, $k = 1, \ldots, l, l+1$, by the formula
$$V^{(k)} = \min_{g\in G(k)}\ \sum_{(\alpha\to\beta)\in g} V_{\alpha\beta}, \tag{7.16}$$
where the $V_{\alpha\beta}$ are the exponents $V_D(K_\alpha, K_\beta)$, $V_D(K_\alpha, \partial D)$ of (7.15), and $G(k)$ is the set of $W$-graphs over the set $L = \{K_1, \ldots, K_l, \partial D\}$ such that $W$ contains $k$ elements.

(We note that for $k = l + 1$ we have $W = L$; in this case there is exactly one $W$-graph, which is the empty graph, and the sum in (7.16), and hence $V^{(l+1)}$, is equal to zero.) Then for $\varepsilon \to 0$ we have
$$\operatorname{Re}(1 - \lambda_k^\varepsilon) \asymp \exp\{-(V^{(k)} - V^{(k+1)})\varepsilon^{-2}\}. \tag{7.17}$$
The proof is analogous to those of Lemma 3.1 and other lemmas of §3: the coefficients of the characteristic polynomial of the matrix obtained from the matrix of transition probabilities by subtracting the identity matrix can be expressed as sums of products $\pi(g)$ over $W$-graphs. From this we obtain the asymptotics of the coefficients, which in turn yields the asymptotics of the roots.
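Formula (7.16) can be evaluated by brute force for a small made-up example: two compacta plus the boundary, with invented exponents $V(\alpha, \beta)$. Since there are no transitions out of $\partial D$, only $W$-graphs with $\partial D \in W$ contribute:

```python
from itertools import combinations, product

states = ["K1", "K2", "dD"]
V = {("K1", "K2"): 5.0, ("K1", "dD"): 8.0,
     ("K2", "K1"): 3.0, ("K2", "dD"): 2.0}   # invented exponents V(alpha, beta)

def _reaches_W(g, start):
    """Follow arrows from `start`; cur leaves g exactly when it enters W."""
    seen, cur = set(), start
    while cur in g:
        if cur in seen:
            return False          # a cycle among the points outside W
        seen.add(cur)
        cur = g[cur]
    return True

def w_graphs(W):
    """All W-graphs: one arrow from each point outside W, no cycles."""
    free = [s for s in states if s not in W]
    for targets in product(states, repeat=len(free)):
        g = dict(zip(free, targets))
        if any(g[s] == s for s in free):
            continue              # no arrow from a point to itself
        if all(_reaches_W(g, s) for s in free):
            yield g

def v_k(k):
    best = float("inf")
    for W in combinations(states, k):
        if "dD" not in W:         # arrows out of dD have probability 0
            continue
        for g in w_graphs(set(W)):
            best = min(best, sum(V[e] for e in g.items()))
    return best

Vk = {k: v_k(k) for k in (1, 2, 3)}   # here {1: 7.0, 2: 2.0, 3: 0.0}
```

For these numbers the differences $V^{(1)} - V^{(2)} = 5$ and $V^{(2)} - V^{(3)} = 2$ are nonincreasing, as the theorem requires, and give the exponents in (7.17).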
We note that the assertion of the theorem implies that $V^{(k-1)} - V^{(k)} \ge V^{(k)} - V^{(k+1)} \ge V^{(k+1)} - V^{(k+2)}$. In the case of "general position," where these inequalities are strict, the eigenvalue $\lambda_k^\varepsilon$ turns out to be real and simple for small $\varepsilon$ (the principal eigenvalue $\lambda_1^\varepsilon$ is always real and simple).

As far as the justification of the passage from the asymptotic problem for the Markov chain to the problem for the diffusion process is concerned, it is divided into two parts: a construction, independent of the presence of the small parameter $\varepsilon$, which connects diffusion processes with certain discrete chains, and estimates connected with the Markov chains $Z_n$ introduced in §2. For the first part, the following lemma turns out to be essential. Suppose that $(X_t, \mathsf{P}_x)$ is a diffusion process with generator $L$ in a bounded domain $D$ with smooth boundary, $\partial g$ is a surface inside $D$, and $\Gamma$ is the surface of a neighborhood of $\partial g$ separating $\partial g$ from $\partial D$. Let us introduce the random times
$$\sigma_0 = \min\{t\colon X_t \in \Gamma\}, \qquad \tau_1 = \min\{t > \sigma_0\colon X_t \in \partial g \cup \partial D\}.$$
For $a > 0$ we introduce the operators $q^a$ acting on bounded functions defined on $\partial g$:
$$q^a f(x) = \mathsf{M}_x\{X_{\tau_1} \in \partial g;\ e^{a\tau_1} f(X_{\tau_1})\}, \qquad x \in \partial g. \tag{7.18}$$

Lemma 7.1. The smallest eigenvalue $\lambda_1$ of $-L$ with zero boundary conditions on $\partial D$ is the division between those $a$ for which the largest eigenvalue of $q^a$ is smaller than 1 and those for which it is larger than 1.
The proof of this assertion and a generalization of it to other eigenvalues can be found in Wentzell [5].
In order to apply this to diffusion processes with small diffusion, we choose $\partial g = \bigcup_i \partial g_i$ and $\Gamma = \bigcup_i \Gamma_i$ in the same way as in §2, and for the corresponding operators $q^a_\varepsilon$ we prove the following estimates.

Lemma 7.2. For any $\gamma > 0$ one can choose the radii of the neighborhoods of the $K_i$ so small that for all sufficiently small $a$ and $\varepsilon$ we have
$$1 + ae^{-\gamma\varepsilon^{-2}} \le \mathsf{M}_x e^{a\tau_1} \le 1 + ae^{\gamma\varepsilon^{-2}}, \qquad x \in \partial g_i; \tag{7.19}$$
$$\exp\{-(V_D(K_i, K_j) + \gamma)\varepsilon^{-2}\} \le q^a_\varepsilon \chi_{\partial g_j}(x) = \mathsf{M}_x\{X_{\tau_1} \in \partial g_j;\ e^{a\tau_1}\} \le \exp\{-(V_D(K_i, K_j) - \gamma)\varepsilon^{-2}\}, \qquad x \in \partial g_i; \tag{7.20}$$
$$\exp\{-(V_D(K_i, \partial D) + \gamma)\varepsilon^{-2}\} \le \mathsf{M}_x\{X_{\tau_1} \in \partial D;\ e^{a\tau_1}\} \le \exp\{-(V_D(K_i, \partial D) - \gamma)\varepsilon^{-2}\}, \qquad x \in \partial g_i. \tag{7.21}$$
The proof is analogous to that of Lemmas 2.1 and 1.7; the upper estimates are obtained with somewhat more work. In conclusion, we obtain the following result.

Theorem 7.4. The smallest eigenvalue $\lambda_1^\varepsilon$ of $-L^\varepsilon$ is logarithmically equivalent to
$$\exp\{-\varepsilon^{-2}(V^{(1)} - V^{(2)})\} \tag{7.22}$$
as $\varepsilon \to 0$, where $V^{(1)}, V^{(2)}$ are defined by formula (7.16).

In particular, if $D$ contains a unique asymptotically stable equilibrium position $O$, then
$$\lambda_1^\varepsilon \asymp \exp\Bigl\{-\varepsilon^{-2}\min_{y\in\partial D} V(O, y)\Bigr\}, \tag{7.23}$$
where $V$ is the quasipotential.

Theorems 7.1 and 7.4 do not give anything interesting in the case where there are only unstable sets inside $D$ or the stable limit sets are on the boundary of the domain. Some results concerning this case are contained in the paper [1] by Devinatz, Ellis, and Friedman. The results contained in Theorems 7.3 and 7.4 are related to the results of §6. In particular, the constants which are to be crossed by $\lim_{\varepsilon\to 0}\varepsilon^2\ln t(\varepsilon^2)$ as one sublimit distribution changes to another are nothing else but the differences $V^{(k)} - V^{(k+1)}$.
We clarify the mechanism of this connection. Let $(X^\varepsilon_t, \mathsf{P}^\varepsilon_x)$ be a diffusion process with small diffusion on a compact manifold $M$, or a process in a closed domain $D \cup \partial D$ stopped after exit to the boundary, and let $f(x)$ be a continuous function vanishing on $\partial D$. We assume that the function $u^\varepsilon(t, x) = \mathsf{M}_x f(X^\varepsilon_t)$ (the solution of the corresponding problem for the equation $\partial u^\varepsilon/\partial t = L^\varepsilon u^\varepsilon$) can be expanded in a series in the eigenfunctions $e_i^\varepsilon(x)$ of $-L^\varepsilon$:
$$u^\varepsilon(t, x) = \sum_i c_i^\varepsilon\, e^{-\lambda_i^\varepsilon t}\, e_i^\varepsilon(x), \tag{7.24}$$
with the coefficients $c_i^\varepsilon$ obtained by integrating $f$ multiplied by the corresponding eigenfunction $\tilde e_i^\varepsilon$ of the adjoint operator. (Since $L^\varepsilon$ is not self-adjoint, the existence of such an expansion is not automatic.) We assume that the first $l$ eigenvalues are real, nonnegative, and decrease exponentially fast: $\lambda_i^\varepsilon \asymp \exp\{-C_i\varepsilon^{-2}\}$, $C_1 \ge C_2 \ge \cdots \ge C_l$.
Let the function $t(\varepsilon^2)$ be such that $C_{j+1} < \varliminf_{\varepsilon\to 0}\varepsilon^2\ln t(\varepsilon^2) \le \varlimsup_{\varepsilon\to 0}\varepsilon^2\ln t(\varepsilon^2) < C_j$. We have $\lim_{\varepsilon\to 0} e^{-\lambda_i^\varepsilon t(\varepsilon^2)} = 1$ for $1 \le i \le j$, and this
limit is equal to zero for $i > j$. If the infinite remainder of the sum (7.24) does not prevent it, we obtain
$$\lim_{\varepsilon\to 0} u^\varepsilon(t(\varepsilon^2), x) = \sum_{i=1}^{j}\Bigl[\lim_{\varepsilon\to 0}\int f(y)\,\tilde e_i^\varepsilon(y)\,dy\Bigr]\cdot\lim_{\varepsilon\to 0} e_i^\varepsilon(x). \tag{7.25}$$
In other words, the character of the limit behavior of $u^\varepsilon(t(\varepsilon^2), x)$ changes as $\lim_{\varepsilon\to 0}\varepsilon^2\ln t(\varepsilon^2)$ crosses $C_j$. This outlines the route by which we can derive conclusions concerning the behavior of a process on exponentially increasing time intervals from results involving exponentially small eigenvalues and the corresponding eigenfunctions. The same method can be used for the more difficult inverse problem.
Chapter 7
The Averaging Principle. Fluctuations in Dynamical Systems with Averaging
§1. The Averaging Principle in the Theory of Ordinary Differential Equations

Let us consider the system
$$\dot Z^\varepsilon_t = \varepsilon\, b(Z^\varepsilon_t, \xi_t), \qquad Z^\varepsilon_0 = x, \tag{1.1}$$
of ordinary differential equations in $R^r$, where $\xi_t$, $t \ge 0$, is a function assuming values in $R^l$, $\varepsilon$ is a small numerical parameter, and $b(x, y) = (b^1(x, y), \ldots, b^r(x, y))$.
If the functions $b^i(x, y)$ do not increase too fast, then the solution of equation (1.1) converges to $Z^0_t \equiv x$ as $\varepsilon \to 0$, uniformly on every finite time interval $[0, T]$. However, the behavior of $Z^\varepsilon_t$ on time intervals of order $\varepsilon^{-1}$ or of larger order is usually of main interest, since it is only over times of order $\varepsilon^{-1}$ that significant changes, such as exit from the neighborhood of an equilibrium position or of a periodic trajectory, take place in system (1.1). In the study of the system on intervals of the form $[0, T\varepsilon^{-1}]$, it is convenient to pass to new coordinates so that the time interval does not depend on $\varepsilon$. We set $X^\varepsilon_t = Z^\varepsilon_{t/\varepsilon}$. Then the equation for $X^\varepsilon_t$ assumes the form
$$\dot X^\varepsilon_t = b(X^\varepsilon_t, \xi_{t/\varepsilon}), \qquad X^\varepsilon_0 = x. \tag{1.2}$$
The study of this system on a finite time interval is equivalent to the study of system (1.1) on time intervals of order $\varepsilon^{-1}$.
Let the function $b(x, y)$ be bounded, continuous in $x$ and $y$, and let it satisfy a Lipschitz condition in $x$ with a constant independent of $y$:
$$|b(x_1, y) - b(x_2, y)| \le K|x_1 - x_2|.$$
We assume that
$$\lim_{T\to\infty}\frac1T\int_0^T b(x, \xi_s)\,ds = \bar b(x) \tag{1.3}$$
for $x \in R^r$. If this limit exists, then the function $\bar b(x)$ is obviously bounded and satisfies a Lipschitz condition with the same constant $K$. Condition (1.3) is satisfied, for example, if $\xi_s$ is periodic or is a sum of periodic functions. The displacement of the trajectory $X^\varepsilon_t$ over a small time $\Delta$ can be written in the form
$$X^\varepsilon_\Delta - x = \int_0^\Delta b(X^\varepsilon_s, \xi_{s/\varepsilon})\,ds = \int_0^\Delta b(x, \xi_{s/\varepsilon})\,ds + \int_0^\Delta \bigl[b(X^\varepsilon_s, \xi_{s/\varepsilon}) - b(x, \xi_{s/\varepsilon})\bigr]\,ds = \Delta\cdot\frac{\varepsilon}{\Delta}\int_0^{\Delta/\varepsilon} b(x, \xi_s)\,ds + \rho^\varepsilon(\Delta).$$
The coefficient of $\Delta$ in the first term of the last side converges to $\bar b(x)$ as $\varepsilon/\Delta \to 0$, according to (1.3). The second term satisfies the inequality $|\rho^\varepsilon(\Delta)| \le K\Delta^2$. Consequently, the displacement of the trajectory $X^\varepsilon_t$ over a small time differs from the displacement of the trajectory $\bar x_t$ of the differential equation
$$\dot{\bar x}_t = \bar b(\bar x_t), \qquad \bar x_0 = x, \tag{1.4}$$
only by a quantity infinitely small compared with $\Delta$ as $\Delta \to 0$, $\varepsilon/\Delta \to 0$. If we assume that the limit in (1.3) is uniform in $x$, then from this we obtain a proof of the fact that the trajectory $X^\varepsilon_t$ converges to the solution of equation (1.4), uniformly on every finite time interval as $\varepsilon \to 0$. The assertion that the trajectory $X^\varepsilon_t$ is close to $\bar x_t$ is called the averaging principle.
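A minimal numerical sketch of this (our example, not the book's): take $b(x, y) = -x + \cos y$ and $\xi_s = s$, so that $\bar b(x) = -x$ and the averaged solution with $x_0 = 1$ is $e^{-t}$. The Euler trajectory of (1.2) then stays within a quantity of order $\varepsilon$ of the averaged one:

```python
import math

def averaging_error(eps, T=2.0, dt=2e-5, x0=1.0):
    """Sup-distance between the trajectory of (1.2) with b(x, y) = -x + cos(y),
    xi_s = s, and the averaged solution x0 * exp(-t)."""
    x, t, err = x0, 0.0, 0.0
    for _ in range(int(T / dt)):
        x += dt * (-x + math.cos(t / eps))   # Euler step for (1.2)
        t += dt
        err = max(err, abs(x - x0 * math.exp(-t)))
    return err

err = averaging_error(eps=0.01)   # deviation of order eps
```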
An analogous principle can also be formulated in a more general situation. We consider, for example, the system
$$\dot X^\varepsilon_t = b_1(X^\varepsilon_t, \xi^\varepsilon_t), \qquad X^\varepsilon_0 = x,$$
$$\dot\xi^\varepsilon_t = \varepsilon^{-1}\, b_2(X^\varepsilon_t, \xi^\varepsilon_t), \qquad \xi^\varepsilon_0 = y, \tag{1.5}$$
where $x \in R^r$, $y \in R^l$, and $b_1$ and $b_2$ are bounded, continuously differentiable functions on $R^r \times R^l$ with values in $R^r$ and $R^l$, respectively. The velocity of the motion of the variables $\xi^\varepsilon_t$ has order $\varepsilon^{-1}$ as $\varepsilon \to 0$. Therefore, the $\xi$ are called the fast variables and the space $R^l$ the space of fast motion; the $x$ are called the slow variables.
We consider the fast motion $\xi_t(x)$ for fixed slow variables $x \in R^r$:
$$\dot\xi_t(x) = b_2(x, \xi_t(x)), \qquad \xi_0(x) = y,$$
and assume that the limit
$$\lim_{T\to\infty}\frac1T\int_0^T b_1(x, \xi_s(x))\,ds = \bar b_1(x) \tag{1.6}$$
exists. For the sake of simplicity, let this limit be independent of the initial point $y$ of the trajectory $\xi_s(x)$. The averaging principle for system (1.5) is the assertion that under certain assumptions, the displacement in the space of slow motion can be approximated by the trajectory of the averaged system
$$\dot{\bar x}_t = \bar b_1(\bar x_t), \qquad \bar x_0 = x.$$
For equation (1.2), the role of the fast motion is played by $\xi^\varepsilon_t = \xi_{t/\varepsilon}$. In this case the velocity of the fast motion does not depend on the slow variables. Although the averaging principle has long been applied to problems of celestial mechanics, oscillation theory, and radiophysics, no mathematically rigorous justification of it existed for a long time. The first general result in this area was obtained by N. N. Bogolyubov (cf. Bogolyubov and Mitropol'skii [1]). He proved that if the limit (1.3) exists uniformly in $x$, then the solution $X^\varepsilon_t$ of equation (1.2) converges to the solution of the averaged system (1.4), uniformly on every finite interval. Under certain assumptions, the rate of convergence was also estimated and an asymptotic expansion in powers of the small parameter was constructed. In Bogolyubov and Zubarev [1] (cf. also Bogolyubov and Mitropol'skii [1]), these results were extended to some cases of system (1.5), namely to systems in which the fast motion is one-dimensional and the equation for $\xi^\varepsilon_t$ has the form $\dot\xi^\varepsilon_t = \varepsilon^{-1}b_2(X^\varepsilon_t)$, and to some more general systems. V. M. Volosov obtained a series of results concerning the general case of system (1.5) (cf. Volosov [1]). Nevertheless, in the case of multidimensional fast motions, the requirement of uniform convergence to the limit in (1.6), which is usually imposed, excludes a series of interesting problems, for example, problems arising in perturbations of Hamiltonian systems. In a sufficiently general situation it can be proved that for every $T > 0$ and $\rho > 0$, the Lebesgue measure of the set $F^\varepsilon_\rho$ of those initial conditions in problem (1.5) for which $\sup_{0\le t\le T}|X^\varepsilon_t - \bar x_t| > \rho$ converges to zero with $\varepsilon$. This result, obtained in Anosov [1], was later sharpened for systems of a special form (cf. Neishtadt [1]). Consequently, if a system of differential equations is reduced to the form (1.2) or (1.5), then it is clear, at least formally, what the equation of the zeroth approximation looks like. In some cases a procedure can be found for the determination of higher approximations. In the study of concrete problems we first have to choose the variables in such a way that the fast and slow motions are separated. As an example, we consider the equation
$$\ddot x_t + \omega^2 x_t = \varepsilon f(x_t, \dot x_t, t), \qquad x \in R^1. \tag{1.7}$$
If $f(x, \dot x, t) = (1 - x^2)\dot x$, then this equation turns into the so-called van der Pol equation describing oscillations in a vacuum-tube generator. For $\varepsilon = 0$ we obtain the equation of a harmonic oscillator. In the phase plane $(x, \dot x)$ the solutions of this equation are the ellipses $x = r\cos(\omega t + \theta)$, $\dot x = -r\omega\sin(\omega t + \theta)$, on which the phase point rotates with constant angular velocity $\omega$. Without
perturbations ($\varepsilon = 0$), the amplitude $r$ is determined by the initial conditions and does not change with time, and for the phase we have $\varphi_t = \omega t + \theta$. If $\varepsilon$ is now different from zero but small, then the amplitude $r$ and the difference $\varphi_t - \omega t$ are not constant in general. Nevertheless, one may expect that their rate of change is small provided that $\varepsilon$ is small. Indeed, passing from the variables $(x, \dot x)$ to the van der Pol variables $(r, \theta)$, we obtain
$$\frac{dr^\varepsilon_t}{dt} = \varepsilon F_1(\omega t + \theta^\varepsilon_t, r^\varepsilon_t, t), \qquad r^\varepsilon_0 = r_0,$$
$$\frac{d\theta^\varepsilon_t}{dt} = \varepsilon F_2(\omega t + \theta^\varepsilon_t, r^\varepsilon_t, t), \qquad \theta^\varepsilon_0 = \theta_0, \tag{1.8}$$
where the functions $F_1(s, r, t)$ and $F_2(s, r, t)$ are given by the equalities
$$F_1(s, r, t) = -\frac1{\omega}\, f(r\cos s, -r\omega\sin s, t)\sin s,$$
$$F_2(s, r, t) = -\frac1{r\omega}\, f(r\cos s, -r\omega\sin s, t)\cos s.$$
Consequently, in the van der Pol variables $(r, \theta)$, the equation can be written in the form (1.1). If $f(x, \dot x, t)$ does not depend explicitly on $t$, then $F_1(\omega t + \theta, r)$ and $F_2(\omega t + \theta, r)$ are periodic in $t$ and condition (1.3) is satisfied. Therefore,
the averaging principle is applicable to system (1.8). In the case where $f(x, \dot x, t)$ depends periodically on the last argument, condition (1.3) is also satisfied. If the frequency $\omega$ and the frequency $\nu$ of the function $f$ in $t$ are incommensurable, then the averaged right-hand sides of system (1.8) have the form (cf., for example, Bogolyubov and Mitropol'skii [1])
$$\bar F_1(r) = -\frac{1}{4\pi^2\omega}\int_0^{2\pi}\!\!\int_0^{2\pi} f(r\cos\psi, -r\omega\sin\psi, \tau/\nu)\sin\psi\,d\psi\,d\tau,$$
$$\bar F_2(r) = -\frac{1}{4\pi^2 r\omega}\int_0^{2\pi}\!\!\int_0^{2\pi} f(r\cos\psi, -r\omega\sin\psi, \tau/\nu)\cos\psi\,d\psi\,d\tau. \tag{1.9}$$
It is easy to calculate the averaged right-hand sides of system (1.8) for commensurable $\omega$ and $\nu$ as well. Relying on the averaging principle, we can derive that as $\varepsilon \to 0$, the trajectory $(r^\varepsilon_t, \theta^\varepsilon_t)$ can be approximated by the trajectory of the averaged system
$$\dot{\bar r}_t = \varepsilon\bar F_1(\bar r_t), \qquad \bar r_0 = r_0,$$
$$\dot{\bar\theta}_t = \varepsilon\bar F_2(\bar r_t), \qquad \bar\theta_0 = \theta_0,$$
uniformly on the interval $[0, T\varepsilon^{-1}]$. Using this approximation, we can derive a number of interesting conclusions on the behavior of solutions of equation (1.7). For example, let $\bar F_1(r_0) = 0$ and let the function $\bar F_1(r)$ be positive to the left of $r_0$ and negative to the right of it, i.e., let $r_0$ be the amplitude of
an asymptotically stable periodic solution of the averaged system. It can be shown by means of the averaging principle that oscillations with amplitude close to $r_0$ and frequency close to $\omega$ occur in the system described by equation (1.7) with arbitrary initial conditions, provided that $\varepsilon$ is sufficiently small. In the case where $\bar F_1(r)$ has several zeros at which it changes from positive to negative, the amplitude of the oscillations over a long time depends on the initial state. We shall return to the van der Pol equation in §8.

Equation (1.7) describes a system obtained as a result of small perturbations of the equation of an oscillator. Perturbations of a mechanical system of a more general form can also be considered. Let the system be conservative, let us denote by $H(p, q)$ its Hamiltonian, and let the equations of the perturbed system have the form
$$\frac{dp^{\varepsilon,i}_t}{dt} = \frac{\partial H}{\partial q^i}(p^\varepsilon_t, q^\varepsilon_t) + \varepsilon f^i_p(p^\varepsilon_t, q^\varepsilon_t),$$
$$\frac{dq^{\varepsilon,i}_t}{dt} = -\frac{\partial H}{\partial p^i}(p^\varepsilon_t, q^\varepsilon_t) + \varepsilon f^i_q(p^\varepsilon_t, q^\varepsilon_t), \qquad i = 1, 2, \ldots, n.$$
If $n = 1$, $p = x$, $q = \dot x$, $H(p, q) = \frac12(q^2 + \omega^2 p^2)$, $f_p = 0$, and $f_q = f(p, q)$, then this system is equivalent to equation (1.7). In the case of one degree of freedom ($n = 1$), if the level sets $H(p, q) = C = \mathrm{const}$ are smooth curves homeomorphic to the circle, new variables can also be introduced in such a way that the fast and slow motions are separated. As such variables, we can take the value $H$ of the Hamiltonian $H(p, q)$ and the angular coordinate $\varphi$ on a level set. Since $\dot H(p_t, q_t) = 0$ for the unperturbed system, $H(p^\varepsilon_t, q^\varepsilon_t)$ changes slowly.
It is clear that if we choose $(H', \varphi)$ as variables, where $H'$ is a smooth function of $H$, then $H'$ also changes slowly. For example, in the van der Pol variables, the slow variable is $r = \sqrt{2H}/\omega$. In order that the system preserve the Hamiltonian form, in place of the variables $(H, \varphi)$ one often considers the so-called action-angle variables $(I, \varphi)$ (cf., for example, Arnold [1]), where $I = H/\omega$. In the multidimensional case the slowly varying variables will be first integrals of the unperturbed system. In some cases (cf. Arnold [1], Ch. 10), in Hamiltonian systems with $n$ degrees of freedom we can also introduce action-angle variables $(I, \varphi)$; in these variables the fast and slow motions are separated and the system of equations preserves the Hamiltonian form.
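The van der Pol example above can be checked numerically (an illustration with $\omega = 1$): for $f(x, \dot x) = (1 - x^2)\dot x$ the averages (1.9) give $\bar F_1(r) = r/2 - r^3/8$ and $\bar F_2(r) = 0$, so the averaged amplitude equation $\dot r = \varepsilon(r/2 - r^3/8)$ has the stable zero $r_0 = 2$, and the full equation $\ddot x + x = \varepsilon(1 - x^2)\dot x$ should settle on oscillations of amplitude close to 2:

```python
import math

def vdp_amplitude(eps=0.1, T=200.0, dt=0.01, x0=0.5, v0=0.0):
    """Integrate x'' + x = eps (1 - x^2) x' by RK4 and return the final
    amplitude r = sqrt(x^2 + x'^2) (with omega = 1)."""
    def f(x, v):
        return v, -x + eps * (1.0 - x * x) * v
    x, v = x0, v0
    for _ in range(int(T / dt)):
        k1x, k1v = f(x, v)
        k2x, k2v = f(x + dt / 2 * k1x, v + dt / 2 * k1v)
        k3x, k3v = f(x + dt / 2 * k2x, v + dt / 2 * k2v)
        k4x, k4v = f(x + dt * k3x, v + dt * k3v)
        x += dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return math.hypot(x, v)

r_final = vdp_amplitude()   # close to the averaged equilibrium r0 = 2
```

Here $\varepsilon T = 20$, well past the relaxation time of the averaged equation, so the amplitude has time to reach the stable zero.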
§2. The Averaging Principle when the Fast Motion is a Random Process

As has already been mentioned, condition (1.3), which implies the averaging principle, is satisfied if the function $\xi_t$ is periodic. On the other hand, this condition can be viewed as a certain kind of law of large numbers: (1.3) is
satisfied if the values assumed by $\xi_t$ at moments of time far from each other are "almost independent." In what follows we shall assume that the role of the fast variables is played by a random process. First we discuss the simpler case (1.2), where the fast motion does not depend on the slow variables. Hence let $\xi_t$ be a random process with values in $R^l$. We shall assume that the function $b(x, y)$ satisfies a Lipschitz condition:
$$|b(x_1, y_1) - b(x_2, y_2)| \le K(|x_1 - x_2| + |y_1 - y_2|).$$
Concerning the process $\xi_t$ we assume that its trajectories are continuous with probability one, or that on every finite time interval they have a finite number of discontinuities of the first kind and no discontinuities of the second kind. Under these assumptions, the solution of equation (1.2) exists with probability 1 for any $x \in R^r$ and is defined uniquely for all $t \ge 0$. If condition (1.3) is satisfied with probability 1 uniformly in $x \in R^r$, then the ordinary averaging principle implies that with probability 1, the trajectory of $X^\varepsilon_t$ converges to the solution of equation (1.4), uniformly on every finite interval ($\bar b(x)$ and $\bar x_t$ may depend on $\omega$ in general).
Less stringent assumptions can be imposed concerning the type of convergence in (1.3). Then we obtain a weaker result in general.
We assume that there exists a vector field $\bar b(x)$ in $R^r$ such that for any $\delta > 0$ and $x \in R^r$ we have
$$\lim_{T\to\infty}\mathsf{P}\Bigl\{\Bigl|\frac1T\int_t^{t+T} b(x, \xi_s)\,ds - \bar b(x)\Bigr| > \delta\Bigr\} = 0, \tag{2.1}$$
uniformly in $t \ge 0$. It follows from (2.1) that $\bar b(x)$ satisfies a Lipschitz condition (with the same constant as $b(x, y)$). Therefore, there exists a unique solution of the problem
$$\dot{\bar x}_t = \bar b(\bar x_t), \qquad \bar x_0 = x. \tag{2.2}$$
The random process $X^\varepsilon_t$ can be considered as a result of random perturbations, small on the average, of the dynamical system (2.2). Relation (2.1) is an assumption on the smallness, on the time average, of the random perturbations.

Theorem 2.1. Suppose that condition (2.1) is satisfied and $\sup_t \mathsf{M}|b(x, \xi_t)|^2 < \infty$. Then for any $T > 0$ and $\delta > 0$ we have
$$\lim_{\varepsilon\to 0}\mathsf{P}\Bigl\{\sup_{0\le t\le T}|X^\varepsilon_t - \bar x_t| > \delta\Bigr\} = 0.$$
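A Monte Carlo sketch of Theorem 2.1 (our toy model, not from the text): take $b(x, y) = -x + y$ with $\xi_t$ a stationary Ornstein-Uhlenbeck process of mean 0, so that $\bar b(x) = -x$ and $\bar x_t = e^{-t}$ for $x_0 = 1$; the sup-deviation should be small (in fact of order $\sqrt{\varepsilon}$) for small $\varepsilon$:

```python
import math
import random

def sup_deviation(eps, T=2.0, dt=1e-3, seed=1):
    """Sup-distance between an Euler path of (1.2) with b(x, y) = -x + y,
    xi an OU process viewed at the fast time t/eps, and exp(-t)."""
    rng = random.Random(seed)
    h = dt / eps                    # fast-time step corresponding to dt
    a = math.exp(-h)                # exact OU update, stationary variance 1
    s = math.sqrt(1.0 - a * a)
    x, t, xi, dev = 1.0, 0.0, 0.0, 0.0
    for _ in range(int(T / dt)):
        x += dt * (-x + xi)
        xi = a * xi + s * rng.gauss(0.0, 1.0)
        t += dt
        dev = max(dev, abs(x - math.exp(-t)))
    return dev

dev = sup_deviation(eps=0.001)
```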
$\tau(\zeta^0) > 0$ with probability 1. The vanishing of the probability $\mathsf{P}\{\tau(\zeta^0) = T\}$ follows from the existence of the transition probability density of $\zeta^0_t$. Consequently, $\mathsf{P}\{\tau(\zeta^\varepsilon) < T\}$ converges to $\mathsf{P}\{\tau(\zeta^0) < T\}$ as $\varepsilon \to 0$. In particular, choosing as $D$ the ball of radius $\delta$ with center at the point 0, we obtain
$$\lim_{\varepsilon\to 0}\mathsf{P}\Bigl\{\sup_{0\le t\le T}|X^\varepsilon_t - \bar x_t| > \delta\sqrt{\varepsilon}\Bigr\} = \mathsf{P}\{\tau(\zeta^0) < T\}.$$
For any $T > 0$ and $\delta > 0$ we have
$$\lim_{\varepsilon\to 0}\mathsf{P}\Bigl\{\sup_{0\le t\le T/\varepsilon}|Z^\varepsilon_t - \bar x_{\varepsilon t}| > \delta\Bigr\} = 0,$$
where $Z^\varepsilon_t$ is the solution of equation (1.1) and $\bar x_t$ is the solution of the averaged system (1.4). In the case $\bar b(x) \equiv 0$ this theorem implies that in the time interval $[0, T/\varepsilon]$ the process $Z^\varepsilon_t$ does not move away noticeably from the initial position. It turns out that in this case, displacements of order 1 take place over time intervals of order $\varepsilon^{-2}$. Apparently, it was Stratonovich [1] who first called attention to this fact. On the physical level of rigor, he established (cf. the same publication) that under certain conditions the family of processes $Z^\varepsilon_{t/\varepsilon^2}$ converges to a diffusion process, and he computed the characteristics of the limit process. A mathematically rigorous proof of this result was given by Khas'minskii [5]. A proof is given in Borodin [1] under essentially less stringent assumptions. Without precisely formulating the conditions, which, in addition to the equality $\bar b(x) \equiv 0$, contain some assumptions concerning the boundedness of the derivatives of the $b^i(x, y)$, sufficiently good mixing of $\xi_t$, and the existence of moments, we include the result here.
§3. Normal Deviations from an Averaged System
Let us introduce the notation
$$a^{ik}(x, s, t) = \mathsf{M}\,b^i(x, \xi_s)\,b^k(x, \xi_t), \qquad B(x, y) = (B^i_j(x, y)) = \Bigl(\frac{\partial b^i}{\partial x^j}(x, y)\Bigr),$$
$$K^i(x, s, t) = \sum_{j=1}^{r} \mathsf{M}\,B^i_j(x, \xi_s)\,b^j(x, \xi_t).$$
Suppose that the limits
$$\bar a^{ik}(x) = \lim_{T\to\infty}\frac1T\int_{t_0}^{t_0+T}\!\!\int_{t_0}^{t_0+T} a^{ik}(x, s, t)\,ds\,dt,$$
$$\bar K^i(x) = \lim_{T\to\infty}\frac1T\int_{t_0}^{t_0+T}\!\!\int_{t_0}^{t_0+T} K^i(x, s, t)\,ds\,dt$$
exist, uniformly in $t_0 \ge 0$ and $x \in R^r$. Then on the interval $[0, T]$, the process $Z^\varepsilon_{t/\varepsilon^2}$ converges weakly as $\varepsilon \to 0$ to the diffusion process with generating operator
$$L = \frac12\sum_{i,k=1}^{r} \bar a^{ik}(x)\frac{\partial^2}{\partial x^i\,\partial x^k} + \sum_{i=1}^{r} \bar K^i(x)\frac{\partial}{\partial x^i}.$$
A precise formulation and proof can be found in Khas'minskii [5] and Borodin [1].

A natural generalization of the last result is the limit theorem describing the evolution of first integrals of a dynamical system. Consider equation (1.2), and assume that the corresponding averaged system (1.4) has a first integral $H(x)$: $H(\bar x_t) = H(x) = \mathrm{const}$ for $t \ge 0$. Let $H(x)$ be a smooth
function with compact connected level sets $C(y) = \{x \in R^r\colon H(x) = y\}$. Since $X^\varepsilon_t \to \bar x_t$ as $\varepsilon \downarrow 0$ uniformly on any finite time interval $[0, T]$ in probability, $\max_{0\le t\le T}|H(X^\varepsilon_t) - H(x)| \to 0$ in probability as $\varepsilon \downarrow 0$. We assume also that the vector field $b(x, u)$, $x \in R^2$, $u \in R^m$, is Lipschitz continuous and $|b(x, u)|$ is bounded or grows slowly; the function $\bar b(x) = \mathsf{M}b(x, \xi_t)$ then is also Lipschitz continuous. Introduce the following conditions.

1. Let $H(x)$, $x \in R^2$, be a first integral for the averaged system (1.4). Assume that $H(x)$ is three times differentiable and the sets $C(y) = \{x \in R^2\colon H(x) = y\}$ are closed connected curves in $R^2$ without self-intersections. This means, in particular, that $H(x)$ has a unique extremum point. We can assume, without loss of generality, that this point is the origin
$0 \in R^2$, and that $H(0) = 0$, $H(x) > 0$ for $x \ne 0$. The trajectory $\bar x_t$ performs a periodic motion along $C(y)$, $y = H(\bar x_0)$, with a period $T(y)$. Assume that $T(y) \le c(1 + y^2)$ for $y \ge 0$ and some $c > 0$.

2. Let
$$|H(x)| \le c(|x|^2 + 1), \qquad |\nabla H(x)| \le c(|x| + 1), \qquad \Bigl|\frac{\partial^2 H(x)}{\partial x^i\,\partial x^j}\Bigr| \le c, \qquad \Bigl|\frac{\partial^3 H(x)}{\partial x^i\,\partial x^j\,\partial x^k}\Bigr| \le c$$
for some $c > 0$ and for $i, j, k \in \{1, 2\}$, $x \in R^2$.

3. Let $|x|^\mu$ … $(4(p + 1))/(\mu - 1)$ … .

If the averaged dynamical system has an asymptotically stable equilibrium position or limit cycle and the initial point $X_0 = x$ is situated in the domain of attraction of this equilibrium position or cycle, then it follows from the above results that, with probability close to 1 for $\varepsilon$ small, the trajectory of $X^\varepsilon_t$ hits a neighborhood of the equilibrium position or the limit cycle and spends an arbitrarily long time $T$ there, provided that $\varepsilon$ is sufficiently small. By means of Theorem 3.1 we can estimate the probability that over a fixed time interval $[0, T]$ the trajectory of $X^\varepsilon_t$ does not leave a neighborhood $D^\varepsilon$ of the equilibrium position if this neighborhood has linear dimensions of order $\sqrt{\varepsilon}$. However, the above results do not enable us to estimate, in a precise way, the probability that over the time $[0, T]$ the process $X^\varepsilon_t$ leaves a given neighborhood, independent of $\varepsilon$, of the equilibrium position. We can only say that this probability converges to zero. Theorems 2.1 and 3.1 do not enable us to study events determined by the behavior of $X^\varepsilon_t$ on time intervals increasing
with ε^{-1}. For example, by means of these theorems we cannot estimate the time spent by X^ε_t in a neighborhood D of an asymptotically stable equilibrium position until the first exit time from D. Over a sufficiently long time, X^ε_t passes from the neighborhood of one equilibrium position of the averaged system to neighborhoods of others. These passages take place "in spite of" the averaged motion, due to prolonged deviations of ξ_t from its "typical" behavior. In a word, the situation here is completely analogous to that confronted in Chs. 4-6: in order to study all these questions, we need theorems on large deviations for the family of processes X^ε_t. For this family we are going to introduce an action functional, and by means of it we shall study probabilities of events having small probability for ε ≪ 1, and also the behavior of the process on time intervals increasing with decreasing ε. These results were obtained in Freidlin [9], [11]. In what follows we assume for the sake of simplicity that not only are the partial derivatives of the b^i(x, y), x ∈ R^r, y ∈ R^l, bounded but so are the b^i(x, y) themselves:

sup_{x,y} [ |b^i(x, y)| + |∂b^i/∂x^j (x, y)| ] < ∞.

The assumption that |b(x, y)| is bounded could be replaced by an assumption on the finiteness of some exponential moments of |b(x, ξ_t)|, but this would lengthen the proofs. We shall say that condition F is satisfied if there exists a numerical-valued function H(x, α), x ∈ R^r, α ∈ R^r, such that

lim_{ε→0} ε ln M exp{ (1/ε) ∫_0^T (α_s, b(φ_s, ξ_{s/ε})) ds } = ∫_0^T H(φ_s, α_s) ds,   (4.1)
for any step functions φ_s, α_s on the interval [0, T] with values in R^r. If as φ_s and α_s we choose constants φ, α ∈ R^r, then we obtain from (4.1) that

lim_{T→∞} (1/T) ln M exp{ ∫_0^T (α, b(φ, ξ_s)) ds } = H(φ, α).   (4.2)
Lemma 4.1. The function H(x, α) is jointly continuous in its variables and convex in the second argument.

Indeed, it follows from (4.2) that

|H(x + Δ, α + δ) − H(x, α)| ≤ ⋯

For any γ > 0 and φ ∈ C_{0T}(R^r), φ_0 = x, there exists ε_0 > 0 such that for ε < ε_0 we have

P{ρ_{0T}(X^ε, φ) < δ} ≥ exp{−ε^{-1}(S_{0T}(φ) + γ)},   (4.3)

P{ρ_{0T}(X^ε, Φ_x(s)) ≥ δ} ≤ exp{−ε^{-1}(s − γ)},   (4.4)

where X^ε_t is the solution of equation (1.2) with the initial condition X^ε_0 = x.
We postpone the proof of this theorem until the next section, and now discuss some consequences of it and the verification of the conditions of the theorem for some classes of processes. First of all we note that for any set A ⊆ C_{x,0T}(R^r) = {φ ∈ C_{0T}(R^r): φ_0 = x} we have

−inf_{φ∈(A)} S(φ) ≤ lim inf_{ε→0} ε ln P{X^ε ∈ A} ≤ lim sup_{ε→0} ε ln P{X^ε ∈ A} ≤ −inf_{φ∈[A]} S(φ),   (4.5)

where [A] is the closure of A and (A) is the interior of A in C_{x,0T}(R^r). Estimates (4.5) follow from general properties of an action functional (cf. Ch. 3). If the infima in (4.5) over [A] and (A) coincide, then (4.5) implies the relation

lim_{ε→0} ε ln P{X^ε ∈ A} = −inf_{φ∈A} S_{0T}(φ).
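As a purely illustrative numerical sketch (not from the text), the scaling in (4.5) can be seen for the simplest family with an action functional, the scaled Wiener process √ε w_t with S_{0T}(φ) = ½∫|φ̇|² dt (Ch. 3). For the set A = {φ: max_{t≤1} φ_t > δ} both infima equal δ²/2, and ε ln P{X^ε ∈ A} can be evaluated exactly via the reflection principle; the Gaussian tail asymptotics ln(1 − Φ(a)) ≈ −a²/2 − ln(a√(2π)) (a standard Mills-ratio approximation, used here to avoid underflow) give the rate.

```python
import math

# For the family sqrt(eps)*w_t, P{max_{[0,1]} sqrt(eps) w_t > delta}
# equals 2*(1 - Phi(delta/sqrt(eps))) by the reflection principle, and
# eps * ln P should tend to -delta^2/2, the common value of both infima
# of the action functional in (4.5) for this set.

def log_tail(a):
    # log of the standard Gaussian tail probability; exact formula for
    # moderate a, Mills-ratio asymptotics for large a (avoids underflow)
    if a < 30.0:
        return math.log(0.5 * math.erfc(a / math.sqrt(2.0)))
    return -0.5 * a * a - math.log(a * math.sqrt(2.0 * math.pi))

delta = 1.0
for eps in [1e-1, 1e-2, 1e-3, 1e-4]:
    a = delta / math.sqrt(eps)
    rate = eps * (math.log(2.0) + log_tail(a))
    print(eps, rate)   # rate tends to -delta^2/2 = -0.5 as eps decreases
```

The deterministic formula is used instead of Monte Carlo because the probabilities involved decay exponentially and cannot be sampled directly.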
We would like to mention that, because of the boundedness of |b(x, ξ)| and the possible degeneracy of the random variables b(x, ξ), the condition of coincidence of the infima of S_{0T}(φ) over the sets [A] and (A) is more
stringent than, say, in the case of a functional corresponding to an additive perturbation of white-noise type (cf. Ch. 4). It follows from the compactness of Φ_x(s) that there exists ε_0 > 0 such that estimate (4.3) holds for ε < ε_0 for any function φ ∈ Φ_x(s). Since b(x, y) satisfies a Lipschitz condition, for any φ^x ∈ C_{x,0T}(R^r), φ^y ∈ C_{y,0T}(R^r) and δ > 0 we have

{ω: ρ_{0T}(X^ε, φ^x) < δ} ⊆ {ω: ρ_{0T}(Y^ε, φ^y) < (e^{KT} + 2)δ},

if ρ_{0T}(φ^x, φ^y) < δ, where X^ε = X^ε_t and Y^ε = Y^ε_t are solutions of equation (1.2) with initial conditions x and y, respectively, and K is the Lipschitz constant. Relying on estimates (4.3) and (4.5), from this we obtain that if ρ(φ^x, φ^y) < δ, then the infima of S_{0T} over the corresponding δ-neighborhoods of φ^x and φ^y are close; ⋯ for any γ > 0 and any compactum Q ⊂ R^r there exists ε_0 > 0 such that inequalities (4.3) and (4.4) hold for every initial point x ∈ Q and every φ ∈ Φ_x(s).
For a large class of important events, the infimum in (4.6) can be expressed in terms of the function u_x(t, z) = inf{S_{0t}(φ): φ_0 = x, φ_t = z}. As a rule, the initial point φ_0 = x is assumed to be fixed, and therefore we shall omit the subscript x in u_x(t, z). As is known, the function u(t, z) satisfies the Hamilton-Jacobi equation. Since the Legendre transformation is involutive, the Hamilton-Jacobi equation for u(t, z) has the form (cf., for example, Gel'fand and Fomin [1])

∂u(t, z)/∂t + H(z, ∇_z u(t, z)) = 0.   (4.7)
The functional S_{0T}(φ) vanishes on trajectories of the averaged system. Indeed, from the concavity of ln x it follows that

H(x, α) = lim_{T→∞} (1/T) ln M exp{ ∫_0^T (α, b(x, ξ_s)) ds } ≥ lim_{T→∞} (1/T) M ∫_0^T (α, b(x, ξ_s)) ds = (α, lim_{T→∞} (1/T) ∫_0^T M b(x, ξ_s) ds) = (α, b̄(x)),

and consequently, L(φ, φ̇) = sup_α [(φ̇, α) − H(φ, α)] = 0 for φ̇ = b̄(φ). It follows from the differentiability of H(x, α) with respect to α that the action functional vanishes only on trajectories of the averaged system. We take for A the set {φ ∈ C_{x,0T}(R^r): sup_{0≤t≤T} |φ_t − x̄_t| ≥ δ}. Then we conclude from (4.5) that for any δ > 0 we have

P{ sup_{0≤t≤T} |X^ε_t − x̄_t| ≥ δ } ≤ exp{−cε^{-1}}

for sufficiently small ε, where c is an arbitrary number less than inf_{φ∈[A]} S(φ). This infimum is positive, since S(φ) > 0 for φ ∈ [A] and S is lower semicontinuous. Consequently, if the hypotheses of Theorem 4.1 are satisfied, then the probability of deviations of order 1 from the trajectory of the averaged system is exponentially small. In §8 we consider some examples of the application of Theorem 4.1; now we discuss the fulfilment of the hypotheses of Theorem 4.1 in the case where ξ_t is a Markov process.

Lemma 4.3. Suppose that ξ_t is a homogeneous Markov process with values in
D ⊂ R^l, and for any x, α ∈ R^r let

lim_{T→∞} (1/T) ln M_y exp{ ∫_0^T (α, b(x, ξ_s)) ds } = H(x, α)   (4.8)

uniformly in y ∈ D. Then condition F is satisfied.
Proof. Let α_s and z_s be step functions and let α_k and z_k be their values on [t_{k−1}, t_k), 0 = t_0 < t_1 < t_2 < ⋯ < t_n = T, respectively. Using the Markov property, we can write

lim_{ε→0} ε ln M_y exp{ (1/ε) ∫_0^T (α_s, b(z_s, ξ_{s/ε})) ds }
  = lim_{ε→0} ε ln M_y [ exp{ (1/ε) ∫_0^{t_1} (α_1, b(z_1, ξ_{s/ε})) ds }
  × M_{ξ_{t_1/ε}} [ exp{ (1/ε) ∫_0^{t_2−t_1} (α_2, b(z_2, ξ_{s/ε})) ds }
  × ⋯ × M_{ξ_{t_{n−1}/ε}} exp{ (1/ε) ∫_0^{t_n−t_{n−1}} (α_n, b(z_n, ξ_{s/ε})) ds } ⋯ ] ].

From (4.8) it follows that

ε ln M_y exp{ (1/ε) ∫_0^{t_k−t_{k−1}} (α_k, b(z_k, ξ_{s/ε})) ds } − H(z_k, α_k)(t_k − t_{k−1}) → 0

as ε → 0; summing over k, we obtain (4.1). ⋯

Let ξ_t, t ≥ 0, be a homogeneous stochastically continuous Markov process with N states {1, 2, …, N}, let p_{ij}(t) be the probability of passage from i to j over time t, and let P(t) = (p_{ij}(t)). We denote by Q = (q_{ij}) the matrix consisting of the derivatives dp_{ij}(t)/dt at t = 0; as is known, these derivatives exist.

Theorem 4.2. Suppose that all entries of Q are different from zero. Let us denote by Q^{α,x} = (q_{ij}^{α,x}) the matrix whose entries are given by the equalities q_{ij}^{α,x} = q_{ij} + δ_{ij}(α, b(x, i)), where δ_{ij} = 1 for i = j and δ_{ij} = 0 for i ≠ j. Then Q^{α,x}
has a simple real eigenvalue λ = λ(x, α) exceeding the real parts of all other eigenvalues. This eigenvalue is differentiable with respect to α. Condition F is satisfied and H(x, α) = λ(x, α).
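For a concrete chain, H(x, α) = λ(x, α) is easy to compute numerically. The sketch below is illustrative only: the generator Q and the values v_i = (α, b(x, i)) at a fixed x and α are assumptions, not data from the text. It forms Q^{α,x} = Q + diag(v), takes the eigenvalue of largest real part, and checks it against the limit (4.10) with T_T = exp(T Q^{α,x}) applied to the vector of ones.

```python
import numpy as np

# Hypothetical data: a two-state chain (all entries of Q nonzero) and
# assumed values v_i = (alpha, b(x, i)) at a fixed x and alpha.
Q = np.array([[-1.0,  1.0],
              [ 2.0, -2.0]])
v = np.array([0.5, -0.3])

Qa = Q + np.diag(v)                      # the matrix Q^{alpha,x}

# H(x, alpha) = lambda(x, alpha): the simple real eigenvalue of Q^{alpha,x}
# exceeding the real parts of all other eigenvalues (Theorem 4.2)
lam = max(np.linalg.eigvals(Qa).real)

# check (4.10): (1/T) ln (T_T 1)(i) -> lambda, where T_T = exp(T Q^{alpha,x});
# the matrix exponential is evaluated through the eigenbasis of Q^{alpha,x}
w, V = np.linalg.eig(Qa)
T = 500.0
TT1 = (V @ np.diag(np.exp(T * w)) @ np.linalg.inv(V)).real @ np.ones(2)
print(lam, np.log(TT1) / T)              # the entries agree up to O(1/T)
```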
Proof. If ξ_t is a Markov process, then the family of operators T_t, t ≥ 0, acting on the set of bounded measurable functions on the phase space of ξ_t according to the formula

T_t f(z) = M_z [ f(ξ_t) exp{ ∫_0^t (α, b(x, ξ_s)) ds } ],

forms a positive semigroup. In our case the phase space consists of a finite number of points and the semigroup is a semigroup of matrices acting in the N-dimensional space of vectors f = (f(1), …, f(N)). It is easy to calculate the infinitesimal generator A of this semigroup:

A = lim_{t↓0} (T_t − E)/t = (q_{ij} + δ_{ij}(α, b(x, i))) = Q^{α,x}.

By means of the infinitesimal generator, the semigroup T_t can be represented in the form T_t = e^{tQ^{α,x}}. Since by assumption q_{ij} ≠ 0, the entries of the matrix T_t are positive if t > 0. By Frobenius' theorem (Gantmakher [1]), the eigenvalue with largest absolute value μ = μ(t, x, α) of such a matrix is real, positive and simple. To it there corresponds an eigenvector e(t, x, α) = (e_1, …, e_N), Σ_{k=1}^N e_k = 1, all of whose components are positive. It is easy to derive from the semigroup property of the operators T_t that e(t, x, α) does not actually depend on t and is an eigenvector of the matrix Q^{α,x}, i.e., Q^{α,x} e(x, α) = λ(x, α) e(x, α). The corresponding eigenvalue λ(x, α) is real, simple, exceeds the real parts of all other eigenvalues of Q^{α,x}, and μ(t, x, α) = exp{t λ(x, α)}. The differentiability of λ(x, α) with respect to α follows from the differentiability of the entries of Q^{α,x} and the simplicity of the eigenvalue λ(x, α) (cf. Kato [1]). By Lemma 4.3, in order to complete the proof of the theorem it is sufficient to show that

lim_{T→∞} (1/T) ln M_i exp{ ∫_0^T (α, b(x, ξ_s)) ds } = λ(x, α)

for i = 1, 2, …, N. This equality can obviously be rewritten in the following equivalent form:

lim_{T→∞} (1/T) ln (T_T 1)(i) = λ(x, α),   (4.10)

where 1 is the vector with components equal to one.
To prove (4.10) we use the fact that all components of the eigenvector e = e(x, α) = (e_1, …, e_N), Σ_{k=1}^N e_k = 1, are positive: 0 < c < min_i e_i ≤ ⋯ ≤ P{ρ(η^ε, φ^Δ) < δ'}.

This inequality follows from the fact that the trajectories of η^ε and any function φ for which S_{0T}(φ) < ∞ satisfy a Lipschitz condition. Estimating the right side of the last inequality by means of the first of the inequalities (5.2), we obtain

P{ρ_{0T}(X^{ε,Δ}, φ) < δ} ≥ exp{−ε^{−1}(F(φ^Δ_1, …, φ^Δ_n) + γ)} = exp{−ε^{−1}( ∫_0^T L(ψ^Δ_s, ψ̇^Δ_s) ds + γ )}   (5.3)

for every γ > 0 and sufficiently small ε, where ψ^Δ is the piecewise linear function having corners at the multiples of Δ and coinciding with φ at these points.
Taking account of the absolute continuity of φ_s and the convexity of L(x, β) in β, we obtain

∫_0^T L(ψ^Δ_s, ψ̇^Δ_s) ds = Σ_{k=1}^n L( ψ^Δ_{(k−1)Δ}, (1/Δ) ∫_{(k−1)Δ}^{kΔ} φ̇_s ds ) Δ ≤ Σ_{k=1}^n ∫_{(k−1)Δ}^{kΔ} L(ψ^Δ_{(k−1)Δ}, φ̇_s) ds ≈ ∫_0^T L(φ_s, φ̇_s) ds = S_{0T}(φ).   (5.4)

Relations (5.3) and (5.4) imply the first of the two inequalities in the definition of the action functional:

P{ρ_{0T}(X^{ε,Δ}, φ) < δ} ≥ exp{−ε^{−1}(S_{0T}(φ) + γ)}.

To prove the second inequality, we note that we have the inclusion

{ω ∈ Ω: ρ_{0T}(X^{ε,Δ}, Φ_x(s)) > δ} ⊆ {ω ∈ Ω: ρ(η^ε, Φ^Δ(s)) > δ'}   (5.5)
for sufficiently small Δ(δ) and δ'(δ), where Φ_x(s) = {φ ∈ C_{0T}(R^r): φ_0 = x, S_{0T}(φ) ≤ s}. This inclusion and the second estimate in (5.2) imply the required inequality

P{ρ_{0T}(X^{ε,Δ}, Φ_x(s)) > δ} ≤ exp{−ε^{−1}(s − γ)}.   (5.6)

As has been remarked, the compactness of Φ_x(s) can be proved in the same way as in Lemma 4.2. Lemma 5.1 is proved.

As usual, this lemma and the lower semicontinuity of S_{0T}(φ) imply that

−inf_{φ∈(A)} S(φ) ≤ lim inf_{ε→0} ε ln P{X^{ε,Δ} ∈ A} ≤ lim sup_{ε→0} ε ln P{X^{ε,Δ} ∈ A} ≤ −inf_{φ∈[A]} S(φ)   (5.7)

for every A ⊆ C_{0T}(R^r). Here (A) is the set of points of A interior with respect to the space C_{0T}(R^r).

Lemma 5.2. Suppose that the function H(x, α) is differentiable with respect to the variables α. Let ψ^{(n)}: [0, T] → R^r be a sequence of step functions uniformly converging to some function φ ∈ C_{0T}(R^r) as n → ∞. Then there exists a sequence φ^{(n)} ∈ C_{0T}(R^r), uniformly converging to φ, such that

lim_{n→∞} ∫_0^T L(ψ^{(n)}_s, φ̇^{(n)}_s) ds ≤ S_{0T}(φ).
Proof. For an arbitrary partition 0 = t_0 < t_1 < t_2 < ⋯ we have

S_{0T}(φ) ≥ Σ_k l_k(φ_{t_k} − φ_{t_{k−1}}).

It is known (cf. Rockafellar [1]) that if l_k(β) is a lower semicontinuous convex function and β* ∈ A_k = {β: l_k(β) < ∞}, then l_k(β*) = lim_{β→β*} l_k(β), where the points β belong to ri A_k, the interior of A_k with respect to its affine hull. Therefore, for every δ > 0 there exists a function ψ_1: [0, T] → R^r such that sup_t |ψ_1(t) − ψ(t)| < δ ⋯ provided that Δ = Δ(δ) and δ' = δ'(δ) are sufficiently small. This inclusion and estimate (5.5) imply the inequality

P{ρ_{0T}(X^ε, φ) < δ} ≥ P{ρ_{0T}(X^{ε,Δ}, φ') < δ'} ≥ exp{−ε^{−1}(S^{(n)}(φ') + γ)} ≥ exp{−ε^{−1}(S_{0T}(φ) + 2γ)}   (5.13)

for sufficiently small ε.
Now we prove the second inequality in the definition of the action functional. First of all we note that, since |b(x, y)| is bounded, the trajectory of X^ε_t issued from a point x ∈ R^r belongs to some compactum F ⊂ C_{0T}(R^r) for t ∈ [0, T]. It follows from the semicontinuity of the functional S_{0T}(φ) that for any γ > 0 there exists δ = δ(φ) such that S^Δ(ψ) > s − γ/2 if ρ_{0T}(φ, ψ) < δ and S_{0T}(φ) > s. Relying on the semicontinuity of S_{0T}(φ) in φ, we conclude that the functional δ_γ(φ) is lower semicontinuous in φ ∈ C_{0T}(R^r). Consequently, δ_γ(φ) attains its infimum on every compactum.

Let F_1 be the compactum obtained from F by omitting the δ/2-neighborhood of the set Φ_x(s) = {φ ∈ C_{0T}(R^r): φ_0 = x, S_{0T}(φ) ≤ s}. Let us write δ_1 = inf_{φ∈F_1} δ_γ(φ), δ' = δ_1/(4KT + 2), where K is a Lipschitz constant of b(x, y). Let us choose a finite δ'-net in F_1 and let φ^{(1)}, …, φ^{(N)} be the elements of this net not belonging to Φ_x(s). It is obvious that

P{ρ_{0T}(X^ε, Φ_x(s)) > δ} ≤ Σ_{i=1}^N P{ρ_{0T}(X^ε, φ^{(i)}) < δ'}   (5.14)

if δ' < δ. From the Lipschitz continuity of b(x, y) we obtain the inclusion

{ω: ρ_{0T}(X^ε, φ) < δ'} ⊆ {ω: ρ_{0T}(X^{ε,Δ}, φ) < (2KT + 1)δ'}.   (5.15)

We choose step functions ψ^{(1)}, …, ψ^{(N)} such that ρ_{0T}(ψ^{(i)}, φ^{(i)}) < δ'/2 for i = 1, 2, …, N. Relations (5.14) and (5.15) imply the estimate

P{ρ_{0T}(X^ε, Φ_x(s)) > δ} ≤ ⋯

⋯ for κ ∈ (0, 1).

The deviations of order ε^κ determine the average time needed by the trajectories of the process X^ε_t, X^ε_0 = 0, to exit from a domain D^ε containing 0,
if D^ε is obtained from a given domain D by a stretching with coefficient ε^κ: D^ε = ε^κ · D. Let

τ^ε = inf{t: X^ε_t ∉ D^ε},   V(x) = inf_{t≥0} u(t, x).

Theorem 7.3. Suppose that the hypotheses of Theorem 7.2 are satisfied, the matrix C is nonsingular and D^ε = ε^κ · D, κ ∈ (0, ½), where D is a bounded domain with smooth boundary. Let us put V_0 = min_{x∈∂D} V(x). Then

lim_{ε↓0} ε^{1−2κ} ln M_0 τ^ε = V_0,

lim_{ε↓0} P_0{ exp{ε^{2κ−1}(V_0 − γ)} < τ^ε < exp{ε^{2κ−1}(V_0 + γ)} } = 1 for every γ > 0.
The proof of this theorem is analogous to those of the corresponding results of Chapter 4, and we omit it. We say a few words concerning the calculation of V(x). Analogously to §3, Ch. 4, it can be proved that V(x) is a solution of problem R_0 for the equation

½(C∇V(x), ∇V(x)) + (Bx, ∇V(x)) = 0.

An immediate verification shows that if C^{−1}B is symmetric, then V(x) = −(C^{−1}Bx, x). Concluding this section, we note that analogous estimates can be obtained not only for deviations of order ε^κ, κ ∈ (0, 1), from an equilibrium position, but also for deviations of order ε^κ from any averaged trajectory.
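The "immediate verification" is easy to reproduce numerically. The sketch below is illustrative: the matrices are assumptions, with C = I and B symmetric negative definite, so that C^{−1}B = B is symmetric; the candidate V(x) = −(C^{−1}Bx, x) is substituted into the Hamilton-Jacobi equation at random points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed data (not from the text): C = I and a symmetric negative
# definite B, so C^{-1}B = B is symmetric and the origin is stable.
C = np.eye(2)
B = np.array([[-2.0,  0.5],
              [ 0.5, -1.0]])
S = np.linalg.solve(C, B)          # C^{-1}B, symmetric in this example

def V(x):                          # candidate V(x) = -(C^{-1}B x, x)
    return -x @ S @ x

def gradV(x):                      # since S is symmetric, grad V = -2 S x
    return -2.0 * S @ x

# residual of (1/2)(C grad V, grad V) + (Bx, grad V) = 0
for _ in range(5):
    x = rng.standard_normal(2)
    g = gradV(x)
    residual = 0.5 * g @ C @ g + (B @ x) @ g
    assert abs(residual) < 1e-10   # the equation holds identically
    assert V(x) > 0                # V is positive away from the origin
print("Hamilton-Jacobi equation satisfied")
```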
§8. Examples

EXAMPLE 8.1. First we consider the case where the right sides of equations (1.2) do not depend on x and the process ξ_t is stationary. Then X^ε_t can be expressed in the form

X^ε_t = x + ∫_0^t b(ξ_{s/ε}) ds = x + (t/(t/ε)) ∫_0^{t/ε} b(ξ_s) ds.

We write m = M b(ξ_s), B^{ij}(τ) = M (b^i(ξ_τ) − m^i)(b^j(ξ_0) − m^j), and assume that B^{ij}(τ) → 0 as τ → ∞. By means of the Chebyshev inequality, we
obtain from this, for any δ > 0, that

P{ | (1/T) ∫_0^T b(ξ_s) ds − m | > δ } → 0 as T → ∞.

⋯ as ε → 0, where η_s is a Gaussian process with independent increments and mean zero such that M η_s η_T^* = ∫_0^{min(s,T)} G(x̄_u) du (cf. Khas'minskii [4]).
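This law of large numbers for the time average is easy to observe numerically. The sketch below is illustrative rather than the book's example: the fast process is assumed to be an Ornstein-Uhlenbeck process (whose correlation function decays exponentially, so the mixing assumption holds) and b is an assumed bounded observable.

```python
import math
import random

random.seed(3)

# Illustrative sketch (assumed data, not the book's): xi_t is a stationary
# Ornstein-Uhlenbeck process d(xi) = -xi dt + sqrt(2) dW, whose correlation
# B(tau) = e^{-|tau|} tends to 0 as tau -> infinity; the bounded observable
# b(y) = 0.7 + tanh(y) has mean m = M b(xi_t) = 0.7, since the stationary
# law N(0, 1) is symmetric and tanh is odd.
dt, T = 0.01, 1000.0
n = int(T / dt)
xi = random.gauss(0.0, 1.0)          # start in the stationary distribution
acc = 0.0
for _ in range(n):
    acc += (0.7 + math.tanh(xi)) * dt
    xi += -xi * dt + math.sqrt(2.0 * dt) * random.gauss(0.0, 1.0)

print(acc / T)                       # time average, close to m = 0.7
```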
It is easy to give examples showing that the system obtained from (8.8) upon averaging may have asymptotically stable stationary points even in cases where the vector fields A(y)x + b(y) do not have equilibrium positions for any value of the parameter y, or have unstable equilibrium positions. In the neighborhood of such points the process X^ε_t, ε ≪ 1, spends a long time, or is even attracted to them with great probability. For the sake of definiteness, let ξ_t be the Markov process with a finite number of states considered in Theorem 4.2, let b = 0, and let all eigenvalues of the averaged matrix have negative real parts. Then the origin of coordinates 0 is an asymptotically stable equilibrium position of the averaged system. Let D be a bounded domain containing the origin and having boundary ∂D, and let V(x, ∂D) be the function introduced in §6. If V(0, ∂D) < ∞, then, in accordance with Theorem 6.1, the trajectories X^ε_t, X^ε_0 = x ∈ D, leave D over a time t^ε going to infinity in probability as ε → 0, and ln M t^ε ≈ ε^{−1} V(0, ∂D). The case V(0, ∂D) = +∞ is illustrated by the following example. Let ξ_t be a process with two states and let A(y) be equal to

A_1 = diag(−a, μ),   A_2 = diag(μ, −a),   a, μ > 0,

in these states, respectively. We consider the system Ẋ^ε_t = A(ξ_{t/ε}) X^ε_t. If ξ_t did not pass from one state to another, then for any initial state ξ_0 the origin of coordinates would be an unstable stationary point, a saddle. Let the matrix Q for the process ξ_t have the form

Q = ( −q  q ;  q  −q ),   q > 0.

The stationary distribution of this process is the uniform distribution, that is, (½, ½). The averaged system

x̄̇¹ = ½(μ − a) x̄¹,   x̄̇² = ½(μ − a) x̄²
has an asymptotically stable equilibrium position at 0 if a > μ. It is easy to prove that in this case the trajectories of X^ε converge to 0 with probability 1 as t → ∞ for any ε > 0, i.e., due to the random passages of ξ_t, the system acquires stability. By means of results of the present chapter we can calculate the logarithmic asymptotics of P_x{τ^ε < ∞} as ε → 0, where τ^ε = min{t: X^ε_t ∈ ∂D} (D is a neighborhood of the equilibrium position), as well as the asymptotics of this probability as x → 0, ε = const.
EXAMPLE 8.5. Consider again equation (8.2). Let r = 2, b(x) = ∇̄H(x) = (∂H(x)/∂x², −∂H(x)/∂x¹), and let ξ_t, −∞ < t < ∞, be a two-dimensional mean-zero stationary process with the correlation matrix B(τ). The averaged system is now a Hamiltonian one:

x̄̇_t = ∇̄H(x̄_t),   x̄_0 = x ∈ R²,   (8.9)

and the Hamiltonian function H(x) is a first integral for (8.9). Assume that the function H(x), the matrix σ(x), and the process ξ_t are such that conditions 1-5 of Theorem 3.2 are satisfied. Here g(x, z) = σ(x)z,

F(x, z) = (∇H(x), σ(x)z),   (8.10)

D(x, s) = M (∇H(x), σ(x)ξ_s)(∇H(x), σ(x)ξ_0),
Q(x, s) = M (σ(x)ξ_0, ∇(∇H(x), σ(x)ξ_s)),

x, z ∈ R², s ≥ 0. Let B(τ) = (B^{ij}(τ)), B^{ij}(τ) = M ξ^i_τ ξ^j_0, and

B̄ = ∫_{−∞}^{∞} B(τ) dτ.

The convergence of this integral follows from our assumptions. One can derive from (8.10) that

D̄(x) = 2 ∫_0^∞ D(x, s) ds = (σ(x) B̄ σ*(x) ∇H(x), ∇H(x)).

Since B(τ) is a positive definite function, B̄ and σ(x)B̄σ*(x) are also positive definite. Thus one can introduce σ̄(y) such that

σ̄²(y) = ( ∮_{C(y)} (σ(x)B̄σ*(x)∇H(x), ∇H(x)) dl/|∇H(x)| ) / ( ∮_{C(y)} dl/|∇H(x)| ).

To write down the drift coefficient for the limiting process, we need some
notation. For any smooth vector field e(x) in R², denote by ∇e(x) the matrix (e^i_j(x)), e^i_j(x) = ∂e^i(x)/∂x^j. Simple calculations show that

Q̄(x) = ∫_0^∞ Q(x, s) ds = tr(σ*(x) ∇(σ*(x)∇H(x)) · B̄),

and we have the following expression for the drift:

B̄(y) = ( ∮_{C(y)} tr(σ*(x) ∇(σ*(x)∇H(x)) · B̄) dl/|∇H(x)| ) / ( ∮_{C(y)} dl/|∇H(x)| ).

Let, for example, σ(x) be the unit matrix. Then

D̄(x) = (B̄∇H(x), ∇H(x)),   Q̄(x) = tr(𝓗(x)B̄),

where 𝓗(x) is the Hessian matrix of H(x): 𝓗_{ij}(x) = ∂²H(x)/∂x^i∂x^j.
§9. The Averaging Principle for Stochastic Differential Equations

We consider the system of differential equations

Ẋ^ε_t = b(X^ε_t, Y^ε_t) + σ(X^ε_t, Y^ε_t) ẇ_t,   X^ε_0 = x,
Ẏ^ε_t = ε^{−1} B(X^ε_t, Y^ε_t) + ε^{−1/2} C(X^ε_t, Y^ε_t) ẇ_t,   Y^ε_0 = y,   (9.1)

where x ∈ R^r, y ∈ R^l, b(x, y) = (b¹(x, y), …, b^r(x, y)), B(x, y) = (B¹(x, y), …, B^l(x, y)), w_t is an n-dimensional Wiener process, and σ(x, y) = (σ^i_j(x, y)), C(x, y) = (C^i_j(x, y)) are matrices transforming R^n into R^r and R^l, respectively. The functions b^i(x, y), B^i(x, y), σ^i_j(x, y), C^i_j(x, y) are assumed to be bounded and to satisfy a Lipschitz condition. By this example we illustrate equations of the type (1.5), where the velocity of the fast motion depends on the slow variables. We also note that, in contrast to the preceding sections, the slow variables in (9.1) form a random process even for given Y^ε_t, t ∈ [0, T]. We introduce a random process Y^{x,y}_t, x ∈ R^r, y ∈ R^l, which is defined by the stochastic differential equation

Ẏ^{x,y}_t = B(x, Y^{x,y}_t) + C(x, Y^{x,y}_t) ẇ_t,   Y^{x,y}_0 = y.   (9.2)

The solutions of this equation form a Markov process in R^l, depending on x ∈ R^r as a parameter.
First we formulate and prove the averaging principle in the case where the entries of the matrix σ(x, y) do not depend on y, and then we indicate the changes necessary for the consideration of the general case. We assume that there exists a function b̄(x) = (b̄¹(x), b̄²(x), …, b̄^r(x)), x ∈ R^r, such that for any t > 0, x ∈ R^r, y ∈ R^l we have

M | (1/T) ∫_t^{t+T} b(x, Y^{x,y}_s) ds − b̄(x) | ≤ κ(T),   (9.3)

where κ(T) → 0 as T → ∞.
Theorem 9.1. Let the entries of σ(x, y) = σ(x) be independent of y and let condition (9.3) be satisfied. Let us denote by X̄_t the random process determined in R^r by the differential equation

dX̄_t = b̄(X̄_t) dt + σ(X̄_t) dw_t,   X̄_0 = x.

Then for any T > 0, δ > 0, x ∈ R^r and y ∈ R^l we have

lim_{ε→0} P{ sup_{0≤t≤T} |X^ε_t − X̄_t| > δ } = 0.

Proof. Divide [0, T] into intervals of length Δ = Δ(ε) = ε ln ε^{−1} and let (X̃^ε_t, Ỹ^ε_t) be the auxiliary process in which, on each interval [kΔ, (k+1)Δ), the slow variable entering the fast motion is frozen at its value at time kΔ. For any δ > 0 we have

P{ sup_{0≤t≤T} |X^ε_t − X̃^ε_t| > δ } → 0   (9.7)

as ε → 0, uniformly in x ∈ R^r, y ∈ R^l. Indeed, it follows from the definition of X^ε_t and X̃^ε_t that

P{ sup_{0≤t≤T} |X^ε_t − X̃^ε_t| > δ } ≤ P{ ∫_0^T |b(X^ε_s, Y^ε_s) − b(X^ε_{[s/Δ]Δ}, Y^ε_s)| ds > δ }.
Estimating the probability on the right side by means of Chebyshev's inequality and taking account of (9.4) and (9.6), we obtain (9.7). It is also easy to obtain from (9.4) and (9.6) that

sup_{0≤s≤T} M|X^ε_s − X̃^ε_s|² → 0   (9.8)

as ε → 0. Now we show that sup_{0≤t≤T} |X̃^ε_t − X̄_t| converges to zero in probability as ε → 0. The assertion of the theorem will obviously follow from this and (9.7).

First we note that it follows from the definition of Ỹ^ε_t that for s ∈ [0, Δ] the process Z^ε_s = Ỹ^ε_{kΔ+s} coincides in distribution with the process Y^{X̃_{kΔ}, Ỹ_{kΔ}}_{s/ε} defined by equation (9.2). We only have to choose the Wiener process w_t in (9.2) independent of X̃_{kΔ}, Ỹ_{kΔ}. Taking into account that ε^{−1}Δ(ε) → ∞, we obtain, relying on (9.3), that

M | ∫_{kΔ}^{(k+1)Δ} b(X̃_{kΔ}, Ỹ_t) dt − Δ b̄(X̃_{kΔ}) | = Δ M | (1/Δ) ∫_0^Δ [b(X̃_{kΔ}, Z^ε_s) − b̄(X̃_{kΔ})] ds | ≤ Δ κ(Δ/ε).

Using this estimate, we arrive at the relation

M{ sup_{0≤t≤T} | ∫_0^t b(X̃_{[s/Δ]Δ}, Ỹ_s) ds − ∫_0^t b̄(X̃_{[s/Δ]Δ}) ds | }
  ≤ Σ_{k=0}^{[T/Δ]} M | ∫_{kΔ}^{(k+1)Δ} [b(X̃_{kΔ}, Ỹ_s) − b̄(X̃_{kΔ})] ds | + c₇Δ
  ≤ c₇Δ + T κ(Δ/ε) → 0   (9.9)

as ε → 0, since Δ(ε) → 0 and κ(Δ(ε)/ε) → 0 as ε → 0.

We estimate m^ε(t) = M|X̃^ε_t − X̄_t|². It follows from the definitions of X̄_t and X̃^ε_t that

X̃^ε_t − X̄_t = ∫_0^t [b(X̃_{[s/Δ]Δ}, Ỹ_s) − b̄(X̃_s)] ds + ∫_0^t [b̄(X̃_s) − b̄(X̄_s)] ds + ∫_0^t [σ(X̃_s) − σ(X̄_s)] dw_s.

Squaring both sides and using some elementary inequalities and the
Lipschitz condition, we arrive at the relation

m^ε(t) ≤ c₈ t ∫_0^t m^ε(s) ds + c₉ ∫_0^t m^ε(s) ds + 3M sup_{0≤u≤T} | ∫_0^u b(X̃_{[s/Δ]Δ}, Ỹ_s) ds − ∫_0^u b̄(X̃_{[s/Δ]Δ}) ds |².

We obtain from this relation that

m^ε(t) ≤ 3 e^{(c₈T + c₉)T} M sup_{0≤u≤T} | ∫_0^u [b(X̃_{[s/Δ]Δ}, Ỹ_s) − b̄(X̃_{[s/Δ]Δ})] ds |²

for t ∈ [0, T]. This implies, by (9.9) and the boundedness of b, that m^ε(t) → 0. For δ > 0 we have the inequality
P{ sup_{0≤t≤T} |X̃^ε_t − X̄_t| > δ } ≤ ⋯,

and the right side tends to zero by (9.8), (9.9) and Chebyshev's inequality; this completes the proof of Theorem 9.1.

In the general case, when the diffusion coefficient of the slow motion depends on the fast variable as well, we assume that for any T > 0, x ∈ R^r and y ∈ R^l we have

max_{i,j} M | (1/T) ∫_t^{t+T} Σ_k σ^i_k(x, Y^{x,y}_s) σ^j_k(x, Y^{x,y}_s) ds − ā^{ij}(x) | ≤ κ(T),   (9.10)
where κ(T) → 0 as T → ∞ and Y^{x,y} is the solution of equation (9.2). Let X̄_t be the solution of the stochastic differential equation

dX̄_t = b̄(X̄_t) dt + σ̄(X̄_t) dw_t,   X̄_0 = x,

where σ̄(x) = (ā(x))^{1/2} and w_t is an r-dimensional Wiener process. It turns out that in this case X^ε_t converges to X̄_t. However, the convergence has to be understood in a somewhat weaker sense: namely, the measure corresponding to X^ε in the space of trajectories converges weakly to the measure corresponding to X̄ as ε ↓ 0. We shall not include the proof of this assertion here but rather refer the reader to Khas'minskii [6] for a proof. We note that in that work the averaging principle is proved under assumptions allowing some growth of the coefficients; conditions (9.3) and (9.10) are also replaced by less stringent ones.
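A minimal Euler-Maruyama sketch of this fast-slow structure (illustrative coefficients, not from the text): the slow variable satisfies dX = Y dt with σ = 0, and the fast variable is an Ornstein-Uhlenbeck process with the ε^{−1} drift and ε^{−1/2} noise of (9.1). Condition (9.3) then holds with b̄(x) = 1, so the averaged motion is X̄_t = x + t, and the sup-deviation of Theorem 9.1 should be small for small ε.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical coefficients for system (9.1): slow dX = Y dt (sigma = 0),
# fast dY = eps^{-1}(1 - Y) dt + eps^{-1/2} dW.  Here bar-b(x) = 1 and the
# averaged trajectory is X-bar_t = x + t.
eps = 1e-3
dt = 1e-5                 # time step well below the fast relaxation time eps
T = 1.0
x, y = 0.0, 1.0
sup_dev = 0.0
for k in range(int(T / dt)):
    x += y * dt
    y += (1.0 - y) / eps * dt + np.sqrt(dt / eps) * rng.standard_normal()
    sup_dev = max(sup_dev, abs(x - (k + 1) * dt))   # deviation from X-bar

print(sup_dev)            # the sup-deviation of Theorem 9.1, small here
```

The step dt must be taken much smaller than ε, since the Euler scheme has to resolve the fast relaxation.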
We consider an example. Let (r^ε_t, φ^ε_t) be the two-dimensional Markov process governed by the differential operator

L^ε = ½ ∂²/∂r² + b(r, φ) ∂/∂r + ε^{−1} B(r, φ) ∂/∂φ + ½ ε^{−1} C²(r, φ) ∂²/∂φ².

This process can also be described by the stochastic equations

ṙ^ε_t = b(r^ε_t, φ^ε_t) + ẇ¹_t,
φ̇^ε_t = ε^{−1} B(r^ε_t, φ^ε_t) + ε^{−1/2} C(r^ε_t, φ^ε_t) ẇ²_t.

We assume that the functions b(r, φ), B(r, φ) and C(r, φ) are 2π-periodic in φ and C(r, φ) ≥ c_0 > 0. In this case the process φ̃^{(r_0)}_t obtained from the process

φ^{(r_0)}_t = φ^{(r_0)}_0 + ∫_0^t B(r_0, φ^{(r_0)}_s) ds + ∫_0^t C(r_0, φ^{(r_0)}_s) dw_s

by identifying the values differing by an integral multiple of 2π is a nondegenerate Markov process on the circle. This process has a unique invariant measure on the circle with density m(r_0, φ), and there exist c, λ > 0 such that for any bounded measurable function f(φ) on the circle

| M_φ f(φ̃^{(r_0)}_t) − ∫_0^{2π} f(φ) m(r_0, φ) dφ | ≤ c e^{−λt} sup_φ |f(φ)|.

⋯ for every T > 0 and x ∈ M there exists a compactum F ⊂ M, x ∈ F, such that P_{x,y}{X^ε_t ∈ F for t ∈ [0, T]} = 1 for every
y ∈ E and ε > 0. Let α be an element of the dual T*M_x of TM_x. We introduce the differential operator R = R(x, z, α) acting on functions f(y), y ∈ E, according to the formula

R(x, z, α) f(y) = L f(y) + (B(z, y), ∇f(y)) + (α, b(x, y)) f(y);

x, z ∈ M and α ∈ T*M_x are parameters. For all values of the parameters, R(x, z, α) is an elliptic differential operator in the space of functions defined on E. It is the restriction, to smooth functions, of the infinitesimal generator of a positive semigroup. Analogously to §4, we can deduce from this that R(x, z, α) has a simple eigenvalue μ(x, z, α) with largest real part. This eigenvalue is real and, by virtue of its simplicity, it is differentiable with respect to the parameters x, z, α. We introduce the diffusion process Y^z_t, z ∈ M, on E, governed by the operator

L̃^z = L + (B(z, y), ∇_y).

Lemma 9.1. Let α ∈ T*M_x and let F be a compactum in M. The limit

lim_{T→∞} (1/T) ln M_y exp{ ∫_0^T (α, b(x, Y^z_s)) ds } = μ(x, z, α)

exists uniformly in x, z ∈ F and y ∈ E. The function μ(x, z, α) is convex downward in the variables α.

Proof. Let us write V(x, y, α) = (α, b(x, y)). The family of operators T^z_t acting in the space of bounded measurable functions on E according to the formula

T^z_t f(y) = M_y [ f(Y^z_t) exp{ ∫_0^t V(x, Y^z_s, α) ds } ]

forms a positive semigroup. The assertion of Lemma 9.1 can be derived from this analogously to the proof of Theorem 4.2. Let us denote by L(x, z, β) (x, z ∈ M, β ∈ TM_x) the Legendre transform of the function μ(x, z, α) with respect to the last argument:

L(x, z, β) = sup_α [(α, β) − μ(x, z, α)].
We shall sometimes consider L(x, z, β) with coinciding first two arguments. We write L(x, x, β) = L(x, β). This function is obviously the Legendre transform of μ(x, x, α). We note that L(x, z, β) is lower semicontinuous.

Theorem 9.2 (Freidlin [11]). Let X^ε_t be the first component of the Markov process Z^ε_t governed by the operator 𝓛^ε on M × E. Let us put

S_{0T}(φ) = ∫_0^T L(φ_s, φ̇_s) ds

for absolutely continuous functions φ ∈ C_{0T}(M); for the remaining φ ∈ C_{0T}(M) we set S_{0T}(φ) = +∞. The functional ε^{−1} S_{0T}(φ) is the action functional for the family of processes X^ε_t, t ∈ [0, T], in C_{0T}(M) as ε ↓ 0.

Proof. Together with the process Z^ε_t = (X^ε_t, Y^ε_t), we consider the process Z̃^ε_t = (X̃^ε_t, Ỹ^ε_t), in which the drift B is removed from the fast motion:

X̃̇^ε_t = b(X̃^ε_t, Ỹ^ε_t),   Ỹ̇^ε_t = ε^{−1} g(Ỹ^ε_t) + ε^{−1/2} C(Ỹ^ε_t) ẇ_t.

We write e(x, y) = C^{−1}(y) B(x, y). The process Z̃^ε_t differs from Z^ε_t by a change of the drift vector in the variables in which there is a nondegenerate diffusion, so that the measures corresponding to these processes in the space of trajectories are absolutely continuous with respect to each other. Taking account of this observation, we obtain for any function φ: [0, T] → M and δ > 0 that

P_{x,y}{ρ_{0T}(X^ε, φ) < δ} = M_{x,y}{ ρ_{0T}(X̃^ε, φ) < δ; exp{ ε^{−1/2} ∫_0^T (e(X̃^ε_s, Ỹ^ε_s), dw_s) − (2ε)^{−1} ∫_0^T |e(X̃^ε_s, Ỹ^ε_s)|² ds } }.   (9.13)

Let ψ^{(n)}: [0, T] → M be a step function such that ρ_{0T}(φ, ψ^{(n)}) < 1/n. For any γ, C > 0 we have

P{ ε^{−1/2} | ∫_0^T (e(X̃^ε_s, Ỹ^ε_s), dw_s) − ∫_0^T (e(ψ^{(n)}_s, Ỹ^ε_s), dw_s) | > γ/(4ε); ρ_{0T}(X̃^ε, φ) < δ } ≤ exp{−Cε^{−1}}

for sufficiently small δ and 1/n. This estimate can be verified by means of the exponential Chebyshev inequality. We obtain from this and (9.13) that for
any φ ∈ C_{0T}(M) and γ > 0 we have

M_{x,y}{ ρ_{0T}(X̃^ε, φ) < δ; exp{ ε^{−1/2} ∫_0^T (e(ψ^{(n)}_s, Ỹ^ε_s), dw_s) − (2ε)^{−1} ∫_0^T |e(ψ^{(n)}_s, Ỹ^ε_s)|² ds − γ/(3ε) } }
  ≤ P_{x,y}{ρ_{0T}(X^ε, φ) < δ}
  ≤ M_{x,y}{ ρ_{0T}(X̃^ε, φ) < δ; exp{ ε^{−1/2} ∫_0^T (e(ψ^{(n)}_s, Ỹ^ε_s), dw_s) − (2ε)^{−1} ∫_0^T |e(ψ^{(n)}_s, Ỹ^ε_s)|² ds + γ/(3ε) } }   (9.14)

for sufficiently small δ and 1/n. We introduce still another process, Ẑ^ε_t = (X̂^ε_t, Ŷ^ε_t), which is defined by the stochastic equations

X̂̇^ε_t = b(X̂^ε_t, Ŷ^ε_t),
Ŷ̇^ε_t = ε^{−1} g(Ŷ^ε_t) + ε^{−1} B(ψ^{(n)}_t, Ŷ^ε_t) + ε^{−1/2} C(Ŷ^ε_t) ẇ_t

in coordinate form. Taking account of the absolute continuity of the measures corresponding to Z̃^ε_t and Ẑ^ε_t, it follows from inequality (9.14) that

e^{−γ/(2ε)} P̂_{x,y}{ρ_{0T}(X̂^ε, φ) < δ} ≤ P_{x,y}{ρ_{0T}(X^ε, φ) < δ} ≤ e^{γ/(2ε)} P̂_{x,y}{ρ_{0T}(X̂^ε, φ) < δ}   (9.15)

for sufficiently small δ and 1/n.
Let t_1 < t_2 < ⋯

⋯ t > 0, x ∈ R^r;   (9.21)

v(0, x) = f(x),

where b̄^i(x) = ½ ∫_{−1}^{1} b^i(x, y) dy. Therefore, the convergence of X^ε to the trajectories of system (9.17) implies the convergence of the solution of problem (9.20) to the solution of problem (9.21) as ε ↓ 0. Relation (9.18) enables us to estimate the rate of this convergence. As we have seen in Chapters 4 and 6, in stationary problems with a small parameter, large deviations may determine the principal term of the asymptotics,
not only the terms converging to zero with ε. We consider the stationary problem corresponding to equation (9.20). For the sake of brevity, we shall assume that r = 1:

L^ε w^ε(x, y) = (1/(2ε)) ∂²w^ε/∂y² + b(x, y) ∂w^ε/∂x = 0,   x ∈ (−1, 1),   y ∈ (−1, 1),

∂w^ε/∂y (x, y)|_{y=±1} = 0,   w^ε(1, y)|_{y∈Γ_+} = ψ(1, y),   w^ε(−1, y)|_{y∈Γ_−} = ψ(−1, y),   (9.22)

Γ_+ = {y ∈ [−1, 1]: b(1, y) > 0},   Γ_− = {y ∈ [−1, 1]: b(−1, y) < 0}.

⋯ Assume that b̄(x) > 0 for x ∈ [−1, 1]. If we assume in
addition that b(1, y) ≥ 0 for y ∈ [−1, 1], then it is easy to prove that

lim_{ε↓0} w^ε(x, y) = ( ∫_{−1}^{1} ψ(1, y) b(1, y) dy ) / ( ∫_{−1}^{1} b(1, y) dy ).

If we do not assume that b(1, y) > 0 for y ∈ [−1, 1], then the situation becomes much more complicated. Concerning this, cf. Sarafyan, Safaryan, and Freidlin [1]. This work also discusses the case where the trajectories of the averaged motion do not leave (−1, 1).
Chapter 8
Random Perturbations of Hamiltonian Systems
§1. Introduction

Consider the dynamical system in R^r defined by a smooth vector field b(x):

ẋ_t = b(x_t),   x_0 = x ∈ R^r.   (1.1)

In Chapters 2-6 we have considered small random perturbations of system (1.1) described by the equation

X̃̇^ε_t = b(X̃^ε_t) + ε w̃̇_t,   X̃^ε_0 = x,   (1.2)

where w̃_t is a Wiener process, and ε a small parameter (we reserve the notation without ~ for another Wiener process and another stochastic equation).

The long-time behavior of the process X̃^ε_t was described in Chapter 6 in terms that presupposed identification of points x, y of the space that are equivalent under the equivalence relation that was introduced in Section 1 of that chapter. But for some dynamical systems all points of the phase space are equivalent, e.g., if the trajectories of the dynamical system are of the form shown in Fig. 10. An important class of such systems is provided by Hamiltonian systems.

A dynamical system (1.1) in R^{2n} is called a Hamiltonian system if there exists a smooth function H(x), x ∈ R^{2n}, such that

b(x) = ∇̄H(x) = ( ∂H(p,q)/∂q_1, …, ∂H(p,q)/∂q_n, −∂H(p,q)/∂p_1, …, −∂H(p,q)/∂p_n ),

x = (p, q) = (p_1, …, p_n; q_1, …, q_n) ∈ R^{2n}.

The function H(p, q) is called the Hamiltonian, and n is the number of degrees of freedom. It is well known that H(x) is an integral of motion: H(x_t) = H(p_t, q_t) = H(p_0, q_0) is a constant; and the flow (p_t, q_t) preserves the Euclidean volume in R^{2n}. The invariant measure concentrated on a trajectory of the dynamical system is proportional to dl/|b(x)|, where dl is the length along the trajectory.
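The conservation of H along the flow is easy to observe numerically. The sketch below is illustrative: the double-well Hamiltonian is an assumption, chosen to produce a figure-eight separatrix of the kind discussed below, and a fourth-order Runge-Kutta step stands in for the exact flow.

```python
import numpy as np

# Illustrative one-degree-of-freedom Hamiltonian (an assumption):
# H(p, q) = p^2/2 + q^4/4 - q^2/2, a double well with a saddle at the origin.
def H(x):
    p, q = x
    return 0.5 * p**2 + 0.25 * q**4 - 0.5 * q**2

def b(x):
    # Hamiltonian vector field (standard convention): p' = -dH/dq, q' = dH/dp
    p, q = x
    return np.array([-(q**3 - q), p])

def rk4_step(x, h):
    k1 = b(x); k2 = b(x + 0.5*h*k1); k3 = b(x + 0.5*h*k2); k4 = b(x + h*k3)
    return x + (h / 6.0) * (k1 + 2*k2 + 2*k3 + k4)

x = np.array([0.3, 1.2])      # initial point (p0, q0)
h0 = H(x)
for _ in range(20000):        # integrate up to t = 20 with step 1e-3
    x = rk4_step(x, 1e-3)

print(abs(H(x) - h0))         # tiny: H is an integral of motion
```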
Figure 18.
We consider Hamiltonian systems with one degree of freedom:

ẋ_t(x) = ∇̄H(x_t(x)),   x_0(x) = x ∈ R².   (1.3)

We assume that H(x) is smooth enough and lim_{|x|→∞} H(x) = +∞. A typical example is shown in Fig. 18(a). The trajectories of the corresponding system are shown in Fig. 18(b). These trajectories consist of five families of periodic orbits and separatrices dividing these families: three families of closed trajectories encircling exactly one of the stable equilibrium points x₃, x₄, or x₅ (the regions covered by the trajectories are denoted by D₃, D₄, and D₅, respectively); the family of closed trajectories encircling x₃ and x₄ (they cover the region D₂); and the family of closed trajectories encircling all equilibrium points (the orbits of this family cover the region D₁). These families are separated by two ∞-shaped curves with crossing points x₁ and x₂. Each ∞-shaped curve consists of an equilibrium point (x₁ or x₂) and two trajectories approaching the equilibrium point as t → ±∞.
285
fl. Introduction
Figure 19.
Note that if the Hamiltonian has the shape shown in Fig. 19(a), the self-intersecting separatrix may look like that in Fig. 19(d). Still, in this case we loosely call it an ∞-shaped curve as well. Consider the perturbed system corresponding to (1.3):
Ẋ̃_t^ε = ∇̄H(X̃_t^ε) + ε ẇ_t,  X̃_0^ε = x ∈ R².  (1.4)
Here w_t is a two-dimensional Wiener process, and ε a small parameter. The motion X̃_t^ε, roughly speaking, consists of fast rotation along the unperturbed trajectories and slow motion across them; so this is a situation where the averaging principle can be expected to hold (see Chapter 7). To study the slow motion across the deterministic orbits it is convenient to rescale time. Consider, along with X̃_t^ε, the process X_t^ε described by
Ẋ_t^ε = ε^{−2} ∇̄H(X_t^ε) + ẇ_t,  X_0^ε = x ∈ R².  (1.5)
where w_t is a two-dimensional Wiener process. It is easy to see that the process X_t^ε coincides, in the sense of probability distributions, with X̃^ε_{t/ε²}. We denote by P_x^ε the probabilities evaluated under the assumption that X̃_0^ε = x or X_0^ε = x. The diffusions X̃_t^ε, X_t^ε defined by (1.4) and (1.5) have generating differential operators

L̃^ε = b(x)·∇ + (ε²/2) Δ,  L^ε = ε^{−2} b(x)·∇ + ½ Δ,

where b(x) = ∇̄H(x), and the Lebesgue measure is invariant for these processes, as it is for system (1.3). (The same is true for other forms of perturbations of the system (1.3) that lead to diffusions with generators

L̃^ε = b(x)·∇ + ε² L_0,  L^ε = ε^{−2} b(x)·∇ + L_0,

where L_0 is a self-adjoint elliptic operator; but for simplicity we stick to the case L_0 = ½Δ.)

If the diffusions X̃_t^ε, X_t^ε are moving in a region covered by closed trajectories x_t(x) (in one of the regions D_k in Fig. 18(b)), the results of Khas'minskii [6] can be applied, and the averaging principle holds: for small ε the motion in the direction of these trajectories is approximately the same as the nonrandom motion along the same closed trajectory, and the process makes very many rotations before it moves to another trajectory at a significant distance from the initial one; the motion from one trajectory to another ("across" the trajectories) is approximately a diffusion whose characteristics at some point (trajectory) are obtained by averaging with respect to the invariant measure concentrated on this closed trajectory. This diffusion is "slow" in the case of the process X̃_t^ε, but it has a "natural" time scale in the case of X_t^ε. As for the characteristics of this diffusion on the trajectories, let us use the function H as the coordinate of a trajectory (H can take the same value on different trajectories, so it is only a local coordinate). Let us apply Itô's formula to H(X_t^ε):
dH(X_t^ε) = ∇H(X_t^ε)·ε^{−2} ∇̄H(X_t^ε) dt + ∇H(X_t^ε)·dw_t + ½ ΔH(X_t^ε) dt.

The dot product ∇H·b = 0 since b(x) = ∇̄H(x), so

H(X_t^ε) = H(X_0^ε) + ∫_0^t ∇H(X_s^ε)·dw_s + ∫_0^t ½ ΔH(X_s^ε) ds.  (1.6)

Since X_t^ε rotates many times along the trajectories of the Hamiltonian system before H(X_t^ε) changes considerably, the integral of ½ ΔH(X_s^ε) is
approximately equal to the integral ∫_0^t B(H(X_s^ε)) ds, where
B(H) = [∮ (½ ΔH(x)/|b(x)|) dl] / [∮ (1/|b(x)|) dl],  (1.7)
the integrals being taken over the connected component of the level curve {x : H(x) = H} lying in the region considered. As for the stochastic integral in (1.6), it is well known that it can be represented as
∫_0^t ∇H(X_s^ε)·dw_s = W(∫_0^t |∇H(X_s^ε)|² ds),  (1.8)
where W(·) is a one-dimensional Wiener process, W(0) = 0 (see, e.g., Freidlin [15], Chapter 1). The argument of this Wiener process is approximately ∫_0^t A(H(X_s^ε)) ds, where
A(H) = [∮ (|∇H(x)|²/|b(x)|) dl] / [∮ (1/|b(x)|) dl],  (1.9)
the integrals being taken over the same curve as in (1.7) (we may mention that |b(x)| = |∇H(x)|, so the integrand in the upper integral is equal to |∇H(x)|). This suggests that the "slow" process on the trajectories is, for small ε, approximately the same as the diffusion corresponding to the differential operator

Lf(H) = ½ A(H) f″(H) + B(H) f′(H).

This describes the "slow" process in the space obtained by identifying all points on the same trajectory of (1.3); but only while the process X_t^ε is moving in a region covered by closed trajectories. But this process can move from one region covered by closed trajectories to another, and such regions are separated by the components of level curves {x : H(x) = H} that are not closed trajectories (see Fig. 18(b)). If we identify all points belonging to the same component of a level curve
{x : H(x) = H}, we obtain a graph consisting of several segments, corresponding: I_1, to the trajectories in the domain D_1 outside the outer ∞-curve; I_2, to the trajectories in D_2 between the outer and the inner ∞-curve; I_3 and I_4, to the trajectories inside the two loops of the inner ∞-curve (domains D_3 and D_4); and I_5, to those inside the right loop of the outer ∞-curve (domain D_5) (see Fig. 18). The ends of the segments are vertices O_1 and O_2 corresponding to the ∞-curves, and O_3, O_4, O_5 corresponding to the extrema x_3, x_4, x_5 (it happens that the curves corresponding to O_1, O_2 each contain one critical point of H, a
saddle point). Let us complement our graph by a vertex O_∞, the end of the segment I_1 that corresponds to the point at infinity. Let us denote the graph thus obtained by Γ.
Let Y(x) be the identification mapping ascribing to each point x ∈ R² the corresponding point of the graph. We denote the function H carried over to the graph Γ under this mapping also by H (we take H(O_∞) = +∞). The function H can be taken as the local coordinate on this graph. Couples (i, H), where i is the number of the segment I_i, define global coordinates on the graph. Several such couples may correspond to a vertex. The graph has the structure of a tree in the case of a system in R²; but for a system on a different manifold it may have loops.
Note that the function i(x), x ∈ R², the number of the segment of the graph Γ containing the point Y(x), is preserved along each trajectory of the unperturbed dynamical system: i(x_t) = i(x) for every t. This means that i(x) is a first integral of the system (1.1), a discrete one. If the Hamiltonian has more than one critical point, then there are points x, y ∈ R² such that H(x) = H(y) and i(x) ≠ i(y). In this case the system (1.1) has two independent first integrals, H(x) and i(x). This is, actually, the reason why H(X_t^ε) does not converge to a Markov process as ε → 0 in the case of several critical points. If there is more than one first integral, we have to take a larger phase space, including all first integrals as coordinates in it. In the present case, these are the two coordinates (i, H). The couple (i(X_t^ε), H(X_t^ε)) = Y(X_t^ε) is a stochastic process on the graph Γ, and it is reasonable to expect that as ε → 0, this process converges in a sense to some diffusion process Y_t on the graph Γ. The problem about diffusions on graphs arising as limits of fast motions of dynamical systems was first proposed in Burdzeiko, Ignatov, Khas'minskii, Shakhgil'dyan [1]. Some results concerning random perturbations of Hamiltonian systems with one degree of freedom leading to random processes on graphs were considered by G. Wolansky [1], [2]. We present here the results of Freidlin and Wentzell [2]. Some other asymptotic problems for diffusion processes in which the limiting process is a diffusion process on a graph were considered in Freidlin and Wentzell [3]. What can we say about the limiting process Y_t? First of all, with each segment I_i of the graph we can associate a differential operator acting on functions on this segment:
L_i f(H) = ½ A_i(H) f″(H) + B_i(H) f′(H);  (1.10)
A_i(H), B_i(H) are defined by formulas (1.9) and (1.7) with the integrals taken over the connected component of the level curve {x : H(x) = H} lying in the region D_i corresponding to I_i. These differential operators determine the behavior of the limiting process Y_t on Γ, but only as long as it moves inside one segment of the graph. What can we say about the process after it leaves the segment of the graph where it started?
The problem naturally arises of describing the class of diffusions on a graph that are governed by the given differential operators while inside its segments. This problem was considered by W. Feller [1] for one segment. It turned out that in order to determine the behavior of the process after it goes out of the interior of the segment, some boundary conditions must be given, but only for those ends of the segment that are accessible from the inside. Criteria of accessibility of an end from the inside, and also of reaching the inside from an end, were given. One of them: if the integral
∫ exp{ −∫_{H_0}^{H} [2B(z)/A(z)] dz } dH  (1.11)
diverges at the end H_k, then H_k is not accessible from the inside. These criteria, and also the boundary conditions, are formulated in a simpler way if we represent the differential operator L as a generalized second derivative (d/dv)((d/du)f) with respect to two increasing functions u(H), v(H) (see Feller [2] and Mandl [1]). For example, the condition of the integral (1.11) diverging at H_k is replaced by unboundedness of u(H) near H_k (in fact, (1.11) is an expression for u(H)); the end H_k is accessible from the inside and the inside is accessible from H_k if and only if u(H), v(H) are bounded at H_k. Another condition for inaccessibility that we need: if ∫ v(H) du(H) diverges at H_k, then the end H_k is not accessible from the inside. Representing the differential operator in the form of a generalized second derivative is especially convenient in our case, because the operators (1.10) degenerate at the ends of the segment I_i: their coefficients A_i(H), B_i(H) given by formulas (1.9) and (1.7) have finite limits, but A_i(H) → 0 at these ends (in the case of nondegenerate critical points, at an inverse logarithmic rate at an end corresponding to a level curve that contains a saddle point, and linearly at an end corresponding to an extremum). Feller's result can be carried over to diffusions on graphs: if some segments I_i meet at a vertex O_k (which we write as I_i ~ O_k) and O_k is accessible from the inside of at least one segment, then some "interior boundary" conditions (or "gluing" conditions) have to be prescribed at O_k. (If the vertex O_k is inaccessible from any of the segments, no condition has to be given.) If for all segments I_i ~ O_k the end of I_i corresponding to O_k is accessible from the inside of I_i, and the inside of I_i can be reached from this end, then the general interior boundary condition can be written in the form
a_k Lf(O_k) = Σ_{i: I_i ~ O_k} (±β_{ki}) (df/du_i)(O_k),  (1.12)
where Lf(O_k) is the common limit at O_k of the functions L_i f defined by (1.10) for all segments I_i ~ O_k; u_i is the function on I_i used in the representation L_i = (d/dv_i)(d/du_i); a_k ≥ 0, β_{ki} ≥ 0, and β_{ki} is taken with + if the function u_i has its minimum at O_k, and with − if it has its maximum there; and a_k + Σ_{i: I_i ~ O_k} β_{ki} > 0 (otherwise the condition (1.12) is reduced to
0 = 0). The coefficient a_k is not zero if and only if the process spends a positive time at the point O_k. (Theorem 2.1 of the next section is formulated only in the case a_k = 0, and only as a theorem about the existence and uniqueness of a diffusion corresponding to given gluing conditions; in Freidlin and Wentzell [3] the formulation is more extensive.) Before considering our problem in a more concrete way, let us introduce some notations.
The indicator function of a set A is denoted by χ_A; the supremum norm in the function space, by ‖·‖; the closure of a set A, by Ā; D_i denotes the set of all points x ∈ R² such that Y(x) belongs to the interior of the segment I_i;

C_k = {x : Y(x) = O_k};  C_{ki} = C_k ∩ ∂D_i;

for H being one of the values of the function H(x),

C(H) = {x : H(x) = H};

for H being one of the values of the function H(x) on D_i,

C_i(H) = {x ∈ D_i : H(x) = H};

for two such numbers H_1 < H_2,

D_i(H_1, H_2) = D_i(H_2, H_1) = {x ∈ D_i : H_1 < H(x) < H_2};

and for δ > 0,

C_{ki}(δ) = {x ∈ D_i : H(x) = H(O_k) ± δ}

(the sets C_{ki}(δ) are the connected components of the boundary of the region D_k(±δ)). If D with some subscripts and the like denotes a region in R², then τ^ε with the same subscripts and the like denotes the first time at which the process X_t^ε leaves the region; for example, τ_k^ε(±δ) = min{t : X_t^ε ∉ D_k(±δ)}; and τ̃^ε with the same subscripts and the like denotes the corresponding time for the process X̃_t^ε.

The pictures of the domains D_k(±δ) and D_i(H(O_k) ± δ, H(O_k) ± δ′) are different for a vertex corresponding to an extremum point x_k and for one corresponding to a level curve containing a saddle point x_k; they are shown in Figs. 20 and 21.
In the notations we introduced, the coefficients A_i(H), B_i(H) can be written as

A_i(H) = [∮_{C_i(H)} (|∇H(x)|²/|b(x)|) dl] / [∮_{C_i(H)} (1/|b(x)|) dl],

B_i(H) = [∮_{C_i(H)} (½ ΔH(x)/|b(x)|) dl] / [∮_{C_i(H)} (1/|b(x)|) dl].  (1.13)
Figure 20. Case of xk being an extremum.
Figure 21. Case of xk being a saddle point.
Integrals of the kind used in (1.13) can be expressed in another form, and differentiated with respect to the variable H, using the following lemma.

Lemma 1.1. Let f be a function that is continuously differentiable in the closed region D_i(H_1, H_2). Then for H, H_0 ∈ [H_1, H_2],

∮_{C_i(H)} f(x) |∇H(x)| dl = ∮_{C_i(H_0)} f(x) |∇H(x)| dl ± ∫∫_{D_i(H_0,H)} [∇f(x)·∇H(x) + f(x) ΔH(x)] dx,

where the sign + or − is taken according to whether the gradient ∇H(x) at C_i(H) is pointing outside the region D_i(H_0, H) or inside it; and

(d/dH) ∮_{C_i(H)} f(x) |∇H(x)| dl = ∮_{C_i(H)} [(∇f(x)·∇H(x) + f(x) ΔH(x)) / |∇H(x)|] dl.
The integral in the denominator in (1.13) is handled by means of Lemma 1.1 with f(x) = 1/|∇H(x)|²; those in the numerators, with f(x) ≡ 1 and with f(x) = ΔH(x)/|∇H(x)|².

By Lemma 1.1, the coefficients A_i(H), B_i(H) are continuously differentiable at the interior points of the interval H(I_i) at least k − 1 times if H(x) is continuously differentiable k times. Let us consider the behavior of these coefficients as H approaches the ends of the interval H(I_i), restricting ourselves to the case of the Hamiltonian having only nondegenerate critical points (i.e., with nondegenerate matrix (∂²H/∂x^i ∂x^j) of second derivatives). As H approaches an end of the interval H(I_i) corresponding to a nondegenerate extremum x_k of the Hamiltonian, the integral ∮_{C_i(H)} (1/|∇H|) dl, which is equal to the period of the orbit C_i(H), converges to
T_k = 2π [ (∂²H/∂p²)(x_k) (∂²H/∂q²)(x_k) − ((∂²H/∂p∂q)(x_k))² ]^{−1/2} > 0;

∮_{C_i(H)} |∇H| dl ~ const · (H − H(O_k)) → 0;

and

∮_{C_i(H)} (ΔH/|∇H|) dl → T_k ΔH(x_k).
So A_i(H) → 0 as H → H(O_k) = H(x_k), and B_i(H) → ½ ΔH(x_k) (positive if x_k is a minimum, and negative in the case of a maximum). If H approaches H(O_k), where O_k corresponds to a level curve containing a nondegenerate saddle point, we have

∮_{C_i(H)} (1/|∇H|) dl ~ const · |ln|H − H(O_k)|| → ∞;  ∮_{C_i(H)} |∇H| dl → ∮_{C_{ki}} |∇H| dl > 0.

The coefficient A_i(H) → 0 (at an inverse logarithmic rate); and it can be shown that in this case too B_i(H) → ½ ΔH(x_k) (which can be positive, negative, or zero). One can obtain some accurate estimates of the derivatives A_i′(H), B_i′(H) as H approaches the ends of H(I_i) corresponding to critical points of the Hamiltonian, but we do not need such estimates. What we need, and what is easily obtained from Lemma 1.1, is that
|A_i′(H)|, |B_i′(H)| ≤ A_0 < ∞.  (1.14)

[...] for every κ > 0 there exists δ′, 0 < δ′ < δ, such that the probabilities P_x^ε{X^ε_{τ_k^ε(±δ)} ∈ C_{ki}(δ)} almost do not depend on x if it changes on ⋃_i C_{ki}(δ′) (the beginning of the proof of Lemma 3.6). (5) To find the coefficients β_{ki} we use the fact that the invariant measure μ of the process X_t^ε is the Lebesgue measure, so the limiting process for Y(X_t^ε) must have the invariant measure μ ∘ Y^{−1}. One can find a discussion of some generalizations of the problem described in this section in Section 7. In particular, there we consider briefly perturbations of systems with conservation laws. Perturbations of Hamiltonian systems on tori may lead to processes on graphs that spend a positive time at some vertices. Degenerate perturbations of Hamiltonian systems, in particular, perturbations of a nonlinear oscillator with one degree of freedom, are also considered in Section 7. In addition, we briefly consider there the multidimensional case and the case of several conservation laws.
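The accessibility criterion (1.11) can be checked in closed form in the simplest situation. The sketch below is an added illustration under explicit assumptions: it takes A(H) = 2H, B(H) = 1, the averaged coefficients that arise for the hypothetical harmonic oscillator H = (p² + q²)/2 near its minimum at H = 0. Then 2B(z)/A(z) = 1/z, the inner integral in (1.11) is log(y/H_0), the integrand is H_0/y, and the closed form is u(H) = H_0 log(H/H_0), which is unbounded as H → 0: the extremum end of the segment is inaccessible from the inside, as stated above.

```python
import math

def u(H, H0=1.0, n=100_000):
    """Midpoint-rule evaluation of the scale function (1.11) from H0 to H,
    for the assumed coefficients A(H) = 2H, B(H) = 1."""
    total, step = 0.0, (H - H0) / n
    for k in range(n):
        y = H0 + (k + 0.5) * step
        total += math.exp(-math.log(y / H0)) * step   # integrand = H0 / y
    return total

for H in (0.5, 0.1, 0.01):
    assert abs(u(H) - math.log(H)) < 1e-3   # with H0 = 1: u(H) = log(H)
# u(H) -> -infinity as H -> 0+, so u is unbounded near the extremum end.
```

The same computation with a saddle-point end, where A vanishes only at an inverse logarithmic rate, would give a bounded u, which is why gluing conditions are needed there but not at extrema.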
§2. Main Results

Theorem 2.1. Let Γ be a graph consisting of closed segments I_1, ..., I_N and vertices O_1, ..., O_M. Let a coordinate be defined in the interior of each segment I_i; let u_i(y), v_i(y), for every segment I_i, be two functions on its interior that increase (strictly) as the coordinate increases; and let u_i be continuous. Suppose that the vertices are divided into two classes: interior vertices, for which lim_{y→O_k} u_i(y), lim_{y→O_k} v_i(y) are finite for all segments I_i meeting at O_k (notation: I_i ~ O_k); and exterior vertices, such that only one segment I_i enters O_k, and ∫ (c + v_i(y)) du_i(y) diverges at the end O_k for some constant c. For each interior vertex O_k, let β_{ki} be nonnegative constants defined for i such that I_i ~ O_k, with Σ_{i: I_i ~ O_k} β_{ki} > 0. Consider the set D(A) ⊂ C(Γ) consisting of all functions f such that: f has a continuous generalized derivative (d/dv_i)(d/du_i)f in the interior of each segment I_i; finite limits lim_{y→O_k} (d/dv_i)(d/du_i)f(y) exist at every vertex O_k, and they do not depend on the segment I_i ~ O_k; and for each interior vertex O_k,
Σ′_{i: I_i ~ O_k} β_{ki} lim_{y→O_k} (df/du_i)(y) − Σ″_{i: I_i ~ O_k} β_{ki} lim_{y→O_k} (df/du_i)(y) = 0,  (2.1)
where the sum Σ′ contains all i such that the coordinate on the ith segment has a minimum at O_k, and Σ″, those for which it has a maximum.

Define the operator A with domain of definition D(A) by Af(y) = (d/dv_i)(d/du_i)f(y) in the interior of every segment I_i, and at the vertices, as the limit of this expression.

Then there exists a strong Markov process (y_t, P_y) on Γ with continuous trajectories whose infinitesimal operator is A. If we take the space C[0, ∞) of all continuous functions on [0, ∞) with values in Γ as the sample space for this process, with y_t being the value of a function of this space at the point t, such a process is unique. If O_k is an exterior vertex, and y ≠ O_k, then with P_y-probability 1 the process never reaches O_k.

The proof can be carried out similarly to that of the corresponding result for diffusions on an interval. First we verify the fulfillment of the conditions of the Hille-Yosida theorem; then existence of a continuous-path version of the Markov process is proved; and so on: see Mandl [1] or the original papers by Feller [1], [2]. Diffusion processes on graphs have been considered in some papers, in particular, in Baxter and Chacon [1]. The corresponding Theorem 3.1 in Freidlin and Wentzell [3] allows considering only interior vertices.
In the situation of a graph associated with a Hamiltonian system, the vertices corresponding to level curves of the Hamiltonian containing saddle points are interior vertices, and those corresponding to extrema and to the point at infinity, exterior.
Theorem 2.2. Let the Hamiltonian H(x), x ∈ R², be four times continuously differentiable with bounded second derivatives; H(x) ≥ A_1|x|², |∇H(x)| ≥ A_2|x|, ΔH(x) ≥ A_3 for sufficiently large |x|, where A_1, A_2, A_3 are positive constants. Let H(x) have a finite number of critical points x_1, ..., x_N, at which the matrix of second derivatives is nondegenerate. Let every level curve C_k (see the notations in Section 1) contain only one critical point x_k. Let (X_t^ε, P_x^ε) be the diffusion process on R² corresponding to the differential operator L^ε f(x) = ½ Δf(x) + ε^{−2} ∇̄H(x)·∇f(x). Then the distribution of the process Y(X_t^ε) in the space of continuous functions on [0, ∞) with values in Y(R²) (⊂ Γ) with respect to P_x^ε converges weakly to the probability measure P_{Y(x)}, where (y_t, P_y) is the process on the graph whose existence is stated in Theorem 2.1, corresponding to the functions u_i, v_i defined by formulas (1.15) and (1.16), and to the coefficients β_{ki} given by

β_{ki} = ∮_{C_{ki}} |∇H(x)| dl.  (2.2)
The proof is given in Sections 3-6; now we give an application to partial differential equations.
Let G be a bounded region in R² with smooth boundary ∂G. Consider the Dirichlet problem

L^ε f^ε(x) = ½ Δf^ε(x) + ε^{−2} ∇̄H(x)·∇f^ε(x) = g(x), x ∈ G;  f^ε(x) = ψ(x), x ∈ ∂G.  (2.3)

Here H(x) is the same as in Theorem 2.2, ψ(x) and g(x) are continuous functions on ∂G and on G ∪ ∂G, respectively, and ε is a small parameter. It is well known that the behavior of f^ε(x) as ε → 0 depends on the behavior of the trajectories of the dynamical system ẋ_t = ∇̄H(x_t): If the trajectory x_t(x), t ≥ 0, starting at x_0(x) = x hits the boundary ∂G at a point z(x) ∈ ∂G, and ∇̄H(z(x))·n(z(x)) ≠ 0 (n(z(x)) is the outward normal vector to ∂G at the point z(x)), then f^ε(x) → ψ(z(x)) as ε → 0 (see, e.g., Freidlin [15]; no integral of the function g is involved in this formula because, in the case we are considering, the trajectory X_t^ε leaves the region G in a time of order ε²). If x_t(x) does not leave the region G, the situation is more complicated.
Let Ĝ be the subset of G covered by the orbits C_i(H) that belong entirely to G: Ĝ = ⋃_{i,H: C_i(H) ⊂ G} C_i(H); and let Γ_G be the subset of the graph Γ that corresponds to Ĝ ⊂ R²: Γ_G = Y(Ĝ) = {(i, H) : C_i(H) ⊂ G}.

(i) Assume that the set Γ_G is connected (otherwise one should consider its connected components separately). Suppose that the boundary of Γ_G consists of the points (i_k, H_k), k = 1, ..., l. Each of the curves C_{i_k}(H_k), (i_k, H_k) ∈ ∂Γ_G, has a nonempty intersection with ∂G.

(ii) Assume that for each (i_k, H_k) ∈ ∂Γ_G the intersection C_{i_k}(H_k) ∩ ∂G consists of exactly one point z_k. This should be considered as the main case. Later we discuss the case of C_{i_k}(H_k) ∩ ∂G consisting of more than one point.

(iii) Assume that ∂Γ_G contains no vertices.

Let Λ be the set of all vertices that belong to Γ_G. Let Λ = Λ_1 ∪ Λ_2, where Λ_1 consists of interior vertices, and Λ_2, of exterior vertices.
Theorem 2.3. If for a point x ∈ G the trajectory x_t(x), t ≥ 0, hits the boundary ∂G at a point z(x) ∈ ∂G, and ∇̄H(z(x))·n(z(x)) ≠ 0, then

lim_{ε→0} f^ε(x) = ψ(z(x)).

If x ∈ Ĝ, if the Hamiltonian H(x) satisfies the conditions of Theorem 2.2, and if conditions (i), (ii), and (iii) are fulfilled, then

lim_{ε→0} f^ε(x) = f(i(x), H(x)),

where f(i, H) is the solution of the following Dirichlet problem on Γ_G:
½ A_i(H) f″(i, H) + B_i(H) f′(i, H) = ḡ(i, H), (i, H) ∈ Γ_G;

f(i_k, H_k) = ψ(z_k) for (i_k, H_k) ∈ ∂Γ_G;  f(i, H) is continuous on Γ_G;

Σ_{i: I_i ~ O_k} (±β_{ki}) f′(i, H(O_k)) = 0 for O_k ∈ Λ_1.  (2.4)

Here

A_i(H) = [∮_{C_i(H)} |∇H(x)| dl] / [∮_{C_i(H)} |∇H(x)|^{−1} dl],

B_i(H) = [∮_{C_i(H)} (ΔH(x)/2|∇H(x)|) dl] / [∮_{C_i(H)} |∇H(x)|^{−1} dl];

β_{ki} are defined by (2.2) and taken with + if H > H(O_k) for (i, H) ∈ I_i, and with − if H ≤ H(O_k) for (i, H) ∈ I_i; and

ḡ(i, H) = [∮_{C_i(H)} g(x) |∇H(x)|^{−1} dl] / [∮_{C_i(H)} |∇H(x)|^{−1} dl].
The solution of problem (2.4) exists and is unique. If the functions u_i(H), v_i(H) are defined by (1.15), (1.16), then

f(i, H) = f̄(i, H) + a_i^{(1)} u_i(H) + a_i^{(2)},  (2.5)

where

f̄(i, H) = ∫_{H(O_k)}^{H} [ ∫_{H(O_k)}^{z} ḡ(i, y) dv_i(y) ] du_i(z),

O_k being an end of the segment I_i. The constants a_i^{(1)}, a_i^{(2)} are determined uniquely by the boundary conditions on ∂Γ_G, the continuity, and the gluing conditions (2.1) at the vertices belonging to Λ_1.

The proof of this theorem is based on Theorem 2.2 and includes the following statements.

1. The solutions of problems (2.3) and (2.4) can be represented as follows:
f^ε(x) = M_x^ε ψ(X^ε_{τ^ε}) + M_x^ε ∫_0^{τ^ε} g(X_t^ε) dt,

f(i, H) = M_{(i,H)} ψ(y_{τ^0}) + M_{(i,H)} ∫_0^{τ^0} ḡ(y_s) ds,
where τ^ε is the first exit time from G for X_t^ε: τ^ε = min{t : X_t^ε ∉ G}; τ^0 is the first exit time from Γ_G for the process y_t of Theorem 2.2: τ^0 = min{t : y_t ∉ Γ_G}; and M_{(i,H)} = M_y is the expectation corresponding to the probability P_y associated with the process y_t on the graph.

2. M_x^ε τ^ε ≤ A < ∞ for every small ε ≠ 0 and for every x ∈ G. The proof of this statement consists of two main parts. First we get a bound for the mean exit time from a region U such that the closure of Y(U) contains no vertices, and this is done in the standard way. Another bound is obtained for the expected exit time from a region D such that Y(D) belongs to a neighborhood of a vertex of Γ; for this, Lemmas 3.4 and 3.5 are used.

3. A_{i_k}(H_k) > 0 for all k, (i_k, H_k) ∈ ∂Γ_G, and thus every (i_k, H_k) is a regular point of the boundary ∂Γ_G for the process (y_t, P_y).
4. If we denote α(i, H) = inf{|∇H(x)| : x ∈ C_i(H)}, then for every positive δ,

lim_{h→0} lim_{ε→0} P_x^ε { |(1/h) ∫_0^h g(X_s^ε) ds − ḡ(i(x), H(x))| > δ } = 0

uniformly in x such that α(Y(x)) > a > 0.

EXAMPLE. Let us consider the boundary-value problem (2.3) with g(x) ≡ 0, H(x) as shown in Fig. 18, and the region G as shown in Fig. 22(a). The region G has two holes. Its boundary touches the orbits at the points z_1, z_2, z_3. In the part of G outside the region bounded by the dotted line the trajectories of the dynamical system leave the region G. For x ∈ G\(Ĝ ∪ ∂Ĝ) the limit of f^ε(x) is equal to the value of the boundary function at the point
at which the trajectory x_t(x) first leaves G. To evaluate lim_{ε→0} f^ε(x) for x ∈ Ĝ (Ĝ is shown separately in Fig. 22(c)) one must consider the graph Γ corresponding to our Hamiltonian and its part Γ_G ⊂ Γ corresponding to the region Ĝ (see Fig. 22(b)). In our example ∂Γ_G consists of three points (4, H_1), (5, H_2), and (1, H_3). The boundary conditions at ∂Γ_G are:

f(4, H_1) = ψ(z_1),  f(5, H_2) = ψ(z_2),  f(1, H_3) = ψ(z_3).

Since the right-hand side of the equation in problem (2.4) is equal to 0, on every segment I_i the limiting solution f(i, H) is a linear function of the function u_i(H). Since the functions u_i(H) are unbounded at the vertices of the second class (in particular, at O_3), and the solution f(i, H) is bounded, this solution is constant on the segment I_3. On the remaining segments the formulas (2.5) give the limiting solution up to two constants on each of the segments I_1, I_2, I_4, I_5. So we have nine unknown constants. To determine them we have three boundary conditions, four equations arising from continuity conditions at O_1 and O_2, and two gluing conditions at these vertices. So we have a system of nine linear equations with nine unknowns. This system has a unique solution.
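The same bookkeeping can be carried out on a smaller toy graph. The sketch below is an added illustration, not the book's computation: a hypothetical star graph of three segments meeting at one interior vertex O at H = 1, with u_i(H) = H on segments 1 and 2, boundary values ψ_1, ψ_2 at their ends H = 0, and segment 3 attached to an exterior vertex (so the bounded solution is constant there). With g ≡ 0 the solution is linear, f_i(H) = a_i H + b_i, and the five conditions (two boundary, two continuity, one gluing) determine the five unknowns.

```python
def solve(psi1, psi2, beta1, beta2):
    """Assemble and solve the linear system in (a1, b1, a2, b2, b3)."""
    rows = [
        ([0, 1, 0, 0, 0], psi1),          # f1(0) = psi1
        ([0, 0, 0, 1, 0], psi2),          # f2(0) = psi2
        ([1, 1, 0, 0, -1], 0.0),          # continuity at O: f1(1) = b3
        ([0, 0, 1, 1, -1], 0.0),          # continuity at O: f2(1) = b3
        ([beta1, 0, beta2, 0, 0], 0.0),   # gluing: beta1*a1 + beta2*a2 = 0
    ]
    a = [r[0][:] + [r[1]] for r in rows]  # augmented matrix
    n = len(a)
    for col in range(n):                  # Gauss-Jordan with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(n):
            if r != col and a[r][col]:
                m = a[r][col] / a[col][col]
                a[r] = [x - m * y for x, y in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]

a1, b1, a2, b2, b3 = solve(psi1=0.0, psi2=1.0, beta1=2.0, beta2=1.0)
# The vertex value is the beta-weighted average of the boundary values:
assert abs(b3 - (2.0 * 0.0 + 1.0 * 1.0) / 3.0) < 1e-12
```

Solving symbolically gives b3 = (β_1 ψ_1 + β_2 ψ_2)/(β_1 + β_2): the gluing condition makes the vertex value a β-weighted average of the boundary data, a small-scale version of the nine-equation system above.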
Figure 22.
Another class of examples is that with g(x) ≡ 1 and ψ(x) ≡ 0. The solution of the corresponding problem (2.3) is the expectation M_x^ε τ^ε. So Theorem 2.3 provides a method of finding the limit of the expected exit time of the process X_t^ε from a region G (or the main term of the asymptotics of the expectation M_x^ε τ̃^ε of the exit time for the process X̃_t^ε as ε → 0).
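In the simplest configuration this limit can be checked against a Monte Carlo simulation of the limiting diffusion itself. The sketch below is an added illustration under stated assumptions, not the book's computation: a single segment with the harmonic oscillator coefficients A(H) = 2H, B(H) = 1, and g ≡ 1, ψ ≡ 0 on {H(x) < H_1}. With the representation f = M ∫_0^τ dt, the limit solves H f″ + f′ = −1 with f(H_1) = 0 and f bounded at the (inaccessible) end H = 0, which gives f(H) = H_1 − H. The slow motion is the SDE dH_t = dt + √(2H_t) dW_t.

```python
import math
import random

def mean_exit_time(H0, H1, dt=1e-3, n_paths=2000, seed=1):
    """Monte Carlo for the diffusion with generator H*d2/dH2 + d/dH,
    i.e. (1/2)A(H) f'' + B(H) f' with the assumed A(H) = 2H, B(H) = 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        H, t = H0, 0.0
        while H < H1:
            H += dt + math.sqrt(2.0 * max(H, 0.0)) * rng.gauss(0.0, math.sqrt(dt))
            H = max(H, 0.0)   # clip the harmless discretization overshoot at 0
            t += dt
        total += t
    return total / n_paths

est = mean_exit_time(0.5, 1.0)
assert abs(est - 0.5) < 0.1   # exact limit: f(0.5) = 1.0 - 0.5
```

The estimate agrees with the closed form H_1 − H_0 up to Monte Carlo and discretization error, illustrating how the graph problem (2.4) encodes expected exit times.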
Remark. We assumed in Theorem 2.3 that for each (i_k, H_k) ∈ ∂Γ_G the set C_{i_k}(H_k) ∩ ∂G consists of exactly one point z_k. The opposite case is that this intersection is equal to C_{i_k}(H_k). In this case one should replace the boundary condition f(i_k, H_k) = ψ(z_k) by f(i_k, H_k) = ψ̄_k, where

ψ̄_k = [∮_{C_{i_k}(H_k)} ψ(x) |∇H(x)| dl] / [∮_{C_{i_k}(H_k)} |∇H(x)| dl].

If ∂G ∩ C_{i_k}(H_k) consists of more than one point but does not coincide with C_{i_k}(H_k), the situation is more complicated.
§3. Proof of Theorem 2.2

Before proving Theorem 2.2 we formulate the necessary lemmas and introduce some notations. The proof of the lemmas is given in Sections 4 and 5.
Lemma 3.1. Let M be a metric space; Y, a continuous mapping of M onto Y(M), Y(M) being a complete separable metric space. Let (X_t^ε, P_x^ε) be a family of Markov processes in M; suppose that the process Y(X_t^ε) has continuous trajectories (in Theorem 2.2 the process X_t^ε itself has continuous paths, but we want to formulate our lemma so that it can be applied in a wider class of situations). Let (y_t, P_y) be a Markov process with continuous paths in Y(M) whose infinitesimal operator is A with domain of definition D(A). Let us suppose that the space C[0, ∞) of continuous functions on [0, ∞) with values in Y(M) is taken as the sample space, so that the distribution of the process in the space of continuous functions is simply P_y. Let Ψ be a subset of the space C(Y(M)) such that for measures μ_1, μ_2 on Y(M) the equality ∫ f dμ_1 = ∫ f dμ_2 for all f ∈ Ψ implies μ_1 = μ_2. Let D be a subset of D(A) such that for every f ∈ Ψ and λ > 0 the equation λF − AF = f has a solution F ∈ D. Suppose that for every x ∈ M the family of distributions Q_x^ε of Y(X_t^ε) in the space C[0, ∞) corresponding to the probabilities P_x^ε for all ε is tight; and that for every compact K ⊆ Y(M), for every f ∈ D and every λ > 0,

M_x^ε ∫_0^∞ e^{−λt} [λf(Y(X_t^ε)) − Af(Y(X_t^ε))] dt → f(Y(x))  (3.1)

as ε → 0, uniformly in x ∈ Y^{−1}(K). Then Q_x^ε converges weakly as ε → 0 to the probability measure P_{Y(x)}.
The lemmas that follow are formulated under the assumption that the conditions of Theorem 2.2 are satisfied.

Lemma 3.2. The family of distributions Q_x^ε (those of Y(X_t^ε) with respect to the probability measures P_x^ε in the space C[0, ∞)) with small nonzero ε is tight.
It can also be proved that the family {Q_x^ε} with small ε and x changing in an arbitrary compact subset of R² is tight.

Lemma 3.3. For every fixed H_1 < H_2 belonging to the interior of the interval H(I_i), for every function f on [H_1, H_2] that is three times continuously differentiable on this closed interval, and for every positive number λ,

M_x^ε [ e^{−λ τ_i^ε(H_1,H_2)} f(H(X^ε_{τ_i^ε(H_1,H_2)})) + ∫_0^{τ_i^ε(H_1,H_2)} e^{−λt} [λ f(H(X_t^ε)) − L_i f(H(X_t^ε))] dt ] → f(H(x))  (3.2)

as ε → 0, uniformly with respect to x ∈ D_i(H_1, H_2).
Lemma 3.4. Let O_k be an exterior vertex (one corresponding to an extremum of the Hamiltonian). Then for every positive λ and κ there exists δ_{3.4} > 0 such that for sufficiently small ε, for all x ∈ D_k(±δ_{3.4}),

M_x^ε ∫_0^{τ_k^ε(±δ_{3.4})} e^{−λt} dt < κ.  (3.3)
Lemma 3.5. Let O_k be an interior vertex (corresponding to a curve containing a saddle point). Then for every positive λ and κ there exists δ_{3.5} > 0 such that for 0 < δ ≤ δ_{3.5}, for sufficiently small ε, for all x ∈ D_k(±δ),

M_x^ε ∫_0^{τ_k^ε(±δ)} e^{−λt} dt < κδ.  (3.3′)
Remark. Formulas (3.3) and (3.3′) are almost the same as M_x^ε τ_k^ε(±δ) < κ or κδ, and such inequalities can also be proved; but it is these formulas rather than the more natural estimates for M_x^ε τ_k^ε(±δ) that are used in the proof of Theorem 2.2.
Lemma 3.6. Define, for I_i ~ O_k, p_{ki} = β_{ki} / (Σ_{i: I_i ~ O_k} β_{ki}), where β_{ki} are defined by formula (2.2). For any positive κ there exists a positive δ_{3.6} such that for 0 < δ ≤ δ_{3.6} there exists δ′_{3.6} = δ′_{3.6}(δ) > 0 such that for sufficiently small ε,

|P_x^ε{X^ε_{τ_k^ε(±δ)} ∈ C_{ki}(δ)} − p_{ki}| < κ  (3.4)

for all x ∈ D_k(±δ′_{3.6}).
Proof of Theorem 2.2. Make use of Lemma 3.1. As the set Ψ we take the subset of C(Γ) consisting of all functions that are twice continuously differentiable with respect to the local coordinate H inside each segment I_i; D is the subset of D(A) consisting of functions with four continuous derivatives inside each segment. The tightness required in Lemma 3.1 is the statement of Lemma 3.2; so what remains is to prove that the difference of the two sides in (3.1) is smaller than an arbitrary positive number η for sufficiently small ε, for every x ∈ R².

Let a function f ∈ D, a number λ > 0, and a point x ∈ R² be fixed. Choose H_0 > max_k H(O_k) so that

M_x^ε e^{−λT^ε} < η / [2(‖f‖ + λ^{−1} ‖λf − Af‖)]

for all sufficiently small ε, where T^ε = T^ε(H_0) = min{t : H(X_t^ε) ≥ H_0} (this is possible by Lemma 3.2).
To prove that

M_x^ε ∫_0^∞ e^{−λt} [λf(Y(X_t^ε)) − Af(Y(X_t^ε))] dt → f(Y(x)),

[...] n ≥ 1, is the first exit time from D(H_0, δ′) after the time σ_n. Use the strong Markov property with respect to σ_n: M_x^ε{τ_n <
> 0, a positive δ_5 = δ_5(δ) < δ so that

|f(y) − f(O_k)| < [...]; τ_n′ = min{t > σ_n : X_t^ε ∈ C_{ki}(δ″) ∪ C_{ki}(δ)}, σ_n = min{t ≥ τ′_{n−1} : X_t^ε ∉ D_k(±δ′)} (after the process reaches C_{ki}(δ), all τ_n′ and σ_n are equal to τ_k^ε(±δ); the difference is that, since we are starting from a point z ∈ C_{ki}(δ′), we have σ_1 = 0, so we do not need to consider τ_0′). We have
Φ^ε(z) = Σ_{n=1}^∞ M_z^ε{τ_n′ < τ_k^ε(±δ); e^{−λτ_n′} φ_4^ε(X^ε_{τ_n′})} + Σ_{n=1}^∞ M_z^ε{σ_n < τ_k^ε(±δ); e^{−λσ_n} φ_5^ε(X^ε_{σ_n})},  (3.26)
φ_4^ε(z′) = M_{z′}^ε [ e^{−λ τ_k^ε(±δ)} f(Y(X^ε_{τ_k^ε(±δ)})) + ∫_0^{τ_k^ε(±δ)} e^{−λt} (λf(Y(X_t^ε)) − Af(Y(X_t^ε))) dt ] − f(Y(z′)),  (3.27)

φ_5^ε(z′) = M_{z′}^ε [ e^{−λ τ_i^ε(H(O_k)±δ, H(O_k)±δ″)} f(Y(X^ε_{τ_i^ε(H(O_k)±δ, H(O_k)±δ″)})) + ∫_0^{τ_i^ε(H(O_k)±δ, H(O_k)±δ″)} e^{−λt} (λf(Y(X_t^ε)) − Af(Y(X_t^ε))) dt ] − f(Y(z′)),  (3.28)

where τ_i^ε(H(O_k) ± δ, H(O_k) ± δ″), according to our system of notations, is the time at which the process leaves the region D_i(H(O_k) ± δ, H(O_k) ± δ″) (this region is the same as the difference D_k(±δ)\D_k(±δ″)); and for z ∈ C_{ki}(δ′),
    \sum_{n=1}^\infty M_z\{\tau'_n < \tau_k(\pm\delta);\ e^{-\lambda\tau'_n}\}.
For every \kappa > 0 we can choose \delta'', 0 < \delta'' < \delta', so that V(H(O_k)\pm\delta')/V(H(O_k)\pm\delta'') < 3 for all sufficiently small \delta''.
For t \ge 0,
    e^{-\lambda t} f(H(\tilde X^\epsilon_t)) = f(H(\tilde X^\epsilon_0)) + \int_0^t e^{-\lambda s}\,\epsilon\,f'(H(\tilde X^\epsilon_s))\,\nabla H(\tilde X^\epsilon_s)\cdot dw_s
        + \int_0^t e^{-\lambda s}\Big[ -\lambda f(H(\tilde X^\epsilon_s)) + \frac{\epsilon^2}{2}\,f''(H(\tilde X^\epsilon_s))\,|\nabla H(\tilde X^\epsilon_s)|^2 + \frac{\epsilon^2}{2}\,f'(H(\tilde X^\epsilon_s))\,\Delta H(\tilde X^\epsilon_s) \Big]\,ds,    (4.2)

    M_x e^{-\lambda\tau} f(H(\tilde X^\epsilon_\tau)) = f(H(x)) + M_x \int_0^\tau e^{-\lambda s}\Big[ -\lambda f(H(\tilde X^\epsilon_s)) + \frac{\epsilon^2}{2}\,f''(H(\tilde X^\epsilon_s))\,|\nabla H(\tilde X^\epsilon_s)|^2 + \frac{\epsilon^2}{2}\,f'(H(\tilde X^\epsilon_s))\,\Delta H(\tilde X^\epsilon_s) \Big]\,ds    (4.3)

for the time \tau of going out of an arbitrary bounded region. In particular, for \lambda = 0, f(H) = H,

    M_x H(\tilde X^\epsilon_\tau) = H(x) + M_x \int_0^\tau \frac{\epsilon^2}{2}\,\Delta H(\tilde X^\epsilon_s)\,ds,    (4.4)

    M_x H(X^\epsilon_\tau) = H(x) + M_x \int_0^\tau \frac{1}{2}\,\Delta H(X^\epsilon_s)\,ds.    (4.5)
Proof of Lemma 3.2. Introducing the graph, we did not describe the metric on it. Now we do this: for any two points y, y' \in Y(R^2), consider all paths on the graph that connect these points following the segments I_i:
    y = (i_0, H_0) \to (i_0, H_1) = O_{k_1} = (i_1, H_1) \to (i_1, H_2) = O_{k_2} = (i_2, H_2) \to \cdots \to (i_{l-1}, H_l) = O_{k_l} = (i_l, H_l) \to (i_l, H_{l+1}) = y',
and take
    \rho(y, y') = \min \sum_{j=0}^{l} |H_{j+1} - H_j|,
where the minimum is taken over all such paths. In order to prove the tightness it is sufficient to prove that: (1) for every T > 0 and \kappa > 0 there exists a number H_0 such that
    P_x\Big\{ \max_{0\le t\le T} H(X^\epsilon_t) \ge H_0 \Big\} < \kappa    (4.6)
for all sufficiently small \epsilon; and (2) for every \rho > 0 there exists a constant A_\rho such that for every a \in K there exists a function f_a^\rho(y) on Y(R^2) such that f_a^\rho(a) = 1, f_a^\rho(y) = 0 for \rho(y, a) > \rho, 0 \le f_a^\rho(y) \le 1 everywhere, and f_a^\rho(Y(X^\epsilon_t)) + A_\rho t is a submartingale for all \epsilon (see Stroock and Varadhan [2]). As for (1), we can write, using formula (4.4),
    M_x H(X^\epsilon_t) \le H(x) + \frac{A_4}{2}\,t,    (4.7)
where A_4 \ge \sup \Delta H(x) (the constant denoted A_0 was introduced in (1.14), and A_1, A_2, A_3 in the formulation of Theorem 2.2). Using Kolmogorov's inequality, we obtain, for H_0 > H(x) + (A_4/2)T,
    P_x\Big\{ \max_{0\le t\le T} H(X^\epsilon_t) > H_0 \Big\} \le \frac{M_x \int_0^T |\cdots|\,dt}{H_0 - H(x) - (A_4/2)T} \le \frac{\int_0^T [A_5 + A_6\,M_x H(X^\epsilon_t)]\,dt}{H_0 - H(x) - (A_4/2)T},
and using (4.7), we obtain (4.6). Let us prove (2). Let h(x) be a fixed smooth function such that 0 \le h(x) \le 1, h(x) = 1 for
x in a neighborhood of the corresponding point. ... The estimates below concern M_x\tau_k(\pm\delta) with \delta = \delta(\epsilon) > 0 decreasing with \epsilon (but not too fast), and the regions D_i(H_1, H_2) with one of the ends H_1, H_2 (or both) possibly at a distance \delta(\epsilon) from an end of the interval H(I_i). Lemma 4.2 is the \epsilon-dependent version of Lemma 3.3, including this lemma as a particular case; and Lemma 5.1 is the \epsilon-dependent version of Lemma 3.5, which does not include it as a particular case. In the proof of Lemma 3.5 both Lemmas 4.2 and 5.1 are used. The proofs rely on a rather complicated system of smaller lemmas. Let us formulate two \epsilon-dependent versions of Lemma 3.3. Let U_i(H_1, H_2) be the class of all functions g defined in the closed region \bar D_i(H_1, H_2), satisfying a Lipschitz condition |g(x') - g(x)| \le Lip(g)\,|x' - x|, and such that
    \int_{C_i(H)} g(x)\,\frac{dl}{|b(x)|} = 0
for all H \in (H_1, H_2).

Lemma 4.1. There exist (small) positive constants A_7, A_8 such that for every positive A_9, for every H_1 < H_2 \le A_9 in the interval H(I_i) at a distance not less than \delta = \delta(\epsilon) = \epsilon^{A_7} from the ends of H(I_i), for every positive \lambda, for sufficiently
§4. Proof of Lemmas 3.1 to 3.4
small \epsilon, for every g \in U_i(H_1, H_2), and for every x \in D_i(H_1, H_2),
    \Big| M_x \int_0^{\tau^\epsilon_i(H_1,H_2)} e^{-\lambda t}\,g(X^\epsilon_t)\,dt \Big| \le (\|g\| + Lip(g))\cdot\epsilon^{A_8}.    (4.8)
Lemma 4.2. Under the conditions of the previous lemma, there exist positive constants A_{10}, A_{11} such that for sufficiently small \epsilon, for every H_1 < H_2 \le A_9 in the interval H(I_i) at a distance not less than \delta = \delta(\epsilon) = \epsilon^{A_{10}} from its ends, for every x \in D_i(H_1, H_2), and for every smooth function f on the interval [H_1, H_2],

    \Big| M_x\Big[ e^{-\lambda\tau^\epsilon_i(H_1,H_2)}\,f(H(X^\epsilon_{\tau^\epsilon_i(H_1,H_2)})) + \int_0^{\tau^\epsilon_i(H_1,H_2)} e^{-\lambda t}\,[\lambda f(H(X^\epsilon_t)) - \bar L_i f(H(X^\epsilon_t))]\,dt \Big] - f(H(x)) \Big| \le (\|f'\| + \|f''\| + \|f'''\|)\,\epsilon^{A_{11}}.    (4.9)

Lemma 3.3 is a particular case of this lemma. Lemma 4.2 is proved simply enough once Lemma 4.1 is proved. To prove Lemma 4.1 we use the fact that our process X^\epsilon_t moves fast along the trajectories of the Hamiltonian system (and more slowly across them). It is convenient for us to consider the "slow" process \tilde X^\epsilon_t. The following lemma makes precise the statement about our process moving along the trajectories.

Lemma 4.3. For every positive \eta and t,
    P_x\Big\{ \max_{0\le s\le t} |\tilde X^\epsilon_s - x_s(x)| \ge \eta \Big\} \le \frac{3(e^{2Lt} - 1)}{L}\cdot\frac{\epsilon^2}{\eta^2},    (4.10)
where L is the Lipschitz constant of the function \bar\nabla H (the same as that of \nabla H).

This gives convergence of \tilde X^\epsilon_t to x_t(x) that is uniform not only in every finite time interval [0, T], but also in intervals [0, c|\ln\epsilon|] that grow indefinitely as \epsilon decreases, where the constant c < L^{-1}. The most likely behavior of the process X^\epsilon_t (or \tilde X^\epsilon_t) in the region D_i(H_1, H_2) is rotating many times along the closed trajectories filling this region. We have to devise some way to count these rotations, adopting some definition of what a rotation is. No problem arises in the case of rotations of a solution x_t(x) of the unperturbed system \dot x_t = \bar\nabla H(x_t): we can consider the first time at which x_t(x) returns to the initial point x_0(x) = x. But the diffusion process \tilde X^\epsilon_t never returns to x. Let us take a curve \theta in D_i(H_1, H_2) running across all trajectories x_t(x) in this region so that every trajectory (level curve of the Hamiltonian) intersects \theta at one point. Unfortunately, the time at which \tilde X^\epsilon_t returns to this curve cannot serve as the definition of a rotation time, because a trajectory starting at some point of \theta returns to this
curve infinitely many times at arbitrarily small positive times. So to define a "rotation," we take two curves \theta and \theta' running across the trajectories at two different places, and define the successive Markov times \hat\sigma_0 \le \hat\sigma_1 < \hat\tau_1 < \hat\sigma_2 < \hat\tau_2 < \cdots of reaching \theta' and \theta.
Since x_{\Delta t}(x) \in D_1 lies at a distance at least B\,\Delta t from \theta, we have also \tilde X^\epsilon_{\Delta t} \in D_1. For t between \Delta t and T(x) - \Delta t, the trajectory x_t(x) does not intersect \theta. Since the points x_{T(x)-\Delta t}(x) \in D_2, x_{T(x)+\Delta t}(x) \in D_1 lie at a distance at least B\,\Delta t from the curve \theta, we have also \tilde X^\epsilon_{T(x)-\Delta t} \in D_2, \tilde X^\epsilon_{T(x)+\Delta t} \in D_1. So between the times \Delta t and T(x) - \Delta t the trajectory \tilde X^\epsilon_t passes from D_1 to D_2, intersecting \theta' (and therefore \hat\sigma_1 lies between \Delta t and T(x) - \Delta t), and between T(x) - \Delta t and T(x) + \Delta t (but not between \Delta t and T(x) - \Delta t) the trajectory passes from D_2 to D_1, intersecting \theta. So, finally, T(x) - \Delta t < \hat\tau_1 < T(x) + \Delta t. So we have proved the inclusion of the events:
    \Big\{ \max_{0\le t\le T(x)+h} |\tilde X^\epsilon_t - x_t(x)| < B\,\Delta t \Big\} \subseteq \{ |\hat\tau_1 - T(x)| < \Delta t \};
and the inequality
    P_x\{ |\hat\tau_1 - T(x)| \ge \Delta t \} \le P_x\Big\{ \max_{0\le t\le T(x)+h} |\tilde X^\epsilon_t - x_t(x)| \ge B\,\Delta t \Big\},
together with Lemma 4.3, yields (4.14). Now,
    M_x\Big| \int_{T(x)}^{\hat\tau_1} e^{-\lambda\epsilon^2 t}\,dt \Big| \le M_x\{ |\hat\tau_1 - T(x)| < h;\ |\hat\tau_1 - T(x)| \} + M_x\{ |\hat\tau_1 - T(x)| \ge h;\ (\lambda\epsilon^2)^{-1} \}.
The second term is equal to (\lambda\epsilon^2)^{-1}\,P_x\{|\hat\tau_1 - T(x)| \ge h\} and is estimated using (4.14); the first one is also estimated by (4.14) and the following elementary lemma applied to the random variable |\hat\tau_1 - T(x)| \wedge h.

Lemma 4.5. If a nonnegative random variable \xi is such that P\{\xi > x\} \le C/x^3 for every x > 0, then M\xi \le \tfrac32\,C^{1/3}.
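Lemma 4.5 can be checked against the extremal tail: if P{\xi > x} = min(1, C/x^3), then M\xi = \int_0^\infty P\{\xi > x\}\,dx = C^{1/3} + \tfrac12 C^{1/3} = \tfrac32 C^{1/3}, so no smaller constant would do. A numerical sketch (a midpoint Riemann sum; this construction is ours):

```python
def tail(x, C):
    # Extremal tail permitted by Lemma 4.5: P{xi > x} = min(1, C / x**3).
    return min(1.0, C / x ** 3)

def mean_from_tail(C, upper=1000.0, n=400_000):
    # M xi = integral_0^infinity P{xi > x} dx, computed by a midpoint sum.
    dx = upper / n
    return sum(tail((i + 0.5) * dx, C) for i in range(n)) * dx
```

For C = 8 the exact value is (3/2) * 8^(1/3) = 3, and the quadrature reproduces it.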
So we have (4.15). The estimate (4.16) follows from (4.14) and the fact that
    M_x e^{-\lambda\epsilon^2\hat\tau_1} = M_x\{\hat\tau_1 < T(x) - h;\ e^{-\lambda\epsilon^2\hat\tau_1}\} + M_x\{\hat\tau_1 \ge T(x) - h;\ e^{-\lambda\epsilon^2\hat\tau_1}\}.
As for an initial point \tilde X^\epsilon_0 = x \notin \theta, the trajectory \tilde X^\epsilon_t that is close to x_t(x) need not cross the curves \theta' and \theta within the time T(x) + h, but is bound to do so within two rotations plus h, and the estimate (4.17) is proved the same way as the more precise (4.14); (4.18) follows immediately.

Proof of Lemma 4.1. The expectation in (4.8) can be written as
    \epsilon^2\,M_x \int_0^{\tilde\tau^\epsilon_i(H_1,H_2)} e^{-\lambda\epsilon^2 t}\,g(\tilde X^\epsilon_t)\,dt.
Take two curves \theta, \theta' in a slightly larger region D_i(H_1 - \delta/2, H_2 + \delta/2), and consider the sequence of Markov times \hat\tau_k defined by (4.13). Let \nu = \max\{k : \hat\tau_k < \tilde\tau^\epsilon_i(H_1, H_2)\}. We have
    \int_0^{\tilde\tau^\epsilon_i(H_1,H_2)} e^{-\lambda\epsilon^2 t}\,g(\tilde X^\epsilon_t)\,dt = \sum_{k=0}^{\nu-1} \int_{\hat\tau_k}^{\hat\tau_{k+1}} e^{-\lambda\epsilon^2 t}\,g(\tilde X^\epsilon_t)\,dt + \int_{\hat\tau_\nu}^{\tilde\tau^\epsilon_i(H_1,H_2)} e^{-\lambda\epsilon^2 t}\,g(\tilde X^\epsilon_t)\,dt.    (4.19)
We are going to apply Lemma 4.3 with D_i(H_1 - \delta/2, H_2 + \delta/2) instead of D_i(H_1, H_2) and D_i(H_1, H_2) instead of D_i(H_1', H_2'). We have to estimate T(x), \bar B, choose the curves \theta, \theta', and estimate the constants B, h, d.
8. Random Perturbations of Hamiltonian Systems
We have \bar B \le \max\{|\bar\nabla H(x)| : H(x) \le A_9\}; the distance from D_i(H_1, H_2) to the complement of D_i(H_1 - \delta/2, H_2 + \delta/2) is not less than \delta/2\bar B. Let us take d = \delta/2\bar B (so the first condition imposed on d is satisfied; the rest is taken care of later). The period T(x) of an orbit x_t(x) going through a point x \in D_i is positive; it has a finite limit as x approaches an extremum point x_k (such that the corresponding vertex O_k is one of the ends of the segment I_i), and it grows at the rate of const\,\big|\ln|H(x) - H(O_k)|\big| as x approaches the curve C_{ki} if it contains a saddle point of the Hamiltonian. In any case, we have A_{12} \le T(x) \le A_{13}|\ln\delta| for x \in D_i(H_1, H_2). Take two points x_0 and x_0' on a fixed level curve C_i(H^0) in the region D_i, and consider solutions y_t(x_0), y_t(x_0'), -\infty < t < \infty, of \dot y_t = \nabla H(y_t) going through the points x_0, x_0'; the curves described by these solutions are orthogonal to the level curves of the Hamiltonian. Let us take \theta, \theta' to be the parts of the curves described by y_t(x_0), y_t(x_0') in the region D_i(H_1 - \delta/2, H_2 + \delta/2). The curves described by y_t(x_0), y_t(x_0') may enter the same critical point x_k as t \to \infty or t \to -\infty (they certainly enter the same critical point if one of the ends of the segment I_i corresponds to an extremum of the Hamiltonian), but from different directions. The distance from D_i(H_1 - \delta/2, H_2 + \delta/2) to a critical point is clearly not less than A_{14}\delta, and it is easy to see that the distance between the curves \theta and \theta' is at least A_{15}\delta. This distance is greater than d = \delta/2\bar B for sufficiently small \delta (i.e., for sufficiently small \epsilon). Also we can take B = A_{16}\sqrt\delta. We could have taken h = const as far as it concerns the requirements \rho(x_t(x), \theta) > Bt for x \in \theta \cap D_i(H_1, H_2), 0 < t \le h, and \rho(x_t(x), \theta) > B(T(x) - t) for the same x and T(x) - h \le t < T(x); but we also have to ensure that Bh \le d, so we take h = A_{17}\sqrt\delta. Now, if we choose the constant A_7 so that A_7(2LA_{13} + 1) < 2, then for sufficiently small \epsilon we obtain from (4.15), (4.16), and (4.18):
    \Big| M_x \int_0^{\hat\tau_1} e^{-\lambda\epsilon^2 t}\,g(\tilde X^\epsilon_t)\,dt \Big| \le (\|g\| + Lip(g))\,\epsilon^{A_{18}}    (4.20)
for x \in \theta \cap D_i(H_1, H_2), where A_{18} < \min\big(1 - A_7(LA_{13} + \tfrac12),\ 2 - A_7(2LA_{13} + 1)\big);
    M_x e^{-\lambda\epsilon^2\hat\tau_1} \le 1 - \frac{A_{12}}{2}\,\lambda\epsilon^2    (4.21)
for x \in \theta \cap D_i(H_1, H_2); and
    M_x \int_0^{\hat\tau_1} e^{-\lambda\epsilon^2 t}\,dt \le 2A_{13}A_7|\ln\epsilon| + 1    (4.22)
for all x \in D_i(H_1, H_2).
Now let us extend the function g to the whole plane so that \|g\| and the Lipschitz constant Lip(g) do not increase. Let us add to the right-hand side of (4.19) and subtract from it \int_{\tilde\tau^\epsilon_i(H_1,H_2)}^{\hat\tau_{\nu+1}} e^{-\lambda\epsilon^2 t}\,g(\tilde X^\epsilon_t)\,dt; then the right-hand side can be represented as a sum over the successive rotation cycles [\hat\tau_k, \hat\tau_{k+1}), each summand being estimated by means of (4.20) to (4.22).

    \{\tau_k(\pm\delta) > t_0\} \subseteq \Big\{\tau_k(\pm\delta) > t_0,\ \int_0^{t_0} |\nabla H(X_s)|^2\,ds \le C\delta^2\Big\}
        \cup \Big\{\tau_k(\pm\delta) > t_0,\ \int_0^{t_0} |\nabla H(X_s)|^2\,ds > C\delta^2,\ |W(C\delta^2)| \ge 3\delta\Big\}
        \cup \Big\{\tau_k(\pm\delta) > t_0,\ \int_0^{t_0} |\nabla H(X_s)|^2\,ds > C\delta^2,\ |W(C\delta^2)| < 3\delta\Big\}.
§5. Proof of Lemma 3.5
Suppose \int_0^{t_0} |\tfrac12\Delta H(X_s)|\,ds < \delta for all trajectories X_t such that \tau_k(\pm\delta) > t_0. Then the second event on the right-hand side cannot occur, and the third one is a part of the event \{|W(C\delta^2)| < 3\delta\}. Using the normal distribution, we have for all x \in D_k(\pm\delta),

    P_x\{\tau_k(\pm\delta) > t_0\} \le P_x\Big\{\tau_k(\pm\delta) > t_0,\ \int_0^{t_0} |\nabla H(X_s)|^2\,ds \le C\delta^2\Big\} + \frac{1}{\sqrt{2\pi}}\int_{-3/\sqrt C}^{3/\sqrt C} e^{-y^2/2}\,dy.    (5.1)

It does not matter much for our proof what C we take in this estimate. Let us take C = 9; then the last integral is equal to 0.6826. Of course, |\tfrac12\Delta H| \le A_{28} in some neighborhood of the curve C_k; so if, for small \delta, we take t_0 \le \delta/A_{28}, the inequality \int_0^{t_0} |\tfrac12\Delta H(X_s)|\,ds < \delta is guaranteed. So we have to estimate, for some t_0 = t_0(\delta) \le \delta/A_{28}, the first summand on the right-hand side of (5.1). To do this, we consider once again cycles between reaching two lines. Just as in the proof of Lemma 4.1, introduce curves \theta, \theta' in a slightly larger region D_k(\pm2\delta):

    \theta = \{x \in D_k(\pm2\delta) : |x - x_k| = A_{29}\sqrt\delta\},\qquad \theta' = \{x \in D_k(\pm2\delta) : |x - x_k| = A_{30}\sqrt\delta\}.
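The constant 0.6826 used above is just the probability that a standard normal variable lies within one standard deviation: with C = 9, the event |W(C\delta^2)| < 3\delta is a one-sigma event for the Gaussian value W(9\delta^2), and 2\Phi(1) - 1 = erf(1/\sqrt2):

```python
import math

# P{|N(0,1)| < 1} = erf(1 / sqrt(2)) ~ 0.6827, matching the 0.6826 in the text
# up to rounding.
p = math.erf(1.0 / math.sqrt(2.0))
```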
Just as in the proof of Lemma 4.1, let us consider the "slow" process \tilde X^\epsilon_t. Define Markov times \hat\sigma_0 \le \hat\sigma_1 < \hat\tau_1 < \hat\sigma_2 < \cdots by \hat\sigma_0 = 0,
    \hat\sigma_i = \min\{t \ge \hat\tau_{i-1} : \tilde X^\epsilon_t \in \theta'\},\qquad \hat\tau_i = \min\{t \ge \hat\sigma_i : \tilde X^\epsilon_t \in \theta\}.
For sufficiently large positive constants A_{29} < A_{30} the curves \theta, \theta' each consist of four arcs: two across the "sleeves" of D_k(\pm2\delta) in which the solutions of the Hamiltonian system go out of the neighborhood of the saddle point, and two across the "sleeves" through which the trajectories go into the central, cross-like part of D_k(\pm2\delta). Let us denote the corresponding parts by \theta_{out}, \theta'_{out}, \theta_{in}, \theta'_{in} (they each consist of two arcs; see Fig. 24).
Before the time \tau_k(\pm\delta), the times \hat\sigma_i are those at which the process \tilde X^\epsilon_t goes out of the central part of the region D_k(\pm\delta); and \hat\tau_i, i > 0, are those at which it, after going through one of the two "handles" of this region, reaches its boundary. Lemmas 5.2 and 5.3 take care of the stretches between these times.
Let D_c be one of the two outer "handles" of the region D_k(\pm\delta); e.g., its left part between the lines \theta_{out} and \theta'_{in} (or the right part); \tau_c will denote the first time at which the process \tilde X^\epsilon_t leaves this region.
Figure 24.
Lemma 5.2. There exist positive constants A_{31}, A_{32}, A_{33} such that for sufficiently small \epsilon and small \delta > \epsilon^{A_{31}},
    M_x\tau_c \le A_{32}\,|\ln\delta|    (5.2)
for every x \in D_c;
    P_x\{\tilde X^\epsilon_{\tau_c} \in \theta'_{in}\} \to 1\quad(\epsilon \to 0)    (5.3)
uniformly in x \in \theta'; and
    P_x\Big\{ \int_0^{\tau_c} |\nabla H(\tilde X^\epsilon_s)|^2\,ds > A_{33} \Big\} \to 1\quad(\epsilon \to 0)    (5.4)
uniformly in x \in \theta_{out}.
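The |ln \delta| in (5.2) reflects the passage time of the deterministic flow near the saddle. For the model saddle H(p, q) = pq (our model, not the text's general H) the flow is dp/dt = p, dq/dt = -q, and travelling from the incoming line p = \delta to the outgoing line p = 1 takes exactly ln(1/\delta); a sketch:

```python
import math

def exit_time(delta, dt=1e-5):
    """Deterministic passage time near the model saddle H(p, q) = p*q:
    the flow is dp/dt = p, dq/dt = -q. Start on the incoming side at
    p = delta and integrate until the outgoing line p = 1."""
    p, q, t = delta, 1.0, 0.0
    while p < 1.0:
        p += p * dt
        q -= q * dt
        t += dt
    return t
```

The passage time grows logarithmically as the entry point approaches the separatrix, in agreement with the A_{32}|ln \delta| bound.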
Proof. The proof is similar to those of Lemmas 4.4 and 4.1, and relies on the same Lemma 4.3. As in the beginning of the proof of Lemma 4.1, we have \bar B = \max\{|b(x)| : x \in D_k(\pm2\delta)\} \le A_{34}; again we take d = \delta/2\bar B. The time T_c(x) in which the trajectory x_t(x) leaves the region D_c (through the line \theta'_{in}, of course) is bounded by A_{35}|\ln\delta|, and the time the trajectory starting on \theta_{out} spends in the set \{x : |x - x_k| \ge A_{36}\} is at least A_{37} if we choose the constant A_{36} small enough and \delta is small. We can take B = A_{38}\sqrt\delta and h = A_{39}\sqrt\delta, so that for all x \in D_c,
    \rho(x_t(x), \theta) > Bt\quad\text{for } 0 < t \le h,
and the strong Markov property gives
    M_x\tau_c \le \frac{A_{35}|\ln\delta| + 1}{1 - \sup\{P_z\{\tau_c > A_{35}|\ln\delta| + 1\} : z \in D_c\}}.

The functions a^{ij}(\xi, \eta), b^i(\xi, \eta) are defined in a neighborhood of the point (0, 0); let us extend these functions to the whole plane so that the a^{ij}, b^i are bounded and
    a^{11}(\xi, \eta) \ge a_0 > 0,\qquad B^1(0, \eta) = B^2(\xi, 0) = 0,\qquad \frac{\partial B^1}{\partial\xi}(\xi, \eta) \ge B_0 > 0
everywhere.
We use the notation (\tilde X^\epsilon_t, P_x) for the diffusion in the plane governed by the operator (5.7) with coefficients extended as described above. Before the process \tilde X^\epsilon_t leaves a neighborhood of x_k, \xi^\epsilon_t and \eta^\epsilon_t are the \xi- and \eta-coordinates of this process. The curves \theta, \theta' are no longer parts of circles in our new coordinates, but for small \delta they are approximately parts of ellipses the ratio of whose sizes is A_{30}/A_{29}. We can choose positive constants A_{29} < A_{30}, A_{40}, and A_{41} < A_{42} < A_{40}, so that, for small positive \delta, |\xi| \le A_{41}\sqrt\delta for x \in \theta', and |\xi| \ge A_{42}\sqrt\delta for x \in \theta. Let us consider the times \tau_{out} = \min\{t : |\xi^\epsilon_t| = A_{40}\sqrt\delta\} and \tilde\tau = \min\{t : |\eta^\epsilon_t| = A_{42}\sqrt\delta\}. It is clear that \tau_c \le \tau_{out}; and if the process starts from a point \tilde X^\epsilon_0 = x \in \theta', then it leaves the region D_c either through one of its sides (that are trajectories of the system), or through \theta_{out}. It turns out that the time \tau_{out} is of order \ln(\delta/\epsilon^2), whereas \tilde\tau is of order at least \delta/\epsilon^2, i.e., infinitely large compared with \tau_{out} (one can prove, using large-deviation estimates similar to Theorem 4.4.2, that \tilde\tau is, in fact, of order \exp\{const\cdot\delta/\epsilon^2\}, but we do not need that). The proof is based on comparing the processes \xi^\epsilon_t, \eta^\epsilon_t (that need not be Markov ones if taken separately) with one-dimensional diffusions.

Let us introduce the function F(x) = \int_0^x e^{-y^2}\big[\int_0^y e^{z^2}\,dz\big]\,dy. Using l'Hopital's rule, one proves that F(x) \sim \tfrac12\ln x as x \to \infty. Let us consider the function
    u^\epsilon(\xi) = \frac{2}{B_0}\,\Big[ F\big(\sqrt{B_0}\,A_{40}\sqrt\delta/\epsilon\big) - F\big(\sqrt{B_0}\,\xi/\epsilon\big) \Big].
This function is the solution of the boundary-value problem
    \frac{\epsilon^2}{2}\,\frac{d^2u^\epsilon}{d\xi^2} + B_0\,\xi\,\frac{du^\epsilon}{d\xi} = -1,\qquad u^\epsilon(\pm A_{40}\sqrt\delta) = 0;
so u^\epsilon(\xi) is the expectation of the time of leaving the interval (-A_{40}\sqrt\delta,\ A_{40}\sqrt\delta) for the one-dimensional diffusion process with generator \tfrac{\epsilon^2}{2}\tfrac{d^2}{d\xi^2} + B_0\xi\tfrac{d}{d\xi}, starting from a point \xi \in (-A_{40}\sqrt\delta,\ A_{40}\sqrt\delta). Let us apply the operator L^\epsilon to the function u^\epsilon on the plane, depending on the \xi-coordinate only:
    L^\epsilon u^\epsilon(x) = \frac{\epsilon^2}{2}\,a^{11}(x)\,(u^\epsilon)''(\xi) + B^1(x)\,(u^\epsilon)'(\xi).
Writing B^1(\xi, \eta) = B^1(0, \eta) + (\partial B^1/\partial\xi)\,\xi and using a^{11} \ge a_0, \partial B^1/\partial\xi \ge B_0, and the boundedness of b^1 and F', one obtains L^\epsilon u^\epsilon(x) \le -a_0/2 for sufficiently small \epsilon, and we obtain by formula (5.1) of Chapter 1:
    M_x\tau_{out} \le \frac{2}{a_0}\,u^\epsilon(\xi) \le A_{43}\,\ln(\delta/\epsilon^2).
On the event \{\tau_k(\pm\delta) > t_0/\epsilon^2\} we use
    \int_0^{t_0/\epsilon^2} |\nabla H(\tilde X^\epsilon_s)|^2\,ds < 9\delta^2/\epsilon^2.    (5.8)
For an arbitrary natural n this probability does not exceed \big[\sup_x P_x\{\tau_k(\pm\delta) > t_0\}\big]^n; and P_x\{\tau_k(\pm\delta) > t_0\} \le 0.8 for sufficiently small \epsilon and x \in D_k(\pm\delta). Application of the Markov property yields P_x\{\tau_k(\pm\delta) \ge nt_0\} \le 0.8^n, and M_x\tau_k(\pm\delta) \le 5t_0. Further, put \sigma_n = \min\{t \ge \tau_{n-1} : X_t \notin D_k(\pm\delta_0)\}, and \tau_n = \min\{t \ge \sigma_n : X_t \in \bigcup_i C_{ki}(\delta_0) \cup \bigcup_i C_{ki}(\delta)\}. The expectation in (5.14) is equal to
    \sum_{n=0}^\infty M_x\Big\{\tau_n < \tau_k(\pm\delta);\ \int_{\tau_n}^{\sigma_{n+1}} e^{-\lambda t}\,dt\Big\} + \sum_{n=1}^\infty M_x\Big\{\sigma_n < \tau_k(\pm\delta);\ \int_{\sigma_n}^{\tau_n} e^{-\lambda t}\,dt\Big\}
    \le \sum_{n=0}^\infty M_x\{\tau_n < \tau_k(\pm\delta);\ e^{-\lambda\tau_n}\}\cdot\sup\{M_z\tau_k(\pm2\delta_0) : z \in D_k(\pm2\delta_0)\}
    + \sum_{n=1}^\infty M_x\{\sigma_n < \tau_k(\pm\delta);\ e^{-\lambda\sigma_n}\}\cdot\sup\Big\{M_z\int_0^{\tau_1} e^{-\lambda t}\,dt : z \in \bigcup_i C_{ki}(2\delta_0)\Big\}    (5.15)
(the strong Markov property is used).
The first supremum is estimated by Lemma 5.1; it is O(\delta_0|\ln\epsilon|). For a path starting at a point z \in C_{ki}(2\delta_0), the time \tau_1 is nothing but \tau_i(H(O_k) \pm \delta_0,\ H(O_k) \pm \delta). To estimate its M_z-expectation, we apply Lemma 4.2 to the solution of the boundary-value problem
    \lambda f(H) - \bar L_i f(H) = 1,\qquad f(H(O_k) \pm \delta_0) = f(H(O_k) \pm \delta) = 0.
It is easy to see that \|f\| \le A_{49}\,\delta^2|\ln\delta|, so that the second supremum does not exceed A_{50}\,\delta_0\,\delta\,|\ln\delta|, and
    M_x\{\tau_n < \tau_k(\pm\delta);\ e^{-\lambda\tau_n}\} \le \big[A_{45}\,\delta^2|\ln\delta|/(1 - \epsilon_1)\big]^n.
§6. Proof of Lemma 3.6

We use a result in partial differential equations, which we give here without proof, reformulated to fit our particular problem. Let L^\epsilon be a two-dimensional second-order differential operator in the domain D = (H_1, H_2) \times (-\infty, \infty) given by

    L^\epsilon = \frac{\epsilon^2}{2}\Big[ a^{11}(H,\theta)\,\frac{\partial^2}{\partial H^2} + 2a^{12}(H,\theta)\,\frac{\partial^2}{\partial H\,\partial\theta} + a^{22}(H,\theta)\,\frac{\partial^2}{\partial\theta^2} \Big] + b^1(H,\theta)\,\frac{\partial}{\partial H} + b^2(H,\theta)\,\frac{\partial}{\partial\theta}.    (6.1)

For every \kappa > 0, for sufficiently small \delta', 0 < \delta' < \delta, the difference
    | P_x\{X_{\tau_k(\pm\delta)} \in C_{kj}(\delta)\} - p_{kj} |
is small for sufficiently small \epsilon and for all x \in D_k(\pm\delta'), where the p_{kj} are given by p_{kj} = \beta_{kj}\big/\sum_{i: I_i \sim O_k}\beta_{ki}. The first step is proving this for every i such that I_i \sim O_k and for every function f on \partial D_k(\pm\delta) with \|f\| \le 1. ... If the function G(x) vanishes in the region \{H(x) > H_0 - \delta\} and in all the regions D_k(\pm\delta), then \int G(X_t)\,dt = 0 there, and

    \int_{R^2} G(x)\,dx = \int_{C_\delta} v^\epsilon(dx)\,M^x \int_0^{\tau_1} G(X_t)\,dt.    (6.13)
Let us consider an arbitrary segment I_j of the graph with ends O_{k_1} and O_{k_2}. Let us denote H_{k_1} = H(O_{k_1}), H_{k_2} = H(O_{k_2}), except if the vertex O_{k_2} corresponds to the point at infinity, in which case we take H_{k_2} = H_0. Let g(H) be an arbitrary continuous function on the interval H(I_j) that is equal to 0 outside its subinterval (H_{k_1} + \delta,\ H_{k_2} - \delta). Consider the function G(x) that is equal to g(H(x)) in the domain D_j, and to 0 outside D_j. The left-hand side of (6.13) can be rewritten as

    \int_{D_j} g(H(x))\,dx = \int_{H_{k_1}+\delta}^{H_{k_2}-\delta} g(H)\,dv_j(H),

where v_j(H) is the function whose derivative is given by formula (1.15). For the expectation on the right-hand side of (6.13), formulas (4.28) and (4.29) hold, and we can write

    \int_{H_{k_1}+\delta}^{H_{k_2}-\delta} g(H)\,dv_j(H)
    = V^\epsilon(C_{k_1 j}(\delta))\,\Big[ \frac{1}{u_j(H_{k_2}-\delta) - u_j(H_{k_1}+\delta')} \int_{H_{k_1}+\delta}^{H_{k_2}-\delta} \big(u_j(H_{k_2}-\delta') - u_j(H)\big)\,g(H)\,dv_j(H) + o(1) \Big]
    + V^\epsilon(C_{k_2 j}(\delta))\,\Big[ \frac{1}{u_j(H_{k_2}-\delta) - u_j(H_{k_1}+\delta')} \int_{H_{k_1}+\delta}^{H_{k_2}-\delta} \big(u_j(H) - u_j(H_{k_1}+\delta')\big)\,g(H)\,dv_j(H) + o(1) \Big],

where the o(1) go to 0 as \epsilon \to 0.
In order for this to be true for an arbitrary continuous function g, it is necessary that the limits of V^\epsilon(C_{k_1 j}(\delta)), V^\epsilon(C_{k_2 j}(\delta)) exist as \epsilon \to 0, namely,
    \lim_{\epsilon\to0} V^\epsilon(C_{k_1 j}(\delta)) = u_j(H_{k_1}+\delta) - u_j(H_{k_1}+\delta'),\qquad \lim_{\epsilon\to0} V^\epsilon(C_{k_2 j}(\delta)) = u_j(H_{k_2}-\delta') - u_j(H_{k_2}-\delta).
But by formula (1.17), u_j(H') - u_j(H) \sim 2\big(\int_{C_{kj}}|\nabla H(x)|\,dl\big)^{-1}\,(H' - H) as H, H' \to H_{k_1} (we disregard the case of H_{k_2} = H_0). This means that, for sufficiently small \delta and \epsilon,
    V^\epsilon(C_{kj}(\delta)) \approx \gamma_{kj}\cdot 2(\delta - \delta')\,\Big(\int_{C_{kj}}|\nabla H(x)|\,dl\Big)^{-1}.
The pairs (i, H), i \in \{1, 2\}, can be used as coordinates on \Gamma. The points (1, H(F)) and (2, H(B)), as well as (1, H(E)) and (2, H(A)), are identified.
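The identifications can be illustrated by a toy version of the mapping Y(x) = (k(x), H(x)) for a model double-well Hamiltonian H(p, q) = (p^2 - 1)^2/4 + q^2/2 (our choice; the example in the text, with the points A to F, is different):

```python
def H(p, q):
    # Model double-well Hamiltonian (our choice); saddle at (0, 0), H = 1/4.
    return (p * p - 1.0) ** 2 / 4.0 + q * q / 2.0

BARRIER = 0.25

def Y(p, q):
    """Toy projection onto the graph: (edge index k, energy H).
    Edges 1 and 2 are the two wells (p < 0 and p > 0) below the barrier;
    edge 3 is the single outer component above the barrier."""
    h = H(p, q)
    k = (1 if p < 0 else 2) if h < BARRIER else 3
    return k, h
```

Two points at the bottoms of the two wells have the same energy but different edge labels; above the barrier the label no longer depends on the sign of p.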
Define the mapping Y(x) : R^2 \to \Gamma by Y(x) = (k(x), H(x)), and let Y^\epsilon_t = Y(X^\epsilon_t). It is easy to see that, when H(X^\epsilon_t) \notin [H(C), H(D)], the fast component of the process X^\epsilon_t has a distribution on C(H(X^\epsilon_t)) close to the unique invariant measure of the nonperturbed system on the corresponding level set of H(x). When H(X^\epsilon_t) \in (H(C), H(B)) \cup (H(A), H(D)), the distribution of the fast component is close to the measure concentrated at the stable rest point of the nonperturbed system on the corresponding level set. But when H(X^\epsilon_t) \in (H(B), H(A)), the fast coordinate of X^\epsilon_t is concentrated near one of the points x_0^{(1)}(H(X^\epsilon_t)) or x_0^{(2)}(H(X^\epsilon_t)), depending upon the side from which H(X^\epsilon_t) last entered the segment [H(B), H(A)]. Accordingly, on the part of the edge where the fast motion fills a whole level curve, the drift coefficient is the average
    \Big(\int_{C(y)}\frac{dl}{|\nabla H(x)|}\Big)^{-1}\int_{C(y)}\frac{\tfrac12\Delta H(x)}{|\nabla H(x)|}\,dl,
while on the part where the fast motion sits near the rest point it is \tfrac12\Delta H(x_0^{(i)}(y)); this defines B_i(y) for i = 1, 2, and A_i(y) is obtained analogously. Define the operators L_i, i = 1, 2,
§7. Remarks and Generalizations
    L_i = \frac12\,A_i(y)\,\frac{d^2}{dy^2} + B_i(y)\,\frac{d}{dy}.
The process Y_t on \Gamma is governed by the operator L_1 on the upper part of the graph \Gamma (i = 1), and by L_2 on the lower part (i = 2). The operator L_2 degenerates at the extreme point of the lower edge of \Gamma, so that this point is inaccessible for the process Y_t. To define the process Y_t for all t \ge 0 in a unique way, one should add the gluing conditions at the points (1, H(F)) identified with (2, H(B)) and at (1, H(E)) identified with (2, H(A)). These gluing conditions define the behavior of the process at the vertices. As we have already seen, the gluing conditions describe the domain of definition of the generator for the process Y_t. Let Y_t be the process on \Gamma such that its generator is defined for functions f(i, y), (i, y) \in \Gamma, which are continuous on \Gamma, smooth on \{(i, y) \in \Gamma : y \ge H(F), i = 1\} and on \{(i, y) \in \Gamma : y \le H(A), i = 2\}, and satisfy L_1 f(1, H(F)) = L_2 f(2, H(B)) and L_1 f(1, H(E)) = L_2 f(2, H(A)). Obviously, these gluing conditions have the form described earlier in this chapter with one of the \beta_{ki} = 0 for k = 1, 2. The generator coincides with L_1 at the points of the set \{(i, y) \in \Gamma : y > H(F), i = 1\} and with L_2 at the points of \{(i, y) \in \Gamma : y < H(A), i = 2\}. The limiting process Y_t on \Gamma is determined by these conditions in a unique way. We have seen that the graphs associated with Hamiltonian systems in R^2 always have the structure of a tree.

2. If we consider dynamical systems in
the phase space of a more complicated topological structure, the corresponding graph, on which the slow motion of the perturbed system should be considered, can have loops. Consider, for example, the dynamical system on the two-dimensional torus T^2 with Hamiltonian H(x) defined as follows. Let the torus be imbedded in R^3 with the symmetry axis of the torus parallel to a plane \Pi (see Fig. 27(a)), and let H(x) be equal to the distance of a point x \in T^2 from \Pi. The trajectories of the system then lie in the planes parallel to \Pi as shown in Fig. 27(a). The set of connected components of the level sets of the Hamiltonian H is homeomorphic to the graph \Gamma shown in Fig. 27(b). This graph has a loop. One can describe the slow motion for the diffusion process resulting from small white noise perturbations of the dynamical system in the same way as in the case of the systems in R^2. The dynamical system of this example is not a generic Hamiltonian system on T^2. Generic Hamiltonian systems on two-dimensional tori were studied in Arnold [2] and Sinai and Khanin [1]. In particular, it was shown there that a generic dynamical system on T^2 preserving the area has the following structure. There are finitely many loops in T^2 such that inside each loop the system behaves as a Hamiltonian system in a finite part of the plane. Each trajectory lying in the exterior of all the loops is dense in the exterior. These trajectories form one ergodic class. There are two such loops in the example shown in Fig. 28.
Figure 27.

Figure 28.
The Hamiltonian H of such a system on T^2 is multivalued, but its gradient \nabla H(x), x \in T^2, is a smooth vector field on the torus. Consider small white noise perturbations of such a system:
    \dot{\tilde X}^\epsilon_t = \bar\nabla H(\tilde X^\epsilon_t) + \epsilon\,\dot W_t,\qquad X^\epsilon_t = \tilde X^\epsilon_{t/\epsilon^2}.
To study the slow component of the perturbed process, we should identify some points of the phase space. First, as we did before, we identify all the points of each periodic trajectory. But here we have an additional identification: since the trajectories of the dynamical system outside the loops are dense in the exterior of the loops, all the points of the exterior should be glued together. The set of all the connected components of the level sets of the Hamiltonian, provided with the natural topology, is homeomorphic in this case to a graph \Gamma having the following structure: the exterior of the loops corresponds to one vertex O_0. The number of branches connected with O_0 is equal to the number of loops. Each branch describes the set of connected components of the Hamiltonian level sets inside the corresponding loop. For example, the left branch of the graph \Gamma shown in Fig. 28(c), consisting of the edges O_0O_2, O_2O_1, O_2O_3, describes the set of connected components inside the loop \gamma_1. Let H_1, H_2, \ldots, H_N be the values of the Hamiltonian function on the loops \gamma_1, \ldots, \gamma_N. Denote by H(x), x \in T^2, the function equal to zero outside the loops and equal to H(x) - H_k inside the kth loop, k = 1, \ldots, N. Let all the edges of \Gamma be indexed by the numbers 1, \ldots, n. The pairs (k, y), where k is the number of the edge containing a point z \in \Gamma and y is the value of H(x) on the level set component corresponding to z, form a coordinate system on \Gamma. Consider the mapping Y : T^2 \to \Gamma, Y(x) = (k(x), H(x)). Then one can expect that the processes Y^\epsilon_t = Y(X^\epsilon_t), 0 \le t \le T, T < \infty, converge weakly in the space of continuous functions on [0, T] with values in \Gamma to a diffusion process Y_t on \Gamma. One can prove, in the same way as in the case of Hamiltonian systems in R^2, that the limiting process is governed inside the edges by the operators obtained by averaging with respect to the invariant density of the fast motion.
The gluing conditions at all the vertices besides O_0 have the same form as before. But the gluing condition at O_0 is special. The uniform distribution on T^2 is invariant for any process X^\epsilon_t, \epsilon > 0. Thus, the time each of them spends in the exterior of the loops is proportional to the area of the exterior. Therefore, the limiting process Y_t spends a positive amount of time at the vertex O_0, which corresponds to the exterior of the loops. The process Y_t spends zero time at all other vertices. To calculate the gluing conditions at O_0, assume that the Hamiltonian H(x) inside each loop has just one extremum (as inside the loop \gamma_2 in Fig. 28). Since the gluing conditions at O_0 are determined by the behavior of the process X^\epsilon_t in a neighborhood of the loops, we can assume this without loss of generality. Let the area of the torus T^2 be equal to one, and the area of the exterior of the loops be equal to \alpha. Denote by \mu the measure on \Gamma such that \mu\{O_0\} = \alpha, (d\mu/dH)(k, H) = T_k(H) inside the kth edge, where
    T_k(H) = \int_{C_k(H)} |\nabla H(x)|^{-1}\,dl,
where C_k(H) is the component of the level set C(H) = \{x \in T^2 : H(x) = H\} inside the kth loop. It is easy to check that T_k(H) = |dS_k(H)/dH|, where S_k(H) is the area of the domain bounded by C_k(H). Taking this into account, one can check that \mu is the normalized invariant measure for the processes Y^\epsilon_t = Y(X^\epsilon_t) on \Gamma for any \epsilon > 0, and it is invariant for the limiting process Y_t as well.
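The identity T_k(H) = |dS_k(H)/dH| is easy to check on a model loop. Take H(x) = (x_1^2 + x_2^2)/2 (our model, not the torus Hamiltonian itself): the level curve is a circle of radius sqrt(2H), S(H) = 2\pi H, and both sides equal 2\pi:

```python
import math

def period(H_level, n=20_000):
    # T(H) = contour integral of dl / |grad H| over C(H), for the model
    # Hamiltonian H(x) = (x1**2 + x2**2)/2: C(H) is a circle of radius
    # r = sqrt(2 H), on which |grad H| = r.
    r = math.sqrt(2.0 * H_level)
    dl = 2.0 * math.pi * r / n
    return sum(dl / r for _ in range(n))

def area(H_level):
    # S(H): area enclosed by C(H); here S = pi * r**2 = 2 * pi * H.
    return math.pi * 2.0 * H_level

def dS_dH(H_level, h=1e-6):
    # Central difference for |dS/dH|.
    return (area(H_level + h) - area(H_level - h)) / (2.0 * h)
```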
On the other hand, a diffusion process on \Gamma with the generator A has an invariant measure \mu if and only if
    \int_\Gamma A u(k, y)\,d\mu = 0    (7.3)
for any continuous function u(z), z \in \Gamma, that is smooth at the interior points of the edges and such that Au(z) is also continuous. Taking into account that the operator A for the process Y_t coincides with the differential operator
    L_k = \frac{1}{2\,T_k(H)}\,\frac{d}{dH}\Big( a_k(H)\,\frac{d}{dH} \Big),\qquad a_k(H) = \int_{C_k(H)} |\nabla H(x)|\,dl,
inside any edge I_k of the graph \Gamma, we conclude from (7.3) that the measure \mu is invariant for the process Y_t only if
    \alpha\,A u(O_0) = \frac12 \sum_{k=1}^N (\pm1)\,a_k(0)\,\frac{du}{dH}(k, O_0).    (7.4)
Here (du/dH)(k, O_0) means the derivative of the function u(z), z \in \Gamma, along the kth edge I_k \ni O_0 at the point O_0. The sign "+" should be taken in the kth term if H(x) < 0 on I_k, and the sign "-" if H(x) > 0 on I_k. Equality (7.4) gives us the gluing condition at the vertex O_0.

3. Consider an oscillator with one degree of freedom,
    \ddot p_t + f(p_t) = 0.    (7.5)
Let the force f(p) be a smooth generic function, F(p) = \int_0^p f(z)\,dz. Assume that \lim_{|p|\to\infty} F(p) = \infty. One can introduce the Hamiltonian H(p, q) = \tfrac12 q^2 + F(p) of system (7.5) and rewrite (7.5) in the Hamiltonian form:
    \dot p_t = \frac{\partial H}{\partial q}(p_t, q_t) = q_t,\qquad \dot q_t = -\frac{\partial H}{\partial p}(p_t, q_t) = -f(p_t).    (7.6)
If we denote by x the point (p, q) of the phase space, equations (7.6) have the form \dot x_t = \bar\nabla H(x_t). Consider now perturbations of (7.5) by white noise:
    \ddot p^\epsilon_t + f(p^\epsilon_t) = \epsilon\,\dot W_t.    (7.7)
Here W_t is the Wiener process in R^1, 0 < \epsilon \ll 1. ... One can expect that, as \epsilon \to 0, the processes Y^\epsilon_t = H(p^\epsilon_{t/\epsilon^2}, q^\epsilon_{t/\epsilon^2}) converge weakly in the space of continuous functions on [0, T] to the diffusion process Y_t on [0, \infty) corresponding to the operator
    L = \frac12\,A(y)\,\frac{d^2}{dy^2} + B(y)\,\frac{d}{dy},
    A(y) = \frac{1}{T(y)}\int_{C(y)} \frac{H_q^2(p, q)}{|\nabla H(p, q)|}\,dl,\qquad B(y) = \frac{1}{2\,T(y)}\int_{C(y)} \frac{H_{qq}(p, q)}{|\nabla H(p, q)|}\,dl,    (7.10)
where T(y) = \int_{C(y)} dl/|\nabla H(p, q)| is the period of the nonperturbed oscillations with the energy H(p, q) = y. Here, as before, C(y) = \{(p, q) \in R^2 : H(p, q) = y\}; dl is the length element on C(y). The point 0 is inaccessible for the process Y_t on [0, \infty). The proof of this convergence can be carried out in the same way as in the nondegenerate case. Using the Gauss formula, one can check that
    \frac{d}{dy}\big[ A(y)\,T(y) \big] = 2\,B(y)\,T(y).
Thus,
    L = \frac{1}{2\,T(y)}\,\frac{d}{dy}\Big( a(y)\,\frac{d}{dy} \Big),
where a(y) = A(y)\,T(y). Now, since H(p, q) = \tfrac12 q^2 + F(p), the contour C(y) can be described as follows.
    C(y) = \{(p, q) \in R^2 : F(p) \le y,\ q = \pm\sqrt{2(y - F(p))}\}.
For a generic F the level sets may consist of several connected components; let \Gamma be the corresponding graph and Y the projection onto it. Then the process Z^\epsilon_t = Y(p^\epsilon_{t/\epsilon^2}, q^\epsilon_{t/\epsilon^2}) on \Gamma converges weakly in the space of continuous functions on [0, T] with values in \Gamma to a Markov diffusion process Z_t as \epsilon \downarrow 0. The limiting process Z_t is governed by the operator
    L_k = \frac{1}{2\,S_y(k, y)}\,\frac{d}{dy}\Big( S(k, y)\,\frac{d}{dy} \Big)
inside the edge I_k \subset \Gamma, k = 1, \ldots, n, and by the gluing conditions at the vertices: a function g(z), z \in \Gamma, bounded, continuous on \Gamma, and smooth inside the edges belongs to the domain of definition of the generator A of the limiting process Z_t if and only if Ag is continuous on \Gamma, and at any interior vertex O_\ell of \Gamma (corresponding to a saddle point of H(p, q)) the following equality holds:
    S_{i_1}(O_\ell)\,\frac{d}{dy}g_{i_1}(O_\ell) + S_{i_2}(O_\ell)\,\frac{d}{dy}g_{i_2}(O_\ell) = S_{i_3}(O_\ell)\,\frac{d}{dy}g_{i_3}(O_\ell).
Here S_{i_k}(O_\ell) = \lim_{z\to O_\ell,\, z \in I_{i_k}} S(z), k = 1, 2, 3; we assume that H(Y^{-1}(z)) < H(Y^{-1}(O_\ell)) for z \in I_{i_1} \cup I_{i_2} and H(Y^{-1}(z)) > H(Y^{-1}(O_\ell)) for z \in I_{i_3}. The operators L_k, k = 1, \ldots, n, and the gluing conditions define the limiting process Z_t in a unique way.
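In the harmonic case f(p) = p (so F(p) = p^2/2) everything above is computable in closed form: C(y) is the circle p^2 + q^2 = 2y, T(y) = 2\pi, the enclosed area is S(y) = 2\pi y, and a(y) = A(y)T(y) = \oint H_q^2/|\nabla H|\,dl coincides with S(y), so that S_y = T. A numerical consistency sketch under these model assumptions (function names ours):

```python
import math

def contour_quantities(y, n=20_000):
    """For the harmonic case H(p, q) = (p**2 + q**2)/2, integrate over the
    circle C(y) of radius r = sqrt(2 y):
    T(y) = integral of dl/|grad H|, a(y) = integral of H_q**2/|grad H| dl,
    with H_q = q and |grad H| = r on the circle."""
    r = math.sqrt(2.0 * y)
    dl = 2.0 * math.pi * r / n
    T = a = 0.0
    for i in range(n):
        phi = 2.0 * math.pi * (i + 0.5) / n
        q = r * math.sin(phi)
        T += dl / r
        a += q * q / r * dl
    return T, a
```

Both T(y) = 2\pi and a(y) = S(y) = 2\pi y come out of the quadrature, confirming that the averaged operator reduces to (1/(2 S_y)) d/dy (S d/dy) here.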
The proof of this theorem is carried out, in general, using the same scheme as in the case of nondegenerate perturbations. The most essential difference arises in the proof of the Markov property (step 4 in the plan described in §8.1). We use here a priori bounds of Hormander type instead of Krylov and Safonov [1]. The detailed proof of Theorem 7.1 can be found in Freidlin and Weber [1]. The degenerate perturbations of a general Hamiltonian system with one degree of freedom are considered in that paper as well.

4. Consider the diffusion process (X_t, P_x) in R^r corresponding to the operator
    L = \frac12 \sum_{i,j=1}^r A^{ij}(x)\,\frac{\partial^2}{\partial x^i\,\partial x^j} + \sum_{i=1}^r B^i(x)\,\frac{\partial}{\partial x^i}.
... the behavior of the process after touching the vertices should be described. One can prove that the exterior vertices are inaccessible. To calculate the gluing conditions at the interior vertices, one can, sometimes, use the same approach as we used before. Assume that the processes in R^r corresponding to the operators L and L_1 have the same invariant density M(x). Then the density M(x) is invariant
for the process (X^\epsilon_t, P_x) for any \epsilon > 0. Define the measure \bar\mu on \Gamma as the projection of the measure \mu with density M(x): \bar\mu(\gamma) = \mu(Y^{-1}(\gamma)), where \mu(D) = \int_D M(x)\,dx, D \subset R^r, and \gamma is a Borel set in \Gamma. Then the measure \bar\mu will be invariant for the processes Y^\epsilon_t, \epsilon > 0, and for the limiting process Y_t. Now, if we know that Y_t is a continuous Markov process on \Gamma, the gluing conditions at a vertex O_k are described by a set of nonnegative constants \alpha_k, \beta_{k1}, \ldots, \beta_{k\ell}, where \ell is the number of edges connected with O_k. These constants can be chosen in a unique way so that the process Y_t has the prescribed invariant measure. For example, one can use such an approach if
    L = \tilde L + B(x)\cdot\nabla,\qquad L_1 = \tilde L_1 + b(x)\cdot\nabla,
where \tilde L and \tilde L_1 are self-adjoint operators and the vector fields B(x) and b(x) are divergence free. The Lebesgue measure is invariant for L and L_1 in this case.
Note that if the process (X_t, P_x) degenerates into a dynamical system, then, to repeat our construction and arguments, we should assume that the dynamical system has a unique invariant measure on each connected component C_k(y). This assumption, in the case when the dimension of the manifolds C_k(y) is greater than 1, is rather restrictive. One can expect that the convergence to a diffusion process on the graph still holds under a less restrictive assumption, namely, that the dynamical system on C_k(y) has a unique invariant measure stable with respect to small random perturbations.
5. The nonperturbed process (X_t, P_x) (or the dynamical system) in R^r governed by the operator L may have several smooth first integrals H_1(x), ..., H_ℓ(x); ℓ < r. Let

C(y) = \{x \in R^r : H_1(x) = y_1, \ldots, H_\ell(x) = y_\ell\}, \qquad y = (y_1, \ldots, y_\ell) \in R^\ell.

The set C(y) may be empty for some y ∈ R^ℓ. Let C(y) consist of n(y) connected components: C(y) = \bigcup_{k=1}^{n(y)} C_k(y). Assume that at least one of the first integrals, say H_1(x), tends to +∞ as |x| → ∞. Then each C_k(y) is compact.
8. Random Perturbations of Hamiltonian Systems
A point x ∈ R^r is called nonsingular if the matrix (∂H_i(x)/∂x_j), 1 ≤ i ≤ ℓ, 1 ≤ j ≤ r, has the maximal rank ℓ. Assume that the nonperturbed process at a nonsingular point x ∈ R^r is nondegenerate if considered on the manifold C(H_1(x), ..., H_ℓ(x)). This means that

\sum_{i,j} A^{ij}(x)\, e_i e_j \geq a(x) |e|^2, \qquad a(x) > 0,

for any e = (e_1, ..., e_r) such that e · ∇H_k(x) = 0, k = 1, ..., ℓ. Then the diffusion process X_t on each C_k(y) consisting of nonsingular points has a unique invariant measure. Let M_y^k(x), y ∈ R^ℓ, x ∈ C_k(y), be the density of that measure. The collection of all connected components of the level sets C(y), y ∈ R^ℓ, provided with the natural topology, is homeomorphic to a set Γ consisting of glued ℓ-dimensional pieces. The interior points of these pieces correspond to the nonsingular components C_k(y).
Define the mapping Y : R^r → Γ: Y(x), x ∈ R^r, is the point of Γ corresponding to the connected component of C(H_1(x), ..., H_ℓ(x)) containing x. One can expect that the stochastic processes Y_t^ε = Y(X_t^ε), where X_t^ε corresponds to L^ε = (1/ε²)L + L_1, converge weakly (in the space of continuous functions on [0, T] with values in Γ, provided with the uniform topology) to a diffusion process Y_t on Γ as ε ↓ 0. Inside an ℓ-dimensional piece γ_k ⊂ Γ, the matrix (∂H_i(x)/∂x_j) has rank ℓ, and the values of the first integrals H_1(x), ..., H_ℓ(x) can be used as coordinates. The process Y_t in these coordinates is governed by an operator

L_k = \frac{1}{2} \sum_{i,j=1}^{\ell} a_k^{ij}(y) \frac{\partial^2}{\partial y_i \partial y_j} + \sum_{i=1}^{\ell} b_k^i(y) \frac{\partial}{\partial y_i}.
The coefficients a_k^{ij}(y) and b_k^i(y) can be calculated by the averaging procedure with respect to the density M_y^k(x):

a_k^{ij}(y) = \int_{C_k(y)} \sum_{p,q=1}^{r} A_1^{pq}(x)\, \frac{\partial H_i(x)}{\partial x_p}\, \frac{\partial H_j(x)}{\partial x_q}\, M_y^k(x)\, dx,

b_k^i(y) = \int_{C_k(y)} L_1 H_i(x)\, M_y^k(x)\, dx,

where (A_1^{pq}(x)) is the diffusion matrix of the operator L_1.
To determine the limiting process Y_t for all t ≥ 0, one should supplement the operators L_k, governing the process inside the ℓ-dimensional pieces, by gluing conditions in the places where several ℓ-dimensional pieces are glued together. These gluing conditions should be, in a sense, similar to the boundary conditions for the multidimensional process (see Wentzell [8]).
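A heuristic indication of where the averaged coefficients come from (our own sketch, not the argument of the text): since the H_i are first integrals of the process governed by L, one has LH_i = 0 and A∇H_i = 0, so Itô's formula applied to H_i(X_t^ε) gives

```latex
dH_i(X_t^\epsilon)
  = \big(\epsilon^{-2} L H_i + L_1 H_i\big)(X_t^\epsilon)\,dt + dM_t^i
  = L_1 H_i(X_t^\epsilon)\,dt + dM_t^i,
\qquad
d\langle M^i, M^j \rangle_t
  = \big(A_1 \nabla H_i, \nabla H_j\big)(X_t^\epsilon)\,dt,
```

where A_1 is the diffusion matrix of L_1: the singular term ε^{-2}LH_i and the ε^{-2}-contribution to the brackets both vanish. Averaging the remaining slow terms over the invariant density M_y^k(x) on C_k(y) yields the expressions for b_k^i(y) and a_k^{ij}(y).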
§7. Remarks and Generalizations
Figure 29.
To conclude this section, we would like to mention one more asymptotic problem leading to a process on a graph.
6. Consider the Wiener process X_t^ε in the domain G^ε ⊂ R^r shown in Fig. 29(a), with normal reflection on the boundary. The domain G^ε consists of a ball of radius ρ^ε and of three cylinders of radii εr_1, εr_2, εr_3, respectively. Let the axes of these cylinders intersect at the center O of the ball. It is natural to expect that the projection of the process X_t^ε on the "skeleton" Γ of the domain G^ε, shown in Fig. 29(b), converges to a stochastic process on Γ as ε ↓ 0. One can check that, inside each edge of the graph, the limiting process will be the one-dimensional Wiener process. Assuming that near the points A, B, C the boundary of G^ε consists of a piece of the plane orthogonal to the axis of the corresponding cylinder, it is easy to see that the limiting process has instantaneous reflection at the points A, B, C ∈ Γ. To determine the limiting process for all t ≥ 0, we should describe its behavior at the vertex O.

The gluing condition at O depends on the relation between ρ^ε and ε. If ε max_k r_k ≤ ρ^ε and ρ^ε ≫ ε^{(r-1)/r}, then the point O is a trap for the process: after touching O, the trajectory stays there forever. The gluing condition in this case is u''(O) = 0, independently of the edge along which this derivative is calculated. If lim_{ε↓0} ρ^ε ε^{-(r-1)/r} = κ < ∞, then the domain of the generator of the limiting process consists of the continuous functions u(x), x ∈ Γ, for which u''(x) is also continuous and
\sum_{i=1}^{3} r_i^{r-1} \frac{du}{dy_i}(O) = \kappa^r \pi^{1/2} \frac{\Gamma\left(\frac{r+1}{2}\right)}{2\,\Gamma\left(\frac{r+2}{2}\right)}\, u''(O).
The limiting process in this case spends a positive time at O. This problem and some other problems leading to processes on graphs were considered in Freidlin and Wentzell [4] and in Freidlin [21].
Chapter 9
Stability Under Random Perturbations
§1. Formulation of the Problem

In the theory of ordinary differential equations, much work is devoted to the study of the stability of solutions with respect to small perturbations of the initial conditions or of the right side of an equation. In this chapter we consider some problems concerning stability under random perturbations. First we recall the basic notions of classical stability theory. Let the dynamical system

\dot x_t = b(x_t)     (1.1)

in R^r have an equilibrium position at the point 0: b(0) = 0. The equilibrium position 0 is said to be stable (Lyapunov stable) if for every neighborhood U_1 of 0 there exists a neighborhood U_2 of 0 such that the solutions of equation (1.1) with initial condition x_0 = x ∈ U_2 do not leave U_1 for positive t. If, in addition, lim_{t→∞} x_t = 0 for trajectories issued from points x_0 = x sufficiently close to 0, then the equilibrium position 0 is said to be asymptotically stable.

With stability with respect to perturbations of the initial conditions there is closely connected the problem of stability under continuously acting perturbations. To clarify the meaning of this problem, along with equation (1.1) we consider the equation

\dot X_t = b(X_t) + \zeta_t,     (1.2)
where ζ_t is a bounded continuous function on the half-line [0, ∞) with values in R^r. The problem of stability under continuously acting perturbations can be formulated in the following way: under what conditions on the field b(x) does the solution of problem (1.2) with initial condition X_0 = x converge to the constant solution X_t ≡ 0, uniformly on [0, ∞), as |x - 0| + sup_{0≤t<∞} |ζ_t| → 0? It can be proved that if the equilibrium position is stable, in a sufficiently strong sense, with respect to small perturbations of the initial conditions, then it is also stable under continuously acting perturbations.
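This classical statement is easy to observe numerically; a small sketch of our own (the linear field b(x) = -x and the perturbation below are illustrative choices, not taken from the text):

```python
import math

def sup_deviation(x0, zeta, T=50.0, dt=1e-3):
    # Euler scheme for X' = -X + zeta(t); the equilibrium 0 of X' = -X is
    # asymptotically stable, so small bounded zeta keeps X uniformly small.
    x, t, sup_x = x0, 0.0, abs(x0)
    while t < T:
        x += dt * (-x + zeta(t))
        t += dt
        sup_x = max(sup_x, abs(x))
    return sup_x

for delta in [0.1, 0.01, 0.001]:
    # |x0| and sup |zeta_t| both equal delta; then sup_t |X_t| <= delta here
    s = sup_deviation(x0=delta, zeta=lambda t: delta * math.sin(3.0 * t))
    print(delta, s)
```

The printed suprema decrease proportionally with delta: stability with respect to the initial conditions carries over to stability under small bounded continuously acting perturbations.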
For example, if the equilibrium position is asymptotically stable, i.e.,

\lim_{t \to \infty} x_t = 0

uniformly in x_0 = x belonging to some neighborhood of 0, then 0 is also stable with respect to small continuously acting perturbations (cf. Malkin [1]).
The situation changes essentially if we omit the condition of boundedness of ζ_t on [0, ∞). In this case the solutions of equation (1.2) may, in general, leave any neighborhood of the equilibrium position even if the equilibrium position is stable with respect to perturbations of the initial conditions in the strongest sense. Actually, the very notion of smallness of continuously acting perturbations needs to be adjusted in this case.

Now let ζ_t in (1.2) be a random process: ζ_t = ζ_t(ω). If the perturbations ζ_t(ω) are uniformly small in probability, i.e., sup_{0≤t<∞} |ζ_t(ω)| → 0 in probability, then the situation is not different from the deterministic case: if the point 0 is stable for system (1.1) in a sufficiently strong sense, then sup_{0≤t<∞} |X_t| → 0 in probability as well. Nevertheless, the assumption that ζ_t converges to zero uniformly on the whole half-line [0, ∞) is too stringent in a majority of problems. An assumption of the following kind is more natural: sup_t M|ζ_t|² or some other characteristic of the process ζ_t converges to zero; this makes large values of |ζ_t| unlikely at every fixed time t, but allows the functions |ζ_t(ω)| to assume large values at some moments of time depending on ω. Under assumptions of this kind, the trajectories of the process X_t may, in general, leave any neighborhood of the equilibrium position sooner or later. For example, if ζ_t is a stationary Gaussian process, then the trajectories of X_t have arbitrarily large deviations from the equilibrium position with probability one, regardless of how small M|ζ_t|² = a > 0 is.
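A simulation sketch of this effect (our own illustration, with the hypothetical choices ẋ_t = -x_t + ζ_t and ζ_t an Ornstein-Uhlenbeck process of small stationary variance a):

```python
import numpy as np

# Our own illustration (not from the text): x' = -x + zeta_t, where zeta_t is a
# stationary Gaussian (Ornstein-Uhlenbeck) process with M zeta_t^2 = a small.
rng = np.random.default_rng(0)
dt, n = 0.01, 500_000                       # time horizon T = n * dt = 5000
a = 0.05 ** 2
noise = np.sqrt(2 * a * dt) * rng.standard_normal(n)
zeta = x = m = 0.0
running_max = []
for k in range(n):
    zeta += -zeta * dt + noise[k]           # OU noise of stationary variance a
    x += (-x + zeta) * dt
    m = max(m, abs(x))
    if k + 1 in (n // 100, n // 10, n):     # record the max over growing horizons
        running_max.append(m)
print(running_max)
```

Even though the noise stays small at every fixed time, the recorded maximal deviation keeps growing as the horizon grows, in line with the statement above.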
To clarify the character of the problems involving stability under random perturbations to be discussed in this chapter, we consider an object whose state can be described by a point x ∈ R^r. We assume that in the absence of random perturbations, the evolution of this object can be described by equation (1.1), and random perturbations ζ_t(ω) lead to an evolution which can be described by the equation

\dot X_t = b(X_t, \zeta_t).     (1.3)

As ζ_t, we also allow some generalized random processes for which equation (1.3) is meaningful (there exists a unique solution of equation (1.3) for any
initial point X_0 = x ∈ R^r). For example, we shall consider the case where b(x, y) = b(x) + σ(x)y and ζ_t is a white noise process or the derivative of a Poisson process.
We assume that there exists a domain D ⊂ R^r such that as long as the phase point characterizing the state of our object belongs to D, the object does not undergo essential changes, and when the phase point leaves D, the object gets destroyed. Such a domain D will be called a critical domain. Let X_t^x be the solution of equation (1.3) with initial condition X_0^x = x. We introduce the random variable τ^x = min{t : X_t^x ∉ D}, the time elapsed until the destruction of the object. As the measure of stability of the system with respect to random perturbations ζ_t relative to the domain D, it is natural to choose various theoretical probability characteristics of τ^x.¹ Hence if we are interested in our object only over a time interval [0, T], then as the measure of instability of the system we may choose P{τ^x < T}. If the time interval is not given beforehand, then stability can be characterized by Mτ^x. In cases where the sojourn of the phase point outside D does not lead to destruction of the object but is only undesirable, as the measure of stability
we may choose the value of the invariant measure of the process for the complement of D (if there exists an invariant measure). This measure will characterize the fraction of time spent by the trajectory X_t^x outside D. However, a precise calculation of these theoretical probability characteristics is possible only rarely, mainly when X_t^x turns out to be a Markov process or a component of a Markov process in a higher dimensional space. As is known, in the Markov case the probabilities and mean values under consideration are solutions of some boundary value problems for appropriate equations. Even in those cases where equations and boundary conditions can be written down, it is not at all simple to obtain useful information from them. So it is of interest to determine various kinds of asymptotics of the mentioned theoretical probability characteristics as one parameter or another, occurring in the equation, converges to zero. In a wide class of problems we may assume that the intensity of the noise is small in some sense compared with the deterministic factors determining the evolution of the system. Consequently, a small parameter appears in the
problem. If 0 is an asymptotically stable equilibrium position of the unperturbed system, then exit from a domain D takes place due to small random perturbations, in spite of the deterministic constituents. As we know, in such situations estimates of P{τ^x < T} and Mτ^x are given by means of an action functional. Although in this way we can only calculate the rough, logarithmic, asymptotics, this is sufficient in many problems. For example, the logarithmic asymptotics enables us to compare various critical

¹ We note that τ^x may turn out to be equal to ∞ with positive probability. If the random perturbations vanish at the equilibrium position 0 itself, then P{τ^x < ∞} may converge to 0 as x → 0. For differential equations with perturbations vanishing as the equilibrium position is approached, a stability theory close to the classical theory can be created (cf. Khas'minskii [1]).
domains and various vector fields b(x, y), and also enables us to solve some problems of optimal stabilization. This approach is developed in the article by Wentzell and Freidlin [5]. We introduce several different formulations of stability problems and outline methods of their solution. Let X_t^h, t ≥ 0, be the family of random processes in R^r obtained as a result of small random perturbations of system (1.1); the probabilities and mathematical expectations corresponding to a given value of the parameter and a given initial point will be denoted by P_x, M_x, respectively. Let λ(h)S_{0T}(φ) be the action functional for the family of processes X_t^h with respect to the metric ρ_{0T}(φ, ψ) = sup_{0≤t≤T} |φ_t - ψ_t|. For any γ > 0 we choose T > 0 such that for any control function α ∈ Π and any x_0 ∈ D there exists a function φ_t, 0 ≤ t ≤ T, such that φ_0 = x_0, φ_t leaves D for some t ∈ [0, T], and S_{0T}(φ) ≤ V_0 + γ. Then we prove the lemma below.
Lemma 2.1. For any α ∈ Π ∩ 𝔄 we have

P_x^{h,\alpha}\{\tau_D < T + 1\} \geq \exp\{-h^{-1}(V_0 + 2\gamma)\}     (2.5)

for sufficiently small h and all x ∈ D.
The proof can be carried out as that of part (a) of Theorem 4.1 of Ch. 4: the function φ_t is extended beyond D; we use the lower estimate, given by Theorem 2.1 of Ch. 5, of the probability of passing through a tube. The only difference is that the hypotheses of Theorem 2.1 of Ch. 5 are not satisfied in the neighborhood of the points where α(x) is discontinuous. Nevertheless, the corresponding segment of the function φ_t can be replaced by a straight line segment, and the lower estimate is provided by the following lemma.
Lemma 2.2. Let X_t^{h,α} be a family of locally infinitely divisible processes with infinitesimal generators of the form (2.1), and let the corresponding function H(x, α(x), a), along with its first and second derivatives with respect to a, be bounded in any finite domain of variation of a (but it may be arbitrarily discontinuous as a function of x). Then for any γ > 0 and any δ > 0 there exists ρ_0 > 0 such that

P_x^{h,\alpha}\{\rho_{0,t_0}(X^{h,\alpha}, \varphi) < \delta\} \geq \exp\{-\gamma h^{-1}\}     (2.6)

for sufficiently small h > 0 and for all x, y such that |y - x| ≤ ρ_0.

It follows that A is empty. The first part of the theorem is proved. For the proof of the second part it is sufficient to apply Theorem 4.3 of Ch. 5 to the functions H(x, α(x), a) and L(x, α(x), β), continuous for x ≠ x_0, where α(x) = ᾱ(x, ∇V(x)).
Now we assume that there exists a function ᾱ(x, a) as mentioned in the hypotheses of Theorem 2.2, that for every x_0 ∈ D there exists a function V(x) = V_{x_0}(x) satisfying the hypotheses of Theorem 2.2, and that for every x_0 the function α^{x_0}(x) = ᾱ(x, ∇V_{x_0}(x)) belongs to the class 𝔄 of admissible controls. These conditions imply, in particular, that for any two points x_0, x_1 ∈ D sufficiently close to each other, there exists a control function α(x) such that there is a "most probable" trajectory from x_1 to x_0, a solution of the equation ẋ_t = b(x_t, α(x_t)). We introduce the additional requirement that 0 be an isolated point of the set {a : H(x, a) = 0} for every x. Then x_0 can be reached from x_1 over a finite time, and it is easy to prove that the same remains true for arbitrary x_0, x_1 ∈ D, not only for points close to each other.
Theorem 2.3. Let the conditions just formulated be satisfied. We choose a point x_0 for which min_{y∈∂D} V_{x_0}(y) attains its maximum (equal to V_0). We choose the control function α(x) in the following way: in the set

B_{x_0} = \{x : V_{x_0}(x) < \min_{y \in \partial D} V_{x_0}(y)\}

we put α(x) = ᾱ(x, ∇V_{x_0}(x)); for the remaining x we define α(x) in an arbitrary way, only ensuring that α(x) is continuous and that from any point of D the solution of the system ẋ_t = b(x_t, α(x_t)) reaches the set B_{x_0} for positive t, remaining in D. Then

\lim_{h \downarrow 0} h \ln M_x^{h,\alpha} \tau_D = V_0     (2.10)

for any x ∈ D.
Together with Theorem 2.1, this theorem means that the function ᾱ is a solution of our optimal stabilization problem. The proof can be carried out in the following way: for the points of B_{x_0}, the function V_{x_0}(x) is a Lyapunov function for the system ẋ_t = b(x_t, α(x_t)). This, together with the structure of α(x) for the remaining x, shows that x_0 is the unique stable equilibrium position, which attracts the trajectories issued from points of the domain. Now (2.10) follows from Theorem 4.1 of Ch. 4, generalized as indicated in §4, Ch. 5.
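Logarithmic asymptotics of the type lim h ln Mτ_D = V_0 can be observed in a Monte Carlo experiment; a sketch with our own illustrative choices (not from the text): dX_t = -X_t dt + √h dw_t on D = (-1, 1), for which the quasipotential is V(x) = x², so V_0 = 1:

```python
import numpy as np

def mean_exit_time(h, n_paths=200, dt=5e-3):
    # dX = -X dt + sqrt(h) dW started at 0; tau_D = exit time from D = (-1, 1).
    rng = np.random.default_rng(1)
    x = np.zeros(n_paths)
    tau = np.zeros(n_paths)
    alive = np.ones(n_paths, dtype=bool)
    s, elapsed = np.sqrt(h * dt), 0.0
    while alive.any():
        elapsed += dt
        x[alive] += -x[alive] * dt + s * rng.standard_normal(alive.sum())
        exited = alive & (np.abs(x) >= 1.0)
        tau[exited] = elapsed
        alive &= ~exited
    return tau.mean()

results = {}
for h in [0.5, 0.4, 0.3]:
    m = mean_exit_time(h)
    results[h] = h * np.log(m)      # should slowly approach V_0 = 1 as h -> 0
    print(h, round(m, 2), round(results[h], 3))
```

Only the rough, logarithmic, behavior is captured this way; the prefactor of Mτ_D is invisible at this level, which is exactly the sense of the statements above.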
§3. Examples

At the end of Chapter 5 we considered the example of calculating the quasipotential and the asymptotics of the mean exit time from a neighborhood of a stable equilibrium position, and of the invariant measure, for the family of one-dimensional processes jumping a distance h to the right and to the left with probabilities h^{-1}r(x) dt and h^{-1}l(x) dt over time dt. By the same token, we determined the characteristics of the stability of an equilibrium position. This example admits various interpretations; in particular, the process of division and death of a large number of cells (cf. §2, Ch. 5). The problem of stability of an equilibrium position of such a system, i.e., the problem of determining the time over which the number of cells remains below a given level, may be of great interest, especially if we consider, for example, the number of cells of a certain kind in the blood system of an organism rather than the number of bacteria in a culture.

Another concrete interpretation of the same scheme is a system consisting of a large number N of elements, which go out of work independently of each other after an exponential time of service. These elements then start to be repaired; the maintenance time is exponential with coefficient μ depending on the ratio of the elements having gone out of work. The change of this ratio x with time is a process of the indicated form with h = N^{-1}, r(x) = (1 - x)λ (λ is the coefficient of the distribution of the time of service) and l(x) = xμ(x). A nonexponential maintenance time or a nonexponential time between subsequent cell divisions leads to other schemes; cf. Freidlin [8] and Levina, Leontovich, and Pyatetskii-Shapiro [1]. We consider examples of optimal stabilization.

EXAMPLE 3.1. Let the family of controlled processes have the same structure
at all points; in other words, let the function H and the set of admissible controls at a given point be independent of x:

H(x, \alpha, a) \equiv H(\alpha, a); \qquad \Pi(x) \equiv \Pi.
In this case the function H(x, a) is also independent of x:

H(x, a) = H(a) = \inf_{\alpha \in \Pi} H(\alpha, a),

and equation (2.7) turns into

H(\nabla V(x)) = 0.     (3.1)

The function H(a) vanishes for only one vector in every direction (except a = 0). The locus of the terminal points of these vectors will be denoted by A. Equation (3.1) may be rewritten in the form ∇V(x) ∈ A.
Equation (3.1) has an infinite set of solutions: the r-dimensional planes of the form V_{α_0,x_0}(x) = (α_0, x - x_0), α_0 ∈ A, but none of them is a solution of problem R_{x_0} for the equation. To find this solution, we note that the solution planes depend on the (r-1)-dimensional parameter α_0 ∈ A and on a one-dimensional parameter, the scalar product (α_0, x_0). This family is a complete integral of equation (3.1). As is known (cf. Courant [1], p. 111), the envelope of any (r-1)-parameter family of these solutions is also a solution. If for a fixed x_0, the family of planes V_{α_0,x_0}(x) has a smooth envelope, then this envelope is the desired solution V_{x_0}(x) of problem R_{x_0}.
This solution is in any case given by a conic surface (i.e., V_{x_0}(x) is a positively homogeneous function of degree one in x - x_0); it is convenient to define it by means of its (r-1)-dimensional level surface

U_1 = \{x - x_0 : V_{x_0}(x) = 1\}

(it is independent of x_0). The surface U_1 has one point β_0 on every ray emanating from the origin. We shall see how to find this point. Let the generator of the cone V_{x_0}(x) corresponding to a point β_0 be a line of tangency of the cone with the plane V_{α_0,x_0} (α_0 is determined uniquely, because the direction of this vector is given: it is the direction of the exterior normal to U_1 at β_0). The intersection with the horizontal plane at height 1 is the (r-1)-dimensional plane {β : (α_0, β) = 1}, which is tangent to U_1 at β_0 (Fig. 31).

Figure 31.

The point of this plane which is the closest to the origin is situated at distance |α_0|^{-1} in the same direction from the origin as α_0, i.e.,
it is the point ι(α_0) obtained from α_0 by inversion with respect to the unit sphere with center at 0. Hence to find the level surface U_1, we have to invert the surface A; through every point of the surface ι(A) thus obtained we draw the plane orthogonal to the corresponding radius, and we take the envelope of these planes. This geometric transformation does not always lead to a smooth convex surface: "corners" or cuspidal edges may appear. Criteria may be given for the smoothness of U_1 in terms of the centers of curvature of the original surface A.
If the surface U_1 turns out to be smooth (continuously differentiable), then the solution of the optimal control problem may be obtained in the following way. In D we inscribe the largest figure homothetic to U_1 with positive coefficient of homothety c, i.e., the figure x_0 + cU_1, where x_0 is a point of D (not defined in a unique way, in general). Inside this figure, the optimal control field α(x) is defined in the following way: we choose the point β_0 ∈ U_1 situated on the ray in the direction of x - x_0; we determine the corresponding point α_0 of A; as α(x) we choose that value of the controlling parameter in Π for which min_{α∈Π} H(α, α_0) is attained (this minimum is equal to H(α_0) = 0; we assume that the minimum is attained and that it depends continuously on α_0 ∈ A).

Consequently, the value of the control parameter α(x) is constant on every radius issued from x_0. At points x lying outside x_0 + cU_1, the field α(x) may be defined almost arbitrarily; it only has to drive all points inside the indicated surface. The mean exit time is logarithmically equivalent to exp{ch^{-1}} as h ↓ 0.
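The envelope construction of U_1 can be carried out numerically; a sketch in the plane with an illustrative choice of A of our own (an ellipse, for which the resulting envelope can be checked in closed form). For each α(t) ∈ A, the envelope point β solves (α(t), β) = 1, (α'(t), β) = 0:

```python
import numpy as np

# Hypothetical example: A = {(p cos t, q sin t)}; U_1 is then the envelope of
# the lines (alpha, beta) = 1, alpha in A, computed pointwise by solving the
# 2x2 system (alpha(t), beta) = 1, (alpha'(t), beta) = 0.
p, q = 2.0, 0.5
ts = np.linspace(0.0, 2 * np.pi, 400, endpoint=False)
envelope = []
for t in ts:
    alpha = np.array([p * np.cos(t), q * np.sin(t)])
    dalpha = np.array([-p * np.sin(t), q * np.cos(t)])
    beta = np.linalg.solve(np.vstack([alpha, dalpha]), np.array([1.0, 0.0]))
    envelope.append(beta)
envelope = np.array(envelope)
# For an ellipse A the envelope is again an ellipse, with semi-axes 1/p, 1/q:
err = np.max(np.abs((p * envelope[:, 0]) ** 2 + (q * envelope[:, 1]) ** 2 - 1.0))
print(err)   # numerically zero
```

Here the inversion of A is visible in the result: the semi-axes of U_1 are the reciprocals of those of A, in agreement with the inversion-plus-envelope description above.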
In Wentzell and Freidlin [5], a special case of this example was considered, where the subject was the control of a diffusion process with small diffusion by means of the choice of the drift.

EXAMPLE 3.2. We consider a dynamical system perturbed by a small white noise; it can be subjected to control effects whose magnitude is under our control but which themselves may contain a noise. The mathematical model of the situation is as follows:

\dot X_t = b(X_t) + \epsilon\sigma \dot w_t + \alpha(X_t)[\bar b + \epsilon\bar\sigma \dot{\tilde w}_t],

where w_t, \tilde w_t are independent Wiener processes and α(x) is a control function, which is chosen within the limits of the set Π(x). We consider the simplest case: the process is one-dimensional; σ, σ̄, and b̄ are positive constants; b(x) is a continuous function; D is the interval (x_1, x_2); and the controlling parameter α(x) varies within the limits

\pm\Big[2\bar b^{-1} \max_{[x_1, x_2]} |b(x)| + \sigma\bar\sigma^{-1}\Big]

at every point (or within the limits of any larger segment Π(x)).
We have

H(x, \alpha, a) = (b(x) + \alpha\bar b)a + \tfrac{1}{2}(\sigma^2 + \alpha^2\bar\sigma^2)a^2;

H(x, a) = \min_{\alpha} H(x, \alpha, a) = b(x)a - \frac{\bar b^2}{2\bar\sigma^2} + \tfrac{1}{2}\sigma^2 a^2

for |a| \geq \big[2\bar b^{-2}\bar\sigma^2 \max_{[x_1, x_2]} |b(x)| + \sigma\bar\sigma\bar b^{-1}\big]^{-1}. The minimum here is attained for α = -b̄σ̄^{-2}a^{-1}. The function H(x, a) vanishes for

a = a_1(x) = -\frac{b(x)}{\sigma^2} - \sqrt{\frac{b(x)^2}{\sigma^4} + \frac{\bar b^2}{\sigma^2\bar\sigma^2}} < 0

and for

a = a_2(x) = -\frac{b(x)}{\sigma^2} + \sqrt{\frac{b(x)^2}{\sigma^4} + \frac{\bar b^2}{\sigma^2\bar\sigma^2}} > 0,

and also for a = 0, which is, by the way, immaterial for the determination of the optimal quasipotential (it can be proved easily that |a_1(x)| and |a_2(x)| surpass the indicated boundary for |a|). Problem R_{x_0} for the equation
H(x, V'_{x_0}(x)) = 0 reduces to the equation

V'_{x_0}(x) = a_1(x) \text{ for } x_1 < x < x_0, \qquad V'_{x_0}(x) = a_2(x) \text{ for } x_0 < x < x_2,

with the additional condition lim_{x→x_0} V_{x_0}(x) = 0. It is easy to see that min(V_{x_0}(x_1), V_{x_0}(x_2)) attains its largest value for an x_0 for which V_{x_0}(x_1) and V_{x_0}(x_2) are equal to each other. This leads to the following equation for x_0:
\int_{x_1}^{x_0} \left[\sqrt{b(x)^2 + \frac{\sigma^2\bar b^2}{\bar\sigma^2}} + b(x)\right] dx
  = \int_{x_0}^{x_2} \left[\sqrt{b(x)^2 + \frac{\sigma^2\bar b^2}{\bar\sigma^2}} - b(x)\right] dx,
which has a unique solution. After finding the optimal equilibrium position x_0, we can determine the optimal control according to the formula

\alpha(x) = -\bar b\bar\sigma^{-2} a_1(x)^{-1} \text{ to the left of } x_0, \qquad \alpha(x) = -\bar b\bar\sigma^{-2} a_2(x)^{-1} \text{ to the right of } x_0.
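The equation for x_0 is easy to solve numerically; a sketch with our own illustrative data (b(x) = -x, σ = σ̄ = b̄ = 1, (x_1, x_2) = (-1, 2)), using the fact that the difference of the two sides is increasing in x_0, so bisection applies:

```python
import numpy as np

# Illustrative data (not from the text).
sigma, sigma_bar, b_bar = 1.0, 1.0, 1.0
x1, x2 = -1.0, 2.0
b = lambda x: -x                      # drift of the uncontrolled system

def root(x):
    # sqrt(b(x)^2 + sigma^2 b_bar^2 / sigma_bar^2), the common square root
    return np.sqrt(b(x) ** 2 + sigma ** 2 * b_bar ** 2 / sigma_bar ** 2)

def integral(f, a, c, n=4001):        # trapezoidal rule on [a, c]
    xs = np.linspace(a, c, n)
    ys = f(xs)
    return (ys.sum() - 0.5 * (ys[0] + ys[-1])) * (c - a) / (n - 1)

def imbalance(x0):
    # V_{x0}(x1) - V_{x0}(x2), up to a common positive factor; increasing in x0
    left = integral(lambda x: root(x) + b(x), x1, x0)
    right = integral(lambda x: root(x) - b(x), x0, x2)
    return left - right

lo, hi = x1, x2
for _ in range(60):                   # bisection for the balancing point
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if imbalance(mid) < 0.0 else (lo, mid)
x0 = 0.5 * (lo + hi)
print(x0)                             # here approximately 1.3
```

At the computed x_0 the two exit "costs" V_{x_0}(x_1) and V_{x_0}(x_2) coincide, which is exactly the optimality condition stated above.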
Chapter 10
Sharpenings and Generalizations
§1. Local Theorems and Sharp Asymptotics

In Chapters 3, 4, 5 and 7 we established limit theorems on large deviations involving the rough asymptotics of probabilities of the type P{X^h ∈ A}. There arises the following question: Is it possible to obtain subtler results for families of random processes, similar to those obtained for sums of independent random variables: local limit theorems on large deviations and theorems on sharp asymptotics? There is some work in this direction; we give a survey of the results in this section.

If X_t^h is a family of random processes, a local theorem on large deviations may involve the asymptotics, as h ↓ 0, of the density p_t^h(y) of the distribution of the value of the process being considered at time t, where the point y is different from the "most probable" value of x(t) for small h (we take into account that the density is not defined uniquely in general; we have to indicate that we speak of, for example, the continuous version of the density). We may prove local theorems involving the joint density p^h_{t_1,...,t_n}(y_1, ..., y_n) of the random variables X^h_{t_1}, ..., X^h_{t_n}. We may also consider the asymptotics of the density of the distribution of a functional F(X^h) at points different from the "most probable" value of F(x(·)).

However, obtaining these kinds of results for a large class of functionals F and families of random processes X^h is unrealistic for the time being, at least because we need results on the existence of a continuous density of the distribution of F(X^h) beforehand. For the same reason, positive results in the area of local theorems on large deviations involving densities of values of a process at separate moments of time are up to now restricted to families of diffusion processes, for which the problem of the existence of a density is well studied and solved in the affirmative, under insignificant restrictions, in the case of a nonsingular diffusion matrix.
The transition probability density p^h(t, x, y) for the value of a diffusion process X_t^h under the assumption that X_0^h = x has the meaning of the fundamental solution of the corresponding parabolic differential equation (or that of the Green's function for the corresponding problem in the case of a diffusion process in a domain with attainable boundary). This provides a base both for an additional method of study of the density p^h and for the area of possible applications.
We begin with Friedman's work [1] because its results are connected more directly with what has been discussed (although it appeared a little later than Kifer's work [1] containing stronger results). Let p^h(t, x, y) be the fundamental solution of the equation ∂u/∂t = L^h u in the r-dimensional space, where

L^h u = \frac{h}{2} \sum_{i,j} a^{ij}(x) \frac{\partial^2 u}{\partial x^i \partial x^j} + \sum_i b^i(x) \frac{\partial u}{\partial x^i};     (1.1)

in other words, p^h(t, x, y) is the continuous version of the transition probability density of the corresponding diffusion process (X_t^h, P_x^h). (Here it is more convenient to denote the small parameter in the diffusion matrix by h rather than ε², as had been done beginning with Ch. 4.) We put

V(t, x, y) = \min\{S_{0t}(\varphi) : \varphi_0 = x,\ \varphi_t = y\},     (1.2)
where h^{-1}S_{0t} is the action functional for the family of processes (X_t^h, P_x^h). The functional S_{0t} is given, as we know, by the formula

S_{0t}(\varphi) = \frac{1}{2} \int_0^t \sum_{i,j} a_{ij}(\varphi_s)(\dot\varphi_s^i - b^i(\varphi_s))(\dot\varphi_s^j - b^j(\varphi_s))\, ds,     (1.3)

where (a_{ij}(x)) = (a^{ij}(x))^{-1}. It can be proved that

\lim_{h \downarrow 0} h \ln p^h(t, x, y) = -V(t, x, y).     (1.4)
This rough local theorem on large deviations is obtained by applying the rough (integral) theorems of Wentzell and Freidlin [4] and estimates from Aronson [1]:

\frac{A_1}{(ht)^{r/2}} \exp\left\{-\frac{c_1 |y - x(t, x)|^2}{ht}\right\}
  \leq p^h(t, x, y)
  \leq \frac{A_2}{(ht)^{r/2}} \exp\left\{-\frac{c_2 |y - x(t, x)|^2}{ht}\right\}

for sufficiently small t, where the A_i and c_i are positive constants and x(t, x) is the solution of the system ẋ = b(x) with initial condition x(0, x) = x.
is the solution of the system r = b(x) with initial condition x(0, x) = x. Further, if (X;', PX) is the diffusion process corresponding to Lh in a domain D with smooth boundary OD, vanishing upon reaching the boundary,
then its transition probability density q''(t, x, y) is the Green's function for
§1. Local Theorems and Sharp Asymptotics
379
the equation au/at = Lhu with boundary condition u = 0 on aD. It is proved that V0(t, x, y),
lim h In qh(t, x, y)
(1.5)
h10
where VD(t, x, y) is defined as the infimum of SoT(cp) for curves cp connecting
x and y over time t, without leaving D. This implies in particular that lim qh(t, x, y) = 0
(1.6)
hIoP (t,x,y)
if all extremals for which the minimum (1.2) is attained leave D U OD. The following opposite result can also be proved: if all extremals pass inside D, then qh(t, x, 1)
= 1. lim h1oPh(t,x,y)
(According to M. Kac's terminology, the process X_t^h "does not feel" the boundary for small h.)

In the scheme being considered we may also include, in a natural way (and almost precisely), questions involving the asymptotics, for small values of time, of the transition density of a diffusion process (of the fundamental solution of a parabolic equation) not depending on a parameter. Indeed, let p(t, x, y) be the transition probability density of the diffusion process (X_t, P_x) corresponding to the elliptic operator

Lu = \frac{1}{2} \sum_{i,j} a^{ij}(x) \frac{\partial^2 u}{\partial x^i \partial x^j} + \sum_i b^i(x) \frac{\partial u}{\partial x^i}.
We consider the family of diffusion processes X_t^h = X_{th}, h > 0. The transition density p(h, x, y) of X_t over time h is equal to the transition density p^h(1, x, y) of X_t^h over time 1. The process (X_t^h, P_x) is governed by the differential operator

L^h u = hLu = \frac{h}{2} \sum_{i,j} a^{ij}(x) \frac{\partial^2 u}{\partial x^i \partial x^j} + h \sum_i b^i(x) \frac{\partial u}{\partial x^i}.     (1.8)
The drift coefficients h b^i(x) converge to zero as h ↓ 0, so that the family of operators (1.8) is almost the same as the family (1.1) with the b^i(x) replaced by zero. In any event, the action functional may be written out without difficulty: it has the form h^{-1}S_{0t}(φ), where

S_{0t}(\varphi) = \frac{1}{2} \int_0^t \sum_{i,j} a_{ij}(\varphi_s)\, \dot\varphi_s^i \dot\varphi_s^j\, ds.     (1.9)
The problem of finding the minimum of this functional can be solved in terms of the following Riemannian metric ρ(x, y) connected with the matrix (a_{ij}):

\rho(x, y) = \min_{\varphi_0 = x,\ \varphi_1 = y} \int_0^1 \left[\sum_{i,j} a_{ij}(\varphi_s)\, \dot\varphi_s^i \dot\varphi_s^j\right]^{1/2} ds.
It is easy to prove that the minimum of the functional (1.9) over all parametrizations φ_s, 0 ≤ s ≤ t, of a given curve is equal to the square of the Riemannian length of the curve multiplied by (2t)^{-1}. Consequently, the minimum (1.2) is equal to (2t)^{-1}ρ(x, y)². Application of formula (1.4) yields

\lim_{h \downarrow 0} h \ln p(h, x, y) = -\tfrac{1}{2}\rho(x, y)^2.     (1.10)
Accordingly, for the transition density of the diffusion process which vanishes on the boundary of D we obtain

\lim_{h \downarrow 0} h \ln q(h, x, y) = -\tfrac{1}{2}\rho_D(x, y)^2,     (1.11)
where ρ_D(x, y) is the infimum of the Riemannian lengths of the curves connecting x and y without leaving D. There are also results, corresponding to (1.6), (1.7), involving the ratio of the densities q and p for small values of the time argument. In particular, the "principle of not feeling the boundary" assumes the form

\lim_{t \downarrow 0} \frac{q(t, x, y)}{p(t, x, y)} = 1     (1.12)

if all shortest geodesics connecting x and y lie entirely in D. The results (1.10)-(1.12) were obtained in Varadhan [2], [3] (the result (1.12) in the special case of a Wiener process was obtained in Ciesielski [1]). In Kifer [1], [3] and Molchanov [1] sharp versions of these results were obtained. In obtaining them, an essential role is played by the corresponding
rough results (it is insignificant whether in local or integral form): they provide an opportunity to exclude from consideration all but a neighborhood of an extremal (or extremals) of the action functional; after this, the problem becomes local. The sharp asymptotics of the transition probability density from x to y turn out to depend on whether x and y are conjugate or not on an extremal connecting them (cf. Gel'fand and Fomin [1]). In the case where x and y are nonconjugate and the coefficients of the operator are smooth, not only can the asymptotics of the density be obtained up to equivalence, but an asymptotic
381
§1. Local Theorems and Sharp Asymptotics
expansion in powers of the small parameter can also be obtained. For the family of processes corresponding to the operators (1.1) we have

$$p^h(t,x,y) = (2\pi ht)^{-r/2}\exp\{-h^{-1}V(t,x,y)\}\big[K_0(t,x,y) + hK_1(t,x,y) + \cdots + h^mK_m(t,x,y) + o(h^m)\big] \tag{1.13}$$
as $h\downarrow 0$ (Kifer [3]); for the process with generator (1.8) we have

$$p(t,x,y) = (2\pi t)^{-r/2}\exp\{-\rho(x,y)^2/2t\}\big[K_0(x,y) + tK_1(x,y) + \cdots + t^mK_m(x,y) + o(t^m)\big] \tag{1.14}$$
as $t\downarrow 0$ (Molchanov [1]). Methods of probability theory are combined with analytic methods in these publications. We outline the proof of the expansion (1.13) in the simplest case, for $m = 0$, i.e., the proof of the existence of the finite limit

$$\lim_{h\downarrow 0}(2\pi ht)^{r/2}\,p^h(t,x,y)\exp\{h^{-1}V(t,x,y)\}. \tag{1.15}$$
Let the operator $L^h$ have the form

$$L^hu = \frac{h}{2}\,\Delta u + \sum_i b^i(x)\,\frac{\partial u}{\partial x^i}. \tag{1.16}$$

The process corresponding to this operator may be given by means of the stochastic equation

$$\dot X_s^h = b(X_s^h) + h^{1/2}\dot w_s, \qquad X_0^h = x, \tag{1.17}$$
where $w_s$ is an $r$-dimensional Wiener process. Along with $X_s^h$, we consider the diffusion process $Y_s^h$, nonhomogeneous in time, given by the stochastic equation

$$\dot Y_s^h = \dot\varphi_s + h^{1/2}\dot w_s, \qquad Y_0^h = x, \tag{1.18}$$

where $\varphi_s$, $0\le s\le t$, is an extremal of the action functional from $x$ to $y$ over time $t$ (we assume that it is unique). The density of the distribution of $Y_t^h$ can be written out easily: it is the density of the normal distribution with mean $\varphi_t = y$ and covariance matrix $htE$; at the point $y$ it is equal to $(2\pi ht)^{-r/2}$. The ratio of the probability densities of $X_t^h$ and $Y_t^h$ is equal to the limit of the ratio

$$P\{X_t^h\in y+D\}\,/\,P\{Y_t^h\in y+D\} \tag{1.19}$$
as the diameter of the neighborhood $D$ of the origin of coordinates converges to zero. We use the fact that the measures in $C_{0t}(R^r)$, corresponding to the random processes $X_s^h$ and $Y_s^h$, are absolutely continuous with respect to each other; the density has the form

$$\frac{d\mu_{X^h}}{d\mu_{Y^h}}(Y^h) = \exp\Big\{h^{-1/2}\int_0^t\big(b(Y_s^h)-\dot\varphi_s,\,dw_s\big) - (2h)^{-1}\int_0^t\big|b(Y_s^h)-\dot\varphi_s\big|^2\,ds\Big\}. \tag{1.20}$$

The absolute continuity enables us to express the probability of any event connected with $X^h$ in the form of an integral of a functional of $Y^h$; in particular,

$$P\{X_t^h\in y+D\} = M\Big\{Y_t^h\in y+D;\ \frac{d\mu_{X^h}}{d\mu_{Y^h}}(Y^h)\Big\}.$$
Expression (1.19) assumes the form

$$M\Big\{\exp\Big\{h^{-1/2}\int_0^t\big(b(Y_s^h)-\dot\varphi_s,\,dw_s\big) - (2h)^{-1}\int_0^t\big|b(Y_s^h)-\dot\varphi_s\big|^2ds\Big\}\ \Big|\ Y_t^h\in y+D\Big\}$$
$$= M\Big\{\exp\Big\{h^{-1/2}\int_0^t\big(b(\varphi_s+h^{1/2}w_s)-\dot\varphi_s,\,dw_s\big) - (2h)^{-1}\int_0^t\big|b(\varphi_s+h^{1/2}w_s)-\dot\varphi_s\big|^2ds\Big\}\ \Big|\ w_t\in h^{-1/2}D\Big\}$$

(we use the fact that $w_0 = 0$ and $Y_s^h = \varphi_s + h^{1/2}w_s$ for $0\le s\le t$). If $D$ (and together with it, $h^{-1/2}D$) shrinks to zero, we obtain the conditional mathematical expectation under the condition $w_t = 0$. Hence

$$p^h(t,x,y) = (2\pi ht)^{-r/2}\,M\Big\{\exp\Big\{h^{-1/2}\int_0^t\big(b(\varphi_s+h^{1/2}w_s)-\dot\varphi_s,\,dw_s\big) - (2h)^{-1}\int_0^t\big|b(\varphi_s+h^{1/2}w_s)-\dot\varphi_s\big|^2ds\Big\}\ \Big|\ w_t = 0\Big\}. \tag{1.21}$$
(Formula (1.21) was already obtained by Hunt [1].) To establish the existence of the limit (1.15) as $h\downarrow 0$, first of all we truncate the mathematical expectation (1.21), multiplying the exponential expression by the indicator of the event $\{\max_{0\le s\le t}|w_s|\le\ldots\}$ …

… it is sufficient to assume that in $\mathfrak M$ there exists a countable subset $\mathfrak M_0$ such that for every function $f\in\mathfrak M$ and every measure $\mu$ there exists a bounded sequence of elements $f_n$ of $\mathfrak M_0$, converging to $f$ almost everywhere with respect to $\mu$. We shall always assume that this condition is satisfied.

Now let us be given a family of random measures $\pi^h$, depending on a parameter $h$. For the sake of simplicity we assume that $h$ is a positive numerical parameter.
Our fundamental assumption consists in the following: there exists a function $\lambda(h)$, converging to $+\infty$ as $h\downarrow 0$, such that the finite limit

$$H(f) = \lim_{h\downarrow 0}\lambda(h)^{-1}\ln M\exp\{\lambda(h)\langle f,\pi^h\rangle\}$$

exists … for every $\delta > 0$, $\gamma > 0$, and every measure $\mu$,

$$P\{\rho(\pi^h,\mu) < \delta\} \ge \exp\{-\lambda(h)[S(\mu)+\gamma]\}, \tag{2.8}$$

$$P\{\rho(\pi^h,\Phi(s)) \ge \delta\} \le \exp\{-\lambda(h)[s-\gamma]\} \tag{2.9}$$

for sufficiently small $h$, where $\Phi(s) = \{\mu: S(\mu)\le s\}$.
Proof. First we obtain estimate (2.8). If $S(\mu) = +\infty$, there is nothing to prove; therefore, we assume that $S(\mu) < \infty$. We use condition (1) for $\delta_1 > 0$, and obtain the estimate

$$\rho(\pi^h,\mu) \le \rho_{\mathfrak M_1}(\pi^h,\mu) + \max\ldots \tag{2.11}$$
Applying Theorem 1.2 of Ch. 5 to the family of finite-dimensional vectors $\eta^h = (\langle f_1,\pi^h\rangle,\ldots)$, $h\downarrow 0$, and taking into account that in a finite-dimensional space all norms are equivalent, we obtain the estimate

$$P\{\rho_{\mathfrak M_1}(\pi^h,\mu) < \delta/2\} \ge \exp\{-\lambda(h)[S_{\mathfrak M_1}(\mu)+\gamma/2]\} \ge \exp\{-\lambda(h)[S(\mu)+\gamma/2]\}. \tag{2.12}$$
Now we estimate the second term in (2.11). The exponential Chebyshev inequality yields

$$P\{\ldots \ge \delta/4\} \le \exp\big\{-\lambda(h)\big[\varkappa - \lambda(h)^{-1}\ln M\exp\{\lambda(h)\ldots\}\big]\big\} \le \exp\{-\lambda(h)[S(\mu)+\gamma]\} \tag{2.13}$$

for sufficiently small $h > 0$. Estimate (2.8) follows from (2.11)-(2.13).

Now we deduce (2.9). We use condition (1) again. By virtue of B.2 we have $\max\ldots \le \delta/4$ for sufficiently small positive $\delta_1$ and for all $\mu\in\Phi(s)$. From this and (2.10) we conclude that

$$P\{\rho(\pi^h,\Phi(s)) \ge \delta\} \le P\{\rho_{\mathfrak M_1}(\pi^h,\Phi(s)) \ge \delta/2\} + P\{\max\ldots \ge \delta/2\},$$
$$P\{\rho_{\mathfrak M_1}(\pi^h,\Phi(s)) \ge \delta/2\} \le P\{\rho_{\mathfrak M_1}(\pi^h,\Phi_{\mathfrak M_1}(s)) \ge \delta/4\} \le \exp\{-\lambda(h)(s-\gamma/2)\}. \tag{2.14}$$
The second term in (2.14) can be estimated by means of (2.13), and we obtain estimate (2.9) for small $h$. $\square$

We consider an example. Let $(\xi_t, P_x)$ be a diffusion process on a compact manifold $E$ of class $C^{(\infty)}$, controlled by an elliptic differential operator $L$
§2. Large Deviations for Random Measures
with infinitely differentiable coefficients. Let us consider the family of random measures $\pi_T$ defined by formula (2.1). We verify that this family satisfies the condition that the limit (2.4) exists, where instead of $h\downarrow 0$ the parameter $T$ goes to $\infty$, and as the function $\lambda$ we take $T$: for any function $f\in B$, the finite limit

$$H(f) = \lim_{T\to\infty}T^{-1}\ln M_x\exp\Big\{\int_0^T f(\xi_s)\,ds\Big\} \tag{2.15}$$

exists. As in the proof of Theorem 4.2 of Ch. 7, for this we note that the family of operators

$$T_t^f g(x) = M_x\Big\{g(\xi_t)\exp\Big\{\int_0^t f(\xi_s)\,ds\Big\}\Big\}$$

forms a semigroup acting in $B$. If $g(x)$ is a nonnegative function belonging to $B$, assuming positive values on some open set, then the function $T_t^fg(x)$ is strictly positive for any $t > 0$. As in the proof of Theorem 4.2 of Ch. 7, we may deduce from this that the limit (2.15) exists and is equal to $\lambda(f)$, the logarithm of the spectral radius of $T_1^f$ (cf., for example, Kato [1]). The fulfillment of conditions A.1 and A.2 follows from results of perturbation theory. Moreover, it is easy to prove that the supremum in the definition of $S(\mu)$ may be taken not over all functions belonging to $B$ but only over continuous ones:

$$S(\mu) = \sup_{f\in C}\,[\langle f,\mu\rangle - \lambda(f)]. \tag{2.16}$$
For continuous functions $f$, the logarithm of the spectral radius of $T_1^f$ coincides with the eigenvalue of the infinitesimal generator $A^f$ of the semigroup $T_t^f$ having the largest real part; this eigenvalue is real and simple. For smooth functions, $A^f$ coincides with the elliptic differential operator $L + f$. This implies that by $\lambda(f)$ in formula (2.16) we may understand the maximal eigenvalue of $L + f$.

The random measure $\pi_T$ is a probability measure for every $T > 0$. According to B.1, the functional $S(\mu)$ is finite only for probability measures $\mu$. Since $\lambda(f+c) = \lambda(f) + c$ for any constant $c$, the supremum in (2.16) may be taken only over those $f\in C$ for which $\lambda(f) = 0$. On the other hand, the functions $f$ for which $\lambda(f) = 0$ are exactly those functions which can be represented in the form $-Lu/u$, $u > 0$. Consequently, for probability measures $\mu$, the definition of $S(\mu)$ may be rewritten in the form

$$S(\mu) = -\inf_{u>0}\Big\langle\mu,\ \frac{Lu}{u}\Big\rangle. \tag{2.17}$$
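A finite-state stand-in for (2.17) (an illustrative assumption: a two-state Markov chain with generator matrix $L$ in place of the diffusion) shows the characteristic behavior of the functional: it vanishes at the invariant measure and is strictly positive elsewhere. A direct search over $u$, normalized by $u^1 = 1$ since $Lu/u$ is invariant under scaling of $u$:

```python
# S(mu) = -inf_{u>0} <mu, Lu/u> for a two-state generator matrix L
# (rows sum to zero); illustrative finite-state analogue of (2.17).
a, b = 1.0, 2.0
L = [[-a, a], [b, -b]]

def s_functional(mu, grid):
    best = float("inf")
    for u2 in grid:
        u = [1.0, u2]                      # u normalized by u[0] = 1
        Lu = [sum(L[i][j] * u[j] for j in range(2)) for i in range(2)]
        best = min(best, sum(mu[i] * Lu[i] / u[i] for i in range(2)))
    return -best

grid = [10 ** (k / 100.0) for k in range(-300, 301)]   # u2 from 1e-3 to 1e3
mu_inv = [b / (a + b), a / (a + b)]        # invariant measure of L
print(s_functional(mu_inv, grid))          # approximately 0
print(s_functional([0.9, 0.1], grid))      # strictly positive
```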
For measures $\mu$ admitting a smooth positive density with respect to the Riemannian volume $m$ induced by the Riemannian metric connected with the principal terms of $L$, we may write out Euler's equation for an extremal of problem (2.17) and obtain an expression for $S(\mu)$ not containing the infimum. In particular, if $L$ is self-adjoint with respect to $m$, then the equation for an extremal can be solved explicitly: the infimum in (2.17) is attained for $u = \sqrt{d\mu/dm}$, and $S(\mu)$ can be written in the form

$$S(\mu) = -\int_E\sqrt{\frac{d\mu}{dm}}\ L\sqrt{\frac{d\mu}{dm}}\ dm.$$

The functional $S(\mu)$ may be calculated analogously for the measures $\pi_T$ in the case where $(\xi_t, P_x)$ is a process in a bounded domain with reflection at the boundary (Gärtner [2]).
§3. Processes with Small Diffusion with Reflection at the Boundary

In Chapters 2-6 we considered the application of methods of probability theory to the first boundary value problem for differential equations with a small parameter at the derivatives of highest order. There naturally arises the question of what can be done for the second boundary value problem or for a mixed boundary value problem, where Dirichlet conditions are given on some parts of the boundary and Neumann conditions on other parts.

Let the smooth boundary of a bounded domain $D$ consist of two components, $\partial_1D$ and $\partial_2D$. We shall consider the boundary value problem

$$L^\varepsilon u^\varepsilon(x) = \frac{\varepsilon^2}{2}\sum_{i,j}a^{ij}(x)\frac{\partial^2u^\varepsilon}{\partial x^i\,\partial x^j} + \sum_i b^i(x)\frac{\partial u^\varepsilon}{\partial x^i} = 0, \quad x\in D,$$
$$u^\varepsilon(x)\big|_{\partial_2D} = f(x), \qquad \frac{\partial u^\varepsilon(x)}{\partial l}\Big|_{\partial_1D} = 0, \tag{3.1}$$

where $\partial/\partial l$ is the derivative in some nontangential direction; the coefficients of the equation, the direction $l$, and the function $f$ are assumed to be sufficiently smooth functions of $x$. This problem has a unique solution, which can be written in the form $u^\varepsilon(x) = M_xf(X^\varepsilon_{\tau^\varepsilon})$, where $(X_t^\varepsilon, P_x)$ is the diffusion process in $D\cup\partial_1D$, governed by $L^\varepsilon$ in $D$, undergoing reflection in the direction $l$ on the part $\partial_1D$ of the boundary; $\tau^\varepsilon = \min\{t: X_t^\varepsilon\in\partial_2D\}$ (Fig. 32). The asymptotics of the solution $u^\varepsilon$ may be deduced from results involving
Figure 32.
the limit behavior of $X_t^\varepsilon$ for small $\varepsilon$. We begin with results of the type of laws of large numbers.

Along with $X_t^\varepsilon$, we consider the dynamical system $\dot x_t = b(x_t)$ in $D$, obtained from $X^\varepsilon$ for $\varepsilon = 0$. It follows from the results of Ch. 2 that, with probability close to one for small $\varepsilon$, the trajectory of $X^\varepsilon$ beginning at a point $x\in D$ is close to the trajectory $x_t(x)$ of the dynamical system until the time of exit of $x_t(x)$ to the boundary (if this time is finite). From this we obtain the following: if $x_t(x)$ goes to $\partial_2D$ sooner than to $\partial_1D$ and at the place $y(x)$ of exit the field $b$ is directed strictly outside the domain, then the value $u^\varepsilon(x)$ of the solution of problem (3.1) at the given point $x\in D$ converges to $f(y(x))$ as $\varepsilon\to 0$.
If $X^\varepsilon$ begins at a point $x\in\partial_1D$ (or, moving near the trajectory of the dynamical system, reaches $\partial_1D$ sooner than $\partial_2D$), then its most probable behavior depends on whether $b(x)$ is directed strictly inside or outside the domain. In the first case, $X_t^\varepsilon$ will be close to the trajectory of the same dynamical system $\dot x_t = b(x_t)$ issued from $x\in\partial_1D$. From this we obtain the following result.

Theorem 3.1. Let the field $b$ be directed strictly inside $D$ on $\partial_1D$ and strictly outside $D$ on $\partial_2D$. Let all trajectories of the dynamical system $\dot x_t = b(x_t)$, beginning at points $x\in D\cup\partial_1D$, exit from $D$ (naturally, through $\partial_2D$). Then the solution $u^\varepsilon(x)$ of problem (3.1) converges to the solution $u^0(x)$ of the degenerate problem

$$\sum_i b^i(x)\frac{\partial u^0}{\partial x^i} = 0,\quad x\in D, \qquad u^0(x)\big|_{\partial_2D} = f(x), \tag{3.2}$$
uniformly for $x\in D\cup\partial_1D$ as $\varepsilon\to 0$.

If $b$ is directed strictly outside $D$ on $\partial_1D$, then the trajectory of the dynamical system goes out of $D\cup\partial_1D$ through $\partial_1D$, and $X_t^\varepsilon$ is deprived of the opportunity of following it. It turns out that in this case $X_t^\varepsilon$ begins to move along a trajectory of the system $\dot x_t = \tilde b(x_t)$ on $\partial_1D$, where $\tilde b$ is obtained by projecting $b$, parallel to the direction $l$, onto the tangential direction. This result also admits a formulation in the language of partial differential equations, but in a situation different from the case of a boundary consisting of two components considered here (cf. Freidlin [2]).

In the case where the trajectories of the dynamical system do not leave $D$ through $\partial_2D$ but instead enter $D$ through $\partial_2D$, results of the type of laws of large numbers are not sufficient for the determination of $\lim_{\varepsilon\to 0}u^\varepsilon(x)$. A study of large deviations for diffusion processes with reflection and of the asymptotics of solutions of the corresponding boundary value problems was carried out in Zhivoglyadova and Freidlin [1] and Anderson and Orey [1]. We present the results of these publications briefly.
We discuss the construction of Anderson and Orey [1], enabling us to construct a diffusion process with reflection by means of stochastic equations, beginning with a Wiener process. We restrict ourselves to the case of a process in the half-plane $R^2_+ = \{(x^1,x^2): x^1\ge 0\}$ with reflection along the normal direction at the boundary, i.e., along the $x^1$-direction. We introduce a mapping $\Gamma$ of $C_{0T}(R^2)$ into $C_{0T}(R^2_+)$: for $\xi\in C_{0T}(R^2)$ we define the value $\eta_t = \Gamma(\xi)_t$ by the equalities

$$\eta_t^1 = \xi_t^1 - \min\Big(0,\ \min_{0\le s\le t}\xi_s^1\Big), \qquad \eta_t^2 = \xi_t^2. \tag{3.3}$$

…

§4. Wave Fronts in Semilinear PDEs and Large Deviations

… if $0\le g(x)\le 1$, the solution never exceeds 1. But, as we show later, one can prove using the large deviation estimates that $\lim_{t\to\infty}u(t,x) = 1$ for any $x$, where $u(t,x)$ is the solution of (4.4).
Equation (4.4) appeared first in Fisher [1] and in Kolmogorov, Petrovskii, and Piskunov [1]. The rigorous results for nonlinearities $f(u) = c(u)u$ satisfying (4.3) were obtained in the second of these papers; therefore, this nonlinearity is called KPP-type nonlinearity. It was shown in Kolmogorov, Petrovskii, and Piskunov [1] that the solution of problem (4.4) with a KPP-type nonlinear term behaves for large $t$, roughly speaking, as a running wave:

$$u(t,x) \approx v(x-\alpha^*t), \qquad t\to\infty,$$

where $\alpha^* = \sqrt{2Dc}$, $c = f'(0)$, and $v(z)$ is the unique solution of the problem

$$\frac{D}{2}v''(z) + \alpha^*v'(z) + f(v(z)) = 0, \qquad -\infty < z < \infty,$$
$$v(-\infty) = 1, \quad v(+\infty) = 0, \quad v(0) = \tfrac{1}{2}.$$
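The asymptotic speed $\alpha^* = \sqrt{2Dc}$ can be observed in a direct finite-difference simulation of the unscaled equation with $D = 1$ and $f(u) = u(1-u)$, so $c = f'(0) = 1$ and $\alpha^* = \sqrt2$ (a rough numerical sketch; the measured speed approaches $\sqrt2$ slowly and from below):

```python
# Finite-difference check of the front speed alpha* = sqrt(2 D c) for
# u_t = (D/2) u_xx + u(1 - u)  (D = 1, c = f'(0) = 1, alpha* = sqrt(2)).
import math

D, dx, dt, L, T = 1.0, 0.2, 0.01, 100.0, 30.0
n = int(L / dx) + 1
u = [1.0 if i * dx < 5.0 else 0.0 for i in range(n)]   # step initial datum

def front(u):
    # linearly interpolated position where u crosses the level 1/2
    for i in range(n - 1):
        if u[i] >= 0.5 > u[i + 1]:
            return dx * (i + (u[i] - 0.5) / (u[i] - u[i + 1]))
    return 0.0

positions = {}
for k in range(1, int(T / dt) + 1):
    lap = [0.0] * n
    for i in range(1, n - 1):
        lap[i] = (u[i - 1] - 2.0 * u[i] + u[i + 1]) / dx ** 2
    for i in range(1, n - 1):
        u[i] += dt * (0.5 * D * lap[i] + u[i] * (1.0 - u[i]))
    t = k * dt
    if abs(t - 15.0) < dt / 2 or abs(t - 30.0) < dt / 2:
        positions[round(t)] = front(u)

speed = (positions[30] - positions[15]) / 15.0
print(speed, math.sqrt(2.0))   # measured speed, below but near sqrt(2)
```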
Thus, the asymptotic behavior of $u(t,x)$ for large $t$ is characterized by the asymptotic speed $\alpha^*$ and by the shape of the wave $v(z)$. Instead of considering large $t$ and $x$, one can introduce a small parameter $\varepsilon > 0$: put $u^\varepsilon(t,x) = u(t/\varepsilon, x/\varepsilon)$. Then the function $u^\varepsilon(t,x)$ is the solution of
the problem

$$\frac{\partial u^\varepsilon(t,x)}{\partial t} = \frac{\varepsilon D}{2}\,\frac{\partial^2u^\varepsilon}{\partial x^2} + \frac{1}{\varepsilon}f(u^\varepsilon), \qquad u^\varepsilon(0,x) = \chi^-(x), \tag{4.6}$$

where $\chi^-$ denotes the indicator function of the negative half-line. One can expect that $u^\varepsilon(t,x) = u(t/\varepsilon, x/\varepsilon) \approx v((x-\alpha^*t)/\varepsilon) \approx \chi^-(x-\alpha^*t)$ as $\varepsilon\downarrow 0$. This means that the first approximation has to do only with the asymptotic speed $\alpha^*$ and does not involve the asymptotic shape $v(z)$. Consider now a generalization of problem (4.6) (see Freidlin [12], [14], [17], [18]):
at'
=
e
2
'
ague
t>0,
1
> a'J(x) aaxl + e f (x, uE),
xeR',
i.j=1
uE(0,x) = g(x) > 0.
A number of interesting problems, like the small diffusion asymptotics for reaction-diffusion equations (RDEs) with slowly changing coefficients, can be reduced to (4.7) after a proper time and space rescaling.

We assume that the coefficients $a^{ij}(x)$ and $f(x,u)$ are smooth enough and that

$$a^{-1}\sum_{i=1}^r\lambda_i^2 \le \sum_{i,j=1}^r a^{ij}(x)\lambda_i\lambda_j \le a\sum_{i=1}^r\lambda_i^2$$

for some $0 < a < \infty$; the function $f(x,u) = c(x,u)u$, where $c(x,u)$ satisfies conditions (4.3) for any $x\in R^r$ (in other words, $f(x,u)$ is of KPP type for any $x\in R^r$). Denote by $G_0$ the support of the initial function $g(x)$. We assume that $0\le g(x)\le\bar g < \infty$ and that the closure $[G_0]$ of $G_0$ coincides with the closure of the interior $(G_0)$ of $G_0$: $[(G_0)] = [G_0]$. Moreover, assume that $g(x)$ is continuous in $(G_0)$. Denote by $(X_t^\varepsilon, P_x)$ the diffusion process in $R^r$ governed by the operator
$$L^\varepsilon = \frac{\varepsilon}{2}\sum_{i,j=1}^r a^{ij}(x)\,\frac{\partial^2}{\partial x^i\,\partial x^j}.$$
The Feynman-Kac formula yields an equation for $u^\varepsilon(t,x)$:

$$u^\varepsilon(t,x) = M_x\,g(X_t^\varepsilon)\exp\Big\{\frac{1}{\varepsilon}\int_0^t c\big(X_s^\varepsilon,\,u^\varepsilon(t-s,X_s^\varepsilon)\big)\,ds\Big\}. \tag{4.8}$$

First, assume that $c(x,0) = c = \text{const}$. Then, taking into account that $c(x,u)\le c(x,0) = c$, we derive from (4.8)

$$0 \le u^\varepsilon(t,x) \le M_x\,g(X_t^\varepsilon)\,e^{ct/\varepsilon} \le \bar g\,e^{ct/\varepsilon}\,P_x\{X_t^\varepsilon\in G_0\}. \tag{4.9}$$
The action functional for the family of processes $X_s^\varepsilon$, $0\le s\le t$, in the space of continuous functions is equal to $\varepsilon^{-1}S_{0t}(\phi)$, where

$$S_{0t}(\phi) = \frac{1}{2}\int_0^t\sum_{i,j=1}^r a_{ij}(\phi_s)\,\dot\phi_s^i\,\dot\phi_s^j\,ds$$

for absolutely continuous $\phi$, $(a_{ij}(x)) = (a^{ij}(x))^{-1}$, and

$$\lim_{\varepsilon\downarrow 0}\varepsilon\ln P_x\{X_t^\varepsilon\in G_0\} = -\inf\{S_{0t}(\phi): \phi\in C_{0t},\ \phi_0 = x,\ \phi_t\in G_0\}. \tag{4.10}$$
Denote by $\rho(x,y)$ the Riemannian distance in $R^r$ corresponding to the form $ds^2 = \sum_{i,j=1}^r a_{ij}(x)\,dx^i\,dx^j$. One can check that the infimum in (4.10) is actually equal to $(1/2t)\rho^2(x,G_0)$. We derive from (4.9) and (4.10)

$$0 \le u^\varepsilon(t,x) \le \bar g\,e^{ct/\varepsilon}P_x\{X_t^\varepsilon\in G_0\} \approx \exp\Big\{\frac{1}{\varepsilon}\Big(ct - \frac{\rho^2(x,G_0)}{2t}\Big)\Big\}.$$

It follows from the last bound that

$$\lim_{\varepsilon\downarrow 0}u^\varepsilon(t,x) = 0, \qquad \text{if } \rho(x,G_0) > t\sqrt{2c},\ t>0. \tag{4.11}$$
We now outline the proof of the equality

$$\lim_{\varepsilon\downarrow 0}u^\varepsilon(t,x) = 1, \qquad \text{if } \rho(x,G_0) < t\sqrt{2c},\ t>0. \tag{4.12}$$

Let us first prove that

$$\lim_{\varepsilon\downarrow 0}\varepsilon\ln u^\varepsilon(t,x) = 0, \tag{4.13}$$

if $\rho(x,G_0) = t\sqrt{2c}$, $t > 0$. If $K$ is a compact set in $\{t>0,\ x\in R^r\}$, then the convergence in (4.13) is uniform in $(t,x)\in K\cap\{\rho(x,G_0) = t\sqrt{2c}\}$. One can derive from the maximum principle that

$$\varlimsup_{\varepsilon\downarrow 0}u^\varepsilon(t,x) \le 1, \qquad t>0,\ x\in R^r. \tag{4.14}$$

Therefore, to prove (4.13) it is sufficient to show that

$$\varliminf_{\varepsilon\downarrow 0}\varepsilon\ln u^\varepsilon(t,x) \ge 0, \qquad \text{if } \rho(x,G_0) = t\sqrt{2c},\ t>0. \tag{4.15}$$
Let $\hat\phi_s$, $0\le s\le t$, be the shortest geodesic connecting $x$ and $[G_0]$, with the parameter $s$ proportional to the Riemannian distance from $x$. For any small $\delta > 0$, choose $z\in(G_0)$, $|z - \hat\phi_t| < \delta$, and define a path $\hat\phi_s^\delta$, $0\le s\le t$, from $x$ to $z$ … where $d(\cdot,\cdot)$ is the Euclidean distance in $R^r$ and $G_s = \{x\in R^r: \rho(x,G_0)\le s\sqrt{2c}\}$. Such a choice of $\hat\phi^\delta$ is possible since $d(\hat\phi_s, G_s)\ge d_1 > 0$ for $\delta\le s\le t-\delta$. Then (4.11) implies that $u^\varepsilon(t-s,\hat\phi_s^\delta)\to 0$ as $\varepsilon\downarrow 0$ for $\delta\le s\le t-\delta$, and thus $c(\hat\phi_s^\delta, u^\varepsilon(t-s,\hat\phi_s^\delta))\to c(\hat\phi_s^\delta,0) = c$ as $\varepsilon\downarrow 0$ for such $s$. Since $S_{0t}(\hat\phi) = \rho^2(x,G_0)/2t$, one can derive from the definition of $\hat\phi^\delta$ that for any $h > 0$ there exists $\delta > 0$ so small that

$$S_{0t}(\hat\phi^\delta) - \frac{\rho^2(x,G_0)}{2t} < h.$$

Let $g_0 = \min\{g(x): |x-z|\le\delta\}$, and let $\chi^\varepsilon$ denote the indicator of the set of trajectories staying close to $\hat\phi^\delta$. It follows from (4.8) that

$$u^\varepsilon(t,x) \ge M_x\,g(X_t^\varepsilon)\chi^\varepsilon(X^\varepsilon)\exp\Big\{\frac{1}{\varepsilon}\int_0^tc\big(X_s^\varepsilon,u^\varepsilon(t-s,X_s^\varepsilon)\big)ds\Big\} \ge g_0\,e^{c(t-3\delta)/\varepsilon}\exp\Big\{-\frac{1}{\varepsilon}\Big(\frac{\rho^2(x,G_0)}{2t} + 2h\Big)\Big\} \tag{4.16}$$

for $\varepsilon > 0$ small enough. In (4.16) we used the lower bound from the definition of the action functional. Taking into account that $\rho(x,G_0) = t\sqrt{2c}$, we derive (4.15) from (4.16).
We now prove (4.12). Suppose it does not hold. Then, taking into account (4.14), there exist $x_0\in R^r$ and $t_0 > 0$ such that $\rho(x_0,G_0) < t_0\sqrt{2c}$ and

$$\varliminf_{\varepsilon\downarrow 0}u^\varepsilon(t_0,x_0) < 1 - 2h \tag{4.17}$$

for some $h > 0$. Let $U = U^\varepsilon$ be the connected component of the set

$$\{(t,x): t>0,\ x\in R^r,\ \rho(x,G_0) < t\sqrt{2c},\ u^\varepsilon(t,x) < 1-h\}$$

containing the point $(t_0,x_0)$. The boundary $\partial U$ of the set $U\subset R^{r+1}$ consists of the sets $\partial U_1$ and $\partial U_2$:

$$\partial U_1 = \partial U\cap\{(t,x): t>0,\ x\in R^r,\ u^\varepsilon(t,x) = 1-h\},$$
$$\partial U_2 = \partial U\cap\big(\{(t,x): t>0,\ \rho(x,G_0) = t\sqrt{2c}\}\cup\{(t,x): t=0,\ x\in G_0\}\big).$$

Define $\tau^\varepsilon = \min\{s: (t_0-s, X_s^\varepsilon)\in\partial U\}$, and let $\chi_i^\varepsilon$, $i\in\{1,2\}$, be the indicator function of the set of trajectories $X^\varepsilon$ for which $(t_0-\tau^\varepsilon, X^\varepsilon_{\tau^\varepsilon})\in\partial U_i$. Let $\Delta_1$ be the distance in $R^{r+1}$ between $(t_0,x_0)$ and the surface $\{(t,x): t>0,\ \rho(x,G_0) = t\sqrt{2c}\}$, and put $\Delta = \Delta_1\wedge t_0$. Let $\chi_0^\varepsilon$ denote the indicator function of the set of trajectories $X^\varepsilon$ such that $\max_{0\le s\le\tau^\varepsilon}|X_s^\varepsilon - x_0|\le\Delta$. Using the strong Markov property, one can derive from (4.8)

$$u^\varepsilon(t_0,x_0) = M_{x_0}\Big\{u^\varepsilon(t_0-\tau^\varepsilon, X^\varepsilon_{\tau^\varepsilon})\exp\Big\{\frac{1}{\varepsilon}\int_0^{\tau^\varepsilon}c\big(X_s^\varepsilon, u^\varepsilon(t_0-s,X_s^\varepsilon)\big)ds\Big\}\Big\}$$
$$\ge \sum_{i=1}^2M_{x_0}\Big\{\chi_0^\varepsilon\chi_i^\varepsilon\,u^\varepsilon(t_0-\tau^\varepsilon, X^\varepsilon_{\tau^\varepsilon})\exp\Big\{\frac{1}{\varepsilon}\int_0^{\tau^\varepsilon}c\big(X_s^\varepsilon, u^\varepsilon(t_0-s,X_s^\varepsilon)\big)ds\Big\}\Big\}. \tag{4.18}$$

The first term on the right-hand side of (4.18) is at least $(1-h)M_{x_0}\chi_0^\varepsilon\chi_1^\varepsilon$, since $u^\varepsilon = 1-h$ on $\partial U_1$ and $c\ge 0$. To estimate the second term, note that $\tau^\varepsilon\ge\Delta$ and

$$\min_{0\le s\le\tau^\varepsilon}c\big(X_s^\varepsilon, u^\varepsilon(t_0-s,X_s^\varepsilon)\big) \ge \min_{\substack{0\le u\le 1-h\\ |x-x_0|\le\Delta}}c(x,u) = c_0 > 0,$$

if $(t_0-\tau^\varepsilon, X^\varepsilon_{\tau^\varepsilon})\in\partial U_2$. Thus, the second term on the right-hand side of (4.18) is greater than

$$M_{x_0}\chi_0^\varepsilon\chi_2^\varepsilon\,e^{c_0\Delta/\varepsilon}\,u^\varepsilon(t_0-\tau^\varepsilon, X^\varepsilon_{\tau^\varepsilon}).$$

Using bound (4.13), we conclude that the last expression is bounded from below by $M_{x_0}\chi_0^\varepsilon\chi_2^\varepsilon$ if $\varepsilon > 0$ is small enough. Combining these bounds, one can derive from (4.18)

$$u^\varepsilon(t_0,x_0) \ge (1-h)M_{x_0}\chi_0^\varepsilon\chi_1^\varepsilon + M_{x_0}\chi_0^\varepsilon\chi_2^\varepsilon \ge (1-h)M_{x_0}\chi_0^\varepsilon$$
for $\varepsilon$ small enough. Since $M_{x_0}\chi_0^\varepsilon\to 1$ as $\varepsilon\downarrow 0$, the last inequality implies that $\varepsilon_0 > 0$ exists such that

$$u^\varepsilon(t_0,x_0) > 1-2h \qquad \text{for } 0 < \varepsilon < \varepsilon_0.$$

This bound contradicts (4.17), and thus (4.12) holds.

Equalities (4.11) and (4.12) mean that, in the case $c(x,0) = c = \text{const}$, the motion of the interface separating the areas where $u^\varepsilon(t,x)$ is close to 0 and to 1 for $0 < \varepsilon\ll 1$ obeys the Huygens principle: if $G_t = \{x: \rho(x,G_0)\le t\sqrt{2c}\}$, then

$$G_{t+h} = \{x\in R^r: \rho(x,G_t)\le h\sqrt{2c}\}, \qquad t,h\ge 0.$$

In particular, if $G_0$ is a ball of radius $\delta$ with the center at a point $z\in R^r$, and $g(x)$ is the indicator function of $G_0\subset R^r$, then $G_t$ is the Riemannian ball of radius $t\sqrt{2c}+\delta$ with the center at $z$.

Now let $c(x,0) = \max_{0\le u\le 1}c(x,u) = c(x)$ not be a constant. Put
$$V(t,x) = \sup\Big\{\int_0^t\Big[c(\phi_s) - \frac{1}{2}\sum_{i,j=1}^ra_{ij}(\phi_s)\dot\phi_s^i\dot\phi_s^j\Big]ds\ :\ \phi\in C_{0t},\ \phi_0 = x,\ \phi_t\in G_0\Big\}.$$

One can check that $V(t,x)$ is a Lipschitz continuous function, increasing in $t$. It follows from (4.8) and (4.3) that

$$u^\varepsilon(t,x) \le \exp\{\varepsilon^{-1}V(t,x)\}, \tag{4.19}$$

so that

$$\lim_{\varepsilon\downarrow 0}u^\varepsilon(t,x) = 0, \qquad \text{if } V(t,x) < 0,\ t>0. \tag{4.20}$$
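For constant $c$, one spatial dimension, and $G_0 = \{0\}$ (an illustrative discretized setting), the supremum of $\int_0^t(c - \dot\phi_s^2/2)\,ds$ over paths from $x$ to $G_0$ is attained on the straight path traversed at constant speed and equals $ct - x^2/2t$; random perturbations of the discretized optimal path never improve on it:

```python
import random

# sup over discretized paths from x to 0 of sum (c - phi_dot^2/2) ds;
# the straight constant-speed path gives c*t - x^2/(2t).
random.seed(0)
c, t, x, n = 1.0, 2.0, 1.5, 50
ds = t / n

def value(path):                        # path[0] = x, path[-1] = 0
    v = 0.0
    for i in range(n):
        v += (c - ((path[i + 1] - path[i]) / ds) ** 2 / 2.0) * ds
    return v

straight = [x * (1 - i / n) for i in range(n + 1)]
best = value(straight)                  # equals c*t - x^2/(2t) = 1.4375
assert abs(best - (c * t - x * x / (2 * t))) < 1e-9

for _ in range(1000):                   # perturbations never do better
    pert = [p + (0.0 if i in (0, n) else random.uniform(-0.5, 0.5))
            for i, p in enumerate(straight)]
    assert value(pert) <= best + 1e-12

print(best)
```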
If we could prove that $\lim_{\varepsilon\downarrow 0}u^\varepsilon(t,x) = 1$ whenever $V(t,x) = 0$, then the equation $V(t,x) = 0$ would give us the position of the interface (wavefront) separating the areas where $u^\varepsilon(t,x)$ is close to 0 and to 1 at time $t$. But simple examples show that, in general, this is not true; the area where $\lim_{\varepsilon\downarrow 0}u^\varepsilon(t,x) = 0$ can be larger than prescribed by (4.19). This, actually, depends on the behavior of the extremals in the variational problem defining $V(t,x)$.
We say that condition (N) is fulfilled if, for any $(t,x)$ such that $V(t,x) = 0$,

$$V(t,x) = \sup\Big\{\int_0^t\Big[c(\phi_s) - \frac{1}{2}\sum_{i,j=1}^ra_{ij}(\phi_s)\dot\phi_s^i\dot\phi_s^j\Big]ds\ :\ \phi\in C_{0t},\ \phi_0 = x,\ \phi_t\in[G_0],\ V(t-s,\phi_s)\le 0 \text{ for } 0\le s\le t\Big\},$$

i.e., if the supremum does not change when it is restricted to paths along which $V\le 0$. If condition (N) is fulfilled, then $u^\varepsilon(t,x)$ converges, as $\varepsilon\downarrow 0$, to 0 uniformly on compact subsets of $\{(t,x): V(t,x) < 0\}$ and to 1 uniformly on compact subsets of the interior of $\{(t,x): t>0,\ V(t,x) = 0\}$, and the equation $V(t,x) = 0$ describes the position of the wavefront at time $t$.
The first statement of this theorem is a small refinement of (4.19). The proof of the second statement is similar to the proof of (4.12) in the case $c(x) = c = \text{const}$; condition (N) allows us to check that $\lim_{\varepsilon\downarrow 0}\varepsilon\ln u^\varepsilon(t,x) = 0$ if $t > 0$ and $V(t,x) = 0$. The rest of the proof is based on (4.8), the strong Markov property, and condition (4.3), as in the case $c(x) = \text{const}$. For the detailed proof, see Freidlin [15].

The motion of the front in the case of variable $c(x) = c(x,0)$ has a number of special features which cannot be observed if $c(x) = c = \text{const}$. Consider, for example, the one-dimensional case

$$\frac{\partial u^\varepsilon(t,x)}{\partial t} = \frac{\varepsilon}{2}\,\frac{\partial^2u^\varepsilon}{\partial x^2} + \frac{1}{\varepsilon}\,c(x)u^\varepsilon(1-u^\varepsilon), \qquad t>0,\ x\in R^1,$$
$$u^\varepsilon(0,x) = \chi^-(x-\beta). \tag{4.21}$$
Here $c(x)$ is a smooth increasing positive function, and $\beta > 0$ is a constant. One can check that condition (N) is fulfilled in this case, and the equation $V(t,x) = V_\beta(t,x) = 0$ describes the position of the front at time $t$. If $x > \beta$, then

$$V_\beta(t,x) = \sup\Big\{\int_0^t\Big[c(\phi_s) - \frac{\dot\phi_s^2}{2}\Big]ds\ :\ \phi_0 = x,\ \phi_t = \beta\Big\}.$$

Let, first, $c(x) = 1$ for $x\le 0$ and $c(x) = 1+x$ for $x > 0$. Then the function $V_\beta(t,x)$ can be calculated explicitly:

$$V_\beta(t,x) = \frac{t^3}{24} + t\Big(1 + \frac{\beta+x}{2}\Big) - \frac{(x-\beta)^2}{2t}, \qquad x > \beta \ge 0.$$

Solving the equation $V_\beta(t,x) = 0$ with respect to $x$, we find the position $x_\beta(t)$ of the front at time $t$:

$$x_\beta(t) = \beta + \frac{t^2}{2} + t\sqrt{\frac{t^2}{3} + 2(1+\beta)}, \qquad t>0.$$
The interface motion in this case has some particularities which are worth mentioning. Evaluate $x_0(1)$, $x_0(2)$, and $x_{x_0(1)}(1)$:

$$x_0(1) = \frac{1}{2} + \sqrt{\frac{7}{3}} \approx 2.0, \qquad x_0(2) = 2 + 2\sqrt{\frac{10}{3}} \approx 5.6,$$
$$x_{x_0(1)}(1) = x_0(1) + \frac{1}{2} + \sqrt{\frac{1}{3} + 2(1+x_0(1))} \approx 5.1.$$

We see that $x_0(2) > x_{x_0(1)}(1)$. This means that the motion of the front does not satisfy the Markov property (i.e., the semigroup property): given the position of the front at time 1, the behavior of the front before time 1 can influence the behavior of the front after time 1. The evolution of the function $u^\varepsilon(t,x)$ satisfies, of course, the semigroup property: if $u^\varepsilon(s,x)$ is known for $x\in R^1$,
then $u^\varepsilon(t,x)$, $t > s$, can be calculated in a unique way. However, it turns out that for $c(x)\ne\text{const}$ the evolution of the main term of $u^\varepsilon(t,x)$ as $\varepsilon\downarrow 0$, which is a step function with values 0 and 1, already loses the semigroup property. To preserve this property, one must extend the phase space: it should include not just the point where $V_\beta(t,x)$ is equal to zero, but the whole function $v_\beta(t,x) = V_\beta(t,x)\wedge 0$.
The loss of the semigroup property also shows that the motion of the front cannot be described by a Huygens principle.

One more property of the front motion in this example: let $u_0^\varepsilon(t,x)$ and $u_1^\varepsilon(t,x)$ be the solutions of equation (4.21) with the different initial functions $u_0^\varepsilon(0,x) = \chi^-(x)$, $u_1^\varepsilon(0,x) = 1-\chi^-(x-1)$. In the first case, the front moves from $x = 0$ in the positive direction, and its position at time $t$ is given by $x_0(t)$. In the second case, the front moves from $x = 1$ in the negative direction of the $x$-axis. Condition (N) is not satisfied in the latter case, but one can prove that the position $\hat X(t)$ of the front at time $t$ is the solution of the equation (Freidlin [15])

$$\dot{\hat X}_t = -\sqrt{2c(\hat X_t)}, \qquad \hat X_0 = 1. \tag{4.22}$$
In our example, $c(x) = 1+x$ for $x > 0$. Solving (4.22), one can find that in the second case the front comes to $x = 0$ at time $t_{1\to 0} = 2-\sqrt2 \approx 0.59$. On the other hand, solving the equation $x_0(t) = 1$ for $\beta = 0$, we see that in the first case the front needs time $t_{0\to 1}\approx 0.57$ to come from 0 to 1. Thus, the motion in the direction in which $c(x)$ increases is faster than in the opposite direction. This property is also incompatible with the Huygens principle.

We have seen in this example that if $c(x)$ is linearly increasing, the front moves at an increasing, but finite, speed. It turns out that if $c(x)$ increases fast enough on a short interval, remaining smooth, the front may have a jump. For example, consider problem (4.21) with $\beta = 0$ and take as $c(x)$ a smooth increasing function which coincides with the step function $\tilde c(x) = 1 + 3(1-\chi^-(x-h))$, $h > 0$, everywhere except a $\delta$-neighborhood of the point $x = h$. One can check that condition (N) is satisfied in this case. But if $\delta$ is small enough, the solution $x_0 = x_0(t)$ of the equation $V(t,x_0) = 0$ has the form shown in Fig. 33. This means that for $0 < t < T_0$ the front moves continuously; but at time $t = T_0$ (see Fig. 33) a new source arises at a point $h'$ close to $h$. The region where $u^\varepsilon(t,x)$ is close to 1 for $0 < \varepsilon\ll 1$ …

… uniformly in $(t,x)\in F_1$, where $F_1$ is a compact subset of the set $\{(t,x): t>0,\ x\in R^r,\ W(t,x) < 0\}$; (ii)
$\lim_{\varepsilon\downarrow 0}u^\varepsilon(t,x) = 1$, uniformly in $(t,x)\in F_2$, where $F_2$ is a compact subset of the interior of the set $\{(t,x): t>0,\ x\in R^r,\ W(t,x) = 0\}$.

The proof of this theorem and various examples can be found in Freidlin [17] and Freidlin and Lee [1]. Various generalizations of problem (4.7) for KPP-type nonlinear terms can be found in Freidlin [18], [20], [22], and Freidlin and Lee [1]. In particular, one can consider RDE-systems of KPP type, equations with nonlinear boundary conditions, and degenerate reaction-diffusion equations. An analytic approach to some of these problems is available at present as well; corresponding references can be found in Freidlin and Lee [1]. Wavefront propagation in periodic and in random media is studied in Gärtner and Freidlin [1], Freidlin [15], and Gärtner [5].
Consider the one-dimensional dynamical system $\dot u = f(u)$. If $f(u)$ is of KPP type, the dynamical system has two equilibrium points: an unstable point at $u = 0$ and a stable point at $u = 1$. One can consider problem (4.7) with a nonlinear term such that the dynamical system has two stable rest points separated by a third, unstable one. Such nonlinear terms are called bistable. Problem (4.7) with a nonlinear term that is bistable for each $x\in R^r$ is of great interest. One can prove that $u^\varepsilon(t,x)$ also converges in this case to a step function as $\varepsilon\downarrow 0$, and the evolution of this step function can be described by a Huygens principle. These results can be found in Gärtner [4].
§5. Random Perturbations of Infinite-Dimensional Systems

We considered, in the previous chapters of this book, random perturbations of finite-dimensional dynamical systems. Similar questions for infinite-dimensional systems are of interest as well. From a general point of view, we face in the infinite-dimensional case, again, problems of the law-of-large-numbers type, of the central-limit-theorem type, and of the large-deviation type. But many new interesting and rather delicate effects appear in this case.

A rich class of infinite-dimensional dynamical systems and semiflows is described by evolutionary differential equations. For example, one can consider the dynamical system associated with a linear hyperbolic equation, or a semiflow associated with the reaction-diffusion equation

$$\frac{\partial u(t,x)}{\partial t} = D\,\Delta u + f(x,u), \qquad t>0,\ x\in G\subset R^r,$$
$$u(0,x) = g(x), \qquad \frac{\partial u(t,x)}{\partial n}\Big|_{\partial G} = 0. \tag{5.1}$$

Here $D$ is a positive constant, $f(x,u)$ is a Lipschitz continuous function, and $n = n(x)$ is the interior normal to the boundary $\partial G$ of the domain $G$. The domain $G$ is assumed to be regular enough. Problem (5.1) is solvable for any continuous bounded $g(x)$. The corresponding solution $u(t,x)$ will be differentiable in $x\in G$ for any $t > 0$, even if $g(x)$ is just continuous. This shows that problem (5.1) cannot be solved, in general, for $t < 0$. Therefore, equation (5.1) defines a semiflow $T_tg(x) = u(t,x)$, $t\ge 0$, but not a flow (dynamical system), in the space of continuous functions.

Of course, there are many ways to introduce perturbations of the semiflow $T_tg$. One can consider various kinds of perturbations of the equation, perturbations of the domain, and perturbations of the boundary or initial conditions. Let us start with additive perturbations of the equation. We restrict ourselves to the case of one spatial variable and consider the periodic problem, so that we do not have to care about boundary conditions:

$$\frac{\partial u^\varepsilon(t,x)}{\partial t} = D\,\frac{\partial^2u^\varepsilon}{\partial x^2} + f(x,u^\varepsilon) + \varepsilon\,\zeta(t,x), \qquad u^\varepsilon(0,x) = g(x). \tag{5.2}$$
We assume that the functions $f(x,u)$, $g(x)$, $\zeta(t,x)$ are $2\pi$-periodic in $x$, so that they can be considered for $x$ belonging to the circle $S^1$ of radius 1, and we consider solutions of problem (5.2) that are $2\pi$-periodic in $x$. When $\varepsilon = 0$, problem (5.2) defines a semiflow in the space $C_{S^1}$ of continuous functions on $S^1$ provided with the uniform topology.

What kind of noise $\zeta(t,x)$ will we consider? Of course, the smoother $\zeta(t,x)$, the smoother the solution of (5.2); but, on the other hand, smoothness makes the statistical properties of the noise and of the solution more complicated. The simplest (from the statistical point of view) noise $\zeta(t,x)$ is the space-time white noise

$$\zeta(t,x) = \frac{\partial^2W(t,x)}{\partial t\,\partial x}, \qquad t\ge 0,\ x\in[0,2\pi).$$
Here $W(t,x)$, $t\ge 0$, $x\in R^1$, is the Brownian sheet, that is, the continuous, mean zero Gaussian random field with the correlation function $MW(s,x)W(t,y) = (s\wedge t)(x\wedge y)$. One can prove that such a random field exists, but $\partial^2W(t,x)/\partial t\,\partial x$ does not exist as a usual function. Therefore, we have to explain what is meant by a solution of problem (5.2) with $\zeta(t,x) = \partial^2W(t,x)/\partial t\,\partial x$. A measurable function $u^\varepsilon(t,x)$, $t\ge 0$, $x\in S^1$, is called a generalized solution of (5.2) with $\zeta(t,x) = \partial^2W(t,x)/\partial t\,\partial x$ if

$$\int_{S^1}u^\varepsilon(t,x)\phi(x)\,dx - \int_{S^1}g(x)\phi(x)\,dx = \int_0^t\int_{S^1}\big[u^\varepsilon(s,x)D\phi''(x) + f\big(x,u^\varepsilon(s,x)\big)\phi(x)\big]\,dx\,ds + \varepsilon\int_{S^1}\phi'(x)W(t,x)\,dx \tag{5.3}$$

for any smooth $\phi$ with probability 1. It is proved in Walsh [1] that such a solution exists and is unique if $f(x,u)$ is Lipschitz continuous and $g\in C_{S^1}$. Some properties of the solution were established in that paper as well. In particular, it was shown that the solution is Hölder continuous and defines a Markov process $u_t^\varepsilon = u^\varepsilon(t,\cdot)$ in $C_{S^1}$. The asymptotic behavior of $u_t^\varepsilon$ as $\varepsilon\downarrow 0$ was studied in Faris and Jona-Lasinio [1] and in Freidlin [16].
Consider, first, the linear case:

$$\frac{\partial v^\varepsilon(t,x)}{\partial t} = D\,\frac{\partial^2v^\varepsilon}{\partial x^2} - \alpha v^\varepsilon + \varepsilon\,\frac{\partial^2W(t,x)}{\partial t\,\partial x}, \qquad t>0,\ x\in S^1,$$
$$v^\varepsilon(0,x) = g(x). \tag{5.4}$$

Here $D,\alpha > 0$, and $W(t,x)$ is the Brownian sheet. The solution of (5.4) with $g(x)\equiv 0$ is denoted by $v_0^\varepsilon(t,x)$, and let $v_g^0(t,x)$ be the solution of (5.4) with $\varepsilon = 0$; $v_g^0(t,x)$ is not random. It is clear that $v^\varepsilon(t,x) = v_g^0(t,x) + v_0^\varepsilon(t,x)$.

Due to the linearity of the problem, $v_0^\varepsilon(t,x)$ is a Gaussian mean zero random field. One can solve problem (5.4) using the Fourier method. The eigenvalues of the operator $Lh(x) = Dh''(x) - \alpha h(x)$, $x\in S^1$, are $-\lambda_k$, where $\lambda_k = Dk^2+\alpha$, $k = 0,1,2,\ldots$; the corresponding normalized eigenfunctions are $\pi^{-1/2}\cos kx$ and $\pi^{-1/2}\sin kx$. Then

$$v^\varepsilon(t,x) = v_g^0(t,x) + \frac{\varepsilon}{\sqrt{2\pi}}A_0(t) + \frac{\varepsilon}{\sqrt{\pi}}\sum_{k=1}^\infty\big(A_k(t)\sin kx + B_k(t)\cos kx\big), \tag{5.5}$$

where $A_k(t)$ and $B_k(t)$ are independent Ornstein-Uhlenbeck processes, satisfying the equations

$$dA_k(t) = dW_k(t) - \lambda_kA_k(t)\,dt, \qquad A_k(0) = 0,$$
$$dB_k(t) = d\widetilde W_k(t) - \lambda_kB_k(t)\,dt, \qquad B_k(0) = 0.$$
Here $W_k(t)$ and $\widetilde W_k(t)$ are independent Wiener processes. The correlation function of the field $v^\varepsilon(t,x)$ has the form

$$M\big(v^\varepsilon(s,x) - Mv^\varepsilon(s,x)\big)\big(v^\varepsilon(t,y) - Mv^\varepsilon(t,y)\big) = \varepsilon^2B(s,x,t,y) = \varepsilon^2\sum_{k=0}^\infty\frac{\cos k(x-y)}{2\lambda_k}\big(e^{-\lambda_k(t-s)} - e^{-\lambda_k(t+s)}\big),$$
$$x,y\in S^1,\ 0\le s\le t. \tag{5.6}$$

One can view $v_t^\varepsilon = v^\varepsilon(t,\cdot)$, $t\ge 0$, as a Markov process in $C_{S^1}$ (a generalized Ornstein-Uhlenbeck process). Equalities (5.5) and (5.6) imply that the process $v_t^\varepsilon$ has a limiting distribution $\mu^\varepsilon$ as $t\to\infty$. The distribution $\mu^\varepsilon$ in $C_{S^1}$ is mean zero Gaussian with the correlation function

$$\varepsilon^2B(x,y) = \varepsilon^2\sum_{k=0}^\infty\frac{\cos k(x-y)}{2\lambda_k}; \tag{5.7}$$

$\mu^\varepsilon$ is the unique invariant measure of the process $v_t^\varepsilon$ in $C_{S^1}$.
Consider now the random field $v^\varepsilon(t,x)$, $0 \leq t \leq T$, $x \in S^1$, for some $T \in (0,\infty)$. It is easy to see that $v^\varepsilon(t,x)$ converges to $v_g^0(t,x)$ as $\varepsilon \downarrow 0$, but for $\varepsilon > 0$, with a small probability $v^\varepsilon(t,x)$ will be in a neighborhood of any function $\phi(t,x) \in C_{[0,T]\times S^1}$ such that $\phi(0,x) = g(x)$. The probabilities of such deviations are characterized by an action functional. One can derive from Theorem 3.4.1 and (5.6) that the action functional for the family $v^\varepsilon(t,x)$ in $L^2_{[0,T]\times S^1}$ as $\varepsilon \to 0$ is $\varepsilon^{-2}S_v(\phi)$, where

$$S_v(\phi) = \begin{cases} \dfrac{1}{2}\displaystyle\int_{S^1}\!\int_0^T \Bigl|\dfrac{\partial\phi(t,x)}{\partial t} - D\,\dfrac{\partial^2\phi(t,x)}{\partial x^2} + a\,\phi(t,x)\Bigr|^2\,dt\,dx, & \phi \in W_2^{1,2},\\[2mm] +\infty & \text{for the rest of } L^2_{[0,T]\times S^1}. \end{cases}$$

Here $W_2^{1,2}$ is the Sobolev space of functions on $[0,T] \times S^1$ possessing generalized square integrable derivatives of the first order in $t$ and of the second order in $x$.
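To make the functional concrete, here is a finite-difference evaluation of $S_v(\phi)$ on a space–time grid (an illustration of ours; the grid sizes and test field are arbitrary choices). For a field solving the unperturbed equation the integrand vanishes identically, so the computed action is numerically zero:

```python
import numpy as np

def action(phi, T, D, a):
    """Approximate S_v(phi) = 1/2 * integral of |phi_t - D*phi_xx + a*phi|^2
    over [0,T] x S^1. phi has shape (nt, nx), sampled on a uniform grid with
    x_j = 2*pi*j/nx (periodic in x, endpoint not duplicated)."""
    nt, nx = phi.shape
    dt = T / (nt - 1)
    dx = 2.0 * np.pi / nx
    phi_t = np.gradient(phi, dt, axis=0)           # central differences in t
    phi_xx = (np.roll(phi, -1, axis=1) - 2.0 * phi
              + np.roll(phi, 1, axis=1)) / dx**2   # periodic second difference
    integrand = (phi_t - D * phi_xx + a * phi) ** 2
    return 0.5 * integrand.sum() * dt * dx
```

For example, with $D = a = 1$ the field $\phi(t,x) = e^{-2t}\cos x$ solves the unperturbed equation, and any perturbation of it has strictly larger action.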
Note that there is a continuous imbedding of $W_2^{1,2}$ in $C_{[0,T]\times S^1}$. We preserve the same notation for the restriction of $S_v(\phi)$ to $C_{[0,T]\times S^1}$. Using Theorem 3.4.1 and Fernique's bound for the probability of exceeding a high level by a continuous Gaussian field (Fernique [1]), one can prove that $\varepsilon^{-2}S_v(\phi)$ is the action functional for $v^\varepsilon(t,x)$ in $C_{[0,T]\times S^1}$ as $\varepsilon \to 0$.

Consider now equation (5.2) with $\zeta(t,x) = \partial^2 W(t,x)/\partial t\,\partial x$. Let $F(x,u) = \int_0^u f(x,z)\,dz$, and

$$U(\phi) = \int_{S^1}\Bigl[\frac{D}{2}\Bigl(\frac{d\phi(x)}{dx}\Bigr)^2 - F(x,\phi(x))\Bigr]\,dx, \qquad \phi \in W_2^1,$$
where $W_2^1$ is the Sobolev space of functions on $S^1$ having square integrable first derivatives. It is readily checked that the variational derivative $\delta U(\phi)/\delta\phi$ has the form

$$\frac{\delta U(\phi)}{\delta\phi} = -D\,\frac{d^2\phi}{dx^2} - f(x,\phi).$$

So the functional $U(\phi)$ is the potential for the semiflow in $C_{S^1}$ defined by (5.1):

$$\frac{\partial u(t,x)}{\partial t} = -\frac{\delta U(u(t,x))}{\delta u}.$$
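The computation behind the variational derivative is a short integration by parts; since $S^1$ has no boundary, no boundary terms appear. A sketch:

```latex
\[
\frac{d}{d\theta}\,U(\phi + \theta h)\Big|_{\theta=0}
= \int_{S^1}\Bigl[D\,\phi'(x)h'(x) - f(x,\phi(x))\,h(x)\Bigr]dx
= \int_{S^1}\Bigl[-D\,\phi''(x) - f(x,\phi(x))\Bigr]h(x)\,dx,
\]
% using \partial F/\partial u = f and periodicity of \phi and h on S^1,
% which identifies \delta U/\delta\phi = -D\,d^2\phi/dx^2 - f(x,\phi).
```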
We have seen in Chapter 4 that for finite-dimensional dynamical systems, potentiality leads to certain simplifications. In particular, the density $m^\varepsilon(x)$ of the invariant measure of the process

$$\dot{X}_t^\varepsilon = -\nabla U(X_t^\varepsilon) + \varepsilon\,\dot{W}_t$$

in $R^r$ can be written down explicitly:

$$m^\varepsilon(x) = C_\varepsilon\, e^{-2\varepsilon^{-2}U(x)}, \qquad x \in R^r, \qquad C_\varepsilon^{-1} = \int_{R^r} e^{-2\varepsilon^{-2}U(x)}\,dx.$$
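An elementary numerical check of this formula (ours, not from the text): for the quadratic potential $U(x) = x^2/2$ on $R^1$, the invariant density of $\dot{X}_t^\varepsilon = -U'(X_t^\varepsilon) + \varepsilon\dot{W}_t$ is Gaussian with variance $\varepsilon^2/2$, so $C_\varepsilon^{-1} = \varepsilon\sqrt{\pi}$ exactly; quadrature recovers this:

```python
import numpy as np

def inverse_normalizer(U, eps, grid):
    """Approximate C_eps^{-1} = integral of exp(-2*U(x)/eps^2) dx
    by the trapezoidal rule on a uniform grid."""
    vals = np.exp(-2.0 * U(grid) / eps**2)
    h = grid[1] - grid[0]
    return h * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

eps = 0.7
grid = np.linspace(-10.0, 10.0, 100001)       # wide enough that tails vanish
c_inv = inverse_normalizer(lambda x: 0.5 * x**2, eps, grid)
```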
Convergence of the last integral is the necessary and sufficient condition for the existence of a finite invariant measure. One could expect that a similar formula can be written down for the invariant density $m^\varepsilon(\phi)$ of the process $u_t^\varepsilon$:

$$m^\varepsilon(\phi) = \text{constant} \times \exp\Bigl\{-\frac{2}{\varepsilon^2}\,U(\phi)\Bigr\}. \tag{5.8}$$
The difficulty involved here is due, first of all, to the fact that there is no counterpart of Lebesgue measure in the corresponding space of functions. One can try to avoid this difficulty in various ways. First, for (5.8) to make sense, one can do the following. Denote by $g_\delta(\phi)$ the $\delta$-neighborhood of the function $\phi$ in the $C_{S^1}$ norm. If $\nu^\varepsilon$ is the normalized invariant measure of the process $u_t^\varepsilon$, then one can expect that

$$\mathrm{P}_g\Bigl\{\lim_{T\to\infty}\frac{1}{T}\int_0^T \chi_{g_\delta(\phi)}(u_t^\varepsilon)\,dt = \nu^\varepsilon(g_\delta(\phi))\Bigr\} = 1,$$

where $\chi_{g_\delta(\phi)}$ is the indicator of the set $g_\delta(\phi)$. Then it is natural to call $\exp\{-(2/\varepsilon^2)U(\phi)\}$ the non-normalized density of the stationary distribution of the process $u_t^\varepsilon$ (with respect to the non-existing uniform distribution) provided
$$\lim_{\delta\downarrow 0}\,\lim_{T\to\infty}\frac{\int_0^T \chi_{g_\delta(\phi_1)}(u_t^\varepsilon)\,dt}{\int_0^T \chi_{g_\delta(\phi_2)}(u_t^\varepsilon)\,dt} = \exp\Bigl\{-\frac{2}{\varepsilon^2}\bigl(U(\phi_1) - U(\phi_2)\bigr)\Bigr\}. \tag{5.9}$$
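A finite-state analogue of the ratio identity (5.9), purely illustrative and ours: for a reversible Metropolis chain whose target is the Gibbs measure $\pi(i) \propto \exp\{-2U_i/\varepsilon^2\}$, the stationary occupation ratios reproduce $\exp\{-(2/\varepsilon^2)(U_i - U_j)\}$ exactly:

```python
import numpy as np

def metropolis_matrix(U, eps):
    """Transition matrix of a nearest-neighbour Metropolis chain on
    states 0..n-1 targeting pi(i) proportional to exp(-2*U[i]/eps^2)."""
    n = len(U)
    P = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                P[i, j] = 0.5 * min(1.0, np.exp(-2.0 * (U[j] - U[i]) / eps**2))
        P[i, i] = 1.0 - P[i].sum()   # remaining mass: stay put
    return P

def stationary(P):
    """Stationary distribution: left eigenvector of P for eigenvalue 1."""
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()
```

Detailed balance makes the Gibbs measure exactly invariant here, so the ratio of stationary weights of two states matches the right-hand side of (5.9) with $\phi_1, \phi_2$ replaced by the two states.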
In (5.9), $\varepsilon$ is fixed. It is intuitively clear from (5.9) that the invariant measure concentrates as $\varepsilon \to 0$ near the points where $U(\phi)$ attains its absolute minimum. One can try to give exact meaning not to (5.8) itself but to its intuitive implications, which characterize the behavior of $u_t^\varepsilon$ for $\varepsilon$