Large Deviations
This is Volume 137 in PURE AND APPLIED MATHEMATICS
H. Bass, A. Borel, J. Moser, S.T. Yau, editors Paul A. Smith and Samuel Eilenberg, founding editors A complete list of titles in this series appears at the end of this volume.
Large Deviations Jean-Dominique Deuschel Department of Mathematics Cornell University Ithaca, New York
Daniel W. Stroock Department of Mathematics Massachusetts Institute of Technology Cambridge, Massachusetts
ACADEMIC PRESS, INC. Harcourt Brace Jovanovich, Publishers Boston San Diego New York Berkeley London Sydney Tokyo Toronto
Copyright © 1989 by Academic Press, Inc. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.
ACADEMIC PRESS, INC. 1250 Sixth Avenue, San Diego, CA 92101 United Kingdom Edition published by ACADEMIC PRESS INC. (LONDON) LTD. 24-28 Oval Road, London NW1 7DX
Library of Congress Cataloging-in-Publication Data Deuschel, Jean-Dominique, Date Large deviations / Jean-Dominique Deuschel, Daniel W. Stroock. p. cm. (Pure and applied mathematics; v. 137) Rev. ed. of An introduction to the theory of large deviations / D.W. Stroock. c1984. Bibliography: p. Includes index. ISBN 0-12-213150-9 1. Large deviations. I. Stroock, Daniel W. Introduction to the theory of large deviations. II. Title. III. Series: Pure and applied mathematics (Academic Press); 137. QA3.P8 vol. 137 [QA273.67] 89397 510 s-dc19 CIP [519.5'34]
Printed in the United States of America 89 90 91 92 9 8 7 6 5 4 3 2 1
For
Monroe D. Donsker who has always liked it best in function space
Preface

The title of this book to the contrary notwithstanding, there is no more a “theory” of large deviations than there is a “theory” of partial differential equations; and what passes for the “theory” is, in reality, little more than a grab bag of techniques which have been successfully applied to special situations and are therefore worth trying in sufficiently closely related settings. Thus, even though the title implies that a master key is contained herein, the reader will discover that reading this book prepares him to analyze large deviations in the same sense as the manual for his computer prepared him to write his first program; that is, hardly at all!

In spite of the preceding admission, we have written this book in the belief that even (and, perhaps, particularly) when a field possesses no “CAUCHY integral formula,” a useful purpose can be served by a book which surveys a few outstanding successes and attempts to codify some of the principles on which those successes are based. In the present case, the examples of success are plentiful but the underlying principles are few and somewhat elusive. We hope that the brief synopsis given below will help the reader spot and understand these few principles, at least in so far as we have recognized and understood them ourselves.

After attempting, in Section 1.1, a heuristic explanation of the ideas on which the theory of large deviations rests, the remainder of Chapter I is devoted to a detailed account of two basic examples. The first of these, which is the content of Section 1.2, is CRAMÉR’S renowned theorem on the large deviations of the CESÀRO means of independent R-valued random variables from the Law of Large Numbers. In order to emphasize, as soon as possible, that large deviations can be successfully analyzed even in an infinite dimensional context, for our second example we have chosen
SCHILDER’S Theorem for rescaled WIENER’S measure. The derivation is carried out in Section 1.3, and applications, first to STRASSEN’S Law of the Iterated Logarithm and second to the estimates of VENTCEL and FREIDLIN, are given in Section 1.4. In connection with the VENTCEL-FREIDLIN estimates, we have assumed that the reader is familiar with the elements of ITÔ’S theory of stochastic differential equations; however, because the rest of the book relies on neither the contents of Section 1.4 nor a knowledge of ITÔ calculus, readers who are not acquainted with the quirks of stochastic integration need not (on that account) be too concerned about what lies ahead.

Armed with the examples from Chapter I, we turn in Chapter II to the formulation of two of the guiding principles on which the rest of the book is more or less based. The first of these is contained in Lemma 2.1.4, which provides a reasonably general statement of the “covariant” nature of large deviations results under mappings which are sufficiently continuous. (The treatment given in Section 1.4 of the VENTCEL-FREIDLIN estimates should be ample evidence of the potential power of this principle.) In order to formulate the second general principle set forth in this chapter, we start in Section 2.1 with VARADHAN’S version of the LAPLACE asymptotic formula (cf. Theorem 2.1.10) and combine this in Section 2.2 with a little elementary convex analysis to arrive at the conclusion (drawn in Theorem 2.2.21) that when large deviations are governed by a convex rate function then that rate function must be the LEGENDRE transform of the logarithmic moment generating function.
Since, as we saw in Chapter I, the rate functions produced in both CRAMÉR’S and SCHILDER’S Theorems are in fact LEGENDRE transforms of the corresponding logarithmic moment generating functions, this observation leads one to guess that there may be circumstances in which the easiest approach to large deviation results will consist of two steps: one being an abstract existential proof that the large deviations are governed by a convex rate function and the second being the “computation” of a LEGENDRE transform. (Such a procedure is reminiscent of the time-honored technique to describe the solution to a partial differential equation by first invoking some abstract existence principle and only then trying to actually say something concrete about its properties.)

The contents of Chapters III and IV may be viewed as a sequence of examples to which the principles developed in Chapter II can be applied. In Chapter III, all the examples concern partial sums of independent random variables. After introducing, in Section 3.1, a general argument (cf. Theorem 3.1.6 and its Corollary 3.1.7) for carrying out an abstract existential
proof that large deviation results for such sums are governed by convex rate functions, we return in the rest of the chapter to CRAMÉR’S Theorem; this time in its full glory as a statement about random variables taking values either in a space of probability measures or in a BANACH space. Thus, Section 3.2 contains a proof of SANOV’S Theorem (cf. Theorem 3.2.17) for empirical distributions; and Section 3.3 is devoted to the BANACH space version of CRAMÉR’S Theorem. (In connection with the derivation of these results, we introduce in Lemma 3.2.7 a somewhat technical mini-principle which turns out to play an important role throughout the rest of the book.) Finally, in Section 3.4, we show that SCHILDER’S Theorem is a special case of the BANACH space statement of CRAMÉR’S Theorem and, in fact, that a SCHILDER-like result can be proved for general GAUSSIAN measures.
As we said before, Chapter IV is again an application of the principles laid down in Chapter II. In particular, we now take up the study of SANOV-type theorems for MARKOV processes which do not necessarily have independent increments. In order to make the development here mimic the one in Chapter III, we impose extremely strong hypotheses to guarantee that the processes with which we are dealing possess ergodic properties which are nearly as good as those possessed by processes with independent increments. As a result, basically the same ideas as those in Chapter III apply to nice additive functionals of such processes and allow us to prove (cf. Theorems 4.1.14 and 4.2.16) that these functionals have large deviations which are governed by a convex rate function. In particular, after identifying the rate functions involved, we use these considerations to obtain a variant of the original DONSKER-VARADHAN theory for the large deviations of the normalized occupation time distribution (i.e., the empirical distribution of the position) of a MARKOV process (cf. Theorems 4.1.43 and 4.2.43). Because it is technically the simpler, we do MARKOV chains (i.e., MARKOV processes with a discrete time parameter) in Section 4.1 and move to the continuous-time setting in Section 4.2; and in Section 4.4 we show how, under the hypotheses used in Sections 4.1 and 4.2, one can realize the large deviation theory for the empirical distribution of the whole process as the projective limit of the theory for the position. Section 4.3, which is somewhat a digression from the main theme and should probably be skipped on first reading, contains DONSKER and VARADHAN’S analysis of the WIENER sausage problem.

To some extent, Chapter V represents a retreat from the pattern set in Chapters III and IV and a return to the more “hands-on” approach of Chapter I. Thus, just as in Chapter I, the approach in Chapter V is to first
get an upper bound, basically as an application of CHEBYSHEV’S inequality; then a lower bound via ergodic considerations; and finally a reconciliation of the two. A rather general treatment of the upper bound is given in Section 5.1, where, in Theorem 5.1.6 and Corollary 5.1.11, we sharpen results obtained earlier in Theorem 2.2.4. In preparation for the derivation of the lower bound, we digress in Section 5.2 and give a brief résumé of a few more or less familiar results from ergodic theory. As a first application of these considerations, we present in Section 5.3 a very general large deviation result for the empirical distribution of the position of a symmetric MARKOV process (cf. Theorem 5.3.10). Our second application is the content of Section 5.4, where we prove CHIYONOBU and KUSUOKA’S recent theorem about the process level large deviations of a (not necessarily MARKOV) hypermixing process (cf. Theorem 5.4.27); and, in Section 5.5, we discuss the hypermixing property for processes which are ε-MARKOV.

The motivation behind Chapter V has been our desire to get away from the extremely strong ergodic assumptions on which the techniques in Chapters III and IV depend and to replace them with assumptions which have a better chance of holding in either noncompact or infinite dimensional situations. In order to test and compare the scope of the various techniques which are contained in Chapters IV and V, we describe in Chapter VI some analytic results with which one can see, at least in the context of diffusion processes, the relative position of these results as measured on the scale of elliptic coercivity.

The contents of Chapters I through IV constitute a reasonably thorough introduction to the basic ideas of the theory and more or less record lectures given by the second author during the fall of 1987.
Thus, we consider these four chapters as a suitable package on which to base a semester length course for advanced graduate students with a strong background in analysis and some knowledge of probability theory. In this connection, we point out that each section ends with a large selection of exercises. Although some of these exercises are quite routine and do not require any particular ingenuity on the part of the student, others are more demanding. Indeed, we have not hesitated to include in the exercises a good deal of important material. In particular, it is only in the exercises that one can find most of the applications. Finally, a word about the history of this book may be in order. In 1983, the second author gave a course, at the University of Colorado, in which he taught himself and one or two others something about the modern theory
of large deviations. Having expended considerable effort on the task, he decided to set down everything which he then knew about the subject in a little book [101]. That was five years ago. In the intervening years, both the subject as well as his understanding of it have grown; and, with the aid and comfort provided by a fellow sufferer, he took on the more ambitious project of basing a full blown exposition on the course which he gave in fall of 1987 at M.I.T. Thus, the present book is a great deal longer: both because it contains more material and because the exposition is more detailed. Unfortunately, in the process of removing some of the more glaring imperfections and omissions in [101], we are confident that we have introduced a sufficient number of new flaws to keep our readers somewhat annoyed and, occasionally, thoroughly confounded. However, the responsibility for these flaws is entirely ours and not that of the ever patient students in 18.158, who struggled with the class notes out of which this final version evolved. In particular, we take this opportunity to thank STEVE FROMM for goading us into addressing several of the more perplexing inanities in those class notes. Also, we are indebted to MICHAEL SHARPE, who saved us many harrowing hours manipulating TeX into doing our bidding (cf. the similarity between the format, if not the content, of the present volume and volume # 133 in the same series); and, last but not least, it is a pleasure for us to thank our typist for Eir beautiful work.

Cambridge, MA
December 31, 1988
Contents

Chapter I: Some Examples
1.1: The General Idea
1.2: The Classical CRAMÉR Theorem
1.3: SCHILDER’S Theorem
1.4: Two Applications of SCHILDER’S Theorem

Chapter II: Some Generalities
2.1: The Large Deviation Principle
2.2: Large Deviations and Convex Analysis

Chapter III: Generalized CRAMÉR Theory
3.1: Preliminary Formulation
3.2: SANOV’S Theorem
3.3: CRAMÉR’S Theorem for BANACH Spaces
3.4: Large Deviations for GAUSSIAN Measures

Chapter IV: Uniform Large Deviations
4.1: MARKOV Chains
4.2: Continuous Time MARKOV Processes
4.3: The WIENER Sausage
4.4: Process Level Large Deviations

Chapter V: Non-Uniform Results
5.1: Generalities about the Upper Bound
5.2: A Little Ergodic Theory
5.3: The General Symmetric MARKOV Case
5.4: The Large Deviation Principle for Hypermixing Processes
5.5: Hypermixing in the Epsilon MARKOV Case

Chapter VI: Analytic Considerations
6.1: When is a MARKOV Process Hypermixing?
6.2: Symmetric Diffusions on a Manifold
6.3: Hypoelliptic Diffusions on a Compact Manifold

Historical Notes and References
Notation Index
Subject Index
I
Some Examples
1.1 The General Idea
Let $E$ be a Polish space (i.e., a complete, separable metric space) and suppose that $\{\mu_\epsilon : \epsilon > 0\}$ is a family of probability measures on $E$ with the property that $\mu_\epsilon \Rightarrow \delta_p$ as $\epsilon \to 0$ for some $p \in E$ (i.e., $\mu_\epsilon$ tends weakly to the point mass $\delta_p$). Then, for each open set $U \ni p$, we have that $\mu_\epsilon(U^c) \to 0$; and so we can reasonably say that, as $\epsilon \to 0$, the measures $\mu_\epsilon$ “see $p$ as being typical.” Equivalently, one can say that events $\Gamma \subseteq E$ lying outside of a neighborhood of $p$ describe increasingly “deviant” behavior. What is often an important and interesting problem is the determination of just how “deviant” a particular event is. That is, given an event $\Gamma$ for which $p \notin \bar\Gamma$, one wants to know the rate at which $\mu_\epsilon(\Gamma)$ is tending to 0. In general, a detailed answer to this question is seldom available. However, if one restricts one’s attention to events which are “very deviant” in the sense that $\mu_\epsilon(\Gamma)$ goes to zero exponentially fast and if one only asks about the exponential rate, then one has a much better chance of finding a solution and one is studying the large deviations of the family $\{\mu_\epsilon : \epsilon > 0\}$.

In order to understand why the analysis of large deviations ought to be relatively easy and what one should expect such an analysis to yield, consider the case in which all of the measures $\mu_\epsilon$ are absolutely continuous with respect to some fixed reference measure $m$. Since $\mu_\epsilon \Rightarrow \delta_p$, it is reasonable to suppose that

$$\frac{d\mu_\epsilon}{dm} = g_\epsilon \exp[-I/\epsilon],$$

where $\epsilon \log g_\epsilon \to 0$ uniformly fast as $\epsilon \to 0$ and $I$ is a nonnegative function which vanishes only at the point $p$. One then has, for any $\Gamma$ with $m(\Gamma) < \infty$,

$$\mu_\epsilon(\Gamma)^{\epsilon} = \left(\int_\Gamma g_\epsilon \exp[-I/\epsilon]\,dm\right)^{\epsilon},$$
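and so (since $m(\Gamma) < \infty$)

$$\mu_\epsilon(\Gamma)^{\epsilon} \longrightarrow \operatorname{ess.sup}\{\exp[-I(q)] : q \in \Gamma\}$$

as $\epsilon \to 0$. (The “essential” here refers to the measure $m$.) Hence, in the situation described above, we have, at least when $m(\Gamma) < \infty$:

$$\lim_{\epsilon \to 0} \epsilon \log \mu_\epsilon(\Gamma) = -\operatorname{ess.inf}\{I(q) : q \in \Gamma\}. \tag{1.1.1}$$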
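The limit in (1.1.1) can be checked numerically in a toy case. The sketch below is only an illustration under assumptions that are not in the text: $E = \mathbb{R}$, $m$ is LEBESGUE measure, $I(q) = q^4/4$, $g_\epsilon$ is the normalizing constant, and $\Gamma = [1,2]$, so that the predicted limit is $-I(1) = -1/4$. The helper names `log_integral`, `rate` and `eps_log_mu` are hypothetical.

```python
import math

def log_integral(log_f, a, b, n=100_000):
    """Log of the trapezoid-rule integral of exp(log_f) over [a, b].

    The largest exponent is factored out so that tiny integrands
    underflow to 0.0 harmlessly instead of spoiling the logarithm.
    """
    h = (b - a) / n
    logs = [log_f(a + k * h) for k in range(n + 1)]
    peak = max(logs)
    total = sum(math.exp(v - peak) for v in logs)
    total -= 0.5 * (math.exp(logs[0] - peak) + math.exp(logs[-1] - peak))
    return peak + math.log(total * h)

def rate(q):
    """Assumed rate function I(q) = q^4 / 4 (vanishes only at p = 0)."""
    return q ** 4 / 4.0

def eps_log_mu(eps, a=1.0, b=2.0):
    """eps * log mu_eps([a, b]) for the density exp(-I/eps) / Z_eps."""
    log_mass = log_integral(lambda q: -rate(q) / eps, a, b)
    # Normalizing constant; the mass outside [-3, 3] is negligible for eps <= 1.
    log_norm = log_integral(lambda q: -rate(q) / eps, -3.0, 3.0)
    return eps * (log_mass - log_norm)

if __name__ == "__main__":
    for eps in (0.1, 0.01, 0.001):
        print(eps, eps_log_mu(eps))  # values should drift toward -0.25
```

The prefactor $g_\epsilon = 1/Z_\epsilon$ only contributes a term of order $\epsilon \log \epsilon$, which is why it disappears in the limit.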
In particular, the factor $g_\epsilon$ plays no role in the analysis of large deviations; and it is this fact which accounts for the relative simplicity of this sort of analysis. Moreover, it is often easy to extend (1.1.1) to cover all $\Gamma$’s. For instance, such an extension can certainly be made if one knows that for each $L > 0$ there is a $\Gamma_L$ such that (1.1.2)
m(I‘L)
0,
The reader is assumed to be familiar with some form of WIENER’S basic existence theorem and with the basic properties of WIENER’S measure $W$. In particular, he is advised to reconcile the statement which he knows with the one given above.

We can now describe the family of measures which we want to study in this section. Namely, for each $\epsilon > 0$, let $W_\epsilon$ denote the distribution of $\theta \longmapsto \epsilon^{1/2}\theta$ under $W$. Clearly $W_\epsilon \Rightarrow \delta_0$, where $\delta_0$ is the point mass at the path which never leaves 0. Hence, we are again dealing with a family for which it is reasonable to ask about large deviations.

Before getting into the details, it may be helpful to make a couple of remarks. In the first place, it should be noted that, at least formally, we are dealing here with a situation like the one discussed in Section 1.1. Indeed, an often useful heuristic representation of WIENER’S measure is the formula

$$W(d\theta) = \frac{1}{c} \exp\left[-\frac12 \int_0^\infty |\dot\theta(t)|^2\,dt\right] d\theta. \tag{1.3.6}$$
The expression in (1.3.6) is somewhat fanciful. Indeed, none of the quantities on the right hand side makes sense by itself. In particular, “$d\theta$” stands for the (nonexistent) translation invariant measure on $\Theta$, $\dot\theta$ denotes the derivative (which $W$-almost surely fails to exist) of $\theta$, and the constant “$c$” is infinite. Thus, (1.3.6) is at best just a schematic representation of what one gets by formally passing to the limit in the expression for the $W$-measure of a subset of $\Theta$ whose description involves a continuum of times. Leaving such technicalities aside, one has to admit that to whatever degree one accepts (1.3.6), one has to grant the expression

$$W_\epsilon(d\theta) = \frac{1}{c_\epsilon} \exp\left[-\frac{1}{2\epsilon} \int_0^\infty |\dot\theta(t)|^2\,dt\right] d\theta$$

an equal degree of acceptance; and, on the basis of this expression combined with the discussion in Section 1.1, one is led to predict that the function $I$ governing the large deviations of $\{W_\epsilon : \epsilon > 0\}$ ought to be $\frac12 \int_0^\infty |\dot\theta(t)|^2\,dt$.

A second remark, and the one on which our analysis will be based, is that the family $\{W_\epsilon : \epsilon > 0\}$ is related to the sort of family which was handled in Section 1.2; and, as we will see later (cf. Sections 3.3 and 3.4), the result which we are about to obtain can be considered as a consequence of CRAMÉR’S Theorem for measures on $\Theta$. To understand the relationship with the situation dealt with in CRAMÉR’S Theorem, note that the measure $W_{1/n}$ here is precisely the distribution under $W^n$ of the random variable

$$(\theta_1, \dots, \theta_n) \in \Theta^n \longmapsto \frac{1}{n} \sum_{k=1}^n \theta_k \in \Theta.$$
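Hence, on the basis of CRAMÉR’S Theorem, we should predict that the $I$ governing the large deviations of $\{W_\epsilon : \epsilon > 0\}$ is the LEGENDRE transform of the logarithmic moment generating function for $W$. In this connection, one should observe that the quantity $\Lambda_W$ introduced in WIENER’S Theorem above is the logarithmic moment generating function

$$\Lambda_W(\lambda) = \log \int_\Theta \exp\big[{}_{\Theta^*}\langle \lambda, \theta \rangle_\Theta\big]\, W(d\theta), \quad \lambda \in \Theta^*,$$

for $W$. Thus, what we are now predicting is that the function

$$\Lambda_W^*(\psi) = \sup\big\{ {}_{\Theta^*}\langle \lambda, \psi \rangle_\Theta - \Lambda_W(\lambda) : \lambda \in \Theta^* \big\}, \quad \psi \in \Theta, \tag{1.3.7}$$

is the function which governs the large deviations of $\{W_\epsilon : \epsilon > 0\}$. We begin our rigorous analysis with a lemma which shows, among other things, that the two predictions made above are at least consistent.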
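As a one-dimensional sanity check on the LEGENDRE transform prediction, the following sketch evaluates the transform by brute force over a grid. It is an illustration only: the standard GAUSSIAN logarithmic moment generating function $\lambda^2/2$ stands in for $\Lambda_W$, and the function names and grid bounds are assumptions chosen for the example.

```python
def legendre_transform(log_mgf, x, lam_lo=-10.0, lam_hi=10.0, n=20_001):
    """Approximate sup over lam of (lam * x - log_mgf(lam)) on a uniform grid."""
    step = (lam_hi - lam_lo) / (n - 1)
    return max(
        (lam_lo + k * step) * x - log_mgf(lam_lo + k * step)
        for k in range(n)
    )

def gaussian_log_mgf(lam):
    """Logarithmic moment generating function of a standard normal variable."""
    return 0.5 * lam * lam

if __name__ == "__main__":
    for x in (0.0, 0.5, 1.5):
        # The exact transform in this case is x**2 / 2.
        print(x, legendre_transform(gaussian_log_mgf, x), 0.5 * x * x)
```

The grid must contain the maximizing $\lambda$ (here $\lambda = x$); for $|x|$ beyond the grid bounds the approximation would be too small.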
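1.3.8 Lemma. Given $\lambda \in \Theta^*$, define $\psi_\lambda \in \Theta$ by

$$\psi_\lambda(t) = \int_0^t \lambda((s,\infty))\,ds, \quad t \ge 0. \tag{1.3.9}$$

Then, for all $\lambda, \eta \in \Theta^*$,

$$\int_\Theta {}_{\Theta^*}\langle \lambda, \theta \rangle_\Theta\; {}_{\Theta^*}\langle \eta, \theta \rangle_\Theta\, W(d\theta) = {}_{\Theta^*}\langle \lambda, \psi_\eta \rangle_\Theta = \int_0^\infty \lambda((t,\infty)) \cdot \eta((t,\infty))\,dt. \tag{1.3.10}$$

Next, define $H^1 = H^1([0,\infty); \mathbb{R}^d)$ to be the space of $\psi \in \Theta$ with the property that $\psi(t) = \int_0^t \dot\psi(s)\,ds$, $t \ge 0$, for some $\dot\psi \in L^2([0,\infty); \mathbb{R}^d)$; and set

$$\|\psi\|_{H^1} = \|\dot\psi\|_{L^2([0,\infty);\mathbb{R}^d)} \quad \text{for } \psi \in H^1.$$

Then

$$\Lambda_W^*(\psi) = \begin{cases} \frac12 \|\psi\|_{H^1}^2 & \text{if } \psi \in H^1, \\ \infty & \text{if } \psi \in \Theta \setminus H^1. \end{cases} \tag{1.3.12}$$

In particular,

$$\Lambda_W^*(\psi_\lambda) = \Lambda_W(\lambda), \quad \lambda \in \Theta^*, \tag{1.3.13}$$

and for each $L \ge 0$, $\{\psi \in \Theta : \Lambda_W^*(\psi) \le L\} \subset\subset \Theta$ (i.e., is a compact subset).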
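For a path sampled on a uniform time grid, the quantity appearing in (1.3.12) can be approximated by a difference quotient. The sketch below assumes scalar paths ($d = 1$) that are constant after the last grid point; `half_h1_energy` is a hypothetical helper, not notation from the text. It compares the straight-line path $\min(t,1)$ with $\min(t^2,1)$, both of which reach level 1 at time 1.

```python
def half_h1_energy(path, dt):
    """Discrete stand-in for (1/2) * integral of |psi'(t)|^2 dt.

    `path` holds psi(k * dt) for k = 0, 1, ...; the derivative is replaced
    by forward differences, so the result is (1/2) * sum(increment^2) / dt.
    """
    increments = (path[k + 1] - path[k] for k in range(len(path) - 1))
    return 0.5 * sum(inc * inc for inc in increments) / dt

if __name__ == "__main__":
    dt = 0.01
    grid = [k * dt for k in range(201)]             # times in [0, 2]
    straight = [min(t, 1.0) for t in grid]          # psi(t) = min(t, 1)
    curved = [min(t * t, 1.0) for t in grid]        # psi(t) = min(t^2, 1)
    print(half_h1_energy(straight, dt))             # close to 1/2
    print(half_h1_energy(curved, dt))               # close to 2/3
```

The straight line has the smaller energy, in line with the CAUCHY-SCHWARZ bound $\psi(1)^2 \le \int_0^1 |\dot\psi(t)|^2\,dt$.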
PROOF: We first observe that the second equality in (1.3.10) is an elementary integration by parts. Second, we note that it suffices to prove the first equality in (1.3.10) when $\lambda = \eta$ (since the general case then follows from this by polarization) and that, by an elementary approximation argument, we need only handle $\lambda \in \Theta^*$ which are non-atomic and compactly supported. But in this case we have:

$$\int_\Theta {}_{\Theta^*}\langle \lambda, \theta \rangle_\Theta^2\, W(d\theta) = \iint s \wedge t\; \lambda(ds) \cdot \lambda(dt) = 2 \int_{[0,\infty)} t\, \lambda((t,\infty)) \cdot \lambda(dt) = -\int_{[0,\infty)} t\, d|\lambda((t,\infty))|^2 = \int_0^\infty |\lambda((t,\infty))|^2\,dt.$$
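Turning to the proof of (1.3.12), first suppose that $\psi \in H^1$. Then

$${}_{\Theta^*}\langle \lambda, \psi \rangle_\Theta = \int_0^\infty \dot\psi(t) \cdot \lambda((t,\infty))\,dt, \quad \lambda \in \Theta^*,$$

and therefore, by (1.3.10), $\Lambda_W^*(\psi)$ is equal to

$$\sup\left\{ \int_0^\infty \dot\psi(t) \cdot \lambda((t,\infty))\,dt - \frac12 \int_0^\infty |\lambda((t,\infty))|^2\,dt : \lambda \in \Theta^* \right\} = \frac12 \|\psi\|_{H^1}^2.$$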
Hence, the proof of (1.3.12) will be complete once we show that $\psi \in H^1$ whenever $\Lambda_W^*(\psi) < \infty$. But if $\phi \in C_c^\infty([0,\infty); \mathbb{R}^d)$ (i.e., it is a smooth path with compact support) and we define $\lambda \in \Theta^*$ by $\phi(t) = \lambda((t,\infty))$, $t \ge 0$, then

$$\left| \int_0^\infty \psi(t) \cdot \dot\phi(t)\,dt \right| \le \big(2 \Lambda_W^*(\psi)\big)^{1/2} \|\phi\|_{L^2([0,\infty);\mathbb{R}^d)};$$

and so there exists a unique $\dot\psi \in L^2([0,\infty); \mathbb{R}^d)$ such that

$$-\int_0^\infty \psi(t) \cdot \dot\phi(t)\,dt = \int_0^\infty \dot\psi(t) \cdot \phi(t)\,dt, \quad \phi \in C_c^\infty([0,\infty); \mathbb{R}^d).$$

From here it is an easy step to the conclusion that $\psi(t) = \int_0^t \dot\psi(s)\,ds$, $t \ge 0$, and therefore that $\psi \in H^1$.

Given (1.3.12), (1.3.13) is clearly a consequence of (1.3.11). To complete the proof, note first that, directly from its definition in (1.3.7), $\Lambda_W^*$ is lower semicontinuous. Thus, the fact that $\{\psi : \Lambda_W^*(\psi) \le L\}$ is compact follows immediately from (1.3.12) and the easily verified observation that bounded subsets of $H^1$ are relatively compact in $\Theta$. ∎

We will now prove a slightly deficient form of the right hand side of (1.1.5) with $I = \Lambda_W^*$. The reader should remark the similarity of this argument with the proof of Lemma 1.2.5.
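1.3.14 Lemma. Let $\psi \in \Theta$ be given. Then for each $\delta > 0$ there exists an $r > 0$ such that

$$\varlimsup_{\epsilon \to 0} \epsilon \log\big(W_\epsilon(\bar B(\psi, r))\big) \le \begin{cases} -\Lambda_W^*(\psi) + \delta & \text{if } \Lambda_W^*(\psi) < \infty, \\ -1/\delta & \text{if } \Lambda_W^*(\psi) = \infty. \end{cases} \tag{1.3.15}$$

(Here, and throughout, $B(x,r)$ denotes the open ball of radius $r$ around a point $x$ in a metric space; and $\bar B(x,r)$ denotes the corresponding closed ball.) In particular, if $K \subset\subset \Theta$, then

$$\varlimsup_{\epsilon \to 0} \epsilon \log\big(W_\epsilon(K)\big) \le -\inf_K \Lambda_W^*. \tag{1.3.16}$$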
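PROOF: To prove (1.3.15), note that

$$W_\epsilon(\bar B(\psi, r)) = W\big(\bar B(\psi/\epsilon^{1/2}, r/\epsilon^{1/2})\big) \le \exp\left[-\frac{1}{\epsilon}\Big({}_{\Theta^*}\langle \lambda, \psi \rangle_\Theta - \Lambda_W(\lambda) - r \|\lambda\|_{\Theta^*}\Big)\right]$$

for all $\lambda \in \Theta^*$. If $\Lambda_W^*(\psi) = \infty$, choose $\lambda \in \Theta^*$ so that ${}_{\Theta^*}\langle \lambda, \psi \rangle_\Theta - \Lambda_W(\lambda) \ge 1 + 1/\delta$ and $r = 1/(1 + \|\lambda\|_{\Theta^*})$. If $\Lambda_W^*(\psi) < \infty$, choose $\lambda \in \Theta^*$ so that ${}_{\Theta^*}\langle \lambda, \psi \rangle_\Theta - \Lambda_W(\lambda) \ge \Lambda_W^*(\psi) - \delta/2$ and $r = \delta / \big(2(1 + \|\lambda\|_{\Theta^*})\big)$.

To prove (1.3.16), set $\ell = \inf_K \Lambda_W^*$ and, for given $\delta > 0$, use (1.3.15) and the compactness of $K$ to choose $\psi_1, \dots, \psi_n \in K$ and $r_1, \dots, r_n \in (0,\infty)$ so that $K \subseteq \bigcup_1^n B(\psi_k, r_k)$ and

$$\varlimsup_{\epsilon \to 0} \epsilon \log\big(W_\epsilon(\bar B(\psi_k, r_k))\big) \le -\big((\ell - \delta) \wedge (1/\delta)\big), \quad 1 \le k \le n.$$

Then

$$W_\epsilon(K) \le n \max_{1 \le k \le n} W_\epsilon(\bar B(\psi_k, r_k)),$$

and so

$$\varlimsup_{\epsilon \to 0} \epsilon \log\big(W_\epsilon(K)\big) \le -\big((\ell - \delta) \wedge (1/\delta)\big).$$

Finally, let $\delta \searrow 0$. ∎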
1.3.17 Remark. Suppose that $\{\mu_\epsilon : \epsilon > 0\}$ is a family of probability measures on a BANACH space $(X, \|\cdot\|)$ and let $\Lambda_{\mu_\epsilon}$ denote the logarithmic moment generating function for $\mu_\epsilon$ (i.e.,

$$\Lambda_{\mu_\epsilon}(\lambda) = \log \int_X \exp\big[{}_{X^*}\langle \lambda, x \rangle_X\big]\, \mu_\epsilon(dx)$$

for $\lambda \in X^*$). Further, assume that

$$\Lambda(\lambda) \equiv \lim_{\epsilon \to 0} \epsilon \Lambda_{\mu_\epsilon}(\lambda/\epsilon) \tag{1.3.18}$$

exists for every $\lambda \in X^*$. Then the argument used to prove Lemma 1.3.14 leads to the conclusion that for any $K \subset\subset X$

$$\varlimsup_{\epsilon \to 0} \epsilon \log\big(\mu_\epsilon(K)\big) \le -\inf_K \Lambda^*,$$

where

$$\Lambda^*(x) = \sup\big\{ {}_{X^*}\langle \lambda, x \rangle_X - \Lambda(\lambda) : \lambda \in X^* \big\}$$

is the LEGENDRE transform of $\Lambda$. In the particular case treated in Lemma 1.3.14, we had that $\epsilon \Lambda_{W_\epsilon}(\lambda/\epsilon) = \Lambda_W(\lambda)$, and so (1.3.18) was trivial. See Theorem 2.2.4 for more details on this subject.

Although the result obtained in Lemma 1.3.14 is restricted to compact subsets and is therefore less than we really want, we will turn to the left hand side of (1.1.5) before addressing the problem of extending (1.3.16) to all closed sets. Just as in the proof of Theorem 1.2.6, the key to proving the left hand side of (1.1.5) is the use of an efficient method for moving the “center” (or mean) of the measures $W_\epsilon$. In the present setting, this key is contained in the following important quasi-invariance property of WIENER’S measure.

1.3.19 Lemma. (CAMERON & MARTIN) Given $\lambda \in \Theta^*$, let $W_\lambda$ denote the distribution of $\theta \longmapsto \theta + \psi_\lambda$ under $W$, where $\psi_\lambda$ is the element of $\Theta$ described in (1.3.9). Then $W_\lambda$ ... $> 0$ so that $B(\psi, r) \subseteq G$. Then $\psi = \psi_\lambda$ and so, for $0 < \delta < r$,

$$W_\epsilon(G) \ge W_\epsilon(B(\psi, \delta)) = W_{-\lambda/\epsilon^{1/2}}\big(B(0, \delta/\epsilon^{1/2})\big)$$

Since, by (1.3.13), $\Lambda_W(\lambda) = \Lambda_W^*(\psi)$, we see that (1.3.22) holds. ∎

We must now return to the problem of removing the compactness restriction from Lemma 1.3.14. Our idea will be to produce a family of compact sets $K_L$, $L > 0$, with the property that

$$\varlimsup_{\epsilon \to 0} \epsilon \log\big(W_\epsilon(K_L^c)\big) \le -L, \quad L > 0. \tag{1.3.23}$$

What (1.3.23) says is that, as $L \nearrow \infty$, the events $K_L^c$ become so “deviant” that they cannot even be seen on the scale at which we are looking; and, therefore, they cannot contribute to our calculation (cf. the proof of Theorem 1.3.27 below).
There are several ways in which one can go about constructing the sets $K_L$. The method which we will adopt here will be to construct a function $\Phi : \Theta \longrightarrow [0,\infty]$ with the properties that

(1) $\Phi$ is subadditive and $\Phi(\alpha\theta) = |\alpha| \Phi(\theta)$ for all $\alpha \in \mathbb{R}$ and $\theta \in \Theta$,
(2) $\{\theta : \Phi(\theta) \le L\} \subset\subset \Theta$ for each $L > 0$,
(3) $W(\{\theta : \Phi(\theta) < \infty\}) = 1$.

In order to construct such a $\Phi$ and to pass from the fact that it exists to (1.3.23), we will make use of the following beautiful and powerful estimate due to X. FERNIQUE [45].

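1.3.24 Theorem. (FERNIQUE) Let $X$ be a real, separable FRÉCHET space and $\Phi : X \longrightarrow [0,\infty]$ a measurable subadditive function with the property that $\Phi(\alpha x) = |\alpha| \Phi(x)$ for all $\alpha \in \mathbb{R}$ and $x \in X$. Next, let $\mu$ be a probability measure on $(X, \mathcal{B}_X)$ with the property that $\mu^2$ on $(X^2, \mathcal{B}_{X^2})$ is invariant under the transformation

$$(x, y) \longmapsto \left( \frac{x + y}{\sqrt 2}, \frac{x - y}{\sqrt 2} \right).$$

If $\mu(\{x : \Phi(x) < \infty\}) = 1$, then there exists an $\alpha > 0$ for which

$$\int_X \exp\big[\alpha \Phi(x)^2\big]\, \mu(dx) < \infty.$$

PROOF: Given $0 < s < t$, we have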
and therefore
Working by induction, we conclude from this that
where
Thus if
ct
< a / ( 2 P ) , then
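1.3.25 Lemma. For $\theta \in \Theta$ set

Then $\{\theta \in \Theta : \Phi(\theta) \le L\} \subset\subset \Theta$ for each $L > 0$; and there exists an $\alpha > 0$ such that

$$\int_\Theta \exp\big[\alpha \Phi(\theta)^2\big]\, W(d\theta) < \infty. \tag{1.3.26}$$

In particular, if $K_L = \{\theta : \Phi(\theta)^2 \le L/\alpha\}$, then $K_L \subset\subset \Theta$ and (1.3.23) holds.

PROOF: The proof that, for every $R \in (0,\infty)$, $\{\theta : \Phi(\theta) \le R\} \subset\subset \Theta$ is a standard application of the ASCOLI-ARZELÀ criterion combined with a diagonalization argument. The details are left to the reader.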
To prove that (1.3.26) holds for some $\alpha > 0$, we first observe that $W^2$ has the invariance property required in FERNIQUE’S Theorem. (Indeed, any centered GAUSSIAN measure on a FRÉCHET space will have this property.) Thus, the existence of $\alpha$ will follow once we show that $W(\{\theta : \Phi(\theta) < \infty\}) = 1$. To this end, note that, by parts (iv) and (vi) of Theorem 1.3.2 combined with FERNIQUE’S Theorem,

for some $A_d < \infty$. At the same time, again as a consequence of FERNIQUE’S Theorem and elementary properties of $W$, we see that

for some $B_d < \infty$. Finally, since, by (iv) in Theorem 1.3.2,

for some $C_d < \infty$, we can combine these into the estimate that

which is more than enough for our purposes.

Knowing (1.3.26), we can proceed to prove (1.3.23) as follows:

$$W_\epsilon(K_L^c) = W\big(\{\theta : \alpha \Phi(\theta)^2 > L/\epsilon\}\big) \le \exp[-L/\epsilon] \int_\Theta \exp\big[\alpha \Phi(\theta)^2\big]\, W(d\theta).$$
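Together with (1.3.26), this surely leads to (1.3.23). ∎

1.3.27 Theorem. (SCHILDER) For every $\Gamma \in \mathcal{B}_\Theta$:

$$-\inf_{\Gamma^\circ} \Lambda_W^* \le \varliminf_{\epsilon \to 0} \epsilon \log\big(W_\epsilon(\Gamma)\big) \le \varlimsup_{\epsilon \to 0} \epsilon \log\big(W_\epsilon(\Gamma)\big) \le -\inf_{\bar\Gamma} \Lambda_W^*. \tag{1.3.28}$$

PROOF: In view of Lemma 1.3.21, all that we have to do is show that

$$\varlimsup_{\epsilon \to 0} \epsilon \log\big(W_\epsilon(F)\big) \le -\inf_F \Lambda_W^*$$

for each closed set $F$. To this end, let $\ell = \inf_F \Lambda_W^*$ and, for $L > 0$, set $F_L = F \cap K_L$, where $K_L$ is the compact subset produced in the preceding lemma. Then $W_\epsilon(F) \le W_\epsilon(F_L) + W_\epsilon(K_L^c)$; and so, by Lemma 1.3.14 and (1.3.23),

$$\varlimsup_{\epsilon \to 0} \epsilon \log\big(W_\epsilon(F)\big) \le -(\ell \wedge L).$$

After letting $L \nearrow \infty$, we arrive at the required result. ∎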
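SCHILDER’S Theorem can be tested on an event whose probability is known in closed form. The sketch below assumes $d = 1$ and takes $\Gamma = \{\theta : \theta(1) \ge 1\}$, an event chosen for illustration and not discussed in the text. Under $W_\epsilon$ the coordinate $\theta(1)$ is a centered GAUSSIAN variable with variance $\epsilon$, and the infimum of $\frac12 \|\psi\|_{H^1}^2$ over paths with $\psi(1) \ge 1$ is $\frac12$, attained by $\psi(t) = \min(t,1)$. The function name is hypothetical.

```python
import math

def eps_log_endpoint_prob(eps, level=1.0):
    """eps * log W_eps({theta : theta(1) >= level}) for scalar Brownian paths.

    Under W_eps, theta(1) is N(0, eps), so the probability is a Gaussian
    tail, written here with the complementary error function.
    """
    tail = 0.5 * math.erfc(level / math.sqrt(2.0 * eps))
    return eps * math.log(tail)

if __name__ == "__main__":
    # Much smaller eps would underflow the tail probability to 0.0,
    # so the illustration stops at eps = 0.001.
    for eps in (0.1, 0.01, 0.001):
        print(eps, eps_log_endpoint_prob(eps))  # should approach -0.5
```

Since the interior and the closure of this $\Gamma$ give the same infimum, the upper and lower bounds in (1.3.28) coincide here.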
1.3.29 Exercise. Given $\psi \in \Theta$ and $n \ge 1$, define

Show that $V(\psi) < \infty$ if and only if $\psi \in H^1$, and $V(\psi) = \|\psi\|_{H^1}^2$ for $\psi \in H^1$.
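1.3.32 Exercise. Lemma 1.3.19 is not a complete statement of CAMERON and MARTIN’S result [15]. Indeed, suppose that $\psi \in H^1$ and choose $\{\psi_n\}_1^\infty \subseteq C_c^\infty([0,\infty); \mathbb{R}^d)$ so that $\|\psi_n - \psi\|_{H^1} \to 0$. Set

$$\Phi_n(\theta) = {}_{\Theta^*}\langle \lambda_n, \theta \rangle_\Theta, \quad \theta \in \Theta,$$

where $\lambda_n$ is the element of $\Theta^*$ defined by $\lambda_n((t,\infty)) = \dot\psi_n(t)$, $t \ge 0$. Show that $\Phi_n \to \Phi$ in $L^2(W)$, where $\Phi$ under $W$ is GAUSSIAN with mean 0 and variance $\|\psi\|_{H^1}^2$. Next, show that $\exp[\Phi_n - \Lambda_W(\lambda_n)] \to \exp[\Phi - \frac12 \|\psi\|_{H^1}^2]$ in $L^1(W)$. Finally, conclude from this that if $W_\psi$ denotes the distribution of $\theta \longmapsto \theta + \psi$ under $W$, then $W_\psi$ ...

$$\ge -\inf\{I_T(\psi) : X(\psi) \in G\} = -\inf\{I_T \circ X^{-1}(\phi) : \phi \in G \text{ and } \phi(0) = 0\}$$

$$= -\inf\left\{ \frac12 \int_0^T |\dot X(t) - b(X(t))|^2\,dt : X \in G \text{ and } X(0) = 0 \right\}$$

and

$$H_T^1 = H^1([0,T]; \mathbb{R}^d) = \{\psi|_{[0,T]} : \psi \in H^1\}. \tag{1.4.9}$$

Similarly, if $F \subseteq \Omega_T$ is closed, then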
In other words, SCHILDER’S Theorem leads directly to a large deviation result for $\{P_\epsilon : \epsilon > 0\}$.

The preceding example of VENTCEL and FREIDLIN’S theory is as simple as it is because the map $\theta \longmapsto X(\theta)$ is especially pleasant; in particular, it is continuous and its inverse is easy to compute. In general, the maps involved are not only more complicated but are not even continuous. To be precise, let $a : \mathbb{R}^d \longrightarrow \mathbb{R}^d \otimes \mathbb{R}^d$ be symmetric matrix valued, $b : \mathbb{R}^d \longrightarrow \mathbb{R}^d$, and assume that there exists an $M \in [1,\infty)$ such that
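(In (1.4.11) and elsewhere, $\|\cdot\|_{\mathrm{H.S.}}$ stands for the HILBERT-SCHMIDT norm.) Next, for $x \in \mathbb{R}^d$ and $\epsilon > 0$, let $X_\epsilon^x : [0,T] \times \Theta \longrightarrow \mathbb{R}^d$ be the $W$-almost surely unique $\{\mathcal{B}_t : t \in [0,T]\}$-progressively measurable solution to the ITÔ stochastic integral equation

$$X_\epsilon^x(t,\theta) = x + \epsilon^{1/2} \int_0^t \sigma\big(X_\epsilon^x(s,\theta)\big)\,d\theta(s) + \int_0^t b\big(X_\epsilon^x(s,\theta)\big)\,ds, \quad t \in [0,T], \tag{1.4.12}$$

where $\sigma = a^{1/2}$; and define $P_\epsilon^x = W \circ (X_\epsilon^x)^{-1}$ on $(\Omega_T, \mathcal{B}_{\Omega_T})$ (since $X_\epsilon^x(\cdot,\theta) \in \Omega_T$ for $W$-almost every $\theta \in \Theta$, there is no problem with considering $P_\epsilon^x$ on $\Omega_T$). Once again, $P_\epsilon^x \Rightarrow \delta_{X_0^x}$, where $X_0^x$ is the integral curve

$$X_0^x(t) = x + \int_0^t b\big(X_0^x(s)\big)\,ds, \quad t \in [0,T].$$

Moreover, if one pretends that (1.4.12) means that

$$\dot X_\epsilon^x(t,\theta) = \epsilon^{1/2} \sigma\big(X_\epsilon^x(t,\theta)\big) \dot\theta(t) + b\big(X_\epsilon^x(t,\theta)\big), \quad t \in [0,T],$$

(this is not even formally correct, since we are dealing with ITÔ and not STRATONOVICH integrals; however, this error becomes negligible as $\epsilon \to 0$) and one ignores all continuity questions, then one can repeat the argument given in the preceding paragraph and thereby arrive at the conjecture that the large deviations of $\{P_\epsilon^x : \epsilon > 0\}$ are governed by the function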
according to whether $X - x \notin H_T^1$ or $X - x \in H_T^1$.

Considering all the objections which one can raise to the above naive line of reasoning, it is somewhat remarkable that the conjecture to which it leads is, nonetheless, absolutely correct. In order to get around the most serious flaw in our heuristic argument (namely, our treatment of the maps $\theta \in \Theta \longmapsto X_\epsilon^x(\theta) \in \Omega_T$ as if they were continuous), we introduce EULER approximations. Namely, set

$$\tau_n(t) = \frac{[nt]}{n}, \quad n \in \mathbb{Z}^+ \text{ and } t \in [0,\infty)$$

(recall that $[r]$ is the integer part of $r \in \mathbb{R}$), and consider the maps $\theta \in \Theta \longmapsto X_{n,\epsilon}^x(\theta) \in \Omega_T$ given by

for $t \in [0,T]$. Clearly the maps $\theta \in \Theta \longmapsto X_{n,\epsilon}^x(\theta) \in \Omega_T$ are continuous. Moreover, $X_{n,\epsilon}^x(\theta) = X_{n,1}^x(\epsilon^{1/2}\theta)$; and so, just as in the original case considered, we can apply SCHILDER’S Theorem to deduce that

$$-\inf_{\Gamma^\circ} I_{n,T,x}^{a,b} \le \varliminf_{\epsilon \to 0} \epsilon \log\big[W(\{\theta : X_{n,\epsilon}^x(\theta) \in \Gamma\})\big] \le \varlimsup_{\epsilon \to 0} \epsilon \log\big[W(\{\theta : X_{n,\epsilon}^x(\theta) \in \Gamma\})\big] \le -\inf_{\bar\Gamma} I_{n,T,x}^{a,b}, \tag{1.4.15}$$

where

according to whether $X - x \notin H_T^1$ or $X - x \in H_T^1$. Since it is clear that $X_{n,\epsilon}^x \to X_\epsilon^x$ in $W$-measure and that $I_{n,T,x}^{a,b} \to I_{T,x}^{a,b}$ as $n \to \infty$, all that stands between us and the conjectured result are estimates which allow us to exchange the order in which $n$-limits and $\epsilon$-limits are taken. The following lemma takes care of the required facts about the convergence of $\{I_{n,T,x}^{a,b}\}_1^\infty$ to $I_{T,x}^{a,b}$.
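The EULER approximation and the scaling identity $X_{n,\epsilon}^x(\theta) = X_{n,1}^x(\epsilon^{1/2}\theta)$ can be sketched for a path sampled on a fine grid. The recursion used below is an assumed form of the scheme, since the defining display did not survive extraction: on each cell $[k/n, (k+1)/n]$ the coefficients are frozen at the value of the approximation at the left endpoint $\tau_n(t)$. Everything here is illustrative: $d = 1$, and `euler_map`, `sigma`, `b` and the test path are hypothetical choices.

```python
import math

def euler_map(theta, dt, cell, eps, sigma, b, x0=0.0):
    """Assumed Euler approximation driven by a sampled path.

    theta[k] is the driving path at time k * dt, and each Euler cell
    spans `cell` fine steps (so 1/n = cell * dt). Within a cell the
    coefficients are evaluated at the cell's left endpoint tau_n(t).
    """
    X = [x0]
    for k in range(1, len(theta)):
        a = ((k - 1) // cell) * cell  # grid index of the frozen left endpoint
        X.append(
            X[a]
            + math.sqrt(eps) * sigma(X[a]) * (theta[k] - theta[a])
            + b(X[a]) * (k - a) * dt
        )
    return X

def sigma(x):
    return 1.0 + 0.5 * math.sin(x)   # hypothetical diffusion coefficient

def b(x):
    return -x                        # hypothetical drift

if __name__ == "__main__":
    dt, cell, eps = 0.001, 50, 0.25
    theta = [math.sin(3.0 * k * dt) for k in range(1001)]  # smooth stand-in path
    direct = euler_map(theta, dt, cell, eps, sigma, b)
    rescaled = euler_map([math.sqrt(eps) * v for v in theta], dt, cell, 1.0, sigma, b)
    print(max(abs(u - v) for u, v in zip(direct, rescaled)))  # essentially zero
```

Because the map only involves finitely many increments of $\theta$ with frozen coefficients, it is defined for every path and depends continuously on it, which is the property the text exploits.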

1.4.17 Lemma. For each $x\in\mathbb R^d$, $\{X:I_{a,b;T}^x(X)\le L\}\subset\subset\Omega_T$ for all $L\ge0$ and $\inf_\Gamma I_{n;a,b;T}^x\to\inf_\Gamma I_{a,b;T}^x$ as $n\to\infty$ for every $\Gamma\subseteq\Omega_T$.
PROOF: Assume that $x=0$ and set $I=I_{a,b;T}^0$ and $I_n=I_{n;a,b;T}^0$. Because $\{X:I(X)\le L\}$ is a bounded subset of $H_T^1$, we will know that it is compact in $\Omega_T$ as soon as we show that it is closed there. To this end, suppose that $\{X_n\}_1^\infty$ is a sequence of elements in $\Omega_T$ with the properties that $X_n\to X$ in $\Omega_T$ and $\sup_n I(X_n)\le L$. Then $X\in H_T^1$ and $X_n\to X$ weakly in $H_T^1$. Since this means that

weakly in $L^2([0,T];\mathbb R^d)$, it follows that $I(X)\le\varliminf_{n\to\infty}I(X_n)\le L$. Thus, $\{X:I(X)\le L\}\subset\subset\Omega_T$. To prove the convergence assertion, first note
that $\inf_\Gamma I=\infty$ if and only if $\inf_\Gamma I_n=\infty$ for every $n\ge1$. Next, note that if $B$ is a bounded subset of $H_T^1$, then
$$\lim_{n\to\infty}\sup_{X\in B}|I_n(X)-I(X)|=0.\tag{1.4.18}$$
In particular, this proves that $\inf_\Gamma I\ge\varlimsup_{n\to\infty}\inf_\Gamma I_n$. Finally, if $\ell=\inf_\Gamma I<\infty$, then we can choose a bounded subset $B$ of $H_T^1$ so that $I(X)\wedge\inf_{n\ge1}I_n(X)\ge\ell+1$ for $X\notin B$. Hence, because $\inf_\Gamma I_n\le\ell+1$ for all sufficiently large $n$'s, we can use (1.4.18) to conclude that
$$\inf_\Gamma I=\inf_{\Gamma\cap B}I=\lim_{n\to\infty}\inf_{\Gamma\cap B}I_n=\lim_{n\to\infty}\inf_\Gamma I_n.$$
∎
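The EULER approximations used above freeze the coefficients at the left endpoint $T_n(s)$ of each interval $[k/n,(k+1)/n)$, so on the grid $t_k=k/n$ they reduce to a finite recursion in the increments of the driving path. The following Python sketch is only an illustration under assumptions that are not in the text: $d=1$, the path $\theta$ is supplied through its grid values `theta[k]`, and the names `euler_path`, `sigma` and `b` (with their particular formulas) are hypothetical. It also checks numerically the scaling identity $X_{n,\epsilon}^x(\theta)=X_{n,1}^x(\epsilon^{1/2}\theta)$.

```python
import math


def euler_path(x0, eps, sigma, b, theta, n):
    """Euler approximation of dX = sqrt(eps)*sigma(X) d(theta) + b(X) dt, X(0) = x0.

    theta[k] is the value of the driving path at time k/n (theta[0] = 0).
    Coefficients are frozen at the left endpoint of each interval [k/n, (k+1)/n].
    """
    path = [x0]
    for k in range(len(theta) - 1):
        x = path[-1]
        d_theta = theta[k + 1] - theta[k]
        path.append(x + math.sqrt(eps) * sigma(x) * d_theta + b(x) / n)
    return path


def sigma(x):
    # Illustrative diffusion coefficient: bounded and strictly positive.
    return 1.0 + 0.1 * math.sin(x)


def b(x):
    # Illustrative drift.
    return -x


theta = [0.0, 0.3, -0.1, 0.4, 0.2]   # a driving path sampled at k/4, k = 0..4
eps = 0.25

direct = euler_path(1.0, eps, sigma, b, theta, n=4)
rescaled = euler_path(1.0, 1.0, sigma, b, [math.sqrt(eps) * v for v in theta], n=4)
max_gap = max(abs(u - v) for u, v in zip(direct, rescaled))
```

Because the recursion involves only finitely many increments of $\theta$, the resulting map is visibly continuous in the uniform topology, which is the point of introducing the approximation.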
As a preliminary to our estimate on the rate of convergence of the $X_{n,\epsilon}^x$'s to $X_\epsilon^x$, we present the following standard estimate for stochastic integrals.

1.4.19 Lemma. Let (Y : [0,00) x 0 RN@ Rd and p : [0, m) x 0 be bounded {&}progressively measurable functions and set
PROOF:Set P ( t , 0) = Y(t, 0)  Y(s, 0) and [ E SN', define
sstp ( ~ , 0dT) for t 2
and
&(e)
= inf{t 2 s :
0 and 2 1,
($F^{(\delta)}$ denotes the open $\delta$-neighborhood around $F$.) Thus, by (1.4.15), Lemma 1.4.17, and Lemma 1.4.21, we see that
(1.4.27)
for every $\delta>0$. Finally, set $\ell_\delta=\inf_{F^{(\delta)}}I_{a,b;T}^x$ for $\delta\ge0$. It is clear that $\ell_\delta\nearrow\ell\le\ell_0$ as $\delta\searrow0$. Suppose that $\ell<\ell_0$. We could then find $\{X_n\}_1^\infty$ and $\ell<L<\ell_0$ so that $X_n\in F^{(1/n)}$ and $I_{a,b;T}^x(X_n)\le L$. Further, by Lemma 1.4.17, we could assume that $X_n\to X$. But clearly this would mean that $X\in F$ and, again by Lemma 1.4.17, that $\inf_F I_{a,b;T}^x\le I_{a,b;T}^x(X)<\ell_0$. Hence we can let $\delta\searrow0$ in (1.4.27) and thereby get the right hand side of (1.4.26). Next, let $G$ be an open set in $\Omega_T$. Then, for each $X\in G$ and $n\ge1$, we see that
Cs

as long as $B(X,2\delta)\subseteq G$. Using (1.4.15), Lemma 1.4.17, and Lemma 1.4.21, we conclude from this that
$$\varliminf_{\epsilon\to0}\epsilon\log\bigl(P_\epsilon^x(G)\bigr)\ge-I_{a,b;T}^x(X).$$
∎
1.4.28 Exercise.
STRASSEN'S Theorem is the function space version of the Classical Law of the Iterated Logarithm. That is, given real-valued, identically distributed, independent random variables $X_1,\dots,X_n,\dots$ with mean 0 and variance 1, set $S_n=\sum_{m=1}^nX_m$, $n\ge1$. Then the Classical Law of the Iterated Logarithm is the statement that (1.4.29)
lim
n’oo
sn =1
P(4
almost surely.
When the $X_n$'s are standard GAUSSIAN random variables, (1.4.29) is an immediate consequence of (1.4.2) with $\Phi(\psi)=\psi(1)$ since, in this case, $\{S_n\}_1^\infty$ has the same distribution as the distribution of $\theta\longmapsto\{\theta(n)\}_1^\infty$ under $W$. It turns out that the general classical result can also be seen as a consequence of STRASSEN'S Theorem. The proof entails the use of the SKOROKHOD Representation Theorem [97]. We outline below how this argument proceeds in the special case when the $X_n$'s are standard BERNOULLI random variables (i.e., $P(X_n=1)=P(X_n=-1)=\frac12$). Throughout the rest of this exercise, the $X_n$'s are BERNOULLI and $d=1$.

(i) Define q,(O) = 0 and Tn+l(f3) =
inf{t  En(8) : t 2 En(8) and lO(t)  8(En(8))l 2 l}, n 2 0
where E,(8) = C:=o~m(8).Show that the 7,’s under W are identically distributed, independent, and have mean 1. Next, set C, = and define Yn(8)= O(E,)  O(E,,) for n 2 1, and show that the Y, ’s (under W )are independent standard BERNOULLI random variables. (Both of these assertions turn on the fact that if T is a {&}stopping time with W(T< m) = 1, then 8 E 0 8(. V T ( 8 ) )  e(T(8)) E 0 under W is independent of Z?, and has distribution W.) Conclude that { S n / P ( n ) } y has the same distribution as the distribution of
cy~~

8

{ t n ( E n ( e ) / n ,0)):
under W . In particular, (1.4.29) for BERNOULLI random variables is equivalent to (1.4.30)
for Walmost every 8.
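The two claims in part (i), that the successive exit times have mean 1 and that the corresponding increments are standard BERNOULLI, can be explored numerically. The sketch below is a hedged illustration rather than part of the exercise: it replaces the WIENER path by a simple random walk with space step $1/k$ and time step $1/k^2$ (an assumption made here for simplicity), and the names `exit_time_and_sign` and `sample_statistics` are hypothetical.

```python
import random


def exit_time_and_sign(k, rng):
    """Run a simple random walk with space step 1/k and time step 1/k**2
    until it first reaches +1 or -1; return (elapsed time, exit value)."""
    position = 0      # measured in units of 1/k
    steps = 0
    while abs(position) < k:
        position += 1 if rng.random() < 0.5 else -1
        steps += 1
    return steps / k**2, (1 if position > 0 else -1)


def sample_statistics(k=8, trials=5000, seed=0):
    """Empirical mean exit time and empirical frequency of exiting at +1."""
    rng = random.Random(seed)
    samples = [exit_time_and_sign(k, rng) for _ in range(trials)]
    mean_time = sum(t for t, _ in samples) / trials
    frac_plus = sum(1 for _, y in samples if y == 1) / trials
    return mean_time, frac_plus


mean_time, frac_plus = sample_statistics()
```

For the walk, the expected number of steps needed to leave $(-k,k)$ is exactly $k^2$, so the rescaled exit time has mean exactly 1, matching the value claimed for the stopping times under $W$.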

(ii) Use the Strong Law of Large Numbers to show that $E_n(\theta)/n\to1$ $W$-almost surely; and from this, together with Theorem 1.4.1, conclude that (1.4.30) holds for $W$-almost every $\theta$. The construction of the $\tau_n$'s for more general random variables is more difficult. (The content of SKOROKHOD'S Theorem is that such $\tau_n$'s always exist.) However, once their existence has been established, the rest of the argument is the same as the one just given for the BERNOULLI case.
1.4.31 Exercise.
There is a more direct approach which can be taken to prove the left hand side of (1.4.26). Namely, given $\psi\in H_T^1$, let $\theta\longmapsto X_\epsilon^{x,\psi}(\theta)$ be the $W$-almost surely unique $\{\mathcal B_t:t\in[0,T]\}$-progressively measurable solution to
for $t\in[0,T]$.

(i) Show that the distribution of 8 H X ~ ~ ~under ( 0 )W is the same as that of 8 X f ( 0 ) under W+/"". (See Exercise 1.3.32 for the notation, and think of 11, as being the element of H 1 with $(it) = $ ( T ) for t 2 T.) (ii) Define Y"(11,) E f l by ~
Using (i) above, Exercise 1.3.32, and HOLDER'Sinequality, show that for every q E [l,00) and T > 0,
Conclude from this that
for all T > 0.
(iii) From (ii), show that for every open $G$ in $\Omega_T$,
$$\varliminf_{\epsilon\to0}\epsilon\log\bigl(P_\epsilon^x(G)\bigr)\ge-\inf\{I_T(\psi):\psi\in H_T^1\text{ and }Y^x(\psi)\in G\};\tag{1.4.32}$$
and show that this is equivalent to the left hand side of (1.4.26).
It should be noted that the preceding derivation does not use in any way the strict positivity of $a(\cdot)$ until the very end. Thus, (1.4.32) holds even if $a$ is allowed to degenerate. However, when $a$ can degenerate, it is not so easy to give as nice an expression as that in (1.4.14) for the quantity on the right hand side of (1.4.32). (cf. Exercise 2.1.25 below.)
1.4.33 Exercise.
Replace (1.4.10) and (1.4.11), respectively, by the assumptions that
$$0<a(x)\le M(1+|x|^2)\,I_{\mathbb R^d}\quad\text{and}\quad|b(x)|\le M(1+|x|^2)^{1/2},\qquad x\in\mathbb R^d,\tag{1.4.34}$$
for some $M\in(0,\infty)$ and that, for each $r\in(0,\infty)$,
for some $M_r\in(0,\infty)$. Show that for each $x\in\mathbb R^d$, $\epsilon>0$, and $T>0$, there is a $W$-almost surely unique $\{\mathcal B_t:t\in[0,T]\}$-progressively measurable solution $\theta\longmapsto X_\epsilon^x(\theta)$ to (1.4.12) and that both

and
are $-\infty$ for every $T>0$. Conclude from these not only that Theorem 1.4.25 continues to hold when (1.4.10) and (1.4.11) are replaced by (1.4.34) and (1.4.35) but also that (1.4.26) can be improved to the statement that
$$-\inf_{\Gamma^\circ}I_{a,b;T}^x\le\varliminf_{\epsilon\to0}\epsilon\log\bigl(P_\epsilon^{x_\epsilon}(\Gamma)\bigr)\le\varlimsup_{\epsilon\to0}\epsilon\log\bigl(P_\epsilon^{x_\epsilon}(\Gamma)\bigr)\le-\inf_{\overline\Gamma}I_{a,b;T}^x\tag{1.4.36}$$
whenever $x_\epsilon\to x$. Also, observe that it is still true that $\{X:I_{a,b;T}^x(X)\le L\}\subset\subset\Omega_T$ for every $L\ge0$.
II
Some Generalities
2.1 The Large Deviation Principle
Having seen several examples for which it is possible to carry out a successful analysis of the large deviations, we will now attempt to formulate into general principles some of the ideas and techniques which proved useful in those examples. Because we never use completeness in this section, we will take $E$ throughout this section to be a separable metric space. A function $I:E\to[0,\infty]$ is said to be a rate function if it is lower semicontinuous. Given a family $\{\mu_\epsilon:\epsilon>0\}\subseteq M_1(E)$ (we often use $M_1(E)$ to denote the space of probability measures on $(E,\mathcal B_E)$), we will say that $\{\mu_\epsilon:\epsilon>0\}$ satisfies the full large deviation principle with rate function $I$ or, equivalently, that the rate function $I$ governs the large deviations of $\{\mu_\epsilon:\epsilon>0\}$ if (1.1.5) holds for every $\Gamma\in\mathcal B_E$. It is clear that if $I$ is a rate function which governs the large deviations of some family $\{\mu_\epsilon:\epsilon>0\}$ then it must be true that $\inf_E I=0$.
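As a concrete instance of the definition, consider the centered GAUSSIAN family $\mu_\epsilon=N(0,\epsilon)$ on $\mathbb R$, whose large deviations are governed by $I(x)=x^2/2$. The Python sketch below is an illustration only (the function names are hypothetical, and the choice of this family is an assumption made for the example): it evaluates $\epsilon\log\mu_\epsilon([a,\infty))$ from the exact tail and compares it with $-\inf_{[a,\infty)}I=-a^2/2$ for $a\ge0$.

```python
import math


def rate(x):
    """Rate function I(x) = x^2 / 2 of the family N(0, eps) as eps -> 0."""
    return 0.5 * x * x


def scaled_log_tail(a, eps):
    """eps * log P(X >= a) for X ~ N(0, eps), computed from the exact tail."""
    tail = 0.5 * math.erfc(a / math.sqrt(2.0 * eps))
    return eps * math.log(tail)


a = 1.0
# Pairs (eps, eps * log mu_eps([a, inf))) for decreasing eps.
values = [(eps, scaled_log_tail(a, eps)) for eps in (0.1, 0.01, 0.001)]
```

The gap between `scaled_log_tail(a, eps)` and `-rate(a)` shrinks like a multiple of $\epsilon\log(1/\epsilon)$, which is the kind of sub-exponential correction that a large deviation principle does not see.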

The following result is elementary but reassuring.
2.1.1 Lemma. For any given $\{\mu_\epsilon:\epsilon>0\}\subseteq M_1(E)$ there is at most one rate function governing the large deviations of $\{\mu_\epsilon:\epsilon>0\}$.
PROOF: Suppose there were two, and name them $I_1$ and $I_2$. Because of lower semicontinuity, we know that $I_j(p)=\lim_{r\searrow0}\inf_{B(p,r)}I_j$ for every $p\in E$. Thus it suffices for us to show that, for each $p\in E$, $\inf_{B(p,r)}I_1=\inf_{B(p,r)}I_2$ for each $r$ in a dense subset of $(0,\infty)$. To this end, observe that
for any $r>0$ with the property that $\inf_{B(p,r)}I_j=\inf_{\overline{B(p,r)}}I_j$. In particular, this will be the case if $r>0$ is a continuity point for the nonincreasing function $s\in(0,\infty)\longmapsto\inf_{B(p,s)}I_j$; and therefore we see that $\inf_{B(p,r)}I_1=\inf_{B(p,r)}I_2$ for all but a countable number of $r>0$. ∎

In all our examples, the governing rate function was not only lower semicontinuous but also had the property that the level sets $\{q\in E:I(q)\le L\}$ were compact for all $L\ge0$. Because such rate functions play a prominent role and since the additional property is extremely useful, we will say that $I:E\to[0,\infty]$ is a good rate function if $\{q\in E:I(q)\le L\}\subset\subset E$ for all $L\ge0$. Some elementary properties of good rate functions are listed in the next result.

2.1.2 Lemma. Let $I$ be a good rate function. Then, for each closed $F$ in $E$,
$$\inf_FI=\lim_{\delta\searrow0}\inf_{F^{(\delta)}}I.\tag{2.1.3}$$
(Recall that $\Gamma^{(\delta)}=\{q\in E:\operatorname{dist}(q,\Gamma)<\delta\}$ for any subset $\Gamma$.) In addition, if $\Phi:E\to[-\infty,\infty)$ is an upper semicontinuous function, then for any closed $F\subseteq E$ on which $\Phi$ is bounded above there is a $q\in F$ such that $\Phi(q)-I(q)=\sup_F(\Phi-I)$.
PROOF: The derivation of (2.1.3) in this general setting differs in no way from the one given for the special case handled at the end of the first paragraph in the proof of Theorem 1.4.25; thus, we will not repeat the argument here. To prove the second assertion, first note that there is nothing to do if $\sup_F(\Phi-I)=-\infty$. Thus, we assume that $\ell=\sup_F(\Phi-I)>-\infty$, in which case we know that $\ell\in(-\infty,\infty)$. Choose $\{q_n\}_1^\infty\subseteq F$ so that $\Phi(q_n)-I(q_n)\ge\ell-\frac1n$. Because $\{q_n\}_1^\infty\subseteq\{q:I(q)\le M-\ell+1\}$, where $M=\sup_F\Phi$, there is a subsequence of $\{q_n\}_1^\infty$ which converges to some $q$; and, because $F$ is closed and $\Phi-I$ is upper semicontinuous, not only is $q\in F$ but also $\Phi(q)-I(q)\ge\ell$. ∎
Another advantage that good rate functions have is that the full large deviation principle is a covariant notion when the rate function is good. (In this connection, we use here and elsewhere the notation $\mu\circ f^{-1}$ to denote the covariant image of a measure $\mu$ under a measurable map $f$. Thus, $\mu\circ f^{-1}(\Gamma)=\mu(f^{-1}(\Gamma))$ for measurable subsets $\Gamma$ of the image space.) That is, such principles can be "pushed forward" under mappings which are "nearly continuous." We have already seen an example of this when we discussed in Section 1.4 the passage from SCHILDER'S Theorem
to the estimate of VENTCEL and FREIDLIN. The next lemma provides a general statement of this technique. (See also Exercise 2.1.20 below.)
2.1.4 Lemma. Let $I$ be a good rate function on $E$, $f$ a measurable map from $E$ into a second separable metric space $(E',\rho')$, and assume that there exists a sequence $\{f_n\}_1^\infty\subseteq C(E;E')$ such that
$$\lim_{n\to\infty}\sup\bigl\{\rho'(f_n(q),f(q)):q\in E\text{ with }I(q)\le L\bigr\}=0\quad\text{for each }L\in(0,\infty).$$
Then the map $I':E'\to[0,\infty]$ given by
$$I'(q')=\inf\{I(q):q\in E\text{ and }q'=f(q)\},\qquad q'\in E',$$
is a good rate function on $E'$. Moreover, if, in addition, $\{\mu_\epsilon:\epsilon>0\}\subseteq M_1(E)$ has the property that
$$\lim_{n\to\infty}\varlimsup_{\epsilon\to0}\epsilon\log\Bigl[\mu_\epsilon\bigl(\{q:\rho'(f_n(q),f(q))\ge\delta\}\bigr)\Bigr]=-\infty$$
for each $\delta\in(0,\infty)$, then $I'$ governs the large deviations of $\{\mu_\epsilon\circ f^{-1}:\epsilon>0\}$ whenever $I$ governs the large deviations of $\{\mu_\epsilon:\epsilon>0\}$. In particular, if $f\in C(E;E')$ and $I$ is a good rate function on $E$ which governs the large deviations of $\{\mu_\epsilon:\epsilon>0\}$, then $I'$ is a good rate function on $E'$ which governs the large deviations of $\{\mu_\epsilon\circ f^{-1}:\epsilon>0\}$.
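The continuous case of Lemma 2.1.4 can be checked by hand in a one-dimensional example. The sketch below is a hedged illustration, not taken from the text: it assumes $\mu_\epsilon=N(0,\epsilon)$ on $\mathbb R$ with rate $I(q)=q^2/2$ and the continuous map $f(q)=q^2$, for which $I'(y)=\inf\{I(q):q^2=y\}=y/2$ when $y\ge0$ and $I'(y)=\infty$ when $y<0$. All function names are hypothetical.

```python
import math


def rate(q):
    """Rate function of N(0, eps) as eps -> 0."""
    return 0.5 * q * q


def pushed_rate(y):
    """I'(y) = inf{ I(q) : q*q = y } for the continuous map f(q) = q*q."""
    if y < 0:
        return math.inf           # empty preimage
    root = math.sqrt(y)
    return min(rate(root), rate(-root))


def scaled_log_prob(c, eps):
    """eps * log P(X*X >= c) for X ~ N(0, eps), via P(|X| >= r) = erfc(r / sqrt(2*eps))."""
    return eps * math.log(math.erfc(math.sqrt(c / (2.0 * eps))))


c = 1.0
predicted = -pushed_rate(c)       # I' is increasing on [0, inf), so inf over [c, inf) is I'(c)
observed = scaled_log_prob(c, 0.001)
```

Here `observed` approximates $\epsilon\log\mu_\epsilon\circ f^{-1}([c,\infty))$ at $\epsilon=0.001$ and should be close to `predicted`, in line with the statement that $I'$ governs the large deviations of the image measures.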
PROOF: One should observe that the case when $f$ is continuous everywhere on $E$ is trivial and therefore really should not be thought of as a consequence of the general result. First, observe that $f$ is continuous on $K_L\equiv\{q\in E:I(q)\le L\}$ for each $L\in[0,\infty)$. Second, suppose that $q'\in E'$ with $I'(q')<\infty$. Then, for some $L\in[0,\infty)$,
$$I'(q')=\inf\{I(q):q\in K_L\text{ and }q'=f(q)\};$$
and therefore, by Lemma 2.1.2, there is a $q\in f^{-1}(q')$ for which $I(q)=I'(q')$. With these preliminaries, we can easily prove that $I'$ is a good rate function. Indeed, if $L\in[0,\infty)$ and
$$\{q_n'\}_1^\infty\subseteq K_L'\equiv\{q'\in E':I'(q')\le L\},$$
then there is a $\{q_n\}_1^\infty\subseteq K_L$ such that $q_n'=f(q_n)$ and $I'(q_n')=I(q_n)$ for each $n\in\mathbb Z^+$. Thus, since $K_L\subset\subset E$, we can choose a subsequence $\{q_{n_m}\}_{m=1}^\infty$ so that $q_{n_m}\to q\in K_L$. Because $f|_{K_L}$ is continuous, this means that
$$q'\equiv f(q)=\lim_{m\to\infty}q_{n_m}'\quad\text{and}\quad I'(q')\le I(q)\le L.$$
That is, $K_L'\subset\subset E'$.
In preparation for the second part of the proof, we next show that, for each closed $F'$ in $E'$,
$$\inf_{F'}I'=\lim_{\delta\searrow0}\lim_{n\to\infty}\inf\bigl\{I(q):\rho'(f_n(q),F')\le\delta\bigr\}.$$

To this end, first suppose that $p'\in F'$ with $I'(p')<\infty$ and $\delta\in(0,\infty)$ are given. Choose $p\in f^{-1}(p')$ so that $I(p)=I'(p')$. Noting that $f_n(p)\to p'$ as $n\to\infty$, we see that there is an $N\in\mathbb Z^+$ such that

for all $n\ge N$; and therefore, we now know that

To prove the opposite inequality, assume that

We can then choose $\{q_m\}_1^\infty\subseteq K_{\ell+1}$ and $n_m\to\infty$ so that

for each $m\in\mathbb Z^+$. Furthermore, because $K_{\ell+1}\subset\subset E$ and $I$ is lower semicontinuous, we may and will assume that $q_m\to q\in K_\ell$. Hence, since $f|_{K_{\ell+1}}$ is continuous and therefore $q'=f(q)\in F'$, we have that
$$\inf_{F'}I'\le I(q)\le\ell.$$
To complete the proof, assume that $I$ governs the large deviations of $\{\mu_\epsilon:\epsilon>0\}$ and that
$$\lim_{n\to\infty}\varlimsup_{\epsilon\to0}\epsilon\log\bigl[\mu_\epsilon(\Gamma(n;\delta))\bigr]=-\infty,\qquad\delta\in(0,\infty),$$
where
$$\Gamma(n;\delta)=\{q:\rho'(f_n(q),f(q))\ge\delta\}.$$
Given an open set $G'$ in $E'$ and $p'\in G'$ with $I'(p')<\infty$, choose $p\in f^{-1}(p')$ so that $I'(p')=I(p)$ and $\delta\in(0,\infty)$ so that $2\delta<\rho'(p',(G')^c)$. Then, since each $f_n$ is continuous and $f_n(p)\to f(p)$, there is an $N\in\mathbb Z^+$ and a sequence $\{r_n\}_{n=N}^\infty\subseteq(0,\infty)$ such that
$$B(p,r_n)\subseteq f_n^{-1}\bigl(B'(p',\delta)\bigr),\qquad n\ge N,$$
where $B'(p',\delta)$ is the $\rho'$-ball in $E'$ of radius $\delta$ around $p'$. Hence, for $n\ge N$,
$$B(p,r_n)\subseteq f^{-1}(G')\cup\Gamma(n;\delta);$$
and therefore, by choosing $n\ge N$ so that
$$\varlimsup_{\epsilon\to0}\epsilon\log\bigl[\mu_\epsilon(\Gamma(n;\delta))\bigr]\le-I'(p')-1,$$
we see, from the large deviation principle for $\{\mu_\epsilon:\epsilon>0\}$, that

from which we conclude that
$$\varliminf_{\epsilon\to0}\epsilon\log\bigl[\mu_\epsilon\circ f^{-1}(G')\bigr]\ge-\inf_{G'}I'.$$
Finally, for closed F’ in E’, set
and note that fl
(F’) g
f,l
(~(6 n)); u
6)
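for $n\in\mathbb Z^+$ and $\delta\in(0,\infty)$. Hence, for every $n\in\mathbb Z^+$ and $\delta>0$,

where
$$-R(n;\delta)\equiv\varlimsup_{\epsilon\to0}\epsilon\log\bigl[\mu_\epsilon(\Gamma(n;\delta))\bigr].$$
Since, by hypothesis, $R(n;\delta)\to\infty$ as $n\to\infty$ for each $\delta\in(0,\infty)$ and because, by the preceding paragraph,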
the large deviation principle for $\{\mu_\epsilon:\epsilon>0\}$ now leads to
Another situation which we encountered in Chapter I (cf. the proof of SCHILDER'S Theorem) is that of a deficient large deviation principle; namely, one in which the right hand side of (1.1.5) has been proved only when the set $\Gamma$ is relatively compact. As it was there, such a large deviation principle is usually a preliminary step on the way to proving a full large deviation principle. Nonetheless, it arises sufficiently often to warrant our giving it a name. Thus, if $I$ is a rate function and $\{\mu_\epsilon:\epsilon>0\}\subseteq M_1(E)$ satisfies
$$\varliminf_{\epsilon\to0}\epsilon\log(\mu_\epsilon(G))\ge-\inf_GI\quad\text{for all open }G\text{ in }E$$
and
$$\varlimsup_{\epsilon\to0}\epsilon\log(\mu_\epsilon(K))\le-\inf_KI\quad\text{for all }K\subset\subset E,$$
then we will say that $\{\mu_\epsilon:\epsilon>0\}$ satisfies the weak large deviation principle with rate function $I$. The passage from a weak to a full large deviation principle is often accomplished by an application of the following simple observation.
2.1.5 Lemma. Let $\{\mu_\epsilon:\epsilon>0\}\subseteq M_1(E)$, and assume that, for each $L\ge0$, there exists a $K_L\subset\subset E$ with the property that
$$\varlimsup_{\epsilon\to0}\epsilon\log\bigl(\mu_\epsilon(K_L^c)\bigr)\le-L.\tag{2.1.6}$$
If $I$ is a rate function and $\{\mu_\epsilon:\epsilon>0\}$ satisfies the weak large deviation principle with rate function $I$, then not only is $I$ a good rate function, but it also governs the large deviations of $\{\mu_\epsilon:\epsilon>0\}$.
PROOF: First note that
$$\inf_{K_L^c}I\ge-\varliminf_{\epsilon\to0}\epsilon\log\bigl(\mu_\epsilon(K_L^c)\bigr)\ge L;$$
and so $\{q:I(q)\le L\}\subseteq K_{L+1}$. Since $I$ is lower semicontinuous, this proves that $I$ is a good rate function. Next, let $F$ be a closed subset in $E$ and set $F_L=F\cap K_L$ for $L\ge0$. Then
$$\mu_\epsilon(F)\le\mu_\epsilon(F_L)+\mu_\epsilon(K_L^c),$$
and so

for every $L\ge0$. Thus we get the required result upon letting $L\nearrow\infty$. ∎
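The last step of the preceding proof rests on the elementary fact that, on the exponential scale, a sum of finitely many terms behaves like its largest term: $\epsilon\log\bigl(e^{-A/\epsilon}+e^{-B/\epsilon}\bigr)\to-(A\wedge B)$ as $\epsilon\to0$. A minimal Python sketch of this fact follows; the function name and the sample rates are hypothetical and chosen only for illustration.

```python
import math


def eps_log_sum(rates, eps):
    """eps * log( sum_i exp(-rates[i] / eps) ), evaluated stably by
    factoring out the dominant term exp(-min(rates) / eps)."""
    m = min(rates)
    return -m + eps * math.log(sum(math.exp(-(r - m) / eps) for r in rates))


rates = [1.0, 2.5]   # e.g. the decay rate on F_L and the level L
approximations = [eps_log_sum(rates, eps) for eps in (1.0, 0.1, 0.01)]
```

In general the value lies between $-\min_i r_i$ and $-\min_i r_i+\epsilon\log n$ for $n$ terms, which is why splitting a closed set into a compact piece and a piece of exponentially small mass costs nothing in the limit.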
We will say that a family $\{\mu_\epsilon:\epsilon>0\}\subseteq M_1(E)$ is exponentially tight if, for each $L>0$, there is a $K_L\subset\subset E$ for which (2.1.6) holds.
We end this section with a result which, in its original version, was first proved by S.R.S. VARADHAN [loti].
2.1.7 Lemma. Let $I$ be a rate function and suppose that $\{\mu_\epsilon:\epsilon>0\}$ satisfies the weak large deviation principle with rate function $I$. If the function $\Phi:E\to[-\infty,\infty]$ is lower semicontinuous, then
$$\varliminf_{\epsilon\to0}\epsilon\log\Bigl(\int_E\exp[\Phi/\epsilon]\,d\mu_\epsilon\Bigr)\ge\sup\bigl\{\Phi(q)-I(q):q\in E\text{ and }\Phi(q)\wedge I(q)<\infty\bigr\}.$$
(Throughout we adopt the convention that the supremum over the empty set is $-\infty$.)
PROOF: Let $q\in E$ satisfy $\Phi(q)\wedge I(q)<\infty$. Then, for each $r>0$,
Since $\Phi(q)=\lim_{r\to0}\inf_{B(q,r)}\Phi$, we conclude that

2.1.8 Lemma. Assume that $I$ is a good rate function and that $\{\mu_\epsilon:\epsilon>0\}$ satisfies the full large deviation principle with rate function $I$. If $\Phi:E\to[-\infty,\infty)$ is an upper semicontinuous function which satisfies

then
$$\varlimsup_{\epsilon\to0}\epsilon\log\Bigl(\int\exp[\Phi/\epsilon]\,d\mu_\epsilon\Bigr)\le\sup_E(\Phi-I).$$
PROOF: We first work in the case when $\Phi\le M$ for some $M\in(0,\infty)$. Given $L>0$, set $K_L=\{q:I(q)\le L\}$. Since $\Phi$ is upper semicontinuous and $K_L\subset\subset E$, we can choose, for given $\delta>0$, a finite set $\{q_m\}_{m=1}^n\subseteq K_L$ and positive numbers $r_1,\dots,r_n$ so that
for 1 5 m 5 n,where B,
= B(q,,
J exp [@/el 44I exp [f ( M +
T,).
6
1%
Thus, if G = Uz=, B,,
( P E
then
))I
(GC)
and so
1
$$\le\sup_E(\Phi-I)\vee(M-L)+2\delta.$$
Now let $\delta\searrow0$ and $L\nearrow\infty$. To treat the general case, set $\Phi_M=\Phi\wedge M$ for $M\in(0,\infty)$, and use the preceding to show that

where
2.1.10 Theorem. (VARADHAN) Let $I$ be a good rate function and assume that $\{\mu_\epsilon:\epsilon>0\}\subseteq M_1(E)$ satisfies the full large deviation principle with rate function $I$. If $\Phi\in C(E;\mathbb R)$ satisfies (2.1.9), then
$$\lim_{\epsilon\to0}\epsilon\log\Bigl(\int\exp[\Phi/\epsilon]\,d\mu_\epsilon\Bigr)=\sup_E(\Phi-I).\tag{2.1.11}$$
In particular, (2.1.11) holds if $\Phi\in C(E;\mathbb R)$ satisfies
(2.1.12)
for some $\alpha\in(1,\infty)$.
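The limit in (2.1.11) can be observed numerically in a simple case. The sketch below is only an illustration under assumptions that are not in the text: $E=\mathbb R$, $\mu_\epsilon=N(0,\epsilon)$ with $I(x)=x^2/2$, and two hypothetical test functions, $\Phi(x)=x$ (for which $\int\exp[\Phi/\epsilon]\,d\mu_\epsilon=e^{1/(2\epsilon)}$ exactly, so both sides equal $1/2$) and $\Phi(x)=-|x-1|$ (for which $\sup(\Phi-I)=-1/2$). The integral is approximated by a Riemann sum carried out in the log domain to avoid overflow.

```python
import math


def eps_log_integral(phi, eps, lo=-6.0, hi=6.0, n=12001):
    """eps * log of the integral of exp(phi/eps) against N(0, eps),
    approximated by a log-domain Riemann sum on [lo, hi]."""
    dx = (hi - lo) / (n - 1)
    xs = [lo + i * dx for i in range(n)]
    exponents = [phi(x) / eps - x * x / (2.0 * eps) for x in xs]
    top = max(exponents)
    total = sum(math.exp(v - top) for v in exponents) * dx
    log_integral = top + math.log(total) - 0.5 * math.log(2.0 * math.pi * eps)
    return eps * log_integral


def sup_phi_minus_rate(phi, lo=-6.0, hi=6.0, n=12001):
    """Grid approximation of sup over x of (phi(x) - x^2/2)."""
    dx = (hi - lo) / (n - 1)
    return max(phi(lo + i * dx) - 0.5 * (lo + i * dx) ** 2 for i in range(n))


def linear(x):
    return x


def tent(x):
    return -abs(x - 1.0)
```

Comparing `eps_log_integral(tent, eps)` with `sup_phi_minus_rate(tent)` for decreasing `eps` shows the two sides approaching each other, with a discrepancy of order $\epsilon$ coming from the sub-exponential prefactor.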
PROOF: In view of Lemma 2.1.7 and Lemma 2.1.8, all that we have to do is check that (2.1.12) implies (2.1.9). But, by HÖLDER'S inequality,

from which (2.1.9) follows immediately when (2.1.12) holds. ∎
2.1.13 Exercise.
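(i) Define EULER'S Gamma function by
$$\Gamma(\gamma)=\int_{(0,\infty)}t^{\gamma-1}e^{-t}\,dt,\qquad\gamma\in(0,\infty).$$
Note that
$$\gamma^{-\gamma+1}\Gamma(\gamma)=\gamma\int_{(0,\infty)}t^{\gamma-1}e^{-\gamma t}\,dt;$$
and using Theorem 2.1.10 together with Exercise 1.1.6, conclude that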
This is, of course, a very weak version of STIRLING’S formula and, as such, it serves as a good example of both the virtues and the deficiencies in the asymptotic theory with which we are dealing.
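To see what such a weak form of STIRLING'S formula looks like numerically, recall that STIRLING'S formula gives $\log\Gamma(\gamma)=(\gamma-\tfrac12)\log\gamma-\gamma+O(1)$, so that $\frac1\gamma\log\bigl(\gamma^{-\gamma+1}\Gamma(\gamma)\bigr)\to-1$ as $\gamma\to\infty$. The Python sketch below (the function name is hypothetical) evaluates this scaled logarithm with `math.lgamma` to avoid overflow.

```python
import math


def scaled_log_gamma(gamma):
    """(1/gamma) * log( gamma**(1 - gamma) * Gamma(gamma) ), computed via lgamma."""
    return ((1.0 - gamma) * math.log(gamma) + math.lgamma(gamma)) / gamma


# Pairs (gamma, scaled value) for increasing gamma.
values = [(g, scaled_log_gamma(g)) for g in (1e1, 1e3, 1e6)]
```

The scaled quantity captures only the exponential rate; the factor of order $\gamma^{1/2}$ in STIRLING'S formula disappears after taking $\frac1\gamma\log$, which illustrates the deficiency mentioned above.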

(ii) Let $W$ be WIENER'S measure on $\Theta$ with $d=1$; and, for given $\beta\in\mathbb R$, define $X_\beta:[0,\infty)\times\Theta\to\mathbb R$ by the equation
and $\sigma_\beta:\Theta\to[0,\infty)$ by
If, for $\epsilon>0$, $\mu_{\beta,\epsilon}\in M_1([0,\infty))$ is the distribution of $\theta\longmapsto\epsilon^{1/2}\sigma_\beta(\theta)$ under $W$, show that $\{\mu_{\beta,\epsilon}:\epsilon>0\}$ satisfies the full large deviation principle with the good rate function $I_\beta:[0,\infty)\to[0,\infty]$ given by
where
“8=(
p2
 w;
for P 2 1
p2
+
for P < 1
W;
and
={
t h e w E ( 0 , n ) s u c h t h a t wcosw=Psinw 0
) that wcoshw = psinhw thew E ( 0 , ~such
ifPE(Oo,l] ifp=1 if /3 E ( 1 , ~ ) .
Hint: Note that, by Lemma 2.1.4 combined with SCHILDER’S Theorem, the desired large deviation principle holds with 1
Ip(u) = inf{
(&t)  P $ ( t ) ) 2 dt : $ E H1with
I’
4(t)2dt = u2
= Ip(1)u2;

and use the calculus of variations to evaluate I p ( 1 ) .
(iii) Next, define Yp : [0,00) x O2 stochastic integral equation
[0,00) to be the solution of the IT^)
under W 2 ,and note that
[l t
Yp(t, 0,0') = exp
X p ( s , 0) d0'(s) 
f
1 t
X $ ( s , 0) ds] .
Letting Pp E M1([0,co))denote the distribution of
(e,e') E o2
++
Yj(i,e,e')
under W 2 ,check that Pp(dy) = pp(y)dy where pp(y) = iqp(1ogy) and, for z # 0,
where 6 = 1/1z1. Finally, use ii) above and VARADHAN'S Theorem to show from this that
and that
lim c log
€+O
1
(J
(O@)
[
1 (1  0 2 / 2 ) 2 exp E
2a2
+ 0 2 / 2 y + I p ( 0 ) ] = ;[I + (1+
~ C X ~ ) ~ / ~ ] .
2a2
2.1.14 Exercise.
Let $\{\mu_\epsilon:\epsilon>0\}\subseteq M_1(E)$ and a rate function $I:E\to[0,\infty]$ be given.
(i) If $I$ is good and $\{\mu_\epsilon:\epsilon>0\}$ satisfies the full large deviation principle with rate $I$, show that there is a $q\in E$ for which $I(q)=0$.
(ii) Assuming that $E$ is locally compact and that $\{\mu_\epsilon:\epsilon>0\}$ satisfies the full large deviation principle with rate $I$, show that $I$ is good if and only if $\{\mu_\epsilon:\epsilon>0\}$ is exponentially tight.
for every lower semicontinuous $\Phi:E\to[-\infty,\infty]$.
lim lim clog [ P E ( B ( Q , T ) ) ]I  I ( q ) , c+o
Q E E,
T\O
show that
In particular, this means, of course, that

lim clog(pe(K)) 5 f;i
I,
EO'
K
CC
E.
Also, check that
$$\varlimsup_{\epsilon\to0}\epsilon\log\Bigl[\int\exp[\Phi/\epsilon]\,d\mu_\epsilon\Bigr]\le\sup(\Phi-I)$$
for every upper semicontinuous $\Phi:E\to[-\infty,\infty)$ which satisfies not only (2.1.9) but also the condition that
$$\{q\in E:\Phi(q)\ge-L\}\subset\subset E,\qquad L\in[0,\infty).$$
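(v) Assume that
$$\lim_{r\searrow0}\varliminf_{\epsilon\to0}\epsilon\log\bigl(\mu_\epsilon(B(q,r))\bigr)=-I(q)=\lim_{r\searrow0}\varlimsup_{\epsilon\to0}\epsilon\log\bigl(\mu_\epsilon(B(q,r))\bigr),\qquad q\in E.$$
Show that $\{\mu_\epsilon:\epsilon>0\}$ satisfies the weak large deviation principle with rate $I$ and that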
for $\Phi\in C(E;\mathbb R)$ which satisfy (2.1.9) and the condition
$$\{q\in E:\Phi(q)\ge-L\}\subset\subset E,\qquad L\in[0,\infty).$$
2.1.15 Exercise.

For each $i$ from an index set $\mathcal I$ let $\{\mu_{i,\epsilon}:\epsilon>0\}$ be a family of probability measures on $E$. Assume that there exists a good rate function $I:E\to[0,\infty]$ with the property that
(2.1.16)
Show that for any $\Phi\in C(E;\mathbb R)$ which satisfies
one has that
 sup{@(q) I ( q ) : q E
EIJ = 0.
In particular, show that (2.1.17), and therefore (2.1.18), holds if $\Phi\in C(E;\mathbb R)$ satisfies (2.1.19) for some $\alpha\in(1,\infty)$.
2.1.20 Exercise.
This exercise contains several variations on the theme of Lemma 2.1.4. Throughout, $\{\mu_\epsilon:\epsilon>0\}\subseteq M_1(E)$, $I:E\to[0,\infty]$ is a good rate function, and $E'$ is a second separable metric space.
(i) Assume that $\{\mu_\epsilon:\epsilon>0\}$ satisfies the full large deviation principle with rate $I$. Further, assume that there is a nondecreasing family $\{F_L:L\ge0\}$
of closed sets in $E$ with the properties that $\mu_\epsilon(F_\infty)=1$, $\epsilon>0$, where $F_\infty\equiv\bigcup_{L\ge0}F_L$, and that
$$\varlimsup_{\epsilon\to0}\epsilon\log\bigl(\mu_\epsilon(F_L^c)\bigr)\le-L,\qquad L\ge0.$$
Finally, suppose that $f:F_\infty\to E'$ is a function whose restriction to each $F_L$, $L\ge0$, is continuous. If $\mu_\epsilon'\in M_1(E')$ is defined by

for $\epsilon>0$ and $\Gamma'\in\mathcal B_{E'}$, and if
$$I'(q')=\inf\{I(q):q\in F_\infty\text{ and }f(q)=q'\},\qquad q'\in E',$$
show that $I'$ is a good rate function on $E'$ and that it governs the large deviations of $\{\mu_\epsilon':\epsilon>0\}$.
(ii) Let $\{f_\epsilon:\epsilon\ge0\}$ be a family of continuous maps from $E$ into $E'$, set $I_0'(q')=\inf\{I(q):q\in E\text{ and }q'=f_0(q)\}$, $q'\in E'$, and assume that
where $\rho'$ denotes the metric on $E'$. Assuming that $\{\mu_\epsilon:\epsilon>0\}$ satisfies the full large deviation principle with rate $I$, show that $\{\mu_\epsilon\circ f_\epsilon^{-1}:\epsilon>0\}$ satisfies the full large deviation principle with the good rate function $I_0'$.

(iii) Let $f:[0,\infty)\times E\to E'$ be a measurable function for which there exists a sequence $\{f_n\}_1^\infty\subseteq C([0,\infty)\times E;E')$ with the properties that $f_n(0,\cdot)\to f(0,\cdot)$ uniformly on each level set of $I$ and


Assuming that $\{\mu_\epsilon:\epsilon>0\}$ is exponentially tight and that $I$ governs the large deviations of $\{\mu_\epsilon:\epsilon>0\}$, show that the function $I':E'\to[0,\infty]$ given by $I'(q')=\inf\{I(q):q'=f(0,q)\}$, $q'\in E'$, is a good rate function and that it governs the large deviations of $\{\mu_\epsilon\circ f(\epsilon)^{-1}:\epsilon>0\}$.
(iv) Again assume that $\{\mu_\epsilon:\epsilon>0\}$ is an exponentially tight family whose large deviations are governed by $I$. Next, let $X$ be a compact metric space and suppose that $f:[0,\infty)\times E\times X\to E'$ is a measurable map with the property that there is a sequence $\{f_n\}_1^\infty\subseteq C([0,\infty)\times E\times X;E')$ such that

for $L\in[0,\infty)$, and

for $\delta\in(0,\infty)$. Finally, define $I_x':E'\to[0,\infty]$ for $x\in X$ by
$$I_x'(q')=\inf\{I(q):q'=f(0,q,x)\},\qquad q'\in E'.$$
Show that $I_x'$ is a good rate function for each $x\in X$. In addition, show that if $x_\epsilon\to x$ as $\epsilon\searrow0$ and if $g(\epsilon,q)=f(\epsilon,q,x_\epsilon)$ for $(\epsilon,q)\in(0,\infty)\times E$, then $I_x'$ governs the large deviations of $\{\mu_\epsilon\circ g(\epsilon)^{-1}:\epsilon>0\}$.
Hint: By using the exponential tightness of $\{\mu_\epsilon:\epsilon>0\}$, show that, for every $\delta\in(0,\infty)$,
(v) Refer to the setting of part (iv) above and suppose that has the property that
f
C(E’;R)
for some $\alpha\in(1,\infty)$. Show that

2.1.21 Exercise.
The purpose of this exercise is to check that the full large deviation principle behaves in a functorial fashion under projective limits. To be precise, suppose that $\{E_n:n\in\mathbb Z^+\}$ is a sequence of Polish spaces and that,
for each $n\in\mathbb Z^+$, $\rho_n$ is a complete metric for $E_n$ and $\pi_{n+1,n}:E_{n+1}\to E_n$ is a mapping with the property that $\rho_n(\pi_{n+1,n}x_{n+1},\pi_{n+1,n}y_{n+1})\le\rho_{n+1}(x_{n+1},y_{n+1})$ for all $x_{n+1},y_{n+1}\in E_{n+1}$. Define $E$ to be the set of $x=(x_1,\dots,x_n,\dots)\in\prod_{n=1}^\infty E_n$ such that $x_n=\pi_{n+1,n}x_{n+1}$ for every $n\in\mathbb Z^+$, and let $\pi_n$ denote the restriction to $E$ of the natural projection map from $\prod_{n=1}^\infty E_n$ onto $E_n$. Give $E$ the topology which it inherits from the product topology on $\prod_{n=1}^\infty E_n$, and define
(2.1.22)
(i) Show that $E$ is a Polish space and, in fact, that $\rho$ is a complete metric on $E$. Further, check that $G$ is an open subset of $E$ if and only if $G=\bigcup_{n=1}^\infty\pi_n^{-1}G_n$, where $G_n$ is an open subset of $E_n$ for each $n\in\mathbb Z^+$. Also, check that for each closed subset $F$ of $E$ and every $\delta>0$, there is an $n\in\mathbb Z^+$ and a closed subset $F_n$ of $E_n$ such that $F\subseteq\pi_n^{-1}F_n\subseteq F^{(\delta)}$, where $F^{(\delta)}$ is computed relative to the metric $\rho$. Finally, show that $K\subset\subset E$ if and only if $K=\bigcap_{n=1}^\infty\pi_n^{-1}K_n$, where $K_n\subset\subset E_n$ for each $n\in\mathbb Z^+$. The metric space $(E,\rho)$ is called the projective limit of the sequence $\{(E_n,\pi_{n+1,n},\rho_n):n\in\mathbb Z^+\}$.
(ii) For each $n\in\mathbb Z^+$ suppose that $I_n:E_n\to[0,\infty]$ is a good rate function, and define
$$I(x)=\sup_{n\in\mathbb Z^+}I_n(\pi_nx),\qquad x\in E.\tag{2.1.23}$$
Show that $I$ is a good rate function and that
$$I_n(x_n)=\inf\{I(x):x_n=\pi_n(x)\},\qquad n\in\mathbb Z^+\text{ and }x_n\in E_n.$$
(iii) Again let $I_n$ be a good rate function on $E_n$ for each $n\in\mathbb Z^+$ and define $I$ accordingly as in (2.1.23). Next, let $\{\mu_\epsilon:\epsilon>0\}\subseteq M_1(E)$; and, for each $n\in\mathbb Z^+$ set $\mu_{n,\epsilon}=\mu_\epsilon\circ\pi_n^{-1}$. If, for each $n\in\mathbb Z^+$,
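for every open $G_n$ in $E_n$, show that
$$\varliminf_{\epsilon\to0}\epsilon\log(\mu_\epsilon(G))\ge-\inf_GI$$
for every open $G$ in $E$. Similarly, if, for each $n\in\mathbb Z^+$,
$$\varlimsup_{\epsilon\to0}\epsilon\log(\mu_{n,\epsilon}(F_n))\le-\inf_{F_n}I_n$$
for all closed $F_n$ in $E_n$, show that
$$\varlimsup_{\epsilon\to0}\epsilon\log(\mu_\epsilon(F))\le-\inf_FI$$
for every closed $F$ in $E$.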
2.1.24 Exercise.
Assume that $\{\mu_\epsilon:\epsilon>0\}$ satisfies the full large deviation principle with respect to the good rate function $I$, and suppose that $\Phi\in C(E;\mathbb R)$ satisfies the condition in (2.1.9).
(i) Show that
and that
$$\varlimsup_{\epsilon\to0}\epsilon\log\Bigl(\int_F\exp[\Phi(q)/\epsilon]\,\mu_\epsilon(dq)\Bigr)\le\sup_F(\Phi-I)$$
for closed $F\subseteq E$. In particular, conclude that
$$\lim_{L\to\infty}\inf\{I(q)-\Phi(q):q\in E\text{ with }\Phi(q)>L\}=\infty.$$
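Hint: For $\Gamma\in\mathcal B_E$, set
$$\Phi_\Gamma(q)=\begin{cases}\Phi(q)&\text{if }q\in\Gamma\\-\infty&\text{if }q\notin\Gamma;\end{cases}$$
and apply Lemma 2.1.7 and Lemma 2.1.8 to $\Phi_G$ and $\Phi_F$, respectively.
(ii) For $\epsilon>0$, define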
Next, set $J(q)=I(q)-\Phi(q)-a$, $q\in E$, where $a=\inf_E(I-\Phi)$. Show that $J$ is a good rate function and that it governs the large deviations of $\{\nu_\epsilon:\epsilon>0\}$. Finally, check that when there is precisely one $p\in E$ at which $J$ vanishes then the measures $\nu_\epsilon$ converge to $\delta_p$.
2.1.25 Exercise.
 
In Theorem 1.4.25, we made the assumption that the diffusion matrix $a:\mathbb R^d\to\mathbb R^d\otimes\mathbb R^d$ was symmetric and that $a$ together with the drift coefficient $b:\mathbb R^d\to\mathbb R^d$ satisfy (1.4.10) and (1.4.11). Here we replace those assumptions by
{
I;”.:”,X)= inf IT($) : $ E H1 and, for t E [O,T],
for $X\in\Omega_T$ (cf. (1.3.36) for the notation here) is a good rate function and that (1.4.26) continues to hold. Next (cf. Exercise 1.4.33) extend this result to cover the case when the preceding upper bounds on $a$ and $b$ are replaced by
0 5 a(.)
5 M ( 1 + 1z12)1/21Rdand lb(z)I 5 M ( 1 +
IXI~)”~,
zE
Wd.
2.2 Large Deviations and Convex Analysis
As we saw in Chapter I, it is sometimes the case that the state space E is a separable BANACH space. Furthermore, even when E is not itself a vector space, it often turns out that it is a convex subset of one. For this reason we formulate the following somewhat cumbersome hypothesis about E.
(C)
E is a closed convex subset of the locally convex, HAUSDORFF topological (real) vector space X, and E is a Polish space with respect to the topology that it inherits as a subset of X .
2.2.1 Remark.
The two examples which should be kept in mind are when $E$ is itself a separable BANACH space (in which case we take $X=E$) and when $E=M_1(\Sigma)$, where $\Sigma$ is a Polish space. In the latter case, we take $X=M(\Sigma)$ to be the space of all finite signed measures on $\Sigma$ and endow $M(\Sigma)$ with the topology generated by the sets
$$\Bigl\{\nu\in M(\Sigma):\Bigl|\int\phi\,d\nu-\int\phi\,d\alpha\Bigr|<r\Bigr\},\tag{2.2.2}$$
where $\alpha\in M(\Sigma)$, $\phi\in C_b(\Sigma;\mathbb R)$, and $r>0$. As is well known (cf. Lemma 3.2.2 below), the LÉVY metric on $M_1(\Sigma)$ is a complete separable metric, which is consistent with the restriction of this topology to $M_1(\Sigma)$.
Throughout the rest of this section we will be assuming, without further comment, that we are in the situation described in (C). In this connection, we will be using $X^*$ to denote the (real) topological dual of $X$; and, for $\mu\in M_1(E)$, we will define the logarithmic moment generating function of $\mu$ to be the map $\lambda\in X^*\longmapsto\Lambda_\mu(\lambda)\in[-\infty,\infty]$ given by
$$\Lambda_\mu(\lambda)=\log\Bigl(\int_E\exp\bigl[{}_{X^*}\langle\lambda,x\rangle_X\bigr]\,\mu(dx)\Bigr).\tag{2.2.3}$$
As we saw in Sections 1.2 and 1.3, when $\{\mu_\epsilon:\epsilon>0\}$ is a family of measures on a separable BANACH space, the logarithmic moment generating functions of the $\mu_\epsilon$'s can play an important role in the analysis of the large deviations of $\{\mu_\epsilon:\epsilon>0\}$. It should therefore come as no surprise that the same is true even when we are working with the more general situation described by (C). The reason for this is partially explained by the next result.
2.2.4 Theorem. Let $\{\mu_\epsilon:\epsilon>0\}\subseteq M_1(E)$ and assume that
$$\Lambda(\lambda)\equiv\lim_{\epsilon\to0}\epsilon\Lambda_{\mu_\epsilon}(\lambda/\epsilon)\in[-\infty,\infty]\tag{2.2.5}$$
exists for every $\lambda\in X^*$. Then $\Lambda$ is a convex function on $X^*$. Moreover, if the LEGENDRE transform $q\in E\longmapsto\Lambda^*(q)$ of $\Lambda$ is defined by
$$\Lambda^*(q)=\sup\bigl\{{}_{X^*}\langle\lambda,q\rangle_X-\Lambda(\lambda):\lambda\in X^*\bigr\},\tag{2.2.6}$$
then $\Lambda^*$ is a nonnegative, lower semicontinuous, convex function; and, for any $F\subset\subset E$,
$$\varlimsup_{\epsilon\to0}\epsilon\log(\mu_\epsilon(F))\le-\inf_F\Lambda^*.\tag{2.2.7}$$
Finally, if in addition, $\{\mu_\epsilon:\epsilon>0\}$ is exponentially tight, then (2.2.7) continues to hold for all closed subsets $F$ of $E$.
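The objects in (2.2.5) and (2.2.6) can be computed explicitly in the simplest case. The sketch below is an illustration under assumptions that are not in the text: $E=X=\mathbb R$ and $\mu_\epsilon=N(0,\epsilon)$, for which $\Lambda_{\mu_\epsilon}(t)=\epsilon t^2/2$ and hence $\epsilon\Lambda_{\mu_\epsilon}(\lambda/\epsilon)=\lambda^2/2$ for every $\epsilon$. The LEGENDRE transform is approximated by a supremum over a finite grid of $\lambda$-values, and the function names are hypothetical.

```python
import math


def scaled_log_mgf(lam, eps):
    """eps * Lambda_{mu_eps}(lam / eps) for mu_eps = N(0, eps) on the real line,
    using the closed form log E[exp(t X)] = eps * t**2 / 2 for X ~ N(0, eps)."""
    t = lam / eps
    return eps * (eps * t * t / 2.0)


def legendre(func, q, lo=-10.0, hi=10.0, n=20001):
    """Grid approximation of sup over lam in [lo, hi] of (lam * q - func(lam))."""
    step = (hi - lo) / (n - 1)
    return max((lo + i * step) * q - func(lo + i * step) for i in range(n))


def limit(lam):
    # The limit in (2.2.5); here it does not depend on eps at all.
    return scaled_log_mgf(lam, 0.01)


value = legendre(limit, 1.5)   # compare with 1.5**2 / 2
```

In this example $\Lambda^*(q)=q^2/2$ coincides with the rate function of the GAUSSIAN family, so the upper bound (2.2.7) is sharp; the discussion that follows explains why this cannot be expected in general.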
PROOF: The convexity of $\Lambda$ follows from that of the $\Lambda_{\mu_\epsilon}$'s, which in turn is a consequence of HÖLDER'S inequality. To see that $\Lambda^*(q)\ge0$ for every $q\in E$, simply note that $\Lambda(0)=0$. Also, because $\Lambda^*$ is the pointwise supremum over continuous affine functions on $E$, it is lower semicontinuous and convex. The proof of (2.2.7) for compact $F$ is little more than a rerun of the argument used to derive (1.3.16). (Note that when $\mu_\epsilon=W_\epsilon$, one has that $\epsilon\Lambda_{\mu_\epsilon}(\lambda/\epsilon)=\Lambda_W(\lambda)$ for all $\epsilon>0$.) Namely, let $p\in E$ and $\delta\in(0,1]$ be given and choose $\lambda\in X^*$ so that ${}_{X^*}\langle\lambda,p\rangle_X-\Lambda(\lambda)\ge$
{
1++ A*(p) 
4
if A*@) = 00 if A * ( p ) < 00.
Next choose $r>0$ so that ${}_{X^*}\langle\lambda,p-q\rangle_X\le\delta/2$ for $q\in B(p,r)$. Since

we see that

for all sufficiently small $\epsilon$ and $r$. Once one has (2.2.8), (2.2.7) for compact $F$ follows from the last part of (iv) in Exercise 2.1.14. Finally, the extension to all closed $F$ when $\{\mu_\epsilon:\epsilon>0\}$ is exponentially tight is precisely the same as the last part of the proof of Lemma 2.1.5. ∎
Although the preceding indicates that, when $\Lambda$ exists, its LEGENDRE transform $\Lambda^*$ is a good candidate for the rate function governing the large deviations of $\{\mu_\epsilon:\epsilon>0\}$, we know that, in general, $\Lambda^*$ will not be the correct rate function. Indeed, from Lemma 2.1.4 we know that when the $\mu_\epsilon$'s come from pushing measures $\nu_\epsilon$ forward under a continuous mapping $f$ and if $J$ governs the large deviations of $\{\nu_\epsilon:\epsilon>0\}$, then the large deviations of $\{\mu_\epsilon:\epsilon>0\}$ will be governed by the rate function $I$ given by $I(p)=\inf\{J(q):p=f(q)\}$. Since it is extremely unlikely that such an $I$ will be convex even if $J$ is, we see that convexity of $I$ will be more the exception than the rule. With the preceding in mind and assuming that $\{\mu_\epsilon:\epsilon>0\}$ satisfies the full large deviation principle with some rate function $I$, one might ask if convexity is the only obstruction to the identification of $I$ with $\Lambda^*$. As we are about to see, the answer to this question is, apart from minor technicalities, "yes." There are two steps in the proof. The first one is the easy application of Theorem 2.1.10 alluded to above.
2.2.9 Lemma. Let $\{\mu_\epsilon:\epsilon>0\}\subseteq M_1(E)$ satisfy the condition that

If $\{\mu_\epsilon:\epsilon>0\}$ satisfies the full large deviation principle with the good rate function $I$, then the limit $\Lambda(\lambda)$ in (2.2.5) exists for every $\lambda\in X^*$ and satisfies
$$\Lambda(\lambda)=\sup\bigl\{{}_{X^*}\langle\lambda,q\rangle_X-I(q):q\in E\bigr\},\qquad\lambda\in X^*.\tag{2.2.11}$$
PROOF: Note that (2.2.10) guarantees that
for each $\lambda \in X^*$. Hence, we can apply Theorem 2.1.10 to the function
and thereby conclude not only that $\Lambda(\lambda)$ exists but also that (2.2.11) holds. ∎

2.2.12 Remark.
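Let everything be as in Lemma 2.2.9 and define
(2.2.13) $\hat I(x) = I(x)$ if $x \in E$ and $\hat I(x) = \infty$ if $x \in X \setminus E$.
Obviously, (2.2.11) is then equivalent to
(2.2.14) $\Lambda(\lambda) = \sup\{{}_{X^*}\langle\lambda,x\rangle_X - \hat I(x) : x \in X\}, \quad \lambda \in X^*.$
Moreover, $\hat I$ is always lower semicontinuous on $X$. Finally, $\hat I$ is convex on $X$ if $I$ is convex on $E$.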
The second step in our program is contained in the following theorem about one of the basic properties of the LEGENDRE transform. If one looks carefully at the proof, one realizes that this property is an analytic statement of the geometric fact that at each point on the graph of a convex function there is a tangent line which never goes above the graph.

2.2.15 Theorem. Let $f : X \to (-\infty,\infty]$ be a lower semicontinuous, convex function and define $g : X^* \to (-\infty,\infty]$ by
$$g(\lambda) = \sup\{{}_{X^*}\langle\lambda,x\rangle_X - f(x) : x \in X\}.$$
If $f$ is not identically equal to $\infty$, then
(2.2.16) $f(x) = \sup\{{}_{X^*}\langle\lambda,x\rangle_X - g(\lambda) : \lambda \in X^*\}, \quad x \in X.$
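As a numerical illustration of (2.2.16) in the simplest possible setting, the following sketch takes $X = \mathbb{R}$ (so that $X^* = \mathbb{R}$ and the pairing is ordinary multiplication), restricts both $x$ and $\lambda$ to a finite grid, and checks that two applications of the discrete LEGENDRE transform return the convex function $f(x) = x^2/2$. The grid, the choice of $f$, and the helper name `legendre` are assumptions made only for this illustration.

```python
def legendre(values, points, slopes):
    """Discrete Legendre transform: s -> max over grid points x of s*x - values(x)."""
    return [max(s * x - v for x, v in zip(points, values)) for s in slopes]

# The same grid on [-2, 2] is used for the points x and for the slopes lambda.
grid = [i / 100.0 for i in range(-200, 201)]

f = [0.5 * x * x for x in grid]      # convex and continuous on the grid
g = legendre(f, grid, grid)          # g(lambda) = sup_x (lambda*x - f(x))
f_back = legendre(g, grid, grid)     # sup_lambda (lambda*x - g(lambda))

max_error = max(abs(a - b) for a, b in zip(f, f_back))
print(f"max |f - f**| on the grid: {max_error:.2e}")
```

Here the maximizing slope for each $x$ lies on the grid, which is why the recovery is exact up to rounding; for a general convex $f$ one should expect a discretization error of the order of the grid spacing.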
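PROOF: The first step in the proof is to develop the geometric picture alluded to above. To this end, define
$$\mathcal{E}(f) = \{(x,\alpha) \in X \times \mathbb{R} : \alpha \ge f(x)\}$$
and
$$\mathcal{E}^*(f) = \{(\lambda,\beta) \in X^* \times \mathbb{R} : f(x) \ge {}_{X^*}\langle\lambda,x\rangle_X - \beta \text{ for every } x \in X\}.$$
It is then an easy matter to check from our assumptions that $\mathcal{E}(f)$ is a nonempty, closed, convex subset of $X \times \mathbb{R}$. Indeed, the closedness and convexity of $\mathcal{E}(f)$ come from the lower semicontinuity and convexity of $f$; and it is clear that $(x_0, f(x_0)) \in \mathcal{E}(f)$, where $x_0$ is any element of $X$ for which $f(x_0) < \infty$. On the other hand, although $\mathcal{E}^*(f)$ is obviously closed and convex, it is less obvious that it is nonempty. To see that $\mathcal{E}^*(f) \ne \emptyset$, choose $x_0 \in X$ as above and apply the HAHN-BANACH Theorem to find a $(\mu,\rho,\gamma) \in X^* \times \mathbb{R} \times \mathbb{R}$ with the properties that the closed affine half space
(2.2.17) $H(\mu,\rho,\gamma) = \{(x,\alpha) \in X \times \mathbb{R} : {}_{X^*}\langle\mu,x\rangle_X - \rho\alpha \le \gamma\}$
contains the set $\mathcal{E}(f)$ but not the point $(x_0, f(x_0) - 1)$. Then, since
$${}_{X^*}\langle\mu,x_0\rangle_X - \rho f(x_0) \le \gamma$$
while
$${}_{X^*}\langle\mu,x_0\rangle_X - \rho\,(f(x_0) - 1) > \gamma,$$
we see that $\rho > 0$, and therefore that
(2.2.18) $(\lambda_0,\beta_0) \equiv (\mu/\rho,\ \gamma/\rho) \in \mathcal{E}^*(f).$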
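Next, noting that $\beta \ge g(\lambda)$ for every $(\lambda,\beta) \in \mathcal{E}^*(f)$ and $(\lambda, g(\lambda)) \in \mathcal{E}^*(f)$ for any $\lambda \in X^*$ with $g(\lambda) < \infty$, one sees that
$$g(\lambda) = \inf\{\beta : (\lambda,\beta) \in \mathcal{E}^*(f)\},$$
and therefore that (2.2.16) is equivalent to
(2.2.19) $f(x) = \sup\{{}_{X^*}\langle\lambda,x\rangle_X - \beta : (\lambda,\beta) \in \mathcal{E}^*(f)\}, \quad x \in X.$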
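Since it is clear that $f(x) \ge {}_{X^*}\langle\lambda,x\rangle_X - \beta$ for any $x \in X$ and $(\lambda,\beta) \in \mathcal{E}^*(f)$, we will have proved (2.2.19) as soon as we show that, for each $(x,\alpha) \notin \mathcal{E}(f)$, there is a $(\lambda,\beta) \in \mathcal{E}^*(f)$ such that
(2.2.20) ${}_{X^*}\langle\lambda,x\rangle_X - \beta > \alpha.$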
In order to prove the existence of the pair $(\lambda,\beta) \in \mathcal{E}^*(f)$ in (2.2.20), suppose that $x \in X$ and that $\alpha < f(x)$ are given. Then, since $(x,\alpha) \notin \mathcal{E}(f)$, the HAHN-BANACH Theorem again provides the existence of $(\mu,\rho,\gamma) \in X^* \times \mathbb{R} \times \mathbb{R}$ so that the $H(\mu,\rho,\gamma)$ in (2.2.17) contains $\mathcal{E}(f)$ and $(x,\alpha) \notin H(\mu,\rho,\gamma)$. In particular, since ${}_{X^*}\langle\mu,x_0\rangle_X - \rho\beta \le \gamma$ for $\beta \ge f(x_0)$, we know that $\rho \ge 0$. Hence, for every $\delta > 0$,
where $(\lambda_0,\beta_0)$ is the element of $\mathcal{E}^*(f)$ described in (2.2.18). (The introduction of $\delta > 0$ here is to take care of the case when the tangent hyperplane is vertical and therefore $\rho = 0$.) At the same time, for sufficiently small $\delta > 0$ one has that
Hence, (2.2.20) holds with $(\lambda,\beta) = (\lambda_\delta,\beta_\delta)$ for any sufficiently small $\delta > 0$. ∎

By combining Lemma 2.2.9 and Remark 2.2.12 with Theorem 2.2.15, we arrive at the following useful algorithm for identifying convex rate functions.

2.2.21 Theorem. Assume that $\{\mu_\epsilon : \epsilon > 0\} \subseteq M_1(E)$ satisfies (2.2.10) and that $I$ is a convex, good rate function which governs the large deviations of $\{\mu_\epsilon : \epsilon > 0\}$. Then, not only does the limit $\Lambda(\lambda)$ in (2.2.5) exist for every $\lambda \in X^*$, but also (2.2.11) holds and
(2.2.22) $I(q) = \Lambda^*(q) \equiv \sup\{{}_{X^*}\langle\lambda,q\rangle_X - \Lambda(\lambda) : \lambda \in X^*\}, \quad q \in E.$
2.2.23 Exercise.
Suppose that $\{\mu_\epsilon : \epsilon > 0\}$ satisfies the weak large deviation principle with rate function $I$. Further, assume that the limit $\Lambda(\lambda)$ in (2.2.5) exists for each $\lambda \in X^*$.
(i) Show that $\Lambda^* \le I$.
(ii) If one knows, in addition, that $\Lambda$ and $I$ satisfy (2.2.11), show that $\Lambda^* \ge f$ for every lower semicontinuous convex $f : E \to (-\infty,\infty]$ which satisfies $f \le I$. In other words, $\Lambda^*$ is the lower semicontinuous, convex minorant of $I$.
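The statement in part (ii) can be seen at work numerically. The sketch below takes $E = X = \mathbb{R}$ and a hypothetical nonconvex function $I(x) = \min\{(x-1)^2, (x+1)^2\}$ (chosen only for illustration, it is not claimed to arise from any particular family of measures), computes $\Lambda$ from $I$ as in (2.2.11) on a grid, and then computes $\Lambda^*$. One expects $\Lambda^*$ to vanish on $[-1,1]$, where $I$ fails to be convex, and to agree with $I$ outside that interval. The grids and the helper name `legendre` are assumptions of the sketch.

```python
def legendre(values, points, slopes):
    """Discrete Legendre transform: s -> max over grid points x of s*x - values(x)."""
    return [max(s * x - v for x, v in zip(points, values)) for s in slopes]

points = [i / 100.0 for i in range(-300, 301)]   # x in [-3, 3]
slopes = [i / 100.0 for i in range(-400, 401)]   # lambda in [-4, 4]

# Hypothetical nonconvex function with two wells, at x = -1 and x = +1.
rate = [min((x - 1.0) ** 2, (x + 1.0) ** 2) for x in points]

lam = legendre(rate, points, slopes)        # Lambda(lambda) = sup_x (lambda*x - I(x))
lam_star = legendre(lam, slopes, points)    # Lambda*(x) = sup_lambda (lambda*x - Lambda(lambda))

i0, i2 = points.index(0.0), points.index(2.0)
print("I(0) =", rate[i0], "  Lambda*(0) =", lam_star[i0])
print("I(2) =", rate[i2], "  Lambda*(2) =", lam_star[i2])
```

On the grid, $\Lambda^* \le I$ holds automatically (this is the discrete counterpart of part (i)), while the gap between $I(0)$ and $\Lambda^*(0)$ shows how $\Lambda^*$ fills in the nonconvex region.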

III
General Cramér Theory
3.1 Preliminary Formulation
We want in this section to extend the CRAMÉR Theorem (cf. Theorem 1.2.6) to a more general setting. In order to describe the setting which we have in mind, it will be necessary to introduce the following embellished form of the hypothesis (C) made at the beginning of Section 2.2.
$(\hat{C})$
$E$ and $X$ are the same as they were in (C). In addition there is a metric $\rho$ on $E$ which is compatible with the topology on $E$ induced by the topology on $X$ and a measurable norm $\|\cdot\|$ on $X$ (which need not be compatible with the topology on $X$) such that: $(E,\rho)$ is Polish; $\|\cdot\|$ is bounded on $\rho$-bounded subsets of $E$;
for all $\alpha \in [0,1]$ and all elements $p_1, p_2, q_1, q_2 \in E$; and
Without further mention, we will be working in this section with the situation which we now describe. $E$, $X$, $\rho$, and $\|\cdot\|$ are as in $(\hat{C})$, and $\Omega = E^{\mathbb{Z}^+}$ is given the product topology. Note that, since $E$ is a separable metric space, the BOREL field $\mathcal{B}_\Omega$ over $\Omega$ coincides with the product $\sigma$-algebra $(\mathcal{B}_E)^{\mathbb{Z}^+}$. Next, for $n \in \mathbb{Z}^+$, we use $X_n : \Omega \to E$ to denote the $n$th coordinate map (i.e., $X_n(\omega) = \omega_n$). In view of the preceding remark about $\mathcal{B}_\Omega$, one sees that not only is each of the maps $X_n$ measurable from $(\Omega,\mathcal{B}_\Omega)$ into $(X,\mathcal{B}_X)$ but so are linear combinations of these maps. Given $0 \le m \le n$, we will use $S_n^m$ to denote $\sum_{k=m+1}^n X_k$ ($\equiv 0$ when $m = n$) and $\bar S_n^m$ to stand for $S_n^m/(n-m)$; and when $m = 0$, we will usually drop the superscript. Finally, $\mu \in M_1(E)$, $P \equiv \mu^{\mathbb{Z}^+}$ (again using the remark about $\mathcal{B}_\Omega$, one sees that $P \in M_1(\Omega)$) and $\mu_n \in M_1(E)$ is the distribution of $\bar S_n$ under $P$.

Our purpose will be to study the large deviation theory for the family $\{\mu_n : n \ge 1\}$. Obviously, to whatever extent we succeed, we will have generalized CRAMÉR's Theorem. Our approach is an amalgam of ideas coming from D. RUELLE via O. LANFORD and the results obtained in Section 2.2. In particular, we will first use LANFORD's argument to show, in complete generality, that $\{\mu_n : n \ge 1\}$ satisfies a weak large deviation principle with a convex rate function. We will then do our best to replace the weak principle with the full large deviation principle and to identify the governing rate function. The main reason for our needing to make the assumptions in $(\hat{C})$ is that we will want to use the technical facts proved in the following lemma.
3.1.1 Lemma. Let $A$ be a nonempty, open convex subset of $E$. Then for any $K \subset\subset A$, the closed convex hull $\hat K$ of $K$ is also a compact subset of $A$. In particular, if $\nu \in M_1(E)$, then, for each $\delta \in (0,1)$, there is a convex $K \subset\subset A$ such that $\nu(K) \ge (1 - \delta)\nu(A)$.
PROOF: Suppose that $K \subset\subset A$. Given $0 < \delta < \rho(K, A^c)$, choose $p_1, \ldots, p_M \in K$ so that
$$K \subseteq \bigcup_{m=1}^M B(p_m,\delta)$$
and denote by $\Gamma(\delta)$ the set of points $\sum_1^M \alpha_m q_m$, where $\{\alpha_m\}_1^M \subseteq [0,1]$ with $\sum_1^M \alpha_m = 1$ and $q_m \in B(p_m,\delta)$, $1 \le m \le M$. Clearly, $\Gamma(\delta) \subseteq A$ and is closed in $E$. Moreover, because $(\hat{C})$ implies that $\rho$-balls are convex, it is easy to show that $\Gamma(\delta)$ is convex. Hence, $\hat K \subseteq \Gamma(\delta)$. This not only proves that $\hat K \subseteq A$, but it also gives us an easy way to see that $\hat K$ is compact. Indeed, again using $(\hat{C})$, one sees that
where $\widehat{\{p_1,\ldots,p_M\}}$ is the convex hull of $\{p_1,\ldots,p_M\}$ and, as such, is compact. Since $\hat K \subseteq \Gamma(\delta)$ and $\delta$ can be taken arbitrarily small, it follows immediately that $\hat K$ is totally bounded and therefore, since it is closed in $E$, compact.
Given the first part, the second part of the lemma is an immediate consequence of the well-known ULAM's Lemma which says that, because $E$ and therefore $A$ are Polish spaces, there is a $K \subset\subset A$ such that $\nu(K) \ge (1 - \delta)\nu(A)$; and obviously, the first part says that we may as well take $K$ to be convex. ∎

Our first application of Lemma 3.1.1 occurs already in the second part of the next key result.
3.1.2 Lemma. For each convex $C \in \mathcal{B}_E$, $n \in \mathbb{Z}^+ \mapsto \mu_n(C)$ is supermultiplicative. In addition, if $A$ is an open convex subset of $E$, then either $\mu_n(A) = 0$ for all $n \in \mathbb{Z}^+$ or there exists an $N \in \mathbb{Z}^+$ such that $\mu_n(A) > 0$ for all $n \ge N$.

PROOF: To prove the first assertion, observe that, by convexity,
and therefore, by shift invariance and independence,
We next turn to the second assertion. Suppose that $\mu_m(A) > 0$ for some $m \in \mathbb{Z}^+$, and, using Lemma 3.1.1, choose a convex $K \subset\subset A$ so that $\mu_m(K) > 0$. Let $0 < 2\delta < \rho(K, A^c)$, take $G = \{q \in E : \|q - K\| < \delta\}$, and set $M = \sup\{\|q\| : q \in K\}$. Then, for $n = sm + r$, where $0 \le r < m$,
as long as $mM < n\delta$. Thus, if we choose $N$ so that $mM < N\delta$ and
then, since $K$ is convex, we have that
for all $n \ge N$. ∎

Before we can use Lemma 3.1.2, we recall the following simple fact about subadditive functions.
3.1.3 Lemma. Let $f : \mathbb{Z}^+ \to [0,\infty]$ be a subadditive function and assume that there is an $N \in \mathbb{Z}^+$ such that $f(n) < \infty$ for all $n \ge N$. Then
$$\lim_{n\to\infty} \frac{f(n)}{n} = \inf_{n \ge N} \frac{f(n)}{n} \in [0,\infty).$$
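Before turning to the proof, the conclusion of Lemma 3.1.3 can be checked on a concrete example. The sketch below uses the hypothetical subadditive function $f(n) = 2n + \sqrt{n}$ (an assumption made for illustration; it is subadditive because $\sqrt{m+n} \le \sqrt{m} + \sqrt{n}$), verifies subadditivity by brute force on a finite range, and prints the ratios $f(n)/n$, which decrease toward $\inf_n f(n)/n = 2$.

```python
import math

def f(n):
    # Hypothetical subadditive function: f(m + n) <= f(m) + f(n),
    # because sqrt(m + n) <= sqrt(m) + sqrt(n).
    return 2.0 * n + math.sqrt(n)

# Brute-force check of subadditivity on a finite range of arguments.
check_range = 200
subadditive = all(
    f(m + n) <= f(m) + f(n) + 1e-12
    for m in range(1, check_range)
    for n in range(1, check_range)
)

sample_points = (1, 10, 100, 10_000, 1_000_000)
ratios = [f(n) / n for n in sample_points]

print("subadditive on the tested range:", subadditive)
for n, ratio in zip(sample_points, ratios):
    print(f"f({n})/{n} = {ratio:.6f}")
```

In this example the infimum is not attained for any finite $n$, which is consistent with the lemma: only the limit of $f(n)/n$ is identified with the infimum.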
PROOF: For $m \ge N$, set $M_m = \max\{f(n) : m \le n \le 2m\}$. For $n \ge m \ge N$,
where $s = [n/m]$ and $r = n - ms$. Hence,
∎

By combining Lemma 3.1.2 with Lemma 3.1.3, we know that if $\mathcal{C}^\circ$ denotes the collection of all nonempty, convex open sets $A$ in $E$, then
(3.1.4) $\mathcal{L}(A) \equiv \mathcal{L}_\mu(A) \equiv -\lim_{n\to\infty} \frac{1}{n}\log\big(\mu_n(A)\big) \in [0,\infty]$
exists for every $A \in \mathcal{C}^\circ$. Noting that if $I$ is the rate function governing the large deviations of $\{\mu_n : n \ge 1\}$, then (cf. the proof of Lemma 2.1.1)
(3.1.5) $I(q) = I_\mu(q) \equiv \lim_{r \searrow 0} \mathcal{L}_\mu(B(q,r)) = \sup\{\mathcal{L}_\mu(A) : q \in A \in \mathcal{C}^\circ\},$
we see that there is no alternative to our adopting (3.1.5) as the definition of $I$. Of course, we still have to check that this $I$ does indeed give rise to a large deviation principle.

3.1.6 Theorem. The function $I_\mu$ in (3.1.5) is a convex rate function on $E$ and $\{\mu_n : n \ge 1\}$ satisfies the weak large deviation principle with rate function $I_\mu$. Furthermore, if $G$ is a finite union of elements from $\mathcal{C}^\circ$, then
$$\lim_{n\to\infty} \frac{1}{n}\log\big(\mu_n(G)\big) = -\inf_G I_\mu.$$

PROOF: The lower semicontinuity of $I_\mu$ is an immediate consequence of its definition. To prove that $I_\mu$ is convex, let $q_1, q_2 \in E$ be given, and set $q = \frac{q_1 + q_2}{2}$. Given an $A \in \mathcal{C}^\circ$ containing $q$, choose $A_i \in \mathcal{C}^\circ$ so that $q_i \in A_i$ and $A \supseteq \frac{A_1 + A_2}{2}$. Then
$$\mathcal{L}(A) = -\lim_{n\to\infty}\frac{1}{2n}\log\big(\mu_{2n}(A)\big) \le -\lim_{n\to\infty}\frac{1}{2n}\log P\big(\{\omega : \bar S_n(\omega) \in A_1 \text{ and } \bar S^n_{2n}(\omega) \in A_2\}\big)$$
$$= -\frac{1}{2}\Big(\lim_{n\to\infty}\frac{1}{n}\log\big[\mu_n(A_1)\big] + \lim_{n\to\infty}\frac{1}{n}\log\big[\mu_n(A_2)\big]\Big) \le \frac{I_\mu(q_1) + I_\mu(q_2)}{2},$$
and from this we conclude that $I_\mu(q) \le (I_\mu(q_1) + I_\mu(q_2))/2$. Because we already know that $I_\mu$ is lower semicontinuous, the convexity of $I_\mu$ is now proved by a familiar iteration argument followed by a passage to the limit.

The fact that $\underline{\lim}_{n\to\infty} \frac{1}{n}\log\big(\mu_n(G)\big) \ge -\inf_G I_\mu$ for arbitrary open $G$ in $E$ is built into the definition of $I_\mu$. Next, suppose that $K \subset\subset E$ and let $\ell < \inf_K I_\mu$. Then, there is a finite cover $\{A_1, \ldots, A_M\} \subseteq \mathcal{C}^\circ$ of $K$ with the property that $\mathcal{L}(A_m) > \ell$ for each $1 \le m \le M$. Hence,
and so we have proved that $\overline{\lim}_{n\to\infty} \frac{1}{n}\log\big(\mu_n(K)\big) \le -\inf_K I_\mu$ and therefore that the weak large deviation principle holds.

To complete the proof, suppose that $G = \bigcup_1^M A_m$, where $\{A_m\}_1^M \subseteq \mathcal{C}^\circ$. Then an easy argument shows that
$$-\lim_{n\to\infty}\frac{1}{n}\log\big(\mu_n(G)\big) = \min_{1 \le m \le M} \mathcal{L}(A_m).$$
Hence, it suffices for us to check that $\mathcal{L}(A) = \inf_A I_\mu$ for every $A \in \mathcal{C}^\circ$; and since we already know that $\mathcal{L}(A) \le \inf_A I_\mu$, this comes down to checking that $\mathcal{L}(A) \ge \inf_A I_\mu$ when $\mathcal{L}(A) < \infty$. To this end, let $\delta \in (0,1)$ be given and choose $N$ so that $-\frac{1}{n}\log\big(\mu_n(A)\big) \le \mathcal{L}(A) + \delta$ for $n \ge N$. Next, we use Lemma 3.1.1 to find a convex $K \subset\subset A$ such that
$$\frac{1}{N}\log\big(\mu_N(A)\big) - \frac{1}{N}\log\big(\mu_N(K)\big) < \delta.$$
Then, by subadditivity and the preceding paragraph,
$$\inf_A I_\mu \le \inf_K I_\mu \le -\overline{\lim}_{n\to\infty} \frac{1}{n}\log\big(\mu_n(K)\big)$$
3.1.7 Corollary. If for each $L \ge 0$ there is a $K_L \subset\subset E$ such that
(3.1.8) $\overline{\lim}_{n\to\infty} \frac{1}{n}\log\big(\mu_n(K_L^c)\big) \le -L,$
then $I_\mu$ is a good rate function and $\{\mu_n : n \ge 1\}$ satisfies the full large deviation principle with rate function $I_\mu$. If, in addition,
for every $\lambda \in X^*$, then
and

PROOF: The first assertion is no more than the conjunction of Theorem 3.1.6 and Lemma 2.1.5. The rest is an immediate consequence of the first part together with Theorem 2.2.21. ∎

3.1.11 Exercise.
In the case when $E = X$ is finite dimensional and $\rho(p,q) = \|q - p\|$, show that the whole of Corollary 3.1.7 applies as soon as $\Lambda_\mu(\lambda) < \infty$ for every $\lambda \in X^*$. This is, of course, the CRAMÉR Theorem in the general finite dimensional setting.
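To make the finite dimensional statement concrete, here is a small sketch for $E = X = \mathbb{R}$ and $\mu$ the Bernoulli distribution with parameter $p$ (an example chosen for illustration, not taken from the text). Assuming that $\Lambda_\mu$ denotes the logarithmic moment generating function $\lambda \mapsto \log\int e^{\lambda x}\,\mu(dx)$, the code approximates $\Lambda_\mu^*(x) = \sup_\lambda(\lambda x - \Lambda_\mu(\lambda))$ by a ternary search, compares it with the familiar closed form $x\log\frac{x}{p} + (1-x)\log\frac{1-x}{1-p}$, and, for $p = 1/2$, compares it with $-\frac{1}{n}\log P(\bar S_n \ge x)$ computed exactly from binomial coefficients. All function names are hypothetical.

```python
import math

def log_mgf(lam, p):
    """Lambda(lam) = log E[exp(lam * X)] for X ~ Bernoulli(p)."""
    return math.log(1.0 - p + p * math.exp(lam))

def legendre_sup(x, p, lo=-50.0, hi=50.0, iterations=200):
    """Approximate sup over lam of lam*x - Lambda(lam); the objective is concave in lam."""
    def objective(lam):
        return lam * x - log_mgf(lam, p)
    for _ in range(iterations):
        third = (hi - lo) / 3.0
        left, right = lo + third, hi - third
        if objective(left) < objective(right):
            lo = left
        else:
            hi = right
    return objective(0.5 * (lo + hi))

def closed_form(x, p):
    """Closed form of the Legendre transform for 0 < x < 1."""
    return x * math.log(x / p) + (1.0 - x) * math.log((1.0 - x) / (1.0 - p))

p, x = 0.5, 0.7
rate_numeric = legendre_sup(x, p)
rate_closed = closed_form(x, p)

# Exact tail for p = 1/2: P(S_n / n >= 0.7) = 2^(-n) * sum_{k >= 0.7 n} C(n, k).
n = 2000
threshold = 7 * n // 10
tail_count = sum(math.comb(n, k) for k in range(threshold, n + 1))
empirical_rate = -(math.log(tail_count) - n * math.log(2.0)) / n

print("Legendre transform (numeric):", rate_numeric)
print("Legendre transform (closed) :", rate_closed)
print("-(1/n) log P(mean >= x)     :", empirical_rate)
```

The empirical rate differs from $\Lambda_\mu^*(x)$ by a term of order $(\log n)/n$, which is invisible on the exponential scale that the large deviation principle describes.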
3.2 Sanov's Theorem
In this section we will specialize to the situation described in the second example of Remark 2.2.1. That is, $E = M_1(\Sigma)$ and $X = M(\Sigma)$, where $\Sigma$ is a Polish space and $M(\Sigma)$ is given the topology which is generated by the sets in (2.2.2). Clearly, the topology inherited by $M_1(\Sigma)$ as a subset of $M(\Sigma)$ is the weak topology (i.e., the topology corresponding to convergence against bounded continuous test functions). In order to show that $M_1(\Sigma)$ and $M(\Sigma)$ satisfy the hypothesis $(\hat{C})$ at the beginning of Section 3.1, we must produce the metric $\rho$ on $M_1(\Sigma)$ and the norm $\|\cdot\|$ on $M(\Sigma)$. The latter is easy; namely, we take $\|\alpha\|$ to be the total variation $\|\alpha\|_{\mathrm{var}}$ of $\alpha \in M(\Sigma)$. Since
we see that $\|\cdot\|_{\mathrm{var}}$ is lower semicontinuous and therefore certainly measurable on $M(\Sigma)$; and clearly $\|\cdot\|_{\mathrm{var}}$ is bounded on $M_1(\Sigma)$. We now turn to the metric for $M_1(\Sigma)$. Following LÉVY and PROHOROV, define the LÉVY metric
(3.2.1) $\rho(\alpha,\nu) = \inf\{\delta > 0 : \alpha(F) \le \nu(F^{(\delta)}) + \delta \text{ and } \nu(F) \le \alpha(F^{(\delta)}) + \delta \text{ for every closed } F \text{ in } \Sigma\}$
for $\alpha, \nu \in M_1(\Sigma)$, where $F^{(\delta)}$ is defined relative to a complete metric on $\Sigma$. An easy argument shows that $\rho$ is a metric and that it satisfies the convexity property required in $(\hat{C})$. Since it is clear that $\rho(\alpha,\nu) \le \|\nu - \alpha\|_{\mathrm{var}}$, all that remains is to show that $\rho$ is compatible with the weak topology and that $(M_1(\Sigma),\rho)$ is Polish. Before proving this, we will need to recall some elementary properties of the weak topology.
(i) The weak topology is second countable.
(ii) $\alpha_n \Rightarrow \nu$ if and only if $\overline{\lim}_{n\to\infty} \alpha_n(F) \le \nu(F)$ for every closed $F$ in $\Sigma$.
(iii) $\Gamma \subseteq M_1(\Sigma)$ is relatively compact if and only if for each $\delta > 0$ there is a $K \subset\subset \Sigma$ such that $\alpha(K) \ge 1 - \delta$ for every $\alpha \in \Gamma$. (Such a subset $\Gamma$ is said to be tight.)
(iv) If $\mathcal{F} \subseteq C_b(\Sigma;\mathbb{R})$ is uniformly bounded on all of $\Sigma$ and is equicontinuous on each compact subset of $\Sigma$, then $\alpha_n \Rightarrow \nu$ implies that
All of these facts are well-known, and their proofs can be found in any standard text in which the modern theory of weak convergence is discussed. We will now use them to check that the LÉVY metric possesses the properties which we want.
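For measures with finite support the infimum in (3.2.1) can be computed by brute force, which gives some feeling for the LÉVY metric. The sketch below assumes $\Sigma = \mathbb{R}$ with the usual distance and takes $F^{(\delta)}$ to be the open $\delta$-neighborhood of $F$ (an assumption about the notation; using closed neighborhoods would not change the infimum). It relies on the observation that, for a finitely supported $\alpha$, it suffices to test sets $F$ contained in the support of $\alpha$, since replacing $F$ by its intersection with the support keeps $\alpha(F)$ and can only shrink $F^{(\delta)}$. Measures are represented as dictionaries from points to weights, and the function names are hypothetical.

```python
from itertools import combinations

def one_sided(alpha, nu, delta):
    """Check alpha(F) <= nu(F^(delta)) + delta for every nonempty F inside supp(alpha)."""
    points = list(alpha)
    for size in range(1, len(points) + 1):
        for subset in combinations(points, size):
            mass = sum(alpha[x] for x in subset)
            enlarged = sum(
                weight for y, weight in nu.items()
                if any(abs(y - x) < delta for x in subset)
            )
            if mass > enlarged + delta + 1e-12:
                return False
    return True

def levy_distance(alpha, nu, iterations=60):
    """Bisection for the infimum in (3.2.1); delta = 1 always satisfies both conditions."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if one_sided(alpha, nu, mid) and one_sided(nu, alpha, mid):
            hi = mid
        else:
            lo = mid
    return hi

d_near = levy_distance({0.0: 1.0}, {0.3: 1.0})    # two point masses 0.3 apart
d_far = levy_distance({0.0: 1.0}, {5.0: 1.0})     # two point masses far apart
d_mix = levy_distance({0.0: 0.5, 1.0: 0.5}, {0.0: 0.6, 1.0: 0.4})

print(d_near, d_far, d_mix)
```

The bisection is legitimate because the conditions in (3.2.1) become easier to satisfy as $\delta$ grows. The cost is exponential in the size of the supports, so this is only meant for very small examples.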
3.2.2 Lemma. (LÉVY & PROHOROV) The metric $\rho$ in (3.2.1) is compatible with the weak topology on $M_1(\Sigma)$, and $(M_1(\Sigma),\rho)$ is Polish.

PROOF: In view of property (ii) above, it is obvious that $\alpha_n \Rightarrow \nu$ if $\rho(\alpha_n,\nu) \to 0$. To prove the opposite implication, let $\delta > 0$ be given and for each closed $F$ in $\Sigma$ define
$$\psi_F(\sigma) = \frac{\operatorname{dist}\big(\sigma,(F^{(\delta)})^c\big)}{\operatorname{dist}(\sigma,F) + \operatorname{dist}\big(\sigma,(F^{(\delta)})^c\big)}, \quad \sigma \in \Sigma,$$
where "dist" is measured with the same metric on $\Sigma$ as the one used in the definition of $F^{(\delta)}$. It is then an easy matter to check that $\{\psi_F : F \text{ closed in } \Sigma\}$ is uniformly bounded and equicontinuous on $\Sigma$. Hence, by property (iv), if $\alpha_n \Rightarrow \nu$, then $\int\psi_F\,d\alpha_n \to \int\psi_F\,d\nu$ at a rate which is independent of $F$; and, since $\chi_F \le \psi_F \le \chi_{F^{(\delta)}}$, we conclude from this that $\rho(\alpha_n,\nu) \to 0$ if $\alpha_n \Rightarrow \nu$. (We use the notation $\chi_\Gamma$ to denote the indicator (or characteristic) function of a set $\Gamma$.) We have therefore proved that $\rho$ is compatible with the weak topology on $M_1(\Sigma)$.

To prove that $\rho$ is a complete metric on $M_1(\Sigma)$, suppose that
$$\sup_{n \ge m} \rho(\alpha_n,\alpha_m) \to 0 \quad \text{as } m \to \infty.$$
We must show that $\{\alpha_n\}_1^\infty$ is relatively compact. To this end, let $\delta > 0$ be given and, for $\ell \in \mathbb{Z}^+$, choose $m_\ell \in \mathbb{Z}^+$ so that $\sup_{n \ge m_\ell} \rho(\alpha_n,\alpha_{m_\ell}) \le \delta/2^\ell$, and then (using property (iii)) choose $K_\ell \subset\subset \Sigma$ so that $\alpha_k(K_\ell) \ge 1 - \delta/2^\ell$
for all $n \in \mathbb{Z}^+$. Finally, set
$$K = \bigcap_{\ell=1}^\infty K_\ell^{(\delta/2^\ell)},$$
note that $K$ is closed and totally bounded with respect to a complete metric and is therefore compact, and check that $\alpha_n(K^c) \le 2\delta$ for all $n \in \mathbb{Z}^+$. Thus, by property (iii), $\{\alpha_n\}_1^\infty$ is indeed relatively compact. ∎

Before getting down to the main business of this section, there is one more general fact about the space $M(\Sigma)$ which it will be useful to have at our disposal. Namely, we want a good representation of $M(\Sigma)^*$.
3.2.3 Lemma. The duality relation
(3.2.4)
determines a representation of $M(\Sigma)^*$ as $C_b(\Sigma;\mathbb{R})$.

PROOF: Clearly, for each $\phi \in C_b(\Sigma;\mathbb{R})$, $\alpha \in M(\Sigma) \mapsto \int_\Sigma \phi\,d\alpha$ determines a unique element of $M(\Sigma)^*$. Thus, all that we have to show is that every element of $M(\Sigma)^*$ arises in this way. Let $\lambda \in M(\Sigma)^*$ be given and define $\phi(\sigma) = \lambda(\delta_\sigma)$, $\sigma \in \Sigma$. Clearly, $\phi$ is continuous. Moreover, because of the way in which the topology on $M(\Sigma)$ is defined, we can find a finite set $\{\phi_m\}_1^M \subseteq C_b(\Sigma;\mathbb{R})$ such that
and from this it is clear that $\phi$ is bounded. Finally, it is obvious that $\lambda(\alpha) = \int_\Sigma \phi\,d\alpha$ if $\alpha$ is a linear combination of point masses; and, because such $\alpha$'s are dense in $M(\Sigma)$, it follows that this equation holds for all $\alpha \in M(\Sigma)$. ∎
Returning to the problem of large deviations, let $Q \in M_1(M_1(\Sigma))$ be given and define $Q_n \in M_1(M_1(\Sigma))$ to be the distribution of
$$\boldsymbol{\nu} = (\nu_1,\ldots,\nu_n) \in M_1(\Sigma)^n \mapsto \frac{1}{n}\sum_{k=1}^n \nu_k \in M_1(\Sigma)$$
under $Q^n \in M_1(M_1(\Sigma)^n)$. By the Weak Law of Large Numbers combined with the second countability of the weak topology on $M_1(\Sigma)$, one can easily check that $Q_n \Rightarrow \delta_{\mu_Q}$, where $\mu_Q \in M_1(\Sigma)$ is defined by
Thus, it is reasonable to inquire about the large deviations of $\{Q_n : n \ge 1\}$. In fact, by the results which we proved in Section 3.1, we will know that the large deviations of $\{Q_n : n \ge 1\}$ are governed by the rate function
(3.2.5) $I_Q(\nu) = \Lambda_Q^*(\nu) \equiv \sup\Big\{\int_\Sigma \phi\,d\nu - \Lambda_Q(\phi) : \phi \in C_b(\Sigma;\mathbb{R})\Big\}$
for $\nu \in M_1(\Sigma)$, where
and that $I_Q$ is good, as soon as we show that $\{Q_n : n \ge 1\}$ is exponentially tight. In order to do so, we employ the following remarkably useful general observation which will serve us well not only here but also later on.
3.2.7 Lemma. Let $\mu \in M_1(\Sigma)$ be fixed and suppose that $\{V_m\}_{m=1}^\infty$ is a bounded sequence of nonnegative, measurable functions on $\Sigma$ which tends to 0 in $\mu$-measure as $m \to \infty$. Then, for each $M \in [1,\infty)$ and $\beta \in [1,\infty)$ with the property that there is a subsequence
for $0 < \epsilon \le 1$ and $L \in [1,\infty)$, whenever $\{R_\epsilon : \epsilon > 0\}$ is a family which satisfies
(3.2.9)
0 0} satisfies the full large deviation principle with rate I.
To prove (3.2.20), use Theorem 2.1.10 and (3.2.9) to obtain
for all $\nu \in M_1(\Sigma)$ and $V \in C_b(\Sigma;\mathbb{R})$; and so, for each $\nu \in M_1(\Sigma)$,
for every bounded measurable $V : \Sigma \to [0,\infty)$. In particular, just as in the proof of Lemma 3.2.13, $I(\nu) = \infty$ if $\nu$ is not absolutely continuous with respect to $\mu$. On the other hand, if $\nu$

0 for which (3.4.3) holds is a consequence of FERNIQUE's Theorem (cf. Theorem 1.3.24). Furthermore, the equalities in (3.4.4) are all obtained from consideration of the $\mathbb{R}$-valued, centered GAUSSian random variable $x \in E \mapsto {}_{E^*}\langle\lambda,x\rangle_E$; the inequality is trivial; and the finiteness of $B$ follows trivially from (3.4.3). Finally, given (3.4.3), the fact that $\Lambda_\mu^*$ is a good rate function is covered in the statement of Theorem 3.3.11; and the homogeneity of $\Lambda_\mu^*$ is an immediate consequence of the homogeneity of $\Lambda_\mu$. ∎

Following the pattern in SCHILDER's Theorem, we now define $\mu_\epsilon$ to be the distribution of $x \in E \mapsto \epsilon^{1/2}x \in E$ under $\mu$; and, as a first approximation to his result, we present the following.
3.4.5 Theorem. The family $\{\mu_\epsilon : \epsilon > 0\}$ satisfies the full large deviation principle with the good rate function $\Lambda_\mu^*$.

PROOF: We have already pointed out that, as a consequence of Theorem 3.3.11, $\Lambda_\mu^*$ is a good rate function. Furthermore, since $\mu_{1/n}$ is the distribution under $\mu^n$ of $\mathbf{x} \in E^n \mapsto \frac{1}{n}\sum_1^n x_k$ (i.e., $\mu_{1/n}$ here is the same measure as the one which we denoted by $\mu_n$ in Section 3.3), Theorem 3.3.11 allows us to also conclude from (3.4.3) that $\{\mu_{1/n} : n \ge 1\}$ satisfies the full large deviation principle with rate $\Lambda_\mu^*$. In order to pass from this statement to the desired one, set $n(\epsilon) = [1/\epsilon] \vee 1$ and $\gamma(\epsilon) = \epsilon n(\epsilon)$ for $\epsilon > 0$. It is then clear that $x \mapsto \gamma(\epsilon)^{1/2}x$ under $\mu_{1/n(\epsilon)}$ has distribution $\mu_\epsilon$ and that $\gamma(\epsilon) \in [1 - \epsilon, 1]$ for $0 < \epsilon < 1$. Now suppose that $F$ is a closed subset of $E$ and set $\tilde F = \{\gamma^{1/2}x : \gamma \in [1,2] \text{ and } x \in F\}$. Then $\tilde F$ is also closed, and so
Since
this proves the upper bound in the large deviation principle. To prove the lower bound, let $G$ be an open set in $E$ and suppose that $x \in G$. Then we can find an open neighborhood $U$ of $x$ and an $\epsilon_0 \in (0,1/2]$ such that $U \subseteq \gamma(\epsilon)^{-1/2}G$ for all $0 < \epsilon < \epsilon_0$. Hence
$$\underline{\lim}_{\epsilon\to 0}\, \epsilon\log\big(\mu_\epsilon(G)\big) = \underline{\lim}_{\epsilon\to 0}\, \frac{\gamma(\epsilon)}{n(\epsilon)}\log\Big(\mu_{1/n(\epsilon)}\big(\gamma(\epsilon)^{-1/2}G\big)\Big) \ge \underline{\lim}_{n\to\infty}\, \frac{1}{n}\log\big(\mu_{1/n}(U)\big) \ge -\inf_U \Lambda_\mu^* \ge -\Lambda_\mu^*(x).$$
Thus, the lower bound is also proved. ∎

As a dividend of Theorem 3.4.5, we get the following sharpening of the estimate in (3.4.3).

3.4.6 Corollary. (DONSKER & VARADHAN) Set
(3.4.7) $a = \inf\{\Lambda_\mu^*(x) : \|x\|_E = 1\}$ and $b = \sup\{Q_\mu(\lambda,\lambda) : \|\lambda\|_{E^*} = 1\}.$
Then
(3.4.8) $\lim_{R\to\infty} \frac{1}{R^2}\log\big[\mu(\{x \in E : \|x\|_E \ge R\})\big] = -a = -\frac{1}{2b}.$
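The one dimensional case gives a quick sanity check of (3.4.8). In the sketch below we take $E = \mathbb{R}$ and $\mu$ the centered GAUSSian measure with variance $\sigma^2$, and we assume (this is not spelled out in the present passage) that $Q_\mu(\lambda,\lambda)$ is the variance of $x \mapsto \lambda x$ under $\mu$, so that $b = \sigma^2$ and $a = 1/(2\sigma^2)$. The tail $\mu(\{|x| \ge R\})$ is evaluated with the complementary error function, and the function name `tail_rate` is hypothetical.

```python
import math

def tail_rate(radius, sigma):
    """Return R^(-2) * log mu({|x| >= R}) for mu = N(0, sigma^2) on the real line."""
    # For a centered Gaussian, mu({|x| >= R}) = erfc(R / (sigma * sqrt(2))).
    tail = math.erfc(radius / (sigma * math.sqrt(2.0)))
    return math.log(tail) / radius ** 2

sigma = 2.0
predicted = -1.0 / (2.0 * sigma ** 2)     # -a = -1/(2b) with b = sigma^2

radii = (5.0, 10.0, 20.0, 50.0)
rates = [tail_rate(radius, sigma) for radius in radii]

print("predicted limit:", predicted)
for radius, rate in zip(radii, rates):
    print(f"R = {radius:5.1f}   R^-2 log tail = {rate:.5f}")
```

The convergence is slow, with an error of order $(\log R)/R^2$, and for much larger $R$ the tail underflows in floating point, so the radii above were chosen to stay within range.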
sEIlxll&p(dx)
0; and therefore I = ( q ) 2 C(q, ~ / 2 )2 C ( p ,r) 2 C for all q E B(p,r/4). To prove the convexity, let p , q E E with Ifi(p)V I f i ( q ) < 00 be given. For r > 0, choose 6 > 0 so that
Then
for all large enough n E Z+. Hence,
+
(q)
and so Ifi 5 (Ifi(p) I f i ( q ) ) / 2 .Since Ifi is lower semicontinuous, it follows from this that it is also convex. Next, suppose that G is open in E and that p E G with I f i ( p ) < 00. Then, for T > 0 with B ( p , r ) g G,
!&
1 log'P,(G)
1
2 !& lo g P,(B(p ,~ ) ) 2 L(P,T); n+m n
n+m
and therefore lim?
logPn(G) 2 Ifi(p). We now introduce an assumption which, among other things, guarantees that our MARKOVchain is uniformly ergodic (cf. Exercise 4.1.48 below). Namely, we will assume that there exist C, N E Z+ and M E [1,00)with e 5 N such that
(6)
{
,oo
P ( 6 , .) 5
g g=l fly?, )
for all 6, .iE
*
supeE&JE exp[allzll]riE(6, dx) < 0;)
where lP+'(6, . ) =
for Q E
J2 h y i , . ) fi@,d i ) ,
m
2
(0,0;)),
2 1,
and f I ~ ( 8r) , = fl(C?,C x r) for r E BE. The next lemma contains some important preliminary consequences of
(0). 4.1.9 Lemma. Assume that (U) holds. If
then, for ail m E Z+, S > 0, n E N, and (4.1. l o )
Q
E ( 0 , ~ :)
sup P8 ( ( 2 : I I S ~ + ~ ( L .2J 61) ) I I 5 exp[crb
eE2
+rn~,].
In particular, this means that (4.1.5) is satisfied. Furthermore, if to E MI$) and Q E M1(E) is defined by
Q(r)= $
c J: N
Ic=l
c r)fio(de), r E B E ,
~ ( 6 ,x
c
then (4.1.11)
J,e x ~ [ ~ l l ~Ql( ld]x ) 5 exp[&],
Q
E
(o,~),
and (41.12)
Ln 
F(xt+m(G), .. . t x n t + m ( G ) ) p e ( u )I M"
F ( x ) Q"(dx)
for all m E N, n E Z+, and all measurable F : En [O,oo). Finally, if either E is the space of probability measures on some Polish space or (X,11 . 11) is a separable BANACH space, then, for each L 2 0, there is a K L cc E for which
PROOF: Noting that
4
we see that (4.1.10) will be proved as soon as we show that
IJ,eXP[allSm(G)lll (JEexP[.ll.lll
) R&W3
fiE(~m(4,a)
and so the required estimate follows by induction on m. Next let CO be given and define Q accordingly. Then, since
firn+1(S,. ) =
Jc fi(i, ) A m ( 6 , d i ) 5 supfi(i, *), (€5

we see that Q satisfies (4.1.11). In addition, since, for any m E N, n E Z+, and measurable F : En [0,oa)
and therefore the desired result follows easily by induction on n. Given (4.1.10) and (4.1.12), the proof of (4.1.13) can be accomplished as follows. Using Lemma 3.2.7 in the case when E is the space of probability measures on a Polish space and Theorem 3.3.11 when ( X , 11 11) is space, we can find compact sets K L in E such that a separable BANACH limnm log (Qn(KE))5  ( M L ) . Furthermore, by Lemma 3.1.1, we may assume that these K L 's are convex. Hence,
+
and so, by (4.1.12), the required estimate follows. I We are now ready to prove the basic large deviation result of this section. 4.1.14 Theorem. Assume that
(6)holds and let Ifi be the function de
fined in (4.1.6). Then, for every K cc E , (4.1.15) Furthermore, if either E is the space ofprobability measures on some Polish space or (X, 11 . 11) is a separable BANACHspace, then Ia is a good rate function anti, for all I' E BE,
In particular, (4.1.17)
infpi,ne(r) I
d,,(r>I suppi,ne(r), r E BE. i
What we will show first is that everything holds when pa,n is replaced by and Ifi is replaced by Ufi.It will then be a relatively simple matter to pass to the desired statements. We begin by showing that (4.1.18) for every p E E . To this end, suppose that 0 < a < I f i ( p ) and choose 6 > 0 so that L(p,46) > a. Then, by (U),for all 6 , f E 2,
At the same time, for each 1 I m 5 N ,
Since P, (B(p,26)) 5 exp [na] for sufficiently large n 2 1, we conclude from the above that
and therefore that (4.1.18) holds. Given (4.1.18), one can proceed in exactly the same way as one did in part (iv) of Exercise 2.1.14 to show that
In particular, we now know that (4.1.15) holds when p ; ~ and , ~ 1, are replaced by p:,, and [In, respectively. Furthermore, from (4.1.8) and (4.1.17), we see that
for every open G in E. Finally, suppose that E is either the space of probability measures on some Polish space or that (X, 11 . 11) is a separable BANACHspace. By (4.1.13), we know that there is a family { K L : L 2 0) of compact, convex E such that
Hence, just as in the proof of Lemma 2.1.5, we conclude not only that I= is good but also that (4.1.16) holds with ~ 2 and , CIfi ~ in place of p ; ~and , ~
In. In order to complete the proof, note that, from (4.1.10),
for every r E BE. Hence, (4.1.15) follows from (4.1.19); and, when E is either the space of probability measures on some Polish space or a separable BANACHspace, the right hand side of (4.1.16) is a n easy consequence of the fact that it holds when Ifi and p ~ .are , ~replaced by f?Ih and P:,~, respectively. Since the left hand side of (3.1.16) is precisely (3.1.8), the proof is now complete. I 4.1.20 Corollary. Assume that (U) holds and that either E is the space of probability measures on some Polish space C' or that (X, 1) . 11) is a separable real BANACH space. Then, for every @ E C(E; R) which satisfies 1/12
(4.1.21)
SUP SUP nEZ+
( L e x ~ [ n a @ &] , n )
6 ~ 2
0, there exist 0 < a < b < 00 such that inft+b] Pt(I'(6))> 0.
PROOF:The supermultiplicative property is proved in exactly the same way as it was in Lemma 4.1.4. As for the second part, suppose that, for some s E [0, m), P,(r)> 0; and, for t > s, define qt E [0, 00) and Tt E [0, s) so that t = qts rt. Since r is bounded and
+

y s r tq),
stq
= t
one sees that, for sufficiently large t
> s:
Pu({w : s t ( w ) E 1'(6)})2 Pu({w : q ( w ) E I' and ~ ~ S v t ( < w t) 6~/ 2~ } )
2 ~ a ( r ) ~ ~ ~ u:(~{ ~w
S r to
With the same argument as we used to prove Lemma 4.1.7, one can easily check the following.

4.2.7 Lemma. Assume that (4.2.4) holds and refer to the preceding. Then the map $I_P : E \to [0,\infty]$ is lower semicontinuous and convex; and, for open $G$ in $E$,
In order to get the complementary upper bound, we must introduce an assumption which will play the same role here as (6)played in Section 4.1. Namely, we will assume that
for some M E [I, m) and
PI, p2
E M1((0,1]).
4.2.8 Remark.
(a)
Although we have stated in such a way that time t = 1 appears to have special importance, this is in fact not the case. Indeed, as will be apparent from the development given below, we could deal equally well with the case in which the probability measures p1 and p2 are supported on any bounded interval in (0,m). We have chosen the interval (0,1] only for convenience.
As an easy consequence of toward the upper bound. 4.2.9 Lemma. Assume that
(a)we can take the following initial step
(a)holds.
(i) If
then (4.2.10)
for all a E (0,m) and (t,(T) E (0, m) x C. In particular, (4.2.4) holds. (ii) For every p E E
and so
PROOF: The proof of (4.2.10) can be accomplished by induction on $[t]$ using the MARKOV property. The details are left to the reader (cf. Lemma 4.1.9). By the usual HEINE-BOREL argument (cf. (iv) of Exercise 2.1.14), we need only prove the first assertion in (ii). To this end, let $a < I_P(p)$ and choose $\delta > 0$ so that $\mathcal{L}(p,3\delta) > a$. Then, for any $\sigma, \tau \in \Sigma$ and $t > 0$, $P_{\sigma,t}(B(p,\delta))$
from which the required estimate is immediate. ∎
Combining Lemma 4.2.7 with Lemma 4.2.9, we now see that, under (fi), {pu,t : t > 0) satisfies the weak large deviation principle, uniformly in CY E C, with rate function I p . In order to show that I , is a good rate function and that the full (uniform) large deviation principle holds under it will be useful to have some additional notation. Set fi = R x (0, 1IN;and, for CY E C, define Po= P, x py on (fi, B x BE,ll). Next, for G = (w, t) E d, set S.(G) = S.(w),s . ( G ) = s.(w); and define
(a),
T~(G =) 72 +
n
C
tm,
R,
E N,
m=O
and
m=l
where

Finally, for ( T ,a) E (0, oo) x C, we denote by ,G,,,T E bution under Pu of G E fi S T ( ; ) E E. 4.2.11 Lemma. Assume that for 6 E (0, oo),
M l ( E ) the
distri
(0)holds and refer to the preceding.
Then,
Furthermore, given v E M1(C), define M u E
for
E
BE. Then
Mi(E)
by
and, for every n E Z+ and all measurable F : E"
h
(4.2.13)
x m + z ( 4 ? .*
J$Lh(4,
IM"
Ln

7

$[0,\infty)$,
xm+2("1)(W))
RT(d4
F ( x )M X W .
In particular, when E is either the space of probability measures on some Polish space or E = (X, I( 11) is a separable BANACH space, there exists for each L E [ 1 , m ) a Kr, CC E such that


Finally, in the case described in Remark 4.2.2, for every measurable function V : C [O, 001,
where p,, E MI@) is given by (4.2.15)
and, for T E [1,00),
After combining these with the above and again applying HOLDER'Sinequality, we arrive at
for all ( T , a ) E [l,00) x C and a E (0,00); and clearly (4.2.12) is an easy step from here.
To prove (4.2.13), let v E M1(C) be given and define M u E Ml(E) accordingly. The estimate on the exponential moments of M u is an easy consequence of (4.2.10). Next, denote by fiu E Ml(E) the distribution XI(;) E E ; and note that, by the second part under Pu of 5 E fl of fiu 5 M M , for all a E C. At the same time, by the MARKOV property, we have that

(o),
where fiG
= fig,,() and
= Cl+7m+z(n1)(3)(W)
for 5 = ( w , t). Combining this with the preceding and using induction on n, one quickly arrives at (4.2.13). In order to use (4.2.13) to check the uniform exponential tightness of : t > 0} when E = (X, 1) . 11) is a separable BANACH space, define G E fi LT(G) E MI(E) by (cf. the paragraph preceding the statement of this lemma)
{fiu,t

n(T,G)
Pm(T,5)6xm(j);
LT(5)= m=1
and let Q u , E~ Ml(Ml(E)) be the distribution of LT under Pu.Since (cf. Theorem 3.3.4) S T ( & ) = m(LT(G)),
the desired exponential tightness follows immediately from the last part of Lemma 3.3.10 once one notices that, by (4.2.13),
for T E [l,cm) and measurable V : E

[0, m].
When E = M1(C’) for some Polish space C’, we apply the preceding and JENSEN’Sinequality to conclude that
for all T E [l,m) and measurable V : C’ given by p:=J

[ O , o o ] , where p: E M1(C’) is
a’M,(da’). M i (E’)
Thus, Lemma 3.2.7 applies and yields the desired tightness. Finally, to prove (4.2.14) in the situation described in Remark 4.2.2, it suffices to apply the preceding (with C’ = C) and to observe both that the coincides with the pv in (4.2.15) and that above
for 2 = ( w ,t), T E [l,cm), and measurable V : C

[0, m].
We are now in a position to prove the main result of this section.
4.2.16 Theorem. Assume that $(\tilde{\mathbf U})$ holds and that $E$ is either a separable BANACH space or the space of probability measures on a Polish space. Then the $I_P$ in (4.2.6) is a good, convex rate function and, for every $\Gamma\in\mathcal B_E$,
PROOF: We have already proved everything except the goodness of $I_P$ and the upper bound for closed sets. But, by Lemma 4.2.7 combined with (4.2.12), we see that
$$\liminf_{t\to\infty}\frac1t\log\Big[\inf_{\nu\in M_1(\Sigma)}\tilde\mu_{\nu,t}(G)\Big]\ge-\inf_G I_P$$
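for all open $G\subseteq E$. In particular, since $\{\tilde\mu_{\nu,t}:t>0\}$ is uniformly exponentially tight, we now see that, for each $L\in[1,\infty)$, there is a $K_L\subset\subset E$ such that
$$\inf_{K_L^c}I_P\ge-\liminf_{t\to\infty}\frac1t\log\Big[\inf_{\nu\in M_1(\Sigma)}\tilde\mu_{\nu,t}(K_L^c)\Big]\ge L;$$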
and, since we already know that $I_P$ is lower semicontinuous, this completes the proof that $I_P$ is good. Turning to the upper bound for closed sets, note that by combining (ii) of Lemma 4.2.9 with (4.2.12) one sees that
for every $p\in E$. Thus, again by the uniform exponential tightness of $\{\tilde\mu_{\nu,t}:t>0\}$, we know that the upper bound holds when $\mu_{\sigma,t}$ is replaced by $\tilde\mu_{\nu,t}$. But, by (4.2.12), this means that
for every closed $F\subseteq E$ and $\delta>0$; and, because (cf. (2.1.3)) $I_P$ is good, this is all that we need in order to get the upper bound.
Applying the results of Chapter II, we can now take the following preliminary step toward the identification of the rate function $I_P$.

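4.2.17 Corollary. Let everything be as in the statement of Theorem 4.2.16. Then, for each continuous $\Phi:E\longrightarrow\mathbb R$ which satisfies
for some $\alpha\in(1,\infty)$,
$$\lim_{t\to\infty}\sup_{\sigma\in\Sigma}\left|\frac1t\log\left(\int_E\exp[t\Phi(x)]\,\mu_{\sigma,t}(dx)\right)-\sup_{x\in E}\big(\Phi(x)-I_P(x)\big)\right|=0.$$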
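In particular, if
(4.2.18)
$$\Lambda_P(\lambda)\equiv\lim_{t\to\infty}\frac1t\sup_{\sigma\in\Sigma}\Lambda_{\mu_{\sigma,t}}(t\lambda),\qquad\lambda\in X^*,$$
then $\Lambda_P(\lambda)\in\mathbb R$ for $\lambda\in X^*$,
(4.2.19)
$$\Lambda_P(\lambda)=\sup\big\{{}_{X^*}\langle\lambda,x\rangle_X-I_P(x):x\in E\big\},\qquad\lambda\in X^*,$$
$$I_P(x)=\sup\big\{{}_{X^*}\langle\lambda,x\rangle_X-\Lambda_P(\lambda):\lambda\in X^*\big\},\qquad x\in E,$$
and
(4.2.20)
$$\lim_{t\to\infty}\sup_{\sigma\in\Sigma}\left|\frac1t\Lambda_{\mu_{\sigma,t}}(t\lambda)-\Lambda_P(\lambda)\right|=0,\qquad\lambda\in X^*.$$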
We are now at the same stage in our development here as we were after proving Corollary 4.1.20 in Section 4.1; and, once again, we want to develop the analogue of the identification made in (4.1.42). Thus, from now on, we will be assuming that we are in the situation described in Remark 4.2.2 and we introduce
(4.2.21)
$$\Lambda_P(V)=\lim_{t\to\infty}\frac1t\log\left(\sup_{\sigma\in\Sigma}\int_\Omega\exp\left[\int_0^tV(\Sigma_s(\omega))\,ds\right]P_\sigma(d\omega)\right)$$
for $V\in B(\Sigma;\mathbb R)$. By Corollary 4.2.17, we know that, under $(\tilde{\mathbf U})$,
(4.2.22)
$$I_P(\nu)=\Lambda_P^*(\nu)\equiv\sup\left\{\int_\Sigma V(\sigma)\,\nu(d\sigma)-\Lambda_P(V):V\in C_b(\Sigma;\mathbb R)\right\}$$
for $\nu\in M_1(\Sigma)$. Clearly,
$$|\Lambda_P(V)-\Lambda_P(W)|\le\|V-W\|_B,\qquad V,W\in B(\Sigma;\mathbb R);$$
and, by HÖLDER'S inequality, one sees that $V\in B(\Sigma;\mathbb R)\longmapsto\Lambda_P(V)$ is convex. Our goal is to find alternative expressions for these functionals; and, just as in the discrete time setting, we point out that the identification itself does not rely on hypothesis $(\tilde{\mathbf U})$.
What we have to do first is interpret $\Lambda_P(V)$ as the logarithmic spectral radius of an appropriate operator; and this will again involve the FEYNMAN-KAC formula. However, in the present setting, there are a few more technical details which have to be confronted. In the first place, we must do a little elementary perturbation theory for semigroups. Define
(4.2.24)
for $(t,\sigma)\in(0,\infty)\times\Sigma$. In particular, if
for $\phi\in B(\Sigma;\mathbb R)$, then $\{P_t^V:t>0\}$ is a semigroup of bounded operators on $B(\Sigma;\mathbb R)$; and, in fact, $\|P_t^V\|_{\mathrm{op}}\le\exp[t\|V^+\|_B]$, $t>0$.
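To make the bound $\|P_t^V\|_{\mathrm{op}}\le\exp[t\|V^+\|_B]$ and the growth rate $\Lambda_P(V)$ of (4.2.21) concrete, here is a minimal numerical sketch. It rests on an assumption that is not part of the text: the state space is finite, so that the generator is a matrix `L` (nonnegative off-diagonal entries, rows summing to 0) and $P_t^V=\exp[t(L+V)]$ with $V$ acting as a diagonal matrix. In that case $\frac1t\log\sup_\sigma[P_t^V1](\sigma)$ should approach the largest eigenvalue of $L+V$. All function names are hypothetical.

```python
import math

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp_small(A, terms=25):
    """Taylor series for exp(A); adequate only when the entries of A are small."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]          # holds A^k / k!
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def log_growth_rate(L, V, t, steps_per_unit=64):
    """(1/t) * log max_i [exp(t(L+V)) 1]_i, computed with rescaling to avoid overflow."""
    n = len(L)
    h = 1.0 / steps_per_unit
    A = [[h * (L[i][j] + (V[i] if i == j else 0.0)) for j in range(n)]
         for i in range(n)]
    M = mat_exp_small(A)                       # M = exp(h (L + V)), entrywise positive
    u = [1.0] * n
    log_scale = 0.0
    for _ in range(int(round(t * steps_per_unit))):
        u = [sum(M[i][j] * u[j] for j in range(n)) for i in range(n)]
        m = max(u)
        u = [x / m for x in u]
        log_scale += math.log(m)
    return log_scale / t

def top_eigenvalue_2x2(L, V):
    """Largest eigenvalue of the 2x2 matrix L + diag(V)."""
    a, b = L[0][0] + V[0], L[0][1]
    c, d = L[1][0], L[1][1] + V[1]
    tr, det = a + d, a * d - b * c
    return 0.5 * (tr + math.sqrt(tr * tr - 4.0 * det))

# Illustrative two-state chain (an assumption, not taken from the text).
L_GEN = [[-1.0, 1.0], [2.0, -2.0]]
V_POT = [0.5, -1.0]
rate = log_growth_rate(L_GEN, V_POT, t=200.0)
```

For large `t` the value `rate` should be close to `top_eigenvalue_2x2(L_GEN, V_POT)`, and it can never exceed `max(V_POT)` when that maximum is nonnegative, in line with the operator bound stated above.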
PROOF: The existence as well as the uniqueness of $u_\phi^V$ is an elementary application of the standard PICARD iteration procedure for solving equations of VOLTERRA type. Furthermore, as a consequence of the uniqueness, one can easily prove the semigroup property for $\{P_t^V:t>0\}$ by checking that $u(t,\cdot)=u_\phi^V(s+t,\cdot)$ satisfies (4.2.24) with $\phi$ replaced by $[P_s^V\phi]$. Finally, the asserted bound follows immediately from
We can now state and prove the FEYNMAN-KAC formula in the context of continuous-time processes.
4.2.25 Theorem. For each $V\in B(\Sigma;\mathbb R)$ and all $\phi\in B(\Sigma;\mathbb R)$,
In particular, if
$$P^V(t,\sigma,\Gamma)\equiv\int_{\{\omega:\,\Sigma_t(\omega)\in\Gamma\}}\exp\left[\int_{[0,t]}V(\Sigma_s(\omega))\,ds\right]P_\sigma(d\omega)$$
f o r ( t , u , r ) E ( O , O O ) X C X B ~ , then [P,"#](D) = & # ( < ) ~ ~ ( t , ~ , forall d O}invariant, and [Lv0 P,"#] = [ P y 0 L"#] for t E ( 0 , ~and ) 4 E D". Moreover, if X > p(Py) and
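for $\phi\in B(\Sigma;\mathbb R)^+$, then $R_\lambda^V$ admits a unique extension as a bounded linear operator taking $B(\Sigma;\mathbb R)$ into $B_0$; and, for each $\phi\in B(\Sigma;\mathbb R)$,
for $t\in(0,\infty)$. In particular, if $\phi\in B_0$, then, for every $\lambda>\rho(P^V)$, $[R_\lambda^V\phi]\in D^V$ and $[L^V\circ R_\lambda^V\phi]=\lambda[R_\lambda^V\phi]-\phi$. Finally, if $V\in C_b(\Sigma;\mathbb R)$ and $\lambda>\rho(P^V)$, then, for every $\phi\in B_0$, $[R_\lambda^V\phi]\in D\cap D^V$ and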
PROOF: The preliminary assertions are all trivial consequences of the definition just given. To see that $R_\lambda^V$ can be extended as a bounded linear operator on $B(\Sigma;\mathbb R)$, it suffices to note that $R_\lambda^V$ is nonnegativity preserving and that
$$\big\|[R_\lambda^V1]\big\|_B\le\int_{(0,\infty)}e^{-\lambda t}\big\|[P_t^V1]\big\|_B\,dt<\infty$$
as long as $\lambda>\rho(P^V)$. Moreover, in proving (4.2.33), we may assume that $\phi\in B(\Sigma;\mathbb R)^+$, in which case all the steps taken below are easily justified:
Clearly this proves (4.2.33) and, therefore, also that $[R_\lambda^V\phi]\in B_0$ for all $\phi\in B(\Sigma;\mathbb R)$ and that $[R_\lambda^V\phi]\in D^V$ with $[L^V\circ R_\lambda^V\phi]=\lambda[R_\lambda^V\phi]-\phi$ when $\phi\in B_0$.
Finally, suppose that $V\in C_b(\Sigma;\mathbb R)$ and that $\lambda>\rho(P^V)\vee0$. It is then easy to check from (4.2.24) and (4.2.32) that
$$[R_\lambda^V\phi]=[R_\lambda^0\phi]+\big[R_\lambda^0\big(V[R_\lambda^V\phi]\big)\big],\qquad\phi\in B(\Sigma;\mathbb R).$$
Hence, in this case, if $\phi\in B_0$, then not only is $[R_\lambda^V\phi]\in D^V$ but also (cf. the last part of Lemma 4.2.29) $[R_\lambda^V\phi]\in D$. Thus, for such $\phi$'s, we also have that
$$\lambda[R_\lambda^V\phi]-[L\circ R_\lambda^V\phi]=\phi+V[R_\lambda^V\phi];$$
from which (4.2.34) is immediate. To handle the case when $\lambda\in(\rho(P^V),0]$, simply observe that $\rho(P^{V+\alpha})=\rho(P^V)+\alpha$ and that $R_{\lambda+\alpha}^{V+\alpha}=R_\lambda^V$ for every $\alpha\in\mathbb R$.
We are now ready to return to the problem of finding alternate descriptions of $\Lambda_P^*$.
4.2.35 Lemma. For $\nu\in M_1(\Sigma)$ define
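(4.2.36)
$$\bar J_P(\nu)=\sup\left\{-\int_\Sigma\frac{Lu}{u}\,d\nu:u\in D\cap B(\Sigma;[1,\infty))\right\}.$$
Then
(4.2.37)
$$\Lambda_P^*(\nu)\le\bar J_P(\nu)\le\bar\Lambda_P^*(\nu)\equiv\sup\left\{\int_\Sigma V\,d\nu-\Lambda_P(V):V\in B(\Sigma;\mathbb R)\right\}.$$
Moreover, if $\{P_t:t>0\}$ is FELLER continuous (i.e., $C_b(\Sigma;\mathbb R)$ is $\{P_t:t>0\}$-invariant) and $D_c=\{u\in D\cap C_b(\Sigma;\mathbb R):Lu\in C_b(\Sigma;\mathbb R)\}$, then
(4.2.38)
$$\Lambda_P^*(\nu)=J_P(\nu)\equiv\sup\left\{-\int_\Sigma\frac{Lu}{u}\,d\nu:u\in D_c\cap C_b(\Sigma;[1,\infty))\right\}.$$
PROOF: Let $u\in D\cap B(\Sigma;[1,\infty))$ and set $V_u=-\frac{Lu}{u}$. It is then a trivial matter to check that the function $w:[0,\infty)\times\Sigma\longrightarrow\mathbb R$ given by $w(t,\cdot)=u$ satisfies
$$w(t,\cdot)=[P_tu]+\int_{[0,t]}\big[P_{t-s}\big(V_u\,w(s,\cdot)\big)\big]\,ds,\qquad t>0.$$
Hence, $u=[P_t^{V_u}u]$, $t>0$. But this means that $\Lambda_P(V_u)=\rho(P^{V_u})=0$; and so, for every $\nu\in M_1(\Sigma)$,
$$-\int_\Sigma\frac{Lu}{u}\,d\nu\le\bar\Lambda_P^*(\nu).$$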
Clearly, this proves the second part of (4.2.37). Moreover, if $u\in D_c\cap C_b(\Sigma;[1,\infty))$ and therefore $V_u\in C_b(\Sigma;\mathbb R)$, then the same argument shows that $J_P\le\Lambda_P^*$. To complete the proof, let $V\in C_b(\Sigma;\mathbb R)$ and $\lambda>\Lambda_P(V)$ be given. Set $u=[R_\lambda^V1]$ and observe that, by Corollary 4.2.27 and the last part of Lemma 4.2.31, $u\in D$ and that $\lambda u-(Lu+Vu)=1$. In addition, by the FEYNMAN-KAC formula, one sees that $u\ge\epsilon$ for some $\epsilon>0$. Hence,
from which the first part of (4.2.37) follows after one lets $\lambda\searrow\Lambda_P(V)$ and then takes the supremum over $V\in C_b(\Sigma;\mathbb R)$. Finally, in the FELLER continuous case, one can easily check that $[R_\lambda^V1]\in D_c$; and so the preceding shows that $\Lambda_P^*(\nu)\le J_P(\nu)$.
4.2.39 Theorem. For $h>0$ let $\Pi_h$ be the transition probability function given by $\Pi_h(\sigma,\cdot)=P(h,\sigma,\cdot)$. Then,
(4.2.40)
$$J_{\Pi_h}(\nu)\le h\,\bar\Lambda_P^*(\nu)\ \text{ for }h>0\quad\text{and}\quad\bar J_P(\nu)\le\liminf_{h\searrow0}\frac1hJ_{\Pi_h}(\nu)$$
for $\nu\in M_1(\Sigma)$; and so, $\mu\in M_1(\Sigma)$ is $\{P_t:t>0\}$-invariant if $\bar\Lambda_P^*(\mu)=0$, and $\bar J_P(\mu)=0$ if $\mu$ is $\{P_t:t>0\}$-invariant. (See (4.1.38) for the definition of $J_{\Pi_h}$.) In particular, if $\Lambda_P^*=\bar\Lambda_P^*$, then
(4.2.41)
$$\bar J_P(\nu)=\bar\Lambda_P^*(\nu),\quad J_{\Pi_h}(\nu)\le h\,\bar J_P(\nu),\quad\text{and}\quad\bar J_P(\nu)=\lim_{h\searrow0}\frac1hJ_{\Pi_h}(\nu)$$
for all $\nu\in M_1(\Sigma)$; and so $\mu\in M_1(\Sigma)$ is $\{P_t:t>0\}$-invariant if and only if $\bar J_P(\mu)=0$. Finally, when $\{P_t:t>0\}$ is FELLER continuous, then
(4.2.42)
$$J_P(\nu)=\Lambda_P^*(\nu),\quad J_{\Pi_h}(\nu)\le h\,J_P(\nu),\quad\text{and}\quad J_P(\nu)=\lim_{h\searrow0}\frac1hJ_{\Pi_h}(\nu)$$
for $\nu\in M_1(\Sigma)$; and so, in this case, $\mu\in M_1(\Sigma)$ is $\{P_t:t>0\}$-invariant if and only if $J_P(\mu)=0$.
PROOF: To prove the first part of (4.2.40), first use (4.1.39) to see that
$$J_{\Pi_h}(\nu)\le\sup\left\{\int_\Sigma V\,d\nu-\Lambda_{\Pi_h}(V):V\in B(\Sigma;\mathbb R)\right\}.$$
Thus, the inequality will be established once we note that, by JENSEN'S inequality and Lemma 4.1.33,
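To prove the second part of (4.2.40), let $u\in D\cap B(\Sigma;[1,\infty))$ be given and note that, since $1-x\le-\log x$ for $x\in(0,\infty)$,
and therefore that
$$-\int_\Sigma\frac{Lu}{u}\,d\nu=\lim_{h\searrow0}\frac1h\int_\Sigma\frac{u-P_hu}{u}\,d\nu\le\liminf_{h\searrow0}\frac1hJ_{\Pi_h}(\nu).$$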
Clearly, this completes the proof of (4.2.40). Moreover, if $\bar\Lambda_P^*(\mu)=0$, then, by (4.2.40) and Lemma 4.1.45, one sees that $\mu\Pi_h=\mu$ for all $h>0$. On the other hand, if $\mu$ is $\{P_t:t>0\}$-invariant, then, by Lemma 4.1.45, $J_{\Pi_h}(\mu)=0$ for all $h>0$, and therefore $\bar J_P(\mu)=0$ by (4.2.40). Next, suppose that $\Lambda_P^*=\bar\Lambda_P^*$. Then (4.2.41) follows immediately from (4.2.37) and (4.2.40). Finally, suppose that $\{P_t:t>0\}$ is FELLER continuous. Then, by Lemma 4.1.36, $J_{\Pi_h}=\Lambda_{\Pi_h}^*$. At the same time, by the same argument as the one which led to the first part of (4.2.40), $\Lambda_{\Pi_h}^*(\nu)\le h\Lambda_P^*(\nu)$; and, by (4.2.38), $\Lambda_P^*=J_P$. Clearly this proves that $J_{\Pi_h}\le hJ_P$. Hence, by the last part of (4.2.40), we now see that
$$J_P(\nu)\le\bar J_P(\nu)\le\liminf_{h\searrow0}\frac1hJ_{\Pi_h}(\nu)\le J_P(\nu).$$
We have now proved the following version of a result which was originally derived by DONSKER and VARADHAN [30].
4.2.43 Theorem. (DONSKER & VARADHAN) Assume that $(\tilde{\mathbf U})$ holds and define $\bar J_P$ as in (4.2.36). Then $\bar J_P$ is a good rate function, $\bar\Lambda_P^*=\bar J_P=\Lambda_P^*$ and, for every $\Gamma\in\mathcal B_{M_1(\Sigma)}$,
In particular, if, in addition, $\{P_t:t>0\}$ is FELLER continuous and $J_P$ is defined as in (4.2.38), then $J_P=\bar J_P$; and so (4.2.44) holds with $J_P$ in place of $\bar J_P$.
PROOF: In view of Theorem 4.2.16, (4.2.19), and the second part of Theorem 4.2.39, all that we have to do is show that $\Lambda_P^*=\bar\Lambda_P^*$. But, with the aid of (4.2.14), this follows by the same argument as we used to prove Lemma 4.1.40.
A truth which is familiar to MARKOV process devotees is that life often becomes simpler when one deals with symmetric transition probability functions. Thus, it should come as no surprise that the preceding theory of large deviations takes a more pleasing form when applied to such processes. In particular, we will close this section by showing that the rate function can often be expressed in terms of the DIRICHLET form associated with the symmetric process. We begin by recalling a few of the basic facts about symmetric MARKOV semigroups. The $\sigma$-finite measure $m$ on $(\Sigma,\mathcal B_\Sigma)$ is said to be reversing for the transition probability function $P(t,\sigma,\cdot)$ if the measures $m_t$, $t\in(0,\infty)$, defined on $(\Sigma^2,\mathcal B_{\Sigma^2})$ by
are symmetric (i.e., $m_t(\Gamma_1\times\Gamma_2)=m_t(\Gamma_2\times\Gamma_1)$ for all $\Gamma_1,\Gamma_2\in\mathcal B_\Sigma$). Clearly, $m$ being reversing for $P(t,\sigma,\cdot)$ is equivalent to the statement that the semigroup $\{P_t:t>0\}$ is $m$-symmetric in the sense that, for each $t\in(0,\infty)$,
In particular, by taking $\psi=1$ in the preceding, we see that for any $t\in(0,\infty)$ and $\phi\in B(\Sigma;[0,\infty))$,
$$\int_\Sigma P_t\phi\,dm=\int_\Sigma\phi\,dm.$$
In other words, $m$ is $\{P_t:t>0\}$-invariant; and therefore, by JENSEN'S inequality,
$$\|P_t\phi\|_{L^2(m)}^2\le\|P_t(\phi^2)\|_{L^1(m)}=\|\phi\|_{L^2(m)}^2$$
for all $t\in(0,\infty)$ and $\phi\in B(\Sigma;\mathbb R)$. After combining this with the fact that $[P_t\phi](\sigma)\longrightarrow\phi(\sigma)$, $\sigma\in\Sigma$, for $\phi\in C_b(\Sigma;\mathbb R)$, one can easily show that $\{P_t:t>0\}$ determines a unique strongly continuous semigroup $\{\bar P_t:t>0\}$ of self-adjoint contractions on $L^2(m)$ such that $\bar P_t\phi=P_t\phi$ for $\phi\in B(\Sigma;\mathbb R)\cap L^2(m)$. Use $\bar L$ to denote the generator of the semigroup $\{\bar P_t:t>0\}$ and note that $\bar L$ is a nonpositive self-adjoint operator on $L^2(m)$. (The self-adjointness of $\bar L$ follows from that of the $\bar P_t$'s and the nonpositivity is a consequence of their contractive property.) Moreover, by either STONE'S Theorem or the HILLE-YOSIDA Theorem, one knows that
where $\{E_\lambda:\lambda\in[0,\infty)\}$ is the spectral resolution of the identity for $-\bar L$. Finally, define the Dirichlet form $\mathcal E$ to be the quadratic mapping given by
let $D(\mathcal E)=\{\phi\in L^2(m):\mathcal E(\phi,\phi)<\infty\}$; and note that, by (4.2.46),
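What we want to show is that, under reasonable assumptions, the function $J_{\mathcal E}:M_1(\Sigma)\longrightarrow[0,\infty]$ given by
(4.2.49)
$$J_{\mathcal E}(\mu)=\begin{cases}\mathcal E(f^{1/2},f^{1/2})&\text{if }\mu\ll m\text{ and }f=\frac{d\mu}{dm}\\\infty&\text{otherwise}\end{cases}$$
governs the large deviations of $\{L_t:t>0\}$.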
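The relation between $J_{\mathcal E}$ and the variational expression $-\int(Lu/u)\,d\nu$ from (4.2.36) can be checked by hand in a toy setting. The sketch below assumes, purely for illustration, a finite state space with symmetric "conductances" `c[i][j]` and a reversing measure `m`, so that the generator is `L[i][j] = c[i][j] / m[i]` off the diagonal and the Dirichlet form is $\mathcal E(\phi,\phi)=\frac12\sum_{i,j}c_{ij}(\phi_i-\phi_j)^2$. For $\mu=f\,m$ with $f>0$, the quantity $-\sum_i\mu_i(Lu)_i/u_i$ should never exceed $\mathcal E(f^{1/2},f^{1/2})$ and should equal it at $u=f^{1/2}$. The names and numbers are hypothetical.

```python
import math
import random

def generator(conductance, m):
    """Generator of the reversible chain: L[i][j] = c[i][j] / m[i] for i != j."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                L[i][j] = conductance[i][j] / m[i]
        L[i][i] = -sum(L[i][j] for j in range(n) if j != i)
    return L

def dirichlet_form(conductance, phi):
    """E(phi, phi) = (1/2) * sum_{i,j} c[i][j] * (phi[i] - phi[j])^2."""
    n = len(phi)
    return 0.5 * sum(conductance[i][j] * (phi[i] - phi[j]) ** 2
                     for i in range(n) for j in range(n))

def dv_functional(L, mu, u):
    """-sum_i mu[i] * (L u)[i] / u[i] for a strictly positive vector u."""
    n = len(u)
    return -sum(mu[i] * sum(L[i][j] * u[j] for j in range(n)) / u[i]
                for i in range(n))

# Illustrative data (assumptions, not taken from the text).
c = [[0.0, 1.0, 0.5], [1.0, 0.0, 2.0], [0.5, 2.0, 0.0]]   # symmetric conductances
m = [1.0, 2.0, 3.0]                                        # reversing measure
f = [0.3, 0.2, 0.1]                                        # density of mu w.r.t. m
mu = [f[i] * m[i] for i in range(3)]                       # a probability vector

L = generator(c, m)
sqrt_f = [math.sqrt(x) for x in f]
rate = dirichlet_form(c, sqrt_f)            # J_E(mu) in this finite setting
at_sqrt_f = dv_functional(L, mu, sqrt_f)    # should agree with `rate`

random.seed(0)
trials = [dv_functional(L, mu, [random.uniform(0.1, 5.0) for _ in range(3)])
          for _ in range(200)]
```

The inequality for arbitrary positive `u` is just the arithmetic-geometric mean inequality applied pair by pair, which is why the supremum is attained at `u = sqrt_f`.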
4.2.50 Lemma. For $V\in B(\Sigma;\mathbb R)$ define $\{P_t^V:t>0\}$ as in Lemma 4.2.23. Assuming that (4.2.45) holds,
and so, for each $t\in(0,\infty)$, there is a unique continuous extension $\bar P_t^V$ to $L^2(m)$ of $P_t^V$ on $B(\Sigma;\mathbb R)\cap L^1(m)$. Moreover, $\{\bar P_t^V:t>0\}$ is a strongly continuous semigroup of bounded, self-adjoint operators on $L^2(m)$; and
(4.2.51)
$$\Lambda_{\mathcal E}(V)\equiv\sup\left\{\int_\Sigma V\,d\mu-J_{\mathcal E}(\mu):\mu\in M_1(\Sigma)\right\}=\lim_{t\to\infty}\frac1t\log\Big(\|\bar P_t^V\|_{L^2(m)\to L^2(m)}\Big)\le\Lambda_P(V),$$
where we have used $\|\cdot\|_{L^2(m)\to L^2(m)}$ to denote the norm for operators on $L^2(m)$ into itself.
PROOF: By (4.2.26), it is obvious that $|[P_t^V\phi](\sigma)|\le e^{t\|V\|_B}[P_t|\phi|](\sigma)$, $\sigma\in\Sigma$. Hence, the first assertion follows immediately from the fact that $P_t$ itself acts contractively on $L^2(m)$; and so there is no problem about proving the existence and uniqueness of the extensions $\bar P_t^V$. In addition, it is clear that $\{\bar P_t^V:t>0\}$ forms a semigroup and that this semigroup is strongly continuous on $L^2(m)$. In order to show that the $\bar P_t^V$'s are self-adjoint, first observe that
for $(t,\sigma)\in(0,\infty)\times\Sigma$ and $\phi\in B(\Sigma;\mathbb R)$. Indeed, using the expression in (4.2.26) for $[P_t^V\phi](\sigma)$, one sees that
Now let $(\bar P_t^V)^*$ denote the adjoint of $\bar P_t^V$. Then, for $\phi,\psi\in B(\Sigma;\mathbb R)\cap L^1(m)$, one sees from (4.2.52) and (4.2.24), respectively, that
and
where we have used the self-adjointness of $\bar P_t$ to get the first of these expressions. Starting from the above, it is an easy step to
and thence to $(\bar P_t^V)^*=\bar P_t^V$. Having established that $\{\bar P_t^V:t>0\}$ is a strongly continuous semigroup of bounded, self-adjoint operators on $L^2(m)$, we can now say that
P:
(4.2.53)
ext
= V
t E (0, CQ),
,a)
where {EY : A E A[, m)} is the spectral resolution of the identity for V L and is the generator of {F: : t > 0). In particular, Av = limt+m $log ( I / ~ Y I I L z ( m )  L Z ( m ) . Thus, we will be done once we show that XV 5 Ap(V) and that Av = AE(V). To see the first of these, let A > AV be given. Then there is a 2c) E L2(rn) such that 2c) = EY2c) # 0. Thus, we can find a d, f B(C;R) r l L1(rn)such that EYd # 0. But this means, on the one hand, that
zv
)
and, on the other hand, that
In other words, $\lambda_V\le\Lambda_P(V)$. To prove that $\lambda_V=\Lambda_{\mathcal E}(V)$, first note that
and therefore that
Next, using (4.2.24) and (4.2.48), check that
By combining these, we see that
Thus, all that remains is to check that the preceding supremum is unchanged if we restrict ourselves to nonnegative $\phi$'s. But, for $\phi\in L^1(m)\cap B(\Sigma;\mathbb R)$,
and so, by (4.2.48) and an easy limit argument, we see that
as $t\searrow0$ for every $\phi\in L^2(m)$. In particular, we now know that
$$\mathcal E(|\phi|,|\phi|)\le\mathcal E(\phi,\phi).$$
4.2.55 Lemma. Assume that $m$ is a reversing measure for $P(t,\sigma,\cdot)$ and define $\mathcal E$ and $J_{\mathcal E}$ accordingly. Then (cf. (4.2.37) and Theorem 4.2.39 for the notation)
and
(4.2.57)
$$J_{\mathcal E}(\mu)=\lim_{h\searrow0}\frac1hJ_{\Pi_h}(\mu)$$
PROOF:Obviously,
for p E MI@) satisfying p
0,
and
Thus, by LEBESGUE'S Dominated Convergence Theorem, we have that
Clearly the desired result follows from this together with (4.2.40) and (4.2.48). By combining the preceding with our earlier results, we arrive at the following version of a result which, once again, is due originally to DONSKER and VARADHAN [30]. 4.2.58 Theorem. Assume that rn is P ( t , a ,  ) reversing. I f p 0)reversing afinite measure, v 0) as a family of probability measures on M1(C).In addition, iff and {fin}: are the functions defined above, then (4.3.5) and Lemma 4.3.6 tell us that all the hypotheses of Lemma 2.1.4 are met by these functions and the family {Qt : t > 0). Hence, as a consequence of Lemma 2.1.4, we now see that J p l ~ governs , the Iarge deviations of {Qt : t > 0) as a family of probability measures on El; and this is just another way of saying that (4.3.9) holds. 1

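The principal reason for DONSKER and VARADHAN'S interest in Theorem 4.3.7 is that they wanted to apply it to the following rather strange computation. Namely, let $N\in\mathbb Z^+$ be given and, as in Section 1.3, denote by $\mathcal W$ WIENER'S measure on $\Theta$. Given $\epsilon>0$, $t\in(0,\infty)$, and $\theta\in\Theta$, define
$$C_t^{(\epsilon)}(\theta)=\big\{x\in\mathbb R^N:|x-\theta(s)|<\epsilon\text{ for some }s\in[0,t]\big\}$$
to be the $\epsilon$-sausage around $\theta|_{[0,t]}$. Using $|\Gamma|$ to denote the LEBESGUE measure of $\Gamma\in\mathcal B_{\mathbb R^N}$, note that $\theta\in\Theta\longmapsto|C_t^{(\epsilon)}(\theta)|$ is measurable and set
$$\lambda^{(\epsilon)}(t;\gamma)=\int_\Theta\exp\big[-\gamma|C_t^{(\epsilon)}(\theta)|\big]\,\mathcal W(d\theta),\qquad t\in(0,\infty)$$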
for fixed $\gamma\in(0,\infty)$. In order to verify a conjecture made by some physicists, what DONSKER and VARADHAN wanted to do is compute the asymptotic behavior of $\lambda^{(\epsilon)}(t;\gamma)$ as $t\to\infty$; and we will devote the rest of this section to showing what they did. The first step is to rewrite $\lambda^{(\epsilon)}(t;\gamma)$ in such a way that it becomes clearer what one should expect. To this end, observe that, by BROWNIAN scaling (cf. (iv) of Theorem 1.3.2), for each $\alpha\in(0,\infty)$:

have the same distribution under W . Thus, since
we see, upon taking a: = t2/N, that
where ~ ( t=)E / t 1 l N .Looking at the form of X ( € ) ( t ; y ) one , is led to guess that
and therefore, by (4.3.10),
might be the appropriate limit to compute. Further evidence that the preceding is a step in the right direction is provided by the following relatively simple computation.
4.3.11 Lemma. Let $G$ be a bounded, nonempty, open subset of $\mathbb R^N$ and set
(The space $C_c^\infty(G;\mathbb R)$ consists of those $\phi\in C^\infty(\mathbb R^N;\mathbb R)$ with compact support in $G$.) Then
(See Remark 4.3.33 below.)
PROOF: For $x\in\mathbb R^N$ and $\theta\in\Theta$, let $\theta_x$ be the path $t\in[0,\infty)\longmapsto x+\theta(t)\in\mathbb R^N$ and define $C_t^{(\epsilon)}(\theta_x)$ accordingly. It is then clear, by the translation invariance of LEBESGUE'S measure, that
for all $x\in\mathbb R^N$. Next, define
$$\zeta(x,\theta)=\inf\{t\ge0:\theta_x(t)\notin G\}.$$
where $u_G(t,x)=\mathcal W(\{\theta:\zeta(x,\theta)>t\})$. Thus, all that we have to do is show that
(4.3.13)
The proof of (4.3.13) depends on an elementary fact about the relation between WIENER'S measure and the FRIEDRICHS' extension $\bar L$ of $\frac12\Delta$ on $C_c^\infty(G;\mathbb R)$. (We use $\Delta$ here to denote the standard LAPLACE operator on $\mathbb R^N$.) Namely, if $Q_t$ is the operator on $B(G;\mathbb R)$ defined by
$$[Q_t\phi](x)=\int_{\{\theta:\,\zeta(x,\theta)>t\}}\phi(\theta_x(t))\,\mathcal W(d\theta),\qquad x\in G\text{ and }\phi\in B(G;\mathbb R),$$
then $\{Q_t:t>0\}$ is a sub-MARKOVian semigroup on $B(G;\mathbb R)$ which is weakly continuous on $C_b(G;\mathbb R)$ and satisfies
for all $\phi,\psi\in B(G;\mathbb R)$. In particular, each $Q_t$ admits a unique extension $\bar Q_t$ as a self-adjoint contraction on $L^2(G)$, and $\{\bar Q_t:t>0\}$ becomes a strongly continuous semigroup of self-adjoint contractions whose generator coincides with $\bar L$. That is, $\bar Q_t=e^{t\bar L}$, $t\in[0,\infty)$. (For more information on such matters, the reader might want to consult [50] or [51].) With the preceding in hand, we now see that
and so (4.3.13) comes down to checking that (4.3.14)
After combining these we see that
and obviously (4.3.14) is an immediate consequence of this. I
Considering how crude the idea behind (4.3.12) appears to be, one may be surprised that, after making the optimal choice of $G$, the right hand side of (4.3.12) turns out to be the limit which we are seeking. The intuitive explanation for this is that a WIENER path $\theta$ either takes an excursion which carries it far away from the origin, with the result that $|C_t^{(\epsilon(t))}(\theta)|$ becomes very large as $t\to\infty$, or $\theta$ remains in some fixed bounded open $G$, in which case its "sausage" eventually fills up the whole of $G$. Although this intuitive picture is appealing, it does not lend itself easily to a rigorous proof. Instead, our derivation of the upper bound will rely on an application of Theorem 4.3.7 and will not make any direct reference to the preceding intuition. In order to arrive at a situation to which that theorem is applicable, we will need to make some preliminary preparations. Let $R\in(0,\infty)$ be chosen and fixed, and set

Next, introduce on $\Sigma(R)$ the metric
$$D_R(x,y)=\min\{|x+Rk-y|:k\in\mathbb Z^N\},\qquad x,y\in\Sigma(R);$$
and observe that $(\Sigma(R),D_R)$ becomes a compact metric space for which the corresponding BOREL field $\mathcal B_{\Sigma(R)}$ coincides with the field $\mathcal B_{\mathbb R^N}[\Sigma(R)]$ of $\mathcal B_{\mathbb R^N}$-measurable subsets of $\Sigma(R)$. Also, define $F_R:\mathbb R^N\longrightarrow\Sigma(R)$ by

(151 = max{n E Z : n 5 Ro and 4 E H ' ( C ( R ) ) + with l1411p(xR)= 1 and
for some C3, C, E ( 0 , ~ )and ; clearly the desired result follows from this. I At this point what we know is that  inf{ylGI+
A(G) : G E S,}
Although (4.3.29) appears to be still some distance from our goal, it, in conjunction with a beautiful result from classical potential theory, turns out to be all that we need. To be precise, for measurable $\phi:\mathbb R^N\longrightarrow[0,\infty)$ define the decreasing rearrangement of $\phi$ to be the nonnegative measurable function $\hat\phi$ on $\mathbb R^N$ with the property that
where $\Omega_N\equiv|B_{\mathbb R^N}(0,1)|$. Obviously, $|\{\hat\phi>t\}|=|\{\phi>t\}|$ for every $t\in[0,\infty)$, and therefore $\phi\in L^2(\mathbb R^N)\longmapsto\hat\phi\in L^2(\mathbb R^N)$ is an isometry. The beautiful result alluded to states that $\hat\phi\in H^1(\mathbb R^N)$ and
(4.3.30)
$$\|\nabla\hat\phi\|_{L^2(\mathbb R^N)}\le\|\nabla\phi\|_{L^2(\mathbb R^N)}$$
if $\phi\in H^1(\mathbb R^N)$. For an elegant proof of this statement, see [74].
4.3.31 Theorem. (DONSKER & VARADHAN) Set
where
:
4 E CF(BRN
(0,
(I/~N)"~))
1
=1
with
.
Then, for every $\epsilon\in(0,\infty)$,
$$\lim_{t\to\infty}\frac1{t^{N/(N+2)}}\log\left(\int_\Theta\exp\big[-\gamma|C_t^{(\epsilon)}(\theta)|\big]\,\mathcal W(d\theta)\right)=-k_N(\gamma).$$
PROOF: In view of (4.3.10) and (4.3.29), all that we have to do is check that inf{ylGI+ X(G) : G E St,} 5 &N(Y) (4.3.32)
To this end, note that, by an obvious scaling argument,
where BA denotes the open ball in R N around the origin wit Hence, inf{ylG(
I
volume
+ X(G) : G E @I,}Ii n f { y ( B ~+( X(BA) : A E (0,m))
1
which is the left hand side of (4.3.32). To prove the right hand side of (4.3.32), suppose that $\phi\in H^1(\mathbb R^N)^+$ with $\|\phi\|_{L^2(\mathbb R^N)}=1$ and $A\equiv|\{\phi>0\}|<\infty$ is given. Then, by the result cited above,
where $\hat\phi$ is the decreasing rearrangement of $\phi$. At the same time, by an elementary mollification procedure, one can easily check that
for every $\delta\in(0,\infty)$. Thus, after letting $\delta\searrow0$, we conclude that
4.3.33 Remark. The reader who is uncomfortable with the sort of DIRICHLET form technology used in the proof of Lemma 4.3.11 should note that the proof of Theorem 4.3.31 only required our knowing (4.3.12) when $G$ is a ball around the origin, in which case (4.3.12) can be easily derived from familiar, classical facts about the eigenvalues and eigenfunctions for $\frac12\Delta$ with boundary condition 0.
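The optimization over balls that appears in the proof of Theorem 4.3.31 is a one-variable problem, and it can be checked numerically. The sketch below assumes the standard scaling law $\lambda(B_A)=A^{-2/N}\lambda(B_1)$ for the lowest DIRICHLET eigenvalue on a ball of volume $A$ (this is the "obvious scaling argument"), and it treats the value `lam1` $=\lambda(B_1)$ as an input. Under that assumption the minimum of $\gamma A+\lambda(B_A)$ over $A>0$ is attained at $A^*=(2\lambda(B_1)/(N\gamma))^{N/(N+2)}$ and equals $\frac{N+2}2\gamma A^*$. The function names and the sample values are hypothetical.

```python
import math

def ball_objective(A, gamma, lam1, N):
    """gamma * |B_A| + lambda(B_A), using lambda(B_A) = A**(-2/N) * lam1."""
    return gamma * A + lam1 * A ** (-2.0 / N)

def closed_form_minimum(gamma, lam1, N):
    """Minimizer A* and minimum value obtained by setting the derivative to zero."""
    A_star = (2.0 * lam1 / (N * gamma)) ** (N / (N + 2.0))
    return A_star, 0.5 * (N + 2.0) * gamma * A_star

def golden_section_min(f, lo, hi, tol=1e-10):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

# Sample input: N = 1, where the interval of length 1 has lowest
# Dirichlet eigenvalue pi**2 / 2 for -(1/2) d^2/dx^2.
N, gamma, lam1 = 1, 2.0, math.pi ** 2 / 2.0
A_numeric = golden_section_min(lambda A: ball_objective(A, gamma, lam1, N), 1e-6, 100.0)
A_star, min_value = closed_form_minimum(gamma, lam1, N)
```

Because the objective is convex in `A`, the search interval only needs to contain the minimizer; the numerical and closed-form answers should then agree to within the tolerance of the search.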
4.4 Process Level Large Deviations
In the preceding three sections, we discussed the large deviation theory for the empirical distribution of the position of a MARKOV process. In this section, we will develop the same theory for the empirical distribution of the whole process. We begin in the setting of MARKOV chains. Thus, let $\Pi$ be a transition probability function on a Polish space $\Sigma$ and denote by $\{P_\sigma:\sigma\in\Sigma\}$ the associated MARKOV family of probability measures on $\Omega=\Sigma^{\mathbb N}$. For $n\in\mathbb N$, define $\theta_n:\Omega\longrightarrow\Omega$ so that $\Sigma_m(\theta_n\omega)=\Sigma_{m+n}(\omega)$ (recall that $\Sigma_n(\omega)$ is the position of $\omega\in\Omega$ at time $n\in\mathbb N$); and, given $n\in\mathbb Z^+$, define

Once again, under the conditions introduced in Section 4.1, ergodic considerations predict that $R_n(\omega)\Longrightarrow P_\mu$ almost surely, where $P_\mu=\int_\Sigma P_\sigma\,\mu(d\sigma)$ and $\mu\in M_1(\Sigma)$ is the $\Pi$-invariant measure discussed in Exercise 4.1.48. Our goal is to describe the large deviation theory for the families
$$\{P_\sigma\circ(R_n)^{-1}:n\ge1\},\qquad\sigma\in\Sigma.$$
Note that $L_n(\omega)=R_n(\omega)\circ\Sigma_0^{-1}$ and therefore that the result which we are now pursuing is "higher" than the earlier one. We will begin with the more modest task of studying the analogous problem for the finite dimensional marginals of the $R_n(\omega)$'s. Namely, for $1\le k<\ell<\infty$, define
and, for d 2 2 , consider the map
and let $\mu_{\sigma,n}^{(d)}\in M_1(M_1(\Sigma^d))$ denote the distribution of $\omega\longmapsto L_n^{(d)}(\omega)$ under $P_\sigma$. We will now develop the large deviation theory for the families $\{\mu_{\sigma,n}^{(d)}:n\ge1\}$ when $\Pi$ satisfies (U). To this end, define the transition probability function $\Pi^{(d)}$ on $\Sigma^d$ by
(4.4.3)
for $\sigma^{(d)}\in\Sigma^d$ and $\Gamma\in\mathcal B_{\Sigma^d}$; and let $\{P^{(d)}_{\sigma^{(d)}}:\sigma^{(d)}\in\Sigma^d\}$ be the associated MARKOV family on $\Omega^{(d)}=(\Sigma^d)^{\mathbb N}$. Noting that
$$(\Pi^{(d)})^{d+\ell-1}(\sigma^{(d)},d\tau^{(d)})=\Pi^\ell(\sigma_d^{(d)},d\tau_1^{(d)})\,\Pi(\tau_1^{(d)},d\tau_2^{(d)})\cdots\Pi(\tau_{d-1}^{(d)},d\tau_d^{(d)})$$
for $\ell\in\mathbb Z^+$, one sees that (U) implies that
for $\sigma^{(d)}\in\Sigma^d$. Thus, when $\Pi$ satisfies (U), Theorem 4.1.43 applies to the empirical distribution of the position of the MARKOV chain $\{P^{(d)}_{\sigma^{(d)}}:\sigma^{(d)}\in\Sigma^d\}$ and tells us that
(4.4.4)
$$J_\Pi^{(d)}(\nu)\equiv J_{\Pi^{(d)}}(\nu),\qquad\nu\in M_1(\Sigma^d)$$
is a good rate function and that
for every $\Gamma\in\mathcal B_{M_1(\Sigma^d)}$; where
n
and ELd)( w ( ~ )is) the position of w ( ~at) time n E N. Since, by the MARKOV property, it is an easy matter to check that for any n E Z+, (T E C, and dd)E Ed with oy) = (T: P g m =
s,. pi:)) ( { J d )L , ( J ~ )r})) :
E
( J d ) , d ~ @ ) ) r, E aE,
and therefore that
for all n E Z+ and deviation result.
(T
E C , we have now proved the following uniform large
4.4.5 Lemma. Assume that (U) holds. Then the function $J_\Pi^{(d)}$ is a good rate function on $M_1(\Sigma^d)$ and
for all $\Gamma\in\mathcal B_{M_1(\Sigma^d)}$.
We next want to give an alternative expression for $J_\Pi^{(d)}$. In order to develop this other expression, it will be necessary to recall a basic property of probability measures on a Polish space. Namely, given a Polish space $E$, a countably generated sub-$\sigma$-algebra $\mathcal F$ of $\mathcal B_E$, and a $P\in M_1(E)$, there is a map $x\in E\longmapsto P^{\mathcal F}(x,\cdot)\in M_1(E)$ with the properties that
(1) $x\in E\longmapsto P^{\mathcal F}(x,B)$ is $\mathcal F$-measurable for every $B\in\mathcal B_E$;
(2) $P^{\mathcal F}(x,A)=\chi_A(x)$, $x\in E$, for each $A\in\mathcal F$;
(3) $P(A\cap B)=\int_AP^{\mathcal F}(x,B)\,P(dx)$ for all $A\in\mathcal F$ and $B\in\mathcal B_E$.

The map $x\in E\longmapsto P^{\mathcal F}(x,\cdot)$ is called a regular conditional probability distribution of $P$ given $\mathcal F$ (abbreviated by r.c.p.d. of $P$ given $\mathcal F$). The existence of a regular conditional probability distribution is a well-known but non-trivial fact (cf. Theorem 1.1.8 in [104]) about the measure theory of Polish spaces. On the other hand, it is easy to see that any two r.c.p.d.'s of $P$ given $\mathcal F$ can differ only on an $\mathcal F$-measurable, $P$-null set.

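4.4.7 Lemma. Let $E$ be a Polish space and $\mathcal F$ a countably generated sub-$\sigma$-algebra of $\mathcal B_E$. Given $P,Q\in M_1(E)$, let $x\in E\longmapsto P^{\mathcal F}(x,\cdot)$ and $x\in E\longmapsto Q^{\mathcal F}(x,\cdot)$ be, respectively, r.c.p.d.'s of $P$ and $Q$ given $\mathcal F$. Then $x\in E\longmapsto H(Q^{\mathcal F}(x,\cdot)|P^{\mathcal F}(x,\cdot))$ is $\mathcal F$-measurable; and
(4.4.8)
$$H(Q|P)=H\big(Q|_{\mathcal F}\,\big|\,P|_{\mathcal F}\big)+\int_EH\big(Q^{\mathcal F}(x,\cdot)\,\big|\,P^{\mathcal F}(x,\cdot)\big)\,Q(dx),$$
where $P|_{\mathcal F}$ and $Q|_{\mathcal F}$ are the restrictions of $P$ and $Q$ to $\mathcal F$.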
PROOF: First note that since, by Lemma 3.2.13,
we have that $(\nu,\mu)\in(M_1(E))^2\longmapsto H(\nu|\mu)$ is a lower semicontinuous function; and therefore the $\mathcal F$-measurability of
$$x\in E\longmapsto H\big(Q^{\mathcal F}(x,\cdot)\,\big|\,P^{\mathcal F}(x,\cdot)\big)$$
is established. Second, observe that if either side of (4.4.8) is finite, then Q 0 and
d d )E Ed,
then log ( [II(d)ua] ( d d ) )=)crqb(ap),. .. ,u y ' ) , and so
(4(v) = 00. which means that Jn H dd') (a We next suppose that v is shiftinvariant. Let be a r.c.p.d. of Y given Bdl; (4 and note that, by Lemma 4.4.7,
7.1
At the same time, by Lemma 3.2.13,
which obviously dominates
for every $\phi\in C_b(\Sigma^d;\mathbb R)$. But, by shift-invariance, the preceding says that $H(\nu|\nu_{d-1}\otimes_d\Pi)$ dominates
Thus, we have now shown that $J_\Pi^{(d)}(\nu)\le H(\nu|\nu_{d-1}\otimes_d\Pi)$. On the other hand, by Lemma 3.2.13, JENSEN'S inequality, and shift-invariance:
and clearly this completes the proof.
We are now ready to return to the problem, posed at the beginning of this section, of examining the large deviation theory for the $M_1(\Omega)$-valued random variables in (4.4.1). Actually, as we are about to see, we are already quite close to having such a theory. Indeed, by (ii) in Exercise 3.2.22, we can identify $M_1(\Omega)$ as the projective limit of the sequence $\{M_1(\Sigma^d):d\ge2\}$. Furthermore, if $\pi_d$ is the projection map
$$\omega\in\Omega\longmapsto\pi_d(\omega)=(\Sigma_0(\omega),\dots,\Sigma_{d-1}(\omega))\in\Sigma^d,$$
then it is obvious that $\mu_{\sigma,n}^{(d)}$ is the distribution of $\omega\longmapsto R_n(\omega)\circ(\pi_d)^{-1}$ under $P_\sigma$. Hence, just as in Exercise 2.1.21, if we set
(4.4.11)
$$J_\Pi^{(\infty)}(Q)\equiv\sup_{d\ge2}J_\Pi^{(d)}\big(Q\circ(\pi_d)^{-1}\big),\qquad Q\in M_1(\Omega),$$
then we have the following uniform large deviation result as a consequence of Lemma 4.4.5.
4.4.12 Theorem. Assume that (U) holds, and define $J_\Pi^{(\infty)}$ as in (4.4.11). Then $J_\Pi^{(\infty)}$ is a good rate function and
for every $\Gamma\in\mathcal B_{M_1(\Omega)}$. (See (4.4.16) below for more information about $J_\Pi^{(\infty)}$.)
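For $d=2$ and a finite state space, the relative entropy $H(\nu|\nu_{d-1}\otimes_d\Pi)$ that appears in the proof of Lemma 4.4.9 can be computed directly. The following sketch assumes that $\nu$ is given as a matrix of pair probabilities and $\Pi$ as a stochastic matrix, and it returns $+\infty$ when $\nu$ is not shift-invariant (that is, when its two marginals differ), which is how the rate function behaves according to the discussion of Lemma 4.4.9. The helper names and the numbers are hypothetical.

```python
import math

def marginals(nu):
    """First and second marginals of a pair measure nu[i][j]."""
    n = len(nu)
    first = [sum(nu[i][j] for j in range(n)) for i in range(n)]
    second = [sum(nu[i][j] for i in range(n)) for j in range(n)]
    return first, second

def pair_rate(nu, Pi, tol=1e-12):
    """H(nu | nu_1 (x) Pi) if both marginals of nu agree, +infinity otherwise."""
    n = len(nu)
    first, second = marginals(nu)
    if any(abs(a - b) > tol for a, b in zip(first, second)):
        return math.inf                      # nu is not shift-invariant
    total = 0.0
    for i in range(n):
        for j in range(n):
            if nu[i][j] > 0.0:
                reference = first[i] * Pi[i][j]
                if reference == 0.0:
                    return math.inf          # nu is not absolutely continuous
                total += nu[i][j] * math.log(nu[i][j] / reference)
    return total

# Illustrative two-state chain; mu = (0.8, 0.2) is its invariant measure.
Pi = [[0.9, 0.1], [0.4, 0.6]]
mu = [0.8, 0.2]
nu_typical = [[mu[i] * Pi[i][j] for j in range(2)] for i in range(2)]
nu_atypical = [[0.5, 0.1], [0.1, 0.3]]       # shift-invariant, but not mu (x) Pi
nu_not_invariant = [[0.5, 0.3], [0.1, 0.1]]  # marginals (0.8, 0.2) and (0.6, 0.4)

rate_typical = pair_rate(nu_typical, Pi)
rate_atypical = pair_rate(nu_atypical, Pi)
rate_not_invariant = pair_rate(nu_not_invariant, Pi)
```

The pair measure built from the invariant measure and $\Pi$ has rate zero (up to rounding), any other shift-invariant pair measure has a strictly positive rate, and a pair measure with mismatched marginals has infinite rate.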
Although Theorem 4.4.12 in conjunction with Lemma 4.4.9 provides a reasonably satisfactory description of the large deviation theory under consideration, it would be an even better theory if we could find a more direct method of computing $J_\Pi^{(\infty)}(Q)$. Unfortunately, in order to get a nicer expression for $J_\Pi^{(\infty)}$ it will be necessary for us to introduce some additional notation. We will say that $Q\in M_1(\Omega)$ is shift-invariant if $Q=Q\circ\theta_n^{-1}$ for every $n\ge1$, and we will use $M_1^S(\Omega)$ to denote the set of all shift-invariant $Q\in M_1(\Omega)$. Note that, by Lemma 4.4.9, we need only concern ourselves with the computation of $J_\Pi^{(\infty)}(Q)$ for $Q\in M_1^S(\Omega)$ since $J_\Pi^{(\infty)}(Q)=\infty$ for $Q\notin M_1^S(\Omega)$. Next, for $n\in\mathbb Z$, set $\mathbb Z_n=\mathbb Z\cap(-\infty,n]$, $\Omega_n^*=\Sigma^{\mathbb Z_n}$, and use $\Sigma_m(\omega^*)$ to denote the position of $\omega^*\in\Omega_n^*$ at time $m\in\mathbb Z_n$. Given $n\in\mathbb Z$ and $Q\in M_1^S(\Omega)$, one can use the KOLMOGOROV Extension Theorem to show that there is a unique $Q_n^*\in M_1(\Omega_n^*)$ with the property that
$$Q_n^*\big(\{\omega^*\in\Omega_n^*:(\Sigma_{-d+n}(\omega^*),\dots,\Sigma_n(\omega^*))\in\Gamma\}\big)=Q\big(\{\omega\in\Omega:(\Sigma_0(\omega),\dots,\Sigma_d(\omega))\in\Gamma\}\big)$$
for all d 2 1 and I? E Ed+'. Next, given P E Ml(QZ), define P @O II to be the unique element of Ml(f2;) satisfying
for all d 2 1 and $ E B(Cd+2;Fa). Finally, for n E Z and k 5 C 5 t3$; denote the aalgebra over generated by the map w* E
a;

(Ck(W*),
. . ., C,(w*))
E
TI,
let
Pk;
and, for P, P' E Ml(f2;), let HF)(P'IP) denote the relative entropy of the restrictions to Z3r$!n, of measures P and I". It is then an easy matter to see that, for any Q E Ms(f2) and d 2 1, Lemma 4.4.9 becomes the statement that
Hence, (4.4.14)
J ~ ~ ) (=QSUPH~)(Q;IQ; ) gon), Q E M;(~I). dzl
In order to take advantage of the expression in (4.4.14), we will need the following simple continuity result for the relative entropy functional.
4.4.15 Lemma. Let $(E,\mathcal F)$ be a measurable space, and suppose that $\{\mathcal F_n\}_1^\infty$ is a nondecreasing sequence of sub-$\sigma$-algebras of $\mathcal F$ such that $\bigcup_1^\infty\mathcal F_n$ generates $\mathcal F$. Then, for any pair of probability measures $P$ and $Q$ on $(E,\mathcal F)$,
$$H\big(Q|_{\mathcal F_n}\,\big|\,P|_{\mathcal F_n}\big)\nearrow H(Q|P)\quad\text{as }n\to\infty.$$
PROOF: By the argument used to prove Lemma 3.2.13, we know that
where $B(E,\mathcal F_n;\mathbb R)$ denotes the space of bounded, $\mathcal F_n$-measurable $\psi:E\longrightarrow\mathbb R$. At the same time,
Hence, it is clear that $n\longmapsto H(Q|_{\mathcal F_n}|P|_{\mathcal F_n})$ is nondecreasing and that its limit does not exceed $H(Q|P)$. On the other hand, the class of $\psi\in B(E,\mathcal F;\mathbb R)$ for which
is closed under bounded, pointwise convergence and, obviously, contains $B(E,\mathcal F_n;\mathbb R)$ for all $n\ge1$.
Combining Lemma 4.4.15 with (4.4.14), we now see that
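(4.4.16)
$$J_\Pi^{(\infty)}(Q)=\begin{cases}H\big(Q_1^*\,\big|\,Q_0^*\otimes_0\Pi\big)&\text{if }Q\in M_1^S(\Omega)\\\infty&\text{otherwise.}\end{cases}$$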
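The monotone convergence asserted in Lemma 4.4.15 is easy to observe on a finite space. In the sketch below, the space is assumed to be $\{0,\dots,7\}$ and the nondecreasing $\sigma$-algebras are assumed to be generated by dyadic partitions into blocks of size 8, 4, 2, and 1; the relative entropy of the restrictions is then the relative entropy of the block masses. The function name and the two distributions are hypothetical.

```python
import math

def coarse_entropy(Q, P, block_size):
    """Relative entropy of Q with respect to P restricted to blocks of consecutive points."""
    total = 0.0
    for start in range(0, len(Q), block_size):
        q = sum(Q[start:start + block_size])
        p = sum(P[start:start + block_size])
        if q > 0.0:
            total += q * math.log(q / p)
    return total

# Illustrative distributions on 8 points (assumptions, not from the text).
P = [0.125] * 8
Q = [0.30, 0.05, 0.10, 0.05, 0.20, 0.10, 0.15, 0.05]

# Blocks of size 8, 4, 2, 1 correspond to finer and finer sigma-algebras;
# the last entry is the full relative entropy H(Q|P).
values = [coarse_entropy(Q, P, size) for size in (8, 4, 2, 1)]
```

Each refinement can only increase the value, and the finest partition recovers $H(Q|P)$ itself; in the lemma the same thing happens in the limit.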
When (4.4.16) is put together with (4.4.12), we obtain a version of the process level large deviation result proved originally by DONSKER and VARADHAN [36].
Having dealt with the discrete time setting, we now want to see whether we cannot prove the analogous result in the continuous-time context. However, before we can do so, we must arrange that the sample space $\Omega$ at the beginning of Section 4.2 be itself amenable to a Polish structure. For this reason, we will assume that $\Omega$ is the space of right-continuous paths $\omega:[0,\infty)\longrightarrow\Sigma$ which have a left limit at each $t\in(0,\infty)$. Next, define $\Omega_T=D([0,T];\Sigma)$ for $T\in(0,\infty)$ to be the SKOROKHOD space of right-continuous $\omega_T:[0,T]\longrightarrow\Sigma$ which have a left limit at each $t\in(0,T]$ and are left-continuous at $T$, and endow $\Omega_T$ with the SKOROKHOD topology. That is, we give $\Omega_T$ the topology induced by the metric


for $\omega_T,\omega_T'\in\Omega_T$, where $\lambda$ runs over all increasing homeomorphisms of $[0,T]$ onto itself. The following facts about the SKOROKHOD topology on $\Omega_T$ are standard and will be important for our development below.
(i) The SKOROKHOD topology on $\Omega_T$ is Polish in the sense that it is separable and admits a complete metric.
(ii) The $\sigma$-algebra over $\Omega_T$ generated by the maps $\omega_T\in\Omega_T\longmapsto\omega_T(t)\in\Sigma$, $t\in[0,T]$, coincides with the BOREL field $\mathcal B_{\Omega_T}$.
Next, for $0<T_1<T_2<\infty$, define $\pi_{T_1}^{(T_2)}:\Omega_{T_2}\longrightarrow\Omega_{T_1}$ so that
Unfortunately, although these natural restriction maps are, by (ii), measurable, they are not continuous. Thus, it is not possible for us to simply define the topology on R as the projective limit of the topologies on the RT’s. However, we will postpone the consideration of this technicality until later on in our development and will, for now content ourselves with the introduction of the projective limit measurable structure on R; which, according to (ii), is the one induced by the position maps w ER C t ( w ) , t E [0, 00). Thus, we will use Bt,t E [O, oo),to stand for the aalgebra over R generated by w E R C,(w), s E [ O , t ] , and we will use B to denote the smallest aalgebra over R containing Obviously, ( t ,w ) E [o, 00) x R c,(w) E c is {ot: t E [o, oo)}Gogressively measurable. In addition, for each T E (0,00), the map TT : R RT defined bv



is a measurable surjection; and, in fact, $\sigma\bigl(\bigcup_{t \in [0,T)} \mathcal{B}_t\bigr) = \pi_T^{-1}(\mathcal{B}_{\Omega_T})$.
4.4.17 Warning.
Throughout the rest of this section, $\Omega$ will be the path space just described, and we will be assuming that the transition probability function $P(t,\sigma,\cdot)$ permits us to realize the corresponding MARKOV family $\{P_\sigma : \sigma \in \Sigma\}$ on $(\Omega,\mathcal{B})$ with $\Sigma_t(\omega)$ being the position of $\omega \in \Omega$ at time $t \in [0,\infty)$. Furthermore, we will be assuming that
(4.4.18) $P_\sigma\bigl(\{\omega : \Sigma_t(\omega) = \lim_{s \nearrow t} \Sigma_s(\omega)\}\bigr) = 1, \quad (t,\sigma) \in (0,\infty) \times \Sigma.$
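In what follows, it will be convenient to have a notation for the "splice" of two paths. For this reason, if $T \in (0,\infty)$, $\omega_T \in \Omega_T$, and $\omega' \in \Omega$, define $\bar\omega_T \in \Omega$ by $\bar\omega_T(t) = \omega_T(t \wedge T)$, $t \in [0,\infty)$, and $\omega_T \otimes_T \omega' \in \Omega$ so that $\omega_T \otimes_T \omega' = \bar\omega_T$ if $\Sigma_0(\omega') \ne \omega_T(T)$ and
$$(\omega_T \otimes_T \omega')(t) = \begin{cases} \omega_T(t) & \text{if } t \in [0,T] \\ \omega'(t-T) & \text{if } t \in (T,\infty) \end{cases}$$
if $\Sigma_0(\omega') = \omega_T(T)$. Observe that the map $(\omega_T,\omega') \in \Omega_T \times \Omega \mapsto \omega_T \otimes_T \omega' \in \Omega$ is measurable. When $\omega, \omega' \in \Omega$, set $\omega \otimes_T \omega' = (\pi_T \omega) \otimes_T \omega'$. Finally, for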
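for $\Phi \in B((\Omega,\mathcal{B});\mathbb{R})$. We will also need the time-shift semigroup $\{\theta_t : t \ge 0\}$ on $\Omega$. Namely, for $t \in [0,\infty)$, define the time-shift $\theta_t : \Omega \to \Omega$ by $\Sigma_s(\theta_t \omega) = \Sigma_{s+t}(\omega)$, $s \in [0,\infty)$; and note that $(t,\omega) \in [0,\infty) \times \Omega \mapsto \theta_t \omega \in \Omega$ is measurable.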
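The splice and the time-shift are easy to experiment with once paths are modelled as functions of time. The following is only a sketch under stated assumptions: states are real numbers rather than points of a Polish space, the helper names `stop_at`, `splice` and `shift` are ours, and in the matched case the splice is assumed to follow the first path up to time $T$ and then the second path restarted at $T$.

```python
from typing import Callable

Path = Callable[[float], float]  # a path t -> state; real-valued states for simplicity


def stop_at(omega_T: Path, T: float) -> Path:
    """Extend a path on [0, T] to [0, inf) by freezing it at time T."""
    return lambda t: omega_T(min(t, T))


def splice(omega_T: Path, T: float, omega_prime: Path) -> Path:
    """Splice omega_prime onto omega_T at time T.

    Mismatched endpoints: return the stopped path.
    Matched endpoints (assumed convention): follow omega_T on [0, T],
    then omega_prime restarted at time T.
    """
    if omega_prime(0.0) != omega_T(T):
        return stop_at(omega_T, T)
    return lambda t: omega_T(t) if t <= T else omega_prime(t - T)


def shift(omega: Path, t: float) -> Path:
    """Time shift: the position of shift(omega, t) at time s is omega(s + t)."""
    return lambda s: omega(s + t)


# Hypothetical example paths: a ramp on [0, 2] and a steeper ramp starting at 2.
ramp = lambda t: t
steep = lambda t: 2.0 + 3.0 * t
spliced = splice(ramp, 2.0, steep)
```

Here `spliced` follows the ramp until time 2 and the steeper ramp afterwards, while `splice(ramp, 2.0, ramp)` falls back to the stopped path because the second path does not start where the first one ends.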


With these preliminaries taken care of, we can begin to formulate the problem which we want to study. To this end, let $\omega \in \Omega \mapsto R_t(\omega) \in M_1((\Omega,\mathcal{B}))$ (the probability measures on the measurable space $(\Omega,\mathcal{B})$) be the map given by
(4.4.19) $R_t(\omega) = \int_{[0,t]} \delta_{\theta_s \omega} \, \lambda_{[0,t]}(ds),$
where $\lambda_{[0,t]}$ denotes normalized LEBESGUE measure on $[0,t]$. (Cf. the comment in Remark 4.2.2 following the definition of $L_t(\omega)$ for the appropriate expression when the paths are regular.) What we want to do is analyze the large deviations of $\{R_t : t > 0\}$ under the measures $P_\sigma$. Of course, as yet, we do not even have a topological structure on $M_1((\Omega,\mathcal{B}))$ and therefore are not really in a position to carry out such an analysis. Nonetheless, just as in the MARKOV chain case, our analysis will actually be accomplished at the level of the finite time marginals of the $R_t$'s; and this analysis we are ready to do. Unfortunately, although the ideas here are just as intrinsically simple as the ones in the discrete-time setting, technicalities introduced by the continuity of time tend to make them appear more complicated than they really are. Given $T \in (0,\infty)$ and $\omega_T \in \Omega_T$, define

It is a relatively easy matter to check that $\omega_T \in \Omega_T \mapsto P^{(T)}_{\omega_T} \in M_1((\Omega,\mathcal{B}))$ is measurable. What is less obvious is that $\{P^{(T)}_{\omega_T} : \omega_T \in \Omega_T\}$ satisfies the MARKOV property described in the following.
4.4.21 Lemma. For each $T \in (0,\infty)$, $\omega_T \in \Omega_T$, and $\omega' \in \Omega$,
In addition, for each $T \in (0,\infty)$, $\omega_T \in \Omega_T$, $s \in [0,\infty)$, and $A \in \mathcal{B}_{s+T}$,
S, (esw/> Q
P~T)(~~I)
(4.4.22)
for every $\Phi \in B((\Omega,\mathcal{B});\mathbb{R})$.
PROOF:To prove the first assertion, note that d i s t ( q , V ( @ t ( W@T w ‘ ) ) ) 5 dist ( W T , r T ( O t Z T ) ) dist (T~+T;T, T
+
~ + T ( ~ @T T
w’))
To prove (4.4.22), set
$A_{\omega_T} = \{\omega' : \omega_T \otimes_T \omega' \in A\},$
and first suppose that $s \le T$. Then $\theta_s(\omega_T \otimes_T \omega') = (\theta_s \bar\omega_T) \otimes_{T-s} \omega'$, and so, by the MARKOV property for the $P_\sigma$'s,

since, by (4.4.18), $\Sigma_s(\omega') = [\pi_T(\theta_s(\omega_T \otimes_T \omega'))](T)$ for $P_{\omega_T(T)}$-almost every $\omega' \in \Omega$. When $s > T$, a similar argument, based on the identities

for $\omega' \in \Omega$ with $\Sigma_0(\omega') = \omega_T(T)$, yields the desired result. ∎
For $T \in (0,\infty)$ define $(t,\omega_T) \in (0,\infty) \times \Omega_T \mapsto P^{(T)}(t,\omega_T,\cdot) \in M_1(\Omega_T)$ so that
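Then, by (4.4.22), $P^{(T)}(t,\omega_T,\cdot)$ is a transition probability function on $\Omega_T$. In addition, by the first part of Lemma 4.4.21, $P^{(T)}(t,\omega_T,\cdot) \Rightarrow \delta_{\omega_T}$ as $t \searrow 0$. Finally, if $\mathcal{B}^{(T)}_t = \mathcal{B}_{t+T}$, $t \in [0,\infty)$, then the map
$(t,\omega) \in [0,\infty) \times \Omega \mapsto \Sigma^{(T)}_t(\omega) \equiv \pi_T(\theta_t \omega) \in \Omega_T$
is $\{\mathcal{B}^{(T)}_t : t \in [0,\infty)\}$-progressively measurable and
$P^{(T)}_{\omega_T}\bigl(\{\omega' : \Sigma^{(T)}_{s+t}(\omega') \in \Gamma\} \,\big|\, \mathcal{B}^{(T)}_s\bigr) = P^{(T)}\bigl(t, \Sigma^{(T)}_s, \Gamma\bigr)$
(a.e., $P^{(T)}_{\omega_T}$) for all $s, t \in [0,\infty)$ and every $\Gamma \in \mathcal{B}_{\Omega_T}$. Thus, we are in the situation treated in Section 4.2. In fact, if the original transition probability function $P(t,\sigma,\cdot)$ on $\Sigma$ satisfies $(\tilde{\mathbf{U}})$, then it is clear that
$\int P^{(T)}(t+T,\omega_T,\cdot)\,\rho_1(dt) \le M \int P^{(T)}(t+T,\omega_T',\cdot)\,\rho_2(dt), \quad \omega_T, \omega_T' \in \Omega_T.$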
Thus (cf. Remark 4.2.8 as well as Exercise 4.2.61), the following statement is just an application of the results proved in Section 4.2.
4.4.23 Lemma. Assume that $P(t,\sigma,\cdot)$ satisfies $(\tilde{\mathbf{U}})$, let $T \in (0,\infty)$ be given, and define $J^{(T)}_P = J_{P^{(T)}} : M_1(\Omega_T) \to [0,\infty]$ in terms of $P^{(T)}(t,\omega_T,\cdot)$ in the same way as $J_P$ is defined by (4.2.36) in terms of $P(t,\sigma,\cdot)$. Then: the level sets of $J^{(T)}_P$ are strongly compact;
(4.4.24) $J^{(T)}_P(\nu) = \sup_{h>0} \frac{1}{h} J_{\Pi^{(T)}_h}(\nu) = \lim_{h \searrow 0} \frac{1}{h} J_{\Pi^{(T)}_h}(\nu), \quad \nu \in M_1(\Omega_T),$
where $\Pi^{(T)}_h(\omega_T,\cdot) = P^{(T)}(h,\omega_T,\cdot)$ and $J_{\Pi^{(T)}_h}$ is defined accordingly as in (4.1.38); and

and $\lambda_{[0,t]}$ denotes normalized LEBESGUE measure on $[0,t]$. The reason for our choosing to state (4.4.25) relative to the strong topology on $M_1(\Omega_T)$ will become clear as we develop the theory for unbounded time intervals.

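Noting that the distribution of $\omega' \in \Omega \mapsto (R_t(\omega')) \circ \pi_T^{-1} \in M_1(\Omega_T)$ under $P_{\omega_T(T)}$ coincides with that of $\omega' \in \Omega \mapsto L^{(T)}_t(\omega') \in M_1(\Omega_T)$ under $P^{(T)}_{\omega_T}$, we see that, as long as $\sigma = \omega_T(T)$,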
Moreover, having stated Lemma 4.4.23 in terms of the strong topology, we can circumvent the technical objection (raised after our initial discussion of the SKOROKHOD topology) to putting an inductive limit topology on $\Omega$. That is, we will entirely avoid putting a topology on $\Omega$ itself and will, instead, go directly to the projective limit strong topology on $M_1((\Omega,\mathcal{B}))$. To be precise, we consider the topology on $M_1((\Omega,\mathcal{B}))$ for which the sets
(4.4.26)
form a neighborhood basis at $Q$ as $\Phi$ runs over the bounded functions which are $\mathcal{B}_T$-measurable for some $T \in [0,\infty)$. Clearly this is the projective limit, under the maps
$Q \in M_1((\Omega,\mathcal{B})) \mapsto Q \circ \pi_T^{-1} \in M_1(\Omega_T),$
of the strong topology on the spaces $M_1(\Omega_T)$. In particular, $K \subset\subset M_1((\Omega,\mathcal{B}))$ if and only if
$K = \bigcap_{d=1}^{\infty} \{Q : Q \circ \pi_d^{-1} \in K_d\},$
where $K_d$ is a strongly compact subset of $M_1(\Omega_d)$ for each $d \in \mathbb{Z}^+$. Finally, we will say that $\Gamma \subseteq M_1((\Omega,\mathcal{B}))$ is measurable if it is an element of the $\sigma$-algebra over $M_1((\Omega,\mathcal{B}))$ generated by the sets in (4.4.26). In case there is any doubt about it, we point out that this notion of measurability will, in most cases, be much more restrictive than the one determined by the BOREL structure associated with the projective limit strong topology. Having made these preparations, we can now prove the following large deviation principle.
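4.4.27 Theorem. Assume that $P(t,\sigma,\cdot)$ satisfies $(\tilde{\mathbf{U}})$ and define the function $J^{(\infty)}_P : M_1((\Omega,\mathcal{B})) \to [0,\infty]$ by
(4.4.28) $J^{(\infty)}_P(Q) = \sup\bigl\{J^{(T)}_P(Q \circ \pi_T^{-1}) : T \in (0,\infty)\bigr\}$
for each $Q \in M_1((\Omega,\mathcal{B}))$. (See (4.4.38) below for more information about $J^{(\infty)}_P$.) Then the level sets of $J^{(\infty)}_P$ are compact; and, for every measurable $\Gamma \subseteq M_1((\Omega,\mathcal{B}))$,
(4.4.29) $-\inf_{\Gamma^\circ} J^{(\infty)}_P \le \varliminf_{t \to \infty} \frac{1}{t} \log\Bigl(\inf_{\sigma \in \Sigma} P_\sigma(\{\omega : R_t(\omega) \in \Gamma\})\Bigr) \le \varlimsup_{t \to \infty} \frac{1}{t} \log\Bigl(\sup_{\sigma \in \Sigma} P_\sigma(\{\omega : R_t(\omega) \in \Gamma\})\Bigr) \le -\inf_{\bar\Gamma} J^{(\infty)}_P.$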
PROOF: In view of (4.4.26) and (4.4.25), the only part of this statement which requires comment is the last inequality in (4.4.29). The main difficulty in the proof stems from the fact that the strong topology on $M_1(\Omega_T)$ is not first countable. In particular, it is not immediately clear whether the strongly compact sets $\{\nu \in M_1(\Omega_T) : J^{(T)}_P(\nu) \le L\}$ are sequentially compact in the strong topology. To see that they are, let $\{\nu_n\}_1^\infty \subseteq M_1(\Omega_T)$ satisfying $J^{(T)}_P(\nu_n) \le L < \infty$ for all $n \in \mathbb{Z}^+$ be given. Then, because $\{\nu \in M_1(\Omega_T) : J^{(T)}_P(\nu) \le L\}$ is weakly compact, we can choose a subsequence $\{\nu_{n'}\}$ which converges weakly to some $\nu \in M_1(\Omega_T)$. But $\{\nu \in M_1(\Omega_T) : J^{(T)}_P(\nu) \le L\}$ is also strongly compact, and therefore $\nu_{n'} \to \nu$ in the strong topology. Once one has the preceding, it becomes an easy matter to check that if

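$\{Q_d\}_{d=1}^\infty \subseteq M_1((\Omega,\mathcal{B}))$ and $\sup_{d \ge 1} J^{(d)}_P(Q_d \circ \pi_d^{-1}) < \infty,$
then there is a subsequence $\{Q_{d'}\}$ which converges in $M_1((\Omega,\mathcal{B}))$. Now suppose that $\Gamma$ is a measurable subset of $M_1((\Omega,\mathcal{B}))$ and that $F \supseteq \Gamma$ is a closed subset of $M_1((\Omega,\mathcal{B}))$. Choose $\{F_d\}_{d=1}^\infty$ so that $F = \bigcap_{d=1}^\infty \{Q : Q \circ \pi_d^{-1} \in F_d\}$ and each $F_d$ is a strongly closed set in $M_1(\Omega_d)$. We then know, from (4.4.25), that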
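$\varlimsup_{t \to \infty} \frac{1}{t} \log\Bigl(\sup_{\sigma \in \Sigma} P_\sigma(\{\omega : R_t(\omega) \in \Gamma\})\Bigr) \le -\sup_{d \in \mathbb{Z}^+} \inf_{F_d} J^{(d)}_P.$
Next, suppose that $C = \sup_{d \in \mathbb{Z}^+} \inf_{F_d} J^{(d)}_P < \infty$. Then we can choose $\{Q_d\}_{d=1}^\infty \subseteq F$ so that $J^{(d)}_P(Q_d \circ \pi_d^{-1}) = \inf_{F_d} J^{(d)}_P \le C$. Thus, by the preceding paragraph, we can find a subsequence $\{Q_{d'}\}$ and a $Q \in M_1((\Omega,\mathcal{B}))$ so that $Q_{d'} \to Q$ in $M_1((\Omega,\mathcal{B}))$. Since $F$ is closed, $Q \in F$, and clearly $J^{(\infty)}_P(Q) \le C$. ∎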
Once again, we want to develop a better expression for our rate function. Our development will turn on (4.4.24) combined with the sort of reasoning with which we solved the analogous problem in the case of MARKOV chains. In what follows, it will be handy to have some more notation. In the first place, if $\nu \in M_1(\Omega_T)$ for some $T \in (0,\infty)$ or if $Q \in M_1((\Omega,\mathcal{B}))$, then we will use $\nu_t$, $t \in [0,T]$, or $Q_t$, $t \in [0,\infty)$, to denote $\nu \circ (\pi^{(T)}_t)^{-1}$ or $Q \circ \pi_t^{-1}$, respectively. Secondly, for $T \in (0,\infty)$ and $h \in [0,T]$, define $\theta^{(T)}_h : \Omega_T \to \Omega_T$ by $\theta^{(T)}_h \omega_T = \pi_T(\theta_h \bar\omega_T)$ and say that $\nu \in M_1(\Omega_T)$ is shift-invariant if $[\nu \circ (\theta^{(T)}_h)^{-1}]_{T-h} = \nu_{T-h}$ for all $h \in [0,T]$. Similarly, we say that $Q \in M_1((\Omega,\mathcal{B}))$ is shift-invariant if $Q \circ \theta_h^{-1} = Q$ for every $h \in [0,\infty)$; and we will use $M_1^S(\Omega_T)$ and $M_1^S((\Omega,\mathcal{B}))$ to denote the set of shift-invariant elements of $M_1(\Omega_T)$ and $M_1((\Omega,\mathcal{B}))$, respectively.
4.4.30 Lemma. $Q \in M_1((\Omega,\mathcal{B}))$ is shift-invariant if and only if $Q_T \in M_1^S(\Omega_T)$ for every $T \in (0,\infty)$. Moreover, if $\nu \in M_1^S(\Omega_T)$, then

In particular, if $Q \in M_1^S((\Omega,\mathcal{B}))$, then, for each $t \in (0,\infty)$, $\Sigma_t(\omega) = \lim_{s \nearrow t} \Sigma_s(\omega)$ for $Q$-almost every $\omega \in \Omega$.
PROOF: Obviously it suffices to prove the second assertion. To this end, define

Because each $\omega_T$ has at most countably many discontinuities,

where $\lambda_{[0,T]}$ denotes normalized LEBESGUE measure on $[0,T]$. On the other hand, if $\nu \in M_1^S(\Omega_T)$, then

is independent of $t \in (0,T]$; and therefore, by FUBINI's Theorem, we get the desired result. ∎
For $T \in [0,\infty)$ and $\nu \in M_1(\Omega_T)$, we set
(4.4.31)
and when $Q \in M_1((\Omega,\mathcal{B}))$, we use $Q \otimes_T P_\cdot$ instead of $Q_T \otimes_T P_\cdot$. Note that $\nu \otimes_T P_\cdot$ is the unique $Q \in M_1((\Omega,\mathcal{B}))$ with the properties that $Q_T = \nu$ and $\omega \in \Omega \mapsto P^{(T)}_{\pi_T \omega}$ is a r.c.p.d. of $Q$ given $\pi_T^{-1}(\mathcal{B}_{\Omega_T})$. In addition, by (4.4.18) and the MARKOV property for $\{P_\sigma : \sigma \in \Sigma\}$, one can easily check that

for $0 \le T_1 < T_2 < \infty$ and $\nu \in M_1(\Omega_{T_1})$.
4.4.33 Lemma. Let $T \in (0,\infty)$ and $\nu \in M_1(\Omega_T)$ be given. If $\nu$ is not shift-invariant, then

On the other hand, if $\nu \in M_1^S(\Omega_T)$, then, for $h \in (0,T)$,
(4.4.34)
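where $\omega_T \in \Omega_T \mapsto \nu^{(T-h)}(\omega_T,\cdot)$ is a r.c.p.d. of $\nu$ given $(\pi^{(T)}_{T-h})^{-1}(\mathcal{B}_{\Omega_{T-h}})$. In particular, if $\nu \in M_1^S(\Omega_T)$ and $s, t > 0$ satisfy $s + t < T$, then

PROOF: First note that

for any $\psi \in B(\Omega_T;\mathbb{R})$. In particular, if $\psi$ is $(\pi^{(T)}_{T-h})^{-1}(\mathcal{B}_{\Omega_{T-h}})$-measurable, then $[\Pi^{(T)}_h \psi] = \psi \circ \theta^{(T)}_h$; and if $\nu \in M_1^S(\Omega_T)$, then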
With these preliminaries, the argument used to prove Lemma 4.4.9 can be easily adapted to prove the first assertion of the present lemma as well as (4.4.34). Finally, by combining Lemma 4.4.7 with (4.4.32), we see that

Thus, if $\nu$ is shift-invariant, then (4.4.35) follows from (4.4.34). ∎
As was the case in the MARKOV chain setting, in order to complete our program it will be convenient to move our measures to the left half-line. Thus, for $T \in [0,\infty)$, let $\Omega_T^-$ be the space of right-continuous paths $\omega_T^- : (-\infty,T] \to \Sigma$ which have a left limit at each $t \in (-\infty,T]$ and are left-continuous at $T$. For $-\infty < s \le t \le T < \infty$, denote by $\mathcal{B}^{(T)}_{[s,t]}$ the $\sigma$-algebra over $\Omega_T^-$ generated by the maps $\omega_T^- \in \Omega_T^- \mapsto \omega_T^-(\tau) \in \Sigma$ for $\tau \in [s,t]$; and use $\mathcal{B}^{(T)}$ to stand for the smallest $\sigma$-algebra over $\Omega_T^-$ which contains $\mathcal{B}^{(T)}_{[s,T]}$ for all $s \in (-\infty,T]$.


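4.4.36 Lemma. Let $Q \in M_1^S((\Omega,\mathcal{B}))$ be given. Then, for every $T \in [0,\infty)$, there is a unique $Q_T^- \in M_1((\Omega_T^-,\mathcal{B}^{(T)}))$ with the property that

for every $n \in \mathbb{Z}^+$, $-\infty < t_1 < \cdots < t_n \le T$, and $\Gamma \in \mathcal{B}_{\Sigma^n}$.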
PROOF: The uniqueness assertion is obvious; and clearly it suffices to prove existence in the case when $T = 0$. For $d \in \mathbb{Z}^+$, let $\Omega_{[-d,0]}$ be the space of right-continuous paths $\omega_{[-d,0]} : [-d,0] \to \Sigma$ which have a left limit at each $t \in [-d,0]$ and are continuous at each $t \in [-d,0]$ for which $t \in \mathbb{Z}$. Then (cf. Exercise 4.4.40 below), $\Omega_{[-d,0]}$ becomes a Polish space when it is given the topology determined by the SKOROKHOD metric in which the homeomorphisms $\lambda : [-d,0] \to [-d,0]$ have the property that $\lambda(t) = t$ for every $t \in [-d,0] \cap \mathbb{Z}$. Also, it is then easy to see that the natural restriction mapping taking $\Omega_{[-d-1,0]}$ onto $\Omega_{[-d,0]}$ is continuous for each $d \in \mathbb{Z}^+$; and, clearly, the projective limit of $\{\Omega_{[-d,0]} : d \in \mathbb{Z}^+\}$ can be identified with the space $\Omega_{(-\infty,0]}$ consisting of those paths $\omega_0^- \in \Omega_0^-$ which are continuous at $-n$ for every $n \in \mathbb{N}$.


for all $n \in \mathbb{Z}^+$, $0 \le t_1 < \cdots < t_n \le d$, and $\Gamma \in \mathcal{B}_{\Sigma^n}$. Moreover, the family $\{Q_{[-d,0]} : d \in \mathbb{Z}^+\}$ is consistently defined on the spaces $\{\Omega_{[-d,0]} : d \in \mathbb{Z}^+\}$. Hence, by KOLMOGOROV's Extension Theorem, there is a unique $Q_0^- \in M_1((\Omega_0^-,\mathcal{B}^{(0)}))$ which extends all the $Q_{[-d,0]}$'s; and clearly this is the measure which we were seeking. ∎
Given $T \in [0,\infty)$, $\omega_T \in \Omega_T$, and $\omega_0^- \in \Omega_0^-$, define $\omega_0^- \otimes_0 \omega_T \in \Omega_T^-$ so that $(\omega_0^- \otimes_0 \omega_T)(t) = \omega_0^-(t \wedge 0)$ if $\omega_0^-(0) \ne \omega_T(0)$ and

if $\omega_0^-(0) = \omega_T(0)$. It is then an easy matter to check that

is measurable. Thus, for $Q \in M_1^S((\Omega,\mathcal{B}))$ and $T \in [0,\infty)$, we can determine $(Q_0^- \otimes_0 P_\cdot)_T \in M_1((\Omega_T^-,\mathcal{B}^{(T)}))$ by

for all $\Gamma \in \mathcal{B}^{(T)}$. Finally, for $T \in [0,\infty)$, $s \in (-\infty,T]$, and $\mu_T^-, \nu_T^- \in M_1((\Omega_T^-,\mathcal{B}^{(T)}))$, we will set

After one reconciles the notation just introduced with our earlier notation, one finds that (4.4.34) says that, for all $0 < h < T$,

and, as we are about to see, (4.4.37) is the key to the last step in our identification of $J^{(\infty)}_P$.
4.4.38 Theorem. Let $Q \in M_1((\Omega,\mathcal{B}))$ be given. Then, for any $h > 0$,

PROOF: If $Q \notin M_1^S((\Omega,\mathcal{B}))$, then, by Lemma 4.4.30 and Lemma 4.4.33, $J^{(\infty)}_P(Q) = \infty$. Thus, we will now assume that $Q \in M_1^S((\Omega,\mathcal{B}))$. Set $f(h,T) = J_{\Pi^{(T)}_h}(Q_T)$ for $0 < h < T < \infty$. Then, $f(h,\cdot)$ is non-decreasing on $(h,\infty)$ for each $h \in (0,\infty)$; and, by (4.4.35), $f(s+t,T) = f(t,T-s) + f(s,T)$ as long as $s + t < T$. In particular, if $h \in (0,\infty)$ and $T \in (1,\infty)$ and $n \in \mathbb{Z}^+$, then by induction on $0 \le \ell \le n$:
k=O
and so nf(k,T)
2f(h,T)Lnf
;,Tl (h
),
TE(2,00)andn~Z+.
Consequently,
for every n E Z+; and therefore, by (4.4.24),
and clearly the desired result now follows immediately from (4.4.37) and Lemma 4.4.7. ∎
In conjunction with Theorem 4.4.38, Theorem 4.4.27 becomes a version of DONSKER and VARADHAN's result on this subject [36].
4.4.40 Exercise.
Working with the SKOROKHOD topology is notoriously unpleasant; and, in order not to burden the presentation with even more technicalities, we have swept some annoying details under the rug. What follows is a selection of some points which we have used without proof.
(i) Show that, for each $T \in (0,\infty)$ and $t \in [0,T]$, the map $\omega_T \in \Omega_T \mapsto \omega_T(t) \in \Sigma$ is $\mathcal{B}_{\Omega_T}$-measurable. This fact, which is well-known when $\Sigma = \mathbb{R}$, can be proved for general $\Sigma$'s by using the fact that every Polish space may be continuously embedded as a $G_\delta$ in $[0,1]^{\mathbb{Z}^+}$ and applying the $\Sigma = \mathbb{R}$ result to each of the coordinates of the embedding.
(ii) In the proof of Lemma 4.4.36, we tacitly used the fact that if $d \in \mathbb{Z}^+$ and we define the SKOROKHOD distance $\mathrm{dist}(\omega_{[-d,0]}, \omega'_{[-d,0]})$ between paths $\omega_{[-d,0]}, \omega'_{[-d,0]} \in \Omega_{[-d,0]}$ by

where $\lambda$ runs over increasing homeomorphisms of $[-d,0]$ satisfying $\lambda(t) = t$ for $t \in [-d,0] \cap \mathbb{Z}$, then the resulting metric makes $\Omega_{[-d,0]}$ into a Polish space and the natural restriction maps from $\Omega_{[-d-1,0]}$ onto $\Omega_{[-d,0]}$ continuous. Check this fact.
4.4.41 Exercise.
A remarkable dividend of looking at large deviations at the level of processes is that the rate functions $J^{(\infty)}_\Pi$ and $J^{(\infty)}_P$ have the pleasing property that they are affine on the space of shift-invariant probability measures. (As we will see in Section 5.3 below, this fact can be made to play an extremely important role in the derivation of process-level large deviation results.) In this exercise, we outline a simple way to see this fact for $J^{(\infty)}_P$; an analogous approach leads to the same fact for $J^{(\infty)}_\Pi$. What we want to show is that, for $Q, Q' \in M_1^S(\Omega)$,
(4.4.42) $J^{(\infty)}_P(\alpha Q + (1-\alpha) Q') = \alpha J^{(\infty)}_P(Q) + (1-\alpha) J^{(\infty)}_P(Q'), \quad \alpha \in (0,1).$
Since we already know that $J^{(\infty)}_P$ is convex, all that we need to do is check that the right hand side of (4.4.42) is dominated by the left. The first step will be to develop yet another expression (cf. (4.4.43) below) for $J^{(\infty)}_P$.
(i) Given $\nu \in M_1(\Sigma)$, set $P_\nu = \int_\Sigma P_\sigma \, \nu(d\sigma)$. Using (4.4.8) and (4.4.34), show that for any $Q \in M_1^S(\Omega)$, $\nu \in M_1(\Sigma)$, and $T \in [0,\infty)$:
$H(Q_{T+h} \,|\, (P_\nu)_{T+h}) = H(Q_T \,|\, (P_\nu)_T) + J_{\Pi^{(T+h)}_h}(Q_{T+h}), \quad h \in (0,\infty).$
Starting from the preceding and using (4.4.39), conclude that
(4.4.43)
(ii) To complete the proof of (4.4.42), prove that
$(\alpha a + (1-\alpha) b) \log(\alpha a + (1-\alpha) b) \ge \alpha a \log a + (1-\alpha) b \log b - \frac{|b-a|}{e}$
for every $\alpha \in (0,1)$ and all $a, b \in [0,\infty)$. Now suppose that $Q, Q' \in M_1^S(\Omega)$ and $\alpha \in (0,1)$ are given, set $\nu = \alpha Q_0 + (1-\alpha) Q_0'$, and use the preceding together with (4.4.43) to conclude that
$J^{(\infty)}_P(\alpha Q + (1-\alpha) Q') \ge \alpha J^{(\infty)}_P(Q) + (1-\alpha) J^{(\infty)}_P(Q').$
(iii) The equation (4.4.43) is interesting in its own right. Indeed, it expresses $J^{(\infty)}_P(Q)$ as a specific relative entropy. This expression becomes particularly interesting in the case when one knows (as one does if $P(t,\sigma,\cdot)$ satisfies $(\tilde{\mathbf{U}})$) a priori that there is a $\{P_t : t > 0\}$-invariant $\mu \in M_1(\Sigma)$ with the property that $H(Q_0 \,|\, \mu) < \infty$ for every $Q \in M_1^S(\Omega)$ with $J^{(\infty)}_P(Q) < \infty$. Indeed, show that, in this case, one can replace (4.4.43) by
(4.4.44)
4.4.45 Exercise.
Let $\Pi$ be a transition probability on $\Sigma$, and define $J_\Pi : M_1(\Sigma) \to [0,\infty]$ accordingly (as in (4.1.38)). Also, for given $\nu \in M_1(\Sigma)$, let $M_1^{(\nu)}(\Sigma^2)$ denote the space of $\mu \in M_1(\Sigma^2)$ with the property that $\mu \circ \pi_1^{-1} = \nu = \mu \circ \pi_2^{-1}$, where $\pi_i$, $i \in \{1,2\}$, is used here to denote the $i$th projection from $\Sigma^2$ into $\Sigma$.
(i) Assume that $\Pi$ satisfies the condition (U) of Section 4.1, and use the results in this section together with those in Section 4.1 to prove the equality
(4.4.46) $J_\Pi(\nu) = \inf\bigl\{H(\mu \,|\, \nu \otimes_2 \Pi) : \mu \in M_1^{(\nu)}(\Sigma^2)\bigr\}$
as an application of the last part of Lemma 2.1.4. Conclude, in particular, that if $J_\Pi(\nu) < \infty$, then there must exist a $\mu \in M_1^{(\nu)}(\Sigma^2)$ such that $J_\Pi(\nu) = H(\mu \,|\, \nu \otimes_2 \Pi)$.
(ii) Half of (4.4.46) is trivial and depends in no way on the condition (U). Namely, to see that the left hand side of (4.4.46) is always dominated by the right, check directly from the definitions of $J_\Pi$ and $J^{(2)}_\Pi$ (cf. (4.4.4)) that $J_\Pi(\nu) \le J^{(2)}_\Pi(\mu)$ for every $\mu \in M_1^{(\nu)}(\Sigma^2)$, and then apply Lemma 4.4.9.
(iii) Even when $\Pi$ satisfies (U), a direct proof that the left hand side of (4.4.46) dominates the right is not so easy. Thus, all that we will attempt to do here is explain how the existence of a $\mu \in M_1^{(\nu)}(\Sigma^2)$ satisfying $J_\Pi(\nu) = H(\mu \,|\, \nu \otimes_2 \Pi)$ is related to the functions $u \in B(\Sigma;[1,\infty))$ in terms of which $J_\Pi(\nu)$ is defined. Given a $u \in B(\Sigma;[1,\infty))$, consider the transition probability defined by
(Note that, in the notation of Section 4.1, the II, above would have been denoted there by IIv with V = log &.)Next, define p, = v 82 II,, check that
and conclude that
(4.4.47) $J_\Pi(\nu) = -\int_\Sigma \log\frac{\Pi u}{u} \, d\nu = H(\mu_u \,|\, \nu \otimes_2 \Pi)$
for $\mu_u \in M_1^{(\nu)}(\Sigma^2)$. Conversely, use Lemma 3.2.13 to check that

and conclude that

Summarizing, we now see that $J_\Pi(\nu) = -\int_\Sigma \log\frac{\Pi u}{u} \, d\nu$ if and only if $\mu_u \in M_1^{(\nu)}(\Sigma^2)$, in which case $J_\Pi(\nu) = H(\mu_u \,|\, \nu \otimes_2 \Pi)$. The problem is, of course, that one cannot expect, in general, that there will exist a $u \in B(\Sigma;[1,\infty))$ for which $J_\Pi(\nu) = -\int_\Sigma \log\frac{\Pi u}{u} \, d\nu$.
4.4.48 Exercise.

It is no accident that the rate function governing the large deviations of the empirical process is infinite off of the space of shift-invariant measures. To see this, let $\Omega = \Sigma^{\mathbb{N}}$, define $\omega \in \Omega \mapsto R_n(\omega) \in M_1(\Omega)$ as in (4.4.1), and suppose that $P \in M_1(\Omega)$ and $I : M_1(\Omega) \to [0,\infty]$ satisfy
$\varliminf_{n \to \infty} \frac{1}{n} \log\bigl(P(\{\omega : R_n(\omega) \in G\})\bigr) \ge -I(Q)$
for every open $G$ in $M_1(\Omega)$ and $Q \in G$. Show that $I$ must be identically infinite off of $M_1^S(\Omega)$.
Hint: First check that $M_1^S(\Omega)$ is a closed subset of $M_1(\Omega)$; and, second, note that, for any $\epsilon > 0$, there is an $N \in \mathbb{Z}^+$ such that the LÉVY distance between elements $Q$ and $Q'$ of $M_1(\Omega)$ is less than $\epsilon$ if

(The map $\pi_{[0,N]}$ is the projection of $\Omega$ onto $\Sigma^{\{0,\dots,N\}}$ obtained by restricting $\omega$ to $\mathbb{N} \cap [0,N]$.) Finally, for any $\omega \in \Omega$ and $n \in \mathbb{Z}^+$, let $\hat\omega_n \in \Omega$ be the path determined by $\Sigma_{kn+\ell}(\hat\omega_n) = \Sigma_\ell(\omega)$ for $k \in \mathbb{N}$ and $0 \le \ell < n$;
and show both that $R_n(\hat\omega_n) \in M_1^S(\Omega)$ and that
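The role of the periodized path in the hint can be seen on a toy example. The sketch below assumes that $R_n(\omega)$ is the average of the point masses at the first $n$ shifts of $\omega$ (the definition (4.4.1) is not reproduced in this section), and it only inspects the marginal on two consecutive coordinates; the function names and the sample word are hypothetical. For a shift-invariant measure the two one-coordinate marginals of this pair measure must agree, and periodizing the path is exactly what makes them agree.

```python
from collections import Counter
from fractions import Fraction


def pair_empirical_measure(path, n):
    """Two-coordinate marginal of the assumed R_n: (1/n) * sum_{k<n} delta_{(path(k), path(k+1))}."""
    counts = Counter((path(k), path(k + 1)) for k in range(n))
    return {pair: Fraction(c, n) for pair, c in counts.items()}


def periodize(path, n):
    """Repeat the first n symbols of the path with period n."""
    return lambda k: path(k % n)


def marginals(pair_measure):
    """First- and second-coordinate marginals of a measure on pairs."""
    first, second = Counter(), Counter()
    for (a, b), p in pair_measure.items():
        first[a] += p
        second[b] += p
    return dict(first), dict(second)


# Hypothetical path on the two-letter state space {a, b}.
word = "aababbba"
omega = lambda k: word[k]
n = 5

raw_first, raw_second = marginals(pair_empirical_measure(omega, n))
per_first, per_second = marginals(pair_empirical_measure(periodize(omega, n), n))
```

For this word the raw marginals differ (the path starts at `a` but has moved to `b` by time 5), whereas the periodized path gives identical marginals, in line with the first claim of the hint.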
V
Non-Uniform Results
5.1 Generalities about the Upper Bound
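We begin by restating Theorem 2.2.4 for the setting in which we will be working. Namely, let $\Omega$ be a Polish space and suppose that $\{Q_\epsilon : \epsilon > 0\}$ is a family of probability measures on $M_1(\Omega)$ with the property that
(5.1.1) $\Lambda(V) = \lim_{\epsilon \to 0} \epsilon \log\Bigl(\int_{M_1(\Omega)} \exp\Bigl[\frac{1}{\epsilon} \int_\Omega V(\omega)\,\mu(d\omega)\Bigr] Q_\epsilon(d\mu)\Bigr)$
exists for every $V \in C_b(\Omega;\mathbb{R})$. We then know that
(5.1.2) $\varlimsup_{\epsilon \to 0} \epsilon \log(Q_\epsilon(C)) \le -\inf_C \Lambda^*$
for $C \subset\subset M_1(\Omega)$, where $\Lambda^* : M_1(\Omega) \to [0,\infty]$, given by
(5.1.3) $\Lambda^*(\mu) = \sup\Bigl\{\int_\Omega V\,d\mu - \Lambda(V) : V \in C_b(\Omega;\mathbb{R})\Bigr\},$
is the LEGENDRE transform of $\Lambda$. Our goal in this section is to find out when we can remove the restriction that the $C$ in (5.1.2) be compact.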
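To get a feeling for a LEGENDRE transform of the type (5.1.3), it may help to look at a finite state space. The sketch below is a toy illustration and not the $\Lambda$ of the text: we assume three states, a reference probability vector `nu`, and the illustrative choice $\Lambda(V) = \log \sum_i \nu_i e^{V_i}$, for which the supremum in the transform is the relative entropy of `mu` with respect to `nu` and is attained at $V = \log(d\mu/d\nu)$. All names and numbers are hypothetical.

```python
import math
import random


def log_moment(V, nu):
    """Illustrative Lambda(V) = log sum_i nu[i] * exp(V[i])."""
    m = max(V)  # subtract the max for numerical stability
    return m + math.log(sum(p * math.exp(v - m) for v, p in zip(V, nu)))


def objective(V, mu, nu):
    """The quantity inside the supremum: integral of V d(mu) minus Lambda(V)."""
    return sum(v * q for v, q in zip(V, mu)) - log_moment(V, nu)


def relative_entropy(mu, nu):
    """H(mu | nu) = sum_i mu[i] * log(mu[i] / nu[i])."""
    return sum(q * math.log(q / p) for q, p in zip(mu, nu) if q > 0)


nu = [0.5, 0.3, 0.2]
mu = [0.2, 0.2, 0.6]

# The candidate maximizer V = log(dmu/dnu) attains the relative entropy.
V_star = [math.log(q / p) for q, p in zip(mu, nu)]
H = relative_entropy(mu, nu)

# No randomly drawn V should beat it.
rng = random.Random(0)
trials = [[rng.uniform(-5.0, 5.0) for _ in nu] for _ in range(1000)]
best_random = max(objective(V, mu, nu) for V in trials)
```

In this toy setting `objective(V_star, mu, nu)` agrees with `H` up to rounding, and `best_random` stays below `H`.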

Given a function $\Phi : C_b(\Omega;\mathbb{R}) \to \mathbb{R}$, we will say that $\Phi$ is non-decreasing if $\Phi(V_1) \le \Phi(V_2)$ whenever $V_1 \le V_2$; and we will say that $\Phi$ is tight if for each $M \in (0,\infty)$ there is a $K(M) \subset\subset \Omega$ such that $\Phi(V) \le 1$ whenever $V$ is an element of $C_b(\Omega;\mathbb{R})$ which vanishes on $K(M)$ and is bounded by $M$.
5.1.4 Lemma. Let $\Phi : C_b(\Omega;\mathbb{R}) \to \mathbb{R}$ be a non-decreasing, convex function with the property that $\Phi(c\mathbf{1}) = c$, $c \in \mathbb{R}$. Then $|\Phi(V_2) - \Phi(V_1)| \le \|V_2 - V_1\|_B$ for all $V_1, V_2 \in C_b(\Omega;\mathbb{R})$. Moreover, if, in addition, $\Phi$ is tight, then for every $\epsilon > 0$ and $M \in (0,\infty)$ there is a $K(\epsilon,M) \subset\subset \Omega$ such that $|\Phi(V_2) - \Phi(V_1)| \le \epsilon$ for all $V_1, V_2 \in C_b(\Omega;\mathbb{R})$ with the properties that $V_1 = V_2$ on $K(\epsilon,M)$ and $\|V_1\|_B \vee \|V_2\|_B \le M$.
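The first assertion of Lemma 5.1.4 can be checked numerically on a simple example. The following sketch assumes a four-point state space and the illustrative functional $\Phi(V) = \log \sum_i \nu_i e^{V_i}$, which is non-decreasing, convex, and satisfies $\Phi(c\mathbf{1}) = c$; the weights and the sampling range are hypothetical choices of ours.

```python
import math
import random


def phi(V, nu):
    """Illustrative functional: log sum_i nu[i] * exp(V[i])."""
    m = max(V)  # subtract the max for numerical stability
    return m + math.log(sum(p * math.exp(v - m) for v, p in zip(V, nu)))


def sup_norm(V):
    """Uniform norm of a function on the finite state space."""
    return max(abs(v) for v in V)


nu = [0.1, 0.2, 0.3, 0.4]
rng = random.Random(1)

# Largest observed value of |phi(V2) - phi(V1)| - ||V2 - V1||; it should never be positive.
worst_gap = 0.0
for _ in range(2000):
    V1 = [rng.uniform(-3.0, 3.0) for _ in nu]
    V2 = [rng.uniform(-3.0, 3.0) for _ in nu]
    diff = sup_norm([b - a for a, b in zip(V1, V2)])
    worst_gap = max(worst_gap, abs(phi(V2, nu) - phi(V1, nu)) - diff)
```

After the loop, `worst_gap` remains at zero up to rounding, as the Lipschitz bound in the lemma predicts for this particular functional.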
PROOF: First, note that $\Phi(V) \le \Phi(\|V\|_B \mathbf{1}) = \|V\|_B$ and that
$0 = \Phi(0) \le \frac{\Phi(V) + \Phi(-V)}{2}.$
Thus, $|\Phi(V)| \le \|V\|_B$, $V \in C_b(\Omega;\mathbb{R})$. Second, using convexity and writing

one sees that

for $\theta \in (0,1)$. From (5.1.5) and the remark preceding it, we have that

for all $\theta \in (0,1)$; and, therefore, after letting $\theta \searrow 0$ and reversing the roles of $V_1$ and $V_2$, one gets the first assertion. To prove the second assertion, let $\epsilon > 0$ and $M \in (0,\infty)$ be given and use (5.1.5) to see that

as long as $\|V_1\|_B \vee \|V_2\|_B \le M$. Finally, define $\theta \in (0,1)$ so that
$\frac{\theta}{2}(1 + 4M) = \epsilon \wedge \frac{1}{2},$
and set $K(\epsilon,M) = K(4M/\theta)$, where $\{K(M) : M \in (0,\infty)\}$ is the family of compact sets which appears in the definition of tightness for $\Phi$. After reversing the roles of $V_1$ and $V_2$, one then arrives at the desired conclusion. ∎
Before presenting the next result, we need to introduce some notation. Let $\rho$ be a compatible metric on $\Omega$ with the property that $(\Omega,\rho)$ is totally bounded, and denote by $\bar\Omega$ the completion of $\Omega$ with respect to $\rho$. Obviously, $\bar\Omega$ is compact and, because it is Polish, $\Omega$ can be thought of as a dense $G_\delta$ subset of $\bar\Omega$. In particular, we will identify $M_1(\Omega)$ with the subset of those $\bar\mu \in M_1(\bar\Omega)$ for which $\bar\mu(\bar\Omega \setminus \Omega) = 0$. In addition, if $C_u(\Omega;\mathbb{R})$ denotes the space of bounded, $\rho$-uniformly continuous functions on $\Omega$, then $\bar V \in C(\bar\Omega;\mathbb{R}) \mapsto \bar V|_\Omega \in C_u(\Omega;\mathbb{R})$ is a surjective isometry. What the following theorem turns on is the observation that "tightness" allows one to work on the compact space $\bar\Omega$ and then transfer one's conclusions there back to $\Omega$ itself.
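5.1.6 Theorem. Let $\Phi : C_b(\Omega;\mathbb{R}) \to \mathbb{R}$ be a non-decreasing, convex function with the property that $\Phi(c\mathbf{1}) = c$, $c \in \mathbb{R}$; and define $\Psi$ on $M_1(\Omega)$ by
(5.1.7) $\Psi(\mu) = \sup\Bigl\{\int_\Omega V\,d\mu - \Phi(V) : V \in C_b(\Omega;\mathbb{R})\Bigr\}.$
Then $\Psi$ is a convex rate function. Moreover, if $\Phi$ is tight, then $\Psi$ is good, there is a $\mu_0 \in M_1(\Omega)$ at which $\Psi$ vanishes,
(5.1.8) $\Phi(V) = \sup\Bigl\{\int_\Omega V\,d\mu - \Psi(\mu) : \mu \in M_1(\Omega)\Bigr\}, \quad V \in C_b(\Omega;\mathbb{R}),$
and
(5.1.9) $\bar\Psi(\bar\mu) = \begin{cases} \Psi(\bar\mu) & \text{if } \bar\mu \in M_1(\Omega) \\ \infty & \text{if } \bar\mu \in M(\bar\Omega) \setminus M_1(\Omega), \end{cases}$
where $\bar\Psi$ is defined on $M(\bar\Omega)$ by
(5.1.10) $\bar\Psi(\bar\mu) = \sup\Bigl\{\int_{\bar\Omega} \bar V\,d\bar\mu - \Phi(\bar V|_\Omega) : \bar V \in C(\bar\Omega;\mathbb{R})\Bigr\}.$
Conversely, suppose that $\Psi : M_1(\Omega) \to [0,\infty]$ is a convex rate function which vanishes at some $\mu_0 \in M_1(\Omega)$; and define $\Phi$ on $C_b(\Omega;\mathbb{R})$ by (5.1.8). Then $\Phi$ is a non-decreasing, convex function which satisfies $\Phi(c\mathbf{1}) = c$, $c \in \mathbb{R}$; $\Psi$ can be recovered from $\Phi$ via (5.1.7); and $\Psi$ is good if and only if $\Phi$ is tight.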
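PROOF: Let $\Phi$ be a function of the sort described in the first part of the theorem, and define $\Psi$ accordingly by (5.1.7). Obviously, $\Psi$ is lower semicontinuous and convex. In addition, since $\Phi(0) = 0$, it is clear that $\Psi \ge 0$. Next, add the assumption that $\Phi$ is tight. To see that $\Psi$ is good, let $\{K(M) : M \in (0,\infty)\}$ be the compact subsets of $\Omega$ described in the tightness property for $\Phi$. If $\Psi(\mu) \le L$, then
$\int_\Omega V\,d\mu \le L + \Phi(V) \le L + 1$
for all $V \in C_b(\Omega;\mathbb{R})$ satisfying $\|V\|_B \le M$ and $V = 0$ on $K(M)$. Hence, $\Psi(\mu) \le L$ implies that $\mu(K(M)^c) \le \frac{L+1}{M}$ for all $M \in (0,\infty)$; and therefore
$\{\mu \in M_1(\Omega) : \Psi(\mu) \le L\}$
is compact in $M_1(\Omega)$. We next turn to the proof of (5.1.9). To see that $\bar\Psi(\bar\mu) = \infty$ unless $\bar\mu \in M_1(\Omega)$, suppose that $\bar\mu \in M(\bar\Omega) \setminus M_1(\Omega)$. If $\bar\mu$ is not a probability measure, then $\bar\Psi(\bar\mu) = \infty$ follows easily from $\Phi(c\mathbf{1}) = c$, $c \in \mathbb{R}$. Thus, suppose that $\bar\mu \in M_1(\bar\Omega) \setminus M_1(\Omega)$. Then $\bar\mu = \theta\mu + (1-\theta)\bar\nu$, where $\mu \in M_1(\Omega)$, $\bar\nu \in M_1(\bar\Omega)$ with $\bar\nu(\Omega) = 0$, and $\theta \in [0,1)$. Since $\Omega$ is a $G_\delta$ subset of $\bar\Omega$, $\bar\Omega \setminus \Omega$ can be written as the countable union of compact subsets of $\bar\Omega$. Hence, there exists a compact $\bar K \subseteq \bar\Omega \setminus \Omega$ for which $\bar\nu(\bar K) \ge \frac{1}{2}$. Now let $M \in (0,\infty)$ be given and use the TIETZE Extension Theorem to construct a $\bar V_M \in C(\bar\Omega;[0,M])$ with the properties that $\bar V_M = 0$ on $K(M)$ and $\bar V_M = M$ on $\bar K$. We then have that
$\bar\Psi(\bar\mu) \ge \int_{\bar\Omega} \bar V_M\,d\bar\mu - \Phi(\bar V_M|_\Omega) \ge \frac{(1-\theta)M}{2} - 1, \quad M \in (0,\infty);$
and this shows that $\bar\Psi(\bar\mu) = \infty$. To complete the proof of (5.1.9), we must still check that $\bar\Psi(\mu) = \Psi(\mu)$ for $\mu \in M_1(\Omega)$. Obviously, $\bar\Psi(\mu) \le \Psi(\mu)$, and so it suffices to check that $\int_\Omega V\,d\mu - \Phi(V) \le \bar\Psi(\mu)$ for all $V \in C_b(\Omega;\mathbb{R})$. Given $V \in C_b(\Omega;\mathbb{R})$ and $\epsilon > 0$, set $M = \|V\|_B$, choose $K(\epsilon,M) \subset\subset \Omega$ as in the last part of Lemma 5.1.4, and take $K \subset\subset \Omega$ so that $K \supseteq K(\epsilon,M)$ and $\mu(K^c) < \epsilon/(M+1)$. Now use the TIETZE Extension Theorem to construct a $\bar V \in C(\bar\Omega;\mathbb{R})$ so that $\bar V = V$ on $K$ and $\|\bar V\|_B \le \|V\|_B$. Then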
Continuing in the setting of the preceding paragraph, we next want to derive (5.1.8). To this end, first observe that, because of (5.1.10), (5.1.9), and the fact that $M(\bar\Omega)$ is the dual of $C(\bar\Omega;\mathbb{R})$, Theorem 2.2.15 implies (5.1.8) for $V \in C_u(\Omega;\mathbb{R})$. Also, it is clear that for all $V \in C_b(\Omega;\mathbb{R})$ the left hand side of (5.1.8) dominates the right. With these preliminaries in mind, let $V \in C_b(\Omega;\mathbb{R})$ and $0 < \epsilon \le 1$ be given. Set $M = \|V\|_B$ and (recalling that we already know that $\Psi$ is good) choose $K \subset\subset \Omega$ so that $K \supseteq K(\epsilon,M)$ and $\mu(K^c) < \epsilon/(M+1)$ whenever $\Psi(\mu) \le 2M+1$. Next, construct $W \in C_u(\Omega;\mathbb{R})$ so that $\|W\|_B \le M$ and $W = V$ on $K$, and choose $\mu \in M_1(\Omega)$ so that $\Phi(W) \le \int_\Omega W\,d\mu - \Psi(\mu) + \epsilon$. Then, $\Psi(\mu) \le 2M+1$, and so

In other words, (5.1.8) is now proved. Finally, by taking $V = \mathbf{1}$ in (5.1.8), we see that $\inf_{M_1(\Omega)} \Psi = 0$; and therefore, by Lemma 2.1.2, there is a $\mu_0$ at which $\Psi$ vanishes.
It remains to prove the converse assertions. Let $\Psi$ be given as in the second part of the theorem, and define $\Phi$ by (5.1.8). It is then an easy matter to check that $\Phi$ is a non-decreasing, convex function for which $\Phi(c\mathbf{1}) = c$, $c \in \mathbb{R}$. Moreover, the ability to recover $\Psi$ via (5.1.7) is a simple application of Theorem 2.2.15. In particular, by the first part of this theorem, $\Psi$ is good if $\Phi$ is tight. Finally, to see that $\Phi$ is tight if $\Psi$ is good, let $M \in (0,\infty)$ be given; and choose $K \subset\subset \Omega$ so that $\mu(K^c) < 1/M$ whenever $\Psi(\mu) \le M$. Then the right hand side of (5.1.8) is dominated by 1 for all $V \in C_b(\Omega;\mathbb{R})$ which vanish on $K$ and satisfy $\|V\|_B \le M$. ∎
5.1.11 Corollary. Let $\{Q_\epsilon : \epsilon > 0\}$ be a family of probability measures on $M_1(\Omega)$ and assume that the limit $\Lambda(V)$ in (5.1.1) exists for each $V \in C_b(\Omega;\mathbb{R})$. Then $\Lambda$ is a non-decreasing, convex function with the property that $\Lambda(c\mathbf{1}) = c$, $c \in \mathbb{R}$. Moreover, if $\Lambda$ is tight, then the function $\Lambda^*$ in (5.1.3) is good and (5.1.2) holds for every closed set $C \subseteq M_1(\Omega)$.
PROOF: The only assertion which is not an immediate consequence of Theorem 5.1.6 is the final one. To handle this one, denote by $\bar Q_\epsilon$ the measure on $M_1(\bar\Omega)$ induced from $Q_\epsilon$ by the inclusion $M_1(\Omega) \subseteq M_1(\bar\Omega)$. Then

for $\bar V \in C(\bar\Omega;\mathbb{R})$. Thus, if $\bar\Lambda^*$ is defined in terms of $\Lambda$ as in (5.1.10), then

for all closed $\bar C \subseteq M_1(\bar\Omega)$. At the same time, if $\Lambda$ is tight, then, by (5.1.9), $\inf_{\bar C} \bar\Lambda^* = \inf_{\bar C \cap M_1(\Omega)} \Lambda^*$; and clearly this shows that (5.1.2) holds for every closed $C$. ∎
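5.1.12 Exercise. It turns out that there is no need to know that the limit $\Lambda(V)$ in (5.1.1) exists in order to get an upper bound. Indeed, let $\{Q_\epsilon : \epsilon > 0\} \subseteq M_1(M_1(\Sigma))$, suppose that $\Phi : C_b(\Sigma;\mathbb{R}) \to \mathbb{R}$ is a function which dominates
(5.1.13) $\bar\Lambda(V) = \varlimsup_{\epsilon \to 0} \epsilon \log\Bigl(\int_{M_1(\Sigma)} \exp\Bigl[\frac{1}{\epsilon} \int_\Sigma V(\sigma)\,\mu(d\sigma)\Bigr] Q_\epsilon(d\mu)\Bigr)$
for $V \in C_b(\Sigma;\mathbb{R})$; and let $\Psi$ on $M_1(\Sigma)$ be defined as in (5.1.7).
(i) Show that
(5.1.14) $\varlimsup_{\epsilon \to 0} \epsilon \log(Q_\epsilon(C)) \le -\inf_C \Psi$
for all $C \subset\subset M_1(\Sigma)$. Next, show that $\bar\Lambda$ is a non-decreasing, convex function which satisfies $\bar\Lambda(c\mathbf{1}) = c$, $c \in \mathbb{R}$; and conclude that (5.1.14) continues to hold for all closed $C \subseteq M_1(\Sigma)$ if $\Phi$ is tight. In particular, these considerations apply when $\Phi = \bar\Lambda$; in which case we will use $\bar\Lambda^*$ to denote the corresponding $\Psi$.
(ii) Suppose that there exists a function $F : \Sigma \to \mathbb{R}$ with the properties that $F$ is bounded below, $\{\sigma : F(\sigma) \le M\} \subset\subset \Sigma$ for every $M \in [0,\infty)$, and

Show that $\bar\Lambda$ is then tight; and conclude that $\bar\Lambda^*$ is good, that (5.1.14) holds with $\Psi = \bar\Lambda^*$ for every closed $C \subseteq M_1(\Sigma)$, and that
(5.1.16) $\bar\Lambda(V) = \sup\Bigl\{\int_\Sigma V\,d\mu - \bar\Lambda^*(\mu) : \mu \in M_1(\Sigma)\Bigr\}$
for every $V \in C_b(\Sigma;\mathbb{R})$.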
5.1.17 Exercise. Return to the setting of Remark 4.2.2 in Section 4.2, and define $\bar\Lambda_P(V)$ to be

for $V \in C_b(\Sigma;\mathbb{R})$.
(i) Check that $\bar\Lambda_P$ is non-decreasing, convex, and satisfies $\bar\Lambda_P(c\mathbf{1}) = c$, $c \in \mathbb{R}$. Thus, if (5.1.7) is used to define $\bar\Lambda_P^*$ from $\bar\Lambda_P$, then

(cf. Remark 4.2.2) holds always for $C \subset\subset M_1(\Sigma)$ and will hold for every closed $C \subseteq M_1(\Sigma)$ if $\bar\Lambda_P$ is tight.
(ii) Show that if $F : \Sigma \to \mathbb{R}$ is a function which is bounded below and has the properties that $\{\sigma : F(\sigma) \le M\} \subset\subset \Sigma$, $M \in [0,\infty)$, and

then $\bar\Lambda_P$ is tight.
(iii) Let $F : \Sigma \to \mathbb{R}$ be a lower semicontinuous function which is bounded below, and suppose that there is a measurable $u : [0,\infty) \times \Sigma \to [0,\infty)$ which satisfies (see the paragraph preceding Lemma 4.2.23)

Show that

Finally, if $\{\sigma : F(\sigma) \le M\} \subset\subset \Sigma$, $M \in [0,\infty)$, $u$ is uniformly positive, and
conclude that $\bar\Lambda_P$ is tight. At least when dealing with processes whose paths are continuous, one often finds the function $u$ by a localization procedure. Namely, one starts with a function $F$ with compact level sets and seeks a non-decreasing, locally bounded sequence of functions $u_n \in D$ which satisfy $u_n \ge 1$ and $Lu_n = Fu_n$ on a sequence of open sets $U_n$ which exhaust $\Sigma$; and one then takes $u$ to be the limit of the $u_n$'s.
(iv) It is clear that $\Lambda_P \le \bar\Lambda_P$ (where $\Lambda_P$ is defined in (4.2.21)) and therefore that $\bar\Lambda_P^*(\nu) \le \Lambda_P^*(\nu)$ and also that
for all v E
(cf. (4.2.22) and (4.2.37) for the notation here). Thus (cf. (4.2.36) and (4.2.38)),we see that J p 5 and that, when P(t,a, is FELLERcontinuous, J p 5 A>. Check that the following line of reasoning leads to A> 5 J p and thence to (5.1.21)
xi
A> = J p
*
if A, = A>.
a)
Let $V \in C_b(\Sigma;\mathbb{R})$ be given and define $\{P^V_t : t > 0\}$ accordingly as in Lemma 4.2.23. Given $\lambda > \bar\Lambda_P(V)$, define

Show that $\inf_{n \in \mathbb{Z}^+} \inf_{\sigma \in \Sigma} u_n(\sigma) > 0$, $u_n \in D$ (cf. the discussion preceding Lemma 4.2.31), and that
$\lambda u_n - V u_n - L u_n = 1 - v_n, \quad \text{where } v_n \equiv e^{-\lambda n}[P^V_n \mathbf{1}].$
Next, check that

and therefore that $\sup_{n \in \mathbb{Z}^+} \|v_n/u_n\|_B < \infty$. Since $\lambda > \bar\Lambda_P(V)$, conclude that $v_n/u_n \to 0$ boundedly. After combining this with the preceding, one is led to

and from here it is an easy step to the desired conclusion. Finally, by the same reasoning which just led to (5.1.21), prove that
(5.1.22)
when $P(t,\sigma,\cdot)$ is FELLER continuous.
(v) Formulate and verify the results in (i) through (iv) for the discrete-time setting.
5.2 A Little Ergodic Theory
Before attempting to develop lower bounds which will complement the upper bounds obtained in Section 5.1, we make a digression in which we will discuss a few essential facts from ergodic theory. Because it is not so readily available in standard texts, we will work in the continuous parameter setting. We begin our discussion with the lovely Sunrise Lemma of F. RIESZ [91]. To understand both the name as well as the intuition behind what is going on, think about the distribution of light and shade in a (one-dimensional) mountainous region at precisely the moment when the sun comes up over the horizon. In the lemma, the sun is on the right, the set $E$ is the region in the shade, and "$F(s)$ is the altitude at $s$."

5.2.1 Lemma. Let $I = [a,b]$ be a nonempty compact interval and $F : I \to \mathbb{R}$ a continuous function. Denote by $E$ the set of $s \in I^\circ$ with the property that $F(t) > F(s)$ for some $t \in (s,b)$. Then $E$ is an open subset of $\mathbb{R}$; and if $E \neq \emptyset$, then it is the union of countably many mutually disjoint open intervals $(\alpha,\beta)$ each of which has the property that $F(\beta) \geq F(\alpha)$.

PROOF: Clearly, $E$ is open in $\mathbb{R}$, and therefore all that we have to do is check that if $(\alpha,\beta)$ is a nonempty connected component of $E$ then $F(\beta) \geq F(\alpha)$. To this end, suppose that $F(\beta) < F(\alpha)$ and set $\lambda = (F(\alpha)+F(\beta))/2$. Then $C \equiv \{s \in (\alpha,\beta) : F(s) = \lambda\}$ is a nonempty, compact subset of $(\alpha,\beta)$. Let $\gamma = \max\{s : s \in C\}$, and observe that $F(t) < \lambda$ for all $t \in (\gamma,\beta]$. In addition, since $\beta \notin E$, $F(t) \leq F(\beta) < \lambda$ for every $t \in (\beta,b)$. Hence, $F(t) < \lambda = F(\gamma)$ for all $t \in (\gamma,b)$, and therefore $\gamma \notin E$. However, $\gamma \in (\alpha,\beta) \subseteq E$; and so we have a contradiction. ∎

As a direct consequence of Lemma 5.2.1, we get the following sharp form of the HARDY-LITTLEWOOD Maximal Inequality [58].

5.2.2 Theorem. Given a function
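$f \in L^1(\mathbb{R})$, define
(5.2.3) $\tilde{f}(s) = \sup_{\delta > 0} \frac{1}{\delta} \left| \int_s^{s+\delta} f(t)\,dt \right|, \quad s \in \mathbb{R}.$
Then $s \in \mathbb{R} \mapsto \tilde{f}(s) \in [0,\infty]$ is lower semicontinuous and
(5.2.4) $\left|\{s : \tilde{f}(s) \geq \lambda\}\right| \leq \frac{1}{\lambda} \int_{\{s : \tilde{f}(s) \geq \lambda\}} |f(t)|\,dt$
for all $\lambda \in (0,\infty)$. (We use $|\Gamma|$ to denote the LEBESGUE measure of $\Gamma \subseteq \mathbb{R}$.) In particular, for all $p \in (1,\infty]$,
(5.2.5) $\|\tilde{f}\|_{L^p(\mathbb{R})} \leq \frac{p}{p-1} \|f\|_{L^p(\mathbb{R})}.$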
PROOF: Without loss of generality, we will assume that $f \geq 0$. Given $n \in \mathbb{Z}^+$ and $\lambda \in (0,\infty)$, set $I_n = [-n,n]$ and define
and
for $s \in [-n,n)$. Clearly, $\{s \in I_n^\circ : \tilde{f}_n(s) > \lambda\}$ coincides with the set $E_{n,\lambda}$ in Lemma 5.2.1 corresponding to the function $F_{n,\lambda}$ on $I_n$. Moreover, by that lemma, we know that $E_{n,\lambda}$ is either empty or the countable union of mutually disjoint intervals $(\alpha,\beta)$ with the property that $\lambda(\beta - \alpha) \leq \int_\alpha^\beta f(t)\,dt$. Hence,
After letting $n \nearrow \infty$, one quickly concludes from the above that
$\lambda \left|\{s : \tilde{f}(s) > \lambda\}\right| \leq \int_{\{s : \tilde{f}(s) > \lambda\}} f(t)\,dt, \quad \lambda \in (0,\infty);$
and so (5.2.4) results from taking left limits in the preceding. Once one knows (5.2.4), one can get (5.2.5) for $p \in (1,\infty)$ and bounded, nonnegative $f \in L^1(\mathbb{R})$ by simply noting that
where we have used HÖLDER's inequality in the last step. The derivation of the general result is now an easy limit argument. Since (5.2.5) is obvious when $p = \infty$, the proof is now complete. ∎

We are now ready to start doing ergodic theory. Let $(\Omega,\mathcal{B})$ be a measurable space. The family $\Theta = \{\theta_t : t \in [0,\infty)\}$ is called a measurable, one-parameter semigroup of transformations on $(\Omega,\mathcal{B})$ if $(t,\omega) \in [0,\infty) \times \Omega \mapsto \theta_t(\omega) \in \Omega$ is a $\mathcal{B}_{[0,\infty)} \times \mathcal{B}$-measurable function from $[0,\infty) \times \Omega$ into $(\Omega,\mathcal{B})$ and $\theta_{s+t} = \theta_s \circ \theta_t$ for all $s, t \in [0,\infty)$. A set $A \subseteq \Omega$ is said to be $\Theta$-invariant if $A = \theta_t^{-1}A$, $t \in [0,\infty)$; and a measure $Q \in M_1((\Omega,\mathcal{B}))$ is said to be $\Theta$-invariant if $Q = Q \circ \theta_t^{-1}$, $t \in [0,\infty)$. We will use $\mathfrak{I}_\Theta$ and $M_1^\Theta((\Omega,\mathcal{B}))$, respectively, to denote the $\Theta$-invariant subsets $A \in \mathcal{B}$ and $\Theta$-invariant measures $Q \in M_1((\Omega,\mathcal{B}))$.
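5.2.6 Theorem. (MAXIMAL ERGODIC INEQUALITY) Let $(\Omega,\mathcal{B})$ be a measurable space and $\Theta = \{\theta_t : t \in [0,\infty)\}$ a measurable, one-parameter semigroup of transformations on $(\Omega,\mathcal{B})$. Then the set $\mathfrak{I}_\Theta$ is a sub-$\sigma$-algebra of $\mathcal{B}$. Next, given a measurable $f : \Omega \to \mathbb{R}$, let $\Omega_f$ be the set of $\omega \in \Omega$ with the property that $\int_0^T |f(\theta_t\omega)|\,dt < \infty$ for every $T \in [0,\infty)$. Then $\Omega_f \in \mathcal{B}$, and $Q(\Omega_f) = 1$ for all $Q \in M_1^\Theta((\Omega,\mathcal{B}))$ and $f \in L^1(Q)$. Finally, given a measurable $f : \Omega \to \mathbb{R}$, define $f_T : \Omega \to \mathbb{R}$ for $T \in (0,\infty)$ by

$f_T(\omega) = \begin{cases} \frac{1}{T}\int_0^T f(\theta_t\omega)\,dt & \text{if } \omega \in \Omega_f \\ 0 & \text{otherwise.} \end{cases}$

Then $(T,\omega) \in (0,\infty) \times \Omega \mapsto f_T(\omega) \in \mathbb{R}$ is measurable, $T \in (0,\infty) \mapsto f_T(\omega) \in \mathbb{R}$ is continuous for each $\omega \in \Omega$, and, for every $Q \in M_1^\Theta((\Omega,\mathcal{B}))$, one has that
(5.2.7) $Q(\{\omega : Mf(\omega) \geq \lambda\}) \leq \frac{1}{\lambda}\|f\|_{L^1(Q)}, \quad \lambda \in (0,\infty),$
and
(5.2.8)
where
$Mf(\omega) = \sup_{T \in (0,\infty)} |f_T(\omega)|, \quad \omega \in \Omega.$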
PROOF:The only thing that we need to do is check that (5.2.7) and (5.2.8) hold for bounded measurable f : R [O,m). Let such an f be given; and, for m E Z+ and w E R , define
and
It is then an easy matter to see that for 1 5 m < n and t E (0,n m],
~rnf(6t~ i )J n , w ( t ) Hence, by (5.2.4), for all X E ( 0 , ~ and ) 15 m XQ ( { w : M m f b ) 2 A } ) cnm
0)invariant and that 6: E Mf(R) for every w E ROO.In addition,
so

f d6: for all w E ROOand f E Cb(R; R). Finally, given and f *(w) = Q E MY(R) and a c.p.d. w Qw of Q)&, set
and note both that RQ E 38 and that
RQ = {w
E 000 : : 6 = Qw}.
5.2.14 Lemma. The set MY(R) is closed in Ml(R), and Q E MY(R) is
an element of EMF@) if and only if
for every f E 5. In particular, EMY(R) E B,,,,,. Moreover, Q(RQ) = 1 for each Q E MY(f2); and therefore, for each Q E MY(R), Qw E MY(R) for Qalmost every w E R.
PROOF:Since Q E MY(R) if and only if lf(etw)Q(dw) =
J, f(w)Q(dw)
for all t E (0,m) and f E Cb(R;R);
and because f 0 et E Cb(fl;R), t E (o,m), whenever f E c b ( R ; w ) , it is clear how to write MY(R) as the intersection of closed sets C ( t ,f ) , (t, f ) E (0, m) x c b (0; To prove the characterization of EM?(R), it is enough to show that the stated condition is sufficient. But, if f* = f dQ (a.s.,Q) for every f E 5, then EQ [fl3e]is Qalmost surely constant for every f E 5. Since the class of f E B(R;R) which have this property is closed under bounded pointwise convergence, we see that EQ [f)3@]is Qalmost surely constant for every f E B(R;R); and obviously, this is tantamount to the assertion that Q is ergodic. Finally, if Q E MY(fl), then the equality Q(RQ) = 1 is an immediate consequence of the Individual Ergodic Theorem together with the fact that, for each f f B(R;fa), w E R fdQw is a version of EQ[f13@].1
w).
s,
s,
5.2.15 Lemma. For every Q E MY(R), Qw = : 6 E EMY(0) for Qalmost every w E R. In particular, if R b = {w E RQ : Qw E EMY(R)},
then
fib E 38, Qb C_ ROO, and Q = PROOF:Note that Q({w :
Qw
4 EM?(fl)})
J"b
6: Q ( L ) .
At the same time, for each $f \in \mathfrak{F}$ and $\epsilon > 0$,
and
In the preceding, we have used the fact that f~ E Cb(R;R) in order to pass from the second to the third lines, and we have used (xn,,f*) o B5 = xnof*, s E [O,m), in the passage to the last line. I Clearly, the preceding shows that w H Qw admits a regular version; and therefore, by the reasoning at the end of Remark 5.2.13, we have the following result as an immediate consequence of Lemma 5.2.15. 5.2.16 Theorem. (ERGODIC DECOMPOSITION THEOREM) Let R be a Polish space and 0 = (0, : t E [ O , c o ) } a measurable semigroup of continuous transformations on R. Then, for each Q E MY(C2), there is a PQ E Ml(Ml(R2)) with the properties that ~Q(EMY(R)) = 1 and (5.2.12) holds.
Before closing this section, we record what our results look like in the case when $\Theta = \{\theta_t : t \in \mathbb{R}\}$ is a measurable group of transformations (i.e., $\theta_{s+t} = \theta_s \circ \theta_t$ for all $s, t \in \mathbb{R}$) on $\Omega$. Note that invariance of measures or functions under $\Theta$ is equivalent to invariance under either of the semigroups $\Theta^+ \equiv \{\theta_t : t \in [0,\infty)\}$ or $\Theta^- \equiv \{\theta_{-t} : t \in [0,\infty)\}$. Thus, by treating $\Theta^+$ and $\Theta^-$ separately, one sees that for every $Q \in M_1^\Theta((\Omega,\mathcal{B}))$ and $f \in L^1(Q)$,
for $\lambda \in (0,\infty)$,
(5.2.18)
both $Q$-almost surely and in $L^1(Q)$, and
(5.2.20)
if $p \in (1,\infty)$ and $f \in L^p(Q)$. Finally, when $\Omega$ is a Polish space and the $\theta_t$'s are continuous, then the Ergodic Decomposition Theorem again applies and yields (5.2.12) with a $P_Q$ which is concentrated on the ergodic elements of $M_1^\Theta(\Omega)$.

5.2.21 Exercise.
As was mentioned in Remark 5.2.13, the $\sigma$-algebra $\mathfrak{I}_\Theta$ is hardly ever countably generated. To see why this is the case, assume that $\Theta = \{\theta_t : t \in \mathbb{R}\}$ is a measurable group of transformations on $(\Omega,\mathcal{B})$ with the property that every orbit $[\omega]_\Theta = \{\theta_t\omega : t \in \mathbb{R}\}$, $\omega \in \Omega$, is an element of $\mathcal{B}$ and that there exists a $Q \in EM_1^\Theta((\Omega,\mathcal{B}))$ such that $Q([\omega]_\Theta) = 0$ for every $\omega \in \Omega$. Under these circumstances, it is impossible for $\mathfrak{I}_\Theta$ to be countably generated. Indeed, suppose that $\mathfrak{I}_\Theta = \sigma(\{A_\ell\}_1^\infty)$. Choose $\{B_\ell\}_1^\infty$ so that

$B_\ell = \begin{cases} A_\ell & \text{if } Q(A_\ell) = 1 \\ A_\ell^c & \text{if } Q(A_\ell) = 0. \end{cases}$

Show that $C \equiv \bigcap_{\ell=1}^\infty B_\ell = [\omega]_\Theta$ for some $\omega \in \Omega$, and conclude that $1 = Q(C) = Q([\omega]_\Theta) = 0$. In particular, this rules out the possibility that $\mathfrak{I}_\Theta$ is countably generated. For a simple example of such a situation, take $\Omega$ to be the 2-torus $S^1 \times S^1$ and $\{\theta_t : t \in [0,\infty)\}$ to be the flow generated by the vector field $\frac{\partial}{\partial x} + \gamma\frac{\partial}{\partial y}$, where $\gamma$ is an irrational number. Check that all the orbits are then $F_\sigma$ subsets of $\Omega$ and that the normalized LEBESGUE measure on $\Omega$ is an ergodic, invariant measure which assigns measure 0 to each of these orbits.
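The torus example can be explored numerically. The following is a rough sketch only, assuming that the flow of the exercise is the linear flow $\theta_t(x,y) = (x+t,\, y+\gamma t) \bmod 1$ and making the hypothetical choices $\gamma = \sqrt{2}$, a particular test function, and a particular starting point. It computes time averages along one orbit, which by the Individual Ergodic Theorem and the ergodicity of normalized LEBESGUE measure should approach the space average of the test function (here equal to 1).

```python
import math

GAMMA = math.sqrt(2.0)  # assumed irrational slope of the flow

def flow(t, point):
    """Linear flow on the 2-torus: move with velocity (1, GAMMA), modulo 1."""
    x, y = point
    return ((x + t) % 1.0, (y + GAMMA * t) % 1.0)

def observable(point):
    """A continuous test function whose Lebesgue average over the torus is 1."""
    x, y = point
    return 1.0 + math.cos(2 * math.pi * x) * math.cos(2 * math.pi * y)

def time_average(f, start, horizon, step=0.002):
    """Midpoint Riemann sum for (1/T) * integral_0^T f(flow(t, start)) dt."""
    n_steps = int(round(horizon / step))
    total = sum(f(flow((k + 0.5) * step, start)) for k in range(n_steps))
    return total * step / horizon

for horizon in (1.0, 10.0, 100.0):
    print(horizon, time_average(observable, (0.3, 0.7), horizon))
```

For this particular observable the convergence holds for every starting point, not merely almost every one, since the time integral of each trigonometric term stays bounded as the horizon grows.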
5.2.22 Exercise.
For the sake of completeness, work out the theory developed in this section for the case of a discrete 1-parameter semigroup $\{\theta_n : n \in \mathbb{Z}^+\}$. Of course, since $\theta_n = \theta^n$ where $\theta = \theta_1$, the appropriate notions of invariance are simply that $Q = Q \circ \theta^{-1}$ and $f = f \circ \theta$.

(i) From the HARDY-LITTLEWOOD inequality, derive
for all $\lambda \in (0,\infty)$ and any sequence $\{a_n\}_1^\infty$. (Here we use $|\Gamma|$ to denote the LEBESGUE measure of $\Gamma \subseteq \mathbb{Z}^+$; in other words, the cardinality of $\Gamma$.)

(ii) Knowing (i), prove that for any $\theta$-invariant $Q \in M_1((\Omega,\mathcal{B}))$ and any $f \in L^1(Q)$,
for $\lambda \in (0,\infty)$,
(5.2.26) $\frac{1}{n}\sum_{m=1}^{n} f(\theta^m\omega) \to E^Q[f \mid \mathfrak{I}_\theta](\omega)$ (a.s., $Q$) and in $L^1(Q)$,
and
if $p \in (1,\infty)$ and $f \in L^p(Q)$.

(iii) Assuming that $\Omega$ is a Polish space and that $\theta$ is continuous, state and prove the appropriate version of the Ergodic Decomposition Theorem (i.e., Theorem 5.2.16).
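Before proving part (i), one can test a discrete maximal inequality on data. The sketch below is an illustration under assumptions, not the inequality of the exercise itself: it takes the one-sided form suggested by (5.2.4), namely $\lambda\,|\{n : \max_{m \geq 1} \frac{1}{m}\sum_{k=n}^{n+m-1} a_k > \lambda\}| \leq \sum_n a_n$ for a nonnegative, finitely supported sequence, and the function names and the random test sequence are hypothetical.

```python
import random

def forward_maximal(a):
    """One-sided maximal function: M[n] = max over m >= 1 of the mean of a[n:n+m].

    The sequence is treated as zero beyond its last entry, so for nonnegative
    input a window running past the end can only lower the mean.
    """
    n_terms = len(a)
    M = []
    for n in range(n_terms):
        best, total = float("-inf"), 0.0
        for m in range(1, n_terms - n + 1):
            total += a[n + m - 1]
            best = max(best, total / m)
        M.append(best)
    return M

def weak_type_holds(a, lam):
    """Check lam * #{n : M[n] > lam} <= sum of a over that set <= sum of a."""
    M = forward_maximal(a)
    big = [n for n in range(len(a)) if M[n] > lam]
    mass_on_big = sum(a[n] for n in big)
    return lam * len(big) <= mass_on_big + 1e-9 and mass_on_big <= sum(a) + 1e-9

random.seed(1)
seq = [random.expovariate(1.0) for _ in range(300)]
print(all(weak_type_holds(seq, lam) for lam in (0.5, 1.0, 2.0, 4.0)))
```

The check also records the sharper intermediate bound, in which the sum on the right runs only over the set where the maximal function exceeds $\lambda$; this mirrors the integral over $\{s : \tilde{f}(s) > \lambda\}$ in the proof of Theorem 5.2.2.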
5.2.28 Exercise.
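Let $\Pi(\sigma,\cdot)$ be a transition probability function on the measurable space $(\Sigma,\mathcal{F})$ and define the operator $[\Pi\phi](\sigma) = \int_\Sigma \phi(\tau)\,\Pi(\sigma,d\tau)$, $\sigma \in \Sigma$, for $\phi \in B((\Sigma,\mathcal{F});\mathbb{R})$. Denote by $B^\Pi((\Sigma,\mathcal{F});\mathbb{R})$ the space of $\phi \in B((\Sigma,\mathcal{F});\mathbb{R})$ which are $\Pi$-invariant (i.e., $\phi = \Pi\phi$), and let $M_1^\Pi((\Sigma,\mathcal{F}))$ be the space of $\Pi$-invariant $\mu \in M_1((\Sigma,\mathcal{F}))$ (i.e., $\mu = \mu\Pi \equiv \int_\Sigma \Pi(\sigma,\cdot)\,\mu(d\sigma)$).

(i) Prove that, for any $\mu \in M_1^\Pi((\Sigma,\mathcal{F}))$ and $\phi \in L^1(\mu)$,
for $\lambda \in (0,\infty)$ and
for $p \in (1,\infty]$.

(ii) Next, show that for each $\mu \in M_1^\Pi((\Sigma,\mathcal{F}))$ there is a unique bounded linear operator $E_\mu : L^1(\mu) \to L^1(\mu)$ with the property

$\frac{1}{n}\sum_{m=1}^{n} [\Pi^m\phi](\sigma) \to [E_\mu\phi](\sigma)$ $\mu$-almost surely and in $L^1(\mu)$.

Show that $E_\mu^2 = E_\mu$, $E_\mu\phi \geq 0$ if $\phi \geq 0$, and $E_\mu\phi = \phi$ (a.s., $\mu$) if $\phi \in B^\Pi((\Sigma,\mathcal{F});\mathbb{R})$. In particular, conclude that $E_\mu$ is a contraction on $L^p(\mu)$ for every $p \in [1,\infty]$. Finally, show that
for $p \in (1,\infty)$ and $\phi \in L^p(\mu)$.

(iii) Call an element $\mu$ of $M_1^\Pi((\Sigma,\mathcal{F}))$ $\Pi$-ergodic if

$\phi = \int_\Sigma \phi\,d\mu$ (a.s., $\mu$) for each $\phi \in B^\Pi((\Sigma,\mathcal{F});\mathbb{R})$.

Show that two $\Pi$-ergodic elements of $M_1^\Pi((\Sigma,\mathcal{F}))$ are either equal or singular.

(iv) Set $\Omega = \Sigma^{\mathbb{N}}$, $\mathcal{B} = \mathcal{F}^{\mathbb{N}}$, and let $\{P_\sigma : \sigma \in \Sigma\}$ be the MARKOV family of probability measures on $(\Omega,\mathcal{B})$ whose transition function is $\Pi(\sigma,\cdot)$. Given $\mu \in M_1((\Sigma,\mathcal{F}))$, set $P_\mu = \int_\Sigma P_\sigma\,\mu(d\sigma)$, and check that $P_\mu$ is invariant under the shift $\theta : \Omega \to \Omega$ given by $(\theta\omega)_n = \omega_{n+1}$, $n \in \mathbb{N}$, if and only if $\mu \in M_1^\Pi((\Sigma,\mathcal{F}))$. Also, show that $\mu \in M_1^\Pi((\Sigma,\mathcal{F}))$ is $\Pi$-ergodic if and only if $P_\mu$ is ergodic for $\theta$.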
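The averaging operator in part (ii) is easy to watch in action on a finite state space. The sketch below is an illustration under assumptions rather than part of the exercise: the three-state transition matrix `P`, its invariant measure `mu`, and the function `phi` are hypothetical choices. The chain is irreducible, so `mu` is ergodic in the sense of part (iii), and the averages $\frac{1}{n}\sum_{m=1}^{n}\Pi^m\phi$ should approach the constant $\int\phi\,d\mu$ at every state.

```python
def apply_kernel(P, phi):
    """[P phi](i) = sum_j P[i][j] * phi[j] for a finite transition matrix P."""
    return [sum(p_ij * phi_j for p_ij, phi_j in zip(row, phi)) for row in P]

def cesaro_average(P, phi, n):
    """(1/n) * sum_{m=1}^{n} P^m phi, computed by repeated application of P."""
    total = [0.0] * len(phi)
    current = list(phi)
    for _ in range(n):
        current = apply_kernel(P, current)
        total = [t + c for t, c in zip(total, current)]
    return [t / n for t in total]

# Hypothetical three-state chain; mu = (1/4, 1/2, 1/4) satisfies mu = mu P.
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]
mu = [0.25, 0.50, 0.25]
phi = [2.0, 0.0, 1.0]

mean_phi = sum(m * f for m, f in zip(mu, phi))  # integral of phi against mu
print(mean_phi, cesaro_average(P, phi, 500))
```

For a chain with several closed classes the limit would no longer be constant; it would instead be constant on each class, which is the finite-state shadow of the decomposition into ergodic pieces.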

5.2.31 Exercise.
Let 0 = {Ot : t E [0, GO)} be a measurable semigroup of transformations on the measurable space (a,F),and assume that there is a sub aalgebra 30C F with the property that UtE[o,m) O;lFo generates the whole of F. Next, for each T E [0, m), let FT and p be the aalgebras generated by UtEIO,T~ OF'30 and UtEIT,m) O;'~O, respectively. Finally, define the tail aalgebra 7 = p.
nTEIO,m)
(i) Given any f E B ( ( R , F ) ; R ) , set

f*(w)= lim f t ( w ) = t+m
t+m
t
When f is 3~measurablefor some T E [0, m), show that the function f * is 7measurable. Next, assuming that Q E M Y ( ( 0 , F ) )and using Q 7 to
7 measurable for every denote the Qcompletion of 7, show that f* is Q
f E B((fm;q. (ii) Using (i), show that if Q E MY((R,F)),then 3e that Q is ergodic if Q(A) E ( 0 , l ) for every A E 7.
Q
7 ; and conclude
5.3 The General Symmetric Markov Case
Our first application of the results obtained in Section 5.1 will be to the large deviation theory for the empirical distribution of the position of a symmetric MARKOV process. More precisely, let C, P(t,a,.),and the associated MARKOV family {Po: (T € C} Ml((f2,B)) be as they were in Section 4.4; and define
s
(5.3.1)
L t h ) =X[O,t] O
( W j 0 , t I ) I,
I..
(tl w ) E (01
x 0,
as in Remark 4.2.2. Next, assume that there is a $P(t,\sigma,\cdot)$-reversing measure $m \in M_1(\Sigma)$, and define the DIRICHLET form $\mathcal{E}$ and the associated functions $\Lambda_{\mathcal{E}} : B(\Sigma;\mathbb{R}) \to \mathbb{R}$ and $J_{\mathcal{E}} : M_1(\Sigma) \to [0,\infty]$ as we did in the final part of Section 4.2 (cf. especially (4.2.47), (4.2.51), and (4.2.49)). Finally, set $P_m = \int_\Sigma P_\sigma\,m(d\sigma)$.

5.3.2 Lemma. If $J_{\mathcal{E}}$ is lower semicontinuous, then
(5.3.3)
and
(5.3.4)
for all $C \subset\subset M_1(\Sigma)$. Moreover, if, in addition, $J_{\mathcal{E}}$ is good (or, equivalently, $\Lambda_{\mathcal{E}}$ is tight), then (5.3.4) holds for every closed $C \subseteq M_1(\Sigma)$.

PROOF: In (ii) of Exercise 4.2.63, we saw that $J_{\mathcal{E}}$ is convex. Thus, by Theorem 2.2.15 and (4.2.51), if $J_{\mathcal{E}}$ is lower semicontinuous then (5.3.3) follows; and so, by the results in Section 5.1 (in particular part (i) of Exercise 5.1.12), all that we have to do is check that
But, because $m \in M_1(\Sigma)$, it is easy to see that
We now want to show that, under reasonable conditions, one can prove the complementary lower bound. The approach which we are going to
adopt is very reminiscent of the one which we used in the original proof that we gave in Section 1.2 of the classical CRAMÉR Theorem for real-valued random variables. That is, we will force certain ergodic behavior by the introduction of an appropriate RADON-NIKODYM factor and will get our lower bound by estimating the size of the factor which we have introduced. However, in order to carry out this program, we need to make the following mild assumption.
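(E) If $\{Q_T : T > 0\} \subseteq M_1((\Omega,\mathcal{B}))$ is consistent in the sense that $Q_{T_1} = Q_{T_2}$ on $\mathcal{B}_{T_1}$ for all $0 \leq T_1 < T_2 < \infty$, then there exists a unique $Q \in M_1((\Omega,\mathcal{B}))$ such that $Q = Q_T$ on $\mathcal{B}_T$ for each $T \in [0,\infty)$.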
Note that (E) holds if 52 is a Polish space, B = B,, each Bt is countably generated, and B is generated by U ,,  at. (Cf. Theorem 1.1.10 in [104].) 5.3.5 Lemma. Let u E D n B ( C ;[l,m)), set V, =
e,define
for ( t , ~E)(0, m) x C, and set r E BE and
for ( t , w ) E [O,m) x R. (See Lemma 4.2.23 and Theorem 4.2.25 for the notation here.) Then P,(t, (T,) is a transition probability function; and, for every u E C, (X,(t), Bt, Po)is a nonnegative martingale with meanvalue 1. Moreover, for each CT € C, there is a unique P," € Ml((R,B)) satisfying PZ(A) =
X,(t,w) P,(dw),
In fact, the family {P,"
: uE
t E [O,m) and A E
C} is measurable and, for each cr E C,
for all s, t E (0, m) and A E B,. Finally, if
(5.3.7)
at.
then mu is a reversing measure for P,(t,
6,
a).
PROOF: We first check that P,(t, 0,.) is a transition probability function. To this end, note that
Thus, the measurability of
as well as the CHAPMANKOLMOGOROV equation are immediate. In addition, since u = P p u (cf. the proof of Lemma 4.2.35), it is clear that P,(t, 0,C) = 1. We next show that, for each CT E C and I? E BE,
for 8 , t E ( 0 , ~ and ) A E D,. Indeed, by the MARKOV property combined with (4.2.26),
which is equivalent to (5.3.8). By taking I? = C in (5.3.8), we get the asserted martingale property; and therefore, by (E), the existence and uniqueness of P," have also been established. Moreover, the measurability of u E C w P," is a trivial consequence of the expression for P," on each of the Dt 's, and (5.3.6) follows easily from (5.3.8). Finally, to see that mu is reversing for Pu(t,0,.), note that for 4, E B ( C ;R) $J
FF
Since, by Lemma 4.2.50, is selfadjoint on L 2 (m),it follows that the first expression in the above is symmetric in 4 and $J.
5.3.9 Lemma. Assume that J & ( p ) = 0 only if p = m. Then for every u E D n B ( C ; [l,co)) and every ropen neighborhood (cf. the discussion
preceding Lemma 3.2.19) G E B M ~ (of~mu ) X,(t, w)Po(dw)= 1 in mmeasure.

PROOF:Note that it suffices to check that if PzL= ScPzm,(da), then P"({W : Lt(w) E G}) 1 as t + 00. Furthermore, since P" is tirneshift invariant, this latter statement will follow from the Individual Ergodic Theorem once we show that P" is ergodic relative to timeshift. Thus, all that we have to do is show that if {tn}y [O,m), F E B(CZ+;W), and
then Cpo is P"almost surely constant if, for each t f ( O , o o ) , P"almost surely. We begin by showing that if 4 E B(C;R) satisfies
@t
=
@O
for each t E (0,00), then 4 is mualmost surely constant. In fact, given such a 4, we can use symmetry to check that
Since, for each t E (O,m), PU(t,o,dr)m,(do) is bounded above and below by constant positive multiples of P(t,0,d ~ ) m ( d a it ) , follows from (4.2.54) that €(4,4) = 0. But, this means that J & ( p )= 0, where
and therefore, by hypothesis, f#J is rnalmost surely constant. Returning to the ergodicity question about P", suppose that = @O P"almost surely for each t E (0,m). Set 4(a) = S , @ o ( w ) P ~ ( d w ) , and observe that for all t E (0, m) and mualmost every o E C
Thus, by the preceding, 4 is mualmost surely constant. But this means that, for any t E (0,m) and A E Bt,
s,
Qo(w) P"(dw) =
J, @t(w)P"(dw)
=L
$ ( C t ( w ) ) P"(dw) = P"(A)
and clearly this leads to the conclusion that
In other words, @o must be P"almost surely constant. I
5.3.10 Theorem. Assume that $J_{\mathcal{E}}(\mu) = 0$ only if $\mu = m$, and let $\nu \in M_1(\Sigma)$ have the property that, for some $T \in [0,\infty)$, $\nu P_T$ is not singular to $m$. Then for every $\tau$-open set $G \in \mathcal{B}_{M_1(\Sigma)}$
(5.3.11)
Hence, if, in addition, $J_{\mathcal{E}}$ is a good rate function, then
(5.3.12)
for every $\Gamma \in \mathcal{B}_{M_1(\Sigma)}$.
PROOF:In view of Lemma 5.3.2, all that we have to do is check (5.3.11). Also, since, for any T E ( 0 , ~and ) 6 > 0, P U P T ( { W : Lt(w)
E GI)
4 Pu({.
: l [ L t ( 4  GIVar< 6 )
as soon as t is sufficiently large, we will assume, without loss of generality, that II itself is not singular to rn. In particular, this means, by Lemma 5.3.9, that
lim t+m
X u ( t ,w)P u ( h ) > 0. w:Lt(w)€G}
We begin by showing that if u E D n B ( C ;[ l , m ) ) , then (5.3.13)
for every ropen G E ~ ? M ~ (  Q containing mu. To this end, set
; therefore, by the remarks made above, for all r E ( 0 , ~ )and
2
 lim
sup
'LopEG(r)
1
V, d p =
C
5
dm,.
(5.1.13) is now proved. Finally, we will show that if Jt(p) < 00, then there exists a sequence {un}yG D n B ( C ;[l,m)) such that mu, p in the strong topology on Ml(C) and JE(m,,,) J E ( ~ Clearly, ). when combined with (5.3.13), this will complete the proof of (5.3.11). with J E ( ~ lo,
In particular, the map V E Cb(f21; R) HAI(V) is a tight, convex function which satisfies AI(c1) = c for c E R.
PROOF:Without loss of generality, we will assume throughout that V is nonnegative, and we will use M to denote IlVll~. To prove the existence of AI(V), set
Because of shiftinvariance, all that we have to do is check that the limit ) given and write T = limT,, exists. To this end, let S E ( 0 , ~ be nTS T T , where 1 2 E~ Z+ and TT E [0, S),for T > S. Then, by (H1) and
+
shiftinvariance, for every C > LO,
Hence,
for S E (0, 00) and C implies that
> CO; and, since a ( [ )\ 1 as C /” 00, this clearly this
In order to prove (5.4.14), let C > C, be given and set T = C + 111. Then, again by shiftinvariance and (Hl),
where we have used JENSEN'Sinequality in the passage from the second to the third line. After dividing through by nT and then letting n , 00, we arrive at (5.4.14). Finally, the convexity of A, as well as the equality Ar(c1) = c, c E R, are both immediate consequences of the definition of A,. Moreover, given (5.4.14), it is clear how to choose the sets K ( M ) cc QI to check tightness. Namely, let C > CO V 1 be given and choose K ( M ) CC 521 so that

P({w : v ( w ) 4 K ( M ) } ) I exp[(C+
IIl)4C)M]. I
Now let A; : Ml(R1) [0,00] be the function defined in (5.4.6). Then, by Lemma 5.4.13 and Corollary 5.1.11, A; is a good rate function on M ~ ( Q Iand )

P({w : RT(w)0 nll E F } ) 5  inf A; lim log 2T l ( > F for closed F MI ($2,). Thus, by (ii) of Exercise 2.1.21 (cf. (ii) of Exercise 3.2.22 as well), the function A* : Ml(Q) [0,m] in (5.4.5) is also a good rate function; and, just as in (iii) of Exercise 2.1.21, we now have (5.4.4). T+m

Having completed STEP 1, we now begin STEP 2 by checking that A*(&) = 0;)
(5.4.15)
when Q f Ml(Q)\ Ms(Q).
To this end, suppose that Q $ Ms(S2) is given. One can then choose a compact interval I and a v E cb ($21; R) so that
w)>Jn
(5.4.16) l V o n & $ w ) Q(d

Vonl(w)Q(dw)+l
for some C E W.
In particular, if the compact interval J is chosen so that ( L + I ) U I C J and W E C ~ ( $ ~ J ;iRs d) e f i n e d b y W o n ~ e V o n I o e ~  V o nthen(5.4.16) ~, leads to A*(Q) 2  A J ( M W > : M E (0, m)}.
sup{^
Thus, we will have completed the proof of (5.4.15) once we show that
A j ( M W ) 5 0 for every M E (0,m). But it is clear that, for any T > C,
and, therefore,
2T log asT00.
(k [l: exp
MW
o TJ
(&w) dt
To complete the proof of (5.4.7), we will use the following lemma. 5.4.17 Lemma. Let I be a compact interval. Then
(5.4.18) for all Q E Ml(R) and
> C,;
and, for every Q E MT(R),
1
5 FHI(T)(QIP) for T E (0,m) and V E B(f21;R), where I ( T ) = {t : It  I1 I T}.
PROOF: Recall (cf. Lemma 3.2.13) that $H_I(Q|P)$ is given by
(5.4.20) $\sup\left\{ \int_\Omega V \circ \pi_I\,dQ - \log\left( \int_\Omega \exp[V \circ \pi_I]\,dP \right) : V \in C_b(\Omega_I;\mathbb{R}) \right\}.$
Thus, (5.4.18) is an immediate consequence of (5.4.14). To prove (5.4.19), let $Q \in M_1^S(\Omega)$ and $V \in B(\Omega_I;\mathbb{R})$ be given. For $T \in (0,\infty)$, define $V_T \in B(\Omega_{I(T)};\mathbb{R})$ so that
Because Q is shiftinvariant, one then has that
Finally, by (5.4.20), the right hand side of the preceding is dominated by &HI(T)(QIP) when V E Cb(R1;W)and therefore for general V E B(Qr;W). I From here, it is an easy matter to complete STEP2. Indeed, by (5.4.18), for any Q 6 Ml(R), we have that
 1
lim HI(QJP) 5 A*(&).
IPR
)I)
On the other hand, if Q E MY(O), then both HI(Q)P) and h;(Q) depend on I only through 111, and, by (5.4.19),
for any S E ( 0 , ~ )and V E Cb(R[s,s];R). Clearly this leads immediately to AiS,S'] (Q T[S,S]) <  T lim M ' &H[,T](QIP); and the rest of STEP 2 is now simply a matter of notation.
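The variational formula (5.4.20) for relative entropy, on which STEP 2 rests, can be checked directly on a finite space. The sketch below is an illustration under assumptions: the four-point measures `p` and `q` are hypothetical, and the finite sum $\sum_i q_i \log(q_i/p_i)$ stands in for $H_I(Q|P)$. It confirms that the functional $V \mapsto \int V\,dQ - \log\int e^V\,dP$ attains the entropy at $V = \log(dQ/dP)$ and that randomly chosen $V$ do not exceed it.

```python
import math
import random

def relative_entropy(q, p):
    """H(Q|P) = sum_i q_i * log(q_i / p_i) for strictly positive weights."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))

def dual_functional(v, q, p):
    """Integral of V dQ minus log of the integral of exp(V) dP on a finite space."""
    mean_v = sum(qi * vi for qi, vi in zip(q, v))
    log_mgf = math.log(sum(pi * math.exp(vi) for pi, vi in zip(p, v)))
    return mean_v - log_mgf

p = [0.1, 0.2, 0.3, 0.4]   # hypothetical reference measure P
q = [0.4, 0.3, 0.2, 0.1]   # hypothetical measure Q
h = relative_entropy(q, p)

optimal_v = [math.log(qi / pi) for qi, pi in zip(q, p)]  # V = log(dQ/dP)
random.seed(2)
trials = [[random.uniform(-3.0, 3.0) for _ in p] for _ in range(1000)]
best_random = max(dual_functional(v, q, p) for v in trials)
print(h, dual_functional(optimal_v, q, p), best_random)
```

Adding a constant to $V$ leaves the functional unchanged, so the maximizer is unique only up to an additive constant.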
'
We next turn to STEP 3 and verify that (5.4.10) holds for ergodic Q E
MY(W 5.4.21 Lemma. If Q E EMS(O) and I is a compact interval, then for any G I E B M ~ (which ~ ~ is ) a ropen neighborhood of Q o A;'
PROOF: The argument is very much like the one used in (ii) of Exercise 3.2.23, only here the Ergodic Theorem plays the role that the Law of Large Numbers did there. Set I ( T ) = {t : It  I1 5 T} and

and let AT = { w : RT(w)o ~7' E GI and FT(w) > 0). Then, by the Ergodic Theorem, Q(AT) 1 as T + 00. Thus, by JENSEN'S inequality,
since
&(w)log(FT(w)) p ( h ) Hr(T)(Q\P)  J,: 2 eel  HI(T)(QIP). I As an essentially immediate consequence of Lemma 5.4.21, we see that lim log ( P ( { w : RT(w)E G})) 2 H(Q) T+w
for any open G C_ Ml(R) and any ergodic Q E G.

Continuing with STEP3, we next define the lower semicontinuous function J : Ml(CL) [0, 003 as in (5.4.11). Our goal is to prove that J 5 H. At the moment (cf. the preceding paragraph), we know that J 5 H on
E M m ) u (Ml(W \ MW)). 5.4.22 Lemma. The function J in (5.4.11) is convex.
PROOF:Since J is lower semicontinuous, it suffices to check that
for Q 1 , Q Z E M1(R) satisfying J(Q1) V J(Qz) < 00. To this end, let G be ) T > 0 so an open set containing Q = ( 0 1 Q2)/2. Choose S E ( 0 , ~ and that
+
where I = [S, S] and the balls BI are defined relative to the LEVYmetric on Ml(0,). Set
and
w(T)= P ( { w : RT(w)E G}). Then, by (H2):
+
as long as C > and T > ( 2 s C ) / ~ T . (The number P(C)' is the HOLDER conjugate of P(C).) Since J(Q1) V J ( Q 2 ) < 00 means that ul(T)uz(T)2
exp [  M T ] for some M
< 00 and all sufficiently large T 's, we now see that
1
2 5"'(T)"2(T) for all sufficiently large T 's; and clearly this leads to
1 2.  lim log TTm
+ 1 2
J(Q1)
P({w : RT(w)on,'
2T l
E
Br(Ql,r)}))
(
7 ,
lim  log P ( { w : RT(w)0 rT1E Br(Q2,T ) } ) )
TT&! 2T
+ J(Q2). 2
We are now in the following situation. Both of the functions J and H are lower semicontinuous and convex; and we know that H(Q) >_ J ( Q ) for all Q E (M1(R)\ M:(R)) u EMf((R).Furthermore, the function H is affine on MS((R)in the sense that
H(aQi + (1  a I Q 2 ) = aWQi)+ (1  a)H(Q2)
(5.4.23) for
(Y
E [0,1] and Q1, Q 2 E
Ms((R).To see this, simply observe that (cf.
(ii) of Exercise 4.4.41)
+
aHI(Q1JP) (1  a)Hr(Q2lP) 2 Hr (aQi+ (1  Q ) Q ~ ~ P ) 2 2 aHr(Q1IP)+ (1  a)Hr(Q21P) .; From these remarks, it should be clear that the following lemma is all that we need in order to complete STEP3.
M1((R)+ [O,oo] be a lower semicontinuous function. If CP is convex, then for every p E Mi (fl)
5.4.24 Lemma. Let
(5.4.25)
ip :
(Ll(o)
Rp@,))
s,,,,)
@ ( RP) ( W
On the other hand, if CP is f i n e on Mf(R) and p E Ml(Mf(fl)),then (5.4.26)
@(R)P(dR).
PROOF:We begin with the case in which p ( K ) = 1 for some compact subset K of M1(R).Throughout, B ( Q , r ) denotes the LEVYmetric ball in M 1 ( 0 ) of radius r around Q. For m E Z+, choose a finite set {Rm,e}tzl E K so that the balls B,,e = B(R,,e, l/m), 1 5 C 5 L,, cover K ; set Am,l = K fl Bm,l and
for 2 5 C 5 L,; and take a,,e = p(A,,e). Next, for m E Z+ and 1 I lI L,, choose P,,e E K n Bm,e SO that
@(Pm,e)5 inf{ @ ( R ): R E K n B,,!}
+ ;m1
and define Fm,t by
Assuming that @ is convex, we have that

where Q,,(R) = @(Pm,e)for R E A,,J. Since @ is lower semicontinuous, @,(R) Q,(R)for each R E K . Thus, when @ is bounded, LEBESGUE'S Dominated Convergence Theorem shows that
as m

00.
At the same time,
and so, again by lower semicontinuity,
and together, these imply the desired result when 0 is bounded. Thus, even if 9 is not bounded, we have that (5.4.25) holds for @ A n; and, therefore, a passage to the limit as n t 00 yields the result for a's which are not necessarily bounded. Next, assume that 9 is afiine on Ms(R). Because Ms(R) is closed, we may and will assume that the K for which p ( K ) = 1is contained in Ms(R); and therefore that each of the measures is an element of Ms(R). Thus,  Fm,t since JMs(n) R P ( ~ R=) am,tpm,t,
c,"=;
where S m ( R ) = @(Fm,t) for R E A,,[. Noting that, by lower semicontinuity, @ ( R ) 5lim m,(R) m+oo
for each R E K , we can now use FATOU'S Lemma to conclude that the left hand side of (5.4.26)dominates the right hand side. At the same time, by the result in the preceding paragraph, the opposite inequality also holds. We have now completed the proof in the case when p is compactly supported. To handle the case when p is not compactly supported, choose a nondecreasing sequence of compact sets K,, so that p(K,) 2 (n l)/n;set a, = p(K,); and define a,(I') = &p(I'nK,) and Tn(I') = &p(I'nK:) for I' E UMl(a). Since each o,,is compactly supported and J Ro,(dR) J Rp(dR),we see from the above that
*
@
(
/Ml (a)
p(dR))
n k
@
(/MI
(a)Ro,(dR)
when 9 is convex. On the other hand, if
Q,
)
I
/Ml(Q)
is affine, then
@ ( RP) ( W
Since it is clear that
we are done. I Applying Lemma 5.4.24 to J and H, we now see that
where PQ E MI (EMs(R)) is the measure described in the Ergodic Decomposition Theorem. Hence, we have now completed STEP 3; and therefore we have derived the following version of a theorem proved originally by T. CHIYONOBU and S. KUSUOKA in [la].

5.4.27 Theorem. Assume that $P \in M_1^S(\Omega)$ is hypermixing. Then the specific entropy function $H : M_1(\Omega) \to [0,\infty]$ in (5.4.8) exists (i.e., the indicated limit exists) and defines a good rate function which governs the large deviations of $\{P \circ R_T^{-1} : T \in (0,\infty)\}$ as $T \to \infty$.

At the beginning of this section we mentioned that there are certain technical difficulties associated with taking R to be the SKOROKHOD space D(R;C) of rightcontinuous paths w : R C which have a leftlimit at each t E R. The difficulties alluded to stem from the problem of putting a Polish topology on R which is the projective limit of Polish topologies on the SKOROKHOD spaces of paths on finite time intervals. To be precise, let I be a compact interval and denote by D ( I ;C) the space of rightcontinuous paths WI : I C which have a leftlimit at each t E I and are leftcontinuous at the right hand end of I . Using SKOROKHOD’S prescription, one can then put a metric PI on D ( I ;C) in such a way that ( D ( I ;C), P I ) is a complete, separable metric space and prconvergence of { w ~ , e } & to WI is equivalent to


+ supIX(t)  tl : X E LI tEI
where distc denotes the distance on C determined by the C ’s metric and LI stands for the group of increasing homeomorphisms of I onto itself. Furthermore, the P I ’ S can be chosen so that if I = [a,b] and J = [c,d],
where c 5 a and b 5 d , and if leftcontinuous at b, then
WJ,
w> are elements of D ( J ;C) which are
(5.4.28) The problem comes from the fact that W J E D ( J ;C ) and I J do not guarantee that W J is~ an~ element of D ( I ; C ) ,since W J need not be leftcontinuous at the right end of I . Worse, even if one replaces the restriction map by TI : D ( J ;C ) D ( I ;C ) given by

the situation in (5.4.28) does not improve substantially (i.e., the topologies still do not mesh correctly). For this reason, we will adopt a scheme for introducing a topology on D(R;C) which is slightly different from the one which we used for C(R;C ) . From now on, R will denote D(R;C); and, for compact intervals I, PI will be the metric introduced by SKOROKHOD on D ( I ; C ) . Given T E (O,m), we will use QT to denotes the space D ( (  T , T ) ; C ) of paths WT : (T,T) C which are rightcontinuous and have a left limit at each t E (T, T ) . Next, we define the metric dT on RT by

and we take


Rs, S f Finally, we define KT,S : 1 2 ~ Qs, 0 < S < T , and T S : R ( 0 ,m), to be the natural restriction mappings. As a relatively straightforward application of the fact that each w E 52 can have at most countably many points of discontinuity, one can use (5.4.28) to check all but the final assertion in the following lemma. The final assertion is a consequence of the wellknown facts that, for each compact interval I, the SKOROKHOD topology on D ( I ;C) restricts to the uniform topology on C ( I ;C) and that the Bore1 field of the SKOROKHOD topology is the aalgebra generated by the evaluation maps C t , t E I .
5.4.29 Lemma. Each of the spaces (RT, d ~ )T, E ( 0 , ~ is ) a complete separable metric space; and, for all 0 < S < T , dS(TT,SU,TT,SUk) 5 dT(WT,W&), w T ,w k E
%"I'
Moreover, (R, d ) is a complete, separable metric space which is homeomorphic to the projective limit of the sequence ( ( f & , ~ ~ + l , ~ , d :, )n E E+}; and ( t ,w)E (0,m) x R Otw is continuous. Finally, the relative topology which C(R; C) inherits as a subset of (R, d) coincides with the topology of uniform convergence on compacts, and ?3nis the aalgebra over R generated by the maps w E R &(w) E C, t E R.

Once one has the facts contained in Lemma 5.4.29, the argument used to prove Theorem 5.4.27 with R = C(R; C ) applies without change to the case when R = D(R;C). 5.4.30 Exercise.
Formulate and prove the analogue of Theorem 5.4.27 for the discreteparameter setting. 5.4.31 Exercise.
Let R be either C(R; C ) or D(R;C ) and let C' be a second Polish space. Suppose that F : R + C' is a B[~,~lmeasurable map for some T E [O,W), and assume that t E R F(Otw) E C' is an element of a' R' so that C : ( @ ( U ) ) = D ( R ; C ' ) for each w E R. Finally, define : R F(Otu) for t E R. Given a P E MT(R) which is hypermixing, show that P' = P o @' is a hypermixing element of Ms (0').
 
5.4.32 Exercise.
=

Let $\Omega = C(\mathbb{R};\Sigma)$, and suppose that $P \in M_1(\Omega)$ admits a good rate function $J : M_1(\Omega) \to [0,\infty]$ which governs the large deviations of $\{P \circ R_T^{-1} : T > 0\}$. Next, define the empirical position measure

and observe that $L_T(\omega) = R_T(\omega) \circ \Sigma_0^{-1}$. Thus, since $\omega \in \Omega \mapsto \Sigma_0(\omega) \in \Sigma$ is continuous, and, therefore, so is $R \in M_1(\Omega) \mapsto R \circ \Sigma_0^{-1} \in M_1(\Sigma)$, the final part of Lemma 2.1.4 says that

$I(\mu) = \inf\{J(R) : R \in M_1(\Omega) \text{ and } \mu = R \circ \Sigma_0^{-1}\}, \quad \mu \in M_1(\Sigma),$

is a good rate function which governs the large deviations of $\{P \circ L_T^{-1} : T > 0\}$.
228
Large Deviations

Now let R = D(W;C)and suppose that there exist P E Ml(R) and a good rate function J : M l ( 0 ) [O,m] which is related to P as in the preceding paragraph. What one would like is to repeat the argument just given and thereby show that the large deviations of { P o LT : T > 0 } are governed by a rate function of the sort described above. The problem is, of course, that w E R ,I Co(w) E C is no longer a continuous mapping. In order to circumvent this problem, one can take the following sequence of easy steps.

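(i) Set $\Omega_0 = \{\omega : \Sigma_0(\omega) = \Sigma_{0-}(\omega)\}$ and show that $\Omega_0$ is a $G_\delta$-subset of $\Omega$ and that $\omega \in \Omega_0 \mapsto \Sigma_0(\omega) \in \Sigma$ is continuous. Conclude that $M_1^0(\Omega) = \{Q \in M_1(\Omega) : Q(\Omega_0) = 1\}$ is a $G_\delta$-subset of $M_1(\Omega)$ and that $Q \in M_1^0(\Omega) \mapsto Q\circ\Sigma_0^{-1}$ is continuous. Finally, check that $M_1^S(\Omega) \subseteq M_1^0(\Omega)$.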


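(ii) For $(T,\omega) \in (0,\infty)\times\Omega$, define $\tilde\omega_T \in \Omega$ so that $\tilde\omega_T|_{[-T,T)} = \omega|_{[-T,T)}$ and $\theta_{2T}\tilde\omega_T = \tilde\omega_T$. Show that $(T,\omega) \in (0,\infty)\times\Omega \mapsto \tilde\omega_T \in \Omega$ is measurable and therefore so is $(T,\omega) \in (0,\infty)\times\Omega \mapsto \tilde R_T(\omega) \equiv R_T(\tilde\omega_T) \in M_1^S(\Omega)$. In addition, check that, for each $S \in [0,\infty)$,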

(iii) Suppose that $P \in M_1(\Omega)$ and that $J : M_1(\Omega) \to [0,\infty]$ is a good rate function which governs the large deviations of $\{P\circ R_T^{-1} : T \in (0,\infty)\}$ as $T \to \infty$. Show that $J|_{M_1^S(\Omega)}$ is a good rate function which governs the large deviations of $\{P\circ\tilde R_T^{-1} : T \in (0,\infty)\}$ as $T \to \infty$. Next, define
$$L_T(\omega) = \frac{1}{T}\int_0^T \delta_{\Sigma_t(\omega)}\,dt$$
and show that $\{P\circ L_T^{-1} : T \in (0,\infty)\}$ satisfies the full large deviation principle with respect to the good rate function
$$\mu \in M_1(\Sigma) \mapsto I(\mu) = \inf\{J(Q) : Q \in M_1^S(\Omega) \text{ and } Q\circ\Sigma_0^{-1} = \mu\}.$$
In particular, when $P \in M_1^S(\Omega)$ is hypermixing, conclude that $\{P\circ L_T^{-1} : T \in (0,\infty)\}$ satisfies the full large deviation principle with the good rate function $I : M_1(\Sigma) \to [0,\infty]$ given by
$$I(\mu) = \inf\{H(Q) : Q\circ\Sigma_0^{-1} = \mu\}. \tag{5.4.33}$$
5.4.34 Exercise.
Let $P \in M_1^S(\Omega)$ be hypermixing. Starting from (5.4.14), show that, for each compact interval $I$, $V \in B(\Omega_I;\mathbb{R}) \mapsto \Lambda_I(V) \in \mathbb{R}$ is a continuous function of bounded, pointwise convergence. (Hint: See the proof of Lemma 4.1.40.) Conclude that
(5.4.35)
for $Q \in M_1(\Omega)$.
5.4.36 Exercise.
Let $P(t,\sigma,\cdot)$ be a transition probability function on $\Sigma$ and assume that the corresponding Markov family $\{P_\sigma : \sigma \in \Sigma\}$ can be realized on $D([0,\infty);\Sigma)$. Also, suppose that there is precisely one $P(t,\sigma,\cdot)$-invariant $\mu \in M_1(\Sigma)$; and denote by $P$ the unique element of $M_1^S(\Omega)$ with the property that
for $-\infty < s < t < \infty$ and $\Gamma \in \mathcal{B}_\Sigma$. (Obviously, $P\circ\Sigma_t^{-1} = \mu$ for all $t \in \mathbb{R}$.) Finally, assume that $P$ is hypermixing. The purpose of this exercise is to see when the rate function $I$ in (5.4.33) can be identified with one of the rate functions which we produced in Section 4.2.
(i) Show that if p = rn is P(t,a,)reversing, then I = J E , where JE is defined from the associated DIRICHLET form E (cf. (4.2.47)) as in (4.2.49). (Hint: Use (5.4.18) with I = (0) and Exercise 5.3.15.) (ii) The nonreversible case is not so satisfactory. To see what sort of thing as in Exercise 5.1.17, and J p and J p can be said, define i p , A;, and as in (4.2.38) and (4.2.36). Noting that A{,) 5 A p (cf. (4.2.21)), show that I 2 J p . Next, if, for some V E B(C;R), (5.4.38)
xi
show that i p ( V ) 5 Ai0)(V). Conclude from this, Exercise 5.1.17, and Exercise 5.4.34 that I = J p when (5.4.38) holds for every V E B(C;R). Similarly, when P ( t ,c7, ) is FELLERcontinuous, show that, when (5.4.38) holds for every V E Cb(C; W), I must equal J p .
5.4.39 Exercise.
One of the more remarkable features of the hypermixing property is its behavior under products. To be precise, let $\mathfrak{I}$ be a countable index set and for each $i \in \mathfrak{I}$ let $P_i$ be a hypermixing element of $M_1^S(D(\mathbb{R};\Sigma_i))$, where each $\Sigma_i$ is a Polish space. Further, assume that there are functions $\alpha$, $\beta$, and $\gamma$ satisfying (5.4.1) such that (H1) and (H2) hold with $P = P_i$ for all $i \in \mathfrak{I}$. After making the obvious identification of
show that $\prod_{i\in\mathfrak{I}} P_i$ determines an element of
which is hypermixing with the same choice of functions $\alpha$, $\beta$, and $\gamma$.
5.4.40 Exercise.
Define the $\tau$-topology on $M_1(\Omega)$ to be the weakest topology with respect to which the mapping
is continuous for each compact interval $I$ and $V \in B(\Omega_I;\mathbb{R})$. Given a $\Gamma \subseteq M_1(\Omega)$, let $\Gamma^{\circ\tau}$ and $\overline{\Gamma}^{\tau}$ denote, respectively, the interior and closure of $\Gamma$ in the $\tau$-topology. Assuming that $P \in M_1^S(\Omega)$ is hypermixing, show that, for every measurable $\Gamma \subseteq M_1(\Omega)$,
$$-\inf_{Q\in\Gamma^{\circ\tau}} H(Q) \le \varliminf_{t\to\infty}\frac{1}{t}\log\bigl(P(\{\omega : R_t(\omega) \in \Gamma\})\bigr)$$
(Hint: Use the estimate on which (5.4.14) is based and apply Theorem 3.2.21.) With the preceding in hand, one sees that it would have been possible to avoid some of the difficulties associated with the Skorokhod topology by proceeding along a line of reasoning like the one which we used to complete the program in Section 4.4.
5.5 Hypermixing in the Epsilon Markov Case
In this section, we develop a sufficient condition for the hypermixing property to hold. Throughout, $\Omega$ will denote the space $D(\mathbb{R};\Sigma)$ (cf. the discussion following Theorem 5.4.27) and $P$ will denote a fixed element of $M_1^S(\Omega)$.

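Recall the $\sigma$-algebras $\mathcal{B}_I = \sigma(\{\Sigma_t : t \in I\})$, where $I$ runs over intervals in $\mathbb{R}$. We will use $B_I(\Omega;\mathbb{R})$ to denote the subset of $f \in B(\Omega;\mathbb{R})$ which are $\mathcal{B}_I$-measurable. Also, given $I$, choose $\omega \in \Omega \mapsto P^I_\omega \in M_1(\Omega)$ to be a regular conditional probability distribution of $P$ given $\mathcal{B}_I$ and define $E_I : B(\Omega;\mathbb{R}) \to B_I(\Omega;\mathbb{R})$ so that $E_I f(\omega) = \int_\Omega f(\omega')\,P^I_\omega(d\omega')$. Notice that, by Jensen's inequality,
(5.5.1)
where
for $p, q \in [1,\infty]$ and any operator $K$ defined on the bounded measurable functions $B((E,\mathcal{F});\mathbb{R})$ of a measure space $(E,\mathcal{F},\mu)$. In addition, by shift-invariance, one has that
$$E_{s+I}f = [E_I(f\circ\theta_{-s})]\circ\theta_s \quad (\text{a.s., } P) \tag{5.5.3}$$
for all $s \in \mathbb{R}$ and $f \in B(\Omega;\mathbb{R})$. Using $E_s^-$ and $E_s^+$ to denote $E_{(-\infty,s]}$ and $E_{[s,\infty)}$, respectively, we now define $P_t : B(\Omega;\mathbb{R}) \to B(\Omega;\mathbb{R})$ for $t \in (0,\infty)$ by
$$P_t f = E_0^-\bigl[E_0^+(f\circ\theta_t)\bigr] = E_0^-\bigl[(E_{-t}^+ f)\circ\theta_t\bigr]. \tag{5.5.4}$$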
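Obviously,
$$\|P_t\|_{L^p(P)\to L^p(P)} = 1, \qquad p \in [1,\infty]. \tag{5.5.5}$$
In addition, if $f \in B_0^+(\Omega;\mathbb{R}) \equiv B_{[0,\infty)}(\Omega;\mathbb{R})$ and $0 < s < t < \infty$, then (cf. (5.5.3)) $P$-almost surely:
$$P_t f = E_0^-(f\circ\theta_t) = E_0^- E_{t-s}^-(f\circ\theta_t) = E_0^-\bigl([E_0^-(f\circ\theta_s)]\circ\theta_{t-s}\bigr) = E_0^-\bigl[(P_s f)\circ\theta_{t-s}\bigr];$$
and therefore, by (5.5.1), we see that
$$\|P_t f\|_{L^p(P)} \le \|P_s f\|_{L^p(P)} \tag{5.5.6}$$
for $p \in [1,\infty]$ and $f \in B_0^+(\Omega;\mathbb{R})$.
yields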
(5.5.7)
7
$0 < s$ ... $\{P_t : t > 0\}$ to denote the Markov semigroup on $B(\Sigma;\mathbb{R})$ which is determined by $P(t,\sigma,\cdot)$, and we will suppose that there is a $\{P_t : t > 0\}$-invariant $\mu \in M_1(\Sigma)$. Finally, we will denote by $P$ the unique element of $M_1^S(\Omega)$ ($\Omega = D(\mathbb{R};\Sigma)$) with the properties that
$$P\circ\Sigma_0^{-1} = \mu \quad\text{and}\quad P(A\cap B) = \int_A P_{\Sigma_0(\omega)}(B)\,P(d\omega) \tag{6.1.1}$$
for $A \in \mathcal{B}_{(-\infty,0]}$ and $B \in \mathcal{B}_{[0,\infty)}$. It should be obvious that, in the terminology of Section 5.5, $P$ is 0-Markov. In fact, the $P_t$ in (5.5.4) is given by
for $f \in B_0^+(\Omega;\mathbb{R})$; and, as a consequence, Theorem 5.5.17 is easily seen to become the following statement.
6.1.3 Theorem. The $P$ in (6.1.1) is hypermixing if and only if
$$\|P_T\|_{L^2(\mu)\to L^4(\mu)} = 1 \quad\text{for some } T \in (0,\infty). \tag{6.1.4}$$
A Markov semigroup for which (6.1.4) holds is said to be $\mu$-hypercontractive; and it is our goal to find conditions which guarantee this hypercontractive property. As a preliminary step in this direction, the following result is often useful.
6.1.5 Lemma. If $\|P_T\|_{L^2(\mu)\to L^4(\mu)} = 1$, then
$$\|P_t\phi - \langle\phi\rangle_\mu\|_{L^2(\mu)} \le 3^{-[t/T]/2}\,\|\phi\|_{L^2(\mu)} \tag{6.1.6}$$
for $t \in [T,\infty)$ and $\phi \in B(\Sigma;\mathbb{R})$; where we have introduced the notation
$$\langle\phi\rangle_\mu = \int_\Sigma \phi\,d\mu. \tag{6.1.7}$$
Conversely, if, for some $T_0, T_1 \in (0,\infty)$,
$$M_0 \equiv \|P_{T_0}\|_{L^2(\mu)\to L^4(\mu)} < \infty$$
then $\{P_t : t > 0\}$ is $\mu$-hypercontractive.
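PROOF: The first assertion is simply a translation of Lemma 5.5.11 into the present context. To prove the second part, write $\phi = a + \psi$, where $a = \langle\phi\rangle_\mu$. Then, by Hölder's inequality, for $t > T_0 \vee T_1$:
$$\begin{aligned}
\|P_t\phi\|^4_{L^4(\mu)} &\le a^4 + 6a^2\|P_t\psi\|^2_{L^2(\mu)} + 4|a|\,\|P_t\psi\|^3_{L^3(\mu)} + \|P_t\psi\|^4_{L^4(\mu)} \\
&\le a^4 + 8a^2\|P_t\psi\|^2_{L^2(\mu)} + 3\|P_t\psi\|^4_{L^4(\mu)} \\
&\le a^4 + 8\rho^{2[t/T_1]}a^2\|\psi\|^2_{L^2(\mu)} + 3M_0^4\rho^{4[(t-T_0)/T_1]}\|\psi\|^4_{L^2(\mu)},
\end{aligned}$$
where we have used (6.1.8) in the passage to the last line. Finally, we choose $t > T_0 \vee T_1$ so that $8\rho^{2[t/T_1]} \le 2$ and $3M_0^4\rho^{4[(t-T_0)/T_1]} \le 1$, and thereby obtain
$$\|P_t\phi\|^4_{L^4(\mu)} \le a^4 + 2a^2\|\psi\|^2_{L^2(\mu)} + \|\psi\|^4_{L^2(\mu)} = \bigl(a^2 + \|\psi\|^2_{L^2(\mu)}\bigr)^2 = \|\phi\|^4_{L^2(\mu)}.$$
∎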
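The Hölder step in the proof above (the passage from the first to the second line of the chain) can be sanity-checked numerically. The following is only a sketch on a randomly generated finite probability space; the helper names, the sample sizes, and the value ranges are illustrative choices and are not part of the text. Here `x` plays the role of the mean-zero function $P_t\psi$ and `a` the role of the constant $\langle\phi\rangle_\mu$.

```python
import random

def fourth_moment_bound(values, weights, a):
    """Return (lhs, rhs) with lhs = E[(a + x)^4] and
    rhs = a^4 + 8 a^2 E[x^2] + 3 E[x^4], where x is `values` recentred
    to have mean zero under the probability vector `weights`."""
    mean = sum(w * v for v, w in zip(values, weights))
    x = [v - mean for v in values]
    second = sum(w * v ** 2 for v, w in zip(x, weights))
    fourth = sum(w * v ** 4 for v, w in zip(x, weights))
    lhs = sum(w * (a + v) ** 4 for v, w in zip(x, weights))
    rhs = a ** 4 + 8 * a ** 2 * second + 3 * fourth
    return lhs, rhs

def random_trial(rng, n=6):
    """Draw a random probability vector, random values, and a random constant."""
    raw = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(raw)
    weights = [r / total for r in raw]
    values = [rng.uniform(-5.0, 5.0) for _ in range(n)]
    return fourth_moment_bound(values, weights, rng.uniform(-5.0, 5.0))

rng = random.Random(0)
worst_gap = max(lhs - rhs for lhs, rhs in (random_trial(rng) for _ in range(2000)))
print("largest value of lhs - rhs over all trials:", worst_gap)
```

The gap `lhs - rhs` should never be positive beyond rounding error, since $4|a|\,\|x\|^3_{L^3} \le 2a^2\|x\|^2_{L^2} + 2\|x\|^4_{L^4}$.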
The next result is a typical application of Lemma 6.1.5.
VI Analytic Considerations
6.1.9 Theorem. Suppose that there exist $T_0, T_1 \in (0,\infty)$ for which
$$P(T_i,\sigma,d\tau) = p(T_i,\sigma,\tau)\,\mu(d\tau), \qquad i \in \{0,1\} \text{ and } \sigma \in \Sigma,$$
and there is an $\epsilon > 0$ such that
Then $\{P_t : t > 0\}$ is $\mu$-hypercontractive.
PROOF: Obviously, $\|P_{T_0}\|_{L^2(\mu)\to L^4(\mu)} < \infty$. Thus, all that we have to do is check that
for some $t \in (0,\infty)$. But, by (ii) of Exercise 4.1.48 with $\Pi(\sigma,\cdot) = P(T_1,\sigma,\cdot)$, we see that (4.1.50) says that
Hence, if $E_0$ is the operator on $L^1(\mu)$ which takes $\phi \in L^1(\mu)$ into the constant function $\langle\phi\rangle_\mu$, then (because $\mu$ is $P(t,\sigma,\cdot)$-invariant)
$$\|P_t - E_0\|_{L^1(\mu)\to L^1(\mu)} \le 2 \quad\text{for every } t \in (0,\infty),$$
and, by the preceding,
Hence, by the Riesz-Thorin Interpolation Theorem,
and clearly this means that we need only take $t = nT_1$ for some sufficiently large $n \in \mathbb{Z}^+$. ∎
6.1.10 Remark. Theorem 6.1.9 makes it reasonably clear where hypermixing stands in relation to the hypothesis under which we proved our large deviation principle in Section 4.2. Namely, hypermixing is implied by the following strong version of $(\tilde{\mathbf{U}})$:
(SU)
for some $T_1, T_2 \in (0,\infty)$ and $M \in [1,\infty)$. Indeed, there is then (cf. Exercise 4.2.59) precisely one $\{P_t : t > 0\}$-invariant $\mu \in M_1(\Sigma)$; and, by Theorem 6.1.9, the corresponding $P$ is necessarily hypermixing. Even though (SU) implies hypermixing, it is easy to see that $(\tilde{\mathbf{U}})$ itself does not always lead to hypermixing processes. For example, uniform rotation on $S^1$ certainly satisfies $(\tilde{\mathbf{U}})$ and is certainly not hypermixing. On the other hand, as the following example demonstrates, there are important hypermixing processes for which (SU) fails.

6.1.11 Example. Define $\gamma_t : \mathbb{R} \to (0,\infty)$ for $t \in (0,\infty)$ by
and let
The corresponding Markov process is the famous Ornstein-Uhlenbeck process; and, as is well known, the associated measures $\{P_x : x \in \mathbb{R}\}$ live on $C([0,\infty);\mathbb{R})$. In fact, $P_x$ is the distribution under Wiener's measure $\mathcal{W}$ of the solution $X : [0,\infty)\times\Theta \to \mathbb{R}$ to
$$X(t,\theta) = x + \theta(t) - \frac{1}{2}\int_0^t X(s,\theta)\,ds.$$
(See Section 1.3 for the notation here.) Furthermore, it is obvious that $m(dx) = \gamma_1(x)\,dx$ is the one and only $\{P_t : t > 0\}$-invariant measure but that (SU) cannot be satisfied by $P(t,x,\cdot)$ for any choice of $T_1$ and $T_2$. Nonetheless, as we are about to see, $\{P_t : t > 0\}$ is $m$-hypercontractive, and therefore the corresponding $P$ in (6.1.1) is hypermixing. To verify the preceding assertion, first note that
where
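From this expression it is easily seen that
$$\iint p(t,x,y)^4\,m^2(dx\times dy) < \infty,$$
and therefore $\|P_t\|_{L^2(m)\to L^4(m)} < \infty$ for all sufficiently large $t \in (0,\infty)$. Thus, by Lemma 6.1.5, all that remains is to check that the second part of (6.1.8) holds. To this end, observe that (as the preceding expression makes explicit) $m$ is $P(t,x,\cdot)$-reversing, and therefore, by (4.2.46) and (4.2.57),
$$\|P_t\phi - \langle\phi\rangle_m\|_{L^2(m)} \le e^{-\lambda t}\|\phi\|_{L^2(m)}, \qquad t \in (0,\infty) \text{ and } \phi \in L^2(m),$$
where
$$\lambda = \inf\bigl\{\mathcal{E}(\phi,\phi) : \phi \in L^2(m) \text{ and } \|\phi - \langle\phi\rangle_m\|_{L^2(m)} = 1\bigr\}. \tag{6.1.13}$$
Since $C_b^\infty(\mathbb{R};\mathbb{R})$ is $\{P_t : t > 0\}$-invariant and
$$\mathcal{E}(\phi,\phi) = -\Bigl(\phi, \frac{d}{dt}P_t\phi\Big|_{t=0}\Bigr)_{L^2(m)} = -\frac{1}{2}\int_{\mathbb{R}}\bigl(\phi''(x) - x\phi'(x)\bigr)\phi(x)\,m(dx) = \frac{1}{2}\|\phi'\|^2_{L^2(m)}$$
and $(P_t\phi)' = e^{-t/2}P_t(\phi')$ for $\phi \in C_b^\infty(\mathbb{R};\mathbb{R})$, we know that
$$-\frac{d}{dt}\|P_t\phi\|^2_{L^2(m)} = 2\mathcal{E}(P_t\phi,P_t\phi) = e^{-t}\|P_t(\phi')\|^2_{L^2(m)} \le e^{-t}\|\phi'\|^2_{L^2(m)} = 2e^{-t}\mathcal{E}(\phi,\phi),$$
first for all $\phi \in C_b^\infty(\mathbb{R};\mathbb{R})$ and thence for all $\phi \in L^2(m)$. (We are using primes here to denote derivatives with respect to $x$.) Finally, since $P_t\phi \to \langle\phi\rangle_m$ in $L^2(m)$ as $t \to \infty$, we now have:
$$\|\phi - \langle\phi\rangle_m\|^2_{L^2(m)} \le 2\mathcal{E}(\phi,\phi).$$
Hence, $\lambda \ge \frac{1}{2}$ and therefore the second part of (6.1.8) holds for all $T_1 \in (0,\infty)$. Actually, $\lambda = \frac{1}{2}$ since
$$\|\phi - \langle\phi\rangle_m\|^2_{L^2(m)} = 2\mathcal{E}(\phi,\phi)$$
when $\phi(x) = x$, $x \in \mathbb{R}$.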
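The conclusion of Example 6.1.11, namely $\|\phi - \langle\phi\rangle_m\|^2_{L^2(m)} \le 2\mathcal{E}(\phi,\phi) = \|\phi'\|^2_{L^2(m)}$ with equality for $\phi(x) = x$, can be checked numerically. The sketch below assumes that $m$ is the standard Gaussian measure on $\mathbb{R}$ (as the formula $U(x) = (x^2 + \log 2\pi)/2$ quoted in Section 6.2 indicates); the trapezoid quadrature, its cutoff and step size, and the test functions are illustrative choices only.

```python
import math

def gauss_integral(f, half_width=12.0, step=0.01):
    """Integrate f against the standard Gaussian measure with the trapezoid rule."""
    n = int(round(2 * half_width / step))
    total = 0.0
    for k in range(n + 1):
        x = -half_width + k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * f(x) * math.exp(-0.5 * x * x)
    return total * step / math.sqrt(2.0 * math.pi)

def variance_and_twice_energy(phi, dphi):
    """Return (||phi - <phi>_m||^2, ||phi'||^2), both computed in L^2(m)."""
    mean = gauss_integral(phi)
    variance = gauss_integral(lambda x: (phi(x) - mean) ** 2)
    twice_energy = gauss_integral(lambda x: dphi(x) ** 2)
    return variance, twice_energy

# Each entry pairs a test function with its derivative.
examples = {
    "x": (lambda x: x, lambda x: 1.0),
    "x^2": (lambda x: x * x, lambda x: 2.0 * x),
    "x^3": (lambda x: x ** 3, lambda x: 3.0 * x * x),
    "sin x": (math.sin, math.cos),
}
for name, (phi, dphi) in examples.items():
    variance, twice_energy = variance_and_twice_energy(phi, dphi)
    print(name, variance, twice_energy)
```

Only the linear function should give two (numerically) equal numbers, which is consistent with $\lambda = \frac{1}{2}$ being attained at $\phi(x) = x$.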
At least when $\mu$ is $P(t,\sigma,\cdot)$-reversing, the preceding example indicates that the property of $\mu$-hypercontractivity is closely related to properties of the associated Dirichlet form. This connection is spelled out most precisely in the following version of a theorem due to L. Gross [56].
6.1.14 Theorem. (Gross) Suppose that $m \in M_1(\Sigma)$ is $P(t,\sigma,\cdot)$-reversing; and let $\mathcal{E}$ be the associated Dirichlet form (cf. (4.2.47)). Given $\alpha \in (0,\infty)$ and $\beta \ge 0$,
(LS)
if and only if
(6.1.15)
for $1 < p \le q < \infty$ and $t \in (0,\infty)$ with $e^{4t/\alpha} \ge (q-1)/(p-1)$. In fact, (6.1.15) with $p = 2$ implies (LS) and therefore (6.1.15) for general $p \in (1,\infty)$.
PROOF: Recall the operator $\bar L$ which generates the semigroup $\{\bar P_t : t > 0\}$ on $L^2(m)$ (cf. the discussion preceding (4.2.46)), let $\phi \in B(\Sigma;(0,\infty)) \cap \mathrm{Dom}(\bar L)$ be given, and set $\phi_t = \bar P_t\phi$. Then, for any $q \in [1,\infty)$,
Note (cf. the argument leading to (4.2.54)) that, for any $\psi \in B(\Sigma;[0,\infty)) \cap \mathrm{Dom}(\bar L)$,
where we have used the fact that, for any $a, b \in (0,\infty)$ and $q \in [1,\infty)$,
which follows, in turn, from
for $\eta \in (1,\infty)$. Hence, we now see that
At the same time, if $t \in (0,\infty) \mapsto q(t) \in (1,\infty)$ is smooth and $\psi \in B(\Sigma;[0,\infty))$, then
Therefore, after combining this with the above, we have that for smooth $t \in (0,\infty) \mapsto q(t) \in (1,\infty)$ and $\phi \in B(\Sigma;[0,\infty)) \cap \mathrm{Dom}(\bar L)$:
(6.1.16)
Now suppose that (LS) holds and, for given $p \in (1,\infty)$, set $q(t) = 1 + (p-1)e^{4t/\alpha}$. Then $q'(t) = 4(q(t)-1)/\alpha$ and so (6.1.16) says that
and therefore that
at least for $\phi \in B(\Sigma;[0,\infty)) \cap \mathrm{Dom}(\bar L)$. Since the passage from this to general $\phi \in L^p(m)$ is trivial and $\|P_t\|_{L^r(m)\to L^r(m)} = 1$ for all $r \in [1,\infty]$, we have now proved that (LS) implies (6.1.15). On the other hand, if one takes $q(t) = 1 + e^{4t/\alpha}$, then one finds that (6.1.16) becomes an equality at $t = 0$. Hence, when (6.1.15) holds with $p = 2$ and therefore $\frac{d}{dt}\|\phi_t\|_{L^{q(t)}(m)} \le 0$ at $t = 0$, (LS) follows for $\phi \in B(\Sigma;[0,\infty)) \cap \mathrm{Dom}(\bar L)$. At this point it is an easy step to (LS) for all $\phi \in B(\Sigma;[0,\infty))$ and thence, via (4.2.54), for all $\phi \in L^2(m)$. ∎
An estimate of the form in (LS) is called a logarithmic Sobolev inequality.
6.1.17 Corollary. Assume that $m \in M_1(\Sigma)$ is $P(t,\sigma,\cdot)$-reversing and define $\Lambda_{\mathcal{E}}$ and $J_{\mathcal{E}}$ accordingly (as in (4.2.62) and (4.2.60), respectively). Then the following three properties are equivalent:
(6.1.18)
(6.1.19)
with $e^{4t/\alpha} \ge (q-1)/(p-1)$, and
$$\Lambda_{\mathcal{E}}(V) \le \frac{1}{\alpha}\log\Bigl(\int_\Sigma \exp[\alpha V]\,dm\Bigr), \qquad V \in C_b(\Sigma;\mathbb{R}). \tag{6.1.20}$$
Moreover, if any one of these holds, then
(6.1.21)
for $t \in (0,\infty)$ and $\phi \in L^2(m)$.
PROOF: Note that (6.1.18) is equivalent to (LS), first for non-negative $\phi$'s and then (by (4.2.54)) for all $\phi$'s. Thus, by Theorem 6.1.14, (6.1.18) and (6.1.19) are equivalent. At the same time, the equivalence of (6.1.18) to (6.1.20) is the content of Exercise 5.3.15. Finally, by (6.1.6), one knows that $\|P_t\|_{L^2(m)\to L^4(m)} = 1$ implies that $\|P_t\phi - \langle\phi\rangle_m\|_{L^2(m)} \le 3^{-1/2}\|\phi\|_{L^2(m)}$. In particular, when (6.1.19) holds, then one can take $t = (\alpha\log 3)/4$. After combining this with the Spectral Theorem (cf. (4.2.57)), one concludes that $E_0\phi = \langle\phi\rangle_m$, that $E_\lambda - E_0 = 0$ for $\lambda \in [0,2/\alpha)$, and therefore that (6.1.21) holds. ∎
We conclude this section with a result which sharpens for the reversible setting the sort of topics treated in Theorem 5.5.12 and Lemma 6.1.5.
6.1.22 Theorem. Assume that $m$ is $P(t,\sigma,\cdot)$-reversing.
(i) Suppose that $\|P_T\|_{L^p(m)\to L^q(m)} = 1$ for some $T \in (0,\infty)$ and $1 < p < q < \infty$. Then (6.1.18) holds with
(6.1.23)
In particular, if $\{P_t : t > 0\}$ is $m$-hypercontractive at time $T \in (0,\infty)$, then
$$\|P_t\|_{L^p(m)\to L^q(m)} = 1 \tag{6.1.24}$$
for $1 < p < q < \infty$ and $t \in (0,\infty)$ with $e^{t/T} \ge (q-1)/(p-1)$.
(ii) Assume that
$$\|\phi - \langle\phi\rangle_m\|^2_{L^2(m)} \le \gamma\,\mathcal{E}(\phi,\phi), \qquad \phi \in L^2(m), \tag{6.1.25}$$
and that (LS) holds for some $\alpha, \beta, \gamma \in (0,\infty)$. Then
and so $\{P_t : t > 0\}$ is $m$-hypercontractive.
PROOF: To prove (i) we will use the criterion provided by the equivalence of (6.1.18) and (6.1.20). To this end, we first show that, for given $V \in B(\Sigma;\mathbb{R})$,
where $\alpha = \alpha(T,p,q)$. Indeed, for $\phi \in B(\Sigma;[0,\infty))$, set
$$a_{n,t}(\sigma) = \int_\Omega \exp\Bigl[\sum_{m=0}^{n-1} T\,V\bigl(\Sigma_{mT}(\omega)\bigr)\Bigr]\,\phi\bigl(\Sigma_{nT-t}(\omega)\bigr)\,P_\sigma(d\omega), \qquad \sigma \in \Sigma.$$
Then, by Theorem 4.2.25, Jensen's inequality, and the Markov property:
Large Deviations
and so
But
and therefore
Since, by our hypothesis and HOLDER'Sinequality, it is easy to see that

m. we now get the asserted estimate after letting n To complete the proof of (i), we reason as follows. If p = 2, there is nothing to do, since 1 AE(V) = t'iE ;log
)
(ll~~llL2(m)+Lz(m)
On the other hand, if 1 < p < 2, then (by precisely the same argument as we used to prove (5.5.14)) we can find a TI E (0,m) for which IJPT,IILP(m)tLZ(m) = 1; and therefore = lim log 1 n+m
nT
(IIV '~T+T,I~L~(~)~L~(~)
A similar argument applies when 2 < p < 00. To prove (ii), we will show that
(6.1.26)
and clearly this will lead immediately to the desired result. Note that in order to prove (6.1.26), it suffices to show that
J,(1 + t$)' log ((1+ t*y ) dm I t2 J, q2log(@) dm + 2t2, +
for all II, E L 2 ( m )with be given and set
t
t2
($),
€or t E R. Then fa(0) = log(1
+
= 0 and
I l $ l l ~ z ( ~= )
ER
1. To this end, let 6 > 0
+ S),
(1 t$)lCIlog((l
+ t$)' + 6)drn + 2 1
+ log(1 + t 2 )+
J, $2
+
(1 t*)3* (l+t*)2+sdm
log($') dm] ,
and (1
+ tI+q2 + 6
dm
+ 10
(1+ t*)* [(1+ t*)2
< 2 log (1 + 
m) 6  [4A(t,6)'
dm2
+ &I2
 10A(t,6)]
4t2 1 t2
+
2
5 2 1 0 4 1 + 6) + 4 where
and we have used Jensen's inequality in the passage to the last line. From these and Taylor's Theorem, we conclude that
and therefore the required estimate follows once one lets $\delta \searrow 0$. ∎
6.1.27 Exercise. Referring to Lemma 5.3.5, let $u \in D \cap B(\Sigma;[1,\infty))$ be given and define $m_u \in M_1(\Sigma)$ and the transition probability function $P_u(t,\sigma,\cdot)$ accordingly.
(i) Show that for any $\phi \in B(\Sigma;\mathbb{R})$ and $\mu \in M_1(\Sigma)$
$$\int_\Sigma \phi^2\log\Bigl(\frac{\phi^2}{\|\phi\|^2_{L^2(\mu)}}\Bigr)\,d\mu = \inf\Bigl\{\int_\Sigma\bigl[\phi^2\log\phi^2 - \phi^2\log t - \phi^2 + t\bigr]\,d\mu : t \in (0,\infty)\Bigr\}. \tag{6.1.28}$$
Next, check that $x\log x - x\log t - x + t \ge 0$ for all $(t,x) \in (0,\infty)\times[0,\infty)$; and use this in conjunction with (6.1.28) to show that (6.1.29)
H(vImu)
IJJuJJ%H(vJm), v E Ml(C).
(ii) Let &, denote the DIRICHLET form associated with P,(t, Using (4.2.54), show that (6.1.30)
.) and mu.
(T,
& ( A4) I l141il(m)~u(4,4), 4 E B ( C ;w.
(iii) By combining parts (i) and (ii), show that (6.1.18) implies that H(vlmu)
I (WIIu114BJEu(4,
v E Ml(C).
In particular, this means that the hypermixing property is preserved by the transformation described in Lemma 5.3.5.
6.1.31 Exercise.
Let $m \in M_1(\Sigma)$ be a $P(t,\sigma,\cdot)$-reversing measure. More familiar than logarithmic Sobolev inequalities are classical Sobolev inequalities of the form
$$\|\phi\|^2_{L^p(m)} \le A\bigl(\mathcal{E}(\phi,\phi) + B\|\phi\|^2_{L^2(m)}\bigr), \qquad \phi \in B(\Sigma;\mathbb{R}), \tag{6.1.32}$$
for some $p \in (2,\infty)$ and $A, B \in [0,\infty)$. One naturally expects that a classical Sobolev inequality ought to be a stronger statement than a logarithmic one. To verify this, let $\phi \in B(\Sigma;\mathbb{R})$ with $\|\phi\|_{L^2(m)} = 1$ be given, and use Jensen's inequality to check that
Thus, (6.1.32) implies that
In particular, if one has, in addition to (6.1.32), that
then, by part (ii) of Theorem 6.1.22,
+
PA(1 BC) P2
+ 12.
JE(v), v E M1(C).
6.1.33 Exercise.
In his article [56], Gross considered the "two-point" space $\Sigma = \{-1,1\}$ with the Bernoulli measure $m = (\delta_{-1} + \delta_1)/2$ and the transition probability function
$$P(t,\sigma,\{\tau\}) = \begin{cases} \dfrac{1+e^{-t}}{2} & \text{if } \sigma = \tau, \\[1ex] \dfrac{1-e^{-t}}{2} & \text{if } \sigma \ne \tau. \end{cases}$$
Obviously, $m$ is $P(t,\sigma,\cdot)$-reversing. Using $\mathcal{E}$ to denote the Dirichlet form associated with $P(t,\sigma,\cdot)$ and $m$, show that
(6.1.34)
and conclude from this that the associated semigroup $\{P_t : t > 0\}$ has the property that $\|P_t\|_{L^p(m)\to L^q(m)} = 1$ as long as $1 < p < q < \infty$ and $e^{2t} \ge (q-1)/(p-1)$. Finally, check that (6.1.34) is optimal.
Hint: First observe that it suffices to prove (6.1.34) for $\phi$'s of the form $\phi_b(\sigma) = 1 + b\sigma$, where $b \in [0,1]$; and then show that (6.1.34) for $\phi_b$ is equivalent to
$$h(b) \equiv (1+b)^2\log(1+b) + (1-b)^2\log(1-b) - (1+b^2)\log(1+b^2) \le 2b^2$$
for $b \in [0,1]$. Finally, prove the preceding by checking that $h(0) = h'(0) = 0$ and that $h''(b) \le 4$.
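Both claims of Exercise 6.1.33 lend themselves to a quick numerical experiment. The sketch below is not a proof: it evaluates $h(b) - 2b^2$ on a grid, and it compares $\|P_t\phi_b\|_{L^q(m)}$ with $\|\phi_b\|_{L^p(m)}$ at the critical time $e^{2t} = (q-1)/(p-1)$ and at half that time, using the identity $P_t\phi_b = \phi_{e^{-t}b}$ which follows from the transition function above. The grid, the choice $(p,q) = (2,4)$, and all function names are illustrative assumptions.

```python
import math

def h(b):
    """h(b) from the hint; the term (1 - b)^2 log(1 - b) is read as 0 at b = 1."""
    last = 0.0 if b == 1.0 else (1 - b) ** 2 * math.log1p(-b)
    return (1 + b) ** 2 * math.log1p(b) + last - (1 + b * b) * math.log1p(b * b)

def lp_norm(b, p):
    """L^p(m) norm of phi_b(sigma) = 1 + b * sigma for m = (delta_{-1} + delta_1) / 2."""
    return (0.5 * abs(1 + b) ** p + 0.5 * abs(1 - b) ** p) ** (1.0 / p)

def norm_ratio(b, t, p, q):
    """||P_t phi_b||_q / ||phi_b||_p, using P_t phi_b = phi_{exp(-t) b}."""
    return lp_norm(math.exp(-t) * b, q) / lp_norm(b, p)

grid = [k / 1000 for k in range(1, 1001)]

# Largest value of h(b) - 2 b^2 on the grid; it should not be positive.
max_excess = max(h(b) - 2 * b * b for b in grid)

p, q = 2.0, 4.0
t_crit = 0.5 * math.log((q - 1) / (p - 1))  # solves exp(2 t) = (q - 1) / (p - 1)
sup_at_crit = max(norm_ratio(b, t_crit, p, q) for b in grid)
sup_below = max(norm_ratio(b, 0.5 * t_crit, p, q) for b in grid)

print("max of h(b) - 2 b^2:", max_excess)
print("sup ratio at the critical time:", sup_at_crit)
print("sup ratio at half the critical time:", sup_below)
```

On this family the ratio stays at or below 1 at the critical time and exceeds 1 for the shorter time, while $h(b)/(2b^2)$ approaches 1 as $b \to 0$, which is the sense in which the constant cannot be improved.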
6.1.35 Exercise. Referring to the situation in Corollary 6.1.17 and assuming (6.1.18) holds, show that
$$H(\nu P_t|m) \le \exp\Bigl[-\frac{4t}{\alpha}\Bigr]H(\nu|m), \qquad \nu \in M_1(\Sigma) \text{ and } t \in [0,\infty). \tag{6.1.36}$$
Hint: Assuming that $f$ is a uniformly positive element of $\mathrm{Dom}(\bar L)$ which is bounded, set $f_t = [\bar P_t f]$ and check that
Next, using (4.2.54) in the same sort of way that we used it in the proof of Theorem 6.1.14, show that
6.2 Symmetric Diffusions on a Manifold
The purpose of this section is to provide a ready source of examples to which the results in Chapter V and Section 6.1 are applicable. The setting in which we will be working is that of differentiable manifolds. Thus, we will assume that $\Sigma$ is a separable, connected, $N$-dimensional $C^\infty$-manifold on which there is given a complete Riemannian structure; and we will denote by $\lambda$ the associated Riemannian measure on $\Sigma$. Given vector fields $X, Y \in \Gamma(T(\Sigma))$, $(X|Y) \in C^\infty(\Sigma;\mathbb{R})$ will be the Riemannian inner product of $X$ and $Y$; and $|X| = (X|X)^{1/2}$ is the length of $X$. (We use $T(\Sigma)$ to denote the tangent bundle over $\Sigma$ and $\Gamma(T(\Sigma))$ to denote the space of smooth sections.) Also, we use $\nabla_X Y \in \Gamma(T(\Sigma))$ to denote the associated (Levi-Civita) Riemannian covariant derivative of $Y$ with respect to $X$. That is, $\nabla$ is defined to be the Koszul connection which satisfies
$$\nabla_X Y - \nabla_Y X = [X,Y], \qquad X, Y \in \Gamma(T(\Sigma)), \tag{6.2.1}$$
where $[X,Y] = XY - YX$ is the commutator of $X$ and $Y$, and
$$X(Y|Z) = (\nabla_X Y|Z) + (Y|\nabla_X Z) \tag{6.2.2}$$
for $X, Y, Z \in \Gamma(T(\Sigma))$.
In addition, we will use $\mathrm{grad}\,\phi \in \Gamma(T(\Sigma))$ and $\mathrm{div}\,X \in C^\infty(\Sigma;\mathbb{R})$ to denote the gradient of $\phi \in C^\infty(\Sigma;\mathbb{R})$ and the divergence of $X \in \Gamma(T(\Sigma))$. Thus, for $X \in \Gamma(T(\Sigma))$:
$$X\phi = (X|\mathrm{grad}\,\phi), \qquad \phi \in C^\infty(\Sigma;\mathbb{R}), \tag{6.2.3}$$
and
$$\int_\Sigma X\phi\,d\lambda = -\int_\Sigma \phi\,\mathrm{div}\,X\,d\lambda \quad\text{for } \phi \in C_c^\infty(\Sigma;\mathbb{R}), \tag{6.2.4}$$
where $C_c^\infty(\Sigma;\mathbb{R})$ denotes the class of $\phi \in C^\infty(\Sigma;\mathbb{R})$ which have compact support. In particular, with the use of normal coordinates, one can easily check that
if $\{E_k\}_1^N \subseteq \Gamma(T(\Sigma))$ is orthonormal at $\sigma$. Finally, we will use $\Delta$ to denote the Laplace-Beltrami operator given by
$$\Delta\phi = \mathrm{div}(\mathrm{grad}\,\phi), \qquad \phi \in C^\infty(\Sigma;\mathbb{R}).$$
The reason for our introducing the preceding terminology is that we are going to be dealing with diffusions on $\Sigma$ corresponding to an operator $L$ of the form
$$[L\phi] = \frac{e^U}{2}\,\mathrm{div}\bigl(e^{-U}\mathrm{grad}\,\phi\bigr) = \frac{1}{2}\bigl([\Delta\phi] - (\mathrm{grad}\,U|\mathrm{grad}\,\phi)\bigr) \tag{6.2.6}$$
for $\phi \in C^\infty(\Sigma;\mathbb{R})$, where $U$ is a fixed element of $C^\infty(\Sigma;\mathbb{R})$ which satisfies
(6.2.7)
(Note that Example 6.1.11 corresponds to $\Sigma = \mathbb{R}$ with the standard Euclidean structure and $U(x) = (x^2 + \log 2\pi)/2$.) Our first step will be to make sure that such a diffusion exists and that the measure $m \in M_1(\Sigma)$ given by
$$m(d\sigma) = e^{-U(\sigma)}\,\lambda(d\sigma) \tag{6.2.8}$$
is reversing for the corresponding transition probability function. To be precise, we will prove the following.

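6.2.9 Theorem. Set $\Omega = C([0,\infty);\Sigma)$ and define the evaluation map $\Sigma_t : \Omega \to \Sigma$ and the $\sigma$-algebra $\mathcal{B}_t$ for $t \in [0,\infty)$ accordingly. Then, for each $\sigma \in \Sigma$, there is precisely one $P_\sigma \in M_1(\Omega)$ with the property that
(6.2.10)
is a mean-zero martingale for every $\phi \in C_c^\infty(\Sigma;\mathbb{R})$. Moreover, the map $\sigma \in \Sigma \mapsto P_\sigma \in M_1(\Omega)$ is continuous and the family $\{P_\sigma : \sigma \in \Sigma\}$ is (time-homogeneous) Markov. Finally, let $P(t,\sigma,\cdot)$ denote the associated transition probability function (i.e., $P(t,\sigma,\Gamma) = P_\sigma(\{\omega : \Sigma_t(\omega) \in \Gamma\})$). Then the measure $m$ in (6.2.8) is $P(t,\sigma,\cdot)$-reversing. In fact, the corresponding Dirichlet form $\mathcal{E}$ is given by
$$\mathcal{E}(\phi,\phi) = \frac{1}{2}\int_\Sigma |\mathrm{grad}\,\phi|^2\,dm \tag{6.2.11}$$
for $\phi \in L^2(m) \cap C^\infty(\Sigma;\mathbb{R})$ with $|\mathrm{grad}\,\phi| \in L^2(m)$; and $\mathcal{E}$ is the closure of its own restriction to $C_c^\infty(\Sigma;\mathbb{R})$ in the sense that $\phi \in L^2(m)$ is an element of $\mathrm{Dom}(\mathcal{E})$ (i.e., satisfies $\mathcal{E}(\phi,\phi) < \infty$) if and only if $\phi$ is the limit in $L^2(m)$ of a sequence $\{\phi_n\}_1^\infty \subseteq C_c^\infty(\Sigma;\mathbb{R})$ with the property that
in which case $\mathcal{E}(\phi,\phi) = \lim_{n\to\infty}\mathcal{E}(\phi_n,\phi_n)$. In particular, if $\{\bar P_t : t > 0\}$ is the semigroup on $L^2(m)$ determined by $P(t,\sigma,\cdot)$, then for every $\phi \in L^2(m)$, $[\bar P_t\phi] \to \int_\Sigma \phi\,dm$ in $L^2(m)$ as $t \to \infty$.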
Aside from rather mundane probabilistic considerations, the proof of Theorem 6.2.9 comes down to showing that the diffusion "generated" by $L$ does not explode (cf. Chapter 10 of [104]); and the key to checking this is contained in the following variant of a lemma due to M. Gaffney [52], which shows how to utilize the completeness assumption that we have made about the Riemannian structure on $\Sigma$. (For the required standard facts about Riemannian geometry, the reader might want to consult Milnor's marvelous [76].)
6.2.12 Lemma. (Gaffney) There exists a $\psi \in C^\infty(\Sigma;[0,\infty))$ with the properties that the level set $\{\sigma : \psi(\sigma) \le R\}$ is compact for each $R \in (0,\infty)$ and that $|\mathrm{grad}\,\psi|$ is bounded. In particular, there exists a non-decreasing sequence $\{\eta_n\}_1^\infty \subseteq C_c^\infty(\Sigma;[0,1])$ with the properties that $\{\sigma : \eta_n(\sigma) = 1\} \nearrow \Sigma$ and $\bigl\|\,|\mathrm{grad}\,\eta_n|\,\bigr\|_B \to 0$ as $n \to \infty$.
PROOF: Choose and fix a reference point $\sigma_0 \in \Sigma$, and set
$$\phi(\sigma) = \mathrm{dist}(\sigma,\sigma_0), \qquad \sigma \in \Sigma,$$
where "distance" is being measured with respect to the Riemannian distance function on $\Sigma$. Because $\Sigma$ is connected, $\Sigma = \{\sigma : \phi(\sigma) < \infty\}$; and by the triangle inequality, it is obvious that $\phi$ is Lipschitz continuous with Lipschitz constant 1. Moreover, because the Riemannian structure on $\Sigma$ is complete, the level sets $K(R) \equiv \{\sigma : \phi(\sigma) \le R\}$ are compact, and clearly they exhaust $\Sigma$. Thus, we can find an open cover $\{U_m\}_1^\infty$ and an atlas $\{(W_m,\Phi_m)\}_1^\infty$ with the following properties:
(i) Every pair of points in $W_m$ are joined by a unique geodesic which lies entirely inside of $W_m$.
(ii) $\overline{U}_m \subset\subset W_m$.
(iii) For every $R \in (0,\infty)$ there are only finitely many $m \in \mathbb{Z}^+$ with $W_m \cap K(R) \ne \emptyset$; and if $W_m \cap K(R) \ne \emptyset$, then $W_m \subseteq K(R+1)$.
Finally, choose $\{\alpha_m\}_1^\infty \subseteq C_c^\infty(\Sigma;[0,1])$ to be a partition of unity which is subordinate to $\{U_m\}_1^\infty$.
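$$\phi_{m,\epsilon}(\sigma) = \int_{\Phi_m(W_m)} \phi\circ\Phi_m^{-1}(y)\,\rho_\epsilon\bigl(\Phi_m(\sigma) - y\bigr)\,dy, \qquad \sigma \in U_m,$$
where $\rho_\epsilon(y) = \epsilon^{-N}\rho(y/\epsilon)$ and $\rho \in C^\infty(\mathbb{R}^N;[0,\infty))$ is compactly supported in the unit ball and has total (Lebesgue) integral 1. Clearly, $\phi_{m,\epsilon} \in C^\infty(U_m;[0,\infty))$. In addition, for every $\sigma \in U_m$,
Similarly, for all $\sigma, \tau \in U_m$, $|\phi_{m,\epsilon}(\tau) - \phi_{m,\epsilon}(\sigma)| \le \sup$ ... $\{\bar Q_t : t > 0\}$ is strongly continuous. In fact, $\{\bar Q_t : t > 0\}$ is the semigroup which is generated by the Friedrichs extension of $L|_{C_c^\infty(\Sigma;\mathbb{R})}$. Using $\{E_\lambda : \lambda \in [0,\infty)\}$ to denote the spectral resolution of $-\bar L$, we have the representation
In particular, if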
then (6.2.14) leads to
for $t \in (0,\infty)$. A basic fact about the Friedrichs extension of a non-negative operator is that its Dirichlet form is the closure of its quadratic form. Thus, in the present situation, $\mathcal{E}$ is the closure of its restriction to $C_c^\infty(\Sigma;\mathbb{R})$. We next want to prove (6.2.11). To this end, let $\phi \in C^\infty(\Sigma;\mathbb{R}) \cap L^2(m)$ with $|\mathrm{grad}\,\phi| \in L^2(m)$ be given, and observe that, by (6.2.15) and the fact that $\mathcal{E}$ is closed, all that we have to do is produce a sequence $\{\phi_n\}_1^\infty \subseteq C_c^\infty(\Sigma;\mathbb{R})$ such that $\phi_n \to \phi$ in $L^2(m)$ and
$$\int_\Sigma |\mathrm{grad}\,\phi_n - \mathrm{grad}\,\phi|^2\,dm \to 0 \quad\text{as } n \to \infty.$$
To this end, choose the functions $\eta_n$ as in the last part of Lemma 6.2.12 and simply take $\phi_n = \eta_n\phi$. As an immediate consequence of (6.2.11) and (6.2.16) with $\phi = 1$, we see that $\int_\Sigma [Q_t 1]\,dm \ge 1$ for all $t \in (0,\infty)$; and because $[Q_t 1]$ is continuous and dominated by 1, this proves that $[Q_t 1] \equiv 1$. Equivalently, we now know that $P_\sigma(\{\omega : \zeta(\omega) \le t\}) = 0$ for every $(t,\sigma) \in [0,\infty)\times\Sigma$; and therefore the measures $P_\sigma$ are actually concentrated on $\Omega$. In particular, $\{P_\sigma : \sigma \in \Sigma\}$ is itself a Feller-continuous time-homogeneous Markov family of probability measures on $\Omega$; and all of the statements which we have made about the $Q_t$'s immediately become statements about the semigroup $\{P_t : t > 0\}$ determined by $\{P_\sigma : \sigma \in \Sigma\}$. We still have to prove the final assertion of the theorem. Using the spectral representation of $\bar P_t = \bar Q_t$, one sees that it is sufficient to show that the range of the projection $E_0$ is the constant functions. Equivalently, this comes down to checking that $\phi$ is constant if $\phi \in L^2(m)$ with $\mathcal{E}(\phi,\phi) = 0$. To this end, assume that $\mathcal{E}(\phi,\phi) = 0$. One then has that $\mathcal{E}(\phi,\psi) = 0$ for every $\psi \in \mathrm{Dom}(\mathcal{E})$ and therefore that
for every $\psi \in C_c^\infty(\Sigma;\mathbb{R})$. But this means that $[L\phi] = 0$ in the sense of distributions and therefore, by standard elliptic regularity theory, that $\phi \in C^\infty(\Sigma;\mathbb{R})$. In particular, this now leads to the conclusion that
and, therefore, that $\mathrm{grad}\,\phi = 0$ everywhere. Clearly the constancy of $\phi$ follows from this and the connectedness of $\Sigma$. ∎
From now on $m$ will be the probability measure in (6.2.8) and we will use $\langle\phi\rangle_m$ to denote the $m$-integral of a $\phi \in L^1(m)$. Also, $P(t,\sigma,\cdot)$ will be the transition probability function for the Markov family $\{P_\sigma : \sigma \in \Sigma\}$ produced in Theorem 6.2.9, and $\{P_t : t > 0\}$ will be the corresponding Feller-continuous semigroup. Before proceeding, we will need the following technical addendum to Theorem 6.2.9.
6.2.17 Lemma. Set
Then, for each $f \in \mathcal{F}$, $(t,\sigma) \in (0,\infty)\times\Sigma \mapsto [P_t f](\sigma)$ is smooth,
and $|\mathrm{grad}\,f| \in L^2(m)$. In fact,
$$\bigl(g,[Lf]\bigr)_{L^2(m)} = -\frac{1}{2}\bigl\langle(\mathrm{grad}\,f|\mathrm{grad}\,g)\bigr\rangle_m \quad\text{for } f, g \in \mathcal{F}. \tag{6.2.20}$$
Finally, $\mathcal{F}$ is $\{P_t : t > 0\}$-invariant.
PROOF: Let $f \in \mathcal{F}$ and $\psi \in C_c^\infty(\Sigma;\mathbb{R})$ be given. Then,
Thus, $(t,\sigma) \in (0,\infty)\times\Sigma \mapsto [P_t f](\sigma)$ satisfies the first equality in (6.2.19) in the sense of distributions; and therefore, by elliptic regularity theory, it is a smooth function which satisfies this equality in the classical sense.
Before attempting to check the second equality in (6.2.19), we will prove $|\mathrm{grad}\,f| \in L^2(m)$, $f \in \mathcal{F}$, and (6.2.20). To this end, choose $\{\eta_n\}_1^\infty$ as in the last part of Lemma 6.2.12. Then
from which it is a simple matter to estimate $|\mathrm{grad}\,f|$ in terms of $\|f\|_{L^2(m)}\|Lf\|_{L^2(m)}$. Thus, we now know that $|\mathrm{grad}\,f| \in L^2(m)$ for all $f \in \mathcal{F}$, and once one knows this, the proof of (6.2.20) is easy:
Returning to the proof of the second equality in (6.2.19), note that we already know that (6.2.19) holds for elements of $C_c^\infty(\Sigma;\mathbb{R})$; and therefore, if $\psi \in C_c^\infty(\Sigma;\mathbb{R})$, then
where, in the passage from the first to the second lines, we have used the facts that $[P_t\psi] \in \mathcal{F}$, $t \in [0,\infty)$, and therefore that (6.2.20) applies. Clearly the second equality in (6.2.19) follows from the above. Moreover, we now see that $\mathcal{F}$ is $\{P_t : t > 0\}$-invariant, since the only thing that we had left to check is that $[LP_t f] \in L^2(m)$, and this is obvious from the second equality in (6.2.19). ∎
Our goal now is to find conditions which will tell us when the results in Sections 5.3 and 5.4 apply to the processes described in Theorem 6.2.9. We begin with the following.
6.2.21 Theorem. Set $V = |\mathrm{grad}\,U|^2 - \Delta U$ and assume that the level sets $\{\sigma \in \Sigma : V(\sigma) \le R\}$, $R \in [0,\infty)$, are compact. Then $J_{\mathcal{E}}$ is a good rate function and
for every measurable $\Gamma \subseteq M_1(\Sigma)$.
PROOF: Recall that, for any $R \in [0,\infty)$, the set
is relatively compact in $L^2(\mathbb{R}^N)$, where $B$ is the open unit ball in $\mathbb{R}^N$. Hence, with the use of a partition of unity, one can easily check that, for any relatively compact open set $G \subseteq \Sigma$,
$$\Bigl\{\phi \in C_c^\infty(G;\mathbb{R}) : \int_\Sigma |\mathrm{grad}\,\phi|^2\,dm \le R\Bigr\}$$
is a relatively compact subset of $L^2(m)$ for every $R \in [0,\infty)$. Knowing this and using, once again, the functions $\eta_n$ from Lemma 6.2.12, one concludes that, for each $R \in [0,\infty)$,
$$\Phi(R) = \bigl\{\phi \in C_c^\infty(\Sigma;\mathbb{R}) : \mathcal{E}(\phi,\phi) \le R\bigr\}$$
is relatively compact in $L^2_{\mathrm{loc}}(m)$. That is, every sequence $\{\phi_n\} \subseteq \Phi(R)$ contains a subsequence $\{\phi_{n_\ell}\}$ which is $L^2(m)$-convergent on each compact subset of $\Sigma$. Thus, we would know that $\Phi(R)$ is relatively compact in $L^2(m)$ if we could produce a sequence $\{K_\ell\}_1^\infty$ of compact subsets in $\Sigma$ such that
$$\lim_{\ell\to\infty}\ \sup_{\phi\in\Phi(R)} \int_{K_\ell^c} \phi^2\,dm = 0. \tag{6.2.23}$$
To prove (6.2.23) under the stated hypothesis on $V$, note that if $\phi \in C_c^\infty(\Sigma;\mathbb{R})$ and $\psi = e^{-U/2}\phi$, then
$$\int_\Sigma |\mathrm{grad}\,\psi|^2\,d\lambda = 2\mathcal{E}(\phi,\phi) + \int_\Sigma [LU]\,\phi^2\,dm + \frac{1}{4}\int_\Sigma |\mathrm{grad}\,U|^2\phi^2\,dm$$
and therefore
Since the level sets of $V$ are compact, it is clear from this how to choose the sets $K_\ell$. To complete the proof that $J_{\mathcal{E}}$ is good, remember that $\mathcal{E}$ is the closure of its restriction to $C_c^\infty(\Sigma;\mathbb{R})$ and conclude that
where $\overline{\Phi}(R)$ is the closure of $\Phi(R)$ in $L^2(m)$. Thus, if $\{\nu_n\}_1^\infty \subseteq M_1(\Sigma)$ with $J_{\mathcal{E}}(\nu_n) \le R$, $n \in \mathbb{Z}^+$, then $d\nu_n = \phi_n^2\,dm$, where $\{\phi_n\}_1^\infty \subseteq \overline{\Phi}(R)$. Now choose a subsequence $\{\phi_{n_\ell}\}$ which converges in $L^2(m)$ to an element $\phi$ of $\overline{\Phi}(R)$. It is then clear that $\nu_{n_\ell} \Rightarrow \nu$, where $d\nu = \phi^2\,dm$. Moreover, since (cf. (4.2.54))
$$J_{\mathcal{E}}(\nu) = \mathcal{E}(|\phi|,|\phi|) \le \mathcal{E}(\phi,\phi) \le R,$$
it is also clear that $J_{\mathcal{E}}(\nu) \le R$. The rest of the proof is nothing but an application of elliptic regularity theory and Exercise 5.3.14. Indeed, elliptic regularity theory assures us that $P(t,\sigma,d\tau) = p(t,\sigma,\tau)\,m(d\tau)$, where $p \in C^\infty\bigl((0,\infty)\times\Sigma\times\Sigma;(0,\infty)\bigr)$. ∎
Having found a condition which enables us to apply the results in Section 5.3, we next want to see what we can do to bring the results in Section 5.4 to bear. As we pointed out in Remark 6.1.10, the strong form of $(\tilde{\mathbf{U}})$ in (SU) is more than enough to guarantee that the semigroup $\{P_t : t > 0\}$ is hypercontractive. Of course, at least from the standpoint of large deviation theory, this is not a very useful observation since (SU) itself implies far stronger large deviation results than does hypermixing. On the other hand, Example 6.1.11 clearly demonstrates that there are interesting situations in which (SU) fails to hold but $\{P_t : t > 0\}$ is nonetheless hypercontractive; and what we want to do now is develop machinery for recognizing such situations. Thus, we are about to embark on a program which will eventually give us a criterion with which to determine when $\{P_t : t > 0\}$ is hypercontractive even though (SU) may fail. The program which we have in mind is based on the work of Bakry and Emery [3] and entails the analysis of the function
VI Analytic Considerations
where $f$ is a uniformly positive element of $\mathcal F$ (cf. (6.2.18)) and we use $f_t$ to denote $[P_t f]$. Using Lemma 6.1.17, one can easily justify the steps:
Thus, since, by the last part of Theorem 6.2.9,
(6.2.25)
Clearly (6.2.25) is potentially related to a logarithmic Sobolev inequality. In particular, it indicates that we would be well-advised to study quantities related to the integrand on the right hand side. With this in mind, we introduce, for $\delta \in (0,\infty)$, the function
$$u_\delta(t,\sigma) = \bigl(|\operatorname{grad}f_t(\sigma)|^2 + \delta\bigr)^{1/2},\quad (t,\sigma) \in [0,\infty)\times\Sigma. \tag{6.2.26}$$
By straightforward computation (Itô's transformation rule for second order operators), one can show that
where
and (6.2.29)
Our next goal is to interpret the quantity $v(t,\sigma)$ in (6.2.28). In doing so, it will be necessary to recall some more notions from Riemannian geometry. In the first place, if $g \in C^\infty(\Sigma;\mathbb{R})$, then the Hessian, $\operatorname{Hess}g$, is the element of $\Gamma\bigl(T^*(\Sigma)\otimes T^*(\Sigma)\bigr)$ given by $\operatorname{Hess}g(X,Y) = XYg - (\nabla_XY)g$ for $X, Y \in \Gamma(T(\Sigma))$. Note that, because the Levi-Civita connection is torsion free, $\operatorname{Hess}g$ is symmetric. Also, an elementary calculation leads to
$$\operatorname{Hess}g(X,Y) = (\nabla_X\operatorname{grad}g \mid Y),\quad X, Y \in \Gamma(T(\Sigma)). \tag{6.2.30}$$
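The "elementary calculation" behind (6.2.30), and the symmetry of the Hessian, can be sketched in a few lines. The sketch assumes only that $\nabla$ is the Levi-Civita connection (metric compatible and torsion free) and that $(\operatorname{grad}g \mid Y) = Yg$.

```latex
\begin{align*}
XYg &= X(\operatorname{grad} g \mid Y)
     = (\nabla_X \operatorname{grad} g \mid Y) + (\operatorname{grad} g \mid \nabla_X Y)
     && \text{(metric compatibility)} \\
    &= (\nabla_X \operatorname{grad} g \mid Y) + (\nabla_X Y)g, \\
\operatorname{Hess} g(X,Y) &= XYg - (\nabla_X Y)g = (\nabla_X \operatorname{grad} g \mid Y), \\[4pt]
\operatorname{Hess} g(X,Y) - \operatorname{Hess} g(Y,X)
    &= [X,Y]g - (\nabla_X Y - \nabla_Y X)g = 0
     && \text{(torsion free: } \nabla_X Y - \nabla_Y X = [X,Y]\text{)}.
\end{align*}
```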
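A second notion which we will need is that of the Ricci curvature tensor. For this purpose, recall that the Riemann curvature is the tensor $R \in \Gamma\bigl(T^*(\Sigma)^{\otimes 4}\bigr)$ defined by
$$R(X,V,Y,W) = \bigl(\nabla_X\circ\nabla_VY - \nabla_V\circ\nabla_XY - \nabla_{[X,V]}Y \,\big|\, W\bigr)$$
for $X, Y, V, W \in \Gamma(T(\Sigma))$, and that the Ricci curvature is the tensor $\operatorname{Ric} \in \Gamma\bigl(T^*(\Sigma)^{\otimes 2}\bigr)$ such that
$$\operatorname{Ric}(X,Y)(\sigma) = \sum_{k=1}^N R(X,E_k,Y,E_k)(\sigma),\quad X, Y \in \Gamma(T(\Sigma)), \tag{6.2.31}$$
as long as $\{E_k\}_1^N \subseteq \Gamma(T(\Sigma))$ is orthonormal at $\sigma$. We will now show that
$$v(t,\cdot) = (\operatorname{Ric} + \operatorname{Hess}U)(\operatorname{grad}f_t,\operatorname{grad}f_t) + \|\operatorname{Hess}f_t\|^2_{\mathrm{H.S.}}, \tag{6.2.32}$$
where, for any $\{E_k\}_1^N \subseteq \Gamma(T(\Sigma))$ which is orthonormal at $\sigma$,
$$\|\operatorname{Hess}f_t\|_{\mathrm{H.S.}}(\sigma) = \Bigl(\sum_{k,\ell=1}^N \operatorname{Hess}f_t(E_k,E_\ell)(\sigma)^2\Bigr)^{1/2}$$
is the Hilbert-Schmidt norm of $\operatorname{Hess}f_t(\sigma)$. In the derivation of (6.2.32), a central role will be played by the identity
$$\operatorname{grad}(\operatorname{grad}u \mid \operatorname{grad}w) = \nabla_{\operatorname{grad}u}\operatorname{grad}w + \nabla_{\operatorname{grad}w}\operatorname{grad}u \tag{6.2.33}$$
for $u, w \in C^\infty(\Sigma;\mathbb{R})$. To prove (6.2.33), set $Y = \operatorname{grad}u$ and $Z = \operatorname{grad}w$. Then, for $X \in \Gamma(T(\Sigma))$: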
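$$\begin{aligned}
(X \mid \nabla_YZ + \nabla_ZY) &= Y(X \mid Z) - (\nabla_YX \mid Z) + Z(X \mid Y) - (\nabla_ZX \mid Y) \\
&= YXw - (\nabla_XY \mid Z) + ZXu - (Y \mid \nabla_XZ) - ([Y,X] \mid Z) - ([Z,X] \mid Y) \\
&= XYw + XZu - X(Y \mid Z) = X(Y \mid Z) = (X \mid \operatorname{grad}(Y \mid Z)),
\end{aligned}$$
where we made use of the torsion free nature of $\nabla$. Turning to the proof of (6.2.32), note that
$$\begin{aligned}
v(t,\cdot) &= \tfrac12\Delta(\operatorname{grad}f_t \mid \operatorname{grad}f_t) - \tfrac12\bigl(\operatorname{grad}U \mid \operatorname{grad}(\operatorname{grad}f_t \mid \operatorname{grad}f_t)\bigr) \\
&\qquad - (\operatorname{grad}\Delta f_t \mid \operatorname{grad}f_t) + \bigl(\operatorname{grad}(\operatorname{grad}U \mid \operatorname{grad}f_t) \mid \operatorname{grad}f_t\bigr) \\
&= v_0(t,\cdot) - \tfrac12\bigl(\operatorname{grad}U \mid \operatorname{grad}(\operatorname{grad}f_t \mid \operatorname{grad}f_t)\bigr) + \bigl(\operatorname{grad}(\operatorname{grad}U \mid \operatorname{grad}f_t) \mid \operatorname{grad}f_t\bigr),
\end{aligned}$$
where $v_0(t,\cdot) = \tfrac12\Delta(\operatorname{grad}f_t \mid \operatorname{grad}f_t) - (\operatorname{grad}\Delta f_t \mid \operatorname{grad}f_t)$.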
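At the same time, by (6.2.33) (with $u = U$ and $w = f_t$), and (6.2.30):
$$\begin{aligned}
&-\tfrac12\bigl(\operatorname{grad}U \mid \operatorname{grad}(\operatorname{grad}f_t \mid \operatorname{grad}f_t)\bigr) + \bigl(\operatorname{grad}(\operatorname{grad}U \mid \operatorname{grad}f_t) \mid \operatorname{grad}f_t\bigr) \\
&\quad = -(\operatorname{grad}U \mid \nabla_{\operatorname{grad}f_t}\operatorname{grad}f_t) + (\nabla_{\operatorname{grad}U}\operatorname{grad}f_t \mid \operatorname{grad}f_t) + \operatorname{Hess}U(\operatorname{grad}f_t,\operatorname{grad}f_t) \\
&\quad = \operatorname{Hess}U(\operatorname{grad}f_t,\operatorname{grad}f_t).
\end{aligned}$$
Thus, all that remains is to show that
$$v_0(t,\cdot) = \operatorname{Ric}(\operatorname{grad}f_t,\operatorname{grad}f_t) + \|\operatorname{Hess}f_t\|^2_{\mathrm{H.S.}}. \tag{6.2.34}$$
In order to check (6.2.34), it will be convenient to fix a $\sigma \in \Sigma$ and to choose $\{E_k\}_1^N \subseteq \Gamma(T(\Sigma))$ so that $\{E_k(\sigma)\}_1^N$ is orthonormal and $\nabla_XE_k(\sigma) = 0$ for $1 \le k \le N$ and $X \in \Gamma(T(\Sigma))$. For example, one can choose a normal coordinate system $(x_1,\dots,x_N)$ in a neighborhood $O$ of $\sigma$ and arrange that $E_k = \frac{\partial}{\partial x_k}$ in $O$. By (6.2.33), one then has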
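$$\tfrac12\Delta\bigl(|\operatorname{grad}f_t|^2\bigr)(\sigma) = \operatorname{div}\bigl(\nabla_{\operatorname{grad}f_t}\operatorname{grad}f_t\bigr)(\sigma) = \sum_{k=1}^N \bigl(\nabla_{E_k}\nabla_{\operatorname{grad}f_t}\operatorname{grad}f_t \mid E_k\bigr)(\sigma)$$
and, by (6.2.2),
$$\begin{aligned}
(\operatorname{grad}\Delta f_t \mid \operatorname{grad}f_t)(\sigma) &= \sum_{k=1}^N \bigl(\operatorname{grad}(\nabla_{E_k}\operatorname{grad}f_t \mid E_k) \mid \operatorname{grad}f_t\bigr)(\sigma) \\
&= \sum_{k=1}^N \bigl(\nabla_{\operatorname{grad}f_t}\nabla_{E_k}\operatorname{grad}f_t \mid E_k\bigr)(\sigma) + \sum_{k=1}^N \bigl(\nabla_{E_k}\operatorname{grad}f_t \mid \nabla_{\operatorname{grad}f_t}E_k\bigr)(\sigma) \\
&= \sum_{k=1}^N \bigl(\nabla_{\operatorname{grad}f_t}\nabla_{E_k}\operatorname{grad}f_t \mid E_k\bigr)(\sigma).
\end{aligned}$$
Thus, after subtracting the second of these from the first, we arrive at
$$v_0(t,\sigma) = \operatorname{Ric}(\operatorname{grad}f_t,\operatorname{grad}f_t)(\sigma) + \sum_{k=1}^N \bigl(\nabla_{[E_k,\operatorname{grad}f_t]}\operatorname{grad}f_t \mid E_k\bigr)(\sigma).$$
Finally, note that, because the Hessian is symmetric and $\nabla$ is torsion free,
$$\begin{aligned}
\bigl(\nabla_{[E_k,\operatorname{grad}f_t]}\operatorname{grad}f_t \mid E_k\bigr)(\sigma) &= \operatorname{Hess}f_t\bigl([E_k,\operatorname{grad}f_t],E_k\bigr)(\sigma) = \bigl(\nabla_{E_k}\operatorname{grad}f_t \mid [E_k,\operatorname{grad}f_t]\bigr)(\sigma) \\
&= \bigl(\nabla_{E_k}\operatorname{grad}f_t \mid \nabla_{E_k}\operatorname{grad}f_t\bigr)(\sigma) - \bigl(\nabla_{E_k}\operatorname{grad}f_t \mid \nabla_{\operatorname{grad}f_t}E_k\bigr)(\sigma) \\
&= \bigl(\nabla_{E_k}\operatorname{grad}f_t \mid \nabla_{E_k}\operatorname{grad}f_t\bigr)(\sigma).
\end{aligned}$$
Thus, (6.2.34) follows after summing the preceding over $1 \le k \le N$.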
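As a sanity check on (6.2.32) and (6.2.34), the flat case can be worked out by hand. The sketch below assumes $\Sigma = \mathbb{R}^N$ with $E_k = \partial_k$ (so that $\operatorname{Ric} = 0$) and writes $f$ for $f_t$; it is an illustration, not part of the original argument.

```latex
% Flat case: \Sigma = \mathbb{R}^N, E_k = \partial_k, Ric = 0, f = f_t.
\begin{align*}
\tfrac12 \Delta |\operatorname{grad} f|^2
  &= \sum_{j,k} \partial_k\bigl(\partial_j f\,\partial_k\partial_j f\bigr)
   = \sum_{j,k} (\partial_j\partial_k f)^2 + \sum_j \partial_j f\,\partial_j \Delta f, \\
\tfrac12 \Delta |\operatorname{grad} f|^2 - (\operatorname{grad}\Delta f \mid \operatorname{grad} f)
  &= \sum_{j,k} (\partial_j\partial_k f)^2 = \|\operatorname{Hess} f\|_{\mathrm{H.S.}}^2, \\
-\tfrac12 \bigl(\operatorname{grad} U \mid \operatorname{grad} |\operatorname{grad} f|^2\bigr)
 + \bigl(\operatorname{grad}(\operatorname{grad} U \mid \operatorname{grad} f) \mid \operatorname{grad} f\bigr)
  &= \sum_{j,k} \partial_j\partial_k U\,\partial_j f\,\partial_k f
   = \operatorname{Hess} U(\operatorname{grad} f, \operatorname{grad} f).
\end{align*}
```

The second line is (6.2.34) with vanishing Ricci term, and adding the third line gives (6.2.32) in this setting.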
Having dealt with $v(t,\sigma)$, we next want to estimate $w_\delta(t,\sigma)$ in (6.2.29). Remembering that the square of the Hilbert-Schmidt norm dominates the square of the largest eigenvalue of a symmetric matrix, use (6.2.33) to check that
and therefore
By combining (6.2.27), (6.2.32), and (6.2.35), we arrive at the important relation
$$[Lu_\delta](t,\sigma) - \frac{\partial u_\delta}{\partial t}(t,\sigma) \ge \frac{(\operatorname{Ric} + \operatorname{Hess}U)(\operatorname{grad}f_t,\operatorname{grad}f_t)(\sigma)}{2u_\delta(t,\sigma)}. \tag{6.2.36}$$
In particular, if we now make the assumption that
$$\operatorname{Ric} + \operatorname{Hess}U \ge 2\epsilon I \tag{B\&E}$$
for some $\epsilon > 0$, then
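How the curvature hypothesis enters can be sketched as follows. The sketch assumes that (6.2.36) reads $[Lu_\delta] - \partial_tu_\delta \ge (\operatorname{Ric} + \operatorname{Hess}U)(\operatorname{grad}f_t,\operatorname{grad}f_t)/(2u_\delta)$ with $u_\delta = (|\operatorname{grad}f_t|^2 + \delta)^{1/2}$, and that (B&E) means $\operatorname{Ric} + \operatorname{Hess}U \ge 2\epsilon I$; the display that the source gives at this point is not legible, so the last line is a plausible consequence rather than a quotation.

```latex
% Assumptions: u_\delta = (|grad f_t|^2 + \delta)^{1/2} and Ric + Hess U \ge 2\epsilon I.
\begin{align*}
[L u_\delta] - \frac{\partial u_\delta}{\partial t}
  &\ge \frac{2\epsilon\,|\operatorname{grad} f_t|^2}{2u_\delta}
   = \epsilon\,\frac{u_\delta^2 - \delta}{u_\delta}
   = \epsilon\,u_\delta - \frac{\epsilon\,\delta}{u_\delta} \\
  &\ge \epsilon\,u_\delta - \epsilon\,\delta^{1/2}
   \qquad (\text{since } u_\delta \ge \delta^{1/2}).
\end{align*}
```

The error term $\epsilon\delta^{1/2}$ vanishes as $\delta \searrow 0$, which is consistent with the factor $e^{-\epsilon T}$ appearing in Lemma 6.2.39.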
6.2.38 Lemma. Let $T \in (0,\infty)$ and $w \in C^\infty\bigl([0,T]\times\Sigma;[0,\infty)\bigr)$ be given, and assume that $t \in [0,T] \longmapsto \|w(t,\cdot)\|_{L^2(m)}$ is bounded. If
then
PROOF: Choose $\{\eta_n\}_1^\infty$ as in the last part of Lemma 6.2.12 and set
Then
from which the desired inequality follows after one takes the limit as $n \to \infty$. ∎
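With the preceding preparations, we are at last ready to prove the estimate toward which our efforts have been directed.

6.2.39 Lemma. Assume that (B&E) holds for some $\epsilon > 0$. Then, for every uniformly positive element $f$ of $\mathcal F$,
$$|\operatorname{grad}[P_Tf]| \le e^{-\epsilon T}\bigl[P_T|\operatorname{grad}f|\bigr],\quad T \in (0,\infty). \tag{6.2.40}$$
PROOF: Define $u_\delta$ as in (6.2.26). Then, by Lemma 6.2.17, (6.2.37), and Lemma 6.2.38, we know that
Now let $\phi, \psi \in C_c^\infty(\Sigma;[0,\infty))$ be given and set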
(u6(Tt,')(grad[p~4]Jgradi)) m dt  &'I2
Jd'ett "Pt+l, 4 L z ( m ) d t .

Now let $\{\eta_n\}_1^\infty$ be the sequence produced in Lemma 6.2.12, replace $\psi$ in the preceding by $\eta_n$, let $n \to \infty$ and $\delta \searrow 0$, and use the above together with (6.2.41) to conclude that
$$\bigl(\phi, |\operatorname{grad}[P_Tf]|\bigr)_{L^2(m)} \le e^{-\epsilon T}\bigl(\phi, [P_T|\operatorname{grad}f|]\bigr)_{L^2(m)}.$$
Finally, because this is true for an arbitrary $\phi \in C_c^\infty(\Sigma;[0,\infty))$, it obviously implies (6.2.40). ∎

6.2.42 Theorem. Assume that (B&E) holds for some $\epsilon > 0$. Then, for all $1 < p \le q < \infty$,
$$\|P_t\|_{L^p(m)\to L^q(m)} = 1 \quad\text{for } t \in (0,\infty) \text{ with } e^{2\epsilon t} \ge \frac{q-1}{p-1}. \tag{6.2.43}$$
In particular, $\{P_t : t > 0\}$ is hypercontractive at time $(\log 3)/2\epsilon$ and
(6.2.44)
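The time $(\log 3)/2\epsilon$ can be read off from (6.2.43). The sketch below assumes, as in Section 6.1, that hypercontractivity refers to the $L^2(m) \to L^4(m)$ norm. The Euclidean instance in the second line takes $U(x) = \tfrac12|x|^2$ on $\mathbb{R}^N$, which is an illustrative choice and not a formula quoted from the text.

```latex
\begin{gather*}
p = 2,\ q = 4:\qquad \frac{q-1}{p-1} = 3,
\qquad e^{2\epsilon t} \ge 3 \iff t \ge \frac{\log 3}{2\epsilon}, \\[4pt]
\Sigma = \mathbb{R}^N,\ U(x) = \tfrac12 |x|^2:\qquad
\operatorname{Ric} = 0,\quad \operatorname{Hess} U = I = 2\epsilon I \ \text{ with } \epsilon = \tfrac12,
\qquad e^{2\epsilon t} = e^{t}.
\end{gather*}
```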
PROOF: Let $f$ be a uniformly positive element of $\mathcal F$. Then, from (6.2.40), we have that
and so, by (6.2.25),
Next, let $q \in (1,\infty)$ and a uniformly positive $\phi \in \mathcal F$ be given. Choosing $\{\eta_n\}_1^\infty$ as in Lemma 6.2.12, set $f_n = \bigl(\eta_n\phi^{q/2} + 1/n\bigr)^2$. Plugging this $f_n$ into the above, noting that
and then letting $n \to \infty$, we arrive at
Since $\phi_t \equiv [P_t\phi]$ is a uniformly positive element of $\mathcal F$ whenever $\phi$ itself is, we can use this in (6.1.16) with $q(t) = 1 + (p-1)e^{2\epsilon t}$ to conclude that
$$t \in [0,\infty) \longmapsto \|P_t\phi\|_{L^{q(t)}(m)}$$
is nonincreasing; and from this point it is an easy step to (6.2.43). Finally, (6.2.44) follows from (6.2.43) together with Theorem 6.1.14 and Corollary 6.1.17. ∎

6.2.45 Corollary. Assume that there is a bounded $V \in C^\infty(\Sigma;\mathbb{R})$ with the property that
$$\operatorname{Ric} + \operatorname{Hess}(U + V) \ge \epsilon I \tag{6.2.46}$$
for some $\epsilon > 0$. Then $\{P_t : t > 0\}$ is hypercontractive.
PROOF: Without loss of generality, we will assume that
$$\int_\Sigma e^{-V}\,dm = \int_\Sigma e^{-U-V}\,d\lambda = 1.$$
Define $m' \in M_1(\Sigma)$ and the Dirichlet form $\mathcal E'$ relative to $U + V$. By Theorem 6.1.14 and Theorem 6.2.42,
Using the technique in part (i) of Exercise 6.1.27, one sees that
At the same time, by (6.2.11),
and therefore
Thus, we find ourselves at the same place as we were when we started the second paragraph in the proof of Theorem 6.2.42; and therefore the same argument applies here. ∎

6.2.47 Exercise.
Let $\Sigma = \mathbb{R}^N$ and give $\mathbb{R}^N$ the standard Euclidean structure. Then the Riemannian measure is Lebesgue's measure and $\Delta$ is the standard Euclidean Laplace operator. Let $U \in C^\infty(\mathbb{R}^N;\mathbb{R})$ be a function which is bounded below and satisfies (6.2.7), and define $m \in M_1(\mathbb{R}^N)$ and $L$ on $C_c^\infty(\mathbb{R}^N;\mathbb{R})$ accordingly. Finally, let $\mathcal E$ be the corresponding Dirichlet form described in Theorem 6.2.9, and define $V$ as in Theorem 6.2.21.
(i) It is interesting to see that, at least for the setting just described, Theorem 6.2.21 is quite sharp. To see this, suppose that there is an $r \in (0,\infty)$ and a sequence $\sigma_n \to \infty$ with the property that
$$\sup_{n\in\mathbb{Z}^+}\ \sup_{\tau\in B(\sigma_n,r)} V(\tau) < \infty,$$
where $B(\sigma,r)$ denotes the open Euclidean ball with center $\sigma$ and radius $r$. Choose
$$\psi \in C_c^\infty\bigl(B(0,r);[0,\infty)\bigr) \quad\text{with}\quad \int_{\mathbb{R}^N}\psi^2\,dx = 1,$$
and set $\phi_n = \exp(U/2)\,\psi_n$, where $\psi_n(\tau) = \psi(\tau + \sigma_n)$, $\tau \in \mathbb{R}^N$. Show that $\|\phi_n\|_{L^2(m)} = 1$ for all $n \in \mathbb{Z}^+$ and
$$\sup_{n\in\mathbb{Z}^+}\mathcal{E}(\phi_n,\phi_n) < \infty;$$
and conclude from this that the associated $J_{\mathcal E}$ cannot be good.
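The normalization claim in part (i) is a one-line computation. The sketch assumes that $m(dx) = e^{-U(x)}\,dx$ (already normalized) and that $\psi_n$ is a translate of a function $\psi$ with $\int\psi^2\,dx = 1$; both are readings of the setup rather than quotations.

```latex
\begin{align*}
\|\phi_n\|_{L^2(m)}^2
  = \int_{\mathbb{R}^N} \bigl(e^{U/2}\psi_n\bigr)^2 e^{-U}\,dx
  = \int_{\mathbb{R}^N} \psi_n^2\,dx
  = \int_{\mathbb{R}^N} \psi^2\,dx
  = 1 .
\end{align*}
```

Since each $\phi_n^2\,dm$ is a probability measure carried by a ball that escapes to infinity, the sequence cannot be tight, which is what rules out goodness once $\mathcal{E}(\phi_n,\phi_n)$ stays bounded.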
(ii) Assume that
where $\alpha \in (0,\infty)$ and $c_\alpha$ is chosen so that the normalization condition is satisfied. Show that $J_{\mathcal E}$ is good if and only if $\alpha \in (1,\infty)$ and that the associated semigroup $\{P_t : t > 0\}$ is hypercontractive if $\alpha \in [2,\infty)$. Finally, if $\alpha \in (1,2)$, show that (LS) fails and therefore that $\{P_t : t > 0\}$ is not hypercontractive. (Hint: Try test functions of the form $e^{\beta U}$ with $\beta \in$ (0,
m.)
(iii) The preceding result showed that the Ornstein-Uhlenbeck semigroup in Exercise 6.1.11 (i.e., the case when $\alpha = 2$) is at the borderline of hypercontractivity. By a remarkable coincidence, it turns out that Theorem 6.2.42 predicts the optimal hypercontractive result for this semigroup. To see this, check that in this case (B&E) holds with $\epsilon = \frac12$ and therefore that
$$\|P_t\|_{L^p(m)\to L^q(m)} = 1 \quad\text{for } e^t \ge \frac{q-1}{p-1}.$$
Using the fact (cf. the last part of Example 6.1.11) that
and therefore that the predicted result is optimal. Actually, one can do even better. Namely, by considering the functions $\psi_r(x) = \exp(r|x|^2)$, one can show that for any $1 < p < q < \infty$ and $t \in (0,\infty)$ with $e^t < (q-1)/(p-1)$,
$$\|P_t\|_{L^p(m)\to L^q(m)} = \infty.$$
The facts contained in this exercise were first obtained by E. Nelson [79] and constitute the origins of all hypercontractivity considerations.

6.2.48 Exercise.
It is interesting to look at the Bakry-Emery argument when $\Sigma$ is compact; even though, in that case, we already know that (SU) holds and therefore that $\{P_t : t > 0\}$ is more than hypercontractive. In this exercise we outline the argument for the compact case and point out that the argument is not only simpler but also leads to a slightly sharper statement. Observe that the key to the simplification is hidden entirely in the fact that the space $C^\infty(\Sigma;\mathbb{R})$ is invariant under both $L$ and $\{P_t : t > 0\}$.
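(i) Let $f \in C^\infty(\Sigma;\mathbb{R})$ be uniformly positive and set $H(t) = \langle f_t\log f_t\rangle_m$, where, once again, $f_t = [P_tf]$. First show that
where $\psi_t = \log f_t$, and second that
Now conclude that the condition
$$\bigl\langle e^{\psi}\bigl[\|\operatorname{Hess}\psi\|^2_{\mathrm{H.S.}} + (\operatorname{Ric} + \operatorname{Hess}U)(\operatorname{grad}\psi,\operatorname{grad}\psi)\bigr]\bigr\rangle_m \ge 2\epsilon\bigl\langle e^{\psi}|\operatorname{grad}\psi|^2\bigr\rangle_m \tag{B\&E$'$}$$
for $\psi \in C^\infty(\Sigma;\mathbb{R})$ implies (6.2.43).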
(ii) The major advantage that (B&E') has over (B&E) is that it leaves open the possibility of applying it even when no pointwise estimate holds. For example, consider the case when $\Sigma$ is the flat $N$-torus ($= (\mathbb{R}/\mathbb{Z})^N$) and $U \equiv 0$. Then, since the Ricci curvature vanishes, the left hand side of (B&E') becomes
which is easily seen to dominate
where $(\theta_1,\dots,\theta_N)$ is the standard coordinate system on $\Sigma$. Thus, in this case, (B&E') holds for all $N \in \mathbb{Z}^+$ with a given $\epsilon$ if it holds when $N = 1$ for that $\epsilon$. Therefore, assume that $N = 1$, and observe that when $h = e^{\psi/2}$ then the preceding dominates $4\|h''\|^2_{L^2(\lambda)}$, whereas the factor to be estimated on the right hand side of (B&E') becomes $4\|h'\|^2_{L^2(\lambda)}$. Use these observations to show that (B&E') holds with $\epsilon =$
i.
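The two observations in part (ii) can be checked directly. The sketch below assumes $N = 1$, $\Sigma = \mathbb{R}/\mathbb{Z}$ with $\lambda$ Lebesgue measure of total mass 1, $U \equiv 0$, and $h = e^{\psi/2}$; the final inequality is the standard Poincaré (Wirtinger) inequality for mean-zero periodic functions.

```latex
\begin{align*}
h' &= \tfrac12 \psi' h, \qquad h'' = \bigl(\tfrac12\psi'' + \tfrac14(\psi')^2\bigr)h, \\
4\|h''\|_{L^2(\lambda)}^2
  &= \int e^{\psi}(\psi'')^2\,d\lambda + \int e^{\psi}\psi''(\psi')^2\,d\lambda
     + \tfrac14\int e^{\psi}(\psi')^4\,d\lambda \\
  &= \int e^{\psi}(\psi'')^2\,d\lambda - \tfrac1{12}\int e^{\psi}(\psi')^4\,d\lambda
   \;\le\; \int e^{\psi}(\psi'')^2\,d\lambda
   && \Bigl(\textstyle\int e^{\psi}\psi''(\psi')^2 = -\tfrac13\int e^{\psi}(\psi')^4\Bigr), \\
4\|h'\|_{L^2(\lambda)}^2 &= \int e^{\psi}(\psi')^2\,d\lambda, \\
\|h''\|_{L^2(\lambda)}^2 &\ge (2\pi)^2\,\|h'\|_{L^2(\lambda)}^2
   && (\text{Fourier series; } h' \text{ has mean zero}).
\end{align*}
```

Combining these gives $\langle e^{\psi}(\psi'')^2\rangle_\lambda \ge 4\pi^2\langle e^{\psi}(\psi')^2\rangle_\lambda$, that is, (B&E') with $2\epsilon = 4\pi^2$ under the stated normalization. The constant printed in the source is not legible, so this value should be checked against the original.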
6.3 Hypoelliptic Diffusions on a Compact Manifold
In this section we will describe a particularly good situation to which the results in Section 4.2 apply and will attempt to give a more pleasing expression for the associated rate function, even when the process involved is not symmetric. The general setting in which we will be working is as follows. The space $\Sigma$ will be a connected, compact, $N$-dimensional differentiable manifold; and $\lambda$ will denote a fixed probability measure on $\Sigma$ which is "smooth" in the sense that, for any coordinate chart $(W,\Psi)$, there is an $\alpha \in C^\infty(W;(0,\infty))$ for which
In particular, for any $X \in \Gamma(T(\Sigma))$, there is a (unique) $g_X \in C^\infty(\Sigma;\mathbb{R})$ with the property that
where
$$X^*\psi = -X\psi + g_X\psi,\quad \psi \in C^\infty(\Sigma;\mathbb{R}). \tag{6.3.1}$$
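In a coordinate chart the function $g_X$ can be written down explicitly. The sketch below assumes that $d\lambda = \alpha\,dx$ on the chart domain $W$, that $X = \sum_i a^i\partial_i$ there, that $\phi$ and $\psi$ are supported in $W$, and that $X^*$ denotes the adjoint of $X$ in $L^2(\lambda)$ with the sign convention $X^*\psi = -X\psi + g_X\psi$; the names $\alpha$ and $a^i$ are illustrative.

```latex
\begin{align*}
\int (X\phi)\,\psi\,d\lambda
  &= \sum_i \int a^i\,(\partial_i\phi)\,\psi\,\alpha\,dx
   = -\sum_i \int \phi\,\partial_i\bigl(a^i \psi\,\alpha\bigr)\,dx \\
  &= \int \phi\,\Bigl(-X\psi - \frac{1}{\alpha}\sum_i \partial_i(\alpha a^i)\,\psi\Bigr)\,d\lambda, \\
X^*\psi &= -X\psi + g_X\psi
  \quad\text{with}\quad g_X = -\frac{1}{\alpha}\sum_i \partial_i(\alpha\,a^i) \ \text{ on } W.
\end{align*}
```

In other words, $g_X$ is minus the divergence of $X$ with respect to $\lambda$, which is why $g_Y = 0$ characterizes invariance of $\lambda$ later in the section.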
Now suppose that $X_1,\dots,X_d$, and $Y$ are given elements of $\Gamma(T(\Sigma))$ and define the operator
The following theorem contains a few important facts about the diffusion determined by $L_Y$.

6.3.2 Theorem. Let $\Omega = C([0,\infty);\Sigma)$, $\omega \in \Omega \longmapsto \Sigma_t(\omega) \in \Sigma$, $t \in [0,\infty)$, and $\{\mathcal{B}_t : t > 0\}$ be as in Theorem 6.2.9. Then, for each $\sigma \in \Sigma$, there is a unique $P_\sigma^Y \in M_1(\Omega)$ for which
is a mean-zero martingale. In addition, $\{P_\sigma^Y : \sigma \in \Sigma\}$ is a Feller continuous Markov family. Finally, let $\{P_t^Y : t > 0\}$ denote the associated Markov semigroup. Then, for each $\phi \in C^\infty(\Sigma;\mathbb{R})$, the function $(t,\sigma) \in [0,\infty)\times\Sigma \longmapsto [P_t^Y\phi](\sigma) \in \mathbb{R}$ is an element of $C^\infty([0,\infty)\times\Sigma;\mathbb{R})$ which satisfies
$$\frac{\partial u}{\partial t}(t,\sigma) = [L_Yu](t,\sigma),\quad (t,\sigma) \in [0,\infty)\times\Sigma, \quad\text{with } u(0,\cdot) = \phi; \tag{6.3.3}$$
$\lambda$ is $\{P_t^Y : t > 0\}$-invariant if and only if $g_Y = 0$; and $\lambda$ is $\{P_t^Y : t > 0\}$-reversing if and only if $Y = 0$.
PROOF: There are many ways in which one can prove each of these facts. For the sake of completeness, we will outline a proof which should be pleasing to the probabilists, if no one else. Without loss of generality, we assume that $\Sigma$ is an embedded submanifold of $\mathbb{R}^n$ for a suitably large $n \in \mathbb{Z}^+$ and that the vector fields $X_1,\dots,X_d$, and $Y$ are the restrictions to $\Sigma$ of vector fields $\hat X_1,\dots,\hat X_d$, and $\hat Y$ on $\mathbb{R}^n$ with coefficients in $C_b^\infty(\mathbb{R}^n;\mathbb{R})$ (i.e., bounded continuous derivatives of all orders). At the same time, we think of each of the functions $g_{X_k}$ as the restriction to $\Sigma$ of some $\hat g_{X_k} \in C_b^\infty(\mathbb{R}^n;\mathbb{R})$, and then set $\hat X_k^* = -\hat X_k + \hat g_{X_k}$. Hence, if $\hat\Omega = C([0,\infty);\mathbb{R}^n)$, then one can use Itô's theory of stochastic integral equations to construct a Feller continuous Markov family $\{\hat P_x : x \in \mathbb{R}^n\} \subseteq M_1(\hat\Omega)$ with the property that, for every $x \in \mathbb{R}^n$,
is a mean-zero martingale for every $\hat\phi \in C_c^\infty(\mathbb{R}^n;\mathbb{R})$, where $\hat\omega \in \hat\Omega \longmapsto \hat\Sigma_t(\hat\omega) \in \mathbb{R}^n$ and $\hat{\mathcal{B}}_t$ are defined by analogy to their "unhatted" counterparts, and
$$\hat L = -\sum_{k=1}^d \hat X_k^*\hat X_k + \hat Y.$$
In fact, one knows that it is possible to differentiate the solution to Itô's equations as a function of the starting point $x$. As a consequence, one finds first that the associated semigroup $\{\hat P_t : t > 0\}$ maps $C_b^\infty(\mathbb{R}^n;\mathbb{R})$ into itself and then that $(t,x) \in [0,\infty)\times\mathbb{R}^n \longmapsto [\hat P_t\hat\phi](x) \in \mathbb{R}$ is a smooth solution to
$$\frac{\partial\hat u}{\partial t}(t,x) = [\hat L\hat u](t,x),\quad (t,x) \in [0,\infty)\times\mathbb{R}^n, \quad\text{with } \hat u(0,\cdot) = \hat\phi$$
for each $\hat\phi \in C_b^\infty(\mathbb{R}^n;\mathbb{R})$. Finally, if $x = \sigma \in \Sigma$, then one can easily show that $\hat P_\sigma(\Omega) = 1$; and so we get all the required existence results by simply taking $P_\sigma^Y$ to be the restriction of $\hat P_\sigma$ to $\Omega$, $\sigma \in \Sigma$. Furthermore, the asserted uniqueness statement follows easily (cf. Theorem 6.3.2 in [104]) from the fact that we now also know how to find a smooth solution to (6.3.3) for every smooth $\phi$; namely, one simply chooses $\hat\phi \in C_b^\infty(\mathbb{R}^n;\mathbb{R})$ so that $\hat\phi|_\Sigma = \phi$ and then takes $u(t,\sigma) = [\hat P_t\hat\phi](\sigma)$.
To complete the proof, let $\phi, \psi \in C^\infty(\Sigma;\mathbb{R})$ be given and note that, for any $T \in (0,\infty)$,
for $t \in [0,T]$. Hence, with $\psi = 1$, we see that $\lambda$ is $\{P_t^Y : t > 0\}$-invariant if and only if $g_Y = 0$. At the same time, if $Y = 0$, then
whereas, if $\lambda$ is $\{P_t^Y : t > 0\}$-reversing, then $(Y\psi,\phi)_{L^2(\lambda)} = 0$. ∎
6.3.4 Remark.
Note that if $U \in C^\infty(\Sigma;\mathbb{R})$ and $Y^U \in \Gamma(T(\Sigma))$ and $m^U \in M_1(\Sigma)$ are defined by
where $Z^U = \int_\Sigma e^{-U}\,d\lambda$, then $m^U$ is $\{P_t^Y : t > 0\}$-reversing if and only if $Y = Y^U$. Indeed, for any $X \in \Gamma(T(\Sigma))$, one can easily check that
from which it is clear that the reasoning used to prove the last part of the preceding theorem applies with $m^U$ replacing $\lambda$ and $X_k^* + (X_kU)$ replacing $X_k^*$.
As yet we have not made any assumptions which would guarantee the sort of conditions required to make the results in Section 4.2 applicable. For this reason, we will now add the following hypothesis:
$$\operatorname{Lie}(X_1,\dots,X_d) = T(\Sigma), \tag{H}$$
where $\operatorname{Lie}(X_1,\dots,X_d)$ denotes the Lie subalgebra of $\Gamma(T(\Sigma))$ generated by $\{X_1,\dots,X_d\}$ and the equality means that, at each $\sigma \in \Sigma$,
$$\{X(\sigma) : X \in \operatorname{Lie}(X_1,\dots,X_d)\} = T_\sigma(\Sigma).$$
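A small example may help to see what (H) allows. The sketch below is an illustrative choice that does not appear in the text: on the 3-torus, two vector fields fail to span the tangent space at any point, yet together with their commutator they span it everywhere.

```latex
% Illustrative example (not from the text): \Sigma = (\mathbb{R}/\mathbb{Z})^3 with coordinates (x, y, z), d = 2.
\begin{align*}
X_1 &= \partial_x, \qquad
X_2 = \cos(2\pi x)\,\partial_y + \sin(2\pi x)\,\partial_z, \\
[X_1, X_2] &= 2\pi\bigl(-\sin(2\pi x)\,\partial_y + \cos(2\pi x)\,\partial_z\bigr), \\
\det\begin{pmatrix} \cos(2\pi x) & \sin(2\pi x) \\ -2\pi\sin(2\pi x) & 2\pi\cos(2\pi x) \end{pmatrix}
  &= 2\pi \ne 0 .
\end{align*}
```

Hence $X_1(\sigma)$, $X_2(\sigma)$, and $[X_1,X_2](\sigma)$ span $T_\sigma(\Sigma)$ at every $\sigma$, so (H) holds even though the operator $X_1^2 + X_2^2$ is nowhere elliptic.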
According to Hörmander's famous theorem (see [63]), the hypothesis (H) is more than enough to guarantee that, for any $Y \in \Gamma(T(\Sigma))$, the operator
$$\frac{\partial}{\partial t} + L_Y$$
is "hypoelliptic." In particular, this means that
$$P^Y(t,\sigma,d\tau) = p^Y(t,\sigma,\tau)\,\lambda(d\tau),$$
where the function $p^Y$ is a nonnegative element of $C^\infty\bigl((0,\infty)\times\Sigma\times\Sigma;\mathbb{R}\bigr)$. In addition, (H) is sufficient to guarantee that $p^Y$ must be everywhere strictly positive. To see this, one can either invoke Bony's strong maximum principle (see [13]) or one can use the "support theorem" in [103]. Thus, with (H), we have more than enough information to see that not only does $(\tilde{\mathrm U})$ hold but even that, for every $t \in (0,\infty)$, the condition
$$\frac{1}{M_t}\,\lambda \le P^Y(t,\sigma,\cdot) \le M_t\,\lambda,\quad \sigma \in \Sigma, \tag{6.3.6}$$
holds for some $M_t \in [1,\infty)$. In view of the preceding, we now know that (H) allows us to apply the results of Section 4.2, and the following lemma summarizes what we can say immediately on the basis of those results.

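6.3.7 Lemma. Assume that (H) holds, and define $\omega \in \Omega \longmapsto L_t(\omega) \in M_1(\Sigma)$, $t \in (0,\infty)$, as in Remark 4.2.2. Then, for every $\Gamma \in \mathcal{B}_{M_1(\Sigma)}$,
where
$$J^Y(\nu) = \sup\Bigl\{-\int_\Sigma \frac{L_Yu}{u}\,d\nu : u \in C^\infty(\Sigma;[1,\infty))\Bigr\},\quad \nu \in M_1(\Sigma). \tag{6.3.9}$$
Moreover, if $\mathcal E$ denotes the Dirichlet form corresponding to $(t,\sigma) \in (0,\infty)\times\Sigma \longmapsto P^0(t,\sigma,\cdot) \in M_1(\Sigma)$ and $\lambda$, then
$$J^0(\nu) = J_{\mathcal E}(\nu) = \begin{cases} \mathcal{E}(f^{1/2},f^{1/2}) & \text{if } d\nu = f\,d\lambda, \\ \infty & \text{otherwise,} \end{cases} \tag{6.3.10}$$
where $J^0 \equiv J^Y$ with $Y = 0$.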
PROOF:Let L be the operator defined in the discussion preceding Lemma 4.2.31 and define D, as in Lemma 4.2.35. In view of Theorem 4.2.43, the first assertion will be proved once we note that D, C_ C"(C; R), Lu = L y u for 2~ E C"(C;R), and that, for every u E D, there is a sequence {un}yC C"(C; R) such that (tin,Lyun) (u, Lu) uniformly as n 03. Clearly the only one of these needing comment is the last. But, for every u E D,, un [PGnu] E C"(C;R) and Lyun = [Pl'/,Lu]. Finally, since holds, the second assertion is an immediate consequence of Theorem 4.2.58. I


(a)
6.3.11 Remark.
In connection with Remark 6.3.4, one should notice that the last part of Lemma 6.3.7 can be immediately modified to say that $J^Y = J_{\mathcal{E}^U}$ when $Y$ is the $Y^U$ in (6.3.5) and $\mathcal{E}^U$ is the Dirichlet form associated with the corresponding symmetric Markov semigroup on $L^2(m^U)$.
Our main goal in the rest of this section will be to obtain a better expression for the rate function $J^Y$, even in cases when Remark 6.3.11 does not apply. In particular, what we are seeking is an expression in which one can clearly see the distinct contributions made to $J^Y$ by the "symmetrizable" and "nonsymmetrizable" parts of $L_Y$. In order to carry out our program, it will be useful to introduce the following notions. In the first place, for $\phi \in C^\infty(\Sigma;\mathbb{R})$ define $\mathbf{X}\phi \in C^\infty(\Sigma;\mathbb{R}^d)$ by
$$\mathbf{X}\phi = \begin{bmatrix} X_1\phi \\ \vdots \\ X_d\phi \end{bmatrix}.$$
Next, for $p \in [1,\infty)$, define $W_p^{(1)}(\mathbf{X},\lambda)$ to be the space of $\phi \in L^p(\lambda)$ for which there exists a sequence $\{\phi_n\}_1^\infty \subseteq C^\infty(\Sigma;\mathbb{R})$ with the properties that
as $m \to \infty$.
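6.3.13 Lemma. For any $p \in [1,\infty)$, there is a unique continuous linear mapping
$$\mathbf{X}^{(p)} : W_p^{(1)}(\mathbf{X},\lambda) \longrightarrow L^p(\lambda;\mathbb{R}^d)$$
for which $\mathbf{X}^{(p)}\phi = \mathbf{X}\phi$ whenever $\phi \in C^\infty(\Sigma;\mathbb{R})$. In fact, $\mathbf{X}^{(p)}\phi$ is the unique element of $L^p(\lambda;\mathbb{R}^d)$ with the property that
and therefore, $\mathbf{X}^{(p)}\phi = \mathbf{X}^{(q)}\phi$ $\lambda$-almost everywhere when $\phi \in W_p^{(1)}(\mathbf{X},\lambda) \cap W_q^{(1)}(\mathbf{X},\lambda)$. Moreover, if $\eta \in C^1(\mathbb{R};\mathbb{R})$ and $\phi$ is an element of $W_p^{(1)}(\mathbf{X},\lambda)$ which satisfies $\eta\circ\phi \in L^q(\lambda)$ and $(\eta'\circ\phi)\,\mathbf{X}^{(p)}\phi \in L^q(\lambda;\mathbb{R}^d)$ for some $q \in [1,\infty)$, then $\eta\circ\phi \in W_q^{(1)}(\mathbf{X},\lambda)$ and
$$\mathbf{X}^{(q)}(\eta\circ\phi) = (\eta'\circ\phi)\,\mathbf{X}^{(p)}\phi.$$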
PROOF: We first note that, for any $\phi \in L^p(\lambda)$, there is at most one $\Phi \in L^p(\lambda;\mathbb{R}^d)$ with the property that
(6.3.14)
for every $\Psi \in C^\infty(\Sigma;\mathbb{R}^d)$. Second, we observe that if $\{\phi_n\}_1^\infty \subseteq C^\infty(\Sigma;\mathbb{R})$ satisfies (6.3.12), then $\mathbf{X}\phi_n$ converges in $L^p(\lambda;\mathbb{R}^d)$ to a $\Phi \in L^p(\lambda;\mathbb{R}^d)$ for which (6.3.14) holds. Thus, both the existence and uniqueness statements follow immediately, and all the other statements are easy applications of these. ∎

Because the program which we have in mind rests on $L_Y$ being a compact perturbation of $L_0$, we will have to assume that
$$Y = \sum_{k=1}^d a_kX_k \quad\text{for some } \{a_k\}_1^d \subseteq C^\infty(\Sigma;\mathbb{R}). \tag{6.3.15}$$
The importance of (6.3.15) is already apparent in the next result.
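6.3.16 Lemma. Assume that (H) holds. Then $W_2^{(1)}(\mathbf{X},\lambda) = \operatorname{Dom}(\mathcal{E})$ and
$$\mathcal{E}(\phi,\phi) = \|\mathbf{X}^{(2)}\phi\|^2_{L^2(\lambda;\mathbb{R}^d)} \quad\text{for } \phi \in \operatorname{Dom}(\mathcal{E}).$$
If, in addition, $Y$ is given by (6.3.15), then $J^Y(\nu) < \infty$ if and only if $d\nu = f\,d\lambda$, where $f$ is nonnegative and $f^{1/2} \in W_2^{(1)}(\mathbf{X},\lambda)$.
PROOF: To prove the first part, note that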
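for $\phi \in C^\infty(\Sigma;\mathbb{R})$. Thus, since $\phi \in \operatorname{Dom}(\mathcal{E})$ and
$$\mathcal{E}(\phi,\phi) = \lim_{n\to\infty}\mathcal{E}(\phi_n,\phi_n) \quad\text{if } \phi = \lim_{n\to\infty}\phi_n \text{ in } L^2(\lambda)$$
when $\{\phi_n\}_1^\infty \subseteq \operatorname{Dom}(\mathcal{E})$ satisfies
we see that $W_2^{(1)}(\mathbf{X},\lambda) \subseteq \operatorname{Dom}(\mathcal{E})$ and that
$$\mathcal{E}(\phi,\phi) = \|\mathbf{X}^{(2)}\phi\|^2_{L^2(\lambda;\mathbb{R}^d)}$$
for $\phi \in W_2^{(1)}(\mathbf{X},\lambda)$. To prove the opposite inclusion, let $\phi \in \operatorname{Dom}(\mathcal{E})$ be given and set $\phi_n = [P^0_{1/n}\phi]$, $n \in \mathbb{Z}^+$. Then, because of (H), $\{\phi_n\}_1^\infty \subseteq C^\infty(\Sigma;\mathbb{R})$, and clearly $\phi_n \to \phi$. At the same time, by the Spectral Theorem,
as $m \to \infty$.
Turning to the second part, note that (cf. Theorem 4.2.58) there is nothing to do when $Y = 0$. On the other hand, if $Y$ is given by (6.3.15), then, after writing $u \in C^\infty(\Sigma;[1,\infty))$ as $e^{\phi}$, we see that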
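The computation behind the substitution $u = e^{\phi}$ can be sketched as follows. It assumes that $L_Y = -\sum_k X_k^*X_k + Y$ with $X_k^* = -X_k + g_{X_k}$, so that $L_Y = \sum_k X_k^2 - \sum_k g_{X_k}X_k + Y$ is a second order operator with no zeroth order term; the displayed definition of $L_Y$ is not legible in the source, so this form is an assumption.

```latex
\begin{align*}
X_k e^{\phi} &= e^{\phi} X_k\phi, \qquad
X_k^2 e^{\phi} = e^{\phi}\bigl(X_k^2\phi + (X_k\phi)^2\bigr), \\
\frac{L_Y e^{\phi}}{e^{\phi}} &= L_Y\phi + \sum_{k=1}^d (X_k\phi)^2 = L_Y\phi + |\mathbf{X}\phi|^2, \\
-\int_\Sigma \frac{L_Y u}{u}\,d\nu &= -\int_\Sigma \bigl(L_Y\phi + |\mathbf{X}\phi|^2\bigr)\,d\nu
\qquad (u = e^{\phi}).
\end{align*}
```

The first order part of $L_Y$ acts on $e^{\phi}$ by the chain rule alone, so only the squares $X_k^2$ contribute the quadratic term $|\mathbf{X}\phi|^2$.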
Hence, if we take
$$A = \begin{bmatrix} a_1 \\ \vdots \\ a_d \end{bmatrix}, \tag{6.3.18}$$
then we find that
By reversing the preceding argument, we also find that
and so we now see that $J^Y(\nu) < \infty$ if and only if $J^0(\nu) < \infty$.
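In order to complete our program, let $Y$ be given by (6.3.15), define $A$ as in (6.3.18); and, for $\nu \in M_1(\Sigma)$, define $A_\nu$ to be the orthogonal projection in $L^2(\nu;\mathbb{R}^d)$ of $A$ onto the closure of
$$\{\mathbf{X}\phi : \phi \in C^\infty(\Sigma;\mathbb{R})\}$$
in $L^2(\nu;\mathbb{R}^d)$, and set
$$P(A;\nu) = \|A_\nu\|^2_{L^2(\nu;\mathbb{R}^d)}.$$
Since
it is clear that $\nu \in M_1(\Sigma) \longmapsto P(A;\nu)$ is lower semicontinuous and convex.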
6.3.19 Theorem. Assume that (H) holds and that $Y$ is given by (6.3.15). Then
$$J^Y(\nu) = J_{\mathcal E}(\nu) + \frac14P(A;\nu) + \frac12\int_\Sigma R_Y\,d\nu, \tag{6.3.20}$$
where $A$ is defined as in (6.3.18) and
$$R_Y = -\sum_{k=1}^d X_k^*a_k.$$
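A quick consistency check of (6.3.20) is the case $\nu = \lambda$. The sketch below assumes $L_Y = -\sum_k X_k^*X_k + \sum_k a_kX_k$, the representation $-\int (L_Yu/u)\,d\nu = -\int (L_Y\phi + |\mathbf{X}\phi|^2)\,d\nu$ for $u = e^{\phi}$, and $\mathcal{E}(\phi,\phi) = \|\mathbf{X}\phi\|^2_{L^2(\lambda;\mathbb{R}^d)}$; these are readings of the surrounding text rather than quotations from it.

```latex
\begin{align*}
J_{\mathcal E}(\lambda) &= \mathcal{E}(1,1) = 0, \qquad
\int_\Sigma X_k^* a_k\,d\lambda = (a_k, X_k 1)_{L^2(\lambda)} = 0
\ \Longrightarrow\ \int_\Sigma R_Y\,d\lambda = 0, \\
J^Y(\lambda) &= \sup_{\phi}\Bigl\{-\int_\Sigma \bigl(L_Y\phi + |\mathbf{X}\phi|^2\bigr)\,d\lambda\Bigr\}
 = \sup_{\phi}\Bigl\{-(A, \mathbf{X}\phi)_{L^2(\lambda;\mathbb{R}^d)}
   - \|\mathbf{X}\phi\|^2_{L^2(\lambda;\mathbb{R}^d)}\Bigr\} \\
 &= \tfrac14\,\|A_\lambda\|^2_{L^2(\lambda;\mathbb{R}^d)} = \tfrac14\,P(A;\lambda).
\end{align*}
```

The second line uses $\int_\Sigma X_k^*X_k\phi\,d\lambda = (X_k\phi, X_k1)_{L^2(\lambda)} = 0$, and the last equality is the maximum of $x \mapsto -(A,x) - \|x\|^2$ over the closed subspace spanned by the $\mathbf{X}\phi$, attained at $x = -\tfrac12A_\lambda$. Both sides of (6.3.20) therefore agree at $\nu = \lambda$.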
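PROOF: In view of Lemma 6.3.16, we need only consider $\nu \in M_1(\Sigma)$ for which $d\nu = f\,d\lambda$ for some nonnegative $f$ with $f^{1/2} \in W_2^{(1)}(\mathbf{X},\lambda)$. In addition, since both sides of (6.3.20) are lower semicontinuous and convex, we may and will assume that $f \ge \epsilon$ for some $\epsilon > 0$. (Otherwise, set $\nu_\epsilon = (1-\epsilon)\nu + \epsilon\lambda$ and let $\epsilon \searrow 0$.)
We begin by proving that
$$J^Y(\nu) = J_{\mathcal E}(\nu) + \frac14P(A;\nu) - \Bigl(A_\nu, \frac{\mathbf{X}^{(2)}f^{1/2}}{f^{1/2}}\Bigr)_{L^2(\nu;\mathbb{R}^d)}. \tag{6.3.21}$$
To this end, choose $\{\phi_n\}_1^\infty \subseteq C^\infty(\Sigma;\mathbb{R})$ so that
and set $\Phi_n = \mathbf{X}\phi_n - \tfrac12A_\nu$. Then (cf. (6.3.17)) $J^Y(\nu)$ equals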
and, for any given $\psi \in C^\infty(\Sigma;\mathbb{R})$,
At the same time,
(2)
Hence, since x + 
1/2
E L2 ( u ; R d ) ,
where
5
c i/\X+I\L2(u;Rd)7
"9
for some $C \in (0,\infty)$ depending only on $A$ and $f$. Clearly, by using (6.3.17) with $Y = 0$ to compute $J_{\mathcal E}(\nu)$, one can easily use the preceding to get (6.3.21).
To prove (6.3.20) from (6.3.21), all that we have to do is check that
and this comes down to showing that there exists a sequence $\{g_n\}_1^\infty \subseteq C^\infty(\Sigma;\mathbb{R})$ such that
For this purpose, choose $\{u_n\}_1^\infty \subseteq C^\infty(\Sigma;[1,\infty))$ so that
and set $g_n = \log u_n$. One then has that
We can now complete the proof by simply noting that
6.3.22 Exercise. Let $X_1,\dots,X_d$, and $Y$ be smooth vector fields on the connected, compact manifold $\Sigma$; and assume that the $X_k$'s satisfy (H). Next, set $\hat\Sigma = \Sigma\times\mathbb{R}$ and define the vector fields $\hat X_1,\dots,\hat X_d$, and $\hat Y$ on $\hat\Sigma$ by
and
for $\phi \in C^\infty(\hat\Sigma;\mathbb{R})$, where $b_1,\dots,b_d$, and $c$ are given elements of $C^\infty(\Sigma;\mathbb{R})$. Finally, define
$$\hat L = \sum_{k=1}^d \hat X_k^2 + \hat Y \quad\text{on } C^\infty(\hat\Sigma;\mathbb{R}).$$
One can then show that $\hat L$ determines a (unique) Feller continuous Markov family $\{\hat P_{\hat\sigma} : \hat\sigma \in \hat\Sigma\}$ of probability measures on $\hat\Omega = C([0,\infty);\hat\Sigma)$ with the property that
is a mean-zero martingale for every $\hat\sigma \in \hat\Sigma$ and all $\phi \in C^\infty(\hat\Sigma;\mathbb{R})$. In fact, as aficionados of stochastic differential equations will easily verify, if $\hat\sigma = (\sigma,0)$ …
PURE AND APPLIED MATHEMATICS VOl. 1 VOl. 2 VOl. 3
VOl. 4 VOl. Vol. VOl. Vol. VOl.
5
6
7 8 9
VOl. 10
VOl. 11* VOl. 12* Vol. 13 Vol. 14 Vol. 15* Vol. Vol. Vol. VOl. Vol. VOl.
16* 17 18 19 20 21
VOl. 22 Vol. 23* VOl. 24
Arnold Sommerfeld, Partial Differential Equations in Physics Reinhold Baer , Linear Algebra and Projective Geometry Herbert Busemann and Paul Kelly, Projective Geometry and Projective Metrics Stefan Bergman and M. Schiffer, Kernel Functions and Elliptic Differential Equations in Mathematical Physics Ralph Philip Boas, Jr., Entire Functions Herbert Busemann, The Geometry of Geodesics Claude Chevalley, Fundamental Concepts of Algebra SzeTsen Hu, Homotopy Theory A. M. Ostrowski, Solution of Equations in Euclidean and Banach Spaces, Third Edition of Solution of Equations and Systems of Equations J . Dieudonnt, Treatise on Analysis: Volume I, Foundations of Modern Analysis; Volume II; Volume III; Volume IV; Volume V; Volume VI; Volume VII S. I . Goldberg, Curvature and Homology Sigurdur Helgason, Differential Geometry and Symmetric Spaces T . H. Hildebrandt, Introduction to the Theory of Integration Shreeram Abhyankar, Local Analytic Geometry Richard L. Bishop and Richard J. Crittenden, Geometry of Manifolds Steven A. Gad, Point Set Topology Barry Mitchell, Theory of Categories Anthony P . Morse, A Theory of Sets Gustave Choquet, Topology Z. I. Borevich and I. R. Shafarevich, Number Theory JosC Luis Massera and Juan Jorge Schaffer, Linear Differential Equations and Function Spaces Richard D. Schafer, A n Introduction to Nonassociative Algebras Martin Eichler, Introduction to the Theory of Algebraic Numbers and Functions Shreeram Abhyanker, Resolution of Singularities of Embedded Algebraic Surfaces
Presently out of print
Vol. 25 Vol. Vol. Vol. Vol.
26 27 28* 29
Vol. 30 Vol. 31 Vol. 32 VOl. 33 VOl. 34* VOl. 35
Vol. VOl. Vol. VOl. Vol.
36 37 38 39 40*
Vol. 41* Vol. 42 VOl. 43 VOl. 44
VOl. 45 Vol. 46 VOl. 47
Vol. 48
Franqois Treves, Topological Vector Spaces, Distributions, and Kernels Peter D. Lax and Ralph S . Phillips, Scattering Theory Oystein Ore, The Four Color Problem Maurice Heins, Complex Function Theory R. M. Blumenthal and R. K . Getoor, Markov Processes and Potential Theory L. J . Mordell, Diophantine Equations J . Barkley Rosser, Simplified Independence Pro0fs: Boolean Valued Models of Set Theory William F . Donoghue, Jr., Distributions and Fourier Transforms Marston Morse and Stewart S . Cairns, Critical Point Theory in Global Analysis and Differential Topology Edwin Weiss, Cohomology of Groups Hans Freudenthal and H. De Vries, Linear Lie Groups Laszlo Fuchs, Infinite Abelian Groups Keio Nagami, Dimension Theory Peter L. Duren, Theory of H p Spaces Bod0 Pareigis, Categories and Functors Paul L. Butzer and Rolf J . Nessel, Fourier Analysis and Approximation: Volume I, OneDimensional Theory Eduard PrugoveCki, Quantum Mechanics in Hilbert Space D. V. Widder, An Introduction to Transform Theory Max D . Larsen and Paul J. McCarthy, Multiplicative Theory of Ideals ErnstAugust Behrens, Ring Theory Morris Newman, Integral Matrices Glen E. Bredon, Introduction to Compact Transformation Groups Werner Greub, Stephen Halperin, and Ray Vanstone, Connections, Curvature, and Cohomology: Volume I, De Rham Cohomology of Manifolds and Vector Bundles Volume 11, Lie Groups, Principal Bundles, and Characteristic Classes Volume III, Cohomology of Principal Bundles and Homogeneous Spaces Xia Daoxing, Measure and Integration Theory of InfiniteDimensional Spaces: Abstract Harmonic Analysis
Vol. 49 Ronald G. Douglas, Banach Algebra Techniques in Operator Theory
Vol. 50 Willard Miller, Jr., Symmetry Groups and Their Applications
Vol. 51 Arthur A. Sagle and Ralph E. Walde, Introduction to Lie Groups and Lie Algebras
Vol. 52 T. Benny Rushing, Topological Embeddings
Vol. 53* James W. Vick, Homology Theory: An Introduction to Algebraic Topology
Vol. 54 E. R. Kolchin, Differential Algebra and Algebraic Groups
Vol. 55 Gerald J. Janusz, Algebraic Number Fields
Vol. 56 A. S. B. Holland, Introduction to the Theory of Entire Functions
Vol. 57 Wayne Roberts and Dale Varberg, Convex Functions
Vol. 58 H. M. Edwards, Riemann’s Zeta Function
Vol. 59 Samuel Eilenberg, Automata, Languages, and Machines: Volume A, Volume B
Vol. 60 Morris Hirsch and Stephen Smale, Differential Equations, Dynamical Systems, and Linear Algebra
Vol. 61 Wilhelm Magnus, Noneuclidean Tesselations and Their Groups
Vol. 62 François Treves, Basic Linear Partial Differential Equations
Vol. 63* William M. Boothby, An Introduction to Differentiable Manifolds and Riemannian Geometry
Vol. 64 Brayton Gray, Homotopy Theory: An Introduction to Algebraic Topology
Vol. 65 Robert A. Adams, Sobolev Spaces
Vol. 66 John J. Benedetto, Spectral Synthesis
Vol. 67 D. V. Widder, The Heat Equation
Vol. 68 Irving Ezra Segal, Mathematical Cosmology and Extragalactic Astronomy
Vol. 69 I. Martin Isaacs, Character Theory of Finite Groups
Vol. 70 James R. Brown, Ergodic Theory and Topological Dynamics
Vol. 71 C. Truesdell, A First Course in Rational Continuum Mechanics: Volume I, General Concepts
Vol. 72 K. D. Stroyan and W. A. J. Luxemburg, Introduction to the Theory of Infinitesimals
Vol. 73 B. M. Puttaswamaiah and John D. Dixon, Modular Representations of Finite Groups
Vol. 74 Melvyn Berger, Nonlinearity and Functional Analysis: Lectures on Nonlinear Problems in Mathematical Analysis
Vol. 75 George Grätzer, Lattice Theory
Vol. 76 Charalambos D. Aliprantis and Owen Burkinshaw, Locally Solid Riesz Spaces
Vol. 77 Jan Mikusinski, The Bochner Integral
Vol. 78 Michiel Hazewinkel, Formal Groups and Applications
Vol. 79 Thomas Jech, Set Theory
Vol. 80 Sigurdur Helgason, Differential Geometry, Lie Groups, and Symmetric Spaces
Vol. 81 Carl L. DeVito, Functional Analysis
Vol. 82 Robert B. Burckel, An Introduction to Classical Complex Analysis
Vol. 83 C. Truesdell and R. G. Muncaster, Fundamentals of Maxwell’s Kinetic Theory of a Simple Monatomic Gas: Treated as a Branch of Rational Mechanics
Vol. 84 Louis Halle Rowen, Polynomial Identities in Ring Theory
Vol. 85 Joseph J. Rotman, An Introduction to Homological Algebra
Vol. 86 Barry Simon, Functional Integration and Quantum Physics
Vol. 87 Dragoš M. Cvetković, Michael Doob, and Horst Sachs, Spectra of Graphs
Vol. 88 David Kinderlehrer and Guido Stampacchia, An Introduction to Variational Inequalities and Their Applications
Vol. 89 Herbert Seifert and W. Threlfall, A Textbook of Topology
Vol. 90 Grzegorz Rozenberg and Arto Salomaa, The Mathematical Theory of L Systems
Vol. 91 Donald W. Kahn, Introduction to Global Analysis
Vol. 92 Eduard Prugovečki, Quantum Mechanics in Hilbert Space, Second Edition
Vol. 93 Robert M. Young, An Introduction to Nonharmonic Fourier Series
Vol. 94 M. C. Irwin, Smooth Dynamical Systems
Vol. 96 John B. Garnett, Bounded Analytic Functions
Vol. 97 Jean Dieudonné, A Panorama of Pure Mathematics: As Seen by N. Bourbaki
Vol. 98 Joseph G. Rosenstein, Linear Orderings
Vol. 99 M. Scott Osborne and Garth Warner, The Theory of Eisenstein Systems
Vol. 100 Richard V. Kadison and John R. Ringrose, Fundamentals of the Theory of Operator Algebras: Volume 1, Elementary Theory; Volume 2, Advanced Theory
Vol. 101 Howard Osborn, Vector Bundles: Volume I, Foundations and Stiefel-Whitney Classes
Vol. 102 Avraham Feintuch and Richard Saeks, System Theory: A Hilbert Space Approach
Vol. 103 Barrett O’Neill, Semi-Riemannian Geometry: With Applications to Relativity
Vol. 104 K. A. Zhevlakov, A. M. Slin’ko, I. P. Shestakov, and A. I. Shirshov, Rings That Are Nearly Associative
Vol. 105 Ulf Grenander, Mathematical Experiments on the Computer
Vol. 106 Edward B. Manoukian, Renormalization
Vol. 107 E. J. McShane, Unified Integration
Vol. 108 A. P. Morse, A Theory of Sets, Revised and Enlarged Edition
Vol. 109 K. P. S. Bhaskara Rao and M. Bhaskara Rao, Theory of Charges: A Study of Finitely Additive Measures
Vol. 110 Larry C. Grove, Algebra
Vol. 111 Steven Roman, The Umbral Calculus
Vol. 112 John W. Morgan and Hyman Bass, editors, The Smith Conjecture
Vol. 113 Sigurdur Helgason, Groups and Geometric Analysis: Integral Geometry, Invariant Differential Operators, and Spherical Functions
Vol. 114 E. R. Kolchin, Differential Algebraic Groups
Vol. 115 Isaac Chavel, Eigenvalues in Riemannian Geometry
Vol. 116 W. D. Curtis and F. R. Miller, Differential Manifolds and Theoretical Physics
Vol. 117 Jean Berstel and Dominique Perrin, Theory of Codes
Vol. 118 A. E. Hurd and P. A. Loeb, An Introduction to Nonstandard Real Analysis
Vol. 119 Charalambos D. Aliprantis and Owen Burkinshaw, Positive Operators
Vol. 120 William M. Boothby, An Introduction to Differentiable Manifolds and Riemannian Geometry, Second Edition
Vol. 121 Douglas C. Ravenel, Complex Cobordism and Stable Homotopy Groups of Spheres
Vol. 122 Sergio Albeverio, Jens Erik Fenstad, Raphael Høegh-Krohn, and Tom Lindstrøm, Nonstandard Methods in Stochastic Analysis and Mathematical Physics
Vol. 123 Alberto Torchinsky, Real-Variable Methods in Harmonic Analysis
Vol. 124 Robert J. Daverman, Decomposition of Manifolds
Vol. 125 J. M. G. Fell and R. S. Doran, Representations of *-Algebras, Locally Compact Groups, and Banach *-Algebraic Bundles: Volume 1, Basic Representation Theory of Groups and Algebras
Vol. 126 J. M. G. Fell and R. S. Doran, Representations of *-Algebras, Locally Compact Groups, and Banach *-Algebraic Bundles: Volume 2, Induced Representations, the Imprimitivity Theorem, and the Generalized Mackey Analysis
Vol. 127 Louis H. Rowen, Ring Theory, Volume I
Vol. 128 Louis H. Rowen, Ring Theory, Volume II
Vol. 129 Colin Bennett and Robert Sharpley, Interpolation of Operators
Vol. 130 Jürgen Pöschel and Eugene Trubowitz, Inverse Spectral Theory
Vol. 131 Jens Carsten Jantzen, Representations of Algebraic Groups
Vol. 132 Nolan R. Wallach, Real Reductive Groups I
Vol. 133 Michael Sharpe, General Theory of Markov Processes
Vol. 134 Igor Frenkel, James Lepowsky, and Arne Meurman, Vertex Operators and the Monster
Vol. 135 Donald Passman, Infinite Crossed Products
Vol. 136 Heinz-Otto Kreiss and Jens Lorenz, Initial-Boundary Value Problems and the Navier-Stokes Equations
Vol. 137 Jean-Dominique Deuschel and Daniel W. Stroock, Large Deviations