An Initiation to Logarithmic Sobolev Inequalities Gilles Royer
SMF/AMS TEXTS and MONOGRAPHS Volume 14
Cours Spécialisés Numéro 5 1999
Translated by
Donald Babbitt
American Mathematical Society, Société Mathématique de France
Une Initiation aux Inégalités de Sobolev Logarithmiques (An Initiation to Logarithmic Sobolev Inequalities), Gilles Royer. Originally published in French by Société Mathématique de France.
Copyright © 1999 Société Mathématique de France. Translated from the French by Donald Babbitt. 2000 Mathematics Subject Classification. Primary 60-02; Secondary 35J85, 47B25, 47D07, 60J60, 82C99.
For additional information and updates on this book, visit
www.ams.org/bookpages/smfams-14
Library of Congress Cataloging-in-Publication Data
Royer, Gilles. [Initiation aux inégalités de Sobolev logarithmiques. English] An initiation to logarithmic Sobolev inequalities / Gilles Royer ; translated by Donald Babbitt. p. cm. - (SMF/AMS texts and monographs, ISSN 1525-2302 ; v. 14) (Cours spécialisés, ISSN 1284-6090 ; 5) Includes bibliographical references.
ISBN-13: 978-0-8218-4401-4 (alk. paper) ISBN-10: 0-8218-4401-6 (alk. paper)
1. Ergodic theory. 2. Logarithmic functions. 3. Semigroups of operators. 4. Differential inequalities. I. Title.
QA313.R6913 2007 515'.48--dc22 2007060798
Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting for them, are permitted to make fair use of the material, such as to copy a chapter for use in teaching or research. Permission is granted to quote brief passages from this publication in reviews, provided the customary acknowledgment of the source is given. Republication, systematic copying, or multiple reproduction of any material in this publication is permitted only under license from the American Mathematical Society. Requests for such permission should be addressed to the Acquisitions Department, American Mathematical Society, 201 Charles Street, Providence, Rhode Island 02904-2294, USA. Requests can also be made by e-mail to reprint-permission@ams.org. © 2007 by the American Mathematical Society. All rights reserved. The American Mathematical Society retains all rights except those granted to the United States Government. Printed in the United States of America.
® The paper used in this book is acid-free and falls within the guidelines established to ensure permanence and durability.
Visit the AMS home page at http://www.ams.org/
Contents
Preface
Chapter 1. Self-Adjoint Operators
1.1. Symmetric operators
1.2. Spectral decomposition of self-adjoint operators
Chapter 2. Semi-Groups
2.1. Semi-groups of self-adjoint operators
2.2. Kolmogorov semi-groups
Chapter 3. Logarithmic Sobolev Inequalities
3.1. The Poincaré and Gross inequalities
3.2. An application to ergodicity
Chapter 4. Gibbs Measures
4.1. Generalities
4.2. An Ising model with real spin
Chapter 5. Stabilization of Glauber-Langevin Dynamics
5.1. The Gross inequality and stabilization
5.2. The case of weak interactions
5.3. Perspectives
Appendix A.
A.1. Markovian kernels
A.2. Bounded real measures
A.3. The topology of weak convergence
Bibliography
Preface
This book contains essentially the material covered in a course "de troisième cycle"¹ taught during the second semester of the 1996-1997 academic year at the Université d'Orléans. The goal of this course was to give an exposition of an example of the use of logarithmic Sobolev inequalities coming primarily from two papers by B. Zegarlinski [Zeg90, Zeg96]. The example is concerned with real spin models with weak interactions on a lattice where one can apply a classic method due to Dobrushin; see notably [Dob70]. For these models, we give a proof of the uniqueness of the Gibbs measure by showing the exponential stabilization of the stochastic evolution of an infinite-dimensional diffusion process which generalizes the case of the Glauber dynamics for the Ising model. Although these models are technically more complicated than the Ising model, one still uses familiar techniques, e.g., using Itô's stochastic integral calculus to construct and study diffusion processes, as well as utilizing the well-known properties of self-adjoint differential operators on $\mathbb{R}^n$ and Sobolev and Poincaré inequalities in their original setting. These models also utilize in a natural way some elegant results on logarithmic Sobolev inequalities such as the Bakry-Emery and Herbst inequalities. Interestingly, these models are simplifications of the Nelson models of Euclidean fields where Gross first introduced logarithmic Sobolev inequalities.²
In this book we introduce in a self-contained manner the basic notions of self-adjoint operators, diffusion processes, and Gibbs measures. The chapter on logarithmic Sobolev inequalities is enriched by adding applications to Markov chains so as not to remain in too special a setting. The reader will find indications of some recent applications of logarithmic Sobolev inequalities to statistical mechanics at the end of Chapter 5. I would like to thank my colleagues S. Roelly and P. Maheux for very useful discussions as well as the students of the DEA d'Orléans, in particular, G. Salin.
Note added to the original Preface. The translation presented here differs from the French original by a small number of corrections. Since the original course was given, logarithmic Sobolev inequalities have been the subject of many articles. We recommend that readers interested in further study of the subjects treated here consult [Cor02, OR07] and their bibliographies.
¹Translator's note: "Un cours de troisième cycle" is equivalent to an advanced graduate course in an American university.
²Translator's note: These are now also called Gross inequalities.
CHAPTER 1
Self-Adjoint Operators
We denote by $H$ a separable complex Hilbert space,¹ by $D$ a dense linear subspace of $H$, and by $A$ an operator from $D$ to $H$. The space $D$ is called the domain of the operator $A$ and is denoted $D(A)$. Unlike the case of bounded operators,² in particular operators on any finite-dimensional Hilbert space, simple consideration of the symmetry of operators does not lead to a theorem of spectral decomposition. We will introduce directly the notion of self-adjointness by utilizing spectral conditions based on an exposé of P. Cartier at l'École Polytechnique.
1.1. Symmetric operators
Definition 1.1.1. We say that the complex number $\lambda$ is in the resolvent set $\rho(A)$ of $A$ if $(\lambda\,\mathrm{Id} - A)$ is injective, its image $(\lambda\,\mathrm{Id} - A)D$ is dense in $H$, and the inverse operator $(\lambda\,\mathrm{Id} - A)^{-1}$ is a bounded operator from $(\lambda\,\mathrm{Id} - A)D$ to $H$. This operator is then uniquely extended to a bounded operator $R_\lambda$ on $H$ called the resolvent operator. We often abbreviate $\lambda\,\mathrm{Id} - A$ by $\lambda - A$.
Proposition 1.1.2 (Resolvent Equation). For all $\lambda, \mu \in \rho(A)$ we have:
$$R_\lambda - R_\mu = (\mu - \lambda)\,R_\mu R_\lambda.$$
Note that the Resolvent Equation implies that $\{R_\lambda\}$ is a commutative family of operators.
Definition 1.1.3. We say that $A$ is closed if $D$ is complete for the norm
$$\|\psi\|_A = \left(\|\psi\|^2 + \|A\psi\|^2\right)^{1/2}.$$
Consider the graph of $A$:
$$G_A = \{(\psi, A\psi) \in H \times H : \psi \in D\}.$$
It is obvious that the projection of the graph of $A$, with the usual product norm of $H \times H$, onto $D$, with the norm $\|\cdot\|_A$, is an isometry. Thus it is clear that $A$ is closed if and only if its graph $G_A$ is closed in $H \times H$. For a closed operator $A$ one can express the resolvent set $\rho(A)$ in a simpler way.
¹The scalar product is left linear and right anti-linear.
²Recall that an operator $B$ is bounded if there exists a constant $M$ such that $\|Bx\| \leq M\|x\|$ for all $x$ in $D$.
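The Resolvent Equation can be checked by hand in finite dimension, where every operator is bounded and the resolvent set is the complement of the set of eigenvalues. The following is a minimal sketch under that assumption: the 2x2 symmetric matrix `A` and the points `lam` and `mu` are illustrative choices that do not come from the text, and the convention $R_\lambda = (\lambda\,\mathrm{Id} - A)^{-1}$ of Definition 1.1.1 is used, under which the identity reads $R_\lambda - R_\mu = (\mu - \lambda) R_\mu R_\lambda$.

```python
def mat_mul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def resolvent(A, lam):
    """Return (lam*Id - A)^{-1} for a 2x2 matrix A, using the explicit inverse."""
    a, b = lam - A[0][0], -A[0][1]
    c, d = -A[1][0], lam - A[1][1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("lam is an eigenvalue of A, hence not in the resolvent set")
    return [[d / det, -b / det], [-c / det, a / det]]

# Illustrative data (assumptions): a symmetric matrix and two non-spectral points.
A = [[2.0, 1.0], [1.0, 3.0]]
lam, mu = 0.5 + 1.0j, -1.0

R_lam, R_mu = resolvent(A, lam), resolvent(A, mu)

# Left side: R_lam - R_mu.  Right side: (mu - lam) * R_mu * R_lam.
lhs = [[R_lam[i][j] - R_mu[i][j] for j in range(2)] for i in range(2)]
prod = mat_mul(R_mu, R_lam)
rhs = [[(mu - lam) * prod[i][j] for j in range(2)] for i in range(2)]
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))

# Commutativity of the family {R_lam}, as noted after Proposition 1.1.2.
other = mat_mul(R_lam, R_mu)
comm_gap = max(abs(prod[i][j] - other[i][j]) for i in range(2) for j in range(2))

print("resolvent equation gap:", max_gap)
print("commutator gap:", comm_gap)
```

Both gaps should be at the level of floating-point rounding. The check says nothing about unbounded operators, where the domain questions treated in this chapter are the real difficulty.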
Proposition 1.1.4. Let $A$ be a closed operator. In order for $\lambda$ to be in $\rho(A)$, it is necessary and sufficient that one of the two following conditions hold:
(1) The mapping $(\lambda - A)$ is a bijection of $D$ onto $H$.
(2) There exists a bounded operator $R_\lambda$ of $H$ such that:
$$R_\lambda \circ (\lambda - A) = \mathrm{Id}_D, \qquad (\lambda - A) \circ R_\lambda = \mathrm{Id}_H.$$
PROOF. (1) In order to show the necessity of the condition, we need to show that if $\lambda \in \rho(A)$ then $\mathrm{Image}(\lambda - A) = H$. Since this image is dense, there exists for any $x \in H$ a sequence $y_n$ of elements in $D$ such that $x = \lim(\lambda y_n - A y_n)$. By applying the bounded operator $R_\lambda$, we can conclude that $y_n = R_\lambda(\lambda - A)y_n$ converges. Since both $y_n$ and $A y_n$ converge, and $G_A$ is closed, the limit $y$ of $y_n$ is in $D$ and $\lim(A y_n) = Ay$, from which we conclude that $\lambda y - Ay = x$. Since $x$ is arbitrary, we see that $(\lambda - A)D = H$. Now suppose $\lambda - A$ is a bijection. It is a continuous mapping from the Hilbert space $(D, \|\cdot\|_A)$ to the Hilbert space $H$. By Banach's open mapping theorem, the inverse mapping is continuous and obviously remains continuous if we equip $D$ with the weaker norm $\|\cdot\|_H$.
(2) We see these conditions are equivalent to the initial definition if we take into account the fact that $\lambda - A$ is surjective if its image is dense. □
Self-adjoint operators are a special class of symmetric operators, where by a symmetric operator $A$ with a dense domain $D$ in $H$ we mean a linear operator $A : D \to H$ that satisfies:
$$\forall \varphi, \psi \in D \qquad (A\varphi, \psi) = (\varphi, A\psi).$$
They are often defined on natural domains that are too small for the operator $A$ to be closed. A basic example is the Laplacian $\Delta$ defined, say, on $D = C_c^\infty(\mathbb{R}^n)$, the space of infinitely differentiable functions on $\mathbb{R}^n$ with compact support. However, these operators are easily seen to be closeable in the following sense:
Proposition 1.1.5. The closure of the graph of the symmetric operator $A$ with domain $D$ as a subset of $H \times H$ is the graph of an operator $\bar{A}$ defined on a domain $D' \supset D$. Moreover the resolvent sets and the resolvent operators are the same for both operators. ($\bar{A}$ is called the closure of $A$.)
PROOF. We first show that $\overline{G_A}$ is the graph of a function from $H$ to $H$. We need to show that if $(\varphi, \psi) \in \overline{G_A}$ and $(\varphi, \psi') \in \overline{G_A}$ then $\psi = \psi'$. There exists a sequence $(\varphi_n, A\varphi_n)$ that converges to $(\varphi, \psi)$ and similarly a sequence $(\varphi'_n, A\varphi'_n)$ that converges to $(\varphi, \psi')$. [...] $\geq 0$, almost everywhere. Then $-\Delta + V$ is essentially self-adjoint on $D = C_c^\infty(\mathbb{R}^n)$.
PROOF. Recall Kato's Lemma (see, for example, Reed & Simon [RS72])
which says that if $u$ is a real function in $L^1_{\mathrm{loc}}(\mathbb{R}^n)$ such that $\Delta u \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$, then one has, in the sense of distributions, that $\Delta|u| \geq \mathrm{sgn}(u)\Delta u$. We argue by contradiction. Suppose that $R := (-\Delta + V + 1)D$ is not dense in $L^2$; then we can find a non-zero function $u$ in $L^2$ such that $(u, \varphi) = 0$ for all functions $\varphi$ in $R$. Since $D$ is stable under complex conjugation, it is easy to see that we can assume that $u$ is real. We have that $(-\Delta + V + 1)u = 0$ in the sense of distributions. It follows immediately that $\Delta u \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$, which allows us to apply Kato's Lemma:
$$\Delta|u| \geq \mathrm{sgn}(u)\Delta u = (V + 1)|u| \geq |u|. \tag{1.1.4}$$
We regularize $|u|$ with the aid of an infinitely differentiable positive function $e$, with compact support and integral equal to 1, as follows. Let $e_\delta(x) := \delta^{-n} e(x/\delta)$ and $w_\delta := |u| * e_\delta$. The regularized function $w_\delta$ is an infinitely differentiable square integrable function and, applying (1.1.4), we have:
$$\Delta w_\delta = \Delta|u| * e_\delta \geq |u| * e_\delta = w_\delta,$$
$$(\Delta w_\delta, w_\delta) \geq \|w_\delta\|_2^2. \tag{1.1.5}$$
On the other hand, $\Delta w_\delta = |u| * \Delta e_\delta \in L^2$, which by utilizing Corollary 1.1.16 implies that the function $w_\delta$ is in the domain of the negative self-adjoint operator $\bar{\Delta}$; thus $(w_\delta, \Delta w_\delta) \leq 0$. Combining this with (1.1.5) we see that $w_\delta = 0$ for all $\delta$. Since $w_\delta \to |u|$ in $L^2$ when $\delta \to 0$ we get $u = 0$, which is a contradiction. □
Up to this point we have not explained why our notion of "self-adjoint" is the same as the more traditional one. This we do now.
Definition 1.1.18. The adjoint operator $A^*$ of $(D, A)$ is the operator defined on the space $D'$ of vectors $g$ such that the linear form $f \mapsto (g, Af)$ is continuous, and where $A^*(g)$ is defined by:
$$\forall f \in D \qquad (g, Af) = (A^* g, f).$$
Remark 1.1.19. The existence of a unique $A^*(g)$ satisfying the above equation follows from the Riesz Representation Theorem.
Proposition 1.1.20. Let $(D, A)$ be a symmetric operator in $H$. The operator $A$ is self-adjoint if and only if $D$ coincides with the set of vectors $g$ such that there exists a constant $c(g)$ satisfying (1.1.6)
I(g,Af)l 0 IIJIIL2(fo,sl)
- VOf(0)I , 0.
PROOF. Set $f(t) := \log(\|S(t)\|)$. Since the operator norm satisfies $\|AB\| \leq \|A\|\,\|B\|$, the function $f$ is subadditive and the above lemma applies. □
The number $\gamma_S$ is called the Lyapunov exponent of the semi-group.
2.1.2. The case of symmetric operators. Recall that a self-adjoint operator $A$ is bounded below if there exists a constant $m$ such that $(Ax, x) \geq m\|x\|^2$ for all $x$ in the domain of $A$. Using the spectral decomposition theorem, one can easily show that the best possible constant for this inequality, called the lower bound of $A$, is $m = \inf \sigma(A)$.
Proposition 2.1.8. Let $A$ be a self-adjoint operator on a Hilbert space $H$ that is bounded below. There then exists a unique strongly continuous semi-group $S(t)$ for which the infinitesimal generator is $A$. For $\Re(\lambda) < m$ we have:
$$R_\lambda = -\int_0^\infty e^{\lambda t} S(t)\,dt. \tag{2.1.4}$$
Conversely, the infinitesimal generator $A$ of a semi-group $S(t)$ of symmetric operators on $H$ is a self-adjoint operator that is bounded below.
PROOF. We begin by considering the case where $H = L^2(\mu)$ and where $A$ is the multiplication operator defined by $X$ on:
$$D(A) = \{f \in L^2(\mu) : fX \in L^2(\mu)\}.$$
Since $A$ is bounded below, there exists a constant $m$ such that $X \geq m$ almost everywhere. Thus for all $t \geq 0$ the function $e^{-tX}$ is bounded and
$$\left|\frac{1}{t}\left(1 - e^{-tX}\right)\right| \leq e^{|m|t}\,|X|.$$
Multiplication by $e^{-tX}$ defines a bounded operator $S(t)$ for each $t$ and the family $S(t)$, $t \in \mathbb{R}_+$, is clearly a semi-group. If $f \in D(A)$, the dominated convergence theorem implies that:
$$\lim_{t \to 0} \frac{1}{t}\left(f - S(t)f\right) = \lim_{t \to 0} \frac{1}{t}\left(f - e^{-tX}f\right) = Xf.$$
Going in the other direction, if $f$ is in the domain of the infinitesimal generator of the semi-group $S(t)$, the sequence $n f\,(1 - \exp(-X/n))$ converges in $L^2$, and by extracting a subsequence that converges almost everywhere, we see that $Xf \in L^2$ and hence $f \in D(A)$. The uniqueness of the semi-group just constructed follows from the uniqueness result in Proposition 2.1.4, and the formula (2.1.4) follows from the formula:
$$\forall \lambda < x \qquad \frac{1}{\lambda - x} = -\int_0^\infty e^{-tx} e^{t\lambda}\,dt.$$
The construction of the semi-group in the general case reduces immediately to the preceding case because the spectral theorem allows us to transform $A$ into a multiplication operator defined by a function $X$. We do, however, have to check that two different spectral representations do not give different results. The easiest way to see this is to note that the formula (2.1.4) determines, for $g \in H$ and $f \in D(A)$, the Laplace transform of the numerical function $(S(t)f, g)$ on $]-\infty, m[$, and therefore $S(t)$ itself, since $g$ is an arbitrary element of $H$ and $f$ is an arbitrary element of a dense subspace of $H$.
We now turn to the converse. For $c < -\gamma_S$, where $\gamma_S$ is the Lyapunov exponent of the semi-group $S$, the integral:
$$r_c = -\int_0^{+\infty} e^{ct} S(t)\,dt$$
is convergent and defines a bounded operator on $H$. We choose, in particular, $c = -\gamma_S - 1$. Since the operators $S_t$ are symmetric, the definition of $A$ implies that $A$ is symmetric on $D(A)$ and that $r_c$ is symmetric on $H$. To show that the infinitesimal generator $A$ is self-adjoint, it is sufficient to show that the real number $c$ is in the resolvent set $\rho(A)$. For this it is sufficient to show that
$$(cI - A) \circ r_c = I_H \tag{2.1.5}$$
(which includes $r_c H \subset D(A)$), since the relation $r_c \circ (cI - A) = I_{D(A)}$ follows from (2.1.5) by the symmetry of the operators. (See Proposition 1.1.4.) We now establish (2.1.5). For all $\psi \in H$, the relation:
$$S_h r_c \psi = -\int_0^\infty e^{ct} S_{t+h}\psi\,dt = e^{-ch}\left(\int_0^h e^{ct} S(t)\psi\,dt + r_c\psi\right)$$
implies:
$$(cI - A)r_c\psi = \lim_{h \to 0}\left(c\,r_c\psi + \frac{1}{h}(S_h - I)r_c\psi\right) = \lim_{h \to 0}\left(\frac{e^{-ch} - 1 + ch}{h}\,r_c\psi + \frac{e^{-ch}}{h}\int_0^h e^{ct}S(t)\psi\,dt\right).$$
The second term is a mean that tends to the value of the continuous function $t \mapsto e^{ct}S_t\psi$ at 0 as $h \to 0$, i.e., to $\psi$. The first term clearly goes to 0 as $h \to 0$, which establishes (2.1.5). □
Remark 2.1.9. The construction of a semi-group from a self-adjoint operator is just an example of the functional calculus for a self-adjoint operator, and in this sense we have $S_t = \exp(-tA)$.
Exercise 2.1.10. Prove that if $A$ is bounded we have:
$$\exp(-tA) = \sum_{k=0}^{\infty} \frac{(-tA)^k}{k!}.$$
Exercise 2.1.11 (the heat semi-group). Show that the semi-group that is generated by the $\mathbb{R}^n$ Laplacian operator $-\frac{1}{2}\Delta$ acting on $L^2(dx)$ is given by:
$$\left(\exp\left(\tfrac{t}{2}\Delta\right)\varphi\right)(x) = (2\pi t)^{-n/2}\int \exp\left(-\frac{|x - y|^2}{2t}\right)\varphi(y)\,dy.$$
Exercise 2.1.12 (the Ornstein-Uhlenbeck semi-group). Let $\gamma$ denote the Gaussian probability measure on $\mathbb{R}$ with mean 0 and variance 1/2, i.e., $\gamma(dy) = \pi^{-1/2} e^{-y^2}\,dy$. We consider the family of measures on $\mathbb{R}$ depending on $x \in \mathbb{R}$ and $t \geq 0$ defined by $N_0(x, dy) = \delta_x(dy)$ and for $t > 0$
$$N_t(x, dy) = \left(\pi(1 - e^{-2t})\right)^{-1/2} \exp\left(-\frac{(y - xe^{-t})^2}{1 - e^{-2t}}\right)dy.$$
Show that $N_t(\psi)(x) := \int \psi(y)\,N_t(x, dy)$ defines a strongly continuous symmetric semi-group on $L^2(\gamma)$. Let $-L$ be its infinitesimal generator. Prove that $Lf = \frac{1}{2}\Delta f - x\nabla f$ for $f \in C_c^\infty(\mathbb{R})$, and that this latter formula defines an essentially self-adjoint operator on $C_c^\infty(\mathbb{R})$.¹
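As a numerical companion to Exercise 2.1.12, the kernel $N_t(x, dy)$ can be integrated against simple test functions. The sketch below is an illustration, not part of the exercise: it assumes the kernel is the Gaussian law with mean $xe^{-t}$ and variance $(1 - e^{-2t})/2$, uses a plain trapezoidal rule, and compares against the closed forms $N_t(y)(x) = xe^{-t}$ and $N_t(y^2)(x) = x^2 e^{-2t} + (1 - e^{-2t})/2$, which follow from the first two moments of that law. The function name `ou_apply` and the sample point are hypothetical choices.

```python
import math

def ou_apply(f, x, t, half_width=10.0, steps=4000):
    """Approximate (N_t f)(x), the integral of f(y) N_t(x, dy), by the trapezoidal rule."""
    scale = 1.0 - math.exp(-2.0 * t)        # twice the variance of N_t(x, .)
    mean = x * math.exp(-t)
    sd = math.sqrt(scale / 2.0)
    lo, hi = mean - half_width * sd, mean + half_width * sd
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        density = math.exp(-(y - mean) ** 2 / scale) / math.sqrt(math.pi * scale)
        total += weight * f(y) * density
    return total * h

x, t = 1.3, 0.7                              # illustrative sample point and time
first = ou_apply(lambda y: y, x, t)
second = ou_apply(lambda y: y * y, x, t)

expected_first = x * math.exp(-t)
expected_second = x * x * math.exp(-2.0 * t) + (1.0 - math.exp(-2.0 * t)) / 2.0

print(first, expected_first)
print(second, expected_second)
```

The second identity is consistent with the generator formula of the exercise: $y^2 - \frac{1}{2}$ satisfies $L f = -2f$, so $N_t$ multiplies it by $e^{-2t}$.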
2.2. Kolmogorov semi-groups
The goal of this section is the introduction of certain stochastic differential equations for which the associated semi-group is called a Kolmogorov semi-group.
2.2.1. Review of Brownian motion. A real Brownian motion starting at 0 at time 0 is a family of random variables $B_t$ with $t \in \mathbb{R}_+$ defined on a probability space $(\Omega, \mathcal{F}, P)$ that is a centered Gaussian process, i.e., for any finite sequence $t_k$, $1 \leq k \leq n$, of "times", the vector $(B_{t_1}, \ldots, B_{t_n})$ is a vector-valued centered Gaussian random variable, and such that $E(B_t B_s) = t \wedge s$. This data determines in a unique manner the joint distributions of the random vectors $(B_{t_1}, \ldots, B_{t_n})$. An important fact is that we can always choose versions of the random variables $B_t$ such that for almost all $\omega$, $t \mapsto B_t(\omega)$ is continuous on $\mathbb{R}_+$, although these random paths are almost surely nowhere differentiable. We will always choose such versions of Brownian motion.
¹Note that we have made the choice of relating the semi-group and its infinitesimal generator by $S_t = \exp(-tA)$,
One can also take a more global view of Brownian motion on the interval $[0, T]$ (resp. $[0, \infty[$). Let $W := C_0([0,T])$ (resp. $C_0([0, \infty[)$) be the space of continuous functions on $[0, T]$ (resp. $[0, \infty[$) which vanish at 0, with the topology defined by the usual "sup" norm (resp. with the topology defined by the family of semi-norms $\|f\|_N := \sup\{|f(x)|,\ x \in [0, N]\}$, $N = 1, 2, \ldots$). It is easy to show that the corresponding $\sigma$-field of Borel subsets is the same as the $\sigma$-field generated by the evaluation functions $\beta_t$ for $t \in [0,T]$, defined by $\beta_t(w) = w(t)$ for $w \in W$. The law on $C_0([0,T])$ of the Brownian motion is the image $Q_T$ of the probability measure $P$ induced by the mapping $\omega \mapsto B_\cdot(\omega)$. It is called the Wiener measure on $C_0([0,T])$ (so there is a Wiener measure for each $T$, and we define similarly the Wiener measure on $C_0([0, \infty[)$).
Definition 2.2.1. We call the process defined by the evaluation variables on the space $C_0([0, \infty[)$ with the Borel $\sigma$-field and Wiener measure the canonical version of Brownian motion.
Brownian motion possesses the Markov property. In fact, let $\mathcal{F}_t$ be the $\sigma$-field $\sigma(B_s, 0 \leq s \leq t)$, called the $\sigma$-field of past events, let $\mathcal{F}'_t$ be the $\sigma$-field generated by $B_s$ for $s \geq t$, called the $\sigma$-field of future events, and let $\mathcal{F}_{\{t\}}$ be the $\sigma$-field generated by the single random variable $B_t$, called the $\sigma$-field of present events. Then the Markov property is the following: for any bounded $\mathcal{F}'_t$-measurable random variable $\psi$, we have: $E(\psi \mid \mathcal{F}_t) = E(\psi \mid \mathcal{F}_{\{t\}})$. The $\mathcal{F}_{\{t\}}$-measurable random variables are of the form $\varphi \circ B_t$ where $\varphi$ is a Borel-measurable function on $\mathbb{R}$. The heat semi-group appears in the description of the transition from $t$ to $t + h$ in the following way:
$$E(f(B_{t+h}) \mid \mathcal{F}_t) = [N_h f](B_t), \qquad [N_h f](x) = \frac{1}{\sqrt{2\pi h}}\int \exp\left(-\frac{(x - y)^2}{2h}\right) f(y)\,dy.$$
Exercise 2.2.2. Establish the following formula: $E(B_t - B_s)^2 = |t - s|$. Show that for all finite sequences $t_1 < t_2 <$ [...] the random [...] $> 0$ for all $x \in E$ and that $\mu$ is reversible for a transition kernel (matrix) $K$, i.e., $\mu \otimes K$ is symmetric. Define a positive self-adjoint operator on $L^2(\mu)$ by setting $A := I - K$. Show that $A$ satisfies a Poincaré inequality if and only if $K$ is irreducible, i.e., for some $n$, all of the elements of the matrix $\sum_{k=1}^{n} K^k$ are strictly positive. Let $a$ be the symmetric matrix $\mu \otimes K$ defined by $a(x, y) = \mu(x)K(x, y)$. Verify that:
$$\mathcal{E}(f, f) = \frac{1}{2}\sum_{x, y \in E} (f(x) - f(y))^2\, a(x, y).$$
Let $\gamma$ be any path connecting $x$ and $y$: $\gamma_0 = x, \gamma_1, \ldots, \gamma_{p-1}, \gamma_p = y$, and define $e(\gamma)$ by $e(\gamma) := \mu(x)\mu(y)\sum_{k=1}^{p} a^{-1}(\gamma_{k-1}, \gamma_k)$. Then let $\gamma(x, y)$ be any path connecting $x$ and $y$ that minimizes $e$ among paths connecting $x$ and $y$. Prove that the Poincaré constant is bounded above by:
$$\max_{(u, v)} \sum_{x, y \in E} \chi(x, y, u, v)\, e(\gamma(x, y)),$$
where $\chi(x, y, u, v)$ equals 1 if $(u, v)$ is an edge of the path $\gamma(x, y)$ and 0 if not.
From now on $f$ and $g$ will denote real-valued functions.
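The Dirichlet form identity of the exercise above, $(f, (I - K)f)_{L^2(\mu)} = \frac{1}{2}\sum_{x,y}(f(x) - f(y))^2 a(x, y)$, is easy to test on a small reversible chain. The sketch below is illustrative only: the three-state symmetric weight matrix and the test function are assumptions, and the chain is built so that $a(x, y) = \mu(x)K(x, y)$ is symmetric by construction.

```python
# Symmetric non-negative edge weights on a 3-point space (illustrative assumption).
weights = [[0.0, 2.0, 1.0],
           [2.0, 1.0, 3.0],
           [1.0, 3.0, 0.0]]
n = len(weights)

total = sum(sum(row) for row in weights)
a = [[w / total for w in row] for row in weights]      # a(x, y) = mu(x) K(x, y)
mu = [sum(row) for row in a]                           # reversible probability measure
K = [[a[x][y] / mu[x] for y in range(n)] for x in range(n)]

f = [1.0, -2.0, 0.5]                                   # arbitrary test function

# Operator side: (f, (I - K) f) in L^2(mu).
Kf = [sum(K[x][y] * f[y] for y in range(n)) for x in range(n)]
form_operator = sum(mu[x] * f[x] * (f[x] - Kf[x]) for x in range(n))

# Edge side: (1/2) * sum over pairs of (f(x) - f(y))^2 a(x, y).
form_edges = 0.5 * sum((f[x] - f[y]) ** 2 * a[x][y]
                       for x in range(n) for y in range(n))

row_sums = [sum(row) for row in K]
print(form_operator, form_edges, row_sums)
```

The two values agree because each row of $a$ sums to $\mu(x)$ and $a$ is symmetric; neither fact holds for a non-reversible kernel, where the edge formula describes the symmetrized operator instead.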
Definition 3.1.6. The operator $A$ satisfies a logarithmic Sobolev inequality if there exists a constant $c$ such that:
$$\forall f \in L^2(\mu) \qquad \int f^2 \log\left(\frac{|f|}{\|f\|_2}\right)d\mu \leq c\,\mathcal{E}(f, f), \tag{LS}$$
with the following conventions: $\|\cdot\|_2$ denotes the norm of $L^2(\mu)$; we agree that the right-hand member equals $+\infty$ if $f \notin D(\mathcal{E})$ and that the left-hand member equals $+\infty$ if $f^2\log(|f|)$ is not integrable. This inequality was developed by L. Gross in [Gr75]. In addition, there are more general logarithmic Sobolev inequalities that add a supplementary term $\varepsilon\|f\|_2^2$, with $\varepsilon > 0$, to the right-hand side. See, for example, the course of D. Bakry [Bk93]. We refer to the inequality (LS) as the strict logarithmic Sobolev inequality. In [Gr92], L. Gross gives a panoramic view of diverse applications of these inequalities. In this book, unless we state otherwise, a logarithmic Sobolev inequality, or simply Gross inequality, will always refer to an inequality of the form (LS).
Remark 3.1.7. Since the inequality (LS) is stable when $f$ is multiplied by $\lambda \neq 0$, we can restrict ourselves to the case when $\|f\|_2 = 1$. In this case twice the left-hand side of the (LS) inequality can be written as $\int_E \gamma(f^2)\,d\mu$ with $\gamma(x) = x\log(x)$. Since the function $\gamma$ is convex, Jensen's inequality implies the positivity of the first member of the Gross inequality:
$$\int_E \gamma(f^2)\,d\mu \geq \gamma\left(\int_E f^2\,d\mu\right) = 0,$$
where the inequality is strict except when the function is constant.
Proposition 3.1.8. If $L^\infty(\mu) \cap D(\mathcal{E})$ is dense in $D(\mathcal{E})$ for the norm $\|\cdot\|_{\mathcal{E}}$, then the Gross inequality implies the Poincaré inequality with the same constant.
PROOF. It suffices to prove the Poincaré inequality for bounded $g$ for which the integral is zero. We set $f := 1 + \varepsilon g$ and write the Gross inequality in the form:
$$\int (1 + \varepsilon g)^2 \log(1 + \varepsilon g)\,d\mu$$
[...] 0 with $\frac{d}{dt}g_t = -Ag_t$. There exists a constant $M$ such that for $\mu$-almost every $x$:
l u(9t+h)(x) - u(9t)(x) - hu'(9t)(x) (9t+h(x) - 9t(x))I < Mh2. By passing to L2 norms, we see that u(gt) is L2 differentiable with derivative
equal to $-u'(g_t)Ag_t$. Since $g_t$ is in the domain of $A$ it is in the domain of $A^{1/2}$, which coincides with $H^1(\mu)$. A direct calculation on the distributions shows that $u'(g_t)$ also belongs to $H^1(\mu)$ with $\nabla u'(g_t) = u''(g_t)\nabla g_t$. Thus, we have:
$$\frac{d}{dt}\int u(g_t)\,d\mu = -(u'(g_t), Ag_t) = -\left(A^{1/2}u'(g_t), A^{1/2}g_t\right) = -\frac{1}{2}\left(\nabla u'(g_t), \nabla g_t\right) = -\frac{1}{2}\int u''(g_t)\,|\nabla g_t|^2\,d\mu.$$
□
One of the most important consequences of the Gross inequality is Nelson's hypercontractivity property:
Theorem 3.1.15. Let $N_t$ be a Kolmogorov semi-group satisfying the Gross inequality with constant $c$. Then for all $p, q > 1$ and $t \geq 0$ such that $\frac{q - 1}{p - 1} = e^{2t/c}$, we have:
$$\|N_t g\|_q \leq \|g\|_p, \qquad g \in L^p(\mu). \tag{3.1.2}$$
Conversely the inequality (3.1.2) implies the Gross inequality.
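A rough numerical illustration of (3.1.2) can be given for the Ornstein-Uhlenbeck semi-group of Exercise 2.1.12. The sketch below rests on assumptions that are not stated in this passage: that this semi-group, with invariant measure $\gamma$ of variance 1/2, satisfies the Gross inequality with constant $c = 1$ in the normalization used here (the classical Gaussian case), and that trapezoidal quadrature is accurate enough for the polynomial test function $g(x) = 1 + x^2$. The helper names are hypothetical.

```python
import math

def gauss_density(y, mean, var):
    """Density at y of the Gaussian law with the given mean and variance."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def trapezoid(func, lo, hi, steps=400):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, steps):
        total += func(lo + i * h)
    return total * h

def ou_apply(g, x, t):
    """(N_t g)(x) for the Ornstein-Uhlenbeck kernel: mean x e^{-t}, variance (1 - e^{-2t})/2."""
    mean = x * math.exp(-t)
    var = (1.0 - math.exp(-2.0 * t)) / 2.0
    sd = math.sqrt(var)
    return trapezoid(lambda y: g(y) * gauss_density(y, mean, var),
                     mean - 10.0 * sd, mean + 10.0 * sd)

def lp_norm(h, r):
    """Norm of h in L^r(gamma), where gamma is the centered Gaussian law of variance 1/2."""
    integral = trapezoid(lambda x: abs(h(x)) ** r * gauss_density(x, 0.0, 0.5), -8.0, 8.0)
    return integral ** (1.0 / r)

c = 1.0                                    # assumed Gross constant for this semi-group
p = 2.0
t = 0.5 * math.log(3.0)                    # chosen so that e^{2t/c} = 3
q = 1.0 + (p - 1.0) * math.exp(2.0 * t / c)   # critical exponent, here q = 4

g = lambda x: 1.0 + x * x                  # illustrative positive test function
lhs = lp_norm(lambda x: ou_apply(g, x, t), q)   # ||N_t g||_q
rhs = lp_norm(g, p)                              # ||g||_p

print("||N_t g||_q =", lhs, " ||g||_p =", rhs)
```

For this test function both norms can also be computed by hand from Gaussian moments, since $N_t g(x) = (4 + x^2)/3$ at the chosen time, which gives a check on the quadrature. Exponential functions $e^{ax}$ give equality in (3.1.2) at the critical exponent, so they are a poor choice for a floating-point comparison.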
PROOF. By a simple density argument it suffices to consider functions $g$ such that $\mathrm{Im}(g) \subset [a, b]$ with $a > 0$. In this case, $a \leq g_t \leq b$ for $g_t = N_t g$. From $g_t \in D(A) \subset H^1(\mu)$, we conclude that $g_t^{q/2} \in H^1(\mu)$. By applying the Gross inequality to this function, we obtain:
$$\int g_t^q \log(g_t)\,d\mu - \frac{1}{q}\log\left(\int g_t^q\,d\mu\right)\int g_t^q\,d\mu \leq \frac{cq}{4}\int g_t^{q-2}\,|\nabla g_t|^2\,d\mu. \tag{3.1.3}$$
By applying the previous lemma for $q > 0$, we see that the function
$$\Phi(q, t) = \int g_t^q\,d\mu$$
is differentiable with respect to the second variable $t$ on $\mathbb{R}_+$, and for $q > 1$ the following inequality holds:
$$\partial_t\Phi(q, t) = -q\int g_t^{q-1}Ag_t\,d\mu = -\frac{q(q - 1)}{2}\int g_t^{q-2}\,|\nabla g_t|^2\,d\mu \leq \frac{2(q - 1)}{c}\left[-\int g_t^q\log(g_t)\,d\mu + \frac{1}{q}\log\left(\int g_t^q\,d\mu\right)\int g_t^q\,d\mu\right].$$
We then apply this to the case $q := q(t) = 1 + (p - 1)e^{2t/c}$ and, taking into account $q'(t) = \frac{2(q(t) - 1)}{c}$, we obtain the inequality:
$$\partial_2\Phi(q(t), t) + q'(t)\int g_t^{q(t)}\log(g_t)\,d\mu - \frac{q'(t)}{q(t)}\,\Phi(q(t), t)\log\left(\Phi(q(t), t)\right) \leq 0.$$
If we set:
$$\Psi(t) = \Phi(q(t), t)^{1/q(t)} = \|g_t\|_{q(t)},$$
the preceding inequality is equivalent to: $\Psi'(t) \leq 0$.
Since $\Psi(0) = \|g\|_p$ and $\Psi'(t) \leq 0$, hypercontractivity follows. Conversely, hypercontractivity implies the decay of the function $\Psi(t)$, as has been pointed out in [GZ98]. Indeed, the relation
$$(q(t + s) - 1) = e^{2s/c}(q(t) - 1)$$
implies that the operator $N_s$ is a contraction from $L^{q(t)}$ to $L^{q(s+t)}$ and we have:
$$\Psi(t + s) = \|N_s g_t\|_{q(t+s)} \leq \Psi(t).$$
We choose $g$ of the form $g_\alpha := (g^2 + \alpha)^{1/2}$ where $g$ is a $C^\infty$ function and $\alpha > 0$. Since Theorem 2.2.27 implies that $g$ is in $D(A)$, the above calculations of the derivatives are valid even for $t = 0$. The inequality $\Psi'(0)$ [...] $-2$ indicates that the operator $-2\,\mathrm{Id}$ is a lower bound for $A$. We will identify functions $F$ on $\mathbb{R}^d$ with the corresponding multiplication operators $M_F$ by $F$. Let $d \geq 3$, and let $F$ be a real locally bounded function in $L^{d/2}(dx)$ acting by multiplication on $L^2(dx)$. Prove that $F \leq -2k_d\|F\|_{d/2}\,\Delta$.
Let $U$ be a $C^\infty$ function on $\mathbb{R}^d$ such that $\exp(-2U)$ is Lebesgue-integrable and denote the associated Boltzmann probability measure $Z^{-1}\exp(-2U(x))\,dx$ by $\mu$. The operator defined by $A\varphi = -\frac{1}{2}\Delta\varphi + \nabla U \cdot \nabla\varphi$, for $\varphi \in C_c^\infty$, extends uniquely into a self-adjoint operator on $L^2(\mu)$. We set $2V := |\nabla U|^2 - \Delta U$. Verify that the formula $\mathcal{F}(\varphi) := Z^{-1/2}\exp(-U)\varphi$ defines an isometry $\mathcal{F}$ of $L^2(\mu)$ into $L^2(dx)$ that transforms $A$ into the Schrödinger operator $B$ defined on $C_c^\infty$ by
$$B\psi = -\frac{1}{2}\Delta\psi + V\psi.$$
We suppose that there exist positive constants d, b, m such that U < bV + b,
V > -m.
Let f be a function of norm 1 in L2(v) such that Im(f) C [a, 0] with a > 0. We set:
F = (log(f) - U -
log(Z))+,
1 = SUP t-2 log d/2 (t).
Prove that f FdI2(x) dx < 21 and from this deduce: log(f) 0,
for all t and x. We apply the inequality (3.1.4) for p = v and t = If IIL2(A) to obtain:
)
2f f2 log
dv
IIf 110(v) (_.11
(f2(f2) f The inequality (3.1.5) implies that the integrand is positive and we are able f2 + If II (,,)) dv.
to write: 2
f2log
J
Ifl IIf110(v)
dv
0, we have: (3.1.11)
$$\mu\left\{\varphi \geq \int \varphi\,d\mu + r\right\} \leq e^{-r^2/c}.$$
In particular, if $\alpha < c^{-1}$, $\int \exp(\alpha|x|^2)\,d\mu < \infty$.
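The concentration bound (3.1.11) can be compared with an exact Gaussian tail. The sketch below relies on assumptions that this passage does not state: that $\varphi$ is a function with $|\nabla\varphi| \leq 1$ (as the proof suggests), that $\mu$ is the Gaussian measure $\gamma$ of variance 1/2 on $\mathbb{R}$, and that $\gamma$ satisfies the Gross inequality with $c = 1$ in the normalization used here. With $\varphi(x) = x$, the left side of (3.1.11) is $\gamma\{x \geq r\} = \frac{1}{2}\mathrm{erfc}(r)$.

```python
import math

def gaussian_tail(r):
    """gamma{x >= r} for the centered Gaussian law of variance 1/2."""
    return 0.5 * math.erfc(r)

def herbst_bound(r, c=1.0):
    """Right-hand side of (3.1.11) for a Gross constant c (c = 1 is an assumption here)."""
    return math.exp(-r * r / c)

radii = [0.0, 0.5, 1.0, 2.0, 3.0]
table = [(r, gaussian_tail(r), herbst_bound(r)) for r in radii]

for r, tail, bound in table:
    print(f"r = {r:3.1f}   tail = {tail:.3e}   bound = {bound:.3e}")
```

In this example the bound captures the correct Gaussian rate $e^{-r^2}$ and misses only the prefactor. The integrability statement is sharp here as well: $\int e^{\alpha x^2}\,d\gamma = (1 - \alpha)^{-1/2}$ is finite exactly when $\alpha < 1$.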
PROOF. We first consider the case when $\varphi$ is bounded and positive. We set
$$G(t) := \log(F(t)), \qquad F(t) = \int e^{t\varphi}\,d\mu.$$
The function $G(t)$ is differentiable on $\mathbb{R}_+$ and, utilizing the Gross inequality for $f = e^{t\varphi/2}$, we have:
$$tG'(t) = tF^{-1}(t)\int \varphi e^{t\varphi}\,d\mu = 2F^{-1}(t)\int f^2\log(f)\,d\mu \leq cF^{-1}(t)\int |\nabla f|^2\,d\mu + G(t) = \frac{1}{4}ct^2F^{-1}(t)\int |\nabla\varphi|^2 e^{t\varphi}\,d\mu + G(t) \leq \frac{1}{4}ct^2 + G(t).$$
Since $G(0) = 0$ and $G'(0) = \int \varphi\,d\mu$, the examination of this differential inequality leads easily to:
$$G(t) \leq \frac{1}{4}ct^2 + t\int \varphi\,d\mu, \qquad t \geq 0,$$
and Markov's inequality leads to: (3.1.12)
0, then for all functions f and g such that f (X) and g(Y) zare square-integrable with mean 0, one has: I
((
E(f (X)g(Y))
1/2 r ] 1/2 (f2 . /3 [E(X )) 11J [E(g2(Y))
Exercise 3.1.24 (Bretagnolle's inequality). Let $B^1$ and $B^2$ be two independent Brownian motions with the same filtration $(\mathcal{F}_t)$, let $X$ be an $\mathcal{F}_0$-measurable Gaussian random variable which has zero mean and variance $\frac{1}{2}$, and let $X_t^1$ and $X_t^2$ be the solutions to the Ornstein-Uhlenbeck equation associated to these two Brownian motions with initial conditions $X_0^1 = X_0^2 = X$ almost surely.
Prove that the random vector $(X_t^1, X_t^2)$ is Gaussian with mean $(0, 0)$ and establish the formulas:
$$E(X_t^1)^2 = E(X_t^2)^2 = \frac{1}{2} \quad \text{and} \quad E(X_t^1 X_t^2) = \frac{1}{2}e^{-2t}.$$
Prove the relations:
$$E\left(f(X_t^1)f(X_t^2)\right) = \int (N_t f)^2\,d\gamma \leq \left(\int f^{1 + e^{-2t}}\,d\gamma\right)^{2/(1 + e^{-2t})}.$$
By introducing the indicator function $\mathbb{1}_{]-\infty, M]}$, deduce that for all Gaussian random vectors $(Y, Z)$ with mean $(0, 0)$ and covariance $\frac{1}{2}\begin{pmatrix} 1 & \beta \\ \beta & 1 \end{pmatrix}$, with $\beta \geq 0$, one has:
$$P(Y \vee Z \leq M) \leq \left(\frac{1}{\sqrt{\pi}}\int_{-\infty}^{M} e^{-x^2}\,dx\right)^{2/(1 + \beta)}.$$
3.1.1. The Bakry-Emery inequality. This inequality is the Gross inequality for the infinitesimal generators of Kolmogorov processes on $\mathbb{R}^d$ associated to uniformly convex potentials $U$. Here $\mu$ will denote the Boltzmann measure corresponding to $U$, $N_t$ the Kolmogorov semi-group, and $U''(x)$ the symmetric matrix of second derivatives, or Hessian, at the point $x$. We assume the existence of a constant $m > 0$ such that for all $x \in \mathbb{R}^d$
$$U''(x) \geq m\,\mathrm{Id}, \tag{$C_m$}$$
in the usual sense of order on the symmetric operators on $\mathbb{R}^d$. Diverse inequalities follow from this convexity. For suitable constants $b$, $b'$ these inequalities are:
$$(x - y) \cdot (\nabla U(x) - \nabla U(y)) \geq m\,|x - y|^2, \tag{3.1.13}$$
$$x \cdot \nabla U(x) \geq m\,|x|^2 - b, \tag{3.1.14}$$
$$U(x) \geq \frac{m}{2}\,|x|^2 - b'. \tag{3.1.15}$$
The first inequality follows directly from the formula:
$$\nabla U(x) - \nabla U(y) = \int_0^1 U''(f(t))\,(x - y)\,dt$$
where $f(t) = tx + (1 - t)y$. The second is deduced from the first by taking $y = 0$, and the third is deduced from the preceding by integration along lines starting at 0. The relation (3.1.14) shows that the Kolmogorov process does not explode in finite time and (3.1.15) shows that $\exp(-U)$ is integrable. The proof of the Bakry-Emery inequality, following the presentation given in [AKR95], will be based on the following stabilization result in the weak sense:
Proposition 3.1.25. We suppose that the hypothesis $(C_m)$ is satisfied. Let $\psi$ be a Lipschitz function on $\mathbb{R}^d$ and $\psi_t := N_t(\psi)$. Then $\psi_t$ is a Lipschitz function, its Lipschitz slope tends to 0 exponentially, and for any Lipschitz function $g$, the integral $\int_{\mathbb{R}^d} g(\psi_t)\,d\mu$ converges to $g\left(\int \psi\,d\mu\right)$ when $t \to \infty$.
PROOF. Let $l$ denote the Lipschitz slope of $\psi$. On one hand, we have: (3.1.16)
I''t(x) - Ot(y)I = JE('(Xf) - O(Xe))I m. fiiiid
PROOF. Suppose that W E D(A). With the aid of the preceding lemma, we can approximate W by a sequence Wn of infinitely differentiable and uniformly bounded functions. Setting On = we see immediately that:
A n = 0n(Apn -
1
1V to and, since the variation distance between two probability measures is bounded by 2, we can adjust a to cover the case when t -(k/v)Id, where k is the Lipschitz slope of U2. So it suffices to write
$$U = (U_1 + U_{2,\nu}) + (U_2 - U_{2,\nu}),$$
for $\nu > k/m$. It remains to show that for $t > 0$, the relative entropy $I(\mathcal{L}(X_t) \mid \mu)$ is finite. The Cameron-Martin formula says that the (Radon-Nikodym) density for the measure on $C([0, t])$ defined by the process $X^x$ with respect to the Wiener measure $P^x$ for the Brownian paths $W^x$ starting at $x$ is:
$$F(w) = \exp\left(U(x) - U(w(t)) - \frac{1}{2}\int_0^t \left[|\nabla U|^2 - \Delta U\right](w(s))\,ds\right).$$
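The density $g$ for $\mathcal{L}(X_t)$ with respect to $\mathcal{L}(W_t)$ is:
$$g(y) = \mathbb{E}^x(F \mid W_t = y),$$
where $\mathbb{E}^x$ denotes the expectation associated with the Wiener measure on $C([0, t])$ for Brownian paths starting at $x$, and $\mathcal{L}(W_t)$ is the Gaussian measure $(2\pi t)^{-d/2}\exp\left(-\frac{(x - y)^2}{2t}\right)dy$ on $\mathbb{R}^d$. We write this Gaussian density as $\exp(-2v)$. The density $f$ of $\mathcal{L}(X_t)$ with respect to $\mu$ is thus given by:
$$f(y) = Z\,\mathbb{E}^x\left(\exp(2U(y) - 2v(y))\,F \mid W_t = y\right).$$
Setting $\gamma(x) = x\log(x)$, we have:
$$I(t) = Z^{-1}\,\mathbb{E}^x\left(\gamma[f(W_t)]\exp(2v(W_t) - 2U(W_t))\right) = Z^{-1}\,\mathbb{E}^x\left(\gamma\left[Z\,\mathbb{E}^x\left(\exp(2U(W_t) - 2v(W_t))\,F \mid \mathcal{F}_{\{t\}}\right)\right]\exp(2v(W_t) - 2U(W_t))\right),$$
which by Jensen's inequality gives us:
$$I(t) \leq Z^{-1}\,\mathbb{E}^x\left[\mathbb{E}^x\left(\gamma\left[Z\exp(2U(W_t) - 2v(W_t))\,F\right] \mid \mathcal{F}_{\{t\}}\right)\exp(2v(W_t) - 2U(W_t))\right] \tag{3.2.6}$$
$$= \mathbb{E}^x\left(\left[\log(Z) + U(x) + U(W_t) - 2v(W_t) - \frac{1}{2}\int_0^t \left[|\nabla U|^2 - \Delta U\right](W_s)\,ds\right]F\right). \tag{3.2.7}$$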
To see that this quantity is bounded we only have to check that the terms between the brackets are bounded above, since $F$ has expectation 1. This is obviously the case for $-v$ since:
$$\exp(-2v(y)) = (2\pi t)^{-d/2}\exp\left(-\frac{(x - y)^2}{2t}\right).$$
It remains only to look at $\mathbb{E}^x(U(W_t)F)$. But this term is bounded above since
$$\exp\left(-\frac{1}{2}\int_0^t \left[|\nabla U|^2 - \Delta U\right](w(s))\,ds\right)$$
is uniformly bounded above on path space and $U_+\exp(-U)$ is bounded above on $\mathbb{R}^d$.
Remark 3.2.8. The interest in this method is to control the rate of stabilization in the sense of entropy and thus in the sense of the total variation norm. We should also point out that the method of Harris chains (see, for example, [MT97]) still yields the same exponential stabilization for total variation under the preceding hypotheses. Let us also note that the use of similar calculations in the theory of simulated annealing is treated by L. Miclo in [CM]. It is formally a simple extension of Theorem 3.2.5 but is technically more difficult in the case of $\mathbb{R}^d$. On the other hand, more stringent growth conditions on $U$ at infinity than above will imply that the Kolmogorov semi-group has stronger contraction properties than hypercontractivity; see [KKR93, Dav89, CKS87].
In fact, Theorem 3.2.5 can be generalized to most semi-groups associated to reversible processes. The simple case of jump processes in a finite state space is already instructive. Let $K$ be a kernel on $E$ with invariant probability $\mu$. We denote by $\mathcal{E}_K$ the Dirichlet form associated to the symmetric operator on $L^2(\mu)$ given by $\mathrm{Id} - \frac12(K + K^*)$, i.e.:
$$\mathcal{E}_K(f,f) := (f, (I - K)f)_{L^2(\mu)}.$$
Assuming that a logarithmic Sobolev inequality holds, we denote the corresponding constant of the inequality by $\beta(K)$. Clearly we have: $\beta(K) = \beta(K^*)$. In the case when the support of $\mu$ is all of $E$ (and we can always reduce the problem to this case), the adjoint $K^*$ of $K$ in $L^2(\mu)$ is given by the unique kernel:
$$K^*(x,y) = \frac{\mu(y)K(y,x)}{\mu(x)}.$$
Then for any initial probability $m$, a jump process $X_t$ is defined such that $\mathcal{L}(X_0) = m$ and the associated semi-group $N_t$ satisfies:
$$\frac{d}{dt}N_tf = (K - I)N_tf.$$
3.2. AN APPLICATION TO ERGODICITY
The density $g_t$ of $mN_t$ with respect to $\mu$ is then $N_t^* g_0$, where $N_t^*$ is the adjoint of $N_t$ in $L^2(\mu)$. We can then show (see [D-S96]) that:
$$I(mN_t \mid \mu) \le I(m \mid \mu)\exp(-2t/c),$$
where one can take $c = \beta(K)$ in the reversible case, and $c = 2\beta(K)$ in the non-reversible case. The derivative calculations are the same as in 3.2.5, but the equality $\mathcal{E}(\log(f), f) = 4\mathcal{E}(f^{1/2}, f^{1/2})$ needs to be replaced by an inequality. In the reversible case, the constants do not change. We treat the reversible case as an exercise:
Exercise 3.2.9. Let $\mu$ be a reversible probability for the kernel $K$ on the finite set $E$. Prove the inequality $M_l(u,v) \le M_a(u,v)$ between the logarithmic mean
$$M_l(u,v) = \frac{u - v}{\log(u) - \log(v)}$$
and the arithmetic mean. One can utilize the representations:
$$\frac{1}{M_l(u,v)} = \int_0^\infty \frac{dt}{(t+u)(t+v)}, \qquad \frac{1}{M_a(u,v)} = \int_0^\infty \frac{dt}{(t + (u+v)/2)^2}.$$
By utilizing the formula in Exercise 3.1.5, deduce:
$$\mathcal{E}_K(\log(f), f) \ge 4\,\mathcal{E}_K(f^{1/2}, f^{1/2}).$$
Remark 3.2.10. The validity of the Gross inequality in the case where the state space is finite is relatively trivial. For example in the reversible case, the Poincaré inequality for $\mathrm{Id} - K$ is true if and only if the chain associated to $K$ is irreducible; see Exercise 3.1.5. The two terms of the Gross inequality are non-zero except for constant functions. On the unit sphere of $L^2(\mu)$, which is compact, the expressions $\sum_{x\in E} f^2(x)\log(|f|(x)/\|f\|_2)\,\mu(x)$ and $\mathrm{var}_\mu(f)$ are two positive functions of $f$ that vanish only at 1. They are infinitely differentiable and the same calculation that was used in Proposition 3.1.8 shows that their differentials are the same at 1. They are thus comparable. However, finding good, let alone the optimal, logarithmic Sobolev constants in the finite case requires a large number of methods as can be seen in the work of P. Diaconis and L. Saloff-Coste [D-S96].
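The elementary inequalities behind Exercise 3.2.9 can be probed numerically before attempting a proof. The following is a minimal sketch, not part of the original text: the function names and the sampling range are assumptions chosen for illustration. It samples positive pairs $(u,v)$ and records the smallest observed values of $M_a - M_l$ and of $(u-v)(\log u - \log v) - 4(\sqrt{u} - \sqrt{v})^2$, both of which should be non-negative up to rounding.

```python
import math
import random

def log_mean(u, v):
    """Logarithmic mean (u - v) / (log u - log v), extended by continuity at u == v."""
    if u == v:
        return u
    return (u - v) / (math.log(u) - math.log(v))

def arith_mean(u, v):
    """Arithmetic mean (u + v) / 2."""
    return 0.5 * (u + v)

def check(n_samples=10_000, seed=0):
    """Return the smallest observed gaps in the two inequalities of the exercise."""
    rng = random.Random(seed)
    worst_mean_gap = float("inf")
    worst_form_gap = float("inf")
    for _ in range(n_samples):
        # Hypothetical sampling range: u, v log-uniform in [e^-5, e^5].
        u = math.exp(rng.uniform(-5.0, 5.0))
        v = math.exp(rng.uniform(-5.0, 5.0))
        worst_mean_gap = min(worst_mean_gap, arith_mean(u, v) - log_mean(u, v))
        lhs = (u - v) * (math.log(u) - math.log(v))
        rhs = 4.0 * (math.sqrt(u) - math.sqrt(v)) ** 2
        worst_form_gap = min(worst_form_gap, lhs - rhs)
    return worst_mean_gap, worst_form_gap

mean_gap, form_gap = check()
print(mean_gap, form_gap)
```

The second quantity is the pointwise inequality which, summed against a reversible kernel, gives the comparison of Dirichlet forms asked for in the exercise.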
In the case of discrete-time Markov chains there is a simple analog of the formula (3.2.4), due to L. Miclo [Mic96], for which the proof is more subtle. We do not need to assume that K is reversible since this wouldn't make the proof any easier.
Theorem 3.2.11. Let $\mu$ be an invariant probability for $K$. If the operator $\mathrm{Id} - KK^*$ satisfies a Gross inequality in $L^2(\mu)$ with constant $\beta(KK^*)$, then there is exponential stabilization with the following form:
(3.2.8) $$I(mK^n \mid \mu) \le \Big(1 - \frac{1}{\beta(KK^*)}\Big)^n I(m \mid \mu).$$
PROOF. We note, first of all, that the Dirichlet form in question has a simple interpretation:
$$\mathrm{var}_\mu(f) - \mathrm{var}_\mu(K^*f) = \int \mathrm{var}_{K^*_y}(f)\,\mu(dy),$$
where $K^*_y$ denotes the probability measure $K^*(y,\cdot)$. It follows from this that an admissible constant in the Poincaré inequality, and a fortiori in the Gross inequality, should be larger than 1. Let $f$ be the density of $m$ with respect to $\mu$. (If this does not exist there is nothing to prove.) It is clearly sufficient to consider the case $n = 1$. In this case the density of $mK$ is $K^*f$. By letting $\xi = \sqrt{v/u}$ in the convexity inequality
$$\log(\xi) - \xi + 1 \le 0,$$
we see that:
$$u(\log(u) - \log(v)) \ge [u - v] + (\sqrt{u} - \sqrt{v})^2, \qquad u, v > 0.$$
By replacing $u$ by $f(x)$ and $v$ by $K^*f(y)$ and summing with respect to the kernel $(\mu \otimes K^*)(y,x)$, the term between brackets disappears because of the $K^*$-invariance of $\mu$ and we see that:
$$\sum_{y\in E}\mu(y)K^*(f\log(f))(y) - \sum_{y\in E}\mu(y)K^*f(y)\log(K^*f(y)) \ge \sum_{y\in E,\,x\in E}\mu(y)K^*(y,x)\big(\sqrt{f(x)} - \sqrt{K^*f(y)}\big)^2,$$
which can also be written as
$$I(m \mid \mu) - I(mK \mid \mu) \ge \sum_{y\in E}\mu(y)\,K^*_y\Big[\big(\sqrt{f} - \sqrt{K^*_yf}\big)^2\Big] \ge \sum_{y\in E}\mu(y)\,\mathrm{var}_{K^*_y}(\sqrt{f}) = \mathcal{E}_{KK^*}(\sqrt{f},\sqrt{f})$$
$$\ge \frac{1}{\beta(KK^*)}\,I(f\mu \mid \mu) = \frac{1}{\beta(KK^*)}\,I(m \mid \mu),$$
where the last inequality is just the Gross inequality. □
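The one-step estimate in this proof, $I(m \mid \mu) - I(mK \mid \mu) \ge \mathcal{E}_{KK^*}(\sqrt{f},\sqrt{f})$, can be checked on a small example. The sketch below is an illustration only: the random kernel, the state-space size and all function names are assumptions, and NumPy is used for the linear algebra. It builds $K^*$ from $K$ and its invariant probability, and compares the entropy drop with the Dirichlet form of $\sqrt{f}$.

```python
import numpy as np

def stationary_distribution(K):
    """Left Perron vector of a stochastic matrix K, normalized to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(K.T)
    v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return v / v.sum()

def adjoint_kernel(K, mu):
    """K*(x, y) = mu(y) K(y, x) / mu(x), the adjoint of K in L^2(mu)."""
    return K.T * mu[None, :] / mu[:, None]

def relative_entropy(f, mu):
    """I(f mu | mu) = sum_x mu(x) f(x) log f(x), for a strictly positive density f."""
    return float(np.sum(mu * f * np.log(f)))

def dirichlet_form_KKstar(g, Kstar, mu):
    """(g, (Id - K K*) g) in L^2(mu), computed as ||g||^2 - ||K* g||^2."""
    return float(np.sum(mu * g**2) - np.sum(mu * (Kstar @ g) ** 2))

# Hypothetical example: a strictly positive random kernel on 6 states.
rng = np.random.default_rng(0)
n = 6
K = rng.random((n, n)) + 0.1
K /= K.sum(axis=1, keepdims=True)
mu = stationary_distribution(K)
Kstar = adjoint_kernel(K, mu)

# A strictly positive initial probability m and its density f with respect to mu.
m = rng.random(n) + 0.05
m /= m.sum()
f = m / mu

entropy_drop = relative_entropy(f, mu) - relative_entropy(Kstar @ f, mu)
energy = dirichlet_form_KKstar(np.sqrt(f), Kstar, mu)
print(entropy_drop, energy)
```

The sketch also makes it easy to confirm that the density of $mK$ with respect to $\mu$ is $K^*f$, which is the first step of the proof.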
Remark 3.2.12. The preceding calculations would be valid in the case of an arbitrary kernel but this generality is illusory since the validity of the logarithmic Sobolev inequality in this case requires that $\mu$ be a finite barycenter of Dirac measures. The Poincaré inequality $\mathrm{var}_\mu(f) \le a\,\mathcal{E}_{KK^*}(f,f)$ does not have the same defect and, furthermore, it immediately implies that:
$$\mathrm{var}_\mu((K^*)^nf) \le \Big(1 - \frac1a\Big)^n\,\mathrm{var}_\mu(f).$$
These methods are very simple but the real problem is to find good Poincaré constants.
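For a finite state space the best Poincaré constant for $\mathrm{Id} - KK^*$ can be read off from a symmetric eigenvalue problem, which gives a concrete feel for the variance decay above. The following is a minimal sketch under assumptions of our own (a random strictly positive kernel, hypothetical function names, NumPy for the eigenvalues): it takes $a = 1/(1 - \lambda_2)$, with $\lambda_2$ the second largest eigenvalue of $KK^*$ in $L^2(\mu)$, and compares $\mathrm{var}_\mu((K^*)^nf)$ with $(1 - 1/a)^n\,\mathrm{var}_\mu(f)$.

```python
import numpy as np

def stationary_distribution(K):
    """Left Perron vector of a stochastic matrix K, normalized to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(K.T)
    v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return v / v.sum()

def variance(f, mu):
    """Variance of f under the probability mu."""
    mean = np.sum(mu * f)
    return float(np.sum(mu * (f - mean) ** 2))

# Hypothetical example: a strictly positive random kernel on 5 states.
rng = np.random.default_rng(1)
n = 5
K = rng.random((n, n)) + 0.1
K /= K.sum(axis=1, keepdims=True)
mu = stationary_distribution(K)
Kstar = K.T * mu[None, :] / mu[:, None]

# K K* is self-adjoint in L^2(mu); conjugating by sqrt(mu) gives a symmetric matrix.
d = np.sqrt(mu)
S = d[:, None] * (K @ Kstar) / d[None, :]
S = 0.5 * (S + S.T)  # remove rounding asymmetry
eigs = np.linalg.eigvalsh(S)  # ascending order, largest eigenvalue is 1
gap = 1.0 - eigs[-2]
a = 1.0 / gap  # best constant in var(f) <= a * E_{KK*}(f, f)

f = rng.standard_normal(n)
variances = [variance(f, mu)]
g = f.copy()
for _ in range(8):
    g = Kstar @ g
    variances.append(variance(g, mu))
bounds = [(1.0 - 1.0 / a) ** k * variances[0] for k in range(len(variances))]
print(list(zip(variances, bounds)))
```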
CHAPTER 4
Gibbs Measures
4.1. Generalities
The theory of Gibbs measures arose in the study of certain statistical mechanical models. These models can be applied each time the system can be described by configurations belonging to a product $F^S$, where $S$ is a "large" finite or denumerable set called the set of sites and the set $F$ is called the set of elementary states or spins. We denote the set of configurations by $E$.
Examples 4.1.1. We give three examples.
(1) The prototype of all the statistical mechanical models that will interest us is the Ising model. Its purpose is to explain the phenomenon of magnetization. In this case $S := \mathbb{Z}^d$ with $d = 1, 2, 3, \ldots$ and $F = \{-1, 1\}$. Physically $S$ represents the sites of atoms in a crystal where each atom has an elementary magnetic moment called the quantum mechanical spin, which can take the value of either +1 or -1 at any site. An immediate modification to this setting is to take for $F$ a finite set, or the unit sphere $S^k$ in $\mathbb{R}^{k+1}$, or $F = \mathbb{N}$.
(2) In order to represent the set of possible images on a screen consisting of $n$ rows and $p$ columns we take $F = \{0, 1\}$ where 0 codes a black pixel and 1 a white pixel and $S = \{1,\ldots,n\} \times \{1,\ldots,p\}$. If, say, $p = 640$, $n = 350$, the cardinality of $E$ will be the huge number $2^{224000}$. If the pixel can have several shades of gray, we will take $F = \{0, 1, \ldots, b\}$. If the pixels can be one of three fundamental colors, $F$ will be the cube of the preceding set.
(3) The space of trajectories (configurations) of a discrete time process with values in $F$ is $F^{\mathbb{N}}$ or $F^{\mathbb{Z}}$.
4.1.1. Gibbs measures. We denote the finite subsets of $S$ by $P_f(S)$.
Definition 4.1.2. An interaction on $E$ consists of a $\sigma$-finite measure $\lambda$ on $F$ and a family $V_L$, $L \in P_f(S)$, of functions on $E$ such that for each $L$, $V_L(x)$ only depends on the restriction $x_L$ of $x$ to $L$. The $V_L$ are called interaction potentials.
From now on we will only consider local interactions, i.e., for any $i \in S$, there only exist a finite number of $L \in P_f(S)$ such that $i \in L$ and $V_L \ne 0$. We denote by $X_L$ the canonical projection of $E$ onto $E_L := F^L$ defined by $X_L(x) = x_L$ and we define the reference measure on $E_L$ to be $\lambda_L := \lambda^{\otimes L}$.
4. GIBBS MEASURES
For any probability measure $P$ on $E$, we denote by $P_L$ the projection of $P$ on $E_L$. Let $L$ be a proper subset of $S$ and $\zeta \in E_{S-L}$. We introduce the conditional probability:
$$P(\,\cdot \mid X_{S-L} = \zeta),$$
which is a probability measure on $E$ (depending in reality on the version that is chosen for the random probability $P(\,\cdot \mid X_{S-L})$) and its projection
$$P_{X_L}(\,\cdot \mid X_{S-L} = \zeta)$$
on $E_L$, which we will denote simply by $P_L(\,\cdot \mid \zeta)$. It is the probability governing the configurations on $L$ given the exterior condition $\zeta$. In the examples given above, $F$ is a Polish space,$^1$ which implies that these conditional probabilities exist. See [DM75]. Alternatively we can simply postulate their existence.
Case where $S$ is finite. We define the energy of a configuration $x$ by:
$$U(x) = \sum_{L \subset S} V_L(x).$$
The Boltzmann-Gibbs measure associated to the interaction $V$ with temperature $\beta^{-1}$, where $\beta$ is a strictly positive parameter, is the unique probability of the form:
(4.1.1) $$P(dx) = Z^{-1}\exp(-\beta U(x))\,\lambda_S(dx),$$
where $Z$ is the normalization constant:
$$Z = \int \exp(-\beta U(x))\,\lambda_S(dx),$$
which is assumed to be finite. We consider for any $L \subset S$ and $z \in E$ the unique probability measure $n_{L,z}$ proportional to:
(4.1.2) $$\exp\Big(-\beta \sum_{A \in P_f(S),\ A \cap L \ne \emptyset} V_A(x, z_{L^c})\Big)\,\lambda_L(dx).$$
(For the moment, the notation $P_f(S)$ is redundant.) It is easy to see that for any $L$ a proper subset of $S$, we have:
$$P_L(dx \mid z_{L^c}) = n_L(dx, z) \quad P\text{-almost surely in } z.$$
Indeed, the conditional probability, conditioned by $\zeta = z_{L^c}$, is:
$$\frac{Z^{-1}\exp(-\beta U(x,\zeta))}{\int Z^{-1}\exp(-\beta U(x,\zeta))\,\lambda_L(dx)},$$
and if we express $U$ as a function of the $V_A$ we see that the terms corresponding to sets $A$ disjoint from $L$ cancel out in the numerator and denominator and things are simplified.
$^1$ A separable topological space for which the topology can be defined by a complete metric.
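The cancellation argument for finite $S$ can be verified by brute force on a tiny system. The sketch below is illustrative only: the four-site cycle, the spin set $\{-1,+1\}$, the numerical values of the constants and the Ising-type potentials are all assumptions made for the example. It compares the conditional law obtained from the full Boltzmann-Gibbs measure with the measure proportional to (4.1.2), in which only the potentials $V_A$ with $A \cap L \ne \emptyset$ appear.

```python
import itertools
import math

SITES = (0, 1, 2, 3)
EDGES = ((0, 1), (1, 2), (2, 3), (3, 0))  # hypothetical graph: a 4-cycle
SPINS = (-1, 1)
H, J, BETA = 0.3, 0.7, 0.9  # arbitrary illustrative constants

# Interaction potentials V_A, indexed by the finite sets A (singletons and edges).
POTENTIALS = {}
for i in SITES:
    POTENTIALS[frozenset([i])] = lambda x, i=i: H * x[i]
for (i, j) in EDGES:
    POTENTIALS[frozenset([i, j])] = lambda x, i=i, j=j: J * x[i] * x[j]

def energy(x, box=None):
    """Sum of V_A(x) over all A, or only over the A meeting `box` if it is given."""
    return sum(V(x) for A, V in POTENTIALS.items() if box is None or A & box)

def glue(x_box, z, box):
    """Configuration equal to x_box on `box` and to z outside of it."""
    inside = dict(zip(sorted(box), x_box))
    return tuple(inside.get(i, z[i]) for i in SITES)

def normalized(weights):
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}

def conditional_from_full_measure(z, box):
    """P_L(. | z outside L), computed from the full energy U."""
    return normalized({xb: math.exp(-BETA * energy(glue(xb, z, box)))
                       for xb in itertools.product(SPINS, repeat=len(box))})

def local_specification(z, box):
    """n_{L,z}, computed from the potentials that meet L only."""
    return normalized({xb: math.exp(-BETA * energy(glue(xb, z, box), box))
                       for xb in itertools.product(SPINS, repeat=len(box))})

box = frozenset({1, 2})
max_discrepancy = 0.0
for z in itertools.product(SPINS, repeat=len(SITES)):
    p = conditional_from_full_measure(z, box)
    q = local_specification(z, box)
    max_discrepancy = max(max_discrepancy, max(abs(p[k] - q[k]) for k in p))
print(max_discrepancy)
```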
4.1. GENERALITIES
Case where $S$ is infinite. Formula (4.1.1) no longer makes sense but, for $L$ finite, the sum figuring in (4.1.2) only consists of a finite number of terms because of the locality of the interaction. This leads us to make the following:
Definition 4.1.3. The energy of a configuration $x$ in $L$, given that $z \in E$, is:
(4.1.3) $$U_L(x,z) = \sum_{A \in P_f(S),\ A \cap L \ne \emptyset} V_A((x, z_{L^c})) \quad \text{for } x \in E_L,$$
where $L^c := S - L$.
It is useful to note that this energy only depends on the coordinates $z_{L^c}$ of $z$ that are outside the "box" $L$. We suppose for any finite subset $L$ of $S$, any condition $z$, and any $\beta > 0$ that the function $\exp(-\beta U_{L,z})$ is integrable with respect to $\lambda_L$. We denote its integral by $Z_{L,z}$, which is called the partition function associated to the energy $U_L$. We can associate a family of Boltzmann probability measures to the energy $U_L$, where the family is parameterized by a new parameter $T$ called the temperature. In the earlier chapters we set this equal to 1/2.
Definition 4.1.4. The Boltzmann-Gibbs probability measure on $E_L$ with temperature $T = \beta^{-1}$, energy $U_{L,z}$, and measure $\lambda_L$ is the probability measure defined by:
(4.1.4) $$n_L(z, dx) = Z_{L,z}^{-1}\exp(-\beta U_L(x,z))\,\lambda_L(dx).$$
This formula defines for each L a kernel nL from E to FL.
Definition 4.1.5. The system of kernels nL, L E Pf(S), given by (4.1.4) is called the system of local specifications associated to the potential V.
One can easily verify the following compatibility property. For any subset $L$ of a finite subset $E$ of $S$ and any configuration $z$, we have:
(4.1.5) $$\int_{E_E} \varphi(x_L, x_{E-L})\,n_E(z, (dx_L, dx_{E-L})) = \int_{E_E}\Big[\int_{E_L} \varphi(x_L, y_{E-L})\,n_L((y_E, z_{S-E}), dx_L)\Big]\,n_E(z, (dy_L, dy_{E-L})).$$
This property will become more transparent when we introduce other notation later on. Taking into account the properties of the composition of two conditioning operations we introduce the following key definition.
Definition 4.1.6. We say a probability measure $P$ on $E$ is a Gibbs measure for the interaction $V$ and temperature $\beta^{-1}$ if for any $L \in P_f(S)$, we have:
(4.1.6) $$P_L(\,\cdot \mid z_{L^c}) = n_L(z, \cdot\,) \quad \text{for } P\text{-almost every } z.$$
We also say that $P$ is a Gibbs state. The equations (4.1.6) are often called the Dobrushin-Lanford-Ruelle (D.L.R.) equations. Physically these equations say that the part of the system that is inside $L$ is in thermodynamic equilibrium at the given temperature with the rest of the system. The function $V_{\{i\}}$ is the internal energy of the atom at site $i$ and $V_{\{i,j\}}$ is the interaction energy between the atoms at sites $i$ and $j$. In physics it is usually only these two types of potentials that are considered, i.e., $V_L = 0$ for $\mathrm{card}(L) > 2$. If $k \ge 3$ atoms interact and a new phenomenon is created then these potentials would be added to the sum of the $C_k^2$ pairwise interactions and the self-energies. It should also be pointed out that these multiple potentials are useful in the study of images; see [Gu92].
For the Ising model, $\lambda$ is the uniform probability on $\{-1, +1\}$ and we describe the energies in the following way: we consider $\mathbb{Z}^d$ as a graph by defining the $2d$ neighbors of the point $i = (i_1, \ldots, i_d)$ as the points that are obtained by adding $\pm 1$ to one of the coordinates of $i$. We set $V_{\{i\}}(x) := Hx_i$ and, writing $i \sim j$ if $i$ and $j$ are neighbors, we set the interaction potentials to be:
$$V_{\{i,j\}}(x) := \begin{cases} Jx_ix_j & \text{if } i \sim j,\\ 0 & \text{if not,}\end{cases}$$
where $H$ is an arbitrary real number corresponding to an exterior magnetic field, $J > 0$ is a fixed positive number, and $V_L \equiv 0$ for $\mathrm{card}(L) > 2$.
We now describe the behavior of the Ising model. Let $G_{T,H}$ be the set of Gibbs measures with temperature $T = \beta^{-1}$ and with $H$ and $J$ fixed. This is a convex set of probabilities that for $d = 1$ consists of a unique measure: $\mathrm{card}(G_{T,H}) = 1$ for all $T$ and $H$. When $d > 1$, there are the following possibilities:
For $H \ne 0$, $\mathrm{card}(G_{T,H}) = 1$.
For $H = 0$, there exists a temperature $T_c$ such that: $\mathrm{card}(G_{T,H}) = 1$ if $T > T_c$, and $\mathrm{card}(G_{T,H}) > 1$ if $T < T_c$.
These results correspond to the following properties of the ferromagnetic fields. In the presence of an exterior field $H \ne 0$, the elementary magnets are mostly aligned in the same direction as $H$, creating an induced magnetization. In the absence of this field, there are two cases. At high temperature, i.e., greater than the Curie temperature $T_c$, there exists only one possible "phase" with a mean zero magnetization: $\int X_i\,dP = 0$ for all $i$. For temperatures lower than the Curie temperature, one is able to define, in particular, two phases $P^+$ and $P^-$ with non-zero mean, respectively positive and negative, magnetization. For example, $P^+$ is the weak limit of the corresponding Gibbs measures when $H \to 0^+$. Thus the field is able to conserve a non-trivial magnetization giving rise to a permanent magnet. For a detailed discussion, see [Sp74].
In this book, we will study the case of real spins of dimension one and the case of high temperature.
4.1.2. Markov fields and Gibbs measures. Being able to define a multi-dimensional Markov process is one of the most significant properties of a Gibbs state. The characterization of the corresponding potentials has been described by G.R. Grimmett in [Grm73]. In order to simplify things in this subsection, except in the case of Exercise 4.1.10, we will restrict ourselves to the case where the elementary state space $F$ is finite. Since the characterization we are looking for does not depend upon the parameter $\beta$, we set it equal to 1.
We assume that $S$ has the structure of a non-oriented graph without loops. We will write $i \sim j$ if $i$ is related to $j$. If $L \subset S$, the boundary $\partial L$ of $L$ is the set
$$\partial L = \{i \in S - L \mid \exists j \in L,\ i \sim j\}.$$
The graph structure is given by a subset of $S \times S$, called the set of relations, which is assumed symmetric and is such that it does not intersect the diagonal. We assume that the graph is locally finite, i.e., any point only has a finite number of neighbors.
Definition 4.1.7. A subset A of S is a clique if any two distinct points of A are neighbors.
For any boundary condition $u \in E_{\partial L}$ we will abbreviate the probability measure $P_{X_L}(\,\cdot \mid X_{\partial L} = u)$ on $E_L$ by $P_L(\,\cdot \mid u)$.
Definition 4.1.8. We say that a probability measure $P$ on $F^S$ possesses the Markov property if for all finite subsets $L$ of $S$, we have for $P_{S-L}$-almost every $\zeta$:
$$P_L(\,\cdot \mid \zeta) = P_L(\,\cdot \mid \zeta_{\partial L}).$$
The process $(P, X_L, L \in P_f(S))$, indexed by the finite subsets of $S$, is called a Markov field.
Exercise 4.1.9. Show that a stationary finite Markov chain with state space F defines a Markov field on Fz.
Exercise 4.1.10. We denote by $\gamma(m, \sigma^2)$ the Gaussian measure on $\mathbb{R}$ with mean $m$ and variance $\sigma^2$. The Ornstein-Uhlenbeck semi-group $N_t$ has the kernels $N_t(x, \cdot) = \gamma(xe^{-t}, \frac12(1 - e^{-2t}))$ as measures on $\mathbb{R}$ with its invariant probability measure equal to $\gamma(dy) = \gamma(0, 1/2)$. Fix $t$ and consider the Markov process $Y$ indexed by time $\mathbb{Z}$, with transition kernel $N_t$, and initial probability measure (distribution) $\gamma$. Calculate the law (joint distribution) of the random vector $(Y_1, Y_2, \ldots, Y_n)$, and deduce from this that the law $P$ on $\mathbb{R}^{\mathbb{Z}}$ defined by this process is a Gibbs measure corresponding to the following interaction potentials with respect to Lebesgue measure and temperature 1:
$$V_{\{i\}}(x) = \coth(t)\,x_i^2, \qquad V_{\{i,i+1\}}(x) = -2\sinh^{-1}(t)\,x_ix_{i+1}.$$
Do this by first considering the conditional probabilities corresponding to
the interval I conditioned by data on the set J \ I where J is an interval properly containing I.
Definition 4.1.11. We say that a probability measure $P$ on $E$ is full if, for any $L \in P_f(S)$, the marginal probability $P_L$ on $E_L$ has the property that $P_L(\{y\}) > 0$ for all $y$ in $E_L$.
Naturally the existence of a full probability measure implies that $F$ must be at most denumerable.
Theorem 4.1.12. Let $(S, \sim)$ be a locally finite graph, $F$ a finite or denumerable set, $\lambda$ the counting measure on $F$, and $P$ a full probability measure on $F^S$. In order for $P$ to be Markovian, it is necessary and sufficient that $P$ be a Gibbs measure for an interaction $V$ such that $V_L \equiv 0$ for all subsets $L$ that are not cliques.
That this condition is sufficient follows immediately from the formulas (4.1.3) and (4.1.4). We will first prove its necessity when $S$ is finite.
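Lemma 4.1.13 (Möbius). Let $S$ be a set and $f$ and $g$ two mappings from $P_f(S)$ into an Abelian group $(G, +)$. Then the following two relations are equivalent:
(4.1.7) $$\forall A \subset S \quad f(A) = \sum_{B \subset A} g(B),$$
(4.1.8) $$\forall A \subset S \quad g(A) = \sum_{B \subset A} (-1)^{\mathrm{card}(A \setminus B)} f(B).$$
PROOF. When $C \subset A$, set $a(C, A) = \sum_{C \subset B \subset A} (-1)^{\mathrm{card}(A \setminus B)}$. When $\mathrm{card}(A \setminus C) = n$, there are $C_n^k$ subsets $B$ such that $\mathrm{card}(A \setminus B) = k$ in this sum. Thus:
$$a(C, A) = \sum_{k=0}^{n} (-1)^k C_n^k = \begin{cases} 1 & \text{if } n = 0, \text{ i.e., } A = C,\\ (1-1)^n = 0 & \text{if } n \ne 0, \text{ i.e., } A \ne C.\end{cases}$$
Assuming that (4.1.7) is true, we have:
$$\sum_{B \subset A} (-1)^{\mathrm{card}(A \setminus B)} f(B) = \sum_{B \subset A}\sum_{C \subset B} (-1)^{\mathrm{card}(A \setminus B)} g(C) = \sum_{C \subset A} a(C, A)\,g(C) = g(A),$$
which is (4.1.8). We can prove that (4.1.8) implies (4.1.7) by strictly parallel reasoning. Alternatively we make the following remark. Let $f$ be given, then by recurrence on $\mathrm{card}(A)$, we can construct a unique function $\varphi$ satisfying $f(A) = \sum_{B \subset A} \varphi(B)$. Then following the first part of the proof, we will have:
$$\varphi(A) = \sum_{B \subset A} (-1)^{\mathrm{card}(A \setminus B)} f(B) = g(A).$$
□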
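The equivalence of (4.1.7) and (4.1.8) is easy to experiment with on a small ground set. Here is a minimal sketch, with hypothetical function names and integer-valued $g$ (so that $G = \mathbb{Z}$ and the comparison is exact): `zeta` computes $f$ from $g$ by (4.1.7), `mobius` applies (4.1.8), and the two are checked to be inverse to each other.

```python
from itertools import combinations
import random

def subsets(A):
    """All subsets of the finite set A, as frozensets."""
    A = tuple(sorted(A))
    for r in range(len(A) + 1):
        for c in combinations(A, r):
            yield frozenset(c)

def zeta(g, S):
    """f(A) = sum over B subset of A of g(B), as in (4.1.7)."""
    return {A: sum(g[B] for B in subsets(A)) for A in subsets(S)}

def mobius(f, S):
    """g(A) = sum over B subset of A of (-1)^card(A minus B) f(B), as in (4.1.8)."""
    return {A: sum((-1) ** (len(A) - len(B)) * f[B] for B in subsets(A))
            for A in subsets(S)}

# Hypothetical data: random integers attached to the subsets of a 4-point set.
S = frozenset(range(4))
rng = random.Random(0)
g = {A: rng.randint(-10, 10) for A in subsets(S)}
f = zeta(g, S)
g_recovered = mobius(f, S)
print(g_recovered == g)
```

Since $B \subset A$ implies $\mathrm{card}(A \setminus B) = \mathrm{card}(A) - \mathrm{card}(B)$, the sign is computed from the two cardinalities only.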
Exercise 4.1.14. Let $I$ be an ordered finite set and $M$ a matrix indexed by $I \times I$ possessing the following property (T): $M_{ij} \ne 0$ only if $i \le j$. Show that $M$ is invertible if and only if $M_{ii} \ne 0$ for all $i$ and in this case $M^{-1}$ possesses property (T). Interpret the Möbius lemma in this setting when $G = \mathbb{R}$.
Completion of the proof of Theorem 4.1.12. We fix a state, denoted by 0, in $F$ and denote by $0_L$ the configuration that is identically equal to 0 on $L$. Since $P$ is full the following conditional probabilities are uniquely defined and are not equal to zero. For $L \subset S$, $L \ne S$, $y \in E$, we set:
$$W_L(y) = -\log\Big(\frac{P_L(\{y_L\} \mid 0_{S \setminus L})}{P_L(\{0_L\} \mid 0_{S \setminus L})}\Big), \qquad W_S(y) = -\log\Big(\frac{P(\{y\})}{P(\{0_S\})}\Big).$$
We have thus defined a mapping $L \mapsto W_L$ of $P(S)$ into the additive group of functions on $E$. By the Möbius lemma, there exists a family of functions $V$ such that $W_L = \sum_{A \subset L} V_A$ for all $L$. Since $W_L$ only depends upon variables indexed by $L$, the Möbius inversion formula shows that the same is true for $V_L$. The probability measure $P$ is a Gibbs measure for $V$ since $P(\{y\})$ is proportional to $\exp(-\sum_{A \subset S} V_A(y))$.
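Let $A$ be a set which is not a clique. We will show that $V_A = 0$. Let $i$ and $j$ be two distinct non-neighboring points belonging to $A$. We write the Möbius formula in the following manner:
$$V_A = \sum_{B \subset A \setminus \{i,j\}} (-1)^{\mathrm{card}(A \setminus B)}\,[W_B + W_{B \cup \{i,j\}} - W_{B \cup \{i\}} - W_{B \cup \{j\}}].$$
We will show for any $B$ that the sum of the four terms between the brackets is zero. Set $C := S - (B \cup \{i, j\})$. All of the probability measures that we will consider are conditioned by $\{X_C = 0_C\}$ and can be expressed with the aid of:
$$Q := P(\,\cdot \mid X_C = 0_C \text{ and } X_j = 0).$$
Utilizing the notations $B$ for the event $\{X_B = y_B\}$, $B_0$ for $\{X_B = 0_B\}$, $I$ for $\{X_i = y_i\}$, and $I_0$ for $\{X_i = 0\}$, we can write:
$$W_B(y) = -\log\big(Q(B \mid I_0)/Q(B_0 \mid I_0)\big) = -\log\big(Q(B \text{ and } I_0)/Q(B_0 \text{ and } I_0)\big),$$
$$W_{B \cup \{i\}}(y) = -\log\big(Q(B \text{ and } I)/Q(B_0 \text{ and } I_0)\big),$$
which by taking the difference gives:
$$W_{B \cup \{i\}}(y) - W_B(y) = -\log\big(Q(B \text{ and } I)/Q(B \text{ and } I_0)\big) = -\log\big(Q(I \mid B)/Q(I_0 \mid B)\big)$$
$$= -\log\Big(\frac{P(X_i = y_i \mid X_B = y_B \text{ and } X_j = 0 \text{ and } X_C = 0_C)}{P(X_i = 0 \mid X_B = y_B \text{ and } X_j = 0 \text{ and } X_C = 0_C)}\Big).$$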
By replacing $B$ by $B \cup \{j\}$ and $Q$ by $P(\,\cdot \mid X_C = 0_C)$ in the preceding calculations we obtain in the same way the following:
$$W_{B \cup \{i,j\}}(y) - W_{B \cup \{j\}}(y) = -\log\Big(\frac{P(X_i = y_i \mid X_B = y_B \text{ and } X_j = y_j \text{ and } X_C = 0_C)}{P(X_i = 0 \mid X_B = y_B \text{ and } X_j = y_j \text{ and } X_C = 0_C)}\Big).$$
Since $j \notin \partial\{i\}$, the Markov property implies that the measures $P_i(\,\cdot \mid (y_B, y_j, 0_C))$ and $P_i(\,\cdot \mid (y_B, 0, 0_C))$ are equal, which yields the relation $W_{B \cup \{i,j\}}(y) - W_{B \cup \{j\}}(y) = W_{B \cup \{i\}}(y) - W_B(y)$ and proves that $V_A$ equals zero.
We now consider the case where $S$ is infinite. The formula
$$W_L(y) = -\log\Big(\frac{P_L(\{y_L\} \mid X_{\partial L} = 0_{\partial L})}{P_L(\{0_L\} \mid X_{\partial L} = 0_{\partial L})}\Big)$$
can be used in the same way as when $S$ is finite because of the Markov property. Since the boundary of $L$ is finite and $P$ is full we are allowed to utilize conditional expectations that are then defined unambiguously. We define the functions $V_L$ in terms of the $W_L$ by the Möbius formula and we consider, for any finite subset $E$ of $S$, the probability measure $P^E$ on $E_E$ obtained by conditioning $P$ by $\{X_{\partial E} = 0_{\partial E}\}$. We can then apply the preceding proof to $P^E$, which shows for $L \subset E$ that the potential $V_L$ is equal to zero if $L$ is not a clique. Let $E$ be a finite subset of $S$ such that $L \cup \partial L \subset E$ and set $E' := E \cup \partial E$. By the Markov property and the composition of conditional expectations we have for $P_{S \setminus L}$-almost every $\zeta$:
$$P_L(\,\cdot \mid X_{S \setminus L} = \zeta) = P_L(\,\cdot \mid X_{E' \setminus L} = \zeta_{E' \setminus L}) = P_L(\,\cdot \mid X_{\partial E} = 0_{\partial E} \text{ and } X_{E \setminus L} = \zeta_{E \setminus L}).$$
In other words, we can simply condition the measure $P^E$ by the condition $X_{E \setminus L} = \zeta_{E \setminus L}$ in order to calculate $P_L(\,\cdot \mid X_{S \setminus L} = \zeta)$. Since the interaction potentials for $P^E$ are the $V_A$, we find the proportionality relation:
$$P_L(\{x\} \mid X_{S \setminus L} = \zeta) \propto \exp\Big(-\sum_{A \in P_f(S),\ A \cap L \ne \emptyset} V_A((x, \zeta))\Big),$$
which completes the proof.
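The construction used in this proof (define $W_L$ from ratios of probabilities, then recover $V_A$ by Möbius inversion) can be carried out explicitly when $S$ is finite. The sketch below is an illustration under assumptions of our own: a four-site cycle as the graph, three elementary states with reference state 0, and a full Markovian measure built from randomly chosen clique potentials. It checks that the recovered $V_A$ vanish whenever $A$ is not a clique, and that $\sum_{A \subset S} V_A = W_S$.

```python
import itertools
import math
import random

SITES = (0, 1, 2, 3)
EDGES = {frozenset(e) for e in ((0, 1), (1, 2), (2, 3), (3, 0))}  # hypothetical 4-cycle
STATES = (0, 1, 2)  # the state 0 plays the role of the reference state

def subsets(A):
    A = tuple(sorted(A))
    for r in range(len(A) + 1):
        for c in itertools.combinations(A, r):
            yield frozenset(c)

def is_clique(A):
    """True if any two distinct points of A are neighbors."""
    return all(frozenset(p) in EDGES for p in itertools.combinations(A, 2))

# Random potentials carried by the non-empty cliques (singletons and edges here).
rng = random.Random(0)
CLIQUES = [A for A in subsets(SITES) if A and is_clique(A)]
TABLES = {A: {cfg: rng.uniform(-1.0, 1.0)
              for cfg in itertools.product(STATES, repeat=len(A))}
          for A in CLIQUES}

def restrict(y, A):
    return tuple(y[i] for i in sorted(A))

def weight(y):
    """Unnormalized probability of the configuration y."""
    return math.exp(-sum(T[restrict(y, A)] for A, T in TABLES.items()))

ZERO = (0,) * len(SITES)

def W(L, y):
    """W_L(y) = -log(P(y on L, 0 elsewhere) / P(0 everywhere)); normalization cancels."""
    y_L = tuple(y[i] if i in L else 0 for i in SITES)
    return -math.log(weight(y_L) / weight(ZERO))

def V(A, y):
    """Moebius inversion: V_A = sum over B subset of A of (-1)^card(A minus B) W_B."""
    return sum((-1) ** (len(A) - len(B)) * W(B, y) for B in subsets(A))

all_subsets = list(subsets(SITES))
max_nonclique = 0.0
max_reconstruction_error = 0.0
for y in itertools.product(STATES, repeat=len(SITES)):
    values = {A: V(A, y) for A in all_subsets}
    for A, val in values.items():
        if not is_clique(A):
            max_nonclique = max(max_nonclique, abs(val))
    total = sum(values.values())
    max_reconstruction_error = max(max_reconstruction_error,
                                   abs(total - W(frozenset(SITES), y)))
print(max_nonclique, max_reconstruction_error)
```

The recovered potentials are normalized (they vanish as soon as one coordinate equals the reference state), so they generally differ from the random tables used to build the measure while defining the same probability.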
4.2. An Ising model with real spin From now on the set of sites S will be Zd and the space of elementary states F will be R with Lebesgue measure ,\(dx) = dx. Thus E = R. We set lil := sups 1 and with the dominant coefficient strictly positive. This hypothesis is traditional and is already made for models of Euclidean fields when the models have been simplified by discretizing space and time; see [GRS75]. But, in fact, only a certain number of growth conditions of the self-interactions at infinity are necessary. These conditions will be satisfied by the hypothesis that V is a polynomial. Similarly the form of the pairwise interactions can be generalized. However, with our assumptions, the energy in the region L under the condition z is:
$$U_{L,z}(x) = \sum_{i \in L} V(x_i) - \sum_{(i,j) \in L^2} \frac12 J(i-j)\,x_ix_j - \sum_{i \in L,\ j \notin L} J(i-j)\,x_iz_j.$$
We set the value of the temperature $T = \beta^{-1}$ equal to 1/2 so that the Boltzmann measures are the same as in the study of the Langevin equation. Our goal is to study the problems of uniqueness that are analogous to the case $T > T_c$ in the Ising model. If we take the measure $Z^{-1}\exp(-2V(x))\,dx$ as the basic measure on $\mathbb{R}$, the analogue of the uniform measure on $\{-1, +1\}$ for the Ising model, the self-interactions will be absorbed in this measure and only the pairwise interactions will remain. The condition of high temperature corresponds then to the case where the coefficients $J$ are small, i.e., the condition of weak interactions. But for now we will study the problems of existence in the general setting.
Exercise 4.2.2. Let $\lambda$ be Lebesgue measure on $F = \mathbb{R}$, $L$ a finite subset of $\mathbb{Z}^2$, and $P$ a real polynomial that is bounded below and of degree at least equal to 4. We set $V_{\{i\}}(x) := P(x_i)$ for any $i$ and $V_{\{i,j\}}(x) := \frac12(x_i - x_j)^2$ for all neighbors $i$ and $j$ in $\mathbb{Z}^2$ with the other interactions $V_L$ set equal to zero. Let $n^\beta_{L,z}$ be the Gibbs measure on $\mathbb{R}^L$ with temperature $1/\beta$ that is associated to $V$ with the exterior condition $z$. $\Delta$ will denote the discrete Laplacian: $\Delta x(i) = \sum_{j \sim i}(x(j) - x(i))$. Prove for $\beta$ tending to $+\infty$ that the measure $n^\beta_{L,z}$ is concentrated in the set $D$ of solutions of the following non-linear Dirichlet problem: find the functions $x$ from $L \cup \partial L$ into $\mathbb{R}$ such that
$$x|_{\partial L} = z|_{\partial L}, \qquad [-\Delta x + P'(x)](i) = 0 \text{ for all } i \in L,$$
in other words, that $n^\beta_{L,z}(D)$ tends to 1.
$^2$ We do not assume that $J$ has a constant sign. In the case where there is a constant sign we say the model is ferromagnetic if $J > 0$ and anti-ferromagnetic if $J < 0$.
4.2.1. Existence of Gibbs measures. From now on $L$ will always denote a finite subset of $S$. The construction of a Gibbs measure on $E$ will be accomplished by taking a limit point of the Gibbs measures on $E_L$ when $L$ tends to $\mathbb{Z}^d$. For this it is convenient to consider, for any $z \in E$, the probability measure on $E$, denoted by $\pi_L(z, dx)$, governing the configurations $x$ for which the value is frozen at $z$ outside of $L$, i.e., $x_{L^c} = z_{L^c}$ and where the restriction $x_L$ to $L$ is a random variable with distribution $n_L(z, dx_L)$. Thus we have: $\pi_{L,z} = n_{L,z} \otimes \delta_{z_{L^c}}$. When $z$ varies this family of probability measures defines a kernel on $E$ that we denote by $\pi_L$. We have: $\pi_L(\varphi)(z) =$
VI(y, zL a, we are able to find b such that for any x E IR, we have V(x) >, 2x2 - b. The function UL dominates, up to a constant, the Gaussian case considered earlier and thus exp(-2UL,,) is integrable. Before continuing we will need some preliminary estimates.
Lemma 4.2.6. There exist constants $C_1$ and $C_2$ such that for any $z \in E$, $L \in P_f(\mathbb{Z}^d)$, we have:
(4.2.3) $$\int \sum_{i \in \mathbb{Z}^d} a(i)\,x_i^2\;\pi_L(z, dx) \le C_1 + C_2 \sum_{j \notin L} a(j)\,z_j^2.$$
PROOF. By virtue of the hypotheses in Theorem 4.2.5, we can find con-
stants a > a and b such that: dx E III;
x V'(x) >, axe - b,
(4.2.4)
Vi 34j
xi aiV{i,jl(xi' xj) %
1
2 -1p(i - j)(XI2 +' xj)
where ai := 8/8xi. The second of these relations will be, in any case, evident for the interactions that we will consider. By integrating by parts we have for any i E L that: ZL,Z =
r
JR
2xi BiUL,z
dx.
Thus utilizing (4.2.4) and dividing by ZL,, we obtain:
E p(i - j ) (x? + z )] lrL (z, dx) < 1.
J [2ax? - 2b - E p(j - i) (x? +
j0L
jEL
We now apply Lemma 4.2.4 for a value of o' such that o < o' < a. By summing the preceding inequality with respect to a(i), we obtain:
(2a-o-7')J Ea(i)x; lrL(z,dx) < (1+2b)Ea(i)+oEzj2, iEL
iEL
since ES a(i)p(j - i) < o'a(j). It suffices to take into account that i 0 L in order to obtain the inequality in the form announced. Corollary 4.2.7. For any Gibbs measure u supported by S' and i E Zd, we have: X? , A(dx)
E
p(j-i)uj(s)ds. o
0
j
For i V L and replacing x by z we also have:
ui(t) p(j-i)uj(s)ds.
If we set f (t) := Eies a(i)ui (t) we have, taking into account Lemma 4.2.4, that:
f(t) 5 Ea(i)[l[iEL(xi +8+(b+1)t) +1 Lz;] + f(l + a' + a')f(s)ds. iES
Gronwall's Lemma then implies:
f(t)<exp((I+a'+a')t)Ea(i)[IiEL(x?+8+(b+1)t)+Ij Lz?], iES
which gives us the relation we were looking for in the case where k > 1 + a' + at.
We note that we have exactly the same relation when we replace $a$ by a translate $\tilde a(k) = a(k - i)$ since $\tilde a$ is also superharmonic. The term corresponding to $k = i$ of the left member leads to the second inequality.
To construct the infinite dimensional processes, we will utilize a comparison of two processes $X^{L,z,x}$ and $X^{M,z,x}$ for $M \subset L$. The result obtained will also prove useful later for obtaining stabilization properties when we introduce the property called finite velocity propagation, a concept that first appeared in [Zeg96]. Before proceeding, we point out an inequality that will appear later in many places. When we say $V$ is sufficiently convex, we will mean that $V$ satisfies a uniform convexity inequality in the Euclidean space $\ell^2(a)$ where $a$ is a weight introduced in Lemma 4.2.4.
Lemma 4.2.10. For all points l; and i in RL, independent of the box L and the condition z, we have, by setting m := inf(V"), the following inequality:
ai(&-1Ji)(49W(r )-aiU(tl)) >, iEL
/m- 2d
ai(Ci-m)2.
(
\
iEL
PROOF. As in the inequality (3.1.13), we have:
(x - y)(V'(x) - V'(y)) > m(x - y)2,
(4.2.12a)
(4.2.12b)
(xi - yi)(aiV{i,j) (xi,xj) - aiV{ij}(yi,yj))
-2 Ep(i - j) ((xi - yi)2 + (xj - yj)2) By summing these equations for the weight ai and using the fact that a is superharmonic, we see that the lemma follows.
Proposition 4.2.11. We can find constants k', k", k', and c' such that:
(1) For any time t, and any subset M of the box L: (4.2.13)
a(0) )E
(1: a(i) sup (Xt O,x - Xt ;'0'x)2 iES
0<s 0 such that for any Gibbs measures p supported by S', any i E Zd, and any r, we have: (4.2.20) p{x E E : 1x=l > r} < e-k([r-bl+)z.
PROOF. We, first of all, show an analogous inequality for the measures $\pi_L(dx, z)$, $i \in L$. In view of its degree, the polynomial $V$ can, for any $m > 0$, be decomposed into the form:
$$V(x) = \frac{m}{2}x^2 + F(x) + R(x),$$
where $F(x)$ is a convex function and $R(x)$ is infinitely differentiable with compact support on $\mathbb{R}$. We begin by controlling the local specifications $\tilde n_{L,z}$ associated to the convex site potential $\tilde V(x) = \frac12 mx^2 + F(x)$. The quadratic part of the energy $\tilde U_{L,z}(x)$ is the function:
$$\sum_{i \in L} \frac{m}{2}x_i^2 - \sum_{i,j \in L} \frac12 J(i-j)\,x_ix_j - \sum_{i \in L,\ j \notin L} J(i-j)\,x_iz_j,$$
for which, by Hadamard's criterion, the Hessian is positive definite for $m > a$, and even bounded below by $(m - a)\,\mathrm{Id}$. Thus by the Theorem of Bakry and Emery, the measure $\tilde n_{L,z}$ has, for $m > a$, a Gross constant that is smaller than $(2m - 2a)^{-1}$, independently of $L$ and $z$. The inequality (3.1.11), called the concentration inequality, for the function $|x_i|$ with $i \in L$, is written:
$$\tilde n_{L,z}\{|x_i| > r\} \le \exp\Big\{-k\Big(\Big[r - \int |x_i|\,\tilde n_{L,z}(dx)\Big]^+\Big)^2\Big\}, \qquad k = 2(m - a).$$
In order to deduce from this the concentration inequality for $n_{L,z}$, we begin by comparing the Kolmogorov processes $X$ and $\tilde X$ associated to these two measures and departing from the same point $x \in \mathbb{R}^L$. Utilizing Lemma 4.2.10, the Langevin equations, and the fact that $|R'|$ is bounded, say by 1/2, we see that:
$$\frac{d}{dt}(X_i(t) - \tilde X_i(t))^2 = -2(X_i(t) - \tilde X_i(t))\big(\partial_iU_{L,z}(X(t)) - \partial_i\tilde U_{L,z}(\tilde X(t))\big)$$
$$= -2(X_i(t) - \tilde X_i(t))\big(\partial_i\tilde U_{L,z}(X(t)) - \partial_i\tilde U_{L,z}(\tilde X(t)) + R'(X_i(t))\big)$$
$$\le -2(X_i(t) - \tilde X_i(t))\big(\partial_i\tilde U_{L,z}(X(t)) - \partial_i\tilde U_{L,z}(\tilde X(t))\big) + |X_i(t) - \tilde X_i(t)|.$$
By summing the preceding inequality with respect to $a$ and then applying Lemma 4.2.10 and the Schwarz inequality, we obtain: (4.2.21)
'1
,2(t) < (-2m+a+d)V2(t)+1(Ea(i))"2 (t) iEs
where : ap(t) = IIX L.z,T (t)
- XL'z'T(t)Ilt2(a)'
If m is chosen at the beginning, and it can be, such that the constant 2m - a - a, denoted by A, is strictly positive, the analysis of this differential inequality shows right away that: ,P(t) < lA ' la11"2,
and, in particular, we obtain a lower bound for the gap between the values of XL.z,x(t) and X (t) at 0. This estimate is valid for all translated weights a with the same constants, and finally for all L, z, x, i, t, we have: X; 'z'T (t)
-
z'T (t) I < bo
with bo =
Ial1/2a-1/2(0).
We apply the finite dimensional stabilization Theorem 3.2.7, which is applicable here, to the two processes X and j C' obtain for any positive p that: nLz{Ixi p} =clt-00 m P(X;xz > P) o0
(4.2.22)
clime
P(X,'
p - bo) = iiL,z{jxil > p - bo}
exp(-k([r - bo - bL'Z]+)2) where we have set: bL'z = f I xi I iiL,z (dx).
The upper bound (4.2.3), applied to a`, shows that there exist constants bI and b2, independent of i, such that:
b2'z r}) = lim n
cc
J
1rL,,,z{I4 > r}p(dz)
> r} tc.(dz) flIm1rL,z{lxil noc
f n00 lim exp(-k([r - b0 - bL"'1]+)2) p(dz)
1 - 6r, for any n and r.
PROOF. Let (µn) and (vn) be two compatible systems of probability measures for the projective system of kernels (Fn, Kn). It suffices to show that: (4.2.23)
Ilin - vnllvt exp(-2lxIIJ * zI) > exp(-2R 1/2ar2), and finally there exists a constant k' such that
> exp(-k'r2)IIMn(x), if ( E Ma,,.. In order for Kn-1((, . ) to dominate an-1 for ( E Mn,,, it suffices to choose the constant q, introduced above in (4.2.24), to be equal to k'. Having made this choice, we have, for r sufficiently large, that a, > 1 exp(-k'r2) since the probability measures Qn have uniformly bounded second moments
while we have by Proposition 4.2.15 that 6, < exp(-k(r - b)2). By taking k larger than k', we see that all of the hypotheses of Proposition 4.2.16 are satisfied.
This result can be carried over to the so-called $P(\varphi)_1$ model, where the set of sites is $\mathbb{R}$. This is described in [CR75].
Exercise 4.2.18 (uniqueness in the Gaussian case). The dimension d is arbitrary and we set V(x) := 2x2 with a > a. We denote: (1) the space of configurations that decrease more rapidly than any negative power of the distance from the origin by S; (2) the dual of S by S'; (3) the
subspace of $S'$ consisting of configurations with finite support in $\mathbb{Z}^d$ by $T$. Note that the dual of $T$ is $E = \mathbb{R}^{\mathbb{Z}^d}$. For $\zeta \in T$ and $x \in E$ we set: $a(\zeta, x) = \exp\{-\langle 2x + \zeta,\ a\zeta - J * \zeta\rangle\}$.
Prove that any Gibbs measure $\mu$ on $\mathbb{R}^{\mathbb{Z}^d}$ has the following quasi-invariance property: the translated measure $\tau_{-\zeta}\mu$ satisfies $\tau_{-\zeta}\mu(dx) = a(\zeta, x)\,\mu(dx)$.
If $\mu$ is supported by $S'$, extend this property to $\zeta \in S$. Show that $a\,\mathrm{Id} - [J * \cdot\,]$ is a permutation of $S$. Compute from this the Laplace and Fourier transforms of $\mu$ and conclude the uniqueness.
Remark 4.2.19. Up to now, the limitation to measures supported by $S'$ appears to be merely a technical tool for obtaining certain upper bounds. However, its necessity is made more apparent by the preceding exercise. Indeed, let $\mu_0$ be the Gaussian Gibbs measure on $S'$ associated to the interactions $V(x) = \frac{a}{2}x^2$, $J(i) = 1_{\{|i| = 1\}}$, while supposing that the constant $a - 2d$, denoted by $m^2$, is strictly positive. Then let $h$ be any solution on $\mathbb{Z}^d$ of the equation $(-\Delta + m^2)h = 0$. It is easy to check (see [Roy77]) that the translation of $\mu_0$ in $\mathbb{R}^{\mathbb{Z}^d}$ by $h$ is still a Gibbs measure. (There exist non-zero such $h$.) However, the measure obtained is not supported by $S'$, since $h$ does not belong to $S'$ unless it is the zero function. This is because its Fourier transform $g$ on the torus $\mathbb{T}^d$ would satisfy $\bigl(m^2 + 4\sum_{k=1}^{d}\sin^2(p_k/2)\bigr)g = 0$. The construction of analogous measures in the non-Gaussian case is not known.
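The existence of non-zero solutions $h$ can be made concrete in dimension $d = 1$. The following is a minimal numerical sketch, assuming that $-\Delta$ is the nearest-neighbour lattice Laplacian, $(-\Delta h)(i) = 2h(i) - h(i-1) - h(i+1)$; the function names and the value of $m^2$ are illustrative choices, not taken from the text. The candidate $h(i) = e^{\theta i}$ with $\cosh\theta = 1 + m^2/2$ solves the equation and grows exponentially, so it cannot be tempered.

```python
import math

def growth_rate(m2):
    """Return theta > 0 such that 2*cosh(theta) - 2 == m2 (requires m2 > 0)."""
    return math.acosh(1.0 + m2 / 2.0)

def residual(h, i, m2):
    """Value of ((-Delta + m2) h)(i) on Z, with (-Delta h)(i) = 2h(i) - h(i-1) - h(i+1)."""
    return 2.0 * h(i) - h(i - 1) - h(i + 1) + m2 * h(i)

m2 = 1.0                      # illustrative value of a - 2d in dimension d = 1
theta = growth_rate(m2)

def h(i):
    """Exponentially growing candidate solution h(i) = exp(theta * i)."""
    return math.exp(theta * i)

# Residuals relative to the size of h(i); they vanish up to rounding error.
relative_residuals = [abs(residual(h, i, m2)) / h(i) for i in range(-10, 11)]
print(max(relative_residuals))
```

The reflected function $i \mapsto e^{-\theta i}$ is a second solution, by the symmetry of the Laplacian.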
CHAPTER 5
Stabilization of Glauber-Langevin Dynamics

As we will see, the Gross logarithmic Sobolev inequality can continue to play a role in the study of certain infinite dimensional models, unlike the ordinary Sobolev inequalities. This will allow us to prove various ergodic and stabilization properties for these models; in reality, the Poincaré inequality also has this characteristic. But the Gross inequality is more powerful since it conserves, in a weak form, the regularization property that the ordinary Sobolev inequality has.¹ The Glauber-Langevin stochastic dynamics of the Ising model with real spins will furnish a striking illustration of the possibilities of the Gross inequality.
5.1. The Gross inequality and stabilization

We consider the process $X^{L,z}$ associated to the energy in the box $L$ with exterior condition $z$ for an Ising model with real spins for which the interactions satisfy the hypotheses 4.2.1. Such a process on $E$ corresponds to a Kolmogorov process in $E_L$ associated to this energy, with the configuration outside of $L$ being frozen at $z$. It follows from this that such a process has as invariant measure the Boltzmann measure $\pi_L(z, \cdot)$, which is concentrated at $z$ outside of $L$. We denote the box $\{i : |i| < n\}$ by $L_n$ and by $a$ a weight on $\mathbb{Z}^d$ as in (4.2.2).
Lemma 5.1.1. Let an instant in time t > 0 and an initial condition x E S' be fixed. The relative entropy at the instant t with exterior condition z depends in a tempered way on the size of the box Ln and the exterior influence z: (5.1.1)
I(G(Xt n,z'x)
I IrLn,z)
a. In this case the energy UL,0 is the quadratic form: a 2
1 Q(x) _
2J(i - j)xixj
2xi i.jEL
iEL
Since the eigenvalues associated to 2Q are all greater than a - a, the calcu-
lation of the normalization of the Gaussian measure µL,,,0 gives us:
log(4ir(a - a)) = kind.
log(ZL,,,o) < 2
In the case where the exterior condition z is non-zero, we need to combine the preceding inequality with the following inequality: log{
1
ZL,,.o
JL
J(i - j)xizj - 2Q(xL)) (LxL}
exp(2 iEL j¢L
= 2 (J * zLr., Q' I (J * zLr.))
,l
V i m < ( 1 -7)
JEA
where p denotes the integer in A that is congruent to p modulo N. By the Schwarz inequality (5.2.5) implies: (apfp-1)2
(1
'y)-1
EAp.ilrp-l7rp-2...7r1(aif)2,
iEA
which combined with (5.2.3) gives us:
V(f2loglfI) , 1. By the isomorphism of the Hilbert space L2(pq) onto L2(dp) defined by: (5.2.11)
0 + Z, 1/2 exp(-Vq)VL,
we see that the operator Agcp
-2'P" + V'' p' is transformed into the
Schrodinger operator: HgzG :_ -1 ip" + WqVI
where W. = 2 (V'Q - VQ )
with domain CC°. Let D(x, y) be the polynomial of total degree 2m - 2 defined by:
V'(x) - V'(y) = (x - y) D(x, y) Since the leading coefficient of V is strictly positive, there exist constants a > 0 and b such that D2(x, y) > 2-2ma(x2 +y2)2'"-2 - b, which implies for any p and q the lower bound: V q(p) > apt(p2 + g2)2m-2 - b.
Since VQ'(p) is a polynomial in p and q of total degree 2m - 2, we can find a constant on b' such that:
- b') ll1 1,1(ap2 - b'). Since the second derivative operator is negative in L2(dp), aHq + b' Id is IIlpl,l Wq(p) > 11p1>1(a(p2 +
g2)2m-2
an upper bound of the multiplication operator p2 11p1.>1 and by utilizing the
5.3. PERSPECTIVES
101
Hilbert space isomorphism above we see that aAq + b' IIlpI31 is an upper bound of the multiplication operator on L2(pq). With aid of the Dirichlet form
_12
F'2 d,, Eq(F, F) = (F, AgF)L2(pq) = f this upper bound is written: for any F in H1(pq), we have
f
p F2 dpq aEq(F, F) + b' F2 dpq. Pfit JpI it We can take into account the region IpI 0 and sufficiently large L we have: (5.3.1)
Icov,VL.=(X$,Xj)I
0,
we see that this property is equivalent to the following apparently weaker property: for any bounded measurable function $f$ on $E$, we have almost surely that:
(A.1.3) $\mathbb{E}(f(X_{n+1}) \mid \mathcal{F}_n) = \mathbb{E}(f(X_{n+1}) \mid \mathcal{F}_{\{n\}})$.
We will say that the process is a homogeneous Markov process with transition kernel $N$ if, in addition to (A.1.3), we have:
$\mathbb{E}(f(X_{n+1}) \mid \mathcal{F}_n) = Nf(X_n)$.
We can also express the Markov property in the following form: outside of a negligible set of points $(x, x_1, x_2, \ldots)$ with respect to the law for $(X_n, X_{n-1}, \ldots)$,
$\mathbb{E}(f(X_{n+1}) \mid X_n = x \text{ and } X_{n-1} = x_1 \text{ and } X_{n-2} = x_2 \text{ and} \ldots) = Nf(x)$.
The law for $X_0$ is called the initial law for the process. Given any probability $m$ and any kernel $N$, one can always show that it is possible to construct processes $(X_n)$, $n \in \mathbb{N}$, with initial law $m$ and transition kernel $N$. The corresponding marginal laws $\mathcal{L}(X_0, \ldots, X_n)$ are uniquely determined for any $n$ and, in fact, they are equal to $m \otimes N \otimes \cdots \otimes N$, where there are $n$ $N$'s. A process is said to be canonical when $\Omega = E^T$, the random variable $X_n$ is the $n$th coordinate of this product, and $\mathcal{F}$ is the $\sigma$-field generated by the $X_n$. From the image of the measure $P$ for an arbitrary process, one can construct a canonical process that will have the same marginal laws $\mathcal{L}(X_{t_1}, \ldots, X_{t_k})$ for any finite set of times.

Stationary Markov chains. We will consider a canonical Markov chain: $(E^{\mathbb{N}}, \mathcal{F}, X_n, n \in \mathbb{N})$.
Let $T$ be the shift-to-the-left operator from $E^{\mathbb{N}}$ onto itself defined by $T(x)_i = x_{i+1}$. Since $X_i \circ T = X_{i+1}$, the law of $(X_0, X_1, \ldots, X_{n-1})$ for $T(P)$ is the law of $(X_1, X_2, \ldots, X_n)$ for $P$. Since a probability measure on $E^{\mathbb{N}}$ is determined by its marginal laws, we see that $P$ is $T$-invariant if and only if for any $k$ the law for $(X_i, X_{i+1}, \ldots, X_{i+k})$ is independent of $i$. We say in this case that the process is stationary. We will now consider a homogeneous Markov chain with transition kernel $N$ and initial law $m$:
Proposition A.1.5. The process is stationary if and only if $m$ is an invariant measure for $N$.
PROOF. Suppose that the process is stationary. The law for the pair $(X_0, X_1)$ is $m(dx_0) \otimes N(x_0, dx_1)$ and thus the law for $X_1$ is $mN$. Since the process is stationary the law for $X_1$ is also $m$, hence $mN = m$. Conversely, suppose $m$ is an invariant measure for $N$ and let $f$ be any bounded and measurable function. If we set:
$\varphi(x_1) := \int N(x_1, dx_2) \int N(x_2, dx_3) \cdots \int f(x_1, x_2, \ldots, x_{n+1})\, N(x_n, dx_{n+1})$,
the invariance of $m$ gives us:
$\mathbb{E}(f(X_1, X_2, \ldots, X_{n+1})) = \int m(dx_0) \int \varphi(x_1)\, N(x_0, dx_1) = \int \varphi(x_0)\, m(dx_0)$
$= \int m(dx_0) \int N(x_0, dx_1) \cdots \int f(x_0, x_1, \ldots, x_n)\, N(x_{n-1}, dx_n) = \mathbb{E}(f(X_0, X_1, \ldots, X_n))$,
which is the relation we are looking for since $f$ is arbitrary.
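On a finite state space the proposition can be checked by hand. The following is a minimal sketch, assuming a three-state kernel chosen purely for illustration (the kernel, the function names, and the fixed-point iteration are not from the text): when $mN = m$ the laws of $(X_0, X_1)$ and $(X_1, X_2)$ coincide, and they differ for a non-invariant initial law.

```python
def push_forward(m, N):
    """Law of X_{k+1} when X_k has law m: the row vector mN."""
    n = len(m)
    return [sum(m[x] * N[x][y] for x in range(n)) for y in range(n)]

def pair_law(m, N):
    """Joint law of (X_k, X_{k+1}) when X_k has law m: entries m(x) N(x, y)."""
    n = len(m)
    return [[m[x] * N[x][y] for y in range(n)] for x in range(n)]

def approximate_invariant_law(N, iterations=200):
    """Iterate m -> mN from the uniform law (converges here, not for every kernel)."""
    m = [1.0 / len(N)] * len(N)
    for _ in range(iterations):
        m = push_forward(m, N)
    return m

# Illustrative transition kernel on three states (each row sums to 1).
N = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]

m = approximate_invariant_law(N)
law_01 = pair_law(m, N)                        # law of (X_0, X_1)
law_12 = pair_law(push_forward(m, N), N)       # law of (X_1, X_2)

m_bad = [1.0, 0.0, 0.0]                        # a non-invariant initial law
bad_01 = pair_law(m_bad, N)
bad_12 = pair_law(push_forward(m_bad, N), N)
```

Comparing `law_01` with `law_12` entry by entry, and likewise `bad_01` with `bad_12`, shows the two behaviours for pairs of consecutive times.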
If $m$ is an invariant measure for $N$, we can construct chains that are indexed by times in $\mathbb{Z}$. The important notion of reversibility of a process that we are about to define is much stronger than invariance.

Definition A.1.6. We say that a measure $\lambda$ on $E$ is reversible for $N$ when $\lambda \otimes N$ is a symmetric measure on $E \times E$.

If $\lambda$ is reversible for $N$, we have, for all positive $f$ and $g$, that:
$\int_E g(x)\, Nf(x)\, \lambda(dx) = \int_E Ng(x)\, f(x)\, \lambda(dx)$.
In particular, a reversible measure for $N$ is an invariant measure for $N$. To see this just set $f = 1$ in the above identity. If, in addition, $\lambda$ is $\sigma$-finite, the monotone class theorem will show that the preceding condition is also sufficient for the reversibility of $\lambda$. It can also be shown that the $\sigma$-finite measure $\lambda$ is reversible for $N$ if and only if $N$ induces a self-adjoint contraction on $L^2(\lambda)$. When $\lambda$ is a probability measure, the reversibility of $\lambda$ is equivalent to the following condition: let $\rho$ be the time reversal mapping of $E^{\mathbb{Z}}$ defined by $[\rho(x)]_n = x_{-n}$; then the law of the canonical stationary Markov process defined by the transition kernel $N$ and initial law $\lambda$ is invariant with respect to $\rho$. Verifying a criterion that is equivalent to reversibility is, in general, much simpler than directly establishing reversibility. For example, when $E$ is denumerable, reversibility can be written:
for any $x$ and $y$ in $E$, $\lambda_x N_{x,y} = \lambda_y N_{y,x}$.

Example A.1.7. If we consider the values $W_n$ of the Brownian motion at integral times, we obtain a Markov chain with transition kernel $N_1$. Lemma 2.2.10 shows that Lebesgue measure is reversible for $N_1$.
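The criterion for denumerable $E$ stated before Example A.1.7 lends itself to a direct check on a finite state space. The sketch below is only an illustration: both kernels and all function names are assumptions chosen for the example. It exhibits a reversible (hence invariant) measure, and an invariant measure that is not reversible, which shows that reversibility is strictly stronger than invariance.

```python
def is_invariant(lam, N, tol=1e-12):
    """Check lam N = lam, i.e. sum_x lam[x] N[x][y] == lam[y] for every y."""
    n = len(lam)
    return all(abs(sum(lam[x] * N[x][y] for x in range(n)) - lam[y]) <= tol
               for y in range(n))

def is_reversible(lam, N, tol=1e-12):
    """Check the relation lam[x] N[x][y] == lam[y] N[y][x] for all x, y."""
    n = len(lam)
    return all(abs(lam[x] * N[x][y] - lam[y] * N[y][x]) <= tol
               for x in range(n) for y in range(n))

# A nearest-neighbour kernel on {0, 1, 2}: its invariant law is reversible.
birth_death = [[0.5, 0.5, 0.0],
               [0.25, 0.5, 0.25],
               [0.0, 0.5, 0.5]]
lam_bd = [0.25, 0.5, 0.25]

# A kernel that mostly rotates 0 -> 1 -> 2 -> 0: the uniform law is
# invariant (columns sum to 1) but the rotation has a preferred direction.
cyclic = [[0.0, 0.9, 0.1],
          [0.1, 0.0, 0.9],
          [0.9, 0.1, 0.0]]
lam_uniform = [1.0 / 3.0] * 3

print(is_invariant(lam_bd, birth_death), is_reversible(lam_bd, birth_death))
print(is_invariant(lam_uniform, cyclic), is_reversible(lam_uniform, cyclic))
```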
Jump processes. In the case when the state space $E$ is finite, the generalization of Markov chains to continuous time is fairly simple. For example, the interested reader can consult Rozanov's book [Roz87]. We begin with a matrix $A$ satisfying the hypotheses (A.1.2) and the Markovian semi-group $N_t := \exp(-tA)$. Then
there exists a Markov process $(X_t)$, $t \in \mathbb{R}_+$, with values in $E$ with transition semi-group $(N_t)$, i.e.:
(A.1.4) $\mathbb{E}(f(X_{s+t}) \mid \mathcal{F}_s) = N_t f(X_s)$, where $\mathcal{F}_s = \sigma(X_u,\ u \in \mathbb{R}_+,\ u \le s)$.
We can also construct a version of this process for which the trajectories are piece-wise constant and right continuous. In the case where K is a transition matrix for which the diagonal elements are zero and A = I - K, it is possible to envision the process as follows: Xt is a body that sits at a point x of E
for a certain random waiting time governed by an exponential law with parameter 1, then jumps to another point y, with probability K(x, y), with the waiting times being independent for each site.
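For the special case $A = I - K$ just described, the semi-group can be computed explicitly: since $I$ and $K$ commute, $\exp(-t(I - K)) = e^{-t}\sum_{n \ge 0} \frac{t^n}{n!} K^n$, which is the law of the chain $K$ observed after a Poisson number of jumps. The following is a minimal sketch under these assumptions; the matrix `K`, the truncation at a fixed number of terms, and the function names are illustrative choices.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_semigroup(K, t, terms=80):
    """Approximate N_t = exp(-t (I - K)) by the truncated series e^{-t} sum_n t^n/n! K^n."""
    n = len(K)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # K^0
    result = [[0.0] * n for _ in range(n)]
    weight = math.exp(-t)          # Poisson(t) probability of zero jumps
    for k in range(terms):
        for i in range(n):
            for j in range(n):
                result[i][j] += weight * power[i][j]
        power = mat_mul(power, K)  # next power of K
        weight *= t / (k + 1)      # next Poisson weight
    return result

# Illustrative transition matrix with zero diagonal.
K = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]

N_s = transition_semigroup(K, 0.7)
N_t = transition_semigroup(K, 0.5)
N_st = transition_semigroup(K, 1.2)
product = mat_mul(N_s, N_t)        # should agree with N_st (semi-group property)
```

Each row of `N_s` is a probability vector, and `product` agrees with `N_st` up to rounding, which is the semi-group property $N_s N_t = N_{s+t}$.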
A.2. Bounded real measures

Since we will only consider bounded measures we will, in general, omit explicit mention of this property.
Definition A.2.1. A real measure on a measurable space $(E, \mathcal{E})$ is a mapping $\mu$ from $\mathcal{E}$ to $\mathbb{R}$ such that for any sequence $(A_n)$ of disjoint measurable sets, the series $\sum_{n=0}^{\infty} \mu(A_n)$ converges absolutely and its sum equals $\mu(\bigcup_n A_n)$.
Exercise A.2.2. Show that if we remove the word "absolutely", we obtain an equivalent definition.

The following theorem allows us to reduce results about measures to the special case of positive measures.

Theorem A.2.3 (Jordan-Hahn). Any real measure $\mu$ can be decomposed in a unique way as the difference of two mutually singular positive measures.
The qualification "mutually singular" says that there exists A with A+ (A) = 0 and µ_(A`) = 0. If the values taken by a measure are finite, it will automatically be bounded. In any case, it would be neither useful nor agreeable to consider signed measures taking infinite values. If we define 1µl := a+ +,a- the theorem implies that 1/I (C) = supBcc1µ(B)1 for any C E E. This measure is called the absolute value of µ. If f is a 14-integrable function, we define f du to be f dµ+ - f du-. Examples A.2.4. Here are three examples of signed measures. (1) The difference of two bounded positive measures; (2) for f E L1(v) where v is a positive a-finite measure, we have the signed measure f v defined by:
$(f\nu)(A) = \int_A f\,d\nu$; (3) for any continuous linear form $l$ on $C(E)$, where $E$ is a compact metrizable space, there exists by the Riesz Representation Theorem
a unique real measure $\mu$ such that:
$\forall f \in C(E),\quad l(f) = \int_E f\,d\mu$.
We denote by $\|\cdot\|_\infty$ the uniform norm of bounded functions, by $M$ the vector space of all real measures on $E$, and we define the total variation¹ of a measure $\mu$ by:
$\|\mu\|_{vt} = \int_E d|\mu|$.
Proposition A.2.5. Let $\mu$ be a bounded measure. Then the total variation of $\mu$ equals the norm of the linear functional defined by $\mu$ on the space of bounded measurable functions:
(A.2.1) $\|\mu\|_{vt} = \sup\{\int f\,d\mu : f \in \mathcal{L}^\infty,\ \|f\|_\infty \le 1\}$
and the space $(M, \|\cdot\|_{vt})$ is complete.

PROOF. Utilizing the decomposition of $\mu$ into its positive and negative parts, we see that:
$\int f\,d\mu = \int f\,d\mu_+ - \int f\,d\mu_- \le \int |f|\,d\mu_+ + \int |f|\,d\mu_- = \int |f|\,d|\mu| \le \|f\|_\infty \int d|\mu|$,
which proves that the left-hand member of (A.2.1) is greater than or equal to the right-hand member. Going in the opposite direction, we consider $f = 1_{A^c} - 1_A$ where $A$ is such that $\mu_+(A) = 0$ and $\mu_-(A^c) = 0$. The relation
$\mu(f) = \mu_+(A^c) + \mu_-(A) = \mu_+(E) + \mu_-(E) = \int d|\mu|$
shows that the sup bound on the right-hand side of (A.2.1) is attained. From the formula that we just established it follows that $\|\cdot\|_{vt}$ is a norm. In fact, $M$ is identified with a subset of the dual of the space of bounded measurable functions with the uniform norm.

We now turn to proving completeness. Let $(\mu_k)$ be a Cauchy sequence in $M$. Since the masses $\int d|\mu_k|$ are uniformly bounded, $\alpha := \sum_k 2^{-k}|\mu_k|$ is a bounded positive measure. Since all of the $\mu_k$ are absolutely continuous with respect to $\alpha$ we can apply the Radon-Nikodym Theorem to obtain $|\mu_k| = g_k\alpha$, from which we deduce that $\mu_k = f_k\alpha$ where $f_k := (1_{A_k^c} - 1_{A_k})\,g_k$, and where the set $A_k$ is as above. It is easy to establish the formula
$\|f\alpha\|_{vt} = \int |f|\,d\alpha = \|f\|_{L^1(\alpha)}$,
for any $f$ in $L^1(\alpha)$, which shows that the sequence $(f_k)$ is Cauchy in the Banach space $L^1(\alpha)$ and thus converges to a function $f$ in $L^1(\alpha)$. The same formula shows that $\mu_k$ converges to $f\alpha$. □

¹ Unfortunately the measure $|\mu|$ is also called the total variation.
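On a finite set, where a real measure is just a vector of weights, Proposition A.2.5 can be verified by enumeration. The sketch below is an illustration under that assumption (the weights and the function names are hypothetical): it computes the Jordan-Hahn parts, the total variation, and checks that the supremum in (A.2.1) is attained by the function equal to $+1$ where the weight is non-negative and $-1$ elsewhere.

```python
from itertools import product

def jordan_parts(mu):
    """Positive and negative parts of a real measure on a finite set."""
    plus = [max(w, 0.0) for w in mu]
    minus = [max(-w, 0.0) for w in mu]
    return plus, minus

def total_variation(mu):
    """Total mass of |mu| = mu_plus + mu_minus."""
    plus, minus = jordan_parts(mu)
    return sum(plus) + sum(minus)

def pairing(mu, f):
    """Integral of f with respect to mu."""
    return sum(w * v for w, v in zip(mu, f))

def extremal_function(mu):
    """Function with values in {-1, +1}: +1 where the weight is non-negative, -1 elsewhere."""
    return [1.0 if w >= 0 else -1.0 for w in mu]

mu = [0.5, -1.25, 0.0, 2.0]       # illustrative weights on a four-point set

# Since f -> pairing(mu, f) is linear, its sup over the cube [-1, 1]^4
# is reached at a vertex, so it suffices to enumerate sign patterns.
best = max(pairing(mu, f) for f in product([-1.0, 1.0], repeat=len(mu)))

print(total_variation(mu), pairing(mu, extremal_function(mu)), best)
```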
To finish our discussion we recall that M is ordered, i.e., µ < v if and only if µ(A) < v(A) for any A E e and that, for this order, there exists the upper and lower bound of two measures. For example, one can show that: (a, A p)(A) = sup{E c(Bj) A p(Bi) I for finite partitions (Bi, i E I) of A}. iE/
It is not necessary to worry about such formulas because given two measures we can always find a positive measure with respect to which the two measures are absolutely continuous and it suffices to take the lower envelope of the two
densities. We note that if v and p are positive, they are mutually singular if or Ap=0. Exercise A.2.6 (mixing kernels). Let a kernel N on E be such that there exists a positive measure a of mass a < 1 satisfying `/x a(dy) 0 and that then we can take a = Ey min= N(x, y).
A.3. The topology of weak convergence

Let $E$ be a Polish space, $C_b(E)$ the space of bounded continuous real functions on $E$, and $\mathcal{M}_b(E)$ the cone of positive Borel measures on $E$.

Definition A.3.1. The topology of weak convergence on $\mathcal{M}_b(E)$ is the least fine topology for which the mappings $\mu \mapsto \mu(f)$ from $\mathcal{M}_b(E)$ to $\mathbb{R}$ are continuous for all $f$ in $C_b(E)$.

It is easy to see that such a topology exists and that a basis for this topology is the collection consisting of the "elementary open sets" $\{\mu \mid a_1 < \mu(f_1) < b_1, \ldots, a_n < \mu(f_n) < b_n\}$, where $n$ is any positive integer and where the $f_i$ are bounded continuous functions. The following result is technically very useful.
Proposition A.3.2. Let $I$ be a Hausdorff space. In order for a mapping $i \mapsto \mu_i$ of $I$ to $\mathcal{M}_b(E)$ with the weak topology to be continuous, it is necessary and sufficient that for any open set $U$ of $E$ the mapping $i \mapsto \mu_i(U)$ is lower semi-continuous and that $i \mapsto \mu_i(1)$ is continuous.

PROOF. We, first of all, construct a denumerable set $\mathcal{F}$ of positive bounded continuous functions on $E$ that is stable under the "sup" operation, i.e., $f, g \in \mathcal{F}$ implies $\sup(f, g) \in \mathcal{F}$, and which has the property that the characteristic function of any open set $U$ in $E$ can be written $1_U = \sup_n f_n$, where the functions $f_n$ belong to $\mathcal{F}$. Let $(x_i)$ be a dense sequence in $E$ and let $\mathcal{B}$ be the denumerable set of open balls with rational radius whose centers belong to this sequence. Since any open set $V$ of $E$ can be written as a countable union of elements in $\mathcal{B}$, it suffices to check that $\mathcal{F}$ has the desired property for $U \in \mathcal{B}$. If $U = B(x_i, r)$ we have:
$1_U = \sup_n g_{n,i,r}$ with $g_{n,i,r}(x) = [n\,d(x, U^c)] \wedge 1$;
and thus it suffices to form $\mathcal{F}$ as the set of all functions which are the "sup" of finite sets of the functions $g_{n,i,r}$.
Let $U$ be an open set. Then we can write $1_U = \sup_n f_n$, where the sequence $(f_n)$ is increasing in $\mathcal{F}$, since we can replace $f_n$ by $\sup(f_1, f_2, \ldots, f_n)$ if necessary. Applying Beppo Levi's Theorem gives us:
(A.3.1) $\mu(U) = \sup_n \mu(f_n)$.
Since the mappings $i \mapsto \mu_i(f_n)$ are continuous for any $n$, the upper envelope $i \mapsto \mu_i(U)$ is lower semi-continuous. Conversely, we suppose that for any open set $U$, the preceding mappings are lower semi-continuous and let $f$ be a positive continuous function bounded by $b$. We denote by $\omega_p$ the open set $\{x \mid f(x) > p\}$. Fubini's Theorem then implies:
$\mu(f) = \int_0^\infty \mu(\omega_p)\,dp$ since $f(x) = \int_0^\infty 1_{\omega_p}(x)\,dp$.
We can restrict the integral to the interval $[0, b]$. Since the function $p \mapsto \mu_i(\omega_p)$ is decreasing, $\mu_i(f)$ is the increasing limit of the Riemann sums:
$b\,2^{-n} \sum_{p=1}^{2^n} \mu_i(\omega_{pb2^{-n}})$.
Since each of these sums is lower semi-continuous as a function of $i$, the same is true for $\mu_i(f)$. By replacing $f$ by $b - f$ in the previous argument we obtain the upper semi-continuity and therefore the continuity of $i \mapsto \mu_i(f)$. This extends to an arbitrary bounded continuous function, since it can be decomposed into the difference of two bounded positive continuous functions. Finally, from the definition of the topology of weak convergence, we see that $i \mapsto \mu_i$ is continuous. □

Corollary A.3.3. The weak topology is metrizable.
PROOF. Let $\phi$ be the mapping $\mathcal{M}_b(E) \to \mathbb{R}^{\mathcal{F}}$ that associates the family $(\mu(f))_{f \in \mathcal{F}}$ to $\mu$, where $\mathcal{F}$ is defined in the proof above. It is easy to see that $\mathbb{R}^{\mathcal{F}}$ is metrizable since $\mathcal{F}$ is countable. We show that $\phi$ is injective. Let $\mu$ and $\nu$ be two measures such that $\phi(\mu) = \phi(\nu)$ and $U$ an open set. By the relation (A.3.1) we have:
(A.3.2) $\mu(U) = \sup_n \mu(f_n) = \sup_n \nu(f_n) = \nu(U)$,
i.e., the two measures coincide on the open sets and thus on the $\sigma$-field of Borel sets.

Since $\phi$ is injective we can identify $\mathcal{M}_b(E)$ with a subset of $\mathbb{R}^{\mathcal{F}}$ and, since the latter space is metrizable with metric $\delta$, this induces a metric on $\mathcal{M}_b(E)$. Clearly, since $\phi$ is continuous, $\delta$ defines a less fine topology than the weak topology. To show the equality of these topologies we utilize the preceding proposition, which says that it is sufficient to show that the mappings $\mu \mapsto \mu(U)$ are lower semi-continuous with respect to the $\delta$ topology. But for $f \in \mathcal{F}$, the mapping $\mu \mapsto \mu(f)$ is continuous by construction of $\delta$, which completes the proof. □
Corollary A.3.4. Let $\mathcal{B}$ be a basis of open sets containing $E$ that is stable under finite unions and let $\mathcal{G}$ be a subset of $C_b(E)$ containing 1 such that the indicator (characteristic) function of any element in $\mathcal{B}$ is the limit of an increasing sequence of elements of $\mathcal{G}$. Any sequence of measures $(\mu_n)$ converges weakly to $\mu$ if and only if for any $f \in \mathcal{G}$ the sequence $(\mu_n(f))$ converges to $\mu(f)$.
Example A.3.5. Consider $E = \mathbb{R}^S$ where $S$ is denumerable. We can take for $\mathcal{B}$ the collection of open sets depending on only a finite number of coordinates, i.e., sets of the form $\mathbb{R}^{S \setminus L} \times U$ where $U$ is an open set in $\mathbb{R}^L$ and $L$ is a finite subset of $S$, and for $\mathcal{G}$ the bounded continuous functions that depend on only a finite number of coordinates.
Exercise A.3.6. Show that any lower semi-continuous function $f : E \to \mathbb{R}_+$ is the limit of an increasing sequence of linear combinations of indicator functions of open sets with positive coefficients. Deduce from this that $\mu \mapsto \mu(f)$ is lower semi-continuous.

The interest of the metrizability result is that we will be able to study the weak topology with the aid of weak convergence.
Definition A.3.7. We say that a set A of positive bounded measures on E is tight if for any e > 0, we can find a compact K C E such that: Vp E A µ(K°) 0, there is a compact set Ke that supports It up to E. This property of bounded measures on Polish spaces is all the more important because it can be extended to Souslin spaces - continuous
images of Polish spaces - which includes all of the usual separable spaces. The interested reader can consult [DM75] for this result which naturally leads to the following more elementary result:
Theorem A.3.8 (compactness criterion of Prokhorov). Any set A of bounded positive measures that is tight and bounded is relatively compact in the weak topology.
PROOF. First of all, we will assume that the result is true when $E$ is compact. Let $(\mu_n)$ be a sequence of measures in $A$. Because of the tightness of $A$, we can define an increasing sequence $(K_i)$ of compact subsets of $E$ such that:
$\sup\{\mu(E - K_i) \mid \mu \in A\} \le 1/i$.
We denote by $\mu_{n,i}$ the restriction of $\mu_n$ to $K_i$. By utilizing the result for the compact case we can extract a subsequence $k \mapsto \mu_{n(1,k),1}$ of the sequence $\mu_{n,1}$ that converges to a measure $\mu_{\infty,1}$ supported by $K_1$; next, we can extract from the sequence $\mu_{n(1,k),2}$ a subsequence $\mu_{n(2,k),2}$ that converges to a measure $\mu_{\infty,2}$ supported by $K_2$ and, continuing in this way, we can, for any positive integer $i$, find a subsequence $k \mapsto \mu_{n(i,k),i}$ converging to $\mu_{\infty,i}$, a measure supported by $K_i$.

It is easy to see, if we consider the "diagonal" sequence $(n(p, p))$ associated to the array $n(p, k)$, that the sequence of measures $p \mapsto \mu_{n(p,p),i}$ converges to $\mu_{\infty,i}$, for any $i$. By setting $\mu_\infty(B) := \lim_i \mu_{\infty,i}(B)$, for any Borel set $B$, we define a measure. The additivity of this set function is clear. The $\sigma$-additivity follows from the fact that the sequences $\mu_{\infty,i}(B)$ are increasing in $i$ combined with Beppo Levi's Theorem:
$\lim_i \mu_{\infty,i}(\bigcup_j B_j) = \lim_i \sum_j \mu_{\infty,i}(B_j) = \sum_j \mu_\infty(B_j)$, where $B = \bigcup_j B_j$.
Clearly the inequality $\mu_\infty(E - K_i) \le 1/i$ continues to hold. Any bounded continuous function $f$ on $E$ whose uniform norm is bounded by $b$ satisfies: $|\mu_\infty(f) - \mu_{\infty,i}(f)|$