Fundamentals of Measurable Dynamics Ergodic Theory on Lebesgue Spaces
DANIEL J. RUDOLPH Department of Mathematics, University of Maryland
CLARENDON PRESS, OXFORD 1990
Oxford University Press, Walton Street, Oxford OX2 6DP
Oxford New York Toronto Delhi Bombay Calcutta Madras Karachi Petaling Jaya Singapore Hong Kong Tokyo Nairobi Dar es Salaam Cape Town Melbourne Auckland and associated companies in Berlin Ibadan
Oxford is a trade mark of Oxford University Press
Published in the United States by Oxford University Press, New York
© Daniel J. Rudolph, 1990
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of Oxford University Press
British Library Cataloguing in Publication Data: Rudolph, Daniel J. Fundamentals of measurable dynamics. 1. Ergodic theory I. Title 515.42 ISBN 0-19-853572-4
Library of Congress Cataloging in Publication Data: Rudolph, Daniel J. Fundamentals of measurable dynamics: ergodic theory on Lebesgue spaces / Daniel J. Rudolph. Includes bibliographical references (p. ). Includes index. 1. Ergodic theory. 2. Measure-preserving transformations. I. Title. QA614.R83 1990 515'.43-dc20 90-7486 ISBN 0-19-853572-4
Set by Asco Trade Typesetting Ltd, Hong Kong. Printed and bound in Great Britain by Biddles Ltd, Guildford and King's Lynn
Preface
Our intention here is to give an elementary technical treatment of the fundamental concepts of the measure-preserving dynamics of a Lebesgue probability space. This text has grown out of a course given at the University of Maryland beginning in the Spring of 1984. The last twenty-five years have seen an enormous growth in the theory of dynamical systems in general, and in particular, the probabilistic side of this field, classically known as ergodic theory. This development in recent years, most especially through the work of D. S. Ornstein and his school, has changed the perspective on much of the classical work in the field to a more set-coding combinatorial point of view as opposed to a functional analytic point of view. This perspective is of course much older, easily visible in the work of Kakutani, Chacon, and many others. We choose to attach Ornstein's name to it as it is in his work, and the work of those around him, that this point of view has reached its current power. Good expository treatments of ergodic theory already exist and certainly proofs of many of our main results can easily be found in standard texts in the field. What we are attempting to do through the methodology and order of proof we have chosen is to present the reader with this body of material from what has been to date a most fruitful point of view with the fabric of its development, at least at an elementary level, intact. We assume the reader has a thorough working knowledge of the topology of the real line, and Lebesgue measure theory on the real line. Royden (1968) is a good source for this material. The one deep result we will use without proof is the Riesz representation theorem for the dual of the continuous functions but only on such simple topological spaces as Cantor sets or the unit circle. The text is not intended to be encyclopaedic, but rather to present detailed arguments and chains of arguments showing technically how the fundamentals of dynamics on Lebesgue spaces are developed. The intent is to show those who want to prove theorems in ergodic theory what some of the more fruitful threads of argument have been. For this reason, we gladly present some very technical material, and at several points give multiple proofs. Although in places the technical detail may seem formidable, we will in fact often make simplifying assumptions. The one most obvious is that only the ergodic theory of single transformations is considered. To extend the theory to actions of ℤⁿ or more general discrete abelian groups is not too difficult. For non-abelian and continuous groups, even as basic as ℝ, extension requires new ideas,
fundamentally the existence of measurable sections. We include a bibliography where such extensions can be found. From the basis given here, this literature should be readily accessible. Chapter 1 presents the fundamental concepts of measure-preserving dynamics and introduces a number of examples. Chapter 2 is a basic and technical treatment of the structure of Lebesgue probability spaces. As a preparation for later work, we prove an L¹-martingale theorem via the Vitali covering lemma. This argument is a warm-up for our proof of both the Birkhoff ergodic theorem and the Shannon-McMillan-Breiman theorem, as both will be proven by a 'Vitali' type argument. Chapter 3 presents the ergodic theorems, and ergodic decomposition. We present the now classical von Neumann L²-ergodic theorem and the Garsia-Halmos proof of the Birkhoff ergodic theorem, to juxtapose them with the 'backward Vitali lemma' proof we then present. The intention is to give the reader as much technical insight into these theorems as is reasonable. Our last task is to show that any measure-preserving transformation decomposes as an integral of ergodic transformations. Hopefully the presentation given here makes this very technical argument approachable. Proofs are difficult to find in the literature. Chapter 4 covers the hierarchy of mixing properties, presenting the circle of definitions of weakly mixing and ending with the definition of a Kolmogorov automorphism. Included is a short development of the spectral theory of transformations. This leads to the theory of entropy in Chapter 5, which we develop from a name-counting point of view, again using the backward Vitali lemma to prove the Shannon-McMillan-Breiman theorem. In Chapter 6 we introduce the concept of a joining and disjointness and we use these to again characterize ergodic, weakly mixing, and K-mixing transformations. We also show that Chacon's map has minimal self-joinings and use this to construct some counter-examples. In Chapter 7 we present the Burton-Rothstein proofs of Krieger's generator theorem and Ornstein's isomorphism theorem from the viewpoint of joinings. Many exercises are presented throughout the course of the text. They are intended to help the reader develop technical facility with the methods developed, and to explicate areas not fully developed in the text. Chapters 1 through 5 form the core of an introductory graduate course in ergodic theory. As the point of view here is technical, we have been most successful using this material in conjunction with a more broadly oriented text such as Walters (1982), Friedman (1970) or Cornfeld, Fomin and Sinai (1982). Chapters 6 and 7 can be used either as the core of a more advanced seminar or reading course, expanded perhaps with appropriate research literature of the field, allowing the instructor to orient the course either toward deeper abstract study, or application of the material. The book ends with a bibliography.
I would like to thank Charles Toll, Ken Berg, Mike Boyle, Aimee Johnson and Janet Kammeyer for collecting and refining the original notes, and the text during its years of development. I also must thank Virginia Vargas for shepherding along the manuscript through its many revisions.
Maryland 1989
D.J.R.
Contents
1. Measurable dynamics
   1.1 Examples
   1.2 Exercises
2. Lebesgue probability spaces
   2.1 Countable algebras and trees of partitions
   2.2 Generating trees and additive set functions
   2.3 Lebesgue spaces
   2.4 A martingale theorem and conditional expectation
   2.5 More about generating trees and dynamical systems
3. Ergodic theorems and ergodic decomposition
   3.1 Von Neumann's L²-ergodic theorem
   3.2 Two proofs of Birkhoff's ergodic theorem
   3.3 Proof of the backward Vitali lemma
   3.4 Consequences of the Birkhoff theorem
   3.5 Disintegrating a measure space over a factor algebra
4. Mixing properties
   4.1 Poincaré recurrence
   4.2 Ergodicity as a mixing property
   4.3 Weakly mixing
   4.4 A little spectral theory
   4.5 Weakly mixing and eigenfunctions
   4.6 Mixing
   4.7 The Kolmogorov property
5. Entropy
   5.1 Counting names
   5.2 The Shannon-McMillan-Breiman theorem
   5.3 Entropy zero and past algebras
   5.4 More about the K-property
   5.5 The entropy of an ergodic transformation
   5.6 Examples of entropy computations
   5.7 Entropy and information from the entropy formula
   5.8 More about zero entropy and tail fields
   5.9 Even more about the K-property
   5.10 Entropy for non-ergodic maps
6. Joinings and disjointness
   6.1 Joinings
   6.2 The relatively independent joining
   6.3 Disjointness
   6.4 Minimal self-joinings
   6.5 Chacon's map once more
   6.6 Constructions
7. The Krieger and Ornstein theorems
   7.1 Symbolic spaces and processes
   7.2 Painting names on towers and generic names
   7.3 The d̄-metric and entropy
   7.4 Pure columns and Ornstein's fundamental lemma
   7.5 Krieger's finite generator theorem
   7.6 Ornstein's isomorphism theorem
   7.7 Weakly Bernoulli processes
Bibliography
Index
1 Measurable dynamics
Dynamics, as a mathematical discipline, is the study of those properties of some collection of self-maps of a space which become apparent asymptotically through many iterations of the maps. The collection is almost always a semigroup, usually a group, and often, as a group, simply 7L or IR. The origins of the field lie in the study of movement through time of some physical system. The space is the space of possible states of the system and the self-maps indicate how the state changes as time progresses. The field of dynamics breaks into various disciplines according to the category of the space and self-maps considered. If the space is a smooth manifold and the self-maps are differentiable then the discipline is smooth dynamics. If the space is just a topological space and the self-maps are continuous, the field is topological dynamics. As a very significant subfield of this, if the space is a closed shift-invariant subset of all infinite sequences of symbols from some finite set, and the self-maps are the shifts, then the field is symbolic dynamics. Lastly, and of interest to us here, if the space is a measure space and the self-maps are measure-preserving then the field is measurable dynamics, or more classically, ergodic theory. Each of these disciplines has its own special flavour and language, but the overlaps among them are enormous and the parallels often subtle and insightful.
1.1 Examples
Here is a collection of examples of dynamical systems from various mathematical disciplines, each of which has one or more natural invariant measures.
Example 1 Rotations of the circle. The space S¹ will be the circle of circumference 1 and the self-maps, R_α, rotations by an angle α. In this case, the space and the collection of maps can be identified as the same object, the group of rotations of the circle. Lebesgue measure m on the circle is invariant under rotations. This is a particular case of a compact group acting on itself by left multiplication and its unique invariant Haar probability measure. We will return to this idea in our last example.
Example 2 Hyperbolic toral automorphisms. Here the space T² will be the two-dimensional torus. We think of this as ℝ²/ℤ². This is again a compact abelian group but we will not consider the action of the group on itself. Let
M ∈ SL(2, ℤ) be a 2 × 2 integer matrix of determinant 1 (for example a matrix with rows (2, 1) and (1, 1)). As M(ℤ²) = ℤ², M projects to an automorphism of T² preserving Lebesgue measure. Suppose the eigenvalues of M are λ, λ⁻¹ and do not lie on the unit circle. They both must be real. Let unit eigenvectors be v₁ and v₂. Supposing |λ| < 1, for any point x ∈ T², the line of points x(s) = x + sv₁ contracts exponentially fast as M is iteratively applied, i.e.,

‖Mⁿ(x(s)) − Mⁿ(x)‖ = |λ|ⁿ|s|.
Similarly, the line x(u) = x + uv₂ expands exponentially fast. The existence of such expanding and contracting foliations (families of lines or curves) is a common, and much studied, phenomenon. It has powerful implications for the dynamics of the system, as we see in Exercise 2.
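As a concrete illustration of Example 2 (not part of the original text), the following sketch iterates the automorphism induced by one admissible choice of matrix and watches a small displacement along the contracting eigenvector shrink; the matrix, starting point, and step count are arbitrary choices of ours.

```python
import numpy as np

# A hyperbolic automorphism of the 2-torus: x -> Mx (mod 1).
M = np.array([[2, 1], [1, 1]])                 # det = 1, eigenvalues (3 +/- sqrt(5))/2

def T(x):
    return (M @ x) % 1.0

# Eigenvectors: one direction contracts (|lambda| < 1), the other expands.
eigvals, eigvecs = np.linalg.eig(M.astype(float))
contract = eigvecs[:, np.argmin(np.abs(eigvals))]

x = np.array([0.3, 0.7])
s = 1e-6
y = (x + s * contract) % 1.0                   # point displaced along the contracting line
for n in range(1, 6):
    x, y = T(x), T(y)
    d = np.linalg.norm((y - x + 0.5) % 1.0 - 0.5)   # distance measured on the torus
    print(n, d)                                # shrinks roughly like |lambda|^n * s
```

Replacing the contracting eigenvector by the expanding one shows the displacement growing like |λ|⁻ⁿ instead, until it wraps around the torus.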
Example 3 Mixing Markov chains. Let M be an n × n matrix whose entries m_{i,j} are either 0 or 1. Assume that for some k all entries of Mᵏ are greater than 0. The space Σ_M will consist of all sequences x = {x_j}, j ∈ ℤ, x_j ∈ {1, ..., n}, where m_{x_j, x_{j+1}} = 1. We view the movement from x_j to x_{j+1} as a transition from one state to another and the 1's in M indicate the allowed transitions. Mᵏ > 0 says that all transitions across k time units are allowed. The map T: Σ_M → Σ_M is the left shift, i.e., T(x) = y where y_i = x_{i+1}. The space Σ_M is a closed shift-invariant subset of {1, ..., n}^ℤ, and is called a topological Markov shift, or chain. In this situation there are many invariant measures. Here is one sort. Let P = [p_{i,j}] be a matrix with all p_{i,j} ≥ 0, p_{i,j} > 0 iff m_{i,j} = 1 and

∑_{j=1}^{n} p_{i,j} = 1,

i.e., a Markov matrix compatible with M. As Pᵏ has all positive entries, P has a unique left eigenvector (p₁, ..., p_n) with eigenvalue 1, with all p_i > 0 and ∑_{i=1}^{n} p_i = 1. This allows us to define a T-invariant finitely additive measure on open sets of Σ_M by defining it on a cylinder set c = {x : x_j = i(j), a ≤ j ≤ b} to be

μ_p(c) = p_{i(a)} ∏_{k=a}^{b−1} p_{i(k), i(k+1)}.

That μ_p extends to a T-invariant Borel measure on Σ_M follows from the Kolmogorov extension theorem (see Chapter 2).
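To make the construction of μ_p concrete, here is a small numerical sketch (our illustration, not from the text): the 0-1 matrix M, the compatible Markov matrix P, and the cylinder chosen are arbitrary, and the stationary vector is found with a standard eigenvector routine.

```python
import numpy as np

# A 0-1 transition matrix M; some power of it has all entries positive.
M = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1]])

# A Markov matrix P compatible with M: P[i, j] > 0 exactly where M[i, j] = 1,
# and each row sums to 1.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.3, 0.7],
              [0.6, 0.0, 0.4]])

# The unique left eigenvector p with pP = p, all p_i > 0, sum(p) = 1.
w, V = np.linalg.eig(P.T)
p = np.real(V[:, np.argmin(np.abs(w - 1))])
p = p / p.sum()

def cylinder_measure(states):
    """mu_p of the cylinder {x : x_a = states[0], ..., x_b = states[-1]}."""
    mu = p[states[0]]
    for i, j in zip(states, states[1:]):
        mu *= P[i, j]
    return mu

print(p)                                 # stationary probability vector
print(cylinder_measure([0, 1, 2, 2]))    # measure of one allowed cylinder set
```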
Example 4 Isometries of compact metric spaces. Here the space X will be a compact metric space with metric d( , ), and the self-map f: X → X an isometry: d(f(x), f(y)) = d(x, y).
We will assume f has a dense orbit, i.e., for some x₀,

the orbit ∪_{n=−∞}^{∞} fⁿ(x₀) is dense in X.   (1.1)
If f has such a dense orbit, then all positive orbits are dense, i.e., ∪_{n=1}^{∞} fⁿ(x) is dense in X for all x ∈ X. We want to see two things about this situation; first that X can be made a compact abelian group with the action of f given as multiplication by f(x₀) and second that X has a unique f-invariant Borel probability measure. If X is a finite set, then f must be a cyclic permutation and the results are easy to see. Thus we may assume that all the points fⁿ(x₀) are distinct. We first define a product rule on this dense subset of X × X by fⁿ(x₀) × fᵐ(x₀) = f^{n+m}(x₀). If f^{n_i}(x₀) and f^{m_i}(x₀) are both Cauchy sequences, then so is f^{n_i − m_i}(x₀). Thus for a, b ∈ X, select f^{n_i}(x₀) → a, f^{m_i}(x₀) → b and define a × b⁻¹ = lim f^{n_i − m_i}(x₀). This element of X is independent of the Cauchy sequences chosen, and makes X an abelian topological group. The group relations follow from those on fⁿ(x₀). Notice that x₀ is the identity element, f(x₀) × a = f(a) and multiplying by any a ∈ X is an isometry of X. Remarks 1. A compact group with an element g with {gⁿ : n ≥ 0} dense is called monothetic.
2. This argument can be generalized to much weaker situations; for example X need not be metric, only compact T₂ with {fⁿ} equicontinuous. The index n could be replaced by t ∈ ℝ or more generally by g ∈ G, a locally compact abelian group of self-maps of X. We could also have started with X merely precompact and extended f to its compactification. We will apply this last idea in our discussion of the weakly mixing property in Chapter 4.
3. If we do not assume f has a dense orbit, then X always decomposes into an indexed family X(α) of disjoint compact f-invariant sets on each of which f has a dense orbit. This is a simple case of the decomposition of a space into 'ergodic components.'
We now want to see that f has a unique invariant Borel probability measure. We will do this by showing that a compact abelian metrizable monothetic group has a unique invariant Haar measure. For any continuous function h, set

A_n(h, x) = (1/n) ∑_{i=0}^{n−1} h(gⁱ × x),   (1.2)

where g has a dense orbit.
Using the density of the orbit of g one shows that

lim |A_n(h, x) − A_n(h, y)| = 0   (1.3)

uniformly in X × X, and hence, by (1.4), we can conclude that A_n(h, x) converges uniformly to a constant L(h) for x ∈ X. As L(h) is a bounded linear functional on C(X), by the Riesz representation theorem

L(h) = ∫ h dμ   (1.5)

for some Borel measure μ. That μ is g-invariant follows easily from L(h) = L(g(h)). If ν were some other g-invariant measure, then

∫ A_n(h, x) ν(dx) = ∫ h ν(dx).
But as A_n(h, x) converges uniformly to L(h), ν = μ, and μ is unique. This is only one of myriad constructions of Haar measure for such groups.
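The averaging argument above is easy to watch numerically. The sketch below (ours, not from the text) computes A_n(h, x) for an irrational rotation of the circle, an isometry with dense orbits; the particular h, rotation angle, and n are arbitrary choices, and the averages come out nearly independent of x, approximating L(h) = ∫ h dm.

```python
import numpy as np

alpha = np.sqrt(2) - 1                       # rotation: g(x) = x + alpha (mod 1)
h = lambda x: np.cos(2 * np.pi * x) + 0.5 * np.sin(6 * np.pi * x)

def A(n, x):
    """A_n(h, x) = (1/n) * sum_{i < n} h(g^i x)."""
    i = np.arange(n)
    return np.mean(h((x + i * alpha) % 1.0))

for x in [0.0, 0.37, 0.9]:
    print(x, A(20000, x))                    # all close to L(h) = integral of h dm = 0
```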
Example 5 Rank-1 cutting and stacking constructions. We will describe a general method for constructing a certain class of maps. Each will act on some interval [0, a) ⊂ ℝ⁺. The method is inductive. At each stage of the induction we will have constructed a partial map f_n, defined on the interval [0, a_n), where a_n ≤ a. The map f_n will have the following form. The interval [0, a_n) will be cut into disjoint subintervals I(1, n), I(2, n), ..., I(N(n), n), left-closed, right-open, all of the same length a_n/N(n). We will have assigned some permutation π_n of {1, ..., N(n)} to order the intervals, and f_n maps I(π_n(i), n) to I(π_n(i + 1), n) linearly. Thus f_n is defined everywhere except I(π_n(N(n)), n), and f_n⁻¹ is defined everywhere except I(π_n(1), n). We view this situation as a stack, tower or block of intervals (all three words are often used) and f_n as movement vertically through the stack (Fig. 1.1). To get started

N(0) = 1, I(1, 0) = [0, 1), and π₀ = id.   (1.6)
Fig. 1.1 A stack of intervals.
Fig. 1.2 Cutting a stack.
To construct f_{n+1}, we choose a parameter k(n+1) > 1 and cut each I(j, n) into k(n+1) subintervals of the same width. We can view this as slicing through the stack vertically (Fig. 1.2). This gives us the first part of our list of intervals I(j, n+1). We also select parameters

S(1, n+1), S(2, n+1), ..., S(k(n+1), n+1) ∈ ℤ⁺ ∪ {0}

and cut off

∑_{j=1}^{k(n+1)} S(j, n+1)

intervals, of the same length a_n/(N(n)k(n+1)),   (1.7)

as those already cut, but from an interval [a_n, a_{n+1}). We can define π_{n+1} by describing how to stack these intervals. We work from left to right through the sliced-off subcolumns of the previous stack, placing one above the other, putting S(j, n+1) of our new intervals atop the jth slice before adding the next (Fig. 1.3). π_{n+1}(i) is the index of the interval i steps up the stack of intervals. It is easy to see that f_{n+1} = f_n where both are defined, and that all f_n preserve Lebesgue measure ℓ. We assume

∑_{n=1}^{∞} ∑_{j=1}^{k(n)} S(j, n) a_{n−1}/(N(n−1)k(n)) ≤ a − 1
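The induction step is easy to simulate. The sketch below (ours, not from the text) represents a stage-n stack as a list of equal-width intervals, bottom to top, so that f_n moves each interval to the one above it by translation; the parameters k = 3 and spacer counts (0, 1, 0) are an arbitrary illustrative choice (they happen to give a Chacon-type stacking).

```python
def refine(stack, k, spacers, next_free):
    """One cutting-and-stacking step.

    stack: list of (left, width) pairs, bottom to top, all of equal width.
    k: number of vertical slices, k(n+1).
    spacers: spacers[j] = S(j+1, n+1), the number of intervals cut from fresh
             material [a_n, a_{n+1}) and placed on top of the j-th slice.
    Returns the new stack and the new right endpoint a_{n+1}.
    """
    width = stack[0][1] / k
    new_stack = []
    for j in range(k):                       # work left to right through the slices
        for left, _ in stack:                # the j-th slice of the old stack
            new_stack.append((left + j * width, width))
        for _ in range(spacers[j]):          # spacer intervals atop slice j
            new_stack.append((next_free, width))
            next_free += width
    return new_stack, next_free

stack, a = [(0.0, 1.0)], 1.0                 # stage 0: N(0) = 1, I(1, 0) = [0, 1)
for _ in range(3):
    stack, a = refine(stack, 3, (0, 1, 0), a)
print(len(stack), a)                         # number of levels and total length a_3
# f_3 maps the i-th interval of `stack` onto the (i+1)-st by translation.
```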
Chains 𝒞 = {c_i} with lim_i μ₀(c_i) > 0 are called atoms of μ₀. We say μ₀ is non-atomic if no such atoms exist, i.e., for all chains 𝒞, μ(𝒞) = 0.
Exercise 2.2 If μ₀ is non-atomic then lim_{i→∞} μ₀(c_i) = 0, uniformly on chains 𝒞.
Let E be the union of all sets A in any P_i with μ₀(A) = 0. Let X₀ = X∖E.
Theorem 2.1 If {P_i} is a generating tree of partitions of X, and μ₀ is a non-atomic additive set function on it, then there is a 1-1 map φ: X₀ → Z ⊂ ℝ. Z is a compact totally disconnected subset of ℝ and for any set A ∈ P_i, φ(A) is open and closed in the relative topology on Z and μ₀(A) = ℓ(φ(A)) (remember ℓ is Lebesgue measure).
Proof To each A ∈ P_i, μ₀(A) > 0, we will assign a closed interval I(A) ⊂ ℝ⁺ as follows. For P₀ = {X}, assign an interval I(X), with ℓ(I(X)) = 5/4. Assume we have made assignments through P_k and
(1) if A, A' are disjoint in P_k so are I(A), I(A');
(2) if A ⊂ A' are in different P_j's, then I(A) ⊂ I(A'); and
(3) if A ∈ P_k then μ₀(A) < ℓ(I(A)) < μ₀(A)(1 + 2⁻ᵏ).
To construct the intervals for P_{k+1}, simply select inside each I(A), A ∈ P_k, disjoint closed intervals, one for each A' ⊂ A, A' ∈ P_{k+1}, each some small fraction ε_{k+1} ≤ 2⁻⁽ᵏ⁺¹⁾ larger than μ₀(A') > 0. For any chain 𝒞 = {c_i} there corresponds a constructed chain of closed intervals I(c_i) and as

ℓ(I(c_i)) ≤ μ₀(c_i)(1 + 2⁻ⁱ) → 0,   (2.4)

∩_k I(c_k) is a single point.
For any x ∈ X₀, let c_k(x) ∈ P_k be that set containing x and φ(x) be the point in ℝ⁺ corresponding to this chain. This defines φ. As μ₀(A) = 0 implies φ(A ∩ X₀) = ∅, φ(A) ∩ I(A) ≠ ∅ for any set A. It follows that if A ∈ P_{j₀}, (2.5)
(as μ₀ is non-atomic, for A_i ∈ P_j, ℓ(I(A_i)) goes to 0 uniformly as j → ∞, and φ(A) ∩ I(A_i) ≠ ∅ if A_i ⊂ A). As

μ₀(A)(1 + ε_j) < ℓ( ∪_{A_i ∈ P_j, A_i ⊂ A} I(A_i) ) < μ₀(A)(1 + 2⁻ʲ),   (2.6)

it follows that ℓ(φ(A)) = μ₀(A).
Since φ(A) and φ(Aᶜ) are disjoint for A ∈ P_i, these sets are both open and closed in the relative topology on Z = φ(X). The construction of the map φ is simply modelling μ₀ and X by a Cantor-like set in ℝ of Lebesgue measure 1.
Exercise 2.3 State and prove a modified version of this theorem that includes the possibility of atoms.
2.3 Lebesgue spaces
We call X, with generating tree of partitions {P_i} and non-atomic set function μ₀, Lebesgue if

ℓ( φ(X₀)‾ ∖ φ(X₀) ) = 0,   (2.7)

where φ(X₀)‾ is the closure of φ(X₀). In more technical terms, Lebesgue means that
(1) for any ε > 0 there is a set E(ε) = ∪_{i≥1} A_i, where each A_i ∈ P_j for some j and the A_i are pairwise disjoint with ∑_i μ₀(A_i) < ε; and
(2) for any chain 𝒞 = {c_i} with ∩ c_i = ∅, once i is sufficiently large, c_i ⊂ E(ε);
in other words, the empty chains have measure 0.   (2.8)
Exercise 2.4 The definition above of a non-atomic Lebesgue space is given in two forms (2.7) and (2.8). Show that they are equivalent. Hint: first show that any union of sets in a tree can be written as a disjoint union.
Both Examples (I) and (2) are Lebesgue.
Lebesgue probability spaces I 1;:S
At this point being Lebesgue seems to depend critically on the choice of {Pi}' We proceed to eliminate this artifact. From now on we assume X, {Pi}' fJ.o is Lebesgue, and non-atomic unless otherwise stated. In Z = (b(X) we have the u-algebra of Lebesgue measurable sets :F. Let d be the inverse images in X of such sets, a u-algebra in X containing all the Pi' For any A E Pi' (b(A) is Lebesgue measurable as :F is complete. Remember (Royden 1968) :F consists of those sets whose outer and inner measures are equal. For A E Pi> (b(A) is within measure 0 of (b(A). These closed and open sets generate the topology on Z. If we define ~(S) to be the set of all coverings of S by disjoint unions of sets A, each an element of some Pj, and then define fJ.*(S)
I
= inf
fJ.o(A)
= t*({b(S)),
(2.9)
CE'6(S) AE'6
then for SEd, fJ.*(S) = t({b(S)) =
1 - t({b(SC»
=
1 - fJ.*(SC),
and if fJ.*(S) = 1 - fJ.*(SC)
then t*({b(S»
=
1 - t*({b(SC»
and SEd. Thus d
= {S: fJ.*(S) = 1 - fJ.*(SC)}
(2.10)
and we write fJ.(S) for t({b(S» = fJ.*(S). The extension of fJ.o to all of d under the Lebesgue hypothesis is a version of the Kolmogorov extension theorem (Chung 1968). It can, of course, be done directly in terms of outer and inner measure. The value of our approach via the injection {b to IR is that we can now use the very tight connection in IR between geometry and measure. We want to see that if one generating tree of partitions and additive set function fJ.o yields a Lebesgue space, then any other choice for a tree from d is also Lebesgue. If (X, d, fJ.) has a choice for X, {PJ, fJ.o making it Lebesgue, we call (X,d,fJ.) a Lebesgue space. Let {QJ be a tree of finite partitions, not necessarily generating, made of sets in d. We first create a new space X on which it does generate. We say Xl '" X 2 if for any i and S E Qi' if Xl E S then X 2 E S. This is an equivalence relation. Let it be the space of equivalence classes of ~. The QI can be thought of as partitions of X and on this space, {Qi} generates.
Theorem 2.2 If (X, d, fJ.) is Lebesgue and non-atomic and {Qj} is any tree of partitions from .s# with fJ.(S) > 0 for all S E Qj, then (X, {Q;}, fJ.) is Lebesgue. Proof What we must show is (2.8) that given any 8 > 0, we can find a countable disjoint collection of sets B;, each an element of some Qi' so that LfJ.(BJ < 10 i
n
U
and for any chain If,} = {c i } from the tree {Qi} with C j = 0, Ck C Bj , for some k. We fix e. We will work in Z, the image space constructed in Theorem 2.1 using the Lebesgue space (X, {Pj }, fJ.o) we know exists. For each set S E Qj we construct a closed subset D(S) c 0,
(2.18)
(the map ,p will, in fact, be 1-1 and measure-preserving on [0, 1]). This example is the general form of a separable probability space. A probability space could also fail to be Lebesgue if no tree of partitions could generate, i.e., if the cardinality of X were larger than C.
10 I r-unaamentalS OT meaSuraOle aynamlcs
We also have seen in Theorem 2.3 that in a Lebesgue space for a tree of partitions to separate points is equivalent to generating the measure algebra. The following example shows that this also can fail in the non-Lebesgue case. Exercise 2.tt LetA ~ [O,t) beasubset witht*(A) = tand t*(AC n [O,!» = t. Let X = Au (Ae + t) ~ [0,1]. Define .F = {X n E : E is Lebesgue measurable} and J.I.(X n E) = t(E) as in the above example (1) show J.I. is well defined;
(2) exhibit a tree of partitions {PJ ~ .F that separates points of X but does not generate the full measure algebra .F. At the other extreme from these examples is a case which cannot fail to be Lebesgue. This is the case when our generating tree of partitions has no empty chains. For example the Cantor-like set Z constructed in Theorem 2.1 has this property as do mixing Markov chains, for the given tree of partitions. In such cases all additives set functions on the tree extend to measures. One can topologize such a space, taking as a base for the open sets the sets in the tree. The topology is totally disconnected, metrizable and, as there are no empty chains, compact. The Borel measures on this space are, by the Riesz representation theorem, the dual of the continuous functions. In this particular case the Riesz theorem is easily proven, as the characteristic functions of sets in the tree are continuous. Borel probability measures on this space are exactly the ones which come from additive set functions on the tree and are the same as positive nonned linear functionals on the continuous functions. This space of Borel probability measures is itself a compact metric space, as the dual of the continuous functions. We will see much more of this in Chapter 6.
2.4
A martingale theorem and conditional expectation
As an application of our Lebesgue space development we will prove a martingale theorem. The result is quite standard and our proof is not particularly special. It is though, via a covering argument and the Vitali lemma. Such covering arguments are the core of our approach to many later results and we wish to stress the analogy and so include the complete argument. Lemma 2.5 (Vitali covering lemma) Let S c [0,1] be Lebesgue measurable and suppose that for each XES there is given a nested sequence of intervals Ii(x) which intersect to x. Then there is a countable disjoint collection 11 , 12 , ... where Ik = Ii(k)(x(k» for some i(k), x(k) E S which covers almost all of S, i.e., (2.19)
Lebesgue probability spaces
I 19
Proof Select the 1,. inductively with length 1,.+1 > 1sup{length Iix) IlAx) disjointfrom 11 ,12' ... ,1d, with Ik +l disjoint from 11" .. , l k. Either this process terminates after finitely many steps, in which case we are finished or we get a sequence of disjoint intervals II' I 2 , ..•• The length of 1; -+ 0 as i -+ 00. In fact, the sum of the lengths of all the 1; which are selected is bounded by 1. Let Ii be 1; expanded symmetrically by a fraction Il of its length. So the length of Ii = (1 + Il) X length Let 1; be 1; expanded (symmetrically) 5 times its length. Claim. For all t ~ 2, allll > 0
1;.
Sc
1
ex>
i =1
i=l+l
U I;u U
(2.20)
Ii'
Otherwise there is some XES not in the right hand side. So x ¢ Ul=1 I;, and there is some lAx) disjoint from II' ... , [,. If lAx) is also disjoint from ['+1' [,+2, '" then we get a contradiction to the selection of the 1; (length Ij(x) > twice the length [, for large enough t). If 1. is the first of ['+1, [,+2' ... which intersects Ij(x), then x fL and so length (fix» > 2 length (1.) and this is another contradiction to the selection of 1.. Therefore (2.20) is true so
t(s\.V1;) ~ t(.V [/;\1;]) + t(lP;) < .-1
.-1
1+1
Il
+ II
•
for large t.
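The selection rule in this proof is essentially greedy. As an illustration only (ours, not from the text), the sketch below runs the greedy selection on a finite family of intervals, always taking a longest interval disjoint from those already chosen; any choice longer than half the supremum, as in the proof, would serve as well.

```python
def vitali_select(intervals):
    """Greedy Vitali-type selection from a finite list of (left, right) intervals."""
    remaining = sorted(intervals, key=lambda I: I[1] - I[0], reverse=True)
    chosen = []
    for a, b in remaining:
        if all(b <= c or d <= a for c, d in chosen):   # disjoint from all chosen
            chosen.append((a, b))
    return chosen

family = [(0.0, 0.4), (0.3, 0.5), (0.45, 0.9), (0.1, 0.2), (0.85, 1.0)]
print(vitali_select(family))
```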
Let {Q I} be a tree of partitions for X, not necessarily generating. Recall we can construct a space X' of equivalence classes on which {Q;} generates and is Lebesgue. The purpose of the martingale theorem we now want to prove is to project L 1 (JL) on X to L 1 (JL') on X'. Such a projection is called a conditonal expectation. Here are some simple examples of such projections. Example 1 Suppose (X, fF, JL) is the unit interval with Lebesgue measure, and Bo = ([O, t), [t, 1), [0, 1), 0}. The factor space X' consists of two points, call them {R,B}. IffE L 1 (JL) we define 2
(pf)(x ' ) =
r
1r
fdJL
ifx'=R
fdJL
ifx'
J[o.!]
2
JI!. I]
Then for any A
E
X' (i.e., A
f
E
A
{R, B}) we have
p(f) dJL'
=
r
Jp-'(A)
f dJL.
= B.
Example 2 Let § ' be 'doubled sets' in [0,1], i.e., if x e B, B e ~I, then x ± ! e B for the appropriate ± sign. The equivalence classes of [0, 1] mod § ' consist of pairs of points {x, y} where Ix - yl = !. If f eL l ([0, 1]) let (pf)( {x, y}) = !(f(x) + f(y» and for A e ff"
f
p(f) dJl'
=
A
So p : L 1 (Jl)
-+
r
f dJl.
jp-'(A)
L 1 (Jl') isometrically.
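Such projections are easy to compute in the simplest finite case. The sketch below (ours, with an arbitrary f and a dyadic partition of [0, 1)) approximates the conditional expectation of f onto a finite partition Q by averaging over each partition element on a grid; the defining property, that the projected function has the same integral as f over every element of Q, can be checked directly.

```python
import numpy as np

# Discretize [0, 1) into a fine grid carrying Lebesgue measure.
N = 1 << 12
x = (np.arange(N) + 0.5) / N
f = x**2 + np.sin(2 * np.pi * x)             # some f in L^1

# Q: the dyadic partition of [0, 1) into 8 intervals.
k = 8
labels = (x * k).astype(int)

# f_Q(x) = (1 / mu(S)) * integral_S f dmu, for the S in Q containing x.
f_Q = np.array([f[labels == j].mean() for j in range(k)])[labels]

# Check: integral_S f_Q = integral_S f for each S in Q.
for j in range(k):
    S = labels == j
    print(j, f[S].mean() / k, f_Q[S].mean() / k)   # the two integrals agree
```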
Let {Q.} be a tree of partitions in a Lebesgue probability space (X, [F,Jl). Let f E L~ (Jl), the positive integrable functions. Consider the space (X', [F', Jl') which {Qi} generates. We want to project f to an f' e L 1 (Jl') so that for any
Se [F', (2.21) We calIf' the conditional expectation off given ff", and will write it E(flff"). For a finite partition Q we can easily define the conditional expectation of f given Q as
fQ(x)
= JltS) Is f dJl,
where xeS e Q,
(2.22)
If {Qi} is a tree of partitions, we see each f Q, is a simple function constant on each S e Qi. Further, for S e Qi and j 2 i
(2.23) In fact, we need not assume f Q , actually arises from some original f e L~; only these last conditions (2.23) need be assumed to get the result we want. We call {Ji}, {QJ a martingale if {Qi} is a tree of partitions, Ji is constant on sets S e Qi and for S e Qi and j 2 i
Is /;
dJl =
Is./j dJl.
(2.24)
This notion of a martingale is much stronger than is standard, and so our martingale convergence theorem is a much weakened version of Doob's martingale theorem (Chung 1968). Theorem 2.6 If {/;}, {Q;} is a martingale, /; 2 0, then /; converges a.e. to a function f eL l (Jl), and for any {Qi} measurable set S
lim
LL Ji 2
f.
(2.25)
21
Notice, as each /; is Qi measurable, we can assume without loss of generality {Qi} generates. Proof First, we show that S = {x: J;(x) -+ oo} has measure O. S is measurable so for any II > 0, we can find a finite disjoint union of sets Sj E Qi for some i with ,u(U1SjAS) < ll. As U1Sj is a finite union, once i is large enough
r
JUi
fid.u S}
is a constant in i. But now
U
and we conclude .u(S n j Sj) = 0 and .u(S) < ll. Thus .u(S) = O. Now to show f, converges pointwise a.e. to a limit It will be convenient to transport our construction to IR, using the {Qi} trees, so that each set ,ps.. S E Qi' is an open compact set, with ,p(S) c I(S) an interval oflength less than .u(S)(l + 2- i ). Since {Qi} may contain atoms the tree ofintervals may have branches which descend to a set of positive measure. Such a set will be a closed interval. Thus ,p will not be defined pointwise on atoms of ,u, but will map the atom to the whole interval as a set map. (If you did not solve Exercise 2.3, take this as a hint and do so now.) We may define!; on ,p(X), by transporting to each I(S) the value of fi on S E Qi' Clearly if /; converges a.e. on t;b(X) then fi does on X. Note: If C1 => C2 => ••• is a branch of {Q;} with limi_oo.u(C1 ) > 0, i.e., an atom, then clearly h(X) converges as i -+ 00 for all x E C i as
1
n
(2.26)
i.e., fi(x) is nearly monotone decreasing. Thus we need only work on the non-atomic part. Pick a > II > and let
°
Sa .• = {z E ,p(X) : !;(z) > a + II infinitely often and !;(z) < a often}.
II
infinitely
hex) converges a.e. iff t(Sa .• ) = 0 for all a, ll. Notice also that if l(z) converges then z rf Sa .• for any a, B. By our note if z E Sa .• then ,p-l(Z) is not in an atom of .u. Thus, for every Sa .• we can select intervals I;(z) from the construction of t;b such that Ii +1 (z) £; Ij(z), t(Ij(z» -+ 0 and for each i, z if j ~ j(i, z) then Ij(z) corresponds -. ;TI some Qj(i,%) so that
Z E
(1) if i is even
Jr
_fj(i.z)dt =
1,(%)ntP(x)
Jf _hdt > (a + ~)t(l;(Z»
(2.27)
(a - ~)t(Ji(Z».
(2.28)
l i (%),-.,4>(x)
and (2) if i is odd then
f
_];(i.%)dt
=
f
_];dt
v > u > lim A.(f, x) :j>, Q}.
(3.3)
If x belongs to no such Eu.v then limn~oo An(f, x) exists (possibly ± 00). We will show J-l(Eu.V> = O. Assume v > 0, otherwise replace f by - f and - u > O. Assume J-l(E u.v) > O. From its definition, if T(x) E E u • v , then x is also. In other words, T- 1 (E u • v ) O.
ni-O
The function f - v ¢ L 1 (J-l) if J-l(X) for each x
= 00,
but choosing a set A c X, J-l(A)
0, J-l(X) < 00. Now, using J-l(X) < 00, u - f ELI (J-l) and for some n, 1 .-1 U - - L f(Ti(x» > O. ni=O
By the maximal lemma
or
Ix f dJ-l
::5;
(3.6)
uJ-l(E..,.,).
-
This implies J-l(E ..,v) = 0 and we are done.
Corollary 3.5 Defining the map L(f) = J,for f E LI(J-l), IIL(f)111 ::5; I fill and so L(f) is a continuous projection from L I (p) onto the subspace of T-invariant U-functions. Proof As II An(g)lll to!,
= I gill for g ~ 0, and since A.(f) converges pointwise
and (3.7)
Notice that L(f)(T(x»
= lim A.U; T(x» =
· 1 1... f f(T'.(x» Ilmn i~l
= lim(n :
1
1 A n + (f,x) _
f~X»)
= L(f)(x). !hus L(f) is T-invariant. As IIL(f - g) II 1 = IIL(f) - L(g)lIl IS a continuous projection onto the T-invariant Ll functions.
(3.8) ::5;
II! - gill' L _
We now show that the only non-trivial convergence in the Birkhofftheorem is in the case of a finite invariant measure. Corollary 3.6 If (X, ff', JJ.) has no T-invariant subsets of finite measure, then LU) == 0 for all f E U(JJ.). Proof If X has no T-invariant sets of finite measure, the only T-invariant V function is identically equal to O. • Notice that X can be broken up into a subset Xro which has no T-invariant subsets of finite measure, and its complement which is at most a countable union of T-invariant sets of finite measure. On x,XJ' Cesaro averages of Ll functions converge to zero. The following corollary says that on a remaining piece the convergence is in L 1. Corollary 3.7 If JJ.(X)
0 there is a set F c X so that F, T(F), ... , T" -1 (F) are disjoint and their union Covers all but e in measure of X.
~oof For all x E X, let ik(x) = O,jk(X) = nk - 1. There is, by the backward VItali lemma, a set A' c X and measurable values i(x), j(x) defined on A', chosen from among the ik(x), A(x) so that all U{t;I(x) Ti(X) = J(x) are disjoint and
________
~I~x
_______________ A'
I
I I I
l(x)
Fig. 3.1 Schematic of the backward Vitali lemma. The intervals represent disjoint measurable subsets, and the orbit sections lex) cut through the diagram vertically.
Now i(x) andj(x) must be of the form i(x) = 0 andj(x) = nk(x) - 1. Set
F = x~L
1(X)-1 (
}}o
)
(3.9)
T"i:(x) .
Letting "-I
J'(x) =
U Ti(X)
for x
E
F,
1=1
then k(x)-I
l(x)
=
U
J'(T"l(X»
k=O
is a disjoint union. Thus for x
E
F, the J'(x) are pairwise disjoint and
J.l( xeF U J'(X») = J.l( xeA' U leX») >
1-
B.
(3.10)
Saying the lex) are pairwise disjoint is equivalent to saying F, T(F), ... , _ T"-I(F) are pairwise disjoint. The Rohlin lemma is in fact much older than the backward Vitali lemma, and this is a very unusual and in fact difficult proof, in that it uses the much deeper backward Vitali lemma. Exercise 3.1 Give a direct proof of the Rohlin lemma, not relying on the backward Vitali lemma.
Standard proofs usually rely heavily on the ordering of 7L, as did our first proof of the Birkhoff theorem, or look surprisingly like the proof of the backward Vitali lemma. In this sense the backward Vitali lemma can be viewed as a generalized Rohlin lemma. In fact for non-periodic actions of general amenable groups the backward Vitali lemma remains true with only slight changes (F0lner sets replace intervals and complete disjointness is not obtained) whereas the Rohlin lemma as stated requires the existence of tiling sets (Ornstein and Weiss 1980). We noW use the backward Vitali lemma to re-prove the Birkhoff theorem. Suppose we have a set F and measurable functions i(x) ~ 0 ~ j(x) so that the sets J(x) = U{~l(x) Ti(X) are disjoint as x varies over F. A basic fact to keep in mind is that for any f ELI (J-l) we get
1
iI
j(x)
UX.F J(x)
fdJ-l
=
f(Ti(x»dJ-l.
(3.11 )
F i =i(x)
Theorem 3.11 Let T be a measure-preserving, invertible transformation on (X,.fF,J-l), a Lebesgue probability space. Let f E e(J-l) and as before set
For almost all x E X, An(J,x) converges. We can, as usual, identify the limit function. Let J
= {A E ff: J-l(ALlT- 1 (A» = O}.
This is a a-algebra, the a-algebra of T-invariant (mod 0) sets. For almost all XEX,
(3.12) Proof We first handle periodic points. Let X = U~o Xn U Xoo where x E Xn iff x is of least period n. All the Xn are in J, and J restricted to Xn separates the n-point orbits. Thus both E(f[J) and limk~oo.h are equal to (lIn) ~}:6 f(Ti(x» on X n. We are left with X oo , on which T is non-periodic. Renormalizing the measure, we have reduced the problem to T a non-periodic transformation of (X'~,J-l).
Define
J(x)
=
lim sup(An(J, x»,
taking values in R U {oo, -oo}. As J(T(x» = J(x), J is J measurable. Let E E J. We wish tO show that JEJdJ-l = JEfdJ-l, and hence J = E(f[J) a.e. As the sets where f > 0 and f ~ 0 are in J we can, withbut loss of generality, assume ? O. A
J
Fundamentals of measurable dynamics
Let Eoo = {x E E :/(x) = OO}, EM Fix e > ik(x)
~
00
°
and M > so that
(1) AM x;)+1(J,x) >
= {x E E:/(x) ~ M}.
(3.13)
°
and we can measurably select functions ik(x) = 0,
~ if x E Eoo; (3.14)
Use the backward Vitali lemma to find a set F £ X and functions i(x) = 0, j(x) = A(x) for some k so that all the J(x) = Uf~l(x) Ti(X) are disjoint and B = UXEFJ(X) has JJ(B) > 1 - B. As E oo , EM E J, if x E Ea, then J(x) £ Ea. Thus for Eoo ,
r
JE.,nB
r ~! r
fdJJ =
(j(x)
+ 1)Aj (x)+1(f,x)dJJ
JE",nF
(j(x)
+ 1)dJJ = JJ(E oo n B).
B JE",nF
B
So
Letting B..... 0, JJ(Eoo) = O. Thus E = limM EM a.s. so 0 ~ I < Now by (3.11)
f
B 2: BJJ(EM n B) 2: B
(j(x)
00
a.e.
+ l)dJJ
EMnF
r ~ Ir ~
I/(x) - Aj(X)+1(f,x)I(j(x)
+ 1)dJJ
JEMnF
JEMnF
and all the sets A', T(A'), .;., T N- 1 (A') are disjoint.
°
Proof We prove the contra positive. Suppose Jl(A) > some 0 < n < N,
°
and for all A'
£;
A, for
T"(A') n A' ¥- 0. As we could always delete sets of measure
Let n(A') be the least n > n(A'). Let
°
°
from A', we must have an n with
Jl(T"(A') n A') > O.
with Jl(T"(A') n A') > 0. If A"
No
=
£;
A' then n(A") ~
max (n(A'» < N. A'S;;A /l(A'»O
°
.There is a subset A' £; A, Jl(A') > and n(A') = No, so A', T(A'), ... , TNo -l(A') are all disjoint. Let A" = A'\ TN°(A'). As A", T(A"), ... , T No- 1 (A"), TN°(A") are all disjoint, n(A") > No so Jl(A") = 0, i.e.,
A' = TN°(A')
a.s.
This must also be true of any subset of A'. Let Pi be a refining tree of partitions of A'. For any subset S E Pi'
TNo(s) = S a.s. Delete from A' a TN°-invariant subset of measure (A"),
°
so that on what remains
TNo(S n A") = S n A" for all S E Pi. For any chain of sets {cd in the tree p;, that descends to a point x E A", it follows that TNo(x) = x. Thus every point of A" is a periodic point of period No < N.
°
-
Corollary 3.13 If T is non-periodic, then for any subset A, Il(A) > and i !S;; 0 !S;;j, there is a subset A's; A, Il(A') > 0 with Ti(A'), T i + 1 (A'), ... , TJ(A') all disjoint. _
Exercise 3.2 Let P" = {x: Tn(x) = x and n is the least such}. If J-L(Pn) > 0 show that Pn = Ui:6 Ai where the Ai are disjoint, T-l(A;) = A(i-l)mOdn' This shows what the periodic part of T looks like. Exercise 3.3. Suppose T is ergodic but Tn is not, for some n > 1. Show that there is a value k dividing n, k =f. 1, and disjoint sets A o, At, ... , A k- l with x = U~:6 Ai a.s., T-l(A i ) = A(i-l)mOdk and Tk acting on Ao is ergodic. We are now ready for our two most technical steps. Notice the parallel between them and the proof of the standard Vitali lemma of Chapter 2. The use of one-dimensional geometry is the same. Because our sequences of intervals nest outward instead of inward, we must use much more explicit constructions than in the classical Vitali lemma. Lemma 3.14 Let (X, ff', J-L) be a Lebesgue probability space, T a measurepreserving invertible non-periodic map of X to itself. On a set A c X, J-L(A) > 0, we are given bounded integer-valued measurable functions i(x) ~ 0 ~ j(x). There is then a measurable subset A' c A, so that for all x E A', the sets l(x) = U{ 0, and Ti(l)(A' ), Ti(I)+l(A' ), ... , Tj(l)(A' ) are disjoint}.
By the previous lemma .911 =f. t/J. If A'l ~ A~ ~ .. , ~ AI. ~ ... are all in .911 then so is Uf=l A;. Thus there must be an a.s.-maximal set under containment A'l E .911 , i.e., if A'l ~ A" E .911 then Il(A") = Il(A'd. (This is not done with the axiom of choice. Rather the sequence is constructed to approach the maximal available measure.) We claim j(l)+n(l)
A i(1),j(l)
~
U
i=i(I)-n(l)
Ti(A'I ) a.s.
(3.17)
39
Hnot, then )11)+n(l)
B
U
= A i(l),)(I)\
i=i(I)-n(l)
Ti(A'I)
°
would be of positive measure and hence contain a subset B', J-L(B') > and Ti(I)(B'), T i (l)+1(B'), ... , Tj~)(B') all disjoint and disjoint from Uf 0. Hence, as of some stage J(x), sets cease being split. As (X(x) is a constant on the equivalence class, all non-zero terms in the vector must be equal. Break X" into T-invariant subsets XI' X 2 , X 3 , ••• , X~ where XEXn if (X(x) = lin (1/00 = O).•We will discuss X oo , as it is the more interesting case. The X n , n < 00 follow a similar line of reasoning leading to the parts of the decomposition of type 1. X is the part of type 2. Assume Il(Xoo ) > 0, and renormalizing Il, we can assume X = Xoo and for all x, (X(x) = 0. We first construct a map cp from a.a. the equivalence classes of points not separated by {Q/} onto a.a. of [0, 1] by assigning to successive levels of the tree left-closed, right-open intervals. We have discussed this construction in detail in Chapter 2. The map cp is defined a.e., maps to a.a. of [0, 1] and is bimeasurable and measure-preserving. We can assume the subset X' where cp is defined is T-invariant. Let Z = cp(X'). We wish to refine cp to a point map from X' to Z x [0,1], still bimeasurable and measure-preserving by successively cutting the fibres fz = z x [0,1] according to the probability vectors D(Pj Id)( iji-l(X». Let PI = {Pl,P2' ... 'Ps}. The functions D(P;ld) are constant on {Qd separated fibres, hence {D(P/ld)(cp-l(Z» H=l is well-defined everywhere on X and is a measurable probability vector-valued function. For a set qj E Ql and Pi E PI' define a map cp(qj f""I Pi) to the measurable 00
subset of cp(qj) x [0,1] between the graphs of
E(iO Pil d
)
0
E(
and
cp-l
k =1
UP;lJII)
0
ip-l,
(3.35)
k =1
closed below, open above. It follows easily that [2 (cp(qj
n Pi)) = /1(qj n p;)
(3.36)
where [2 is two-dimensional Lebesgue measure. Figure 3.2 illustrates this construction. We extend cp to sets of the form qj n Pi'
Pi
qj E Qb
E
Pk
by induction on k. We assume the elements of Pk are ordered so that those whose union is the first element of Pk - 1 come first, next those in the second element, etc. Thus defining cp(qj n Pi) to be the subset of ip(qj) x [0,1] between the graphs of
E(G Pdd)
0
E( UPdd)
and
ip-l
0 ip-I,
k =1
k=1
closed below, open above, we automatically get
- -- - - -- -
r
-
-,--
-
V-
- -- - - -- ---- - r-_ ....- - - - - - - - - -- .... _ f- -
-
- -....
....
-
--
-
....
- -- f- -
-
Fig. 3.2
PI'
---
--.
--- ---....
/
........ .... -........
'"
--....
- r- -
- r- -
....
....
-- --
-
:.---
The dashed lines indicate how P2 refines PI'
.... ....
--
--
V---
- - r-- - I-I- -- - f- - -
-
Disintegration over a factor algebra. The dashed lines indicate how P2 refines
Ergodic theorems and ergodic decomposition
t.
(2 (qJ(qJ () Pi)
=
I 49
/l(qj ("'\ Pi)·
2. {qJ(qj ("'\ Pi)lqj ~PI iSJrom level k of {Qi v P;}} partitions Z x [0,1]. Call this partition Qi v Pi· 3. The partitions {Qi v P;} form a tree of partitions exactly mirroring the intersection properties of {Qi v P;}. The map qJ gives a 1-1 correspondence between the chains of the {Qi v PJ tree and the {Q, v P'} tree of partitions. If two points Zl and Z2 are not separa~d by the {Q, v Pd tree then first, they must lie on the ~ame fibre as the Qi tree separates points of Z. But now, from our constructIOn, the cham of sets c1 ::::> C2 ::::> ••• that contains Zl and Z2 must intersect to an interval, and hence
fz'
Now the Ci = qJ -l(Ci) either intersect to 0 or to a single point x E X. If not 0, then ex = ex(x) > 0 which we know is not true. Hence the collection of all chains in {Qi v Pi} that descend to more than one point correspond to empty chains in {Qi v Pi}. These form a set of chains of measure in X, hence a set of measure 0 in Z x [0,1]. On what remains, {Qi v P'} is a generating tree of partitions. The {QI v P'} chains that descend to 0 form a set of chains of measure O. Delete from X a T-invariant subset of measure containing the intersection points of the chains corresponding to these. On the remaining T-invariant set of full measure qJ now reduces to a point map which is bimeasurable and measure-preserving as the two trees are images one of the other and generate. Let f = qJTqJ-l be a measure-preserving map of qJ(X) c [0,1] x [0,1] to itself. As T(Qd and T-1(Q;) are subsets of Qi+l' Tmaps fibres fz to fibres, but furthermore, as T(Pi ) and T-1(Pi ) are subsets of Pi+l' T maps the intervals on a fibre corresponding to some Pk to a finite union of intervals on the image fibre in a measure-preserving fashion, the {.P;} tree restricted to a fibre generates the Lebesgue algebra ~ on it and so letting t z represent this fibre measure, T is a bimeasurable measure-preserving map from (f(z),~, t z ) to (f(T(z», ~1'(z),t1'(Z»· _ As the Qi consist of vertical sets and are qJ(Qi),.9I = qJ(.9I) is the completion of the algebra of vertical sets. From Fubini's theorem, for any measurable set A c [0,1] x [0,1], for a.e. z, A ("'\ f(z) is measurable
°
°
E(AI.9I)(z)
and t(A)
For X n , n
O. Show Xn = Ui:J Ai' a disjoint union, and T- 1 (Ai) = Ai -l(modnj. Note the similarity of this to Exercise 3.3 except here ergodicity of T is not assumed, just that J c do
Corollary 3.18 If in the above construction .91 = oF = the algebra of Tinvariant (modO) sets, then for a.e. x, T is a bimeasurable measure-preserving ergodic map from (F(z), §"., t z ) to itself The disintegration of (X, JF, Ii) over J is called the ergodic decomposition of the system. Proof Mter deleting a set of measure 0, T(Q;) = Qi identically. Hence, a.s., T(f(z»
= f(z).
All that remains is to verify ergodicity. We use Corollary 3.16. Let f be the characteristic function of some set Pk E Pi, and fn its Cesaro averages. By the Birkhoff theorem, J.. converges a.e. to E(floF). As there are only countably many such f, for a.e. z, for t z a.e. y, f.«z,
y»
--+
E(fIJ)(z),
a constant t z a.e. on f(z). That this holds for finite unions of sets in some Pi follows easily. These are Ll dense as {PJ generates, hence by Corollary 3.16 to the Birkhoff theorem Ton (f(z), ~,tz) is ergodic for a.e. x. • This completes our discussion of disintegration over factor algebras. We
will not argue the essential uniq ueness of the disintegration. This is not terribly difficult to demonstrate.
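A transparent non-ergodic example makes the disintegration visible. In the sketch below (ours, not from the text), T(x, y) = (x + α mod 1, y) on the unit square: the invariant algebra 𝒥 is (mod 0) the algebra of sets depending only on y, the fibres are the horizontal circles, T is an ergodic rotation on each fibre, and Birkhoff averages of f converge on the fibre through y to ∫ f(x, y) dx = E(f | 𝒥). All specific choices (α, f, sample sizes) are ours.

```python
import numpy as np

alpha = np.sqrt(2) - 1
f = lambda x, y: np.cos(2 * np.pi * x) * y + y**2    # some f in L^1 of the square

def time_average(x0, y0, n):
    """Birkhoff average of f along the orbit of T(x, y) = (x + alpha mod 1, y)."""
    xs = (x0 + alpha * np.arange(n)) % 1.0
    return f(xs, y0).mean()

def fiber_average(y0, m=200000):
    """E(f | J)(y0): the integral of f(x, y0) dx over the fibre through y0."""
    xs = (np.arange(m) + 0.5) / m
    return f(xs, y0).mean()

for y0 in [0.2, 0.5, 0.8]:
    print(y0, time_average(0.31, y0, 50000), fiber_average(y0))   # agree fibre-wise
```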
4 Mixing properties
The fundamental problem of ergodic theory is to explore the structure of measure-preserving transformations in search of properties natural to them, which can be easily applied to describe and distinguish them. In this chapter we will discuss a hierarchy of such properties, each successively stronger, called mixing properties. The reason for this name is that they concern the way in which the powers of a transformation T 'mix' one set, A, into another set E, i.e., they concern the sequence of functions (4.1)
4.1 Poincaré recurrence
The simplest of these properties is Poincaré recurrence, which says that for some i > 0, μ(T⁻ⁱ(A) ∩ A) > 0.

Theorem 4.1 If T is a measure-preserving transformation of the probability space (X, ℱ, μ) and μ(A) > 0, then for some 0 < i ≤ [1/μ(A)],

μ(T⁻ⁱ(A) ∩ A) > 0.   (4.2)

Proof If μ(T⁻ⁱ(A) ∩ A) = 0 for all such i, then

μ(X) ≥ ∑_{i=0}^{[1/μ(A)]} μ(T⁻ⁱ(A)) = μ(A)(1 + [1/μ(A)]) > 1,
a conflict.
4.2 Ergodicity as a mixing property
If T is ergodic we know more. The L²-ergodic theorem says that (1/n) ∑_{i=0}^{n−1} χ_{T⁻ⁱ(A)} converges in L²(μ) to the constant μ(A), which we could write as

(1/n) ∑_{i=0}^{n−1} ∫ (χ_{T⁻ⁱ(A)} · χ_B − μ(A)μ(B)) dμ → 0;

this is half of the following theorem. Note: from here on 'transformation' will always mean an invertible, measure-preserving map from a Lebesgue probability space to itself.

Theorem 4.2 A transformation T is ergodic iff for any two measurable sets A and B,

(1/n) ∑_{i=0}^{n−1} ∫ (χ_{T⁻ⁱ(A)} · χ_B − μ(A)μ(B)) dμ → 0.   (4.3)

Proof We know one direction, that if T is ergodic the limit holds. Assume the limit holds and suppose T(A) = A. Letting A = B,

(1/n) ∑_{i=0}^{n−1} ∫ (χ_{T⁻ⁱ(A)} · χ_A − μ(A)²) dμ → 0.

But as T⁻¹(A) = A this says

∫ χ_A dμ = μ(A) = μ(A)².

Thus μ(A) = 0 or 1 and T is ergodic.
The transformation T is ergodic iff for any measurable set A
ifo f
1 "-1
11
(XT-;(A)' XA -
JL(A)2)dJl ~ O.
•
For mixing properties this is not unusual, that knowing all sets mix with themselves in some fashion implies the same for any pair of sets.
4.3
Weakly mixing
Our first non-trivial mixing property will require that the Cesaro convergence of the sums above be absolute. Definition 4.1 We say a transformation Tis weakly mixing iffor any measurable sets A and B, (4.4)
This condition lies at the heart of a large web of argument. Perhaps the core result here is that any ergodic transformation has a maximal invariant factor
Mixing properties
I
53
algebra on which it is isomorphic to an isometry of a compact metric space. The transformation is weakly mixing exactly when this factor algebra is trivial. We will ofTer two proofs of this fact, one by a bare-hands construction of the metric space due to Katznelson, the other via a short discussion of spectral theory. Our first result will already show that weakly mixing is a non-trivial condition. We first need a preliminary definition. We use the symbol # to indicate the cardinality of a finite set. Definition 4.2
A subset SeN is of density rx if #(Sn{O,I, ... ,n-I})
n and of full density if rx
=
n
-+
rx,
(4.5)
I, i.e., S contains 'almost all' of N.
Lemma 4.3 A transformation T is weakly mixing iff for any measurable A and B there is a subset S = {nl < n2 < n3"'} of N of full density for which lim
~(T-nk(A)
n B)
= ~(A)~(B).
(4.6)
k~X!
Proof
Suppose such a subset S existed. Then -
!~~
;;1 Jo f IXT-'(A)XB n-l
~(A)~(B)I d~
~ !~~ ~Cn{o,~,n-L} 1~(T-i(A) n B) - ~(A)~(B)I) -~
1
+ lim ~ ( # SC n {O, I, ... , n - I}) "-Xl n
which equals zero. On the other hand, if T is weakly mixing,
1 n-1 . lim ~ L I~(T-'(A) n B) - ~(A)~(B)I n--+oo n i==Q Letting S,
=
=
0.
{i: IJl(T-i(A) n B) - Jl(A)Jl(B) I < Il},
-I' #{S%n{O" .. ,n-I}} 1m n
1
~ ~
lim
In~l
.
~ L... I~(T-'(A)
8n~X!ni=O
n B) -
~(A)~(B)I =
0.
#{S,n{O, ... ,n-I}} , , Th us IImn~X! = 1. If denSity in N were O'-additive, we n could now just take Sl/i' As it is not, we must be more clever, Choose {Ni } so that Ni+l/Ni > i and for n 2 Ni
ni
54
I Fundamentals of measurable dynamics #{SI/i n {O, ... ,n-l}} n
Let S =
Ur,;,l SI/i n
> 1-I/i.
{a, 1, ... , Ni - I}. The result is now a computation.
-
Corollary 4.5 T is weakly mixing iff lim
n1 i~ f(XT-' O. But for such an n,
J.l(Bro(x) n T-"(B,o(x»
= J.l(B,o(x) n B'o(T"(x» ~
J.l(B,o(x» - 1:(J.l(B'o(x)d Bro(T"(x)))
~
(1 - 1:)J.l(B,o(x»
and T is not weakly mixing.
-
Corollary 4.7 If T, acting on (X, F, J.l), has a factor action measurably isomorphic to an isometry of a compact metric space then T is not weakly mixing. Proof If T is weakly mixing, then restricted to any factor it also is.
•
Exercise 4.1 Refine Theorem 4.6 to show that if T is a minimal isometry of a compact metric space X, J.l its unique invariant Borel probability measure, then for any FE L 2(J.l) and I: > 0, show there is a sequence nk of positive density with fIF(T"(x»F(x) - 11F11WdJ.l < 1:. We will now develop a circle of equivalent definitions of weakly mixing, one piece of which will be the converse of Corollary 4.7.
Theorem 4.8 Let T acting on (X,.fF,J.l) be ergodic. If the Cartesian square TxT, acting on (X x X,.fF x .fF, J.l x J.l) is ergodic then T is weakly mixing. The proof rests on the following piece of arithmetic. Lemma .4.9
If an is a sequence of real numbers with
1 "-1
(4.8)
lim-Lal=a " .... 00 n /=0 and 1 "-1
L
lim -
"-00 n
i=O
ar =
a2
then 1 "-1
lim Proof
1
n-1
lim - L (aj - a)2
"-00 n i=O
Proof of Theorem 4.7
L (ai -
n 1=0
""'00
= lim
a)2
= O.
(1 L ar - -
2a n-1
"-1
"-00 n ;=0
n
)
L ai + a 2
= O.
i=O
•
For any sets A and B,
1 "-1 lim - L Ji(A
n"'oo
n 1=0
.
11
T-'(B» = Ji(A)Ji(B)
as T is ergodic. Letting A = A x Allnd B = B x B,
1 "-1 _ _ 1 "-1 lim - L Ji x Ji(A 11 T- i x T-i(B» = lim - L Ji(A 11 T-i(B»2 "-00 n i=O "-ex:: n i=O = Ji x Ji(A)Ji x Ji(B) = (Ji(A)Ji(B»2 .
• By Lemma 4.9 then
1 "-1 lim - L (Ji(A
"-00 n i=O
11
T-i(B» - Ji(A)Ji(B»2
and Corollary 4.5 tells us T is weakly mixing.
=0
•
The next theorem is Katznelson's demonstration that if a transformation is not weakly mixing, then it must have a factor isomorphic to a minimal isometry. This is done by constructing an invariant pseudometric which
Fundamentals of measurable dynamics
makes the space precom pact. Later we will give an alternate proof via spectral theory. Theorem 4.10
If T acting on (X,:#', p,) is ergodic and has no factor actions isomorphic to an isometry of a compact metric space and S acting on (Y,~, v) any other ergodic transformation, then T x S acting on (X x y,:#' x ~,p, x v) is ergodic.
Proof We verify the contrapositive, i.e., assuming the Cartesian product is non-ergodic we construct a factor algebra on which T is isomorphic to an isometry of a compact metric space. We do this by constructing a non-trivial T-invariant pseudometric on X making it a precompact metric space. Let A E :#' x ~ be a T x S-invariant set, 0 < p, x v(A) = IX < 1. Let :#'1 = ff x {trivial algebra} and :#'2 = {trivial algebra} x 0, X can be covered a.s. by a finite number of e-balls in the pseudometric d.
57
Let {A 1 ,A 2, ... } be a sequence of sets dense in 0 and now select N so large that Jl
N) (U Bi > 1 i=l
Jl(Bd
--. 2
Let Xl = Ui:=-oo Ti(Bd, a set offull measure by ergodicity of T. For x E Xl, E TJ(Bd, as Jl(TJ(Bd) = Jl(Bd > 1 - Jl(Uf=l B;), there must be ayE TJ(B 1 ) n Bk for some k E {1, ... , N}. But then
X
and so the sets Ai:=
{X: v(AxL\A;)
n
Proof For f in the cone of positive continuous functions L. : f ..... is continuous in the uniform norm on f. We can extend Ln to all of CIR([O, 2n» by L(f+) - L(f-) and get a complex-valued continuous linear functional. Hence by the Riesz representation theorem, Ln(f) = f dVn for some perhaps complex-valued measure vn • But for f ~ we compute
J
°
so Vn is a positive real-valued measure with vn([0,2n» = p(P(O»
0
T
for any trigonometric polynomial P. The next result is the core of spectral theory as we will use it. Let L 2(F, p,) be the closure in U(p,) of the linear span ofthe set offunctions {F 0 TJ}jez,
61
Theorem 4.14 The map Φ_F extends to an L²-isometry from L²(sp(F)) to L²(F, μ). As we have seen, Φ_F(1) = F and for any f ∈ L²(sp(F)), Φ_F(e^{iθ} f(θ)) = Φ_F(f) ∘ T.

Proof For P and Q polynomials,
$$(P, Q)_{\mathrm{sp}(F)} = \sum_{j,k} b_j \bar{c}_k a_{j,k} = \int \Bigl(\sum_j b_j F(T^j(x))\Bigr)\overline{\Bigl(\sum_k c_k F(T^k(x))\Bigr)}\, d\mu = (\Phi_F(P), \Phi_F(Q))_F.$$
Thus Φ_F is an L²-isometry where it is defined. Certainly then it extends isometrically to the closure of the polynomials in L²(sp(F)). This is all of L²(sp(F)). The image under Φ_F of the trigonometric polynomials is exactly the linear span of {F ∘ Tʲ}_{j∈ℤ}. Hence the range of the extended Φ_F is its closure. Since T acts as an isometry of L²(F, μ) to itself, as does multiplication by e^{iθ} on L²(sp(F)), the identity
$$\Phi_F(e^{i\theta} f(\theta)) = \Phi_F(f) \circ T$$
extends from polynomials to all of L²(sp(F)). •
Corollary 4.15 If G ∈ L²(F, μ), then sp(G) ≺ sp(F) and we can compute the Radon–Nikodym derivative.

Proof Supposing G = Φ_F(P), P a polynomial, and f and g are also polynomials, it is a finite computation that
$$(Pf, Pg)_{\mathrm{sp}(F)} = (f, g)_{\mathrm{sp}(\Phi_F(P))}.$$
Suppose now G ∈ L²(F, μ) is arbitrary, G = Φ_F(h). Suppose P_i → h in L²(sp(F)), where the P_i are polynomials. Then G_i = Φ_F(P_i) → G in L²(μ). Thus all the coefficients
$$\int G_i(T^j(x))\overline{G_i(T^k(x))}\, d\mu \to \int G(T^j(x))\overline{G(T^k(x))}\, d\mu.$$
This tells us, for f and g still polynomials,
$$(hf, hg)_{\mathrm{sp}(F)} = (f, g)_{\mathrm{sp}(G)}.$$
As Φ_F and Φ_G are both L²-isometries, this tells us that the map f → hf is an L²-isometry from L²(sp(G)) into the ideal hL²(sp(F)), and for any f ∈ L²(sp(G)),
$$\int f\, d\,\mathrm{sp}(G) = \int f|h|^2\, d\,\mathrm{sp}(F). \qquad\bullet$$
A complete discussion of the spectral theory of ergodic transformations would now include how L²(μ) is built up from the pieces L²(F, μ). We have what we want of spectral theory now, though, and refer the reader to Parry (1969) to pursue this picture further.
4.5 Weakly mixing and eigenfunctions
We will now see how a spectral measure sp(F) can ferret out the failure of weak mixing.

Lemma 4.16 For F ∈ L²(μ),
$$\frac{1}{2n+1}\sum_{j=-n}^{n}\Bigl|\int F(x)\overline{F(T^j(x))}\, d\mu\Bigr|^2$$
tends to zero in n iff sp(F) has no atoms.

Proof To identify the atoms of sp(F) we look at [0, 2π) × [0, 2π) with measure ν = sp(F) × sp(F). Notice that
$$\nu(\{(\theta, \theta) : \theta \in [0, 2\pi)\}) = (\mathrm{sp}(F)(\text{atomic part}))^2.$$
To compute ν({(θ, θ)}) consider the sequence of functions
$$f_n(x, y) = \frac{1}{2n+1}\sum_{j=-n}^{n} e^{ij(x-y)} \qquad (4.15)$$
which converge pointwise to the characteristic function of the diagonal and are uniformly bounded by 1. Thus
$$\nu(\{(\theta, \theta)\}) = \lim_{n\to\infty} \int f_n(x, y)\, d\nu.$$
Computing,
$$\int f_n(x, y)\, d\nu = \frac{1}{2n+1}\sum_{j=-n}^{n}\int e^{ijx} e^{-ijy}\, d\nu = \frac{1}{2n+1}\sum_{j=-n}^{n} a_{0,j}\, a_{0,-j} = \frac{1}{2n+1}\sum_{j=-n}^{n}\Bigl|\int F(x)\overline{F(T^j(x))}\, d\mu\Bigr|^2. \qquad\bullet$$
Corollary 4.17 Suppose T is ergodic. For any set A ∈ ℱ,
$$\frac{1}{2n+1}\sum_{j=-n}^{n}\bigl(\mu(A \cap T^{-j}(A)) - \mu(A)^2\bigr)^2 \qquad (4.16)$$
tends to zero in n iff sp(F) has no atoms away from 0, where F = χ_A(x) − μ(A).

Proof F ∈ L²(μ) and notice
$$\frac{1}{2n+1}\sum_{j=-n}^{n}\Bigl|\int F(x)\overline{F(T^j(x))}\, d\mu\Bigr|^2 = \frac{1}{2n+1}\sum_{j=-n}^{n}\Bigl(\int (\chi_A(x) - \mu(A))(\chi_A(T^j(x)) - \mu(A))\, d\mu\Bigr)^2.$$
Furthermore,
$$\mathrm{sp}(F)(\{0\}) = \lim_{n\to\infty} \frac{1}{2n+1}\sum_{j=-n}^{n}\int F(x)\overline{F(T^j(x))}\, d\mu = \int F\, d\mu \int \bar F\, d\mu$$
by the L²-ergodic theorem (Theorem 3.1). This is zero as ∫ F dμ = 0. Thus sp(F) has no atom at 0. By Lemma 4.16 it has no atoms elsewhere if and only if
$$\frac{1}{2n+1}\sum_{j=-n}^{n}\bigl(\mu(A \cap T^{j}(A)) - \mu(A)^2\bigr)^2$$
tends to zero. •

Notice that
$$\frac{1}{2n+1}\sum_{j=-n}^{n}\bigl(\mu(A \cap T^{j}(A)) - \mu(A)^2\bigr)^2 = \frac{2n+2}{2n+1}\Bigl(\frac{1}{n+1}\sum_{j=0}^{n}\bigl(\mu(A \cap T^{j}(A)) - \mu(A)^2\bigr)^2\Bigr) - \frac{(\mu(A) - \mu(A)^2)^2}{2n+1}.$$
Thus the limit of the symmetric average is the same as that of the one-sided average.

Corollary 4.18 Suppose T is ergodic but not weakly mixing. There is then a λ ∈ ℂ, |λ| = 1, λ ≠ 1 and an f ∈ L²(μ), |f| = 1 with f(T(x)) = λf(x), i.e., T has an eigenfunction with eigenvalue λ.
Proof As T is not weakly mixing, for some non-trivial set A and F(x) = χ_A(x) − μ(A), we know sp(F) has an atom not at 0. Suppose it is at a point θ₀. Set λ = e^{iθ₀}. The function
$$\delta_{\theta_0}(\theta) = \begin{cases} 1 & \text{at } \theta_0 \\ 0 & \text{elsewhere} \end{cases}$$
is in L²(sp(F)) and ∫ δ_{θ₀} d sp(F) > 0. Thus f = Φ_F(δ_{θ₀}) is not identically zero. But notice
$$f(T(x)) = \Phi_F(e^{i\theta}\delta_{\theta_0}(\theta)) = \Phi_F(\lambda\delta_{\theta_0}) = \lambda f(x), \qquad \mu\text{-a.s.}$$
To see that f ∈ L^∞(μ) just note
$$|f(T(x))| = |\lambda|\,|f(x)| = |f(x)|$$
and so |f| is a constant μ-a.s. As f ≠ 0, we can normalize f to have modulus 1. •

The eigenfunction f above can be regarded as a factor map from X to S¹ carrying the action of T to rotation by θ₀. If θ₀ is irrational we know R_{θ₀} is uniquely ergodic, hence f(μ) must be Lebesgue measure on S¹. If θ₀ is rational then R_{θ₀} is not ergodic, but is still uniquely ergodic on each of its ergodic components. They are finite sets of course. Thus f(μ) must map to exactly one of them. In either case, we have obtained an isometric factor, giving an alternate proof of Theorem 4.10. We have in fact gained more, having identified the isometry as either a rotation on the circle or on a finite point set. Thus, for example, we have shown that any isometry of a compact metric space must have such a factor algebra. As at the end of Theorem 4.10, we are once more poised to show that an ergodic T has a maximal isometric factor. To do so along the lines of the proof of Theorem 4.10 involves constructing an appropriate metric from the full algebra of invariant sets in X × X. Here, using spectral theory, we can more explicitly describe this algebra as the algebra generated by the eigenfunctions. The full discussion of this we construct as a series of exercises.

Exercise 4.3
1. Using Exercise 4.1, show that if T is a minimal isometry of a compact metric space and F ∈ L²(μ) then sp(F) must have atoms.
2. Using Corollary 4.15 show that in fact sp(F) must be purely atomic. Hint: pick G to avoid atoms.
3. Conclude that for a minimal isometry, L²(μ) is generated by eigenfunctions.
4. For T arbitrary but ergodic, let 𝒜 be the algebra generated by the eigenfunctions. Show 𝒜 must contain all isometric factors.
5. Let Λ(T) be the set of all eigenvalues of T. Show that, as T is ergodic and L²(μ) is separable, Λ(T) is countable.
6. For each λ ∈ Λ(T), let S_λ ⊆ L²(μ) be the subspace of eigenfunctions with eigenvalue λ. Show that as T is ergodic, dim(S_λ) = 1.
7. Let Λ(T) = {λ_i} and f_i ∈ S_{λ_i} be a generator for S_{λ_i}. Show that the pseudometric
$$d(x, y) = \sum_i |f_i(x) - f_i(y)|/2^i$$
makes the action of T on the factor algebra 𝒜 isomorphic to a minimal isometry of a compact metric space.

This completes the argument that 𝒜 is the maximal isometric factor; it gives more, showing that T is an isometry exactly when all sp(F) are purely atomic. A minimal isometry is in fact determined, up to isomorphism, by its eigenvalues Λ(T). A proof of this result of von Neumann is accessible from our current vantage. It follows from the observation that if the eigenvalues agree, then one can construct an isometry between the two metrics as given in Exercise 4.3, part 7, above. This requires a careful analysis of how the various eigenfunctions must fit together. Thus spectral theory, when the spectral measures are purely atomic, is carried in the collection of numbers Λ(T), and is relatively simple. When the spectral measures sp(F) are non-atomic the situation is much more difficult. We will stop here though, and return to tie together all our discussions of the weakly mixing property.

Proposition 4.19 For an ergodic transformation T acting on (X, ℱ, μ), the following are equivalent:
(1) T is weakly mixing;
(2) the Cartesian product of T with any other ergodic transformation, with product measure, is ergodic;
(3) the Cartesian square T × T, with product measure, is ergodic;
(4) T has no factor action isomorphic to an isometry of a compact metric space;
(5) T has no non-trivial eigenfunctions.
•
This circle of equivalent conditions makes the weakly mixing property extremely easy to work with. In Chapter 6 we will add another condition equivalent to these three, that of disjointness from all isometries. For a continuation of these investigations into the weakly mixing property, see Furstenberg (1981).
4.6 Mixing
Definition 4.3 Our next mixing property of a transformation T, simply called mixing, is that for any two sets A, B ∈ ℱ,
$$\lim_{n\to\infty} \mu(T^n(A) \cap B) = \mu(A)\mu(B). \qquad (4.17)$$
Thus the limit on a set of full density of Lemma 4.4 is made a limit.
This would seem to be the most natural of mixing conditions, having a far simpler definition than weak mixing (or the later K-mixing). The fact that we will prove no theorems about mixing is one indication that this is not the case. The mixing property is in fact rather difficult to work with.

Definition 4.4 We say T is k-fold mixing if for any sets A₁, A₂, ..., A_k,
$$\lim \mu\bigl(A_1 \cap T^{n_2}(A_2 \cap T^{n_3}(A_3 \cdots (A_{k-1} \cap T^{n_k}(A_k)) \cdots ))\bigr) = \mu(A_1)\mu(A_2)\cdots\mu(A_k) \qquad (4.18)$$
as n₂, n₃ − n₂, ..., n_k − n_{k−1} → ∞.
We will now describe the construction of a transformation T, due to Chacon, which is weakly mixing but not mixing. We give a rank-1 cutting and stacking version of the construction. As described in Example 5 of Chapter 1, for Chacon's example k(n) = 3,
S(1, n)
=
S(3, n) = 0,
S(2, n)
= 1,
i.e., at each stage of the construction the stack is cut into three equal slices, which are stacked in order with a single new level placed between the second and third slices (Fig. 4.1). It is a computation that T acts on the interval [0,3/2). Theorem 4.20
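The hierarchy of blocks this produces is easy to generate symbolically. The sketch below (our own illustration, not from the text; the symbols 1 for a base level and s for a spacer are our labels) builds the n-block by the recursion B_{n+1} = B_n B_n s B_n and reports the level count N(n):

```python
# A small sketch (ours) of the symbolic n-blocks of Chacon's construction:
# cut the stack into three copies and insert one spacer between the second
# and third copies.
def chacon_block(n):
    block = "1"                                   # the 0-block: a single base level
    for _ in range(n):
        block = block + block + "s" + block       # B_{k+1} = B_k B_k s B_k
    return block

def N(n):
    # number of levels in the nth stack: N(n+1) = 3 N(n) + 1, N(0) = 1
    return len(chacon_block(n))

for n in range(5):
    print(n, N(n), chacon_block(n) if n <= 2 else "...")
# N(n) = (3**(n+1) - 1) / 2; the total mass added by spacers is 1/2,
# consistent with T acting on the interval [0, 3/2).
```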
Chacon's map is weakly mixing but not mixing.
Proof Let N(n) be the number of intervals in the nth stack, and label these intervals /(1, n), /(2, n), ... , /(N(n), n)
in order from the bottom of the stack upward. Thus for 1:::; j < N(n), T(/ (j, n» = I (j + 1, n).
Fig. 4.1 Cutting and stacking in Chacon's map.
From the diagram above, for 1 < j < N(n),
$$\mu(T^{N(n)}(I(j, n)) \cap I(j, n)) \ge \tfrac{1}{3}\,\mu(I(j, n)), \qquad \mu(T^{N(n)+1}(I(j, n)) \cap I(j, n)) \ge \tfrac{1}{3}\,\mu(I(j, n)). \qquad (4.19)$$
Let S = 1(2,2). For the second level of the second stack, J.l(S) = 2/9. For any n > 2, S is a disjoint union of sets of the form I(j, n) where 1 < j < N(n). It follows that for n > ~2, (4.20) and T is not mixing. We show T is weakly mixing by contradiction. Suppose d( , ) is a nontrivial T-invariant pseudometric on [0,3/2] making it precompact. For 1/10 > 8 > 0, let D be an 8-ball of positive measure ::F 1. As the intervals I(j, n) refine in n, there must be an nand j with l(j, n) satisfying J.l(I(j, n) (') D) ~ (1 - 8)J.l(I(j, n»,
(4.21)
i.e., all but a fraction 8 of I(j, n) is within a single 8-ball. Let
l(j, n) = I(j, n) (') D. As dis T-invariant, setting f(k, n) = T"-J(I(j, n», each f(k, n) has radius at most 8 and occupies a fraction (1 - 8) of I(k, n). As 8 < 1/10, and •
Jl(TN(n)(I(j, n» r. I(j, n»
~ ~ Jl(I(j, n»,
we know
Hence for a set of points x of positive measure, i.e., those in d(TN(n)(x), x)
[U, n),
< 28.
But as f(x) = d(TN(n)(x), x)
is T-invariant it is constant a.e., hence d(TN(n)(x),x) < 28
(4.22)
d(T N (n)+1 (x), x) < 28
(4.23)
almost surely. Using
we similarly conclude almost surely. But then d(T(x), x)
< 4e
(4.24)
almost surely, hence d(T(x), x)
= 0 almost surely and as a.e. orbit is dense for
d, d is the trivial pseudometric.
_
Exercise 4.4 Give an alternate proof that Chacon's map is weakly mixing by showing it has no non-trivial measurable eigenfunctions. Hint: Show that any measurable function must finally be essentially constant on most levels of the towers. Use this, and the 'spacer', to show that the only possible eigenvalue is 1.

Exercise 4.5 Consider the rank-1 cutting and stacking construction with k(n) = 2ⁿ, and spacers S(i, n) = 0 except for i = 2ⁿ⁻¹ and S(2ⁿ⁻¹, n) = 1. Call the ergodic map obtained T.
1. Show T is weakly mixing. 2. Show that there is a sequence nk .....
00
so that for any set S,
Jl(ynk(S) r. S) -: Jl(S),
(this property is called rigidity).
3. Show that a mixing map is never rigid, and hence T is not mixing. 4. Show that an isometry of a compact metric space is always rigid. 5. Show that Chacon's map is not rigid, and in fact, for no set S "# 0 in X is there a sequence nit /' 00 with Jl(rnk(S) AS) --+ O. This gives a third proof that Chacon's map is weakly mixing.
4.7 The Kolmogorov property We end this chapter by giving a definition of K-automorphism (K is for Kolmogorov) that shows how it fits into the hierarchy of mixing conditions. In the next chapter on entropy we will develop a chain of equivalences for the . K-automorphisms very similar to that we developed for weak mixing. Definition 4.5 A map T acting on the Lebesgue probability space (X,~, Jl) is called a K-automorphism provided the following is true. For any finite partition P and for any 8 > 0 there is an N (depending on both P and 8) so that for any t E Z+ and any integers nz, n3"'" nt with nz > N, n3 - nz > N, ... , nt - nt - 1 > N, we have (4.25)
for all B E P. Comments on the definition.
1. Notice that if t were kept bounded by k the definition reduces to n...Tn2(B2) rn (B )···1i T"k(B IJl(BJl(T"2(B T"k(B » 2 rn (B 3
Ii
Ii
3
k
I
(B) < e
It ) _
3
3 ) ••• I i
Jl
(4.26)
once n 2 , n3 - n 2 , ••• , nk - nk-l are large enough. A little induction shows that this is equivalent to k-fold mixing. Thus the K-property is a kind of uniform k-fold mixing, and K-automorphisms are mixing of all orders. 2. Although the definition gives uniformity in t, we must choose the sets B,
B2 , ••• , B, from some fixed finite collection. If we allowed ourselves to choose from an infinite collection, for example, B, T- 1 (B), T-Z(B), ... , no map would satisfy the condition.
3. We use conditional expectation in the definition as opposed to
as these values automatically tend to zero in t if T is mixing. For any e and for large t, the condition would be vacuous. Thus this property is just k-fold mixing for all k.
The K-property asks for more, requiring that each new Tn'(Bt ) mix well with the previous ones relative to their size. Just as weakly mixing opened the door to spectral theory and point spectrum, the K-property will open the door to entropy.
5 Entropy
Our approach so far to the classification of ergodic processes has been to search for natural invariants of the process. We have discussed two sorts, the point spectrum and mixing properties. One more invariant remains to be looked at, perhaps the most fundamental numerical invariant of stationary stochastic processes, the Kolmogorov-Sinai entropy. What the entropy attempts to measure is the rate at which a process becomes random.
5.1 Counting names
The approach we take to this question is unusual, being based from the outset on name counting. We will not intersect the classical theory for some time (at Definition 5.3). With our approach the ideas will evolve quite naturally, and the skills we develop in manipulating names will stand us in good stead in Chapters 6 and 7. Let's begin to focus precisely on how entropy is computed. Let T, acting on the Lebesgue probability space (X,!F, p.) be an invertible measure-preserving ergodic map, and P = {Pl' ... , P.} be a finite partition of X into measurable sets. Here is a slightly novel notion of a 'partition' which will be very useful for us. Regard P as a function from X to a 'finite state space' of 'symbols' or 'names.' Thus P:X ..... {Pl' ... ,Ps} is a finite partition and {Pl' ... 'Ps} is the space of symbolS': The set Pk is then more precisely p- 1 (Pk), the set of points 'named,' or 'labeled' Pk. It will happen often that we will label several partitions with the same labels. In this case the state space will be lower case letters, SUbscripted to order them. The function will be indicated by the corresponding capital letter, embellished with a subscript or other notational device to indicate it uniquely (Pl ,P2 ,P',F, etc.). When there is no ambiguity we will refer to a set by its name. The T,P,n-name of x E X is a sequence of n symbols, each chosen from among PI, ... , P. written Pn(X)
= (P(x), P(T(x», . .. , P(Tn - 1 (x»)
(5.1)
where P(x) is that element of P containing x. Thus Pn maps X to r. These sequences of n symbols from P are then the names for the sets in T-i(P). The measure ~ on X is transported by Pn to a measure on the elements PO-the measure of a name is the measure of the set of points which have that name.
vr;;J
72
I Fundamentals of measurable dynamics
Entropy, roughly speaking, is the exponential rate of growth in n of the number of T,P,n-names. To be precise, fix 8 > O. Starting from the names of least measure in pn, remove as many as possible so that the measure of the remaining names is still greater than (1 - 8). This collection of remaining names we denote by
S(T, P, n, 8). The presence of the 8 is what ties the entropy to the invariant measure p. We only consider the 'large' names. If we did not omit any names, measure would play no role in the definition. This is a fruitful approach also and leads to the development of topological entropy for symbolic systems. Let
N(T,P,n,8)
(5.2)
# (S(T,P,n,e».
=
It seems reasonable to guess that this number would grow exponentially in n. To try to extract the coefficient of this growth we introduce the following definition.
Definition 5.1 For T ergodic and P a finite partition let
$$h(T, P, n, \varepsilon) = \frac{1}{n}\log_2(N(T, P, n, \varepsilon)) \qquad [\text{note: } \le \log_2 s]. \qquad (5.3)$$
To remove dependence on n set
$$h(T, P, \varepsilon) = \lim_n h(T, P, n, \varepsilon), \qquad (5.4)$$
and to remove dependence on ε set
$$h(T, P) = \lim_{\varepsilon\to 0} h(T, P, \varepsilon). \qquad (5.5)$$
The limit certainly exists as the sequence h(T, P, ε) is monotone increasing (as ε → 0) and bounded (by log₂ s). We first use the backward Vitali lemma to show that ε plays no significant role, except to be less than 1, in this computation.
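The name count behind this definition is easy to carry out numerically for a simple process. The sketch below (ours, not the text's; the i.i.d. (p, 1−p) coin process and all parameter values are our choices) samples n-names, keeps the most probable ones until they cover mass 1 − ε, and reports (1/n) log₂ of their number:

```python
# A sketch (ours) of the name-counting quantity h(T, P, n, eps) for an
# i.i.d. {0,1}-valued process with P(1) = p.
from collections import Counter
import math, random

def h_estimate(p=0.3, n=12, eps=0.05, samples=200_000, seed=0):
    rng = random.Random(seed)
    counts = Counter(
        tuple(1 if rng.random() < p else 0 for _ in range(n))
        for _ in range(samples)
    )
    total, mass, kept = samples, 0, 0
    for _, c in counts.most_common():      # largest-measure names first
        if mass >= (1 - eps) * total:
            break
        mass += c
        kept += 1                           # kept plays the role of N(T, P, n, eps)
    return math.log2(kept) / n

print(h_estimate())                                        # roughly
print(-(0.3 * math.log2(0.3) + 0.7 * math.log2(0.7)))      # ~0.8813 for large n
```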
Theorem 5.1 For T ergodic and P a finite partition, for any ε′, with ε′ < ε < 1, we have
$$\lim_n h(T, P, n, \varepsilon') \le \lim_n h(T, P, n, \varepsilon).$$
Proof Fix e, e' and select {n;} so that lim h(T, P, ni , e)
= lim h(T, P, n, e) = h(T, P, e).
Also fix 8 < e', and assume 1h(T, P, ni , e) - h(T, P, e)1 < 8/10.
(5.6)
Entropy I 73 Recall that S(T, P, nj, e) is the set of 'large' names-i.e., a minimal (in cardinality) set of T,P,nj-names sufficient to cover all but e (in measure) of X. Set
8i = {x: Pn'(x) E S(T,P,ni,e)}.
(5.7)
Clearly I'(S;) ~ 1 - e. Let
[= n~=l Ur;,N S;
= the subset of those x which lie in infinitely many S;], and
1'(8) ~ 1 - e > O.
Consider
i=O
Since T is ergodic 1'(8)
= 1 and
without loss of generality we may assume
$=X. For x E X = $ we have x E Ti(S) for some i so T-i(x) E 8 and for infinitely many k, T-i(x) E SI;' Let k(l, x) < k(2, x) < ... be these k's. Once nl;(n.x) > i we may set in = - i, in = nl;(n.x) - i. Thus for each x E X there are values in < 0, jn > 0 with jn - in -+ 00 in n and the T, P, jn - in-name of T-in(x) is in the set S(T, P, jn - in, e). We also know 1h(T, P, jn - in' e) h(T, P, e)1 < '8/10. We are now ready to use the backward Vitali lemma (Theorem 3.9). Applying it, there is a set F, and for x E F, values i(x), j(x) from among the in (x), in(x) so that the orbit intervals ~
{Ti(X)(x), Ti(X) +1 (x), ... , Ti(x)(x)}
are disjoint and cover all but '8/10 of X. There is also an No with j(x) - i(x) < No uniformly. Let xeF
and I'(G) > 1 - '£/10. Select N ~ IONofe and so large that for n ~ N, for all but '8 of X in measure and for all but a set of density at most '8/5 of i E {O, 1, ... ,n - I}, we have Ti(x) E G. That we can do this is a consequence of the Birkhoff ergodic theorem applied to 1.G' Let H be this good set for the ergodic theorem. We now compute an upper bound for the number of T,P,n-names for points in H. Such a T,P,n-name can be represented as in Fig. 5.1. There is a subset of the name of density at least 1 - (2/5)'8 consisting of disjoint blocks 110 12 , ••• , II and across each such block we see a name from S(T,P, #11;,e). These blocks are the intervals
74
I Fundamentals of measurable dynamics HI[~---H][~--~]~E----4-~jJE ~~
11
[
~~
12
//- 1
Fig. 5.1
T,P,n-names for x.
(i(TU(x))
+ u,j(TU(x) + u)
1/
where P(x) E F, and the entire block is contained in (0, 1, ... , n - 1). The remaining indices correspond to points TU(x) ~ G or whose block of indices is not completely contained in (0, ... , n - 1). The number of such T,P,n-names in H is bounded by of names ) IJ'(#possible
(#
of ways ) the I h . : ., I, x k-1 can anse
Now
across II.:
(#
of ways)
(1)
theI 1 ,.:.,I,:::;
(#
( #
p~~:~~es )
(
#
of names po.ssible outSIde the Ik
1 .
(S.8)
of subsets of size )
atmostt~nina
,
set of SIze n
can anse
(2)
x
= 2"(T,P, #lk,a)(#lk)
across Ik and (3)
(p~s~i~7:~~:-):::;
s2/Sin.
(S.9)
side the Ik
To estimate the number of subsets of size at most (2/5)ε̄n in a set of size n we use Stirling's formula. (5.10)
where
Note that this fundamental combinatorial formula can be regarded as the core of entropy theory. See Ahlfors (1966) for a proof. Thus the binomial coefficient
So the number of sets of size at most αn in a set of size n is
$$\binom{n}{[\alpha n]} + \binom{n}{[\alpha n]-1} + \cdots + \binom{n}{1} + \binom{n}{0} \le \bigl(\alpha^{-\alpha}(1-\alpha)^{-(1-\alpha)}\bigr)^n \sqrt{n}. \qquad (5.12)$$
Set
$$H(\alpha) = -\alpha\log_2\alpha - (1-\alpha)\log_2(1-\alpha). \qquad (5.13)$$
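The content of (5.12) and (5.13) is that the number of small subsets grows like 2^{nH(α)} up to a polynomial factor. A quick numerical check (our own sketch, with α = 0.2 an arbitrary choice) compares the exact binomial sum with the bound (α^{−α}(1−α)^{−(1−α)})^n √n = 2^{nH(α)} √n:

```python
# A sketch (ours) comparing the partial binomial sum with the bound in (5.12).
import math

def H(a):
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a)

def binom_sum(n, alpha):
    k = int(alpha * n)
    return sum(math.comb(n, j) for j in range(k + 1))

alpha = 0.2
for n in (50, 200, 800):
    exact = binom_sum(n, alpha)
    bound = 2 ** (n * H(alpha)) * math.sqrt(n)
    print(n, math.log2(exact) / n, H(alpha), exact <= bound)
# (1/n) log2 of the sum approaches H(alpha), and the bound of (5.12) holds.
```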
Combining estimates (5.11) and (5.12), the number of T,P,n-names covering all but e of X is bounded by 2', where
e) k~l #11 (5Te) (n + clog2 n) + (18) 5 (log2 s)n + (h(T, P, e) + 10 e H (Te) log2 n (Te) ) ( h(T,p,e)+10+ 5 +c n -+ 5 log2 s n. t
r= H ~
Thus
h(T,P,n,B) ~ h(T,P,e)
(26) 26
n e + clog2 n + 10 + H 5" + slog2 s.
This holds for all n sufficiently large so
!~~ h(T, P, n, e) ~ h(T, P, e) + :0 + H (~) + ~ log2 s.
(5.14)
Clearly for e ~ e',
lim h(T,P,n,B)
~
lim h(T,P,n,e')
~
so if we let e -+ 0 in (5.l4), lim h(T,P,n,e')
~
h(T,P,e).
It follows now that for e < 1,
lim h(T,P,n,e) exists and its value is h(T, P) independent of e. Corollary 5.2 (Of the proof) Given T as usual and a finite partition P, there exists an increasing sequence of sets An whose limit is a.a. of X so that log2( # of T,P,n-names in An) n
Converges to h( T, P) as n -+
00.
(5.15)
76
I Fundamentals of measurable dynamics
Proof In the course of proving the previous theorem we selected 6 and constructed a set G of measure at least 1 - 6/10 using the backward Vitali lemma. We discarded those points of X whose orbit of length n was outside G more than 6/5 of the time-by the pointwise ergodic theorem this is a decreasing sequence of sets. More precisely, let
Bn ,.
=
{x :pointwise ergodic theorem holds for for all n'
XG
~ n to within an error of 10 6 }.
Here we choose 6 so
and n so large that
By the previous proof, log( # T,P,n'-names in B
,.)
n --=--=------'------.:--,_ _----=.::.c = h( T, P)
n
± E:
for all n' ~ n. Now let ei = 2- i and select {nJ increasing so that Jl(Bn".,) > 1 -
and for n' > ni in Bn I og ( # T,P,n'-names ,'"
r
i
.»)
n
= h(T,P)
± ei •
Set An =
n
ni+l~n
Bn".,·
Now
so An increases to a.a. of X. Also
if-
log( T,P,n-names in ---=::... An) ---="-'---_ _____
:s; log( # T,P,n-names in Bn ,,0.•. )
n
n =
where
h(T, P)
± ei
Entropy I 77
Hence
lim log( # T,P,n-names in A .. ) ~ h(T, Pl. n
However, lim loge # T,P,n-names in A .. ) ~ h(T, P) n
•
anyway as Jt(A .. )#,.!. 1.
5.2
The Shannon-McMillan-Breiman theorem
We have often spoken of the measure of a T,P,n-name, meaning the measure of the set of points with this T,P,n-name. We write this as Jt(P.. (x» allowing P.(x) to represent both the name and the set of points possessing it. Our next result concerns the asymptotic size of such names. Theorem 5.3 (Shannon-McMilIan-Breiman) For T an ergodic map and P a finite partition, for a.e. x E X
lim -log2(Jt(P.. (x))) = h(T, Pl.
(5.16)
n
" .... 00
Note: this convergence is also in L 1 (Jt). Proof We first
show~
lim -log2(Jt(P.. (x))) ~ h(T, P) n
for a.e. x Let
E X.
B•.• = {x: -log2(:(P.. (X))) > h(T, P)
+ e, x E A .. }
where A .. is the set constructed in the previous corollary. Once n is sufficiently large Jt(B•.• )
~ 2-(h(T.P)+t)n. 2(h(T.P)+(£/2».
Thus 00
L Jt(B.... ) < 00 . .. =1
= 2-../2 •
(5.17)
78
I Fundamentals of measurable dynamics
By the Borel-Cantelli lemma ",{x: x lies in infinitely many B.... } If x lies in only fmitely many
Bn ••~then
= O.
for large n either x ¢
-log2(",(Pn(x))) < h(T, P) n
An
or
+ B.
As the A .. increase to a.a. of X, for a.e. x, once n is large enough, x for a.e. x, once n is large enough 1 --log2"'(p,,(x» < h(T,P) n
E An.
Hence
+ B.
Thus for a.e. x,
which is (5.17). Now we prove the lower estimate lim "-+00
-~log2"'(p,,(x» ~ h(T,P).
(5.18)
n
Redefine
B",. =
{x: -~ log "'(P..(x» < h(T, P) - B}.
Thus the number of T,P,n-names in B". is less than or equal to Let B.
2(h(T,P)-.)n.
= {x: x is an infinitely many B.... }.
If ",(B.) = 0 for all B we are done. Assume ",(B.) > 0 for some e > O. Now by ergodicity
1=0
and for a.e. x there exist i .. (x) < 0, i ..(x) > 0 with i .. (x) - i,,(x) .!. 00 and such that the T,P,i..(x) - i.. (x)-name of Tin(X)(x) is among the 2(h(T,P)-.)Un(x)-i..(x» names in Bjn(X)-in(X)•• ' Following the sequence of estimates of Theorem 5.1, using the backward Vitali lemma we can conclude from this that h(T, P) < h(T, P) - e/2, a contradiction. Hence ",(B.) = 0 for all e > 0 and (5.18) holds finishing the proof of pointwise convergence. To see uniform integrability and hence L 1 ("') convergence, just notice
",{x: -~log2"'(P.(X» > log2s + a} ~ 2-..
a•
•
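The pointwise convergence in Theorem 5.3 is easy to watch numerically. The sketch below (ours; the (0.3, 0.7) coin process is just a convenient example whose limiting value is the entropy −Σ πᵢ log₂ πᵢ of the process) tracks −(1/n) log₂ μ(P_n(x)) along one sample orbit:

```python
# A sketch (ours) of Shannon-McMillan-Breiman for an i.i.d. (p, 1-p) process:
# along a typical orbit, -(1/n) log2 mu(P_n(x)) tends to the entropy.
import math, random

p = 0.3
rng = random.Random(1)
log_mu = 0.0                      # log2 of the measure of the current n-name
for n in range(1, 20001):
    symbol = 1 if rng.random() < p else 0
    log_mu += math.log2(p if symbol == 1 else 1 - p)
    if n in (10, 100, 1000, 20000):
        print(n, -log_mu / n)
print(-(p * math.log2(p) + (1 - p) * math.log2(1 - p)))   # the limit, ~0.8813
```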
5.3 Entropy zero and past algebras
To this point entropy has been tied to a fixed finite partition P. We need to loosen this tie. Our first step is to understand what h(T, P) = 0 means. This in itself is an important step. Later we shall see that the relationship between entropy zero and the K -property is very analogous to the relationship between isometries and the weakly mixing property. Let T be, as usual, an invertible measurable measure-preserving map on the Lebesgue probability space (X,.?i', p.), and P = {PI"'" Ps} a finite partition of X. Define the past a-algebra of T, P to be -00
V T-j(P),
f1J =
(5.19)
j=-l
and more generally v
f1Ju.v =
Y
V T-i(P),
(5.20)
j=u
where u ~ v. If (u ' , Vi) £ (u, v) then f1Ju'.v' £ f1Ju.v' The sets in f1Ju.v consist of points whose T,P-names agree from index u to index v. This T,P-name we write Pu,v(x). Let A be any measurable set. We want to define the conditional expectation of A given f1J which we will write E(AIf1J). Define fN(X) = p.(A n P-l,-~(x» = E(AIP_ 1 -N)' p.(p-l, -N(X» ,
It is an easy check that the functions fN and algebras~_l -N form a bounded positive martingale. The U-martingale theorem (Coroll~ry 2.8) of Chapter 2 tells us, for a.e. x, fN(X) converges. We call the limit functions, defined a.e., E(AlgP)·
(5.21)
We know, for any S E ~
Is
XA dp. =
Is E(AIf1J)dp..
(5.22)
Suppose A = Pi' an element of the partition P. The function E(pdgP) meaSures the probability that, given only the past history of a point x, that it now lies in Pi' Note that since LPiEPX Pi = 1, we have
L PiEP
so for a.e. x, the vector
E(pdf1J) = 1 a.s.,
80
I Fundamentals of measurable dynamics
forms a probability vector which we write (5.23)
D(PI&')(x),
a probability distribution-valued function. Theorem 5.4 h(T,P) = 0 iff for all PI e P, E(p,I&') the (T, P) process determines its present.
= Xll •
a.s. i.e., the past of
Proof Suppose that for all Pi e P, E(pd&')
= Xll ,
a.s.
For x e X, let p(x) e P be that element which contains x. Thus for a.e. x, E(p(x)I&') = 1.
Fix e > 0 and select No so large that on a set G, ",(G) > 1 - e, we have for xeG
Exercise 5.1
How do we do this?
Thus if we know that x e G and we know thenamef-l.-No(X) then we know p(x). By the ergodic theorem we can select N so large that for all but e of the
pointsxeX,atleast(1 - 2e)Nofx, T(x), ... , TN-1(X) are in G. Call this subset BN , ",(BN ) > 1 - e.
We now count the number of T,P,N-names in HN . We do this by first selecting a subset of 2EN places in 0, ... , N - 1 which are to be those indices not in G. At these indices in the name, and in the first No positions we assign some arbitrary symbols from P. The rest of the symbols in the name are now determined, as working inductively from the left, at- an undetermined index we must be in G and we know symbols in the previous No positions. Thus
# (T,P,N-names in BN ) ~
~
# of subsets Of)
# names # names) across x ( across ofN (0, ... , No - 1) such a set 2(H(2£)+c1ogNIN+2£logs+(NoIN)logs)N
(
~2eNinaset
x (
(cf. estimates (5.11), (5.12) and (5.14». Thus h(T,P,e,N) ~ H(2e)
clogN
N.
+ ~ + 2dogs + ; logs.
1
(5.24)
Entropy I 81 Letting N
--+ 00
and then e --+ 0, h(T,P)
= O.
To prove the converse we will show that if E(PilgIJ) is not 0, 1 a.s .• then h(T,P) > O.
Note: To say E(pdgIJ)
= {~
a.s.
is the same as to say (5.25) Since there are only finitely many elements Pi in p. for some fixed Pi and for some a > 0 we must have 1 - a > E(PilgIJ) > a on a set A E ~ Jl(A) > a. Select e < a and No so large that for a set G of measure Jl(G) > 1 - e we have for all x E G and n ~ No. E(pd,9l)(x) - E(PioP-l . ...,J(x) < e.
(5.26)
I'\.
This follows from the pointwise convergence of the martingale
I" =
E(pJi>-l.-N)·
This now gives us information about the asymptotic behavior of Jl(P.(x». The key observation is that if T"(x) E A (") G and n ~ No. then (5.27) To see this just note Jl(p" +1 (x» = E(P(T"( »1 (T"( ») Jl(p"(x» x P-l,-" x and as T"(x)
E
G (") A,
E(P(T"(x»lp_l._"(T"(x)))
= E(P(T"(x»lgIJ) ± e
0, then there is an n so that for a set G, Jl(G) > 1 - e.
If x
E
G then
for all Pi E P.
Entropy
I 83
Proof We know
E(PiI5~1 T-nk(P») = ;~ E(PiI5~1 T-nk(p») pointwise a.e. For any x E X,
( I"Y--N1 T
E Pi
-Ilk
)
_
(P) (x)-
P.(Pi n p(T-n(x» n p(T- 2 n(x» n"'n P(T-Nn(x))) p.(P(T n(x»nP(T 2n(x»n"'n P(T Nn(x))) .
As T is a K-system, (see Definition 4.4) once n is sufficiently large, for all N, for all but B of X, this is P.(Pi) ± B for all PI' • Proof of Theorem 5.5 If P is a non-trivial partition, as T is a K-system by Lemma 5.7 there is an n so that E(pdVk-=~l T-nl(p» =F 0 or 1 with positive probability (in fact this conditional expectation is approximately P.(Pi(X» for most x). This says h(Tn, P) =F O. Now by our first lemma h(T, P) =F O.
5.5
The entropy of an ergodic transformation
We want to prove the converse of Theorem 5.5 but it will be some time before we have sufficient machinery to do so. We now begin to develop this machinery, for a time leaving behind the notion of K-system and re-entering the general development of the theory of entropy. DefinidoD 5.1 h(T)
=
suph(T,P)
(5.29)
where the sup is taken over all finite partitions P of X. We need some basic facts to help make h(T) a reasonably computable . 0 quantIty. Lemma 5.8
For all k > 0, h ( T,
i~l T- i(P») = h(T, Pl·
(5.30)
Proof We estimate the number of T, Vl'=-l T-i(P), n-names on a subset A. On the one hand it is at least the number of T,P,n-names on A. On the other hand to know the T, ~-:,k_k T-i(P), n-name of x is precisely to know P-k,nU(X), hence there are at most S2k times as many T, Vl'=-l T-i(P), n-names on A as there are T,P,n-names, Hence
84
I Fundamentals of measurable dynamics # (T,P,n-names in A)
~
# ( T,
~
S2k
/i"
T-i(P), n-names in A )
# (T,P,n-names in A).
•
The result follows.
Lemma S.9 Suppose a partition H is P measurable, i.e., an element of H is a union of elements of P. Then h(T, H)
~
h(T, p).
(5.31)
•
Proof Exercise 5.2.
Exercise 5.2 Prove Lemma 5.9. If we have two finite partitions of X, H, and H', both with state space {h l' ... , h,}, we can define their symmetric differences H tl H' = {x : H (x) of. H'(x)}. This set is a measure of how different the partitions are. Notice it does take into accounfthe labels on the sets.
Lemma S.lO Suppose Hand H' are two finite partitions with the same state space {hl>"" h,}, If p(H tl H') < e, i.e., Hand H' are very close, then h(T, H)
~
h(T, H')
+ elog2 s + H(6).
(5.32)
Proof Fix B > 0 and select Nl so large that for a set A l , p(Ad > 1 - Band for n ~ N l ,
# (T,H',n-names in Ad
~ 2("(T,H)+i)n
(cf. Theorem 5.3).
Let E = {x: H(x) =F H'(x)}, and hence p(E) < 6. Applying the Birkhoff theorem to E, select N2 so large that for a set A2 with ",(A 2) > 1 - B we have for x E A2 and n ~ N 2 , for all but at most (6 + B)n of the points x, T(x), ... , yn-l(X), H(Ti(x» = H'(Ti(x». LetA = Al n A 2 ,so",(A) ~ 1 - !e;letN = max{Nl ,N2 },andn ~ N. Then 2"(T,H,2i, ..) ~
# (T,H,n-names in A)
~
# T,H',n-names in A)
# of subsets of size ~ (e x(. f . In a set 0 SIZe n
+ B)n)
n
x s .
Applying the usual Stirling's formula estimates
h(T, H,!e, n) ~ h(T, H') Letting n -+
00
clogn
+ B + H(e + B) + - - + (e + B) log s.
and B -+ 0 we get the conclusion.
n
•
Entropy
Corollary 5.11
I
85
If
V T-i(P)
He
i=-oo
then h(T, H) ::;; h(T, P).
Proof For any e > 0 select k and k
H'
V T-i(P)
c
i=-k
with Il(H ~ H') < e.
Now applying Lemmas 5.9 and 5.10 h(T,H)::;; h(T,H') ::;; h (T,
+ H(e) + dogs
i~k T-i(P)) + H(e) + dogs
::;; h(T, P)
+ H(e) + dog s.
•
Letting e -+ 0 completes the result. Corollary 5.12
If P is a generating partition, i.e., 00
V T-i(P) = ;=-00
ff'
then h(T) = h(T, P).
•
5.6 Examples of entropy computations Example 1 Irrational Rotations. Let T circle (J -+ ((J
=
R a , an irrational rotation of the
+ oc) mod 2n.
The partition P = {(O, n), [n, 2n]}
is a generator. In fact, if (Jl =F (J2' as the values noc mod 2n are dense in [0,2n), for some n, (nIX) mod 2n lies between (Jl and (J2' hence (Jl and (J2 lie in distinct elements of T"(P). Thus VtLoo T-i(P) separates points and so generates. Thus h(R,,) = h(R", P).
Theorem 5.13
h(R_α, P) = 0.
Proof A set in ⋁_{i=0}^{n−1} T^{−i}(P) is an interval. Spanning this with T^{−n}(P) cuts exactly two of these when forming ⋁_{i=0}^{n} T^{−i}(P). Thus
#(T,P,n+1-names) = #(T,P,n-names) + 2
and the result follows. •
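This linear (rather than exponential) growth of names is easy to observe directly. The sketch below (ours; the golden-ratio angle is just a convenient irrational choice) counts the distinct T,P,n-names of the two-set partition under an irrational rotation:

```python
# A sketch (ours): the number of distinct T,P,n-names for an irrational
# rotation grows linearly (about 2n names), so h(R_alpha, P) = 0.
import math

alpha = 2 * math.pi / ((1 + math.sqrt(5)) / 2)   # an irrational angle

def name(theta, n):
    # P-labels along the orbit theta, theta + alpha, ..., theta + (n-1) alpha
    return tuple(0 if ((theta + i * alpha) % (2 * math.pi)) < math.pi else 1
                 for i in range(n))

thetas = [k * 2 * math.pi / 100000 for k in range(100000)]   # fine grid of start points
for n in (1, 2, 5, 10, 50, 200):
    print(n, len({name(t, n) for t in thetas}))   # roughly 2n, not 2**n
```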
Example 2 Bernoulli Processes. We introduced Markov processes in Chapter 1, Example 3. Bernoulli processes are a special case of Markov processes. As we continue through this and the next two chapters, they will come to play an ever more critical role. Hence we will describe them in detail, and begin with almost obvious facts. Let X = {1, 2, ..., s}^ℤ, the set of all doubly infinite sequences of elements from the finite set of 'symbols' {1, 2, ..., s}, and let
be a probability vector 'Ttl > O. Define I-' on a 'cylinder set' (iN' iN +1 , ..• , iv)
= {x E X I the symbol at index i
E
(u,v) ofx is ij },
to be 'Ttl •• 'Ttl •• I
•••••
'Ttl.·
(5.33)
As long as s =F 1, one can easily show that using cylinder sets to form a generating tree of partitions, (X, iF,l-') is a non-atomic Lebesgue probability space. Define T: X -+ X by where is = is+1 (the left shift). This is known as a Bernoulli shift, Bernoulli process, or i.i.d. (independent, identically distributed) process. A Bernoulli shift is completely specified by the probability vector ft, so often this is all that is given, i.e., the Bernoulli shift (1/2,1/2), also called just 'the 2-shift' or Bernoulli shift (1/3, 1/3, 1/3), etc. We want to compute h(T). To do so we first need to know it is ergodic.
Entropy I 87 Lemma 5.14 A Bernoulli shift is mixing and so ergodic (in fact it is a K-process but we're not ready to prove that yet).
Proof Let A and B be finite unions of cylinder sets. It is clear that, once n is sufficiently large Jl(T"(A) 11 B) = Jl(A)Jl(B).
Such finite unions are dense in F (w.r.t. Jl). It follows that if T(A) = A then = Jl(A)2 and so Jl(A) = 0,1. •
Jl(A)
Let P be the finite partition of X according to io(x), the O-position symbol, and
P = {Pl,P2'''.,Ps}. Sets in V;"=u T-i(P) consist of cylinders on indices (u, v), hence P clearly separates points and so generates. We will compute h(T) = h(T, P)
by using the Shannon-McMillan-Breiman theorem, (5.3) which identifies the entropy as the exponential shrinkage rate of a typical T,P,n-name. Now Jl(p"(x»
= 1t p(x) ·1t p(T(x»··· ·1tp(T"-'(X»
as p"(x) is the cylinder (P(x), P( T(x», ... , P(T,,-1 (x))).
This is then 11-1
II-I
n;~XI(T;(X» x
n
1
II-I
I 'oX2(T;(X» x ... x n;~X.(T;(X» s 2
where Xl(X) is the characteristic function of Pl. According to the Birkhoff theorem, given B > 0, for a.e. x E X, there is an N so that for n ~ N, 1 "-1
.
-ni=O L Xk(T'(x» =
1tk
± B.
Therefore for large n Jl(p"(x» = (1t~'1ti2 .. ·1t:·)"(1t 1 •• ·1tS)±E". For e > 0, choose B so small that - elOg 2(1t 1 0
and now
••• 0
1ts )
< "6
88
I Fundamentals of measurable dynamics
We conclude for a Bernoulli process
$$h(T) = h(T, P) = -\sum_{i=1}^{s} \pi_i \log_2 \pi_i. \qquad (5.34)$$
The entropy of a Bernoulli shift is thus easily computed. With this result we can now, for example, conclude that the 2-shift and 3-shift are non-isomorphic, as they have different entropies. A much deeper fact, Ornstein's isomorphism theorem, states the converse: any two Bernoulli shifts of the same entropy are isomorphic. We will prove this in Chapter 7. Also notice that we have finally seen the formula Σᵢ πᵢ log₂ πᵢ, which classically is the starting point for entropy theory. See Billingsley (1965) or Smorodinsky (1971) for good discussions of its abstract character. This is a generalization of our function H(α) which arose from Stirling's formula.
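For instance (a one-line computation of ours), formula (5.34) gives the entropies that distinguish the 2-shift from the 3-shift:

```python
# Entropy of a Bernoulli shift from formula (5.34): h = -sum pi_i log2 pi_i.
import math

def bernoulli_entropy(pi):
    return -sum(p * math.log2(p) for p in pi if p > 0)

print(bernoulli_entropy([1/2, 1/2]))        # 2-shift: 1.0
print(bernoulli_entropy([1/3, 1/3, 1/3]))   # 3-shift: log2(3) ~ 1.585
```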
Example 3 Ergodic Markov processes. Let [Xi.J] be the matrix of transition probabilities, and (Xl"'" Xs) be the stationary distribution of an ergodic Markov process. As in Chapter 1, X is the set of all doubly infinite sequences ( ... , i.. - l ' i..... ) where X1._"I. > O. The measure p. is defined on cylinders (io, i 1 , • •• ,it) by p.((io,i1,···,it
»=
Xio 'Xio,i, "'Xi,_hit"
A generating partition can be formed by setting P(x) = io(x) as in the previous example. We compute h(T) = h(T, P) exactly as in the i.i.d. case,
where ri,j(x) = Li.:-J X(I,J)(1(~» and X(i,J) is the characteristic function of the cylinder (i, j). Once again, by the Birkhoff theorem, for B > 0, for a.e. x, once n is sufficiently large p.(P.. (x» =
xp(x)
TI i,} 1[i.j>O
Hence for e > 0 choose e with
and we obtain
xl~ji.J)±.)(.. -l).
I 89
Entropy
-log2(/l(Pn(x» n
=
-log2 1tp(x) n
n - 1~ I - - - L.. 1t i 1t i ,} Og2 1ti,j n
+ _ B.
i,}
The Shannon-McMillan-Breiman theorem (5.3) now implies for ergodic Markov processes
$$h(T, P) = -\sum_{i,j} \pi_i \pi_{i,j} \log_2 \pi_{i,j}. \qquad (5.35)$$
Note that a Bernoulli shift is a special case of a Markov process with π_{i,j} = π_j, and the entropy formula of Example 3 reduces to that of Example 2.
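A short computation (ours; the 2-state transition matrix is an arbitrary example) evaluates (5.35) by first finding the stationary distribution of the chain:

```python
# Entropy of an ergodic Markov process from formula (5.35),
# h = -sum_{i,j} pi_i pi_{i,j} log2 pi_{i,j}, for an example 2-state chain.
import math
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])                 # transition probabilities pi_{i,j}

# stationary distribution: left eigenvector of P for eigenvalue 1
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()

h = -sum(pi[i] * P[i, j] * math.log2(P[i, j])
         for i in range(2) for j in range(2) if P[i, j] > 0)
print(pi, h)        # stationary distribution (0.8, 0.2) and entropy ~0.57
```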
5.7 Entropy and information from the entropy formula
As we indicated earlier, the formula for the entropy of a Bernoulli shift is the starting point for the classical development of entropy. We will now give a part of that development. Definition 5.3
F or a probability vector ft =
(1t 1" .. , 1t.)
we define
s
H(ft) = -
L 1t log2 1t i
(5.36)
i
i=l
(with the convention that extending x log x continuously to 0, 0 log2 0
=
0).
This is a convex function, i.e., Lemma 5.15
If ft and ft' are probability vectors and 0 < A < 1 then H(Aft
+ (1
- A)ft')
~
AH(ft)
+ (1
- A)H(ft')
with equality precisely when ft = ft'.
Proof Let F(A) = H(Aft
+ (1
- A)ft').
One easily computes
and equality holds only if 1ti =
1ti
for all i.
•
Exercise 5.3 Show that as a corollary of Lemma 5.15, the maximum value of H on n-dimensional probability vectors is log2(n).
90
I Fundamentals of measurable dynamics
Definition 5.4
For a finite partition P of a probability space (X, ff', 11), we set (5.37)
Now for a measure-preserving transformation T of (X, ff', 11) we have the past algebra -00
!!J =
V
T-'(P),
i=-l
and defined a.e., the conditional distribution of P given ~ written D(PI!!J) = (E(P11!!J), E(P21!!J), ... , E(Psl!!J»·
This is a probability vector-valued function on X. We have seen earlier that h(T,P)
=0
if and only if D(PI!!J) is an elementary vector (all O's but for a single 1) almost everywhere. Definition 5.5 The conditional information of P given its past .9 is I(PI!!J)
= H(D(PI!!J».
This is a function, not anum ber. More generally, if .Yf is any subalgebra of ff' then D(PI.Yf)
= (E(P11.Yf), E(P21.Yf),···,E(p.I.Yf»
and I(PI.Yf)
= H(D(PI.Yf».
This function is called the conditional information of P given .Yf. It is meant to measure the amount of 'information' gained when learning the set in P to which x belongs, having already known the sets in .Yf to which x belong. Equivalently, it measures how much 'randomness' remains in P after having learned all of .Yf. These are obviously only heuristic ideas. In fact, much precision can be given them. Our intentions are more technical and less philosophical. Set h(T, P)
=
f
I(PI!!J) dJl.
(5.38)
Our goal is to show that h(T, P)
= h(T, Pl.
First we check this equality for the examples we have considered.
(5.39)
Example 1  h̄(T, P) = 0 iff h(T, P) = 0.
We have seen that h(T, P) = 0 iff D(PI9) is an elementary vector a.e. But H(n) = 0 iff n is an elementary vector and the result follows. Example 2
For an i.i.d. process, by the definition of independence, D(PI9)
= (n 1 ,7t2, ... ,7t.).
Hence I(PI9) = H(7t) so h(T, P)
= H(7t) =
h(T, Pl.
Example 3 Fora Markov process, D(PI9) Pk and hence
f I(PI9)dJl = i
= (nl ,1,nl ,2, ... ,nk ,s)
if T- 1(x)
E
7tk·H(7tk,l,7tt,2, .. ·,7tt,.) = h(T,P).
k=1
Showing that 11 = h can be thought of as a generalization of what happens for Markov chains. For a Markov chain all the 'information' the past algebra gives us concerning the present is contained in the single symbol at time -1. Lemma 5.16 algebra,
Let P and H be finite partitions. Regarding H as a finite
h(P v H) = h(H)
Proof Let n i
= Jl(hJ
+ f I(PIH)dJl.
and ni,j
=
Jl(h i n p.) Jl(hJ J
= Jl(pjlhJ
Then h(P v H) = = -
L Jl(h; n pj)log2(Jl(h/ n PJ» ;,j L 7t/7t/,j log2(7t/7t;,i) ;,j
= - L.. " n·1,).n.log - "L., n·'.}.7t.log2(n .. ) 2(n.) , I 1 '.J i,i
;,J
= L n;log2(n;) i
Now
L n n;,jlog2(n ,J)' i9j i
i
(1-
92
I Fundamentals of measurable dynamics
f
I(PIH)dp.
=
-~~ 1t(pj nh i )log2
= -
p.(h(n Pj) p.(h;)
L 1ti,j1ti log2(1t ,j) i
i,j
and h(P v H)
= h(H) +
f
J(PIH)dp..
•
Corollary 5.17 Suppose n = (n1"'" n.) and n' = (n~, ... , n;) are probability vectors and, for 0 < 1 < 1, n"
= (11t 1, 11t 2 , ... , A.ns ,(1
- 1)n'1, ... ,(1 - 1)1t~).
Then H(1t") = 1H(1t)
+ (1
- 1)H(1t')
+ H(1).
•
Proof Exercise 5.4. Exercise 5.4
Prove Corollary 5.17.
Lemma 5.18 h(T, P)
1
(,,-1
= !~~ ~h :Yo T-'(P)
)
.
Proof Let e > 0 and select N so that for all n ~ N, for a set G of all but e of the x E X we have p.(Pn(X» =
Let B
= GC, p.(B)
}
T-i(p) "
Let k > m and S c V;'~-m T-i(P). Then D(
sli=2_1 T-i(P») D(sli=~_1 T-i(P) v R) =
and so
Letting k
--+ 00,
using the reverse martingale theorem
fI(SI~)dll fI(SI~ =
v R)dll·
This holds for any S lying in a finite span of the T-i(p). As Q c Vt~-oo T-i(P) and R c ffQ' R c Vt~-oo T-i(P) so there is a sequence of partitions Si c 'Vi~(iJ.m(i) T-i(p) and Si --+ R in symmetric difference. Thus
and
fI(Sd~)dll--+ fI(RI~)dll' fI(Sd~)dll fI(Sd~ fI(RI~ =
V
R)dll--+
We conclude I(RI~) = 0 a.e. and so R Corollary 5.31
If
~
c ~
and ffQ
v R)dll
=
C;; ffp.
O.
•
.
is trivial for a generating partition P then ffQ is trivial
~~~
This now gives us a circle of facts about the K-property analogous to Proposition 4.19 on weakly mixing. Definition 5.32
The following are all equivalent:
(1) ffp is trivial for all finite P;
(2) h(T, P) -::/= 0 for all finite P; (3) D(PI Vi·;'~i T- i (P)} "7 D(P) for all finite P; J
(4) T is a K-automorphism.
If T has a finite generating partition P, these are also equivalent to
Entropy (5)
I 103
ff" is trivial;
(6) D(Vli 1 ( 0, there is a sequence {nk} 5; 71+ of positive density with Ilg(snk(y» - g(y)1I2
_
Thus by Theorem 5.30,
Thus
~
~p.
(6.5)
116
I Fundamentals of measurable dynamics
f
I(pi
Q&'~ v ffQ )dji =
f
I(P)dJl
But JI(PIQ)djiis squeezed between these two by Corollary 5.26. Thus it also is JI(P)dJl. But then P .1 Q relative to ji. • We have already pointed out the common thread in the 'only if' direction of these three theorems. In the case of the K-property we were not being quite honest, as the argument was entropy and not L2-based. Notice that the 'if' directions also are completely parallel. When disjointness from identities, isometries or zero entropy fails it is because of the existence of a factor of this type. We will see in the next seciton that this is not a totally general phenomenon. Systems can fail to be disjoint without possessing common factors. Each of these results concerns two collections of transformations, characterizing one as the systems disjoint from the other. One can ask if the other member of the pair (identities, isometries, zero entropy) is characterized similarly. For identity maps this is false. There are maps which are disjoint from all ergodic maps but are not the identity. A nice example is T(x, y) = (x, x + y) on the two-dimen~ional torus. For isometries it is also false. Glasner and Weiss (1990) have constructed systems which are not isometries of compact metric spaces but are disjoint from all weakly mixing systems. For zero entropy though it is true. As part of our work in Chapter 7 (Theorem 7.24) we will see that any ergodic positive entropy system has a Bernoulli shift factor, hence is not disjoint from the K-systems.
6.4
Minimal self-joinings
When one considers two distinct systems it is possible, as we have seen, for
leX, Y) to contain only product measure. When the two systems are the same, l(X, X) must contain more. Elements of l(X,X) we refer to as self-joinings. There are automatically certain automorphisms of this system, the powers of T. The measures supported on their gr~p~s are self-joinings. Thus ~j supported on the graph {(X, TJ(x»} is in l(X, X). We call ~j an off-diagonal as it generalizes diagonal measure ~o. Definition 6.4 We say X has two-fold minimal self-joinings if leX, X) is the convex hull of {Jl x Jl,bj}j=-coWe call this two-fold minimal self-joinings because we could always consider joining any finite number of copies of X together. In this case there are again certain joinings that must exist. Define an off-diagonal joining to be one supported on a granh {(x, Ti1(x), ... , TJ,,(x»} in a (k + I)-fold joining. These must be in l(X, ... ,X). More generally, one could partition the k copies into
JOinings and disjointness
I
117
subsets. On each subset place an ofT-diagonal measure, then take the direct . product of these ofT-diagonals. Definition 6.5 We say X has k-fold minimal self-joinings if all k-fold selfjoinings of X are in the convex hull of such products of ofT-diagonals. In these definitions we are not assuming product measure is necessarily ergodic. We will see though that if a system has minimal self-joinings and is not weakly mixing, then it must be a finite rotation. We will use this notion as a constructive tool to exhibit control of certain aspects of a dynamical system not directly reachable with our current methods. Our next theorem sets the tone for these ideas. Theorem 6.12 If X is totally ergodic and has two-fold minimal self-joinings, then it commutes only with its powers and has no non-trivial factor algebras. Proof Suppose To S = SoT, and of course S is measure-preserving. First assume S is invertible. Construct the joining ji supported on the graph of S (Example 6.2). Now ji-a.s. the two coordinate algebras are equal, and ji is ergodic for TxT. Thus ji is either J1. x J1. or some ~j. It cannot be J1. x J1., as this does not identify the coordinate algebras. Thus ji = ~j for some j. But then ji( {(x, S(x»} Ll {(x, Tk(X»}) = 1
and some Sex) = Tj(x) J1.-a.s. If S is not invertible, then S-1 (ff) does not separate points and is a nontrivial factor algebra. All that remains, then, is to show there are no such. Let cp : X -+ Z be a factor map. Let ji be the relatively independent selfjoining of X over Z. Now ji may not be ergodic, but it must be of the form co
+ (1
J1. = a(J1. x J1.)
- a)
L aA,
j=
-00
L
where 0 ::; a ::; 1, and aj ~ 0, aj = 1. Suppose Z is non-trivial, i.e., there is a set A E cp-1(£) with J1.(A) =F 0,1. By Lemma 6.5, J1.(A)
=
ji(A x A)
=
aJ1.(A)2
+ (1
co
- a)
L
aj J1.(A (\ T- j(A».
Of the numbers J1.(A)2 and J1.(A (\ T-i(A» only one is as large as J1.(A), and that is J1.(A (\ T(-O)(A». All the others are strictly smaller, as T is totally ergodic. But then we must have a = 0 and aj = 0 unlessj = O. We conclude ji
and
= ~O,
118
I Fundamentals of measurable dynamics jI(A x X)
=
jI(A x A)
=
ji(X x A)
for all A E $' and so again by Lemma 6.5, cp-l(£,)
= $' and cp is an isomor-
ph~m
_
Corollary 6.13 (Of the proof). If X has two-fold minimal self-joinings but is not totally ergodic, then X is finite. Proof As X is not totally ergodic, it has a factor algebra that is a finite point space (Exercise 3.3). Constructing ji the relatively independent self-joining over this factor the above proof still says IJ. = 0 as ~(A)2 < ~(A). But now, for any point Z E Z, by Corollary 6.7, the fibre measure of ji over Z is ~l,z X ~2,z' Hence the fibre measure over the entire first coordinate algebra is J1.2,z' But this measure is atomic, equal to LJ= -00 a/j(Ti(x» over x. Thus Z is atomic, and the fibre measures J1.z over Z are atomic. Thus X itself is atomic.
-
In fact, any cyclic permutation on a finite set of points has minimal selfjoinings. If such were all of them this idea would not be very useful.
6.5
Chac6n's map once more
We will show here that Chacon's map has minimal self-joinings of all orders. This argument is due to del Junco, Rahe and Swanson (1980). As always with Chacon's map, it is the spacer placed between the second and third blocks in the cutting and stacking that brings otT the argument. By the time we are finished we will have come to a very precise understanding of the name structure of the transformation. The core of the proof is to show that if a pair of points (x, y) gives the expected limit values for the ergodic theorem applied to some self-joining ji, but lie on distinct orbits of T, then in fact ji = J1. x ~. What we have to show is that for such a pair (x, y) the Cesaro averages are those of product measure. Our first technical lemmas tell us how to recognize product measure. Lemma 6.14 If then ji = ~ x v.
X and Yare ergodic and ji E l(X x Y) is (id x S)-invariant,
Proof As S is ergodic, by Theorem 6.8, ji = ~ x v. Our next lemma tells us how we see (id x S)-invariance in a ji E is far easier to prove than express. Lemma 6.15 Suppose for X, and Q for Y. Let
X and
-
leX, X). It
Yare ergodic, P is a finite generating partition
I
Joinings and disjointness d- =
nQ C~n
(T x Sri(p x
a countable generating algebra of cylinder sets. Suppose ii E J(X, X) is ergodic and (x, y) E X 1 n-l
X
119
Q»).
Y is a point satisfying
.
-n i=O L XA(T'(x), S'(y» -+n ji(A)
(6.6)
and (6.7) for all A E.£ Suppose there are intervals (ij,jk) S Z, i" :5: 0 :5: j", Uk - ik) -+ and subintervals I", I" + tk S (ik,A) with #(1,,) ~ ex(A - ik + 1). Finally, for i E I",
00,
an ex > 0
(1) P(Ti(X» = P(Ti+t,,(x»; and (2) Q(Si(y» = Q(Si+t,,+1(y».
(6.8)
We conclude ji is id x S-invariant and hence ji = p. x p.. Proof Let (C x C) E V7';" -no (T X S)-i(P x Q) be some cylinder set. We want to show ji(C x S(C'» = ji(C x C'). This of course completes the result. We have convergence of the Birkhofftheorem in both directions for T x S on cylinder sets. The intervals Ik and Ik + tIc occupy a fraction ex > 0 of (i",it). It is a simple argument that
and lim ""'00
#
(II) t
L
Xc Xs(c,)(Ti(x), Si(y» = ji(C x S(C'»,
(6.10)
ieI"+I,,
(Break (it ,it) into pieces and consider those on which this convergence must occur. This forces convergence on any differences of such pieces that occupy at least a fraction IX > 0 of (ik,j,,).) But now for i more than 2no + 1 positions from the ends of I", (6.8) tells us (T x S)i(X,y) E C X C exactly when (T x S)i+t,,(X,y) E C x S(C'). But then by (6.9) and (6.10) .. .L- XcxdT'(x),S'(y» I#11 (~ "
Letting k
,eI"
-+ 00
we are done.
-. ~ ~ ,eI"+',,
.. XCXs(c,)(T'(x), S'(y» )
I:5: 4no#1+ 2 . k
•
120
I Fundamentals of measurable dynamics
What we need to do now is to show that we can find the structure of this lemma in Charon's map. We first exhibit a generating partition. Lemma 6.16 Let X be Chacon's map. The two-set partition P of X into points in the zero block, and its complementary set of points added in spacers, generate.
Proof We argue inductively that the T,P-name of a point determines its level in a tower. This is equivalent to saying the T,P-name can be broken in only one way into n-blocks, each of which is the name of a passage up through the n-tower. This is true for O-blocks, as those are the individual occurrences in the name of Pl' Suppose we can recognize passages through the n-tower uniquely in the T,P-name of x. These blocks are disjoint and separated in the name by at most one occurrence of P2' Such single P2'S indicate a point on the orbit of x added as a spacer after stage n. Whenever we see an n-block with such a spacer both above and below it, this n-block must be the third (top) n-block in an (n + I)-block. We must see such an n-block at least one out of every nine. Once we see one such the entire name breaks uniquely into triples of n-blocks forming (n + I)-blocks. • This argument has begun an investigation of T,P-names in X, telling us such a name breaks up into a hierarchy of n-blocks, the n-blocks grouping into triples to form the (n + I)-blocks. Let N(n) be the number of levels in the nth stack, hence the length of a passage through an n-block. List these intervals as 1(1, n), ... , I(N(n), n) from bottom to top. Thus T(/(i, n» = I(i + 1, n) for i < N(n). Let hn(x) to be the index withx E I(h,,(x) + 1, n). It is undefined if x is not in the nth stack. Set k,,(x) to be 1,2 or 3 depending on whether x is in the first, second, or third occurrence of an (n - I)-stack in the n-stack. Again k n is undefined if x is not in the (n - I)-stack. In terms of the T,P-name of x, the origin is at a position hn(x) to the right of the beginning of its n-block. When (n - I)-blocks get grouped to form n-blocks, k,,(x) tells us which of the three contain the origin. Lemma 6.17 density 1/3.
For p.-a.e. x, the set of n E ~ with k,,(x)
= 1, 2 or 3 each has
Proof For a.e. x, kno(x) is defined once no is large enough. Let .r..o = U~~~o) I(i, no) be the no-tower. It is a simple induction that the functions k"o+l' k"o+2"" restricted to.r..o are independent and take on values 1, 2, 3 equally likely. By the law of large numbers (just the ergodic theorem applied to the Bernoulli shift (1/3,1/3,1/3», for p.-a.e. x E .r..o ' k,,(x) = 1,2,3, each with density 1/3 in ~. •
Joinings and disjointness
I
121
The sequence of functions k,,(x) 'drive' the construction of T. If we know their values we can construct the T,P-name of X, the first index no for which k"o(x) is defined tells us x is in the spacer of the no-block, i.e., h"o(x) = 2N(no - 1) + 1. Each successive value k,,(x) tells us how to extend the name across the n-block. If these extensions covered all of Z we would get the full name this way. As k,,(x) = 2 infinitely often, with probability one we do cover all of Z. In fact the only points where this procedure does not generate the full name is when either k,,(x) is asymptotically always 1 or always 3. In this case we only get a half name. These two half names can be combined to form a full name, two in fact, one with a spacer between the two half names. Lemma 6.18
For p.-a.e. x, a point y
= Ti(x) iff k,,(x) = k,,(y) for all large
enough n.
Proof Suppose y = Ti(x). Then once h,,(x) and N(n) - h,,(x) are at least Ijl, k" +1 (x) = k"+1 (y). On the other hand, if k,,(x) = k,,(y) for all n ~ no, then for some iii < N(no), y and Ti(x) have identical T,P-names. As P generates, y = Tl(x). • Theorem 6.19 (del Junco- Rahe-Swanson) Chacon's transformation has twofold minimal self-joinings.
Proof Suppose ji is an ergodic self-joining of X. Assuming ji =F ~j for any j, ji gives measure 0 to the union of the graphs of all powers of T. Thus for ji-a.e. (x, y), k,,(x) =F k,,(y) for infinitely many values n. Select (x,y) so that the Birkhotl' theorem is satisfied for both (T x T) and (T- I x T- I ) with reapect to ji for all sets in .s;[ =
"VI C~" (T
X
T)-i(P x P)}
We are going to establish the structure of Lemma 6.14 on the (T x T), P x P-name of(x,y). Let nl' n2' '" be those values with k" (x) =F k" (y) and both are defined. We know ji-a.s. both k,,(x) and k,,(y) are i~nitely' often 2, although these may never be at a value n•. But one of the two following cases must occur: (1) For infinitely many values s, either k".-I(X)
(2) for infinitely many k"s-l(X) = k".-I(y).
= 2; or (6.11)
The only way (2) can fail is if k,,(x) =F k,,(y) for all sufficiently large n. In this case k" -1 (x) = 2, infinitely often. In either case, in the T,P-name of x and y, the indices of the (n. - 1)-blocks
122
I
Fundamentals of measurable dynamics
Table 6.1 T,P-name of x T,P-name of y Bo BI *Bz
----~-----------------
2
R'I Bb*B~
2
2
3
B-zB_l *80(*)
----------B~
Bl 8 z *B3 ----~-----
B~*Bz
2
containing x and y respectively intersect (overlap) in at least N(n. - 2) = (N(n. - 1) - 1)/3 places. Consider the partition of the T,P-names ofx and yinto (n. - I)-blocks with possible spaces in between. We want to list the various possibilities we might see near the origin depending on the values kIf (x) and k" (y). Table 6.1 lists them. A Bi indicates an (n. - "I)-block i~ the T,P-name of x, B; in that of y. A * indicates a spaces, (*) a possible spacer. The origin lies in Bo and Bo. . In all six cases we have identified a value It I ~ 1 where one sees either ( 1)
tl!!*!1!.!J1 B'B' ,or ,
(2)
'+1
I-Bf.~~~I.
lB,
B'+1
r
(6.12)
The overlaps (B, n B;)and (B'+1 n B;+1) are each at least (N(ns - 1) - 1)/3 - 2 and as It + 11 ~ 2, are contained in an interval (i.,j.)
= (- 2N(n. - 1) - 2,2N(n. - 1) + 2).
In (1), when the spacer is between B_t and B_{t+1}, set I_s = B_t ∩ B'_t and I'_s = B_{t+1} ∩ (B'_{t+1} + 1).
In (2), set I_s = B_{t+1} ∩ B'_{t+1} and I'_s = B_t ∩ (B'_t + 1). In either case
#(I_s) = #(I'_s) ≥ (N(n_s − 1) − 7)/3 ≥ (1/20)(j_s − i_s + 1)
once s is large enough. The T,P-name across all (n_s − 1)-blocks is the same, so condition (6.8) of Lemma 6.15 is established and μ̂ = μ × μ.  ∎

Theorem 6.20 (del Junco–Rahe–Swanson)  Chacon's transformation has minimal self-joinings of all orders.

Proof  We know the result for k = 2. Assume it for some value k ≥ 2. An off-diagonal joining of any number of copies of X is isomorphic to a single copy. Thus any (k + 1)-fold joining which, when restricted to some pair of copies, is an off-diagonal is in fact a joining of just k copies. By induction, then, we can assume that on any subset of k of the copies of X we have product measure. What we will do is show that μ̂ is either (id)^{k−1} × T × id- or (id)^{k−1} × (T × T)-invariant. In either case, Lemma 6.14 completes the result. In the first k coordinates consider the set S_n of points (x_1, x_2, ..., x_k) for which h_n(x_i) ≤
N(n)/10 for i = 1, ..., k, and k_{n+1}(x_i) = 1 for i = 1, ..., k − 1, but k_{n+1}(x_k) = 2. As μ̂ is product measure on these k coordinates,

μ̂(S_n) ≥ [(1/10 − 1/N(n))(1/3)]^k ≥ (1/40)^k     (6.13)

once n ≥ 3.
Table 6.2  The n-blocks (and spacers) near the origin in the T,P-names of x_1, ..., x_{k−1}, x_k.
Thus μ̂(lim sup(S_n)) > 0. As μ̂(S_n Δ (T × ··· × T)S_n) ≤ k/N(n) < k/3^n, lim sup(S_n) is (T × ··· × T)-invariant μ̂-a.s. As we assume μ̂ an ergodic joining, μ̂-a.s. (x_1, ..., x_k, x_{k+1}) ∈ S_n for infinitely many n. Suppose (x_1, ..., x_k, x_{k+1}) ∈ S_n and consider the n-blocks near the origin (Table 6.2). We know ∩_{i=1}^{k} B_0^i and ∩_{i=1}^{k} B_1^i both have length not less than 9N(n)/10 − 1. What do we see in the T,P-name of x_{k+1} across this section? We see two blocks B_0^{k+1} B_1^{k+1} or B_0^{k+1} * B_1^{k+1}, where ∩_{i=1}^{k+1} B_0^i and ∩_{i=1}^{k+1} B_1^i are both ≥ (1/2)(9N(n)/10 − 1). Here B_0^{k+1} is chosen to be the block whose intersection with ∩_{i=1}^{k} B_0^i is largest, not necessarily the one containing the origin. If for infinitely many n we see B_0^{k+1} B_1^{k+1} then we conclude, as in Theorem 6.19, that μ̂ is (id)^{k−1} × T × id-invariant. If B_0^{k+1} * B_1^{k+1} occurs infinitely often, μ̂ is (id)^{k−1} × (T × T)-invariant. In either case, induction forces μ̂ to be product measure.  ∎

The proofs of Theorems 6.19 and 6.20 followed from a rather simple observation about the name structure of (T, P). The map T^{-1} has this same structure, except the spacer is always placed between the first and second, rather than second and third, blocks. In fact, at each stage of the construction one could make an independent decision as to which of these two choices to make. Explicitly, let e = {e_1, e_2, ...} be an infinite sequence of 0's and 1's. For each such sequence we build a transformation T_e analogous to Chacon's. The spacer at stage n of the construction is placed between the first and second (n − 1)-blocks if e_n = 0, and between the second and third if e_n = 1. Chacon's map corresponds to e identically 1 and its inverse to e identically 0. In fact, interchanging 1's and 0's in e always takes T_e to T_e^{-1}. A small computational sketch of this block construction follows.
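The following Python fragment (ours, not from the text) builds the symbolic n-blocks of the maps T_e; the symbol 'a' stands for a level of the base interval and '*' for a spacer.

```python
def block(n, e):
    """Return the n-block of T_e as a string over {'a', '*'}.

    e is a sequence of 0's and 1's; e[k-1] = 1 puts the stage-k spacer
    between the second and third (k-1)-blocks (Chacon's choice),
    e[k-1] = 0 between the first and second.
    """
    B = 'a'                          # the 0-block: a single level
    for k in range(1, n + 1):
        B = B + B + '*' + B if e[k - 1] == 1 else B + '*' + B + B
    return B

chacon  = block(3, [1, 1, 1])        # e identically 1: Chacon's map
inverse = block(3, [0, 0, 0])        # e identically 0: its inverse
assert len(chacon) == 40             # N(n) = 3N(n-1) + 1, so N(3) = 40
assert chacon == inverse[::-1]       # reflects that flipping e inverts T_e
```

Read left to right, the string is the forward base-versus-spacer name over the n-tower; the heights N(n) = 3N(n − 1) + 1 grow as 1, 4, 13, 40, ....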
Exercise 6.2
1. Show that all T_e have minimal self-joinings of all orders.
2. Show that if e_n = e'_n for all n ≥ N_0, then T_e ≅ T_{e'}.
3. Show that if e and e' differ in infinitely many places, then T_e and T_{e'} are disjoint. Hint: show any ergodic joining is id × T_{e'}-invariant.

Thus Chacon's map is not isomorphic to its inverse.

Exercise 6.3  Show that a map with two-fold minimal self-joinings must have zero entropy.

Exercise 6.4  Show that any non-atomic system with two-fold minimal self-joinings must be weakly mixing.

Thus all the maps T_e have entropy zero, are weakly mixing, and, if not isomorphic, are disjoint. Notice how much more precise the information gained from the explicit name structure of a system is than the more global mixing or entropy data.
The disjointness observed in Exercise 6.2, parts 2 and 3, is a general fact: among systems with two-fold minimal self-joinings, two are either disjoint or isomorphic. This result is just the beginning of a much deeper analysis of minimal self-joinings on which we will not embark. We refer the interested reader to del Junco and Keane (1985), del Junco and Rudolph (1987), and Rudolph (1979).
6.6 Constructions
Systems with minimal self-joinings have no factors and commute only with their powers. This is perhaps too simple a situation. From it, though, we can build up examples whose behavior is controlled, but is not quite this trivial. Start with X, a non-atomic system with minimal self-joinings of all orders. Let X^k be its k-fold direct product. On this we can define a group of transformations as follows. For i ∈ Z and π a permutation of {1, ..., k}, set

U_{(i,π)}(x_1, ..., x_k) = (T^i(x_{π(1)}), ..., T^i(x_{π(k)})).     (6.14)

Thus U_{(i,π)} acts by permuting the k coordinates by π and acting by T^i on each. Notice that

U_{(i,π)} ∘ U_{(j,π')} = U_{(i+j, π∘π')}.
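As a quick sanity check on this composition rule, here is a small Python sketch (ours, not from the text) representing U_{(i,π)} symbolically and verifying that composing two of them adds the powers of T and composes the permutations.

```python
from itertools import permutations

def U(i, perm):
    """Return U_{(i,perm)} acting on k-tuples of 'points', where a point is
    modelled as (label, m) and applying T^i just adds i to m."""
    def act(xs):
        return tuple((xs[perm[j]][0], xs[perm[j]][1] + i) for j in range(len(perm)))
    return act

def compose_params(i, p, j, q):
    # composing U_{(i,p)} after U_{(j,q)} adds the exponents and composes
    # the permutations (here the composed index map is r -> q[p[r]])
    return i + j, tuple(q[p[r]] for r in range(len(p)))

k, pts = 3, (('a', 0), ('b', 0), ('c', 0))
for p in permutations(range(k)):
    for q in permutations(range(k)):
        lhs = U(2, p)(U(5, q)(pts))
        rhs = U(*compose_params(2, p, 5, q))(pts)
        assert lhs == rhs
```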
We write X_{(i,π)} to indicate U_{(i,π)} acting on the k-fold direct product.

Lemma 6.21  For any i ≠ 0 and π, π' ∈ S(k), a joining of X_{(i,π)} and X_{(i,π')} is in fact a self-joining of X_{(1,id)}, hence a 2k-fold self-joining of X.

Proof  Suppose μ̂ ∈ J(X_{(i,π)}, X_{(i,π')}). As U^{k!}_{(i,π)} = U^{k!}_{(i,π')} = U_{(j,id)}, where j = k!·i ≠ 0, μ̂ is a self-joining of U_{(j,id)}. As we can decompose μ̂ into ergodic components for U_{(j,id)} × U_{(j,id)}, we assume the action is ergodic. Setting

μ̄ = (1/j) Σ_{l=0}^{j−1} μ̂ ∘ (U_{(1,id)} × U_{(1,id)})^l,     (6.15)
μ̄ is an ergodic self-joining of X_{(1,id)}, hence a 2k-fold self-joining of X. Thus μ̄ must be a product of off-diagonal measures. As X is weakly mixing, U_{(j,id)} × U_{(j,id)} acts ergodically on such a μ̄. But (6.15) writes μ̄ as an average of U_{(j,id)} × U_{(j,id)}-invariant measures. Thus they must all be equal to μ̄, i.e., μ̂ = μ̄. The ergodic self-joinings of X_{(j,id)} agree with those of X_{(1,id)}. These are the extreme points of all self-joinings, so J(X_{(1,id)}, X_{(1,id)}) = J(X_{(j,id)}, X_{(j,id)}). But J(X_{(i,π)}, X_{(i,π')}) is squeezed between these.  ∎
Example 1  We have already seen that a non-atomic system with two-fold minimal self-joinings could have no roots (a power cannot be a root). Here is an example of a system with more than one non-isomorphic square root. Let k = 2, T_1 = U_{(1,id)} = T × T and T_2 = U_{(1,(1,2))}. We use cycle form to represent π. Thus T_1^2 = T_2^2 = T^2 × T^2. We want to show T_1 and T_2 cannot be isomorphic. Suppose μ̂ is a joining supported on the graph of an isomorphism. This, by Lemma 6.21, must be a four-fold self-joining of X. It is product measure on both the first two and the second two coordinates, but identifies these two product algebras. Call them ℱ_1 × ℱ_2 and ℱ_3 × ℱ_4. As μ̂ is a product of off-diagonals, it must identify ℱ_1 with ℱ_3 or ℱ_4. Suppose ℱ_3. Then
ℱ_3 = ℱ_1 = (T_1 × T_2)(ℱ_1) = (T_1 × T_2)(ℱ_3) = ℱ_4,
μ̂-a.s. This is of course false. The same holds if ℱ_1 is identified with ℱ_4. We conclude no isomorphism exists. Notice that even though T_1 and T_2 are not isomorphic, they do have a common factor, the factor of symmetric sets invariant under interchanging the two coordinates. This factor algebra has two-point fibres in both systems. The relatively independent joining over this common factor is ergodic for T_1 × T_2 but not for T × T × T × T. It is obtained by averaging two four-fold joinings, one supported on points {(x_1, x_2, x_1, x_2)}, the other on {(x_1, x_2, x_2, x_1)}. Also notice that T^2 × T^2 has other square roots, for example

T_3: (x_1, x_2) → (x_2, T^2(x_1)).
This, though, is isomorphic to T_2 by the isomorphism φ: (x_1, x_2) → (T^{-1}(x_1), x_2). In fact, any square root of T^2 × T^2 is either T_1 or isomorphic to T_2.
Exercise 6.5  Construct a system with at least countably infinitely many pairwise non-isomorphic square roots.

Example 2  It is quite simple to find a system with no non-trivial factors but which commutes with more than its powers: just take T^2 where T has two-fold minimal self-joinings. To get a system which commutes only with its powers, but has non-trivial factors, we do the following. Let U = T × T × T. We will take a certain factor of this, the σ-algebra of sets invariant under ρ: (x_1, x_2, x_3) → (x_2, x_3, x_1).
This consists of sets of the form A ∪ ρ(A) ∪ ρ^2(A), A ∈ ℱ_1 × ℱ_2 × ℱ_3. Let Y be U restricted to this factor. Now Y has factors; for example, those sets also invariant under every permutation of the three coordinates
form a non-trivial subalgebra. Now Y is a factor of X × X × X with three-point fibres. The fibre consists of the three points (x_1, x_2, x_3), ρ((x_1, x_2, x_3)) and ρ^2((x_1, x_2, x_3)), which Y cannot separate. If μ̂ is a self-joining of Y, then μ̂ can be extended, via the relatively independent joining, to a self-joining of (T × T × T). Suppose φ commutes with S, the transformation of Y. Support μ̂ on the graph of φ and extend it to μ̃, a self-joining of (T × T × T). Now μ̃ still has Y as a factor, but has nine-point fibres over it, consisting of a choice for x_1 in the first copy, and for x'_1 in the second, from among the three. Thus μ̃ is invariant under the nine-element group H generated by ρ × id and id × ρ. The copy of Y consists of those sets invariant under this group H. Now μ̃ may not (in fact, cannot) be an ergodic joining. H permutes the ergodic components of μ̃. As the fibres over Y are atomic, in any ergodic component μ̃_z of μ̃, the coordinate algebras ℱ_1, ℱ_2 and ℱ_3 must be identified bijectively with ℱ_4, ℱ_5, ℱ_6. If any one were independent, the fibres would not be atomic. If ℱ_1 is identified with ℱ_4, then automatically ℱ_2 with ℱ_5 and ℱ_3 with ℱ_6. This is an ergodic joining. Acting on it by ρ × id and ρ^2 × id we get two others, and μ̃ must be an average of such triples. In μ̃_z the identification of ℱ_1 with ℱ_4 is via some power of T, i.e., the measure restricted to this pair of algebras is supported on a graph {(x_1, T^i(x_1))}. Acting by ρ × ρ, there must also be a joining supported on {(x_2, T^i(x_2))} and similarly {(x_3, T^i(x_3))}. This says that the set of sextuples {(x_1, x_2, x_3, T^i(x_1), T^i(x_2), T^i(x_3))} has μ̃-positive measure and, as it is Y-measurable and T × T × T-invariant, measure 1. We conclude that μ̂ is supported on the graph of S^i and hence this is φ.

In both these examples we considered algebras of sets invariant under some group of coordinate permutations. In our last two examples they will again carry the day.

Exercise 6.6  Let H ⊆ S(n) be a subgroup of the symmetric group.

If (X_i, P_i) → (X, P) weak*, then lim_{i→∞} h(T_i, P_i) ≤ h(T, P).
Proof  We know that for any ν ∈ η_P, P is a generator for (Y_P, ν). We know

h_ν(S) = lim_{n→∞} h_ν(P | ∨_{i=1}^{n} S^{-i}(P))

(see Exercise 5.5) and in fact the terms in this limit are non-increasing (Theorem 5.27). For any fixed n,

lim_{i→∞} h_{ν_i}(P | ∨_{j=1}^{n} S^{-j}(P)) = h_ν(P | ∨_{j=1}^{n} S^{-j}(P)),

as the values h_{ν_i}(P | ∨_{j=1}^{n} S^{-j}(P)) depend only on the ν_i-measure of sets in ∨_{j=0}^{n} S^{-j}(P) ⊆ P_n. As

h_{ν_i}(P | ∨_{j=1}^{∞} S^{-j}(P)) ≤ h_{ν_i}(P | ∨_{j=1}^{n} S^{-j}(P)),

we get lim_{i→∞} h_{ν_i}(P | ∨_{j=1}^{∞} S^{-j}(P)) ≤ h_ν(P | ∨_{j=1}^{n} S^{-j}(P))
00.
1
•
132
7.2
I
Fundamentals of measurable dynamics
Painting names on towers and Generic names
We now describe a method for constructing partitions labelled by P. It is the essential tool for creating our representations. We call it painting a name on a tower. Suppose we have a finite P-name, i.e., a sequence P = (Pio,Pi"""PiN_,)ePN.
Also, in some dynamical system disjoint sets
X we have a tower of height N
consisting of
F, T(F), ... , TN-I(F).
To paint the name P on this towe~is to define a map P; Ui=;ol TJ(F) --+ P where P(x) = Pi. for x E Ti(F). Notice P does not partition all of X, only the tower. Suppo~e we are given a finite collection of names PI> ... , Ps E pH, and for each a tower with bases F l , ... , F., and height N, all of which are disjoint. We can paint Pi onto the tower over Fj giving a partition P of the union of all the towers. To actually partition all of X, imagine P to have been extended outside the towers in some perhaps arbitrary way. We will describe explicitly how when need be. As we said earlier, this painting of names on towers is the critical tool we will use in the proofs of our representations. There are three parts to the process of painting. We must have names to paint, towers to paint them on and an assignment of names to towers. We begin with a discussion of names. Suppose (X, Ii) is a process with state space P. Let {Bi.,,}~i"l be a listing of the elements ofVi"=-. T-i(P). Definition 7.1 We say x E X is e, N-generic for (X, Ii) if for all n, 0 ~ n ~ log2(2/e) + 1, and all Bi•• E V;"~-. T-i(P), /
n1 ifo XB;)P(X» N-l.
/
- ~(Bi.")
0 there is a partition P of X with
II(Yp , v),P;X,PII
X2 ) and A2 E J(X 2 , X3 ), then we can construct.u, ajoining of Xl' X2 and X3 simultaneously as the relatively independent joining of fl.1 and A2 over the common factor X2. Certainly
(l(P. x X2 x X3 AX. X X2 x P3 ) ~ p(Pl x X 2 X X3 L1 Xl X liz x
+ P(XI x P2
=.u(PI
X
X3AX.
X
X3)
X2x
P3 )
x X 2 AX l x P2)+P2(P2 x X 3 L1X 2 x
P3 ),
and as (l, restricted to the first and third coordinates is in J(Xl' X 3 ) we are done. 2 and 3. Remember J(X I , X2 ) is weak"'-compact and convex. The function V(A) = A(P. x X2 L1 Xl x P2 ) is weak"'-continuous and linear. Hence it has a minimum achieved on the boundary of J(X.,X2)' By Theorem 6.3, if Xl and X2 are ergodic this boundary consists of the ergodic joinings. _
-- -
-
-
e
d(X I ,Pl; X 2 , P2) < 4Iog 2 (2/s)
then
IIX.,P1 ;X2 ,P2 11
there is a sequence of names p;(x) = {P~o.n)(X)'P~I.n)(X), ... , P~n.nlx)} E pn which becomes ever more generic for (X2 , P2 ) and for which 1
lim -IIPn(x),p;lIl, n-+oo n
-
= d.
We end this section with a very important perturbation argument. We saw in Theorem 7.9 that entropy is d-continuous. Hence a small d-perturbation cannot change entropy by much. What we want to see now is that for v E flp, as long as hw(x) < log2(n), i.e., v is not the Bernoulli n-shift, we can perturb some entropy into v. We need a particularly strong version of this, we define a strengthening of the d metric. Definition 7.5
Set
d N(Xl>P1 ;X2,P2) = d( X 1 'iYN Tl-i(filf,X2'iYN T2- i(fi2»), i.e., we measure not just how closely PI can be joined to fi2 , but how closely
Vl!-N T1- i(fil ) can be joined to vr=-N T2- i (P2 ). It is a computation that
dN(X I , PI; X 2, P2) :5: Nii(X I , fi,,; X 2, P2)· Exercise 7.7
Show that for any e > O. for N > log2 (21e)
+1
IIv l , v2 11 :5: dN(vl> V2) + e. Theorem 7.12 so that
For any VI
E flp,
e > 0 and natural number N, there is an V2
E flp
(1) h. 2 (S) ~ hw,(S) + e(log2(n) - h.,(S»; and (7.8)
Further,
if VI is ergodic, so is v2.
Proof Let Vo be the Bernoulli n-shift (n- 1, n-l, ... ~ n- 1) with h.o(S) = log2(n), the unique maximal entropy measure in flp. Let (X, R) be any weakly mixing process, R, a partition into two sets r l , r2 where
IlCDN T-i(r ») ~ l
(1 - e)ll(rl
)
and ll(r2) = e. Such a partition can be found in any non-periodic ergodic X using the Rohlin lemma. Consider the space Z = Yp >< X >< Yp with measure VI >< 11 >< Vo. and trans-
formation S × T × S. Construct a partition P̃ of Z as follows:

P̃(y_1, x, y_2) = P(y_1) if x ∈ r_1, and P̃(y_1, x, y_2) = P(y_2) if x ∈ r_2.
Let V2 E v(Z,P) E '1p, __ To see (2) just notice that VI and V2 are joined in Z as P(Yl) and P(YI' x, Y2)' In Z, if x E n~-N T-i(rd, then (YI' x, Y2) and Yl belong to the same elements ofV~_N(S x T x Sfi(p) and Vf=-NS-1(P), respectively. Thus iJN(v l , V2) .::;; 1 -
P,CDN T-i(rd) .::;; 2e.
For (1) we remember hV2 (S,P)
= hv,xPxvo(S x T x S,P)
fI(p15Z1 ~ fI(pIS~1 S-i(P) L
=
(S x T
=
Xr
x
2
X
Y
=
x
I(Y x X X
S)-i(P))d(V l
X
X
fr2xY (p I. VS-i(P») ,=-1
+ Lxr, I(pII'11
P, x vo) I
x p, x Vo
pi I'll S-i(P) I
x p, x vol
Yli'11 S-i(P)
iYa i-i(R) Xi'll S-i(P»)d(V I
X
;'10 T-i(R) Xi'll S-I(P»)dV
iYa T-i(R) x I'll S-I(P»)d(V
+ Lxr,n I(P x x
X
I
x p, x Vo)
d(p, x vol
S-I(P»)d(V
I
x p,)
= μ(r_2) log_2(n) + μ(r_1) h_{ν_1}(S) = ε log_2(n) + (1 − ε) h_{ν_1}(S).  ∎
•
7.4 Pure columns and Ornstein's fundamental lemma
Ornstein's fundamental lemma is the critical piece of information we need for our proofs of the Ornstein and Krieger theorems. It is a painting argument. To this point our painting arguments have been rather robust, just painting
1 ...£
I r-unaamentals ot measurable dynamics
a name or collection of names on essentially any tower. Here though we will be given some pre-existing partitions and will wish to repaint them to improve their character. Our first step is to understand how a pre-existing partition paints a tower. Let (X, P) be an ergodic process, and F, T(F), ... , TN-I(F) be a tower in X. The T,P,N-names of points in F cut this tower into subtowers. Suppose P = (Pio,Pi" ... ,PiN_)E pN is some particular P,N-name. Let Fp c F consist of those points with PN(X) = p. Such sets partition F, and are bases for disjoint towers. Such a tower over F, is called a pu~e P-column because each ~ttle piece Ti(F,) n Ti(F) lies in a single, pure set of P, 0 ~ j < N. Notice that Pis simply a painting of the names P on the columns Fp. We want to prove tower-related versions of the ergodic theorem, and the Shannon-McMillan-Breiman theorem. Both these results will rest on the same trick. Even though the base set F occupies only a fraction l/N of the tower, thickening it to UI~1 Ti(F) occupies a fraction 0(. Further, the name of a point in the thickening differs in a controlled way from the name of the point below it in F. Theorem 7.13 Suppose (X, P) is an ergodic process. For any e > 0, once N is large enough in any tower F, T(F), ... , T N- I (F) with N-I
(1) J1 ( i~ Ti(F)
(2) J1( {x
E
)
> e; we can conclude
Fp: P is e-generic for (X, P)}) > (1 - e)J1(F).
Proof Notice that if x is 8,M -genericfor (X, P), then T-i(x) is (8 + i/(m + 1», (M + i)-generic for (X, P). Let 8 = e3 /8. Choose Mo by Lemma 7.2 so that for all but e of the x E X, x is 8,M-generic for all M ~ Mo. Let N(l - e/2) > Mo and consider the set of points B = UI~ci21 Ti(F). As J1(B) > e2/2 there is a subset Bo 5; B, J1(B o) > (1 - e/4)J1(B), and all x E Bo are e,M-generic for all M ~ Mo. Thus if x E Bo n Ti(F), 0 ~ i =s;: [eN/2], then x is 8(N - i)-generic. Hence T-i(x) E F is (e + i/(m + i», N-generic, hence e,N-generic for (X, P). As J1(UI~ci21 T-i(Bo) n F) > (1 - e/4)J1(F) we are done. _ Lemma 7.14 Suppose (X, P) is an ergodic process. For any e > 0, if N is large enough, and A is any set with J1(A) ~ e, there is, then, a subset AD 5; A, J1(Ao) > (1 - e)J1(A) and for any x E AD,
1- ~
log2(J1(PN(x) n A)/J1(A» - h(T,p)1 < e.
Proof We establish upper and lower estimates separately. First, it is enough to show
The Krieger and Ornstein tneorems I ,..., (7.9)
as
once N > -2Iog 2 (e)/e. By the Shannon-McMiIlan-Breiman Theorem (5.3), once N is large enough, for all but at most e2 /4 of X, \ _log2(f1~N(X))) _ h(T,P)\
e,
we can find a set I of P,N-names so that (2) f1(U, eI Fp ) > (1 - e)f1(F); and (3) for pel, I-l/N log2(f1(Fp )/f1(F» - h(T, P)I < e. Proof As in Lemma 7.14, we establish upper and lower estimates separately. Since
-
log2 f1(F) log(e) 10g(N) e < ---+-- O. There is, then, a partition 15 of X 2 so that (1) II(X I x X2 ,ji),P x Q;X2 ,15 v QII 32(No + 2)/8. A pply the Rohlin lemma (Theorem 3.10) to construct a tower in X 2 of height N + No + 2 with base set F, covering all but £/32 of X 2 • For now we work only on the first N levels of the tower, F, T2(F), ... , T!-l (F). This covers all but 8/16 of X'Z. LetF = Xl x F, a base in XI x X 2 .LetFo ~ Fconsistofthose(xl,x2)with (1) theP x Q,N-nameof(x l ,x 2)is8!1./16-genericfor{l;and
(7.15)
-I
1
£!1. and (2) (a) - N1 log2({l(F'N(JC,»//l.(F» - h(TI • P) < 16;
(b)
1- N1 IOg2({1(F'N(x2)//l.~F» -
-I
h(7;./~ < £!1. 16·
(7.16)
We know (1(Fo) >,J - "8!1./16)/I.(F). Let J = {q
= QN(X 2 ): there is an Xl
with (XI' X2)
E
Fo},
The Krieger and Ornstein theorems
I
147
and I = {p = PN(~ : there is an X2 with (Xl' X 2 ) E Fo}.
To any element q E J we can associate I(Q)
£;
J where
J(q) = {p E J: for some (X I ,X 2 ) E Fo, Pn(xd = P and qN(X 2 ) = q}.
(7.17)
We want to paint the tower with base Fq with a name P from J(q). This will make the pair (p, q) eoc/16-generic for p. To obtain (2) of (7.12), though we want, to as great a degree as possible, P to be unique to q. To be more specific, consider = {cp; J o ..... I; J o £; J, cp is 1-1 and cp(q) E I(q)}. We partially order by CPI -< CPo if Dom(CPI) £; Dom(cpo). Let cP be a maximal element of . Let us compute
M{(Xl' X 2 ) E F; qN(X2) E dom(cp)}). Notice that for any q E J and pEl
~(Fp)
h(T2). Further, suppose A E J(X I ,X2) is an ergodic joining and B > O. There is, then, a non-periodic ergodic process (X 3, P3) with h(T3' P3) > h(T2). Further, there is an ergodic joining Al E J(X3 , X2) so that (1)
II(X I
x
X2 ,m,PI •
(2) P3 x X 2
C
X3
X
x
Q;(X3
x
X 2 ,AI),P3
x
Q/I
h(T2)' not just greater than h(T2,Q). In Theorem 7.17, we constructed P inside the X 2 process. Here we only get (X3, P3 ) joined to (X 2, (2) with approximate containment. The critical new fact is
h(T3 , P3 ) > h(T2)· We will gain this entropy using Theorem 7.12. Notice that
h(S) < h(T, P)
~
log2(n).
The Krieger and Ornstein theorems
I 149
Let log2(n) - h(S) > /X > 0, /X ::;; 1. Also notice that if we obtain (7.21) for a refinement of Q, we automatically get it for Qitself. Select e so that (7.22) By our remark above we can assume, without loss of generality, that
oS
h(T2 )
6/X h(T2 , Q) ::;;
s.
-
(7.23)
Use Theorem 7.17 with error 6/2. This gives us a partition (1)
II(XI x X2'p.),P x Q;X 2,P v QII < 2' and
-
-
-
(2)
Qsg
V
T2-
i
--
-
P of X 2 so that
e
(p)'
(7.24)
112 ;=-ao
"p.
If we apply Theorem 7.12, with error e/4, for any N, Let VI = V{i'2'P) E there is a V2 E with
"p
(7.25) As VI is ergodic, we may assume V2 is also. How do we choose the parameter N? We ask two things of N. first i N > logA4/e), and second. as Q c;E12 V'r'=-oo we ask that Q 2 ._ 2 _ _ T2- (P). _ Vf= -N T2-'(P). With this choice for N. let (X 3 , P3 ) = «Yp , V2), P) and {J,I be an ergodic joining of (X3,1'3) and (X 2, p) that achieves the tiN-distance. Notice that as (7.25) tells us
c!
~
--_
--
-
-
-
d N (X2'P v Q;(X 3 x X2'{J,.),P x Q)
h(T) for v the first marginal of fJ,.
Corollary 7.21
J is a non-empty, compact space.
Proof Let (Yp,v o) be the full n-shift, (n-I,n-I, ... ,n- I ), and fJ, = Vo x Il. As (Yp , Yo) is weakly mixing, fJ, is ergodic. As h.(S) = log2(n) > h(T), fJ, E J. _ It is a very delicate task to identify J more precisely. It is not necessarily all elements of j whose first coordinate v E flp has h.(S) 2 h(T), although it is contained in this set. The ergodic elements of j may not be dense in it. The precise nature of J depends delicately on X. What we will show is that the /l E J satisfying our earlier conditions (7.27) and (7.28) are a dense Gd • Let (!7(n)
{ A
= /l E J : P x X
It is easy to see that any fJ, E
1/. Yp x , ~
I/.}
and Yp x Qn ~ PA x X .
nn fP. will satisfy (7.27) and (7.28).
Proof of Theorem 7.19 What we will show is that the sets fP(n) are open and dense in J. The Baire category theorem finishes the result, telling us fP(n) is a dense Gd in 1. ,)' '\' n To see that fP(n) is dense, notice that if fJ, = fP(n), then for some N large enough there are partitions pi s;;; QN and Q~ s;;; PN with
nn
IfJ,(Yp x
pi AP x X)I
O. Hence (X,p) cannot be finitely determined.
•
Let (Xl' 1') be finitely determined, and (X2' Q) another ergodic process with h(T1,p)
= h(T2' Q).
We assume l' and Q are generators of their respective processes. Let Ibe the weak· closure of the ergodic joinings in J(X!, X 2 ). As long as X! exists, lis not empty, as it is the closure ofthe extreme points of J(X!, X 2 ). In J we use the choice (1'" x Q") for the generating tree defining the weak· metric.
154 I Fundamentals of measurable dynamics Theorem 7.24
Those elements j1
E
J with P x X 2
c: p
Xl
X
~ are a dense GIJ.
Proof We define
(7.34) The proofthat (!'I(n) is open follows the same lines as that part of Theorem 7.19. To show denseness, we work as follows. Let {l E l, and 1/2n > e > 0 be given. We want to find Itl E ((,l(n) with IIIt,ltlll < e. We may, without loss of generality, assume {l is ergodic. We know by Theorem 7.7 that if ~~
~
~
~~
~
~
~
e
d«Xl x X 2 ,jil)'P x Q;(X I x X 2,ji2)'P x Q) < E = 410g(2/e)
+4
then n(X I x X2 ,jil)'P x Q;(X I x X2,ji2)'P x QII < e. Using E/4 in the definition of finitely determined for (Xl,P), we obtain a c5 > O. We assume
c5 < E/4 < e/4. As a first step, combining Lemma 5.10 and Exercise 7.8, we can find a partition Ql of X 2 with
and
Now h(TI,P) > h(T2 ,Qd
and {l can be regarded as an ergodic joining of(XI,P) and (X 2 , Qd. By Lemma 7.18 there is an ergodic process (X 3 , P3 ) and an ergodic joining {lo E J(X 3 , X2 ) with (1) II(X I x X2,ji),P x QI;(X3 x (2)
P3 x
1/211
X 2 c: Xl x Po
X2,fld,P3 x Qlll < c5; and
Vi=-oo T2- i(QI); most importantly, (7.35)
By (I) of (7.35), (1) IIX I x P;X3 x P3 11 < c5 and by (3), (2) h(T3 ,P3 ) > h(T,P) - c5. By our choice of c5, there is an ergodic ji E J(X I ,X3 ) with J.I.(P
~
X
-
E
X 3 AX I x P3 )X3 ) over their common factor X 3, hence an ergodic joining of X3, X2 and Xl· Let (i1 E J be the restriction of ji to these two components. To compute II(X 1 x X 2 ,{i),P X Q;(X1 x X2 ,{il)'P x QII,notice it is at most
X2 ,m,p x Q;(X I x X 2 ,m,p x Q111+ (2) II(X 1 x X 2 ,m,p x Q1;(X3 x X2 ,{iO),P3 x Qlll + (3) II(X 3 x X2 ,{iO),P3 x Ql;(X l x X2 ,{il)'P x Qlll +
(1) II(X I x
(4) II(X 1 x X 2 ,{il)'P x Q1;(X l x X2 ,{il)'P x QII.
(7.36)
The pairs of processes in terms (1), (2), and (4) of(7.36) are within ad-distance "£/4. Hence each of these three is less than or equal to e/4. Term (3) we already know is not greater than e/4. Hence A and Al are weak* less than e apart. We know 1/Zn
P3
X
Xl
c::
X3
X
~,
ito
and
Thus _
P
lin X
X z c:: Xl
X :#'2'
ito
and (i1
E
(Q(n).
•
Before going on to the almost obvious Corollary 7.25, we stop to make some remarks. What Theorem 7.24 tells us is that a finitely determined process can be embedded as a factor in any process of equal entropy. One easily concludes that it can also be embedded in any process of greater entropy, as such always have factors of any smaller entropy. Restricted to the case of Bernoulli shifts, this says a Bernoulli shift can be embedded as a factor of any system with equal or greater entropy. This deep fact is originally due to Sinai. As with Krieger's theorem, we have not precisely identified J Here, in fact, it is all of J(Xl>X z ). Once we know Xl is isomorphic to a Bernoulli shift, it is relatively easy to show the ergodic joinings are dense in J(X l' X2). Corollary 7.25
(Ornstein's isomorphism theorem, one form)  Suppose (X_1, P) and (X_2, Q) are finitely determined, with both P and Q generators, and h(T_1) = h(T_2). Those elements of J̄ supported on graphs of isomorphisms are a dense G_δ in J̄. Hence the two systems are isomorphic.
156
I Fundamentals of measurable dynamics
Proof By Theorem 7.24, those j1 with both
Px
X2
C
r.
Xl
X
fF2
and
are a dense G,. As P and
Qgenerate,
•
and Theorem 6.8 completes the result.
Exercise 7.10 Suppose (X, P) is finitely determined and Qis another generating partition. Show that (X, Q) is also finitely determined. Thus finitely determined is a property of X. Hint: Using finite code approximations, show that any (Xl> Q1) close to (X, Q) in entropy and distribution contains a copy of (X l' PI) close to (X, P) in entropy and distribution. Use the finite codes to bring the d-closeness of (X, P} and (Xl> Pl) back to (X, Q) and (X l' Q1). In fact, any partition of a finitely determined process is finitely determined. This deep result of Ornstein and Weiss can be found in Ornstein (1974). Thus all factor algebras of finitely determined systems are themselves finitely determined.
7.7 Weakly Bernoulli processes

To complete our picture of the isomorphism theorem we want to verify that mixing Markov chains are finitely determined. We begin with a property of mixing Markov chains.

Definition 7.9  We say a process (X, P) is weakly Bernoulli if for any ε > 0 there is a k > 0 so that for all N
|| D( ∨_{i=k}^{k+N−1} T^{-i}(P) | 𝒫_p ) − D( ∨_{i=k}^{k+N−1} T^{-i}(P) ) ||_1 < ε,

i.e., the distribution of any N-block of the future beyond time k, conditioned on the past algebra 𝒫_p, is within ε in L^1 of its unconditioned distribution.
Lemma 7.26
Mixing Markov chains are weakly Bernoulli.
Proof  If (X, P) is Markov, then for all k > 0 and N,

|| D( ∨_{i=k}^{k+N−1} T^{-i}(P) | 𝒫_p ) − D( ∨_{i=k}^{k+N−1} T^{-i}(P) ) ||_1 = || D(T^{-k}(P) | 𝒫_p) − D(T^{-k}(P)) ||_1.
As a mixing Markov chain is a K-system, Proposition 5.32 finishes the result.  ∎
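To make the decay used in this proof concrete, here is a small numerical illustration (ours, not from the text): for a mixing two-state Markov chain, the law of the state k steps ahead, conditioned on the present state, converges to the stationary distribution, which is exactly what drives ||D(T^{-k}(P) | 𝒫_p) − D(T^{-k}(P))||_1 → 0.

```python
import numpy as np

M = np.array([[0.9, 0.1],
              [0.2, 0.8]])           # row-stochastic transition matrix
# stationary distribution: left eigenvector of M for eigenvalue 1
evals, evecs = np.linalg.eig(M.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()

for k in [1, 5, 10, 20]:
    Mk = np.linalg.matrix_power(M, k)
    # worst-case total-variation distance over the conditioning state
    dist = max(np.abs(Mk[a] - pi).sum() for a in range(2))
    print(k, dist)                   # decays geometrically in k
```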
Although the weakly Bernoulli property is known to be strictly weaker than finitely determined, it is the property most often applied. Hyperbolic toral automorphisms, for example, are proven Bernoulli by showing they are weakly Bernoulli for an appropriately chosen partition Ornstein (1974). We have seen earlier (Corollary 5.26) that entropy is convex. We need a uniformity in this to proceed.
O,L
Lemma 7.27 Fornfixed let nn = {(n t , ... , n,,);ni ~ ni = I}, the space of probability n-vectors. Suppose V is a Borel probability measure on n". Given B > 0 there is a ~ = ~(B, n) > 0 so that if H(J Jtdv) - JH(Jt) dv < ~, then
Proof We know from Corollary 5.26 that
equality holding iff v is a point mass at JJt dv, i.e.,
Now JH(n)dv, H(J Jtdv) and f IJt - f Jtdvl dv are all weak* continuous functions ofv. Compactness in weak* of the Borel probability measures completes the result. _ Smorodinsky (1971) has shown that in fact ~ does not depend on n. Our non-constructive argument of this uniform convexity of entropy completely misses this. Lemma 7.28
Suppose (Xl> Pd is ergodic and for some k, N,
Given anye > 0 there is a ~ so that if (X 2, P2 ) satisfies
158 (1)
I
Fundamentals of measurable dynamics
IIX1 ,P1 ;X2,P211 < h(T1,P1) - b; then (3) II
DC:~-l T2-i(P2)I~P2) -
Proof We know that for j (N
DC:'Zl T2- i(P2»)
lll'
(7.41)
Proof As Xl and Xl are non-atomic, it is clear we can move the sets cp-1(A z ) II a 1 without changing their mass so that (1) is satisfied and D(Qzlcp'(a 1) II a z ) remains D(Qzlcp(ad II a z ). Repeat this step in X z to obtain (2) without losing (1). Within each set a 1 II cp'(a z ) we can further perturb cp' so as to map as much of qf II a 1 to q; II a z as possible, i.e., a set of measure min(Jl1 (qf II a 1), Jlz(q; II a l ». Map the rest of a 1 II cp'-l(a z ) to the remainder of cp'(a 1) II a z to get the real cp'. We compute
Jl1({X
E
a 1 II cp'(a z ): Q1(X)
= III (a 1 II
cp'(az»
=1=
Qz(cp'(x»})
IID(Q1Iad - D(Qzlaz)111'
Our next result will complete the proof that weakly Bernoulli implies finitely determined. It is an inductive construction of maps CPr enabling us to apply
160
I
Fundamentals of measurable dynamics
Corollary 7.11. The notation becomes very complex, but essentially all we are doing is successively matching more and more of the T,Pl-names to Tz,i'znames, without disturbing the statistics of our earlier matching. Our starting point is the hypothesis and conclusion of Lemma 7.28.
Suppose (Xl,Pd and (X z , Pz ) satiify
Theorem 7.30
(1) IID(V~';kN-l ~-i(~)I&'p) - D(V~,;t-l ~-i(Pj))lll
O.
£
(7.42)
Then -- - - k (3) d(Tl ,Pl ;Tz,Pz) 0, there is a k and N >
fl-i(Pdl~Pl) - DC:~-I ~-i(P2»)
t
h(ThPd > c5,
(7.50)
then
II DC:~-I
fZ-i(Pz)I~P2) -
DC:V: I fZ-i(PZ»)
t