≥ 0, ξ ∈ M_J(X). Hence the LHS is well-defined. If ξ = Σ_{k=1}^n ξ_k, where ξ_k = η_k − ζ_k with η_k, ζ_k ∈ M₊(X) for 1 ≤ k ≤ n, then, given ε > 0, we can choose m₀ ≥ 1 such that

0 ≤ J(m, n) ≤ J(m, 0) < ε,  m ≥ m₀, n ≥ 1.

Hence

−Σ_{A ∈ α ∨ S⁻¹α ∨ ⋯ ∨ S^{−(n−1)}α} ξ(A) log ξ(A) ≤ I(k, m, n) + ε.

If we let k → ∞, then I(k, m, n) → H(ξ, α(m), S), and the LHS of (5.8) → H(ξ, α, S), for m ≥ m₀, n ≥ 1. (5.9)

Since H(ξ, α(m), S) is monotonely nondecreasing in m by Remark 7 (2), lim_{m→∞} H(ξ, α(m), S) exists. This and (5.9) imply that
… and

(k/n) ν(A_{n,k}) ≤ μ(A_{n,k}) ≤ ((k+1)/n) ν(A_{n,k})

for 0 ≤ k ≤ n² − 1 and n ≥ 1. This implies that h_n = Σ_k (μ(A_{n,k})/ν(A_{n,k})) 1_{A_{n,k}} satisfies f_n ≤ h_n ≤ g_n. Since ψ(t) = t log t is decreasing on (0, 1/e) and increasing on (1/e, 1), we have that

Σ_A μ(A) log (μ(A)/ν(A)) ≥ ∫_X h_n log h_n dν.

On the other hand, let f = dμ/dν and observe that for x ∈ X and n ≥ 1

0 ≤ g_n(x) − f_n(x) ≤ 1/n,
0 ≤ f(x) − f_n(x) ≤ g_n(x) − f_n(x) ≤ 1/n,
Chapter I: Entropy
0 ≤ g_n(x) − f(x) ≤ g_n(x) − f_n(x) ≤ 1/n.

Then we see that ∫_X f_n log f_n dν → ∫_X f log f dν as n → ∞, which proves the assertion.

Theorem 4. (1) Let 𝔜_n (n ≥ 1) and 𝔜 be σ-subalgebras of 𝔛 and μ, ν ∈ P(X). If 𝔜_n ↑ 𝔜, then H_{𝔜_n}(μ|ν) ↑ H_𝔜(μ|ν).
(2) Let 0 < α < 1 and μ_j, ν_j ∈ P(X) (j = 1, 2). Then,

H(αμ₁ + (1 − α)μ₂ | αν₁ + (1 − α)ν₂) ≤ αH(μ₁|ν₁) + (1 − α)H(μ₂|ν₂). (6.3)

(3) If ||μ_n − μ|| → 0 and ||ν_n − ν|| → 0 with {μ_n, μ, ν_n, ν : n ≥ 1} ⊆ P(X), then

H(μ|ν) ≤ liminf_{n→∞} H(μ_n|ν_n). (6.4)

(4) If μ, ν ∈ P(X),
then ||μ − ν||² ≤ 2H(μ|ν).

Proof. (1) Suppose first that μ ≪ ν on 𝔜. For n ≥ 1 let μ_n = μ|𝔜_n and ν_n = ν|𝔜_n, the restrictions of μ and ν to 𝔜_n, respectively. Then μ_n ≪ ν_n for n ≥ 1. If we let f_n = dμ_n/dν_n and f = dμ/dν, then it follows from Theorem 2 that for n ≥ 1

H_{𝔜_n}(μ|ν) = ∫_X f_n log f_n dν,  H_𝔜(μ|ν) = ∫_X f log f dν.

Since {f_n : n ≥ 1} is a martingale in L¹(ν), we have f_n → f ν-a.e. and hence f_n log f_n → f log f ν-a.e. Then, Fatou's lemma implies that

H_𝔜(μ|ν) = ∫_X f log f dν ≤ liminf_{n→∞} ∫_X f_n log f_n dν
 = liminf_{n→∞} H_{𝔜_n}(μ|ν).
< H 2e. Since 2)„ | 2), there exist n > 1 and £ 6 2J„ such that /i(AAB) < e, v{AAB) < e. Now observe that
.i™,flSo.M") > »».W") > #B(MI")
^W) -"{BC){I - VM)=M(5C) - V[BC v(B
\n{A) log ^ n—too
2,
£
—> co
as e —> 0.
That is, lim HmAlAv) = co = Hm{)j\v). n—too
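The monotonicity in (1) of Theorem 4 — relative entropy computed on refining partitions increases toward H(μ|ν) — can be checked numerically for finite distributions. The sketch below is our own (function names and the example chain of partitions are not from the text):

```python
import numpy as np

def rel_entropy(p, q):
    # H(p|q) = sum_i p_i log(p_i / q_i), with 0 log 0 = 0; +inf when p is
    # not absolutely continuous with respect to q.
    p, q = np.asarray(p, float), np.asarray(q, float)
    if np.any((q == 0) & (p > 0)):
        return np.inf
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

def coarsen(p, blocks):
    # Push a distribution down to the partition given by `blocks`,
    # i.e. restrict it to the sub-sigma-algebra the partition generates.
    return [sum(p[i] for i in B) for B in blocks]

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(8))
q = rng.dirichlet(np.ones(8))

chains = [                                   # a chain of refining partitions
    [[0, 1, 2, 3], [4, 5, 6, 7]],
    [[0, 1], [2, 3], [4, 5], [6, 7]],
    [[i] for i in range(8)],
]
values = [rel_entropy(coarsen(p, B), coarsen(q, B)) for B in chains]
# Relative entropy grows as the partition refines.
assert values[0] <= values[1] <= values[2]
```

The final partition into singletons plays the role of 𝔜 itself, so `values[2]` is the full relative entropy.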
(2) We can assume 0 < α < 1. If μ₁ is not ν₁-absolutely continuous or μ₂ is not ν₂-absolutely continuous, then (6.3) is true since both sides of (6.3) become ∞. So we assume μ₁ ≪ ν₁ and μ₂ ≪ ν₂. In order to prove (6.3) it suffices to show that for any 𝔄 ∈ 𝒫(𝔛)

Σ_{A∈𝔄} {αμ₁(A) + (1 − α)μ₂(A)} log [{αμ₁(A) + (1 − α)μ₂(A)}/{αν₁(A) + (1 − α)ν₂(A)}]
 ≤ α Σ_{A∈𝔄} μ₁(A) log (μ₁(A)/ν₁(A)) + (1 − α) Σ_{A∈𝔄} μ₂(A) log (μ₂(A)/ν₂(A)). (6.6)
Let c₁, c₂, d₁, d₂ be nonnegative constants and consider a function φ defined by

φ(x) = {xc₁ + (1 − x)c₂} log [{xc₁ + (1 − x)c₂}/{xd₁ + (1 − x)d₂}],  0 ≤ x ≤ 1.

Then we see that

φ″(x) = [(c₁ − c₂){xd₁ + (1 − x)d₂} − (d₁ − d₂){xc₁ + (1 − x)c₂}]² / [{xc₁ + (1 − x)c₂}{xd₁ + (1 − x)d₂}²] ≥ 0,

so that φ is convex on [0, 1] and φ(α) ≤ αφ(1) + (1 − α)φ(0). Taking c₁ = μ₁(A), c₂ = μ₂(A), d₁ = ν₁(A), d₂ = ν₂(A) and summing over A ∈ 𝔄 gives (6.6).

… μ_n(A) = 0 for n ≥ 1. Let 𝔑 = {μ_n : n ≥ 1}. Obviously 𝔑 ≼ 𝔐 since 𝔑 ⊆ 𝔐. We shall show 𝔐 ≼ 𝔑. Take an arbitrary μ ∈ 𝔐. Since μ(A∖K_μ) = 0 by the definition of K_μ, we can assume that A ⊆ K_μ. If μ(A∖C) > 0, then λ(A∖C) > 0 and hence A ∪ C is a chain with λ(A ∪ C) > λ(C) = a. This contradicts the maximality of C. Thus μ(A∖C) = 0.
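The joint convexity (6.3) proved through φ above can be spot-checked numerically on randomly drawn finite distributions; the helper below is our own sketch, not code from the text:

```python
import numpy as np

def rel_entropy(p, q):
    # H(p|q) = sum_i p_i log(p_i / q_i); the Dirichlet samples below are
    # strictly positive, so absolute continuity always holds here.
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

rng = np.random.default_rng(1)
for _ in range(1000):
    p1, q1 = rng.dirichlet(np.ones(5)), rng.dirichlet(np.ones(5))
    p2, q2 = rng.dirichlet(np.ones(5)), rng.dirichlet(np.ones(5))
    a = rng.uniform()
    lhs = rel_entropy(a * p1 + (1 - a) * p2, a * q1 + (1 - a) * q2)
    rhs = a * rel_entropy(p1, q1) + (1 - a) * rel_entropy(p2, q2)
    assert lhs <= rhs + 1e-12    # the convexity inequality (6.3)
```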
Now observe that

λ(A ∩ C) = Σ_{n=1}^∞ λ(A ∩ K_n) = 0,

since 0 = μ_n(A) = μ(A ∩ K_n) = ∫_{A∩K_n} (dμ_n/dλ) dλ and K_n ⊆ K_{μ_n} imply λ(A ∩ K_n) = 0 for n ≥ 1. Therefore, μ(A) = μ(A∖C) + μ(A ∩ C) = 0. This means 𝔐 ≼ 𝔑.

We now introduce sufficiency.

Definition 10. Let 𝔜 be a σ-subalgebra of 𝔛 …

… {μ_n : n ≥ 1}. Let

λ(A) = Σ_{n=1}^∞ (1/2ⁿ) μ_n(A),  A ∈ 𝔛.

Then, λ ∈ P(X) and 𝔐 ≈ {λ}. Since 𝔜 is sufficient for 𝔐, for each A ∈ 𝔛 there exists a 𝔜-measurable function h_A such that

λ(A ∩ B) = Σ_{n=1}^∞ (1/2ⁿ) ∫_B E_{μ_n}(1_A|𝔜) dμ_n = ∫_B h_A dλ,  B ∈ 𝔜.

Hence E_λ(1_A|𝔜) = h_A λ-a.e. Take any μ ∈ 𝔐 and let g = dμ/dλ. Then, for any A ∈ 𝔛

∫_A g dλ = μ(A) = ∫_X h_A dμ = ∫_X E_λ(1_A|𝔜) dμ,
For δ > 0 let

E(k, δ) = {(x₁, …, x_k) ∈ X₀^k : |(1/k) Σ_{j=1}^k log (p(x_j)/q(x_j)) − H(p|q)| < δ}.

Then by (6.11) we have that

lim_{k→∞} P(E(k, δ)) = 1,  δ > 0.

For (x₁, …, x_k) ∈ E(k, δ) it holds that

Π_{j=1}^k p(x_j) exp{−k(H(p|q) − δ)} ≥ Π_{j=1}^k q(x_j) ≥ Π_{j=1}^k p(x_j) exp{−k(H(p|q) + δ)}.

Hence

Q(A^c ∩ E(k, δ)) = Σ_{(x₁,…,x_k) ∈ A^c ∩ E(k,δ)} Π_{j=1}^k q(x_j)
 ≥ Σ_{(x₁,…,x_k) ∈ A^c ∩ E(k,δ)} Π_{j=1}^k p(x_j) exp{−k(H(p|q) + δ)}
 = P(A^c ∩ E(k, δ)) exp{−k(H(p|q) + δ)}.

Since P(A) ≤ ε, (6.12) implies that for large enough k ≥ 1

P(A^c ∩ E(k, δ)) ≥ 1/2,

and hence Q(A^c) ≥ (1/2) exp{−k(H(p|q) + δ)}. Since the RHS is independent of A, it follows that

β(k, ε) ≥ (1/2) exp{−k(H(p|q) + δ)},

so that

liminf_{k→∞} (1/k) log β(k, ε) ≥ −H(p|q) − δ. (6.14)

Since δ > 0 is arbitrary, combining (6.13) and (6.14) we conclude that (6.10) holds.
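The law-of-large-numbers fact behind E(k, δ) — that the normalized log-likelihood ratio settles at H(p|q) — is easy to see by simulation. The sketch below is ours; p and q are arbitrary example distributions:

```python
import numpy as np

# Draw a long i.i.d. word from P and watch the normalized log-likelihood
# ratio (1/k) sum_j log(p(x_j)/q(x_j)) converge to H(p|q).
rng = np.random.default_rng(2)
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])
H = float(np.sum(p * np.log(p / q)))      # H(p|q), about 0.275 here

k = 200_000
xs = rng.choice(3, size=k, p=p)
llr = np.cumsum(np.log(p[xs] / q[xs])) / np.arange(1, k + 1)
assert abs(llr[-1] - H) < 0.01
```

For large k the word therefore lands in E(k, δ) with probability near 1, which is exactly what drives the exponent −H(p|q) in Stein's bound.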
Bibliographical notes

There are some standard textbooks of information theory: Ash [1](1965), Csiszar and Korner [1](1981), Feinstein [2](1958), Gallager [1](1968), Gray [2](1990), Guiasu [1](1977), Khintchine [3](1958), Kullback [1](1959), Martin and England [1](1981), Pinsker [1](1964), and Umegaki and Ohya [1, 2](1983, 1984). As is well recognized, there is a close relation between information theory and ergodic theory. For instance, Billingsley [1](1965) is a bridge between these two theories. We refer to some textbooks in ergodic theory: Brown [1](1976), Cornfeld, Fomin and Sinai [1](1982),
Gray [1](1988), Halmos [1, 2](1956, 1959), Krengel [1](1985), Ornstein [2](1974), Parry [1, 2](1969, 1981), Petersen [1](1983), Shields [1](1973) and Walters [1](1982). Practical application of information theory is treated in Kapur [1](1989) and Kapur and Kesavan [1](1992). The history of entropy goes back to Clausius, who introduced a notion of entropy in thermodynamics in 1865. In the 1870s, Boltzmann [1, 2](1872, 1877) considered another entropy to describe thermodynamical properties of a physical system in the micro-kinetic aspect. In 1928, Hartley [1] gave some consideration of the entropy. Then, Shannon came to the stage. In his epoch-making paper [1](1948), he really "constructed" information theory (see also Shannon and Weaver [1](1949)). The history of the early days and development of information theory can be seen in Pierce [1](1973), Slepian [1, 2](1973, 1974) and Viterbi [1](1973).
1.1. The Shannon entropy. Most of the work in Section 1.1 is due to Shannon [1]. The Shannon-Khintchine Axiom is a modification of Shannon's original axiom by Khintchine [1](1953). The Faddeev Axiom is due to Faddeev [1](1956). The proof of (2) => (3) in Theorem 1.4 is due to Tverberg [1](1958), who introduced a weaker condition than [1°] in (FA).
1.2. Conditional expectations. Basic facts on conditional expectation and conditional probability are collected with or without proofs. For the detailed treatment of this matter we refer to Doob [1](1953), Ash [2](1972), Parthasarathy [3](1967) and Rao [1, 3](1981, 1993).
1.3. The Kolmogorov-Sinai entropy. Kolmogorov [1](1958) (see also [2](1959)) introduced the entropy for automorphisms in a Lebesgue space and Sinai [1](1959) slightly modified Kolmogorov's definition. As was mentioned, entropy is a complete invariant among Bernoulli shifts, which was proved by Ornstein [1](1970).
There are measure preserving transformations, called K-automorphisms, which have the same entropy but no two of them are isomorphic (see Ornstein and Shields [1](1973)).
1.4. Algebraic models. The content of this section is taken from Dinculeanu and Foias [2, 3](1968). Chi and Dinculeanu [1](1972) generalized the results in this section to projective limits of measure preserving transformations. Related topics are seen in Dinculeanu and Foias [1](1966) and Foias [1](1966).
1.5. Entropy functionals. Affinity of the entropy on the set of stationary probability measures is obtained by several authors such as Feinstein [3](1959), Winkelbauer [1](1959), Breiman [2](1960), Parthasarathy (1961) and Jacobs [4](1962). Here we followed Breiman's method. Umegaki [2, 3](1962, 1963) applied this result to consider the entropy functional defined on the set of complex stationary measures. He obtained an integral representation of the entropy functional for a special case. Most of the work of this section is due to Umegaki [3].
1.6. Relative entropy and Kullback-Leibler information.
Theorem 6.2 is stated in Gel'fand-Kolmogorov-Yaglom [1](1956) and proved in Kallianpur [1](1960). (4) of Theorem 6.4 is due to Csiszar [1](1967). Sufficiency in statistics was studied by several authors such as Bahadur [1](1954), Barndorff-Nielsen [1](1964) and Ghurye [1](1968). Definition 6.8 through Theorem 6.15 are obtained by Halmos and Savage [1](1949). We treated sufficiency for the dominated case here. We refer to Rao [3] for the undominated case. Theorem 6.16 is shown by Kullback and Leibler [1](1951). Theorem 6.17 is given by Stein [1](unpublished), which is stated in Chernoff [2](1956) (see also [1](1952)). Hoeffding [1](1965) also noted the same result as Stein's. Related topics can be seen in Blahut [2](1974), Ahlswede and Csiszar [1](1986), Han and Kobayashi [1, 2](1989) and Nakagawa and Kanaya [1, 2](1993).
CHAPTER II

INFORMATION SOURCES
In this chapter, information sources based on probability measures are considered. Alphabet message spaces are reintroduced and examined in detail to describe information sources, which are used later to model information transmission. Stationary and ergodic sources as well as strongly or weakly mixing sources are characterized, where relative entropies are applied. Among nonstationary sources AMS ones are of interest and examined in detail. Necessary and sufficient conditions for an AMS source to be ergodic are given. The Shannon-McMillan-Breiman Theorem is formulated in a general measurable space and its interpretation in an alphabet message space is described. Ergodic decomposition is of interest, which states that every stationary source is a mixture of ergodic sources. It is recognized that this is a series of consequences of Ergodic and Riesz-Markov-Kakutani Theorems. Finally, entropy functionals are treated to obtain a "true" integral representation by a universal function.
2.1. Alphabet message spaces and information sources

In Example 1.3.14 Bernoulli shifts are considered on an alphabet message space. In this section, we study this type of spaces in more detail. Also a brief description of measures on a compact Hausdorff space will be given. Let X₀ = {a₁, …, a_ℓ} be a finite set, a so-called alphabet, and X = X₀^ℤ the doubly infinite product of X₀ over ℤ = {0, ±1, ±2, …}, i.e.,

X = X₀^ℤ = Π_{k=−∞}^∞ X_k,  X_k = X₀, k ∈ ℤ.

Each x ∈ X is expressed as the doubly infinite sequence x = (x_k) = (…, x₋₁, x₀, x₁, …).
The shift S on X is defined by

S : x ↦ x′ = Sx = (…, x′₋₁, x′₀, x′₁, …),  x′_k = x_{k+1}, k ∈ ℤ.

Denote a cylinder set by

[x_i⁰ ⋯ x_j⁰] = [x_i = x_i⁰, …, x_j = x_j⁰] = {x = (x_k) ∈ X : x_k = x_k⁰, i ≤ k ≤ j},

where x_k⁰ ∈ X₀ for i ≤ k ≤ j, and call it a (finite) message. One can verify the following properties, where 𝔐 denotes the set of all messages:

(1) [x_i⁰ ⋯ x_j⁰] ∩ [y_i⁰ ⋯ y_j⁰] = ∅ if x_k⁰ ≠ y_k⁰ for some i ≤ k ≤ j;
(3) [x_i⁰ ⋯ x_j⁰] = ∩{[x_k⁰] : i ≤ k ≤ j};
…
(6) if A ∈ 𝔐, then A^c = ∪_{j=1}^n B_j with disjoint B₁, …, B_n ∈ 𝔐.

S is a one-to-one and onto mapping such that

(7) S⁻¹((x_k)) = (x_{k−1}) for (x_k) ∈ X;
(8) S⁻ⁿ[x_i⁰ ⋯ x_j⁰] = [y_{i+n}⁰ ⋯ y_{j+n}⁰] with y_{k+n}⁰ = x_k⁰ for i ≤ k ≤ j and n ∈ ℤ.

Let 𝔛 be the σ-algebra generated by all messages 𝔐, denoted 𝔛 = σ(𝔐). Then (X, 𝔛, S) is called an alphabet message space. Now let us consider a topological structure of the alphabet message space (X, 𝔛, S). Letting d₀(a_i, a_j) = |i − j| for a_i, a_j ∈ X₀ and

d(x, x′) = Σ_{k=−∞}^∞ d₀(x_k, x′_k)/2^{|k|},  x, x′ ∈ X, (1.1)
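The shift, messages, and the metric (1.1) can be modeled directly. The sketch below represents a point of X₀^ℤ as a function ℤ → X₀ (a modeling choice of ours, not the text's) and truncates the sum in (1.1):

```python
# A point of X = X0^Z is modeled as a callable Z -> X0; the shift S and a
# truncated version of the metric (1.1) then follow immediately.
def shift(x):
    return lambda k: x(k + 1)              # (Sx)_k = x_{k+1}

def dist(x, y, K=30):
    # Partial sum of d(x, y) = sum_k d0(x_k, y_k)/2^{|k|} over |k| <= K,
    # with d0(a_i, a_j) = |i - j| on letter indices.
    return sum(abs(x(k) - y(k)) / 2 ** abs(k) for k in range(-K, K + 1))

def in_message(x, i, word):
    # x lies in the cylinder [x_i^0 ... x_j^0] iff its coordinates match.
    return all(x(i + t) == w for t, w in enumerate(word))

x = lambda k: k % 2                        # the periodic point ...010101...
assert in_message(x, 0, [0, 1, 0])
assert shift(x)(0) == x(1)
assert dist(x, x) == 0.0
# Property (8): shifting the point moves the message window.
assert in_message(shift(x), 0, [1, 0, 1])
```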
we see that X is a compact metric space with the product topology and S is a homeomorphism on it. Recall that a compact Hausdorff space X is said to be totally disconnected if it has a basis consisting of closed-open (clopen, say) sets. Then we have the following:

Theorem 1. For any nonempty finite set X₀ the alphabet message space X = X₀^ℤ is a compact metric space relative to the product topology, where the shift S is a
homeomorphism. Moreover, X is totally disconnected and 𝔛 is the Borel and also Baire σ-algebra of X.

Proof. The shift S is continuous, one-to-one and onto. Hence it is a homeomorphism. X is totally disconnected. In fact, the set 𝔐 of all messages forms a basis for the product topology and each message is clopen. To see this, let U be any nonempty open set in X. It follows from the definition of the product topology that there exists a finite set J = {j₁, …, j_n} of integers such that pr_k(U) = X_k = X₀ for k ∉ J, where pr_k(·) is the projection onto the kth coordinate space X_k. Let i = min{k : k ∈ J} and j = max{k : k ∈ J}. Then we see that, for any u = (u_k) ∈ U, [u_i ⋯ u_j] ⊆ U and

U = ∪_{u∈U} [u_i ⋯ u_j].
This means that 𝔐 is a basis for the topology. Each message is clearly clopen.

In the rest of this section, we consider a compact Hausdorff space X and its Baire σ-algebra 𝔛 with a measurable transformation S on X. C(X) and B(X) denote the Banach spaces of all continuous functions and Baire measurable functions on X with sup-norm, respectively. As in Chapter I, M(X) denotes the Banach space of all ℂ-valued measures on X. In this case, M(X) is the space of all Baire measures on X. P(X) (resp. P_S(X)) denotes the set of all (resp. S-invariant) probability measures in M(X). Each measure μ ∈ P(X) (or P_S(X)) is called an information source (or stationary information source), or simply a source (or stationary source). A stationary source μ ∈ P_S(X) is said to be ergodic if μ(A) = 0 or 1 for every S-invariant set A ∈ 𝔛. P_{se}(X) denotes the set of all ergodic sources in P_S(X).

Example 2. Let X₀ = {a₁, …, a_ℓ} be an alphabet with a probability distribution p = (p₁, …, p_ℓ). Consider the alphabet message space X = X₀^ℤ with a shift S on it. For a message [x_i⁰ ⋯ x_j⁰] we define

μ₀([x_i⁰ ⋯ x_j⁰]) = p(x_i⁰) ⋯ p(x_j⁰). (1.2)
Then, μ₀ is defined on the algebra 𝒜(𝔐) generated by 𝔐, the set of all messages, and is S-invariant such that μ₀(X) = 1. By the Caratheodory extension theorem μ₀ can be extended uniquely to an S-invariant probability measure μ on 𝔛 = σ(𝔐), i.e., μ ∈ P_S(X). This μ is called a (p₁, …, p_ℓ)-Bernoulli (information) source and S is called a (p₁, …, p_ℓ)-Bernoulli shift as in Example 1.3.14. We claim that μ is ergodic. To see this, suppose that A ∈ 𝔛 is S-invariant and let ε > 0 be arbitrary. Choose B ∈ 𝒜(𝔐) such that μ(AΔB) < ε and hence |μ(A) − μ(B)| < ε. Since B = ∪_{j=1}^k B_j with disjoint B₁, …, B_k ∈ 𝔐, we can choose n₀ ≥ 1 such that S^{−n₀}B has different coordinates from B. This implies that

μ(S^{−n₀}B ∩ B) = μ(S^{−n₀}B)μ(B) = μ(B)²
by virtue of (1.2). Then we have

μ(AΔS^{−n₀}B) = μ(S^{−n₀}AΔS^{−n₀}B), since A is S-invariant,
 = μ(S^{−n₀}(AΔB)) = μ(AΔB) < ε,

and hence

μ(AΔ(B ∩ S^{−n₀}B)) ≤ μ((AΔB) ∪ (AΔS^{−n₀}B)) ≤ μ(AΔB) + μ(AΔS^{−n₀}B) < 2ε.

Consequently, it holds that |μ(A) − μ(B ∩ S^{−n₀}B)| < 2ε and

|μ(A) − μ(A)²| ≤ |μ(A) − μ(B ∩ S^{−n₀}B)| + |μ(B ∩ S^{−n₀}B) − μ(A)²|,

where the RHS can be made arbitrarily small; hence μ(A) = μ(A)² and μ(A) = 0 or 1.

… α, β > 0 with α + β = 1 and η, ξ ∈ P_S(X) imply that μ = η = ξ.
(3) The operator S on M(X) is continuous in the weak* topology if S is a continuous transformation on X. To see this, first we note that S is measurable. Let f ∈ C(X). Then Sf ∈ C(X) since Sf(·) = f(S·) and S is continuous. If C ⊆ X is compact, then there is a sequence {f_n}_{n=1}^∞ ⊆ C(X) such that f_n ↓ 1_C as n → ∞ since X is compact and Hausdorff. Thus, 1_C(S·) = S1_C(·) is Baire measurable, i.e., S⁻¹C ∈ 𝔛. Therefore, S is measurable. Now let μ_n → μ (weak*), i.e., μ_n(f) → μ(f) for f ∈ C(X). Then, we have for f ∈ C(X)
Sμ_n(f) = ∫_X f(x) Sμ_n(dx) = ∫_X f(x) μ_n(dS⁻¹x) = ∫_X f(Sx) μ_n(dx) = μ_n(Sf) → μ(Sf) = Sμ(f),

since Sf ∈ C(X), implying Sμ_n → Sμ (weak*). Therefore, S is continuous in the weak* topology.
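The product rule (1.2) and the disjoint-coordinates argument of Example 2 can be checked by Monte Carlo: estimate μ(S^{−n}A ∩ B) from a long sample path of a Bernoulli source and compare it with μ(A)μ(B). The sketch below is ours; the tolerance is several standard errors of the estimator:

```python
import numpy as np

# For a Bernoulli source, mu(S^{-n}A ∩ B) = mu(A)mu(B) once the coordinate
# windows of S^{-n}A and B are disjoint.  p, A, B are example choices.
rng = np.random.default_rng(3)
p = np.array([0.2, 0.5, 0.3])

def mu(word):
    # (1.2): the measure of a message is the product of letter probabilities.
    return float(np.prod(p[list(word)]))

A, B = (0, 1), (2, 2)
N, n = 1_000_000, 7          # number of sampled positions and the shift
xs = rng.choice(3, size=N + n + 2, p=p)

hit_B = (xs[:N] == B[0]) & (xs[1:N + 1] == B[1])               # x in B
hit_A = (xs[n:N + n] == A[0]) & (xs[n + 1:N + n + 1] == A[1])  # x in S^{-n}A
est = float(np.mean(hit_B & hit_A))
assert abs(est - mu(A) * mu(B)) < 2.5e-3
```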
2.2. Ergodic theorems

Two celebrated ergodic theorems of Birkhoff and von Neumann will be stated and proved in this section. We begin with Birkhoff's ergodic theorem, where the operators Sₙ are defined by (1.3).
Theorem 1 (Birkhoff Pointwise Ergodic Theorem). Let μ ∈ P_S(X) and f ∈ L¹(X, μ). Then there exists a unique f_S ∈ L¹(X, μ) such that
(1) f_S = lim_{n→∞} Sₙf μ-a.e.;
(2) Sf_S = f_S μ-a.e.;
(3) ∫_A f dμ = ∫_A f_S dμ for every S-invariant A ∈ 𝔛;
(4) ||Sₙf − f_S||_{1,μ} → 0 as n → ∞, ||·||_{1,μ} being the norm in L¹(X, μ).
If, in particular, μ is ergodic, then f_S is constant μ-a.e.
Proof. We only have to consider nonnegative f ∈ L¹(X, μ). Let

f̄(x) = limsup_{n→∞} (Sₙf)(x),  f̲(x) = liminf_{n→∞} (Sₙf)(x),  x ∈ X.

To prove (1) it suffices to show that

∫_X f̄ dμ ≤ ∫_X f dμ ≤ ∫_X f̲ dμ,

since this implies that f̄ = f̲ μ-a.e. Let M > 0 and ε > 0 be fixed, put

f̄_M(x) = min{f̄(x), M},  x ∈ X,

and define n(x) to be the least integer n ≥ 1 such that

f̄_M(x) ≤ (Sₙf)(x) + ε = (1/n) Σ_{j=0}^{n−1} f(Sʲx) + ε.

Note that n(x) is finite for each x ∈ X. Since f̄ and f̄_M are S-invariant, we have

n(x) f̄_M(x) ≤ n(x)[(S_{n(x)}f)(x) + ε] = Σ_{j=0}^{n(x)−1} f(Sʲx) + n(x)ε,  x ∈ X. (2.1)

Choose a large enough N ≥ 1 such that the set where n(x) > N has μ-measure less than ε/M; modifying f and n(·) on that set, we may assume n(x) ≤ N for all x, and the resulting analogue of (2.1) is denoted (2.2). Set n₀(x) = 0 and n_k(x) = n_{k−1}(x) + n(S^{n_{k−1}(x)}x) for k ≥ 1. Then it holds that for x ∈ X and L > N

Σ_{j=0}^{L−1} f̄_M(Sʲx) = Σ_{k=1}^{k(x)} Σ_{j=n_{k−1}(x)}^{n_k(x)−1} f̄_M(Sʲx) + Σ_{j=n_{k(x)}(x)}^{L−1} f̄_M(Sʲx),

where k(x) is the largest integer k ≥ 1 such that n_k(x) ≤ L − 1. Applying (2.2) to each of the k(x) inner sums and estimating the last L − n_{k(x)}(x) terms by M, we have

Σ_{j=0}^{L−1} f̄_M(Sʲx) ≤ Σ_{j=0}^{n_{k(x)}(x)−1} f(Sʲx) + Σ_{k=1}^{k(x)} (n_k(x) − n_{k−1}(x))ε + (L − n_{k(x)}(x))M
 ≤ Σ_{j=0}^{L−1} f(Sʲx) + Lε + (N − 1)M,

since f ≥ 0, f̄_M ≤ M and L − n_{k(x)}(x) ≤ N − 1. If we integrate this over X with respect to μ and divide by L, then we get

∫_X f̄_M dμ ≤ ∫_X f dμ + ε + (N − 1)M/L.

Letting L → ∞ and then ε ↓ 0, M ↑ ∞ gives ∫_X f̄ dμ ≤ ∫_X f dμ; the inequality ∫_X f dμ ≤ ∫_X f̲ dμ is proved similarly, and (1) follows. …
(4) ||Sₙf − f||_{p,μ} → 0 as n → ∞, ||·||_{p,μ} being the norm in L^p(X, μ).
(5) The outline of von Neumann's original proof of Theorem 2 is as follows. Let 𝔖 be as in (3) and

ℌ = 𝔖{f − Sf : f ∈ L²(X, μ)},

where 𝔖{⋯} is the closed subspace spanned by {⋯}. Then, the first step is to show that 𝔖 and ℌ are orthogonal complementary subspaces, i.e., 𝔖 ⊕ ℌ = L²(X, μ). The next step is to prove that Sₙf → 0 in L² for f ∈ ℌ. Then, for any f ∈ L²(X, μ) write f = f₁ + f₂ with f₁ ∈ 𝔖 and f₂ ∈ ℌ. Hence we have

||Sₙf − f₁||_{2,μ} = ||Sₙ(f₁ + f₂) − f₁||_{2,μ} = ||Sₙf₂||_{2,μ} → 0,

as was desired. This tells us that Theorem 2 holds for an arbitrary measure space.
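Theorem 1 can be illustrated on a Bernoulli source, where f_S is the constant ∫_X f dμ: the time averages Sₙf along a single sampled orbit settle at p₁ when f is the indicator of [x₀ = a₁]. A minimal sketch (ours, with p = (0.25, 0.75) as an example):

```python
import numpy as np

# Time averages (S_n f)(x) along one sampled orbit of a Bernoulli shift,
# with f = 1_[x_0 = a_1].  Since the source is ergodic, S_n f -> p_1 a.e.
rng = np.random.default_rng(4)
p = np.array([0.25, 0.75])
orbit = rng.choice(2, size=500_000, p=p)   # coordinates x_0, x_1, ... of one point

f_vals = (orbit == 0).astype(float)        # f(S^k x) = 1_[x_k = a_1]
S_n = np.cumsum(f_vals) / np.arange(1, len(f_vals) + 1)
assert abs(S_n[-1] - 0.25) < 0.005
```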
2.3. Ergodic and mixing properties

Let X be a compact Hausdorff space and 𝔛 be its Baire σ-algebra. In this section, ergodicity and mixing properties are considered in some detail. After giving the following lemma we shall characterize ergodicity of stationary sources by using ergodic theorems. Recall that two measures μ, η ∈ M(X) are said to be singular, denoted μ ⊥ η, if there is a set A ∈ 𝔛 such that |μ|(A) = ||μ|| and |η|(A^c) = ||η||, i.e., μ and η have disjoint supports. Also recall that μ ∈ P_S(X) is ergodic if each S-invariant set A ∈ 𝔛 has measure 0 or 1, and that P_{se}(X) denotes the set of all stationary ergodic sources.

Lemma 1. If μ, η ∈ P_{se}(X), then either μ = η or μ ⊥ η.
Proof. Suppose that μ ≠ η. Then there is an A ∈ 𝔛 such that μ(A) ≠ η(A). …

Theorem 2. For a stationary source μ ∈ P_S(X) the following conditions are equivalent to each other:
(1) μ is ergodic.
(2) …
(3) If ξ ∈ P_S(X) and ξ ≪ μ, then ξ = μ.
(4) μ ∈ ex P_S(X), the set of extreme points of P_S(X).
(5) Every S-invariant real valued f ∈ B(X) is constant μ-a.e.
(6) f_S = ∫_X f dμ μ-a.e. for every f ∈ L¹(X, μ).
(7) lim_{n→∞} (Sₙf, g)_{2,μ} = (f, 1)_{2,μ}(1, g)_{2,μ} for every f, g ∈ L²(X, μ).
(8) lim_{n→∞} μ((Sₙf)g) = μ(f)μ(g) for every f, g ∈ B(X).
(9) lim_{n→∞} μ((Sₙf)g) = μ(f)μ(g) for every f, g ∈ C(X).
(10) lim_{n→∞} (1/n) Σ_{k=0}^{n−1} μ(S^{−k}A ∩ B) = μ(A)μ(B) for every A, B ∈ 𝔛.
(11) lim_{n→∞} (1/n) Σ_{k=0}^{n−1} μ(S^{−k}A ∩ A) = μ(A)² for every A ∈ 𝔛.
Proof. (1) ⇔ (2) is obvious and (1), (2) ⇒ (3) follows from Lemma 1.
(3) ⇒ (4). Suppose (4) is false, i.e., μ ∉ ex P_S(X). Then there are α, β > 0 with α + β = 1 and ξ, η ∈ P_S(X) with ξ ≠ η such that μ = αξ + βη. Hence ξ ≠ μ and ξ ≪ μ, i.e., (3) does not hold.
(4) ⇒ (1). Assume that (1) is false, i.e., μ is not ergodic. Then there is an S-invariant set A ∈ 𝔛 for which 0 < μ(A) < 1. Hence μ can be written as a nontrivial convex combination

μ(·) = μ(A)μ(·|A) + μ(A^c)μ(·|A^c),

where μ(·|A) ≠ μ(·|A^c) and μ(·|A), μ(·|A^c) ∈ P_S(X). This means that μ ∉ ex P_S(X), i.e., (4) is not true.
(1) ⇒ (5). Let f ∈ B(X) be real valued and S-invariant and let A_r = {x ∈ X : f(x) ≥ r}, r ∈ ℝ.
Then A_r ∈ 𝔛 is S-invariant and hence μ(A_r) = 0 or 1 for every r ∈ ℝ by (1). This means f = const μ-a.e.
(5) ⇒ (6). Let f ∈ L¹(X, μ). Then f_S is measurable and S-invariant μ-a.e. by Theorem 2.1. By (5), f_S = const μ-a.e. Hence f_S = ∫_X f_S dμ = ∫_X f dμ μ-a.e.
(6) ⇒ (7). Let f, g ∈ L²(X, μ). Then, by (6), f_S = ∫_X f dμ μ-a.e. and the Mean Ergodic Theorem implies

lim_{n→∞} (Sₙf, g)_{2,μ} = (lim_{n→∞} Sₙf, g)_{2,μ} = (f_S, g)_{2,μ} = (∫_X f dμ, g)_{2,μ} = (f, 1)_{2,μ}(1, g)_{2,μ}.

(7) ⇒ (8) ⇒ (9) are obvious since C(X) ⊆ B(X) ⊆ L²(X, μ), and (9) ⇒ (7) can be verified by a simple approximation argument since C(X) is dense in L²(X, μ).
(8) ⇒ (10). Take f = 1_A and g = 1_B in (8). (10) ⇒ (11) is obvious.
(11) ⇒ (1). Let A ∈ 𝔛 be S-invariant. Then (11) implies that μ(A) = μ(A)², so that μ(A) = 0 or 1. Hence (1) holds.

Remark 3. (1) Recall that a semialgebra of subsets of X is a set 𝔛₀ such that
(i) ∅ ∈ 𝔛₀;
(ii) A, B ∈ 𝔛₀ ⇒ A ∩ B ∈ 𝔛₀;
(iii) A ∈ 𝔛₀ ⇒ A^c = ∪_{j=1}^n B_j with disjoint B₁, …, B_n ∈ 𝔛₀.
As we have seen in Section 2.1, in an alphabet message space X₀^ℤ, the set 𝔐 of all messages is a semialgebra. Another such example is the set 𝔛 × 𝔜 of all rectangles, where (Y, 𝔜) is another measurable space.
(2) Let μ ∈ P(X) and 𝔛₀ be a semialgebra generating 𝔛, i.e., σ(𝔛₀) = 𝔛. If μ is S-invariant on 𝔛₀, i.e., μ(S⁻¹A) = μ(A) for A ∈ 𝔛₀, then μ ∈ P_S(X). In fact, let
(7) =* (8) =4> (9) are obvious since C(X) C B(X) C L2(X,n) and (9) => (7) can be verified by a simple approximation argument since C(X) is dense in L2(X, fj,). (8) => (10). Take / = 1A and g = 1 B in (8). (10) => (11) is obvious. (11) => (1). Let A 6 £ be S-invariant. Then (11) implies that (M(A) = n(A)2, so that /J(A) = 0 or 1. Hence (1) holds. R e m a r k 3 . (1) Recall that a semialgebra of subsets of X is a set Xo such that (i) 0 e X 0 ; (ii) A, B e X0 =*• A n 5 € X 0 ; (iii) i e J E o ^ A ^ U B j with disjoint B 1 ( . . . , Bn € XQ. i=1 As we have seen in Section 2.1, in an alphabet message space Xg2, the set 9JI of all messages is a semialgebra. Another such example is the set X x 2) of all rectangles, where (Y,%)) is another measurable space. (2) Let n e P{X) and Xo be a semialgebra generating X, i.e., cr(Xo) = X. If // is S-invariant on Xo, i-e., ^ ( S - 1 ^ ) = n(A) for A 6 Xo, then // e P S (X). In fact, let X1 = {AeX:
M(S_1A) =
n{A)}.
Then, clearly 𝔛₀ ⊆ 𝔛₁. It is not hard to see that each set in the algebra 𝒜(𝔛₀) generated by 𝔛₀ is a finite disjoint union of sets in 𝔛₀. Hence 𝒜(𝔛₀) ⊆ 𝔛₁. Also it is not hard to see that 𝔛₁ is a monotone class, i.e., {A_n}_{n=1}^∞ ⊆ 𝔛₁ and A_n ↑ (or A_n ↓) imply ∪ A_n ∈ 𝔛₁ (or ∩ A_n ∈ 𝔛₁). Since the σ-algebra generated by 𝒜(𝔛₀) is the monotone class generated by 𝒜(𝔛₀), we have that 𝔛 = σ(𝒜(𝔛₀)) = 𝔛₁. Thus μ ∈ P_S(X).
(3) In view of (2) above, we can replace 𝔛 in conditions (10) and (11) of Theorem 2 by a semialgebra 𝔛₀ generating 𝔛. In fact, suppose that the equality in (10) of
Theorem 2 holds for A, B ∈ 𝔛₀. Then it also holds for A, B ∈ 𝒜(𝔛₀) since each A ∈ 𝒜(𝔛₀) can be written as a finite disjoint union of some A₁, …, A_n ∈ 𝔛₀. Now let ε > 0 and A, B ∈ 𝔛, and choose A₀, B₀ ∈ 𝒜(𝔛₀) such that μ(AΔA₀) < ε and μ(BΔB₀) < ε. Note that for j ≥ 0

(S^{−j}A ∩ B)Δ(S^{−j}A₀ ∩ B₀) ⊆ (S^{−j}AΔS^{−j}A₀) ∪ (BΔB₀) = (S^{−j}(AΔA₀)) ∪ (BΔB₀)

and hence

μ((S^{−j}A ∩ B)Δ(S^{−j}A₀ ∩ B₀)) ≤ μ(S^{−j}(AΔA₀)) + μ(BΔB₀) < 2ε,

since μ is S-invariant. This implies that

|μ(S^{−j}A ∩ B) − μ(S^{−j}A₀ ∩ B₀)| < 2ε,  j ≥ 0. (3.1)

Moreover, we have that

|μ(S^{−j}A ∩ B) − μ(A)μ(B)|
 ≤ |μ(S^{−j}A ∩ B) − μ(S^{−j}A₀ ∩ B₀)| + |μ(S^{−j}A₀ ∩ B₀) − μ(A₀)μ(B₀)|
  + |μ(A₀)μ(B₀) − μ(A)μ(B₀)| + |μ(A)μ(B₀) − μ(A)μ(B)|
 ≤ 4ε + |μ(S^{−j}A₀ ∩ B₀) − μ(A₀)μ(B₀)|, (3.2)

which will be used not only here but also for the mixing properties in Theorem 6 and Remark 11 below. Consequently, by (3.1) it holds that

|(1/n) Σ_{j=0}^{n−1} μ(S^{−j}A ∩ B) − μ(A)μ(B)|
 ≤ (1/n) Σ_{j=0}^{n−1} |μ(S^{−j}A ∩ B) − μ(S^{−j}A₀ ∩ B₀)|
  + |(1/n) Σ_{j=0}^{n−1} μ(S^{−j}A₀ ∩ B₀) − μ(A₀)μ(B₀)| + |μ(A₀)μ(B₀) − μ(A)μ(B)|
 ≤ 4ε + |(1/n) Σ_{j=0}^{n−1} μ(S^{−j}A₀ ∩ B₀) − μ(A₀)μ(B₀)|,

where the second term on the RHS can be made < ε for large enough n. This means that (10) of Theorem 2 holds.
(4) Condition (11) of Theorem 2 suggests that the following conditions are equivalent to any one of (1) – (11) of Theorem 2:
(7′) lim_{n→∞} (Sₙf, f)_{2,μ} = |(f, 1)_{2,μ}|² for every f ∈ L²(X, μ);
(8′) lim_{n→∞} μ((Sₙf)f) = μ(f)² for every f ∈ B(X);
(9′) lim_{n→∞} μ((Sₙf)f) = μ(f)² for every f ∈ C(X).

(5) 𝔍 denotes the σ-subalgebra consisting of S-invariant sets in 𝔛, i.e., 𝔍 = {A ∈ 𝔛 : S⁻¹A = A}. For μ ∈ P(X) let

𝔍_μ = {A ∈ 𝔛 : μ(S⁻¹AΔA) = 0},

the set of all μ-a.e. S-invariant, or S-invariant (mod μ), sets in 𝔛. Clearly 𝔍 ⊆ 𝔍_μ. Then, we can show that μ ∈ P_S(X) is ergodic iff μ(A) = 0 or 1 for every A ∈ 𝔍_μ. In fact, the "if" part is obvious. To prove the "only if" part, let A ∈ 𝔍_μ. First we note that μ(S^{−n}AΔA) = 0, n ≥ 0. For, if n ≥ 1, then

S^{−n}AΔA ⊆ ∪_{j=0}^{n−1} (S^{−(j+1)}AΔS^{−j}A) = ∪_{j=0}^{n−1} S^{−j}(S^{−1}AΔA),

and each set on the RHS has μ-measure zero since μ is S-invariant. …

If M = (m_{ij}) denotes the transition matrix of a stationary Markov source μ and k ≥ 1, then M^k = (m_{ij}^{(k)}) gives the k-step transition probabilities, i.e.,

m_{ij}^{(k)} = Pr{x_k = a_j | x₀ = a_i},  1 ≤ i, j ≤ ℓ.
M or μ is said to be irreducible if for each i, j there is some k ≥ 1 such that m_{ij}^{(k)} > 0. We first claim that

N = lim_{n→∞} (1/n) Σ_{k=0}^{n−1} M^k

exists and N = (n_{ij}) is a stochastic matrix such that NM = MN = N = N². In fact, let A_i = [x₀ = a_i], 1 ≤ i ≤ ℓ, and apply the Pointwise Ergodic Theorem to f = 1_{A_j}. Then we have that

f_S(x) = lim_{n→∞} (1/n) Σ_{k=0}^{n−1} 1_{A_j}(S^k x)

exists μ-a.e. x and

(1/p_i) ∫_{A_i} f_S dμ = (1/p_i) lim_{n→∞} (1/n) Σ_{k=0}^{n−1} μ(A_i ∩ S^{−k}A_j) = lim_{n→∞} (1/n) Σ_{k=0}^{n−1} m_{ij}^{(k)} = n_{ij} (3.3)

for 1 ≤ i, j ≤ ℓ. …
(2) ⇒ (3) is clear since we are assuming p_i > 0 for every i.
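The claim about N = lim (1/n) Σ M^k can be checked numerically for a concrete stochastic matrix; the sketch below (our own example matrix) approximates the Cesàro limit and verifies NM = MN = N = N²:

```python
import numpy as np

# Approximate the Cesaro limit N = lim (1/n) sum_{k=0}^{n-1} M^k and check
# the identities NM = MN = N = N^2 claimed in (3.3) and the surrounding text.
M = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.7]])

n = 20_000
N_mat = np.zeros_like(M)
P = np.eye(3)
for _ in range(n):
    N_mat += P
    P = P @ M
N_mat /= n

for prod in (N_mat @ M, M @ N_mat, N_mat @ N_mat):
    assert np.allclose(prod, N_mat, atol=1e-3)
assert np.allclose(N_mat.sum(axis=1), 1.0)   # N is again stochastic
```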
(3) ⇒ (4). For any i, j, lim_{n→∞} (1/n) Σ_{k=0}^{n−1} m_{ij}^{(k)} = n_{ij} > 0. This implies that we can find some k ≥ 1 such that m_{ij}^{(k)} > 0. That is, μ is irreducible. It is not hard to show the implications (4) ⇒ (3) ⇒ (2) ⇒ (1), and we leave them to the reader.

We now consider mixing properties for stationary sources, which are stronger than ergodicity.

Definition 5. A stationary source μ ∈ P_S(X) is said to be strongly mixing (SM) if

lim_{n→∞} μ(S^{−n}A ∩ B) = μ(A)μ(B),  A, B ∈ 𝔛,

and to be weakly mixing (WM) if

lim_{n→∞} (1/n) Σ_{k=0}^{n−1} |μ(S^{−k}A ∩ B) − μ(A)μ(B)| = 0,  A, B ∈ 𝔛.
It follows from the definition and Theorem 2 that strong mixing ⇒ weak mixing ⇒ ergodicity. First we characterize strong mixing.

Theorem 6. For a stationary source μ ∈ P_S(X) the following conditions are equivalent to each other:
(1) μ is strongly mixing.
(2) lim_{n→∞} (Sⁿf, g)_{2,μ} = (f, 1)_{2,μ}(1, g)_{2,μ} for every f, g ∈ L²(X, μ). That is, Sⁿf → ∫_X f dμ weakly in L²(X, μ) for every f ∈ L²(X, μ).
(3) lim_{n→∞} (Sⁿf, f)_{2,μ} = |(f, 1)_{2,μ}|² for every f ∈ L²(X, μ).
(4) lim_{n→∞} μ(S^{−n}A ∩ A) = μ(A)² for every A ∈ 𝔛.
(5) lim_{n→∞} μ(S^{−n}A ∩ A) = μ(A)² for every A ∈ 𝔛₀, a generating semialgebra.

Proof. (2) ⇒ (1) is seen by considering f = 1_A and g = 1_B. (2) ⇒ (3) ⇒ (4) ⇒ (5) is clear. (5) ⇒ (4) follows from (3.1) and (3.2) with A = B and A₀ = B₀.
(1) ⇒ (2). Let A, B ∈ 𝔛. Then by (1) we have

lim_{n→∞} (Sⁿ1_A, 1_B)_{2,μ} = lim_{n→∞} μ(S^{−n}A ∩ B) = μ(A)μ(B) = (1_A, 1)_{2,μ}(1, 1_B)_{2,μ}.

If f = Σ_{j=1}^m α_j 1_{A_j} and g = Σ_{k=1}^p β_k 1_{B_k} are simple functions, then

lim_{n→∞} (Sⁿf, g)_{2,μ} = lim_{n→∞} Σ_{j,k} α_j β_k (Sⁿ1_{A_j}, 1_{B_k})_{2,μ}
 = Σ_{j,k} α_j β_k (1_{A_j}, 1)_{2,μ}(1, 1_{B_k})_{2,μ} = (f, 1)_{2,μ}(1, g)_{2,μ}.

Hence the equality in (2) is true for all simple functions f, g. Now let f, g ∈ L²(X, μ) and ε > 0. Choose simple functions f₀ and g₀ such that ||f − f₀||_{2,μ} < ε and ||g − g₀||_{2,μ} < ε. Also choose an integer n₀ ≥ 1 such that

|(Sⁿf₀, g₀)_{2,μ} − (f₀, 1)_{2,μ}(1, g₀)_{2,μ}| < ε,  n ≥ n₀.

Then we see that for n ≥ n₀

|(Sⁿf, g)_{2,μ} − (f, 1)_{2,μ}(1, g)_{2,μ}|
 ≤ |(Sⁿf, g)_{2,μ} − (Sⁿf₀, g)_{2,μ}| + |(Sⁿf₀, g)_{2,μ} − (Sⁿf₀, g₀)_{2,μ}| + |(Sⁿf₀, g₀)_{2,μ} − (f₀, 1)_{2,μ}(1, g₀)_{2,μ}|
  + |(f₀, 1)_{2,μ}(1, g₀)_{2,μ} − (f, 1)_{2,μ}(1, g₀)_{2,μ}| + |(f, 1)_{2,μ}(1, g₀)_{2,μ} − (f, 1)_{2,μ}(1, g)_{2,μ}|
 ≤ |(Sⁿ(f − f₀), g)_{2,μ}| + |(Sⁿf₀, g − g₀)_{2,μ}| + ε + |(f − f₀, 1)_{2,μ}||(1, g₀)_{2,μ}| + |(f, 1)_{2,μ}||(1, g − g₀)_{2,μ}|
 ≤ ||f − f₀||_{2,μ}||g||_{2,μ} + ||f₀||_{2,μ}||g − g₀||_{2,μ} + ε + ||f − f₀||_{2,μ}||g₀||_{2,μ} + ||f||_{2,μ}||g − g₀||_{2,μ}
 ≤ ε||g||_{2,μ} + (||f||_{2,μ} + ε)ε + ε + ε(||g||_{2,μ} + ε) + ε||f||_{2,μ}.

It follows that

lim_{n→∞} (Sⁿf, g)_{2,μ} = (f, 1)_{2,μ}(1, g)_{2,μ}.

(4) ⇒ (3) is derived by a similar argument as in the proof of (1) ⇒ (2) above.
(3) ⇒ (2). Take any f ∈ L²(X, μ) and let

ℳ = 𝔖{Sⁿf, c : c ∈ ℂ, n ≥ 0},

the closed subspace of L²(X, μ) generated by the constant functions and Sⁿf, n ≥ 0. Now consider the set

ℳ₁ = {g ∈ L²(X, μ) : lim_{n→∞} (Sⁿf, g)_{2,μ} = (f, 1)_{2,μ}(1, g)_{2,μ}}.

Clearly ℳ₁ is a closed subspace of L²(X, μ) which contains f and the constant functions, and is S-invariant. Hence ℳ₁ contains ℳ. To see that ℳ₁ = L²(X, μ), let g ∈ ℳ^⊥, the orthogonal complement of ℳ in L²(X, μ). Then we have

(Sⁿf, g)_{2,μ} = 0 (n ≥ 0) and (1, g)_{2,μ} = 0,

so that g ∈ ℳ₁. Thus ℳ^⊥ ⊆ ℳ₁. Therefore ℳ₁ = L²(X, μ), i.e., (2) holds.

In Theorem 6 (2) and (3), L²(X, μ) can be replaced by B(X) or C(X).
Example 7. Every Bernoulli source is strongly mixing. To see this let μ be a (p₁, …, p_ℓ)-Bernoulli source on X = X₀^ℤ. Let A = [x_i⁰ ⋯ x_j⁰], B = [y_i⁰ ⋯ y_j⁰] ∈ 𝔐. Then it is clear that

lim_{n→∞} μ(S^{−n}A ∩ B) = μ(A)μ(B)

since for a large enough n ≥ 1 we have n + i > j, so that S^{−n}A and B depend on disjoint sets of coordinates. By Theorem 6 μ is strongly mixing.

In order to characterize weak mixing we need the following definition and lemma.

Definition 8. A subset J ⊆ ℤ₊ = {0, 1, 2, …} is said to be of density zero if

lim_{n→∞} (1/n)|J ∩ J_n| = 0,

where J_n = {0, 1, 2, …, n − 1} (n ≥ 1) and |J ∩ J_n| is the cardinality of J ∩ J_n.

Lemma 9. For a bounded sequence {a_n}_{n=1}^∞ of real numbers the following conditions are equivalent:
(1) lim_{n→∞} (1/n) Σ_{j=0}^{n−1} |a_j| = 0;
(2) lim_{n→∞} (1/n) Σ_{j=0}^{n−1} |a_j|² = 0;
(3) There is a set J ⊆ ℤ₊ of density zero such that lim_{J∌n→∞} a_n = 0.
Proof. If we can show (1) ⇔ (3), then (2) ⇔ (3) follows by applying the same equivalence to the bounded sequence {|a_n|²}. Suppose (1) is true and let

E_k = {n ∈ ℤ₊ : |a_n| ≥ 1/k},  k ≥ 1.

Observe that E₁ ⊆ E₂ ⊆ ⋯ and each E_k has density zero since

(1/n)|E_k ∩ J_n| ≤ (k/n) Σ_{j=0}^{n−1} |a_j| → 0

as n → ∞ by (1). Hence for each k = 1, 2, … we can find an integer j_k ≥ 0 such that 1 = j₀ < j₁ < j₂ < ⋯ and

(1/n)|E_{k+1} ∩ J_n| < 1/(k+1),  n ≥ j_k. (3.4)

Now we set J = ∪_{k=1}^∞ (E_k ∩ [j_{k−1}, j_k)). We first show that J has density zero. If j_{k−1} ≤ n < j_k, then, since E₁ ⊆ E₂ ⊆ ⋯ gives J ∩ J_n ⊆ E_k ∩ J_n,

(1/n)|J ∩ J_n| ≤ (1/n)|E_k ∩ J_n| < 1/k

by (3.4). Hence (1/n)|J ∩ J_n| → 0 as n → ∞, i.e., J has density zero. Secondly, we show that lim_{J∌n→∞} a_n = 0. If n ≥ j_k and n ∉ J, then n ∉ E_k and |a_n| < 1/k. This gives the conclusion.
(3) ⇒ (1). Suppose (3) holds with a set J ⊆ ℤ₊ of density zero and let ε > 0. Then,

(1/n) Σ_{j=0}^{n−1} |a_j| = (1/n) Σ_{j∈J_n∩J} |a_j| + (1/n) Σ_{j∈J_n∩J^c} |a_j|.

Since {a_n} is bounded and J has density zero, the first term can be made < ε for large enough n. Since a_n → 0 as n → ∞ with n ∉ J, the second term can also be made < ε for large enough n. Therefore (1) holds.

Theorem 10. For a stationary source μ ∈ P_S(X) the following conditions are equivalent to each other:
(1) μ is weakly mixing.
(2) For any A, B ∈ 𝔛 there is a set J ⊆ ℤ₊ of density zero such that

lim_{J∌n→∞} μ(S^{−n}A ∩ B) = μ(A)μ(B).

(3) lim_{n→∞} (1/n) Σ_{j=0}^{n−1} |μ(S^{−j}A ∩ B) − μ(A)μ(B)|² = 0 for every A, B ∈ 𝔛.
(4) lim_{n→∞} (1/n) Σ_{k=0}^{n−1} |(S^k f, g)_{2,μ} − (f, 1)_{2,μ}(1, g)_{2,μ}| = 0 for every f, g ∈ L²(X, μ).
(5) μ × μ is weakly mixing relative to S × S, where μ × μ is the product measure on (X × X, 𝔛 ⊗ 𝔛).
(6) μ × η is ergodic relative to S × T, where (Y, 𝔜, η, T) is an ergodic dynamical system, i.e., η ∈ P_{se}(Y).
(7) μ × μ is ergodic relative to S × S.

Remark 11. In Theorem 10 above, conditions (2), (3) and (4) may be replaced by (2′), (2″), (3′), (3″) and (4′), (4″) below, respectively, where 𝔛₀ is a semialgebra generating 𝔛:
(2′) For any A, B ∈ 𝔛₀ there exists a set J ⊆ ℤ₊ of density zero such that lim_{J∌n→∞} μ(S^{−n}A ∩ B) = μ(A)μ(B).
(2″) For any A ∈ 𝔛₀ there exists a set J ⊆ ℤ₊ of density zero such that lim_{J∌n→∞} μ(S^{−n}A ∩ A) = μ(A)².
(3′) lim_{n→∞} (1/n) Σ_{j=0}^{n−1} |μ(S^{−j}A ∩ A) − μ(A)²|² = 0 for every A ∈ 𝔛₀.
(3″) lim_{n→∞} (1/n) Σ_{j=0}^{n−1} |μ(S^{−j}A ∩ A) − μ(A)²| = 0 for every A ∈ 𝔛₀.
(4") Urn - " E K S ^ ' / . ^ ^ - K / . ^ ^ P I ^ O for e v e r y / e L 2 ( X , M ) . n-voo n J=Q
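Lemma 9, which drives the equivalence of conditions (2) and (3), can be watched numerically: a bounded sequence that vanishes off a density-zero set has Cesàro averages of its absolute values tending to 0. Below is a minimal sketch in plain Python; the sequence $a_n$ and the exceptional set of perfect squares are hypothetical choices made only for illustration.

```python
import math

def a(n):
    # a_n = 1 on the density-zero set J = {perfect squares}, else a_n -> 0
    r = math.isqrt(n)
    if r * r == n:
        return 1.0
    return 1.0 / (n + 1)

def cesaro(N):
    # (1/N) * sum_{j=0}^{N-1} |a_j|
    return sum(abs(a(j)) for j in range(N)) / N

# J has density zero: |J ∩ [0, N)| / N ≈ 1/sqrt(N) -> 0, so the Cesàro
# averages shrink even though a_n = 1 infinitely often.
avgs = [cesaro(N) for N in (10, 100, 1000, 10000)]
assert all(x > y for x, y in zip(avgs, avgs[1:]))  # strictly decreasing here
assert avgs[-1] < 0.05
```

Note that the plain limit $\lim_n a_n$ fails here, which is exactly why condition (2) is stated as a limit avoiding a density-zero set.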
Proof of Theorem 10. (1) ⇔ (2) ⇔ (3) follows from Lemma 9 with
$$a_n = \mu(S^{-n}A \cap B) - \mu(A)\mu(B), \qquad n \ge 1.$$
(1) ⇒ (4) can be verified first for simple functions and then for $L^2$-functions by a suitable approximation as in the proof of (1) ⇒ (2) of Theorem 6. (4) ⇒ (1) is trivial.
(2) ⇒ (5). Let $A, B, C, D \in \mathfrak{X}$ and choose $J_1, J_2 \subset \mathbb{Z}^+$ of density zero such that
$$\lim_{J_1 \not\ni n \to \infty} \mu(S^{-n}A \cap B) = \mu(A)\mu(B), \qquad \lim_{J_2 \not\ni n \to \infty} \mu(S^{-n}C \cap D) = \mu(C)\mu(D).$$
It follows that
$$\lim_{J_1 \cup J_2 \not\ni n \to \infty} (\mu \times \mu)\big((S \times S)^{-n}(A \times C) \cap (B \times D)\big) = \lim_{J_1 \cup J_2 \not\ni n \to \infty} \mu(S^{-n}A \cap B)\,\mu(S^{-n}C \cap D) = \mu(A)\mu(B)\mu(C)\mu(D) = (\mu \times \mu)(A \times C)\,(\mu \times \mu)(B \times D).$$
Since $\mathfrak{X} \times \mathfrak{X} = \{A \times B : A, B \in \mathfrak{X}\}$ is a semialgebra and generates $\mathfrak{X} \otimes \mathfrak{X}$, and $J_1 \cup J_2 \subset \mathbb{Z}^+$ is of density zero, we invoke Lemma 9 and Remark 11 to see that $\mu \times \mu$ is weakly mixing.
(5) ⇒ (6). Suppose $\mu \times \mu$ is weakly mixing and $(Y, \mathfrak{Y}, \eta, T)$ is an ergodic dynamical system. First we note that (5) implies (2) and hence $\mu$ itself is weakly mixing. Let $A, B \in \mathfrak{X}$ and $C, D \in \mathfrak{Y}$. Then
$$\frac{1}{n} \sum_{j=0}^{n-1} (\mu \times \eta)\big((S \times T)^{-j}(A \times C) \cap (B \times D)\big) = \frac{1}{n} \sum_{j=0}^{n-1} \mu(A)\mu(B)\,\eta(T^{-j}C \cap D) + \frac{1}{n} \sum_{j=0}^{n-1} \big\{\mu(S^{-j}A \cap B) - \mu(A)\mu(B)\big\}\,\eta(T^{-j}C \cap D). \tag{3.5}$$
The first term on the RHS of (3.5) converges to
$$\mu(A)\mu(B)\eta(C)\eta(D) = (\mu \times \eta)(A \times C)\,(\mu \times \eta)(B \times D) \qquad (n \to \infty)$$
since $\eta$ is ergodic. The second term on the RHS of (3.5) tends to 0 $(n \to \infty)$ since
$$\Big| \frac{1}{n} \sum_{j=0}^{n-1} \big\{\mu(S^{-j}A \cap B) - \mu(A)\mu(B)\big\}\,\eta(T^{-j}C \cap D) \Big| \le \frac{1}{n} \sum_{j=0}^{n-1} \big|\mu(S^{-j}A \cap B) - \mu(A)\mu(B)\big| \to 0 \qquad (n \to \infty),$$
since $\mu$ is weakly mixing. Thus $\mu \times \eta$ is ergodic since $\mathfrak{X} \times \mathfrak{Y}$ is a semialgebra generating $\mathfrak{X} \otimes \mathfrak{Y}$.
(6) ⇒ (7) is trivial.
(7) ⇒ (3). Let $A, B \in \mathfrak{X}$ and observe that
$$\frac{1}{n} \sum_{j=0}^{n-1} \mu(S^{-j}A \cap B) = \frac{1}{n} \sum_{j=0}^{n-1} (\mu \times \mu)\big((S \times S)^{-j}(A \times X) \cap (B \times X)\big) \to (\mu \times \mu)(A \times X)\,(\mu \times \mu)(B \times X) = \mu(A)\mu(B), \qquad \text{by (7)},$$
$$\frac{1}{n} \sum_{j=0}^{n-1} \mu(S^{-j}A \cap B)^2 = \frac{1}{n} \sum_{j=0}^{n-1} (\mu \times \mu)\big((S \times S)^{-j}(A \times A) \cap (B \times B)\big) \to (\mu \times \mu)(A \times A)\,(\mu \times \mu)(B \times B) = \mu(A)^2\mu(B)^2, \qquad \text{by (7)}.$$
Combining these two, we get
$$\frac{1}{n} \sum_{j=0}^{n-1} \big|\mu(S^{-j}A \cap B) - \mu(A)\mu(B)\big|^2 = \frac{1}{n} \sum_{j=0}^{n-1} \big\{\mu(S^{-j}A \cap B)^2 - 2\mu(S^{-j}A \cap B)\mu(A)\mu(B) + \mu(A)^2\mu(B)^2\big\} \to \mu(A)^2\mu(B)^2 - 2\mu(A)\mu(B)\mu(A)\mu(B) + \mu(A)^2\mu(B)^2 = 0.$$
Thus (3) holds.

Example 12. Let $\mu$ be an $(M, m)$-Markov source on the alphabet message space $X$. The matrix $M$ is said to be aperiodic if there is some $n_0 \ge 1$ such that $M^{n_0}$ has no zero entries. Then, the following statements are equivalent:
(1) $\mu$ is strongly mixing.
(2) $\mu$ is weakly mixing.
(3) $M$ is irreducible and aperiodic.
(4) $\lim_{n \to \infty} (M^n)_{ij} = m_j$ for every $i, j$.
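Conditions (3) and (4) of Example 12, together with condition (3) of Theorem 10, are easy to check numerically for a small chain. The sketch below (plain Python; the transition matrix $M$ and stationary vector $m$ are hypothetical) verifies that $(M^n)_{ij} \to m_j$, and that the Cesàro averages of $|\mu(S^{-j}A \cap B) - \mu(A)\mu(B)|^2$ decay for one-letter cylinders, using the standard Markov identity $\mu(S^{-j}A \cap B) = m_b (M^j)_{ba}$ for $A = \{x_0 = a\}$, $B = \{x_0 = b\}$.

```python
def mat_mul(A, B):
    # product of two square matrices given as nested lists
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(M, n):
    P = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        P = mat_mul(P, M)
    return P

M = [[0.9, 0.1],
     [0.4, 0.6]]        # hypothetical irreducible, aperiodic matrix
m = [0.8, 0.2]          # its stationary distribution: m M = m

# Condition (4): (M^n)_{ij} -> m_j for every i, j.
P = mat_pow(M, 100)
assert all(abs(P[i][j] - m[j]) < 1e-12 for i in range(2) for j in range(2))

# Condition (3) of Theorem 10 for cylinders A = {x_0 = a}, B = {x_0 = b},
# where mu(S^{-j}A ∩ B) = m_b (M^j)_{ba}:
a, b = 0, 1
def cesaro_dev(n):
    Q, total = [[1.0, 0.0], [0.0, 1.0]], 0.0
    for _ in range(n):
        total += (m[b] * Q[b][a] - m[a] * m[b]) ** 2
        Q = mat_mul(Q, M)
    return total / n

assert cesaro_dev(2000) < cesaro_dev(50) < cesaro_dev(5)   # decays to 0
```

For this chain the deviations shrink geometrically (the second eigenvalue of $M$ is $0.5$), so the Cesàro average behaves like $C/n$.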
The proof may be found in Walters [1, p. 51]. In the rest of this section we use relative entropy to characterize ergodicity and mixing properties. Recall that for a pair of information sources $\mu$ and $\nu$ the relative entropy $H(\nu|\mu)$ of $\nu$ w.r.t. $\mu$ is given by
$$H(\nu|\mu) = \sup\Big\{ \sum_{A \in \mathfrak{A}} \nu(A) \log \frac{\nu(A)}{\mu(A)} : \mathfrak{A} \in \mathcal{P}(X) \Big\} = \begin{cases} \displaystyle\int_X \log \frac{d\nu}{d\mu}\, d\nu, & \text{if } \nu \ll \mu, \\ \infty, & \text{otherwise} \end{cases}$$
(cf. (1.6.1) and Theorem 1.6.2). The following lemma is necessary.

Lemma 13. Let $\mu_n$ $(n \ge 1)$, $\mu \in P(X)$. Suppose $\mu_n \le a\mu$ for $n \ge 1$, where $a > 0$ is a constant. Then, $\lim_{n \to \infty} \mu_n(A) = \mu(A)$ uniformly in $A \in \mathfrak{X}$ if and only if $\lim_{n \to \infty} H(\mu_n|\mu) = 0$.
Proof. The "if" part follows from Theorem 1.6.3 (4). To see the "only if" part, observe that $\big\{\frac{d\mu_n}{d\mu}\big\}$ is uniformly bounded (by $a$) and converges to 1 in probability (w.r.t. $\mu$) by assumption. Since
$$|t \log t| \le |t - 1| + \tfrac{1}{2}(t - 1)^2, \qquad t > 0,$$
we have that $\big\{\frac{d\mu_n}{d\mu} \log \frac{d\mu_n}{d\mu}\big\}$ converges to 0 in probability. Thus, since $\big\{\frac{d\mu_n}{d\mu} \log \frac{d\mu_n}{d\mu}\big\}$ is uniformly bounded, we also have
$$\lim_{n \to \infty} H(\mu_n|\mu) = \lim_{n \to \infty} \int_X \frac{d\mu_n}{d\mu} \log \frac{d\mu_n}{d\mu}\, d\mu = 0.$$
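On a finite partition the relative entropy is a finite sum, and the interplay in Lemma 13 between uniform closeness of measures and vanishing relative entropy can be observed directly. A small sketch in plain Python (the distributions are hypothetical; natural logarithms are used, matching the formulas above):

```python
import math

def rel_entropy(nu, mu):
    # H(nu|mu) = sum nu(x) log(nu(x)/mu(x)); +infinity unless nu << mu
    h = 0.0
    for p, q in zip(nu, mu):
        if p > 0.0:
            if q == 0.0:
                return math.inf
            h += p * math.log(p / q)
    return h

mu = [0.5, 0.3, 0.2]

# mu_n -> mu uniformly, with densities d(mu_n)/d(mu) bounded by a = 2
def mu_n(n):
    eps = 0.1 / n
    return [mu[0] + eps, mu[1] - eps, mu[2]]

hs = [rel_entropy(mu_n(n), mu) for n in (1, 10, 100)]
assert hs[0] > hs[1] > hs[2] >= 0.0      # H(mu_n|mu) decreases toward 0
assert rel_entropy(mu, mu) == 0.0
assert rel_entropy([1.0, 0.0, 0.0], [0.0, 0.5, 0.5]) == math.inf
```

The last assertion illustrates the "otherwise" branch of the definition: without absolute continuity the relative entropy is infinite.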
We introduce some notation. Let $\mu \in P(X)$. For each $n \ge 1$ define a measure $\bar\mu_n$ on $\mathfrak{X} \otimes \mathfrak{X}$ by
$$\bar\mu_n(A \times B) = \frac{1}{n} \sum_{j=0}^{n-1} \mu(S^{-j}A \cap B), \qquad A, B \in \mathfrak{X}.$$
For a finite partition $\mathfrak{A} \in \mathcal{P}(X)$ of $X$, $\mu_{\mathfrak{A}}$ denotes the restriction of $\mu$ to $\widetilde{\mathfrak{A}} = \sigma(\mathfrak{A})$, i.e., $\mu_{\mathfrak{A}} = \mu|_{\sigma(\mathfrak{A})}$. For $\mathfrak{A}, \mathfrak{B} \in \mathcal{P}(X)$ denote by $\mathcal{A}(\mathfrak{A} \times \mathfrak{B})$ the algebra generated by the set $\{A \times B : A \in \mathfrak{A}, B \in \mathfrak{B}\}$ of rectangles. We also let
$$H_{\bar\mu_n}(\mathfrak{A} \times \mathfrak{B}) = -\sum_{A \in \mathfrak{A}, B \in \mathfrak{B}} \bar\mu_n(A \times B) \log \bar\mu_n(A \times B),$$
$$H_{\mu \times \mu}(\mathfrak{A} \times \mathfrak{B}) = -\sum_{A \in \mathfrak{A}} \mu(A) \log \mu(A) - \sum_{B \in \mathfrak{B}} \mu(B) \log \mu(B).$$

For $\varepsilon > 0$ there is an integer $n_0 \ge 1$ such that
$$|\mu_n(A) - \mu(A)| < \varepsilon, \qquad n \ge n_0, \ A \in \mathfrak{X}.$$
Thus for $A \in \mathfrak{X}$ and $p, q \ge 1$ we have that
$$\Big| \frac{1}{p} \sum_{j=0}^{p-1} \mu(S^{-j}A) - \frac{1}{q} \sum_{k=0}^{q-1} \mu(S^{-k}A) \Big| \le \Big| \frac{1}{p} \sum_{j=0}^{p-1} \big\{\mu(S^{-j}A) - \mu_{n_0}(S^{-j}A)\big\} \Big| + \Big| \frac{1}{q} \sum_{k=0}^{q-1} \big\{\mu(S^{-k}A) - \mu_{n_0}(S^{-k}A)\big\} \Big| + \Big| \frac{1}{p} \sum_{j=0}^{p-1} \mu_{n_0}(S^{-j}A) - \frac{1}{q} \sum_{k=0}^{q-1} \mu_{n_0}(S^{-k}A) \Big|.$$
The first two terms on the RHS are each $< \varepsilon$, and since $\mu_{n_0} \in P_a(X)$ there exists an integer $p_0 \ge 1$ such that the third term of the RHS of the above expression can be made $< \varepsilon$ for $p, q \ge p_0$. Consequently it follows that for $p, q \ge p_0$ the LHS of the above expression is $< 3\varepsilon$. Therefore, the limit in (4.1) exists for every $A \in \mathfrak{X}$, so that $\mu \in P_a(X)$.

(2) Let us set
$$M_a(X) = \{\alpha\mu + \beta\eta : \alpha, \beta \in \mathbb{C},\ \mu, \eta \in P_a(X)\}, \qquad M_s(X) = \{\alpha\mu + \beta\eta : \alpha, \beta \in \mathbb{C},\ \mu, \eta \in P_s(X)\}.$$
Note that $M_a(X)$ is the set of all measures $\mu \in M(X)$ for which the limit in (4.1) exists, and $M_s(X)$ is the set of all $S$-invariant $\mu \in M(X)$. Define an operator $T_0 : P_a(X) \to P_s(X)$ by
$$T_0\mu = \bar\mu, \qquad \mu \in P_a(X),$$
and extend it linearly to an operator $T$ of $M_a(X)$ onto $M_s(X)$. Then we see that $T$ is a bounded linear operator of norm 1, since $M_a(X)$ is a norm closed subspace of $M(X)$ by (1). Hence (2) follows immediately.
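The stationary mean $T_0\mu = \bar\mu$ is easy to compute for a concrete nonstationary source. In the sketch below (plain Python; the deterministic two-state chain is a hypothetical example), the one-dimensional marginal $\mu(S^{-n}A)$ for $A = \{x_0 = 0\}$ oscillates, so no plain limit exists, yet the Cesàro limit in (4.1) exists and equals the stationary value $1/2$:

```python
def step(p, M):
    # one step of the chain: p -> p M
    return [sum(p[i] * M[i][j] for i in range(len(p))) for j in range(len(p))]

M = [[0.0, 1.0],
     [1.0, 0.0]]        # deterministic swap: period 2, AMS but not mixing
p = [1.0, 0.0]          # nonstationary initial distribution

vals = []               # vals[n] plays the role of mu(S^{-n}A), A = {x_0 = 0}
for _ in range(1000):
    vals.append(p[0])
    p = step(p, M)

assert vals[0] == 1.0 and vals[1] == 0.0    # no plain limit: 1, 0, 1, 0, ...
avg = sum(vals) / len(vals)                 # Cesàro average as in (4.1)
assert abs(avg - 0.5) < 1e-12               # the stationary mean gives 1/2
```

This is the prototypical AMS-but-not-mixing behavior: $T_0$ smooths the periodic oscillation into the invariant measure $(1/2, 1/2)$.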
2.4. AMS sources
Definition 4. Let $\mu, \eta \in P(X)$. We say that $\eta$ asymptotically dominates $\mu$, denoted $\mu \overset{a}{\ll} \eta$, if $\eta(A) = 0$ implies $\lim_{n \to \infty} \mu(S^{-n}A) = 0$. Note that if $\mu \ll \xi \overset{a}{\ll} \eta$ or $\mu \overset{a}{\ll} \xi \ll \eta$, then $\mu \overset{a}{\ll} \eta$. After the next lemma, we characterize AMS sources.

Lemma 5. Let $\mu, \eta \in P(X)$ and for $n \ge 0$ let $S^n\mu = (S^n\mu)_a + (S^n\mu)_s$ be the Lebesgue decomposition of $S^n\mu$ w.r.t. $\eta$, where $(S^n\mu)_a$ is the absolutely continuous part and $(S^n\mu)_s$ is the singular part. If $f_n = \frac{d(S^n\mu)_a}{d\eta}$ is the Radon-Nikodym derivative and $\mu \overset{a}{\ll} \eta$, then it holds that
$$\lim_{n \to \infty} \sup_{A \in \mathfrak{X}} \Big\{ \mu(S^{-n}A) - \int_A f_n\, d\eta \Big\} = 0. \tag{4.2}$$
Proof. For each $n = 0, 1, 2, \dots$ let $B_n \in \mathfrak{X}$ be such that $\eta(B_n) = 0$ and
$$S^n\mu(A) = S^n\mu(A \cap B_n) + \int_A f_n\, d\eta, \qquad A \in \mathfrak{X}.$$
Let $B = \bigcup_{n=0}^{\infty} B_n$. Then we see that $\eta(B) = 0$ and for any $A \in \mathfrak{X}$
$$0 \le \mu(S^{-n}A) - \int_A f_n\, d\eta = \mu\big(S^{-n}(A \cap B_n)\big) \le \mu\big(S^{-n}(A \cap B)\big) \le \mu(S^{-n}B) \to 0$$
as $n \to \infty$ by $\mu \overset{a}{\ll} \eta$.

(3) ⇒ (4) is immediate since $A \in \mathfrak{X}_\infty$ whenever $A \in \mathfrak{X}$ is $S$-invariant.
(4) ⇒ (2). Let $\eta \in P_s(X)$ satisfy the condition in (4). Take an $A \in \mathfrak{X}$ with $\eta(A) = 0$ and let $B = \limsup_{n \to \infty} S^{-n}A$. Then we see that $B$ is $S$-invariant and $\eta(B) = 0$. That $\lim_{n \to \infty} \mu(S^{-n}A) = 0$ can be shown in the same fashion as in the proof of (1) ⇒ (2) above.
(2) ⇒ (5). Suppose that $\mu \overset{a}{\ll} \eta$ with $\eta \in P_s(X)$, and let $f \in B(X)$ be arbitrary. Then the set $A = \{x \in X : (S_n f)(x) \text{ converges}\}$ is $S$-invariant and $\eta(A) = 1$ by the Pointwise Ergodic Theorem. Thus, that $A^c$ is $S$-invariant and $\eta(A^c) = 0$ imply
$$\lim_{n \to \infty} \mu(S^{-n}A^c) = \mu(A^c) = 0$$
by $\mu \overset{a}{\ll} \eta$, i.e., $S_n f$ converges $\mu$-a.e.
(5) ⇒ (6). Let $f \in B(X)$ and observe that $\{S_n f\}_{n=1}^{\infty}$ is a bounded sequence in $B(X) \subset L^1(X, \mu)$ such that $S_n f \to f_S$ $\mu$-a.e. by (5). Then the Bounded Convergence Theorem implies that $\mu(S_n f) \to \mu(f_S)$.
(6) ⇒ (1). We only have to take $f = 1_A$ in (6). The equality (4.3) is almost clear.
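Condition (5), the a.e. convergence of the averages $S_n f$, can be watched on a simulated path of an AMS source. A rough sketch follows (plain Python; the aperiodic two-state chain, its stationary vector $(0.8, 0.2)$, the starting state, and the seed are all hypothetical choices for illustration):

```python
import random

random.seed(0)

# transition probabilities of a hypothetical aperiodic 2-state chain
P = {0: (0.9, 0.1), 1: (0.4, 0.6)}   # stationary distribution (0.8, 0.2)

def sample_path(x0, n):
    x, path = x0, []
    for _ in range(n):
        path.append(x)
        x = 0 if random.random() < P[x][0] else 1
    return path

# f = 1_{x_0 = 0}; S_n f = (1/n) sum_{j<n} f(S^j x) should approach 0.8
path = sample_path(1, 200_000)       # start in state 1 (nonstationary)
snf = sum(1 for x in path if x == 0) / len(path)
assert abs(snf - 0.8) < 0.02
```

Here $f_S$ is constant because the chain is ergodic, so every path average converges to the same stationary value.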
Remark 7. (1) Note that $\mu \overset{a}{\ll} \eta$ with $\eta \in P_s(X)$ is exactly the condition (2) in the above theorem.
(2) If $\mu \in P(X)$ and $\mu \ll \eta$ for some $\eta \in P_s(X)$, then $\mu$ is AMS.
(3) The Pointwise Ergodic Theorem holds for $\mu \in P(X)$ iff $\mu \in P_a(X)$. More precisely, the following statements are equivalent for $\mu \in P(X)$:
(i) $\mu \in P_a(X)$.
(ii) For any $f \in B(X)$ there exists some $S$-invariant function $f_S \in B(X)$ such that $S_n f \to f_S$ $\mu$-a.e.
In this case, for $f \in L^1(X, \bar\mu)$, $S_n f \to f_S$ $\mu$-a.e., $\bar\mu$-a.e. and in $L^1(X, \bar\mu)$, and $f_S = E_{\mu}(f|\mathfrak{I}) = E_{\bar\mu}(f|\mathfrak{I})$ $\mu$-a.e. and $\bar\mu$-a.e., where $\mathfrak{I} = \{A \in \mathfrak{X} : S^{-1}A = A\}$ is a $\sigma$-algebra.
(4) In (5) and (6) of Theorem 6, $B(X)$ can be replaced by $C(X)$.
(5) $P_a(X)$ is weak* compact with the identification of $P_a(X) \subset M(X) = C(X)^*$.
When $S$ is invertible, we can give some further characterizations of AMS sources.

Proposition 8. Suppose that $S$ is invertible. Then, for $\mu \in P(X)$ the following conditions are equivalent:
(1) $\mu \in P_a(X)$.
(2) There exists some $\eta \in P_s(X)$ such that $\mu \ll \eta$.
(3) There exists some $\eta \in P_a(X)$ such that $\mu \ll \eta$.

Proof. (1) ⇒ (2). If $\mu \in P_a(X)$, then (1) ⇒ (3) in Theorem 6 implies $\mu \ll \eta = \bar\mu$.
(2) ⇒ (3) is immediate.
(3) ⇒ (1). If $\eta \in P_a(X)$, then $\eta \ll \bar\eta$ by the proof of (1) ⇒ (2). Hence, if $\mu \ll \eta$,
then $\mu \ll \bar\eta$. This implies $\mu \in P_a(X)$ by Remark 7 (2).

If $\eta(A) = 0$, then $\mu(A) = 0$ by $\mu \ll \eta$. Similarly, if $\eta(A) = 1$, then we have $\mu(A) = 1$. Thus $\mu \in P_{ae}(X)$. The implications (1) ⇒ (4) ⇒ (5) ⇒ (6) ⇒ (7) ⇒ (8) ⇒ (9) ⇒ (1) are shown in much the same way as in the proof of Theorem 3.2.
Remark 13. In (8) and (9) of Theorem 12, $\mathfrak{X}$ can be replaced by a semialgebra $\mathfrak{X}_0$ that generates $\mathfrak{X}$. Also in (5), (6) and (7) of Theorem 12, we can take $g = f$.

Theorem 14. (1) If $\mu \in \operatorname{ex} P_a(X)$, then $\mu \in P_{ae}(X)$. That is, $\operatorname{ex} P_a(X) \subseteq P_{ae}(X)$.
(2) If $P_{se}(X) \ne \emptyset$, then the above set inclusion is proper. That is, there is a $\mu \in P_{ae}(X)$ such that $\mu \notin \operatorname{ex} P_a(X)$.

Proof. (1) This can be verified in exactly the same manner as in the proof of (4) ⇒ (1) of Theorem 3.2.
(2) Let $\mu \in P_{ae}(X)$ be such that $\mu \ne \bar\mu$. The existence of such a $\mu$ is seen as follows. Take any stationary and ergodic $\xi \in P_{se}(X)$ $(\ne \emptyset)$ and any nonnegative $f \in L^1(X, \xi)$ with norm 1 which is not $S$-invariant on a set of positive $\xi$-measure. Define $\mu$ by
$$\mu(A) = \int_A f\, d\xi, \qquad A \in \mathfrak{X}.$$

(2) ⇒ (3) is clear.
Then $\mu \ll \bar\mu = \bar\eta$ by Remark 9.
(3) ⇒ (4). Let $\eta \in P_{ae}(X)$ be such that $\mu \ll \eta$. Then $\bar\eta \in P_{se}(X)$ and $\eta \overset{a}{\ll} \bar\eta$. Hence $\mu \overset{a}{\ll} \bar\eta$, since $\bar\eta$ is stationary.
(4) ⇒ (1). Let $\eta \in P_{ae}(X)$ be such that $\mu \overset{a}{\ll} \eta$.

For $\mathfrak{A} \in \mathcal{P}(X)$ and $n \ge 1$ we also have the following identities.
(7) $\displaystyle I_\mu\Big(\bigvee_{j=1}^{n} S^{-(n-j)}\mathfrak{A}\Big) = I_\mu\big(S^{-(n-1)}\mathfrak{A}\big) + \sum_{k=1}^{n-1} I_\mu\Big(S^{-(n-k-1)}\mathfrak{A} \,\Big|\, \bigvee_{j=1}^{k} S^{-(n-j)}\mathfrak{A}\Big)$.

For, this is verified by (1), (2), (3) and mathematical induction.
2.5. Shannon-McMillan-Breiman Theorem
This is obtained from (6) by letting $\mathfrak{A}_j = S^{-(n-j)}\mathfrak{A}$, $1 \le j \le n$.

(8) $\displaystyle I_\mu\Big(\bigvee_{j=0}^{n-1} S^{-j}\mathfrak{A}\Big) = I_\mu(\mathfrak{A}) \circ S^{n-1} + \sum_{k=1}^{n-1} I_\mu\Big(\mathfrak{A} \,\Big|\, \bigvee_{j=1}^{k} S^{-(k-j+1)}\mathfrak{A}\Big) \circ S^{n-k-1}$.
This is immediate from (4) and (7). Now for $n = 1, 2, \dots$ let
$$f_n = I_\mu\Big(\mathfrak{A} \,\Big|\, \bigvee_{j=1}^{n} S^{-j}\mathfrak{A}\Big), \qquad f_0 = I_\mu(\mathfrak{A}),$$
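For an i.i.d. source the conditional information functions collapse to $f_n = I_\mu(\mathfrak{A})$ for every $n$, so identity (8) writes $I_\mu\big(\vee_{j=0}^{n-1} S^{-j}\mathfrak{A}\big)$ as a sum of one-letter terms; dividing by $n$ then recovers the entropy, which is the Shannon-McMillan-Breiman limit. A sketch in plain Python (the Bernoulli parameter and seed are hypothetical choices):

```python
import math
import random

random.seed(1)

p = 0.3                                  # hypothetical Bernoulli(p) source
H = -p * math.log(p) - (1 - p) * math.log(1 - p)   # entropy of partition A

n = 100_000
xs = [1 if random.random() < p else 0 for _ in range(n)]

# Information of the n-cylinder [x_0 ... x_{n-1}]: for an i.i.d. source,
# identity (8) reduces to a sum of one-letter terms I_mu(A) o S^j.
info = sum(-math.log(p) if x == 1 else -math.log(1 - p) for x in xs)

assert abs(info / n - H) < 0.02          # SMB: (1/n) I_n -> H  a.e.
```

The same normalized limit exists for general stationary ergodic sources, but there the conditional terms $f_k$ genuinely depend on the past.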