Abstract Methods in Information Theory

ABSTRACT METHODS IN INFORMATION THEORY SERIES ON MULTIVARIATE ANALYSIS Editor: M M Rao Published Vol. 1: Martingales...

Author: YuМ„ichiroМ„ Kakihara

27 downloads 692 Views 84MB Size Report

This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
Report copyright / DMCA form

DOWNLOAD PDF

ABSTRACT METHODS IN

INFORMATION THEORY

SERIES ON MULTIVARIATE ANALYSIS Editor: M M Rao

Published Vol. 1: Martingales and Stochastic Analysis J. Yeh Vol. 2: Multidimensional Second Order Stochastic Processes Y. Kakihara Vol. 3: Mathematical Methods in Sample Surveys H. G. Tucker Vol. 4: Abstract Methods in Information Theory Y. Kakihara

Forthcoming Convolution Structures and Stochastic Processes R. Lasser Topics in Circular Statistics S. R. Jammalamadaka and A. SenGupta

ABSTRACT METHODS IN

INFORMATION THEORY

Yuichiro Kakihara Department oj Mathematics University oj California, Riverside USA

World Scientific

Sinaapore*• New Jersey Jersey,London • Hong Kong Singapore

Published by World Scientific Publishing Co. Pte. Ltd. P O Box 128, Farrer Road, Singapore 912805 USA office: Suite IB, 1060 Main Street, River Edge, NJ 07661 UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

Library of Congress Cataloging-in-Publication Data Kakihara, Yuichiro. Abstract methods in information theory / Yuichiro Kakihara. p. cm. — (Series on multivariate analysis ; v. 4) Includes bibliographical references. ISBN 9810237111 (alk. paper) 1. Information theory. 2. Functional analysis. I. Title. II. Series. Q360.K35 1999 003'.54~dc21 99-31711 CIP

British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.

Copyright © 1999 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.

Printed in Singapore by Uto-Print.

Dedicated to

Professor Hisaharu Umegaki

This page is intentionally left blank blank.

PREFACE

Half a century has passed since C. E. Shannan published his epoch-making paper entitled "A mathematical theory of communication" in 1948. Thereafter the socalled "information theory" began to grow and has now established a firm and broad field of study. Viewing from a mathematical angle, information theory might be thought of having the following four parts: (1) the mathematical structure of information sources, (2) the theory of entropy as amounts of information, (3) the theory of information channels, and (4) the theory of coding. Probabilistic and algebraic methods have mainly been used to develop information theory. Since the early stage of the expansion of information theory, however, measure theoretic and functional analysis methods have also been applied and are providing a powerful tool to obtain rigorous results in this theory. The purpose of this book is to present the first three parts of information theory, mentioned above, in the environment of functional analysis, in addition to probability theory. Here are a couple of examples in each of which functional analysis played a cru cial role obtaining important results in information theory. The coincidence of the ergodic capacity Ce and the stationary capacity Cs for a certain channel was one of the most important problems in the late 1950s. L. Breiman (1960) showed that for a finite memory channel the equality Ce = Cs holds and, moreover, C e is at tained by some ergodic input source (= measure) invoking Krein-Milman's theorem to the weak* compact convex set PS(X) of all stationary input sources. Another such example appeared in a characterization of ergodic channels. In the late 1960s H. Umegaki and Y. Nakamura independently proved that a stationary channel is er godic if and only if it is an extreme point of the convex set of all stationary channels. Umegaki observed a one-to-one correspondence between the set of channels and a set of certain averaging operators from the set of bounded measurable functions on the compound space to the set of those functions on the input. Then a channel is identified with an operator, called a channel operator, and hence we can make a full use of functional analysis in studying channels. In this book, readers will find how functional analysis helps to describe information theory, especially the mathematical structure of information souces and channels, in an effective way. Here is a brief summary of this book. In Chapter I, entropy is considered as the amount of information. Shannon's entropy for finite schema is defined and its basic properties are examined together with its axioms. After collecting fundamental propvii

viii

Preface

erties of conditional expectation and probability, Kolmogorov-Sinai's entropy is then obtained for a measure preserving transformation. Some fundamental properties of the Kolmogorov-Sinai entropy are presented along with the Kolmogorov-Sinai theo rem. Algebraic models are introduced to describe probability measures and measure preserving transformations. Some conjugacy problems are studied using algebraic models. When we fix a measurable transformation and a finite partition, we can consider Kolmogorov-Sinai's entropy as a functional on the set of invariant (with respect to the transformation) probability measures, called an entropy functional. This functional is extended to be the one defined on the set of all complex valued invariant measures, and its integral representation is obtained. Relative entropy and Kullback-Leibler information are also studied in connection with sufficiency which is one of the most important notions in statistics, and with hypothesis testing. In Chapter II, information sources are considered. Using an alphabet message space as a model, we describe information sources on a compact Hausdorff space. Mean and Pointwise Ergodic Theorems are stated and proved. Ergodicity is one of the important concepts and its characterization is presented in detail. Also strong and weak mixing properties are examined in some detail. Among the nonstationary sources, AMS ( = asymptotically mean stationary) sources are of interest and the structure of this class is studied. Shannon-McMillan-Breiman Theorem is then formulated for a stationary and an AMS source, which is regarded as the ergodic theorem in information theory. Ergodic decomposition of a stationary source is established and is applied to obtain another type of integral representation of an entropy functional. Chapter III, the main part of this book, is devoted to the information channels. After defining channels, a one-to-one correspondence between a set of channels and a set of certain averaging operators is established, as mentioned before. Strongly and weakly mixing channels are defined as a generalization of finite dependent channels and their basic properties are obtained. Ergodicity of stationary channels is discussed and various necessary and sufficient conditions for it are given. For AMS channels, absolute continuity plays a special role in characterizing ergodicity. Capacity and transmission rate are defined for stationary channels. Coincidence of ergodic and stationary capacities is proved under certain conditions. Finally, Shannon's coding theorems are stated and proved. Special topics on channels are considered in Chapter IV. When a channel has a noise source, some properties of such a channel are studied. If we regard a channel to be a vector (or measure) valued function on the input space, then its measurabilities are clarified. Some approximation problems of channels are treated. When the output space is a (locally) compact abelian group, a harmonic analysis method can be applied to channel theory. Some aspects of this viewpoint are presented in detail. Finally, a noncommutative channel theory is introduced. We use a C*-algebra approach to formulate channel operators as well as other aspects of noncommutative

Preface

IX

extension. Another purpose of this book is to present contributions of Professor Hisaharu Umegaki and his school on information theory. His selected papers is published under the title of "Operator Algebras and Mathematical Information Theory," Kaigai, Tokyo in 1985. As one of his students, the author is pleased to have a chance to write this monograph. In the text III.4.5 denotes the fifth item in Section 4 of Chapter III. In a given chapter, only the section and item number are used, and in a given section, only the item number is used. The author is grateful to Professor M. M. Rao at University of California, River side (UCR) for reading the manuscript and for the valuable suggestions. UCR has provided the author with a very fine environment, where he could prepare this mono graph. He is also grateful for the hospitality of UCR. Yuichiro Kakihara Riverside, California April, 1999

This This page page is intentionally is intentionally left blank blank. left blank.

CONTENTS

Preface

vii

Chapter I. Entropy

1

1.1. The Shannon entropy 1.2. Conditional expectations 1.3. The Kolmogorov-Sinai entropy 1.4. Algebraic models 1.5. Entropy functionals 1.6. Relative entropy and Kullback-Leibler information Bibliographical notes Chapter II. Information Sources

1 11 16 29 41 49 63 67

2.1. Alphabet message spaces and information sources 2.2. Ergodic theorems 2.3. Ergodic and mixing properties 2.4. AMS sources 2.5. Shannon-McMillan-Breiman theorem 2.6. Ergodic decompositions 2.7. Entropy functionals, revisited Bibliographical notes

67 71 75 91 99 106 110 119

Chapter III. Information Channels

121

3.1. Information channels 3.2. Channel operators 3.3. Mixing channels 3.4. Ergodic channels 3.5. AMS channels 3.6. Capacity and transmission rate 3.7. Coding theorems Bibliographical notes

121 129 136 147 156 166 178 187

Chapter IV. Special Topics

189

4.1. Channels with a noise source

189 xi

xil

Contents

4.2. Measurability of a channel 4.3. Approximation of channels 4.4. Harmonic analysis for channels 4.5. Noncommutative channels Bibliographical notes

196 202 207 214 222

References

225

Indices Notation index Author index Subject index

239 239 244 247

CHAPTER I

ENTROPY

In this chapter, basic ideas of entropy are presented from the works of Shannon and Kolmogorov-Sinai. The first one is defined for finite schema and the second is for measure preserving transformations. Conjugacy between two measure preserving transformations is considered in terms of their algebraic models. When a transfor mation is fixed, the entropy is defined for all transformation invariant probability measures. In this case, it is called an entropy functional. An integral representation for this functional is given. Relative entropy and Kullback-Leibler information are studied in connection with sufficient statistics and hypothesis testing.

1.1. The Shannon entropy We consider basic properties and axioms of Shannon's entropy. Let n e N (the set of all positive integers) and X = {xi,... ,xn} be a finite set with a probability n

distribution p = ( p i , . . . , p n ) , i-e., Pj = p{xj) > 0,1 < j < n and ^2 Pj = lj where i=i p(-) denotes the probability. We usually denote this as (X, p) and call it a complete system of events or a finite scheme. The entropy or the Shannon entropy H(X) of a finite scheme (X, p) is defined by H X x

( ))

n n n

=

-^Vjôgpj, H(X) = -J2Pi^g Pj,

(1.1)

i=i

where "log" is the natural logarithm and we regard 0 log 0 = 0 log § = 0. We also say that H{X) is the uncertainty or information of the system (X, p). Justification of these terminologies will be clarified later in this section. Since the RHS (= right hand side) of (1.1) depends only on the probability distribution p = ( p i , . . . ,p„) we may also write n

H(x) = H(p) = ff(pi,... ,Pn) = - Y,Pi j=l j=i

1

l

°m-

2

Chapter I: Entropy

We need some notations. For n G N, A„ denotes the set of all probability distri butions p = ( p i , . . . ,p„.), i.e., A„ = Pj > °- ! < 3 < n ?■ Let Y = { y i , . . . ,ym} be another finite set. The probability of (xj,yk) and the conditional probability of Xj given yk are respectively denoted by p{xj,yk) and p(xj|j/fc) = p(xi,y*' if p(yk) > 0. Then the conditional entropy H(X\Y) of X given y is defined by x l x (1.2) H(X\Y) = -J2J2 p(y)p( \y) °gp( \y)(u) If we define .H^XIJ/), called the conditional entropy of X given F = y, by #(-X1JO

= - ^2 P(x\y) logp(x\y),

then (1.2) is interpreted as the average of these conditional entropies over Y. The quantity I(X, Y) defined below is called the mutual information between (X, p) and (r.q):

I(X,Y) = H(X) - H(X\Y) I(X,Y)=H(X)-H(X\Y)

since we can easily verify verify that since we can easily verify that

I(X, Y) = H{Y) - H{Y\X) = H(X) + H(Y) - H(X, Y) I(X, Y) = H(Y) - H(Y\X)

(1.3)

= H(X) + H{Y) - H(X, Y)

where

H(X,Y) =

~"£p(X,y)\ogP(x,y)

where is the entropy of the compound scheme ((X, p), (Y, q)) (cf. Theorem 1 below). If we consider two probability distributions D .2a/)iogp(x, e A , of X. ff(x,y) = -^p(x, 2 /)then the relative entrovv H(p\q) of p w.r.t. ( = with respect to) q is given by is the entropy of the compound scheme ((X, p), (Y,q)) (cf. Theorem 1 below). If we consider two probability distributions p , q £ A n of X, then the relative entropy H(p\q) of p w.r.t. ( = with respect to) q is given by n

n

j=i

i=i

#(p|q) = J2pi(lo&Pi ~ loS*) = J2pi loS — • q

i

If Pj > 0 and qj = 0 for some j , then we define if(p|q) = oo. Observe the difference between H(X\Y) and H(p\q). Relative entropy will be discussed in detail in a later section.

1.1. The Shannon entropy

3

The next two theorems give basic properties of entropies. Theorem 1. Consider entropies on A„. (1) H(p\q) > 0 for p , q £ A n , and H(p\q) = 0 iff (= if and only if) p = q. (2) Let p 6 A n and A = (ajk)jtk be an n by n doubly stochastic matrix, i.e., a

n

jk > 0, 5D ajk j=i

■ff(q) > H(p). ■ff(q) > H(p). of n\. oj {tUi1, ,.... .. , n,nj. of }. (3) H{X, Y) (3) H{X, Y) (4) H(X, Y) (5) H{X\Y)

=

n

X) ajk = 1 for 1 < j,k

< n.

Then, q = Ap e A„ and

fc=i

The equality holds iff q^ = p^(fc), 1 < k < n for some permutation The equality holds iff qk = p^(fc), 1 < k < n for some permutation

n n

= H(X) + H(Y\X) = H{Y) + H{X\Y). = H(X) + H(Y\X) = H{Y) + H{X\Y). < H(X) + H(Y). The equality holds iff X and Y are independent. < H{X). The equality holds iff X and Y are independent.

Proof. (1) Assume H(p\q) < oo. Using an inequality i l o g i > t — 1 for t > 0, we get

2lxogPi>2lî Qj

Qj

piogPl>Pj-q

or

Qj

Qj

for j — 1 , . . . , n. Hence n

n

H(p|q) ==Y,Pi H(p|q) Y,Pil0l0SS~ ~* *EEf ef e" "*)*)= =°- °q

j=l

3

j=l

The statement about the equality follows from the fact that tlogt = t — 1 iff t = 1. (2) q = Ap G A„ is clear. Since the function 0, we have n

n

7= 1 nn nTI n

jj=i =i

1= 1

fc=i fc=i

The equality holds iff <j) I J2 ajkPk)

/ n V>

fc=l nn

\ '

fc=i fc=i

= S 0(Pfc) for 1 I < k < n ioi some permutation n of { 1 , . . . , n } . (3) Observe the following computations:

H{X, Y) = - ^2p(x, y) logp(x, y)

4

Chapter I: Entropy

= - Y^P(X' y) ôgp(x)p(y\x) x,y

= - *52P(X> y) logp^) - Y^p(x'y) = H(X) +

1O

SP(V\X)

H(Y\X),

giving the first equality and, similarly, H{X, Y) = H(Y) + (4) is derived as follows:

H(X) + H{Y) = - J2P(X)

H{X\Y).

l

°SP(x) - £ p ( y ) logp(y)

x

y

= - J^pfo y) logp(x)p(y) x,y.

> - ^vi*, =

y) iogp(z, y),

by (l),

x,y H(X,Y).

By (1) the equality holds iff p(x, y) = p(x)p(y) for x € X and y £Y, are independent. (5) is clear from (3) and (4).

i.e., X and Y

Let R = (-oo, oo), R+ = [0, oo) and R+ = [0, oo]. OO

Theorem 2. Let p = (j)j),q = (qA e U A n . n=2 (1) (Positivity) H(p) > 0. (2) (Continuity) H : U A„ —>■ R+ is continuous. (3) (Monotonicity) f(n) = H(^,...,^)isan I is an iincreasing function of n and H(pi,...,pn)■•■ , n } .

(6) (Additivity) H(piqi,...

,Piqm,p2qi,...

,p2qm,... = H(pu...

,p„gi,... ,pn)+H(qi,...

,p„qm) ,qm).

IT of

1.1. The Shannon entropy

5

(7) (Subadditivity) / / rjk > 0, J2rjk = 1, X>jk = Pj, J2rjk = Qk, then j,k

H(rn,...

k

,rnm) aH(p) + (1 -

a)H{q).

Proof. (1),(2),(4) and (5) are obvious. (3) f(n) = logn for n > 1, so that / is an increasing function. As to the second statement, without loss of generality we can assume pj > 0 for all j . Then i

H(pu...

,pn)-H(-,...

1

—

, - J = - l o g n - Y^Pj log Pj

-gw^) 3=1

n

n j=1

orfl-(pi,...)pn) 1,

/(*) __ m log s

log r

Since r, s > 2 are arbitrary, it follows that for some A > 0 / ( n ) = Alogn,

n > 2,

and by 1°) the above equaity is true for n = 1 too. Consider a finite scheme (X,p) with p = ( p i , . . . ,p„) e A n . If p(x.,) = p^- = ~, then Xj has logn = — log ^ as information or entropy, which is justified by Proposition 3. This suggests that each Xj has information of — logpj and H{X) = n

- J2 pj logp.,- is the average information that X = {xi,...

, x„} has, giving a good

7=1

reason to define the entropy of (X, p) by (1.1). To characterize the Shannon entropy we consider two axioms. The Shannon-Khintchine Axiom ( S K A ) . (1°) H : U A n —¥ R+ is continuous and, for every n > 2, n>2

H(-,....-)=

max{H(p) : p e A „ } .

(2°) For every n > 2 and ( p i , . . . ,p n ) G A n H(pu...

,p„,0) = i ? ( p i , . . . , p n ) . rrij

(3°) If p = ( p i , . . . ,p n ) e A„, Pj = £ 9jfc for 1 < j < n and gjfc > 0, then fc=i H(qn,

■ ■ . ,,ff(^,...,^). The Faddeev A x i o m (FA). [1°] /(P) = H(p,l-p) : [0,1] -r 1 is continuous and /(p 0 ) > 0 for somep 0 6 [0,1].

g

Chapter I: Entropy

[2°] H(pu ... ,pn) = H(prW, ■ . ■ ,Pn(n)) for every ( p x , . . . ,pn) 6 A „ and permu tation 7r of { 1 , . . . , n } . [3°] If ( p i , . . . ,p„) e A n and p„ = 9 + r > 0 with q, r > 0, then H(pu...

= H(Pl,...

,p„-i,q,r)

,pn) + pnH(^-, x

Pn

^-).

yn'

(FA) is an improvement of (SKA) since [1°] and [3°] are simpler than (1°) and (3°), and [2°] is very natural. These two axioms are equivalent and they imply the Shannon entropy to within a positive constant multiple as is seen in the following theorem. Theorem 4. The following statements are equivalent to each other: (1) H(-) : U A n -> R+ satisfies (SKA). n>2

(2) H(-) : U A n -> R + satisfies (FA). n>2

(3) There is some A > 0 such that n

( p i , . . . ,pn) e A n , n> 2.

H ( p i , . . . ,p„) = -X^2pjlogpj,

(1.7)

i=i

Proo/. (1) =*■ (2). [1°] M o w s from (1°). [2°] is derived as follows. If p x , . . . , p n are positive rationals, then pj = =*■ for some £1,... , £n, m € N. Hence H(Pl,...,pn)=H(^,...,^)

Vm

ml *•

«X

„

'

ft.

j ==ii

„ „ JJJ

£

iii,,

Thus, for any permutation n of { 1 , . . . , n}, H(pu . . . , p n ) = i f ( p w ( 1 ) l . . . ,pn{n)). The case where p / s are not necessarily rational follows from the continuity of H ((1°)) and the approximation by sequences of rational numbers. [3°]. It follows from (2°), (3°) and [2°] that

*(?i)="(«■».») "(Wva) = ir

(s'l) + |-»(i.o) +j-ff(i.o).

1.1.

9

The Shannon entropy

implying H(l,0) H(pu...

= 0. Hence ,pn-Uq,r)

= H(pi,0,...

,pn_i,0,g,r) n-1

= H(pu...

,pn) + £ ) P i t f ( l , 0 ) +PnH(±: j=l

—)

V

C

P« Pn'

O —T \

—>

Pn

) '

Pn'

i.e., [3°] holds. (2) => (3). Using [3°], we have for any p, q > 0, r > 0 with p + q + r = 1 H(p,q,r)

= H(p,q + r) + (q +

r)H(-l-,-?—) \q + r q + r/ = H{q,p + r) + (j> + r)H (-?-,-?—). \p + r p + r/

If we set /(p) = H(p, 1 — p), then the second of the above equalities becomes

( L8 )

/(p) + (i-p)/(i-y = /(«) +a-«)/(i^)Letting p = 0 and 0 < q < 1, we get /(0) = ff(0,l) = 0. Integrating (1.8) w.r.t. q from 0 to 1 — p gives

(i - p)m+a - p?fo m dt=jy mdt+p2j IT *• (L9) Since /(p) is continuous and hence all terms except the first on the LHS ( = left hand side) of (1.9) are differentiable, we see that /(p) is also differentiable on (0,1). By differentiating (1.9) we obtain (1 - P)f'ip) - fip) ~ 2(1 - P) /

fit) dt = - / ( l -p) + 2p[

Jo

Jp

M

f

dt -

l

M.

P

We can simplify the above by using /(p) = / ( l — p) to get

(1 - P)f'ip) = 2(1 - P) J fit) dt + 2pj M dt -

f

M.

(i.io)

Chapter

10

I:

Entropy

It follows that f'{p) is also differentiable on (0,1). By differentiating (1.10) we have f"ip)

= -^-^£fit)dt,

0 0 since otherwise qn+i is replaced by some % > 0. Then observe that H{qi,-..

,qn,qn+i)

= Hiqi,...

,qn-i,qn

+ Qn+i)

+ (gn + gn + l)gf

'"

,

g +1

"

)

n-1

=

_ A

X)9J

log9

J

_ A

(9n + 1n+i) log(g„ + g„+i)

J'=l

- A (fa log — L

+ 9 n + 1 log - g i i - )

n+1

= -A^?3log«j. j=i

(3) => (1). (1°) is shown in Theorem 2 (3), (2°) is clear and (3°) can be verified in a similar manner as in the proof of Theorem 2 (6).

1.2. Conditional expectations lal exp

11

1.2. Conditional expectations In order to study the Kolmogorov-Sinai entropy we need to work on conditional expectations and conditional probabilities on a probability measure space. We collect basic properties of these materials together with some proofs. Thus, let (X, X, fi) be a probability measure space, where X is a nonempty set, X is a cr-algebra of subsets of X and /i is a probability measure on it. A transformation S on X into X is said to be measurable if S^1X C X, i.e., S_1A £ X for every A e X. A measurable transformation 5 is said to be measure preserving if fi o S'1 = fj., i.e., (j. o S~1(A) = (i(S~1A) = n(A) for every A 6 X. If, in particular, a measure preserving transformation S is one-to-one and onto, then the inverse transformation S _ 1 can be considered. If S~1 is measurable, i.e., S is invertible, then it satisfies S_1X = X = SX and is also measure preserving. In what follows we fix a quadruple (X, X,'fi, 5), which is called a dynamical system, where S is measure preserving and not necessarily invertible. We treat real-valued random variables on (X, X, fJ.) in this section. Denote the L1-space of (X, X, JJ.) by LX(X). If 2) is a a-subalgebra of X and / e LX{X), let Hf{A)=

f fdp,

AetQ.

JA

Then (if is a c.a. (= countably additive) measure on 2) and is absolutely continuous w.r.t. fi, denoted (if i 1 (?J) is a linear operator, i.e., E(af

+ Pg\fQ) = aE{f \fg) + pE(g\y),

a,/3eR,f,g£

In fact, the RHS is 2)-measurable and the equality j

(af

+ 0g) d(i = J

{ Q E ( / | 2 J ) + /3E(g\iQ)} d/x

L^X).

Chapter I:

12

Entropy

holds for every A e 2). (2) (Positivity) / > 0 =*■ E{fW) > 0. In fact, 0<JAfd^ = JA E(f\%) dp for A € 2) and JS(/|gJ) is immeasurable. (3)|£(/|2J)|(-|?J) : I x (jt) -> Z/1©) is a projection of norm 1. This follows from (1), (4) and (5). (7) 2)i C ?J2 =» JS(JS?(/|9a)|SJx) = S(/|2)i). In fact, the RHS is 2Ji-measurable and f fdp=

[ E(f\q2) dp= f £(/|0)i) dp,

(8) £(/|2) = £ ( / ) = fxfdp, smallest a-subalgebra of 3£.

A 6 2)!.

the expectation of / , where 2 = {0,X}, the

For, E(f\2) is a constant. (9) [/ 6 L°°(X),g e L 1 ©)] or [/ € Ll{3), g g L°°(2j)] implies £( 5 /|2J) = ff£(/l9J). In fact, both sides are 2)-measurable and the equality holds for a 2J-simple func tion g. The general case follows from a suitable approximation by a sequence of 2)-simple functions. (10) (Dominated Convergence Theorem) |/„| < g, fn ->■ f fi-a.e. =$■ E{fn\ E{fW) p-a.e. and in L1. For, observe that for every A e 2 )

I / E(fn\Z))dp~ [ E(f\%)dp\ < f \E(Jn\fQ)-E{f\fQ)\dp \J A

JA

JA

0

1.2. Conditional

expectations

13

as n —> oo by the Dominated Convergence Theorem. But then,

/ £(/„|2)) d,i= f fndn-> JA

JA

[ fdfi= f E(J\%) dfi JA

JA

for every 4 e 2 ) again by the Dominated Convergence Theorem. This shows that E(fn\V) -»• E(f\■ £(/„|2)) t # ( / | ? J ) Ma.e. For, we can assume without loss of generality that / „ > 0 fi-a.e. for n > 1. Note that lim i?(/„|2)) exists and is 2)-measurable. Thus for A G 2) we have by a n—Kx>

suitable apphcation of the Monotone Convergence Theorem twice I Urn E(fn\ 1, then E(\fg\\fQ) < E{\f\*\Z))>E(\g\*\Z)) P(A\m < P(B|2J). P(A'\Z)) = 1 - P(A\tQ). P(A|2) = n(A).

(( U OO OO oo

I

n = 1

4 I

n=1

I

\\

'

'

"°°" °°

= En=l P(AnW). n=l

In fact, the above statements (17) - (22) are almost obvious. Let {2)„} be an increasing sequence of u-subalgebras of X such that 2) n f 2), i.e., 2Jn C 2) n + i (n > 1) and a[ U 2) n ) = 2J, where cr(-) is the tr-algebra generated by {■}. We say that {/„} C LX(X) is a martingale relative to {2) n } if £(/n+i|2)„) = /«,

n > 1.

For instance, if 2)„ f 2), / € £*(£) and / „ = E{f\tQn), n > 1, then {/„} is a martingale relative to {2J n }. {/„} C ^(X) is said to be a submartingale relative to {2Jn} if fn is 2J n -measurable for n > 1 and £(/„+i|2)„) > / „ ,

n > 1.

If ">" is replaced by "partitions. 21 < 93 means that 93 is finer than 21, i.e., each A £ 2t can be expressed as a union of some elements in OS.

1.3. The Kolmogorov-Sinai entropy

17

Definition 1. Let 21 = {Au ... , An} e V(X). The entropy H (21) of a ■partition 2t is defined by n

ff(2l) = ~5>(A3)log/zUi) i=i

= -£>(A)lo gjU (4). The entropy function 1(21) o/ 21 is defined by J(20(-) = - £ U ( - ) l o g M ( A ) . In this case we have # ( 2 l ) = £ ( / ( 2 l ) ) = f 7(21) d/i. Jx The conditional entropy function J(2l|2J) and conditional entropy i?(2l|2J) are respectively defined by

/(2l|2))(0 = - £ U(0 \ogP(A\m-), 7f (2t|2J) = S(/(2tl9J)) = / J(2t|2J) dM.

(3.1)

For 21 € V(X) we denote by 21 the er-algebra generated by 21, i.e., 21 = a(2l). For a-subalgebras 2Ji,2J 2 we denote 2Ji V 2J2 = (P(Am))), by (3.2), = ES(^(P(^|2J 1 )|2J2))) < E S(0(P(J4|2J2))) =

ff(a|2J2),

by (3.2).

(7) follows from (6) by letting 2Ji = 2J and 2J2 = 2. (8) is derived as follows: # ( a v 9 5 | 2 J ) = # ( a | 2 ) ) + .ff(23|av2J), < H(a|2J) + H(95|2J),

by (2),

by (6).

Chapter I: Entropy

20

When 2J = 2, (9) is obtained. h (10) Since P(5- 1 .4|S'- 1 2J) = P(A\fQ) oSiovAeXwe E we have

HiS-^S-1®)

= Y, E&iPiSÂlS-1®))), = ^

by (3.2),

E(<j>(P(A\2)) o 5 ) ) ,

by (16) in Section 1.2,

= 52E(4>(P(A\V)))t byn°S-l = v, = H(0\fQ). (11) is obvious. R e m a r k 2. (1) Theorem 1 holds if H{%) and i7(2l|2J) are replaced by 7(21) and I(2l|2J), respectively, where (10) and (11) of Theorem 1 are then read as: (10) IiS-^S-1®) = 7(2l|2J) o S. (ll)/(5-12l) = /(2t)o5. (2) In Theorem 1 (2), if 21 < 53 and 2J = 93, then 21V 55 = 53 and 21V 93 = 93, so that i7(2t|93) = 0. This means that 93 contains all the information that 21 has, and hence the conditional entropy of 21 given 53 equals 0. In particular, H{%\X) = 0 for every 21 e ?>(£). Definition 3 . Let 21 € V{X). The entropy H(% S) of S relative to 21 is defined by 77(21, S) = Urn i i f C V s ^ ' a V n-+oo n \ i=o /

(3.4) '

v

where the existence of the limit is shown below. The entropy 77(S) of S or the Kolmogorov-Sinai entropy of S is defined by H(S) = sup {77(21, S) : 21 €

V(X)}.

We have a basic lemma. L e m m a 4. Let 21 € V(X).

Then:

n 1

(i)H( y s-w)=H(K) + nf;H(x\ v s-*v). vj=o

y

fc=1

v ij=i

;

1

(2) i f ( V s-*a) = 77(21) + "f; # (s-*a| V s~m). Vj=o

J

fc=1

V

Ij=0

(3)/(Vs-^2i) =/(2i) + "f;17(2i| v s-'a).

)

inai entropy en 1.3. The Kolmogorov-Sinai

21

(4) /( V S - ' ' B ) = /(si) + nf; /(s-fcB| Vs-â). Proo/. (1) follows from the following computation: SYVS-'B)

=j?(s- (n - 1) av ( n y z s _J 'an = Jf (5- ( n - 1 ) 2l) + H ( "v 2 S- J '2t|5- (n - 1 >2t)

= Jf (a) + H (s-(n-2>a v ("v35-J'2i) Is-' 7 1 - 1 ^ ff(S-("-2)2l|S-(n-1>2l)

= if (21) +

+ Ff"v 3 5'- ; '2l|5- (n - 2) 2lvS- (n - 1) 2l[ N ) = F(2l) + //(2l|5- 1 3l) +/f("y 3 5- J '2l| . V <Sf-J«) n-l

= H(21) + ] T ff (2l| .V 5- J '2l). fc=i

(2), (3) and (4) can be verified in a similar manner. The following lemma shows that the limit (3.4) exists and 77(21, 5) is well-defined for every 21 £ V{X). Lemma 5. Let 21 G P(£). Then: (1) 77(21, 5) is well-defined and it holds that 77(21,5)= lim 77(21! V S-J'2l) n-yoo

\

I j'=l

/

n

= lim 77('5- 2trv 5- J '2iy ra-t-oo

V

1

I j=0

I

(2) 7/ S is inveriible, then 77(21,5)= lim 77 ( d V Sj

V I j'=l

/

n-Kx; n

V .7=0

Sj9i). /

Proof. (1) By Theorem 1 (6) and (1) we see that 0 < 77(2l| .V^S-'a) < 77(2l| V 5 - i 2 i ) < 77(2l|2) = 77(21)

22

Chapter I: Entropy

forn > 1 and hence lim H (oil V 5 _ J '2l) exists. Now by Lemma 4 (1) it holds that

ifm,s)= um ifrfVfir'a) v

n-KX> n

V .7=0

/

= lim - YH(*\ V S-'a) = lim fffal v s^'a). The second equality is obtained from Lemma 4 (2). (2) An application of Theorem 1 (11) and (1) above give

H(21, s)= lim ifffVs-'o) n-t-oo n

\ j=0

n->oo n 1

V /n-1

I

= lim ±i7(V("-1>"vV(2J). Then: (1) 2t < SB => H(21,5) < if (58,5). (2) if f . V 5-J'2t, 5 ) = if (21, S) for n > m > 0. f/ 5 is invertible, this holds for

ra,meZ = { 0 , ± l , ± 2 , . . . } . (3) if( m y o 1 5 , -J'2t,S' m ) = mff(2l,5) for m > 1. (4) if (21,5) < if (23, S) + H(21|lea *• ■/0

M(A)dt+ / -/-log^(A)

e- 7(2l|2J) /i-a.e. and in L 1 . (2)lf(2l|2) n )4ff(2l|2)). TYoo/. (1) It follows from the Martingale Convergence Theorem ((23) in Section 1.2) that P(A\Z)n) -S- P(A\Z)) n-a.e. for A G 21 and hence 7(2l|2J n ) ->■ 7(2l|2J) /i-a.e. Since sup/(2l|2)„) is integrable by Lemma 8, we see that 7(2l|2J„) -> 7(2l|2J) in Ll, TI>1

too. (2) is derived from (1) and the fact that 77(2l|2Jn) = Jx 7(2l|2J„) a> and 77(2l|2J„) > 77(2l|2J„+1) for n > 1 by Theorem 1 (6). Now we have: Theorem 10 (Kolmogorov-Sinai). 7 / 5 is invertible and 21 G V(X) is such that V 5"2l = X, then 77(5) = H(21,5). n=—oo

Proof. Let 2l„ =

V 5fc2l for n > 1. Then 77(2^, 5) = #(31, 5) by Theorem 6 (2).

Observe that for 25 G V(X) 77(23, 5) < 77(2l„, 5) + H^^),

by Theorem 6 (4),

= 77(2l,5)+77(23|2l n ) -> 77(21, 5)

(n ->• oo)

since H^^) I 77(23|£) = 0 by Lemma 9 (2) and Remark 2 (2). This means that 7/(03, 5) < 77(21, 5) for 23 G V{X), which implies that 77(21, 5) = sup {77(23, 5) : 23 G V{X)} = 77(5).

An immediate corollary of the above is: Corollary 11. Let 21 G V(X). n

Then:

(1) V 5 " 2 l = X => 77(5) = 77(21,5). n=0

„„

Chapter I: Entropy (2) If S is invertible and V S - " 8 = X, then H(S) = H(% S) = 0.

Proof. (1) Letting 2l„ = V S _fc 2l for n > 1, we see that this plays the same role as in the proof of Theorem 10. (2) By Theorem 10 we have # ( 2 l , S ) = H(S). We also have H(%S)=

lim H(%\ V S-j

V

= H(VL\X),

Ij = l

by Lemma 5(1),

/

by Lemma 9 (2)

since X = S~1X = V S~fc2l by assumption, which is 0 by Remark 2 (2). fc=i Consider the case where H(S) = 0. Note that this implies S~XX = X. For, suppose that 5 _1 3C C X, a proper set inclusion. Then there is an A G X such that A i S'^X. Let Oo = {A,AC} G V(X). Then 0 < H(Vl0\S-1X)

< lim fl-faol V S-j%0) n—nx

\

I j'=0

= H(ph, S) < H(S) = 0, /

a contradiction. To consider various dynamical systems and entropies, isomorphism among these systems is important, which is defined by Definition 12. Let (Xi,Xi,Hi,Si) (i = 1,2) be two dynamical systems. These systems or Si and 52 are said to be isomorphic, denoted Si = S2, if there exists some one-to-one and onto mapping (p : Xi —> X2 such that (1) for any subset Ai C X\, Ai G Xi iff H(S2) is also true. Thus if (Si) = H(S2). It trivially follows from the above theorem that if H(Si) / H(S2), then S x ^ S 2 In the next example we show how to compute the entropy of Bernoulli shifts.

nai entropy ent 1.3. The Kolmogorov-Sinai

27

Example 14 (Bernoulli shifts). Let (Xo,p) be a finite scheme, where XQ = { o i , . . . ,at} and p = ( p i , . . . ,pi) 6 Ae, so that p(a,j) = Pj, 1 < j ->■(... ,£'_!, X O J S I , . . . ) ,

xjj. = Xfc+i, 4 e Z .

A cylinder set is defined by [x?---x?] = {(•■■ ,xux0,xi,...)

:xk = x°k, i < k < j}

and let Extend /ô to the a-algebra X generated by all cylinder sets, denoted by p.. Note that 5 is measure-preserving w.r.t. \i and hence (X, X, p, 5) is a dynamical system. The shift S is called a ( p i , . . . ,pt)-Bernoulli shift. Since 21 = {[xo = «i], • • • , [ô = ai]} oo

is a finite partition of X and

Sn2l = X by definition, we have by Theorem 10

V n=—oo

and Lemma 5 (2) that

H(s) = Hm,s) = hm iff(Vs-fcsa). n->oo n

\

fc=o

/

Now V S~fc2l = {[x 0 • • • x„_i] : Xj € X0, 0 < j < n - l} and hence F(Vs-*2l)== -

^

p([x0---xn_1])logp([x0---xn_1])

5Z

p([xo---x n _i])logp([x 0 ])---p([x n _ 1 ])

XO,-" i^n — l€-Xo

= -

5 Z M ( W ) logp([x 0 ])

^2

a:oG-Xo

aJn —i€-Xo

= nH{%) since /i([fflj']) = p(aj) — Pj f° r 1 < J < n. This implies that I

H(S) = H(Ql) =

-J2PjtegPj.

n{[xn-i])

logp([x„_i])

28

Chapter

I:

Entropy

Thus (§, \)-Bernoulli shift and (§, §, §)-Bernoulli shift are not isomorphic. ( p i , . . . ,p„)- and ( g i , . . . , g m )-Bernoulli shifts have the same entropy, i.e., n

If

m.

1

- ^PJ

°SPJ

=-J2qk

log Qk

'

*!=i

j=i

then are these isomorphic? This was affirmatively solved by Ornsterin [1](1970) as: Theorem 15. Two Bernoulli shifts with the same entropy are

isomorphic.

We say that the entropy is a complete invariant among Bernoulli shifts. The proof may also be found in standard textbooks such as Brown [3], Ornstein [3] and Walters [1]. Example 16 (Markov shifts). Consider a finite scheme (Xo,p) and the inifinite product space X = XQ with the shift S as in Example 14. Let M = (rriij) be an £ x£ i

stochastic matrix, i.e., my > 0, Yl mij = 1 for 1 < i, j < £, and m = (mi,... j=l

, mi)

t

be a probability distribution such that Yl mifnij = rrij for 1 < j < £. For each i,j, i=l

rriij indicates the transition probability from the state dj to the state aj and the row vector m is fixed by M in the sense that m M = m. We always assume that mi > 0 for every i = 1,... ,£. Now we define /z0 on SDT, the set of all cylinder sets, by Mo([a»0 •••«»„]) =miomioi1

■■■min_lin.

fj,0 is uniquely extended to a measure // on X which is 5-invariant. The shift S is called an (M, m)-Markov shift. To compute the entropy of an (M, m)-Markov shift S consider a partition 21 = {[so = a i ] , . . . ,[x0 = at]} e V{X), which satisfies

V

Sn% = X. As in Example

n=—oo

14, we see that

Y,

M([zo • • ■ x„-i]) log fj,([x0 ■ • •a;„_1])

XQ,... , X n _ i ^ X o H

Ckl0S~k*)

= -

E

M(N--x„_1])logM([xo---a;n_1])

XQ,... 22 ,xn_i^Xo

m

iomioii---minîn^logmiornioil---rnin_ainl

to,-- , i „ _ i = l

e

mi

Zl

o"*»on---mi„_2in_1logmiomioil--.min_2i„_1

m = -^2to,-io , i „ _ ilo=gra»o - (n - 1) ^ l

= ~ «o=l E mio «o=l

lo

immij logmy

i , J = l mimij log ray S m i o - (n - 1) ^2 i,J=l

1,4. Algebraic models I

since X) j n i' T l y = mj

29 an

t

d !C m »j

=

1 f° r 1 — *"> J' S '• By dividing n and letting

n - > o o w e get i

H(S) = — 22 rriimij log rriij.

1.4. Algebraic m o d e l s In the previous section, we have defined the Kolmogorov-Sinai entropy for measure preserving transformations and have seen that two isomorphic dynamical systems have the same entropy. In this section, we relax isomorphism to conjugacy, which still has the property that two conjugate dynamical systems have the same entropy. Algebraic models of dynamical systems are good tools to examine conjugacy among these systems and these will be fully studied in this section. Let (X, X, n, S) be a dynamical system. Two sets A,BeX are said to be /iequivalent, denoted A~B, if fi(AAB) = 0, where AAB = (A U B) - {A !~l B), the symmetric difference. Let A = {B 6 X : A~B}, the equivalence class containing X e l , and 93^ = {A : A € X}, the set of all equivalence classes of X, called the measure algebra of n. The measure fj, on 93M is of course defined by H(A) = fi(A),

A eX

and is strictly positive in the sense that n(A) > 0 if A ^ 0. For now we consider conjugacy between a pair of probability measures. Definition 1. Let (Xj, Xj, fij) (j = 1,2) be two probability measure spaces with the associated measure algebras 931 and 2*2, respectively. Then Hi and \x2 are said to be conjugate, denoted fix ~ (j,2, if there exists a measure preserving onto isomorphism T : 5Si -»■ 93 2 , i.e., T satisfies that H2(TB) =

T

y OO

v

OO

rs

TBc={TBf,

M x (B),

T

/

OO

\

B e ®i, OO

( u^) = u i' ( n i) = n r i ^ ^ e ® i > J ^ L B

Conjugacy between a pair of probability measures can be phrased in terms of a unitary operator between L 2 -spaces as follows.

Chapter I: Entropy

30

P r o p o s i t i o n 2. Let (Xj,Xj,p,j) U = M ) be a Pair of probability measure spaces. Then, ^ i - H2 iff there exists a unitary operator U : £ 2 (A*I) -* L2(n2) such that UL°°(m) C L°°{nz) and U(fg) = Uf-Ug,

/,ff6£°°(/ii).

Proo/. To prove the "only i f part, assume /Ji ~ /x2 and let T : 55i -»■ 552 be an onto measure preserving isomorphism, where 93, is the measure algebra of fij (j = 1,2). Define an operator £/ : L2(jj.i) -> L 2 (/i2) by f/l B = I r s ,

5 e ®i.

Note that \\UlBh = V^(TB) = Vt*i(B) = l I M b s i n c e T i s measure preserving. Then, U can be linearly extended to the set of all 93i-simple functions and is an isometry. Thus U becomes a unitary operator since T is onto. n

Now, let / = J2 aj^Bj, where a / s are complex and Bj G Q5i (1 < j < n) are Now, let / = J2 aj^Bj,

where otj's are complex and Bj G Q5i (1 < j < n) are

disjoint. Then, it holds that

disjoint. Then, it holds that

Uf2 = ufeajlB,)

=

u(âiaklBllBl]

= 5Z a J afcl;rB > nTB * = 5^ a l l T B J' (Uff = ( 13 Q J l T B i )

=

5ZaJafclrBilrB*

= 5Z«jafclT(s,nB t ) =

J2oipTBt,

and hence Uf2 = (Uf)2 This implies by the polarization identity that U(fg) = Uf ■ Ug for all «Bi-simple functions / , g and then for all / , g e £°°(/xi) by a suitable approximation. Conversely suppose U satisfies the conditions mentioned. If / is an indicator function corresponding to S i e say. Define T : Q5i ->• Q52 by TSx = BJ for Z?i e ©i._Since U is onto, T is also onto. Since U is a unitary operator, TBi = 0 implies Bi = 0. To see that T is a measure preserving isomorphism, note that pi(B x ) = II/IH = HI7/III = p 2 ( T B i ) for Bx e L°°(ii2) unambiguously by U

o y J2 ajfjj = Y, aJUfi'

since J2ajfj = 0 fii-a.e. implies that YLajUfj i j a linear multiplicative mapping such that ||Db/|]. = H/lla,

= 0 fj.2-a.-e. Then note that f/o is

/ € A

It then follows from Theorem 8 that UQ can be extended to a unitary operator Ui : L2(m) -> L2(fi2) such that UiLx{tn) = L°°(fi2) and Ul(fg) = U1f-U1g,

/ , j e r W .

Therefore fix — (J.2Corollary 12. Two probability measures fj,\ and fi2 are conjugate iff algebraic measure systems (r(/ii),
0, £ G MJ(X).

Hence the LHS is well-defined. If £ = £ &, where & = $ - Cfc with ^ " , C € fc=i

M+(X) for 1 0 we can choose mo > 1 such that 0 < J{m,n)

< J(m,0) < e,

m > mo, n > 1.

Hence -r

X) ?(-A)logf(A) < J(fc,m,n) + e, Aeavat_!

If we let k ->■ co, then I(k,m,n) # ( £ ^ , 5 ) . Thus we have

-»■ # ( f , a ( m ) , S ) and the LHS of (5.8) -)•

-ff($,a,5) m 0 , n > 1.

m > m 0 , n > 1.

(5.9)

is monotonely nondecreasing by Remark 7 (2), lim F ( ^ a ( m ) , S ) = H £

exists. This and (5.9) imply that

B(t>S) H 8I O*|I') < % 2 (^k) < -ff(^k)When fi,k)

n

for 0 < k < n2 — 1 and n > 1. This implies h n

< _

M(-An.fc) < fc + 1 v{An ; } • Since ip(t) = tlogt is decreasing on (0, | ) and increasing on ( j , l ) , we have that

log——
/ Jx

hnloghndu.

On the other hand, let / = 4jJ and observe that for a; € X and n > 1 0 < gn{m) - fn(x) < - , n

0 < /(*) - /„(*) < &,(*) - /„(*) < -, n

j2

Chapter I: Entropy

0 < gn{x) - f(x) < gn(x) - fn{x) < -■ Then we see that 0 1) and 2) be cr-subalgebras of X and fi, v 6 If Vn t % then H%MV) t Hv(jt\v). (2) Let 0 < a < 1 and /^, Vj € P{X) (j = 1,2). Tften, # ( a / x i + (1 - a)/i 2 |ai/i + (1 - a)i/ 2 ) < aJf(A»x|*i) + (1 - a) H

fall's).

P(X).

(6.3)

(3) 7/ ||/i n - un\\ -► 0 and \\vn - v\\ -» 0 witfi {(i„,/i, i/„, v : n > 1} C P{X),

then

H(n\u) < liminf Hfa\un).

(6.4)

n—+oo

(4) 7/ fi,veP(X),

then \\n-V\\ 1 let /n„ = ^ l ^ and vn = i/|3)„, the restrictions of n and v to 2)„, respectively. Then /i 1. If we let / „ = ^ and / = g^, then it follows from Theorem 2 that for n > 1 # S > „ ( M I " ) = / /nlog/„aV,

Hv(n\v)=

j

flogfdv.

Since {/„ : n > 1} is a martingale in Lx(u), we have / „ -» / /n log / „ - * • / log / i/-a.e. Then, Fatou's lemma implies that H»0*1*0= / / l o g / d i / < liminf / n Jx ^>°° Jx

fn\ogfndv

i/-a.e. and hence

1.6. Relative entropy and Kullback-Leibler tier information in

= liminf HVn(fi\u)

53

< H 2e. Since 2)„ | 2), there exist n > 1 and £ 6 2J„ such that /i(AAB) < e, v{AAB) < e. Now observe that

.i™,flSo.M") > »».W") > #B(MI")

^W) -"{BC){I - VM)=M(5C) - V[BC v(B
\n{A) log ^ n—too

2,

£

—> co

as e —> 0.

That is, lim HmAlAv) = co = Hm{)j\v). n—too

(2) We can assume 0 < a < 1. If /ti -^ fi or /t2 it v%, then (6.3) is true since both sides of (6.3) become oo. So we assume /ti -C v\ and /*2 -C ^2- In order to prove (6.3) it suffices to show that for any 21 € V(X)

E > Aea

r

fA\

W

, M + (1

< a £

\

"

f/i\"M

a ) M 2 ( A ) } l0S

QMi(^) +

a^ W

+

M i ^ l o g ^ + (1 " a) £

(l-a)M2(^)

( l - a ) ^

W

M A ) log J ^ .

(6.6)

54

Chapter I: Entropy

Let ci,C2,di, d2 be nonnegative constants and consider a function ip denned by r V{x)

,-

\ l i

OJCi + (1 — x)c2

= {xCl + (1 - x)c 2 } log xdi

+

{1_x)d2-

Then we see that „, . [(ci - c2){xdi + (1 - x)rf2} - ( 0 for n > 1. Let 91 = {jin : n > 1}. Obviously 9 t < 971 since 91 C 971. We shall show 971 1. Take an arbitrary fi e 931. Since /i(A\ii"M) = 0 by the definition of K^, we can assume that A C K^. If /i(A\C) > 0, then X(A\C) > 0 and hence A U C is a chain with A(A U C) > A(C) = a. This contradicts the maximality of C. Thus n{A\C) = 0.

58

Chapter I: Entropy

Now observe that A(JlnC) = ^ A ( 4 n i f n ) = o n=l

since 0 = nn(A) = fi(A n Kn) = JAnKn f^ dX and Kn C #,,„ imply A(A n tf„) = 0 for n > 1. Therefore, fi{A) = n{A\C) + n(A D C) = 0. This means SOT < 01. We now introduce sufficiency. Definition 10. Let 2) be a 1}. Let X A

( ) = T,^(A),

A ex.

n=l

Then, A e P(X) and SOT « {A}. Since 2) is sufficient for SOT, for each A 6 £ there exists a 2J-measurable function hA such that °° 1 f f A(An£?) = £ — / ^ „ ( U | 2 J ) d / x „ = / ^ d A , Hence EX(1A\Z)) we have

Be?).

= hA X-a.e. Take any fi e SOT and let g = jfc. Then, for any A € X

!A^dX=SAtdX=Â) =

jxE^m^

= I hAdfi= [ Ex(lA\0

55, P { £ & - - ff(Pl^ < 0 let E(k,6)=l(xu...,xk)eX*:

i^l0g^l-ir(p|q)

< «J 1.

Then by (6.11) we have that lim P(E(k, 5)) = 1,

5 > 0.

For (xi,... , Xfc) € E(k, 5) it holds that it

Y[p(xj) exp { - k(H(p\q) - 5)} > f[ J= l

q(Xj)

3=1 kk

> HP(XJ) exp { - k(H(p|q) + 3=1 HP(XJ)

exp { - A(tf(p|q) +

E (xi,... ,xt)€AcnE{k,6)

n^) e x P{- A ; Wp|q) + 5)} j=l

= P(AC n E(k,6)) exp { - fe(H(p|q) + 6)}. C

= P(A

n E(k,8)) exp { - fc(JT(p|q) + 1 Since P(.A) < e, (6.12) implies that for large enough k > 1 p(Aen£(*,*))>^ and hence Q(AC) > ^

exp { - fc(# (p|q) + 6)}.

Since the RHS is independent of A it follows that P(k, S) > i ^

exp { - *(H{p|q) + 6)},

so that liminf i log/?(*,£) > - i r ( p | q ) - 5.

(6.14)

fc—foo fc

Since <S > 0 is arbitrary, combining (6.13) and (6.14) we conclude that (6.10) holds.

Bibliographical n o t e s There are some standard textbooks of information theory: Ash [1](1965), Csiszar and Korner [1](1981), Feinstein [2](1958), GaUager [1](1968), Gray [2](1990), Guiasu [1](1977), Khintchine [3](1958), Kullback [1](1959), Martin and England [1](1981), Pinsker [1](1964), and Umegaki and Ohya [1, 2](1983, 1984). As is well recognized there is a close relation between information theory and ergodic theory. For instance, Billingsley [1](1965) is a bridge between these two theories. We refer to some text books in ergodic theory as: Brown [1](1976), Cornfeld, Fomin and Sinai [1](1982),

Chapter I: Entropy

64

Gray [1](1988), Halmos [1, 2](1956, 1959), Krengel [1](1985), Ornstein [2](1974), Parry [1, 2](1969, 1981), Petersen [1](1983), Shields [1](1973) and Walters [1](1982). Practical application of information theory is treated in Kapur [1](1989) and Kapur and Kesavan [1](1992). The history of entropy goes back to Clausius who introduced a notion of entropy in thermodynamics in 1865. In 1870s, Boltzman [1, 2](1872, 1877) considered an other entropy to describe thermodynamical properties of a physical system in the micro-kinetic aspect. In 1928, Hartley [1] gave some consideration of the entropy. Then, Shannon came to the stage. In his epoch-making paper [1](1948), he really "constructed" information theory (see also Shannon and Weaver [1](1949)). The his tory of the early days and development of information theory can be seen in Pierce [1](1973), Slepian [1, 2](1973, 1974) and Viterbi [1](1973). 1.1. The Shannon entropy. Most of the work in Section 1.1 is due to Shannon [1]. The Shannon-Knintchine Axiom is a modification of Shannon's original axiom by Khintchine [1](1953). The Faddeev Axiom is due to Faddeev [1](1956). The proof of (2) => (3) in Theorem 1.4 is due to Tverberg [1](1958), who introduced a weaker condition than [1°] in (FA). 1.2. Conditional expectations. Basic facts on conditional expectation and condi tional probability are collected with or without proofs. For the detailed treatment of this matter we refer to Doob [1](1953), Ash [2](1972), Parthasarathy [3](1967) and Rao [1, 3] (1981, 1993). 1.3. The Kolmogorov-Sinai entropy. Kolmogorov [1](1958) (see also [2](1959)) introduced the entropy for automorphisms in a Lebesgue space and Sinai [l](1959) slightly modified the Kolmogorov's definition. As was mentioned, entropy is a com plete invariant among Bernoulli shifts, which was proved by Ornstein [1](1970). There are measure preserving transformations, called if-automorphisms, which have the same entropy but no two of them are isomorphic (see Ornstein and Shields [11(1973)). 1.4- Algebraic models. and Foias, [2, 3](1968). Chi section to projective limits are seen in Dinculeanu and

The content of this section is taken from Dinculeanu and Dinculeanu [1](1972) generalized the results in this of measure preserving transformations. Related topics Foias, [1](1966) and Foias, [1](1966).

1.5. Entropy functionals. Affinity of the entropy on the set of stationary probabil ity measures is obtained by several authors such as Feinstein [3] (1959), Winkelbauer [1](1959), Breiman [2](1960), Parthasarathy (1961) and Jacobs [4](1962). Here we followed Breiman's method. Umegaki [2, 3](1962, 1963) applied this result to con sider the entropy functional defined on the set of complex stationary measures. He obtained an integral representation of the entropy functional for a special case. Most of the work of this section is due to Umegaki [3]. 1.6. Relative entropy and Kullback-Leibler information.

Theorem 6.2 is stated

Bibliographical notes

65

in Gel'fand-Kolmogorov-Yaglom [1](1956) and proved in Kallianpur [1](1960). (4) of Theorem 6.4 is due to Csiszar [1](1967). Sufficiency in statistics was studied by several authors such as Bahadur [1](1954), Barndorff-Nielsen [1](1964) and Ghurge [1](1968). Definition 6.8 through Theorem 6.15 are obtained by Halmos and Savage [1](1949). We treated sufficiency for the dominated case here. We refer to Rao [3] for the undominated case. Theorem 6.16 is shown by Kullback and Leibler [1](1951). Theorem 6.17 is given by Stein [1](unpublished), which is stated in Chernoff [2](1956) (see also [1](1952)). Hoeffding [1](1965) also noted the same result as Stein's. Re lated topics can be seen in Blahut [2](1974), Ahlswede and Csiszar [1](1986), Han and Kobayashi [1, 2](1989) and Nakagawa and Kanaya [1, 2](1993).

CHAPTER II

INFORMATION

SOURCES

In this chapter, information sources based on probability measures are considered. Alphabet message spaces are reintroduced and examined in detail to describe infor mation sources, which are used later to model information transmission. Stationary and ergodic sources as well as strongly or weakly mixing sources are characterized, where relative entropies are applied. Among nonstationary sources AMS ones are of interest and examined in detail. Necessary and sufficient conditions for an AMS source to be ergodic are given. The Shannon-McMillan-Breiman Theorem is formu lated in a general measurable space and its interpretation in an alphabet message space is described. Ergodic decomposition is of interest, which states that every stationary source is a mixture of ergodic sources. It is recognized that this is a series of consequences of Ergodic and Riesz-Markov-Kakutani Theorems. Finally, entropy fimctionals are treated to obtain a "true" integral representation by a universal function.

2.1. Alphabet message spaces and information sources In Example 1.3.14 Bernoulli shifts are considered on an alphabet message space. In this section, we study this type of spaces in more detail. Also a brief description of measures on a compact Hausdorff space will be given. Let XQ = { o i , . . . , at} be a finite set, so called an alphabet, and X = XQ be the doubly infinite product of XQ over Z = {0, ± 1 , ± 2 , . . . } , i.e., oo

x = xl= n xk,

xk = x0,kez.

k=—oo

Each x £ X is expressed as the doubly infinite sequence x = (xfc) = (•■■, x-u xo, 67

xu...).

„„

Chapter II: Information Sources

The shift S on X is defined by S:x^x'

= Sx={...

,£'_!,4,2;'!,...),

x'k = xk+1,

keZ.

Denote a cylinder set by [x°i---x°} =

[xi=x°i,...,xi=x§

= {x=

(xk) eX

:xk = x°k,i< k

<j},

where x% 6 X0 for i < k < j and call it a {finite) message. following properties:

One can verify the

(l)i<s ( ^ • - • x°] n [yP ■ ■ ■ 2/P] = 0;

(3)[^..-*5] = n{[«S]:* Ac = Li Bj with disjoint B\, ■.. , Bn G Wl. S is a one-to-one and onto mapping such that (7)5" 1 (( a ; f c )) = ( s f c - i ) f o r ( x f c ) G X ; (8) S""[xP • • • xP] = [yf+n ■ ■ ■ 2/P+J with y°k+n = x°k for i < k < j and n G Z. Let X be the cr-algebra generated by all messages 9Dt, denoted X = a(Wl). Then {X, X, 5) is called an alphabet message space. Now let us consider a topological structure of the alphabet message space {X, X, S). Letting d 0 (a,, a,-) = |i - j \ , a*, a,- G X 0 ,

"(^')= E

^

,

*,x'GX,

(1.1)

k=—oo

we see that X is a compact metric space with the product topology and S is a homeomorphism on it. Recall that a compact Hausdorff space X is said to be totally disconnected if it has a basis consisting of closed-open (clopen, say) sets. Then we have the following: Theorem 1. For any nonempty finite set X0 the alphabet message space X = X j is a compact metric space relative to the product topology, where the shift S is a

69

2.1. Alphabet message spaces and information sources

homeomorphism. Moreover, X is totally disconnected and X is the Borel and also Baire cr-algebra of X. Proof. The shift S is continuous, one-to-one and onto. Hence it is a homeomorphism. X is totally disconnected. In fact, the set 97t of all messages forms a basis for the product topology and each message is clopen. To see this, let U be any nonempty open set in X. It follows from the definition of the product topology that there exists a finite set J = {ji,... , jn} of integers such that prfc(f) = Xk = XQ for k £ J, where prj;() is the projection onto the fcth coordinate space Xk- Let i = min{fc : k e J} and j = max{fc : k 6 J}. Then we see that, for any u = (life) €E U, [ui ■ ■ ■ Uj] C U

and

U = M [ui ■ ■ ■ Uj\.

This means that 971 is a basis for the topology. Each message is clearly clopen. In the rest of this section, we consider a compact Hausdorff space X and its Baire cr-algebra X with a measurable transformation S on X. C{X) and B{X) denote the Banach spaces of all continuous functions and Baire measurable functions on X with sup-norm, respectively. As in Chapter 1, M(X) denotes the Banach space of all C-valued measures on X. In this case, M(X) is the space of all Baire measures on X. P{X) (resp. PS(X)) denotes the set of all (resp. 5-invariant) probability measures in M(X). Each measure (JL 6 P{X) (or PS(X)) is called an information source (or stationary information source), or simply a source (or stationary source). A stationary source fj, 6 PS(X) is said to be ergodic if fJ,(A) = 0 or 1 for every S-invariant set A e X. Pse(X) denotes the set of all ergodic sources in PS(X). E x a m p l e 2. Let XQ — {a%,... , at} be an alphabet with a probability distribution P = (Pii • ■ ■ >Pi)- Consider the alphabet message space X = XQ with a shift S on it. For a message [x° • • • x^] we define

«,([*?-*?]) =P(*?) •••*(*?)•

(1-2)

Then, no is defined on the algebra .4(971) generated by 97t, the set of all messages, and is S-invariant such that ^ o ( ^ ) — 1- By the Caratheodory extension theorem ô can be extended uniquely to an 5-invariant probability measure /x on X = cr(97t), i.e., \i € Pa(X). This n is called a ( p i , . . . ,pe)-Bernoulli (information) source and S is called a ( p i , . . . ,pt)-Bernoulli shift as in Example 1.3.14. We claim that fj, is ergodic. To see this, suppose that A G X is S-invariant and let e > 0 be arbitrary. Choose B e .4(971) such that n(AAB) < e and hence k

\n(A) — n(B)\ < e. Since B — U Bj with disjoint B\,... no > 1 such that S n°B have no > 1 such that S~n°B nohave n(S- B n{S-n°B

j=i

different coordinates from different coordinates from r\B) = M(S-"°5)/i(B) = n B) = n(S-noB)fj,(B) =

,Bk € 97t, we can choose B. This implies that B. This implies that fj,(B)2 fj,{B)2


70

by virtue of (1.2). Then we have fj,(AAS~n°B) = (i{S~n"AAS~noB), = n(S-n°(AAB)) = n(AAB) < e,

since A is S-invariant,

and hence /*(AA(B n S~n°B)) < »{{AAB) U (AAS-"°B)) < fi(AAB) + n(AAS-noB) < 2e. Consequently, it holds that \p(A) - n(B n S-n°B)\

< 2e and

|/*(A) - »(A)2\ < \p(A) - KB n S~n°B)\ + \fi(B n 5-""5) -

/J(>L) 2 |

0 with a + '/3 = 1 and n,£ e Pa(X) imply that M = n = i■ (3) The operator S on M(X) is continuous in the weak* topology if S is a con tinuous transformation on X. To see this, first we note that S is measurable. Let / G C(X). Then S / G C(X) since S/(-) = f(S-) and S is continuous. If C G X is compact, then there is a sequence {/ n }^Li Q C(X) such that / „ J. lc as n —> oo since X is compact and Hausdorff. Thus, lc(S-) = S l c ( 0 is Baire measurable, i.e., 5 _ 1 C G X. Therefore, S is measurable. Now let pn -> /J (weak*), i.e., /*„(/) —> /i(/) for / G C ( X ) . Then, we have for / G C{X) Spn(f)=

[ f{x)Spn(dx)= Jx

[ Jx

f(x)pn{dS-1x)

= I f(Sx) pn{dx) = M s / ) -> Ms/), since S / G C ( X ) , implying 5î„ —> Sp (weak*). Therefore, S is continuous in the weak* topology.

2.2. Ergodic theorems Two celebrated Ergodic Theorems of Birkhoff and von Neumann will be stated and proved in this section. We begin with Birkhoff's ergodic theorem, where the operators S n 's are defined by (1.3).

72


Theorem 1 (Birkhoff Pointwise Ergodic T h e o r e m ) . Let fj, e PS(X) f e L1(X,fj,). Then there exists a unique fs € L1(X,fi) such that (1) fs=

lim S n /

and

fi-a.e.;

n—too

(2) Sfs

= fs

ii-a.e.;

(3) / f dfj,= I fs dfj, for every S-invariant A € X; JA

JA

Ll(X,n).

(4) | | S „ / - /s||i, M - ^ O a s n - > o o , ||-||i, M being the norm in If, in particular, \i is ergodic, then fs is constant

fi-a.e.

Proof. We only have to consider nonnegative / e Ll(X,n). / ( x ) = limsup(S„/)(x), r>->oo

Let

/(x)=liminf(Sn/)(x), ~

x e X.

n->oo

To prove (1) it suffices to show that f fdfi< JX

since this implies that f = f fM(x)

f fdn
0 and e > 0 be fixed and x e X.

= min {J(x), M},

Define n(x) to be the least integer n > 1 such that 1 n_1 < (S»/)(«) + e = - £ ' ( ^ z ) + e> n ,=o

fu(*)

Note that n(x) is finite for each x e X. Since / and fM

*

e

X

are 5-invariant, we have

n (( xx ) - l n

n(x)fu(x)

< n(x) [(S„ (x) / M )(x) + e] = £

" ( * ) / * ( * ) < n(a) [ { S „ W / M ) M + e] =

£ j=0

f(Sjx) + n(x)e,

x e X.

(2.1)

/ ( 5 i x ) + n(x)e>

* e X.

(2.1)

Choose a large enough N > 1 such that Choose a large enough N > 1 such that M

^

1.

Then it holds that for x e X fc(i)

L-l

ni(a:)-l

2

E/M(^*) = E 3=0

fc=lj=nk_1(i)

L-l

7M(S'*) + £

i=n*w(s)

fM(Sjx),

where fc(x) is the largest integer ft > 1 such that rifc(s) < L — 1. Applying (2.2) to each of the k(x) terms and estimating by M the last L — rik(x)(x) terms, we have k(x) r»jt(x)-l

L-l j

T,fM(S x)

= J2

E

L-l j

fu(S x)+

fc=lj=nt_i(a;)

3=0

fc(a:)

Ê

E

fM(Sjx)

j=n*(z)(z)

n*W-l

E

/ ( 5 : ' x ) + ( n f c ( a ; ) - n f c _ 1 ( x ) ) e + (L-n f c ( a : ) (a;))M

k=i L-l

< E f(Sjx) + Le + (N- 1)M 3=0

since / > 0, / M < M and L — nk^(x) and divide by L, then we get /

7M 0 as n —> oo, || ■ || Pi/i being the norm in LP{X, /x). (5) The outline of von Neumann's original proof of Theorem 2 is as follows. Let 5 be as in (3) and « = 6 { / - S / : / e I 2 ( I , 4 where ©{•■■} is the closed subspace spanned by {■•■}■ Then, the first step is to show that S and H are orthogonal complementary subspaces, i.e., S®H = L2(X, fi). The next step is to prove that S n / —> 0 in L2 for / € %. Then, for any / e L2(X, fi) write f = fi+ f2 with / i 6 5 and fi e %. Hence we have llS«/-/l||2,M=llS«(/l + /2)-/l||2,,

= IIS»/2||2,^0' as was desired. This tells us that Theorem 2 holds for an arbitrary measure space.

2.3. Ergodic and mixing properties Let X be a compact Hausdorff space and X be its Baire tr-algebra. In this section, ergodicity and mixing properties are considered in some detail. After giving the following lemma we shall characterize ergodicity of stationary sources by using ergodic theorems. Recall that two measures /i, 77 e M(X) are said to be singular, denoted n L 77, if there is a set A 6 X such that |/i|(A) = \\fi\\ and |»7|(.(4C) = H77II, i.e., fi and T) have disjoint supports. Also recall that /j, e PS(X) is ergodic if each 5-invariant set A G X has measure 0 or 1 and Pse(X) denotes the set of all stationary ergodic sources. Lemma 1. / / /z, 77 G Pse(X),

then either n = n or fi ± 77.

7g

Chapter II: Information

Sources

Proof. Suppose that /J, / 77. Then there is an A G X such that n{A) oo

Ll(X,n).

^

(7) n lim ) (S„f,g)2/1

= (/, 1)2,^(1,5)2,^ /or ever?/ / , 5 G

(8) lim fJ.((Snf)g)

= n(f)fi(g)

for every f,ge

(9) lim (i((Snf)g)

= p{f)vig)

for every f,g G C(X).

L2(X,n).

B(X).

I n-1

(10) lim - y. n(S-kA n-t-oo n

(11) Urn - "][: (t(S-hA n-xx> n

C\B)=

fJ,(A)n(B) for every

A,BeX.

k=0

r\A) = n(A)2 for every

AeX.

k-0

Proof. (1) «• (2) is obvious and (1), (2) => (3) follows from Lemma 1. (3) => (4). Suppose (4) is false, i.e., fi £ exPs{X). Then there are a,j3 > 0 with a + P = 1 and £, 77 G P S (X) with £ ^ 7? such that /j = a£ + /3r/. Hence £ ^ (i and £ -C /z, i.e., (3) does not hold. (4) => (1). Assume that (1) is false, i.e., \i is not ergodic. Then there is an Sinvariant set A G X for which 0 < fi{A) < 1. Hence fi can be written as a nontrivial convex combination »(-) = »(AM-\A) + n(Ac)n(-\Ac), where /i(-\A) / n(-\Ac) and n(-\A),n(-\Ac) i.e., (4) is not true.

G PS(X).

This means that \x f

(1) => (5). Let / G B(X) be real valued and S-invariant and let Ar = {x G X : f(x) > r } ,

r G K.

exPs(X),

77

2.3. Ergodic and mixing properties

Then AT 6 X is S-invariant and hence /i{Ar) = 0 or 1 for every r € R by (1). This means / = const fj.-a.e. (5) =» (6). Let / e ^(X,n). Then, fs is measurable and .S-invariant fi-a.e. by Theorem 2.1. By (5) / s = const /i-o.e. Hence, / s = Jx fs dfi = fx f dfj. \i-a.e. (6) =>■ (7). Let / , g € L 2 (X, ^)- Then, by (6), fs = Jxf Ergodic Theorem implies lim {S„f,g)

dfi n-a.e. and the Mean

= ( lim Snf,g)

=

(fs,g)2,»

= (Jfdft,gj

=(/ll)2,„(l,9)2^.

(7) =* (8) =4> (9) are obvious since C(X) C B(X) C L2(X,n) and (9) => (7) can be verified by a simple approximation argument since C(X) is dense in L2(X, fj,). (8) => (10). Take / = 1A and g = 1 B in (8). (10) => (11) is obvious. (11) => (1). Let A 6 £ be S-invariant. Then (11) implies that (M(A) = n(A)2, so that /J(A) = 0 or 1. Hence (1) holds. R e m a r k 3 . (1) Recall that a semialgebra of subsets of X is a set Xo such that (i) 0 e X 0 ; (ii) A, B e X0 =*• A n 5 € X 0 ; (iii) i e J E o ^ A ^ U B j with disjoint B 1 ( . . . , Bn € XQ. i=1 As we have seen in Section 2.1, in an alphabet message space Xg2, the set 9JI of all messages is a semialgebra. Another such example is the set X x 2) of all rectangles, where (Y,%)) is another measurable space. (2) Let n e P{X) and Xo be a semialgebra generating X, i.e., cr(Xo) = X. If // is S-invariant on Xo, i-e., ^ ( S - 1 ^ ) = n(A) for A 6 Xo, then // e P S (X). In fact, let X1 = {AeX:

M(S_1A) =

n{A)}.

Then, clearly l o ^ I i - It is not hard to see that each set in the algebra -4(Xo) generated by Xo is a finite disjoint union of sets in Xo- Hence A{XQ) C XI- Also it is not hard to see that X\ is a monotone class, i.e., { A n } ^ ! C X\ and An t (or An \) imply U An 6 Xi (or n An e Xi). Since the cr-algebra generated by -4(Xo) is the monotone class generated by A(XQ),

we have that X = a(A(Xo))

= X\. Thus

»ePs(X). (3) In view of (2) above, we can replace X in conditions (10) and (11) of Theorem 2 by a semialgebra Xo generating X. In fact, suppose that the equality in (10) of

ya


Theorem 2 holds for A,B € X0. Then it also holds for A,B e A{X0) since each A € A(X0) can be written as a finite disjoint union of some Ai,... , An € XQ- Now let e > 0 and A,B eX, and choose A),B 0 G A(X0) such that n{AAA0) < e and li(BABo) < e. Note that for ji > 0 (S-'A n B)A(S~jAo n Bo) £ ( S ^ A S " ' A > ) u (BAB 0 ) = (S-j(AAA0)) U (BAB 0 ) and hence n((S-'A n B)A(5-;''AoASo)) < ^(S''(AAA0))

+ n(BAB0) < 2e,

since fj, is 5-invariant. This implies that \n{S-jA n B) -

n Bo) | < 2e,

A*(5-M0

j > 0.

(3.1)

Moreover, we have that

|M5-^ns)-MA)MS)| < \p(S-jA n B ) - /i(5~JAo n B 0 )| + I M ^ A O n B 0 ) -

n(A0)n(B0)\

+ \p{Ao)p(Bo) - lt{A)n{Bo)\ + \ii(A)n(Bo) - A*(A)/I(B)| < 4e + |/i(S-M 0 n Bo) - /Z(A 0 )JU(B 0 )|,

(3.2)

which is irrelevant for ergodicity but for mixing properties in Theorem 6 and Remark 11 below. Consequently by (3.1) it holds that i "—1 3=0'

i

n

~*

' 'j=0 =o n-1

+ +

lI " _ 1

- J- ]„ M(5-J Ao n Bo) - M(A 0 )M(B 0 ) 3=0

n

\li(Ao)it(Bo)-ii(A)it(B)\

1 *~ < 4e + - ^ /x(5-M 0 n B 0 ) - M(AO)P(B 0 ) 3=0

where the second term on the RHS can be made < e for large enough n. This means that (10) of Theorem 2 holds. (4) Condition (11) of Theorem 2 suggests that the following conditions are equiv alent to any one of (1) - (11) of Theorem 2:

79

i.S. Ergodic and mixing properties

(7') JJrn^ ( S „ / , fhtli (8')

= \(f, 1 ) 2 , / for every / G L2(X,

lim M ( ( S „ / ) / ) = H{f? n—¥oo

(9') lim (i{(Snf)f)

for every / G

2

= fi(f)

M );

B(X);

for every / G C{X).

n—K30

(5) 3 denotes the cr-subalgebra consisting of S-invariant sets in X, i.e., 3 = {A G X : SÂ = A}. For M G -P(^) let J ^ j i e X : niSÂAA)

= 0},

the set of all fi-a.e. S-invariant or S-invariant (mod/i) sets in 3£. Clearly 3 0 ^ . Then, we can show that fi G iM-Ô is ergodic iff fi(A) = 0 or 1 for every A G J^. In fact, the "if part is obvious. To prove the "only if part, let A G 3 M . First we note that n(S~nAAA) = 0, n > 0. For, if n > 1, then n - 1l

n -- 1l

7=0 j=0 j=0

7=0 3=0 j=0

j AAS-J 'A) A) = S'^SÂAA) S~nAAA C y (S- 1, then this gives the k-step transition probabilities, i.e., rojf = Pr{xk = aj \x0 = o j ,

1 < i, j < t

M or /i is said to be irreducible if for each i, j there is some k > 1 such that m y We first claim that 1 n_1

N=

> 0.

lim - V M * n-+oo n *—' fc=0

exists and N = (ny) is a stochastic matrix such that JVM = MN = N = N2 In fact, let Ai = [XQ = a»], 1 < « < £ and apply the Pointwise Ergodic Theorem to / = lAi ■ Then we have that

fs(x)=

lim

1 "_1

- V u t A ) fc=0

exists fi-a.e.x and — / fs(x)lAj(x)

ii{dx) = — l i m

I V ^ S - ^ n ^ )

1 "_1 = Urn - Y > j , f c ) = n y fe=0

(3.3)

for l (3) is clear since we are assuming m* > 0 for every i. 1 "_1

(3) => (4). For any i,j,

(k)

lim - J2 mj- = )iy > 0. This implies that we can find

n->oo n

fc=0

some k > 1 such that m\f > 0. That is, fi is irreducible. It is not hard to show the implications (4) => (3) =$> (2) => (1) and we leave it to the reader. We now consider mixing properties for stationary sources which are stronger than ergodicity. Definition 5. A stationary source fi e Pg(X) is said to be strongly mixing (SM) if lim n(S~nA

n B) = u,(A)u(B),

A,BeX,

n-¥oo

and to be weakly mixing (WM) if and to be weakly mixing (WM) if n-l

lim - nV- l \n(S-kA n B) - n(A)u{B) 1=0,

lim - fc=0 V n—>oo n *-~* fc=0

k

\n(S- A

A,BeX.

n B) - n(A)u{B) 1 = 0 ,

A,BeX.

It follows from the definition and Theorem 2 that strong mixing => weak mixing =*• ergodicity. First we characterize strong mixing. T h e o r e m 6. For a stationary source fi G PS{X) the following conditions are equiv alent to each other: (1) n is strongly mixing. (2) lim (Snf,g)2,li = (/, l) 2 l / .(l, ff)a,M for every f,ge L2(X,fj.). That is, S " / ->

/,

f dfj, weakly in L2(X,fj.) for every f £

L2(X,fi).

(3) lim ( S n / , / ) 2 i M = | ( / , l) 2 , / i | 2 for every f e L2(X, n->oo

n

'

(4) lim fj,(S~ A nA) = fi(A)

2

'

for every

M ).

AeX.

n—too

(5) lim [i(S~nA n A) = (i(A)2 for every A G Xo, a generating semialgebra. n—+oo

Proof. (2) =>■ (1) is seen by considering / = 1A and g = 1B- (2) =3- (3) => (4) => (5) is clear. (5) => (4) follows from (3.1) and (3.2) with A = B and A0 = B0.

tion So Chapter II: Information Sources

g2

(1) => (2). Let A,BeX.

Then by (1) we have

lim ( S n U , lu)a, M = n->oo

lim

KS~nA

n B) = M ^ M # ) = ( U , l ) a * ( l i 1B)2, M .

n-K»

If / = E " i 1 ^ , and g = £ Pk^Bk, simple functions, then 3=1

fc=l

lim(Sn/,s)2,M= n-K»

Um

y^«i/9fc(SBlxi,lBjaJ.

n->oo-^—'

= 22«i^fc(lî.l)2,*i( 1 ' 1BJ2,,I = (/, 1)2,^(1,3)2^ Hence the equality in (2) is true for all simple functions / , g. Now let / , g e L2(X, (i) and e > 0. Choose simple functions /o and 30 such that | | / — /olh./i < e and llfl — 3O||2,M < £• Also choose an integer no > 1 such that I(S n /o,So)2,M-(/o,l)2, M (l,9o)2,^\ n0.

Then we see that for n > no \(Snf,g)2lli-(f,

1)2,^(1,5)2^1 n

< |(S / lff ) 2 ,M - (S"/o, 3)2,^| + |(S n /o,3)2,^ - (S"/o,ffo)2, M | + |(S n /o,5o)2, / i - (/O,1)2,M(1,5O)2,A.|

+ |(/o, 1)2,^(1,50)2,^ - (/, 1)2,^(1,50)2,^1 + i(/, 1)2,^(1,50)2,^ - (/, 1 ) 2 , ^ ( 1 , 5 ^ 1 < | ( S n ( / - fo),g)2J

+ |(Sn/0,5 -5O)2,M| + e

+ I ( / - /o, 1)2,^| I(l,5o)2,^| + I ( / , 1 ) 2 J , | I(1,5 - 5 o ) 2 , „ | < 11/ - /ollajlfflhj. + WfohJg

- 5O||2,M + e

+ ll/-/o||2,^||5o||2^ + ||/||2,^||5-5o||2,^ < e||fflk„ + l l / o l k ^ + e + e\\go\\2,u + [|/||a*e < j\9fo*

+ ( l l / l t a + e)e + e + e(||5lk„ + e) + e\\f\\2tll.

It follows that 1 n

™ 0 (S n /,5)2, M = (/, 1)2,^(1,5)2,M.

(4) =» (3) is derived by a similar argument as in the proof of (1) =>• (2) above.

83


(3) =*■ (2). Take any / 6 L2(X,n)

and let

-H = 6{Snf,c

: c € C,n > 0},

the closed subspace of L2(X, n) generated by the constant functions and S n / , n > 0. Now consider the set Mi = {g € L2(X,»)

: Umg(Snf,g)2tll

= (/.^(l.fl),^}.

Clearly Mi is a closed subspace of L2(X, (i) which contains / and constant functions, and is S-invariant. Hence Mi contains M. To see that Mi = L2(X,(i) let g e M x , the orthogonal complement of M in L2(X,(j,). Then we have (S"/,ff)a,M = 0 ( « > 0 )

and

(1.5)2^ = 0,

so that g € Mi. Thus M"1 C Mi. Therefore Mi = L2(X,fi),

i.e., (2) holds.

In Theorem 6 (2) and (3), L2(X, ft) can be replaced by B(X) or

C(X).

Example 7. Every Bernoulli source is strongly mixing. To see this let fi be a ( p i , . . . ,p*)-Bernoulli source on X = X j . Let A = [x? ■ • • SBJ], B = [y° ■ ■ ■ y(] e M. Then it is clear that lim n(S-nA n B) = u(AW-B) n—foo

since for a large enough n > 1 we have n + i > t. By Theorem 6 /u is strongly mixing. In order to characterize weak mixing we need the following definition and lemma. Definition 8. A subset J C Z + = { 0 , 1 , 2 , . . . } is said to be of density zero if lim - | J n J U = 0 ,

n—>oo n

where «/„ = { 0 , 1 , 2 , . . . , n — 1} (n > 1) and | J n J „ | is the cardinality of J n Jn. Lemma 9. For a bounded sequence {an}^=1 are equivalent:

of real numbers the following conditions

(1) Urn ; " E h l = 0 ; (2) lim -nE\aj\2 n-K» n j

= 0;

= 0

(3) Tftere is a set J C Z + 0/ density zero such that

lim

J$n—t 00

a n = 0.

84


Sources

Proof. If we can show (1) (3), then (2) (3). Suppose (1) is true and let

Observe that £ i C £ Observe that £ i C £

2 2

Ek = In G Z + : | a n | > i | ,

k > 1.

Ek = in G Z+ : |o»j > ~\,

k> 1.

0 - ' and each £fc has density zero since 0 - ' and each Ek has density zero since

-\EknJn\0 n

n *—'

J

as n —> oo by (1). Hence for each k = 1,2,... we can find an integer j j . > 0 such that 1 = j 0 < J! < j 2 < ■•• and -\Ek+i n J „ | < ^-j-y,

n > i&.

(3.4)

Now we set J = U (£* n [j'fc-i.jfc)). We first show that J has density zero. If jk-i

0 as n —> oo, i.e., J has density zero. Secondly, we show that lim a n = 0. If n > jfc and n 4 J, then n 4 Ek and J$n—foo

|a„| < £ T J . This gives the conclusion.

ng properties pi 2.3. Ergodic and mixing

85

(3) => (1). Suppose the existence of a set J C Z + and let e > 0. Then,

j=o

j'6J„nJ

jeJnnJ'

Since {an} is bounded and J has density zero, the first term can be made < e for large enough n. Since a„ —> 0 as n —>■ oo and n $ J, the second term can also be made < e for large enough n. Therefore (1) holds. Theorem 10. For a stationary source fx £ Ps{X) the following conditions are equivalent to each other: (1) fj. is weakly mixing. (2) For any A,B e X there is a set J C Z+ of density zero such that lim

fj,(S-nA C\B) =

n(A)p(B).

J$ n—KX>

(3) lim - T ; U(5- J '-4 n B) - /i(A)/z(B)| 2 = 0 for every

A,BeX.

re-t-oo n j = o

(4) lim - " E | ( S k / , j ) 2 , , - (/,1)2,,(1,»)2,,| = 0 /or e ^ n , / , j £ L 2 ( I , , i ) . n-»oo n

k=0

(5) /J x /i is weakly mixing relative to S x S, where fix fi is the product measure on (XxX,X®X). (6) n x T) is ergodic relative to S xT, where (Y, %),n,T) is an ergodic dynamical system, i.e., n € Pse(Y). (7) fj, x fi is ergodic relative to S x S. Remark 11. In Theorem 10 above, conditions (2), (3) and (4) may be replaced by (2'), (2"), (3'), (3") and (4'), (4") below, respectively, where X0 is a semialgebra generating X: (2') For any A,B £ XQ there exists a set J C Z + of density zero such that lim /x(S~ n A n B) = fi(A)n(B). J$ n—>oo

(2") For any A e XQ there exists a set J C Z+ of density zero such that Urn fi{S-nAC\A) = n{A)2. J$ n—>-oo

(3') lim - V ; \n(S-jA n-K» n

HA)-

fi(A)2\2

r\A)-

fi(A)2\ = 0 for every A e X0.

= 0 for every A e X0.

j=Q

(3") lim - nT. WS'iA n-voo n J=Q

gg

Chapter

ion Sot II: Information Sources

(4') lim - i f |(SV, / ) 2 , „ - | ( / , 1) 2 ,„| 2 | = 0 for every / € L2(X, fi). n-s-oo n j = o

(4") Urn - " E K S ^ ' / . ^ ^ - K / . ^ ^ P I ^ O for e v e r y / e L 2 ( X , M ) . n-voo n J=Q

Proof of Theorem 10. (1) (2) •» (3) follows from Lemma 9 with an = tM(S-nAnB)-fi(A)fj,(B),

n > 1.

(1) =*> (4) can be verified first for simple functions and then for L 2 -functions by a suitable approximation as in the proof of (1) =>■ (2) of Theorem 6. (4) =>• (1) is trivial. (2) =» (5). Let A, B, C, D € X and choose Jj,, J 2 C Z+ of density zero such that n(S~nA

lim

n B) = /i(^)/i(B),

J i $ n—»-oo

Urn

/ i ( S " " C 1-12?) =

n(C)n(D).

J2$n—foo

It follows that lim

(/ux/j)((5x5)-"(J4xC7)n(Bx£>))

JiUJ2£n-K»

n{S~nA

lim

n B)n{S~nC

n D)

JiUJ2$n~foo

= /i(A)/i(B)/i(C7)/x(i)) = ()jx

^)(J4

x C)(/i x /^)(B x D).

Since 3C x X = {A x B : A, B e 3E} is a semialgebra and generates X <S> X, and J i U J 2 C Z + is of density zero, we invoke Lemma 9 and Remark 11 to see that / J X / I is weakly mixing. (5) =*- (6). Suppose fixfj,is weakly mixing and (Y, 2), T, ri) is an ergodic dynamical system. First we note that (5) implies (2) and hence n itself is weakly mixing. Let A, B e X and C, D 6 2). Then n-l

- £ ( M x 17)((5 x r ) - ' ( A x C) n (B x £>)) 3=0 n-l

= -£^M£)77(r^cni3) i=o

1

n _ 1

+ n E

W~'*

n

-B) - M(^)M(S)}7?(r^C n Z>).

(3.5)

87


The first term on the RHS of (3.5) converges to li{A)n(B)ri(C)T]{D) = (fixr]){Ax

C){fi x n ) ( S x D) (n -> oo)

since r/ is ergodic. The second term on the RHS of (3.5) tends to 0 (n —> oo) since n-l

l £ > ( s - ' ' i 4 n B) - M(A)M(B)}n(r-^c n D) n j=0

n-l

< - 53 IM-S'M nfl)- M(^)A*(B)| -> o (n —> oo), since î is weakly mixing. Thus n x r) is ergodic since 3£x 2) is a semialgebra generating £®2J. (6) =4- (7) is trivial. (7) => (3). Let A, B 6 £ and observe that 1

n—1

- ^

1

fi(S-'A

n—1

D B) = - ^ 0 * x AO ((5 x S)~j{A

j=o

xX)n(Bx

X))

j=o

-¥{jpx = - n—1

p){A x X)(/t x fi)(B x X),

by (7),

»(A)(i(B), - n—1

- 53 MOS'-'A n B)2 = - 53 (/* x M) ((5 x S)-J'(;4 x A) n (5 n B)) .7=0

j=0

-»

(A*

x /i)(A x A){n x n)(B x B),

by (7),

2

= M(A) V(B) . Combining these two, we get n-l

i53|M5-^nB)- M (AMB)| 2 n .

j=0

n-l

= - 5 3 {»(S~jA n B)2 - 2/i(S'-;'A n B)/*(A)Ai(B) + //(A) V(B)2} n

j=o

-» n(A)V(B)2

- 2n(A)n(B)n(A)n(B)

+ ^(A) 2 M (B) 2 = 0.

Thus (3) holds. E x a m p l e 12. Let fi be an (M, m)-Markov source on X = XQ . ^ or M is said to be aperiodic if there is some no > 1 such that M"° has no zero entries. Then, the following statements are equivalent:

88

Chapter II: Information Sources (1) n is strongly mixing. (2) jj. is weakly mixing. (3) /i is irreducible and aperiodic. (4) lim Tuffi = mj for every i, j . n—►oo

**

The proof may be found in Walters [1, p. 51]. In the rest of this section we use relative entropy to characterize ergodicity and mixing properties. Recall that for a pair of information sources fj, and v the relative entropy H{v\fi) of v w.r.t. \i was obtained by

H{y\fi) = sup j £

u{A) log ^

log — dv, dfj,

/ ,x oo,

: 21 G V(X) J if

J/

< /i,

otherwise

(cf. (1.6.1) and Theorem 1.6.2). The following lemma is necessary. L e m m a 1 3 . Let fin (n > 1),/J, G P(X). Suppose fin < afi for n > 1, where a > 0 is a constant. Then, lim fj.n(A) = fj,(A) uniformly in A G X iff lim H(fj,n\u) = 0. 71—fOO

71—>00

Proo/. The "if part follows from Theorem 1.6.3(4). To see the "only i f part, observe that {^jf} is uniformly bounded and converges to 1 in probability (w.r.t. //) by assumption. Since + ±(t-l)2,

\tlogt\0,

we have { ^ f l o g ^ f } converges to 0 in probability. Thus, since {&*.log^-} uniformly bounded, we also have lim H(Jtn\^)=

Urn [ ^ l o g ^ d / x = 0.

We introduce some notations. Let n G P{X). fln on X X by 1

is

For each n > 1 define a measure

71 — 1

W^xfl) = -^/»(s-^nB),

A, Be*.

2.3.

89

Ergodic and mixing properties

For a finite paxtition 21 G P(X) of X, fJ.% denotes the restriction of fi to 21 = o-(2l), i.e., fi ® e 'P(^) denote by .4(21 x 53) the algebra generated by the set {A x B : A G 21, S G 53} of rectangles. We also let

*-*^-JLMA"B^mmPV

Aea,B€®

FA„(2lx®) = -

£

;PV

;

/in(J4xB)log/i„(AxJB),

A62l,£e93

*woa x») = - E "(■*) lo^(A) - E ^(fi) lo s^ 5 ) Ae«

Be 0 there is an integer n 0 > 1 such that |^(A)-M{A)|n0,

A e l

Thus for A G 3E and p, g > 1 we have that ?-i

J

I£MS- A)-JX>(S-^)

¥ p

H y

j j=o =o

fc=o 9-1



„

i

i£/zno(S-^)-iX>»o( 1 such that the third term of the RHS of the above expression can be made < e for p, g > po- Consequently it follows that for p,q > po, the LHS of the above expression is < 3e. Therefore, the limit in (4.1) exists for every A G X, so that p G P a ( X ) . (2) Let us set Ma{X)

= {an + fa : a,0€

C, p,»? G P « ( X ) } ,

Af s (X) = { a p + ^7?: a,/? G C, /X.JJ g

P.{X)}.

Note that M 0 ( X ) is the set of all measures p G M(X) for which the limit in (4.1) exists, and MS(X) is the set of all 5-invariant p G M(X). Define an operator

10:Pa(X)^Ps(X)by Top = p,

p G Pa{X)

and extend it linearly to an operator T on Ma(X) onto Ma(X). Then we see that X is a bounded linear operator of norm 1, since Ma(X) is a norm closed subspace of M(X) by (1). Hence (2) follows immediately.

24-

AMS sources

93

Definition 4. Let fi, rj e P{X). That r\ asymptotically dominates /u, denoted noo

a

a

a

/ i < ^ < 7 7 o r / i < f < 7 / , then p, < 77. After the next lemma, we characterize AMS sources. Lemma 5. Let p.,r) € P ( X ) and for n > Q let Snfi = (Sn(i)a + (Snfi)s be the Lebesgue decomposition of Snn w.r.t. 7/, where (Snp,)a is the absolutely continu ous part and {Snp.)a is the singular part. If / „ = » ^'" is the Radon-Nikodym a

derivative and p. -C 77, then it holds that n lim oo I

JjndV 1=0

(4.2)

A e x

Proof. For each n = 0 , 1 , 2 , . . . let Bn e X be such that 77(Bn) = 0 and Snp(A)

= Snfi(A n Bn) + f /» *?,

A 6 3E.

OO

Let 5 = U i? n . Then we see that r)(B) = 0 and for any A € X 0 < M(5~ n A) - [ fndr,

= /x(5-n(Anfi„))

< At(5- n (A n 5 ) ) < fi{S-nB)

-> 0

a

as n —^ oo by /i (4) is immediate since A G Xoo if A G X is S-invariant. (4) =>■ (2). Let TJ G P S (X) satisfy the condition in (4). Take an A G X with 77(A) = 0 and let B = r i m s u p S ~ M . Then we see that B is S-invariant and 77(B) = n-Kx>

0. That lim^ n(S~nA) (1) =* (2)"above.

= 0 can be shown in the same fashion as in the proof of

24- A.MS sources

95

(2) => (5). Suppose that / J < 7 ? with 77 G Pa(X), and let / G B(X) be arbitrary. Then the set A = {x G X : (S„/)(a:) converges} is 5-invariant and r](A) = 1 by the Pointwise Ergodic Theorem. Thus, that Ac is 5-invariant and 77(^4°) = 0 imply lim n(S~nAc) n—>oo exists

a

= n{Ac) = 0 by n (6). Let / G ■B(-X') and observe that {Snf}'^'=1 is a bounded sequence in B(X) C LX(X, n) such that S „ / -» / s ^t-a.e. by (5). Then the Bounded Convergence Theorem implies that /i(S„/) —> (J,{fs)(6) => (1). We only have to take / = 1A in (6). The equality (4.3) is almost clear. a

Remark 7. (1) Note that /z• (2) in the above theorem. (2) If n G P(X) and n < 77 for some 77 G Ps(-X"), then n is AMS. (3) The Pointwise Ergodic Theorem holds for \i G P(X) iff /j G Pa(X). More precisely, the following statements are equivalent for \i G P(X): (i) /. G Pa(X). (ii) For any / G B(X) there exists some 5-invariant function fs G B(X) such that S n / —► / s \x-a.e. In this case, for / G L 1 (X,/Z), S „ / —» / s ft-a.e., p-a.e. and in L J (X,/T), and / s = En(fP) = Eji{f\3) fi-a.e. and p-a.e., where 3 = {.4 G X : 5 _ 1 . 4 = A} is a cr-algebra. (4) In (5) and (6) of Theorem 6, B(X) can be replaced by C(X). (5) Pa(X) is weak* compact with the identification of Pa(X) C M(X)

= C(X)*

When 5 is invertible, we can have some more characterizations of AMS sources. Proposition 8. Suppose that S is invertible. Then, for /j, G P(X) conditions are equivalent: (1) fi G Pa{X). (2) There exists some 77 G PS(X) such that n -C 77. (3) There exists some 77 G Pa(X) such that ft (3) in Theorem 6

71=0

implies ^ -C 77 = p. (2) => (3) is immediate. (3) => (1). If 77 G Pa(X),

then 77 « 7 7 by the proof of (1) => (2). Hence, if fj, < 77,


96

Sources

then fj, -C 77. This implies fi 0 by / J < 7 7 , i.e., n(A) = 0. Similarly, if 77(A) = 1, then we have p(A) = 1. Thus \x £ Pae(X). The implications (1) =>. (4) =>• (5) =► (6) => (7) => (8) => (9) => (1) are shown in much the same way as in the proof of Theorem 3.2.


gg

Remark 13. In (8) and (9) of Theorem 12, X can be replaced by a semialgebra X0 that generates X. Also in (5), (6) and (7) of Theorem 12, we can take g = f. Theorem 14. (1) If p e exPa(X), then fi G Pae{X). That is, exPa(X) C Pae{X). (2) If Pse(X) / 0, then the above set inclusion is proper. That is, there is a H G Pae{X) such that \i £ exPa(X). Proof. (1) This can be verified in exactly the same manner as in the proof of (4) =J- (1) of Theorem 3.2. (2) Let fj, G Pae(X) be such that n ^ p. The existence of such a fi is seen as follows. Take any stationary and ergodic f G Pse(X) ( / 0) and any nonnegative / 6 ^(X, f) with norm 1 which is not S-invariant on a set of positive f measure. Define fi by

p(A) = J f (3) is clear.

then n -C p = r\ by Remark 9.

(3) =► (4). Let r) e Pae(X) be such that p < n. Then rj € Pse(X) Hence /i < rj and ^ < 77 since rj is stationary.

and n < fj.

(4) => (1). Let 7? e Pae(X) be such that /i(£) and n > 1.

/

—

For, this is verified by (1),(2),(3) and the mathematical induction. = 7 / i (5-("- 1 )2l) + £ / ^ - ( " - f c - i y

(7) iJVs-w)

•7_u

'

fc=i

^

V 5-("-J)2lV I i=i

/

S.5. Shannon-McMillan-Breiman Theorem

101

This is obtained from (6) by letting 21, = S~ln~j)% = 7„(2l) o S " - 1 + £

(8) iJ^S-m) \j=o

y

J

1 < j < n.

Mf a l V 5-( fc -> +1 )2l) o 5 n - f c - x . i

v ij=i

fc=1

This is immediate from (4) and (7). Now for n = 1,2,... let

/. = /„( Vs-H),

PO = i„(a),

Abstract Methods in Information Theory

Abstract Methods in Information Theory

Information-spectrum methods in information theory

Information-Spectrum Methods in Information Theory

Abstract Measurement Theory

Abstract Systems Theory

Abstract Set Theory

Abstract Measurement Theory

Methods in module theory

Abstract Algebra: Theory and Applications

Information-Spectrum Method in Information Theory

Information theory

Information Theory

Information Theory

Information Theory

Abstract algebra: Theory and applications

Methods of Information Geometry

Methods of Information Geometry

Qualitative methods in quantum theory

Elementary Methods in Number Theory

Mathematical methods in risk theory

Mathematical methods in risk theory

Methods in Banach space theory

Topological Methods in Group Theory

Topological methods in group theory

Qualitative methods in quantum theory

Abstract homotopy and simple homotopy theory

Abstract Analytic Function Theory And Hardy Algebras

Information theory and statistics

Coding and information theory

Network Information Theory

Abstract Methods in Information Theory