Stochastic Processes and Functional Analysis
A Volume of Recent Advances in Honor of M. M. Rao
edited by
Alan C. Krinik
Randall J. Swift
California State Polytechnic University
Pomona, California, U.S.A.
MARCEL DEKKER, INC.
NEW YORK • BASEL
Although great care has been taken to provide accurate and current information, neither the author(s) nor the publisher, nor anyone else associated with this publication, shall be liable for any loss, damage, or liability directly or indirectly caused or alleged to be caused by this book. The material contained herein is not intended to provide specific advice or recommendations for any specific situation.

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress.

ISBN: 0-8247-5404-2

This book is printed on acid-free paper.

Headquarters
Marcel Dekker, Inc., 270 Madison Avenue, New York, NY 10016, U.S.A.
tel: 212-696-9000; fax: 212-685-4540

Distribution and Customer Service
Marcel Dekker, Inc., Cimarron Road, Monticello, New York 12701, U.S.A.
tel: 800-228-1160; fax: 845-796-1772

Eastern Hemisphere Distribution
Marcel Dekker AG, Hutgasse 4, Postfach 812, CH-4001 Basel, Switzerland
tel: 41-61-260-6300; fax: 41-61-260-6333

World Wide Web
http://www.dekker.com

The publisher offers discounts on this book when ordered in bulk quantities. For more information, write to Special Sales/Professional Marketing at the headquarters address above.

Copyright © 2004 by Marcel Dekker, Inc. All Rights Reserved.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage and retrieval system, without permission in writing from the publisher.

Current printing (last digit): 10 9 8 7 6 5 4 3 2 1
PRINTED IN THE UNITED STATES OF AMERICA
PURE AND APPLIED MATHEMATICS
A Program of Monographs, Textbooks, and Lecture Notes

EXECUTIVE EDITORS
Earl J. Taft, Rutgers University, New Brunswick, New Jersey
Zuhair Nashed, University of Central Florida, Orlando, Florida

EDITORIAL BOARD
M. S. Baouendi, University of California, San Diego
Jane Cronin, Rutgers University
Jack K. Hale, Georgia Institute of Technology
S. Kobayashi, University of California, Berkeley
Marvin Marcus, University of California, Santa Barbara
W. S. Massey, Yale University
Anil Nerode, Cornell University
Donald Passman, University of Wisconsin, Madison
Fred S. Roberts, Rutgers University
David L. Russell, Virginia Polytechnic Institute and State University
Walter Schempp, Universität Siegen
Mark Teply, University of Wisconsin, Milwaukee
LECTURE NOTES IN PURE AND APPLIED MATHEMATICS
1. N. Jacobson, Exceptional Lie Algebras
2. L.-A. Lindahl and F. Poulsen, Thin Sets in Harmonic Analysis
3. I. Satake, Classification Theory of Semi-Simple Algebraic Groups
4. F. Hirzebruch et al., Differentiable Manifolds and Quadratic Forms
5. I. Chavel, Riemannian Symmetric Spaces of Rank One
6. R. B. Burckel, Characterization of C(X) Among Its Subalgebras
7. B. R. McDonald et al., Ring Theory
8. Y.-T. Siu, Techniques of Extension of Analytic Objects
9. S. R. Caradus et al., Calkin Algebras and Algebras of Operators on Banach Spaces
10. E. O. Roxin et al., Differential Games and Control Theory
11. M. Orzech and C. Small, The Brauer Group of Commutative Rings
12. S. Thomeier, Topology and Its Applications
13. J. M. Lopez and K. A. Ross, Sidon Sets
14. W. W. Comfort and S. Negrepontis, Continuous Pseudometrics
15. K. McKennon and J. M. Robertson, Locally Convex Spaces
16. M. Carmeli and S. Malin, Representations of the Rotation and Lorentz Groups
17. G. B. Seligman, Rational Methods in Lie Algebras
18. D. G. de Figueiredo, Functional Analysis
19. L. Cesari et al., Nonlinear Functional Analysis and Differential Equations
20. J. J. Schaffer, Geometry of Spheres in Normed Spaces
21. K. Yano and M. Kon, Anti-Invariant Submanifolds
22. W. V. Vasconcelos, The Rings of Dimension Two
23. R. E. Chandler, Hausdorff Compactifications
24. S. P. Franklin and B. V. S. Thomas, Topology
25. S. K. Jain, Ring Theory
26. B. R. McDonald and R. A. Morris, Ring Theory II
27. R. B. Mura and A. Rhemtulla, Orderable Groups
28. J. R. Graef, Stability of Dynamical Systems
29. H.-C. Wang, Homogeneous Banach Algebras
30. E. O. Roxin et al., Differential Games and Control Theory II
31. R. D. Porter, Introduction to Fibre Bundles
32. M. Altman, Contractors and Contractor Directions Theory and Applications
33. J. S. Golan, Decomposition and Dimension in Module Categories
34. G. Fairweather, Finite Element Galerkin Methods for Differential Equations
35. J. D. Sally, Numbers of Generators of Ideals in Local Rings
36. S. S. Miller, Complex Analysis
37. R. Gordon, Representation Theory of Algebras
38. M. Goto and F. D. Grosshans, Semisimple Lie Algebras
39. A. I. Arruda et al., Mathematical Logic
40. F. Van Oystaeyen, Ring Theory
41. F. Van Oystaeyen and A. Verschoren, Reflectors and Localization
42. M. Satyanarayana, Positively Ordered Semigroups
43. D. L. Russell, Mathematics of Finite-Dimensional Control Systems
44. P.-T. Liu and E. Roxin, Differential Games and Control Theory III
45. A. Geramita and J. Seberry, Orthogonal Designs
46. J. Cigler, V. Losert, and P. Michor, Banach Modules and Functors on Categories of Banach Spaces
47. P.-T. Liu and J. G. Sutinen, Control Theory in Mathematical Economics
48. C. Byrnes, Partial Differential Equations and Geometry
49. G. Klambauer, Problems and Propositions in Analysis
50. J. Knopfmacher, Analytic Arithmetic of Algebraic Function Fields
51. F. Van Oystaeyen, Ring Theory
52. B. Kedem, Binary Time Series
53. J. Barros-Neto and R. A. Artino, Hypoelliptic Boundary-Value Problems
54. R. L. Sternberg et al., Nonlinear Partial Differential Equations in Engineering and Applied Science
55. B. R. McDonald, Ring Theory and Algebra III
56. J. S. Golan, Structure Sheaves Over a Noncommutative Ring
57. T. V. Narayana et al., Combinatorics, Representation Theory and Statistical Methods in Groups
58. T. A. Burton, Modeling and Differential Equations in Biology
59. K. H. Kim and F. W. Roush, Introduction to Mathematical Consensus Theory
60. J. Banas and K. Goebel, Measures of Noncompactness in Banach Spaces
61. O. A. Nielsen, Direct Integral Theory
62. J. E. Smith et al., Ordered Groups
63. J. Cronin, Mathematics of Cell Electrophysiology
64. J. W. Brewer, Power Series Over Commutative Rings
65. P. K. Kamthan and M. Gupta, Sequence Spaces and Series
66. T. G. McLaughlin, Regressive Sets and the Theory of Isols
67. T. L. Herdman et al., Integral and Functional Differential Equations
68. R. Draper, Commutative Algebra
69. W. G. McKay and J. Patera, Tables of Dimensions, Indices, and Branching Rules for Representations of Simple Lie Algebras
70. R. L. Devaney and Z. H. Nitecki, Classical Mechanics and Dynamical Systems
71. J. Van Geel, Places and Valuations in Noncommutative Ring Theory
72. C. Faith, Injective Modules and Injective Quotient Rings
73. A. Fiacco, Mathematical Programming with Data Perturbations I
74. P. Schultz et al., Algebraic Structures and Applications
75. L. Bican et al., Rings, Modules, and Preradicals
76. D. C. Kay and M. Breen, Convexity and Related Combinatorial Geometry
77. P. Fletcher and W. F. Lindgren, Quasi-Uniform Spaces
78. C.-C. Yang, Factorization Theory of Meromorphic Functions
79. O. Taussky, Ternary Quadratic Forms and Norms
80. S. P. Singh and J. H. Burry, Nonlinear Analysis and Applications
81. K. B. Hannsgen et al., Volterra and Functional Differential Equations
82. N. L. Johnson et al., Finite Geometries
83. G. I. Zapata, Functional Analysis, Holomorphy, and Approximation Theory
84. S. Greco and G. Valla, Commutative Algebra
85. A. V. Fiacco, Mathematical Programming with Data Perturbations II
86. J.-B. Hiriart-Urruty et al., Optimization
87. A. Figa Talamanca and M. A. Picardello, Harmonic Analysis on Free Groups
88. M. Harada, Factor Categories with Applications to Direct Decomposition of Modules
89. V. I. Istratescu, Strict Convexity and Complex Strict Convexity
90. V. Lakshmikantham, Trends in Theory and Practice of Nonlinear Differential Equations
91. H. L. Manocha and J. B. Srivastava, Algebra and Its Applications
92. D. V. Chudnovsky and G. V. Chudnovsky, Classical and Quantum Models and Arithmetic Problems
93. J. W. Longley, Least Squares Computations Using Orthogonalization Methods
94. L. P. de Alcantara, Mathematical Logic and Formal Systems
95. C. E. Aull, Rings of Continuous Functions
96. R. Chuaqui, Analysis, Geometry, and Probability
97. L. Fuchs and L. Salce, Modules Over Valuation Domains
98. P. Fischer and W. R. Smith, Chaos, Fractals, and Dynamics
99. W. B. Powell and C. Tsinakis, Ordered Algebraic Structures
100. G. M. Rassias and T. M. Rassias, Differential Geometry, Calculus of Variations, and Their Applications
101. R.-E. Hoffmann and K. H. Hofmann, Continuous Lattices and Their Applications
102. J. H. Lightbourne III and S. M. Rankin III, Physical Mathematics and Nonlinear Partial Differential Equations
103. C. A. Baker and L. M. Batten, Finite Geometries
104. J. W. Brewer et al., Linear Systems Over Commutative Rings
105. C. McCrory and T. Shifrin, Geometry and Topology
106. D. W. Kueker et al., Mathematical Logic and Theoretical Computer Science
107. B.-L. Lin and S. Simons, Nonlinear and Convex Analysis
108. S. J. Lee, Operator Methods for Optimal Control Problems
109. V. Lakshmikantham, Nonlinear Analysis and Applications
110. S. F. McCormick, Multigrid Methods
111. M. C. Tangora, Computers in Algebra
112. D. V. Chudnovsky and G. V. Chudnovsky, Search Theory
113. D. V. Chudnovsky and R. D. Jenks, Computer Algebra
114. M. C. Tangora, Computers in Geometry and Topology
115. P. Nelson et al., Transport Theory, Invariant Imbedding, and Integral Equations
116. P. Clement et al., Semigroup Theory and Applications
117. J. Vinuesa, Orthogonal Polynomials and Their Applications
118. C. M. Dafermos et al., Differential Equations
119. E. O. Roxin, Modern Optimal Control
120. J. C. Diaz, Mathematics for Large Scale Computing
121. P. S. Milojevic, Nonlinear Functional Analysis
122. C. Sadosky, Analysis and Partial Differential Equations
123. R. M. Shortt, General Topology and Applications
124. R. Wong, Asymptotic and Computational Analysis
125. D. V. Chudnovsky and R. D. Jenks, Computers in Mathematics
126. W. D. Wallis et al., Combinatorial Designs and Applications
127. S. Elaydi, Differential Equations
128. G. Chen et al., Distributed Parameter Control Systems
129. W. N. Everitt, Inequalities
130. H. G. Kaper and M. Garbey, Asymptotic Analysis and the Numerical Solution of Partial Differential Equations
131. O. Arino et al., Mathematical Population Dynamics
132. S. Coen, Geometry and Complex Variables
133. J. A. Goldstein et al., Differential Equations with Applications in Biology, Physics, and Engineering
134. S. J. Andima et al., General Topology and Applications
135. P. Clement et al., Semigroup Theory and Evolution Equations
136. K. Jarosz, Function Spaces
137. J. M. Bayod et al., p-adic Functional Analysis
138. G. A. Anastassiou, Approximation Theory
139. R. S. Rees, Graphs, Matrices, and Designs
140. G. Abrams et al., Methods in Module Theory
141. G. L. Mullen and P. J.-S. Shiue, Finite Fields, Coding Theory, and Advances in Communications and Computing
142. M. C. Joshi and A. V. Balakrishnan, Mathematical Theory of Control
143. G. Komatsu and Y. Sakane, Complex Geometry
144. I. J. Bakelman, Geometric Analysis and Nonlinear Partial Differential Equations
145. T. Mabuchi and S. Mukai, Einstein Metrics and Yang-Mills Connections
146. L. Fuchs and R. Gobel, Abelian Groups
147. A. D. Pollington and W. Moran, Number Theory with an Emphasis on the Markoff Spectrum
148. G. Dore et al., Differential Equations in Banach Spaces
149. T. West, Continuum Theory and Dynamical Systems
150. K. D. Bierstedt et al., Functional Analysis
151. K. G. Fischer et al., Computational Algebra
152. K. D. Elworthy et al., Differential Equations, Dynamical Systems, and Control Science
153. P.-J. Cahen et al., Commutative Ring Theory
154. S. C. Cooper and W. J. Thron, Continued Fractions and Orthogonal Functions
155. P. Clement and G. Lumer, Evolution Equations, Control Theory, and Biomathematics
156. M. Gyllenberg and L. Persson, Analysis, Algebra, and Computers in Mathematical Research
157. W. O. Bray et al., Fourier Analysis
158. J. Bergen and S. Montgomery, Advances in Hopf Algebras
159. A. R. Magid, Rings, Extensions, and Cohomology
160. N. H. Pavel, Optimal Control of Differential Equations
161. M. Ikawa, Spectral and Scattering Theory
162. X. Liu and D. Siegel, Comparison Methods and Stability Theory
163. J.-P. Zolesio, Boundary Control and Variation
164. M. Křížek et al., Finite Element Methods
165. G. Da Prato and L. Tubaro, Control of Partial Differential Equations
166. E. Ballico, Projective Geometry with Applications
167. M. Costabel et al., Boundary Value Problems and Integral Equations in Nonsmooth Domains
168. G. Ferreyra, G. R. Goldstein, and F. Neubrander, Evolution Equations
169. S. Huggett, Twistor Theory
170. H. Cook et al., Continua
171. D. F. Anderson and D. E. Dobbs, Zero-Dimensional Commutative Rings
172. K. Jarosz, Function Spaces
173. V. Ancona et al., Complex Analysis and Geometry
174. E. Casas, Control of Partial Differential Equations and Applications
175. N. Kalton et al., Interaction Between Functional Analysis, Harmonic Analysis, and Probability
176. Z. Deng et al., Differential Equations and Control Theory
177. P. Marcellini et al., Partial Differential Equations and Applications
178. A. Kartsatos, Theory and Applications of Nonlinear Operators of Accretive and Monotone Type
179. M. Maruyama, Moduli of Vector Bundles
180. A. Ursini and P. Agliano, Logic and Algebra
181. X. H. Cao et al., Rings, Groups, and Algebras
182. D. Arnold and R. M. Rangaswamy, Abelian Groups and Modules
183. S. R. Chakravarthy and A. S. Alfa, Matrix-Analytic Methods in Stochastic Models
184. J. E. Andersen et al., Geometry and Physics
185. P.-J. Cahen et al., Commutative Ring Theory
186. J. A. Goldstein et al., Stochastic Processes and Functional Analysis
187. A. Sorbi, Complexity, Logic, and Recursion Theory
188. G. Da Prato and J.-P. Zolesio, Partial Differential Equation Methods in Control and Shape Analysis
189. D. D. Anderson, Factorization in Integral Domains
190. N. L. Johnson, Mostly Finite Geometries
191. D. Hinton and P. W. Schaefer, Spectral Theory and Computational Methods of Sturm-Liouville Problems
192. W. H. Schikhof et al., p-adic Functional Analysis
193. S. Sertöz, Algebraic Geometry
194. G. Caristi and E. Mitidieri, Reaction Diffusion Systems
195. A. V. Fiacco, Mathematical Programming with Data Perturbations
196. M. Křížek et al., Finite Element Methods: Superconvergence, Post-Processing, and A Posteriori Estimates
197. S. Caenepeel and A. Verschoren, Rings, Hopf Algebras, and Brauer Groups
198. V. Drensky et al., Methods in Ring Theory
199. W. B. Jones and A. Sri Ranga, Orthogonal Functions, Moment Theory, and Continued Fractions
200. P. E. Newstead, Algebraic Geometry
201. D. Dikranjan and L. Salce, Abelian Groups, Module Theory, and Topology
202. Z. Chen et al., Advances in Computational Mathematics
203. X. Caicedo and C. H. Montenegro, Models, Algebras, and Proofs
204. C. Y. Yildirim and S. A. Stepanov, Number Theory and Its Applications
205. D. E. Dobbs et al., Advances in Commutative Ring Theory
206. F. Van Oystaeyen, Commutative Algebra and Algebraic Geometry
207. J. Kakol et al., p-adic Functional Analysis
208. M. Boulagouaz and J.-P. Tignol, Algebra and Number Theory
209. S. Caenepeel and F. Van Oystaeyen, Hopf Algebras and Quantum Groups
210. F. Van Oystaeyen and M. Saorin, Interactions Between Ring Theory and Representations of Algebras
211. R. Costa et al., Nonassociative Algebra and Its Applications
212. T.-X. He, Wavelet Analysis and Multiresolution Methods
213. H. Hudzik and L. Skrzypczak, Function Spaces: The Fifth Conference
214. J. Kajiwara et al., Finite or Infinite Dimensional Complex Analysis
215. G. Lumer and L. Weis, Evolution Equations and Their Applications in Physical and Life Sciences
216. J. Cagnol et al., Shape Optimization and Optimal Design
217. J. Herzog and G. Restuccia, Geometric and Combinatorial Aspects of Commutative Algebra
218. G. Chen et al., Control of Nonlinear Distributed Parameter Systems
219. F. Ali Mehmeti et al., Partial Differential Equations on Multistructures
220. D. D. Anderson and I. J. Papick, Ideal Theoretic Methods in Commutative Algebra
221. A. Granja et al., Ring Theory and Algebraic Geometry
222. A. K. Katsaras et al., p-adic Functional Analysis
223. R. Salvi, The Navier-Stokes Equations
224. F. U. Coelho and H. A. Merklen, Representations of Algebras
225. S. Aizicovici and N. H. Pavel, Differential Equations and Control Theory
226. G. Lyubeznik, Local Cohomology and Its Applications
227. G. Da Prato and L. Tubaro, Stochastic Partial Differential Equations and Applications
228. W. A. Carnielli et al., Paraconsistency
229. A. Benkirane and A. Touzani, Partial Differential Equations
230. A. Illanes et al., Continuum Theory
231. M. Fontana et al., Commutative Ring Theory and Applications
232. D. Mond and M. J. Saia, Real and Complex Singularities
233. V. Ancona and J. Vaillant, Hyperbolic Differential Operators
234. G. R. Goldstein et al., Evolution Equations
235. A. Giambruno et al., Polynomial Identities and Combinatorial Methods
236. A. Facchini et al., Rings, Modules, Algebras, and Abelian Groups
237. J. Bergen et al., Hopf Algebras
238. A. C. Krinik and R. J. Swift, Stochastic Processes and Functional Analysis
Additional Volumes in Preparation
Preface
An AMS Special Session in honor of M.M. Rao was held at the 2002 joint meetings of the American Mathematical Society and the Mathematical Association of America. That Special Session, on Stochastic Processes and Functional Analysis, was organized by Professors Alan C. Krinik and Randall J. Swift, both of California State Polytechnic University, Pomona.

Professor M.M. Rao has had a long and distinguished research career. His research spans the areas of probability, statistics, stochastic processes, Banach space theory, measure theory and differential equations - both deterministic and stochastic. The prolific published research of M.M. Rao impacts each of these broad areas of mathematics.

The purpose of the Special Session was to highlight the key role played by abstract analysis in simplifying and solving fundamental problems in stochastic theory. This notion is fundamental to the mathematics research of M.M. Rao, who uses functional analytic methods to bring questions in these diverse areas to light.

The Sessions were a great success, bringing together a diverse group of research mathematicians whose work has been influenced by M.M.'s work and who, in turn, have influenced his work. Not only did this diverse group of speakers benefit from the common unifying thread of the session, but there were also often lively discussions and questions from the session audience.

This volume contains most of the talks given at the Sessions as well as several that were contributed later. This collection of papers reflects the depth and enormous breadth of M.M. Rao's work. A major highlight of the Sessions was M.M.'s talk entitled "Stochastic analysis and function spaces", which was a remarkable unifying survey of recent work in the area. This volume features that talk as an article, which includes a broad bibliography of the important works in the area.

The volume begins with a biography of M.M. Rao, a complete bibliography of his published writings, a list of his Ph.D. students and, notably, a collection of essays about M.M. written by some of his Ph.D. students. Many of M.M.'s students have remained devoted to him, decades after completing their degrees. Their loyal devotion arises from M.M.'s complete dedication
to them. He consistently put their concerns and welfare first. Their essays are a remarkable tribute.

This volume complements the Festschrift volume Stochastic Processes and Functional Analysis, which was published by Marcel Dekker, Inc. in 1997. That volume was in celebration of M.M.'s 65th birthday. As M.M. continues to work on, develop and expand mathematics, we look to future collections of articles that honor him and his love of mathematics.

R. J. Swift
A. C. Krinik
Biography of M. M. Rao
M.M. Rao was born Malempati Madhusudana Rao in the village of Nimmagadda in the state of Andhra Pradesh in India on June 6, 1929. He came to the United States after completing his studies at the College of Andhra University and the Presidency College of Madras University. He obtained his Ph.D. in 1959 at the University of Minnesota under the supervision of Monroe Donsker (as well as Bernard R. Gelbaum, Leonid Hurwicz, and I. Richard Savage). His first academic appointment was at Carnegie Institute of Technology (now called Carnegie Mellon University) in 1959. In 1972, he joined the faculty at the University of California, Riverside, where he remains today. He has held visiting positions at the Institute for Advanced Study (Princeton), the Indian Statistical Institute, the University of Vienna, the University of Strasbourg, and the Mathematical Sciences Research Institute (Berkeley). In 1966 he married Durgamba Kolluru in India. They have twin daughters Leela and Uma and one granddaughter.

M.M.'s research interests were initially in probability and mathematical statistics, but his intense mathematical interest and natural curiosity led him to pursue a wide range of mathematical analysis, including stochastic processes, functional analysis, ergodic theory and related asymptotics, differential equations and difference equations. His breadth of interest is mirrored by his students, many of whom are recognized as experts in diverse fields such as measure theory, operator theory, partial differential equations and stochastic processes.

M.M. has always strived for complete understanding and generality in mathematics and rarely accepts less from others. This view of mathematics has played a central role in his teaching. M.M. Rao is truly a gifted lecturer and he has inspired many generations of students. He is a demanding Ph.D. advisor who expects the most from his students. The guidance and mentoring he provides them has led many of his students to become successful mathematicians. It is no wonder that he has had his share of the best available graduate students.

M.M. is a prolific writer. His first published writings were not on
mathematics, but rather Indian poetry. He wrote poetry in his late teenage years and had a collection of his poems published when he was 21. His mathematical research publications are many and span five decades. He is as active and vital as ever. He has just completed a second edition of his well-received measure theory text and is currently working on revised and expanded second editions of his probability and conditional measures texts.
Published Writings of M. M. Rao
[1] Note on a remark of Wald, Amer. Math. Monthly 65 (1958), 277-278.
[2] Lower bounds for risk functions in estimation, Proc. Nat'l Acad. of Sciences 45 (1959), 1168-1171.
[3] Estimation by periodogram, Trabajos de Estadistica 11 (1960), 123-137.
[4] Two probability limit theorems and an application, Indagationes Mathematicae 23 (1961), 551-559.
[5] Theory of lower bounds for risk functions in estimation, Mathematische Annalen 143 (1961), 379-398.
[6] Consistency and limit distributions of estimators of parameters in explosive stochastic difference equations, Annals of Math. Stat. 32 (1961), 195-218.
[7] Some remarks on independence of statistics, Trabajos de Estadistica 12 (1961), 19-26.
[8] Remarks on a multivariate gamma distribution, Amer. Math. Monthly 68 (1961), 342-346, (with P. R. Krishnaiah).
[9] Theory of order statistics, Mathematische Annalen 147 (1962), 298-312.
[10] Nonsymmetric projections in Hilbert space, Pacific J. Math. 12 (1962), 343-357, (with V. J. Mizel).
[11] Characterizing normal law and a nonlinear integral equation, J. Math. Mech. 12 (1963), 869-880.
[12] Inference in stochastic processes-I, Teoriya Veroyatnostei i ee Primeneniya 8 (1963), 282-298.
[13] Some inference theorems in stochastic processes, Bull. Amer. Math. Soc. 68 (1963), 72-77.
[14] Discriminant analysis, Annals of Inst. of Stat. Math. 15 (1963), 11-24.
[15] Bayes estimation with convex loss, Annals of Math. Stat. 34 (1963), 839-846, (with M.H. DeGroot).
[16] Stochastic give-and-take, J. Math. Anal. & Appl. 7 (1963), 489-498, (with M.H. DeGroot).
[17] Averagings and quadratic equations in operators, Carnegie-Mellon University Technical Report #9 (1963), 27 pages, (with V. J. Mizel).
[18] Projections, generalized inverses, and quadratic forms, J. Math. Anal. & Appl. 9 (1964), 1-11, (with J. S. Chipman).
[19] Decomposition of vector measures, Proceedings of Nat'l. Acad. of Sciences 51 (1964), 771-774.
[20] Decomposition of vector measures, Proceedings of Nat'l. Acad. of Sciences 51 (1964), 771-774, Erratum, 52 (1964), p. 864.
[21] Linear functionals on Orlicz spaces, Nieuw Archief voor Wiskunde (3) 12 (1964), 77-98.
[22] The treatment of linear restrictions in regression analysis, Econometrica 32 (1964), 198-209, (with J.S. Chipman).
[23] Conditional expectations and closed projections, Indagationes Mathematicae 27 (1965), 100-112.
[24] Smoothness of Orlicz spaces-I and II, Indagationes Mathematicae 27 (1965), 671-680, 681-690.
[25] Existence and determination of optimal estimators relative to convex loss, Annals of Inst. of Stat. Math. 17 (1965), 113-147.
[26] Interpolation, ergodicity, and martingales, J. of Math. & Mech. 16 (1965), 543-567.
[27] Inference in stochastic processes-II, Zeitschrift fur Wahrscheinlichkeitstheorie 5 (1966), 317-335.
[28] Approximations to some statistical tests, Trabajos de Estadistica 17 (1966), 85-100.
[29] Multidimensional information inequalities and prediction, Proceedings of Int'l. Symposium on Multivariate Anal., Academic Press (1966), 287-313, (with M.H. DeGroot).
[30] Convolutions of vector fields and interpolation, Proceedings of Nat'l. Acad. Sciences 57 (1967), 222-226.
[31] Abstract Lebesgue-Radon-Nikodym theorems, Annali di Matematica Pura ed Applicata (4) 76 (1967), 107-132.
[32] Characterizing Hilbert space by smoothness, Indagationes Mathematicae 29 (1967), 132-135.
[33] Notes on pointwise convergence of closed martingales, Indagationes Mathematicae 29 (1967), 170-176.
[34] Inference in stochastic processes-III, Zeitschrift fur Wahrscheinlichkeitstheorie 8 (1967), 49-72.
[35] Characterization and extension of generalized harmonizable random fields, Proceedings Nat'l. Acad. Sciences 58 (1967), 1213-1219.
[36] Local functionals and generalized random fields, Bull. Amer. Math. Soc. 74 (1968), 288-293.
[37] Extensions of the Hausdorff-Young theorem, Israel J. of Math. 6 (1968), 133-149.
[38] Linear functionals on Orlicz spaces: General theory, Pacific J. Math. 25 (1968), 553-585.
[39] Almost every Orlicz space is isomorphic to a strictly convex Orlicz space, Proceedings Amer. Math. Soc. 19 (1968), 377-379.
[40] Predictions non lineaires et martingales d'operateurs, Comptes rendus (Academie des Sciences, Paris), Ser. A, 267 (1968), 122-124.
[41] Representation theory of multidimensional generalized random fields, Proceedings 2d Int'l. Symp. Multivariate Anal., Academic Press (1969), 411-436.
[42] Operateurs de moyennes et moyennes conditionnelles, C.R. Acad. Sciences, Paris, Ser. A, 268 (1969), 795-797.
[43] Produits tensoriels et espaces de fonctions, C.R. Acad. Sci., Paris 268 (1969), 1599-1601.
[44] Stone-Weierstrass theorems for function spaces, J. Math. Anal. 25 (1969), 362-371.
[45] Contractive projections and prediction operators, Bull. Amer. Math. Soc. 75 (1969), 1369-1373.
[46] Generalized martingales, Proceedings 1st Midwestern Symp. on Ergodic Theory and Prob., Lecture Notes in Math., Springer-Verlag, 160 (1970), 241-261.
[47] Linear operations, tensor products and contractive projections in function spaces, Studia Math. 38, 131-186, Addendum 48 (1970), 307-308.
[48] Approximately tame algebras of operators, Bull. Acad. Pol. Sci., Ser. Math. 19 (1971), 43-47.
[49] Abstract nonlinear prediction and operator martingales, J. Multivariate Anal. 1 (1971), 129-157, Erratum, 9, p. 646.
[50] Local functionals and generalized random fields with independent values, Teor. Verojatnost. i Primenen. 16 (1971), 466-483.
[51] Projective limits of probability spaces, J. Multivariate Anal. 1 (1971), 28-57.
[52] Contractive projections and conditional expectations, J. Multivariate Anal. 2 (1972), 262-381, (with N. Dinculeanu).
[53] Prediction sequences in smooth Banach spaces, Ann. Inst. Henri Poincare, Ser. B, 8 (1972), 319-332.
[54] Notes on characterizing Hilbert space by smoothness and smooth Orlicz spaces, J. Math. Anal. & Appl. 37 (1972), 228-234.
[55] Abstract martingales and ergodic theory, Proc. 3rd Symp. on Multivariate Anal., Academic Press (1973), 100-116.
[56] Remarks on a Radon-Nikodym theorem for vector measures, Proc. Symp. on Vector & Operator Valued Measures and Appl., Academic Press (1973), 303-317.
[57] Inference in stochastic processes-IV: Predictors and projections, Sankhya, Ser. A 36 (1974), 63-120.
[58] Inference in stochastic processes-V: Admissible means, Sankhya, Ser. A 37 (1974), 538-549.
[59] Extensions of stochastic transformations, Trab. Estadistica 26 (1975), 473-485.
[60] Conditional measures and operators, J. Multivariate Anal. 5 (1975), 330-413.
[61] Compact operators and tensor products, Bull. Acad. Pol. Sci. Ser. Math. 23 (1975), 1175-1179.
[62] Two characterizations of conditional probability, Proc. Amer. Math. Soc. 59 (1976), 75-80.
[63] Conjugate series, convergence and martingales, Rev. Roum. Math. Pures et Appl. 22 (1977), 219-254.
[64] Inference in stochastic processes-VI: Translates and densities, Proc. 4th Symp. Multivariate Anal., North Holland (1977), 311-324.
[65] Bistochastic operators, Commentationes Mathematicae, Vol. 21, March (1978), 301-313.
[66] Asymptotic distribution of an estimator of the boundary parameter of an unstable process, Ann. Statistics 6 (1978), 185-190.
[67] Covariance analysis of nonstationary time series, Developments in Statistics 1 (1978), 171-225.
[68] Non L1-bounded martingales, Stochastic Control Theory and Stochastic Differential Systems, Lecture Notes in Control and Information Sciences 16, Springer-Verlag (1979), 527-538.
[69] Processus lineaires sur Cbo(G), C. R. Acad. Sci., Paris, 289 (1979), 139-141.
[70] Convolutions of vector fields-I, Math. Zeitschrift 174 (1980), 63-79.
[71] Asymptotic distribution of an estimator of the boundary parameter of an unstable process, Ann. Statistics 6 (1978), 185-190, Correction, Ann. Statistics 8 (1980), 1403.
[72] Local functionals on Coo(G) and probability, J. Functional Analysis 39 (1980), 23-41.
[73] Local functionals, Proceedings of Oberwolfach Conference on Measure Theory, Lecture Notes in Math. 794, Springer-Verlag (1980), 484-496.
[74] Structure and convexity of Orlicz spaces of vector fields, Proceedings of the F.B. Jones Conference on General Topology and Modern Analysis, University of California, Riverside (1981), 457-473.
[75] Representation of weakly harmonizable processes, Proc. Nat. Acad. Sci. 79, No. 9 (1981), 5288-5289.
[76] Stochastic processes and cylindrical probabilities, Sankhya, Ser. A (1981), 149-169.
[77] Application and extension of Cramer's Theorem on distributions of ratios, in Contributions to Statistics and Probability, North Holland (1981), 617-633.
[78] Harmonizable processes: structure theory, L'Enseignement Mathematique 28 (1982), 295-351.
[79] Domination problem for vector measures and applications to non-stationary processes, Oberwolfach Measure Theory Proceedings, Springer Lecture Notes in Math. 945 (1982), 296-313.
[80] Bimeasures and sampling theorems for weakly harmonizable processes, Stochastic Anal. Appl. 1 (1983), 21-55, (with D.K. Chang).
[81] Filtering and smoothing of nonstationary processes, Proceedings of the ONR Workshop on "Signal Processing", Marcel Dekker (1984), 59-65.
[82] The spectral domain of multivariate harmonizable processes, Proc. Nat. Acad. Sci. U.S.A. 81 (1984), 4611-4612.
[83] Harmonizable, Cramer, and Karhunen classes of processes, Handbook of Statistics, Vol. 5 (1985), 279-310.
[84] Bimeasures and nonstationary processes, Real and Stochastic Analysis, Wiley & Sons (1986), 7-118, (with D.K. Chang).
[85] A commentary on "On equivalence of infinite product measures", in S. Kakutani's selected works, Birkhauser Boston Series (1986), 377-379.
[86] Probability, Encyclopedia of Physical Science and Technology, Vol. 11, Academic Press, Inc., New York (1987), pp. 290-310.
[87] Special representations of weakly harmonizable processes, Stochastic Anal. (1988), 169-189, (with D.K. Chang).
[88] Paradoxes in conditional probability, J. Multivariate Anal. 27 (1988), pp. 434-446.
[89] Harmonizable signal extraction, filtering and sampling, Topics in Non-Gaussian Signal Processing, Vol. II, Springer-Verlag (1989), pp. 98-117.
[90] A view of harmonizable processes, in Statistical Data Analysis and Inference, North-Holland, New York (1989), pp. 597-615.
[91] Bimeasures and harmonizable processes (analysis, classification, and representation), Springer-Verlag Lecture Notes in Math. 1379 (1989), pp. 254-298.
[92] Sampling and prediction for harmonizable isotropic random fields, J. Combinatorics, Information & System Sciences, Vol. 16 (1991), pp. 207-220.
[93] L^{2,2}-boundedness, harmonizability and filtering, Stochastic Anal. Appl. (1992), pp. 323-342.
[94] Probability (expanded for 2nd ed.), Encyclopedia of Physical Science and Technology, Vol. 13 (1992), pp. 491-512.
[95] Stochastic integration: a unified approach, C. R. Acad. Sci., Paris, Vol. 314 (Series 1) (1992), pp. 629-633.
[96] A projective limit theorem for probability spaces and applications, Theor. Prob. and Appl., Vol. 38 (1993), pp. 345-355, (with V. V. Sazonov, in Russian).
[97] Exact evaluation of conditional expectations in the Kolmogorov model, Indian J. Math., Vol. 35 (1993), pp. 57-70.
[98] An approach to stochastic integration (a generalized and unified treatment), in Multivariate Analysis: Future Directions, Elsevier Science Publishers, The Netherlands (1993), pp. 347-374.
[99] Harmonizable processes and inference: unbiased prediction for stochastic flows, J. Statist. Planning and Inf., Vol. 39 (1994), pp. 187-209.
[100] Some problems of real and stochastic analysis arising from applications, Stochastic Processes and Functional Analysis, J. A. Goldstein, N. E. Gretsky, J. J. Uhl, editors, Marcel Dekker Inc. (1997), 1-15.
[101] Packing in Orlicz sequence spaces, (with Z. D. Ren), Studia Math. 126 (1997), no. 3, 235-251.
[102] Second order nonlinear stochastic differential equations, Nonlinear Analysis, Vol. 30, no. 5 (1997), 3147-3151.
[103] Higher order stochastic differential equations, Real and Stochastic Analysis, CRC Press, Boca Raton, FL (1997), 225-302.
[104] Nonlinear prediction with increasing loss, J. N. Srivastava felicitation volume, J. Combin. Inform. System Sci. 23 (1998), no. 1-4, 187-192.
[105] Characterizing covariances and means of harmonizable processes, Infinite Dimensional Analysis and Quantum Probability, Kyoto (2000), 363-381.
[106] Multidimensional Orlicz space interpolation with changing measures, Peetre 65 Proceedings, Lund, Sweden (2000).
[107] Representations of conditional means, dedicated to Professor Nicholas Vakhania on the occasion of his 70th birthday, Georgian Math. J. 8 (2001), no. 2, 363-376.
[108] Convolutions of vector fields. II. Random walk models, Proceedings of the Third World Congress of Nonlinear Analysts, Part 6 (Catania, 2000), Nonlinear Anal. 47 (2001), no. 6, 3599-3615.
[109] Martingales and some applications, in Shanbhag, D. N. (ed.) et al., Stochastic Processes: Theory and Methods, Handbook of Statistics 19, North-Holland/Elsevier, Amsterdam (2001), 765-816.
[110] Probability (revised and expanded for 3rd ed.), Encyclopedia of Physical Science and Technology (2002), pp. 87-109.
[111] Representation and estimation for harmonizable type processes, IEEE (2002), 1559-1564.
[112] A commentary on "Une theorie unifiee des martingales et des moyennes ergodiques", C.R. Acad. Sci. 252 (1961), pp. 2064-2066, in Rota's Selecta, Birkhauser Boston (2002).
[113] Evolution operators in stochastic processes and inference, Evolution Equations, G. R. Goldstein, R. Nagel, S. Romanelli, editors, Marcel Dekker Inc. (2003), 357-372.
[114] Stochastic analysis and function spaces, Recent Advances in Stochastic Processes and Functional Analysis, A.C. Krinik, R.J. Swift, editors, Marcel Dekker Inc. (2004), 1-25.
Books Edited
[1] General Topology and Modern Analysis, Proceedings of the F.B. Jones Conference, Academic Press, Inc., New York (1981), 514 pages, (edited jointly with L.F. McAuley).
[2] Handbook of Statistics, Volume 5: Time Series in the Time Domain, (edited jointly with E.J. Hannan, P.R. Krishnaiah), North-Holland Publishing Co., Amsterdam (1985).
[3] Real and Stochastic Analysis, (Editor), Wiley & Sons, New York (1986), 347 pages.
[4] Multivariate Statistics and Probability, (edited jointly with C.R. Rao), Academic Press Inc., Boston (1989), 565 pages.
[5] Real and Stochastic Analysis: Recent Advances, (Editor), CRC Press, Boca Raton, FL (1997), 393 pages.
[6] Real and Stochastic Analysis: New Perspectives, (Editor), Birkhauser Boston (in preparation).

Books Written
[1] Stochastic Processes and Integration, Sijthoff & Noordhoff International Publishers, Alphen aan den Rijn, The Netherlands (1979), 460 pages.
[2] Foundations of Stochastic Analysis, Academic Press, Inc., New York (1981), 295 pages.
[3] Probability Theory with Applications, Academic Press, Inc., New York (1984), 495 pages.
[4] Measure Theory and Integration, Wiley-Interscience, New York (1987), 540 pages.
[5] Theory of Orlicz Spaces (jointly with Z. D. Ren), Marcel Dekker Inc., New York (1991), 449 pages.
[6] Conditional Measures and Applications, Marcel Dekker Inc., New York (1993), 417 pages.
[7] Stochastic Processes: General Theory, Kluwer Academic Publishers, The Netherlands (1995), 620 pages.
[8] Stochastic Processes: Inference Theory, Kluwer Academic Publishers, The Netherlands (2000), 645 pages.
[9] Applications of Orlicz Spaces (jointly with Z. D. Ren), Marcel Dekker Inc., New York (2002), 464 pages.
[10] Measure Theory and Integration, (Revised and enlarged second edition), Marcel Dekker Inc., New York (to appear, 2004).
[11] Probability Theory with Applications, (jointly with R. J. Swift), (Revised and enlarged second edition), Kluwer Academic Publishers, The Netherlands (in preparation).
[12] Conditional Measures and Applications, (Revised second edition), Marcel Dekker Inc., New York (in preparation).
Ph.D. Theses Completed Under the Direction of M.M. Rao
At Carnegie-Mellon University:
Dietmar R. Borchers (1964), "Second order stochastic differential equations and related Ito processes."
J. Jerry Uhl, Jr. (1966), "Orlicz spaces of additive set functions and set martingales."
Jerome A. Goldstein (1967), "Stochastic differential equations and nonlinear semi-groups."
Neil E. Gretsky (1967), "Representation theorems on Banach function spaces."
William T. Kraynek (1968), "Interpolation of sub-linear operators on generalized Orlicz and Hardy spaces."
Robert L. Rosenberg (1968), "Compactness in Orlicz spaces based on sets of probability measures."
George Y. H. Chi (1969), "Nonlinear prediction and multiplicity of generalized random processes."

At University of California, Riverside:
Vera Darlean Briggs (1973), "Densities for infinitely divisible processes."
Stephen V. Noltie (1975), "Integral representations of chains and vector measures."
Theodore R. Hillmann (1977), "Besicovitch-Orlicz spaces of almost periodic functions."
Michael D. Brennan (1978), "Planar semi-martingales and stochastic integrals."
James P. Kelsh (1978), "Linear analysis of harmonizable time series."
Alan C. Krinik (1978), "Stroock-Varadhan theory of diffusion in a Hilbert space and likelihood ratios."
Derek K. Chang (1983), "Bimeasures, harmonizable process and filtering."
Marc H. Mehlman (1990), "Moving average representation and prediction for multidimensional strongly harmonizable process."
Randall J. Swift (1992), "Structural and sample path analysis of harmonizable random fields."
Michael L. Green (1995), "Multi-parameter semi-martingale integrals and boundedness principles."
Heroe Soedjak (1996), "Estimation problems for harmonizable random processes and fields."
Contents
Preface .... iii
Biography of M. M. Rao .... v
Published Writings of M. M. Rao .... vii
Ph.D. Theses Completed Under the Direction of M. M. Rao .... xv
Contributors .... xxi

For M. M. Rao .... 1
An Appreciation of My Teacher, M. M. Rao (J. A. Goldstein) .... 3
1001 Words About Rao (M. L. Green) .... 7
A Guide to Life, Mathematical and Otherwise (N. E. Gretsky) .... 11
Rao and the Early Riverside Years (Alan Krinik) .... 13
On M. M. Rao (R. J. Swift) .... 21
Reflections on M. M. Rao (Jerry Uhl) .... 25

1. Stochastic Analysis and Function Spaces (M. M. Rao) .... 27
2. Applications of Sinkhorn Balancing to Counting Problems (Isabel Beichl and Francis Sullivan) .... 53
3. Zakai Equation of Nonlinear Filtering with Ornstein-Uhlenbeck Noise: Existence and Uniqueness (Abhay Bhatt, Balram Rajput, and Jie Xiong) .... 67
4. Hyperfunctionals and Generalized Distributions (M. Burgin) .... 81
5. Process-Measures and Their Stochastic Integral (Nicolae Dinculeanu) .... 119
6. Invariant Sets for Nonlinear Operators (Gisele Ruiz Goldstein and Jerome A. Goldstein) .... 141
7. The Immigration-Emigration with Catastrophe Model (Michael L. Green) .... 149
8. Approximating Scale Mixtures (Hassan Hamdan and John Nolan) .... 161
9. Cyclostationary Arrays: Their Unitary Operators and Representations (Harry Hurd and Timo Koski) .... 171
10. Operator Theoretic Review for Information Channels (Yuichiro Kakihara) .... 195
11. Pseudoergodicity in Information Channels (Yuichiro Kakihara) .... 209
12. Connections Between Birth-Death Processes (Alan Krinik, Carrie Mortensen, and Gerardo Rubino) .... 219
13. Integrated Gaussian Processes and Their Reproducing Kernel Hilbert Spaces (Milan N. Lukic) .... 241
14. Moving Average Representation and Prediction for Multidimensional Harmonizable Processes (Marc H. Mehlman) .... 265
15. Double-Level Averaging on a Stratified Space (Natella V. O'Bryant) .... 277
16. The Problem of Optimal Asset Allocation with Stable Distributed Returns (Svetlozar Rachev, Sergio Ortobelli, and Eduardo Schwartz) .... 295
17. Computations for Nonsquare Constants of Orlicz Spaces (Z. D. Ren) .... 349
18. Asymptotically Stationary and Related Processes (Bertram M. Schreiber) .... 363
19. Superlinearity and Weighted Sobolev Spaces (Victor L. Shapiro) .... 399
20. Doubly Stochastic Operators and the History of Birkhoff's Problem 111 (Sheila King and Ray Shiflett) .... 411
21. Classes of Harmonizable Isotropic Random Fields (Randall J. Swift) .... 441
22. On Geographically-Uniform Coevolution: Local Adaptation in Non-fluctuating Spatial Patterns (Jennifer M. Switkes and Michael E. Moody) .... 461
23. Approximating the Time Delay in Coupled van der Pol Oscillators with Delay Coupling (Stephen A. Wirkus) .... 483
Contributors
I. Beichl, NIST, Gaithersburg, Maryland, U.S.A.
A. Bhatt, Indian Statistical Institute, New Delhi, India
M. Burgin, University of California, Los Angeles, Los Angeles, California, U.S.A.
N. Dinculeanu, University of Florida, Gainesville, Florida, U.S.A.
G. R. Goldstein, University of Memphis, Memphis, Tennessee, U.S.A.
J. A. Goldstein, University of Memphis, Memphis, Tennessee, U.S.A.
M. L. Green, California State Polytechnic University, Pomona, California, U.S.A.
N. E. Gretsky, University of California, Riverside, Riverside, California, U.S.A.
H. Hamdan, James Madison University, Harrisonburg, Virginia, U.S.A.
H. Hurd, University of North Carolina, Chapel Hill, North Carolina, U.S.A.
Y. Kakihara, California State University, San Bernardino, San Bernardino, California, U.S.A.
S. King, California State Polytechnic University, Pomona, California, U.S.A.
T. Koski, University of Linköping, Linköping, Sweden
A. C. Krinik, California State Polytechnic University, Pomona, California, U.S.A.
M. N. Lukic, Viterbo University, La Crosse, Wisconsin, U.S.A.
M. H. Mehlman, University of New Haven, New Haven, Connecticut, U.S.A.
M. E. Moody, Franklin W. Olin College of Engineering, Needham, Massachusetts, U.S.A.
Carrie Mortensen, Pasadena City College, Pasadena, California, U.S.A.
J. Nolan, American University, Washington, DC, U.S.A.
N. V. O'Bryant, University of California, Irvine, Irvine, California, U.S.A.
S. Ortobelli, Università della Calabria, Arcavacata di Rende, Italy
S. Rachev, University of California, Santa Barbara, Santa Barbara, California, U.S.A., and School of Economics and Business Engineering, University of Karlsruhe, Karlsruhe, Germany
B. Rajput, University of Tennessee, Knoxville, Tennessee, U.S.A.
M. M. Rao, University of California, Riverside, Riverside, California, U.S.A.
Z. D. Ren, Suzhou University, Suzhou, P.R. China
G. Rubino, INRIA, IRISA, Campus Universitaire de Beaulieu, Rennes, France
B. M. Schreiber, Wayne State University, Detroit, Michigan, U.S.A.
E. Schwartz, Anderson Graduate School of Management, University of California, Los Angeles, Los Angeles, California, U.S.A.
V. L. Shapiro, University of California, Riverside, Riverside, California, U.S.A.
R. Shiflett, California State Polytechnic University, Pomona, California, U.S.A.
F. Sullivan, IDA/CCS, Bowie, Maryland, U.S.A.
R. J. Swift, California State Polytechnic University, Pomona, California, U.S.A.
J. M. Switkes, California State Polytechnic University, Pomona, California, U.S.A.
J. J. Uhl, University of Illinois, Urbana, Illinois, U.S.A.
S. Wirkus, California State Polytechnic University, Pomona, California, U.S.A.
J. Xiong, University of Tennessee, Knoxville, Tennessee, U.S.A.
For M.M. Rao
Professor M.M. Rao has had a long and distinguished research career. His research spans the areas of probability, statistics, stochastic processes, Banach space theory, measure theory and differential equations - both deterministic and stochastic. The prolific published research of M.M. Rao impacts each of these broad areas of mathematics.

During M.M.'s career, he has had eighteen Ph.D. students. Many of his students have gone on to very successful careers in mathematics and are recognized experts in their field of study. Six of his former students have written tribute essays about M.M., and each is affectionately dedicated to him. These essays were written by J.A. Goldstein, M.L. Green, N.E. Gretsky, A.C. Krinik, R.J. Swift and J.J. Uhl.

Jerry Goldstein is a Professor of Mathematics at the University of Memphis. He is internationally known for his outstanding work in semigroup theory, functional analysis and differential equations.

Mike Green is an Assistant Professor of Mathematics at California State Polytechnic University, Pomona. His research is in the area of multi-parameter manifold valued semi-martingales and aspects of applied probability.

Neil Gretsky is an Associate Professor of Mathematics at the University of California, Riverside. He is recognized for his research in the geometry of Banach spaces and, more recently, his work in game-theoretic applications in economics.

Alan Krinik is a Professor of Mathematics at California State Polytechnic University, Pomona. He is noted for his work in lattice path combinatorics and its application to queueing theory and birth-death processes. He is also the co-editor of this volume.

Randy Swift is an Associate Professor of Mathematics at California State Polytechnic University, Pomona. He is well-known for his work in harmonizable processes, mathematical modeling and differential equations. He is also the co-editor of this volume.

Jerry Uhl is a Professor of Mathematics at the University of Illinois, Urbana-Champaign. He is known for his work in vector measures and Banach space theory. He is also noted for his work in calculus reform.
An Appreciation of My Teacher, M.M. Rao
I want to record my thoughts about M. M. Rao as a teacher. He was a really great teacher and his teaching continues to have a major impact on my career.

As a first year graduate student at Carnegie Tech, in 1963-64, I took Rao's year-long course on Functional Analysis. There were a lot of good students around Tech at that time; included in Rao's class were second year students Neil Gretsky and Jerry Uhl. Rao's ambitious style was to cover one major result in each lecture, or three per week. And all major theorems had descriptive names, some standard ("Dominated Convergence Theorem") and some not ("Law of the Unconscious Statistician"). The use of those names made the results easier to remember; I think Rao got this idea from Michel Loeve's book (from which I learned probability theory). Our text was by Angus E. Taylor, but we didn't use it much. Rao taught mostly out of Dunford & Schwartz (Vol. 1) and Hille & Phillips. His organization of the topics was excellent.

An unusually large amount of material was covered per class, so much so that details were often omitted (or, sometimes in our minds, incorrectly given). With great regularity Gretsky, Uhl and I would stay after class and work out the complete details of the arguments we had just seen. Sometimes we realized that Rao really had given all the details; after all, we were merely beginners and not yet well versed in mathematics. We always found that all of his results had correct versions, occasionally slightly different from what one of us thought when the discussion began. But by the end of the year, I had learned so much that, for the first time, I considered myself a mathematician.

Gretsky, Uhl and I were somehow teaching assistants to Rao, helping to teach one another. At the time I didn't give Rao credit for orchestrating this, but I think he did, at least to a substantial extent. He conveyed his love of mathematical depth and understanding and his passion for intense mathematical discussions.

I took many more grad courses from Rao prior to graduating in May 1967. They were all great courses, but none matched that extraordinary course in Functional Analysis. That course had a permanent influence on me, and for the rest of my life I will feel close and grateful to M.M. and to Neil and Jerry
as well.

Having gotten my BS at Tech in 1963 and anticipating my MS in 1964, I decided to apply elsewhere (in the fall of 1963) to do my Ph.D. away from my birthplace, Pittsburgh. But my wife had a teaching job in Pittsburgh and her applications elsewhere were unsuccessful. So, despite fellowship offers from more prestigious institutions, I was happy to stay at Tech because I knew (from the Functional Analysis course) that a great thesis advisor was available. Gretsky and Uhl were already doing research reading under Rao, and in the spring of 1964 I told Rao I wished to work with him in Functional Analysis (as Neil and Jerry were doing). He said he would be glad to be my advisor, but he had a problem in probability theory for me. I protested, saying I didn't know any probability theory. He pointed out that I had taken a year-long junior level probability course from Morrie DeGroot, an excellent teacher and probabilist. (Of course, he was right, but I was mystified, being so in love with Functional Analysis.) But, as my main focus was to work under M.M., I said OK.

The first paper he gave me to read was by Dynkin, and it defined a Markov process as a 21-tuple (or something like that). Numbers larger than my combined fingers and toes made me nervous, and I was unable to read Dynkin's paper. Rao suggested I try Loeve's book and work a lot of the problems. This was a great suggestion, and Rao helped me a lot when I got stuck. And, happily, 6 or 8 months later I was able to read and understand Dynkin's paper (which was indeed a toughie).

M.M. ran great seminars and, among other things, forced his students to present papers they read and their own work. His ferocious but kind questioning taught us never to give a seminar less than fully prepared. And he taught us to work together and learn from one another. This is a very important point which was evident, but I didn't realize it at the time.

Rao's teaching and advising styles were shamelessly adopted by me in my capacity as a teacher and advisor. I have had over 20 Ph.D. students ("children") and at least 8 "grandchildren", most of whom never met Rao and probably are largely unaware of the major hidden role he played in their education. I love to reread occasionally the article Rao wrote in the Raofest volume, celebrating his 65th birthday.

Rao did something special and unusual; he gave his best research problems to his students. I have tried to follow his lead, and I believe our profession would be better off if more thesis advisors did the same thing. Rao was uncompromising in his high standards, but he was gentle and helpful. Not all of his Ph.D. students had the native brilliance of Gretsky or Uhl, but all of them (that I know) wrote excellent theses. Rao got his students to live up to their potential. I think that is the highest praise one can give to a teacher.

Rao was also an excellent researcher. As a departmental citizen he was pretty feisty. He objected to (mathematical) political issues taking
precedence over issues of quality and scholarship. Doing the "right thing" is not always the way to maximize one's popularity. But M.M. never hesitated to stand up and fight for his principles.

I suppose I should tell one anecdote. The enormous length of Rao's first two names reminds me of Dynkin's definition of a Markov process (which took me 6 or 8 months to understand). So once I asked Dick Moore what Rao's nickname was. Dick's response: "He doesn't have one. People call him M.M. But he should have one." Dick thought about it and hit on the nickname Mmmmmmmmm. But it never stuck.

I feel much affection for M.M. I always respected and admired him, though there were moments during my grad student days when the term "affection" did not characterize my feelings toward him. But I was young, brash and impatient; some things I could figure out very quickly and some not. Was I lucky to have M.M. as my principal teacher and mentor? Absolutely yes. Could I have done better either at Tech or elsewhere? I don't think so. Rao shaped my passion for mathematics, my desire to understand things fully, my standards, and my teaching and advising techniques. I owe him so much, more than I can usually imagine.

Thank you, M.M., for being such an inspiration and such a friend.

J.A. Goldstein
1001 Words About Rao
My first contact with M.M. Rao was in the fall quarter of 1989 in an Advanced Calculus class. Before this course, like a typical undergraduate, I inquired of other students about him. Most of my information came from the graduate students at UCR, since they were the ones who had taken courses from him. The graduate students generally regarded him as a hard, but fair, teacher. This positive tone, however, was laced with an undertone, not unlike the sort one would receive about a blind date who in all other respects was perfect, except for some peculiar habit.

It required only one lecture to discover the peculiarity of M.M. Rao. He is so absorbed in mathematics that where the man ends and the math begins is blurred to the point that the separation of the two is unimaginable. His lectures are wonderful. The students of Rao have coined the phrase "Rao Math" for the rather distinctive style he has when presenting mathematics. He carefully prepares all his lectures, often writing them out in their entirety before the beginning of the quarter. An appropriate motto he has given is "If we do this for the general case, the rest will follow as corollaries." One need only read one of his books to see the verity of this motto. A good example is his text for Real Analysis, Measure Theory and Integration. He often began lecturing immediately upon entering the room and always went over the allotted time, leaving the next class waiting at the door. On more than one occasion, he was still writing as he walked out the door! These peculiarities are symptoms of his strength, a single-minded dedication to his profession coupled with a deep interest and curiosity in the subject. In M.M. Rao, I met someone who hit the 35th level of Math™, a true math guru.

To be fair, not everyone prospered under Rao. The lack of concrete examples was the typical student complaint about M.M. Rao's instruction. I guess M.M. Rao had been getting some grief about not being concrete enough, for during one class he declared, "I am an applied mathematician! I apply this theorem to prove that one!" This is a typical Raoism.

The beauty of mathematics as presented by him seduced me. I know that I am not the only one to experience this, and like others I started taking more courses from Rao after my first introduction to him in advanced calculus. I
began learning more analysis and in particular probability as a consequence. A tremendous benefit to my education was the open door I always found at his office; the many conversations I had with Rao about mathematics and his research have enriched my life. As a student, I never felt belittled or talked down to by Rao, even when discussing his research. While I was seeking an advisor for my Ph.D. thesis, some of the other professors cautioned me about Rao, concerning his ability to win the best students. A reputation well deserved. I mulled over several individuals, all very capable mathematicians, but the accessibility of M.M. Rao won me over, even though my first interests were in algebra and topology. His work ethic was intimidating. Sleeping four or fewer hours per night and working on mathematics most of the day, seven days a week, he labored with "a devil on his back" to complete his projects. He only took a half-day off on Sunday. He once said, "Mathematics is a harsh mistress. Either you love her, or she will leave you." My thesis topic was to extend stochastic integration to multi-parameter manifold valued semi-martingales using the generalized Bochner boundedness principle. The students of Rao have termed his thesis topics "topics in the clouds". A few of the completed theses are "Orlicz spaces of additive set functions and set martingales", "Integral representations of chains and vector measures" and "Multi-parameter semi-martingales integrals and boundedness principles." The last being my own, coming up short on the manifold part. The theses that Rao has guided tend to be on the long side; my own was 138 pages, not the longest. This propensity to generalize has worked well for M.M. Rao. Take for instance what Rao has done with ideas from S. Bochner. In 1956, Bochner wrote "Stationarity, boundedness, almost periodicity of random valued functions" in the Proceedings of the Third Berkeley Symposium. In this paper Bochner defined V-bounded processes and noted that these processes were an extension of Loève's harmonizable processes. Rao's idea was to define two classes of processes, the V-bounded ones being called weakly harmonizable processes, which include the processes of Loève, now called strongly harmonizable. This definition opened up a whole new area of research in harmonizability that is still actively pursued. Another idea Rao gleaned from this paper is to define stochastic integration via a boundedness principle. His generalized Bochner boundedness principle provides a unified approach to stochastic integration, including all known stochastic integrals under one umbrella. This principle would still be unknown if M.M. Rao had not pursued mathematics in his own distinctive manner, for the Young functions from Orlicz space theory were necessary for the result. Rao met with Bochner three times. Bochner must have been impressed, since he communicated three of Rao's papers to the National Academy of Sciences. Rao still has not been elected a member of the Academy.
Rao still lies dear in my heart as he does in the hearts of many others who have come across his path. M.M. Rao asked me to not compare him as an equal to Bochner, his modesty is showing, but in my eyes, he is a great mathematician and a great man. He still shows his faith in me and has many expectations for my work, encouraging me to continue my labors. My wish is that he finds a satisfaction in his life and work that brings him peace. I look forward to the years that come to see what new worlds he will open in mathematics. M.L. Green
A Guide to Life, Mathematical and Otherwise
When I went to graduate school in the early '60s I started in the Systems & Communications Sciences interdisciplinary program at Carnegie Tech. I knew that I wanted to study and work in Numerical Analysis and Computing. In my second year I decided to take a Functional Analysis course because I had some half-formed idea that this would be a valuable tool for Numerical Analysis. I did not in any way anticipate the ensuing life-changing event of meeting M.M. Rao. The course became an almost-religious epiphany for me: this was truly the way, the truth, and the light! M.M.'s lectures were magnificent; the material was spell-binding; the problem sets were really challenging. Several of my fellow students felt the same way - especially Jerry Uhl and Jerry Goldstein. We took more courses and seminars with M.M. and we chose to write our theses under his direction. The three of us spent a lot of time challenging each other and guiding each other under M.M.'s firm but insightful hand. In the last vestiges of the medieval guild system, we apprenticed ourselves to a guild master - a true master. There was certainly a deep love for Mathematics and a lifelong friendship and bond that we developed together under M.M.'s direction, but there was much more to M.M.'s influence. M.M. had a deep concern for, and loyalty to, his students. No matter how busy he was, he always had time and energy for us in all aspects of our development. When my wife left me in the final year of my thesis work, M.M. was there to counsel and comfort me. Unbeknownst to me at the time, he had also spoken with my wife to see if there were any possible solution. When I mistakenly thought that one of my thesis results was contained in an earlier paper, he brought me out of my depression and led me to see the positive differences in the work. When I succumbed to procrastination and other earthly temptations, he was there to inspire me with his example. He was never accusatory or condemning, just exemplary and inspiring. When a new department chair took a personal dislike to me, M.M. was there to defend his student. Of course, this was the same M.M. who liked to put an occasional (unannounced) unsolved problem on his takehome exams in the advanced topics courses. When I received a job offer from the Mathematics Department at UCR, 11
he told me that it was a great opportunity because Howard Tucker was there, and he repeatedly advised me that "... you will really like Professor Tucker." This stuck in my mind so deeply that when I finally met Howard and he told me to call him Howard, my natural response was "Yes sir, Professor Tucker". A few years later, my new department wanted to recruit a senior person in Functional Analysis and Probability. M.M. was not in the market for a new job, but I knew that he was not happy with his department chair. Our department managed to interview him twice and convinced him to come. So we wound up in the same department for thirty years. Once again he led me to learn life's great lessons. At first I needed to assert my independence from him. That must have been painful for him, but he never showed it. Then I needed to again succumb to procrastination and earthly temptations. Once again, he was, over the space of many years, non-accusatory and supportive. Coming back into the fold, I started to drift into areas of applications of Functional Analysis and Measure Theory. He renamed our continuing Functional Analysis Seminar the Functional Analysis and Related Topics Seminar. It has been a very large feature of my life as well as a remarkable pleasure and privilege to be his student, his colleague, and his friend. N.E. Gretsky
Rao and the early Riverside years
M.M. Rao first came to the University of California, Riverside Mathematics Department from Carnegie-Mellon in 1973. There was much excitement and anticipation of his coming by both new colleagues and graduate students in the UCR Math Department. Neil Gretsky (a former Rao graduate student from Carnegie-Mellon) was already on the faculty at UCR. Neil and others had alerted UCR graduate students of M.M. Rao's prominence in probability and functional analysis. M.M. Rao was a welcome addition: a talented research mathematician who was receptive to graduate students. This enhanced an already formidable mathematics department that had F. Burton Jones in topology, Richard Block in algebra and Victor Shapiro in differential equations among other notable faculty members. As a new graduate student at UCR (coming from UCLA) in 1973, I knew very little of the anticipation surrounding M.M. Rao's first academic year at UCR. However, I quickly became familiar with Rao's teaching style as I took his inaugural graduate sequence in real analysis at UCR: Math 209, 210, 211, starting in September 1973. The course was taught at a high level of abstraction. The first quarter was measure theory developed on general sigma rings using an outer measure approach restricted down to measurable sets via the Caratheodory construction. The second quarter contained the major results of general integration theory. The third quarter included an introduction to Choquet's capacity theory. There was no specific textbook for the course but rather a list of several recommended texts. The course was carefully and clearly presented by M.M. Rao, a man in his early forties (originally from India) with a lively personality, who wore a suit to class. I tried to take careful notes and absorb the material since I knew a comprehensive qualifying exam on real analysis based on this course was waiting for me at the end of the academic year. However, the material was not easy for me. I passed the qualifying exam but considered myself fortunate. As for this introduction to Rao, I found him an animated professor completely engaged in the subject of real analysis. He developed the theory from a modern abstract viewpoint but was concerned about the history of the subject and was careful to credit various mathematicians as we proceeded (Lebesgue,
Caratheodory, Vitali, Saks, Fubini, Egoroff, Choquet, etc.). Several very talented UCR graduate students, including Stephen Noltie and Michael Brennan, were seemingly planning to work with Rao even before he arrived at UCR. By the time I asked Professor Rao to be my advisor in 1975-76, I was his sixth Ph.D. student at UCR. I was grateful he agreed to take me as his student. From the beginning, Rao had the reputation of being more demanding than most other professors at UCR. Rao would oversee your progress but he would not help you in the writing of your thesis. Rao expected his students to be prepared in many different areas of mathematics. As a consequence, Rao students routinely took additional coursework past the qualifiers. For example, I took graduate sequences in functional analysis, advanced statistical inference and probability theory after my qualifying exams. The idea was to be prepared to solve our dissertation problem from a variety of different possible perspectives. Another important aspect of being a Rao graduate student in the 1970's was an ongoing quarterly seminar on functional analysis or stochastic processes. This seminar (which still meets) consisted of Rao, his students and any other interested parties. Everyone attending talked sooner or later. When the discussion became very specialized, the seminar often reduced to Rao and his graduate students. For me, I recall having to present material that originated from a seminal 1969 article written by D.W. Stroock and S.R.S. Varadhan on solutions of diffusion processes in d dimensions using the martingale problem approach. I vividly recall preparing this challenging material for what seemed like an endless number of consecutive weeks. It was stressful but very helpful in forcing me to understand this paper, which I eventually generalized into my dissertation. Understanding came slowly (and in phases). I learned how to present material when there were holes or unresolved problem areas and how to talk around topics until I was able to make complete arguments. The whole experience also brought the Rao students together in a common misery and made me appreciative of the mathematical abilities of my fellow grad student, Michael Brennan, who kindly helped me understand the more incomprehensible parts of this paper. This seminar experience was a common learning experience for Rao students in the 1970's. It is an activity that I still do today, on a modified basis, with my own graduate students. At UCR, M.M. Rao was primarily known among graduate students as a consummate researcher in mathematics, a man whose research interests connected functional analysis and integration theory with probability theory and stochastic processes. He was also recognized as an engaging professor who attracted some of the stronger graduate students to work with him and take a wide range of graduate classes. From a work ethic point of view, no one worked harder than Rao. In the 70's, Rao occupied the (eastward) end office of a string of about twelve windowed offices on the third floor of Sproul
Hall that faced south overlooking University drive. Any passer-by, looking up at these offices in the evening would customarily see only two or three lights on after dark. Sometimes if the hour was late only one light burned. M.M. Rao's window was almost always lit. He was up there doing research, reviewing articles and in the 70's writing his first book. His colleagues and graduate students knew he was there. They also knew that he would be back in his office at least one day over the weekend as well. Rao displayed a commitment to his profession that was hard to match. From a graduate student's point of view, no one could complain that Rao was inaccessible or difficult to find. M.M. Rao of the 1970's was a confident, forceful and demanding advisor. As an outstanding mathematician, Rao had expectations or intuition of how the solution of mathematical problems should turn out. Whenever graduate student progress did not fit his long range view, he expected to be consulted or convinced as to why these mathematical objectives were not possible. He also expected graduate students to make a dedicated effort and work hard. Finally, Rao expected his stronger graduate students to make significant contributions by doing future mathematical research. After all, Rao himself lived according to these standards. These expectations sometimes caused tensions between Rao and his students. For example, graduate students, myself included, would at times "disappear" for weeks or even months. When this happened there could be many possible reasonable explanations (and some unreasonable explanations as well)-including outside life factors effecting the unreal graduate school existence. Sometimes, a graduate student just rather "lay low" while trying to achieve progress rather than share their "failed attempts" at solving a problem. I can remember Rao asking "Where is ?-I haven't seen him in weeks!" These incidents had both good and bad consequences. Rao students developed an independence and self reliance in doing mathematical research and also provided more opportunities for Rao students to bond together. Rao stories, like war stories, were swapped over lunch or over a few beers. Sometimes even an old Rao story from the legendary days of Gretsky, Uhl and Goldstein would be recycled when pertinent. In the end, Rao's forceful personality and expectations played differently among his graduate students (some of whom also had strong personalities and different goals). M.M. Rao views his graduate students as one big family. Certainly, there are many of his former students who have flourished in mathematical careers engaged in many of the same aspects of the profession that have occupied Rao. There are also former, highly capable, graduate students who presently have little interest in mathematical research and have chosen exciting alternative career paths. M.M. Rao is interested and always enjoys hearing (and talking at length about) how each of his graduate students is doing. However, make no mistake about it, Rao is a true believer. M.M. Rao's career
in mathematics is distinguished by his talent, passion and energy in doing mathematics. There has never been any doubt in his mind that (if one has the ability) being a research mathematician is the best way to go. I think that even today, Rao would not understand how a graduate student in mathematics with outstanding potential in research could choose to do anything else. It is also very difficult to imagine M.M. not being engaged in mathematical activities. Rao is a lifer. Currently, at age 74, he is going strong. Rao is busy writing books, with plans for additional books in the future. M.M. Rao did a wonderful job of protecting and promoting his graduate students. He was influential and resourceful in securing teaching assistantships and research assistantships to support his graduate students throughout graduate school. During the 1970's, Rao was preparing his first book, Stochastic Processes and Integration. I, along with several other grad students, worked as a research assistant, proof-reading this monograph. Professor Rao was very receptive to student reaction to his writing. At first, I was hesitant to mention where I had difficulty in understanding his text, but it became very clear that he was sincerely interested in both my mathematical and stylistic comments. Rao, in discussion, would often tell me of the history of various portions of the text and what different mathematicians had contributed. These were good times for me. I was seeing mathematics from an insider's perspective. Sometimes, Rao would go off describing some current mathematicians. For example, he knew I was studying a paper of the Russian mathematician Daletskii, and Rao would tell me of his personal meeting with Daletskii on a visit to Russia and how nice a man he was, even presenting M.M. a bottle of vodka (or cognac) as a gift. Rao still had the gift somewhere in his office. These exchanges were memorable. The academic environment and spirit for faculty and graduate students in the UCR Mathematics Department of the 1970's was very good. The Department was a friendly place and a good place to study mathematics. Al Stralka chaired the Department. I recall colloquia given by Erdos, Halmos, Bing, Stein and Uhl. I recall the excitement of the four color problem being solved at that time. There was an entertaining talk on properties of Fibonacci numbers as well. The colloquia were preceded by a reception that usually included cookies, a sure way to attract graduate students. For at least two years, the math graduate students participated in intramural basketball games. Our team names of "Zorn's Dilemma" and "The Hardy-Haar Measure" accurately reflected our team's abilities. We had measure 1 of going the whole season without a victory. Except there was one anomalous game where we actually nipped the lowly and equally winless Physics team on a last second miracle shot, which demonstrated once and for all that events of measure 0 can indeed happen! We had fun with basketball but actually looked forward to the pizza and beer get-togethers after the game more than the game itself.
During the mid 1970's, Rao students were united by the knowledge that we were committed to a challenging route working under M.M. Rao and hopeful of his influence to secure us academic employment at a notoriously difficult time period for new Ph.D.'s to find jobs as professors in mathematics. We were also united by having taken an unusually large number of courses from Rao. The following pet phrases (and situations) were often repeated (or experienced) many times in class and today serve to help us recapture, with affection, his unique personality and style: "We make the following definition with complete 'malice of forethought'." "Did you think it was going to be easy? No! That is why his name is on the theorem." "Be careful whenever you see that word 'consider' for what follows is a new idea..." "From there he went on to develop (pronounced 'devil-up') the whole theory..." "You ask me if I can change the order of integration? I DID IT!" "That's the one, that's the condition you need..." "You work and work and work and that is what comes out..." "Now we have proved the Dynkin-Doob Lemma which is also used by statisticians who have no idea why it is true, so we call this result the Theorem of the 'Unconscious Statistician'..." "If you wish to avoid making any mistakes, do nothing at all and that, of course, would be the biggest mistake of all..." "What a loss...that is the death of his career as research mathematician", (Rao's reaction upon hearing a local mathematics professor had become dean.) "You can take this book and throw it in the ditch..." Many times Rao would smile and laugh as he repeated these sayings in different settings. Occasionally, M.M. would re-tell a joke or funny story and break out laughing aloud before reaching his own punch-line. And finally the signature (literally and figuratively) of most Rao chalk
talks was the amazing amount of mathematical material he was able to cram into the lower right hand corner of the board as class time expired. His writing became a space-filling curve as he adjusted by writing smaller and smaller, working several minutes past when the class was scheduled to stop, leaving students dazed and hopelessly trying to decipher his final scribbling. Rao could push and posture. During my last year in graduate school Rao had monitored all my dissertation work. I had passed my oral exams and was in the process of writing up my final results. We were 3 months away from being done. He looked at my folder of dissertation results and then back at me and announced, "It's not enough". I felt my heart sink and had nothing to say. I went home wondering what more I could do. There was no more, but he was still seeing if he could squeeze some new results out of me. I did not like the pressure but I understood his intent. It soon became clear to him (if it wasn't already) that there was nothing else to do on my problem. It never came up again and I finished my Ph.D. as originally scheduled three months later. In 1978, Rao still had three of his six graduate students (Brennan, Kelsh and Krinik) anxious to get out. Rao was leaving on a sabbatical (I believe to France) starting Fall 1978, and the realization dawned on us that Rao's sabbatical was our best chance of finally finishing up. Otherwise, we would have to postpone our graduation until Rao's return to UCR a year later and, of course, no one wanted to wait. In a furious finish, we all made it. I was the last of the three to finish and remember happily driving M.M. to the airport. After graduation, my relationship and appreciation of M.M. Rao grew and matured. As a graduate student, I was always appreciative of his financial support for all his students and his academic support for me in particular. After graduation, Rao became a major player in my career. He was always in my corner, helping me. From key letters of recommendation to help me secure positions at JPL, University of Nevada, Reno and Cal Poly Pomona, to advising me where to try to publish my results, to being supportive when my efforts were not always successful, to providing me with opportunities to resume research activities and to finally just being there as a good friend. His encouragement and assistance in developing my professional activities has been and remains a constant. In 1985, I invited M.M. to give a colloquium at Cal Poly Pomona. M.M. did his usual super job and in the audience sat a talented graduating senior who would not forget the talk nor the speaker. That senior was Randy Swift, who eventually went on to earn his Ph.D. under M.M. Rao and who today is a good friend and valuable colleague at Cal Poly Pomona. Randy is also the real editor of this volume which we both affectionately dedicate to M.M., our mutual mentor. In tribute to M.M. Rao's stellar career, Randy compiled this volume of research articles.
The eighteen Rao students share a special bond and understanding of what it means to earn your doctorate degree in mathematics under the direction of M.M. This collection of essays and articles in honor of M.M. illustrates this bond crosses four decades and bridges his Carnegie-Mellon University students of the 60's with the University of California, Riverside students of the 70's , 80's and 90's. It's been a pleasure to have the opportunity to celebrate M.M. Rao's many contributions and to be "one of Rao's students". Alan Krinik
On M.M. Rao
I first met Professor M.M. Rao in 1985, when I was an undergraduate attending California State Polytechnic University, Pomona. Alan Krinik had invited M.M. to give a colloquium talk in the department. At the time, I was a senior math major and one of the department's promising students. I had attended departmental colloquium talks, but never had I been exposed to a mathematician of the caliber of M.M. Rao. His talk began in a very elementary fashion, but the breadth and depth of the mathematics it spanned greatly impressed me. I was struck by the passion for mathematics that he displayed. I had not been in the presence of someone totally devoted to his discipline. After I completed my Masters degree, I worked for a while in the aerospace industry. I found myself desiring to pursue a PhD. My interest in probability theory and my strong recollection of M.M. led me to apply to the University of California, Riverside. As fate would have it, and in my good fortune, I took M.M. for a graduate course in Probability, his lectures were absolutely beautiful. Spanning the subject with depth and presented with crystal clarity. Of course, he used his text Probability Theory with Applications, perhaps the finest graduate text written on the subject. This course, and indeed, this text, set the tone for what working with M.M. would be like. M.M. believes that homework should challenge his students. During his courses, he assigns a problem or two per week. These problems are not routine homework problems, rather they are problems from the research literature. They are not mere exercises. Indeed, his students spend vast amounts of time working on them. To this end, he is preparing his students for research. Many of these problems aid his students in their future works. M. M. greatly respects effort. If he sees that a student is working, he will guide the student gently down the appropriate path. He has an incredible memory for details, often if a student was stuck on an idea, he would say, go see this page of a particular paper or text. On that page, you would find what you needed to get going again on the problem. From these interactions, 21
M.M. seems to gauge a student's ability. I became a student of his after I had completed the course in probability theory, a seminar on random fields and a course in stochastic processes. I never asked him to be my advisor; rather, it seemed to be a natural evolution. The first problem that he asked me to study involved the sample path behavior of harmonizable processes. I spent a large part of that first summer developing my facility with these processes. By the end of the summer, I obtained my first minor result; it was on the analyticity of the sample paths. However, the goal was to consider the almost periodic behavior of these sample paths and I was stuck. I toiled in vain for the next few months on this problem. One day, late in November, I went to talk with M.M. about the problem, he listened intently and then said, "let it rest there, for now, I would like you to look at this calculation I have been working upon with harmonizable isotropic random fields." He had obtained a representation for the covariance that involved some rather complicated special functions. He asked me to see what I could do with it, in particular, could it be made to look similar to the representation obtained by Bochner for the stationary isotropic case. I told him that I would try, and he said "There is no try, there is only do, and I know that you can." By Sunday afternoon of that weekend, I had simplified the expression and had showed that it reduced to Bochner's representation in a very natural way. That Monday, I gave him the result. He, in a very delighted manner, then said to me, "See if you can push it. Look at Yadrenko's book and use this representation to extend his results." This began a glorious 3-month stretch of research production. I obtained several major results for harmonizable isotropic random fields. Riding the tide of this success, he said to me "And what of the almost periodicity?" With the confidence I had obtained, I went back to the problem. Within a month or so, I had obtained the results I had long sought. This experience gave me great confidence in my ability to do research. It also gave me a very broad research program to pursue. The confidence that M.M. showed in my abilities as a mathematician remains with me today. It has allowed me to flourish. Many years later, M.M. told me that the first string of results I had obtained after I had obtained the representation was likely enough for the PhD. However, he saw that I was on track and he was going to have me do as much as I could. This story is very typical of the relationships he has with his students. He works them very hard, always encouraging, and yet unyielding in his
determination that they do their very best work. In this setting, many of his students have become mathematicians completely devoted to the discipline. Whether this devotion is shown through excellence in teaching, or excellence in research, or both, for each of us, it is likely attributable to the role M.M. played, and continues to play, in our lives. It is no wonder then, that for some of us, M.M. holds a place in our hearts, and we remain devoted to him as former students and now colleagues. R.J. Swift
Reflections on M.M. Rao
Jerry Goldstein, Neil Gretsky, Bill Kraynek and I all fell under the charms and passions of M.M. at Carnegie Tech at the same time during the mid-1960's. It was a glorious time for us and it was a glorious time for math at Carnegie. We had close access to mathematicians such as Dick Duffin, Nehari (who claimed not to have a first name), Alan Perlis, Dick MacCamy, Roger Pederson, Dick Moore, Vic Mizel, Charlie Coffman and of course M.M. Not only did we study under these fellows but we also socialized with them too. Some drank beer with us, some drank scotch with us and they all drank coffee with us. They welcomed us into a flourishing community of math research. They made us feel like mathematicians. We all owe so much to them. With all these mathematicians (most of whom were in their prime) available to us, why did Goldstein, Gretsky and I choose to work under M.M.? I believe the reason was the sheer passion with which M.M. taught his courses. I'll never forget M.M.'s closing lecture on the Bochner and Pettis integrals. Looking directly in our eyes, he said, "And now you know more about the Bochner and Pettis integrals than anyone in the department." Then he made a dramatic exit. This lecture must have grabbed me in a big way because I spent the next thirty years on research matters centering around the Bochner and Pettis integrals. There was another reason we chose M.M. The word was out that M.M. really cared about his students. We found this to be true in spades. M.M.'s door was always open and he devoted unlimited time to us. When we had personal problems, M.M. always tried to help. When Gretsky didn't show up for a week, M.M. took to the math department hallways asking everyone, "Where is Gretsky? Where is Gretsky?" Another time when I was finishing my thesis, M.M. was hospitalized for an eye condition under doctor's orders to rest his eyes. But when I visited him in the hospital, there he was going over my thesis in detail. M.M. did not baby us and certainly did not do our work for us. Somehow he was able to extract good work from us. One of his favorite techniques was to ask why we were stuck on a point. Quite often, we later found how to get around the apparent obstacle. All of our theses contained important new
work and opened up wide opportunities for future research. I thank M.M. for preparing me so well to do what I love to do. There was a lighter side to M.M. One day department chair Ignace Kolodner ordered all advanced grad students to attend a seminar meeting Wednesdays at 4:00. Gretsky exclaimed: "But our intramural team plays Wednesdays at 4:00!" Kolodner said: "What's more important: Sports or Mathematics? Professor Rao, what do you say?" With a completely straight face, M.M. said, "I'm on the team; but I don't play." The seminar was rescheduled. Just after I took my final Ph.D. oral, fellow grad students threw a surprise party for me outside Pittsburgh in a bar near Harmarville. M.M. came in Roger Pederson's car and made it clear he was delighted to be there. I went over to him and asked him what he wanted to drink. In true M.M. form, he said, "Ginger ale." I went to the bar and ordered a double bourbon and ginger ale highball. He took the glass, took a swallow and remarked, "This is very good ginger ale." Needless to say, M.M. and the rest of us had a very good time that evening. The next day, M.M. said, "I don't understand. On the way home I became dizzy and nauseous, so much so that I had to ask Roger to stop the car for a while." To this day I don't know whether M.M. knew what he had been drinking that night. I prefer to believe he did. In my long career, I have never met a Ph.D. advisor who was as respected and as loved by his Ph.D. students as M.M. I hope the reasons are now clear. Jerry Uhl
Stochastic Analysis and Function Spaces
M. M. Rao
Department of Mathematics, University of California, Riverside, CA 92521
Abstract In this paper some interesting and nontrivial relations between certain key areas of stochastic processes and some classical and other function spaces connected with exponential Orlicz spaces are shown. The intimate relationship between these two areas, and several resulting problems for investigation in both areas, are pointed out. The connection between the theory of large deviations and exponential as well as vector Orlicz, Fenchel-Orlicz, and Besov-Orlicz spaces is presented. These lead to new problems for solution. Relations between certain Hölder spaces and the range of stochastic flows, as well as stochastic Sobolev spaces for SPDEs, are also pointed out.
1. Introduction

To motivate the problem, consider a real random variable $X$ on $(\Omega, \Sigma, P)$, a probability triple, with its Laplace transform $M_X(\cdot)$, or its moment generating function, existing, so that
$$ M_X(t) = \int_\Omega e^{tX}\, dP, \qquad t \in \mathbb{R}, \tag{1} $$
is finite. Since $M_X(t) > 0$, consider its (natural) logarithm, also called the cumulant (or semi-invariant) function, $\Lambda : t \mapsto \log M_X(t)$. Then $\Lambda(0) = 0$ and $\Lambda$ has the remarkable property that it is convex. In fact, if $0 < \alpha = 1 - \beta < 1$, then one has
$$ \Lambda(\alpha s + \beta t) = \log\Big(\int_\Omega e^{(\alpha s + \beta t)X}\, dP\Big) \le \log\Big(\Big(\int_\Omega e^{sX}\, dP\Big)^{\alpha}\Big(\int_\Omega e^{tX}\, dP\Big)^{\beta}\Big) = \alpha\Lambda(s) + \beta\Lambda(t), \tag{2} $$
by Hölder's inequality.
So as $t \uparrow \infty$, $0 = \Lambda(0) \le \Lambda(t) \uparrow \infty$, and the convexity of $\Lambda(\cdot)$ plays a fundamental role in connecting the probabilistic behavior of $X$ and the continuity properties of $\Lambda$. First let us note that, by the well-known integral representation, one has
$$ \Lambda(t) = \Lambda(a) + \int_a^t \Lambda'(u)\, du, \tag{3} $$
where $\Lambda'(\cdot)$ is the left derivative of $\Lambda$, which exists everywhere and is nondecreasing. Taking $a = 0$, consider the (generalized) inverse of $\Lambda'$, say $\tilde\Lambda'$. It is given by $\tilde\Lambda' : t \mapsto \inf\{u > a = 0 : \Lambda'(u) > t\}$, which, if $\Lambda'$ is strictly increasing, is the usual inverse function $\tilde\Lambda' = (\Lambda')^{-1}$. Then $\tilde\Lambda'$ is also nondecreasing and left continuous. Let $\tilde\Lambda$ be its indefinite integral:
$$ \tilde\Lambda(t) = \int_0^t \tilde\Lambda'(v)\, dv. \tag{4} $$
A problem of fundamental importance in Probability Theory is the rate of convergence in a limit theorem, for its application in practical situations. It will be very desirable if the decay to the limit is exponentially fast. The class of problems for which this occurs constitutes a central part of large deviation analysis. Its relation with Orlicz spaces and related function spaces is of interest here. Let us illustrate this with a simple, but nontrivial, problem which also serves as a suitable motivation for the subject to follow. Consider a sequence of independent random variables $X_1, X_2, \ldots$ on a probability space $(\Omega, \Sigma, P)$ with a common distribution $F$ for which the Laplace transform (or the moment generating function) exists. Then the classical Kolmogorov law of large numbers states that the averages converge with probability 1 to their mean, i.e.,
$$ \frac{1}{n}\sum_{i=1}^{n} X_i \to E(X_1) = m \qquad \text{(a.e.)} $$
Expressed differently, one has for each $\varepsilon > 0$, $h_n(\varepsilon) \to 0$ as $n \to \infty$ in:
$$ P(A_n) = P\Big[\Big|\frac{1}{n}\sum_{i=1}^{n} X_i - E(X_1)\Big| \ge \varepsilon\Big] = h_n(\varepsilon), \tag{5} $$
and later it was found that $h_n(\varepsilon) = e^{-n\tilde\Lambda(\varepsilon) + o(n)}$, with $\tilde\Lambda$ as the Legendre transform of $\Lambda$, the latter being the cumulant function of $F$, or $\Lambda(t) = \log M_F(t)$, $t \in \mathbb{R}$, and $\tilde\Lambda$ is given by
$$ \tilde\Lambda(t) = \sup\{st - \Lambda(s) : s \in \mathbb{R}\}. \tag{6} $$
The function $\tilde\Lambda$ defined differently by (4) and (6) can be shown to be the same, so that there is no conflict in notation. The following example illustrates this and leads to further work. [$\tilde\Lambda$ of (6) is also termed the complementary or conjugate function of $\Lambda$.] Let the $X_n$ above be Bernoulli variables, so that $P[X_n = 1] = p$ and $P[X_n = 0] = q\,(= 1 - p)$, $0 < p < 1$. Then the cumulant function $\Lambda$ is given by $\Lambda(t) = \log(q + pe^t)$, and hence $\Lambda'(t) = \frac{pe^t}{q + pe^t}$. One finds its complementary function to be, since $m = E(X_1) = p$ and $\tilde\Lambda(m) = 0$,
$$ \tilde\Lambda(t) = \int_m^t [\Lambda']^{-1}(u)\, du = \int_p^t \log\frac{u(1-p)}{p(1-u)}\, du, \qquad t > m. $$
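Carrying out this integration yields the familiar relative-entropy form $\tilde\Lambda(t) = t\log\frac{t}{p} + (1-t)\log\frac{1-t}{1-p}$ for $p \le t \le 1$; this closed form is stated here only for orientation, as it is a standard computation rather than part of the original text. The following short numerical sketch (with illustrative values of $p$ and $t$, not taken from the paper) checks that the Legendre transform (6), the integral representation (4), and the closed form agree.

import numpy as np

p, q = 0.3, 0.7            # illustrative Bernoulli parameter
t = 0.55                   # a point with t > m = p

# (6): Legendre transform sup_s { s*t - log(q + p*e^s) }, evaluated on a grid
s = np.linspace(-30.0, 30.0, 200001)
legendre = np.max(s * t - np.log(q + p * np.exp(s)))

# (4): integrate the inverse function of Lambda', i.e. log(u*q/(p*(1-u))), from m = p to t
u = np.linspace(p, t, 200001)
integral = np.trapz(np.log(u * q / (p * (1.0 - u))), u)

# closed (relative entropy) form
closed = t * np.log(t / p) + (1.0 - t) * np.log((1.0 - t) / (1.0 - p))

print(legendre, integral, closed)   # the three values agree closely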
cumulant generating function $\Lambda : t \mapsto \log E(e^{\langle t, X_1\rangle})$, $t \in \mathbb{R}^k$. Then the rate function $I$ ($= \tilde\Lambda$, the complementary function of $\Lambda$) is given by
$$ \tilde\Lambda(s) = I(s) = \sup\{\langle s, t\rangle - \Lambda(t) : t \in \mathbb{R}^k\} \ge 0, \tag{14} $$
$\langle\cdot,\cdot\rangle$ being the inner product of $\mathbb{R}^k$. Moreover $\tilde\Lambda$ is convex, continuous, and has its minimum at $a = E(X_1)$. A direct proof of this result was first sketched by Sanov (1957), and a full argument is considerably more involved than its one-dimensional counterpart. The classical study of Young's complementary function suggests that (3) may be considered as a definition of the conjugate function $\tilde\Lambda$. This was verified in a basic paper by Fenchel (1949) in $\mathbb{R}^k$, $k \ge 1$, but a careful reinterpretation of it gives a useful generalization as follows. Indeed, the complementary Young pair satisfies (2) symmetrically, in the sense that if $\tilde\Lambda$ is such a convex (continuous) function, with $\Lambda$ as its conjugate, then
$$ \Lambda(t) = \sup\{\langle s, t\rangle - \tilde\Lambda(s) : s \in \mathbb{R}^k\} = -\inf\{\tilde\Lambda(s) + \langle s, t\rangle : s \in \mathbb{R}^k\}, \tag{15} $$
since the mapping $s \mapsto -s$ is a homeomorphism of $\mathbb{R}^k$ onto itself. It can be shown that one always has $\tilde{\tilde\Lambda} = \Lambda$ if $\mathcal{X}$ is a reflexive separable Banach space, while the same is not true for a general (nonreflexive) $\mathcal{X}$. But this is true for $\mathcal{X} = \mathbb{R}^k$, since all finite dimensional Banach spaces are reflexive. Thus if $\Lambda : \mathcal{X} \to \mathbb{R}^+$ is a Young functional and $\tilde\Lambda$ is defined with $\mathbb{R}^k$ replaced by $\mathcal{X}$, using the duality mapping $\langle\cdot,\cdot\rangle : \mathcal{X} \times \mathcal{X}^* \to \mathbb{R}$, so that $\tilde\Lambda : \mathcal{X}^* \to \mathbb{R}^+$, one has $\tilde{\tilde\Lambda} : \mathcal{X}^{**} \to \mathbb{R}^+$. Hence $\tilde{\tilde\Lambda} = \Lambda|\mathcal{X}$ when $\mathcal{X}$ is identified as a subspace of $\mathcal{X}^{**}$ by the standard natural embedding. [For this extension procedure, see Levin (1975).] These relations can be generalized as follows. To begin, let us rewrite (14), using the fact that $\tilde{\tilde\Lambda} = \Lambda$ on $\mathbb{R}^k$, a finite dimensional Banach space. Thus one has
$$ \Lambda(t) = \log(E(e^{\langle t, X_1\rangle})) = -\inf\{\langle s, t\rangle + I(s) : s \in \mathcal{X}^* = \mathbb{R}^k\}, \qquad t \in \mathbb{R}^k. \tag{16} $$
Since the $X_n$ are independent and identically distributed, (13) and (16) give, on replacing $X$ by $Y_n = \frac{1}{n}\sum_{i=1}^{n} X_i$:
$$ \Lambda_n(t) = \log(E(e^{\langle t, Y_n\rangle})) = n \log(E(e^{\langle t/n, X_1\rangle})) = n\Lambda(t/n), $$
and (15) becomes
$$ \Lambda_n(t) = -\inf\{n\tilde\Lambda(s) + n\langle t/n, s\rangle : s \in \mathbb{R}^k\}. $$
Replacing $t$ by $nr$, this becomes, on cancelling the factor $n$:
$$ \frac{1}{n}\Lambda_n(nr) = -\inf\{\tilde\Lambda(s) + \langle s, r\rangle : s \in \mathbb{R}^k\}. $$
Here letting $n \to \infty$, the limit exists, and one has, on setting $h_t(\cdot) = \langle t, \cdot\rangle$, so that $h_t$ is a bounded linear function on $\mathbb{R}^k$,
$$ \lim_{n\to\infty} \frac{1}{n}\log(E(e^{n h_t(Y_n)})) = -\inf\{\tilde\Lambda(s) + h_t(s) : s \in \mathbb{R}^k\}. \tag{17} $$
This result admits an extension if $h : \mathcal{X} \to \mathbb{R}$ is not necessarily linear but simply bounded and continuous and $\mathbb{R}^k$ is replaced by a (separable) Banach space, when the left side limit is assumed to exist. This is also motivated by a classical evaluation, due to Laplace, of the integral by a Taylor series expansion of the smooth function $h : [0,1] \to \mathbb{R}$ to obtain
$$ \lim_{n\to\infty} \frac{1}{n}\log\Big(\int_0^1 e^{-n h(x)}\, dx\Big) = -\min_{x\in[0,1]} h(x). \tag{18} $$
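The following small numerical sketch illustrates Laplace's asymptotic (18) for one concrete choice of $h$; the function and the grid are illustrative assumptions, not taken from the text. The normalized logarithm of the integral approaches $-\min h$ as $n$ grows.

import numpy as np

h = lambda x: (x - 0.4) ** 2 + 0.1        # an illustrative smooth h with minimum 0.1 at x = 0.4
x = np.linspace(0.0, 1.0, 100001)
for n in (10, 100, 1000):
    val = np.log(np.trapz(np.exp(-n * h(x)), x)) / n
    print(n, val)                          # tends to -0.1 = -min h as n increases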
Combining (17) and (18), one gets the following generalization, proved directly by Varadhan (1966) for an arbitrary random sequence $\{X_n, n \ge 1\}$ with values in a complete separable metric (or a Polish) space $\mathcal{X}$, whose distributions (i.e., the image measures on $\mathcal{X}$) satisfy the large deviation principle in the sense of Definition 2 above. This generalizes the result of Theorem 2 above with cumulant function $\Lambda$ and its conjugate $\tilde\Lambda$ as the rate function. More precisely one has:

Theorem 4. Let $\{X_n, n \ge 1\}$ be a sequence of $\mathcal{X}$-valued random variables obeying the large deviation principle with an action functional $I : \mathcal{X} \to \mathbb{R}^+$, where $\mathcal{X}$ is a Polish space. Then for any $h \in C_b(\mathcal{X})$, the space of real bounded continuous functions on $\mathcal{X}$, equation (17) holds, i.e., one has:
$$ \lim_{n\to\infty} \frac{1}{n}\log(E(e^{-n h(X_n)})) = -\inf\{h(x) + I(x) : x \in \mathcal{X}\}. \tag{19} $$
Conversely, if (19) holds for any $h \in C_b(\mathcal{X})$, for an $\mathcal{X}$-valued sequence $\{X_n, n \ge 1\}$ of random variables, then the latter satisfies the large deviation principle for a unique action functional $I$ in the sense of Definition 2. If $\mathcal{X}^*$ is a separable adjoint Banach space, $C_b(\mathcal{X})$ is replaced by $\mathcal{X}^*$ in the above, and $\Lambda_t$ is the cumulant functional of $X_t$ for each $t \in J$, an index set,
then one can extend the above result to the case of a continuous parameter process or field. Let us illustrate this with the case of Brownian Motion (or BM), and then give an immediate extension to all centered Gaussian processes with continuous covariance functions. In these cases admitting infinite dimensional $\mathcal{X}$ is essential, as will now be seen. If $\varepsilon = \frac{1}{n} \downarrow 0$, then (17) becomes formally
$$ \lim_{\varepsilon\downarrow 0} \varepsilon \log(E(e^{h(Y_\varepsilon)/\varepsilon})) = -\inf\{\tilde\Lambda(s) + h(s) : s \in \mathbb{R}^k\}, $$
where $Y_\varepsilon$ is defined in terms of the $X_t$-process, so that $Y_\varepsilon \to 0$ in probability as $\varepsilon \downarrow 0$. Indeed, if $Y_\varepsilon(t) = \sqrt{\varepsilon}\, X(t)$ for a process $X(t)$ with zero mean and covariance $r$, then the Čebyšev inequality gives for each $\delta > 0$
$$ P[|Y_\varepsilon(t)| > \delta] \le \frac{\varepsilon}{\delta^2}\, E(|X(t)|^2) = \frac{\varepsilon\, r(t,t)}{\delta^2} \to 0, \tag{20} $$
as $\varepsilon \downarrow 0$ for each $t$. If $X_t$ is the BM, which has continuous paths, i.e., $X_{(\cdot)}(\omega) : [0,1] \to C_0([0,1]) = \mathcal{X}$, then the problem is to find the (exponential) rate of convergence $Y_\varepsilon \to 0$ (for all $t$) as $\varepsilon \downarrow 0$, and this means to find the rate or action functional related to the BM on the (Polish) space $C_0([0,1])$, the space of real continuous functions on $[0,1]$ vanishing at $0$, under the uniform norm. To get out of the $t$-points here one considers the probability measure determined by the entire process, and then the problem is solved from it, as follows. The canonical representation of a real process is obtained, using the classical Kolmogorov existence theorem, with $\Omega = \mathbb{R}^T$, $\Sigma =$ the cylinder $\sigma$-algebra of $\Omega$, and $P$ determined by the compatible family of all finite dimensional distributions. Then the process has the coordinate (or function space) representation $X_t(\omega) = \omega(t)$, $\omega \in \Omega$, $t \in T$, whose finite dimensional distributions are the given ones, i.e., $P[\omega : X_{t_i}(\omega) < x_i, i = 1,\ldots,n] = F_{t_1,\ldots,t_n}(x_1,\ldots,x_n)$, $x_i \in \mathbb{R}$, $t_i \in T$. The Fourier transform of the process related to $P$ is defined uniquely as:
$$ \hat P(x) = \int_\Omega e^{i(x, X)(\omega)}\, dP(\omega), \qquad x \in \mathcal{X}^*. $$
In the case that $P$ is a Gaussian measure on $\mathcal{X}$, this transform can be evaluated to obtain:
$$ \int_\Omega e^{i(x, X)(\omega)}\, dP(\omega) = e^{-\frac{1}{2} Q(x, x)}, $$
where $Q : \mathcal{X}^* \times \mathcal{X}^* \to \mathbb{R}^+$ is a positive definite bilinear form, $\mathcal{X}^*$ being the
adjoint space of the vector space $\mathcal{X}$, so that $Q(x,x) = V_x$, the variance, and $Q(x,y) = C_{x,y}$, the covariance.

Theorem 5. The family $\{Y_\varepsilon = \sqrt{\varepsilon}\, X_t, \varepsilon > 0\}$, where $\{X_t, t \in [0,1]\}$ is the BM, satisfies the large deviation principle with the action functional $\tilde\Lambda$ given on $\mathcal{X}$ by:
$$ \tilde\Lambda(f) = \frac{1}{2}\int_0^1 |f'(t)|^2\, dt, \tag{27} $$
for all absolutely continuous $f \in C_0([0,1])$ whose derivatives are square integrable, and $\tilde\Lambda(f) = \infty$ otherwise. Thus the domain of $\tilde\Lambda$ on which it is finite
is the subset of $C_0([0,1])$ given by $\mathcal{X}_0 = \{f \in \mathcal{X} : \int_0^1 |f'(t)|^2\, dt = \|f\|^2 < \infty\}$, which is a Hilbert space, a subspace of $\mathcal{X}$ but with a stronger norm topology. It will next be shown how this admits an extension to general Gaussian processes, with continuous covariance and mean functions. Thus let $X = \{X_t, t \in T = [a,b]\}$ be a centered Gaussian process with a continuous covariance function $r : T \times T \to \mathbb{R}$. Consider the integral operator $R$ with kernel $r$, so that
$$ (Rh)(t) = \int_T r(s,t)\, h(s)\, ds, \qquad h \in L^2(T, dt), \tag{28} $$
which is compact (even Hilbert-Schmidt) on $\mathcal{X}_0$ and is positive definite. Also $R$ is nonsingular and has a unique square root $R^{1/2}$ which has an inverse. Then the range space $M = R^{1/2}(L^2(T,dt))$ can be described precisely as follows. For each $f \in M \subset \Omega = \mathbb{R}^T$, consider a new process defined by $Y_t^f(\omega) = X_t(\omega) + f(t)$, whose probability measure $P_f$ has the property that $P_f \ll P$. Such an $f$ is called an admissible mean of the $X_t$-process. Then $M$ can be provided with a new inner product as follows. Since $f_i \in M \Rightarrow f_i = R^{1/2}h_i$, $i = 1, 2$, let $(f_1, f_2)_M = (Rh_1, h_2)_{L^2(T,dt)}$, the last symbol being the scalar product of $L^2(T,dt)$. Then $M$ is a Hilbert space (cf., e.g., Rao (1975) for a proof of this fact), and $R^{1/2}$ also has a kernel representation (cf. Dunford-Schwartz (1958), VI.9.59) as:
$$ (R^{1/2}h)(t) = \int_T G(s,t)\, h(s)\, ds, \qquad h \in L^2(T, dt), \tag{29} $$
so that the covariance $r$ can be expressed as:
$$ r(s,t) = \int_T G(s,u)\, G(t,u)\, du. \tag{30} $$
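As a concrete illustration of the factorization (30) (a check of my own, not from the text): for Brownian motion on $T = [0,1]$ one has $r(s,t) = \min(s,t)$, and the indicator kernel $G(s,u) = 1_{[0,s]}(u)$ gives such a factorization, since $\int_0^1 1_{[0,s]}(u)\,1_{[0,t]}(u)\, du = \min(s,t)$; this kernel is used only to illustrate (30), and need not coincide with the symmetric square root. The short sketch below verifies the identity numerically.

import numpy as np

u = np.linspace(0.0, 1.0, 200001)
G = lambda s: (u <= s).astype(float)      # indicator kernel 1_{[0,s]}(u)
for s, t in [(0.2, 0.7), (0.5, 0.5), (0.9, 0.3)]:
    lhs = min(s, t)                        # covariance of BM
    rhs = np.trapz(G(s) * G(t), u)         # right side of (30)
    print(s, t, lhs, rhs)                  # rhs is close to min(s, t)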
Next define a new process on the same canonical space: $\tilde X_t = \int_T G(s,t)\, dB_s$, where $\{B_s, s \in T\}$ is the standard BM on the same space. This being a linear transformation, $\{\tilde X_t, t \in T\}$ is also a centered Gaussian process having the same covariance function $r$ as the $X_t$-process. Since a Gaussian process is determined by its mean and covariance functions, these two processes can be identified, $\tilde X_t = X_t$, a.e., $t \in T$. Now $f = R^{1/2}h$, $h \in L^2(T,dt)$, and $R^{-1/2}$ exists, and so by Theorem 5 one has the following exact form of the conjugate function $\tilde\Lambda$ of $\Lambda$:
$$ \tilde\Lambda(f) = \frac{1}{2}\int_T |h(t)|^2\, dt, \qquad f = R^{1/2}h \in M = R^{1/2}(L^2(T,dt)), \tag{31} $$
and $\tilde\Lambda(f) = \infty$ otherwise. This may be summarized as:
Theorem 6. Let $\{X_t^\varepsilon = \sqrt{\varepsilon}\, X_t, t \in T, \varepsilon > 0\}$, where $\{X_t, t \in T\}$ is a centered Gaussian process with a continuous covariance function. Then the $X_t^\varepsilon$-process obeys the large deviation principle with the action or rate functional $\tilde\Lambda$ given by (31).

This was first established by Freidlin with a slightly different argument (cf. Freidlin and Wentzell (1998), Sec. 3.4). The point of these cases here is that the Fenchel-Young (or cumulant) functions $\Lambda$ are defined on infinite dimensional (separable) Hilbert or Banach spaces, and their action functionals $\tilde\Lambda$ are explicitly calculable. They are of primary interest in the large deviation problems. Because of this circumstance these spaces will be analyzed from the function space point of view in Section 4 below. It is also useful to discuss another class of fast growing Young functions to be included in that study, since it is not usually detailed in general abstract analysis.

3. Exponential Orlicz spaces and stochastic processes

The following familiar problem leads to a consideration of exponential Orlicz spaces, as path spaces of the associated process, whose structure provides a precise description of the growth of sample functions of interest in applications. Thus consider a Poisson process $\{X_t, t \ge 0\}$ used to describe telephone traffic, so that the process has independent and stationary increments. Then $X_0 = 0$, a.e., and for any time points $0 < t_1 < t_2$,
$$ P[X_{t_2} - X_{t_1} = k] = e^{-c(t_2 - t_1)}\,\frac{(c(t_2 - t_1))^k}{k!}, \qquad k = 0, 1, 2, \ldots. \tag{32} $$
Here $c > 0$ is a constant, called the intensity parameter. Let $Y$ be a symmetric Bernoulli random variable, independent of the $X_t$-process, so that $P[Y = -1] = P[Y = +1] = \frac{1}{2}$. Let $\{Z_t = Y(-1)^{X_t}, t \ge 0\}$, a process which has practical interest and has bounded paths. It is desired to describe the growth behavior of $t \mapsto Z_t(\omega)$, for almost all sample points $\omega$. Note that the $Z_t$-process is also stationary and has moving discontinuities. This will follow from the computation, given below, for the growth rate of the increments of the process in terms of an exponential Orlicz norm. Indeed, since $X_t$ and $Y$ are independent and $E(Y) = 0$, one has $E(Z_t) = 0$, and using the independent increment property of the $X_t$-process, it is seen that for $s < t$
$$ E(Z_s Z_t) = e^{-2c(t-s)}. \tag{33} $$
So for $0 \le s, t < \infty$, one has $\mathrm{Cov}(Z_s, Z_t) = e^{-2c|t-s|}$, implying the (Khintchine) stationarity of the $Z_t$-process. Also by definition $Z_t - Z_s$ takes only
the three values $0, \pm 1$, and then
$$ P[Z_t - Z_s = 0] = P[X_t - X_s = 2k, k = 0, 1, 2, \ldots] = e^{-c|t-s|}\cosh(c(t-s)) = a(s,t), \text{ (say)}, \tag{34} $$
and similarly that
$$ P[Z_t - Z_s = 1] = P[Z_t - Z_s = -1] = \tfrac{1}{2}[1 - a(s,t)]. \tag{35} $$
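A quick Monte Carlo check of the covariance computation (33)-(35) is sketched below; the intensity, time points and sample size are illustrative assumptions, not taken from the text.

import numpy as np

rng = np.random.default_rng(0)
c, s, t, n = 1.5, 0.4, 1.1, 400000            # illustrative intensity, times, sample size
Y = rng.choice([-1, 1], size=n)               # symmetric Bernoulli sign, independent of X
Xs = rng.poisson(c * s, size=n)               # X_s ~ Poisson(c s)
Xt = Xs + rng.poisson(c * (t - s), size=n)    # independent increment on (s, t]
Zs, Zt = Y * (-1.0) ** Xs, Y * (-1.0) ** Xt   # Z_u = Y (-1)^{X_u}
print(np.mean(Zs), np.mean(Zt))               # both near 0, consistent with E(Z_u) = 0
print(np.mean(Zs * Zt), np.exp(-2 * c * (t - s)))   # empirical value vs. e^{-2c(t-s)} from (33)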
If $\Phi(\cdot)$ is an $N$-function (i.e., a symmetric nonnegative convex function, $\Phi(0) = 0$, $\frac{\Phi(x)}{x} \to 0\,(\infty)$ as $x \to 0\,(\infty)$), and $\Lambda(x) = e^{\Phi(x)} - 1$, which is again an $N$-function, then the space $L^\Lambda(P)$ on a probability space $(\Omega, \Sigma, P)$, of functions $f : \Omega \to \mathbb{R}$ for which $\int_\Omega \Lambda(kf)\, dP < \infty$ for some $k > 0$, is an exponential Orlicz space which is a Banach space under the norm:
$$ \|f\|_{(\Lambda)} = \inf\Big\{k > 0 : \int_\Omega \Lambda\Big(\frac{f}{k}\Big)\, dP \le 1\Big\}. \tag{36} $$
Such a function satisfies the condition that there exist $K > 0$, $x_0 > 0$ with $\Lambda^2(x) \le \Lambda(Kx)$, $x \ge x_0$; indeed the above function satisfies this, since $\Lambda^2(x) = e^{2\Phi(x)} + 1 - 2e^{\Phi(x)} \le e^{\Phi(2x)} - 1 = \Lambda(2x)$, $x \ge 0$. Moreover, any such function has the growth condition (as seen from Rao and Ren, loc. cit.):
$$ e^{x^\alpha} \le \Lambda(x), \qquad x \ge x_1 > 0, $$
for some $\alpha > 0$ and an $x_1$. It is also easy to see that
$$ L^\infty(P) \subset L^\Lambda(P) \subset \bigcap_{p > 1} L^p(P) \subset \bigcup_{p > 1} L^p(P) \subset L^{\tilde\Lambda}(P) \subset L^1(P), \tag{37} $$
where $\tilde\Lambda$ is the complementary function of $\Lambda$. Since $Z_t$ is bounded, it is in $L^\Lambda(P)$. Using (36) one can compute $\|Z_t - Z_s\|_{(\Lambda)}$ for any exponential $N$-function $\Lambda$ and get:
$$ \|Z_t - Z_s\|_{(\Lambda)} = \inf\Big\{k > 0 : \Big[\Lambda\Big(\frac{1}{k}\Big) + \Lambda\Big(-\frac{1}{k}\Big)\Big]\,\frac{1 - a(s,t)}{2} \le 1\Big\}, \qquad (\Lambda(0) = 0) $$
$$ \phantom{\|Z_t - Z_s\|_{(\Lambda)}} = \inf\Big\{k > 0 : \Lambda\Big(\frac{1}{k}\Big)(1 - a(s,t)) \le 1\Big\}, \quad \text{by symmetry of } \Lambda. $$
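For a three-valued increment, the gauge norm (36) thus reduces to a one-dimensional inequality in $k$, so it can be computed by bisection. The sketch below does this for the illustrative choice $\Phi(x) = x^2$ (so $\Lambda(x) = e^{x^2} - 1$) and an illustrative value of $a(s,t)$; in this particular case the infimum has the closed form $1/\sqrt{\log(1 + \frac{1}{1-a})}$, which the code also prints for comparison. Both the parameter values and the closed form are my own illustration, not taken from the paper.

import numpy as np

Lam = lambda x: np.exp(x ** 2) - 1.0               # Lambda(x) = e^{Phi(x)} - 1 with Phi(x) = x^2
a = 0.8                                            # illustrative value of a(s,t) = P[Z_t - Z_s = 0]
expectation = lambda k: (1.0 - a) * Lam(1.0 / k)   # E[Lambda((Z_t - Z_s)/k)], using symmetry

lo, hi = 0.1, 10.0                                 # bracket for the gauge norm
for _ in range(60):                                # bisection on k
    mid = 0.5 * (lo + hi)
    if expectation(mid) <= 1.0:
        hi = mid                                   # k admissible: the norm is <= mid
    else:
        lo = mid
print(hi, 1.0 / np.sqrt(np.log(1.0 + 1.0 / (1.0 - a))))   # bisection result vs. closed form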
Also from the fact that for any $t_0 > 0$
$$ P[|Z_{t_0} - Z_0| > 0] = 1 - a(0, t_0) = \tfrac{1}{2}\big(1 - e^{-2ct_0}\big) > 0, \tag{38} $$
it follows that $P[|Z_{t_0+\varepsilon} - Z_{t_0-\varepsilon}| > 0] \to 0$ as $\varepsilon \downarrow 0$, since $a(0,\varepsilon) \to 1$ as $\varepsilon \to 0$. Thus $t_0 > 0$ is not a fixed discontinuity point, but is a moving one. This example shows that exponential Orlicz spaces $L^\Lambda(P)$ for $\Lambda(x) = e^{\Phi(x)} - 1$, with $\Phi$ as an $N$- (or a continuous Young-) function, are of interest in applications. The spaces $L^\Lambda(P)$ lie between $L^p(P)$ and $L^\infty(P)$ for all $p > 1$, as seen from (37). They also play a key role in pre-Gaussian random variables. Several applications may be found in Buldygin and Kozachenko (2000). Recall that a random variable $X$ is pre- (or sub-) Gaussian if its moment generating function exists in a nondegenerate neighborhood of the origin (or all of $\mathbb{R}$) and is dominated in that neighborhood (in all of $\mathbb{R}$) by that of a centered Gaussian variable, i.e., there are an $\alpha > 0$ and a $\beta > 0$ such that
$$ E(e^{tX}) \le e^{\frac{\beta t^2}{2}}, \qquad \forall t \in (-\alpha, \alpha) \ (t \in \mathbb{R}). \tag{39} $$
Thus the cumulant function of $X$ is dominated by a quadratic form in that interval. If $\mathcal{Q}_0$ denotes the class of pre-Gaussian random variables, then it is closely tied to an exponential Orlicz space, as follows. Let $\Lambda : x \mapsto e^{\Phi(x)} - 1$, where $\Phi$ is a continuous Young function such that $\Phi(x) > 0$ for $x > 0$, and consider the (exponential) Orlicz space $L^\Lambda(P)$, as above. Then:
(i) $f \in L^\Lambda(P)$ iff there exist constants $C(= C_f) > 0$, $D(= D_f) > 0$ such that
$$ P[\omega : |f(\omega)| > x] \le C e^{-D\Phi(x)}, \qquad x > 0, \tag{40} $$
and then a corresponding bound for $\|f\|_{(\Lambda)}$ in terms of $C$ and $D$ holds, with the constant factor $\big(1 + \frac{C}{2}\big)$; \quad (41)
(ii) if $\Phi(x) = |x|^p$, $p \ge 1$, and $N(f) = \sup_{n\ge 1} \frac{\|f\|_{np}}{n^{1/p}}$, where $\|\cdot\|_{np}$ is the Lebesgue norm of $L^{np}(P)$, then $N(\cdot)$ is a norm equivalent to $\|\cdot\|_{(\Lambda)}$.
Here (i) is from Buldygin and Kozachenko (2000), whose bound for the norm in (41) is slightly improved ($(1 + \frac{C}{2})$ instead of $(1 + C)$ there). This whole result is detailed in Rao and Ren (2002, Sec. 8.3). For (ii), which is essentially due to Fernique (1971), and which is also in the latter reference,
there is unfortunately a typographical slip at a key point which may confuse the reader. So it will be sketched here for convenience. As a consequence of this part, one obtains that the space of pre-Gaussian variables $\mathcal{Q}_0$ is a Banach space.

Sketch of Proof of (ii). Since $\Phi(x) = |x|^p$, $\Lambda(x) = e^{|x|^p} - 1 = \sum_{n\ge 1} \frac{|x|^{np}}{n!}$, define the functional $M : f \mapsto \sup_{n\ge 1} a_n$, with the $a_n \ge 0$ given by the equation $a_n (n!)^{\frac{1}{np}} = \|f\|_{np}$. Then one has
$$ \int_\Omega \Lambda\Big(\frac{f}{k}\Big)\, dP = \sum_{n\ge 1} \frac{1}{n!}\Big(\frac{\|f\|_{np}}{k}\Big)^{np} = \sum_{n\ge 1}\Big(\frac{a_n}{k}\Big)^{np}. \tag{42} $$
Thus $a_n \le \|f\|_{(\Lambda)}$, $n \ge 1$, and so $M(f) \le \|f\|_{(\Lambda)}$. However
$$ \int_\Omega \Lambda\Big(\frac{f}{2^{1/p}M(f)}\Big)\, dP = \sum_{n\ge 1}\Big(\frac{a_n}{2^{1/p}M(f)}\Big)^{np} \le \sum_{n\ge 1} 2^{-n} = 1, $$
since by definition of $a_n$ one has $a_n \le M(f)$, $n \ge 1$. Consequently, $\|f\|_{(\Lambda)} \le 2^{1/p}M(f)$, using the definition of the gauge norm. Hence
$$ M(f) \le \|f\|_{(\Lambda)} \le 2^{1/p} M(f), \qquad p \ge 1, \ f \in L^\Lambda(P), \tag{43} $$
so that the functionals $M(\cdot)$ and $\|\cdot\|_{(\Lambda)}$ are equivalent. Now observe that $\|f\|_n \uparrow \|f\|_\infty$ and, by Stirling's approximation for $n!$, one gets easily $\sup_{n\ge 1} \frac{\|f\|_{np}}{(n!)^{1/(np)}} \sim \sup_{n\ge 1} \frac{\|f\|_{np}}{n^{1/p}}$, and hence $N : f \mapsto N(f) = \sup_{n\ge 1} \frac{\|f\|_{np}}{n^{1/p}}$ is a norm functional and is equivalent to $M(\cdot)$. Thus $N(f) \sim \|f\|_{(\Lambda)}$, or the norms $N(\cdot)$ and $\|\cdot\|_{(\Lambda)}$ are equivalent. $\Box$

Remark. If $f$ is a pre-Gaussian variable, then one finds that $\bar N : f \mapsto \sup_{n\ge 1} \frac{\|f\|_n}{\sqrt{n}}$ is a norm and it is equivalent to $N$ and hence to $\|\cdot\|_{(\Lambda)}$. Thus the set $\mathcal{Q}_0$ of pre-Gaussian variables $f$ for which $\bar N(f) < \infty$ is a subspace of $L^\Lambda(P)$, by part (ii) of the above theorem. By the equivalence of norms and the completeness of an (exponential) Orlicz space, one concludes that $(\mathcal{Q}_0, \bar N(\cdot))$ is a Banach space. A direct proof of this fact without the Orlicz space theory is considerably more difficult. It is natural to seek a common abstract function space analysis that combines both this and the preceding multidimensional Young function versions to give
an overview of the underlying functional structure. This is discussed in the next section.
4. Fenchel-Orlicz spaces for stochastic applications In the problems of large deviations of BM and the general Gaussian processes, it was seen that the Fenchel-Young function A : X —> JR+ where X is an infinite dimensional Polish space, and in the preceding section the classical Young functions leading to exponential Orlicz spaces. These two facts motivate the following functional analysis study. Thus A must be convex but not necessarily symmetric and not bounded, i.e., A(fcr) —> oo as t —> oo for each 0 ^ x e X. Simple examples show that even when X = M2 the last condition need not hold for a convex A, and has to be assumed. In this extension, even if A ( — x ) — A(x), i.e., symmetric, its conjugate A need not be. A first step in this direction may be formulated as follows. Definition 1. Let (X, \\ • ||) be a Banach space, A : X —> 1R+, a convex function such that A(0) — 0, A ( — x ) — A(x) and {x : A.(tx) < oo, for some t > 0} = X. Then the Fenchel-Orlicz space on a measure space (fi, £,//) denoted L A (jU, A") is the class of strongly measurable / : fi —> X such that /Q A(fc/) dp, < oo for some k > 0 (Bochner integral). The (gauge) norm, with which equivalent classes are identified, is given by ||/||(A)=inf{fc>0:jf A(£)d/x a||a;|| -f f3 for some a, (3 > 0 and all x 6 X — {0}. A proof of this result, and several of the properties discussed below can be obtained from the work of Turett (1980). After stating some geometrical aspects of Z/ A (//,A"), the problem of interest in stochastic analysis will be highlighted. The standard growth condition used for Young function is also meaningful for the general case. Thus A is Aa-regular if there are constants K^ > 0, i = 1, 2, such that A(2x) < K\K(x) for ||x|| > Ky. A similar condition is meaningful
for the conjugate function $\tilde\Lambda$ of $\Lambda$, as in the classical case. With these concepts the following exemplifies some geometric facts of these spaces.

Theorem 3. Let $\mathcal{X}$ be a Banach space and $\Lambda : \mathcal{X} \to \mathbb{R}^+$ be a convex function, as in Definition 1, and $L^\Lambda(\mu, \mathcal{X})$ be the corresponding Fenchel-Orlicz space. Then $(L^\Lambda(\mu, \mathcal{X}), \|\cdot\|_{(\Lambda)})$ is reflexive whenever $\mathcal{X}$ is reflexive and both $(\Lambda, \tilde\Lambda)$ satisfy a $\Delta_2$-condition.

For these spaces, the geometric structure as well as other related properties have been detailed by Turett (1980). The analysis of the corresponding spaces if $\Lambda$ is not symmetric, as desired by the work of the preceding two sections, is not yet available but desirable. This will also be of interest even in classical convex analysis, as discussed, for instance, by Brøndsted (1964), and in fact in much of the work in Rockafellar's (1970) book, to have extensions to the general case with $\mathcal{X}$ infinite dimensional. This is at present largely unexplored. In the next section, another set of spaces that are needed for an analysis of the sample paths of solutions of stochastic differential equations and related perturbation theory will be considered.

5. Function spaces for stochastic differential equations

Suppose that a process $\{X_t, t \ge 0\}$ is a solution of a first order stochastic differential equation
$$ dX_t = \varepsilon\,\sigma(X_t, t)\, dB_t + b_\varepsilon(X_t, t)\, dt, \qquad t \ge 0, \tag{45} $$
where $b_\varepsilon : \mathbb{R}^d \times \mathbb{R}^+ \to \mathbb{R}^d$, called the 'drift', and $\sigma : \mathbb{R}^d \times \mathbb{R}^+ \to \mathbb{R}^+$, the 'diffusion' coefficients, and $\varepsilon > 0$ is a parameter, $B_t$ being the BM-process. If $b_\varepsilon$ is independent of $\varepsilon$, and then $\varepsilon = 0$ making the first term disappear, one has an ordinary differential equation. In the general case when $\varepsilon > 0$ is fixed, and $b_\varepsilon, \sigma$ satisfy a standard Lipschitz condition, with a given initial value $X_0 = x$ (or $x_0 = x$ in the nonstochastic case), the Itô theory implies that there is a unique solution of (45). [The matter is discussed even for higher order equations in Rao (1997).] A solution of (45) is called a diffusion process. The problem is to find conditions on $b_\varepsilon$ and $\sigma$ and the range space in order that the solution of the perturbed equation tends to the unperturbed one, and to find conditions that the deviations decrease exponentially. Thus the problem becomes a continuous parameter analog of the large deviation questions considered in the earlier sections. For simplicity $b_\varepsilon, \sigma$ will be assumed
44
M. M. Rao
to be defined just on lRd so that the diffusion process will have stationary transitions, but the extension to the general case is then also possible and useful, as discussed in the last reference. Since the solution should be "smooth", one has to find a proper space in which the process takes its values. Naturally these are Orlicz type spaces whose elements are "smooth" . The appropriate spaces here are the BesovOrlicz and Orlicz-Sobolev spaces, the latter being more suitable for the stochastic partial differential equations. First a brief recall of these spaces is given to make the discription intelligible, and then to state the results for the solution of (45). Consider the probability space (fi, £, P] where J7 = [0, 1], £ = Borel cr-algebra. For / € L Ap (P), A p (x) = exP — l,p > 1, the modulous of continuity is given as: WA P (/,*) = sup ||Ah/||(A ), 0 < t < 1, 0 B° is a mapping such that s(h) is continuous on balls of B°, and satisfies the ordinary differential equation, associated with (also called a 'skeleton' of) (45) as: = o-(s(h)(t}h'(t) + b(s(h))(t),
s(h)(Q} = x.
(50)
The details of proof, as one can expect, involve several estimates that are tedious but somewhat standard in this area, and are given by Eddahbi and Ouknine (1997). A two parameter extension (i.e., for fields Xt,t = (t\,tz) e [0,1] x [0,1]) is also available with an exact extension, but involves more work. It is recently obtained by Boufoussi, Eddahbi and N'zi (2000). The role of Besov-Orlicz spaces and exponential Orlicz spaces here need no further emphasis, being crucial for the work. Let us also note briefly the spaces that appear in the study of stochastic flows, which are continuous function spaces but are of a different type. Here nonlinear operations appear from the start, and the analysis is localized. A brief discussion will now be included. Consider the mappings at : Md x fi —> B(Md), the space of d x d matrices on JR and suppose the following three conditions hold on (fi, E, P) for 0 < s < t < u < oo:
46
M. M. Rao
(i) 4>su(x,uj) = ((pst o tu)(x,u), for x € Md, and a.a. u> <E ft,
(ii) ss = identity, (iii) ') is a Cfc-diffeomorphism (A; > 0) for a.a. uj € fi. Then the family {st, 0 < s < i < oo}, is termed a stochastic flow if it is obtained as a solution of a stochastic differential equation (SDE) and satisfying (i)-(iii): d(t>st = F(st,dt),
M(Q,u) = x,
(51)
where {F(x,t),t > 0} is a family of random functions for each x, which is either a semi-martingale relative ito a nitration {Ft,t > 0} from E or more simply an L2>2-bounded process in the sense of Bochner, with F(x,0) = 0. This includes many types of flows, such as Brownian, martingale, or harmonizable classes. It is now necessary to give a meaning for the SDE (51), and as usual converting it into an integral equation written symbolically as:
/•* (/)st(x) = x+ \ F((f)sr,dr), Js
0<s 0} for any pair of x,y 6 ]Rd. This is a certain nondecreasing process, adapted to the filtration, denoted as {A(x,y,t),Ft,t > 0} which takes values in a function space of the variables (x,y~), and F ( x , t ) also takes values in such a function space. It will be a Holder space noted above, and a consequence of this condition is that there is an increasing process {At,^t,t > 0} that dominates {A(x,y,t),Ft,t > 0} for all x,y. Hence for each pair x,y € JRd (by the Radon Nikodym theorem) one has the representation: ft A(x,y,s) = a(x,y,s)dAs. (53) Jo The pair (a(x, y,t),At),t>0,is sometimes called local characteristics of the semimartingale F. The function spaces that are central to the analysis are defined as subspaces of continuous function spaces as follows. Let V c Md be a domain [=open nonempty connected set] and consider / : V x V —> Md, a smooth function for which the norms are defined as: using the PDE symbolism, let Da = D«l • ••D°«,Dt = £pa; > 0,a = (c*i,.. .,ad), \a\ = £? ai , and define for any 0 < 6 < 1 the norms:
\\f\\ \\J\\S —
«
bu P Xi,yi€t>,xi^x
and |a|=m
where /(».!/) The space £m>(5 = {/ : V x V -> ]Rd, \\f\\m+s < 00} with the norm denned above can be verified to be a Banach space. These function spaces are the ones in which various processes take values and the analysis of the desired
M. M. Rao
48
flows is also in this space. As recalled above, a semimartingale F(x, t) is representable as F(x, t) = M(x, t) + B(x, t), a sum of a (local) martingale and one of (locally) bounded variation. Let the local characteristics of M be denoted by (a(x,y,t),At). Assume that B(x,t) = /0* b(x,s) dAs, and then one terms (a, 6, A) as local characteristics of the semimartingale F. If {X-t ,Ft,t > 0} is a process with values in Md with continuous paths, then under the conditions that F is a continuous semimartingale, x & T>, whose local characteristics (a, 6, A) 6 BQJ for some 5 > 0, then the integral in (53) can be defined with the .X^-process as follows: For any partition A — [0 = to < ti < . . . < tk = T] the (nonlinear) ltd integral is
t
fc-i F(Xs,ds) = UmQ^2[F(XtjM, tj+1 A t ) - F(XtjM, tj A t)}.
(54)
Here |A| denotes the norm of the partition, and the convergence is in probability. Similarly the Straonovich integral is defined by averaging the values of F on the partition points as in the classical Riemann integral. Thus one denotes this as: /* F(XS, ods) =
lirn ^{[F(Xt —*U
^0
j + l A t ) tj+1
A t ) - F(Xtj+lM, t, A t)}
-__Q
+ [F(XtlsM,tj+i
A t ) - F(Xtj A t,tj A t)]},
(55)
if this limit exists uniformly in t in probability as |A| —> 0 then one has the second integral, and moreover this limit exists if the local characteristics belong to (-82,5; -^1,0)- Equation (52) is thus well-defined with either concept, and when the conditions on the function space in the second case hold (then for the Ito definition they are automatically satisfied), one has the connection between these integrals as ({•,-} for quadratic variation as usual): ft
ft
\ F(Xs,ods)= JO
I
d
f
F(Xs,ds) + -^{ */0
(56)
^ •_ i « / 0
In many computations it is found that Stranovich integral is more convenient. Most of the above work stems from the availability of the quadratic variation of the process or field F(-, •), and then suitable function spaces as value spaces allow one to define and analyze stochastic flows of various types. However the general class of process and fields coming from an extended Bochner boundedness principle which includes semimartingales and other classes, also has the quadratic variation (cf. Rao (1995), Proposition 6.2,12). Consequently most of the above analysis may be extended to this general class and it appears to be an interesting research problem. Perhaps even the
STOCHASTIC ANALYSIS AND FUNCTION SPACES
49
Holder spaces may be replaced by the Besov-Orlicz spaces, especially the B° subspaces denned above which are separable. The theory of stochastic flows is well treated by Kunita (1990) in the context of Holder spaces and variations of them. ^,Prom a different approach, again extending Kunita's work, Carmona and Nualart (1990) presented a detailed exposition for "strong integrators" and use some refined aspects of stochastic integration on related function spaces. A general theory of the subject as given by Kunita (1990), and how a large amount of the latter work can be extended to the I/2'2classes is indicated in (Rao (1997),Sec. 4.8). The main point of the above sketch is to show how function spaces play a basic role in this theory from its formulation and the consequent analysis. 6. Stochastic PDEs and function spaces In this final section, a few results, as remarks, on function spaces that are also crucial for the SPDEs will be given. As is known, it is not always possible to obtain explicit solutions for PDEs and hence one seeks for 'weak solutions' using the theory of Schwartz distributions which depend on different types of 'smooth'function spaces. Consider the elliptic PDE (using the notation introduced above for / : G —» 1R, G is a domain with 'smooth' boundary 8G) is given by: Lf=Y, (-l)MDa(aa(Do and Daf\QG = 0 for 0 < a < m < oo. Also / has a weak a*ft-order derivative if there is a ga such that for all compactly based C^-functions h one has JGfDahdx — (—1)'" fGgahdx. A weak solution of (1) is an / which satisfies the equation (g,h) = (f,L*(h)) for all compact C^-functions h where the duality pairing is used and L* is the formal adjoint of the linear operator L. Under some conditions the equation has a unique weak solution subject to some boundary conditions. Note that (57) is actually of infinite order. Such equations appear in the theory of elastic equations in which the coefficients aa are random processes, making it an SPDE. Here a theory of Sobolev spaces of infinite order is relevant as shown by Dubinskij (1984), and a corresponding Orlicz-Sobolev space analog was briefly treated in (Rao and Ren (2002)) since the solution belongs to such a space for almost all sample paths. This is a natural function space and many interesting problems await solutions. There are several approaches and models for studying SPDEs. For instance a classical wave equation subject to random disturbances can be formally stated as
50
M. M. Rao
utt(x,t) = uxx(x,t) + g ( u ( x , t ) , t ) + f ( u ( x , t ) , t ) X ( x , t ) ,
(58)
with initial conditions u(x, 0) = uo(x), ut(x, 0) = VQ(X), x € 1R, and ut, utt, ux as the partial derivatives, X being a symbolic stochastic derivative (e.g., in the case of BM) as the noise disturbance. This is usually understood precisely by converting it into an integral equation as: / {u\(t>tt - 4xx](x,t) -
[g(u,t}(t)}(x,t)}dxdt
KJ1R
= 1 1 f(u,t)(x,i)X(dx,dt),
JMJJR
(59)
for all compactly based smooth functions . But now the stochastic integral on the right is relative to the random field X, and the theory becomes delicate even for Brownian sheets since it depends on multimeasure theory (not just bimeasures), as was briefly discussed in (Rao (1997), Section 4,10). This awaits a detailed investigation. The L2'2 boundedness principle seems to play a role here as well. On the other hand if the operator L of (57) is of finite order elliptic equation, and Lf = g has the right side a random function analogous to (58), one can consider using the Schwartz theory of distributions as generalized random fields in terms of Gelfand-Ito definition of the middle 1950s. Rozanov (1998) has taken this point of view and introduced stochastic Sobolev spaces to be used in lieu of the classical Sobolev spaces of the PDE theory. The latter spaces are of random linear functions £ : (j> H-> ^ G LP(P) whose norm ||£||p is defined as: m2P= sup [£(|^| 2 ] being the Fourier transform of <j>. Some preliminary analysis is in Rozanov's memoir noted above. A thorough investigation of these spaces and their role in solving the above type of SPDEs is an interesting project to pursue. Thus the combination of both the methods appears to be a good source of research problems that were only considered before with many (unnatural) constraints. One may also look into the stochastic Orlicz-Sobolev spaces generalizing the corresponding work already noted earlier. Many of these problems are potentially solvable.
STOCHASTIC ANALYSIS AND FUNCTION SPACES
51
References 1. Br0ndsted, A (1964), 'Conjugate convex functions in topological vector spaces,' Mat, Fss. Medd. Dansk. Vid. Selsk., 34, 1-26. 2. Boufoussi, B., M. Eddahbi, and M. N'zi (2000), 'Freidlin-Wentzell type estimates for solutions of hyperbolic SPDEs in Besov-Orlicz spaces and applications,' Stock. Anal. Appl, 18, 697-722. 3. Buldygin, V. V., and Yu. V. Kozachenco (2000), Metric Characterization of Random Variables and Random Processes, Amer. Math. Soc., RI. 4. Carmona, R., and D. Nualart (1990), Nonlinear Stochastic Integrators, Equations and Flows, Gordon and Breach Science Publishers, New York. 5. Cramer, H. (1938), 'Sur un nouveau theoreme-limite de la theorie des probabilites,' Acualites Scientifiques et Industrielles, 736, 7-23, Hermann, Paris. 6. Deuschel, J.-D., and D. W. Stroock (1989), Large Deviations, Academic Press, Inc., New York. 7. Dubinskij, Ju. A. (1984), Sobolev Spaces of Infinite Order and Differential Equations, D. Reidel Publishing Company, Dordrecht, Netherlands. 8. Dunford, N. and J. T. Schwartz (1958), Linear Operators, Part I: Gemneral Theory, Wiley-Interscience, New York. 9. Eddahbi, M. and Y. Ouknine (1997), 'Grandes deviations des diffusions sur les espaces de Besov-Orlicz,' Bull. Soc. Math., 121, 573-589. 10. Ellis, R. S. (1985), Entropy, Large Deviations, and Statistical Mechanics, Springer, Berlin. 11. Fenchel, W. (1949), 'On conjugate convex functions,' Canad. J. Math., 1, 73-77. 12. Fernique, X. (1971), 'Regularite de processus gaussiens', Invent. Math., 12, 304-320. 13. Freidlin, M. I., and A. D. Wentzell (1998), Random Perturbations of Dynamical Systems, (2nd ed.) Springer, New York. 14. Kunita, H. (1990), Stochastic Flows and Stochastic Differential Equations, Cambridge Univ. Press, Cambridge, UK. 15. Levin, V. L. (1975), 'Convex integral functional and the theory of lifting,' Russian Math. Surveys, 30(2), 119-184. 16. Rao, M. M. (1975), 'Inference in stochastic processes-V: Admissible means' Sankhya, 37,A, 538-549. 17. Rao, M. M. (1995), Stochastic Processes: General Theory, Kluwer Academic Publishers, Dordrecht, Netherlands. 18. Rao, M. M. (1997), 'Higher order stochastic differential equations,' Real and Stochastic Analysis, Recent Advances, CRC-Press, Boca Raton, FL, 225-302.
52
M. M. Rao
19. Rao, M. M., and Z. D. Ren (2002), Applications of Orlicz Spaces, Marcel Dekker Inc., New York. 20. Rozanov, Yu. A. (1998), Random Fields and Stochastic Partial Differential Equations, Kluwer Academic Publishers, Dordrect, Netherlands. 21. Sanov, I. N. (1957), 'On the probability of large deviations of random variables,' Mat. Sb., 42, 11-44 (English Translation, AMS, 1961). 22. Schilder, M. (1966), 'Some asymptotic formulas for Wiener integrals,' Trans. Amer. Math. Soc., 125, 63-85. 23. Turett, J. B. (1980), 'Fenchel-Orlicz spaces', Dissert. Math.,, 181, 1-60. 24. Varadhan, S. R. S. (1966), 'Asymptotic probabilities and differential equations,' Comm. Pure Appl. Math., 19, 261-286. 25. Varadhan, S. R. S. (1984), Large Deviations and Applications, SIAM, Philadelphia.
Applications of Sinkhorn balancing to counting problems Isabel Beichl NIST 100 Bureau Drive Gaithersburg, MD 20899 Francis Sullivan IDA/CCS 17100 Science Drive Bowie, MD 20715 Introduction We describe a novel Monte Carlo method for estimating the total number of matchings, both perfect and partial, in a bi-partite graph. For the case of perfect matchings, this is equivalent to estimating the permanent of a matrix of zeros and ones. Our method is quite different from the Monte Carlo Markov Chain (MCMC) techniques that have been developed extensively in the past several years [7, 9]. Instead of using a Markov chain, an importance sampling method originally developed by Knuth [10] has been extended to apply to this class of problems. The robustness of our technique is based on the use of Sinkhorn balancing [11, 12] to construct an extremely good importance function. As we shall see, the apparent advantage of importance sampling over MCMC is that importance sampling is extremely computationally efficient. But the disadvantage is that there's very little understanding of why convergence to the mean is so rapid. Unlike MCMC, for which a beautiful theory of convergence now exists, there are very few analytic results about importance sampling. Knuth gives an interesting expression bounding the variance which we can relate to our setting. We discuss how our method relates to Knuth's formulation. We also use some ideas due to Ando [1] to give an informal
53
54
I. Beichl & F. Sullivan
Ai Figure 1. The Knuth method estimates the size of a tree by sampling paths from the root to the leaves and generating estimators from the number of choices available at each node. explanation of why Sinkhorn balancing performs as well is it does. We will also show an application of the method to estimating the solutions of the dimer problem and the monomer-dimer problem. The basic method Let's begin with a sketch of Knuth's original idea [10]. The question was how to estimate the running time of a back-track program without actually performing the entire backtrack. The idea is simple. Any backtrack can be thought of as a search of a tree which backs up to the first ancestor node having an available choice whenever it is blocked, and continues doing this until an "answer" is found or all nodes have been examined. If we imagine that the backtrack is a tree (not necessarily balanced) then this amounts to a depth first traversal of the tree that stops at a node satisfying some prespecified condition. In many situations, it is useful to be able to estimate how much work will be done before the answer is found, i.e. to estimate how large the tree is without actually traversing it. If the search generated a perfect binary tree, the question is easy to answer. Just determine d, the depth of the tree and assume that the whole tree must be traversed before finding the answer. The amount of work is then 2d+1, the number of nodes in the tree. To determine d, just walk down one branch of the tree and count the number of steps to reach the leaf. Amazingly, a simple and obvious-seeming generalization of this idea works in much more general situations where the tree is not binary or even fixed degree and the depth is not uniform. A "sample" is a traversal of any path of the tree, stopping when a leaf is reached. At each step k choose at random among the n^ children of the current node and record n^. After traversing
APPLICATIONS OF SINKHORN BALANCING
55
a path,the estimate obtained for the number of nodes is given by:
Averaging values of c over sufficiently many samples gives the estimate. In Figure 1, the dashed lines are an example of one sample path through the tree. For this path, UQ = 3,nj = 2,ri2 = 2 and 713 = 1. Approximate counting by sampling
Why should this work? To explain, let's suppose we want to estimate a sum of positive terms,
r€T
where \T\, the number of items, is very, very large. One approach is to select N samples TI £ T, 1 < i < N uniformly, and then approximate G with F where
N Note that the uniform sampling method chooses a particular TJ with uniform probability P(TJ) = 1/|T|. The mean, F, can be re- written as:
r =— N
pin)
l but that is exactly the same as having a path through the matrix which includes the elements i,cr(i). Trying to find a path by choosing a unique column in each row, one row at a time is traversing a tree, because when a column is selected, it is eliminated from later consideration. At each row, we choose from among the columns such that a^j = I and j has not already appeared. Making sure that we find a path of maximum length can be done by a backtrack procedure. If there are no columns left to select, back up to an earlier row and make a different choice. So we apply the Knuth method to this problem. Estimating the permanent of a zero-one matrix is related to a question from statistical physics: estimating the dimer covering constant. In fact, the dimer covering constant is the limit of a sequence log(Mn)/n where the Mn are permanents of specific n x n zero-one matrices. Counting all matchings, rather than just the perfect ones, is called the monomer-dimer problem, because the nodes that are not matched are called monomers. Using the straight-forward version of Knuth's method "out of the box" would give estimates for both the dimer covering problem and the monomer-dimer problem. However, we can do better than the straight-forward version in terms of efficiency if we use a better importance function than the uniform one. As has been mentioned, if we want to think of finding a matching as traversing a tree, we can choose an element from row 1, using the number of non-zeros to define the uniform probability for choosing a column. The choice eliminates one row and one column and then we move to row two and choose again, and so forth. But we can devise a better importance function if we can define probabilities so that either a row or a column could be selected at each step and have the probability of choosing that location be proportional to the number of paths going through that location. That is, we'd like both the row sums and the columns sums to equal one. Namely, we want to associate with A a doubly stochastic matrix B having non-zeros in the same locations as A has ones except for those locations ( i , j ) in A where there exists no path. These are said to be unsupported elements of A and we address this in a later section. This leads us to the method of Sinkhorn balancing. Simply described, the Sinkhorn balancing algorithm derives a doubly stochastic B from a zero-one A by first dividing the entries in each row by the sum of the non-zeros in that row. The result of this is a row stochastic matrix that is not necessarily column stochastic. Next divide by column sums giving a column stochastic matrix that is not necessarily row stochastic. Then divide by row sums again, etc. This will converge quickly [12] if all of the non-zero elements of A are supported and the limit is the Sinkhorn balance of A. (We will return to the
APPLICATIONS OF SINKHORN BALANCING
59
question of support presently.) It is obvious that each division is a pre- or post-multiplication by a diagonal matrix and if products of these matrices converge we will have that A = D-B-E
where A is the original zero-one matrix, D and E are diagonal and B is doubly stochastic [11], [12] . An almost obvious consequence of this is for each (i, j) we have
in the case that there exists a path through position (i, j). The quantity bij is zero in case there is no path though i, j. Notice now that this implies that for any permutation a such that all a i>cr (j) are equal to one, the term f|j ^zo-(i) is equal to l/Hi^i 6 ^)) = Vrii(^ e j)- Summing over all permutations gives that
For any square matrix C, we will write |C*ij| for the ( i , j ) permanent minor of C, that is the permanent of the minor obtained from C by removing row i and column j. Another immediate consequence of the above equality is that for all i,j we have:
\B\
\A\
Notice now that if the terms \B\
happen to be close to one, the values 1/bij are close to being a perfect importance function, because we can apply the same idea at each step of finding a path. To apply the Knuth idea to approximating the permanent we recall Tfc(j') is the size of the tree at level k from node j down. For us, Tfc(j') is the permanent of the (n — k) x (n — k} minor of A obtained by deleting the first k elements of the path to node j which we denote as \A^ -|. Figure 3 shows the choices at each stage of path selection as part of a tree. The probability of going to node i is pi which is bk,i after k levels. So we want there to exist a close to one such that
60
I. Beichl & F. Sullivan
Figure 3. At level k we can choose among branches with different probabilities. where the bkj are obtained by Sinkhorn balancing the appropriate minors. But we know that 'J \B\
\A\
SO
Every term has
, hence combining these equations we get \B-kjl < a\B
and we would like a to be close to 1. In [1] Ando proves that the doubly stochastic matrix having minimum permanent for a given zero-one pattern has the property that all its minors have equal permanent. If all minors are equal then a will equal one. Ando also shows that the Sinkhorn balance, \B\ for a given zero-one pattern maximizes the entropy (= — X)^»,j l°g&ij)- This is proved in a new way in [3] where it was also shown that maximizing the entropy is similar to minimizing the permanent. The connection is via the Bregman map. For a matrix B the Bregman map [4] is defined as
61
APPLICATIONS OF SINKHORN BALANCING
Bregman map, entropy vs tog(permanent) for 4 x 4 "bad" matrix
3.5
4
-2
-1.5
-1
-0.5
Log(permanent)
Figure 4- Permanent versus entropy for iterations of the Bregman map and its inverse on a worst case matrix. We would like the Sinkhorn balanced B to be a fixed point of the Bregman map. That is, we'd like the Sinkhorn balance to be the minimum permanent and the maximum entropy matrix for the given zero-one pattern. That's not quite the case, but it's approximately true. For small matrices, we can compute the both the Bregman map and its inverse. [3]. The result for one particular zero-one pattern is shown in Figure 4 where we take a "worst case matrix" and illustrate that although it is possible to find matrices with smaller permanent than the matrix with maximum entropy, the permanents do not become very much smaller. Here we compare the log of the permanent with the entropy. Determining supported elements A supported element is a element ( i , j ) in the matrix A that lies on some complete path through the matrix. In other words, the edge in the corresponding bipartite graph is part of some complete matching. Sinkhorn balancing converges quickly if all elements are supported. See section 2 of [12] for a nice summary of terms and references for these results. We can detect unsupported elements, with the Dulmage-Mendelsohn algorithm [6]. This algorithm takes a zero-one matrix, A, and returns two index vectors p and q that define permutations of A to block upper triangular form. The elements corresponding to elements in the upper triangular black are supported. All others are unsupported.
62
I. Beichl & F. Sullivan
Application to Dinier and Monomer-Dimer Problem We can apply these ideas to the dimer problem and the monomer-dimer problem in all dimensions. The dimer problem in 2 dimensions is to find the number of ways to cover a square lattice with dimers and in 3 dimensions we cover a cubic lattice. Figure 5 shows an example of one covering of a red-black square lattice with dimers. There are more than 6 million other covers. The problem is provably hard and thus we sought an estimation method. The problem is equivalent to finding the permanent of a zero-one incidence matrix obtained by letting the rows represent the red locations in the checkerboard and and the columns as the black locations. The matrix has a one in location i,j if red location i in the checkerboard is a neighbor of black location j. In other words, Ojj = 1 iff one can cover red location i and black location j with the same dimer. Thus, the matrix for the checkerboard in Figure 5 has 6 x 6/2 rows and the same number of columns, each row and column having exactly 4 ones, one for each of the directions north south east and west. In three dimensions, a n x n x n cube gives an incidence matrix of size 7i3/2 x n 3 /2 with exactly six ones in every row and column. We used Sinkhorn balancing to obtain an importance function to estimate the permanent. Because bij is approximately |j4jj|/|j4| when location (i, j) is supported, the value of the bij for a given row is approximately the fraction of the paths that pass through column j. If the fcjj's were equal to |Ajj|/|^4|, then we would have a perfect importance function and could get the permanent with one sample selecting one column from each row, deleting the row and column selected, rebalancing, and performing the selection again until there is only a 1 x 1 minor. The estimator would be
1
1
Pl,jl P2,J2 \A\
\Al,jl\ ^
1
1
1
1
Pn,jn
= \A\
where A^ji^2,j2) *s tne minor after row 1, column jl is removed and row 2, column j2 is removed. Although the Sinkhorn balance importance function is not exactly equal to |Ajj|/|A| the calculation will still converge as described above because this has been shown to be only importance sampling and thus must converge to the permanent. The question would only be how fast it converged. In practice, we have found the convergence to be excellent [3], so much so that we were able to compute the dimer covering constant, the limit of the permanents of the incidence matrices when normalizing for the size of the problem. We were able to estimate the permanent for matrices of size 1024 x 1024. Our estimate of the dimer covering constant agrees with known analytic results in 2 dimensions [13], [8] and lies within the known upper and lower bounds in three dimensions [5], [3].
APPLICATIONS OF SINKHORN BALANCING
b
r
Lb
r
b
r
b
r
jd
r
b
63
r J; b
IT
r
r
lT
r
JO
r
b
r
b
r
b
r
Figure 5. The dimer problem is to count the number of covers of a grid with dimers. This is one dimer cover of a 6 x 6 grid in 2D. We extended this result to estimate the number of partial covers of all sizes in addition to the number of complete covers (the monomer-dimer problem) by using a similar importance function. Figure 6 is an example of a partial cover of size 11 of a 6 x 6 grid. In [9] and [2], MCMC solutions to this problem are given. To apply the Knuth method to this problem we define a non-zero importance function even for unsupported elements. As described earlier, we used di • ej for every nonzero ajj supported or not. We also needed to select from the entire minor and not select in any particular order. We were able to do this and Figure 7 shows the estimate of the number of partial covers for an 8 x 8 x 8 cubic lattice.
64
I. Beichl & F. Sullivan
Figure 6. The monomer-dimer problem is to count the number of covers of all sizes, partial and complete. This is a cover of size k=ll of a 6 x 6 grid in2D.
3D monomer dimer coefficients, 256x256 matrix, 4000 sampli
*N 250 o
100
150 level number (k)
200
250
Figure 7. Estimates as logs of the number of the covers of size k for a 8 x 8 x 8 grid. 4000 samples were taken of the 256 x 256 matrix.
APPLICATIONS OF SINKHORN BALANCING
65
References 1. Ando, T. "Majorization, Doubly Stochastic Matrices and Comparison of Eigenvalues" Linear Algebra & Applic. 118, 163 (1989). 2. Beichl, I., O'Leary, D. P. and Sullivan,F. "Approximating the Number of Monomer-Dimer Coverings in Periodic Lattices", Physical Review E, pp 016701.1 - 016701.6 (2001). 3. Beichl, I. and Sullivan, F. "Approximating the Permanent via Importance Sampling with Applications to the Dimer Covering Problem," J. Comp. Phys. 149, pp. 128-147 (1999) 4. Bregman, L. M., "Proof of convergence of Sheleikhovskii's method for a problem with transportation constraints", Zh. vychsl. Mat. mat. Fiz. 7 147 (1967). 5. Ciucu, M. "An improved upper bound for the three-dimensional dimer problem", Duke Math. J. 94 (1998), 1-11. 6. Dulmage, A. and Mendelsohn, N. "Coverings of bipartite graphs", Can. J. Math. 10 (1958) pp. 517-534. 7. Jerrum, M. and Sinclair, A. "Approximating the Permanent", SIAM J. Computing 18 1149 (1989). 8. Kastelyn, P. W., "The statistics of dimers on a lattice." Physica 27 1209 (1961). 9. Kenyon, C., Randall, D., and Sinclair, A. "Approximating the number of monomer-dimer coverings in a lattice" J. Stat. Phys. 83 637 (1996). 10. Knuth, Donald E. "Estimating the Efficiency of Backtrack Programs", Selected Papers on Analysis of Algorithms, CSLI Publications, Stanford, California, (2000). 11. Sinkhorn, R., "A relationship between arbitrary positive matrices and double stochastic matrices" Annals of Math. Stat. 35, 876 (1964). 12. Soules, G. W., "The rate of convergence of Sinkhorn balancing" Linear Algebra & Applic. 150, 3 (1991). 13. Temperley, H. N. V. & Fisher, M. E. "Dimer problem in statistical mechanics - an exact result." Philos. Mag. 6, 1061 (1961).
Zakai equation of nonlinear filtering with Ornstein-Uhlenbeck noise: Existence and Uniqueness Abhay Bhatt Indian Statistical Institute Delhi, India Balram Rajput Jie Xiong Department of Mathematics, University of Tennessee Knoxville, TN This paper is respectifully dedicated to Professor M. M. Rao
Abstract A filtering model where the noise is an Ornstein-Uhlenbeck process independent of the signal X is considered. We derive the (analogue of) Zakai equation and obtain the uniqueness of solu* Research supported partially by NSA
67
68
Bhatt, Rajput & Xiong
tion.
I.
Introduction
The process of interest - the system process X - is uriobservable. We can observe the (observation) process Y - a (known) function h of X - which in addition is corrupted by noise N. We want to filter out the noise N from the observations Y and get an estimate of the process X. This is filtering theory. The filtering model can be written as
Yt = f h(Xs)ds + Nt,
0 1} is exchangable. Thus limn_^oo ^ Z)it=i ^-t^(St) exists under PQ and the ergodic theorem implies that lim ± £ AJF(SJ) = E Po (A t F(St)|J) *OO Jl
71
(II.9)
1=1
where T is the invariant cr-field of the stationary sequence {(Sl, N*,Z) : i > 1}. As in Kurtz and Xiong ([8, Theorem 2.3]) we use the independence of (Sl, Nl) to note that T is contained in the completion of the a-field generated by Z. Now II. 9 implies nlim
i £ A.\F(Sl) = EPo(AtF(St)[^|) t=i = E Po (AtF(S t )|^ t z )
74
Bhatt, Rajput & Xiong
=
fttF.
(11.10)
Arguing similarly as above, using II.8, II.9, 11.10 and the fact that AT1 and Z are independent under PQ we get
fttF = fi0F+ f e-203fis(HF)dZs Jo
+ f fts(AF)ds. Jo
This completes the proof. Remark II. 2 The above Proposition can also be proved along the lines of the proof of the classical Zakai equation. Here we have given a
different
particle-representation proof. See Kurtz and Xiong [8]. Let fit denote the unnormalized conditional distribution of Xt given J^ ' .
i.e. (11.11) Also for 0 < s, t < T, let 1} be a CONS in /fo- Equation III.l holds for each /,. Adding over i and using Lemmas 3.2 and 3.3 in Kurtz and Xiong [8], we get that there exists a constant K such that
g + E /"* 2 /< u , (T 5 (^a s , u ))" - (Ta(ba,,u)'\ du Js
+E
\
4
/„
r Js
KE /* Js
We cannot directly use Gronwall's inequality here. Hence we proceed as follows. Let £^ be independent Markov processes which are solutions of the martingale problem for ( A , f j , s ( h ± - ) / ^ s ( h ± ) ) « (a, < bt ) because the equivalence of the sequences defining one hypernumber does not depend on any finite beginning of these sequences. 1 . The relation < is transitive on Rw . Let a < P and p < y for some a, p, y eRa . Then by the definition of n, then at < bt and for some natural number m, if/ > m , then /, < c/. Besides, a * P and p ^ y. Then there are such positive numbers k, h e /?++ that bt - at > k for all / > n and c, ~ l j > h for all y>m. Let g = min { k, h } . Then there is such a natural number/? that | bj - /,• | < g for all / > p because the sequences b = (6,-)i60 and / = (/, )ieo define the same hypernumber p . Let q = max { m, n, p }. Then assuming / > q , we have: c, - a,= c, - b/+ bt - «,= c, - /,• + // - b,+ b, -a/=( c, -/,•) + (/,• -bi) + (b,-cii)>g-g+g = g because b, -a,>g, c, - /,- > g and /, -fy• > - g . It implies by the definition of the relation < that a < y. As a, p, y are arbitrary points from /fm, the relation < is transitive. 2. The relation < is antisymmetric on /fffl . Let us suppose that this is not true, in other words, that for some a, P e/?B , we have a < P and P < a . By the definition of the relation < , a * P and there are such sequences a = («/)ie 0 for all / . Consequently, linij^oo (6, - a/) > 0. At the same time, limi^ ( b, - a,- ) = Hm j.^, (b, - // + /,• - a, ) = linij^oc (( bt - / / ) + (/,•- d{ ) + (dj a, )) = linij^oc (bi-l,) + lim (_>„ (lt-dt) + lim^co ( J,- - a, ) = \im^x ( /,• - dt ) as limi^oc (i, - /,•) = 0 and little*, (d, - a,) = 0. However, limj^a, ( /, - dt } < 0 because /, < d-t for all / . Consequently, lim^^ (6, - a,) = lim^x (/, - c/;) = 0 . It means, by the definition of a hypernumber, that a = p. This contradicts our assumptions and proves that the relation < is antisymmetric on R® by the principle of excluded middle. What concerns the relation < , the above proof shows that it is transitive. In addition to this, it is asymmetric because by the definition of a hypernumber, we have that a < a and a < P and P < a imply a = p. Thus, < is a partial order. 2.3. Topology in R® The space /?a, has good topological properties. These properties provide for an axiomatic characterization of real hypernumbers. Let/?: R® —> R^ be the natural projection. It is possible to define a topology on the space Rm by means of the special type of neighborhoods. Let a = (ai)ie 0 e /f° and k e DEFINITION 2.5 A spherical neighborhood of the sequence a is an arbitrary set Oka ={ c=(c/),-E06 R®; 3neco Vi>n (| o, - c/|< k)}. The system T of all spherical neighborhoods determines a topology T on Ra. On R as a subset of R®, this topology induces the natural topology for real numbers.
Hyperfunctionals and Generalized Distributions
PROPOSITION 2.1
87
R® is a topological vector space with respect to T.
A topological spaced may satisfy the following axioms [37]: T0 (the Kolmogorov Axiom). V*, y e X(3Ox (y gOx) v3Oy(x£ Oy) ). That is, for every pair of points a and b there exists an open set U in O such that at least one of the following statements is true: 1) a is in U and b does not lie in U; 2) b is in U and a does not lie in U. T, (the Alexandroff Axiom). Vx, y eX B Ox 3Oy (x zOy &y That is, for every pair of points a and b there exists an open set U such that U contains a but not b. To say that a space is T! is equivalent to saying that sets consisting of a single point are closed. T2 (the Hausdorff Axiom). Vx, y eX 3Ox 3 Oy (Oxr\Oy = 0 ). In other words, for every pair of points a and b there exist disjoint open sets which separately contain a and b. In this case, open sets separate points. Here Ox, Oy are some neighborhoods of x and y, respectively. A topological space, which satisfies the axiom T; , is called a Tj -space. Each Ti+1 is stronger than T; . T0 -spaces are also called the Kolmogorov spaces. TI -spaces are also called the Frechet spaces. T2 -spaces are also called the Hausdorff spaces [37]. There are also T3 or regular spaces, in which for every point a and closed set B there exist disjoint open sets which separately contain a and B. That is, points and closed sets are separated. Many authors require that T3 spaces also be T0, since with this added condition, they are also Ta [38]. There are also T4 or normal spaces, in which for every pair of closed sets A and B there exist disjoint open sets that separately contain A and B. That is, points and closed sets are separated. Many authors require that T4 spaces also be TI [38]. THEOREM 2.1 The topological space R® does not satisfy even the axiom T0 . To prove the theorem, it is sufficient to take elements 0 = (a\=Q) ,-eo and b = (b\ = 1/i) ,EO). Any spherical neighborhood of one of them includes the second point. LEMMA 2.5 If points a, b eRK determine the same hypernumber, then any spherical neighborhood of a contains b, and vice versa. Since Ra is the quotient space of R® , the topology T induces on Ra the definite topology 8, which is generated by means of the projections of the spherical neighborhoods. PROPOSITION 2.2 The topology 6 satisfies the axiom T2 , and thus, Ra is a Hausdorff space. Proof: Let us consider two arbitrary points a. and P from /?n that | a, -6, > k. It makes possible to choose an infinite set M of natural numbers such that for any me M the inequality | am -bm\>k\s valid. Let us take h — k/4 and consider two spherical neighborhoods O/,a and O/,b of the points a and b in R®. The projections p(O/,a) and p(O&6) of these neighborhoods will be neighborhoods of a and ft with respect to the topology 8. Moreover, p(Oha) n p(Ohb) = 0 . To prove this, we suppose that there is y e R^ and y is an element of the
M. Burgin set O/,a n O/,b . It implies that there are such points u, v e /?ffl for which p(w) = p(v) = y, p: /?"-> /?m is a natural projection, and u 6 O«a,, v e O/A Let w = (wi)/eo and v = (Vj)je(0. The equality p(w) = p(v) implies (cf. Section 2) that for the chosen number k) the following condition is valid: 3m ecoVi > m ( \u\ - v\ < k/3 ). The set M , which is determined above, is infinite. So, there is jeM, which is greater than m. For this j, we have \Uj - Vj | > \a-f - bj - \a, - u, \ - \b-} - Vj | > k - k/4 - k/4 = k/2 > k/3. It contradicts the condition w; - v,| < k/3. Consequently, the assumption is not true, and O/,o r\Oi,b = 0 . The proposition is proved because a and p are arbitrary points of/?ra. COROLLARY 2.1
Any limit in /?„, is unique.
This result makes possible to give an axiomatic description of R01. THEOREM 2.2 /?„, is the largest Hausdorff quotient space of the topological space R®. Proof: By Proposition 3.2, /?„ is a Hausdorff space. Thus, to prove the theorem, it is necessary to demonstrate that if a Hausdorff space X is a quotient space of R® with the projection q: /?" —> X, then there is a continuous projection v: R® —> X for which q = pv, i.e., the following diagram is commutative: P
R a —> R/ w / /' v
Let us consider such a Hausdorff space ^with the continuous projection q: R® -» X. Then, for any points x, yeA^the inequality x *y implies existence of neighborhoods Ox and Oy for which Ox n Oy = 0 is valid. As ^T is a quotient space of R"1, there are points a,b € R® for which q(a) = x and q(6) = y. The inverse images q"'(O^) and q"'(0j) are open sets because q is a continuous mapping. Besides, a eq"'(Ox) and b eq"'(Oy) because q(a) = x and q(b) = y. That is why, for some k, q"'(Ox) contains the spherical neighborhood Oka of a and q"1 (Oy) contains the spherical neighborhood Okb ofb. Let us suppose that p(o) = p(b). Then a e Qkb and b e O^a . As a consequence, x=q(a) e q(Ok6) c Oy and y = q(6) e q(Oka) c O^;. It contradicts to the condition that Ox n Oy = 0. Thus, for arbitrary x,y from X and such a, b from R" that ^rfoj = x and 9(6) = y, we have p(a) &p(b). It makes possible to define the mapping v: R^ -» Jf as follows: v(x) = p(a) for any x eX and for such a £ R1" that 9(0) = x. The definition of v implies that the mapping v is continuous because the topology in /?„, is induced by the topology in /?" and pv = q. Theorem is proved. This theorem shows that the set of real hypernumbers is a topological extension of the set of all complex numbers, while the set of hyperreal numbers, which is introduced in non-standard analysis, is a set-theoretical extension of the set of all real numbers [18, 19]. THEOREM 2.3 [36]. The topological space R^ is complete in the topology of pointwise convergence.
Hyperfunctionals and Generalized Distributions
89
2.4. Operations in R®. Operations in R induce corresponding operations in R®. Let a = (a, )I £ H and b = (hi )J E H are elements from R®. DEFINITION 2.6. a) Operation of addition in .R01: a + b = (c, )iea) where c, = at + b{ for all ieco; b) Operation of subtraction in R10: a - b = (c, )i(ECO where c, = a, - 6, for all ieco; c) Operation of multiplication in /?": a • b = (ct )-,ea where c, = a, • i, for all ieco. REMARK 2.1 By the definition of addition and multiplication in the set R®, all laws of operations with real numbers (commutativity of addition, associativity of addition, commutativity of multiplication, associativity of multiplication, and distributivity) are valid for corresponding operations with sequences of real numbers. These operations in /?ffl induce similar operations in /?„,. THEOREM 2.4 The set Ra of all real hypernumbers is an ordered linear infinite dimensional space over the field R of real numbers, in which the binary operations max and min are defined. Proof: To be a linear space over the field R, the set R^ has to possess to operations: addition + and multiplication by elements from R. 1. Let a, p e/?B . To define addition a + p = y in Ra, we take some sequences a = (a,)iera ea and b = (6, )je for some sequence b = (^)ieo) from p. If / = (/,)i£C) e P, then by Definition 2.1, lim^o, \b>\- lt \ = 0 . Consequently, lim^^ | cbj - cl\ = Hindoo |c|-|bj - /, = 0 . Then by Definition 2.1, cp = Hn(c/,)ie n. Consequently, aa. < ap. Let y = Hn(c,)i£(1) e /?B and a < p. By the definition of the relation < , there are such sequences a = (a-, )i£cl> e a and b = (Z>, )-lso e P, for which the following conditions are valid: there is such an element n from co that for any ; from co, / > n implies at < b\. Then a + y = Hn(a, + c, ) ieH and a + p = Hn(6, + c, ) i E U . By the properties of real numbers, we have a, + c, < b: + c, for all / > n. This implies a + y < P + y. 5. Let a, p e/?a . To define operations max(a + P) = 6 and min(a + p) = y in RK, we take some sequences a = (a,)je(0 eoc and b = (b( )ieo e P and determine the hypernumbers 5 = Hn(max(ai + £,)ieo) .and y = Hn(m/«(a; + bi)i 0 )) & (Vp e R 3i eco (a\ > p))\ 2) infinite decreasing if 3j eco Vi >j fa-H - a\ > 0 )) & (Vp e R 3i eco fa >p))\
3) infinite expanding if there are subsequences (£,)je ^(c,)iee> • PROPOSITION 2.3 Any finite real hypernumber is either a real number or an oscillating real hypernumber. PROPOSITION 2.4 Any infinite real hypernumber is either an infinite increasing number or an infinite decreasing number or an oscillating real hypernumber. REMARK 2.4 true.
For complex hypernumbers in general, Propositions 2.2 and 2.3 are not
REMARK 2.5 For an arbitrary partially ordered set H and H-real hypernumbers (Burgin, 2001), Propositions 2.2 and 2.3 are not true in general.
Hyperfunctionals and Generalized Distributions
93
PROPOSITION 2.5 If a, = Hn(a;)ieo) is a real number (an infinite increasing or infinite decreasing hypernumber), p = Hn(i,)ietl) , and for almost all i eo>, the element bt is between a\+\ and a\, then P is a real number (an infinite increasing or infinite decreasing hypernumber). REMARK 2.6 For oscillating hypernumbers this is not always true as it is demonstrated in the following example. EXAMPLE 2.7 The condition of the Proposition 2.4 is satisfied for the pair that consists of the finite oscillating hypernumber a = Hn(a,)jeci>, a, = (-1)', i = 1,2,... and for the real number P = Hn(i,)j £(a , 6, = 0 , i = 1,2, ... However, p is not an oscillating hypernumber. PROPOSITION 2.6 If a = Hn(a;)iec, is an oscillating hypernumber , P = Hn(&/)iero , and the following condition is satisfied Vi, j eco 3m, n €co (flj < a\ —> bm < «j < a\ < 6n)
then P is an oscillating hypernumber. The set of all rational numbers is dense in the space of all real numbers. This property gives us the following result. LEMMA 2.10 Any class of equivalent sequences contains a sequence, all elements of which are rational numbers. 2.6. Invariants of hypernumbers Hypernumbers have specific invariants. One of the most important of them is the spectrum of a hypernumber. DEFINITION 2.12 The spectrum Spec a of a sequence a = (aOiea is the set { reR ; r = lim b\ for some subsequence (A/)(eo) of a}. DEFINITION 2.13 The spectrum Spec a of a real hypernumber a = Hn(a,),£B is equal to the set Spec a where a = (a^iEco • EXAMPLE 2.8 If a = (a,)i6co and a = Hn(a,)ieo), where a,•,= i, i = 1, 2, ..., then Spec a = Spec a = 0. EXAMPLE 2.9 If a = (a,)ie„, of all rational hypernumbers, then we get the same set Rw as when we construct real hypernumbers with the set of all real numbers R. In particular, when we consider ordinary sequences, we build by this process rational hypernumbers. The set gm of all rational hypernumbers contains R. Thus, generating rational hypernumbers, we automatically obtain real numbers. Nevertheless, we construct real hypernumbers from real numbers because it makes the construction more transparent and helps to understand properties and behavior of hypernumbers.
3
SERIES AND INTEGRAL WITH VALUES IN HYPERNUMBERS
3.1. Sequences of real numbers In the universe of hypernumbers, it is possible to eliminate the concept of a limit for real numbers as it is demonstrated by the following result. PROPOSITION 3.1 A sequence / = (a,-)ie is a subsequence of a sequence (a,)j£0) if there is a strictly increasing function g: N -> N such that 6, = ag(;). There is an important criterion when the hypernumber of a sequence is a real number. THEOREM 3.1 Hn(a;)i60)e R if and only if Hn(a,)jeo = Hn(6j)(e(a for any subsequence (6/)i£,)ie(C1 as these hypernumbers have different spectra. If the hypernumber Hn(a,-)ie[0 is infinite, then there is a subsequence (6j)i6m of the sequence (a,)ieo) for which either bt —> oo with i—> oo or b,• —> - oo with ;'-» oo. Let us consider the first case. By the definition of b, —> oo with /'-> co, we can build a function
Hyperfunctionals and Generalized Distributions
99
g(i) such that b^ > 2bt = a. Then by the definition of a hypernumber, Hn(&,-)ie(0 ^ Hn(6g(;) )ie((). Consequently, it is impossible that at the same time we have both equalities Hn(fl;)ie/)ie(D and Hn(a()ieo = Hn(6gW )ieo). The case when bj->- of the sequence (6,)je k2. The proof of the Riemann theorem [41] or the properties of convergent series show that it is possible to take such permutation Qi that changes places only for a finite number of elements from S2. So, there is the least number hi such that Q2 does not move elements from S2 with larger indices. Let m2 = max { k2, h2 } and S3 = S/=m2+1°°a,. As the permutation Q'2 does not change elements of S2 with larger than m2 indices, 53 is a tail of S2 and thus, a tail of S. As the permutations Qi and Q'2 act at non-intersecting part of the series S, we can define the permutation Q2 = QrQ' 2 , which acts on S and changes the places only of those elements that have numbers less or equal to m2. In addition, we have | Z w m l ij - u\ \ < 1/2 and 11,,= m\+\'"2b\ - u2\ )2./(x)dx if and only if \(Rf) ij(x)&x = |(s>e) 2^)^ for any other partition Q of R that satisfies conditions 1) - 4).
104
M. Burgin
Let in addition, X= X-, for all /'eco. COROLLARY 3.2
The integral lgflx)dx of a function fix) exists and is equal to the
hyperintegral f(R,/>) i/(x)dx if and only if \(RJ>) ij(x)dx = 1(^0 2Ax)&c f°r anv otner partition QofR that satisfies conditions 1) - 4). REMARK 3.4 If instead of satisfying condition 4), the partition P is a system of tagged 8-fine partitions P, = { X t ] ; j = 1, 2, ... , / } [43], then the hyperintegral of the second type \(xf) 2,/Wd* of the function J(x) over the space X in the partition P coincides with (proper or improper) Kurzweil-Henstock or gauge integral [43]. REMARK 3.5 Similar construction of integration can be applied to abstract spaces with a measure as it is done for the conventional theory of integration in [46]. In its turn, the developed construction of hyperintegration can be applied to different multidimensional spaces, giving a new construction for path integrals, which includes such important for mathematics and physics constructions as Wiener integral, Ito integral, and Feynmann integral [16]. For hyperintegration in multidimensional spaces, approximations of the space in question are taken instead of space partitions.
4
EXTENDED DISTRIBUTIONS
If fix) is a function in X where X is an open subset of the space R, then )^/(x)dx is the Riemann integral (proper or improper) of this function on the whole space X. Let in what follows A" be some class of real functions such that for any continuous function X*) and for any function h(x) from A', the integral \xf{x) h(x)dx exists, and CX" be the space of all infinite sequences of continuous real functions in X. According to the tradition (cf, for example, [47]), we call K the set of test functions. DEFINITION 4.1 Two sequences of continuous functions (fn(x); weco} and (gn(x); 77eco} from CA*1 are equivalent with respect to K if linv^o) (fn(x) - gn(x)) h(x)dx = 0 for any function h(x) from A". DEFINITION 4.2 The classes of equivalent with respect to K sequences of continuous functions are called real extended distributions in X with respect to K. We denote by CXK(SI the class of all real extended distributions in ^fwith respect to K. Ed{fn(x); we00} denotes the real extended distribution that is defined by a sequence of continuous functions (fn(x); «eco}. REMARK 4.1 Here only Riemann integral is utilized for the definition of extended distributions. However, similar constructions may be developed and alike results obtained for other types of integral (Lebesgue, Stieltjes, Perron, Denjoy or gauge integral). In all cases, extended distributions are special kinds of extrafunctions.
Hyperfunctionals and Generalized Distributions
105
REMARK 4.2 It is possible to define extended distributions, taking sequences of functions from loc L2. THEOREM 4.1 The set CXKo> is an infinite dimensional linear space over R, which includes Ra as its subspace when A" is a subset of L. Proof: To be a linear space over the field R, the set CXKla has to possess to operations: addition + and multiplication by elements from R. 1 . Let F, G e CXKc> . To define addition F + G = H in CXKe> , we take some sequences of continuous functions {fn(x) ; weco} e F and (gn(x) ; «eco} e G and determine the extended distribution H = { fn(x) + gn(x) ; neco}. This procedure defines addition of sequences in CXKts> To show that this is an operation in CXKet , it is necessary to prove that H does not depend on the choice of sequences (fn(x) ; «eco} and (gn(x) ; «eco}. To do this, let us take another sequence {/„(*) ; weco} in G and show that if the extended distribution L is equal to Ed {/,(*) + ln(x) ; «eco}, then L = H . By Definition 4.1, limn^« I (/„(*) - g«(XO h(x)dx = 0 for any function h(x) from K. Consequently, lim^ f ((/"„(*) + /„(*)) - (/"„(*) + #„(*))) h(x)dx = linv^, f (/„(*) - gn(x)) h(x)dx - 0 for any function h(x) from K. Then by Definition 1 . 1 , L = H . As addition of sequences of real functions is commutative and associative, the operation of addition in CXKia is also commutative and associative 2. Let us take an arbitrary real number ce/? and an arbitrary real extended distribution F = Ed{fn(x) ; weco} from the class CXKes. We define cF as Ed{cfn(x) ; weco}. Let us take another sequence (ln(x) ; we 00} in F and show that if the extended distribution L is equal to Ed{ cln(x) ; «eco}, then L = cF . By Definition 4.1, lim,,^ ) (fn(x) - /„(*)) • h(x)dx = 0 for any function h(x) from K. Consequently, !«!!„_«, f (cfa(x) - c •/„(*)) • h(x)dx = limn^x c- f (fa(x) - ln(x)) • h(x)dx = c-limMOO f (ftt(x) - /„(*)) h(x)dx = 0 for any function h(x) from K as integral is a linear functional. Then by Definition 4.1, L = cF. It means that the definition of the product cF does not depend on the choice of a sequence from F. Consequently, multiplication by elements from R is defined in CXKa>. This multiplication inherits the commutativity, associativity, and distributivity properties from the corresponding operation in R". 3. Let a(x) denotes the function that is everywhere equal to a real number a. If a = Hn( a/)ieo> is a real hypernumber, it defines the sequence {a,{x) ; /ECO} and the extended distribution F = Ed {«,•(*) ; /eco}. lfb = (b,)i£a> is another sequence from a , then it defines the sequence (A,(jc) ; /eco}. By Definition 2.1, linv^ | ar bt \ = 0. Consequently, we have linij^oo I (a,(X) - bi(x)) h(x)dx = lim^ (a,- - bj) • | h(x)dx = ( | h(x)dx) • lim^ (a, - bi) < ( f h(x)dx) • limj^oo | o, -b-t - 0 . Thus, (b,{x) ; /eco} is also an element from F and the correspondence a = Hn( a, )J £M —> F = Ed{a,(x) ; /eco} defines an inclusion of Rw into CXg^ . 4. By Theorem 2.4, the linear space /?ra is infinite dimensional. Thus, the linear space CXK(il is infinite dimensional as it contain an infinite dimensional subspace. Theorem 4.1 is proved. Let K and H be some classes of real functions in X. LEMMA 4.1 I f K ^ H , then there is a projection P: CXK CXHa>
106
M. Burgin
Let X c Y c R and K and H be classes of real functions in X and Y, respectively, such that any function/from K is a restriction of some function from H. LEMMA 4.2 There is a projection P: CYHt, -> CA^ . Let JiT consists of functions with the compact support. THEOREM 4.2 The set CXfa> is a module over the algebra of all continuous functions in R. Proof: We need only to show that it is possible to multiply extended distributions by continuous functions. Let us take an arbitrary real continuous function g(x) and an arbitrary real extended distribution F = Ed{fn(x) ; KECO} from CXKlil. We define the class gF as Ed{ g(x)-fn(x) ; weco}. Let us take another sequence {ln(x) ; weco} in F and show that if the extended distribution L is equal to Ed{ gln(x); neco}, then L = gF. By Definition 4.1, linv^ I (fn(x) - /„(*)) h(x)dx = 0 for any function h(x) from K. By the initial condition, has a compact support. By the properties of the real line this support is a finite union of closed intervals. Then there is an interval [a, b] that contains the support of the function h(x). As g(x) is a continuous function, there is a number c such that c > g(x) for all x from the interval [a, b} . Consequently, limn_>K [ (g(x) fn(x) - g(x) •/„«) • h(x)dx = linwo f g(x) • (fn(x) - /„(*)) • h(x)dx < linv^ f c- (fn(x) - /„(*)) • h(x)dx = lim^oo c- f (fn(x) - /„(*)) • h(x)dx = c-lim,,^ f (fn(x) - /„(*)) h(x)dx = 0 as integral is a linear functional. Then by Definition 4.1, L = gF . It means that the definition of the product gF does not depend on the choice of a sequence from F. Consequently, multiplication by continuous functions is defined in CXKls>. This multiplication inherits the commutativity, associativity, and distributivity properties from the corresponding operations with functions. Theorem 4.2 is proved. We denote by PXKia the class of all real extended distributions in AT with respect to K such that each of them is defined by a sequence of polynomials. We have VXK<S1 c CXKo . The following result shows that to build all extended distributions in X with respect to K, it is sufficient to consider only sequences of polynomials. THEOREM 4.3 P^ = CXKta. Proof: a) Let us consider an arbitrary extended distribution F from CXKia . By the definition of the space CXKtl> , F = Ed{fn (x); n ecu }, where ally; (x) are continuous functions on X. By the Weierstrass theorem, each fn (x) is a limit of some sequence of polynomials {pnj (x); n eoo }, i.e., /„ (jc) = lim^x pnj (x). On each interval lk = [-k, k] where k~ 1,2,3, ..., these polynomials uniformly converge tofn(x). It makes possible to correspond to each function/,)*) such a polynomialp n j( n )(x) that \\fn (x) -/?„, j(n) (x) \\/n = max {|/« (x) -pn,j(n) (x) ; * € / „ } < l/n .
Let us demonstrate that the sequence of polynomials {pn,j(n) (x)', n eco } defines the same extended distribution as the initial sequence of functions {fn (x); n eco }, that is, if G = Ed{pnJ(n) (x); H eo> }, then F= G. Let h(x) be an arbitrary function from K. By the initial condition, has a compact support. By the properties of the real line this support is a finite union of closed intervals. Then there is an interval [-«, n] that contains the support of the function h(x). This gives us the following inequalities limn^ I (/,(*)-pB>yw (x) )•/j(x)dx < linvw I (/,(*)PnjM W) h(x)\dx < linrw, ( f \ (x) - pnj(n) (x) ;xe !„} < \ln for all n eco. Consequently, lim,,^) (fn(x) - pnj(n) W) AWdx = 0. This means that F = G as h(x) is an arbitrary function from K. Theorem is proved. This result allows us to define derivatives for extended distributions. Let us suppose that an extended distributions F is defined by a sequence {fn(x); weco}, i.e., F = Ed{fn(x); «eco }. The sequence is called an approximation of the extended distributions F. Let us assume that all functions/,(x) have the first derivatives/, '(x). DEFINITION 4.3 The extended distribution F' = Ed{fn '(x); weeo } is called a sequential derivative of the extended distribution F with respect to the approximation {fn(x); «eco}. Thus, from Theorem 4.3, we have the following result. COROLLARY 4.1 Any extended distribution from CA^ has a sequential derivative. Extended distributions are built in the same way as distributions in the sequential approach [34], only instead of fundamental sequences, all sequences of continuous functions are taken. This implies a natural correspondence between distributions and extended distributions. It is possible to show that this correspondence is an isomorphic inclusion. However, this is done in the next section, where we consider functional approach to generalizing distributions, because results that are obtained there essentially simplify the proof.
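To make this sequential machinery concrete, the following is a minimal numerical sketch (not taken from the paper): an extended distribution is represented by a sequence of continuous functions f_n, its action on a test function h is the sequence of integrals of f_n h, and the sequential derivative of Definition 4.3 is obtained by differentiating the terms of the sequence. The particular sequence f_n (smooth approximations of a step function) and the test function h are illustrative assumptions only.

```python
# A minimal numerical sketch (not from the paper): an "extended distribution"
# is represented by a sequence of continuous functions f_n, and its action on
# a test function h is the sequence of integrals  a_n = integral of f_n * h,
# which defines a hypernumber Hn(a_n).  The choices of f_n and h below are
# illustrative only.
import numpy as np

x = np.linspace(-5.0, 5.0, 20001)          # integration grid
dx = x[1] - x[0]

def f(n, x):
    """n-th term of the approximating sequence: smooth step functions."""
    return 0.5 + np.arctan(n * x) / np.pi

def h(x):
    """A continuous test function with compact support in [-1, 1]."""
    return np.where(np.abs(x) < 1.0, (1.0 - x**2)**2, 0.0)

for n in (1, 10, 100, 1000):
    a_n = np.sum(f(n, x) * h(x)) * dx          # integral of f_n * h
    fprime = np.gradient(f(n, x), dx)          # term-wise (sequential) derivative
    b_n = np.sum(fprime * h(x)) * dx           # integral of f_n' * h
    print(f"n={n:5d}   int f_n h dx = {a_n:.6f}   int f_n' h dx = {b_n:.6f}")
```

With these choices the second column stabilizes near h(0) = 1, so the sequential derivative of this particular extended distribution acts like the delta distribution on test functions.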
5 SERIES AND INTEGRALS WITH VALUES IN HYPERNUMBERS
Let A'be some class of real functions. DEFINITION 5.1 A real hyperfunctional F on A" is a mapping of A" into the set KM of real hypernumbers (F:K-*Ra). Hyperintegrals introduced in section 3 are hyperfunctionals.
We denote by K° the set of all real hyperfunctionals on K. If Fe K° and/e K, then application of F to/is denoted by F(f) or by (F,f). Operations in the space Rm determine corresponding operations in K°. Let F, G: K -> flM and H: K -> R. DEFINITION 5.2 a) Operation of addition in K°: (F + G,J) = (F,f) + (G, f) for all/e K, b) Operation of multiplication in K°: (H-F,f) = (H,J) • (F,f) for all/e K. Theorem 2.4 implies the following result. THEOREM 5.1 The set K° is a module over the algebra of all functionals on K and a linear space over R, in which the binary operations max and min are defined. REMARK 5.1 Some properties of hyperfunctionals are described in [15]. In particular, a generalization of the Banach-Hahn theorem is given. DEFINITION 5.3 A hyperfunctional F on A" is called linear, if for any two functions^*) and h(x) from K and for any numbers a, b e R, we have F(af+ bh) = aF(f) + bF(h). Functionals are related to distributions. Namely, a distribution is a linear continuous functional [48]. Extended distributions from the previous section define hyperfunctionals in a natural way. Namely, if/= Ed{/£c); /eco} is the extended distribution defined by a sequence (/fct); /eco} and h(x) is a test function from K, then we correspond to/the following hypernumber: (f, h) = Hn(a/)jem where a/ = \ffa) h(x)dx, /eco. If {g,(x); /eco} is another sequence that determines the extended distribution/, then it gives another hypernumber Hn(£;)iEM where bt = I g,{x) h(x)dx, for all /'eco. However, lim,-^ | a, - b-\ = lim,^ | f fi(x) h(x)dx - ] g{x) h(x)dx \ = lim^ | f (f,{x) - g,{x)) h(x)dx \ = 0 because the sequence(g,(x); /eco} defines the same extended distribution as the sequence (ffcc); /eco}. By Definition 2.1, Hn( [ (F;W - F;n(x)) Uinm(^\x)6x -> 0 when w -> co. Consequently, for all m e m, 1 1 g,n+(x) w;nm(x)dx| = f gin(x) uinm(x)dx\ —> 0 when n —> oo. Our next step is to show that | I gin+(x)dx \ —> 0 when « —^ oo. Suppose that this is not true. Then there is some number be R++ such that we have | g,n+(;t)dx | > b for all n e o>.
We can find such natural number q that b > (21q). Thus, | f gin+(x)dx \ > (21q) for all « e CO.
Let us consider the integral \ (gln+(x) - gin+(x) uinm(x) )dx = f gin+(x)(\Xin - uinm(x))dx. The condition u({ * e I, ; 0 < uinm(x) < 1 })
> (2/q) and taking into account that | I gin+(x)dx \ = I gin+(x)dx \ as
gin+(x) is a non-negative function, we have I gin+(x) ulnm(x) dx > (l/q) for all n e co. This contradicts to the property that for all m e co, || gin+(x) uinm(x)dx\ -> 0 when n -> oo. In a similar way, we show that | f gin.(x)dx \ —> 0 when « —> co. This makes possible to find such number TW(&) that | f g,>,.(x)dx | < (1 /2k) and | f gin+(x)dx | < (l/2fc) for all « > Now let us take the sequence {F^m^y, weco } from the system of regular distributions Fin that was considered at the beginning of the proof and define the hyperfunctional G = Hn(Ffc m(i) )iceia that is, for each function h(x) from K the value G(h) of G is equal to the hypernumber Hn(Fti m(t)(/0),tea> • Since all Fit mW are regular distributions, G belongs to the space LRD '. To complete the proof, we need to show that F = G. This equality means that for each test function h(x) from K, the value F(h) of F is equal to the value G(h) of G . In other words, we need the equality of hypernumbers To prove this equality, let us consider the difference F^h) - F^ m(^(h) for an arbitrary function h(x) from K and such k that the support of h(x) is a subset of the interval lk . Since h(x) has a compact support, this inclusion is true for almost all numbers A: e co. By our construction, the distribution Ft on It is equal to the derivative fk^\x) of some infinitely differentiate function f/(x). Then we have Fk(h) - Fkt m(k)(h) = h(x))-\lk>m(t)(x)(™h(x)dx
= (f_k^{(r(k))}(x), h(x)) − ∫ f_{k, m(k)}(x) h(x) dx = (f_k^{(r(k))}(x), h(x)) − ∫ ū_{k, m(k)}^{(r(k))}(x) h(x) dx,
where F_{k, m(k)}(h) = ∫ f_{k, m(k)}(x) h(x) dx and ū_{k, m(k)}(x) is the r(k)-th
antiderivative of the function fk<m(k)(x) for all numbers k, n e co. Since the function h(x) has a compact support, which is a subset of I* , and integral is a liner functional, we have I aWt))W, h(x)) - f 4, m(k)(x)«k» h(x)dx I = I (Mx), A«*» (x)) - f 4, x«W AWt))Wdx I = f /t(x) A«*»(^) E b e a function. Its values X(u>,t,B) for u; e fi,t e R+,.B e L, will be also denoted by Xt(u,B). Such a function will be automatically considered extended to O x R x L with Xt(u,B) = 0 for t < 0. The nitration will be also extended automatically with ft = FQ for t < 0. Definition l.l.a) The function X is called a set function. b) X is said to be measurable, if for every B € L, the process (w,i) H-> Xt(w,B) is measurable with respect to the a—algebra M = FB(1&). c) ^ is said to be adapted to the filtration (J-t), if for every t € R and every B & L, the random variable u H-> Xt(w,B) is ft—measurable. d) X zs sazd io be right continuous (resp.cadlag) if for every u> E Q and B £ L the function t >—> Xt(u,B) is right continuous (resp. right continuous and has left limits). e) X is said to be a p—process-measure, if it is a—additive in LPE, i.e. for
every t G K and BEL, the random variable X t ( - , B) belongs to LPE, and for every t e K, the function B H-» Xt(-B) from L into LPE is a—additive. If the space E is separable and if X is right continuous and adapted, then X is measurable.
2. The measure I_X
Let X : f£ x E+ x L —> .E be a cadlag, adapted, p—process-measure. We define the set-function Ix '• ft x L —> L^ C L(F,LPG}, first for predictable rectangles in ft, by /x(A x {0} x B} — I A X O ( - , B ) , for A <E -Fo.-B e L, and /x(4 x (s,t] x 5) = 1,1 (^(-,5) - X B (-,5)), for A e ^ and B e L. For each fixed B € L, the mapping (7 »-> Ix(C,B) is finitely additive on the semiring of predictable rectangles. It can be extended to a finitely additive measure C >—» Ix (C, B) for C in the ring ft generated by the predictable rectangles. For each fixed C e ft, the mapping j3 H-» Ix(C,B) is cr—additive in L^ on L. We obtained a set function (C1, B) H-» Ix(C, B) from the semiring ft x L into L^, which is, separately, additive in the first argument C, and a—additive in the second argument B. If we write IX(C x B) = IX(C,B) for C e ft and B <E L, then /x is (jointly) finitely additive on the semiring ft x L, therefore it can be extended to a finitely additive measure Ix '• 7" (ft x L) —> Lg, on the ring generated by ft x L. The finitely additive measure nx '• r(P x L) —> E defined by nx(M) = E(Ix(M)), for M G r(P x L) is called the Doleans measure associated to X. The following problems arise: 1) Can Ix be extended to a set function Ix '• P x L —> L^, which is separately LPE C L(F,LPG] defined in the preceding section. Definition 1.2. We say that the adapted, cadlag, p— process-measure X : fi x M+ x L —> E is p—summable relative to the embedding E C L(F,G), or with respect to (F,G), if Ix can be extended to a a— additive measure Ix '• P ® L —» LPE C L(F,LPG} with finite semivariation (!X)F,LP • Assume X is p—summable relative to (F, G) and consider the extended a— additive measure Ix : P L —> LPE C L(F,LPG} with finite semivariation (IX)F LP • We can then use the general integration with respect to a— additive measures with finite semivariation presented in [Dl], §5, to define the integral / Hdlx and the stochastic integral H • X. This is done in the following way: Let —I— = 1 and let Z C LqG, be a norming space for LPG. In particular, we can take X = LqG*. For each z 6 Z, consider the measure (Ix)z : 'P <E> L —> F* defined by < x, (IX)Z(M) >=< Ix(M)x, z >, for x 6 F and M <E P L, where the first bracket represents the duality between F and F* and the second bracket represents the duality between LPG and LqGt . Each measure (Ix)z has finite variation |(/x)z| and we have = sup
/x
IMI, F we define the seminorm (IX}(H)
=
r
sup / \H\d\(Ix)z\ < +00.
l|z|U F with (Ix)(H) < oo. Then FF,G(!X] is a vector space, complete for the seminorm Ix(H). (See [Dl], Corollary 5.25). For H € FF&^X) we define the integral / Hdlx in the following way: We have
Let H € FF,G(!X] and z 6 Z. Then H 6 L l F ( ( I x } z ) , hence the integral / Hd(Ix)z is defined and is a scalar (see[Dl], 2.23). The mapping z H-> J Hd(Ix)z is a continuous linear functional on Z: , for z € Z.
We denote this linear functional by / H dlx • We have / H dlx e Z* , < / Hdlx,z>= t Hd(Ix)z,
forztZ
and j
HdIx\ 0 and B € L we have Ijo^xB-^ 6 FF,G(!X} We denote /[0,t]xB
Assume JJ0 t^xBHd!x £ LPE for every i > 0 and B € L. We denote by the same symbol the equivalence class JJ0 t , xB Hdlx in -^G> as well as any random variable belonging to this equivalence class. If we choose a representative from each equivalence class, we obtain a process-set function
'•
Hdlx}(u) /
with values in G. This process-set function is automatically adapted ([D2], theorem 3.2). Definition 1.3. We denote by L1FG(X] the subspace of FF,G(!X} consisting of the processes H satisfying the following two conditions: 1) /[0it]xB Hdlx € LPG, for t € R+ and B 6 L. 2) The process-set function (f\(j{\XQHdIx)t>Q^efj tion, for each B G L.
has a cadlag modifica-
Any cadlag moditifaciton is called the stochastic integral of H with respect to X and is denoted H • X: (H-X)t(u,B)
= (f
Hdlx)(u),a.s.
J[Q,t]xB
It follows that for each B, the stochastic integral is denned up to an evanescent set, i.e. a subset of N x E+ x L with N negligible (with respect to P). II. Classes of summable process-measures 1. Preliminaries To prove that a process-measure X is summable, it is necessary to prove first that the associated measure Ix can be extended to aCT—additivemeasure on the whole a—algebra P ® L.
If (L, £) is an arbitrary a—algebra, it is not known whether or not even the increasing process-measures are summable. But if L — E™ and £ — B(L), the Borel cr-algebra of L, we can prove that the following classes of process-measures are summable: 1. Process-measures with integrable variation. 2. Process-measures with integrable semi variation. 3. Orthogonal martingale-measures with values in a Hilbert space. Without loss of generality, we shall restrict ourselves to the case L — M. To prove this, we shall use the following strategy: A. We prove first that a positive increasing, integrable, right continuous process measure is summable, by proving that its Doleans measure /j,x(M) = E(IX(M}) is cr—additive for M 6 'R- x £, hence it can be extended to a a—additive, positive measure on the a-algebra P £. This is done by associating to X a two parameter process F which is incrementaly increasing, right continuous and integrable. In this way we reduce the summability of X to that of F. B. If X is and E—valued, right continuous process-measure with integrable variation \X\, then \X\ is a positive increasing, integrable, right continuous process, hence p,\x\ is cr-additive. We prove that Ix « ft\x\ and use the extension theorem for vector measures, stated below. For process-measures with integrable semi variation, the reader is referred to [D3]. C. If M is an orthogonal martingale-measure with values is a Hilbert space E, we associate to it the sharp bracket < M > which is a positive increasing, integrable, right continuous process-measure, hence /^<M> is cr—additive. We prove that Ix « M<M> and use the following extnesion theorem for vector-valued measures.
Theorem 1. Let K be a ring on a set T, m :Tl —> E a finitely additive measure and /x : 72. —> Ti+ a positive, a— additive measure such that m « /i. a) Then m and p, can be extended uniquely to a—additive measures m' and // respectively on the 5—ring T> generated by ~R, and we still have m' « ^. b) If [i is bounded on 7i, then m' and // can be further extended to a—additive measures on the cr—algebra E generated by Ti. c) Ifm is cr—additive and has finite (resp. bounded) variation \m\ on Ti, then m can be extended to a a—additive measure m' : V —> E (resp m! : E —> E) with finite (resp. bounded) variation.
For the proof, we refer to [Dl], Theorems 7.3 and 7.4. We start by presenting first, functions of two variables and two-parameter process, which will be needed in the proof of case A above.
D. Functions of two variables
Prom now on we shall assume that L — R and L = 6(R), but we maintain the notation L and L. We consider on R x L the usual order relation: z = (t, x) < z' = (t1, x'), ift2
We say g is cadlag if it is right continuous on R x L and at every point z e R x L it has left limit lim
z'—>z,z' R is said to be increasing if z < z' implies g(z) < g(z'). The function 5 is said to be incrementally increasing if z < z' implies &z,z>9 > 0. Definition 2.3. The variation of a function g : R x L —> E on a rectangle R (not necessarily bounded, or of the form (z, z1}) is the number var(g,R) = supP where the supremum is taken for all finite partitions P = (Ri)i£i consisting of rectangles Ri = (zi, Zj\ with vertices in R.
The variation function \g\ is defined on R x L by \g\(z) = var(g,(—oo,z\) for z £ R x L Sl(-oo)=0 M(+oc) =
We .shall use the following properties of \g\: If g has finite variation \g\, then \g\ is increasing and incrementally increasing. If g has finite variation \g\ and if g is right continuous, then \g\ is right continuous. For a detailed account see [D], §29.
E. The measure associated to a function of two variables
Let g :M. x L —> E ~be a, function. We associated to g a finitely additive measure mg on the semiring of rectangles (z, z'}, defined by mg(z,z'} = &z,z>g. Then mg can be extended to an additive measure mg : 72. —> E1 on the ring 7£ generated by the rectangles (z, z'}. If g : R x L —> R is real valued, we have mg > 0 if g is incrementally increasing. If g has finite variation \g\, then mg has finite variation \mg\ on "R, and \mg\ =m\g\.
(See [Dl], Theorem 29.48) The a— additivity of mg is ensured by the following theorem, (see [Dl], Theorem 29.50). Theorem 2. If g :R x L —> E is right continuous and has finite (resp. bounded) variation \g\, then mg can be extended to a cr— additive measure m : T> —> E (resp. m : B(R.) L —> E) with finite (resp. bounded variation) ml and we have
\m = \mg\ = "in\g\, on 'R,. In particular, if g : M x L —> R is right continuous and incrementally increasing, then mg is positive andCT—additiveand can be extended uniquelly to a positive, cr—additive measure on #(R) . E b e a right continuous function with finite variation \g\, and let mg be the associated measure with finite variation \mg\ = rn\g\-
Consider the space L^(|m s |) of the functions / : R x L —> F which are Bochner integrable with respect to \mg\. We denote LlF(mg] := LlF(\mg\}. The integral J fdmg 6 G is defined first for simple functions / in L^m^), and then it is extended by continuity to the whole space LlF(mg}. We denote ^(g) := LlF(mg), and for any function / e LlG(g) we define the Stieltjes integral J fdg by fdg=
fdmg.
If / € LlF(g], we say also that / is dg—integrable.
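The increment-and-integral machinery just described can be illustrated with a small numerical sketch (not from the paper): the rectangle increment that defines m_g, a check of its finite additivity, and a Riemann-Stieltjes sum approximating the integral of f with respect to dg. The particular functions g and f below are arbitrary illustrative choices.

```python
# A small illustrative sketch (not from the paper): the rectangle increment
# Delta_{z,z'} g = g(t',x') - g(t',x) - g(t,x') + g(t,x) defining m_g, a check
# of its finite additivity, and a Riemann-Stieltjes approximation of the
# integral of f with respect to dm_g over a grid of small rectangles.
import numpy as np

def g(t, x):
    # incrementally increasing example: a product of two increasing functions
    return (1.0 - np.exp(-t)) * (1.0 / (1.0 + np.exp(-x)))

def incr(t, x, t2, x2):
    """m_g of the rectangle ((t, x), (t2, x2)]."""
    return g(t2, x2) - g(t2, x) - g(t, x2) + g(t, x)

# finite additivity: the increment over a rectangle equals the sum over a split
whole = incr(0.0, -1.0, 2.0, 3.0)
parts = (incr(0.0, -1.0, 1.0, 0.0) + incr(1.0, -1.0, 2.0, 0.0)
         + incr(0.0, 0.0, 1.0, 3.0) + incr(1.0, 0.0, 2.0, 3.0))
print(abs(whole - parts) < 1e-12)

# Riemann-Stieltjes sum for the integral of f dm_g over (0, 2] x (-1, 3]
def f(t, x):
    return np.cos(t) * x**2

ts = np.linspace(0.0, 2.0, 201)
xs = np.linspace(-1.0, 3.0, 201)
T0, X0 = np.meshgrid(ts[:-1], xs[:-1], indexing="ij")
T1, X1 = np.meshgrid(ts[1:], xs[1:], indexing="ij")
integral = np.sum(f(T0, X0) * incr(T0, X0, T1, X1))
print(integral)   # converges to the Stieltjes integral as the grid is refined
```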
G. Two parameter processes
Let . F i f i x M x L — > £ b e a two parameter process. We say that F is right continuous (resp. cadlag, incrementally right continuous) if for every u; 6 fi, the function of two variables ( t , x ) H-+ F(uj,t,x) has the same property on R x L. I f F r f i x R x L — > R i s real valued, we say F is increasing (resp. incrementally increasing), if for every ui € Q, the function (t,x) t—> F(u),t,x) has the same property. For every z = (s, t) e R x L we denote fz — ft. We say that the process -F is adapted to the filtration (J-z) if for every point z = (t,x) G R x L, the random variable a; t-> F ( u j , t , x ) is J>— measurable. The following theorem associates a R 6e a h#M continuous, F ® B(R) <S> L— measurable, positive, incrementally increasing process such that Fx = supzFz is P— integrable. Then there is a positive, finite, a— additive measure IJLF '• F ® #(R) i —» M+ satisfying HF(C) =E(j lcdFz), for C e ^O B(M) ® £. «/
(See [Dl], Theorem 30.4).
H. The measure I_F
Let F : f i x R x L — > £ b e a two parameter, cadlag, adapted process such that Fz 6 Lp for every z € K x L. For every set A x (z, z'] with A € J^ we define
Then IF can be extended to an additive measure IF : 71 —> LPE on the ring Jl generated by the predictable rectangular sets A x (z, z'} with A e fz. Definition 2.6 We say that F is p—summable relative to (F,LPE} if IF can be extended to a a—additive measure IF '• P <S> L —» LPE, with finite semivariation relative to (F,LPE). A first example of summable two parameter processes is that of the incrementally increasing ones. Theorem 4. Let F : f 2 x R x L — > R be a right continuous, adapted, integrable, incrementally increasing process. Then F is I—summable relative to (R, L), and we have ), forC £P®L. Proof: By theorem 2.5, the measure fj,p : F B(R) L —> R + defined by /^(C1) = £( / Ic^z), for C e .F ® fi(R) ® L is positive, finite and cr—additive. For C = ^4 x (z, z'] with A 6 fz we have
j
e /F
= IF(C) > 0,
hence,
Since IF is additive on 7£, the measure (7 i-» E(Ip(C)) is also additive on 7?.. Since both measures /j,p and E(Ip(C)) are additive on ~R and equal on the semiring of rectangular sets, they are equal on Ti: = E(IF(C}} = \\IF(C)\\i, for C € 71. It follows that IF « p>p on TL. Since /ijr is cr— additive, by the extension Theorem, 2.1, Ip can be extended to a cr— additive measure 7p : P ® L —> Ll and we still have
III. Process-measures with integrable variation 1. The variation of a process-measure L e t X r f i x E x L — > . E b e a process-measure. Definition 3.1. a) Let u> £ fi. For every interval I C R and every set B G we define the variation of the mapping X.(LJ, •) on I x B by
var(X(u),I x B) = supj £ \Xti+1(u,Bj) where the supremum is taken for all divisions to < t\ < • • • < tn of finite points from I and for all finite families (Bj)j&j of disjoint sets from L contained in B. b) The variation process measure \X\ as defined by \X\t(u,B) = var(X(u),(-oo,t] x B), forui <E ft,i € R and B e L and \X\X = —» sup|A"| t . C^iK c) We say X has finite (resp. bounded) variation if for every u € fi, the function (t,B) i-> \ X \ t ( u j , B ) is finite (resp. bounded). d) We say X has p— integrable variation if the function u> >—> ^^(u^L) belongs to Lp. If p = I , we say X has integrable variation. We list a series of properties of the variation. 1. If X is measurable and right continuous and has p—integrable variation \X\, the X is right continuous in LPE, i.e., for each .B € L, the mapping t i—> X t ( - , B) from R into LPE is right continuous. 2. If X is measurable and pathwise a— additive in E on L, and hasp— integrable variation, then X isCT—additive in LPE on L, i.e. X is a process-measure. 3. If X is right continuous and has finite variation, \X\, then \X\ is also right continuous ([Dl], Theorem 1.3. a) 4. If X is pathwiseCT—additive in E on L and has finite variation \X\, then the variation is also pathwiseCT—additive in E on L. ([Dl], Theorem 1.3b). 5. If X is measurable and pathwiseCT—additive in E on L and hasp— integrable variation \X\, then \X\ is cr-additive in Lp on L. Use properties 4 and 2. 2. Summability of increasing process-measures The following theorem associates to an increasing process-measure X a two parameter process F, thus reducing summability of X to that of F. Theorem 5. Let X : Q x M x L —> R+ be a positive, increasing, right continuous, I— process-measure. There is an increasing, incrementally increasing, right continuous, two parameter process F : f) x M x L —> R+ satisfying F ( u } , t , x ) = Xt(ui, (— oo,x]),a.s., /or t € E and x 6 L. // X is adapted, then F is adapted.
We have Ip(C] = Ix(C) for C in the ring generated by the sets A x (t,t'\ x (x,x'\ with A E T,t < t' in R,x > x' in L. Proof: If B n 5' = 0 in L, then for each t e R we have Xt(;BuB') = Xt(;B) + Xt(;D'), in L1, hence X t (w, 5 US') =**(t,t'n>t and x'n —> x, x'n > x. If ui $ N, then
,tn,xn) = F(w, i,x); hence F is right continuous at (t,x). If w 6 N, F is also, evidently, right continuous. Finally, we prove that F(u,t,x) = Xt(u,(-oo,x}),
a.s., for i 6 R and x 6 L.
Letui^N and (t, x) e E x L. Then
F(u>,t,x) = Hint' It x' I x
t',x' e QG(u,t',x') = limlimG^,^,^) x'lx t'lt
= limlimX t /(o;, (—00,0;']) = lim Xt(u, (— oo,x']). x'lx t'lt
x'lx
Hence, if x'n I x with x'n € Q, we have F(u,t,x) = HmXt(ijJ,(-oo,x'n),
for u; Xt(-,B) is cr— additive in L1 on L, therefore -,(-oo,a4]) = A" t (-, (-00, x]), in L1. It follows that the two limits are equal a.s.: F(u,t,x) = Xt(u, (-co, x]), a.s. Evidently, if X is adapted, then F is adapted. To prove the equality IF — Ix, let C = A x (t, t'} x (x, x'] with A t' in R and x < x' in L.
Then
IF(C) = = 1A
P(; t, X) + F(; t'x'} - F(; t' x) ~ F(; t, z')
= Ix(Ax(t,t'}x(x,x'})
=,
=
Since both IF and Ix are additive, they are equal on the ring generated by the sets A x (t, t'} x (x, x'} with A £ F, t < t' in R and x < x' in L. Remark. We have a.s., F(UJ,OO,X) :=—> supF(uj,t,x) =—> supXt(uJ, (— oo,x]) = Xoo(u;, (—00, x]); t
t
F(cj,i,oo) :=—> supF(w,i,x) =—> supX t (u;, (— oo,x]) = .^(c^L); F(w, 00,00) = ^(w, L). Theorem 6. Lei X : fi x R x L —> R+ be a positive, increasing, right continuous, adapted, integrable, 1-process measure. Then X is 1-summable relative to the embedding R C Z/(R,M). The measure Ix is a— additive and has finite variation \Ix\Proof: Let F : f2 x R x L —> R + be the function associated with X by the previous theorem. Since X is integrable we have F(-, 00,00) = X 00 (-,L) G L1. By theorem 2.5 there is a positive, finite, a— additive measure fip • F ® #(R),L -> R + satisfying lcdFz), for C 6 ^ ® B(R) ® L. For C in the ring generated by the sets A x (t, t'] x (x, x'} with A £. ft,t i|; with finite variation Then /x has finite semivariation (Ix)pL1 > hence X is l—summable relative to (F,G). IV. Orthogonal martingale-measures 1. Martingale measures Definition 4.1. A process-measure M : fi x R+ x L ^> E is called a martingalemeasure if for each B £ L, the process (Mt(B))t>a is a martingale. If for each t > 0 and B 6 L we have Mt(B] 6 L^., we say that M is a p— martingale-measure. A p— martingale-measure is not necessarily summable; but if it is, then the following theorem states that the stochastic integral H • M is again a martingale-measure. Theorem 8. LetM : f2 x E+ x L —> E be a martingale-measure, p— summable relative to (F,G) and let H e FF,!? sucn ^a^ fioflxB-HdlM £ LPG for all t > 0 and B € L.
Then H£L1FLP(M) ' G
and H-M is a uniformly integrable martingale-
P
measure bounded in L G.
For the proof see[Di-Mu 1] theorem 2.15 and Corollary 2.16. 2. Orthogonal martingale-measures
From now on we shall assume that E is a Hilbert space. Definition 4.3. An E—valued orthogonal martingale-measure (OMM} is a cadlag, adapted, square-integrable martingale measure M : fi x E+ x L —> E satisfying the following two conditions: a)Mo(-B) = 0, for every B € L. /?) for every B,B' e L disjoint, the martingales M(B) and M(B') are orthogonal, i.e., their sharp bracket vanishes: < M(B},M(B'} >t= 0, for every t> 0. Condition a) allows to extend M with Mt(B) = 0 for t < 0 and B & L. If we set Ft — FQ for £ < 0, then the extended process-measure is still an OMM. Condition /3) is equivalent to the condition that the inner product process, denoted Mt(B)Mt(B'}, to distinguish it from the sharp bracket, is a martingale for each B, B' 6 L disjoint. 3. The sharp bracket process-measure.
Let M : Q x M+ x L —> E be an orthogonal martingale-measure. For each set B £ L, the sharp bracket < M(B) >=< M(B},M(B] > is an increasing, positive, right continuous, predictable process-measure with < M(B) > 0 = \M0(B)\2 = 0 and < M(B] >oc integrable, such that \M(B)\2- < M(B) > is a martingale. We denote by < M > the process-set function defined by < M >t (u, B) =< M(B) >t (u), for uj e ft, t e 7^+ and B e L.
Theorem 9. // M is and E—valued OMM, then < M > is a positive, increasing, right continuous, predictable, 1—process-measure. Proof: For each t < +00 and B € L we have E(< M >t (B)) = E(< M(B) >t) = E(\Mt(B) 2 ) < oo, hence < Mt(-,B) <E L1. Since < M >oo (•, B) € L1, we deduce that < M > is integrable. For each t < +00, the mapping B H->< M >t (B) from L to L1 is additive. In fact, if B, B' £ L are disjoint, then Mt(B U B1) = Mt(B] + Mt(B'), in L\
hence Mt (B U B') = Mt (5) + Mt(B'),&.s. Then, a.s. < M(B U B1) >t=< Mt(B) + Mt(B') >= =< M(B) >t + < M(B') >i +2 < Mt(B),Mt(B') =< M(B) >t + < M(B') >t,
>=
since < M(B),M(B') >= 0. If (Bn) is a sequence from L with Bn j 0, then Mt(Bn] -> 0 in L\ for each t > 0. It follows that E(< M >t (Bn)) = E(< M(Bn) >t) = E(\Mt(Bn)\2) -> 0 hence < M >t (Bn) —> 0 in L1, consequently < M > is a 1— process-measure. Corollary 1. If M : fl xR x L —> E is an OMM, then the process < M > is l—summable, hence the Doleans measure //<M> is a— additive onP ® L. 4. Orthogonality of IM • Theorem 10. Let M : fi x E+ x L -> £ 6e an OMM. aj /M can 6e extended to a a— additive measure IM '• P i —> L|;.
c)For every disjoint sets C,C' £P ® L and a, a' € M, we /iane
d)Assume M is real valued and D is a Hilbert space. For every disjoint sets C, C' € P L and for every x,x' £ D we have IM(C)x±IM(C')x',
in L2D.
Proof: The proof will be divided into several steps. We prove first that if C, C' 6 r(Ti x L) are disjoint, then IM(C) ± IM(C'), in L\. Assume C = A x B and C" = A' x B' with A, A' 6 ft and J5, £' e L; assume Ar\ A' — (j) and prove IM(C) -L /M(C") in L|;. This is done considering successively the following cases: yl = L> x {0} and A' - D' x {0} , with D, £>' 6 ^0; A = D x {0} and A' = D1 x (s',t'} with D e ^0 and £>' € J17,; A = D x ( s , t ] and A' = D' x (s', t'} with £> 6 ^s and D1 e J^/; ^4, A' e ft are finite, disjoint unions of predictable rectangles of the above form.
Assume now C — A x B, C' = A' x B' with A, A' € H and B, B' e L; assume B n B' = 4> and prove IM(C) -L IM(C), in L|. This is done considering successively the following cases: A = A' = D x {0} , with D e FQ]
A = A' = D x ( s , t ] , with D 6 .Tv, j4, A' G 7?., finite, disjoint unions of sets of the above form. The details of the above proofs are left to the reader. Assume now C, C' 6 r(U x L) and Cr\C' = H/AfCC 1 )!^ is additive on r(72. x L). In fact, let C\,Cz be disjoint sets from r(7£ x L) with union C. Then /M(CI) -L IM(CI) in i|;, hence
We have M<M>(C) = ||/M (OHi, for C € r(ft x L). In fact: If C = A x {0} x B with A e F0 and B 6 L, then, IM(C) = 0 and I<M>(C) = 1A< M(B) >0= 0 hence \\IM(C)\\l - 0 and E(I<M>(C)) = 0 and the equality is proved in this case. Let now C = Ax(t,t']xB with A^ft and B £ L. Then
(/ - < M(B) = E[lA(\Mt,(B)\ -\Mt(B)\2)] = = E[lA\Mt,(B] - Mt(B)\2} = E(\IM(C}\2) = \\IM(C}\\\. 2
Since both measures p,<M> and | I/M^)!!! are additive on r(R- x L) and are equal on the sets of the form A x {0} x B with A e ^b and -B 6 L, and of the form A x (t,t'} x B with A 6 ^t and I? e L, the two measures are equal on the semiring 7i x L and then on the ring r(R. x L). IM can be extended to a cr— additive measure IM '• P L —* Lg. In fact, from step 3, we have /M « ^<M>- Since H<M> is cr— additive on P <S> L, by the extension theorem of measures, IM can be extended to a a— additive measure on the whole cr— algebra P L. This proves assertion a) of the theorem. For any disjoint sets C, C' € P L we have IM(C] -L /jw(C")> m -^1- In fact, let C £ r(P x L) and denote by S the class of sets C" £ "P L such that IM(C) -L lu(C' - C) in L2E. Then by step 1, S contains r(7^ x L);E is also a monotone class, hence E = P ® L. It follows that for (7 e r("P x L) and C" G P L with C n C' = 0 we have /M^) -L /A/(C") in L|.
Let now C' € P L and denote by S' the class of sets C1 e P ® L such that IM(C) -L Iu(C' — C) in L|,. Again £' is a monotone class containing r(P x P), therefore £' = P L. It follows that for C, C" € P L disjoint we have IMM(C) L IM(C') in L|;. If in addition, a, a' € R, then we still have IM(C)CH -L IM(C'}OL' in L^. This proves assertion c) of the theorem. The mapping C t-» H/A/CC^Hi ^s cr— additive on P <S> L. In fact, using the proof of step 2, we deduce that this mapping is additive. If Cn I in P x L, then, by the a— additivity of IM we have IM(C) —> 0 in I?E, therefore ||/M (Cn)!)! ~* 0; hence /M (•) ig cr— additive on P (C') = ||/M(C)||i, for C e P ® L. In fact, both measures are L disjoint. Prove that IM(C)X 1 IM(C'}X' in L|,. By statement c) of the theorem, we have /A/(C") -L IM(C) in L 2 , i.e. E(IM(C}IM(C'}} = 0. Then IM(C)x,IM(C')x'
>D= E(IM(C}IM(C'}
< x,
= 0,
hence IM(C)X J_ IM(C')X' in L|J. This proves assertion d) of the theorem. 5. Summability of OMM. We are now able to prove the summability of OMM.
Theorem 11. Let M : ft x R + x L -> E = L(R,E) be an OMM. Then M is 2—summable relative to R, £ and Cj
= \\IM(C)\\L*, Cj
forCtP®L.
Assume M is real valued and D is a Hilbert space and consider R c L(D,D). Then M is 2—summable relative to (D,D) and (IM)D,L*D(C) = \\IM(C}\\Ll = \\IM(C}\\^Ll,
forC&P®L.
Proof: a) We already know that IM can be extended to a a— additive measure IM : P ^ L —> L|,. Then, automatically IM has finite semivariation(/kf ) K ^2 , hence M is summable relative to (R, E). It remains to prove the equality in assertion a). Let C e P <S> L, let (Cj)i<j< n be a finite family of disjoint sets from P L with union C, and (aj)i<j< n a family of numbers with c^ < 1.
Then the family (lM(Ci)oti)i 0 ) , «(0) = / (1.1) at is well-posed and is governed by a quasicontractive nonlinear semigroup T — {T(t): t > 0}. The unique mild solution is given by u(t)=T(t)f. If X is reflexive (or, more generally, if X satisfies the Radon-Nikodym property), then T(t)(Dom(A)) C Dom(j4). This can be interpreted as a regularity 141
result. Typically, A is a differential operator and Dom(A) encodes some spatial regularity property. But in general Banach spaces such as L1(1R^V) or BUC(R N ), T(t)(Dom(A)) C Dom(A) may not hold. It is true that for the Favard class, Fav(A), T(t)(Fav(A)) C Fav(A) holds for all t > 0 in general spaces (cf. [5], [12]); but in general one does not know how to compute Fav(A) in concrete problems. Thus it is desirable to find invariant sets J such that T ( t ) ( J ) C J holds for all t > 0, when T is a nonlinear semigroup on a "bad" Banach space and J encodes some spatial regularity. When (1.1) describes a conservation law in L : (R) (or the Hamilton-Jacobi equation in BUC(R)), then even if / is a C100 function, the solution u(t, •) often becomes discontinuous (or the derivative •^u(t, •) becomes discontinuous) for a suitable positive time t. In this paper, we find such invariant sets J in a simple way for single operators as well as for semigroups, when the base space is BUG (£2) (with the supremum norm) and the set J involves Lipschitz continuous functions. 2 LIPSCHITZ CONDITIONS Let (fl,p) be a metric space. For T: £1 —> fi a uniformly continuous function, let z/(r) := s\ip{p(r(x), x) : x e fl} e [0, oo).
(2.1)
Write v e F0(r) iff v(r] < oo. Let X — BUC(f2; R), the bounded uniformly continuous real functions on fi; X is a Banach space under the supremum norm, which we denote by \\ • \\. Let D be a closed subset of X, and let T be Lipschitzian from D to D, or T 6 Lip(£>;£>), i.e., T: D -> D and ||r||Lip = inf{# > 0: ||T/ - Tg\\ < K\\f - g\\ for all f , g 6 D} < oo. For / 6 D, let / r (z) := f ( r ( x ) ) for x 6 fi and T e (7(0; Q) as above. Then &X and ||T/T - T/ll < ||T||Lipsup{|/(r(x)) - f ( x } \ : x €
fi}.
(2.2)
Consequently, if / € £> n Lip(fi;R), so that \f(x)-f(y)\ Q suc/i i/iai T(X) = y and V(T) < Kop(x,y). Here /Co is any constant satisfying KQ > 1LEMMA 2.1 Let fi 6e an open ellipsoid in a Hilbert space. Then (Hyp; KQ) holds with KQ = 1 provided the norm is replaced by an equivalent product norm. Proof: Let QQ be an open ball B(q,r) with center q e H (the underlying Hilbert space) and radius r > 0. It is a well-known property of Hilbert space that the radial projection R onto fio is a contraction; that is, for i f - y £ Z £ / II Z ^Z 12 j
l - v l < ^ 1 |Z|_^-*-J
oClCP o t IK.,
lo |o
where sign s = s/\s\ for s ^ 0, one has
for all ioi , W2 £ -ff, where • denotes the norm in H . By replacing the original norm by an equivalent inner product norm (again denoted by • |), we can replace a ball by an arbitrary ellipsoid in the above argument. Now define T on f2 by
T(X + w) = R(y + w) for all w G Q such that x + w G fl. Here x, y are as in (Hyp; KQ). For z G f2, let tu = z — x. Then \T(Z) - z\ = \R(y + w)- R(x + w)\
0 and all f € D n Lip(fyR) wf/i |j/|| L i P < k, \\Tf\\Lip
f2 be as in the definition of (Hyp; KO). Then \(Tf)(x)~(Tf)(y)\
0 sufficiently small we have cl(Range(I − λA)) ⊃ D, where D = cl(Dom(A)) and cl denotes closure. Then by the Crandall-Liggett-Benilan Theorem [7], [2], the closure Ā of A uniquely determines a strongly continuous semigroup T = {T(t) : t > 0} of mappings from D to D satisfying
T(t)f = lim_{n→∞} (I − (t/n)Ā)^{−n} f for f ∈ D, t > 0;
T(t + s) = T(t)T(s) for t, s > 0, T(0) = I;
T(·)f ∈ C([0, ∞); D) for f ∈ D;
‖T(t)‖_Lip ≤ e^{ωt} for t > 0;
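As a concrete illustration of the exponential formula above, here is a minimal numerical sketch (not part of the original paper): the operator is taken to be a simple linear dissipative operator, a discretized second derivative with Dirichlet boundary conditions, so that (I − (t/n)A)^{−n} f can be compared with the matrix exponential e^{tA} f. The grid size, the time t and the initial datum f are arbitrary illustrative choices; the nonlinear, m-dissipative setting of the paper is not reproduced here.

```python
# A minimal numerical sketch (not from the paper): the exponential formula
# T(t)f = lim_{n->inf} (I - (t/n)A)^{-n} f  illustrated for a simple *linear*
# dissipative operator A (a discretized Laplacian with Dirichlet boundary
# conditions on [0,1]); grid size, t and f are arbitrary choices.
import numpy as np
from scipy.linalg import expm, solve

m = 200                                # interior grid points
dxg = 1.0 / (m + 1)
xs = np.linspace(dxg, 1.0 - dxg, m)
A = (np.diag(-2.0 * np.ones(m)) + np.diag(np.ones(m - 1), 1)
     + np.diag(np.ones(m - 1), -1)) / dxg**2     # discrete d^2/dx^2

f = np.maximum(0.0, 1.0 - 10.0 * np.abs(xs - 0.5))   # a Lipschitz "hat" datum
t = 0.01
exact = expm(t * A) @ f                               # reference: e^{tA} f

I = np.eye(m)
for n in (1, 4, 16, 64, 256):
    u = f.copy()
    M = I - (t / n) * A                               # one implicit step
    for _ in range(n):
        u = solve(M, u)                               # u <- (I - (t/n)A)^{-1} u
    err = np.max(np.abs(u - exact))
    print(f"n={n:4d}   sup-error vs expm = {err:.3e}")
# The error decreases as n grows, in line with the exponential formula.
```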
and u(t) = T(t)f is the unique "mild solution" of the Cauchy problem (3.1) If fi satisfies (Hyp; KQ), then Theorem 2.2 implies
provided u(i) = T(t)f is the mild solution of (3.1) and / € D n Lip(fi;R). In particular, T(t)(D n Lip(fi; E)) C D n Lip(ft; R).
Set Lip f c :={/6Lip(n;R): ||/||Lip < k}. If (Hyp; K0) holds for fi, then Theorem 2.2 applies, and we conclude that T(t)(Lip fc ) C Lip^
(3.2)
where I = e^KQk for all i > 0, k > 0. 4 THE HAMILTON-JACOBI EQUATION Consider the Hamilton- Jacob! equation
du
x)
= G,
u(x,0) = f ( x )
(4.1)
for x € R , t > 0, where the Hamiltonian H: R —> R is continuous and tf (0) = 0. Burch [4] (see also Aizawa [1]) showed that if H e ^(R^) and if the Hessian matrix ( 3^.^. (^)) is nonnegative definite for each x £ R^, then the solution semigroup T on BUC(R W ;R) satisfies C Lipfc
(4.2)
for all t > 0, k > 0, and analogous one-sided inequalities hold for second centered difference quotients. The semigroup governing (4.1) with no convexity assumption at all on H was obtained by Crandall and Lions ([8], [6]) as a beautiful application of their notion of "viscosity solutions" . Our result in Section 3 (see (3.1)) implies that for the Hamilton-Jacobi semigroup of Crandall and Lions, (4.2) holds. This is because u; = 0 arid KO — 1 (since we can take the T in (Hyp; KQ) to be translation: T(Z) = z + y-x for z e R N ). We conjecture that |J{Lip^. : k > 0} is the Favard class of the m-dissipative operator A associated with (4.1), but this seems very difficult to prove. An apparently easier conjecture is that for the semigroup S governing f/tl
— at
where G is (jointly) Lipschitzian, a variant of (4.2) holds, namely, S(t)(Lipk) C LiPf holds for a suitable (. = l(t, k) which depends on k, G and t. (Compare [11].) We hope to return to this question in a future paper. We conclude with a degenerate parabolic example. Let fi be a smooth bounded domain in RN and let (p€C\Tix RN) with p > 0 in ft x RN . Thus (f> may vanish on all or part of 0, x e fi,
0
is governed by an m-dissipative operator A and a contraction semigroup T on X = C(fi). Set Av = (f>(x, Vv )Au for
v € V(A) = {v£ C(Sl) n X : Av Then A is m-dissipative and densely defined (by [9]). This justifies the above assertion. We now apply Theorem 2.1 as before to conclude that T(i)(Lip fc ) C Lipfc holds for all positive k and t. REFERENCES 1. S. Aizawa, A semigroup treatment of the Hamilton-Jacobi equation in several space variables, Hiroshima Math. J., 6:15-30 (1976).
2. Ph. Benilan, Equations d'Evolution dans un Espace de Banach Quelconque et Applications, Ph.D. Thesis, Univ. Paris XI, Orsay, 1972. 3. Ph. Benilan and M. G. Crandall, Regularizing effects of homogeneous evolution equations. In Contributions to Analysis and Geometry, Johns Hopkins University Press, Baltimore, MD, 1981, pp. 23-39. 4. B.C. Burch, A semigroup treatment of the Hamilton-Jacob! equation in several space variables, J. Diff. Eqns., 23:107-124 (1977). 5. M. G. Crandall, A generalized domain for semigroup generators, Proc. Amer. Math. Soc., 37:434-440 (1973). 6. M. G. Crandall, L. C. Evans and P. L. Lions, Some properties of viscosity solutions of Hamilton-Jacobi equations, Trans. Amer. Math. Soc., 282:487-502 (1984). 7. M. G. Crandall and T. M. Liggett, Generation of semigroups of nonlinear transformations on general Banach spaces, Amer. J. Math., 93:265-298 (1971). 8. M. G. Crandall and P. L. Lions, Viscosity solutions of Hamilton-Jacobi equations, Trans. Amer. Math. Soc., 277:1-42 (1983). 9. J. A. Goldstein and C. Y. Lin, Highly degenerate parabolic boundary value problems, Diff. & Int. Eqns., 2:216-227 (1989). 10. J. A. Goldstein and Y. Soeharyadi, Regularity of solutions to perturbed conservation laws, Applicable Anal, 7:185-199 (2000). 11. J. A. Goldstein and Y. Soeharyadi, Regularity of perturbed HamiltonJacobi equations, Nonlinear Anal, 51:239-248 (2002). 12. U. Westphal, Sur la saturation pour des semi-groupes non lineaires, C. R. Acad. Sci. Paris, 274:1351-1353 (1972).
The Immigration-Emigration with Catastrophe Model Michael L. Green Department of Mathematics, California State Polytechnic University, Pomona, CA 91768
Abstract A steady-state solution for the immigration-emigration with catastrophe process is derived using classic recursive methods.
I. Introduction
The single server queue, also called M/M/l, has been the subject of mathematical investigations for over fifty years. Results for birth, death and birthdeath processes have been known for some time and can be found in most standard texts on probability models, (e.g. [19]). In the 70's the birth-death processes were generalized to include rates for immigration and emigration, and the in the 80's the effect of total catastrophe was added to the model. The typical analysis on general birth-death processes includes the solutions to the stationary and transient probabilities problem and some analysis on moments. With the rates for emigration, immigration and catastrophe in the model, this analysis becomes much more involved, almost prohibitively. As Leonard Kleinrock commented (1975 [14], page 78) after using the ztransform method to find the transient solution for M/M/l, "...we can only hope for greater complexity and obscurity in attempting to find timedependent behavior of more general queueing systems." A brief summary of the pertinent work on the birth-death-immigrationemigration-catastrophe processes will be given here for the interested reader.
The birth-death-immigration-emigration processes were first studied by W.M. Getz (1975 [4], 1976 [5]). However, in his analysis an error was made in constructing the differential equations resulting in an incorrect solution. This error was pointed out in J.N. Kapur (1979 [11]) and finally solved by Kapur (1978 [6],1978 [7]) in the steady-state case. Also in this paper the moments for population size were obtained using hypergeometric functions. Given the complexity of the steady state solution, J.N. Kapur and S. Kapur (1978 [8]) used numerical methods to study the behavior of the steady state solution with parameters set to predetermined values. A general solution was not obtained. A few years later J.N. Kapur derived the birth-death-emigration process transient probabilities for a null population (1985, [12]). This result is generalized with a logistic birth-death-irnmigration-emigration processes, e.g. where the infinitesimal probability of a birth or death occuring during At is (N(t)n + JV(i) 2 e)Ai + o(Ai), by R.J.Swift (2001 [21]). Another direction is to consider multiple rates, for example a multiple birth rate would be \iN(t),i = ! , • • • , n where N(t) is the population size at time t. The process with a multiple birth rate and ordinary death rate was investigated by W.G. Doubleday (1973 [3]), J.N. Kapur (1978 [10]) and J.N. Kapur and S. Kapur (1978 [9]) where the mean and variance of the population size N(t) was found. These results are extended to the multiple birth and death rate processes in J.N. Kapur, S. Kapur and U. Kumar (1991, [13]) where the mean, variance and some higher moments are derived for the multiple birth-death process and partial results on the multiple births and deaths with immigration and emigration and probability of extinction are given. It should be noted that the transient probabilities of an arbitrary population size appears to not be done as yet. Total catastrophe has the effect of killing off the entire population. Most of the papers using catastrophe in the model have a constant rate of total catastrophe. However, P.J. Brockwell, J. Gani and S.I. Resnick (1982 [2]) considered a model with constant immigration rate e, birth rate proportional to population size Ax, and catastrophe that occurs at a proportional rate vx and non-total catastrophe to population size Tx. Their paper investigates the distribution of Tx and ergodic properties. R. Bortoszynski, et al (1989 [1]) consider the birth-death model with constant rate catastrophe. In a later paper, N.F. Peng, et al (1993 [18]) generalize the catastrophe rate from a constant rate to a rate depending on the inter-catastrophe time r and the distribution and extinction of the population process are considered. E.G Kyriakidis and A. Abakuks (1989 [16]) give an application of catastrophe, and E.G.Kyriakidis (1993 [17]) found the stationary solution for the birthdeath-immigration-catastrophe process. It appears that prior to this paper no transient or steady-state solutions for models with catastrophe had been obtained. R.J. Swift (2000 [20]) derived the transient solution and moments
for an immigration-catastrophe process and the transient solution for the birth-death-immigration-catastrophe process (2001 [22]). In this paper the stationary solution is computed for queues with constant rates for immigration a > 0, emigration /? > 0 and catastrophe 7 > 0. A recursive equation is generated using the Kolomogorov Forward equation which is solved with the rates listed. It has come to the attention of the author at a late date that B.K. Kumar and D. Arivudainambi (2000 [15]) have a transient and stable solution to the M/M/1 queue with catastrophe which is the case being considered here. The methods on the stationary solution differ and the work here will be given if for no purpose other than to point out elementary methods still sometimes work.
II. The Steady-State Solution
The single server queue model with immigration, emigration and catastrophes will transition with the following rates:
n → n + 1 with rate α,
n → n − 1 with rate β,
n → 0 with rate γ.
Let N(t) be the population size at time t and P_n(t) = P(N(t) = n). Then, using the Kolmogorov forward equations, for n ≥ 1
P_n'(t) = α P_{n−1}(t) + β P_{n+1}(t) − (α + β + γ) P_n(t),
and setting P_n'(t) = 0, the stationary solution problem reduces to
P_{n+1} = ((β + α + γ)/β) P_n − (α/β) P_{n−1}.
For n = 0,
P_0(t + Δt) = Δt β P_1(t) + (1 − αΔt) P_0(t) + γΔt(1 − P_0(t)),
which leads to
P_0'(t) = β P_1(t) − α P_0(t) + γ(1 − P_0(t)).   (4)
and the stationary solution reduces to
P_1 = c + d P_0,   (5)
where c = −γ/β and d = (α + γ)/β. Given that P_1 is expressed in terms of P_0, the solution for P_n will be in terms of P_0 and P_1. Then, with some algebra, the solution can be found for P_0. In order to solve the P_n case, the problem will be simplified to
P_{n+1} = A P_n − B P_{n−1}
(6)
where
A = (α + β + γ)/β and B = α/β.
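Before iterating the recursion, the following minimal numerical sketch (not from the paper) can be used to cross-check it: the state space is truncated at a level N, an assumption made only for this illustration, the balance equations of the truncated generator are solved directly, and the resulting probabilities are checked against P_{n+1} = A P_n − B P_{n−1}. The rate values are arbitrary.

```python
# A minimal numerical sketch (not from the paper): stationary distribution of
# the immigration (alpha) / emigration (beta) / catastrophe (gamma) queue,
# obtained by truncating the state space at N (illustrative assumption) and
# solving pi Q = 0 with sum(pi) = 1; then the recursion is cross-checked.
import numpy as np

def stationary(alpha, beta, gamma, N=500):
    Q = np.zeros((N + 1, N + 1))
    for n in range(N + 1):
        if n < N:
            Q[n, n + 1] += alpha          # immigration: n -> n+1
        if n >= 1:
            Q[n, n - 1] += beta           # emigration:  n -> n-1
            Q[n, 0] += gamma              # catastrophe: n -> 0
        Q[n, n] -= Q[n].sum()             # diagonal makes row sums zero
    # solve pi Q = 0 with normalization: replace one equation by sum(pi) = 1
    M = np.vstack([Q.T[:-1], np.ones(N + 1)])
    rhs = np.zeros(N + 1); rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return pi

alpha, beta, gamma = 1.0, 1.5, 0.2        # illustrative rates only
P = stationary(alpha, beta, gamma)
A, B = (alpha + beta + gamma) / beta, alpha / beta
# check the recursion P_{n+1} = A P_n - B P_{n-1} on a few states
for n in (1, 2, 5, 10):
    print(n, P[n + 1], A * P[n] - B * P[n - 1])
```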
By iterating on equation (6),
P_{n+2} = A P_{n+1} − B P_n = A(A P_n − B P_{n−1}) − B P_n = (A² − B) P_n − A B P_{n−1},   (7)
P_{n+3} = A P_{n+2} − B P_{n+1} = A[(A² − B) P_n − A B P_{n−1}] − B(A P_n − B P_{n−1}) = (A³ − 2AB) P_n + (−A²B + B²) P_{n−1}.
Continuing this process, grouping together terms in an appropriate manner,
(8)
Pn+9 = (A9 - 8A7B + 21A5B2 - 20A3B3 + (-A8B + 7A6B2 - 15A4B3
(9)
The general term Pn+k will be recursively obtained with a solution in terms of Pn and Pn-i. In the solution will be two coefficients a^ and bki defined by
and *
TjlUfc-i-nJ ' n=l
for i < k = 1, 2, . . . . The terms a^o = fyto = 1 f°r &U ft by definition. For i > 0 a more detailed analysis is required for the coefficients are often zero for i < k and this detail is necessary in what follows. Consider the coefficient o-k,k-q, 9 = 0 , 1 , . . . ,
= £?(-! + ! ) • • • = 0
forallfc>l
Ofcfc-i = (F1!}! Un=\(k - (ft - 1) - ra + 1) =
ziiCft - (ft - 1) - 1 + l)(ft - (ft - 1) - 2 + 1) • • • = 0
= (fc^j(ft - (ft - «) - 1 + 1) . . . (ft - (ft - g) - (? + 1) - 1) • • • = 0. Thus in order for dkk-q = 0 there must be q + 1 terms in the product which is possible if and only if k-q>q+l or in other words
ft-1 Hence for a^ to be nonzero i = ft — q < q+ l
_ fc+i ~~
2 '
Given that i and k are positive integers, a^i 7^ 0 when
{
^i when ft is odd | whenftis even.
(12)
With a similar analysis it can be shown that b^i ^ 0
{
^-
when ft is odd
(13) | — 1 whenftis even.
The least integer function will be used in the indexing, namely in the summand with a^i coefficients i. \
( | whenftis even < i when ft is odd
and in the summand with bki coefficients will use _ iI
( ^r- when k is even when k is odd.
Theorem 1. Let Pn+k = Then LlJ i=0
i=0
w/iere
1 *
onii
1
/or i < k = 0, 1,2, . . . and a^o = bko = 1 /or aZZ A;.
Proof First note that for k = 3 Pn+3 = E?=0 a3iA 3 - 2 '(-5)
but b k \ k i =0 in the k even case (see (13)), hence
( 18) = E
Ki - afc-ii-O A(k+V-2i(-ByPn + akQAk+lPn
(19) In order to complete the proof some details on summing the coefficients is needed.
i + l)
(20)
= n nn=o( fc - i - n + 1) letra= m - 1
and
6fci +frfc.n.!= i
Ti lYn2o(k - i - n) let n = m - 1
(21)
Hence using equation (20) and equation (21) in equation (19) (19) = Elll
(22)
Setting n = 1 in equation (22), UJ
Pi+k = YjakiAk^i(-B}iPl +
bkiAk~Vi+l\-B}i+lPQ.
(23)
i-Q
i=0
To reduce the complexity represent the coefficients by the following: i=0
and
then equation (23) becomes Pl+k = il>kPi+rikPo.
(24)
Using equation (5) the equation (24)
Summing the P^'s in order to find PO, assuming the infinite sums are finite, 00
00
n=0
k=0 oc
^(ci/jk + (di/jk fc=0
Acknowledgments The author expresses his thanks to Professor R.J. Swift for suggesting this problem over coffee in New Orleans. References 1. Bartoszynski, R., Buhler, W.J., Chan Wenyan and Pearl, D. K. (1989), Population processes under the influence of disasters occuring independently of population size. J. Math Biol. 27, 179-190.
2. Brockwell, P. J., Gani, J. and Resnick, S.I. (1982), Birth, immigration and catastrophe processes, Adv. in Appl. Probab., 14 709-731. 3. Doubleday, W.G. (1973), On Linear birth-death processes with Multiple Births, Math. Biosciences, 17, 43-56. 4. Getz, W.M. (1975), Optimal Control of a Birth and Death Population Model, Math. Biosciences, 23, 87-111. 5. Getz, W.M. (1976), Modeling and Control of Birth and Death Processes, Report WISK, 1, 6, National Institute of Mathematical Sciences, Pretoria. 6. Kapur, J.N. (1978), On Generalized Birth and Death Processes and Generalized Hypergeometric Function, Indian Jour. Maths.. 20, 11-20. 7. Kapur, J.N. (1978), Application of Generalized Hypergeometric Function to Generalized Birth and Death Processes. 8. Kapur, J. N., Kapur, S. (1978), Steady-State Birth-Death-ImmigrationEmigration Processes,Proc. Acad. Sci., 48 (A), III, 127-135. 9. Kapur, J. N., Kamar, U. (1978), Generalised Birth and Death Processes with Twin Births,Nat. Acad. Sci. Letters, I, 2, 30-32. 10. Kapur, J.N. (1979), Generalised Birth and Death Processes with Multiple Births, Acta Ciencia Indica, 5, 7-9. 11. Kapur, J.N. (1979), On Birth and Death with both Immigration and Emigration, Proc. Nat. Acad. Sci. India, 49 (A), II, 85-95. 12. Kapur, J.N. (1985), On Birth-Death-Emigration Processes, Jour. Math Phy. Sci., 19, 4, 301-323. 13. Kapur, J.N., Kapur, S., Kumar, U. (1991), Birth and Death Processes with Multiple Births and Deaths and with Immigration and Emigrations, Bull. Math. Ass. of India, 23, 1-22. 14. Kleinrock, L. (1975), Queueing Systems, John Wiley & Sons, New York. 15. Kumar, B.K., Arivudainambi, D. (2000), Transient Solution of an M/M/1 Queue with Catastrophe, Computers and Mathematics with Applications, 40, 1233-1240. 16. Kyriakidis, E. G. and Abakuks, A. (1989), Optimal pest control through catastrophes, J. Appl. Probab., 27, 873-879. 17. Kyriakidis, E. G. (1994), Stationary probabilities for a simple immigration-birth-death process under the influence of total catastrophes, Stat. & Prob. Let. 20, 239-240. 18. Peng, N.F., Pearl, O.K., Chan, W. and Bartoszynski, R. (1993), Linear birth and death processes under the influence of disasters with timedependent killing probabilities, Stochastic Process. Appl., 45, no. 2, 243-258. 19. Ross, S.M. (1997), Introduction to Probability Models,6th ed., Academic Press, San Diego. 20. Swift, R.J. (2000), A Simple Immigration-Catastrophe Process, Math. Scientist, 25, 32-36.
21. Swift, R. J. (2001), A Logistic Birth-Death-Immigration-Emigration Process, Math. Scientist, 26, 25-33. 22. Swift, R.J. (2001), Transient Probabilities for a Simple Birth-DeathImmigration Process Under the Influence of Total Catastrophe, Int. J. Math. Math. Sci., 25, 10, 689-692.
Approximating Scale Mixtures Hasan Hamdan Department of Mathematics and Statistics, James Madison University, Harrisonburg, Virginia John Nolan Department of Mathematics and Statistics, American University, Washington DC
I. Introduction
The distribution of products and quotients of continuous random variables is very important and occurs in many situations. For example, the sampling distribution of a statistic may be the product or the quotient of two random variables. However, many of these distributions do not have a closed form, or have forms which are computationally difficult to compute, including most stable densities and many other distribution functions. Hence, the main goal of this article is to approximate infinite scale mixtures by finite mixtures. Consider a continuous scale mixture random variable X with density f_X(x). Since X is a scale mixture, X = RW, where R is nonnegative and independent of W. Using the convolution formula, the density of X can be written as
f_X(x) = ∫₀^∞ h(x; r) π(dr),
where h(x; r) is the density of rW and the mixing measure π is the distribution of R. If h and π are known but f_X has a complicated form, then one can use the stochastic representation of X to generate random samples from the distribution of X.
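The following is a minimal sketch of that stochastic representation (illustrative choices only, not the authors' code): W is taken to be standard normal and R exponential, so h(x; r) is the Normal(0, r²) density; samples of X = RW are drawn, and f_X is evaluated as a Monte Carlo average of h(x; R).

```python
# A minimal sketch (illustrative choices only): using X = R*W to sample from a
# scale mixture and to evaluate its density as a Monte Carlo average of the
# kernel h(x; r).  Here W is standard normal and R is exponential purely as an
# example, so h(x; r) is the Normal(0, r^2) density.
import numpy as np

rng = np.random.default_rng(0)

def h(x, r):
    """Density of r*W at x when W ~ N(0,1), i.e. the Normal(0, r^2) density."""
    return np.exp(-x**2 / (2.0 * r**2)) / (np.sqrt(2.0 * np.pi) * r)

n = 100_000
R = rng.exponential(scale=1.0, size=n)     # example mixing distribution
W = rng.standard_normal(n)
X = R * W                                  # samples from the scale mixture

# density estimate f_X(x) = E[h(x; R)] by averaging over the sampled R's
xs = np.array([0.5, 1.0, 2.0, 4.0])
fx = np.array([h(x, R).mean() for x in xs])
print(dict(zip(xs.tolist(), fx.round(4).tolist())))

# crude check against a histogram of the simulated X at the same points
hist, edges = np.histogram(X, bins=200, range=(-10, 10), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print([hist[np.argmin(np.abs(centers - x))].round(4) for x in xs])
```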
II. Examples

A. Contaminated Normal
B.
Symmetric stable distributions
A stable random variable X with index of stability a 6 (0,2], scale parameter a € (0,oo), skewness parameter /? e [—1,1] and location parameter /it € (—00,00) is denoted by Sa(o',f3,^,). The characteristic function of X is given by (f>x(u) where / - , _ / exp (— a01 \u\a [l — z/?tan (fa) (sign (u))] + i/j,u] (u) — | exp ^^ u| ^ + ^g ^sign ^ln (jo-ui)] + ^u)
x(
a ^1 a = 1
If fj, = 0 and /? = 0, then X is symmetric. Suppose that X is Gaussian with mean zero and variance 2a2, A is positive stable S'a/2((cos(7rQ;/4))2/'Q:, 1,0), and A and X are independent, then W = Al/2X is symmetric alpha stable with scale CT, see [4] for more information on stable distributions.
C. Exponential Power Family
The exponential power family consists of all distributions having densities of the form f(x) = k exp(−|x|^b), where x ∈ R and b > 0. If b < 2, then f is a variance mixture of normals where the inverse of the scale variable 1/A is stable S_{b/2}(1, 1, 0). Two important special cases are the normal (b = 2) and the Laplace or double-exponential (b = 1). See [5] for a complete study of this family.
D. Weibull In [3], it is shown that the Weibull distribution with shape parameter p — 1/2 and scale parameter l/A/2a is a scale mixture of exponentials with mixing distribution G'(a) = (|) exp (— 0. We wish to minimize the number of terms subject to the desired tolerance level. The number of terms needed for this approximation depends on the desired tolerance level, and the modulus continuity of h as a function of r, and the mixing measure, TT. THEOREM l:Let / be a scale mixture of h with mixing distribution ?r on [0,oo) such that f ( x ) < oo and sup/i(x; 1) < oo for all x € R. If /i has a modulus of continuity §(•) as a function of the scale parameter, s, then for any e > 0, there is an interval [a, b] C (0, oo) and a discrete distribution TT* on 5 = [a, b] with at most M(<J(-); 5) atoms such that /•
/•oo
sup / h(x\ s)-n(ds) — I h(x\ s)n*( < (1) Jo Js The proof consists of two steps. In the first step, we will show that there exists S = [a,b] such that /•oo
/•
/ h(x; s)ir(ds) — / h(x;s)-n(ds} < 6/2. Jo Js In the second part, we will construct a discrete measure TT* such that If h(x;s)^(ds)— If h(x; S)TT* (ds) <e/2. Js Js The triangular inequality will then give (1). Suppose that the maximum value of h(x\ s) is attained at (XQ, SQ). Then, f ( x ) = J0°° h(x; s)n(ds) < /0°° h(xo, so)ir(ds) < oo. Therefore there exist a and b such that /Qa h(x;s)rn(ds) < | and /b°° h(x; s)?r(ds) < |. This proves the first inequality.
The second part is constructive. First, partition S in the following way : SQ = a, Si = Si-i + 6(si,e)., where s, is found based on the definition of continuity of h at Sj_i with the given e. Since 6(-, e) is positive and non-decreasing, some sn eventually will exceed b. Define a disjoint cover of Si I\ = [so,S2\, h = (SI,SJ\,....IM = (s2M-2,b]. Set TT,- = TT(/J) and a,j = min(s 2 j-i,&),j = 1, M. Also, h(x;aj}Kj = h(x,a,j) ff Tr(ds}. So, / h(x\ s)ir(ds) — I h(x; S)TT* (ds Js Js M h(x; S)TT(O!S) — ^T h(x;a^}',
L ^j. M
M
,
I h(x; iJli M
[h(x;s)ir(ds) — M
^ The definition of s'^s and the modulus of uniform of continuity guarantees that if s € Ij, then \h(x; s)Tr(ds) — h(x,a,j)\ < e/2. Hence, the last line above M
i s < E////2^(^) = | 7 r ( 5 ) < f . EXAMPLE 1. EXAMPLE 2: Approximating Variance Mixtures of Normals We will say that a random variable X is a (generalized or infinite) variance mixture of normals if X = AZ, where Z ~ N(0,1), A > 0, A and Z independent. We exclude the possibility that A = 0 with positive probability, which would make X have a point mass at the origin, so X would not have a density. If A has a density then X has pdf /•o
= / Jo
(2)
where g(x\a) = ^g(x/a\l) is the Normal(0,(T 2 ) density and the mixing measure IT is the distribution of A. Equivalently, the characteristic function x(t) of X can be written in the form fC
4x(t} =
where
Jo
is the characteristic function of the random variable aZ
APPROXIMATING SCALE MIXTURES
165
As above, when A takes on a finite number of values, the density of X is M
One can try to fit any data with such a mixture. When M is known, the EM algorithm (Expecation-Maximization algorithm) can be used to estimate the parameters. If M is unknown, then one typically tries different values of M and selects one based on some information criteria. Here we consider a related question: suppose it is known that X is a mixture of normals with known scale A having distribution TT. If the density of X is difficult to compute, then we might want to approximate it by a finite mixture. Two practical questions are: how many terms to take and what values of KJ and -— J(7i - 02 for all x e R. Fixing a, \dg(x a)/da\ =
x2 _ 2
^
g(x\a) is maximized at x = 0, where
it takes value g(0\a)/a = l/(\/27fcr 2 ), Hence \g(x\ai) - g(x\a-z)\ < (max \dg/da\] \ai - a2| = \a\ - a2\/(V2ira2). Define recursively a0 = a,
aj — CLJ-I + VZndj^e.
(3)
The distances between the a^-s are strictly increasing, so there exists an M = M(e,a,b) such that a^M > b. The reason for the term \727raj_je has to do with the rate of change of g(x a). Define a disjoint cover of [a, b}\ I\ = [a0, a2], /2 = (a 2) 04],... ,IM = («2M-2, b]. Set TT,- = TT(/J) and oj = min(o2j_:, 6), j = l,...,M. Note: the value of M depends only on a, b, and e through properties of the normal densities g(x\a). It can be found by solving equation (3) recursively. M is independent of the scale variable A, which enters only through the weights TTj = TT(/J) = Pr(A e Ij), not the determination of the intervals Ij. In general, A can take arbitrarily small values and arbitrarily large values.
166
H. Hamdan & J. Nolan
In such a case, write /•oo
I Jo
ra
rb
rcc
g(x\a}it(da} = I g(x\a)Tr(da) + I g(x\a}-K(da) + I JO
Ja
g(x\a)-K(da).
Jb
(4) The following lemma shows that in all cases where /(O) is bounded, there exist a, b such that the first and last integrals are less than e/3, while the middle integral can be approximated using Theorem [1]. LEMMA 4: Let X = AZ be a scale mixture of normals, e > 0 and Irt f ( x ) be its density function. (a) If /(O) < oo, then there exists an a > 0 such that J0° g(x\(r)ir(dcr) < e for all x 6 M. (b) There exists a 6 > 0 such that Jfe°° g(x\a}-n(da) < e for all x € R. /(z) = O(a:|cr)7r(da) < k$™ a~lK(da) = /(O) < oo. Let
r
h(a) (a) = II g(x Jo a
roc
a~lir(da) = k
Jo Let an be any sequence that converge to 0, then l^Q^a~l —> 0 pointwise to zero on (0, oo) and l( 0 ,a n ) (J ~ 1 < a~V G Ll(it). So, /i(a n ) —> 0 by the Dominated Convergence theorem. Similarly, let h(b] = f^ g(x\<j)Tr(do') /
/-/-oo
oo oo
a~l-K(da] = k I
/
Jo
Let bn be any sequence that converges to oo, then l(b >00 )0'~ 1 —> 0 and since the last expression is dominated by ^, the result holds by applying the Dominated Convergence Theorem.
B.
Discussion
EXAMPLE 5: Approximating Symmetric Stable that are Variance Mixtures of Normals Recall that if X is Gaussian with mean zero and variance 2a2 , A is positive stable Sla/2((cos(7rQ;/4))2/'Q:, 1,0), and A and X are independent, then W = Al'2X is symmetric alpha stable with scale a. In particular, when a = |, a = 1, and A is 52/s((cos(7r/3))6/'4, 1,0) then W is 54/3(1,0,0). The density of A is shown in Figure 1. If A is truncated between a = .1 and b = 7 then about 88.9% of the total density is captured and when a = .1 and b = 20 then about 94.7% of the total density is captured which is a gain of 5.8%. When b = 40, 96.7% of the total weight is captured. Fortunately, and as illustrated by 3 , taking larger values of b
APPROXIMATING SCALE MIXTURES
167
Figure 1. The density of A with a = I, (3 = 1 and a — "v/2/4. doesn't have significant effect on M, the number of terms needed to approximate the density of W, which depends mostly on a. The exact density and the approximated density are shown in Figure 2. The number of terms used in the approximation is 6. The approximation looks very good in the body but as expected the error becomes larger and larger as x becomes much larger than b. We can always improve the approximation in the tails by taking larger b and we can always improve the approximation in the body by taking smaller a. It is also clear that small change in a causes a dramatic change in M. In practice and as we saw in the last section, many of the terms used in the approximation process may have small weights. One can eliminate these terms and renormalize the remaining terms but then there is no guarantee that the same tolerance level that we started with will be maintained. In [1], it was shown that the kurtosis of a mixture is never less than the kurtosis of a normal. Additionally, they provided a necessary (but not sufficient) condition for X to be a scale mixtures of normals. Specifically, the log/square plot, in which log f ( x ) is plotted as a function of x 2 , must be convex. Moreover, the tails of the infinite scale mixture are always heavier than any normal. In practice, we suggest starting by looking at the unimodality of the empirical density of the sample. If the empirical density is unimodal, we proceed by checking the symmetry. If the empirical density is significantly symmetrical, we proceed by looking at log/square plot. Once we decided based on the graphs the underlying distribution is infinite scale mixture, we need to
168
H. Hamdan & J. Nolan
Figure 2. Solid is exact S4/3 (1,0,0) density and dotted is approximated density by truncating A between .1 and 40. estimate the mixing measure. There are many practical difficulties in estimating the mixing measure. For example, when we use the EM method to find the MLE of the mixing measure in the finite case, we might find a large local maxima that occurs as a consequence of a fitted component having a very small (but nonzero) variance. Moreover, it is not clear how to initialize the estimates, especially when the mixture is a scale mixture. A key problem in finite mixture models is the number of components in the mixture. Several criteria based on the penalized log-likelihood, such as Akaike Information Criterion, AIC, the Bayesian Information Criterion, BIG and the Information Complexity Criterion, have been used. Currently, we are testing and exploring a new method for estimating the mixing measure over a predetermined grid of r values and a predetermined grid of x values. The new method is called UNMIX and it is based on minimizing the weighted distance between the empirical density and the estimated density using the discrete scale mixture over the given grids.
References Beale, E. M. L. and Mallows, C. L., Scale Mixing of Symmetric Distributions with Zero Means, Annals of Mathematical Statistics, 40, 11451151, 1959. Feller, W. J., An Introduction to Probability and its Application, Vol. II, 2nd ed., NY, Wiley, 1971.
APPROXIMATING SCALE MIXTURES
169
3. Jewell, P., Nicholas., Mixtures of Exponential Distributions, The Annals of Statistics, 10,479-484, 1982. 4. Samorodnitsky, G., and Taqqu, M., Stable Non-Gaussian Processes,NY: Chapman and Hall, 1994, pp. 20-22. 5. West, M., On Scale Mixtures of Normal Distributions, Biometrika, 74, 646-648, 1987.
Cyclostationary Arrays: Their Unitary Operators and Representations Harry Hurd Department of Statistics, University of North Carolina Chapel Hill, N.C. 27599 -3260 [email protected] Timo Koski Department of Mathematics University of Linkoping Linkoping SWEDEN [email protected]
Abstract In 1927 Norbert Wiener studied the spectrum of a numerical infinite sequence, which he called an array. In this paper we show how unitary operators fit naturally into the study of cyclostationary arrays (or numerical sequences), which are nonstationary but in a cyclic manner. Here we use an isomorphism between Cyclostationary stochastic sequences and their array counterparts, to show how Hilbert-space representations of Cyclostationary random sequences are interpreted in the case of arrays.
I.
Introduction
In 1927 Norbert Wiener introduced (see [30]) the spectrum of a doubly infinite complex numerical sequence, x = {xj}!?_00, which he called an array. 171
172
Kurd and Koski
If the sums M ]=-M
are finite for every k, the sequence x is called regular and then, in particular, ?"o is finite. As pointed out by Masani [22], Wiener did not make use of the fact that for regular arrays, r^ is bounded and non-negative definite and thus by Herglotz' theorem, /•27T
rk = / eiXkdA(X) Jo with A bounded and non-decreasing on [0, 27r]. Rather, Wiener defined, through L2[0, 27t] limits, spectral functions A(\) that are non-decreasing even for non-regular arrays. Note regular arrays are stationary in that the correlation, given by (1.1), between the array {xj}°^L_go and it's shift {^j+fc}°l_oc depends only on k. This will be repeated a little more precisely below. The main focus of this paper are arrays whose correlations all exist (and hence are regular) but, in addition, for some fixed N the shifted crosscorrelations also exist for all subsequences of the form {xjN+m}^-^ , m ~ 0, 1, . . . , TV — 1. Under the assumption of regularity, these cross-correlations are denoted here as M (1-2)
1
j=-M
and Rm,n satisfies the condition of cyclostationarity, Rm,n = Rm+N n+x for all m,n. Our approach is to study, first for stationary and then for cyclostationary arrays, the notions of unitary shift operators, with the goal of obtaining representations of cyclostationary arrays. For stationary arrays, the unitary operator is identified with the 1 place shift whereas for cyclostationary arrays, it is identified with the N place shift. In a previous work [15], we developed these operators in the context of sequences having zero mean values. Here we address the issue of sequences with non-zero means. In [33] Herman Wold identified an isometric isomorphism, to be called the Wold Isomorphism, between the Hilbert space of a stationary stochastic sequence and a a Hilbert space formed from a numerical sequence. The notion of such a map for the cyclostationary case was introduced by Gardner [7, pp. 347], in his work on cyclostationary time series, with a construction based on randomly time shifting a cyclostationary array. We showed in [15] the existence of such a mapping in the cyclostationary case precisely in the sense used by Wold [33] for the stationary case. In the same work we showed how spectral representations for arrays (numerical sequences) could be obtained from the spectral theory for stationary or cyclostationary random sequences. Here we shall elaborate on this idea and use it to map other representations
CYCLOSTATIONARY ARRAYS
173
of cyclostationary random sequences [14, 5] to the corresponding results for arrays. An array is sometimes called a numerical sequence, a functional sequence or a time-series. In the remainder of the paper we shall mainly refer to an array as a' numerical sequence, or sometimes just as a sequence. Brillinger [4] gives a nice exposition of stationary arrays. Other works by Gardner concerning cyclostationary stochastic processes and cyclostationary time series include [8, p. 377], [9], and [10]. In the latter, a probability is defined in a fraction of time sense, and these ideas have been extended to almost periodic and higher order cases [16, 17]. Bass [2] deals with continuous time "pseudo-random" functions and gives a method, also useful in the current context, for understanding the completion of a pre-Hilbert space generated by a single numerical function. We now summarize the contents a little more completely. In section II we present the basic theory for stationary sequences, including the Wold isomorphism. Here we are concerned with the space of sequences obtained by linear combinations of shifts of a given sequence x. The closure HI (x) of this linear space with respect to the norm induced by the averaging inner product has been shown, essentially by Bass [2], to be a Hilbert sub-space of a Marcinkiewicz space of sequences. This gives a little more information about the sequence space that is isomorphic to the linear space of some second order stochastic sequence. Finally, it is important to note that the one place shift of a stationary sequence defines a unitary operator in the aforementioned Hilbert space of sequences, and this corresponds through the isomorphism to the unitary operator on the linear space of a stochastic sequence. In section III we define CS (cyclostationary) numerical sequences in terms of the N-step mean and the N-step autocorrelation formed by averaging Nstep subsequences. Using the N-step autocorrelation, a scalar product on the linear combinations of shifts of a given CS sequence x is defined and its closure HN (x) is the sequence space that is isomorphic to the linear space of some stochastic cyclostationary sequence. The result of Bass [2] again proves this sequence space to be a Hilbert sub-space of a Marcinkiewicz space of sequences. In this case, the one place shift of a CS sequence is no longer unitary in HN (x) but the TV place shift is unitary and corresponds through the isomorphism to the unitary operator of a stochastic cyclostationary sequence. We then apply the representations for stochastic CS sequences to the case of numerical CS sequences.
Kurd and Koski
174
II.
Stationary Numerical Sequences A.
Preliminaries
Let x = {xj}°?__00 designate a numerical sequence of complex numbers. If IIj{x} is the coordinate map i.e. IIf{x} := Xj and A is a complex number, then the operations of addition and multiplication by a scalar are denned in terms of the coordinate maps as n
j{x + y} = zj + 2/j, IIj{A • x} = \Xj.
We define also the mean M
(2.1)
2M
and the norm ,
I1 x I!22.- lim "
M
V l j=-M .*-".
X
(2.2) v '
whenever the limits exist. Denoting
we can write || x \\^= p, (|x|2) when the limit exists. For the sequence 1 = (...,1,1,1,.. .), it is clear that /u (1) = 1 and || 1 |||= 1 so for any scalar (complex) constant c, /j, (cl) = c and cl \\\= If both the limits ^ (x) and || x ||| exzsi, then it follows that
(x - /x (x) 1) = /z (x) - M (/x (x) /i (1)) = 0
(2.3)
and (2.4)
It is easy to show from the Cauchy-Scwharz inequality for finite sequences that if the limit /x (|x|2) is a finite limit then also /x(|x|) exists as a finite limit. But it is not true that the existence of JJL (|x|2) as a finite limit implies that the limit // (x) exists. A counterexample is given by the sequence x taking values in {—1,1} with ever increasing strings of +ls and —Is so that the partial sums in (2.1) are not convergent. Now with y = 1, for which /z (1) and || 1 \\2 exist as limits, and x the ±1 sequence described above, then it is
CYCLOSTATIONARY ARRAYS
175
clear that the partial sums M
,
M
= 2MTT E I*?
j=-M
j=~M
do not converge. Hence the space of sequences for which || x ||| exists as a limit is not a linear space. Continuing, we introduce for a pair of numerical sequence x and y the scalar product
_ i oo
{Xjy; J>j=—oo . The J
preceding example
shows that it is necessary to hypothesize the existence of the limit (2.5) as the existence of || x ||| and || y \\\ is not sufficient. Beginning with Cauchy's inequality for finite complex sequences one can show | (x, y) | 2
exists and is equal to (x, y) in (2.5); that is, the limits are independent of the origin. The autocorrelation of x is the sequence {^fc}^=_00 where -. r.
.—
, (~v\ •—
*k •— rf k \ X * ) •—
lim
A"" oTi/f i 1
M \
T •
~7r •
i
s_^ ^l+Q^y+q+ki
(*) *7\
\z"')
if the limit exists, and by our previous comment it is independent of q. If TO exists, then at any value of k for which rk exists, r-k = ^k and r^ |< TO = (x, x) < oo. We denote S as the forward (left) shift operator, defined in the set of doubly infinite numerical sequence by the action Uj{Sx} = xj+l
(2.8) 1
for every j. The inverse 5" exists and is simply given as the backward (right) shift, since the sequences are two-sided. The operation of j consecutive shifts to the left, <S^, is defined inductively as the j'th power of S, and S^i is denned similarly. Hence we may write
(2.9)
176
Kurd and Koski
We define next ( c.f. [32, p. 98]) the notion of a regular stationary numerical sequence in the sense of Wiener and Wold. This sense of regularity is not the same sense used in the study of prediction of random sequences, where regularity means that the intersection of the subspaces of past values is {0}. Definition II. 1 A sequence x = {xj}0^^ of complex numbers Xj such that 1. the mean fj, (x) exists as a finite number, 2. the autocorrelation r^ (x) exists for every integer k, is called a regular stationary numerical sequence.
{e
.
-I OO > J
is regular and fc=-OC
stationary for any real a. Two sequences ea and e^ are orthonormal with respect to (•, •)1 for a ^ 0. | Now we elaborate the preceding to stationary vector numerical sequences, or jointly stationary numerical sequences. First, if |z( fc ), k — 0, 1,. . . ,K — 1\ is a collection of numerical sequences, we define
\ z:=
(2.10)
Definition II. 2 A numerical vector sequence z, with K components zs-k> = /fc\ -i OO >
{ z] J
k = 0, 1, . . . ,K — 1, is regular stationary if the component se-
J j = -00
quences are regular and jointly stationary in the sense that the means and cross correlations defined by -,
M
M
exist for 0 ,
(2-15)
J=-P and on this linear space (xj,x 2 ) of (2.5) is seen to have the properties of symmetry, additivity and homogeneity; that is, (a) (xi,X2) = (x 2 ,xi), (b) (xi +x 2 ,x 3 ) = (xi,x 2 ) + (x 2 ,x 3 ) and (c) (axi,x 2 ) = a(x:,x 2 ) for every complex a and all xi,x 2 , x3 in M\ (x). In order for (xi, x 2 ) to be positive definite in the sense that || x ||2= 0 •$=>• x = 0 we interpret 0 as the (equivalence) class of sequences that are equivalent to the zero sequence (the complex sequence with Xj = 0 for every j ) . This expresses the sense of uniqueness implied by the norm induced by the scalar product (2.5). For example any two sequences xj,x 2 for which z = xj — x2 £ £2 are clearly equivalent because z satisfies ||xi — x 2 |J2 = 0. Thus any finite sequence is equivalent to 0 and any two sequences differing in a finite number of positions are equivalent. In the sequel, x will denote the (representative of ) any equivalence class in M (x). By the definitions of the linear operations and the inner product (•, •) we have thus established M (x) as a pre-Hilbert space. Let us note that the autocorrelation {r^ (x)}^=_00 is a non-negative definite sequence and is identifiable as the autocorrelation function of some (weakly) stationary stochastic sequence £ = {£j}°l_oc defined on some probability space (fi,.7r, P). From the sequence £ we can determine a Hilbert space H (£), the linear span of £j closed with respect to mean square norm II ^ )!#(£)= VE I ^ I 2 ' wnere E is expectation with respect to the probability measure P. Let us consider a linear map J defined on M (x) and assuming values in the linear span sp{^/~, A; € Z} of the sequence £ by
178
Kurd and Koski J ASpx + /zS*x = X£p + fj£k
(2.16)
for arbitrary complex A,/i and integers k,p. This defines the Wold isomorphism between the pre-Hilbert space M (x) and it's image JM (x) = sp{£,k,k €E Z}, which is dense in H (£). If we denote H (x) as the completion of M (x) with respect to || • ||2, then by the continuity (boundedness) of J on M (x) , J extends to a map from H (x) to H (£) and is the desired Hilbert space isomorphism. The elements of H (£) are all LI (Q,.F, P) random variables, but there arises a question about the interpretation of the limit points of M (x). That is, are we left only to define the closure as an abstract completion as in [26, p. 121, 124 - 125], or does there exist a more concrete interpretation? More information is provided by the following theorem of J. Bass [2, pp. 3335]. We shall denote .M2 (for Marcinkiewira) as the space of sequences for which M \\2M= limsup
Theorem II. 2 (Bass) If M is a pre-Hilbert space with inner product (x,y) given by (2.5), and if M C .M2 , then the closure of M with respect to \\ • \\2M is a Hilbert space H C M2 and (x, y) is still given by (2.5) for any x, y G H. That is, if y € H then y G Mz but also that if y and z G H then (y,z) exists as a limit. To emphasize the meaning of this for our current application, as long as y, z are derived from the x through the formation of M (x) and including its limit points, then (y,z) can be expressed as a limit of the form (2.5). Hence we obtain the following. Proposition II. 3 Let x be a regular stationary numerical sequence having autocorrelation sequence {rk (x)} and let J be the map from M (x) into sp{£,k,k G Z} C H (£) defined for arbitrary complex X,p, and integers k,p by (2.16). Then J can be extended as an isometric Hilbert space isomorphism from a complex Hilbert space H (x) D M (x) to the complex Hilbert space H (£), the closed linear span of a wide sense stationary sequence £ that has autocorrelation function equal to {rfc}^__00. For any y, z € H (x),
M
'
2M
j=-M
The mapping J given by (2.16) leads immediately to *It is an isomorphism in the sense that it is a bijection (invertible) and preserves inner products, and hence is also linear [26].
CYCLOSTATIONARY ARRAYS
179
JSkx = £ fe , A; an integer.
(2.19)
Let us recall that there exists a unitary operator U denned on H (£) for which (2-20) (see e.g. [28, p. 14]). Hence the Wold isomorphism J leads to JSkx = £fc = uk£o = UkJx,
(2.21)
which implies the expression U = JSJ'1.
(2.22)
Thus in the stationary case U and S are unitarily equivalent or similar operators (see [29, p. 242] and [26, p. 193]). To see how J acts on constant sequences, set x = 1 = (. . . , 1, 1, 1, . . .). Then first we have M (1) = { A l , A e C } is of dimension 1 and so the corresponding random sequence will be ^ = JSk\ = £, a fixed random variable. Clearly E{£k} — E{^} is constant with respect to k. For arbitrary x, the existence of // (x) corresponds to the existence of (x, 1). So if x has a non-zero component along the sequence 1, then JSk~x. = £fc will contain a nonzero component of Jl = £. And conversely, if ^ has the property of a constant projection on £, expressed (£&,£) = c, then J^1^ = iSfcx contains a component along 1 by the use of ( 5&x, l) = ( J«Sfcx, Jl}
v
' J \
' /
Since the existence of the mean /x (x) implies the existence of (<Sfcx, 1J for every k, then (z, 1) exists for every z € M(x). The following proposition may be concluded from continuity of the inner product. Proposition II.4 ery z G H(x).
If ^ (x) = (x, 1) exists, then p, (z) = (z, 1) exists for ev-
Proof. Take z G /f(x) and zn G M(x) with zn —> z. Then the sequence of numbers (zn, 1) is Cauchy because |(z n ,l) - (z m ,l) | mx is the -.
M
^ (x) := ^Um —- £ xjN+m = /*Sny|) exists where |x • y| = {|^j2/j|}°l_00-
Definition III. 3 The N-step autocorrelation of a numerical sequence x is the N-step scalar product between the numerical sequences (Smx and <Snx.'
CYCLOSTATIONARY ARRAYS
181
(3.4) when the limit exists. Again it follows from the independence of the origin that if Rm,n(x) exists, then Rm+kNn+kN^} exists for every integer k and
So if the N-autocorrelation Rjn,n(x) exists for m = 0,1,..., N — 1, n £ Z, it exists for all m,n and (3.5) is true for all m,n and k. Note the N-step mean and autocorrelation are defined on subsequences formed by periodic sampling of x. In the case of stationary stochastic sequences {£„,n £ Z}, the sequences £^ = {£ n jv+fc,n £ Z,k — 0,1,... ,N — 1} formed by sampling {£n,ra £ Z} at intervals of length TV are jointly stationary sequences. In the case of numerical sequences, it does not automatically occur that subsequences formed by periodic sampling are regular and stationary because the needed limits may not exist. As an example, let x be a real sequence taking the values —1,1 for which the partial sums that would define the mean do not converge. Then construct y by ...,a;_i, —x_i,o;o, —XQ,XI, —#1,.... We can see fj, (y) exists but the limit used to define fj, ({j/2j}) does not exist.
B.
Cyclostationarity
Definition III.4 (Regular CS sequence) A numerical sequence x = {xj] of complex numbers Xj such that 1. the N-step mean (3.1) exists for every m (and thus satisfies (3.2) for all m); 2. form = 0,1, ...,TV — l,n G Z, the autocorrelation (3.4) exists (and thus satisfies (3.5) for all m,n); is defined to be a regular cyclostationary sequence with period N. There is another condition on the autocorrelation that is often found in the literature on cyclostationarity; we now make precise its equivalence to item 2 above. The proof may be found in [15]. Proposition III.l A necessary and sufficient of Rm,n(x-} for all m,n is the existence of
, £fc,r(x) := lim ~*°°
condition for the existence
M ]T xj+Txjexp(-i27rkj/N)} j=-M
(3.6)
182
Kurd and Koski
The following notation was also introduced in [15] for the univariate case. Here we give it for the multivariate case. Definition III. 5 A numerical vector sequence z, with components z[ > = (k) 1 °° z] } , k = 0, 1, . . . , K — 1, is N -stationary if the component sequences
{
J
J j = -!X
are jointly N -stationary in the sense that the means 1
M j'=— M
and are constant with respect to m for 0 < r < K — 1 ancf £fte cross correlations defined by .,
p>(N) ( \ ._ ( om(r) cn 7 (s)\ _ i:™ "m,rzl r . s J — ^ z > ^ z )N-$™X2M
X
M
V +1 ^
7ZW ZW jN+mZjN+n
j=-M
exist and depend only on m — n for 0 < r , s < K — 1 and all m, n. We can now see that a regular CS(N) sequence is univariate TV-stationary if and only if fim (x) is constant with respect to m and the quantity B^, T (x) is identically zero for k j^ 0. The 2- norm of a regular CS(N) sequence x is naturally expressed via the correlation ll-5"x||| JV = J RW(x)
(3.9)
and the notation reminds us that the norm of a nonstationary sequence can depend on time (here, n). As before, if for some n, both ^ (x) and || <Snx \\^ jy exist as limits, then
- ^ (x) l) = ^ (x) - ^ (x) &W (1) = ^JV)(x)-/xW(x)=0
(3.10)
and || 5" [x -
MW
(x) l] ||2iJV=|| <S"x |2 >JV - [^) (x)] 2 .
(3.11)
Also, for fixed n, the existence of /4i (l x | 2 ) as a finite limit implies the existence of ph (|x|) as a finite limit but not the existence of /A (x) as a finite limit. Finally, we observe that || 5"x ||2,7V= 0 is valid for every n if and only if it is valid for n = 0, 1, . . . , N — 1.
(3.12)
CYCLOSTATIONARY ARRAYS C.
183
Periodic Numerical Sequences
A special case of a regular CS sequence is a periodic numerical sequence. Definition III. 6 A regular CS sequence x is called periodic with period N if i| <S"(x - S^x) ||2iJV= 0.
(3.13)
forn = 0,l,...,N - 1. Is it not difficult to show that (3.13) implies || 5n(x - S fcAr x) ||2)JV= 0 for fc 6 Z and n = 0, 1, . . . , AT - 1.
Proposition III. 2 A regular CS sequence x is periodic with period N if and only if
/or every m,n. Proof: If x is a regular CS(N) sequence and periodic with period N, all the limits implicit in l/cw-, om+JV., cn_,\ i = HO X — o X, O Xjjv|
V~^ > V"* > Xp\qXj N+pXjN+q.
p=0 g =0
"""^ ""' '
J
-j=-np=09=0
r
Letting denote the transpose and denoting XJTV ; — (XJN, • • • , Xj^v+fc) A = (Ao, . . . , Afc) we have for every j, k k _ 2J X, ^P\xiN+pxjN+q
/o 1 r~\ (3.15)
=\ A Xjjy
> 0,
and
(3.16)
p=0 q=0
which gives the assertion. | And another important property is given in the following lemma. Lemma III. 4 7/x = {xj}'?_00 is a regular CS(N) sequence, then y — x + A<Spx, p is an integer
(3. 1 7)
is a regular CS(N) sequence. Proof. The existence of the N step mean ^m (y) for all m is clear. We now need to show that the autocorrelation kernel Rm,n(y] for the sum exists. To verify this we consider the finite sum M 5^ yjN+myjN+n j=—M
M = E (XJN+m + AZjJV+m+p) (XjN+n + j=—M
M —
M x
x
/ ^ jN+m jN+n + A / ^ XjN+ j=-M j=-M M
M
A / ^ xjN+m+pxjN+n + |A| / ^ j=—M j——M
Dividing this equality by 2M + 1 and taking the limit as M —> oo, we obtain ). Hence the required autocorrelation kernel Rm,n(y) exists for everyTO,n and (N) /
_
—
showing that y is regular CS with period AT. | It follows that any finite linear combination of shifts of a regular CS(N) sequence x is regular CS(N) with the same period and therefore it is meaningful to consider the linear space MN (x) consisting of all such finite linear
(3-18)
CYCLOSTATIONARY ARRAYS
185
combinations: n
y = ^ AjS^x, arbitrary n, \j. j=i On Afjv (x) we define for arbitrary yi, y2, an TV-step scalar product according to
(yi> yz)N = £ £ x}x]>Rj,j'(x)-
(3-19)
The TV-step scalar product (3.19) induces a norm on the elements of ) provided we consider as identical any sequences x, y with || <Sm(x — y) 112,7V= 0 for m e {0,1,... , TV — 1}, which, as discussed before, implies it for all m. The connection between CS(N) random sequences and TV-variate stationary sequences is well known (see Gladyshev [11]). Here is the corresponding statement for numerical sequences. Proposition III.5 A numerical sequence x is regular CS(N) if and only if the vector numerical sequence x formed by x.^ = {^j/v+fc}0!-^ A; = 0 , 1 , . . . , TV 1 is regular stationary in the sense of definition II.2. Proof: If x is regular CS(N), then the existence of the limits /J,m (x) and Rm,n(x) for arbitrary m, n and the uniqueness of the relationships m = pN + r and n = qN + s shows that the limits IJL (X^(p) J and (f>r,s(p — q) will exist for all p,q £Z and r,s e {0,1,...,TV - 1} and (/>r,s(p ~ q) = Rm,n(x-)- Further, the relationship Rm,n(x) = Rm_lNn+N(x.) makes it clear that 4>r,s(p — q) = -Rp7v+r,q/v+s(x) depends only on the difference p - q. The converse follows from the same relationships. | If a stochastic sequence is properly cyclostationary of period TV, then it is not stationary. The following, seemingly paradoxical statement follows from the fact that different inner products are used to determine whether a sequence is CS(N) or CS(1). Proposition III.6 //x is regular CS(N) then it is regular CS(1). Proof. If x is CS(N) and the sums 1
M
7V-1
-,
XjN+k 2MTT j=-M ^MNp n k=0
I - . -,
M
, N-l
/ . ~j^ / ; j=-M k=0
x
jN+m+kxjN+n+k
186
Kurd and Koski
converge as M —> oo, then the limits are /j, (x) and r TO _ n (x). But by regularity, the sums do indeed converge to
— ii^(~v\ n (*} » T V^ 7 /-"i* \ •"• j — — A*' \ •"• I l\l
/
-^ '
K
\
/
Z is strictly periodic with period N. Then the periodically rearranged sequence x denned
by Xn = y(n+fn)
(3-27)
is regular CS(N). As above, -. M
2M
,
.
M (3 28)
'
.
j=-M
]=—
clearly converges to
M^W = A#2/m(y) = &+fm+N(y} = /S,v(x).
(3.29)
Then for any m,n M
M (3.30) 1
j=~M
^ j=~M
converges to
c. Periodic Mixtures of Jointly N-Stationary Sequences. This is a generalization of the amplitude modulation model. We will subsequently see that all cyclostationary numerical sequences can be expressed in a similar form. Suppose {y(k', k — 0 , 1 , . . . , K — 1} is a collection of regular and jointly ./V-stationary sequences with period N and {f(k>,k = 0 , 1 , . . . , K — 1} is a collection of strictly periodic sequences all with period N. Then the sum sequence x defined by K T "JTL
— \^ X /
f (*0 7 / ( f c ) T i V
C\ "V)\ lO.O^I
fc=l
is regular CS(N). Since the existence of the mean /Zm (x) is now clear, consider for any m, n -,
M
r,M^—Y , -,
V
i-. T-»r
T «
1
M
// >, ^jN+m-LjN+n Z]N+mXjN+n — —r)M ^M + 1 -. _£_^
K
K
^ 2-^ JjN+m yjN+mJjN+n
UjN+n
188
Kurd and Koski
which converges to
fc=l fc'=l
F.
The Wold Isomorphism
We are now in position to extend the Wold isomorphism to the cyclostationary case. A stochastic sequence £ = {^•}^._00 with E [| £j |2] < oo is called cyclostationary (CS), see [11], or periodically correlated with period N if fJLn = E [£n] = p,n+N
(3.35)
and if the autocorrelation satisfies Rm,n = E £m£n
= Rm+N,n+N
(3.36)
for every m and n. Since /zn is periodic in n, it is equivalent to require the autocovariance to satisfy (3.36). If £ is CS with period JV = 1, then £ is (weakly) stationary, so for the proper CS-property we require JV > 1. It is well known that a CS random sequence with period N can be obtained by interleaving the components of a stationary vector valued random sequence with N components and vice versa (see [11] and [23]). The Wold isomorphism for cyclostationary sequences follows in exactly the same manner as in the stationary case. One first establishes the correspondence on the two linear manifolds and then the isomorphism is obtained by taking a limit. Proposition III. 7 Let x be a regular CS(N) numerical sequence having autocorrelation Rm,n (x) and let J be the map from MN (x) into sp{£k , k 6 Z} C H (£) defined for arbitrary complex A,/i and integers k,p by J (\Spx
(3.37;
Then J can be extended as an isometric Hilbert space isomorphism from a complex Hilbert space HN (x) containing MN to the complex Hilbert space H (£,}, the closed linear span of a CS stochastic process £ that has autocorrelation function
Proof. Since HN (x) exists, at the very least via the process of abstract completion [26, p. 121, 124 - 125], the proposition holds through the continuity (boundedness) of J on MN (x).
CYCLOSTATIONARY ARRAYS
189
But again the results of Bass [2] may be applied because the pre-Hilbert space MJV (x) is a subspace of the Marcinkiewicz space M2 of sequences described in the previous section. To see this, suppose z £ MJV (x), then (z, z)N =|| z Hjy exists and is given by limit of the type (3.19). But if the limit exists for z certainly the limsup exists and so z 6 M"1. The theorem of Bass implies that if y 6 HN (x) then y £ M2 but also that if y and z 6 HN (x) then (y,z)N exists as a limit. | And now we know there is a continuous invertible linear map
that preserves inner products and the respective topologies are those induced (v)jv and (;-)La. The next proposition follows in the same manner as Proposition II. 4 for stationary sequences, and so we omit the proof. Proposition III. 8 If /^m (x) = (x, l) w exists form = 0, 1,. . . ,N — 1, then Mm (z) = (z, 1) , m = 0, 1, . . . ,-AT — 1 exist for every z G -£Zjv(x). The following proposition, whose proof is simple and thus omitted, (see [14] for the continuous time case and [5] for a review), gives the connection between unitary operators and cyclostationary sequences. It is the foundation for various representations of stochastic cyclostationary sequences and for corresponding results on cyclostationary numerical sequences. Proposition III. 9 A second order stochastic sequence £ = {£n}^L_oo ^s cyclostationary with period N if and only if there exists a unitary operator [/£ on the Hilbert space H (£) for which £n+N = U^n
(3.38)
for every integer n. The previous fundamental fact (3.38) transforms under J~l to a corresponding necessary statement about sequences. Proposition III. 10 // x = {aJj}°l_00 is regular cyclostationary with period N, there exists a unitary operator V on the Hilbert space H (x) for which
for every integer n. Proof. First if x is regular CS with period N, then let £„ be the corresponding stochastic cyclostationary sequence for which (3.38) will hold. Then applying J~l to (3.38) yields
190
Kurd and Koski
= VSnx.
(3.40)
1
Since J is an isometry, V = J~ U^J is unitary and is given by 5^, showing that the iV-shift is unitary. | Having established the Hilbert space isomorphism between HN (x) and the Hilbert space H (£) of a CS(N) random sequence, we can use any of the representations for CS random sequences to produce a corresponding representation for regular CS numerical sequences. G.
Representations of Regular CS Numerical Sequences
To begin, since CS random sequences are strongly harmonizable [11], regular CS numerical sequences are also. This means that the spectral representation = [ " eiXndZ(X) Jo transforms to x = / ^ eiXndz(X) Jo in which z(A) is a sequence valued process which does not necessarily have orthogonal increments as in the stationary case. A few more facts are given in [15]. But here we wish to focus on other representations arising from the fundamental result of Proposition III. 10. For this we need to see how periodic sequences are transformed under J. If p is a periodic CS(N) sequence (definition (III.6), then setting £n = JSnp yields II Cn - W ||i2HI
SH
P - SH+NP \\2,N= 0
for every n. Thus the periodicity given by definition III.6 corresponds, under J, exactly to our usual notion of a periodic sequence of random variables, and conversely. Note that if an is a periodic sequence of constants (which are still random variables) in H (£) then the sequence given by Sna = J~1an is only required to be periodic in the sense of definition III.6. Proposition III. 11 //x is regular CS(N), there exists a unitary operator V : HN (x) —> HN (x) and a sequence p 6 HN (x), regular and periodic in the sense of (3.13), for which Snx = Vn[Snp]
(3.41)
for every n. Proof. The stated result follows by application of J^1 to the representation for CS random sequences (see [14, 5])
CYCLOSTATIONARY ARRAYS
191
£n = Un[Pn]
(3-42)
where U : H(£) i—> H(£) is the unitary Nth root of U^ appearing in (III. 9), and || Pn — Pn+N \\L2= 0 for every n. For then
= Vn[Snp] 1
n
(3.43) l
where V = J~ UJ and S p = J~ [Pn}. The required periodicity follows from || Sn+Np - Snp \\2,N = \\ J~l(Pn+N - Pn] \\2,N= 0
as || Pn+N — Pn | #(£) = 0- The regularity of p follows from its inclusion in HN (x). I A slight variation permits us to give a precise meaning to our previous statement that all cyclostationary sequences are periodic mixtures of jointly stationary sequences. Proposition III. 12 A numerical sequence x is CS(N) if and only if there exists a collection a.^',j = 0, 1, ...,K — I of sequences in HN (x) that are regular and jointly N -stationary with respect to the index n and a collection of scalar periodic functions {/« = fn+N>3 =0, 1,...,/C — 1} for which 5"x = £ /C^aW > j=o for every n.
(3.44)
Proof. We have already shown in the examples that periodic mixtures of regular and jointly ,/V-stationary sequences are CS(N). For the converse we use the fact that a second order CS(N) random sequence has the representation ([14, 5] )
j=0
where fn ) = fn+N a^ n anc^ j = 0,1,. . .K — 1, and the collection {an , j = 0, 1 . . . , K — 1} are jointly stationary. This result follows from noting that if U is the unitary operator of a collection of jointly stationary sequences a$ in #(£), so a^+1 = Ua(j\ then the collection J"1^ = Sna^ is jointly A^-stationary in HN (x) , since
Additional representations of £n, obtained by utilizing the spectral representation of U and Fourier series representations for Pn (see [14, 5]) can give alternative representations for 5nx in the same way.
192
Kurd and Koski
IV.
Concluding Remarks
Wold noted that if Xn is a stationary second order ergodic sequence, then almost all of the sample sequences will be isomorphic to Xn in the sense described above. Similarly, suppose Xn is a stochastic sequence defined on a probability space (£}, F, /i) by Xn(u>} = f(Snu) where S is invertible and measurable but only TV-stationary with respect to /x : n(S~NE) — fj,(E) for E € T. Then if the system (£l,F,n,SN) is ergodic and f ( S n u j ) € ~,/-i), the averages 1 ^ — .A. i JV-i~rn•**• ? 2M + 1 *-" "./»-•-'»" J"i-» j=-M
M 1 2M + 1 *-
]=-
converges for almost every w(/i) to
EXiN+mXiN+n = I f ( S " Jn If Mi,m is the //-null set on which convergence to the indicated limit fails, then it is clear that the set fi — Um,nA/"n)TO has full measure and on this set all the required limits exist. Each u in this set produces a numerical sequence for which the averages of (3.4), and each such sequence is isomophic in the sense of Wold to the stochastic sequence. For other issues related to ergodic theory of CS sequences, see Gray and Kieffer [12], Boyles and Gardner [3], and Gray [13]. Acknowledgments: Part of the research reported here was done during a visit by the first author at Lulea University of Technology. The support by the School of Engineering and the Division of Signal Processing of the Lulea University of Technology as well as the generous hospitality of prof. Per Ola Borjesson are hereby gratefully acknowledged. This work was supported in part by the Office of Naval Research under contracts N00014-92-C-0057 and N00014-95-C-0093, and by the U.S. Army Research Office under contracts DAAH04-96-C-0027 and DAAD1902-C-0045.
References 1. J. L. Abreu, A note on harmonizable and stationary sequences, Bol. Soc. Mat. Mexicana (2), 15, 1970, pp.48-51. 2. J. Bass, Fonctions de Corr'elation Fonctions Pseudo-Al'eatoires et Applications, Masson, New York, 1984. 3. R. A. Boyles and W. A. Gardner, Cycloergodic Properties of DiscreteParameter Nonstationary Stochastic Processes, IEEE Trans, on Information Theory, vol. IT-29, no. 1, pp.105-114, 1983.
CYCLOSTATIONARY ARRAYS
193
4. D. R. Brillinger, Time Series, Data Analysis and Theory, Holden Day, San Francisco, 1981. 5. D. Dehay and H. Kurd, Representation and Estimation for Periodically and Almost Periodically Correlated Random Processes, in Cyclostationarity in Communications and Signal Processing, W.A. Gardner, ed., IEEE Press, 1993, pp. 295-328. 6. H. Furstenberg, Stationary Processes and Prediction Theory, Annals of Mathematics Study, No. 44, Princeton University Press, Princeton, N. J. 1960. 7. W. A. Gardner, Introduction to Random Processes With Applications to Signals & Systems, 2nd Edition, McGraw-Hill, New York. 1990. 8. W. A. Gardner, Statistical Spectral Analysis, A Nonprobabilistic Theory, Prentice Hall, Englewood Cliffs, NJ, 1987. 9. W. A. Gardner, An Introduction to Cy do stationary Signals, in Cyclostationarity in Communications and Signal Processing, W.A. Gardner, ed., IEEE Press, 1993, pp. 1-90. 10. W. A. Gardner and W. A. Brown, Fraction-of-time probability for timeseries that exhibit cyclostationarity, Signal Processing, 23, 1991, pp. 273 - 292. 11. E. G. Gladysev, Periodically Correlated Random Sequences, Sov. Math., 2, 1961, pp. 338 - 388. 12. R. M. Gray and J. C. Kieffer, Asymptotically mean stationary measures, Ann. Prob., 8, 1980, pp. 962 - 973. 13. R. M. Gray , Probability, Random Processes and Ergodic Properties, Springer-Verlag, New York, 1988. 14. H. L. Hurd and G. K. Kallianpur, Periodically Correlated Processes and Their Relationship to 1/2 [0,T]- Valued Stationary Sequences, in Nonstationary Stochastic Processes and their Application, A.G. Miamee, Ed., World Scientific Publishing Co., pp. 256-284, 1992. 15. H. L. Hurd and T. Koski, The Wold Isomorphism for Cyclostationary Sequences, submitted to Signal Processing. 16. L. Izzo and A. Napolitano , Higher-order cyclostationary properties of sampled time-series , Signal Processing, 54, 1996, pp. 303-307. 17. L. Izzo and A. Napolitano , Higher-order statistics for Rice's representation of cyclostationary signals , Signal Processing, 56, 1997, pp. 279-292. 18. L. Ljung, System Identification. Theory for the User, Prentice-Hall Inc. Englewood Cliffs, N. J. 1987. 19. G. W. Mackey, Ergodic Theory and its Significance for Statistical Mechanics and Probability Theory , Adv. in Math., 12, 1974, pp. 178 268. 20. P. Masani, Review of [6], Bull. AMS, 69, March 1963, pp. 195 - 207. 21. P. Masani, Einstein's Contribution to Generalized Harmonic Analysis,
194
Kurd and Koski
Jahrbuch Uberblicke Mathematik, 1986, pp. 191- 209. 22. P. Masani, Review o/[30], in Norbert Wiener: Collected Works Vol II, paper 27a, MIT Press, Cambridge, MASS, USA. 23. A.G. Miamee and H. Salehi, On the Prediction of Periodically Correlated Stochastic Processes, in P. R. Krishnaiah (ed.), Multivariate Analysis V, 1980, pp. 167 - 179. 24. A.G. Miamee and H. Salehi, Harmonizability, V-Boundedness and Stationary Dilation of Stochastic processes , Indiana J. Math., 27, 1978, pp. 37 - 50. 25. A.G. Miamee, Periodically Correlated Processes and Their Stationary Dilations , SIAM J. Appl. Math., 50, No. 4, 1990, pp. 1194-1199. 26. A. W. Naylor and G. R. Sell, Linear Operator Theory in Engineering and Science, Springer-Verlag, New York, Heidelberg, Berlin, 1982. 27. H. Niemi, Orthogonally Scattered Dilations of Finitely Additive Vector Measures with Values in a Hilbert Space, in Prediction Theory and Harmonic Analysis, The Pesi Masani Volume, V. Mandrekar and H. Salehi (eds), North Holland, 1983, pp. 233 - 251. 28. Yu. A. Rozanov, Stationary Random Processes, Holden-Day, San Francisco, Cambridge, London, Amsterdam, 1967. 29. M. H. Stone, Linear Transformations in Hilbert Space, American Mathematical Society Colloquium Publications, Vol. 15, Providence R. I. 1932. 30. N. Wiener, The spectrum of an array, J. of Math. Phys., 6, 1927, pp. 145-157. 31. N. Wiener, Generalized Harmonic Analysis , Acta Math., 55, 1930, pp. 117 - 258. 32. J. C. Willems, From Time Series to Linear System- Part III. Approximate Modelling , Automatica, 23, 1987, pp. 87 - 115. 33. H. O. A. Wold, On Prediction in Stationary Time Series, Ann. Math. Stat. , 19, 1948, pp. 558 - 567.
Operator Theoretic Review for Information Channels Yuichiro Kakihara Department of Mathematics, California State University, San Bernardino, CA 92407-2397
Abstract Information channels are studied in terms of operators between the spaces of measures, called channel operators. Ergodicity for a stationary channel operator is characterized. AMS channel operators are formulated and their ergodicity is also characterized.
1 INTRODUCTION An operator theoretic treatment of information channels is given. Here, an information channel (or a channel) is regarded as an operator, called a channel operator, between the set of input sources (probability measures on the input space) and the set of compound sources (probability measures on the Cartesian product of the input and output spaces). Stationarity, asymptotic mean Stationarity (AMS), and ergodicity are denned for channel operators, and then ergodicity of a stationary channel operator or an AMS channel operator is characterized, where absolute continuity of measures and channels plays a crucial role. A few years after Shannon [15] created information theory, McMillan [11] formulated information sources and channels based on measure theory using alphabet message spaces as the input and output spaces. A rigorous development of information theory is shown, for instance, by Feinstein [4]
195
196
Yuichiro Kakihara
and Khintchine [10]. An abstraction of the concept of information channels was done by Echigo (Choda) and Nakamura in their series of papers [1], [2], [3], where they considered a channel as a linear operator between two W*-algebras under the name of a "generalized channel." When the input and output are taken to be a pair of compact Hausdorff spaces, Umegaki [16] established a one-to-one correspondence between the set of all channels and the set of certain averaging operators from the set of Baire functions on the output to that of the input. Then, he characterized ergodicity for a stationary channel. Ozawa [13] obtained a S*-algebra formulation of a channel that is a direct generalization of Umegaki's operator setting since the set of all Baire functions on a compact Hausdorff space is a typical example of a E*-algebra (see also Ozawa [14]). In this section, some notations and basic definitions are given. In section 2, a stationary channel operator is introduced and its ergodicity is characterized, which is very similar to that of Umegaki [16]. In section 3, an AMS channel operator is defined and some necessary and sufficient conditions for an AMS channel operator to be ergodic will be obtained. We begin with some notations used throughout this paper. Let (X, X, S) and (Y, 2},T) be a pair of measurable spaces with measurable transformations 5 : X -> X and T : Y -> Y, respectively, and (X x Y,X ® 2J, S x T) be the product measurable space. For fi — X,Y or X x Y let P(fi) and M(fi) denote the set of probability measures on fi and the Banach space of all complex valued measures on fi, respectively. Also let B(£i) denote the space of all bounded measurable functions on fi. A channel with input X and output Y is a function v : X x 2) —> [0, 1] such that (cl) v(x, •) € P(Y) for every x 6 X, (c2) z/(-,C) <E B(X) for every C 6 2). C(X,Y) stands for the set of all channels with input X and output Y. A probability measure /j, € P(X) is called an input source. If an input source H € P(X) and a channel v € C(X, Y) are given, we can associate the output source ^v e P(Y) and the compound source \i v e P(X x Y) by letting = I v Jx
H®v(AxC)= I v(x, C}n(dx], JA
A 6 X, C 6 2).
Note that \LV and // ® v are defined also for a general measure yn € M(X). Let
Then F = Fv and G = GV are operators from M(X) into M(F) and M(X x
OPERATOR THEORETIC REVIEW
197
y), respectively. They are known as channel operators associated with the channel v. The following are some properties of these channel operators. (01) F and G are linear, bounded, and positive operators of norm 1 such that F : P(X) -» P(Y) and G : P(X] -> P(X x Y). (02) //(.) = G/x(- x y) and Fji(-) = G/z(X x •) for fj, e M(X).
(03) G/x < /i x F/x for ^ e Ppf). (04) Mi ^ M2 => GMI i-e., G is absolute continuity preserving. In view of the above four conditions, we can define a channel operator as follows. DEFINITION 1.1 An operator G : M(X) -> M(X x Y) is said to be a channel operator if it satisfies (ol), (o2), (o3) and (o4) below: (01) G is a linear, bounded and positive operator from M(X) into M(X x F) of norm 1 such that G : P(X) -» P(X x y);
(02) M(-) = GH(- x y) for n e M (X); (03) G/z < fj, x F/i for ^ e P(X), where F//(-) = G/x(X x •); (04) M! «; p,2 =>• GMI < G^2 for ^1,^2 6 -P(^)Let C?(X, y) denote the set of all channel operators. For a channel operator G € O(X, Y) and an input source n & P ( X ) , Fp € P(Y) and G/x e P(A" x y) are called an output source and a compound source, respectively. It is of interest to consider the following question: If a channel operator G : M(X) —> M(X x y) is given, can we associate a channel v e C(X,Y) such that G = Gvl A partial affirmative answer is given below. PROPOSITION 1.2 Suppose that X and Y are compact Hausdorff spaces and that X and 2) are Borel a-algebras of X and Y, respectively. Assume that Y is totally disconnected. Then, for each channel operator G e O(X, Y) that is weak*-to-weak* continuous, there exists a unique channel v € C(X, Y) such that G = Gv and (c3) f Y f ( ; y ) v ( ; d y )
€ C(X) for / e C(X x y),
where C(X) and C(X x y) are the Banach spaces of all continuous functions on X and X x Y, respectively. REMARK 1.3 That G is weak*-to-weak* continuous means that if {/ua} is a net such that p,a —> p weak* in M(X) = C(X)*, then Gjj,a —> G/n weak* in M(X x y). A channel v satisfying (c3) is said to be continuous.
198
Yuichiro Kakihara
Proof of Proposition 1.2: Since G : M(X] -> M(X x Y) is weak*-toweak* continuous, it has the predual G, = G\C(XxY) : C(X x y) -> C(X). Let x G X be arbitrary and define px by
where (lx 0 6)(x, y) = lx(aO&(y) for x G X and y £ Y, \x being the identity function on X. Since px is a positive linear functional of norm 1, there exists a probability measure v(x, •) G P(Y) such that =
b(y)v(x,dy), JY Since Y is totally disconnected, the set T = {C G 2) : C is closed and open} forms a topological basis for Y. Note that p x (lc) = v(x, C) for C € T since lc G C(y), lc being the indicator function of C. Now let 2)* = {C1 e 2) : z/(-, C) G -B(X)}. Then, it is easy to see that 3)* is a monotone class containing T, so that it coincides with s (X,y) (mod^),ifG = aGl + (1 - a)G2 (modP), Gi,G 2 e ^(X.y) and 0 < a < 1 imply that GI = G2 = G (modP). REMARK 2.6 We usually consider the cases where P = PS(X) and Pse(X). If (X, X, S) is complete for ergodicity, then mod PS(X) is equivalent to mod Pse(X). For an ordinary stationary channel Yi [17], Umegaki [16] and Nakamura [12] characterized its ergodicity. Now some necessary and sufficient conditions for a stationary channel operator to be ergodic will be given in the same spirit as above mentioned authors. THEOREM 2.7 Assume that (X, X, 5) is complete for ergodicity. Then,
200
Yuichiro Kakihara
for a stationary channel operator G G OS(X,Y) the following conditions are equivalent: (1) G € Ose(X,Y), i.e., G is ergodic; (2) G G e*Os(X,Y) (modPsepO); (3) There exists an ergodic stationary channel operator G\ G Ose(X,Y) such that G < GI (mod Pse(X)) ; (4) // a stationary channel operator GI & OS(X,Y) is such that GI (2). Let G € Ose(X,Y). Assume that G = aGi + (1 - a)G2 (modPsePO), where 0 < a < 1 and Gi,G 2 € OS(X,Y). Then, for [i G PsePO it follows that + (1 - a)G 2 M G Pse(X x F) = exPs(X x y). Since this is a proper convex combination of measures in PS(X x Y), we have GI// = G 2/ u = G/z for /x e Pse(X). Hence GI = G2 = G (modP se (X)). Therefore, (2) holds. (2) =>• (1). Suppose that G is not ergodic. Then there exists a stationary ergodic source /J.Q 6 Pse(^) such that G/UQ ^ P«e(^ x y). Hence, 0 < GHQ(EQ) < 1 for some 5 x T-invariant set ^o G % ® 2) . Let AI = G/xoC-Ko) and A2 = 1 — AI. Moreover, let 7 > 0 be such that 0 < 7 < min{Ai, A2} and a, = ^ for i = 1, 2. Now define operators GI, G2 : M(X) -* M(X x y) by n £?0) + (l - aiGju(E0))G,u(EO n
for /i G M(X) and £ € £ ® 2). Clearly, GI, G2 : M(X) -> M(X x y) are positive linear operators such that GI, G2 : P(-X") —> P(X x y). Hence GI, G2 C7(A",y) as conditions (o2)-(o4) are easily verified for GI and G2. We now show that Gi,G 2 G OS(X,Y). Let p. G PS(X) and E G ^02). Then, it follows that T)-1^) = aiGfj,((s x rr1^ n + (1 -
since G// is stationary and E1 is 5 x T-invariant. Thus GI is stationary. Similarly, G2 is shown to be stationary.
OPERATOR THEORETIC REVIEW
201
Next we show that G\ ^ G2 (mod Pse(X}) . Observe that
£c n E
+ 1
A2 —
nE
O) + 1 = (1 - aiAi)A 2 = A2 - a i A i A 2 , and hence (G2fj,0 - Gino)(Eft
= Q 2A2 + A2 - a 2 A 2 - (A2 = a 2 A 2 + A 2 (aiAi — a 2 A 2 ) = 7 + A 2 (7-7) =7>0,
implying that GI//Q ^ G^^Q. Thus GI ^ G2 ( mod Pse(X}}. Finally we show that AiGi + A 2 G 2 = G. For \i e P(X) and E e X (g> 2)
A2 [a2G/i(£; n
n £0) + n (1) Therefore, G £ ex0 s (X,y) (modP se (X)), hence (1) is not true. (1) =>• (3) is trivial. (3) ^> (1). Assume that G (4). Let G2 6 C? B (X,y) be such that G2 < G ( mod P Be (A")). If G2 ^ G (modPSe(-X'))) then there exists some p, € Pse(X) such that G2/u ^ G/z. Since G2/i (1). Suppose G ^ O ae (A",y). Then, there exists some fj,0 e Pse(^) such that G/UQ € PS(X x y)\P se (X x Y), and hence there exists some S x
202
Yuichiro Kakihara
T-invariant E0 € X 2J such that 0 < G^0(E0) < 1. Define G2 by
n£ for /z 6 P(-X') and E1 e X 2). It is easily seen that G% can be extended to an operator on M(X), G2 € Oa(X,Y), G2 & G (mod Pse(X}) , and G2 < G, which is a contradiction of (4).
3 AMS CHANNEL OPERATORS An AMS channel operator is introduced and its ergodicity is considered. In this sectoion, we assume that 5 and T are invertible and that (X, X, 5) is complete for ergodicity. A probability measure /j, £ P(X) is said to be asymptotically mean stationary or AMS if ., n-i fc=0
exists for every A e X. In this case, p, is a stationary source (i.e., p 6 Pa(X)} and is called the stationary mean of /x. Let P0(-^) denote the set of all AMS sources in P(X). An AMS source p, 6 P0PO is said to be ergodic if p,(A) = 0 or I for every 5-invariant set A € X. Pae(X) stands for the set of all AMS ergodic sources in Pa(X). We use the notations Pa(Y),Pae(X x y), etc in the obvious fashion. Here are some known facts on AMS measures that will be frequently used in the rest of this section (cf. Fontana, Gray and Kieffer [5], Gray and Kieffer [6], and Kakihara [7], [8], [9]). REMARK 3.1 (1) A source yu € P(X) is AMS if and only if the Pointwise Ergodic Theorem holds, i.e., for any bounded measurable function / € B(X) there exists some 5-invariant function /* £ B(X) such that n-l
fn(x) = ^ f ^ -> f*^ n k=0
V-a-t- *•
In this case, it also holds that /„ —> /* in Ll(X,n). (2) A source p, € P(X) is AMS if and only if there exists a stationary source /ii € Ps(X) such that /u (1). Let ;u € PS(X ) be given and G2 € 0a(X, y) be such that G/z < G 2 /z. Then, GI/Z e P a (X x y) and G^ < G2/z < G^ € PS(X x y). Thus G/z £Pa(X xY). Hence G is AMS by Lemma 3.4. As every AMS source /tz 6 PaC^O has the stationary mean /z e every AMS channel operator should have a "stationary mean," which is denned below. DEFINITION 3.7 Let G € Oa(X,Y) be an AMS channel operator. Then a stationary channel operator GI e Oa(X, Y) is said to be a stationary mean of G if Gi/tz = G/z for every /z € P a (X). GI is unique in modPs(X) or, equivalently, modPse(^0 sense. That is, if G% € OS(X,Y) is a stationary mean of G, then GI = G2 ( mod PS(X)). Hence, we denote any stationary mean of G by G. REMARK 3.8 For an AMS constant channel operator Gt) with 77 € Pa(X) the stationary mean is obtained as G,, = G^, which is seen by Remark 3.1 (4). For a general AMS channel operator G 6 Oa(X,Y) its stationary mean G is obtained as follows. Take any stationary channel operator GI € OS(X,Y) and define G by s
Now we summarize some necessary and sufficient conditions for a channel operator to be AMS as follows.

PROPOSITION 3.9 For a channel operator $G \in O(X,Y)$ the following conditions are equivalent:

(1) $G \in O_a(X,Y)$, i.e., $G$ is AMS;
(2) $G$ has a stationary mean $\bar{G} \in O_s(X,Y)$;
(3) There exists a stationary channel operator $G_1 \in O_s(X,Y)$ such that $G \ll G_1$ (mod $P_{se}(X)$);
(4) There exists an AMS channel operator $G_2 \in O_a(X,Y)$ such that $G \ll G_2$ (mod $P_{se}(X)$);
(5) $\mu \in P_a(X) \Rightarrow G\mu \in P_a(X\times Y)$.

Proof: (1) $\Rightarrow$ (2) is shown in Remark 3.8. (2) $\Rightarrow$ (3). Take $G_1 = \bar{G}$. (3) $\Rightarrow$ (4) is obvious. (4) $\Rightarrow$ (5). If $\mu \in P_s(X)$, then $G\mu \ll G_2\mu \ll \overline{G_2\mu} \in P_s(X\times Y)$ by assumption (4). Then $G\mu \in P_a(X\times Y)$ by Remark 3.1 (2). (5) $\Rightarrow$ (1) was shown in Lemma 3.4.

DEFINITION 3.10 An AMS channel operator $G \in O_a(X,Y)$ is said to be ergodic if

(o8) $\mu \in P_{ae}(X) \Rightarrow G\mu \in P_{ae}(X\times Y)$,

i.e., $G$ is ergodicity preserving. Let $O_{ae}(X,Y)$ denote the set of AMS ergodic channel operators. Ergodic AMS channel operators are characterized as follows.

THEOREM 3.11 For an AMS channel operator $G \in O_a(X,Y)$ with the stationary mean $\bar{G}$ the following conditions are equivalent:

(1) $G \in O_{ae}(X,Y)$, i.e., $G$ is ergodic;
(2) $\mu \in P_{se}(X) \Rightarrow G\mu \in P_{ae}(X\times Y)$;
(3) $\bar{G} \in O_{se}(X,Y)$;
(4) There exists a stationary ergodic channel operator $G_1 \in O_{se}(X,Y)$ such that $G \ll G_1$ (mod $P_{se}(X)$);
(5) There exists an AMS ergodic channel operator $G_2 \in O_{ae}(X,Y)$ such that $G \ll G_2$ (mod $P_{se}(X)$).

Proof: (1) $\Rightarrow$ (2) is obvious. (2) $\Rightarrow$ (3). If $\mu \in P_{se}(X)$, then $G\mu \in P_{ae}(X\times Y)$ by (2), so that $\bar{G}\mu = \overline{G\mu} \in P_{se}(X\times Y)$. Thus $\bar{G} \in O_{se}(X,Y)$.
(3) $\Rightarrow$ (4). Take $G_1 = \bar{G}$. (4) $\Rightarrow$ (5) is immediate. (5) $\Rightarrow$ (1). Let $\mu \in P_{ae}(X)$. Then $G\mu \ll G_2\mu \ll \overline{G_2\mu} \in P_{se}(X\times Y)$ by (5). Remark 3.1 (6) implies that $G\mu \in P_{ae}(X\times Y)$. Thus $G$ is ergodic.

EXAMPLE 3.12 As is seen in Example 3.3, a constant channel operator $G_\eta$ is AMS if and only if $\eta$ is AMS. Now, $G_\eta$ is AMS and ergodic if and only if its stationary mean $G_{\bar{\eta}}$ is ergodic, if and only if $\bar{\eta}$ is weakly mixing.

Just as ergodicity was seen to be weaker than extremality for AMS sources, we can prove the following.

THEOREM 3.13 (1) If an AMS channel operator is extremal (mod $P_{se}(X)$) in the set of AMS channel operators, then it is ergodic. That is, $\operatorname{ex} O_a(X,Y) \subset O_{ae}(X,Y)$.
(2) If $(Y,\mathcal{Y})$ is nontrivial and there exists a weakly mixing output source $\eta \in P_{ae}(Y)$, then the above set inclusion is proper. That is, there exists an AMS ergodic channel operator that is not extremal in the set of AMS channel operators.

Proof: (1) Let $G \in O_a(X,Y)\setminus O_{ae}(X,Y)$. Then, by Proposition 3.9 there exists a stationary ergodic source $\mu_0 \in P_{se}(X)$ such that $G\mu_0 \in P_a(X\times Y)\setminus P_{ae}(X\times Y)$. Hence there exists an $S\times T$-invariant set $E_0 \in \mathcal{X}\otimes\mathcal{Y}$ such that $0 < \lambda_1 = G\mu_0(E_0) < 1$. Let $\lambda_2 = 1 - \lambda_1$, take $\gamma$ such that $0 < \gamma < \min\{\lambda_1, \lambda_2\}$, and let $\alpha_i = \gamma/\lambda_i$ for $i = 1, 2$. Define $G_1$ and $G_2$, for $\mu \in M(X)$ and $E \in \mathcal{X}\otimes\mathcal{Y}$, as in the proof of Theorem 2.7 (with this $E_0$ and these $\lambda_i$, $\alpha_i$). Then, as in that proof, we see that $G_1 \neq G_2$ (mod $P_{se}(X)$) and $G = \lambda_1 G_1 + \lambda_2 G_2$. We now verify that $G_1, G_2 \in O_a(X,Y)$. Let $\bar{G}$ be the stationary mean of $G$. Then $G \ll \bar{G}$ (mod $P_s(X)$). For $\mu \in P_s(X)$ we have that $G\mu(\cdot\cap E_0) \le \bar{G}\mu(\cdot\cap E_0) \le \bar{G}\mu(\cdot)$ and $G_1\mu \le \alpha_1 G\mu + (1 - \alpha_1 G\mu(E_0))\bar{G}\mu \ll \bar{G}\mu$. Consequently $G_1$ is AMS by Proposition 3.9. Similarly, $G_2$ is AMS. Therefore $G$ is not extremal in $O_a(X,Y)$ (mod $P_{se}(X)$).

(2) Let $\eta \in P_{se}(Y)$ be weakly mixing and define $\xi$ by
$$\xi(C) = \int_C g\,d\eta, \quad C \in \mathcal{Y},$$
where $g \in L^1(Y,\eta)$ is nonnegative valued with norm $1$ and is not $T$-invariant on a set of positive $\eta$ measure. Then we see that $\xi \in P_{ae}(Y)$,
$\xi \neq \eta$, $\bar{\xi} = \eta$ and $\zeta = \frac{1}{2}(\xi + \eta) \in P_{ae}(Y)$ by Remark 3.1 (8). Hence the constant channel operator $G_\zeta$ is AMS ergodic and not extremal in $O_a(X,Y)$, since $\overline{G_\zeta} = G_{\bar{\zeta}} = G_\eta \in O_{se}(X,Y)$ by Example 3.12.
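The construction in the proof of Theorem 3.13 (2) can be checked numerically in a toy model. The sketch below assumes, purely for illustration, that $(Y,T,\eta)$ is a five-point cyclic rotation with the uniform (ergodic) measure; it builds $\xi(C)=\int_C g\,d\eta$ from a density $g$ of $\eta$-integral one that is not $T$-invariant, and verifies that $\xi\neq\eta$ while the Cesàro averages $\frac1n\sum_{k=0}^{n-1}\xi(T^{-k}C)$ return $\eta(C)$, i.e. the stationary mean of $\xi$ is $\eta$.

```python
from fractions import Fraction
from itertools import combinations

m = 5                                   # Y = {0,...,4}, T(y) = y+1 mod m
eta = [Fraction(1, m)] * m              # uniform measure, ergodic under T
g = [Fraction(2), Fraction(3, 2), Fraction(1, 2), Fraction(1, 2), Fraction(1, 2)]
assert sum(gi * ei for gi, ei in zip(g, eta)) == 1   # g has eta-integral 1

def xi(C):                              # xi(C) = integral of g over C w.r.t. eta
    return sum(g[y] * eta[y] for y in C)

def T_inv_k(C, k):                      # T^{-k} C for the rotation
    return {(y - k) % m for y in C}

def cesaro(C, n):                       # (1/n) * sum_k xi(T^{-k} C)
    return sum(xi(T_inv_k(C, k)) for k in range(n)) / n

assert xi({0, 1}) != sum(eta[y] for y in {0, 1})     # xi differs from eta
n = 10 * m                                           # full cycles => exact limit
for size in range(1, m):
    for C in combinations(range(m), size):
        assert cesaro(set(C), n) == Fraction(size, m)   # stationary mean is eta
print("xi is AMS with stationary mean eta (checked on all subsets)")
```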
References
1. M. Choda and M. Nakamura, A remark on the concept of channels II, Proc. Japan Acad. 46 (1970), 932-935.
2. M. Choda and M. Nakamura, A remark on the concept of channels III, Proc. Japan Acad. 47 (1971), 464-469.
3. M. Echigo (Choda) and M. Nakamura, A remark on the concept of channels, Proc. Japan Acad. 38 (1962), 307-309.
4. A. Feinstein, Foundations of Information Theory, McGraw-Hill, New York, 1958.
5. R. J. Fontana, R. M. Gray and J. C. Kieffer, Asymptotically mean stationary channels, IEEE Trans. Inform. Theory IT-27 (1981), 308-316.
6. R. M. Gray and J. C. Kieffer, Asymptotically mean stationary measures, Ann. Probab. 8 (1980), 962-973.
7. Y. Kakihara, Ergodicity of asymptotically mean stationary channels, J. Multivariate Anal. 39 (1991), 315-323.
8. Y. Kakihara, Abstract Methods in Information Theory, World Scientific, Singapore, 1999.
9. Y. Kakihara, Ergodicity and extremality of AMS sources and channels, Internat. J. Math. Math. Sci., to appear.
10. A. I. Khintchine, Mathematical Foundations of Information Theory, Dover, New York, 1958.
11. B. McMillan, The basic theorems of information theory, Ann. Math. Statist. 24 (1953), 196-219.
12. Y. Nakamura, Measure-theoretic construction for information theory, Kodai Math. Sem. Rep. 21 (1969), 133-150.
13. M. Ozawa, Channel operators and quantum measurement, Res. Rep. Inform. Sci., Tokyo Institute of Technology, No. A-29, May 1977.
14. M. Ozawa, Optimal measurements for general quantum systems, Rep. Math. Phys. 18 (1980), 11-28.
15. C. E. Shannon, A mathematical theory of communication, Bell System Technical J. 27 (1948), 379-423, 623-656.
16. H. Umegaki, Representation and extremal properties of averaging operators and their applications to information channels, J. Math. Anal. Appl. 25 (1969), 41-73.
17. S. S. Yi, Basic problems concerning stationary channels, Advancement in Mathematics 7 (1964), 1-38 (in Chinese).
Pseudoergodicity in Information Channels
Yuichiro Kakihara
Department of Mathematics, California State University, San Bernardino, CA 92407-2397
Abstract
Pseudoergodicity is defined for stationary channels and asymptotically mean stationary (AMS) channels, and then it is characterized for both types of channels, where Adler's ergodicity (output ergodicity) is shown to be equivalent to pseudoergodicity.
1 INTRODUCTION
Adler [1] considered strong and weak mixing properties and ergodicity for a stationary channel and stated that a weakly mixing channel is ergodic. Pseudoergodicity and Adler ergodicity (output ergodicity) are considered here in addition to the usual ergodicity for stationary or AMS (asymptotically mean stationary) channels. A characterization of pseudoergodicity is obtained for stationary and AMS channels, where it is shown that pseudoergodicity is equivalent to output ergodicity. We refer to Kakihara [8] for a general reference.

Here are the notations and terminology that will be used throughout the paper. Let $(X, \mathcal{X})$ and $(Y, \mathcal{Y})$ be a pair of measurable spaces, and let $S : X \to X$ and $T : Y \to Y$ denote measurable transformations. $(X\times Y, \mathcal{X}\otimes\mathcal{Y}, S\times T)$ stands for the product measurable space with the measurable transformation $S\times T$. Let $P(X)$ denote the set of all probability measures on $\mathcal{X}$ and $P_s(X)$ the set of all stationary measures in $P(X)$, i.e., $\mu \in P_s(X)$ if $\mu(S^{-1}A) = \mu(A)$
for $A \in \mathcal{X}$. $P_{se}(X)$ indicates the set of all ergodic measures in $P_s(X)$. Let $P_a(X)$ denote the set of all AMS (i.e., asymptotically mean stationary) measures in $P(X)$, where $\mu \in P(X)$ is said to be AMS if
$$\bar{\mu}(A) = \lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}\mu(S^{-k}A)$$
exists for every $A \in \mathcal{X}$. In this case $\bar{\mu} \in P_s(X)$ and it is called the stationary mean of $\mu$. An AMS measure $\mu \in P_a(X)$ is said to be ergodic if $\mu(A) = 0$ or $1$ for $S$-invariant $A \in \mathcal{X}$. $P_{ae}(X)$ represents the set of all ergodic measures in $P_a(X)$. We also use the same type of notations for $Y$ and $X\times Y$ instead of $X$ above.

A channel with input $X$ and output $Y$ is a triple $[X, \nu, Y]$ or simply $\nu$, where $\nu : X \times \mathcal{Y} \to [0,1]$ is a function which satisfies:

(c1) $\nu(x,\cdot) \in P(Y)$ for $x \in X$;
(c2) $\nu(\cdot, C)$ is $\mathcal{X}$-measurable for $C \in \mathcal{Y}$.

Let $C(X,Y)$ denote the set of all channels. For $\mu \in P(X)$ and $\nu \in C(X,Y)$ the output source $\mu\nu \in P(Y)$ and the compound source $\mu\otimes\nu \in P(X\times Y)$ are defined by
$$\mu\nu(C) = \int_X \nu(x,C)\,\mu(dx), \qquad \mu\otimes\nu(A\times C) = \int_A \nu(x,C)\,\mu(dx), \quad A \in \mathcal{X},\ C \in \mathcal{Y},$$
so that $F_\nu : P(X) \to P(Y)$ and $G_\nu : P(X) \to P(X\times Y)$, given by $F_\nu\mu = \mu\nu$ and $G_\nu\mu = \mu\otimes\nu$, are affine mappings. A channel $\nu$ is said to be strictly stationary if

(c3) $\nu(x, T^{-1}C) = \nu(Sx, C)$ for $x \in X$ and $C \in \mathcal{Y}$.

We may define stationarity of a channel $\nu \in C(X,Y)$ as follows: $\nu \in C(X,Y)$ is said to be stationary if

(c3') $\mu \in P_s(X) \Rightarrow \mu\otimes\nu \in P_s(X\times Y)$, i.e., $G_\nu$ is a mapping from $P_s(X)$ into $P_s(X\times Y)$.

It is well known that (c3) implies (c3'). The converse implication is essentially true if $S$ is invertible and $\mathcal{Y}$ has a countable generator. That is, if a channel $\nu$ satisfies (c3'), then for each $\mu \in P_s(X)$ there exists a strictly stationary channel $\nu_1$ (satisfying (c3)) such that $\nu(x,\cdot) = \nu_1(x,\cdot)$ $\mu$-a.e. $x$ (cf. Fontana, Gray and Kieffer [3]). Let $C_s(X,Y)$ denote the set of stationary channels.
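For a finite-alphabet illustration of these notions: when $X$ and $Y$ are finite, a channel is a row-stochastic matrix, the output source is $\mu\nu(C)=\int_X\nu(x,C)\,\mu(dx)$, and the compound source is $\mu\otimes\nu(A\times C)=\int_A\nu(x,C)\,\mu(dx)$. The sketch below, whose particular matrix and measures are assumptions made for illustration only, computes both and exhibits a constant channel $\nu_\eta(x,C)=\eta(C)$ of the kind considered in Example 2.3 below, for which $\mu\nu_\eta=\eta$ and $\mu\otimes\nu_\eta=\mu\times\eta$.

```python
import numpy as np

# Finite model: X = {0,1,2}, Y = {0,1}; a channel is a row-stochastic matrix
# nu[x, c] = nu(x, {c}), and an input source mu is a probability vector on X.
nu = np.array([[0.9, 0.1],
               [0.2, 0.8],
               [0.5, 0.5]])
mu = np.array([0.5, 0.3, 0.2])

mu_nu = mu @ nu                    # output source: (mu nu)(C) = sum_x mu(x) nu(x, C)
mu_x_nu = mu[:, None] * nu         # compound source: (mu (x) nu)({(x, c)}) = mu(x) nu(x, {c})

assert np.isclose(mu_nu.sum(), 1.0) and np.isclose(mu_x_nu.sum(), 1.0)

# Constant channel nu_eta(x, C) = eta(C): every row equals eta, so the output
# source is eta and the compound source is the product measure mu x eta.
eta = np.array([0.3, 0.7])
nu_eta = np.tile(eta, (len(mu), 1))
assert np.allclose(mu @ nu_eta, eta)
assert np.allclose(mu[:, None] * nu_eta, np.outer(mu, eta))
print("output source:", mu_nu, "  compound source:\n", mu_x_nu)
```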
2 SOME TYPES OF ERGODICITY OF STATIONARY CHANNELS

We consider pseudoergodicity, ergodicity and output ergodicity (Adler's ergodicity) for stationary channels, and characterize pseudoergodicity. Throughout this section we assume that $S$ is invertible and $\mathcal{Y}$ has a countable generator, so that stationarity and strict stationarity are essentially the same. We need the following lemma, which was proved by Yi [14] and can be viewed as an ergodic theorem for a stationary channel. For the proof see e.g. Kakihara [9].

LEMMA 2.1 Let $\nu \in C_s(X,Y)$ and $\mu \in P_s(X)$, and let $E_{\mathfrak{J}}(\cdot)$ denote the conditional expectation with respect to $\mathfrak{J} = \{E \in \mathcal{X}\otimes\mathcal{Y} : (S\times T)^{-1}E = E\}$ under the measure $\mu\otimes\nu$. In particular, for every $C, D \in \mathcal{Y}$ it holds that
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}\nu(x, T^{-k}C\cap D) = \int_D E_{\mu\nu}(1_C \mid \mathfrak{J}_Y)(y)\,\nu(x,dy) \quad \mu\text{-a.e. } x \in X, \qquad (1)$$
where $\mathfrak{J}_Y = \{C \in \mathcal{Y} : T^{-1}C = C\}$.
and to be ergodic if (c6) /* € Pse(X] ^n®vt Pse(X x F), i.e., G, : P Clearly ergodicity implies pseudoergodicity. Yi [13], [14] obtained some
necessary and sufficient conditions for a stationary channel to be ergodic. For some years this work was overlooked. Shortly after [14], Umegaki [12] and Nakamura [10] independently gave several equivalent conditions for ergodicity (see Kakihara [9] or [8, Section 3.4] for the details).

EXAMPLE 2.3 Let $\eta \in P(Y)$ and define $\nu_\eta$ by
$$\nu_\eta(x, C) = \eta(C), \quad x \in X,\ C \in \mathcal{Y}. \qquad (2)$$
Then clearly $\nu_\eta$ is a channel, which may be called a "constant" channel. For this channel $\nu_\eta$ it holds that
$$\mu\nu_\eta = \eta, \qquad \mu\otimes\nu_\eta = \mu\times\eta, \quad \mu \in P(X).$$
The following are simple observations.

(1) $\nu_\eta$ is stationary if and only if $\eta$ is stationary.
(2) If $\eta$ is stationary and ergodic, then $\nu_\eta$ is stationary, output ergodic and pseudoergodic.
(3) $\nu_\eta$ is stationary and ergodic if and only if $\eta$ is stationary and weakly mixing.

Hence, output ergodicity is weaker than ergodicity. If $\mathcal{P} \subset P(X)$, then $\mathcal{P}$-a.e. means $\mu$-a.e. for every $\mu \in \mathcal{P}$. We now characterize pseudoergodicity as follows.

THEOREM 2.4 For a stationary channel $\nu \in C_s(X,Y)$ the following conditions are equivalent:

(1) $\nu$ is pseudoergodic;
(2) $\nu$ is output ergodic;
(3) If $C \in \mathcal{Y}$ is $T$-invariant and $\mu \in P_{se}(X)$, then $\nu(x,C) = 0$ $\mu$-a.e. $x$ or $\nu(x,C) = 1$ $\mu$-a.e. $x$;
(4) There exists a stationary pseudoergodic channel $\nu_1$ such that $\nu(x,\cdot) \ll \nu_1(x,\cdot)$ $P_{se}(X)$-a.e. $x$.

Proof: (1) $\Rightarrow$ (2). Let $\mu \in P_{se}(X)$. Then $\mu\nu \in P_{se}(Y)$ by (1). It follows from (2.1) in Lemma 2.1 that for every $C, D \in \mathcal{Y}$
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}\nu(x, T^{-k}C\cap D) = \mu\nu(C)\,\nu(x,D) \quad \mu\text{-a.e. } x.$$
Thus (c5) holds. Therefore $\nu$ is output ergodic.
(2) $\Rightarrow$ (3). Let $C = D \in \mathcal{Y}$ be $T$-invariant and $\mu \in P_{se}(X)$. Then $\nu(x,C)^2 = \nu(x,C)$ and hence $\nu(x,C) = 0$ or $1$ $\mu$-a.e. $x$. Let
$$A_0 = \{x \in X : \nu(x,C) = 0\}, \qquad A_1 = \{x \in X : \nu(x,C) = 1\}. \qquad (4)$$
Then $A_0$ and $A_1$ are $S$-invariant since $\nu$ is stationary and $C$ is $T$-invariant. So $\mu(A_0) = 0$ and $\mu(A_1) = 1$, or $\mu(A_0) = 1$ and $\mu(A_1) = 0$, since $\mu$ is ergodic. Thus (3) follows.
(3) $\Rightarrow$ (1). Let $C \in \mathcal{Y}$ be $T$-invariant and $\mu \in P_{se}(X)$. Let $A_0$ and $A_1$ be as in (2.4) above. Then $\mu(A_j) = 0$ or $1$ for $j = 0, 1$ by assumption (3). Thus
$$\mu\nu(C) = \int_X \nu(x,C)\,\mu(dx) = 0 \ \text{or} \ 1.$$
This gives that $\mu\nu$ is ergodic. Therefore $\nu$ is pseudoergodic.
(1) $\Rightarrow$ (4) is trivial, taking $\nu_1 = \nu$, and (4) $\Rightarrow$ (1) follows from the fact that $\mu\nu \ll \mu\nu_1$ for every $\mu \in P_{se}(X)$, so that the stationary measure $\mu\nu$ coincides with the ergodic measure $\mu\nu_1$.
(1) $\Rightarrow$ (5). Suppose that $\nu$ is pseudoergodic and $\mu \in P_{se}(X)$, so that $\mu\nu$ is ergodic. Let $C, D \in \mathcal{Y}$. Then the Cesàro averages in (5) tend to $0$ as $n \to \infty$ since $\mu\nu$ and $\mu$ are ergodic. (5) $\Rightarrow$ (1) can be shown in the same fashion as above.

Comparing (c5) and (2.3), we see that (c5) implies (2.3) by the Bounded Convergence Theorem. That is, output ergodicity appears stronger than
pseudoergodicity. But in fact they are equivalent, as was seen above. For a stationary channel $\nu \in C_s(X,Y)$ the following implications are valid:

Ergodicity $\Rightarrow$ Output ergodicity $\Leftrightarrow$ Pseudoergodicity.

The first implication follows from the equivalence condition for ergodicity proved by Yi [14] (cf. [8, p. 147]). The implication "output ergodicity $\Rightarrow$ ergodicity" is false, as seen in Example 2.3. Nakamura [11] showed that there exist a stationary output ergodic channel $\nu$ and a strongly mixing input source $\mu$ such that the compound source $\mu\otimes\nu$ is not ergodic (cf. [8, p. 146]). See also Gray [5, p. 189].
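The gap between output ergodicity and ergodicity in Example 2.3 can be verified by hand in the smallest possible case. In the sketch below, both $(X,S)$ and $(Y,T)$ are assumed to be the two-point swap with the uniform measure $\eta=\mu$; $\eta$ is ergodic (so the constant channel $\nu_\eta$ is output ergodic and pseudoergodic) but not weakly mixing, and the compound source $\mu\times\eta$ has a nontrivial $S\times T$-invariant set (the diagonal) of measure $\frac12$, so $\nu_\eta$ is not ergodic.

```python
from fractions import Fraction
from itertools import product

# X = Y = {0,1}; S = T = the swap 0 <-> 1.  eta = mu = uniform measure,
# which is ergodic for the swap (the only invariant sets are {} and {0,1}).
swap = {0: 1, 1: 0}
mu = eta = {0: Fraction(1, 2), 1: Fraction(1, 2)}

# Compound source of the constant channel nu_eta: mu (x) nu_eta = mu x eta.
pts = list(product([0, 1], [0, 1]))
comp = {(x, y): mu[x] * eta[y] for x, y in pts}
SxT = {(x, y): (swap[x], swap[y]) for x, y in pts}

# The diagonal is S x T-invariant and has measure 1/2: the compound source
# is not ergodic, so nu_eta is not an ergodic channel, although it is
# output ergodic (eta itself is ergodic).
diag = {(0, 0), (1, 1)}
assert {SxT[p] for p in diag} == diag                 # invariant under S x T
mass = sum(comp[p] for p in diag)
assert 0 < mass < 1
print("measure of the invariant diagonal:", mass)     # prints 1/2
```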
3 AMS PSEUDOERGODIC CHANNELS
Pseudoergodicity and output ergodicity can also be defined for AMS channels, and their characterizations are also obtained. In this section we assume that $S$ and $T$ are invertible and $\mathcal{Y}$ has a countable generator, which guarantees the existence of the stationary mean of an AMS channel. Recall the notations $P_a(\Omega)$ and $P_{ae}(\Omega)$ for $\Omega = X, Y$ and $X\times Y$ given in Section 1.

DEFINITION 3.1 A channel $\nu \in C(X,Y)$ is said to be asymptotically mean stationary (or AMS) if

(c7) $\mu \in P_a(X) \Rightarrow \mu\otimes\nu \in P_a(X\times Y)$, i.e., $G_\nu : P_a(X) \to P_a(X\times Y)$.

Let $C_a(X,Y)$ denote the set of all AMS channels. An AMS channel $\nu \in C_a(X,Y)$ is said to be ergodic if

(c8) $\mu \in P_{ae}(X) \Rightarrow \mu\otimes\nu \in P_{ae}(X\times Y)$, i.e., $G_\nu : P_{ae}(X) \to P_{ae}(X\times Y)$,

and to be pseudoergodic if

(c9) $\mu \in P_{ae}(X) \Rightarrow \mu\nu \in P_{ae}(Y)$, i.e., $F_\nu : P_{ae}(X) \to P_{ae}(Y)$.

AMS channels were defined and studied by Fontana, Gray and Kieffer [3], where they characterized ergodic AMS channels (see also Kakihara [7], [9]). The same notion as AMS channels is considered by Ding and Yi [2], Jacobs [6] and Zhi [15] under the name "almost periodic channels." As every AMS source $\mu \in P_a(X)$ has a stationary mean $\bar{\mu} \in P_s(X)$, each AMS channel $\nu \in C_a(X,Y)$ has a stationary mean $\bar{\nu} \in C_s(X,Y)$ given by
$$\bar{\nu}(x, C) = \lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}\nu(S^{-k}x, T^{-k}C), \quad P_{se}(X)\text{-a.e. } x \in X,\ C \in \mathcal{Y}. \qquad (5)$$

This was also obtained in [3]. Note that $\mu\otimes\bar{\nu} = \overline{\mu\otimes\nu}$ for $\mu \in P_s(X)$ (cf. [3]).
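Formula (5) can be evaluated directly in a finite model. The sketch below assumes, for illustration only, that $X=Y=\{0,1,2\}$ with $S=T$ the cyclic rotation and that $\nu$ is given by a row-stochastic matrix; since the orbits have period $3$, the Cesàro limit in (5) is the exact average over one period, and the resulting $\bar{\nu}$ satisfies the stationarity relation $\bar{\nu}(Sx,C)=\bar{\nu}(x,T^{-1}C)$.

```python
from fractions import Fraction

# Finite model: X = Y = {0,1,2}, S = T = rotation y -> y+1 (mod 3), both invertible.
m = 3
def S_inv_k(x, k): return (x - k) % m          # S^{-k} x
def T_inv_k(C, k): return {(c - k) % m for c in C}

# A (non-stationary) channel given by a row-stochastic matrix nu_mat[x][c].
nu_mat = [[Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)],
          [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)],
          [Fraction(1, 6), Fraction(1, 6), Fraction(2, 3)]]
def nu(x, C): return sum(nu_mat[x][c] for c in C)

# Stationary mean, formula (5): here every orbit has period m, so the Cesaro
# limit is the exact average over one full period.
def nu_bar(x, C):
    return sum(nu(S_inv_k(x, k), T_inv_k(C, k)) for k in range(m)) / m

# nu_bar is stationary: nu_bar(Sx, C) = nu_bar(x, T^{-1} C) for all x, C.
subsets = [set(), {0}, {1}, {2}, {0, 1}, {0, 2}, {1, 2}, {0, 1, 2}]
for x in range(m):
    for C in subsets:
        assert nu_bar((x + 1) % m, C) == nu_bar(x, T_inv_k(C, 1))
print("nu_bar(0, {0}) =", nu_bar(0, {0}))
```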
DEFINITION 3.2 An AMS channel $\nu \in C_a(X,Y)$ with the stationary mean $\bar{\nu} \in C_s(X,Y)$ is said to be output ergodic if

(c10) for $C, D \in \mathcal{Y}$ it holds that
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}\bigl\{\nu(x, T^{-k}C\cap D) - \bar{\nu}(x, T^{-k}C)\,\nu(x, D)\bigr\} = 0 \quad P_{se}(X)\text{-a.e. } x.$$
We need the following lemma, which is an AMS version of Lemma 2.1 given by Ding and Yi [2].

LEMMA 3.3 Let $\nu \in C_a(X,Y)$ be an AMS channel with the stationary mean $\bar{\nu} \in C_s(X,Y)$ and let $\mu \in P_s(X)$. Then the conclusion of Lemma 2.1 holds for every $E, F \in \mathcal{X}\otimes\mathcal{Y}$ $\mu$-a.e. $x \in X$, with the conditional expectations now taken under $\overline{\mu\otimes\nu} = \mu\otimes\bar{\nu}$. In particular, for $C, D \in \mathcal{Y}$,
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}\nu(x, T^{-k}C\cap D) = \int_D E_{\overline{\mu\nu}}(1_C \mid \mathfrak{J}_Y)(y)\,\nu(x,dy) \quad \mu\text{-a.e. } x, \qquad (7)$$
where $\mathfrak{J}$ and $\mathfrak{J}_Y$ are the same as in Lemma 2.1.

Proof: Since $\nu$ is AMS and $\mu$ is stationary, $\mu\otimes\nu$ is AMS with $\overline{\mu\otimes\nu} = \mu\otimes\bar{\nu}$. The rest of the proof parallels that of Lemma 2.1, and we have (7).

The following theorem gives necessary and sufficient conditions for pseudoergodicity of AMS channels.

THEOREM 3.4 For an AMS channel $\nu \in C_a(X,Y)$ with the stationary mean $\bar{\nu} \in C_s(X,Y)$ the following statements are equivalent:

(1) $\nu$ is pseudoergodic;
(2) $\nu$ is output ergodic;
(3) $\mu \in P_{se}(X) \Rightarrow \mu\nu \in P_{ae}(Y)$;
(4) $\bar{\nu}$ is pseudoergodic;
(5) There exists a stationary pseudoergodic channel $\nu_1 \in C_s(X,Y)$ such that
$$\nu(x,\cdot) \ll \nu_1(x,\cdot) \quad P_{se}(X)\text{-a.e. } x \in X; \qquad (8)$$

(6) There exists an AMS pseudoergodic channel $\nu_1 \in C_a(X,Y)$ such that (3.4) holds;

(7) If $C \in \mathcal{Y}$ is $T$-invariant and $\mu \in P_{se}(X)$, then $\nu(x,C) = 0$ $\mu$-a.e. $x \in X$ or $\nu(x,C) = 1$ $\mu$-a.e. $x$;

(8) For every $C, D \in \mathcal{Y}$ and $\mu \in P_{se}(X)$ it holds that
Proof: (1) $\Rightarrow$ (2). If $\mu \in P_{se}(X)$, then $\mu\nu \in P_{ae}(Y)$ by (1), and hence $\overline{\mu\nu} = \mu\bar{\nu} \in P_{se}(Y)$ by [3]. Invoke (3.3) of Lemma 3.3 to see that (2) holds. (2) $\Rightarrow$ (3). Let $\mu \in P_{se}(X)$. Then for $C, D \in \mathcal{Y}$ it follows that
$$\frac{1}{n}\sum_{k=0}^{n-1}\mu\nu(T^{-k}C\cap D) - \overline{\mu\nu}(C)\,\mu\nu(D)
= \int_X \frac{1}{n}\sum_{k=0}^{n-1}\bigl\{\nu(x, T^{-k}C\cap D) - \bar{\nu}(S^kx, C)\,\nu(x, D)\bigr\}\,\mu(dx)
+ \int_X \Bigl\{\frac{1}{n}\sum_{k=0}^{n-1}\bar{\nu}(S^kx, C) - \mu\bar{\nu}(C)\Bigr\}\,\nu(x, D)\,\mu(dx)
\longrightarrow 0 \quad (n\to\infty)$$
by the bounded convergence theorem and the mean ergodic theorem, since $\mu$ is ergodic. (3) $\Rightarrow$ (4). Let $\mu \in P_{se}(X)$. Then $\mu\nu \in P_{ae}(Y)$ by (3) and hence $\mu\bar{\nu} = \overline{\mu\nu} \in P_{se}(Y)$. Thus $\bar{\nu}$ is pseudoergodic. (4) $\Rightarrow$ (5). Take $\nu_1 = \bar{\nu}$. (5) $\Rightarrow$ (6) is immediate. (6) $\Rightarrow$ (7). Let $C \in \mathcal{Y}$ be $T$-invariant and $\mu \in P_{se}(X)$. Then $\mu\nu_1 \in P_{ae}(Y)$ and $\mu\nu_1(C) = 0$ or $1$. Since $\nu(x,\cdot) \ll \nu_1(x,\cdot)$ $\mu$-a.e. $x$, we have $\mu\nu$