
Author:
Weber M.


IRMA Lectures in Mathematics and Theoretical Physics 14 Edited by Christian Kassel and Vladimir G. Turaev

Institut de Recherche Mathématique Avancée CNRS et Université de Strasbourg 7 rue René-Descartes 67084 Strasbourg Cedex France


IRMA Lectures in Mathematics and Theoretical Physics
Edited by Christian Kassel and Vladimir G. Turaev

This series is devoted to the publication of research monographs, lecture notes, and other material arising from programs of the Institut de Recherche Mathématique Avancée (Strasbourg, France). The goal is to promote recent advances in mathematics and theoretical physics and to make them accessible to wide circles of mathematicians, physicists, and students of these disciplines.

Previously published in this series:

1 Deformation Quantization, Gilles Halbout (Ed.)
2 Locally Compact Quantum Groups and Groupoids, Leonid Vainerman (Ed.)
3 From Combinatorics to Dynamical Systems, Frédéric Fauvet and Claude Mitschi (Eds.)
4 Three courses on Partial Differential Equations, Eric Sonnendrücker (Ed.)
5 Infinite Dimensional Groups and Manifolds, Tilman Wurzbacher (Ed.)
6 Athanase Papadopoulos, Metric Spaces, Convexity and Nonpositive Curvature
7 Numerical Methods for Hyperbolic and Kinetic Problems, Stéphane Cordier, Thierry Goudon, Michaël Gutnic and Eric Sonnendrücker (Eds.)
8 AdS/CFT Correspondence: Einstein Metrics and Their Conformal Boundaries, Olivier Biquard (Ed.)
9 Differential Equations and Quantum Groups, D. Bertrand, B. Enriquez, C. Mitschi, C. Sabbah and R. Schäfke (Eds.)
10 Physics and Number Theory, Louise Nyssen (Ed.)
11 Handbook of Teichmüller Theory, Volume I, Athanase Papadopoulos (Ed.)
12 Quantum Groups, Benjamin Enriquez (Ed.)
13 Handbook of Teichmüller Theory, Volume II, Athanase Papadopoulos (Ed.)

Volumes 1–5 are available from Walter de Gruyter (www.degruyter.de)


Michel Weber

Dynamical Systems and Processes


Author: Michel Weber Institut de Recherche Mathématique Avancée CNRS et Université de Strasbourg 7, rue René Descartes 67084 Strasbourg Cedex France

2000 Mathematics Subject Classification: 37-02, 60-02. Key words: Dynamical systems, measure-preserving transformation, ergodic theorems, spectral theorems, convergence almost everywhere, central limit theorem, stochastic processes, gaussian processes, metric entropy method, majorizing measure method, randomization methods, Riemann sums

978-3-03719-046-3 The Swiss National Library lists this publication in The Swiss Book, the Swiss national bibliography, and the detailed bibliographic data are available on the Internet at http://www.helveticat.ch. This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in other ways, and storage in data banks. For any kind of use permission of the copyright owner must be obtained.

© 2009 European Mathematical Society Contact address: European Mathematical Society Publishing House Seminar for Applied Mathematics ETH-Zentrum FLI C4 CH-8092 Zürich Switzerland Phone: +41 (0)44 632 34 36 Email: [email protected] Homepage: www.ems-ph.org Typeset using the author’s TEX files: I. Zimmermann, Freiburg Printed in Germany 987654321

Preface

The aim of this book is to present in a concise and accessible way, as well as in a common setting, various tools and methods arising from spectral theory, ergodic theory and probability theory, which contribute interactively to the current research on almost everywhere convergence problems. The recent developments in the study of these questions are often obtained by combining either methods of spectral theory with principles of ergodic theory or methods from probability theory with tools and principles from spectral theory and ergodic theory. The spectral criterion of Gaposhkin, and later, following a remarkable metric entropy inequality of Talagrand, the spectral regularization developed in the setting of the study of square functions and oscillation functions in ergodic theory, are typical examples of this fruitful interaction. Another example of thorough interaction is certainly the work of Bourgain and notably his famous entropy criterion, at the basis of which lies the continuity principle of Stein. It was not our aim to write a complete treatise in ergodic theory, assuming such enterprise to be conceivable. The development of this theory during the last twenty years was indeed considerable. A similar remark can be made for the part concerning the study of the regularity of stochastic processes. The work is also not a synthesis of most significant results, complete with sketched proofs and references. We chose the intermediate route to writing a book in the spirit of lectures oriented towards research. The book provides an easy access to many tools, methods and results used in current research, presenting each of them in as wide a setting as possible. The proofs of these results are often given with full details. This book is divided in four parts, which came more or less naturally while writing it. Part I is devoted to spectral results and is followed by Part II, in which tools and results from ergodic theory are presented. 
In the third part, in connection with the description of two main methods, namely the metric entropy method and the majorizing measure method, recent applications to ergodic theory are given via the study of some maximal inequalities of Gál–Koksma type and the $L^p$ norm, $1 \le p \le \infty$, of important classes of polynomials. Finally, in the last part of the book we recollect classical results, as well as recent advances concerning Riemann sums and Khintchin sums, and the value distribution of divisors of Bernoulli or Rademacher sums, used in the study of Riemann sums. In Part I we begin elementarily with the spectral inequality. Chapter 1 concerns von Neumann’s theorem, which forms with Birkhoff’s ergodic theorem the basis of ergodic theory. It seems natural to include in this chapter Talagrand’s metric entropy estimate for the set $\{A_n^T f, n \ge 1\}$, where $A_n^T = \frac{I + T + \cdots + T^{n-1}}{n}$ is the average operator of a contraction $T$ in a Hilbert space, thus completing naturally the von Neumann theorem. Recently discovered, remarkably efficient, spectral regularization inequalities analysing other structural properties of the set $\{A_n^T f, n \ge 1\}$, followed by Weyl’s


criterion and the van der Corput principle, complete this chapter. Chapter 2 starts with presenting the arguments leading to the representation of a weakly stationary process as Fourier transform of a random measure with orthogonal increments. Next we study Gaposhkin’s spectral criterion. In Part II, we first review in Chapter 3 classical ergodic and mixing properties of measurable dynamical systems. We also study several standard examples. Chapter 4 is devoted to Birkhoff’s pointwise theorem, to dominated ergodic theorems in Lp and to BMO spaces of associated maximal operators. This is continued with a discussion around spectral characterizations of the speed of convergence in Birkhoff’s pointwise theorem. Next we examine oscillation functions of ergodic averages. The transference principle and Wiener–Wintner theorems are discussed. A study of weighted ergodic averages concludes this chapter. In Chapter 5, some basic tools from ergodic theory, the Banach principle, the continuity principle and the conjugacy lemma are studied in detail. Chapter 6 concerns entropy criteria of Bourgain. Several functional inequalities linking the studied sequence of L2 -operators with the canonical Gaussian process on L2 are established, from which the criteria are then easily deduced. Study of the statistic of the ergodic averages naturally leads to investigating the question of the existence of some f ∈ L2 such that the related ergodic averages satisfy a central limit theorem, the invariance principle or the almost sure central limit theorem. Chapter 7 is devoted to this study. A detailed proof of the theorem of Burton–Denker on the existence, in any aperiodic dynamical system, of the central limit theorem is given. The method of proof relies upon Kakutani–Rochlin’s lemma and imitates the analogous result for irrational rotations of the unit circle which is obtained by using Fourier series. 
A fundamental fact in the background of the entire construction is provided by using Rochlin’s result on a factor space of Lebesgue space. The case of irrational rotations involving various remarkably efficient methods is more closely investigated. The existence of L2 elements of the torus satisfying the central limit theorem (CLT) is established for various types of means: nonlinear ergodic means, weighted ergodic means, and ergodic means along the squares. For the latter case, the circle method is used. The chapter concludes with a recent study of a kind of achieved form of the CLT, the convergence in variation implying the convergence of related density distributions in the spaces Lp (R), 1 ≤ p ≤ ∞, in the symptomatic case of lacunary random Fourier series. Two rather general methods are investigated in Part III: the metric entropy method and the majorizing measure method. In Chapter 8, a useful criterion for almost everywhere convergence involving covering numbers is proved, and then used to prove in a unified setting several classical results, such as Stechkin’s theorem, Gál–Koksma theorems and quantitative Borel–Cantelli lemmas. The metric entropy method is next applied to establish quite useful estimates of the supremum of random polynomials, notably random Dirichlet polynomials, and to study almost sure convergence properties of weighted series of contractions and random perturbation of some intersective sets in ergodic theory. Chapter 9 concerns an important tool: the majorizing measure method. A general criterion for almost sure convergence of averages is proved by means of this


method. We continue with recent applications of the majorizing measure method to the study of the supremum of random polynomials, including a strictly stronger form of the well-known Salem–Zygmund estimate. Some remarkable classes of examples are studied. Chapter 10 is a succinct study of Gaussian processes presented in the form of a toolbox. Various fundamental results from the theory are discussed, sometimes with historical comments and proofs. Much importance is given to very handy correlation inequalities. Part IV is devoted to three studies: the study of Riemann sums, the study of convergence properties of the system {f (nk x), k ≥ 1} and a probabilistic approach concerning divisors with applications. Chapters 1 to 6 and partially Chapters 8 to 10 are based on lectures given at the Mathematical Institute of the University of Strasbourg. Chapters 11 to 13 are mainly based on research articles, as well as some parts of Chapters 1, 4, 7, 8, 9. In writing this book, we followed a general principle: where the proofs in our source readings were only sketched, we fill in the gaps in as much detail as possible. Further, we give quasisystematically complete references with page numbers and/or precise numeration of cited results. We always keep in mind the wish to help, as much as we can, the researcher but also the teacher and the graduate student in their work in these beautiful areas of mathematics, trying also to spare their time and to let them share our passion for research at the interfaces of related problems. I would like to thank Mikhail Lifshits for the many discussions and encouragements. I would also like to thank Istvan Berkes for his indefectible enthusiasm and the many exchanges and comments, as well as Ulrich Krengel for stimulating comments. I am much indebted and grateful to Irene Zimmermann for her technical assistance and for numerous observations and remarks. 
I thank Manfred Karbe and the European Mathematical Society Publishing House for accepting this work in their IRMA series, and for efficient help in publishing. I devote this book to my wife Marie-Christine. She always provided a favourable atmosphere for mathematical work.

Contents

Preface

Part I Spectral theorems and convergence in mean

1 The von Neumann theorem and spectral regularization
1.1 Bochner–Herglotz lemma
1.2 The spectral inequality
1.3 The von Neumann theorem
1.4 The spectral regularization inequality
1.5 Moving averages
1.6 Uniform distribution mod a – the Weyl criterion
1.7 The van der Corput principle

2 Spectral representation of weakly stationary processes
2.1 Weakly stationary processes
2.2 Spectral representation of unitary operators
2.3 Elements of stochastic integration
2.4 Spectral representation of weakly stationary processes
2.5 Weakly stationary sequences and orthogonal series
2.6 Gaposhkin’s spectral criterion

Part II Ergodic Theorems

3 Dynamical systems – ergodicity and mixing
3.1 Measurable dynamical systems – topological dynamical systems
3.2 Ergodicity of a dynamical system
3.3 Weak mixing, strong mixing, continuous spectrum
3.4 Spectral mixing theorem
3.5 Other equivalences and other forms of mixing
3.6 Examples

4 Pointwise ergodic theorems
4.1 Birkhoff’s pointwise theorem
4.2 Dominated ergodic theorems
4.3 Classes $L \log^m L$
4.4 A converse
4.5 Speed of convergence
4.6 Oscillation functions of ergodic averages
4.7 Wiener–Wintner theorem
4.8 Weighted ergodic averages
4.9 Subsequence averages

5 Banach principle and continuity principle
5.1 Banach principle
5.2 Continuity principle
5.3 Applications
5.4 A principle of domination – conjugacy lemma

6 Maximal operators and Gaussian processes
6.1 Some liaison theorems
6.2 Two preliminary lemmas
6.3 Proof of Theorem 6.1.1
6.4 Proof of Theorem 6.1.6
6.5 The case $L^p$, $1 < p < 2$
6.6 A remarkable GB set property

7 The central limit theorem for dynamical systems
7.1 Introduction and preliminaries
7.2 A theorem of Burton and Denker
7.3 The central limit theorem for orbits
7.4 A theorem of Volný
7.5 CLT for rotations
7.6 Lacunary series and convergence in variation

Part III Methods arising from the theory of stochastic processes

8 The metric entropy method
8.1 Introduction and general results
8.2 A theorem of Stechkin
8.3 An application to the quantitative Borel–Cantelli lemma
8.4 Application to Gál–Koksma’s theorems
8.5 An application to the supremum of random polynomials
8.6 Application to a.s. convergence of weighted series of contractions
8.7 An application to random perturbation of intersective sets
8.8 An application to the discrepancy of some random sequences
8.9 An application to random Dirichlet polynomials

9 The majorizing measure method
9.1 Introduction – the exponential case
9.2 A general approach
9.3 A useful criterion
9.4 Proof of Theorem 9.3.3
9.5 Proof of Theorems 9.3.10 and 9.3.11
9.6 Proof of Theorem 9.3.12 and some examples
9.7 A stronger form of Salem–Zygmund’s estimate
9.8 Some examples and discussion
9.9 Uniform convergence of random Fourier series

10 Gaussian processes
10.1 Gaussian variables and correlation estimates
10.2 0-1 laws, integrability and comparison lemmas
10.3 Regularity and irregularity of Gaussian processes
10.4 Gaussian suprema
10.5 Oscillations of Gaussian Stein’s elements
10.6 Tightness of Gaussian Stein’s elements

Part IV Three studies

11 Riemann sums
11.1 Introduction
11.2 The results of Jessen and Rudin
11.3 Individual theorems of spectral type
11.4 Breadth and dimension
11.5 Bourgain’s results
11.6 Connection with number theory
11.7 Riemann sums and the randomly sampled trigonometric system
11.8 Almost sure convergence and square functions of Riemann sums

12 A study of the system (f(nx))
12.1 Introduction and mean convergence
12.2 Almost sure convergence – sufficient conditions
12.3 Almost sure convergence – necessary conditions
12.4 Random sequences

13 Divisors and random walks
13.1 Introduction
13.2 Value distribution and small divisors of Bernoulli sums
13.3 An LIL for arithmetic functions
13.4 On the order of magnitude of the divisor functions
13.5 Value distribution of the divisors of $n^2 + 1$
13.6 Value distribution of the divisors of Rademacher sums
13.7 The functional equation and the Lindelöf Hypothesis
13.8 An extremal divisor case

Bibliography

Index

Part I Spectral theorems and convergence in mean

Chapter 1

The von Neumann theorem and spectral regularization

Von Neumann’s theorem is, together with Birkhoff’s theorem, one of the fundamental results in ergodic theory. A remarkable spectral regularization inequality is established, from which Talagrand’s entropy estimate is deduced, as well as sharp bounds for the Littlewood–Paley square functions. Other averages, like moving averages, are considered. Some useful lemmas, the Bochner–Herglotz lemma, the spectral lemma and the spectral inequality are first established and completed by some other, sometimes less known results. Two important tools are included at the end of the chapter: Weyl’s equidistribution theorem and the van der Corput principle.

1.1 Bochner–Herglotz lemma

The lemmas studied in this section, as well as in the next one, are classical tools of spectral analysis. The spectral inequality, which is easily derived from the Bochner–Herglotz lemma, allows us to reduce many problems of in-norm evaluation of vectors to much more tractable harmonic analysis questions. This tool is often used in ergodic theory. We thus begin by establishing Bochner–Herglotz’s lemma.

A function $\gamma\colon \mathbb{R} \to \mathbb{R}$ is nonnegative definite if for any positive integer $n$, and any $u_1, \dots, u_n \in \mathbb{R}$, $a_1, \dots, a_n \in \mathbb{C}$, we have
$$\sum_{1 \le i,j \le n} a_i \bar a_j\, \gamma(u_i - u_j) \ge 0.$$

For a continuous function $\gamma\colon \mathbb{R} \to \mathbb{R}$, an equivalent definition of nonnegative definiteness is that for any measurable bounded function $\xi(x)$ vanishing outside some finite interval,
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \gamma(t - s)\,\xi(t)\,\overline{\xi(s)}\, dt\, ds \ge 0.$$

A sequence of complex numbers $\{a_k, k \in \mathbb{Z}\}$ is nonnegative definite if $a_{-k} = \bar a_k$ and if the inequality
$$\sum_{1 \le i,j \le n} \rho_i \bar\rho_j\, a_{i-j} \ge 0$$
holds for any finite system of complex numbers $\rho_1, \dots, \rho_n$. A function $\gamma\colon \mathbb{Z} \to \mathbb{R}$ is thus nonnegative definite if the sequence $\{\gamma(k), k \in \mathbb{Z}\}$ is nonnegative definite. These notions immediately extend to functions defined on $\mathbb{R}^d$ or $\mathbb{Z}^d$. Let $\mathbb{T} = \mathbb{R}/\mathbb{Z} = [0, 1[$ be the circle equipped with the normalized Lebesgue measure $\lambda$, and let $\mathbb{T}^d$ denote the $d$-dimensional torus equipped with the measure $\lambda_d$.

1.1.1 Lemma. a) Let $\gamma\colon \mathbb{R}^d \to \mathbb{R}$ be continuous, nonnegative definite. Then there exists a nonnegative bounded measure $\mu$ on $\mathbb{R}^d$ such that for any $x \in \mathbb{R}^d$,
$$\gamma(x) = \int_{\mathbb{R}^d} e^{i\langle t, x\rangle}\, \mu(dt).$$

b) Let $\gamma\colon \mathbb{Z}^d \to \mathbb{R}$ be nonnegative definite. Then there exists a nonnegative bounded measure $\mu$ on $\mathbb{T}^d$ such that for any $k \in \mathbb{Z}^d$,
$$\gamma(k) = \int_{\mathbb{T}^d} e^{2i\pi\langle k, t\rangle}\, \mu(dt).$$

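Proof. We give the proof for $d = 1$, the multidimensional case being obtained in a quite identical way. Let $Z$ denote some positive integer. Consider a) first. Put
$$I_k = \int_{[0,Z[^k} \sum_{i,j=1}^{k} e^{-i(u_i - u_j)x}\, \gamma(u_i - u_j)\, du_1 \dots du_k.$$
By assumption $I_k \ge 0$. Moreover,
$$\begin{aligned}
I_k &= k\gamma(0) \int_{[0,Z[^k} du_1 \dots du_k + \sum_{\substack{i,j=1 \\ i \ne j}}^{k} \int_{[0,Z[^{k-2}} \frac{du_1 \dots du_k}{du_i\, du_j} \int_0^Z\!\!\int_0^Z e^{-i(u_i - u_j)x}\, \gamma(u_i - u_j)\, du_i\, du_j \\
&= k\gamma(0) Z^k + k(k-1) Z^{k-2} \int_0^Z\!\!\int_0^Z e^{-i(u - v)x}\, \gamma(u - v)\, du\, dv.
\end{aligned}$$
Dividing by $k(k-1)Z^{k-2}$ and then letting $k$ tend to infinity implies
$$\int_0^Z\!\!\int_0^Z e^{-i(u - v)x}\, \gamma(u - v)\, du\, dv \ge 0.$$
Making the change of variables $u - v = t$ gives
$$\begin{aligned}
\int_0^Z\!\!\int_{-v}^{Z-v} e^{-itx}\, \gamma(t)\, dt\, dv &= \int_{-Z}^{Z} e^{-itx}\, \gamma(t)\, \{\min(Z, Z - t) - \sup(0, -t)\}\, dt \\
&= \int_{-Z}^{Z} e^{-itx}\, \gamma(t)\, (Z - |t|)\, dt \ge 0.
\end{aligned}$$
Let
$$\gamma_Z(x) = \gamma(x)\, (1 - |x|/Z)\, \mathbf{1}_{[-Z,Z]}(x), \qquad \hat\gamma_Z(x) = \int_{\mathbb{R}} e^{-itx}\, \gamma_Z(t)\, dt.$$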

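Then $\hat\gamma_Z(x) \ge 0$, and evidently $\gamma_Z \in L^\infty(\mathbb{R})$. We show that $\hat\gamma_Z \in L^1(\mathbb{R})$. Integrating $\hat\gamma_Z(x)$ over $\mathbb{R}$ with respect to the density $\frac{1}{\sigma\sqrt{2\pi}} e^{-x^2/(2\sigma^2)}$ yields
$$\int_{\mathbb{R}} \hat\gamma_Z(x)\, e^{-\frac{x^2}{2\sigma^2}}\, \frac{dx}{\sigma\sqrt{2\pi}} = \int_{\mathbb{R}} \gamma_Z(t) \left( \int_{\mathbb{R}} e^{-itx - \frac{x^2}{2\sigma^2}}\, \frac{dx}{\sigma\sqrt{2\pi}} \right) dt = \int_{\mathbb{R}} \gamma_Z(t)\, e^{-\sigma^2 t^2/2}\, dt.$$
Hence, since $\|\gamma_Z\|_\infty \le \gamma(0)$,
$$\int_{\mathbb{R}} \hat\gamma_Z(x)\, e^{-\frac{x^2}{2\sigma^2}}\, dx = \sigma\sqrt{2\pi} \int_{\mathbb{R}} \gamma_Z(t)\, e^{-\sigma^2 t^2/2}\, dt \le \sigma\sqrt{2\pi}\, \gamma(0) \int_{\mathbb{R}} e^{-\sigma^2 t^2/2}\, dt = 2\pi\gamma(0).$$
But $\hat\gamma_Z(x) \ge 0$. Letting $\sigma$ tend to infinity increasingly finally shows, in view of Fatou’s lemma, that $\hat\gamma_Z \in L^1(\mathbb{R})$. Now we need the Fourier inversion theorem: Let $h, \hat h \in L^1(\mathbb{R}^d)$. Then for almost all $x$,
$$h(x) = \int_{\mathbb{R}^d} e^{i\langle t, x\rangle}\, \hat h(t)\, dt.$$
Thus $\hat\gamma_Z \in L^1(\mathbb{R})$ and for almost all $x$, $\gamma_Z(x) = \int_{\mathbb{R}} e^{itx}\, \hat\gamma_Z(t)\, dt$. As $\gamma_Z$ and the mapping $x \mapsto \int_{\mathbb{R}} e^{itx}\, \hat\gamma_Z(t)\, dt$ are continuous, the above equality holds in turn everywhere. Hence
$$\gamma(0) = \gamma_Z(0) = \int_{\mathbb{R}} \hat\gamma_Z(t)\, dt.$$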

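Denote by $\mu_Z$ the measure on $\mathbb{R}$ having density $\hat\gamma_Z(t)$. Since $\gamma_Z(x) \to \gamma(x)$ everywhere as $Z$ tends to infinity, we get
$$\lim_{Z \to \infty} \hat\mu_Z(x) = \gamma(x).$$
By assumption $\gamma$ is continuous. It follows from the corollary on p. 481 in [Feller: 1966, II] that there exists a nonnegative bounded measure $\mu$ on $\mathbb{R}$ such that $\gamma(x) = \hat\mu(x)$.

We pass to the proof of b). By assumption $\sum_{n,m=1}^{Z} e^{-2i\pi(n-m)x}\, \gamma(n - m) \ge 0$. This sum can also be written as
$$\begin{aligned}
\sum_{n,m=1}^{Z} e^{-2i\pi(n-m)x}\, \gamma(n - m) &= \sum_{n=1}^{Z} \sum_{p=n-Z}^{n-1} e^{-2i\pi xp}\, \gamma(p) = \sum_{p=-Z+1}^{Z-1} e^{-2i\pi xp}\, \gamma(p) \sum_{\substack{p+1 \le n \le p+Z \\ 1 \le n \le Z}} 1 \\
&= \sum_{p=-Z+1}^{Z-1} e^{-2i\pi xp}\, \gamma(p)\, \{\min(p + Z, Z) - \max(1, p + 1) + 1\} \\
&= \sum_{p=-Z+1}^{Z-1} e^{-2i\pi xp}\, \gamma(p)\, (Z - |p|).
\end{aligned}$$
Put $\gamma_Z(p) = \gamma(p)\, \mathbf{1}_{\{-Z+1, \dots, Z-1\}}(p)\, (1 - |p|/Z)$ and $g_Z(x) = \sum_{p \in \mathbb{Z}} e^{-2i\pi xp}\, \gamma_Z(p)$. Then $\hat\gamma_Z(-x) = g_Z(x) \ge 0$, and since $\gamma_Z$ has compact support, $g_Z$ is bounded continuous. Further
$$\int_{\mathbb{T}} g_Z(x)\, e^{2i\pi xr}\, dx = \sum_{p \in \mathbb{Z}} \gamma_Z(p) \int_{\mathbb{T}} e^{2i\pi x(r - p)}\, dx = \gamma_Z(r).$$
In particular $\gamma_Z(0) = \gamma(0) = \int_{\mathbb{T}} g_Z(x)\, dx$, thereby implying that the nonnegative measures $\nu_Z$ on $(\mathbb{T}, \mathcal{B}(\mathbb{T}))$ with density $g_Z(x)$ are relatively compact for the weak convergence topology $\mathcal{D}$ on $\mathbb{T}$. Hence, there exists a subsequence $J$ and a bounded nonnegative measure $\nu$ on $\mathbb{T}$ such that
$$\mathcal{D}\text{-}\lim_{J \ni Z \to \infty} \nu_Z = \nu,$$
and $\lim_{J \ni Z \to \infty} \gamma_Z(r) = \int_{\mathbb{T}} e^{2i\pi xr}\, \nu(dx)$. Since $\lim_{Z \to \infty} \gamma_Z(r) = \gamma(r)$, we get for any $r \in \mathbb{Z}$,
$$\gamma(r) = \int_{\mathbb{T}} e^{2i\pi xr}\, \nu(dx).$$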
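The rearrangement used at the start of the proof of b) can be checked numerically. The following Python sketch is only an illustration under assumptions of our own: the sequence $\gamma(k) = \rho^{|k|}$ with $\rho = 0.5$ is a hypothetical example (it is nonnegative definite, being the sequence of Fourier coefficients of the Poisson kernel), and the function names are not taken from the text. The sketch compares the double sum $\sum_{n,m=1}^{Z} e^{-2i\pi(n-m)x}\gamma(n-m)$ with the weighted single sum $\sum_{|p|<Z} e^{-2i\pi xp}\gamma(p)(Z-|p|)$ on a grid of values of $x$, and records the smallest real part found.

```python
import cmath
import math

def gamma(k, rho=0.5):
    """Hypothetical example: gamma(k) = rho**|k|, nonnegative definite for 0 <= rho < 1."""
    return rho ** abs(k)

def double_sum(x, Z):
    """Sum over 1 <= n, m <= Z of exp(-2 i pi (n - m) x) * gamma(n - m)."""
    total = 0j
    for n in range(1, Z + 1):
        for m in range(1, Z + 1):
            total += cmath.exp(-2j * math.pi * (n - m) * x) * gamma(n - m)
    return total

def weighted_sum(x, Z):
    """Sum over |p| < Z of exp(-2 i pi x p) * gamma(p) * (Z - |p|)."""
    total = 0j
    for p in range(-Z + 1, Z):
        total += cmath.exp(-2j * math.pi * x * p) * gamma(p) * (Z - abs(p))
    return total

def check(Z=8, grid=50):
    """Return (largest gap between the two sums, smallest real part of the weighted sum)."""
    worst_gap = 0.0
    smallest = float("inf")
    for i in range(grid):
        x = i / grid
        d = double_sum(x, Z)
        w = weighted_sum(x, Z)
        worst_gap = max(worst_gap, abs(d - w))
        smallest = min(smallest, w.real)
    return worst_gap, smallest
```

Under the assumptions above, `check()` should report a gap at rounding level and a nonnegative minimum, in agreement with $g_Z \ge 0$.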

Schoenberg’s theorem. Schoenberg [1938] found a beautiful complement to Bochner’s theorem, which is worth being formulated here. Let $f\colon \mathbb{R}_+ \to \mathbb{R}_+$ be continuous, nonnegative definite. Assume that $f(0) = 1$. Schoenberg’s theorem translates, via Bochner’s theorem, to the equivalence of the following two assertions:

(a) For all $d \ge 1$, there is a probability measure $\mu_d$ on $\mathbb{R}^d$ such that for every $x \in \mathbb{R}^d$,
$$f(\|x\|_d) = \int_{\mathbb{R}^d} e^{i\langle x, y\rangle}\, \mu_d(dy).$$
Here $\|x\|_d$ is the Euclidean norm on $\mathbb{R}^d$.

(b) There exists a Borel probability $\nu$ on $\mathbb{R}_+$ such that for any positive real $t$,
$$f(t) = \int_0^\infty e^{-st^2/2}\, \nu(ds).$$

There is a proof of Schoenberg’s theorem via the law of large numbers in Khoshnevisan [2005], to which we may also refer as a source.

1.1.2 Remarks. 1. Nonnegative definite sequences are characterized by the previous lemma. According to this one, a sequence is nonnegative definite if and only if there

exists a weakly stationary sequence $\{X_n, n \ge 1\}$ in a Hilbert space $H$ such that for any positive integers $h$ and $k$, $\langle X_h, X_k\rangle = \gamma_{h-k}$. This point can also be established by means of a direct vector representation in $H$, see Ky Fan [1946: Paragraph 2 and Appendix]. Nonnegative definite sequences are closely related to nonnegative trigonometric polynomials.

2. A trigonometric polynomial $\sum_{k=-p}^{p} z_k e^{ik\theta}$ with $z_{-k} = \bar z_k$ and taking only nonnegative values is said to be nonnegative. In view of a classical result of Fejér and F. Riesz (Fejér [1915]), there exist $p + 1$ complex numbers $\rho_0, \rho_1, \dots, \rho_p$ such that
$$\sum_{k=-p}^{p} z_k e^{ik\theta} = \left| \rho_0 + \rho_1 e^{i\theta} + \cdots + \rho_p e^{ip\theta} \right|^2.$$

3. We also quote a theorem due to Szász [1918] (see Ky Fan [1946: Paragraph 3]). A sequence $\{a_n, n \in \mathbb{Z}\}$ is nonnegative definite if and only if
$$\sum_{k=-p}^{p} a_k z_k \ge 0$$
holds for any nonnegative trigonometric polynomial $\sum_{k=-p}^{p} z_k e^{ik\theta}$ of arbitrary order $p$. This characterization is to be compared with the one of Hausdorff [1923]: the sequence $\{a_n, n \in \mathbb{Z}\}$ is nonnegative definite if and only if
$$\sum_{h=1}^{p} \sum_{k=1}^{p} a_{h-k}\, e^{i(h-k)\theta} \ge 0$$
is satisfied for any positive integer $p$ and any real $\theta$.

Below we list some standard examples of nonnegative definite sequences and weakly stationary sequences.

1.1.3 Examples. (1) Given a weakly stationary sequence $\{X_n, n \ge 1\}$ in $H$, it is readily seen that, for any real value of $\vartheta$, the sequence $\{e^{-in\vartheta} X_n, n \ge 1\}$ is weakly stationary too. Anticipating a bit von Neumann’s theorem, for any value of $\vartheta$ the limit
$$\Phi(\vartheta) = \lim_{n \to \infty} \frac{e^{-i\vartheta} X_1 + e^{-i2\vartheta} X_2 + \cdots + e^{-in\vartheta} X_n}{n}$$
also exists. Further (see before Remarks 1.3.4), if $\vartheta_1 \not\equiv \vartheta_2 \pmod{2\pi}$, then $\Phi(\vartheta_1)$ and $\Phi(\vartheta_2)$ are orthogonal elements in $H$. And there exists at most a countably infinite set of values of $\vartheta$ for which $\Phi(\vartheta)$ differs from the null element of $H$ (see Ky Fan [1946: Paragraph 6]).

(2) Let $\Psi\colon \mathbb{R} \to \mathbb{R}_+$ be even, convex and nonincreasing. Then the sequence $\{\Psi(n), n \in \mathbb{Z}\}$ is nonnegative definite. This follows from a classical theorem due to Pólya.

(3) Let $S$ be the space of correlated sequences introduced by Wiener [1933: Chapter 4], namely the space of sequences $a = \{a(n), n \in \mathbb{Z}\}$ with $a_{-n} = \bar a_n$, such that for any $k \ge 0$ the limit
$$\gamma_a(k) = \lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \overline{a(j)}\, a(j + k)$$
exists. Observe that for any integers $r, s$ with $0 \le r \le s$,
$$\gamma_a(s - r) = \lim_{n \to \infty} \frac{1}{n} \sum_{h=0}^{n-1} \overline{a(h + r)}\, a(h + s).$$
From this follows that the sequence $\{\gamma_a(k), k \ge 0\}$ is nonnegative definite. Indeed,
$$\begin{aligned}
\sum_{k,l=1}^{m} c_k \bar c_l\, \gamma_a(k - l) &= \lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \sum_{k,l=1}^{m} c_k \bar c_l\, \overline{a(j + l)}\, a(j + k) \\
&= \lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \Big| \sum_{k=1}^{m} c_k\, a(j + k) \Big|^2 \ge 0.
\end{aligned}$$
In view of the Bochner–Herglotz theorem, there exists a uniquely determined nonnegative bounded measure $\Lambda_a$ on $[-\pi, \pi[$, called the spectral measure of the sequence $a$. Consider the family of measures
$$\Lambda_{J,a}(d\alpha) = \frac{1}{J} \Big| \sum_{0 \le j < J} e^{-ij\alpha}\, a(j) \Big|^2 d\alpha.$$
A theorem due to Coquet, Kamae and Mendès France [1977: Theorem 1] shows that the family of measures $\Lambda_{J,a}$ converges weakly to $\Lambda_a$. To establish this property, it suffices to show that the sequence of Fourier transforms $\hat\Lambda_{J,a}$ converges pointwise to $\hat\Lambda_a$, which is easily checked.
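The correlation limits of Example (3) can be approximated by finite averages. The Python sketch below is a hedged illustration, not part of the original argument: the sequence $a(j) = \cos j$ is a hypothetical choice (real and even, so $a_{-n} = \bar a_n$ holds), for which the identity $\cos j \cos(j+k) = \tfrac12[\cos k + \cos(2j+k)]$ gives $\gamma_a(k) = \tfrac12 \cos k$. The helper names and the test coefficients are ours.

```python
import math

def a(j):
    # Hypothetical example sequence a(j) = cos(j); it is real and even.
    return math.cos(j)

def gamma_hat(k, n):
    """Finite average (1/n) * sum_{j<n} a(j) * a(j + |k|), an estimate of gamma_a(k)."""
    k = abs(k)  # for a real sequence, gamma_a(-k) = gamma_a(k)
    return sum(a(j) * a(j + k) for j in range(n)) / n

def quadratic_form(c, n):
    """Estimate of sum_{k,l} c_k * c_l * gamma_a(k - l) for real coefficients c."""
    m = len(c)
    lags = {d: gamma_hat(d, n) for d in range(m)}  # one estimate per distinct lag
    return sum(c[k] * c[l] * lags[abs(k - l)] for k in range(m) for l in range(m))
```

With, say, `n = 20000`, `gamma_hat(3, n)` should be close to `math.cos(3) / 2`, and `quadratic_form([1.0, -2.0, 1.5], n)` should be positive, as the nonnegative definiteness of $\{\gamma_a(k)\}$ predicts.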

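1.2 The spectral inequality

Bochner–Herglotz’s lemma has a very useful consequence, which we now state.

1.2.1 Lemma. Let $T$ be a contraction in a Hilbert space $H$. For any $n \in \mathbb{Z}$, let $T_n = T^n$ if $n \ge 0$ and $T_n = T^{*|n|}$ if $n < 0$. Let $x \in H$. The sequence $\{\langle T_n x, x\rangle, n \in \mathbb{Z}\}$ is nonnegative definite, and there exists a uniquely determined nonnegative bounded measure $\mu_x$ on $\mathbb{T}$, the spectral measure of $T$ at $x$, verifying
$$\langle T_n x, x\rangle = \int_{\mathbb{T}} \exp(2i\pi nt)\, \mu_x(dt) \qquad (\forall n \in \mathbb{Z}).$$

Proof. The second assertion follows from Lemma 1.1.1. The first assertion is simple when $T$ is an isometry:
$$\sum_{l,m=-n}^{n} z_l \bar z_m \langle T_{l-m} x, x\rangle = \sum_{l} \sum_{m} z_l \bar z_m \langle T_{l+n} x, T_{m+n} x\rangle = \Big\| \sum_{l} z_l T_{l+n} x \Big\|^2 \ge 0.$$
For the general case, we put for any $0 < r < 1$ and $t \in \mathbb{T}$,
$$U(r, t) = \sum_{k \ge 0} r^k e^{2i\pi kt}\, T^k, \qquad V(r, t) = \sum_{k \in \mathbb{Z}} r^{|k|} e^{2i\pi kt}\, T_k = -I + U(r, t) + U(r, t)^*.$$
If $y = U(r, t)x$, we have $y - x = r e^{2i\pi t}\, T y$. Thus $\|y - x\| \le \|y\|$, and this shows
$$\langle V(r, t)x, x\rangle = -\langle x, x\rangle + \langle y, x\rangle + \langle x, y\rangle = \langle y, y\rangle - \langle y - x, y - x\rangle \ge 0.$$
For any complex numbers $\{z_l, |l| \le n\}$, we have
$$\begin{aligned}
\sum_{l,m=-n}^{n} r^{|l-m|} z_l \bar z_m \langle T_{l-m} x, x\rangle &= \sum_{l,m=-n}^{n} \sum_{k} z_l \bar z_m \langle T_k x, x\rangle\, r^{|k|} \int_{\mathbb{T}} e^{2i\pi(k - (l - m))t}\, dt \\
&= \int_{\mathbb{T}} \sum_{l,m=-n}^{n} z_l \bar z_m\, e^{-2i\pi(l-m)t}\, \langle V(r, t)x, x\rangle\, dt \\
&= \int_{\mathbb{T}} \Big| \sum_{l} z_l e^{-2i\pi lt} \Big|^2 \langle V(r, t)x, x\rangle\, dt \ge 0.
\end{aligned}$$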

Letting $r$ tend to 1 gives the required inequality.

We shall now deduce from the spectral lemma an extremely useful tool.

1.2.2 Proposition. Let $T$ be a contraction in a Hilbert space $H$, and let $p$ be a polynomial. Then, for any $x \in H$,
\[ \|p(T)x\|^2 \le \int_{\mathbb T} |p(e^{2i\pi t})|^2\, \mu_x(dt), \]
where the measure $\mu_x$ is the same as in Lemma 1.2.1.

Proof. We follow an argument due to Wierdl. The inequality is obviously satisfied if the order of $p$ is equal to 0. Assume now that the inequality is true for any polynomial of order $k-1$. Let $p(y) = a_0 + \dots + a_k y^k$, and consider the auxiliary polynomials
\[ q(y) = a_1 y + \dots + a_k y^k, \qquad u(y) = \frac{q(y)}{y} = a_1 + \dots + a_k y^{k-1}. \]


1 The von Neumann theorem and spectral regularization

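We have $|p(y)|^2 = |a_0 + q(y)|^2 = |a_0|^2 + a_0 \overline{q(y)} + q(y)\bar a_0 + |q(y)|^2$, and
\[ \|p(T)x\|^2 = \|(a_0 I + q(T))x\|^2 = |a_0|^2 \|x\|^2 + \langle a_0 x, q(T)x\rangle + \langle q(T)x, a_0 x\rangle + \|q(T)x\|^2. \]
By using the induction hypothesis, we have
\[ \|u(T)x\|^2 \le \int_{\mathbb T} |u(e^{2i\pi t})|^2\, \mu_x(dt). \]
Since $T$ is a contraction, $\|q(T)x\| = \|T u(T)x\| \le \|u(T)x\|$. And as $|u(e^{2i\pi t})| = |q(e^{2i\pi t})|$, we get
\[ \|q(T)x\|^2 \le \int_{\mathbb T} |q(e^{2i\pi t})|^2\, \mu_x(dt). \]
Besides,
\[ \langle a_0 x, q(T)x\rangle = \int_{\mathbb T} a_0\, \overline{q(e^{2i\pi t})}\, \mu_x(dt) \quad\text{and}\quad \langle q(T)x, a_0 x\rangle = \int_{\mathbb T} q(e^{2i\pi t})\, \bar a_0\, \mu_x(dt). \]
By putting together these various estimates, and noting that $|a_0|^2\|x\|^2 = \int_{\mathbb T} |a_0|^2 \mu_x(dt)$, we obtain
\[ \|p(T)x\|^2 \le \int_{\mathbb T} \Big( |a_0|^2 + a_0\, \overline{q(e^{2i\pi t})} + q(e^{2i\pi t})\, \bar a_0 + |q(e^{2i\pi t})|^2 \Big) \mu_x(dt) = \int_{\mathbb T} |p(e^{2i\pi t})|^2\, \mu_x(dt), \]
and this establishes the spectral inequality for all polynomials of order $k$, and thereby for any polynomial.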
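The spectral inequality of Proposition 1.2.2 can be tested numerically in finite dimension. The sketch below is only an illustration under stated assumptions: the helper names (`random_contraction`, `spectral_sides`) are hypothetical, the contraction is a random matrix rescaled by its Frobenius norm, and the right-hand side is computed by expanding $|p|^2$ and using $\int_{\mathbb T} e^{2i\pi nt}\mu_x(dt) = \langle T_n x, x\rangle$, so that $\int_{\mathbb T} |p(e^{2i\pi t})|^2 \mu_x(dt) = \sum_{j,k} a_j \bar a_k \langle T_{j-k}x, x\rangle$.

```python
import random

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def adjoint(M):
    d = len(M)
    return [[M[j][i].conjugate() for j in range(d)] for i in range(d)]

def inner(u, v):
    # <u, v>, linear in u and conjugate-linear in v
    return sum(a * b.conjugate() for a, b in zip(u, v))

def power_apply(M, n, v):
    # returns M^n v
    for _ in range(n):
        v = matvec(M, v)
    return v

def random_contraction(d, rng):
    # Assumption: dividing by the Frobenius norm gives operator norm <= 1.
    M = [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(d)]
         for _ in range(d)]
    frob = sum(abs(z) ** 2 for row in M for z in row) ** 0.5
    return [[z / frob for z in row] for row in M]

def spectral_sides(T, coeffs, x):
    """Return (||p(T)x||^2, sum_{j,k} a_j conj(a_k) <T_{j-k} x, x>)."""
    Tstar = adjoint(T)
    px = [0j] * len(x)
    for j, a in enumerate(coeffs):
        px = [s + a * t for s, t in zip(px, power_apply(T, j, x))]
    lhs = inner(px, px).real

    def c(n):
        # c(n) = <T_n x, x>, with T_n = T^n for n >= 0 and (T*)^|n| for n < 0
        return inner(power_apply(T if n >= 0 else Tstar, abs(n), x), x)

    rhs = sum(a * b.conjugate() * c(j - k)
              for j, a in enumerate(coeffs)
              for k, b in enumerate(coeffs)).real
    return lhs, rhs

rng = random.Random(0)
T = random_contraction(3, rng)
x = [1 + 0j, 0.5j, -1 + 1j]
print(spectral_sides(T, [1, -2j, 0.5, 1 + 1j], x))
```

For a unitary matrix the two sides coincide; for a strict contraction the left-hand side is in general strictly smaller.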

1.3 The von Neumann theorem

Let $T$ be a contraction in a Hilbert space $H$ and introduce the operators
\[ A_n = A_n^T = \frac1n \sum_{k=0}^{n-1} T^k, \qquad n = 1, 2, \dots. \tag{1.3.1} \]
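As a concrete illustration of the averages (1.3.1), one may take for $T$ the cyclic shift on $\mathbb C^d$, which is unitary and whose invariant vectors are the constant vectors. The following minimal sketch (the function names and the test vector are assumptions made for the example) computes $A_n f$ and its distance to the mean of $f$, which is the orthogonal projection of $f$ onto the constants.

```python
def shift(v):
    """Cyclic shift T: (Tv)[i] = v[(i + 1) % d]; T is unitary, hence a contraction."""
    return v[1:] + v[:1]

def ergodic_average(v, n):
    """A_n v = (1/n) * sum_{k=0}^{n-1} T^k v."""
    total = [0.0] * len(v)
    w = list(v)
    for _ in range(n):
        total = [s + x for s, x in zip(total, w)]
        w = shift(w)
    return [s / n for s in total]

f = [1.0, 4.0, -2.0, 5.0]
mean = sum(f) / len(f)  # projection of f onto the T-invariant (constant) vectors

for n in (1, 10, 100, 1001):
    a = ergodic_average(f, n)
    print(n, max(abs(x - mean) for x in a))
```

In this example the error is at most $(d-1)\max_i |f_i - \bar f|/n$, since only the incomplete last cycle contributes.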

The fundamental result of von Neumann [1931] can be stated as follows. 1.3.1 Theorem. The limit limn→∞ An f = f¯ exists for any f ∈ H , and the map PT : f → f¯ is the orthogonal projection of H onto the subspace of invariant vectors HT = {g ∈ H : T g = g}. Further H = H0 ⊕ HT , where H0 = {g − T g, g ∈ H }. Proof. (1) The proof is based on the following lemma.


1.3.2 Lemma. Let $T$ be a contraction in a Hilbert space $H$. Then the adjoint operator $T^*$ (Section 2.2.6) has the same fixed points as $T$.

Proof. $T^*$ is also a contraction, and if $Tf = f$, then $\langle f, Tf\rangle = \langle Tf, f\rangle = \|f\|^2$. Conversely, $\langle f, Tf\rangle = \langle Tf, f\rangle = \|f\|^2$ implies $\langle f, T^*f\rangle = \|f\|^2$ and
\[ \|Tf - f\|^2 = \langle Tf - f, Tf - f\rangle = \|Tf\|^2 + \|f\|^2 - \langle f, Tf\rangle - \langle Tf, f\rangle = \|Tf\|^2 - \langle f, T^*f\rangle \le 0. \]
Thus $Tf = f$, and so $Tf = f \iff \langle Tf, f\rangle = \langle T^*f, f\rangle = \|f\|^2$. Applying the same argument to $T^*$, we get
\[ Tf = f \iff \langle Tf, f\rangle = \langle T^*f, f\rangle = \langle f, f\rangle \iff T^*f = f. \]

(2) We show that $H = H_0 \oplus H_T$. According to (1), for any $f \in H_T$,
\[ \langle f, g - Tg\rangle = \langle f, g\rangle - \langle f, Tg\rangle = \langle f, g\rangle - \langle T^*f, g\rangle = 0. \]
Hence $H_T \subset H_0^\perp$. Besides, if $f$ is orthogonal to $H_0$, then $0 = \langle f, g - Tg\rangle = \langle f - T^*f, g\rangle$ for any $g$ in $H$. Thus $T^*f = f$, and thereby $Tf = f$. This implies that $H_0^\perp = H_T$.

(3) It is plain that the theorem is satisfied for any vector of the type $f + g - Tg$, $f \in H_T$ and $g \in H$. Indeed,
\[ \frac1n \sum_{k=0}^{n-1} T^k(f + g - Tg) = \frac1n \sum_{k=0}^{n-1} T^k f + \frac1n \sum_{k=0}^{n-1} (T^k g - T^{k+1} g) = f + \frac1n (g - T^n g) \to f, \]
as $n$ tends to infinity, and $f$ is the orthogonal projection on $H_T$ of $f + g - Tg$.

(4) According to (2), these vectors are dense in $H$. The operators $A_n$ are contractions as well. It follows that the set of vectors for which the theorem is true is closed in $H$. Let indeed
\[ \mathcal A = \big\{ x \in H : \text{if } y = \mathrm{proj}_{H_0}(x), \text{ then } \lim_{n\to\infty} A_n(y) = 0 \big\}. \]
We show that $\mathcal A$ is closed. Let $\{x_n, n \ge 1\} \subset \mathcal A$, $x_n \to x$. Then $y_n \to y$, and

\[ \|A_N(y)\| \le \|A_N(y - y_p)\| + \|A_N(y_p)\| \le \|A_N(y_p)\| + \|y - y_p\|. \]
Let $\varepsilon > 0$ and let $p$ be a fixed integer such that $\|y - y_p\| < \varepsilon/2$. Let $N(\varepsilon)$ be such that for any $N \ge N(\varepsilon)$, $\|A_N(y_p)\| \le \varepsilon/2$. We obtain that $\|A_N(y)\| \le \varepsilon$. Thus $\mathcal A$ is closed in $H$ and the theorem is established.

Let $\{X_n, n \ge 0\}$ be a weakly stationary sequence in a Hilbert space $H$. According to Theorem 2.1.3, there exists a unitary operator $U$ on $H$ such that $X_n = U^n X_0$. By von Neumann's theorem, we get that the limit
\[ M(X) := \lim_{n\to\infty} \frac{X_0 + \dots + X_{n-1}}{n} \]
exists in $H$. It can be directly observed that the inner product $\langle M(X), X_h\rangle$ is independent of $h$. Indeed, by using the weak stationarity,
\[ \langle M(X), X_h\rangle = \lim_{n\to\infty} \Big\langle \frac{X_{h+1} + \dots + X_{h+n}}{n}, X_h \Big\rangle = \lim_{n\to\infty} \Big\langle \frac{X_{k+1} + \dots + X_{k+n}}{n}, X_k \Big\rangle = \langle M(X), X_k\rangle. \]
And consequently
\[ \langle M(X), X_h\rangle = \Big\langle M(X), \frac{X_1 + \dots + X_n}{n} \Big\rangle, \]
which gives, as $n$ tends to infinity, $\langle M(X), X_h\rangle = \|M(X)\|^2$.

As observed in Examples 1.1.3, for any real value of $\vartheta$, the sequence $\{e^{-in\vartheta}X_n, n \ge 1\}$ is weakly stationary too. The limit
\[ M(X, \vartheta) = \lim_{n\to\infty} \frac{e^{-i\vartheta}X_1 + e^{-i2\vartheta}X_2 + \dots + e^{-in\vartheta}X_n}{n} \]
thus also exists, for any value of $\vartheta$. Then
\[ \langle e^{-ih\vartheta}X_h, M(X, \vartheta)\rangle = \|M(X, \vartheta)\|^2, \]
independently of $h$. Hence
\[ \Big\langle \frac{e^{-i\vartheta_1}X_1 + e^{-i2\vartheta_1}X_2 + \dots + e^{-in\vartheta_1}X_n}{n}, M(X, \vartheta_2) \Big\rangle = \frac{e^{i(\vartheta_2 - \vartheta_1)} + e^{i2(\vartheta_2 - \vartheta_1)} + \dots + e^{in(\vartheta_2 - \vartheta_1)}}{n}\, \|M(X, \vartheta_2)\|^2. \]
Therefore, if $\vartheta_1 \ne \vartheta_2 \pmod{2\pi}$, the last equation becomes, as $n$ tends to infinity,
\[ \langle M(X, \vartheta_1), M(X, \vartheta_2)\rangle = 0, \]
as claimed in Examples 1.1.3.

Weakly stationary sequences, however, enjoy other remarkable properties; among them is certainly the following identity, which does not seem to be widely known.

An identity of Ky Fan. For any two positive integers $n, m$,

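\[ \frac{\|X_1 + \dots + X_n\|^2}{n} + \frac{\|X_1 + \dots + X_m\|^2}{m} - \frac{\|X_1 + \dots + X_{n+m}\|^2}{n+m} = \frac{n(n+m)}{m} \Big\| \frac{X_1 + \dots + X_n}{n} - \frac{X_1 + \dots + X_{n+m}}{n+m} \Big\|^2. \]
This nice identity was observed and applied in Ky Fan [1946: 598]. The proof goes as follows. Put for any positive integer $n$, $S_n = X_1 + \dots + X_n$, and if $m$ is another positive integer let $T_{n,m} = S_{n+m} - S_n$, so that $S_{n+m} = S_n + T_{n,m}$. Then
\[ \Big\| \frac{S_n}{n} - \frac{S_{n+m}}{n+m} \Big\|^2 = \frac{\|S_n\|^2}{n^2} + \frac{\|S_{n+m}\|^2}{(n+m)^2} - \Big\langle \frac{S_n}{n}, \frac{S_{n+m}}{n+m} \Big\rangle - \Big\langle \frac{S_{n+m}}{n+m}, \frac{S_n}{n} \Big\rangle, \]
and so
\begin{align*}
\frac{n(n+m)}{m} \Big\| \frac{S_n}{n} - \frac{S_{n+m}}{n+m} \Big\|^2
&= \frac{(n+m)\|S_n\|^2}{nm} + \frac{n\|S_{n+m}\|^2}{m(n+m)} - \frac1m \big[ \langle S_n, S_{n+m}\rangle + \langle S_{n+m}, S_n\rangle \big] \\
&= \frac{\|S_n\|^2}{n} + \frac{\|S_n\|^2}{m} + \frac{\|S_m\|^2}{m} - \frac{\|S_m\|^2}{m} - \frac{\|S_{n+m}\|^2}{n+m} + \frac{\|S_{n+m}\|^2}{m} - \frac1m \big[ \langle S_n, S_{n+m}\rangle + \langle S_{n+m}, S_n\rangle \big].
\end{align*}
But $\langle S_n, S_{n+m}\rangle + \langle S_{n+m}, S_n\rangle = 2\|S_n\|^2 + \langle S_n, T_{n,m}\rangle + \langle T_{n,m}, S_n\rangle$, so that in turn
\begin{align*}
\frac{n(n+m)}{m} \Big\| \frac{S_n}{n} - \frac{S_{n+m}}{n+m} \Big\|^2
&= \frac{\|S_n\|^2}{n} + \frac{\|S_m\|^2}{m} - \frac{\|S_{n+m}\|^2}{n+m} + \frac1m \big[ \|S_{n+m}\|^2 - \|S_n\|^2 - \|S_m\|^2 - \langle S_n, T_{n,m}\rangle - \langle T_{n,m}, S_n\rangle \big] \\
&= \frac{\|S_n\|^2}{n} + \frac{\|S_m\|^2}{m} - \frac{\|S_{n+m}\|^2}{n+m},
\end{align*}
since $S_{n+m} = S_n + T_{n,m}$. And we are done. Note that the weak stationarity assumption was only used in the last line of calculations, to say that $\|T_{n,m}\| = \|S_m\|$.

A simple although quite interesting consequence of Ky Fan's identity is
\[ \frac{\|S_{n+m}\|^2}{n+m} \le \frac{\|S_n\|^2}{n} + \frac{\|S_m\|^2}{m}, \tag{1.3.2} \]
which is valid for any two positive integers $n, m$. This is inequality (4.8) in Ky Fan [1946].

We say that a sequence $\{g_n, n \ge 1\}$ of real numbers is subadditive if it satisfies
\[ g_{n+m} \le g_n + g_m. \tag{1.3.3} \]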
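Ky Fan's identity and inequality (1.3.2) lend themselves to a quick numerical check. The sketch below assumes a finite-dimensional model that is not taken from the text: $X_k = U^k x_0$ with $U = \mathrm{diag}(e^{i\theta_1}, \dots, e^{i\theta_d})$, which is a weakly stationary sequence in $\mathbb C^d$. The function names and the sample values are illustrative only.

```python
import cmath

def partial_sum(thetas, x0, start, count):
    """X_start + ... + X_{start+count-1}, with X_k = U^k x0, U = diag(exp(i*theta))."""
    return [sum(cmath.exp(1j * th * k) for k in range(start, start + count)) * c
            for th, c in zip(thetas, x0)]

def sq_norm(v):
    return sum(abs(z) ** 2 for z in v)

def ky_fan_sides(thetas, x0, n, m):
    """Return the left- and right-hand sides of Ky Fan's identity."""
    s_n = partial_sum(thetas, x0, 1, n)
    s_m = partial_sum(thetas, x0, 1, m)
    s_nm = partial_sum(thetas, x0, 1, n + m)
    lhs = sq_norm(s_n) / n + sq_norm(s_m) / m - sq_norm(s_nm) / (n + m)
    diff = [a / n - b / (n + m) for a, b in zip(s_n, s_nm)]
    rhs = n * (n + m) / m * sq_norm(diff)
    return lhs, rhs

thetas = [0.3, 1.7, -2.2]
x0 = [1 + 0j, 0.5 - 1j, 2j]
for n, m in [(1, 1), (3, 5), (7, 2)]:
    print(n, m, ky_fan_sides(thetas, x0, n, m))
```

Both sides agree up to rounding, and their common value is nonnegative, which is exactly (1.3.2).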

Then we have the following well-known lemma.

1.3.3 Lemma. If $\{g_n, n \ge 1\}$ is a subadditive sequence of real numbers, then $g_n/n$ converges to $\inf_{n\ge 1}(g_n/n)$.

Proof. Fix an arbitrary positive integer $N$ and, for $n > N$, write $n = j_n N + r_n$ with $1 \le r_n \le N$. Clearly $j_n/n \to 1/N$ as $n$ tends to infinity. Further, by subadditivity,
\[ \inf_{k\ge 1} \frac{g_k}{k} \le \frac{g_n}{n} \le \frac{g_{j_n N} + g_{r_n}}{n} \le \frac{j_n\, g_N}{n} + \frac{g_{r_n}}{n}. \]
Since $g_{r_n}$ takes only finitely many values, letting now $n$ tend to infinity gives
\[ \inf_{n\ge 1} \frac{g_n}{n} \le \liminf_{n\to\infty} \frac{g_n}{n} \le \limsup_{n\to\infty} \frac{g_n}{n} \le \frac{g_N}{N}. \]
As $N$ was arbitrary, the lemma is proved.

We thus deduce from (1.3.2) and from the lemma applied to $g_n := \|S_n\|^2/n$ that
\[ \lim_{n\to\infty} \Big\| \frac{S_n}{n} \Big\| = \inf_{n\ge 1} \Big\| \frac{S_n}{n} \Big\|. \tag{1.3.4} \]
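Lemma 1.3.3 can be illustrated on a subadditive sequence whose ratios $g_n/n$ are not monotone. The example $g_n = n + (n \bmod 3)$ below is a hypothetical choice made for this sketch; it is subadditive because $(n+m) \bmod 3 \le (n \bmod 3) + (m \bmod 3)$, and $\inf_n g_n/n = 1$ is attained at the multiples of 3.

```python
def g(n):
    # Hypothetical subadditive sequence: linear term plus a bounded periodic term.
    return n + n % 3

def is_subadditive(seq, limit):
    """Check seq(n + m) <= seq(n) + seq(m) for all 1 <= n, m < limit."""
    return all(seq(n + m) <= seq(n) + seq(m)
               for n in range(1, limit) for m in range(1, limit))

ratios = [g(n) / n for n in range(1, 3001)]

print(is_subadditive(g, 60))
print(ratios[:6])            # oscillates: the ratio is not monotone in n
print(min(ratios), ratios[-2])
```

The ratios oscillate above their infimum while converging to it, as the lemma predicts.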

Relation (1.3.4) is a remarkable consequence of Ky Fan's identity, which remains true for averages of contractions by von Neumann's theorem (proceed by approximation in view of the decomposition $H = H_0 \oplus H_T$). We continue with another interesting consequence concerning the ratios
\[ \Big\| \frac{S_{n_k}}{n_k} - \frac{S_{n_{k+1}}}{n_{k+1}} \Big\|^2 \Big/ \Big( \frac{1}{n_k} - \frac{1}{n_{k+1}} \Big), \]
where $\mathcal N = \{n_k, k \ge 1\}$ is a given increasing sequence of positive integers. Notice that in the orthonormal case, namely if $X_1, X_2, \dots$ is an orthonormal sequence, then
\[ \Big\| \frac{S_{n_k}}{n_k} - \frac{S_{n_{k+1}}}{n_{k+1}} \Big\|^2 = \frac{1}{n_k} - \frac{1}{n_{k+1}} \]
precisely. We have the following properties:

a)

Snk+1 2 N−1 Snk 1 nk − nk+1 lim sup

1 1 N→∞ nN k=1 nk − nk+1

b) Further if lim nk+1 − nk = ∞, then k→∞

Snk+1 −nk 2

SnN 2 − ≤ lim sup sup . 2 n2N N →∞ 1≤k 0. And |Vx (θ)| ≤ 1 if x is an integer. Hence (i). Now let −π ≤ θ ≤ π and put for any real x > 0, eixθ − 1 ϕθ (x) = . x

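Then, since $\varphi_\theta(x) = (e^{ix\theta} - 1)/x$, we have
\[ \varphi_\theta'(x) = \frac{i\theta x e^{ix\theta} - e^{ix\theta} + 1}{x^2}, \]
and noting $\delta(u) := |iue^{iu} - e^{iu} + 1|^2$, we have
\[ \delta(u) = (1 - u\sin u - \cos u)^2 + (u\cos u - \sin u)^2 = 2[1 - u\sin u - \cos u] + u^2. \]
We claim that for all real $u$, $\delta(u) \le u^4/4$. As $\delta(u) = \delta(-u)$, it suffices to prove it for $u \ge 0$. But $\delta'(u) = 2u(1 - \cos u)$, and if we set $H(u) := u^4/4 - \delta(u)$, we get $H(0) = 0$ and
\[ H'(u) = u^3 - 2u(1 - \cos u) = u\big(u^2 - 4\sin^2(u/2)\big) \ge 0, \]
since $|\sin v| \le |v|$. Then $|\varphi_\theta'(x)| = \delta(x\theta)^{1/2}/x^2 \le |\theta|^2/2$. As
\[ \frac{\partial}{\partial x} V_x(\theta) = \frac{1}{e^{i\theta} - 1}\, \varphi_\theta'(x), \]
it follows that
\[ \Big| \frac{\partial}{\partial x} V_x(\theta) \Big| \le \frac{\pi}{2|\theta|}\, |\varphi_\theta'(x)| \le \frac{\pi}{4}\, |\theta|. \]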

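Hence (ii). Let $m \ge n$ be positive integers. Then
\[ |V_n(\theta) - V_m(\theta)| = \frac{1}{|e^{i\theta} - 1|}\, |\varphi_\theta(n) - \varphi_\theta(m)| \le \frac{\pi}{2|\theta|}\, (m - n) \sup_{n<x<m} |\varphi_\theta'(x)|, \]
and so
\[ |V_n(\theta) - V_m(\theta)| \le \frac{\pi}{4}\, |\theta|\, (m - n). \]
Now
\[ |V_n(\theta) - V_m(\theta)| = \Big| \Big( \frac1n - \frac1m \Big) \sum_{j=0}^{n-1} e^{ij\theta} - \frac1m \sum_{j=n}^{m-1} e^{ij\theta} \Big| \le \frac{2(m-n)}{m}. \]
Hence (iii) and (iv).

Introduce for $\theta \in [-\pi, \pi)$ and $y \in (0, 1]$ the regularizing kernel
\[ Q(\theta, y) = \frac{1}{|\theta|} \wedge \frac{|\theta|}{y^2}. \tag{1.4.1} \]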


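1.4.3 Lemma. Let $m \ge n$ be two positive integers. Then, for any $\theta \in [-\pi, \pi)$,
\[ 4\pi \int_{1/m}^{1/n} Q(\theta, y)\, dy + 4 \cdot 1_{[\frac1m, \frac1n)}(|\theta|) \ge |V_m(\theta) - V_n(\theta)|^2. \]

Proof. Consider three cases.

(1) $|\theta| \ge \frac1n$. By definition of $Q$ and by Lemma 1.4.2,
\[ \int_{1/m}^{1/n} Q(\theta, y)\, dy = \int_{1/m}^{1/n} \frac{1}{|\theta|}\, dy = \frac{m-n}{m} \cdot \frac{1}{n|\theta|} \ge \frac12\, |V_m(\theta) - V_n(\theta)| \cdot \frac{1}{n|\theta|} \ge \frac{1}{4\pi}\, |V_m(\theta) - V_n(\theta)|^2, \]
where the last step uses $|V_m(\theta) - V_n(\theta)| \le |V_m(\theta)| + |V_n(\theta)| \le 2\pi/(n|\theta|)$.

(2) $|\theta| \le \frac1m$. Then, for the same reasons,
\[ \int_{1/m}^{1/n} Q(\theta, y)\, dy = \int_{1/m}^{1/n} \frac{|\theta|}{y^2}\, dy = (m - n)|\theta| \ge \frac{4}{\pi}\, |V_m(\theta) - V_n(\theta)| \ge \frac{2}{\pi}\, |V_m(\theta) - V_n(\theta)|^2. \]

(3) $\frac1n > |\theta| \ge \frac1m$. This case is obvious since we have $|V_m(\theta) - V_n(\theta)| \le 2$.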
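The inequality of Lemma 1.4.3 can be checked numerically on a grid. The sketch below is an illustration only: it takes $V_n(\theta) = \frac1n\sum_{j=0}^{n-1} e^{ij\theta}$ and the kernel (1.4.1), and integrates $Q$ in closed form using the fact that $Q(\theta, y) = 1/|\theta|$ for $y \le |\theta|$ and $Q(\theta, y) = |\theta|/y^2$ for $y \ge |\theta|$. The function names are hypothetical, and $\theta = 0$ is excluded because $Q$ is not defined there.

```python
import cmath
import math

def V(n, theta):
    """V_n(theta) = (1/n) * sum_{j=0}^{n-1} exp(i*j*theta)."""
    return sum(cmath.exp(1j * j * theta) for j in range(n)) / n

def integral_Q(theta, a, b):
    """Integral over [a, b] of Q(theta, y) = min(1/|theta|, |theta|/y**2), 0 < a <= b."""
    t = abs(theta)
    if b <= t:                       # whole interval below |theta|
        return (b - a) / t
    if a >= t:                       # whole interval above |theta|
        return t * (1 / a - 1 / b)
    return (t - a) / t + t * (1 / t - 1 / b)

def lemma_gap(theta, n, m):
    """Left-hand side minus right-hand side of Lemma 1.4.3 (should be >= 0)."""
    t = abs(theta)
    lhs = 4 * math.pi * integral_Q(theta, 1 / m, 1 / n)
    if 1 / m <= t < 1 / n:
        lhs += 4
    return lhs - abs(V(m, theta) - V(n, theta)) ** 2

grid = [k * math.pi / 100 for k in range(-100, 100) if k != 0]
worst = min(lemma_gap(th, n, m)
            for n in range(1, 7) for m in range(n, 21) for th in grid)
print(worst)
```

The smallest gap found on the grid is nonnegative (up to rounding), in agreement with the lemma.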

Let $f \in H$, with spectral measure $\mu_f$. Introduce a new measure, the spectral regularization of the measure $\mu_f$ with respect to the kernel $Q$, defined by
\[ \hat\mu_f(dy) = 4\pi \Big( \int_{-\pi}^{\pi} Q(\theta, y)\, \mu_f(d\theta) \Big) dy + 4\mu_f(dy). \tag{1.4.2} \]
It is easy to verify that $\hat\mu_f([0,1]) \le 4(2\pi + 1)\mu_f([-\pi, \pi]) \le 4(2\pi + 1)\|f\|^2$. Indeed, if $|\theta| \le 1$, then
\[ \int_0^1 Q(\theta, y)\, dy = \int_0^{|\theta|} |\theta|^{-1}\, dy + \int_{|\theta|}^1 |\theta| y^{-2}\, dy = 1 + |\theta|(|\theta|^{-1} - 1) = 2 - |\theta| \le 2, \]
and if $1 \le |\theta| \le \pi$, then $y \le |\theta|$ and $\int_0^1 Q(\theta, y)\, dy = \int_0^1 |\theta|^{-1}\, dy \le 1$. We thus have $\int_0^1 Q(\theta, y)\, dy \le 2$; hence the inequality.

1.4.4 Theorem (Spectral regularization inequality). For any integers $m \ge n \ge 1$,
\[ \|A_n^T f - A_m^T f\|^2 \le \hat\mu_f\Big( \Big[ \frac1m, \frac1n \Big) \Big). \]


1.4 The spectral regularization inequality

Proof. By integrating the inequality of Lemma 1.4.3 with respect to the measure $\mu_f$, we get
\[ 4\pi \int_{1/m}^{1/n} \int_{-\pi}^{\pi} Q(\theta, y)\, \mu_f(d\theta)\, dy + 4\mu_f\Big( \Big[ \frac1m, \frac1n \Big) \Big) \ge \int_{-\pi}^{\pi} |V_m(\theta) - V_n(\theta)|^2\, \mu_f(d\theta). \]
By means of the spectral inequality (Proposition 1.2.2), we thus obtain the claimed result.

The spectral regularization inequality allows us to easily evaluate the Littlewood–Paley square function associated to the averages $A_n^T(f)$. Put for any nondecreasing sequence $\mathcal N = \{n_p, p \ge 1\}$ of positive integers, and any $f \in H$,
\[ S_{\mathcal N}(f) = \Big( \sum_{p=1}^{\infty} \|A_{n_{p+1}}^T(f) - A_{n_p}^T(f)\|^2 \Big)^{1/2}. \tag{1.4.3} \]

These functions, which are extrapolated from the Littlewood–Paley theory, gained much interest in ergodic circles during the last decade. We briefly recall their role in Fourier analysis on $\mathbb T$. Introduce the so-called dyadic intervals
\[ \Delta_j = \begin{cases} \{2^{j-1}, 2^{j-1} + 1, \dots, 2^j - 1\} & \text{if } j > 0, \\ \{0\} & \text{if } j = 0, \\ -\Delta_{|j|} & \text{if } j < 0. \end{cases} \]
If $f$ is any integrable function on $\mathbb T$ and $\hat f$ its Fourier transform, then we write $S_j f = \sum_{n\in\Delta_j} \hat f(n)\chi_n$. The square function of $f$ is defined by
\[ Sf = \Big( \sum_{j\in\mathbb Z} |S_j f|^2 \Big)^{1/2}, \]
and the Littlewood–Paley theorem on $\mathbb T$ expresses that to each $p$ in $(1, \infty)$ correspond positive numbers $A_p$ and $B_p$ such that
\[ A_p \|Sf\|_p \le \|f\|_p \le B_p \|Sf\|_p \]
for (say) all trigonometric polynomials $f$ on $\mathbb T$. For more, see [Edwards–Gaudry: 1977].

The square function also appears in martingale theory ([Burkholder–Gundy: 1970], inequality (1.4)). Let $f_1, f_2, \dots$ be a martingale on some probability space and $d_1, d_2, \dots$ its difference sequence, so that
\[ f_n = \sum_{k=1}^{n} d_k, \qquad n \ge 1. \]
Let $f^*$ denote the maximal function of the martingale sequence: $f^* = \sup_{n\ge 1} |f_n|$.


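The maximal function is related to the square function $Sf = \big( \sum_{k=1}^{\infty} d_k^2 \big)^{1/2}$ by the inequalities
\[ A_p \|Sf\|_p \le \|f^*\|_p \le B_p \|Sf\|_p, \]
valid for $1 < p < \infty$.

1.4.5 Theorem (Square function inequality). For any nondecreasing sequence $\mathcal N$ of positive integers, and any $f \in H$,
\[ S_{\mathcal N}(f) \le 2(2\pi + 1)^{1/2} \|f\|. \]

Proof. From Theorem 1.4.4 it follows immediately that
\[ \sum_{p=1}^{\infty} \|A_{n_{p+1}}^T(f) - A_{n_p}^T(f)\|^2 \le \sum_{p=1}^{\infty} \hat\mu_f\Big( \Big[ \frac{1}{n_{p+1}}, \frac{1}{n_p} \Big) \Big) \le \hat\mu_f([0,1]) \le 4(2\pi + 1)\|f\|^2. \]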

Actually the better constant 6π is obtained in Lifshits and Weber [2000: 77] by using another kernel Q. The corresponding spectral regularization of μ is given by π d μˆ −3 2 Q(θ, x)μ(dθ ) = |x| θ μ(dθ ) + |θ |−1 μ(dθ ), (x) = dx −π |θ | 1. Assume that H = L2 (μ), (X, A, μ) being a probability space, and define Tf = f τ where τ is a measure-preserving transformation of X (Section 3.1). By Theorem 1.4.5, the associated square function SN defined in (1.4.3) maps L2 (μ) to L2 (μ). This can be extended for 1 < p < ∞: There exists a constant Cp such that for any increasing sequence N = {nk , k ≥ 1} and any f ∈ Lp (μ), we have ∞ T A

p 1/p T (f ) − A (f ) ≤ Cp f p . nk+1 nk p

(1.4.11)

k=1

This nice result was shown by Jones, Kaufman, Rosenblatt and Wierdl [1998]. It is a direct consequence of a stronger result (see Theorem A), which we shall discuss in Section 4.6.6. With the notation from the beginning of the section, let N (AT (f ), p, ε) be the minimal number (possibly infinite) of Lp (μ) open balls centered in AT (f ) of radius ε, enough to cover AT (f ). In a way similar to the one we used to derive entropy estimates from the square function, we deduce from (1.4.11): There exists a constant Cp such that for ε > 0 and any f ∈ Lp (μ), N(A (f ), p, ε) ≤ T

p f p Cp p .

ε

(1.4.12)

For irrational rotations, this bound can be improved by using the Hausdorff–Young inequality (Lifshits [1997] and Weber [1997]). Let τ x = x + ϑ be a rotation on (T, λ), and T defined by Tf = f τ .


Let 2 ≤ p < ∞ and 1/p + 1/q = 1. For f ∈ Lp (T), f ∼ fˆ = {fˆj , j ∈ Z} be its Fourier transform. Then

sup N(AT (f ), p, ε) ≤ Cε−q .

ˆ

j ∈Z fj ej ,

let

(1.4.13)

fˆ q ≤1

As T ej = e2iπj ϑ ej := λj ej , for all polynomials P we have P (λj )fˆj ej . P (T )f = j ∈Z

By the Hausdorff–Young theorem, we get

P (T )f p ≤ Cp

|P (λj )|q |fˆj |q

1/q .

j ∈Z

But this is a complete analog to (1.4.6) and we can proceed as in the proof of Proposition 1.4.7, by introducing a pseudo-spectral measure $\mu = \sum_{j\in\mathbb Z} |\hat f_j|^q \delta_{\lambda_j}$, and its regularized version $\hat\mu$ with the same kernel $Q(z, r)$. We arrive at the estimate
\[ \|(A_n - A_m)f\|_p^q = \|(V_n - V_m)(T)f\|_p^q \le C_p^q \|V_n - V_m\|_{q,\mu}^q \le C_1 C_p^q\, \hat\mu[1/m, 1/n]. \]
The estimate for covering numbers follows straightforwardly. Note that the proof works not only for rotations but also for all operators whose duals (with respect to a Fourier transform) act in $\ell^q$ as contractive multiplications. Any convolution operator with respect to a unit mass measure satisfies this condition. For more general averages such as averages of Dunford–Schwartz operators, or of a contraction in $L^p$, we do not know whether an analogous formulation of (1.4.12) exists. This estimate cannot, however, be improved in general, as the following nice counterexample from Lifshits [1997] shows.

Lifshits' counterexample. Let $2 \le p < \infty$ and let $U : L^p(\mathbb T) \to L^p(\mathbb T)$ be the multiplication operator defined for any $f \in L^p(\mathbb T)$ and any $\theta \in \mathbb T$ by $Uf(\theta) = e^{i\theta} f(\theta)$. We write
\[ A_n = A_n^U = \frac{I + U + \dots + U^{n-1}}{n}, \]
where $I$ is the identity operator. We shall prove for any $\varepsilon > 0$ small enough that
\[ \sup_{\|f\|_p = 1} N(A(f), p, \varepsilon/3) \ge \varepsilon^{-p}. \]
Note that $A_n f(\theta) = V_n(\theta) f(\theta)$, so that for any positive integers $m, n$,
\[ \|A_n f - A_m f\|_p^p = \int_{\mathbb T} |V_n(\theta) - V_m(\theta)|^p\, |f(\theta)|^p\, d\theta. \tag{1.4.14} \]


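Let $B$ be some fixed integer strictly greater than 12. From the standard estimates
\[ |V_m(\theta)| \le \pi (m\theta)^{-1}, \qquad |V_n(\theta) - 1| \le \pi(n-1)\theta/4 \le n\theta, \]
valid for any $m, n, \theta$, we deduce that if $B/m \le \theta \le B^2/m$ and $n \le B^{-3}m$, then $|V_n(\theta) - V_m(\theta)| \ge 1/2$. It follows for any $f \in L^p(\mathbb T)$, any $m$ and any $n \le B^{-3}m$ that
\[ \|A_n f - A_m f\|_p^p \ge 2^{-p} \int_{B/m}^{B^2/m} |f(\theta)|^p\, d\theta. \]
In particular, for any $f \in L^p(\mathbb T)$ and any positive integers $l > t$,
\[ \|A_{B^{3t}} f - A_{B^{3l}} f\|_p^p \ge 2^{-p} \int_{B^{1-3l}}^{B^{2-3l}} |f(\theta)|^p\, d\theta. \]
Let $M$ be some positive integer and put $\varepsilon = M^{-1/p}$. Set
\[ f(\theta) = \sum_{l=1}^{M} \Big( \frac{1}{M(B^{2-3l} - B^{1-3l})} \Big)^{1/p} 1_{[B^{1-3l}, B^{2-3l}]}(\theta). \]
Then $\|f\|_p = 1$ and
\[ \int_{B^{1-3l}}^{B^{2-3l}} |f(\theta)|^p\, d\theta = \frac1M = \varepsilon^p, \qquad l = 1, \dots, M. \]
Thus
\[ \|A_{B^{3t}} f - A_{B^{3l}} f\|_p^p \ge 2^{-p} \varepsilon^p, \qquad 1 \le t < l \le M. \]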

We deduce from these calculations that N (A(f ), p, ε/3) ≥ M = ε−p , as claimed. A variant in L1 . There is a general estimate of a weaker form of the square function in L1 , which is due to Jones, Rosenblatt and Wierdl [1999: Theorem 2.3], and can be stated as follows. Let (X, A, μ) be a probability space. Consider mappings Tn : L1 (μ) → L1 (μ) and assume that each is strongly positive in the sense that Tn f ≥ 0 for all f ∈ L1 (μ). We also assume that each Tn is positively homogeneous, which means that Tn (cf ) = cTn f for nonnegative c and f ∈ L1 (μ). For instance, Tn can be the absolute value of any linear operator from L1 (μ) to L1 (μ).

∞ 2 1/2 . Then Let Sf (x) = n=1 Tn f (x) sup sup λ λ≥0 f 1 ≤1

∞ n=1

μ{|Tn f | ≥ λ} ≤ C "⇒ sup sup λμ{Sf ≥ λ} ≤ 10C. (1.4.15) λ≥0 f 1 ≤1


The proof is rather elementary. As Sf ≤ S1 f + S2 f , where S1 f (x) = S2 f (x) =

∞ n=1 ∞

1/2

(Tn f (x))2 1{Tn f ≤1} (x)

, 1/2

(Tn f (x))2 1{Tn f >1} (x)

,

n=1

we get μ{Sf ≥ 2} ≤ μ{S1 f ≥ 1} + μ{S2 f ≥ 1} ∞ (Tn f )2 1{Tn f >1} ≥ 1 ≤ μ{S1 f ≥ 1} + μ ≤ μ{S1 f ≥ 1} +

n=1 ∞

μ{Tn f > 1} ≤ μ{(S1 f )2 ≥ 1} + C f 1

n=1 ∞

=μ

(Tn f (x))2 .

n=1 ∞

≤μ ≤

k=0

1{2−k−1 ≤Tn f ≤2−k } ≥ 1 + C f 1

k=0

2−2k

k=0 ∞

∞

2−2k

∞

1{2−k−1 ≤Tn f ≤2−k } ≥ 1 + C f 1

n=1 ∞

μ{Tn f ≥ 2−k−1 } + C f 1 ≤ 5C f 1 .

n=1

Let t > 0. Replacing now f by f/t gives tμ{Sf ≥ 2t} ≤ 5C f 1 ; hence sup λμ{Sf ≥ λ} ≤ 10C f 1 . λ≥0

Extensions to the Hilbert transform. Results of the previous section have extensions to the discrete bilateral Hilbert transform Hn (f ) = U j (f )/j, 0 0 be fixed. As τ has a limit from the right, there exists η > 0 such that ud < u < ud + η "⇒ |τ (u) − τ (ud + 0)| < ε. And thus ud 1 1 ud +η τ (u)du − τ (u)du − τ (ud + 0) |τ (ud + 0)| = η −π η −π ud +η 1 = τ (u) − τ (ud + 0) du η 1 ≤ η ≤ ε.

ud ud +η

|τ (u) − τ (ud + 0)|du

ud

As $\varepsilon$ is arbitrary, we also deduce that $\tau(u_d + 0) = 0$. This shows that at any point $t$ of the interval $[-\pi, \pi[$, we have $\tau(t) = 0$. Said differently, $\bar\sigma(t) = \sigma(t)$ for any $t \in [-\pi, \pi[$, as claimed. Equation (E1) thus admits, under the normalization conditions (2.2.3), a unique solution, namely the function $\sigma(t, f)$ previously defined. Introduce then, for any $t \in [-\pi, \pi]$, the function
\[ \sigma(t, f, g) = \frac14 \sigma(t, f + g) - \frac14 \sigma(t, f - g) + \frac{i}{4} \sigma(t, f + ig) - \frac{i}{4} \sigma(t, f - ig). \]
By successively replacing in equation (E1) $f$ by $f + g$, $f - g$, $f + ig$ and $f - ig$, we easily verify that
\[ \langle U^k f, g\rangle = \int_{-\pi}^{\pi} e^{ikt}\, d\sigma(t, f, g) \qquad (\forall k \in \mathbb Z). \tag{E2} \]
We have thus obtained a representation of Fourier–Stieltjes transform type of the quantities $\langle U^k f, g\rangle$. We are now going to show that the mapping $(f, g) \mapsto \sigma(t, f, g)$


2.2 Spectral representation of unitary operators

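is a bilinear form on $H$, with norm less than 1. Let $f = a_1 f_1 + a_2 f_2$. Then $\langle U^k f, g\rangle = a_1 \langle U^k f_1, g\rangle + a_2 \langle U^k f_2, g\rangle$, and thus, for all $k \in \mathbb Z$,
\[ \int_{-\pi}^{\pi} e^{ikt}\, d\sigma(t, f, g) = a_1 \int_{-\pi}^{\pi} e^{ikt}\, d\sigma(t, f_1, g) + a_2 \int_{-\pi}^{\pi} e^{ikt}\, d\sigma(t, f_2, g). \]
This shows that
\[ \sigma(t, f, g) = a_1 \sigma(t, f_1, g) + a_2 \sigma(t, f_2, g). \tag{L1} \]
Thus $f \mapsto \sigma(t, f, g)$ is linear. Further,
\[ \langle g, U^k f\rangle = \langle U^{-k} g, f\rangle = \int_{-\pi}^{\pi} e^{-ikt}\, d\sigma(t, g, f) \]
and
\[ \langle g, U^k f\rangle = \overline{\langle U^k f, g\rangle} = \int_{-\pi}^{\pi} e^{-ikt}\, d\overline{\sigma(t, f, g)}. \]
Consequently, for any integer $k \in \mathbb Z$,
\[ \int_{-\pi}^{\pi} e^{-ikt}\, d\sigma(t, g, f) = \int_{-\pi}^{\pi} e^{-ikt}\, d\overline{\sigma(t, f, g)}. \]
This shows that $\sigma(t, f, g) = \overline{\sigma(t, g, f)}$. From this relation and from the linearity in $f$ of $\sigma(t, f, g)$ it follows that
\[ \sigma(t, f, b_1 g_1 + b_2 g_2) = \bar b_1 \sigma(t, f, g_1) + \bar b_2 \sigma(t, f, g_2). \tag{L2} \]
Since the map $t \mapsto \sigma(t, f, f)$ is nondecreasing, and $\sigma(-\pi, f, f) = 0$, we have
\[ \sigma(t, f, f) \le \sigma(\pi, f, f) = \int_{-\pi}^{\pi} d\sigma(t, f, f) = \langle f, f\rangle. \tag{L3} \]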

We now need the following result.

2.2.2 Lemma. Let $\varphi : H \times H \to \mathbb C$ be a mapping satisfying the following properties:
(a) $\varphi(a_1 f_1 + a_2 f_2, g) = a_1 \varphi(f_1, g) + a_2 \varphi(f_2, g)$,
(b) $\varphi(f, b_1 g_1 + b_2 g_2) = \bar b_1 \varphi(f, g_1) + \bar b_2 \varphi(f, g_2)$,
(c) $|\varphi(f, f)| \le C \|f\|^2$,
(d) $|\varphi(f, g)| = |\varphi(g, f)|$,
where $C$ is a constant, $f, g, f_1, f_2, g_1, g_2$ are arbitrary elements of $H$, and $a_1, a_2, b_1, b_2$ are arbitrary complex numbers. Then $\varphi$ is a bilinear form on $H$ of norm less than or equal to $C$.


2 Spectral representation of weakly stationary processes

Proof. From (a) and (b) it follows that
\[ \varphi(f, h) + \varphi(h, f) = \frac12 \big[ \varphi(f + h, f + h) - \varphi(f - h, f - h) \big]. \]
Consequently, by (c),
\[ |\varphi(f, h) + \varphi(h, f)| \le \frac{C}{2} \big[ \|f + h\|^2 + \|f - h\|^2 \big] = C \big[ \|f\|^2 + \|h\|^2 \big]. \tag{2.2.7} \]
Let $f$ and $g$ be two elements of $H$ such that $\max(\|f\|, \|g\|) \le 1$, and $h = \lambda g$ where $\lambda$ is a complex number such that $|\lambda| = 1$. Then (2.2.7) implies
\[ |\bar\lambda \varphi(f, g) + \lambda \varphi(g, f)| \le 2C. \tag{2.2.8} \]
We assume that $\varphi(f, g) \ne 0$. Then by (d),
\[ \varphi(f, g) = |\varphi(f, g)| e^{ia}, \qquad \varphi(g, f) = |\varphi(f, g)| e^{ib}. \]
And by means of (2.2.8),
\[ |\varphi(f, g)| \cdot |\bar\lambda e^{ia} + \lambda e^{ib}| \le 2C. \]
Choose $\lambda = e^{i\frac{a-b}{2}}$. We obtain
\[ \bar\lambda e^{ia} + \lambda e^{ib} = e^{i\frac{a+b}{2}} + e^{i\frac{a+b}{2}} = 2e^{i\frac{a+b}{2}}. \]
And this shows
\[ |\varphi(f, g)| \le C \qquad (\|f\| \le 1,\ \|g\| \le 1). \]
Hence the lemma, since for $\varphi(f, g) = 0$ the inequality is trivially satisfied.

Relations (L1), (L2) and (L3) thus imply, by virtue of the lemma we have just proved, that $\sigma(t, \cdot\,, \cdot)$ is a continuous bilinear form on $H$ with norm less than or equal to 1. We also indicate a simple consequence of Lemma 2.2.2.

2.2.3 Corollary. Let $\varphi : H \times H \to \mathbb C$ be a bilinear form on $H$, verifying the following condition: for any elements $f$ and $g$ of $H$, $|\varphi(f, g)| = |\varphi(g, f)|$. Then
\[ \|\varphi\| = \sup_{f\in H} \frac{|\varphi(f, f)|}{\langle f, f\rangle}. \]

Proof. It follows from Lemma 2.2.2 that
\[ \|\varphi\| \le \sup_{f\in H} \frac{|\varphi(f, f)|}{\langle f, f\rangle}. \]
Conversely, we also have
\[ \sup_{f\in H} \frac{|\varphi(f, f)|}{\langle f, f\rangle} \le \sup_{f,g\in H} \frac{|\varphi(f, g)|}{\|f\|\, \|g\|} = \|\varphi\|. \]

Hence, the corollary is proved. The lemma below gives a representation of bilinear forms.

2.2.4 Lemma. Let $\varphi : H \times H \to \mathbb C$ be a continuous bilinear form on $H$. Then there exists a continuous operator $A : H \to H$ such that for all $f, g \in H$, $\varphi(f, g) = \langle Af, g\rangle$. Moreover, $\|A\| = \|\varphi\|$.

This is a straightforward application of the well-known Riesz–Fréchet theorem:

2.2.5 Lemma. Any continuous linear form $\phi$ on $H$ can be expressed in the form $\phi(h) = \langle h, h_\phi\rangle$, where $h_\phi \in H$ is uniquely determined by $\phi$. Further, $\|\phi\| = \|h_\phi\|$.

Proof. We know that $\mathrm{Ker}(\phi) := \{g \in H : \phi(g) = 0\}$ is a closed subspace of $H$. The claimed result is obvious if $\mathrm{Ker}(\phi) = H$. Now, if $\mathrm{Ker}(\phi) \ne H$, let $g$ be a nonzero element of $H \ominus \mathrm{Ker}(\phi)$. Consider the elements of the form
\[ \phi(h) g - \phi(g) h, \qquad h \in H. \]
As
\[ \phi\big( \phi(h) g - \phi(g) h \big) = \phi(g)\phi(h) - \phi(h)\phi(g) = 0, \]
these elements belong to $\mathrm{Ker}(\phi)$. Since $g \in H \ominus \mathrm{Ker}(\phi)$, we have $\langle \phi(h) g - \phi(g) h, g\rangle = 0$, and thus
\[ \phi(h) = \Big\langle h, \frac{\overline{\phi(g)}}{\langle g, g\rangle}\, g \Big\rangle. \]
Thus $\phi(h) = \langle h, h_\phi\rangle$ for any element $h$ of $H$, and this representation is obviously unique. Finally, from the equation $\phi(h) = \langle h, h_\phi\rangle$ it follows that $|\phi(h)| \le \|h\|\, \|h_\phi\|$. Hence $\|\phi\| \le \|h_\phi\|$. And for $h = h_\phi$, we obtain $\phi(h) = \|h_\phi\|^2$. Thus also $\|\phi\| \ge \|h_\phi\|$. This completes the proof of the Riesz–Fréchet theorem.


Now we can easily deduce Lemma 2.2.4. Proof of Lemma 2.2.4. Fix some f in H . The mapping g → ϕ(f, g) defines a continuous linear form on H . Thus by virtue of Lemma 2.2.5, there exists a unique fϕ , for which we have ϕ(f, g) = g, fϕ or ϕ(g, f ) = fϕ , g. Define the operator A by the equation Af = fϕ . Then ϕ(f, g) = Af, g, and since ϕ(a1 f1 + a2 f2 , g) = a1 ϕ(f1 , g) + a2 ϕ(f2 , g), we also have A(a1 f1 + a2 f2 ) − a1 A(f1 ) − a2 A(f2 ), g = 0, for any g ∈ H . Consequently, A(a1 f1 + a2 f2 ) = a1 A(f1 ) + a2 A(f2 ). And this shows that A is a linear operator on H . Finally

ϕ = sup f,g∈H

and

ϕ = sup f,g∈H

|ϕ(f, g)| |Af, g|

Af

= sup ≤ sup ,

f

g

f,g∈H f

g

f ∈H f

|ϕ(f, g)| |Af, Af |

Af

≥ sup = sup .

f

g

f

Af

f,g∈H f ∈H f

These relations imply that A is continuous and

ϕ = A . Finally, if B is another operator on H for which ϕ(f, g) = Bf, g for any f and g in H , then Af − Bf, g = 0. And thus A = B, whence the unicity of A. 2.2.6 Self-adjoint operators. There are some easy and useful (for the sequel) consequences to be drawn from Lemma 2.2.4. A bounded linear operator on H induces by means of the expression f, Ag a bilinear form on H with norm A . By Lemma 2.2.4 we deduce that there exists an operator A∗ on H with norm A∗ = A such that f, Ag = A∗ f, g.


This operator is by definition the adjoint operator of A, and one can easily verify that A∗∗ = (A∗ )∗ = A. If A is a bounded operator and A∗ = A, then we say that A is self-adjoint. For a bounded self-adjoint operator A, we have the relation sup

f = g =1

|Af, g| = sup |Af, f |.

f =1

Indeed, the bilinear form ϕ(f, g) = Af, g, verifies the condition of Corollary 2.2.3, namely |ϕ(f, g)| = |ϕ(g, f )|. The result therefore follows from this corollary. There thus exists a family (Et )−π ≤t 0 "⇒ μ

k=1

τ −k A = 1.

(3.2.2)


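Indeed, write $\check A = \bigcup_{k=1}^{\infty} \tau^{-k} A$ and let $\omega \in \tau^{-1}\check A$. Then $\tau\omega \in \tau^{-k}A$ for some $k \ge 1$, which means that $\tau^k(\tau\omega) = \tau^{k+1}\omega \in A$; hence $\omega \in \check A$. Thereby $\tau^{-1}\check A \subset \check A$, and by iterating this, $\tau^{-k}\check A \subset \check A$ for any positive integer $k$. But this has some consequences. By specifying Proposition 3.2.3 for indicator functions $f = 1_B$, $g = 1_C$, we get
\[ \lim_{n\to\infty} \frac1n \sum_{k=0}^{n-1} \mu(\tau^{-k}B \cap C) = \mu(B)\mu(C). \tag{3.2.3} \]
Applying this with $B = \check A = C$ gives, since $\tau^{-k}\check A \subset \check A$ and $\tau$ preserves $\mu$,
\[ \frac1n \sum_{k=0}^{n-1} \mu(\tau^{-k}\check A \cap \check A) = \frac1n \sum_{k=0}^{n-1} \mu(\tau^{-k}\check A) = \mu(\check A), \]
and the limit of the left-hand side is $\mu(\check A)^2$. Hence $\mu(\check A) = 0$ or 1. As $\mu(\check A) \ge \mu(\tau^{-1}A) = \mu(A) > 0$, we obtain (3.2.2).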

3.3 Weak mixing, strong mixing, continuous spectrum

Let $(X, \mathcal A, \mu, T)$ be a measurable dynamical system and denote again by $U_T$ the operator on $L^2(\mu)$ defined by $U_T f(x) = f(Tx)$. An equivalent reformulation of Proposition 3.2.3 is

3.3.1 Lemma. The dynamical system $(X, \mathcal A, \mu, T)$ is ergodic if and only if
\[ \forall A, B \in \mathcal A, \qquad \lim_{n\to\infty} \frac1n \sum_{k=0}^{n-1} \mu(A \cap T^{-k}B) = \mu(A)\mu(B). \tag{3.3.1} \]
This means that the space $X$ is well mixed under the action of $T$. We shall strengthen this notion of mixing by replacing the convergence in the Cesàro sense by stronger modes of convergence.

3.3.2 Definition. (a) The dynamical system $(X, \mathcal A, \mu, T)$ is weakly mixing if
\[ \forall A, B \in \mathcal A, \qquad \lim_{n\to\infty} \frac1n \sum_{k=0}^{n-1} \big| \mu(A \cap T^{-k}B) - \mu(A)\mu(B) \big| = 0. \tag{3.3.2} \]
The condition is equivalent to: $\lim_{n\to\infty} \frac1n \sum_{k=0}^{n-1} \big| \langle U_T^k f, g\rangle - \langle f, 1\rangle \langle 1, g\rangle \big| = 0$ for all $f, g \in L^2(\mu)$.

(b) A dynamical system $(X, \mathcal A, \mu, T)$ is strongly mixing if
\[ \forall A, B \in \mathcal A, \qquad \lim_{n\to\infty} \mu(A \cap T^{-n}B) = \mu(A)\mu(B), \tag{3.3.3} \]
or equivalently $\lim_{n\to\infty} \langle U_T^n f, g\rangle = \langle f, 1\rangle \langle 1, g\rangle$ for all $f, g \in L^2(\mu)$.


3 Dynamical systems – ergodicity and mixing

(c) The dynamical system $(X, \mathcal A, \mu, T)$ has continuous spectrum when the only eigenfunctions of $U_T$ are constants:
\[ \forall f \in L^2(\mu), \qquad U_T f = \lambda f \implies f = \text{constant}. \tag{3.3.4} \]

(d) The dynamical system $(X, \mathcal A, \mu, T)$ has discrete spectrum when the eigenfunctions of $U_T$ span $L^2(\mu)$.

(e) Let $k$ be some positive integer. A dynamical system $(X, \mathcal A, \mu, T)$ is $k$-mixing if, for any choice of measurable sets $A_1, \dots, A_k$,
\[ \lim_{\substack{\min(n_1,\dots,n_k)\to\infty \\ \min(|n_i - n_j|,\, i\ne j)\to\infty}} \mu\big( T^{-n_1}A_1 \cap \dots \cap T^{-n_k}A_k \big) = \mu(A_1) \cdots \mu(A_k). \]

(f) The system, or more simply the endomorphism $T$, is completely mixing when $\|U_T^n f\|_1$ tends to 0 for all $f \in L^1(\mu)$ with $\int_X f\, d\mu = 0$.

The following fact is implicit in the above definitions: in order for a measurable dynamical system $(X, \mathcal A, \mu, T)$ to be ergodic, weakly mixing or mixing, it is sufficient (and obviously necessary) that the property be satisfied for a countable class of functions which is dense in $L^2(\mu)$. Note also that if $T$ is weakly mixing, then so is $T^k$ for any $k$; a similar comment is trivially in order for the strong mixing property.

Let $A$ be a set of nonnegative integers. The lower (resp. upper) density $\underline d(A)$ (resp. $\bar d(A)$) of $A$ is defined by
\[ \underline d(A) = \liminf_{J\to\infty} \frac{\#\{A \cap [1, J]\}}{J}, \qquad \bar d(A) = \limsup_{J\to\infty} \frac{\#\{A \cap [1, J]\}}{J}. \tag{3.3.5} \]
We say that $A$ has a density $d(A)$ if $\underline d(A) = \bar d(A)$. The following theorem, which links these notions, is very useful.

3.3.3 Theorem. For a measurable dynamical system, the following conditions are equivalent.
a) The dynamical system $(X, \mathcal A, \mu, T)$ is weakly mixing.
b) The product dynamical system $(X \times X, \mathcal A \otimes \mathcal A, \mu \times \mu, T \times T)$ is ergodic.
c) For any $f, g \in L^2(\mu)$ with $\langle f, 1\rangle = \langle g, 1\rangle = 0$, there exists a nondecreasing sequence $S$ with density 1 such that
\[ \lim_{\substack{n\to\infty \\ n\in S}} \langle U_T^n f, g\rangle = 0. \]
d) The dynamical system $(X, \mathcal A, \mu, T)$ has continuous spectrum.
Moreover, when the $\sigma$-algebra $\mathcal A$ is countably generated, conditions a), b), c) and d) are also equivalent to the following condition:


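e) There exists a nondecreasing sequence $M$ with density 1 such that for any $f, g \in L^2(\mu)$ with $\langle f, 1\rangle = \langle g, 1\rangle = 0$,
\[ \lim_{\substack{n\to\infty \\ n\in M}} \langle U_T^n f, g\rangle = 0. \]
The last equivalence follows from a theorem due to Jones [1972].

Proof. The product dynamical system $(X \times X, \mathcal A \otimes \mathcal A, \mu \times \mu, T \times T)$ is ergodic if and only if
\[ \forall f, g \in L^2(\mu \times \mu),\ \langle f, 1\rangle = \langle g, 1\rangle = 0, \qquad \lim_{n\to\infty} \frac1n \sum_{k=0}^{n-1} \langle U_{T\times T}^k f, g\rangle = 0. \tag{3.3.6} \]
As linear combinations of functions of type $f(x, y) = f_1(x) f_2(y)$, $f_1, f_2 \in L^2(\mu)$, are dense in $L^2(\mu \times \mu)$, and since averages of contractions are contractions in $L^2(\mu)$, we deduce that the product dynamical system is ergodic if and only if
\[ \forall f_1, f_2, g, g' \in L^2(\mu) \cap 1_\mu^\perp, \qquad \lim_{n\to\infty} \frac1n \sum_{k=0}^{n-1} \langle U_T^k f_1, g\rangle \langle U_T^k f_2, g'\rangle = 0. \tag{3.3.7} \]
By the Cauchy–Schwarz inequality, we get
\[ \Big| \frac1n \sum_{k=0}^{n-1} \langle U_T^k f_1, g\rangle \langle U_T^k f_2, g'\rangle \Big|^2 \le \Big( \frac1n \sum_{k=0}^{n-1} |\langle U_T^k f_1, g\rangle|^2 \Big) \Big( \frac1n \sum_{k=0}^{n-1} |\langle U_T^k f_2, g'\rangle|^2 \Big). \]
To obtain (3.3.7), it thus suffices to prove
\[ \forall f, g \in L^2(\mu) \cap 1_\mu^\perp, \qquad \lim_{n\to\infty} \frac1n \sum_{k=0}^{n-1} |\langle U_T^k f, g\rangle|^2 = 0. \tag{3.3.8} \]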

By choosing f1 = f2 , g = g in (3.3.7), we observe that (3.3.7) and (3.3.8) are in turn equivalent, and equivalent to the ergodicity of the product dynamical system. The equivalence between properties a), b) and c) will be deduced from the following lemma. 3.3.4 Lemma. Let {an , n ≥ 0} be a bounded sequence of positive reals. The following conditions are equivalent: (a) limn→∞ n1 n−1 k=0 ak = 0, 2 (b) limn→∞ n1 n−1 k=0 ak = 0, (c) There exists an increasing sequence of positive integers S, with density 1, such that lim n→∞ an = 0. n∈S

106

3 Dynamical systems – ergodicity and mixing

Note that if (a) holds, then for any positive integer j , limn→∞ n1 n−1 k=0 aj k = 0; and so by (c), for any positive integer j there exists a sequence of integers S of density 1, such that lim n→∞ aj n = 0. This seems less easy to be directly deduced from (c). n∈S

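Proof of Lemma 3.3.4. (1) Let $M = \sup_{n\ge 0} a_n$. Then $\frac{1}{n}\sum_{k=0}^{n-1} a_k^2 \le M\,\frac{1}{n}\sum_{k=0}^{n-1} a_k$; hence the implication (a) ⇒ (b). Conversely, by the Cauchy–Schwarz inequality
\[
\Big(\frac{1}{n}\sum_{k=0}^{n-1} a_k\Big)^2 \le \frac{1}{n}\sum_{k=0}^{n-1} a_k^2;
\]
hence the equivalence (a) ⇔ (b).

(2) Now, we show the implication (c) ⇒ (a). Let $\varepsilon > 0$ and $N_\varepsilon$ be such that for any $n \in [N_\varepsilon, \infty) \cap S$, $a_n < \varepsilon$. For such sufficiently large $n$, we have
\[
\frac{1}{n}\sum_{k=0}^{n-1} a_k = \frac{1}{n}\sum_{k=0}^{N_\varepsilon} a_k + \frac{1}{n}\sum_{k=N_\varepsilon+1}^{n-1} \big(1_S(k) + 1_{S^c}(k)\big) a_k
\le M\,\frac{N_\varepsilon + 1}{n} + \varepsilon + M\,\frac{1}{n}\sum_{k=0}^{n-1} 1_{S^c}(k) \le 3\varepsilon,
\]
since $d(S^c) = 0$. As $\varepsilon$ was arbitrary, we obtain (a).

(3) Finally consider the implication (a) ⇒ (c). There is no loss in assuming that $a_n > 0$ infinitely often. For any integer $k \ge 1$, we put
\[
N_k = \sup\Big\{N : \frac{1}{N}\sum_{j=0}^{N-1} a_j > \frac{1}{k^2}\Big\}.
\]
The sequence $\{N_k, k \ge 1\}$ is nondecreasing and unbounded. Further, for any $N > N_k$, we have $\frac{1}{N}\sum_{j=0}^{N-1} a_j \le \frac{1}{k^2}$. Let $\kappa = \{k_p, p \ge 1\}$ be the strictly increasing sequence of integers defined by
\[
N_{k_p} < N_{k_{p+1}} \quad\text{and}\quad k_p \le k < k_{p+1} \Longrightarrow N_k = N_{k_p}.
\]
Set
\[
S^c = \bigcup_{p=1}^{\infty} \Big( [N_{k_p}, N_{k_{p+1}}[ \ \cap\ \Big\{n : a_n > \frac{1}{k_p}\Big\} \Big). \tag{3.3.9}
\]
Let $j \in S$ and $\tilde p = p(j)$ be the unique integer such that $j \in [N_{k_{\tilde p}}, N_{k_{\tilde p+1}}[$. For this value of $\tilde p$, as
\[
S = \bigcap_{p=1}^{\infty} \Big( [N_{k_p}, N_{k_{p+1}}[^{\,c} \ \cup\ \Big\{n : a_n \le \frac{1}{k_p}\Big\} \Big),
\]
we have $a_j \le \frac{1}{k_{\tilde p}}$. Consequently
\[
\lim_{j\to\infty,\ j\in S} a_j = 0.
\]
It remains to show that $d(S) = 1$. Let $N$ be some positive integer, and let $p$ be defined by $N \in [N_{k_p}, N_{k_{p+1}}[$. Let $n \le N$, $n \in S^c$. Then for some $p' \le p$, we have $n \in [N_{k_{p'}}, N_{k_{p'+1}}[$. And by definition of $S^c$, $a_n > \frac{1}{k_{p'}} \ge \frac{1}{k_p}$. Thus
\[
\frac{1}{N}\sum_{n=0}^{N-1} 1_{S^c}(n) \le k_p\,\frac{1}{N}\sum_{n=0}^{N-1} a_n \le k_p\,\frac{1}{k_p^2} = \frac{1}{k_p} \to 0,
\]
as $p$ (in fact $N$) tends to infinity; hence $d(S^c) = 0$.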
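Lemma 3.3.4 can be illustrated on a sequence that does not converge in the usual sense. The following Python sketch assumes the example $a_n = 1$ if $n$ is a perfect square and $a_n = 0$ otherwise; this sequence, the set $S$ of non-squares and the function names are our own choices, not taken from the text. The Cesàro means tend to $0$ (condition (a)), and $a_n = 0$ along the density-one set $S$ (condition (c)), although $a_n = 1$ infinitely often.

```python
import math

def a(n):
    """Assumed example: a_n = 1 on perfect squares, 0 elsewhere."""
    return 1.0 if math.isqrt(n) ** 2 == n else 0.0

def cesaro_mean(n):
    """(1/n) * sum_{k=0}^{n-1} a_k, the average appearing in condition (a)."""
    return sum(a(k) for k in range(n)) / n

def density_of_S(n):
    """Proportion of k < n in S = {k : k is not a perfect square}."""
    return sum(1 for k in range(n) if a(k) == 0.0) / n

if __name__ == "__main__":
    for n in (10**2, 10**3, 10**4):
        print(n, cesaro_mean(n), density_of_S(n))
```

Among the integers $0 \le k < n$ there are about $\sqrt{n}$ perfect squares, so the Cesàro mean behaves like $n^{-1/2}$ and the proportion of $S$ in $[0, n[$ like $1 - n^{-1/2}$.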


Continuation of the proof of Theorem 3.3.3. As, by the Cauchy–Schwarz inequality, the sequence $a_n = |\langle U_T^n f, g\rangle|$, $n \ge 1$, is bounded, the equivalence of assertions a), b) and c) is now easily deduced from Lemma 3.3.4. We shall now show the equivalence between a), b), c) and e), when the σ-algebra $\mathcal A$ has a countable basis. Let $A_1, A_2, \ldots$ be a sequence of measurable sets generating $\mathcal A$, and denote by $\mathcal B$ the Boolean algebra generated by this sequence. Denote also, for any integer $k \ge 1$, by $\mathcal B_k$ the Boolean algebra generated by $A_1, A_2, \ldots, A_k$. From the previous step, we deduce for each $k$ that there exists a nondecreasing sequence of integers $M_k$ with density 1, such that
\[
\forall A, B \in \mathcal B_k, \qquad \lim_{n\to\infty,\ n\in M_k} \mu(A \cap T^{-n}B) = \mu(A)\mu(B).
\]
We shall build a sequence $n_1 < n_2 < \cdots$ growing sufficiently fast for the set
\[
M = \bigcup_{k\ge 1} \big( [n_k, n_{k+1}[ \ \cap\ M_1 \cap M_2 \cap \cdots \cap M_k \big)
\]
to be of density 1, and moreover to have $M_k \supset \{m \in M : m \ge n_k\}$ for each $k$. Put,
\[
\forall k \ge 1, \qquad n_k = \inf\Big\{n : \sup_{1\le j\le k}\ \sup_{N>n}\ \frac{1}{N}\sum_{i=1}^{N} 1_{M_j^c}(i) \le \frac{1}{k^2}\Big\}.
\]
Then for any $N > n_k$ and $1 \le j \le k$,
\[
\frac{1}{N}\sum_{i=1}^{N} 1_{M_j^c}(i) \le \frac{1}{k^2}.
\]
Let $N$ be some arbitrary integer, and $k$ such that $N \in [n_k, n_{k+1}[$. Let $n \le N$, $n \in M^c$; then $n \in [n_l, n_{l+1}[$ for some $l \le k$. Since
\[
M^c = \bigcap_{k\ge 1} \Big( \big([n_k, n_{k+1}[\big)^c \ \cup\ \bigcup_{j=1}^{k} M_j^c \Big),
\]
for this value of $l$, we have
\[
n \in \bigcup_{j=1}^{l} M_j^c \subseteq \bigcup_{j=1}^{k} M_j^c.
\]
Thus,
\[
\frac{1}{N}\sum_{n=1}^{N} 1_{M^c}(n) \le \frac{1}{N}\sum_{n=1}^{N} 1_{\bigcup_{j=1}^{k} M_j^c}(n) \le \sum_{j=1}^{k}\frac{1}{N}\sum_{n=1}^{N} 1_{M_j^c}(n) \le \frac{k}{(k-1)^2} \to 0
\]
as $k$ (in turn $N$) tends to infinity. We get $d(M^c) = 0$. Now, by construction, if $m \in M$ and $m \ge n_k$, there exists $l \ge k$ such that $m \in [n_l, n_{l+1}[$. For such $l$, we have $m \in M_1 \cap \cdots \cap M_l \subset M_1 \cap \cdots \cap M_k \subset M_k$. Hence,
\[
\forall k \ge 1, \qquad \{m \in M : m \ge n_k\} \subset M_k.
\]


We deduce,
\[
\forall N \ge 1,\ \forall A, B \in \mathcal B_N, \qquad \lim_{n\to\infty,\ n\in M} \mu(A \cap T^{-n}B) = \mu(A)\mu(B).
\]

But the Boolean algebra $\mathcal B$ generates the σ-algebra $\mathcal A$. As $\mathcal B = \bigcup_{N\ge 1}\uparrow \mathcal B_N$, the proof is achieved by approximation. The remaining part of the proof of Theorem 3.3.3 now relies upon the spectral mixing theorem, and will be given at the end of the next section.

Exact endomorphisms. This notion turns out to be very appropriate for describing mixing properties of some important number-theoretic endomorphisms. Consider a Lebesgue space $(M, \mathcal M, \mu)$ with a continuous measure, and let $T$ denote an endomorphism of $(M, \mathcal M, \mu)$. Repeated application of the operation $T^{-1}$ generates a sequence of σ-algebras, each imbedded in its predecessor, $\mathcal M \supseteq T^{-1}\mathcal M \supseteq T^{-2}\mathcal M \supseteq \cdots$. The endomorphism $T$ is said to be exact if
\[
\bigcap_{n=0}^{\infty} T^{-n}\mathcal M = \mathcal N,
\]

where $\mathcal N$ denotes the trivial σ-algebra, namely the collection of all measurable sets of measure zero and their complements. This property was introduced and thoroughly investigated in a nicely written paper [Rochlin: 1961]. An endomorphism $T$ is mixing of degree $r$ if for arbitrary sets $M_0, M_1, \ldots, M_r$, and sequences $(k_0^n, k_1^n, \ldots, k_r^n)$ consisting of non-negative integers $k_0^n, k_1^n, \ldots, k_r^n$ such that
\[
\lim_{n\to\infty}\ \inf_{0\le i<j\le r} |k_i^n - k_j^n| = \infty,
\]
we have
\[
\lim_{n\to\infty} \mu\Big(\bigcap_{j=0}^{r} T^{-k_j^n} M_j\Big) = \prod_{j=0}^{r} \mu(M_j).
\]

A remarkable fact is that an exact endomorphism is mixing of all degrees ([Rochlin: 1961], p. 17). Important examples arise from a class of number-theoretic endomorphisms studied by Rényi [1957]. Let $\varphi$ be a real function defined on the interval $]0, 1[$. We write $Tx = \{\varphi(x)\}$, where $\{c\}$ denotes the fractional part of $c$. If $Tx \ne 0$, then $T^2x$ is defined, and this may be continued if $T^2x \ne 0$. Let $M$ denote the set of all points of $]0, 1[$ for which all powers of $T$ are defined. Clearly $T$ maps $M$ into $M$. For $x \in M$ write also
\[
a_n(x) = \varphi(T^n x), \qquad n = 1, 2, \ldots.
\]


It is natural to ask whether there exists on $M$ a measure $\mu$ equivalent to the Lebesgue measure $\lambda$ such that $T$ is an ergodic endomorphism of the space $M$ with measure $\mu$. This question has been studied for concrete functions $\varphi$, notably in [Rényi: 1957]. As classical examples consider the function $\varphi(x) = rx$, where $r$ is an integer greater than 1, and also the function $\varphi(x) = 1/x$. In the first case $M$ is the set of $r$-based irrational numbers in the interval $]0, 1[$, and $a_1(x), a_2(x), \ldots$ is an expansion of the number $x$ into $r$-ary fractions. The measure $\mu$ exists and coincides with $\lambda$. In the second example, which we already encountered in 3.1.2, $M$ is the set of all irrational numbers in the interval $]0, 1[$ and $a_1(x), a_2(x), \ldots$ is the expansion of the number $x$ into a continued fraction. The measure $\mu$ exists and is defined by the formula
\[
\mu(X) = \frac{1}{\log 2}\int_X \frac{dx}{1 + x},
\]
where $X$ is an arbitrary measurable subset of $M$. More complete results are contained in [Rényi: 1957]. Consider the following two conditions:

(A) $\varphi$ is continuous and strictly decreasing from the limiting value $\lim_{x\to 0}\varphi(x)$, which is either infinite or is an integer greater than 2, to the limiting value $\lim_{x\to 1}\varphi(x) = 1$. For $x < y$, the inequality $\varphi(x) - \varphi(y) \ge y - x$ holds, whereas for $x > y > \varphi^{-1}(1 + \varphi^{-1}(2))$ the stronger inequality $\varphi(x) - \varphi(y) \ge \gamma(y - x)$, $\gamma > 1$, holds.

(B) $\varphi$ is continuous and strictly increases from the limiting value $\lim_{x\to 0}\varphi(x) = 0$ to the limiting value $\lim_{x\to 1}\varphi(x)$ equal either to $\infty$ or to an integer greater than 1. For $x < y$, the inequality $\varphi(y) - \varphi(x) \ge y - x$ holds.

Under either condition (A) or (B), the number $x \in M$ is uniquely defined by the sequence of integers $a_1(x), a_2(x), \ldots$. And under some additional condition, which amounts to saying that $\xi$ is a generator with respect to $T$, there exists on the interval $]0, 1[$ a measurable function $p$ such that
\[
C^{-1} \le p(x) \le C, \qquad \int_0^1 p(x)\,dx = 1,
\]
and $T$ is an ergodic endomorphism of $M$ with measure $\mu(X) = \int_X p(x)\,dx$.

See also [Philipp: 1967].
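The invariance of the measure $\mu(X) = \frac{1}{\log 2}\int_X \frac{dx}{1+x}$ under $Tx = \{1/x\}$ can be checked numerically on intervals $[0, t]$. The following Python sketch is only an illustration under our own assumptions: the function names and the truncation parameter `terms` are hypothetical, and the series is cut after finitely many branches. It uses the fact that $T^{-1}[0, t]$ is the union over $k \ge 1$ of the intervals $[1/(k+t), 1/k]$.

```python
import math

def gauss_measure(t):
    """mu([0, t]) for the measure with density 1 / ((1 + x) log 2)."""
    return math.log1p(t) / math.log(2)

def preimage_measure(t, terms=200_000):
    """Approximate mu(T^{-1}[0, t]) for T x = {1/x}.

    T^{-1}[0, t] is the union over k >= 1 of [1/(k + t), 1/k];
    the series is truncated after `terms` branches (an assumption
    of this sketch, giving an error of order t / terms).
    """
    return math.fsum(
        gauss_measure(1 / k) - gauss_measure(1 / (k + t))
        for k in range(1, terms + 1)
    )

if __name__ == "__main__":
    for t in (0.25, 0.5, 0.9):
        print(t, gauss_measure(t), preimage_measure(t))
```

The two printed columns agree up to the truncation error, which reflects the identity $\mu(T^{-1}[0,t]) = \mu([0,t])$.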


3.4 Spectral mixing theorem

We say that the sequence $\{a_n, n \ge 1\}$ converges in density to $a$ if there exists a subset $J$ of $\mathbb N$ of density 1, such that
\[
\lim_{J\ni n\to\infty} a_n = a, \tag{3.4.1}
\]
and we write
\[
D\text{-}\lim_{n\to\infty} a_n = a. \tag{3.4.2}
\]
If $\{a_n, n \ge 1\}$ is a bounded sequence of real numbers, then in view of Lemma 3.3.4, $D\text{-}\lim_{n\to\infty} a_n = a$ if and only if
\[
\lim_{n\to\infty} \frac{1}{n}\sum_{j=0}^{n-1} |a_j - a| = 0. \tag{3.4.3}
\]

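The following lemma is due to Wiener.

3.4.1 Lemma. Let $\nu$ be a finite measure on the torus $\mathbb T$, and let $\hat\nu(n) = \int_{\mathbb T} e^{2i\pi nt}\,d\nu(t)$. Then $\hat\nu(n)$ converges in density to zero if and only if $\nu$ has no atoms: $\nu\{t\} = 0$ for any $t \in \mathbb T$.

Proof. We have
\[
\frac{1}{n}\sum_{k=0}^{n-1} |\hat\nu(k)|^2 = \frac{1}{n}\sum_{k=0}^{n-1} \int_{\mathbb T} e^{2i\pi ks}\,d\nu(s) \int_{\mathbb T} e^{-2i\pi kt}\,d\nu(t)
= \int_{\mathbb T}\int_{\mathbb T} \frac{1}{n}\sum_{k=0}^{n-1} e^{2i\pi k(s-t)}\,d\nu(s)\,d\nu(t).
\]
As
\[
\frac{1}{n}\sum_{k=0}^{n-1} e^{2i\pi k(s-t)} =
\begin{cases}
1 & \text{if } s = t,\\[2pt]
\dfrac{1}{n}\,\dfrac{e^{2i\pi n(s-t)} - 1}{e^{2i\pi(s-t)} - 1} & \text{if } s \ne t,
\end{cases}
\]
we deduce that
\[
\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} e^{2i\pi k(s-t)} = \delta_{\{s=t\}}.
\]
Hence, by means of the dominated convergence theorem,
\[
\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} |\hat\nu(k)|^2 = \iint_{\{s=t\}} d\nu(s)\,d\nu(t) = \sum_{0\le t<1} \nu\{t\}^2.
\]
This limit equals 0 if and only if $\nu$ has no atoms; and by Lemma 3.3.4, it equals 0 if and only if $\hat\nu(n)$ converges in density to zero.

[…] $> 0$ for some $t$. As $T$ is a contraction, $\exp(-2i\pi t)T$ is a contraction as well. Applying von Neumann's theorem to the averages $A_n = \frac{1}{n}\sum_{k=0}^{n-1} \exp(-2i\pi kt)\,T^k$ of this contraction, we get
\[
\lim_{n\to\infty} A_n x = y,
\]
and check that $\exp(-2i\pi t)Ty = y$. Thus
\[
\langle y, x\rangle = \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \langle \exp(-2i\pi kt)\,T^k x, x\rangle = \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \exp(-2i\pi kt)\,\langle T^k x, x\rangle.
\]
According to Lemma 3.4.1, $\langle T^k x, x\rangle = \int_{\mathbb T} \exp(2i\pi ks)\,d\nu_x(s)$. Hence,
\[
\langle y, x\rangle = \lim_{n\to\infty} \int_{\mathbb T} \frac{1}{n}\sum_{k=0}^{n-1} \exp[2i\pi k(s-t)]\,d\nu_x(s).
\]
As
\[
\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \exp[2i\pi k(s-t)] =
\begin{cases}
0 & \text{if } s \ne t,\\
1 & \text{if } s = t,
\end{cases}
\]
we thus have $\langle y, x\rangle = \nu_x\{t\} > 0$, and in particular $y \ne 0$. And $y$ is a proper vector, which is not orthogonal to $x$; hence a contradiction. Thus $\nu_x$ has no atoms.

(b) ⇒ (c). We have $\langle T^n x, x\rangle = \int_{\mathbb T} \exp(2i\pi ns)\,d\nu(s) = \hat\nu(n)$. Thus,
\[
D\text{-}\lim_{n\to\infty} \langle T^n x, x\rangle = D\text{-}\lim_{n\to\infty} \hat\nu(n).
\]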

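This implication thus follows from Lemma 3.4.1.

(c) ⇒ (d). Let $H_0 = \{y \in H : D\text{-}\lim_{n\to\infty} \langle T^n x, y\rangle = 0\}$. Then $H_0$ is a closed subspace of $H$. Indeed, let $\{y_n, n \ge 1\} \subset H_0$, with $y_n \to y$ in the strong topology. Write $\langle T^n x, y\rangle = \langle T^n x, y - y_k\rangle + \langle T^n x, y_k\rangle$. We have
\[
\frac{1}{n}\sum_{i=0}^{n-1} |\langle T^i x, y\rangle| \le \frac{1}{n}\sum_{i=0}^{n-1} |\langle T^i x, y_k\rangle| + \|x\|\,\|y - y_k\|.
\]
Let $\varepsilon > 0$ be fixed. We choose $k$ sufficiently large so that $\|y - y_k\|\,\|x\| \le \frac{\varepsilon}{2}$, then $n$ such that $\frac{1}{n}\sum_{i=0}^{n-1} |\langle T^i x, y_k\rangle| \le \frac{\varepsilon}{2}$. Thus $\frac{1}{n}\sum_{i=0}^{n-1} |\langle T^i x, y\rangle| \le \varepsilon$. This establishes that $H_0$ is closed. Now, we show that $H_0 \supset \{T^n x, n \ge 1\}$. Observe that $\|T^n x\|$ is decreasing with $n$. Let $k$ be a positive integer and $\varepsilon > 0$. Then, for any sufficiently large $n$,
\[
0 \le \|T^n x\|^2 - \|T^{n+k} x\|^2 \le \varepsilon^2.
\]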


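By Lemma 3.4.3, for any $y \in H$, $|\langle T^n x, y\rangle - \langle T^{n+k} x, T^k y\rangle| \le \varepsilon\|y\|$. For $y = x$, we deduce $|\langle T^n x, x\rangle - \langle T^{n+k} x, T^k x\rangle| \le \varepsilon\|x\|$. Since $D\text{-}\lim_{n\to\infty} \langle T^n x, x\rangle = 0$, it follows that $D\text{-}\limsup_{n\to\infty} |\langle T^{n+k} x, T^k x\rangle| \le \varepsilon\|x\|$. As $\varepsilon$ is arbitrary, we thus have
\[
D\text{-}\lim_{n\to\infty} \langle T^{n+k} x, T^k x\rangle = 0.
\]
Let $y \in H$ be such that $\langle T^k x, y\rangle = 0$ for any $k \ge 1$. Plainly $D\text{-}\lim_{n\to\infty} \langle T^n x, y\rangle = 0$, and thereby $y \in H_0$. Thus, on the one hand $H_0 \supset H_1 = \operatorname{span}\{T^n x, n \ge 1\}$, and on the other $H_0 \supset H_2 = \operatorname{span}\{y \in H : y \perp T^n x,\ \forall n \ge 1\} = H_1^{\perp}$. Consequently $H_0 = H$.

(d) ⇒ (a). Let $\omega$ be a proper vector of $T$ with proper value $\lambda$ of modulus 1. Let $S = \lambda^{-1}T$. Then $S$ is a contraction with $\omega$ as fixed point. Besides, for any $i \ge 1$, $|\langle S^i x, y\rangle| = |\lambda^{-i}\langle T^i x, y\rangle| = |\langle T^i x, y\rangle|$. We deduce from (d) that
\[
\frac{1}{n}\sum_{i=0}^{n-1} |\langle S^i x, y\rangle|
\]
tends to zero as $n$ tends to infinity. In view of von Neumann's theorem, $A_n^S(x) \to L$ in the strong topology. According to the previous observation,
\[
\langle L, y\rangle = 0 \quad \text{for any } y \in H.
\]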

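Thus $L = 0$, the limit being the orthogonal projection of $x$ onto $H_S = \{z \in H : Sz = z\}$. And so $x$ is orthogonal to $\omega$. As $\omega$ is arbitrary, noting that $H_S = \{v \in H : Tv = \lambda v\}$, we get $x \in H_{fl}$.

End of the proof of Theorem 3.3.3. Now we show that $(X, \mathcal A, \mu, T)$ is weakly mixing if and only if $(X, \mathcal A, \mu, T)$ has continuous spectrum. If $T$ has continuous spectrum, the only proper vectors are the constants. Thus $H_{fl} = \mathbf 1^{\perp}$. By the spectral mixing theorem, for any $f, g \in L^2(\mu) = H$ with $f, g \in \mathbf 1^{\perp}$, we have $D\text{-}\lim \langle T^n f, g\rangle = 0$. Thereby the system is weakly mixing. Conversely, if $(X, \mathcal A, \mu, T)$ is weakly mixing, then it follows that, for any $f, g \in \mathbf 1^{\perp}$, $D\text{-}\lim \langle T^n f, g\rangle = 0$. Hence $f \in H_{fl}$, and thus $\mathbf 1^{\perp} \subset H_{fl}$. We claim that $H_{fl} = \mathbf 1^{\perp}$. Indeed, $H_{fl} \subset \mathbf 1^{\perp}$ since
\[
x \in H_{fl} \Longrightarrow \forall y \in H, \quad \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} |\langle T^k x, y\rangle| = 0.
\]
And thus $x \in H_{fl}$ implies $\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \langle T^k x, y\rangle = 0$. For $y = \mathbf 1$, since $\langle T^k x, \mathbf 1\rangle = \langle x, \mathbf 1\rangle$ for every $k$, we get $\langle x, \mathbf 1\rangle = 0$, which means that $x \in \mathbf 1^{\perp}$. This achieves the proof.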
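Wiener's lemma (Lemma 3.4.1) lends itself to a quick numerical check. The following Python sketch compares two probability measures on the torus that we chose ourselves for illustration: $\nu_1 = \frac12(\delta_{t_0} + \lambda)$, which has one atom of mass $\frac12$, and $\nu_2$, the uniform probability measure on $[0, \frac12]$, which has none. The function names and the value of $t_0$ are assumptions of the sketch. The averages $\frac1n\sum_{k<n}|\hat\nu(k)|^2$ should approach $\sum_t \nu\{t\}^2$, that is $\frac14$ and $0$ respectively.

```python
import cmath
import math

def nu_hat_atomic(k, t0=math.sqrt(2) - 1):
    """Fourier coefficient of nu_1 = (delta_{t0} + Lebesgue) / 2."""
    lebesgue_part = 1.0 if k == 0 else 0.0
    return 0.5 * cmath.exp(2j * math.pi * k * t0) + 0.5 * lebesgue_part

def nu_hat_continuous(k):
    """Fourier coefficient of the uniform probability measure on [0, 1/2]."""
    if k == 0:
        return 1.0
    # 2 * integral_0^{1/2} exp(2 i pi k t) dt
    return (cmath.exp(1j * math.pi * k) - 1) / (1j * math.pi * k)

def wiener_average(nu_hat, n):
    """(1/n) * sum_{k=0}^{n-1} |nu_hat(k)|^2."""
    return sum(abs(nu_hat(k)) ** 2 for k in range(n)) / n

if __name__ == "__main__":
    for n in (10, 100, 10_000):
        print(n, wiener_average(nu_hat_atomic, n),
              wiener_average(nu_hat_continuous, n))
```

For $\nu_1$ the coefficients have modulus $\frac12$ for every $k \ne 0$, so they do not converge in density to zero; for $\nu_2$ they decay like $1/k$.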


3.5 Other equivalences and other forms of mixing

In this section, several quite interesting characterizations of mixing are presented and commented upon. Let $(X, \mathcal A, \mu)$ be a probability space. According to a result of Rényi [1958], a measure-preserving transformation $T : X \to X$ is strongly mixing if and only if, for all $A \in \mathcal A$,
\[
\lim_{n\to\infty} \mu(A \cap T^{-n}A) = \mu(A)^2. \tag{3.5.1}
\]
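Rényi's criterion (3.5.1) can be followed by hand on a simple example. The following Python sketch assumes the doubling map $Tx = 2x \bmod 1$ on the torus with Lebesgue measure $\lambda$, and the set $A = [0, \frac13)$; the map, the set and the function name `self_overlap` are our own illustrative choices. Since $T^{-n}A$ is a union of $2^n$ intervals of length $\frac{1}{3\cdot 2^n}$, the measure $\lambda(A \cap T^{-n}A)$ can be computed exactly.

```python
from fractions import Fraction

def self_overlap(n):
    """Exact value of lambda(A ∩ T^{-n}A) for T x = 2x mod 1, A = [0, 1/3)."""
    q = 2 ** n
    # T^{-n}A is the union of the intervals [i/q, i/q + 1/(3q)), i = 0..q-1.
    # Such an interval lies inside A when 3i + 1 <= q; otherwise 3i > q
    # (q is not a multiple of 3), so the interval misses A entirely.
    inside = (q - 1) // 3 + 1
    return Fraction(inside, 3 * q)

if __name__ == "__main__":
    target = Fraction(1, 9)  # lambda(A)^2
    for n in range(0, 11):
        value = self_overlap(n)
        print(n, value, float(value - target))
```

The difference $\lambda(A \cap T^{-n}A) - \lambda(A)^2$ lies between $0$ and $\frac{2}{9\cdot 2^n}$, in agreement with (3.5.1).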

By a result of England and Martin [1968], we also know that an automorphism $T$ is weakly mixing if and only if, for each pair of sets $A, B \in \mathcal A$ with positive measure, there exists a set of integers $\mathcal N$ of density 1 such that
\[
\mu(T^n A \cap B) > 0 \quad \text{for all } n \in \mathcal N. \tag{3.5.2}
\]

The following characterization is due to Blum and Hanson [1960]. $T$ is strongly mixing if and only if
\[
\frac{1}{N}\sum_{k=1}^{N} T^{n_k} f \xrightarrow{\ L^p\ } \int_X f\,d\mu \quad \text{as } N \to \infty, \tag{3.5.3}
\]
for every $1 \le p < \infty$, $f \in L^p$ and any increasing sequence $\{n_k, k \ge 1\}$ of positive integers. A sequence $\{B_n, n \ge 1\}$ of elements of $\mathcal A$ is called remotely trivial if
\[
\bigcap_{m\ge 0} \sigma\{B_{m+k}, k \ge 1\}
\]
is the trivial σ-algebra, namely it contains only sets of measure 0 or 1. Sucheston [1963] has shown that $T$ is strongly mixing if and only if, for all $A \in \mathcal A$, every subsequence of the sequence $\{T^{-n}A, n \ge 1\}$ contains a further subsequence which is remotely trivial. Krengel [1972] showed that a similar characterization holds for weak mixing: $T$ is weakly mixing if and only if, for all $A \in \mathcal A$, the sequence $\{T^{-n}A, n \ge 1\}$ contains a subsequence which is remotely trivial. An isometry $U$ of a complex Hilbert space $H$ has purely discrete spectrum if $H$ is spanned by the eigenvectors of $U$, and has continuous spectrum if it has no eigenvectors. If $T : X \to X$ is a measure-preserving transformation, these notions are transferred to $T$ by considering $U = U_T$ ($U_T f = f \circ T$). Krengel [1972] gave a geometric characterization as follows: a vector $f \in H$ is called weakly wandering if there exists a strictly increasing sequence $0 = k_0 < k_1 < k_2 < \cdots$ of nonnegative integers such that the vectors $U^{k_i} f$, $i = 0, 1, 2, \ldots$ are orthogonal to each other. Then $U$ has continuous spectrum if and only if the weakly wandering vectors span $H$, and $U$ has purely discrete spectrum if and only if there exist


no nonzero weakly wandering vectors. In the first case the weakly wandering vectors turn out to be dense in $H$. Now consider a partition of $X$, that is, a finite collection of pairwise disjoint elements of $\mathcal A$, the union of which is $X$. Finitely many partitions $\xi^1 = \{A_1^{(1)}, \ldots, A_{n_1}^{(1)}\}, \ldots, \xi^r = \{A_1^{(r)}, \ldots, A_{n_r}^{(r)}\}$ are called independent if for every $(i_1, \ldots, i_r)$ with $1 \le i_j \le n_j$ ($j = 1, \ldots, r$) the equation
\[
\mu\Big(\bigcap_{j=1}^{r} A_{i_j}^{(j)}\Big) = \prod_{j=1}^{r} \mu\big(A_{i_j}^{(j)}\big)
\]

holds. Infinitely many partitions are called independent if every finite subset of them is independent. Let $T$ be an automorphism of $(X, \mathcal A, \mu)$. A partition $\xi = \{A_1, \ldots, A_n\}$ is called weakly independent if there exists a strictly increasing sequence $0 = k_0 < k_1 < k_2 < \cdots$ of nonnegative integers such that the partitions $T^{-k_i}\xi$ are independent. This notion is closely related to weak mixing and two-sided mixing: $T$ is called two-sided mixing if for all $A, B, C \in \mathcal A$,
\[
\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} |\mu(T^{-k}A \cap B \cap T^{k}C) - \mu(A)\mu(B)\mu(C)| = 0. \tag{3.5.4}
\]

Weak mixing is the special case where the above holds for all $A, B \in \mathcal A$ and $C = X$. If $T$ is two-sided mixing, there exists for every partition $\xi = \{A_1, \ldots, A_n\}$ and every $\varepsilon > 0$, a weakly independent partition $\bar\xi = \{\bar A_1, \ldots, \bar A_n\}$ with
\[
\sum_{i=1}^{n} \mu(A_i \,\Delta\, \bar A_i) \le \varepsilon, \quad \text{and} \quad \mu(A_i) = \mu(\bar A_i) \quad (i = 1, \ldots, n). \tag{3.5.5}
\]

In other words: the weakly independent partitions are dense in the set of finite partitions. Further, if for every partition $\xi = \{A, A^c\}$, $A \in \mathcal A$, and every $\varepsilon > 0$, there exists a partition $\bar\xi = \{\bar A, \bar A^c\}$ and three integers $k_0, k_1, k_2$ such that $T^{k_0}\bar\xi$, $T^{k_1}\bar\xi$, $T^{k_2}\bar\xi$ are independent, then $T$ is weakly mixing. These two results are Theorem 3.1 in Krengel [1972], to which we refer for more details. As noticed by Del Junco, Reinhold, Weiss [1999: 447], it follows from Krengel's proof that

3.5.1 Theorem (Krengel [1972]). $T$ is a weakly mixing transformation if and only if the weakly independent partitions are dense in the set of finite partitions.

Recall that an IP-set is the set of finite sums with no repetitions generated by a sequence $\{n_k, k \ge 1\}$ of nonnegative integers, that is, it consists of the elements of the form $n_{i_1} + \cdots + n_{i_k}$, $i_1 < \cdots < i_k$, $k \ge 1$. Del Junco, Reinhold, Weiss [1999: Theorem 2] showed that if $T$ is weakly mixing, then the weakly independent partitions along IP-sets are dense in the set of finite partitions. They also showed


[1999: Theorem 4] that if $U$ is an isometry of a complex Hilbert space $H$, which has no discrete spectrum, then the weakly wandering vectors along IP-sets are dense in $H$. Given a measure-preserving transformation $T$ of $(X, \mathcal A, \mu)$, a sequence $m = \{m_k, k \ge 1\}$ is mixing for $T$ if for any pair of sets $A, B \in \mathcal A$,
\[
\lim_{k\to\infty} \mu(A \cap T^{-m_k}B) = \mu(A)\mu(B). \tag{3.5.6}
\]

In the same paper [1999: Theorem 5], it is proved that if $m$ is mixing for $T$, then the weakly independent partitions along IP-sets with generators in $m$ are dense in the set of finite partitions.

Sequential dynamical systems. Berendt and Bergelson [1984] (another paper to which we refer in this section) generalized the notions of ergodicity, weak mixing, and strong mixing to arbitrary sequences of measure-preserving transformations. A sequential dynamical system is a quadruple $(X, \mathcal A, \mu, \tilde T)$ where $\tilde T = \{T_j, j \ge 1\}$ is a sequence of measure-preserving transformations of $X$. A sequential dynamical system is ergodic if for any pair of sets $A, B \in \mathcal A$,
\[
\lim_{N\to\infty} \frac{1}{N^2}\sum_{n,m=1}^{N} \mu(T_n^{-1}A \cap T_m^{-1}B) = \mu(A)\mu(B). \tag{3.5.7}
\]
Alternatively, we say that $\tilde T$ is ergodic.

3.5.2 Theorem. For a sequential dynamical system $(X, \mathcal A, \mu, \tilde T)$, the following conditions are equivalent:
(1) $\tilde T$ is ergodic.
(2) $\lim_{N\to\infty} \frac{1}{N^2}\sum_{n,m=1}^{N} \mu(T_n^{-1}A \cap T_m^{-1}A) = \mu(A)^2$.
(3) For every $1 \le p < \infty$ and $f \in L^p$, $\frac{1}{N}\sum_{k=1}^{N} T_k f \xrightarrow{\ L^p\ } \int_X f\,d\mu$ as $N \to \infty$.
(4) The former property holds for some $1 \le p < \infty$.

This is Theorem 2.1 of Berendt and Bergelson [1984], from which we can easily infer that a dynamical system $(X, \mathcal A, \mu, T)$ is ergodic if and only if the sequential dynamical system $(X, \mathcal A, \mu, \tilde T)$ is, where $\tilde T = \{T^j, j \ge 0\}$. The extension of the notion of strong mixing to sequential dynamical systems requires the introduction of a further notion. Let $E$ be any set and $F \subseteq E \times E$. We say that $F$ is of bounded fibres if there exists some $c$ such that for every $a_1 \in E$, the set $F$ contains at most $c$ elements of the form $(a_1, a_2)$ with $a_2 \in E$. Then a sequential dynamical system $(X, \mathcal A, \mu, \tilde T)$ is strongly mixing if for any pair of sets $A, B \in \mathcal A$ and $\varepsilon > 0$, the set of solutions $(m, n)$ of
\[
\big|\mu(T_n^{-1}A \cap T_m^{-1}B) - \mu(A)\mu(B)\big| \ge \varepsilon \tag{3.5.8}
\]


is of bounded fibres. Evidently, a dynamical system $(X, \mathcal A, \mu, T)$ is strongly mixing if and only if the corresponding sequential dynamical system $(X, \mathcal A, \mu, \tilde T)$ is. A theorem of Berendt and Bergelson [1984: Theorem 2.1] states

3.5.3 Theorem. For sequential dynamical systems, the following conditions are equivalent:
(1) $\tilde T$ is strongly mixing.
(2) For any $A \in \mathcal A$ and $\varepsilon > 0$, the set of solutions $(m, n)$ of $\big|\mu(T_n^{-1}A \cap T_m^{-1}A) - \mu(A)^2\big| \ge \varepsilon$ is of bounded fibres.
(3) For any $1 \le p < \infty$, $f \in L^p$ and $\varepsilon > 0$, there exists a $K$ such that if $N \ge K$ and $n_1 < n_2 < \cdots < n_N$ are positive integers, then $\big\|\frac{1}{N}\sum_{k=1}^{N} T_{n_k} f - \int_X f\,d\mu\big\|_p \le \varepsilon$.
(4) Every subsequence of $\tilde T$ is ergodic.

This allows us to recover the characterizations of strong mixing for dynamical systems of Rényi and Blum–Hanson mentioned before. Now we turn to weak mixing. We extend the notion (3.3.5) of lower density, upper density, and density of a subset of $\mathbb N$ to subsets of $\mathbb N^2$, with respect to squares. The lower (resp. upper) density $\underline\delta(B)$ (resp. $\bar\delta(B)$) of a subset $B$ of $\mathbb N^2$ is defined by
\[
\underline\delta(B) = \liminf_{J\to\infty} \frac{\#\{B \cap [1, J]^2\}}{J^2}, \qquad \bar\delta(B) = \limsup_{J\to\infty} \frac{\#\{B \cap [1, J]^2\}}{J^2}. \tag{3.5.9}
\]
When $\underline\delta(B)$ and $\bar\delta(B)$ coincide, we denote by $\delta(B)$ the common value and say that $B$ has density $\delta(B)$. A double sequence $\{a_{m,n}, m, n \ge 1\}$ converges in density to $a$ if there exists a subset $J$ of $\mathbb N^2$ of density 1 such that
\[
\lim_{J\ni(m,n)\to(\infty,\infty)} a_{n,m} = a. \tag{3.5.10}
\]
In this case, we write $D\text{-}\lim a_{n,m} = a$. This extends the notion of $D$-convergence of simple sequences defined in (3.4.1) to double sequences. A sequential dynamical system $(X, \mathcal A, \mu, \tilde T)$ is weakly mixing if for any pair of sets $A, B \in \mathcal A$,
\[
D\text{-}\lim \mu(T_n^{-1}A \cap T_m^{-1}B) = \mu(A)\mu(B). \tag{3.5.11}
\]
The product of two sequential dynamical systems $(X, \mathcal A, \mu, \tilde T)$ and $(Y, \mathcal B, \nu, \tilde S)$ is the system
\[
(X \times Y, \mathcal A \times \mathcal B, \mu \times \nu, \tilde T \times \tilde S)
\]
where $(T_n \times S_n)(x, y) = (T_n x, S_n y)$. Here again, there is a nice set of characterizations. Indeed, by a theorem of Berendt and Bergelson [1984: Theorem 4.1] we have:


3.5.4 Theorem. The following conditions are equivalent:
(1) $\tilde T$ is weakly mixing.
(2) For any $A \in \mathcal A$, $D\text{-}\lim \mu(T_n^{-1}A \cap T_m^{-1}A) = \mu(A)^2$.
(3) For any $A, B \in \mathcal A$, and $\delta, \varepsilon > 0$, there exists a $K$ such that for any $N \ge K$ and $m$ the inequality $\big|\mu(T_n^{-1}A \cap T_m^{-1}B) - \mu(A)\mu(B)\big| \ge \varepsilon$ has at most $\delta N$ solutions $n$ with $1 \le n \le N$.
(4) For any $1 \le p < \infty$, $f \in L^p$ and $\delta, \varepsilon > 0$, there exists a $K$ such that if $N \ge K$ and $n_1 < n_2 < \cdots < n_N \le N/\delta$ are positive integers, then $\big\|\frac{1}{N}\sum_{k=1}^{N} T_{n_k} f - \int_X f\,d\mu\big\|_p \le \varepsilon$.
(5) Every positive lower density subsequence of $\tilde T$ is ergodic.
(6) $\tilde T \times \tilde T$ is ergodic.
(7) $\tilde T \times \tilde S$ is ergodic for any ergodic $\tilde S$.

It follows in particular that $(X, \mathcal A, T, \mu)$ is weakly mixing if and only if
\[
\frac{1}{N}\sum_{k=1}^{N} T^{n_k} f \xrightarrow{\ L^p\ } \int_X f\,d\mu \quad \text{as } N \to \infty, \tag{3.5.12}
\]
for every $1 \le p < \infty$, $f \in L^p$ and any positive lower density increasing sequence $\{n_k, k \ge 1\}$ of positive integers. This is a result due to Jones [1972]. Weakly mixing sequences may admit only zero density strongly mixing subsequences. For example, such is the case with the sequence of pointwise transformations on $\mathbb T$ given by
\[
T_n(x) = n^{1/2}x, \quad x \in \mathbb T, \quad n = 1, 2, \ldots. \tag{3.5.13}
\]
From multiple recurrence theory more can be said (see Furstenberg [1977]): for instance if $(X, \mathcal A, \mu, T)$ is weakly mixing, then for any triple $A, B, C$ of elements of $\mathcal A$,
\[
\lim_{n\to\infty} n^{-1}\sum_{i=0}^{n-1} |\mu(A \cap T^{-i}B \cap T^{-2i}C) - \mu(A)\mu(B)\mu(C)| = 0. \tag{3.5.14}
\]

Let $E = \{T_j, j \ge 1\}$ be a family of measurable transformations of $X$, preserving $\mu$. We assume that $E$ is weakly mixing in the sense that
\[
\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} |\langle T_k f, g\rangle - \langle f, \mathbf 1\rangle\langle g, \mathbf 1\rangle| = 0 \qquad (\forall f, g \in L^2(\mu)).
\]
Assertion (c) of Theorem 3.3.3 can be extended ([Weber: 2001], Propositions 6.1 and 6.2) as follows:


3.5.5 Proposition. Let $f, g \in L^\infty(\mu)$. Let $X$ and $Y$ be two independent random variables, such that $X \sim f$ and $Y \sim g$. Let also $F : \mathbb R^2 \to \mathbb R$ be continuous. Then for any $\varepsilon > 0$, one can define a sequence $S$ of positive integers of density 1, such that for any $u \in S$,
\[
\Big|\int_X F(f, T_u g)\,d\mu - \mathbf E\,F(X, Y)\Big| \le \varepsilon.
\]
Proof. It suffices to prove the result for $F(x, y) = \sum_{k,l=1}^{M} a_{k,l}\,x^k y^l$. The general case will follow from the Stone–Weierstrass theorem. Let $\varepsilon > 0$ and $f, g \in L^\infty(\mu)$. We have thus to consider the expression
\[
\int F(f, T_u g)\,d\mu = \sum_{k,l=1}^{M} a_{k,l} \int f^k\,T_u g^l\,d\mu.
\]
By Lemma 3.3.4, there exists a sequence $S$ of density 1 such that
\[
\lim_{S\ni u\to\infty} \int f^k\,T_u g^l\,d\mu = \int f^k\,d\mu \cdot \int g^l\,d\mu.
\]
Operating by induction, for any $\varepsilon' > 0$ we deduce the existence of a sequence $S$ of density 1 such that
\[
\forall u \in S,\ \forall k, l = 1, \ldots, M, \qquad \Big|\int f^k\,T_u g^l\,d\mu - \int f^k\,d\mu \cdot \int g^l\,d\mu\Big| \le \varepsilon'.
\]
Choose $\varepsilon' = \varepsilon / \big(\sum_{k,l=1}^{M} |a_{k,l}|\big)$. Then, by the previous estimate,
\[
\Big|\int F(f, T_u g)\,d\mu - \mathbf E\,F(X, Y)\Big| = \Big|\int F(f, T_u g)\,d\mu - \sum_{k,l=1}^{M} a_{k,l} \int f^k\,d\mu \cdot \int g^l\,d\mu\Big|
\le \sum_{k,l=1}^{M} |a_{k,l}|\,\Big|\int f^k\,T_u g^l\,d\mu - \int f^k\,d\mu \cdot \int g^l\,d\mu\Big| \le \varepsilon.
\]
The general case immediately follows. Indeed, by the Stone–Weierstrass theorem, one can find a polynomial $P(x, y)$ such that $\|F - P\|_\infty \le \varepsilon$. By the triangle inequality and applying the previous result to $P$, we deduce that there exists a sequence $S$ of density 1, such that for any $u \in S$,
\[
\Big|\int F(f, T_u g)\,d\mu - \mathbf E\,F(X, Y)\Big| \le 2\varepsilon + \Big|\int P(f, T_u g)\,d\mu - \mathbf E\,P(X, Y)\Big| \le 3\varepsilon.
\]
Let $a > 1$ and consider a finite subset of $L^2(\mu) \cap \mathbf 1^{\perp}_{\mu}$. A useful consequence of weak mixing is: for any $z$ in this subset and any positive integer $N$, one can find integers $u_1, \ldots, u_N$


such that if $S(z) = z \circ T_{u_1} + \cdots + z \circ T_{u_N}$, then $\|S(z)\|_{2,\mu}^2 \sim N\|z\|_{2,\mu}^2$. In particular, for any $z$ in this subset,
\[
\|S(z)\|_{2,\mu}^2 \le aN\|z\|_{2,\mu}^2.
\]
This naturally extends to $L^p(\mu)$ spaces with $2 \le p < \infty$. Proposition 3.5.5 can indeed be used to prove

3.5.6 Proposition. Let $2 \le p < \infty$ and $\varepsilon > 0$. Consider a finite subset of $L^p(\mu)$ and a positive integer $N$. Then, there exist integers $u_1, \ldots, u_N$ such that, if $S(z) = z \circ T_{u_1} + \cdots + z \circ T_{u_N}$, for any $z$ in this subset,

p/2 |S(z)|p dμ ≤ (1 + ε)p 2pN

X

|z|p dμ. X

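Proof. Let $z \in L^\infty(\mu)$ and $\alpha > 0$. Put $Z = \int (1 + |z|)^p\,d\mu$. Let $X_1, X_2, \ldots$ be a sequence of independent random variables having the same law as $z$. Let also $F(x, y) = (x + y)^l$, $l \le p$. By applying $p$ times the previous proposition with the choice $f = g = z$, we establish the existence of a sequence of integers $S_1$ with density 1, such that for any $u \in S_1$,
\[
\forall l \le p, \qquad \Big|\int (z + T_u z)^l\,d\mu - \mathbf E\,(X_1 + X_2)^l\Big| \le \alpha.
\]
At the next stage, we apply Proposition 3.5.5 with the choice $f = z + T_u z$, $g = z$, where $X, Y$ are independent with $X \sim f$ and $Y \sim g$. For any $u \in S_1$ and $v$ belonging to a sequence $S_2$ (depending on $u$) of density 1, we have:
\[
\Big|\int (z + T_u z + T_v z)^l\,d\mu - \mathbf E\,(X + Y)^l\Big| \le \alpha \qquad (\forall l \le p).
\]
But,
\[
\mathbf E\,(X + Y)^l = \sum_{k=0}^{l} C_l^k\,\mathbf E\,X^k\,\mathbf E\,Y^{l-k} = \sum_{k=0}^{l} C_l^k \int (z + T_u z)^k\,d\mu \int z^{l-k}\,d\mu.
\]
Thus
\[
\begin{aligned}
|\mathbf E\,(X + Y)^l - \mathbf E\,(X_1 + X_2 + X_3)^l|
&= \Big|\mathbf E\,(X + Y)^l - \sum_{k=0}^{l} C_l^k\,\mathbf E\,(X_1 + X_2)^k \int z^{l-k}\,d\mu\Big| \\
&= \Big|\sum_{k=0}^{l} C_l^k \Big[\int (z + T_u z)^k\,d\mu - \mathbf E\,(X_1 + X_2)^k\Big] \int z^{l-k}\,d\mu\Big| \\
&\le \sum_{k=0}^{l} C_l^k\,\Big|\int (z + T_u z)^k\,d\mu - \mathbf E\,(X_1 + X_2)^k\Big| \cdot \Big|\int z^{l-k}\,d\mu\Big| \\
&\le \alpha \sum_{k=0}^{l} C_l^k \int |z|^{l-k}\,d\mu = \alpha \int (1 + |z|)^l\,d\mu \le \alpha Z.
\end{aligned}
\]
We thus deduce that for any $u \in S_1$, $v \in S_2$ and $l \le p$,
\[
\Big|\int (z + T_u z + T_v z)^l\,d\mu - \mathbf E\,(X_1 + X_2 + X_3)^l\Big| \le \alpha(1 + Z).
\]
Now, it suffices to iterate the preceding argument. For any integer $N \ge 1$, we obtain that there exist $N$ sequences $S_1, \ldots, S_N$ of density 1, such that for any $u_i \in S_i$, $i = 1, \ldots, N$ and $l \le p$,
\[
\Big|\int \Big(\sum_{i=1}^{N} T_{u_i} z\Big)^l d\mu - \mathbf E\,\Big(\sum_{i=1}^{N} X_i\Big)^l\Big| \le \alpha \sum_{\lambda=0}^{(N-2)^+} Z^\lambda.
\]
We notice that for any $i = 1, 2, \ldots$, the sequence $S_i$ depends on $u_1, \ldots, u_{i-1}$. For a suitable choice of $\alpha$ depending on $\varepsilon$, $N$, $z$ and $p$, we also have:
\[
\int \Big|\sum_{i=1}^{N} T_{u_i} z\Big|^p d\mu \le \mathbf E\,\Big|\sum_{i=1}^{N} X_i\Big|^p + \varepsilon.
\]
Proceeding now by approximation, the same result can be obtained for any $z \in L^p(\mu)$, and finally for any finite subset of $L^p(\mu)$. Proposition 3.5.6 then follows from this and from Rosenthal's inequality (8.2.9) applied to $\mathbf E\,\big|\sum_{i=1}^{N} X_i\big|^p$.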

3.6 Examples

Before examining mixing properties of some standard examples of dynamical systems, recall that in order for a given measurable dynamical system $(X, \mathcal A, \mu, T)$ to be ergodic, weakly mixing or mixing, it is sufficient (and also necessary) that this property be satisfied for a countable class of functions that is dense in $L^2(\mu)$. In the following examples we consider $X = \mathbb T$ equipped with the normalized Lebesgue measure $\lambda$.

(1) Irrational rotations $x \mapsto Tx = x + \vartheta \bmod 1$, $\vartheta \in \mathbb Q^c$. They are ergodic but not weakly mixing. The last assertion is clear since the characters $e_n(x) = e^{2i\pi nx}$, $n \in \mathbb Z$, are eigenfunctions for the associated isometry $U_T$, and span $L^2(\lambda)$. The system has discrete spectrum and so cannot be weakly mixing. As regards ergodicity, let $f \in L^2(\lambda)$, $f \sim \sum_{n\in\mathbb Z} a_n e_n$, and assume that $a_n = 0$ except for a finite number of indices $n$. Then
\[
A_N f(x) = \frac{1}{N}\sum_{k=0}^{N-1} f(T^k x) = \sum_{n\in\mathbb Z} a_n V_N(n\vartheta)\,e_n(x),
\]
where $V_N(u) = (e^{2i\pi Nu} - 1)/N(e^{2i\pi u} - 1)$. As $|V_N(u)| = O(1/N)$ if $\{u\} \ne 0$, and $\{n\vartheta\} \ne 0$ for $n \ne 0$, it follows that $\lim_{N\to\infty} A_N f(x) = \langle f, \mathbf 1\rangle$ in $L^2(\lambda)$ (and pointwise). In view of von Neumann's theorem and (3.1.3), we get $\mathbf E\,(f \mid \mathcal J_T) = \langle f, \mathbf 1\rangle$. By approximation, this remains true for any $f \in L^1(\lambda)$, which means that $\mathcal J_T$ is the trivial σ-field, hence the ergodicity of the system. In relation with this property, we mention the following general result.

3.6.1 Theorem. Let $(X, \mathcal A, \mu, T)$ be a measurable dynamical system such that $T$ is ergodic but not weakly mixing (or equivalently $T \times T$ is not ergodic). Then $T$ has a factor which is a rotation on a compact abelian group.

We refer to Petersen [1983: 134] for a proof.

(2) The transformations $x \mapsto T_q x = qx \bmod 1$, $q$ a positive integer. Let $a$ be some integer strictly greater than 1. Then $T_a^k = T_{a^k}$. The mixing properties of these transformations rely upon the following lemma. Let $(h, k)$ and $[h, k]$ respectively denote the greatest common divisor and the least common multiple of the positive integers $h$ and $k$, and put
\[
\langle h, k\rangle = \frac{(h, k)}{[h, k]}.
\]
3.6.2 Lemma. Let $A$ and $B$ be two intervals in $\mathbb T$. There exists a constant $C$ depending on $A$ and $B$ such that, for any positive integers $h$ and $k$,
\[
\big|\lambda(T_h^{-1}A \cap T_k^{-1}B) - \lambda(A)\lambda(B)\big| \le C\langle h, k\rangle. \tag{3.6.1}
\]
Further, there exists another constant $C$ depending on $A$ and $B$ only, such that for any finite collection $h_1, \ldots, h_R$ of distinct positive integers
\[
\sum_{i,j=1}^{R} \big|\lambda(T_{h_i}^{-1}A \cap T_{h_j}^{-1}B) - \lambda(A)\lambda(B)\big| \le CR(\log\log R)^2. \tag{3.6.2}
\]

Before giving the proof of the lemma, we indicate some useful consequences. Let $\tilde T = \{T_h, h \ge 1\}$.

3.6.3 Proposition. The sequential dynamical system $(\mathbb T, \mathcal B(\mathbb T), \lambda, \tilde T)$ is weakly mixing. Further, for any integer $a$ strictly greater than 1, the transformation $T_a$ is strongly mixing.


Proof. Indeed, by Theorem 3.5.4 (2), the sequential dynamical system $(\mathbb T, \mathcal B(\mathbb T), \lambda, \tilde T)$ is weakly mixing if and only if $D\text{-}\lim \lambda(T_n^{-1}A \cap T_m^{-1}A) = \lambda(A)^2$, for any $A \in \mathcal B(\mathbb T)$. But by Lemma 3.6.2, for any pair $A$ and $B$ of intervals in $\mathbb T$,
\[
\lim_{R\to\infty} \frac{1}{R^2}\sum_{i,j=1}^{R} \big|\lambda(T_i^{-1}A \cap T_j^{-1}B) - \lambda(A)\lambda(B)\big| = 0.
\]
Naturally, this remains true if $A$ and $B$ are finite unions of pairwise disjoint intervals. Now, proceeding by approximation, we get that the above property extends to any pair $U$ and $V$ of Borel sets of $\mathbb T$:
\[
\lim_{R\to\infty} \frac{1}{R^2}\sum_{i,j=1}^{R} \big|\lambda(T_i^{-1}U \cap T_j^{-1}V) - \lambda(U)\lambda(V)\big| = 0. \tag{3.6.3}
\]
Specifying this for $U = V$ gives $D\text{-}\lim \lambda(T_n^{-1}U \cap T_m^{-1}U) = \lambda(U)^2$, as required. Now if $a$ is an integer strictly greater than 1, and $A$ and $B$ are intervals in $\mathbb T$, then
\[
\big|\lambda(T_a^{-k}A \cap T_a^{-1}B) - \lambda(A)\lambda(B)\big| \le C\langle a, a^k\rangle = Ca^{-(k-1)}. \tag{3.6.4}
\]
Thereby, since $T_a$ is $\lambda$-preserving,
\[
\lim_{k\to\infty} \lambda\big(T_a^{-k}A \cap B\big) = \lambda(A)\lambda(B).
\]
The fact that $T_a$ is strongly mixing now follows from the above and the same approximation argument used before.

Proof of Lemma 3.6.2. Let $A = [a, b)$, $B = [c, d)$. By expanding the indicator function $\chi([a, b[)(x)$ into a Fourier series, we get
\[
\chi([a, b[)(x) = b - a + \sum_{n\in\mathbb Z^*} \frac{-1}{2i\pi n}\big(e^{-2i\pi nb} - e^{-2i\pi na}\big)e^{2i\pi nx},
\qquad
\chi([c, d[)(x) = d - c + \sum_{n\in\mathbb Z^*} \frac{-1}{2i\pi n}\big(e^{-2i\pi nd} - e^{-2i\pi nc}\big)e^{2i\pi nx} \tag{3.6.5}
\]
for almost all $x$. Denote $\varphi = \chi([a, b[)$, $\psi = \chi([c, d[)$, next $\bar\varphi = \varphi - (b - a)$, $\bar\psi = \psi - (d - c)$. Put, for $u, v \in \mathbb T$ and $n$ integer, $\delta_n(u, v) = e^{-2i\pi nv} - e^{-2i\pi nu}$. Then,
\[
\bar\varphi(\{hx\}) = \sum_{n\in\mathbb Z^*} \frac{-1}{2i\pi n}\,e^{2i\pi nhx}\,\delta_n(a, b),
\qquad
\bar\psi(\{kx\}) = \sum_{m\in\mathbb Z^*} \frac{-1}{2i\pi m}\,e^{2i\pi mkx}\,\delta_m(c, d),
\]


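so that
\[
\langle \bar\varphi_h, \bar\psi_k\rangle = \sum_{n\in\mathbb Z^*}\sum_{m\in\mathbb Z^*} \frac{1}{4\pi^2 mn}\,\delta_n(a, b)\,\delta_{-m}(c, d) \int_{\mathbb T} e^{2i\pi(nh - mk)x}\,dx
= \sum_{\substack{m,n\in\mathbb Z^* \\ nh - mk = 0}} \frac{1}{4\pi^2 mn}\,\delta_n(a, b)\,\delta_{-m}(c, d).
\]
The equation $nh - mk = 0$ has solutions given by $n = \mu k/(h, k)$ and $m = \mu h/(h, k)$, $\mu = \pm 1, \pm 2, \ldots$. Thus,
\[
\lambda\big(T_h^{-1}A \cap T_k^{-1}B\big) - \lambda(A)\lambda(B) = \langle \bar\varphi_h, \bar\psi_k\rangle
= \frac{\langle h, k\rangle}{4\pi^2} \sum_{\mu=1}^{\infty} \frac{1}{\mu^2}\Big[\delta_{\mu k/(h,k)}(a, b)\,\delta_{-\mu h/(h,k)}(c, d) + \delta_{-\mu k/(h,k)}(a, b)\,\delta_{\mu h/(h,k)}(c, d)\Big]. \tag{3.6.6}
\]
Therefore
\[
\big|\lambda(T_h^{-1}A \cap T_k^{-1}B) - \lambda(A)\lambda(B)\big| \le C\langle h, k\rangle,
\]
where the constant $C$ depends on $A$ and $B$. And the first part of the lemma is proved. The second part now plainly follows from the first and Gál's estimate (8.4.19), which we briefly recall for convenience: there exists a constant $C$ such that for any $N$-tuple of (all different) positive integers $n_1, \ldots, n_N$, we have
\[
\sum_{i,j\le N} \langle n_i, n_j\rangle \le CN(\log\log N)^2.
\]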

3.6.4 Remark. In Lemma 11, p. 52 of Sprindžuk [1979], another estimate is proposed, which is sometimes more suitable. The proof is based on the method of Vinogradov.
\[
\big|\lambda\big(T_h^{-1}A \cap T_k^{-1}B\big) - \lambda(A)\lambda(B)\big| = O\Big(|A|\,\frac{(h, k)}{k}\Big).
\]
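Estimate (3.6.1) can be tested exactly for intervals with rational endpoints. The following Python sketch is an illustration under our own assumptions: the helper names `bracket`, `preimage_breaks` and `joint_measure` are hypothetical, and the explicit constant $C = \frac13$ is our own remark, obtained from (3.6.6) with the crude bounds $|\delta_n| \le 2$ and $\sum_{\mu\ge1}\mu^{-2} = \pi^2/6$; it is not stated in the text. The set $T_q^{-1}[lo, hi)$ is the union of the intervals $[(i + lo)/q, (i + hi)/q)$, $i = 0, \ldots, q-1$, so the joint measure is a finite sum of lengths.

```python
from fractions import Fraction
from math import gcd

def bracket(h, k):
    """<h, k> = gcd(h, k) / lcm(h, k)."""
    g = gcd(h, k)
    return Fraction(g * g, h * k)

def preimage_breaks(q, lo, hi):
    """Endpoints of the intervals forming T_q^{-1}[lo, hi)."""
    pts = []
    for i in range(q):
        pts.append((i + lo) / q)
        pts.append((i + hi) / q)
    return pts

def joint_measure(h, k, A, B):
    """Exact Lebesgue measure of {x in [0,1) : {hx} in A and {kx} in B}.

    A and B are pairs (lo, hi) of Fractions with 0 <= lo < hi <= 1.
    Between consecutive breakpoints both indicators are constant,
    so it suffices to test the midpoint of each piece.
    """
    pts = sorted(set([Fraction(0), Fraction(1)]
                     + preimage_breaks(h, *A) + preimage_breaks(k, *B)))
    total = Fraction(0)
    for left, right in zip(pts, pts[1:]):
        mid = (left + right) / 2
        if A[0] <= (h * mid) % 1 < A[1] and B[0] <= (k * mid) % 1 < B[1]:
            total += right - left
    return total

if __name__ == "__main__":
    A = B = (Fraction(0), Fraction(1, 3))
    for h, k in [(1, 2), (1, 4), (1, 8), (2, 3), (4, 6)]:
        dev = joint_measure(h, k, A, B) - Fraction(1, 9)
        print(h, k, dev, bracket(h, k))
```

Exact rational arithmetic avoids any discretization error, at the cost of a running time proportional to $h + k$.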

(3) Gaussian systems. Let $X = \{X_n, n \in \mathbb Z\}$ be a centered Gaussian stationary sequence, and let $r(m) = \int_{-\pi}^{\pi} e^{im\lambda}\,F(d\lambda)$ denote its covariance function. Let $(\Omega, \mathcal B, \mathbf P)$ be the underlying probability space on which $X$ is defined. Consider also the Gaussian dynamical system canonically associated to $X$: $(\mathbb R^{\mathbb Z}, \mathcal B(\mathbb R^{\mathbb Z}), \mu, T)$ where $\mu = X(\mathbf P)$ and $T$ is the usual shift: $Tf = f(\,\cdot + 1)$ if $f \in \mathbb R^{\mathbb Z}$. The mixing properties of these dynamical systems are characterized by a theorem due to Maruyama [1949]. For the proof, we use a probabilistic approach as in [Weber: 1980].


3.6.5 Theorem. (a) (RZ , B(RZ ), μ, T ) is weakly mixing if and only if N −1 1 |r(n)| = 0. N →∞ N

lim

n=0

(b) (RZ , B(RZ ), μ, T ) is strongly mixing if and only if limn→∞ r(n) = 0. Proof. According to Lemma 10.1.4, 1 P X0 ≥ 0, Xn ≥ 0 − P X0 ≥ 0 P Xn ≥ 0 = arcsin r(n). 2π −1 So, if the system is weakly mixing, necessarily limN →∞ N1 N n=0 | arcsin r(n)| = 0, 1 N −1 namely limN→∞ N n=0 |r(n)| = 0. Similarly, if the system is strongly mixing, then limn→∞ r(n) = 0. For proving the sufficiency part we use Lemma 10.1.5. There is no loss to assume N E Xn2 = 1 for ) every n ∈ N. Let )C, D be cylinders of R with basis I and J respectively, namely C = i∈N Ci , D = j ∈N Dj and

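\[ C_i = \begin{cases} (a_i, b_i) & \text{if } i \in I, \\ \mathbb{R} & \text{if } i \in I^c, \end{cases} \qquad D_j = \begin{cases} (u_j, v_j) & \text{if } j \in J, \\ \mathbb{R} & \text{if } j \in J^c. \end{cases} \]
Let
\[ \tilde{C} = \prod_{i\in I} C_i, \qquad \tilde{D} = \prod_{j\in J} D_j, \qquad V = \tilde{C} \times \tilde{D}, \]
where in $V$ the factor $D_j$ is carried by the coordinate $j+n$, and $n$ is any positive integer sufficiently large for $I \cap \{n + J\}$ to be empty. We further assume the numbers $a_i, b_i, u_j, v_j$ to be all distinct, which is no restriction here. Then, by Lemma 10.1.5 there exists a constant $C$ which depends on $I$ and $J$ only, such that
\[ |\mu(C \cap T^{-n}D) - \mu(C)\mu(D)| = \Bigl| P\bigl\{ (X_i, i\in I;\ X_{j+n}, j\in J) \in V \bigr\} - P\bigl\{ (X_i, i\in I) \in \tilde{C} \bigr\}\, P\bigl\{ (X_j, j\in J) \in \tilde{D} \bigr\} \Bigr| \le C \sum_{i\in I} \sum_{j\in J} |r(j - i + n)|. \tag{3.6.7} \]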

If we know for instance that $\lim_{N\to\infty} \frac{1}{N}\sum_{n=0}^{N-1} |r(n)| = 0$, we get from (3.6.7)
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} \bigl| \mu(C \cap T^{-n}D) - \mu(C)\mu(D) \bigr| = 0. \tag{3.6.8} \]

Let $\mathcal{C}$ denote the semi-algebra of cylinders. It is plain that (3.6.8) extends to any pair of sets $C$ and $D$ in $\mathcal{C}$. Let $\{C_p, p \ge 1\}$ be a sequence in $\mathcal{C}$ converging to some element $C \in \mathcal{B}(\mathbb{R}^{\mathbb{N}})$: $\lim_{p\to\infty} \mu(C_p \,\Delta\, C) = 0$. Let $\varepsilon > 0$. Then
\[ |\mu(C \cap T^{-n}D) - \mu(C)\mu(D)| \le |\mu(C \cap T^{-n}D) - \mu(C_p \cap T^{-n}D)| + |\mu(C_p \cap T^{-n}D) - \mu(C_p)\mu(D)| + |\mu(C_p)\mu(D) - \mu(C)\mu(D)| := P_1 + P_2 + P_3. \]
For $p$ large enough, say $p \ge p_\varepsilon$, and any integer $n$,

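\[ P_1 = |\mu(C \cap T^{-n}D) - \mu(C_p \cap T^{-n}D)| \le \mu\bigl( (C \cap T^{-n}D) \,\Delta\, (C_p \cap T^{-n}D) \bigr) \le \mu(C \,\Delta\, C_p) + \mu(T^{-n}D \,\Delta\, T^{-n}D) \le \varepsilon/2, \]
\[ P_3 \le \mu(D)\,\mu(C \,\Delta\, C_p) \le \varepsilon/2. \]
Hence,
\[ \frac{1}{N} \sum_{n=0}^{N-1} |\mu(C \cap T^{-n}D) - \mu(C)\mu(D)| \le \varepsilon + \frac{1}{N} \sum_{n=0}^{N-1} |\mu(C_{p_\varepsilon} \cap T^{-n}D) - \mu(C_{p_\varepsilon})\mu(D)|. \]
By letting $N$ tend to infinity in the above inequality, we easily get
\[ \limsup_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} |\mu(C \cap T^{-n}D) - \mu(C)\mu(D)| \le \varepsilon. \]
As $\varepsilon$ was arbitrary, we obtain $\lim_{N\to\infty} \frac{1}{N}\sum_{n=0}^{N-1} |\mu(C \cap T^{-n}D) - \mu(C)\mu(D)| = 0$. Since the monotone class generated by the semi-algebra $\mathcal{C}$ coincides with $\mathcal{B}(\mathbb{R}^{\mathbb{N}})$, we thereby conclude that
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} |\mu(C \cap T^{-n}D) - \mu(C)\mu(D)| = 0, \]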

for any $C \in \mathcal{B}(\mathbb{R}^{\mathbb{N}})$ and any $D \in \mathcal{C}$. Let $C' = T^{-1}C$. The transformation $T$ being invertible, the latter may also be rewritten as
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} |\mu(D \cap T^{-n}C') - \mu(D)\mu(C')| = 0. \tag{3.6.9} \]
If $D$ is now a limit of a sequence $\{D_q, q \ge 1\}$ in $\mathcal{C}$, by invoking the same reasoning, we also get (3.6.9) in that case. Finally, for any two elements $C$ and $D$ of $\mathcal{B}(\mathbb{R}^{\mathbb{N}})$, we have
\[ \lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} |\mu(D \cap T^{-n}C) - \mu(D)\mu(C)| = 0, \]
which shows that the system is weakly mixing. That the system is strongly mixing under the assumption $\lim_{n\to\infty} r(n) = 0$ now follows from the same arguments.


3.6.6 Remark. The use of Lemma 10.1.5 allows us to prove a little more: if the system is mixing, then it is $k$-mixing for every $k$. Indeed, still assuming $E X_n^2 \equiv 1$, let $C^1, \dots, C^k$ be cylinders of $\mathbb{R}^{\mathbb{N}}$ with respective bases $I_1, \dots, I_k$:
\[ C_i^j = \begin{cases} (a_i^j, b_i^j) & \text{if } i \in I_j, \\ \mathbb{R} & \text{if } i \in I_j^c, \end{cases} \qquad \tilde{C}^j = \prod_{i\in I_j} C_i^j, \quad j = 1, \dots, k, \]
where the reals $a_i^j, b_i^j$ are all distinct. Let $n_1, \dots, n_k$ be positive integers and assume that the numbers $\min(n_1, \dots, n_k)$ and $\min(|n_i - n_j|, i \ne j)$ are large. Set
\[ V = \prod_{j=1}^{k} \prod_{i \in n_j + I_j} C_i^j. \]
By Lemma 10.1.5 there exists a constant $C$ depending on $I_1, \dots, I_k$ only, such that
\[ \Bigl| \mu\Bigl( \bigcap_{j=1}^{k} T^{-n_j} C^j \Bigr) - \prod_{j=1}^{k} \mu(C^j) \Bigr| = \Bigl| P\bigl\{ \{(X_{n_j+i})_{i\in I_j}, j = 1, \dots, k\} \in V \bigr\} - \prod_{j=1}^{k} P\bigl\{ (X_i, i \in I_j) \in \tilde{C}^j \bigr\} \Bigr| \le C \sum_{1 \le j \ne h \le k}\ \sum_{i \in I_j,\ \ell \in I_h} |r(i - \ell + n_j - n_h)|. \]
Hence
\[ \lim_{\substack{\min(n_1,\dots,n_k)\to\infty \\ \min(|n_i - n_j|,\, i\ne j)\to\infty}} \mu(T^{-n_1}C^1 \cap \dots \cap T^{-n_k}C^k) = \mu(C^1) \cdots \mu(C^k). \tag{3.6.10} \]

By proceeding as before, one can next prove that (3.6.10) holds for all $C^1, \dots, C^k \in \mathcal{B}(\mathbb{R}^{\mathbb{N}})$.

(4) Bernoulli shifts. Let $(\Omega, \mathcal{B}, P)$ be a probability space and consider on $\Omega^{\mathbb{N}} = \{\omega = (\omega_z)_{z\in\mathbb{N}} : \omega_z \in \Omega, z \in \mathbb{N}\}$ the right shift $T\omega = (\omega_{z+1})_{z\in\mathbb{N}}$.

3.6.7 Theorem. The dynamical system $(\Omega^{\mathbb{N}}, \mathcal{B}^{\mathbb{N}}, P^{\mathbb{N}}, T)$ is strongly mixing, and in fact $k$-mixing for every $k$.

Proof. This is a simple consequence of independence. Let as before $C^1, \dots, C^k$ be cylinders of $\Omega^{\mathbb{N}}$ with respective bases $I_1, \dots, I_k$:
\[ C_i^j = \begin{cases} A_i^j & \text{if } i \in I_j, \\ \Omega & \text{if } i \in I_j^c, \end{cases} \qquad \tilde{C}^j = \prod_{i\in I_j} C_i^j, \quad j = 1, \dots, k. \]
Let $n_1, \dots, n_k$ be positive integers. Then, if we assume that $\min(n_1, \dots, n_k)$ and $\min(|n_i - n_j|, i \ne j)$ are large,
\[ P^{\mathbb{N}}\Bigl( \bigcap_{j=1}^{k} T^{-n_j} C^j \Bigr) = \prod_{j=1}^{k} P^{\mathbb{N}}(C^j). \]
Hence
\[ \lim_{\substack{\min(n_1,\dots,n_k)\to\infty \\ \min(|n_i - n_j|,\, i\ne j)\to\infty}} P^{\mathbb{N}}(T^{-n_1}C^1 \cap \dots \cap T^{-n_k}C^k) = P^{\mathbb{N}}(C^1) \cdots P^{\mathbb{N}}(C^k). \]

Chapter 4

Pointwise ergodic theorems

This chapter is essentially devoted to the study of pointwise ergodic theorems. In a first step, we study the Birkhoff pointwise ergodic theorem, and integrability properties of the associated maximal operators, as well as Gerstenhaber’s counterexample. A section is devoted to the speed of convergence: its absence in general and its existence when some spectral type conditions are fulfilled. In this chapter we continue the study of the related oscillation functions, made by means of the spectral regularization method introduced in Chapter 1. Other maximal inequalities and the transference principle are included in this chapter. The Wiener–Wintner ergodic theorem, and its uniform version due to Bourgain, as well as some weighted pointwise ergodic theorems conclude the chapter.

4.1

Birkhoff’s pointwise theorem

Together with von Neumann's theorem, this theorem is the foundation of ergodic theory. It has many applications in various domains, such as number theory and probability theory. The strong law of large numbers is, in this context, a consequence of Birkhoff's theorem. Let $(X, \mathcal{A}, \mu, \tau)$ be a measurable dynamical system and put, for $f \in L^0(\mu)$, $Tf = f \circ \tau$. Clearly $T$ is a positive isometry in any $L^p(\mu)$ space. We shall use the notation
\[ A_n^\tau f = \frac{1}{n} \sum_{k=0}^{n-1} f \circ \tau^k = \frac{1}{n} \sum_{k=0}^{n-1} T^k f = A_n^T f. \]
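To make the notation $A_n^\tau f$ concrete, here is a minimal numerical sketch. The choices below are illustrative assumptions and do not come from the text: $X = [0,1)$ with Lebesgue measure, $\tau$ the rotation by the golden mean (an irrational rotation, hence ergodic), and $f$ the indicator of $[0, 1/2)$, so that $\int f\,d\mu = 1/2$. The function names are hypothetical.

```python
import math

def birkhoff_average(f, tau, x, n):
    """Return A_n^tau f(x) = (1/n) * sum_{k=0}^{n-1} f(tau^k x)."""
    total = 0.0
    for _ in range(n):
        total += f(x)
        x = tau(x)
    return total / n

# Illustrative choices (assumptions, not from the text).
ALPHA = (math.sqrt(5.0) - 1.0) / 2.0  # golden-mean rotation angle

def rotation(x):
    """Rotation of the circle [0, 1) by ALPHA."""
    return (x + ALPHA) % 1.0

def indicator(x):
    """Indicator function of the interval [0, 1/2)."""
    return 1.0 if x < 0.5 else 0.0

if __name__ == "__main__":
    # The averages should drift toward the space mean 1/2 as n grows.
    for n in (10, 100, 1000, 10000):
        print(n, birkhoff_average(indicator, rotation, 0.1, n))
```

A single orbit is of course no proof; the sketch only shows how the time average along one trajectory is formed and compared with the space average.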

4.1.1 Theorem (Birkhoff [1931]). Let $(X, \mathcal{A}, \mu, \tau)$ be a measurable dynamical system. For any $f \in L^1(\mu)$, the limit
\[ \lim_{n\to\infty} A_n^\tau f(x) = \bar{f}(x) \]
exists $\mu$-almost everywhere and in $L^1(\mu)$, and we have $\bar{f} = E\{f \mid \mathcal{J}\}$, where $\mathcal{J} = \sigma\{A \in \mathcal{A} : \tau^{-1}A = A\}$. In case the dynamical system $(X, \mathcal{A}, \mu, \tau)$ is ergodic, then for any $f \in L^1(\mu)$,
\[ \lim_{n\to\infty} A_n^\tau f \overset{\text{a.s.}}{=} \int_X f\,d\mu. \tag{4.1.1} \]
Indeed if $(X, \mathcal{A}, \mu, \tau)$ is ergodic, then $E\{f \mid \mathcal{J}\}$ is $\tau$-invariant, therefore constant by Lemma 3.2.2, and we have $E\{f \mid \mathcal{J}\} = \int f\,d\mu$; hence (4.1.1). Conversely, assume


that (4.1.1) holds for any function $f$ of $L^1(\mu)$. Let $f \in L^1(\mu)$ be $\tau$-invariant. As $\frac{1}{n}\sum_{k=0}^{n-1} f \circ \tau^k = f$, it follows that $f = \int f\,d\mu$. Thus $f$ is constant (modulo $\mu$), and this implies by Lemma 3.2.2 that the dynamical system $(X, \mathcal{A}, \mu, \tau)$ is ergodic.

An immediate consequence is the well-known

Strong law of large numbers: Let $X, X_1, X_2, \dots$ be a sequence of independent, identically distributed, integrable random variables with basic probability space $(\Omega, \mathcal{B}, P)$, and denote $S_n = X_1 + \dots + X_n$. Then
\[ P\Bigl\{ \lim_{n\to\infty} \frac{S_n}{n} = E X \Bigr\} = 1. \]

Proof of Theorem 4.1.1. (1) The theorem is verified for a dense subset $L$ of $L^1(\mu)$. Indeed, let $L$ be the set of functions $h = f + g - g \circ \tau$ with $f = f \circ \tau$ and $g \in L^\infty(\mu)$. As $L^\infty(\mu)$ is dense in $L^2(\mu)$, which is dense in $L^1(\mu)$, we deduce from the Riesz decomposition of $L^2(\mu)$ that $L$ is dense in $L^1(\mu)$. Now, integrating the inequality
\[ |A_n^\tau(f + g - g \circ \tau) - f| = \frac{|g - g \circ \tau^n|}{n} \le \frac{2\|g\|_\infty}{n} \]
implies the convergence in $L^1(\mu)$ of the averages $A_n^\tau h$. Further,
\[ E\{h \mid \mathcal{J}\} = E\{f \mid \mathcal{J}\} + E\{g \mid \mathcal{J}\} - E\{g \circ \tau \mid \mathcal{J}\} = f + E\{g \mid \mathcal{J}\} - E\{g \mid \mathcal{J}\} = f = \bar{h}. \]

(2) The operators $A_n^\tau$, being barycenters of contractions of $L^1(\mu)$, are thus $L^1(\mu)$ contractions, as is the conditional expectation operator $E\{\,\cdot \mid \mathcal{J}\}$. It follows from (1) and point (4) of the proof of the von Neumann Theorem 1.3.1 that $A_n^\tau(f)$ converges in $L^1(\mu)$, for any $f \in L^1(\mu)$. The limit, coinciding with $E\{f \mid \mathcal{J}\}$ on $L$, is therefore equal to $E\{f \mid \mathcal{J}\}$ for any $f \in L^1(\mu)$.

(3) In this step, we prove a first type of maximal lemma, due to Yoshida–Kakutani [1939] and Hopf [1960] (other proofs with simplified arguments were given in Riesz [1932], [1932], [1938], [1942], see also the proof of Katznelson and Weiss [1982], and Petersen [1979] for other references, as well as Krengel [1985] for instructive comments), which is necessary to achieve the proof of Birkhoff's theorem. We follow here a simple and elegant proof given by Garsia [1965], [1970]. We introduce the notation
\[ M_N^T(f) = \sup_{1 \le n \le N} A_n^T f, \qquad M_\infty^T(f) = \sup_{N \ge 1} M_N^T f. \tag{4.1.2} \]

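4.1.2 Lemma (Maximal inequality). Let $T$ be a positive contraction of $L^1(\mu)$. For any $f \in L^1(\mu)$ and any real $\lambda \ge 0$,
\[ \lambda\,\mu\{M_\infty^T(f) > \lambda\} \le \int_{\{M_\infty^T(f) > \lambda\}} f\,d\mu. \]
It follows that for any $f \in L^1(\mu)$,
\[ \sup_{\lambda \ge 0} \lambda\,\mu\Bigl\{ \sup_{n \ge 1} |A_n^T f| > \lambda \Bigr\} \le \int |f|\,d\mu. \]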


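Proof. Put
\[ S_N = \sup_{1 \le n \le N} \sum_{j=0}^{n-1} T^j f, \qquad S_N^+ = \max\{0, S_N\}, \qquad E_N = \{S_N > 0\}. \]
We have $S_N^+ \ge \sum_{j=0}^{k-1} T^j f$, $k = 1, \dots, N$. Thus
\[ f + T S_N^+ \ge f + \sum_{j=0}^{k-1} T^{j+1} f = f + \sum_{j=1}^{k} T^j f = \sum_{j=0}^{k} T^j f, \qquad k = 1, \dots, N. \tag{4.1.3} \]
Moreover $f + T S_N^+ \ge f$. Hence $f + T S_N^+ \ge S_{N+1} \ge S_N$. Integrating this over $E_N$ gives
\[ \int_{E_N} f\,d\mu + \int_{E_N} T S_N^+\,d\mu \ge \int_{E_N} S_N\,d\mu = \int_X S_N^+\,d\mu. \]
But $\int_{E_N} T S_N^+\,d\mu \le \int_X T S_N^+\,d\mu \le \int_X S_N^+\,d\mu$, since $T$ is a positive contraction of $L^1(\mu)$. We deduce that $\int_{E_N} f\,d\mu \ge 0$. Let $E = \bigcup_N E_N = \{\sup_{n \ge 1} A_n^T(f) > 0\}$. The sets $E_N$ being increasing, by passing to the limit we get
\[ \int_E f\,d\mu \ge 0. \tag{4.1.4} \]
Replacing $f$ by $f - \lambda$, we find
\[ \int_{\{M_\infty^T(f) > \lambda\}} (f - \lambda)\,d\mu \ge 0. \]
Hence, $\lambda\,\mu\{M_\infty^T(f) > \lambda\} \le \int_{\{M_\infty^T(f) > \lambda\}} f\,d\mu$, as claimed.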
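The argument above can be checked on a toy system. The sketch below rests on assumptions that are not in the text: $X = \{0, \dots, m-1\}$ with the uniform probability measure, $\tau$ a permutation (so $Tf = f \circ \tau$ is a positive contraction with $T1 = 1$), and an integer-valued $f$ so that exact rational arithmetic can be used. It evaluates the finite-$N$ form of the inequality, $\lambda\,\mu\{M_N^T f > \lambda\} \le \int_{\{M_N^T f > \lambda\}} f\,d\mu$, which is what the proof yields before letting $N \to \infty$. All function names are hypothetical.

```python
from fractions import Fraction
import random

def maximal_function(f, tau, N):
    """Return the list x -> M_N f(x) = max_{1<=n<=N} (1/n) * sum_{k<n} f(tau^k x)."""
    out = []
    for x in range(len(f)):
        partial, best, y = 0, None, x
        for n in range(1, N + 1):
            partial += f[y]          # running sum f(x) + ... + f(tau^{n-1} x)
            y = tau[y]
            avg = Fraction(partial, n)
            if best is None or avg > best:
                best = avg
        out.append(best)
    return out

def maximal_inequality_gap(f, tau, N, lam):
    """Return  int_{M_N f > lam} f dmu  -  lam * mu{M_N f > lam}  (uniform mu)."""
    m = len(f)
    M = maximal_function(f, tau, N)
    E = [x for x in range(m) if M[x] > lam]
    return Fraction(sum(f[x] for x in E), m) - Fraction(lam) * Fraction(len(E), m)

if __name__ == "__main__":
    rng = random.Random(0)           # fixed seed, purely for reproducibility
    m = 30
    tau = list(range(m))
    rng.shuffle(tau)                 # a random permutation of {0, ..., m-1}
    f = [rng.randint(-5, 5) for _ in range(m)]
    for lam in (0, 1, 3):
        print(lam, maximal_inequality_gap(f, tau, 10, lam))
```

The lemma predicts that the printed gaps are nonnegative for every $\lambda \ge 0$.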

(4) Now we show that the set of functions $f \in L^1(\mu)$ such that $(A_N^\tau(f))$ converges $\mu$-almost everywhere is closed in $L^1(\mu)$. Let $f \in L^1(\mu)$, and let $g \in L^1(\mu)$ be such that $(A_N^\tau(g))$ converges $\mu$-almost everywhere. Then
\[ 0 \le \limsup_{n\to\infty} A_n^\tau(f) - \liminf_{n\to\infty} A_n^\tau(f) = \limsup_{n\to\infty} A_n^\tau(f - g) - \liminf_{n\to\infty} A_n^\tau(f - g) \le 2 \sup_{n \ge 1} |A_n^\tau(f - g)|. \]
According to the maximal lemma, $\lambda\,\mu\{\sup_{n \ge 1} |A_n^\tau(f - g)| > \lambda\} \le \|f - g\|_1$ for any $\lambda \ge 0$. Thus
\[ \mu\Bigl\{ \limsup_{n\to\infty} A_n^\tau(f) - \liminf_{n\to\infty} A_n^\tau(f) > 2\lambda \Bigr\} \le \frac{1}{\lambda} \|f - g\|_1. \]
If $f$ is obtained as a limit in $L^1(\mu)$ of functions $g$ such that $(A_N^\tau(g))$ converges $\mu$-almost everywhere, we therefore deduce that $\limsup_{n\to\infty} A_n^\tau(f) - \liminf_{n\to\infty} A_n^\tau(f) = 0$, $\mu$-almost everywhere. And this proves our claim.

(5) The proof is finally achieved by observing, according to (1) and (4), that the sequence $A_N^\tau(f)$ converges $\mu$-almost everywhere for any $f \in L^1(\mu)$. By (2), this convergence also holds in $L^1(\mu)$, the limit being identified with $E\{f \mid \mathcal{J}\}$.


The maximal inequality will in turn imply (Section 4.2) for any $1 < p < \infty$
\[ \Bigl\| \sup_{n \ge 1} |A_n^T f| \Bigr\|_p \le \frac{p}{p-1} \|f\|_p, \]
which is similar to the well-known martingale inequality.

Martingale inequality. Let $1 < p < \infty$, $q = p/(p-1)$. Let $\{S_j, \mathcal{E}_j, j \le n\}$ be a martingale with $E|S_j|^p < \infty$, $j \le n$. Then
\[ E \max_{1 \le j \le n} |S_j|^p \le q^p\, E|S_n|^p. \]
The analogy goes beyond this remark. Birkhoff's theorem can in turn also be deduced from the martingale convergence theorem. See Stroock [1993: Chapter VI].

Martingale convergence theorem. Let $1 < p < \infty$, let $\{S_n, \mathcal{E}_n, n \ge 1\}$ be a martingale and assume $\sup_{n \ge 1} E|S_n|^p < \infty$. Then $\{S_n, n \ge 1\}$ converges in $L^p$ and almost surely.

Flows. A flow $\{T_t, t \in \mathbb{R}\}$ is a group of measurable transformations $T_t : (X, \mathcal{A}) \to (X, \mathcal{A})$ with $T_0 = \text{Identity}$, $T_{t+s} = T_t T_s$ ($s, t \in \mathbb{R}$). If the $T_t$ are measure-preserving, the flow is called measure-preserving. The flow is called measurable if the map $(x, t) \mapsto T_t x$ is a measurable map from $(X \times \mathbb{R}, \tilde{\mathcal{A}})$ to $(X, \mathcal{A})$, where $\tilde{\mathcal{A}}$ is the completion of the product $\sigma$-algebra $\mathcal{A} \otimes \mathcal{B}(\mathbb{R})$ with respect to the product of the measure $\mu$ with the Lebesgue measure on $\mathbb{R}$. There are similar definitions for semiflows $\{T_t, t \ge 0\}$. Note that if $f \in L^1(\mu)$, then by Fubini's theorem $t \mapsto f(T_t x)$ is locally integrable for $\mu$-almost all $x$. Further
\[ \int_0^n f(T_t x)\,dt = \sum_{j=0}^{n-1} F(T_1^j x) \qquad \text{with} \qquad F(x) = \int_0^1 f(T_t x)\,dt. \]
Let also $F_0(x) = \int_0^1 |f(T_t x)|\,dt$. Then $F_0$ is integrable. The pointwise ergodic theorem thus implies that $n^{-1} \int_0^n f(T_t x)\,dt$ converges when $n$ tends to infinity, and also that $n^{-1} F_0(T_1^n x) \to 0$ almost surely. As for $n \le \tau \le n + 1$,
\[ \Bigl| \int_0^\tau f(T_t x)\,dt - \int_0^n f(T_t x)\,dt \Bigr| \le F_0(T_1^n x), \]
the convergence also holds when $\tau \to \infty$, $\tau$ real. For flows there is another kind of result, the local ergodic theorem due to Wiener: If $\{T_t, t \ge 0\}$ is a measure-preserving measurable semiflow and $f \in L^1(\mu)$, then
\[ \lim_{\varepsilon \to 0} \varepsilon^{-1} \int_0^\varepsilon f(T_t x)\,dt \overset{\text{a.e.}}{=} f(x). \tag{4.1.5} \]


Maximal inequality and maximal equality for flows. Let $f \in L^1(\mu)$ and define
\[ F_t(x) = \int_0^t f(T_s x)\,ds, \qquad f^*(x) = \sup_{t > 0} \frac{F_t(x)}{t} = \sup_{t > 0} \frac{1}{t} \int_0^t f(T_s x)\,ds. \]
Then, for every $\alpha \ge 0$,
\[ \alpha\,\mu\{f^* > \alpha\} \le \int_{\{f^* > \alpha\}} f\,d\mu. \]
The maximal inequality above is due to Wiener [1939], and Yoshida, Kakutani [1939]. Marcus and Petersen [1979], and also Engel and Kakutani [1981], showed that this inequality is often an equality. More precisely, when the flow is ergodic, in the sense that every measurable subset $A \in \mathcal{A}$ which is invariant under the flow ($T_s A = A$ for $s \in \mathbb{R}$) has measure 0 or 1, then for $\alpha \ge \int f\,d\mu$,
\[ \alpha\,\mu\{f^* > \alpha\} = \int_{\{f^* > \alpha\}} f\,d\mu. \]
The integrability condition $f \in L^1(\mu)$ is not necessary to ensure the convergence almost everywhere of ergodic means $A_n^\tau(f)$.

Gerstenhaber's counterexample. Let $X_0 = [0, 1[$, $\mathcal{B}(X_0)$ be the $\sigma$-algebra of Borel sets of $X_0$, $\lambda$ the normalized Lebesgue measure on $X_0$, and $\Phi_0$ an automorphism of $(X_0, \mathcal{B}(X_0), \lambda)$, for instance $\Phi_0(x) = x + \alpha \bmod 1$, $\alpha$ irrational. Let also $1 = a_0 \ge a_1 \ge \dots \ge 0$ be a decreasing sequence of reals, and put for any integer $n \ge 0$, $X_n = [0, a_n[ \times \{n\}$. Let $X = \bigcup_{n=0}^{\infty} X_n$. We endow $X$ with the $\sigma$-algebra $\mathcal{B}$ defined by $\mathcal{B} = \{B \subset X : \forall n \ge 1,\ p_1(B \cap X_n) \in \mathcal{B}(X_0)\}$, where $p_1 : \mathbb{R}^2 \to \mathbb{R}$ is the projection on the first coordinate. Consider the measure $\mu$ on $(X, \mathcal{B})$ defined by $\mu(B) = \sum_{n=0}^{\infty} \lambda(p_1(B \cap X_n))$, $\forall B \in \mathcal{B}$. In addition, define the application
\[ \Phi(x, y) = \begin{cases} (x, y + 1) & \text{if } x < a_{y+1}, \\ (\Phi_0(x), 0) & \text{otherwise.} \end{cases} \]
It is easily seen that $\Phi$ is an invertible measure-preserving ergodic transformation of the measure space $(X, \mathcal{B}, \mu)$. Choose the sequence $\{a_n, n \ge 1\}$ as follows:
• $a_1 = a_2$, $a_3 = a_4$, $a_5 = a_6$, $\dots$,
• $\sum_{n=1}^{\infty} a_{2n} < \infty$,
• $\sum_{n=1}^{\infty} \sqrt{n}\, a_{2n} = \infty$.
We can for instance choose $a_{2n} = n^{-3/2}$, $n \ge 1$. Then $\mu(X) = \sum_{n=0}^{\infty} a_n < \infty$. Let $f : X \to \mathbb{R}$ be defined as
\[ f(z) = \begin{cases} 0 & \text{if } z \in X_0, \\ -\sqrt{n} & \text{if } z \in X_{2n-1},\ n \ge 1, \\ \sqrt{n} & \text{if } z \in X_{2n},\ n \ge 1. \end{cases} \]


It is easily verified that $\bigl| \sum_{j=0}^{n-1} f(\Phi^j x) \bigr| \le \sqrt{n/2}$, hence
• $\frac{1}{n} \sum_{j=0}^{n-1} f(\Phi^j x) \to 0$ for $\lambda$-almost all $x$, and
• $\int f^+\,d\mu = \int f^-\,d\mu = \infty$.

Problem 3. Find a condition strictly weaker than integrability ensuring the validity of the conclusion in Birkhoff's theorem.

Non-integrable functions and growth of stationary sequences. If $\tau$ is an ergodic endomorphism of $(X, \mathcal{A}, \mu)$, and $f \ge 0$, $f \notin L^1(\mu)$, then Birkhoff's theorem implies that
\[ \mu\Bigl\{ \lim_{N\to\infty} A_N^\tau(f) = \infty \Bigr\} = 1, \]
since for any $k \ge 0$, $A_N^\tau(f) \ge A_N^\tau(f \wedge k)$, and therefore $\liminf_{N\to\infty} A_N^\tau(f) \ge \lim_{N\to\infty} A_N^\tau(f \wedge k) = \int (f \wedge k)\,d\mu$, which increases to $\infty$ with $k$. Thus the integrability condition in Birkhoff's theorem is also necessary for nonnegative functions. If $\xi = \{\xi_k, k \ge 0\}$ is a strictly stationary sequence, Kesten [1975] showed that the related partial sums cannot grow slower than linearly. More precisely
\[ \liminf_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} \xi_k > 0 \quad \text{a.e. on} \quad \Bigl\{ \sum_{k=0}^{n-1} \xi_k \to \infty \Bigr\}. \tag{4.1.6} \]

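Proof via the shift model. Bourgain indicated in [Bourgain: 1988d] an alternate proof derived from the shift model $(\mathbb{Z}, S)$, where $Sz = \{z_{\ell+1}, \ell \in \mathbb{Z}\}$, $z = \{z_\ell, \ell \in \mathbb{Z}\}$. Let $(X, \mathcal{A}, \mu, \tau)$ be a measurable dynamical system. Fix some positive integers $J, N$ with $J \gg N$. Let $f \in L^0(\mu)$, $x \in X$ and consider the function $\varphi$ on $\mathbb{Z}$ defined as follows:
\[ \varphi(j) = \begin{cases} f(\tau^j x) & \text{if } 0 \le j \le J, \\ 0 & \text{otherwise.} \end{cases} \]
Then $\frac{1}{n} \sum_{k=0}^{n-1} T^k \varphi(j) = A_n^\tau f(\tau^j x)$ for $1 \le n \le N$, provided that $0 \le j < J - N$, and thus
\[ \sup_{n=1}^{N} \Bigl| \frac{1}{n} \sum_{k=0}^{n-1} \varphi(j + k) \Bigr| = \sup_{n=1}^{N} |A_n^\tau f(\tau^j x)|. \]
The maximal inequality of the shift model, which follows from elementary covering properties of integer intervals,
\[ \Bigl\| \sup_{n \ge 1} \Bigl| \frac{1}{n} \sum_{k=0}^{n-1} T^k \varphi \Bigr| \Bigr\|_{\ell^p(\mathbb{Z})} \le C(p)\, \|\varphi\|_{\ell^p(\mathbb{Z})}, \]
implies
\[ \sum_{0 \le j < J - N} \sup_{n=1}^{N} |A_n^\tau f(\tau^j x)|^p \le C(p)^p \sum_{0 \le j \le J} |f(\tau^j x)|^p; \]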


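and by integrating we have
\[ \sum_{0 \le j < J - N} \Bigl\| \sup_{n=1}^{N} |A_n^\tau f \circ \tau^j| \Bigr\|_p^p \le C(p)^p \sum_{0 \le j \le J} \|f \circ \tau^j\|_p^p. \]
Since $\tau$ is $\mu$-preserving, we obtain
\[ \Bigl\| \sup_{n=1}^{N} |A_n^\tau f| \Bigr\|_p^p \le C(p)^p\, \frac{J + 1}{J - N}\, \|f\|_p^p. \]
Hence, letting $J$ and then $N$ tend to infinity,
\[ \Bigl\| \sup_{n \ge 1} |A_n^\tau f| \Bigr\|_p \le C(p)\, \|f\|_p. \]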

A similar argument yields the corresponding weak-type inequality.

Extensions. Any linear contraction $T$ on $L^1$ of a $\sigma$-finite measure space, with $\|Tf\|_\infty \le \|f\|_\infty$ for $f \in L^1 \cap L^\infty$, is called a Dunford–Schwartz contraction and induces a contraction on all $L^p$, $1 < p \le \infty$ (see Dunford and Schwartz [1958]). Birkhoff's ergodic theorem has been extended to Dunford–Schwartz contractions in Dunford and Schwartz [1956]: the limit $E(T)f := \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} T^k f$ exists almost everywhere for $f \in L^p$, $1 \le p < \infty$, and also in $L^p$-norm for $p > 1$ (and in $L^1$-norm in probability spaces). The same conclusion cannot be reached for unitary operators. Indeed, according to a remarkable result of Paszkiewicz [2005a: Theorem 1], in $L^2(\mathbb{T})$, there exists a unitary operator $V$ such that for each increasing sequence $\mathcal{N} = \{n_k, k \ge 1\}$ of positive integers, the sequence of ergodic averages
\[ \frac{1}{K} \sum_{k=1}^{K} V^{n_k} f(x), \qquad K = 1, 2, \dots \]
diverges almost everywhere for some $f \in L^2(\mathbb{T})$. A well-known generalization of Birkhoff's theorem due to Hopf [1937: 49] asserts that the sequence
\[ \frac{\sum_{k=0}^{n-1} T^k f}{\sum_{k=0}^{n-1} T^k g}, \qquad n = 0, 1, 2, \dots \]
converges almost everywhere, provided $f, g$ are measurable, $f \in L^1(\mu)$ and $g > 0$. This is a particular case of a more general result due to Hurewicz [1944], which can be described as follows. Let $(X, \mathcal{A}, \mu)$ be a measure space with a nonnegative $\sigma$-finite measure $\mu$. Further let $F$ be another $\sigma$-finite measure on $(X, \mathcal{A})$. Consider a 1-to-1 measurable transformation $T$ of $X$. Assume that $F$ is absolutely continuous with respect to $\mu$ ($\mu(A) = 0$ implies $F(A) = 0$ and $\mu(A) < \infty$ implies $|F(A)| < \infty$). Then ([Saks: 1937], p. 36) $F$ can be represented as an indefinite integral: $F(A) = \int_A f_0(x)\,\mu(dx)$.


Set
\[ F_n(A) = \sum_{k=0}^{n-1} F(T^k A), \qquad \mu_n(A) = \sum_{k=0}^{n-1} \mu(T^k A), \qquad n = 0, 1, 2, \dots. \]
Then the measure $F_n$ is absolutely continuous with respect to $\mu_n$. Thus there exists $f_n$ such that for all $A \in \mathcal{A}$,
\[ F_n(A) = \int_A f_n\,d\mu_n. \]
Assume now that no measurable subset $A$ of $X$ with positive measure is a wandering set with respect to $T$. A measurable subset $A$ is a wandering set with respect to $T$ if the images $T^n A$, $n \in \mathbb{Z}$, are pairwise disjoint. Hurewicz [1944: Theorem 1] proved that the sequence $\{f_n, n \ge 1\}$ converges $\mu$-almost everywhere on $X$ to a limit $\bar{f}$, which satisfies
(a) $\bar{f}(Tx) = \bar{f}(x)$ almost everywhere,
(b) $\bar{f} \in L^1(\mu)$,
(c) $F(A) = \int_A \bar{f}(x)\,\mu(dx)$ for all $A \in \mathcal{A}$ such that $TA = A$, $\mu(A) < \infty$.
In the special case of a measure-preserving transformation, $\mu(TA) = \mu(A)$ for $A \in \mathcal{A}$, one easily has
\[ \mu_n(A) = n\,\mu(A), \qquad F_n(A) = \int_A \sum_{k=0}^{n-1} f_0(T^k x)\,\mu(dx). \]
Comparing with the relation linking $F_n$ and $f_n$, we deduce that, $\mu$-almost everywhere on $X$,
\[ f_n(x) = \frac{1}{n} \sum_{k=0}^{n-1} f_0(T^k x). \]
And by Hurewicz's theorem, these averages converge $\mu$-almost everywhere. This is precisely Birkhoff's theorem. Consider now in addition to $f_0$ another measurable function $g_0$ such that $g_0(x) > 0$ $\mu$-almost surely. We introduce the measure
\[ \nu(A) = \int_A g_0(x)\,\mu(dx), \]
and define $\nu_n$ similarly. From the $T$-invariance of $\mu$, we get
\[ \nu_n(A) = \int_A \sum_{k=0}^{n-1} g_0(T^k x)\,\mu(dx), \]
and so
\[ F_n(A) = \int_A \frac{\sum_{k=0}^{n-1} f_0(T^k x)}{\sum_{k=0}^{n-1} g_0(T^k x)}\,\nu_n(dx). \]


By Hurewicz's theorem, we conclude that the sequence
\[ \frac{\sum_{k=0}^{n-1} f_0(T^k x)}{\sum_{k=0}^{n-1} g_0(T^k x)}, \qquad n = 0, 1, 2, \dots \]
converges $\mu$-almost everywhere, which is Hopf's theorem. If $T$ is a positive contraction in $L^1(\mu)$, $(X, \mathcal{A}, \mu)$ a probability space, $f \in L^1(\mu)$ and $g \in L^1_+(\mu)$, then
\[ \Bigl\{ \frac{\sum_{k=0}^{n-1} T^k f}{\sum_{k=0}^{n-1} T^k g},\ n \ge 1 \Bigr\} \quad \text{converges a.e. on} \quad \bigcup_{n \ge 1} \Bigl\{ \sum_{k=0}^{n-1} T^k g > 0 \Bigr\} \quad \text{to a finite limit.} \]
This is Chacon–Ornstein's theorem. We refer to Krengel [1985: 119] for a proof and identification of the limit.

A theorem of Campbell and Petersen. Let $(X, \mathcal{A}, \mu)$ be a probability space and $\xi = \{\xi_k, k \in \mathbb{N}\}$ be a weakly stationary sequence in $L^2(\mu)$. Gaposhkin (Theorem 2.6.1) gave a necessary and sufficient condition for the convergence almost everywhere of the averages $\sigma_n = \frac{1}{n} \sum_{k=0}^{n-1} \xi_k$, involving the spectral measure of the associated unitary operator (Chapter 2). When $\xi$ is moreover strictly stationary, we know by the Birkhoff pointwise ergodic theorem that these averages converge almost everywhere. It is natural to try to understand Gaposhkin's characterization in that case. Campbell and Petersen [1989] clarified this point. More precisely, let $T$ be a unitary operator on $L^2(\mu)$. Let $E_T$ denote the spectral measure for $T$, supported on the closed unit disk in $\mathbb{C}$, and for $n = 1, 2, \dots$ let $V_n = \{z \in \mathbb{C} : 0 < |1 - z| < 2^{-n}\}$. By Theorem 2.6.1,
\[ \text{(a)}\ \lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} T^k f(x) \text{ exists a.e.} \iff \text{(b)}\ \lim_{n\to\infty} [E_T(V_n) f](x) = 0 \text{ a.e.} \]

When $T$ is induced by a measure-preserving transformation, a strengthened version of (b) actually holds.

4.1.3 Theorem. Let $Tf = f \circ \tau$ where $\tau$ is an automorphism of $(X, \mathcal{A}, \mu)$, with associated spectral representation
\[ T = \int_{-\pi}^{\pi} e^{i\lambda} E(d\lambda). \]
If $\{\varepsilon_n, n \ge 1\}$ is any nonnegative sequence tending to 0 as $n$ tends to infinity, then
\[ \lim_{n\to\infty} [E(-\varepsilon_n, 0) f](x) = 0 \quad \text{a.e. for all } f \in L^2(\mu). \tag{4.1.7} \]


The proof uses the ergodic Hilbert transform, which is for $f \in L^2(\mu)$ the almost sure limit
\[ Hf(x) = \lim_{n\to\infty} \frac{1}{\pi} \sum_{1 \le |k| \le n} \frac{T^k f(x)}{k}. \]
According to [Campbell: 1986], $H$ may be represented via the spectral integral
\[ H = i \int_{-\pi}^{\pi} \eta(\lambda)\,E(d\lambda), \]
where $\eta(\lambda)$ is the odd function on $[-\pi, \pi]$ whose value for $\lambda \in (0, \pi]$ is $(\pi - \lambda)/\pi$ and $\eta(0) = 0$. Consider also for fixed $\varepsilon \in [-\pi, \pi]$ the rotated Hilbert transform of $f$ induced by $T$:
\[ H_\varepsilon f(x) := \lim_{n\to\infty} \frac{1}{\pi} \sum_{1 \le |k| \le n} \frac{e^{ik\varepsilon}\, T^k f(x)}{k}. \]
Campbell and Petersen proved this theorem by first showing that condition (4.1.7) is equivalent to a form of continuity at $\varepsilon = 0$ of the rotated Hilbert transform, that is
\[ \lim_{n\to\infty} H_{\varepsilon_n} f(x) \overset{\text{a.e.}}{=} Hf(x) + i[E\{0\} f](x). \tag{4.1.8} \]
Next they showed that (4.1.8) in turn holds: if
\[ H^* f(x) = \sup_{-\pi \le \varepsilon \le \pi}\ \sup_{n \ge 1} \Bigl| \frac{1}{\pi} \sum_{1 \le |k| \le n} \frac{e^{ik\varepsilon}\, T^k f(x)}{k} \Bigr|, \]
then there exists a constant $C > 0$ such that for all $f \in L^2(\mu)$,
\[ \sup_{\lambda \ge 0} \lambda^2\, \mu\{x : H^* f(x) > \lambda\} \le C \|f\|_2^2. \tag{4.1.9} \]
With the help of the Banach principle, it is then easy to conclude. The proof of (4.1.9) follows from a nice maximal inequality established by the authors, which is worth quoting. For $a = \{a_k, k \in \mathbb{Z}\} \in \ell^2(\mathbb{Z})$, set
\[ a^*(j) = \sup_{\varepsilon > 0}\ \sup_{n \ge 1} \Bigl| \sum_{1 \le |k| \le n} \frac{e^{i(k+j)\varepsilon}\, a_{k+j}}{k} \Bigr|. \]
There exists a constant $C > 0$ such that for all $a \in \ell^2(\mathbb{Z})$,
\[ \sup_{\lambda \ge 0} \lambda^2\, \#\{j : a^*(j) > \lambda\} \le C \sum_{k \in \mathbb{Z}} |a_k|^2. \tag{4.1.10} \]
The authors conjectured that even the strong $(2, 2)$ inequality holds: $\|a^*\|_{\ell^2(\mathbb{Z})} \le C \|a\|_{\ell^2(\mathbb{Z})}$.


Moving averages. Naturally, moving averages present a more complex almost sure behavior than the usual “fixed” averages. The convergence almost everywhere of moving averages has been characterized by Bellow, Jones and Rosenblatt [1990], by means of a cone condition, which is related to works of Nagel and Stein [1984] and of Sueiro [1987]. Let $(X, \mathcal{A}, \mu, \tau)$ be a measurable dynamical system, and assume $\tau$ is ergodic. Let $\Omega = \{(n_k, \ell_k), k \ge 1\}$ be a sequence of pairs of integers and define
\[ A_k f(x) = \frac{1}{\ell_k} \sum_{j=0}^{\ell_k - 1} f(T^{n_k + j} x), \qquad k = 1, 2, \dots. \]
Introduce for $\alpha > 0$, $\Omega_\alpha = \{(z, s) \in \mathbb{N}^2 : |z - y| \le \alpha(s - r) \text{ for a pair } (y, r) \in \Omega\}$. Let $\Omega_\alpha(s) = \{k : (k, s) \in \Omega_\alpha\}$ be the cross-section of $\Omega_\alpha$ at height $s > 0$. Introduce also the maximal operator associated to $\Omega$,
\[ M_\Omega f(x) = \sup_{(k, n) \in \Omega} \frac{1}{n} \sum_{j=0}^{n-1} |f(T^{k+j} x)|. \]

According to Theorem 1 in [Bellow–Jones–Rosenblatt: 1990], we have the following characterization.
a) Assume there exist constants $A < \infty$ and $\alpha > 0$ such that $|\Omega_\alpha(s)| \le As$ for any positive integer $s$. Then $M_\Omega$ is of weak type $(1, 1)$ and of strong type $(p, p)$ for any $1 < p \le \infty$.
b) If $M_\Omega$ is of weak type $(p, p)$ for some $p > 0$, then for any $\alpha > 0$, there exists $A_\alpha < \infty$ such that for any positive integer $s$, $|\Omega_\alpha(s)| \le A_\alpha s$.
Here are two typical examples:
1. There exists $f \in L^\infty(\mu)$ such that
\[ \mu\Bigl\{ x : \frac{1}{2^k} \sum_{j = 2^{2^k}}^{2^{2^k} + 2^k - 1} f(T^j x),\ k = 1, 2, \dots \text{ converges} \Bigr\} = 0. \]
2. For every $f \in L^1(\mu)$,
\[ \mu\Bigl\{ x : \frac{1}{2^{2^k}} \sum_{j = 2^{2 \cdot 2^k}}^{2^{2 \cdot 2^k} + 2^{2^k} - 1} f(T^j x),\ k = 1, 2, \dots \text{ converges} \Bigr\} = 1. \]

4.2

Dominated ergodic theorems

Let (X, A, μ) be a measure space with μ(X) = 1. Let T be an L1 -L∞ positive contraction. We study in this section relations between integrability properties of f and those of the maximal operators defined in (4.1.2). The very proof of Lemma 4.1.2 also implies with minor changes the lemma below.


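4.2.1 Lemma. Let $T$ be a positive contraction of $L^1(\mu)$. For any $f \in L^1(\mu)$ and $\lambda \ge 0$,
(a) $\lambda\,\mu\{M_\infty^T(f) > \lambda\} \le \int_{\{M_\infty^T(f) > \lambda\}} f\,d\mu$,
(b) $\lambda\,\mu\{M_\infty^T(f) > 2\lambda\} \le 2 \int_{\{2f > \lambda\}} f\,d\mu$.

These inequalities suggest introducing the following definition.

4.2.2 Definition. Let $(\Omega, \mathcal{A}, \mu)$ be a probability space and $X, Y : (\Omega, \mathcal{A}) \to \mathbb{R}_+$ two measurable applications. We say that $X$ and $Y$ are in maximal type relation if for any nonnegative real $\alpha$,
\[ \alpha\,\mu(Y > \alpha) \le \int_{(Y > \alpha)} X\,d\mu < \infty. \]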

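We will first prove a useful lemma.

4.2.3 Lemma. Assume that $X$ and $Y$ are in maximal type relation. Let $\psi : \mathbb{R}_+ \to \mathbb{R}_+$ be increasing, right continuous and such that $\psi(0) = 0$. Then,
\[ \int \psi(Y)\,d\mu \le \int X(\omega) \int_0^{Y(\omega)} t^{-1}\,\psi(dt)\,d\mu(\omega). \]

Proof. By means of the transfer formula,
\[ \int \psi(Y)\,d\mu = \int_{\mathbb{R}_+} \mu(Y > \alpha)\,\psi(d\alpha) \le \int_{\mathbb{R}_+} \frac{1}{\alpha} \Bigl( \int_{Y > \alpha} X\,d\mu \Bigr) \psi(d\alpha) = \iint_{\{(\omega, \alpha) : Y(\omega) > \alpha\}} \frac{X(\omega)}{\alpha}\,d\mu(\omega)\,\psi(d\alpha) = \int X(\omega) \Bigl( \int_0^{Y(\omega)} \frac{1}{\alpha}\,\psi(d\alpha) \Bigr) d\mu(\omega). \]

We shall establish the following theorem.

4.2.4 Theorem (Dominated ergodic theorem). Let $T$ be an $L^1$-$L^\infty$ positive contraction. Let $f \ge 0$ be measurable, then
(a) $\|M_\infty^T f\|_p \le \dfrac{p}{p-1} \|f\|_p$ $(1 < p < \infty)$,
(b) $\|M_\infty^T f\|_1 \le \dfrac{e}{e-1} \Bigl( 1 + \int f \log^+ f\,d\mu \Bigr)$.

According to the usual terminology, inequality (a) means that $M_\infty^T$ is of strong type $(p, p)$; it is of weak type $(1, 1)$ by Lemma 4.1.2.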
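Inequality (a) can be observed numerically on a toy system. The sketch below is only an illustration under assumptions that are not in the text: $X = \{0, \dots, m-1\}$ with the uniform probability measure, $\tau$ a random permutation, $Tf = f \circ \tau$, and the truncated maximal function $M_N^T f$ in place of $M_\infty^T f$. The function names are hypothetical.

```python
import random

def maximal_function(f, tau, N):
    """Return the list x -> M_N f(x) = max_{1<=n<=N} (1/n) * sum_{k<n} f(tau^k x)."""
    out = []
    for x in range(len(f)):
        partial, best, y = 0.0, float("-inf"), x
        for n in range(1, N + 1):
            partial += f[y]          # running sum along the orbit of x
            y = tau[y]
            best = max(best, partial / n)
        out.append(best)
    return out

def lp_norm(values, p):
    """L^p norm with respect to the uniform probability measure."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)

def strong_type_sides(f, tau, N, p):
    """Return the pair (||M_N f||_p, p/(p-1) * ||f||_p)."""
    lhs = lp_norm(maximal_function(f, tau, N), p)
    rhs = p / (p - 1.0) * lp_norm(f, p)
    return lhs, rhs

if __name__ == "__main__":
    rng = random.Random(0)           # fixed seed, purely for reproducibility
    m = 200
    tau = list(range(m))
    rng.shuffle(tau)
    f = [rng.random() ** 4 for _ in range(m)]   # a nonnegative test function
    for p in (1.5, 2.0, 4.0):
        print(p, strong_type_sides(f, tau, N=50, p=p))
```

In each printed pair the first entry should not exceed the second; the constant $p/(p-1)$ degenerates as $p \downarrow 1$, which is why part (b) needs the stronger $L \log L$ hypothesis.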


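Proof. We apply Lemma 4.2.3 with $\psi(t) = t^p$, $p > 1$, $Y = M_n^T f$, $X = f$. According to Lemma 4.1.2, $X$ and $Y$ are in maximal type relation. It follows that
\[ \int \psi(Y)\,d\mu = \int Y^p\,d\mu \le \int f(\omega) \Bigl( \int_0^{Y(\omega)} p\,t^{p-2}\,dt \Bigr) d\mu(\omega) = \frac{p}{p-1} \int f(\omega)\,Y^{p-1}(\omega)\,d\mu(\omega) = \frac{p}{p-1} \int f\,Y^{p-1}\,d\mu. \]
We apply Hölder's inequality, $\int f \cdot g\,d\mu \le \bigl( \int f^a\,d\mu \bigr)^{1/a} \bigl( \int g^b\,d\mu \bigr)^{1/b}$, $f, g \ge 0$, $1/a + 1/b = 1$, with the choices $a = p$, $b = p/(p-1)$, $g = Y^{p-1}$. This leads to
\[ \int Y^p\,d\mu \le \frac{p}{p-1} \Bigl( \int f^p\,d\mu \Bigr)^{1/p} \Bigl( \int Y^p\,d\mu \Bigr)^{(p-1)/p}, \]
or else $\|Y\|_p \le \frac{p}{p-1} \|f\|_p$. And inequality (a) follows from Fatou's lemma, since $M_n^T \uparrow M_\infty^T$.

Now observe that for any $a \ge 0$, $b > 0$, $a \log b \le a \log^+ a + b/e$. This is easily proved by first observing that $\log x \le x - 1$ $(x > 0)$, which allows us to get $\log b \le b/e$, then by distinguishing the cases $a \le 1$ and $a > 1$. Let $\psi(t) = (t - 1)^+$ $(t > 0)$. Put $X = f$, $Y = M_n f$. Then $\int \psi(Y)\,d\mu \ge \int (Y - 1)\,d\mu$, and in view of Lemma 4.2.3,
\[ \int (Y - 1)^+\,d\mu \le \int f \cdot \Bigl( \int_0^{Y} t^{-1}\,\psi(dt) \Bigr) d\mu \le \int_{Y \ge 1} f \cdot \Bigl( \int_1^{Y} t^{-1}\,dt \Bigr) d\mu = \int_{Y \ge 1} f \log Y\,d\mu \le \int f \log^+ f\,d\mu + e^{-1} \int Y\,d\mu. \]
Thus
\[ \int Y\,d\mu \le \int (Y - 1)^+\,d\mu + 1 \le \int f \log^+ f\,d\mu + e^{-1} \int Y\,d\mu + 1, \]
or else $(1 - 1/e) \int Y\,d\mu \le 1 + \int f \log^+ f\,d\mu$. One concludes as in the previous step, hence part (b) of Theorem 4.2.4 is proved.

When $f = 1_A$, inequality (b) of Theorem 4.2.4 does not provide any hint on the possible continuity of the maximal operator $M_\infty^T(f)$ when $\mu(A)$ tends to 0. We shall clarify this point by showing the following lemma.

4.2.5 Lemma. Let $\varepsilon > 0$, then for any $A \in \mathcal{A}$,
\[ \int (M_\infty^T(1_A) - \varepsilon)^+\,d\mu \le \Bigl( \log \frac{1}{\varepsilon} \Bigr) \mu(A). \tag{4.2.2} \]
Let also $A_1, \dots, A_N$ be pairwise disjoint measurable sets. Then
\[ \sum_{i=1}^{N} \int M_\infty^T(1_{A_i})\,d\mu \le \Bigl( \sum_{i=1}^{N} \mu(A_i) \Bigr) \Bigl( 1 + \log \frac{N}{\sum_{i=1}^{N} \mu(A_i)} \Bigr). \tag{4.2.3} \]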


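Proof. Put $f = 1_A$ and $\psi(t) := (t - \varepsilon)^+$, $\varepsilon > 0$. Lemma 4.2.3 applied to $Y = M_n^T f$, $X = |f|$, provides the estimate
\[ \int (Y - \varepsilon)^+\,d\mu \le \int X(\omega) \log \frac{Y(\omega) \vee \varepsilon}{\varepsilon}\,d\mu(\omega). \]
Thus
\[ \int (M_n^T(1_A) - \varepsilon)^+\,d\mu \le \int 1_A \log \frac{M_n(1_A)^+ \vee \varepsilon}{\varepsilon}\,d\mu(\omega). \]
Hence,
\[ \int (M_n^T(1_A) - \varepsilon)^+\,d\mu \le \Bigl( \log \frac{1}{\varepsilon} \Bigr) \mu(A). \]
Letting now $n$ tend to infinity, we deduce
\[ \int (M_\infty^T(1_A) - \varepsilon)^+\,d\mu \le \Bigl( \log \frac{1}{\varepsilon} \Bigr) \mu(A). \tag{4.2.4} \]
Now let $A_1, \dots, A_N$ be pairwise disjoint measurable sets. Then
\[ \sum_{i=1}^{N} \int M_\infty^T(1_{A_i})\,d\mu = \sum_{i=1}^{N} \int (M_\infty^T(1_{A_i}) - \varepsilon + \varepsilon)\,d\mu \le N\varepsilon + \sum_{i=1}^{N} \int (M_\infty^T(1_{A_i}) - \varepsilon)^+\,d\mu \le N\varepsilon + \log \frac{1}{\varepsilon} \sum_{i=1}^{N} \mu(A_i). \]
Hence,
\[ \sum_{i=1}^{N} \int M_\infty^T(1_{A_i})\,d\mu \le \inf_{\varepsilon > 0} \Bigl\{ N\varepsilon + \log \frac{1}{\varepsilon} \sum_{i=1}^{N} \mu(A_i) \Bigr\}. \]
The infimum of the right-hand side is reached at the value $\varepsilon = \sum_{i=1}^{N} \mu(A_i)/N$. We thus have
\[ \sum_{i=1}^{N} \int M_\infty^T(1_{A_i})\,d\mu \le \Bigl( \sum_{i=1}^{N} \mu(A_i) \Bigr) \Bigl( 1 + \log \frac{N}{\sum_{i=1}^{N} \mu(A_i)} \Bigr). \]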

A maximal inequality in BMO. Let $(X, \Sigma, \mu)$ be a probability space. Let $T$ be a positive contraction of $L^1(\mu)$ such that $T1 = 1$. Having now proved the dominated ergodic theorem, some useful observations can be made, notably in the light of the Hopf maximal inequality, which we recall for our purpose: for any $f \in L^1(\mu)$,
\[ \forall \lambda > 0, \qquad \lambda\,\mu\{M_\infty^T(f) > \lambda\} \le \int_{\{M_\infty^T(f) > \lambda\}} f\,d\mu. \]


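And by means of the dominated ergodic theorem 4.2.4, for any $\lambda \ge 0$ and $r > 1$,
\[ \lambda\,\mu\Bigl\{ \sup_{\theta \in \Theta} M_\infty^T(f_\theta) > \lambda \Bigr\} \le \int \sup_{\theta \in \Theta} M_\infty^T(|f_\theta|)\,d\mu \le \Bigl\| \sup_{\theta \in \Theta} M_\infty^T(|f_\theta|) \Bigr\|_r \le \frac{r}{r-1} \Bigl\| \sup_{\theta \in \Theta} |f_\theta| \Bigr\|_r \le \frac{r}{r-1}\, \#(\Theta)^{1/r} \sup_{\theta \in \Theta} \|f_\theta\|_r, \tag{4.2.5} \]
where $(f_\theta, \theta \in \Theta)$ is any finite subset of $L^1(\mu)$ (see Peškir–Weber [1996] and Ziegler [1998] for extensions to the non-measurable case). The last inequality follows from Jensen's inequality. We shall extend this inequality to BMO spaces. Recall their definition. Let $\Sigma_0 \subset \Sigma_1 \subset \dots \subset \Sigma$ be an increasing filtration of $\Sigma$ ($\Sigma = \bigvee_{i=0}^{\infty} \Sigma_i$). Let $f \in L^1(\mu)$, and put
\[ f_n = E(f \mid \Sigma_n), \qquad f_n^* = \sup_{0 \le \nu \le n} |f_\nu|, \qquad f^* = \sup_n f_n^*, \]
\[ \Delta f_n = f_n - f_{n-1}, \qquad S_n(f) = \Bigl( \sum_{\nu=1}^{n} [\Delta f_\nu]^2 \Bigr)^{1/2}, \qquad S(f) = \sup_n S_n(f). \]
Introduce first the Hardy spaces: let $p \ge 1$ and define
\[ \mathcal{H}_p = \bigl\{ f : E[S(f)]^p < \infty \bigr\} \tag{4.2.6} \]
with norm $\|f\|_{\mathcal{H}_p} = \bigl( E[S(f)]^p \bigr)^{1/p}$. Now we introduce the BMO spaces (for bounded mean oscillations)
\[ \mathrm{BMO} = \Bigl\{ f : \sup_{n \ge 1} \bigl\| E(|f - f_{n-1}|^2 \mid \Sigma_n) \bigr\|_\infty < \infty \Bigr\} \tag{4.2.7} \]
with norm $\|f\|_{\mathrm{BMO}} = \sup_{n \ge 1} \bigl\| E(|f - f_{n-1}|^2 \mid \Sigma_n) \bigr\|_\infty$, for $f$ such that $f_0 = E(f \mid \Sigma_0) = 0$. Recall that these spaces are Banach spaces and that they lie strictly between the space of exponentially integrable functions and $L^\infty(\mu)$. Indeed, by a theorem of John–Nirenberg [1961], any element $f \in \mathrm{BMO}$ is exponentially integrable. And a closed graph argument shows that there exists a constant $C$ (possibly depending on the filtration) such that for any $1 \le r < \infty$,
\[ \|f\|_r \le C r\, \|f\|_{\mathrm{BMO}}. \tag{4.2.8} \]
There are further examples of functions $f$ belonging to BMO, but not to $L^\infty(\mu)$. Recall also Fefferman's inequality (see Garsia [1973: 7–8]) on the duality $\mathcal{H}_1$–BMO. Let $f, \varphi$ be such that $E(f \mid \Sigma_0) = E(\varphi \mid \Sigma_0) = 0$, then
\[ |E(f \cdot \varphi)| \le c\, \|f\|_{\mathcal{H}_1} \|\varphi\|_{\mathrm{BMO}}, \tag{4.2.9} \]
in the following sense: $E(f \cdot \varphi) = \lim_{n\to\infty} E(f_n \cdot \varphi_n)$, and $c$ is a universal constant. Recall also (Garsia [1973: 27]) that for any $p \ge 1$,
\[ \bigl( E\,S^p(f) \bigr)^{1/p} \le C_p \bigl( E\,(f^*)^p \bigr)^{1/p}. \tag{4.2.10} \]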


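One may refer to Garsia [1973] for more insights on these spaces. We are going to establish the following result.

4.2.6 Theorem. Let $\{f_\theta, \theta \in \Theta\}$ be a finite subset of BMO and assume further that $E(f_\theta \mid \Sigma_0) = 0$, $\theta \in \Theta$. Then,
\[ \forall \lambda > 0, \qquad \lambda\,\mu\Bigl\{ \sup_{\theta \in \Theta} M_\infty(f_\theta) > \lambda \Bigr\} \le C \sup_{\theta \in \Theta} \|f_\theta\|_{\mathrm{BMO}} \cdot \log \#(\Theta), \]
where $C$ is a universal constant.

Proof. Let $(f_\theta, \theta \in \Theta)$ be a finite subset of BMO and put $r = \log \#(\Theta)$ (without loss of generality, one can assume that $\#(\Theta) \ge 3$, so that $r > 1$). We deduce from inequalities (4.2.5) and (4.2.8) that for any $\lambda \ge 0$,
\[ \lambda\,\mu\Bigl\{ \sup_{\theta \in \Theta} M_\infty^T(f_\theta) > \lambda \Bigr\} \le \frac{r}{r-1} \Bigl\| \sup_{\theta \in \Theta} |f_\theta| \Bigr\|_r \le \frac{r}{r-1}\, \#(\Theta)^{1/r} \sup_{\theta \in \Theta} \|f_\theta\|_r \le C \log \#(\Theta) \cdot \sup_{\theta \in \Theta} \|f_\theta\|_{\mathrm{BMO}}. \]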

4.3

Classes $L \log^m L$

For any positive integer $m$, let $L \log^m L$ denote the class of measurable functions $f$ such that $|f| (\log^+ |f|)^m$ is integrable. These classes naturally appear in the study of the integrability properties of $M_\infty(f)$.

4.3.1 Theorem. Let $T$ be an $L^1$-$L^\infty$ positive contraction. Then, for any positive integer $m$,
\[ f \in L \log^m L \implies M_\infty^T(f) \in L \log^{m-1} L. \tag{4.3.1} \]

Proof. We put $Y = M_n^T(|f|)$, $X = |f|$. By Lemma 4.2.1, we have for any $\alpha \ge 0$,
\[ \alpha\,\mu\{Y \ge \alpha\} \le 2 \int_{\{2X \ge \alpha\}} X\,d\mu. \]
We say in this case that $X$ and $Y$ are in relation of weak maximal type. By arguing as in the proof of Lemma 4.2.3, it is possible to also establish: for any increasing right-continuous function $\psi : \mathbb{R}_+ \to \mathbb{R}_+$, with $\psi(0) = 0$,
\[ \int \psi(Y)\,d\mu \le 2 \int X(\omega) \int_0^{2X(\omega)} t^{-1}\,\psi(dt)\,d\mu(\omega). \tag{4.3.2} \]
Choose $\psi(t) = t (\log^+ t)^{m-1}$, $m \ge 2$. As
\[ \frac{d\psi}{dt} = (m - 1)(\log^+ t)^{m-2} + (\log^+ t)^{m-1}, \]


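we get
\[ \int_0^{2X(\omega)} t^{-1}\,\psi(dt) \le m (\log^+ 2X(\omega))^{m-1} \int_1^{2X(\omega) \vee 1} t^{-1}\,dt = m (\log^+ 2X(\omega))^m. \]
And so
\[ \int M_n^T(|f(t)|) \bigl( \log^+ M_n^T(|f(t)|) \bigr)^{m-1}\,d\mu(t) \le 2m \int |f(t)| \bigl( \log^+ 2|f(t)| \bigr)^m\,d\mu(t). \]
We conclude by letting $n$ tend to infinity.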

4.4

A converse

A theorem due to Ornstein [1971] shows that if $\tau$ is an ergodic automorphism, the sufficient condition $f \in L \log L$ is also necessary for the integrability of $M_\infty f$, when $f \ge 0$.

4.4.1 Theorem. If $\tau$ is an ergodic automorphism of a measure space $(\Omega, \mathcal{A}, \mu)$, where $\mu$ is a finite measure, then for any $f \ge 0$, we have the equivalence
\[ M_\infty^\tau f = f^* \in L^1 \iff f \in L \log L. \]
The proof relies upon the following lemma due to Moy [1960]. Put for $A \in \mathcal{A}$ such that $\mu(A) > 0$, and $\omega \in A$,
\[ r_A(\omega) = \inf\{n \ge 1 : \tau^n(\omega) \in A\} \tag{4.4.1} \]
and let $A^* = \bigcup_{i=1}^{\infty} \tau^i A$.

4.4.2 Lemma. Let τ be an automorphism from a measure space (, A, μ). Let f ∈ L1 (μ), then r A −1 k f τ dμ = f dμ. A k=0

A∗

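Proof. Introduce, for any positive integer k, the sets
\[
A_k = \tau^k \{\omega \in A : r_A(\omega) \ge k+1\} .
\]
We claim that these sets form a countable partition of $A^* \setminus A$. First, if $\omega \in A_k$, then $\tau^{-k}\omega \in A$ and $r_A(\tau^{-k}\omega) = \inf\{n \ge 1 : \tau^{n-k}(\omega) \in A\} \ge k+1$, which implies that $\omega \notin A$. Thus $\omega \in \tau^k A \cap A^c$, and so $\bigcup_{k=1}^{\infty} A_k \subset A^* \setminus A$. Conversely, if $\omega \in A^* \setminus A$, let $i_0 \ge 1$ be the smallest integer for which $\omega \in \tau^{i_0} A$; thus $\omega \in \tau^{i_0} A$ and $\omega \notin \tau^{j} A$ for $j < i_0$. Rewrite this as (using that $\omega \notin A$)
\[
\tau^{-i_0}\omega \in A \quad\text{and}\quad \tau^{n-i_0}\omega \notin A \ \text{ if } 1 \le n \le i_0 .
\]
This and the fact that $\omega \in A^c$ together imply $r_A(\tau^{-i_0}\omega) > i_0$, which means that $\omega \in A_{i_0}$. Thus $\bigcup_{k=1}^{\infty} A_k = A^* \setminus A$.

Now let $1 \le k < l$ and pick $\omega \in A_k \cap A_l$. On the one hand, since $\omega \in A_l$ we have $r_A(\tau^{-l}\omega) \ge l+1$. On the other hand, as $\omega \in A_k$ we get $\tau^{(l-k)-l}\omega = \tau^{-k}\omega \in A$. Thus $r_A(\tau^{-l}\omega) \le l-k$, which provides a contradiction. Hence $A_k \cap A_l = \emptyset$.

Let $g \ge 0$ be integrable. We deduce
\[
\begin{aligned}
\int_A \sum_{k=0}^{r_A-1} g\circ\tau^k \, d\mu
&= \sum_{j=1}^{\infty} \int_{(r_A = j)\cap A} \sum_{k=0}^{j-1} g\circ\tau^k \, d\mu
 = \sum_{j=1}^{\infty} \sum_{k=0}^{j-1} \int_{(r_A = j)\cap A} g\circ\tau^k \, d\mu \\
&= \sum_{k=0}^{\infty} \sum_{j=k+1}^{\infty} \int_{(r_A = j)\cap A} g\circ\tau^k \, d\mu
 = \sum_{k=0}^{\infty} \int_{(r_A > k)\cap A} g\circ\tau^k \, d\mu \\
&= \int_A g \, d\mu + \sum_{k=1}^{\infty} \int_{A_k} g \, d\mu \qquad \text{(by (3.1.2))} \\
&= \int_A g \, d\mu + \int_{A^*\setminus A} g \, d\mu
 = \int_{A^*} g \, d\mu ,
\end{aligned}
\tag{4.4.2}
\]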

where in the last equality we used the fact that $A^* \cap A = A$, which follows by applying the Poincaré recurrence Theorem 3.1.5 to $\tau^{-1}$. This is the only place, together with the use of (3.1.2), where the assumption that τ is an automorphism is used. Now let $f \in L^1(\mu)$ and write $f = f^+ - f^-$. The proof is completed by applying (4.4.2) to $g = f^+$ and $g = f^-$.

Notice from (4.4.2) that
\[
\int_A \sum_{k=0}^{r_A-1} g\circ\tau^k \, d\mu
= \int_{A\cap(r_A>1)} \sum_{k=0}^{r_A-1} g\circ\tau^k \, d\mu + \int_{A\cap(r_A=1)} g \, d\mu
= \int_{A^*\setminus A} g \, d\mu + \int_A g \, d\mu .
\]
Thus
\[
\int_{A\cap(r_A>1)} \sum_{k=0}^{r_A-1} g\circ\tau^k \, d\mu = \int_{A^*\setminus A} g \, d\mu + \int_{A\cap(r_A>1)} g \, d\mu . \tag{4.4.3}
\]
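The identity of Lemma 4.4.2 can be checked by hand on a finite model. The sketch below is only an illustration under stated assumptions: the space is `range(N)` with counting measure, the automorphism is a random permutation stored as a list, and the names `first_return_time` and `lemma_sides` are hypothetical helpers, not notation from the text.

```python
import random

def first_return_time(tau, A, w):
    """r_A(w) = inf{n >= 1 : tau^n(w) in A}, for a permutation tau given as a list."""
    n, x = 1, tau[w]
    while x not in A:
        x = tau[x]
        n += 1
    return n

def lemma_sides(tau, A, f):
    """Both sides of Lemma 4.4.2 for the counting measure on range(len(tau))."""
    # Left-hand side: sum over w in A of f along the orbit up to the first return.
    lhs = 0
    for w in A:
        x = w
        for _ in range(first_return_time(tau, A, w)):
            lhs += f[x]
            x = tau[x]
    # Right-hand side: sum of f over A* = union of tau^i(A), i >= 1.
    a_star, image = set(), set(A)
    for _ in range(len(tau)):
        image = {tau[x] for x in image}
        a_star |= image
    rhs = sum(f[x] for x in a_star)
    return lhs, rhs

random.seed(0)
N = 40
tau = list(range(N))
random.shuffle(tau)                      # an arbitrary automorphism of the finite space
A = {0, 5, 17}
f = [random.randint(-9, 9) for _ in range(N)]
lhs, rhs = lemma_sides(tau, A, f)
print(lhs, rhs)
```

Integer values of `f` keep the comparison exact. On a finite space every point of A returns to A, so the loop in `first_return_time` always terminates.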

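4.4.3 Lemma. Let τ be an ergodic automorphism of a measure space $(\Omega, \mathcal{A}, \mu)$. Let $f \ge 0$ be integrable and assume that for some $\alpha > 0$ the measure of the set $A = \{f^* < \alpha\}$ is positive. Then,
\[
\int_{\{f^* \ge \alpha\}} f \, d\mu \le 2\alpha\, \mu\{f^* \ge \alpha\} .
\]
This result provides, for ergodic automorphisms, a converse to the maximal inequality given in Lemma 4.1.2.

Proof. By Remark 3.1.4 applied to $\tau^{-1}$, $\mu(A^*) = 1$. Further, $A \cap (r_A > 1) = A \cap \tau^{-1}(A^c)$, since $\omega \in A \cap (r_A > 1)$ means that $\tau\omega \in A^c$, that is, $\omega \in A \cap \tau^{-1}(A^c)$, and conversely. Recall also that $\omega \in A$ implies $r_A(\omega) < \infty$. Thus for any $g \in L^1(\mu)$, by (4.4.3),
\[
\int_{A\cap(r_A>1)} \sum_{k=0}^{r_A-1} g\circ\tau^k \, d\mu
= \int_{A^c} g \, d\mu + \int_{A\cap(r_A>1)} g \, d\mu
= \int_{A^c} g \, d\mu + \int_{A\cap\tau^{-1}(A^c)} g \, d\mu . \tag{4.4.4}
\]
And if $g \ge 0$,
\[
\int_{A\cap(r_A>1)} \sum_{k=0}^{r_A-1} g\circ\tau^k \, d\mu \ge \int_{A^c} g \, d\mu . \tag{4.4.5}
\]
Let $\omega \in A$ be such that $r_A(\omega) > 1$. Observe that
\[
\sum_{k=0}^{r_A-1} f(\tau^k\omega) < \alpha\, r_A(\omega), \tag{4.4.6}
\]
since otherwise we would have
\[
f^*(\omega) \ge \frac{1}{r_A} \sum_{k=0}^{r_A-1} f(\tau^k\omega) \ge \alpha ,
\]
which is absurd. Now by using (4.4.5) for $g = f$, next (4.4.6) and finally (4.4.4) for $g = 1$, we obtain
\[
\begin{aligned}
\int_{A^c} f \, d\mu
&\le \int_{A\cap(r_A>1)} \sum_{k=0}^{r_A-1} f\circ\tau^k \, d\mu
 \le \alpha \int_{A\cap(r_A>1)} r_A \, d\mu
 = \alpha \int_{A\cap(r_A>1)} \sum_{k=0}^{r_A-1} 1 \, d\mu \\
&= \alpha \Big( \int_{A^c} 1 \, d\mu + \int_{A\cap\tau^{-1}(A^c)} 1 \, d\mu \Big)
 \le \alpha \Big( \int_{A^c} 1 \, d\mu + \int_{\tau^{-1}(A^c)} 1 \, d\mu \Big) = 2\alpha\,\mu(A^c) .
\end{aligned}
\]
This completes the proof.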


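Proof of Theorem 4.4.1. We can assume $f \ge 1$ (otherwise consider $f + 1$ in place of f). Let $\alpha_0 = \inf\{\alpha : \mu(f^* < \alpha) > 0\}$. We have
\[
\begin{aligned}
\int f \log f \, d\mu
&= \int f(\omega) \Big( \int_1^{f(\omega)} \alpha^{-1}\, d\alpha \Big) d\mu(\omega)
 = \int_1^{\infty} \alpha^{-1} \Big( \int_{\{f \ge \alpha\}} f(\omega)\, d\mu(\omega) \Big) d\alpha \\
&\le \int_1^{\infty} \alpha^{-1} \Big( \int_{\{f^* \ge \alpha\}} f(\omega)\, d\mu(\omega) \Big) d\alpha
 \le 2 \int_{\alpha_0}^{\infty} \mu\{f^* \ge \alpha\}\, d\alpha + \|f\|_1 \int_1^{\alpha_0} \alpha^{-1}\, d\alpha \\
&\le 2 \|f^*\|_1 + \|f\|_1 \log \alpha_0 .
\end{aligned}
\]

4.4.4 Remark. Let $a = \{a_k, k \ge 0\}$ be a sequence of bounded nonnegative reals, and consider the weighted ergodic averages
\[
W^\tau_n f = \frac{\sum_{k=0}^{n-1} a_k\, f\circ\tau^k}{\sum_{k=0}^{n-1} a_k} .
\]
Put $W^\tau_\infty f = \sup_{n\ge 1} W^\tau_n f$. Let m be any positive integer. If $C^{-1} \le a_k \le C$, $k = 0, 1, \ldots$, the same arguments also show that if τ is an ergodic automorphism of a measure space $(\Omega, \mathcal{A}, \mu)$, then for any $f \ge 0$,
\[
W^\tau_\infty f \in L \log^{m-1} L \implies f \in L \log^{m} L . \tag{4.4.7}
\]
The interesting case when $a_k = 0$ or 1 according to whether $k \in \mathcal{N}$, where $\mathcal{N} = \{n_k, k \ge 1\}$ is an increasing sequence of integers, should require us to work with $r_{\mathcal{N},A}(\omega) = \inf\{\ell \ge 1 : \tau^{n_\ell}(\omega) \in A\}$ and $A^* = \bigcup_{i=1}^{\infty} \tau^{n_i} A$. Some additional properties of $\mathcal{N}$, e.g., $n_i \pm n_j \in \mathcal{N}$, $j < i$, plus naturally a suitable subsequence mean ergodic theorem, seem also to be necessary.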

4.5

Speed of convergence

It is a fundamental fact that no speed of convergence can be associated with Birkhoff's theorem, nor with von Neumann's theorem. These negative results are essentially due to O'Brien [1983], Halász [1976], Krengel [1978] and von Neumann [1936]; see for instance the discussion in Krengel [1985: 14, 15]. In what follows, we shall refer to the survey of Kachurovskii [1996]. Below is a first result due to Halász and Krengel (see Kachurovskii [1996: Theorem 1]).

4.5.1 Theorem. For any automorphism τ of the interval [0, 1] provided with the normalized Lebesgue measure λ, we can choose indicator functions for which the speed of convergence in the pointwise ergodic theorem can be arbitrarily fast or arbitrarily slow:

(1) For any sequence $\{a_n, n \ge 1\}$ with $a_1 \ge 2$ tending to infinity monotonically, we can find a measurable set A of prescribed measure $\lambda(A)$, such that λ-almost everywhere,
\[
\forall n, \qquad |A^\tau_n(1_A) - \lambda(A)| \le \frac{a_n}{n} .
\]
(2) For any sequence $\{b_n, n \ge 1\}$ of positive reals tending to 0, we can find a measurable set B of measure $\lambda(B) \in\, ]0, 1[$, such that λ-almost everywhere,
\[
\lim_{n\to\infty} \frac{1}{b_n} |A^\tau_n(1_B) - \lambda(B)| = \infty
\quad\text{and}\quad
\lim_{n\to\infty} \frac{1}{b_n} \|A^\tau_n(1_B) - \lambda(B)\|_p = \infty
\]
for any $p \in [1, \infty]$.

One can naturally look for spectral type conditions under which a speed of convergence holds. In this direction, the two following statements are of interest (Theorems 3 and 4 in [Kachurovskii: 1996]).

4.5.2 Theorem. Assume that τ is weakly mixing. Then the following properties are equivalent:

(1) $\|A^\tau_n(f)\|_2 = O(n^{-1})$ $(n \to \infty)$;

(2) the integral $\int_{-\pi}^{\pi} |x|^{-2}\, \mu_f(dx)$ is convergent;

(3) f is cohomologous to 0: $f = g\circ\tau - g$, $g \in L^2$.

The following statement concerns the speed of convergence in probability. Put
\[
p_n^\varepsilon = \mu\{|A_n f - \bar f| > \varepsilon\}, \qquad P_n^\varepsilon = \mu\Big\{ \sup_{N \ge n} |A_N f - \bar f| > \varepsilon \Big\} . \tag{4.5.1}
\]

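4.5.3 Theorem. Assume that $\bar f = 0$. Then, for any $\varepsilon > 0$,
\[
p_n^\varepsilon \le \frac{1}{\varepsilon^2} \int |V_n(x)|\, \mu_f(dx),
\]
\[
\begin{aligned}
P_n^\varepsilon
&\le \inf_{\delta > 0} \Big\{ \frac{16}{\varepsilon^2} \int_{-\delta}^{\delta} \mu_f(dx) + \frac{4}{\varepsilon^2} \sum_{N \ge n} \int_{|x| \ge \delta} |V_N(x)|\, \mu_f(dx) \Big\} \\
&\le \inf_{\delta > 0} \Big\{ \frac{16}{\varepsilon^2} \int_{-\delta}^{\delta} \mu_f(dx) + \frac{4 \|f\|_2^2}{(n-1)\,\varepsilon^2 \sin^2(\delta/2)} \Big\} .
\end{aligned}
\]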


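4.5.4 Remark. Before giving the proof of this result, we make some useful comments concerning approximation properties in $L^2$ by functions cohomologous to 0, namely functions of type $U_\tau g - g$ for $g \in L^2$, where $U_\tau$ is defined by $U_\tau f = f\circ\tau$. Let $f \in L^2$ be such that $\bar f = 0$. Let $E = \{E_t, t \in\, ]-\pi, \pi]\}$ be a spectral resolution of $U_\tau$, and put for any $\delta \in\, ]0, \pi[$,
\[
f_\delta = E[-\delta, \delta] f \quad (= (E_\delta - E_{-\delta}) f).
\]
Write f as a sum of two orthogonal functions: $f = f_\delta + (f - f_\delta)$. Let $\mu_f$ be the spectral measure of f relative to $U_\tau$. It follows from Theorem 2.2.9 that $\|f_\delta\|_2^2 = \mu_f([-\delta, \delta])$, and consequently $\|f_\delta\|_2 \to 0$ as $\delta \to 0$, since $\mu_f(\{0\}) = \|\bar f\|_2^2 = 0$. The second term $(f - f_\delta)$ of the decomposition is cohomologous to 0 for each δ, that is to say $f - f_\delta = U_\tau g(\delta) - g(\delta)$ where $g(\delta) \in L^2$, since 1 is a regular value of the restriction of $U_\tau$ to the subspace $H_\delta$ of functions $h \in L^2$ such that $E[-\delta, \delta] h = 0$. Besides, on $H_\delta$,
\[
\big\| (U_\tau - I)^{-1} \big\| = \Big\| \int_{\delta < |t| \le \pi} \frac{1}{e^{it} - 1}\, dE_t \Big\| \le \sup_{\delta \le |t| \le \pi} \frac{1}{|e^{it} - 1|} = \frac{1}{2 \sin(\delta/2)} .
\]
We thus deduce a corresponding decomposition for the sums $A_n(f)$:
\[
A_n(f) = A_n(f_\delta) + A_n(f - f_\delta), \qquad
A_n(f - f_\delta) = \frac{1}{n} \big( U_\tau^n g(\delta) - g(\delta) \big), \qquad
\|g(\delta)\|_2 \le \frac{2}{\sin(\delta/2)} \|f\|_2 ,
\]
with $\|f_\delta\|_2^2 = \mu_f([-\delta, \delta]) \to 0$, and so
\[
\|A_n(f_\delta)\|_2 \le \|f_\delta\|_2 \to 0, \qquad
\|A_n(f - f_\delta)\|_2 \le \frac{1}{n} \cdot \frac{4}{\sin(\delta/2)} \|f\|_2 .
\]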
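The reason coboundaries give a rate is the telescoping identity $A_n(g\circ\tau - g) = \frac{1}{n}(g\circ\tau^n - g)$, whose absolute value is at most $2\|g\|_\infty / n$ for bounded g. The following sketch checks this numerically; the circle rotation, the angle `THETA` and the choice $g(x) = \cos 2\pi x$ are assumptions made for illustration only.

```python
import math

THETA = math.sqrt(2) - 1          # assumed rotation angle (irrational)

def tau(x):
    """Rotation of the circle [0, 1)."""
    return (x + THETA) % 1.0

def g(x):
    """Bounded transfer function, sup norm 1."""
    return math.cos(2 * math.pi * x)

def f(x):
    """Coboundary f = g o tau - g."""
    return g(tau(x)) - g(x)

def ergodic_average(func, x, n):
    """A_n(func)(x) = (1/n) * sum_{k<n} func(tau^k x)."""
    total = 0.0
    for _ in range(n):
        total += func(x)
        x = tau(x)
    return total / n

def orbit_point(x, n):
    """tau^n(x), computed by iteration."""
    for _ in range(n):
        x = tau(x)
    return x

for n in (1, 10, 100, 1000):
    print(n, ergodic_average(f, 0.3, n), (g(orbit_point(0.3, n)) - g(0.3)) / n)
```

Both printed columns agree up to rounding, and the averages decay like 1/n, in line with the last bound of Remark 4.5.4.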

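Proof of Theorem 4.5.3. The first estimate immediately follows from Tchebycheff's inequality and the spectral inequality. Consider the second estimate, and recall the decomposition $A_n(f) = A_n(f_\delta) + A_n(f - f_\delta)$ ($\delta \in\, ]0, \pi[$). Then
\[
P_n^\varepsilon \le \mu\Big\{ \sup_{N \ge n} |A_N f_\delta| \ge \frac{\varepsilon}{2} \Big\} + \mu\Big\{ \sup_{N \ge n} |A_N (f - f_\delta)| \ge \frac{\varepsilon}{2} \Big\} . \tag{4.5.2}
\]
We bound the first expression by means of the dominated ergodic Theorem 4.1.4 (inequality (a) with p = 2):
\[
\mu\Big\{ \sup_{N \ge n} |A_N f_\delta| \ge \frac{\varepsilon}{2} \Big\} \le \frac{4}{\varepsilon^2} \Big\| \sup_{N \ge 1} |A_N f_\delta| \Big\|_2^2 \le \frac{16}{\varepsilon^2} \|f_\delta\|_2^2 . \tag{4.5.3}
\]
Finally, concerning the second expression,
\[
\begin{aligned}
\mu\Big\{ \sup_{N \ge n} |A_N (f - f_\delta)| \ge \frac{\varepsilon}{2} \Big\}
&\le \sum_{N \ge n} \mu\Big\{ |A_N (f - f_\delta)| \ge \frac{\varepsilon}{2} \Big\}
 \le \frac{4}{\varepsilon^2} \sum_{N \ge n} \|A_N (f - f_\delta)\|_2^2 \\
&\le \frac{64\, \|f\|_2^2}{\varepsilon^2 \sin^2(\delta/2)} \sum_{N \ge n} \frac{1}{N^2}
 \le \frac{64\, \|f\|_2^2}{(n-1)\, \varepsilon^2 \sin^2(\delta/2)} .
\end{aligned}
\]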

This provides the requested estimate.

There is also a remarkable interconnection between the large deviation probability $p_n^\varepsilon$ in (4.5.1) and the property for $f \in L_0^\infty(\mu)$ to be approximated (in $L_0^\infty(\mu)$) by coboundaries whose cobounding functions have finite moments. This link was recently established by Volný and Weiss [2004]. Let $(X, \mathcal{A}, \mu, T)$ be a measurable dynamical system and assume that T is an ergodic aperiodic automorphism. For k = 1, 2, . . . we denote $S_k = T^0 + \cdots + T^{k-1}$. We have the following results.

4.5.5 Theorem. Let $f \in L_0^\infty(\mu)$ and $p \ge 1$. Then
\[
\inf_{g \in L^p(\mu)} \| f - (g - g\circ T) \|_\infty = 0
\implies
(\forall \varepsilon > 0)\quad
\begin{cases}
\sum_{k=1}^{\infty} k^{p-1}\, \mu\{|S_k f| > \varepsilon k\} < \infty, \\
\text{and} \\
\sup_{k \ge 1} k^{p}\, \mu\{|S_k f| > \varepsilon k\} < \infty .
\end{cases}
\]
4.5.6 Theorem. Let $f \in L_0^\infty(\mu)$ and $p \ge 1$. If $\sup_{k \ge 1} k^p \mu\{|S_k f| > \eta k\} < \infty$ for every $\eta > 0$, then for whatever $\varepsilon > 0$ and $v : \mathbb{R}_+ \to \mathbb{R}_+$ such that $\sum_{k=1}^{\infty}$

1 0 and v(x), x p /v(x) are increasing,

there exists $g \in L_0(\mu)$ such that $\int_X \frac{|g|^p}{v(|g|)}\, d\mu < \infty$ and $\| f - (g - g\circ T) \|_\infty < \varepsilon$. In particular, for any $\varepsilon > 0$ we can find $g \in L^{p-\varepsilon}$.

Let $L^{\Psi}(\mu)$ be the Orlicz space associated to the exponential Young function $\Psi(x) = e^{|x|} - 1$.

4.5.7 Theorem. Let $f \in L_0^\infty(\mu)$. We have the following equivalence:
\[
\liminf_{n\to\infty} -\frac{1}{n} \log \mu\{|S_n f| > n\} > 0 \iff \inf_{g \in L^{\Psi}(\mu)} \| f - (g - g\circ T) \|_\infty = 0 .
\]
We refer to the quoted paper of Volný and Weiss [2004] for the proof of these results, as well as for a reference source on coboundaries.


4.6

Oscillation functions of ergodic averages

In this section, we show how to modify the spectral regularization of Section 1.4 in order to control the oscillation functions of ergodic averages. We assume throughout the section that $(X, \mathcal{A}, \nu)$ is a measure space with a finite measure ν, $H = L^2(\nu)$, and U is the unitary operator generated by a measure-preserving transformation of $(X, \mathcal{A}, \nu)$. We write $\mathrm{Log}(u) = \max\{1, \log u\}$ for $u \ge 1$. We still denote by μ the spectral measure of an element $f \in H$ and define the regularized spectral measure $\hat\mu$ by letting its Lebesgue density be
\[
\frac{d\hat\mu}{dx}(x) = \int_{-\pi}^{\pi} Q(\theta, x)\, \mu(d\theta),
\]
where this time
\[
Q(\theta, x) =
\begin{cases}
|\theta|^{-1}\, \mathrm{Log}^2\big(\tfrac{\theta}{x}\big), & |x| < |\theta|, \\[2pt]
\theta^2 |x|^{-3}, & |\theta| \le |x| \le \pi .
\end{cases}
\tag{4.6.1}
\]
The following theorem provides a control of the oscillation over an arbitrary block of averages.

4.6.1 Theorem. Let $n, n_+$ be positive integers such that $n \le n_+$. Then
\[
\Big\| \sup_{n \le m \le n_+} |A_m(f) - A_n(f)| \Big\|_{2,\nu}^2 \le C\, \hat\mu\Big( \Big[ \frac{1}{n_+}, \frac{1}{n} \Big] \Big) .
\]

Remarks. 1. The result still holds true for Am generated by arbitrary contraction of H (not necessarily related to a measure-preserving transformation) under supplementary assumption n+ ≤ Rn. In the latter case the constant C depends on R. 2. Theorem 4.6.1 immediately allows to recover the following result due to Jones, Kaufman, Rosenblatt, and Wierdl [1998: Theorem A] concerning oscillation functions of ergodic averages. 4.6.2 Corollary. Let {np , p ≥ 1} be an increasing sequence of positive integers. Then, ∞ p=1

sup

np ≤m n. Then, ⎧ (m−n)m 2 θ , |θ | < m1 ; ⎪ ⎪ 2 ⎨ # " 1 1 κm,n (θ ) = κ , , θ = m−n , |θ | ∈ m1 , n1 ; 4m ⎪ m n ⎪ ⎩ m−n Log2 (n|θ |), |θ | ∈ 1 , π #. mn|θ |

n

Proof of Theorem 4.6.1. At first we prove the theorem for a short dyadic block. Namely, let us assume additionally that for some integer p, n+ − n = 2p ≤ 2n.

(4.6.2)

We use the classical dyadic scheme and thus introduce the binary increments j,k (f ) = An+(j +1)2p−k (f ) − An+j 2p−k (f ),

1 ≤ k ≤ p, 0 ≤ j < 2k − 1.

Each integer m ∈ [n, n + 2p ) can be written as m=n+

p

εk (m) = 0 or 1.

εk (m)2p−k ,

k=1

Thus, Am (f ) = An (f ) +

p

j (k,m),k (f ),

k=1

where the indexes {j (k, m)} are easily defined by {εk (m)}. Thus, we have sup n≤m 0, ∞ n P ξk > nλ < ∞. n=1

k=1

(4.6.10)


Inequality (4.6.9) is a particular case of a more general maximal inequality proved in Rosenblatt–Wierdl [1992]: let a = {ap , p ≥ 1} be a sequence of positive reals and p bp = n=1 an . Then ∞

ap μ

p=1

sup

np ≤m bp ≤ C f 1 ,

(4.6.11)

and C is independent of f and τ. Wittmann [1995b] showed that (4.6.11) holds for general $L^1$-$L^\infty$ contractions. Inequality (4.6.10) is related to the very useful notion of complete convergence, which is worth describing briefly. Let $X = \{X_{n,k}, 1 \le k \le k_n, n \ge 1\}$ denote a triangular array of real centered independent random variables, and $a = \{a_{n,k}, 1 \le k \le k_n, n \ge 1\}$, with $\{k_n, n \ge 1\}$ nondecreasing, a triangular array of positive reals. When the random variables are symmetric (resp. identically distributed), we will say that the triangular array X is symmetric (resp. i.i.d.). Set, for every $n \ge 1$,
\[
T_n = \sum_{k=1}^{k_n} a_{n,k} X_{n,k}, \qquad
A_n = \sum_{k=1}^{k_n} a_{n,k}, \qquad
B_n^2 = \sum_{k=1}^{k_n} a_{n,k}^2, \qquad
C_n = A_n / B_n .
\]
Let $(\Omega, \mathcal{A}, \mathbf{P})$ be the basic probability space on which X is defined. Note that $C_n \ge 1$. We say that the sequence $T_n/A_n$ converges completely to 0, and write $T_n/A_n \xrightarrow{\text{c.c.}} 0$, when for any $\varepsilon > 0$,
\[
\sum_n \mathbf{P}\{|T_n|/A_n > \varepsilon\} < \infty . \tag{4.6.12}
\]
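To make the constants $A_n$, $B_n$, $C_n$ concrete, here is a small sketch for the simplest array, chosen as an assumption for illustration: $a_{n,k} = 1$ and $k_n = n$, so that $C_n = \sqrt{n}$. The helpers `row_constants` and `count_rows_below` are hypothetical names. The printed ratios $\log \#\{n : C_n \le x\} / x^2$ are the quantities whose limsup appears in the Gaussian criterion discussed in this section; a finite table only suggests the trend and proves nothing.

```python
import math

def row_constants(row):
    """Return (A_n, B_n, C_n) for one row of positive weights."""
    a_n = sum(row)
    b_n = math.sqrt(sum(a * a for a in row))
    return a_n, b_n, a_n / b_n

def count_rows_below(c_values, x):
    """#{n : C_n <= x} among the rows computed so far."""
    return sum(1 for c in c_values if c <= x)

# Assumed example: row n consists of n weights equal to 1, so C_n = sqrt(n).
c_values = [row_constants([1.0] * n)[2] for n in range(1, 2001)]
xs = (5, 10, 20, 40)
ratios = [math.log(count_rows_below(c_values, x)) / x ** 2 for x in xs]
for x, r in zip(xs, ratios):
    print(x, r)
```

For positive weights, $C_n \ge 1$ follows from $(\sum_k a_{n,k})^2 \ge \sum_k a_{n,k}^2$, and $C_n \le \sqrt{k_n}$ from the Cauchy–Schwarz inequality.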

The study of this property originates from a well-known paper by Hsu and Robbins [1947], who proved in the case of a single i.i.d. sequence $\xi = \{\xi, \xi_n, n \ge 1\}$ with partial sums $S_n = \sum_{k=1}^{n} \xi_k$, n = 1, 2, . . . , that $\mathbf{E}\,\xi = 0$, $\mathbf{E}\,\xi^2 < \infty$ imply $S_n/n \xrightarrow{\text{c.c.}} 0$. Shortly afterward, Erdős [1949] proved the validity of the converse implication. Since then, the study of various possible generalizations of this result (subsequence case, the theorems of Baum–Katz [1965], extensions to triangular arrays of independent random variables, Banach space valued random variables) has received a lot of attention. One may for example refer to the works of Gut [1992], Fazekas [1985/88], Hu–Móricz–Taylor [1989], Ahmed–Giuliano–Volodin [2002], Kuczmaszewska–Szynal [1988/91], Li–Rao–Wang [1992], Pruitt [1966], Rohatgi [1971], Sung [1997] and Berkes–Weber [2006]. In the Gaussian case, namely if X is Gaussian, the problem can be simply settled. Put
\[
L(a) = \limsup_{x\to\infty} \frac{\log \#\{n : C_n \le x\}}{x^2} . \tag{4.6.13}
\]
Then we have the following characterization in [Berkes–Weber: 2006]:
\[
T_n/A_n \xrightarrow{\text{c.c.}} 0 \iff L(a) = 0 . \tag{4.6.14}
\]


This case is in general very informative and interesting, because of the classical Gaussian randomization procedure for sums of independent random variables. By applying Skorohod's embedding scheme (see Section 10.4) to the row sums of the triangular array X, one can show, for instance if $\mathbf{E}\, X_{n,k}^2 = 1$ and $X_{n,k} \in L^{2p}$ for some $p \ge 2$, that the relation
\[
\sum_n \frac{\big( \sum_{k=1}^{k_n} a_{n,k}^4 \big)^{p/2}}{\big( \sum_{k=1}^{k_n} a_{n,k}^2 \big)^{p}} < \infty
\]
implies $T_n/A_n \xrightarrow{\text{c.c.}} 0$. To compare this result with the Gaussian case, note that $L(a) = 0$ is equivalent to
\[
\sum_n \exp\Big\{ -\delta\, \frac{\big( \sum_{k=1}^{k_n} a_{n,k} \big)^2}{\sum_{k=1}^{k_n} a_{n,k}^2} \Big\} < \infty \quad \text{for all } \delta > 0 .
\]

It seems also worth mentioning some sharp results concerning the convergence of

f (τ n (x)) p the series ∞ with p > 1. Assani [1997b] showed that if τ is ergodic, n=1 n for f ≥ 0, f ∈ L log L, lim

p→1+

(p − 1)

1/p ∞ f (τ n (x)) p n=1

n

a.e.

=

f dμ.

(4.6.15)

Further, there is an absolute constant C such that 1/p ∞ " # f (τ n (x)) p ≤C sup (p − 1)1/p f log f dμ + 1 . (4.6.16) n 10

and for r < p,

x p,∞ ≤ x p ≤

p 1/p

x r,∞ . p−r

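In the ergodic setting: if τ is ergodic, then $x_n = f(\tau^n x)$; Assani [1997a] proved that for any $f \in L^p_+(\mu)$, $N_f^*$ is of weak type (p, p) for all p, $1 < p < \infty$. Further,
\[
\lim_{m\to\infty} \frac{\#\{n : f(\tau^n x)/n \ge 1/m\}}{m} = \int f \, d\mu \quad \text{a.e.} \tag{4.6.19}
\]
The convergence in $L^1$ of the averages in (4.6.19) also holds. Note that for $f \ge 0$,
\[
\sup_{m \ge 1} \frac{\#\{n : f(\tau^n x)/n \ge 1/m\}}{m} \sim \Big\| \Big( \frac{f(\tau^n x)}{n},\ n \ge 1 \Big) \Big\|_{1,\infty} .
\]
Further [Assani: 1997b], for $f \in L \log L$,
\[
\bigg\| \, \Big\| \Big( \frac{f(\tau^n x)}{n},\ n \ge 1 \Big) \Big\|_{1,\infty} \bigg\|_1 < \infty . \tag{4.6.20}
\]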

Assani, Buczolich and Mauldin [2005] however showed that for f ∈ L1 the convergence almost everywhere of these averages fails to hold. This negative result establishes that Bourgain’s return time theorem (see Section 4.7.3) does not hold for (L1 , L1 ) pairs.


Transference principle. We shall state the Calderón transference principle not in its full generality, but in the discrete case. One may fruitfully refer to Calderón's original paper for more general formulations, as well as to the illuminating discussion on “transference principles in ergodic theory” in Bellow [1999]. Let m be a probability measure on $\mathbb{Z}$. Define a mapping $\varphi \mapsto m[\varphi]$ from $\ell^1(\mathbb{Z})$ to $\ell^1(\mathbb{Z})$ by
\[
m[\varphi](k) = \sum_{j \in \mathbb{Z}} m(j)\, \varphi(k + j), \qquad k \in \mathbb{Z} .
\]
Let $(X, \mathcal{A}, \mu, \tau)$ be a measurable dynamical system, and assume that τ is an automorphism of $(X, \mathcal{A}, \mu)$. Define similarly a mapping $f \mapsto m[f]$ from $L^1(\mu)$ to $L^1(\mu)$ by putting
\[
m[f](x) = \sum_{j \in \mathbb{Z}} m(j)\, f\circ\tau^j(x), \qquad x \in X .
\]

4.6.6 Theorem. Let $\{m_n, n \ge 1\}$ be a sequence of probability measures on $\mathbb{Z}$. Consider the following assertions:

(1) There exists a constant C such that
\[
\sup_{\|\varphi\|_1 \le 1}\ \sup_{\lambda \ge 0}\ \lambda\, \#\Big\{ k \in \mathbb{Z} : \sup_{n \ge 1} m_n[\varphi](k) > \lambda \Big\} \le C .
\]
(2) There exists a constant C such that for every measurable dynamical system $(X, \mathcal{A}, \mu, \tau)$, we have
\[
\sup_{\|f\|_1 \le 1}\ \sup_{\lambda \ge 0}\ \lambda\, \mu\Big\{ x \in X : \sup_{n \ge 1} m_n[f](x) > \lambda \Big\} \le C .
\]
Then (1) implies (2).

4.6.7 Remarks. 1. The first assertion indicates that we have a weak type (1, 1) inequality on $\ell^1(\mathbb{Z})$. The second one states a weak type (1, 1) inequality on $L^1(\mu)$. The constant C is the same in (1) and (2). One can allow σ-finite measure spaces in (2), and one obtains an equivalent statement. The transference principle also applies if we replace the weak type (1, 1) estimate by a weak type (p, p) estimate (respectively a strong type (p, p) estimate) for $1 < p < \infty$. The underlying fact is that if we have the estimate for the shift model (i.e., $\mathbb{Z}$ with translations), we can derive it for any other dynamical system.

2. If one sets $m_n = \frac{1}{n} \sum_{k=0}^{n-1} \delta_k$, then (1) yields the Hardy–Littlewood maximal inequality
\[
\sup_{\|\varphi\|_1 \le 1}\ \sup_{\lambda \ge 0}\ \lambda\, \#\Big\{ k \in \mathbb{Z} : \sup_{n \ge 1} \frac{1}{n} \sum_{j=0}^{n-1} \varphi(j + k) > \lambda \Big\} \le C ,
\]
proved in the celebrated paper of Hardy and Littlewood [1930]. And (2) yields the maximal ergodic inequality (Lemma 4.1.2)
\[
\sup_{\|f\|_1 \le 1}\ \sup_{\lambda \ge 0}\ \lambda\, \mu\Big\{ x \in X : \sup_{n \ge 1} \frac{1}{n} \sum_{k=0}^{n-1} f\circ\tau^k(x) > \lambda \Big\} \le C .
\]


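We thus see that the maximal ergodic inequality may in turn be deduced from the Hardy–Littlewood maximal inequality, published one year before Birkhoff's proof of the pointwise ergodic theorem.

Proof of Theorem 4.6.6. Let $f \in L^1(\mu)$. It suffices to prove (2) for nonnegative f. Assume then that (1) holds and, for a fixed $x \in X$ and a positive integer J, apply it to the sequence
\[
\varphi(j) =
\begin{cases}
f\circ\tau^j(x) & \text{if } |j| \le J, \\
0 & \text{otherwise.}
\end{cases}
\]
Observe that $\varphi(k + j) = 0$ if $|k| > 2J$. Then for any $x \in X$, any positive integer N and any real $t \ge 0$,
\[
t\, \#\Big\{ k \in \mathbb{Z} : \sup_{1 \le n \le N} \sum_{j \in \mathbb{Z}} m_n(j)\, f\circ\tau^{k+j}(x) > t \Big\}
= t\, \#\Big\{ |k| \le J : \sup_{1 \le n \le N} \sum_{j \in \mathbb{Z}} m_n(j)\, f\circ\tau^{k+j}(x) > t \Big\}
\le C \|\varphi\|_1 .
\]
Integrating over X with respect to μ gives
\[
4J\, t\, \mu\Big\{ x : \sup_{1 \le n \le N} \sum_{j \in \mathbb{Z}} m_n(j)\, f\circ\tau^{k+j}(x) > t \Big\} \le 2CJ\, \|f\|_1 .
\]
Letting J and then N tend to infinity finally leads to
\[
t\, \mu\Big\{ x : \sup_{n \ge 1} \sum_{j \in \mathbb{Z}} m_n(j)\, f\circ\tau^{k+j}(x) > t \Big\} \le C \|f\|_1 ,
\]
as claimed.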
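Assertion (1) of Theorem 4.6.6 can be explored numerically in the case $m_n = \frac{1}{n}\sum_{k=0}^{n-1}\delta_k$ of Remark 4.6.7. The sketch below is an illustration under stated assumptions: a random nonnegative sequence `phi` with finite support, a finite window of indices k and a truncated supremum over n (both truncations can only shrink the counted set), and the hypothetical helper `forward_maximal`. For these one-sided averages the constant C = 1 is admissible (maximal ergodic inequality for the shift), which is what the printed values reflect.

```python
from fractions import Fraction
import random

def forward_maximal(phi, k, n_max):
    """max over 1 <= n <= n_max of (1/n) * sum_{j<n} phi(k + j); phi is a dict."""
    best, running = Fraction(0), 0
    for n in range(1, n_max + 1):
        running += phi.get(k + n - 1, 0)
        best = max(best, Fraction(running, n))
    return best

random.seed(1)
phi = {k: random.randint(0, 6) for k in range(25)}   # assumed test sequence
norm1 = sum(phi.values())                            # l^1 norm of phi

window = range(-200, 60)                             # finite set of indices k
maximal = {k: forward_maximal(phi, k, 300) for k in window}

def weak_type_quantity(lam):
    """lambda * #{k in window : maximal function at k > lambda}."""
    return lam * sum(1 for k in window if maximal[k] > lam)

for lam in (Fraction(1, 4), Fraction(1, 2), 1, 2, 3, 5):
    print(lam, weak_type_quantity(lam), norm1)
```

Exact rational arithmetic avoids any ambiguity in the strict inequality defining the level sets.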

4.7 Wiener–Wintner theorem

Let $(X, \mathcal{A}, \mu, T)$ be an ergodic dynamical system and consider a rotation $\tau x = x + \theta \pmod 1$ on the circle $(\mathbb{T}, \mathcal{B}(\mathbb{T}), \lambda)$. Let $f \in L^1(\mu)$. The Birkhoff ergodic theorem applied in the product dynamical system $(X \times \mathbb{T}, \mathcal{A} \otimes \mathcal{B}(\mathbb{T}), \mu \times \lambda, T \times \tau)$ to the function $g = e^{2i\pi\theta} f$ implies that the limit
\[
\lim_{N\to\infty} \frac{1}{N} \sum_{k=0}^{N-1} e^{2i\pi k\theta} f(T^k x)
\]
exists μ-almost everywhere. The striking fact is that the measurable set of full measure on which this property holds does not depend on the value of θ. This was first observed by Wiener and Wintner [1941].


4.7.1 Theorem. Let $(X, \mathcal{A}, \mu, T)$ be an ergodic measurable dynamical system. Then for any $f \in L^1(\mu)$, for μ-almost all x, the sequence of averages
\[
\frac{1}{N} \sum_{n=0}^{N-1} e^{in\vartheta} f(T^n x)
\]
converges for any value of ϑ.

The proof proposed in the Wiener–Wintner paper was however incorrect. Since then, several different proofs have been published. This result admits a remarkable strengthening, in the sense that the latter convergence is uniform in ϑ. This uniform version of the Wiener–Wintner theorem is due to Bourgain [1990].

4.7.2 Theorem (Uniform Wiener–Wintner theorem). Let $(X, \mathcal{A}, \mu, T)$ be an ergodic measurable dynamical system. Then for any $f \in L^1(\mu)$ with $\int_X f \, d\mu = 0$,
\[
\mu\Big\{ x \in X : \lim_{N\to\infty} \sup_{\vartheta \in \mathbb{R}} \Big| \frac{1}{N} \sum_{k=0}^{N-1} e^{ik\vartheta} f(T^k x) \Big| = 0 \Big\} = 1 .
\]

Proof. We give a proof using Van der Corput's inequality when T is weakly mixing. Recall the Van der Corput inequality (Theorem 1.7.1), case $H = \mathbb{C}$. If $\{x_n, 0 \le n \le N-1\}$ are complex numbers and R is some integer between 0 and N − 1, then
\[
\Big| \frac{1}{N} \sum_{k=0}^{N-1} x_k \Big|^2
\le \frac{N+R}{N^2 (R+1)} \sum_{k=0}^{N-1} |x_k|^2
+ 2\, \frac{N+R}{N^2 (R+1)^2} \sum_{r=1}^{R} (R + 1 - r) \cdot \Re \Big( \sum_{k=0}^{N-r-1} x_k\, \overline{x_{k+r}} \Big) .
\]
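The Van der Corput inequality in the case $H = \mathbb{C}$ is easy to test numerically. The sketch below evaluates both sides for a random complex sequence and every admissible R; the function name `van_der_corput_sides` and the Gaussian test data are assumptions made for illustration, and a numerical check is of course no substitute for Theorem 1.7.1.

```python
import random

def van_der_corput_sides(x, R):
    """Return (lhs, rhs) of the Van der Corput inequality for complex x and 0 <= R <= len(x) - 1."""
    N = len(x)
    lhs = abs(sum(x) / N) ** 2
    energy = (N + R) / (N ** 2 * (R + 1)) * sum(abs(v) ** 2 for v in x)
    corr = 0.0
    for r in range(1, R + 1):
        # Real part of the lag-r correlation sum_{k=0}^{N-r-1} x_k * conj(x_{k+r}).
        inner = sum(x[k] * x[k + r].conjugate() for k in range(N - r))
        corr += (R + 1 - r) * inner.real
    rhs = energy + 2 * (N + R) / (N ** 2 * (R + 1) ** 2) * corr
    return lhs, rhs

random.seed(2)
x = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(30)]
for R in (0, 1, 5, 29):
    print(R, van_der_corput_sides(x, R))
```

For R = 0 the right-hand side reduces to $\frac{1}{N}\sum_k |x_k|^2$, so the inequality is then just the Cauchy–Schwarz inequality; larger R trades the energy term for correlation terms, which is what the proof exploits.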

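Assume first $f \in L^2(\mu)$ and apply this inequality with the choice $x_n = e^{in\vartheta} f(T^n x)$. We get
\[
\Big| \frac{1}{N} \sum_{k=0}^{N-1} e^{ik\vartheta} f(T^k x) \Big|^2
\le \frac{N+R}{N^2 (R+1)} \sum_{k=0}^{N-1} |f(T^k x)|^2
+ 2\, \frac{N+R}{N^2 (R+1)^2} \sum_{r=1}^{R} (R + 1 - r) \cdot \Re \Big( e^{-ir\vartheta} \sum_{k=0}^{N-r-1} f(T^k x) \cdot \overline{f(T^{k+r} x)} \Big) .
\]
Taking the supremum over all ϑ gives, since $R \le N - 1$,
\[
\sup_{\vartheta \in \mathbb{R}} \Big| \frac{1}{N} \sum_{k=0}^{N-1} e^{ik\vartheta} f(T^k x) \Big|^2
\le \frac{2}{N (R+1)} \sum_{k=0}^{N-1} |f(T^k x)|^2
+ \frac{4}{R+1} \sum_{r=1}^{R} \Big| \frac{1}{N} \sum_{k=0}^{N-r-1} f(T^k x) \cdot \overline{f(T^{k+r} x)} \Big| .
\]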


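Taking now the limsup in N gives
\[
\limsup_{N\to\infty} \sup_{\vartheta \in \mathbb{R}} \Big| \frac{1}{N} \sum_{k=0}^{N-1} e^{ik\vartheta} f(T^k x) \Big|^2
\le \frac{2\, \mathbf{E}(f^2 \mid \mathcal{J}_T)}{R+1} + \frac{4}{R+1} \sum_{r=1}^{R} \big| \mathbf{E}(f \cdot f\circ T^r \mid \mathcal{J}_T) \big| .
\]
Now since T is weakly mixing, $\mathcal{J}_T$ is the trivial σ-algebra of X, and so $\mathbf{E}(f \cdot f\circ T^r \mid \mathcal{J}_T) = \int_X f \cdot f\circ T^r \, d\mu = \langle f, f\circ T^r \rangle$. Further,
\[
\lim_{R\to\infty} \frac{1}{R} \sum_{r=1}^{R} \big| \langle f, f\circ T^r \rangle - \langle f, 1 \rangle^2 \big| = 0 .
\]
By passing to the limsup in R, we finally get
\[
\limsup_{N\to\infty} \sup_{\vartheta \in \mathbb{R}} \Big| \frac{1}{N} \sum_{k=0}^{N-1} e^{ik\vartheta} f(T^k x) \Big|^2 \le 4 \langle f, 1 \rangle^2 ,
\]
which equals 0 if moreover $\int_X f \, d\mu = 0$.

Now consider the case $f \in L^1(\mu)$ with $\int_X f \, d\mu = 0$. An intuitive approximation argument, which is nevertheless worth displaying, suffices to reach the conclusion in that case. Let $\{f_n, n \ge 1\}$ be a sequence of $L^2(\mu)$ elements converging to f in the $L^1(\mu)$ norm. For each of these elements, we have by the previous step
\[
\limsup_{N\to\infty} \sup_{\vartheta \in \mathbb{R}} \Big| \frac{1}{N} \sum_{k=0}^{N-1} e^{ik\vartheta} f_n(T^k x) \Big| \le 2 |\langle f_n, 1 \rangle| ,
\]
almost surely. Further, by Birkhoff's theorem,
\[
\limsup_{N\to\infty} \sup_{\vartheta \in \mathbb{R}} \Big| \frac{1}{N} \sum_{k=0}^{N-1} e^{ik\vartheta} \big( f(T^k x) - f_n(T^k x) \big) \Big|
\le \limsup_{N\to\infty} \frac{1}{N} \sum_{k=0}^{N-1} |f(T^k x) - f_n(T^k x)|
\le \|f - f_n\|_1 ,
\]
almost surely. By the triangle inequality, we get
\[
\limsup_{N\to\infty} \sup_{\vartheta \in \mathbb{R}} \Big| \frac{1}{N} \sum_{k=0}^{N-1} e^{ik\vartheta} f(T^k x) \Big| \le 2 |\langle f_n, 1 \rangle| + \|f - f_n\|_1 ,
\]
for any integer n, almost surely. As the right-hand side tends to zero as n tends to infinity, we obtain the result in the $L^1(\mu)$ case as well.

Wiener–Wintner functions. The uniform version of the Wiener–Wintner theorem has recently given rise to some interesting developments (see Assani, Lesigne and Rudolph [1995], see also Assani [2003], [2004]). Let $(X, \mathcal{A}, \mu, T)$ be an ergodic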

dynamical system and $p \ge 1$. A function f is a Wiener–Wintner function in $L^p(\mu)$ if there exists an $\alpha > 0$ such that
\[
\sup_{N \ge 1} N^{\alpha} \Big\| \sup_{\varepsilon > 0} \Big| \frac{1}{N} \sum_{n=1}^{N} f(T^n x)\, e^{2\pi i n \varepsilon} \Big| \Big\|_p < \infty .
\]
Assani [2004] obtained a spectral characterization of Wiener–Wintner functions, with the help of the almost everywhere continuity of the random Fourier series
\[
H^{\varepsilon}_{\gamma}(f)(x) = \sum_{k \in \mathbb{Z}} (-1)^k\, \frac{f(T^k x)}{|k|^{\gamma}}\, e^{2\pi i k \varepsilon} ,
\]
which he called “the fractional rotated ergodic Hilbert transform”. He showed that an $L^\infty(\mu)$ function f is a Wiener–Wintner function in $L^2(\mu)$ if and only if for almost all x, $H^{\varepsilon}_{\gamma}(f)(x)$ is a continuous function of ε, which is a remarkable fact.

Return times theorems. By Theorem 4.7.1, if $f = 1_A$, then for all x outside a μ-null set $\mathcal{N} = \mathcal{N}_f$ and for all ϑ, the averages $\frac{1}{N} \sum_{n=0}^{N-1} e^{2i\pi n\vartheta} 1_A(T^n x)$ converge to a limit as N tends to infinity. By the spectral inequality, this implies, for any contraction S in a Hilbert space H, that for all $x \notin \mathcal{N}$ and all $g \in H$ the averages
\[
\frac{1}{N} \sum_{n=0}^{N-1} 1_A(T^n x)\, S^n g, \qquad N = 1, 2, \ldots
\]
converge in H. When $Sg = g\circ\sigma$, σ being an automorphism of a probability space $(Y, \mathcal{B}, \nu)$, the question whether these averages converge ν-almost everywhere was settled affirmatively by Bourgain [1988d], and the solution is known as Bourgain's return time theorem.

4.8 Weighted ergodic averages

Let τ be a measure-preserving transformation of a probability space $(X, \mathcal{A}, \mu)$. Let $w = \{w_k, k \ge 1\}$ be a sequence of nonnegative reals with partial sums $W_n := \sum_{k=1}^{n} w_k$. Since the ergodic theorem of Birkhoff for integrable functions can be viewed as an extension of the corresponding law of large numbers for i.i.d. random variables with finite expectation, it is natural to also look at the convergence almost everywhere of the weighted ergodic averages
\[
A_n f := \frac{1}{W_n} \sum_{k=1}^{n} w_k\, f\circ\tau^k , \qquad n = 1, 2, \ldots .
\]


In view of the Beppo Levi theorem, we only have to study the case $\sum_{k=1}^{\infty} w_k = \infty$. So we assume throughout this section that
\[
W_n \uparrow \infty .
\]
Before going further, let us consider some typical means.

Logarithmic means. After arithmetic means (or Cesàro-1 means), these averages are the best known. They are defined for a given sequence $x = \{x_k, k \ge 0\}$ of reals by
\[
\frac{1}{\log n} \sum_{k=1}^{n} \frac{x_k}{k} .
\]
It is a classical fact that Cesàro-1 convergence implies the convergence of the logarithmic means, so that Birkhoff's ergodic theorem does hold for logarithmic averages. A set S of positive integers has logarithmic density when the limit
\[
L(S) := \lim_{n\to\infty} \frac{1}{\log n} \sum_{\substack{k \in S \\ k \le n}} \frac{1}{k}
\]
exists. By a result due to Wintner [1944c; 53], S has logarithmic density if and only if the limit
\[
\lim_{s \to 1+0} (s - 1) \sum_{n \in S} \frac{1}{n^s}
\]
exists, in which case the limit is L(S). See also [Paul: 1962] for more on densities.

Cesàro means. For $\alpha > -1$, we set
\[
A^{\alpha}_0 = 1, \qquad A^{\alpha}_n - A^{\alpha}_{n-1} = A^{\alpha-1}_n, \qquad A^{0}_n = 1 .
\]
Then
\[
A^{\alpha}_n = \sum_{k=0}^{n} A^{\alpha-1}_{n-k} = \frac{(\alpha+1) \cdots (\alpha+n)}{n!}, \qquad
\lim_{n\to\infty} A^{\alpha}_n\, \frac{\Gamma(\alpha+1)}{n^{\alpha}} = 1 .
\]
Further, $A^{\alpha}_n$ increases with n if $\alpha > 0$, and decreases with n if $-1 < \alpha < 0$. Let $0 < \alpha \le 1$. We have the following estimates:
\[
\frac{n^{\alpha}}{\Gamma(\alpha+1)} \le A^{\alpha}_n \le \frac{(n+1)^{\alpha}}{\Gamma(\alpha+1)},
\qquad\text{and}\qquad
A^{\alpha-1}_n \le \frac{n^{\alpha}}{\Gamma(\alpha)} \quad \text{if } n > 0 .
\]
Let $x = \{x_k, k \ge 0\}$ be a sequence of reals. The associated Cesàro-α means for x are defined by
\[
M^{\alpha}_n(x) = M^{\alpha}_n = \frac{1}{A^{\alpha}_n} \sum_{k=0}^{n} A^{\alpha-1}_{n-k}\, x_k .
\]
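The Cesàro numbers $A^{\alpha}_n$ and the means $M^{\alpha}_n$ are simple to compute from the product formula. The sketch below is a minimal illustration: the helper names `cesaro_numbers` and `cesaro_mean` are hypothetical, floating-point arithmetic is assumed to be accurate enough for the small n used, and the oscillating test sequence $x_k = (-1)^k$ is an assumed example.

```python
import math

def cesaro_numbers(alpha, n_max):
    """A^alpha_n for n = 0..n_max, via A^alpha_n = A^alpha_{n-1} * (alpha + n) / n."""
    values = [1.0]
    for n in range(1, n_max + 1):
        values.append(values[-1] * (alpha + n) / n)
    return values

def cesaro_mean(x, alpha):
    """M^alpha_n(x) with n = len(x) - 1, for alpha > 0."""
    n = len(x) - 1
    a = cesaro_numbers(alpha, n)
    a_minus = cesaro_numbers(alpha - 1, n)     # the numbers A^{alpha-1}_j
    return sum(a_minus[n - k] * x[k] for k in range(n + 1)) / a[n]

# Two-sided estimate for 0 < alpha <= 1, checked on one value.
alpha, n = 0.5, 50
a_n = cesaro_numbers(alpha, n)[n]
print(n ** alpha / math.gamma(alpha + 1), a_n, (n + 1) ** alpha / math.gamma(alpha + 1))

# The sequence (-1)^k does not converge, but its Cesaro means tend to 0.
x = [(-1) ** k for k in range(201)]
print(cesaro_mean(x, 1.0), cesaro_mean(x, 0.5))
```

With α = 1 the weights $A^{0}_{n-k}$ are all equal to 1 and $A^{1}_n = n + 1$, so `cesaro_mean(x, 1.0)` is the ordinary arithmetic mean of $x_0, \ldots, x_n$.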


The sequence x is (C, α) (i.e. Cesàro-α) convergent to y if $\lim_{n\to\infty} M^{\alpha}_n = y$. It is well known ([Zygmund: 1959], Theorem 1.21, p. 77) that if x is (C, α) convergent to y for some $\alpha > -1$, then x is (C, β) convergent to y for $\beta \ge \alpha$. In particular, (C, 0) convergence implies (C, α) convergence for $\alpha \ge 0$. And (C, α) convergence for $-1 < \alpha < 0$ implies usual (C, 0) convergence. For an i.i.d. sequence $X = \{X_k, k \ge 0\}$ of random variables it is known that

– for $0 < \alpha \le 1$, X is (C, α) convergent iff $\mathbf{E}\,|X_0|^{1/\alpha} < \infty$,

– for $\alpha \ge 1$, all (C, α) convergences are equivalent with $\mathbf{E}\,|X_0| < \infty$.

See [Deniel: 1989] and references therein. Now, let T be a positive linear contraction of $L^p$. Let $0 < \alpha \le 1$. Irmisch [1980] proved the a.s. convergence of the Cesàro-α means
\[
\frac{1}{A^{\alpha}_n} \sum_{k=0}^{n} A^{\alpha-1}_{n-k}\, T^k f ,
\]
for any $f \in L^p$, if $\alpha p > 1$. This applies in particular if $Tf = f\circ\tau$, where τ is a measure-preserving transformation of some probability space $(\Omega, \mathcal{A}, \mu)$. Irmisch further proved that this result is false in general if $\alpha p = 1$. Deniel [1989; Theorem 7] showed that this is also false if $Tf = f\circ\tau$, τ ergodic, μ non-atomic, by constructing a specific counterexample using Rochlin's towers.

Riesz harmonic means. These means, which must not be confused with logarithmic means, are defined for any sequence $x = \{x_k, k \ge 0\}$ of reals by
\[
\frac{c_n}{\log n} \sum_{k=0}^{n-1} \frac{x_k}{n-k}, \qquad c_n = \frac{\log n}{\sum_{k=1}^{n} \frac{1}{k}} .
\]
The convergence of the Riesz harmonic means implies, for $\alpha > 0$, the convergence of the Cesàro-α means (Hardy [1963; 110]). The Riesz harmonic means appear naturally when $\alpha \to 0$. Let $X = \{X_k, k \ge 0\}$ be an i.i.d. sequence with $\mathbf{E}\, X_0 = 0$. As a consequence of a result of Chow and Lai [1973: Theorem 2],
\[
\lim_{n\to\infty} \frac{c_n}{\log n} \sum_{k=0}^{n-1} \frac{X_k}{n-k} = 0 \ \text{ a.s.}
\iff
\mathbf{E}\, e^{t |X_0|} < \infty \quad (\forall t > 0) .
\]
Deniel [1989; Theorem 11] showed that this result cannot be extended to the stationary case. More precisely, if τ is an ergodic automorphism of $(\Omega, \mathcal{A}, \mu)$, μ non-atomic, there exists a measurable set B such that if $f = 1_B$, then the Riesz harmonic means
\[
H_n f = \frac{c_n}{\log n} \sum_{k=0}^{n-1} \frac{1}{n-k}\, f\circ\tau^k
\]
diverge almost surely. The construction of B goes as follows. Let $n \ge 2$ be some integer. By Kakutani–Rochlin's lemma (see (7.2.2)) there exists $A \in \mathcal{A}$ such that

n2 −1 u 2 2 A, τ A, . . . , τ n −1 A are mutually disjoint and μ u=0 τ A = n μ(A) ≥ 1 − 1/n. Let B= τ u A, D = τ j A. 1≤j 0, k=1 Bk , F = k=1 Dk . We observe that μ(F ) ≥ 1 − and on F lim sup Hn χE ≥ 1/2. n→∞ Further μ(E) ≤ k 1/nk < 1/2. Assume the convergence almost everywhere of Hn χE . The fact that the convergence of the Riesz harmonic means implies the convergence of the Cesàro means to the same limit, would imply that this one equals to μ(E) < 1/2. We consequently get a contradiction. Riesz B-means. Let {bk , k ≥ 1} be positive reals and assume that Bn → ∞. To any sequence x = {xk , k ≥ 0} of reals, one can associate the Riesz B-means defined by the formula N 1 bk xk . σN (x) = BN k=1

Gaposhkin has considered, for stationary sequences, the Riesz B-means with coefficients $(b_k, B_k)$ satisfying some regularity assumptions, namely $b_k = b(k)$ where $b(u) = u^{-1} \varphi(u)$ and φ on $[1, \infty)$ is regularly varying in the sense that for each $\varepsilon > 0$,
\[
\frac{\varphi(u)}{u^{\varepsilon}} \downarrow 0 \quad\text{and}\quad u^{\varepsilon} \varphi(u) \uparrow \infty, \qquad u \to \infty,
\]
and
\[
B(u) = \int_1^{u} b(t)\, dt \to \infty, \qquad u \to \infty .
\]
He obtained in [Gaposhkin: 1995] optimal spectral conditions for the convergence almost everywhere of these means. Let $\xi = \{\xi_k, k \ge 1\}$ be a stationary sequence. If the spectral measure $F(d\lambda)$ of ξ satisfies the condition

log log B 0 0. The elementary identity An − An−1 = −

wn wn An−1 + ξn Wn Wn

applied with n = nk together with the weak law implies that the left-hand side of the above converges to 0, and the first term of the right-hand side converges to −c(ξ1 ) wn in probability. So that Wnk ξnk converges in probability to c(ξ1 ). Thus ξnk converges k


in probability to (ξ1 ). Since ξi are i.i.d., this means that ξ1 is degenerate; hence a contradiction. Notice that lim wn /Wn = 0 and Wn ↑ ∞ ⇐⇒ lim max (wk /Wk ) = 0.

n→∞

n→∞ k≤n

Conversely if limk→∞ wk /Wk = 0, letting F be the distribution function of ξ1 , the weak law holds if and only if lim xF (dx) exists. lim T P{|ξ1 | ≥ T } = 0 and T →∞

T →∞ |x| 0, (ii) supn n1 nk=1 wkα < ∞ for some α > 1,

∞

wn n=1 Wn

= ∞, while (4.8.2)

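then condition (4.8.2) holds (Baxter, Jones, Lin and Olsen [2004: Theorem 3.4]).

Proof. Sufficiency. Put, for $x \ge 1$, $N(x) = \#\{k : w_k / W_k \ge x^{-1}\}$, and $N(x) = 0$ if $0 \le x < 1$. Then N is a nondecreasing function. Consider for $k \ge 1$ the truncated random variables
\[
Y_k = \xi_k \cdot \chi\Big\{ |\xi_k| < \frac{W_k}{w_k} \Big\} .
\]
Observe that
\[
\sum_{k \ge 1} \mathbf{P}\{\xi_k \ne Y_k\}
= \sum_{k \ge 1} \int_{|v| \ge \frac{W_k}{w_k}} F(dv)
= \int \sum_{k \ge 1} \chi_{\{|v| \ge \frac{W_k}{w_k}\}}\, F(dv)
= \int_{v \ne 0} \#\Big\{ k : \frac{w_k}{W_k} \ge |v|^{-1} \Big\}\, F(dv)
= \mathbf{E}\, N(|\xi|) .
\]
Under (4.8.2), we have $N(y) \le Cy$. So if $\mathbf{E}\,|\xi| < \infty$, then $\mathbf{P}\{\xi_k = Y_k \text{ ultimately}\} = 1$. Thus it suffices to prove the result with $Y_k$ in place of $\xi_k$. The random variables
\[
\zeta_k = \frac{w_k}{W_k} \big( Y_k - \mathbf{E}\, Y_k \big)
\]
are independent; further,
\[
\mathbf{E}\, \zeta_k^2 \le \Big( \frac{w_k}{W_k} \Big)^2 \mathbf{E}\, Y_k^2 = \Big( \frac{w_k}{W_k} \Big)^2 \int_{|x| < \frac{W_k}{w_k}} x^2\, F(dx) .
\]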

Given K arbitrary, let ≥ 0 be such that wk ≥ 2− . Wk

min

1≤k≤K

Then K

wk 2 x 2 F (dx) Wk Wk |x|< w

E ζk2 ≤

1≤k≤K

k=1

≤ =

W k

k :|x|< w k ≤2

|x|≤1

≤

+

|x|≤1

j =1

2

x 2 F (dx)

{2j 0 and a subsequence wn {nk , k ≥ 1} such that wnk /Wnk → c, and so Wnk ξ˜nk has a limit distribution, namely k

wn the distribution c(ξ1 − E ξ1χ {|ξ1 | < c}). Consequently P{| Wnk ξ˜nk | ≥ ε} → P{|ξ1 − k E ξ1χ {|ξ1 | < c}| ≥ ε/c}. If P{|ξ1 − E ξ1χ {|ξ1 | < c}| ≥ ε/c} = 0, for every ε > 0, then ξ1 = E ξ1χ{|ξ1 | < c} almost surely, a degenerate case which is excluded.


Otherwise for some $\varepsilon > 0$, we have $P\{|\xi_1 - E\,\xi_1\chi\{|\xi_1| < c\}| \ge \varepsilon/c\} > 0$. Therefore the series $\sum_{n\ge1} P\{|\tilde\xi_n| \ge \varepsilon W_n/w_n\}$ diverges. By the Kolmogorov three series theorem, [Petrov: 1975] p. 266, the series $\sum_{n\ge1} \frac{w_n}{W_n}\tilde\xi_n$ cannot converge almost surely.

Bounded sequences $w$, however, need not satisfy (4.8.2), as follows from the result below.

4.8.2 Theorem. Let $w$ be bounded weights. Then for every centered i.i.d. sequence $\xi$ with $E(|\xi_1|\log^+|\xi_1|) < \infty$ we have $\lim_{n\to\infty} A_n(\xi) = 0$ a.s.

More generally, we will see that it is possible to relax the assumptions on the weights to obtain a.s. convergence, when more integrability conditions on $\xi$ are known. But first, let us return to the ergodic setting and begin with first results ([Lin–Weber: 2007], Theorems 1.2 and 3.1) concerning notably the natural example of sequences $w$ satisfying "monotonicity" or "quasimonotonicity" assumptions.

4.8.3 Theorem. Let $p \ge 1$. Let $f = \{f_k, k \ge 1\}$ denote any sequence in $L^p(\mu)$.

(i) In order that, for every sequence $f$ such that $\frac1n\sum_{k=1}^n f_k$ converges to $f$ almost everywhere (in norm), we also have $\lim_{n\to\infty}\frac{1}{W_n}\sum_{k=1}^n w_k f_k = f$ almost everywhere (respectively in norm), it is necessary and sufficient that
$$\limsup_{n\to\infty} \frac{1}{W_n}\Big( n w_n + \sum_{k=1}^{n-1} k\,|w_k - w_{k+1}| \Big) < \infty. \tag{4.8.3}$$

(ii) Further, for any non-null sequence $\gamma = \{\gamma_k, k \ge 1\}$ of nonnegative numbers and any sequence $f$ such that $\sum_{k=1}^n f_k \big/ \sum_{k=1}^n \gamma_k$ converges to some $f$ almost everywhere (in norm), also
$$\frac{\sum_{k=1}^n w_k f_k}{\sum_{k=1}^n w_k\gamma_k} = \frac{\frac{1}{W_n}\sum_{k=1}^n w_k f_k}{\frac{1}{W_n}\sum_{k=1}^n w_k\gamma_k} \to f$$
almost everywhere (respectively, in norm) as $n$ tends to infinity.

(iii) In particular, if $f$ is such that $\frac1n\sum_{k=1}^n f_k$ converges to $f$ almost everywhere (in norm), then
$$\frac{1}{\sum_{j=1}^n w_j}\sum_{k=1}^n w_k f_k \to f$$
almost everywhere (respectively, in norm) as $n$ tends to infinity.

The standard examples of sequences $f$ for which $\frac1n\sum_{k=1}^n f_k$ converges to $f$ almost everywhere (in norm) are given by $f_k = f\circ\tau^k$, where $\tau$ is an endomorphism of $(X, \mathcal{A}, \mu)$; the convergence follows from Birkhoff's theorem.


4 Pointwise ergodic theorems

Proof. Since $n w_n = \sum_{k=1}^{n-1} k\,(w_{k+1} - w_k) + W_n$, condition (4.8.3) is equivalent to
$$\limsup_{n\to\infty} \frac{1}{W_n}\sum_{k=1}^{n-1} k\,|w_k - w_{k+1}| < \infty.$$

(i) is a special case of a general result on summability methods which are stronger than the Cesàro method (see Zeller [1958: 100], see also Dunford–Schwartz [1958: 75]). If $A$ is a matrix which preserves Cesàro convergence and $C$ is the Cesàro matrix, then $AC^{-1}$ is regular (preserves convergence). The sufficiency of (4.8.3) for preserving convergence of Cesàro averages (also in norm) follows from (iii), with $\gamma_k \equiv 1$.

(ii) The proof is similar to that of Theorem 8.2.1 in Krengel [1985]. Given $f$ put $F_n = \sum_{k=1}^n f_k$. We denote by $|F|$ either $|F(x)|$ for a given point $x$ or the norm $\|F\|_p$, according to the given mode of convergence. By Abel's summation formula we obtain
$$\frac{1}{W_n}\sum_{k=1}^n w_k f_k = \frac{1}{W_n}\sum_{k=1}^{n-1} (w_k - w_{k+1})F_k + \frac{w_n}{W_n}F_n. \tag{4.8.4}$$
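The Abel summation identity (4.8.4) is purely algebraic, so it can be checked exactly on a small example. The following is a minimal sketch using rational arithmetic; the sample weights $w_k = k^2$, the values $f_k = (-1)^k/k$, and the helper names `weighted_average` and `abel_form` are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction


def weighted_average(w, f):
    """Left-hand side of (4.8.4): (1/W_n) * sum_{k=1}^n w_k f_k."""
    W_n = sum(w)
    return sum(wk * fk for wk, fk in zip(w, f)) / W_n


def abel_form(w, f):
    """Right-hand side of (4.8.4), built from partial sums F_k = f_1 + ... + f_k."""
    n = len(w)
    W_n = sum(w)
    F = []
    running = Fraction(0)
    for fk in f:
        running += fk
        F.append(running)
    # Sum over k = 1..n-1 of (w_k - w_{k+1}) F_k (indices are 0-based here).
    head = sum((w[k] - w[k + 1]) * F[k] for k in range(n - 1))
    return head / W_n + w[-1] * F[-1] / W_n


# Hypothetical sample data: weights k^2 and values (-1)^k / k, k = 1..8.
w = [Fraction(k * k) for k in range(1, 9)]
f = [Fraction((-1) ** k, k) for k in range(1, 9)]
lhs = weighted_average(w, f)
rhs = abel_form(w, f)
print(lhs == rhs)
```

Exact fractions are used so that the two sides can be compared with `==` rather than up to rounding error.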

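We are given $\gamma$, a non-null sequence of nonnegative numbers, and $f \subset L^p$ such that $\sum_{k=1}^n f_k \big/ \sum_{k=1}^n \gamma_k$ converges a.e. (in norm) to some limit, which we denote by $g$. Denote $G_n := \sum_{k=1}^n \gamma_k$. By assumption, we have $0 < a \le w_k$ and $W_k \le kb$ for every $k$. Hence $G_n^* := \frac{1}{W_n}\sum_{k=1}^n w_k\gamma_k \ge \frac{a}{nb}\,G_n$. To simplify the exposition, we assume $\gamma_1 > 0$. Replacing $f_k$ and $F_k$ in (4.8.4) by $\gamma_k$ and $G_k$ respectively and multiplying by $g$, we obtain (after subtraction from (4.8.4) and division by $G_n^*$),
$$\Big|\frac{\sum_{k=1}^n w_k f_k}{\sum_{k=1}^n w_k\gamma_k} - g\Big| \le \frac{1}{W_n}\sum_{k=1}^{n-1} |w_k - w_{k+1}|\,k\,\Big|\frac{F_k}{G_k} - g\Big|\,\frac{G_k}{kG_n^*} + \frac{n w_n}{W_n}\,\Big|\frac{F_n}{G_n} - g\Big|\,\frac{G_n}{nG_n^*}.$$
For $\varepsilon > 0$ we have $\big|\frac{F_k}{G_k} - g\big| < \varepsilon$ for $k > N$. Splitting the summation above into a sum up to $N$ and a sum for $k > N$, the first sum converges to 0 as $n \to \infty$ since $W_n \to \infty$ (and $G_k^* \le G_n^*$ for $k \le N$), and using (4.8.3) we obtain
$$\limsup_{n\to\infty}\Big|\frac{\sum_{k=1}^n w_k f_k}{\sum_{k=1}^n w_k\gamma_k} - g\Big| \le \limsup_{n\to\infty}\,\varepsilon\Big(\frac{b}{a}\,\frac{1}{W_n}\sum_{k=N+1}^{n-1}|w_k - w_{k+1}|\,k + \frac{n w_n}{W_n}\,\frac{b}{a}\Big) \le C\cdot\varepsilon.$$
Note that when $\sum_{k=1}^\infty \gamma_k = \infty$, it is enough to assume $\liminf_{k\to\infty} w_k > 0$, since then $\sum_k w_k\gamma_k = \infty$, and we can apply (ii) to the sequence $w_{J+k}$ with a fixed large $J$.

(iii) The additional assumptions on $w$ in (ii) were needed to obtain $\sup_n \frac{G_n}{nG_n^*} < \infty$; since this follows from the assumptions on $\gamma$ in (iii), the proof of (ii) applies.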

The following result of practical interest is now easily deduced from Theorem 4.8.3. 4.8.4 Corollary. In each of the following cases, condition (4.8.3) is satisfied (and hence all the assertions of Theorem 4.8.3 hold):


(i) For some $s \ge 0$ the sequence $\{k^{-s}w_k, k \ge 1\}$ is nonincreasing.

(ii) For some $s \ge 0$ the sequence $\{k^{s}w_k, k \ge 1\}$ is nondecreasing and satisfies
$$\sup_n \frac{n w_n}{W_n} < \infty. \tag{4.8.5}$$

Proof. (i) We may of course assume $s \ge 1$. We use the given monotonicity to estimate
$$\sum_{k=1}^{n-1} k\,|w_k - w_{k+1}| \le \sum_{k=1}^{n-1} k^{1+s}\Big(\frac{w_k}{k^s} - \frac{w_{k+1}}{(k+1)^s}\Big) + \sum_{k=1}^{n-1} k\,\frac{(k+1)^s - k^s}{(k+1)^s}\,w_{k+1}.$$
Since $s \ge 1$, the second sum is bounded by
$$\sum_{k=1}^{n-1} \frac{s(k+1)^{s-1}}{(k+1)^s}\,k\,w_{k+1} \le s\sum_{j=2}^{n} w_j.$$
For the first sum we have the estimate
$$\sum_{k=1}^{n-1}\Big(k w_k - \frac{k^{s+1}}{(k+1)^s}\,w_{k+1}\Big) = \sum_{k=1}^{n-1}\big(k w_k - (k+1)w_{k+1}\big) + \sum_{k=1}^{n-1}\frac{(k+1)^{s+1} - k^{s+1}}{(k+1)^s}\,w_{k+1} \le w_1 - n w_n + (1+s)\sum_{k=1}^{n-1} w_{k+1}.$$
We obtain $\sum_{k=1}^{n-1} k\,|w_k - w_{k+1}| + n w_n \le w_1 + (1+2s)W_n$, which implies (4.8.3). Note that (i) easily implies (4.8.5), since $W_n = \sum_{k=1}^n k^s k^{-s} w_k \ge \sum_{k=1}^n k^s n^{-s} w_n \ge \frac{1}{s+1}\,n w_n$.

(ii) We may now assume $s \ge 2$, and use the monotonicity to estimate
$$\begin{aligned}
\sum_{k=1}^{n-1} k\,|w_k - w_{k+1}| &\le \sum_{k=1}^{n-1}\frac{1}{k^{s-1}}\big((k+1)^s w_{k+1} - k^s w_k\big) + \sum_{k=1}^{n-1}\frac{(k+1)^s - k^s}{k^{s-1}}\,w_{k+1} \\
&= \sum_{k=1}^{n-1}\big((k+1)w_{k+1} - k w_k\big) + \sum_{k=1}^{n-1}\Big(\frac{(k+1)^s}{k^{s-1}} - (k+1)\Big)w_{k+1} + \sum_{k=1}^{n-1}\frac{(k+1)^s - k^s}{k^{s-1}}\,w_{k+1} \\
&= n w_n - w_1 + \sum_{k=1}^{n-1}\frac{(k+1)^{s-1} - k^{s-1}}{k^{s-1}}\,(k+1)w_{k+1} + \sum_{k=1}^{n-1}\frac{(k+1)^s - k^s}{k^{s-1}}\,w_{k+1} \\
&\le n w_n + (s-1)\sum_{k=1}^{n-1}\Big(\frac{k+1}{k}\Big)^{s-1} w_{k+1} + s\sum_{k=1}^{n-1}\Big(\frac{k+1}{k}\Big)^{s-1} w_{k+1} \\
&\le n w_n + (2s-1)\,2^{s-1}\sum_{j=2}^{n} w_j,
\end{aligned}$$
and together with (4.8.5) we conclude that (4.8.3) holds.
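The quantities in (4.8.3) and (4.8.5) are easy to evaluate numerically for concrete weights. The sketch below is only an illustration under assumptions of our own: the helper names, the sample sizes, and the two test families (power weights $w_k = k^2$ and geometric weights $w_k = 1.5^k$) are hypothetical choices, and a finite computation can suggest but never prove boundedness.

```python
def condition_483_ratio(w):
    """(n*w_n + sum_{k=1}^{n-1} k*|w_k - w_{k+1}|) / W_n, for n = len(w)."""
    n = len(w)
    W_n = sum(w)
    # w[k - 1] is w_k and w[k] is w_{k+1} in the 1-based notation of the text.
    variation = sum(k * abs(w[k - 1] - w[k]) for k in range(1, n))
    return (n * w[-1] + variation) / W_n


def growth_485_ratio(w):
    """n*w_n / W_n, the quantity required to stay bounded in (4.8.5)."""
    return len(w) * w[-1] / sum(w)


def power_weights(n, t):
    """Hypothetical example: w_k = k^t."""
    return [float(k) ** t for k in range(1, n + 1)]


def geometric_weights(n, t):
    """Hypothetical example: w_k = t^k."""
    return [t ** k for k in range(1, n + 1)]


for n in (100, 200, 400):
    poly = condition_483_ratio(power_weights(n, 2.0))
    geo = growth_485_ratio(geometric_weights(n, 1.5))
    print(n, round(poly, 3), round(geo, 3))
```

For the power weights the first ratio stays of moderate size as $n$ grows, while for the geometric weights $n w_n / W_n$ grows roughly linearly in $n$.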


Remarks. Trigonometric series with coefficients satisfying the "quasimonotonicity" assumptions of the corollary were considered by Lebed [1967]. Corollary 4.8.4 applies also to non-monotone sequences. As an example satisfying (i), define $w_k = 2^{-js}k^s$ for $2^j \le k < 2^{j+1}$. Since $w_{2^j} = 1$ and the sequence increases in each dyadic block, it is not monotone. Part (ii) of the corollary applies, for example, to $w_k := 2^{js}k^{-s}$ for $2^j \le k < 2^{j+1}$; an unbounded example is $w_k := k + \frac32\sin k$. For increasing sequences, condition (4.8.5) is satisfied when $w_k = k^t$ for a fixed $t > 0$, or $w_k = (\log k)^t$ for a fixed $t > 0$, but not when $w_k = t^k$ for some $t > 1$. For more details and examples we refer to Lin and Weber [2005].

Before discussing more precisely the $L^2(\mu)$ setting, let us recall a well-known fact (see for instance Hardy, Littlewood and Polya [1934: 120]), from which follows a simple but useful result. Let $\Phi(t) > 0$ be nondecreasing for $t \ge t_0 \ge 0$ with $\lim_{t\to\infty}\Phi(t) = \infty$; then we have
$$\sum_k \frac{1}{\Phi(k)} < \infty \implies \sum_k \frac{w_k}{\Phi(W_k)} < \infty. \tag{4.8.6}$$
This follows from the inequality
$$\frac{w_k}{\Phi(W_k)} \le \int_{W_{k-1}}^{W_k} \frac{dt}{\Phi(t)},$$
valid when $W_{k-1} \ge t_0$.

4.8.5 Proposition. For any $\alpha > 1$ and for any sequence $\{f_k, k \ge 1\} \subset L^1$ satisfying $\sup_k \|f_k\|_1 < \infty$ we have
$$\lim_{n\to\infty}\frac{\sum_{k=1}^n w_k f_k}{W_n\log^\alpha(1 + W_n)} = 0 \quad\text{a.e.}$$

Proof. Apply (4.8.6) with $\Phi(t) = t\log^\alpha(1+t)$; the result follows from Beppo Levi's theorem and Kronecker's lemma.

For $p > 1$ the proposition is also an immediate consequence of the remark to Corollary 9.3.7 (c) (with $\xi_k = w_k f_k$). By taking $\Phi(t) = t\log(1+t)[\log\log(1+t)]^\alpha$ with $\alpha > 1$, the proof also yields
$$\lim_{n\to\infty}\frac{\sum_{k=1}^n w_k f_k}{W_n\log(1 + W_n)[\log\log(1 + W_n)]^\alpha} = 0 \quad\text{a.e.}$$
From now on we write
$$M_n := \sum_{k=1}^n w_k^2. \tag{4.8.7}$$

We now consider a sequence of functions $f = \{f_k, k \ge 1\} \subset L^2(\mu)$. Let $w$ be a sequence of nonnegative weights. We assume the following relation between the weights and the functions: there exists a finite constant $C_0$ such that
$$\Big\|\sum_{k=n}^m w_k f_k\Big\|_2^2 \le C_0\sum_{k=n}^m w_k^2, \qquad \forall\, m \ge n \ge 1. \tag{4.8.8}$$


Condition (4.8.8) obviously holds for norm-bounded orthogonal sequences, e.g., orthonormal sequences (for any sequence of weights). Such a condition is also realized by (1.3.11), when $f$ satisfies
$$\sup_i \sum_j |\langle f_i, f_j\rangle| < \infty \tag{4.8.9}$$
(e.g., $f_k$ are centered and satisfy $\sup_i\sum_j |\langle f_i, f_j\rangle| < \infty$). To get (4.8.9) it suffices for instance that for any integers $j \ge i \ge 1$,
$$|\langle f_i, f_j\rangle| \le C_1 e^{-C_2|j-i|}. \tag{4.8.10}$$
Since the weights are nonnegative, (4.8.9) holds for centered negatively correlated random variables with uniformly bounded variances. Another example of a sequence satisfying (4.8.9) is a wide-sense stationary sequence with bounded spectral density.

4.8.6 Theorem. Assume that the sequence of weights $w$ satisfies
$$\log\frac{1}{w_n} = O(\log M_n) \tag{4.8.11}$$
and $f$ satisfies (4.8.9). Then for any $b > 3/2$ we have
$$\lim_{n\to\infty}\frac{\sum_{k=1}^n w_k f_k}{M_n^{1/2}\log^b M_n} = 0 \quad\text{a.s.}$$
If in addition
$$\limsup_{n\to\infty}\frac{M_n\log^\gamma M_n}{W_n^2} < \infty \quad\text{for some } \gamma > 3, \tag{4.8.12}$$
then we have
$$\lim_{n\to\infty}\frac{\sum_{k=1}^n w_k f_k}{W_n} = 0 \quad\text{a.s.}$$

Proof. The first half of Theorem 4.8.6 is an immediate consequence of Corollary 9.3.7. The second half follows from the first using (4.8.12).

We now explore some intermediate conditions.

4.8.7 Theorem. Assume that for some $0 < \beta < 1$ we have
$$\sup_n \frac{1}{W_n}\Big(n^\beta w_n + \sum_{k=1}^{n-1} k^\beta\,|w_k - w_{k+1}|\Big) < \infty. \tag{4.8.13}$$
Then for $p > \frac1\beta$ and $T$ power-bounded on $L^p$, $A_n(T)f \to 0$ a.e. for $f \in L^p$ which for some $\alpha \in (1-\beta, 1]$ satisfies
$$\sup_n \Big\|\frac{1}{n^{1-\alpha}}\sum_{k=1}^n T^k f\Big\|_p < \infty. \tag{4.8.14}$$


Proof. Fix $p$ and $T$ as specified in the theorem, and for $f \in L^p(\mu)$ denote $S_n f := \sum_{k=1}^n T^k f$. If $f \in L^p(\mu)$ satisfies (4.8.14), then, since $\beta > \max\{1-\alpha, 1/p\}$, Proposition 11.3.8 yields
$$\frac{1}{n^\beta}\,S_n f \to 0 \quad\text{a.e.} \tag{4.8.15}$$
Using Abel's summation we have
$$\frac{1}{W_n}\sum_{k=1}^n w_k T^k f = \frac{1}{W_n}\sum_{k=1}^{n-1} k^\beta (w_k - w_{k+1})\,\frac{1}{k^\beta}\,S_k f + \frac{n^\beta w_n}{W_n}\,\frac{1}{n^\beta}\,S_n f.$$
We now obtain the assertion of the theorem by using (4.8.15) and (4.8.13), similarly to the proof of Theorem 4.8.3.

Remarks. Condition (4.8.14) implies that $f$ is a fractional coboundary for $T$. For additional information we refer to Derriennic–Lin [2001], where (4.8.15) is proved for Dunford–Schwartz contractions. Condition (4.8.13) implies also that $\frac{1}{W_n}\sum_{k=1}^n w_k f_k \to 0$ a.e. for any sequence $\{f_k, k \ge 1\} \subset L^p(\mu)$, with $p > \frac1\beta$, with $\sup_k \|f_k\|_p < \infty$ and satisfying
$$\sup_n \Big\|\frac{1}{n^{1-\alpha}}\sum_{k=1}^n f_k\Big\|_p < \infty \quad\text{for some } \alpha \in \Big(\frac{p}{p-1}(1-\beta),\, 1\Big].$$
This follows from Proposition 1 of Cohen and Lin [2003] (with $\delta = 1-\beta$). However, the condition on $\alpha$ here is more restrictive than in the theorem. For nondecreasing weights, condition (4.8.13) is equivalent to $\sup_n \frac{n^\beta w_n}{W_n} < \infty$.

4.8.8 Corollary. Assume that condition (4.8.13) holds for some $\beta > \frac12$. Then for every power-bounded $T$ on $L^2(\mu)$ and $f \in L^2$ with $\sup_n \big\|\frac{1}{\sqrt n}\,S_n f\big\|_2 < \infty$, we have $A_n(T)f \to 0$ a.s.

4.8.9 Corollary. Let $1 < q < 2$ with dual index $p = q/(q-1)$, and assume
$$\sup_n \Big(\frac{n w_n^q}{W_n^q} + \frac{1}{W_n^q}\sum_{k=1}^{n-1} k^q\,|w_k - w_{k+1}|^q\Big) < \infty. \tag{4.8.16}$$
Then for every $T$ power-bounded in $L^p(\mu)$ and $f \in L^p(\mu)$ satisfying (4.8.14) with $\alpha > 1/p$ we have $A_n(T)f \to 0$ a.e.

Proof. We first show that for any $\beta < 1/q$, (4.8.13) is satisfied. By Hölder's inequality
$$\frac{1}{W_n}\sum_{k=1}^{n-1} k^\beta\,|w_k - w_{k+1}| = \frac{1}{W_n}\sum_{k=1}^{n-1} k\,|w_k - w_{k+1}|\,\frac{1}{k^{1-\beta}} \le \Big(\frac{1}{W_n^q}\sum_{k=1}^{n-1} k^q\,|w_k - w_{k+1}|^q\Big)^{1/q}\Big(\sum_{k=1}^{n-1}\frac{1}{k^{p(1-\beta)}}\Big)^{1/p}.$$


Since $p(1-\beta) > 1$ the series $\sum_{k=1}^\infty \frac{1}{k^{p(1-\beta)}}$ converges, so (4.8.13) holds. For $\alpha > 1/p$ we pick $\beta \in (1-\alpha, \frac1q)$ such that $\beta > 1/p$, which is possible since $q < 2$, and apply Corollary 4.8.8.

Note that the proof that (4.8.16) implies (4.8.13) for $\beta < 1/q$ is valid for any $q > 1$; it is the application of Corollary 4.8.8 to the dual index that requires $q < 2$.

Now we turn to the i.i.d. case and will essentially discuss some extensions of Theorem 4.8.2 that allow us to weaken the assumptions on the weights, when balancing this with a few more integrability conditions on the sequence of random variables $\xi = \{\xi_k, k \ge 1\}$. We assume $w_k > 0$ for every $k$. We first begin with a simple proposition which does not require identical distribution.

4.8.10 Proposition. Let $1 < p \le 2$.

(i) If $\sum_{k=1}^\infty w_k^p < \infty$, then for any centered independent sequence $\xi$ with $\sup_k E|\xi_k|^p < \infty$, the series $\sum_{k=1}^\infty w_k\xi_k$ converges almost surely.

(ii) We have almost sure convergence of the series $\sum_{k=1}^\infty \frac{w_k\xi_k}{W_k}$ for every centered independent sequence $\xi$ with $\sup_k E|\xi_k|^p < \infty$, if and only if $\sum_{k=1}^\infty \big(\frac{w_k}{W_k}\big)^p < \infty$.

(iii) The following are equivalent:

(a) $\sum_{k=1}^\infty \big(\frac{w_k}{W_k}\big)^p < \infty$,

(b) $\sum_{k=1}^\infty \frac{w_k\xi_k}{W_k}$ converges almost surely for any centered independent $\xi$ with $\sup_k E|\xi_k|^p < \infty$,

(c) $\frac{1}{W_n}\sum_{k=1}^n w_k\xi_k \to 0$ almost surely for any centered independent $\xi$ with $\sup_k E|\xi_k|^p < \infty$.

Proof. Assertion (i) follows from Marcinkiewicz–Zygmund [1937]. In part (ii), if the series $\sum_{k=1}^\infty \big(\frac{w_k}{W_k}\big)^p$ converges, then for $\xi$ as in the statement, the series $\sum_{k=1}^\infty \frac{w_k\xi_k}{W_k}$ converges almost surely by (i). Conversely, if $\sum_{k=1}^\infty \big(\frac{w_k}{W_k}\big)^p = \infty$, then a result of Marcinkiewicz–Zygmund [1937: Theorem 5] yields the existence of a sequence $\xi$ of independent centered random variables with $E|\xi_k|^p = 1$ for which the series $\sum_{k=1}^\infty \frac{w_k\xi_k}{W_k}$ is almost surely divergent.

In assertion (iii), (a) implies (b) by (ii), and (b) implies (c) by Kronecker's lemma. Now assume (c). An inspection of the construction of the example in the quoted result of Marcinkiewicz–Zygmund shows that if (a) does not hold, then in fact there is $\{\xi_k\}$ centered independent with $E(|\xi_k|^p) = 1$ such that $\limsup \frac{w_k\xi_k}{W_k} \ge 1$ a.s. (we define $\xi_k = W_k x_k / k w_k$, where $x_k$ are the random variables defined in Marcinkiewicz–Zygmund [1937]). This contradicts (c), since $\frac{1}{W_n}\sum_{k=1}^n w_k\xi_k$ is then a.s. non-convergent to 0 by the identity
$$\frac{1}{W_n}\sum_{k=1}^n w_k\xi_k - \frac{1}{W_{n-1}}\sum_{k=1}^{n-1} w_k\xi_k = \frac{w_n\xi_n}{W_n} - \frac{w_n}{W_n}\,\frac{1}{W_{n-1}}\sum_{k=1}^{n-1} w_k\xi_k.$$


The following result is in the same spirit as Theorem 4.8.2, but the weights need not be bounded.

4.8.11 Theorem. Let $w$ be a weight sequence with $\sum_{k=1}^\infty w_k = \infty$. If
$$\sup_{n\ge1}\frac{1}{W_n}\sum_{k=1}^n w_k\big(\log(w_k + 1)\big)^\beta < \infty \quad\text{for some } \beta > 1, \tag{4.8.17}$$
then for any i.i.d. sequence $\xi$ such that $E|\xi_1|(\log^+|\xi_1|)^\gamma < \infty$ for some $\gamma > 1$, we have $\lim_{n\to\infty}\frac{1}{W_n}\sum_{k=1}^n w_k\xi_k = E\,\xi_1$ a.s.

The proof of the theorem will depend on a general method for obtaining sufficient conditions, described below. Let $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$ be a differentiable non-decreasing function satisfying

(i) $0 \le \varphi'(u) \le C\,\frac{\varphi(u)}{u}$,

(ii) $u^{-1}\varphi(u)$ is nondecreasing for $u \ge u_0$,

(iii) $\varphi(uv) \le C\varphi(u)\varphi(v)$, $u \ge 1$, $v \ge v_0$,

(iv) $\int_t^\infty \frac{du}{\varphi(u)} < \infty$ for some $t > 0$. (4.8.18)

Note that (iv) and (ii) imply that $\lim_{u\to\infty} u^{-1}\varphi(u) = \infty$. Typical examples are the functions $\varphi(u) = u^\alpha(\log(1+u))^\beta$, $\alpha \ge 1$, $\beta \in \mathbb{R}_+$, with $\beta > 1$ when $\alpha = 1$. It can be shown that (ii) implies $\varphi(u+v) \ge \varphi(u) + \varphi(v)$. Indeed, by (ii),
$$\frac{\varphi(u+v)}{u+v} \ge \max\Big(\frac{\varphi(u)}{u}, \frac{\varphi(v)}{v}\Big),$$
so that
$$\varphi(u+v) = u\,\frac{\varphi(u+v)}{u+v} + v\,\frac{\varphi(u+v)}{u+v} \ge u\,\frac{\varphi(u)}{u} + v\,\frac{\varphi(v)}{v} = \varphi(u) + \varphi(v).$$
Thereby, by induction,
$$\varphi\Big(\sum_{k=1}^n u_k\Big) \ge \sum_{k=1}^n \varphi(u_k). \tag{4.8.19}$$
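The superadditivity (4.8.19) can be spot-checked numerically for the typical example $\varphi(u) = u^\alpha(\log(1+u))^\beta$. This is a minimal sketch, assuming the parameter choice $\alpha = 1$, $\beta = 2$ and a few arbitrary sample points; the function names are hypothetical, and a finite check is only a sanity test, not a proof.

```python
import math


def phi(u, alpha=1.0, beta=2.0):
    """Example from the text: phi(u) = u^alpha * (log(1 + u))^beta."""
    return u ** alpha * math.log1p(u) ** beta


def superadditivity_gap(us, **params):
    """phi(sum u_k) - sum phi(u_k); nonnegative whenever (4.8.19) holds."""
    return phi(sum(us), **params) - sum(phi(u, **params) for u in us)


# Arbitrary positive sample points (illustrative only).
samples = [[0.5, 2.0], [1.0, 1.0, 1.0], [3.0, 7.5, 0.25, 10.0]]
gaps = [superadditivity_gap(us) for us in samples]
print(gaps)
```

With $\alpha = 1$ the ratio $\varphi(u)/u = (\log(1+u))^\beta$ is nondecreasing on all of $(0, \infty)$, so the gap should be nonnegative at every sample point.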


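Let $w$ be a weight sequence. Put $T_0 = 0$ and $T_n = \sum_{k=1}^n \varphi(w_k)$ for $n \ge 1$. Property (4.8.19) implies that $T_n \le \varphi(W_n)$ for all $n \ge 1$. When $\varphi(t) = t^\alpha$ with $\alpha > 1$ this means $\sum_{k=1}^n w_k^\alpha \le W_n^\alpha$, which is weaker than the necessary condition (1.3.8), and thus yields no information for the weighted strong law of large numbers. We therefore introduce the following assumption:
$$\sup_n \frac{\sum_{k=1}^n \varphi(w_k)}{W_n} < \infty. \tag{4.8.20}$$
Inequality (4.8.20) implies $w_n/W_n \to 0$. Denote $\kappa = \sup_n T_n/W_n$. Fix some $\varepsilon > 0$. If $\frac{\varphi(w_n)}{w_n} > \frac{\kappa}{\varepsilon}$, then
$$\frac{w_n}{W_n} = \frac{\varphi(w_n)}{W_n}\cdot\frac{w_n}{\varphi(w_n)} \le \kappa\,\frac{w_n}{\varphi(w_n)} < \varepsilon;$$
if $\frac{\varphi(w_n)}{w_n} \le \frac{\kappa}{\varepsilon}$, then convergence to infinity in (ii) implies $w_n \le A_\varepsilon$, so $w_n/W_n \le A_\varepsilon/W_n$, which is less than $\varepsilon$ for large $n$.

Since $\varphi$ is nondecreasing and $\varphi(w_n) = T_n - T_{n-1}$, for any positive integer $j$ (iii) yields
$$\begin{aligned}
\#\{n : W_n \le j w_n\} &\le j + \#\{n > j : W_n \le j w_n\} \le j + \sum_{n>j}\frac{\varphi(j w_n)}{\varphi(W_n)} \\
&\le j + C\varphi(j)\sum_{n\ge j}\frac{T_n - T_{n-1}}{\varphi(W_n)} = j + C\varphi(j)\Big[\frac{-T_{j-1}}{\varphi(W_j)} + \sum_{n=j}^\infty T_n\Big(\frac{1}{\varphi(W_n)} - \frac{1}{\varphi(W_{n+1})}\Big)\Big].
\end{aligned}$$
But by the mean value theorem and the assumptions made on $\varphi$, we have
$$\frac{1}{\varphi(W_n)} - \frac{1}{\varphi(W_{n+1})} = \frac{\varphi(W_{n+1}) - \varphi(W_n)}{\varphi(W_n)\varphi(W_{n+1})} \le C\,\frac{w_{n+1}}{W_{n+1}}\,\frac{1}{\varphi(W_n)}.$$
Inserting this and using $T_n \le T_{n+1}$ we get
$$\begin{aligned}
\#\{n : W_n \le j w_n\} &\le j + C\varphi(j)\Big[\frac{-T_{j-1}}{\varphi(W_j)} + C\sum_{n=j}^\infty \frac{T_{n+1}}{W_{n+1}}\cdot\frac{w_{n+1}}{\varphi(W_n)}\Big] \\
&\le j + C^2\varphi(j)\,\sup_{k\ge j}\frac{T_k}{W_k}\,\sum_{n=j}^\infty \frac{w_{n+1}}{\varphi(W_n)} \le j + C^2\varphi(j)\,\kappa\sum_{n=j}^\infty \frac{w_{n+1}}{\varphi(W_n)}.
\end{aligned}$$
(Recall that by (4.8.20), $\kappa = \sup_{k\ge1}\frac{T_k}{W_k}$ is finite.) Now
$$\frac{w_{n+1}}{\varphi(W_n)} \le \int_{W_n}^{W_{n+1}}\frac{du}{\varphi(u)}$$
yields
$$\#\{n : W_n \le j w_n\} \le j + C^2\kappa\,\varphi(j)\int_{W_j}^\infty \frac{du}{\varphi(u)}, \tag{4.8.21}$$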


which is finite by (4.8.18) (iv). We now extend the definition of $W_n$ to $\mathbb{R}_+$ by putting $W_0 = 0$ and $W(t) = W_{[t]}$, where $[t]$ stands for the integer part of $t$. For $t \in \mathbb{R}_+$ and $s \in \mathbb{R}$ define
$$G(t) = \begin{cases} 0 & \text{if } 0 \le t < 1, \\ \max\Big(t,\ \varphi(t)\displaystyle\int_{W(t)}^\infty \frac{du}{\varphi(u)}\Big) & \text{if } t \ge 1, \end{cases} \qquad H(s) = s^2\int_{t\ge|s|}\frac{G(t)}{t^3}\,dt. \tag{4.8.22}$$
$G(t)$ is always finite, and by (4.8.21),
$$\sup_{j\ge1}\frac{\#\{n : W_n \le j w_n\}}{G(j)} < \infty. \tag{4.8.23}$$
The function $H$ need not be finite (e.g., $\varphi(u) = u^2$ and $w_n = 1/n$). When it is, we obtain

4.8.12 Theorem. Let $\varphi$ satisfy (4.8.18), and let $w$ with divergent series satisfy (4.8.20). Then for any i.i.d. sequence $\xi$ with $E\,H(\xi_1) < \infty$ we have
$$\lim_{n\to\infty}\frac{\sum_{k=1}^n w_k\xi_k}{\sum_{k=1}^n w_k} = E\,\xi_1 \quad\text{a.s.}$$

Proof. The proof is built upon (4.8.23) and an argument that lies in the proof of Theorem 3 of Jamison, Orey and Pruitt [1965]. For any positive real $t$ put $N(t) := \#\{n : W_n \le t w_n\}$. For $t < 1$ we have $N(t) = 0$. For $t \ge 1$ we have $\varphi(t+1) \le \varphi(2t) \le C\varphi(2)\varphi(t)$, and $[t] + 1 \le t + 1$. In view of (4.8.23) and the definition of $G$,
$$\#\{n : W_n \le t w_n\} \le K'\,G([t] + 1) \le K\,G(t).$$
Thus
$$E\,\xi^2\int_{t\ge|\xi|}\frac{\#\{n : W_n \le t w_n\}}{t^3}\,dt \le K\,E\,\xi^2\int_{t\ge|\xi|}\frac{G(t)}{t^3}\,dt = K\,E\,H(\xi) < \infty,$$
by assumption. It follows that condition (4.8.2) of Theorem 4.8.1 is satisfied; by this theorem, as well as the remark at the bottom of p. 41 in Jamison, Orey and Pruitt [1965], the result follows.

Now we can pass to the

Proof of Theorem 4.8.11. Since in (4.8.17) we can always replace $\beta$ by a smaller value (still greater than 1), and also $\gamma$ can be replaced by any smaller value $> 1$, we may always assume $\gamma = \beta$ and $\beta \le 2$. Put $\varphi(u) := u(\log(1+u))^\beta$. Then the assumptions


on $\varphi$ and $\{w_k\}$ of Theorem 4.8.12 are satisfied. Since $W(t) \to \infty$, we have $W(t) \ge e$ for $t \ge t_1$, and
$$\int_{W(t)}^\infty \frac{du}{\varphi(u)} \le \int_{W(t)}^\infty \frac{du}{u(\log u)^\beta} = \frac{1}{(\beta-1)(\log W(t))^{\beta-1}} \tag{4.8.24}$$
for large $t$. This yields that $G(t)$ as defined before satisfies $G(t) \le \frac{1}{\beta-1}\,t\,(\log(1+t))^\beta$ for large $t$. For $|s|$ large enough we obtain
$$\int_{|s|}^\infty \frac{G(t)}{t^3}\,dt \le c\int_{|s|}^\infty \frac{(\log(1+t))^\beta}{t^2}\,dt \le c'\int_{|s|}^\infty \frac{(\log t)^\beta}{t^2}\,dt \le 2c'\int_{|s|}^\infty \Big(1 - \frac{\beta}{\log t}\Big)\frac{(\log t)^\beta}{t^2}\,dt = 2c'\,\frac{(\log|s|)^\beta}{|s|}.$$
We can now conclude that $H(s) = s^2\int_{|s|}^\infty \frac{G(t)}{t^3}\,dt \le C|s|(1 + (\log^+|s|)^\beta)$. Hence $E\,H(\xi_1) < \infty$ when $E|\xi_1|(\log^+|\xi_1|)^\beta < \infty$. Now Theorem 4.8.12 yields the assertion, since $\gamma = \beta$.

Condition (4.8.17) is satisfied by any bounded weight sequence. However, Theorem 4.8.11 does not include Theorem 4.8.2, because the latter requires a slightly weaker integrability property of $\xi_1$. If we want to apply Theorem 4.8.12 to $\varphi(t) = t^\alpha$ with $\alpha > 1$, condition (4.8.20) becomes $\{M_n^{(\alpha)}/W_n\}$ bounded, which implies (4.8.17). However, without any knowledge of the size of $W_n$ we cannot get better estimates for $H$ with this $\varphi$, and Theorem 4.8.12 in this case will not improve the result of Theorem 4.8.11.

We now add a condition to Theorem 4.8.11 in order to obtain the weighted strong law of large numbers under a weaker integrability condition on the i.i.d. sequence.

4.8.13 Theorem. Let $w$ be a weight sequence. If $w$ satisfies (4.8.17) and also $\inf_n \frac1n W_n > 0$, then for any i.i.d. sequence $\xi$ with $E|\xi_1|\log^+|\xi_1| < \infty$, we have $\lim_{n\to\infty}\frac{1}{W_n}\sum_{k=1}^n w_k\xi_k = E\,\xi_1$ a.s.

Proof. We take $\varphi$ as in the proof of Theorem 4.8.11. The additional assumption on $W_n$ yields
$$\inf_{t\ge1}\frac{W(t)}{t+1} \ge \frac13\inf_{n\ge1}\frac{W_n}{n} > 0,$$
so by (4.8.24) we have $\int_{W(t)}^\infty \frac{du}{\varphi(u)} \le C(\log(1+t))^{1-\beta}$. Hence for large $t$ we have $G(t) \le c\,t\log(1+t)$, so for large $|s|$ we obtain
$$\int_{|s|}^\infty \frac{G(t)}{t^3}\,dt \le c\int_{|s|}^\infty \frac{\log(1+t)}{t^2}\,dt \le c'\int_{|s|}^\infty \frac{\log t}{t^2}\,dt = c'\,\frac{\log|s| + 1}{|s|} \le c''\,\frac{\log|s|}{|s|}.$$
We can now conclude that $H(s) = s^2\int_{|s|}^\infty \frac{G(t)}{t^3}\,dt \le C|s|(1 + \log^+|s|)$. Therefore, $E\,H(\xi_1) < \infty$ when $E(|\xi_1|\log^+|\xi_1|) < \infty$. Theorem 4.8.12 now applies.
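As an informal illustration of the weighted strong law in Theorem 4.8.13, one can simulate a weighted average with bounded weights, which satisfy (4.8.17) and have $\inf_n W_n/n > 0$. The sketch below rests on assumptions made only for this example: the weights cycle through 1, 2, 3, the variables $\xi_k$ are uniform on $[0, 1)$ (so $E\,\xi_1 = 1/2$), and the seed and sample size are arbitrary. A single simulated path is of course not evidence of almost sure convergence.

```python
import random


def weighted_average(weights, xs):
    """(1/W_n) * sum_{k=1}^n w_k x_k."""
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)


def simulate(n, seed=0):
    """Weighted average of n i.i.d. uniform variables with bounded weights."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    # Hypothetical bounded weights cycling through 1, 2, 3.
    weights = [1 + (k % 3) for k in range(n)]
    xs = [rng.random() for _ in range(n)]  # uniform on [0, 1), mean 1/2
    return weighted_average(weights, xs)


estimate = simulate(20000)
print(estimate)
```

The printed value can be compared with the expectation $1/2$; rerunning with other seeds or larger `n` gives a feel for the fluctuations.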


We can deduce the following corollary.

4.8.14 Corollary. Let $w$ be a weight sequence satisfying $\inf_n \frac{W_n}{n} > 0$ and
$$\sup_n \frac{1}{W_n}\sum_{k=1}^n w_k^\alpha < \infty \quad\text{for some } \alpha > 1. \tag{4.8.25}$$
Then for any i.i.d. sequence $\xi$ with $E|\xi_1| < \infty$ we have
$$\lim_{n\to\infty}\frac{1}{W_n}\sum_{k=1}^n w_k\xi_k = E\,\xi_1 \quad\text{a.s.}$$

Proof. We take $\varphi(t) = t^\alpha$, which satisfies (4.8.18). Assumption (4.8.25) is condition (4.8.20) for our $\varphi$. It is easy to show that $G(t) \le c\,t^\alpha W(t)^{1-\alpha}$ for large $t$. Since $\inf_{t\ge1} W(t)/t \ge \frac12\inf_n W_n/n > 0$, we obtain $G(t) \le c'\,t$. Computations similar to the previous ones yield that $H(s) \le C|s|$. Hence $E\,H(\xi_1) < \infty$ when $E|\xi_1| < \infty$, and Theorem 4.8.12 applies.

Remarks. The assumed linear growth of $W_n$ thus allows for more precise estimates, which result in the weighted strong law of large numbers for i.i.d. sequences with only the first moment, when (4.8.17) is strengthened to (4.8.25). There are many unbounded sequences that satisfy the hypotheses of Theorem 4.8.13 and Corollary 4.8.14. For example, strictly stationary ergodic random weights with finite moment $\alpha > 1$ satisfy the hypotheses of Corollary 4.8.14. On the other hand, if the stationary sequence is only in $L(\log^+ L)^\beta$, then almost surely the random weights satisfy (4.8.17), but not (4.8.25). Nevertheless, the weighted strong law of large numbers for i.i.d. sequences with only finite first moment still holds, by Theorem 4.1 of Baxter, Jones, Lin and Olsen [2004]. The stationary random weights above satisfy (4.8.17) but not (4.8.5), while the weights $w_k := k^t$, $t > 0$, satisfy (4.8.5) and not (4.8.17).

The method leading to Theorem 4.8.12 can be generalized. We now assume only that $\varphi$ satisfies (4.8.18) (i) to (iii), and instead of assuming (4.8.18) (iv) we take another positive nondecreasing function $\varphi_1$ with
$$\int_t^\infty \frac{du}{\varphi_1(u)} < \infty \quad\text{for some } t > 0. \tag{4.8.26}$$
For a weight sequence $w$ with divergent series we assume the following (which is (4.8.20) when $\varphi_1 = \varphi$):
$$\kappa := \sup_n \frac{\varphi_1(W_n)}{\varphi(W_n)}\cdot\frac{\sum_{k=1}^n \varphi(w_k)}{W_n} < \infty. \tag{4.8.27}$$
Adapting the two inequalities preceding (4.8.21), we get
$$\#\{n : W_n \le j w_n\} \le j + C^2\kappa\,\varphi(j)\int_{W_j}^\infty \frac{du}{\varphi_1(u)}.$$


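4.8.15 Theorem. Let $\varphi$ satisfy (4.8.18) (i) to (iii) and $\varphi_1$ be nondecreasing with (4.8.26). Let $w$ with divergent series satisfy (4.8.27). Define
$$G(t) = \begin{cases} 0 & \text{if } 0 \le t < 1, \\ \max\Big(t,\ \varphi(t)\displaystyle\int_{W(t)}^\infty \frac{du}{\varphi_1(u)}\Big) & \text{if } t \ge 1. \end{cases}$$
If $H(s) := s^2\int_{t\ge|s|}\frac{G(t)}{t^3}\,dt$ ($s \in \mathbb{R}$) is finite, then for any i.i.d. sequence $\xi$ with $E\,H(\xi_1) < \infty$ we have
$$\lim_{n\to\infty}\frac{\sum_{k=1}^n w_k\xi_k}{\sum_{k=1}^n w_k} = E\,\xi_1 \quad\text{a.s.}$$

We use Theorem 4.8.15 for studying some weighted modulation. Fix $\alpha > 1$, and let $c = \{c_k, k \ge 1\}$ be a sequence of positive numbers satisfying
$$\sum_{k=1}^\infty c_k^\alpha = \infty. \tag{4.8.28}$$
Hence also $\sum_k c_k = \infty$. Since $C_n := \sum_{k=1}^n c_k$ and $C_n^{(\alpha)} := \sum_{k=1}^n c_k^\alpha$ are strictly increasing, there exist strictly increasing continuous functions $\psi$ and $\psi_\alpha$ with $\psi(0) = 0$, $\psi_\alpha(0) = 0$, $\psi(n) = C_n$, and $\psi_\alpha(n) = C_n^{(\alpha)}$. Let $b = \{b_k, k \ge 1\}$ be a sequence of positive numbers satisfying
$$\ell := \lim_{n\to\infty}\frac{\sum_{k=1}^n c_k b_k}{\sum_{k=1}^n c_k} > 0 \ \text{ exists and } \ \sup_{n\ge1}\frac{\sum_{k=1}^n c_k^\alpha b_k^\alpha}{\sum_{k=1}^n c_k^\alpha} < \infty. \tag{4.8.29}$$
As an example of such a situation, let $c$ with $\sum_{k=1}^\infty c_k^2 = \infty$ satisfy $\sup_n n c_n^2 \big/ \sum_{k=1}^n c_k^2 < \infty$. Then $\sup_n n c_n \big/ \sum_{k=1}^n c_k$ is finite, and Corollary 4.8.4 applies to $c$ and to $\{c_k^2, k \ge 1\}$. Hence for positive i.i.d. random variables $\{f_k, k \ge 1\}$ with finite third moment, almost surely the realizations $b_k = f_k(x)$ satisfy (4.8.29).

4.8.16 Theorem. Let $\alpha > 1$, and let $c$ be a sequence of positive numbers with $\sum_{k=1}^\infty c_k^\alpha = \infty$. Put $\varphi(t) = t^\alpha$ and $\varphi_1(t) := \frac{t^{1+\alpha}}{\psi_\alpha(\psi^{-1}(t))}$. Assume $\varphi_1$ is nondecreasing and satisfies (4.8.26), and define
$$\tilde G(t) = \begin{cases} 0 & \text{if } 0 \le t < 1, \\ \max\Big(t,\ \varphi(t)\displaystyle\int_{C_{[t]}}^\infty \frac{du}{\varphi_1(u)}\Big) & \text{if } t \ge 1. \end{cases}$$
If $\tilde H(s) := s^2\int_{t\ge|s|}\frac{\tilde G(t)}{t^3}\,dt$ ($s \in \mathbb{R}$) is finite, then for any positive sequence $b$ satisfying (4.8.29) and any i.i.d. sequence $\xi$ with $E\,\tilde H(\xi_1) < \infty$ we have
$$\lim_{n\to\infty}\frac{\sum_{k=1}^n c_k b_k\xi_k}{\sum_{k=1}^n c_k} = \ell\cdot E\,\xi_1 \quad\text{a.s.}$$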


Since the functions $\tilde G$ and $\tilde H$ depend only on the weights $c$ and not on the "modulators" $b$, the class of i.i.d. sequences to which the result applies is the same for all positive modulators which satisfy (4.8.29).

Proof. We will apply Theorem 4.8.15 to the sequence $w_k = c_k b_k$. Clearly $\varphi$ and $\varphi_1$ (which depend only on $c$) satisfy (4.8.18) and (4.8.26) respectively. We now show that $w_k = c_k b_k$ satisfies (4.8.27). Since $\ell > 0$, we have $\sum_{k=1}^\infty w_k = \infty$. Clearly we may assume $\ell > 1$. Hence for $n$ large enough, $W_n = \sum_{k=1}^n c_k b_k \ge \sum_{k=1}^n c_k = C_n$, and $\psi^{-1}(W_n) \ge \psi^{-1}(C_n) = n$. By the definitions, for large $n$ we obtain
$$\varphi_1(W_n) := \frac{W_n^{1+\alpha}}{\psi_\alpha(\psi^{-1}(W_n))} \le \frac{W_n^{1+\alpha}}{\psi_\alpha(n)} = \frac{W_n^{1+\alpha}}{C_n^{(\alpha)}}.$$
Hence
$$\frac{\varphi_1(W_n)}{\varphi(W_n)}\cdot\frac{\sum_{k=1}^n \varphi(w_k)}{W_n} \le \frac{W_n^{1+\alpha}}{C_n^{(\alpha)}W_n^\alpha}\cdot\frac{\sum_{k=1}^n c_k^\alpha b_k^\alpha}{W_n} = \frac{\sum_{k=1}^n c_k^\alpha b_k^\alpha}{C_n^{(\alpha)}}$$
for large $n$, and (4.8.29) yields (4.8.27). Finally, for $G$ and $H$ defined for $w$ in Theorem 4.8.15, $W_n \ge C_n$ implies $G(t) \le \tilde G(t)$, and consequently $H(t) \le \tilde H(t)$. Hence, by Theorem 4.8.15 and (4.8.29),
$$\lim_{n\to\infty}\frac{\sum_{k=1}^n c_k b_k\xi_k}{\sum_{k=1}^n c_k} = \lim_{n\to\infty}\frac{\sum_{k=1}^n w_k\xi_k}{\sum_{k=1}^n w_k}\cdot\frac{\sum_{k=1}^n c_k b_k}{\sum_{k=1}^n c_k} = \ell\cdot E\,\xi_1 \quad\text{a.s.}$$

4.8.17 Theorem. Let $\beta \ge 0$, and let $\alpha > 1$. Then for any ergodic probability preserving transformation $\tau$ on $(X, \mathcal{A}, \mu)$ and $0 < f \in L^\alpha(\mu)$ there exists $X_1$ with $\mu(X_1) = 1$ such that if $x \in X_1$, then for any i.i.d. sequence $\xi$ with $E|\xi_1| < \infty$ we have
$$\lim_{n\to\infty}\frac{1}{n^{\beta+1}}\sum_{k=1}^n k^\beta f(\tau^k x)\,\xi_k = \frac{\int f\,d\mu}{\beta+1}\,E\,\xi_1 \quad\text{a.s.} \tag{4.8.30}$$

Proof. Put $c_k = k^\beta$. Asymptotically $\sum_{k=1}^n c_k \sim \frac{1}{\beta+1}\,n^{\beta+1}$, so we can take $\psi(t) = \frac{1}{\beta+1}\,t^{\beta+1}$ and $\psi_\alpha(t) = \frac{1}{\alpha\beta+1}\,t^{\alpha\beta+1}$, so $\varphi_1(u) = \gamma\,u^{(\alpha+\beta)/(\beta+1)}$ satisfies (4.8.26), and simple computations yield that for large $t$ we have $\tilde G(t) \le Ct$, so $\tilde H(s) \le C|s|$. Thus $E\,\tilde H(\xi_1)$ is finite if $E(|\xi_1|) < \infty$. Since $\{c_k, k \ge 1\}$ and $\{c_k^\alpha, k \ge 1\}$ both satisfy (4.8.5), for non-zero $f$, Corollary 4.8.4 yields that for almost every $x \in X$ the sequence $b_k = f(\tau^k x)$ satisfies (4.8.29) with $\ell = \int f\,d\mu$. The result thus follows from the previous theorem.

If we take $\alpha = 2$ and put $c_k = \log(k+1)$, then for $f \in L^2(\mu)$, $x \in X_1$ and any i.i.d. $\xi$ with $E|\xi_1| < \infty$ we have
$$\lim_{n\to\infty}\frac{\sum_{k=1}^n \log(k+1)\,f(\tau^k x)\,\xi_k}{n\log n} = \int f\,d\mu\ E\,\xi_1 \quad\text{a.s.}$$
In this case $\psi(t) = t\log^+ t$ and $\psi_2(t) = t(\log^+ t)^2$. For large $t$ we obtain
$$\int_{\psi(t)}^\infty \frac{du}{\varphi_1(u)} = \int_{\psi(t)}^\infty \frac{u^2\,du}{\psi^{-1}(u)\,u^3} = \int_t^\infty \frac{(\log s + 1)\,ds}{s^2\log s} \le \frac{2}{t},$$


which yields $\tilde G(t) \le 2t$ for large $t$, so $\tilde H(s) \le C|s|$. We use as before the fact that $\{c_k, k \ge 1\}$ and $\{c_k^2, k \ge 1\}$ both satisfy (4.8.5).

Oscillations of weighted averages over intervals of polynomial length. The problem of almost everywhere convergence of weighted averages can often be reduced to proving the convergence along a subsequence of polynomial growth.

4.8.18 Proposition. Let $(X, \mathcal{A}, \mu)$ be a probability space, $1 < p < \infty$ fixed with dual index $q = p/(p-1)$, and $\{f_k, k \ge 1\} \subset L^p(\mu)$ with $\sup_k \|f_k\|_p < \infty$. Let $w$ be a bounded sequence of positive numbers with $\inf_n \frac1n\sum_{k=1}^n w_k > 0$, and put
$$A_n(x) := \frac{1}{W_n}\sum_{k=1}^n w_k f_k(x).$$

(i) There exists a constant $K$, depending only on $w$, such that for any positive integers $n_1 < n_2 \le 2n_1$ and any $x \in X$ we have
$$\sup_{n_1\le j<m\le n_2} |A_m(x) - A_j(x)| \le K\Big(\frac{n_2 - n_1}{n_2}\Big)^{1/q}\Big(\frac{1}{n_2}\sum_{k=1}^{n_2}|f_k(x)|^p\Big)^{1/p}. \tag{4.8.31}$$

(ii) For every $R \ge 1$ and $r > 1$ there exists a constant $K(R, r)$, which depends on $w$ but not on $p$, such that
$$\Big(\sum_{i=1}^\infty \Big\|\sup_{i^R\le j<m\le(i+1)^R}|A_m - A_j|\Big\|_p^{qr}\Big)^{1/qr} \le K(R, r)\,\sup_k \|f_k\|_p. \tag{4.8.32}$$

(iii) When $p > 2$ we obtain, by putting $r = p/q = p - 1$ in (4.8.32),
$$\Big(\sum_{i=1}^\infty \Big\|\sup_{i^R\le j<m\le(i+1)^R}|A_m - A_j|\Big\|_p^{p}\Big)^{1/p} \le K(R, p-1)\,\sup_k \|f_k\|_p.$$

Proof. (i) Denote $C := \sup_k w_k$, and put $\alpha_{n,k} := w_k/W_n$ for $1 \le k \le n$ and $\alpha_{n,k} = 0$ for $k > n$, so $A_n(x) = \sum_{k=1}^\infty \alpha_{n,k}f_k(x)$. For $j < m$, Hölder's inequality yields
$$|A_m(x) - A_j(x)| \le \Big(\sum_{k=1}^m |\alpha_{m,k} - \alpha_{j,k}|^q\Big)^{1/q}\Big(\sum_{k=1}^m |f_k(x)|^p\Big)^{1/p}. \tag{4.8.33}$$


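Using the definitions and the boundedness of $w$, we obtain
$$\begin{aligned}
\sum_{k=1}^m |\alpha_{m,k} - \alpha_{j,k}|^q &= \sum_{k=1}^j w_k^q\Big(\frac{1}{W_j} - \frac{1}{W_m}\Big)^q + \sum_{k=j+1}^m \frac{w_k^q}{W_m^q} \\
&= \sum_{k=1}^j w_k^q\,\frac{(W_m - W_j)^q}{(W_m W_j)^q} + \frac{1}{W_m^q}\sum_{k=j+1}^m w_k^q \\
&\le jC^q\,\frac{C^q(m-j)^q}{(W_m W_j)^q} + (m-j)\,\frac{C^q}{W_m^q} \\
&= \frac{C^q(m-j)}{W_m^q}\Big[\frac{(m-j)^{q-1}}{W_j^q}\,jC^q + 1\Big].
\end{aligned}$$
Let $C_2 := \sup_n \frac{n}{W_n}$ (finite by assumption). Since $n_1 \le j < m \le 2n_1$, we have $m - j \le n_1 \le j$. Hence the last estimate yields
$$\sum_{k=1}^m |\alpha_{m,k} - \alpha_{j,k}|^q \le \frac{C^q(m-j)}{W_m^q}\Big[\frac{j^{q-1}}{W_j^q}\,jC^q + 1\Big] \le \frac{C^q(m-j)}{W_m^q}\,(C_2^qC^q + 1) = K_1\,\frac{m-j}{W_m^q}.$$
Substituting this in (4.8.33) and then using $m \le n_2 \le 2n_1 \le 2m$, we obtain
$$\begin{aligned}
|A_m(x) - A_j(x)| &\le K_1^{1/q}\,\frac{(m-j)^{1/q}}{W_m}\Big(\sum_{k=1}^{n_2}|f_k(x)|^p\Big)^{1/p} \\
&\le K_1^{1/q}\,\frac{(m-j)^{1/q}}{W_m}\,(2m)^{1/p}\Big(\frac{1}{n_2}\sum_{k=1}^{n_2}|f_k(x)|^p\Big)^{1/p} \\
&\le K_1^{1/q}\,C_2\,2^{1/p}\Big(\frac{m-j}{m}\Big)^{1/q}\Big(\frac{1}{n_2}\sum_{k=1}^{n_2}|f_k(x)|^p\Big)^{1/p}.
\end{aligned}$$
Since $j/m \ge n_1/n_2$, this shows assertion (i), with $K = C_2\,2^{1/p}K_1^{1/q}$.

(ii) Put $C_1 := \sup_k \|f_k\|_p$. Taking the $p$-th power of (i) and integrating we obtain
$$\Big\|\sup_{n_1\le j<m\le n_2}|A_m - A_j|\Big\|_p^p \le K^p\Big(\frac{n_2 - n_1}{n_2}\Big)^{p/q}C_1^p$$
whenever $n_1 < n_2 \le 2n_1$. Fix $R \ge 1$. For $i \ge i_0$ large enough, $(i+1)^R/i^R \le 2$ and the previous estimate applies with $n_1 = i^R$ and $n_2 = (i+1)^R$. Since $(i+1)^R - i^R \le R(i+1)^{R-1}$, we have $\frac{n_2 - n_1}{n_2} \le \frac{R}{i+1}$, so
$$\Big\|\sup_{i^R\le j<m\le(i+1)^R}|A_m - A_j|\Big\|_p \le K\,C_1\Big(\frac{R}{i+1}\Big)^{1/q}.$$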


Now let $r > 1$. We estimate the tail of the series in (ii) by
$$\sum_{i=i_0}^\infty \Big\|\sup_{i^R\le j<m\le(i+1)^R}|A_m - A_j|\Big\|_p^{qr} \le K^{qr}\,C_1^{qr}\,R^r\sum_{i=i_0}^\infty \Big(\frac{1}{i+1}\Big)^r.$$
This proves the convergence in (ii), and after majorizing the first terms of the series, we can get an estimate for $K(R, r)$.

4.9 Subsequence averages

Let $\mathcal{N} = \{n_k, k \ge 1\}$ be an increasing sequence of positive integers. Let $1 \le p < \infty$. In view of Birkhoff's pointwise theorem, it is quite natural to ask whether, given any measurable dynamical system $(X, \mathcal{A}, \mu, T)$, the limit
$$\lim_{N\to\infty}\frac{1}{N}\sum_{j=0}^N T^{n_j}f(x) \tag{4.9.1}$$
exists almost everywhere for any $f \in L^p(\mu)$. In that case, we say that $\mathcal{N}$ is $p$-universally good. When $p = 1$, we say more simply that the sequence $\mathcal{N}$ is universally good. This is obviously a fascinating question, although somewhat more theoretical than Birkhoff's theorem, which has attracted and motivated ergodicians during the last decades. Some authors prefer to use a slightly more precise notion, saying that a sequence $\mathcal{N}$ is $p$-nice when, given any measurable dynamical system $(X, \mathcal{A}, \mu, T)$ and any $f \in L^p(\mu)$,
$$\lim_{N\to\infty}\frac{1}{N}\sum_{j=0}^N T^{n_j}f(x) = E_T(f)(x) \quad\text{a.s.} \tag{4.9.2}$$
Here $E_T(f)$ denotes the conditional expectation of $f$ with respect to the $\sigma$-algebra $\mathcal{B}(T)$ of $T$-invariant measurable subsets of $X$. Recall that the corresponding notion of universally $p$-mean good sequence was defined in Section 1.3 (see after Corollary 1.3.7).

Assume $T$ is ergodic. The limit in (4.9.1) need not necessarily be $\int f\,d\mu$. For instance, the averages along the squares $\frac1N\sum_{j=1}^N T^{j^2}f$ converge in $L^2(\mu)$ for any $f \in L^2(\mu)$, but the limit is not necessarily constant for some ergodic transformations. Further, Bellow [1989] showed that there are subsequences of integers such that the averages in (4.9.1) fail to converge when applied to some $f \in L^p(\mu)$, $p < p_0$, $p_0 > 1$, but converge for all $f \in L^p(\mu)$ for each $p \ge p_0$. Boshernitzan pointed out that one can modify any increasing sequence, in particular a universally good sequence, by selecting either a 0 or a 1 to add at each point of the sequence, and obtain a sequence which is no longer universally good. Rosenblatt [1997] showed, however, that for "generic"


invertible measure-preserving transformations the limit in (4.9.1) is always the integral $\int f\,d\mu$.

Similarly, we say that $\mathcal{N}$ is $p$-universally bad when, for any measurable dynamical system $(X, \mathcal{A}, \mu, T)$, there exists an $f \in L^p(\mu)$ such that the limit (4.9.1) fails to exist for all $x$ in a set of positive $\mu$-measure. Refinements of this property, namely the sweeping out properties, can be defined as follows. We say that $\mathcal{N}$ is $\infty$-sweeping out for $L^p$ when for every aperiodic dynamical system $(X, \mathcal{A}, \mu, T)$, there is an $f \in L^p(\mu)$ such that
$$\limsup_{N\to\infty}\frac{1}{N}\sum_{j=0}^N T^{n_j}f(x) = \infty \quad\text{a.s.}$$
We also say that $\mathcal{N}$ is strongly sweeping out iff in every aperiodic dynamical system $(X, \mathcal{A}, \mu, T)$, for every $\varepsilon > 0$, there is a set $B$ with $\mu(B) < \varepsilon$ such that
$$\limsup_{N\to\infty}\frac{1}{N}\sum_{j=0}^N T^{n_j}\mathbf{1}_B(x) = 1 \quad\text{a.s.} \qquad\text{and}\qquad \liminf_{N\to\infty}\frac{1}{N}\sum_{j=0}^N T^{n_j}\mathbf{1}_B(x) = 0 \quad\text{a.s.}$$

In this section, we only briefly list some of the most famous examples of universally good or bad sequences. A spectacular result obtained by Bourgain [1988b], [1988c] establishes that the sequence of squares $n_k = k^2$, $k = 1, 2, \ldots$, is $p$-universally good for $1 < p < \infty$. The proof uses the circle method on the shift model $\mathbb{Z}$. A nice presentation of Bourgain's arguments is given in [Thouvenot: 1989]. Recently Buczolich and Mauldin [2005], answering a question of Bourgain, showed that this sequence is 1-universally bad. Bourgain also showed that the sequence $n_k = q(k)$, $k = 1, 2, \ldots$, where $q$ is a polynomial with integer coefficients, is $p$-universally good for $1 < p < \infty$. A result of the same kind was obtained for the sequence of primes $n_k = p_k$, the $k$-th prime, by Bourgain [1988b] (for $p > (1+\sqrt3)/2$) and Wierdl [1988] (for $p > 1$), namely this sequence is also $p$-universally good for $1 < p < \infty$. In a nicely written paper, Nair [1991] established that the sequence $n_k = q(p_k)$, $k = 1, 2, \ldots$, where $q$ is a polynomial with integer coefficients, is $p$-universally good for $1 < p < \infty$. Buczolich [2007] constructed a sequence $\{n_k, k \ge 1\}$ such that $n_{k+1} - n_k \to \infty$, and for any ergodic dynamical system $(X, \mathcal{A}, \mu, T)$ and $f \in L^1(\mu)$, the averages $(1/N)\sum_{k=1}^N f(T^{n_k}x)$ converge to $\int_X f\,d\mu$ for $\mu$-almost every $x$. The sequence being of zero Banach density, this disproves a conjecture of Rosenthal and Wierdl about the non-existence of such sequences.

Krengel [1971] showed that there exist subsequences which are universally bad in $L^p$, $1 \le p < \infty$. Lacunary sequences are strongly sweeping out (see Bellow [1983] and Akcoglu, Bellow, Jones, Losert, Reinhold-Larsson, Wierdl [1996], see also Jones and Wierdl [1994]). Consequently, a universally $p$-mean good sequence "must" satisfy
$$\lim_{k\to\infty}\frac{n_{k+1}}{n_k} = 1.$$


Jones and Wierdl [1994] showed that if N satisfies the condition

\[ \frac{n_{k+1}}{n_k} > e^{(\log n)^{-\frac{1}{2}+\varepsilon}} \]

for some positive ε, then it is ∞-sweeping for $L^2$ (later Jones, Lacey and Wierdl [1999] also showed that there exists a universally 2-mean good sequence N satisfying, for every ε > 0, the condition $n_{k+1}/n_k > e^{(\log n)^{-1-\varepsilon}}$ for all k > k(ε)). A basic fact used there is that for any m-tuple of positive reals $v = (v_1, \dots, v_m)$ satisfying

\[ \frac{v_{k+1}}{v_k} > 2q, \qquad k = 1, 2, \dots, m-1, \]

and for any $e = (e_1, \dots, e_m) \in \mathbb{Z}^m$, there is r > 0 so that

\[ v_i r \equiv e_i \pmod{q}, \qquad i = 1, 2, \dots, m. \]

This is Lemma 2.13 in Jones and Wierdl [1994], an article to which we refer for many other examples, for the references therein and for a good understanding of the other arguments showing that lacunary sequences are universally bad. A general result (Theorem 2.3) for proving divergence of ergodic averages is also established there. We also refer to [Rosenblatt–Wierdl: 1995] and to the works of Akcoglu, Bellow, Bourgain, Del Junco, Jones, Krengel, Lacey, Losert, Olsen, Petersen, Reinhold-Larsson, Rosenblatt, Tempelman, Wierdl, etc. The American school of ergodic theory has made an important contribution to the study of this attractive problem, in particular under the “dynamical” impulse of Bellow, Jones, Lacey, Petersen, Rosenblatt, Wierdl and their collaborators. A monograph synthesizing all the results obtained, with a clear and accessible presentation of the main arguments, would be very welcome and certainly very helpful.

Lacunary sequences play a key role in many fundamental questions of analysis, probability theory and Fourier analysis, and here in ergodic theory. We shall notably see their interplay in studying the central limit theorem (Chapter 7) and the convergence properties of the system $\{f(n_k x), k \ge 1\}$ (Chapter 12). Below, we indicate an unexpected arithmetical property of these sequences, which we think is worth mentioning.

An arithmetical property of lacunary sequences. Burr [1970] raised the following question: let $a_1 < a_2 < \cdots$ be a sequence of integers, call it A, and let

\[ P(A) = \Big\{ \sum_i \varepsilon_i a_i \;:\; \varepsilon_i = 0 \text{ or } 1,\ a_i \in A \text{ and } \sum_i \varepsilon_i < \infty \Big\}. \]

Which sets S of integers are equal to P(A) for some A? Burr mentioned that if the complement of S grows sufficiently rapidly, then there exists such a sequence A. Hegyvári [1996] showed that if $B = \{b_i, i \ge 1\}$ is such that

\[ 7 \le b_1 < b_2 < \cdots \quad\text{and}\quad b_{n+1} \ge 5 b_n \ \text{ for every } n, \]

then there exists a sequence A such that

\[ P(A) = \mathbb{N} \setminus B, \tag{4.9.3} \]

thereby improving substantially an earlier unpublished result of Burr. He also showed that his result cannot be improved essentially, which is quite a remarkable fact. More precisely, if B is such that

\[ b_{n+1} \le 2 b_n \quad \text{for every } n \text{ large enough}, \]

and B is a Sidon set, namely $b_i + b_j = b_k + b_t$ implies i = k, j = t or i = t, j = k, then there is no sequence A for which $P(A) = \mathbb{N}\setminus B$. It seems that this kind of property, or some variant of it, deserves more investigation. We refer to Hegyvári’s paper for more details and more results. Among these is another one, answering a question raised by Ruzsa, which we wish to include in these remarks: for any pair of real numbers 0 ≤ α ≤ β ≤ 1, there is a set A: $a_1 < a_2 < \cdots$ for which

\[ \underline{d}(P(A)) = \liminf_{n\to\infty} \frac{\#\{P(A)\cap[1,n]\}}{n} = \alpha, \qquad \overline{d}(P(A)) = \limsup_{n\to\infty} \frac{\#\{P(A)\cap[1,n]\}}{n} = \beta. \tag{4.9.4} \]
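The set P(A) of finite subset sums and the counting ratio #{P(A) ∩ [1, n]}/n appearing in (4.9.4) can be explored numerically. The following is a minimal sketch, not taken from the works cited above; the function names `subset_sums` and `density` are hypothetical, and the two test sequences (powers of 2 and powers of 3) are chosen only for illustration.

```python
def subset_sums(a, limit):
    """Return the sums of nonempty finite subsets of `a` lying in [1, limit].

    `a` is a strictly increasing list of positive integers; terms larger
    than `limit` cannot contribute and are skipped.
    """
    sums = {0}  # the empty sum, removed at the end
    for term in a:
        if term > limit:
            continue
        # each term is used at most once: extend the sums found so far
        sums |= {s + term for s in sums if s + term <= limit}
    sums.discard(0)
    return sums


def density(a, n):
    """Counting ratio #(P(A) ∩ [1, n]) / n."""
    return len(subset_sums(a, n)) / n


powers_of_two = [2 ** k for k in range(7)]    # 1, 2, 4, ..., 64
powers_of_three = [3 ** k for k in range(5)]  # 1, 3, 9, 27, 81

print(density(powers_of_two, 100))    # every integer in [1, 100] is a subset sum
print(density(powers_of_three, 100))  # only integers with ternary digits 0 or 1
```

For the powers of 2 the ratio equals 1 (binary expansion), whereas for the powers of 3 it decreases to 0 as n grows; these are the two extreme regimes between which the pairs (α, β) of (4.9.4) are located.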

Random subsequences. There are two remarkable types of studies. The first originates from a work by Bourgain [1988b], who considered a special kind of averages. Here we are given a sequence $\{Y_j, j \ge 1\}$ of independent random variables such that $P\{Y_j = 0\} = 1 - \sigma_j$, $P\{Y_j = 1\} = \sigma_j$, $0 < \sigma_j < 1$ for all j, and we form, given any measurable dynamical system (X, A, μ, T) and $f \in L^0(\mu)$, the averages

\[ A_n^{\omega} f = \frac{1}{\#\{j \le n : Y_j(\omega) = 1\}} \sum_{j \le n:\, Y_j(\omega)=1} f\circ\tau^{j}. \]

Only partial results exist. Bourgain proved that if

(a) the sequence $\{\sigma_n, n \ge 1\}$ is decreasing,

(b) $\displaystyle \lim_{n\to\infty} \frac{\sum_{j\le n}\sigma_j}{\log n} = \infty$,

then for almost every ω the sequence $N_\omega = \{j : Y_j(\omega) = 1\}$ is mean-good. Jones, Lacey and Wierdl [1999] showed for the limit case $\sigma_j = 1/j$ that the sequence $N_\omega$ is not mean-good for a measurable subset of ω of positive measure. The basic idea consists in showing that $N_\omega$ contains a lacunary subsequence which has positive density in $N_\omega$. Notice, by the weighted strong law of large numbers, that if $\sum_j \sigma_j = \infty$, denoting $\Sigma_n = \sum_{j\le n}\sigma_j$,

\[ \frac{\#\{j \le n : Y_j = 1\}}{\Sigma_n} = \frac{\sum_{j\le n} Y_j}{\Sigma_n} \xrightarrow{\ \mathrm{a.s.}\ } \frac{1}{2}, \]

and so the averages $A_n^{\omega} f$ have the same limit behavior as the weighted averages

\[ \frac{1}{\Sigma_n}\sum_{j\le n} Y_j(\omega)\, f\circ\tau^{j} = \frac{1}{\Sigma_n}\sum_{j\le n} \big(Y_j(\omega) - E\,Y_j + E\,Y_j\big)\, f\circ\tau^{j}. \]

Since $E\,Y_j = \sigma_j/2$ is decreasing, we deduce from Corollary 4.8.4 (i) that the weighted ergodic averages

\[ \frac{1}{\Sigma_n}\sum_{j\le n} E\,Y_j\; f\circ\tau^{j} \]

converge almost everywhere to Eτ(f)/2 for any $f \in L^1(\mu)$. And therefore only the limit behavior of the averages

\[ \frac{1}{\Sigma_n}\sum_{j\le n} \big(Y_j(\omega) - E\,Y_j\big)\, f\circ\tau^{j}, \qquad n = 1, 2, \dots \]

remains to be known. Consider the related random polynomials

\[ Q_n(t) = \sum_{j\le n} (Y_j - E\,Y_j)\, e^{2i\pi j t}, \qquad n = 1, 2, \dots. \]

It follows from Example 2 given right after Theorem 8.5.1 that the increment condition (8.5.4) is fulfilled, and so Corollary 8.5.3 (c) applies. We get for the limit case $\sigma_j = 1/j$,

\[ E \sup_{t\in\mathbb{T}} |Q_n(t)| = O(\log n). \]

A second remarkable example of subsequence averages built from random subsequences is described as follows: let $\{X_j, j \ge 1\}$ be a sequence of i.i.d. Z-valued random variables and form their partial sums $S_n = X_1 + \cdots + X_n$, n ≥ 1. Assume that T is invertible and consider the ergodic averages

\[ B_n f = \frac{1}{n}\sum_{j=0}^{n-1} f\circ T^{S_j}, \qquad n \ge 1. \]

Lacey, Petersen, Rudolph and Wierdl [1994] showed that if $E\,X_1 \neq 0$ and $E\,X_1^2 < \infty$, then for almost all ω, the sequence $\{S_n, n \ge 1\}$ is p-mean good for any p, 1 < p < ∞. Gamet and Schneider showed that under the condition $E\,|X_1|^{\delta} < \infty$ for some δ > 0, for almost all ω, the sequence $\{S_n, n \ge 1\}$ is 2-mean good. Their result is also valid for $\mathbb{Z}^d$-actions and i.i.d. $\mathbb{Z}^d$-valued random variables. Let $\varphi(t) = E\,e^{2i\pi X_1 t}$ be the characteristic function of $X_1$. The behavior (in mean and almost sure) of the averages $B_n$ is naturally related to that of the sup-norm of the polynomials

\[ P_n(t) = \sum_{j=0}^{n-1} \big( e^{2i\pi S_j t} - \varphi^{j}(t) \big), \qquad n = 1, \dots. \]
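To get a feeling for the polynomials $P_n$, one may evaluate their sup-norm on a grid for a simulated walk. The sketch below is only illustrative: the helper name `sup_norm_Pn`, the grid size, the convention $S_0 = 0$ and the two step distributions are assumptions made for the example, not part of the cited results.

```python
import cmath
import random


def sup_norm_Pn(steps, phi, grid=200):
    """Approximate sup_t |P_n(t)| over the grid {m/grid : 0 <= m < grid}.

    steps : integer increments X_1, ..., X_{n-1} of the walk
    phi   : callable t -> E exp(2 i pi X_1 t), the characteristic function
    The partial sums used are S_0 = 0, S_1, ..., S_{n-1} (assumed convention).
    """
    partial = [0]
    for x in steps:
        partial.append(partial[-1] + x)
    best = 0.0
    for m in range(grid):
        t = m / grid
        total = sum(cmath.exp(2j * cmath.pi * s * t) - phi(t) ** j
                    for j, s in enumerate(partial))
        best = max(best, abs(total))
    return best


# Degenerate walk X_j = 1: S_j = j and phi(t)^j = exp(2 i pi j t), so P_n = 0.
unit_steps = [1] * 49
phi_unit = lambda t: cmath.exp(2j * cmath.pi * t)
print(sup_norm_Pn(unit_steps, phi_unit))

# Walk with X_j uniform on {1, 2}: nonzero mean and finite variance.
rng = random.Random(0)
random_steps = [rng.choice([1, 2]) for _ in range(49)]
phi_12 = lambda t: (cmath.exp(2j * cmath.pi * t) + cmath.exp(4j * cmath.pi * t)) / 2
print(sup_norm_Pn(random_steps, phi_12))
```

The degenerate walk gives a value at rounding-error level, which is a convenient sanity check; for a genuinely random walk the value is bounded by the trivial estimate 2n, and the interest lies in how much smaller it is.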


Guillotin-Plantard [2002] showed the following sharp uniform bound: for every ε > 0, 2 a.s. sup |Pn (t)| = O(n5/6 log n), (4.9.5) t∈T

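The proof cleverly develops martingale techniques used in [Lacey–Petersen–Rudolph–Wierdl: 1994], who previously established the same result, but with constant 7/8. The question naturally arises whether the constant 5/6 is optimal or not, and if not, whether 1/2 + ε, with ε arbitrarily small, is suitable. An approach using the stochastic processes methods presented in Chapters 8 and 9 also remains to be carried out. Of interest for this question is probably the fact that for any 0 < ε < 1,

\[ \int_{\mathbb{T}}\int_{\mathbb{T}} \frac{E\,|P_n(t) - P_n(s)|^2}{|s-t|^{1+\varepsilon}}\, ds\,dt = O\big(n^{1+(5\varepsilon/2)}\big). \tag{4.9.6} \]

A computation of the $L^2$-increments of $P_n$ indeed yields

\[
\begin{aligned}
E\,|P_n(\alpha) - P_n(\beta)|^2
&= \sum_{k=1}^{n} \Big[ 2 - \varphi(\beta-\alpha)^k - \varphi(\alpha-\beta)^k + \big(\varphi(-\beta)^k - \varphi(-\alpha)^k\big)\big(\varphi(\alpha)^k - \varphi(\beta)^k\big) \Big] \\
&\quad + \sum_{k=2}^{n}\sum_{l=1}^{k-1} \Big[ -\big(\varphi(\beta)^{k-l} + \varphi(-\alpha)^{k-l}\big)\Big(\big(\varphi(\alpha-\beta)^l - 1\big) + \big(\varphi(\beta-\alpha)^l - 1\big)\Big) \\
&\qquad\qquad - \big(\varphi(\alpha-\beta)^l - 1\big)\big(\varphi(\alpha)^{k-l} - \varphi(\beta)^{k-l} + \varphi(-\beta)^{k-l} - \varphi(-\alpha)^{k-l}\big) \\
&\qquad\qquad + \big(\varphi(\alpha)^k - \varphi(\beta)^k\big)\big(\varphi(-\beta)^l - \varphi(-\alpha)^l\big) + \big(\varphi(\alpha)^l - \varphi(\beta)^l\big)\big(\varphi(-\beta)^k - \varphi(-\alpha)^k\big) \Big].
\end{aligned}
\tag{4.9.8}
\]

Elementary considerations on characteristic functions then imply

\[ E\,|P_n(\alpha) - P_n(\beta)|^2 \le C\Big( \sum_{k=1}^{n} \big(k^2|\alpha-\beta|^2 \wedge 1\big) + \sum_{k=2}^{n}\sum_{l=1}^{k-1} \big(kl|\alpha-\beta|^2 \wedge 1\big) \Big). \]

Now, owing to the fact that for a transient random walk $\frac{1}{n}\sum_{k,l=1}^{n} P\{S_k = S_l\} \to 2G(0,0) - 1$, where $G(0,x) = \sum_{k=0}^{\infty} P\{S_k = x\}$ is the Green function (which is finite for every $x \in \mathbb{Z}$), we also have for any $u \in \mathbb{T}$,

\[ \int_{\mathbb{T}} E\,|P_n(\alpha+u)|^2\, d\alpha \le Cn. \]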

These two facts easily imply the claimed property.

Problem 4. Let A be an increasing sequence of positive integers. Find conditions ensuring that the set P(A) considered by Burr is 2-mean good or 2-universally good.

Problem 5. If $B = \{b_i, i \ge 1\}$ is an increasing sequence of positive integers such that $b_{n+1}/b_n \ge 1 + \varepsilon_n$ for every n, where $\varepsilon_n \downarrow 0$, what could be an analogous result to (4.9.3)?

Problem 6. Does Theorem 5.2.4 provide an alternative way to prove that the sequence of primes is 1-universally bad? The numerous applications given by Stein of his result (see Chapter 5) suggest such a possibility.

Problem 7. Is the estimate (4.9.5) improvable? Is it possible to develop an approach based on the majorizing measure method or the metric entropy method?

Chapter 5

Banach principle and continuity principle

In this chapter, we state and give the proof of several formulations of the Banach principle and the continuity principle, which have proved to be fundamental tools for the study of problems of convergence almost everywhere for sequences of operators. We study through some examples their application in analysis.

5.1 Banach principle

This principle, formulated by Banach in 1926, is a fundamental tool in the study of the almost everywhere convergence properties of sequences of $L^p$-operators with p finite. The statement corresponding to the case p = ∞ is much more recent and was obtained by Bellow and Jones in 1996. Its use will be crucial in the proof of the metric entropy criterion in $L^\infty$ in Chapter 8. We begin this section with some necessary background. Let (X, A, μ) be a probability space. Let $L^0(\mu)$ be the space of A-measurable functions f : X → R. For every $f, g \in L^0(\mu)$, we write

\[ d(f,g) = \int_X \frac{|f-g|}{1+|f-g|}\, d\mu, \qquad \rho(f) = d(0,f). \tag{5.1.1} \]

This metric defines the topology of convergence in measure ($g_n \to g$ if for any ε > 0, $\lim_{n\to\infty}\mu\{|g_n - g| > \varepsilon\} = 0$), and we recall that $(L^0(\mu), d)$ is a complete metric space. Let $(B, \|\cdot\|)$ be a Banach space and consider a mapping S from B to $L^0(\mu)$. Introduce the following definition.

5.1.1 Definition. We say that S is continuous in measure, or d-continuous, if for any sequence $(f, f_n, n \in \mathbb{N}) \subset B$, we have

\[ \lim_{n\to\infty} \|f_n - f\| = 0 \ \Longrightarrow\ \lim_{n\to\infty} d(Sf_n, Sf) = 0. \]

Then the Banach principle can be stated as follows.

5.1.2 Theorem. (a) Let $S = \{S_t, t \in \mathbb{N}\}$ be a family of operators $S_t : B \to L^0(\mu)$. Assume there exists a nonincreasing function $C : \mathbb{R}_+ \to \mathbb{R}_+$ with $\lim_{t\to\infty} C(t) = 0$ and

\[ \forall f \in B,\ \forall \alpha > 0, \qquad \mu\Big\{x : \sup_{t\in\mathbb{N}} |S_t(f)(x)| > \alpha\|f\|\Big\} \le C(\alpha). \tag{5.1.2} \]

Then the operators $S_t$ are continuous in measure and the set

\[ L(S) = \big\{f \in B : \mu\{(S_t(f), t \in \mathbb{N}) \text{ converges}\} = 1\big\} \]

is closed in B.

(b) Conversely, if the operators $S_t$ are continuous in measure, and if for any $f \in B$,

\[ \mu\Big\{\sup_{t\in\mathbb{N}} |S_t(f)| < \infty\Big\} = 1, \]

then there exists a nonincreasing function $C : \mathbb{R}_+ \to \mathbb{R}_+$ such that $\lim_{t\to\infty} C(t) = 0$ and

\[ \forall f \in B,\ \forall \alpha > 0, \qquad \mu\Big\{\sup_{t\in\mathbb{N}} |S_t(f)| > \alpha\|f\|\Big\} \le C(\alpha). \tag{5.1.3} \]

Proof. For every $f \in B$ and $t \in \mathbb{N}$, we write

\[ S^*(f) = \sup_{s\in\mathbb{N}} |S_s(f)|, \qquad S_t^*(f) = \sup_{s\in\mathbb{N},\, s\le t} |S_s(f)|. \tag{5.1.4} \]

Assertion (a) is immediate; the continuity in measure indeed follows from the inequality

\[ \mu\{|S_t(f) - S_t(f_n)| > \varepsilon\} \le \mu\{S^*(f - f_n) > \varepsilon\} \le C\big(\varepsilon\|f - f_n\|^{-1}\big) \to 0 \]

as $\|f - f_n\| \to 0$. Let $f \in \overline{L(S)}$. There exists a sequence $\{f_n, n \ge 1\}$ of elements of L(S) converging in B to f. Put

\[ \forall x \in X,\ \forall g \in B, \qquad O(x,g) = \lim_{T\to\infty}\ \sup_{s,t\in\mathbb{N}\cap[T,\infty)} |S_s(g)(x) - S_t(g)(x)|. \]

Since $O(x, f_n) = 0$ for almost every x, we have almost everywhere $O(x,f) = |O(x,f) - O(x,f_n)| \le O(x, f - f_n) \le 2S^*(f - f_n)(x)$; we deduce for any ε > 0,

\[ \mu\{x : O(x,f) > \varepsilon\} \le \mu\{x : 2S^*(f - f_n)(x) > \varepsilon\} \le C\Big(\frac{\varepsilon}{2\|f - f_n\|}\Big) \to 0, \]

as n tends to infinity. And since ε is arbitrary, we get $\mu\{x : O(x,f) = 0\} = 1$. This shows that $f \in L(S)$, and thus L(S) is closed.

(b) Fix some ε > 0. By assumption, for each $f \in B$, $\mu\{S^*(f) < \infty\} = 1$. There thus exists a positive integer n = n(f, ε) such that $\mu\{S^*(f) > n\} \le \varepsilon$. Put for any positive integer n, $B_n = \{f \in B : \mu\{S^*(f) > n\} \le \varepsilon\}$. Then,

\[ B = \bigcup_{n\ge 1} B_n. \]

Besides, for any integer n ≥ 1,

\[ B_n = \bigcap_{t\in\mathbb{N}} B_{n,t} \quad\text{where}\quad B_{n,t} = \{f \in B : \mu\{S_t^*(f) > n\} \le \varepsilon\}. \]

We first show that the sets $B_{n,t}$ are closed. Let $\{f_k, k \ge 1\}$ be a sequence of elements of $B_{n,t}$ converging in B to f. Let h > 0 be fixed, then

\[ \mu\{S_t^*(f) > n + h\} \le \mu\{S_t^*(f_k) + S_t^*(f - f_k) > n + h\} \]

and thus

\[ \mu\{S_t^*(f) > n + h\} \le \inf_{k\ge 1}\Big( \mu\{S_t^*(f_k) > n\} + \mu\{S_t^*(f - f_k) > h\} \Big) \le \varepsilon, \]

by continuity in measure of $S_t$. As

\[ \{S_t^*(f) > n\} \subset \bigcup_{J\ge 1}\bigcap_{j\ge J} \Big\{S_t^*(f) > n + \tfrac{1}{j}\Big\} = \liminf_{j\to\infty} \Big\{S_t^*(f) > n + \tfrac{1}{j}\Big\} = \lim_{j\to\infty}\!\uparrow \Big\{S_t^*(f) > n + \tfrac{1}{j}\Big\}, \]

we have

\[ \mu\{S_t^*(f) > n\} = \lim_{j\to\infty} \mu\Big\{S_t^*(f) > n + \tfrac{1}{j}\Big\} \le \varepsilon. \]

This shows that the sets $B_{n,t}$, and thereby the sets $B_n$, are closed. We can thus write B as a countable union of closed sets. By virtue of the Baire theorem, one of these sets, say $B_n$, must have a nonempty interior. This set therefore contains a closed ball $B(f_0, r) = \{f \in B : \|f - f_0\| \le r\}$, r > 0. Consequently,

\[ \forall f \in B(f_0, r), \qquad \mu\{S^*(f) > n\} \le \varepsilon. \tag{5.1.5} \]

Writing then f in the form $f = f_0 + rz$ with $z \in B$, $\|z\| \le 1$, and observing that $S^*(rz)(x) \le S^*(f_0)(x) + S^*(f_0 + rz)(x)$, leads to

\[ \mu\{S^*(rz) > 2n\} \le \mu\{S^*(f_0) > n\} + \mu\{S^*(f_0 + rz) > n\} \le 2\varepsilon. \tag{5.1.6} \]

Thus, for any $z \in B$, $\|z\| \le 1$, we have

\[ \forall \alpha \ge \frac{2n}{r}, \qquad \mu\{S^*(z) > \alpha\} \le 2\varepsilon. \tag{5.1.7} \]

Put

\[ C(\alpha) = \sup_{z\in B,\ \|z\|\le 1} \mu\{S^*(z) > \alpha\}. \tag{5.1.8} \]

Then C(α) ≤ 2ε, provided that $\alpha \ge \frac{2n}{r}$. As ε is arbitrary, we have on the one hand

\[ \lim_{\alpha\to\infty} C(\alpha) = 0, \tag{5.1.9} \]

and on the other,

\[ \forall f \in B,\ \forall \alpha > 0, \qquad \mu\Big\{\sup_{t\in\mathbb{N}} |S_t(f)| > \alpha\|f\|\Big\} \le C(\alpha). \tag{5.1.10} \]

This achieves the proof.

The importance of this result comes from the fact that it is often possible to establish the convergence μ-almost everywhere of the sequence $\{S_t f, t \in \mathbb{N}\}$ for f belonging to a countable dense subset of B. In many applications, B is an $L^p(\mu)$ space with 1 ≤ p < ∞. When p = ∞, namely when $\{S_t, t \in \mathbb{N}\}$ is a sequence of operators continuous in measure, or simply continuous from $L^\infty(\mu)$ to $L^\infty(\mu)$, the fact that for any $f \in L^\infty(\mu)$,

\[ \mu\Big\{x : \sup_{t\in\mathbb{N}} |S_t(f)|(x) < \infty\Big\} = 1 \]

does not bring any significant information. A different formulation of this principle is then necessary. It is precisely the object of our next statement, which is due to Bellow–Jones [1996]. Put $Y = \{f \in L^\infty(\mu) : \|f\|_\infty \le 1\}$. We endow Y with the distance d associated to the convergence in measure, which is defined in (5.1.1). Observe that the distances d and $\|\cdot\|_p$, 1 ≤ p < ∞, are equivalent on Y. Indeed, one easily establishes for any $f, g \in Y$,

\[ \frac{1}{3}\|f - g\|_1 \le d(f,g) \le \|f - g\|_1, \qquad \|f - g\|_p^p \le 2^{p-1}\|f - g\|_1 \le 2^{p-1}\|f - g\|_p. \tag{5.1.11} \]
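The chain of inequalities (5.1.11) is easy to test on a finite probability space, where functions are lists of values and the measure is a list of weights. The sketch below is only an illustration under these assumptions; the names `d` and `lp_norm`, the four-point space and the sample functions are hypothetical choices.

```python
def d(f, g, w):
    """Distance (5.1.1) on a finite probability space with weights w."""
    return sum(wi * abs(a - b) / (1 + abs(a - b)) for a, b, wi in zip(f, g, w))


def lp_norm(f, w, p):
    """L^p norm of f with respect to the weights w."""
    return sum(wi * abs(a) ** p for a, wi in zip(f, w)) ** (1 / p)


# Four equally likely points; f and g take values in [-1, 1], i.e. lie in Y.
w = [0.25, 0.25, 0.25, 0.25]
f = [1.0, -1.0, 0.5, 0.0]
g = [-1.0, -1.0, 0.0, 0.25]
diff = [a - b for a, b in zip(f, g)]

p = 3
n1 = lp_norm(diff, w, 1)
np_ = lp_norm(diff, w, p)

print(n1 / 3 <= d(f, g, w) <= n1)                       # first chain of (5.1.11)
print(np_ ** p <= 2 ** (p - 1) * n1 <= 2 ** (p - 1) * np_)  # second chain
```

Both comparisons hold because |f − g| ≤ 2 on Y: this gives |f − g|/(1 + |f − g|) ≥ |f − g|/3 and |f − g|^p ≤ 2^{p−1}|f − g| pointwise, while the last inequality is the monotonicity of $L^p$ norms on a probability space.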

5.1.3 Definition. Let $S : Y \to L^0(\mu)$ be not necessarily linear. We say that S is continuous at 0, if S is d-continuous at 0 on Y.

Let us make some useful comments. When $S : L^\infty(\mu) \to L^0(\mu)$ is linear, then S is continuous at 0 if and only if S is d-continuous on $L^\infty(\mu)$. Let $(E, \|\cdot\|)$ be a normed space. When $S : E \to L^0(\mu)$ is a sublinear operator (i.e., $|S(\lambda f)| = |\lambda||S(f)|$ for any $f \in E$ and any real λ, and $|S(f_1 + f_2)| \le |S(f_1)| + |S(f_2)|$ for any $f_1, f_2 \in E$), it is well known (see for instance Garsia [1970]) that S is continuous at 0 ∈ E if and only if the function $\phi : (0,\infty) \to [0,\infty)$ defined by

\[ \phi(\lambda) = \sup_{f\in E,\ \|f\|\le 1} \mu\{x : |Sf(x)| > \lambda\} \]

tends to 0 as λ tends to infinity.

Further, if $E = L^\infty(\mu)$ and if the operators $S_n$ are continuous from $L^\infty(\mu)$ to $L^\infty(\mu)$, the property $\mu\{S^*f < \infty\} = 1$ for all $f \in L^\infty(\mu)$ is often automatically satisfied; for instance if the $S_n$ are all contractions in $L^\infty(\mu)$. But this does not necessarily imply that the sequence $\{S_n(f), n \ge 1\}$ converges almost everywhere for any $f \in L^\infty(\mu)$, as is shown through the following example. On the circle (T, λ) consider for $f \in L^0(\lambda)$, $x \in \mathbb{T}$ and any positive integer n, the averaging operators

\[ S_{n,\theta} f(x) = \frac{1}{n}\sum_{k=0}^{n-1} f(x + 2^k\theta). \]

Clearly

\[ S_\theta^* f = \sup_{n\ge 1} |S_{n,\theta} f| \le \|f\|_\infty. \]

So we do have $\lambda\{S_\theta^* f < \infty\} = 1$ for all $f \in L^\infty(\lambda)$. Further, for almost all $\theta \in \mathbb{T}$ the sequence $\{2^k\theta, k \ge 1\}$ is uniformly distributed (mod 1). For such a θ, the sequence $\{S_{n,\theta} f, n \ge 1\}$ converges for all x if f is continuous. However, for every irrational $\theta \in \mathbb{T}$ the convergence almost everywhere of this sequence is known to fail for some $f \in L^\infty(\lambda)$ (see Rosenblatt [1991] and Akcoglu, Bellow, Jones, Losert, Reinhold-Larsson and Wierdl [1996]).

We begin with a first theorem connecting the convergence almost everywhere to the continuity property at 0 of the maximal operator.

5.1.4 Theorem. Let $\{S_n, n \ge 1\}$ be a sequence of linear operators of $L^\infty(\mu)$ in $L^0(\mu)$. Assume that the following conditions are realized:

a) $\forall f \in L^\infty(\mu)$, $\mu\{S^*(f) < \infty\} = 1$,

b) $S^* : Y \to L^0(\mu)$ is continuous at 0.

Then the set $E = \{f \in Y : \mu\{(S_n(f), n \ge 1) \text{ converges}\} = 1\}$ is closed in (Y, d). Consequently, if for any f in some countable dense subset D of (Y, d) the sequence $\{S_n(f), n \ge 1\}$ converges almost everywhere, this will also be fulfilled for any $f \in L^\infty(\mu)$.

The next statement shows that the additional condition b) is natural.

5.1.5 Theorem. Let $\{S_n, n \ge 1\}$ be a sequence of linear operators of $L^\infty(\mu)$ in $L^0(\mu)$. Assume that the following conditions are realized:

a) each $S_n$ is continuous at 0,

b) for any $f \in L^\infty(\mu)$, $\mu\{x : \{S_n(f)(x), n \ge 1\} \text{ converges}\} = 1$.

Then $S^* : Y \to L^0(\mu)$ is continuous at 0.

These results are Theorems 1 and 2 in [Bellow–Jones: 1996]. Theorem 5.1.5 was already stated (without proof) in Bourgain [1988a] under an additional commutation assumption needed in the context of the article. The proof follows, in turn, the same line of arguments as in the proof of the classical Banach principle. The one of Theorem 5.1.4 being rather elementary (combine subadditivity of the oscillation functions O(x, f) introduced in the proof of Theorem 5.1.2 with the continuity at 0 of the maximal operator $S^*f$, namely its continuity in measure at 0), we only give the

Proof of Theorem 5.1.5. We recall in view of (5.1.11) that $(Y, \|\cdot\|_1)$ is a complete metric space. For any δ > 0 and $f \in Y$, we write $V_f(\delta) = \{g \in Y : \|f - g\|_1 < \delta\}$. Let 0 < ε < 1/2 and N be some positive integer. Put

\[ C_N(\varepsilon) = \Big\{ f \in Y : \mu\Big\{x : \sup_{m\ge N} |S_N f(x) - S_m f(x)| > \varepsilon\Big\} \le \varepsilon \Big\} \]

and for M > N,

\[ C_{N,M}(\varepsilon) = \Big\{ f \in Y : \mu\Big\{x : \sup_{N\le m\le M} |S_N f(x) - S_m f(x)| > \varepsilon\Big\} \le \varepsilon \Big\}. \]

Since each $S_n$ is linear and continuous at 0, it is continuous in measure. The sets $C_{N,M}(\varepsilon)$ are therefore closed. We omit the details since it is essentially a repetition of the proof that the sets $B_{n,t}$ are closed in the demonstration of Theorem 5.1.2. As $C_N(\varepsilon) = \bigcap_{M>N} C_{N,M}(\varepsilon)$, the sets $C_N(\varepsilon)$ are closed as well. But our assumption implies that

\[ Y = \bigcup_{N=1}^{\infty} C_N(\varepsilon). \]

And so in view of the Baire theorem, one of these sets, call it $C_N(\varepsilon)$, must have a nonempty interior. Thus there exist $f \in C_N(\varepsilon)$ and δ > 0 such that $V_f(\delta) = f + V_0(\delta) \subset C_N(\varepsilon)$. For each $g \in V_f(\delta)$, we have $\mu\{x : \sup_{m\ge N} |S_N g(x) - S_m g(x)| > \varepsilon\} \le \varepsilon$. Thereby if $h \in V_0(\delta)$, writing h = f − g for some $g \in V_f(\delta)$ we get

\[ \mu\Big\{x : \sup_{m\ge N} |S_N h(x) - S_m h(x)| > 2\varepsilon\Big\} \le 2\varepsilon. \]

But

\[ S^*h(x) = \sup_{m\ge 1} |S_m h(x)| \le \sup_{m\ge N} |S_N h(x) - S_m h(x)| + 2\sup_{m\le N} |S_m h(x)|. \]

Hence

\[ 1 - 2\varepsilon \le \mu\Big\{x : \sup_{m\ge N} |S_N h(x) - S_m h(x)| \le 2\varepsilon\Big\} \le \mu\Big\{x : S^*h(x) \le 2\sup_{m\le N} |S_m h(x)| + 2\varepsilon\Big\}. \]

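Let $C = \{S^*h \le 2\sup_{m\le N} |S_m h| + 2\varepsilon\}$. Then

\[
\begin{aligned}
\rho(S^*h) &= \int_C \frac{S^*h}{1 + S^*h}\, d\mu + \int_{C^c} \frac{S^*h}{1 + S^*h}\, d\mu \\
&\le \int_C \frac{2\sup_{m\le N}|S_m h| + 2\varepsilon}{1 + 2\sup_{m\le N}|S_m h| + 2\varepsilon}\, d\mu + \mu(C^c) \\
&\le 2\int_X \frac{\sup_{m\le N}|S_m h|}{1 + \sup_{m\le N}|S_m h| + 2\varepsilon}\, d\mu + \int_X \frac{2\varepsilon}{1 + 2\varepsilon}\, d\mu + 2\varepsilon \\
&\le 2\rho\Big(\sup_{m\le N}|S_m h|\Big) + 4\varepsilon.
\end{aligned}
\]

But each $S_n$ is continuous at 0 by assumption, and so for some δ′ < δ we have that $\rho(\sup_{m\le N}|S_m h|) \le \varepsilon$ whenever $\|h\|_1 < \delta'$. This allows us to write

\[ \rho(S^*h) \le 6\varepsilon, \qquad \|h\|_1 < \delta'. \]

As ε can be arbitrarily small the proof is now complete.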

5.2 Continuity principle

Let (X, A, μ) be a probability space. For sequences of operators $\{S_n, n \ge 1\}$, $S_n : L^p(\mu) \to L^0(\mu)$, 1 ≤ p < ∞, which are continuous in measure and commute with a mixing family of transformations of (X, A, μ), the Banach principle can be strengthened into a continuity principle. The study of this principle is the object of this section. We will make the following commutation assumption:

(H) There exists a family $\mathcal{E}$ of measurable transformations of X, preserving μ, which are mixing in the following sense:

\[ \forall A, B \in \mathcal{A},\ \forall \alpha > 1,\ \exists E \in \mathcal{E}, \qquad \mu(A \cap E^{-1}(B)) \le \alpha\mu(A)\mu(B), \tag{5.2.1} \]

and commuting with the sequence of operators $\{S_n, n \ge 1\}$: $S_n(f\circ E) = (S_n f)\circ E$ for any n ≥ 1, $f \in L^p(\mu)$ and $E \in \mathcal{E}$.

Remarks. 1. Assumption (H) is verified when for instance the operators $S_n$ commute on $L^p(\mu)$ with an ergodic endomorphism τ of (X, A, μ): $S_n(f\circ\tau) = S_n(f)\circ\tau$. Indeed, it is then easy to check that the family $\mathcal{E} = \{\tau^n, n \in \mathbb{N}\}$ satisfies the mixing condition (5.2.1). Let $A, B \in \mathcal{A}$; by ergodicity of τ,

\[ \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \mu(A \cap \tau^{-k}(B)) = \mu(A)\mu(B). \]

Let α > 1; for sufficiently large n we have

\[ \frac{1}{n}\sum_{k=0}^{n-1} \mu(A \cap \tau^{-k}(B)) \le \alpha\mu(A)\mu(B). \]

And this implies that there exists an integer k = k(n, α) < n such that

\[ \mu(A \cap \tau^{-k}(B)) \le \alpha\mu(A)\mu(B), \]

hence (5.2.1).

Assumption (H) also has two useful consequences.

2. A first consequence concerns the sequence $\{S_n, n \ge 1\}$:

\[ \forall n \ge 1, \qquad \mu\{x : S_n(1)(x) = \text{constant}\} = 1. \tag{5.2.2} \]

Indeed, let a and b be two reals satisfying a < b. Apply (5.2.1) to $A = \{x : S(1)(x) \in [a,b]\} = B$, where S is arbitrary in $\{S_n, n \ge 1\}$. By assumption, for any $E \in \mathcal{E}$, we have $S(1)\circ E = S(1\circ E) = S(1)$, and thus $E^{-1}A = A$. From (5.2.1) it thus follows that $\mu(A) \le \alpha\mu(A)^2$ for any α > 1. Letting α decrease to 1 gives $\mu(A) \le \mu(A)^2$, and consequently $\mu(A) = \mu(A)^2$. Therefore,

\[ \forall a, b \in \mathbb{R},\ a < b, \qquad \mu\{x : S(1)(x) \in [a,b]\} = 0 \text{ or } 1. \tag{5.2.3} \]

From this follows that for some integer $n_0$, noting $I = [n_0, n_0 + 1]$,

\[ \mu\{x : S(1)(x) \in I\} = 1. \]

Let $I_1 = [n_0, n_0 + \tfrac{1}{2}]$ and $I_2 = [n_0 + \tfrac{1}{2}, n_0 + 1]$. Then there exists i ∈ {1, 2} such that

\[ \mu\{x : S(1)(x) \in I_i\} = 1. \]

Assume for instance that it is $I_1$. Dividing $I_1$ into $I_3 = [n_0, n_0 + \tfrac{1}{4}]$ and $I_4 = [n_0 + \tfrac{1}{4}, n_0 + \tfrac{1}{2}]$, one progressively builds, by iterating the same argument, a decreasing sequence of compact intervals which we denote by $\{J_n, n \ge 1\}$, verifying

a) $\forall n \ge 1$, $\mu\{x : S(1)(x) \in J_n\} = 1$,

b) $\forall n \ge 1$, $|J_n| = \frac{1}{2^n}$.

It follows from this that there exists a real λ such that

\[ \mu\{x : S(1)(x) = \lambda\} = 1, \tag{5.2.4} \]

as claimed.

3. A second consequence concerns the mixing property (5.2.1). This one indeed implies that $\forall A \in \mathcal{A}$, $\forall n \ge 2$, $\exists E_1, E_2, \dots, E_n \in \mathcal{E}$ such that if $A' = A \cup \bigcup_{i=1}^{n} E_i^{-1}A$, then

\[ n\mu(A) \le \frac{2}{1 - \mu(A')}. \tag{5.2.5} \]

Said differently, in order to bound μ(A) by C/n, it suffices to show that μ(A′) < 1, and then take $C = \frac{2}{1 - \mu(A')}$. This will be one of the key tools of the proofs of Theorems 5.2.1 and 8.2.1.

In order to establish (5.2.5), the following intermediate property will be needed: $\forall C \in \mathcal{A}$, $\forall \alpha > 1$, $\forall n \ge 1$, $\exists E_1, E_2, \dots, E_n \in \mathcal{E}$ such that

\[ \mu(C \cap E_1^{-1}C \cap \cdots \cap E_n^{-1}C) \le \alpha^n \mu(C)^{n+1}. \tag{5.2.6} \]

For n = 1, it suffices to apply (5.2.1) with the choice A = B = C. We find that there exists $E_1 \in \mathcal{E}$ such that $\mu(C \cap E_1^{-1}C) \le \alpha\mu(C)^2$. For n = 2, we apply (5.2.1) again, this time with the choices $A = C \cap E_1^{-1}C$, B = C. Then there exists $E_2 \in \mathcal{E}$ such that

\[ \mu(C \cap E_1^{-1}C \cap E_2^{-1}C) \le \alpha\mu(C \cap E_1^{-1}C)\mu(C) \le \alpha^2\mu(C)^3. \]

The reasoning made for n = 2 is next easily iterated for any integer n > 2; hence property (5.2.6).

Now we show how to deduce property (5.2.5). Let $A \in \mathcal{A}$ be fixed. We can assume 0 < μ(A) < 1; indeed (5.2.5) is obvious if μ(A) = 0, whereas if μ(A) = 1, then μ(A′) = 1 and so (5.2.5) is also trivially realized. Observe that for any $E \in \mathcal{E}$, $E^{-1}(A^c) = (E^{-1}A)^c$. Apply then (5.2.6) to $C = A^c$, $\alpha = \frac{1}{\sqrt{1 - \mu(A)}}$. There thus exist $E_1, E_2, \dots, E_n \in \mathcal{E}$ such that

\[ \mu\big(A^c \cap (E_1^{-1}A)^c \cap \cdots \cap (E_n^{-1}A)^c\big) = 1 - \mu(A') \le \alpha^n(1 - \mu(A))^{n+1} \le (1 - \mu(A))^{\frac{n}{2}}. \]

In other words,

\[ \mu(A) \le 1 - (1 - \mu(A'))^{\frac{2}{n}}. \]

It is at this stage that the little technical restriction n ≥ 2 is used. Indeed, if $f(x) = (1 - x)^{\frac{2}{n}}$, 0 ≤ x ≤ 1, then $f'(x) = -\frac{2}{n}(1 - x)^{\frac{2}{n} - 1}$ is decreasing. Applying then the mean value theorem to f gives

\[ \mu(A) \le f(0) - f(\mu(A')) \le \mu(A')\,\frac{2}{n}\,\frac{1}{[1 - \mu(A')]^{1 - \frac{2}{n}}} \le \frac{2}{n}\,\frac{1}{1 - \mu(A')}. \]

And this establishes property (5.2.5). We can now state the continuity principle.

5.2.1 Theorem (Stein [1961]). Suppose that $\{S_n, n \ge 1\}$ is a sequence of operators, $S_n : L^p(\mu) \to L^0(\mu)$, 1 ≤ p ≤ 2, which are continuous in measure and satisfy the commutation assumption (H). Then the following properties are equivalent:

\[ \forall f \in L^p(\mu), \qquad \mu\{x : S^*f(x) < \infty\} = 1, \tag{5.2.7} \]

\[ \exists\, 0 < C < \infty : \forall f \in L^p(\mu), \qquad \sup_{\lambda\ge 0} \lambda^p \mu\{x : S^*f(x) > \lambda\} \le C\int_X |f|^p\, d\mu. \tag{5.2.8} \]

We refer to (5.1.4) for the notation $S^*f$. A useful remark is the following: let p′ < p; under (5.2.7) we get from (5.2.8) that $S^*f \in L^{p'}(\mu)$, hence $S_n f \in L^{p'}(\mu)$, n = 1, . . . . And if

\[ \mu\{x : \{S_n f(x), n \ge 1\} \text{ converges}\} = 1, \]

denoting S(f) the limit, by the dominated convergence theorem $S(f) \in L^{p'}(\mu)$ as well.

Although the proof we shall present of this theorem is much inspired by the one given in Garsia [1970], it differs in two points which seem of interest. First, we will use Gaussian random variables instead of Rademacher random variables in Stein’s randomisation technique. But above all, we will proceed with a direct reasoning, unlike in Garsia [1970]. This has the advantage of better highlighting the basic arguments of the proof.

Proof. We denote by $E_0$ the identity of X. Let $f \in L^p(\mu)$ be such that $\|f\|_{p,\mu} = 1$. Let also λ > 0 and $A = \{x : S^*f(x) > \lambda\}$. Let n ≥ 2 be some integer, which will be determined later relatively to λ. By virtue of property (5.2.5), we can find $E_1, E_2, \dots, E_n \in \mathcal{E}$ such that if $A' = A \cup \bigcup_{i=1}^{n} E_i^{-1}A$, then

\[ n\mu(A) \le \frac{2}{1 - \mu(A')}. \]

Let $(g_i)_{i\ge 0}$ be a sequence of i.i.d. N(0, 1) distributed random variables defined on a different probability space which we denote by $(\Omega, \mathcal{B}, P)$. Associate to f the Gaussian Stein’s elements

\[ \forall n \ge 1, \qquad F_{n,f} = \frac{1}{(1+n)^{1/p}}\sum_{k=0}^{n} g_k\, f\circ E_k. \tag{5.2.9} \]

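Step 1. Moment of order p of $F_{n,f}$. First observe, by means of Hölder’s inequality,

\[ \int_\Omega |F_{n,f}|^p\, dP \le \Big(\int_\Omega |F_{n,f}|^2\, dP\Big)^{p/2} = \Big(\frac{1}{(1+n)^{2/p}}\sum_{i=0}^{n} f^2\circ E_i\Big)^{p/2}. \]

As 1 ≤ p ≤ 2,

\[ \Big(\frac{1}{(1+n)^{2/p}}\sum_{i=0}^{n} f^2\circ E_i\Big)^{p/2} = \frac{1}{1+n}\Big(\sum_{i=0}^{n} f^2\circ E_i\Big)^{p/2} \le \frac{1}{1+n}\sum_{i=0}^{n} |f|^p\circ E_i. \]

Thus

\[ \int_X\int_\Omega |F_{n,f}|^p\, dP\, d\mu \le \|f\|_{p,\mu}^p = 1. \]

Hence by Fubini’s theorem

\[ \int_\Omega \|F_{n,f}\|_{p,\mu}^p\, dP \le 1. \tag{5.2.10} \]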

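Step 2. Size of A′. Let $x \in A'$. There thus exists an index 0 ≤ i ≤ n such that $x \in E_i^{-1}A$. We have consequently

\[ S^*(f)(E_i x) > \lambda. \tag{5.2.11} \]

For some integer m, we will thus have $|S_m(f)(E_i x)| > \lambda$. Now by using the commutation assumption

\[ S_m(F_{n,f}) = \frac{1}{(1+n)^{1/p}}\sum_{k=0}^{n} g_k\, S_m(f\circ E_k) = \frac{1}{(1+n)^{1/p}}\sum_{k=0}^{n} g_k\, S_m(f)\circ E_k = F_{n,S_m(f)}. \]

The random variables $g_k$ being symmetric, we also have

\[ P\Big\{ \operatorname{sign}\Big(\sum_{\substack{k=0 \\ k\ne i}}^{n} g_k\, S_m(f)\circ E_k\Big) = \operatorname{sign}\big(g_i\, S_m(f)\circ E_i\big) \Big\} = \frac{1}{2}. \tag{5.2.12} \]

Let w be such that $P\{|g_i| \ge w\} = 3/4$. We can thus assign to any element x of A′ a measurable set $I_x$ of probability $P\{I_x\} \ge 1/4$ such that

\[ \omega \in I_x \ \Longrightarrow\ S^*(F_{n,f})(x,\omega) \ge \frac{\lambda w}{(1+n)^{1/p}}. \tag{5.2.13} \]

Let $\Omega' = \{\omega : \|F_{n,f}(\cdot,\omega)\|_{p,\mu} \le 8^{1/p}\}$. By means of (5.2.10) and Tchebycheff’s inequality, $P(\Omega') \ge 1 - (1/8) = \frac{7}{8}$. Thus for any $x \in A'$,

\[ P\{I_x \cap \Omega'\} \ge \frac{1}{8}. \]

Define

\[ \phi(x,\omega) = \begin{cases} 1 & \text{if } S^*(F_{n,f})(x,\omega) > \dfrac{\lambda w}{(1+n)^{1/p}}, \\ 0 & \text{else.} \end{cases} \tag{5.2.14} \]

Let $x \in A'$, then

\[ \frac{1}{8} \le \int_{\Omega'\cap I_x} \phi(x,\omega)\, dP \le \int_{\Omega'} \phi(x,\omega)\, dP. \]

By integrating this inequality on A′ relatively to μ, next using Fubini’s theorem, we obtain

\[
\begin{aligned}
\frac{\mu(A')}{8} &\le \int_{A'}\int_{\Omega'} \phi(x,\omega)\, dP(\omega)\, d\mu(x) \le \int_X\int_{\Omega'} \phi(x,\omega)\, dP(\omega)\, d\mu(x) \\
&= \int_{\Omega'} \mu\Big\{S^*(F_{n,f}) > \frac{\lambda w}{(1+n)^{1/p}}\Big\}\, dP.
\end{aligned}
\tag{5.2.15}
\]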

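By virtue of the Banach principle, there exists a real C such that

\[ \forall g \in L^p(\mu), \qquad \mu\{S^*(g) > C\|g\|_{p,\mu}\} \le \frac{1}{9}. \tag{5.2.16} \]

As $\|F_{n,f}\|_{p,\mu} \le 8^{1/p}$ on Ω′, we thus have on this set

\[ \mu\{S^*(F_{n,f}) > C\,8^{1/p}\} \le \mu\{S^*(F_{n,f}) > C\|F_{n,f}\|_{p,\mu}\} \le \frac{1}{9}. \tag{5.2.17} \]

Choose then n such that $\frac{\lambda w}{(1+n)^{1/p}} \ge C\,8^{1/p}$, namely $1 + n \le \frac{1}{8}\big(\frac{\lambda w}{C}\big)^p$. As we have assumed n ≥ 2, this is possible only if $\lambda w \ge 24^{1/p}C$. For this choice of λ, let

\[ n = \sup\Big\{ m \ge 2 : 1 + m \le \frac{1}{8}\Big(\frac{\lambda w}{C}\Big)^p \Big\} = \Big\lfloor \frac{1}{8}\Big(\frac{\lambda w}{C}\Big)^p \Big\rfloor - 1. \tag{5.2.18} \]

It follows from (5.2.17) that

\[ \int_{\Omega'} \mu\Big\{S^*(F_{n,f}) > \frac{\lambda w}{(1+n)^{1/p}}\Big\}\, dP \le \frac{1}{9}. \]

We have thus shown that

\[ \frac{1}{8}\mu(A') \le \frac{1}{9}, \quad\text{that is,}\quad \mu(A') \le \frac{8}{9}. \]

As in view of the first step, $n\mu(A) \le \frac{2}{1 - \mu(A')}$, we also deduce nμ(A) ≤ 18. But $n \ge \frac{1}{16}\big(\frac{\lambda w}{C}\big)^p$. Said differently

\[ \mu\{S^*(f) > \lambda\} \le \frac{18}{n} \le 300\Big(\frac{C}{\lambda w}\Big)^p, \tag{5.2.19} \]

if $\lambda w \ge 24^{1/p}C$. Finally, if $0 < \lambda w < 24^{1/p}C$, we have $\mu\{S^*(f) > \lambda\} \le 1 \le 24\big(\frac{C}{\lambda w}\big)^p$. Summarizing, for any λ > 0,

\[ \mu\{S^*(f) > \lambda\} \le 300\Big(\frac{C}{\lambda w}\Big)^p. \tag{5.2.20} \]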

The proof is thus achieved.

Remarks. 1. Inequality (5.2.15) is crucial. When combined with the initial inequality, it provides the key of the proof:

\[ \frac{n\mu\{S^*(f) > \lambda\} - 2}{n\mu\{S^*(f) > \lambda\}} \le 8\int_{\Omega'} \mu\Big\{S^*(F_{n,f}) > \frac{\lambda w}{(1+n)^{1/p}}\Big\}\, dP, \tag{5.2.21} \]

this being verified for any λ > 0 and any integer n ≥ 2. That inequality also indicates a possible bifurcation at this stage of the proof. Indeed, by inverting the order of integration, and letting $\lambda = \frac{M}{w}\cdot(n+1)^{1/p}$, where M is a positive real and n ≥ 2 an integer, we have

\[ \frac{n\mu\{S^*(f) > \lambda\} - 2}{n\mu\{S^*(f) > \lambda\}} \le 8\int_X P\{S^*(F_{n,f}) > M\}\, d\mu. \tag{5.2.22} \]

Said differently, $S^*(f)$ is controlled by means of $S^*(F_{n,f})$ for an appropriate choice of the integer n, which is a very striking fact.

2. If 1 < p ≤ 2, then $S^*(f) \in L^1(\mu)$ whenever $f \in L^p(\mu)$. However, if we do not assume that the operators commute with a mixing family, the maximal function need not be integrable even for $f \in L^\infty$.

Wierdl’s counterexample. Consider the following example given in Bellow and Jones [1994: 157]:

\[ S_n f(x) = \sqrt{n(n+1)}\,\Big(\int_0^1 f(t)\,dt\Big)\, \chi_{]1/(n+1),\,1/n[}(x). \]

These operators are contractions on $L^2(\mathbb{T})$ and converge to 0 for all $x \in \mathbb{T}$. But, for any $f \in L^2(\mathbb{T})$ such that $\int_{\mathbb{T}} f(t)\,dt > 0$ we have

\[ \int_{\mathbb{T}} \sup_{n\ge 1} S_n f(x)\, dx = \int_{\mathbb{T}} f(t)\,dt\ \sum_{n=1}^{\infty} \sqrt{n(n+1)} \int_{1/(n+1)}^{1/n} 1\, dx = \infty. \]
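The two claims about Wierdl's counterexample (each $S_n$ is an $L^2$ contraction, yet the maximal function is not integrable) reduce to elementary sums that can be checked numerically. The sketch below assumes the normalization $S_n f = \sqrt{n(n+1)}\,(\int_0^1 f)\,\chi_{]1/(n+1),1/n[}$, which is the one making each $S_n$ an $L^2$ contraction, and takes $\int f = 1$; the function names are hypothetical.

```python
import math


def l2_norm_of_Sn(n):
    """L^2 norm of sqrt(n(n+1)) * indicator of ]1/(n+1), 1/n[.

    This is the norm of S_n f when the integral of f equals 1; it is
    equal to 1 for every n, so |S_n f|_2 = |int f| <= |f|_2.
    """
    length = 1 / n - 1 / (n + 1)  # Lebesgue measure of the interval
    return math.sqrt(n * (n + 1) * length)


def integral_of_sup(N):
    """Integral over [0, 1] of sup_{n <= N} S_n f when int f = 1.

    The intervals ]1/(n+1), 1/n[ are disjoint, so the supremum equals
    sqrt(n(n+1)) on the n-th interval and the integral is a plain sum.
    """
    return sum(math.sqrt(n * (n + 1)) * (1 / n - 1 / (n + 1))
               for n in range(1, N + 1))


print(l2_norm_of_Sn(10))
for N in (10, 1000, 100000):
    print(N, integral_of_sup(N))
```

The partial sums equal $\sum_{n\le N} 1/\sqrt{n(n+1)}$, which dominate $\sum_{n\le N} 1/(n+1)$ and therefore grow at least like $\log N$: the integral of the maximal function diverges, although slowly.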

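3. The result is optimal. Indeed, without any additional assumption on the sequence of operators $\{S_n, n \ge 1\}$, one may give an example (Stein [1961]) showing that it is no longer true for $L^p(\mu)$ with p > 2.

A counterexample for p > 2. Let $\mathbb{T} = \mathbb{R}/\mathbb{Z} = [-\tfrac{1}{2}, \tfrac{1}{2})$ be the torus equipped with the normalized Lebesgue measure λ. Let $e(x) = e^{2i\pi x}$ and $e_n(x) = e(nx)$, for $n \in \mathbb{Z}$. Let $h(x) = \big(|x|^{1/2}\log(1/|x|)\big)^{-1}$, $x \in \mathbb{T}$. Then $h \in L^2(\mathbb{T}) \setminus \bigcup_{q>2} L^q(\mathbb{T})$. Let $h(x) \sim \sum_{n\in\mathbb{Z}} c_n e_n(x)$, with $\sum_{n\in\mathbb{Z}} |c_n|^2 < \infty$. If $\{\varepsilon_n, n \in \mathbb{Z}\}$ is a Rademacher sequence, then

\[ E\int_{\mathbb{T}} \exp\Big|\sum_{n\in\mathbb{Z}} c_n\varepsilon_n e_n\Big|^2\, d\lambda < \infty. \]

Thus, there is a sequence of ±1’s, again denoted by $\{\varepsilon_n, n \in \mathbb{Z}\}$, such that

\[ f = \sum_{n\in\mathbb{Z}} c_n\varepsilon_n e_n \in \bigcap_{p<\infty} L^p(\mathbb{T}). \]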

p 2. Then, it would follow that T g q < ∞ whenever 2 ≤ q < p if g ∈ Lq (T). Now, take g = f . Then Tf = n∈Z cn en = h, and we obtain a contradiction since h ∈ / Lq (T), if q > 2. It is possible, however, to prove a partial extension to Lp spaces with p > 2. 5.2.2 Theorem (Stein [1961]). Suppose that {Sn , n ≥ 1} is a sequence of operators, Sn : Lp (μ) → L0 (μ), 2 ≤ p < ∞, continuous in measure and satisfying the commutation assumption (H ). Then the following properties are equivalent: μ{x : S ∗ f (x) < ∞} = 1, (5.2.23) 1/p ≤ C. sup λ2 μ x : S ∗ f (x) > λ |f |p dμ

∀f ∈ Lp (μ), ∃0 < C < ∞ : ∀f ∈ Lp (μ),

λ≥0

X

(5.2.24)

Sketch of proof. The proof is nearly the same as that of Theorem 5.2.1. We just indicate the modification to be incorporated into the line of arguments. Instead of (5.2.9), we associate to f the random elements

\[ F_{n,f} = \frac{1}{(1+n)^{1/2}}\sum_{i=0}^{n} g_i\, f\circ E_i, \qquad n = 1, 2, \dots. \tag{5.2.9$'$} \]

Let $f \in L^p(\mu)$ with $\|f\|_{p,\mu} > 0$. The elementary integrability properties of Gaussian random variables plus a plain convexity argument imply

\[ \int_\Omega |F_{n,f}|^p\, dP \le C_p^p\Big(\int_\Omega |F_{n,f}|^2\, dP\Big)^{p/2} = C_p^p\Big(\frac{1}{1+n}\sum_{i=0}^{n} f^2\circ E_i\Big)^{p/2} \le \frac{C_p^p}{1+n}\sum_{i=0}^{n} |f|^p\circ E_i, \]

where the constant $C_p$ depends on p only. Thus,

\[ \int_X\int_\Omega |F_{n,f}|^p\, dP\, d\mu \le C_p^p\|f\|_{p,\mu}^p. \]

Hence, by means of Fubini’s theorem

\[ \int_\Omega \|F_{n,f}\|_{p,\mu}^p\, dP \le C_p^p\|f\|_{p,\mu}^p. \tag{5.2.10$'$} \]

Replace f by $f' = f/(C_p\|f\|_{p,\mu})$. The rest of the proof then is as before.

Sawyer [1966] has observed, nevertheless, that Theorem 5.2.1 remains valid in any $L^p(\mu)$ with 1 ≤ p < ∞ for positive operators: for any n ≥ 1 and any $f \in L^p(\mu)$,

\[ \mu\{f \ge 0\} = 1 \ \Longrightarrow\ \mu\{S_n(f) \ge 0\} = 1. \tag{5.2.25} \]

This is the object of the following theorem. Before considering this, it is worthwhile remarking that, by assumption, these operators are continuous on $L^\infty(\mu)$. A first consequence of the commutation assumption together with positivity is that for any $f \in L^\infty(\mu)$

\[ |S_n(f)| \le S_n(|f|) \le \|f\|_\infty S_n(1). \tag{5.2.26} \]

There thus exists a positive real $A_n$ such that

\[ \forall f \in L^\infty(\mu), \qquad \|S_n(f)\|_\infty \le A_n\|f\|_\infty, \]

hence the continuity. The result can be stated as follows. 5.2.3 Theorem. Let 1 ≤ p < ∞ and let {Sn , n ≥ 1} be a sequence of positive operators, Sn : Lp (μ) → L0 (μ), and continuous in measure. We further assume that the sequence {Sn , n ≥ 1} satisfies the commutation assumption (H). Then the following properties are equivalent: μ{x : S ∗ f (x) < ∞} = 1, (5.2.27) 1/p ≤ C. sup λp μ x : S ∗ f (x) > λ |f |p dμ

∀f ∈ Lp (μ), ∃0 < C < ∞ : ∀f ∈ Lp (μ),

λ≥0

X

(5.2.28) Proof. Here again a direct proof is accessible. This result will be easier to prove than Theorem 5.2.1. We denote by E0 the identity of X. Let f ∈ Lp (μ) be such that

f p,μ = 1 and μ{x : f (x) ≥ 0} = 1. Let λ > 0 be fixed. We associate to them the set A = {x : S ∗ f (x) > λ}. Let n ≥ 2 be some integer which will be defined later on with respect to λ.,By virtue of property (5.2.5), there exist E1 , E2 , . . . , En ∈ E such that if A = A ∪ ni=1 Ei1 A, then nμ(A) ≤ Introduce the auxiliary element F =

1 (1+n)1/p

1 ≤ (n + 1) n

p

F p,μ

2 . 1 − μ(A )

(5.2.29)

max0≤k≤n |f Ek |. Then

|f Ek |p dμ = 1.

k=0 X

The positivity assumption of operators $S_n$ moreover implies that
\[ \forall m \ge 1,\ \forall k = 0, \ldots, n, \qquad S_m(F) \ge \frac{1}{(n+1)^{1/p}} S_m(f \circ E_k) = \frac{1}{(n+1)^{1/p}} (S_m f) \circ E_k. \]
Consequently for $k = 0, \ldots, n$,
\[ S^* F \ge \frac{1}{(n+1)^{1/p}} (S^* f) \circ E_k. \]
It follows that
\[ \mu\Big\{S^* F > \frac{\lambda}{(n+1)^{1/p}}\Big\} \ge \mu\{\exists k \in [0, n] : (S^* f) \circ E_k > \lambda\} = \mu(A'). \tag{5.2.30} \]

By virtue of the Banach principle, there exists a real $C > 0$ such that for any $g \in L^p(\mu)$, $\mu\{S^* g > C \|g\|_{p,\mu}\} \le 1/3$. We assume that $\lambda \ge 3^{1/p} C$. As $\|F\|_{p,\mu} \le 1$, if $n = [(\lambda/C)^p - 1]$ we have by (5.2.30),
\[ \mu(A') \le \mu\Big\{S^* F > \frac{\lambda}{(n+1)^{1/p}}\Big\} \le \mu\{S^* F > C \|F\|_{p,\mu}\} \le \frac{1}{3}. \]
But $n\mu(A) \le \frac{2}{1 - \mu(A')}$; we have thus obtained that $n\mu(A) \le 3$. Said differently, since $n \ge (\lambda/C)^p/3$, for any $\lambda \ge 3^{1/p} C$,
\[ \mu\{S^*(f) > \lambda\} \le \frac{3}{n} \le 9 \Big(\frac{C}{\lambda}\Big)^p. \tag{5.2.31} \]
Finally, if $\lambda \le 3^{1/p} C$, then $1 \le 3 (C/\lambda)^p$, so that (5.2.31) is trivially realized in this case. This thus achieves the proof.
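The arithmetic behind the choice of $n$ in the proof above can be traced on concrete numbers. This is only an illustrative sketch: the function name `weak_type_constants` and the sample values of $\lambda$, $C$ and $p$ are assumptions made for the example.

```python
import math

def weak_type_constants(lam: float, C: float, p: float):
    """Reproduce n = floor((lam/C)^p - 1) and the two bounds derived from it."""
    x = (lam / C) ** p
    if x < 3.0:
        raise ValueError("the argument assumes lam >= 3**(1/p) * C")
    n = math.floor(x - 1.0)
    # n, the bound 3/n on mu(A), and the target bound 9 * (C/lam)^p
    return n, 3.0 / n, 9.0 / x

for lam in (2.0, 3.5, 10.0):
    n, bound, target = weak_type_constants(lam, C=1.0, p=2.0)
    print(lam, n, bound, target)
```

In each case $n \ge 2$, the level $\lambda/(n+1)^{1/p}$ stays above $C$, and $3/n$ is dominated by $9(C/\lambda)^p$, which is exactly what (5.2.31) requires.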

Remarks. When $p = \infty$ and $\{S_n, n \ge 1\}$ is a sequence of positive operators from $L^\infty(\mu)$ to $L^0(\mu)$, continuous in measure and satisfying the commutation assumption (H), we have already observed in the remarks preceding Theorem 5.2.1 that these operators are continuous on $L^\infty(\mu)$. Besides, if
\[ \forall f \in L^\infty(\mu), \qquad \mu\{x : S^* f(x) < \infty\} = 1, \tag{5.2.32} \]
then there exists a positive real $A$ such that for any $f \in L^\infty(\mu)$,
\[ \|S^*(f)\|_\infty \le A \|f\|_\infty. \]
So that we also have in a trivial way a continuity principle in $L^\infty(\mu)$. It is easy to see (cf. for instance Graversen–Peškir–Weber [1995: Theorem 3.1]) that this principle also extends to exponential type Orlicz spaces.

Now, we shall obtain a variant of Theorem 5.2.1, for the case $p = 1$, which is particularly useful in applications. Consider a commutative compact group $M$ and denote by "+" the group operation. Let $m$ be the unique invariant measure, the Haar measure, on $M$ with associated $L^p(M)$ spaces. $C(M)$ will designate the space of continuous functions on $M$, with the supremum norm, and $B(M)$ will designate the space of finite Borel measures on $M$ with the usual norm. Let $\{S_n, n \ge 1\}$ be a sequence of operators. We assume:

(a) Each $S_n$ is a bounded operator from $L^1(M)$ to $C(M)$.

(b) Each $S_n$ commutes with translations.

By Riesz's representation of bounded linear functionals on $L^1(M)$, it may be proved that the conditions (a) and (b) are equivalent with

(c) $S_n f(x) = \int_M K_n(x - y) f(y) \, m(dy)$, where $K_n \in L^\infty(M)$.

Such an operator has a natural extension to a bounded operator from $B(M)$ to $L^\infty(M)$, which we again denote by $S_n$. Notice that this extension still commutes with translations. Similarly, we also write $S^* \mu = \sup_{n \in \mathbb{N}} |S_n \mu|$. Then we have the following result.

5.2.4 Theorem. Under the above described assumptions, the following assertions are equivalent:
\[ \forall f \in L^1(M), \qquad m\{x : S^* f(x) < \infty\} = 1, \tag{5.2.33} \]
\[ \exists\, 0 < C < \infty : \forall \mu \in B(M), \qquad \sup_{\lambda \ge 0} \lambda \, m\Big\{x : S^* \mu(x) > \lambda \int_M |d\mu|\Big\} \le C. \tag{5.2.34} \]

Before giving the proof, we need a lemma.

5.2.5 Lemma. Let $T_1, \ldots, T_N$ be operators that each satisfy the conditions (a) and (b) above. Let $\mu \in B(M)$. Then there exists a sequence $f_1, f_2, \ldots$ of elements of $L^1(M)$, such that $\|f_k\|_1 \le \|\mu\|$ and
\[ \lim_{k \to \infty} T_n f_k = T_n \mu, \qquad n = 1, \ldots, N. \]

Proof. Let $\varphi_1, \varphi_2, \ldots$ be continuous nonnegative functions such that $\int_M \varphi_k \, dm = 1$ and forming an approximation of the identity in the usual sense. Put $f_k = \varphi_k * \mu$. Then $\|f_k\|_1 \le \|\mu\|$. By (c), we may represent each $T_n$ as $T_n f = K_n * f$ for some function $K_n \in L^\infty(M)$. Thus
\[ T_n f_k = K_n * (\varphi_k * \mu) = \varphi_k * (K_n * \mu) = \varphi_k * (T_n \mu). \]
Now, owing to the well-known fact that $\|\varphi_k * (T_n \mu) - T_n \mu\|_1$ tends to 0 as $k$ tends to infinity, we deduce the claimed result by extracting if necessary a subsequence of the sequence $\{\varphi_k, k \ge 1\}$.

Proof of Theorem 5.2.4. In view of Theorem 5.2.1, there exists a constant $C$ such that for any $f \in L^1(M)$ and $\alpha \ge 0$,
\[ \alpha \, m\Big\{\sup_{1 \le n \le N} |S_n f| > \alpha\Big\} \le C \|f\|_1. \]
Apply this to the function $f = f_k$, where the $f_k$ are given in the above lemma, and let $k$ tend to infinity. It comes from this that
\[ \alpha \, m\Big\{\sup_{1 \le n \le N} |S_n \mu| > \alpha\Big\} \le C \int_M |d\mu|. \]
Letting now $N$ tend to infinity achieves the proof.

5.3 Applications

The continuity principle can be used to prove results of negative nature, but also of positive nature. In his fundamental paper, Stein gave several examples of applications. We study some of them.

1. Conjugate functions. For Fourier series of functions $f \sim \sum_{n \in \mathbb{Z}} a_n e_n$, the so-called conjugate function is defined by
\[ \tilde f \sim \sum_{n \in \mathbb{Z}} -i \, \mathrm{sign}(n) a_n e_n. \tag{5.3.1} \]
The linear operator which maps $f$ to $\tilde f$ satisfies $\|\tilde f\|_2 \le \|f\|_2$, and more generally for $1 < p < \infty$,
\[ \|\tilde f\|_p \le C_p \|f\|_p. \tag{5.3.2} \]
This inequality is due to M. Riesz. For $p = 1$, this inequality fails, and the appropriate substitute result in that case is a theorem due to Kolmogorov, which asserts that
\[ \sup_{t \ge 0} t \lambda\{x \in \mathbb{T} : |\tilde f(x)| > t\} \le C \|f\|_1. \tag{5.3.3} \]
It can be observed that this result, together with the elementary inequality for $p = 2$, already implies the Riesz inequality by the Marcinkiewicz interpolation theorem. Among the various proofs of inequality (5.3.3), the original proof of Kolmogorov is of special interest. He considered
\[ \tilde f_r = \sum_{n \in \mathbb{Z}} -i \, \mathrm{sign}(n) r^{|n|} a_n e_n. \]
By a known result, for every $f \in L^1(\lambda)$, $\lim_{r \to 1} \tilde f_r$ exists almost surely. Kolmogorov proved that the limit operator satisfies inequality (5.3.3). But the mapping $f \to \tilde f_r$ commutes with translations, and so this directly follows from the continuity principle enunciated in Theorem 5.2.1.


2. Lebesgue differentiation theorem. Consider the family of operators
\[ T_h f(x) = \frac{1}{h} \int_0^h f(x + t) \, dt, \qquad h > 0. \tag{5.3.4} \]
According to the classical theorem of Lebesgue, if $f$ is integrable, then for almost every $x$,
\[ \lim_{h \to 0} T_h f(x) = f(x). \tag{5.3.5} \]
Much later, Hardy and Littlewood introduced their maximal function
\[ f^*(x) = \sup_{h > 0} |T_h f(x)|, \tag{5.3.6} \]
and proved for $p > 1$ the inequality
\[ \|f^*\|_p \le C_p \|f\|_p. \tag{5.3.7} \]
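The maximal function (5.3.6) can be explored on a finite model. The sketch below is an illustration under stated assumptions, not a construction from the text: the circle is replaced by the cyclic group $\mathbb{Z}_N$ with normalized counting measure, the windows have lengths $h = 1, \ldots, N$, and the helper names are hypothetical. It reports the ratio appearing in a weak-type (1,1) bound of the same form as (5.3.3).

```python
def one_sided_maximal(f):
    """Discrete analogue of f*(x) = sup_h |T_h f(x)| on the cyclic group Z_N.

    Windows start at x, wrap around, and have length h = 1, ..., N.
    """
    n = len(f)
    out = []
    for x in range(n):
        best, running = 0.0, 0.0
        for h in range(1, n + 1):
            running += f[(x + h - 1) % n]
            best = max(best, abs(running) / h)
        out.append(best)
    return out

def weak_type_ratio(f, t):
    """t * measure{f* > t} divided by the L^1 norm of f (normalizations cancel)."""
    level_set = sum(1 for v in one_sided_maximal(f) if v > t)
    l1 = sum(abs(v) for v in f)
    return t * level_set / l1

f = [0.0] * 64
f[0] = 64.0  # a discrete approximation of a point mass
print([round(weak_type_ratio(f, t), 3) for t in (1.5, 4.0, 16.0)])
```

The ratios stay bounded independently of the level $t$, which is the discrete counterpart of a weak-type estimate for the maximal function.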

F. Riesz observed that the inequality
\[ \sup_{t \ge 0} t \lambda\{x \in \mathbb{T} : |f^*(x)| > t\} \le C \|f\|_1 \tag{5.3.8} \]
is implicit in their proof. Note that the operators $T_h$ commute with translations. Thus, in view of Lebesgue's theorem, that inequality follows from the continuity principle.

3. Differentiation of functions of two variables. Let $f \in L^1(\mathbb{T}^2)$, with Fourier expansion
\[ f(x, y) \sim \sum_{(n,m) \in \mathbb{Z}^2} a_{n,m} e_n(x) e_m(y). \]
One may formally define double conjugate series
\[ \tilde f = - \sum_{(n,m) \in \mathbb{Z}^2} \mathrm{sign}(n) \mathrm{sign}(m) a_{n,m} e_n e_m \tag{5.3.9} \]
and ask whether this double conjugate series exists in a suitable sense. Similarly to the approach of Kolmogorov, one can consider the Abel sums of the above series,
\[ \tilde f_{r,\rho} = - \sum_{(n,m) \in \mathbb{Z}^2} \mathrm{sign}(n) \mathrm{sign}(m) r^{|n|} \rho^{|m|} a_{n,m} e_n e_m, \tag{5.3.10} \]
and inquire about the existence of the limit $\lim_{r \to 1, \rho \to 1} \tilde f_{r,\rho}$. For $f \in L^p(\mathbb{T}^2)$, $p > 1$, it is known that this limit exists almost everywhere. In fact $f \in L \log L(\mathbb{T}^2)$ suffices. There is an analogy between double conjugate series and the differentiation of double integrals. Indeed, if $f \in L \log L(\mathbb{T}^2)$, it is known that
\[ f(x, y) = \lim_{h \to 0,\, \theta \to 0} \frac{1}{h\theta} \int_0^h \int_0^\theta f(u + x, v + y) \, du \, dv, \tag{5.3.11} \]
for almost all $x$ and $y$. However, if one merely assumes that $f \in L^1(\mathbb{T}^2)$, then the above limit may fail to exist almost everywhere. But, if for instance we let $h = \theta$, referring to Saks [1937] the limit (5.3.11) exists almost everywhere. In analogy with this, it was believed that the limit $\lim_{r \to 1} \tilde f_{r,r}$ for the double conjugate series exists for almost all $x$ and $y$. Surprisingly enough, by a result of Stein [1961], the answer turns out to be negative. Here is the argument. As is well known,
\[ \tilde f_{r,\rho}(x, y) = \int_{\mathbb{T}} \int_{\mathbb{T}} Q(r, x - u) Q(\rho, y - v) f(u, v) \, du \, dv, \]
where
\[ Q(r, v) = \frac{r \sin 2\pi v}{1 - 2r \cos 2\pi v + r^2}. \]
Let $r_m = 1 - 1/m$. We shall prove that there exists an $f \in L^1(\mathbb{T}^2)$ such that the limit $L(f) = \lim_{m \to \infty} \tilde f_{r_m, r_m}$ exists only in a set of measure 0. Assume the contrary. As the mappings $f \to \tilde f_{r_m, r_m}$ satisfy conditions (a) and (b), the conclusion of Theorem 5.2.4 holds. Apply it for $\mu$ equal to the Dirac measure at the origin. Then,
\[ \sup_{m \ge 1} |\tilde f_{r_m, r_m}(x, y)| = \sup_{m \ge 1} |Q(r_m, x) Q(r_m, y)| \ge |Q(1, x) Q(1, y)| = \frac{1}{4} |(\cot \pi x)(\cot \pi y)| \ge \frac{A}{|xy|}. \]
The measure of the set $\{(x, y) \in \mathbb{T}^2 : \frac{A}{|xy|} > t\}$ is of order $B(\log t)/t$, thereby not of order $B/t$ as it should be by the conclusion of Theorem 5.2.4. Hence a contradiction, and this proves the result.
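The logarithmic factor in the measure of the hyperbolic region above can be made explicit on a simple model. The sketch below assumes the unit square $(0,1]^2$ in place of $\mathbb{T}^2$ and a hypothetical value $A = 1$; on this model the measure of $\{xy < s\}$ equals $s(1 - \log s)$ for $0 < s \le 1$, which the second function confirms by a midpoint rule.

```python
import math

def hyperbola_measure(s: float) -> float:
    """Lebesgue measure of {(x, y) in (0,1]^2 : x*y < s} for 0 < s <= 1."""
    return s * (1.0 - math.log(s))

def hyperbola_measure_numeric(s: float, cells: int = 100_000) -> float:
    """Midpoint-rule value of the integral over x in (0,1) of min(1, s/x)."""
    h = 1.0 / cells
    return h * sum(min(1.0, s / ((i + 0.5) * h)) for i in range(cells))

A = 1.0  # hypothetical value of the constant A
for t in (10.0, 100.0, 1000.0):
    # {A/(xy) > t} = {xy < A/t}; the product t * measure grows like log t
    print(t, t * hyperbola_measure(A / t))
```

The printed products increase with $t$, so no bound of the form $B/t$ can hold for the measure, in line with the contradiction obtained in the text.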

4. Divergent Fourier series. A deep theorem of Kolmogorov asserts the existence of an integrable function $f$ whose Fourier series diverges almost everywhere. The proof of this result is extremely difficult. It is possible, however, by means of the continuity principle to obtain a simplification and a refinement of this result. Let $S_n(f)$ designate the partial sum of order $n$ of the Fourier series of $f$, and more generally $S_n(\mu)$ the partial sum of order $n$ of the Fourier–Stieltjes expansion of a Borel measure $\mu$. Recall the following fact: if $f$ is integrable, then
\[ S_n f(x) - S_m f(x) = O(\log |m - n|), \qquad m, n \to \infty, \]
almost everywhere. The refinement of Kolmogorov's theorem is the following: let $\varphi(n)$ be any function tending to zero as $n$ tends to infinity. Then, there exists an integrable function such that the more restrictive property
\[ S_n(f)(x) - S_m(f)(x) = O(\varphi(|m - n|) \log |m - n|) \tag{5.3.12} \]
is false on a set of positive measure. This result has been proved in Stein [1961]. For, consider the family of operators
\[ \Delta_{(m,n)} f = \frac{S_n(f) - S_m(f)}{\varphi(|m - n|) \log |m - n|}. \tag{5.3.13} \]


These operators satisfy conditions (a) and (b) of Theorem 5.2.4. We shall prove a lemma.

5.3.1 Lemma. There exists an absolute constant $C$ such that for any integer $k$, there exists a measure $\mu$ on $\mathbb{T}$ with $\int_{\mathbb{T}} |d\mu| = 1$ and
\[ \sup_{n,m : |n - m| = k} |S_n(\mu) - S_m(\mu)| \ge C \log k \quad \text{almost surely.} \]

Proof. Let $x_1, \ldots, x_N$ be some points of $\mathbb{T}$ to be specified later, and set
\[ \mu = \frac{1}{N} \sum_{i=1}^{N} \delta_{x_i}, \]
where $\delta_x$ denotes the Dirac measure at point $x$. Then $\int_{\mathbb{T}} |d\mu| = 1$. Plainly,
\[ S_n(\mu)(x) - S_m(\mu)(x) = \frac{2}{N} \sum_{i=1}^{N} \frac{\cos \pi(n + m + 1)(x - x_i) \, \sin \pi(n - m)(x - x_i)}{\sin \pi(x - x_i)}. \]
Write $k = n - m$, $\ell = n + m + 1$. Assume that $k$ is odd. Then $\ell$ must be even, but this is the only restriction on $\ell$. We choose the $x_i$ to be linearly independent over $\mathbb{Q}$, and such that they are very close to $i/N$. It is easily seen then, that for almost every $x$, the $x - x_i$ are linearly independent over $\mathbb{Q}$. Choosing $\ell$ large enough, depending on $x$, we have
\[ \sup_{n,m : |n - m| = k} |S_n(\mu)(x) - S_m(\mu)(x)| = \frac{2}{N} \sum_{i=1}^{N} \frac{|\sin \pi k(x - x_i)|}{|\sin \pi(x - x_i)|}. \]

Now the facts that the $x_i$ are very close to $i/N$ and $N$ is large enough show that the sum on the right is close to its integral counterpart, and so exceeds half of its value. Therefore,
\[ \sup_{n,m : |n - m| = k} |S_n(\mu)(x) - S_m(\mu)(x)| \ge \frac{1}{2} \int_{\mathbb{T}} \frac{|\sin \pi k(x - y)|}{|\sin \pi(x - y)|} \, dy \ge C \log k, \]
as required.

Returning to the studied property, we now can argue as follows: if for any $f \in L^1(\mathbb{T})$ property (5.3.12) was true with positive probability, then the operators $\Delta_{(m,n)} f$ would satisfy the condition (5.2.33) of Theorem 5.2.4. Consequently, the maximal operator
\[ \mu \to \Delta^*(\mu) := \sup_{n,m} \frac{|S_n(\mu)(x) - S_m(\mu)(x)|}{\varphi(|m - n|) \log |m - n|} \]
would satisfy the conclusion of this theorem, which is given by (5.2.34). But, this is now impossible in view of the lemma. Indeed by (5.2.34), we would have the existence of a constant $C_0$ such that for any $\mu \in B(M)$ with $\int_{\mathbb{T}} |d\mu| = 1$, and any $t \ge 0$,
\[ t \lambda\{x : \Delta^* \mu(x) > t\} \le C_0. \]
Let $k$ be a positive integer, which we choose sufficiently large to ensure that $\log k > (2C_0)/C$, where $C$ is the same constant as in Lemma 5.3.1. Apply this for $t = C(\log k)/2$; then,
\[ \lambda\Big\{x : \Delta^* \mu(x) > \frac{C}{2} \log k\Big\} \le \frac{2C_0}{C \log k} < 1. \]
But by Lemma 5.3.1, there exists $\mu \in B(M)$ with $\int_{\mathbb{T}} |d\mu| = 1$ such that $\Delta^* \mu \ge C \log k$ almost surely. This provides a contradiction. Therefore, the operators cannot satisfy condition (5.2.33). And this shows the existence of an integrable function such that property (5.3.12) is false for almost every $x$.
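The lower bound in Lemma 5.3.1 rests on the logarithmic growth of the integral of $|\sin \pi k u| / |\sin \pi u|$ over one period. A minimal numerical sketch follows; the midpoint rule, the number of cells and the comparison constant 0.4 used in the accompanying check are choices made for this illustration only.

```python
import math

def dirichlet_l1(k: int, cells: int = 20_000) -> float:
    """Midpoint-rule value of the integral over (0,1) of |sin(pi k u)| / |sin(pi u)|."""
    h = 1.0 / cells
    total = 0.0
    for i in range(cells):
        u = (i + 0.5) * h  # midpoints avoid the removable singularities at 0 and 1
        total += abs(math.sin(math.pi * k * u)) / abs(math.sin(math.pi * u))
    return total * h

for k in (3, 9, 27, 81):
    value = dirichlet_l1(k)
    print(k, round(value, 4), round(value / math.log(k), 4))
```

For odd $k = 2n + 1$ this integral is the classical Lebesgue constant of order $n$, so the printed values grow like a constant multiple of $\log k$.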

5. Multiplier operators. In this example, we are concerned with the "multiplier problem" for Fourier series in one variable, which is that of characterizing the sequences $\lambda_n$ of multipliers for which the transformation $T$ defined for $f \sim \sum_{n \in \mathbb{Z}} a_n e_n$ by
\[ Tf \sim \sum_{n \in \mathbb{Z}} a_n \lambda_n e_n \tag{5.3.14} \]
is a bounded operator on $L^p(\mathbb{T})$ into itself. If for any $f \in L^p(\mathbb{T})$, $f \sim \sum_{n \in \mathbb{Z}} a_n e_n$, the series in (5.3.14) is the Fourier series of a function in $L^p(\mathbb{T})$, and the operator $T$ is bounded from $L^p(\mathbb{T})$ to $L^p(\mathbb{T})$, we say that $\lambda_n$ is of type $(L^p, L^p)$. It is naturally an important problem to characterize the sequences $\lambda_n$ of type $(L^p, L^p)$. There is no restriction to assume $\lambda_0 = 0$ and the sequence $\lambda_n$ to be bounded. Introduce the generating function $K$ given by
\[ K(x) = \sum_{n \in \mathbb{Z}^*} \frac{\lambda_n}{in} e_n(x). \tag{5.3.15} \]
An important, although basic fact (see [Zygmund: 1959], p. 157) about multipliers is this: if a sequence $\lambda_n$ is of type $(L^q, L^q)$ for some $q \in [2, \infty]$, then it is also of type $(L^p, L^p)$ for each $p \in [q', q]$ where $q'$ is the index conjugate to $q$: $1/q + 1/q' = 1$. There is a corresponding result for the case $q \in [1, 2]$. Let $q \in [2, \infty[$ and consider the classes $V_q$ introduced in Kaczmarz [1933]. A function $K$ belongs to $V_q$ if and only if $K \in L^q(\mathbb{T})$, and
\[ \sup \Big\| \sum_k \big(K(b_k - \cdot) - K(a_k - \cdot)\big) \Big\|_q < \infty, \tag{5.3.16} \]
where the summation is taken over any finite collection of non-overlapping intervals of $\mathbb{T}$, and the "sup" is taken over all such collections of intervals. The class $V_\infty$ may be defined to be the class of functions of bounded variation: $V_\infty = BV(\mathbb{T})$. Obviously $V_\infty \subset V_q \subset V_2$ if $q \in [2, \infty]$. The following fact is well known.

5.3.2 Lemma. A necessary and sufficient condition that the multiplier sequence $\lambda_n$ is of type $(L^\infty, L^\infty)$, and thereby of type $(L^r, L^r)$ for all $r \in [1, \infty]$, is that the generating function $K$ defined in (5.3.15) belongs to $V_\infty$.


Let us continue with another simple lemma.

5.3.3 Lemma. A necessary and sufficient condition that the multiplier sequence $\lambda_n$ is of type $(L^2, L^2)$, namely the $\lambda_n$ are uniformly bounded, is that the generating function $K$ belongs to $V_2$.

Proof. Assume first that $|\lambda_n| \le M$ for all $n$. Then,
\[ \sum_k \big(K(b_k - x) - K(a_k - x)\big) \sim \sum_{n \in \mathbb{Z}^*} \frac{\lambda_n}{in} \Big(\sum_k \big(e_n(b_k) - e_n(a_k)\big)\Big) \overline{e_n(x)}. \]
In what follows, we write $K_0(x) = \sum_{n \in \mathbb{Z}^*} e_n(x)/(in)$. By the Parseval relation,
\[ \Big\| \sum_k \big(K(b_k - \cdot) - K(a_k - \cdot)\big) \Big\|_2^2 = \sum_{n \in \mathbb{Z}^*} \frac{|\lambda_n|^2}{n^2} \Big| \sum_k \big(e_n(b_k) - e_n(a_k)\big) \Big|^2 \le M^2 \sum_{n \in \mathbb{Z}^*} \frac{1}{n^2} \Big| \sum_k \big(e_n(b_k) - e_n(a_k)\big) \Big|^2 \]
\[ = M^2 \Big\| \sum_k \big(K_0(b_k - \cdot) - K_0(a_k - \cdot)\big) \Big\|_2^2 \le M^2 \Big\| \sum_k \big(K_0(b_k - \cdot) - K_0(a_k - \cdot)\big) \Big\|_\infty^2 \le M_0 < \infty, \]
since $K_0$ is of bounded variation. Thus $K \in V_2$. Conversely, assume $K \in V_2$ and let $f = \sum a_n e_n(x)$ be any trigonometric polynomial. Let
\[ F(x) = K * f(x) = \int_{\mathbb{T}} K(x - y) f(y) \, dy = \sum_{n \in \mathbb{Z}^*} \frac{\lambda_n}{in} a_n e_n(x). \tag{5.3.17} \]
Then,
\[ \Big| \sum_k \big(F(b_k) - F(a_k)\big) \Big| \le \sup \Big\| \sum_k \big(K(b_k - \cdot) - K(a_k - \cdot)\big) \Big\|_2 \le M < \infty \]
by the Cauchy–Schwarz inequality, if $\|f\|_2 \le 1$. Consequently $F$ is of bounded variation, with total variation less than $2M$. Therefore,
\[ \int_{\mathbb{T}} |F'(x)| \, dx \le 2M. \]
But if $f = e_n$, then $F = [\lambda_n/(in)] e_n$, and thereby $|\lambda_n| \le 2M$. This achieves the proof.

Now we shall use the continuity principle to prove the following nearly optimal result.

5.3.4 Theorem. Let $q \in \, ]2, \infty[$.
(i) Assume the multiplier operator defined in (5.3.14) to be of type $(L^r, L^r)$ for all $r \in [q', q]$. Then the generating function $K$ defined in (5.3.15) belongs to $V_q$.
(ii) Conversely, suppose that $K$ belongs to $V_q$. Then, the multiplier operator is of type $(L^r, L^r)$ for all $r \in \, ]q', q[$.


Proof. We first prove part (i), which is relatively easy. Let $p = q'$ so that $1/p + 1/q = 1$, and consider again $F = K * f$ as in (5.3.17) for $f \sim \sum a_n e_n \in L^p(\mathbb{T})$. Then, with the notation (5.3.14),
\[ F(x) = \int_0^x (Tf)(t) \, dt. \]
By assumption the operator $T$ satisfies $Tf \in L^p(\mathbb{T})$, if $f \in L^p(\mathbb{T})$. And
\[ \Big| \sum_k \big(F(b_k) - F(a_k)\big) \Big| \le \|Tf\|_1 \le \|Tf\|_p \le \|T\|_p, \]
if $\|f\|_p \le 1$, where $\|T\|_p$ is the operator norm of $T$ acting on $L^p(\mathbb{T})$. Thus,
\[ \Big| \int_{\mathbb{T}} \sum_k \big(K(b_k - x) - K(a_k - x)\big) f(x) \, dx \Big| \le \|T\|_p, \]
whenever $\|f\|_p \le 1$. Therefore,
\[ \Big\| \sum_k \big(K(b_k - \cdot) - K(a_k - \cdot)\big) \Big\|_q \le \|T\|_p, \]
hence $K \in V_q$.

Now, we prove part (ii). Consider the operator on $L^p(\mathbb{T})$ defined by $D_m f = F_m$ where
\[ F_m = m \big(F(\cdot + 1/m) - F(\cdot)\big) = m \big(K(\cdot + 1/m) - K(\cdot)\big) * f. \tag{5.3.18} \]
By assumption $K \in L^q(\mathbb{T})$, thus the operator $D_m$ is bounded from $L^p(\mathbb{T})$ to itself, for each $m$. Moreover $D_m$ commutes with translations. Observe with the proof of Lemma 5.3.3 that $F$ is of bounded variation, when $f \in L^p(\mathbb{T})$. Indeed
\[ \Big| \sum_k \big(F(b_k) - F(a_k)\big) \Big| \le \Big| \int_{\mathbb{T}} \sum_k \big(K(b_k - x) - K(a_k - x)\big) f(x) \, dx \Big| \le \Big\| \sum_k \big(K(b_k - \cdot) - K(a_k - \cdot)\big) \Big\|_q \|f\|_p \le \|T\|_p < \infty. \]
Thus the limit
\[ \lim_{m \to \infty} D_m(f)(x) = \lim_{m \to \infty} m \big(F(x + 1/m) - F(x)\big) = F'(x), \]
exists for almost every $x$, whenever $f \in L^p(\mathbb{T})$. By the continuity principle (Theorem 5.2.1), the mapping $D : f \to F'$ is of weak type $(p, p)$. But $K \in V_q \subset V_2$, and so by Lemma 5.3.3 the mapping $S : f \to F'$ is of type $(L^2, L^2)$. By the Marcinkiewicz interpolation theorem, it follows that $S$ is also of type $(L^r, L^r)$ for $r \in \, ]p, 2]$. But the mapping $S$ coincides with the multiplier operator $T$ on trigonometric polynomials, thereby by continuity, on $L^r(\mathbb{T})$. Invoking then a classical duality argument, we deduce that $T$ is of type $(L^r, L^r)$ for $r \in [2, q[$. The proof is now complete.


6. Hardy spaces. In this example, we discuss an application of the continuity principle to some nonlinear operators occurring in analysis. Let $H^1$ denote the closed subspace of $L^1(\mathbb{T})$ consisting of functions of power series type:
\[ \int_{\mathbb{T}} f(t) e_n(-t) \, dt = 0 \qquad (\forall n < 0). \tag{5.3.19} \]
Note that $H^1$ is invariant under the translation action. For any $f \in L^1(\mathbb{T})$, let $S_n f$ and $\sigma_n f$ be respectively the partial sum and Cesàro mean of order $n$ of the Fourier series of $f$. Define
\[ g^*(x) = \Big( \sum_n \frac{|S_n(f)(x) - \sigma_n(f)(x)|^2}{n} \Big)^{1/2}. \tag{5.3.20} \]
It is known that $g^*(x)$ is finite for almost every $x$ if $f \in H^1$. Consider for $f \in H^1$ the nonlinear mapping $f \to g^*$.

5.3.5 Theorem. There exists an absolute constant $C$ such that for any $f \in H^1$,
\[ \sup_{a \ge 0} a \lambda\{x \in \mathbb{T} : g^*(x) > a\} \le C \int_{\mathbb{T}} |f(x)| \, dx. \]

Proof. Let $\{\alpha_n^m, n, m \in \mathbb{N}\}$ be a collection of complex numbers satisfying the following requirements:
• the modulus of each $\alpha_n^m$ is rational and the argument is a rational multiple of $2\pi$,
• for each $m$, $\alpha_n^m = 0$ for $n$ sufficiently large,
• for each $m$, $\sum_n |\alpha_n^m|^2 / n \le 1$.
Define for every $m$ and $f \in H^1$,
\[ T_m(f)(x) = \sum_n \alpha_n^m \, \frac{S_n f(x) - \sigma_n f(x)}{n}, \tag{5.3.21} \]
and
\[ T^* f(x) = \sup_m |T_m f(x)|. \tag{5.3.22} \]

Plainly T ∗ f (x) = g ∗ (x). The result then follows from the remark following Theorem 5.2.1. 7. Gabisoniya’s operator. Let f ∈ L1 (T). Gabisoniya [1973] showed that 2 n π i/n n f (x ± t) − f (x)dt = 0 for almost all x ∈ T. lim n→∞ i π(i−1)/n i=1 (5.3.23) This generates an operator of the form $ 2 %1/2 n π i/n

n f (x) = sup , |f (x + t) − f (x)| + |f (x − t) − f (x)| dt i π(i−1)/n n∈Z+ i=1


which is of weak type (1, 1), by the continuity principle. Now let $f \in L^1(\mathbb{T})$ and let $S_n(f, x)$ be the partial sums of the Fourier series of $f$. Rodin [1992] has considered the sequence $\{S_n(f, x), n \ge 1\}$ as a function of an integral argument $n \in \mathbb{Z}_+$. He showed by means of Gabisoniya's result that, for almost all $x$, it has bounded mean oscillation.

5.3.6 Theorem. Let $f \in L^1(\mathbb{T})$, then the operator
\[ Tf(x) = \sup_{m,n \in \mathbb{Z}_+} \frac{1}{m} \sum_{j=0}^{m-1} \Big| S_{j+n}(f, x) - \frac{1}{m} \sum_{k=0}^{m-1} S_{k+n}(f, x) \Big| \]

is of weak type (1, 1). This operator is the BMO-norm of the function n → Sn (f, x). Further Tf (x) ≤ C f (x)

for almost all x ∈ T.

(5.3.24)

By the Jones–Nirenberg theorem (see also before Theorem 4.2.6), we have the inclusion BMO⊂ L where (x) = e|x| − 1, and we deduce from the preceding theorem 5.3.7 Corollary. Let f ∈ L1 (T), then for every constant A > 0, and for almost all x ∈ T, n 1 A|Sk (f,x)−f (x)| e − 1 = 0. lim n→∞ n k=0

The two previous results are respectively Theorem 1 and its corollary in [Rodin: 1992], to which we also refer for further results and the references therein.

8. Carleson's theorem and Fefferman's operator. Let $f \in L^2(\mathbb{T})$. Here we choose the representation $\mathbb{T} \sim (-\pi, \pi)$. Carleson's celebrated theorem shows that the partial sums $S_n f$ of the Fourier series of $f$ converge to $f$ almost everywhere, thereby solving in the affirmative Lusin's hypothesis. Carleson proved a few other results:
• If $f \in L^p(\mathbb{T})$, $1 < p < 2$, then $S_n f(x) = o(\log \log \log n)$ a.e.
• If for some $\delta > 0$, $\int_{\mathbb{T}} |f(x)| (\log^+ |f(x)|)^{1+\delta} \, dx < \infty$, then $S_n f(x) = o(\log \log n)$ a.e.
Carleson [1966] considered a modified form of the Dirichlet formula for $S_n f(x)$:
\[ \tilde S_n f(x) = \int_{\mathbb{T}} \frac{e^{-int} f(t)}{x - t} \, dt. \]
Introduce the maximal function $M^* f(x) = \sup_{|n| \ge 0} |\tilde S_n f(x)|$. Carleson proved that
\[ \lambda\{x \in \mathbb{T} : M^* f(x) > y\} \le C \frac{\|f\|_2^2}{y^2}, \]
for all $y > 0$, $f \in L^2(\mathbb{T})$. Now put $Mf(x) = \sup_{n \ge 0} |S_n f(x)|$. By modifying Carleson's proof, Hunt [1968] obtained corresponding inequalities for $Mf$:

5.3.8 Theorem.
a) $\|Mf\|_p \le C_p \|f\|_p$, $1 < p < \infty$,
b) $\|Mf\|_1 \le C \int_{\mathbb{T}} |f(x)| (\log^+ |f(x)|)^2 \, dx + C$,
c) $\lambda\{x \in \mathbb{T} : Mf(x) > y\} \le C e^{-Cy/\|f\|_\infty}$, $y \ge 0$.

Fefferman [1973] gave another proof of the Carleson–Hunt result. He proved the basic estimate $\|Mf\|_1 \le C \|f\|_2$ using a new approach. Given $x$, let $\bar n(x)$ be the least integer $k$ for which $|S_k f(x)| \ge (1/2) Mf(x)$. The basic estimate is equivalent to
\[ \|S_{\bar n} f\|_1 \le C \|f\|_2. \]
Elementary considerations of Dirichlet's formula show that
\[ S_n f(x) = C \int_{\mathbb{T}} \frac{e^{iny} - e^{-iny}}{y} f(x - y) \, dy + r, \]
where $r$ is a trivial error term. To prove the basic inequality, it is enough to show that
\[ \Big\| \int_{\mathbb{T}} \frac{e^{i \bar N(x) y}}{y} f(x - y) \, dy \Big\|_1 \le C \|f\|_2, \]
for $\bar N(x) = \bar n(x)$ and for $\bar N(x) = -\bar n(x)$. Regard $\bar N$ as a fixed function of $x$, and consider the linear operator $T$ defined by
\[ Tf(x) = \int_{\mathbb{T}} \frac{e^{i \bar N(x) y}}{y} f(x - y) \, dy. \]
Fefferman proved that
\[ \|Tf\|_1 \le C \|f\|_2, \]
with $C$ independent of $f$ and $\bar N$.

5.4 A principle of domination – conjugacy lemma

We refer in this section to Halmos [1956]. Let $(X, \mathcal{A}, \mu)$ be a probability space. A measurable transformation $\tau$ of $X$ preserving $\mu$ ($\tau\mu = \mu$) is called an automorphism of $X$ if $\tau$ is bijective, bi-measurable and if $\tau^{-1}$ preserves $\mu$. The family of automorphisms of $(X, \mathcal{A}, \mu)$ is denoted by $C$. The family $C$, when equipped with the composition operation as internal law, is a group. If $\tau \in C$, then letting for any $f \in L^2(\mu)$, $\tau f = f \circ \tau$, we define a unitary operator on $L^2(\mu)$. As is well known, strong and weak topologies restricted to the set of all unitary operators coincide. The properties of these topologies are thus the same. The topology on $C$ is usually called the weak topology, and we have that $\tau_n \to \tau$ in $C$ if and only if one of the following four equivalent properties is satisfied:
\[ \forall f \in L^2(\mu), \quad f \circ \tau_n \to f \circ \tau \ \text{in } L^2(\mu), \]
\[ \forall A \in \mathcal{A}, \quad \mu(\tau_n(A) \,\Delta\, \tau(A)) \to 0, \]
\[ \forall A \in \mathcal{A}, \quad \mu(\tau_n^{-1}(A) \,\Delta\, \tau^{-1}(A)) \to 0, \]
\[ \forall f \in L^p(\mu), \quad f \circ \tau_n \to f \circ \tau \ \text{in } L^p(\mu), \]
where $1 \le p < \infty$ is given and fixed. Endowed with this topology, $C$ is a topological group. In what follows, we will assume that the probability space $(X, \mathcal{A}, \mu)$ is (pointwise) isomorphic to the interval $[0, 1[$ equipped with the normalized Lebesgue measure; namely that $(X, \mathcal{A}, \mu)$ is a Lebesgue space. Recall for instance that any Polish space $X$ (with $\mathcal{A}$ the Borel $\sigma$-field $\mathcal{B}(X)$ completed relatively to an arbitrary probability measure $\mu$ on $\mathcal{B}(X)$) is a Lebesgue space. Under this regularity assumption, the weak topology on $C$ is metrizable and satisfies the first axiom of countability. Finally recall also that $\tau \in C$ is aperiodic if for any integer $n \ge 1$, $\mu\{x : \tau^n x \ne x\} = 1$. Then we have,

5.4.1 Lemma (Conjugacy lemma). If $\sigma \in C$ is aperiodic, then the conjugate class of $\sigma$,
\[ c(\sigma) = \{\tau^{-1} \sigma \tau : \tau \in C\}, \]
is dense in $C$.

Any ergodic endomorphism $\tau$ of $(X, \mathcal{A}, \mu)$ is aperiodic. One can easily establish that this property is no longer true in other measure spaces. Define now the sequence $\{S_n, n \ge 1\}$ by means of the matrix summation method. For, assume that we are given an infinite matrix of reals $A = \{a_{n,k}, n, k \ge 1\}$ as well as some fixed $1 \le p < \infty$. Let $\tau \in C$, put then formally
\[ \forall f \in L^p(\mu),\ \forall n \ge 1, \qquad S_n^\tau(f) = \sum_{k=1}^{\infty} a_{n,k} \, f \circ \tau^k. \tag{5.4.1} \]
We will assume that all the column vectors $a_n = \{a_{n,k}, k \ge 1\}$ belong to $\ell^1$. From this assumption, it is easily deduced that (5.4.1) defines a sequence of continuous operators on $L^p(\mu)$. Clearly, each $S_n^\tau(f)$ is the limit in $L^p(\mu)$ of $\sum_{k=1}^{N} a_{n,k} \, f \circ \tau^k$ as $N$ tends to infinity, this for any $f \in L^p(\mu)$. The fact that these operators are continuous in measure is immediate. We will further assume that $a_{n,k} \ge 0$ for any $n, k \ge 1$. This assumption will guarantee that the operators $S_n$ are positive. As usual, we also write for any $f \in L^p(\mu)$,
\[ S_\tau^*(f) = \sup_{n \ge 1} |S_n^\tau(f)|. \]
Observe then for any $\sigma \in C$ that $S_\tau^*(f) \circ \sigma = S_\tau^*(f \circ \sigma)$, for any $f \in L^p(\mu)$, provided that $\sigma\tau = \tau\sigma$. In particular, $S_\tau^*(f) \circ \tau^i = S_\tau^*(f \circ \tau^i)$,


for any f ∈ Lp (μ), and i ≥ 1. We will need the following auxiliary result. 5.4.2 Lemma. Let D denote the set of τ ∈ C verifying ∀λ > 0, ∀f ∈ Lp (μ) with f p = 1,

μ{Sτ∗ (f ) > C(λ)} ≤ D(λ),

(5.4.2)

where C and D are applications from R+ in itself. Then D is closed in C. Proof. Assume that τp belongs to D for any p ≥ 1, and that τp → τ in C as p tends to infinity. It suffices then to show that the inequality N μ an,k f τ k > C(λ) ≤ D(λ),

(5.4.3)

k=1

holds for any λ > 0, any f ∈ Lp (μ) with f p = 1 and N ≥ 1. Let N ≥ 1 be given and fixed, as well as some real ε > 0. Since for any n ≥ 1, an ∈ 1 , we can find a number Mε ≥ 1 such that ∞ k=Mε an,k < ε for any 1 ≤ n ≤ N. For some integer pε depending on ε, Tchebycheff’s inequality allows us to write μ

N k=1

Mε ε an,k f τ k > C(λ) + 3δ ≤ μ an,k f τ k > C(λ) + 2δ + δ k=1

Mε ε an,k f τpkε > C(λ) + δ + 2 ≤μ δ k=1

∞

ε an,k f τpkε > C(λ) + 3 δ k=1 ε ≤ D(λ) + 3 . δ ≤μ

We conclude by letting $\varepsilon$ go to 0.

In the case of operators defined by means of matrix summation methods, the continuity principle admits the following strengthening due to Conze [1973], and still known as Conze's principle.

5.4.3 Theorem. Let $1 \le p < \infty$. Let $A = \{a_{n,k}, n, k \ge 1\}$ be an infinite matrix of positive reals and $\{S_n^\tau, n \ge 1\}$ be the sequence of operators defined for $\tau \in C$ as in (5.4.1). Assume that the column vectors $a_n = \{a_{n,k}, k \ge 1\}$ belong to $\ell^1$. Then the following properties are equivalent:

(a) There exists an ergodic automorphism $\sigma \in C$ such that
\[ \forall f \in L^p(\mu), \qquad \mu\{x : S_\sigma^* f(x) < \infty\} = 1, \tag{5.4.4} \]

(b)
\[ \exists\, 0 < C < \infty : \forall f \in L^p(\mu),\ \forall \lambda > 0, \qquad \sup_{\tau \in C} \mu\{x : S_\tau^* f(x) > \lambda\} \le \frac{C}{\lambda^p} \int_X |f|^p \, d\mu. \tag{5.4.5} \]

Proof. It suffices to show that (a) implies (b). By virtue of Sawyer's continuity principle,
\[ \exists\, 0 < C < \infty : \forall f \in L^p(\mu),\ \forall \lambda > 0, \qquad \mu\{x : S_\sigma^* f(x) > \lambda\} \le \frac{C}{\lambda^p} \int_X |f|^p \, d\mu. \tag{5.4.6} \]
Let $c(\sigma) = \{\tau^{-1} \sigma \tau, \tau \in C\}$ be the conjugate class of $\sigma$. Let $\alpha = \tau^{-1} \sigma \tau$ be an element of $c(\sigma)$. For any $f \in L^p(\mu)$, $S_n^\alpha(f) = \tau(S_n^\sigma(f \circ \tau^{-1}))$. Thus
\[ S_\alpha^*(f) = \tau(S_\sigma^*(f \circ \tau^{-1})). \]
We deduce
\[ \forall \lambda > 0, \qquad \mu\{x : S_\alpha^* f(x) > \lambda\} \le \frac{C}{\lambda^p} \int_X |f|^p \, d\mu, \tag{5.4.7} \]
for any $f \in L^p(\mu)$ and any $\alpha \in c(\sigma)$. The preceding lemma shows that the family of all elements $\alpha$ of $C$ verifying (5.4.7) is closed in $C$. As the conjugacy lemma 5.4.1 shows that this family is also dense in $C$, this achieves the proof.
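The conjugation identity $S_n^\alpha(f) = \tau(S_n^\sigma(f \circ \tau^{-1}))$ used in the proof can be verified on a finite model. The sketch below is an illustration only: it assumes a six-point space with uniform measure, permutations in place of automorphisms, the convention $\tau f = f \circ \tau$, and a hypothetical finite row of weights.

```python
def compose(f, perm):
    """Return f o perm for a function f given as a list on {0, ..., N-1}."""
    return [f[perm[x]] for x in range(len(perm))]

def inverse(perm):
    """Inverse permutation."""
    inv = [0] * len(perm)
    for x, y in enumerate(perm):
        inv[y] = x
    return inv

def power(perm, k):
    """k-fold composition perm o ... o perm (identity for k = 0)."""
    result = list(range(len(perm)))
    for _ in range(k):
        result = [perm[x] for x in result]
    return result

def matrix_sum(weights, f, perm):
    """Finite analogue of S_n^tau(f) = sum_k a_{n,k} f o tau^k for one row of weights."""
    total = [0.0] * len(f)
    for k, a in enumerate(weights, start=1):
        fk = compose(f, power(perm, k))
        total = [s + a * v for s, v in zip(total, fk)]
    return total

sigma = [1, 2, 3, 4, 5, 0]    # a cyclic shift on six points
tau = [2, 0, 1, 5, 3, 4]      # an arbitrary permutation
tau_inv = inverse(tau)
# alpha = tau^{-1} sigma tau as a map: x -> tau^{-1}(sigma(tau(x)))
alpha = [tau_inv[sigma[tau[x]]] for x in range(6)]
f = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0]
weights = [0.5, 0.25, 0.125]  # hypothetical row (a_{n,1}, a_{n,2}, a_{n,3})

left = matrix_sum(weights, f, alpha)
right = compose(matrix_sum(weights, compose(f, tau_inv), sigma), tau)
print(left == right)
```

Because composition with $\tau$ preserves the measure, the identity transfers the maximal inequality from $\sigma$ to every element of its conjugate class, which is the step leading to (5.4.7).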

Chapter 6

Maximal operators and Gaussian processes

This chapter is devoted to a study of the liaison inequalities existing between maximal operators of L2 -operators and those associated to the canonical Gaussian process on L2 . We shall also study the well-known metric entropy criteria developed by Bourgain, which have been proved to be efficient tools in the study of some classical problems of convergence almost everywhere. In presenting these criteria as direct corollaries of the above mentioned liaison inequalities, we will adopt a slightly different point of view than the initial one, allowing us to get a better understanding of the role played by the theory of Gaussian processes in the study of convergence almost everywhere.

6.1

Some liaison theorems

Let $(X, \mathcal{A}, \mu)$ be some probability space and consider a sequence (denoted by $S$) of continuous operators $S_n : L^2(\mu) \to L^2(\mu)$, $n = 1, 2, \ldots$. Given $2 \le p \le \infty$, the study of the convergence almost everywhere of the sequence $\{S_n f, n \ge 1\}$ for any $f \in L^p(\mu)$ is a fundamental question in ergodic theory. These properties are naturally expressed by means of the maximal operators
\[ S_I(f) = \sup_{n \in I} |S_n(f)|, \qquad S^*(f) = \sup_{n \ge 1} |S_n(f)|. \tag{6.1.1} \]
Here $I$ is any finite subset of integers. By the Banach principle, the set of elements $f \in L^p(\mu)$ for which $\{S_n f, n \ge 1\}$ converges $\mu$-almost everywhere is closed in $L^p(\mu)$ if and only if there exists a nonincreasing function $C : \mathbb{R}_+ \to \mathbb{R}_+$ such that $\lim_{\alpha \to \infty} C(\alpha) = 0$, and for which
\[ \mu\{S^* f > \alpha \|f\|_p\} \le C(\alpha), \qquad \alpha \ge 0,\ f \in L^p(\mu). \]
Further if the sequence $S$ commutes with a family $E$ of measurable transformations of $X$ preserving $\mu$ and mixing in the following sense:
\[ \forall A, B \in \mathcal{A},\ \forall \alpha > 1,\ \exists T \in E : \qquad \mu(A \cap T^{-1} B) \le \alpha \mu(A) \mu(B), \tag{E} \]
then by the continuity principle $C(\alpha) = O(\alpha^{-p})$. This holds in particular when $S$ commutes with an ergodic endomorphism of $(X, \mathcal{A}, \mu)$. So that the study of the convergence almost everywhere of the sequence $S$ amounts, modulo adequate commutation assumptions, to establishing a maximal inequality and to exhibiting a dense subset of $L^p(\mu)$ for which the convergence almost everywhere already holds.

Recall now for our purpose some material from the theory of Gaussian processes taken from Chapter 10. Let $H$ be a Hilbert space. In what follows we denote by


$Z = \{Z_h, h \in H\}$ the canonical Gaussian process on $H$, namely the Gaussian centered process with covariance function
\[ (h, h') \mapsto \langle h, h' \rangle, \qquad h, h' \in H. \]
This process is easy to represent. Assume that $H$ admits a countable orthonormal basis $\{h_n, n \ge 1\}$. This is realized if and only if $H$ is separable (by Zorn's lemma, any Hilbert space admits an orthonormal basis, although not necessarily countable). Let also $g = \{g_n, n \ge 1\}$ be a sequence of i.i.d. $N(0, 1)$ distributed random variables with basic probability space $(\Omega, \mathcal{A}, P)$. Then $Z$ can be defined as follows: for any $h \in H$,
\[ Z_h = \sum_{n=1}^{\infty} g_n \langle h, h_n \rangle. \tag{6.1.2} \]
We easily verify that $E\, Z_h Z_{h'} = \langle h, h' \rangle$ for any $h, h' \in H$. Besides, if $A$ is some finite or countable (only in order to avoid minor measurability problems) subset of $H$, we recall that $A$ is a GB set (for Gaussian bounded set) if
\[ E \sup_{h \in A} |Z(h)| < \infty. \tag{6.1.3} \]
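The covariance identity for the canonical Gaussian process defined by (6.1.2) can be checked by simulation in finite dimension. The following is a minimal Monte Carlo sketch under explicit assumptions: $H = \mathbb{R}^3$ with the standard basis playing the role of $\{h_n\}$, a fixed seed, and hypothetical vectors `h` and `k`.

```python
import random

def sample_covariance(h1, h2, draws=20_000, seed=0):
    """Monte Carlo estimate of E[Z_{h1} Z_{h2}] where Z_h = sum_n g_n <h, h_n>.

    (h_n) is taken to be the standard basis of R^d, so <h, h_n> is the n-th coordinate.
    """
    rng = random.Random(seed)
    d = len(h1)
    acc = 0.0
    for _ in range(draws):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        z1 = sum(gn * c for gn, c in zip(g, h1))
        z2 = sum(gn * c for gn, c in zip(g, h2))
        acc += z1 * z2
    return acc / draws

h = [1.0, 0.5, -0.5]
k = [0.5, 1.0, 1.0]
inner = sum(a * b for a, b in zip(h, k))
print(inner, sample_covariance(h, k))
```

The empirical average should be close to the inner product $\langle h, k \rangle$, up to the usual Monte Carlo fluctuation of order $1/\sqrt{\text{draws}}$.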

Now we let H = L2 (μ) and introduce for any f ∈ L2 (μ) the subsets Cf = {Sn f, n ≥ 1}.

(6.1.4)

Bourgain [1988a] has established a remarkable link between the almost everywhere convergence properties of the sequence $S$ and the regularity of $Z$ on the sets $C_f$. This link can be interpreted as follows: if the sequence $S$ converges almost everywhere for a large class of functions, for any $f \in L^p(\mu)$ to be precise, with $2 \le p < \infty$, then necessarily the associated sets $C_f$ are GB sets (i.e., $E \sup_{n\ge1} |Z(S_n(f))| < \infty$). By means of Sudakov's minoration (inequality (6.2.7)), this provides a necessary condition bearing on the size of the sets $C_f$: these sets cannot be too thick, in the sense that their entropy numbers are not too big. There is an analogous result when $p = \infty$. Bourgain [1988a] also proved the efficiency of such a condition by showing, through several striking examples, how it can be successfully applied to recover some important results of Marstrand and Rudin.

In this chapter, we will present Bourgain's results from a functional analysis point of view. We will first establish relationships between some functionals naturally related through the Banach principle to the sequence $S$, and corresponding functionals related to the canonical Gaussian process $Z$. Next we show that Bourgain's entropy criteria are easily deduced from these functional inequalities.

We begin with some notation and first introduce for any subset $I$ of integers the following functionals related to $S$ and $I$. Let $2 \le p < \infty$. Consider a sequence $S$ of $L^2(\mu)$ continuous operators. We put

$$\Gamma_p(S, I) = \sup_{\|g\|_{p,\mu}\le 1} \int \sup_{n\in I} |S_n(g)|\, d\mu. \tag{6.1.5}$$

6 Maximal operators and Gaussian processes

When $I = \mathbb{N}$ we write more simply

$$\Gamma_p(S) = \sup_{\|g\|_{p,\mu}\le 1} \int S^*(g)\, d\mu, \qquad S^*(g) = \sup_{n\in\mathbb{N}} |S_n(g)|. \tag{6.1.6}$$

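Let $p = \infty$. Consider a sequence $S$ of $L^2(\mu)$-$L^\infty(\mu)$ continuous operators. It will be convenient to introduce the following functionals considered in Bourgain [1988a] (see also Bellow and Jones [1996]):

$$\Gamma_{\infty,2}(S, I, \varepsilon) = \sup_{\substack{\|f\|_{\infty,\mu}\le 1 \\ \|f\|_{2,\mu}\le \varepsilon}} \int S_I(f)\, d\mu, \qquad \Gamma_{\infty,2}(S, \varepsilon) = \sup_{\substack{\|f\|_{\infty,\mu}\le 1 \\ \|f\|_{2,\mu}\le \varepsilon}} \int S^*(f)\, d\mu. \tag{6.1.7}$$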

When the operators $S_n$ are further $L^p(\mu)$-continuous for some $p \in [2, \infty]$, we also put for any subset $I$ of $\mathbb{N}$ and $f \in L^p(\mu)$,

$$\|S_n\|_p = \sup_{\|f\|_{p,\mu}\le 1} \|S_n(f)\|_{p,\mu}, \qquad S(I, p) = \sup_{n\in I} \|S_n\|_p. \tag{6.1.8}$$

Finally we introduce the corresponding Gaussian functionals. Put

$$\Gamma(S, I) = \sup_{\|g\|_{2,\mu}\le 1} E \sup_{n\in I} Z(S_n(g)), \tag{6.1.9}$$

and for any positive integer $K$,

$$\Gamma^*(S, K) = \sup_{\substack{\|f\|_{2,\mu}\le 1 \\ \#(I)=K}} E \sup_{n\in I} Z(S_n(f)). \tag{6.1.10}$$

We shall establish several liaison inequalities (Theorems 6.1.1 and 6.1.6) between these functionals. More precisely we compare $\Gamma(S, I)$ with $\Gamma_p(S, I)$ for $2 \le p < \infty$, and next $\Gamma^*(S, K)$ with $\Gamma_{\infty,2}(S, I, \varepsilon)$, if $\#(I) = K$. Consider the following assumption.

(H1) There exists a sequence $\{T_j, j \ge 1\}$ of $L^2(\mu)$ positive isometries, preserving 1, commuting with the sequence $\{S_n, n \ge 1\}$, $S_n(T_j f) = T_j(S_n f)$, and verifying the following mean ergodic property: for all $f \in L^\infty(\mu)$,

$$\lim_{J\to\infty} \Big\| \frac{1}{J} \sum_{j\le J} T_j f^2 - \int f^2\, d\mu \Big\|_{1,\mu} = 0.$$

6.1.1 Theorem. Let $S_n : L^2(\mu) \to L^2(\mu)$, $n = 1, 2, \ldots$ be continuous operators verifying (H1) and such that $S_n(L^\infty(\mu)) \subset L^\infty(\mu)$. Let $2 \le p < \infty$. Then there exists a constant $C_p < \infty$ such that for any finite subset $I$ of $\mathbb{N}$,

$$\Gamma(S, I) \le C_p\, \Gamma_p(S, I).$$

6.1 Some liaison theorems

The proof will notably result from an intermediate inequality proved in Lemma 6.2.2, and showing that for any $0 < \varepsilon < 1$ and for all integers $J$ along some index $\mathcal{J}$,

$$(1 - \varepsilon)\, E \sup_{n\in I} Z(S_n(f)) \le E \int \sup_{n\in I} S_n(F_{J,f})\, d\mu.$$

Here the $F_{J,f}$ are the Stein elements that we already encountered in Chapter 5 for proving the continuity principle. In the following corollary, assumption (H1) is replaced by a slightly stronger one, needed to apply the continuity principle.

(H2) There exists a family $\mathcal{E} = \{T_j, j \ge 1\}$ of pointwise transformations of $X$ preserving $\mu$, commuting with the $S_n$, $S_n(T_j f) = T_j(S_n f)$, and verifying for any $f, g \in L^2(\mu)$:

$$\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \langle T_k f, g\rangle = \langle f, 1\rangle \langle g, 1\rangle.$$

Under this assumption, property $\mathcal{E}$ is fulfilled. Consequently the continuity principle applies to the sequence $S$. We shall now easily deduce the first entropy criterion of Bourgain [1988a: Proposition 1].

6.1.2 Corollary (First entropy criterion). Let $2 \le p < \infty$. Let $S$ be a sequence of continuous operators $S_n : L^2(\mu) \to L^2(\mu)$, $n = 1, 2, \ldots$ verifying assumption (H2), and such that the following property is fulfilled:

$$\mu\{S^*(f) < \infty\} = 1 \quad \text{for any } f \in L^p(\mu). \tag{$B_p$}$$

Then for any $f \in L^p(\mu)$, the sets $C_f$ are GB sets of $L^2(\mu)$. In particular, there exists a numerical constant $C_1$ and a constant $C_2$ depending on the sequence $S$ only, such that for any $f \in L^p(\mu)$,

$$C_1 \sup_{\varepsilon>0} \varepsilon \sqrt{\log N_f(\varepsilon)} \le E \sup_{n\ge1} Z(S_n(f)) \le C_2 \|f\|_2,$$

where for any $\varepsilon > 0$, $N_f(\varepsilon)$ denotes the minimal number of $L^2(\mu)$ open balls of radius $\varepsilon$, centered in $C_f$ and enough to cover $C_f$.

Remarks.
1. Under the assumptions of Theorem 6.1.1, the same conclusion can also be reached by using the Banach principle. For more details see the proof of Theorem 4.1.1 in [Weber: 1998].
2. The first inequality provides an entropy estimate which turns out to be optimal when the sequence $S$ is a sequence of convolution products ([Weber: 1998b], Remark 4.1.4).
3. The second inequality indicates that the sets $C_f$ are uniformly GB.

Proof. As $S_n$ is a continuous operator in $L^2(\mu)$, this implies that $S_n$ is also continuous in measure on $L^p(\mu)$. By virtue of the continuity principle, we know that

$$\sup_{\|f\|_p\le1}\ \sup_{\lambda\ge0} \lambda^p \mu\{S^*(f) > \lambda\} < \infty.$$

And thus for any $r < p$,

$$\sup_{\|f\|_p\le1} \|S^*(f)\|_r < \infty.$$

Applying this with $r = 1$ implies, in view of Theorem 6.1.1, that there exists $K_p < \infty$ such that for any finite subset $I$,

$$\sup_{\|f\|_{2,\mu}\le1} E \sup_{n\in I} Z(S_n(f)) \le C_p\, \Gamma_p(S, I) \le C_p \sup_{\|f\|_p\le1} \|S^*(f)\|_1 := K_p. \tag{6.1.11}$$

By letting $I$ increase to $\mathbb{N}$ we get

$$\sup_{\|f\|_{2,\mu}\le1} E \sup_{n\ge1} Z(S_n(f)) \le K_p.$$

This proves the second inequality of the corollary. As to the first, it is an immediate consequence of Sudakov's minoration, which we recall in this chapter (see inequality (6.2.7)).

Before continuing, we shall study several applications of Theorem 6.1.1. We begin with a first application to Riemann sums. Let $\mathbb{T}$ be endowed with the normalized Lebesgue measure $\lambda$. Let $f \in L^0(\lambda)$ and define for $x \in \mathbb{T}$ and any integer $n = 1, 2, \ldots$ the Riemann sum of order $n$ of $f$,

$$R_n(f)(x) = \frac{1}{n} \sum_{0\le j<n} f\Big(x + \frac{j}{n}\Big).$$

l∈{1}∪E1

1 1 4(p2 − 1) + 2 > 1, 2 4p p

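hence a contradiction. Now let $i_2 \in E \setminus (\{1\} \cup E_1)$. We easily check that

$$\|f_{i_1} - f_{i_2}\| \ge |\langle f_{i_1}, \varphi_{i_2}\rangle - \langle f_{i_2}, \varphi_{i_2}\rangle| \ge \frac{1}{p} - \frac{1}{2p} = \frac{1}{2p}.$$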

More generally, for 1 ≤ k ≤ T , put

, Ek = i ∈ E\ {1, . . . , k} ∪ 0≤j

1 2p

.

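Arguing as before, we also get $\#(E_k) \le 4(p^2 - 1)$, since otherwise

$$1 \ge \|f_{i_k}\|^2 \ge \sum_{l\in\{k\}\cup E_k} \langle f_{i_k}, \varphi_l\rangle^2 > \frac{1}{4p^2}\, 4(p^2 - 1) + \frac{1}{p^2} = 1.$$

Now let $i_{k+1} \in E \setminus \big(\{1, \ldots, k\} \cup \bigcup_{0\le j\le k} E_j\big)$. For any $l \le k$ we have

$$\|f_{i_l} - f_{i_{k+1}}\| \ge |\langle f_{i_{k+1}}, \varphi_{i_{k+1}}\rangle - \langle f_{i_l}, \varphi_{i_{k+1}}\rangle| \ge \frac{1}{p} - \frac{1}{2p} = \frac{1}{2p},$$

because $i_{k+1} \notin \bigcup_{0\le l\le k} E_l$. We can iterate this procedure as long as $E \setminus \big(\{1, \ldots, k\} \cup \bigcup_{0\le j\le k} E_j\big) \ne \emptyset$, namely at least $k$ times, $k \le T$; hence the lemma is proved.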

Proof of Proposition 6.1.4. Let $s$ be some fixed positive integer. Let $P_1, P_2, \ldots$ denote the sequence of prime numbers. For any nonnegative integer $T$ we put

$$A_T = \{n = P_1^{\alpha_1} \cdots P_s^{\alpha_s} : 2^T \le n < 2^{T+1},\ \alpha_i \ge 0,\ i = 1, \ldots, s\}. \tag{6.1.12}$$

Since $P_1 = 2$, replacing $\alpha_1$ by $\alpha_1 + 1$ we verify that

$$\#(A_T) \le \#(A_{T+1}). \tag{6.1.13}$$

As $0 \le \alpha_1 + \cdots + \alpha_s \le T$ if $n = P_1^{\alpha_1} \cdots P_s^{\alpha_s} \in A_T$, we also deduce that

$$\#(A_T) \le (T + 1)^s. \tag{6.1.14}$$

The growth condition (6.1.14) implies that, given any arbitrary positive integer $d$, there exists $T$ such that

$$\#(A_{T+d}) \le 2\,\#(A_T). \tag{6.1.15}$$
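The counting properties (6.1.13) to (6.1.15) are easy to explore numerically. The sketch below is only an illustration under stated assumptions: it enumerates the sets $A_T$ by brute force for small $s$ and $T$, and searches for the first $T$ satisfying (6.1.15). The function names and the search cut-off `T_max` are hypothetical and not part of the text.

```python
def first_primes(s):
    """Return the first s prime numbers P_1, ..., P_s by trial division."""
    primes = []
    candidate = 2
    while len(primes) < s:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def smooth_block(T, s):
    """Sorted list A_T: integers n with 2**T <= n < 2**(T+1) whose prime
    factors all belong to {P_1, ..., P_s} (n = 1 is allowed, all exponents 0)."""
    primes = first_primes(s)
    lo, hi = 2 ** T, 2 ** (T + 1)
    found = {1}
    frontier = [1]
    while frontier:  # breadth-first generation of all products below hi
        nxt = []
        for m in frontier:
            for p in primes:
                q = m * p
                if q < hi and q not in found:
                    found.add(q)
                    nxt.append(q)
        frontier = nxt
    return sorted(n for n in found if lo <= n < hi)


def find_T(s, d, T_max=60):
    """Smallest T <= T_max with #(A_{T+d}) <= 2 #(A_T), or None if none is found."""
    for T in range(T_max + 1):
        if len(smooth_block(T + d, s)) <= 2 * len(smooth_block(T, s)):
            return T
    return None


if __name__ == "__main__":
    s = 3
    for T in range(8):
        print(T, len(smooth_block(T, s)), (T + 1) ** s)
    print("first admissible T for d = 2:", find_T(s, 2))
```

The polynomial bound $(T+1)^s$ is what rules out a doubling of $\#(A_T)$ at every step of length $d$, which would force exponential growth.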

Otherwise $\#(A_{T+d}) > 2\,\#(A_T)$ for any $T$ would imply that for some constant $c > 0$,

$$\#(A_{nd}) > c\, 2^n, \tag{6.1.16}$$

for any positive integer $n$, which contradicts (6.1.14); hence (6.1.15). Now choose $d$ such that $2^d \le P_s$. Any integer $j \le 2^d$ has consequently only prime factors from the set $\{P_1, \ldots, P_s\}$. Put for $i = 0, \ldots, d$,

$$f^{(i)}(x) = \frac{1}{\#(A_{T+i})^{1/2}} \sum_{n\in A_{T+i}} e^{2i\pi n x}, \tag{6.1.17}$$

$f = f^{(0)}$, $f_j(x) = f(jx)$ and then

$$\varphi_i = \frac{f^{(2i-1)} + f^{(2i)}}{\sqrt{2}}, \qquad i = 0, \ldots, \Big[\frac{d}{2}\Big].$$

The $f^{(i)}$ form an orthonormal system in $L^2$, as do the $\varphi_i$. Besides, $\|f_j\| = 1$ for any $j$. Let $1 \le i \le [d/2]$ and $j \in [2^{2i-1}, 2^{2i}]$, and examine $f_j$. Let $n \in A_T$. Then all the prime factors of $nj$ belong to $\{P_1, \ldots, P_s\}$. Further, $2^{T+2i-1} \le nj < 2^{T+2i+1}$. It follows that

$$n \in A_T \ \text{and}\ j \in [2^{2i-1}, 2^{2i}] \implies nj \in A_{T+2i-1} \cup A_{T+2i}.$$

We may thus write

$$f_j(x) = \frac{1}{\#(D)^{1/2}} \sum_{m\in D} e^{2i\pi m x},$$

where $D \subset A_{T+2i-1} \cup A_{T+2i}$ and $\#(D) = \#(A_T)$. Hence by (6.1.15),

$$\sqrt{2}\, \langle f_j, \varphi_i\rangle = \frac{1}{[\#(A_T)\#(A_{T+2i-1})]^{1/2}} \sum_{m\in D\cap A_{T+2i-1}} 1 + \frac{1}{[\#(A_T)\#(A_{T+2i})]^{1/2}} \sum_{m\in D\cap A_{T+2i}} 1 \ge \frac{\#(D)}{\sqrt{2}\,\#(A_T)} = \frac{\#(A_T)}{\sqrt{2}\,\#(A_T)} = \frac{1}{\sqrt{2}}. \tag{6.1.18}$$

Therefore for any $1 \le i \le [d/2]$ and any $2^{2i-1} \le j \le 2^{2i}$,

$$\langle f_j, \varphi_i\rangle \ge \frac{1}{2}.$$

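And $\langle f_j, \varphi_k\rangle \ge 0$ for any $j$ and $k$. Thus

$$\langle S_{4^i}(f), \varphi_i\rangle = \frac{1}{4^i} \sum_{j=1}^{4^i} \langle f_j, \varphi_i\rangle \ge \frac{1}{4^i} \sum_{j=2^{2i-1}}^{2^{2i}} \langle f_j, \varphi_i\rangle \ge \frac{1}{2\cdot 4^i} \sum_{l=2^{2i-1}}^{2^{2i}} 1 = \frac{2^{2i} - 2^{2i-1} + 1}{2^{2i+1}} \ge \frac{1}{4}.$$

We have thus obtained: for every $i = 1, \ldots, [d/2]$,

$$\langle S_{4^i}(f), \varphi_i\rangle \ge \frac{1}{4}. \tag{6.1.19}$$

Lemma 6.1.5 applied with the choices $R = [d/2]$, $T = \big[[d/2]/13\big]$, $p = 2$ shows that

$$N\Big(\Big\{S_{4^i}(f),\ i \le \Big[\frac{d}{2}\Big]\Big\},\ \frac{1}{8}\Big) \ge T. \tag{6.1.20}$$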

Since $d$ is arbitrary, it follows from Theorem 6.1.1 and inequality (6.2.7), that for any $M \ge 26$,

$$\sup_{\|f\|_2\le1} \int \sup_{1\le i\le M} S_{4^i} f\, d\mu \ge \frac{B}{8} \sqrt{\log (M/26)}, \tag{6.1.21}$$

as claimed.

We will also establish the following result concerning the functionals in (6.1.7).

6.1.6 Theorem. Assume for any positive integer $n$ that $S_n$ is $L^2(\mu)$-$L^\infty(\mu)$ continuous, and that assumption (H1) is satisfied. Then for any finite subset $I$ of $\mathbb{N}$ and any reals $A > 0$ and $R > 0$,

$$\Gamma(S, I) \le \sqrt{2\,\#(I)}\, S(I, 2)\, e^{-A^2/8} + \sqrt{2}\cdot A\, S(I, \infty)\, e^{-R^2/4} + A\, \Gamma_{\infty,2}\Big(S, I, \frac{R}{A}\Big).$$

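As an immediate consequence we have the following proposition.

6.1.7 Proposition. Let $\{S_n, n \ge 1\}$ be a sequence of $L^2(\mu)$-$L^\infty(\mu)$ contractions verifying assumption (H1). Then for any real $\rho > 0$, there exists a constant $C_\rho < \infty$ such that for any integer $K \ge 3$ and any $R > 0$,

$$\frac{\Gamma^*(S, K)}{\sqrt{\log K}} \le \frac{\sqrt{2}}{\sqrt{\log K}}\, K^{-\rho} + \sqrt{2}\, C_\rho\, e^{-R^2/4} + C_\rho\, \Gamma_{\infty,2}\Big(S, \frac{R}{C_\rho \sqrt{\log K}}\Big). \tag{6.1.22}$$

In particular

$$\limsup_{K\to\infty} \frac{\Gamma^*(S, K)}{\sqrt{\log K}} \le 2 \lim_{\varepsilon\to0} \Gamma_{\infty,2}(S, \varepsilon). \tag{6.1.23}$$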
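The first term of (6.1.22) arises from the first term of Theorem 6.1.6 once the free parameter $A$ is tied to $K$. The short sketch below is a numerical sanity check of that substitution, reading the constant chosen in the proof as $C_\rho = \sqrt{8\rho + 4}$ and taking $A = C_\rho \sqrt{\log K}$, $\#(I) = K$ and $S(I,2) \le 1$ (contractions). The function names are hypothetical.

```python
import math


def c_rho(rho):
    """Constant C_rho = sqrt(8*rho + 4) (as read from the proof of Proposition 6.1.7)."""
    return math.sqrt(8.0 * rho + 4.0)


def first_term(K, rho):
    """sqrt(2*K) * exp(-A**2 / 8) with A = C_rho * sqrt(log K).

    This is the first term of Theorem 6.1.6 for #(I) = K and S(I, 2) = 1.
    """
    A = c_rho(rho) * math.sqrt(math.log(K))
    return math.sqrt(2.0 * K) * math.exp(-A * A / 8.0)


def predicted(K, rho):
    """Closed form sqrt(2) * K**(-rho) expected after the substitution."""
    return math.sqrt(2.0) * K ** (-rho)


if __name__ == "__main__":
    for K in (3, 10, 1000):
        for rho in (0.1, 1.0, 2.5):
            print(K, rho, first_term(K, rho), predicted(K, rho))
```

Since $A^2/8 = (\rho + \tfrac12)\log K$, the factor $\sqrt{K}$ is absorbed exactly, and $C_\rho \to 2$ as $\rho \to 0$, which matches the constant 2 in (6.1.23).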

Proof. Theorem 6.1.6 implies

$$\Gamma(S, I) \le \sqrt{2\,\#(I)}\, e^{-A^2/8} + \sqrt{2}\cdot A\, e^{-R^2/4} + A\, \Gamma_{\infty,2}\Big(S, I, \frac{R}{A}\Big).$$

Let $\rho > 0$ be fixed. Choose $C = C_\rho = \sqrt{8\rho + 4}$. Let $K \ge 3$. Put $A = C\sqrt{\log K}$. Then for any subset $I$ of $\mathbb{N}$ such that $\#(I) = K$,

$$\frac{\Gamma(S, I)}{\sqrt{\log K}} \le \frac{\sqrt{2}}{\sqrt{\log K}}\, K^{-\rho} + \sqrt{2}\, C\, e^{-R^2/4} + C\, \Gamma_{\infty,2}\Big(S, \frac{R}{C\sqrt{\log K}}\Big).$$

And by taking the maximum over all subsets $I$ of integers such that $\#(I) = K$,

$$\frac{\Gamma^*(S, K)}{\sqrt{\log K}} \le \frac{\sqrt{2}}{\sqrt{\log K}}\, K^{-\rho} + \sqrt{2}\, C\, e^{-R^2/4} + C\, \Gamma_{\infty,2}\Big(S, \frac{R}{C\sqrt{\log K}}\Big),$$

which is (6.1.22). By now letting $R$ run over any increasing sequence of integers $\{R_K, K \ge 1\}$ such that $\lim_{K\to\infty} R_K = \infty$ and $\lim_{K\to\infty} \frac{R_K}{\sqrt{\log K}} = 0$, next letting $\rho$ go to zero, we also get (6.1.23).

From the above proposition, one can simply deduce as a corollary the other entropy criterion of Bourgain [1988a: Proposition 2] for the space $L^\infty(\mu)$. This is the criterion most often applied.

6.1.8 Corollary (Second entropy criterion). Let $\{S_n, n \ge 1\}$ be a sequence of $L^2(\mu)$-$L^\infty(\mu)$ contractions verifying assumption (H1). Assume that

$$\mu\big\{\{S_n(f), n \ge 1\} \text{ converges}\big\} = 1 \quad \text{for all } f \in L^\infty(\mu). \tag{$C_\infty$}$$

Then for any real $\delta > 0$,

$$C(\delta) = \sup_{f\in L^\infty(\mu),\ \|f\|_2\le1} N_f(\delta) < \infty. \tag{6.1.24}$$

Proof. Assume that there exists a real $\delta > 0$ such that $C(\delta) = \infty$. Then for any integer $K \ge 3$, there exist $f \in L^\infty(\mu)$ such that $\|f\|_{2,\mu} = 1$ and $I$ with $\#(I) = K$ such that

$$\inf_{n,m\in I,\ n\ne m} \|S_n(f) - S_m(f)\|_{2,\mu} \ge \delta.$$

In view of Proposition 6.1.7 (with $\rho = 1$ and $C = C_\rho$) and inequality (6.2.7), it follows that

$$B\delta \le C\big(K^{-1} + e^{-R^2/4}\big) + \Gamma_{\infty,2}\Big(S, \frac{R}{\sqrt{\log K}}\Big),$$

where $B$ is a numerical constant. Choosing now $R$ such that $C e^{-R^2/4} \le \frac{1}{2} B\delta$, next letting $K$ go to infinity, we deduce

$$\frac{1}{2} B\delta \le \limsup_{\varepsilon\to0} \Gamma_{\infty,2}(S, \varepsilon).$$

This brings a contradiction since in view of Theorem 5.1.5 and the assumptions made we know that the functional $\varepsilon \mapsto \Gamma_{\infty,2}(S, \varepsilon)$ should be continuous at 0; hence the result.

Return to Khintchin sums (Proposition 6.1.4) and the entropy estimate established in (6.1.20). Since $d$ was arbitrary, it follows that

$$\sup_{f\in L^\infty,\ \|f\|_2\le1} N\Big(\big(S_{4^i}(f), i \ge 1\big),\ \frac{1}{4}\Big) = \infty.$$

And by the second entropy criterion, we recover a well-known result due to Marstrand [1970], answering negatively a conjecture due to Khintchin: there exists a measurable bounded function $f$ such that the sequence of Khintchin sums $\{S_n f, n \ge 1\}$ does not converge almost everywhere.

6.2 Two preliminary lemmas

We begin with a useful lemma.

6.2.1 Lemma. Let $T$ be a positive isometry of $L^2(\mu)$ such that $T1 = 1$.
(a) Then $\|Tf\|_{\infty,\mu} \le \|f\|_{\infty,\mu}$ for any $f \in L^\infty(\mu)$, and $\mu\{(Tf)^2 = Tf^2\} = 1$.
(b) Moreover, if $T$ is a continuous operator on $L^1(\mu)$, then for any $f \in L^2(\mu)$, $\mu\{(Tf)^2 = Tf^2\} = 1$, and $T$ is a positive isometry of $L^1_+(\mu)$.
(c) Conversely, if $T$ is a positive isometry on $L^1_+(\mu)$ such that $\mu\{(Tf)^2 = Tf^2\} = 1$ holds for any $f \in L^2(\mu)$, then $T1 = 1$ and $T$ is a positive isometry on $L^2(\mu)$.

Proof. The first assertion in (a) is immediate since $Tf \le T1 \cdot \|f\|_{\infty,\mu} = \|f\|_{\infty,\mu}$. Now let $A \in \mathcal{A}$, $0 < \mu(A) < 1$. We use the following property: $f, g \in L^2(\mu)$ with $f \ge 0$, $g \ge 0$ have disjoint supports if and only if

$$\|f + g\|_{2,\mu}^2 = \|f\|_{2,\mu}^2 + \|g\|_{2,\mu}^2. \tag{6.2.1}$$

This property remains true (see Krengel [1985: p. 186]) in $L^p(\mu)$ with $1 < p < \infty$. Since $T$ is a positive isometry, from the fact that $T1 = 1$, we deduce that $T1_A$ and $T1_{A^c}$ have disjoint supports; and $0 \le T1_A, T1_{A^c} \le 1$. Let $E = \{0 < T1_A < 1\} = \{0 < T1_{A^c} < 1\}$. As $E \subset \operatorname{supp}(T1_A) \cap \operatorname{supp}(T1_{A^c})$, we conclude that $T1_A$ and $T1_{A^c}$ are indicators. Consequently, any simple function is mapped by $T$ into a simple function. For these functions we have $(Tf)^2 = Tf^2$.

Let $f \in L^\infty(\mu)$, $f \ge 0$. Put for any integer $n > \|f\|_{\infty,\mu}$,

$$f_n = \sum_{q=1}^{n2^n} \frac{q}{2^n}\, 1_{\{\frac{q-1}{2^n} \le f < \frac{q}{2^n}\}}, \qquad g_n = \sum_{q=1}^{n2^n} \frac{q-1}{2^n}\, 1_{\{\frac{q-1}{2^n} \le f < \frac{q}{2^n}\}}. \tag{6.2.2}$$

Then $f \le f_n \le f + \frac{1}{2^n}$ and $g_n \le f \le g_n + \frac{1}{2^n}$ at any point. On the one hand, using positivity of $T$ and $T1 = 1$,

$$(Tf)^2 \le (Tf_n)^2 = Tf_n^2 \le T\Big(f + \frac{1}{2^n}\Big)^2 = Tf^2 + 2^{-n+1}\, Tf + 2^{-2n}.$$

Consequently by letting $n$ tend to infinity, $(Tf)^2 \le Tf^2$. And on the other,

$$Tf^2 \le T\Big(g_n + \frac{1}{2^n}\Big)^2 = Tg_n^2 + 2^{-n+1}\, Tg_n + 2^{-2n} \le (Tf)^2 + 2^{-n+1}\, Tf + 2^{-2n}.$$

Hence $Tf^2 \le (Tf)^2$ by letting $n$ tend to infinity, and thus $Tf^2 = (Tf)^2$. Let now $f \in L^\infty(\mu)$, $f = f^+ - f^-$. As

$$\|Tf^+ - Tf^-\|_{2,\mu}^2 = \|Tf\|_{2,\mu}^2 = \|f\|_{2,\mu}^2 = \|f^+\|_{2,\mu}^2 + \|f^-\|_{2,\mu}^2 = \|Tf^+\|_{2,\mu}^2 + \|Tf^-\|_{2,\mu}^2,$$

it then follows that $Tf^+$ and $Tf^-$ have disjoint supports. This implies that

$$(Tf)^2 = (Tf^+)^2 + (Tf^-)^2 = T(f^+)^2 + T(f^-)^2 = Tf^2. \tag{6.2.3}$$

We have thus established assertion (a). We now show (b). Let $f \in L^2(\mu)$; there exists a sequence $(f_n) \subset L^\infty(\mu)$ such that $\|f - f_n\|_2 \to 0$ as $n \to \infty$. By virtue of the Cauchy–Schwarz inequality, we have also $\|f^2 - f_n^2\|_1 \to 0$ as $n \to \infty$. Then

$$\|(Tf)^2 - Tf^2\|_1 \le \|(Tf)^2 - (Tf_n)^2\|_1 + \|(Tf_n)^2 - Tf_n^2\|_1 + \|T(f_n^2 - f^2)\|_1 \le \|T(f_n - f)\|_2 \cdot \|T(f_n + f)\|_2 + \|T(f_n^2 - f^2)\|_1 \to 0,$$

as $n \to \infty$ since $T$ is continuous on $L^1(\mu)$ and $L^2(\mu)$; hence (b). Finally (c) is immediate.

Recall for our purpose Slepian's comparison inequality and Sudakov's minoration (inequalities (10.2.7) and (10.2.9)). Let $T$ be a finite set. Let $X = \{X_t, t \in T\}$ and $Y = \{Y_t, t \in T\}$ be two Gaussian processes. Assume that for any $s, t \in T$,

$$\|X_s - X_t\|_2 \le \|Y_s - Y_t\|_2. \tag{6.2.4}$$

Then for any positive increasing convex function $f$ on $\mathbb{R}^+$,

$$E f\Big(\sup_{T\times T} (X_s - X_t)\Big) \le E f\Big(\sup_{T\times T} (Y_s - Y_t)\Big). \tag{6.2.5}$$

In particular,

$$E \sup_{t\in T} X_t \le E \sup_{t\in T} Y_t. \tag{6.2.6}$$

An important consequence is Sudakov's minoration: there exists a universal constant $B$ such that for any Gaussian process $X = \{X_t, t \in T\}$ with basic probability space $(\Omega, \mathcal{B}, P)$,

$$E \sup_{t\in T} X_t \ge B \inf_{\substack{s,t\in T \\ s\ne t}} \|X_s - X_t\|_{2,P}\, \sqrt{\log \#(T)}. \tag{6.2.7}$$
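Sudakov's minoration (6.2.7) can be illustrated on the simplest example, a family of $K$ independent standard Gaussian variables, for which $\|X_s - X_t\|_{2,P} = \sqrt{2}$ whenever $s \ne t$. The sketch below is a hedged Monte Carlo illustration only: it estimates $E \sup_t X_t$ and reports its ratio to $\sqrt{2}\sqrt{\log K}$. It says nothing about the value of the universal constant $B$, and the function names, sample size and seed are arbitrary choices.

```python
import math
import random


def expected_max(K, n_samples=4000, seed=1):
    """Monte Carlo estimate of E max(X_1, ..., X_K) for i.i.d. N(0, 1) variables."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += max(rng.gauss(0.0, 1.0) for _ in range(K))
    return total / n_samples


def sudakov_ratio(K, n_samples=4000, seed=1):
    """Ratio E sup X_t / (separation * sqrt(log K)) for K independent N(0, 1).

    The separation inf ||X_s - X_t||_2 equals sqrt(2) in this example.
    """
    separation = math.sqrt(2.0)
    return expected_max(K, n_samples, seed) / (separation * math.sqrt(math.log(K)))


if __name__ == "__main__":
    for K in (10, 100, 1000):
        print(K, round(sudakov_ratio(K), 3))
```

In this example the ratio stays bounded away from zero as $K$ grows, which is the content of (6.2.7) for an orthogonal family.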

Now let $g = \{g_n, n \ge 1\}$ be a sequence of i.i.d. $N(0,1)$ distributed random variables defined on a joint probability space of $(X, \mathcal{A}, \mu)$, which we denote by $(\Omega, \mathcal{B}, P)$. To any $f \in L^2(\mu)$ and any finite subset $E$ of $\mathbb{N}$, we associate the Gaussian sequence

$$F_{E,f} = \frac{1}{\sqrt{\#(E)}} \sum_{j\in E} g_j T_j(f). \tag{6.2.8}$$

When $E = \{1, 2, \ldots, J\}$ we will write more simply $F_{E,f} = F_{J,f}$. The following comparison lemma is the key for proving Theorems 6.1.1 and 6.1.4.

6.2.2 Lemma. Let $S_n : L^2(\mu) \to L^2(\mu)$, $n = 1, 2, \ldots$ be continuous operators verifying (H1) and such that $S_n(L^\infty(\mu)) \subset L^\infty(\mu)$. Let $f \in L^\infty(\mu)$; let also $I$ be a finite subset of positive integers such that $\|S_n(f) - S_m(f)\|_{2,\mu} \ne 0$ for all $n, m \in I$ with $m \ne n$. Then for any $0 < \varepsilon < 1$ and any index $\mathcal{J}_0$, there exists a subindex $\mathcal{J}$ such that if

$$A(I) = \Big\{ \forall J \in \mathcal{J},\ \forall n, m \in I,\ m \ne n,\quad \frac{\|S_n(F_{J,f}) - S_m(F_{J,f})\|_{2,P}}{\|S_n(f) - S_m(f)\|_{2,\mu}} \ge \sqrt{1-\varepsilon} \Big\},$$

then

$$\mu\{A(I)\} \ge \sqrt{1-\varepsilon}, \tag{6.2.9}$$

and for any positive increasing convex function $G$ on $\mathbb{R}^+$, any $J \in \mathcal{J}$:

$$\sqrt{1-\varepsilon}\; E\, G\Big(\sqrt{1-\varepsilon}\, \sup_{n,m\in I} \big(Z(S_n(f)) - Z(S_m(f))\big)\Big) \le E \int G\Big(\sup_{n,m\in I} \big(S_n(F_{J,f}) - S_m(F_{J,f})\big)\Big)\, d\mu. \tag{6.2.10a}$$

In particular, for any $J \in \mathcal{J}$,

$$(1-\varepsilon)\, E \sup_{n\in I} Z(S_n(f)) \le E \int \sup_{n\in I} S_n(F_{J,f})\, d\mu. \tag{6.2.10b}$$
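The mechanism behind the lemma is that the pointwise variance of the Gaussian element $F_{J,f}$, namely $\frac{1}{J}\sum_{j\le J} (T_j f)^2$, approaches the constant $\|f\|_{2,\mu}^2$ in $L^1(\mu)$. The following toy sketch illustrates this under assumptions that are not in the text: $X = \mathbb{Z}/N\mathbb{Z}$ with uniform measure and $T_j$ the cyclic shift by $j$, which is a positive isometry preserving 1. All names are hypothetical.

```python
def shift(f, j):
    """T_j f(x) = f(x + j mod N) on X = Z/NZ; f is given as a list of N values."""
    N = len(f)
    return [f[(x + j) % N] for x in range(N)]


def variance_profile(f, J):
    """Pointwise variance x -> E|F_{J,f}(x)|^2 = (1/J) * sum_{j=1..J} (T_j f)(x)^2."""
    N = len(f)
    out = [0.0] * N
    for j in range(1, J + 1):
        Tf = shift(f, j)
        for x in range(N):
            out[x] += Tf[x] ** 2 / J
    return out


def l1_defect(f, J):
    """L^1(mu) distance between the variance profile and the constant ||f||_2^2."""
    N = len(f)
    norm_sq = sum(v * v for v in f) / N  # ||f||_{2,mu}^2 for the uniform measure
    return sum(abs(v - norm_sq) for v in variance_profile(f, J)) / N


if __name__ == "__main__":
    f = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5]
    for J in (1, 2, 6, 7, 60, 600):
        print(J, l1_defect(f, J))
```

In this model the defect vanishes exactly when $J$ is a multiple of $N$ and is of order $N/J$ otherwise, so the mean ergodic property of (H1) holds.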

Proof. We give the proof when $\mathcal{J}_0 = \{1, 2, \ldots\}$, the case of an arbitrary index $\mathcal{J}_0$ presenting no additional difficulty. Let $0 < \varepsilon < 1$ be fixed. Let $f \in L^\infty(\mu)$. By assumption, the operators $S_n$ and $T_j$ are commuting; thus $S_n(F_{J,f}) = F_{J,S_n(f)}$. Consequently,

$$\|S_n(F_{J,f}) - S_m(F_{J,f})\|_{2,P}^2 = \frac{1}{J} \sum_{j\le J} \big[T_j(S_n(f) - S_m(f))\big]^2.$$

Lemma 6.2.1 and assumption (H1) allow us to write

$$\|S_n(F_{J,f}) - S_m(F_{J,f})\|_{2,P}^2 = \frac{1}{J} \sum_{j\le J} T_j(S_n(f) - S_m(f))^2 \xrightarrow{\ L^1(\mu)\ } \|S_n(f) - S_m(f)\|_{2,\mu}^2,$$

as $J$ tends to infinity. Fix $n, m \in I$, $n \ne m$. We can thus define an index $\mathcal{J} = \{J_k, k \ge 1\}$ such that

$$\forall k \ge 1, \qquad \Big\| \frac{1}{J_k} \sum_{j\le J_k} T_j(S_n(f) - S_m(f))^2 - \|S_n(f) - S_m(f)\|_{2,\mu}^2 \Big\|_{1,\mu} \le 2^{-2k}.$$

Therefore,

$$\forall k \ge 1, \qquad \mu\Big\{ \Big| \frac{1}{J_k} \sum_{j\le J_k} T_j(S_n(f) - S_m(f))^2 - \|S_n(f) - S_m(f)\|_{2,\mu}^2 \Big| \ge 2^{-k} \Big\} \le 2^{-k}.$$

Let $L \ge 1$ be an integer such that $2^{-L-1} \le \varepsilon \|S_n(f) - S_m(f)\|_{2,\mu}^2$. Then, for any $k > L$,

$$\|S_n(f) - S_m(f)\|_{2,\mu}^2 - 2^{-k} \ge (1-\varepsilon) \|S_n(f) - S_m(f)\|_{2,\mu}^2$$

and consequently,

$$\mu\Big\{ \forall k > L,\ \|S_n(F_{J_k,f}) - S_m(F_{J_k,f})\|_{2,P} \ge \sqrt{1-\varepsilon}\, \|S_n(f) - S_m(f)\|_{2,\mu} \Big\} \ge \mu\Big\{ \forall k > L,\ \Big| \frac{1}{J_k} \sum_{j\le J_k} T_j(S_n(f) - S_m(f))^2 - \|S_n(f) - S_m(f)\|_{2,\mu}^2 \Big| \le 2^{-k} \Big\} \ge 1 - \sum_{k>L} 2^{-k} = 1 - 2^{-L}.$$

We write $\mathcal{J}(m, n) = \{J_k, k > L\}$. We have thus shown

$$\mu\Big\{ \forall J \in \mathcal{J}(m, n),\ \frac{\|S_n(F_{J,f}) - S_m(F_{J,f})\|_{2,P}}{\|S_n(f) - S_m(f)\|_{2,\mu}} \ge \sqrt{1-\varepsilon} \Big\} \ge 1 - 2^{-L}.$$

Let $(m', n')$, $m' \ne n'$ be another pair of elements of $I$. Let also $L'$ be some sufficiently large positive integer. Since

$$\lim_{\substack{J\to\infty \\ J\in\mathcal{J}(m,n)}} \Big\| \frac{1}{J} \sum_{j\le J} T_j(S_{n'}(f) - S_{m'}(f))^2 - \|S_{n'}(f) - S_{m'}(f)\|_{2,\mu}^2 \Big\|_{1,\mu} = 0,$$

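by the preceding reasoning we can extract from $\mathcal{J}(m, n)$ another index $\mathcal{J}(m', n')$ such that

$$\mu\Big\{ \forall J \in \mathcal{J}(m', n'),\ \frac{\|S_{n'}(F_{J,f}) - S_{m'}(F_{J,f})\|_{2,P}}{\|S_{n'}(f) - S_{m'}(f)\|_{2,\mu}} \ge \sqrt{1-\varepsilon} \Big\} \ge 1 - 2^{-L'}.$$

Proceeding then by successive iterations, we can define for a convenient choice of integers $L, L', \ldots$, an index $\mathcal{J} = \mathcal{J}(I, \varepsilon)$ such that if

$$A(I) = \Big\{ \forall J \in \mathcal{J},\ \forall n, m \in I,\ m \ne n,\quad \frac{\|S_n(F_{J,f}) - S_m(F_{J,f})\|_{2,P}}{\|S_n(f) - S_m(f)\|_{2,\mu}} \ge \sqrt{1-\varepsilon} \Big\},$$

then $\mu\{A(I)\} \ge \sqrt{1-\varepsilon}$.

Along this index, we thus have by virtue of (6.2.6),

$$(1-\varepsilon)\, E \sup_{n\in I} Z(S_n(f)) \le \sqrt{1-\varepsilon} \int_{A(I)} E \sup_{n\in I} Z(S_n(f))\, d\mu \le \int_{A(I)} E \sup_{n\in I} S_n(F_{J,f})\, d\mu \le \int_X E \sup_{n\in I} S_n(F_{J,f})\, d\mu,$$

since $\mu\{E \sup_{n\in I} S_n(F_{J,f}) \ge 0\} = 1$. This establishes (6.2.10b). As for (6.2.10a), inequality (6.2.5) and the fact that $\mu\{A(I)\} \ge \sqrt{1-\varepsilon}$ show similarly

$$\int_X E\, G\Big(\sup_{n,m\in I} \big(S_n(F_{J,f}) - S_m(F_{J,f})\big)\Big)\, d\mu \ge \int_{A(I)} E\, G\Big(\sup_{n,m\in I} \big(S_n(F_{J,f}) - S_m(F_{J,f})\big)\Big)\, d\mu \ge \int_{A(I)} E\, G\Big(\sqrt{1-\varepsilon}\, \sup_{n,m\in I} \big(Z(S_n(f)) - Z(S_m(f))\big)\Big)\, d\mu \ge \sqrt{1-\varepsilon}\; E\, G\Big(\sqrt{1-\varepsilon}\, \sup_{n,m\in I} \big(Z(S_n(f)) - Z(S_m(f))\big)\Big).$$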

This completes the proof of Lemma 6.2.2.

Two elementary estimates for Gaussian variables (see Chapter 10) will now be necessary. We recall them for convenience: if $R(x) = e^{x^2/2} \int_x^\infty e^{-t^2/2}\, dt$ (Mills' ratio), then for any $x \ge 0$,

$$\frac{2}{\sqrt{x^2 + 4} + x} \le R(x) \le \frac{2}{\sqrt{x^2 + \frac{8}{\pi}} + x} \le \sqrt{\frac{\pi}{2}}. \tag{6.2.11}$$
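The two-sided bound on Mills' ratio can be checked numerically. The sketch below is an illustration only: it evaluates $R(x)$ through the complementary error function, using the identity $\int_x^\infty e^{-t^2/2}\,dt = \sqrt{\pi/2}\,\operatorname{erfc}(x/\sqrt{2})$, and compares it with the lower bound $2/(\sqrt{x^2+4}+x)$ and the upper bound $2/(\sqrt{x^2+8/\pi}+x)$ on a grid. The function names and the grid are arbitrary choices.

```python
import math


def mills_ratio(x):
    """R(x) = exp(x^2/2) * int_x^inf exp(-t^2/2) dt, computed via erfc."""
    return math.exp(x * x / 2.0) * math.sqrt(math.pi / 2.0) * math.erfc(x / math.sqrt(2.0))


def lower_bound(x):
    """Lower bound 2 / (sqrt(x^2 + 4) + x)."""
    return 2.0 / (math.sqrt(x * x + 4.0) + x)


def upper_bound(x):
    """Upper bound 2 / (sqrt(x^2 + 8/pi) + x), equal to sqrt(pi/2) at x = 0."""
    return 2.0 / (math.sqrt(x * x + 8.0 / math.pi) + x)


if __name__ == "__main__":
    for x in (0.0, 0.5, 1.0, 2.0, 5.0, 10.0):
        print(x, lower_bound(x), mills_ratio(x), upper_bound(x))
```

The upper bound is attained at $x = 0$, where both sides equal $\sqrt{\pi/2}$, so a small floating-point tolerance is needed when testing it there.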

It follows that for any standard Gaussian random variable $g$ and any $T > 0$,

$$E\, g^2\, 1_{(|g| \ge T)} \le 6\, e^{-T^2/4}. \tag{6.2.12}$$

6.3 Proof of Theorem 6.1.1

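Let $f \in L^\infty(\mu)$ be such that $\|f\|_{2,\mu} \le 1$. By using Lemma 6.2.1 and moment properties of Gaussian random variables, we get

$$E \int |F_{J,f}|^p\, d\mu = \int E |F_{J,f}|^p\, d\mu \le C_p^p \int \big(E |F_{J,f}|^2\big)^{p/2}\, d\mu = C_p^p \int \Big( \frac{1}{J} \sum_{j\le J} T_j f^2(x) \Big)^{p/2} d\mu(x).$$

Here $C_p$ depends on $p$ only. By assumption

$$\lim_{J\to\infty} \Big\| \frac{1}{J} \sum_{j\le J} T_j f^2 - \|f\|_{2,\mu}^2 \Big\|_{1,\mu} = 0.$$

Along some increasing subsequence of integers, say $\mathcal{J}_0$, $\frac{1}{J} \sum_{j\le J} T_j f^2(x)$ thus converges to $\|f\|_{2,\mu}^2$ for almost all $x$. Since $f \in L^\infty(\mu)$, it follows that for $J \in \mathcal{J}_0$, $\big( \frac{1}{J} \sum_{j\le J} T_j f^2(x) \big)^{p/2}$ is a bounded sequence converging almost surely to $\|f\|_{2,\mu}^p$, and we may apply the dominated convergence theorem. Therefore

$$\lim_{\mathcal{J}_0 \ni J\to\infty} E \int |F_{J,f}|^p\, d\mu = \|f\|_{2,\mu}^p.$$

Extracting if necessary from $\mathcal{J}_0$ another subsequence which we call again $\mathcal{J}_0$, we may thus conclude that

$$E \|F_{J,f}\|_{p,\mu} \le 2 C_p \|f\|_{2,\mu}, \qquad \forall J \in \mathcal{J}_0. \tag{6.3.1}$$

Further by Lemma 6.2.2, for any $0 < \varepsilon < 1$ there exists an index $\mathcal{J} \subseteq \mathcal{J}_0$ such that for any $J \in \mathcal{J}$,

$$(1-\varepsilon)\, E \sup_{n\in I} Z(S_n(f)) \le E \int \sup_{n\in I} S_n(F_{J,f})\, d\mu. \tag{6.3.2}$$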

Let 0 < ε < 1 and put u0 = 0, un = ε(1 + ε)n−1 n = 1, 2, . . . . Write E

∞ sup Sn (FJ,f ) dμ = E n∈I

≤

1uk−1 ≤ FJ,f p,μ 1 be fixed. By extracting if necessary another index, we obtain

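$$\forall J \in \mathcal{J}, \qquad E \|F^{A,J}\|_{2,\mu}^2 \le (1 + \alpha) \exp\Big(-\frac{A^2}{4}\Big). \tag{6.4.9}$$

Integrating then inequality (6.4.5) with respect to $P$ allows us to deduce from (6.4.4) and (6.4.9) that for any $J \in \mathcal{J}$,

$$\gamma\, E \sup_{n\in I} Z(S_n(f)) \le \sqrt{(1+\alpha)\,\#(I)}\; S(I, 2) \exp\Big(-\frac{A^2}{8}\Big) + E \int \sup_{n\in I} |S_n(F_{A,J})|\, d\mu. \tag{6.4.10}$$

In order to estimate $E \int \sup_{n\in I} |S_n(F_{A,J})|\, d\mu$, a fine evaluation of $E \exp\big(a \|F_{A,J}\|_{2,\mu}^2\big)$, where $a = \frac{1}{4\alpha}$, will be necessary. At first,

$$E \exp\big(a \|F_{A,J}\|_{2,\mu}^2\big) = E \exp\Big(a \int_X F_{A,J}^2\, d\mu\Big)$$

and, by means of Jensen's inequality, we may continue as follows:

$$\le E \int_X \exp(a F_{A,J}^2)\, d\mu \le E \int_{B_\alpha} \exp(a F_{A,J}^2)\, d\mu + e^{aA^2} \mu(B_\alpha^c),$$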

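where the set $B_\alpha$ will be made explicit later on. We already know that $\frac{1}{J} \sum_{j\le J} T_j f^2$ converges in $L^1(\mu)$ and almost everywhere to $\int f^2\, d\mu = 1$, as $J$ tends to infinity along the index $\mathcal{J}$. For what follows, it will be necessary to make this a bit more precise. Let $\delta_k = \delta 2^{-k}$, $k \ge 1$, where $0 < \delta < \inf(\alpha - 1, 1)$ will be defined later on. We can thus extract from the index $\mathcal{J}$ a sequence $\{J_k, k \ge 1\}$ such that

$$\mu\Big\{ \Big| \frac{1}{J_k} \sum_{j\le J_k} T_j f^2 - 1 \Big| > \delta_k \Big\} \le \delta_k.$$

Put

$$\check{B}_\delta = \Big\{ \forall k \ge 1,\ \Big| \frac{1}{J_k} \sum_{j\le J_k} T_j f^2 - 1 \Big| \le \delta_k \Big\}.$$

Then $\mu(\check{B}_\delta) \ge 1 - \sum_{k=1}^{\infty} \delta_k = 1 - \delta$, and

$$\check{B}_\delta \subset B_\alpha := \Big\{ \forall k \ge 1,\ \frac{1}{J_k} \sum_{j\le J_k} T_j f^2 < \alpha \Big\}.$$

We have thus $\mu(B_\alpha) \ge 1 - \delta$. And on $B_\alpha$,

$$1 - 2a\, \frac{1}{J_k} \sum_{j\le J_k} T_j f^2 > 1 - 2a\alpha = \frac{1}{2},$$

for any $k \ge 1$. Thus,

$$\int_{B_\alpha} \frac{d\mu}{\sqrt{1 - 2a\, \frac{1}{J_k} \sum_{j\le J_k} T_j f^2}} \le \sqrt{2}.$$

As for any $0 \le b < \frac{1}{2}$, $E \exp\big(b\, N(0,1)^2\big) = \frac{1}{\sqrt{1 - 2b}}$, we have the estimate

$$\int_{B_\alpha} E \exp(a F_{A,J}^2)\, d\mu \le \int_{B_\alpha} E \exp(a F_J^2)\, d\mu = \int_{B_\alpha} \frac{d\mu}{\sqrt{1 - 2a\, \frac{1}{J_k} \sum_{j\le J_k} T_j f^2}} \le \sqrt{2}.$$

Hence

$$E \exp\big(a \|F_{A,J_k}\|_{2,\mu}^2\big) \le \sqrt{2} + e^{aA^2} \mu(B_\alpha^c) \le \sqrt{2} + \delta e^{A^2 a}.$$

The extracted subsequence $\{J_k, k \ge 1\}$ relies upon $\delta$. We choose $\delta < (\alpha - 1) e^{-A^2/4\alpha}$. Denote again by $\mathcal{J}$ the sequence $\{J_k, k \ge 1\}$. Then $\mathcal{J}$ relies upon $A$ and $\alpha$, and for any $J$ in $\mathcal{J}$ we have

$$E \exp\big(a \|F_{A,J}\|_{2,\mu}^2\big) \le \sqrt{2} + \alpha - 1. \tag{6.4.11}$$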
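The identity $E \exp(b\, g^2) = (1-2b)^{-1/2}$ for a standard Gaussian $g$ and $0 \le b < \frac12$, on which the bound by $\sqrt{2}$ rests, can be confirmed by numerical quadrature. The sketch below is an illustration under stated assumptions: it uses a composite Simpson rule on a truncated interval, and the truncation width, step count and function name are arbitrary choices that are adequate only for $b$ not too close to $\frac12$.

```python
import math


def gaussian_exp_moment(b, half_width=40.0, steps=40000):
    """Approximate E exp(b * g^2) for g ~ N(0, 1) and b < 1/2.

    Composite Simpson rule for the integral of
    exp((b - 1/2) * x^2) / sqrt(2*pi) over [-half_width, half_width];
    `steps` must be even.
    """
    if not b < 0.5:
        raise ValueError("the moment is finite only for b < 1/2")
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    h = 2.0 * half_width / steps

    def integrand(x):
        return math.exp((b - 0.5) * x * x) / math.sqrt(2.0 * math.pi)

    total = integrand(-half_width) + integrand(half_width)
    for i in range(1, steps):
        weight = 4.0 if i % 2 else 2.0
        total += weight * integrand(-half_width + i * h)
    return total * h / 3.0


if __name__ == "__main__":
    for b in (0.0, 0.1, 0.25, 0.4):
        print(b, gaussian_exp_moment(b), 1.0 / math.sqrt(1.0 - 2.0 * b))
```

With $a = 1/(4\alpha)$ and a variance below $\alpha$, the relevant value of $b$ stays below $\frac14$, which is why the moment is at most $\sqrt{2}$ on $B_\alpha$.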

6.4 Proof of Theorem 6.1.6

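We now evaluate the quantity $E \int \sup_{n\in I} |S_n(F_{A,J})|\, d\mu$ by considering separately the two integrals

$$E \int_{B_\alpha} \sup_{n\in I} |S_n(F_{A,J})|\, d\mu \quad \text{and} \quad E \int_{B_\alpha^c} \sup_{n\in I} |S_n(F_{A,J})|\, d\mu.$$

The first integral can be bounded for any $R > 0$ by

$$E \int_{B_\alpha} \sup_{n\in I} |S_n(F_{A,J})|\, 1_{\{\|F_{A,J}\|_{2,\mu} > R\}}\, d\mu + E \int_{B_\alpha} \sup_{n\in I} |S_n(F_{A,J})|\, 1_{\{\|F_{A,J}\|_{2,\mu} \le R\}}\, d\mu.$$

As concerns the second, using the Cauchy–Schwarz inequality gives

$$E \int_{B_\alpha^c} \sup_{n\in I} |S_n(F_{A,J})|\, d\mu \le \mu(B_\alpha^c)^{1/2} \cdot E \Big\| \sup_{n\in I} |S_n(F_{A,J})| \Big\|_{2,\mu} \le (\alpha - 1)^{1/2} e^{-A^2/8\alpha} \sqrt{\#(I)}\, S(I, 2)\, E \|F_{A,J}\|_{2,\mu} \le (\alpha - 1)^{1/2} e^{-A^2/8\alpha} \sqrt{\#(I)}\, S(I, 2).$$

Now return to the first term and observe that

$$\|S_n(F_{A,J})\|_\infty \le S(I, \infty) \|F_{A,J}\|_\infty \le S(I, \infty) A.$$

Estimate (6.4.11) and the fact that $S_n$ is continuous on $L^\infty(\mu)$ allow us to bound

$$E \int_{B_\alpha} \sup_{n\in I} |S_n(F_{A,J})|\, 1_{\{\|F_{A,J}\|_{2,\mu} > R\}}\, d\mu$$

by

$$A\, S(I, \infty)\, P\big\{\|F_{A,J}\|_{2,\mu} > R\big\} \le A\, S(I, \infty)\, e^{-aR^2} E \exp\big(a \|F_{A,J}\|_{2,\mu}^2\big) \le A\, S(I, \infty)\, e^{-aR^2} (\sqrt{2} + \alpha - 1).$$

Consider the second integral. Here it is much easier, because we have the straightforward bound

$$E \int_{B_\alpha} \sup_{n\in I} |S_n(F_{A,J})|\, 1_{\{\|F_{A,J}\|_{2,\mu} \le R\}}\, d\mu \le A\, \Gamma_{\infty,2}\Big(S, I, \frac{R}{A}\Big).$$

By combining all these estimates and returning to the initial inequality, we see that we have arrived at

$$\gamma\, E \sup_{n\in I} Z(S_n(f)) \le S(I, 2) \sqrt{(1+\alpha)\,\#(I)} \exp\Big(-\frac{A^2}{8}\Big) + S(I, 2) (\alpha - 1)^{1/2} e^{-A^2/8\alpha} \sqrt{\#(I)} + (\sqrt{2} + \alpha - 1)\, A\, S(I, \infty)\, e^{-aR^2} + A\, \Gamma_{\infty,2}\Big(S, I, \frac{R}{A}\Big). \tag{6.4.12}$$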

In this last inequality, $J$ has disappeared. We were free to choose $\alpha > 1$ as close to 1 as we wish, which we do now. By letting also $\gamma$ tend to 1, we have thus obtained

$$E \sup_{n\in I} Z(S_n(f)) \le \sqrt{2\,\#(I)}\, S(I, 2)\, e^{-A^2/8} + \sqrt{2}\cdot A\, S(I, \infty)\, e^{-R^2/4} + A\, \Gamma_{\infty,2}\Big(S, I, \frac{R}{A}\Big). \tag{6.4.13}$$

This last inequality being satisfied for any $f \in L^\infty(\mu)$ such that $\|f\|_{2,\mu} = 1$, we easily deduce the claimed result by continuity in quadratic mean of $Z$.

6.5 The case $L^p$, $1 < p < 2$

Let $(X, \mathcal{A}, \mu)$ be some probability space. Let $1 < p \le 2$ and denote by $q$ its conjugate: $\frac{1}{p} + \frac{1}{q} = 1$. Consider a sequence $\{S_n, n \ge 1\}$ of continuous operators from $L^p(\mu)$ to $L^p(\mu)$, and assume that the almost sure boundedness property

$$\mu\{S^* f < \infty\} = 1 \quad \text{for all } f \in L^r(\mu) \tag{$B_r$}$$

is fulfilled for some $r < p$. Can we again deduce an entropy criterion similar to Corollary 6.1.2? The following theorem ([Weber: 1993b], Theorem 1.4) shows that the answer is affirmative, but the proof will depend this time on more delicate properties of $p$-stable processes, instead of those of Gaussian processes used till now.

6.5.1 Theorem (Third entropy criterion). Let $1 < p \le 2$ with conjugate $q$. Consider a sequence $\{S_n, n \ge 1\}$ of continuous operators from $L^p(\mu)$ to $L^p(\mu)$. Assume that there exists an ergodic endomorphism $\tau$ of the measure space $(X, \mathcal{A}, \mu)$ commuting with each $S_n$. Assume also that for some real $0 < r < p$, property $(B_r)$ is fulfilled. Then there exists a constant $C(r, p) < \infty$ depending on $r$ and $p$ only, such that for any $f \in L^p(\mu)$,

$$\sup_{\varepsilon>0} \varepsilon \big( \log N_f^p(\varepsilon) \big)^{1/q} \le C(r, p) \|f\|_p, \tag{6.5.1}$$

where $N_f^p(\varepsilon)$ is the minimal number of open $L^p$-balls of radius $\varepsilon$, centered in $C_f$ and enough to cover it. Further $C(r, p)$ tends to infinity as $r$ tends to $p$.

Proof. Let $T$ be the operator associated to $\tau$ through the relation $Tf = f \circ \tau$. We shall replace the Gaussian elements by stable ones. Let $\{\theta_i, i \ge 1\}$ be a sequence of i.i.d. symmetric, $p$-stable random variables of parameter 1 ([Petrov: 1975], [Mijnheer: 1975]). For any $f \in L^p(\mu)$, any positive integer $J$ and any $x \in X$ and $(\omega, \omega') \in \Omega \times \Omega'$, we put

$$F_{f,J}(x) = \frac{1}{J^{1/p}} \sum_{j\le J} \theta_j T^j f(x). \tag{6.5.2}$$

Then $F_J = \{F_{f,J}(x), x \in X\}$ is a $p$-stable random function with spectral measure

$$m = \sum_{j\le J} \delta_{T^j f}.$$

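One can represent $F_J$ as a random mixture of Gaussian random functions; this is a classical fact from the theory of $p$-stable random functions. More precisely, there exist a sequence $\{g_i, i \ge 1\}$ of i.i.d. $N(0,1)$ random variables with basic probability space $(\Omega, \mathcal{A}, P)$ and a sequence $\{\eta_j, j \ge 1\}$ of i.i.d. nonnegative random variables with basic probability space $(\Omega', \mathcal{A}', P')$ such that the random function $H_J$ defined by

$$H_{J,f}(\omega, \omega', x) = \frac{1}{J^{1/p}} \sum_{j\le J} \eta_j(\omega') g_j(\omega) T^j f(x)$$

has the same distribution as $F_J$. See Remark 1.8 in [Marcus–Pisier: 1984] for this fact. We denote in what follows $\tilde{P} = P \otimes P'$. Observe also for any $r < p$,

$$E |F_J|^r = (E |\theta_1|^r) \Big( \frac{1}{J} \sum_{j\le J} |T^j f|^p \Big)^{r/p},$$

since

$$\frac{1}{J^{1/p}} \sum_{j\le J} \theta_j T^j f \overset{D}{=} \theta_1 \Big( \frac{1}{J} \sum_{j\le J} |T^j f|^p \Big)^{1/p}.$$

Let $f \in L^\infty(\mu)$. In view of Birkhoff's theorem, as well as the dominated convergence theorem, we get

$$\lim_{J\to\infty} \int \Big( \frac{1}{J} \sum_{j\le J} |T^j f|^p \Big)^{r/p} d\mu = \Big( \int |f|^p\, d\mu \Big)^{r/p}.$$

And so for any $J$ large enough, $\int E |F_J|^r\, d\mu \le 2^r (E |\theta_1|^r) \|f\|_p^r$. Thus for any $r < p$ and $J$ large enough,

$$\|F_J\|_{r,\mu\times\tilde{P}} \le 2 \|\theta_1\|_r \|f\|_{p,\mu}.$$

Besides, from the Banach principle and the assumptions made, we also observe that for any $\varepsilon > 0$, any $J$ large enough, there exist a measurable set $X_\varepsilon^J \subset X$ with $\mu(X_\varepsilon^J) \ge 1 - \sqrt{\varepsilon}$, and a real $C(\varepsilon)$ such that for all $x \in X_\varepsilon^J$,

$$\tilde{P}\Big\{ \sup_{n\ge1} |S_n(F_{J,f})| \le C(\varepsilon) \|\theta_1\|_r \|f\|_{p,\mu} \Big\} \ge 1 - 2\sqrt{\varepsilon}. \tag{6.5.3}$$

Hence

$$\tilde{P}\Big\{ \omega' : P\Big\{ \sup_{n\ge1} |S_n(F_{J,f}(\omega, \omega', x))| \le C(\varepsilon) \|\theta_1\|_r \|f\|_{p,\mu} \Big\} \ge 1 - \sqrt{\varepsilon} \Big\} \ge 1 - 3\sqrt{\varepsilon}. \tag{6.5.4}$$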

We denote by $E_P$ the expectation symbol with respect to $P$. Using now estimate (10.2.2) for Gaussian semi-norms, we show that on $X_\varepsilon^J$, for any $0 < \varepsilon < 1/4$,

$$1 - \sqrt{\varepsilon} \le P'\Big\{ \omega' : E_P \sup_{n\ge1} |S_n(F_{J,f}(\,\cdot\,, \omega', x))| \le \frac{4C(\varepsilon)}{1 - \sqrt{\varepsilon}} \|\theta_1\|_r \|f\|_{p,\mu} \Big\}. \tag{6.5.5}$$

Consider the $p$-stable sequence of random variables defined by

$$S_n(F_{J,f}) = \frac{1}{J^{1/p}} \sum_{j\le J} \theta_j S_n(T^j(f)), \qquad n \ge 1,$$

and also equal, thanks to the commutation assumption, to

$$\frac{1}{J^{1/p}} \sum_{j\le J} \theta_j T^j(S_n(f)), \qquad n \ge 1.$$

This sequence has thus the same distribution function as the $p$-stable random function

$$H_J(n) = \frac{1}{J^{1/p}} \sum_{j\le J} \eta_j g_j T^j(S_n(f)), \qquad n \ge 1.$$

Introduce the Gaussian distance on N, $

2 1 EP HJ (n) − HJ (m) dJ,ω ,x (n, m) = 2

%1/2

,

p

as well as the metric associated to HJ , 1/p 1 p , |β(n) − β(m)| dmHJ (β) dJ,x (n, m) = 2 where mHJ denotes the spectral measure of HJ . For any finite subset A ⊂ N, any metric d on N, any ε > 0, we denote by N(A, d, ε) the minimal number of d-balls centered in A and enough to cover A. Moreover let σ (A, d, n) be the smallest ε > 0 such that A can be covered with at most n d-balls centered in A. By Lemma 2.1 in [Marcus–Pisier: 1984], there exists a measurable set 0 with P (0 ) > 21 , in fact the computations made show that the probability can be as close to one assume for only convenience reasons that P (0 ) > √ as we wish, and we shall 1 − ε, such that for any ω ∈ 0 and any positive integer n, σ (N, dJ,ω ,x , n) ≥ β(p)

σ (N, dJ,x , n) 1

(log(n + 1)) q

− 21

,

(6.5.6)

where β(p) depends on p only. We deduce from (6.5.6), as well as (6.5.5) and Sudakov’s minoration (6.2.7) that for any x ∈ XεJ ,

1/q 4C(ε) , √ θ1 r f p,μ ≥ γ (p) sup δ log N (N, dJ,x , δ) 1− ε δ>0

(6.5.7)

where $\gamma(p) > 0$. Let $I$ be a finite subset of $\mathbb{N}$ such that for any $n, m \in I$ with $m \ne n$, $\|S_n(f) - S_m(f)\|_{2,\mu} \ne 0$. In view of the assumptions made, we can find a partial index $\mathcal{J}$ depending on $I$ such that

$$\mu\Big\{ \forall J \in \mathcal{J},\ \forall n, m \in I,\ d_{J,x}(n, m) \ge \delta(p) \|(S_n - S_m)(f)\|_{p,\mu} \Big\} \ge 1 - \sqrt{\varepsilon}, \tag{6.5.8}$$

where $\delta(p) > 0$. By combining (6.5.7) and (6.5.8) we get

$$C(\varepsilon) \|\theta_1\|_r \|f\|_{p,\mu} \ge \varepsilon(p) \sup_{\delta>0} \delta \big( \log N(I, \|\cdot\|_{p,\mu}, \delta) \big)^{1/q}, \tag{6.5.9}$$

where $\varepsilon(p) > 0$. We conclude by letting $I$ increase to $\mathbb{N}$. We deduce the claimed result for any $f \in L^p(\mu)$ by proceeding by approximation.

6.5.2 Remarks. The conjugacy lemma allows us to get stronger criteria for matrix summation methods defined on general dynamical systems. Let $(X, \mathcal{A}, \mu)$ be a Lebesgue space and denote by $\mathcal{T}$ the group of automorphisms on $(X, \mathcal{A}, \mu)$. Let $A = \{a_{n,k}, 1 \le k \le N_n, n \ge 1\}$, $N_n$ an increasing sequence of positive integers, be an infinite matrix of real numbers. Define $a_n = \{a_{n,k}, k \ge 1\}$, $n \ge 1$, and assume that the following regularity assumptions are fulfilled:

$$\text{i)}\ \ \|A\| = \sup_{n\ge1} \|a_n\|_1 < \infty, \qquad \text{ii)}\ \ \lim_{n\to\infty} \sum_{k=1}^{N_n} a_{n,k} = 1. \tag{6.5.10}$$

Put for every $T \in \mathcal{T}$, every $f \in L^p(\mu)$,

$$S_n^T(f) = \sum_{k=1}^{N_n} a_{n,k}\, f \circ T^k, \qquad n = 1, 2, \ldots. \tag{6.5.11}$$

Suppose there exists an ergodic operator $T$ such that the sequence of operators $\{S_n^T, n \ge 1\}$ verifies property $(B_p)$, for some $2 \le p < \infty$. Note that the commutation assumption (H2) is automatically satisfied since $T$ is ergodic. Then (Weber [1993a: Theorem 7.7-8])

$$A \text{ is a GB set of } \ell^2, \tag{6.5.12}$$

and the first entropy criterion for instance can be strengthened as follows:

$$\sup_{S\in\mathcal{T}}\ \sup_{\substack{f\in L^p(\mu) \\ \|f\|_{2,\mu}\le1}} E \sup_{n\ge1} |Z(S_n^S(f))| < \infty. \tag{6.5.13}$$

Let us prove (6.5.12) first. By means of Kakutani–Rochlin's lemma (7.2.2), for any $\varepsilon > 0$, any $N \ge 0$, there exists a measurable set $A$ such that $A, TA, \ldots, T^{N-1}A$ are pairwise disjoint and $1 - \varepsilon \le N\mu(A) \le 1$. We set $f = 1_A$. Let $n, m$ be such that $N_n \le N_m \le N$. Then,

$$\|S_n^T(f) - S_m^T(f)\|_{2,\mu} = \Big\| \sum_{k=1}^{N_n} (a_{n,k} - a_{m,k})\, f \circ T^k - \sum_{k=N_n+1}^{N_m} a_{m,k}\, f \circ T^k \Big\|_{2,\mu} = \Big( \sum_{k=1}^{N_n} (a_{n,k} - a_{m,k})^2 + \sum_{k=N_n+1}^{N_m} a_{m,k}^2 \Big)^{1/2} \sqrt{\mu(A)} = \|a_n - a_m\|_2 \sqrt{\mu(A)}.$$

By the first entropy criterion, 2 E sup Z(an ) ≤ C μ(A). n:Nn λ < ∞.

f 2,μ ≤1 λ≥0

n≥1

And it follows from Theorem 5.4.3 that

$$C := \sup_{\lambda\ge0}\ \sup_{\|f\|_p\le1}\ \sup_{\tau\in C} \lambda^p \mu\Big\{ \sup_{n\ge1} |S_n^\tau f| > \lambda \Big\} < \infty.$$

Thus

$$\sup_{\|f\|_p\le1}\ \sup_{\tau\in C} \Big\| \sup_{n\ge1} |S_n^\tau f| \Big\|_1 < \infty.$$

The claimed inequality now just follows from the same argument used to prove Theorem 6.1.1 and inequalities (6.1.11).

6.6 A remarkable GB set property

One of the easiest consequences of the first entropy criterion is the following: let $(X, \mathcal{A}, \mu, \tau)$ be an ergodic measurable dynamical system. Consider for any $f \in L^1(\mu)$ and any positive integer $n$ the usual ergodic averages

$$A_n^\tau(f) = \frac{1}{n} \sum_{k=0}^{n-1} f \circ \tau^k,$$

and for any $f \in L^2(\mu)$ the subset of $L^2(\mu)$, $C_f = \{A_n^\tau(f), n \ge 1\}$. Then these sets are always GB sets, and in fact even GC sets (see p. 510). In particular

$$\sup_{\delta>0} \delta \sqrt{\log N_f(\delta)} \le C \|f\|_2.$$

Now let $A$ be a nonempty subset of $L^2(\mu)$ and form $C(A) = \{A_n^\tau(f), n \ge 1, f \in A\}$. Assume that $A$ is a GB set; can we say that $C(A)$ is again a GB set? More precisely:

$$A \text{ is a GB set} \iff C(A) \text{ is a GB set?}$$
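The ergodic averages $A_n^\tau(f)$ that generate the sets $C_f$ and $C(A)$ are simple to compute in a concrete system. The sketch below assumes, purely for illustration, the irrational rotation $\tau(x) = x + \alpha \bmod 1$ on $[0,1)$ with $\alpha = \sqrt{2} - 1$, which is ergodic for Lebesgue measure. The function name and the test function are hypothetical choices.

```python
import math


def ergodic_average(f, x, alpha, n):
    """A_n f(x) = (1/n) * sum_{k=0}^{n-1} f(tau^k x) for tau(x) = x + alpha mod 1."""
    return sum(f((x + k * alpha) % 1.0) for k in range(n)) / n


def cosine_wave(x):
    """Test function with zero mean on [0, 1)."""
    return math.cos(2.0 * math.pi * x)


if __name__ == "__main__":
    alpha = math.sqrt(2.0) - 1.0
    for n in (10, 100, 1000, 10000):
        print(n, ergodic_average(cosine_wave, 0.3, alpha, n))
```

For this trigonometric test function the averages tend to the space mean 0 at rate $O(1/n)$, because the sum is a geometric sum of bounded modulus.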

This question was solved in [Weber: 1994] in a much more general setting than the simple one of usual ergodic averages; the answer is the main result of this section. Apart from the fact that we will work with positive operators, it can be viewed as a logical extension of the first entropy criterion, since it is stated under the same assumptions and obviously contains it.

6.6.1 Theorem. Let $2 \le p < \infty$. Let $\{S_n, n \ge 1\}$ be a sequence of positive continuous operators from $L^p(\mu)$ to $L^p(\mu)$, with $S_1 = \mathrm{Identity}$. Assume that there exists a sequence $\{T_j, j \ge 1\}$ of positive isometries of $L^2(\mu)$ with $T_j(1) = 1$ and such that:

(a) $\forall f \in L^\infty(\mu)$, $\displaystyle \lim_{J \to \infty} \Big\| \frac{1}{J} \sum_{j \le J} T_j f - \int f \, d\mu \Big\|_1 = 0$,

(b) $S_n T_j = T_j S_n$.

Assume that property $(B_p)$ is realized. Let $A$ be any nonempty subset of $L^p(\mu)$ and set $C(A) = \{S_n(f),\ n \ge 1,\ f \in A\}$. Then the following equivalence holds: $A$ is a GB set $\iff$ $C(A)$ is a GB set. Further, there exists a constant $C$ such that, $Z$ being the canonical Gaussian process on $L^2(\mu)$, for any subset $A$ of $L^p(\mu)$,
\[
E \sup_{h \in C(A)} Z(h) \le C \Big( \inf_{h \in A} \|h\|_{2,\mu} + E \sup_{h \in A} Z(h) \Big). \tag{6.6.1}
\]

Remarks. 1. Before giving the proof of this result, some comments are in order. Since $S_1$ is the identity operator on $L^p(\mu)$, we have $A \subset C(A)$, so $C(A)$ is a GB set only if $A$ is.

2. Let $\tau$ be some ergodic endomorphism of $(X, \mathcal{A}, \mu)$. Let $A$ be a GB set. By applying the above theorem with the choices $p = 2$, $S_n = A_n^\tau$, $T_j = T^j$, where $T$ is defined by $Tf = f \circ \tau$, and using Birkhoff's theorem, we deduce that $C(A)$ is a GB set of $L^2(\mu)$, which answers in the affirmative the question raised at the beginning of the section.

3. Put, for any positive integer $n$,
\[
C_n(A) = \underbrace{C(C(\cdots C(A) \cdots))}_{n \text{ times}} .
\]
By iterating Theorem 6.6.1 we find that the sets $C_n(A)$ are GB sets. These sets being increasing, let $C^*(A) = \lim_{n \to \infty} C_n(A)$ be their limit. Is $C^*(A)$ again a GB set?

Proof of Theorem 6.6.1. We shall use again the Gaussian elements defined in (6.2.8). We associate to any $f \in L^p(\mu)$ the Gaussian sequence
\[
F_{J,f} = \frac{1}{\sqrt{J}} \sum_{j \le J} g_j T_j(f), \qquad J = 1, 2, \ldots,
\]


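where $g_1, g_2, \ldots$ is a sequence of i.i.d. $N(0,1)$ random variables defined on a joint probability space $(\Omega, \mathcal{B}, P)$ of $(X, \mathcal{A}, \mu)$.

Step 1. By means of the Banach principle, there exists a constant $0 < K < \infty$ such that for any $f \in L^p(\mu)$,
\[
\mu\Big\{ \sup_{n \ge 1} |S_n(f)| \ge K \|f\|_{p,\mu} \Big\} \le \frac14 .
\]
Thus for any finite subset $A_0$ of $L^p(\mu)$ and any positive integer $J$, we have, in view of the positivity assumption on the operators $S_n$,
\[
\mu\Big\{ \sup_{f \in A_0} \sup_{n \ge 1} |S_n(F_{J,f})| \ge K \Big\| \sup_{f \in A_0} |F_{J,f}| \Big\|_{p,\mu} \Big\}
\le \mu\Big\{ \sup_{n \ge 1} S_n\Big( \sup_{f \in A_0} |F_{J,f}| \Big) \ge K \Big\| \sup_{f \in A_0} |F_{J,f}| \Big\|_{p,\mu} \Big\} \le \frac14 .
\]
By integrating this inequality with respect to $P$, then applying Fubini's theorem, we obtain
\[
\int_X P\Big\{ \sup_{f \in A_0} \sup_{n \ge 1} |S_n(F_{J,f})| \ge K \Big\| \sup_{f \in A_0} |F_{J,f}| \Big\|_{p,\mu} \Big\} \, d\mu \le \frac14 .
\]
Let $D \subset X$ be defined by
\[
D = \Big\{ P\Big\{ \sup_{f \in A_0} \sup_{n \ge 1} |S_n(F_{J,f})| \ge K \Big\| \sup_{f \in A_0} |F_{J,f}| \Big\|_{p,\mu} \Big\} \ge \frac12 \Big\}.
\]
Then
\[
\frac14 \ge \int_D P\Big\{ \sup_{f \in A_0} \sup_{n \ge 1} |S_n(F_{J,f})| \ge K \Big\| \sup_{f \in A_0} |F_{J,f}| \Big\|_{p,\mu} \Big\} \, d\mu
\ge \frac12\, \mu\Big\{ P\Big\{ \sup_{f \in A_0} \sup_{n \ge 1} |S_n(F_{J,f})| \ge K \Big\| \sup_{f \in A_0} |F_{J,f}| \Big\|_{p,\mu} \Big\} \ge \frac12 \Big\}.
\]
Thus
\[
\mu\Big\{ P\Big\{ \sup_{f \in A_0} \sup_{n \ge 1} |S_n(F_{J,f})| \ge K \Big\| \sup_{f \in A_0} |F_{J,f}| \Big\|_{p,\mu} \Big\} \ge \frac12 \Big\} \le \frac12 ,
\]
or else
\[
\frac12 \le \mu\Big\{ \frac12 \le P\Big\{ \sup_{f \in A_0} \sup_{n \ge 1} |S_n(F_{J,f})| \le K \Big\| \sup_{f \in A_0} |F_{J,f}| \Big\|_{p,\mu} \Big\} \Big\}. \tag{6.6.2}
\]
Put
\[
E = \Big\{ \sup_{f \in A_0} \sup_{n \ge 1} |S_n(F_{J,f})| \le K \Big\| \sup_{f \in A_0} |F_{J,f}| \Big\|_{p,\mu} \Big\},
\qquad
F = \Big\{ \Big\| \sup_{f \in A_0} |F_{J,f}| \Big\|_{p,\mu} \le 4 E \Big\| \sup_{f \in A_0} |F_{J,f}| \Big\|_{p,\mu} \Big\}.
\]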


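As
\[
P\Big\{ \Big\| \sup_{f \in A_0} |F_{J,f}| \Big\|_{p,\mu} \ge 4 E \Big\| \sup_{f \in A_0} |F_{J,f}| \Big\|_{p,\mu} \Big\} \le \frac14 ,
\]
we have
\[
P(E \cap F) \le P\Big\{ \sup_{f \in A_0} \sup_{n \ge 1} |S_n(F_{J,f})| \le 4K\, E \Big\| \sup_{f \in A_0} |F_{J,f}| \Big\|_{p,\mu} \Big\},
\]
and also
\[
P(E \cap F) \ge P\Big\{ \sup_{f \in A_0} \sup_{n \ge 1} |S_n(F_{J,f})| \le K \Big\| \sup_{f \in A_0} |F_{J,f}| \Big\|_{p,\mu} \Big\} - \frac14 .
\]
By means of (6.6.2) we have, for any finite subset $A_0$ of $L^p(\mu)$ and any positive integer $J$,
\[
\frac12 \le \mu\Big\{ \frac14 \le P\Big\{ \sup_{f \in A_0} \sup_{n \ge 1} |S_n(F_{J,f})| \le 4K\, E \Big\| \sup_{f \in A_0} |F_{J,f}| \Big\|_{p,\mu} \Big\} \Big\}. \tag{6.6.3}
\]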
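The passage from (6.6.2) to (6.6.3) can be spelled out as follows. This is only a sketch of the intermediate step; the notation $E_x$, which stresses that the event $E$ depends on the point $x \in X$ while $F$ does not, is introduced here for the illustration and is not used in the text.

```latex
% Markov's inequality for the nonnegative variable \| \sup_f |F_{J,f}| \|_{p,\mu}
% gives P(F^c) \le 1/4, hence for every x:
\[
P(E_x \cap F) \;\ge\; P(E_x) - P(F^c) \;\ge\; P(E_x) - \tfrac14 .
\]
% On E_x \cap F the two defining inequalities combine into
\[
\sup_{f \in A_0} \sup_{n \ge 1} |S_n(F_{J,f})(x)|
\;\le\; K \Big\| \sup_{f \in A_0} |F_{J,f}| \Big\|_{p,\mu}
\;\le\; 4K\, E \Big\| \sup_{f \in A_0} |F_{J,f}| \Big\|_{p,\mu} .
\]
% Consequently
\[
\Big\{ x : P(E_x) \ge \tfrac12 \Big\}
\;\subset\;
\Big\{ x : P\Big\{ \sup_{f \in A_0} \sup_{n \ge 1} |S_n(F_{J,f})(x)|
\le 4K\, E \Big\| \sup_{f \in A_0} |F_{J,f}| \Big\|_{p,\mu} \Big\} \ge \tfrac14 \Big\} .
\]
```

By (6.6.2) the set on the left has $\mu$-measure at least $1/2$, which is exactly (6.6.3).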

Consequently, by using estimate (10.2.2) for Gaussian semi-norms we get
\[
\frac12 \le \mu\Big\{ E \sup_{f \in A_0} \sup_{n \ge 1} |S_n(F_{J,f})| \le 64K\, E \Big\| \sup_{f \in A_0} |F_{J,f}| \Big\|_{p,\mu} \Big\}. \tag{6.6.4}
\]

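Step 2. Fix some $0 < \varepsilon < \frac12$ and a positive integer $N$. We shall now proceed by approximation. Let $A$ be a finite subset of $L^p(\mu)$ and assume, for any $f, g \in A$ and any two distinct integers $k, l$ in $[1, N]$, that
\[
\| S_k(f) - S_l(g) \|_{2,\mu} \ne 0, \qquad \| S_k(f) \|_{2,\mu} \ne 0. \tag{6.6.5}
\]
To any element $f$ from $A$, a simple function $f^\varepsilon$ can be associated such that
\[
\sup_{f \in A} \| f - f^\varepsilon \|_{2,\mu} \le \varepsilon. \tag{6.6.6}
\]
Set $A_0 = \{ f^\varepsilon,\ f \in A \}$. The continuity properties of the operators $S_n$ show that, for $\varepsilon$ sufficiently small, (6.6.5) implies for any $f, g \in A$ and any two distinct integers $k, l$ in $[1, N]$ that
\[
\| S_k(f^\varepsilon) - S_l(g^\varepsilon) \|_{2,\mu} \ne 0, \qquad \| S_k(f^\varepsilon) \|_{2,\mu} \ne 0. \tag{6.6.7}
\]
From the commutation assumption it also follows that
\[
\big\| S_k(F_{J,f^\varepsilon}) - S_l(F_{J,g^\varepsilon}) \big\|_{2,P}^2
= \big\| F_{J,\, S_k(f^\varepsilon) - S_l(g^\varepsilon)} \big\|_{2,P}^2
= \frac{1}{J} \sum_{j \le J} \Big( T_j \big[ S_k(f^\varepsilon) - S_l(g^\varepsilon) \big] \Big)^2 .
\]
But
\[
\Big( T_j \big[ S_k(f^\varepsilon) - S_l(g^\varepsilon) \big] \Big)^2 = T_j \Big( \big( S_k(f^\varepsilon) - S_l(g^\varepsilon) \big)^2 \Big),
\]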

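$\mu$-almost surely. Thereby
\[
\big\| S_k(F_{J,f^\varepsilon}) - S_l(F_{J,g^\varepsilon}) \big\|_{2,P}^2
= \big\| F_{J,\, S_k(f^\varepsilon) - S_l(g^\varepsilon)} \big\|_{2,P}^2
= \frac{1}{J} \sum_{j \le J} T_j \Big( \big( S_k(f^\varepsilon) - S_l(g^\varepsilon) \big)^2 \Big). \tag{6.6.8}
\]
But the assumptions made on the sequence $\{T_j, j \ge 1\}$ show, for any $f, g \in A$ and any $1 \le k, l \le N$, that
\[
\lim_{J \to \infty} \Big\| \big\| S_k(F_{J,f^\varepsilon}) - S_l(F_{J,g^\varepsilon}) \big\|_{2,P}^2 - \big\| S_k(f^\varepsilon) - S_l(g^\varepsilon) \big\|_{2,\mu}^2 \Big\|_{1,\mu} = 0. \tag{6.6.9}
\]
Proceeding by extraction, one can define a partial index $J = J_q$, $q \ge 1$, such that for any $f, g \in A$, any $1 \le k, l \le N$ and any $q \ge 1$,
\[
\Big\| \big\| S_k(F_{J,f^\varepsilon}) - S_l(F_{J,g^\varepsilon}) \big\|_{2,P}^2 - \big\| S_k(f^\varepsilon) - S_l(g^\varepsilon) \big\|_{2,\mu}^2 \Big\|_{1,\mu} \le \frac{\varepsilon}{2^{2q} N^2 \#(A)^2} . \tag{6.6.10}
\]
Put, for any $q \ge 1$,
\[
A_q = \Big\{ \sup_{f, g \in A,\ 1 \le k, l \le N} \Big| \big\| S_k(F_{J_q,f^\varepsilon}) - S_l(F_{J_q,g^\varepsilon}) \big\|_{2,P}^2 - \big\| S_k(f^\varepsilon) - S_l(g^\varepsilon) \big\|_{2,\mu}^2 \Big| \ge 2^{-q} \Big\}; \tag{6.6.11}
\]
then we have
\[
\mu(A_q) \le \frac{\varepsilon}{2^q}, \qquad \forall q \ge 1.
\]
Put
\[
H = \bigcap_{q \ge 1} A_q^c . \tag{6.6.12}
\]
Then $\mu(H) \ge 1 - \sum_{q=1}^{\infty} \mu(A_q) \ge 1 - \varepsilon$, and on $H$, for any $f, g \in A$, any $1 \le k, l \le N$ and any $q \ge 1$,
\[
\Big| \big\| S_k(F_{J_q,f^\varepsilon}) - S_l(F_{J_q,g^\varepsilon}) \big\|_{2,P}^2 - \big\| S_k(f^\varepsilon) - S_l(g^\varepsilon) \big\|_{2,\mu}^2 \Big| \le 2^{-q} .
\]
Let
\[
\theta := \inf_{\substack{1 \le k \ne l \le N \\ f, g \in A}} \big\| S_k(f^\varepsilon) - S_l(g^\varepsilon) \big\|_{2,\mu} .
\]
By (6.6.7) we have $\theta > 0$. We can thus define
\[
q^* := \inf\Big\{ q \ge 1 : 2^{-q} \le \frac{\theta^2}{4} \Big\}
\quad \text{and} \quad
\mathcal{J}^* = \{ J_q,\ q \ge q^* \}. \tag{6.6.13}
\]
We note that $\mathcal{J}^*$ depends on $\varepsilon$, $N$ and $A$. On $H$, we have for any $J \in \mathcal{J}^*$ and any $f, g \in A$, $1 \le k, l \le N$,
\[
\Big| \big\| S_k(F_{J,f^\varepsilon}) - S_l(F_{J,g^\varepsilon}) \big\|_{2,P}^2 - \big\| S_k(f^\varepsilon) - S_l(g^\varepsilon) \big\|_{2,\mu}^2 \Big| \le \frac14 \big\| S_k(f^\varepsilon) - S_l(g^\varepsilon) \big\|_{2,\mu}^2 ,
\]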


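hence
\[
\frac12 \big\| S_k(f^\varepsilon) - S_l(g^\varepsilon) \big\|_{2,\mu} \le \big\| S_k(F_{J,f^\varepsilon}) - S_l(F_{J,g^\varepsilon}) \big\|_{2,P} \le 2 \big\| S_k(f^\varepsilon) - S_l(g^\varepsilon) \big\|_{2,\mu} . \tag{6.6.14}
\]
With (6.6.14), we can apply Slepian's inequality (6.2.6) on the measurable set $H$. We obtain on $H$, for any $J \in \mathcal{J}^*$,
\[
E \sup_{h \in \{ S_n(f^\varepsilon),\ 1 \le n \le N,\ f \in A \}} Z(h) \le 2 E \sup_{f \in A_0} \sup_{1 \le n \le N} S_n(F_{J,f}). \tag{6.6.15}
\]
Combining (6.6.4) with (6.6.15) finally gives: for any $J \in \mathcal{J}^*$,
\[
E \sup_{h \in \{ S_n(f),\ 1 \le n \le N,\ f \in A_0 \}} Z(h) \le 128K\, E \Big\| \sup_{h \in A_0} |F_{J,h}| \Big\|_{p,\mu} . \tag{6.6.16}
\]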

Step 3. We estimate $E \big\| \sup_{h \in A_0} |F_{J,h}| \big\|_{p,\mu}$ for any $J \in \mathcal{J}$, under the additional assumptions (6.6.5). We will indicate at the end of the proof how to proceed without them. By means of Jensen's inequality,
\[
E \Big\| \sup_{h \in A_0} |F_{J,h}| \Big\|_{p,\mu} \le \Big( \int_X E \sup_{h \in A_0} |F_{J,h}|^p \, d\mu \Big)^{1/p} .
\]
The integrability properties of Gaussian laws imply
\[
E \sup_{h \in A_0} |F_{J,h}|^p \le C_p^p \Big( E \sup_{h \in A_0} |F_{J,h}| \Big)^p ,
\]
where $0 < C_p < \infty$ is a constant depending on $p$ only. Thus
\[
E \Big\| \sup_{h \in A_0} |F_{J,h}| \Big\|_{p,\mu} \le C_p \Big( \int_X \Big( E \sup_{h \in A_0} |F_{J,h}| \Big)^p d\mu \Big)^{1/p} . \tag{6.6.17}
\]
We shall split the integral on the right-hand side of (6.6.17) into two parts, integrating first over $H$, then over $H^c$. Examine the contribution produced by the first integration. Fix some $h_0$ in $A$. The triangle inequality and the symmetry properties of Gaussian laws imply
\[
E \sup_{h \in A_0} |F_{J,h}| \le E |F_{J,h_0^\varepsilon}| + E \sup_{h, g \in A_0} |F_{J,g-h}| = E |F_{J,h_0^\varepsilon}| + 2 E \sup_{h \in A_0} F_{J,h} .
\]
Integrating this inequality over $H$ with respect to $\mu$, then applying the Slepian comparison lemma, gives
\[
\int_H \Big( E \sup_{h \in A_0} |F_{J,h}| \Big)^p d\mu
\le \int_H \Big( E |F_{J,h_0^\varepsilon}| + 2 E \sup_{h \in A_0} F_{J,h} \Big)^p d\mu
\le \int_H \Big( E |F_{J,h_0^\varepsilon}| + 4 E \sup_{h \in A_0} Z(h) \Big)^p d\mu .
\]


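Hence,
\[
\Big( \int_H \Big( E \sup_{h \in A_0} |F_{J,h}| \Big)^p d\mu \Big)^{1/p} \le \big\| 1_H\, E |F_{J,h_0^\varepsilon}| \big\|_{p,\mu} + 4 E \sup_{h \in A_0} Z(h). \tag{6.6.18}
\]
But
\[
\big\| 1_H\, E |F_{J,h_0^\varepsilon}| \big\|_{p,\mu}^p = \int_H \Big( \frac{2}{\pi J} \sum_{j \le J} \big( T_j(h_0^\varepsilon) \big)^2 \Big)^{p/2} d\mu ,
\]
and
\[
\Big( \frac{1}{J} \sum_{j \le J} \big( T_j(h_0^\varepsilon) \big)^2 \Big)^{p/2} \to \Big( \int_X (h_0^\varepsilon)^2 \, d\mu \Big)^{p/2} ,
\]
as $J$ tends to infinity along $\mathcal{J}$, uniformly in $x \in H$. This shows that
\[
\limsup_{J \to \infty,\ J \in \mathcal{J}} \Big( \int_H \Big( E \sup_{h \in A_0} |F_{J,h}| \Big)^p d\mu \Big)^{1/p} \le (2/\pi)^{1/2} \big( \| h_0 \|_{2,\mu} + \varepsilon \big) + 4 E \sup_{h \in A_0} Z(h). \tag{6.6.19}
\]
Now consider the integration over $H^c$. By means of Jensen's inequality,
\[
\Big( \int_{H^c} \Big( E \sup_{h \in A_0} |F_{J,h}| \Big)^p d\mu \Big)^{1/p} \le B \sqrt{\log [1 + \#(A_0)]}\, \Big( \int_{H^c} \sup_{h \in A_0} \Big( \frac{1}{J} \sum_{j \le J} \big( T_j(h) \big)^2 \Big)^{p/2} d\mu \Big)^{1/p} .
\]
But
\[
\sup_{h \in A_0} \frac{1}{J} \sum_{j \le J} \big( T_j(h) \big)^2 \to \sup_{h \in A_0} \int_X h^2 \, d\mu ,
\]
$\mu$-almost surely as $J$ tends to infinity along $\mathcal{J}$. Since the $T_j$ are positive operators and $T_j 1 = 1$, we get
\[
\sup_{h \in A_0} \frac{1}{J} \sum_{j \le J} \big( T_j(h) \big)^2 \le \sup_{h \in A_0} \| h \|_{\infty,\mu}^2 .
\]
By applying the dominated convergence theorem, we obtain
\[
\limsup_{J \to \infty,\ J \in \mathcal{J}} \Big( \int_{H^c} \Big( E \sup_{h \in A_0} |F_{J,h}| \Big)^p d\mu \Big)^{1/p}
\le B [\mu(H^c)]^{1/p} \sqrt{\log [1 + \#(A_0)]}\, \sup_{h \in A_0} \| h \|_{2,\mu}
\le B [2\varepsilon]^{1/p} \sqrt{\log [1 + \#(A)]}\, \Big( \varepsilon + \sup_{h \in A} \| h \|_{2,\mu} \Big). \tag{6.6.20}
\]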


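By combining now estimates (6.6.16), (6.6.19) and (6.6.20), and using subadditivity of the function $\varphi(x) = x^{1/p}$, $x > 0$, we get
\[
E \sup_{h \in \{ S_n(f),\ 1 \le n \le N,\ f \in A_0 \}} Z(h) \le 32K \Big( \| h_0 \|_{2,\mu} + \varepsilon + 4 E \sup_{h \in A_0} Z(h) + B \sqrt{\log [1 + \#(A)]}\, \Big( \varepsilon + \sup_{h \in A} \| h \|_{2,\mu} \Big) (2\varepsilon)^{1/p} \Big). \tag{6.6.21}
\]
The finite-dimensional margins of Gaussian vectors being $L^2$-continuous, it follows that
\[
E \sup_{h \in \{ S_n(f),\ 1 \le n \le N,\ f \in A \}} Z(h) \le C(\varepsilon) + E \sup_{h \in \{ S_n(f),\ 1 \le n \le N,\ f \in A_0 \}} Z(h),
\qquad
E \sup_{h \in A_0} Z(h) \le C(\varepsilon) + E \sup_{h \in A} Z(h),
\]
where $0 < C(\varepsilon) < \infty$ and $\lim_{\varepsilon \to 0} C(\varepsilon) = 0$. Hence
\[
E \sup_{h \in \{ S_n(f),\ 1 \le n \le N,\ f \in A \}} Z(h) \le C(\varepsilon) + 128K \Big( \| h_0 \|_{2,\mu} + \varepsilon + 4 C(\varepsilon) + 4 E \sup_{h \in A} Z(h) + B \sqrt{\log [1 + \#(A)]}\, \Big( \varepsilon + \sup_{h \in A} \| h \|_{2,\mu} \Big) [2\varepsilon]^{1/p} \Big). \tag{6.6.22}
\]
But $\varepsilon$ is arbitrary, as well as $h_0$ in $A$. We therefore conclude
\[
E \sup_{h \in \{ S_n(f),\ 1 \le n \le N,\ f \in A \}} Z(h) \le 128K \Big( \inf_{h \in A} \| h \|_{2,\mu} + 4 E \sup_{h \in A} Z(h) \Big). \tag{6.6.23}
\]
It is now clear that (6.6.23) remains true when the additional assumptions (6.6.5) are no longer fulfilled. Indeed, it suffices to establish (6.6.23) for $A' = \{ h \in A : h \ne 0 \}$ and $B' = \{ h \in \{ S_n(f),\ 1 \le n \le N,\ f \in A \} : S_n(h) \ne 0 \}$. The proof is now achieved by letting $A$ increase to some countable $L^2(\mu)$-dense subset of $A$.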

Chapter 7

The central limit theorem for dynamical systems

In any aperiodic dynamical system, there exists a square integrable centered function satisfying the central limit theorem (CLT). This is a famous result due to Burton and Denker, and we provide a complete and detailed proof, involving Kakutani–Rochlin’s lemma. Some additional CLT results for orbits of aperiodic dynamical systems are further established. The CLT for various means generated under the action of irrational rotations is proved next. In the case of Gaussian lacunary Fourier series, we study the convergence in variation of the related density distributions to the Gaussian density.

7.1 Introduction and preliminaries

We begin with some elementary and introductory considerations. Let $(X, \mathcal{A}, \mu, \tau)$ be a measurable dynamical system. Recall Theorem 4.1.1.

Birkhoff's pointwise ergodic theorem. For any $f \in L^1(\mu)$, the limit
\[
\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f(\tau^k x) = \bar{f}(x)
\]
exists $\mu$-almost everywhere and in $L^1(\mu)$, and we have $\bar{f} = E\{ f \mid \mathcal{J} \}$, where $\mathcal{J} = \sigma\{ A \in \mathcal{A} : \tau^{-1} A = A \}$.

A parallel result in probability theory is the well-known strong law of large numbers.

Strong law of large numbers (SLLN). Let $X, X_1, X_2, \ldots$ be a sequence of independent, integrable, identically distributed random variables with basic probability space $(\Omega, \mathcal{B}, P)$, and set $S_n = X_1 + \cdots + X_n$. Then
\[
P\Big\{ \lim_{n \to \infty} \frac{S_n}{n} = E X \Big\} = 1.
\]
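As a numerical illustration of the pointwise ergodic theorem, the following minimal sketch computes Birkhoff averages for one concrete ergodic system. The rotation $x \mapsto x + \alpha \pmod 1$ with the golden-ratio angle, the test function $1_{[0,1/3)}$, the starting point $0.1$ and all function names are assumptions made for this example only; they do not come from the text. Since this rotation is ergodic for Lebesgue measure, the averages should approach $\int f \, d\lambda = 1/3$.

```python
import math


def birkhoff_average(f, x0, alpha, n):
    """Return (1/n) * sum_{k<n} f(tau^k x0) for the rotation tau(x) = x + alpha mod 1."""
    total = 0.0
    x = x0
    for _ in range(n):
        total += f(x)
        x = (x + alpha) % 1.0
    return total / n


def indicator_first_third(x):
    """Indicator function of the interval [0, 1/3)."""
    return 1.0 if x < 1.0 / 3.0 else 0.0


# Assumed example angle: the golden-ratio rotation, which is irrational.
ALPHA = (math.sqrt(5.0) - 1.0) / 2.0

# Averages along the orbit of x0 = 0.1 for increasing n.
averages = {
    n: birkhoff_average(indicator_first_third, 0.1, ALPHA, n)
    for n in (10, 1000, 100000)
}
```

The dictionary `averages` can be printed to watch the values settle near $1/3$ as $n$ grows.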

It is worth noticing that the SLLN is just a very particular case of the pointwise ergodic theorem. And even in the case of sequences of independent, identically distributed random variables, the pointwise ergodic theorem expresses a much stronger property. As is well known, the SLLN is completed by two fundamental results: the law of the iterated logarithm and the central limit theorem. This last result, which concerns the statistic of this convergence, states as follows:


Central limit theorem (CLT). Let $X, X_1, X_2, \ldots$ be a sequence of independent, identically distributed random variables with basic probability space $(\Omega, \mathcal{B}, P)$. Assume that $E X = 0$, $E X^2 = 1$. Then,
\[
\lim_{n \to \infty} \frac{S_n}{\sqrt{n}} \stackrel{\mathcal{D}}{=} N(0, 1).
\]
In the late 1980s, a companion to this result was found independently by Brosamler, Fisher and Schatte. The following formulation of this result is due to Lacey and Philipp [1990].

Almost sure central limit theorem (ASCLT). Under the same assumptions,
\[
\lim_{N \to \infty} \frac{1}{\log N} \sum_{j=1}^{N} \frac{1}{j}\, \delta_{\{ S_j / \sqrt{j} \}} \stackrel{\mathcal{D}}{=} N(0, 1).
\]
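The logarithmic average in the ASCLT can be evaluated along a single simulated path. The sketch below is only an illustration under stated assumptions: the steps are taken to be fair $\pm 1$ signs (so that $E X = 0$, $E X^2 = 1$), the level $x = 0$, the path length and the seed are arbitrary choices, and the function names are hypothetical. Convergence takes place on a logarithmic scale, so the computed value should be expected to approach $\Phi(x)$ only slowly.

```python
import math
import random


def normal_cdf(x):
    """Distribution function of the N(0, 1) law."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def log_average(steps, x):
    """(1/log N) * sum_{j=1}^{N} (1/j) * 1{S_j / sqrt(j) <= x} along one path."""
    n_steps = len(steps)
    partial_sum = 0.0
    weighted = 0.0
    for j, step in enumerate(steps, start=1):
        partial_sum += step
        if partial_sum / math.sqrt(j) <= x:
            weighted += 1.0 / j
    return weighted / math.log(n_steps)


rng = random.Random(0)  # fixed seed so that the path is reproducible
N = 100_000
steps = [rng.choice((-1.0, 1.0)) for _ in range(N)]

estimate = log_average(steps, 0.0)  # logarithmic average along this path
target = normal_cdf(0.0)            # limiting value predicted by the ASCLT
```

Comparing `estimate` with `target` for several seeds and larger `N` gives a feeling for how slowly logarithmic averages stabilise.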

It is natural to ask whether or not similar results hold for dynamical systems, and under which conditions. Such questions were, and still are, intensively investigated. There are, however, very few fundamental results, and many specific ones. The object of this chapter is to present what is probably the most fundamental result in this area: the theorem of Burton and Denker, recently completed by a fine result of Volný. Next, we focus on the central limit theorem and almost sure central limit theorem for irrational rotations, which are at the heart of the study of dynamical systems. Finally, we will study, in the case of Gaussian lacunary Fourier series, a very sharp form of the CLT: the convergence in variation, namely the convergence in the spaces $L^1(\mathbb{R})$ and $L^\infty(\mathbb{R})$ of the related density distributions to the Gaussian density. Before really entering into the matter, we shall give some more comments on the ASCLT and its connection with the CLT. By Theorem 1.6 in [Atlagh–Weber: 2000] both properties are equivalent, and thereby equivalent to the moment condition $E X = 0$ and $E X^2 = 1$. The formulation of the ASCLT we gave is, however, only a weak form of a much stronger phenomenon: not only logarithmic averages converge, but also some series. Let $s = \{ s_k, k \ge 0 \}$ be an arbitrary sequence of reals. Put for any positive integer $n$, $Y_n(s) = Y_n =$

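2n ≤k 3/2. In view of Kronecker's lemma, (7.1.2) implies, with the choice $s_k = x_k \sqrt{k}$ and assuming that $E X = 0$, $E X^2 = 1$,
\[
\lim_{n \to \infty} \frac{1}{\log n} \sum_{k=1}^{n} \frac{1}{k} \Big( 1_{\{ S_k / \sqrt{k} < x_k \}} - P\Big\{ \frac{S_k}{\sqrt{k}} < x_k \Big\} \Big) \stackrel{\text{a.s.}}{=} 0. \tag{7.1.3}
\]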

By using the CLT and letting $x_k \equiv x$ in (7.1.3), we recover the initial formulation of the ASCLT. In fact, property (7.1.2) remains true even in the absence of a CLT. Indeed, let $0 < p < \infty$ and consider the class $\mathcal{F}_p$ of distribution functions $F$ verifying
\[
(\mathcal{F}_p) \qquad F(-x) \vee (1 - F(x)) = O\big( x^{-p} \big), \qquad x \to +\infty.
\]
In the case $p \ge 1$, we moreover assume that $F$ is a centered distribution function: $\int_{-\infty}^{\infty} x \, F(dx) = 0$. Then the following holds.

A general formulation. Let $X, X_1, X_2, \ldots$ be a sequence of independent, identically distributed random variables with basic probability space $(\Omega, \mathcal{B}, P)$. Let $F$ be the distribution function of $X$. Assume that $F \in \mathcal{F}_2$. Then, property (7.1.2) holds true. Further, for any sequence $\{ x_k, k \ge 1 \}$ of reals,
\[
\lim_{n \to \infty} \frac{1}{\log n} \sum_{k=1}^{n} \frac{1}{k} P\Big\{ \frac{S_k}{\sqrt{k}} \le x_k \Big\} = c
\ \Longrightarrow\
\lim_{n \to \infty} \frac{1}{\log n} \sum_{k=1}^{n} \frac{1}{k} 1_{\{ S_k / \sqrt{k} \le x_k \}} \stackrel{\text{a.s.}}{=} c.
\]

The above formulation ([Giuliano–Weber: 2005], Theorem 1.1) of the ASCLT thus appears as the precise form that takes the quasi-orthogonal property of the geometric blocks (7.1.1) in presence of the CLT. Extensions to the case F ∈ Fp , 0 < p < 2 are given in the same paper, modulo some additional assumptions on F .

7.2 A theorem of Burton and Denker

A main purpose of this section is to exhibit a real-valued function $f$ defined on the phase space $X$ of a given aperiodic dynamical system $(X, \mathcal{A}, \mu, T)$ such that the natural long-term ratio satisfies the central limit theorem:
\[
\mu\Big\{ x \in X : \frac{\sum_{j=0}^{n-1} T^j f(x)}{\big\| \sum_{j=0}^{n-1} T^j f \big\|_2} \le u \Big\} \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} \exp\Big( -\frac{v^2}{2} \Big) dv \tag{7.2.1}
\]
as $n \to \infty$, for all $u \in \mathbb{R}$.
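To make the normalised ratio in (7.2.1) concrete, the following sketch estimates its distribution for one explicit system. This is not the Burton–Denker construction: the doubling map $T x = 2x \bmod 1$ on $[0,1)$ with Lebesgue measure, the function $f = 1_{[1/2,1)} - 1/2$, and the sampling of a Lebesgue-random point through independent fair binary digits are all assumptions chosen because everything is explicit in that case (here $\| S_n f \|_2 = \sqrt{n}/2$). The function names are hypothetical.

```python
import math
import random


def normal_cdf(u):
    """Distribution function of the N(0, 1) law."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def empirical_clt_fraction(n, u, samples, seed=0):
    """Fraction of sampled points x with S_n f(x) / ||S_n f||_2 <= u.

    Assumed model: T x = 2x mod 1 and f = 1_[1/2, 1) - 1/2, so that
    f(T^j x) equals the corresponding binary digit of x minus 1/2.
    A Lebesgue-random x is represented by n independent fair digits.
    """
    rng = random.Random(seed)
    norm = math.sqrt(n) / 2.0  # ||S_n f||_2: independent digits, variance 1/4 each
    hits = 0
    for _ in range(samples):
        ones = bin(rng.getrandbits(n)).count("1")  # digits equal to 1 among the first n
        s_n = ones - n / 2.0                       # S_n f(x)
        if s_n / norm <= u:
            hits += 1
    return hits / samples


fraction = empirical_clt_fraction(n=400, u=1.0, samples=2000)
gaussian_value = normal_cdf(1.0)
```

The two numbers `fraction` and `gaussian_value` should be close; the remaining gap reflects both sampling error and the lattice structure of $S_n f$ for finite $n$.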


Let (X, A, μ, T ) be a measurable dynamical system. We assume here and throughout the section that (X, A, μ) is a Lebesgue space. Let us clarify that a complete finite measure space is called a nonatomic Lebesgue space, if it is isomorphic (mod 0) to the ordinary Lebesgue space ([0, γ ], L([0, γ ]), λ) for some γ > 0. In other words, there exist sets Z1 ⊂ X of measure 0 and Z2 ⊂ [0, γ ], and a measurable bijection ψ : X \ Z1 → [0, γ ] \ Z2 , such that ψ −1 is measurable and μ = λψ. A nonatomic Lebesgue space joined with finitely or countably many point masses of finite total mass is called a Lebesgue space. We will always assume that μ(X) = 1. For basic results concerning Lebesgue spaces we refer for instance to De la Rue [1993]. In addition, we recall that the given T is said to be aperiodic, if μ{x ∈ X : T n x = x} = 0 for all n ≥ 1. It is instructive to observe that in this case the measure space (X, A, μ) is nonatomic. This fact follows straightforwardly by the Poincaré recurrence theorem. It can be also easily verified by using the following well-known and in the sequel useful result, see Halmos [1956]: Kakutani–Rochlin’s lemma. If T is aperiodic, then for every ε > 0 and for every n ≥ 1 there exists F ∈ A such that the sets F, T −1 (F ) . . . T −(n−1) (F ) are mutually disjoint, and such that we have:

\[
\mu\big( F \cup T^{-1}(F) \cup \cdots \cup T^{-(n-1)}(F) \big) > 1 - \varepsilon. \tag{7.2.2}
\]
Any set $F \in \mathcal{A}$ satisfying the conclusions of (7.2.2) with the given and fixed $\varepsilon > 0$ and $n \ge 1$ will be called an $(\varepsilon, n)$-Kakutani–Rochlin set. The essential fact on such sets needed in the Burton–Denker construction is proved in Corollary 7.2.2 below, see also Remark 7.2.3. The next proposition establishes the main step in its proof. As a preliminary fact in this direction, recall a well-known result on Lebesgue spaces due to Rochlin ([Rochlin: 1962], p. 31):

The factor space of a Lebesgue space with respect to a measurable decomposition is a Lebesgue space. (7.2.3)

In particular, consider, as in Proposition 7.2.1 below, a Lebesgue space $(X, \mathcal{A}, \mu)$ and a $\sigma$-algebra $\mathcal{B}$ without atom, generated by a countable family $(B_n)_{n \ge 1}$ of elements from $\mathcal{A}$. Let $\zeta$ be the decomposition of $X$ generated by $\mathcal{B}$. That is, we introduce the equivalence relation on $X$ by putting $x' \sim x''$ if and only if $1_B(x') = 1_B(x'')$ for all $B \in \mathcal{B}$, and we put $\zeta = \{ [x],\ x \in X \}$ to denote the set of all equivalence classes. Since for any two points $x', x'' \in X$ we have $1_B(x') = 1_B(x'')$ for all $B \in \mathcal{B}$ if and only if $1_{B_n}(x') = 1_{B_n}(x'')$ for all $n \ge 1$, we see that $\zeta$ is measurable in the sense of Rochlin, see Rochlin [1962] (pp. 4–5, 26). That is, the decomposition $\zeta$ is generated by a countable family of measurable sets $(B_n)_{n \ge 1}$. Notice that $\zeta \subset \mathcal{B} \subset \mathcal{A}$. Moreover, since $\mathcal{B}$ is without atom, we have $\mu(C) = 0$ for all $C \in \zeta$. Hence we easily find that $(X, \mathcal{A}, \mu)$ is nonatomic. Let $(X_\zeta, \mathcal{A}_\zeta, \mu_\zeta)$ be the factor space of $(X, \mathcal{A}, \mu)$ with respect to $\zeta$. That is, we have $X_\zeta = \{ [x],\ x \in X \}$, $\mathcal{A}_\zeta = \big\{ \tilde{A} = \bigcup_{[x] \in X_\zeta} \{ [x] \} : \tilde{A} \in \mathcal{A} \big\}$ and $\mu_\zeta(\tilde{A}) = \mu(\tilde{A})$ for all $\tilde{A} \in \mathcal{A}_\zeta$. By (7.2.3) we know that $(X_\zeta, \mathcal{A}_\zeta, \mu_\zeta)$ is a Lebesgue space. Moreover, since $\mathcal{B}$ is without atom, we see that $(X_\zeta, \mathcal{A}_\zeta, \mu_\zeta)$ is nonatomic. In other words, the Lebesgue space $(X_\zeta, \mathcal{A}_\zeta, \mu_\zeta)$ is isomorphic (mod 0) to the ordinary Lebesgue space $([0, 1], \mathcal{L}([0, 1]), \lambda)$. This fact turns out to be of vital importance in Step 1 of the proof of Proposition 7.2.1 below.

In the sequel, we use the following notation: given a finite measure space $(X, \mathcal{A}, \mu)$ and $A \in \mathcal{A}$, the trace of $\mu$ on $A$ is a finite measure on $A$, denoted and defined by $\mathrm{tr}(\mu, A)(B) = \mu(A \cap B)$ for all $B \in \mathcal{A}$. If $\mathcal{B}$ is a $\sigma$-algebra on $X$ and $C$ is a subset of $X$, then the trace of $\mathcal{B}$ on $C$ is a $\sigma$-algebra on $C$, denoted and defined by $\mathrm{tr}(\mathcal{B}, C) = \{ B \cap C : B \in \mathcal{B} \}$. It is instructive to observe that if $(X, \mathcal{A}, \mu)$ is a nonatomic Lebesgue space, and $A$ belongs to $\mathcal{A}$ with $\mu(A) > 0$, then $\big( A, \mathrm{tr}(\mathcal{A}, A), \mu(A)^{-1} \mathrm{tr}(\mu, A) \big)$ forms a nonatomic Lebesgue space as well.

7.2.1 Proposition. Let $(X, \mathcal{A}, \mu)$ be a Lebesgue space, and let $\mathcal{B}$ be a $\sigma$-algebra without atom, generated by a countable family $(B_n)_{n \ge 1}$ of elements from $\mathcal{A}$. Then for any finite partition $\mathcal{P}$ of $X$, measurable with respect to $\mathcal{A}$, there exists a $\sigma$-algebra $\mathcal{C}$ without atom, independent of $\mathcal{P}$, and generated by a countable family of elements from $\mathcal{B}$.

Proof. The construction of $\mathcal{C}$ is divided into four steps as follows.

Step 1. Let $P$ be an arbitrary element from $\mathcal{A}$. We show that for any $\theta \in\, ]0, 1[$, there exists $A \in \mathcal{B}$ satisfying:
\[
\mu(A) = \theta, \tag{7.2.4}
\]
\[
\mu(A \cap P) = \theta \cdot \mu(P). \tag{7.2.5}
\]

Let $\zeta$ be the measurable decomposition generated by $\mathcal{B}$, and let $(X_\zeta, \mathcal{A}_\zeta, \mu_\zeta)$ be the factor space of $(X, \mathcal{A}, \mu)$ with respect to $\zeta$. Denote by $f$ the conditional $\mu$-measure of $P$ with respect to $\mathcal{B}$. Then $f$ can be regarded as a measurable map from $X_\zeta$ into $[0, 1]$. According to the remarks stated after (7.2.3) above, we know that $(X_\zeta, \mathcal{A}_\zeta, \mu_\zeta)$ is isomorphic (mod 0) to the ordinary Lebesgue space $([0, 1], \mathcal{L}([0, 1]), \lambda)$. Therefore we can restate Step 1 as follows.

Step 1'. Let $f$ be a measurable function from $[0, 1]$ into itself, and let $\theta$ be an element from $]0, 1[$. Then we show that there exists $A \in \mathcal{L} := \mathcal{L}([0, 1])$ satisfying
\[
\lambda(A) = \theta, \tag{7.2.4'}
\]
\[
\int_A f \, d\lambda = \theta \cdot \int_{[0,1]} f \, d\lambda. \tag{7.2.5'}
\]
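The statement of Step 1' can be checked numerically for step functions. The sketch below does not follow the connectedness argument of the text; it swaps in a simpler intermediate-value argument over the one-parameter family of arcs $A_t = [t, t + \theta) \bmod 1$, for which $t \mapsto \int_{A_t} f \, d\lambda$ is continuous, piecewise linear, and has mean $\theta \int f \, d\lambda$ over $t \in [0,1]$. The step function given by `cell_values`, the value of `theta` and all function names are assumptions made for the illustration.

```python
def window_integral(cell_values, start, theta):
    """Integral of the step function over the arc [start, start + theta) mod 1.

    cell_values[i] is the value of f on [i/n, (i+1)/n); 0 <= start <= 1, 0 < theta < 1.
    """
    n = len(cell_values)
    width = 1.0 / n
    total = 0.0
    for i, value in enumerate(cell_values):
        left = i * width
        right = left + width
        # The arc lies in [0, 2): test the cell and its copy shifted by 1 (wrap-around).
        for shift in (0.0, 1.0):
            lo = max(left + shift, start)
            hi = min(right + shift, start + theta)
            if hi > lo:
                total += value * (hi - lo)
    return total


def find_window(cell_values, theta, tol=1e-12):
    """Return t in [0, 1) such that the arc [t, t + theta) mod 1 satisfies (7.2.5')."""
    n = len(cell_values)
    target = theta * sum(cell_values) / n  # theta times the integral of f over [0, 1]

    def gap(t):
        return window_integral(cell_values, t, theta) - target

    # gap is linear between consecutive breakpoints, so a sign change pins down a zero.
    points = {0.0, 1.0}
    for i in range(n):
        points.add(i / n)
        points.add((i / n - theta) % 1.0)
    grid = sorted(points)
    for a, b in zip(grid, grid[1:]):
        ga, gb = gap(a), gap(b)
        if abs(ga) <= tol:
            return a
        if ga * gb < 0.0:
            return a + (b - a) * ga / (ga - gb)  # exact zero of the linear piece
    raise ValueError("no zero located; check the inputs")


cell_values = [0.1, 0.9, 0.4, 0.7, 0.2]  # assumed example of f on five equal cells
theta = 0.3
t_star = find_window(cell_values, theta)
```

The arc starting at `t_star` has Lebesgue measure `theta` by construction, and `window_integral(cell_values, t_star, theta)` reproduces $\theta \int_{[0,1]} f \, d\lambda$ up to rounding.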

Let $\mathcal{L}_\theta$ denote the family of all finite unions of intervals of total length $\theta$, and let $\bar{\mathcal{L}}_\theta$ denote its closure in the topology generated by the metric $d(A, B) = \lambda(A \,\triangle\, B)$ for $A, B \in \mathcal{L}$. It is easily seen that $\bar{\mathcal{L}}_\theta = \{ A \in \mathcal{L} : \lambda(A) = \theta \}$, and since obviously $\mathcal{L}_\theta$ is connected, $\bar{\mathcal{L}}_\theta$ is connected as well. Let us define a map $\psi$ from $\bar{\mathcal{L}}_\theta$ into $\mathbb{R}_+$ by
\[
\psi(A) = \frac{1}{\theta} \int_A f \, d\lambda
\]
for all $A \in \bar{\mathcal{L}}_\theta$, and let us put $M = \int_{[0,1]} f \, d\lambda$. Then we claim that there exists $A_+ \in \bar{\mathcal{L}}_\theta$ such that $\psi(A_+) \ge M$. Indeed, put $B = f^{-1}([M, 1])$, and first suppose that $\lambda(B) \ge \theta$. Then obviously we can take for the desired $A_+$ any measurable $A \subset B$ satisfying $\lambda(A) = \theta$. Next suppose $\lambda(B) < \theta$; then we can fix any measurable $C \subset B^c$ satisfying $\lambda(C) = \theta - \lambda(B)$. Put $A_+ = B \cup C$; then $A_+ \in \bar{\mathcal{L}}_\theta$, and we have:
\[
\frac{1}{\theta} \int_{A_+} f \, d\lambda = \frac{1}{\theta} \Big( \int_B f \, d\lambda + M - \int_{C^c} f \, d\lambda \Big) \ge \frac{1}{\theta} \big( M \cdot \lambda(B) + M - \lambda(C^c) \big) = (M - 1) \cdot \frac{1 + \lambda(B)}{\theta} + 1 \ge M.
\]
In a similar way we can find $A_- \in \bar{\mathcal{L}}_\theta$ such that $\psi(A_-) \le M$. Since $\psi$ is continuous, $\psi(\bar{\mathcal{L}}_\theta)$ is an interval, and the claim follows.

Step 2. Consider the case where $\mathcal{P} = \{ P, P^c \}$, and let $A \in \mathcal{B}$ with $\mu(A) > 0$ be independent of $\mathcal{P}$. Then we claim that for any $\theta \in\, ]0, \mu(A)[$ there exists $B \in \mathcal{B}$ with $B \subset A$, independent of $\mathcal{P}$, such that $\mu(B) = \theta$. Indeed, this fact follows by applying Step 1 to the Lebesgue space $\big( A, \mathrm{tr}(\mathcal{A}, A), \mu(A)^{-1} \mathrm{tr}(\mu, A) \big)$ with the nonatomic $\sigma$-algebra $\mathrm{tr}(\mathcal{B}, A)$.

Step 3. Consider the case where $\mathcal{P} = \{ P, P^c \}$. Then we claim that there exists a $\sigma$-algebra $\mathcal{C}$ without atom, independent of $\mathcal{P}$, and generated by a countable family of elements from $\mathcal{B}$. Indeed, by using Step 1 and Step 2, we can recursively construct an increasing sequence of finite partitions $\{ \mathcal{C}_n \}_{n \ge 1}$ of $X$ whose elements are from $\mathcal{B}$, such that

each $\mathcal{C}_n$ consists of $2^n$ atoms of measure $2^{-n}$, (7.2.6)

each atom in $\mathcal{C}_n$ is the union of two atoms from $\mathcal{C}_{n+1}$, (7.2.7)

each $\mathcal{C}_n$ is independent of $\mathcal{P}$. (7.2.8)

Let C be the smallest σ -algebra on X containing all the atoms of each partition Cn for n ≥ 1. Then by (7.2.6) and (7.2.7) we see that C is without atom, and by (7.2.8) we may easily verify that C is independent of P . Thus the claim follows. Step 4. Consider the general case where P = {P1 , . . . , Pn }. Then we claim, that we can find the σ -algebra C as it is stated in Proposition 7.2.1. For this, apply step 3 with the partition P1 = {P1 , P1c } and the σ -algebra B. In this way, we can find a σ -algebra C1 without atom, independent of P1 , and generated by a countable family of elements from B. Then Step 3 may be applied with the partition P2 = {P2 , P2c } and the σ -algebra C1 . In this way we can find a σ -algebra C2 without atom, generated by a countable family of elements from C1 , independent of P2 , and thus of P1 ∪ P2 as well. Continuing in this way, we shall at the end obtain a σ -algebra Cn without atom, generated by a countable family of elements from B, and independent of P . The claim then follows with C = Cn . These facts complete the proof. 7.2.2 Corollary. Let (X, A, μ) be a nonatomic Lebesgue space, let T be a measurepreserving transformation of X, let F ∈ A with μ(F ) > 0 be a given and fixed set, and let πl be a finite partition of T −l (F ) with elements from A for all 0 ≤ l < n with

some $n \ge 1$. Then for any $\gamma_1, \ldots, \gamma_p \ge 0$ with $\sum_{k=1}^{p} \gamma_k = 1$ and $p \ge 1$, there exists a partition $\alpha = \{ A_1, \ldots, A_p \}$ of $F$ with elements from $\mathcal{A}$ such that
\[
\mu\big( T^{-l}(A_k) \cap B \big) = \gamma_k \cdot \mu(B)
\]
for all $1 \le k \le p$, all $B \in \pi_l$, and all $0 \le l < n$.

Proof. Applying Proposition 7.2.1 to the nonatomic Lebesgue space

\[
\big( F, \mathrm{tr}(\mathcal{A}, F), \mu(F)^{-1} \mathrm{tr}(\mu, F) \big)
\]
with the partition $\pi_0$ and the $\sigma$-algebra $\mathrm{tr}(\mathcal{A}, F)$, we can find a countably generated $\sigma$-algebra $\mathcal{A}_0 \subset \mathrm{tr}(\mathcal{A}, F)$ without atom and independent of $\pi_0$ with respect to $\mu(F)^{-1} \mathrm{tr}(\mu, F)$. We proceed by induction. Suppose that the $\sigma$-algebra $\mathcal{A}_l$ is already constructed for some $0 \le l < n - 1$. Then $T^{-(l+1)}(\mathcal{A}_l)$ is a countably generated $\sigma$-algebra without atom on $T^{-(l+1)}(F)$, and we can apply Proposition 7.2.1 to the Lebesgue space
\[
\big( T^{-(l+1)}(F), \mathrm{tr}(\mathcal{A}, T^{-(l+1)}(F)), \mu(F)^{-1} \mathrm{tr}(\mu, T^{-(l+1)}(F)) \big)
\]
with the partition $\pi_{l+1}$ and the $\sigma$-algebra $T^{-(l+1)}(\mathcal{A}_l)$. In this way we get a countably generated $\sigma$-algebra $\mathcal{B}_{l+1} \subset T^{-(l+1)}(\mathcal{A}_l)$ without atom, independent of $\pi_{l+1}$ with respect to $\mu(F)^{-1} \mathrm{tr}(\mu, T^{-(l+1)}(F))$. Then we may define the countably generated $\sigma$-algebra $\mathcal{A}_{l+1}$ as follows: $\mathcal{A}_{l+1} = \{ A \in \mathcal{A}_l : T^{-(l+1)}(A) \in \mathcal{B}_{l+1} \}$. The $\sigma$-algebra $\mathcal{A}_{n-1} \subset \mathrm{tr}(\mathcal{A}, F)$ obtained at the end is clearly such that $T^{-l}(\mathcal{A}_{n-1})$ is independent of $\pi_l$ with respect to $\mu(F)^{-1} \mathrm{tr}(\mu, T^{-l}(F))$ for all $0 \le l < n$. Besides, as it is without atom, we can find a partition $\alpha = \{ A_1, \ldots, A_p \} \subset \mathcal{A}_{n-1}$ of $F$ such that $\mu(F)^{-1} \mu(A_k) = \gamma_k$ for all $1 \le k \le p$. Hence we get
\[
\mu(F)^{-1} \mu\big( T^{-l}(A_k) \cap B \big) = \mu(F)^{-1} \mu\big( T^{-l}(A_k) \big) \cdot \mu(F)^{-1} \mu(B) = \gamma_k \cdot \mu(F)^{-1} \mu(B)
\]
for all $1 \le k \le p$, all $B \in \pi_l$, and all $0 \le l < n$. This completes the proof.

7.2.3 Remark. We shall use Corollary 7.2.2 with $n = NL$ and $F$ being an $(\varepsilon, NL)$-Kakutani–Rochlin set of an aperiodic dynamical system $(X, \mathcal{A}, \mu, T)$, where $\varepsilon > 0$ and $N, L \ge 1$ with $N$ even. Putting $p = 2^{N/2}$ and $\gamma_k = 2^{-N/2}$ for all $1 \le k \le p$, we may conclude the following: if $F \in \mathcal{A}$ is an $(\varepsilon, NL)$-Kakutani–Rochlin set, and $\pi_l$ is a finite partition of $T^{-l}(F)$ with elements from $\mathcal{A}$ for all $0 \le l < NL$, then there exists a finite partition $\alpha$ of $F$ into $2^{N/2}$ sets from $\mathcal{A}$ such that
\[
\mu\big( T^{-l}(A) \cap B \big) = 2^{-N/2} \cdot \mu(B)
\]
for all $A \in \alpha$, all $B \in \pi_l$, and all $0 \le l < NL$. It is instructive to observe that in this case we have $\mu(A) = 2^{-N/2} \mu(F)$ for all $A \in \alpha$.


Given a dynamical system $(X, \mathcal{A}, \mu, T)$, we shall use $\mathrm{CLT}(\mu, T)$ to denote the set of all functions $f \in L^2(\mu)$ with $\int f \, d\mu = 0$ satisfying the central limit theorem as stated in (7.2.1) above. The main result of this section is the well-known theorem of Burton–Denker establishing that $\mathrm{CLT}(\mu, T) \ne \emptyset$ whenever $T$ is aperiodic. More precisely, we have:

7.2.4 Theorem. If $(X, \mathcal{A}, \mu, T)$ is an aperiodic dynamical system, then there exists $f \in L^2(\mu)$ with $\int f \, d\mu = 0$ satisfying the central limit theorem:
\[
\mu\Big\{ x \in X : \frac{\sum_{j=0}^{n-1} T^j f(x)}{\big\| \sum_{j=0}^{n-1} T^j f \big\|_2} \le u \Big\} \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} \exp\Big( -\frac{v^2}{2} \Big) dv
\]
as $n \to \infty$, for all $u \in \mathbb{R}$.

Proof. Let $N, K, L \ge 1$ be given and fixed positive integers, such that $N$ is even and $1 \le K < N$ is odd. Let $F \in \mathcal{A}$ be an $(\varepsilon, NL)$-Kakutani–Rochlin set, and let $\pi_l$ be a finite partition of $T^{-l}(F)$ for all $0 \le l < NL$. Then by Remark 7.2.3 there exists a partition $\alpha$ of $F$ into $2^{N/2}$ sets from $\mathcal{A}$ satisfying:
\[
\mu\big( T^{-l}(A) \cap B \big) = 2^{-N/2} \cdot \mu(B) \tag{7.2.9}
\]
for all $A \in \alpha$, all $B \in \pi_l$, and all $0 \le l < NL$. The partition $\alpha$ can be written as follows:
\[
\alpha = \big\{ A(\varepsilon_0, \varepsilon_1, \ldots, \varepsilon_{\frac{N}{2} - 1}) : \varepsilon_i \in \{ -1, 1 \} \big\}.
\]
Define a measurable function $g^{2l} : F \to \{ -1, 1 \}$ by putting
\[
g^{2l}(x) = \varepsilon_l \quad \text{if } x \in A(\varepsilon_0, \ldots, \varepsilon_l, \ldots, \varepsilon_{\frac{N}{2} - 1})
\]
for all $0 \le l < N/2$. Define a measurable function $g^{(2l+K)(\mathrm{mod}\, N)} : F \to \{ -1, 1 \}$ by putting
\[
g^{(2l+K)(\mathrm{mod}\, N)}(x) = -g^{2l}(x)
\]

for all $x \in F$ and all $0 \le l < N/2$. In this way a function $g^l : F \to \{ -1, 1 \}$ is defined for all $0 \le l < N$. Finally, we define a measurable function $g : X \to \{ -1, 0, 1 \}$ as follows:
\[
g(x) =
\begin{cases}
g^l\big( T^{jN+l}(x) \big), & \text{if } x \in T^{-(jN+l)}(F) \text{ for some } 0 \le j < L \text{ and } 0 \le l < N; \\
0, & \text{if } x \notin \bigcup_{l=0}^{NL-1} T^{-l}(F).
\end{cases}
\]
We shall use $S_m(f)(x)$ to denote $\sum_{j=0}^{m-1} f(T^j(x))$ whenever $m \ge 1$ and $x \in X$. In addition, the following two facts on the Kakutani–Rochlin tower $F, T^{-1}(F), \ldots, T^{-NL+1}(F)$ clarify the entire construction and will be freely used below. Namely, we have
\[
T(F) \subset \Big( \bigcup_{l=0}^{NL-1} T^{-l}(F) \Big)^c \cup T^{-NL+1}(F), \tag{7.2.10}
\]
\[
T\Big( \Big( \bigcup_{l=0}^{NL-1} T^{-l}(F) \Big)^c \Big) \subset \Big( \bigcup_{l=0}^{NL-1} T^{-l}(F) \Big)^c \cup T^{-NL+1}(F). \tag{7.2.11}
\]

Both statements follow straightforwardly.

7.2.5 Lemma. Under the hypotheses stated above we have:

(a) Random functions $g^0, g^2, \ldots, g^{N-2}$ as well as $g^1, g^3, \ldots, g^{N-1}$ are independent and identically distributed with respect to the probability measure $\mu(F)^{-1} \mathrm{tr}(\mu, F)$. Moreover, we have
\[
\mu(F)^{-1} \mathrm{tr}(\mu, F) \{ g^l = \pm 1 \} = \frac12 \tag{7.2.12}
\]
for all $0 \le l < N$.

(b) Random functions $g \circ T^i$ for $0 \le i < K$ restricted to $T^{-l}(F)$ are independent and identically distributed with respect to the probability measure $\mu(F)^{-1} \mathrm{tr}(\mu, T^{-l}(F))$ whenever $K \le l < NL$. Moreover, we have
\[
\mu(F)^{-1} \mathrm{tr}(\mu, T^{-l}(F)) \{ g \circ T^i = \pm 1 \} = \frac12 \tag{7.2.13}
\]
for all $0 \le i < K$.

(c) Random function $S_m(g)$ restricted to $T^{-l}(F)$ is independent of $\pi_l$ with respect to the probability measure $\mu(F)^{-1} \mathrm{tr}(\mu, T^{-l}(F))$ whenever $1 \le m \le N$ and $N \le l < NL$.

(d) Let $K \le m < N - K$ and $0 \le l < N$ be given and fixed. Then there exists a set $J \subset \{ 0, 1, \ldots, m - 1 \}$ of cardinality $K - 1$, $K$ or $K + 1$ such that
\[
\sum_{i=0}^{m-1} g(T^{i+l}(x)) = \sum_{i \in J} g(T^{i+l}(x)) \tag{7.2.14}
\]
whenever $x \in \bigcup_{j=1}^{L} T^{-jN+1}(F)$. Furthermore, the random functions $g \circ T^{i+l}$ with $i \in J$ restricted to $T^{-l}(F)$ are independent and identically distributed with respect to the probability measure $\mu(F)^{-1} \mathrm{tr}(\mu, T^{-l}(F))$. Moreover, we have
\[
\mu(F)^{-1} \mathrm{tr}(\mu, T^{-l}(F)) \{ g \circ T^{i+l} = \pm 1 \} = \frac12
\]
for all $i \in J$.

(e) The following inequality is valid:
\[
| S_m(g)(x) | \le 2(K + 1) \tag{7.2.15}
\]
for all $m \ge 1$ and all $x \in X$ outside of a $\mu$-null set.

(f) The following inequalities are valid:
\[
(1 - \delta) m \le \| S_m(g) \|_2^2 \le (1 + \delta) m \tag{7.2.16}
\]
for all $1 \le m \le K$, with $\delta \to 0$ if $K(\varepsilon + L^{-1}) \to 0$ when $K, L \to \infty$ and $\varepsilon \to 0$ (like maps from $\mathbb{N}$ into $\mathbb{N}$, respectively $]0, \infty[$, as $n \to \infty$).

(g) The following inequalities are valid:
\[
(1 - \delta) K \le \| S_m(g) \|_2^2 \le (1 + \delta) K \tag{7.2.17}
\]
for all $K \le m < N - K$, with $\delta \to 0$ if $K(\varepsilon + L^{-1}) \to 0$ when $K, L \to \infty$ and $\varepsilon \to 0$ (like maps from $\mathbb{N}$ into $\mathbb{N}$, respectively $]0, \infty[$, as $n \to \infty$).

Proof. (a) The statements follow straightforwardly by definition and the fact that by (7.2.9) we have $\mu(A) = 2^{-N/2} \mu(F)$ for all $A \in \alpha$.

(b) Let us fix $K \le l < NL$; then by definition we have $g(T^i(x)) = g^{(l-i)(\mathrm{mod}\, N)}(T^l(x))$ for all $x \in T^{-l}(F)$ and all $0 \le i < K$. Hence we can easily verify that $(g \circ T^0, g \circ T^1, \ldots, g \circ T^{K-1})$ is equally distributed as $(\pm g^{j_1}, \pm g^{j_2}, \ldots, \pm g^{j_K})$ with a choice of signs $\pm$ and for some (different) $j_1, j_2, \ldots, j_K$ from $\{ 0, 2, \ldots, N - 2 \}$ (both depending on the given $l$). Thus the statements follow straightforwardly by (7.2.4).

(c) Let $A = \{ x \in T^{-l}(F) : g(T^j(x)) = \varepsilon_j,\ \forall\, 0 \le j < m \}$ with some given and fixed $\varepsilon_j \in \{ -1, 1 \}$ for $0 \le j < m$, and let $B \in \pi_l$. If $x \in T^{-l}(F)$, then $T^j(x) \in T^{-(l-j)}(F)$ and thus by definition we have $g(T^j(x)) = g^{(l-j)(\mathrm{mod}\, N)}(T^l(x))$. Hence we easily find that $A = \bigcup_{C \in \beta} T^{-l}(C)$ for some subfamily $\beta \subset \alpha$. Denote $\nu_l = \mu(F)^{-1} \mathrm{tr}(\mu, T^{-l}(F))$; then by (7.2.9) we have:
\begin{align*}
\nu_l(A \cap B) &= \nu_l\Big( \bigcup_{C \in \beta} T^{-l}(C) \cap B \Big) = \sum_{C \in \beta} \nu_l\big( T^{-l}(C) \cap B \big) \\
&= \mu(F)^{-1} \sum_{C \in \beta} \mu\big( T^{-l}(C) \cap B \big) = \mu(F)^{-1} \sum_{C \in \beta} 2^{-N/2} \mu(B) \\
&= \mu(F)^{-1} \sum_{C \in \beta} 2^{-N/2} \mu(F) \cdot \mu(F)^{-1} \mu(B) = \sum_{C \in \beta} \mu(F)^{-1} \mu\big( T^{-l}(C) \big) \cdot \mu(F)^{-1} \mu(B) \\
&= \sum_{C \in \beta} \nu_l\big( T^{-l}(C) \big) \cdot \nu_l(B) = \nu_l(A) \cdot \nu_l(B).
\end{align*}

This fact easily completes the proof of (c).

(d) We shall verify the case where $l = 0$ and $x \in T^{-N+1}(F)$. Other cases follow in exactly the same manner by using periodicity in the definition of $g$. If $m = K$ or $K + 1$, there is nothing to be proved. Thus consider the case where $K + 2 \le m < N - K$ and look at any odd number $p$ satisfying $K \le p < m$. Then $p = 2j + K$ for some $j \ge 0$, and therefore the members of the sum $\sum_{i=0}^{m-1} g(T^i(x))$ which correspond to indices $2j$ and $2j + K$ cancel. Arguing in this way for any odd number between $K$ and $m$, we may cancel $m - K$, $m - K + 1$ or $m - K - 1$ indices (depending on $m$ and $K$). Doing so, at the end we will have only those $g \circ T^i$ which are independent with respect to $\mu(F)^{-1} \mathrm{tr}(\mu, F)$. These facts complete the proof of (d).

(e) By definition of $g$, we easily find that $\sum_{l=0}^{N-1} g(T^l(x)) = 0$ whenever
\[
x \in \bigcup_{j=1}^{L} T^{-jN+1}(F).
\]
Therefore by (7.2.10), (7.2.11) and the definition of $g$ we may conclude:
\[
S_m(g)(x) = \sum_{k=0}^{p-1} g^{i_k}(y) + \sum_{k=0}^{q-1} g^{j_k}(z) \tag{7.2.18}
\]
for all $x \in X$ outside of a $\mu$-null set, with some $y, z \in F$, some $0 \le p, q \le K + 1$, and some $0 \le i_0, \ldots, i_{p-1}, j_0, \ldots, j_{q-1} < N$, where the corresponding term on the right-hand side equals zero if $p$ or $q$ equals zero. Hence the claim follows straightforwardly by the fact that $| g^l(x) | \le 1$ for all $x \in F$ and all $0 \le l < N$.

(f) Suppose that $G = \bigcup_{l=N}^{NL-1} T^{-l}(F)$. Then we have $\mu(G) = N(L - 1) \mu(F) \ge N(L - 1)(1 - \varepsilon) N^{-1} L^{-1} \ge 1 - \varepsilon - L^{-1}$, and thus
\[
\int_{G^c} | S_m(g) |^2 \, d\mu \le m^2 \mu(G^c) \le m^2 (\varepsilon + L^{-1}). \tag{7.2.19}
\]

On the other hand we have
\[
\int_G |S_m(g)|^2\,d\mu = \sum_{l=N}^{NL-1}\int_{T^{-l}(F)} |S_m(g)|^2\,d\mu. \tag{7.2.20}
\]
Fix $N \le l < NL$. Since $m \le K$, by (b) the functions $g\circ T^i$ for $0 \le i < m$ restricted to $T^{-l}(F)$ are independent with respect to $\mu(F)^{-1}\,\mathrm{tr}(\mu, T^{-l}(F))$ and we have
\[
\int_{T^{-l}(F)} g(T^i(x))\,\mu(dx) = \int_{T^{-l}(F)} g^{(l-i)(\mathrm{mod}\ N)}(T^l(x))\,\mu(dx) = \int_F g^{(l-i)(\mathrm{mod}\ N)}(x)\,\mu(dx) = 0. \tag{7.2.21}
\]
Thus we get
\[
\int_{T^{-l}(F)} |S_m(g)|^2\,d\mu = \sum_{i=0}^{m-1}\int_{T^{-l}(F)} [g(T^i(x))]^2\,\mu(dx) = m\mu(F). \tag{7.2.22}
\]
Now by (7.2.19), (7.2.20) and (7.2.22) we easily conclude:
\[
\begin{aligned}
\|S_m(g)\|_2^2 &= \int_{G^c} |S_m(g)|^2\,d\mu + \int_G |S_m(g)|^2\,d\mu \\
&\le m^2(\varepsilon + L^{-1}) + N(L-1)m\mu(F) \\
&\le m^2(\varepsilon + L^{-1}) + m \le m\big(1 + K(\varepsilon + L^{-1})\big).
\end{aligned}
\]


7 The central limit theorem for dynamical systems

Moreover, by (7.2.20) and (7.2.22) we get
\[
\|S_m(g)\|_2^2 \ge N(L-1)m\mu(F) \ge N(L-1)m(1-\varepsilon)N^{-1}L^{-1} = m(1-L^{-1})(1-\varepsilon) \ge m(1-\varepsilon-L^{-1}).
\]
These facts complete the proof of (7.2.17).

(g) Let $G = \bigcup_{l=N}^{NL-1} T^{-l}(F)$; then as above $\mu(G) \ge 1 - \varepsilon - L^{-1}$. Thus by (f) we have
\[
\int_{G^c} |S_m(g)|^2\,d\mu \le 4(K+1)^2\mu(G^c) \le 4(K+1)^2(\varepsilon + L^{-1}). \tag{7.2.23}
\]
On the other hand we have
\[
\int_G |S_m(g)|^2\,d\mu = \sum_{l=N}^{NL-1}\int_{T^{-l}(F)} |S_m(g)|^2\,d\mu. \tag{7.2.24}
\]

Fix $N \le l < NL$, and define $l' = jN - 1 - l$ for some $j \ge 2$ in such a way that $0 \le l' < N$. Denote by $J$ the set of all indices determined by (d) being applied to $m$, $l'$ and $T^{-jN+1}(F)$. Let $x \in T^{-l}(F)$ be a point for which there exists $y \in T^{-jN+1}(F)$ satisfying $T^{l'}(y) = x$. Then by (d) above we have
\[
S_m(g)(x) = \sum_{i=0}^{m-1} g(T^{i+l'}(y)) = \sum_{i\in J} g(T^{i+l'}(y)) = \sum_{i\in J} g(T^i(x)). \tag{7.2.25}
\]

Since $T$ is measure-preserving, the set of all $x$'s from $T^{-l}(F)$ satisfying the property stated above has a $\mu$-outer measure equal to $\mu(F)$. But the functions on the left and right-hand side of (7.2.25) are measurable, and thus (7.2.25) remains valid for $\mu$-a.e. $x \in T^{-l}(F)$. Moreover, by (7.2.25) we may easily conclude that the functions $g\circ T^i$ for $i \in J$ restricted to $T^{-l}(F)$ are independent with respect to $\mu(F)^{-1}\,\mathrm{tr}(\mu, T^{-l}(F))$. Therefore by (7.2.21) we get
\[
\int_{T^{-l}(F)} |S_m(g)|^2\,d\mu = \sum_{i\in J}\int_{T^{-l}(F)} [g(T^i(x))]^2\,\mu(dx) = \#(J)\cdot\mu(F) \le (K+1)\mu(F). \tag{7.2.26}
\]
Now by (7.2.23), (7.2.24) and (7.2.26) we easily conclude:
\[
\begin{aligned}
\|S_m(g)\|_2^2 &= \int_{G^c} |S_m(g)|^2\,d\mu + \int_G |S_m(g)|^2\,d\mu \\
&\le 4(K+1)^2(\varepsilon + L^{-1}) + N(L-1)(K+1)\mu(F) \\
&\le 4(K+1)^2(\varepsilon + L^{-1}) + (K+1) \\
&= K\big(1 + 4(1+K^{-1})^2\cdot K(\varepsilon + L^{-1}) + K^{-1}\big).
\end{aligned}
\]


Moreover, by (7.2.24) and (7.2.26) we get
\[
\begin{aligned}
\|S_m(g)\|_2^2 &\ge N(L-1)(K-1)\mu(F) \ge N(L-1)(K-1)(1-\varepsilon)N^{-1}L^{-1} \\
&= (K-1)(1-L^{-1})(1-\varepsilon) \ge K(1 - \varepsilon - K^{-1} - L^{-1} - \varepsilon K^{-1}L^{-1}).
\end{aligned}
\]
These facts complete the proof of Lemma 7.2.5.

To conclude the preliminary part of the proof let us notice that by (a) in Lemma 7.2.5 we have:
\[
\begin{aligned}
\int_X g\,d\mu &= \sum_{l=0}^{NL-1}\int_{T^{-l}(F)} g(x)\,\mu(dx) \\
&= \sum_{l=0}^{NL-1}\int_{T^{-l}(F)} g^{l(\mathrm{mod}\ N)}(T^l(x))\,\mu(dx) \\
&= \sum_{l=0}^{NL-1}\int_F g^{l(\mathrm{mod}\ N)}(x)\,\mu(dx) = 0.
\end{aligned}
\tag{7.2.27}
\]

We proceed by constructing a function $f$ satisfying the statement of the theorem. Let $\{N_n, n \ge 1\}$, $\{K_n, n \ge 1\}$ and $\{L_n, n \ge 1\}$ be increasing sequences of positive integers with $N_n$ being even and $0 \le K_n < N_n$ being odd for all $n \ge 1$. Let $\{\varepsilon_n\}_{n\ge 1}$ be a decreasing sequence of positive real numbers converging to zero, and let $F_n$ be an $(\varepsilon_n, N_nL_n)$-Kakutani–Rochlin set for every $n \ge 1$. According to the preceding construction and statement (c) in Lemma 7.2.5 we may conclude that for each $n \ge 1$ there exists a measurable function $g_n$ from $X$ into $\{-1, 0, 1\}$ such that $S_m(g_n)$ restricted to $T^{-l}(F_n)$ is independent of the partition generated by $\{g_{n-1}\circ T^i \mid 0 \le i < N_n\}$ with respect to the probability measure $\mu(F_n)^{-1}\,\mathrm{tr}(\mu, T^{-l}(F_n))$ for all $1 \le m \le N_n$ and all $N_n \le l < N_nL_n$, with $g_0$ being zero. In addition we assume that the given numbers satisfy the following four conditions:
\[
K_n < 2^{-1}N_{n-1} \quad \text{for all } n > 1; \tag{7.2.28}
\]
\[
K_n(\varepsilon_n + L_n^{-1}) \to 0, \quad n \to \infty; \tag{7.2.29}
\]
\[
\frac{1}{a_n\sqrt{K_n}}\sum_{j<n} a_jK_j \to 0, \quad n \to \infty; \tag{7.2.30}
\]

It is instructive to observe that by (7.2.31) we have $(a_j) \in \ell^1$, and thus $(a_j) \in \ell^2$ as well. Hence by (7.2.27) and the fact that $|g_n| \le 1$ for all $n \ge 1$ we easily find that $f \in L^2(\mu)$ and $\int f\,d\mu = 0$.


7.2.6 Remark. Conditions (7.2.30) and (7.2.31) may seem to exclude each other. In order to show that (7.2.28)–(7.2.31) can be satisfied, we may proceed as follows:

Step 1. First choose $\{a_n, n \ge 1\}$ and $\{K_n, n \ge 1\}$ to satisfy (7.2.30) and (7.2.31). For example, put $a_1 = 2^{-1}$ and $a_n = (a_{n-1})^4$ for all $n \ge 2$, and let $K_n = (a_n)^{-4}$ for all $n \ge 1$. Then (7.2.30) and (7.2.31) become:
\[
a_n\sum_{j<n} (a_j)^{-3} \to 0, \quad n \to \infty, \tag{7.2.32}
\]
which is easily verified, since $\{a_j\}_{j\ge 1}$ is decreasing fast enough.

Step 2. In this step the $K_n$'s are already given, so choose the $N_n$'s to satisfy (7.2.28), and choose the $\varepsilon_n$'s (small enough) and the $L_n$'s (large enough) to satisfy (7.2.29). This completes the choice.

In addition we shall introduce an auxiliary sequence that will be of vital importance in the rest. Given $m \ge 1$, we define $n_m = \sup\{n \ge 1 : K_n \le m\}$. Further, we denote:
\[
f_m = a_{n_m}g_{n_m} + a_{n_m+1}g_{n_m+1} \quad\text{and}\quad A_m = K_{n_m}|a_{n_m}|^2 + m|a_{n_m+1}|^2
\]
for all $m \ge 1$. Then we have:

7.2.7 Lemma. Under the hypotheses stated above we have
\[
\frac{1}{\sqrt{A_m}}\,\|S_m(f_m)\|_2 \to 1 \tag{7.2.34}
\]
as $m \to \infty$. Moreover, we have
\[
\frac{1}{\sqrt{A_m}}\,\|S_m(f) - S_m(f_m)\|_2 \to 0 \tag{7.2.35}
\]
as $m \to \infty$.

Proof. (7.2.34): Fix $m \ge 1$ and put $G = \bigcup_{l=N_{n_m+1}}^{N_{n_m+1}L_{n_m+1}-1} T^{-l}(F_{n_m+1})$; then we have:
\[
\|S_m(f_m)\|_2^2 \le 2\int_{G^c} [S_m(a_{n_m}g_{n_m})]^2\,d\mu + 2\int_{G^c} [S_m(a_{n_m+1}g_{n_m+1})]^2\,d\mu + \int_G [S_m(f_m)]^2\,d\mu. \tag{7.2.36}
\]
By (e) in Lemma 7.2.5 we easily find:
\[
\int_{G^c} [S_m(a_{n_m}g_{n_m})]^2\,d\mu \le 2|a_{n_m}|^2(K_{n_m}+1)^2\mu(G^c) \le 8|a_{n_m}|^2K_{n_m}\cdot K_{n_m+1}\big(\varepsilon_{n_m+1} + L_{n_m+1}^{-1}\big). \tag{7.2.37}
\]


Moreover, since $|g_n| \le 1$ for all $n \ge 1$, we then have
\[
\int_{G^c} [S_m(a_{n_m+1}g_{n_m+1})]^2\,d\mu \le |a_{n_m+1}|^2m^2\mu(G^c) \le |a_{n_m+1}|^2m\cdot K_{n_m+1}\big(\varepsilon_{n_m+1} + L_{n_m+1}^{-1}\big). \tag{7.2.38}
\]

By construction $S_m(a_{n_m+1}g_{n_m+1})$ restricted to $T^{-l}(F_{n_m+1})$ is independent of $S_m(a_{n_m}g_{n_m})$ with respect to $\mu(F_{n_m+1})^{-1}\,\mathrm{tr}(\mu, T^{-l}(F_{n_m+1}))$, for all $N_{n_m+1} \le l < N_{n_m+1}L_{n_m+1}$, since by definition $m < K_{n_m+1} < N_{n_m+1}$. Moreover it is easily verified that by definition and (7.2.28) we have $K_{n_m} \le m < N_{n_m} - K_{n_m}$. Therefore (g) in Lemma 7.2.5 may be applied. In this way by (7.2.27), and (f) and (g) in Lemma 7.2.5 with (7.2.29), we get
\[
\begin{aligned}
\int_G [S_m(f_m)]^2\,d\mu &= \sum_{l=N_{n_m+1}}^{N_{n_m+1}L_{n_m+1}-1}\int_{T^{-l}(F_{n_m+1})} [S_m(a_{n_m}g_{n_m}) + S_m(a_{n_m+1}g_{n_m+1})]^2\,d\mu \\
&= \sum_{l=N_{n_m+1}}^{N_{n_m+1}L_{n_m+1}-1}\Big(\int_{T^{-l}(F_{n_m+1})} [S_m(a_{n_m}g_{n_m})]^2\,d\mu + \int_{T^{-l}(F_{n_m+1})} [S_m(a_{n_m+1}g_{n_m+1})]^2\,d\mu\Big) \\
&\le \|S_m(a_{n_m}g_{n_m})\|_2^2 + \|S_m(a_{n_m+1}g_{n_m+1})\|_2^2 \\
&\le |a_{n_m}|^2K_{n_m}(1+\delta') + |a_{n_m+1}|^2m(1+\delta''),
\end{aligned}
\tag{7.2.39}
\]
where $\delta' \vee \delta'' \to 0$ as $m \to \infty$. Moreover, by (7.2.36)–(7.2.39) we get:
\[
\|S_m(f_m)\|_2^2 \le K_{n_m}|a_{n_m}|^2\cdot\delta/2 + m|a_{n_m+1}|^2\cdot\delta/2 + A_m(1+\delta/2) = A_m(1+\delta), \tag{7.2.40}
\]
where $\delta = 32\big(\delta' \vee \delta'' \vee K_{n_m+1}(\varepsilon_{n_m+1} + L_{n_m+1}^{-1})\big) \to 0$ as $m \to \infty$. On the other hand by independence, (7.2.37) + (7.2.38), and (e) and (f) in Lemma 7.2.5 with (7.2.28) + (7.2.29), we similarly get:

\[
\begin{aligned}
\|S_m(f_m)\|_2^2 &\ge \int_G [S_m(f_m)]^2\,d\mu \ge \int_G [S_m(a_{n_m}g_{n_m})]^2\,d\mu + \int_G [S_m(a_{n_m+1}g_{n_m+1})]^2\,d\mu \\
&= \|S_m(a_{n_m}g_{n_m})\|_2^2 + \|S_m(a_{n_m+1}g_{n_m+1})\|_2^2 - \int_{G^c} [S_m(a_{n_m}g_{n_m})]^2\,d\mu - \int_{G^c} [S_m(a_{n_m+1}g_{n_m+1})]^2\,d\mu \\
&\ge |a_{n_m}|^2K_{n_m}(1-\delta') + |a_{n_m+1}|^2m(1-\delta'') \\
&\quad - 8|a_{n_m}|^2K_{n_m}\cdot K_{n_m+1}\big(\varepsilon_{n_m+1} + L_{n_m+1}^{-1}\big) - |a_{n_m+1}|^2m\cdot K_{n_m+1}\big(\varepsilon_{n_m+1} + L_{n_m+1}^{-1}\big) \\
&\ge \Big(1 - \frac{\delta}{2}\Big)\big(|a_{n_m}|^2K_{n_m} + |a_{n_m+1}|^2m\big) - \frac{\delta}{2}\big(K_{n_m}|a_{n_m}|^2 + m|a_{n_m+1}|^2\big) = A_m(1-\delta),
\end{aligned}
\tag{7.2.41}
\]
where $\delta = 16\big(\delta' \vee \delta'' \vee K_{n_m+1}(\varepsilon_{n_m+1} + L_{n_m+1}^{-1})\big) \to 0$ for $m \to \infty$. Thus the statement follows straightforwardly by (7.2.40) + (7.2.41), and the proof of (7.2.34) is complete.

(7.2.35): We have already noticed that by (7.2.28) above we have $K_{n_m} \le m < N_{n_m} - K_{n_m}$ for all $m \ge 1$. Thus (g) in Lemma 7.2.5 may be applied. In this way we get
\[
\begin{aligned}
\frac{1}{\sqrt{A_m}}\,\|S_m(f) - S_m(f_m)\|_2 &\le \frac{1}{\sqrt{A_m}}\sum_{n=1}^{n_m-1}\|S_m(a_ng_n)\|_2 + \frac{1}{\sqrt{A_m}}\sum_{n=n_m+2}^{\infty}\|S_m(a_ng_n)\|_2 \\
&\le \frac{1}{\sqrt{A_m}}\sum_{n=1}^{n_m-1} 2(K_n+1)a_n + \frac{1}{\sqrt{A_m}}\sum_{n=n_m+2}^{\infty} m\cdot a_n \\
&\le \frac{4}{\sqrt{A_m}}\sum_{n=1}^{n_m-1} a_nK_n + \frac{\sqrt{m}}{a_{n_m+1}}\sum_{n=n_m+2}^{\infty} a_n \\
&\le \frac{4}{a_{n_m}\sqrt{K_{n_m}}}\sum_{n=1}^{n_m-1} a_nK_n + \frac{\sqrt{K_{n_m+1}}}{a_{n_m+1}}\sum_{n=n_m+2}^{\infty} a_n,
\end{aligned}
\]
being valid for all $m \ge 1$. By (7.2.30) and (7.2.31) above the right-hand side tends to zero as $m \to \infty$. This fact completes the proof of (7.2.35).

Let us put $X_m = S_m(f)/\|S_m(f)\|_2$ and $Y_m = S_m(f_m)/\sqrt{A_m}$ for $m \ge 1$, and let $\varphi_{X_m}$ and $\varphi_{Y_m}$ denote the characteristic functions of $X_m$ and $Y_m$ respectively, for all $m \ge 1$. Then by (7.2.34) and (7.2.35) in Lemma 7.2.7 we may easily conclude that $X_m - Y_m \to 0$ in $L^2(\mu)$ as $m \to \infty$. Therefore we have $\varphi_{X_m}(t) - \varphi_{Y_m}(t) \to 0$ as $m \to \infty$, for all $t \in R$. Thus by the continuity theorem the main proof will be completed as soon as we show that $\varphi_{Y_m}(t) \to \exp\{-t^2/2\}$ as $m \to \infty$, for all $t \in R$. This fact is established in the next lemma.

7.2.8 Lemma. Under the hypotheses stated above we have

\[
E\exp\Big\{it\,\frac{S_m(f_m)}{\sqrt{A_m}}\Big\} \to \exp\{-t^2/2\}
\]
as $m \to \infty$, for all $t \in R$.

Proof. Let $t \in R$ be given and fixed. Put $G = \bigcup_{l=N_{n_m+1}}^{N_{n_m+1}L_{n_m+1}-1} T^{-l}(F_{n_m+1})$ for $m \ge 1$. Since $\mu(G) \to 1$ as $m \to \infty$, we may be concerned only with the integration over $G$ and proceed as follows. Denote $\nu_l = \mu(F_{n_m+1})^{-1}\,\mathrm{tr}(\mu, T^{-l}(F_{n_m+1}))$ for all $N_{n_m+1} \le l < N_{n_m+1}L_{n_m+1}$ with $m \ge 1$. As in the proof of Lemma 7.2.7 we may conclude that $S_m(a_{n_m+1}g_{n_m+1})$ restricted to $T^{-l}(F_{n_m+1})$ is independent of $S_m(a_{n_m}g_{n_m})$ with respect to $\nu_l$ for all $N_{n_m+1} \le l < N_{n_m+1}L_{n_m+1}$ with $m \ge 1$. Hence we get:
\[
\begin{aligned}
\int_G \exp\Big\{it\,\frac{S_m(f_m)}{\sqrt{A_m}}\Big\}\,d\mu &= \sum_{l=N_{n_m+1}}^{N_{n_m+1}L_{n_m+1}-1}\mu(F_{n_m+1})\cdot\int_{T^{-l}(F_{n_m+1})}\exp\Big\{it\,\frac{S_m(f_m)}{\sqrt{A_m}}\Big\}\,d\nu_l \\
&= \sum_{l=N_{n_m+1}}^{N_{n_m+1}L_{n_m+1}-1}\mu(F_{n_m+1})\int_{T^{-l}(F_{n_m+1})}\exp\Big\{it\,\frac{S_m(a_{n_m}g_{n_m})}{\sqrt{A_m}}\Big\}\,d\nu_l \\
&\qquad\times\int_{T^{-l}(F_{n_m+1})}\exp\Big\{it\,\frac{S_m(a_{n_m+1}g_{n_m+1})}{\sqrt{A_m}}\Big\}\,d\nu_l
\end{aligned}
\tag{7.2.42}
\]
being valid for all $m \ge 1$. Since $m < K_{n_m+1}$, by (b) in Lemma 7.2.5 we see that the random functions $g_{n_m+1}\circ T^i$ for $0 \le i \le m$ restricted to $T^{-l}(F_{n_m+1})$ are independent and identically distributed with respect to $\nu_l$ for all $N_{n_m+1} \le l < N_{n_m+1}L_{n_m+1}$ with $m \ge 1$. Moreover, we have $\nu_l\{g_{n_m+1}\circ T^i = \pm 1\} = 1/2$ for all $0 \le i \le m$ with $m \ge 1$. Thus by Berry–Esséen's theorem (see Feller [1971] p. 542), we may conclude:
\[
\sup_{u\in R}\Big|\,\nu_l\Big\{x \in T^{-l}(F_{n_m+1}) : \frac{S_m(g_{n_m+1})(x)}{\sqrt{m}} \le u\Big\} - \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{u}\exp\Big(\frac{-v^2}{2}\Big)\,dv\,\Big| \le \frac{3}{\sqrt{m}}
\]
for all $N_{n_m+1} \le l < N_{n_m+1}L_{n_m+1}$ with $m \ge 1$. Hence we can deduce:
\[
\int_{T^{-l}(F_{n_m+1})}\exp\Big\{it\,\frac{S_m(a_{n_m+1}g_{n_m+1})}{\sqrt{A_m}}\Big\}\,d\nu_l = \exp\Big\{-\frac{1}{2}\,\frac{m|a_{n_m+1}|^2}{A_m}\,t^2\Big\} + \eta'_m \tag{7.2.43}
\]
for all $N_{n_m+1} \le l < N_{n_m+1}L_{n_m+1}$ with $m \ge 1$, where $\eta'_m \to 0$ as $m \to \infty$. Moreover, it may be worthwhile to mention that this statement follows even more directly by the fact that the pointwise convergence of a sequence of characteristic functions to a characteristic function is uniform over every bounded interval. (This fact in turn relies upon Prohorov's theorem and the continuity theorem.) Indeed, having this it is enough to notice that $|m|a_{n_m+1}|^2/A_m| \le 1$ for all $m \ge 1$, and then apply the preceding fact to the interval $[-t, t]$ by using the central limit theorem and the continuity theorem. However, note that, even though it is irrelevant for our purposes, in this way we get $\eta'_m = \eta'_m(t)$.

As an alternative proof of (7.2.43) we can also recall that $\sup_{x\in R}|F_n(x) - F(x)| \to 0$ as $n \to \infty$, provided that $F_n(x) \to F(x)$ for all $x \in R$ as $n \to \infty$ and $F$ is continuous, and $\sup_{x\in R}|F_n(x) - F(x)| \to 0$ as $n \to \infty$ if and only if $\sup_{t\in R}|\varphi_n(t) - \varphi(t)| \to 0$ as $n \to \infty$. Here $F_n$ and $F$ are distribution functions, and $\varphi_n$ and $\varphi$ are the associated characteristic functions for $n \ge 1$. Now by (7.2.42) and (7.2.43) we get:
\[
\int_G \exp\Big\{it\,\frac{S_m(f_m)}{\sqrt{A_m}}\Big\}\,d\mu = \Big(\exp\Big\{-\frac{1}{2}\,\frac{m|a_{n_m+1}|^2}{A_m}\,t^2\Big\} + \eta'_m\Big)\cdot\int_G \exp\Big\{it\,\frac{S_m(a_{n_m}g_{n_m})}{\sqrt{A_m}}\Big\}\,d\mu \tag{7.2.44}
\]
being valid for $m \ge 1$, where $\eta'_m \to 0$ as $m \to \infty$. Repeating the similar procedure for the $n_m$ level instead of the level $n_m+1$, and using the fact that by (7.2.28) above we have $K_{n_m} \le m < N_{n_m} - K_{n_m}$, according to which by (d) in Lemma 7.2.5 we can extract from the sequence $\{g_{n_m}\circ T^i : 0 \le i < m\}$ a subsequence of cardinality $K_{n_m}-1$, $K_{n_m}$ or $K_{n_m}+1$ containing functions which are, being restricted to $T^{-l}(F_{n_m})$, mutually independent and identically distributed with respect to $\rho_l := \mu(F_{n_m})^{-1}\,\mathrm{tr}(\mu, T^{-l}(F_{n_m}))$, and taking values $\pm 1$ with probability $1/2$, where $N_{n_m} \le l < N_{n_m}L_{n_m}$, we obtain:
\[
\int_{T^{-l}(F_{n_m})}\exp\Big\{it\,\frac{S_m(a_{n_m}g_{n_m})}{\sqrt{A_m}}\Big\}\,d\rho_l = \exp\Big\{-\frac{1}{2}\,\frac{K_{n_m}|a_{n_m}|^2}{A_m}\,t^2\Big\} + \eta''_m \tag{7.2.45}
\]
for all $N_{n_m} \le l < N_{n_m}L_{n_m}$ with $m \ge 1$, where $\eta''_m \to 0$ as $m \to \infty$. Put for $m \ge 1$, $H = \bigcup_{l=N_{n_m}}^{N_{n_m}L_{n_m}-1} T^{-l}(F_{n_m})$. Then it is easily verified that $\mu(G\,\Delta\,H) \to 0$ as $m \to \infty$. Hence by (7.2.44) and (7.2.45) we may conclude:
\[
\begin{aligned}
\int_G \exp\Big\{it\,\frac{S_m(f_m)}{\sqrt{A_m}}\Big\}\,d\mu &= \Big(\exp\Big\{-\frac{1}{2}\,\frac{m|a_{n_m+1}|^2}{A_m}\,t^2\Big\} + \eta'_m\Big)\cdot\int_H \exp\Big\{it\,\frac{S_m(a_{n_m}g_{n_m})}{\sqrt{A_m}}\Big\}\,d\mu + \delta_m \\
&= \Big(\exp\Big\{-\frac{m|a_{n_m+1}|^2}{2A_m}\,t^2\Big\} + \eta'_m\Big)\Big(\exp\Big\{-\frac{K_{n_m}|a_{n_m}|^2}{2A_m}\,t^2\Big\} + \eta''_m\Big)\mu(H) + \delta_m
\end{aligned}
\tag{7.2.46}
\]
where $\eta'_m \to 0$, $\delta_m \to 0$, $\eta''_m \to 0$, $\mu(G) \to 1$ and $\mu(H) \to 1$ as $m \to \infty$. Thus letting $m \to \infty$ in (7.2.46) we get:
\[
\int \exp\Big\{it\,\frac{S_m(f_m)}{\sqrt{A_m}}\Big\}\,d\mu \to \exp\{-t^2/2\}
\]
as $m \to \infty$, for all $t \in R$. This fact completes the proofs of Lemma 7.2.8 and Theorem 7.2.4.
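The Berry–Esséen step used in the proof of Lemma 7.2.8 can be checked numerically in the simplest setting. The following is a minimal sketch, assuming only that $S_m$ is a sum of $m$ independent fair signs $\pm 1$ (as the functions $g_{n_m+1}\circ T^i$ are under $\nu_l$); the function names are illustrative and not taken from the text. It computes the exact Kolmogorov distance between the law of $S_m/\sqrt{m}$ and the standard normal law, which can then be compared with the bound $3/\sqrt{m}$.

```python
import math


def normal_cdf(u: float) -> float:
    """Distribution function of the standard normal law."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def kolmogorov_distance(m: int) -> float:
    """Exact sup_u |P(S_m / sqrt(m) <= u) - Phi(u)| for S_m a sum of m fair signs.

    The law of S_m is binomial: S_m = 2k - m with probability C(m, k) / 2^m.
    Since the distribution function is a step function and Phi is continuous
    and increasing, the supremum is attained at a jump point, either as a
    left limit or as the value at the jump.
    """
    worst = 0.0
    cdf = 0.0
    for k in range(m + 1):
        u = (2 * k - m) / math.sqrt(m)
        phi = normal_cdf(u)
        worst = max(worst, abs(cdf - phi))   # left limit at the jump
        cdf += math.comb(m, k) / 2 ** m
        worst = max(worst, abs(cdf - phi))   # value at the jump
    return worst


if __name__ == "__main__":
    for m in (1, 10, 100, 400):
        print(m, kolmogorov_distance(m), 3.0 / math.sqrt(m))
```

In this special case the computed distance stays well below $3/\sqrt{m}$, which is all that the proof requires.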

7.3 The central limit theorem for orbits

Throughout the whole section we suppose that $(X, A, \mu, T)$ is a given aperiodic dynamical system. We recall that CLT$(\mu, T)$ denotes the set of all functions $f \in L^2(\mu)$ with $\int f\,d\mu = 0$ satisfying the central limit theorem as stated in (7.2.1) above. According to Theorem 7.2.4 we have CLT$(\mu, T) \ne \emptyset$. Given $f \in$ CLT$(\mu, T)$ we write Orb$(f) = \{f\circ T^j : j \ge 0\}$. The set Orb$(f)$ is called the orbit of $f$. The smallest linear subspace of $L^2(\mu)$ containing Orb$(f)$ is denoted by span(Orb$(f)$). The main aim of this section is to investigate the central limit theorems involving elements of span(Orb$(f)$). It turns out that a complete description of weak Gaussian limits in this case can be obtained. The result is presented in Theorem 7.3.3 and then extended in Theorem 7.3.5. The proof relies upon the next two lemmas which are also of interest in themselves. Let $N_d(\mu, \Gamma)$ denote the $d$-dimensional Gaussian distribution with mean vector $\mu$ and covariance matrix $\Gamma$. With $U$ we denote the matrix having all entries equal to 1. We begin as follows.

7.3.1 Lemma. Let $f \in L^2(\mu)$ be an arbitrary function, and let $f_1, \dots, f_d \in$ Orb$(f)$ be arbitrary elements for some $d \ge 1$. For all $\alpha = (\alpha_1, \dots, \alpha_d) \in R^d$ and $m \ge 1$, put $A_m = \int |S_m(f)|^2\,d\mu$ and $A_m(\alpha) = \int \big|S_m\big(\sum_{k=1}^d \alpha_kf_k\big)\big|^2\,d\mu$. Suppose that $A_m \to \infty$ as $m \to \infty$. Then we have:
\[
\frac{1}{\sum_{k=1}^d \alpha_k\,\sqrt{A_m}}\,\Big\|S_m\Big(\sum_{k=1}^d \alpha_kf_k\Big) - S_m\Big(\sum_{k=1}^d \alpha_k\cdot f\Big)\Big\|_2 \to 0, \tag{7.3.1}
\]
\[
\frac{\sqrt{A_m(\alpha)}}{\sqrt{A_m}} \to \Big|\sum_{k=1}^d \alpha_k\Big| \tag{7.3.2}
\]
as $m \to \infty$, for all $\alpha = (\alpha_1, \dots, \alpha_d) \in R^d$ with $\sum_{k=1}^d \alpha_k \ne 0$.

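Proof. Let $\alpha = (\alpha_1, \dots, \alpha_d) \in R^d$ with $\sum_{k=1}^d \alpha_k \ne 0$ be given and fixed. It is no restriction to assume that $f_1 = f\circ T^{p_1}$, $f_2 = f\circ T^{p_2}$, ..., $f_d = f\circ T^{p_d}$ for some $0 \le p_1 < p_2 < \dots < p_d$. Then we have
\[
\begin{aligned}
S_m\Big(\sum_{k=1}^d \alpha_kf_k\Big) - S_m\Big(\sum_{k=1}^d \alpha_k\cdot f\Big) &= \sum_{k=1}^d \alpha_k\Big(\sum_{j=0}^{m-1} f\circ T^{j+p_k} - \sum_{j=0}^{m-1} f\circ T^j\Big) \\
&= \sum_{k=1}^d \alpha_k\Big(\sum_{j=m}^{m+p_k-1} f\circ T^j - \sum_{j=0}^{p_k-1} f\circ T^j\Big)
\end{aligned}
\]
being valid for all $m \ge 1$, where the first term of the first sum in the last line reads zero in the case when $p_1$ equals zero. Hence we get
\[
\Big\|S_m\Big(\sum_{k=1}^d \alpha_kf_k\Big) - S_m\Big(\sum_{k=1}^d \alpha_k\cdot f\Big)\Big\|_2 \le 2\sum_{k=1}^d |\alpha_k|\,p_k\,\|f\|_2 \tag{7.3.3}
\]
for all $m \ge 1$. Thus (7.3.1) follows straightforwardly by the fact that $A_m \to \infty$ as $m \to \infty$. Moreover (7.3.2) is an immediate consequence of (7.3.1). The proof is complete.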
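The telescoping identity behind (7.3.3) is purely algebraic and can be verified along a single orbit. The sketch below is illustrative only: the list `values` plays the role of the numbers $f(T^j(x))$ for $j \ge 0$, and the helper names `birkhoff_sum`, `difference_direct` and `difference_telescoped` are hypothetical. It also checks the pointwise analogue of (7.3.3), in which $\|f\|_2$ is replaced by the maximum of $|f|$ along the orbit; that substitution is an assumption made for the sake of a finite check.

```python
def birkhoff_sum(values, start, length):
    """Sum of values[start], ..., values[start + length - 1]."""
    return sum(values[start:start + length])


def difference_direct(values, alphas, shifts, m):
    """S_m(sum_k alpha_k f o T^{p_k}) - S_m((sum_k alpha_k) f) along one orbit."""
    combined = sum(a * birkhoff_sum(values, p, m) for a, p in zip(alphas, shifts))
    return combined - sum(alphas) * birkhoff_sum(values, 0, m)


def difference_telescoped(values, alphas, shifts, m):
    """Same quantity after cancellation: only 2 * p_k boundary terms remain per k."""
    return sum(
        a * (birkhoff_sum(values, m, p) - birkhoff_sum(values, 0, p))
        for a, p in zip(alphas, shifts)
    )


if __name__ == "__main__":
    # Arbitrary integer data standing in for f(T^j x); chosen only for illustration.
    values = [((7 * j * j + 3 * j + 1) % 11) - 5 for j in range(60)]
    alphas, shifts = [2, -1, 3], [0, 2, 5]
    for m in (1, 5, 30):
        print(m, difference_direct(values, alphas, shifts, m),
              difference_telescoped(values, alphas, shifts, m))
```

The number of surviving terms does not grow with $m$, which is why dividing by $\sqrt{A_m} \to \infty$ gives (7.3.1).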


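7.3.2 Lemma. Let $f \in$ CLT$(\mu, T)$ be an arbitrary function, and let $f_1, \dots, f_d \in$ Orb$(f)$ be arbitrary elements for some $d \ge 1$. Put
\[
A_m = \int |S_m(f)|^2\,d\mu \quad\text{and}\quad A_m(\alpha) = \int \Big|S_m\Big(\sum_{k=1}^d \alpha_kf_k\Big)\Big|^2\,d\mu,
\]
for all $\alpha = (\alpha_1, \dots, \alpha_d) \in R^d$ and $m \ge 1$. Suppose that $A_m \to \infty$ as $m \to \infty$. Then we have
\[
\frac{1}{\sqrt{A_m(\alpha)}}\,S_m\Big(\sum_{k=1}^d \alpha_kf_k\Big) \overset{D}{\longrightarrow} N(0, 1), \tag{7.3.4}
\]
\[
\frac{1}{\sqrt{A_m}}\,S_m\Big(\sum_{k=1}^d \alpha_kf_k\Big) \overset{D}{\longrightarrow} N\Big(0, \Big(\sum_{k=1}^d \alpha_k\Big)^2\Big) \tag{7.3.5}
\]
as $m \to \infty$, for all $\alpha = (\alpha_1, \dots, \alpha_d) \in R^d$ with $\sum_{k=1}^d \alpha_k \ne 0$.

Proof. Let $\alpha = (\alpha_1, \dots, \alpha_d) \in R^d$ with $\sum_{k=1}^d \alpha_k \ne 0$ be given and fixed. Then by (7.3.1) in Lemma 7.3.1 we easily find
\[
\int \exp\Big\{it\,\frac{S_m\big(\sum_{k=1}^d \alpha_kf_k\big)}{\sum_{k=1}^d \alpha_k\,\sqrt{A_m}}\Big\}\,d\mu - \int \exp\Big\{it\,\frac{S_m(f)}{\sqrt{A_m}}\Big\}\,d\mu \to 0
\]
as $m \to \infty$, for all $t \in R$. Thus (7.3.4) follows by the continuity theorem and the fact that $f \in$ CLT$(\mu, T)$. Moreover (7.3.5) follows straightforwardly by (7.3.4) and (7.3.2) in Lemma 7.3.1. The proof is complete.

7.3.3 Theorem. Let $f \in$ CLT$(\mu, T)$ be an arbitrary function, and let $f_1, \dots, f_d \in$ Orb$(f)$ be arbitrary elements for some $d \ge 1$. Write $F = (f_1, \dots, f_d)$, and put $A_m = \int |S_m(f)|^2\,d\mu$ for all $m \ge 1$. Suppose that $A_m \to \infty$ as $m \to \infty$. Then we have
\[
\frac{1}{\sqrt{A_m}}\,S_m(F) \overset{D}{\longrightarrow} N_d(0, U) \tag{7.3.6}
\]
as $m \to \infty$. More generally, suppose that $f_1, \dots, f_d \in$ span(Orb$(f)$) are arbitrary elements. Then we have
\[
f_k = \sum_{i=1}^{N_k} \beta_i^k f_i^k
\]
for some $f_i^k \in$ Orb$(f)$ and $\beta_i^k \in R$ with $1 \le i \le N_k$ and $1 \le k \le d$. Suppose moreover that $\sum_{i=1}^{N_k} \beta_i^k \ne 0$ for all $1 \le k \le d$, and that $A_m \to \infty$ as $m \to \infty$. Then we have
\[
\frac{1}{\sqrt{A_m}}\,S_m(F) \overset{D}{\longrightarrow} N_d(0, \Gamma) \tag{7.3.7}
\]
as $m \to \infty$, where $\Gamma = (\Gamma_{kl})_{k,l=1}^d$ is given by
\[
\Gamma_{kl} = \sum_{i=1}^{N_k} \beta_i^k\cdot\sum_{i=1}^{N_l} \beta_i^l
\]
for all $1 \le k, l \le d$.

Proof. First we prove (7.3.6). Since the set $D = \{(\alpha_1, \dots, \alpha_d) \in R^d : \sum_{k=1}^d \alpha_k \ne 0\}$ is dense in $R^d$, by the continuity theorem it suffices to show
\[
\int \exp\Big\{i\,\frac{\alpha\cdot S_m(F)}{\sqrt{A_m}}\Big\}\,d\mu \to \int \exp\{i\,\alpha\cdot\bar X\}\,d\mu \tag{7.3.8}
\]
for all $\alpha \in D$, as $m \to \infty$, where $\bar X = (X, X, \dots, X)$ with $X \sim N(0, 1)$. Again by the continuity theorem, for (7.3.6) it is enough to show that
\[
\frac{\alpha\cdot S_m(F)}{\sqrt{A_m}} \overset{D}{\longrightarrow} \alpha\cdot\bar X \tag{7.3.9}
\]
for all $\alpha \in D$, as $m \to \infty$. Notice that we have
\[
\alpha\cdot S_m(F) = S_m\Big(\sum_{k=1}^d \alpha_kf_k\Big), \qquad \alpha\cdot\bar X = \sum_{k=1}^d \alpha_k\,X
\]
for all $\alpha = (\alpha_1, \dots, \alpha_d) \in R^d$ and all $m \ge 1$. Hence we see that for (7.3.9) it suffices to show that
\[
\frac{1}{\sqrt{A_m}}\,S_m\Big(\sum_{k=1}^d \alpha_kf_k\Big) \overset{D}{\longrightarrow} \sum_{k=1}^d \alpha_k\,X \tag{7.3.10}
\]
for all $\alpha = (\alpha_1, \dots, \alpha_d) \in D$, as $m \to \infty$. However, this is precisely the statement (7.3.5) in Lemma 7.3.2. Thus (7.3.8)–(7.3.10) are valid, and the proof of (7.3.6) is complete.

Next we prove (7.3.7). We use the same argument as for (7.3.6). In this way we obtain that it suffices to show that
\[
\frac{1}{\sqrt{A_m}}\,S_m\Big(\sum_{k=1}^d \alpha_kf_k\Big) \overset{D}{\longrightarrow} \sum_{k=1}^d \alpha_kX_k \tag{7.3.11}
\]
for all $\alpha = (\alpha_1, \dots, \alpha_d)$ that belong to a dense subset $D$ of $R^d$, as $m \to \infty$, where $X = (X_1, \dots, X_d) \sim N_d(0, \Gamma)$. Notice that we have
\[
S_m\Big(\sum_{k=1}^d \alpha_kf_k\Big) = S_m\Big(\sum_{k=1}^d \sum_{i=1}^{N_k} \alpha_k\beta_i^kf_i^k\Big) \tag{7.3.12}
\]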


for all $\alpha = (\alpha_1, \dots, \alpha_d) \in R^d$ and all $m \ge 1$. Choose $D$ to be the set of all $\alpha = (\alpha_1, \dots, \alpha_d) \in R^d$ such that $\sum_{k=1}^d \sum_{i=1}^{N_k} \alpha_k\beta_i^k \ne 0$. Then $D$ is dense in $R^d$. Hence by (7.3.2) and (7.3.5) in Lemma 7.3.2 we get
\[
\frac{1}{\sqrt{A_m}}\,S_m\Big(\sum_{k=1}^d \alpha_kf_k\Big) \overset{D}{\longrightarrow} \sum_{k=1}^d \sum_{i=1}^{N_k} \alpha_k\beta_i^k\,X
\]
as $m \to \infty$, with $X \sim N(0, 1)$. Thus (7.3.11) follows with $X_k = \sum_{i=1}^{N_k} \beta_i^k\,X$ for all $1 \le k \le d$, and the proof of (7.3.7) is complete. These facts complete the proof of the theorem.

7.3.4 Example. Let $(X, A, \mu, T)$ be equal to $(S^N, A^N, \nu^N, \theta)$ with $S = \{-1, 1\}$, $A = 2^S$, $\nu\{\pm 1\} = 1/2$ and $\theta(s_1, s_2, \dots) = (s_2, s_3, \dots)$ for $(s_1, s_2, \dots) \in S^N$. Then $T$ is strongly mixing, and thus aperiodic as well. Let $f$ be the projection onto the first coordinate. Then $f\circ T^j = \varepsilon_j$ form a Rademacher sequence for $j \ge 1$. By the classical central limit theorem we have
\[
\frac{\sum_{j=0}^{m-1} T^jf(x)}{\big\|\sum_{j=0}^{m-1} T^jf\big\|_2} = \frac{1}{\sqrt{m}}\sum_{j=0}^{m-1} \varepsilon_j \overset{D}{\longrightarrow} N(0, 1)
\]
as $m \to \infty$. Thus $f \in$ CLT$(\mu, T)$. However, if we put $g = f - f\circ T$, then we have
\[
\frac{\sum_{j=0}^{m-1} T^jg(x)}{\big\|\sum_{j=0}^{m-1} T^jg\big\|_2} = \frac{1}{\sqrt{2}}(\varepsilon_1 - \varepsilon_m) \sim \frac{1}{\sqrt{2}}(\varepsilon_1 + \varepsilon_2) \nsim N(0, \sigma^2)
\]
for all $m \ge 1$ and any $\sigma^2 > 0$. Therefore $g \notin$ CLT$(\mu, T)$. Hence we see that the results of Lemma 7.3.1, Lemma 7.3.2 and Theorem 7.3.3 are as optimal as possible in general.

A close look at the method of the proof of Theorem 7.3.3, through Lemma 7.3.1 and Lemma 7.3.2, shows that these results might be generalized in the number of elements in the orbit being involved. The result can be formulated in two steps, as is shown in the next theorem. We clarify that $\sum_{k=1}^\infty \gamma_kf_k$ stands for the $L^2(\mu)$-limit of $\sum_{k=1}^m \gamma_kf_k$ as $m \to \infty$ whenever $(\gamma_k)_{k=1}^\infty \in \ell^1$ and $f_k \in$ Orb$(f)$ for $k \ge 1$ with $f \in L^2(\mu)$. Notice that under these circumstances the limit exists.

7.3.5 Theorem. Let $f \in$ CLT$(\mu, T)$ be an arbitrary function, and let $f_k = f\circ T^{p_k} \in$ Orb$(f)$ be arbitrary elements for some $p_k \ge 0$ with $k \ge 1$. Let $(\alpha_k)_{k=1}^\infty \in \ell^1$ satisfy
\[
\sum_{k=1}^\infty |\alpha_k|\,p_k < \infty, \tag{7.3.13}
\]
\[
\sum_{k=1}^\infty \alpha_k \ne 0. \tag{7.3.14}
\]


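Put $A_m = \int |S_m(f)|^2\,d\mu$ and $A_m(\alpha) = \int \big|S_m\big(\sum_{k=1}^\infty \alpha_kf_k\big)\big|^2\,d\mu$ for all $m \ge 1$. Suppose that $A_m \to \infty$ as $m \to \infty$. Then we have
\[
\frac{1}{\sum_{k=1}^\infty \alpha_k\,\sqrt{A_m}}\,\Big\|S_m\Big(\sum_{k=1}^\infty \alpha_kf_k\Big) - S_m\Big(\sum_{k=1}^\infty \alpha_k\cdot f\Big)\Big\|_2 \to 0, \tag{7.3.15}
\]
\[
\frac{\sqrt{A_m(\alpha)}}{\sqrt{A_m}} \to \Big|\sum_{k=1}^\infty \alpha_k\Big|, \tag{7.3.16}
\]
\[
\frac{1}{\sqrt{A_m(\alpha)}}\,S_m\Big(\sum_{k=1}^\infty \alpha_kf_k\Big) \overset{D}{\longrightarrow} N(0, 1), \tag{7.3.17}
\]
\[
\frac{1}{\sqrt{A_m}}\,S_m\Big(\sum_{k=1}^\infty \alpha_kf_k\Big) \overset{D}{\longrightarrow} N\Big(0, \Big(\sum_{k=1}^\infty \alpha_k\Big)^2\Big) \tag{7.3.18}
\]
as $m \to \infty$. More generally, suppose that $f_k = \sum_{i=1}^\infty \beta_i^kf_i^k$ for some $f_i^k = f\circ T^{p_i^k} \in$ Orb$(f)$ with $p_i^k \ge 0$ and $(\beta_i^k)_{i=1}^\infty \in \ell^1$ for $1 \le k \le d$ and $i \ge 1$. Suppose moreover that
\[
\sum_{i=1}^\infty |\beta_i^k|\,p_i^k < \infty, \tag{7.3.19}
\]
\[
\sum_{i=1}^\infty \beta_i^k \ne 0 \tag{7.3.20}
\]
for all $1 \le k \le d$. Write $F = (f_1, \dots, f_d)$, and suppose that $A_m \to \infty$ as $m \to \infty$. Then we have
\[
\frac{1}{\sqrt{A_m}}\,S_m(F) \overset{D}{\longrightarrow} N_d(0, \Gamma) \tag{7.3.21}
\]
as $m \to \infty$, where $\Gamma = (\Gamma_{kl})_{k,l=1}^d$ is given by $\Gamma_{kl} = \sum_{i=1}^\infty \beta_i^k\cdot\sum_{i=1}^\infty \beta_i^l$ for all $1 \le k, l \le d$.

Proof. We point out (7.3.3) in the proof of Lemma 7.3.1 in order to understand how conditions (7.3.13) and (7.3.19) apply. The rest of the proof can be carried out along the lines of the proofs of Lemma 7.3.1, Lemma 7.3.2 and Theorem 7.3.3. We shall omit the details.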
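The limiting covariance in Theorems 7.3.3 and 7.3.5 has a very rigid form: its $(k,l)$ entry is the product of the $k$-th and $l$-th coefficient sums, so it is an outer product and has rank one. The sketch below illustrates this for finitely many coefficients (the setting of Theorem 7.3.3); the function name `limit_covariance` and the sample coefficients are hypothetical and chosen only for the illustration.

```python
def limit_covariance(betas):
    """Return the matrix with entries (sum_i beta_i^k) * (sum_i beta_i^l).

    `betas[k]` is the finite list of coefficients of f_k along the orbit of f.
    """
    totals = [sum(row) for row in betas]
    return [[tk * tl for tl in totals] for tk in totals]


if __name__ == "__main__":
    # Each f_k is a single element of the orbit: the limit is the all-ones matrix U.
    print(limit_covariance([[1], [1], [1]]))
    # A genuinely mixed example with coefficient sums 2, 4 and -3.
    print(limit_covariance([[1, -3, 4], [2, 2], [-3]]))
```

Every $2 \times 2$ minor of such a matrix vanishes, which reflects the fact that the limit vector is a single standard Gaussian variable multiplied by the vector of coefficient sums, as in the proof of (7.3.7).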

7.4 A theorem of Volný

Let $(X, A, \mu, T)$ be an aperiodic dynamical system. A natural question arising from the central limit theorem of Burton and Denker is certainly the following: is it possible to find an $f \in L^2(\mu)$ with $\int f\,d\mu = 0$, $\int f^2\,d\mu = 1$ satisfying the following central limit theorem:
\[
\mu\Big\{x \in X : \frac{1}{\sqrt{n}}\sum_{j=0}^{n-1} T^jf(x) \le u\Big\} \to \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{u}\exp\Big(\frac{-v^2}{2}\Big)\,dv
\]
as $n \to \infty$, for all $u \in R$? In other words, is it possible to replace in Theorem 7.2.4 the intrinsic CLT normalizing factors $\big\|\sum_{j=0}^{n-1} T^jf\big\|_2$ by $\sqrt{n}$? The answer turns out to be affirmative and follows from a fine result of Volný [1999], who even showed the existence of the invariance principle in any aperiodic dynamical system. Volný's approach is still based on the use of Kakutani–Rochlin towers.

7.4.1 Theorem. Let $(X, A, \mu)$ be a nonatomic probability space. Let $T : X \to X$ be an ergodic aperiodic automorphism. There exist a function $f \in L^2(\mu)$ and independent

random variables Zj ∼ N 0, 2(log log 3 log log 2) , such that n−1 n−1 1 log log log log n 1/2 j max √ T f− Zj = O . 1≤j ≤n n log n j =0

j =0

A slight extension of this result to a class of weighted averages (with monotonic weights) is performed in Kristensen [2002: Corollary 9.5]. By elementary considerations, the answer to the above question is now easily deduced from this result. It also follows from its proof that a strong invariance principle holds.

7.4.2 Theorem. Let $(X, A, \mu)$ be a nonatomic probability space. Let $T : X \to X$ be an ergodic aperiodic automorphism. There exist a function $f \in L^2(\mu)$ and a Brownian motion $B = \{B_t, t \ge 0\}$ such that
\[
\lim_{n\to\infty}\frac{\sum_{j=0}^{n-1} T^jf - B(n)}{\sqrt{n\log\log n}} \overset{a.s.}{=} 0.
\]
A key ingredient of the proof is the following very useful approximation property.

7.4.3 Proposition. Let $(X, A, \mu, T)$ be an ergodic aperiodic measurable dynamical system. Let $\{X_i, i \in N\}$ be a strictly stationary ergodic sequence defined on a joint probability space. Let $A_1 = \{A_i, 1 \le i \le K\}$ be a finite measurable partition of $X$, let $n$ be a positive integer and $\varepsilon$ a positive real. Then there exists a simple function $f$ such that the following holds:

(1) There exists a random vector $(X'_0, \dots, X'_n)$ defined on $(X, A, \mu)$ that is distributed as $(X_0, \dots, X_n)$ and such that $\mu\{\exists\, 0 \le \ell \le n : |f\circ T^\ell - X'_\ell| > \varepsilon\} \le \varepsilon$.


(2) The partition $F$ generated by $\{f\circ T^\ell, 0 \le \ell \le n\}$ is $\varepsilon$-independent of $A_1$:
\[
\sum_{A\in A_1,\ B\in F,\ \mu(B)>0} \big|\mu(A) - \mu(A|B)\big| \le \varepsilon.
\]
A straightforward consequence of this proposition is the following statement, which is Proposition 1 in Volný [1999].

7.4.4 Proposition. Let $(X, A, \mu, T)$ be an ergodic aperiodic measurable dynamical system. Let $\{\varepsilon_k, k \ge 1\}$, $\{\alpha_k, k \ge 1\}$ be two sequences of positive reals. Let also $\{\ell_k, k \ge 1\}$ be a sequence of positive integers. Then there exist a sequence $\{f_k, k \ge 1\} \subset L^2(\mu)$ with $\int f_k\,d\mu = 0$ and a triangular array of independent random variables $X_{k,\ell} \sim N(0, \alpha_k^2)$, $0 \le \ell \le \ell_k$, $k \ge 1$, such that
\[
\sup_{k\ge 1}\ \sup_{0\le\ell\le\ell_k} \frac{\|f_k\circ T^\ell - X_{k,\ell}\|}{\varepsilon_k} \le 1.
\]

7.5 CLT for rotations

In this section, we focus on the CLT for irrational rotations. In this case, harmonic analysis methods operate efficiently not only to treat the case of ergodic means, but also some other typical means having more complicated structure. Let $\theta \in\ ]0, 1[\ \cap\ Q^c$ and consider on $(T, \lambda)$ the rotation $\tau x = x + \theta \pmod 1$, $x \in T$. Let $S_N^\tau(f)(t) = \sum_{k=0}^{N-1} f(\tau^kt)$ denote the usual ergodic sums. Consider also the following sums:

(A) $S_N^\tau(f, g)(t) = \sum_{k=0}^{N-1} f(\tau^kt)\,g(\tau^{2k}t)$ (nonlinear ergodic sums),

(B) $U_N^\tau(f)(t) = \sum_{k=0}^{N-1} \sigma_kf(\tau^kt)$ (weighted ergodic sums),

(C) the sums $\sum_{k=0}^{N-1} f(\tau^{k^2}t)$ (ergodic sums along the squares).

The nice linearity properties inherited from the structure of usual ergodic sums and allowing us in particular to work with coboundaries, naturally no longer exist for these sums. We study the CLT property for the means associated to each of these sums, including the usual ergodic sums, and show that this can be done using an elementary approach. In case (C), the method will be combined with the circle method. The study also extends, by means of the same method, to CLT and/or ASCLT for sample paths of some classes of Gaussian random Fourier series.
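For orientation, the four kinds of sums just introduced can be written down directly. The following is only a sketch under stated assumptions: the angle `THETA`, the test function `cos_wave` and all function names are illustrative choices, and the weights $\sigma_k$ in (B), which the text does not specify at this point, are passed in as an arbitrary list.

```python
import cmath
import math

THETA = math.sqrt(2.0) - 1.0  # an irrational angle, chosen only for illustration


def rotate(t, k, theta=THETA):
    """The point tau^k t = t + k * theta (mod 1)."""
    return (t + k * theta) % 1.0


def ergodic_sum(f, t, n, theta=THETA):
    """Usual ergodic sum: sum over k < n of f(tau^k t)."""
    return sum(f(rotate(t, k, theta)) for k in range(n))


def nonlinear_sum(f, g, t, n, theta=THETA):
    """Case (A): sum over k < n of f(tau^k t) * g(tau^(2k) t)."""
    return sum(f(rotate(t, k, theta)) * g(rotate(t, 2 * k, theta)) for k in range(n))


def weighted_sum(f, weights, t, theta=THETA):
    """Case (B): sum over k of sigma_k * f(tau^k t), with sigma_k = weights[k]."""
    return sum(w * f(rotate(t, k, theta)) for k, w in enumerate(weights))


def squares_sum(f, t, n, theta=THETA):
    """Case (C): sum over k < n of f(tau^(k^2) t)."""
    return sum(f(rotate(t, k * k, theta)) for k in range(n))


def cos_wave(t):
    """A single harmonic, f(t) = cos(2 pi t)."""
    return math.cos(2.0 * math.pi * t)


if __name__ == "__main__":
    t, n = 0.3, 200
    print(ergodic_sum(cos_wave, t, n))
    print(nonlinear_sum(cos_wave, cos_wave, t, n))
    print(squares_sum(cos_wave, t, n))
```

For a single harmonic the usual ergodic sum is a geometric sum and has a closed form; no comparably simple formula is available for (A) and (C), which is one way to see why the linear tools mentioned above no longer apply.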


The principle can be described as follows: let $\{[a_j, b_j[,\ j \ge 1\}$ be a sequence of non-overlapping intervals with $\lim_{j\to\infty} a_j = \lim_{j\to\infty} b_j = 0$. Let $L = \{\ell_j, j \ge 1\}$ be a lacunary sequence verifying
\[
\{\ell_j\theta\} \in [a_j, b_j[ \qquad (\forall j \ge 1).
\]
Then, for an appropriate choice of the intervals $[a_j, b_j[$, mild assumptions on the Fourier coefficients of $f = \frac{1}{2}\sum_{k\in Z} \beta_ke_{\ell_k}$ will ensure that $f$ satisfies the CLT, and for the cases (A), (B) and (C) as well.

Notation–Construction. Let $f \in L^2(\lambda)$. We first propose a construction adapted to the study of the usual sums $S_Nf := S_N^\tau f = \sum_{k=0}^{N-1} f\circ\tau^k$, and will say that a function $f \in L^2(\lambda)$ satisfies the CLT when
\[
\frac{S_Nf}{\|S_Nf\|_2} \overset{D}{\longrightarrow} N(0, 1),
\]
and that $f$ satisfies the ASCLT when
\[
\frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,\delta_{S_nf/\|S_nf\|_2} \overset{D}{\longrightarrow} N(0, 1).
\]
We thus follow the point of view already considered in Section 7.2, which consists of studying the CLT with respect to the natural $L^2(\lambda)$-normalizing factor. Let first $\varepsilon = \{\varepsilon_n, n \ge 1\}$ be a sequence of reals in $]0, 1]$ decreasing to 0, and put $\eta_n = \varepsilon_1\varepsilon_2^2\cdots\varepsilon_n^2$. To the sequence $\varepsilon$ we associate an increasing sequence of positive integers $N = \{N_j, j \ge 1\}$ which is defined as follows:
\[
N_1 \ge 1 \text{ is arbitrary and for } n \ge 2, \quad N_n = \frac{16^{n-1}\cdot N_1}{\varepsilon_n\eta_{n-1}}. \tag{7.5.1}
\]
Next we associate to $\theta$ and the sequence $\varepsilon$ another increasing sequence of positive integers $L = \{\ell_j, j \ge 1\}$ as follows: for any $n \ge 1$ (since $\{\{n\theta\}, n \ge 1\}$ is everywhere dense in $[0, 1]$) $\ell_n$ is chosen so that
\[
\frac{\eta_n}{2\cdot 16^nN_1} \le \{\ell_n\theta\} \le \frac{\eta_n}{16^nN_1}. \tag{7.5.2}
\]
We next partition N into consecutive blocks
\[
I_j = [N_j, N_{j+1}[\,, \qquad j = 1, 2, \dots \tag{7.5.3}
\]
and will estimate oscillations of partial sums over these blocks. Let $\beta = \{\beta_k, k \ge 1\} \in \ell^2$ be such that $\beta_0 = 0$, $\beta_n = \beta_{-n} > 0$ for $n \ne 0$, and $\sum_{k=1}^\infty \beta_k^2 = 1$. Let $e(t) = \exp(2i\pi t)$, $e_n(t) = e(nt)$, $n \in Z$. Define
\[
f(t) = \frac{1}{2}\sum_{k\in Z} \beta_ke_{\ell_k}(t) = \sum_{k=1}^\infty \beta_k\cos 2\pi\ell_kt.
\]


Let $g = \{g_k, k \ge 1\}$, $g' = \{g'_k, k \ge 1\}$ be two independent sequences of i.i.d. random variables defined on a joint probability space $(\Omega, A, P)$ and such that $E\,g_k = E\,g'_k = 0$, and $E\,g_k^2 = E\,g_k'^2 = 1$. We assume $g_k$, $g'_k$ to be sub-Gaussian random variables. Let $\gamma = \{\gamma_k, k \ge 1\}$ be defined by $\gamma_k = g_k + ig'_k$. Associate to $f$ the random Fourier series
\[
X(t) = X_f(t) = \sum_{k=1}^\infty \beta_k\,\Re\big(\gamma_ke_{\ell_k}(t)\big) = \sum_{k\in N} \beta_k\big(g_k\cos 2\pi\ell_kt - g'_k\sin 2\pi\ell_kt\big). \tag{7.5.4}
\]
Next, put for any integer $N \ge 1$ and any real number $\theta \in [-1, 1]$,
\[
V_N(\theta) = \frac{1}{N}\sum_{k=0}^{N-1} e^{2i\pi k\theta} = \frac{e^{2i\pi N\theta} - 1}{N(e^{2i\pi\theta} - 1)}.
\]
Plainly, for any integer $N$,
\[
S_N(f)(t) = N\sum_{k=1}^\infty \beta_k\,\Re\big(V_N(\ell_k\theta)\,e_{\ell_k}(t)\big),
\]
\[
S_N(X_f)(t) = N\sum_{k=1}^\infty \beta_k\,\Re\big(\gamma_ke_{\ell_k}(t)\,V_N(\ell_k\theta)\big).
\]

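The following estimates are valid for any integer $n \ge 1$ and rely upon the choice of sequences $N$ and $L$:

$(L_n)$ $\forall N \le N_n$, $|V_N(\{\ell_n\theta\}) - 1| \le 2\varepsilon_n$,

$(N_n)$ $\forall N \ge N_n$, $\forall m < n$, $|V_N(\{\ell_m\theta\})| \le \varepsilon_n$.

These are easily deduced from standard estimates of the kernels $V_N$: for any integer $N \ge 1$ and $y \in [-1, 1]$,
\[
|V_N(y) - 1| \le 32Ny, \qquad |V_N(y)| \le \frac{1}{2Ny}. \tag{7.5.5}
\]
Then, on the one hand (if $N \le N_n$)
\[
|V_N(\{\ell_n\theta\}) - 1| \le 32N\{\ell_n\theta\} \le 32N_n\{\ell_n\theta\} \le 32\,\frac{16^{n-1}\cdot N_1}{\varepsilon_n\eta_{n-1}}\cdot\frac{\eta_n}{16^nN_1} = 2\varepsilon_n,
\]
and on the other (if $N \ge N_n$, $n > m$)
\[
|V_N(\{\ell_m\theta\})| \le \frac{1}{2N\{\ell_m\theta\}} \le \frac{1}{2N_n\{\ell_m\theta\}} \le \frac{16^mN_1\,\varepsilon_n\eta_{n-1}}{2(1/2)\,16^{n-1}\cdot N_1\,\eta_m} \le \varepsilon_n.
\]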
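The kernel estimates (7.5.5) can be checked numerically. The sketch below makes two assumptions that are not in the text: it restricts $y$ to $]0, 1/2]$, where the inequality $\sin \pi y \ge 2y$ makes the second bound transparent (this is the relevant range, since the estimates are applied to small fractional parts), and it uses the illustrative helper names `kernel_sum` and `kernel_closed`. It compares the defining average with the closed form of $V_N$ and tests both inequalities on a grid.

```python
import cmath
import math


def kernel_sum(n: int, y: float) -> complex:
    """V_N(y) computed from the definition (1/N) * sum_{k<N} exp(2 i pi k y)."""
    return sum(cmath.exp(2j * math.pi * k * y) for k in range(n)) / n


def kernel_closed(n: int, y: float) -> complex:
    """V_N(y) computed from the geometric-sum closed form (requires y not an integer)."""
    return (cmath.exp(2j * math.pi * n * y) - 1) / (n * (cmath.exp(2j * math.pi * y) - 1))


def check_estimates(n: int, y: float, tol: float = 1e-12) -> bool:
    """Check |V_N(y) - 1| <= 32 N y and |V_N(y)| <= 1 / (2 N y) for 0 < y <= 1/2."""
    v = kernel_sum(n, y)
    return abs(v - 1) <= 32 * n * y + tol and abs(v) <= 1 / (2 * n * y) + tol


if __name__ == "__main__":
    for n in (1, 2, 5, 50):
        for y in (0.01, 0.1, 0.25, 0.5):
            print(n, y, abs(kernel_sum(n, y)), check_estimates(n, y))
```

The first bound is far from sharp (the triangle inequality already gives $|V_N(y) - 1| \le \pi Ny$), but the generous constant 32 is what matches the factors of 16 in (7.5.1) and (7.5.2).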

Decomposition of partial sums. In order to avoid unnecessarily heavy notation, write more simply SN = SN (Xf ),


unless the case requires something more explicit. The decomposition of SN goes as follows: put for any positive integers N and j , SN = SˇN + SˆN

where SˇN = N

+ sN SˇN = SN

where SN =N

∞ k>j ∞

" # βk $ γk · ek VN (k θ )) , " # βk $ γk · ek VN (k θ )) ,

(7.5.6)

k<j

TN = N

∞

" # βk $ γk · ek ,

bj =

k>j

∞

βk2

1/2 .

k>j

Observe for N ∈ Ij that TN /N bj depend on j only and equals

k>j βk $ γk ek j := . bj Put for j ≥ 1 and N ∈ Ij ,

τN = Nbj .

As will be seen later on, mild assumptions on the sequences ε and β will ensure that P We also put YN =

SN 2,λ = 1 = 1. N →∞ τN lim

SˇN − TN , τN

ZN =

SN , τN

UN =

SˆN − SN . τN

(7.5.7)

Then for N ∈ Ij we have the relation: SτNN − j = YN + ZN + UN .

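Main estimates. Put $\zeta_k = e_{\ell_k}N\big(V_N(\ell_k\theta) - 1\big) = z_k + iz'_k$. Then $\Re(\gamma_k\zeta_k) = g_kz_k - g'_kz'_k$, and thus
\[
\check S_N - T_N = \sum_{k>j} \beta_k\big(g_kz_k - g'_kz'_k\big).
\]
Hence
\[
\|\check S_N - T_N\|_{2,P}^2 = \sum_{k\ge j+1} \beta_k^2\big((z_k)^2 + (z'_k)^2\big) = \sum_{k\ge j+1} \beta_k^2N^2|V_N(\ell_k\theta) - 1|^2.
\]
As $N \in I_j$ and $k > j$, we have $N < N_k$ and by property $(L_k)$, $|V_N(\ell_k\theta) - 1| \le 2\varepsilon_k$. Thus
\[
\|\check S_N - T_N\|_{2,P}^2 \le 4N^2\sum_{k>j} \beta_k^2\varepsilon_k^2 \le 4\varepsilon_{j+1}^2N^2\sum_{k>j} \beta_k^2 = 4\varepsilon_{j+1}^2b_j^2N^2.
\]
Hence
\[
\forall j \ge 1,\ \forall N \in I_j, \qquad \|\check S_N - T_N\|_{2,P} \le 2\varepsilon_{j+1}b_jN \quad\text{and}\quad \|Y_N\|_{2,P} \le 2\varepsilon_{j+1}. \tag{7.5.8}
\]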


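By property $(N_j)$,
\[
\|S'_N\|_{2,P}^2 = \sum_{k<j} \beta_k^2N^2|V_N(\ell_k\theta)|^2 \le 4\sum_{k<j} \beta_k^2N^2\varepsilon_j^2 \le 4N^2\varepsilon_j^2,
\]
since $k < j$ and $N \ge N_j$. Thus
\[
\forall j \ge 1,\ \forall N \in I_j, \qquad \|S'_N\|_{2,P} \le 2\varepsilon_jN \quad\text{and}\quad \|Z_N\|_{2,P} \le 2\varepsilon_j/b_j. \tag{7.5.9}
\]
The perturbation term is the quantity $s_N = \beta_jN\,\Re\big(\gamma_j\,e_{\ell_j}V_N(\ell_j\theta)\big)$ $(N \in I_j)$. In this case $|V_N(\ell_j\theta)|$ remains undetermined. We have
\[
\forall j \ge 1,\ \forall N \in I_j, \qquad \|s_N\|_{2,P} \le \beta_jN \quad\text{and}\quad \|U_N\|_{2,P} \le \beta_j/b_j. \tag{7.5.10}
\]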

The series j (βj /bj )2 diverges in general, but we may choose sequence β so that the series j (βj /bj )a converges for a > 2. This will require that we work in some suitable Orlicz space in order to analyse efficiently the oscillation of partial sums SN around j . As a simple consequence of the previous estimates, we get SN − j

τN

2,P

≤ 4εj /bj + βj /bj ,

(7.5.11)

a bound which clearly limits the choice of the sequence β, one that cannot in effect grow faster than geometrically. Oscillations of partial sums. The oscillation of normalized partial sums SτNN around j and over the blocks Ij can be very precisely evaluated. Let indeed r > 2 and introduce the oscillation function Wr = Put r =

∞ j =1

∞

r 1/r SN sup − j .

j =1 N ∈Ij

(7.5.12)

τN

$ j +1 % % ∞ $ εj βj r 1/r 1 1/2 r 1/r log + . bj εk bj

(7.5.13)

j =1

k=1

7.5.1 Proposition. For any r > 2, Wr G,ν ≤ Cr r . Proof. It will be convenient to work with the following quantities: Yj = sup |YN |r ,

Zj = sup |ZN |r , N ∈Ij

N∈Ij

Yr =

∞

Yj

Uj = sup |UN |r ,

1/r ,

Zr =

∞

j =1

According to the decomposition Wr ≤ Cr (Yr + Zr + Ur ).

j =1 SN τN

Zj

N ∈Ij

1/r ,

Ur =

∞

Uj

1/r

(7.5.14) .

j =1

− j = YN + ZN + UN , we have the inequality


7 The central limit theorem for dynamical systems

s Put for s > 0, Gs (x) = e|x| − ns 0 : Gs dμ ≤ 1 (h ∈ L0 (μ)). α D

s Let also Mr < ∞ be such that e|x| ≤ 2 Gs (x) + 1 if |x| ≥ Mr . Since g and g are sub-Gaussian random variables, YN , ZN , UN are sub-Gaussian too and belong to LG (P). As for any finite index I ,

sup |fi | G,ν ≤ i∈I

2 log 2

1/2

2 sup fi G,ν #(I ), i∈I

we have for all t ∈ X, by using estimate (7.5.8),

2

(Yj )1/r (t) G,P = sup |YN (t)| G,P ≤ C sup YN (t) 2,P log #(Ij ) N ∈Ij

N ∈Ij

2

≤ Cεj +1 log #(Ij ). 2 This implies that (Yj )1/r G,ν ≤ Cεj +1 log #(Ij ). Therefore, $ %2 (Yj )1/r exp dν ≤ 2 2 Cεj +1 log #(Ij ) X× and consequently,

(Yj ) G2/r " #r dν ≤ 1. 2 Cεj +1 log #(Ij ) X× r

2 j , we deduce by Hence, Yj G2/r ,ν ≤ Cr εj +1 log #(Ij ) . Since Yrr = ∞ j =1 Y r

2 means of the triangle inequality that Yrr G2/r ,ν ≤ Cr ∞ j =1 εj +1 log #(Ij ) . We denote by B the bound obtained. Then r Yr G2/r dν ≤ 1. B X×

But

exp X×

Yr B 1/r

2

dν =

exp

r 2/r Y r

B

X×

=

Yrr B

+ <Mr

≤ exp(Mr )

2/r

Yrr B

dν

≥Mr

+2

$

G2/r X×

≤ exp(Mr )2/r + 4.

exp

r 2/r Y r

B

r Y r

B

dν %

+ 1 dν


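It follows that

$$\|Y_r\|_{G,\nu} \le C_r \Big(\sum_{j=1}^{\infty} \big(\varepsilon_{j+1}\sqrt{\log \#(I_j)}\big)^r\Big)^{1/r}. \qquad (7.5.15)$$

Using now estimate (7.5.9) and arguing similarly, we get

$$\|Z_r\|_{G,\nu} \le C_r \Big(\sum_{j=1}^{\infty} \Big(\frac{\varepsilon_j}{b_j}\sqrt{\log \#(I_j)}\Big)^r\Big)^{1/r}. \qquad (7.5.16)$$

Finally, estimate (7.5.10) together with the same line of arguments also leads to

$$\|U_r\|_{G,\nu} \le C_r \Big(\sum_{j=1}^{\infty} \Big(\frac{\beta_j}{b_j}\Big)^r\Big)^{1/r}. \qquad (7.5.17)$$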

j +1 Notice that log #(Ij ) ≤ log Nj +1 ≤ C k=1 log(1/εk ). Since Wr ≤ Cr (Yr +Zr +Ur ) we have obtained

Wr G,ν ≤ Cr r , (7.5.18) as requested. 7.5.2 Remark. From the very proof of Proposition 7.5.1 follows that the conditions βj 2 log j = 0, j →∞ bj lim

εj2 log Nj +1 j ≥1

bj2

< ∞,

(7.5.19)

are sufficient to imply

P

SN 2 lim sup − j = 0 (in L (λ) and λ- a.e.) = 1.

j →∞ N ∈Ij

τN

(7.5.20)

CLT. In view of proving a CLT for X defined in (7.5.4), we modify very slightly the definition of f. At first, we shall and do assume that $(\ell_k)$ is r-lacunary with $r \ge 3$. Let $\beta^1 = \{\beta_k^1, k \ge 1\}$, $\beta^2 = \{\beta_k^2, k \ge 1\}$, $\beta^1, \beta^2 \in \ell^2$, and let $\rho = \{\rho_k, k \ge 1\}$ be defined by $\rho_k = \beta_k^1 + i\beta_k^2$. Put

$$F(t) = \sum_{k=1}^{\infty} \Re\big(\rho_k e_{\ell_k}(t)\big) \qquad (\forall t \in \mathbb{T}). \qquad (7.5.21)$$

7.5.3 Theorem. Assume that the following condition is satisfied:

$$\lim_{j\to\infty} \frac{\varepsilon_j + |\rho_j|}{\big(\sum_{k>j} |\rho_k|^2\big)^{1/2}} = 0. \qquad (7.5.22)$$

Then $F \in$ CLT.
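The Gaussian behaviour of normalized lacunary sums, on which the proof of Theorem 7.5.3 rests, can be probed numerically. The sketch below is only an illustration under stated assumptions: it takes $e_\ell(t) = e^{2i\pi\ell t}$, the hypothetical frequencies $\ell_k = 3^k$ ($1 \le k \le 6$) with equal coefficients $\rho_k = 1$, and approximates $\int_0^1 \exp(i\lambda F(t)/c)\,dt$, where $c = (\sum_k |\rho_k|^2)^{1/2}$, by a Riemann sum on a uniform grid. The result is compared with $\exp(-\lambda^2/4)$, the limit that appears in the proof. The function name and the grid size are choices made for this example.

```python
import cmath
import math

def char_fn_lacunary(lam, rho, freqs, grid=1 << 15):
    """Riemann-sum approximation of the integral over [0, 1] of
    exp(i*lam*F(t)/c), with F(t) = sum_k Re(rho_k * e^{2 i pi l_k t})
    and c = (sum_k |rho_k|^2)^(1/2)."""
    # For integer frequencies, e^{2 i pi l n / grid} only takes the values below.
    table = [cmath.exp(2j * math.pi * m / grid) for m in range(grid)]
    c = math.sqrt(sum(abs(r) ** 2 for r in rho))
    total = 0j
    for n in range(grid):
        f_val = sum((r * table[(l * n) % grid]).real for r, l in zip(rho, freqs))
        total += cmath.exp(1j * lam * f_val / c)
    return total / grid

# Assumed example data: a 3-lacunary block of six frequencies, equal weights.
freqs = [3 ** k for k in range(1, 7)]
rho = [1.0 + 0.0j] * len(freqs)

phi = char_fn_lacunary(1.0, rho, freqs)
target = math.exp(-1.0 / 4.0)
print(abs(phi - target))
```

With only six terms the two values already agree to within a small error; the gap is expected to shrink as $\sup_k |\rho_k|/c$ decreases, in line with condition (7.5.22).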


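Some elementary estimates for products of complex exponentials, collected in the following technical lemma, will be necessary.

7.5.4 Lemma. Let $\Lambda$ be a finite index set and $\{z_k, k \in \Lambda\}$ be complex numbers.

(a) Assume that
$$h_0 = \sum_{k\in\Lambda} \big|e^{-\frac12 z_k^2}\big| \Big(\frac14 e^{\frac12 |z_k|^2} |z_k|^4 + \frac13 \frac{|z_k|^3}{|1+z_k|} e^{|z_k|}\Big) \le 1/2.$$
Then,
$$\Big|\frac{\prod_{k\in\Lambda} e^{z_k}}{\prod_{k\in\Lambda} (1+z_k) e^{\frac12 z_k^2}} - 1\Big| \le 2h_0.$$

(b) Assume $z_k$ purely imaginary, and let
$$h = \sum_{k\in\Lambda} \Big(\frac{|z_k|^4}{4} e^{|z_k|^2} + \frac{|z_k|^3}{3\sqrt{1+|z_k|^2}}\, e^{|z_k| + \frac{|z_k|^2}{2}}\Big) \le 1/2.$$
Then,
$$\Big|\prod_{k\in\Lambda} e^{z_k} - \prod_{k\in\Lambda} (1+z_k) e^{\frac12 z_k^2}\Big| \le 2h.$$

Proof of Lemma 7.5.4. Put
$$a_k = \frac{e^{z_k}}{(1+z_k) e^{\frac12 z_k^2}} - 1, \qquad k \in \Lambda.$$
We use the following inequality valid for any complex number z:
$$\big|e^z - (1+z) e^{\frac12 z^2}\big| \le \frac14 |1+z|\,|z|^4 e^{\frac12 |z|^2} + \frac13 |z|^3 e^{|z|}.$$
Then we get
$$|a_k| \le \big|e^{-\frac12 z_k^2}\big| \Big(\frac14 e^{\frac12 |z_k|^2} |z_k|^4 + \frac13 \frac{|z_k|^3}{|1+z_k|} e^{|z_k|}\Big).$$
Thus $\sum_{k\in\Lambda} |a_k| \le h_0 \le \frac12$, which by means of inequality 3.8.8 p. 314 of Mitrinović [1970] implies
$$\Big|\frac{\prod_{k\in\Lambda} e^{z_k}}{\prod_{k\in\Lambda} (1+z_k) e^{\frac12 z_k^2}} - 1 - \sum_{k\in\Lambda} a_k\Big| \le \frac{h_0^2}{1-h_0}.$$
Hence (a) follows. As for (b), observe that $|1+z_k|\,\big|e^{\frac12 z_k^2}\big| = \big|1 + i|z_k|\big|\, e^{-\frac12 |z_k|^2} \le 1$, hence $\big|\prod_{k\in\Lambda} (1+z_k) e^{\frac12 z_k^2}\big| \le 1$. Multiplying both sides of inequality (a) by $\big|\prod_{k\in\Lambda} (1+z_k) e^{\frac12 z_k^2}\big|$ gives
$$\Big|\prod_{k\in\Lambda} e^{z_k} - \prod_{k\in\Lambda} (1+z_k) e^{\frac12 z_k^2}\Big| \le 2\sum_{k\in\Lambda} \Big(\frac14 |z_k|^4 e^{|z_k|^2} + \frac13 \frac{|z_k|^3}{\sqrt{1+|z_k|^2}}\, e^{|z_k| + |z_k|^2/2}\Big) = 2h.$$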
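The pointwise inequality used in the proof of Lemma 7.5.4 can be spot-checked numerically. The following sketch evaluates both sides of $|e^z - (1+z)e^{z^2/2}| \le \frac14|1+z||z|^4e^{|z|^2/2} + \frac13|z|^3e^{|z|}$ on a grid of complex points; the grid and the function names are assumptions made for this illustration, and a finite check of course does not replace the proof.

```python
import cmath
import math

def lhs(z):
    """|e^z - (1 + z) e^{z^2/2}|."""
    return abs(cmath.exp(z) - (1 + z) * cmath.exp(z * z / 2))

def rhs(z):
    """(1/4)|1+z||z|^4 e^{|z|^2/2} + (1/3)|z|^3 e^{|z|}."""
    r = abs(z)
    return 0.25 * abs(1 + z) * r ** 4 * math.exp(r * r / 2) + r ** 3 * math.exp(r) / 3

# Hypothetical test grid: real and imaginary parts in [-3, 3], step 1/4.
grid = [complex(x / 4, y / 4) for x in range(-12, 13) for y in range(-12, 13)]
worst_gap = min(rhs(z) - lhs(z) for z in grid)
print(worst_gap)
```

The smallest gap occurs at $z = 0$, where both sides vanish.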

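Proof of Theorem 7.5.3. We use the decomposition of partial sums previously made and the Salem–Zygmund method, which we display for convenience. We observe that $S_N F(t) = N\,\Re\big(\sum_{k=1}^{\infty} \rho_k V_N(\ell_k\theta) e_{\ell_k}(t)\big)$. We introduce some notation, putting for any $j \ge 1$, $N \in I_j$ and $t \in X$:

$$\check S_N F(t) = N\,\Re\Big(\sum_{k>j} \rho_k V_N(\ell_k\theta) e_{\ell_k}(t)\Big), \qquad \hat S_N F(t) = N\,\Re\Big(\sum_{k\le j} \rho_k V_N(\ell_k\theta) e_{\ell_k}(t)\Big),$$

$$\hat S_N F = \widetilde S_N F + s_N F, \qquad \widetilde S_N F = N\,\Re\Big(\sum_{k<j} \rho_k V_N(\ell_k\theta) e_{\ell_k}(t)\Big),$$

$$F_j(t) = \Re\Big(\sum_{k>j} \rho_k e_{\ell_k}(t)\Big), \qquad c_j = \Big(\sum_{k>j} |\rho_k|^2\Big)^{1/2}, \qquad \sigma_N = N c_j.$$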


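Then, similarly to previous notation,

$$\|\check S_N F - N F_j\|_2^2 = N^2 \sum_{k>j} |\rho_k|^2 |V_N(\ell_k\theta) - 1|^2 \le 4N^2 \varepsilon_{j+1}^2 c_j^2,$$

$$\|\widetilde S_N F\|_2^2 \le 4N^2 c_0^2 \varepsilon_j^2, \qquad \|s_N F\|_2 \le N|\rho_j|.$$

Thus

$$\Big\|\frac{S_N F}{\sigma_N} - \frac{F_j}{c_j}\Big\|_2 \le \Big\|\frac{\check S_N F}{\sigma_N} - \frac{F_j}{c_j}\Big\|_2 + \Big\|\frac{\hat S_N F}{\sigma_N}\Big\|_2 \le 2\varepsilon_{j+1} + \frac{2\varepsilon_j + |\rho_j|}{c_j} \to 0,$$

as N tends to infinity under the assumptions made. And by the triangle inequality

$$\lim_{N\to\infty} \frac{\|S_N F\|_2}{\sigma_N} = 1.$$

Using now the elementary inequality $|e^{iu} - e^{iv}| \le 2|\sin((v-u)/2)| \le |v-u|$ gives

$$\int_0^1 \big|\exp(i\lambda S_N F(t)/\sigma_N) - \exp(i\lambda F_j(t)/c_j)\big|\,dt \le |\lambda| \Big\|\frac{S_N F}{\sigma_N} - \frac{F_j}{c_j}\Big\|_1 \le |\lambda| \Big(2\varepsilon_{j+1} + \frac{2\varepsilon_j + |\rho_j|}{c_j}\Big) \to 0,$$

as N tends to infinity. Let now $\kappa : \mathbb{N} \to \mathbb{N}$ be some increasing map satisfying

$$\sum_{k>\kappa(j)} |\rho_k|^2 / c_j^2 \le \varepsilon_j^2 \qquad (j \ge 1),$$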

" # and write j = [j, κ(j )[, Fj = j j |ρ | k>j k k>j


which tends to 0 as j tends to infinity. Redefining J if necessary, we deduce that for all t ∈ T and j ≥ J , ( j j zk (t) exp[zk (t)2 /2] ≤ C3 sup |ρk |/ck . exp(iλFj (t)/bj ) − k>j

k∈j

Integrating now on X leads for j ≥ J to

0

1

exp(iλFj (t)/bj ) −

(

dt ≤ C3 sup |ρk |/ck .

j j zk (t) exp[zk (t)2 /2]

k>j

k∈j

Put for any integer j ≥ 1, ( ( j (λ, t) = (1 + iλzk (t)), j

Bj (t) =

k∈j

j

(zk (t))2 .

k∈j

1) )

We have 0 j (λ, t)dt = 1. Indeed the product being k 1 + iλ[βk1 cos 2π k t + βk2 sin 2πk t] is representable as a sum of 1 plus a linear combination of cos 2π(k1 ± · · · ± kr ) or sin 2π(k1 ± · · · ± kr ). Since we assumed (k ) to be r-lacunary with r ≥ 3, the fact that the representation of a number n as n = k1 ± · · · ± kr is unique allows us to conclude our argument. We can thus factorize as follows:

1(

0

2 2 (λ, t). exp −λ Bj (t)/2 dt − exp(−λ /4)

j

=

1(

0

j

1(

≤

(λ, t) exp −λ2 Bj (t)/2 − exp(−λ2 /4)dt.

0

Now

" 2 # 2 (λ, t) exp −λ Bj (t)/2 − exp(−λ /4) dt

j

( (λ, t) =

(

j

j

(1 + |λ|2 |zk (t)|2 )1/2 ≤ exp(λ2 Bj (t)/2).

j κj

(βk1 )2 +(βk2 )2 . 2

$

Then since 0 ≤ Bj (t) ≤ 2,

%

λ2 1 − Bj (t) dt 2 2 $ % 2 32 /4 1 (βk1 )2 + (βk2 )2 ≤ e + Cj 2 . 2 2 cj2 k>κ

1 − exp

−

j

It remains to observe that

Cj 22 ≤

1 1 |ρk |4 /cj4 ≤ sup |ρk |2 /ck2 → 0, 4 4 k>j k∈j

as j tends to infinity by assumption. Collecting now these various estimates finally gives: for |λ| ≤ and j ≥ J = J (), N ∈ Ij , 1 2εj + |ρj | SN F (t) λ2 exp(iλ )dt − exp(− ) ≤ 2εj +1 + σN 4 cj 0 + εj + C3 sup |ρk |/ck + C2 exp(32 /4) sup |ρk |2 /ck2 → 0 k>j

k>j

as N tends to infinity. This completes the proof.

Let us now consider the random Fourier series defined in (7.5.4) and assume that the sequences $g$, $g'$ are independent and Gaussian.

7.5.5 Theorem (CLT for sample paths). Assume that

$$\lim_{j\to\infty} \frac{|\beta_j| (\log j)^{1/2}}{b_j} = 0, \qquad \sum_{j\ge 1} \frac{\varepsilon_j^2 \log N_{j+1}}{b_j^2} < \infty. \qquad (7.5.23)$$

Then almost all sample paths of X satisfy CLT.

Proof of Theorem 7.5.5. We may write

$$X(\omega, t) = \Re \sum_{k\in\mathbb{N}} \rho_k(\omega) e_{\ell_k}(t) \quad\text{with}\quad \rho_k(\omega) = \beta_k g_k(\omega) + i\beta_k g_k'(\omega) \qquad (k \ge 1).$$


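Put

$$\varphi_N(\omega)^2 = \sum_{k>j} \big(g_k(\omega)^2 + g_k'(\omega)^2\big) \beta_k^2 \qquad (N \in I_j,\ j \ge 1).$$

Consider the Gaussian chaos of order 2,

$$B_j = \sum_{k>j} \frac{\beta_k^2}{b_j^2} \big(g_k^2 + g_k'^2 - 2\big).$$

By the hypercontractivity properties of Gaussian chaos (Ledoux–Talagrand [1991: p. 65]), for any integer $q \ge 2$,

$$\|B_j\|_{q,P} \le q \|B_j\|_{2,P}.$$

Let $w_j = \|B_j\|_{2,P}$ and $\alpha_j = \frac{1}{2e w_j}$. Then

$$\mathbf{E} \exp(\alpha_j |B_j|) \le 1 + \alpha_j \|B_j\|_{1,P} + \sum_{q=2}^{\infty} \frac{1}{q!} (\alpha_j q)^q w_j^q.$$

Since $n! > \sqrt{2\pi n}\, n^n e^{-n} \exp\big\{\frac{1}{12n + \frac14}\big\}$ for any positive n (Mitrinović [1970: p. 183]), we deduce

$$\sum_{q=2}^{\infty} \frac{1}{q!} (\alpha_j w_j q)^q \le \frac{1}{2\sqrt{\pi}} \sum_{q=2}^{\infty} 2^{-q} = \frac{1}{4\sqrt{\pi}}.$$

Thus $\mathbf{E} \exp(\alpha_j |B_j|) \le C$ (with $C = 1 + \frac{1}{2e} + \frac{1}{4\sqrt{\pi}}$). Applying then Tchebycheff's inequality gives, for any positive real $\eta$,

$$P\{|B_j| \ge \eta\} \le C \exp\Big(-\frac{\eta}{2e w_j}\Big).$$

Now $\|B_j\|_{2,P}^2 = 2(\mathbf{E} N(0,1)^4 - 1) \sum_{k>j} \beta_k^4 / b_j^4$. By assumption,

$$\sum_{k>j} \frac{\beta_k^4}{b_j^4} = o\Big(\sum_{k>j} \frac{\beta_k^2}{b_j^2 \log^2 j}\Big) = o\Big(\frac{1}{\log^2 j}\Big).$$

Thus $w_j = o\big(\frac{1}{\log j}\big)$. Let $\rho > 0$ be such that $2e\rho < \eta$. For j large, $w_j \le \rho/\log j$, and so

$$P\{|B_j| \ge \eta\} \le C \exp\Big(-\frac{\eta}{2e w_j}\Big) \le \exp\Big(-\frac{\eta}{2e\rho} \log j\Big).$$

By the Borel–Cantelli lemma and using the fact that $\eta$ can be arbitrarily small, we deduce $|B_j| = o(1)$ a.s., or

$$\lim_{j\to\infty} \frac{1}{b_j^2} \sum_{k>j} \big(g_k^2 + (g_k')^2\big) \beta_k^2 = 2 \quad\text{a.s.}$$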

Therefore NϕN (ω) 1 2 gk (ω) + (gk (ω))2 βk2 = 2, = lim 2 N→∞ j →∞ b τN j lim

k>j


P-almost surely. The assumption of Theorem 7.5.3 here reduces to lim

1/2

εj + gj2 (ω) + (gj (ω))2 |βj |

j →∞

bj

= 0, g (ω)2 +g (ω)2

< which is trivially satisfied under the assumptions made, since supk≥1 k log kk ∞, P-almost surely. Applying now Theorem 7.5.3 gives for P-almost all trajectories of X(ω), SN (X(ω))/N ϕN (ω) "⇒ N (0, 1), √ as N tends to infinity. And thus SN (X(ω))/( 2τN ) ⇒ N (0, 1), as N tends to infinity, P-almost surely. Now we deduce from Remark 7.5.2 that SN (X) − j lim sup

j →∞ N∈Ij

τN

2,λ

= 0 "⇒

SN (X) lim sup

j →∞ N ∈Ij

τN

2,λ

− j 2,λ = 0,

P-almost surely. But we have seen that 1 2 gk (ω) + (gk (ω))2 βk2 = 2, 2 j →∞ b j k>j

lim j 22,λ = lim

j →∞

P-almost surely. Thus P limN →∞

SN (X(ω) 2,λ τN

=

√ 2 = 1. Finally,

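$S_N(X(\omega)) / \|S_N(X(\omega))\|_{2,\lambda} \Rightarrow N(0,1)$, as N tends to infinity, P-almost surely. This completes the proof.

7.5.6 Examples. We end the section with some examples.

Example 1. Put for $k \ge 3$ ($\alpha > 2$, $b > 1$),

$$\varepsilon_k = k^{-\alpha/2}, \qquad \beta_k = \big(|k| (\log |k|)^b\big)^{-1/2}.$$

Then $\prod_{k=2}^{n-1} \varepsilon_k^{-2} = \exp\big(\sum_{k=2}^{n-1} \alpha \log k\big) \asymp \exp(\alpha' n \log n)$, and so $N_n \asymp \exp(\alpha'' n \log n)$, where $\alpha' > 0$ and $\alpha'' > 0$ depend on $\alpha$ only. Besides $b_j^2 \asymp \sum_{|k| \ge j+1} |k|^{-1} (\log |k|)^{-b} \asymp (\log j)^{-b+1}$. Further $\frac{\beta_j}{b_j} \sqrt{\log j} \asymp \frac{1}{\sqrt{j}} \to 0$, as j tends to infinity, and

$$\sum_{j\ge 1} \frac{\varepsilon_j^2 \log N_{j+1}}{b_j^2} \asymp \sum_{j\ge 1} \frac{(\log j)^{b-1}\, j \log j}{j^{\alpha}} < \infty.$$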

Thus r < ∞ for any r > 2. Consequently Wr (X(ω)) < ∞ and X(ω) ∈ CLT for almost all ω ∈ .


Example 2. Put for k ≥ 3 (c > b > 1), εk = exp[−(log k)c ],

βk2 = exp[−(log k)b ] − exp[−(log(k + 1))b ]. β2

Then βj2 3 jb (log j )b−1 exp[−(log(j ))b ]. Hence bj2 3 jb (log j )b−1 . Further εn ηn−1 3 j

exp[−n(log n)c ], and so Nn 3 exp[−n(log n)c ]. Therefore εj2 log Nj +1 j ≥1

bj2

3

exp[−(log j )c + (log j )b ]j (log j )c < ∞.

j ≥1

Thus r < ∞ for any r > 2. Consequently Wr (X(ω)) < ∞, X(ω) is continuous and X(ω) ∈ CLT for almost all ω ∈ . The third example will be used in the next section. Example 3. Put for k ≥ 1, βk2 = log−4 (k + 2) exp[−k log−4 k],

εk2 = k −4 exp[−k log−4 k].

− x − x The following lemma is elementary (i.e., uses e (log x)4 3 (log1x)4 e (log x)4 , x large). For any positive integer j put dj =

Nj ≤N j

ak2 bi2

|aj | 2 k>j ak

1/2 = 0,

1/2 = lim j →∞

|bj | i>j

bi2

1/2 = 0.

Then τ (f, g) SN D −→ N (0, 1). τ

SN (f, g) 2

Proof. Plainly τ SN (f, g)(t)

∞ N = ak bi k,i (t, θ ), 2 k,i=1

where we put "

# k,i (t, θ ) = cos 2π t (k + i )$ VN (k − 2i )θ "

# + cos 2π t (k − i )$ VN (k + 2i )θ "

# − sin 2π t (k + i )4 VN (k + 2i )θ "

# − sin 2π t (k − i )4 VN (k − 2i )θ .

(7.5.25)

Since L is assumed to be ρ-lacunary with ρ ≥ 3, the decomposition n = k + i is unique; hence τ

SN (f, g) 22 =

∞ 2

2 N 2 2 2

ak bi VN (k + 2i )θ + VN (k − 2i )θ . 4 k,i=1


Next put for any j ≥ 1 and N ∈ Ij N τ SˇN (f, g) = ak bi k,i (t, θ ), 2 k,i>j

τ SN (f, g) =

N 2

ak bi k,i (t, θ ),

k or i <j

N N τ sN (f, g) = aj bi j,i (t, θ ) + bj ak k,j (t, θ ), 2 2 i≥j

(7.5.26)

k≥j

N Rjτ (f, g) = ak bi {cos 2π t (k + i ) + cos 2π t (k − i ) 2 k,i>j

− sin 2π t (k + i ) −sin 2π t (k − i )}. Then, τ Sˇ (f, g) − R τ (f, g) 2 N j 2 2

2 N 2 2 2

= ak bi VN (k + 2i )θ − 1 + VN (k − 2i )θ − 1 . 4 k,i>j

Define for u ∈ R, | u | = dist(u, Z); ( | u | ∈ [0, 1/2] and | u + v | ≤ | u | + | v | ). Then VN (ψ) = VN (| ψ |). We assume that N1 is chosen so that ε1 /(16N1 ) ≤ 1/8. Thus | i θ | = {i θ} and | 2i θ | = {2i θ } = 2{i θ }, (i ≥ 1). Combining these elementary observations with (7.5.5) gives

VN (k ± 2i )θ − 1 = VN | (k ± 2i )θ | − 1 ≤ 32N| (k ± 2i )θ | ≤ 32N | k θ | + 2| i θ | η ηi k (7.5.27) ≤ 64Nj +1 + 16k N1 16i N1 6416j N1 ηk ηi ≤ ≤ 8εj +1 . + ηi εj +1 16k N1 16i N1 τ 2 Therefore SˇN (f, g) − Rjτ (f, g) 2 ≤ 4N 2 k,i>j ak2 bi2 εj2+1 and so τ Sˇ (f, g) − R τ (f, g) ≤ 2N f 2 g 2 εj +1 . (7.5.28) N j 2 τ (f, g), we have As regarding SN 2 τ S (f, g) 2 = N N 2 4

2

2

ak2 bi2 VN (k + 2i )θ + VN (k − 2i )θ .

k or i <j

Observe that {k θ } = | k θ | ≤ 1/8, {2i θ } = 2{i θ } = | 2i θ | ≤ 1/4. Hence | (k ± 2λi )θ | = {k θ } ± 2{λi θ }.


Thus, by (7.5.5) VN (k ± 2λi )θ ≤ 1/2Nj {k θ } ± 2{λi θ }. By distinguishing the three cases k = i, k > i and k < i, it is easy to show that {k θ } ± 2{λi θ } ≥ 3 {h θ } (h = k ∧ i). 4 Since h < j we have {h θ} ≥ Therefore

3 ηj −1 ηj −1 1 16j −1−h "⇒ {k θ } ± 2{λi θ } ≥ . j −1 . . j −1 ηh ≥ j −1 2 16 N1 2.16 N1 8 16 N1 j −1

VN (k ± 2λi )θ ≤ 8 . ηj −1 εj . 16 N1 ≤ 3εj . 3 16j −1 N1 ηj −1

(7.5.29)

Thus 9N 2 εj2 τ τ 3N εj 2 2 S (f, g) 2 ≤ "⇒ SN a b (f, g) 2 ≤

f 2 g 2 . k i N 2 4 2 k or i <j

(7.5.30) Finally 1/2 1/2 τ 2 s (f, g) ≤ N |aj | . b + |b | ak2 j i N 2 i>j

(7.5.31)

k>j

Summarizing, we find by combining estimates (7.5.28), (7.5.30) and (7.5.31), τ S (f, g) − R τ (f, g) N j 2 τ τ τ τ ˇ ≤ SN (f, g) − Rj (f, g) 2 + SN (f, g) 2 + sN (f, g) 2 (7.5.32) $ 1/2 1/2 % ≤ 4N f 2 g 2 εj + N |aj | bi2 + |bj | ak2 . i>j

Therefore

k>j

τ τ Rjτ (f, g) SN (f, g) 2 SN (f, g) R τ (f, g) − 1 ≤ R τ (f, g) − R τ (f, g) 2 2 2 2 j j j εj ≤ 4 f 2 g 2 2 2 1/2 k,i>j ak bi $ % |aj | |bj | + + .

2 1/2 2 1/2 k>j ak i>j bi

Under the assumptions made, we deduce τ Rjτ (f, g) SN (f, g) = 0, − lim sup j →∞ N ∈Ij Rjτ (f, g) 2

Rjτ (f, g) 2 2 τ (f, g)

SN 2 = 0. lim sup − 1 2 2 1/2 j →∞ N ∈Ij N k,i>j ak bi )

(7.5.33)


Thus the normalized partial are close – in the L2 (λ)-norm – to the sequence sums τ R (f,g) of normalized remainders R τj(f,g) 2 . We are led to the same situation as the one j

j

treated in detail before, and we deduce that τ (f, g) SN D −→ N (0, 1), τ

SN (f, g) 2

(7.5.34)

as N tends to infinity. We now pass to the study of another example.

Weighted ergodic sums. Let $\sigma = \{\sigma_k, k \ge 0\}$ be a sequence of reals and put $S_N = \sum_{k=0}^{N-1} \sigma_k$. Consider for $f \in L^2(m)$ the weighted sums

$$U_N^{\tau} f = \sum_{k=0}^{N-1} \sigma_k\, f \circ \tau^k \qquad (N \ge 1). \qquad (7.5.35)$$

The kernels corresponding to the averages of these weighted sums are defined by $W_N(t) = \frac{1}{S_N} \sum_{k=0}^{N-1} \sigma_k e^{2i\pi kt}$. We have the easy estimates valid for any $M \ge N \ge 0$ and $t \in X$,

i) $|W_M(t) - W_N(t)| \le 4\pi M t\, \dfrac{S_M - S_N}{S_M}$,

ii) $|W_M(t) - 1| \le 4\pi M t$,

iii) $|W_N(t)| \le \dfrac{2\sigma_\infty}{S_N |\sin \pi t|}$ $\quad \big(\sigma_\infty = \sigma_0 + \sum_{k=0}^{\infty} |\sigma_k - \sigma_{k+1}|\big)$. $\qquad (7.5.36)$

The first estimate comes from the equation $W_M(t) - W_N(t) = \sum_{k=0}^{M-1} \frac{\sigma_k}{S_M} (e^{2i\pi kt} - 1) - \sum_{k=0}^{N-1} \frac{\sigma_k}{S_N} (e^{2i\pi kt} - 1)$; the last is obtained by Abel summation.

Let $\mathcal{N} = \{N_j, j \ge 1\}$ be some rapidly increasing sequence of integers greater than 2 and write again $I_j = [N_j, N_{j+1}[$. We assume that $\eta_j := N_j/N_{j+1}$ tends to 0 as j tends to infinity. We choose a sequence $L = \{\ell_j, j \ge 1\}$ so that

$$\frac{1}{N_k} - \frac{1}{N_k^2} \le \{\ell_k\theta\} < \frac{1}{N_k}.$$

Let $\beta^1 = (\beta_k^1)_{k\ge 1}$, $\beta^2 = (\beta_k^2)_{k\ge 1}$, $\beta^1, \beta^2 \in \ell^2$, and put $\rho = (\rho_k)_{k\ge 1}$ with $\rho_k = \beta_k^1 + i\beta_k^2$ and consider

$$F(t) = \sum_{k=1}^{\infty} \Re\big(\rho_k e_{\ell_k}(t)\big) \qquad (\forall t \in \mathbb{T}).$$
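The kernel estimates (7.5.36) are easy to test numerically. The sketch below is a minimal illustration under an added assumption not stated above, namely that the weights are positive; it uses the hypothetical choice $\sigma_k = 1/(k+1)$, for which $\sigma_\infty = \sigma_0 + \sum_k |\sigma_k - \sigma_{k+1}| = 2$, and checks i), ii) and iii) for a few pairs $M \ge N \ge 1$ and points $t \in (0,1)$. All names are chosen for this example only.

```python
import cmath
import math

def kernel(sigma, n, t):
    """W_n(t) = (1/S_n) * sum_{k<n} sigma_k e^{2 i pi k t}, S_n = sum_{k<n} sigma_k."""
    s_n = sum(sigma[:n])
    return sum(w * cmath.exp(2j * math.pi * k * t) for k, w in enumerate(sigma[:n])) / s_n

# Assumed positive weights sigma_k = 1/(k+1); then sigma_infty = 1 + 1 = 2.
sigma = [1.0 / (k + 1) for k in range(120)]
sigma_inf = 2.0

def estimates_hold(m, n, t, tol=1e-12):
    """Check estimates i), ii), iii) of (7.5.36) at one point t, for m >= n >= 1."""
    s_m, s_n = sum(sigma[:m]), sum(sigma[:n])
    w_m, w_n = kernel(sigma, m, t), kernel(sigma, n, t)
    first = abs(w_m - w_n) <= 4 * math.pi * m * t * (s_m - s_n) / s_m + tol
    second = abs(w_m - 1) <= 4 * math.pi * m * t + tol
    third = abs(w_n) <= 2 * sigma_inf / (s_n * abs(math.sin(math.pi * t))) + tol
    return first and second and third

all_hold = all(
    estimates_hold(m, n, i / 200)
    for m in (60, 120)
    for n in (1, 30, 60)
    for i in range(1, 200)
)
print(all_hold)
```

Estimates i) and ii) are informative only for small $t$ (of order $1/M$), which is the regime used when $t = \{\ell_k\theta\}$ is small; iii) takes over away from 0.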

7.5.9 Theorem. Assume that SN ≥ N α for some positive α. If


i) limj →∞

ηj

k>j

ρk2

1/2 = 0, ρ2

k = 0 (∀0 < a ≤ 1), ii) limj →∞ aj ≤k≤j 1/2 k>j

ρk2

then

UNτ F D −→ N (0, 1).

UNτ F 2

If ρk2 = √ k −1 (log k)−2 , then condition (ii) is satisfied, and condition (i) means limj →∞ ηj / log j = 0. Theorem 7.5.10 applies in this case provided that N grows sufficiently fast. Proof. By estimates (7.5.36): for N ∈ Ij , WN (k θ ) − 1 = WN ({k θ }) − 1 ≤ 4π ηj +1 WN (k θ ) ≤ 8σ∞ η[j a]−1 (k < [j a]). π Put for any j ≥ 1, N ∈ Ij and t ∈ X: " # Uˇ N (F ) = SN $ ρk WN (k θ )ek , k>j +1

uN (F ) = SN

(k > j + 1),

UN (F ) = SN

"

#

$ ρk VN (k θ )ek ,

" # $ ρk WN (k θ )ek ,

k≤[j a]

Fj (t) =

[j a] 0.

N −1 k=0

f τ P (k)

Proof. We use the circle method. Let N = {Nk , k ≥ 1} be increasing to infinity and put $ % 1 1 1 k = − 2, . Nk Nk Nk 1/ h

Let 0 < h < 1/3 be fixed. Put for k ≥ 1 and N ≥ Nk , 2−h k (N) = α ∈ k : ∃q ≤ N h and a with (a, q) = 1 : α − qa < N2 . Then, λ(k (N)) ≤

#

2−h a 2 2−h " λ k ∩ qa − N2 ,q + N .

q≤N h

(a,q)=1 dist(a/q,k )

1 N 2−h

.

Let α ∈ N=2p ,N ≥N 1/ h k (N )c ; then for all p ≥ p0 , q ≤ (2p )h and (a, q) = 1, k α − a < 2p2−h . Let 2p ≤ N < 2p . For q ≤ N h , we have q ≤ (2p )h and for a such 2−h q (2 ) that (a, q) = 1, 2−h 1 α − a ≥ 2 ≥ 2−h . p 2−h q (2 ) N 1 Therefore, for any p ≥ p0 and 2p ≤ N < 2p the inequality α − qa ≥ N 2−h , is fulfilled h for any q ≤ N and a with (a, q) = 1. This implies that λ(∗k ) ≥

1 λ(k ) > 0. 2

By using Birkhoff’s theorem: for almost all x ∈ X and integers k ≥ 1, #{0 ≤ j < J : x + j θ ∈ ∗k (mod 1)} = λ(∗k ). J →∞ J lim

Thus we can find an x ∈ X and an increasing sequence of positive integers L = {k , j ≥ 1} such that x + k θ ∈ ∗k (mod 1) (∀k ≥ 1). Define, for ρ = {ρk , k ≥ 1} ∈ 2 , F =

∞

$ ρk .ek +x/θ . k=1

The system of functions (ek +x/θ , k ≥ 1) is orthonormal, and τ N F =N

∞

$ ρk .ek +x/θ · QN (k θ + x) . k=1

h a) We fix N. We first consider the summation block1 corresponding to N ≤ Nk < a We have the following estimate: if α − q < q 2 and (a, q) = 1,

N 2−h .

N 1 q 1/2 2iπ αn2 1+ε 1 e . + + 2 < Cε N q N N n=1

We choose ε = h/4 and assume N1 > 2. Since x + k θ ∈ ∗k ⊂ k (mod 1), we have 1 1 1 − 2 ≤ | x + k θ | = {x + k θ } < , Nk Nk Nk


and so N 1 1 Nk 1/2 2 e2iπ(x+k θ )n < Ch N 1+h/4 + ≤ Ch N 1−h/4 . + 2 Nk N N n=1

Therefore |QN (x + k θ )| ≤ Ch N −h/4

(N h ≤ Nk < N 2−h )

and X

2 $ ρk · ek +x/θ · NQN (k θ + x) dt

k: N h ≤Nk j

Our first objective is to prove the following proposition. 7.6.2 Proposition. There exists an absolute constant C such that for any j large enough and |λ| ≤ j ,

1 0

2 Fj (t) |ρk | −λ2 /2 ≤ Cej sup exp(iλ )dt − e . cj k>j ck

Proof. Let ε = {εj , j ≥ 1} be a decreasing sequence of reals satisfying εj ≤ sup |ρk |.

(7.6.8)

k>j

Let now κ : N → N be some increasing function satisfying for any positive integer j , |ρk |2 /cj2 ≤ εj2 , (7.6.9) k>κ(j )

7.6 Lacunary series and convergence in variation


" # and put j = [j, κ(j )[, Fj = j κj

≤

2j 32 /4 " 2 e j εj 2

(7.6.13)

# + Cj 2 .

It remains to observe that 1 1 1 |ρk |4 /cj4 ≤ sup |ρk |2 /cj2 ≤ sup |ρk |2 /ck2 .

Cj 22 ≤ 4 4 k>j 4 k>j

(7.6.14)

k∈j

Putting together estimates (7.6.9) to (7.6.14) finally gives: there exists J2 < ∞ such that for any j ≥ J2 , |λ| ≤ j and N ∈ Ij , 1 2

(t) F λ j )dt − exp − exp(iλ cj 2 0 3j 3 2 1 sup |ρk | + 2j e 4 j εj2 + sup |ρk | ≤ cj k≥j 2cj k>j 2 |ρk | 3 3 2 43 2j |ρk | ≤ sup j + j e ≤ Cej sup . 2 k>j ck k>j ck This proves Proposition 7.6.2. Application to Gaussian random Fourier series. Now we pass to random Fourier series. Let g = (gk )k∈N , g = (gk )k∈N be two independent sequences of N (0, 1) distributed


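random variables defined on a probability space $(\Omega, \mathcal{A}, P)$ different from $(\mathbb{T}, \lambda)$. Let $\beta = (\beta_k)_{k\in\mathbb{N}} \in \ell^2$. Put $b_j = \big(\sum_{k>j} \beta_k^2\big)^{1/2}$. We assume that the following condition is satisfied:

$$\lim_{k\to\infty} e^{b_k^{-\varepsilon}} \sup_{j\ge k} |\beta_j| \sqrt{\log j} = 0 \quad\text{for some } \varepsilon > 0. \qquad (B1)$$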

−2/δ

k = bk

,

(7.6.15)

where δ is chosen sufficiently large to satisfy 2

2 e j lim j sup |βj | log j = 0. k→∞ bj j ≥k

(7.6.16)

It is enough to take δ > 4/ε, where ε is defined in (B1). For k ≥ 1 we put ρk = βk (gk + 1 2 2 2 1

2 2 = 2 igk ). Then with the preceding notation, cj = k>j |ρk | k>j gk +gk βk . Consider for t ∈ T, ω ∈ the following Gaussian random Fourier series:

X(ω, t) = $ ρk (ω)ek (t) k∈N

=

∞

βk gk (ω) cos 2π k t + gk (ω) sin 2π k t ,

k=1

Tj (ω, t) =

1

$ ρk (ω)ek (t) cj k>j

1

= βk gk (ω) cos 2π k t + gk (ω) sin 2π k t , cj

(7.6.17)

k>j

1

j (ω, t) = √ $ ρk (ω)ek (t) bj 2 k>j 1

= √ βk gk (ω) cos 2π k t + gk (ω) sin 2π k t . bj 2 k>j 7.6.3 Proposition. Under conditions (B1) and ∞

βk2 log2 k < ∞,

(B2)

k=1

we have sup

|λ|≤j

0

1

exp(iλTj (ω, t))dt − e

−λ2 /2 a.s.

= O e

2j

√ |βj | log j . bj


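Proof. Consider the Gaussian chaos of order 2,

$$B_j = \sum_{k>j} \frac{\beta_k^2}{b_j^2} \big(g_k^2 + g_k'^2 - 2\big).$$

By the hypercontractivity properties of Gaussian chaos (see for instance Ledoux and Talagrand [1991: inequality 3.8]), for any integer $q \ge 2$,

$$\|B_j\|_{q,P} \le q \|B_j\|_{2,P}.$$

Let $w_j = \|B_j\|_{2,P}$ and $\alpha_j = \frac{1}{2e w_j}$. Then,

$$\mathbf{E} \exp(\alpha_j |B_j|) \le 1 + \alpha_j \|B_j\|_{1,P} + \sum_{q=2}^{\infty} \frac{1}{q!} (\alpha_j q)^q w_j^q.$$

Using the elementary estimate $n! > \sqrt{2\pi n}\, n^n e^{-n} \exp\big\{\frac{1}{12n + \frac14}\big\}$ valid for any positive integer n, we deduce

$$\sum_{q=2}^{\infty} \frac{1}{q!} (\alpha_j w_j q)^q \le \frac{1}{2\sqrt{\pi}} \sum_{q=2}^{\infty} 2^{-q} = \frac{1}{4\sqrt{\pi}}.$$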
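The series bound just obtained can be confirmed by direct computation. Since $\alpha_j w_j = 1/(2e)$, the terms are $(q/(2e))^q/q!$ and do not depend on $j$. The sketch below (names and the truncation level 60 are choices made for this illustration) checks the termwise majorant $2^{-q}/(2\sqrt{\pi})$ and the resulting bound $1/(4\sqrt{\pi})$ on the truncated sum.

```python
import math

def chaos_terms(q_max=60):
    """Terms (alpha_j w_j q)^q / q! with alpha_j w_j = 1/(2e), for q = 2..q_max."""
    return [(q / (2 * math.e)) ** q / math.factorial(q) for q in range(2, q_max + 1)]

terms = chaos_terms()
# Termwise majorants 2^{-q} / (2 sqrt(pi)) used in the text.
majorants = [2.0 ** (-q) / (2 * math.sqrt(math.pi)) for q in range(2, 61)]
partial_sum = sum(terms)
limit = 1 / (4 * math.sqrt(math.pi))
print(partial_sum, limit)
```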

Thus E exp(α|Bj |) ≤ C (with C = 1 + inequality gives for any positive real η,

1 2e

+

1 √ ). 4 π

Applying then Tchebycheff’s

ηj . P |Bj | ≥ ηj ≤ C exp − 2ewj

(7.6.18)

Now $\|B_j\|_{2,P}^2 = 2(\mathbf{E} N(0,1)^4 - 1) \sum_{k>j} \beta_k^4 / b_j^4$, and by assumption (B1) we have for any k large enough $|\beta_k| \le (\log k)^{-1/2} \exp(-b_k^{-\varepsilon})$. As $\exp(b_k^{-\varepsilon}) \ge C_p b_k^{-\varepsilon p}$, we get the bound $|\beta_k| \le C (\log k)^{-1/2} b_k^2$ for a suitable choice of p. Thus

$$\sum_{k>j} \frac{\beta_k^4}{b_j^4} \le \frac{C}{\log j} \sum_{k>j} \beta_k^2 \le \frac{C}{\log^2 j} \sum_{k\ge 1} \beta_k^2 \log k \le \frac{C}{\log^2 j}, \qquad (7.6.19)$$

where we have used (B2) to get the last inequality. Henceforth, for j large

P |Bj | ≥ ηj

ηj ≤ C exp − 2ewj

≤ exp(−Cηj log j ).

By the Borel–Cantelli lemma and using the fact that η can be arbitrarily small, we deduce a.s. |Bj | = o(−1 (7.6.20) j ).


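In particular,

$$\lim_{j\to\infty} \frac{1}{b_j^2} \sum_{k>j} \big(g_k^2 + (g_k')^2\big) \beta_k^2 = 2 \quad\text{a.s.} \qquad (7.6.21)$$

It follows that $\lim_{j\to\infty} \frac{c_j}{b_j} = \sqrt{2}$ a.s. As $\mathbf{E} \sup_{k\ge 1} \big(g_k^2 + (g_k')^2\big)/\log k < \infty$, we have

$$\sup_{k>j} |\rho_k| = \sup_{k>j} \frac{|\beta_k| \big(g_k^2 + (g_k')^2\big)^{1/2} \sqrt{\log k}}{\sqrt{\log k}} \le C \sup_{k>j} |\beta_k| \sqrt{\log k},$$

where C is a random variable with finite expectation. Therefore, by (7.6.16),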

lim e

j →∞

2j

2 |ρk | √ ej a.s. sup ≤ 2C lim sup |βk | log k = 0. j →∞ bj k>j k>j ck

This implies that assumption (H2 ) is satisfied for almost all sample paths of X. Applying now Proposition 7.6.2 gives 1 √ |βk | log k 2j −λ2 /2 a.s. exp(iλTj (ω, t))dt − e . sup = O e sup bk 0 |λ|≤j k>j This proves Proposition 7.6.3. We close the section by establishing the following corollary. 7.6.4 Corollary. Under assumptions (B1), (B2) and with the choice of j defined above, 1 2 1 2 sup max 0 exp(iλTj (ω, t))dt − e−λ /2 , 0 exp(iλj (ω, t))dt − e−λ /2 |λ|≤j

equals o(−1 j ) almost surely. Proof. By assumption, we deduce from Proposition 7.6.3 the inequality concerning Tj . Now 1! exp(iλTj (t)) − exp(iλj (t)) dt 0 1 Tj (t) − j (t)dt ≤ |λ| 0 (7.6.22) cj 1 |Tj (t)|dt = |λ|1 − √ bj 2 0 cj cj ≤ |λ|1 − √ Tj 2,λ = |λ|1 − √ . bj 2 bj 2


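But,

$$\Big|1 - \frac{c_j}{b_j\sqrt{2}}\Big| \le \Big|\frac{\sum_{k>j} \big(g_k^2 + (g_k')^2\big) \beta_k^2}{2 b_j^2} - 1\Big|.$$

The proof is then completed by using (7.6.20).

Small values of some trigonometric series. In this section we give conditions under which for any $\alpha > 0$, the following estimate is fulfilled:

$$M(\alpha) = \sup_j \int_0^1 \frac{dt}{\Big(\sum_{k\ge j} \beta_k^2 \sin^2 \pi\ell_k t \Big/ \sum_{i\ge j} \beta_i^2\Big)^{\alpha}} < \infty. \qquad (E)$$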

First, we prove a series of intermediate results. 7.6.5 Proposition. Let K > 0 be an integer, (γn )n≥1 be a positive sequence with 2 n≥1 γn < ∞, and L = {n , n ≥ 1} be a sequence of integers such that n+1 is a multiple of n for every n ≥ 1. Set r = K /1 . Then, for any ε > 0, we have λ

n≥1

γn2 sin2 π n x

0. Then we have

$$\lambda(I \cap A) \le \lambda(I)\lambda(A) + \frac{1}{s} \lambda(A).$$

Proof. It suffices to prove the estimate asserted by the lemma for sets of the form $A = \bigcup_{i=0}^{s-1} (\Delta + (i/s))$, where $\Delta$ is an interval of length $\lambda(\Delta) \le 1/s$. Indeed, once this is proved, the result follows since the Borel $\sigma$-field coincides with the monotone class generated by the Boolean algebra of disjoint unions of finitely many sets $A_i$, where each $A_i$ has the same form as A.

We turn now to the proof of the assertion for the set $A = \bigcup_{i=0}^{s-1} (\Delta + (i/s))$ with $\Delta$ as above. The maximal number of such $i \in \{0, \ldots, s-1\}$ for which $\Delta + (i/s)$ intersects I is bounded above by $(\lambda(I)/(1/s)) + 1 = \lambda(I)s + 1$. Then

$$\lambda(I \cap A) \le (\lambda(I)s + 1)\lambda(\Delta) = \lambda(I) s \lambda(\Delta) + \frac{s\lambda(\Delta)}{s} = \Big(\lambda(I) + \frac{1}{s}\Big) \lambda(A),$$

which completes the proof of the lemma.
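The estimate of Lemma 7.6.6 can be checked exactly on the model sets used in its proof. The sketch below is an illustration under assumptions: it takes $A$ to be a union of $s$ translates by $i/s$ of a half-open interval of length $d \le 1/s$ contained in $[0, 1/s)$, takes $I = [u, v) \subset [0,1]$, and compares $\lambda(I \cap A)$ with $\lambda(I)\lambda(A) + \lambda(A)/s$ in exact rational arithmetic. The parameter ranges and function names are chosen for this example.

```python
from fractions import Fraction as Fr

def overlap(a, b, c, d):
    """Length of the intersection of the intervals [a, b) and [c, d)."""
    return max(Fr(0), min(b, d) - max(a, c))

def both_sides(s, a, d, u, v):
    """Return (lambda(I ∩ A), lambda(I)lambda(A) + lambda(A)/s) for
    A = union over i < s of [a + i/s, a + d + i/s) and I = [u, v)."""
    pieces = [(a + Fr(i, s), a + d + Fr(i, s)) for i in range(s)]
    lam_a = s * d
    lam_i = v - u
    lhs = sum(overlap(u, v, p, q) for p, q in pieces)
    rhs = lam_i * lam_a + lam_a / s
    return lhs, rhs

cases = []
for s in range(1, 6):
    for ia in range(0, 4):
        for idd in range(1, 5 - ia):
            a, d = Fr(ia, 4 * s), Fr(idd, 4 * s)  # ensures a + d <= 1/s
            for iu in range(0, 12):
                for iv in range(iu + 1, 13):
                    cases.append(both_sides(s, a, d, Fr(iu, 12), Fr(iv, 12)))

lemma_holds = all(lhs <= rhs for lhs, rhs in cases)
print(len(cases), lemma_holds)
```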


Proof of Proposition 7.6.5. We have that λ x ∈ T: γn2 sin2 π n x < ε n≥1

γn2 sin2 π n x < ε ≤ λ x ∈ T : γ12 sin2 π 1 x < ε, n≥K

(7.6.23)

n = λ x ∈ T : γ12 sin2 π x < ε, γn2 sin2 π rx < ε , 1 n≥K

and it follows now from Lemma 7.6.6 that γn2 sin2 π n x < ε λ x ∈ T: n≥1

ε 1 ≤ λ x ∈ T : sin π x < 2 + λ x ∈ T: γn2 sin2 π(n /1 )x < ε K γ1 n≥K 2

ε1/2 1 ≤ λ x ∈ T : 2x < + λ x ∈ T: γn2 sin2 π(n /1 )x < ε γ1 K

=

1/2 ε

γ1

+

n≥K

1

r

λ x ∈ T:

γn2 sin2 π n x < ε .

n≥K

7.6.7 Proposition. Assume that $\{\gamma_n, n \ge 1\}$ is a sequence of positive reals satisfying $\sum_{n\ge 1} \gamma_n^2 < \infty$ and let $(\ell_n)_{n\ge 1}$ be a sequence of integers such that $\ell_{n+1}$ is a multiple of $\ell_n$ for every $n \ge 1$. Furthermore, let $m > 1$ and $K_m > K_{m-1} > \cdots > K_0 \ge 1$ be some integers. Then,

$$\lambda\Big\{x \in \mathbb{T} : \sum_{n\ge 1} \gamma_n^2 \sin^2 \pi\ell_n x < \varepsilon\Big\} \le \prod_{j=1}^{m} \Big(\frac{\varepsilon^{1/2}}{\gamma_{K_{j-1}}} + \frac{\ell_{K_{j-1}}}{\ell_{K_j}}\Big).$$

Proof. Follows from Proposition 7.6.5 by induction.

As an immediate consequence, we have the following corollary.

7.6.8 Corollary. Under the assumptions of Proposition 7.6.7, let us additionally suppose that for $s = 1, \ldots, m$, $\ell_{K_s}/\ell_{K_{s-1}} \ge \rho$ and that $\gamma_n \ge \gamma > 0$ for $n = 1, \ldots, K_m$. Then we have

λ x∈T:

n≥1

γn2 sin2 π n x

pm , we have pm ≥ kδm , thus γmp ≥ 2−δmp , and we can continue with 1 m 1 λ x∈T: γn2 sin2 π n x < 2−p ≤ 2− 2 p 2δmp + 2−p ≤ 2m · 2−p( 2 −δm)m n≥1

$\le 2^m \cdot 2^{-pm/6}$.

Define

$$d_\gamma(x) = \sum_{n=1}^{\infty} \gamma_n^2 \sin^2 \pi\ell_n x.$$

Thus for any sequence $\gamma$ satisfying $(H_3)$, we have proved the following assertion: For any integer $m \ge 1$, there exists a number $p_m$ depending on m and $\gamma$ only, such that for any $p > p_m$ the following estimate holds true:

$$\lambda\big\{x \in \mathbb{T} : d_\gamma(x) < 2^{-p}\big\} \le 2^m \cdot 2^{-pm/6}. \qquad (7.6.24)$$

Since $m \ge 1$ is an arbitrary integer, the latter relation implies that

$$\int_0^1 d_\gamma^{-\alpha}(x)\,dx < \infty, \qquad (7.6.25)$$

for every $\alpha > 0$.

Throughout the rest of the section, we assume that for some $b > 1$, $\beta_k = k^{-1/2} (\log k)^{-b/2}$. We now pass to establishing, for every $\alpha > 1$, the relation (E). We are going to make use of the following asymptotics ($b > 1$):

$$\sum_{k\ge j} \beta_k^2 = \sum_{k\ge j} k^{-1} (\log k)^{-b} \asymp 1/(\log j)^{b-1}, \qquad \sum_{k\ge j} \beta_k^4 = \sum_{k\ge j} k^{-2} (\log k)^{-2b} \asymp 1/\big(j (\log j)^{2b}\big).$$


A positive integer A will be chosen later. Fix a certain integer p ≥ 1. Then for arbitrary integers m ≥ 1 and j ≥ Ap we obtain 8 λ βk2 sin2 π k x βi2 < 2−p k≥j

=λ

i≥j

k≥j

≤λ

βk2 cos 2π k x

k≥j

≤

k≥j

8

βk2 (1/2 − (1/2) cos 2π k x)

βi2

−1

8

βi2

βi2 < 2−p

i≥j

> 1 − 2−(p−1)

i≥j

βk2 cos 2π k x > 1 .

i≥j

In view of our assumption that $\ell_{k+1}$ is a multiple of $\ell_k$ ($k \ge 1$), the sequence $(\cos 2\pi\ell_k x)_{k\ge 1}$ is a reversed sequence of bounded martingale differences, and we may apply the following deviation bound.

7.6.9 Lemma. Let $X_1, \ldots, X_n$ be a sequence of bounded martingale differences so that $|X_i| \le c_i$, $i = 1, \ldots, n$. Then for every $x > 0$ we have

$$P\Big\{\sum_{i=1}^{n} X_i > x\Big\} \le \exp\Big(-\frac{x^2}{2\sum_{i=1}^{n} c_i^2}\Big).$$
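For the simplest martingale difference sequence, independent random signs $X_i = \pm 1$ with $c_i = 1$, the tail probability in Lemma 7.6.9 can be computed exactly from binomial coefficients and compared with the bound. The sketch below does this; the choice of Rademacher variables and the function names are assumptions made for the illustration, not part of the lemma.

```python
import math

def rademacher_tail(n, x):
    """Exact P{e_1 + ... + e_n > x} for independent signs e_i = +1 or -1.
    The sum equals 2H - n, where H is Binomial(n, 1/2)."""
    favourable = sum(math.comb(n, h) for h in range(n + 1) if 2 * h - n > x)
    return favourable / 2 ** n

def azuma_bound(c, x):
    """The bound exp(-x^2 / (2 * sum c_i^2)) of Lemma 7.6.9."""
    return math.exp(-x * x / (2 * sum(ci * ci for ci in c)))

checks = [
    (rademacher_tail(n, x), azuma_bound([1.0] * n, x))
    for n in (10, 50, 200)
    for x in range(1, n + 1)
]
bound_holds = all(tail <= bound for tail, bound in checks)
print(bound_holds)
```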

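From Lemma 7.6.9 and the previous calculations we see that for $j \ge Ap$,

$$\lambda\Big\{\sum_{k\ge j} \beta_k^2 \sin^2 \pi\ell_k x \Big/ \sum_{i\ge j} \beta_i^2 < 2^{-p}\Big\} \le \exp\Big(-\frac{\big(\sum_{k\ge j} \beta_k^2\big)^2}{2\sum_{k\ge j} \beta_k^4}\Big) \le \exp\Big(-C(b) \frac{Aj (\log(Aj))^{2b}}{(\log(Aj))^{2b-2}}\Big)$$

$$\le \exp\big(-C(b) Aj \log(Aj)^2\big) \le \exp(-C(b) Aj) \le \exp(-C(b) Ap). \qquad (7.6.26)$$

Now we choose A to satisfy the relation $A = A(b, \kappa) > C^{-1}(b)\,\kappa \log 2$. Thus we get that for every integer $p > 1$ and $j \ge Ap$,

$$\lambda\Big\{\sum_{k\ge j} \beta_k^2 \sin^2 \pi\ell_k x \Big/ \sum_{i\ge j} \beta_i^2 < 2^{-p}\Big\} \le 2^{-\kappa p}. \qquad (7.6.27)$$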


Let us consider now case j ≤ Ap with a certain fixed integer p ≥ 1. For arbitrary integers m ≥ 1 and j ≤ Ap we have then 8 λ βk2 sin2 π k x βi2 < 2−p k≥j

=λ

i≥j

βk2

8

k≥j

βi2

−1

sin2 π k x < 2−p ) .

i≥j

Notice that for k ∈ [j, j + mp] with j ≤ Ap we have k ≤ (A + m)p and βk2

βi2

−1

2 ≥ β(A+m)p

i≥j

βik

−1

i≥1

1 1 ≥ C(b, m, A) . (A + m)p(log((A + m)p))2b p(log p)2b 2 2 −p We apply now Corollary 7.6.8 with γn2 = βn+j i≥j βi (n = 1, 2, . . . ), ε = 2 , −1 / p Ks = j + sp(s = 0, . . . , m) and r = 2 to obtain for j ≤ Ap the relation ≥ C(b)

λ

βk2 sin2 π k x

k≥j

8

βi2

i≥j

≤

−p/2 2 p(log p)2b

C(b, m, A)

+2

−p

m

.

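Now, by choosing $m > 2\kappa$, we may conclude that for every $p > 1$ and $j \le Ap$, $A = A(b, \kappa)$, the estimate

$$\lambda\Big\{\sum_{k\ge j} \beta_k^2 \sin^2 \pi\ell_k x \Big/ \sum_{i\ge j} \beta_i^2 \le 2^{-p}\Big\} \le C'(b, \kappa)\, 2^{-\kappa p} \qquad (7.6.28)$$

holds true. The estimate (7.6.28) combined with the inequality (7.6.27) shows that for every $b > 1$ and $\kappa > 1$ there exists a constant $C(b, \kappa)$ such that for every $p > 1$ we have

$$\lambda\Big\{\sum_{k\ge j} \beta_k^2 \sin^2 \pi\ell_k x \Big/ \sum_{i\ge j} \beta_i^2 \le 2^{-p}\Big\} \le C(b, \kappa)\, 2^{-\kappa p}.$$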

This implies (E) for every α < κ exactly in the same way as after the proof of Lemma 7.6.6. We have therefore proved the following result. 7.6.10 Proposition. Assume that βk = k −1/2 (log k)−b/2 where b > 1. Then property (E) is realized for every α > 0. Local time and density distribution. In this section, we show that the distribution function of j (ω, · ) (see definition in (7.6.17)) is for almost all ω, absolutely continuous with respect to the Lebesgue measure. Our approach relies upon the properties of local times for Gaussian processes; we refer to Section 10.3. Put for j ≥ 1 and


A ∈ B(R), j (A) = j (ω, A) = λ{0 ≤ t ≤ 1 : j (ω, t) ∈ A}, 1 ixu ˆ e j (ω, dx) = eiuj (ω,t) dt. j (u) = j (ω, u) = R

0

In a first step, we show the almost sure existence of a continuous local time for j (ω, · ), namely the density of the distribution function of j (ω, · ). The approach for this is standard. But the result (E) obtained in the previous section is crucial here. Existence and continuity of the local times of j . Our first objective will be to prove ˆ j (u)du is finite. To begin, we observe that that the integral E R ˆ j (u)2 |u|1+δ du E R 1 1 1+δ |u| E exp{iu[j (s) − j (t)]}dsdt du = R

0

=

R

|u|1+δ 0

=

|v|

1+δ

R

=

0 1 1

R

e

0

−v 2 /2

exp{−u2 / j (s) − j (t) 22,P }dsdt du

dv 0

|v|1+δ e−v

2 /2

1 1

0

1 1

dv 0

0

dsdt

j (s) − j (t) 2+δ 2,P

1 2bj2

dsdt

k≥j

2+δ

βk2 sin2 π k (s − t)

≤ C(δ)M(1 + δ/2), by (E). Hence

sup E j ≥1

Since $

E

|u|>ε

ˆ j (u)du

R

%2

ˆ j (u)2 |u|1+δ du ≤ C(δ)M(2 + δ).

$ ≤E $ ≤

%2

|u|>ε

|u|>ε

2 = δE δε

(7.6.29)

ˆ j (u)du

% |u|−(1+δ) du .E

|u|>ε

ˆ j (u)2 |u|1+δ du

|u|>ε

ˆ j (u)2 |u|1+δ ,

we deduce sup E j ≥1

R

ˆ j (u)du ≤ 2ε +

1/2

2 C(δ)M(1 + δ/2) δεδ

.

(7.6.30)


It follows (see Section 10.3) that j (ω, · ) is absolutely continuous – j (ω, · ) has local times – and x j (ω, x) − j (ω, −∞) = φj (ω, u)du, −∞

where φj (ω, u) ≥ 0, φj (ω, u) ∈

L1 (R).

φj (ω, x) = Put

+∞ −∞

1 1 2 p(x) = √ e−x = 2π π

∞

−∞

Then φj (ω, x) − p(x) = And sup φj (ω, x)−p(x) ≤ x

Moreover ˆ j (u)du. e−iux

e−iux e−u

∞

−∞

2 /2

du

γ (x) = e−x

ˆ j (u) − γ (u) du. e−iux

(7.6.31)

2 /2

.

(7.6.32)

∞ −∞

ˆ j (u)−γ (u)du ≤ I1 (j )+I2 (j )+I3 (j ) , (7.6.33)

where

I1 (j ) =

j

−j

ˆ j (u) − γ (u)du,

I2 (j ) =

|x|≥j

I3 (j ) =

|x|≥j

γ (u)du,

(7.6.34)

ˆ j (u)du,

and j is chosen according to (7.6.15), (7.6.16). The first integral is estimated by Corollary 7.6.4: a.s. (7.6.35) I1 (j ) = o(1). Clearly I2 (j ) = o(1). In order to precisely estimate I3 (j ), it will be necessary to first consider for k < j the integrals ˆ j (u) − ˆ k (u)2 |u|1+δ du. E R

Estimating E

ˆ j (u) − ˆ k (u)2 |u|1+δ du, we shall prove the following lemma. R

7.6.11 Lemma. There exists Cδ finite, such that for any j ≥ k, bk2 − bj2 2 1+δ ˆ ˆ . E j (u) − k (u) |u| du ≤ Cδ bk2 R


Proof. Since 1 1

iu (t)

iu (s) ˆ j (u) − ˆ k (u)2 = E E e j − eiuk (t) dt e j − eiuk (s) ds, 0 0 1 1

iu (t) E e j − eiuk (t) eiuj (s) − eiuk (s) dtds, = 0

0

elementary computations show that E

R

ˆ j (u) − ˆ k (u)2 |u|1+δ du = C(δ)

1 1

k,j (s, t)dsdt, 0

(7.6.36)

0

where k,j (s, t) can be calculated as 1

j (s) − j (t) 2+δ 2,P

−

1

j (t) − k (s) 2+δ 2,P 1 1 − + . 2+δ

k (t) − j (s) 2,P

k (s) − k (t) 2+δ 2,P

Write k,j (s, t) = 1k,j (s, t) + 2k,j (s, t) + 3k,j (s, t), where 1k,j (s, t) = 2k,j (s, t) = 3k,j (s, t) =

1

j (s) − j (t) 2+δ 2,P 1

j (s) − j (t) 2+δ 2,P 1

k (s) − k (t) 2+δ 2,P

− − −

1

j (t) − k (s) 2+δ 2,P 1

k (t) − j (s) 2+δ 2,P 1

j (t) − j (s) 2+δ 2,P

, ,

(7.6.37)

.

The two first expressions are of the same type. We observe that √

2 j (t) − k (s) cos 2π λ t sin 2π λ t cos 2π λ s sin 2π λ s βλ − gλ + − gλ = bj bk bj bk λ>j

1 − bk

kj

=

bk2

− bj2 bk2

1 1 2 1 1 − 2+ − cos 2π λ (t − s) 2 b b b bk bj j j k

+4

βλ2

λ>j

(7.6.38)

bj2 − bk2 2 sin2 π λ (t − s) + bj bj2 bj2 bk2

2 1 1 2 + − βλ cos 2π λ (t − s) bj bj bk λ>j

2 2 1 1 2 − βλ cos 2π λ (t − s). = 2E j (t) − j (s) + bj bj bk λ>j

Hence, 2 2 2 1 1 2 − βλ cos 2π λ (t − s). 2E j (t) − k (s) = 2E j (t) − j (s) + bj bj bk λ>j

(7.6.39) Similarly, 2 2 2 1 1 2 − βλ cos 2π λ (t − s). 2E k (t) − j (s) = 2E j (t) − j (s) + bj bj bk λ>j

Now, we estimate (7.6.39),

11 0

0

(7.6.40) 1k,j (s, t)dsdt. Fix s and t in [0, 1], and write according to

1k,j (s, t) = =

1

j (s) − j (t) 2+δ 2,P 1 A1+δ/2

−

−

1

j (t) − k (s) 2+δ 2,P

1 , (A + a)1+δ/2


2 where A = j (s) − j (t) 22,P and a = b2j b1j − b1k λ>j βλ cos 2π λ (t − s). We have A + a ≥ 0. So if a ≤ 0, then 0 ≤ A + a ≤ A, and 1k,j (s, t) ≤ 0. Now if a ≥ 0, we make use of the elementary inequality (x + y)1+ε − x 1+ε ≤ (1 + ε)y(x + y)ε , valid for any reals x, y, ε > 0, to bound 1k,j (s, t) as follows: (A + a)1+δ/2 − A1+δ/2 a(A + a)δ/2 ≤ (1 + δ/2) 1+δ/2 1+δ/2 (A + a) A (A + a)1+δ/2 A1+δ/2 a ≤ (1 + δ/2) 2+δ/2 . A

1k,j (s, t) =

Hence, by writing a = a(s, t), 1k,j (s, t) ≤ 0.I{a(s, t) ≤ 0} + (1 + δ/2) ≤ 0.I{a(s, t) ≤ 0} + (2 + δ)

a

j (s) − j (t) 2+δ 2,P

I{a(s, t) > 0}

bk2 − bj2

1

bk2

j (s) − j (t) 2+δ 2,P

I{a(s, t) > 0}, (7.6.41)

since $|a(s,t)| \le \frac{2}{b_j}\Big(\frac{1}{b_j} - \frac{1}{b_k}\Big) b_j^2 = 2\,\frac{b_k - b_j}{b_k} \le 2\,\frac{b_k^2 - b_j^2}{b_k^2}$. By integrating inequality (7.6.41)

over [0, 1]2 with respect to dsdt, we obtain 1 0

1 0

≤ (2 + δ)

1k,j (s, t)dsdt

2 bk − bj2 1 1

bk2

0

0

dsdt

j (s) − j (t) 2+δ 2,P

. (7.6.42)

Now, we use Proposition 7.6.10 to observe that

1 1

sup

j ≥1 0

We thus arrive at

0

dsdt

j (s) − j (t) 2+δ 2,P

1 1 0

0

≤ M(1 + δ/4) < ∞.

1k,j (s, t)dsdt ≤ Cδ (

bk2 − bj2 bk2

).

(7.6.43)

).

(7.6.44)

Similarly 0

1 1 0

2k,j (s, t)dsdt

≤ Cδ (

bk2 − bj2 bk2


11 It remains to estimate the last integral: 0 0 3k,j (s, t)dsdt. But, by the elementary inequality used to control 1k,j (s, t), we get 3k,j (s, t)

=

2+δ

j (t) − j (s) 2+δ 2,P − k (s) − k (t) 2,P

2+δ

k (s) − k (t) 2+δ 2,P j (t) − j (s) 2,P j (t) − j (s) 22,P − k (s) − k (t) 22,P 3 . "⇒ |k,j (s, t)| ≤ (1 + δ/2)

2+δ

j (t) − j (s) 2+δ 2,P ∧ k (s) − k (t) 2,P

Now k (s) − k (t) 2 − j (t) − j (s) 2 2,P 2,P 4 2 1 2 1 2 2 = 2 βλ (sin π λ (t − s)) + 4 2 − 2 βλ (sin π λ (t − s)) bk kj ≤4

bk2 − bj2 bk2

+4

bk2 − bj2 bk2

≤8

bk2 − bj2 bk2

.

Therefore, |3k,j (s, t)| ≤ 8

bk2 − bj2 bk2

1

+

j (t) − j (s) 2+δ 2,P

1

k (s) − k (t) 2+δ 2,P

.

By invoking again Proposition 7.6.10, we deduce that 1 1

0

0

|3k,j (s, t)|dsdt

≤ Cδ

bk2 − bj2 bk2

.

(7.6.45)

From (7.6.43), (7.6.44) and (7.6.45), we also have E

b2 − bj2 ˆ k (u)2 |u|1+δ du ≤ Cδ k ˆ j (u) − . bk2 R

(7.6.46)

And the lemma is proved. Proof of Theorem 7.6.1. We use the notation from the preceding section. Put, for any positive integer j , ˆ j (u)du. j = I3 (j ) = (7.6.47) |u|>j

Now, we show how Lemma 7.6.11 can be used to give an almost sure asymptotic estimate for j . Before going further, it is necessary to make some elementary


observations. First, we can write j − k = ˆ k (u)du, and thus k j

ˆ j (u) − ˆ k (u))du −

|u|>j (

ˆ k (u)du + ˆ j (u) −

ˆ k (u)du. k j

$

%2 ˆ k (u)du .

+ 2E k j

$

=E ≤ ≤

%2

ˆ j (u) − ˆ k (u)du

1+δ ( 1+δ 2 )−( 2 )

|u|>j

|u|

−(1+δ)

|u|

|u|>j bk2 − bj2 Cδ −δ j bk2

du E

R

ˆ j (u) − ˆ k (u)2 |u|1+δ du

(7.6.49)

= Cδ (bk2 − bj2 ),

−2/δ

since j = bj according to (7.6.15). And on the other, by using again the Cauchy– Schwarz inequality and (7.6.29), $ %2 −(1+δ) ˆ ˆ k (u)2 |u|1+δ du E k (u) du ≤ |u| du E k A 3 ≤ 2A sup φj (ω, x) − p(x) + 2 + p(x)dx, A |x|>A x∈R +

for any j large enough. Hence, by (7.6.54), 3 lim sup φj (ω, x) − p(x) dx ≤ 2 + p(x)dx. A R |x|>A j →∞ But A is arbitrary now. Letting then A tend to infinity finally gives a.s. lim sup φj (ω, x) − p(x)dx = 0. j →∞

R

To achieve the proof, it remains for us to prove that the series This amounts to requiring that (a) βλ2 log2 λ < ∞, λ

(b)

λ

2

λ log λ λ 0, not too many balls of radius u are needed to cover (N, d). According to a classical criterion, this information implies that the sequence X has an almost sure regular asymptotic behavior. In most cases, not only the sequence converges almost everywhere, but a speed of convergence can also be specified. As we will see in the next sections, (8.1.1) contains two cases of different nature: α > 1 and α = 1. Before going further, it seems natural to put assumption (8.1.1) into a more general framework.


8 The metric entropy method

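Let $\Phi : \mathbb{R} \to \mathbb{R}_+$ be a Young function (convex, even, such that $\Phi(0) = 0$ and $\lim_{x\to\infty} \Phi(x) = \infty$). Let $L^\Phi$ denote the subspace of $L^0(\mathbb{P})$ formed by the elements $f$ such that $\mathbb{E}\,\Phi(c|f|) < \infty$ for some $c > 0$. The Orlicz norm associated to $\Phi$ is defined by

\[ \|f\|_\Phi = \inf\{\alpha > 0 : \mathbb{E}\,\Phi(|f|/\alpha) \le 1\}, \qquad f \in L^\Phi. \tag{8.1.2} \]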
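As an illustration of definition (8.1.2), the following minimal sketch computes the Orlicz norm of a finite-valued random variable by bisection. The function names, the sample law and the two Young functions are assumptions chosen for the example; the sketch only relies on the fact that $a \mapsto \mathbb{E}\,\Phi(|X|/a)$ is nonincreasing.

```python
import math

def orlicz_norm(values, probs, phi, rel_tol=1e-12):
    """Return inf{a > 0 : E phi(|X|/a) <= 1} for a finite-valued X, by bisection."""
    top = max(abs(x) for x in values)
    if top == 0.0:
        return 0.0

    def excess(a):
        # E phi(|X|/a) - 1; nonincreasing in a because phi is nondecreasing on [0, inf)
        return sum(p * phi(abs(x) / a) for x, p in zip(values, probs)) - 1.0

    lo, hi = 0.0, top
    while excess(hi) > 0.0:          # enlarge hi until it is admissible
        lo, hi = hi, 2.0 * hi
    while hi - lo > rel_tol * hi:    # bisection between a non-admissible and an admissible value
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return hi

def power(p):
    """Young function t -> t**p on [0, inf) (hypothetical helper)."""
    return lambda t: t ** p

def exponential(alpha):
    """Young function t -> exp(t**alpha) - 1 on [0, inf) (hypothetical helper)."""
    return lambda t: math.exp(t ** alpha) - 1.0

# Hypothetical law: X takes the values 1, 2, 3 with probabilities 0.2, 0.5, 0.3.
values, probs = [1.0, 2.0, 3.0], [0.2, 0.5, 0.3]
lp_norm = sum(q * x ** 3 for x, q in zip(values, probs)) ** (1.0 / 3.0)
norm_power = orlicz_norm(values, probs, power(3))

# For a constant X = 2 and phi(t) = exp(t^2) - 1 the norm is 2 / sqrt(log 2).
norm_exp_const = orlicz_norm([2.0], [1.0], exponential(2))
```

With the power function the bisection should reproduce the usual $L^3$ norm, which gives a quick sanity check of the definition.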

Then $L^\Phi$ endowed with the norm $\|\cdot\|_\Phi$ is a Banach space. In particular, if $\Phi(t) = |t|^p$, $L^\Phi$ is the usual $L^p$ space. But other spaces are important, for instance exponential type Orlicz spaces associated to the exponential functions $\Phi_\alpha(x) = e^{|x|^\alpha} - 1$,

1≤α u} ≤

|Xs −Xt | d(s,t)

1 u

and A = {U > u}, we have from (8.1.4):

U dP ≤ U >u

1 1 , P{U > u}−1 u P{U > u}

so that (8.1.4) implies

\[ \mathbb{P}\Big\{ \frac{|X_s - X_t|}{d(s,t)} > u \Big\} \le \frac{1}{\Phi(u)} \qquad \text{for every } u \ge 0 \text{ and } s, t \in T. \tag{8.1.5} \]

When is of exponential type, (8.1.5) is equivalent to (8.1.3), and so (8.1.3) and (8.1.4) are equivalent. But when is of power type, (8.1.4) is in turn less stringent than (8.1.3). Conditions similar to (8.1.5) were used in Weber [1980]; more precisely it was assumed that for some random variable ,

|Xs − Xt | P > u ≤ P{ > u} d(s, t)

for every u ≥ 0 and s, t ∈ T ,

or else

+ E |Xs − Xt | − d(s, t)u ≤ d(s, t)E ( − u)+

(8.1.5 )

for every u ≥ 0 and s, t ∈ T .

The basic problem investigated under these various conditions can be described as follows: when, for instance, is the following implication true?

\[ \|X_s - X_t\|_\Phi \le d(s,t), \ \forall s, t \in T \quad \Longrightarrow \quad \Big\| \sup_{s,t\in T} |X_s - X_t| \Big\|_\Phi < \infty. \]

The supremum in the above is, for the moment, only understood as lattice supremum in $L^\Phi$, for instance

\[ \mathbb{E} \sup_{s,t\in T} |X_s - X_t| = \sup\Big\{ \mathbb{E} \sup_{s,t\in T_0} |X_s - X_t| ,\ T_0 \text{ finite in } T \Big\}. \tag{8.1.6} \]

A weaker requirement will also be of some relevance: under some of the increment conditions considered above, can we infer that

\[ \mathbb{E} \sup_{s,t\in T} |X_s - X_t| < \infty ? \]

Before continuing, it seems natural and necessary to examine what consequences can be drawn from these assumptions concerning finite suprema. A first observation concerns condition (8.1.4). Let $Y_1, \dots, Y_N$ be nonnegative random variables on $(\Omega, \mathcal{B}, \mathbb{P})$ verifying: for any $1 \le n \le N$ and any measurable set $A$,

\[ \int_A Y_n \, d\mathbb{P} \le \mathbb{P}(A)\, \Phi^{-1}\Big( \frac{1}{\mathbb{P}(A)} \Big). \tag{8.1.7} \]

Then, for any measurable set $A$,

\[ \int_A \sup_{n=1}^N Y_n \, d\mathbb{P} \le \mathbb{P}(A)\, \Phi^{-1}\Big( \frac{N}{\mathbb{P}(A)} \Big). \tag{8.1.8} \]


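To see how this is obtained, let $\{A_n, 1 \le n \le N\}$ be a measurable partition of $\Omega$ such that $Y_n = \sup_{i=1}^N Y_i$ on $A_n$. Then, by the concavity of $\Phi^{-1}$,

\[ \int_A \sup_{n=1}^N Y_n \, d\mathbb{P} = \sum_{n=1}^N \int_{A\cap A_n} Y_n \, d\mathbb{P} \le \sum_{n=1}^N \mathbb{P}(A\cap A_n)\, \Phi^{-1}\Big( \frac{1}{\mathbb{P}(A\cap A_n)} \Big) \le \mathbb{P}(A)\, \Phi^{-1}\Big( \frac{N}{\mathbb{P}(A)} \Big). \]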
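The last inequality in this chain is Jensen's inequality for the concave function $\Phi^{-1}$ with weights $\mathbb{P}(A\cap A_n)/\mathbb{P}(A)$. A minimal numerical sketch of that step follows, under the assumption (made only for this example) that $\Phi(t) = e^{t^\alpha} - 1$, so that $\Phi^{-1}(x) = (\log(1+x))^{1/\alpha}$; the random masses are hypothetical stand-ins for the quantities $\mathbb{P}(A \cap A_n)$.

```python
import math
import random

def phi_inverse(x, alpha=2.0):
    """Inverse of Phi(t) = exp(t**alpha) - 1 on [0, inf); concave for alpha >= 1."""
    return math.log1p(x) ** (1.0 / alpha)

def both_sides(masses, alpha=2.0):
    """masses[n] plays the role of P(A ∩ A_n); return the two sides of the concavity step."""
    total = sum(masses)                       # P(A)
    n = len(masses)
    lhs = sum(p * phi_inverse(1.0 / p, alpha) for p in masses)
    rhs = total * phi_inverse(n / total, alpha)
    return lhs, rhs

rng = random.Random(0)
checks = []
for _ in range(200):
    n = rng.randint(1, 12)
    raw = [rng.random() + 1e-6 for _ in range(n)]
    scale = rng.uniform(0.05, 1.0) / sum(raw)   # so that P(A) lies in (0, 1]
    checks.append(both_sides([r * scale for r in raw]))
```

Each pair in `checks` should satisfy `lhs <= rhs`, with equality when all the masses are equal.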

Thus assumption (8.1.4) implies, for any finite subset $F$ of $T \times T$ and any measurable set $A$,

\[ \int_A \sup_{(s,t)\in F} \frac{|X_s - X_t|}{d(s,t)} \, d\mathbb{P} \le \mathbb{P}(A)\, \Phi^{-1}\Big( \frac{\#(F)}{\mathbb{P}(A)} \Big). \tag{8.1.9} \]

This is also a consequence of assumption (8.1.3), since we have seen that (8.1.3) implies (8.1.4). In particular under (8.1.3), for any $F$ finite in $T \times T$,

\[ \mathbb{E} \sup_{s,t\in F} |X_s - X_t| \le \Phi^{-1}(\#(F)). \tag{8.1.10} \]

But very often, under (8.1.3) more can be obtained and in a very elementary way. If $\Phi(t) = |t|^p$ with $1 \le p < \infty$, then for any nonnegative random variables $Y_1, \dots, Y_N$ on $(\Omega, \mathcal{B}, \mathbb{P})$,

\[ \Big\| \sup_{n=1}^N Y_n \Big\|_p \le N^{1/p} \sup_{n=1}^N \|Y_n\|_p. \tag{8.1.11} \]

The argument is rather straightforward:

\[ \int \sup_{n=1}^N Y_n^p \, d\mathbb{P} \le \sum_{n=1}^N \int Y_n^p \, d\mathbb{P} \le N \sup_{n=1}^N \|Y_n\|_p^p. \]
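Inequality (8.1.11) holds on any probability space, in particular for the empirical measure of a finite sample, so it can be checked exactly on simulated data. The sketch below does this; the exponential sample, the seed and the sizes are arbitrary choices made for the illustration.

```python
import random

def lp_norm(sample, p):
    """L^p norm with respect to the empirical (uniform) measure on the sample points."""
    return (sum(abs(y) ** p for y in sample) / len(sample)) ** (1.0 / p)

rng = random.Random(1)
p, n_vars, n_points = 3.0, 8, 500

# rows[n][w] is Y_n evaluated at the sample point w (hypothetical nonnegative data)
rows = [[rng.expovariate(1.0) for _ in range(n_points)] for _ in range(n_vars)]
pointwise_sup = [max(row[w] for row in rows) for w in range(n_points)]

lhs = lp_norm(pointwise_sup, p)                                   # || sup_n Y_n ||_p
rhs = n_vars ** (1.0 / p) * max(lp_norm(row, p) for row in rows)  # N^(1/p) sup_n ||Y_n||_p
```

The comparison `lhs <= rhs` is deterministic: it only uses the pointwise bound of the supremum to the power $p$ by the sum of the $p$-th powers.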

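Now, if $\Phi$ is of exponential type, a similar conclusion can be derived. For instance, let $1 \le \alpha < \infty$ and set $\Phi(t) = e^{|t|^\alpha} - 1$. Then for any nonnegative random variables $Y_1, \dots, Y_N$ on $(\Omega, \mathcal{B}, \mathbb{P})$,

\[ \Big\| \sup_{n=1}^N Y_n \Big\|_\Phi \le \max\{1, (c \log N)^{1/\alpha}\} \sup_{n=1}^N \|Y_n\|_\Phi, \tag{8.1.12} \]

and we may take $c = 2/\log 2$. This follows from Jensen's inequality. We can assume $\sup_{n=1}^N \|Y_n\|_\Phi \le 1$ and $N \ge 2$. Then, as $c \log N \ge 1$,

\[ \int \exp\Big\{ \Big( \frac{\sup_{n=1}^N Y_n}{(c\log N)^{1/\alpha}} \Big)^{\alpha} \Big\} d\mathbb{P} \le \Big( \int \exp\Big\{ \sup_{n=1}^N Y_n^\alpha \Big\} d\mathbb{P} \Big)^{\frac{1}{c \log N}} \le (2N)^{\frac{1}{c\log N}} \le 2. \]

This justifies the following definition: we say that a Young function $\Phi$ is regular when there exists a constant $C = C(\Phi)$ depending on $\Phi$ only, such that for any nonnegative random variables $Y_1, \dots, Y_N$,

\[ \Big\| \sup_{n=1}^N Y_n \Big\|_\Phi \le C\, \Phi^{-1}(N) \sup_{n=1}^N \|Y_n\|_\Phi. \tag{8.1.13} \]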
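The only arithmetic used in the exponential case is that, with the constant $2/\log 2$ and $N \ge 2$, the logarithmic factor is at least 1 and the final power of $2N$ does not exceed 2. The following short sketch checks both facts for a few values of $N$; the name `C_EXP` and the list of test values are illustrative assumptions.

```python
import math

C_EXP = 2.0 / math.log(2.0)   # the constant 2 / log 2 from the text

def jensen_bound(n):
    """(2N)^(1/(c log N)) with c = 2/log 2: the last quantity in the chain of inequalities."""
    return (2.0 * n) ** (1.0 / (C_EXP * math.log(n)))

def log_factor(n):
    """c log N, which must be at least 1 for Jensen's inequality to apply as stated."""
    return C_EXP * math.log(n)

bounds = {n: jensen_bound(n) for n in (2, 3, 10, 1000, 10**6)}
```

The bound is attained at $N = 2$ and decreases towards $\sqrt{2}$ as $N$ grows, which suggests that the constant $2/\log 2$ is dictated by the smallest case.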


8.1 Introduction and general results

Versions and separable processes. Let $(T, d)$ be a metric space. Further, let $X = \{X_t, t \in T\}$ be a stochastic process with basic probability space $(\Omega, \mathcal{B}, \mathbb{P})$. A version or a modification of $X$ is a stochastic process $X' = \{X'_t, t \in T\}$ with the same basic probability space, such that for each $t$ in $T$, $\mathbb{P}\{X_t = X'_t\} = 1$. Suppose for instance that $X$ satisfies the increment condition (8.1.3). By Tchebycheff's inequality, if $\varepsilon > 0$,

\[ \mathbb{P}\{|X_s - X_t| > \varepsilon\} \le \Big[ \Phi\Big( \frac{\varepsilon}{d(s,t)} \Big) \Big]^{-1} \to 0, \]

as $d(s,t)$ tends to 0, so that $X$ is $d$-continuous in probability. If, in addition, $(T, d)$ is separable, let $T'$ be a countable $d$-dense subset of $T$. Then for any $t \in T$, there exists a sequence $\{s_n(t), n \ge 1\}$ contained in $T'$ and such that

\[ \lim_{n\to\infty} d(s_n(t), t) = 0, \qquad \mathbb{P}\Big\{ \lim_{n\to\infty} X_{s_n(t)} = X_t \Big\} = 1. \]

If we now define $X'$, for each $t$ in $T$, by

\[ X'_t = \begin{cases} X_t & \text{if } t \in T', \\ \lim_{n\to\infty} X_{s_n(t)} & \text{if } t \in T \setminus T', \end{cases} \]

then

\[ \begin{cases} \mathbb{P}\{X_t = X'_t\} = 1, & \forall t \in T, \\ \mathbb{P}\Big\{ \sup_{t\in T} X'_t = \sup_{t\in T'} X'_t = \sup_{t\in T'} X_t \Big\} = 1. \end{cases} \]

Consequently $X'$ is a version of $X$, and further $X'$ depends only on a countable family of random variables, so that there is no measurability problem when working with its supremum. We also note that the $d$-continuity in probability of the process, instead of condition (8.1.3), suffices for getting the conclusion.

As a complement to this notion, we say that a stochastic process $X = \{X_t, t \in T\}$ indexed on $T$ and with basic probability space $(\Omega, \mathcal{B}, \mathbb{P})$ is $d$-separable, or simply separable, if there exist a countable subset $S$ of $T$, called a separation set (or separant set), and a null set $N$ of $\mathcal{B}$ such that for any $\omega \notin N$ and any $t \in T$, there is a sequence $\{s_n, n \ge 1\} \subset S$ verifying

\[ \lim_{n\to\infty} d(s_n, t) = 0, \qquad X(\omega, t) = \lim_{n\to\infty} X(\omega, s_n). \]

This is a very convenient notion, which solves measurability problems raised by the study of quantities such as $\sup_{t\in T} X_t$, $\sup_{s,t\in T} |X_s - X_t|$, and so on. When $X$ is $d$-separable, we have, by definition,

\[ \mathbb{P}\Big\{ \sup_{t\in T} X_t = \sup_{t\in S} X_t \Big\} = 1, \qquad \mathbb{P}\Big\{ \sup_{s,t\in T} |X_s - X_t| = \sup_{s,t\in S} |X_s - X_t| \Big\} = 1, \ \dots \]


We therefore shall say that $X$ admits a $d$-separable version or a $d$-separable modification, if there exists a stochastic process $X' = \{X'_t, t \in T\}$ which is $d$-separable, and for which one also has $\mathbb{P}\{X_t = X'_t\} = 1$ ($\forall t \in T$). For instance, if $X$ satisfies assumption (8.1.3) and $(T, d)$ is separable, by the very construction of the modification $X'$ made above, $X$ admits a $d$-separable version, which is precisely $X'$. Indeed, take $S = T'$ and observe that for all $\omega \in \Omega$ and any $t \in T$, there is a sequence $\{s_n, n \ge 1\} \subset S$ verifying $\lim_{n\to\infty} d(s_n, t) = 0$ and $X'(\omega, t) = \lim_{n\to\infty} X'(\omega, s_n)$. It is worth observing, when using these notions, that the separability of $(T, d)$ is a key property. If $(T, d)$ is a pseudo-metric space, these notions can be extended to this case as well, for instance when the space is totally bounded, namely when the entropy numbers (see later) of the space are all finite. In this case, $(T, d)$ contains a countable $d$-dense subset $T'$: for all $t \in T$, there exists a sequence $\{s_n, n \ge 1\}$ contained in $T'$ and such that $\lim_{n\to\infty} d(s_n, t) = 0$. If $X$ satisfies assumption (8.1.3), then $X$ also admits a $d$-separable version $X'$, which may be built exactly as before.

Having defined these notions, we may now focus on our initial purpose: the study of the regularity of stochastic processes from the point of view of their in-norm increment properties. Recall that for any real $u > 0$, the entropy number $N(T, d, u)$ of order $u$ of $(T, d)$ is by definition the smallest (possibly infinite) number of open $d$-balls of radius $u$ needed to cover $T$. We write $D = D(T)$ for the diameter of $(T, d)$.

8.1.1 Theorem (Boundedness). Let $\Phi$ be a regular Young function. Let $(T, d)$ be a pseudo-metric space and let $X = \{X_t, t \in T\}$ be a stochastic process satisfying the increment condition (8.1.3). Assume that the entropy integral

\[ I(T, d) = \int_0^D \Phi^{-1}\big( N(T, d, u) \big) \, du \tag{8.1.14} \]

is convergent. Then $X$ possesses a version $X'$ which is sample bounded and there exists a constant $C$ depending on $\Phi$ only such that

\[ \Big\| \sup_{s,t\in T} |X'_s - X'_t| \Big\|_\Phi \le C\, I(T, d). \tag{8.1.15} \]

Proof. We may assume $D > 0$, otherwise the result is obvious. By the finiteness of the integral in (8.1.14), $(T, d)$ is totally bounded, hence separable. For any integer $n = 0, 1, 2, \dots$, let $T_n \subset T$ be a sequence of centers of balls corresponding to a minimal covering of $T$ of size $2^{-n} D$ ($T_0 = \{s_0\}$). Let $T' = \bigcup_{n=0}^\infty T_n$; then $T'$ is a $d$-dense subset of $T$. Denote also formally by $s \mapsto \bar{s}$ a map from $T_n$ to $T_{n-1}$ such that $d(s, \bar{s}) < 2^{-n+1} D$. Finally put, for $n \ge 0$,

\[ M_n = \sup_{s\in T_n} |X_s - X_{s_0}|, \qquad M'_n = \sup_{0\le j\le n} M_j. \]

Then $M_0 = M'_0 = 0$ and $0 \le M'_n - M'_{n-1} \le \sup_{s\in T_n} |X_s - X_{\bar{s}}|$. Indeed, either $M'_n = M'_{n-1}$, in which case there is nothing to prove; or $M'_n > M'_{n-1}$, and thus $M_n = M'_n > M'_{n-1}$. Let $s_\sigma \in T_n$ be such that $M_n = |X_{s_\sigma} - X_{s_0}|$. Then

\[ M'_n - M'_{n-1} = |X_{s_\sigma} - X_{s_0}| - M'_{n-1} \le |X_{s_\sigma} - X_{s_0}| - |X_{\bar{s}_\sigma} - X_{s_0}| \le |X_{s_\sigma} - X_{\bar{s}_\sigma}|. \]

As $\Phi$ is regular, for any $n \ge 1$,

\[ \|M'_n - M'_{n-1}\|_\Phi \le C\, \Phi^{-1}\big( \#(T_n) \big) \sup_{s\in T_n} \|X_s - X_{\bar{s}}\|_\Phi \le C\, 2^{-(n-1)} D\, \Phi^{-1}\big( N(T, d, 2^{-n} D) \big). \]

But $M'_n = M'_n - M'_0 = \sum_{k=1}^n (M'_k - M'_{k-1})$. Thereby

\[ \|M'_n\|_\Phi \le \sum_{k=1}^n \|M'_k - M'_{k-1}\|_\Phi \le C \sum_{k=1}^n 2^{-k+1} D\, \Phi^{-1}\big( N(T, d, 2^{-k} D) \big) \le C \sum_{k=1}^\infty 2^{-k+1} D\, \Phi^{-1}\big( N(T, d, 2^{-k} D) \big) \le C \int_0^D \Phi^{-1}\big( N(T, d, u) \big) \, du. \]

We deduce

\[ \Big\| \sup_{s\in T'} |X_s - X_{s_0}| \Big\|_\Phi \le C \int_0^D \Phi^{-1}\big( N(T, d, u) \big) \, du. \]

By (8.1.3), $X$ is $d$-continuous in probability. Define $X'$ by $X'_t = \lim_{T' \ni s \to t} X_s$. Then $X'$ is a separable version of $X$, for which we obviously have

\[ \Big\| \sup_{s\in T} |X'_s - X'_{s_0}| \Big\|_\Phi \le C \int_0^D \Phi^{-1}\big( N(T, d, u) \big) \, du. \]

And (8.1.15) now follows from the triangle inequality. Applying estimate (8.1.15) to any ball $B(t, \rho)$ shows that $X'$ is also almost surely $d$-continuous at any point $t$ of $T$, since as $\rho$ tends to 0,

\[ \Big\| \sup_{s\in B(t,\rho)} |X'_s - X'_t| \Big\|_\Phi \le C \int_0^{2\rho} \Phi^{-1}\big( N(B(t, \rho), d, u) \big) \, du \le C \int_0^{2\rho} \Phi^{-1}\big( N(T, d, u) \big) \, du \to 0. \]
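The comparison between the dyadic chaining sum and the entropy integral used in the proof can be made concrete. Since $u \mapsto \Phi^{-1}(N(T,d,u))$ is nonincreasing, each term $2^{-k+1} D\, \Phi^{-1}(N(T,d,2^{-k}D))$ is at most four times the integral over $[2^{-k-1}D, 2^{-k}D]$. The sketch below checks this on a hypothetical entropy profile $N(T,d,u) = \lceil 1/(2u) \rceil$ (roughly that of $T = [0,1]$, taken here as an assumption) with $\Phi(t) = t^2$.

```python
import math

def covering_number(u):
    """Hypothetical entropy profile N(T, d, u) = ceil(1/(2u)), roughly that of T = [0, 1]."""
    return max(1, math.ceil(1.0 / (2.0 * u)))

def phi_inverse(x):
    """Inverse of the Young function Phi(t) = t**2."""
    return math.sqrt(x)

D = 1.0

# Dyadic chaining sum: sum over k >= 1 of 2^{-k+1} D Phi^{-1}(N(T, d, 2^{-k} D)).
chaining_sum = sum(
    2.0 ** (-k + 1) * D * phi_inverse(covering_number(2.0 ** (-k) * D))
    for k in range(1, 61)
)

# Entropy integral over (0, D/2): N(u) = m exactly on [1/(2m), 1/(2(m-1))), m >= 2,
# so the integral equals the series below; truncating the series gives a lower bound.
entropy_integral_lower = sum(
    phi_inverse(m) * (1.0 / (2.0 * (m - 1)) - 1.0 / (2.0 * m))
    for m in range(2, 100001)
)
```

For this profile the chaining sum is a geometric series with ratio $2^{-1/2}$, so it can also be compared with its closed form $1/(1 - 2^{-1/2})$.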

Theorem 8.1.1 fully suffices for all the applications of the metric entropy method presented in this chapter. Since its proof is also simple and instructive, we have chosen this presentation. We shall now complete it with a corresponding statement concerning sample continuity. Continuity appears in our applications as an already existing property: typically in the important case of random polynomials. However, establishing satisfactory conditions for sample continuity of a given class of stochastic processes is a more delicate problem. The theorem in question (Theorem 11.6 in [Ledoux–Talagrand: 1991]) reads as follows:


8.1.2 Theorem (Continuity). Let $\Phi$ be a Young function. Let $(T, d)$ be a pseudo-metric space. Let $X = \{X_t, t \in T\}$ be a stochastic process indexed on $T$ and satisfying the increment condition (8.1.4). Then, if the integral condition (8.1.14) is satisfied, $X$ possesses a version $X'$ which is sample bounded and sample (uniformly) $d$-continuous on $T$. Further, there exists an increasing function $v : \mathbb{R}_+ \to \mathbb{R}_+$ with $v(0) = 0$, depending on condition (8.1.14) only, such that for any $\varepsilon > 0$,

\[ \mathbb{E} \sup_{d(s,t)\le v(\varepsilon)} |X'_s - X'_t| \le \varepsilon. \]

Proof. We use the chaining argument of the proof of Theorem 8.1.1, and shall make it a little more precise. For any integer $\ell \ge 1$, there are maps $\pi_\ell : T_\ell \to T_{\ell-1}$ satisfying $d(s, \pi_\ell(s)) \le D 2^{-\ell+1}$. We may also assume that $T$ is finite, so $T = T_N$ for some large integer $N$. Define, for $1 \le \ell \le N$, the maps $\sigma_\ell : T_N \to T_\ell$ by $\sigma_\ell = \pi_{\ell+1} \circ \cdots \circ \pi_N$. Note that $\sigma_N = \mathrm{identity}(T_N)$. We begin with a first observation. Let $1 \le k < N$ and $s \in T_N$. Writing $X_s - X_{\sigma_k(s)} = \sum_{\ell=k+1}^N \big( X_{\sigma_\ell(s)} - X_{\sigma_{\ell-1}(s)} \big)$ and arguing as in the previous proof allows us to get, in view of (8.1.10),

\[ \mathbb{E} \sup_{s\in T_N} |X_s - X_{\sigma_k(s)}| \le C \int_0^{D 2^{-k}} \Phi^{-1}\big( N(T, d, u) \big) \, du. \]

Let $\eta > 0$ and let $s, t \in T_N$ be such that $d(s, t) \le \eta$. If we now consider the set

\[ U = \big\{ (x, y) \in T_\ell^2 : \exists (u, v) \in T_N^2 \text{ such that } d(u, v) \le \eta \text{ and } \sigma_\ell(u) = x,\ \sigma_\ell(v) = y \big\}, \]

then it is plain that $(\sigma_\ell(s), \sigma_\ell(t)) \in U$. Clearly to each pair $\vartheta = (x, y)$ in $U$, another pair $(u_\vartheta, v_\vartheta)$ in $T_N^2$ can be associated, satisfying $\sigma_\ell(u_\vartheta) = x$, $\sigma_\ell(v_\vartheta) = y$. These observations being made, choosing then $(x, y) = (\sigma_\ell(s), \sigma_\ell(t))$, we can write using the triangle inequality,

\[ |X_s - X_t| \le |X_s - X_{\sigma_\ell(s)}| + |X_{\sigma_\ell(s)} - X_{u_\vartheta}| + |X_{u_\vartheta} - X_{v_\vartheta}| + |X_{v_\vartheta} - X_{\sigma_\ell(t)}| + |X_{\sigma_\ell(t)} - X_t|. \]

The trick here is that the third term in the right-hand side (which could be $|X_s - X_t|$) belongs to a set of cardinality less than or equal to $\#(U)$ and $d(u_\vartheta, v_\vartheta) \le \eta$, with $\ell$ and $\eta$ independent. And this allows us to get the bound

\[ \sup_{\substack{s,t\in T_N \\ d(s,t)\le\eta}} |X_s - X_t| \le \sup_{\vartheta\in U} |X_{u_\vartheta} - X_{v_\vartheta}| + 4 \sup_{t\in T_N} |X_t - X_{\sigma_\ell(t)}|. \]

By the triangle inequality again and (8.1.10), we arrive at

\[ \mathbb{E} \sup_{\substack{s,t\in T_N \\ d(s,t)\le\eta}} |X_s - X_t| \le \mathbb{E} \sup_{\vartheta\in U} |X_{u_\vartheta} - X_{v_\vartheta}| + 4\, \mathbb{E} \sup_{t\in T_N} |X_t - X_{\sigma_\ell(t)}| \le C \eta\, \Phi^{-1}\big( N^2(T, d, D 2^{-\ell}) \big) + 4C \int_0^{D 2^{-\ell}} \Phi^{-1}\big( N(T, d, u) \big) \, du. \]

Letting now $N$ tend to infinity gives

\[ \mathbb{E} \sup_{\substack{s,t\in T' \\ d(s,t)\le\eta}} |X_s - X_t| \le C \eta\, \Phi^{-1}\big( N^2(T, d, D 2^{-\ell}) \big) + 4C \int_0^{D 2^{-\ell}} \Phi^{-1}\big( N(T, d, u) \big) \, du. \]

The increment assumption (8.1.4) implies that $X_t$ is $d$-continuous in probability. Define $X'$ by

\[ X'_t = \lim_{T' \ni s \to t} X_s. \]

Then $X'$ is a version of $X$, and

\[ \mathbb{E} \sup_{d(s,t)\le\eta} |X'_s - X'_t| \le C \eta\, \Phi^{-1}\big( N^2(T, d, D 2^{-\ell}) \big) + 4C \int_0^{D 2^{-\ell}} \Phi^{-1}\big( N(T, d, u) \big) \, du. \]

Given $\varepsilon > 0$, we choose $\ell$ such that $4C \int_0^{D 2^{-\ell}} \Phi^{-1}\big( N(T, d, u) \big) \, du \le \varepsilon/2$, and then an $\eta$ small enough to have $C \eta\, \Phi^{-1}\big( N^2(T, d, D 2^{-\ell}) \big) \le \varepsilon/2$. In this way we are able to get $\mathbb{E} \sup_{d(s,t)\le\eta} |X'_s - X'_t| \le \varepsilon$. One can define $v(\varepsilon)$ to be the largest possible $\eta$. The sample uniform $d$-continuity of $X'$ on $T$ now follows from a standard application of the Borel–Cantelli lemma.

8.2 A theorem of Stechkin

Recall first Stechkin's theorem (see Gaposhkin [1966a: Theorem 8.3.5] or Billingsley [1999: Problem 6, p. 102; see also Theorem 12.2]).

8.2.1 Theorem. Let $\xi = \{\xi_i, i \ge 1\}$ be a sequence of random variables satisfying the following assumption:

\[ \mathbb{E} \Big| \sum_{i\le l\le j} \xi_l \Big|^\gamma \le \Big( \sum_{i\le l\le j} u_l \Big)^\alpha, \qquad 1 \le i \le j < \infty, \tag{8.2.1} \]

where $\{u_i, i \ge 1\}$ is a sequence of nonnegative reals such that the series $\sum_{l=1}^\infty u_l$ converges and $\alpha > 1$, $\gamma > 0$. Then the series $\sum_{l=1}^\infty \xi_l$ converges almost surely. Moreover, for $\alpha > 1$, one has the bound

\[ \Big\| \sup_{i,j\ge1} \Big| \sum_{i\le l\le j} \xi_l \Big| \Big\|_\gamma \le C \Big( \sum_{l=1}^\infty u_l \Big)^{\alpha/\gamma}, \]

where the constant $C$ depends on $\alpha$ only.

Note that this statement contains a trivial part: the case $0 < \gamma \le 1$. Indeed (8.2.1) provides $\mathbb{E} |\xi_l|^\gamma \le u_l^\alpha$. Since $0 < \gamma \le 1$, then

\[ \sum_{l=1}^\infty \mathbb{E} (1 \wedge |\xi_l|) \le \sum_{l=1}^\infty \mathbb{E} (1 \wedge |\xi_l|)^\gamma \le \sum_{l=1}^\infty \mathbb{E} |\xi_l|^\gamma \le \sum_{l=1}^\infty u_l^\alpha \le \Big( \sum_{l=1}^\infty u_l \Big)^\alpha < \infty. \tag{8.2.2} \]

The series $\sum_{l=1}^\infty (1 \wedge |\xi_l|)$ thus converges almost surely. But this amounts to saying that the series $\sum_{l=1}^\infty |\xi_l|$ converges almost surely, which is an even stronger conclusion. In what follows, we will thus restrict our attention to the case $\gamma > 1$ only. The statement can also be completed in the case when the series $\sum_{l=1}^\infty u_l$ diverges.


8.2.2 Theorem. Let the random variables $\xi = \{\xi_i, i \ge 1\}$ satisfy assumption (8.2.1) with $\alpha > 1$, $\gamma > 1$ and the sequence $\{u_i, i \ge 1\}$ of nonnegative reals be such that the series $\sum_{l=1}^\infty u_l$ diverges. Put for any integer $L \ge 1$, $U_L = \sum_{1\le l\le L} u_l$ and $S_L = \sum_{1\le l\le L} \xi_l$. Then,

\[ \lim_{L\to\infty} \frac{|S_L|}{U_L^{\alpha/\gamma} (\log U_L)^{a/\gamma}} = 0 \quad (\forall a > 1) \quad \text{almost surely.} \tag{8.2.3} \]

We now give a common proof of both of these statements by means of the metric entropy approach, thus avoiding tedious use of a dyadic chaining argument. The important case $\alpha = 1$ will be investigated by means of the same method for indicators in the next section (Theorem 8.3.1, see also Remark 8.3.5 for sequences of functions).

Proofs of Theorems 8.2.1 and 8.2.2. Put $U = \{U_L, L \ge 1\}$, $S = \{S_L, L \ge 1\}$. Assumption (8.2.1) can be reformulated as follows:

\[ \|S_j - S_k\|_\gamma \le (U_j - U_k)^{\alpha/\gamma} \qquad (\forall j \ge k \ge 1). \tag{8.2.1$'$} \]

Step 1. Proof of Theorem 8.2.1. Let $u = \sum_{l=1}^\infty u_l$. First observe that (8.2.1$'$) implies that the sequence $S$ is a Cauchy sequence in $L^\gamma$, thus converging to some element $S_\infty$ of $L^\gamma$. The new sequence obtained by adding to $S$ its limit is again denoted by $S$. Let $0 < \varepsilon \le u$ and write

\[ \begin{cases} I_j(\varepsilon) = [j\varepsilon, (j+1)\varepsilon[, & j = 0, 1, \dots, [u/\varepsilon], \\ J^*(\varepsilon) = \big\{ 0 \le j \le [u/\varepsilon] : I_j(\varepsilon) \cap U \ne \emptyset \big\}, \\ j^- = \inf\{ L : U_L \in I_j(\varepsilon) \} & \text{if } j \in J^*(\varepsilon). \end{cases} \]

Then for all $L \ge 1$, there exists $j \in J^*(\varepsilon)$ such that $0 \le U_L - U_{j^-} \le \varepsilon$, and $\#(J^*(\varepsilon)) \le [u/\varepsilon] + 1 \le 2u/\varepsilon$. Said differently, by invoking assumption (8.2.1$'$), we have

\[ \forall L \ge 1, \ \exists j \in J^*(\varepsilon) \text{ such that } \|S_L - S_{j^-}\|_\gamma \le \varepsilon^{\alpha/\gamma}. \]

Let $N(S, \|\cdot\|_\gamma, \rho)$ be the minimal number of open $\|\cdot\|_\gamma$-balls of radius $\rho$ centered in $S$ and enough to cover $S$. Then, for $0 < \rho \le u^{\alpha/\gamma}$,

\[ N(S, \|\cdot\|_\gamma, \rho) \le \frac{2u}{\rho^{\gamma/\alpha}}. \tag{8.2.4} \]
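The counting argument behind (8.2.4) is a one-dimensional net construction on the increasing sequence of partial sums $U_L$. The sketch below builds the indices $j^-$ and checks the two facts used above: every $U_L$ lies within $\varepsilon$ of some $U_{j^-}$, and the number of occupied intervals is at most $[u/\varepsilon] + 1 \le 2u/\varepsilon$. The weights $u_l = 1/l^2$, the truncation at 5000 terms and the value of $\varepsilon$ are assumptions made for the illustration.

```python
import math
from itertools import accumulate

def first_hits(U, eps):
    """Map each occupied interval index j (U_L in [j*eps, (j+1)*eps)) to j^- = first such L."""
    hits = {}
    for L, value in enumerate(U):
        hits.setdefault(math.floor(value / eps), L)
    return hits

# Hypothetical weights u_l = 1/l^2, l = 1..5000, and their partial sums U_L.
weights = [1.0 / (l * l) for l in range(1, 5001)]
U = list(accumulate(weights))
u_total = U[-1]

eps = 0.05
hits = first_hits(U, eps)

# Distance from each U_L to the first partial sum falling in the same interval.
gaps = [value - U[hits[math.floor(value / eps)]] for value in U]
```

By (8.2.1$'$) each gap of size at most $\varepsilon$ translates into an $L^\gamma$-distance of at most $\varepsilon^{\alpha/\gamma}$, which is how the covering bound in $\rho = \varepsilon^{\alpha/\gamma}$ arises.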

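We apply Theorem 8.1.1. The corresponding setting is $T = \mathbb{N}$, $X_n = S_n$, $n \in \mathbb{N}$, $d(n, m) = \|S_n - S_m\|_\gamma$. And the entropy integral is easily estimated:

\[ \int_0^{u^{\alpha/\gamma}} N(S, \|\cdot\|_\gamma, \rho)^{1/\gamma} \, d\rho \le (2u)^{1/\gamma} \int_0^{u^{\alpha/\gamma}} \rho^{-1/\alpha} \, d\rho = C_\alpha u^{\alpha/\gamma} < \infty, \]

since $\alpha > 1$, where the constant $C_\alpha$ depends on $\alpha$ only. Therefore $S$ is convergent almost surely. And we have the uniform bound

\[ \Big\| \sup_{n,m\ge1} |S_n - S_m| \Big\|_\gamma \le C_\alpha u^{\alpha/\gamma}, \]

with a constant $C_\alpha$ depending on $\alpha$ only. We now go to the proof of the second statement.

Step 2. Proof of Theorem 8.2.2. Let $M > 1$ and put for any integer $k \ge 1$, $I_k = [M^k, M^{k+1}[$. Let $\kappa = \{\kappa_p, p \ge 1\}$ be the sequence defined by $\kappa_p = k$ if $I_k$ is the $p$-th interval such that $I_k \cap U \ne \emptyset$. Let $\mathcal{L}_p$ be the set of indices defined by $L \in \mathcal{L}_p \Leftrightarrow U_L \in I_{\kappa_p}$. Pick arbitrarily some index in $\mathcal{L}_p$, which we write $L^*_p$. Let $a > 1$. By assumption (8.2.1),

\[ \mathbb{P}\big\{ |S_{L^*_p}| > \varepsilon M^{\alpha(\kappa_p+1)/\gamma} p^{a/\gamma} \big\} \le \frac{\|S_{L^*_p}\|_\gamma^\gamma}{\varepsilon^\gamma M^{\alpha(\kappa_p+1)} p^a} \le \frac{|U_{L^*_p}|^\alpha}{\varepsilon^\gamma M^{\alpha(\kappa_p+1)} p^a} \le \frac{1}{\varepsilon^\gamma p^a}. \]

Thus by the Borel–Cantelli lemma,

\[ \mathbb{P}\Big\{ \limsup_{p\to\infty} \frac{|S_{L^*_p}|}{M^{\alpha(\kappa_p+1)/\gamma} p^{a/\gamma}} \le \varepsilon \Big\} = 1. \tag{8.2.5} \]

Examine now the oscillation of $S$ over $\mathcal{L}_p$. For $i, j \in \mathcal{L}_p$ we have $\|S_i - S_j\|_\gamma \le |U_i - U_j|^{\alpha/\gamma}$. For $j \in \mathcal{L}_p$ replace $S_j$ by $S'_j = S_j / M^{\alpha(\kappa_p+1)/\gamma}$, $u_j$ by $u'_j = u_j / M^{\kappa_p+1}$ and $U_j$ by $U'_j = \sum_{l=1}^j u'_l$. Then

\[ \|S'_i - S'_j\|_\gamma \le |U'_i - U'_j|^{\alpha/\gamma} \le 1 \qquad (i, j \in \mathcal{L}_p). \]

Let $\mathcal{S}_p = \{S'_L, L \in \mathcal{L}_p\}$. From the computation made at the previous step, we have the following estimate:

\[ \int_0^1 N(\mathcal{S}_p, \|\cdot\|_\gamma, \rho)^{1/\gamma} \, d\rho \le 2 \int_0^1 \rho^{-1/\alpha} \, d\rho < \infty. \]

Hence by Theorem 8.1.1,

\[ \Big\| \sup_{i,j\in\mathcal{L}_p} |S'_i - S'_j| \Big\|_\gamma \le C_\alpha < \infty, \tag{8.2.6} \]

where $C_\alpha$ depends on $\alpha$ only. By Tchebycheff's inequality, we deduce from the previous bound that $\mathbb{P}\big\{ \sup_{i,j\in\mathcal{L}_p} |S'_i - S'_j| > \varepsilon p^{a/\gamma} \big\} \le C_\alpha^\gamma / (\varepsilon^\gamma p^a)$, and by the Borel–Cantelli lemma again

\[ \mathbb{P}\Big\{ \limsup_{p\to\infty} \frac{\sup_{i,j\in\mathcal{L}_p} |S_i - S_j|}{M^{\alpha(\kappa_p+1)/\gamma} p^{a/\gamma}} \le \varepsilon \Big\} = 1. \tag{8.2.7} \]

Combining now (8.2.5) with (8.2.7), and writing $S_L = S_L - S_{L^*_p} + S_{L^*_p}$, easily gives:

\[ \mathbb{P}\Big\{ \limsup_{L\to\infty} \frac{|S_L|}{U_L^{\alpha/\gamma} (\log U_L)^{a/\gamma}} \le \varepsilon\, \frac{1 + M^{\alpha/\gamma}}{(\log M)^{\alpha/\gamma}} \Big\} = 1. \tag{8.2.8} \]

Since $\varepsilon$ is arbitrary, this implies the result.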
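The entropy integral of Step 1 can be evaluated in closed form: a direct computation suggests $(2u)^{1/\gamma} \int_0^{u^{\alpha/\gamma}} \rho^{-1/\alpha}\,d\rho = 2^{1/\gamma} \frac{\alpha}{\alpha-1} u^{\alpha/\gamma}$, which is at most $\frac{2\alpha}{\alpha-1} u^{\alpha/\gamma}$ when $\gamma \ge 1$. The sketch below compares this closed form with a plain midpoint rule; the parameter values and function names are assumptions chosen for the example.

```python
def entropy_integral_closed_form(u, alpha, gamma):
    """(2u)^(1/gamma) * integral over (0, u^(alpha/gamma)) of rho^(-1/alpha), in closed form."""
    return 2.0 ** (1.0 / gamma) * alpha / (alpha - 1.0) * u ** (alpha / gamma)

def entropy_integral_midpoint(u, alpha, gamma, n=200_000):
    """Midpoint-rule approximation of the same integral (the singularity at 0 is integrable)."""
    upper = u ** (alpha / gamma)
    h = upper / n
    total = 0.0
    for i in range(n):
        rho = (i + 0.5) * h
        # integrand: (2u / rho^(gamma/alpha))^(1/gamma), the bound (8.2.4) to the power 1/gamma
        total += (2.0 * u / rho ** (gamma / alpha)) ** (1.0 / gamma)
    return total * h

# Hypothetical parameters: u = 1.5, alpha = 2, gamma = 3.
exact = entropy_integral_closed_form(1.5, 2.0, 3.0)
approx = entropy_integral_midpoint(1.5, 2.0, 3.0)
```

The midpoint rule converges slowly near the singularity at 0, so only a loose agreement (well under one percent here) should be expected.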


8.2.3 Remark. Very often, Theorem 8.2.2 applies in situations in which $\xi_l = 1(A_l) - \mathbb{P}(A_l)$ and $u_l = \mathbb{P}(A_l)$. And so $S_L$ expresses the difference $\sum_{l=1}^L 1(A_l) - \sum_{l=1}^L \mathbb{P}(A_l)$. It is worth observing here that if the sequence $\kappa$ is very sparse, a smaller order of magnitude than $U_L^{\alpha/\gamma} (\log U_L)^{a/\gamma}$ can be assigned to the error term $|S_L|$. This follows from (8.2.5) and (8.2.7) and is directly readable from the data.

We continue with a second observation concerning the consistency, from a theoretical point of view, of the treatment proposed for the almost sure convergence of series of functions, through the approach described by Theorem 8.2.1. Later we will see in Remark 8.3.5, when treating the limit case $\alpha = 1$, that this approach also allows us to recapture (even in a more general form) the Rademacher–Menshov theorem. The very formulation of that theorem does not however allow us to recover classical results on almost sure convergence of series of independent random variables. Given a sequence $\xi = \{\xi_i, i \ge 1\}$ of centered, square integrable, independent random variables, it indeed requires that the series $\sum_{l\ge1} \mathbb{E}\, \xi_l^2 (\log l)^2$ converges to ensure the convergence almost everywhere of the series $\sum_{l\ge1} \xi_l$, whereas it is classical (Petrov [1975: 266]) that the convergence of the series $\sum_{l\ge1} \mathbb{E}\, \xi_l^2$ is enough (and necessary). This result is however contained in Theorem 8.2.1. Here is how to get this. First, we shall quit $L^2$ for $L^p$, $p > 2$, where we will apply Theorem 8.2.1. Introduce for some arbitrary $\varepsilon > 0$ the sequence $\xi^\varepsilon$ of truncated random variables:

\[ \xi_l^\varepsilon = \xi_l\, 1\{|\xi_l| \le \varepsilon\}, \qquad l \ge 1. \]

Both sequences $\xi$ and $\xi^\varepsilon$ are equivalent since the series $\sum_l \mathbb{P}\{\xi_l \ne \xi_l^\varepsilon\}$ converges. Appeal now to Rosenthal's inequality: let $p \ge 2$. There exists a constant $C_p$ depending on $p$ only, such that for any sequence $x_i$, $i \le n$, of independent elements of $L^p(\mathbb{P})$ with zero expectation,

\[ \mathbb{E} \Big| \sum_{i\le l\le j} x_l \Big|^p \le C_p \Big\{ \sum_{i\le l\le j} \mathbb{E} |x_l|^p + \Big( \sum_{i\le l\le j} \mathbb{E}\, x_l^2 \Big)^{p/2} \Big\}. \tag{8.2.9} \]

Assume first that $\xi$ is a sequence of symmetric random variables. Thus

\[ \mathbb{E} \Big| \sum_{i\le l\le j} \xi_l^\varepsilon \Big|^p \le C_{p,\varepsilon} \Big( \sum_{i\le l\le j} \mathbb{E} (\xi_l^\varepsilon)^2 \Big)^{p/2}, \]

where Cp,ε depends on p, ε only. For p > 2, Theorem 8.2.1 applies and we get the result in that case. Now if ξ is not symmetric (but centered), let ξ = {ξl , l ≥ 1} be an independent copy of ξ defined on a different probability space, with corresponding probability and expectation symbols P and E . Let ξlε = ξl 1{|ξl | ≤ ε} and ξl ε = ε ε ξl 1{|ξl | ≤ ε}. Then xl = ξl − ξl is a symmetric sequence. And by the reasoning made before, the series l xl converges. Moreover, by using the uniform bound in Theorem 8.2.1, p/2 E E sup xl ≤ Cp,ε E ξl2 , i,j ≥1 i≤l≤j

l


so that E supj ≥1 l≤j xl < ∞, P-almost surely. An application of the dominated convergence theorem conditionally to ξ yields that the limit limj →∞ E l≤j xl = ε limj →∞ l≤j ξl − E ξl 1{|ξl | ≤ ε} exists P-almost everywhere. It now remains to control the sum l≤j E ξl 1{|ξl | ≤ ε}. But the centering assumption implies that ∞ −E ξl 1{|ξl | ≤ ε} = E ξl 1{|ξl | > ε} = εP{ξl > ε} + ε P{ξl > u}du. By assumption, the series l E ξl2 converges. Applying the Tchebycheff inequality to each term of the above writing of −E ξl 1{|ξl | ≤ ε}, we thus deduce convergence of the series ensures convergence almost everywhere of the series l≤j ξlε , l E ξl 1{|ξl | ≤ ε}. This and thereby of the series l≤j ξl since both sequences ξ and ξ ε are equivalent, in view of convergence of the series l P{ξl = ξlε }.

8.3 An application to the quantitative Borel–Cantelli lemma

In this section, we discuss various formulations of the quantitative form of the Borel–Cantelli lemma. This is a relatively universal tool with wide fields of applications, notably in probability theory, metrical number theory and uniform distribution theory. The section is presented as a complementary part of the preceding one, devoted here to the case $\alpha = 1$ in Stechkin's theorem. We show that the metric entropy approach is relevant there. We have also taken the opportunity to present some classical results, following a case by case natural progression, from independence to dependence in this study. We have not taken into consideration the various existing conditional versions of the Borel–Cantelli lemma, since they do not contain quantitative aspects. We have isolated as lemmas some useful estimates for suprema of finite families of random variables. We start with elementary considerations concerning Borel–Cantelli's lemma, which we recall for our purpose.

Borel–Cantelli lemma. Let $(\Omega, \mathcal{B}, \mathbb{P})$ be some probability space and $\{A_k, k \ge 1\}$ a sequence of measurable subsets of $\Omega$.
(i) If the series $\sum_{k\ge1} \mathbb{P}(A_k)$ converges, then $\mathbb{P}\{\limsup_{k\to\infty} A_k\} = 0$.
(ii) If the series $\sum_{k\ge1} \mathbb{P}(A_k)$ diverges and the events are independent, then $\mathbb{P}\{\limsup_{k\to\infty} A_k\} = 1$.

As is well known, the independence assumption on the events $A_k$ is too strong for getting the conclusion. It suffices indeed that some 0-1 law exists, and that the correlation condition be satisfied:

\[ \mathbb{P}(A_k \cap A_l) \le C\, \mathbb{P}(A_k)\, \mathbb{P}(A_l) \qquad (\forall k \ne l), \tag{8.3.1} \]

where $C$ is some absolute constant. This follows from the Paley–Zygmund inequality: for any $g \in L^2(\mathbb{P})$ such that $\mathbb{P}(g \ge 0) = 1$ and any real $\lambda \in [0, 1]$,

\[ \mathbb{P}\Big\{ g \ge \lambda \int g \, d\mathbb{P} \Big\} \ge (1 - \lambda)^2\, \frac{\big( \int g \, d\mathbb{P} \big)^2}{\int g^2 \, d\mathbb{P}}. \tag{8.3.2} \]

Applying this inequality for $g = \sum_{I\le k\le J} 1_{A_k}$ gives

\[ \mathbb{P}\Big\{ \sum_{I\le k\le J} 1_{A_k} \ge \lambda \sum_{I\le k\le J} \mathbb{P}(A_k) \Big\} \ge (1 - \lambda)^2\, \frac{\big( \sum_{I\le k\le J} \mathbb{P}(A_k) \big)^2}{\sum_{I\le k\le J} \mathbb{P}(A_k) + \sum_{I\le k\ne l\le J} \mathbb{P}(A_k \cap A_l)} \ge (1 - \lambda)^2\, \frac{\sum_{I\le k\le J} \mathbb{P}(A_k)}{1 + C \sum_{I\le k\le J} \mathbb{P}(A_k)}, \tag{8.3.3} \]

which easily implies $\mathbb{P}(\limsup_{k\to\infty} A_k) = 1$ whenever $\mathbb{P}(\limsup_{k\to\infty} A_k) = 0$ or $1$. Note that by Fatou's lemma, (8.3.3) also provides an indication of the number of occurrences of the sets $A_k$: for any partial index $\mathcal{J}$,

\[ \mathbb{P}\Big\{ \omega : \limsup_{\mathcal{J} \ni J \to \infty} \frac{\#\{1 \le k \le J : \omega \in A_k\}}{\sum_{I\le k\le J} \mathbb{P}(A_k)} \ge \lambda \Big\} \ge \frac{(1 - \lambda)^2}{C} \qquad (0 \le \lambda \le 1). \]

A great deal of attention has been devoted to getting much better estimates for the quantity

\[ N_J = \#\{1 \le k \le J : A_k \text{ occurs}\}. \tag{8.3.4} \]

Let us first look at the independent case. Since $N_J - \mathbb{E} N_J$ is the sum of the independent centered Bernoulli random variables $\xi_k = 1(A_k) - \mathbb{P}(A_k)$, we may invoke the strong law of large numbers. This one will in fact follow from a stronger result. Let $\varepsilon > 0$ and put

\[ \eta_k = \frac{\xi_k}{(\mathbb{E} N_k)^{1/2+\varepsilon}}. \]

Since the series $\sum_{k\ge1} \mathbb{P}(A_k)$ diverges, the series $\sum_{k\ge1} \mathbb{P}(A_k) / (\mathbb{E} N_k)^\alpha$ thus converges for any real $\alpha > 1$ (see (4.8.6)). In particular the series $\sum_{k\ge1} \mathbb{E}\, \eta_k^2$ converges. The random variables $\eta_k$ being independent, this implies, according to the Two-Series Theorem (Petrov [1975a], p. 266), that the series $\sum_{k\ge1} \eta_k$ converges almost surely. By Kronecker's lemma it follows that for all $\varepsilon > 0$,

\[ \mathbb{P}\Big\{ \lim_{J\to\infty} \frac{N_J - \sum_{1\le k\le J} \mathbb{P}(A_k)}{\big( \sum_{1\le k\le J} \mathbb{P}(A_k) \big)^{1/2+\varepsilon}} = 0 \Big\} = 1. \tag{8.3.5} \]
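The Paley–Zygmund inequality (8.3.2), on which (8.3.3) rests, can be verified exactly on a finite example. The sketch below takes for $g$ the number of occurrences among three independent events and compares both sides for several values of $\lambda$; the event probabilities and the function name are assumptions made for the illustration.

```python
def paley_zygmund_sides(values, probs, lam):
    """Return (P{g >= lam*E g}, (1-lam)^2 (E g)^2 / E g^2) for a finite-valued g >= 0."""
    mean = sum(p * v for v, p in zip(values, probs))
    second_moment = sum(p * v * v for v, p in zip(values, probs))
    lhs = sum(p for v, p in zip(values, probs) if v >= lam * mean)
    rhs = (1.0 - lam) ** 2 * mean ** 2 / second_moment
    return lhs, rhs

# Hypothetical example: g counts occurrences among three independent events
# with probabilities 0.5, 0.3, 0.2 (law of g computed by direct enumeration).
event_probs = [0.5, 0.3, 0.2]
law = {}
for mask in range(2 ** len(event_probs)):
    weight, count = 1.0, 0
    for k, q in enumerate(event_probs):
        occurs = (mask >> k) & 1
        weight *= q if occurs else 1.0 - q
        count += occurs
    law[count] = law.get(count, 0.0) + weight

values, probs = list(law.keys()), list(law.values())
results = {lam: paley_zygmund_sides(values, probs, lam)
           for lam in (0.0, 0.25, 0.5, 0.75, 1.0)}
```

Each entry of `results` is a pair (left-hand side, right-hand side) of (8.3.2); the inequality is far from sharp in this example, which is typical when $g$ is not concentrated near its mean.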


This strictly stronger result can be made precise by invoking Kolmogorov's law of the iterated logarithm for sums of independent random variables (Petrov [1975a], Theorem 1, p. 292). For any integer $J \ge 1$, put $B_J = \sum_{k=1}^J P(A_k)\big(1 - P(A_k)\big)$. Then,

$$P\Big\{\limsup_{J\to\infty} \frac{N_J - \sum_{k=1}^J P(A_k)}{\big(2B_J\log\log B_J\big)^{1/2}} = 1\Big\} = 1. \tag{8.3.6}$$

Finally the statistic of the number of occurrences can also be made precise by invoking the Berry–Esseen inequality (Petrov [1975a], Theorem 3, p. 111):

$$\sup_{x\in\mathbb{R}} \Big| P\Big\{\frac{N_J - \sum_{1\le k\le J} P(A_k)}{B_J^{1/2}} < x\Big\} - \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-u^2/2}\,du \Big| = O(L_J),$$

where

$$L_J = \frac{\sum_{k=1}^J E|\xi_k|^3}{\big(\sum_{k=1}^J E\xi_k^2\big)^{3/2}} = \frac{\sum_{k=1}^J P(A_k)\big(1-P(A_k)\big)\big(1 - 2P(A_k) + 2P(A_k)^2\big)}{\big(\sum_{k=1}^J P(A_k)(1-P(A_k))\big)^{3/2}},$$

and $L_J \sim \big(\sum_{k=1}^J P(A_k)\big)^{-1/2}$ as $J$ tends to infinity, if $\lim_{k\to\infty} P(A_k) = 0$. Obviously we have a central limit theorem; we also have in fact an almost sure central limit theorem, which we will not describe here. Thus we have a complete picture of the asymptotic behavior of the number of occurrences for the sequence $\{A_k, k\ge1\}$ in the independent case.

Other forms of Paley–Zygmund inequality. This inequality is an extremely useful tool, and sometimes other variants turn out to be more appropriate. Observe first that the original Paley–Zygmund inequality is a simple consequence of the Cauchy–Schwarz inequality. We have ($g \ge 0$, $0\le\lambda\le1$)

$$\big(E\,g\chi\{g\ge\lambda E g\}\big)^2 \le E g^2\; P\{g\ge\lambda E g\}.$$

But $E\,g\chi\{g\ge\lambda E g\} = E g - E\,g\chi\{g<\lambda E g\} \ge (1-\lambda)E g$. By combining both inequalities, we easily get

$$P\{g\ge\lambda E g\} \ge (1-\lambda)^2\,\frac{(E g)^2}{E g^2}.$$

Lemma 8.7.4 has also provided the inequality $P\{X\ge E X\} > 0$, valid for $X\ge0$ with $E X<\infty$. More generally, let $r>s>0$ and $0\le\varepsilon\le1$. Then for any non-negative random variable $X\in L^r$,

$$P\big\{X\ge\varepsilon\|X\|_s\big\}^{\frac1s-\frac1r} \ge (1-\varepsilon^s)^{\frac1s}\,\frac{\|X\|_s}{\|X\|_r}. \tag{8.3.7}$$

Indeed, let $X, Y$ be nonnegative random variables. By applying Hölder's inequality (with $p = r/s$), we have $E X^sY \le \big(E Y^{\frac{r}{r-s}}\big)^{\frac{r-s}{r}}\big(E X^r\big)^{\frac{s}{r}}$. Choose $Y = 1\{X\ge\varepsilon\|X\|_s\}$. As

$$E X^sY = E X^s - E X^s 1\{X<\varepsilon\|X\|_s\} \ge E X^s - E X^s 1\{X^s\le\varepsilon^s E X^s\} \ge (1-\varepsilon^s)E X^s,$$


8 The metric entropy method

and $E X^sY \le P\{X\ge\varepsilon\|X\|_s\}^{\frac{r-s}{r}}\|X\|_r^{s}$, we get $P\{X\ge\varepsilon\|X\|_s\}^{\frac{r-s}{r}}\|X\|_r^{s} \ge (1-\varepsilon^s)\|X\|_s^{s}$, or

$$P\big\{X\ge\varepsilon\|X\|_s\big\}^{\frac1s-\frac1r} \ge (1-\varepsilon^s)^{\frac1s}\,\frac{\|X\|_s}{\|X\|_r},$$

as claimed. Inequality (8.3.7) can be viewed as a version of Petrov's inequality. If $X$ is any random variable and $s>r>0$, then $X\in L^s$ implies

$$P\{X\ne0\}^{\frac1r-\frac1s} \ge \frac{\|X\|_r}{\|X\|_s}.$$

(See Petrov [1975b], inequality (2), p. 392.)

In the light of the remark made after the statement of Borel–Cantelli's lemma, it is interesting to figure out whether these results, or some of them, are extendable under weaker assumptions than independence. Before going further, it seems worthwhile to point out a kind of subsequence principle for independence observed by Neveu, after subsequent works from Fischler [1967], Gillis [1936], Lorentz [1960], Rényi [1958], Sucheston [1960], Visser [1937]. Weak convergence is essential in what follows. Let $\mathcal{A} = \{A_k, k\ge1\}$ be a sequence of measurable subsets of $\Omega$ such that $\liminf_{k\to\infty} P(A_k) = \rho \in\, ]0,1]$. Then, according to Theorem 2, p. 67 of Neveu [1965], either

• the sequence of indicators $\{1(A_k), k\ge1\}$ converges weakly in $L^2(P)$ to the constant function equal to $\rho$, and then, for any $\varepsilon>0$, there exists a subsequence $n_1<n_2<\cdots$ such that if $B_m = A_{n_m}$, for any two distinct, finite subsets $\mathcal{I}$ and $\mathcal{J}$ with $\#(\mathcal{I}) = I$, $\#(\mathcal{J}) = J$, the following inequalities are realized:

$$(1-\varepsilon)\rho^{I}(1-\rho)^{J} \le P\Big(\bigcap_{i\in\mathcal{I}} B_i \cap \bigcap_{j\in\mathcal{J}} B_j^c\Big) \le (1+\varepsilon)\rho^{I}(1-\rho)^{J};$$

• or the sequence of indicators $\{1(A_k), k\ge1\}$ does not converge weakly in $L^2(P)$, and there exist a real $\delta>0$ and a subsequence $n_1<n_2<\cdots$ such that if $B_m = A_{n_m}$, for any finite subset $\mathcal{I}$ with $\#(\mathcal{I}) = I$, the following inequality is realized:

$$P\Big(\bigcap_{i\in\mathcal{I}} B_i\Big) \ge (\rho+\delta)^{I}.$$

This result also generalizes the Poincaré recurrence theorem (Theorem 3.1.5). Consider now the dependent case. The first idea which comes to mind is whether it is possible to get something under assumption (8.3.1). Without any strengthening of (8.3.1) the answer is negative. This follows from a counterexample by Rieders for strong mixing sequences (cf. Rieders [1993], remark following Theorem 1). One can also use the last part of the proof of Theorem 3, p. 68 in Fischler [1967] to give an elementary construction of a counterexample. Let $\eta>0$ and denote $I = [0,1]$, $J = [0,1+\eta]$. Let $\lambda$ be the Lebesgue measure on the interval $I$, and $\tilde\lambda$ be the probability measure on $J$ defined by $\tilde\lambda(dx) = (1+\eta)^{-1}1_J(x)\,dx$. On $(I,\lambda)$ let us consider a sequence $\{\varepsilon_n, n\ge1\}$ of independent (Rademacher) random variables taking values $\pm1$ with probability $1/2$. Define a sequence of events $\mathcal{B} = \{B_n, n\ge1\}$ by $B_n = \{\varepsilon_n = 1\}$. We view them as measurable events of the enlarged probability space $(J,\tilde\lambda)$. It is easily checked that for $n\ne m$,

$$\tilde\lambda(B_n\cap B_m) = \frac{1}{4(1+\eta)} = (1+\eta)\,\tilde\lambda(B_n)\tilde\lambda(B_m), \qquad \tilde\lambda(B_n) = \frac{1}{2(1+\eta)} \quad\text{and}\quad \tilde\lambda\big(\limsup_{n\to\infty} B_n\big) = \frac{1}{1+\eta}.$$
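The identities of this counterexample can be checked exactly on a concrete model. The sketch below assumes, purely for illustration, that $B_n$ is realized as the set of $x\in[0,1[$ whose $n$-th binary digit is $0$ (one standard model of $\{\varepsilon_n = 1\}$); the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def measure_on_enlarged_space(indices, eta):
    """Exact value of lambda~ of the intersection of the B_n, n in indices.

    Assumed model: B_n = {x in [0, 1): n-th binary digit of x is 0}.
    lambda~ is Lebesgue measure on J = [0, 1 + eta] divided by (1 + eta).
    """
    depth = max(indices)
    # Enumerate the dyadic intervals of length 2^-depth via their digits.
    hits = sum(
        1
        for digits in product((0, 1), repeat=depth)
        if all(digits[n - 1] == 0 for n in indices)
    )
    lebesgue = Fraction(hits, 2 ** depth)
    return lebesgue / (1 + Fraction(eta))


if __name__ == "__main__":
    eta = Fraction(1, 2)
    p2 = measure_on_enlarged_space([2], eta)
    p5 = measure_on_enlarged_space([5], eta)
    p25 = measure_on_enlarged_space([2, 5], eta)
    print(p2, p5, p25, (1 + eta) * p2 * p5)
```

In this model the pair probabilities factorize up to the constant $1+\eta$, while every $B_n$ lies in $I$, so that $\limsup_n B_n$ cannot have $\tilde\lambda$-measure larger than $1/(1+\eta) < 1$.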

This also provides a simple example of an orthonormal sequence, for which partial sums √ do not satisfy CLT. Indeed, let ξn (x) = 2(1 + η)1[0,1[ (x)εn (x), and put Sn (ξ ) = ξ1 + · · · + ξn , Sn (ε) = ε1 + · · · + εn . Then ξ = {ξn , n ≥ 1} is an orthonormal system ˜ but ξ ∈ in L2 (J, λ), / CLT since √ λ˜ x ∈ J : Sn (ξ )(x)/ n < t √ √ = λ˜ x ∈ I : Sn (ξ )(x)/ n < t + λ˜ x ∈ J \I : Sn (ξ )(x)/ n < t 2 " # √ = λ x ∈ J : 2(1 + η)Sn (ε)(x)/ n < t + 1R+ (t) /(1 + η) 2 " # → P{N (0, 1) < t/ 2(1 + η)} + 1R+ (t) /(1 + η) = P{N (0, 1) < t}. We shall concentrate in what follows on strong laws of large numbers with speed of convergence, rather than the study of the statistic of the occurrences via the CLT. The only comment we shall make in that direction concerns weakly multiplicative systems (WMS), a notion due to Alexits [1961] and later extended by Móricz [1976]. The study of the CLT, and therefore of the characteristic functions of the number of occurrences, indeed requires much stronger information on the correlation properties of the family {ξk , k ≥ 1}, where we have again set ξk = 1(Ak ) − P(Ak ). If for instance this family is for some real 1 ≤ p < 2, a p-WMS system:

1/p E ξi . . . ξi p sup Cr < ∞ where Cr = , r 1 r

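… $\gamma > 1$ and consider a sequence $\{A_l, l\ge1\}$ of measurable subsets of $\Omega$. Put $m_l = P(A_l)$ and $\xi_l = 1(A_l) - m_l$, $l\ge1$. We assume that the following assumptions are fulfilled:

(i) $E\big|\sum_{i\le l\le j}\xi_l\big|^{\gamma} \le C\sum_{i\le l\le j} m_l$, $\ 0\le i\le j<\infty$,

(ii) the series $\sum_{k=1}^{\infty} m_k$ diverges.

Then, for every $a>\gamma+1$:

$$P\Big\{\lim_{J\to\infty} \frac{\#\{1\le k\le J : A_k \text{ occurs}\} - \sum_{1\le k\le J} m_k}{\big(\sum_{1\le k\le J} m_k\big)^{1/\gamma}\big(\log\sum_{1\le k\le J} m_k\big)^{a/\gamma}} = 0\Big\} = 1. \tag{8.3.11}$$

In the independent case, Theorem 8.3.2 does not bring any more than Theorem 8.3.1 or property (8.3.6), since by Rosenthal's inequality (8.2.9),

$$E\Big|\sum_{i\le l\le j}\xi_l\Big|^{\gamma} \le C_\gamma\Big(\sum_{i\le l\le j} m_l\Big)^{\gamma/2}, \qquad 0\le i\le j<\infty.$$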

To prove Theorem 8.3.2, we begin with a useful lemma.


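8.3.3 Lemma. Let $\gamma>1$, $0<\beta\le1$ and consider a finite collection of random variables $E = (X_1,\dots,X_N)\subset L^\gamma(P)$, such that

(i) $\sup_{1\le i,j\le N}\|X_i - X_j\|_\gamma \le 1$,

(ii) $N(E,\|\cdot\|_\gamma,\varepsilon) \le \dfrac{C}{\varepsilon^{1/\beta}}$ $\quad(0<\varepsilon\le1)$.

Then there exists a constant $K_{\beta,\gamma}$ depending on $\beta,\gamma$ only such that

$$\Big\|\sup_{1\le i,j\le N}|X_i - X_j|\Big\|_\gamma \le \begin{cases} K_{\beta,\gamma}\max(C^{\beta}, C^{1/\gamma}) & \text{if } \beta\gamma>1,\\ K_{\beta,\gamma}\,C^{1/\gamma}\log\big(\tfrac{Ne}{C}\big) & \text{if } \beta\gamma=1,\\ K_{\beta,\gamma}\,C^{\beta}N^{\frac1\gamma-\beta} & \text{if } \beta\gamma<1. \end{cases} \tag{8.3.12}$$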

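Note that a straightforward application of inequality (8.1.3) with $\varphi(x) = |x|^\gamma$ would have given $\big\|\sup_{1\le i,j\le N}|X_i - X_j|\big\|_\gamma \le N^{2/\gamma}$, which is a far poorer bound. We shall see in the next lemma that the requirement made on the entropy numbers of the family $E$ is well adapted to our purpose.

Proof. Under our assumption $N(E,\|\cdot\|_\gamma,\varepsilon) \le \min(C/\varepsilon^{1/\beta}, N)$. Apply Theorem 8.1.1 with $\varphi(x) = |x|^\gamma$. The entropy integral in (8.1.14) can be estimated as follows:

$$\int_0^1 N(E,\|\cdot\|_\gamma,\varepsilon)^{1/\gamma}\,d\varepsilon \le \int_0^1 \min\Big(\frac{C}{\varepsilon^{1/\beta}}, N\Big)^{1/\gamma} d\varepsilon = \int_0^{(C/N)^\beta} N^{1/\gamma}\,d\varepsilon + C^{1/\gamma}\int_{(C/N)^\beta}^{1}\varepsilon^{-1/\beta\gamma}\,d\varepsilon = C^{\beta}N^{\frac1\gamma-\beta} + C^{1/\gamma}\int_{(C/N)^\beta}^{1}\varepsilon^{-1/\beta\gamma}\,d\varepsilon.$$

A direct computation then shows

$$\int_0^1 N(E,\|\cdot\|_\gamma,\varepsilon)^{1/\gamma}\,d\varepsilon \le \begin{cases} \max(C^{\beta}, C^{1/\gamma})\,\dfrac{2\beta\gamma-1}{\beta\gamma-1} & \text{if } \beta\gamma>1,\\[2mm] C^{1/\gamma}\log\Big(\big(\tfrac{N}{C}\big)^{\beta}e\Big) & \text{if } \beta\gamma=1,\\[2mm] C^{\beta}N^{\frac1\gamma-\beta}\,\dfrac{1}{1-\beta\gamma} & \text{if } \beta\gamma<1. \end{cases}$$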
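The three regimes of this entropy integral can be checked numerically. The following sketch is an illustration only: the function names, the midpoint rule and the sample values $C = 2$, $N = 100$, $\gamma = 2$ are assumptions made here (with $C\le N$, so that the cut-off $(C/N)^\beta$ lies in $[0,1]$).

```python
import math


def entropy_integral(C, N, beta, gamma, steps=200_000):
    """Midpoint-rule value of int_0^1 min(C / eps^(1/beta), N)^(1/gamma) d eps."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        eps = (i + 0.5) * h
        total += min(C / eps ** (1.0 / beta), N) ** (1.0 / gamma)
    return total * h


def closed_form_bound(C, N, beta, gamma):
    """Right-hand side of the three-case estimate, as a function of beta * gamma."""
    bg = beta * gamma
    if bg > 1:
        return max(C ** beta, C ** (1 / gamma)) * (2 * bg - 1) / (bg - 1)
    if bg == 1:
        return C ** (1 / gamma) * math.log((N / C) ** beta * math.e)
    return C ** beta * N ** (1 / gamma - beta) / (1 - bg)


if __name__ == "__main__":
    for beta in (1.0, 0.5, 0.25):  # beta * gamma > 1, = 1, < 1 with gamma = 2
        print(beta, entropy_integral(2.0, 100.0, beta, 2.0),
              closed_form_bound(2.0, 100.0, beta, 2.0))
```

In the borderline case $\beta\gamma = 1$ the bound is attained exactly, which is where the logarithmic factor in (8.3.12) comes from.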

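The result is thus implied by the conclusion of Theorem 8.1.1.

8.3.4 Lemma. Let $\gamma>1$, $0<\beta\le1$ and consider a finite collection of random variables $E = (X_1,\dots,X_N)\subset L^\gamma(P)$, and reals $0\le t_1\le t_2\le\cdots\le t_N\le1$ such that

$$\|X_j - X_i\|_\gamma \le (t_j - t_i)^{\beta} \qquad (\forall\, 1\le i\le j\le N). \tag{8.3.13}$$

Then, there exists a constant $K_{\beta,\gamma}$ depending on $\beta,\gamma$ only, such that

$$\Big\|\sup_{1\le i,j\le N}|X_i - X_j|\Big\|_\gamma \le \begin{cases} K_{\beta,\gamma} & \text{if } \beta\gamma>1,\\ K_{\beta,\gamma}\log N & \text{if } \beta\gamma=1,\\ K_{\beta,\gamma}N^{\frac1\gamma-\beta} & \text{if } \beta\gamma<1. \end{cases} \tag{8.3.14}$$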


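From the lemma above follows the well-known Rademacher–Menshov maximal inequality. Let $X_1, X_2,\dots,X_n$, $n\ge2$, have zero means and be orthogonal. Then

$$E\max_{j=1}^{n}\Big|\sum_{i=1}^{j}X_i\Big|^2 \le C(\log n)^2\sum_{i=1}^{n}E X_i^2,$$

where $C$ is a universal constant.

Proof. It is similar to the construction made in the proof of Theorem 8.2.1. Let $0<\varepsilon\le1$ and write

$$\begin{cases} I_j(\varepsilon) = [j\varepsilon,(j+1)\varepsilon[, \quad j = 0,1,\dots,[\tfrac1\varepsilon],\\ J^*(\varepsilon) = \big\{0\le j\le[\tfrac1\varepsilon] : I_j(\varepsilon)\cap\{t_l, 1\le l\le N\}\ne\emptyset\big\},\\ j_- = \inf\{l : t_l\in I_j(\varepsilon)\} \quad\text{if } j\in J^*(\varepsilon). \end{cases}$$

Then for all $1\le l\le N$, there exists $j\in J^*(\varepsilon)$: $0\le t_l - t_{j_-}\le\varepsilon$, and $\#(J^*(\varepsilon)) \le [\tfrac1\varepsilon]+1 \le 2/\varepsilon$. This, by virtue of the assumption made, means that

$$\forall\,1\le l\le N,\ \exists\, j\in J^*(\varepsilon) \text{ such that } \|X_l - X_{j_-}\|_\gamma \le \varepsilon^{\beta}.$$

Thus $N(E,\|\cdot\|_\gamma,\varepsilon^{\beta}) \le 2/\varepsilon$, or else

$$N(E,\|\cdot\|_\gamma,\rho) \le \frac{2}{\rho^{1/\beta}} \qquad (0<\rho\le1). \tag{8.3.15}$$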

It remains to apply Lemma 8.3.3 to conclude (8.3.14).

Now we can pass to the proof of Theorem 8.3.2.

Proof of Theorem 8.3.2. We shall use the notation $S_n = \sum_{l=1}^{n}\xi_l$, $M_n = \sum_{l=1}^{n}m_l$. For any integer $k\ge1$, put

$$N_k = \inf\Big\{n\ge1 : \sum_{l=1}^{n} m_l \ge k\Big\}. \tag{8.3.16}$$

Then $M_{N_k-1} < k \le M_{N_k} \le M_{N_k-1}+1$. Consider two positive integers $P<Q$; we will first estimate the oscillation of the sums $S_l$ over the block of indices $N_P, N_{P+1},\dots,N_{Q-1}$. Since $\sum_{l=N_{k-1}}^{N_k-1} m_l = M_{N_k-1} - M_{N_{k-1}-1} \le k - \big((k-1)-1\big) = 2$, we deduce from our assumption that, for $n\le m$,

$$E\Big|\sum_{l=N_n}^{N_m-1}\xi_l\Big|^{\gamma} \le 2C(m-n). \tag{8.3.17}$$


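Put

$$X_h = S_{N_{P+h}-1}\big/\big(2C(Q-P)\big)^{1/\gamma}, \qquad t_h = h/(Q-P) \qquad (h = 0,\dots,Q-P-1). \tag{8.3.18}$$

Reformulating then our previous estimate in terms of $X_h$, $t_h$ gives (writing $m = P+j$, $n = P+i$)

$$\|X_j - X_i\|_\gamma^{\gamma} \le t_j - t_i \qquad (0\le i\le j\le Q-P-1). \tag{8.3.19}$$

We can therefore infer from Lemma 8.3.4 that $\big\|\sup_{0\le i\le j\le Q-P-1}|X_j - X_i|\big\|_\gamma \le K_\gamma\log\big((Q-P)e\big)$, or in terms of $S_n$:

$$\Big\|\sup_{m,n\in[P,Q[}\big|S_{N_n} - S_{N_m}\big|\Big\|_\gamma \le K_\gamma(Q-P)^{1/\gamma}\log\big((Q-P)e\big). \tag{8.3.20}$$

Apply this estimate with the choice $P = 2^r$, $Q = 2^{r+1}$ and put

$$B_r = \sup_{2^r\le n,m<2^{r+1}}\big|S_{N_n} - S_{N_m}\big| \qquad (r\ge1).$$

Let $\varepsilon>0$ but arbitrary and $a>\gamma+1$. By estimate (8.3.20) and Tchebycheff's inequality,

$$P\big\{B_r > \varepsilon2^{r/\gamma}r^{a/\gamma}\big\} \le \frac{E B_r^{\gamma}}{\varepsilon^{\gamma}2^{r}r^{a}} \le K_\gamma\varepsilon^{-\gamma}r^{\gamma-a},$$

thus implying that the series $\sum_{r\ge1}P\{B_r > \varepsilon2^{r/\gamma}r^{a/\gamma}\}$ converges. Hence, by Borel–Cantelli's lemma,

$$P\big\{\exists R<\infty : B_r \le \varepsilon2^{r/\gamma}r^{a/\gamma},\ r\ge R\big\} = 1. \tag{8.3.21}$$

Further, recalling that $M_n = \sum_{l=1}^{n}m_l$,

$$P\big\{|S_{N_{2^r}}| > \varepsilon2^{r/\gamma}r^{a/\gamma}\big\} \le \frac{E|S_{N_{2^r}}|^{\gamma}}{\varepsilon^{\gamma}2^{r}r^{a}} \le \frac{C}{\varepsilon^{\gamma}2^{r}r^{a}}\,M_{N_{2^r}} \le \frac{C(2^{r}+1)}{\varepsilon^{\gamma}2^{r}r^{a}}.$$

We deduce for $a>1$ that the series $\sum_{r\ge1}P\{|S_{N_{2^r}}| > \varepsilon2^{r/\gamma}r^{a/\gamma}\}$ converges. By invoking the Borel–Cantelli lemma again, we obtain

$$P\big\{\exists R<\infty : |S_{N_{2^r}}| \le \varepsilon2^{r/\gamma}r^{a/\gamma},\ r\ge R\big\} = 1. \tag{8.3.22}$$

Let now $k\ge1$ and $r\ge1$ be integers such that $2^r\le k<2^{r+1}$. From the inequality $|S_{N_k}| \le |S_{N_{2^r}}| + |S_{N_k} - S_{N_{2^r}}|$ and (8.3.21)–(8.3.22), it follows that on a measurable set of full measure, $|S_{N_k}| \le 2\varepsilon2^{r/\gamma}r^{a/\gamma}$ holds true for all $k$ large enough. Since $2^r \le M_{N_{2^r}} \le M_{N_k} \le M_{N_{2^{r+1}}-1} < 2^{r+1}$, we also have

$$|S_{N_k}| \le K_\gamma\,\varepsilon\,M_{N_k}^{1/\gamma}(\log M_{N_k})^{a/\gamma}, \tag{8.3.23}$$

for all $k$ large enough, on a measurable set of measure 1.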


Finally we treat the general case. Recall that $M_n = \sum_{l=1}^{n}m_l$. Let $N$ be some arbitrary positive integer and $k$ an integer such that $N_k\le N<N_{k+1}$. Then $k \le M_{N_k} \le M_N \le M_{N_{k+1}-1} \le k+1$. From (8.3.23) follows that on a measurable set of full measure, both inequalities below hold true:

$$\sum_{l=1}^{N}1(A_l) \ge \sum_{l=1}^{N_k}1(A_l) \ge M_{N_k} - K_\gamma\varepsilon M_{N_k}^{1/\gamma}(\log M_{N_k})^{a/\gamma} \ge M_N - K_\gamma\varepsilon M_N^{1/\gamma}(\log M_N)^{a/\gamma},$$

and

$$\sum_{l=1}^{N}1(A_l) \le \sum_{l=1}^{N_{k+1}}1(A_l) \le M_{N_{k+1}} + K_\gamma\varepsilon M_{N_{k+1}}^{1/\gamma}(\log M_{N_{k+1}})^{a/\gamma} \le M_N + K_\gamma\varepsilon M_N^{1/\gamma}(\log M_N)^{a/\gamma},$$

provided that $N$ is large enough. In other words,

$$P\Big\{\limsup_{N\to\infty}\frac{|S_N|}{M_N^{1/\gamma}(\log M_N)^{a/\gamma}} \le K_\gamma\varepsilon\Big\} = 1. \tag{8.3.24}$$

Since $\varepsilon$ is arbitrary, we obtain the stated result.

8.3.5 Remark. 1. It is worth noticing here that we used assumption (i) only to control the behavior of the sums $S_{N_k}$. Thus the following, seemingly weaker condition would have been enough for our purpose:

(i′) There exist a real $\eta_0>0$ and a constant $C_0 = C(\eta_0)$ depending on $\eta_0$ only such that: for any integers $i\le j$,

$$\sum_{l=i}^{j}m_l \ge \eta_0 \implies E\Big|\sum_{i\le l\le j}\xi_l\Big|^{\gamma} \le C_0\sum_{i\le l\le j}m_l.$$

2. The next observation concerns the limit case $\alpha = 1$ in Stechkin's theorem. Let $\gamma>1$ and $\xi = \{\xi_i, i\ge1\}$ be a sequence of random variables satisfying the assumption

$$E\Big|\sum_{i\le l\le j}\xi_l\Big|^{\gamma} \le \sum_{i\le l\le j}m_l, \qquad 0\le i\le j<\infty, \tag{8.3.25}$$

where $\{m_l, l\ge1\}$ is a sequence of reals with $0\le m_l\le1$. Assume first that the series $\sum_{l=1}^{\infty}m_l$ diverges. Using the notation from the proof of Theorem 8.3.2 (notably definition (8.3.16)), the previous remark together with estimate (8.3.23) imply for any $a>\gamma+1$ that

$$P\Big\{\lim_{k\to\infty}\frac{S_{N_k}}{M_{N_k}^{1/\gamma}(\log M_{N_k})^{a/\gamma}} = 0\Big\} = 1. \tag{8.3.26}$$

Assume now that the series $\sum_{l=1}^{\infty}m_l$ converges. We claim that

$$\sum_{l=1}^{\infty}m_l(\log l)^{\gamma}<\infty \implies \text{the series } \sum_{l=1}^{\infty}\xi_l \text{ converges almost surely.} \tag{8.3.27}$$

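Indeed, let us first observe that the sequence $\{S_n, n\ge1\}$ is a Cauchy sequence in $L^\gamma$, thus converging to some element $S\in L^\gamma$. Next by Lemma 8.3.3, for any integer $r\ge1$, it follows that

$$\Big\|\sup_{2^r\le n,m<2^{r+1}}|S_n - S_m|\Big\|_\gamma \le K_\gamma\,r\,\Big(\sum_{2^r\le l<2^{r+1}}m_l\Big)^{1/\gamma}.$$

… $> 0$,

$$\sum_{l=1}^{L}\xi_l = o\Big(\Psi(L)^{\frac1\gamma}(\log L)^{\frac{\gamma+1}{\gamma}+\varepsilon}\Big) \qquad \text{(Gál–Koksma [1950: Theorem 3])},$$

$$\sum_{l=1}^{L}\xi_l = o\Big(\Psi(L)^{\frac1\gamma}(\log L)^{\frac1\gamma+\varepsilon}\Big) \qquad \text{(Gál–Koksma [1950: Theorem 5])},$$

$$\sum_{l=1}^{L}\xi_l = o\Big(L^{\frac12}(\log L)^{\frac32+\frac\sigma2+\varepsilon}\Big) \qquad \text{(Gál–Koksma [1950: Theorem 6])}.$$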

Essentially in each case, we examine a situation of the following type:

$$E\Big|\sum_{i\le l\le j}\xi_l\Big|^{\gamma} \le \Psi\Big(\sum_{i\le l\le j}u_l\Big) \qquad (\forall\,1\le i\le j<\infty), \tag{8.4.3}$$

where $\{u_i, i\ge1\}$ is a sequence of nonnegative reals and $\Psi : \mathbb{R}_+\to\mathbb{R}_+$ an increasing function. Put $S = \{S_L, L\ge1\}$, $U = \{U_L, L\ge1\}$, where for any positive integer $L$, $S_L = \sum_{l=1}^{L}\xi_l$ and $U_L = \sum_{l=1}^{L}u_l$. We shall prove the result below.


8.4 Application to Gál–Koksma’s theorems

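8.4.1 Theorem. a) Assume that the series $\sum_{l=1}^{\infty}u_l$ converges and that the integral $\int_{+0}\big[\Psi^{-1}(\rho^{\gamma})\big]^{-1/\gamma}d\rho$ is convergent. Then the series $\sum_{l=1}^{\infty}\xi_l$ is convergent almost surely.

b) Assume that the series $\sum_{l=1}^{\infty}u_l$ diverges and that for some real $\eta\ge0$, $\Psi(x)/x^{1+\eta}$ is nondecreasing. Then, for all $\varepsilon>0$,

$$\begin{aligned} (\eta>0)\quad & P\Big\{\lim_{L\to\infty}\frac{S_L}{\Psi(U_L)^{1/\gamma}(\log U_L)^{(1+\varepsilon)/\gamma}} = 0\Big\} = 1,\\ (\eta=0)\quad & P\Big\{\lim_{L\to\infty}\frac{S_L}{\Psi(U_L)^{1/\gamma}(\log U_L)^{1+(1+\varepsilon)/\gamma}} = 0\Big\} = 1. \end{aligned} \tag{8.4.4}$$

Putting $u_l\equiv1$ in the above result immediately gives Theorems 3, 5 and 6 of Gál–Koksma [1950].

Proof. The proof of this result follows from a simple modification of the proofs of Theorems 8.2.1 and 8.2.2.

a) Assumption (8.4.3) implies that the sequence $S$ is a Cauchy sequence in $L^\gamma$. The new sequence obtained by adding to $S$ its limit is again denoted by $S$. Let $0<\varepsilon\le u$; write again $u = \sum_{l=1}^{\infty}u_l$ and

$$\begin{cases} I_j(\varepsilon) = [j\varepsilon,(j+1)\varepsilon[, \quad\text{if } j = 0,1,\dots,[\tfrac u\varepsilon],\\ J^*(\varepsilon) = \big\{0\le j\le[\tfrac u\varepsilon] : I_j(\varepsilon)\cap U\ne\emptyset\big\},\\ j_- = \inf\{L : U_L\in I_j(\varepsilon)\} \quad\text{if } j\in J^*(\varepsilon). \end{cases}$$

Then

$$\forall L\ge1,\ \exists\,j\in J^*(\varepsilon) \text{ such that } \|S_L - S_{j_-}\|_\gamma \le \Psi(\varepsilon)^{1/\gamma},$$

which implies that

$$N(S,\|\cdot\|_\gamma,\rho) \le \frac{2u}{\Psi^{-1}(\rho^{\gamma})}, \qquad 0<\rho<\Psi(u)^{1/\gamma}. \tag{8.4.5}$$

Hence

$$\int_0^{\Psi(u)^{1/\gamma}}N(S,\|\cdot\|_\gamma,\rho)^{1/\gamma}\,d\rho \le I_\Psi := \int_0^{\Psi(u)^{1/\gamma}}\Big[\frac{2u}{\Psi^{-1}(\rho^{\gamma})}\Big]^{1/\gamma}d\rho < \infty,$$

by assumption. Applying Theorem 8.1.1 shows that $S$ is convergent almost surely and

$$\Big\|\sup_{l,n\ge1}|S_l - S_n|\Big\|_\gamma \le K I_\Psi,$$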

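where $K$ is a universal constant.

b) We use the notation and definitions from the proof of Theorem 8.2.2: $\kappa = \{\kappa_p, p\ge1\}$, $L_p$, $L_p^*$, and $a>1$. On the one hand

$$P\Big\{|S_{L_p^*}| > \varepsilon\,\Psi(M^{\kappa_p+1})^{1/\gamma}p^{a/\gamma}\Big\} \le \frac{\|S_{L_p^*}\|_\gamma^{\gamma}}{\varepsilon^{\gamma}\,\Psi(M^{\kappa_p+1})\,p^{a}} \le \frac{1}{\varepsilon^{\gamma}p^{a}}.$$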


Thus by the Borel–Cantelli lemma,

$$P\Big\{\limsup_{p\to\infty}\frac{|S_{L_p^*}|}{\Psi(M^{\kappa_p+1})^{1/\gamma}p^{a/\gamma}} \le \varepsilon\Big\} = 1. \tag{8.4.6}$$

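On the other hand, put for $j\in L_p$, $\bar S_j = S_j/\Psi(M^{\kappa_p+1})^{1/\gamma}$, $\bar u_j = u_j/\Psi(M^{\kappa_p+1})^{1/\gamma}$, $\bar U_j = \sum_{l=1}^{j}\bar u_l$. By assumption

$$\|\bar S_i - \bar S_j\|_\gamma \le \Big[\frac{\Psi(U_j - U_i)}{\Psi(M^{\kappa_p+1})}\Big]^{1/\gamma} \qquad (i,j\in L_p).$$

Now we use the fact that $\Psi(x)/x^{1+\eta}$ is nondecreasing. Since

$$\frac{\Psi(U_j - U_i)}{(U_j - U_i)^{1+\eta}} \le \frac{\Psi(M^{\kappa_p+1})}{(M^{\kappa_p+1})^{1+\eta}},$$

we have

$$\|\bar S_i - \bar S_j\|_\gamma^{\gamma} \le \frac{\Psi(U_j - U_i)}{\Psi(M^{\kappa_p+1})} \le \Big(\frac{U_j - U_i}{M^{\kappa_p+1}}\Big)^{1+\eta} = (t_j - t_i)^{1+\eta},$$

with $t_j = U_j/M^{\kappa_p+1}$, $j\in L_p$. Applying Lemma 8.3.4 to the family $\{\bar S_j, j\in L_p\}$ allows us to get the following bound for the oscillation of the $S_j$'s over $L_p$:

$$\Big\|\sup_{i,j\in L_p}|S_i - S_j|\Big\|_\gamma \le \begin{cases} K_{\eta,\gamma}\,\Psi(M^{\kappa_p+1})^{1/\gamma} & \text{if } \eta>0,\\ K_{\eta,\gamma}\,\Psi(M^{\kappa_p+1})^{1/\gamma}\log\#(L_p) & \text{if } \eta=0, \end{cases} \tag{8.4.7}$$

where $K_{\eta,\gamma}$ depends on $\eta,\gamma$ only.

• If $\eta>0$, then $P\{\sup_{i,j\in L_p}|\bar S_i - \bar S_j| > \varepsilon p^{a/\gamma}\} \le (K_{\eta,\gamma}/\varepsilon)^{\gamma}p^{-a}$, which implies by the Borel–Cantelli lemma that

$$P\Big\{\limsup_{p\to\infty}\sup_{i,j\in L_p}\frac{|S_j - S_i|}{\Psi(M^{\kappa_p+1})^{1/\gamma}p^{a/\gamma}} \le \varepsilon\Big\} = 1. \tag{8.4.8}$$

• If $\eta=0$, then $P\{\sup_{i,j\in L_p}|\bar S_i - \bar S_j| > \varepsilon p^{a/\gamma}\log\#(L_p)\} \le (K_{\eta,\gamma}/\varepsilon)^{\gamma}p^{-a}$, and again by the Borel–Cantelli lemma,

$$P\Big\{\limsup_{p\to\infty}\sup_{i,j\in L_p}\frac{|S_j - S_i|}{\Psi(M^{\kappa_p+1})^{1/\gamma}p^{a/\gamma}\log\#(L_p)} \le \varepsilon\Big\} = 1. \tag{8.4.9}$$

Combining now (8.4.6) with (8.4.8) and letting $\varepsilon$ tend to 0 establishes the result for the case $\eta>0$. Combining finally (8.4.6) with (8.4.9) and observing for $L\in L_p$ that $\#(L_p)\le M^{\kappa_p+1}\le MU_L$ and $p\le\kappa_p$, next letting $\varepsilon$ tend to 0, establishes the result for the case $\eta=0$.

Theorems 1, 2 and 4 in Gál–Koksma [1950] contain rather theoretical conditions for almost sure convergence, which practically amount to re-starting the proof for applications on the considered example (hence Theorems 3, 5 and 6).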


Consider now the following assumption: for some $\gamma>1$, $\sigma>1$,

$$E\Big|\sum_{i\le l\le j}\xi_l\Big|^{\gamma} \le C\,j^{\gamma-\sigma}(j-i)^{\sigma}\eta(j-i) \qquad (\forall\,1\le i\le j<\infty), \tag{8.4.10}$$

where η(n) > 0 is nonincreasing and the series n≥1 η(n)/n converges. By Theorem 7 in [Gál–Koksma: 1950], SL P lim = 0 = 1. L→∞ L The proof is given under the additional assumption that η(n)(log n)2 is nondecreasing, and several nice applications to uniform distribution can be found in Koksma–Salem [1950]. In these applications, η(N) = N −b for some positive real b. It is shown for instance in Koksma–Salem [1950: Section 3], by means of a lemma of Van der Corput that j 2(1−γ ) 2(1−γ ) 1 e2iπ kf (l) ≤ Ck P −2 j P (j − i)1− P (8.4.11) l=i

with $0<\gamma<1$, provided that $f$ be $p$-times differentiable with $P = 2^p$, $p\ge2$. Then the authors study uniform distribution for a class of smooth differentiable functions, using (8.4.11) to satisfy assumption (8.4.10). However, here again, it is possible to apply a metric entropy argument. Consider the following assumption:

$$E\Big|\sum_{i\le l\le j}\xi_l\Big|^{\gamma} \le \Phi\Big(\sum_{l\le j}u_l\Big)\,\Psi\Big(\sum_{i\le l\le j}u_l\Big) \qquad (\forall\,1\le i\le j<\infty), \tag{8.4.12}$$

where $\Phi,\Psi : \mathbb{R}_+\to\mathbb{R}_+$ are nondecreasing, $\Psi(x)/x^{1+\rho}$ is nondecreasing for some $\rho\ge0$ and $\{u_i, i\ge1\}$ is a sequence of nonnegative reals such that the series $\sum_{l=1}^{\infty}u_l$ diverges. Assumption (8.4.10) corresponds to

$$\Phi(x) = x^{\gamma-\sigma}, \qquad \Psi(x) = x^{\sigma}\eta(x), \qquad u_l\equiv1.$$

Let $\sigma>\sigma'>1$. By writing $\dfrac{\Psi(x)}{x^{\sigma'}} = \dfrac{x^{\sigma-\sigma'}}{\log^2x}\,\eta(x)\log^2x$, we deduce from the additional assumption, mentioned above, that $\dfrac{\Psi(x)}{x^{\sigma'}}$ is nondecreasing.

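8.4.2 Theorem. Assume that condition (8.4.12) is satisfied, and for some $M>1$, that the series $s^{\gamma} = \sum_{l\ge1}\Psi(M^l)\Phi(M^l)/M^{\gamma l}$ converges. Put

$$S^{\gamma} = \sum_{k\ge1}\ \sup_{j\,:\,U_j\in[M^k,M^{k+1}[}\Big(\frac{|S_j|}{M^k}\Big)^{\gamma},$$

with the convention that $\sup\emptyset = 0$. Then,

$$\|S\|_\gamma \le K_\gamma\,s,$$

where $K_\gamma$ is a constant depending on $\gamma$ only, and in particular

$$P\Big\{\lim_{L\to\infty}\frac{S_L}{U_L} = 0\Big\} = 1. \tag{8.4.13}$$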


Theorem 7 of Gál–Koksma [1950] follows from this result by putting $u_l\equiv1$, since the convergence of the series $\sum_{n\ge1}\eta(n)/n$ implies that of the series $\sum_{l\ge1}\eta(M^l)$, thereby also implying the finiteness of $s$.

Proof. Again we use the notation from the proof of Theorem 8.2.2: $\kappa = \{\kappa_p, p\ge1\}$, $L_p$, $L_p^*$, and $a>1$. On the one hand

$$\sum_p\frac{\|S_{L_p^*}\|_\gamma^{\gamma}}{M^{\gamma(\kappa_p+1)}} \le \sum_p\frac{\Psi(M^{\kappa_p+1})\Phi(M^{\kappa_p+1})}{M^{\gamma(\kappa_p+1)}} \le K_\gamma s^{\gamma}. \tag{8.4.14}$$

Now, since for $i,j\in L_p$, $i\le j$,

$$\frac{E|S_j - S_i|^{\gamma}}{\Phi(M^{\kappa_p+1})} \le \Psi(U_j - U_i),$$

we deduce from estimates (8.4.7)

$$\Big\|\sup_{i,j\in L_p}|S_i - S_j|\Big\|_\gamma \le K_\gamma\,\Phi(M^{\kappa_p+1})^{1/\gamma}\,\Psi(M^{\kappa_p+1})^{1/\gamma}. \tag{8.4.15}$$

Then

$$\sum_p\frac{\big\|\sup_{i,j\in L_p}|S_i - S_j|\big\|_\gamma^{\gamma}}{M^{\gamma(\kappa_p+1)}} \le K_\gamma\sum_p\Phi(M^{\kappa_p+1})\Psi(M^{\kappa_p+1})/M^{\gamma(\kappa_p+1)} \le K_\gamma s^{\gamma}. \tag{8.4.16}$$

By the triangle inequality, (8.4.14) and (8.4.16) imply $\|S\|_\gamma\le K_\gamma s$, and finally that $\sup_{j\in L_p}|S_j|/M^{\kappa_p}$ tends to 0 almost surely, as $p$ tends to infinity. Hence (8.4.13).

We conclude with an example of application to diophantine approximation, inspired by a very deep result of Gál [1949]. For $u\ge0$, let $\{u\} = u - [u] - \frac12$ where $[u]$ denotes the greatest integer less than or equal to $u$. Let us consider, for a given increasing sequence of positive integers $\mathcal{N} = \{n_i, i\ge1\}$, the following sums:

$$\kappa_N(x) = \sum_{i=1}^{N}\{n_ix\} \qquad (N\ge1). \tag{8.4.17}$$

In the case when $\mathcal{N} = \mathbb{N}$, Khintchin proved that $\kappa_N(x) = o(\log^{1+\varepsilon}N)$ for almost all $x$, where $\varepsilon>0$ is an arbitrarily small positive number. In the general case, Erdös showed that $\kappa_N(x) = o(N^{1/2}\log^{r}N)$ for almost all $x$, where $r$ is some positive constant. Later Gál improved this in showing that for every $\varepsilon>0$,

$$\kappa_N(x) = o(N^{1/2}\log^{2+\varepsilon}N), \tag{8.4.18}$$

for almost every $x$, and stated that a minor modification in the proof yields the following better bound: for every $\varepsilon>0$,

$$\kappa_N(x) = o(N^{1/2}\log^{3/2+\varepsilon}N), \tag{8.4.18a}$$

almost surely. In Gál [1949] (to which we refer for the above mentioned results, but see also Baker [1981]), the proof of (8.4.18) is relatively long and appeals to the "Hobson–Plancherel" method. A short proof using Theorem 8.4.1 is however available.

Sketch of proof. Let $(a,b)$ and $[a,b]$ respectively denote the greatest common divisor and the least common multiple of the positive integers $a$ and $b$, and put

$$\langle a,b\rangle = \frac{(a,b)}{[a,b]}.$$

We introduce the following function

$$f(N) = \sup_{n_i}\sum_{i,j\le N}\langle n_i,n_j\rangle,$$

where the sup is taken over all $N$-tuples of positive integers. By $N$-tuple it is meant a collection of $N$ positive integers all different. We shall make use of the following strong result in Gál [1949: Theorem 2]: there exist two constants $c$ and $C$, such that for all $N$ large enough

$$cN(\log\log N)^2 \le f(N) \le CN(\log\log N)^2. \tag{8.4.19}$$

As is well known

$$\int_0^1\{ax\}\{bx\}\,dx = \frac{1}{12}\langle a,b\rangle, \tag{8.4.20}$$

and so (8.4.19) implies

$$\int_0^1\Big(\sum_{i\le l\le j}\{n_lx\}\Big)^2dx \le C(j-i)\big(\log\log(j-i)\big)^2. \tag{8.4.21}$$

Thus, the assumptions of Theorem 8.4.1 are satisfied with $\Psi(u) = u(\log\log u)^2$. We deduce for all $\varepsilon>0$

$$\kappa_N(x) = o(N^{1/2}\log^{3/2+\varepsilon}N),$$

for almost every $x$.
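The classical identity (8.4.20) behind this sketch can be checked exactly for given integers. The code below is a minimal illustration, not part of Gál's argument; the function name is hypothetical. It uses the fact that $\{ax\}\{bx\}$ is a quadratic polynomial on each interval $[m/L,(m+1)/L[$, $L = [a,b]$, so that Simpson's rule is exact there.

```python
from fractions import Fraction
from math import gcd


def sawtooth_product_integral(a, b):
    """Exact value of int_0^1 {ax}{bx} dx, where {u} = u - [u] - 1/2."""
    L = a * b // gcd(a, b)          # least common multiple [a, b]
    h = Fraction(1, L)
    half = Fraction(1, 2)
    total = Fraction(0)
    for m in range(L):
        # [ax] and [bx] are constant on [m/L, (m+1)/L).
        fa, fb = (a * m) // L, (b * m) // L

        def g(x):
            return (a * x - fa - half) * (b * x - fb - half)

        x0 = m * h
        # Simpson's rule, exact for the quadratic g on this interval.
        total += h / 6 * (g(x0) + 4 * g(x0 + h / 2) + g(x0 + h))
    return total


if __name__ == "__main__":
    for a, b in [(1, 1), (1, 2), (4, 6), (5, 7)]:
        print(a, b, sawtooth_product_integral(a, b),
              Fraction(gcd(a, b), 12 * (a * b // gcd(a, b))))
```

Both printed columns agree, that is, the integral equals $(a,b)/(12[a,b])$ for each tested pair.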

8.5 An application to the supremum of random polynomials

Let $\{p_k, k\ge1\}$, $\{\theta_k, k\ge1\}$ be two sequences of reals. Put

$$\tilde p_N = \max\{[2+|p_k|],\ 1\le k\le N\}, \qquad N = 1,2,\dots,$$

where $[x]$ stands for the integer part of $x$. Let also $X = \{X_1,X_2,\dots\}$ and $Y = \{Y_1,Y_2,\dots\}$ be two sequences of real random variables defined on a common probability space $(\Omega,\mathcal{A},P)$. We will be mainly interested in the cases when $X$ and $Y$ are either sequences of centered, independent random variables, or stationary sequences. Consider for $N = 1,2,\dots$ the sequence of random trigonometric sums

$$Z_N(\omega,t) = \sum_{k=1}^{N}\theta_k\big[X_k(\omega)\cos2\pi p_kt + Y_k(\omega)\sin2\pi p_kt\big]. \tag{8.5.1}$$

In this section, we show that the metric entropy method can be efficiently applied for estimating the total extrema

$$Q_N := \sup_{0\le t\le1}|Z_N(t)|. \tag{8.5.2}$$

We will see that this reduces to applying the metric entropy method in the simplest possible case: the real line provided with the usual distance. And this is also why we believe that it is likely the most elementary possible approach. As a particular case of a more general estimate we shall recover the well-known estimate of Salem–Zygmund' proof or in Kahane [1954: Theorem 7]. It is of interest to mention that Bernstein's inequality for polynomials is not used in this approach, unlike in Salem–Zygmund or Kahane [1968]. Let us first observe in the case when $X$ and $Y$ are independent random variables with $E X_k = E Y_k = 0$ and $E X_k^2 = E Y_k^2 = 1$, that

$$\begin{aligned} E\big|Z_N(s) - Z_N(t)\big|^2 &= E\Big(\sum_{k=1}^{N}\theta_k\big\{X_k[\cos2\pi p_kt - \cos2\pi p_ks] + Y_k[\sin2\pi p_kt - \sin2\pi p_ks]\big\}\Big)^2\\ &= \sum_{k=1}^{N}\theta_k^2\big\{[\cos2\pi p_kt - \cos2\pi p_ks]^2 + [\sin2\pi p_kt - \sin2\pi p_ks]^2\big\}\\ &= 2\sum_{k=1}^{N}\theta_k^2\big[1 - \cos2\pi p_k(t-s)\big] = 4\sum_{k=1}^{N}\theta_k^2\sin^2\pi p_k(t-s). \end{aligned}$$
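The identity just derived can be checked numerically. The sketch below is only an illustration: the function names and the sample values of $\theta_k$, $p_k$ are hypothetical. It compares the sum of squared cosine and sine increments with $4\sum_k\theta_k^2\sin^2\pi p_k(t-s)$.

```python
import math


def d_N(s, t, theta, p):
    """2 * sqrt(sum_k theta_k^2 sin^2(pi p_k (s - t)))."""
    return 2.0 * math.sqrt(sum(
        th ** 2 * math.sin(math.pi * pk * (s - t)) ** 2
        for th, pk in zip(theta, p)
    ))


def increment_second_moment(s, t, theta, p):
    """E|Z_N(s) - Z_N(t)|^2 for independent, centered, unit-variance X_k, Y_k,
    computed from the cosine and sine differences."""
    total = 0.0
    for th, pk in zip(theta, p):
        dc = math.cos(2 * math.pi * pk * t) - math.cos(2 * math.pi * pk * s)
        ds = math.sin(2 * math.pi * pk * t) - math.sin(2 * math.pi * pk * s)
        total += th ** 2 * (dc * dc + ds * ds)
    return total


if __name__ == "__main__":
    theta, p = [1.0, 0.5, 2.0], [1.0, 3.0, 7.5]
    s, t = 0.2, 0.77
    print(increment_second_moment(s, t, theta, p), d_N(s, t, theta, p) ** 2)
```

The two printed numbers agree up to rounding, whether or not the frequencies $p_k$ are integers.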

Therefore, if we put for $s,t\in[0,1]$,

$$d_N(s,t) = 2\Big(\sum_{k=1}^{N}\theta_k^2\sin^2\pi p_k(s-t)\Big)^{1/2}, \tag{8.5.3}$$

we define in this way a pseudo-metric on $[0,1]$, since $d_N(s,t) = \|Z_N(s) - Z_N(t)\|_2$. This pseudo-metric will play a central role in what follows. We introduce now an assumption concerning the increments of the process $Z_N(\cdot)$. Consider the Young function $G(t) = \exp(t^2) - 1$, $t$ real, together with the associated Orlicz space $L^G(P)$, that is, the set of $\mathcal{A}$-measurable functions $f : \Omega\to\mathbb{R}$, such that $E\,G(af)<\infty$ for some real $0<a<\infty$. We recall that $L^G(P)$ is provided with the norm

$$\forall f\in L^G(P), \qquad \|f\|_G = \inf\big\{c>0 : E\,G\big(\tfrac fc\big)\le1\big\}$$

and that $(L^G(P),\|\cdot\|_G)$ is a Banach space. We will assume that for some constant $B$,

$$\forall N\ge1,\ \forall\,0\le s,t\le1, \qquad \begin{cases} \|Z_N(s) - Z_N(t)\|_G \le B\,d_N(s,t),\\ \|Z_N(s)\|_G \le B\big(\sum_{k=1}^{N}\theta_k^2\big)^{1/2}. \end{cases} \tag{8.5.4}$$

These assumptions are satisfied when $X$ and $Y$ are independent Rademacher or Gaussian random variables; but also in other interesting cases (see Examples 1–3 below). We will prove the following result.

8.5.1 Theorem. Under assumption (8.5.4), there exists a constant $C$ (which is a function of the constant $B$ from (8.5.4) only) such that for any integer $N\ge1$,

$$\|Q_N\|_G \le C\,(\log\tilde p_N)^{1/2}\Big(\sum_{k=1}^{N}\theta_k^2\Big)^{1/2}.$$

This estimate is optimal. Indeed, assume that $X_n = \xi_{2n}$, $Y_n = \xi_{2n+1}$ where $(\xi_n)_{n\ge0}$ is a sequence of independent Rademacher random variables. Assume also that $\theta_k = 1$ and $p_k = k$ ($k\ge1$). Then, referring for instance to Proposition 2, p. 129 in Kashin–Saakyan [1989], we have

$$\forall N\ge1, \qquad E\,Q_N \ge C\,(N\log N)^{1/2}, \tag{8.5.5}$$

where $C$ is a universal constant. We shall now first give three nice classes of examples.

Example 1. Assume that $X$ and $Y$ are two stationary centered Gaussian sequences, with finite decoupling coefficient, that is:

$$p(X) = \sum_{k=1}^{\infty}\frac{|E X_1X_k|}{E(X_1)^2} < \infty, \qquad p(Y) = \sum_{k=1}^{\infty}\frac{|E Y_1Y_k|}{E(Y_1)^2} < \infty.$$

Then, assumption (8.5.4) is satisfied. More precisely, for any $0\le s,t\le1$,

$$\begin{cases} \|Z_N(s) - Z_N(t)\|_G \le 9\sqrt2\,\max\big(p(X),p(Y)\big)^{1/2}d_N(s,t),\\ \|Z_N(s)\|_G \le 9\sqrt2\,\max\big(p(X),p(Y)\big)^{1/2}\big(\sum_{k=1}^{N}\theta_k^2\big)^{1/2}. \end{cases} \tag{8.5.4a}$$

So Theorem 8.5.1 does apply in that case. Note that the decoupling assumption is trivially satisfied when both $X$ and $Y$ consist of independent $N(0,1)$ distributed random variables. Observe also that no assumption on the correlation between $X$ and $Y$ is required, and consequently $Z_N$ is not necessarily Gaussian. Finally, recall that the Ornstein–Uhlenbeck process $U_k = W(e^k)e^{-k/2}$ ($W$ being Brownian motion), $k = 1,2,\dots$, is the typical example of a stationary Gaussian sequence with finite decoupling coefficient. For proving the claimed inequalities, we will use the decoupling inequality stated in Lemma 10.1.9. Let $\lambda$ be some fixed real. By means of the Cauchy–Schwarz inequality:

$$\begin{aligned} E\,e^{\lambda(Z_N(s)-Z_N(t))} &= E\,e^{\lambda\sum_{k=1}^{N}\theta_k\{X_k(\cos2\pi p_ks-\cos2\pi p_kt)+Y_k(\sin2\pi p_ks-\sin2\pi p_kt)\}}\\ &\le \Big(E\,e^{2\lambda\sum_{k=1}^{N}\theta_kX_k(\cos2\pi p_ks-\cos2\pi p_kt)}\;E\,e^{2\lambda\sum_{k=1}^{N}\theta_kY_k(\sin2\pi p_ks-\sin2\pi p_kt)}\Big)^{1/2}. \end{aligned} \tag{8.5.6}$$

Put

$$f_k^X(x) = e^{2\lambda\theta_kx(\cos2\pi p_ks-\cos2\pi p_kt)}, \qquad f_k^Y(x) = e^{2\lambda\theta_kx(\sin2\pi p_ks-\sin2\pi p_kt)},$$

$k = 1,\dots,N$, and apply Lemma 10.1.9. We obtain, since $E\,e^{\lambda N(0,1)} = e^{\lambda^2/2}$,

$$\begin{aligned} E\,e^{2\lambda\sum_{k=1}^{N}\theta_kX_k(\cos2\pi p_ks-\cos2\pi p_kt)} &\le e^{2\lambda^2p(X)\sum_{k=1}^{N}\theta_k^2(\cos2\pi p_ks-\cos2\pi p_kt)^2},\\ E\,e^{2\lambda\sum_{k=1}^{N}\theta_kY_k(\sin2\pi p_ks-\sin2\pi p_kt)} &\le e^{2\lambda^2p(Y)\sum_{k=1}^{N}\theta_k^2(\sin2\pi p_ks-\sin2\pi p_kt)^2}. \end{aligned}$$

Hence

$$E\,e^{\lambda(Z_N(s)-Z_N(t))} \le e^{2\lambda^2\max(p(X),p(Y))\sum_{k=1}^{N}\theta_k^2\{(\cos2\pi p_ks-\cos2\pi p_kt)^2+(\sin2\pi p_ks-\sin2\pi p_kt)^2\}} = e^{2\lambda^2\max(p(X),p(Y))\,d_N^2(s,t)}.$$

Now we shall use the fact that if $U$ is a real random variable such that $E\,e^{\lambda U}\le e^{\lambda^2C^2}$ ($\forall\lambda\in\mathbb{R}$), then $\|U\|_G\le9C$. Here we have $C = 2^{1/2}\max(p(X),p(Y))^{1/2}d_N(s,t)$. Thus, it follows from the previous estimates that

$$\|Z_N(s) - Z_N(t)\|_G \le 9\sqrt2\,\max\big(p(X),p(Y)\big)^{1/2}d_N(s,t).$$

Hence the first inequality in (8.5.4a). The second one is deduced by a similar reasoning.

Example 2. Assume that both $X$ and $Y$ are sequences of independent, centered real random variables with unit variance, and that there exists a real constant $M$ such that

$$\forall k\ge1, \qquad |X_k|\le M, \qquad |Y_k|\le M.$$

Then, assumption (8.5.4) is satisfied. More precisely, for any $0\le s,t\le1$,

$$\begin{cases} \|Z_N(s) - Z_N(t)\|_G \le 9M\,d_N(s,t),\\ \|Z_N(s)\|_G \le 9M\big(\sum_{k=1}^{N}\theta_k^2\big)^{1/2}. \end{cases} \tag{8.5.4b}$$

This is a direct consequence of the following result (Theorem 3.5.1 in [Garsia: 1970]). Let $\{\xi_n, n\ge1\}$ be independent, uniformly bounded ($|\xi_n|\le M$, a.s. for every $n$), centered random variables with unit variance. Let $\{a_n, n\ge1\}\in\ell^2$ and let $f = \sum_{n=1}^{\infty}a_n\xi_n$. Then

$$E\exp\Big\{\frac{|f|^2}{16M^2\|f\|_2^2}\Big\} \le \sqrt2. \tag{8.5.7}$$

This can also be proved by means of Lemma 4.1 in [Kuipers–Niederreiter: 1971]. According to this lemma, for any bounded random variable $X$ and all real numbers $\alpha$,

$$E\,e^{\alpha X} \le e^{\alpha E X+\alpha^2\|X\|_\infty^2/2}. \tag{8.5.8}$$

We begin again with (8.5.6) and obtain

$$\begin{aligned} E\,e^{\lambda(Z_N(s)-Z_N(t))} &= E\,e^{\lambda\sum_{k=1}^{N}\theta_k\{X_k(\cos2\pi p_ks-\cos2\pi p_kt)+Y_k(\sin2\pi p_ks-\sin2\pi p_kt)\}}\\ &\le \Big(E\,e^{2\lambda\sum_{k=1}^{N}\theta_kX_k(\cos2\pi p_ks-\cos2\pi p_kt)}\;E\,e^{2\lambda\sum_{k=1}^{N}\theta_kY_k(\sin2\pi p_ks-\sin2\pi p_kt)}\Big)^{1/2}. \end{aligned}$$

In view of the quoted lemma,

$$E\,e^{2\lambda\theta_kX_k(\cos2\pi p_ks-\cos2\pi p_kt)} \le e^{4\lambda^2\theta_k^2(\cos2\pi p_ks-\cos2\pi p_kt)^2M^2/2}.$$

Operating similarly for the "$Y_k$" component gives

$$\begin{aligned} E\,e^{\lambda(Z_N(s)-Z_N(t))} &\le \Big(\prod_{k=1}^{N}e^{2\lambda^2\theta_k^2(\cos2\pi p_ks-\cos2\pi p_kt)^2M^2}\prod_{k=1}^{N}e^{2\lambda^2\theta_k^2(\sin2\pi p_ks-\sin2\pi p_kt)^2M^2}\Big)^{1/2}\\ &= \prod_{k=1}^{N}e^{\lambda^2M^2\theta_k^2[(\cos2\pi p_ks-\cos2\pi p_kt)^2+(\sin2\pi p_ks-\sin2\pi p_kt)^2]} = e^{\lambda^2M^2d_N^2(s,t)}. \end{aligned}$$

Hence the first inequality in (8.5.4b), and the second is obtained by a similar reasoning. Theorem 8.5.1 thus applies in that case as well.

Example 3. Let $\mathcal{A}_0\subset\mathcal{A}_1\subset\cdots\subset\mathcal{A}$ be an increasing filtration of $\mathcal{A}$ ($\mathcal{A} = \bigvee_{i=0}^{\infty}\mathcal{A}_i$), and assume that $X$ is a sequence of martingale differences adapted to that filtration, with

$$\forall k\ge1, \qquad \|X_k\|_\infty\le1.$$

Assume that $Y\equiv0$. Then assumption (8.5.4) is satisfied. Indeed $Z_N(t) = \sum_{k=1}^{N}d_k^{(t)}$ where $d_k^{(t)} = \theta_kX_k\cos2\pi p_kt$. Thus $Z_N(t)$ is a sum of martingale differences satisfying $\|d_k^{(t)}\|_\infty\le|\theta_k|$. Then by Azuma's inequality, for all nonnegative reals $v$,

$$P\Big\{\Big|\sum_{k=1}^{N}d_k^{(t)}\Big|>v\Big\} \le 2\exp\Big\{-\frac{v^2}{2\sum_{k=1}^{N}\|d_k^{(t)}\|_\infty^2}\Big\}. \tag{8.5.9}$$

Thereby $\|Z_N(s)\|_G\le C\big(\sum_{k=1}^{N}\theta_k^2\big)^{1/2}$ for some universal constant $C$. Similarly, we have

$$\|Z_N(s) - Z_N(t)\|_G \le C\Big(\sum_{k=1}^{N}\theta_k^2(\cos2\pi p_ks-\cos2\pi p_kt)^2\Big)^{1/2} \le C\,d_N(s,t). \tag{8.5.4c}$$

Consequently, Theorem 8.5.1 applies in that case as well.


Proof of Theorem 8.5.1. The key point of the proof is contained in the following elementary observation: the pseudo-metric $d_N(\cdot,\cdot)$ is locally comparable to the usual distance. Indeed, since $|\sin x|\le(|x|\wedge1)$, we have

$$d_N^2(s,t) \le 4\sum_{k=1}^{N}\theta_k^2\big[(\pi p_k|s-t|)\wedge1\big]^2 \le 4\pi^2|s-t|^2\sum_{k=1}^{N}\theta_k^2\Big[p_k^2\wedge\frac{1}{\pi^2|s-t|^2}\Big]. \tag{8.5.10}$$

We thus deduce that if $\pi|s-t|\le1/\tilde p_N$, then $p_k^2\wedge\frac{1}{\pi^2|s-t|^2} = p_k^2$, $k = 1,\dots,N$. And consequently $d_N(s,t)\le2\pi|s-t|\big(\sum_{k=1}^{N}\theta_k^2p_k^2\big)^{1/2}$. Divide the interval $[0,1[$ into sub-intervals:

$$I_{N,j} = \Big[\frac{j-1}{4\tilde p_N},\frac{j}{4\tilde p_N}\Big[, \qquad j = 1,\dots,4\tilde p_N. \tag{8.5.11}$$

Since $s,t\in I_{N,j}$ implies $|s-t|\le\frac{1}{4\tilde p_N}\le\frac{1}{\pi\tilde p_N}$, it follows from the previous estimate that

$$d_N(s,t) \le 2\pi|s-t|\Big(\sum_{k=1}^{N}\theta_k^2p_k^2\Big)^{1/2}, \qquad j = 1,\dots,4\tilde p_N,\ s,t\in I_{N,j}. \tag{8.5.12}$$

Introduce now the auxiliary process

$$Y_N(t) = \frac{Z_N(t) - Z_N\big(\frac{j-1}{4\tilde p_N}\big)}{2\pi\big(\sum_{k=1}^{N}\theta_k^2p_k^2\big)^{1/2}}, \qquad j = 1,\dots,4\tilde p_N,\ t\in I_{N,j}. \tag{8.5.13}$$

Then we bound $Q_N$ relatively to the partition of $[0,1[$ as follows:

$$Q_N \le \sup_{1\le j\le4\tilde p_N}\Big|Z_N\Big(\frac{j-1}{4\tilde p_N}\Big)\Big| + 2\pi\Big(\sum_{k=1}^{N}\theta_k^2p_k^2\Big)^{1/2}\sup_{1\le j\le4\tilde p_N}\ \sup_{t\in I_{N,j}}|Y_N(t)|. \tag{8.5.14}$$

We are now in an easy setting, because we have to estimate the local extrema $\sup\{|Y_N(t)|, t\in I_{N,j}\}$ of a stochastic process with increments locally bounded by the usual distance. Indeed, from (8.5.4), (8.5.12): for any $s,t\in I_{N,j}$, $\|Y_N(s) - Y_N(t)\|_G\le B|s-t|$, $j = 1,2,\dots,4\tilde p_N$. In order to estimate $Q_N$, we will need two simple tools. The first follows from inequality (8.1.12):

$$\Big\|\sup_{1\le j\le n}|f_j|\Big\|_G \le \big([2/\log2]\log n\big)^{1/2}\sup_{1\le j\le n}\|f_j\|_G, \qquad \forall n\ge2,\ \forall f_1,\dots,f_n. \tag{8.5.15}$$

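From (8.5.4) and (8.5.15) follows that

$$\begin{aligned} \|Q_N\|_G &\le \big([2/\log2]\log4\tilde p_N\big)^{1/2}\Big\{\sup_{j=1,\dots,4\tilde p_N}\Big\|Z_N\Big(\frac{j-1}{4\tilde p_N}\Big)\Big\|_G + 2\pi\Big(\sum_{k=1}^{N}\theta_k^2p_k^2\Big)^{1/2}\sup_{j=1,\dots,4\tilde p_N}\Big\|\sup_{t\in I_{N,j}}|Y_N(t)|\Big\|_G\Big\}\\ &\le \big([2/\log2]\log4\tilde p_N\big)^{1/2}\Big\{B\Big(\sum_{k=1}^{N}\theta_k^2\Big)^{1/2} + 2\pi\Big(\sum_{k=1}^{N}\theta_k^2p_k^2\Big)^{1/2}\sup_{j=1,\dots,4\tilde p_N}\Big\|\sup_{t\in I_{N,j}}|Y_N(t)|\Big\|_G\Big\}. \end{aligned} \tag{8.5.16}$$

The second tool is Theorem 8.1.1. Now, we estimate $\big\|\sup_{t\in I_{N,j}}|Y_N(t)|\big\|_G$. By taking account of (8.5.12) and since $\operatorname{diam}(I_{N,j},|\cdot|) = 1/4\tilde p_N$, we must first estimate $N(I_{N,j},|\cdot|,u)$ for $0<u\le1/4\tilde p_N$; obviously

$$N(I_{N,j},|\cdot|,u) \le 1+\Big[\frac{1/4\tilde p_N}{2u}\Big] \le 1+\frac{1/4\tilde p_N}{2u} \le \frac{1}{2u\tilde p_N}.$$

Thus

$$I(I_{N,j},|\cdot|) \le \int_0^{\frac{1}{4\tilde p_N}}\sqrt{\log\frac{2}{4u\tilde p_N}}\,du \overset{(u=\frac{v}{4\tilde p_N})}{=} \frac{1}{4\tilde p_N}\int_0^1\sqrt{\log\frac2v}\,dv \le \frac{C}{\tilde p_N}.$$

It follows from (8.5.4), Theorem 8.1.1 and from the fact that $Y_N\big(\frac{j-1}{4\tilde p_N}\big) = 0$, that for any countable subset $E$ of $I_{N,j}$,

$$\Big\|\sup_{t\in E}|Y_N(t)|\Big\|_G \le \Big\|\sup_{s,t\in E}|Y_N(s) - Y_N(t)|\Big\|_G \le \frac{C}{\tilde p_N}, \tag{8.5.17}$$

where $C$ depends on $B$ only. But the $\omega$-trajectories $t\mapsto Z_N(t,\omega)$ are continuous for each $\omega\in\Omega$, and so are those of the auxiliary process $Y_N$. By specifying estimate (8.5.17) for a countable dense subset of $I_{N,j}$, we have in fact shown

$$\Big\|\sup_{t\in I_{N,j}}|Y_N(t)|\Big\|_G \le \frac{C}{\tilde p_N}.$$

By putting this estimate in (8.5.16), we thus obtain

$$\|Q_N\|_G \le C\,(\log4\tilde p_N)^{1/2}\Big\{\Big(\sum_{k=1}^{N}\theta_k^2\Big)^{1/2} + \frac{1}{\tilde p_N}\Big(\sum_{k=1}^{N}\theta_k^2p_k^2\Big)^{1/2}\Big\} \le C\,(\log\tilde p_N)^{1/2}\Big(\sum_{k=1}^{N}\theta_k^2\Big)^{1/2}. \tag{8.5.18}$$

We have therefore proved Theorem 8.5.1.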
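As a rough numerical illustration of the quantities in Theorem 8.5.1, the sketch below approximates $Q_N$ by a maximum over a uniform grid, in the Rademacher case $\theta_k = 1$, $p_k = k$. Everything here is an assumption made for illustration: the function name, the grid refinement `points_per_unit` and the seed are hypothetical, and a grid maximum only bounds the true supremum from below.

```python
import math
import random


def sup_on_grid(theta, p, x, y, points_per_unit=8):
    """Grid approximation of Q_N = sup_{0<=t<=1} |Z_N(t)|.

    The grid refines the partition of [0, 1] into 4 * p~_N intervals used in
    the proof, with p~_N = max_k [2 + |p_k|].
    """
    p_tilde = max(int(2 + abs(pk)) for pk in p)
    n_grid = points_per_unit * 4 * p_tilde
    best = 0.0
    for i in range(n_grid + 1):
        t = i / n_grid
        z = sum(
            th * (xk * math.cos(2 * math.pi * pk * t)
                  + yk * math.sin(2 * math.pi * pk * t))
            for th, pk, xk, yk in zip(theta, p, x, y)
        )
        best = max(best, abs(z))
    return best


if __name__ == "__main__":
    rng = random.Random(1)
    for N in (16, 64, 256):
        x = [rng.choice((-1, 1)) for _ in range(N)]
        y = [rng.choice((-1, 1)) for _ in range(N)]
        q = sup_on_grid([1.0] * N, list(range(1, N + 1)), x, y)
        print(N, q, q / math.sqrt(N * math.log(N)))
```

Under these assumptions the ratio in the last column should stay of order one as $N$ grows, which is consistent with the upper bound of Theorem 8.5.1 and the lower bound (8.5.5).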


8.5.2 Remark. The same proof combined with a simple form of the Borell–Sudakov–Tsirelson inequality (operating the same way as in the proof of Corollary 8.5.5) also serves to establish a multidimensional version of Theorem 8.5.1. Let $m$ be some positive integer. Let $\{p_k, k \ge 1\}$ be a sequence of elements of $\mathbb{R}_+^m$, and write $\tilde p_N = \max\{[2 + p_k^i],\ 1 \le k \le N,\ 1 \le i \le m\}$; here we have denoted $p_k = (p_k^1, \dots, p_k^m)$. For $t \in [0,1]^m$, define analogously to (8.5.1),

$$
Z_N^m(\omega, t) = \sum_{k=1}^N \theta_k \big\{ X_k(\omega)\cos 2\pi\langle p_k, t\rangle + Y_k(\omega)\sin 2\pi\langle p_k, t\rangle \big\},
\qquad
Q_N^m = \sup_{t\in[0,1]^m} |Z_N^m(t)|.
$$

The corresponding pseudo-metric to (8.5.3) is defined for $s, t \in [0,1]^m$ by

$$
d_{N,m}(s,t) = 2\Big(\sum_{k=1}^N \theta_k^2 \sin^2 \pi\langle p_k, t - s\rangle\Big)^{1/2}.
\tag{8.5.3$'$}
$$

When for instance $X$ and $Y$ are independent random variables with $\mathbf{E} X_k = \mathbf{E} Y_k = 0$ and $\mathbf{E} X_k^2 = \mathbf{E} Y_k^2 = 1$, then $\mathbf{E}\,|Z_N^m(s) - Z_N^m(t)|^2 = d_{N,m}^2(s,t)$. Analogously, we will assume that for some constant $B$,

$$
\begin{cases}
\|Z_N^m(s) - Z_N^m(t)\|_G \le B\, d_{N,m}(s,t), \\[2pt]
\|Z_N^m(s)\|_G \le B\big(\sum_{k=1}^N \theta_k^2\big)^{1/2},
\end{cases}
\qquad \forall N \ge 1,\ \forall s, t \in [0,1]^m.
\tag{8.5.4$'$}
$$

The following is left as an exercise: under assumption (8.5.4$'$), there exists a constant $C$ (which is a function of $m$ and the constant $B$ from (8.5.4$'$) only) such that for any integer $N \ge 1$,

$$
\|Q_N^m\|_G \le C\,(\log \tilde p_N)^{1/2}\Big(\sum_{k=1}^N \theta_k^2\Big)^{1/2}.
$$
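The identity between the increment variance and the squared pseudo-metric rests on the elementary relation $(\cos a - \cos b)^2 + (\sin a - \sin b)^2 = 4\sin^2\frac{a-b}{2}$. A small check, assuming uncorrelated centered unit-variance $X_k, Y_k$; the sample weights, frequencies and points below are hypothetical values chosen only for illustration.

```python
import math


def d_Nm(thetas, freqs, s, t):
    """Pseudo-metric: 2 * (sum_k theta_k^2 * sin^2(pi * <p_k, t - s>))^(1/2)."""
    total = 0.0
    for th, p in zip(thetas, freqs):
        inner = sum(pc * (tc - sc) for pc, tc, sc in zip(p, t, s))
        total += th * th * math.sin(math.pi * inner) ** 2
    return 2.0 * math.sqrt(total)


def increment_variance(thetas, freqs, s, t):
    """E|Z(s) - Z(t)|^2 when the X_k, Y_k are uncorrelated, centered, of variance 1."""
    total = 0.0
    for th, p in zip(thetas, freqs):
        a = 2 * math.pi * sum(pc * sc for pc, sc in zip(p, s))
        b = 2 * math.pi * sum(pc * tc for pc, tc in zip(p, t))
        total += th * th * ((math.cos(a) - math.cos(b)) ** 2
                            + (math.sin(a) - math.sin(b)) ** 2)
    return total


thetas = [1.0, 0.5, 2.0]                        # hypothetical weights
freqs = [(1.0, 3.0), (2.5, 0.0), (4.0, 7.0)]    # hypothetical p_k in R_+^2
s, t = (0.1, 0.7), (0.45, 0.2)

print(increment_variance(thetas, freqs, s, t), d_Nm(thetas, freqs, s, t) ** 2)
```

The two printed numbers agree up to rounding, for any choice of weights, frequencies and points.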

Some applications. We give four applications of Theorem 8.5.1, the first one establishing a precise uniform estimate of exponential sums of the form

$$
\sum_{k=1}^N U_k \theta_k e^{2i\pi p_k t}, \qquad N = 1, 2, \dots
$$

where $U = \{U_k, k \ge 1\}$ is a sequence of weakly dependent random variables; the second one provides a global uniform estimate of the sequence formed by the differences of these polynomials. In that case, we will assume that the sequence $U$ is Gaussian. The third application provides a similar global uniform estimate for sequences of independent symmetric random variables. A fourth application to a variant of the initial problem is given in Theorem 8.5.8. We first establish the following corollary.


8.5.3 Corollary. (a) Let $U = \{U_k, k \ge 1\}$ be a sequence of independent, centered real random variables. We assume that there exists a real $M < \infty$ such that $|U_k| \le M$ a.s. for any $k \ge 1$. Then

$$
\Big\|\sup_{0\le t\le 1}\Big|\sum_{k=1}^N U_k\theta_k e^{2i\pi p_k t}\Big|\Big\|_G \le C M \Big(\log \tilde p_N \sum_{k=1}^N \theta_k^2\Big)^{1/2}
\tag{8.5.19a}
$$

where $C$ is a universal constant.

(b) Let $V = \{V_k, k \ge 1\}$ be a centered, stationary Gaussian sequence with finite decoupling coefficient $p(V)$ (see Example 2). Then

$$
\Big\|\sup_{0\le t\le 1}\Big|\sum_{k=1}^N V_k\theta_k e^{2i\pi p_k t}\Big|\Big\|_G \le C \sqrt{p(V)}\, \Big(\log \tilde p_N \sum_{k=1}^N \theta_k^2\Big)^{1/2}
\tag{8.5.19b}
$$

where $C$ is a universal constant.

(c) Let $U = \{U_k, k \ge 1\}$ be a sequence of independent, centered real random variables. Then

$$
\mathbf{E} \sup_{0\le t\le 1}\Big|\sum_{k=1}^N U_k e^{2i\pi p_k t}\Big| \le C \min\Big\{ (\log \tilde p_N)^{1/2}\Big(\sum_{k=1}^N \mathbf{E}\, U_k^2\Big)^{1/2},\ \sum_{k=1}^N \mathbf{E}\,|U_k| \Big\}
\tag{8.5.19c}
$$

where $C$ is a universal constant.

Proof. To establish (8.5.19a), we apply Theorem 8.5.1 to $X = U$, $Y = 0$, next to $X = 0$, $Y = U$. This provides the desired estimate for both the imaginary and real parts; the result follows by putting these estimates together. We operate similarly for establishing (8.5.19b), by applying Theorem 8.5.1 to $X = V$, $Y = 0$, next to $X = 0$, $Y = V$. Now, to prove part (c) of the statement, we use a well-known randomization trick, often called a symmetrization procedure. Let $U' = \{U'_k, k \ge 1\}$ be an independent copy of $U$. Let also $\varepsilon = \{\varepsilon_k, k \ge 1\}$ be a Rademacher sequence which is assumed to be independent from $U$ and $U'$, and denote by $\mathbf{E}'$, $\mathbf{E}_\varepsilon$ the corresponding expectation symbols. The sequence $\{U_k - U'_k, k \ge 1\}$ is a sequence of symmetric independent random variables and has thus the same law as $\{\varepsilon_k (U_k - U'_k), k \ge 1\}$. Then,

$$
\begin{aligned}
\mathbf{E} \sup_{0\le t\le 1}\Big|\sum_{k=1}^N U_k e^{2i\pi p_k t}\Big|
&= \mathbf{E} \sup_{0\le t\le 1}\Big|\sum_{k=1}^N (U_k - \mathbf{E}' U'_k) e^{2i\pi p_k t}\Big| \\
&\le \mathbf{E}\,\mathbf{E}' \sup_{0\le t\le 1}\Big|\sum_{k=1}^N (U_k - U'_k) e^{2i\pi p_k t}\Big| \\
&= \mathbf{E}\,\mathbf{E}'\,\mathbf{E}_\varepsilon \sup_{0\le t\le 1}\Big|\sum_{k=1}^N \varepsilon_k (U_k - U'_k) e^{2i\pi p_k t}\Big| \\
&\le 2\,\mathbf{E}\,\mathbf{E}_\varepsilon \sup_{0\le t\le 1}\Big|\sum_{k=1}^N \varepsilon_k U_k e^{2i\pi p_k t}\Big| \\
&\le C (\log \tilde p_N)^{1/2}\, \mathbf{E}\Big(\sum_{k=1}^N U_k^2\Big)^{1/2} \qquad \text{(by (8.5.19a))} \\
&\le C (\log \tilde p_N)^{1/2} \Big(\sum_{k=1}^N \mathbf{E}\, U_k^2\Big)^{1/2}.
\end{aligned}
$$

The proof is now complete.
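Neither of the two terms in the minimum of (8.5.19c) dominates the other in general. The sketch below compares them on centered two-point variables $U_k$ taking the value $1/k$ with probability $1-\varepsilon_k$ and $-(1-\varepsilon_k)/(k\varepsilon_k)$ with probability $\varepsilon_k$. The choices $\varepsilon_k = 1/(k^2+1)$ and $p_k = k$ are assumptions made here for illustration only.

```python
import math


def eps(k):
    """Hypothetical choice eps_k = 1/(k^2 + 1): in (0, 1), decreasing, k^2 * eps_k -> 1."""
    return 1.0 / (k * k + 1.0)


def mean(k):
    """E U_k for the two-point law; it vanishes by construction."""
    e = eps(k)
    return (1 - e) / k - e * (1 - e) / (k * e)


def second_moment(k):
    """E U_k^2 = (1 - eps_k)/k^2 + (1 - eps_k)^2 / (k^2 eps_k)."""
    e = eps(k)
    return (1 - e) / k**2 + (1 - e) ** 2 / (k**2 * e)


def abs_moment(k):
    """E |U_k| = 2 (1 - eps_k) / k."""
    return 2 * (1 - eps(k)) / k


def both_bounds(N):
    """Return ((log p~_N * sum E U_k^2)^(1/2), sum E|U_k|), assuming p_k = k."""
    p_tilde = N + 2
    entropy = math.sqrt(math.log(p_tilde) * sum(second_moment(k) for k in range(1, N + 1)))
    trivial = sum(abs_moment(k) for k in range(1, N + 1))
    return entropy, trivial


entropy, trivial = both_bounds(1000)
print("entropy-type bound:", entropy)
print("trivial bound:     ", trivial)
```

With this choice $\mathbf{E}\,U_k^2 = 1$ for every $k$, so the first term grows like $(N\log N)^{1/2}$, while the second grows only logarithmically.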

8.5.4 Remark. One might think that the bound $(\log \tilde p_N)^{1/2}\big(\sum_{k=1}^N \mathbf{E}\, U_k^2\big)^{1/2}$ in (8.5.19c) is always better than the trivial bound $\sum_{k=1}^N \mathbf{E}\,|U_k|$. This is however not the case. Consider the following instructive example. We assume that each random variable $U_k$ takes only two values as follows:

$$
U_k = \begin{cases} 1/k & \text{with probability } 1 - \varepsilon_k, \\ -(1-\varepsilon_k)/(k\varepsilon_k) & \text{with probability } \varepsilon_k, \end{cases}
$$

where $0 < \varepsilon_k < 1$ and $\varepsilon_k$ decreases to 0. Then $\mathbf{E}\, U_k = 0$ and $\mathbf{E}\, U_k^2 = (1-\varepsilon_k)/k^2 + (1-\varepsilon_k)^2/(k^2\varepsilon_k)$. Assume that $\lim_{k\to\infty} k^2\varepsilon_k = 1$. Then $\mathbf{E}\, U_k^2 \sim 1$ as $k$ tends to infinity, and so $(\log \tilde p_N)^{1/2}\big(\sum_{k=1}^N \mathbf{E}\, U_k^2\big)^{1/2} \sim (N \log \tilde p_N)^{1/2}$ as $N$ tends to infinity. But $\mathbf{E}\,|U_k| = 2(1-\varepsilon_k)/k$, so that $\sum_{k=1}^N \mathbf{E}\,|U_k| \sim C \log N$, which provides a much better bound.

Estimate (8.5.19b) can be considerably strengthened. This is the object of the next corollary. A uniform bound for the increments $\sum_{k=N+1}^M V_k\theta_k e^{2i\pi p_k t}$ can be obtained, but the normalizing factors $\big(\log \tilde p_M \sum_{k=N+1}^M \theta_k^2\big)^{1/2}$ should be slightly modified. It will be necessary to have, for all positive integers $M$, $\log \tilde p_M \ge C \log M$, $C$ being some constant depending on the data. We will therefore work with $\big(\log \bar p_M \sum_{k=N+1}^M \theta_k^2\big)^{1/2}$, where

$$
\bar p_M = \max(\tilde p_M, M) = \max\big(\max\{[2 + |p_k|],\ 1 \le k \le M\},\ M\big).
\tag{8.5.20}
$$

If $\{p_k, k \ge 1\}$ is an increasing sequence of positive integers, or if for some $\delta > 0$, $p_k \ge k^\delta$, then $\log \bar p_M$ and $\log \tilde p_M$ are of comparable order. But this is no longer the case when $\{p_k, k \ge 1\}$ grows slower than polynomially, as happens for the Dirichlet sums $\sum_{k=1}^N V_k\theta_k k^{-it}$.

8.5.5 Corollary. Let $V = \{V_k, k \ge 1\}$ be a centered stationary Gaussian sequence with finite decoupling coefficient $p(V)$. Then

$$
\Big\|\sup_{N<M}\ \sup_{0\le t\le 1} \frac{\big|\sum_{k=N+1}^M V_k\theta_k e^{2i\pi p_k t}\big|}{\big(\log \bar p_M \sum_{k=N+1}^M \theta_k^2\big)^{1/2}}\Big\|_G \le C_0 \sqrt{p(V)},
$$

where $C_0$ is a universal constant.


Proof. It is enough to establish a similar estimate for each of the imaginary and real parts. We put

$$
\begin{cases}
L^{(\cos)}_{N,M} = \sup_{0\le t\le 1} \dfrac{\big|\sum_{k=N+1}^M V_k\theta_k \cos 2\pi p_k t\big|}{\big(\log \bar p_M \sum_{k=N+1}^M \theta_k^2\big)^{1/2}} & (N < M), \\[10pt]
L^{(\sin)}_{N,M} = \sup_{0\le t\le 1} \dfrac{\big|\sum_{k=N+1}^M V_k\theta_k \sin 2\pi p_k t\big|}{\big(\log \bar p_M \sum_{k=N+1}^M \theta_k^2\big)^{1/2}} & (N < M), \\[10pt]
L^{(\cos)} = \sup_{N<M} L^{(\cos)}_{N,M}, \qquad L^{(\sin)} = \sup_{N<M} L^{(\sin)}_{N,M}.
\end{cases}
\tag{8.5.21}
$$

It thus suffices to show that

$$
\mathbf{E}\, L^{(\cos)} \le C\sqrt{p(V)}, \qquad \mathbf{E}\, L^{(\sin)} \le C\sqrt{p(V)}.
\tag{8.5.22}
$$

Then by estimate (10.2.2) for Gaussian semi-norms,

$$
\|L^{(\cos)}\|_G \le C\sqrt{p(V)}, \qquad \|L^{(\sin)}\|_G \le C\sqrt{p(V)}.
$$

Hence, the desired result follows by combining these estimates. We now prove (8.5.22). For convenience we recall (10.4.4): if $G_1, \dots, G_N$ are Gaussian random vectors with values in a separable Banach space $(B, \|\cdot\|)$, then

$$
\mathbf{E} \sup_{1\le k\le N} \|G_k\| \le C\Big\{ \sup_{1\le k\le N} \mathbf{E}\,\|G_k\| + \mathbf{E} \sup_{1\le k\le N} \sigma_k |g_k| \Big\}
$$

where $\sigma_k = \sup_{f\in B^*,\, \|f\|\le 1} \big(\mathbf{E}\,\langle f, G_k\rangle^2\big)^{1/2}$, $k = 1, \dots, N$, $\{g_k, 1 \le k \le N\}$ is a sequence of independent $N(0,1)$ distributed random variables, and $C$ is a universal constant. From this we deduce

$$
\mathbf{E}\, L^{(\cos)} \le C\Big\{ \sup_{N<M} \mathbf{E}\, L^{(\cos)}_{N,M} + \mathbf{E} \sup_{N<M} |\lambda_{N,M}|\,\sigma_{N,M} \Big\}
$$

where

$$
\sigma_{N,M} = \sup_{0\le t\le 1} \frac{\big\|\sum_{k=N+1}^M V_k\theta_k \cos 2\pi p_k t\big\|_2}{\big(\log \bar p_M \sum_{k=N+1}^M \theta_k^2\big)^{1/2}},
$$

and $(\lambda_{N,M})_{N<M}$ is a sequence of independent $N(0,1)$ distributed random variables. By a computation similar to the one made in Example 1, we also obtain

$$
\Big\|\sum_{k=N+1}^M V_k\theta_k \cos 2\pi p_k t\Big\|_G \le C\sqrt{p(V)}\,\Big(\sum_{k=N+1}^M \theta_k^2 \cos^2(2\pi p_k t)\Big)^{1/2} \le C\sqrt{p(V)}\,\Big(\sum_{k=N+1}^M \theta_k^2\Big)^{1/2}.
$$

Hence $\big\|\sum_{k=N+1}^M V_k\theta_k \cos 2\pi p_k t\big\|_2 \le C\sqrt{p(V)}\,\big(\sum_{k=N+1}^M \theta_k^2\big)^{1/2}$, and therefore

$$
\sigma_{N,M} \le C\sqrt{p(V)}\,(\log \bar p_M)^{-1/2}.
$$

By Theorem 8.5.1, we already know that $\sup_{N<M} \mathbf{E}\, L^{(\cos)}_{N,M} \le C\sqrt{p(V)}$. Consider now the other part. First, we re-index the sequence as follows: put $m_1 = 1$, $m_k = 1 + \sum_{j=2}^k (j-1)$ $(k \ge 2)$. Next, put for any $M \ge 1$ and any $l \in [m_M, m_{M+1}[$, $g_l := \lambda_{l - m_M, M}$, $s_l = (\log \bar p_M)^{1/2}$. Observe that $s_l \ge (\log M)^{1/2} \ge C(\log l)^{1/2}$. Thus

$$
\mathbf{E} \sup_{N<M} |\lambda_{N,M}|\,\sigma_{N,M} \le C\,\mathbf{E} \sup_{l\ge 1} \frac{|g_l|}{s_l} \le C \sup_{l\ge 1} \frac{\sqrt{\log l}}{s_l}\ \mathbf{E} \sup_{l\ge 1} \frac{|g_l|}{\sqrt{\log l}} \le C < \infty.
$$

Hence $\mathbf{E}\, L^{(\cos)} \le C\sqrt{p(V)}$. By arguing identically, we establish an estimate of the same order for $\mathbf{E}\, L^{(\sin)}$. Hence (8.5.22). The corollary is thus proved.

We will now prove the following result.

8.5.6 Theorem. Let $W = \{W_k, k \ge 1\}$ be a sequence of independent, symmetric real random variables. Then,

$$
\Big\|\sup_{N<M}\ \sup_{0\le t\le 1} \frac{\big|\sum_{k=N+1}^M W_k e^{2i\pi p_k t}\big|}{\big(\log \bar p_M \sum_{k=N+1}^M W_k^2\big)^{1/2}}\Big\|_G \le C,
\tag{8.5.23}
$$

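where $C$ is a universal constant.

Observe that, by means of the Cauchy–Schwarz inequality,

$$
\frac{\big|\sum_{k=N+1}^M W_k e^{2i\pi p_k t}\big|}{\big(\log \bar p_M \sum_{k=N+1}^M W_k^2\big)^{1/2}} \le \frac{\sum_{k=N+1}^M |W_k|}{\big(\log \bar p_M \sum_{k=N+1}^M W_k^2\big)^{1/2}} \le \frac{(M-N)^{1/2}}{(\log \bar p_M)^{1/2}}.
$$

In particular, if $\{p_m, m \ge 1\}$ is $\lambda$-lacunary ($\lambda > 1$), that is $p_{m+1} \ge \lambda p_m$ for all $m \ge 1$, then

$$
\sup_{N<M}\ \sup_{0\le t\le 1} \frac{\big|\sum_{k=N+1}^M W_k e^{2i\pi p_k t}\big|}{\big(\log \bar p_M \sum_{k=N+1}^M W_k^2\big)^{1/2}} \le C,
$$

where $C$ is a constant depending on $\lambda$ only. So Theorem 8.5.6 is only interesting when $\{p_m, m \ge 1\}$ grows at most geometrically.

Proof. Since the sequence $W$ is symmetric, it has the same distribution as the sequence $W' = (\varepsilon_k W_k)_{k=1}^\infty$, where $\varepsilon = \{\varepsilon_k, k \ge 1\}$ is a sequence of independent Rademacher random variables, which is also independent from the sequence $W$. Let $P$ be some fixed positive integer. Let also $g = \{g_k, k \ge 1\}$ be a sequence of independent $N(0,1)$ distributed random variables, also independent from the sequence $W$. By Corollary 8.5.5,

$$
\mathbf{E}_g\, G\Big( \sup_{N<M\le P}\ \sup_{0\le t\le 1} \frac{\big|\sum_{k=N+1}^M g_k W_k e^{2i\pi p_k t}\big|}{C_0 \big(\log \bar p_M \sum_{k=N+1}^M W_k^2\big)^{1/2}} \Big) \le 1
$$

where $C_0$ is the same constant as in Corollary 8.5.5. Since $|g| = \{|g_k|, k \ge 1\}$ and $\operatorname{sign}(g) = \{\operatorname{sign}(g_k), k \ge 1\}$ are independent sequences, by Jensen's inequality

$$
\begin{aligned}
1 &\ge \mathbf{E}_g\, G\Big( \sup_{N<M\le P}\ \sup_{0\le t\le 1} \frac{\big|\sum_{k=N+1}^M g_k W_k e^{2i\pi p_k t}\big|}{C_0 \big(\log \bar p_M \sum_{k=N+1}^M W_k^2\big)^{1/2}} \Big) \\
&\ge \mathbf{E}_{\operatorname{sign}(g)}\, G\Big( \mathbf{E}_{|g|} \sup_{N<M\le P}\ \sup_{0\le t\le 1} \frac{\big|\sum_{k=N+1}^M g_k W_k e^{2i\pi p_k t}\big|}{C_0 \big(\log \bar p_M \sum_{k=N+1}^M W_k^2\big)^{1/2}} \Big) \\
&\ge \mathbf{E}_{\operatorname{sign}(g)}\, G\Big( \sqrt{\tfrac{2}{\pi}}\ \sup_{N<M\le P}\ \sup_{0\le t\le 1} \frac{\big|\sum_{k=N+1}^M \operatorname{sign}(g_k) W_k e^{2i\pi p_k t}\big|}{C_0 \big(\log \bar p_M \sum_{k=N+1}^M W_k^2\big)^{1/2}} \Big) \\
&= \mathbf{E}_\varepsilon\, G\Big( \sqrt{\tfrac{2}{\pi}}\ \sup_{N<M\le P}\ \sup_{0\le t\le 1} \frac{\big|\sum_{k=N+1}^M \varepsilon_k W_k e^{2i\pi p_k t}\big|}{C_0 \big(\log \bar p_M \sum_{k=N+1}^M W_k^2\big)^{1/2}} \Big).
\end{aligned}
$$

By integrating with respect to $W$, using symmetry of the law of $W$ and finally letting $P$ tend to infinity, we obtain

$$
\mathbf{E}\, G\Big( \sqrt{\tfrac{2}{\pi}}\ \sup_{N<M}\ \sup_{0\le t\le 1} \frac{\big|\sum_{k=N+1}^M W_k e^{2i\pi p_k t}\big|}{C_0 \big(\log \bar p_M \sum_{k=N+1}^M W_k^2\big)^{1/2}} \Big) \le 1.
$$

This means that

$$
\Big\|\sup_{N<M}\ \sup_{0\le t\le 1} \frac{\big|\sum_{k=N+1}^M W_k e^{2i\pi p_k t}\big|}{\big(\log \bar p_M \sum_{k=N+1}^M W_k^2\big)^{1/2}}\Big\|_G \le C_0 \sqrt{\frac{\pi}{2}},
$$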

hence the announced result.

It is now easy to deduce from Theorem 8.5.6 (except for the constant 2 in (8.5.24)) the well-known estimate of Salem–Zygmund (see Kahane [1968] or Salem–Zygmund [1954]), which we now recall.

8.5.7. Salem–Zygmund's estimate. Let $\{n_k, k \ge 1\}$, $\{p_k, k \ge 1\}$ be two increasing sequences of integers and $\{a_k, k \ge 1\}$ a sequence of reals. Let also $\varepsilon = \{\varepsilon_k, k \ge 1\}$ be a sequence of independent Rademacher random variables defined on a probability space $(\Omega, \mathcal{B}, \mathbf{P})$. Then maxnk 1).

Theorem 8.5.6 can be used to get a simple sufficient condition for uniform convergence of random Fourier series. The condition is expressed by means of the convergence of a series whose terms depend on the sequence $p$. When the order of magnitude of this sequence is known, this condition can be easier to check than the remarkable characterization ([Ledoux–Talagrand: 1991], Theorem 13.6 and Corollary 13.9) of that property by Marcus and Pisier, in terms of the so-called Dudley entropy integral.

8.5.8 Theorem. Suppose there exist integers $0 := n_0 < n_1 < n_2 < \cdots$ such that the series

$$
\sum_{i=0}^\infty \Big( \log p_{n_{i+1}}\ \mathbf{E} \sum_{k=n_i+1}^{n_{i+1}} |W_k|^2 \Big)^{1/2}
$$

converges. Then the sequence of partial sums

$$
S_n(\omega, t) := \sum_{k=1}^n W_k(\omega) e^{2i\pi p_k t}, \qquad n = 1, 2, \dots
$$

converges in $C$, for $\mathbf{P}$-almost all $\omega$.

Proof. Put

$$
R = \sup_{N<M}\ \sup_{0\le t\le 1} \frac{\big|\sum_{k=N+1}^M W_k e^{2i\pi p_k t}\big|}{\big(\log p_M \sum_{k=N+1}^M W_k^2\big)^{1/2}}.
$$

By Theorem 8.5.6, $\mathbf{E}\, R < \infty$, so that

$$
\|S_{n_{i+1}} - S_{n_i}\|_C \le R\,\Big(\log p_{n_{i+1}} \sum_{k=n_i+1}^{n_{i+1}} |W_k|^2\Big)^{1/2},
$$


for any $i \ge 1$, and moreover

$$
\sup_{n_i\le n\le n_{i+1}} \|S_n - S_{n_i}\|_C \le R \sup_{n_i\le n\le n_{i+1}} \Big(\sum_{k=n_i+1}^{n} |W_k|^2\Big)^{1/2} \log^{1/2} p_n = R\,\Big(\sum_{k=n_i+1}^{n_{i+1}} |W_k|^2\Big)^{1/2} \log^{1/2} p_{n_{i+1}}.
$$

Thus by the triangle inequality, for all $r \ge 1$,

$$
\sup_{u,v\ge r} \|S_u - S_v\|_C \le R \sum_{i\ge r} \Big(\sum_{k=n_i+1}^{n_{i+1}} |W_k|^2\Big)^{1/2} \log^{1/2} p_{n_{i+1}}.
$$

This last inequality shows, by the assumption made and Fatou's lemma, that

$$
\sup_{u,v\ge r} \|S_u - S_v\|_C \to 0
$$

as $r$ tends to infinity, almost surely. The result easily follows.

$L^p$-norms of random polynomials. The study of the behavior of $L^p$-norms of random polynomials built from sequences of i.i.d. random variables requires a radically different approach. Borwein and Lockhart [2001] investigated this question. Their approach is based on convergence results of moments in the central limit theorem for triangular arrays of i.i.d. random variables. The case of arrays of independent random variables was considered in [Cuny–Weber: 2006], where a theorem of convergence of moments, with speed of convergence, in the CLT for triangular arrays of independent random variables is further established. We begin by introducing the necessary notation. Let $\{X_{n,k}, 1 \le k \le k_n, n \ge 1\}$ be a triangular array of real centered independent, square integrable random variables and set for every $n \ge 1$ and $1 \le j, k \le k_n$,

$$
\sigma_{n,j}^2 = \mathbf{E}\, X_{n,j}^2, \qquad s_{n,k}^2 = \sum_{j=1}^k \sigma_{n,j}^2, \qquad s_n = s_{n,k_n}, \qquad S_{n,k} = \sum_{j=1}^k X_{n,j}, \qquad S_n = S_{n,k_n}.
$$

Introduce the (generalized) Lindeberg condition (also called Lyapunov's condition) of order $\nu \ge 2$:

$$
\sum_{j=1}^{k_n} \mathbf{E}\,|X_{n,j}|^\nu \mathbf{1}_{\{|X_{n,j}| > \varepsilon s_n\}} = o(s_n^\nu), \quad (\forall \varepsilon > 0) \quad n \to \infty.
\tag{$L_\nu$}
$$

This condition is, for $\nu > 2$, equivalent to

$$
\sum_{j=1}^{k_n} \mathbf{E}\,|X_{n,j}|^\nu = o(s_n^\nu), \qquad n \to \infty.
\tag{$L_\nu$}
$$


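According to Lindeberg's theorem (see for instance Hall and Heyde [1980]), under $(L_2)$, $S_n/s_n$ converges in law to the standard normal law; and since $\mathbf{E}\, S_n^2/s_n^2 = 1$, we have

$$
\lim_{n\to\infty} \mathbf{E}\, \frac{S_{n,k_n}^2}{s_{n,k_n}^2} = 1 = m_2,
\tag{8.5.28}
$$

where $m_2 = \mathbf{E}\, W^2$ and $W$ is a variable with standard normal law. More generally, for $\nu > 0$, write $m_\nu := \mathbf{E}\,|W|^\nu$.

• ($0 < \nu \le 2$). Let $\{X_{n,k}, 1 \le k \le k_n, n \ge 1\}$ be a triangular array of real centered independent, square integrable random variables. Assume that $(L_2)$ holds. Then,

$$
\lim_{n\to\infty} \frac{\mathbf{E}\,|S_n|^\nu}{s_n^\nu} = m_\nu .
$$

• ($\nu > 2$). Let $\{Y_k, 1 \le k \le n\}$ be real centered independent random variables, with finite moment of order $\nu$. Write $S_n = \sum_{k=1}^n Y_k$ and $s_n = \big(\sum_{k=1}^n \mathbf{E}\, Y_k^2\big)^{1/2}$. Then, there exists a universal constant $C$ such that

$$
\Big| \mathbf{E}\,\frac{|S_n|^\nu}{s_n^\nu} - m_\nu \Big| \le C\, \frac{\sum_{k=1}^n \mathbf{E}\,|Y_k|^\nu}{s_n^\nu} \qquad \text{for } 2 < \nu \le 3,
\tag{8.5.29a}
$$

$$
\Big| \mathbf{E}\,\frac{|S_n|^\nu}{s_n^\nu} - m_\nu \Big| \le C \Big( \frac{\sum_{k=1}^n \mathbf{E}\,|Y_k|^\nu}{s_n^\nu} + \frac{\sum_{k=1}^n \mathbf{E}\,|Y_k|^3}{s_n^3} \Big) \qquad \text{for } 3 < \nu \le 5,
\tag{8.5.29b}
$$

and, for $\nu > 5$,

$$
\Big| \mathbf{E}\,\frac{|S_n|^\nu}{s_n^\nu} - m_\nu \Big| \le \Big(\frac{C\nu}{\log \nu}\Big)^\nu \Big( \frac{\sum_{k=1}^n \mathbf{E}\,|Y_k|^\nu}{s_n^\nu} + \frac{\sum_{k=1}^n \mathbf{E}\,|Y_k|^3}{s_n^3} + \frac{\sum_{k=1}^n \mathbf{E}\,|Y_k|^3}{s_n^3} \cdot \frac{\sum_{k=1}^n \mathbf{E}\,|Y_k|^{\nu-3}}{s_n^{\nu-3}} \Big).
\tag{8.5.29c}
$$

As a corollary we obtain

8.5.9 Theorem. Let $\nu > 2$. Let $\{X_{n,k}, 1 \le k \le k_n, n \ge 1\}$ be a triangular array of real centered independent random variables, having moments of order $\nu$. Assume that $(L_\nu)$ holds. Then $\mathbf{E}\,|S_n|^\nu / s_n^\nu$ converges to $m_\nu$ as $n$ tends to infinity, with the speed given above. Further, if $\nu \ge 3$, the rate of convergence can be simplified:

$$
\Big| \mathbf{E}\,\frac{|S_n|^\nu}{s_n^\nu} - m_\nu \Big| \le \Big(\frac{C\nu}{\log \nu}\Big)^\nu \max_{h\in\{1,\, \frac{1}{\nu-2}\}} \Big( \frac{\sum_{k=1}^{k_n} \mathbf{E}\,|X_{n,k}|^\nu}{s_n^\nu} \Big)^h,
$$

where $C$ is a universal constant.

We refer to Cuny and Weber [2006] for these results and comparisons with earlier results. Theorem 8.5.9 can be used to prove the following result extending Borwein and Lockhart's theorem to triangular arrays of independent random variables.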


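8.5.10 Theorem. Let $\{X_{n,k}, 1 \le k \le k_n, n \ge 1\}$ be a triangular array of real centered independent random variables, with $\mathbf{E}\, X_{n,k}^2 = 1$, satisfying the Lindeberg condition $(L_\nu)$ of order $\nu \ge 2$. We have

$$
\lim_{n\to\infty} \frac{1}{2\pi\, k_n^{\nu/2}} \int_0^{2\pi} \mathbf{E}\,|q_n(\theta)|^\nu\, d\theta = \Gamma\Big(1 + \frac{\nu}{2}\Big),
$$

where $q_n(\theta) = \sum_{k=1}^{k_n} X_{n,k} e^{ik\theta}$ and $\Gamma(s) = \int_0^\infty u^{s-1} e^{-u}\, du$ is the usual Gamma function.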
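For $\nu = 4$ and Rademacher entries (an assumption made here, not a case singled out in the text), the limit in Theorem 8.5.10 can be checked exactly for small $n$. By Parseval, $\frac{1}{2\pi}\int_0^{2\pi}|q_n|^4\,d\theta$ equals the sum of the squared autocorrelations of the coefficient vector, so the expectation can be computed by enumerating all sign patterns. This is an illustrative sketch, with hypothetical function names.

```python
from itertools import product
from math import gamma


def fourth_moment_mean(n):
    """Exact E (1/2pi) * int_0^{2pi} |q_n(theta)|^4 dtheta for Rademacher coefficients."""
    total = 0
    for x in product((-1, 1), repeat=n):
        # |q|^2 has Fourier coefficients c_d = sum_k x_k x_{k+d}; Parseval gives sum_d c_d^2.
        for d in range(-(n - 1), n):
            c = sum(x[k] * x[k + d] for k in range(n) if 0 <= k + d < n)
            total += c * c
    return total / 2**n


def normalized(n):
    """The quantity of Theorem 8.5.10 with nu = 4 and k_n = n, before taking the limit."""
    return fourth_moment_mean(n) / n**2


for n in (2, 4, 8):
    print(n, normalized(n), gamma(1 + 4 / 2))
```

A direct count gives $2n^2 - n$ for the expectation, hence the normalized value $2 - 1/n$, which tends to $\Gamma(3) = 2$.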

8.6 Application to a.s. convergence of weighted series of contractions

In this section the convergence properties, in mean and almost everywhere, of series of contractions (of an arbitrary Hilbert space) with random weights are investigated. The uniform estimates of random polynomials established in the previous section will be combined with the spectral inequality to obtain sharp conditions ensuring the existence of universal sets on which these series converge in mean and also almost everywhere, for arbitrary contractions. The general approach is again based on the metric entropy method. Let $(X, \mathcal{F}, \mu)$ be some probability space. Consider the randomly weighted series of contractions

$$
\sum_{k=1}^\infty W_k(\omega)\, T^{p_k},
\tag{8.6.1}
$$

where $\{W_k, k \ge 1\}$ is a sequence of independent, mean zero, square integrable random variables, defined on some probability space $(\Omega, \mathcal{B}, \mathbf{P})$, and $T$ is a linear contraction in a Hilbert space $H$, while $\{p_k, k \ge 1\}$ is a nondecreasing sequence of nonnegative integers with $p_1 > 1$, and $\omega \in \Omega$. Consider first the convergence in mean of the series (8.6.1). One can establish the following theorem.

8.6.1 Theorem. Suppose that there exist integers $0 := n_0 < n_1 < n_2 < \cdots$ such that the following condition is satisfied:

$$
\sum_{j=0}^\infty \min\Big\{ (\log p_{n_{j+1}})^{1/2} \Big(\sum_{k=n_j+1}^{n_{j+1}} \mathbf{E}\, W_k^2\Big)^{1/2},\ \sum_{k=n_j+1}^{n_{j+1}} \mathbf{E}\,|W_k| \Big\} < \infty.
\tag{8.6.2}
$$

Then there exists a (universal) sequence of $\mathbf{P}$-integrable random variables $M = \{M_J, J \ge 1\}$ defined on $(\Omega, \mathcal{B}, \mathbf{P})$, which converges to zero $\mathbf{P}$-a.s. and in $\mathbf{P}$-mean, such that for any Hilbert space $H$ and any contraction $T$ in $H$ we have

$$
\sup_{R > n_J} \Big\| \sum_{k=n_J+1}^{R} W_k(\omega)\, T^{p_k} \Big\| \le M_J(\omega)
\tag{8.6.3}
$$

for all $\omega \in \Omega$ and all $J \ge 1$. In particular, there exists a (universal) $\mathbf{P}$-null set $N^* \in \mathcal{B}$ such that the series

$$
\sum_{k=1}^\infty W_k(\omega)\, T^{p_k}
\tag{8.6.4}
$$

converges in operator norm for all $\omega \in \Omega \setminus N^*$, whenever $H$ is a Hilbert space and $T$ is a contraction in $H$.

We will see in Theorem 8.6.2 that condition (8.6.2) is in fact already enough to imply the existence of a (universal) $\mathbf{P}$-null set $N^* \in \mathcal{B}$ such that: for each $\omega \in \Omega \setminus N^*$, for any probability space $(X, \mathcal{F}, \mu)$, any contraction $T$ on $L^2(\mu)$, any $f \in L^2(\mu)$, if we define

$$
\forall \omega \in \Omega,\ \forall x \in X,\ \forall n \ge 1, \qquad S_n(\omega, x) = \sum_{k=1}^n W_k(\omega)\, T^{p_k} f(x),
\tag{8.6.5}
$$

the sequence $S_{n_k}(\omega, \cdot)$ converges $\mu$-almost surely. If in addition to condition (8.6.2) we have

$$
\sum_k \min\Big\{ \log^2(n_{k+1} - n_k)\, \log p_{n_{k+1}} \sum_{j=n_k+1}^{n_{k+1}} \mathbf{E}\,(W_j)^2,\ \mathbf{E}\Big(\sum_{j=n_k+1}^{n_{k+1}} |W_j|\Big)^2 \Big\} < \infty,
\tag{8.6.6}
$$

then one also has the existence of a (universal) $\mathbf{P}$-null set $N^* \in \mathcal{B}$ such that for each $\omega \in \Omega \setminus N^*$, for any probability space $(X, \mathcal{F}, \mu)$, any contraction $T$ on $L^2(\mu)$, any $f \in L^2(\mu)$, the sequence $S_n$, $n = 1, 2, \dots$, converges $\mu$-almost surely.

Proof of Theorem 8.6.1. Fix some $N \ge 1$ and let $R > n_N \ge 1$. Let $R' \ge 0$ be defined by $n_{N+R'} < R \le n_{N+R'+1}$. Let $f \in H$ with spectral measure $\mu_f$ with respect to $T$. Then,

$$
\Big\| \sum_{k=n_N+1}^{R} W_k(\omega)\, T^{p_k} f \Big\|
$$

≤

N+ i+1 R −1 n

Wk (ω)T

pk

f +

k=ni +1

i=N

R

Wk (ω)T pk f

k=nN+R +1

∞ ∞ Wk (ω)T pk f + ni+1

≤

i=N

≤

i=N

+

j =0 nN+j nN

k=nN +1

t∈T k=n +1 i

i=N

+

∞

sup

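j =N nj 1/2, $\beta > 2$. Then, there exists a (universal) $\mathbf{P}$-null set $N^* \in \mathcal{B}$ such that for each $\omega \in \Omega \setminus N^*$, for any probability space $(X, \mathcal{F}, \mu)$, any contraction $T$ on $L^2(\mu)$, any $f \in L^2(\mu)$, the following sequences $\{S_n(\omega, \cdot), n \ge 1\}$, $\{R_n(\omega, \cdot), n \ge 1\}$ defined by

$$
\forall x \in X,\ \forall n \ge 1, \qquad S_n(\omega, x) = \sum_{k=1}^n \frac{Z_k(\omega)}{k^\alpha}\, T^k f(x)
\tag{8.6.9}
$$

and

$$
\forall x \in X,\ \forall n \ge 1, \qquad R_n(\omega, x) = \sum_{k=1}^n \frac{Z_k(\omega)}{\sqrt{k}\, \log^\beta k}\, T^k f(x)
\tag{8.6.10}
$$

converge $\mu$-almost surely.

Proof. Set $N_k = 2^k$ for all $k \in \mathbb{N}$. By Theorem 8.6.2, it is enough to verify conditions (8.6.2) and (8.6.6). Let $\alpha > \frac12$; then for the series (8.6.9), condition (8.6.2) becomes

$$
\sum_{i=0}^\infty \Big( \log(2^{i+1}) \sum_{k=2^i+1}^{2^{i+1}} \frac{\mathbf{E}\,(|Z_k|^2)}{k^{2\alpha}} \Big)^{1/2} \le K (\mathbf{E}\,|Z_1|^2)^{1/2} \sum_{i=0}^\infty \frac{\sqrt{i+1}}{2^{(\alpha - \frac12) i}} < \infty.
$$

And concerning condition (8.6.6), we find

$$
\sum_{k=1}^\infty \min\Big\{ \log^2(2^{k+1} - 2^k)\, \log 2^{k+1} \sum_{j=2^k+1}^{2^{k+1}} \frac{\mathbf{E}\,|Z_j|^2}{j^{2\alpha}},\ \mathbf{E}\Big( \sum_{j=2^k+1}^{2^{k+1}} \frac{|Z_j|}{j^\alpha} \Big)^2 \Big\} \le K\, \mathbf{E}\,(|Z_1|^2) \sum_{k=1}^\infty \frac{k^3}{2^{(2\alpha-1)k}} < \infty.
$$

Let $\beta > 2$; then for the series (8.6.10), condition (8.6.2) becomes

$$
\sum_{i=0}^\infty \Big( \log(2^{i+1}) \sum_{k=2^i+1}^{2^{i+1}} \frac{\mathbf{E}\,(|Z_k|^2)}{k \log^{2\beta} k} \Big)^{1/2} \le K (\mathbf{E}\,|Z_1|^2)^{1/2} \sum_{i=1}^\infty \frac{1}{i^{(\beta - \frac12)}} < \infty.
$$

As for condition (8.6.6),

$$
\sum_{k=1}^\infty \min\Big\{ \log^2(2^{k+1} - 2^k)\, \log 2^{k+1} \sum_{j=2^k+1}^{2^{k+1}} \frac{\mathbf{E}\,(|Z_j|^2)}{j \log^{2\beta} j},\ \mathbf{E}\Big( \sum_{j=2^k+1}^{2^{k+1}} \frac{|Z_j|}{j^{1/2} \log^\beta j} \Big)^2 \Big\} \le K\, \mathbf{E}\,(|Z_1|^2) \sum_{k=1}^\infty \frac{1}{k^{2\beta-3}} < \infty.
$$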
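The dyadic-block estimate behind the first of these bounds can be checked numerically. On the block $2^i < k \le 2^{i+1}$ there are $2^i$ terms, each at most $2^{-2\alpha i}$, which yields the factor $2^{-(\alpha - 1/2) i}$. The sketch below assumes $\mathbf{E}\, Z_k^2 = 1$ and the normalization used in the proof ($n_i = 2^i$, $p_k = k$); the function names are illustrative.

```python
import math


def block_term(i, alpha):
    """i-th summand of the first display: (log 2^(i+1) * sum_{2^i < k <= 2^(i+1)} k^(-2 alpha))^(1/2)."""
    block = sum(k ** (-2 * alpha) for k in range(2**i + 1, 2 ** (i + 1) + 1))
    return math.sqrt(math.log(2 ** (i + 1)) * block)


def block_bound(i, alpha):
    """Majorant sqrt((i+1) log 2) * 2^(-(alpha - 1/2) i), using k > 2^i on the block."""
    return math.sqrt((i + 1) * math.log(2)) * 2 ** (-(alpha - 0.5) * i)


alpha = 0.75  # any alpha > 1/2; this value is only an example
for i in range(0, 12, 3):
    print(i, block_term(i, alpha), block_bound(i, alpha))
```

Since the majorant decays geometrically up to the factor $\sqrt{i+1}$, the series converges for every $\alpha > 1/2$.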

Hence, conditions (8.6.2) and (8.6.6) are fulfilled for the series (8.6.9) and (8.6.10). This completes the proof of Corollary 8.6.4.

8.6.5 Remark. If $\alpha \le \frac12$ and $\mathbf{P}\{|Z_1| > 0\} > 0$, then the series (8.6.9) does not converge. To see this, it is enough to take $T = \text{Identity}$ and $\alpha = \frac12$ in (8.6.9). Then we have

$$
\forall \omega \in \Omega,\ \forall x \in X, \qquad \sum_{k=1}^\infty \frac{Z_k(\omega)}{\sqrt{k}}\, T^k f(x) = f(x) \sum_{k=1}^\infty \frac{Z_k(\omega)}{\sqrt{k}}.
$$

But by the 0-1 law and the central limit theorem, the series on the right-hand side diverges almost surely. The case $\alpha < 1/2$ is treated in exactly the same manner, and this completes the proof of our claim. Corollary 8.6.4 also improves earlier results of Rosenblatt [1988], which were obtained with a factor $n$ instead of $\sqrt{n}\log^\beta n$ and with the Rademacher sequence instead of a general sequence of independent, symmetric, identically distributed random variables.

By combining Corollary 8.6.4 with Kronecker's lemma, we get

8.6.6 Theorem. If $\{Z_k, k \ge 1\}$ is a sequence of independent, symmetric, square integrable, identically distributed random variables on a probability space $(\Omega, \mathcal{B}, \mathbf{P})$ and if $\beta > 2$, then there exists a (universal) $\mathbf{P}$-null set $N^* \in \mathcal{B}$ such that for each $\omega \in \Omega \setminus N^*$, for any probability space $(X, \mathcal{F}, \mu)$, any contraction $T$ on $L^2(\mu)$, any $f \in L^2(\mu)$, the sequence

$$
A_n(\omega, x) = \frac{1}{\sqrt{n}\, \log^\beta n} \sum_{k=1}^n Z_k(\omega)\, T^k f(x), \qquad x \in X,\ n \ge 1,
\tag{8.6.11}
$$

converges to zero $\mu$-almost surely.

8.6.7 Remark. 1. In the previous applications, we only considered the i.i.d. case. Naturally Theorem 8.6.2 applies to the non-i.i.d. case as well. The corresponding results are left as exercises.

2. The almost sure convergence of the weighted means

$$
\frac{1}{n} \sum_{k=1}^n Z_k(\omega)\, T^k f
\tag{8.6.12}
$$

was studied by several authors. In Assani [1998], the almost sure convergence to zero of these means is established when $\{Z_k, k \ge 1\}$ is an i.i.d. sequence of symmetric random variables such that $\mathbf{E}\,(|Z_1|^p) < \infty$ for some $1 < p < \infty$, and $T$ is the operator induced by a measure-preserving transformation. In Rosenblatt [1988], these means are studied when $T$ is a contraction on $L^p(\mu)$, $1 < p < \infty$, and $\{Z_k, k \ge 1\}$ is a Rademacher sequence. And in Schneider–Weber [1996], a Gaussian technique is used to prove the almost sure convergence of the means (8.6.12), notably when the sequence $\{Z_k, k \ge 1\}$ is positive.

Now let $\{Z_k, k \ge 1\}$ be as in Theorem 8.6.6, and let $(X, \mathcal{F}, \mu)$ be a probability space, $T$ a contraction on $L^1(\mu)$, which is also assumed to be a contraction on any $L^p(\mu)$ $(p \ge 1)$. Consider the series

$$
\sum_{k=1}^\infty \frac{Z_k(\omega)}{k}\, T^k f(x).
\tag{8.6.13}
$$

By using the above results and a complex interpolation method, we will prove the almost sure convergence of the series (8.6.13), for all $f \in L^p(\mu)$, $p > 1$.

8.6.8 Theorem. Let $\{Z_k, k \ge 1\}$ be a sequence of independent, symmetric, square integrable, identically distributed random variables on some probability space $(\Omega, \mathcal{B}, \mathbf{P})$. Then, there exists a (universal) $\mathbf{P}$-null set $N^* \in \mathcal{B}$, such that for each $\omega \in \Omega \setminus N^*$, for any probability space $(X, \mathcal{F}, \mu)$, any contraction $T$ on $L^1(\mu)$ which is also a contraction on every $L^p(\mu)$, and for any $f \in L^p(\mu)$ $(p > 1)$, the series defined in (8.6.13) converges $\mu$-almost surely. Further, if

$$
S^*(f) = S^*(\omega, f) = \sup_{n\ge 1} \Big| \sum_{k=1}^n \frac{Z_k(\omega)}{k}\, T^k f \Big|,
\tag{8.6.14}
$$

then we have the strong maximal inequality

$$
\forall \omega \in \Omega \setminus N^*,\ \forall p > 1,\ \forall f \in L^p(\mu), \qquad \|S^*(f)\|_p \le C(p, \omega)\, \|f\|_p.
\tag{8.6.15}
$$

Proof. Let $\alpha > 1/2$, $z \in \mathbb{C}$ with $0 \le \Re(z) \le 1$, $N_j = 2^j$, $j = 1, 2, \dots$. Let also $\nu : X \to \mathbb{N}^*$ be a measurable map and define, for $\omega \in \Omega$ and $p \ge 1$, the operators in $L^p(\mu)$,

$$
\forall f \in L^p(\mu), \qquad S_z^\nu(f) = \sum_{j=1}^{\nu} \frac{Z_j(\omega)}{j^{(\alpha + \frac{z}{2})}}\, T^j(f)
\tag{8.6.16}
$$

as well as

$$
S_z^*(f) = \sup_{\nu\ge 1} |S_z^\nu(f)|.
\tag{8.6.17}
$$

First we establish a useful estimate for Szν (f ) 2 when f ∈ L2 (μ). Let x ∈ X, then there exists a positive integer k0 = k0 (ν), such that 2k0 < ν(x) ≤ 2k0 +1 . Thus ν(x) Zj (ω) j |Szν (f )|(x) = T f (x) (α+ 2z ) j j =1 2k0 n Zj (ω) j ≤ T f (x) + max (α+ 2z ) 2k0 0 such that sup E |θk |α < ∞. k≥1

There exists a (universal) P-null set N ∗ ∈ B such that for each ω ∈ \N ∗ , for any probability space (X, F , μ), any contraction T on L2 (μ) and any f ∈ L2 (μ), if is the contraction defined in (a2), one has 1 pk +θk (ω) 1 pk T (f ) = lim T ((f )) μ-almost surely. n→∞ n n→∞ n n

n

k=1

k=1

lim

8.6.12 Remarks. Corollary 8.6.11 implies that if {pk , k ≥ 1} is a sequence of positive integers which satisfies (a1) and which is 2-good for the pointwise ergodic theorem, then there exists a (universal) P-null set N ∗ ∈ B such that for each ω ∈ \N ∗ , the perturbed sequence {pk + θk (ω), k ≥ 1} is also 2-good for the pointwise ergodic theorem. Thus it follows: 1) If {pk , k ≥ 1} is the sequence {k d , k ≥ 1}, d ≥ 1 or the sequence of prime numbers, then the perturbed sequence {pk + θk (ω), k ≥ 1} is 2-good for the pointwise ergodic theorem. Furthermore, if d = 1 (resp. d ≥ 2) and τ is ergodic (resp. τ n is ergodic for each n ∈ N), then for any ω ∈ \N ∗ one has 1 f τ pk +θk (ω) = n→∞ n n

k=1

(f )dμ =

lim

X

f dμ

μ-almost surely.

X

2) On the other hand, we can deduce from Corollary 8.6.11, that if {pk , k ≥ 1} is a sequence of positive integers which satisfies (a1) and which is 2-bad for the ergodic


theorem (i.e., there exist an $f \in L^2(\mu)$ and $X_f \in \mathcal{A}$ with $\mu(X_f) > 0$ such that, for each $x \in X_f$, $\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^n f \circ \tau^{p_k}(x)$ does not exist), then there exists a (universal) $\mathbf{P}$-null set $N^* \in \mathcal{B}$ such that for each $\omega \in \Omega \setminus N^*$, the sequence $\{p_k + \theta_k(\omega), k \ge 1\}$ is also bad for the pointwise ergodic theorem. This was observed in [Schneider: 1997], where a weaker form of Corollary 8.6.11 was proved using Gaussian techniques.

3) Several other papers dealing with this subject and with suprema of random polynomials appeared after [Boukhari–Weber: 2002], and we may cite the works of Cohen [2004–2006] in collaboration with Cuny, Jones and Lin. These papers explore a larger setting (multidimensional cases, with valuable and quite interesting extensions to $L^p$-contractions with $1 < p \le 2$) but do not improve significantly upon the results presented in this section. For other sources using the metric entropy method, we also refer to [Gamet–Weber: 2000]. For improvements based on the majorizing measure method, see Section 9.6. It seems clear that these two methods, when combined with spectral theory and ergodic theory, are the most appropriate for tackling these questions.

Problem 9. Only sufficient conditions are given. Find necessary conditions for these "universal" convergence properties.

8.7 An application to random perturbation of intersective sets

Given a set $S \subset \mathbb{Z}$ and a sequence $I = \{I_n, n \ge 1\}$ of intervals of increasing length contained in $\mathbb{Z}$, let

$$
b(S, I) = \limsup_{|I_n|\to\infty} \frac{|S \cap I_n|}{|I_n|}, \qquad b(S) = \sup_I b(S, I),
$$

where the supremum is taken over all collections $I$ of intervals. Here and henceforth, for a finite set $B$ we will use $|B|$ to denote its cardinality. We call $b(S)$ the Banach density of $S$. If the limit

$$
d(S) := \lim_{N\to\infty} \frac{|S \cap [1, N]|}{N}
$$

exists, this is by definition the density of $S$. Suppose $(X, \mathcal{B}, \mu, T)$ is a measurable dynamical system. Recall that a sequence of natural numbers $k = \{k_n, n \ge 1\}$ is 2-nice if, given any dynamical system $(X, \mathcal{B}, \mu, T)$ and any $f \in L^2(\mu)$,

$$
\lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^N f(T^{k_n} x) = \mathbf{E}_T(f)(x),
$$

$\mu$-almost everywhere. Here $\mathbf{E}_T(f)$ denotes as usual the conditional expectation of $f$ with respect to the $\sigma$-algebra $\mathcal{B}(T)$ of $T$-invariant measurable subsets of $X$.
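The difference between the Banach density $b(S)$ and the density $d(S)$ defined at the beginning of this section can be seen on finite truncations. The sketch below is only a finite proxy, since both quantities are defined through limits, and the example set (blocks of length $j$ placed at $2^j$) is a hypothetical choice made for illustration.

```python
def window_density(S, lo, hi):
    """|S ∩ [lo, hi]| / |[lo, hi]| for a finite set S of integers."""
    return sum(1 for n in S if lo <= n <= hi) / (hi - lo + 1)


def max_window_density(S, length, lo, hi):
    """Largest density of S over all integer windows of the given length inside [lo, hi]."""
    members = set(S)
    count = sum(1 for n in range(lo, lo + length) if n in members)
    best = count
    for start in range(lo + 1, hi - length + 2):
        # Slide the window by one: add the new right end, drop the old left end.
        count += (start + length - 1 in members) - (start - 1 in members)
        best = max(best, count)
    return best / length


# Hypothetical example: S contains the block {2^j, ..., 2^j + j - 1} for j = 1, ..., 12.
S = {2**j + r for j in range(1, 13) for r in range(j)}

print(window_density(S, 1, 4096))           # proxy for d(S): small
print(max_window_density(S, 10, 1, 5000))   # proxy for b(S): some window of length 10 is full
```

Arbitrarily long full blocks force $b(S) = 1$ for the infinite version of this set, while the blocks are too sparse for $d(S)$ to be positive.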


Say that a sequence of natural numbers $k = \{k_n, n \ge 1\}$ is multiply intersective if, given any subset $E$ of the natural numbers of positive Banach density, there exists another subset $R$ of $\mathbb{Z}$ with $d(R)$ existing and not less than $b(E)$, such that for each finite subset $\{n_1, \dots, n_r\}$ of $R$ we have $b(E \cap (E + k_{n_1}) \cap \cdots \cap (E + k_{n_r})) > 0$. We say that $k$ is intersective if, given any subset $E$ of $\mathbb{Z}$ of positive Banach density, there exists $k$ in $k$ such that $E \cap (E + k) \ne \emptyset$.

Let us first comment on results related to that property. The interest in intersective sets dates from the 1970s, and immediately postdates Furstenberg's famous ergodic theoretic proof of Szemerédi's theorem (see Furstenberg [1981]). A number of authors (Furstenberg [1981], Kamae and Mendes-France [1978], Sárközy [1978]) showed by strikingly diverse arithmetic and analytic means that special arithmetic sequences like the squares $\{k_r = r^2, r \ge 1\}$ are intersective. In Bertrand-Mathis [1986] it is shown that a sequence of integers is intersective if and only if it has the Poincaré recurrence property. The relation of the intersectivity property of a sequence to other properties of an integer sequence is explored in Bourgain [1987]. The natural numbers are shown to be multiply intersective in Ruzsa [1978]. New families of multiply intersective sequences are given in Nair [1998] and Nair–Zaris [2001].

Suppose $\theta = \{\theta_n, n \ge 1\}$ denotes a sequence of $\mathbb{N}$-valued independent, identically distributed random variables with basic probability space $(\Omega, \mathcal{A}, \mathbf{P})$, with a $\mathbf{P}$-complete $\sigma$-field $\mathcal{A}$. We assume $k$ is 2-nice and that there exist $0 < \alpha < 1$ and $B > 1/\alpha$, such that

$$
k_n = O(e^{n^\alpha}), \qquad \mathbf{E}\, \log_+^B |\theta_1| < \infty.
\tag{$E(\alpha, B)$}
$$

Then we say that $(k, \theta)$ is a good pair. We will establish the following theorem.

8.7.1 Theorem. Suppose that $(k, \theta)$ is a good pair. Then for $\mathbf{P}$-almost all $(\theta_i)$, given any set $E$ contained in the natural numbers with $b(E) > 0$, there exists a set $R$ contained in the natural numbers with density $d(R)$ existing and $d(R) \ge b(E)$, such that for any finite set $\{n_1, \dots, n_r\}$ contained in $R$,

$$
b\big(E \cap (E + k_{n_1} + \theta_{n_1}) \cap \cdots \cap (E + k_{n_r} + \theta_{n_r})\big) > 0.
$$

Before giving the proof, we proceed with a series of lemmas. Consider a sequence $\theta, \theta_1, \theta_2, \dots$ of $\mathbb{Z}$-valued, independent random variables defined on a probability space $(\Omega, \mathcal{B}, \mathbf{P})$, and satisfying $\mathbf{P}\{k_i + \theta_i \ge 0\} = 1$, $i = 1, 2, \dots$. Introduce again the sequence of random polynomials

$$
U_N(t) = \sum_{n=1}^N \big( e^{2i\pi t (k_n + \theta_n)} - \mathbf{E}\, e^{2i\pi t (k_n + \theta_n)} \big), \qquad N = 1, 2, \dots .
$$
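The centering in the definition of $U_N$ can be made concrete on a toy model. The sketch below evaluates $U_N(t)$ for given outcomes of the perturbations and verifies, by exact enumeration, that its expectation vanishes. The sequence $k_n = n^2$ and the three-point law for the $\theta_n$ are hypothetical choices made for illustration only.

```python
import cmath
import math
from itertools import product


def U(t, ks, thetas, theta_law):
    """U_N(t) = sum_n (e^{2 i pi t (k_n + theta_n)} - E e^{2 i pi t (k_n + theta_n)})."""
    total = 0j
    for k, th in zip(ks, thetas):
        mean = sum(p * cmath.exp(2j * math.pi * t * (k + v)) for v, p in theta_law.items())
        total += cmath.exp(2j * math.pi * t * (k + th)) - mean
    return total


ks = [1, 4, 9, 16]                        # hypothetical k_n = n^2, N = 4
theta_law = {0: 0.5, 1: 0.25, 3: 0.25}    # hypothetical common law of the theta_n


def expected_U(t):
    """Exact E U_N(t), summing over every outcome of (theta_1, ..., theta_N)."""
    acc = 0j
    for outcome in product(theta_law, repeat=len(ks)):
        prob = math.prod(theta_law[v] for v in outcome)
        acc += prob * U(t, ks, outcome, theta_law)
    return acc


print(abs(U(0.37, ks, (0, 1, 3, 0), theta_law)))   # one realization
print(abs(expected_U(0.37)))                        # zero up to rounding
```

Each summand of $U_N$ is a bounded centered random variable, which is the setting in which the uniform estimates of Section 8.5 are applied.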


Assume that the following condition in which : N → N is increasing, is satisfied: #1/2 " log+ (kM + θM ) A(k, θ, ) = E sup < ∞. (8.7.1) (M) M≥1 According to Theorem 8.5.7, |UM (t) − UN (t)| ≤ C · A(k, θ, ), (M − N)1/2 (M) N <M 0≤t≤1

E sup sup

(8.7.2)

where C is a universal constant. The following lemma is related to condition (8.7.1). 8.7.2 Lemma. Assume that θ is an i.i.d. sequence and that condition E(α, B) is satisfied. Then condition (8.7.1) is realized with (t) = t α/2 . Proof. With this choice of , we have for T large, #1/2 " log+ (kM + θM ) 2 α P sup P kM + θM > e4T M > 2T ≤ (M) M≥1 M≥1 2 α ≤ P θM > e T M M≥1

≤

2B αB P logB M + θ1 > T

M≥1

≤

E logB + θ1 M −αB ≤ CT −2B , T 2B M≥1

where C depends on α, B, θ1 only. The result readily follows. 8.7.3 Lemma. Suppose that (k, θ ) is a good pair. Let be the contraction operator defined by ∞ f = E f T θ1 = P{θ1 = n}f T n . n=0

∗

There exist a measurable set of full measure, such that for any ω ∈ , any dynamical system (X, B, μ, T ), and any f ∈ L2 (μ), N 1 f T kn +θn (ω) = E T f (x) = 1. μ x : lim N →∞ N n=1

Proof. It follows from (8.7.2) and Lemma 8.7.2 that there exists a nonnegative P-integrable random variable such that |UM (t) − UN (t)| ≤ , 1/2 M α/2 N <M 0≤t≤1 (M − N) sup sup

(8.7.3)


P-almost surely. By the spectral inequality (Proposition 1.2.2), it follows that if · 2 denotes the standard norm on L2 (μ), kn kn +θn − N 0. Otherwise, P{X ≥ E X} = 0, and thus X ≤ E X a.s. Hence X ∞ = E X. But E ( X ∞ − X) = 0, whence X = X ∞ a.s. This contradicts our assumption, so P{X ≥ E X} > 0. 8.7.5 Lemma. Let (k, θ ) be a good pair. Suppose that (X, B, μ, T ) is a dynamical system, with T invertible, and let B ∈ B with μ(B) > 0. Let Bk denote T −k B for each integer k. Then for almost all θ with respect to P, there exists a subset R = Rk,θ of the natural numbers with d(R) ≥ μ(B) such that for each finite set F contained in R we have * μ Bkn +θn > 0. n∈F

Proof. Let be the universal measurable set of unit mass associated to the pair (k, θ ); is the set { < ∞} where is defined in (8.7.3). Then, for any ω ∈ , any dynamical system (X, B, μ, T ), and any f ∈ L2 (μ), N 1 μ x : lim f T kn +θn (ω) = E T f (x) = 1. N →∞ N n=1

(8.7.7)


We note that ∞ n E f (x)μ(dx) = P{θ1 = n} f T (x)μ(dx) = f dμ. X

X

n=0


(8.7.8)

X

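Throughout the rest of the proof, we fix $\omega \in \Omega^*$, and write more simply $\theta_n$ instead of $\theta_n(\omega)$. Let $B \in \mathcal{B}$ with $\mu(B) > 0$. Let also $\mathcal{P}(\mathbb{N})$ denote the collection of finite subsets of $\mathbb{N}$. For any $F \in \mathcal{P}(\mathbb{N})$, let
$$B_F = \bigcap_{n\in F} B_{k_n+\theta_n},$$
and let $N_F = \{x \in X : \chi_{B_F}(x) > \|\chi_{B_F}\|_\infty\}$. Here we have used χ to denote the indicator function. Now let
$$\mathcal{N} = \bigcup_{F\in\mathcal{P}(\mathbb{N})} N_F, \qquad \widetilde{\mathcal{N}} = \bigcup_{m\in\mathbb{N}} T^m\mathcal{N}.$$
If $f = \chi_A$, then $\|f\|_\infty = 0$ (resp. $\|f\|_\infty = 1$) if $\mu(A) = 0$ (resp. $\mu(A) > 0$). Therefore, $N_F = B_F$ if $\mu(B_F) = 0$, and $N_F = \emptyset$ if $\mu(B_F) > 0$. So the set $\mathcal{N}$ is exactly
$$\bigcup_{F:\,\mu(B_F)=0} N_F = \bigcup_{F:\,\mu(B_F)=0} B_F.$$
This in particular implies that $\mathcal{N}$, and hence $\widetilde{\mathcal{N}}$, is a null set. Put $B' = B \cap \widetilde{\mathcal{N}}^c$. Define for $x \in X$ the return times set $R_x = \{n \in \mathbb{N} : x \in T^{-(k_n+\theta_n)}B'\}$. By (8.7.7),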

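$$\mu\Big\{x : d(R_x) = \lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \chi_{B'_{k_n+\theta_n}}(x) = E^T(\chi_{B'})(x)\Big\} = 1,$$
where $B'_k = T^{-k}B'$. As by (8.7.8),
$$\int_X E^T(\chi_{B'})(x)\,\mu(dx) = \mu(B') = \mu(B),$$
we deduce from Lemma 8.7.4 that
$$\mu\big\{x : E^T(\chi_{B'})(x) \ge \mu(B)\big\} > 0.$$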

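Thus, there exists $x_0$ in X such that if $R = R_{x_0}$, then $d(R) \ge \mu(B)$. We now prove that
$$\mu\Big(\bigcap_{n\in F} B_{k_n+\theta_n}\Big) > 0$$
for each finite set F contained in R. First, observe that $x_0 \in B'_F = \bigcap_{n\in F} B'_{k_n+\theta_n}$, where $B' = B \cap \widetilde{\mathcal{N}}^c$ and $B'_k = T^{-k}B'$. We claim that $x_0 \notin \mathcal{N}$. Indeed, since for $n \in F$
$$T^{k_n+\theta_n}x_0 \in B' = B \cap \widetilde{\mathcal{N}}^c = B \cap \Big(\bigcup_{m\in\mathbb{N}} T^m\mathcal{N}\Big)^c = B \cap \bigcap_{m\in\mathbb{N}} T^m\mathcal{N}^c,$$
we have $T^{k_n+\theta_n}x_0 \in T^m\mathcal{N}^c$ for every m, which with the choice $m = k_n + \theta_n$ implies $x_0 \in \mathcal{N}^c$. But
$$\mathcal{N}^c = \bigcap_{F\in\mathcal{P}(\mathbb{N})} \big\{x \in X : |\chi_{B_F}(x)| \le \|\chi_{B_F}\|_\infty\big\},$$
and $\chi_{B_F}(x_0) = 1$, since $x_0 \in B'_F \subset B_F$. Hence $\|\chi_{B_F}\|_\infty \ge 1$, which ensures that $\mu(B_F) > 0$ as required.

We now give the proof of the theorem. According to our assumption, there exists a sequence of finite intervals $\mathcal{I} = \{I_n, n \ge 1\}$ with strictly increasing lengths such that
$$b(E) = \lim_{n\to\infty} \frac{|E \cap I_n|}{|I_n|}.$$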

Let $\Omega$ denote $\{0,1\}^{\mathbb{Z}}$. Consider the point $\zeta = \{\chi_E(n), -\infty < n < \infty\}$ in $\Omega$ and let T denote the two-sided shift on $\Omega$ defined by
$$T(x_n)_{-\infty}^{\infty} = (x_{n+1})_{-\infty}^{\infty}.$$
Now let X denote the closure of the orbit $\{T^m\zeta : m \in \mathbb{Z}\}$ in the product topology on $\Omega$ and let $X_0$ denote the set $\{x \in X : x_1 = 1\}$. Let $\delta_x$ be the Dirac mass at the point x, and let
$$\mu_N = \frac{1}{|I_N|}\sum_{m\in I_N} \delta_{T^m\zeta}.$$
By a known argument (Furstenberg [1981: 73]), there exists a probability measure μ supported on X and preserved by T which is a weak-star limit of the sequence of measures $\{\mu_N, N \ge 1\}$. In addition, passing to a subsequence $\{I_{N_s}, s \ge 1\}$ if necessary, for every continuous function f on $\Omega$,
$$\int f\,d\mu = \lim_{s\to\infty}\int f\,d\mu_{N_s}.$$
This means that
$$\mu(X_0) = \lim_{s\to\infty}\mu_{N_s}(X_0) = \lim_{s\to\infty}\frac{1}{|I_{N_s}|}\sum_{m\in I_{N_s}} \delta_{T^m\zeta}(X_0) = b(E) > 0.$$
By Lemma 8.7.5 we have
$$\begin{aligned}
\mu\big(X_0 \cap T^{-k_{n_1}-\theta_{n_1}}X_0 \cap \dots \cap T^{-k_{n_r}-\theta_{n_r}}X_0\big)
&= \lim_{s\to\infty}\mu_{N_s}\big(X_0 \cap T^{-k_{n_1}-\theta_{n_1}}X_0 \cap \dots \cap T^{-k_{n_r}-\theta_{n_r}}X_0\big)\\
&= \lim_{s\to\infty}\frac{1}{|I_{N_s}|}\sum_{m\in I_{N_s}} \delta_{T^m\zeta}\big(X_0 \cap T^{-k_{n_1}-\theta_{n_1}}X_0 \cap \dots \cap T^{-k_{n_r}-\theta_{n_r}}X_0\big)\\
&= b\big(E \cap (E + k_{n_1} + \theta_{n_1}) \cap \dots \cap (E + k_{n_r} + \theta_{n_r})\big) > 0
\end{aligned}$$
as required for every finite subset $\{n_1, \dots, n_r\}$ of R, thereby concluding the proof of Theorem 8.7.1.

8.8 An application to the discrepancy of some random sequences

In this section, we give another application by estimating the discrepancy of $\{\{nx\}, n \ge 1\}$ when n is sampled by a random walk. Several examples involving the diophantine approximation properties of x are further considered. The metric entropy method is combined here with the Erdős–Turán inequality. For a real x, let $\|x\|$ denote the distance from x to the nearest integer, namely,
$$\|x\| = \min_{m\in\mathbb{Z}} |x - m| = \min\big(\{x\},\, 1 - \{x\}\big),$$
where $\{x\}$ denotes the fractional part of x. Now, let ψ be a nondecreasing positive function, defined at least on the positive integers. An irrational number y is of type < ψ if $q\|qy\| \ge 1/\psi(q)$ for all positive integers q. If ψ is a constant function, then an irrational number y of type < ψ is also said to be of constant type. Let η be a positive real number (or infinity). The irrational number y is of type η if η is the supremum of all γ for which
$$\liminf_{q\to\infty,\ q\ \mathrm{integer}} q^{\gamma}\|qy\| = 0.$$
It is classical (Dirichlet's theorem) that $\liminf_{q\to\infty} q^{\gamma}\|qy\| = 0$ for any γ < 1 and for any irrational y. Therefore the type of an irrational number is always greater than or equal to 1.
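The quantities in these definitions are easy to explore numerically. The following is a minimal sketch, not part of the original text: the helper names `dist_to_nearest_int` and `min_q_times_norm` are hypothetical, and the golden-ratio conjugate is used as an assumed example of an irrational of constant type, for which $q\|qy\|$ stays bounded away from 0.

```python
import math


def dist_to_nearest_int(x: float) -> float:
    """Return ||x|| = min({x}, 1 - {x}), the distance from x to the nearest integer."""
    frac = x - math.floor(x)
    return min(frac, 1.0 - frac)


def min_q_times_norm(y: float, q_max: int) -> float:
    """Smallest value of q * ||q*y|| over the integers 1 <= q <= q_max."""
    return min(q * dist_to_nearest_int(q * y) for q in range(1, q_max + 1))


# Illustrative choice: the golden-ratio conjugate, a number of constant type.
golden = (math.sqrt(5.0) - 1.0) / 2.0
print(min_q_times_norm(golden, 1000))
```

Floating-point arithmetic limits how large `q_max` can meaningfully be; for a long range of q one would switch to exact or multiprecision arithmetic.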

For a sequence $s = \{s_n, n \ge 1\}$ of real numbers, the discrepancy of s modulo 1 is defined by
$$N D_N(s) = \sup_{I\subset[0,1[}\ \Big|\sum_{\substack{n=1\\ s_n\in I}}^{N} 1 - N|I|\Big|.$$
Recall the Erdős–Turán inequality (for a proof see e.g. Harman [1998: Theorem 5.5]): there exists an absolute constant C such that for any positive integers L and N,
$$N D_N(s) \le \frac{N}{L+1} + C\sum_{h=1}^{L}\frac{1}{h}\Big|\sum_{n=1}^{N} e^{2i\pi h s_n}\Big|. \tag{8.8.1}$$
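To make the notion of discrepancy concrete, here is a small sketch that computes the star discrepancy $D_N^*$ (supremum over intervals $[0,t[$ only) of a finite point set. This is a swapped-in, simpler quantity than the $D_N$ defined above; the two are comparable, since $D_N^* \le D_N \le 2D_N^*$. The function names are hypothetical.

```python
def star_discrepancy(points):
    """Star discrepancy sup_t |#{n : x_n < t}/N - t| of points lying in [0, 1[.

    Uses the classical closed form on the sorted sample x_(1) <= ... <= x_(N):
    D*_N = max_i max(i/N - x_(i), x_(i) - (i-1)/N).
    """
    xs = sorted(points)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def kronecker_sequence(x, n):
    """First n terms of the sequence {k*x}, k = 1, ..., n."""
    return [(k * x) % 1.0 for k in range(1, n + 1)]


# Illustrative use with x = sqrt(2) - 1 (an assumption chosen for the example).
print(star_discrepancy(kronecker_sequence(2 ** 0.5 - 1, 1000)))
```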

Now let X be a $\mathbb{Z}$-valued random variable, with characteristic function ϕ, and let $\{X_n, n \ge 1\}$ be a sequence of independent copies of X. Put for any positive integer n, $S_n = X_1 + \dots + X_n$, $S_0 = 0$. Fix some $x \in [0,1[$, and consider the sequence $\mathbf{x} = \{\{S_n x\}, n \ge 1\}$. Results concerning the uniform distribution modulo 1 of the sequence $\mathbf{x}$ are in Holewijn [1973], Robbins [1973], Schatte [1984, 1988]; a variant of the problem is considered in Kesten [1964]. The main result of the section is the theorem below giving an estimate of the discrepancy of the sequence $\mathbf{x}$. The diophantine approximation properties of x are naturally involved there. Before stating it, we introduce an extra function. Let $\Phi : \mathbb{R}_+ \to \mathbb{R}_+$ be nondecreasing and such that for any $m \in \mathbb{N}$,
$$\sum_{h=1}^{m}\frac{1}{h\,|\varphi(hx) - 1|} \le \Phi(m).$$

8.8.1 Theorem. Let $L : \mathbb{R}_+ \to \mathbb{R}_+$ be nondecreasing and such that $\Phi\circ L$ is concave. For any τ > 3/2,
$$D_N(\mathbf{x}) \overset{\mathrm{a.s.}}{=} O\bigg(\frac{1}{L(N)} + \Big[\frac{\Phi(L(N))\,\log L(N)}{N}\Big]^{1/2}\log^{\tau} N\bigg). \tag{8.8.2}$$
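The setting of the theorem can be simulated directly. The sketch below is only illustrative: it assumes fair Bernoulli steps in {0, 1} for the random walk, uses Python's pseudo-random generator with a fixed seed, and measures the star discrepancy (intervals $[0,t[$ only) rather than $D_N$ itself. All function names are hypothetical.

```python
import random


def star_discrepancy(points):
    """Star discrepancy of points in [0, 1[, from the sorted-sample formula."""
    xs = sorted(points)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def random_walk_sample(x, n_steps, seed=0):
    """Return [{S_1 x}, ..., {S_n x}] where S_n is a walk with Bernoulli(1/2) steps."""
    rng = random.Random(seed)
    s = 0
    sample = []
    for _ in range(n_steps):
        s += rng.randint(0, 1)  # assumed step law: 0 or 1 with probability 1/2
        sample.append((s * x) % 1.0)
    return sample


x = 2 ** 0.5 - 1  # an irrational with bounded partial quotients
for n in (10 ** 2, 10 ** 3, 10 ** 4):
    print(n, star_discrepancy(random_walk_sample(x, n)))
```

A single run only illustrates the decay of the discrepancy with N; it does not by itself confirm any particular rate.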

8.8.2 Remarks. 1. Theorem 8.8.1 completes some results of Schatte [1988], in which only the non-lattice case is treated. Schatte considered sums $Z_n = Y_1 + \dots + Y_n \pmod 1$, where $Y_1, Y_2, \dots$ are independent copies of a random variable with values in $[0,1[$. Let $\zeta = \{Z_n, n \ge 1\}$. Under the condition
$$\sup_{0\le x\le 1}\big|P(Z_n < x) - x\big| = O(n^{-3/2}), \tag{8.8.3}$$
it is proved in Schatte [1988: Theorems 1 and 2] that
$$D_N(\zeta) \overset{\mathrm{a.s.}}{=} O\big(N^{-1/2}\log^2 N\big). \tag{8.8.4}$$
And if
$$\sup_{0\le x\le 1}\big|P(Z_n < x) - x\big| = O(n^{-4}), \tag{8.8.5}$$
a law of the iterated logarithm holds: let $X_0$ be a random variable with values in $[0,1[$, which is independent of the sequence $Y_1, Y_2, \dots$. Then
$$\limsup_{n\to\infty}\frac{1}{\sqrt{n\log\log n}}\Big(\sum_{j=1}^{n}\mathbf{1}_{[0,u[}(X_0 + Z_j) - nu\Big) \overset{\mathrm{a.s.}}{=} \sigma(u), \tag{8.8.6}$$
where
$$\sigma^2(u) = u - u^2 + 2\sum_{j=1}^{\infty}\Big(\mathbf{E}\,\mathbf{1}_{[0,u[}(U)\,\mathbf{1}_{[0,u[}(U + Z_j) - u^2\Big) < \infty \tag{8.8.7}$$

and U is a uniformly distributed random variable, which is independent of the sequence $Y_1, Y_2, \dots$. Schatte's conditions are always satisfied when the distribution function of $X_1$ possesses an absolutely continuous component; they are equally satisfied for some singular and discrete distribution functions, but not for lattice distributions. These results also remain valid when $Y_1, Y_2, \dots$ are independent copies of a random variable with values in $[0,y[$, y being an arbitrary positive real, and $Z_n = Y_1 + \dots + Y_n \pmod y$. A general remark can however be made on this point. By putting $S_n = \sum_{j=1}^{n}[X_j] + \sum_{j=1}^{n}\{X_j\}$, we have that $\{S_n y\} = \{Z_n + \sum_{j=1}^{n}[X_j]\,y\}$. By considering characteristic functions, we easily see that there is no reason in general for the discrepancies of $(\{S_n y\})_n$ and $(Z_n)_n$ to be comparable. In the following examples, we deduce from Theorem 8.8.1 discrepancy results for lattice random variables, depending on the diophantine approximation properties of x.

2. Assume that X is $\mathbb{Z}$-valued and that there exists a constant $0 < C < \infty$ such that for any $t \in [-1/2, 1/2[$, $|1 - \varphi(t)| \ge C|t|$. Since $\varphi(hx)$ depends only on hx modulo 1, we therefore have
$$\sum_{h=1}^{m}\frac{1}{h\,|1 - \varphi(hx)|} = O\Big(\sum_{h=1}^{m}\frac{1}{h\,\|hx\|}\Big).$$
If x is irrational of type < B, then (Kuipers–Niederreiter [1971: Lemma 3.3]) for any ε > 0
$$\sum_{h=1}^{m}\frac{1}{h\,\|hx\|} = O\big(m^{B-1+\varepsilon}\big). \tag{8.8.8}$$
Let x be irrational of type < B and take, for the function Φ bounding $\sum_{h\le m} 1/(h|\varphi(hx)-1|)$ in Theorem 8.8.1, $\Phi(m) = m^{B-1+\varepsilon}$. We choose $L(m) := m^{\alpha/(B-1+\varepsilon)}$ for some 0 < α < 1. Then $\Phi\circ L(m) = m^{\alpha}$. We deduce from Theorem 8.8.1, for any σ > 2,
$$D_N(\mathbf{x}) \overset{\mathrm{a.s.}}{=} O\big(N^{-\alpha/(B-1+\varepsilon)} + N^{(\alpha-1)/2}\log^{\sigma} N\big).$$
And, by taking α = (B − 1)/(B + 1), for any σ > 2,
$$D_N(\mathbf{x}) \overset{\mathrm{a.s.}}{=} O\big(N^{-1/(1+B)}\log^{\sigma} N\big). \tag{8.8.9}$$

3. If X is a Bernoulli sequence and x is of type < B, then (8.8.9) is fulfilled. This is to be compared with the well-known fact that the discrepancy $D_N$ of the sequence $\{nx\}$ satisfies $D_N = O(N^{-1/B+\varepsilon})$, for any ε > 0. There is a moderate loss of precision in that limit case.

4. Assume that:

a) X is $\mathbb{Z}$-valued with $\mathbf{E}X = 0$, and $\mathbf{E}X^2\log_+^{a}|X| < \infty$, for some a > 0.

b) x is irrational, and the partial quotients of the continued fraction expansion of x are bounded by a fixed number M.

According to Theorem 9.3.4 in Kawata [1972],
$$1 - \Re\varphi(t) = \tfrac12(\mathbf{E}X^2)\,t^2 + O\Big(\frac{t^2}{|\log|t||^{a}}\Big) \quad\text{as } t \to 0.$$
And thus,
$$\sum_{h=1}^{m}\frac{1}{h\,|1 - \varphi(hx)|} = O\Big(\sum_{h=1}^{m}\frac{1}{h\,\|hx\|^{2}}\Big).$$
In view of Haber–Osgood [1969: 385], for any t > 1,
$$C_1 m^{t} \le \sum_{h=1}^{m}\frac{1}{\|hx\|^{t}} \le C_2 m^{t},$$
where the constants $C_1, C_2$ depend on M and t only. Then
$$\sum_{h=1}^{m}\frac{1}{h\,\|hx\|^{2}} \le C_3 m^{2},$$
where $C_3$ depends on M and X only. We can thus choose $\Phi(m) = m^2$, and $L(m) = m^{\alpha/2}$, 0 < α < 1. Applying (8.8.2) gives, for any σ > 2,
$$D_N(\mathbf{x}) \overset{\mathrm{a.s.}}{=} O\big(N^{-\alpha/2} + N^{(\alpha-1)/2}\log^{\sigma} N\big).$$
Taking α = 1/2, we obtain: under conditions a) and b), for any σ > 2,
$$D_N(\mathbf{x}) \overset{\mathrm{a.s.}}{=} O\big(N^{-1/4}\log^{\sigma} N\big). \tag{8.8.10}$$
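The choices of α in Remarks 2 and 4 balance the two power terms of the bound. The short sketch below checks this with exact rational arithmetic, under simplifying assumptions: ε is set to 0 and logarithmic factors are ignored. The function name `exponents` is hypothetical.

```python
from fractions import Fraction


def exponents(alpha, growth):
    """Exponents of N in the two terms of the bound.

    With a bounding function m**growth and L(m) = m**(alpha/growth), the terms
    are N**(-alpha/growth) and N**((alpha - 1)/2), logarithms being ignored.
    """
    return (-alpha / growth, (alpha - 1) / 2)


# Remark 2 with B = 2 and epsilon = 0: alpha = (B - 1)/(B + 1).
B = Fraction(2)
alpha = (B - 1) / (B + 1)
print(exponents(alpha, B - 1))

# Remark 4: growth exponent 2 and alpha = 1/2.
print(exponents(Fraction(1, 2), Fraction(2)))
```

In both cases the two exponents coincide, which is why these values of α are chosen.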

The proof of Theorem 8.8.1 follows from a series of lemmas. Put for any integers N ≥ 1, m ≥ 0,
$$\Delta_N(m) = \sum_{n=1}^{N} e^{2i\pi m S_n x}. \tag{8.8.11}$$

8.8.3 Lemma. For any two integers N ≥ P ≥ 1, one has the following estimate:
$$\mathbf{E}\,\big|\Delta_N(m) - \Delta_P(m)\big|^2 \le \frac{7(N-P)}{|\varphi(mx) - 1|} \wedge (N-P)^2. \tag{8.8.12}$$

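Proof. An elementary computation shows for integers N > P ≥ 1 that
$$\mathbf{E}\,\big|\Delta_N(m) - \Delta_P(m)\big|^2 = (N-P) + (N-P-1)\big[\varphi(mx) + \varphi(-mx)\big] + (N-P-2)\big[\varphi(mx)^2 + \varphi(-mx)^2\big] + \dots + \big[\varphi(mx)^{N-P-1} + \varphi(-mx)^{N-P-1}\big].$$
But
$$\varphi(mx)^k + \varphi(-mx)^k = \mathbf{E}\big(e^{2i\pi mxS_k} + e^{-2i\pi mxS_k}\big) = 2\,\mathbf{E}\cos(2\pi mxS_k) = 2\Re\big(\mathbf{E}\,e^{2i\pi mxS_k}\big) = 2\Re\big(\varphi(mx)^k\big),$$
for any k ≥ 1. And thus,
$$\mathbf{E}\,\big|\Delta_N(m) - \Delta_P(m)\big|^2 = (N-P) + 2\Re\Big\{(N-P-1)\varphi(mx) + (N-P-2)\varphi(mx)^2 + \dots + \varphi(mx)^{N-P-1}\Big\}.$$
For any $z \in \mathbb{C}$, z ≠ 1, and $Q \in \mathbb{N}$,
$$\sum_{d=1}^{Q-1}(Q-d)\,z^{d} = -\frac{Qz}{z-1} + \frac{z\,(z^{Q}-1)}{(z-1)^{2}}.$$
Therefore, writing $\varphi = \varphi(mx)$ and $Q = N - P$,
$$\mathbf{E}\,\big|\Delta_N(m) - \Delta_P(m)\big|^2 = Q + 2\Re\Big\{-\frac{Q\varphi}{\varphi-1} + \frac{\varphi\,(\varphi^{Q}-1)}{(\varphi-1)^{2}}\Big\} \le \frac{7(N-P)}{|\varphi(mx) - 1|} \wedge (N-P)^2,$$
where we used $|\varphi| \le 1$, $|\varphi - 1| \le 2$ and $|\varphi^{Q} - 1| \le Q|\varphi - 1|$ for the first bound, and the trivial estimate $|\Delta_N(m) - \Delta_P(m)| \le N - P$ for the second.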

This proves the lemma. Put now for any positive integer n,
$$U_n = \sum_{h=1}^{L(n)}\frac{1}{h}\,\big|\Delta_n(h)\big|. \tag{8.8.13}$$
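As a sanity check on the second-moment computation behind Lemma 8.8.3, the sketch below compares an exact enumeration over all paths with a closed form obtained from the geometric-type sum $\sum_{d=1}^{Q-1}(Q-d)z^d = -Qz/(z-1) + z(z^Q-1)/(z-1)^2$. It is an illustration only: it assumes fair Bernoulli steps in {0, 1}, writes `theta` for $2\pi m x$, and uses hypothetical function names.

```python
import cmath
import itertools
import math


def second_moment_by_enumeration(theta, p_idx, n_idx):
    """E|sum_{n=P+1}^{N} exp(i*theta*S_n)|^2, averaging over all 2^N step sequences."""
    total = 0.0
    for steps in itertools.product((0, 1), repeat=n_idx):
        s = 0
        acc = 0j
        for n, step in enumerate(steps, start=1):
            s += step
            if n > p_idx:
                acc += cmath.exp(1j * theta * s)
        total += abs(acc) ** 2
    return total / 2 ** n_idx


def second_moment_closed_form(theta, p_idx, n_idx):
    """Same quantity via Q + 2*Re(-Q*phi/(phi-1) + phi*(phi^Q - 1)/(phi-1)^2)."""
    q = n_idx - p_idx
    phi = (1 + cmath.exp(1j * theta)) / 2  # E exp(i*theta*X) for a fair {0,1} step
    inner = -q * phi / (phi - 1) + phi * (phi ** q - 1) / (phi - 1) ** 2
    return q + 2 * inner.real


theta = 2 * math.pi * 3 * (math.sqrt(2) - 1)  # m = 3, x = sqrt(2) - 1
print(second_moment_by_enumeration(theta, 2, 8))
print(second_moment_closed_form(theta, 2, 8))
```

The enumeration is exponential in N and is only meant for very small cases.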

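8.8.4 Lemma. For any two integers n > l ≥ 1,
$$\mathbf{E}\,\big|U_n - U_l\big|^2 \le 14\bigg\{(n-l)\big(1 + \log L(l)\big)\sum_{h=1}^{L(l)}\frac{1}{h\,|\varphi(hx) - 1|} + n\,\log\frac{L(n)}{L(l)}\sum_{h=L(l)+1}^{L(n)}\frac{1}{h\,|\varphi(hx) - 1|}\bigg\}. \tag{8.8.14}$$

Proof. Plainly
$$U_l - U_n = \sum_{h=1}^{L(l)}\frac{1}{h}\Big(|\Delta_l(h)| - |\Delta_n(h)|\Big) - \sum_{h=L(l)+1}^{L(n)}\frac{1}{h}\,|\Delta_n(h)| := A - B.$$
By the Cauchy–Schwarz inequality, and by Lemma 8.8.3,
$$\mathbf{E}A^2 \le \Big(\sum_{h=1}^{L(l)}\frac{1}{h}\Big)\Big(\sum_{h=1}^{L(l)}\frac{1}{h}\,\mathbf{E}\big|\Delta_l(h) - \Delta_n(h)\big|^2\Big) \le 7(n-l)\Big(\sum_{h=1}^{L(l)}\frac{1}{h}\Big)\Big(\sum_{h=1}^{L(l)}\frac{1}{h\,|\varphi(hx) - 1|}\Big),$$
$$\mathbf{E}B^2 \le \Big(\sum_{h=L(l)+1}^{L(n)}\frac{1}{h}\Big)\Big(\sum_{h=L(l)+1}^{L(n)}\frac{1}{h}\,\mathbf{E}\big|\Delta_n(h)\big|^2\Big) \le 7n\Big(\sum_{h=L(l)+1}^{L(n)}\frac{1}{h}\Big)\Big(\sum_{h=L(l)+1}^{L(n)}\frac{1}{h\,|\varphi(hx) - 1|}\Big).$$
Lemma 8.8.4 thus follows.

Put $\widetilde{\Phi} := \Phi\circ L$; then for any n > l ≥ 1,
$$\|U_n - U_l\|_2^2 \le 14\Big\{(n-l)\,\widetilde{\Phi}(l)\big(1 + \log L(l)\big) + n\big(\widetilde{\Phi}(n) - \widetilde{\Phi}(l)\big)\log\frac{L(n)}{L(l)}\Big\} \le 14(n-l)\,\widetilde{\Phi}(n)\log eL(n),$$
since by the concavity assumption on $\widetilde{\Phi}$,
$$\frac{\widetilde{\Phi}(n) - \widetilde{\Phi}(l)}{n-l} \le \frac{\widetilde{\Phi}(n)}{n}.$$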

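8.8.5 Proposition. For any τ > 3/2,
$$U_n \overset{\mathrm{a.s.}}{=} O\Big(\big[\widetilde{\Phi}(n)\,n\log L(n)\big]^{1/2}\log^{\tau} n\Big), \tag{8.8.15}$$
where $\widetilde{\Phi} = \Phi\circ L$.

Proof. By the remark made above, for any n > l ≥ 1,
$$\|U_n - U_l\|_2^2 \le 14(n-l)\,\widetilde{\Phi}(n)\log eL(n), \qquad \|U_n\|_2^2 \le 14\,n\,\widetilde{\Phi}(n)\log eL(n).$$
Let a > 1/2. By Tchebycheff's inequality,
$$P\Big\{|U_{2^p}| > \big[\widetilde{\Phi}(2^p)\,2^p\log eL(2^p)\big]^{1/2}p^{a}\Big\} \le Cp^{-2a},$$
and by the first form of the Borel–Cantelli lemma,
$$|U_{2^p}| \overset{\mathrm{a.s.}}{=} O\Big(\big[\widetilde{\Phi}(2^p)\,2^p\log L(2^p)\big]^{1/2}p^{a}\Big).$$
Now, examine the oscillation of $U_n$ over the interval $[2^p, 2^{p+1}[$. Put $\widetilde{U}_n = U_n/\big[\widetilde{\Phi}(2^p)\,2^p\log eL(2^p)\big]^{1/2}$. Then $\mathbf{E}\,|\widetilde{U}_n - \widetilde{U}_l|^2 \le C\,\frac{n-l}{2^p}$. Applying Lemma 8.3.3 gives
$$\Big\|\sup_{2^p\le n,m< 2^{p+1}}\big|\widetilde{U}_n - \widetilde{U}_m\big|\Big\|_2 \le Cp.$$
… By the Tchebycheff inequality, … for any τ > 3/2,
$$N D_N(\mathbf{x}) \overset{\mathrm{a.s.}}{=} O\Big(N/L(N) + \big[\widetilde{\Phi}(N)\,N\log L(N)\big]^{1/2}\log^{\tau} N\Big), \tag{8.8.17}$$
which proves our claim.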

8.9 An application to random Dirichlet polynomials

We close this chapter by giving an application of the metric entropy method to the study of the supremum of some classes of random Dirichlet polynomials. We begin with some general considerations. Let $\{d_n, n \ge 1\}$ be a sequence of real numbers. Let $s = \sigma + it$ denote a complex number. The supremum of the Dirichlet polynomials $P(s) = \sum_{n=2}^{N} d_n n^{-s}$ over lines $\{s = \sigma + it,\ t \in \mathbb{R}\}$ is naturally related to that of the corresponding Dirichlet series, via the abscissa of uniform convergence
$$\sigma_u = \inf\Big\{\sigma : \sum_{n=2}^{\infty} d_n n^{-\sigma-it} \text{ converges uniformly over } t \in \mathbb{R}\Big\},$$
through the relation
$$\sigma_u = \limsup_{N\to\infty}\frac{\log\sup_{t\in\mathbb{R}}\big|\sum_{n=2}^{N} d_n n^{-it}\big|}{\log N}.$$
We refer to Bohr [1952], Helson [1967] or Hardy and Riesz [1915] for this background and related results. This naturally justifies the investigation of the supremum of Dirichlet polynomials. Studies of random Dirichlet polynomials and random Dirichlet series were developed notably in Halász [1983] and Queffélec [1980], [1983], [1995]; see also Lifshits–Weber [2007], [2009a] and references therein. Such investigations concerning random Dirichlet series and random power series go back to earlier works of Hartman [1939], Clarke [1969], Dvoretzky–Erdős [1955], [1959]. Let us indicate some useful general results. For instance let $\xi = \{\xi, \xi_n, n \ge 1\}$ be a sequence of i.i.d. random variables and let $\sigma_c$ and $\sigma_a$ be, respectively, the almost sure abscissa of convergence and of absolute convergence of the Dirichlet series $\sum_{n=1}^{\infty}\xi_n n^{-s}$. If $\xi \ne 0$ holds with positive probability, let $k_\xi := \sup\{\gamma : \mathbf{E}|\xi|^{\gamma} < \infty\}$. The connection between the abscissas $\sigma_c$ and $\sigma_a$ and the integrability of ξ has been clarified in [Clarke: 1969]. We have the implications:
$$\begin{array}{lcl}
k_\xi = 0 &\Longrightarrow& \sigma_a = \sigma_c = \infty\\
0 < k_\xi \le 1 &\Longrightarrow& \sigma_a = \sigma_c = 1/k_\xi\\
(k_\xi > 1 \text{ and } \mathbf{E}\xi \ne 0) &\Longrightarrow& \sigma_a = \sigma_c = 1\\
(k_\xi > 1 \text{ and } \mathbf{E}\xi = 0) &\Longrightarrow& \sigma_a = 1 \text{ and } \sigma_c = \max(1/k_\xi, 1/2).
\end{array} \tag{8.9.1}$$

Now let $\varepsilon = \{\varepsilon_i, i \ge 1\}$ be a sequence of independent Rademacher random variables ($P\{\varepsilon_i = \pm 1\} = 1/2$) defined on a basic probability space $(\Omega, \mathcal{A}, P)$. The following result is due to Bayart, Konyagin and Queffélec [2003/2004]. Let $\{a_n, n \ge 1\}$ be a sequence of complex numbers, then:

– If $\limsup_{N\to\infty}\frac{1}{\log\log N}\sum_{n=0}^{N}|a_n|^2 = \gamma > 0$, then for almost all choices of signs $\varepsilon_n = \pm 1$, the series $\sum_{n=0}^{\infty}\varepsilon_n a_n n^{it}$ diverges for each $t \in \mathbb{R}$.

– The result is nearly optimal: if $0 < \delta_n \to 0$, there exists a sequence $\{a_n, n \ge 1\}$ such that $\limsup_{N\to\infty}\delta_N\frac{1}{\log\log N}\sum_{n=0}^{N}|a_n|^2 > 0$, but for each ω, the series $\sum_{n=0}^{\infty}\varepsilon_n(\omega)a_n n^{it}$ converges for at least one $t \in \mathbb{R}$.

In relation with the above, we may quote Hedenmalm and Saksman's extension [2003] of Carleson's result, namely the convergence for almost all t of the Dirichlet series
$$\sum_{n=0}^{\infty}\varepsilon_n(\omega)\,a_n n^{-1/2+it}$$
under the assumption $\sum_{n=0}^{\infty}|a_n|^2 < \infty$. A simple and elegant proof is given in Konyagin and Queffélec [2001/2002, p. 158/159]. The growth of random Dirichlet series was studied in [Yu: 1978/95]. Now consider the random Dirichlet polynomials
$$D(s) = \sum_{n=2}^{N}\varepsilon_n d_n n^{-\sigma-it}. \tag{8.9.2}$$
When $d_n \equiv 1$, some results are known. If σ = 0, then for some absolute constant C, and all integers N ≥ 2,
$$C^{-1}\frac{N}{\log N} \le \mathbf{E}\sup_{t\in\mathbb{R}}\Big|\sum_{n=2}^{N}\varepsilon_n n^{-it}\Big| \le C\,\frac{N}{\log N}. \tag{8.9.3}$$

This has been proved by Halász and was later extended by Queffélec to the range of values 0 ≤ σ < 1/2. Queffélec also provided a probabilistic proof of the original result, using Bernstein's inequality for polynomials. For some constant $C_\sigma$ depending on σ only, and all integers N ≥ 2,
$$C_\sigma^{-1}\frac{N^{1-\sigma}}{\log N} \le \mathbf{E}\sup_{t\in\mathbb{R}}\Big|\sum_{n=2}^{N}\varepsilon_n n^{-\sigma-it}\Big| \le C_\sigma\frac{N^{1-\sigma}}{\log N}. \tag{8.9.4}$$

A basic reduction step is used for establishing these results. We first introduce a useful notion. A set of numbers $\varphi_1, \varphi_2, \dots, \varphi_k$ is linearly independent if no linear relation $a_1\varphi_1 + a_2\varphi_2 + \dots + a_k\varphi_k = 0$, with integral coefficients, not all zero, holds between them. For a proof of the classical result below, we refer to Hardy and Wright [1979: Theorem 442].

Kronecker's theorem. If $\varphi_1, \varphi_2, \dots, \varphi_k, 1$ are linearly independent, $\theta_1, \theta_2, \dots, \theta_k$ are arbitrary, and N, ε are positive, then there are integers $n > N$, $n_1, n_2, \dots, n_k$ such that
$$\max_{1\le m\le k}|n\varphi_m - n_m - \theta_m| < \varepsilon.$$
Consequently, the set of points $(\{n\varphi_1\}, \{n\varphi_2\}, \dots, \{n\varphi_k\})$ is dense in $\mathbb{T}^k$. Let $p_1, p_2, \dots, p_k$ be different primes. By the fundamental theorem of arithmetic, $\log p_1, \log p_2, \dots, \log p_k$ are linearly independent. This will enable us to replace the Dirichlet polynomial by some relevant trigonometric polynomial. We introduce the necessary notation. Let $2 = p_1 < p_2 < \cdots$ be the sequence of consecutive primes. If $n = \prod_{j=1}^{\tau}p_j^{a_j(n)}$, we write $a(n) = \{a_j(n), 1 \le j \le \tau\}$. Let π(N) denote, as usual, the number of prime numbers that are less than or equal to N. Let us fix N, put μ = π(N), and define, for $z = (z_1, \dots, z_\mu) \in \mathbb{T}^{\mu}$,
$$Q(z) = \sum_{n=2}^{N} d_n n^{-\sigma}e^{2i\pi\langle a(n), z\rangle}.$$
H. Bohr's observation states that
$$\sup_{t\in\mathbb{R}}\Big|\sum_{n=2}^{N} d_n n^{-(\sigma+it)}\Big| = \sup_{z\in\mathbb{T}^{\mu}}|Q(z)|. \tag{8.9.5}$$
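The algebra behind Bohr's observation can be checked on a small example: since $\log n = \sum_j a_j(n)\log p_j$, the Dirichlet polynomial at $\sigma + it$ coincides with Q evaluated at the point $z_j = -t\log p_j/(2\pi)$. The sketch below verifies only this identity (not the equality of the suprema, which relies on Kronecker's theorem); the function names and the choice $d_n \equiv 1$, N = 30 are assumptions made for illustration.

```python
import cmath
import math


def primes_up_to(n):
    """Primes <= n by the sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            for multiple in range(p * p, n + 1, p):
                sieve[multiple] = False
    return [p for p in range(2, n + 1) if sieve[p]]


def exponent_vector(n, primes):
    """a(n) = (a_1(n), ..., a_mu(n)) with n = prod_j p_j ** a_j(n)."""
    vec = []
    for p in primes:
        a = 0
        while n % p == 0:
            n //= p
            a += 1
        vec.append(a)
    return vec


def dirichlet_poly(d, sigma, t):
    """sum_{n=2}^{N} d[n] * n**(-sigma - i t), where N = len(d) - 1."""
    return sum(d[n] * n ** (-sigma) * cmath.exp(-1j * t * math.log(n))
               for n in range(2, len(d)))


def bohr_poly(d, sigma, z, primes):
    """Q(z) = sum_{n=2}^{N} d[n] * n**(-sigma) * exp(2 i pi <a(n), z>)."""
    total = 0j
    for n in range(2, len(d)):
        inner = sum(a * zj for a, zj in zip(exponent_vector(n, primes), z))
        total += d[n] * n ** (-sigma) * cmath.exp(2j * math.pi * inner)
    return total


N, sigma, t = 30, 0.25, 1.7
d = [1.0] * (N + 1)
primes = primes_up_to(N)
z = [-t * math.log(p) / (2 * math.pi) for p in primes]
print(abs(dirichlet_poly(d, sigma, t) - bohr_poly(d, sigma, z, primes)))
```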

Remark. Naturally, no similar reduction occurs when considering the supremum over a given bounded interval I. However, when the length of I is of exponential size with respect to the degree of P, precisely when
$$|I| \ge e^{(1+\varepsilon)\omega N(\log N\omega)\log N},$$
the related supremum becomes comparable, for ω large, to the one taken on the real line, with an error term of order $O(\omega^{-1})$. This is in turn a rather general phenomenon due to the existence of "localized" versions of Kronecker's theorem, and in the present case to Turán's estimate (see [Weber: 2008] for a slightly improved form of it using a probabilistic approach, and references therein). When the length is of sub-exponential order, the study however still belongs to the field of application of the general theory of regularity of stochastic processes.

Now consider the following natural extension. For any integer n > 1, let $P^+(n)$ denote the largest prime divisor of n. Let 1 ≤ M < N be two positive integers and define
$$S(N, M) = \{2 \le n \le N : P^+(n) \le M\}.$$
Since $S(N, N) = [2, N]$, these sets naturally generalize the notion of interval of integers. By using the standard notation $\Psi(N, M) := \#(S(N, M))$, $u = (\log N)/\log M$, we have (see Tenenbaum [1990: 405])
$$\Psi^*(N, M) := \frac{\Psi(N, M)}{N} = \rho(u) + O\Big(\frac{1}{\log M}\Big), \tag{8.9.6}$$
uniformly for N ≥ M ≥ 2, where ρ(u) is the Dickman function, namely the unique continuous function on $[0, \infty[$, having a derivative on $]0, \infty[$, and such that
$$\rho(v) = 1,\ 0 \le v \le 1, \qquad v\rho'(v) + \rho(v-1) = 0,\ v > 1.$$
It is known that ρ(u) > 0 for all u > 0. By setting $M = N^{\varepsilon}$ in (8.9.6) we see that $\Psi(N, N^{\varepsilon}) \sim N\rho(\varepsilon^{-1})$ for any fixed 0 < ε ≤ 1. In view of (8.9.6), we shall refer to $\Psi^*$ as a Dickman-type function. Fix some positive integer τ ≤ π(N) and put
$$E_\tau = E_\tau(N) = \{2 \le n \le N : P^+(n) \le p_\tau\}.$$
Note that for τ = μ we have $E_\mu = \{2, \dots, N\}$. The $E_\tau$-based Dirichlet polynomials were considered in [Queffélec: 1995].

8.9.1 Theorem. (a) Upper bound. Let 0 ≤ σ < 1/2. Then there exists a constant $C_\sigma$ such that for any integer N ≥ 2 it is true that
$$\mathbf{E}\sup_{t\in\mathbb{R}}\Big|\sum_{n\in E_\tau}\varepsilon_n n^{-\sigma-it}\Big| \le
\begin{cases}
C_\sigma\dfrac{N^{1/2-\sigma}\tau^{1/2}}{(\log N)^{1/2}} & \text{if } N^{1/2} \le \tau \le N,\\[2ex]
C_\sigma\dfrac{N^{3/4-\sigma}}{(\log N)^{1/2}} & \text{if } \dfrac{N^{1/2}}{\log N} \le \tau \le N^{1/2},\\[2ex]
C_\sigma N^{1/2-\sigma}\tau^{1/2} & \text{if } 1 \le \tau \le \dfrac{N^{1/2}}{\log N}.
\end{cases}$$

(b) Lower bound. Let 0 ≤ σ < 1/2. Then there exists a constant $C_\sigma$ such that for every N ≥ 2,
$$\mathbf{E}\sup_{t\in\mathbb{R}}\Big|\sum_{n\in E_\tau}\varepsilon_n n^{-\sigma-it}\Big| \ge C_\sigma\frac{N^{1/2-\sigma}\tau^{1/2}}{(\log\tau)^{1/2}}\cdot\Psi^*\Big(\frac{N}{p_\tau},\, p_{\tau/2}\Big)^{1/2}.$$
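The sets $S(N, M)$ and the Dickman function entering (8.9.6) can be explored numerically. The following sketch is illustrative only: the function names are hypothetical, and `dickman_rho_small` covers just the range 0 ≤ u ≤ 2, where the defining equation gives ρ(u) = 1 on [0, 1] and ρ(u) = 1 − log u on [1, 2].

```python
import math


def largest_prime_factors(n_max):
    """Return lpf with lpf[n] = P+(n) for 2 <= n <= n_max (lpf[0] = lpf[1] = 0)."""
    lpf = [0] * (n_max + 1)
    for p in range(2, n_max + 1):
        if lpf[p] == 0:  # no smaller prime divides p, so p is prime
            for multiple in range(p, n_max + 1, p):
                lpf[multiple] = p  # larger primes overwrite smaller ones
    return lpf


def smooth_count(n_max, m):
    """Number of integers 2 <= n <= n_max whose largest prime factor is <= m."""
    lpf = largest_prime_factors(n_max)
    return sum(1 for n in range(2, n_max + 1) if lpf[n] <= m)


def dickman_rho_small(u):
    """Dickman function on 0 <= u <= 2 only."""
    if not 0.0 <= u <= 2.0:
        raise ValueError("this sketch only covers 0 <= u <= 2")
    return 1.0 if u <= 1.0 else 1.0 - math.log(u)


N, M = 10 ** 5, 10 ** 3  # u = log N / log M = 5/3
u = math.log(N) / math.log(M)
print(smooth_count(N, M) / N, dickman_rho_small(u))
```

The two printed numbers need not be close for small N, since the error term in (8.9.6) decays only like $1/\log M$.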

Proof of the upper bound in Theorem 8.9.1. The principle of the proof of the upper bound is as follows. Once we have reduced the problem to the study of a random polynomial Q on the multidimensional torus by using (8.9.5), the proof consists of two different steps based on a decomposition $Q = Q_1 + Q_2$. Our study of the supremum of the polynomial $Q_1$ is made by using the metric entropy method. Our investigation of the supremum of the polynomial $Q_2$ is undertaken by using first the contraction principle, reducing the study to that of a complex-valued Gaussian process. The latter task is carried out by means of Slepian's comparison lemma, and by a careful study of the $L^2$-metric induced by this process. Now, we turn to the rigorous proof of the upper bound and introduce some notation. We can represent $E_\tau$ as the union of the disjoint sets
$$E_j = \{2 \le n \le N : P^+(n) = p_j\}, \qquad j = 1, \dots, \tau.$$
For $z \in \mathbb{T}^{\tau}$ we put
$$Q(z) = \sum_{j=1}^{\tau}\sum_{n\in E_j}\varepsilon_n n^{-\sigma}e^{2i\pi\langle a(n), z\rangle}.$$
By (8.9.5) we have
$$\sup_{t\in\mathbb{R}}\Big|\sum_{j=1}^{\tau}\sum_{n\in E_j}\varepsilon_n n^{-\sigma-it}\Big| = \sup_{z\in\mathbb{T}^{\tau}}|Q(z)|.$$
Let 1 ≤ ν < τ be fixed. Write $Q = Q_1 + Q_2$ where
$$Q_1(z) = \sum_{P^+(n)\le p_\nu}\varepsilon_n n^{-\sigma}e^{2i\pi\langle a(n), z\rangle}, \qquad Q_2(z) = \sum_{p_\nu < P^+(n)\le p_\tau}\varepsilon_n n^{-\sigma}e^{2i\pi\langle a(n), z\rangle}.$$


Volumes 1–5 are available from Walter de Gruyter (www.degruyter.de)


Michel Weber

Dynamical Systems and Processes


Author: Michel Weber Institut de Recherche Mathématique Avancée CNRS et Université de Strasbourg 7, rue René Descartes 67084 Strasbourg Cedex France

2000 Mathematics Subject Classification: 37-02, 60-02. Key words: Dynamical systems, measure-preserving transformation, ergodic theorems, spectral theorems, convergence almost everywhere, central limit theorem, stochastic processes, gaussian processes, metric entropy method, majorizing measure method, randomization methods, Riemann sums

978-3-03719-046-3 The Swiss National Library lists this publication in The Swiss Book, the Swiss national bibliography, and the detailed bibliographic data are available on the Internet at http://www.helveticat.ch. This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in other ways, and storage in data banks. For any kind of use permission of the copyright owner must be obtained.

© 2009 European Mathematical Society Contact address: European Mathematical Society Publishing House Seminar for Applied Mathematics ETH-Zentrum FLI C4 CH-8092 Zürich Switzerland Phone: +41 (0)44 632 34 36 Email: [email protected] Homepage: www.ems-ph.org Typeset using the author’s TEX files: I. Zimmermann, Freiburg Printed in Germany 987654321

Preface

The aim of this book is to present in a concise and accessible way, as well as in a common setting, various tools and methods arising from spectral theory, ergodic theory and probability theory, which contribute interactively to the current research on almost everywhere convergence problems. The recent developments in the study of these questions are often obtained by combining either methods of spectral theory with principles of ergodic theory or methods from probability theory with tools and principles from spectral theory and ergodic theory. The spectral criterion of Gaposhkin, and later, following a remarkable metric entropy inequality of Talagrand, the spectral regularization developed in the setting of the study of square functions and oscillation functions in ergodic theory, are typical examples of this fruitful interaction. Another example of thorough interaction is certainly the work of Bourgain and notably his famous entropy criterion, at the basis of which lies the continuity principle of Stein. It was not our aim to write a complete treatise in ergodic theory, assuming such enterprise to be conceivable. The development of this theory during the last twenty years was indeed considerable. A similar remark can be made for the part concerning the study of the regularity of stochastic processes. The work is also not a synthesis of most significant results, complete with sketched proofs and references. We chose the intermediate route to writing a book in the spirit of lectures oriented towards research. The book provides an easy access to many tools, methods and results used in current research, presenting each of them in as wide a setting as possible. The proofs of these results are often given with full details. This book is divided in four parts, which came more or less naturally while writing it. Part I is devoted to spectral results and is followed by Part II, in which tools and results from ergodic theory are presented. 
In the third part, in connection with the description of two main methods, namely the metric entropy method and the majorizing measure method, recent applications to ergodic theory are given via the study of some maximal inequalities of Gál–Koksma type and the $L^p$ norm, 1 ≤ p ≤ ∞, of important classes of polynomials. Finally, in the last part of the book we recollect classical results, as well as recent advances concerning Riemann sums and Khintchin sums, and the value distribution of divisors of Bernoulli or Rademacher sums, used in the study of Riemann sums. In Part I we begin elementarily with the spectral inequality. Chapter 1 concerns von Neumann’s theorem, which forms with Birkhoff’s ergodic theorem the basis of ergodic theory. It seems natural to include in this chapter Talagrand’s metric entropy estimate for the set $\{A_n^T f, n \ge 1\}$ where $A_n^T$ is the average operator $\frac{I + T + \dots + T^{n-1}}{n}$ of a contraction T in a Hilbert space, thus completing naturally the von Neumann theorem. Recently discovered, remarkably efficient, spectral regularization inequalities analysing other structural properties of the set $\{A_n^T f, n \ge 1\}$, followed by Weyl’s


criterion and the van der Corput principle, complete this chapter. Chapter 2 starts with presenting the arguments leading to the representation of a weakly stationary process as Fourier transform of a random measure with orthogonal increments. Next we study Gaposhkin’s spectral criterion. In Part II, we first review in Chapter 3 classical ergodic and mixing properties of measurable dynamical systems. We also study several standard examples. Chapter 4 is devoted to Birkhoff’s pointwise theorem, to dominated ergodic theorems in Lp and to BMO spaces of associated maximal operators. This is continued with a discussion around spectral characterizations of the speed of convergence in Birkhoff’s pointwise theorem. Next we examine oscillation functions of ergodic averages. The transference principle and Wiener–Wintner theorems are discussed. A study of weighted ergodic averages concludes this chapter. In Chapter 5, some basic tools from ergodic theory, the Banach principle, the continuity principle and the conjugacy lemma are studied in detail. Chapter 6 concerns entropy criteria of Bourgain. Several functional inequalities linking the studied sequence of L2 -operators with the canonical Gaussian process on L2 are established, from which the criteria are then easily deduced. Study of the statistic of the ergodic averages naturally leads to investigating the question of the existence of some f ∈ L2 such that the related ergodic averages satisfy a central limit theorem, the invariance principle or the almost sure central limit theorem. Chapter 7 is devoted to this study. A detailed proof of the theorem of Burton–Denker on the existence, in any aperiodic dynamical system, of the central limit theorem is given. The method of proof relies upon Kakutani–Rochlin’s lemma and imitates the analogous result for irrational rotations of the unit circle which is obtained by using Fourier series. 
A fundamental fact in the background of the entire construction is provided by using Rochlin’s result on a factor space of Lebesgue space. The case of irrational rotations involving various remarkably efficient methods is more closely investigated. The existence of L2 elements of the torus satisfying the central limit theorem (CLT) is established for various types of means: nonlinear ergodic means, weighted ergodic means, and ergodic means along the squares. For the latter case, the circle method is used. The chapter concludes with a recent study of a kind of achieved form of the CLT, the convergence in variation implying the convergence of related density distributions in the spaces Lp (R), 1 ≤ p ≤ ∞, in the symptomatic case of lacunary random Fourier series. Two rather general methods are investigated in Part III: the metric entropy method and the majorizing measure method. In Chapter 8, a useful criterion for almost everywhere convergence involving covering numbers is proved, and then used to prove in a unified setting several classical results, such as Stechkin’s theorem, Gál–Koksma theorems and quantitative Borel–Cantelli lemmas. The metric entropy method is next applied to establish quite useful estimates of the supremum of random polynomials, notably random Dirichlet polynomials, and to study almost sure convergence properties of weighted series of contractions and random perturbation of some intersective sets in ergodic theory. Chapter 9 concerns an important tool: the majorizing measure method. A general criterion for almost sure convergence of averages is proved by means of this


method. We continue with recent applications of the majorizing measure method to the study of the supremum of random polynomials, including a strictly stronger form of the well-known Salem–Zygmund estimate. Some remarkable classes of examples are studied. Chapter 10 is a succinct study of Gaussian processes presented in the form of a toolbox. Various fundamental results from the theory are discussed, sometimes with historical comments and proofs. Much importance is given to very handy correlation inequalities. Part IV is devoted to three studies: the study of Riemann sums, the study of convergence properties of the system {f (nk x), k ≥ 1} and a probabilistic approach concerning divisors with applications. Chapters 1 to 6 and partially Chapters 8 to 10 are based on lectures given at the Mathematical Institute of the University of Strasbourg. Chapters 11 to 13 are mainly based on research articles, as well as some parts of Chapters 1, 4, 7, 8, 9. In writing this book, we followed a general principle: where the proofs in our source readings were only sketched, we fill in the gaps in as much detail as possible. Further, we give quasisystematically complete references with page numbers and/or precise numeration of cited results. We always keep in mind the wish to help, as much as we can, the researcher but also the teacher and the graduate student in their work in these beautiful areas of mathematics, trying also to spare their time and to let them share our passion for research at the interfaces of related problems. I would like to thank Mikhail Lifshits for the many discussions and encouragements. I would also like to thank Istvan Berkes for his indefectible enthusiasm and the many exchanges and comments, as well as Ulrich Krengel for stimulating comments. I am much indebted and grateful to Irene Zimmermann for her technical assistance and for numerous observations and remarks. 
I thank Manfred Karbe and the European Mathematical Society Publishing House for accepting this work in their IRMA series, and for efficient help in publishing. I devote this book to my wife Marie-Christine. She always provided a favourable atmosphere for mathematical work.

Contents

Preface

Part I Spectral theorems and convergence in mean

1 The von Neumann theorem and spectral regularization
  1.1 Bochner–Herglotz lemma
  1.2 The spectral inequality
  1.3 The von Neumann theorem
  1.4 The spectral regularization inequality
  1.5 Moving averages
  1.6 Uniform distribution mod a – the Weyl criterion
  1.7 The van der Corput principle

2 Spectral representation of weakly stationary processes
  2.1 Weakly stationary processes
  2.2 Spectral representation of unitary operators
  2.3 Elements of stochastic integration
  2.4 Spectral representation of weakly stationary processes
  2.5 Weakly stationary sequences and orthogonal series
  2.6 Gaposhkin’s spectral criterion

Part II Ergodic Theorems

3 Dynamical systems – ergodicity and mixing
  3.1 Measurable dynamical systems – topological dynamical systems
  3.2 Ergodicity of a dynamical system
  3.3 Weak mixing, strong mixing, continuous spectrum
  3.4 Spectral mixing theorem
  3.5 Other equivalences and other forms of mixing
  3.6 Examples

4 Pointwise ergodic theorems
  4.1 Birkhoff’s pointwise theorem
  4.2 Dominated ergodic theorems
  4.3 Classes L log^m L
  4.4 A converse
  4.5 Speed of convergence
  4.6 Oscillation functions of ergodic averages
  4.7 Wiener–Wintner theorem
  4.8 Weighted ergodic averages
  4.9 Subsequence averages

5 Banach principle and continuity principle
  5.1 Banach principle
  5.2 Continuity principle
  5.3 Applications
  5.4 A principle of domination – conjugacy lemma

6 Maximal operators and Gaussian processes
  6.1 Some liaison theorems
  6.2 Two preliminary lemmas
  6.3 Proof of Theorem 6.1.1
  6.4 Proof of Theorem 6.1.6
  6.5 The case L^p, 1 < p < 2
  6.6 A remarkable GB set property

7 The central limit theorem for dynamical systems
  7.1 Introduction and preliminaries
  7.2 A theorem of Burton and Denker
  7.3 The central limit theorem for orbits
  7.4 A theorem of Volný
  7.5 CLT for rotations
  7.6 Lacunary series and convergence in variation

Part III Methods arising from the theory of stochastic processes

8 The metric entropy method
  8.1 Introduction and general results
  8.2 A theorem of Stechkin
  8.3 An application to the quantitative Borel–Cantelli lemma
  8.4 Application to Gál–Koksma’s theorems
  8.5 An application to the supremum of random polynomials
  8.6 Application to a.s. convergence of weighted series of contractions
  8.7 An application to random perturbation of intersective sets
  8.8 An application to the discrepancy of some random sequences
  8.9 An application to random Dirichlet polynomials

9 The majorizing measure method
  9.1 Introduction – the exponential case
  9.2 A general approach
  9.3 A useful criterion
  9.4 Proof of Theorem 9.3.3
  9.5 Proof of Theorems 9.3.10 and 9.3.11
  9.6 Proof of Theorem 9.3.12 and some examples
  9.7 A stronger form of Salem–Zygmund’s estimate
  9.8 Some examples and discussion
  9.9 Uniform convergence of random Fourier series

10 Gaussian processes
  10.1 Gaussian variables and correlation estimates
  10.2 0-1 laws, integrability and comparison lemmas
  10.3 Regularity and irregularity of Gaussian processes
  10.4 Gaussian suprema
  10.5 Oscillations of Gaussian Stein’s elements
  10.6 Tightness of Gaussian Stein’s elements

Part IV Three studies

11 Riemann sums
  11.1 Introduction
  11.2 The results of Jessen and Rudin
  11.3 Individual theorems of spectral type
  11.4 Breadth and dimension
  11.5 Bourgain’s results
  11.6 Connection with number theory
  11.7 Riemann sums and the randomly sampled trigonometric system
  11.8 Almost sure convergence and square functions of Riemann sums

12 A study of the system (f(nx))
  12.1 Introduction and mean convergence
  12.2 Almost sure convergence – sufficient conditions
  12.3 Almost sure convergence – necessary conditions
  12.4 Random sequences

13 Divisors and random walks
  13.1 Introduction
  13.2 Value distribution and small divisors of Bernoulli sums
  13.3 An LIL for arithmetic functions
  13.4 On the order of magnitude of the divisor functions
  13.5 Value distribution of the divisors of n^2 + 1
  13.6 Value distribution of the divisors of Rademacher sums
  13.7 The functional equation and the Lindelöf Hypothesis
  13.8 An extremal divisor case

Bibliography

Index

Part I Spectral theorems and convergence in mean

Chapter 1

The von Neumann theorem and spectral regularization

Von Neumann’s theorem is, together with Birkhoff’s theorem, one of the fundamental results in ergodic theory. A remarkable spectral regularization inequality is established, from which Talagrand’s entropy estimate is deduced, as well as sharp bounds for the Littlewood–Paley square functions. Other averages, like moving averages, are considered. Some useful lemmas, the Bochner–Herglotz lemma, the spectral lemma and the spectral inequality are first established and completed by some other, sometimes less known results. Two important tools are included at the end of the chapter: Weyl’s equidistribution theorem and the van der Corput principle.

1.1 Bochner–Herglotz lemma

The lemmas studied in this section, as well as in the next one, are classical tools of spectral analysis. The spectral inequality, which is easily derived from the Bochner–Herglotz lemma, allows us to reduce many problems of evaluating norms of vectors to much more tractable questions of harmonic analysis. This tool is often used in ergodic theory. We thus begin by establishing the Bochner–Herglotz lemma.

A function $\gamma\colon \mathbb{R}\to\mathbb{R}$ is nonnegative definite if for any positive integer $n$, and any $u_1,\dots,u_n\in\mathbb{R}$, $a_1,\dots,a_n\in\mathbb{C}$, we have
$$\sum_{1\le i,j\le n} a_i\,\bar a_j\,\gamma(u_i-u_j)\ \ge\ 0.$$

For a continuous function $\gamma\colon \mathbb{R}\to\mathbb{R}$, an equivalent definition of nonnegative definiteness is that for any measurable bounded function $\xi(x)$ vanishing outside some finite interval,
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\gamma(t-s)\,\xi(t)\,\overline{\xi(s)}\,dt\,ds\ \ge\ 0.$$
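The first definition can be tested numerically on small samples. The following is only a sketch under our own assumptions: the helper names `quadratic_form`, `gaussian` and `box` are ours, the Gaussian $e^{-u^2/2}$ is used as a standard example of a continuous nonnegative definite function, and the indicator of $[-1,1]$ serves as a function for which the quadratic form can be negative.

```python
import math
import random

def quadratic_form(gamma, points, coeffs):
    """Return sum_{i,j} a_i * conj(a_j) * gamma(u_i - u_j)."""
    total = 0j
    for ui, ai in zip(points, coeffs):
        for uj, aj in zip(points, coeffs):
            total += ai * aj.conjugate() * gamma(ui - uj)
    return total

def gaussian(u):
    # Characteristic function of the standard normal law.
    return math.exp(-u * u / 2.0)

def box(u):
    # Indicator of [-1, 1]; even and bounded, but not nonnegative definite.
    return 1.0 if abs(u) <= 1.0 else 0.0

random.seed(0)
smallest = float("inf")
for _ in range(200):
    n = random.randint(1, 6)
    pts = [random.uniform(-5.0, 5.0) for _ in range(n)]
    cfs = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
    smallest = min(smallest, quadratic_form(gaussian, pts, cfs).real)

# Points 0, 1, 2 with coefficients 1, -1, 1 give a negative value for the box.
box_value = quadratic_form(box, [0.0, 1.0, 2.0], [1 + 0j, -1 + 0j, 1 + 0j]).real
```

In this experiment the quadratic form of the Gaussian stays nonnegative up to rounding, while the three-point configuration shows that the indicator function fails the definition. A finite random search of this kind can refute nonnegative definiteness but of course cannot prove it.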

A sequence of complex numbers $\{a_k, k\in\mathbb{Z}\}$ is nonnegative definite if $a_{-k}=\bar a_k$ and if the inequality
$$\sum_{1\le i,j\le n}\rho_i\,\bar\rho_j\,a_{i-j}\ \ge\ 0$$
holds for any finite system of complex numbers $\rho_1,\dots,\rho_n$. A function $\gamma\colon\mathbb{Z}\to\mathbb{R}$ is thus nonnegative definite if the sequence $\{\gamma(k), k\in\mathbb{Z}\}$ is nonnegative definite. These notions immediately extend to functions defined on $\mathbb{R}^d$ or $\mathbb{Z}^d$. Let $\mathbb{T}=\mathbb{R}/\mathbb{Z}=[0,1[$ be the circle equipped with the normalized Lebesgue measure $\lambda$, and let $\mathbb{T}^d$ denote the $d$-dimensional torus equipped with the measure $\lambda_d$.

1.1.1 Lemma. a) Let $\gamma\colon\mathbb{R}^d\to\mathbb{R}$ be continuous, nonnegative definite. Then there exists a nonnegative bounded measure $\mu$ on $\mathbb{R}^d$ such that for any $x\in\mathbb{R}^d$,
$$\gamma(x)=\int_{\mathbb{R}^d} e^{i\langle t,x\rangle}\,\mu(dt).$$

b) Let $\gamma\colon\mathbb{Z}^d\to\mathbb{R}$ be nonnegative definite. Then there exists a nonnegative bounded measure $\mu$ on $\mathbb{T}^d$ such that for any $k\in\mathbb{Z}^d$,
$$\gamma(k)=\int_{\mathbb{T}^d} e^{2i\pi\langle k,t\rangle}\,\mu(dt).$$

Proof. We give the proof for $d=1$, the multidimensional case being obtained in a quite identical way. Let $Z$ denote some positive integer. Consider a) first. Put
$$I_k=\int_{[0,Z[^k}\ \sum_{i,j=1}^{k} e^{-i(u_i-u_j)x}\,\gamma(u_i-u_j)\,du_1\dots du_k.$$

By assumption $I_k\ge 0$. Moreover,
$$I_k = k\gamma(0)\int_{[0,Z[^k} du_1\dots du_k + \sum_{\substack{i,j=1\\ i\ne j}}^{k}\int_0^Z\!\!\int_0^Z e^{-i(u_i-u_j)x}\,\gamma(u_i-u_j)\,du_i\,du_j\int_{[0,Z[^{k-2}}\frac{du_1\dots du_k}{du_i\,du_j}$$
$$= k\gamma(0)Z^k + k(k-1)Z^{k-2}\int_0^Z\!\!\int_0^Z e^{-i(u-v)x}\,\gamma(u-v)\,du\,dv.$$

Dividing by $k(k-1)Z^{k-2}$ and then letting $k$ tend to infinity implies
$$\int_0^Z\!\!\int_0^Z e^{-i(u-v)x}\,\gamma(u-v)\,du\,dv\ \ge\ 0.$$

Making the change of variables $u-v=t$ gives
$$\int_0^Z\Big(\int_{-v}^{Z-v} e^{-itx}\gamma(t)\,dt\Big)dv=\int_{-Z}^{Z} e^{-itx}\gamma(t)\,\{\min(Z,Z-t)-\sup(0,-t)\}\,dt=\int_{-Z}^{Z} e^{-itx}\gamma(t)\,(Z-|t|)\,dt\ \ge\ 0.$$

Let
$$\gamma_Z(x)=\gamma(x)\,(1-|x|/Z)\,\mathbf{1}_{[-Z,Z]}(x),\qquad \hat\gamma_Z(x)=\int_{\mathbb{R}} e^{-itx}\,\gamma_Z(t)\,dt.$$


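Then $\hat\gamma_Z(x)\ge 0$, and evidently $\gamma_Z\in L^\infty(\mathbb{R})$. We show that $\hat\gamma_Z\in L^1(\mathbb{R})$. Integrating $\hat\gamma_Z(x)$ over $\mathbb{R}$ with respect to the density $\frac{1}{\sigma\sqrt{2\pi}}e^{-x^2/(2\sigma^2)}$ yields
$$\int_{\mathbb{R}}\hat\gamma_Z(x)\,e^{-\frac{x^2}{2\sigma^2}}\frac{dx}{\sigma\sqrt{2\pi}}=\int_{\mathbb{R}}\gamma_Z(t)\Big(\int_{\mathbb{R}} e^{-itx-\frac{x^2}{2\sigma^2}}\frac{dx}{\sigma\sqrt{2\pi}}\Big)dt=\int_{\mathbb{R}}\gamma_Z(t)\,e^{-\sigma^2t^2/2}\,dt.$$

Hence, since $\|\gamma_Z\|_\infty\le\gamma(0)$,
$$\int_{\mathbb{R}}\hat\gamma_Z(x)\,e^{-\frac{x^2}{2\sigma^2}}\,dx=\sigma\sqrt{2\pi}\int_{\mathbb{R}}\gamma_Z(t)\,e^{-\sigma^2t^2/2}\,dt\ \le\ \sigma\sqrt{2\pi}\,\gamma(0)\int_{\mathbb{R}} e^{-\sigma^2t^2/2}\,dt=2\pi\,\gamma(0).$$

But $\hat\gamma_Z(x)\ge 0$. Letting $\sigma$ tend to infinity increasingly finally shows, in view of Fatou’s lemma, that $\hat\gamma_Z\in L^1(\mathbb{R})$. Now we need the Fourier inversion theorem, stated here with the normalization matching the transform $\hat h(x)=\int_{\mathbb{R}^d}e^{-i\langle t,x\rangle}h(t)\,dt$: let $h,\hat h\in L^1(\mathbb{R}^d)$. Then for almost all $x$,
$$h(x)=\frac{1}{(2\pi)^d}\int_{\mathbb{R}^d} e^{i\langle t,x\rangle}\,\hat h(t)\,dt.$$

Thus, since $\hat\gamma_Z\in L^1(\mathbb{R})$, for almost all $x$ we have $\gamma_Z(x)=\frac{1}{2\pi}\int_{\mathbb{R}} e^{itx}\hat\gamma_Z(t)\,dt$. As $\gamma_Z$ and the mapping $x\mapsto\int_{\mathbb{R}} e^{itx}\hat\gamma_Z(t)\,dt$ are continuous, the above equality holds in turn everywhere. Hence
$$\gamma(0)=\gamma_Z(0)=\frac{1}{2\pi}\int_{\mathbb{R}}\hat\gamma_Z(t)\,dt.$$

Denote by $\mu_Z$ the measure on $\mathbb{R}$ having density $\hat\gamma_Z(t)/(2\pi)$, and write $\hat\mu_Z(x)=\int_{\mathbb{R}}e^{itx}\,\mu_Z(dt)$, so that $\hat\mu_Z=\gamma_Z$. Since $\gamma_Z(x)\to\gamma(x)$ everywhere as $Z$ tends to infinity, we get
$$\lim_{Z\to\infty}\hat\mu_Z(x)=\gamma(x).$$

By assumption $\gamma$ is continuous. It follows from the corollary on p. 481 in [Feller: 1966, II] that there exists a nonnegative bounded measure $\mu$ on $\mathbb{R}$ such that $\gamma(x)=\hat\mu(x)$.

We pass to the proof of b). By assumption $\sum_{n,m=1}^{Z} e^{-2i\pi(n-m)x}\gamma(n-m)\ge 0$. This sum can also be written as
$$\sum_{n,m=1}^{Z} e^{-2i\pi(n-m)x}\gamma(n-m)=\sum_{n=1}^{Z}\ \sum_{p=n-Z}^{n-1} e^{-2i\pi xp}\gamma(p)=\sum_{p=-Z+1}^{Z-1} e^{-2i\pi xp}\gamma(p)\sum_{\substack{p+1\le n\le p+Z\\ 1\le n\le Z}}1$$
$$=\sum_{p=-Z+1}^{Z-1} e^{-2i\pi xp}\gamma(p)\,\{\min(p+Z,Z)-\max(1,p+1)+1\}=\sum_{p=-Z+1}^{Z-1} e^{-2i\pi xp}\gamma(p)\,(Z-|p|).$$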

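Put $\gamma_Z(p)=\gamma(p)\,\mathbf{1}_{\{-Z+1,\dots,Z-1\}}(p)\,(1-|p|/Z)$ and $g_Z(x)=\sum_{p\in\mathbb{Z}} e^{-2i\pi xp}\gamma_Z(p)$. Then $\hat\gamma_Z(-x)=g_Z(x)\ge 0$, and since $\gamma_Z$ has compact support, $g_Z$ is bounded and continuous. Further
$$\int_{\mathbb{T}} g_Z(x)\,e^{2i\pi xr}\,dx=\sum_{p\in\mathbb{Z}}\gamma_Z(p)\int_{\mathbb{T}} e^{2i\pi x(r-p)}\,dx=\gamma_Z(r).$$

In particular $\gamma_Z(0)=\gamma(0)=\int_{\mathbb{T}} g_Z(x)\,dx$, thereby implying that the nonnegative measures $\nu_Z$ on $(\mathbb{T},\mathcal{B}(\mathbb{T}))$ with density $g_Z(x)$ are relatively compact for the weak convergence topology $\mathcal{D}$ on $\mathbb{T}$. Hence, there exist a subsequence $J$ and a bounded nonnegative measure $\nu$ on $\mathbb{T}$ such that
$$\lim_{J\ni Z\to\infty}\nu_Z\ \overset{\mathcal{D}}{=}\ \nu,$$
and $\lim_{J\ni Z\to\infty}\gamma_Z(r)=\int_{\mathbb{T}} e^{2i\pi xr}\,\nu(dx)$. Since $\lim_{Z\to\infty}\gamma_Z(r)=\gamma(r)$, we get for any $r\in\mathbb{Z}$,
$$\gamma(r)=\int_{\mathbb{T}} e^{2i\pi xr}\,\nu(dx).$$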

Schoenberg’s theorem. Schoenberg [1938] found a beautiful complement to Bochner’s theorem, which is worth formulating here. Let $f\colon\mathbb{R}_+\to\mathbb{R}_+$ be continuous, nonnegative definite. Assume that $f(0)=1$. Schoenberg’s theorem translates, via Bochner’s theorem, to the equivalence of the following two assertions:

(a) For all $d\ge 1$, there is a probability measure $\mu_d$ on $\mathbb{R}^d$ such that for every $x\in\mathbb{R}^d$,
$$f(\|x\|_d)=\int_{\mathbb{R}^d} e^{i\langle x,y\rangle}\,\mu_d(dy).$$
Here $\|x\|_d$ is the Euclidean norm on $\mathbb{R}^d$.

(b) There exists a Borel probability measure $\nu$ on $\mathbb{R}_+$ such that for any positive real $t$,
$$f(t)=\int_0^\infty e^{-st^2/2}\,\nu(ds).$$

There is a proof of Schoenberg’s theorem via the law of large numbers in Khoshnevisan [2005], to which we may also refer as a source.

1.1.2 Remarks. 1. Nonnegative definite sequences are characterized by the previous lemma. According to it, a sequence $\{\gamma_k, k\in\mathbb{Z}\}$ is nonnegative definite if and only if there exists a weakly stationary sequence $\{X_n, n\ge 1\}$ in a Hilbert space $H$ such that for any positive integers $h$ and $k$, $\langle X_h,X_k\rangle=\gamma_{h-k}$. This point can also be established by means of a direct vector representation in $H$, see Ky Fan [1946: Paragraph 2 and Appendix]. Nonnegative definite sequences are closely related to nonnegative trigonometric polynomials.

2. A trigonometric polynomial $\sum_{k=-p}^{p} z_k e^{ik\theta}$ with $z_{-k}=\bar z_k$ and taking only nonnegative values is said to be nonnegative. In view of a classical result of Fejér and F. Riesz (Fejér [1915]), there exist $p+1$ complex numbers $\rho_0,\rho_1,\dots,\rho_p$ such that
$$\sum_{k=-p}^{p} z_k e^{ik\theta}=\big|\rho_0+\rho_1e^{i\theta}+\dots+\rho_pe^{ip\theta}\big|^2.$$

3. We also quote a theorem due to Szász [1918] (see Ky Fan [1946: Paragraph 3]). A sequence $\{a_n, n\in\mathbb{Z}\}$ is nonnegative definite if and only if
$$\sum_{k=-p}^{p} a_k z_k\ \ge\ 0$$
holds for any nonnegative trigonometric polynomial $\sum_{k=-p}^{p} z_k e^{ik\theta}$ of arbitrary order $p$. This characterization is to be compared with the one of Hausdorff [1923]: the sequence $\{a_n, n\in\mathbb{Z}\}$ is nonnegative definite if and only if
$$\sum_{h=1}^{p}\sum_{k=1}^{p} a_{h-k}\,e^{i(h-k)\theta}\ \ge\ 0$$
is satisfied for any positive integer $p$ and any real $\theta$.

Below we list some standard examples of nonnegative definite sequences and weakly stationary sequences.

1.1.3 Examples. (1) Given a weakly stationary sequence $\{X_n, n\ge 1\}$ in $H$, it is readily seen that, for any real value of $\vartheta$, the sequence $\{e^{-in\vartheta}X_n, n\ge 1\}$ is weakly stationary too. Anticipating a bit von Neumann’s theorem, for any value of $\vartheta$ the limit
$$M(\vartheta)=\lim_{n\to\infty}\frac{e^{-i\vartheta}X_1+e^{-i2\vartheta}X_2+\dots+e^{-in\vartheta}X_n}{n}$$
also exists. Further (see before Remarks 1.3.4), if $\vartheta_1\not\equiv\vartheta_2 \pmod{2\pi}$, then $M(\vartheta_1)$ and $M(\vartheta_2)$ are orthogonal elements in $H$. And there are at most countably many values of $\vartheta$ for which $M(\vartheta)$ differs from the null element of $H$ (see Ky Fan [1946: Paragraph 6]).

(2) Let $\varphi\colon\mathbb{R}\to\mathbb{R}_+$ be even, and convex and nonincreasing on $\mathbb{R}_+$. Then the sequence $\{\varphi(n), n\in\mathbb{Z}\}$ is nonnegative definite. This follows from a classical theorem due to Pólya.


(3) Let $\mathcal{S}$ be the space of correlated sequences introduced by Wiener [1933: Chapter 4], namely the space of sequences $a=\{a(n), n\in\mathbb{Z}\}$ with $a_{-n}=\bar a_n$, such that for any $k\ge 0$ the limit
$$\gamma_a(k)=\lim_{n\to\infty}\frac1n\sum_{j=0}^{n-1}\overline{a(j)}\,a(j+k)$$
exists. Observe that for any integers $r,s$ with $0\le r\le s$,
$$\gamma_a(s-r)=\lim_{n\to\infty}\frac1n\sum_{h=0}^{n-1}\overline{a(h+r)}\,a(h+s).$$

From this it follows that the sequence $\{\gamma_a(k), k\ge 0\}$ is nonnegative definite. Indeed,
$$\sum_{k,l=1}^{m} c_k\bar c_l\,\gamma_a(k-l)=\lim_{n\to\infty}\frac1n\sum_{j=0}^{n-1}\sum_{k,l=1}^{m} c_k\bar c_l\,\overline{a(j+l)}\,a(j+k)=\lim_{n\to\infty}\frac1n\sum_{j=0}^{n-1}\Big|\sum_{k=1}^{m} c_k\,a(j+k)\Big|^2\ \ge\ 0.$$

In view of the Bochner–Herglotz theorem, there exists a uniquely determined nonnegative bounded measure $\Lambda_a$ on $[-\pi,\pi[$, called the spectral measure of the sequence $a$. Consider the family of measures
$$\Lambda_{J,a}(d\alpha)=\frac1J\Big|\sum_{0\le j<J} e^{-ij\alpha}a(j)\Big|^2 d\alpha.$$

A theorem due to Coquet, Kamae and Mendès France [1977: Theorem 1] shows that the family of measures $\Lambda_{J,a}$ converges weakly to $\Lambda_a$. To establish this property, it suffices to show that the sequence of Fourier transforms $\hat\Lambda_{J,a}$ converges pointwise to $\hat\Lambda_a$, which is easily checked.

1.2 The spectral inequality

Bochner–Herglotz’s lemma has a very useful consequence, which we now state.

1.2.1 Lemma. Let $T$ be a contraction in a Hilbert space $H$. For any $n\in\mathbb{Z}$, let $T_n=T^n$ if $n\ge 0$ and $T_n=(T^*)^{|n|}$ if $n<0$. Let $x\in H$. The sequence $\{\langle T_nx,x\rangle, n\in\mathbb{Z}\}$ is nonnegative definite, and there exists a uniquely determined nonnegative bounded measure $\mu_x$ on $\mathbb{T}$, the spectral measure of $T$ at $x$, verifying
$$\langle T_nx,x\rangle=\int_{\mathbb{T}}\exp(2i\pi nt)\,\mu_x(dt)\qquad(\forall n\in\mathbb{Z}).$$

Proof. The second assertion follows from Lemma 1.1.1. The first assertion is simple when $T$ is an isometry:
$$\sum_{l,m=-n}^{n} z_l\bar z_m\langle T_{l-m}x,x\rangle=\sum_{l,m=-n}^{n} z_l\bar z_m\langle T_{l+n}x,T_{m+n}x\rangle=\Big\|\sum_{l=-n}^{n} z_lT_{l+n}x\Big\|^2\ \ge\ 0.$$

For the general case, we put for any $0<r<1$ and $t\in\mathbb{T}$,
$$U(r,t)=\sum_{k\ge 0} r^ke^{2i\pi kt}T^k,\qquad V(r,t)=\sum_{k\in\mathbb{Z}} r^{|k|}e^{2i\pi kt}T_k=-I+U(r,t)+U(r,t)^*.$$

If $y=U(r,t)x$, we have $y-x=re^{2i\pi t}Ty$. Thus $\|y-x\|\le\|y\|$, and this shows
$$\langle V(r,t)x,x\rangle=-\langle x,x\rangle+\langle y,x\rangle+\langle x,y\rangle=\langle y,y\rangle-\langle y-x,y-x\rangle\ \ge\ 0.$$

For any complex numbers $\{z_l, |l|\le n\}$, we have
$$\sum_{l,m=-n}^{n} r^{|l-m|}z_l\bar z_m\langle T_{l-m}x,x\rangle=\sum_{l,m=-n}^{n}\sum_{k\in\mathbb{Z}} z_l\bar z_m\langle T_kx,x\rangle\,r^{|k|}\int_{\mathbb{T}} e^{2i\pi(k-(l-m))t}\,dt$$
$$=\int_{\mathbb{T}}\sum_{l,m=-n}^{n} z_l\bar z_m\,e^{-2i\pi(l-m)t}\,\langle V(r,t)x,x\rangle\,dt=\int_{\mathbb{T}}\Big|\sum_{l=-n}^{n} z_le^{-2i\pi lt}\Big|^2\langle V(r,t)x,x\rangle\,dt\ \ge\ 0.$$

Letting $r$ tend to 1 gives the required inequality.

We shall now deduce from the spectral lemma an extremely useful tool.

1.2.2 Proposition. Let $T$ be a contraction in a Hilbert space $H$, and let $p(x)$ be a polynomial. Then, for any $x\in H$,
$$\|p(T)x\|^2\ \le\ \int_{\mathbb{T}}\big|p(e^{2i\pi t})\big|^2\,\mu_x(dt),$$
where the measure $\mu_x$ is the same as in Lemma 1.2.1.

Proof. We follow an argument due to Wierdl. The inequality is obviously satisfied if the order of $p$ is equal to 0. Assume now that the inequality is true for any polynomial of order $k-1$. Let $p(y)=a_0+\dots+a_ky^k$, and consider the auxiliary polynomials
$$q(y)=a_1y+\dots+a_ky^k,\qquad u(y)=\frac{q(y)}{y}=a_1+\dots+a_ky^{k-1}.$$

We have
$$|p(y)|^2=|a_0+q(y)|^2=|a_0|^2+a_0\overline{q(y)}+q(y)\bar a_0+|q(y)|^2,$$
and
$$\|p(T)x\|^2=\|(a_0I+q(T))x\|^2=|a_0|^2\|x\|^2+\langle a_0x,q(T)x\rangle+\langle q(T)x,a_0x\rangle+\|q(T)x\|^2.$$

By using the induction hypothesis, we have
$$\|u(T)x\|^2\ \le\ \int_{\mathbb{T}}\big|u(e^{2i\pi t})\big|^2\,\mu_x(dt).$$

Since $T$ is a contraction, $\|q(T)x\|=\|Tu(T)x\|\le\|u(T)x\|$. And as $|u(e^{2i\pi t})|=|q(e^{2i\pi t})|$, we get
$$\|q(T)x\|^2\ \le\ \int_{\mathbb{T}}\big|q(e^{2i\pi t})\big|^2\,\mu_x(dt).$$

Besides,
$$\langle a_0x,q(T)x\rangle=\int_{\mathbb{T}} a_0\,\overline{q(e^{2i\pi t})}\,\mu_x(dt)\quad\text{and}\quad\langle q(T)x,a_0x\rangle=\int_{\mathbb{T}} q(e^{2i\pi t})\,\bar a_0\,\mu_x(dt).$$

By putting together these various estimates, and using $\|x\|^2=\mu_x(\mathbb{T})$, we obtain
$$\|p(T)x\|^2\ \le\ \int_{\mathbb{T}}\Big\{|a_0|^2+a_0\overline{q(e^{2i\pi t})}+q(e^{2i\pi t})\bar a_0+|q(e^{2i\pi t})|^2\Big\}\,\mu_x(dt),$$
and this establishes the spectral inequality for all polynomials of order $k$, and thereby for any polynomial.
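The spectral inequality can be checked numerically without computing $\mu_x$: expanding $|p(e^{2i\pi t})|^2$ and using Lemma 1.2.1, the right-hand side equals $\sum_{l,m}a_l\bar a_m\langle T_{l-m}x,x\rangle$. The following is a minimal sketch under our own assumptions: the $2\times 2$ real matrix $T=DR$ (a rotation followed by a diagonal matrix with entries at most 1, hence a contraction), the vector `x` and the coefficient lists are hypothetical choices, and all data are real so that $\langle T_{-n}x,x\rangle=\langle T_nx,x\rangle$.

```python
import math

def mat_vec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def inner(u, v):
    """Inner product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical contraction T = D R: rotation R, then diagonal D with entries <= 1.
theta = 0.7
R = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
D = [[0.9, 0.0], [0.0, 0.5]]
T = [[sum(D[i][k] * R[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
x = [1.0, -2.0]

def both_sides(coeffs):
    """Return (||p(T)x||^2, sum_{l,m} a_l a_m <T_{l-m} x, x>) for real coefficients a_0..a_k."""
    powers = [x]                                   # T^0 x, T^1 x, ..., T^k x
    for _ in range(len(coeffs) - 1):
        powers.append(mat_vec(T, powers[-1]))
    moments = [inner(p, x) for p in powers]        # <T_n x, x>, n >= 0; same value for -n
    px = [sum(a * p[i] for a, p in zip(coeffs, powers)) for i in range(2)]
    lhs = inner(px, px)
    k = len(coeffs)
    rhs = sum(coeffs[l] * coeffs[m] * moments[abs(l - m)]
              for l in range(k) for m in range(k))
    return lhs, rhs

results = [both_sides(c) for c in ([1.0, -1.0], [0.5, -1.0, 2.0, 0.25], [0.0, 0.0, 3.0])]
```

Each pair in `results` holds the left-hand and right-hand sides for one polynomial; in this experiment the first never exceeds the second, as Proposition 1.2.2 predicts. For a unitary $T$ the two sides would coincide.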

1.3 The von Neumann theorem

Let $T$ be a contraction in a Hilbert space $H$ and introduce the operators
$$A_n=A_n^T=\frac1n\sum_{k=0}^{n-1}T^k,\qquad n=1,2,\dots.\tag{1.3.1}$$
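To make the operators $A_n$ concrete, here is a small numerical sketch. It assumes a hypothetical contraction on $\mathbb{R}^3$ (the function `apply_T` below, our own choice) that fixes the first coordinate axis and rotates the two remaining coordinates by one radian, so that its invariant vectors are exactly the multiples of the first basis vector.

```python
import math

def apply_T(v, angle=1.0):
    """Hypothetical contraction on R^3: fixes the first axis, rotates the other two."""
    c, s = math.cos(angle), math.sin(angle)
    return [v[0], c * v[1] - s * v[2], s * v[1] + c * v[2]]

def ergodic_average(f, n):
    """Return A_n f = (1/n) * sum_{k=0}^{n-1} T^k f."""
    total = [0.0, 0.0, 0.0]
    current = list(f)
    for _ in range(n):
        total = [t + c for t, c in zip(total, current)]
        current = apply_T(current)
    return [t / n for t in total]

f = [1.5, 2.0, -3.0]
avg = ergodic_average(f, 10000)
```

In this example the averages keep the component of `f` along the invariant axis and damp the rotating part at rate $O(1/n)$, which is the behaviour described by the theorem of von Neumann stated next.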

The fundamental result of von Neumann [1931] can be stated as follows.

1.3.1 Theorem. The limit $\lim_{n\to\infty}A_nf=\bar f$ exists for any $f\in H$, and the map $P_T\colon f\mapsto\bar f$ is the orthogonal projection of $H$ onto the subspace of invariant vectors $H_T=\{g\in H : Tg=g\}$. Further $H=\overline{H_0}\oplus H_T$, where $H_0=\{g-Tg,\ g\in H\}$.

Proof. (1) The proof is based on the following lemma.

1.3.2 Lemma. Let $T$ be a contraction in a Hilbert space $H$. Then the adjoint operator $T^*$ (Section 2.2.6) has the same fixed points as $T$.

Proof. $T^*$ is also a contraction, and if $Tf=f$, then $\langle f,Tf\rangle=\langle Tf,f\rangle=\|f\|^2$. Conversely, $\langle f,Tf\rangle=\langle Tf,f\rangle=\|f\|^2$ implies $\langle f,T^*f\rangle=\|f\|^2$ and
$$\|Tf-f\|^2=\langle Tf-f,Tf-f\rangle=\|Tf\|^2+\|f\|^2-\langle f,Tf\rangle-\langle Tf,f\rangle=\|Tf\|^2-\langle f,T^*f\rangle\ \le\ 0.$$
Thus $Tf=f$, and so $Tf=f\iff\langle Tf,f\rangle=\langle T^*f,f\rangle=\|f\|^2$. Applying the same argument to $T^*$, we conclude
$$Tf=f\iff\langle Tf,f\rangle=\langle T^*f,f\rangle=\langle f,f\rangle\iff T^*f=f.$$

(2) We show that $H=\overline{H_0}\oplus H_T$. According to (1), for any $f\in H_T$,
$$\langle f,g-Tg\rangle=\langle f,g\rangle-\langle f,Tg\rangle=\langle f,g\rangle-\langle T^*f,g\rangle=0.$$
Hence $H_T\subset H_0^\perp$. Besides, if $f$ is orthogonal to $H_0$, then $0=\langle f,g-Tg\rangle=\langle f-T^*f,g\rangle$ for any $g$ in $H$. Thus $T^*f=f$, and thereby $Tf=f$. This implies that $H_0^\perp=H_T$.

(3) It is plain that the theorem is satisfied for any vector of the type $f+g-Tg$, $f\in H_T$ and $g\in H$. Indeed
$$\frac1n\sum_{k=0}^{n-1}T^k(f+g-Tg)=\frac1n\sum_{k=0}^{n-1}T^kf+\frac1n\sum_{k=0}^{n-1}(T^kg-T^{k+1}g)=f+\frac1n(g-T^ng)\to f,$$
as $n$ tends to infinity, and $f$ is the orthogonal projection on $H_T$ of $f+g-Tg$.

(4) According to (2), these vectors are dense in $H$. The operators $A_n$ are contractions as well. It follows that the set of vectors for which the theorem is true is closed in $H$. Let indeed
$$\mathcal{A}=\Big\{x\in H \text{ such that if } y=\mathrm{proj}_{\overline{H_0}}(x) \text{ then } \lim_{n\to\infty}A_n(y)=0\Big\}.$$

We show that $\mathcal{A}$ is closed. Let $\{x_n, n\ge 1\}\subset\mathcal{A}$, $x_n\to x$. Then $y_n\to y$, and
$$\|A_N(y)\|\le\|A_N(y-y_p)\|+\|A_N(y_p)\|\le\|A_N(y_p)\|+\|y-y_p\|.$$
Let $\varepsilon>0$ and let $p$ be a fixed integer such that $\|y-y_p\|<\varepsilon/2$. Let $N(\varepsilon)$ be such that for any $N\ge N(\varepsilon)$, $\|A_N(y_p)\|\le\varepsilon/2$. We obtain that $\|A_N(y)\|\le\varepsilon$. Thus $\mathcal{A}$ is closed in $H$ and the theorem is established.

Let $\{X_n, n\ge 0\}$ be a weakly stationary sequence in a Hilbert space $H$. According to Theorem 2.1.3, there exists a unitary operator $U$ on $H$ such that $X_n=U^nX_0$. By von Neumann’s theorem, we get that the limit
$$M(X):=\lim_{n\to\infty}\frac{X_0+\dots+X_{n-1}}{n}$$
exists in $H$. It can be directly observed that the inner product $\langle M(X),X_h\rangle$ is independent of $h$. Indeed, by using the weak stationarity,
$$\langle M(X),X_h\rangle=\lim_{n\to\infty}\Big\langle\frac{X_{h+1}+\dots+X_{h+n}}{n},X_h\Big\rangle=\lim_{n\to\infty}\Big\langle\frac{X_{k+1}+\dots+X_{k+n}}{n},X_k\Big\rangle=\langle M(X),X_k\rangle.$$
And consequently
$$\langle M(X),X_h\rangle=\Big\langle M(X),\frac{X_1+\dots+X_n}{n}\Big\rangle,$$
which gives as $n$ tends to infinity: $\langle M(X),X_h\rangle=\|M(X)\|^2$.

As observed in Examples 1.1.3, for any real value of $\vartheta$, the sequence $\{e^{-in\vartheta}X_n, n\ge 1\}$ is weakly stationary too. The limit
$$M(X,\vartheta)=\lim_{n\to\infty}\frac{e^{-i\vartheta}X_1+e^{-i2\vartheta}X_2+\dots+e^{-in\vartheta}X_n}{n}$$
thus also exists, for any value of $\vartheta$. Then
$$\langle e^{-ih\vartheta}X_h,M(X,\vartheta)\rangle=\|M(X,\vartheta)\|^2,$$
independently of $h$. Hence
$$\Big\langle\frac{e^{-i\vartheta_1}X_1+e^{-i2\vartheta_1}X_2+\dots+e^{-in\vartheta_1}X_n}{n},M(X,\vartheta_2)\Big\rangle=\frac{e^{i(\vartheta_2-\vartheta_1)}+e^{i2(\vartheta_2-\vartheta_1)}+\dots+e^{in(\vartheta_2-\vartheta_1)}}{n}\,\|M(X,\vartheta_2)\|^2.$$
Therefore, if $\vartheta_1\not\equiv\vartheta_2\pmod{2\pi}$, the last equation becomes, as $n$ tends to infinity, $\langle M(X,\vartheta_1),M(X,\vartheta_2)\rangle=0$, as claimed in Examples 1.1.3.

Weakly stationary sequences, however, enjoy other remarkable properties; among them is certainly the following identity, which does not seem to be well known.

An identity of Ky Fan. For any two positive integers $n$, $m$,

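$$\frac{\|X_1+\dots+X_n\|^2}{n}+\frac{\|X_1+\dots+X_m\|^2}{m}-\frac{\|X_1+\dots+X_{n+m}\|^2}{n+m}=\frac{n(n+m)}{m}\Big\|\frac{X_1+\dots+X_n}{n}-\frac{X_1+\dots+X_{n+m}}{n+m}\Big\|^2.$$

This nice identity was observed and applied in Ky Fan [1946: 598]. The proof goes as follows. Put for any positive integer $n$, $S_n=X_1+\dots+X_n$, and if $m$ is another positive integer let $T_{n,m}=S_{n+m}-S_n$, so that $S_{n+m}=S_n+T_{n,m}$. Then
$$\Big\|\frac{S_n}{n}-\frac{S_{n+m}}{n+m}\Big\|^2=\frac{\|S_n\|^2}{n^2}+\frac{\|S_{n+m}\|^2}{(n+m)^2}-\Big\langle\frac{S_n}{n},\frac{S_{n+m}}{n+m}\Big\rangle-\Big\langle\frac{S_{n+m}}{n+m},\frac{S_n}{n}\Big\rangle,$$
and so
$$\frac{n(n+m)}{m}\Big\|\frac{S_n}{n}-\frac{S_{n+m}}{n+m}\Big\|^2=\frac{(n+m)\|S_n\|^2}{nm}+\frac{n\|S_{n+m}\|^2}{m(n+m)}-\frac1m\big\{\langle S_n,S_{n+m}\rangle+\langle S_{n+m},S_n\rangle\big\}$$
$$=\frac{\|S_n\|^2}{n}+\frac{\|S_n\|^2}{m}+\frac{\|S_m\|^2}{m}-\frac{\|S_m\|^2}{m}-\frac{\|S_{n+m}\|^2}{n+m}+\frac{\|S_{n+m}\|^2}{m}-\frac1m\big\{\langle S_n,S_{n+m}\rangle+\langle S_{n+m},S_n\rangle\big\}.$$

But $\langle S_n,S_{n+m}\rangle+\langle S_{n+m},S_n\rangle=2\|S_n\|^2+\langle S_n,T_{n,m}\rangle+\langle T_{n,m},S_n\rangle$, so that in turn
$$\frac{n(n+m)}{m}\Big\|\frac{S_n}{n}-\frac{S_{n+m}}{n+m}\Big\|^2=\frac{\|S_n\|^2}{n}+\frac{\|S_m\|^2}{m}-\frac{\|S_{n+m}\|^2}{n+m}+\frac1m\Big\{\|S_{n+m}\|^2-\|S_n\|^2-\|S_m\|^2-\langle S_n,T_{n,m}\rangle-\langle T_{n,m},S_n\rangle\Big\}$$
$$=\frac{\|S_n\|^2}{n}+\frac{\|S_m\|^2}{m}-\frac{\|S_{n+m}\|^2}{n+m},$$
since $S_{n+m}=S_n+T_{n,m}$. And we are done. Note that the weak stationarity assumption was only used in the last line of calculations, to say that $\|T_{n,m}\|=\|S_m\|$.

A simple although quite interesting consequence of Ky Fan’s identity is
$$\frac{\|S_{n+m}\|^2}{n+m}\ \le\ \frac{\|S_n\|^2}{n}+\frac{\|S_m\|^2}{m},\tag{1.3.2}$$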

which is valid for any two positive integers $n$, $m$. This is inequality (4.8) in Ky Fan [1946]. We say that a sequence $\{g_n, n\ge 1\}$ of real numbers is subadditive if it satisfies
$$g_{n+m}\le g_n+g_m.\tag{1.3.3}$$

Then we have the following well-known lemma.

1.3.3 Lemma. If $\{g_n, n\ge 1\}$ is a subadditive sequence of real numbers, then $g_n/n$ converges to $\inf_{n\ge 1}(g_n/n)$.

Proof. Fix an arbitrary positive integer $N$ and write $n=j_nN+r_n$ with $1\le r_n\le N$. Clearly $\frac{j_n}{n}\to\frac1N$ as $n$ tends to infinity. Further, by subadditivity $g_{j_nN}\le j_n\,g_N$, so that
$$\inf_{n\ge 1}\frac{g_n}{n}\ \le\ \frac{g_n}{n}\ \le\ \frac{g_{j_nN}+g_{r_n}}{n}\ \le\ \frac{j_n\,g_N}{n}+\frac{g_{r_n}}{n}=\frac{j_nN}{n}\cdot\frac{g_N}{N}+\frac{g_{r_n}}{n}.$$

Since $r_n$ takes only the values $1,\dots,N$, the terms $g_{r_n}$ are bounded. Letting now $n$ tend to infinity gives
$$\inf_{n\ge 1}\frac{g_n}{n}\ \le\ \liminf_{n\to\infty}\frac{g_n}{n}\ \le\ \limsup_{n\to\infty}\frac{g_n}{n}\ \le\ \frac{g_N}{N}.$$

As $N$ was arbitrary, the lemma is proved.

We thus deduce from (1.3.2) and from the lemma applied to $g_n:=\frac{\|S_n\|^2}{n}$ that
$$\lim_{n\to\infty}\Big\|\frac{S_n}{n}\Big\|=\inf_{n\ge 1}\Big\|\frac{S_n}{n}\Big\|.\tag{1.3.4}$$
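Ky Fan’s identity and (1.3.4) lend themselves to a quick numerical check. The sketch below rests on our own assumptions: the weakly stationary sequence is $X_n=U^nX_0$ in $\mathbb{R}^3$, where the hypothetical orthogonal map `U` fixes the first axis and rotates the other two coordinates, and the starting vector is arbitrary.

```python
import math

def U(v, angle=0.8):
    """Hypothetical orthogonal map on R^3: fixes the first axis, rotates the other two."""
    c, s = math.cos(angle), math.sin(angle)
    return [v[0], c * v[1] - s * v[2], s * v[1] + c * v[2]]

def partial_sums(x0, count):
    """Return [S_1, ..., S_count] with S_n = X_1 + ... + X_n and X_n = U^n X_0."""
    sums, s, x = [], [0.0, 0.0, 0.0], list(x0)
    for _ in range(count):
        x = U(x)
        s = [a + b for a, b in zip(s, x)]
        sums.append(list(s))
    return sums

def sq_norm(v):
    return sum(a * a for a in v)

def ky_fan_sides(S, n, m):
    """Return both sides of Ky Fan's identity for the pair (n, m)."""
    Sn, Sm, Snm = S[n - 1], S[m - 1], S[n + m - 1]
    lhs = sq_norm(Sn) / n + sq_norm(Sm) / m - sq_norm(Snm) / (n + m)
    diff = [a / n - b / (n + m) for a, b in zip(Sn, Snm)]
    rhs = n * (n + m) / m * sq_norm(diff)
    return lhs, rhs

S = partial_sums([0.5, 1.0, -2.0], 40)
max_gap = max(abs(l - r)
              for n in range(1, 20) for m in range(1, 20)
              for l, r in [ky_fan_sides(S, n, m)])

# ||S_n / n|| for n = 1..40; the invariant component of X_0 has norm 0.5.
ratios = [math.sqrt(sq_norm(s)) / (k + 1) for k, s in enumerate(S)]
```

Here `max_gap` measures the largest discrepancy between the two sides of the identity over the tested pairs, and `ratios` lists $\|S_n/n\|$; in this example these norms stay above the norm of the invariant component and approach it, in line with (1.3.4).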

This is a remarkable consequence of Ky Fan’s identity, which remains true for averages of contractions by von Neumann’s theorem (proceed by approximation in view of the decomposition $H=\overline{H_0}\oplus H_T$). We continue with another interesting consequence concerning the ratios
$$\frac{\Big\|\dfrac{S_{n_k}}{n_k}-\dfrac{S_{n_{k+1}}}{n_{k+1}}\Big\|^2}{\dfrac{1}{n_k}-\dfrac{1}{n_{k+1}}},$$
where $\mathcal{N}=\{n_k, k\ge 1\}$ is a given increasing sequence of positive integers. Notice that in the orthonormal case, namely if $X_1,X_2,\dots$ is an orthonormal sequence, then $\Big\|\frac{S_{n_k}}{n_k}-\frac{S_{n_{k+1}}}{n_{k+1}}\Big\|^2=\frac{1}{n_k}-\frac{1}{n_{k+1}}$ precisely. We have the following properties:

a)

Snk+1 2 N−1 Snk 1 nk − nk+1 lim sup

1 1 N→∞ nN k=1 nk − nk+1

b) Further if lim nk+1 − nk = ∞, then k→∞

Snk+1 −nk 2

SnN 2 − ≤ lim sup sup . 2 n2N N →∞ 1≤k 0. And $|V_x(\theta)|\le 1$ if $x$ is an integer. Hence (i).

Now let $-\pi\le\theta\le\pi$ and put for any real $x>0$,
$$\varphi_\theta(x)=\frac{e^{ix\theta}-1}{x}.$$

Then $\varphi_\theta'(x)=\dfrac{i\theta xe^{ix\theta}-e^{ix\theta}+1}{x^2}$, and noting $\delta(u):=|iue^{iu}-e^{iu}+1|^2$, we have
$$\delta(u)=(1-u\sin u-\cos u)^2+(u\cos u-\sin u)^2=2[1-u\sin u-\cos u]+u^2.$$

We claim that for all real $u$, $\delta(u)\le u^4/4$. As $\delta(u)=\delta(-u)$ it suffices to prove it for $u\ge 0$. But $\delta'(u)=2u(1-\cos u)$, and if we set $H(u):=u^4/4-\delta(u)$, we get $H(0)=0$ and
$$H'(u)=u^3-2u(1-\cos u)=u\big(u^2-4\sin^2(u/2)\big)\ \ge\ 0,$$
since $|\sin v|\le|v|$. Then $|\varphi_\theta'(x)|=\delta(x\theta)^{1/2}/x^2\le|\theta|^2/2$. As $\frac{\partial}{\partial x}V_x(\theta)=\frac{1}{e^{i\theta}-1}\varphi_\theta'(x)$, it follows that
$$\Big|\frac{\partial}{\partial x}V_x(\theta)\Big|\ \le\ \frac{\pi}{2|\theta|}\,|\varphi_\theta'(x)|\ \le\ \frac{\pi}{4}|\theta|.$$

Hence (ii). Let $m\ge n$ be positive integers. Then
$$|V_n(\theta)-V_m(\theta)|=\frac{1}{|e^{i\theta}-1|}\,|\varphi_\theta(n)-\varphi_\theta(m)|\ \le\ \frac{\pi}{2|\theta|}\,(m-n)\sup_{n<x<m}|\varphi_\theta'(x)|,$$
and so $|V_n(\theta)-V_m(\theta)|\le\frac{\pi}{4}|\theta|(m-n)$. Now
$$|V_n(\theta)-V_m(\theta)|=\Big|\Big(\frac1n-\frac1m\Big)\sum_{j=0}^{n-1}e^{ij\theta}-\frac1m\sum_{j=n}^{m-1}e^{ij\theta}\Big|\ \le\ \frac{2(m-n)}{m}.$$

Hence (iii) and (iv).

Introduce for $\theta\in[-\pi,\pi)$ and $y\in(0,1]$ the regularizing kernel
$$Q(\theta,y)=\frac{1}{|\theta|}\wedge\frac{|\theta|}{y^2}.\tag{1.4.1}$$


1.4.3 Lemma. Let $m\ge n$ be two positive integers. Then, for any $\theta\in[-\pi,\pi)$,
$$4\pi\int_{1/m}^{1/n}Q(\theta,y)\,dy+4\cdot\mathbf{1}_{[\frac1m,\frac1n)}(|\theta|)\ \ge\ |V_m(\theta)-V_n(\theta)|^2.$$

Proof. Consider three cases:

(1) $|\theta|\ge\frac1n$. By definition of $Q$ and by Lemma 1.4.2,
$$\int_{1/m}^{1/n}Q(\theta,y)\,dy=\int_{1/m}^{1/n}\frac{1}{|\theta|}\,dy=\frac{m-n}{m}\cdot\frac{1}{n|\theta|}\ \ge\ \frac12|V_m(\theta)-V_n(\theta)|\cdot\frac{1}{n|\theta|}\ \ge\ \frac{1}{4\pi}|V_m(\theta)-V_n(\theta)|^2.$$

(2) $|\theta|\le\frac1m$. Then, for the same reasons,
$$\int_{1/m}^{1/n}Q(\theta,y)\,dy=\int_{1/m}^{1/n}\frac{|\theta|}{y^2}\,dy=(m-n)|\theta|\ \ge\ \frac4\pi|V_m(\theta)-V_n(\theta)|\ \ge\ \frac2\pi|V_m(\theta)-V_n(\theta)|^2.$$

(3) $\frac1n>|\theta|\ge\frac1m$. This case is obvious since we have $|V_m(\theta)-V_n(\theta)|\le 2$.
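The inequality of Lemma 1.4.3 can be explored numerically. The following is a minimal sketch under stated assumptions: $V_n(\theta)$ is taken to be the trigonometric average $\frac1n\sum_{j=0}^{n-1}e^{ij\theta}$ that appears in the proof of Lemma 1.4.2, the integral of the kernel (1.4.1) is evaluated in closed form, and the helper names `V`, `integral_Q` and `slack` are ours.

```python
import cmath
import math

def V(n, theta):
    """V_n(theta) = (1/n) * sum_{j=0}^{n-1} exp(i*j*theta)."""
    return sum(cmath.exp(1j * j * theta) for j in range(n)) / n

def integral_Q(theta, a, b):
    """Integral over [a, b], 0 < a <= b, of Q(theta, y) = min(1/|theta|, |theta|/y^2)."""
    t = abs(theta)
    if b <= t:                      # Q = 1/|theta| on the whole interval
        return (b - a) / t
    if a >= t:                      # Q = |theta|/y^2 on the whole interval
        return t * (1.0 / a - 1.0 / b)
    # The two regimes meet at y = |theta|.
    return (t - a) / t + t * (1.0 / t - 1.0 / b)

def slack(n, m, theta):
    """Left-hand side minus right-hand side of the inequality of Lemma 1.4.3."""
    indicator = 1.0 if 1.0 / m <= abs(theta) < 1.0 / n else 0.0
    left = 4 * math.pi * integral_Q(theta, 1.0 / m, 1.0 / n) + 4 * indicator
    return left - abs(V(m, theta) - V(n, theta)) ** 2

# Grid of nonzero angles in (-pi, pi); theta = 0 is trivial since V_n(0) = 1.
thetas = [-math.pi + 2 * math.pi * (k + 0.5) / 400 for k in range(400)]
worst = min(slack(n, m, th)
            for n in range(1, 13) for m in range(n, 13) for th in thetas)
```

The value `worst` is the smallest slack found on the grid; it is nonnegative up to rounding in this experiment, with equality only for $m=n$, where both sides vanish.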

Let $f\in H$, with spectral measure $\mu_f$. Introduce a new measure, the spectral regularization of the measure $\mu_f$ with respect to the kernel $Q$, defined by
$$\hat\mu_f(dy)=4\pi\Big(\int_{-\pi}^{\pi}Q(\theta,y)\,\mu_f(d\theta)\Big)dy+4\mu_f(dy).\tag{1.4.2}$$

It is easy to verify that $\hat\mu_f([0,1])\le 4(2\pi+1)\,\mu_f([-\pi,\pi])\le 4(2\pi+1)\|f\|^2$. Indeed, if $|\theta|\le 1$, then
$$\int_0^1Q(\theta,y)\,dy=\int_0^{|\theta|}|\theta|^{-1}\,dy+\int_{|\theta|}^{1}|\theta|\,y^{-2}\,dy=1+|\theta|\,(|\theta|^{-1}-1)=2-|\theta|\ \le\ 2,$$
and if $1\le|\theta|\le\pi$, then $y\le|\theta|$ and $\int_0^1Q(\theta,y)\,dy=\int_0^1|\theta|^{-1}\,dy\le 1$. We thus have $\int_0^1Q(\theta,y)\,dy\le 2$; hence the inequality.

1.4.4 Theorem (Spectral regularization inequality). For any integers $m\ge n\ge 1$,
$$\|A_n^Tf-A_m^Tf\|^2\ \le\ \hat\mu_f\Big(\Big[\frac1m,\frac1n\Big)\Big).$$


Proof. By integrating the inequality of Lemma 1.4.3 with respect to the measure μf , we get π 1/n π 1 1 Q(θ, y)μf (dθ ) dy+4 μf m , n ≥ |Vm (θ )−Vn (θ )|2 μf (dθ ). 4π 1/m

−π

−π

By means of the spectral inequality (Proposition 1.2.2), we thus obtain the claimed result. The spectral regularization inequality allows us to easily evaluate the Littlewood– Paley square function associated to the averages ATn (f ). Put for any nondecreasing sequence N = {np , p ≥ 1} of positive integers, and any f ∈ H , SN (f ) =

∞

ATnp+1 (f ) − ATnp (f ) 2

1/2 .

(1.4.3)

p=1

These functions, which are extrapolated from the Littlewood–Paley theory, gained much interest in the ergodic circles during the last decade. We briefly recall their role in Fourier analysis on T. Introduce the so-called dyadic intervals ⎧ j −1 j −1 + 1, . . . , 2j − 1} if j > 0, ⎪ ⎨{2 , 2 j = {0} if j = 0, ⎪ ⎩ −|j | if j < 0, If f is any integrable function on T and fˆ its Fourier transform, then we write Sj f = ˆ n∈j f (n)χn . The square function of f is defined by Sf =

|Sj f |2

1/2 ,

j ∈Z

and the Littlewood–Paley theorem on T expresses that to each p in (1, ∞) correspond positive numbers Ap and Bp such that Ap Sf p ≤ f p ≤ Bp Sf p for (say) all trigonometric polynomials f on T. For more, see [Edwards–Gaudry: 1977]. The square function also appears in martingale theory ([Burkholder–Gundy: 1970], inequality (1.4)). Let f1 , f2 , . . . be a martingale on some probability space and d1 , d2 , . . . its difference sequence, so that fn =

n

dk ,

n ≥ 1.

k=1

Let

f

denote the maximal function of the martingale sequence: f = supn≥1 |fn |.

30

1 The von Neumann theorem and spectral regularization

The maximal function is related to the square function $Sf=\big(\sum_{k=1}^{\infty}d_k^2\big)^{1/2}$ by the inequalities
$$A_p\|Sf\|_p\le\|f^*\|_p\le B_p\|Sf\|_p,$$
valid for $1<p<\infty$.

1.4.5 Theorem (Square function inequality). For any nondecreasing sequence $\mathcal N$ of positive integers, and any $f\in H$,
$$S_{\mathcal N}(f)\le2(2\pi+1)^{1/2}\|f\|.$$

Proof. From Theorem 1.4.4, it follows immediately that
$$\sum_{p=1}^{\infty}\big\|A^T_{n_{p+1}}(f)-A^T_{n_p}(f)\big\|^2\le\sum_{p=1}^{\infty}\hat\mu_f\Big\{\Big[\frac1{n_{p+1}},\frac1{n_p}\Big]\Big\}\le\hat\mu_f\{[0,1]\}\le4(2\pi+1)\|f\|^2.$$
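As a numerical sanity check of the square function inequality, one can take for $T$ a diagonal unitary operator on $\mathbb C^d$, for which $A^T_nf$ has coordinates $V_n(\theta_j)f_j$ with $V_n(\theta)=\frac1n\sum_{k<n}e^{ik\theta}$. The sketch below is an illustration under these assumptions only (finite dimension, a truncated dyadic sequence, randomly chosen angles); the names `V` and `square_function` are ours.

```python
import cmath
import math
import random


def V(n, theta):
    """Average (1/n) * sum_{k<n} exp(i k theta), via the geometric-sum formula."""
    z = cmath.exp(1j * theta)
    if abs(z - 1) < 1e-14:
        return 1.0 + 0j
    return (cmath.exp(1j * n * theta) - 1) / (n * (z - 1))


def square_function(thetas, f, seq):
    """S_N(f) for the diagonal unitary with eigenvalues exp(i theta_j)."""
    total = 0.0
    for n_p, n_next in zip(seq, seq[1:]):
        for theta, fj in zip(thetas, f):
            total += abs(V(n_next, theta) - V(n_p, theta)) ** 2 * abs(fj) ** 2
    return math.sqrt(total)


random.seed(0)
d = 50
thetas = [random.uniform(-math.pi, math.pi) for _ in range(d)]
f = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(d)]
seq = [2 ** p for p in range(13)]          # truncated sequence n_p = 2^p

norm_f = math.sqrt(sum(abs(fj) ** 2 for fj in f))
S = square_function(thetas, f, seq)
bound = 2 * math.sqrt(2 * math.pi + 1) * norm_f
print(S <= bound)
```

A truncated sequence only gives a partial sum of (1.4.3), so the check is consistent with, but of course does not prove, the theorem.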

Actually the better constant $6\pi$ is obtained in Lifshits and Weber [2000: 77] by using another kernel $Q$. The corresponding spectral regularization of $\mu$ is given by
$$\frac{d\hat\mu}{dx}(x)=\int_{-\pi}^{\pi}Q(\theta,x)\,\mu(d\theta)=|x|^{-3}\int_{|\theta|\,\cdots}\theta^{2}\,\mu(d\theta)+\int_{\cdots}|\theta|^{-1}\,\mu(d\theta)$$
[…]

1. Assume that $H=L^2(\mu)$, $(X,\mathcal A,\mu)$ being a probability space, and define $Tf=f\circ\tau$, where $\tau$ is a measure-preserving transformation of $X$ (Section 3.1). By Theorem 1.4.5, the associated square function $S_{\mathcal N}$ defined in (1.4.3) maps $L^2(\mu)$ to $L^2(\mu)$. This can be extended to $1<p<\infty$: there exists a constant $C_p$ such that for any increasing sequence $\mathcal N=\{n_k,\ k\ge1\}$ and any $f\in L^p(\mu)$, we have
$$\Big(\sum_{k=1}^{\infty}\big\|A^T_{n_{k+1}}(f)-A^T_{n_k}(f)\big\|_p^p\Big)^{1/p}\le C_p\|f\|_p.\tag{1.4.11}$$

This nice result was shown by Jones, Kaufman, Rosenblatt and Wierdl [1998]. It is a direct consequence of a stronger result (see Theorem A), which we shall discuss in Section 4.6.6.

With the notation from the beginning of the section, let $N(A^T(f),p,\varepsilon)$ be the minimal number (possibly infinite) of open balls of $L^p(\mu)$, centered in $A^T(f)$ and of radius $\varepsilon$, needed to cover $A^T(f)$. In a way similar to the one we used to derive entropy estimates from the square function, we deduce from (1.4.11): there exists a constant $C_p$ such that for any $\varepsilon>0$ and any $f\in L^p(\mu)$,
$$N(A^T(f),p,\varepsilon)\le C_p^p\,\frac{\|f\|_p^p}{\varepsilon^p}.\tag{1.4.12}$$

For irrational rotations, this bound can be improved by using the Hausdorff–Young inequality (Lifshits [1997] and Weber [1997]). Let $\tau x=x+\vartheta$ be a rotation on $(\mathbb T,\lambda)$, and $T$ defined by $Tf=f\circ\tau$. Let $2\le p<\infty$ and $1/p+1/q=1$. For $f\in L^p(\mathbb T)$, $f\sim\sum_{j\in\mathbb Z}\hat f_je_j$, let $\hat f=\{\hat f_j,\ j\in\mathbb Z\}$ be its Fourier transform. Then
$$\sup_{\|\hat f\|_q\le1}N(A^T(f),p,\varepsilon)\le C\varepsilon^{-q}.\tag{1.4.13}$$

As $Te_j=e^{2i\pi j\vartheta}e_j:=\lambda_je_j$, for all polynomials $P$ we have
$$P(T)f=\sum_{j\in\mathbb Z}P(\lambda_j)\hat f_je_j.$$
By the Hausdorff–Young theorem, we get
$$\|P(T)f\|_p\le C_p\Big(\sum_{j\in\mathbb Z}|P(\lambda_j)|^q|\hat f_j|^q\Big)^{1/q}.$$

But this is a complete analog of (1.4.6), and we can proceed as in the proof of Proposition 1.4.7 by introducing a pseudo-spectral measure $\mu=\sum_{j\in\mathbb Z}|\hat f_j|^q\delta_{\lambda_j}$ and its regularized version $\hat\mu$ with the same kernel $Q(z,r)$. We arrive at the estimate
$$\|(A_n-A_m)f\|_p^q=\|(V_n-V_m)(T)f\|_p^q\le C_p^q\|V_n-V_m\|_{q,\mu}^q\le C_1C_p^q\,\hat\mu[1/m,1/n].$$
The estimate for covering numbers follows straightforwardly. Note that the proof works not only for rotations but also for all operators whose duals (with respect to a Fourier transform) act in $\ell^q$ as contractive multiplications. Any convolution operator with respect to a unit mass measure satisfies this condition.

For more general averages, such as averages of Dunford–Schwartz operators or of a contraction in $L^p$, we do not know whether an analogous formulation of (1.4.12) exists. This estimate cannot, however, be improved in general, as the following nice counterexample from Lifshits [1997] shows.

Lifshits' counterexample. Let $2\le p<\infty$ and let $U:L^p(\mathbb T)\to L^p(\mathbb T)$ be the multiplication operator defined for any $f\in L^p(\mathbb T)$ and any $\theta\in\mathbb T$ by $Uf(\theta)=e^{i\theta}f(\theta)$. We write
$$A_n=A_n^U=\frac{I+U+\dots+U^{n-1}}{n},$$
where $I$ is the identity operator. We shall prove for any $\varepsilon>0$ small enough that
$$\sup_{\|f\|_p=1}N(A(f),p,\varepsilon/3)\ge\varepsilon^{-p}.$$
Note that $A_nf(\theta)=V_n(\theta)f(\theta)$, so that for any positive integers $m,n$,
$$\|A_nf-A_mf\|_p^p=\int_{\mathbb T}|V_n(\theta)-V_m(\theta)|^p|f(\theta)|^p\,d\theta.\tag{1.4.14}$$

1.4 The spectral regularization inequality


Let $B$ be some fixed integer strictly greater than 12. From the standard estimates
$$|V_m(\theta)|\le\pi(m\theta)^{-1},\qquad|V_n(\theta)-1|\le\pi(n-1)\theta/4\le n\theta,$$
valid for any $m,n,\theta$, we deduce that if $B/m\le\theta\le B^2/m$ and $n\le B^{-3}m$, then $|V_n(\theta)-V_m(\theta)|\ge1/2$. It follows for any $f\in L^p(\mathbb T)$, any $m$ and any $n\le B^{-3}m$ that
$$\|A_nf-A_mf\|_p^p\ge2^{-p}\int_{B/m}^{B^2/m}|f(\theta)|^p\,d\theta.$$
In particular, for any $f\in L^p(\mathbb T)$ and any positive integers $l>t$,
$$\|A_{B^{3t}}f-A_{B^{3l}}f\|_p^p\ge2^{-p}\int_{B^{1-3l}}^{B^{2-3l}}|f(\theta)|^p\,d\theta.$$
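The lower bound $|V_n(\theta)-V_m(\theta)|\ge1/2$ on the window $B/m\le\theta\le B^2/m$ can be checked numerically. The sketch below assumes $V_n(\theta)=\frac1n\sum_{k=0}^{n-1}e^{ik\theta}$ (consistent with $A_nf=V_nf$ for the multiplication operator $U$) and picks the illustrative values $B=13$, $m=B^4$; these choices are ours.

```python
import cmath


def V(n, theta):
    """(1/n) * sum_{k<n} exp(i k theta), computed in closed form."""
    if theta == 0:
        return 1.0 + 0j
    return (cmath.exp(1j * n * theta) - 1) / (n * (cmath.exp(1j * theta) - 1))


B = 13                      # any integer > 12 is allowed by the text
m = B ** 4
n_max = m // B ** 3         # largest n with n <= B^{-3} m
lo, hi = B / m, B * B / m   # the window [B/m, B^2/m]

# Scan a grid of theta values in the window and all admissible n.
grid = [lo + i * (hi - lo) / 200 for i in range(201)]
worst = min(abs(V(n, theta) - V(m, theta))
            for n in range(1, n_max + 1)
            for theta in grid)
print(worst)
```

The minimum over the grid stays above $1/2$, in line with the two estimates: $V_m$ is already small on the window while $V_n$ is still close to 1.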

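Let $M$ be some positive integer and put $\varepsilon=M^{-1/p}$. Set
$$f(\theta)=\sum_{l=1}^{M}\Big(\frac{1}{M(B^{2-3l}-B^{1-3l})}\Big)^{1/p}\mathbf 1_{[B^{1-3l},\,B^{2-3l}]}(\theta).$$
Then $\|f\|_p=1$ and
$$\int_{B^{1-3l}}^{B^{2-3l}}|f(\theta)|^p\,d\theta=\frac1M=\varepsilon^p,\qquad l=1,\dots,M.$$
Thus
$$\|A_{B^{3t}}f-A_{B^{3l}}f\|_p^p\ge2^{-p}\varepsilon^p,\qquad1\le t<l\le M.$$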

We deduce from these calculations that $N(A(f),p,\varepsilon/3)\ge M=\varepsilon^{-p}$, as claimed.

A variant in $L^1$. There is a general estimate of a weaker form of the square function in $L^1$, which is due to Jones, Rosenblatt and Wierdl [1999: Theorem 2.3] and can be stated as follows. Let $(X,\mathcal A,\mu)$ be a probability space. Consider mappings $T_n:L^1(\mu)\to L^1(\mu)$ and assume that each is strongly positive in the sense that $T_nf\ge0$ for all $f\in L^1(\mu)$. We also assume that each $T_n$ is positively homogeneous, which means that $T_n(cf)=cT_nf$ for nonnegative $c$ and $f\in L^1(\mu)$. For instance, $T_n$ can be the absolute value of any linear operator from $L^1(\mu)$ to $L^1(\mu)$. Let $Sf(x)=\big(\sum_{n=1}^{\infty}T_nf(x)^2\big)^{1/2}$. Then
$$\sup_{\lambda\ge0}\ \sup_{\|f\|_1\le1}\lambda\sum_{n=1}^{\infty}\mu\{|T_nf|\ge\lambda\}\le C\ \Longrightarrow\ \sup_{\lambda\ge0}\ \sup_{\|f\|_1\le1}\lambda\,\mu\{Sf\ge\lambda\}\le10C.\tag{1.4.15}$$


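The proof is rather elementary. As $Sf\le S_1f+S_2f$, where
$$S_1f(x)=\Big(\sum_{n=1}^{\infty}(T_nf(x))^2\,\mathbf 1_{\{T_nf\le1\}}(x)\Big)^{1/2},\qquad S_2f(x)=\Big(\sum_{n=1}^{\infty}(T_nf(x))^2\,\mathbf 1_{\{T_nf>1\}}(x)\Big)^{1/2},$$
we get
$$\begin{aligned}
\mu\{Sf\ge2\}&\le\mu\{S_1f\ge1\}+\mu\{S_2f\ge1\}\\
&\le\mu\{S_1f\ge1\}+\mu\Big\{\sum_{n=1}^{\infty}(T_nf)^2\,\mathbf 1_{\{T_nf>1\}}\ge1\Big\}\\
&\le\mu\{S_1f\ge1\}+\sum_{n=1}^{\infty}\mu\{T_nf>1\}\le\mu\{(S_1f)^2\ge1\}+C\|f\|_1\\
&=\mu\Big\{\sum_{n=1}^{\infty}(T_nf(x))^2\sum_{k=0}^{\infty}\mathbf 1_{\{2^{-k-1}\le T_nf\le2^{-k}\}}\ge1\Big\}+C\|f\|_1\\
&\le\mu\Big\{\sum_{k=0}^{\infty}2^{-2k}\sum_{n=1}^{\infty}\mathbf 1_{\{2^{-k-1}\le T_nf\le2^{-k}\}}\ge1\Big\}+C\|f\|_1\\
&\le\sum_{k=0}^{\infty}2^{-2k}\sum_{n=1}^{\infty}\mu\{T_nf\ge2^{-k-1}\}+C\|f\|_1\le5C\|f\|_1.
\end{aligned}$$
Let $t>0$. Replacing now $f$ by $f/t$ gives $t\,\mu\{Sf\ge2t\}\le5C\|f\|_1$; hence
$$\sup_{\lambda\ge0}\lambda\,\mu\{Sf\ge\lambda\}\le10C\|f\|_1.$$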

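Extensions to the Hilbert transform. Results of the previous section have extensions to the discrete bilateral Hilbert transform $H_n(f)=\sum_{0<\cdots}U^j(f)/j$ […]

[…] Let $\varepsilon>0$ be fixed. As $\tau$ has a limit from the right, there exists $\eta>0$ such that
$$u_d<u<u_d+\eta\ \Longrightarrow\ |\tau(u)-\tau(u_d+0)|<\varepsilon.$$
And thus
$$\begin{aligned}
|\tau(u_d+0)|&=\Big|\frac1\eta\int_{-\pi}^{u_d+\eta}\tau(u)\,du-\frac1\eta\int_{-\pi}^{u_d}\tau(u)\,du-\tau(u_d+0)\Big|\\
&=\Big|\frac1\eta\int_{u_d}^{u_d+\eta}\big(\tau(u)-\tau(u_d+0)\big)\,du\Big|\\
&\le\frac1\eta\int_{u_d}^{u_d+\eta}|\tau(u)-\tau(u_d+0)|\,du\le\varepsilon.
\end{aligned}$$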

As $\varepsilon$ is arbitrary, we also deduce that $\tau(u_d+0)=0$. This shows that at any point $t$ of the interval $[-\pi,\pi[$, we have $\tau(t)=0$. Said differently, $\bar\sigma(t)=\sigma(t)$ for any $t\in[-\pi,\pi[$, as claimed. Equation (E1) thus admits, under the normalization conditions (2.2.3), a unique solution, namely the function $\sigma(t,f)$ previously defined.

Introduce then, for any $t\in[-\pi,\pi]$, the function
$$\sigma(t,f,g)=\frac14\sigma(t,f+g)-\frac14\sigma(t,f-g)+\frac i4\sigma(t,f+ig)-\frac i4\sigma(t,f-ig).$$
By successively replacing in equation (E1) $f$ by $f+g$, $f-g$, $f+ig$ and $f-ig$, we easily verify that
$$\langle U^kf,g\rangle=\int_{-\pi}^{\pi}e^{ikt}\,d\sigma(t,f,g)\qquad(\forall k\in\mathbb Z).\tag{E2}$$
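The passage from $\sigma(t,f)$ to $\sigma(t,f,g)$ is the usual polarization identity. The sketch below illustrates it in finite dimension, with the quadratic form $h\mapsto\langle Uh,h\rangle$ standing in for $\int e^{ikt}\,d\sigma(t,h)$; the matrix `U`, the vectors and the helper names are illustrative assumptions, and the inner product is taken linear in its first argument.

```python
import random


def matvec(U, f):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(row[j] * f[j] for j in range(len(f))) for row in U]


def inner(f, g):
    """Inner product, linear in f and conjugate-linear in g."""
    return sum(a * b.conjugate() for a, b in zip(f, g))


def q(U, h):
    """Quadratic form <U h, h>."""
    return inner(matvec(U, h), h)


def combine(f, g, c):
    """Return the vector f + c * g."""
    return [a + c * b for a, b in zip(f, g)]


def polarized(U, f, g):
    """Recover <U f, g> from the quadratic form by polarization."""
    return (q(U, combine(f, g, 1)) - q(U, combine(f, g, -1))
            + 1j * q(U, combine(f, g, 1j)) - 1j * q(U, combine(f, g, -1j))) / 4


random.seed(1)
rand_c = lambda: complex(random.gauss(0, 1), random.gauss(0, 1))
U = [[rand_c() for _ in range(3)] for _ in range(3)]
f = [rand_c() for _ in range(3)]
g = [rand_c() for _ in range(3)]

print(abs(polarized(U, f, g) - inner(matvec(U, f), g)))
```

The identity holds for any linear operator, so unitarity of $U$ is not needed for this particular step.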

We have thus obtained a representation of Fourier–Stieltjes transform type of the quantities $\langle U^kf,g\rangle$. We are now going to show that the mapping $(f,g)\mapsto\sigma(t,f,g)$


2.2 Spectral representation of unitary operators

is a bilinear form on $H$, with norm less than 1. Let $f=a_1f_1+a_2f_2$. Then $\langle U^kf,g\rangle=a_1\langle U^kf_1,g\rangle+a_2\langle U^kf_2,g\rangle$, and thus, for all $k\in\mathbb Z$,
$$\int_{-\pi}^{\pi}e^{ikt}\,d\sigma(t,f,g)=a_1\int_{-\pi}^{\pi}e^{ikt}\,d\sigma(t,f_1,g)+a_2\int_{-\pi}^{\pi}e^{ikt}\,d\sigma(t,f_2,g).$$

This shows that
$$\sigma(t,f,g)=a_1\sigma(t,f_1,g)+a_2\sigma(t,f_2,g).\tag{L1}$$
Thus $f\mapsto\sigma(t,f,g)$ is linear. Further,
$$\langle g,U^kf\rangle=\langle U^{-k}g,f\rangle=\int_{-\pi}^{\pi}e^{-ikt}\,d\sigma(t,g,f)$$
and
$$\langle g,U^kf\rangle=\overline{\langle U^kf,g\rangle}=\int_{-\pi}^{\pi}e^{-ikt}\,d\overline{\sigma(t,f,g)}.$$
Consequently, for any integer $k\in\mathbb Z$,
$$\int_{-\pi}^{\pi}e^{-ikt}\,d\sigma(t,g,f)=\int_{-\pi}^{\pi}e^{-ikt}\,d\overline{\sigma(t,f,g)}.$$
This shows that $\sigma(t,g,f)=\overline{\sigma(t,f,g)}$. From this relation and from the linearity in $f$ of $\sigma(t,f,g)$, it follows that
$$\sigma(t,f,b_1g_1+b_2g_2)=\bar b_1\sigma(t,f,g_1)+\bar b_2\sigma(t,f,g_2).\tag{L2}$$
Since the map $t\mapsto\sigma(t,f,f)$ is nondecreasing and $\sigma(-\pi,f,f)=0$, we have
$$\sigma(t,f,f)\le\sigma(\pi,f,f)=\int_{-\pi}^{\pi}d\sigma(t,f,f)=\langle f,f\rangle.\tag{L3}$$

We now need the following result.

2.2.2 Lemma. Let $\varphi:H\times H\to\mathbb C$ be a mapping satisfying the following properties:
(a) $\varphi(a_1f_1+a_2f_2,g)=a_1\varphi(f_1,g)+a_2\varphi(f_2,g)$,
(b) $\varphi(f,b_1g_1+b_2g_2)=\bar b_1\varphi(f,g_1)+\bar b_2\varphi(f,g_2)$,
(c) $|\varphi(f,f)|\le C\|f\|^2$,
(d) $|\varphi(f,g)|=|\varphi(g,f)|$,
where $C$ is a constant, $f,g,f_1,f_2,g_1,g_2$ are arbitrary elements of $H$, and $a_1,a_2,b_1,b_2$ are arbitrary complex numbers. Then $\varphi$ is a bilinear form on $H$ of norm less than or equal to $C$.


2 Spectral representation of weakly stationary processes

Proof. From (a) and (b) it follows that
$$\varphi(f,h)+\varphi(h,f)=\frac12\big[\varphi(f+h,f+h)-\varphi(f-h,f-h)\big].$$
Consequently, by (c),
$$|\varphi(f,h)+\varphi(h,f)|\le\frac C2\big[\|f+h\|^2+\|f-h\|^2\big]=C\big[\|f\|^2+\|h\|^2\big].\tag{2.2.7}$$

Let $f$ and $g$ be two elements of $H$ such that $\max(\|f\|,\|g\|)\le1$, and let $h=\lambda g$, where $\lambda$ is a complex number such that $|\lambda|=1$. Then (2.2.7) implies
$$|\bar\lambda\varphi(f,g)+\lambda\varphi(g,f)|\le2C.\tag{2.2.8}$$
We assume that $\varphi(f,g)\neq0$. Then by (d),
$$\varphi(f,g)=|\varphi(f,g)|e^{ia},\qquad\varphi(g,f)=|\varphi(f,g)|e^{ib}.$$
And by means of (2.2.8),
$$|\varphi(f,g)|\cdot|\bar\lambda e^{ia}+\lambda e^{ib}|\le2C.$$
Choose $\lambda=e^{i\frac{a-b}2}$. We obtain
$$\bar\lambda e^{ia}+\lambda e^{ib}=e^{i\frac{a+b}2}+e^{i\frac{a+b}2}=2e^{i\frac{a+b}2}.$$
And this shows
$$|\varphi(f,g)|\le C\qquad(\|f\|\le1,\ \|g\|\le1).$$

Hence the lemma, since for $\varphi(f,g)=0$ the inequality is trivially satisfied.

Relations (L1), (L2) and (L3) thus imply, by virtue of the lemma we have just proved, that $\sigma(t,\cdot,\cdot)$ is a continuous bilinear form on $H$ with norm less than or equal to 1. We also indicate a simple consequence of Lemma 2.2.2.

2.2.3 Corollary. Let $\varphi:H\times H\to\mathbb C$ be a bilinear form on $H$ verifying the following condition: for any elements $f$ and $g$ of $H$, $|\varphi(f,g)|=|\varphi(g,f)|$. Then
$$\|\varphi\|=\sup_{f\in H}\frac{|\varphi(f,f)|}{\langle f,f\rangle}.$$

Proof. It follows from Lemma 2.2.2 that
$$\|\varphi\|\le\sup_{f\in H}\frac{|\varphi(f,f)|}{\langle f,f\rangle}.$$
Conversely, we also have
$$\sup_{f\in H}\frac{|\varphi(f,f)|}{\langle f,f\rangle}\le\sup_{f,g\in H}\frac{|\varphi(f,g)|}{\|f\|\,\|g\|}=\|\varphi\|.$$

Hence, the corollary is proved.

The lemma below gives a representation of bilinear forms.

2.2.4 Lemma. Let $\phi:H\times H\to\mathbb C$ be a continuous bilinear form on $H$. Then there exists a continuous operator $A:H\to H$ such that for all $f,g\in H$, $\phi(f,g)=\langle Af,g\rangle$. Moreover, $\|A\|=\|\phi\|$.

This is a straightforward application of the well-known Riesz–Fréchet theorem:

2.2.5 Lemma. Any continuous linear form $\phi$ on $H$ can be expressed in the form $\phi(h)=\langle h,h_\phi\rangle$, where $h_\phi\in H$ is uniquely determined by $\phi$. Further, $\|\phi\|=\|h_\phi\|$.

Proof. We know that $\operatorname{Ker}(\phi):=\{g\in H:\phi(g)=0\}$ is a closed subspace of $H$. The claimed result is obvious if $\operatorname{Ker}(\phi)=H$. Now, if $\operatorname{Ker}(\phi)\neq H$, let $g$ be a nonzero element of $H\ominus\operatorname{Ker}(\phi)$. Consider the elements of the form $\phi(h)g-\phi(g)h$, $h\in H$. As
$$\phi\big(\phi(h)g-\phi(g)h\big)=\phi(g)\phi(h)-\phi(h)\phi(g)=0,$$
these elements belong to $\operatorname{Ker}(\phi)$. Since $g\in H\ominus\operatorname{Ker}(\phi)$, we have $\langle\phi(h)g-\phi(g)h,g\rangle=0$, and thus
$$\phi(h)=\Big\langle h,\frac{\overline{\phi(g)}}{\langle g,g\rangle}\,g\Big\rangle.$$
Thus $\phi(h)=\langle h,h_\phi\rangle$ for any element $h$ of $H$, and this representation is obviously unique. Finally, from the equation $\phi(h)=\langle h,h_\phi\rangle$ it follows that $|\phi(h)|\le\|h\|\,\|h_\phi\|$. Hence $\|\phi\|\le\|h_\phi\|$. And for $h=h_\phi$, we obtain $\phi(h)=\|h_\phi\|^2$. Thus also $\|\phi\|\ge\|h_\phi\|$. This completes the proof of the Riesz–Fréchet theorem.
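The construction in the proof can be followed on a concrete example. The sketch below works in $\mathbb C^3$ with a linear form $\phi(h)=\sum_ic_ih_i$ (the coefficients `c` and the test vector are arbitrary illustrative choices) and builds the representing vector from an element $g$ orthogonal to the kernel, using the formula $h_\phi=\overline{\phi(g)}\,g/\langle g,g\rangle$ obtained in the proof.

```python
import math


def inner(f, g):
    """Inner product, linear in f and conjugate-linear in g."""
    return sum(a * b.conjugate() for a, b in zip(f, g))


c = [2 + 1j, -1j, 0.5 + 0j]            # coefficients of the linear form


def phi(h):
    """Linear form phi(h) = sum_i c_i h_i."""
    return sum(ci * hi for ci, hi in zip(c, h))


# Any nonzero multiple of conj(c) is orthogonal to Ker(phi).
g = [3j * ci.conjugate() for ci in c]

# Representing vector from the proof.
coef = phi(g).conjugate() / inner(g, g)
h_phi = [coef * gi for gi in g]

h = [1 - 2j, 0.25 + 0j, 4j]             # arbitrary test vector
print(abs(phi(h) - inner(h, h_phi)))    # representation phi(h) = <h, h_phi>
print(math.sqrt(inner(h_phi, h_phi).real))  # norm of h_phi
```

Whatever multiple of $\bar c$ is chosen for $g$, the resulting $h_\phi$ is $\bar c$ itself, which reflects the uniqueness statement of the lemma.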


Now we can easily deduce Lemma 2.2.4.

Proof of Lemma 2.2.4. Fix some $f$ in $H$. The mapping $g\mapsto\overline{\varphi(f,g)}$ defines a continuous linear form on $H$. Thus, by virtue of Lemma 2.2.5, there exists a unique $f_\varphi$ for which we have $\overline{\varphi(f,g)}=\langle g,f_\varphi\rangle$, or $\varphi(f,g)=\langle f_\varphi,g\rangle$. Define the operator $A$ by the equation $Af=f_\varphi$. Then $\varphi(f,g)=\langle Af,g\rangle$, and since $\varphi(a_1f_1+a_2f_2,g)=a_1\varphi(f_1,g)+a_2\varphi(f_2,g)$, we also have
$$\langle A(a_1f_1+a_2f_2)-a_1A(f_1)-a_2A(f_2),\,g\rangle=0$$
for any $g\in H$. Consequently, $A(a_1f_1+a_2f_2)=a_1A(f_1)+a_2A(f_2)$, and this shows that $A$ is a linear operator on $H$. Finally,
$$\|\varphi\|=\sup_{f,g\in H}\frac{|\varphi(f,g)|}{\|f\|\,\|g\|}=\sup_{f,g\in H}\frac{|\langle Af,g\rangle|}{\|f\|\,\|g\|}\le\sup_{f\in H}\frac{\|Af\|}{\|f\|},$$
and
$$\|\varphi\|=\sup_{f,g\in H}\frac{|\varphi(f,g)|}{\|f\|\,\|g\|}\ge\sup_{f\in H}\frac{|\langle Af,Af\rangle|}{\|f\|\,\|Af\|}=\sup_{f\in H}\frac{\|Af\|}{\|f\|}.$$
These relations imply that $A$ is continuous and $\|\varphi\|=\|A\|$. Finally, if $B$ is another operator on $H$ for which $\varphi(f,g)=\langle Bf,g\rangle$ for any $f$ and $g$ in $H$, then $\langle Af-Bf,g\rangle=0$. Thus $A=B$, whence the uniqueness of $A$.

2.2.6 Self-adjoint operators. There are some easy consequences of Lemma 2.2.4 that will be useful in the sequel. A bounded linear operator $A$ on $H$ induces, by means of the expression $\langle f,Ag\rangle$, a bilinear form on $H$ with norm $\|A\|$. By Lemma 2.2.4, we deduce that there exists an operator $A^*$ on $H$ with norm $\|A^*\|=\|A\|$ such that
$$\langle f,Ag\rangle=\langle A^*f,g\rangle.$$


This operator is by definition the adjoint operator of $A$, and one can easily verify that $A^{**}=(A^*)^*=A$. If $A$ is a bounded operator and $A^*=A$, then we say that $A$ is self-adjoint. For a bounded self-adjoint operator $A$, we have the relation
$$\sup_{\|f\|=\|g\|=1}|\langle Af,g\rangle|=\sup_{\|f\|=1}|\langle Af,f\rangle|.$$
Indeed, the bilinear form $\varphi(f,g)=\langle Af,g\rangle$ verifies the condition of Corollary 2.2.3, namely $|\varphi(f,g)|=|\varphi(g,f)|$. The result therefore follows from this corollary.

There thus exists a family $(E_t)_{-\pi\le t\,\cdots}$ […]

[…]
$$\cdots>0\ \Longrightarrow\ \mu\Big(\bigcup_{k=1}^{\infty}\tau^{-k}A\Big)=1.\tag{3.2.2}$$


Indeed, write $\check A=\bigcup_{k=1}^{\infty}\tau^{-k}A$ and let $\omega\in\tau^{-1}\check A$. Then $\tau\omega\in\tau^{-k}A$ for some $k\ge1$, which means that $\tau^k(\tau\omega)=\tau^{k+1}\omega\in A$; hence $\omega\in\check A$. Thereby $\tau^{-1}\check A\subset\check A$, and by iterating this, $\tau^{-k}\check A\subset\check A$ for any positive integer $k$. But this has some consequences. By specifying Proposition 3.2.3 for indicator functions $f=\mathbf 1_B$, $g=\mathbf 1_C$, we get
$$\lim_{n\to\infty}\frac1n\sum_{k=0}^{n-1}\mu(\tau^{-k}B\cap C)=\mu(B)\mu(C).\tag{3.2.3}$$
Applying this with $B=\check A=C$ gives
$$\frac1n\sum_{k=0}^{n-1}\mu(\tau^{-k}\check A\cap\check A)=\frac1n\sum_{k=0}^{n-1}\mu(\tau^{-k}\check A)=\mu(\check A)=\mu(\check A)^2.$$
Hence $\mu(\check A)=0$ or $1$. As $\mu(\check A)\ge\mu(\tau^{-1}A)=\mu(A)>0$, we obtain (3.2.2).

3.3 Weak mixing, strong mixing, continuous spectrum

Let $(X,\mathcal A,\mu,T)$ be a measurable dynamical system and denote again by $U_T$ the operator on $L^2(\mu)$ defined by $U_Tf(x)=f(Tx)$. An equivalent reformulation of Proposition 3.2.3 is

3.3.1 Lemma. The dynamical system $(X,\mathcal A,\mu,T)$ is ergodic if and only if
$$\forall A,B\in\mathcal A,\qquad\lim_{n\to\infty}\frac1n\sum_{k=0}^{n-1}\mu(A\cap T^{-k}B)=\mu(A)\mu(B).\tag{3.3.1}$$

This means that the space $X$ is well mixed under the action of $T$. We shall strengthen this notion of mixing by replacing the convergence in mean in the Cesàro sense by stronger modes of convergence.

3.3.2 Definition. (a) The dynamical system $(X,\mathcal A,\mu,T)$ is weakly mixing if
$$\forall A,B\in\mathcal A,\qquad\lim_{n\to\infty}\frac1n\sum_{k=0}^{n-1}\big|\mu(A\cap T^{-k}B)-\mu(A)\mu(B)\big|=0.\tag{3.3.2}$$
The condition is equivalent to: $\lim_{n\to\infty}\frac1n\sum_{k=0}^{n-1}\big|\langle U_T^kf,g\rangle-\langle f,1\rangle\langle g,1\rangle\big|=0$, for all $f,g\in L^2(\mu)$.

(b) A dynamical system $(X,\mathcal A,\mu,T)$ is strongly mixing if
$$\forall A,B\in\mathcal A,\qquad\lim_{n\to\infty}\mu(A\cap T^{-n}B)=\mu(A)\mu(B),\tag{3.3.3}$$
or equivalently $\lim_{n\to\infty}\langle U_T^nf,g\rangle=\langle f,1\rangle\langle g,1\rangle$, for all $f,g\in L^2(\mu)$.


3 Dynamical systems – ergodicity and mixing

(c) The dynamical system $(X,\mathcal A,\mu,T)$ has continuous spectrum when the only eigenfunctions of $U_T$ are constants:
$$\forall f\in L^2(\mu),\qquad U_Tf=f\ \Longrightarrow\ f=\text{constant}.\tag{3.3.4}$$

(d) The dynamical system $(X,\mathcal A,\mu,T)$ has discrete spectrum when the eigenfunctions of $U_T$ span $L^2(\mu)$.

(e) Let $k$ be some positive integer. A dynamical system $(X,\mathcal A,\mu,T)$ is $k$-mixing if for any choice of measurable sets $A_1,\dots,A_k$,
$$\lim_{\substack{\min(n_1,\dots,n_k)\to\infty\\ \min(|n_i-n_j|,\ i\neq j)\to\infty}}\mu\big(T^{-n_1}A_1\cap\dots\cap T^{-n_k}A_k\big)=\mu(A_1)\cdots\mu(A_k).$$

(f) The system, or more simply the endomorphism $T$, is completely mixing when $\|U_T^nf\|_1$ tends to 0 for all $f\in L^1(\mu)$ with $\int_Xf\,d\mu=0$.

The following fact is implicit in the above definitions: in order for a measurable dynamical system $(X,\mathcal A,\mu,T)$ to be ergodic, weakly mixing or mixing, it is sufficient (and obviously necessary) that the property be satisfied for a countable class of functions which is dense in $L^2(\mu)$. Note also that if $T$ is weakly mixing, then so is $T^k$ for any $k$; a similar comment trivially applies to the strong mixing property.

Let $A$ be a set of nonnegative integers. The lower (resp. upper) density $\underline d(A)$ (resp. $\bar d(A)$) of $A$ is defined by
$$\underline d(A)=\liminf_{J\to\infty}\frac{\#\{A\cap[1,J]\}}{J},\qquad\bar d(A)=\limsup_{J\to\infty}\frac{\#\{A\cap[1,J]\}}{J}.\tag{3.3.5}$$

We say that $A$ has a density $d(A)$ if $\underline d(A)=\bar d(A)$. The following theorem, which links these notions, is very useful.

3.3.3 Theorem. For a measurable dynamical system, the following conditions are equivalent.
a) The dynamical system $(X,\mathcal A,\mu,T)$ is weakly mixing.
b) The product dynamical system $(X\times X,\mathcal A\otimes\mathcal A,\mu\times\mu,T\times T)$ is ergodic.
c) For any $f,g\in L^2(\mu)$ with $\langle f,1\rangle=\langle g,1\rangle=0$, there exists a nondecreasing sequence $S$ with density 1 such that
$$\lim_{\substack{n\to\infty\\ n\in S}}\langle U_T^nf,g\rangle=0.$$
d) The dynamical system $(X,\mathcal A,\mu,T)$ has continuous spectrum.
Moreover, when the σ-algebra $\mathcal A$ is countably generated, conditions a), b), c) and d) are also equivalent to the following condition:


e) There exists a nondecreasing sequence $M$ with density 1 such that for any $f,g\in L^2(\mu)$ with $\langle f,1\rangle=\langle g,1\rangle=0$,
$$\lim_{\substack{n\to\infty\\ n\in M}}\langle U_T^nf,g\rangle=0.$$
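A standard example separating ergodicity from weak mixing is an irrational rotation $Tx=x+\alpha \pmod 1$: for $f=g=e^{2i\pi x}$ one has $\langle U_T^nf,g\rangle=e^{2i\pi n\alpha}$, whose Cesàro means tend to 0 while its modulus is constantly 1. The sketch below only evaluates these correlations numerically; the rotation angle and the number of terms are illustrative choices.

```python
import cmath
import math

alpha = math.sqrt(2) - 1        # irrational rotation angle (illustrative)
n = 5000

# Correlations <U_T^k f, f> for f(x) = exp(2 i pi x): they equal exp(2 i pi k alpha).
corr = [cmath.exp(2j * math.pi * k * alpha) for k in range(n)]

mean_plain = abs(sum(corr)) / n            # Cesaro mean of the correlations
mean_abs = sum(abs(c) for c in corr) / n   # Cesaro mean of their moduli

print(mean_plain)   # small: consistent with ergodicity
print(mean_abs)     # equal to 1 up to rounding: no convergence in density to 0
```

Here $f$ is a non-constant eigenfunction of $U_T$, so the failure of weak mixing is exactly what the equivalence with condition d) predicts.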

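The last equivalence follows from a theorem due to Jones [1972].

Proof. The product dynamical system $(X\times X,\mathcal A\otimes\mathcal A,\mu\times\mu,T\times T)$ is ergodic if and only if
$$\forall f,g\in L^2(\mu\times\mu),\ \langle f,1\rangle=\langle g,1\rangle=0,\qquad\lim_{n\to\infty}\frac1n\sum_{k=0}^{n-1}\langle U_{T\times T}^kf,g\rangle=0.\tag{3.3.6}$$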

As linear combinations of functions of type $f(x,y)=f_1(x)f_2(y)$, $f_1,f_2\in L^2(\mu)$, are dense in $L^2(\mu\times\mu)$, and since averages of contractions are contractions in $L^2(\mu)$, we deduce that the product dynamical system is ergodic if and only if
$$\forall f_1,f_2,g,g'\in L^2(\mu)\cap1^{\perp}_{\mu},\qquad\lim_{n\to\infty}\frac1n\sum_{k=0}^{n-1}\langle U_T^kf_1,g\rangle\langle U_T^kf_2,g'\rangle=0.\tag{3.3.7}$$
By the Cauchy–Schwarz inequality, we get
$$\Big|\frac1n\sum_{k=0}^{n-1}\langle U_T^kf_1,g\rangle\langle U_T^kf_2,g'\rangle\Big|^2\le\Big(\frac1n\sum_{k=0}^{n-1}|\langle U_T^kf_1,g\rangle|^2\Big)\Big(\frac1n\sum_{k=0}^{n-1}|\langle U_T^kf_2,g'\rangle|^2\Big).$$
To obtain (3.3.7), it thus suffices to prove
$$\forall f,g\in L^2(\mu)\cap1^{\perp}_{\mu},\qquad\lim_{n\to\infty}\frac1n\sum_{k=0}^{n-1}|\langle U_T^kf,g\rangle|^2=0.\tag{3.3.8}$$
By choosing $f_1=f_2$, $g=g'$ in (3.3.7), we observe that (3.3.7) and (3.3.8) are in turn equivalent, and equivalent to the ergodicity of the product dynamical system. The equivalence between properties a), b) and c) will be deduced from the following lemma.

3.3.4 Lemma. Let $\{a_n,\ n\ge0\}$ be a bounded sequence of positive reals. The following conditions are equivalent:
(a) $\lim_{n\to\infty}\frac1n\sum_{k=0}^{n-1}a_k=0$,
(b) $\lim_{n\to\infty}\frac1n\sum_{k=0}^{n-1}a_k^2=0$,
(c) There exists an increasing sequence of positive integers $S$, with density 1, such that $\lim_{n\to\infty,\ n\in S}a_n=0$.
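A toy sequence makes the three conditions of Lemma 3.3.4 concrete. In the sketch below, $a_n=1$ when $n$ is a power of two and $a_n=0$ otherwise (an illustrative choice of ours, with nonnegative rather than strictly positive terms): the Cesàro means of $a_n$ and $a_n^2$ are small, and $a_n$ vanishes along the set $S$ of non-powers of two, which has density close to 1 on the range examined, although $a_n$ itself does not converge.

```python
def a(n):
    """a_n = 1 if n is a power of two, 0 otherwise."""
    return 1.0 if n >= 1 and (n & (n - 1)) == 0 else 0.0


n = 1 << 16

mean = sum(a(k) for k in range(n)) / n            # condition (a)
mean_sq = sum(a(k) ** 2 for k in range(n)) / n    # condition (b)

# S = integers that are not powers of two; a_n = 0 on S, as in condition (c).
density_S = sum(1 for k in range(1, n + 1) if a(k) == 0.0) / n

print(mean, mean_sq, density_S)
print(a(2 ** 15))   # the full sequence still takes the value 1 infinitely often
```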


Note that if (a) holds, then for any positive integer $j$, $\lim_{n\to\infty}\frac1n\sum_{k=0}^{n-1}a_{jk}=0$; and so by (c), for any positive integer $j$ there exists a sequence of integers $S$ of density 1 such that $\lim_{n\to\infty,\ n\in S}a_{jn}=0$. This seems less easy to deduce directly from (c).

Proof of Lemma 3.3.4. (1) Let $M=\sup_{n\ge0}a_n$. Then $\frac1n\sum_{k=0}^{n-1}a_k^2\le M\,\frac1n\sum_{k=0}^{n-1}a_k$; hence the implication (a) ⇒ (b). Conversely, by the Cauchy–Schwarz inequality, $\big(\frac1n\sum_{k=0}^{n-1}a_k\big)^2\le\frac1n\sum_{k=0}^{n-1}a_k^2$; hence the equivalence (a) ⇔ (b).

(2) Now, we show the implication (c) ⇒ (a). Let $\varepsilon>0$ and let $N_\varepsilon$ be such that for any $n\in[N_\varepsilon,\infty)\cap S$, $a_n<\varepsilon$. For sufficiently large $n$, we have
$$\frac1n\sum_{k=0}^{n-1}a_k=\frac1n\sum_{k=0}^{N_\varepsilon}a_k+\frac1n\sum_{k=N_\varepsilon+1}^{n-1}\big(\mathbf 1_S(k)+\mathbf 1_{S^c}(k)\big)a_k\le M\,\frac{N_\varepsilon+1}{n}+\varepsilon+M\,\frac1n\sum_{k=0}^{n-1}\mathbf 1_{S^c}(k)\le3\varepsilon,$$
since $d(S^c)=0$. As $\varepsilon$ was arbitrary, we obtain (a).

(3) Finally, consider the implication (a) ⇒ (c). There is no loss in assuming that $a_n>0$ infinitely often. For any integer $k\ge1$, we put
$$N_k=\sup\Big\{N:\ \frac1N\sum_{j=0}^{N-1}a_j>\frac1{k^2}\Big\}.$$

The sequence $\{N_k,\ k\ge1\}$ is nondecreasing and unbounded. Further, for any $N>N_k$, we have $\frac1N\sum_{j=0}^{N-1}a_j\le\frac1{k^2}$. Let $\kappa=\{k_p,\ p\ge1\}$ be the strictly increasing sequence of integers defined by
$$N_{k_p}<N_{k_{p+1}}\qquad\text{and}\qquad k_p\le k<k_{p+1}\ \Longrightarrow\ N_k=N_{k_p}.$$
Set
$$S^c=\bigcup_{p=1}^{\infty}\Big([N_{k_p},N_{k_{p+1}}[\ \cap\ \Big\{n:\ a_n>\frac1{k_p}\Big\}\Big).\tag{3.3.9}$$
Let $j\in S$ with $j\ge N_{k_1}$, and let $\tilde p=p(j)$ be the unique integer such that $j\in[N_{k_{\tilde p}},N_{k_{\tilde p+1}}[$. For this value of $\tilde p$, as $j\notin S^c$, we have $a_j\le\frac1{k_{\tilde p}}$. Consequently,
$$\lim_{\substack{j\to\infty\\ j\in S}}a_j=0.$$
It remains to show that $d(S)=1$. Let $N$ be some positive integer, and let $p$ be defined by $N\in[N_{k_p},N_{k_{p+1}}[$. Let $n\le N$, $n\in S^c$. Then for some $p'\le p$, we have $n\in[N_{k_{p'}},N_{k_{p'+1}}[$, and by definition of $S^c$, $a_n>\frac1{k_{p'}}\ge\frac1{k_p}$. Thus
$$\frac1N\sum_{n=0}^{N-1}\mathbf 1_{S^c}(n)\le k_p\,\frac1N\sum_{n=0}^{N-1}a_n\le k_p\,\frac1{k_p^2}=\frac1{k_p}\to0,$$
as $p$ (in fact $N$) tends to infinity; hence $d(S^c)=0$.


Continuation of the proof of Theorem 3.3.3. As, by the Cauchy–Schwarz inequality, the sequence $a_n=|\langle U_T^nf,g\rangle|$, $n\ge1$, is bounded, the fact that assertions a), b) and c) are equivalent is now easily deduced from Lemma 3.3.4.

We shall now show the equivalence between a), b), c) and e) when the σ-algebra $\mathcal A$ has a countable basis. Let $A_1,A_2,\dots$ be a sequence of measurable sets generating $\mathcal A$, and denote by $\mathcal B$ the Boolean algebra generated by this sequence. For any integer $k\ge1$, let $\mathcal B_k$ be the Boolean algebra generated by $A_1,A_2,\dots,A_k$. From the previous step, we deduce for each $k$ that there exists a nondecreasing sequence of integers $M_k$ with density 1 such that
$$\forall A,B\in\mathcal B_k,\qquad\lim_{\substack{n\to\infty\\ n\in M_k}}\mu(A\cap T^{-n}B)=\mu(A)\mu(B).$$
We shall build a sequence $n_1<n_2<\cdots$ growing sufficiently fast for the set
$$M=\bigcup_{k\ge1}\big([n_k,n_{k+1}[\ \cap\ M_1\cap M_2\cap\dots\cap M_k\big)$$
to be of density 1, and moreover to have $M_k\supset\{m\in M:\ m\ge n_k\}$ for each $k$. Put
$$\forall k\ge1,\qquad n_k=\inf\Big\{n:\ \sup_{1\le j\le k}\ \sup_{N>n}\frac1N\sum_{i=1}^{N}\mathbf 1_{M_j^c}(i)\le\frac1{k^2}\Big\}.$$
Then for any $N>n_k$ and $1\le j\le k$,
$$\frac1N\sum_{i=1}^{N}\mathbf 1_{M_j^c}(i)\le\frac1{k^2}.$$
Let $N$ be some arbitrary integer, and $k$ such that $N\in[n_k,n_{k+1}[$. Let $n\le N$, $n\in M^c$; then $n\in[n_l,n_{l+1}[$ for some $l\le k$. Since
$$M^c=\bigcap_{k\ge1}\Big(\big([n_k,n_{k+1}[\big)^c\ \cup\ \bigcup_{j=1}^{k}M_j^c\Big),$$
for this value of $l$ we have
$$n\in\bigcup_{j=1}^{l}M_j^c\subseteq\bigcup_{j=1}^{k}M_j^c.$$
Thus,
$$\frac1N\sum_{n=1}^{N}\mathbf 1_{M^c}(n)\le\frac1N\sum_{n=1}^{N}\mathbf 1_{\bigcup_{j=1}^{k}M_j^c}(n)\le\sum_{j=1}^{k}\frac1N\sum_{n=1}^{N}\mathbf 1_{M_j^c}(n)\le\frac{k}{(k-1)^2}\to0$$
as $k$ (in turn $N$) tends to infinity. We get $d(M^c)=0$.

Now, by construction, if $m\in M$ and $m\ge n_k$, there exists $l\ge k$ such that $m\in[n_l,n_{l+1}[$. For such $l$, we have $m\in M_1\cap\dots\cap M_l\subset M_1\cap\dots\cap M_k\subset M_k$. Hence,
$$\forall k\ge1,\qquad\{m\in M:\ m\ge n_k\}\subset M_k.$$


We deduce
$$\forall N\ge1,\ \forall A,B\in\mathcal B_N,\qquad\lim_{\substack{n\to\infty\\ n\in M}}\mu(A\cap T^{-n}B)=\mu(A)\mu(B).$$
But the Boolean algebra $\mathcal B$ generates the σ-algebra $\mathcal A$. As $\mathcal B=\bigcup_{N\ge1}\uparrow\mathcal B_N$, the proof is completed by approximation. The remaining part of the proof of Theorem 3.3.3 relies upon the spectral mixing theorem, and will be given at the end of the next section.

Exact endomorphisms. This notion turns out to be very appropriate for describing mixing properties of some important number-theoretic endomorphisms. Consider a Lebesgue space $(M,\mathcal M,\mu)$ with a continuous measure, and let $T$ denote an endomorphism of $(M,\mathcal M,\mu)$. Repeated application of the operation $T^{-1}$ generates a sequence of σ-algebras, each embedded in its predecessor,
$$\mathcal M\supseteq T^{-1}\mathcal M\supseteq T^{-2}\mathcal M\supseteq\cdots.$$
The endomorphism $T$ is said to be exact if
$$\bigcap_{n=0}^{\infty}T^{-n}\mathcal M=\mathcal N,$$
where $\mathcal N$ denotes the trivial σ-algebra, namely the collection of all measurable sets of measure zero and their complements. This property was introduced and thoroughly investigated in a nicely written paper [Rochlin: 1961]. An endomorphism $T$ is mixing of degree $r$ if for arbitrary sets $M_0,M_1,\dots,M_r$ and sequences $(k_0^n,k_1^n,\dots,k_r^n)$ of nonnegative integers such that
$$\lim_{n\to\infty}\ \inf_{0\le i<j\le r}|k_i^n-k_j^n|=\infty,$$
we have
$$\lim_{n\to\infty}\mu\Big(\bigcap_{j=0}^{r}T^{-k_j^n}M_j\Big)=\prod_{j=0}^{r}\mu(M_j).$$

A remarkable fact is that an exact endomorphism is mixing of all degrees ([Rochlin: 1961], p. 17). Important examples arise from a class of number-theoretic endomorphisms studied by Rényi [1957]. Let $\varphi$ be a real function defined on the interval $]0,1[$. We write $Tx=\{\varphi(x)\}$, where $\{c\}$ denotes the fractional part of $c$. If $Tx\neq0$, then $T^2x$ is defined, and this may be continued if $T^2x\neq0$. Let $M$ denote the set of all points of $]0,1[$ for which all powers of $T$ are defined. Clearly $T$ maps $M$ into $M$. For $x\in M$ write also
$$a_n(x)=\varphi(T^nx),\qquad n=1,2,\dots.$$


It is natural to ask whether there exists on $M$ a measure $\mu$ equivalent to the Lebesgue measure $\lambda$ such that $T$ is an ergodic endomorphism of the space $M$ with measure $\mu$. Such a question has been studied in relation to concrete functions $\varphi$, notably in [Rényi: 1957]. As classical examples, consider the function $\varphi(x)=rx$, where $r$ is an integer greater than 1, and also the function $\varphi(x)=1/x$. In the first case, $M$ is the set of $r$-based irrational numbers in the interval $]0,1[$, and $a_1(x),a_2(x),\dots$ is an expansion of the number $x$ into $r$-ary fractions. The measure $\mu$ exists and coincides with $\lambda$. In the second example, which we already encountered in 3.1.2, $M$ is the set of all irrational numbers in the interval $]0,1[$ and $a_1(x),a_2(x),\dots$ is the expansion of the number $x$ into a continued fraction. The measure $\mu$ exists and is defined by the formula
$$\mu(X)=\frac1{\log2}\int_X\frac{dx}{1+x},$$
where $X$ is an arbitrary measurable subset of $M$. More complete results are contained in [Rényi: 1957]. Consider the following two conditions:

(A) $\varphi$ is continuous and strictly decreasing from the limiting value $\lim_{x\to0}\varphi(x)$, which is either infinite or an integer greater than 2, to the limiting value $\lim_{x\to1}\varphi(x)=1$. For $x<y$, the inequality $\varphi(x)-\varphi(y)\ge y-x$ holds, whereas for $x>y>\varphi^{-1}(1+\varphi^{-1}(2))$ the stronger inequality $\varphi(x)-\varphi(y)\ge\gamma(y-x)$, $\gamma>1$, holds.

(B) $\varphi$ is continuous and strictly increasing from the limiting value $\lim_{x\to0}\varphi(x)=0$ to the limiting value $\lim_{x\to1}\varphi(x)$, equal either to $\infty$ or to an integer greater than 1. For $x<y$, the inequality $\varphi(y)-\varphi(x)\ge y-x$ holds.

Under either condition (A) or (B), the number $x\in M$ is uniquely defined by the sequence of integers $a_1(x),a_2(x),\dots$. And under some additional condition, which amounts to saying that $\xi$ is a generator with respect to $T$, there exists on the interval $]0,1[$ a measurable function $p$ such that
$$C^{-1}\le p(x)\le C,\qquad\int_0^1p(x)\,dx=1,$$
and $T$ is an ergodic endomorphism of $M$ with measure
$$\mu(X)=\int_Xp(x)\,dx.$$
See also [Philipp: 1967].
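For the continued fraction example $\varphi(x)=1/x$, the invariance of the measure with density $\frac1{\log2}\frac1{1+x}$ can be checked on intervals $[0,a]$: the preimage $T^{-1}[0,a]$ is the union of the intervals $[1/(k+a),1/k]$, $k\ge1$. The sketch below compares the two measures numerically; truncating the union at `K` terms, the value of `a`, and the helper names are illustrative assumptions of ours.

```python
import math


def gauss_measure(c, d):
    """Measure of [c, d] for the density 1 / (log 2 * (1 + x))."""
    return (math.log1p(d) - math.log1p(c)) / math.log(2)


def preimage_measure(a, K=100000):
    """Measure of T^{-1}[0, a] = union over k >= 1 of [1/(k+a), 1/k], truncated at K."""
    return sum(gauss_measure(1 / (k + a), 1 / k) for k in range(1, K + 1))


a = 0.3
direct = gauss_measure(0.0, a)
pulled_back = preimage_measure(a)
print(direct, pulled_back)
```

The two values agree up to the truncation error, which is of order $a/K$; the same computation with Lebesgue measure in place of `gauss_measure` would not balance.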


3.4 Spectral mixing theorem

We say that the sequence $\{a_n,\ n\ge1\}$ converges in density to $a$ if there exists a subset $J$ of $\mathbb N$ of density 1 such that
$$\lim_{J\ni n\to\infty}a_n=a,\tag{3.4.1}$$
and we write
$$D\text{-}\lim_{n\to\infty}a_n=a.\tag{3.4.2}$$
If $\{a_n,\ n\ge1\}$ is a bounded sequence of real numbers, then in view of Lemma 3.3.4, $D\text{-}\lim_{n\to\infty}a_n=a$ if and only if
$$\lim_{n\to\infty}\frac1n\sum_{j=0}^{n-1}|a_j-a|=0.\tag{3.4.3}$$
The following lemma is due to Wiener.

3.4.1 Lemma. Let $\nu$ be a finite measure on the torus $\mathbb T$, and let $\hat\nu(n)=\int_{\mathbb T}e^{2i\pi nt}\,d\nu(t)$. Then $\hat\nu(n)$ converges in density to zero if and only if $\nu$ has no atoms: $\nu\{t\}=0$ for any $t\in\mathbb T$.

Proof. We have

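$$\frac1n\sum_{k=0}^{n-1}|\hat\nu(k)|^2=\frac1n\sum_{k=0}^{n-1}\int_{\mathbb T}e^{2i\pi ks}\,d\nu(s)\int_{\mathbb T}e^{-2i\pi kt}\,d\nu(t)=\int_{\mathbb T}\int_{\mathbb T}\frac1n\sum_{k=0}^{n-1}e^{2i\pi k(s-t)}\,d\nu(s)\,d\nu(t).$$
As
$$\frac1n\sum_{k=0}^{n-1}e^{2i\pi k(s-t)}=\begin{cases}\dfrac1n\,\dfrac{e^{2i\pi n(s-t)}-1}{e^{2i\pi(s-t)}-1}&\text{if }s\neq t,\\[1ex]1&\text{if }s=t,\end{cases}$$
we deduce that
$$\lim_{n\to\infty}\frac1n\sum_{k=0}^{n-1}e^{2i\pi k(s-t)}=\delta_{\{s=t\}}.$$
Hence, by means of the dominated convergence theorem,
$$\lim_{n\to\infty}\frac1n\sum_{k=0}^{n-1}|\hat\nu(k)|^2=\iint_{\{s=t\}}d\nu(s)\,d\nu(t)=\sum_{0\le t<1}\nu\{t\}^2.$$
This limit equals 0 if and only if $\nu$ has no atoms.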
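The limit computed in the proof can be observed numerically. The sketch below uses a measure made of two atoms plus a multiple of Lebesgue measure (weights and positions are illustrative choices of ours), for which $\hat\nu(k)$ is available in closed form, and compares the Cesàro mean of $|\hat\nu(k)|^2$ with the sum of the squared atom masses.

```python
import cmath
import math

atoms = {0.2: 0.5, 0.7: 0.3}    # position -> mass of each atom (illustrative)
lebesgue_mass = 0.2              # absolutely continuous part: 0.2 * Lebesgue


def nu_hat(k):
    """Fourier coefficient: integral of exp(2 i pi k t) against nu."""
    value = sum(w * cmath.exp(2j * math.pi * k * t) for t, w in atoms.items())
    if k == 0:
        # Lebesgue measure only contributes to the zeroth coefficient.
        value += lebesgue_mass
    return value


n = 10000
cesaro = sum(abs(nu_hat(k)) ** 2 for k in range(n)) / n
expected = sum(w ** 2 for w in atoms.values())
print(cesaro, expected)
```

The continuous part disappears from the limit, while each atom contributes the square of its mass; with no atoms the Cesàro means would tend to 0.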

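[…] $>0$ for some $t$. As $T$ is a contraction, $\exp(-2i\pi t)T$ is a contraction as well. Applying von Neumann's theorem, we get
$$\lim_{n\to\infty}A_n(\exp[-2i\pi t]x)=y,$$
and check that $\exp(-2i\pi t)Ty=y$. Thus
$$\langle y,x\rangle=\lim_{n\to\infty}\Big\langle\frac1n\sum_{k=0}^{n-1}\exp(-2i\pi kt)\,T^kx,\ x\Big\rangle=\lim_{n\to\infty}\frac1n\sum_{k=0}^{n-1}\exp(-2i\pi kt)\langle T^kx,x\rangle.$$
According to Lemma 3.4.1, $\langle T^kx,x\rangle=\int_{\mathbb T}\exp(2i\pi ks)\,d\nu_x(s)$. Hence,
$$\langle y,x\rangle=\lim_{n\to\infty}\int_{\mathbb T}\frac1n\sum_{k=0}^{n-1}\exp[2i\pi k(s-t)]\,d\nu_x(s).$$
As
$$\lim_{n\to\infty}\frac1n\sum_{k=0}^{n-1}\exp\big(2i\pi k(s-t)\big)=\begin{cases}0&\text{if }s\neq t,\\1&\text{if }s=t,\end{cases}$$
we thus have $y\neq0$. And $y$ is a proper vector which is not orthogonal to $x$; hence a contradiction. Thus $\nu_x$ has no atoms.

(b) ⇒ (c). We have $\langle T^nx,x\rangle=\int_{\mathbb T}\exp(2i\pi ns)\,d\nu(s)=\hat\nu(n)$. Thus,
$$D\text{-}\lim_{n\to\infty}\langle T^nx,x\rangle=D\text{-}\lim_{n\to\infty}\hat\nu(n).$$
This implication thus follows from Lemma 3.4.1.

(c) ⇒ (d). Let $H_0=\{y\in H:\ D\text{-}\lim_{n\to\infty}\langle T^nx,y\rangle=0\}$. Then $H_0$ is a closed subspace of $H$. Indeed, let $\{y_k,\ k\ge1\}\subset H_0$, with $y_k\to y$ in the strong topology. Write $\langle T^nx,y\rangle=\langle T^nx,y-y_k\rangle+\langle T^nx,y_k\rangle$. We have
$$\frac1n\sum_{i=0}^{n-1}|\langle T^ix,y\rangle|\le\frac1n\sum_{i=0}^{n-1}|\langle T^ix,y_k\rangle|+\|x\|\,\|y-y_k\|.$$
Let $\varepsilon>0$ be fixed. We choose $k$ sufficiently large so that $\|y-y_k\|\,\|x\|\le\frac\varepsilon2$, then $n$ such that $\frac1n\sum_{i=0}^{n-1}|\langle T^ix,y_k\rangle|\le\frac\varepsilon2$. Thus $\frac1n\sum_{i=0}^{n-1}|\langle T^ix,y\rangle|\le\varepsilon$. This establishes that $H_0$ is closed.

Now, we show that $H_0\supset\{T^nx,\ n\ge1\}$. Observe that $\|T^nx\|$ is decreasing with $n$. Let $k$ be some positive integer and $\varepsilon>0$ a real number such that for any sufficiently large $n$,
$$0\le\|T^nx\|^2-\|T^{n+k}x\|^2\le\varepsilon^2.$$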


By Lemma 3.4.3, for any $y\in H$, $|\langle T^nx,y\rangle-\langle T^{n+k}x,T^ky\rangle|\le\varepsilon\|y\|$. For $y=x$, we deduce $|\langle T^nx,x\rangle-\langle T^{n+k}x,T^kx\rangle|\le\varepsilon\|x\|$. Since $D\text{-}\lim_{n\to\infty}\langle T^nx,x\rangle=0$, it follows that $D\text{-}\limsup_{n\to\infty}|\langle T^{n+k}x,T^kx\rangle|\le\varepsilon\|x\|$. As $\varepsilon$ is arbitrary, we thus have
$$D\text{-}\lim_{n\to\infty}\langle T^{n+k}x,T^kx\rangle=0.$$
Let $y\in H$ be such that $\langle T^kx,y\rangle=0$ for any $k\ge1$. Plainly $D\text{-}\lim_{k\to\infty}\langle T^kx,y\rangle=0$, and thereby $y\in H_0$. Thus, on the one hand $H_0\supset H_1=\operatorname{span}\{T^nx,\ n\ge1\}$, and on the other $H_0\supset H_2=\operatorname{span}\{y\in H:\ y\perp T^nx,\ \forall n\ge1\}=H_1^{\perp}$. Consequently $H_0=H$.

(d) ⇒ (a). Let $\omega$ be a proper vector of $T$ with proper value $\lambda$ of modulus 1. Let $S=\lambda^{-1}T$. Then $S$ is a contraction with $\omega$ as fixed point. Besides, for any $i\ge1$, $|\langle S^ix,y\rangle|=|\lambda^{-i}\langle T^ix,y\rangle|=|\langle T^ix,y\rangle|$. We deduce from (d) that
$$\frac1n\sum_{i=0}^{n-1}|\langle S^ix,y\rangle|$$
tends to zero as $n$ tends to infinity. In view of von Neumann's theorem, $A_n^S(x)\to L$ in the strong topology. According to the previous observation,
$$\langle L,y\rangle=0\qquad\text{for any }y\in H.$$
Thus $L=0$, the limit being the orthogonal projection of $x$ onto $H_S=\{z\in H:\ Sz=z\}$. And so $x$ is orthogonal to $\omega$. As $\omega$ is arbitrary, noting that $H_S=\{v\in H:\ Tv=\lambda v\}$, we get $x\in H_{fl}$.

End of the proof of Theorem 3.3.3. Now we show that $(X,\mathcal A,\mu,T)$ is weakly mixing if and only if $(X,\mathcal A,\mu,T)$ has continuous spectrum. If $T$ has continuous spectrum, the only proper vectors are the constants. Thus $H_{fl}=1^{\perp}$. By the spectral mixing theorem, for any $f,g\in L^2(\mu)=H$ with $f,g\in1^{\perp}$, we have $D\text{-}\lim\langle T^nf,g\rangle=0$. Thereby the system is weakly mixing. Conversely, if $(X,\mathcal A,\mu,T)$ is weakly mixing, then it follows that, for any $f,g\in1^{\perp}$, $D\text{-}\lim\langle T^nf,g\rangle=0$. Hence $f\in H_{fl}$, and thus $1^{\perp}\subset H_{fl}$. We claim that $H_{fl}=1^{\perp}$. Indeed, $H_{fl}\subset1^{\perp}$ since
$$x\in H_{fl}\ \Longrightarrow\ \forall y\in H,\qquad\lim_{n\to\infty}\frac1n\sum_{k=0}^{n-1}|\langle T^kx,y\rangle|=0.$$
And thus $x\in H_{fl}$ implies $\lim_{n\to\infty}\frac1n\sum_{k=0}^{n-1}\langle T^kx,y\rangle=0$. For $y=1$, since $\langle T^kx,1\rangle=\langle x,1\rangle$ for every $k$, we get $\langle x,1\rangle=0$, which means that $x\in1^{\perp}$. This completes the proof.


3.5 Other equivalences and other forms of mixing

In this section, several quite interesting characterizations of mixing are presented and commented upon. Let $(X,\mathcal A,\mu)$ be a probability space. According to a result of Rényi [1958], a measure-preserving transformation $T:X\to X$ is strongly mixing if and only if, for all $A\in\mathcal A$,
$$\lim_{n\to\infty}\mu(A\cap T^{-n}A)=\mu(A)^2.\tag{3.5.1}$$
By a result of England and Martin [1968], we also know that an automorphism $T$ is weakly mixing if and only if, for each pair of sets $A,B\in\mathcal A$ with positive measure, there exists a set of integers $N$ of density 1 such that
$$\mu(T^nA\cap B)>0\qquad\text{for all }n\in N.\tag{3.5.2}$$
The following characterization is due to Blum and Hanson [1960]. $T$ is strongly mixing if and only if
$$\frac1N\sum_{k=1}^{N}T^{n_k}f\ \xrightarrow{\ L^p\ }\ \int_Xf\,d\mu\qquad\text{as }N\to\infty,\tag{3.5.3}$$
for every $1\le p<\infty$, $f\in L^p$ and any increasing sequence $\{n_k,\ k\ge1\}$ of positive integers.

A sequence $\{B_n,\ n\ge1\}$ of elements of $\mathcal A$ is called remotely trivial if
$$\bigcap_{m\ge0}\sigma\{B_{m+k},\ k\ge1\}$$

is the trivial σ-algebra, namely it contains only sets of measure 0 or 1. Sucheston [1963] has shown that $T$ is strongly mixing if and only if, for all $A\in\mathcal A$, every subsequence of the sequence $\{T^{-n}A,\ n\ge1\}$ contains a further subsequence which is remotely trivial. Krengel [1972] showed that a similar characterization holds for weak mixing: $T$ is weakly mixing if and only if, for all $A\in\mathcal A$, the sequence $\{T^{-n}A,\ n\ge1\}$ contains a subsequence which is remotely trivial.

An isometry $U$ of a complex Hilbert space $H$ has purely discrete spectrum if $H$ is spanned by the eigenvectors of $U$, and has continuous spectrum if it has no eigenvectors. If $T:X\to X$ is a measure-preserving transformation, these notions are transferred to $T$ by considering $U=U_T$ ($U_Tf=f\circ T$). Krengel [1972] gave a geometric characterization as follows: a vector $f\in H$ is called weakly wandering if there exists a strictly increasing sequence $0=k_0<k_1<k_2<\cdots$ of nonnegative integers such that the vectors $U^{k_i}f$, $i=0,1,2,\dots$, are orthogonal to each other. Then $U$ has continuous spectrum if and only if the weakly wandering vectors span $H$, and $U$ has purely discrete spectrum if and only if there exist

3.5 Other equivalences and other forms of mixing

115

no nonzero weakly wandering vectors. In the first case the weakly wandering vectors turn out to be dense in H . Now consider a partition of X, a finite collection of pairwise disjoint elements of (1) (1) A, the union of which is X. Finitely many partitions ξ 1 = {A1 , . . . , An1 }, . . . , (r) (r) ξ r = {A1 , . . . , Anr } are called independent if for every (i1 , . . . , ir ) with 1 ≤ ij ≤ nj (j = 1, . . . , r) the equation r *

μ

j =1

(j ) Aij

=

r ( j =1

(j )

μ(Aij )

holds. Infinitely many partitions are called independent if every finite subset of them is independent. Let $T$ be an automorphism of $(X, \mathcal{A}, \mu)$. A partition $\xi = \{A_1, \ldots, A_n\}$ is called weakly independent if there exists a strictly increasing sequence $0 = k_0 < k_1 < k_2 < \cdots$ of nonnegative integers such that the partitions $T^{-k_i}\xi$ are independent. This notion is closely related to weak mixing and two-sided mixing: $T$ is called two-sided mixing if for all $A, B, C \in \mathcal{A}$,

\[ \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \big|\mu(T^{-k}A \cap B \cap T^{k}C) - \mu(A)\mu(B)\mu(C)\big| = 0. \tag{3.5.4} \]

Weak mixing is the special case where the above holds for all $A, B \in \mathcal{A}$ and $C = X$. If $T$ is two-sided mixing, there exists, for every partition $\xi = \{A_1, \ldots, A_n\}$ and every $\varepsilon > 0$, a weakly independent partition $\bar\xi = \{\bar A_1, \ldots, \bar A_n\}$ with

\[ \sum_{i=1}^{n} \mu(A_i \,\triangle\, \bar A_i) \le \varepsilon \quad \text{and} \quad \mu(A_i) = \mu(\bar A_i) \quad (i = 1, \ldots, n). \tag{3.5.5} \]

In other words: the weakly independent partitions are dense in the set of finite partitions. Further, if for every partition $\xi = \{A, A^c\}$, $A \in \mathcal{A}$, and every $\varepsilon > 0$, there exist a partition $\bar\xi = \{\bar A, \bar A^c\}$ and three integers $k_0, k_1, k_2$ such that $T^{k_0}\bar\xi$, $T^{k_1}\bar\xi$, $T^{k_2}\bar\xi$ are independent, then $T$ is weakly mixing. These two results are Theorem 3.1 in Krengel [1972], to which we refer for more details. As noticed by Del Junco, Reinhold, Weiss [1999: 447], it follows from Krengel's proof that

3.5.1 Theorem (Krengel [1972]). $T$ is a weakly mixing transformation if and only if the weakly independent partitions are dense in the set of finite partitions.

Recall that an IP-set is the set of finite sums with no repetitions generated by a sequence $\{n_k, k \ge 1\}$ of nonnegative integers, that is, it consists of the elements of the form $n_{i_1} + \cdots + n_{i_k}$, $i_1 < \cdots < i_k$, $k \ge 1$. Del Junco, Reinhold, Weiss [1999: Theorem 2] showed that if $T$ is weakly mixing, then the weakly independent partitions along IP-sets are dense in the set of finite partitions. They also showed


[1999: Theorem 4] that if $U$ is an isometry of a complex Hilbert space $H$ which has no discrete spectrum, then the weakly wandering vectors along IP-sets are dense in $H$. Given a measure-preserving transformation $T$ of $(X, \mathcal{A}, \mu)$, a sequence $m = \{m_k, k \ge 1\}$ is mixing for $T$ if, for any pair of sets $A, B \in \mathcal{A}$,

\[ \lim_{k\to\infty} \mu(A \cap T^{-m_k}B) = \mu(A)\mu(B). \tag{3.5.6} \]

In the same paper [1999: Theorem 5], it is proved that if $m$ is mixing for $T$, then the weakly independent partitions along IP-sets with generators in $m$ are dense in the set of finite partitions.

Sequential dynamical systems. Berendt and Bergelson [1984] (another paper to which we refer in this section) generalized the notions of ergodicity, weak mixing, and strong mixing to arbitrary sequences of measure-preserving transformations. A sequential dynamical system is a quadruple $(X, \mathcal{A}, \mu, \tilde T)$ where $\tilde T = \{T_j, j \ge 1\}$ is a sequence of measure-preserving transformations of $X$. A sequential dynamical system is ergodic if, for any pair of sets $A, B \in \mathcal{A}$,

\[ \lim_{N\to\infty} \frac{1}{N^2}\sum_{n,m=1}^{N} \mu(T_n^{-1}A \cap T_m^{-1}B) = \mu(A)\mu(B). \tag{3.5.7} \]

Alternatively, we say that $\tilde T$ is ergodic.

3.5.2 Theorem. For a sequential dynamical system $(X, \mathcal{A}, \mu, \tilde T)$, the following conditions are equivalent:
(1) $\tilde T$ is ergodic.
(2) For any $A \in \mathcal{A}$, $\lim_{N\to\infty} \frac{1}{N^2}\sum_{n,m=1}^{N} \mu(T_n^{-1}A \cap T_m^{-1}A) = \mu(A)^2$.
(3) For every $1 \le p < \infty$ and $f \in L^p$, $\frac{1}{N}\sum_{k=1}^{N} T_k f \xrightarrow{\;L^p\;} \int_X f\,d\mu$ as $N \to \infty$.
(4) The former property holds for some $1 \le p < \infty$.

This is Theorem 2.1 of Berendt and Bergelson [1984], from which we can easily infer that a dynamical system $(X, \mathcal{A}, \mu, T)$ is ergodic if and only if the sequential dynamical system $(X, \mathcal{A}, \mu, \tilde T)$ is, where $\tilde T = \{T^j, j \ge 0\}$.

The extension of the notion of strong mixing to sequential dynamical systems requires the introduction of one more notion. Let $E$ be any set and $F \subseteq E \times E$. We say that $F$ is of bounded fibres if there exists some $c$ such that, for every $a_1 \in E$, the set $F$ contains at most $c$ elements of the form $(a_1, a_2)$ with $a_2 \in E$. Then a sequential dynamical system $(X, \mathcal{A}, \mu, \tilde T)$ is strongly mixing if, for any pair of sets $A, B \in \mathcal{A}$ and $\varepsilon > 0$, the set of solutions $(m, n)$ of

\[ \big|\mu(T_n^{-1}A \cap T_m^{-1}B) - \mu(A)\mu(B)\big| \ge \varepsilon \tag{3.5.8} \]


is of bounded fibres. Evidently, a dynamical system $(X, \mathcal{A}, \mu, T)$ is strongly mixing if and only if the corresponding sequential dynamical system $(X, \mathcal{A}, \mu, \tilde T)$, $\tilde T = \{T^j, j \ge 0\}$, is. A theorem of Berendt and Bergelson [1984: Theorem 2.1] states

3.5.3 Theorem. For sequential dynamical systems, the following conditions are equivalent:
(1) $\tilde T$ is strongly mixing.
(2) For any $A \in \mathcal{A}$ and $\varepsilon > 0$, the set of solutions $(m, n)$ of $\big|\mu(T_n^{-1}A \cap T_m^{-1}A) - \mu(A)^2\big| \ge \varepsilon$ is of bounded fibres.
(3) For any $1 \le p < \infty$, $f \in L^p$ and $\varepsilon > 0$, there exists a $K$ such that if $N \ge K$ and $n_1 < n_2 < \cdots < n_N$ are positive integers, then $\big\| \frac{1}{N}\sum_{k=1}^{N} T_{n_k} f - \int_X f\,d\mu \big\|_p \le \varepsilon$.
(4) Every subsequence of $\tilde T$ is ergodic.

This allows us to recover the characterizations of strong mixing for dynamical systems of Rényi and Blum–Hanson mentioned before.

Now we turn to weak mixing. We extend the notions (3.3.5) of lower density, upper density, and density of a subset of $\mathbb{N}$ to subsets of $\mathbb{N}^2$, with respect to squares. The lower density $\underline{\delta}(B)$ and the upper density $\bar\delta(B)$ of a subset $B$ of $\mathbb{N}^2$ are defined by

\[ \underline{\delta}(B) = \liminf_{J\to\infty} \frac{\#\{B \cap [1, J]^2\}}{J^2}, \qquad \bar\delta(B) = \limsup_{J\to\infty} \frac{\#\{B \cap [1, J]^2\}}{J^2}. \tag{3.5.9} \]

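When $\underline{\delta}(B)$ and $\bar\delta(B)$ coincide, we denote by $\delta(B)$ the common value and say that $B$ has density $\delta(B)$. A double sequence $\{a_{n,m}, m, n \ge 1\}$ converges in density to $a$ if there exists a subset $\mathcal{J}$ of $\mathbb{N}^2$ of density 1 such that

\[ \lim_{\mathcal{J} \ni (m,n) \to (\infty,\infty)} a_{n,m} = a. \tag{3.5.10} \]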

In this case, we write $D\text{-}\lim a_{n,m} = a$. This extends the notion of D-convergence of simple sequences defined in (3.4.1) to double sequences. A sequential dynamical system $(X, \mathcal{A}, \mu, \tilde T)$ is weakly mixing if, for any pair of sets $A, B \in \mathcal{A}$,

\[ D\text{-}\lim \mu(T_n^{-1}A \cap T_m^{-1}B) = \mu(A)\mu(B). \tag{3.5.11} \]

The product of two sequential dynamical systems $(X, \mathcal{A}, \mu, \tilde T)$ and $(Y, \mathcal{B}, \nu, \tilde S)$ is the system

\[ (X \times Y,\ \mathcal{A} \times \mathcal{B},\ \mu \times \nu,\ \tilde T \times \tilde S), \quad \text{where } (T_n \times S_n)(x, y) = (T_n x, S_n y). \]

Here again, there is a nice set of characterizations. Indeed, by a theorem of Berendt and Bergelson [1984: Theorem 4.1] we have:


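3.5.4 Theorem. The following conditions are equivalent:
(1) $\tilde T$ is weakly mixing.
(2) For any $A \in \mathcal{A}$, $D\text{-}\lim \mu(T_n^{-1}A \cap T_m^{-1}A) = \mu(A)^2$.
(3) For any $A, B \in \mathcal{A}$ and $\varepsilon, \delta > 0$, there exists a $K$ such that for any $N \ge K$ and any $m$ the inequality $\big|\mu(T_n^{-1}A \cap T_m^{-1}B) - \mu(A)\mu(B)\big| \ge \varepsilon$ has at most $\delta N$ solutions $n$ with $1 \le n \le N$.
(4) For any $1 \le p < \infty$, $f \in L^p$ and $\delta, \varepsilon > 0$, there exists a $K$ such that if $N \ge K$ and $n_1 < n_2 < \cdots < n_N \le N/\delta$ are positive integers, then $\big\| \frac{1}{N}\sum_{k=1}^{N} T_{n_k} f - \int_X f\,d\mu \big\|_p \le \varepsilon$.
(5) Every positive lower density subsequence of $\tilde T$ is ergodic.
(6) $\tilde T \times \tilde T$ is ergodic.
(7) $\tilde T \times \tilde S$ is ergodic for any ergodic $\tilde S$.

It follows in particular that $(X, \mathcal{A}, \mu, T)$ is weakly mixing if and only if

\[ \frac{1}{N}\sum_{k=1}^{N} T^{n_k} f \xrightarrow{\;L^p\;} \int_X f\,d\mu \quad \text{as } N \to \infty, \tag{3.5.12} \]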

for every $1 \le p < \infty$, $f \in L^p$ and any increasing sequence $\{n_k, k \ge 1\}$ of positive integers with positive lower density. This is a result due to Jones [1972]. Weakly mixing sequences may admit only zero density strongly mixing subsequences. For example, such is the case with the sequence of pointwise transformations on $\mathbb{T}$ given by

\[ T_n(x) = n^{1/2} x, \quad x \in \mathbb{T},\ n = 1, 2, \ldots. \tag{3.5.13} \]

From multiple recurrence theory more can be said (see Furstenberg [1977]): for instance, if $(X, \mathcal{A}, \mu, T)$ is weakly mixing, then for any triple $A, B, C$ of elements of $\mathcal{A}$,

\[ \lim_{n\to\infty} n^{-1}\sum_{i=0}^{n-1} \big|\mu(A \cap T^{-i}B \cap T^{-2i}C) - \mu(A)\mu(B)\mu(C)\big| = 0. \tag{3.5.14} \]

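Let $E = \{T_j, j \ge 1\}$ be a family of measurable transformations of $X$, preserving $\mu$. We assume that $E$ is weakly mixing in the sense that

\[ \lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^{n} \big|\langle T_k f, g\rangle - \langle f, 1\rangle\langle g, 1\rangle\big| = 0 \qquad (\forall f, g \in L^2(\mu)). \]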

Assertion (c) of Theorem 3.3.3 can be extended ([Weber: 2001], Propositions 6.1 and 6.2) as follows:


3.5.5 Proposition. Let $f, g \in L^\infty(\mu)$. Let $X$ and $Y$ be two independent random variables such that $X \sim f$ and $Y \sim g$. Let also $F : \mathbb{R}^2 \to \mathbb{R}$ be continuous. Then for any $\varepsilon > 0$, one can define a sequence $S$ of positive integers of density 1 such that, for any $u \in S$,

\[ \Big|\int_X F(f, T_u g)\,d\mu - \mathbf{E}\,F(X, Y)\Big| \le \varepsilon. \]

Proof. It suffices to prove the result for $F(x, y) = \sum_{k,l=1}^{M} a_{k,l}\, x^k y^l$. The general case will follow from the Stone–Weierstrass theorem. Let $\varepsilon > 0$ and $f, g \in L^\infty(\mu)$. We thus have to consider the expression

\[ \int F(f, T_u g)\,d\mu = \sum_{k,l=1}^{M} a_{k,l} \int f^k\, T_u g^l\,d\mu. \]

By Lemma 3.3.4, there exists a sequence $S$ of density 1 such that

\[ \lim_{S \ni u \to \infty} \int f^k\, T_u g^l\,d\mu = \int f^k\,d\mu \cdot \int g^l\,d\mu. \]

Operating by induction, for any $\varepsilon' > 0$ we deduce the existence of a sequence $S$ of density 1 such that, for all $u \in S$ and all $k, l = 1, \ldots, M$,

\[ \Big|\int f^k\, T_u g^l\,d\mu - \int f^k\,d\mu \cdot \int g^l\,d\mu\Big| \le \varepsilon'. \]

Choose $\varepsilon' = \varepsilon / \big(\sum_{k,l=1}^{M} |a_{k,l}|\big)$. Then, by the previous estimate,

\begin{align*}
\Big|\int F(f, T_u g)\,d\mu - \mathbf{E}\,F(X, Y)\Big|
&= \Big|\int F(f, T_u g)\,d\mu - \sum_{k,l=1}^{M} a_{k,l} \int f^k\,d\mu \cdot \int g^l\,d\mu\Big| \\
&\le \sum_{k,l=1}^{M} |a_{k,l}|\, \Big|\int f^k\, T_u g^l\,d\mu - \int f^k\,d\mu \cdot \int g^l\,d\mu\Big| \le \varepsilon.
\end{align*}

The general case immediately follows. Indeed, by the Stone–Weierstrass theorem, one can find a polynomial $P(x, y)$ such that $\|F - P\|_\infty \le \varepsilon$. By the triangle inequality and applying the previous result to $P$, we deduce that there exists a sequence $S$ of density 1 such that, for any $u \in S$,

\[ \Big|\int F(f, T_u g)\,d\mu - \mathbf{E}\,F(X, Y)\Big| \le 2\varepsilon + \Big|\int P(f, T_u g)\,d\mu - \mathbf{E}\,P(X, Y)\Big| \le 3\varepsilon. \]

Let $a > 1$, and consider a finite subset of $L^2(\mu) \cap \mathbf{1}_\mu^{\perp}$. A useful consequence of weak mixing is: for any $z$ in this subset and any positive integer $N$, one can find integers $u_1, \ldots, u_N$


such that if $S(z) = z \circ T_{u_1} + \cdots + z \circ T_{u_N}$, then $\|S(z)\|_{2,\mu}^2 \sim N \|z\|_{2,\mu}^2$. In particular, for any $z$ in this subset,

\[ \|S(z)\|_{2,\mu}^2 \le aN \|z\|_{2,\mu}^2. \]

This naturally extends to $L^p(\mu)$ spaces with $2 \le p < \infty$. Proposition 3.5.5 can indeed be used to prove

3.5.6 Proposition. Let $2 \le p < \infty$ and $\varepsilon > 0$. Consider a finite subset of $L^p(\mu)$ and a positive integer $N$. Then there exist integers $u_1, \ldots, u_N$ such that if $S(z) = z \circ T_{u_1} + \cdots + z \circ T_{u_N}$, then for any $z$ in this subset,

p/2 |S(z)|p dμ ≤ (1 + ε)p 2pN

X

|z|p dμ. X

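Proof. Let $z \in L^\infty(\mu)$ and $\alpha > 0$. Put $Z = \int (1 + |z|)^p\,d\mu$. Let $X_1, X_2, \ldots$ be a sequence of independent random variables having the same law as $z$. Let also $F(x, y) = (x + y)^l$, $l \le p$. By applying $p$ times the previous proposition with the choice $f = g = z$, we establish the existence of a sequence of integers $S_1$ with density 1 such that, for any $u \in S_1$ and all $l \le p$,

\[ \Big|\int (z + T_u z)^l\,d\mu - \mathbf{E}\,(X_1 + X_2)^l\Big| \le \alpha. \]

At the next stage, we apply Proposition 3.5.5 with the choice $f = z + T_u z$, $g = z$, where $X, Y$ are independent with $X \sim f$ and $Y \sim g$. For any $u \in S_1$ and any $v$ belonging to a sequence $S_2$ (depending on $u$) of density 1, we have

\[ \Big|\int (z + T_u z + T_v z)^l\,d\mu - \mathbf{E}\,(X + Y)^l\Big| \le \alpha \qquad (\forall l \le p). \]

But

\[ \mathbf{E}\,(X + Y)^l = \sum_{k=0}^{l} C_l^k\, \mathbf{E}\,X^k\, \mathbf{E}\,Y^{l-k} = \sum_{k=0}^{l} C_l^k \int (z + T_u z)^k\,d\mu \int z^{l-k}\,d\mu. \]

Thus

\begin{align*}
\big|\mathbf{E}\,(X + Y)^l - \mathbf{E}\,(X_1 + X_2 + X_3)^l\big|
&= \Big|\mathbf{E}\,(X + Y)^l - \sum_{k=0}^{l} C_l^k\, \mathbf{E}\,(X_1 + X_2)^k \int z^{l-k}\,d\mu\Big| \\
&= \Big|\sum_{k=0}^{l} C_l^k \Big[\int (z + T_u z)^k\,d\mu - \mathbf{E}\,(X_1 + X_2)^k\Big] \int z^{l-k}\,d\mu\Big| \\
&\le \sum_{k=0}^{l} C_l^k\, \Big|\int (z + T_u z)^k\,d\mu - \mathbf{E}\,(X_1 + X_2)^k\Big| \cdot \Big|\int z^{l-k}\,d\mu\Big| \\
&\le \alpha \sum_{k=0}^{l} C_l^k \int |z^{l-k}|\,d\mu = \alpha \int (1 + |z|)^l\,d\mu \le \alpha Z.
\end{align*}

We thus deduce that for any $u \in S_1$, $v \in S_2$ and $l \le p$,

\[ \Big|\int (z + T_u z + T_v z)^l\,d\mu - \mathbf{E}\,(X_1 + X_2 + X_3)^l\Big| \le \alpha(1 + Z). \]

Now, it suffices to iterate the preceding argument. For any integer $N \ge 1$, we obtain that there exist $N$ sequences $S_1, \ldots, S_N$ of density 1 such that, for any $u_i \in S_i$, $i = 1, \ldots, N$, and $l \le p$,

\[ \Big|\int \Big(\sum_{i=1}^{N} T_{u_i} z\Big)^l d\mu - \mathbf{E}\,\Big(\sum_{i=1}^{N} X_i\Big)^l\Big| \le \alpha \sum_{\lambda=0}^{(N-2)^+} Z^\lambda. \]

We notice that for any $i = 1, 2, \ldots$, the sequence $S_i$ depends on $u_1, \ldots, u_{i-1}$. For a suitable choice of $\alpha$ depending on $\varepsilon$, $N$, $z$ and $p$, we also have

\[ \int \Big|\sum_{i=1}^{N} T_{u_i} z\Big|^p d\mu \le \mathbf{E}\,\Big|\sum_{i=1}^{N} X_i\Big|^p + \varepsilon. \]

Proceeding now by approximation, the same result can be obtained for any $z \in L^p(\mu)$, and finally for any finite subset of $L^p(\mu)$. The proposition then follows from this and from Rosenthal's inequality (8.2.9) applied to $\mathbf{E}\,\big|\sum_{i=1}^{N} X_i\big|^p$.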
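The binomial step used in the proof above, $\mathbf{E}(X+Y)^l = \sum_{k} C_l^k\, \mathbf{E}X^k\, \mathbf{E}Y^{l-k}$ for independent $X$ and $Y$, can be checked on small finite laws. The sketch below is only an illustration: the dictionary representation of a law and the function names are assumptions introduced here, not notation from the text.

```python
from fractions import Fraction
from itertools import product
from math import comb


def moment(law, k):
    """E X^k for a finite law given as {value: probability}."""
    return sum(p * v ** k for v, p in law.items())


def moment_of_sum(law_x, law_y, l):
    """E (X + Y)^l for independent X and Y, computed from the product law."""
    return sum(px * py * (x + y) ** l
               for (x, px), (y, py) in product(law_x.items(), law_y.items()))


def binomial_side(law_x, law_y, l):
    """sum_k C(l, k) E X^k E Y^(l-k), the right-hand side of the expansion."""
    return sum(comb(l, k) * moment(law_x, k) * moment(law_y, l - k)
               for k in range(l + 1))


if __name__ == "__main__":
    law_x = {Fraction(-1): Fraction(1, 2), Fraction(2): Fraction(1, 2)}
    law_y = {Fraction(0): Fraction(1, 3), Fraction(1): Fraction(2, 3)}
    for l in range(6):
        print(l, moment_of_sum(law_x, law_y, l), binomial_side(law_x, law_y, l))
```

Both columns agree for every $l$, which is exactly what independence provides and what the proof exploits when comparing $\mathbf{E}(X+Y)^l$ with $\mathbf{E}(X_1+X_2+X_3)^l$.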

3.6 Examples

Before examining mixing properties of some standard examples of dynamical systems, recall that in order for a given measurable dynamical system $(X, \mathcal{A}, \mu, T)$ to be ergodic, weakly mixing or mixing, it is sufficient (and also necessary) that this property be satisfied for a countable class of functions that is dense in $L^2(\mu)$. In the following examples we consider $X = \mathbb{T}$ equipped with the normalized Lebesgue measure $\lambda$.

(1) Irrational rotations $x \mapsto Tx = x + \vartheta \bmod 1$, $\vartheta \in \mathbb{Q}^c$. They are ergodic but not weakly mixing. The last assertion is clear since the characters $e_n(x) = e^{2i\pi nx}$, $n \in \mathbb{Z}$, are eigenfunctions for the associated isometry $U_T$ and span $L^2(\lambda)$. The system has discrete spectrum and so cannot be weakly mixing. As regards ergodicity, let $f \in L^2(\lambda)$, $f \sim \sum_{n \in \mathbb{Z}} a_n e_n$, and assume that $a_n = 0$ except for a finite number of indices $n$. Then

\[ A_N f(x) = \frac{1}{N}\sum_{k=0}^{N-1} f(T^k x) = \sum_{n \in \mathbb{Z}} a_n V_N(n\vartheta)\, e_n(x), \]

where $V_N(u) = (e^{2i\pi Nu} - 1)/\big(N(e^{2i\pi u} - 1)\big)$ for $u \notin \mathbb{Z}$ and $V_N(0) = 1$. As $|V_N(u)| \le 2/\big(N|e^{2i\pi u} - 1|\big) \to 0$ when $u \notin \mathbb{Z}$, and $\{n\vartheta\} \ne 0$ for $n \ne 0$, it follows that $\lim_{N\to\infty} A_N f(x) = \langle f, 1\rangle$ in $L^2(\lambda)$ (and pointwise). In view of von Neumann's theorem and (3.1.3), we get $\mathbf{E}(f \mid \mathcal{J}_T) = \langle f, 1\rangle$. By approximation, this remains true for any $f \in L^1(\lambda)$, which means that $\mathcal{J}_T$ is the trivial σ-field, hence the ergodicity of the system. In relation with this property, we mention the following general result.

3.6.1 Theorem. Let $(X, \mathcal{A}, \mu, T)$ be a measurable dynamical system such that $T$ is ergodic but not weakly mixing (or equivalently $T \times T$ is not ergodic). Then $T$ has a factor which is a rotation on a compact abelian group.

We refer to Petersen [1983: 134] for a proof.

(2) The transformations $x \mapsto T_q x = qx \bmod 1$, $q$ a positive integer. Let $a$ be some integer strictly greater than 1. Then $T_a^k = T_{a^k}$. The mixing properties of these transformations rely upon the following lemma. Let $(h, k)$ and $[h, k]$ respectively denote the greatest common divisor and the least common multiple of the positive integers $h$ and $k$, and put

\[ \langle h, k\rangle = \frac{(h, k)}{[h, k]}. \]

3.6.2 Lemma. Let $A$ and $B$ be two intervals in $\mathbb{T}$. There exists a constant $C$ depending on $A$ and $B$ such that, for any positive integers $h$ and $k$,

\[ \big|\lambda(T_h^{-1}A \cap T_k^{-1}B) - \lambda(A)\lambda(B)\big| \le C\langle h, k\rangle. \tag{3.6.1} \]

Further, there exists another constant $C$ depending on $A$ and $B$ only such that, for any finite collection $h_1, \ldots, h_R$ of distinct positive integers,

\[ \sum_{i,j=1}^{R} \big|\lambda(T_{h_i}^{-1}A \cap T_{h_j}^{-1}B) - \lambda(A)\lambda(B)\big| \le CR(\log\log R)^2. \tag{3.6.2} \]
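To make estimate (3.6.1) concrete, here is a small sketch that computes $\lambda(T_h^{-1}A \cap T_k^{-1}B) - \lambda(A)\lambda(B)$ exactly for intervals with rational endpoints and compares it with $\langle h, k\rangle$. The function names and the particular intervals are illustrative assumptions; intervals are taken as pairs `(lo, hi)` with $0 \le lo < hi \le 1$.

```python
from fractions import Fraction
from math import gcd


def overlap(lo1, hi1, lo2, hi2):
    """Length of the intersection of [lo1, hi1) and [lo2, hi2)."""
    return max(Fraction(0), min(hi1, hi2) - max(lo1, lo2))


def preimage(h, lo, hi):
    """T_h^{-1}[lo, hi) as a list of h disjoint intervals, for T_h x = h x mod 1."""
    return [((j + lo) / h, (j + hi) / h) for j in range(h)]


def deviation(h, k, A, B):
    """lambda(T_h^{-1}A ∩ T_k^{-1}B) - lambda(A) * lambda(B)."""
    joint = sum(overlap(p, q, r, s)
                for (p, q) in preimage(h, *A)
                for (r, s) in preimage(k, *B))
    return joint - (A[1] - A[0]) * (B[1] - B[0])


def bracket(h, k):
    """<h, k> = gcd(h, k) / lcm(h, k) = gcd(h, k)^2 / (h k)."""
    g = gcd(h, k)
    return Fraction(g * g, h * k)


if __name__ == "__main__":
    A = (Fraction(0), Fraction(1, 3))
    B = (Fraction(1, 4), Fraction(3, 4))
    for h in range(1, 7):
        for k in range(1, 7):
            print(h, k, float(deviation(h, k, A, B)), float(bracket(h, k)))
```

For the pairs printed here the deviation stays below $\langle h, k\rangle$ in absolute value, in line with the lemma; this is a numerical illustration for a few small cases, not a substitute for the proof.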

Before giving the proof of the lemma, we indicate some useful consequences. Let T˜ = {Th , h ≥ 1}. 3.6.3 Proposition. The sequential dynamical system (T, B(T), λ, T˜ ) is weakly mixing. Further, for any integer a strictly greater than 1, the transformation Ta is strongly mixing.


Proof. Indeed, by Theorem 3.5.4 (2), the sequential dynamical system $(\mathbb{T}, \mathcal{B}(\mathbb{T}), \lambda, \tilde T)$ is weakly mixing if and only if $D\text{-}\lim \lambda(T_n^{-1}A \cap T_m^{-1}A) = \lambda(A)^2$ for any $A \in \mathcal{B}(\mathbb{T})$. But by Lemma 3.6.2, for any pair $A$ and $B$ of intervals in $\mathbb{T}$,

\[ \lim_{R\to\infty} \frac{1}{R^2}\sum_{i,j=1}^{R} \big|\lambda(T_i^{-1}A \cap T_j^{-1}B) - \lambda(A)\lambda(B)\big| = 0. \]

Naturally, this remains true if $A$ and $B$ are finite unions of pairwise disjoint intervals. Now, proceeding by approximation, we get that the above property extends to any pair $U$ and $V$ of Borel sets of $\mathbb{T}$:

\[ \lim_{R\to\infty} \frac{1}{R^2}\sum_{i,j=1}^{R} \big|\lambda(T_i^{-1}U \cap T_j^{-1}V) - \lambda(U)\lambda(V)\big| = 0. \tag{3.6.3} \]

Specifying this for $U = V$ gives $D\text{-}\lim \lambda(T_n^{-1}U \cap T_m^{-1}U) = \lambda(U)^2$, as required. Now if $a$ is an integer strictly greater than 1, and $A$ and $B$ are intervals in $\mathbb{T}$, then

\[ \big|\lambda(T_a^{-k}A \cap T_a^{-1}B) - \lambda(A)\lambda(B)\big| \le C\langle a, a^k\rangle = Ca^{-(k-1)}. \tag{3.6.4} \]

Thereby, since $T_a$ is $\lambda$-preserving,

\[ \lim_{k\to\infty} \lambda\big(T_a^{-k}A \cap B\big) = \lambda(A)\lambda(B). \]

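The fact that $T_a$ is strongly mixing now follows from the above and the same approximation argument used before.

Proof of Lemma 3.6.2. Let $A = [a, b)$, $B = [c, d)$. By expanding the indicator function $\chi_{[a,b)}$ into a Fourier series, we get

\begin{align*}
\chi_{[a,b)}(x) &= b - a + \sum_{n \in \mathbb{Z}^*} \frac{-1}{2i\pi n}\big(e^{-2i\pi nb} - e^{-2i\pi na}\big)\, e^{2i\pi nx}, \\
\chi_{[c,d)}(x) &= d - c + \sum_{n \in \mathbb{Z}^*} \frac{-1}{2i\pi n}\big(e^{-2i\pi nd} - e^{-2i\pi nc}\big)\, e^{2i\pi nx} \tag{3.6.5}
\end{align*}

for almost all $x$. Write $\varphi = \chi_{[a,b)}$, $\psi = \chi_{[c,d)}$, and then $\bar\varphi = \varphi - (b - a)$, $\bar\psi = \psi - (d - c)$. Put, for $u, v \in \mathbb{T}$ and $n$ an integer, $\delta_n(u, v) = e^{-2i\pi nv} - e^{-2i\pi nu}$. Then

\[ \bar\varphi(\{hx\}) = \sum_{n \in \mathbb{Z}^*} \frac{-1}{2i\pi n}\, e^{2i\pi nhx}\, \delta_n(a, b), \qquad \bar\psi(\{kx\}) = \sum_{m \in \mathbb{Z}^*} \frac{-1}{2i\pi m}\, e^{2i\pi mkx}\, \delta_m(c, d), \]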


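so that, writing $\bar\varphi_h(x) = \bar\varphi(\{hx\})$ and $\bar\psi_k(x) = \bar\psi(\{kx\})$,

\begin{align*}
\langle \bar\varphi_h, \bar\psi_k\rangle
&= \sum_{n \in \mathbb{Z}^*}\sum_{m \in \mathbb{Z}^*} \frac{1}{4\pi^2 mn}\, \delta_n(a, b)\, \delta_{-m}(c, d) \int_{\mathbb{T}} e^{2i\pi(nh - mk)x}\,dx \\
&= \sum_{\substack{m, n \in \mathbb{Z}^* \\ nh - mk = 0}} \frac{1}{4\pi^2 mn}\, \delta_n(a, b)\, \delta_{-m}(c, d).
\end{align*}

The equation $nh - mk = 0$ has solutions given by $n = \pm\mu k/(h, k)$ and $m = \pm\mu h/(h, k)$ (with the same sign), $\mu = 1, 2, \ldots$. Thus,

\begin{align*}
\lambda\big(T_h^{-1}A \cap T_k^{-1}B\big) - \lambda(A)\lambda(B) &= \langle \bar\varphi_h, \bar\psi_k\rangle \tag{3.6.6} \\
&= \frac{\langle h, k\rangle}{4\pi^2} \sum_{\mu=1}^{\infty} \frac{1}{\mu^2}\Big[\delta_{\mu k/(h,k)}(a, b)\, \delta_{-\mu h/(h,k)}(c, d) + \delta_{-\mu k/(h,k)}(a, b)\, \delta_{\mu h/(h,k)}(c, d)\Big].
\end{align*}

Therefore

\[ \big|\lambda(T_h^{-1}A \cap T_k^{-1}B) - \lambda(A)\lambda(B)\big| \le C\langle h, k\rangle, \]

where the constant $C$ depends on $A$ and $B$. This proves the first part of the lemma. The second part now plainly follows from the first and Gál's estimate (8.4.19), which we briefly recall for convenience: there exists a constant $C$ such that for any $N$-tuple of (all different) positive integers $n_1, \ldots, n_N$, we have

\[ \sum_{i,j \le N} \langle n_i, n_j\rangle \le CN(\log\log N)^2. \]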
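The quantity controlled by Gál's estimate can be tabulated directly for small collections of integers. The sketch below, whose function names are assumptions made for illustration, evaluates $\sum_{i,j} \langle n_i, n_j\rangle$ exactly and prints it next to $N(\log\log N)^2$ for the particular choice $n_i = i$; it says nothing about the best constant $C$.

```python
from fractions import Fraction
from math import gcd, log


def bracket(h, k):
    """<h, k> = gcd(h, k) / lcm(h, k) = gcd(h, k)^2 / (h k)."""
    g = gcd(h, k)
    return Fraction(g * g, h * k)


def gal_sum(ns):
    """Sum of <n_i, n_j> over all ordered pairs (i, j), diagonal included."""
    ns = list(ns)
    return sum(bracket(a, b) for a in ns for b in ns)


if __name__ == "__main__":
    for N in (10, 50, 200):
        s = float(gal_sum(range(1, N + 1)))
        # Ratio of the sum to N (log log N)^2 for the first N integers.
        print(N, s, s / (N * log(log(N)) ** 2))
```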

3.6.4 Remark. In Lemma 11, p. 52 of Sprindžuk [1979], another estimate is proposed, which is sometimes more suitable:

\[ \lambda\big(T_h^{-1}A \cap T_k^{-1}B\big) - \lambda(A)\lambda(B) = O\Big(|A|\,\frac{(h, k)}{k}\Big). \]

The proof is based on the method of Vinogradov.

(3) Gaussian systems. Let $X = \{X_n, n \in \mathbb{Z}\}$ be a centered Gaussian stationary sequence, and let $r(m) = \int_{-\pi}^{\pi} e^{im\lambda}\,F(d\lambda)$ denote its covariance function. Let $(\Omega, \mathcal{B}, \mathbf{P})$ be the underlying probability space on which $X$ is defined. Consider also the Gaussian dynamical system canonically associated to $X$: $(\mathbb{R}^{\mathbb{Z}}, \mathcal{B}(\mathbb{R}^{\mathbb{Z}}), \mu, T)$, where $\mu = X(\mathbf{P})$ and $T$ is the usual shift: $Tf = f(\,\cdot + 1)$ if $f \in \mathbb{R}^{\mathbb{Z}}$. The mixing properties of these dynamical systems are characterized by a theorem due to Maruyama [1949]. For the proof, we use a probabilistic approach as in [Weber: 1980].


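3.6.5 Theorem. (a) $(\mathbb{R}^{\mathbb{Z}}, \mathcal{B}(\mathbb{R}^{\mathbb{Z}}), \mu, T)$ is weakly mixing if and only if

\[ \lim_{N\to\infty} \frac{1}{N}\sum_{n=0}^{N-1} |r(n)| = 0. \]

(b) $(\mathbb{R}^{\mathbb{Z}}, \mathcal{B}(\mathbb{R}^{\mathbb{Z}}), \mu, T)$ is strongly mixing if and only if $\lim_{n\to\infty} r(n) = 0$.

Proof. According to Lemma 10.1.4,

\[ \mathbf{P}\{X_0 \ge 0, X_n \ge 0\} - \mathbf{P}\{X_0 \ge 0\}\,\mathbf{P}\{X_n \ge 0\} = \frac{1}{2\pi} \arcsin r(n). \]

So, if the system is weakly mixing, necessarily $\lim_{N\to\infty} \frac{1}{N}\sum_{n=0}^{N-1} |\arcsin r(n)| = 0$, namely $\lim_{N\to\infty} \frac{1}{N}\sum_{n=0}^{N-1} |r(n)| = 0$. Similarly, if the system is strongly mixing, then $\lim_{n\to\infty} r(n) = 0$.

To prove the sufficiency part we use Lemma 10.1.5. There is no loss of generality in assuming $\mathbf{E}\,X_n^2 = 1$ for every $n \in \mathbb{N}$. Let $C, D$ be cylinders of $\mathbb{R}^{\mathbb{N}}$ with bases $I$ and $J$ respectively, namely $C = \prod_{i \in \mathbb{N}} C_i$, $D = \prod_{j \in \mathbb{N}} D_j$ and

\[ C_i = \begin{cases} (a_i, b_i) & \text{if } i \in I, \\ \mathbb{R} & \text{if } i \in I^c, \end{cases} \qquad D_j = \begin{cases} (u_j, v_j) & \text{if } j \in J, \\ \mathbb{R} & \text{if } j \in J^c. \end{cases} \]

Let

\[ V = \tilde C \times \tilde D, \qquad \tilde C = \prod_{i \in I} C_i, \qquad \tilde D = \prod_{j \in J} D_{j+n}, \]

where $n$ is any positive integer sufficiently large for $I \cap \{n + J\}$ to be empty. We further assume the numbers $a_i, b_i, u_j, v_j$ to be all distinct, which is no restriction here. Then, by Lemma 10.1.5, there exists a constant $C$ which depends on $I$ and $J$ only such that

\begin{align*}
|\mu(C \cap T^{-n}D) - \mu(C)\mu(D)|
&= \Big|\mathbf{P}\big\{\{X_i, i \in I;\ X_{j+n}, j \in J\} \in V\big\} - \mathbf{P}\big\{\{X_i, i \in I\} \in \tilde C\big\}\,\mathbf{P}\big\{\{X_j, j \in J\} \in \tilde D\big\}\Big| \\
&\le C \sum_{i \in I}\sum_{j \in J} |r(j - i + n)|. \tag{3.6.7}
\end{align*}

If we know for instance that $\lim_{N\to\infty} \frac{1}{N}\sum_{n=0}^{N-1} |r(n)| = 0$, we get from (3.6.7)

\[ \lim_{N\to\infty} \frac{1}{N}\sum_{n=0}^{N-1} \big|\mu(C \cap T^{-n}D) - \mu(C)\mu(D)\big| = 0. \tag{3.6.8} \]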

Let C denote the semi-algebra of cylinders. It is plain that (3.6.8) extends to any pair of sets C and D in C. Let {Cp , p ≥ 1} be a sequence in C converging to some element


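$C \in \mathcal{B}(\mathbb{R}^{\mathbb{N}})$: $\lim_{p\to\infty} \mu(C_p \,\triangle\, C) = 0$. Let $\varepsilon > 0$. Then

\begin{align*}
|\mu(C \cap T^{-n}D) - \mu(C)\mu(D)| &\le |\mu(C \cap T^{-n}D) - \mu(C_p \cap T^{-n}D)| + |\mu(C_p \cap T^{-n}D) - \mu(C_p)\mu(D)| \\
&\quad + |\mu(C_p)\mu(D) - \mu(C)\mu(D)| := P_1 + P_2 + P_3.
\end{align*}

For $p$ large enough, say $p \ge p_\varepsilon$, and any integer $n$,

\begin{align*}
P_1 &= |\mu(C \cap T^{-n}D) - \mu(C_p \cap T^{-n}D)| \le \mu\big((C \cap T^{-n}D) \,\triangle\, (C_p \cap T^{-n}D)\big) \\
&\le \mu(C \,\triangle\, C_p) + \mu(T^{-n}D \,\triangle\, T^{-n}D) \le \varepsilon/2, \\
P_3 &\le \mu(D)\,\mu(C \,\triangle\, C_p) \le \varepsilon/2.
\end{align*}

Hence,

\[ \frac{1}{N}\sum_{n=0}^{N-1} |\mu(C \cap T^{-n}D) - \mu(C)\mu(D)| \le \varepsilon + \frac{1}{N}\sum_{n=0}^{N-1} |\mu(C_{p_\varepsilon} \cap T^{-n}D) - \mu(C_{p_\varepsilon})\mu(D)|. \]

By letting $N$ tend to infinity in the above inequality, we easily get

\[ \limsup_{N\to\infty} \frac{1}{N}\sum_{n=0}^{N-1} |\mu(C \cap T^{-n}D) - \mu(C)\mu(D)| \le \varepsilon. \]

As $\varepsilon$ was arbitrary, we obtain $\lim_{N\to\infty} \frac{1}{N}\sum_{n=0}^{N-1} |\mu(C \cap T^{-n}D) - \mu(C)\mu(D)| = 0$. Since the monotone class generated by the semi-algebra $\mathcal{C}$ coincides with $\mathcal{B}(\mathbb{R}^{\mathbb{N}})$, we thereby conclude that

\[ \lim_{N\to\infty} \frac{1}{N}\sum_{n=0}^{N-1} |\mu(C \cap T^{-n}D) - \mu(C)\mu(D)| = 0 \]

for any $C \in \mathcal{B}(\mathbb{R}^{\mathbb{N}})$ and any $D \in \mathcal{C}$. Let $C' = T^{-1}C$. The transformation $T$ being invertible, the latter may also be rewritten as

\[ \lim_{N\to\infty} \frac{1}{N}\sum_{n=0}^{N-1} |\mu(D \cap T^{-n}C') - \mu(D)\mu(C')| = 0. \tag{3.6.9} \]

If $D$ is now a limit of a sequence $\{D_q, q \ge 1\}$ in $\mathcal{C}$, by invoking the same reasoning, we also get (3.6.9) in that case. Finally, for any two elements $C$ and $D$ of $\mathcal{B}(\mathbb{R}^{\mathbb{N}})$, we have

\[ \lim_{N\to\infty} \frac{1}{N}\sum_{n=0}^{N-1} |\mu(D \cap T^{-n}C) - \mu(D)\mu(C)| = 0, \]

which shows that the system is weakly mixing. That it is strongly mixing under the assumption $\lim_{n\to\infty} r(n) = 0$ now follows from the same arguments.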


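3.6.6 Remark. The use of Lemma 10.1.5 allows us to prove a little more: if the system is mixing, then it is $k$-mixing for every $k$. Indeed, assuming still $\mathbf{E}\,X_n^2 \equiv 1$, let $C^1, \ldots, C^k$ be cylinders of $\mathbb{R}^{\mathbb{N}}$ with respective bases $I_1, \ldots, I_k$:

\[ C_i^j = \begin{cases} (a_i^j, b_i^j) & \text{if } i \in I_j, \\ \mathbb{R} & \text{if } i \in I_j^c, \end{cases} \qquad \tilde C^j = \prod_{i \in I_j} C_i^j, \qquad j = 1, \ldots, k, \]

where the reals $a_i^j, b_i^j$ are all distinct. Let $n_1, \ldots, n_k$ be positive integers and assume that the numbers $\min(n_1, \ldots, n_k)$ and $\min(|n_i - n_j|, i \ne j)$ are large. Set

\[ V = \prod_{j=1}^{k} \prod_{i \in n_j + I_j} C_i^j. \]

By Lemma 10.1.5 there exists a constant $C$ depending on $I_1, \ldots, I_k$ only such that

\begin{align*}
\Big|\mu\Big(\bigcap_{j=1}^{k} T^{-n_j}C^j\Big) - \prod_{j=1}^{k} \mu(C^j)\Big|
&= \Big|\mathbf{P}\big\{\{(X_{n_j+i})_{i \in I_j},\ j = 1, \ldots, k\} \in V\big\} - \prod_{j=1}^{k} \mathbf{P}\big\{\{X_i, i \in I_j\} \in \tilde C^j\big\}\Big| \\
&\le C \sum_{1 \le j \ne h \le k}\ \sum_{i \in I_j,\ \ell \in I_h} |r(i - \ell + n_j - n_h)|.
\end{align*}

Hence

\[ \lim_{\substack{\min(n_1, \ldots, n_k) \to \infty \\ \min(|n_i - n_j|,\, i \ne j) \to \infty}} \mu(T^{-n_1}C^1 \cap \cdots \cap T^{-n_k}C^k) = \mu(C^1) \cdots \mu(C^k). \tag{3.6.10} \]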

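By proceeding as before, one can next prove that (3.6.10) holds for all $C^1, \ldots, C^k \in \mathcal{B}(\mathbb{R}^{\mathbb{N}})$.

(4) Bernoulli shifts. Let $(\Omega, \mathcal{B}, \mathbf{P})$ be a probability space and consider on $\Omega^{\mathbb{N}} = \{\omega = (\omega_z)_{z \in \mathbb{N}} : \omega_z \in \Omega,\ z \in \mathbb{N}\}$ the right shift $T\omega = (\omega_{z+1})_{z \in \mathbb{N}}$.

3.6.7 Theorem. The dynamical system $(\Omega^{\mathbb{N}}, \mathcal{B}^{\mathbb{N}}, \mathbf{P}^{\mathbb{N}}, T)$ is strongly mixing, and in fact $k$-mixing for every $k$.

Proof. This is a simple consequence of independence. Let as before $C^1, \ldots, C^k$ be cylinders of $\Omega^{\mathbb{N}}$ with respective bases $I_1, \ldots, I_k$:

\[ C_i^j = \begin{cases} A_i^j & \text{if } i \in I_j, \\ \Omega & \text{if } i \in I_j^c, \end{cases} \qquad \tilde C^j = \prod_{i \in I_j} C_i^j, \qquad j = 1, \ldots, k. \]

Let $n_1, \ldots, n_k$ be positive integers. Then, if we assume that $\min(n_1, \ldots, n_k)$ and $\min(|n_i - n_j|, i \ne j)$ are large,

\[ \mathbf{P}^{\mathbb{N}}\Big(\bigcap_{j=1}^{k} T^{-n_j}C^j\Big) = \prod_{j=1}^{k} \mathbf{P}^{\mathbb{N}}\big(C^j\big). \]

Hence

\[ \lim_{\substack{\min(n_1, \ldots, n_k) \to \infty \\ \min(|n_i - n_j|,\, i \ne j) \to \infty}} \mathbf{P}^{\mathbb{N}}(T^{-n_1}C^1 \cap \cdots \cap T^{-n_k}C^k) = \mathbf{P}^{\mathbb{N}}(C^1) \cdots \mathbf{P}^{\mathbb{N}}(C^k). \]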

Chapter 4

Pointwise ergodic theorems

This chapter is essentially devoted to the study of pointwise ergodic theorems. In a first step, we study the Birkhoff pointwise ergodic theorem, and integrability properties of the associated maximal operators, as well as Gerstenhaber’s counterexample. A section is devoted to the speed of convergence: its absence in general and its existence when some spectral type conditions are fulfilled. In this chapter we continue the study of the related oscillation functions, made by means of the spectral regularization method introduced in Chapter 1. Other maximal inequalities and the transference principle are included in this chapter. The Wiener–Wintner ergodic theorem, and its uniform version due to Bourgain, as well as some weighted pointwise ergodic theorems conclude the chapter.

4.1 Birkhoff's pointwise theorem

This theorem is, together with von Neumann's theorem, the foundation of ergodic theory. It has many applications in various domains, such as number theory and probability theory. The strong law of large numbers is, in this context, a special case of Birkhoff's theorem. Let $(X, \mathcal{A}, \mu, \tau)$ be a measurable dynamical system and put, for $f \in L^0(\mu)$, $Tf = f \circ \tau$. Clearly $T$ is a positive isometry in any $L^p(\mu)$ space. We shall use the notation

\[ A_n^\tau f = \frac{1}{n}\sum_{k=0}^{n-1} f \circ \tau^k = \frac{1}{n}\sum_{k=0}^{n-1} T^k f = A_n^T f. \]

4.1.1 Theorem (Birkhoff [1931]). Let $(X, \mathcal{A}, \mu, \tau)$ be a measurable dynamical system. For any $f \in L^1(\mu)$, the limit

\[ \lim_{n\to\infty} A_n^\tau f(x) = \bar f(x) \]

exists $\mu$-almost everywhere and in $L^1(\mu)$, and we have $\bar f = \mathbf{E}\{f \mid \mathcal{J}\}$, where $\mathcal{J} = \sigma\{A \in \mathcal{A} : \tau^{-1}A = A\}$.

In case the dynamical system $(X, \mathcal{A}, \mu, \tau)$ is ergodic, then for any $f \in L^1(\mu)$,

\[ \lim_{n\to\infty} A_n^\tau f \overset{\text{a.s.}}{=} \int_X f\,d\mu. \tag{4.1.1} \]

Indeed if $(X, \mathcal{A}, \mu, \tau)$ is ergodic, then $\mathbf{E}\{f \mid \mathcal{J}\}$ is $\tau$-invariant, therefore constant by Lemma 3.2.2, and we have $\mathbf{E}\{f \mid \mathcal{J}\} = \int f\,d\mu$; hence (4.1.1). Conversely, assume


that (4.1.1) holds for any function $f$ of $L^1(\mu)$. Let $f \in L^1(\mu)$ be $\tau$-invariant. As $\frac{1}{n}\sum_{k=0}^{n-1} f \circ \tau^k = f$, it follows that $f = \int f\,d\mu$. Thus $f$ is constant (modulo $\mu$), and this implies by Lemma 3.2.2 that the dynamical system $(X, \mathcal{A}, \mu, \tau)$ is ergodic.

An immediate consequence is the well-known

Strong law of large numbers. Let $X, X_1, X_2, \ldots$ be a sequence of independent, identically distributed, integrable random variables with basic probability space $(\Omega, \mathcal{B}, \mathbf{P})$, and denote $S_n = X_1 + \cdots + X_n$. Then

\[ \mathbf{P}\Big\{\lim_{n\to\infty} \frac{S_n}{n} = \mathbf{E}\,X\Big\} = 1. \]

Proof of Theorem 4.1.1. (1) The theorem is verified for a dense subset $L$ of $L^1(\mu)$. Indeed, let $L$ be the set of functions $h = f + g - g \circ \tau$ with $f = f \circ \tau$ and $g \in L^\infty(\mu)$. As $L^\infty(\mu)$ is dense in $L^2(\mu)$, which is dense in $L^1(\mu)$, we deduce from the Riesz decomposition of $L^2(\mu)$ that $L$ is dense in $L^1(\mu)$. Now, integrating the inequality

\[ |A_n^\tau(f + g - g \circ \tau) - f| = \frac{|g - g \circ \tau^n|}{n} \le \frac{2\|g\|_\infty}{n} \]

implies the convergence in $L^1(\mu)$ of the averages $A_n^\tau h$ (and, the bound being uniform in $x$, also their convergence $\mu$-almost everywhere). Further,

\[ \mathbf{E}\{h \mid \mathcal{J}\} = \mathbf{E}\{f \mid \mathcal{J}\} + \mathbf{E}\{g \mid \mathcal{J}\} - \mathbf{E}\{g \circ \tau \mid \mathcal{J}\} = f + \mathbf{E}\{g \mid \mathcal{J}\} - \mathbf{E}\{g \mid \mathcal{J}\} = f = \bar h. \]

(2) The operators $A_n^\tau$, being barycenters of contractions of $L^1(\mu)$, are thus $L^1(\mu)$ contractions, as is the conditional expectation operator $\mathbf{E}\{\,\cdot \mid \mathcal{J}\}$. It follows from (1) and point (4) of the proof of the von Neumann Theorem 1.3.1 that $A_n^\tau(f)$ converges in $L^1(\mu)$ for any $f \in L^1(\mu)$. The limit, coinciding with $\mathbf{E}\{f \mid \mathcal{J}\}$ on $L$, is therefore equal to $\mathbf{E}\{f \mid \mathcal{J}\}$ for any $f \in L^1(\mu)$.

(3) In this step, we prove a first type of maximal lemma, due to Yoshida–Kakutani [1939] and Hopf [1960] (other proofs with simplified arguments were given in Riesz [1932], [1932], [1938], [1942]; see also the proof of Katznelson and Weiss [1982], and Petersen [1979] for other references, as well as Krengel [1985] for instructive comments), which is necessary to achieve the proof of Birkhoff's theorem. We follow here a simple and elegant proof given by Garsia [1965], [1970]. We introduce the notation

\[ M_N^T(f) = \sup_{1 \le n \le N} A_n^T f, \qquad M_\infty^T(f) = \sup_{N \ge 1} M_N^T f. \tag{4.1.2} \]

4.1.2 Lemma (Maximal inequality). Let $T$ be a positive contraction of $L^1(\mu)$. For any $f \in L^1(\mu)$ and any real $\lambda \ge 0$,

\[ \lambda\,\mu\{M_\infty^T(f) > \lambda\} \le \int_{\{M_\infty^T(f) > \lambda\}} f\,d\mu. \]

It follows that for any $f \in L^1(\mu)$,

\[ \sup_{\lambda \ge 0} \lambda\,\mu\Big\{\sup_{n \ge 1} |A_n^T f| > \lambda\Big\} \le \int |f|\,d\mu. \]


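Proof. Put

\[ S_N = \sup_{1 \le n \le N} \sum_{j=0}^{n-1} T^j f, \qquad S_N^+ = \max\{0, S_N\}, \qquad E_N = \{S_N > 0\}. \]

We have $S_N^+ \ge \sum_{j=0}^{k-1} T^j f$, $k = 1, \ldots, N$. Thus, $T$ being positive,

\[ f + T S_N^+ \ge f + \sum_{j=0}^{k-1} T^{j+1} f = f + \sum_{j=1}^{k} T^j f = \sum_{j=0}^{k} T^j f, \qquad k = 1, \ldots, N. \tag{4.1.3} \]

Moreover $f + T S_N^+ \ge f$. Hence $f + T S_N^+ \ge S_{N+1} \ge S_N$. Integrating this over $E_N$ gives

\[ \int_{E_N} f\,d\mu + \int_{E_N} T S_N^+\,d\mu \ge \int_{E_N} S_N\,d\mu = \int_X S_N^+\,d\mu. \]

But $\int_{E_N} T S_N^+\,d\mu \le \int_X T S_N^+\,d\mu \le \int_X S_N^+\,d\mu$, since $T$ is a positive contraction of $L^1(\mu)$. We deduce that $\int_{E_N} f\,d\mu \ge 0$. Let $E = \bigcup_N E_N = \{\sup_{n \ge 1} A_n^T(f) > 0\}$. The sets $E_N$ being increasing, by passing to the limit we get

\[ \int_E f\,d\mu \ge 0. \tag{4.1.4} \]

Replacing $f$ by $f - \lambda$, we find

\[ \int_{\{M_\infty^T(f) > \lambda\}} (f - \lambda)\,d\mu \ge 0. \]

Hence $\lambda\,\mu\{M_\infty^T(f) > \lambda\} \le \int_{\{M_\infty^T(f) > \lambda\}} f\,d\mu$, as claimed.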
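The key inequality $\int_{E_N} f\,d\mu \ge 0$ obtained in the proof can be tested numerically in the simplest setting: the cyclic shift $x \mapsto x + 1 \bmod m$ on $\mathbb{Z}/m\mathbb{Z}$ with counting measure, for which $Tf = f \circ \tau$ is a positive contraction of $\ell^1$. This finite model and the function names below are assumptions chosen for illustration; integer-valued $f$ keeps the arithmetic exact.

```python
import random


def maximal_set(f, N):
    """Points x of Z/mZ with S_N(x) = max_{1<=n<=N} sum_{j<n} f(x + j) > 0."""
    m = len(f)
    E = []
    for x in range(m):
        partial, best = 0, None
        for j in range(N):
            partial += f[(x + j) % m]
            best = partial if best is None else max(best, partial)
        if best > 0:
            E.append(x)
    return E


def integral_over_maximal_set(f, N):
    """Sum of f over E_N; the maximal lemma asserts this is nonnegative."""
    return sum(f[x] for x in maximal_set(f, N))


if __name__ == "__main__":
    rng = random.Random(1)
    for _ in range(5):
        f = [rng.randint(-5, 5) for _ in range(10)]
        print(f, maximal_set(f, 4), integral_over_maximal_set(f, 4))
```

Each printed sum is nonnegative, whatever the sign pattern of `f`, which is the content of (4.1.4) before passing to the limit in $N$.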

(4) Now we show that the set of functions $f \in L^1(\mu)$ such that $(A_N^\tau(f))$ converges $\mu$-almost everywhere is closed in $L^1(\mu)$. Let $f \in L^1(\mu)$ and let $g \in L^1(\mu)$ be such that $(A_N^\tau(g))$ converges $\mu$-almost everywhere. Then

\begin{align*}
0 \le \limsup_{n\to\infty} A_n^\tau(f) - \liminf_{n\to\infty} A_n^\tau(f)
&= \limsup_{n\to\infty} A_n^\tau(f - g) - \liminf_{n\to\infty} A_n^\tau(f - g) \\
&\le 2 M_\infty^\tau(|f - g|).
\end{align*}

According to the maximal lemma, $\lambda\,\mu\{M_\infty^\tau(|f - g|) > \lambda\} \le \|f - g\|_1$ for any $\lambda \ge 0$. Thus

\[ \mu\Big\{\limsup_{n\to\infty} A_n^\tau(f) - \liminf_{n\to\infty} A_n^\tau(f) > 2\lambda\Big\} \le \frac{1}{\lambda}\|f - g\|_1. \]

If $f$ is obtained as a limit in $L^1(\mu)$ of functions $g$ such that $(A_N^\tau(g))$ converges $\mu$-almost everywhere, we therefore deduce that $\limsup_{n\to\infty} A_n^\tau(f) - \liminf_{n\to\infty} A_n^\tau(f) = 0$, $\mu$-almost everywhere. This proves our claim.

(5) The proof is finally achieved by observing, according to (1) and (4), that the sequence $A_N^\tau(f)$ converges $\mu$-almost everywhere for any $f \in L^1(\mu)$. By (2), this convergence also holds in $L^1(\mu)$, the limit being identified with $\mathbf{E}\{f \mid \mathcal{J}\}$.


The maximal inequality will in turn imply (Section 4.2), for any $1 < p < \infty$,

\[ \Big\|\sup_{n \ge 1} |A_n^T f|\Big\|_p \le \frac{p}{p-1}\,\|f\|_p, \]

which is similar to the well-known martingale inequality.

Martingale inequality. Let $1 < p < \infty$, $q = p/(p-1)$. Let $\{S_j, \mathcal{E}_j, j \le n\}$ be a martingale with $\mathbf{E}\,|S_j|^p < \infty$, $j \le n$. Then

\[ \mathbf{E}\,\max_{1 \le j \le n} |S_j|^p \le q^p\, \mathbf{E}\,|S_n|^p. \]

The analogy goes beyond this remark. Birkhoff's theorem can in turn also be deduced from the martingale convergence theorem; see Stroock [1993: Chapter VI].

Martingale convergence theorem. Let $\{S_n, \mathcal{E}_n, n \ge 1\}$ be a martingale and assume $\sup_{n \ge 1} \mathbf{E}\,|S_n|^p < \infty$. Then $\{S_n, n \ge 1\}$ converges in $L^p$ and almost surely.

Flows. A flow $\{T_t, t \in \mathbb{R}\}$ is a group of measurable transformations $T_t : (X, \mathcal{A}) \to (X, \mathcal{A})$ with $T_0 = \text{Identity}$, $T_{t+s} = T_t T_s$ ($s, t \in \mathbb{R}$). If the $T_t$ are measure-preserving, the flow is called measure-preserving. The flow is called measurable if the map $(x, t) \mapsto T_t x$ is a measurable map from $(X \times \mathbb{R}, \tilde{\mathcal{A}})$ to $(X, \mathcal{A})$, where $\tilde{\mathcal{A}}$ is the completion of the product σ-algebra $\mathcal{A} \otimes \mathcal{B}(\mathbb{R})$ with respect to the product of the measure $\mu$ with the Lebesgue measure on $\mathbb{R}$. There are similar definitions for semiflows $\{T_t, t \ge 0\}$. Note that if $f \in L^1(\mu)$, then by Fubini's theorem $t \mapsto f(T_t x)$ is locally integrable for $\mu$-almost all $x$. Further

\[ \int_0^n f(T_t x)\,dt = \sum_{j=0}^{n-1} F(T_1^j x) \quad \text{with} \quad F(x) = \int_0^1 f(T_t x)\,dt. \]

Let also $F_0(x) = \int_0^1 |f(T_t x)|\,dt$. Then $F_0$ is integrable. The pointwise ergodic theorem thus implies that $n^{-1}\int_0^n f(T_t x)\,dt$ converges when $n$ tends to infinity, and also that $n^{-1}F_0(T_1^n x) \to 0$ almost surely. As, for $n \le \tau \le n + 1$, $\big|\int_0^\tau f(T_t x)\,dt - \int_0^n f(T_t x)\,dt\big| \le F_0(T_1^n x)$, the convergence also holds when $\tau \to \infty$, $\tau$ real.

For flows there is another kind of result, the local ergodic theorem due to Wiener: if $\{T_t, t \ge 0\}$ is a measure-preserving measurable semiflow and $f \in L^1(\mu)$, then

\[ \lim_{\varepsilon \to 0} \varepsilon^{-1}\int_0^\varepsilon f(T_t x)\,dt \overset{\text{a.e.}}{=} f(x). \tag{4.1.5} \]


Maximal inequality and maximal equality for flows. Let $f \in L^1(\mu)$ and define

\[ F_t(x) = \int_0^t f(T_s x)\,ds, \qquad f^*(x) = \sup_{t > 0} \frac{F_t(x)}{t} = \sup_{t > 0} \frac{1}{t}\int_0^t f(T_s x)\,ds. \]

Then, for every $\alpha \ge 0$,

\[ \alpha\,\mu\{f^* > \alpha\} \le \int_{\{f^* > \alpha\}} f\,d\mu. \]

The maximal inequality above is due to Wiener [1939] and to Yoshida and Kakutani [1939]. Marcus and Petersen [1979], and also Engel and Kakutani [1981], showed that this inequality is often an equality. More precisely, when the flow is ergodic, in the sense that every measurable subset $A \in \mathcal{A}$ which is invariant under the flow ($T_s A = A$ for $s \in \mathbb{R}$) has measure 0 or 1, then for $\alpha \ge \int f\,d\mu$,

\[ \alpha\,\mu\{f^* > \alpha\} = \int_{\{f^* > \alpha\}} f\,d\mu. \]

The integrability condition $f \in L^1(\mu)$ is not necessary to ensure the almost everywhere convergence of the ergodic means $A_n^\tau(f)$.

Gerstenhaber's counterexample. Let $X_0 = [0, 1)$, let $\mathcal{B}(X_0)$ be the σ-algebra of Borel sets of $X_0$, $\lambda$ the normalized Lebesgue measure on $X_0$, and $\theta_0$ an automorphism of $(X_0, \mathcal{B}(X_0), \lambda)$, for instance $\theta_0(x) = x + \alpha \bmod 1$, $\alpha$ irrational. Let also $1 = a_0 \ge a_1 \ge \cdots \ge 0$ be a decreasing sequence of reals, and put for any integer $n \ge 0$, $X_n = [0, a_n) \times \{n\}$. Let $X = \bigcup_{n=0}^{\infty} X_n$. We endow $X$ with the σ-algebra $\mathcal{B}$ defined by $\mathcal{B} = \{B \subset X : \forall n \ge 0,\ p_1(B \cap X_n) \in \mathcal{B}(X_0)\}$, where $p_1 : \mathbb{R}^2 \to \mathbb{R}$ is the projection on the first coordinate. Consider the measure $\mu$ on $(X, \mathcal{B})$ defined by $\mu(B) = \sum_{n=0}^{\infty} \lambda(p_1(B \cap X_n))$, $\forall B \in \mathcal{B}$. In addition, define the map

\[ \theta(x, y) = \begin{cases} (x, y + 1) & \text{if } x < a_{y+1}, \\ (\theta_0(x), 0) & \text{otherwise.} \end{cases} \]

It is easily seen that $\theta$ is an invertible measure-preserving ergodic transformation of the measure space $(X, \mathcal{B}, \mu)$. Choose the sequence $\{a_n, n \ge 1\}$ as follows:
• $a_1 = a_2$, $a_3 = a_4$, $a_5 = a_6$, …,
• $\sum_{n=1}^{\infty} a_{2n} < \infty$,
• $\sum_{n=1}^{\infty} \sqrt{n}\, a_{2n} = \infty$.

We can for instance choose $a_{2n} = n^{-3/2}$, $n \ge 1$. Then $\mu(X) = \sum_{n=0}^{\infty} a_n < \infty$. Let $f : X \to \mathbb{R}$ be defined as

\[ f(z) = \begin{cases} 0 & \text{if } z \in X_0, \\ -\sqrt{n} & \text{if } z \in X_{2n-1},\ n \ge 1, \\ \sqrt{n} & \text{if } z \in X_{2n},\ n \ge 1. \end{cases} \]

134

4 Pointwise ergodic theorems

√ j It is easily verified that n−1 j =0 f (x) ≤ n/2, hence 1 n−1 j • n j =0 f (x) → 0 for λ-almost all x, and + • f dm = f − dm = ∞.

Problem 3. Find a condition strictly weaker than integrability ensuring the validity of the conclusion in Birkhoff's theorem.

Non-integrable functions and growth of stationary sequences. If $\tau$ is an ergodic endomorphism of $(X, \mathcal{A}, \mu)$, and $f \ge 0$, $f \notin L^1(\mu)$, then Birkhoff's theorem implies that
$$\mu\bigl\{\lim_{N\to\infty} A_N^\tau(f) = \infty\bigr\} = 1,$$
since for any $k \ge 0$, $A_N^\tau(f) \ge A_N^\tau(f \wedge k)$, and therefore
$$\liminf_{N\to\infty} A_N^\tau(f) \ge \lim_{N\to\infty} A_N^\tau(f \wedge k) = \int (f \wedge k)\,d\mu.$$
Thus the integrability condition in Birkhoff's theorem is also necessary for nonnegative functions. If $\xi = \{\xi_k, k \ge 0\}$ is a strictly stationary sequence, Kesten [1975] showed that the related partial sums cannot grow slower than linearly. More precisely
$$\liminf_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \xi_k > 0 \quad \text{a.e. on } \Bigl\{\sum_{k=0}^{n-1} \xi_k \to \infty\Bigr\}. \tag{4.1.6}$$

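Proof via the shift model. Bourgain indicated in [Bourgain: 1988d] an alternate proof derived from the shift model $(\mathbb{Z}, S)$, where $Sz = \{z_{\ell+1}, \ell \in \mathbb{Z}\}$, $z = \{z_\ell, \ell \in \mathbb{Z}\}$. Let $(X, \mathcal{A}, \mu, \tau)$ be a measurable dynamical system. Fix some positive integers $J, N$ with $J \gg N$. Let $f \in L^0(\mu)$, $x \in X$ and consider the function $\varphi$ on $\mathbb{Z}$ defined as follows:
$$\varphi(j) = \begin{cases} f(\tau^j x) & \text{if } 0 \le j \le J, \\ 0 & \text{otherwise.} \end{cases}$$
Writing $T\varphi(j) = \varphi(j+1)$ for the action of the shift on functions, we have $\frac{1}{n}\sum_{k=0}^{n-1} T^k\varphi(j) = A_n^\tau f(\tau^j x)$ for $n \le N$, provided that $0 \le j < J - N$, and thus
$$\sup_{n=1}^{N} \Bigl|\frac{1}{n}\sum_{k=0}^{n-1} T^k\varphi(j)\Bigr| = \sup_{n=1}^{N} |A_n^\tau f(\tau^j x)|.$$
The maximal inequality of the shift model, which follows from elementary covering properties of integer-intervals,
$$\Bigl\|\sup_{n\ge 1} \Bigl|\frac{1}{n}\sum_{k=0}^{n-1} T^k\varphi\Bigr|\Bigr\|_{\ell^p(\mathbb{Z})} \le C(p)\,\|\varphi\|_{\ell^p(\mathbb{Z})},$$
implies
$$\sum_{0\le j<J-N}\ \sup_{n=1}^{N} |A_n^\tau f(\tau^j x)|^p \le C(p)^p \sum_{0\le j\le J} |f(\tau^j x)|^p;$$
and by integrating we have
$$\sum_{0\le j<J-N} \Bigl\|\sup_{n=1}^{N} |A_n^\tau f \circ \tau^j|\Bigr\|_p^p \le C(p)^p \sum_{0\le j\le J} \|f \circ \tau^j\|_p^p.$$
Since $\tau$ is μ-preserving, we obtain
$$\Bigl\|\sup_{n=1}^{N} |A_n^\tau f|\Bigr\|_p^p \le C(p)^p\,\frac{J}{J-N}\,\|f\|_p^p.$$
Hence, letting $J \to \infty$ and then $N \to \infty$,
$$\Bigl\|\sup_{n\ge 1} |A_n^\tau f|\Bigr\|_p \le C(p)\,\|f\|_p.$$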
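The weak-type counterpart on the shift model can be checked numerically. The following is a minimal sketch, assuming a finitely supported sequence $\varphi$ stored as a NumPy array; the function names `forward_maximal` and `weak_type_sides` are hypothetical and only serve the illustration. It computes the one-sided maximal function $\varphi^*(j) = \sup_{n\ge1} \frac1n\sum_{k=0}^{n-1}|\varphi(j+k)|$ and compares $\lambda\,\#\{j : \varphi^*(j) > \lambda\}$ with $\sum_j |\varphi(j)|$; by the maximal ergodic inequality applied to the shift with counting measure, the first quantity should not exceed the second.

```python
import numpy as np

def forward_maximal(phi):
    """phi^*(j) = sup_{n>=1} (1/n) * sum_{k=0}^{n-1} |phi(j+k)| for j = 0..len(phi)-1.

    phi is treated as zero to the right of the array, so windows reaching
    past the end can only lower the average and are skipped.
    """
    a = np.abs(np.asarray(phi, dtype=float))
    size = len(a)
    csum = np.concatenate(([0.0], np.cumsum(a)))
    out = np.empty(size)
    for j in range(size):
        n = np.arange(1, size - j + 1)           # admissible window lengths
        out[j] = np.max((csum[j + n] - csum[j]) / n)
    return out

def weak_type_sides(phi, lam):
    """Return (lam * #{j in Z : phi^*(j) > lam}, sum_j |phi(j)|) for lam > 0."""
    a = np.abs(np.asarray(phi, dtype=float))
    total = a.sum()
    # For j < 0 one has phi^*(j) <= total / |j|, so only |j| < total / lam can matter.
    pad = int(np.ceil(total / lam)) + 1
    padded = np.concatenate((np.zeros(pad), a))
    count = int(np.sum(forward_maximal(padded) > lam))
    return lam * count, total
```

For instance, `weak_type_sides(phi, 1.0)` on a random nonnegative array returns a pair whose first entry stays below the second; the left padding is there because the maximal function is also positive to the left of the support of $\varphi$.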

A similar argument yields the corresponding weak-type inequality. Extensions. Any linear contraction T on L1 of a σ -finite measure space, with

Tf ∞ ≤ f ∞ for f ∈ L1 ∩ L∞ , is called a Dunford–Schwartz contraction and induces a contraction on all Lp , 1 < p ≤ ∞ (see Dunford and Schwartz [1958]). Birkhoff’s ergodic theorem has been extended to Dunford–Schwartz contractions in Dunford and Schwartz [1956]: the limit E(T )f := limn→∞ n1 nk=1 T k f exists almost everywhere for f ∈ Lp , 1 ≤ p < ∞, and also in Lp -norm for p > 1 (and in L1 -norm in probability spaces). The same conclusion cannot be reached for unitary operators. Indeed, according to a remarkable result of Paszkiewicz [2005a: Theorem 1], in L2 (T), there exists a unitary operator V such that for each increasing sequence N = {nk , k ≥ 1} of positive integers, the sequence of ergodic averages K 1 nk V f (x), K

K = 1, 2, . . .

k=1

diverges almost everywhere for some f ∈ L2 (T). A well-known generalization of Birkhoff’s theorem due to Hopf [1937: 49] asserts that the sequence n−1 k k=0 T f n = 0, 1, 2, . . . n−1 k k=0 T g converges almost everywhere, provided f, g are measurable, f ∈ L1 (μ) and g > 0. This is a particular case of a more general result due to Hurewicz [1944], which can be described as follows. Let (X, A, μ) be a measure space with a nonnegative σ -finite measure μ. Further let F be another σ -finite measure on (X, A). Consider a 1-to-1 measurable transformation T of X. Assume that F is absolutely continuous with respect to μ (μ(A) = 0 implies F (A) = 0 and μ(A) < ∞ implies |F (A)| < ∞). Then ([Saks: 1937], p. 36) F can be represented as an indefinite integral: F (A) = A f0 (x)μ(dx).

Set
$$F_n(A) = \sum_{k=0}^{n-1} F(T^k A), \qquad \mu_n(A) = \sum_{k=0}^{n-1} \mu(T^k A), \qquad n = 0, 1, 2, \ldots.$$
Then the measure $F_n$ is absolutely continuous with respect to $\mu_n$. Thus there exists $f_n$ such that for all $A \in \mathcal{A}$,
$$F_n(A) = \int_A f_n\,d\mu_n.$$
Assume now that no measurable subset $A$ of $X$ with positive measure is a wandering set with respect to $T$. A measurable subset $A$ is a wandering set with respect to $T$ if the images $T^n A$, $n \in \mathbb{Z}$ are pairwise disjoint. Hurewicz [1944: Theorem 1] proved that the sequence $\{f_n, n \ge 1\}$ converges μ-almost everywhere on $X$ to a limit $\bar f$, which satisfies
(a) $\bar f(Tx) = \bar f(x)$ almost everywhere,
(b) $\bar f \in L^1(\mu)$,
(c) $F(A) = \int_A \bar f(x)\,\mu(dx)$ for all $A \in \mathcal{A}$ such that $TA = A$, $\mu(A) < \infty$.
In the special case of a measure-preserving transformation: $\mu(TA) = \mu(A)$ for $A \in \mathcal{A}$, one has easily
$$\mu_n(A) = (n+1)\mu(A), \qquad F_n(A) = \int_A \sum_{k=0}^{n-1} f_0(T^k x)\,\mu(dx).$$
Comparing with the relation linking $F_n$ and $f_n$, we deduce that, μ-almost everywhere on $X$,
$$f_n(x) = \frac{1}{n+1}\sum_{k=0}^{n-1} f_0(T^k x).$$
And by Hurewicz's theorem, these averages converge μ-almost everywhere. This is precisely Birkhoff's theorem. Consider now in addition to $f_0$ another measurable function $g_0$ such that $g_0(x) > 0$ μ-almost surely. We introduce the measure
$$\nu(A) = \int_A g_0(x)\,\mu(dx),$$
and define $\nu_n$ similarly. From the $T$-invariance of $\mu$, we get
$$\nu_n(A) = \int_A \sum_{k=0}^{n-1} g_0(T^k x)\,\mu(dx),$$
and so
$$F_n(A) = \int_A \frac{\sum_{k=0}^{n-1} f_0(T^k x)}{\sum_{k=0}^{n-1} g_0(T^k x)}\,\nu_n(dx).$$

By Hurewicz's theorem, we conclude that the sequence
$$\frac{\sum_{k=0}^{n-1} f_0(T^k x)}{\sum_{k=0}^{n-1} g_0(T^k x)}, \qquad n = 0, 1, 2, \ldots$$
converges μ-almost everywhere, which is Hopf's theorem. If $T$ is a positive contraction in $L^1(\mu)$, $(X, \mathcal{A}, \mu)$ a probability space, $f \in L^1(\mu)$ and $g \in L^1_+(\mu)$, then
$$\Bigl\{\frac{\sum_{k=0}^{n-1} T^k f}{\sum_{k=0}^{n-1} T^k g},\ n \ge 1\Bigr\} \quad \text{converges a.e. on } \Bigl\{\sum_{k=0}^{n-1} T^k g > 0\Bigr\} \text{ to a finite limit.}$$

This is Chacon–Ornstein's theorem. We refer to Krengel [1985: 119] for a proof and identification of the limit.

A theorem of Campbell and Petersen. Let $(X, \mathcal{A}, \mu)$ be a probability space and $\xi = \{\xi_k, k \in \mathbb{N}\}$ be a weakly stationary sequence in $L^2(\mu)$. Gaposhkin (Theorem 2.6.1) gave a necessary and sufficient condition for the convergence almost everywhere of the averages $\sigma_n = \frac{1}{n}\sum_{k=0}^{n-1} \xi_k$, involving the spectral measure of the associated unitary operator (Chapter 2). When $\xi$ is further strictly stationary, by the Birkhoff pointwise ergodic theorem, we know that these averages converge almost everywhere. It is natural to try to understand Gaposhkin's characterization in that case. Campbell and Petersen [1989] clarified this point. More precisely, let $T$ be a unitary operator on $L^2(\mu)$. Let $E_T$ denote the spectral measure for $T$, supported on the closed unit disk in $\mathbb{C}$, and for $n = 1, 2, \ldots$ let $V_n = \{z \in \mathbb{C} : 0 < |1 - z| < 2^{-n}\}$. By Theorem 2.6.1,
$$\text{(a)}\ \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} T^k f(x) \text{ exists a.e.} \iff \text{(b)}\ \lim_{n\to\infty} [E_T(V_n)f](x) = 0 \text{ a.e.}$$

When $T$ is induced by a measure-preserving transformation, a strengthened version of (b) actually holds.

4.1.3 Theorem. Let $Tf = f \circ \tau$ where $\tau$ is an automorphism of $(X, \mathcal{A}, \mu)$, with associated spectral representation
$$T = \int_{-\pi}^{\pi} e^{i\lambda}\,E(d\lambda).$$
If $\{\varepsilon_n, n \ge 1\}$ is any nonnegative sequence tending to 0 as $n$ tends to infinity, then
$$\lim_{n\to\infty} [E(-\varepsilon_n, 0)f](x) = 0 \quad \text{a.e. for all } f \in L^2(\mu). \tag{4.1.7}$$

The proof uses the ergodic Hilbert transform, which is for $f \in L^2(\mu)$ the almost sure limit
$$Hf(x) = \frac{1}{\pi}\lim_{n\to\infty} \sum_{1\le|k|\le n} \frac{T^k f(x)}{k}.$$
According to [Campbell: 1986], $H$ may be represented via the spectral integral
$$H = i\int_{-\pi}^{\pi} \eta(\lambda)\,E(d\lambda),$$
where $\eta(\lambda)$ is the odd function on $[-\pi, \pi]$ whose value for $\lambda \in (0, \pi]$ is $(\pi - \lambda)/\pi$ and $\eta(0) = 0$. Consider also for fixed $\varepsilon \in [-\pi, \pi]$ the rotated Hilbert transform of $f$ induced by $T$:
$$H_\varepsilon f(x) := \frac{1}{\pi}\lim_{n\to\infty} \sum_{1\le|k|\le n} \frac{e^{ik\varepsilon}\,T^k f(x)}{k}.$$
Campbell and Petersen proved this theorem by first showing that condition (4.1.7) is equivalent to a form of continuity at $\varepsilon = 0$ of the rotated Hilbert transform, that is
$$\lim_{n\to\infty} H_{\varepsilon_n} f(x) = Hf(x) + i[E\{0\}f](x) \quad \text{a.e.} \tag{4.1.8}$$
Next they showed that (4.1.8) in turn holds: if
$$H^* f(x) = \sup_{-\pi\le\varepsilon\le\pi}\ \sup_{n\ge 1} \Bigl|\frac{1}{\pi}\sum_{1\le|k|\le n} \frac{e^{ik\varepsilon}\,T^k f(x)}{k}\Bigr|,$$
then there exists a constant $C > 0$ such that for all $f \in L^2(\mu)$,
$$\sup_{\lambda\ge 0} \lambda^2 \mu\{x : H^* f(x) > \lambda\} \le C\,\|f\|_2^2. \tag{4.1.9}$$
With the help of the Banach principle, it is then easy to conclude. The proof of (4.1.9) follows from a nice maximal inequality established by the authors, which is worth quoting. For $a = \{a_k, k \in \mathbb{Z}\} \in \ell^2(\mathbb{Z})$, set
$$a^*(j) = \sup_{\varepsilon>0}\ \sup_{n\ge 1} \Bigl|\sum_{1\le|k|\le n} \frac{e^{i(k+j)\varepsilon}\,a_{k+j}}{k}\Bigr|.$$
There exists a constant $C > 0$ such that for all $a \in \ell^2(\mathbb{Z})$,
$$\sup_{\lambda\ge 0} \lambda^2\,\#\{j : a^*(j) > \lambda\} \le C \sum_{k\in\mathbb{Z}} |a_k|^2. \tag{4.1.10}$$
The authors conjectured that even strong (2, 2) holds: $\|a^*\|_{\ell^2(\mathbb{Z})} \le C\,\|a\|_{\ell^2(\mathbb{Z})}$.


Moving averages. Naturally moving averages present a more complex almost sure behavior than the usual "fixed" averages. The convergence almost everywhere of moving averages has been characterized by Bellow, Jones and Rosenblatt [1990], by means of a cone condition, which is related to works of Nagel and Stein [1984] and of Sueiro [1987]. Let $(X, \mathcal{A}, \mu, \tau)$ be a measurable dynamical system, and assume $\tau$ is ergodic. Let $\Omega = \{(n_k, \ell_k), k \ge 1\}$ be a sequence of pairs of integers and define
$$A_k f(x) = \frac{1}{\ell_k}\sum_{j=0}^{\ell_k - 1} f(T^{n_k + j} x), \qquad k = 1, 2, \ldots.$$
Introduce for $\alpha > 0$,
$$\Omega_\alpha = \{(z, s) \in \mathbb{N}^2 : |z - y| \le \alpha(s - r) \text{ for a pair } (y, r) \in \Omega\}.$$
Let $\Omega_\alpha(s) = \{k : (k, s) \in \Omega_\alpha\}$ be the cross-section of $\Omega_\alpha$ at height $s > 0$. Introduce also the maximal operator associated to $\Omega$,
$$M_\Omega f(x) = \sup_{(k,n)\in\Omega} \frac{1}{n}\sum_{j=0}^{n-1} |f(T^{k+j} x)|.$$
According to Theorem 1 in [Bellow–Jones–Rosenblatt: 1990], we have the following characterization.
a) Assume there exist constants $A < \infty$ and $\alpha > 0$ such that $|\Omega_\alpha(s)| \le As$ for any positive integer $s$. Then $M_\Omega$ is of weak type (1, 1) and of strong type $(p, p)$ for any $1 < p \le \infty$.
b) If $M_\Omega$ is of weak type $(p, p)$ for some $p > 0$, then for any $\alpha > 0$, there exists $A_\alpha < \infty$ such that for any positive integer $s$, $|\Omega_\alpha(s)| \le A_\alpha s$.
Here are two typical examples:
1. There exists $f \in L^\infty(\mu)$ such that
$$\mu\Bigl\{x : \frac{1}{2^k}\sum_{j=2^{2^k}}^{2^{2^k}+2^k-1} f(T^j x),\ k = 1, 2, \ldots \text{ converges}\Bigr\} = 0.$$
2. For every $f \in L^1(\mu)$,
$$\mu\Bigl\{x : \frac{1}{2^{2^k}}\sum_{j=2^{2\cdot 2^k}}^{2^{2\cdot 2^k}+2^{2^k}-1} f(T^j x),\ k = 1, 2, \ldots \text{ converges}\Bigr\} = 1.$$
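The cone condition can be explored numerically on the two examples. The sketch below is only an illustration under stated assumptions: the helper `cross_section_size` is a hypothetical name, points $z$ are restricted to nonnegative integers, and only finitely many pairs $(n_k, \ell_k)$ are listed, which is enough for the heights $s$ used. It counts $|\Omega_\alpha(s)|$ by merging the integer intervals $[y - \alpha(s-r),\, y + \alpha(s-r)]$ over the pairs $(y, r) \in \Omega$ with $r \le s$.

```python
def cross_section_size(pairs, alpha, s):
    """Number of integers z >= 0 with |z - y| <= alpha * (s - r) for some (y, r) in pairs, r <= s."""
    intervals = []
    for y, r in pairs:
        if r <= s:
            half = int(alpha * (s - r))       # integer points only, so round the half-width down
            intervals.append((max(0, y - half), y + half))
    intervals.sort()
    count, right_end = 0, -1
    for lo, hi in intervals:
        lo = max(lo, right_end + 1)           # skip integers already counted
        if hi >= lo:
            count += hi - lo + 1
            right_end = hi
    return count

# Example 1: (n_k, l_k) = (2^(2^k), 2^k); Example 2: (n_k, l_k) = (2^(2*2^k), 2^(2^k)).
example_1 = [(2 ** (2 ** k), 2 ** k) for k in range(1, 12)]
example_2 = [(2 ** (2 * 2 ** k), 2 ** (2 ** k)) for k in range(1, 6)]
```

Comparing `cross_section_size(example_1, 1.0, s) / s` with the same ratio for `example_2` at $s = 2^5$ and $s = 2^{10}$ suggests that the first ratio keeps growing with $s$ while the second stays bounded, in line with parts a) and b) of the characterization.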

4.2 Dominated ergodic theorems

Let (X, A, μ) be a measure space with μ(X) = 1. Let T be an L1 -L∞ positive contraction. We study in this section relations between integrability properties of f and those of the maximal operators defined in (4.1.2). The very proof of Lemma 4.1.2 also implies with minor changes the lemma below.

4.2.1 Lemma. Let $T$ be a positive contraction of $L^1(\mu)$. For any $f \in L^1(\mu)$ and $\lambda \ge 0$,
(a) $\lambda\,\mu\{M_\infty^T(f) > \lambda\} \le \int_{\{M_\infty^T(f) > \lambda\}} f\,d\mu$,
(b) $\lambda\,\mu\{M_\infty^T(f) > 2\lambda\} \le 2\int_{\{2f > \lambda\}} f\,d\mu$.

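These inequalities suggest introducing the following definition.

4.2.2 Definition. Let $(\Omega, \mathcal{A}, \mu)$ be a probability space and $X, Y : (\Omega, \mathcal{A}) \to \mathbb{R}_+$ two measurable applications. We say that $X$ and $Y$ are in maximal type relation if for any nonnegative real $\alpha$,
$$\alpha\,\mu(Y > \alpha) \le \int_{(Y > \alpha)} X\,d\mu < \infty.$$
We will first prove a useful lemma.

4.2.3 Lemma. Assume that $X$ and $Y$ are in maximal type relation. Let $\psi : \mathbb{R}_+ \to \mathbb{R}_+$ be increasing, right continuous and such that $\psi(0) = 0$. Then,
$$\int \psi(Y)\,d\mu \le \int X(\omega) \int_0^{Y(\omega)} t^{-1}\,\psi(dt)\,d\mu(\omega).$$
Proof. By means of the transfer formula,
$$\int \psi(Y)\,d\mu = \int_{\mathbb{R}_+} \mu(Y > \alpha)\,\psi(d\alpha) \le \int_{\mathbb{R}_+} \frac{1}{\alpha}\Bigl(\int_{Y > \alpha} X\,d\mu\Bigr)\psi(d\alpha) = \int_{\{(\omega,\alpha) : Y(\omega) > \alpha\}} \frac{X(\omega)}{\alpha}\,d\mu(\omega)\,\psi(d\alpha) = \int X(\omega)\Bigl(\int_0^{Y(\omega)} \frac{1}{\alpha}\,\psi(d\alpha)\Bigr)d\mu(\omega).$$
We shall establish the following theorem.

4.2.4 Theorem (Dominated ergodic theorem). Let $T$ be an $L^1$-$L^\infty$ positive contraction. Let $f \ge 0$ be measurable, then
(a) $\|M_\infty^T f\|_p \le \dfrac{p}{p-1}\,\|f\|_p$ $(1 < p < \infty)$,
(b) $\|M_\infty^T f\|_1 \le \dfrac{e}{e-1}\Bigl(1 + \int f \log^+ f\,d\mu\Bigr)$.
According to the usual terminology, inequality (a) means that $M_\infty^T$ is of strong type $(p, p)$, and it is of weak type (1, 1) by Lemma 4.1.2.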
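Both inequalities of Theorem 4.2.4 can be observed on a toy system. The following is a minimal sketch, assuming the rotation $x \mapsto x + 1$ on $\mathbb{Z}/m\mathbb{Z}$ with the uniform probability measure, so that $Tf = f \circ \tau$ is an $L^1$-$L^\infty$ positive contraction; the names `maximal_function_cyclic` and `lp_norm` are hypothetical. On a cycle of length $m$, averages over more than $m$ steps are convex combinations of averages over at most $m$ steps, so the supremum over all $n$ reduces to a maximum over $1 \le n \le m$.

```python
import numpy as np

def maximal_function_cyclic(f):
    """M f(x) = max_{1<=n<=m} (1/n) * sum_{k<n} f(x + k mod m) for the rotation x -> x+1 on Z/mZ."""
    f = np.asarray(f, dtype=float)
    m = len(f)
    doubled = np.concatenate((f, f))                      # unrolls the cycle once
    csum = np.concatenate(([0.0], np.cumsum(doubled)))
    n = np.arange(1, m + 1)
    return np.array([np.max((csum[x + n] - csum[x]) / n) for x in range(m)])

def lp_norm(g, p):
    """L^p norm with respect to the uniform probability measure on Z/mZ."""
    return float(np.mean(np.abs(g) ** p) ** (1.0 / p))
```

With `M = maximal_function_cyclic(f)` for a nonnegative array `f`, one can compare `lp_norm(M, p)` with `p / (p - 1) * lp_norm(f, p)` for inequality (a), and `lp_norm(M, 1)` with $\frac{e}{e-1}\bigl(1 + \int f\log^+ f\,d\mu\bigr)$ for inequality (b).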

Proof. We apply Lemma 4.2.3 with $\psi(t) = t^p$, $p > 1$, $Y = M_n^T f$, $X = f$. According to Lemma 4.1.2, $X$ and $Y$ are in maximal type relation. It follows that
$$\int \psi(Y)\,d\mu = \int Y^p\,d\mu \le \int f(\omega)\int_0^{Y(\omega)} p\,t^{p-2}\,dt\,d\mu(\omega) = \frac{p}{p-1}\int f(\omega)\,Y^{p-1}(\omega)\,d\mu(\omega) = \frac{p}{p-1}\int f\,Y^{p-1}\,d\mu.$$
We apply Hölder's inequality: $\int f \cdot g\,d\mu \le \bigl(\int f^a\,d\mu\bigr)^{1/a}\bigl(\int g^b\,d\mu\bigr)^{1/b}$, $f, g \ge 0$, $1/a + 1/b = 1$, with the choices $a = p$, $b = p/(p-1)$, $g = Y^{p-1}$. This leads to
$$\int Y^p\,d\mu \le \frac{p}{p-1}\Bigl(\int f^p\,d\mu\Bigr)^{\frac{1}{p}}\Bigl(\int Y^p\,d\mu\Bigr)^{\frac{p-1}{p}},$$
or else $\|Y\|_p \le \frac{p}{p-1}\,\|f\|_p$. And inequality (a) follows from Fatou's lemma, since $M_n^T \uparrow M_\infty^T$.

Now observe that for any $a \ge 0$, $b > 0$, $a \log b \le a \log^+ a + b/e$. This is easily proved by first observing that $\log x \le x - 1$ $(x > 0)$, which allows us to get $\log b \le b/e$, then by distinguishing the cases $a \le 1$ and $a > 1$. Let $\psi(t) = (t-1)^+$ $(t > 0)$. Put $X = f$, $Y = M_n f$. Then $\int \psi(Y)\,d\mu \ge \int (Y - 1)\,d\mu$, and in view of Lemma 4.2.3,
$$\int (Y-1)^+\,d\mu \le \int f \cdot \Bigl(\int_0^{Y} t^{-1}\,\psi(dt)\Bigr)d\mu \le \int_{Y\ge 1} f \cdot \Bigl(\int_1^{Y} t^{-1}\,dt\Bigr)d\mu = \int_{Y\ge 1} f \log Y\,d\mu \le \int f \log^+ f\,d\mu + e^{-1}\int Y\,d\mu.$$
Thus
$$\int Y\,d\mu \le \int (Y-1)^+\,d\mu + 1 \le \int f \log^+ f\,d\mu + e^{-1}\int Y\,d\mu + 1,$$
or else $(1 - 1/e)\int Y\,d\mu \le 1 + \int f \log^+ f\,d\mu$. One concludes as in the previous step, hence part (b) of Theorem 4.2.4 is proved.

When $f = 1_A$, inequality (b) of Theorem 4.2.4 does not provide any hint on the possible continuity of the maximal operator $M_\infty^T(f)$ when $\mu(A)$ tends to 0. We shall clarify this point by showing the following lemma.

4.2.5 Lemma. Let $\varepsilon > 0$, then for any $A \in \mathcal{A}$,
$$\int (M_\infty^T(1_A) - \varepsilon)^+\,d\mu \le \Bigl(\log\frac{1}{\varepsilon}\Bigr)\mu(A). \tag{4.2.2}$$
Let also $A_1, \ldots, A_N$ be pairwise disjoint measurable sets. Then
$$\sum_{i=1}^{N} \int M_\infty^T(1_{A_i})\,d\mu \le \sum_{i=1}^{N} \mu(A_i)\Bigl(1 + \log\frac{N}{\sum_{i=1}^{N}\mu(A_i)}\Bigr). \tag{4.2.3}$$

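Proof. Put $f = 1_A$ and $\psi(t) := (t - \varepsilon)^+$, $\varepsilon > 0$. Lemma 4.2.3 applied to $Y = M_n^T f$, $X = |f|$, provides the estimate
$$\int (Y - \varepsilon)^+\,d\mu \le \int X(\omega)\,\log\frac{Y(\omega) \vee \varepsilon}{\varepsilon}\,d\mu(\omega).$$
Thus
$$\int (M_n^T(1_A) - \varepsilon)^+\,d\mu \le \int 1_A\,\log\frac{M_n(1_A)^+ \vee \varepsilon}{\varepsilon}\,d\mu(\omega).$$
Hence,
$$\int (M_n^T(1_A) - \varepsilon)^+\,d\mu \le \Bigl(\log\frac{1}{\varepsilon}\Bigr)\mu(A).$$
Letting now $n$ tend to infinity, we deduce
$$\int (M_\infty^T(1_A) - \varepsilon)^+\,d\mu \le \Bigl(\log\frac{1}{\varepsilon}\Bigr)\mu(A). \tag{4.2.4}$$
Now let $A_1, \ldots, A_N$ be pairwise disjoint measurable sets. Then
$$\sum_{i=1}^{N}\int M_\infty^T(1_{A_i})\,d\mu = \sum_{i=1}^{N}\int (M_\infty^T(1_{A_i}) - \varepsilon + \varepsilon)\,d\mu \le N\varepsilon + \sum_{i=1}^{N}\int (M_\infty^T(1_{A_i}) - \varepsilon)^+\,d\mu \le N\varepsilon + \log\frac{1}{\varepsilon}\sum_{i=1}^{N}\mu(A_i).$$
Hence,
$$\sum_{i=1}^{N}\int M_\infty^T(1_{A_i})\,d\mu \le \inf_{\varepsilon>0}\Bigl\{N\varepsilon + \log\frac{1}{\varepsilon}\sum_{i=1}^{N}\mu(A_i)\Bigr\}.$$
The infimum of the right-hand side is reached at the value $\varepsilon = \sum_{i=1}^{N}\mu(A_i)/N$. We thus have
$$\sum_{i=1}^{N}\int M_\infty^T(1_{A_i})\,d\mu \le \sum_{i=1}^{N}\mu(A_i)\Bigl(1 + \log\frac{N}{\sum_{i=1}^{N}\mu(A_i)}\Bigr).$$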
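The value of the infimum used in the last step can be verified by elementary calculus. In the worked computation below, the shorthand $s = \sum_{i=1}^{N}\mu(A_i)$ and the function $\varphi$ are notation introduced here only for the illustration, and $s > 0$ is assumed.

```latex
\[
\varphi(\varepsilon) = N\varepsilon + s\log\frac{1}{\varepsilon},
\qquad s = \sum_{i=1}^{N}\mu(A_i), \quad \varepsilon > 0,
\]
\[
\varphi'(\varepsilon) = N - \frac{s}{\varepsilon} = 0
\iff \varepsilon = \frac{s}{N},
\qquad
\varphi''(\varepsilon) = \frac{s}{\varepsilon^{2}} > 0,
\]
\[
\varphi\Bigl(\frac{s}{N}\Bigr) = s + s\log\frac{N}{s}
= s\Bigl(1 + \log\frac{N}{s}\Bigr).
\]
```

Since the sets $A_i$ are pairwise disjoint in a probability space, $s \le 1$, so the minimizer $s/N$ does not exceed 1 and $\log(1/\varepsilon)$ is nonnegative there.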

A maximal inequality in BMO. Let $(X, \mathcal{F}, \mu)$ be a probability space. Let $T$ be a positive contraction of $L^1(\mu)$ such that $T1 = 1$. Having now proved the dominated ergodic theorem, some useful observations can be made, notably in the light of Hopf's maximal inequality, which we recall for our purpose: for any $f \in L^1(\mu)$,
$$\forall\lambda > 0, \qquad \lambda\,\mu\{M_\infty^T(f) > \lambda\} \le \int_{\{M_\infty^T(f) > \lambda\}} f\,d\mu.$$
And by means of the dominated ergodic theorem 4.2.4, for any $\lambda \ge 0$ and $r > 1$,
$$\lambda\,\mu\Bigl\{\sup_{\theta\in\Theta} M_\infty^T(f_\theta) > \lambda\Bigr\} \le \int \sup_{\theta\in\Theta} M_\infty^T(|f_\theta|)\,d\mu \le \Bigl\|\sup_{\theta\in\Theta} M_\infty^T(|f_\theta|)\Bigr\|_r \le \frac{r}{r-1}\Bigl\|\sup_{\theta\in\Theta}|f_\theta|\Bigr\|_r \le \frac{r}{r-1}\,\#(\Theta)^{\frac{1}{r}}\sup_{\theta\in\Theta}\|f_\theta\|_r, \tag{4.2.5}$$
where $(f_\theta, \theta \in \Theta)$ is any finite subset of $L^1(\mu)$ (see Peškir–Weber [1996] and Ziegler [1998] for extensions to the non-measurable case). The last inequality follows from Jensen's inequality. We shall extend this one to BMO spaces. Recall their definition. Let $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}$ be an increasing filtration of $\mathcal{F}$ ($\mathcal{F} = \bigvee_{i=0}^{\infty}\mathcal{F}_i$). Let $f \in L^1(\mu)$, and put
$$f_n = E(f \mid \mathcal{F}_n), \qquad f_n^* = \sup_{0\le\nu\le n}|f_\nu|, \qquad \Delta f_n = f_n - f_{n-1},$$
$$S_n(f) = \Bigl(\sum_{\nu=1}^{n}[\Delta f_\nu]^2\Bigr)^{1/2}, \qquad f^* = \sup_n f_n^*, \qquad S(f) = \sup_n S_n(f).$$
Introduce first the Hardy spaces, let $p \ge 1$ and define
$$\mathcal{H}_p = \bigl\{f : E[S(f)]^p < \infty\bigr\} \tag{4.2.6}$$
with norm $\|f\|_{\mathcal{H}_p} = \bigl(E[S(f)]^p\bigr)^{1/p}$. Now we introduce the BMO spaces (for bounded mean oscillations)
$$\mathrm{BMO} = \Bigl\{f : \sup_{n\ge 1}\bigl\|E(|f - f_{n-1}|^2 \mid \mathcal{F}_n)\bigr\|_\infty < \infty\Bigr\} \tag{4.2.7}$$
with norm $\|f\|_{\mathrm{BMO}} = \sup_{n\ge 1}\bigl\|E(|f - f_{n-1}|^2 \mid \mathcal{F}_n)\bigr\|_\infty^{1/2}$, for $f$ such that $f_0 = E(f \mid \mathcal{F}_0) = 0$. Recall that these spaces are Banach spaces and that they strictly intercalate between the space of exponentially integrable functions and $L^\infty(\mu)$. Indeed, by a theorem of John–Nirenberg [1961], any element $f \in \mathrm{BMO}$ is exponentially integrable. And a closed graph argument shows that there exists a constant $C$ (possibly depending on the filtration) such that for any $1 \le r \le \infty$,
$$\|f\|_r \le C r\,\|f\|_{\mathrm{BMO}}. \tag{4.2.8}$$
There are further examples of functions $f$ belonging to BMO, but not to $L^\infty(\mu)$. Recall also Fefferman's inequality (see Garsia [1973: 7–8]) on the duality $\mathcal{H}_1$–BMO. Let $f, \varphi$ be such that $E(f \mid \mathcal{F}_0) = E(\varphi \mid \mathcal{F}_0) = 0$, then
$$|E(f \cdot \varphi)| \le c\,\|f\|_{\mathcal{H}_1}\|\varphi\|_{\mathrm{BMO}}, \tag{4.2.9}$$
in the following sense: $E(f \cdot \varphi) = \lim_{n\to\infty} E(f_n \cdot \varphi_n)$, and $c$ is a universal constant. Recall also (Garsia [1973: 27]) that for any $p \ge 1$,
$$\bigl[E\,S^p(f)\bigr]^{1/p} \le C_p\bigl[E\,(f^*)^p\bigr]^{1/p}. \tag{4.2.10}$$

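One may refer to Garsia [1973] for more insights on these spaces. We are going to establish the following result.

4.2.6 Theorem. Let $\{f_\theta, \theta \in \Theta\}$ be a finite subset of BMO and assume further that $E(f_\theta \mid \mathcal{F}_0) = 0$, $\theta \in \Theta$. Then,
$$\forall\lambda > 0, \qquad \lambda\,\mu\Bigl\{\sup_{\theta\in\Theta} M_\infty(f_\theta) > \lambda\Bigr\} \le C \sup_{\theta\in\Theta}\|f_\theta\|_{\mathrm{BMO}} \cdot \log\#(\Theta),$$
where $C$ is a universal constant.

Proof. Let $(f_\theta, \theta \in \Theta)$ be a finite subset of BMO and put $r = \log\#(\Theta)$ (without loss of generality, one can assume that $\#(\Theta) \ge 3$). With this choice $\#(\Theta)^{1/r} = e$. We deduce from inequalities (4.2.5) and (4.2.8) that for any $\lambda \ge 0$,
$$\lambda\,\mu\Bigl\{\sup_{\theta\in\Theta} M_\infty^T(f_\theta) > \lambda\Bigr\} \le \frac{r}{r-1}\Bigl\|\sup_{\theta\in\Theta}|f_\theta|\Bigr\|_r \le \frac{r}{r-1}\,\#(\Theta)^{1/r}\sup_{\theta\in\Theta}\|f_\theta\|_r \le C \log\#(\Theta) \cdot \sup_{\theta\in\Theta}\|f_\theta\|_{\mathrm{BMO}}.$$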

4.3

Classes $L \log^m L$

For any positive integer $m$, let $L \log^m L$ denote the class of measurable functions $f$ such that $|f|(\log^+|f|)^m$ is integrable. These classes naturally appear in the study of the integrability properties of $M_\infty(f)$.

4.3.1 Theorem. Let $T$ be an $L^1$-$L^\infty$ positive contraction. Then, for any positive integer $m$,
$$f \in L \log^m L \implies M_\infty^T(f) \in L \log^{m-1} L. \tag{4.3.1}$$
Proof. We put $Y = M_n^T(|f|)$, $X = |f|$. By Lemma 4.2.1, we have for any $\alpha \ge 0$,
$$\alpha\,\mu\{Y \ge \alpha\} \le 2\int_{\{2X \ge \alpha\}} X\,d\mu.$$
We say in this case that $X$ and $Y$ are in relation of weak maximal type. By arguing as in the proof of Lemma 4.2.3, it is possible to also establish: for any right-continuous function $\psi : \mathbb{R}_+ \to \mathbb{R}_+$, with $\psi(0) = 0$,
$$\int \psi(Y)\,d\mu \le 2\int X(\omega)\int_0^{2X(\omega)} t^{-1}\,\psi(dt)\,d\mu(\omega). \tag{4.3.2}$$
Choose $\psi(t) = t(\log^+ t)^{m-1}$, $m \ge 2$. As
$$\frac{d\psi}{dt} = (m-1)(\log^+ t)^{m-2} + (\log^+ t)^{m-1},$$
and $\psi$ vanishes on $[0, 1]$, we get
$$\int_0^{2X(\omega)} t^{-1}\,\psi(dt) \le m(\log^+ 2X(\omega))^{m-1}\int_1^{2X(\omega)\vee 1} t^{-1}\,dt = m(\log^+ 2X(\omega))^m.$$
And so
$$\int M_n^T(|f(t)|)\bigl(\log^+ M_n^T(|f(t)|)\bigr)^{m-1}\,d\mu(t) \le 2m\int |f(t)|\bigl(\log^+ 2|f(t)|\bigr)^m\,d\mu(t).$$
We conclude by letting $n$ tend to infinity.

4.4 A converse

A theorem due to Ornstein [1971] shows that if $\tau$ is an ergodic automorphism, the sufficient condition $f \in L \log L$ is also necessary for the integrability of $M_\infty f$, when $f \ge 0$.

4.4.1 Theorem. If $\tau$ is an ergodic automorphism from a measure space $(\Omega, \mathcal{A}, \mu)$, where $\mu$ is a finite measure, then for any $f \ge 0$, we have the equivalence
$$M_\infty^\tau f = f^* \in L^1 \iff f \in L \log L.$$
The proof relies upon the following lemma due to Moy [1960]. Put for $A \in \mathcal{A}$ such that $\mu(A) > 0$, $\omega \in A$,
$$r_A(\omega) = \inf\{n \ge 1 : \tau^n(\omega) \in A\} \tag{4.4.1}$$
and let $A^* = \bigcup_{i=1}^{\infty}\tau^i A$.

4.4.2 Lemma. Let $\tau$ be an automorphism from a measure space $(\Omega, \mathcal{A}, \mu)$. Let $f \in L^1(\mu)$, then
$$\int_A \sum_{k=0}^{r_A - 1} f \circ \tau^k\,d\mu = \int_{A^*} f\,d\mu.$$
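On a finite set the identity of Lemma 4.4.2 is an exact count and can be checked directly. The following is a minimal sketch, assuming $\tau$ is a permutation of $\{0, \ldots, m-1\}$ given as a list and $\mu$ is the counting measure; the function names `return_time` and `moy_sides` are hypothetical.

```python
import random

def return_time(tau, A, w):
    """r_A(w) = min{n >= 1 : tau^n(w) in A}; terminates because w in A lies on a cycle of tau."""
    n, x = 1, tau[w]
    while x not in A:
        x = tau[x]
        n += 1
    return n

def moy_sides(tau, A, f):
    """Both sides of the identity of Lemma 4.4.2 for the counting measure."""
    A = set(A)
    lhs = 0
    for w in A:
        x = w
        for _ in range(return_time(tau, A, w)):   # k = 0, ..., r_A(w) - 1
            lhs += f[x]
            x = tau[x]
    # A^* = union over i >= 1 of tau^i(A): the forward orbits of the points of A.
    A_star = set()
    for w in A:
        x = tau[w]
        while x not in A_star:
            A_star.add(x)
            x = tau[x]
    rhs = sum(f[x] for x in A_star)
    return lhs, rhs
```

For example, with `tau = [1, 2, 0, 4, 3]` (cycles $(0\,1\,2)$ and $(3\,4)$), `A = {0}` and `f = [1, 2, 3, 4, 5]`, the point 0 returns after three steps, $A^* = \{0, 1, 2\}$, and both sides equal 6.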

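Proof. Introduce for any positive integer $k$, the sets $A_k = \tau^k\{\omega \in A : r_A(\omega) \ge k+1\}$. We claim that these sets form a countable partition of $A^* \setminus A$. First if $\omega \in A_k$, then $\tau^{-k}\omega \in A$ and $r_A(\tau^{-k}\omega) = \inf\{n \ge 1 : \tau^{n-k}(\omega) \in A\} \ge k+1$, which implies that $\omega \notin A$. Thus $\omega \in \tau^k A \cap A^c$, and so $\bigcup_{k=1}^{\infty} A_k \subset A^* \setminus A$. Conversely if $\omega \in A^* \setminus A$, let $i_0 \ge 1$ be the smallest integer for which $\omega \in \tau^{i_0} A$, thus $\omega \in \tau^{i_0} A$ and $\omega \notin \tau^j A$ for $j < i_0$. Rewrite this as (using that $\omega \notin A$)
$$\tau^{-i_0}\omega \in A \quad\text{and}\quad \tau^{n-i_0}\omega \notin A \ \text{ if } 1 \le n \le i_0.$$
This and the fact that $\omega \in A^c$ together imply $r_A(\tau^{-i_0}\omega) > i_0$, which means that $\omega \in A_{i_0}$. Thus $\bigcup_{k=1}^{\infty} A_k = A^* \setminus A$. Now let $1 \le k < l$ and pick $\omega \in A_k \cap A_l$. On the one hand, since $\omega \in A_l$ we have $r_A(\tau^{-l}\omega) \ge l + 1$. And on the other, as $\omega \in A_k$ we get $\tau^{(l-k)-l}\omega = \tau^{-k}\omega \in A$. Thus $r_A(\tau^{-l}\omega) \le l - k$, which provides a contradiction. Hence $A_k \cap A_l = \emptyset$. Let $g \ge 0$ be integrable. We deduce
$$\begin{aligned}
\int_A \sum_{k=0}^{r_A-1} g \circ \tau^k\,d\mu &= \sum_{j=1}^{\infty}\int_{(r_A=j)\cap A}\sum_{k=0}^{j-1} g \circ \tau^k\,d\mu = \sum_{j=1}^{\infty}\sum_{k=0}^{j-1}\int_{(r_A=j)\cap A} g \circ \tau^k\,d\mu \\
&= \sum_{k=0}^{\infty}\sum_{j=k+1}^{\infty}\int_{(r_A=j)\cap A} g \circ \tau^k\,d\mu = \sum_{k=0}^{\infty}\int_{(r_A>k)\cap A} g \circ \tau^k\,d\mu \\
&= \int_A g\,d\mu + \sum_{k=1}^{\infty}\int_{A_k} g\,d\mu \qquad \text{(by (3.1.2))} \\
&= \int_{A^*\setminus A} g\,d\mu + \int_A g\,d\mu = \int_{A^*} g\,d\mu,
\end{aligned} \tag{4.4.2}$$
where in the last equality we used the fact that $A^* \cap A = A$, as it follows by applying Poincaré recurrence Theorem 3.1.5 to $\tau^{-1}$. It is the only instance, with the use of (3.1.2), where the assumption that $\tau$ is an automorphism is used. Now let $f \in L^1(\mu)$ and write $f = f^+ - f^-$. The proof is now achieved by applying (4.4.2) to $g = f^+$ and $g = f^-$.

Notice from (4.4.2) that
$$\int_A \sum_{k=0}^{r_A-1} g \circ \tau^k\,d\mu = \int_{A\cap(r_A>1)}\sum_{k=0}^{r_A-1} g \circ \tau^k\,d\mu + \int_{A\cap(r_A=1)} g\,d\mu = \int_{A^*\setminus A} g\,d\mu + \int_A g\,d\mu.$$
Thus
$$\int_{A\cap(r_A>1)}\sum_{k=0}^{r_A-1} g \circ \tau^k\,d\mu = \int_{A^*\setminus A} g\,d\mu + \int_{A\cap(r_A>1)} g\,d\mu. \tag{4.4.3}$$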

4.4.3 Lemma. Let $\tau$ be an ergodic automorphism from a measure space $(\Omega, \mathcal{A}, \mu)$. Let $f \ge 0$ be integrable and assume that for some $\alpha > 0$ the measure of the set $A = \{f^* < \alpha\}$ is positive. Then,
$$\int_{\{f^* \ge \alpha\}} f\,d\mu \le 2\alpha\,\mu\{f^* \ge \alpha\}.$$
This result provides for ergodic automorphisms a converse to the maximal inequality given in Lemma 4.1.2.

Proof. By Remark 3.1.4 applied to $\tau^{-1}$, $\mu(A^*) = 1$. Further $A \cap (r_A > 1) = A \cap \tau^{-1}(A^c)$ since $\omega \in A \cap (r_A > 1)$ means that $\tau\omega \in A^c$, thereby $\omega \in A \cap \tau^{-1}(A^c)$ and conversely. Recall also that $\omega \in A$ implies $r_A(\omega) < \infty$. Thus for any $g \in L^1(\mu)$, by (4.4.3),
$$\int_{A\cap(r_A>1)}\sum_{k=0}^{r_A-1} g \circ \tau^k\,d\mu = \int_{A^c} g\,d\mu + \int_{A\cap(r_A>1)} g\,d\mu = \int_{A^c} g\,d\mu + \int_{A\cap\tau^{-1}(A^c)} g\,d\mu. \tag{4.4.4}$$
And if $g \ge 0$,
$$\int_{A\cap(r_A>1)}\sum_{k=0}^{r_A-1} g \circ \tau^k\,d\mu \ge \int_{A^c} g\,d\mu. \tag{4.4.5}$$
Let $\omega \in A$ be such that $r_A(\omega) > 1$. Observe that
$$\sum_{k=0}^{r_A-1} f(\tau^k\omega) < \alpha\,r_A(\omega), \tag{4.4.6}$$
since otherwise we would have
$$f^*(\omega) \ge \frac{1}{r_A}\sum_{k=0}^{r_A-1} f(\tau^k\omega) \ge \alpha,$$
which is absurd. Now by using (4.4.5) for $g = f$, next (4.4.6) and finally (4.4.4) for $g = 1$, we obtain
$$\begin{aligned}
\int_{A^c} f\,d\mu &\le \int_{A\cap(r_A>1)}\sum_{k=0}^{r_A-1} f \circ \tau^k\,d\mu \le \alpha\int_{A\cap(r_A>1)} r_A\,d\mu = \alpha\int_{A\cap(r_A>1)}\sum_{k=0}^{r_A-1} 1\,d\mu \\
&\le \alpha\Bigl(\int_{A^c} 1\,d\mu + \int_{\tau^{-1}(A^c)} 1\,d\mu\Bigr) = 2\alpha\,\mu(A^c).
\end{aligned}$$
This achieves the proof.

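Proof of Theorem 4.4.1. We can assume $f \ge 1$ (otherwise consider $f + 1$ in place of $f$). Let $\alpha_0 = \inf\{\alpha : \mu(f^* < \alpha) > 0\}$. We have
$$\begin{aligned}
\int f \log f\,d\mu &= \int f(\omega)\int_1^{f(\omega)} \alpha^{-1}\,d\alpha\,d\mu(\omega) = \int_1^{\infty} \alpha^{-1}\int_{\{f\ge\alpha\}} f(\omega)\,d\mu(\omega)\,d\alpha \\
&\le \int_1^{\infty} \alpha^{-1}\int_{\{f^*\ge\alpha\}} f(\omega)\,d\mu(\omega)\,d\alpha \le 2\int_{\alpha_0}^{\infty}\mu\{f^* \ge \alpha\}\,d\alpha + \|f\|_1\int_1^{\alpha_0}\alpha^{-1}\,d\alpha \\
&\le 2\,\|f^*\|_1 + \|f\|_1\log\alpha_0.
\end{aligned}$$
4.4.4 Remark. Let $a = \{a_k, k \ge 0\}$ be a sequence of bounded nonnegative reals, and consider the weighted ergodic averages
$$W_n^\tau f = \frac{\sum_{k=0}^{n-1} a_k\,f \circ \tau^k}{\sum_{k=0}^{n-1} a_k}.$$
Put $W_\infty^\tau f = \sup_{n\ge 1} W_n^\tau f$. Let $m$ be any positive integer. If $C^{-1} \le a_k \le C$, $k = 0, 1, \ldots$, the same arguments also show that if $\tau$ is an ergodic automorphism from a measure space $(\Omega, \mathcal{A}, \mu)$, then for any $f \ge 0$,
$$W_\infty^\tau f \in L \log^{m-1} L \implies f \in L \log^m L. \tag{4.4.7}$$
The interesting case when $a_k = 1$ or $0$ according to whether $k \in \mathcal{N}$ or not, where $\mathcal{N} = \{n_k, k \ge 1\}$ is an increasing sequence of integers, should require us to work with $r_{\mathcal{N},A}(\omega) = \inf\{\ell \ge 1 : \tau^{n_\ell}(\omega) \in A\}$ and $A^* = \bigcup_{i=1}^{\infty}\tau^{n_i}A$. Some additional properties of $\mathcal{N}$, e.g., $n_i \pm n_j \in \mathcal{N}$, $j < i$, plus naturally a suitable subsequence mean ergodic theorem, seem also to be necessary.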

4.5

Speed of convergence

It is a fundamental fact that no speed of convergence can be associated with Birkhoff's theorem, nor with von Neumann's theorem. These negative results are essentially due to O'Brien [1983], Halász [1976], Krengel [1978] and von Neumann [1936], see for instance the discussion in Krengel [1985: 14, 15]. In what follows, we shall refer to the survey of Kachurovskii [1996]. Below is a first result due to Halász and Krengel (see Kachurovskii [1996: Theorem 1]).

4.5.1 Theorem. For any automorphism $\tau$ of the interval $[0, 1]$ provided with the normalized Lebesgue measure $\lambda$, we can choose indicator functions for which the speed of convergence in the pointwise ergodic theorem can be arbitrarily fast or arbitrarily slow:
(1) For any sequence $\{a_n, n \ge 1\}$ with $a_1 \ge 2$ tending to infinity monotonically, we can find a measurable set $A$ of prescribed measure $\lambda(A)$, such that λ-almost everywhere,
$$\forall n, \qquad |A_n^\tau(1_A) - \lambda(A)| \le \frac{a_n}{n}.$$
(2) For any sequence $\{b_n, n \ge 1\}$ of positive reals tending to 0, we can find a measurable set $B$ of measure $\lambda(B) \in\ ]0, 1[$, such that λ-almost everywhere,
$$\lim_{n\to\infty}\frac{1}{b_n}\,|A_n^\tau(1_B) - \lambda(B)| = \infty$$
and
$$\lim_{n\to\infty}\frac{1}{b_n}\,\|A_n^\tau(1_B) - \lambda(B)\|_p = \infty$$
for any $p \in [1, \infty]$.

One can naturally look for spectral type conditions under which a speed of convergence holds. In this direction, the two following statements are of interest (Theorems 3 and 4 in [Kachurovskii: 1996]).

4.5.2 Theorem. Assume that $\tau$ is weakly mixing. Then the following properties are equivalent:
(1) $\|A_n^\tau(f)\|_2 = O(n^{-1})$ $(n \to \infty)$;
(2) the integral $\int_{-\pi}^{\pi}|x|^{-2}\,\mu_f(dx)$ is convergent;
(3) $f$ is cohomologous to 0: $f = g \circ \tau - g$, $g \in L^2$.

The following statement concerns the speed of convergence in probability. Put
$$p_n^\varepsilon = \mu\{|A_n f - \bar f| > \varepsilon\}, \qquad P_n^\varepsilon = \mu\Bigl\{\sup_{N\ge n}|A_N f - \bar f| > \varepsilon\Bigr\}. \tag{4.5.1}$$

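4.5.3 Theorem. Assume that $\bar f = 0$. Then, for any $\varepsilon > 0$,
$$p_n^\varepsilon \le \frac{1}{\varepsilon^2}\int |V_n(x)|\,\mu_f(dx),$$
$$P_n^\varepsilon \le \inf_{\delta>0}\Bigl\{\frac{16}{\varepsilon^2}\int_{-\delta}^{\delta}\mu_f(dx) + \frac{4}{\varepsilon^2}\sum_{N\ge n}\int_{|x|\ge\delta}|V_N(x)|\,\mu_f(dx)\Bigr\} \le \inf_{\delta>0}\Bigl\{\frac{16}{\varepsilon^2}\int_{-\delta}^{\delta}\mu_f(dx) + \frac{4\,\|f\|_2^2}{(n-1)\,\varepsilon^2\sin^2\frac{\delta}{2}}\Bigr\}.$$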

4.5.4 Remark. Before giving the proof of this result, we shall make some useful comments concerning approximation properties in $L^2$ by functions cohomologous to 0, namely functions of type $U_\tau g - g$ for $g \in L^2$, where $U_\tau$ is defined by $U_\tau f = f \circ \tau$. Let $f \in L^2$ be such that $\bar f = 0$. Let $E = \{E_t, t \in\ ]-\pi, \pi]\}$ be a spectral resolution of $U_\tau$, and put for any $\delta \in\ ]0, \pi[$,
$$f_\delta = E[-\delta, \delta]f \quad (= (E_\delta - E_{-\delta})f).$$
Write $f$ as a sum of two orthogonal functions: $f = f_\delta + (f - f_\delta)$. Let $\mu_f$ be the spectral measure of $f$ relative to $U_\tau$. It follows from Theorem 2.2.9 that $\|f_\delta\|^2 = \mu_f([-\delta, \delta])$, and consequently $\|f_\delta\|_2 \to 0$ as $\delta \to 0$, since $\|\bar f\|_2^2 = \mu_f(\{0\}) = 0$. The second term $(f - f_\delta)$ of the decomposition is cohomologous to 0 for each $\delta$, that is to say $f - f_\delta = U_\tau g(\delta) - g(\delta)$ where $g(\delta) \in L^2$, since 1 is a regular value of the restriction of $U_\tau$ to the subspace $H_\delta$ of functions $h \in L^2$ such that $E[-\delta, \delta]h = 0$. Besides, on $H_\delta$,
$$\|(U_\tau - I)^{-1}\| = \Bigl\|\int_{\delta\le|t|\le\pi}\frac{1}{e^{it} - 1}\,dE_t\Bigr\| \le \sup_{|t|\ge\delta}\frac{1}{|e^{it} - 1|} = \frac{1}{2\sin\frac{\delta}{2}}.$$
We thus deduce a corresponding decomposition for the sums $A_n(f)$:
$$A_n(f) = A_n(f_\delta) + A_n(f - f_\delta), \qquad A_n(f - f_\delta) = \frac{1}{n}\bigl(U_\tau^n g(\delta) - g(\delta)\bigr), \qquad \|g_\delta\|_2 \le \frac{2}{\sin\frac{\delta}{2}}\,\|f\|_2,$$
$$\|f_\delta\|^2 = \mu_f([-\delta, \delta]) \to 0,$$
and so
$$\|A_n(f_\delta)\|_2 \le \|f_\delta\|_2 \to 0, \qquad \|A_n(f - f_\delta)\|_2 \le \frac{1}{n}\,\frac{4}{\sin\frac{\delta}{2}}\,\|f\|_2.$$
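The telescoping identity $A_n(U_\tau g - g) = \frac{1}{n}(U_\tau^n g - g)$ behind this decomposition is easy to observe numerically. The sketch below is only an illustration under stated assumptions: the system is the rotation by an irrational angle on $[0, 1[$, and the angle `alpha`, the transfer function `g` and the helper name `ergodic_average` are hypothetical choices made for the example.

```python
import numpy as np

def ergodic_average(f, alpha, x, n):
    """A_n f(x) = (1/n) * sum_{k<n} f(x + k*alpha mod 1) for the rotation by alpha."""
    k = np.arange(n)
    return float(np.mean(f((x + k * alpha) % 1.0)))

alpha = np.sqrt(2.0) - 1.0                     # irrational rotation angle (illustrative choice)
g = lambda t: np.cos(2.0 * np.pi * t)          # transfer function, sup|g| = 1
f = lambda t: g((t + alpha) % 1.0) - g(t)      # coboundary f = g∘τ - g
```

For such a coboundary, `ergodic_average(f, alpha, x, n)` agrees with `(g((x + n * alpha) % 1.0) - g(x)) / n` up to rounding, so its absolute value is at most $2\sup|g|/n$, which is the $O(n^{-1})$ rate of Theorem 4.5.2.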

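Proof of Theorem 4.5.3. The first estimate immediately follows from Tchebycheff's inequality and the spectral inequality. Consider the second estimate, and recall the decomposition $A_n(f) = A_n(f_\delta) + A_n(f - f_\delta)$ $(\delta \in\ ]0, \pi[)$. Then
$$P_n^\varepsilon \le \mu\Bigl\{\sup_{N\ge n}|A_N f_\delta| \ge \frac{\varepsilon}{2}\Bigr\} + \mu\Bigl\{\sup_{N\ge n}|A_N(f - f_\delta)| \ge \frac{\varepsilon}{2}\Bigr\}. \tag{4.5.2}$$
We bound the first expression by means of the dominated ergodic Theorem 4.2.4 (inequality (a) with $p = 2$):
$$\mu\Bigl\{\sup_{N\ge n}|A_N f_\delta| \ge \frac{\varepsilon}{2}\Bigr\} \le \frac{4}{\varepsilon^2}\Bigl\|\sup_{N\ge 1}|A_N f_\delta|\Bigr\|_2^2 \le \frac{16}{\varepsilon^2}\,\|f_\delta\|_2^2. \tag{4.5.3}$$
Finally, concerning the second expression,
$$\begin{aligned}
\mu\Bigl\{\sup_{N\ge n}|A_N(f - f_\delta)| \ge \frac{\varepsilon}{2}\Bigr\} &\le \sum_{N\ge n}\mu\Bigl\{|A_N(f - f_\delta)| \ge \frac{\varepsilon}{2}\Bigr\} \le \frac{4}{\varepsilon^2}\sum_{N\ge n}\|A_N(f - f_\delta)\|_2^2 \\
&\le \frac{64\,\|f\|_2^2}{\varepsilon^2\sin^2\frac{\delta}{2}}\sum_{N\ge n}\frac{1}{N^2} \le \frac{64\,\|f\|_2^2}{(n-1)\,\varepsilon^2\sin^2\frac{\delta}{2}}.
\end{aligned}$$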

This provides the requested estimate.

There is also a remarkable interconnection between the large deviation probability $p_n^\varepsilon$ in (4.5.1), and the property for $f \in L_0^\infty(\mu)$ to be approximated (in $L_0^\infty(\mu)$) by coboundaries whose cobounding functions have finite moments. This link was recently established by Volný and Weiss [2004]. Let $(X, \mathcal{A}, \mu, T)$ be a measurable dynamical system and assume that $T$ is an ergodic aperiodic automorphism. For $k = 1, 2, \ldots$ we denote $S_k = T^0 + \cdots + T^{k-1}$. We have the following results.

4.5.5 Theorem. Let $f \in L_0^\infty(\mu)$ and $p \ge 1$. Then
$$\inf_{g\in L^p(\mu)}\|f - (g - g \circ T)\|_\infty = 0 \implies (\forall\varepsilon > 0)\ \begin{cases}\sum_{k=1}^{\infty} k^{p-1}\mu\{|S_k f| > \varepsilon k\} < \infty, \\ \text{and} \\ \sup_{k\ge 1} k^p\mu\{|S_k f| > \varepsilon k\} < \infty.\end{cases}$$
4.5.6 Theorem. Let $f \in L_0^\infty(\mu)$ and $p \ge 1$. If $\sup_{k\ge 1} k^p\mu\{|S_k f| > \eta k\} < \infty$ for every $\eta > 0$, then for whatever $\varepsilon > 0$ and $v : \mathbb{R}_+ \to \mathbb{R}_+$ such that $\sum_{k=1}^{\infty}$

1 0 and v(x), x p /v(x) are increasing,

there exists $g \in L_0(\mu)$ such that $\int_X \frac{|g|^p}{v(|g|)}\,d\mu < \infty$ and $\|f - (g - g \circ T)\|_\infty < \varepsilon$. In particular, for any $\varepsilon > 0$ we can find $g \in L^{p-\varepsilon}$.

Let $L^\Psi(\mu)$ be the Orlicz space associated to the exponential Young function $\Psi(x) = e^{|x|} - 1$.

4.5.7 Theorem. Let $f \in L_0^\infty(\mu)$. We have the following equivalences:
$$\liminf_{n\to\infty} -\frac{1}{n}\log\mu\{|S_n f| > n\} > 0 \iff \inf_{g\in L^\Psi(\mu)}\|f - (g - g \circ T)\|_\infty = 0.$$

We refer to the quoted paper of Volný and Weiss [2004] for the proof of these results as well as a reference source for coboundaries.


4.6

Oscillation functions of ergodic averages

In this section, we show how to modify the spectral regularization of Section 1.4 in order to control the oscillation functions of ergodic averages. We assume throughout the section that $(X, \mathcal{A}, \nu)$ is a measure space with a finite measure $\nu$, $H = L^2(\nu)$, and $U$ is the unitary operator generated by a measure-preserving transformation of $(X, \mathcal{A}, \nu)$. We write $\mathrm{Log}(u) = \max\{1, \log u\}$ for $u \ge 1$. We still denote by $\mu$ the spectral measure of an element $f \in H$ and define the regularized spectral measure $\hat\mu$ by letting its Lebesgue density be
$$\frac{d\hat\mu}{dx}(x) = \int_{-\pi}^{\pi} Q(\theta, x)\,\mu(d\theta),$$
where this time
$$Q(\theta, x) = \begin{cases} |\theta|^{-1}\,\mathrm{Log}^2\bigl|\tfrac{\theta}{x}\bigr|, & |x| < |\theta|, \\ \theta^2|x|^{-3}, & |\theta| \le |x| \le \pi. \end{cases} \tag{4.6.1}$$
The following theorem provides a control of the oscillation over an arbitrary block of averages.

4.6.1 Theorem. Let $n, n_+$ be positive integers such that $n \le n_+$. Then
$$\Bigl\|\sup_{n\le m\le n_+}|A_m(f) - A_n(f)|\Bigr\|_{2,\nu}^2 \le C\,\hat\mu\Bigl(\Bigl[\frac{1}{n_+}, \frac{1}{n}\Bigr]\Bigr).$$

Remarks. 1. The result still holds true for Am generated by arbitrary contraction of H (not necessarily related to a measure-preserving transformation) under supplementary assumption n+ ≤ Rn. In the latter case the constant C depends on R. 2. Theorem 4.6.1 immediately allows to recover the following result due to Jones, Kaufman, Rosenblatt, and Wierdl [1998: Theorem A] concerning oscillation functions of ergodic averages. 4.6.2 Corollary. Let {np , p ≥ 1} be an increasing sequence of positive integers. Then, ∞ p=1

sup

np ≤m n. Then, ⎧ (m−n)m 2 θ , |θ | < m1 ; ⎪ ⎪ 2 ⎨ # " 1 1 κm,n (θ ) = κ , , θ = m−n , |θ | ∈ m1 , n1 ; 4m ⎪ m n ⎪ ⎩ m−n Log2 (n|θ |), |θ | ∈ 1 , π #. mn|θ |

n

Proof of Theorem 4.6.1. At first we prove the theorem for a short dyadic block. Namely, let us assume additionally that for some integer $p$,
$$n_+ - n = 2^p \le 2n. \tag{4.6.2}$$
We use the classical dyadic scheme and thus introduce the binary increments
$$\Delta_{j,k}(f) = A_{n+(j+1)2^{p-k}}(f) - A_{n+j2^{p-k}}(f), \qquad 1 \le k \le p,\ 0 \le j < 2^k - 1.$$
Each integer $m \in [n, n + 2^p)$ can be written as
$$m = n + \sum_{k=1}^{p}\varepsilon_k(m)\,2^{p-k}, \qquad \varepsilon_k(m) = 0 \text{ or } 1.$$
Thus,
$$A_m(f) = A_n(f) + \sum_{k=1}^{p}\Delta_{j(k,m),k}(f),$$

where the indexes {j (k, m)} are easily defined by {εk (m)}. Thus, we have sup n≤m 0, ∞ n P ξk > nλ < ∞. n=1

k=1

(4.6.10)


Inequality (4.6.9) is a particular case of a more general maximal inequality proved in Rosenblatt–Wierdl [1992]: let a = {ap , p ≥ 1} be a sequence of positive reals and p bp = n=1 an . Then ∞

ap μ

p=1

sup

np ≤m bp ≤ C f 1 ,

(4.6.11)

and C is independent of f and τ . Wittmann [1995b] showed that (4.6.11) holds for general L1 -L∞ contractions. Inequality (4.6.10) is related to the very useful notion of complete convergence, which is worth to describe a bit. Let X = {Xn,k , 1 ≤ k ≤ kn , n ≥ 1} denote a triangular array of real centered independent random variables, and a = {an,k , 1 ≤ k ≤ kn , n ≥ 1} with {kn , n ≥ 1} nondecreasing, a triangular array of positive reals. When the random variables are symmetric (resp. identically distributed), we will say that the triangular array X is symmetric (resp. i.i.d.). Set, for every n ≥ 1, Tn =

kn

an,k Xn,k ,

An =

k=1

kn k=1

an,k ,

Bn2 =

kn

2 an,k ,

Cn = An /Bn .

k=1

Let (, A, P) be the basic probability space on which X is defined. Note that Cn ≥ 1. c.c. We say that the sequence Tn /An converges completely to 0 and write Tn /An −→ 0, when for any ε > 0, P {|Tn |/An > ε} < ∞. (4.6.12) n

The study of this property originates from a well-known paper by Hsu and Robbins [1947], who proved in the case of a single i.i.d. sequence $\xi = \{\xi, \xi_n, n \ge 1\}$ with partial sums $S_n = \sum_{k=1}^{n} \xi_k$, n = 1, 2, . . . , that E ξ = 0, E ξ² < ∞ imply $S_n/n \xrightarrow{c.c.} 0$. Shortly afterward, Erdős [1949] proved the validity of the converse implication. Since then, the study of various possible generalizations of this result (subsequence case, the theorems of Baum–Katz [1965], extensions to triangular arrays of independent random variables, Banach space valued random variables) has received a lot of attention. One may for example refer to the works of Gut [1992], Fazekas [1985/88], Hu–Móricz–Taylor [1989], Ahmed–Giuliano–Volodin [2002], Kuczmaszewska–Szynal [1988/91], Li–Rao–Wang [1992], Pruitt [1966], Rohatgi [1971], Sung [1997] and Berkes–Weber [2006]. In the Gaussian case, namely if X is Gaussian, the problem can be settled simply. Put

\[ L(a) = \limsup_{x\to\infty} \frac{\log \#\{n : C_n \le x\}}{x^2}. \tag{4.6.13} \]

Then we have the following characterization [Berkes–Weber: 2006]:

\[ T_n/A_n \xrightarrow{c.c.} 0 \iff L(a) = 0. \tag{4.6.14} \]
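The quantities entering (4.6.13) are easy to compute for a concrete array of weights. The following is a minimal sketch, assuming the rows of the array are given as finite Python lists; the helper names `row_statistics` and `count_rows_below` are illustrative and do not come from the text. It evaluates $A_n$, $B_n$, $C_n$ and the counting function $\#\{n : C_n \le x\}$ over the finitely many rows supplied.

```python
import math

def row_statistics(row):
    """Return (A_n, B_n, C_n) for one row a_{n,1}, ..., a_{n,k_n} of positive weights."""
    if not row or any(a <= 0 for a in row):
        raise ValueError("a row must be a non-empty sequence of positive reals")
    a_sum = sum(row)                              # A_n
    b_norm = math.sqrt(sum(a * a for a in row))   # B_n
    return a_sum, b_norm, a_sum / b_norm          # C_n = A_n / B_n

def count_rows_below(rows, x):
    """Return #{n : C_n <= x}, counted over the rows supplied (a finite truncation)."""
    return sum(1 for row in rows if row_statistics(row)[2] <= x)

# Hypothetical example: a_{n,k} = 1 and k_n = n, so that C_n = sqrt(n).
rows = [[1.0] * n for n in range(1, 101)]
print(count_rows_below(rows, 5.0))
```

In this example the count grows like $x^2$, so its logarithm is $o(x^2)$ and the array is consistent with $L(a) = 0$; a finite truncation can of course only suggest, not prove, the value of the limsup.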


This case is in general very informative and interesting, because of the classical Gaussian randomization procedure for sums of independent random variables. By applying Skorohod’s embedding scheme (see Section 10.4) to the row sums of the triangular array X, one can show, for instance if $E X_{n,k}^2 = 1$ and $X_{n,k} \in L^{2p}$ for some p ≥ 2, that the relation

\[ \sum_{n} \frac{\big(\sum_{k=1}^{k_n} a_{n,k}^4\big)^{p/2}}{\big(\sum_{k=1}^{k_n} a_{n,k}^2\big)^{p}} < \infty \]

implies $T_n/A_n \xrightarrow{c.c.} 0$. To compare this result with the Gaussian case, note that L(a) = 0 is equivalent to

\[ \sum_{n} \exp\Big\{-\delta\, \frac{\big(\sum_{k=1}^{k_n} a_{n,k}\big)^2}{\sum_{k=1}^{k_n} a_{n,k}^2}\Big\} < \infty \qquad \text{for all } \delta > 0. \]

It also seems worth mentioning some sharp results concerning the convergence of the series $\sum_{n=1}^{\infty} \big(f(\tau^n(x))/n\big)^p$ with p > 1. Assani [1997b] showed that if τ is ergodic, then for f ≥ 0, f ∈ L log L,

\[ \lim_{p\to 1+} (p-1) \Big(\sum_{n=1}^{\infty} \Big(\frac{f(\tau^n(x))}{n}\Big)^{p}\Big)^{1/p} \overset{a.e.}{=} \int f\, d\mu. \tag{4.6.15} \]

Further, there is an absolute constant C such that 1/p ∞ " # f (τ n (x)) p ≤C sup (p − 1)1/p f log f dμ + 1 . (4.6.16) n 10

and for r < p,

\[ \|x\|_{p,\infty} \le \|x\|_{p} \le \Big(\frac{p}{p-r}\Big)^{1/p} \|x\|_{r,\infty}. \]

In the ergodic setting, take $x_n = f(\tau^n x)$ with τ ergodic. Assani [1997a] proved that for any $f \in L^p_+(\mu)$, $N_f^*$ is of weak type (p, p) for all p, 1 < p < ∞. Further,

\[ \lim_{m\to\infty} \frac{\#\{n : f(\tau^n x)/n \ge 1/m\}}{m} \overset{a.e.}{=} \int f\, d\mu. \tag{4.6.19} \]

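The convergence in L1 of the averages in (4.6.19) also holds. Note that for f ≥ 0,

\[ \sup_{m\ge 1} \frac{\#\{n : f(\tau^n x)/n \ge 1/m\}}{m} \sim \Big\| \Big\{\frac{f(\tau^n x)}{n},\ n \ge 1\Big\} \Big\|_{1,\infty}. \]

Further [Assani: 1997b], for f ∈ L log L,

\[ \Big\| \Big\| \Big\{\frac{f(\tau^n x)}{n},\ n \ge 1\Big\} \Big\|_{1,\infty} \Big\|_{1} < \infty. \tag{4.6.20} \]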

Assani, Buczolich and Mauldin [2005] however showed that for f ∈ L1 the convergence almost everywhere of these averages fails to hold. This negative result establishes that Bourgain’s return time theorem (see Section 4.7.3) does not hold for (L1 , L1 ) pairs.


Transference principle. We shall state the Calderón transference principle not in its full generality, but in the discrete case. One may fruitfully refer to Calderón’s original paper for more general formulations, as well as to the illuminating discussion of “transference principles in ergodic theory” in Bellow [1999]. Let m be a probability measure on Z. Define a mapping ϕ → m[ϕ] from $\ell^1(\mathbf{Z})$ to $\ell^1(\mathbf{Z})$ by

\[ m[\varphi](k) = \sum_{j\in \mathbf{Z}} m(j)\,\varphi(k+j), \qquad k \in \mathbf{Z}. \]

Let (X, A, μ, τ) be a measurable dynamical system, and assume that τ is an automorphism of (X, A, μ). Define similarly a mapping f → m[f] from L1(μ) to L1(μ) by putting

\[ m[f](x) = \sum_{j\in \mathbf{Z}} m(j)\, f(\tau^j x), \qquad x \in X. \]
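The discrete operator ϕ → m[ϕ] can be made concrete for finitely supported data. The sketch below assumes that both m and ϕ are given as dictionaries mapping integers to values (zero outside the keys); the names `transfer` and `averaging_measure` are illustrative only. With m the uniform measure on {0, . . . , n − 1}, m[ϕ](k) is the average of ϕ over the window k, . . . , k + n − 1.

```python
def transfer(m, phi):
    """Return m[phi] as a dict k -> sum_j m(j) * phi(k + j).

    Both m and phi are finitely supported functions on Z, stored as dicts.
    """
    out = {}
    for j, mj in m.items():
        for i, value in phi.items():
            k = i - j                      # phi(k + j) is evaluated at i = k + j
            out[k] = out.get(k, 0.0) + mj * value
    return out

def averaging_measure(n):
    """Uniform probability measure on {0, 1, ..., n - 1}."""
    return {k: 1.0 / n for k in range(n)}

# Hypothetical example: phi is the unit mass at 0, m averages over 4 points.
result = transfer(averaging_measure(4), {0: 1.0})
print(sorted(result.items()))
```

Since m is a probability measure and ϕ ≥ 0 here, the total mass of m[ϕ] equals that of ϕ, which is the elementary reason why the operator maps $\ell^1(\mathbf{Z})$ into itself.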

4.6.6 Theorem. Let $\{m_n, n \ge 1\}$ be a sequence of probability measures on Z. Consider the following assertions:

(1) There exists a constant C such that

\[ \sup_{\|\varphi\|_1 \le 1}\ \sup_{\lambda \ge 0}\ \lambda\, \#\Big\{k \in \mathbf{Z} : \sup_{n\ge 1} m_n[\varphi](k) > \lambda\Big\} \le C. \]

(2) There exists a constant C such that for every measurable dynamical system (X, A, μ, τ), we have

\[ \sup_{\|f\|_1 \le 1}\ \sup_{\lambda \ge 0}\ \lambda\, \mu\Big\{x \in X : \sup_{n\ge 1} m_n[f](x) > \lambda\Big\} \le C. \]

Then (1) implies (2).

4.6.7 Remarks. 1. The first assertion indicates that we have a weak type (1, 1) inequality on $\ell^1(\mathbf{Z})$. The second one states a weak type (1, 1) inequality on L1(μ). The constant C is the same in (1) and (2). One can allow σ-finite measure spaces in (2), and one obtains an equivalent statement. The transference principle also applies if we replace the weak type (1, 1) estimate by a weak type (p, p) estimate (respectively a strong type (p, p) estimate) for 1 < p < ∞. The underlying fact is that if we have the estimate for the shift model (i.e., Z with translations), we can derive it for any other dynamical system.

2. If one sets $m_n = \frac{1}{n} \sum_{k=0}^{n-1} \delta_k$, then (1) yields the Hardy–Littlewood maximal inequality

\[ \sup_{\|\varphi\|_1 \le 1}\ \sup_{\lambda \ge 0}\ \lambda\, \#\Big\{k \in \mathbf{Z} : \sup_{n\ge 1} \frac{1}{n} \sum_{j=0}^{n-1} \varphi(j+k) > \lambda\Big\} \le C, \]

proved in the celebrated paper of Hardy and Littlewood [1930]. And (2) yields the maximal ergodic inequality (Lemma 4.1.2)

\[ \sup_{\|f\|_1 \le 1}\ \sup_{\lambda \ge 0}\ \lambda\, \mu\Big\{x \in X : \sup_{n\ge 1} \frac{1}{n} \sum_{k=0}^{n-1} f(\tau^k x) > \lambda\Big\} \le C. \]


We thus see that the maximal ergodic inequality may in turn also be deduced from the Hardy–Littlewood maximal inequality, published one year before Birkhoff’s proof of the pointwise ergodic theorem.

Proof of Theorem 4.6.6. Let f ∈ L1(μ). It suffices to prove (2) for nonnegative f. Assume then that (1) is realized and apply it to the sequence

\[ \varphi(j) = \begin{cases} f(\tau^j x) & \text{if } |j| \le J, \\ 0 & \text{otherwise.} \end{cases} \]

Observe that ϕ(k + j) = 0, if |k| > 2J. Then for any x ∈ X, any positive integer N and any real t ≥ 0,

\[ t\,\#\Big\{k \in \mathbf{Z} : \sup_{1\le n\le N} \sum_{j\in \mathbf{Z}} m_n(j) f(\tau^{k+j} x) > t\Big\} = t\,\#\Big\{|k| \le J : \sup_{1\le n\le N} \sum_{j\in \mathbf{Z}} m_n(j) f(\tau^{k+j} x) > t\Big\} \le C \|\varphi\|_1. \]

Integrating over X with respect to μ gives

\[ 4J\, t\, \mu\Big\{x : \sup_{1\le n\le N} \sum_{j\in \mathbf{Z}} m_n(j) f(\tau^{k+j} x) > t\Big\} \le 2CJ \|f\|_1. \]

Letting J and then N tend to infinity finally leads to

\[ t\, \mu\Big\{x : \sup_{n\ge 1} \sum_{j\in \mathbf{Z}} m_n(j) f(\tau^{k+j} x) > t\Big\} \le C \|f\|_1, \]

as claimed.

4.7 Wiener–Wintner theorem

Let (X, A, μ, T) be an ergodic dynamical system and consider a rotation τx = x + θ (mod 1) on the circle (T, B(T), λ). Let f ∈ L1(μ). The Birkhoff ergodic theorem applied in the product dynamical system (X × T, A ⊗ B(T), μ × λ, T × τ) to the function $g = e^{2i\pi\theta} f$ implies that the limit

\[ \lim_{N\to\infty} \frac{1}{N} \sum_{k=0}^{N-1} e^{2i\pi k\theta} f(T^k x) \]

exists μ-almost everywhere. The striking fact is that the measurable set of full measure on which this property holds does not depend on the value of θ. This was first observed by Wiener and Wintner [1941].


4.7.1 Theorem. Let (X, A, μ, T) be an ergodic measurable dynamical system. Then for any f ∈ L1(μ), for μ-almost all x, the sequence of averages

\[ \frac{1}{N} \sum_{n=0}^{N-1} e^{in\vartheta} f(T^n x) \]

converges for any value of ϑ.

The proof proposed in the Wiener–Wintner paper was however incorrect. Since then, several different proofs have been published. This result admits a remarkable strengthening, in the sense that the latter convergence is uniform in ϑ. This uniform version of the Wiener–Wintner theorem is due to Bourgain [1990].

4.7.2 Theorem (Uniform Wiener–Wintner theorem). Let (X, A, μ, T) be an ergodic measurable dynamical system. Then for any f ∈ L1(μ) with $\int_X f\, d\mu = 0$,

\[ \mu\Big\{x \in X : \lim_{N\to\infty} \sup_{\vartheta\in \mathbf{R}} \Big| \frac{1}{N} \sum_{k=0}^{N-1} e^{ik\vartheta} f(T^k x) \Big| = 0\Big\} = 1. \]

Proof. We give a proof using Van der Corput’s inequality when T is weakly mixing. Recall the Van der Corput inequality (Theorem 1.7.1), case H = C. If $\{x_n, 0 \le n \le N-1\}$ are complex numbers and R is some integer between 0 and N − 1, then

\[ \Big| \frac{1}{N} \sum_{k=0}^{N-1} x_k \Big|^2 \le \frac{N+R}{N^2(R+1)} \sum_{k=0}^{N-1} |x_k|^2 + 2\, \frac{N+R}{N^2(R+1)^2} \sum_{r=1}^{R} (R+1-r) \cdot \Re\Big( \sum_{k=0}^{N-r-1} x_k \overline{x_{k+r}} \Big). \]
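As a sanity check on the normalization used above, the two sides of the Van der Corput inequality can be evaluated numerically. The sketch below is only an illustration under the assumption that the inequality reads as just displayed (with the complex conjugate on $x_{k+r}$); the function name `van_der_corput_sides` is hypothetical.

```python
import cmath
import random

def van_der_corput_sides(x, R):
    """Return (lhs, rhs) of the Van der Corput inequality for complex x_0..x_{N-1}."""
    N = len(x)
    if not 0 <= R <= N - 1:
        raise ValueError("R must be an integer between 0 and N - 1")
    lhs = abs(sum(x) / N) ** 2
    energy = sum(abs(v) ** 2 for v in x)
    correlation = 0.0
    for r in range(1, R + 1):
        # inner sum over k = 0, ..., N - r - 1 of x_k * conj(x_{k+r})
        inner = sum(x[k] * x[k + r].conjugate() for k in range(N - r))
        correlation += (R + 1 - r) * inner.real
    rhs = ((N + R) / (N ** 2 * (R + 1))) * energy \
        + (2 * (N + R) / (N ** 2 * (R + 1) ** 2)) * correlation
    return lhs, rhs

# Hypothetical test data: random points on the unit circle.
random.seed(0)
sample = [cmath.exp(1j * random.uniform(0.0, 2.0 * cmath.pi)) for _ in range(200)]
for R in (0, 1, 5, 199):
    lhs, rhs = van_der_corput_sides(sample, R)
    print(R, lhs <= rhs + 1e-12)
```

For R = 0 the bound reduces to the Cauchy–Schwarz inequality; larger values of R trade the energy term against the correlation terms, which is what the proof exploits.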

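Assume first f ∈ L2(μ) and apply this inequality with the choice $x_n = e^{in\vartheta} f(T^n x)$. We get

\[ \Big| \frac{1}{N} \sum_{k=0}^{N-1} e^{ik\vartheta} f(T^k x) \Big|^2 \le \frac{N+R}{N^2(R+1)} \sum_{k=0}^{N-1} |f(T^k x)|^2 + 2\, \frac{N+R}{N^2(R+1)^2} \sum_{r=1}^{R} \Re(e^{-ir\vartheta}) \cdot (R+1-r) \sum_{k=0}^{N-r-1} f(T^k x) \cdot f(T^{k+r} x). \]

Taking the supremum over all ϑ gives, since R ≤ N − 1,

\[ \sup_{\vartheta\in \mathbf{R}} \Big| \frac{1}{N} \sum_{k=0}^{N-1} e^{ik\vartheta} f(T^k x) \Big|^2 \le \frac{2}{N(R+1)} \sum_{k=0}^{N-1} |f(T^k x)|^2 + \frac{4}{R+1} \sum_{r=1}^{R} \Big| \frac{1}{N} \sum_{k=0}^{N-r-1} f(T^k x) \cdot f(T^{k+r} x) \Big|. \]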


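Taking now the limsup on N provides

\[ \limsup_{N\to\infty} \sup_{\vartheta\in \mathbf{R}} \Big| \frac{1}{N} \sum_{k=0}^{N-1} e^{ik\vartheta} f(T^k x) \Big|^2 \le \frac{2\, E(f^2 \mid J_T)}{R+1} + \frac{4}{R+1} \sum_{r=1}^{R} \big| E(f \cdot f\circ T^r \mid J_T) \big|. \]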

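Now since T is weakly mixing, $J_T$ is the trivial σ-algebra of X, and so $E(f \cdot f\circ T^r \mid J_T) = \int_X f \cdot f\circ T^r\, d\mu = \langle f, f\circ T^r\rangle$. Further,

\[ \lim_{R\to\infty} \frac{1}{R} \sum_{r=1}^{R} \big| \langle f, f\circ T^r\rangle - \langle f, 1\rangle^2 \big| = 0. \]

By passing to the limsup on R, we finally get

\[ \limsup_{N\to\infty} \sup_{\vartheta\in \mathbf{R}} \Big| \frac{1}{N} \sum_{k=0}^{N-1} e^{ik\vartheta} f(T^k x) \Big|^2 \le 4 \langle f, 1\rangle^2, \]

which equals 0 if moreover $\int_X f\, d\mu = 0$. Now consider the case f ∈ L1(μ) with $\int_X f\, d\mu = 0$. A routine approximation argument, which is nevertheless worth displaying, suffices to reach a conclusion in that case. Let $\{f_n, n \ge 1\}$ be a sequence of L2(μ) elements converging to f in the L1(μ) norm. For each of these elements, we have by the previous step

\[ \limsup_{N\to\infty} \sup_{\vartheta\in \mathbf{R}} \Big| \frac{1}{N} \sum_{k=0}^{N-1} e^{ik\vartheta} f_n(T^k x) \Big| \le 2 |\langle f_n, 1\rangle|, \]

almost surely. Further, by Birkhoff’s theorem

\[ \limsup_{N\to\infty} \sup_{\vartheta\in \mathbf{R}} \Big| \frac{1}{N} \sum_{k=0}^{N-1} e^{ik\vartheta} \big( f(T^k x) - f_n(T^k x) \big) \Big| \le \limsup_{N\to\infty} \frac{1}{N} \sum_{k=0}^{N-1} |f(T^k x) - f_n(T^k x)| \le \|f - f_n\|_1, \]

almost surely. By the triangle inequality, we get

\[ \limsup_{N\to\infty} \sup_{\vartheta\in \mathbf{R}} \Big| \frac{1}{N} \sum_{k=0}^{N-1} e^{ik\vartheta} f(T^k x) \Big| \le 2 |\langle f_n, 1\rangle| + \|f - f_n\|_1, \]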

for any integer n, almost surely. As the right-hand side tends to zero as n tends to infinity, we obtain the result in the L1(μ) case as well.

Wiener–Wintner functions. The uniform version of the Wiener–Wintner theorem has recently given rise to some interesting developments (see Assani, Lesigne and Rudolph [1995], see also Assani [2003], [2004]). Let (X, A, μ, T) be an ergodic dynamical system and p ≥ 1. A function f is a Wiener–Wintner function in $L^p(\mu)$ if there exists an α > 0 such that

\[ \sup_{N\ge 1} N^{\alpha} \Big\| \sup_{\varepsilon>0} \Big| \frac{1}{N} \sum_{n=1}^{N} f(T^n x)\, e^{2\pi i n\varepsilon} \Big| \Big\|_p < \infty. \]

Assani [2004] obtained a spectral characterization of Wiener–Wintner functions, with the help of the almost everywhere continuity of the random Fourier series

\[ H^{\varepsilon}_{\gamma}(f)(x) = \sum_{k\in \mathbf{Z}} (-1)^k\, \frac{f(T^k x)}{|k|^{\gamma}}\, e^{2\pi i k\varepsilon}, \]

which he called “the fractional rotated ergodic Hilbert transform”. He showed that an L∞(μ) function f is a Wiener–Wintner function in L2(μ) if and only if for almost all x, $H^{\varepsilon}_{\gamma}(f)(x)$ is a continuous function of ε, which is a remarkable fact.

Return times theorems. By Theorem 4.7.1, if $f = 1_A$, then for all x outside a μ-null set $N = N_f$ and for all ϑ, the averages $\frac{1}{N} \sum_{n=0}^{N-1} e^{2i\pi n\vartheta} 1_A(T^n x)$ converge to a limit as N tends to infinity. By the spectral inequality, this implies, for any contraction S in a Hilbert space H, that for all x outside this null set and all g ∈ H the averages

\[ \frac{1}{N} \sum_{n=0}^{N-1} 1_A(T^n x)\, S^n g, \qquad N = 1, 2, \ldots \]

converge in H. When Sg = g ∘ σ, σ being an automorphism of a probability space (Y, B, ν), the question whether these averages converge ν-almost everywhere was settled affirmatively by Bourgain [1988d], and the solution is known as Bourgain’s return time theorem.

4.8 Weighted ergodic averages

Let τ be a measure-preserving transformation of a probability space (X, A, μ). Let $w = \{w_k, k \ge 1\}$ be a sequence of nonnegative reals with partial sums $W_n := \sum_{k=1}^{n} w_k$. Since the ergodic theorem of Birkhoff for integrable functions can be viewed as an extension of the corresponding law of large numbers for i.i.d. random variables with finite expectation, it is natural to also look at the convergence almost everywhere of the weighted ergodic averages

\[ A_n f := \frac{1}{W_n} \sum_{k=1}^{n} w_k\, f\circ\tau^k, \qquad n = 1, \ldots. \]
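For a numerical sequence $x_k$ standing in for the values $f(\tau^k x)$ along one orbit, the weighted averages are computed by two running sums. The following is a minimal sketch; the function name `weighted_averages` is illustrative and the inputs are assumed to be finite lists of equal length.

```python
def weighted_averages(w, x):
    """Return [A_1, ..., A_n] with A_n = (1 / W_n) * sum_{k <= n} w_k * x_k."""
    if len(w) != len(x):
        raise ValueError("weights and values must have the same length")
    averages, W, S = [], 0.0, 0.0
    for wk, xk in zip(w, x):
        if wk < 0:
            raise ValueError("weights must be nonnegative")
        W += wk            # partial sum W_n of the weights
        S += wk * xk       # weighted partial sum
        if W == 0.0:
            raise ValueError("W_n must be positive to form the average")
        averages.append(S / W)
    return averages

# With constant weights the usual arithmetic means are recovered.
print(weighted_averages([1, 1, 1, 1], [1, 2, 3, 4]))
```

Only the ratios $w_k/W_n$ matter, so multiplying all weights by a positive constant leaves the averages unchanged.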


In view of the Beppo Levi theorem, we have to study only the case $\sum_{k=1}^{\infty} w_k = \infty$. So we assume throughout this section that $W_n \uparrow \infty$. Before going further, let us consider some typical means.

Logarithmic means. After arithmetic means (or Cesàro-1 means), these are the best-known averages. They are defined for a given sequence $x = \{x_k, k \ge 0\}$ of reals by

\[ \frac{1}{\log n} \sum_{k=1}^{n} \frac{x_k}{k}. \]

It is a classical fact that Cesàro-1 convergence implies convergence of the logarithmic means, so that Birkhoff’s ergodic theorem does hold for logarithmic averages. A set S of positive integers has logarithmic density when the limit

\[ L(S) := \lim_{n\to\infty} \frac{1}{\log n} \sum_{k\in S,\ k\le n} \frac{1}{k} \]

exists. By a result due to Wintner [1944c; 53], S has logarithmic density if and only if the limit

\[ \lim_{s\to 1+0} (s-1) \sum_{n\in S} \frac{1}{n^s} \]

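exists, in which case the limit is L(S). See also [Paul: 1962] for more on densities.

Cesàro means. For α > −1, we set

\[ A^{\alpha}_0 = 1, \qquad A^{\alpha}_n - A^{\alpha}_{n-1} = A^{\alpha-1}_n, \qquad A^{0}_n = 1. \]

Then

\[ A^{\alpha}_n = \sum_{k=0}^{n} A^{\alpha-1}_{n-k} = \frac{(\alpha+1)\cdots(\alpha+n)}{n!}, \qquad \lim_{n\to\infty} A^{\alpha}_n\, \frac{\Gamma(\alpha+1)}{n^{\alpha}} = 1. \]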

Further $A^{\alpha}_n$ increases with n if α > 0, and decreases with n if −1 < α < 0. Let 0 < α ≤ 1. We have the following estimates: $\dfrac{n^{\alpha}}{\Gamma(\alpha+1)} \le A^{\alpha}_n \le \dfrac{(n+1)^{\alpha}}{\Gamma(\alpha+1)}$,

and

≤ Aα−1 n

nα if n > 0. (α)

Let $x = \{x_k, k \ge 0\}$ be a sequence of reals. The associated Cesàro-α means for x are defined by

\[ M^{\alpha}_n(x) = M^{\alpha}_n = \frac{1}{A^{\alpha}_n} \sum_{k=0}^{n} A^{\alpha-1}_{n-k}\, x_k. \]
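The Cesàro numbers and means are straightforward to tabulate from the product formula for $A^{\alpha}_n$. The sketch below is an illustration only; the function names are hypothetical, and the product formula is also used formally for the index α − 1 when α ≤ 0.

```python
def cesaro_numbers(alpha, n_max):
    """Return [A_0^alpha, ..., A_{n_max}^alpha] via A_n = A_{n-1} * (alpha + n) / n."""
    numbers = [1.0]
    for n in range(1, n_max + 1):
        numbers.append(numbers[-1] * (alpha + n) / n)
    return numbers

def cesaro_means(alpha, x):
    """Return the Cesaro-alpha means M_0, ..., M_n of the finite sequence x_0..x_n."""
    n_max = len(x) - 1
    a = cesaro_numbers(alpha, n_max)               # normalizing constants A_n^alpha
    a_prev = cesaro_numbers(alpha - 1.0, n_max)    # weights A_{n-k}^{alpha-1}
    return [
        sum(a_prev[n - k] * x[k] for k in range(n + 1)) / a[n]
        for n in range(n_max + 1)
    ]

# alpha = 1 gives the arithmetic means, alpha = 0 returns the sequence itself.
print(cesaro_means(1.0, [1, 2, 3, 4]))
print(cesaro_means(0.0, [1, 2, 3, 4]))
```

With this definition the case α = 1 reproduces the arithmetic means $(x_0 + \cdots + x_n)/(n+1)$, while α = 0 corresponds to ordinary convergence of the sequence.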


The sequence x is (C, α) (i.e. Cesàro-α) convergent to y, if $\lim_{n\to\infty} M^{\alpha}_n = y$. It is well-known ([Zygmund: 1959], Theorem 1.21, p. 77) that if x is (C, α) convergent to y for some α > −1, then x is (C, β) convergent to y for β ≥ α. In particular, (C, 0) convergence implies (C, α) convergence for α ≥ 0. And (C, α) convergence for −1 < α < 0 implies usual (C, 0) convergence. For an i.i.d. sequence $X = \{X_k, k \ge 0\}$ of random variables it is known that

– for 0 < α ≤ 1, X is (C, α) convergent iff $E|X_0|^{1/\alpha} < \infty$,
– for α ≥ 1, all (C, α) convergences are equivalent with $E|X_0| < \infty$.

See [Deniel: 1989] and references therein. Now, let T be a positive linear contraction of $L^p$. Let 0 < α ≤ 1. Irmisch [1980] proved the a.s. convergence of the Cesàro-α means

\[ \frac{1}{A^{\alpha}_n} \sum_{k=0}^{n} A^{\alpha-1}_{n-k}\, T^k f, \]

for any $f \in L^p$, if αp > 1. This applies in particular if Tf = f ∘ τ, where τ is a measure preserving transformation of some probability space (Ω, A, μ). Irmisch further proved that this result is false in general if αp = 1. Deniel [1989; Theorem 7] showed that this is also false if Tf = f ∘ τ, τ ergodic, μ non-atomic, by constructing a specific counterexample using Rochlin’s towers.

Riesz harmonic means. These means, which must not be confused with logarithmic means, are defined for any sequence $x = \{x_k, k \ge 0\}$ of reals by

\[ \frac{c_n}{\log n} \sum_{k=0}^{n-1} \frac{x_k}{n-k}, \qquad c_n = \frac{\log n}{\sum_{k=1}^{n} \frac{1}{k}}. \]

The convergence of the Riesz harmonic means implies, for α > 0, that of the Cesàro-α means (Hardy [1963; 110]). The Riesz harmonic means appear naturally when α → 0. Let $X = \{X_k, k \ge 0\}$ be an i.i.d. sequence with $E X_0 = 0$. As a consequence of a result of Chow and Lai [1973: Theorem 2],

\[ \lim_{n\to\infty} \frac{c_n}{\log n} \sum_{k=0}^{n-1} \frac{X_k}{n-k} \overset{a.s.}{=} 0 \iff E\, e^{t|X_0|} < \infty \quad (\forall t > 0). \]

Deniel [1989; Theorem 11] showed that this result cannot be extended to the stationary case. More precisely, if τ is an ergodic automorphism on (Ω, A, μ), μ non-atomic, there exists a measurable set B such that if $f = 1_B$, then the Riesz harmonic means

\[ H_n f = \frac{c_n}{\log n} \sum_{k=0}^{n-1} \frac{1}{n-k}\, f\circ\tau^k \]

diverge almost surely. The construction of B goes as follows. Let n ≥ 2 be some integer. By Kakutani–Rochlin’s lemma (see (7.2.2)) there exists A ∈ A such that

n2 −1 u 2 2 A, τ A, . . . , τ n −1 A are mutually disjoint and μ u=0 τ A = n μ(A) ≥ 1 − 1/n. Let B= τ u A, D = τ j A. 1≤j 0, k=1 Bk , F = k=1 Dk . We observe that μ(F ) ≥ 1 − and on F lim sup Hn χE ≥ 1/2. n→∞ Further μ(E) ≤ k 1/nk < 1/2. Assume the convergence almost everywhere of Hn χE . The fact that the convergence of the Riesz harmonic means implies the convergence of the Cesàro means to the same limit, would imply that this one equals to μ(E) < 1/2. We consequently get a contradiction. Riesz B-means. Let {bk , k ≥ 1} be positive reals and assume that Bn → ∞. To any sequence x = {xk , k ≥ 0} of reals, one can associate the Riesz B-means defined by the formula N 1 bk xk . σN (x) = BN k=1

Gaposhkin has considered for stationary sequences the Riesz B-means with coefficients $(b_k, B_k)$ satisfying some regularity assumptions, namely $b_k = b(k)$ where $b(u) = u^{-1}\varphi(u)$ and ϕ on [1, ∞) is regularly varying in the sense that for each ε > 0

\[ \frac{\varphi(u)}{u^{\varepsilon}} \downarrow 0 \quad \text{and} \quad u^{\varepsilon}\varphi(u) \uparrow \infty, \qquad u \to \infty, \]

and

\[ B(u) = \int_1^u b(t)\, dt \to \infty, \qquad u \to \infty. \]

He obtained in [Gaposhkin: 1995] optimal spectral conditions for the convergence almost everywhere of these means. Let ξ = {ξk , k ≥ 1} be a stationary sequence. If the spectral measure F (dλ) of ξ satisfies the condition

log log B 0 0. The elementary identity An − An−1 = −

wn wn An−1 + ξn Wn Wn

applied with n = nk together with the weak law implies that the left-hand side of the above converges to 0, and the first term of the right-hand side converges to −c(ξ1 ) wn in probability. So that Wnk ξnk converges in probability to c(ξ1 ). Thus ξnk converges k


in probability to (ξ1 ). Since ξi are i.i.d., this means that ξ1 is degenerate; hence a contradiction. Notice that lim wn /Wn = 0 and Wn ↑ ∞ ⇐⇒ lim max (wk /Wk ) = 0.

n→∞

n→∞ k≤n

Conversely if limk→∞ wk /Wk = 0, letting F be the distribution function of ξ1 , the weak law holds if and only if lim xF (dx) exists. lim T P{|ξ1 | ≥ T } = 0 and T →∞

T →∞ |x| 0, (ii) supn n1 nk=1 wkα < ∞ for some α > 1,

∞

wn n=1 Wn

= ∞, while (4.8.2)

then condition (4.8.2) holds (Baxter, Jones, Lin and Olsen [2004: Theorem 3.4]).

Proof. Sufficiency. Put for x ≥ 1, $N(x) = \#\{k : w_k/W_k \ge x^{-1}\}$, and N(x) = 0 if 0 ≤ x < 1. Then N is a nondecreasing function. Consider for k ≥ 1 the truncated random variables

\[ Y_k = \xi_k \cdot \chi\Big\{ |\xi_k| < \frac{W_k}{w_k} \Big\}. \]

Observe that

\[ \sum_{k\ge 1} P\{\xi_k \ne Y_k\} = \sum_{k\ge 1} \int_{|v| \ge W_k/w_k} F(dv) = \int \sum_{k\ge 1} \chi_{\{|v| \ge W_k/w_k\}}\, F(dv) = \int_{v\ne 0} \#\Big\{k : \frac{w_k}{W_k} \ge |v|^{-1}\Big\}\, F(dv) = E\, N(|\xi|). \]


Under (4.8.2), we have N(y) ≤ Cy. So if E|ξ| < ∞, then $P\{\xi_k = Y_k \text{ ultimately}\} = 1$. Thus it suffices to prove the result with $Y_k$ in place of $\xi_k$. The random variables $\zeta_k = \frac{w_k}{W_k}\big(Y_k - E\, Y_k\big)$ are independent; further,

\[ E\, \zeta_k^2 \le \Big(\frac{w_k}{W_k}\Big)^2 E\, Y_k^2 = \Big(\frac{w_k}{W_k}\Big)^2 \int_{|x| < W_k/w_k} x^2\, F(dx). \]

Given K arbitrary, let ℓ ≥ 0 be such that

\[ \min_{1\le k\le K} \frac{w_k}{W_k} \ge 2^{-\ell}. \]

Then K

wk 2 x 2 F (dx) Wk Wk |x|< w

E ζk2 ≤

1≤k≤K

k=1

≤ =

W k

k :|x|< w k ≤2

|x|≤1

≤

+

|x|≤1

j =1

2

x 2 F (dx)

{2j 0 and a subsequence wn {nk , k ≥ 1} such that wnk /Wnk → c, and so Wnk ξ˜nk has a limit distribution, namely k

wn the distribution c(ξ1 − E ξ1χ {|ξ1 | < c}). Consequently P{| Wnk ξ˜nk | ≥ ε} → P{|ξ1 − k E ξ1χ {|ξ1 | < c}| ≥ ε/c}. If P{|ξ1 − E ξ1χ {|ξ1 | < c}| ≥ ε/c} = 0, for every ε > 0, then ξ1 = E ξ1χ{|ξ1 | < c} almost surely, a degenerate case which is excluded.


Otherwise for some ε > 0, we have $P\{|\xi_1 - E\,\xi_1\chi\{|\xi_1| < c\}| \ge \varepsilon/c\} > 0$. Therefore the series $\sum_{n\ge 1} P\{|\tilde\xi_n| \ge \varepsilon W_n/w_n\}$ diverges. By the Kolmogorov three series theorem ([Petrov: 1975], p. 266), the series $\sum_{n\ge 1} \frac{w_n}{W_n}\tilde\xi_n$ cannot converge almost surely.

Bounded sequences w, however, need not satisfy (4.8.2), as follows from the result below.

4.8.2 Theorem. Let w be bounded weights. Then for every centered i.i.d. sequence ξ with $E(|\xi_1| \log^+ |\xi_1|) < \infty$ we have $\lim_{n\to\infty} A_n(\xi) = 0$ almost surely.

More generally, we will see that it is possible to relax the assumptions on the weights to obtain a.s. convergence, when more integrability conditions on ξ are known. But first, let us return to the ergodic setting and begin with first results ([Lin–Weber: 2007], Theorems 1.2 and 3.1) concerning notably the natural example of sequences w satisfying “monotonicity” or “quasimonotonicity” assumptions.

4.8.3 Theorem. Let p ≥ 1. Let $f = \{f_k, k \ge 1\}$ denote any sequence in $L^p(\mu)$.

(i) In order that, for every sequence f such that $\frac{1}{n}\sum_{k=1}^{n} f_k$ converges to f almost everywhere (in norm), also $\lim_{n\to\infty} \frac{1}{W_n}\sum_{k=1}^{n} w_k f_k = f$ almost everywhere (respectively in norm), it is necessary and sufficient that

\[ \limsup_{n\to\infty} \frac{1}{W_n} \Big( n w_n + \sum_{k=1}^{n-1} k\,|w_k - w_{k+1}| \Big) < \infty. \tag{4.8.3} \]

(ii) Further, for any non-null sequence $\gamma = \{\gamma_k, k \ge 1\}$ of nonnegative numbers and any sequence f such that $\frac{\sum_{k=1}^{n} f_k}{\sum_{k=1}^{n} \gamma_k}$ converges to some f almost everywhere (in norm), also

\[ \frac{\frac{1}{W_n}\sum_{k=1}^{n} w_k f_k}{\frac{1}{W_n}\sum_{k=1}^{n} w_k \gamma_k} = \frac{\sum_{k=1}^{n} w_k f_k}{\sum_{k=1}^{n} w_k \gamma_k} \to f \]

almost everywhere (respectively, in norm) as n tends to infinity.

(iii) In particular, if f is such that $\frac{1}{n}\sum_{k=1}^{n} f_k$ converges to f almost everywhere (in norm), then

\[ \frac{1}{\sum_{j=1}^{n} w_j} \sum_{k=1}^{n} w_k f_k \to f \]

almost everywhere (respectively, in norm) as n tends to infinity.

The standard examples of sequences f for which $\frac{1}{n}\sum_{k=1}^{n} f_k$ converges to f almost everywhere (in norm) are given by $f_k = f\circ\tau^k$, where τ is an endomorphism of (X, A, μ); this follows from Birkhoff’s theorem.
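Condition (4.8.3) is a statement about the boundedness of an explicit sequence, which can be tabulated for a given weight sequence. The sketch below is a minimal illustration; the function name `condition_483_ratio` is hypothetical, and a finite table can only suggest whether the limsup is finite.

```python
def condition_483_ratio(w):
    """Return the list of (n*w_n + sum_{k<n} k*|w_k - w_{k+1}|) / W_n for n = 1..len(w).

    The list w is 0-indexed, so w[n - 1] plays the role of w_n.
    """
    ratios, W, variation = [], 0.0, 0.0
    for n, wn in enumerate(w, start=1):
        W += wn
        if n >= 2:
            # new term k = n - 1 of the sum: k * |w_k - w_{k+1}|
            variation += (n - 1) * abs(w[n - 2] - wn)
        ratios.append((n * wn + variation) / W)
    return ratios

constant = condition_483_ratio([1.0] * 50)                        # stays equal to 1
linear = condition_483_ratio([float(k) for k in range(1, 51)])    # stays below 3
geometric = condition_483_ratio([2.0 ** k for k in range(1, 41)]) # grows roughly like n
print(max(constant), max(linear), geometric[-1])
```

For polynomially growing weights the ratio stays bounded, whereas for geometrically growing weights it grows without bound, in line with the fact that the last term carries a fixed proportion of the total mass $W_n$.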


Proof. Since $n w_n = \sum_{k=1}^{n-1} k(w_{k+1} - w_k) + W_n$, condition (4.8.3) is equivalent to

\[ \limsup_{n\to\infty} \frac{1}{W_n} \sum_{k=1}^{n-1} k\,|w_k - w_{k+1}| < \infty. \]

(i) is a special case of a general result on summability methods which are stronger than the Cesàro method (see Zeller [1958: 100], see also Dunford–Schwartz [1958: 75]). If A is a matrix which preserves Cesàro convergence and C is the Cesàro matrix, then $AC^{-1}$ is regular (preserves convergence). The sufficiency of (4.8.3) for preserving convergence of Cesàro averages (also in norm) follows from (iii), with $\gamma_k \equiv 1$.

(ii) The proof is similar to that of Theorem 8.2.1 in Krengel [1985]. Given f put $F_n = \sum_{k=1}^{n} f_k$. We denote by |F| either |F(x)| for a given point x or the norm $\|F\|_p$, according to the given mode of convergence. By Abel’s summation formula we obtain

\[ \frac{1}{W_n} \sum_{k=1}^{n} w_k f_k = \frac{1}{W_n} \sum_{k=1}^{n-1} (w_k - w_{k+1}) F_k + \frac{w_n}{W_n} F_n. \tag{4.8.4} \]
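Identity (4.8.4) is purely algebraic, so it can be verified exactly on rational data. The following sketch uses exact fractions to avoid rounding; the function name `abel_sides` is illustrative and the inputs are assumed to be finite lists of equal length with positive total weight.

```python
from fractions import Fraction

def abel_sides(w, f):
    """Return both sides of (4.8.4) for finite lists w_1..w_n and f_1..f_n."""
    n = len(w)
    if n == 0 or n != len(f):
        raise ValueError("w and f must be non-empty lists of equal length")
    W = sum(w)
    # partial sums: F[i] stores F_{i+1} = f_1 + ... + f_{i+1}
    F, running = [], Fraction(0)
    for value in f:
        running += value
        F.append(running)
    lhs = sum(wk * fk for wk, fk in zip(w, f)) / W
    # term with 1-based index k uses w_k - w_{k+1} and F_k, i.e. w[i] - w[i+1], F[i]
    rhs = sum((w[i] - w[i + 1]) * F[i] for i in range(n - 1)) / W + w[n - 1] * F[n - 1] / W
    return lhs, rhs

weights = [Fraction(k * k + 1) for k in range(1, 9)]
values = [Fraction((-1) ** k, k) for k in range(1, 9)]
print(abel_sides(weights, values))
```

The right-hand side expresses the weighted average through the partial sums $F_k$, which is what allows convergence of $F_k/k$ to be transferred to the weighted averages once the variation of the weights is controlled.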

We are given γ, a non-null sequence of nonnegative numbers, and $f \subset L^p$, such that $\frac{\sum_{k=1}^{n} f_k}{\sum_{k=1}^{n} \gamma_k}$ converges to some limit g a.e. (in norm). Denote $G_n := \sum_{k=1}^{n} \gamma_k$. By assumption, we have $0 < a \le w_k$ and $W_k \le kb$ for every k. Hence $G^*_n := \frac{1}{W_n}\sum_{k=1}^{n} w_k\gamma_k \ge \frac{a}{nb}\, G_n$. To simplify the exposition, we assume $\gamma_1 > 0$. Replacing $f_k$ and $F_k$ in (4.8.4) by $\gamma_k$ and $G_k$ respectively and multiplying by g, we obtain (after subtraction from (4.8.4) and division by $G^*_n$),

\[ \Big| \frac{\sum_{k=1}^{n} w_k f_k}{\sum_{k=1}^{n} w_k\gamma_k} - g \Big| \le \frac{1}{W_n} \sum_{k=1}^{n-1} |w_k - w_{k+1}|\, k\, \Big| \frac{F_k}{G_k} - g \Big|\, \frac{G_k}{k G^*_n} + \frac{n w_n}{W_n} \Big| \frac{F_n}{G_n} - g \Big|\, \frac{G_n}{n G^*_n}. \]

For ε > 0 we have $\big| \frac{F_k}{G_k} - g \big| < \varepsilon$ for k > N. Splitting the summation above into a sum up to N and a sum for k > N, the first sum converges to 0 as n → ∞ since $W_n \to \infty$ (and $G^*_k \le G^*_n$ for k ≤ N), and using (4.8.3) we obtain

\[ \limsup_{n\to\infty} \Big| \frac{\sum_{k=1}^{n} w_k f_k}{\sum_{k=1}^{n} w_k\gamma_k} - g \Big| \le \limsup_{n\to\infty} \varepsilon \Big[ \frac{1}{W_n} \sum_{k=N+1}^{n-1} |w_k - w_{k+1}|\, k\, \frac{b}{a} + \frac{n w_n}{W_n}\, \frac{b}{a} \Big] \le C \cdot \varepsilon. \]

Note that when $\sum_{k=1}^{\infty} \gamma_k = \infty$, it is enough to assume $\liminf_{k\to\infty} w_k > 0$, since then $\sum_k w_k\gamma_k = \infty$, and we can apply (ii) to the sequence $w_{J+k}$ with a fixed large J.

(iii) The additional assumptions on w in (ii) were needed to obtain $\sup_n \frac{G_n}{n G^*_n} < \infty$; since this follows from the assumptions on γ in (iii), the proof of (ii) applies.

The following result of practical interest is now easily deduced from Theorem 4.8.3.

4.8.4 Corollary. In each of the following cases, condition (4.8.3) is satisfied (and hence all the assertions of Theorem 4.8.3 hold):

(i) For some s ≥ 0 the sequence $\{k^{-s} w_k, k \ge 1\}$ is nonincreasing.

(ii) For some s ≥ 0 the sequence $\{k^{s} w_k, k \ge 1\}$ is nondecreasing and satisfies

\[ \sup_n \frac{n w_n}{W_n} < \infty. \tag{4.8.5} \]

Proof. (i) We may of course assume s ≥ 1. We use the given monotonicity to estimate

\[ \sum_{k=1}^{n-1} k\,|w_k - w_{k+1}| \le \sum_{k=1}^{n-1} k^{1+s} \Big( \frac{w_k}{k^s} - \frac{w_{k+1}}{(k+1)^s} \Big) + \sum_{k=1}^{n-1} k\, \frac{(k+1)^s - k^s}{(k+1)^s}\, w_{k+1}. \]

Since s ≥ 1, the second sum is bounded by $\sum_{k=1}^{n-1} \frac{s(k+1)^{s-1}}{(k+1)^s}\, k\, w_{k+1} \le s \sum_{j=2}^{n} w_j$. For the first sum we have the estimate

\[ \sum_{k=1}^{n-1} k \Big( w_k - \frac{k^s}{(k+1)^s}\, w_{k+1} \Big) = \sum_{k=1}^{n-1} \big( k w_k - (k+1) w_{k+1} \big) + \sum_{k=1}^{n-1} \frac{(k+1)^{s+1} - k^{s+1}}{(k+1)^s}\, w_{k+1} \le w_1 - n w_n + (1+s) \sum_{k=1}^{n-1} w_{k+1}. \]

We obtain $\sum_{k=1}^{n-1} k\,|w_k - w_{k+1}| + n w_n \le w_1 + (1+2s) W_n$, which implies (4.8.3). Note that (i) easily implies (4.8.5), since $W_n = \sum_{k=1}^{n} k^s k^{-s} w_k \ge \sum_{k=1}^{n} k^s n^{-s} w_n \ge \frac{1}{s+1}\, n w_n$.

(ii) We may now assume s ≥ 2, and use the monotonicity to estimate

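\begin{align*}
\sum_{k=1}^{n-1} k\,|w_k - w_{k+1}| &\le \sum_{k=1}^{n-1} \frac{1}{k^{s-1}} \big( (k+1)^s w_{k+1} - k^s w_k \big) + \sum_{k=1}^{n-1} \frac{(k+1)^s - k^s}{k^{s-1}}\, w_{k+1} \\
&= \sum_{k=1}^{n-1} \big( (k+1) w_{k+1} - k w_k \big) + \sum_{k=1}^{n-1} \Big( \frac{(k+1)^s}{k^{s-1}} - (k+1) \Big) w_{k+1} + \sum_{k=1}^{n-1} \frac{(k+1)^s - k^s}{k^{s-1}}\, w_{k+1} \\
&= n w_n - w_1 + \sum_{k=1}^{n-1} \frac{(k+1)^{s-1} - k^{s-1}}{k^{s-1}}\, (k+1)\, w_{k+1} + \sum_{k=1}^{n-1} \frac{(k+1)^s - k^s}{k^{s-1}}\, w_{k+1} \\
&\le n w_n + (s-1) \sum_{k=1}^{n-1} \Big( \frac{k+1}{k} \Big)^{s-1} w_{k+1} + s \sum_{k=1}^{n-1} \Big( \frac{k+1}{k} \Big)^{s-1} w_{k+1} \\
&\le n w_n + (2s-1)\, 2^{s-1} \sum_{j=2}^{n} w_j,
\end{align*}

and together with (4.8.5) we conclude that (4.8.3) holds.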


Remarks. Trigonometric series with coefficients satisfying the “quasimonotonicity” assumptions of the corollary were considered by Lebed [1967]. Corollary 4.8.4 applies also to non-monotone sequences. As an example satisfying (i), define $w_k = 2^{-js} k^s$ for $2^j \le k < 2^{j+1}$. Since $w_{2^j} = 1$ and the sequence increases in each dyadic block, it is not monotone. Part (ii) of the corollary applies, for example, to $w_k := 2^{js} k^{-s}$ for $2^j \le k < 2^{j+1}$; an unbounded example is $w_k := k + \frac{3}{2} \sin k$. For increasing sequences, condition (4.8.5) is satisfied when $w_k = k^t$ for a fixed t > 0, or $w_k = (\log k)^t$ for a fixed t > 0, but not when $w_k = t^k$ for some t > 1. For more details and examples we refer to Lin and Weber [2005].

Before discussing more precisely the L2(μ) setting, let us recall a well-known fact (see for instance Hardy, Littlewood and Polya [1934: 120]), from which follows a simple but useful result. Let Φ(t) > 0 be nondecreasing for $t \ge t_0 \ge 0$ with $\lim_{t\to\infty} \Phi(t) = \infty$; then we have

\[ \sum_k \frac{1}{\Phi(k)} < \infty \implies \sum_k \frac{w_k}{\Phi(W_k)} < \infty. \tag{4.8.6} \]

This follows from the inequality

\[ \frac{w_k}{\Phi(W_k)} \le \int_{W_{k-1}}^{W_k} \frac{dt}{\Phi(t)}, \]

valid when $W_{k-1} \ge t_0$.

4.8.5 Proposition. For any α > 1 and for any sequence $\{f_k, k \ge 1\} \subset L^1$ satisfying $\sup_k \|f_k\|_1 < \infty$ we have

\[ \lim_{n\to\infty} \frac{\sum_{k=1}^{n} w_k f_k}{W_n \log^{\alpha}(1 + W_n)} \overset{a.e.}{=} 0. \]

Proof. Apply (4.8.6) with $\Phi(t) = t \log^{\alpha}(1+t)$; the result follows from Beppo Levi’s theorem and Kronecker’s lemma.

For p > 1 the proposition is also an immediate consequence of the remark to Corollary 9.3.7 (c) (with $\xi_k = w_k f_k$). By taking $\Phi(t) = t \log(1+t)[\log\log(1+t)]^{\alpha}$ with α > 1, the proof also yields

\[ \lim_{n\to\infty} \frac{\sum_{k=1}^{n} w_k f_k}{W_n \log(1 + W_n)[\log\log(1 + W_n)]^{\alpha}} = 0 \quad \text{a.e.} \]

From now on we write

\[ M_n := \sum_{k=1}^{n} w_k^2. \tag{4.8.7} \]

We now consider a sequence of functions $f = \{f_k, k \ge 1\} \subset L^2(\mu)$. Let w be a sequence of nonnegative weights. We assume the following relation between the weights and the functions: there exists a finite constant $C_0$ such that

\[ \Big\| \sum_{k=n}^{m} w_k f_k \Big\|_2^2 \le C_0 \sum_{k=n}^{m} w_k^2, \qquad \forall m \ge n \ge 1. \tag{4.8.8} \]


Condition (4.8.8) obviously holds for norm-bounded orthogonal sequences, e.g., orthonormal sequences (for any sequence of weights). Such a condition is also realized by (1.3.11), when f satisfies

\[ \sup_i \sum_j |\langle f_i, f_j\rangle| < \infty \tag{4.8.9} \]

(e.g., $f_k$ are centered and satisfy $\sup_i \sum_j |\langle f_i, f_j\rangle| < \infty$). To get (4.8.9) it suffices for instance that for any integers j ≥ i ≥ 1,

\[ |\langle f_i, f_j\rangle| \le C_1 e^{-C_2 |j-i|}. \tag{4.8.10} \]

Since the weights are nonnegative, (4.8.9) holds for centered negatively correlated random variables with uniformly bounded variances. Another example of a sequence satisfying (4.8.9) is a wide-sense stationary sequence with bounded spectral density.

4.8.6 Theorem. Assume that the sequence of weights w satisfies

\[ \log \frac{1}{w_n} = O(\log M_n) \tag{4.8.11} \]

and f satisfies (4.8.9). Then for any b > 3/2 we have

\[ \lim_{n\to\infty} \frac{\sum_{k=1}^{n} w_k f_k}{M_n^{1/2} \log^{b} M_n} = 0 \quad \text{a.s.} \]

If in addition

\[ \limsup_{n\to\infty} \frac{M_n \log^{\gamma} M_n}{W_n^2} < \infty \quad \text{for some } \gamma > 3, \tag{4.8.12} \]

then we have

\[ \lim_{n\to\infty} \frac{\sum_{k=1}^{n} w_k f_k}{W_n} = 0 \quad \text{a.s.} \]

Proof. The first half of Theorem 4.8.6 is an immediate consequence of Corollary 9.3.7. The second half follows from the first using (4.8.12).

We now explore some intermediate conditions.

4.8.7 Theorem. Assume that for some 0 < β < 1 we have

\[ \sup_n \frac{1}{W_n} \Big( n^{\beta} w_n + \sum_{k=1}^{n-1} k^{\beta} |w_k - w_{k+1}| \Big) < \infty. \tag{4.8.13} \]

Then for $p > \frac{1}{\beta}$ and T power-bounded on $L^p$, $A_n(T)f \to 0$ a.e. for $f \in L^p$ which for some α ∈ (1 − β, 1] satisfies

\[ \sup_n \Big\| \frac{1}{n^{1-\alpha}} \sum_{k=1}^{n} T^k f \Big\|_p < \infty. \tag{4.8.14} \]


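Proof. Fix p and T as specified in the theorem, and for $f \in L^p(\mu)$ denote $S_n f := \sum_{k=1}^{n} T^k f$. If $f \in L^p(\mu)$ satisfies (4.8.14), then, since β > max{1 − α, 1/p}, Proposition 11.3.8 yields

\[ \frac{1}{n^{\beta}}\, S_n f \to 0 \quad \text{a.e.} \tag{4.8.15} \]

Using Abel’s summation we have

\[ \frac{1}{W_n} \sum_{k=1}^{n} w_k T^k f = \frac{1}{W_n} \sum_{k=1}^{n-1} k^{\beta} (w_k - w_{k+1})\, \frac{1}{k^{\beta}}\, S_k f + \frac{n^{\beta} w_n}{W_n}\, \frac{1}{n^{\beta}}\, S_n f. \]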

We now obtain the assertion of the theorem by using (4.8.15) and (4.8.13), similarly to the proof of Theorem 4.8.3.

Remarks. Condition (4.8.14) implies that f is a fractional coboundary for T. For additional information we refer to Derriennic–Lin [2001], where (4.8.15) is proved for Dunford–Schwartz contractions. Condition (4.8.13) implies also that $\frac{1}{W_n} \sum_{k=1}^{n} w_k f_k \to 0$ a.e. for any sequence $\{f_k, k \ge 1\} \subset L^p(\mu)$, with $p > \frac{1}{\beta}$ and $\sup_k \|f_k\|_p < \infty$, satisfying $\sup_n \big\| \frac{1}{n^{1-\alpha}} \sum_{k=1}^{n} f_k \big\|_p < \infty$ for some $\alpha \in \big( \frac{p}{p-1}(1-\beta), 1 \big]$. This follows from Proposition 1 of Cohen and Lin [2003] (with δ = 1 − β). However the condition on α here is more restrictive than in the theorem. For nondecreasing weights, condition (4.8.13) is equivalent to $\sup_n \frac{n^{\beta} w_n}{W_n} < \infty$.

4.8.8 Corollary. Assume that condition (4.8.13) holds for some $\beta > \frac{1}{2}$. Then for every power-bounded T on L2(μ) and f ∈ L2 with $\sup_n \big\| \frac{1}{\sqrt{n}} S_n f \big\|_2 < \infty$, we have $A_n(T)f \to 0$ a.s.

4.8.9 Corollary. Let 1 < q < 2 with dual index p = q/(q − 1), and assume

\[ \sup_n \Big( \frac{n w_n}{W_n} + \frac{1}{W_n^q} \sum_{k=1}^{n-1} k^q |w_k - w_{k+1}|^q \Big) < \infty. \tag{4.8.16} \]

Then for every T power-bounded in $L^p(\mu)$ and $f \in L^p(\mu)$ satisfying (4.8.14) with α > 1/p we have $A_n(T)f \to 0$ a.e.

Proof. We first show that for any β < 1/q (4.8.13) is satisfied. By Hölder’s inequality

\[ \frac{1}{W_n} \sum_{k=1}^{n-1} k^{\beta} |w_k - w_{k+1}| = \frac{1}{W_n} \sum_{k=1}^{n-1} k\,|w_k - w_{k+1}|\, \frac{1}{k^{1-\beta}} \le \Big( \frac{1}{W_n^q} \sum_{k=1}^{n-1} k^q |w_k - w_{k+1}|^q \Big)^{1/q} \Big( \sum_{k=1}^{n-1} \frac{1}{k^{p(1-\beta)}} \Big)^{1/p}. \]


Since p(1 − β) > 1 the series $\sum_{k=1}^{\infty} \frac{1}{k^{p(1-\beta)}}$ converges, so (4.8.13) holds. For α > 1/p we pick $\beta \in (1-\alpha, \frac{1}{q})$ such that β > 1/p, which is possible since q < 2, and apply Corollary 4.8.8.

Note that the proof that (4.8.16) implies (4.8.13) for β < 1/q is valid for any q > 1; it is the application of Corollary 4.8.8 to the dual index that requires q < 2.

Now we turn to the i.i.d. case and will essentially discuss some extensions of Theorem 4.8.2 that allow us to weaken the assumptions on the weights, when balancing this with a few more integrability conditions on the sequence of random variables $\xi = \{\xi_k, k \ge 1\}$. We assume $w_k > 0$ for every k. We first begin with a simple proposition which does not require identical distribution.

4.8.10 Proposition. Let 1 < p ≤ 2.

(i) If $\sum_{k=1}^{\infty} w_k^p < \infty$, then for any centered independent sequence ξ with $\sup_k E|\xi_k|^p < \infty$, the series $\sum_{k=1}^{\infty} w_k\xi_k$ converges almost surely.

(ii) We have almost sure convergence of the series $\sum_{k=1}^{\infty} \frac{w_k\xi_k}{W_k}$ for every centered independent sequence ξ with $\sup_k E|\xi_k|^p < \infty$, if and only if $\sum_{k=1}^{\infty} \big( \frac{w_k}{W_k} \big)^p < \infty$.

(iii) The following are equivalent:
(a) $\sum_{k=1}^{\infty} \big( \frac{w_k}{W_k} \big)^p < \infty$,
(b) $\sum_{k=1}^{\infty} \frac{w_k\xi_k}{W_k}$ converges almost surely for any centered independent ξ with $\sup_k E|\xi_k|^p < \infty$,
(c) $\frac{1}{W_n} \sum_{k=1}^{n} w_k\xi_k \to 0$ almost surely for any centered independent ξ with $\sup_k E|\xi_k|^p < \infty$.

Proof. Assertion (i) follows from Marcinkiewicz–Zygmund [1937]. In part (ii), if the series $\sum_{k=1}^{\infty} \big( \frac{w_k}{W_k} \big)^p$ converges, then for ξ as in the statement, the series $\sum_{k=1}^{\infty} \frac{w_k\xi_k}{W_k}$ converges almost surely by (i). Conversely, if $\sum_{k=1}^{\infty} \big( \frac{w_k}{W_k} \big)^p = \infty$, then a result of Marcinkiewicz–Zygmund [1937: Theorem 5] yields the existence of a sequence ξ of independent centered random variables with $E|\xi_k|^p = 1$ for which the series $\sum_{k=1}^{\infty} \frac{w_k\xi_k}{W_k}$ is almost surely divergent. In assertion (iii), (a) implies (b) by (ii), and (b) implies (c) by Kronecker’s lemma. Now assume (c). An inspection of the construction of the example in the quoted result of Marcinkiewicz–Zygmund shows that if (a) does not hold, then in fact there is $\{\xi_k\}$ centered independent with $E(|\xi_k|^p) = 1$ such that $\limsup_k \frac{|w_k\xi_k|}{W_k} \ge 1$ a.s. (we define $\xi_k = W_k x_k/kw_k$, where $x_k$ are the random variables defined in Marcinkiewicz–Zygmund [1937]). This contradicts (c), since $\frac{1}{W_n} \sum_{k=1}^{n} w_k\xi_k$ is then a.s. non-convergent to 0 by the identity

\[ \frac{1}{W_n} \sum_{k=1}^{n} w_k\xi_k - \frac{1}{W_{n-1}} \sum_{k=1}^{n-1} w_k\xi_k = \frac{w_n\xi_n}{W_n} - \frac{w_n}{W_n}\, \frac{1}{W_{n-1}} \sum_{k=1}^{n-1} w_k\xi_k. \]


The following result is in the same spirit as Theorem 4.8.2, but the weights need not be bounded.

4.8.11 Theorem. Let w be a weight sequence with $\sum_{k=1}^{\infty} w_k = \infty$. If

\[ \sup_{n\ge 1} \frac{1}{W_n} \sum_{k=1}^{n} w_k \big( \log(w_k + 1) \big)^{\beta} < \infty \quad \text{for some } \beta > 1, \tag{4.8.17} \]

then for any i.i.d. sequence ξ such that $E|\xi_1| (\log^+ |\xi_1|)^{\gamma} < \infty$ for some γ > 1, we have $\lim_{n\to\infty} \frac{1}{W_n} \sum_{k=1}^{n} w_k\xi_k = E\,\xi_1$ almost surely.

The proof of the theorem will depend on a general method for obtaining sufficient conditions, described below. Let $\varphi : \mathbf{R}^+ \to \mathbf{R}^+$ be a differentiable non-decreasing function satisfying

(i) $0 \le \varphi'(u) \le C\, \frac{\varphi(u)}{u}$,
(ii) $u^{-1}\varphi(u)$ is nondecreasing for $u \ge u_0$,
(iii) $\varphi(uv) \le C\varphi(u)\varphi(v)$, u ≥ 1, $v \ge v_0$,
(iv) $\int_t^{\infty} \frac{du}{\varphi(u)} < \infty$ for some t > 0.

(4.8.18)

Note that (iv) and (ii) imply that $\lim_{u\to\infty} u^{-1}\varphi(u) = \infty$. Typical examples are the functions $\varphi(u) = u^{\alpha}(\log(1+u))^{\beta}$, $\alpha \ge 1$, $\beta \in \mathbb{R}^+$, with $\beta > 1$ when $\alpha = 1$. It can be shown that (ii) implies $\varphi(u+v) \ge \varphi(u) + \varphi(v)$. Indeed, by (ii) we have $\frac{\varphi(u+v)}{u+v} \ge \max\bigl\{\frac{\varphi(u)}{u}, \frac{\varphi(v)}{v}\bigr\}$, and therefore

$$\varphi(u+v) = u\,\frac{\varphi(u+v)}{u+v} + v\,\frac{\varphi(u+v)}{u+v} \ge u\,\frac{\varphi(u)}{u} + v\,\frac{\varphi(v)}{v} = \varphi(u) + \varphi(v).$$

Thereby,

$$\varphi\Bigl(\sum_{k=1}^{n} u_k\Bigr) \ge \sum_{k=1}^{n} \varphi(u_k). \tag{4.8.19}$$
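The superadditivity (4.8.19) can be observed numerically for the typical examples $\varphi(u) = u^{\alpha}(\log(1+u))^{\beta}$. The sketch below is only an illustration: the grid of test points and the parameter values are assumptions made here, and a finite check of this kind is of course no substitute for the argument above.

```python
import math

def phi(u, alpha=1.0, beta=2.0):
    """phi(u) = u^alpha * log(1+u)^beta, one of the typical examples."""
    return u ** alpha * math.log1p(u) ** beta

def superadditivity_gap(us, alpha=1.0, beta=2.0):
    """phi(sum u_k) - sum phi(u_k); nonnegative when (4.8.19) holds."""
    return phi(sum(us), alpha, beta) - sum(phi(u, alpha, beta) for u in us)

# Illustrative grid of positive arguments (an arbitrary choice).
grid = [0.5 * i for i in range(1, 41)]
pair_gaps = [superadditivity_gap([u, v]) for u in grid for v in grid]
triple_gap = superadditivity_gap([1.0, 2.5, 7.0], alpha=1.5, beta=1.0)
```

For both parameter choices $u^{-1}\varphi(u)$ is nondecreasing on all of $(0,\infty)$, so every gap computed here should be positive.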


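Let w be a weight sequence. Put $T_0 = 0$ and $T_n = \sum_{k=1}^{n} \varphi(w_k)$ for $n \ge 1$. Property (4.8.19) implies that $T_n \le \varphi(W_n)$ for all $n \ge 1$. When $\varphi(t) = t^{\alpha}$ with $\alpha > 1$ this means $\sum_{k=1}^{n} w_k^{\alpha} \le W_n^{\alpha}$, which is weaker than the necessary condition (1.3.8), and thus yields no information for the weighted strong law of large numbers. We therefore introduce the following assumption:

$$\sup_n \frac{\sum_{k=1}^{n} \varphi(w_k)}{W_n} < \infty. \tag{4.8.20}$$

Inequality (4.8.20) implies $w_n/W_n \to 0$. Denote $\kappa = \sup_n T_n/W_n$. Fix some $\varepsilon > 0$. If $\frac{\varphi(w_n)}{w_n} > \frac{\kappa}{\varepsilon}$, then

$$\frac{w_n}{W_n} = \frac{\varphi(w_n)}{W_n}\cdot\frac{w_n}{\varphi(w_n)} \le \kappa\,\frac{w_n}{\varphi(w_n)} < \varepsilon;$$

if $\frac{\varphi(w_n)}{w_n} \le \frac{\kappa}{\varepsilon}$, then convergence to infinity in (ii) implies $w_n \le A_{\varepsilon}$, so $w_n/W_n \le A_{\varepsilon}/W_n$, which is less than ε for large n.

Since φ is nondecreasing and $\varphi(w_n) = T_n - T_{n-1}$, for any positive integer j (iii) yields

$$\begin{aligned}
\#\{n : W_n \le j w_n\} &\le j + \#\{n > j : W_n \le j w_n\} \le j + \sum_{n>j} \frac{\varphi(j w_n)}{\varphi(W_n)} \le j + C\varphi(j)\sum_{n\ge j} \frac{T_n - T_{n-1}}{\varphi(W_n)} \\
&= j + C\varphi(j)\Bigl[\frac{-T_{j-1}}{\varphi(W_j)} + \sum_{n=j}^{\infty} T_n\Bigl(\frac{1}{\varphi(W_n)} - \frac{1}{\varphi(W_{n+1})}\Bigr)\Bigr].
\end{aligned}$$

But by the mean value theorem and the assumptions made on φ, we have

$$\frac{1}{\varphi(W_n)} - \frac{1}{\varphi(W_{n+1})} = \frac{\varphi(W_{n+1}) - \varphi(W_n)}{\varphi(W_n)\varphi(W_{n+1})} \le C\,\frac{w_{n+1}}{W_{n+1}\,\varphi(W_n)}.$$

Inserting this and using $T_n \le T_{n+1}$ we get

$$\begin{aligned}
\#\{n : W_n \le j w_n\} &\le j + C\varphi(j)\Bigl[\frac{-T_{j-1}}{\varphi(W_j)} + C\sum_{n=j}^{\infty} \frac{T_{n+1}}{W_{n+1}}\cdot\frac{w_{n+1}}{\varphi(W_n)}\Bigr] \\
&\le j + C^2\varphi(j)\,\sup_{k\ge j}\frac{T_k}{W_k}\,\sum_{n=j}^{\infty} \frac{w_{n+1}}{\varphi(W_n)} \le j + C^2\varphi(j)\,\kappa \sum_{n=j}^{\infty} \frac{w_{n+1}}{\varphi(W_n)}.
\end{aligned}$$

(Recall that by (4.8.20), $\kappa = \sup_{k\ge 1} T_k/W_k$ is finite.) Now

$$\frac{w_{n+1}}{\varphi(W_n)} \le \int_{W_n}^{W_{n+1}} \frac{du}{\varphi(u)}$$

yields

$$\#\{n : W_n \le j w_n\} \le j + C^2\kappa\,\varphi(j)\int_{W_j}^{\infty} \frac{du}{\varphi(u)}, \tag{4.8.21}$$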


which is finite by (4.8.18) (iv). We now extend the definition of $W_n$ to $\mathbb{R}^+$ by putting $W_0 = 0$ and $W(t) = W_{[t]}$, where $[t]$ stands for the integer part of t. For $t \in \mathbb{R}^+$ and $s \in \mathbb{R}$ define

$$G(t) = \begin{cases} 0 & \text{if } 0 \le t < 1, \\ \max\Bigl\{t,\ \varphi(t)\displaystyle\int_{W(t)}^{\infty} \frac{du}{\varphi(u)}\Bigr\} & \text{if } t \ge 1, \end{cases} \qquad H(s) = s^2 \int_{t\ge |s|} \frac{G(t)}{t^3}\,dt. \tag{4.8.22}$$

$G(t)$ is always finite, and by (4.8.21),

$$\sup_{j\ge 1} \frac{\#\{n : W_n \le j w_n\}}{G(j)} < \infty. \tag{4.8.23}$$

The function H need not be finite (e.g., $\varphi(u) = u^2$ and $w_n = 1/n$). When it is, we obtain

4.8.12 Theorem. Let φ satisfy (4.8.18), and let w with divergent series satisfy (4.8.20). Then for any i.i.d. sequence ξ with $E\,H(\xi_1) < \infty$ we have

$$\lim_{n\to\infty} \frac{\sum_{k=1}^{n} w_k\xi_k}{\sum_{k=1}^{n} w_k} \overset{\text{a.s.}}{=} E\,\xi_1 .$$

Proof. The proof is built upon (4.8.23) and an argument that lies in the proof of Theorem 3 of Jamison, Orey and Pruitt [1965]. For any positive real t put $N(t) := \#\{n : W_n \le t w_n\}$. For $t < 1$ we have $N(t) = 0$. For $t \ge 1$ we have $\varphi(t+1) \le \varphi(2t) \le C\varphi(2)\varphi(t)$, and $[t] + 1 \le t + 1$. In view of (4.8.23) and the definition of G, $\#\{n : W_n \le t w_n\} \le K\,G([t]+1) \le K' G(t)$. Thus

$$E\,\xi^2 \int_{t\ge|\xi|} \frac{\#\{n : W_n \le t w_n\}}{t^3}\,dt \le K' E\,\xi^2 \int_{t\ge|\xi|} \frac{G(t)}{t^3}\,dt = K' E\,H(\xi) < \infty,$$

by assumption. It follows that condition (4.8.2) of Theorem 4.8.1 is satisfied; by this theorem, as well as the remark at the bottom of p. 41 in Jamison, Orey and Pruitt [1965], the result follows.

Now we can pass to the

Proof of Theorem 4.8.11. Since in (4.8.17) we can always replace β by a smaller value (still greater than 1), and also γ can be replaced by any smaller value > 1, we may always assume γ = β and β ≤ 2. Put $\varphi(u) := u(\log(1+u))^{\beta}$. Then the assumptions on φ and $\{w_k\}$ of Theorem 4.8.12 are satisfied. Since $W(t) \to \infty$, we have $W(t) \ge e$ for $t \ge t_1$, and

$$\int_{W(t)}^{\infty} \frac{du}{\varphi(u)} \le \int_{W(t)}^{\infty} \frac{du}{u(\log u)^{\beta}} = \frac{1}{(\beta-1)(\log W(t))^{\beta-1}} \tag{4.8.24}$$

for large t. This yields that $G(t)$ as defined before satisfies $G(t) \le \frac{1}{\beta-1}\,t\,(\log(1+t))^{\beta}$ for large t. For $|s|$ large enough we obtain

$$\int_{|s|}^{\infty} \frac{G(t)}{t^3}\,dt \le c\int_{|s|}^{\infty} \frac{(\log(1+t))^{\beta}}{t^2}\,dt \le c\int_{|s|}^{\infty} \frac{(\log t)^{\beta}}{t^2}\,dt \le c\int_{|s|}^{\infty} 2\Bigl(1 - \frac{\beta}{\log t}\Bigr)\frac{(\log t)^{\beta}}{t^2}\,dt = 2c\,\frac{(\log|s|)^{\beta}}{|s|},$$

where c denotes a constant whose value may change from one occurrence to the next. We can now conclude that $H(s) = s^2\int_{|s|}^{\infty} \frac{G(t)}{t^3}\,dt \le C|s|\bigl(1 + (\log^+|s|)^{\beta}\bigr)$. Hence $E\,H(\xi_1) < \infty$ when $E\,|\xi_1|(\log^+|\xi_1|)^{\beta} < \infty$. Now Theorem 4.8.12 yields the assertion, since γ = β.

Condition (4.8.17) is satisfied by any bounded weight sequence. However, Theorem 4.8.11 does not include Theorem 4.8.2, because the latter requires a slightly weaker integrability property of $\xi_1$. If we want to apply Theorem 4.8.12 to $\varphi(t) = t^{\alpha}$ with $\alpha > 1$, condition (4.8.20) becomes $\{M_n^{(\alpha)}/W_n\}$ bounded, which implies (4.8.17). However, without any knowledge of the size of $W_n$ we cannot get better estimates for H with this φ, and Theorem 4.8.12 in this case will not improve the result of Theorem 4.8.11.

We now add a condition to Theorem 4.8.11 in order to obtain the weighted strong law of large numbers under a weaker integrability condition on the i.i.d. sequence.

4.8.13 Theorem. Let w be a weight sequence. If w satisfies (4.8.17) and also $\inf_n \frac{1}{n} W_n > 0$, then for any i.i.d. sequence ξ with $E\,|\xi_1|\log^+|\xi_1| < \infty$, we have a.s. $\lim_{n\to\infty} \frac{1}{W_n}\sum_{k=1}^{n} w_k\xi_k = E\,\xi_1$.

Proof. We take φ as in the proof of Theorem 4.8.11. The additional assumption on $W_n$ yields

$$\inf_{t\ge 1} \frac{W(t)}{t+1} \ge \frac{1}{3}\inf_{n\ge 1} \frac{W_n}{n} > 0,$$

so by (4.8.24) we have $\int_{W(t)}^{\infty} \frac{du}{\varphi(u)} \le C(\log(1+t))^{1-\beta}$. Hence for large t we have $G(t) \le c\,t\log(1+t)$, so for large $|s|$ we obtain

$$\int_{|s|}^{\infty} \frac{G(t)}{t^3}\,dt \le c\int_{|s|}^{\infty} \frac{\log(1+t)}{t^2}\,dt \le c\int_{|s|}^{\infty} \frac{\log t}{t^2}\,dt = c\,\frac{\log|s| + 1}{|s|} \le c\,\frac{\log|s|}{|s|},$$

again with a constant c that may change from one occurrence to the next. We can now conclude that $H(s) = s^2\int_{|s|}^{\infty} \frac{G(t)}{t^3}\,dt \le C|s|(1 + \log^+|s|)$. Therefore, $E\,H(\xi_1) < \infty$ when $E(|\xi_1|\log^+|\xi_1|) < \infty$. Theorem 4.8.12 now applies.
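To see the weighted strong law at work with unbounded weights, one can simulate the averages $\frac{1}{W_n}\sum_{k\le n} w_k\xi_k$. The sketch below makes several assumptions that are not in the text: the weights are $w_k = k^{1/4}$ at perfect squares and $1$ elsewhere (unbounded, with $W_n \ge n$, and satisfying (4.8.17) because the perfect squares contribute only $O(n^{3/4}(\log n)^{\beta})$ to the sum), and the $\xi_k$ are exponential with mean 2. A single simulated path only illustrates the almost sure statement.

```python
import math
import random

def weight(k):
    """Unbounded weights: k**0.25 at perfect squares, 1 elsewhere (illustrative choice)."""
    r = math.isqrt(k)
    return k ** 0.25 if r * r == k else 1.0

def weighted_average(n, mean=2.0, seed=0):
    """(1/W_n) * sum_{k<=n} w_k * xi_k for i.i.d. exponential xi_k with the given mean."""
    rng = random.Random(seed)
    num = 0.0
    W = 0.0
    for k in range(1, n + 1):
        w = weight(k)
        num += w * rng.expovariate(1.0 / mean)  # expovariate takes the rate 1/mean
        W += w
    return num / W

estimate = weighted_average(200_000)
```

With these choices the standard deviation of the average at $n = 200\,000$ is of order $10^{-2}$, so `estimate` should lie close to $E\,\xi_1 = 2$.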


We can deduce the following corollary.

4.8.14 Corollary. Let w be a weight sequence satisfying $\inf_n \frac{W_n}{n} > 0$ and

$$\sup_n \frac{1}{W_n}\sum_{k=1}^{n} w_k^{\alpha} < \infty \quad\text{for some } \alpha > 1. \tag{4.8.25}$$

Then for any i.i.d. sequence ξ with $E\,|\xi_1| < \infty$ we have

$$\lim_{n\to\infty} \frac{1}{W_n}\sum_{k=1}^{n} w_k\xi_k \overset{\text{a.s.}}{=} E\,\xi_1 .$$

Proof. We take $\varphi(t) = t^{\alpha}$, which satisfies (4.8.18). Assumption (4.8.25) is condition (4.8.20) for our φ. It is easy to show that $G(t) \le c\,t^{\alpha} W(t)^{1-\alpha}$ for large t. Since $\inf_{t\ge 1} W(t)/t \ge \frac{1}{2}\inf_n W_n/n > 0$, we obtain $G(t) \le c'\,t$. Computations similar to the previous ones yield that $H(s) \le C|s|$. Hence $E\,H(\xi_1) < \infty$ when $E\,|\xi_1| < \infty$, and Theorem 4.8.12 applies.

Remarks. The assumed linear growth of $W_n$ thus allows for more precise estimates, which result in the weighted strong law of large numbers for i.i.d. sequences with only the first moment, when (4.8.17) is strengthened to (4.8.25). There are many unbounded sequences that satisfy the hypotheses of Theorem 4.8.13 and Corollary 4.8.14. For example, strictly stationary ergodic random weights with a finite moment of order α > 1 satisfy the hypotheses of Corollary 4.8.14. On the other hand, if the stationary sequence is only in $L(\log^+ L)^{\beta}$, then almost surely the random weights satisfy (4.8.17), but not (4.8.25). Nevertheless, the weighted strong law of large numbers for i.i.d. sequences with only finite first moment still holds, by Theorem 4.1 of Baxter, Jones, Lin and Olsen [2004]. The stationary random weights above satisfy (4.8.17) but not (4.8.5), while the weights $w_k := k^t$, $t > 0$, satisfy (4.8.5) and not (4.8.17).

The method leading to Theorem 4.8.12 can be generalized. We now assume only that φ satisfies (4.8.18) (i) to (iii), and instead of assuming (4.8.18) (iv) we take another positive nondecreasing function $\varphi_1$ with

$$\int_t^{\infty} \frac{du}{\varphi_1(u)} < \infty \quad\text{for some } t > 0. \tag{4.8.26}$$

For a weight sequence w with divergent series we assume the following (which is (4.8.20) when $\varphi_1 = \varphi$):

$$\kappa := \sup_n \frac{\varphi_1(W_n)}{\varphi(W_n)}\cdot\frac{\sum_{k=1}^{n} \varphi(w_k)}{W_n} < \infty. \tag{4.8.27}$$

Adapting the two inequalities preceding (4.8.21), we get

$$\#\{n : W_n \le j w_n\} \le j + C^2\kappa\,\varphi(j)\int_{W_j}^{\infty} \frac{du}{\varphi_1(u)} .$$


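4.8.15 Theorem. Let φ satisfy (4.8.18) (i) to (iii) and $\varphi_1$ nondecreasing with (4.8.26). Let w with divergent series satisfy (4.8.27). Define

$$G(t) = \begin{cases} 0 & \text{if } 0 \le t < 1, \\ \max\Bigl\{t,\ \varphi(t)\displaystyle\int_{W(t)}^{\infty} \frac{du}{\varphi_1(u)}\Bigr\} & \text{if } t \ge 1. \end{cases}$$

If $H(s) := s^2\int_{t\ge|s|} \frac{G(t)}{t^3}\,dt$ ($s \in \mathbb{R}$) is finite, then for any i.i.d. sequence ξ with $E\,H(\xi_1) < \infty$ we have

$$\lim_{n\to\infty} \frac{\sum_{k=1}^{n} w_k\xi_k}{\sum_{k=1}^{n} w_k} \overset{\text{a.s.}}{=} E\,\xi_1 .$$

We use Theorem 4.8.15 for studying some weighted modulation. Fix α > 1, and let $c = \{c_k, k \ge 1\}$ be a sequence of positive numbers satisfying

$$\sum_{k=1}^{\infty} c_k^{\alpha} = \infty. \tag{4.8.28}$$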

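Hence also $\sum_k c_k = \infty$. Since $C_n := \sum_{k=1}^{n} c_k$ and $C_n^{(\alpha)} := \sum_{k=1}^{n} c_k^{\alpha}$ are strictly increasing, there exist strictly increasing continuous functions ψ and $\psi_{\alpha}$ with $\psi(0) = 0$, $\psi_{\alpha}(0) = 0$, $\psi(n) = C_n$, and $\psi_{\alpha}(n) = C_n^{(\alpha)}$. Let $b = \{b_k, k \ge 1\}$ be a sequence of positive numbers satisfying

$$\ell := \lim_{n\to\infty} \frac{\sum_{k=1}^{n} c_k b_k}{\sum_{k=1}^{n} c_k} > 0 \ \text{ exists and } \ \sup_{n\ge 1} \frac{\sum_{k=1}^{n} c_k^{\alpha} b_k^{\alpha}}{\sum_{k=1}^{n} c_k^{\alpha}} < \infty. \tag{4.8.29}$$

As an example of such a situation, let c with $\sum_{k=1}^{\infty} c_k^2 = \infty$ satisfy $\sup_n n c_n^2 / \sum_{k=1}^{n} c_k^2 < \infty$. Then $\sup_n n c_n / \sum_{k=1}^{n} c_k$ is finite, and Corollary 4.8.4 applies to c and to $\{c_k^2, k \ge 1\}$. Hence for positive i.i.d. random variables $\{f_k, k \ge 1\}$ with finite third moment, almost surely the realizations $b_k = f_k(x)$ satisfy (4.8.29).

4.8.16 Theorem. Let α > 1, and let c be a sequence of positive numbers with $\sum_{k=1}^{\infty} c_k^{\alpha} = \infty$. Put $\varphi(t) = t^{\alpha}$ and $\varphi_1(t) := \dfrac{t^{1+\alpha}}{\psi_{\alpha}(\psi^{-1}(t))}$. Assume $\varphi_1$ is nondecreasing and satisfies (4.8.26), and define

$$\tilde G(t) = \begin{cases} 0 & \text{if } 0 \le t < 1, \\ \max\Bigl\{t,\ \varphi(t)\displaystyle\int_{C_{[t]}}^{\infty} \frac{du}{\varphi_1(u)}\Bigr\} & \text{if } t \ge 1. \end{cases}$$

If $\tilde H(s) := s^2\int_{t\ge|s|} \frac{\tilde G(t)}{t^3}\,dt$ ($s \in \mathbb{R}$) is finite, then for any positive sequence b satisfying (4.8.29) and any i.i.d. sequence ξ with $E\,\tilde H(\xi_1) < \infty$ we have

$$\lim_{n\to\infty} \frac{\sum_{k=1}^{n} c_k b_k \xi_k}{\sum_{k=1}^{n} c_k} \overset{\text{a.s.}}{=} \ell\cdot E\,\xi_1 .$$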


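Since the functions $\tilde G$ and $\tilde H$ depend only on the weights c and not on the “modulators” b, the class of i.i.d. sequences to which the result applies is the same for all positive modulators which satisfy (4.8.29).

Proof. We will apply Theorem 4.8.15 to the sequence $w_k = c_k b_k$. Clearly φ and $\varphi_1$ (which depend only on c) satisfy (4.8.18) and (4.8.26) respectively. We now show that $w_k = c_k b_k$ satisfies (4.8.27). Since $\ell > 0$, we have $\sum_{k=1}^{\infty} w_k = \infty$. Clearly we may assume $\ell > 1$. Hence for n large enough, $W_n = \sum_{k=1}^{n} c_k b_k \ge \sum_{k=1}^{n} c_k = C_n$, and $\psi^{-1}(W_n) \ge \psi^{-1}(C_n) = n$. By the definitions, for large n we obtain

$$\varphi_1(W_n) := \frac{W_n^{1+\alpha}}{\psi_{\alpha}(\psi^{-1}(W_n))} \le \frac{W_n^{1+\alpha}}{\psi_{\alpha}(n)} = \frac{W_n^{1+\alpha}}{C_n^{(\alpha)}} .$$

Hence

$$\frac{\varphi_1(W_n)}{\varphi(W_n)}\cdot\frac{\sum_{k=1}^{n} \varphi(w_k)}{W_n} \le \frac{W_n^{1+\alpha}}{C_n^{(\alpha)} W_n^{\alpha}}\cdot\frac{\sum_{k=1}^{n} c_k^{\alpha} b_k^{\alpha}}{W_n} = \frac{\sum_{k=1}^{n} c_k^{\alpha} b_k^{\alpha}}{C_n^{(\alpha)}}$$

for large n, and (4.8.29) yields (4.8.27). Finally, for G and H defined for w in Theorem 4.8.15, $W_n \ge C_n$ implies $G(t) \le \tilde G(t)$, and consequently $H(t) \le \tilde H(t)$. Hence, by Theorem 4.8.15 and (4.8.29),

$$\lim_{n\to\infty} \frac{\sum_{k=1}^{n} c_k b_k \xi_k}{\sum_{k=1}^{n} c_k} = \lim_{n\to\infty} \frac{\sum_{k=1}^{n} w_k \xi_k}{\sum_{k=1}^{n} w_k}\cdot\frac{\sum_{k=1}^{n} c_k b_k}{\sum_{k=1}^{n} c_k} \overset{\text{a.s.}}{=} \ell\cdot E\,\xi_1 .$$

4.8.17 Theorem. Let β ≥ 0, and let α > 1. Then for any ergodic probability preserving transformation τ on $(X, \mathcal{A}, \mu)$ and $0 < f \in L^{\alpha}(\mu)$ there exists $X_1$ with $\mu(X_1) = 1$ such that if $x \in X_1$, then for any i.i.d. sequence ξ with $E\,|\xi_1| < \infty$ we have

$$\lim_{n\to\infty} \frac{1}{n^{\beta+1}}\sum_{k=1}^{n} k^{\beta} f(\tau^k x)\,\xi_k \overset{\text{a.s.}}{=} \frac{\int f\,d\mu}{\beta+1}\,E\,\xi_1 . \tag{4.8.30}$$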
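The limit (4.8.30) can be illustrated on a concrete system. The sketch below rests on assumptions made only for this example: τ is the rotation $x \mapsto x + \sqrt{2} \pmod 1$ of the unit interval, $f(y) = 2 + \cos(2\pi y)$ (positive, with $\int f\,d\mu = 2$), $\beta = 1$, and the $\xi_k$ are exponential with mean 3. The starting point $x = 0.1$ is arbitrary; the theorem itself only guarantees the limit for almost every x.

```python
import math
import random

def modulated_average(n, beta=1.0, theta=math.sqrt(2.0), x=0.1, mean=3.0, seed=1):
    """n^-(beta+1) * sum_{k<=n} k^beta * f(tau^k x) * xi_k for tau(x) = x + theta mod 1."""
    rng = random.Random(seed)
    f = lambda y: 2.0 + math.cos(2.0 * math.pi * y)   # positive, integral over [0,1) is 2
    total = 0.0
    for k in range(1, n + 1):
        xi = rng.expovariate(1.0 / mean)               # i.i.d. with E xi = mean
        total += k ** beta * f((x + k * theta) % 1.0) * xi
    return total / n ** (beta + 1.0)

limit = 2.0 / (1.0 + 1.0) * 3.0    # (integral of f) / (beta + 1) * E xi_1
estimate = modulated_average(200_000)
```

Here `limit` is the right-hand side of (4.8.30) for these choices, and `estimate` should be close to it for large n.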

Proof. Put $c_k = k^{\beta}$. Asymptotically $\sum_{k=1}^{n} c_k \sim \frac{1}{\beta+1}\,n^{\beta+1}$, so we can take $\psi(t) = \frac{1}{\beta+1}\,t^{\beta+1}$ and $\psi_{\alpha}(t) = \frac{1}{\alpha\beta+1}\,t^{\alpha\beta+1}$, so $\varphi_1(u) = \gamma\,u^{(\alpha+\beta)/(\beta+1)}$ satisfies (4.8.26), and simple computations yield that for large t we have $\tilde G(t) \le Ct$, so $\tilde H(s) \le C|s|$. Thus $E\,\tilde H(\xi_1)$ is finite if $E(|\xi_1|) < \infty$. Since $\{c_k, k \ge 1\}$ and $\{c_k^{\alpha}, k \ge 1\}$ both satisfy (4.8.5), for non-zero f, Corollary 4.8.4 yields that for almost every $x \in X$ the sequence $b_k = f(\tau^k x)$ satisfies (4.8.29) with $\ell = \int f\,d\mu$. The result thus follows from the previous theorem.

If we take α = 2 and put $c_k = \log(k+1)$, then for $f \in L^2(\mu)$, $x \in X_1$ and any ξ i.i.d. with $E\,|\xi_1| < \infty$ we have

$$\lim_{n\to\infty} \frac{\sum_{k=1}^{n} \log(k+1)\,f(\tau^k x)\,\xi_k}{n\log n} \overset{\text{a.s.}}{=} \int f\,d\mu\; E\,\xi_1 .$$

In this case $\psi(t) = t\log^+ t$ and $\psi_2(t) = t(\log^+ t)^2$. For large t we obtain

$$\int_{\psi(t)}^{\infty} \frac{du}{\varphi_1(u)} = \int_{\psi(t)}^{\infty} \frac{\psi_2(\psi^{-1}(u))}{u^3}\,du = \int_t^{\infty} \frac{(\log s + 1)\,ds}{s^2\log s} \le \frac{2}{t},$$

which yields $\tilde G(t) \le 2t$ for large t, so $\tilde H(s) \le C|s|$. We use as before the fact that $\{c_k, k \ge 1\}$ and $\{c_k^2, k \ge 1\}$ both satisfy (4.8.5).

Oscillations of weighted averages over intervals of polynomial length. The problem of almost everywhere convergence of weighted averages can often be reduced to proving the convergence along a subsequence of polynomial growth.

4.8.18 Proposition. Let $(X, \mathcal{A}, \mu)$ be a probability space, $1 < p < \infty$ fixed with dual index $q = p/(p-1)$, and $\{f_k, k \ge 1\} \subset L^p(\mu)$ with $\sup_k \|f_k\|_p < \infty$. Let w be a bounded sequence of positive numbers with $\inf_n \frac{1}{n}\sum_{k=1}^{n} w_k > 0$, and put

$$A_n(x) := \frac{1}{W_n}\sum_{k=1}^{n} w_k f_k(x).$$

(i) There exists a constant K, depending only on w, such that for any positive integers $n_1 < n_2 \le 2n_1$ and any $x \in X$ we have

$$\sup_{n_1\le j<m\le n_2} |A_m(x) - A_j(x)| \le K\Bigl(\frac{n_2 - n_1}{n_2}\Bigr)^{1/q}\Bigl(\frac{1}{n_2}\sum_{k=1}^{n_2} |f_k(x)|^p\Bigr)^{1/p}. \tag{4.8.31}$$

(ii) For every R ≥ 1 and r > 1 there exists a constant K(R, r), which depends on w but not on p, such that

$$\Bigl(\sum_{i=1}^{\infty} \Bigl\|\sup_{i^R\le j<m\le (i+1)^R} |A_m - A_j|\Bigr\|_p^{qr}\Bigr)^{1/qr} \le K(R, r)\,\sup_k \|f_k\|_p . \tag{4.8.32}$$

(iii) When p > 2 we obtain, by putting $r = p/q = p - 1$ in (4.8.32),

$$\Bigl(\sum_{i=1}^{\infty} \Bigl\|\sup_{i^R\le j<m\le (i+1)^R} |A_m - A_j|\Bigr\|_p^{p}\Bigr)^{1/p} \le K(R, p-1)\,\sup_k \|f_k\|_p .$$

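Proof. (i) Denote $C := \sup_k w_k$, and put $\alpha_{n,k} := w_k/W_n$ for $1 \le k \le n$ and $\alpha_{n,k} = 0$ for $k > n$, so $A_n(x) = \sum_{k=1}^{\infty} \alpha_{n,k} f_k(x)$. For $j < m$, Hölder's inequality yields

$$|A_m(x) - A_j(x)| \le \Bigl(\sum_{k=1}^{m} |\alpha_{m,k} - \alpha_{j,k}|^q\Bigr)^{1/q}\Bigl(\sum_{k=1}^{m} |f_k(x)|^p\Bigr)^{1/p}. \tag{4.8.33}$$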


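Using the definitions and the boundedness of w, we obtain

$$\begin{aligned}
\sum_{k=1}^{m} |\alpha_{m,k} - \alpha_{j,k}|^q &= \sum_{k=1}^{j} w_k^q\Bigl(\frac{1}{W_j} - \frac{1}{W_m}\Bigr)^q + \frac{1}{W_m^q}\sum_{k=j+1}^{m} w_k^q \\
&= \sum_{k=1}^{j} w_k^q\,\frac{(W_m - W_j)^q}{(W_m W_j)^q} + \frac{1}{W_m^q}\sum_{k=j+1}^{m} w_k^q \\
&\le \sum_{k=1}^{j} w_k^q\,\frac{C^q (m-j)^q}{(W_m W_j)^q} + (m-j)\,\frac{C^q}{W_m^q} \\
&\le \frac{C^q (m-j)}{W_m^q}\Bigl[\frac{(m-j)^{q-1}}{W_j^q}\,j\,C^q + 1\Bigr].
\end{aligned}$$

Let $C_2 := \sup_n \frac{n}{W_n}$ (finite by assumption). Since $n_1 \le j < m \le 2n_1$, we have $m - j \le n_1 \le j$. Hence the last estimate yields

$$\sum_{k=1}^{m} |\alpha_{m,k} - \alpha_{j,k}|^q \le \frac{C^q (m-j)}{W_m^q}\Bigl[\frac{j^{q-1}}{W_j^q}\,j\,C^q + 1\Bigr] \le \frac{C^q (m-j)}{W_m^q}\bigl(C_2^q C^q + 1\bigr) = K_1\,\frac{m-j}{W_m^q} .$$

Substituting this in (4.8.33) and then using $m \le n_2 \le 2n_1 \le 2m$, we obtain

$$\begin{aligned}
|A_m(x) - A_j(x)| &\le K_1^{1/q}\,\frac{(m-j)^{1/q}}{W_m}\Bigl(\sum_{k=1}^{n_2} |f_k(x)|^p\Bigr)^{1/p} \\
&\le K_1^{1/q}\,\frac{(m-j)^{1/q}}{W_m}\,(2m)^{1/p}\Bigl(\frac{1}{n_2}\sum_{k=1}^{n_2} |f_k(x)|^p\Bigr)^{1/p} \\
&\le K_1^{1/q}\,C_2\,2^{1/p}\Bigl(\frac{m-j}{m}\Bigr)^{1/q}\Bigl(\frac{1}{n_2}\sum_{k=1}^{n_2} |f_k(x)|^p\Bigr)^{1/p}.
\end{aligned}$$

Since $j/m \ge n_1/n_2$, this shows assertion (i), with $K = C_2\,2^{1/p} K_1^{1/q}$.

(ii) Put $C_1 := \sup_k \|f_k\|_p$. Taking the p-th power of (i) and integrating we obtain

$$\Bigl\|\sup_{n_1\le j<m\le n_2} |A_m - A_j|\Bigr\|_p^p \le K^p\Bigl(\frac{n_2 - n_1}{n_2}\Bigr)^{p/q} C_1^p$$

whenever $n_1 < n_2 \le 2n_1$. Fix R ≥ 1. For $i \ge i_0$ large enough, $(i+1)^R / i^R \le 2$ and the previous estimate applies with $n_1 = i^R$ and $n_2 = (i+1)^R$. Since $(i+1)^R - i^R \le R(i+1)^{R-1}$, we have $\frac{n_2 - n_1}{n_2} \le \frac{R}{i+1}$, so

$$\Bigl\|\sup_{i^R\le j<m\le (i+1)^R} |A_m - A_j|\Bigr\|_p \le K\,C_1\Bigl(\frac{R}{i+1}\Bigr)^{1/q}.$$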


Now let r > 1. We estimate the tail of the series in (ii) by

$$\sum_{i=i_0}^{\infty} \Bigl\|\sup_{i^R\le j<m\le (i+1)^R} |A_m - A_j|\Bigr\|_p^{qr} \le K^{qr} C_1^{qr} R^r \sum_{i=i_0}^{\infty} \Bigl(\frac{1}{i+1}\Bigr)^r .$$

This proves the convergence in (ii), and after majorizing the first terms of the series, we can get an estimate for K(R, r).

4.9 Subsequence averages

Let $\mathcal{N} = \{n_k, k \ge 1\}$ be an increasing sequence of positive integers. Let $1 \le p < \infty$. In view of Birkhoff's pointwise theorem, it is quite natural to ask if it is true that, given any measurable dynamical system $(X, \mathcal{A}, \mu, T)$, the limit

$$\lim_{N\to\infty} \frac{1}{N}\sum_{j=0}^{N} T^{n_j} f(x) \tag{4.9.1}$$

exists almost everywhere for any $f \in L^p(\mu)$, $1 \le p < \infty$. In that case, we say that $\mathcal{N}$ is p-universally good. When p = 1, we say more simply that the sequence $\mathcal{N}$ is universally good. This is obviously a fascinating question, although somewhat more theoretical than Birkhoff's theorem, which attracted and motivated ergodicians during the last decades. Some authors prefer to use a slightly more precise notion, saying that a sequence $\mathcal{N}$ is p-nice when, given any measurable dynamical system $(X, \mathcal{A}, \mu, T)$ and any $f \in L^p(\mu)$,

$$\lim_{N\to\infty} \frac{1}{N}\sum_{j=0}^{N} T^{n_j} f(x) \overset{\text{a.s.}}{=} E_T(f)(x). \tag{4.9.2}$$

Here $E_T(f)$ denotes the conditional expectation of f with respect to the σ-algebra $\mathcal{B}(T)$ of T-invariant measurable subsets of X. Recall that the corresponding notion of universally p-mean good sequence was defined in Section 1.3 (see after Corollary 1.3.7).

Assume T is ergodic. The limit in (4.9.1) need not necessarily be $\int f\,d\mu$. For instance, the averages along the squares $\frac{1}{N}\sum_{j=1}^{N} T^{j^2} f$ converge in $L^2(\mu)$ for any $f \in L^2(\mu)$, but the limit is not necessarily constant for some ergodic transformations. Further, Bellow [1989] showed that there are subsequences of integers such that the averages in (4.9.1) fail to converge when applied to some $f \in L^p(\mu)$, $p < p_0$, $p_0 > 1$, but converge for all $f \in L^p(\mu)$ for each $p \ge p_0$. Boshernitzan pointed out that one can modify any increasing sequence, in particular a universally good sequence, by selecting either a 0 or a 1 to add at each point of the sequence, and obtain a sequence which is no longer universally good. Rosenblatt [1997] showed, however, that for “generic” invertible measure-preserving transformations the limit in (4.9.1) is always the integral $\int f\,d\mu$.

Similarly, we say that $\mathcal{N}$ is p-universally bad when, for any measurable dynamical system $(X, \mathcal{A}, \mu, T)$, there exists an $f \in L^p(\mu)$ such that the limit (4.9.1) fails to exist for all x in a set of positive μ-measure. Refinements of this property, namely the sweeping out properties, can be defined as follows. We say that $\mathcal{N}$ is ∞-sweeping out for $L^p$ when for every aperiodic dynamical system $(X, \mathcal{A}, \mu, T)$, there is an $f \in L^p(\mu)$ such that

$$\limsup_{N\to\infty} \frac{1}{N}\sum_{j=0}^{N} T^{n_j} f(x) \overset{\text{a.s.}}{=} \infty .$$

We also say that $\mathcal{N}$ is strongly sweeping out iff in every aperiodic dynamical system $(X, \mathcal{A}, \mu, T)$, for every ε > 0, there is a set B with μ(B) < ε such that

$$\limsup_{N\to\infty} \frac{1}{N}\sum_{j=0}^{N} T^{n_j} 1_B(x) \overset{\text{a.s.}}{=} 1 \quad\text{and}\quad \liminf_{N\to\infty} \frac{1}{N}\sum_{j=0}^{N} T^{n_j} 1_B(x) \overset{\text{a.s.}}{=} 0 .$$
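The remark that averages along the squares may converge to a non-constant limit even when T is ergodic can be seen on the smallest possible example. The sketch below is an illustration chosen here, not an example from the text: $X = \mathbb{Z}/3\mathbb{Z}$ with the uniform measure, $Tx = x + 1 \pmod 3$ (which is ergodic), and f the indicator of $\{0\}$, so that $\int f\,d\mu = 1/3$.

```python
from fractions import Fraction

def square_average(x, N, modulus=3):
    """(1/N) * sum_{j=1..N} f(T^{j^2} x) for T x = x + 1 (mod modulus), f = indicator of {0}."""
    hits = sum(1 for j in range(1, N + 1) if (x + j * j) % modulus == 0)
    return Fraction(hits, N)

# N is a multiple of 3, so the residues of j^2 mod 3 are counted over full periods.
averages = {x: square_average(x, 3000) for x in range(3)}
```

Since $j^2 \equiv 0 \pmod 3$ for one residue class of j and $j^2 \equiv 1 \pmod 3$ for the other two, the averages take the values 1/3, 0 and 2/3 at x = 0, 1, 2 respectively, whereas the ordinary Birkhoff averages of f converge to 1/3 at every point.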

In this section, we only briefly list some of the most famous examples of universally good or bad sequences. A spectacular result obtained by Bourgain [1988b], [1988c] establishes that the sequence of squares $n_k = k^2$, $k = 1, 2, \ldots$ is p-universally good for $1 < p < \infty$. The proof uses the circle method on the shift model $\mathbb{Z}$. A nice presentation of Bourgain's arguments is given in [Thouvenot: 1989]. Recently Buczolich and Mauldin [2005], answering a question of Bourgain, showed that this sequence is 1-universally bad. Bourgain also showed that the sequence $n_k = q(k)$, $k = 1, 2, \ldots$, where q is a polynomial with integer coefficients, is p-universally good for $1 < p < \infty$. A result of the same kind was obtained for the sequence of primes $n_k = p_k$, the k-th prime, by Bourgain [1988b] (for $p > (1+\sqrt{3})/2$) and Wierdl [1988] (for $p > 1$), namely this sequence is also p-universally good for $1 < p < \infty$. In a nicely written paper, Nair [1991] established that the sequence $n_k = q(p_k)$, $k = 1, 2, \ldots$, where q is a polynomial with integer coefficients, is p-universally good for $1 < p < \infty$.

Buczolich [2007] constructed a sequence $\{n_k, k \ge 1\}$ such that $n_{k+1} - n_k \to \infty$, and for any ergodic dynamical system $(X, \mathcal{A}, \mu, T)$ and $f \in L^1(\mu)$, the averages $(1/N)\sum_{k=1}^{N} f(T^{n_k} x)$ converge to $\int_X f\,d\mu$ for μ-almost every x. The sequence being of zero Banach density, this disproves a conjecture of Rosenthal and Wierdl about the non-existence of such sequences.

Krengel [1971] showed that there exist subsequences which are universally bad in $L^p$, $1 \le p < \infty$. Lacunary sequences are strongly sweeping out (see Bellow [1983] and Akcoglu, Bellow, Jones, Losert, Reinhold-Larsson, Wierdl [1996], see also Jones and Wierdl [1994]). Consequently, a universally p-mean good sequence “must” satisfy

$$\lim_{k\to\infty} \frac{n_{k+1}}{n_k} = 1 .$$


Jones and Wierdl [1994] showed that if $\mathcal{N}$ satisfies the condition

$$\frac{n_{k+1}}{n_k} > e^{(\log n)^{-\frac{1}{2}+\varepsilon}}$$

for some positive ε, then it is ∞-sweeping for $L^2$ (later Jones, Lacey and Wierdl [1999] also showed that there exists a universally 2-mean good sequence $\mathcal{N}$ satisfying, for every ε > 0, the condition $n_{k+1}/n_k > e^{(\log n)^{-1-\varepsilon}}$ for all $k > k(\varepsilon)$). A basic fact used there is that for any m-tuple of positive reals $v = (v_1, \ldots, v_m)$ satisfying

$$\frac{v_{k+1}}{v_k} > 2q, \qquad k = 1, 2, \ldots, m-1,$$

and for any $e = (e_1, \ldots, e_m) \in \mathbb{Z}^m$, there is r > 0 so that

$$v_i r \equiv e_i \pmod q, \qquad i = 1, 2, \ldots, m.$$

This is Lemma 2.13 in Jones and Wierdl [1994], an article to which we refer for many other examples, the references therein and for a good understanding of the other arguments showing that lacunary sequences are universally bad. A general result (Theorem 2.3) for proving divergence of ergodic averages is also established. We also refer to [Rosenblatt–Wierdl: 1995] and to the works of Akcoglu, Bellow, Bourgain, Del Junco, Jones, Krengel, Lacey, Losert, Olsen, Petersen, Reinhold-Larsson, Rosenblatt, Tempelman, Wierdl, etc. The American school of ergodic theory has made an important contribution to the study of this attractive problem, in particular under the “dynamical” impulse of Bellow, Jones, Lacey, Petersen, Rosenblatt, Wierdl and their collaborators. A monograph making a synthesis of all results obtained, as well as a clear and accessible presentation of the main arguments, would be very welcome and certainly very helpful.

Lacunary sequences play a key role in many fundamental questions of analysis, probability theory, or Fourier analysis and here in ergodic theory. We shall notably see their interplay in studying the central limit theorem (Chapter 7) and the convergence properties of the system $\{f(n_k x), k \ge 1\}$ (Chapter 12). Below, we indicate an unexpected arithmetical property of these sequences, which we think is worth mentioning.

An arithmetical property of lacunary sequences. Burr [1970] raised the following question: let $a_1 < a_2 < \cdots$ be a sequence of integers, call it A, and let

$$P(A) = \Bigl\{\sum_i \varepsilon_i a_i :\ \varepsilon_i = 0 \text{ or } 1,\ a_i \in A \text{ and } \sum_i \varepsilon_i < \infty\Bigr\}.$$

Which sets S of integers are equal to P(A) for some A? Burr mentioned that if the complement of S grows sufficiently rapidly, then there exists such a sequence A. Hegyvári [1996] showed that if $B = \{b_i, i \ge 1\}$ is such that $7 \le b_1 < b_2 < \cdots$ and $b_{n+1} \ge 5b_n$ for every n,


then there exists a sequence A such that

$$P(A) = \mathbb{N}\setminus B, \tag{4.9.3}$$

thereby improving substantially an earlier unpublished result of Burr. He also showed that his result cannot be improved essentially, which is a quite remarkable fact. More precisely, if B is such that $b_{n+1} \le 2b_n$ for every n large enough, and B is a Sidon set, namely $b_i + b_j = b_k + b_t$ implies $i = k$, $j = t$ or $i = t$, $j = k$, then there is no sequence A for which $P(A) = \mathbb{N}\setminus B$. It seems that this kind of property or some variant of it deserves more investigation. We refer to Hegyvári's paper for more details and more results. Among these is another one, answering a question raised by Ruzsa, which we wish to include in these remarks: for any pair of real numbers $0 \le \alpha \le \beta \le 1$, there is a set A: $a_1 < a_2 < \cdots$ for which

$$\underline{d}(P(A)) = \liminf_{n\to\infty} \frac{\#\{P(A)\cap[1,n]\}}{n} = \alpha, \qquad \overline{d}(P(A)) = \limsup_{n\to\infty} \frac{\#\{P(A)\cap[1,n]\}}{n} = \beta. \tag{4.9.4}$$
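For a finite truncation of A, the set P(A) and the counting ratio appearing in (4.9.4) are easy to compute. The following is a minimal sketch under assumptions made here: A is a finite list, and the empty sum 0 is counted as an element of P(A), a convention the text does not spell out. The function names are hypothetical.

```python
def subset_sums(a):
    """All finite sums of distinct elements of the finite list a (the empty sum 0 included)."""
    sums = {0}
    for value in a:
        # Each new element either joins an existing sum or is left out.
        sums |= {s + value for s in sums}
    return sums

def counting_ratio(a, n):
    """#{P(A) intersected with [1, n]} / n for the finite truncation A = a."""
    return sum(1 for s in subset_sums(a) if 1 <= s <= n) / n

powers_of_two = subset_sums([1, 2, 4, 8])
gapped = sorted(subset_sums([1, 3, 9]))
```

With $A = \{1, 2, 4, 8\}$ every integer from 0 to 15 is a subset sum, while for $A = \{1, 3, 9\}$ the sums 2, 5, 6, 7, 8 and 11 are missed, so the shape of the complement of P(A) is governed by how fast A grows.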

Random subsequences. There are two remarkable types of studies. The first originates from a work by Bourgain [1988b] who considered a special kind of averages. Here we are given a sequence $\{Y_j, j \ge 1\}$ of independent random variables such that $P\{Y_j = 0\} = 1 - \sigma_j$, $P\{Y_j = 1\} = \sigma_j$, $0 < \sigma_j < 1$ for all j, and we form, given any measurable dynamical system $(X, \mathcal{A}, \mu, T)$ and $f \in L^0(\mu)$, the averages

$$A_n^{\omega} f = \frac{1}{\#\{j \le n : Y_j(\omega) = 1\}}\sum_{j\le n:\,Y_j(\omega)=1} f\circ\tau^j .$$

Only partial results exist. Bourgain proved that if

(a) the sequence $\{\sigma_n, n \ge 1\}$ is decreasing,
(b) $\lim_{n\to\infty} \dfrac{\sum_{j\le n} \sigma_j}{\log n} = \infty$,

then for almost every ω the sequence $\mathcal{N}_{\omega} = \{j : Y_j(\omega) = 1\}$ is mean-good. Jones, Lacey and Wierdl [1999] showed for the limit case $\sigma_j = 1/j$ that the sequence $\mathcal{N}_{\omega}$ is not mean-good for a measurable subset of ω of positive measure. The basic idea consists in showing that $\mathcal{N}_{\omega}$ contains a lacunary subsequence which has positive density in $\mathcal{N}_{\omega}$. Notice by the weighted strong law of large numbers, that if $\sum_j \sigma_j = \infty$, denoting $\Sigma_n = \sum_{j\le n} \sigma_j$,

$$\frac{\#\{j \le n : Y_j = 1\}}{\Sigma_n} = \frac{\sum_{j\le n} Y_j}{\Sigma_n} \overset{\text{a.s.}}{\longrightarrow} \frac{1}{2},$$

and so the averages $A_n^{\omega} f$ have the same limit behavior as the weighted averages

$$A_n^{\omega} f = \frac{1}{\Sigma_n}\sum_{j\le n} Y_j(\omega)\, f\circ\tau^j = \frac{1}{\Sigma_n}\sum_{j\le n} \bigl(Y_j(\omega) - E\,Y_j + E\,Y_j\bigr)\, f\circ\tau^j .$$

Since $E\,Y_j = \sigma_j/2$ is decreasing, we deduce from Corollary 4.8.4 (i) that the weighted ergodic averages

$$\frac{1}{\Sigma_n}\sum_{j\le n} E\,Y_j\, f\circ\tau^j$$

converge almost everywhere to $E_{\tau}(f)/2$ for any $f \in L^1(\mu)$. Therefore only the limit behavior of the averages

$$A_n^{\omega} f = \frac{1}{\Sigma_n}\sum_{j\le n} \bigl(Y_j(\omega) - E\,Y_j\bigr)\, f\circ\tau^j, \qquad n = 1, 2, \ldots$$

remains to be determined. Consider the related random polynomials

$$Q_n(t) = \sum_{j\le n} (Y_j - E\,Y_j)\,e^{2i\pi jt}, \qquad n = 1, 2, \ldots .$$

It follows from Example 2 given right after Theorem 8.5.1 that the increment condition (8.5.4) is fulfilled, and so Corollary 8.5.3 (c) applies. We get for the limit case $\sigma_j = 1/j$,

$$E\,\sup_{t\in\mathbb{T}} |Q_n(t)| = O(\log n).$$

A second remarkable example of subsequence averages built from random subsequences is described as follows: let $\{X_j, j \ge 1\}$ be a sequence of i.i.d. $\mathbb{Z}$-valued random variables and form their partial sums $S_n = X_1 + \cdots + X_n$, $n \ge 1$. Assume that T is invertible and consider the ergodic averages

$$B_n f = \frac{1}{n}\sum_{j=0}^{n-1} f\circ T^{S_j}, \qquad n \ge 1.$$

Lacey, Petersen, Rudolph and Wierdl [1994] showed that if $E\,X_1 = 0$ and $E\,X_1^2 < \infty$, then for almost all ω, the sequence $\{S_n, n \ge 1\}$ is p-mean good for any p, $1 < p < \infty$. Gamet and Schneider showed that under the condition $E\,|X_1|^{\delta} < \infty$ for some δ > 0, for almost all ω, the sequence $\{S_n, n \ge 1\}$ is 2-mean good. Their result is also valid for $\mathbb{Z}^d$-actions and i.i.d. $\mathbb{Z}^d$-valued random variables. Let $\varphi(t) = E\,e^{2i\pi X_1 t}$ be the characteristic function of $X_1$. The behavior (in mean and almost sure) of the averages $B_n$ is naturally related to that of the sup-norm of the polynomials

$$P_n(t) = \sum_{j=0}^{n-1} \bigl(e^{2i\pi S_j t} - \varphi^j(t)\bigr), \qquad n = 1, \ldots .$$


Guillotin-Plantard [2002] showed the following sharp uniform bound: for every ε > 0, 2 a.s. sup |Pn (t)| = O(n5/6 log n), (4.9.5) t∈T

The proof cleverly develops martingale techniques used in [Lacey–Petersen–Rudolph– Wierdl: 1994], who previously established the same result, but with constant 7/8. The question naturally arises whether the constant 5/6 is optimal or not, and if so, whether 1/2+ε, ε arbitrarily small is suitable. An approach using stochastic processes methods presented in Chapters 8 and 9 remains also to do. Of interest for this question is probably the fact that for any 0 < ε < 1, 2 E Pn (t) − Pn (s) dsdt = O(n1+(5ε/2) ). (4.9.6) |s − t|1+ε T T A computation of the L2 -increments of Pn indeed yields 2 E Pn (α) − Pn (β) n ! " #" #

2 − ϕ(β − α)k − ϕ(α − β)k + ϕ(−β)k − ϕ(−α)k ϕ(α)k − ϕ(β)k = k=1 k−1 n

+

# " # !" − ϕ(β)k−l + ϕ(−α)k−l ϕ(α − β)l − 1 + ϕ(β − α)l − 1

k=2 l=1

# − ϕ(α − β)l − 1 ϕ(α)k−l − ϕ(β)k−l + ϕ(−β)k−l − ϕ(−α)k−l "

k k l l l l k k + ϕ(α) − ϕ(β) ϕ(−β) − ϕ(−α) + ϕ(α) − ϕ(β) ϕ(−β) − ϕ(−α) . (4.9.8) Elementary considerations on characteristic functions then imply k−1 n n 2

2

2 E Pn (α) − Pn (β) ≤ C k |α − β| ∧ 1 + kl|α − β|2 ∧ 1 . k=1

k=2 l=1

Now, owing to the fact that for a transient random walk n1 nk,l=1 P{Sk = Sl } → ∞ 2G(0, 0) − 1, where G(0, x) = k=0 P{Sk = x} is the Green function (which is finite for every x ∈ Z), we also have for any u ∈ T, 2 E Pn (α + u) dα ≤ Cn. T

These two facts easily imply the claimed property.

Problem 4. Let A be an increasing sequence of positive integers. Find conditions ensuring that the set P(A) considered by Burr is 2-mean good or 2-universally good.

Problem 5. If $B = \{b_i, i \ge 1\}$ is an increasing sequence of positive integers with $b_{n+1}/b_n \ge 1 + \varepsilon_n$ for every n, where $\varepsilon_n \downarrow 0$, what could be an analogous result to (4.9.3)?

Problem 6. Does Theorem 5.2.4 provide an alternative way to prove that the sequence of primes is 1-universally bad? The numerous applications given by Stein of his result (see Chapter 5) suggest such a possibility.

Problem 7. Is the estimate (4.9.5) improvable? Is it possible to develop an approach based on the majorizing measure method or the metric entropy method?

Chapter 5

Banach principle and continuity principle

In this chapter, we state and give the proof of several formulations of the Banach principle and the continuity principle, which have proved to be fundamental tools for the study of problems of convergence almost everywhere for sequences of operators. We study through some examples their application in analysis.

5.1 Banach principle

This principle, formulated by Banach in 1926, is a fundamental tool in the study of the almost everywhere convergence of sequences of $L^p$-operators with p finite. The statement corresponding to the case p = ∞ is much more recent and was obtained by Bellow and Jones in 1996. Its use will be crucial in the proof of the metric entropy criterion in $L^{\infty}$ in Chapter 8. We begin this section with some necessary background. Let $(X, \mathcal{A}, \mu)$ be a probability space. Let $L^0(\mu)$ be the space of $\mathcal{A}$-measurable functions $f : X \to \mathbb{R}$. For every $f, g \in L^0(\mu)$, we write

$$d(f, g) = \int_X \frac{|f - g|}{1 + |f - g|}\,d\mu, \qquad \rho(f) = d(0, f). \tag{5.1.1}$$

This metric defines the topology of convergence in measure ($g_n \overset{\mu}{\to} g$ if for any ε > 0, $\lim_{n\to\infty} \mu\{|g_n - g| > \varepsilon\} = 0$), and we recall that $(L^0(\mu), d)$ is a complete metric space. Let $(B, \|\cdot\|)$ be a Banach space and consider a map S from B to $L^0(\mu)$. We introduce the following definition.

5.1.1 Definition. We say that S is continuous in measure, or d-continuous, if for any sequence $(f, f_n, n \in \mathbb{N}) \subset B$, we have

$$\lim_{n\to\infty} \|f_n - f\| = 0 \implies \lim_{n\to\infty} d(Sf_n, Sf) = 0.$$

Then the Banach principle can be stated as follows.

5.1.2 Theorem. (a) Let $S = \{S_t, t \in \mathbb{N}\}$ be a family of operators $S_t : B \to L^0(\mu)$. Assume there exists a nonincreasing function $C : \mathbb{R}^+ \to \mathbb{R}^+$ with $\lim_{t\to\infty} C(t) = 0$ and

$$\forall f \in B,\ \forall \alpha > 0, \qquad \mu\Bigl\{x : \sup_{t\in\mathbb{N}} |S_t(f)(x)| > \alpha\|f\|\Bigr\} \le C(\alpha). \tag{5.1.2}$$

Then the operators $S_t$ are continuous in measure and the set

$$L(S) = \bigl\{f \in B : \mu\{(S_t(f), t \in \mathbb{N}) \text{ converges}\} = 1\bigr\}$$

is closed in B.

(b) Conversely, if the operators $S_t$ are continuous in measure, and if for any $f \in B$,

$$\mu\Bigl\{\sup_{t\in\mathbb{N}} |S_t(f)| < \infty\Bigr\} = 1,$$

then there exists a nonincreasing function $C : \mathbb{R}^+ \to \mathbb{R}^+$ such that $\lim_{t\to\infty} C(t) = 0$ and

$$\forall f \in B,\ \forall \alpha > 0, \qquad \mu\Bigl\{\sup_{t\in\mathbb{N}} |S_t(f)| > \alpha\|f\|\Bigr\} \le C(\alpha). \tag{5.1.3}$$

Proof. For every $f \in B$ and $t \in \mathbb{N}$, we write

$$S^*(f) = \sup_{s\in\mathbb{N}} |S_s(f)|, \qquad S_t^*(f) = \sup_{s\in\mathbb{N},\,s\le t} |S_s(f)|. \tag{5.1.4}$$

Assertion (a) is immediate; the continuity in measure indeed follows from the inequality

$$\mu\{|S_t(f) - S_t(f_n)| > \varepsilon\} \le \mu\{S^*(f - f_n) > \varepsilon\} \le C\bigl(\varepsilon\,\|f - f_n\|^{-1}\bigr) \to 0$$

as $\|f - f_n\| \to 0$. Let $f \in \overline{L(S)}$. There exists a sequence $\{f_n, n \ge 1\}$ of elements of $L(S)$ converging in B to f. Put

$$\forall x \in X,\ \forall g \in B, \qquad O(x, g) = \lim_{T\to\infty}\ \sup_{s,t\in\mathbb{N}\cap[T,\infty[} |S_s(g)(x) - S_t(g)(x)|.$$

Since $O(x, f_n) = 0$ for almost every x, we have $O(x, f) = |O(x, f) - O(x, f_n)| \le O(x, f - f_n) \le 2S^*(f - f_n)(x)$ almost everywhere; we deduce for any ε > 0,

$$\mu\{x : O(x, f) > \varepsilon\} \le \mu\{x : 2S^*(f - f_n)(x) > \varepsilon\} \le C\Bigl(\frac{\varepsilon}{2\|f - f_n\|}\Bigr) \to 0,$$

as n tends to infinity. And since ε is arbitrary, we get $\mu\{x : O(x, f) = 0\} = 1$. This shows that $f \in L(S)$, and thus $L(S)$ is closed.

(b) Fix some ε > 0. By assumption, for each $f \in B$, $\mu\{S^*(f) < \infty\} = 1$. There thus exists a positive integer $n = n(f, \varepsilon)$ such that $\mu\{S^*(f) > n\} \le \varepsilon$. Put for any positive integer n, $B_n = \{f \in B : \mu\{S^*(f) > n\} \le \varepsilon\}$. Then,

$$B = \bigcup_{n\ge 1} B_n .$$


Besides, for any integer n ≥ 1,

$$B_n = \bigcap_{t\in\mathbb{N}} B_{n,t} \quad\text{where}\quad B_{n,t} = \{f \in B : \mu\{S_t^*(f) > n\} \le \varepsilon\}.$$

We first show that the sets $B_{n,t}$ are closed. Let $\{f_k, k \ge 1\}$ be a sequence of elements of $B_{n,t}$ converging in B to f. Let h > 0 be fixed, then

$$\mu\{S_t^*(f) > n + h\} \le \mu\{S_t^*(f_k) + S_t^*(f - f_k) > n + h\}$$

and thus

$$\mu\{S_t^*(f) > n + h\} \le \inf_{k\ge 1}\bigl[\mu\{S_t^*(f_k) > n\} + \mu\{S_t^*(f - f_k) > h\}\bigr] \le \varepsilon,$$

by continuity in measure of $S_t$. As

$$\{S_t^*(f) > n\} \subset \bigcup_{J\ge 1}\bigcap_{j\ge J} \Bigl\{S_t^*(f) > n + \tfrac{1}{j}\Bigr\} = \liminf_{j\to\infty} \Bigl\{S_t^*(f) > n + \tfrac{1}{j}\Bigr\} = \lim_{j\to\infty}(\uparrow) \Bigl\{S_t^*(f) > n + \tfrac{1}{j}\Bigr\},$$

we have

$$\mu\{S_t^*(f) > n\} = \lim_{j\to\infty} \mu\Bigl\{S_t^*(f) > n + \tfrac{1}{j}\Bigr\} \le \varepsilon .$$

This shows that the sets $B_{n,t}$, and thereby the sets $B_n$, are closed sets. We can thus write B as a countable union of closed sets. By virtue of the Baire theorem, one of these sets, say $B_n$, must have a nonempty interior. This set therefore contains a closed ball $B(f_0, r) = \{f \in B : \|f - f_0\| \le r\}$, r > 0. Consequently,

$$\forall f \in B(f_0, r), \qquad \mu\{S^*(f) > n\} \le \varepsilon. \tag{5.1.5}$$

Writing then f in the form $f = f_0 + rz$ with $z \in B$, $\|z\| \le 1$, and observing that $S^*(rz)(x) \le S^*(f_0)(x) + S^*(f_0 + rz)(x)$, leads to

$$\mu\{S^*(rz) > 2n\} \le \mu\{S^*(f_0) > n\} + \mu\{S^*(f_0 + rz) > n\} \le 2\varepsilon. \tag{5.1.6}$$

Thus, for any $z \in B$, $\|z\| \le 1$, we have

$$\forall \alpha \ge \frac{2n}{r}, \qquad \mu\{S^*(z) > \alpha\} \le 2\varepsilon. \tag{5.1.7}$$

Put

$$C(\alpha) = \sup_{z\in B,\ \|z\|\le 1} \mu\{S^*(z) > \alpha\}. \tag{5.1.8}$$

Then $C(\alpha) \le 2\varepsilon$, provided that $\alpha \ge \frac{2n}{r}$. As ε is arbitrary, we have on the one hand

$$\lim_{\alpha\to\infty} C(\alpha) = 0, \tag{5.1.9}$$

and on the other,
\[
\forall f \in B,\ \forall \alpha > 0, \qquad \mu\Big\{\sup_{t \in \mathbb{N}} |S_t(f)| > \alpha \|f\|\Big\} \le C(\alpha). \tag{5.1.10}
\]
This achieves the proof.

The importance of this result comes from the fact that it is often possible to establish the convergence $\mu$-almost everywhere of the sequence $\{S_t f, t \in \mathbb{N}\}$ for $f$ belonging to a countable dense subset of $B$. In many applications, $B$ is an $L^p(\mu)$ space with $1 \le p < \infty$. When $p = \infty$, namely when $\{S_t, t \in \mathbb{N}\}$ is a sequence of operators continuous in measure, or simply continuous from $L^\infty(\mu)$ to $L^\infty(\mu)$, the fact that for any $f \in L^\infty(\mu)$,
\[
\mu\Big\{x : \sup_{t \in \mathbb{N}} |S_t(f)|(x) < \infty\Big\} = 1
\]
does not bring any significant information. A different formulation of this principle is then necessary. It is precisely the object of our next statement, which is due to Bellow–Jones [1996]. Put $Y = \{f \in L^\infty(\mu) : \|f\|_\infty \le 1\}$. We endow $Y$ with the distance $d$ associated to the convergence in measure, which is defined in (5.1.1). Observe that the distances $d$ and $\|\cdot\|_p$, $1 \le p < \infty$, are equivalent on $Y$. Indeed, one easily establishes for any $f, g \in Y$,
\[
\tfrac{1}{3}\|f - g\|_1 \le d(f, g) \le \|f - g\|_1, \qquad \|f - g\|_p^p \le 2^{p-1}\|f - g\|_1 \le 2^{p-1}\|f - g\|_p. \tag{5.1.11}
\]
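The chain of inequalities (5.1.11) can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: the space $X$ is modelled by finitely many equally weighted atoms, and $d(f,g)$ is taken to be $\int |f-g|/(1+|f-g|)\,d\mu$, which is assumed here to agree with definition (5.1.1). The function name `check_metric_equivalence` is hypothetical.

```python
import random


def check_metric_equivalence(num_points=1000, p=3.0, seed=0):
    """Evaluate the four quantities of (5.1.11) for random f, g with |f|, |g| <= 1.

    Assumptions made for illustration only: X consists of num_points atoms of
    equal mass, and d(f, g) is the integral of |f - g| / (1 + |f - g|).
    """
    rng = random.Random(seed)
    f = [rng.uniform(-1.0, 1.0) for _ in range(num_points)]
    g = [rng.uniform(-1.0, 1.0) for _ in range(num_points)]
    diff = [abs(a - b) for a, b in zip(f, g)]               # values in [0, 2]
    norm1 = sum(diff) / num_points                           # ||f - g||_1
    dist = sum(t / (1.0 + t) for t in diff) / num_points     # assumed d(f, g)
    norm_p_to_p = sum(t ** p for t in diff) / num_points     # ||f - g||_p^p
    norm_p = norm_p_to_p ** (1.0 / p)                        # ||f - g||_p
    return norm1, dist, norm_p_to_p, norm_p


norm1, dist, norm_p_to_p, norm_p = check_metric_equivalence()
print(norm1 / 3.0 <= dist <= norm1)              # first chain of (5.1.11)
print(norm_p_to_p <= 4.0 * norm1 <= 4.0 * norm_p + 1e-12)  # second chain, p = 3
```

Both chains rest on the pointwise bound $|f-g| \le 2$ on $Y$, which is why the constants $1/3$ and $2^{p-1}$ appear.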

5.1.3 Definition. Let $S : Y \to L^0(\mu)$ be not necessarily linear. We say that $S$ is continuous at 0 if $S$ is $d$-continuous at 0 on $Y$.

Let us make some useful comments. When $S : L^\infty(\mu) \to L^0(\mu)$ is linear, then $S$ is continuous at 0 if and only if $S$ is $d$-continuous on $L^\infty(\mu)$. Let $(E, \|\cdot\|)$ be a normed space. When $S : E \to L^0(\mu)$ is a sublinear operator (i.e., $|S(\lambda f)| = |\lambda||S(f)|$ for any $f \in E$ and any real $\lambda$, and $|S(f_1 + f_2)| \le |S(f_1)| + |S(f_2)|$ for any $f_1, f_2 \in E$), it is well known (see for instance Garsia [1970]) that $S$ is continuous at $0 \in E$ if and only if the function $\varphi : (0, \infty) \to [0, \infty)$ defined by
\[
\varphi(\lambda) = \sup_{f \in E,\, \|f\| \le 1} \mu\{x : |Sf(x)| > \lambda\}
\]
tends to 0 as $\lambda$ tends to infinity.

Further, if $E = L^\infty(\mu)$ and if the operators $S_n$ are continuous from $L^\infty(\mu)$ to $L^\infty(\mu)$, the property $\mu\{S^* f < \infty\} = 1$ for all $f \in L^\infty(\mu)$ is often automatically satisfied; for instance if the $S_n$ are all contractions in $L^\infty(\mu)$. But this does not necessarily imply that the sequence $\{S_n(f), n \ge 1\}$ converges almost everywhere for

any $f \in L^\infty(\mu)$, as is shown through the following example. On the circle $(\mathbb{T}, \lambda)$ consider, for $f \in L^0(\lambda)$, $x \in \mathbb{T}$ and any positive integer $n$, the averaging operators
\[
S_{n,\theta} f(x) = \frac{1}{n} \sum_{k=0}^{n-1} f(x + 2^k \theta).
\]
Clearly
\[
S_\theta^* f = \sup_{n \ge 1} |S_{n,\theta} f| \le \|f\|_\infty.
\]
So we do have $\lambda\{S_\theta^* f < \infty\} = 1$ for all $f \in L^\infty(\lambda)$. Further, for almost all $\theta \in \mathbb{T}$ the sequence $\{2^k \theta, k \ge 1\}$ is uniformly distributed (mod 1). For such a $\theta$, the sequence

$\{S_{n,\theta} f, n \ge 1\}$ converges for all $x$ if $f$ is continuous. However, for every irrational $\theta \in \mathbb{T}$ the convergence almost everywhere of this sequence is known to fail for some $f \in L^\infty(\lambda)$ (see Rosenblatt [1991] and Akcoglu, Bellow, Jones, Losert, Reinhold-Larsson and Wierdl [1996]).

We begin with a first theorem connecting the convergence almost everywhere to the continuity property at 0 of the maximal operator.

5.1.4 Theorem. Let $\{S_n, n \ge 1\}$ be a sequence of linear operators of $L^\infty(\mu)$ in $L^0(\mu)$. Assume that the following conditions are realized:
a) $\forall f \in L^\infty(\mu)$, $\mu\{S^*(f) < \infty\} = 1$,
b) $S^* : Y \to L^0(\mu)$ is continuous at 0.
Then the set $E = \{f \in Y : \mu\{(S_n(f), n \ge 1) \text{ converges}\} = 1\}$ is closed in $(Y, d)$. Consequently, if for any $f$ in some countable dense subset $D$ of $(Y, d)$ the sequence $\{S_n(f), n \ge 1\}$ converges almost everywhere, this will be also fulfilled for any $f \in L^\infty(\mu)$.

The next statement shows that the additional condition b) is natural.

5.1.5 Theorem. Let $\{S_n, n \ge 1\}$ be a sequence of linear operators of $L^\infty(\mu)$ in $L^0(\mu)$. Assume that the following conditions are realized:
a) each $S_n$ is continuous at 0,
b) for any $f \in L^\infty(\mu)$, $\mu\{x : \{S_n(f)(x), n \ge 1\} \text{ converges}\} = 1$.
Then $S^* : Y \to L^0(\mu)$ is continuous at 0.

These results are Theorems 1 and 2 in [Bellow–Jones: 1996]. Theorem 5.1.5 was already stated (without proof) in Bourgain [1988a] under an additional commutation assumption needed in the context of the article. The proof follows, in turn, the same line of arguments as in the proof of the classical Banach principle. The proof of Theorem 5.1.4 being rather elementary (combine subadditivity of the oscillation functions $O(x, f)$ introduced in the proof of Theorem 5.1.2 with the continuity at 0 of the maximal operator $S^* f$, namely its continuity in measure at 0), we only give the

Proof of Theorem 5.1.5. We recall in view of (5.1.11) that $(Y, \|\cdot\|_1)$ is a complete metric space. For any $\delta > 0$ and $f \in Y$, we write $V_f(\delta) = \{g \in Y : \|f - g\|_1 < \delta\}$. Let $0 < \varepsilon < 1/2$ and $N$ be some positive integer. Put
\[
C_N(\varepsilon) = \Big\{f \in Y : \mu\Big\{x : \sup_{m \ge N} |S_N f(x) - S_m f(x)| > \varepsilon\Big\} \le \varepsilon\Big\}
\]
and for $M > N$,
\[
C_{N,M}(\varepsilon) = \Big\{f \in Y : \mu\Big\{x : \sup_{N \le m \le M} |S_N f(x) - S_m f(x)| > \varepsilon\Big\} \le \varepsilon\Big\}.
\]
Since each $S_n$ is linear and continuous at 0, it is continuous in measure. The sets $C_{N,M}(\varepsilon)$ are therefore closed. We omit the details since it is essentially a repetition of the proof that the sets $B_{n,t}$ are closed in the demonstration of Theorem 5.1.2. As $C_N(\varepsilon) = \bigcap_{M > N} C_{N,M}(\varepsilon)$, the sets $C_N(\varepsilon)$ are closed as well. But our assumption implies that
\[
Y = \bigcup_{N=1}^{\infty} C_N(\varepsilon).
\]
And so in view of the Baire theorem, one of these sets, call it $C_N(\varepsilon)$, must have a nonempty interior. Thus there exist $f \in C_N(\varepsilon)$ and $\delta > 0$ such that $V_f(\delta) = f + V_0(\delta) \subset C_N(\varepsilon)$. For each $g \in V_f(\delta)$, we have $\mu\{x : \sup_{m \ge N} |S_N g(x) - S_m g(x)| > \varepsilon\} \le \varepsilon$. Thereby if $h \in V_0(\delta)$, writing $h = f - g$ for some $g \in V_f(\delta)$ we get
\[
\mu\Big\{x : \sup_{m \ge N} |S_N h(x) - S_m h(x)| > 2\varepsilon\Big\} \le 2\varepsilon.
\]
But
\[
S^* h(x) = \sup_{m \ge 1} |S_m h(x)| \le \sup_{m \ge N} |S_N h(x) - S_m h(x)| + 2 \sup_{m \le N} |S_m h(x)|.
\]
Hence
\[
1 - 2\varepsilon \le \mu\Big\{x : \sup_{m \ge N} |S_N h(x) - S_m h(x)| \le 2\varepsilon\Big\} \le \mu\Big\{x : S^* h(x) \le 2 \sup_{m \le N} |S_m h(x)| + 2\varepsilon\Big\}.
\]

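Let $C = \{S^* h \le 2 \sup_{m \le N} |S_m h| + 2\varepsilon\}$. Then
\[
\begin{aligned}
\rho(S^* h) &= \int_C \frac{S^* h}{1 + S^* h}\, d\mu + \int_{C^c} \frac{S^* h}{1 + S^* h}\, d\mu \\
&\le \int_C \frac{2 \sup_{m \le N} |S_m h| + 2\varepsilon}{1 + 2 \sup_{m \le N} |S_m h| + 2\varepsilon}\, d\mu + \mu(C^c) \\
&\le 2 \int_X \frac{\sup_{m \le N} |S_m h|}{1 + \sup_{m \le N} |S_m h| + 2\varepsilon}\, d\mu + \int_X \frac{2\varepsilon}{1 + 2\varepsilon}\, d\mu + 2\varepsilon \\
&\le 2\rho\Big(\sup_{m \le N} |S_m h|\Big) + 4\varepsilon.
\end{aligned}
\]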

But each $S_n$ is continuous at 0 by assumption, and so for some $\delta' < \delta$ we have that $\rho(\sup_{m \le N} |S_m h|) \le \varepsilon$ whenever $\|h\|_1 < \delta'$. This allows us to write
\[
\rho(S^* h) \le 5\varepsilon, \qquad \|h\|_1 \le \delta'.
\]
As $\varepsilon$ can be arbitrarily small the proof is now complete.

5.2 Continuity principle

Let $(X, \mathcal{A}, \mu)$ be a probability space. For sequences of operators $\{S_n, n \ge 1\}$, $S_n : L^p(\mu) \to L^0(\mu)$, $1 \le p < \infty$, which are continuous in measure and commute with a mixing family of transformations of $(X, \mathcal{A}, \mu)$, the Banach principle can be strengthened into a continuity principle. The study of this principle is the object of this section. We will make the following commutation assumption:

(H) There exists a family $\mathcal{E}$ of measurable transformations of $X$, preserving $\mu$, which are mixing in the following sense:
\[
\forall A, B \in \mathcal{A},\ \forall \alpha > 1,\ \exists E \in \mathcal{E}, \qquad \mu(A \cap E^{-1}(B)) \le \alpha \mu(A)\mu(B), \tag{5.2.1}
\]
and commuting with the sequence of operators $\{S_n, n \ge 1\}$: $S_n(f \circ E) = (S_n f) \circ E$ for any $n \ge 1$, $f \in L^p(\mu)$ and $E \in \mathcal{E}$.

Remarks. 1. Assumption (H) is verified when for instance the operators $S_n$ commute on $L^p(\mu)$ with an ergodic endomorphism $\tau$ of $(X, \mathcal{A}, \mu)$: $S_n(f \circ \tau) = S_n(f) \circ \tau$. Indeed, it is then easy to check that the family $\mathcal{E} = \{\tau^n, n \in \mathbb{N}\}$ satisfies the mixing condition (5.2.1). Let $A, B \in \mathcal{A}$; by ergodicity of $\tau$,
\[
\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \mu(A \cap \tau^{-k}(B)) = \mu(A)\mu(B).
\]
Let $\alpha > 1$; for sufficiently large $n$ we have
\[
\frac{1}{n} \sum_{k=0}^{n-1} \mu(A \cap \tau^{-k}(B)) \le \alpha \mu(A)\mu(B).
\]
And this implies that there exists an integer $k = k(n, \alpha) < n$ such that
\[
\mu(A \cap \tau^{-k}(B)) \le \alpha \mu(A)\mu(B),
\]

hence (5.2.1). Assumption (H) has also two useful consequences.

2. A first consequence concerns the sequence $\{S_n, n \ge 1\}$:
\[
\forall n \ge 0, \qquad \mu\{x : S_n(1)(x) = \text{constant}\} = 1. \tag{5.2.2}
\]
Indeed, let $a$ and $b$ be two reals satisfying $a < b$. Apply (5.2.1) to $A = \{x : S(1)(x) \in [a, b]\} = B$, where $S$ is arbitrary in $\{S_n, n \ge 1\}$. By assumption, for any $E \in \mathcal{E}$, we have $S(1) \circ E = S(1 \circ E) = S(1)$, and thus $E^{-1}A = A$. From (5.2.1) it thus follows that $\mu(A) \le \alpha \mu(A)^2$, for any $\alpha > 1$. Consequently, $\mu(A) = \mu(A)^2$. Therefore,
\[
\forall a, b \in \mathbb{R},\ a < b, \qquad \mu\{x : S(1)(x) \in [a, b]\} = 0 \text{ or } 1. \tag{5.2.3}
\]
From this follows that for some integer $n_0$, noting $I = [n_0, n_0 + 1]$,
\[
\mu\{x : S(1)(x) \in I\} = 1.
\]
Let $I_1 = [n_0, n_0 + \tfrac{1}{2}]$ and $I_2 = [n_0 + \tfrac{1}{2}, n_0 + 1]$. Then there exists $i \in \{1, 2\}$ such that
\[
\mu\{x : S(1)(x) \in I_i\} = 1.
\]
Assume for instance that it is $I_1$. Dividing $I_1$ into $I_3 = [n_0, n_0 + \tfrac{1}{4}]$ and $I_4 = [n_0 + \tfrac{1}{4}, n_0 + \tfrac{1}{2}]$, one progressively builds, by iterating the same argument, a decreasing sequence of compact intervals which we denote by $\{J_n, n \ge 1\}$, verifying
a) $\forall n \ge 1$, $\mu\{x : S(1)(x) \in J_n\} = 1$,
b) $\forall n \ge 1$, $|J_n| = \frac{1}{2^n}$.
It follows from this that there exists a real $\lambda$ such that
\[
\mu\{x : S(1)(x) = \lambda\} = 1, \tag{5.2.4}
\]
as claimed.

3. A second consequence concerns the mixing property (5.2.1). This one indeed implies that
\[
\forall A \in \mathcal{A},\ \forall n \ge 2,\ \exists E_1, E_2, \dots, E_n \in \mathcal{E} \text{ such that if } A' = A \cup \bigcup_{i=1}^{n} E_i^{-1} A, \text{ then } n\mu(A) \le \frac{2}{1 - \mu(A')}. \tag{5.2.5}
\]
Said differently, in order to bound $\mu(A)$ by $C/n$, it suffices to show that $\mu(A') < 1$, and then take $C = \frac{2}{1 - \mu(A')}$. This will be one of the key tools of the proofs of Theorems 5.2.1 and 8.2.1.

In order to establish (5.2.5), the following intermediate property will be needed:
\[
\forall C \in \mathcal{A},\ \forall \alpha > 1,\ \forall n \ge 1,\ \exists E_1, E_2, \dots, E_n \in \mathcal{E} \text{ such that } \mu(C \cap E_1^{-1} C \cap \dots \cap E_n^{-1} C) \le \alpha^n \mu(C)^{n+1}. \tag{5.2.6}
\]
For $n = 1$, it suffices to apply (5.2.1) with the choice $A = B = C$. We find that there exists $E_1 \in \mathcal{E}$ such that $\mu(C \cap E_1^{-1} C) \le \alpha \mu(C)^2$. For $n = 2$, we apply again (5.2.1) with, this time, the choices $A = C \cap E_1^{-1} C$, $B = C$. Then there exists $E_2 \in \mathcal{E}$ such that
\[
\mu(C \cap E_1^{-1} C \cap E_2^{-1} C) \le \alpha \mu(C \cap E_1^{-1} C)\mu(C) \le \alpha^2 \mu(C)^3.
\]
The reasoning made for $n = 2$ is next easily iterated for any integer $n > 2$; hence property (5.2.6).

Now we show how to deduce property (5.2.5). Let $A \in \mathcal{A}$ be fixed. We can assume $0 < \mu(A) < 1$; indeed (5.2.5) is obvious if $\mu(A) = 0$, whereas if $\mu(A) = 1$, then $\mu(A') = 1$ and so (5.2.5) is also trivially realized. Observe that for any $E \in \mathcal{E}$, $E^{-1}(A^c) = (E^{-1}A)^c$. Apply then (5.2.6) to $C = A^c$, $\alpha = \frac{1}{\sqrt{1 - \mu(A)}}$. There thus exist $E_1, E_2, \dots, E_n \in \mathcal{E}$ such that
\[
\mu\big(A^c \cap (E_1^{-1}A)^c \cap \dots \cap (E_n^{-1}A)^c\big) = 1 - \mu(A') \le \alpha^n (1 - \mu(A))^{n+1} \le (1 - \mu(A))^{n/2}.
\]
In other words,
\[
\mu(A) \le 1 - (1 - \mu(A'))^{2/n}.
\]
It is at this stage that the little technical restriction $n \ge 2$ is used. Indeed, if $f(x) = (1 - x)^{2/n}$, $0 \le x \le 1$, then $f'(x) = -\frac{2}{n}(1 - x)^{\frac{2}{n} - 1}$ is decreasing. Applying then the mean value theorem to $f$ gives
\[
\mu(A) \le f(0) - f(\mu(A')) \le \mu(A')\, \frac{2}{n}\, \frac{1}{[1 - \mu(A')]^{1 - \frac{2}{n}}} \le \frac{2}{n}\, \frac{1}{1 - \mu(A')}.
\]
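The elementary inequality behind the last display, $1 - (1 - t)^{2/n} \le \frac{2}{n}\frac{1}{1 - t}$ for $0 \le t < 1$ and $n \ge 2$, can be checked on a grid. The following is a minimal numerical sketch, not part of the proof; the helper names `bound_gap` and `worst_gap` and the grid sizes are choices made here for illustration.

```python
def bound_gap(t, n):
    """Right-hand side minus left-hand side of 1 - (1-t)^(2/n) <= (2/n)/(1-t)."""
    lhs = 1.0 - (1.0 - t) ** (2.0 / n)
    rhs = (2.0 / n) / (1.0 - t)
    return rhs - lhs


def worst_gap(max_n=50, grid=1000):
    """Smallest gap over n = 2..max_n and t = 0, 1/grid, ..., (grid-1)/grid."""
    return min(bound_gap(i / grid, n)
               for n in range(2, max_n + 1)
               for i in range(grid))


print(worst_gap() > 0.0)  # the gap stays positive on the whole grid
```

Here $t$ plays the role of $\mu(A')$, so the check covers exactly the step from $\mu(A) \le 1 - (1 - \mu(A'))^{2/n}$ to (5.2.5).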

And this establishes property (5.2.5). We can now state the continuity principle.

5.2.1 Theorem (Stein [1961]). Suppose that $\{S_n, n \ge 1\}$ is a sequence of operators, $S_n : L^p(\mu) \to L^0(\mu)$, $1 \le p \le 2$, which are continuous in measure and satisfy the commutation assumption (H). Then the following properties are equivalent:
\[
\forall f \in L^p(\mu), \qquad \mu\{x : S^* f(x) < \infty\} = 1, \tag{5.2.7}
\]
\[
\exists\, 0 < C < \infty : \forall f \in L^p(\mu), \qquad \sup_{\lambda \ge 0} \lambda^p \mu\{x : S^* f(x) > \lambda\} \le C \int_X |f|^p\, d\mu. \tag{5.2.8}
\]

We refer to (5.1.4) for the notation $S^* f$. A useful remark is the following: let $p' < p$; under (5.2.7) we get from (5.2.8) that $S^* f \in L^{p'}(\mu)$, hence $S_n f \in L^{p'}(\mu)$, $n = 1, 2, \dots$. And if
\[
\mu\{x : \{S_n f(x), n \ge 1\} \text{ converges}\} = 1,
\]
denoting by $S(f)$ the limit, by the dominated convergence theorem $S(f) \in L^{p'}(\mu)$ as well.

Although the proof we shall present of this theorem is much inspired by the one given in Garsia [1970], it differs in two points which seem of interest. First, we will use Gaussian random variables instead of Rademacher random variables in Stein's randomisation technique. But above all, we will proceed with a direct reasoning, unlike in Garsia [1970]. This will have the advantage of better highlighting the basic arguments of the proof.

Proof. We denote by $E_0$ the identity of $X$. Let $f \in L^p(\mu)$ be such that $\|f\|_{p,\mu} = 1$. Let also $\lambda > 0$ and $A = \{x : S^* f(x) > \lambda\}$. Let $n \ge 2$ be some integer, which will be determined later relatively to $\lambda$. By virtue of property (5.2.5), we can find $E_1, E_2, \dots, E_n \in \mathcal{E}$ such that if $A' = A \cup \bigcup_{i=1}^{n} E_i^{-1} A$, then
\[
n\mu(A) \le \frac{2}{1 - \mu(A')}.
\]
Let $(g_i)_{i \ge 0}$ be a sequence of i.i.d. $N(0, 1)$ distributed random variables defined on a different probability space which we denote by $(\Omega, \mathcal{B}, \mathbf{P})$. Associate to $f$ the Gaussian Stein's elements
\[
\forall n \ge 1, \qquad F_{n,f} = \frac{1}{(1 + n)^{1/p}} \sum_{k=0}^{n} g_k\, f \circ E_k. \tag{5.2.9}
\]

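Step 1. Moment of order $p$ of $F_{n,f}$. First observe, by means of the Hölder inequality,
\[
\int_\Omega |F_{n,f}|^p\, d\mathbf{P} \le \Big(\int_\Omega |F_{n,f}|^2\, d\mathbf{P}\Big)^{p/2} = \Big(\frac{1}{(1 + n)^{2/p}} \sum_{i=0}^{n} f^2 \circ E_i\Big)^{p/2}.
\]
As $1 \le p \le 2$,
\[
\Big(\frac{1}{(1 + n)^{2/p}} \sum_{i=0}^{n} f^2 \circ E_i\Big)^{p/2} = \frac{1}{1 + n} \Big(\sum_{i=0}^{n} f^2 \circ E_i\Big)^{p/2} \le \frac{1}{1 + n} \sum_{i=0}^{n} |f|^p \circ E_i.
\]
Thus
\[
\int_X \int_\Omega |F_{n,f}|^p\, d\mathbf{P}\, d\mu \le \|f\|_{p,\mu}^p = 1.
\]
Hence by Fubini's theorem
\[
\int_\Omega \|F_{n,f}\|_{p,\mu}^p\, d\mathbf{P} \le 1. \tag{5.2.10}
\]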
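The only place where the restriction $1 \le p \le 2$ enters Step 1 is the pointwise inequality $\big(\sum_i a_i^2\big)^{p/2} \le \sum_i |a_i|^p$. A small numerical sketch of that inequality follows; the helper names `step1_gap` and `min_gap` are hypothetical, and the random vectors merely stand in for the values $f \circ E_i(x)$ at a fixed point $x$.

```python
import random


def step1_gap(values, p):
    """Return sum |a_i|^p - (sum a_i^2)^(p/2); nonnegative when 1 <= p <= 2."""
    return (sum(abs(a) ** p for a in values)
            - sum(a * a for a in values) ** (p / 2.0))


def min_gap(trials=200, size=8, seed=1):
    """Smallest gap over random exponents p in [1, 2] and random vectors."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        p = rng.uniform(1.0, 2.0)
        values = [rng.gauss(0.0, 1.0) for _ in range(size)]
        gaps.append(step1_gap(values, p))
    return min(gaps)


print(min_gap() >= -1e-12)          # the inequality holds up to rounding
print(step1_gap([1.0, 1.0], 1.0))   # equals 2 - sqrt(2)
```

For $p = 2$ the gap vanishes identically, which is consistent with the theorem failing to extend in the same form beyond $p = 2$.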

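Step 2. Size of $A'$. Let $x \in A'$. There thus exists an index $0 \le i \le n$ such that $x \in E_i^{-1} A$. We have consequently
\[
S^*(f)(E_i x) > \lambda. \tag{5.2.11}
\]
For some integer $m$, we will thus have $|S_m(f)(E_i x)| > \lambda$. Now by using the commutation assumption,
\[
S_m(F_{n,f}) = \frac{1}{(1 + n)^{1/p}} \sum_{k=0}^{n} g_k S_m(f \circ E_k) = \frac{1}{(1 + n)^{1/p}} \sum_{k=0}^{n} g_k S_m(f) \circ E_k = F_{n, S_m(f)}.
\]
The random variables $g_k$ being symmetric, we also have
\[
\mathbf{P}\Big\{\operatorname{sign}\Big(\sum_{\substack{k=0 \\ k \ne i}}^{n} g_k S_m(f) \circ E_k\Big) = \operatorname{sign}\big(g_i S_m(f) \circ E_i\big)\Big\} = \frac{1}{2}. \tag{5.2.12}
\]
Let $w$ be such that $\mathbf{P}\{|g_i| \ge w\} = 3/4$. We can thus assign to any element $x$ of $A'$ a measurable set $I_x$ of probability $\mathbf{P}\{I_x\} \ge 1/4$ such that
\[
\omega \in I_x \implies S^*(F_{n,f})(x, \omega) \ge \frac{\lambda w}{(1 + n)^{1/p}}. \tag{5.2.13}
\]
Let $\Omega' = \{\omega : \|F_{n,f}(\cdot, \omega)\|_{p,\mu} \le 8^{1/p}\}$. By means of (5.2.10) and the Tchebycheff inequality, $\mathbf{P}(\Omega') \ge 1 - (1/8) = \frac{7}{8}$. Thus for any $x \in A'$,
\[
\mathbf{P}\{I_x \cap \Omega'\} \ge \frac{1}{8}.
\]
Define
\[
\varphi(x, \omega) = \begin{cases} 1 & \text{if } S^*(F_{n,f})(x, \omega) > \dfrac{\lambda w}{(1 + n)^{1/p}}, \\[4pt] 0 & \text{else.} \end{cases}
\]
Let $x \in A'$; then
\[
\frac{1}{8} \le \int_{\Omega' \cap I_x} \varphi(x, \omega)\, d\mathbf{P} \le \int_{\Omega'} \varphi(x, \omega)\, d\mathbf{P}. \tag{5.2.14}
\]
By integrating this inequality on $A'$ relatively to $\mu$, next using Fubini's theorem, we obtain
\[
\begin{aligned}
\frac{\mu(A')}{8} &\le \int_{A'} \int_{\Omega'} \varphi(x, \omega)\, d\mathbf{P}(\omega)\, d\mu(x) \\
&\le \int_X \int_{\Omega'} \varphi(x, \omega)\, d\mathbf{P}(\omega)\, d\mu(x) \\
&= \int_{\Omega'} \mu\Big\{S^*(F_{n,f}) > \frac{\lambda w}{(1 + n)^{1/p}}\Big\}\, d\mathbf{P}.
\end{aligned} \tag{5.2.15}
\]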

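By virtue of the Banach principle, there exists a real $C$ such that
\[
\forall g \in L^p(\mu), \qquad \mu\{S^*(g) > C \|g\|_{p,\mu}\} \le \frac{1}{9}. \tag{5.2.16}
\]
As $\|F_{n,f}\|_{p,\mu} \le 8^{1/p}$ on $\Omega'$, we thus have on this set
\[
\mu\{S^*(F_{n,f}) > C 8^{1/p}\} \le \mu\{S^*(F_{n,f}) > C \|F_{n,f}\|_{p,\mu}\} \le \frac{1}{9}. \tag{5.2.17}
\]
Choose then $n$ such that $\frac{\lambda w}{(1 + n)^{1/p}} \ge C 8^{1/p}$, namely $1 + n \le \frac{1}{8}\big(\frac{\lambda w}{C}\big)^p$. As we have assumed $n \ge 2$, this is possible only if $\lambda w \ge 24^{1/p} C$. For this choice of $\lambda$, let
\[
n = \sup\Big\{m \ge 2 : 1 + m \le \frac{1}{8}\Big(\frac{\lambda w}{C}\Big)^p\Big\} = \Big\lfloor \frac{1}{8}\Big(\frac{\lambda w}{C}\Big)^p \Big\rfloor - 1. \tag{5.2.18}
\]
It follows from (5.2.17) that
\[
\int_{\Omega'} \mu\Big\{S^*(F_{n,f}) > \frac{\lambda w}{(1 + n)^{1/p}}\Big\}\, d\mathbf{P} \le \frac{1}{9}.
\]
We have thus shown that $\frac{1}{8}\mu(A') \le \frac{1}{9}$, that is,
\[
\mu(A') \le \frac{8}{9}.
\]
As in view of the first step, $n\mu(A) \le \frac{2}{1 - \mu(A')}$, we also deduce $n\mu(A) \le 18$. But $n \ge \frac{1}{16}\big(\frac{\lambda w}{C}\big)^p$. Said differently,
\[
\mu\{S^*(f) > \lambda\} \le \frac{18}{n} \le 300 \Big(\frac{C}{\lambda w}\Big)^p, \tag{5.2.19}
\]
if $\lambda w \ge 24^{1/p} C$. Finally, if $0 < \lambda w < 24^{1/p} C$, we have $\mu\{S^*(f) > \lambda\} \le 1 \le 24 \big(\frac{C}{\lambda w}\big)^p$. Summarizing, for any $\lambda > 0$,
\[
\mu\{S^*(f) > \lambda\} \le 300 \Big(\frac{C}{\lambda w}\Big)^p. \tag{5.2.20}
\]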
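The bookkeeping around the choice (5.2.18) of $n$ is easy to get wrong, so here is a small sketch that recomputes it. It assumes, as in the reconstruction of (5.2.18) above, that $n = \lfloor x \rfloor - 1$ with $x = \frac{1}{8}(\lambda w / C)^p$; the function name `choose_n` and the sample values of $\lambda$, $w$, $C$, $p$ are illustrative only.

```python
import math


def choose_n(lam, w, C, p):
    """Largest integer n >= 2 with 1 + n <= (1/8) * (lam * w / C) ** p.

    Returns None when lam * w < 24 ** (1 / p) * C, i.e. when no such n exists.
    """
    x = (lam * w / C) ** p / 8.0
    if x < 3.0:
        return None
    return math.floor(x) - 1


# Check the three facts used in the proof for a range of thresholds lam.
for lam in range(10, 201, 7):
    n = choose_n(lam, 0.5, 1.0, 1.5)
    x = (lam * 0.5) ** 1.5 / 8.0
    if n is not None:
        assert n >= 2 and 1 + n <= x          # admissible choice of n
        assert n >= x / 2.0                   # n >= (1/16) (lam w / C)^p
        assert 18.0 / n <= 300.0 * (1.0 / (lam * 0.5)) ** 1.5   # (5.2.19)
```

The bound $n \ge x/2$ is what converts $n\mu(A) \le 18$ into the constant 300 (in fact 288 would do).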

The proof is thus achieved.

Remarks. 1. Inequality (5.2.15) is crucial. When combined with the initial inequality, it provides the key of the proof:
\[
\frac{n\mu\{S^*(f) > \lambda\} - 2}{n\mu\{S^*(f) > \lambda\}} \le 8 \int_{\Omega'} \mu\Big\{S^*(F_{n,f}) > \frac{\lambda w}{(1 + n)^{1/p}}\Big\}\, d\mathbf{P}, \tag{5.2.21}
\]
this being verified for any $\lambda > 0$ and any integer $n \ge 2$. That inequality also indicates a possible bifurcation at this stage of the proof. Indeed, by inverting the order of integration, and letting $\lambda = \frac{M}{w} \cdot (n + 1)^{1/p}$, where $M$ is a positive real and $n \ge 2$ an integer, we have
\[
\frac{n\mu\{S^*(f) > \lambda\} - 2}{n\mu\{S^*(f) > \lambda\}} \le 8 \int_X \mathbf{P}\{S^*(F_{n,f}) > M\}\, d\mu. \tag{5.2.22}
\]

Said differently, $S^*(f)$ is controlled by means of $S^*(F_{n,f})$ for an appropriate choice of the integer $n$, which is a very striking fact.

2. If $1 < p \le 2$, then $S^*(f) \in L^1(\mu)$ whenever $f \in L^p(\mu)$. However, if we do not assume that the operators commute with a mixing family, the maximal function need not be integrable even for $f \in L^\infty$.

Wierdl's counterexample. Consider the following example given in Bellow and Jones [1994: 157]:
\[
S_n f(x) = \sqrt{n(n + 1)} \Big(\int_0^1 f(t)\, dt\Big) \chi_{]1/(n+1),\, 1/n[}(x).
\]
These operators are contractions on $L^2(\mathbb{T})$ and converge to 0 for all $x \in \mathbb{T}$. But, for any $f \in L^2(\mathbb{T})$ such that $\int_{\mathbb{T}} f(t)\, dt > 0$ we have
\[
\int_{\mathbb{T}} \sup_{n \ge 1} S_n f(x)\, dx = \int_{\mathbb{T}} f(t)\, dt\ \sum_{n=1}^{\infty} \sqrt{n(n + 1)} \int_{1/(n+1)}^{1/n} 1\, dx = \infty.
\]
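Both claims about this example can be checked by direct computation. The sketch below assumes that the normalising factor is $\sqrt{n(n+1)}$ (the value that makes each $S_n$ an $L^2$ contraction) and that $\int f = 1$; the function names are hypothetical.

```python
import math


def maximal_integral_partial_sum(N, mean_f=1.0):
    """Sum over n <= N of sqrt(n(n+1)) * mean_f * length of ]1/(n+1), 1/n[."""
    return mean_f * sum(math.sqrt(n * (n + 1)) * (1.0 / n - 1.0 / (n + 1))
                        for n in range(1, N + 1))


def l2_norm_of_Snf(n, mean_f=1.0):
    """L^2 norm of S_n f: constant height on an interval of length 1/(n(n+1))."""
    height = math.sqrt(n * (n + 1)) * mean_f
    length = 1.0 / n - 1.0 / (n + 1)
    return math.sqrt(height * height * length)


print(l2_norm_of_Snf(10))                      # close to |integral of f| = 1
print(maximal_integral_partial_sum(100))       # partial sums keep growing,
print(maximal_integral_partial_sum(10000))     # roughly like log N
```

Since the supports $]1/(n+1), 1/n[$ are disjoint, the integral of $\sup_n S_n f$ is exactly the sum of the individual integrals, each equal to $\int f / \sqrt{n(n+1)}$, and this series diverges like the harmonic series.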

3. The result is optimal. Indeed, without any additional assumption on the sequence of operators $\{S_n, n \ge 1\}$, one may give an example (Stein [1961]) showing that it is no longer true for $L^p(\mu)$ with $p > 2$.

A counterexample for $p > 2$. Let $\mathbb{T} = \mathbb{R}/\mathbb{Z} = [-\tfrac{1}{2}, \tfrac{1}{2}[$ be the torus equipped with the normalized Lebesgue measure $\lambda$. Let $e(x) = e^{2i\pi x}$ and $e_n(x) = e(nx)$, for $n \in \mathbb{Z}$. Let $h(x) = \big(|x|^{1/2} \log(1/|x|)\big)^{-1}$, $x \in \mathbb{T}$. Then $h \in L^2(\mathbb{T}) \setminus \bigcup_{q > 2} L^q(\mathbb{T})$. Let $h(x) \sim \sum_{n \in \mathbb{Z}} c_n e_n(x)$, with $\sum_{n \in \mathbb{Z}} |c_n|^2 < \infty$. If $\{\varepsilon_n, n \in \mathbb{Z}\}$ is a Rademacher sequence, then
\[
\mathbf{E} \int_{\mathbb{T}} \exp\Big(\Big|\sum_{n \in \mathbb{Z}} c_n \varepsilon_n e_n\Big|^2\Big)\, d\lambda < \infty.
\]
Thus, there is a sequence of $\pm 1$'s, again denoted by $\{\varepsilon_n, n \in \mathbb{Z}\}$, such that
\[
f = \sum_{n \in \mathbb{Z}} c_n \varepsilon_n e_n \in \bigcap_{p} L^p(\mathbb{T}).
\]
p 2. Then, it would follow that $\|Tg\|_q < \infty$ whenever $2 \le q < p$ if $g \in L^q(\mathbb{T})$. Now, take $g = f$. Then $Tf = \sum_{n \in \mathbb{Z}} c_n e_n = h$, and we obtain a contradiction since $h \notin L^q(\mathbb{T})$ if $q > 2$.

It is possible, however, to prove a partial extension to $L^p$ spaces with $p > 2$.

5.2.2 Theorem (Stein [1961]). Suppose that $\{S_n, n \ge 1\}$ is a sequence of operators, $S_n : L^p(\mu) \to L^0(\mu)$, $2 \le p < \infty$, continuous in measure and satisfying the commutation assumption (H). Then the following properties are equivalent:
\[
\forall f \in L^p(\mu), \qquad \mu\{x : S^* f(x) < \infty\} = 1, \tag{5.2.23}
\]
\[
\exists\, 0 < C < \infty : \forall f \in L^p(\mu), \qquad \sup_{\lambda \ge 0} \lambda^2 \mu\Big\{x : S^* f(x) > \lambda \Big(\int_X |f|^p\, d\mu\Big)^{1/p}\Big\} \le C. \tag{5.2.24}
\]

Sketch of proof. The proof is nearly the same as that of Theorem 5.2.1. We just indicate the modification to be incorporated into the line of arguments. Instead of (5.2.9), we associate to $f$ the random elements
\[
F_{n,f} = \frac{1}{(1 + n)^{1/2}} \sum_{i=0}^{n} g_i\, f \circ E_i, \qquad n = 1, 2, \dots. \tag{5.2.9'}
\]

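Let $f \in L^p(\mu)$ with $\|f\|_{p,\mu} > 0$. The elementary integrability properties of Gaussian random variables plus a plain convexity argument imply
\[
\int_\Omega |F_{n,f}|^p\, d\mathbf{P} \le C_p^p \Big(\int_\Omega |F_{n,f}|^2\, d\mathbf{P}\Big)^{p/2} = C_p^p \Big(\frac{1}{1 + n} \sum_{i=0}^{n} f^2 \circ E_i\Big)^{p/2} \le \frac{C_p^p}{1 + n} \sum_{i=0}^{n} |f|^p \circ E_i,
\]
where the constant $C_p$ depends on $p$ only. Thus,
\[
\int_X \int_\Omega |F_{n,f}|^p\, d\mathbf{P}\, d\mu \le C_p^p \|f\|_{p,\mu}^p.
\]
Hence, by means of Fubini's theorem,
\[
\int_\Omega \|F_{n,f}\|_{p,\mu}^p\, d\mathbf{P} \le C_p^p \|f\|_{p,\mu}^p. \tag{5.2.10'}
\]
Replace $f$ by $f' = f/(C_p \|f\|_{p,\mu})$. The rest of the proof then is as before.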

Sawyer [1966] has observed, nevertheless, that Theorem 5.2.1 remains valid in any $L^p(\mu)$ with $1 \le p < \infty$ for positive operators: for any $n \ge 1$ and any $f \in L^p(\mu)$,
\[
\mu\{f \ge 0\} = 1 \implies \mu\{S_n(f) \ge 0\} = 1. \tag{5.2.25}
\]
This is the object of the following theorem. Before considering this, it is worthwhile remarking that, by assumption, these operators are continuous on $L^\infty(\mu)$. A first consequence of the commutation assumption together with positivity is that for any $f \in L^\infty(\mu)$,
\[
|S_n(f)| \le S_n(|f|) \le \|f\|_\infty S_n(1). \tag{5.2.26}
\]
There thus exists a positive real $A_n$ such that
\[
\forall f \in L^\infty(\mu), \qquad \|S_n(f)\|_\infty \le A_n \|f\|_\infty,
\]
hence the continuity. The result can be stated as follows.

5.2.3 Theorem. Let $1 \le p < \infty$ and let $\{S_n, n \ge 1\}$ be a sequence of positive operators, $S_n : L^p(\mu) \to L^0(\mu)$, continuous in measure. We further assume that the sequence $\{S_n, n \ge 1\}$ satisfies the commutation assumption (H). Then the following properties are equivalent:
\[
\forall f \in L^p(\mu), \qquad \mu\{x : S^* f(x) < \infty\} = 1, \tag{5.2.27}
\]
\[
\exists\, 0 < C < \infty : \forall f \in L^p(\mu), \qquad \sup_{\lambda \ge 0} \lambda^p \mu\Big\{x : S^* f(x) > \lambda \Big(\int_X |f|^p\, d\mu\Big)^{1/p}\Big\} \le C. \tag{5.2.28}
\]

Proof. Here again a direct proof is accessible. This result will be easier to prove than Theorem 5.2.1. We denote by $E_0$ the identity of $X$. Let $f \in L^p(\mu)$ be such that $\|f\|_{p,\mu} = 1$ and $\mu\{x : f(x) \ge 0\} = 1$. Let $\lambda > 0$ be fixed. We associate to them the set $A = \{x : S^* f(x) > \lambda\}$. Let $n \ge 2$ be some integer which will be defined later on with respect to $\lambda$. By virtue of property (5.2.5), there exist $E_1, E_2, \dots, E_n \in \mathcal{E}$ such that if $A' = A \cup \bigcup_{i=1}^{n} E_i^{-1} A$, then
\[
n\mu(A) \le \frac{2}{1 - \mu(A')}. \tag{5.2.29}
\]
Introduce the auxiliary element $F = \frac{1}{(1 + n)^{1/p}} \max_{0 \le k \le n} |f \circ E_k|$. Then
\[
\|F\|_{p,\mu}^p \le \frac{1}{n + 1} \sum_{k=0}^{n} \int_X |f \circ E_k|^p\, d\mu = 1.
\]
The positivity assumption on the operators $S_n$ moreover implies that
\[
\forall m \ge 1,\ \forall k = 0, \dots, n, \qquad S_m(F) \ge \frac{1}{(n + 1)^{1/p}} S_m(f \circ E_k) = \frac{1}{(n + 1)^{1/p}} (S_m f) \circ E_k.
\]

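Consequently for $k = 0, \dots, n$,
\[
S^* F \ge \frac{1}{(n + 1)^{1/p}} (S^* f) \circ E_k.
\]
It follows that
\[
\mu\Big\{S^* F > \frac{\lambda}{(n + 1)^{1/p}}\Big\} \ge \mu\{\exists k \in [0, n] : (S^* f) \circ E_k > \lambda\} = \mu(A'). \tag{5.2.30}
\]
By virtue of the Banach principle, there exists a real $C > 0$ such that for any $g \in L^p(\mu)$, $\mu\{S^* g > C \|g\|_{p,\mu}\} \le 1/3$. We assume that $\lambda \ge 3^{1/p} C$. As $\|F\|_{p,\mu} \le 1$, if $n = \big[\big(\frac{\lambda}{C}\big)^p - 1\big]$ we have by (5.2.30),
\[
\mu(A') \le \mu\Big\{S^* F > \frac{\lambda}{(n + 1)^{1/p}}\Big\} \le \mu\{S^* F > C \|F\|_{p,\mu}\} \le \frac{1}{3}.
\]
But $n\mu(A) \le \frac{2}{1 - \mu(A')}$; we have thus obtained that $n\mu(A) \le 3$. Said differently, since $n \ge \big(\frac{\lambda}{C}\big)^p / 3$, for any $\lambda \ge 3^{1/p} C$,
\[
\mu\{S^*(f) > \lambda\} \le \frac{3}{n} \le 9 \Big(\frac{C}{\lambda}\Big)^p. \tag{5.2.31}
\]
Finally, if $\lambda \le 3^{1/p} C$, then $1 \le 3 \big(\frac{C}{\lambda}\big)^p$, so that (5.2.31) is trivially realized in this case. This thus achieves the proof.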

Remarks. When $p = \infty$ and $\{S_n, n \ge 1\}$ is a sequence of positive operators from $L^\infty(\mu)$ to $L^0(\mu)$, continuous in measure and satisfying the commutation assumption (H), we have already observed in the remarks preceding Theorem 5.2.3 that these operators are continuous on $L^\infty(\mu)$. Besides, if
\[
\forall f \in L^\infty(\mu), \qquad \mu\{x : S^* f(x) < \infty\} = 1, \tag{5.2.32}
\]
then there exists a positive real $A$ such that for any $f \in L^\infty(\mu)$, $\|S^*(f)\|_\infty \le A \|f\|_\infty$. So that we also have in a trivial way a continuity principle in $L^\infty(\mu)$. It is easy to see (cf. for instance Graversen–Peškir–Weber [1995: Theorem 3.1]) that this principle also extends to exponential type Orlicz spaces.

Now, we shall obtain a variant of Theorem 5.2.1, for the case $p = 1$, which is particularly useful in applications. Consider a commutative compact group $M$ and denote by "+" the group operation. Let $m$ be the unique invariant measure, the Haar measure, on $M$ with associated $L^p(M)$ spaces. $C(M)$ will designate the space of continuous functions on $M$, with the supremum norm, and $B(M)$ will designate the

space of finite Borel measures on $M$ with the usual norm. Let $\{S_n, n \ge 1\}$ be a sequence of operators. We assume:
(a) Each $S_n$ is a bounded operator from $L^1(M)$ to $C(M)$.
(b) Each $S_n$ commutes with translations.
By Riesz's representation of bounded linear functionals on $L^1(M)$, it may be proved that the conditions (a) and (b) are equivalent with
(c) $S_n f(x) = \int_M K_n(x - y) f(y)\, m(dy)$, where $K_n \in L^\infty(M)$.

Such an operator has a natural extension to a bounded operator from $B(M)$ to $L^\infty(M)$, which we again denote by $S_n$. Notice that this extension still commutes with translations. Similarly, we also write $S^* \mu = \sup_{n \in \mathbb{N}} |S_n \mu|$. Then we have the following result.

5.2.4 Theorem. Under the above described assumptions, the following assertions are equivalent:
\[
\forall f \in L^1(M), \qquad m\{x : S^* f(x) < \infty\} = 1, \tag{5.2.33}
\]
\[
\exists\, 0 < C < \infty : \forall \mu \in B(M), \qquad \sup_{\lambda \ge 0} \lambda\, m\Big\{x : S^* \mu(x) > \lambda \int_M |d\mu|\Big\} \le C. \tag{5.2.34}
\]
Before giving the proof, we need a lemma.

5.2.5 Lemma. Let $T_1, \dots, T_N$ be operators that each satisfy the conditions (a) and (b) above. Let $\mu \in B(M)$. Then there exists a sequence $f_1, f_2, \dots$ of elements of $L^1(M)$ such that $\|f_k\|_1 \le \|\mu\|$ and
\[
\lim_{k \to \infty} T_n f_k = T_n \mu, \qquad n = 1, \dots, N.
\]

Proof. Let $\varphi_1, \varphi_2, \dots$ be continuous nonnegative functions such that $\int_M \varphi_k\, dm = 1$ and forming an approximation of the identity in the usual sense. Put $f_k = \varphi_k * \mu$. Then $\|f_k\|_1 \le \|\mu\|$. By (c), we may represent each $T_n$ as $T_n f = K_n * f$ for some function $K_n \in L^\infty(M)$. Thus
\[
T_n f_k = K_n * (\varphi_k * \mu) = \varphi_k * (K_n * \mu) = \varphi_k * (T_n \mu).
\]
Now, owing to the well-known fact that $\|\varphi_k * (T_n \mu) - T_n \mu\|_1$ tends to 0 as $k$ tends to infinity, we deduce the claimed result by extracting if necessary a subsequence of the sequence $\{\varphi_k, k \ge 1\}$.

Proof of Theorem 5.2.4. In view of Theorem 5.2.1, there exists a constant $C$ such that for any $f \in L^1(M)$ and $\alpha \ge 0$,
\[
\alpha\, m\Big\{\sup_{1 \le n \le N} |S_n f| > \alpha\Big\} \le C \|f\|_1.
\]

Apply this to the function $f = f_k$, where the $f_k$ are given in the above lemma, and let $k$ tend to infinity. It follows that
\[
\alpha\, m\Big\{\sup_{1 \le n \le N} |S_n \mu| > \alpha\Big\} \le C \int_M |d\mu|.
\]
Letting now $N$ tend to infinity achieves the proof.
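The two facts used in the proof of Lemma 5.2.5, namely $\|\varphi * \mu\|_1 \le \|\mu\|$ and $K * (\varphi * \mu) = \varphi * (K * \mu)$, can be illustrated on the simplest compact commutative group. The sketch below takes $M = \mathbb{Z}/N\mathbb{Z}$ with normalised counting measure as Haar measure; this finite model, the particular kernel and measure, and all function names are assumptions made for illustration only.

```python
import math


def convolve_measure(kernel, masses):
    """(kernel * mu)(i) = sum_j kernel(i - j) mu({j}) on the cyclic group Z_N."""
    n = len(kernel)
    return [sum(kernel[(i - j) % n] * masses[j] for j in range(n))
            for i in range(n)]


def convolve_function(kernel, func):
    """(kernel * f)(i) = integral of kernel(i - j) f(j) dm(j), m normalised."""
    n = len(kernel)
    return [sum(kernel[(i - j) % n] * func[j] for j in range(n)) / n
            for i in range(n)]


def l1_norm(func):
    """L^1(m) norm for the normalised counting measure m on Z_N."""
    return sum(abs(v) for v in func) / len(func)


def demo(n=12):
    masses = [0.0] * n
    masses[0], masses[5] = 1.0, -0.5       # mu = delta_0 - 0.5 delta_5
    phi = [0.0] * n
    for i in (0, 1, n - 1):
        phi[i] = n / 3.0                   # nonnegative, integral against m is 1
    K = [math.cos(2 * math.pi * i / n) for i in range(n)]   # a bounded kernel
    f = convolve_measure(phi, masses)                        # f = phi * mu
    lhs = convolve_function(K, f)                            # K * (phi * mu)
    rhs = convolve_function(phi, convolve_measure(K, masses))  # phi * (K * mu)
    gap = max(abs(a - b) for a, b in zip(lhs, rhs))
    return l1_norm(f), sum(abs(v) for v in masses), gap


norm_f, norm_mu, gap = demo()
print(norm_f <= norm_mu + 1e-12, gap < 1e-12)
```

On a finite group every measure has a density, so the example only illustrates the algebra of the argument, not the approximation step $\varphi_k * (T_n \mu) \to T_n \mu$.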

5.3 Applications

The continuity principle can be used to prove results of negative nature, but also of positive nature. In his fundamental paper, Stein gave several examples of applications. We study some of them.

1. Conjugate functions. For Fourier series of functions $f \sim \sum_{n \in \mathbb{Z}} a_n e_n$, the so-called conjugate function is defined by
\[
\tilde{f} \sim \sum_{n \in \mathbb{Z}} -i\, \operatorname{sign}(n)\, a_n e_n. \tag{5.3.1}
\]
The linear operator which maps $f$ to $\tilde{f}$ satisfies $\|\tilde{f}\|_2 \le \|f\|_2$, and more generally for $1 < p < \infty$,
\[
\|\tilde{f}\|_p \le C_p \|f\|_p. \tag{5.3.2}
\]
This inequality is due to M. Riesz. For $p = 1$, this inequality fails, and the appropriate substitute result in that case is a theorem due to Kolmogorov, which asserts that
\[
\sup_{t \ge 0}\, t\lambda\{x \in \mathbb{T} : |\tilde{f}(x)| > t\} \le C \|f\|_1. \tag{5.3.3}
\]
It can be observed that this result together with the elementary inequality for $p = 2$ already implies, by the Marcinkiewicz interpolation theorem, the Riesz inequality. Among the various proofs of inequality (5.3.3), the original proof of Kolmogorov is of special interest. He considered
\[
\tilde{f}_r = \sum_{n \in \mathbb{Z}} -i\, \operatorname{sign}(n)\, r^{|n|} a_n e_n.
\]
By a known result, for every $f \in L^1(\lambda)$, $\lim_{r \to 1} \tilde{f}_r$ exists almost surely. Kolmogorov proved that the limit operator satisfies inequality (5.3.3). But the mapping $f \mapsto \tilde{f}_r$ commutes with translations, and so this directly follows from the continuity principle enunciated in Theorem 5.2.1.
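The multiplier description (5.3.1) of the conjugate function can be tried out on sampled data. The following sketch uses a plain discrete Fourier transform on $N$ equispaced samples as a stand-in for the Fourier series; this discretisation and the function name `conjugate_function` are assumptions for illustration, not the construction used in the text.

```python
import cmath
import math


def conjugate_function(samples):
    """Multiply the n-th discrete Fourier coefficient by -i * sign(n).

    samples: real values f(j / N), j = 0..N-1, with N even. Frequencies
    -N/2 < n < N/2 are used; the Nyquist frequency is dropped.
    """
    N = len(samples)
    freqs = range(-(N // 2) + 1, N // 2)
    coeffs = {n: sum(samples[j] * cmath.exp(-2j * math.pi * n * j / N)
                     for j in range(N)) / N
              for n in freqs}
    result = []
    for j in range(N):
        total = 0j
        for n, a in coeffs.items():
            sign = (n > 0) - (n < 0)
            total += -1j * sign * a * cmath.exp(2j * math.pi * n * j / N)
        result.append(total.real)   # imaginary part vanishes for real input
    return result


N = 32
cosine = [math.cos(2 * math.pi * j / N) for j in range(N)]
conj = conjugate_function(cosine)
print(max(abs(c - math.sin(2 * math.pi * j / N)) for j, c in enumerate(conj)))
```

With $e_n(x) = e^{2i\pi nx}$, the conjugate of $\cos(2\pi x)$ is $\sin(2\pi x)$, and the $L^2$ bound $\|\tilde f\|_2 \le \|f\|_2$ is visible from the fact that every multiplier has modulus at most 1.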

2. Lebesgue differentiation theorem. Consider the family of operators
\[
T_h f(x) = \frac{1}{h} \int_0^h f(x + t)\, dt, \qquad h > 0. \tag{5.3.4}
\]
According to the classical theorem of Lebesgue, if $f$ is integrable, then for almost every $x$,
\[
\lim_{h \to 0} T_h f(x) = f(x). \tag{5.3.5}
\]
Much later, Hardy and Littlewood introduced their maximal function
\[
f^*(x) = \sup_{h > 0} |T_h f(x)|, \tag{5.3.6}
\]
and proved for $p > 1$ the inequality
\[
\|f^*\|_p \le C_p \|f\|_p. \tag{5.3.7}
\]
F. Riesz observed that the inequality
\[
\sup_{t \ge 0}\, t\lambda\{x \in \mathbb{T} : |f^*(x)| > t\} \le C \|f\|_1 \tag{5.3.8}
\]
is implicit in their proof. Note that the operators $T_h$ commute with translations. Thus, in view of Lebesgue's theorem, that inequality follows from the continuity principle.

3. Differentiation of functions of two variables. Let $f \in L^1(\mathbb{T}^2)$, with Fourier expansion
\[
f(x, y) \sim \sum_{(n,m) \in \mathbb{Z}^2} a_{n,m}\, e_n(x) e_m(y).
\]
One may formally define the double conjugate series
\[
\tilde{f} = - \sum_{(n,m) \in \mathbb{Z}^2} \operatorname{sign}(n) \operatorname{sign}(m)\, a_{n,m}\, e_n e_m \tag{5.3.9}
\]
and ask whether this double conjugate series exists in a suitable sense. Similarly to the approach of Kolmogorov, one can consider the Abel sums of the above series,
\[
\tilde{f}_{r,\rho} = - \sum_{(n,m) \in \mathbb{Z}^2} \operatorname{sign}(n) \operatorname{sign}(m)\, r^{|n|} \rho^{|m|} a_{n,m}\, e_n e_m, \tag{5.3.10}
\]
and inquire about the existence of the limit $\lim_{r \to 1,\, \rho \to 1} \tilde{f}_{r,\rho}$. For $f \in L^p(\mathbb{T}^2)$, $p > 1$, it is known that this limit exists almost everywhere. In fact $f \in L \log L(\mathbb{T}^2)$ suffices. There is an analogy between double conjugate series and the differentiation of double integrals. Indeed, if $f \in L \log L(\mathbb{T}^2)$, it is known that
\[
f(x, y) = \lim_{\substack{h \to 0 \\ \theta \to 0}} \frac{1}{h\theta} \int_0^h \int_0^\theta f(u + x, v + y)\, du\, dv, \tag{5.3.11}
\]

for almost all $x$ and $y$. However, if one merely assumes that $f \in L^1(\mathbb{T}^2)$, then the above limit may fail to exist almost everywhere. But, if for instance we let $h = \theta$, referring to Saks [1937] the limit (5.3.11) exists almost everywhere. In analogy with this, it was believed that the limit $\lim_{r \to 1} \tilde{f}_{r,r}$ for the double conjugate series exists for almost all $x$ and $y$. Surprisingly enough, by a result of Stein [1961], the answer turns out to be negative. Here is the argument. As is well known,
\[
\tilde{f}_{r,\rho}(x, y) = \int_{\mathbb{T}} \int_{\mathbb{T}} Q(r, x - u)\, Q(\rho, y - v)\, f(u, v)\, du\, dv,
\]
where
\[
Q(r, v) = \frac{r \sin 2\pi v}{1 - 2r \cos 2\pi v + r^2}.
\]
Let $r_m = 1 - 1/m$. We shall prove that there exists an $f \in L^1(\mathbb{T}^2)$ such that the limit $L(f) = \lim_{m \to \infty} \tilde{f}_{r_m, r_m}$ exists only in a set of measure 0. Assume the contrary. As the mappings $f \mapsto \tilde{f}_{r_m, r_m}$ satisfy conditions (a) and (b), the conclusion of Theorem 5.2.4 holds. Apply it for $\mu$ equal to the Dirac measure at the origin. Then,
\[
\sup_{m \ge 1} |\tilde{f}_{r_m, r_m}(x, y)| = \sup_{m \ge 1} |Q(r_m, x) Q(r_m, y)| \ge |Q(1, x) Q(1, y)| = \frac{1}{4} |(\cot \pi x)(\cot \pi y)| \ge \frac{A}{|xy|}.
\]
The measure of the set $\{(x, y) \in \mathbb{T}^2 : \frac{A}{|xy|} > t\}$ is of order $B(\log t)/t$, thereby not of order $B/t$ as it should be by the conclusion of Theorem 5.2.4. Hence a contradiction, and this proves the result.

4. Divergent Fourier series. A deep theorem of Kolmogorov asserts the existence of an integrable function $f$ whose Fourier series diverges almost everywhere. The proof of this result is extremely difficult. It is possible, however, by means of the continuity principle to obtain a simplification and a refinement of this result. Let $S_n(f)$ designate the partial sum of order $n$ of the Fourier series of $f$, and more generally $S_n(\mu)$ the partial sum of order $n$ of the Fourier–Stieltjes expansion of a Borel measure $\mu$. Recall the following fact: if $f$ is integrable, then
\[
S_n f(x) - S_m f(x) = O(\log |m - n|), \qquad m, n \to \infty,
\]
almost everywhere. The refinement of Kolmogorov's theorem is the following: let $\varphi(n)$ be any function tending to zero as $n$ tends to infinity. Then, there exists an integrable function such that the more restrictive property
\[
S_n(f)(x) - S_m(f)(x) = O\big(\varphi(|m - n|) \log |m - n|\big) \tag{5.3.12}
\]

is false on a set of positive measure. This result has been proved in Stein [1961]. For, consider the family of operators
\[
\Delta_{(m,n)} f = \frac{S_n(f) - S_m(f)}{\varphi(|m - n|) \log |m - n|}. \tag{5.3.13}
\]

These operators satisfy conditions (a) and (b) of Theorem 5.2.4. We shall prove a lemma.

5.3.1 Lemma. There exists an absolute constant $C$ such that for any integer $k$, there exists a measure $\mu$ on $\mathbb{T}$ with $\int_{\mathbb{T}} |d\mu| = 1$ and
\[
\sup_{n,m : |n - m| = k} |S_n(\mu) - S_m(\mu)| \ge C \log k \quad \text{almost surely.}
\]

Proof. Let $x_1, \dots, x_N$ be some points of $\mathbb{T}$ to be specified later, and set
\[
\mu = \frac{1}{N} \sum_{i=1}^{N} \delta_{x_i},
\]
where $\delta_x$ denotes the Dirac measure at point $x$. Then $\int_{\mathbb{T}} |d\mu| = 1$. Plainly,
\[
S_n(\mu)(x) - S_m(\mu)(x) = \frac{2}{N} \sum_{i=1}^{N} \frac{\cos \pi(n + m + 1)(x - x_i)\, \sin \pi(n - m)(x - x_i)}{\sin \pi(x - x_i)}.
\]
Write $k = n - m$, $\ell = n + m + 1$. Assume that $k$ is odd. Then $\ell$ must be even, but this is the only restriction on $\ell$. We choose the $x_i$ to be linearly independent over $\mathbb{Q}$, and such that they are very close to $i/N$. It is easily seen then that, for almost every $x$, the $x - x_i$ are linearly independent over $\mathbb{Q}$. Choosing $\ell$ large enough, depending on $x$, we have
\[
\sup_{n,m : |n - m| = k} |S_n(\mu)(x) - S_m(\mu)(x)| = \frac{2}{N} \sum_{i=1}^{N} \frac{|\sin \pi k(x - x_i)|}{|\sin \pi(x - x_i)|}.
\]
Now the facts that the $x_i$ are very close to $i/N$ and $N$ is large enough show that the sum on the right is close to its integral counterpart, and so exceeds half of its value. Therefore,
\[
\sup_{n,m : |n - m| = k} |S_n(\mu)(x) - S_m(\mu)(x)| \ge \frac{1}{2} \int_{\mathbb{T}} \frac{|\sin \pi k(x - y)|}{|\sin \pi(x - y)|}\, dy \ge C \log k,
\]
as required.

Returning to the studied property, we now can argue as follows: if for any $f \in L^1(\mathbb{T})$ property (5.3.12) was true with positive probability, then the operators $\Delta_{(m,n)} f$ would satisfy the condition (5.2.33) of Theorem 5.2.4. Consequently, the maximal operator
\[
\mu \mapsto \Delta^*(\mu) := \sup_{n,m} \frac{|S_n(\mu)(x) - S_m(\mu)(x)|}{\varphi(|m - n|) \log |m - n|}
\]
would satisfy the conclusion of this theorem, which is given by (5.2.34). But this is now impossible in view of the lemma. Indeed by (5.2.34), we would have the existence of a constant $C_0$ such that for any $\mu \in B(M)$ with $\int_{\mathbb{T}} |d\mu| = 1$, and any $t \ge 0$,
\[
t\lambda\{x : \Delta^* \mu(x) > t\} \le C_0.
\]

Let $k$ be a positive integer, chosen sufficiently large to ensure that $\log k > (2C_0)/C$, where $C$ is the constant of Lemma 5.3.1. Apply this with $t = C(\log k)/2$; then,

$$\lambda\Big\{x : {}^{*}\mu(x) > \frac{C}{2} \log k\Big\} \le \frac{2C_0}{C \log k} < 1.$$

But by Lemma 5.3.1, there exists $\mu \in B(M)$ with $\int_{\mathbb{T}} |d\mu| = 1$ such that ${}^{*}\mu \ge C \log k$ almost surely. This provides a contradiction. Therefore, the operators cannot satisfy condition (5.2.33). This shows the existence of an integrable function such that property (5.3.12) is false for almost every $x$.
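The logarithmic lower bound for the kernel integral in the proof of Lemma 5.3.1 can be observed numerically. The following Python sketch is only an illustration: the function name `kernel_integral` and the midpoint quadrature are assumptions of this example, not part of the text. It approximates the integral of $|\sin \pi k t| / |\sin \pi t|$ over one period and prints its ratio to $\log k$.

```python
import math


def kernel_integral(k, points=20000):
    """Midpoint-rule approximation of the integral over [0, 1) of
    |sin(pi*k*t)| / |sin(pi*t)| dt (hypothetical helper for illustration)."""
    h = 1.0 / points
    total = 0.0
    for j in range(points):
        t = (j + 0.5) * h  # midpoints never coincide with t = 0 or t = 1
        total += abs(math.sin(math.pi * k * t)) / abs(math.sin(math.pi * t))
    return total * h


if __name__ == "__main__":
    for k in (3, 9, 27, 81):
        value = kernel_integral(k)
        # The ratio to log k stays bounded away from zero as k grows.
        print(k, round(value, 4), round(value / math.log(k), 4))
```

For $k = 3$ the integral has the closed form $1/3 + 2\sqrt{3}/\pi$, which gives a convenient sanity check of the quadrature.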

5. Multiplier operators. In this example, we are concerned with the “multiplier problem” for Fourier series in one variable, which is that of characterizing the sequences $\lambda_n$ of multipliers for which the transformation $T$ defined for $f \sim \sum_{n\in\mathbb{Z}} a_n e_n$ by

$$Tf \sim \sum_{n\in\mathbb{Z}} a_n \lambda_n e_n \tag{5.3.14}$$

is a bounded operator on $L^p(\mathbb{T})$ into itself. If for any $f \in L^p(\mathbb{T})$, $f \sim \sum_{n\in\mathbb{Z}} a_n e_n$, the series in (5.3.14) is the Fourier series of a function in $L^p(\mathbb{T})$, and the operator $T$ is bounded from $L^p(\mathbb{T})$ to $L^p(\mathbb{T})$, we say that $\lambda_n$ is of type $(L^p, L^p)$. Characterizing the sequences $\lambda_n$ of type $(L^p, L^p)$ is naturally an important problem. There is no loss of generality in assuming that $\lambda_0 = 0$ and that the sequence $\lambda_n$ is bounded. Introduce the generating function $K$ given by

$$K(x) = \sum_{n\in\mathbb{Z}^*} \frac{\lambda_n}{in}\, e_n(x). \tag{5.3.15}$$

An important, although basic, fact (see [Zygmund: 1959], p. 157) about multipliers is this: if a sequence $\lambda_n$ is of type $(L^q, L^q)$ for some $q \in [2, \infty]$, then it is also of type $(L^p, L^p)$ for each $p \in [q', q]$, where $q'$ is the index conjugate to $q$: $1/q + 1/q' = 1$. There is a corresponding result for the case $q \in [1, 2]$.

Let $q \in [2, \infty[$ and consider the classes $V_q$ introduced in Kaczmarz [1933]. A function $K$ belongs to $V_q$ if and only if $K \in L^q(\mathbb{T})$ and

$$\sup \Big\| \sum_k \big[ K(b_k - \cdot) - K(a_k - \cdot) \big] \Big\|_q < \infty, \tag{5.3.16}$$

where the summation is taken over any finite collection of non-overlapping intervals of $\mathbb{T}$ with endpoints $a_k$, $b_k$, and the “sup” is taken over all such collections of intervals. The class $V_\infty$ may be defined to be the class of functions of bounded variation: $V_\infty = BV(\mathbb{T})$. Obviously $V_\infty \subset V_q \subset V_2$ if $q \in [2, \infty]$. The following fact is well known.

5.3.2 Lemma. A necessary and sufficient condition that the multiplier sequence $\lambda_n$ is of type $(L^\infty, L^\infty)$, and thereby of type $(L^r, L^r)$ for all $r \in [1, \infty]$, is that the generating function $K$ defined in (5.3.15) belongs to $V_\infty$.


Let us continue with another simple lemma.

5.3.3 Lemma. A necessary and sufficient condition that the multiplier sequence $\lambda_n$ is of type $(L^2, L^2)$, namely that the $\lambda_n$ are uniformly bounded, is that the generating function $K$ belongs to $V_2$.

Proof. Assume first that $|\lambda_n| \le M$ for all $n$. Then,

$$\sum_k \big[ K(b_k - x) - K(a_k - x) \big] \sim \sum_{n\in\mathbb{Z}^*} \frac{\lambda_n}{in} \sum_k \big[ e_n(b_k) - e_n(a_k) \big]\, e_n(x).$$

In what follows, we write $K_0(x) = \sum_{n\in\mathbb{Z}^*} e_n(x)/(in)$. By the Parseval relation,

$$\Big\| \sum_k \big[ K(b_k - \cdot) - K(a_k - \cdot) \big] \Big\|_2^2 = \sum_{n\in\mathbb{Z}^*} \frac{|\lambda_n|^2}{n^2} \Big| \sum_k \big[ e_n(b_k) - e_n(a_k) \big] \Big|^2 \le M^2 \sum_{n\in\mathbb{Z}^*} \frac{1}{n^2} \Big| \sum_k \big[ e_n(b_k) - e_n(a_k) \big] \Big|^2$$

$$= M^2 \Big\| \sum_k \big[ K_0(b_k - \cdot) - K_0(a_k - \cdot) \big] \Big\|_2^2 \le M^2 \Big\| \sum_k \big[ K_0(b_k - \cdot) - K_0(a_k - \cdot) \big] \Big\|_\infty^2 \le M_0 < \infty,$$

since $K_0$ is of bounded variation. Thus $K \in V_2$.

Conversely, assume $K \in V_2$ and let $f = \sum a_n e_n(x)$ be any trigonometric polynomial. Let

$$F(x) = K * f(x) = \int_{\mathbb{T}} K(x-y) f(y)\,dy = \sum_{n\in\mathbb{Z}^*} \frac{\lambda_n}{in}\, a_n e_n(x). \tag{5.3.17}$$

Then,

$$\Big| \sum_k \big[ F(b_k) - F(a_k) \big] \Big| \le \sup \Big\| \sum_k \big[ K(b_k - \cdot) - K(a_k - \cdot) \big] \Big\|_2 \le M < \infty$$

by the Cauchy–Schwarz inequality, if $\|f\|_2 \le 1$. Consequently $F$ is of bounded variation, with total variation less than $2M$. Therefore,

$$\int_{\mathbb{T}} |F'(x)|\,dx \le 2M.$$

But if $f = e_n$, then $F = [\lambda_n/(in)]\,e_n$, and thereby $|\lambda_n| \le 2M$. This completes the proof.

Now we shall use the continuity principle to prove the following nearly optimal result.

5.3.4 Theorem. Let $q \in\, ]2, \infty[$.
(i) Assume the multiplier operator defined in (5.3.14) to be of type $(L^r, L^r)$ for all $r \in [q', q]$. Then the generating function $K$ defined in (5.3.15) belongs to $V_q$.
(ii) Conversely, suppose that $K$ belongs to $V_q$. Then the multiplier operator is of type $(L^r, L^r)$ for all $r \in\, ]q', q[$.


Proof. We first prove part (i), which is relatively easy. Let $p = q'$ so that $1/p + 1/q = 1$, and consider again $F = K * f$ as in (5.3.17) for $f \sim \sum a_n e_n \in L^p(\mathbb{T})$. Then, with the notation (5.3.14),

$$F(x) = \int_0^x (Tf)(t)\,dt.$$

By assumption the operator $T$ satisfies $Tf \in L^p(\mathbb{T})$ if $f \in L^p(\mathbb{T})$. And

$$\Big| \sum_k \big[ F(b_k) - F(a_k) \big] \Big| \le \|Tf\|_1 \le \|Tf\|_p \le \|T\|_p, \quad \text{if } \|f\|_p \le 1,$$

where $\|T\|_p$ is the operator norm of $T$ acting on $L^p(\mathbb{T})$. Thus,

$$\Big| \int_{\mathbb{T}} \sum_k \big[ K(b_k - x) - K(a_k - x) \big] f(x)\,dx \Big| \le \|T\|_p,$$

whenever $\|f\|_p \le 1$. Therefore,

$$\Big\| \sum_k \big[ K(b_k - \cdot) - K(a_k - \cdot) \big] \Big\|_q \le \|T\|_p,$$

hence $K \in V_q$.

Now, we prove part (ii). Consider the operator on $L^p(\mathbb{T})$ defined by $D_m f = F_m$ where

$$F_m = m \big[ F(\cdot + 1/m) - F(\cdot) \big] = m \big[ K(\cdot + 1/m) - K(\cdot) \big] * f. \tag{5.3.18}$$

By assumption $K \in L^q(\mathbb{T})$, thus the operator $D_m$ is bounded from $L^p(\mathbb{T})$ to itself, for each $m$. Moreover $D_m$ commutes with translations. Observe, as in the proof of Lemma 5.3.3, that $F$ is of bounded variation when $f \in L^p(\mathbb{T})$. Indeed

$$\Big| \sum_k \big[ F(b_k) - F(a_k) \big] \Big| \le \Big| \int_{\mathbb{T}} \sum_k \big[ K(b_k - x) - K(a_k - x) \big] f(x)\,dx \Big| \le \Big\| \sum_k \big[ K(b_k - \cdot) - K(a_k - \cdot) \big] \Big\|_q \|f\|_p \le \|T\|_p < \infty.$$

Thus the limit

$$\lim_{m\to\infty} D_m(f)(x) = \lim_{m\to\infty} m \big[ F(x + 1/m) - F(x) \big] = F'(x)$$

exists for almost every $x$, whenever $f \in L^p(\mathbb{T})$. By the continuity principle (Theorem 5.2.1), the mapping $D : f \to F'$ is of weak type $(p, p)$. But $K \in V_q \subset V_2$, and so by Lemma 5.3.3 the mapping $S : f \to F'$ is of type $(L^2, L^2)$. By the Marcinkiewicz interpolation theorem, it follows that $S$ is also of type $(L^r, L^r)$ for $r \in\, ]p, 2]$. But the mapping $S$ coincides with the multiplier operator $T$ on trigonometric polynomials, and thereby, by continuity, on $L^r(\mathbb{T})$. Invoking then a classical duality argument, we deduce that $T$ is of type $(L^r, L^r)$ for $r \in [2, q[$. The proof is now complete.
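To make the action of a multiplier concrete, here is a small Python sketch acting on trigonometric polynomials stored as coefficient dictionaries. All names (`apply_multiplier`, `l2_norm`, `evaluate`, `conjugate_multiplier`) and the particular multiplier $\lambda_n = -i\,\mathrm{sign}(n)$ are assumptions chosen for illustration; they do not come from the text. The sketch applies (5.3.14) coefficientwise and uses the Parseval relation to compare $L^2$ norms, which is the easy direction of Lemma 5.3.3.

```python
import cmath
import math


def apply_multiplier(coeffs, lam):
    """Fourier coefficients of Tf, given those of f as {n: a_n}
    and a multiplier n -> lambda_n, as in (5.3.14)."""
    return {n: a * lam(n) for n, a in coeffs.items()}


def l2_norm(coeffs):
    """L2(T) norm of a trigonometric polynomial via the Parseval relation."""
    return math.sqrt(sum(abs(a) ** 2 for a in coeffs.values()))


def evaluate(coeffs, x):
    """Value at x of sum a_n e_n(x), with e_n(x) = exp(2*pi*i*n*x)."""
    return sum(a * cmath.exp(2j * math.pi * n * x) for n, a in coeffs.items())


def conjugate_multiplier(n):
    """Hypothetical bounded multiplier: lambda_n = -i*sign(n), lambda_0 = 0."""
    if n == 0:
        return 0
    return -1j if n > 0 else 1j


if __name__ == "__main__":
    f = {1: 0.5, -1: 0.5}  # f(x) = cos(2*pi*x)
    tf = apply_multiplier(f, conjugate_multiplier)  # Tf(x) = sin(2*pi*x)
    print(evaluate(tf, 0.25), l2_norm(tf), l2_norm(f))
```

With $|\lambda_n| \le 1$, the computed norm of $Tf$ never exceeds that of $f$, in line with the bound $\|Tf\|_2 \le \sup_n |\lambda_n| \, \|f\|_2$.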


6. Hardy spaces. In this example, we discuss an application of the continuity principle to some nonlinear operators occurring in analysis. Let $H^1$ denote the closed subspace of $L^1(\mathbb{T})$ consisting of functions of power series type:

$$\int_{\mathbb{T}} f(t)\, e_n(-t)\,dt = 0 \quad (\forall\, n < 0). \tag{5.3.19}$$

Note that $H^1$ is invariant under the translation action. For any $f \in L^1(\mathbb{T})$, let $S_n f$ and $\sigma_n f$ be respectively the partial sum and the Cesàro mean of order $n$ of the Fourier series of $f$. Define

$$g^*(x) = \Big( \sum_{n \ge 1} \frac{|S_n(f)(x) - \sigma_n(f)(x)|^2}{n} \Big)^{1/2}. \tag{5.3.20}$$

It is known that $g^*(x)$ is finite for almost every $x$ if $f \in H^1$. Consider for $f \in H^1$ the nonlinear mapping $f \to g^*$.

5.3.5 Theorem. There exists an absolute constant $C$ such that for any $f \in H^1$,

$$\sup_{a \ge 0}\, a\,\lambda\{x \in \mathbb{T} : g^*(x) > a\} \le C \int_{\mathbb{T}} |f(x)|\,dx.$$

Proof. Let $\{\alpha_{nm},\ n, m \in \mathbb{N}\}$ be a collection of complex numbers satisfying the following requirements:
• the modulus of each $\alpha_{nm}$ is rational and the argument is a rational multiple of $2\pi$,
• for each $m$, $\alpha_{nm} = 0$ for $n$ sufficiently large,
• for each $m$, $\sum_n |\alpha_{nm}|^2/n \le 1$.
Define for every $m$ and $f \in H^1$,

$$T_m(f)(x) = \sum_n \alpha_{nm}\, \frac{S_n f(x) - \sigma_n f(x)}{n}, \tag{5.3.21}$$

and

$$T^* f(x) = \sup_m |T_m f(x)|. \tag{5.3.22}$$

Plainly $T^* f(x) = g^*(x)$. The result then follows from the remark following Theorem 5.2.1.

7. Gabisoniya’s operator. Let $f \in L^1(\mathbb{T})$. Gabisoniya [1973] showed that

$$\lim_{n\to\infty} \sum_{i=1}^{n} \Big( \frac{n}{i} \int_{\pi(i-1)/n}^{\pi i/n} |f(x \pm t) - f(x)|\,dt \Big)^2 = 0 \quad \text{for almost all } x \in \mathbb{T}. \tag{5.3.23}$$

This generates an operator of the form

$$f \mapsto \sup_{n\in\mathbb{Z}^+} \bigg\{ \sum_{i=1}^{n} \Big( \frac{n}{i} \int_{\pi(i-1)/n}^{\pi i/n} \big[ |f(x+t) - f(x)| + |f(x-t) - f(x)| \big]\,dt \Big)^2 \bigg\}^{1/2},$$


which is of weak type (1, 1), by the continuity principle. Now let $f \in L^1(\mathbb{T})$ and let $S_n(f, x)$ be the partial sums of the Fourier series of $f$. Rodin [1992] has considered the sequence $\{S_n(f, x),\ n \ge 1\}$ as a function of an integer argument $n \in \mathbb{Z}^+$. He showed by means of Gabisoniya’s result that, for almost all $x$, it has bounded mean oscillation.

5.3.6 Theorem. Let $f \in L^1(\mathbb{T})$. Then the operator

$$Tf(x) = \sup_{m,n\in\mathbb{Z}^+} \frac{1}{m} \sum_{j=0}^{m-1} \Big| S_{j+n}(f, x) - \frac{1}{m} \sum_{k=0}^{m-1} S_{k+n}(f, x) \Big|$$

is of weak type (1, 1). This operator is the BMO-norm of the function $n \to S_n(f, x)$. Further, for almost all $x \in \mathbb{T}$, $Tf(x)$ is bounded by $C$ times the value at $x$ of Gabisoniya’s operator applied to $f$. (5.3.24)

By the John–Nirenberg theorem (see also before Theorem 4.2.6), BMO is contained in the Orlicz space associated with the function $x \mapsto e^{|x|} - 1$, and we deduce from the preceding theorem

5.3.7 Corollary. Let $f \in L^1(\mathbb{T})$. Then for every constant $A > 0$, and for almost all $x \in \mathbb{T}$,

$$\lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n} \Big( e^{A|S_k(f,x) - f(x)|} - 1 \Big) = 0.$$

The two previous results are respectively Theorem 1 and its corollary in [Rodin: 1992], to which we also refer for further results and the references therein.

8. Carleson’s theorem and Fefferman’s operator. Let $f \in L^2(\mathbb{T})$. Here we choose the representation $\mathbb{T} \sim (-\pi, \pi)$. Carleson’s celebrated theorem shows that the partial sums $S_n f$ of the Fourier series of $f$ converge to $f$ almost everywhere, thereby solving in the affirmative Lusin’s hypothesis. Carleson proved a few other results:

• If $f \in L^p(\mathbb{T})$, $1 < p < 2$, then $S_n f(x) = o(\log\log\log n)$ almost everywhere.

• If for some $\delta > 0$, $\int_{\mathbb{T}} |f(x)| \big(\log^+ |f(x)|\big)^{1+\delta}\,dx < \infty$, then $S_n f(x) = o(\log\log n)$ almost everywhere.

Carleson [1966] considered a modified form of the Dirichlet formula for $S_n f(x)$:

$$\tilde{S}_n f(x) = \int_{\mathbb{T}} \frac{e^{-int} f(t)}{x - t}\,dt.$$

Introduce the maximal function $M^* f(x) = \sup_{|n| \ge 0} |\tilde{S}_n f(x)|$. Carleson proved that

$$\lambda\{x \in \mathbb{T} : M^* f(x) > y\} \le C\, \frac{\|f\|_2^2}{y^2},$$


for all $y > 0$, $f \in L^2(\mathbb{T})$. Now put $Mf(x) = \sup_{n \ge 0} |S_n f(x)|$. By modifying Carleson’s proof, Hunt [1968] obtained corresponding inequalities for $Mf$:

5.3.8 Theorem.
a) $\|Mf\|_p \le C_p \|f\|_p$, $1 < p < \infty$,
b) $\|Mf\|_1 \le C \int_{\mathbb{T}} |f(x)| \big(\log^+ |f(x)|\big)^2\,dx + C$,
c) $\lambda\{x \in \mathbb{T} : Mf(x) > y\} \le C e^{-Cy/\|f\|_\infty}$, $y \ge 0$.

Fefferman [1973] gave another proof of the Carleson–Hunt result. He proved the basic estimate $\|Mf\|_1 \le C \|f\|_2$ using a new approach. Given $x$, let $\bar{n}(x)$ be the least integer $k$ for which $|S_k f(x)| \ge (1/2) Mf(x)$. The basic estimate is equivalent to

$$\|S_{\bar{n}} f\|_1 \le C \|f\|_2.$$

Elementary considerations of Dirichlet’s formula show that

$$S_n f(x) = C \int_{\mathbb{T}} \frac{e^{iny} - e^{-iny}}{y}\, f(x - y)\,dy + r,$$

where $r$ is a trivial error term. To prove the basic inequality, it is enough to show that

$$\Big\| \int_{\mathbb{T}} \frac{e^{i\bar{N}(x)y}}{y}\, f(x - y)\,dy \Big\|_1 \le C \|f\|_2,$$

for $\bar{N}(x) = \bar{n}(x)$ and for $\bar{N}(x) = -\bar{n}(x)$. Regard $\bar{N}$ as a fixed function of $x$, and consider the linear operator $T$ defined by

$$Tf(x) = \int_{\mathbb{T}} \frac{e^{i\bar{N}(x)y}}{y}\, f(x - y)\,dy.$$

Fefferman proved that

$$\|Tf\|_1 \le C \|f\|_2,$$

with $C$ independent of $f$ and $\bar{N}$.

5.4 A principle of domination – conjugacy lemma

We refer in this section to Halmos [1956]. Let $(X, \mathcal{A}, \mu)$ be a probability space. A measurable transformation $\tau$ of $X$ preserving $\mu$ ($\tau\mu = \mu$) is called an automorphism of $X$ if $\tau$ is bijective and bi-measurable, and if $\tau^{-1}$ preserves $\mu$. The family of automorphisms of $(X, \mathcal{A}, \mu)$ is denoted by $\mathcal{C}$. The family $\mathcal{C}$, when equipped with the composition operation as internal law, is a group. If $\tau \in \mathcal{C}$, then letting for any $f \in L^2(\mu)$, $\tau f = f \circ \tau$, we define a unitary operator on $L^2(\mu)$. As is well known, strong and weak topologies restricted to the set of all unitary operators coincide. The properties of these topologies are thus the same. The topology on $\mathcal{C}$ is usually called


the weak topology, and we have that $\tau_n \to \tau$ in $\mathcal{C}$ if and only if one of the following four equivalent properties is satisfied:

$\forall f \in L^2(\mu)$, $f \circ \tau_n \to f \circ \tau$ in $L^2(\mu)$,
$\forall A \in \mathcal{A}$, $\mu(\tau_n(A)\, \Delta\, \tau(A)) \to 0$,
$\forall A \in \mathcal{A}$, $\mu(\tau_n^{-1}(A)\, \Delta\, \tau^{-1}(A)) \to 0$,
$\forall f \in L^p(\mu)$, $f \circ \tau_n \to f \circ \tau$ in $L^p(\mu)$,

where $1 \le p < \infty$ is given and fixed. Endowed with this topology, $\mathcal{C}$ is a topological group. In what follows, we will assume that the probability space $(X, \mathcal{A}, \mu)$ is (pointwise) isomorphic to the interval $[0, 1[$ equipped with the normalized Lebesgue measure; namely that $(X, \mathcal{A}, \mu)$ is a Lebesgue space. Recall for instance that any Polish space $X$ (with $\mathcal{A}$ the Borel $\sigma$-field $\mathcal{B}(X)$ completed relative to an arbitrary probability measure $\mu$ on $\mathcal{B}(X)$) is a Lebesgue space. Under this regularity assumption, the weak topology on $\mathcal{C}$ is metrizable and satisfies the first axiom of countability. Finally recall also that $\tau \in \mathcal{C}$ is aperiodic if for any integer $n \ge 1$, $\mu\{x : \tau^n x \ne x\} = 1$. Then we have,

5.4.1 Lemma (Conjugacy lemma). If $\sigma \in \mathcal{C}$ is aperiodic, then the conjugacy class of $\sigma$,

$$c(\sigma) = \{\tau^{-1} \sigma \tau : \tau \in \mathcal{C}\},$$

is dense in $\mathcal{C}$.

Any ergodic endomorphism $\tau$ of $(X, \mathcal{A}, \mu)$ is aperiodic. One can easily establish that this property is no longer true in other measure spaces. Define now the sequence $\{S_n,\ n \ge 1\}$ by means of the matrix summation method. To this end, assume that we are given an infinite matrix of reals $A = \{a_{n,k},\ n, k \ge 1\}$ as well as some fixed $1 \le p < \infty$. Let $\tau \in \mathcal{C}$, and put formally

$$\forall f \in L^p(\mu),\ \forall n \ge 1, \qquad S_n^\tau(f) = \sum_{k=1}^{\infty} a_{n,k}\, f \circ \tau^k. \tag{5.4.1}$$

We will assume that all the column vectors $a_n = \{a_{n,k},\ k \ge 1\}$ belong to $\ell^1$. From this assumption, it is easily deduced that (5.4.1) defines a sequence of continuous operators on $L^p(\mu)$. Clearly, each $S_n^\tau(f)$ is the limit in $L^p(\mu)$ of $\sum_{k=1}^{N} a_{n,k}\, f \circ \tau^k$ as $N$ tends to infinity, for any $f \in L^p(\mu)$. The fact that these operators are continuous in measure is immediate. We will further assume that $a_{n,k} \ge 0$ for any $n, k \ge 1$. This assumption guarantees that the operators $S_n$ are positive. As usual, we also write for any $f \in L^p(\mu)$,

$$S_\tau^*(f) = \sup_{n \ge 1} |S_n^\tau(f)|.$$
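As a concrete instance of (5.4.1), one may take the Cesàro matrix $a_{n,k} = 1/n$ for $k \le n$ (and $0$ otherwise) together with a rotation of the unit interval. The Python sketch below is only an illustration under these assumptions; the names `cesaro_row`, `summation_operator`, `rotation` and the angle `ALPHA` are hypothetical choices, not taken from the text.

```python
import math


def cesaro_row(n):
    """Row n of the Cesaro matrix: a_{n,k} = 1/n for 1 <= k <= n, zero afterwards."""
    return [1.0 / n] * n


def summation_operator(row, f, tau):
    """Return the function x -> sum_k a_{n,k} f(tau^k x) for a finitely supported row."""
    def s_n(x):
        total, y = 0.0, x
        for a in row:
            y = tau(y)  # y equals tau^k(x) at step k, starting with k = 1
            total += a * f(y)
        return total
    return s_n


ALPHA = math.sqrt(2) - 1  # assumed irrational rotation angle


def rotation(x):
    """Rotation of [0, 1) by ALPHA, a Lebesgue-measure-preserving automorphism."""
    return (x + ALPHA) % 1.0


def f(x):
    """Test function with mean zero on [0, 1)."""
    return math.cos(2 * math.pi * x)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, summation_operator(cesaro_row(n), f, rotation)(0.3))
```

The printed averages shrink towards the integral of `f`, which is zero; each row has nonnegative entries summing to one, so the corresponding operator is positive and fixes constants.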

Observe then, for any $\sigma \in \mathcal{C}$, that $S_\tau^*(f) \circ \sigma = S_\tau^*(f \circ \sigma)$ for any $f \in L^p(\mu)$, provided that $\sigma\tau = \tau\sigma$. In particular, $S_\tau^*(f) \circ \tau^i = S_\tau^*(f \circ \tau^i)$,


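for any $f \in L^p(\mu)$ and $i \ge 1$. We will need the following auxiliary result.

5.4.2 Lemma. Let $\mathcal{D}$ denote the set of $\tau \in \mathcal{C}$ verifying

$$\forall \lambda > 0,\ \forall f \in L^p(\mu) \text{ with } \|f\|_p = 1, \qquad \mu\{S_\tau^*(f) > C(\lambda)\} \le D(\lambda), \tag{5.4.2}$$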

where C and D are applications from R+ in itself. Then D is closed in C. Proof. Assume that τp belongs to D for any p ≥ 1, and that τp → τ in C as p tends to infinity. It suffices then to show that the inequality N μ an,k f τ k > C(λ) ≤ D(λ),

(5.4.3)

k=1

holds for any λ > 0, any f ∈ Lp (μ) with f p = 1 and N ≥ 1. Let N ≥ 1 be given and fixed, as well as some real ε > 0. Since for any n ≥ 1, an ∈ 1 , we can find a number Mε ≥ 1 such that ∞ k=Mε an,k < ε for any 1 ≤ n ≤ N. For some integer pε depending on ε, Tchebycheff’s inequality allows us to write μ

N k=1

Mε ε an,k f τ k > C(λ) + 3δ ≤ μ an,k f τ k > C(λ) + 2δ + δ k=1

Mε ε an,k f τpkε > C(λ) + δ + 2 ≤μ δ k=1

∞

ε an,k f τpkε > C(λ) + 3 δ k=1 ε ≤ D(λ) + 3 . δ ≤μ

We conclude by letting $\varepsilon$ go to 0.

In the case of operators defined by means of matrix summation methods, the continuity principle admits the following strengthening due to Conze [1973], still known as Conze’s principle.

5.4.3 Theorem. Let $1 \le p < \infty$. Let $A = \{a_{n,k},\ n, k \ge 1\}$ be an infinite matrix of positive reals and $\{S_n^\tau,\ n \ge 1\}$ be the sequence of operators defined for $\tau \in \mathcal{C}$ as in (5.4.1). Assume that the column vectors $a_n = \{a_{n,k},\ k \ge 1\}$ belong to $\ell^1$. Then the following properties are equivalent:

(a) There exists an ergodic automorphism $\sigma \in \mathcal{C}$ such that

$$\forall f \in L^p(\mu), \qquad \mu\{x : S_\sigma^* f(x) < \infty\} = 1, \tag{5.4.4}$$

(b) $\exists\, 0 < C < \infty : \forall f \in L^p(\mu),\ \forall \lambda > 0$,

$$\sup_{\tau\in\mathcal{C}} \mu\{x : S_\tau^* f(x) > \lambda\} \le \frac{C}{\lambda^p} \int_X |f|^p\,d\mu. \tag{5.4.5}$$

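Proof. It suffices to show that (a) implies (b). By virtue of Sawyer’s continuity principle,

$$\exists\, 0 < C < \infty : \forall f \in L^p(\mu),\ \forall \lambda > 0, \qquad \mu\{x : S_\sigma^* f(x) > \lambda\} \le \frac{C}{\lambda^p} \int_X |f|^p\,d\mu. \tag{5.4.6}$$

Let $c(\sigma) = \{\tau^{-1}\sigma\tau,\ \tau \in \mathcal{C}\}$ be the conjugacy class of $\sigma$. Let $\alpha = \tau^{-1}\sigma\tau$ be an element of $c(\sigma)$. For any $f \in L^p(\mu)$, $S_n^\alpha(f) = \tau(S_n^\sigma(f \circ \tau^{-1}))$. Thus

$$S_\alpha^*(f) = \tau(S_\sigma^*(f \circ \tau^{-1})).$$

We deduce

$$\forall \lambda > 0, \qquad \mu\{x : S_\alpha^* f(x) > \lambda\} \le \frac{C}{\lambda^p} \int_X |f|^p\,d\mu, \tag{5.4.7}$$

for any $f \in L^p(\mu)$ and any $\alpha \in c(\sigma)$. The preceding lemma shows that the family of all elements $\alpha$ of $\mathcal{C}$ verifying (5.4.6) is closed in $\mathcal{C}$. As the conjugacy lemma 5.4.1 shows that this family is also dense in $\mathcal{C}$, this completes the proof.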
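The conjugation identity $S_n^\alpha(f) = \tau(S_n^\sigma(f \circ \tau^{-1}))$ used in the proof can be checked exactly on a finite model, where the space is $\{0, \dots, N-1\}$ with the uniform measure and automorphisms are permutations. The Python sketch below is such a toy check; the finite setting, the helper names and the sample data are all assumptions made for illustration only.

```python
def compose(p, q):
    """Permutation x -> p[q[x]] (apply q first, then p)."""
    return [p[q[x]] for x in range(len(p))]


def inverse(p):
    """Inverse permutation of p."""
    inv = [0] * len(p)
    for x, y in enumerate(p):
        inv[y] = x
    return inv


def pull_back(f, p):
    """The function f o p, stored as a list of values."""
    return [f[p[x]] for x in range(len(p))]


def summation(row, f, tau):
    """S_n^tau(f) = sum_k a_{n,k} f o tau^k for a finitely supported row (k starts at 1)."""
    out = [0] * len(f)
    power = list(range(len(f)))  # tau^0 is the identity
    for a in row:
        power = compose(tau, power)  # now tau^k
        out = [o + a * v for o, v in zip(out, pull_back(f, power))]
    return out


if __name__ == "__main__":
    sigma = [1, 2, 3, 4, 5, 0]  # a cyclic, hence ergodic, permutation
    tau = [2, 0, 1, 5, 3, 4]
    alpha = compose(inverse(tau), compose(sigma, tau))  # tau^{-1} sigma tau
    f = [3, 1, 4, 1, 5, 9]
    row = [2, 0, 1, 3]  # nonnegative integer weights, so equality is exact
    left = summation(row, f, alpha)
    right = pull_back(summation(row, pull_back(f, inverse(tau)), sigma), tau)
    print(left == right)
```

Since `pull_back` by a permutation only rearranges values, the two sides also have the same distribution under the uniform measure, which is why a weak type bound for $\sigma$ transfers to every conjugate $\alpha$.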

Chapter 6

Maximal operators and Gaussian processes

This chapter is devoted to a study of the liaison inequalities existing between maximal operators of L2 -operators and those associated to the canonical Gaussian process on L2 . We shall also study the well-known metric entropy criteria developed by Bourgain, which have been proved to be efficient tools in the study of some classical problems of convergence almost everywhere. In presenting these criteria as direct corollaries of the above mentioned liaison inequalities, we will adopt a slightly different point of view than the initial one, allowing us to get a better understanding of the role played by the theory of Gaussian processes in the study of convergence almost everywhere.

6.1 Some liaison theorems

Let $(X, \mathcal{A}, \mu)$ be some probability space and consider a sequence (denoted by $S$) of continuous operators $S_n : L^2(\mu) \to L^2(\mu)$, $n = 1, 2, \dots$. Given $2 \le p \le \infty$, the study of the almost everywhere convergence of the sequence $\{S_n f,\ n \ge 1\}$ for any $f \in L^p(\mu)$ is a fundamental question in ergodic theory. These properties are naturally expressed by means of the maximal operators

$$S_I(f) = \sup_{n\in I} |S_n(f)|, \qquad S^*(f) = \sup_{n\ge 1} |S_n(f)|. \tag{6.1.1}$$

Here $I$ is any finite subset of integers. By the Banach principle, the set of elements $f \in L^p(\mu)$ for which $\{S_n f,\ n \ge 1\}$ converges $\mu$-almost everywhere is closed in $L^p(\mu)$ if and only if there exists a nonincreasing function $C : \mathbb{R}^+ \to \mathbb{R}^+$ such that $\lim_{\alpha\to\infty} C(\alpha) = 0$, and for which $\mu\{S^* f > \alpha \|f\|_p\} \le C(\alpha)$, $\alpha \ge 0$, $f \in L^p(\mu)$. Further, if the sequence $S$ commutes with a family $\mathcal{E}$ of measurable transformations of $X$ preserving $\mu$ and mixing in the following sense:

$$\forall A, B \in \mathcal{A},\ \forall \alpha > 1,\ \exists\, T \in \mathcal{E} : \qquad \mu(A \cap T^{-1}B) \le \alpha\, \mu(A)\mu(B), \tag{$\mathcal{E}$}$$

then by the continuity principle $C(\alpha) = O(\alpha^{-p})$. This holds in particular when $S$ commutes with an ergodic endomorphism of $(X, \mathcal{A}, \mu)$. Thus the study of the almost everywhere convergence of the sequence $S$ amounts, modulo adequate commutation assumptions, to establishing a maximal inequality and to exhibiting a dense subset of $L^p(\mu)$ for which the almost everywhere convergence already holds. Recall now for our purpose some material from the theory of Gaussian processes taken from Chapter 10. Let $H$ be a Hilbert space. In what follows we denote by


$Z = \{Z_h,\ h \in H\}$ the canonical Gaussian process on $H$, namely the centered Gaussian process with covariance function

$$(h, h') \mapsto \langle h, h' \rangle, \qquad h, h' \in H.$$

This process is easy to represent. Assume that $H$ admits a countable orthonormal basis $\{h_n,\ n \ge 1\}$. This is realized if and only if $H$ is separable (by Zorn’s lemma, any Hilbert space admits an orthonormal basis, although not necessarily countable). Let also $g = \{g_n,\ n \ge 1\}$ be a sequence of i.i.d. $N(0, 1)$ distributed random variables with basic probability space $(\Omega, \mathcal{A}, \mathbf{P})$. Then $Z$ can be defined as follows: for any $h \in H$,

$$Z_h = \sum_{n=1}^{\infty} g_n \langle h, h_n \rangle. \tag{6.1.2}$$

We easily verify that $\mathbf{E}\, Z_h Z_{h'} = \langle h, h' \rangle$ for any $h, h' \in H$. Besides, if $A$ is some finite or countable (only in order to avoid minor measurability problems) subset of $H$, we recall that $A$ is a GB set (for Gaussian bounded set) if

$$\mathbf{E} \sup_{h\in A} |Z(h)| < \infty. \tag{6.1.3}$$
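In finite dimension the representation (6.1.2) is easy to simulate. The Python sketch below takes $H = \mathbb{R}^3$ with its standard basis, which is an assumption made purely for illustration, as are the function names and the sample vectors. It estimates $\mathbf{E}\, Z_h Z_{h'}$ and $\mathbf{E} \sup_{h \in A} |Z(h)|$ for a finite set $A$ by Monte Carlo averaging.

```python
import random


def inner(u, v):
    """Euclidean inner product, standing in for the inner product of H."""
    return sum(a * b for a, b in zip(u, v))


def sample_covariance(h1, h2, samples=20000, seed=0):
    """Monte Carlo estimate of E[Z_{h1} Z_{h2}] with Z_h = sum_n g_n <h, h_n>."""
    rng = random.Random(seed)
    dim = len(h1)
    acc = 0.0
    for _ in range(samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # i.i.d. N(0, 1) coordinates
        acc += inner(g, h1) * inner(g, h2)
    return acc / samples


def expected_sup(vectors, samples=20000, seed=1):
    """Monte Carlo estimate of E sup_{h in A} |Z_h| for a finite set A."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    acc = 0.0
    for _ in range(samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        acc += max(abs(inner(g, h)) for h in vectors)
    return acc / samples


if __name__ == "__main__":
    h1, h2 = [0.6, 0.8, 0.0], [0.0, 0.6, 0.8]
    print(sample_covariance(h1, h2), inner(h1, h2))
    print(expected_sup([h1, h2]))
```

The first printed pair should be close to each other, in agreement with $\mathbf{E}\, Z_h Z_{h'} = \langle h, h' \rangle$; any finite set is trivially a GB set, so the second estimate is finite.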

Now we let $H = L^2(\mu)$ and introduce for any $f \in L^2(\mu)$ the subsets

$$C_f = \{S_n f,\ n \ge 1\}. \tag{6.1.4}$$

Bourgain [1988a] has established a remarkable link between the properties of convergence almost everywhere of the sequence S and the regularity of Z on the sets Cf . This link can be interpreted as follows: if the sequence S converges almost everywhere for a large class of functions, for any f ∈ Lp (μ) to be precise, with 2 ≤ p < ∞, then necessarily the associated sets Cf are GB-sets (i.e., E supn≥1 |Z(Sn (f ))| < ∞). And this provides by means of Sudakov’s minoration (inequality (6.2.7)) a necessary condition which reads on the size of the sets Cf . This condition means that these sets can not be too thick: their entropy numbers are not too big. There is an analogous result when p = ∞. Bourgain [1988a] also proved the efficiency of such a condition by showing, through several striking examples, how it can be successfully applied to recover some important results of Marstrand and Rudin. In this chapter, we will present Bourgain’s results from a functional analysis point of view. We will first establish relationships between some functionals naturally related through the Banach principle to the sequence S, and corresponding functionals related to the canonical Gaussian process Z. And next we show that Bourgain’s entropy criteria are easily deduced from these functional inequalities. We begin with some notation and first introduce for any subset I of integers the following functionals related to S and I . Let 2 ≤ p < ∞. Consider a sequence S of L2 (μ) continuous operators. We put sup |Sn (g)| dμ. (6.1.5) p (S, I ) = sup

g p,μ ≤1

n∈I


When I = N we write more simply p (S) =

sup

g p,μ ≤1

sup S ∗ (g) dμ.

(6.1.6)

n∈N

Let p = ∞. Consider a sequence S of L2 (μ) − L∞ (μ)-continuous operators. It will be convenient to introduce the following functionals considered in Bourgain [1988a] (see also Bellow and Jones [1996]) ∞,2 (S, I, ε) = sup SI (f ) dμ,

f ∞,μ ≤1

f 2,μ ≤ε

∞,2 (S, ε) =

sup

S ∗ (f ) dμ.

(6.1.7)

f ∞,μ ≤1

f 2,μ ≤ε

When the operators Sn are further Lp (μ)-continuous for some p ∈ [2, ∞], we also put for any subset I of N and f ∈ Lp (μ),

Sn p =

sup

f p,μ ≤1

Sn (f ) p,μ ,

S(I, p) = sup Sn p .

(6.1.8)

n∈I

Finally we introduce the corresponding Gaussian functionals. Put (S, I ) =

sup

g 2,μ ≤1

E sup Z(Sn (g)),

(6.1.9)

n∈I

and for any positive integer K, ∗ (S, K) =

sup E sup Z(Sn (f )).

f 2,μ ≤1 #(I )=K

(6.1.10)

n∈I

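We shall establish several liaison inequalities (Theorems 6.1.1 and 6.1.6) between these functionals. More precisely, we compare the Gaussian functional defined in (6.1.9) with the functional defined in (6.1.5) for $2 \le p < \infty$, and next the functional defined in (6.1.10) with the first functional in (6.1.7), if $\#(I) = K$. Consider the following assumption.

(H1) There exists a sequence $\{T_j,\ j \ge 1\}$ of $L^2(\mu)$ positive isometries, preserving 1, commuting with the sequence $\{S_n,\ n \ge 1\}$, $S_n(T_j f) = T_j(S_n f)$, and verifying the following mean ergodic property: for all $f \in L^\infty(\mu)$,

$$\lim_{J\to\infty} \Big\| \frac{1}{J} \sum_{j\le J} T_j f^2 - \int f^2\,d\mu \Big\|_{1,\mu} = 0.$$

6.1.1 Theorem. Let $S_n : L^2(\mu) \to L^2(\mu)$, $n = 1, 2, \dots$ be continuous operators verifying (H1) and such that $S_n(L^\infty(\mu)) \subset L^\infty(\mu)$. Let $2 \le p < \infty$. Then there exists a constant $C_p < \infty$ such that for any finite subset $I$ of $\mathbb{N}$, the Gaussian functional (6.1.9) is dominated by $C_p$ times the functional (6.1.5), that is,

$$\sup_{\|g\|_{2,\mu}\le 1} \mathbf{E} \sup_{n\in I} Z(S_n(g)) \le C_p \sup_{\|g\|_{p,\mu}\le 1} \int \sup_{n\in I} |S_n(g)|\,d\mu.$$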


The proof will notably result from an intermediate inequality proved in Lemma 6.2.2, showing that for any $0 < \varepsilon < 1$ and for all integers $J$ along some index set $\mathcal{J}$,

$$(1 - \varepsilon)\, \mathbf{E} \sup_{n\in I} Z(S_n(f)) \le \mathbf{E} \int \sup_{n\in I} S_n(F_{J,f})\,d\mu.$$

Here the $F_{J,f}$ are the Stein elements that we already encountered in Chapter 5 for proving the continuity principle. In the following corollary, assumption (H1) is replaced by a slightly stronger one, needed to apply the continuity principle.

(H2) There exists a family $\mathcal{E} = \{T_j,\ j \ge 1\}$ of pointwise transformations of $X$ preserving $\mu$, commuting with the $S_n$, $S_n(T_j f) = T_j(S_n f)$, and verifying for any $f, g \in L^2(\mu)$:

$$\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \langle T_k f, g \rangle = \langle f, 1 \rangle \langle g, 1 \rangle.$$

Under this assumption, property ($\mathcal{E}$) is fulfilled. Consequently the continuity principle applies to the sequence $S$. We shall now easily deduce the first entropy criterion of Bourgain [1988a: Proposition 1].

6.1.2 Corollary (First entropy criterion). Let $2 \le p < \infty$. Let $S$ be a sequence of continuous operators $S_n : L^2(\mu) \to L^2(\mu)$, $n = 1, 2, \dots$ verifying assumption (H2), and such that the following property is fulfilled:

$$\mu\{S^*(f) < \infty\} = 1 \quad \text{for any } f \in L^p(\mu). \tag{$B_p$}$$

Then for any $f \in L^p(\mu)$, the sets $C_f$ are GB sets of $L^2(\mu)$. In particular, there exist a numerical constant $C_1$ and a constant $C_2$ depending on the sequence $S$ only, such that for any $f \in L^p(\mu)$,

$$C_1 \sup_{\varepsilon>0} \varepsilon \sqrt{\log N_f(\varepsilon)} \le \mathbf{E} \sup_{n\ge 1} Z(S_n(f)) \le C_2 \|f\|_2,$$

where for any $\varepsilon > 0$, $N_f(\varepsilon)$ denotes the minimal number of $L^2(\mu)$ open balls of radius $\varepsilon$, centered in $C_f$, that are enough to cover $C_f$.

Remarks. 1. Under the assumptions of Theorem 6.1.1, the same conclusion can also be reached by using the Banach principle. For more details see the proof of Theorem 4.1.1 in [Weber: 1998].
2. The first inequality provides an entropy estimate which turns out to be optimal when the sequence $S$ is a sequence of convolution products ([Weber: 1998b], Remark 4.1.4).
3. The second inequality indicates that the sets $C_f$ are uniformly GB.


Proof. As $S_n$ is a continuous operator in $L^2(\mu)$, $S_n$ is also continuous in measure on $L^p(\mu)$. By virtue of the continuity principle, we know that

$$\sup_{\|f\|_p\le 1}\ \sup_{\lambda\ge 0} \lambda^p \mu\{S^*(f) > \lambda\} < \infty.$$

And thus for any $r < p$,

$$\sup_{\|f\|_p\le 1} \|S^*(f)\|_r < \infty.$$

Applying this with $r = 1$ implies, in view of Theorem 6.1.1, that there exists $K_p < \infty$ such that for any finite subset $I$,

$$\sup_{\|f\|_{2,\mu}\le 1} \mathbf{E} \sup_{n\in I} Z(S_n(f)) \le C_p \sup_{\|f\|_{p,\mu}\le 1} \int \sup_{n\in I} |S_n(f)|\,d\mu \le C_p \sup_{\|f\|_p\le 1} \|S^*(f)\|_1 := K_p. \tag{6.1.11}$$

By letting $I$ increase to $\mathbb{N}$ we get

$$\sup_{\|f\|_{2,\mu}\le 1} \mathbf{E} \sup_{n\ge 1} Z(S_n(f)) \le K_p.$$

This proves the second inequality of the corollary. As to the first, it is an immediate consequence of Sudakov’s minoration, which we recall in this chapter (see inequality (6.2.7)).

Before continuing, we shall study several applications of Theorem 6.1.1. We begin with a first application to Riemann sums. Let $\mathbb{T}$ be endowed with the normalized Lebesgue measure $\lambda$. Let $f \in L^0(\lambda)$ and define for $x \in \mathbb{T}$ and any integer $n = 1, 2, \dots$ the Riemann sum of order $n$ of $f$,

$$R_n(f)(x) = \frac{1}{n} \sum_{0\le j<n} f\Big(x + \frac{j}{n}\Big).$$
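The Riemann sums $R_n$ act very simply on characters: a standard computation with geometric sums gives $R_n e_m = e_m$ when $n$ divides $m$ and $R_n e_m = 0$ otherwise. The Python sketch below checks this numerically; the helper names `riemann_sum` and `character` are assumptions of this illustration rather than notation from the text.

```python
import cmath
import math


def riemann_sum(f, n, x):
    """R_n(f)(x) = (1/n) * sum over 0 <= j < n of f(x + j/n), for a 1-periodic f."""
    return sum(f(x + j / n) for j in range(n)) / n


def character(m):
    """The character e_m(x) = exp(2*pi*i*m*x)."""
    return lambda x: cmath.exp(2j * math.pi * m * x)


if __name__ == "__main__":
    x = 0.2
    print(abs(riemann_sum(character(6), 3, x) - character(6)(x)))  # 3 divides 6
    print(abs(riemann_sum(character(5), 3, x)))                    # 3 does not divide 5
```

Both printed quantities are zero up to rounding, so $R_n$ keeps exactly the frequencies that are multiples of $n$.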

l∈{1}∪E1

1 1 4(p2 − 1) + 2 > 1, 2 4p p

hence a contradiction. Now let i2 ∈ E\({1} ∪ E1 ). We easily check that

fi1 − fi2 ≥ |fi1 , φi2 − fi2 , φi2 | ≥

1 1 1 − = . p 2p 2p

More generally, for 1 ≤ k ≤ T , put

, Ek = i ∈ E\ {1, . . . , k} ∪ 0≤j

1 2p

.

Arguing as before, we also get #(Ek ) ≤ 4(p2 − 1), since otherwise 1 ≥ fik 2 ≥

fik , φl 2 >

l∈{k}∪Ek

1 1 4(p2 − 1) + 2 > 1. 2 4p p

, Now let ik+1 ∈ E\ {1, . . . , k} ∪ 0≤j ≤k Ej . For any l ≤ k we have 1 1 1 − = , p 2p 2p

, because ik+1 ∈ / 0≤l≤k El . We can iterate this procedure as long as E\ {1, . . . , k} ∪ , 0≤j ≤k Ej = ∅, namely at least k times, k ≤ T ; hence the lemma is proved.

fil − fik+1 ≥ |fik+1 , φik+1 − fil , φik+1 | ≥

Proof of Proposition 6.1.4. Let $s$ be some fixed positive integer. Let $P_1, P_2, \dots$ denote the sequence of prime numbers. For any nonnegative integer $T$ we put

$$A_T = \{n = P_1^{\alpha_1} \cdots P_s^{\alpha_s} : 2^T \le n < 2^{T+1},\ \alpha_i \ge 0,\ i = 1, \dots, s\}. \tag{6.1.12}$$

Since $P_1 = 2$, replacing $\alpha_1$ by $\alpha_1 + 1$ we verify that

$$\#(A_T) \le \#(A_{T+1}). \tag{6.1.13}$$

As $0 \le \alpha_1 + \cdots + \alpha_s \le T$ if $n = P_1^{\alpha_1} \cdots P_s^{\alpha_s} \in A_T$, we also deduce that

$$\#(A_T) \le (T + 1)^s. \tag{6.1.14}$$
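For small parameters the sets $A_T$ can be enumerated directly, which makes (6.1.13) and (6.1.14) easy to observe. The following Python sketch does this by brute force; the function names `first_primes` and `smooth_block` are hypothetical and chosen only for this illustration.

```python
def first_primes(s):
    """The first s prime numbers P_1, ..., P_s, by trial division."""
    primes, candidate = [], 2
    while len(primes) < s:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def smooth_block(T, s):
    """A_T: integers n in [2^T, 2^(T+1)) whose prime factors all lie among P_1, ..., P_s."""
    primes = first_primes(s)
    block = []
    for n in range(2 ** T, 2 ** (T + 1)):
        m = n
        for p in primes:
            while m % p == 0:
                m //= p
        if m == 1:  # n factors completely over the first s primes
            block.append(n)
    return block


if __name__ == "__main__":
    s = 3
    for T in range(8):
        size = len(smooth_block(T, s))
        print(T, size, (T + 1) ** s)  # cardinality next to the bound (T + 1)^s
```

The cardinalities grow only polynomially in $T$, while the dyadic blocks themselves double in length, which is the tension exploited in the next step of the proof.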

The growth condition (6.1.14) implies that, given any arbitrary positive integer $d$, there exists $T$ such that

$$\#(A_{T+d}) \le 2\,\#(A_T). \tag{6.1.15}$$

Otherwise $\#(A_{T+d}) > 2\,\#(A_T)$ for any $T$ would imply that for some constant $c > 0$,

$$\#(A_{nd}) > c\,2^n, \tag{6.1.16}$$

for any positive integer $n$, which contradicts (6.1.14); hence (6.1.15). Now choose $d$ such that $2^d \le P_s$. Any integer $j \le 2^d$ has consequently only prime factors from the set $\{P_1, \dots, P_s\}$. Put for $i = 0, \dots, d$,

$$f^{(i)}(x) = \frac{1}{\#(A_{T+i})^{1/2}} \sum_{n\in A_{T+i}} e^{2i\pi nx}, \tag{6.1.17}$$

$f = f^{(0)}$, $f_j(x) = f(jx)$ and then

$$\varphi_i = \frac{f^{(2i-1)} + f^{(2i)}}{\sqrt{2}}, \qquad i = 1, \dots, \Big[\frac{d}{2}\Big].$$

The $f^{(i)}$ form an orthonormal system in $L^2$, as do the $\varphi_i$. Besides, $\|f_j\| = 1$ for any $j$. Let $1 \le i \le [d/2]$ and $j \in [2^{2i-1}, 2^{2i}]$, and examine $f_j$. Let $n \in A_T$. Then all the prime factors of $nj$ belong to $\{P_1, \dots, P_s\}$. Further, $2^{T+2i-1} \le nj < 2^{T+2i+1}$. It follows that

$$n \in A_T \text{ and } j \in [2^{2i-1}, 2^{2i}] \implies nj \in A_{T+2i-1} \cup A_{T+2i}.$$

We may thus write

$$f_j(x) = \frac{1}{\#(D)^{1/2}} \sum_{m\in D} e^{2i\pi mx},$$

where $D \subset A_{T+2i-1} \cup A_{T+2i}$ and $\#(D) = \#(A_T)$. Hence by (6.1.15),

$$\sqrt{2}\, \langle f_j, \varphi_i \rangle = \frac{1}{[\#(A_T)\#(A_{T+2i-1})]^{1/2}} \sum_{m\in D\cap A_{T+2i-1}} 1 + \frac{1}{[\#(A_T)\#(A_{T+2i})]^{1/2}} \sum_{m\in D\cap A_{T+2i}} 1 \ge \frac{\#(D)}{\sqrt{2}\,\#(A_T)} = \frac{\#(A_T)}{\sqrt{2}\,\#(A_T)} = \frac{1}{\sqrt{2}}. \tag{6.1.18}$$

Therefore for any $1 \le i \le [d/2]$ and any $2^{2i-1} \le j \le 2^{2i}$,

$$\langle f_j, \varphi_i \rangle \ge \frac{1}{2}.$$


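And $\langle f_j, \varphi_k \rangle \ge 0$ for any $j$ and $k$. Thus

$$\langle S_{4^i}(f), \varphi_i \rangle = \frac{1}{4^i} \sum_{j=1}^{4^i} \langle f_j, \varphi_i \rangle \ge \frac{1}{4^i} \sum_{j=2^{2i-1}}^{2^{2i}} \langle f_j, \varphi_i \rangle \ge \frac{1}{2^{2i+1}} \sum_{l=2^{2i-1}}^{2^{2i}} 1 = \frac{2^{2i} - 2^{2i-1} + 1}{2 \cdot 4^i} \ge \frac{1}{4}.$$

We have thus obtained: for every $i = 1, \dots, [d/2]$,

$$\langle S_{4^i}(f), \varphi_i \rangle \ge \frac{1}{4}. \tag{6.1.19}$$

Lemma 6.1.5 applied with the choices $R = [d/2]$, $T = \big[[d/2]/13\big]$, $p = 2$ shows that

$$N\Big( \big\{S_{4^i}(f),\ i \le \tfrac{d}{2}\big\},\ \frac{1}{8} \Big) \ge T. \tag{6.1.20}$$

Since $d$ is arbitrary, it follows from Theorem 6.1.1 and inequality (6.2.7) that for any $M \ge 26$,

$$\sup_{\|f\|_2\le 1} \int \sup_{1\le i\le M} S_{4^i} f\,d\mu \ge \frac{B}{8} \sqrt{\log (M/26)}, \tag{6.1.21}$$

as claimed.

We will also establish the following result concerning the functionals in (6.1.7).

6.1.6 Theorem. Assume for any positive integer $n$ that $S_n$ is $L^2(\mu)$-$L^\infty(\mu)$ continuous, and that assumption (H1) is satisfied. Then for any finite subset $I$ of $\mathbb{N}$ and any reals $A > 0$ and $R > 0$, (S, I ) ≤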

f 2 ≤1 1≤i≤M as claimed. We will also establish the following result concerning the functionals in (6.1.7). 6.1.6 Theorem. Assume for any positive integer n that Sn is L2 (μ)-L∞ (μ) continuous, and that assumption (H1) is satisfied. Then for any finite subset of I of N and any reals A > 0 and R > 0, (S, I ) ≤

2

2#(I )S(I, 2)e−A

2 /8

+

√

2 · AS(I, ∞)e−R

2 /4

+ A∞,2 S, I,

R . A

As an immediate consequence we have the following proposition. 6.1.7 Proposition. Let {Sn , n ≥ 1} be a sequence of L2 (μ)-L∞ (μ) contractions verifying assumption (H1). Then for any real ρ > 0, there exists a constant Cρ < ∞ such that for any integer K ≥ 3 and any R > 0,

2 2 R ∗ (S, K) √ K −ρ 2 ≤ 2√ + 2Cρ e−R /4 + Cρ ∞,2 S, 2 . (6.1.22) √ log K log K Cρ log K In particular

∗ (S, K) ≤ 2 lim ∗∞,2 (S, ε). lim sup √ ε→0 log K K→∞

(6.1.23)


Proof. Theorem 6.1.6 implies 2 √ R −A2 /8 −R 2 /4 (S, I ) ≤ 2#(I )e + 2 · Ae + A∞,2 S, I, . A √ Let ρ > 0 be fixed. Choose C = Cρ = 8ρ + 4. Let K ≥ 3. Put A = C log K. Then for any subset I of N such that #(I ) = K, √ √ √ (S, I ) 2 R −ρ −R 2 /4 + C∞,2 S, √ . ≤√ K + 2Ce √ log K log K C log K

And by taking the maximum over all subsets I of integers such that #(I ) = K, √ √ √ 2 ∗ (S, K) R 2 , ≤√ K −ρ + 2Ce−R /4 + C∞,2 S, √ √ log K log K C log K which is (6.1.22). By now letting R run over any increasing sequence of integers RK = 0, next letting ρ go {RK , K ≥ 1} such that limK→∞ RK = ∞ and limK→∞ √log K to zero, we also get (6.1.23). From the above proposition, it is still possible to simply deduce as a corollary the other entropy criterion of Bourgain [1988a: Proposition 2] for the space L∞ (μ). This criterion is mostly applied. 6.1.8 Corollary (Second entropy criterion). Let {Sn , n ≥ 1} be a sequence of L2 (μ)L∞ (μ) contractions verifying assumption (H1). Assume that μ {Sn (f ), n ≥ 1 converges} = 1 for all f ∈ L∞ (μ). (C∞ ) Then for any real δ > 0, C(δ) =

$$\sup_{f\in L^\infty(\mu),\ \|f\|_2\le 1} N_f(\delta) < \infty. \tag{6.1.24}$$

Proof. Assume that there exists a real $\delta > 0$ such that $C(\delta) = \infty$. Then for any integer $K \ge 3$, there exist $f \in L^\infty(\mu)$ with $\|f\|_{2,\mu} = 1$ and a set $I$ with $\#(I) = K$ such that

$$\inf_{n,m\in I,\ n\ne m} \|S_n(f) - S_m(f)\|_{2,\mu} \ge \delta.$$

In view of Proposition 6.1.7 (with ρ = 1 and C = Cρ ) and inequality (6.2.7), it follows that R −1 −R 2 /4 Bδ ≤ C(K + e ) + ∞,2 S, √ , log K where B is a numerical constant. Choosing now R such that Ce−R letting K go to infinity, we deduce 1 Bδ ≤ lim sup ∞,2 (S, ε). 2 ε→0

2 /4

≤ 21 Bδ, next


This brings a contradiction since in view of Theorem 5.1.5 and the assumptions made we know that the maximal operator ∞,2 (S, ε) should be continuous at 0; hence the result. Return to Khintchin sums (Proposition 6.1.4) and the entropy estimate established in (6.1.20). Since d was arbitrary, it follows that

$$\sup_{f\in L^\infty,\ \|f\|_2\le 1} N\Big( (S_{4^i}(f),\ i \ge 1),\ \frac{1}{4} \Big) = \infty.$$

And by the second entropy criterion, we recover a well-known result due to Marstrand [1970], answering negatively a conjecture due to Khintchin: There exists a bounded measurable function $f$ such that the sequence of Khintchin sums $\{S_n f,\ n \ge 1\}$ does not converge almost everywhere.

6.2 Two preliminary lemmas

We begin with a useful lemma.

6.2.1 Lemma. Let $T$ be a positive isometry of $L^2(\mu)$ such that $T1 = 1$.
(a) Then $\|Tf\|_{\infty,\mu} \le \|f\|_{\infty,\mu}$ for any $f \in L^\infty(\mu)$, and $\mu\{(Tf)^2 = T(f^2)\} = 1$.
(b) Moreover, if $T$ is a continuous operator on $L^1(\mu)$, then for any $f \in L^2(\mu)$, $\mu\{(Tf)^2 = T(f^2)\} = 1$, and $T$ is a positive isometry of $L^1_+(\mu)$.
(c) Conversely, if $T$ is a positive isometry on $L^1_+(\mu)$ such that $\mu\{(Tf)^2 = T(f^2)\} = 1$ holds for any $f \in L^2(\mu)$, then $T1 = 1$ and $T$ is a positive isometry on $L^2(\mu)$.

Proof. The first assertion in (a) is immediate since $|Tf| \le T1 \cdot \|f\|_{\infty,\mu} = \|f\|_{\infty,\mu}$. Now let $A \in \mathcal{A}$, $0 < \mu(A) < 1$. We use the following property: $f, g \in L^2(\mu)$ with $f \ge 0$, $g \ge 0$ have disjoint supports if and only if

$$\|f + g\|_{2,\mu}^2 = \|f\|_{2,\mu}^2 + \|g\|_{2,\mu}^2. \tag{6.2.1}$$

This property remains true (see Krengel [1985: p. 186]) in $L^p(\mu)$ with $1 < p < \infty$. Since $T$ is a positive isometry, from the fact that $T1 = 1$ we deduce that $T1_A$ and $T1_{A^c}$ have disjoint supports, and $0 \le T1_A,\ T1_{A^c} \le 1$. Let $E = \{0 < T1_A < 1\} = \{0 < T1_{A^c} < 1\}$; the two sets coincide because $T1_A + T1_{A^c} = 1$. As $E \subset \mathrm{supp}(T1_A) \cap \mathrm{supp}(T1_{A^c})$ and these supports are disjoint, $E$ is a null set, and we conclude that $T1_A$ and $T1_{A^c}$ are indicators. Consequently, any simple function is mapped by $T$ into a simple function. For these functions we have $(Tf)^2 = Tf^2$.


Let $f \in L^\infty(\mu)$, $f \ge 0$. Put for any integer $n > \|f\|_{\infty,\mu}$,
$$f_n = \sum_{q=1}^{n2^n} \frac{q}{2^n}\, 1_{\{\frac{q-1}{2^n} \le f < \frac{q}{2^n}\}}, \qquad g_n = \sum_{q=1}^{n2^n} \frac{q-1}{2^n}\, 1_{\{\frac{q-1}{2^n} \le f < \frac{q}{2^n}\}}. \tag{6.2.2}$$
Then $f \le f_n \le f + \frac{1}{2^n}$ and $g_n \le f \le g_n + \frac{1}{2^n}$ at any point. On the one hand, using positivity of $T$ and $T1 = 1$,
$$(Tf)^2 \le (Tf_n)^2 = Tf_n^2 \le T\Big(f + \frac{1}{2^n}\Big)^2 = Tf^2 + 2^{-n+1}\,Tf + 2^{-2n}.$$
Consequently, by letting $n$ tend to infinity, $(Tf)^2 \le Tf^2$. And on the other,
$$Tf^2 \le T\Big(g_n + \frac{1}{2^n}\Big)^2 = Tg_n^2 + 2^{-n+1}\,Tg_n + 2^{-2n} \le (Tf)^2 + 2^{-n+1}\,Tf + 2^{-2n}.$$
Hence $Tf^2 \le (Tf)^2$ by letting $n$ tend to infinity, and thus $Tf^2 = (Tf)^2$.

Let now $f \in L^\infty(\mu)$, $f = f^+ - f^-$. As
$$\|Tf^+ - Tf^-\|_{2,\mu}^2 = \|Tf\|_{2,\mu}^2 = \|f\|_{2,\mu}^2 = \|f^+\|_{2,\mu}^2 + \|f^-\|_{2,\mu}^2 = \|Tf^+\|_{2,\mu}^2 + \|Tf^-\|_{2,\mu}^2,$$
it then follows that $Tf^+$ and $Tf^-$ have disjoint supports. This implies that
$$(Tf)^2 = (Tf^+)^2 + (Tf^-)^2 = T(f^+)^2 + T(f^-)^2 = Tf^2. \tag{6.2.3}$$
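The dyadic approximations $f_n$ and $g_n$ of (6.2.2) can be checked pointwise. The following is only a minimal sketch: the helper names `dyadic_bounds` and `check` are hypothetical, and the function $f$ is represented by a list of sample values rather than by a measurable function.

```python
import math

def dyadic_bounds(value, n):
    """Return (g_n, f_n) at a point where f takes `value`, assuming 0 <= value < n."""
    if not 0 <= value < n:
        raise ValueError("need 0 <= value < n")
    scale = 2 ** n
    # q is the unique integer with (q - 1) / 2^n <= value < q / 2^n
    q = math.floor(value * scale) + 1
    return (q - 1) / scale, q / scale

def check(values, n):
    """True if g_n <= f <= f_n <= f + 2^-n holds at every sample value."""
    eps = 2.0 ** (-n)
    for v in values:
        lower, upper = dyadic_bounds(v, n)
        if not (lower <= v <= upper <= v + eps):
            return False
    return True
```

Both approximations are simple functions taking values in the dyadic grid of step $2^{-n}$, which is what allows the identity $(Tf)^2 = Tf^2$ for simple functions to be transferred to bounded functions.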

We have thus established assertion (a). We now show (b). Let $f \in L^2(\mu)$; there exists a sequence $(f_n) \subset L^\infty(\mu)$ such that $\|f - f_n\|_2 \to 0$ as $n \to \infty$. By virtue of the Cauchy–Schwarz inequality, we have also $\|f^2 - f_n^2\|_1 \to 0$ as $n \to \infty$. Then
$$\|(Tf)^2 - Tf^2\|_1 \le \|(Tf)^2 - (Tf_n)^2\|_1 + \|(Tf_n)^2 - Tf_n^2\|_1 + \|T(f_n^2 - f^2)\|_1 \le \|T(f_n - f)\|_2 \cdot \|T(f_n + f)\|_2 + \|T(f_n^2 - f^2)\|_1 \to 0,$$
as $n \to \infty$, since the middle term vanishes by (a) and $T$ is continuous on $L^1(\mu)$ and $L^2(\mu)$; hence (b). Finally (c) is immediate.

Recall for our purpose Slepian's comparison inequality and Sudakov's minoration (inequalities (10.2.7) and (10.2.9)). Let $T$ be a finite set. Let $X = \{X_t, t \in T\}$ and $Y = \{Y_t, t \in T\}$ be two Gaussian processes. Assume that for any $s, t \in T$,
$$\|X_s - X_t\|_2 \le \|Y_s - Y_t\|_2. \tag{6.2.4}$$


Then for any positive increasing convex function $f$ on $\mathbf{R}_+$,
$$\mathbf{E} f\Big(\sup_{T \times T}(X_s - X_t)\Big) \le \mathbf{E} f\Big(\sup_{T \times T}(Y_s - Y_t)\Big). \tag{6.2.5}$$
In particular,
$$\mathbf{E} \sup_{t \in T} X_t \le \mathbf{E} \sup_{t \in T} Y_t. \tag{6.2.6}$$
An important consequence is Sudakov's minoration: there exists a universal constant $B$ such that for any Gaussian process $X = \{X_t, t \in T\}$ with basic probability space $(\Omega, \mathcal{B}, \mathbf{P})$,
$$\mathbf{E} \sup_{t \in T} X_t \ge B \inf_{s,t \in T,\ s \ne t} \|X_s - X_t\|_{2,\mathbf{P}} \sqrt{\log \#(T)}. \tag{6.2.7}$$
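Sudakov's minoration can be illustrated in the simplest case of $n$ independent standard Gaussian variables, where $\|X_s - X_t\|_2 = \sqrt{2}$ for $s \ne t$. The sketch below is only an illustration under that assumption: it evaluates $\mathbf{E}\max_i X_i$ by numerical integration of the density of the maximum and compares it with $\sqrt{2}\sqrt{\log n}$. The function names and the integration parameters are hypothetical choices, and the text does not specify the value of the constant $B$.

```python
import math

def expected_max_iid_gaussians(n, half_width=10.0, steps=20000):
    """E max(X_1, ..., X_n) for i.i.d. N(0,1) variables, by the trapezoidal
    rule applied to x * n * phi(x) * Phi(x)**(n - 1)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        Phi = 0.5 * math.erfc(-x / math.sqrt(2.0))
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * x * n * phi * Phi ** (n - 1)
    return total * h

def sudakov_ratio(n):
    """Ratio of E max X_i to inf ||X_s - X_t||_2 * sqrt(log n), here sqrt(2 log n)."""
    return expected_max_iid_gaussians(n) / (math.sqrt(2.0) * math.sqrt(math.log(n)))
```

In this special case the ratio stays bounded away from zero as $n$ grows, which is the content of (6.2.7) for an orthogonal family.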

Now let $g = \{g_n, n \ge 1\}$ be a sequence of i.i.d. $N(0,1)$ distributed random variables defined on a joint probability space of $(X, \mathcal{A}, \mu)$, which we denote by $(\Omega, \mathcal{B}, \mathbf{P})$. To any $f \in L^2(\mu)$ and any finite subset $E$ of $\mathbf{N}$, we associate the Gaussian sequence
$$F_{E,f} = \frac{1}{\sqrt{\#(E)}} \sum_{j \in E} g_j T_j(f). \tag{6.2.8}$$
When $E = \{1, 2, \dots, J\}$ we will write more simply $F_{E,f} = F_{J,f}$. The following comparison lemma is the key for proving Theorems 6.1.1 and 6.1.4.

6.2.2 Lemma. Let $S_n : L^2(\mu) \to L^2(\mu)$, $n = 1, 2, \dots$ be continuous operators verifying (H1) and such that $S_n(L^\infty(\mu)) \subset L^\infty(\mu)$. Let $f \in L^\infty(\mu)$; let also $I$ be a finite subset of positive integers such that $\|S_n(f) - S_m(f)\|_{2,\mu} \ne 0$ for all $n, m \in I$ with $m \ne n$. Then for any $0 < \varepsilon < 1$ and any index $\mathcal{J}_0$, there exists a subindex $\mathcal{J}$ such that if
$$A(I) = \Big\{ \forall J \in \mathcal{J},\ \forall n, m \in I,\ m \ne n,\quad \frac{\|S_n(F_{J,f}) - S_m(F_{J,f})\|_{2,\mathbf{P}}}{\|S_n(f) - S_m(f)\|_{2,\mu}} \ge \sqrt{1 - \varepsilon} \Big\},$$
then
$$\mu\{A(I)\} \ge \sqrt{1 - \varepsilon}, \tag{6.2.9}$$
and for any positive increasing convex function $G$ on $\mathbf{R}_+$, any $J \in \mathcal{J}$:
$$\sqrt{1 - \varepsilon}\ \mathbf{E}\, G\Big(\sqrt{1 - \varepsilon} \sup_{n,m \in I} \big(Z(S_n(f)) - Z(S_m(f))\big)\Big) \le \int \mathbf{E}\, G\Big(\sup_{n,m \in I} \big(S_n(F_{J,f}) - S_m(F_{J,f})\big)\Big)\, d\mu. \tag{6.2.10a}$$
In particular, for any $J \in \mathcal{J}$,
$$(1 - \varepsilon)\, \mathbf{E} \sup_{n \in I} Z(S_n(f)) \le \int \mathbf{E} \sup_{n \in I} S_n(F_{J,f})\, d\mu. \tag{6.2.10b}$$


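Proof. We give the proof when $\mathcal{J}_0 = \{1, 2, \dots\}$, the case of an arbitrary index $\mathcal{J}_0$ presenting no additional difficulty. Let $0 < \varepsilon < 1$ be fixed. Let $f \in L^\infty(\mu)$. By assumption, the operators $S_n$ and $T_j$ are commuting; thus $S_n(F_{J,f}) = F_{J,S_n(f)}$. Consequently,
$$\|S_n(F_{J,f}) - S_m(F_{J,f})\|_{2,\mathbf{P}}^2 = \frac{1}{J} \sum_{j \le J} \big[T_j(S_n(f) - S_m(f))\big]^2.$$
Lemma 6.2.1 and assumption (H1) allow us to write
$$\|S_n(F_{J,f}) - S_m(F_{J,f})\|_{2,\mathbf{P}}^2 = \frac{1}{J} \sum_{j \le J} T_j\big(S_n(f) - S_m(f)\big)^2 \xrightarrow{\ L^1(\mu)\ } \|S_n(f) - S_m(f)\|_{2,\mu}^2,$$
as $J$ tends to infinity. Fix $n, m \in I$, $n \ne m$. We can thus define an index $\mathcal{J} = \{J_k, k \ge 1\}$ such that
$$\forall k \ge 1, \quad \Big\| \frac{1}{J_k} \sum_{j \le J_k} T_j\big(S_n(f) - S_m(f)\big)^2 - \|S_n(f) - S_m(f)\|_{2,\mu}^2 \Big\|_{1,\mu} \le 2^{-2k}.$$
Therefore,
$$\forall k \ge 1, \quad \mu\Big\{ \Big| \frac{1}{J_k} \sum_{j \le J_k} T_j\big(S_n(f) - S_m(f)\big)^2 - \|S_n(f) - S_m(f)\|_{2,\mu}^2 \Big| \ge 2^{-k} \Big\} \le 2^{-k}.$$
Let $L \ge 1$ be an integer such that $2^{-L-1} \le \varepsilon \|S_n(f) - S_m(f)\|_{2,\mu}^2$. Then, for any $k > L$,
$$\|S_n(f) - S_m(f)\|_{2,\mu}^2 - 2^{-k} \ge (1 - \varepsilon) \|S_n(f) - S_m(f)\|_{2,\mu}^2,$$
and consequently,
$$\mu\Big\{ \forall k > L,\ \|S_n(F_{J_k,f}) - S_m(F_{J_k,f})\|_{2,\mathbf{P}} \ge \sqrt{1 - \varepsilon}\, \|S_n(f) - S_m(f)\|_{2,\mu} \Big\}$$
$$\ge \mu\Big\{ \forall k > L,\ \Big| \frac{1}{J_k} \sum_{j \le J_k} T_j\big(S_n(f) - S_m(f)\big)^2 - \|S_n(f) - S_m(f)\|_{2,\mu}^2 \Big| \le 2^{-k} \Big\} \ge 1 - \sum_{k > L} 2^{-k} = 1 - 2^{-L}.$$
We write $\mathcal{J}(m,n) = \{J_k, k > L\}$. We have thus shown
$$\mu\Big\{ \forall J \in \mathcal{J}(m,n),\ \frac{\|S_n(F_{J,f}) - S_m(F_{J,f})\|_{2,\mathbf{P}}}{\|S_n(f) - S_m(f)\|_{2,\mu}} \ge \sqrt{1 - \varepsilon} \Big\} \ge 1 - 2^{-L}.$$
Let $(m', n')$, $m' \ne n'$, be another pair of elements of $I$. Let also $L'$ be some sufficiently large positive integer. Since
$$\lim_{J \to \infty,\ J \in \mathcal{J}(m,n)} \Big\| \frac{1}{J} \sum_{j \le J} T_j\big(S_{n'}(f) - S_{m'}(f)\big)^2 - \|S_{n'}(f) - S_{m'}(f)\|_{2,\mu}^2 \Big\|_{1,\mu} = 0,$$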


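by the preceding reasoning we can extract from $\mathcal{J}(m,n)$ another index $\mathcal{J}(m',n')$ such that
$$\mu\Big\{ \forall J \in \mathcal{J}(m',n'),\ \frac{\|S_{n'}(F_{J,f}) - S_{m'}(F_{J,f})\|_{2,\mathbf{P}}}{\|S_{n'}(f) - S_{m'}(f)\|_{2,\mu}} \ge \sqrt{1 - \varepsilon} \Big\} \ge 1 - 2^{-L'}.$$
Proceeding then by successive iterations, we can define for a convenient choice of integers $L, L', \dots$ an index $\mathcal{J} = \mathcal{J}(I, \varepsilon)$ such that if
$$A(I) = \Big\{ \forall J \in \mathcal{J},\ \forall n, m \in I,\ m \ne n,\quad \frac{\|S_n(F_{J,f}) - S_m(F_{J,f})\|_{2,\mathbf{P}}}{\|S_n(f) - S_m(f)\|_{2,\mu}} \ge \sqrt{1 - \varepsilon} \Big\},$$
then $\mu\{A(I)\} \ge \sqrt{1 - \varepsilon}$.

Along this index, we thus have by virtue of (6.2.6),
$$(1 - \varepsilon)\, \mathbf{E} \sup_{n \in I} Z(S_n(f)) \le \sqrt{1 - \varepsilon} \int_{A(I)} \mathbf{E} \sup_{n \in I} Z(S_n(f))\, d\mu \le \int_{A(I)} \mathbf{E} \sup_{n \in I} S_n(F_{J,f})\, d\mu \le \int_X \mathbf{E} \sup_{n \in I} S_n(F_{J,f})\, d\mu,$$
since $\mu\{\mathbf{E} \sup_{n \in I} S_n(F_{J,f}) \ge 0\} = 1$. This establishes (6.2.10b). As for (6.2.10a), inequality (6.2.5) and the fact that $\mu\{A(I)\} \ge \sqrt{1 - \varepsilon}$ show similarly
$$\int_X \mathbf{E}\, G\Big(\sup_{n,m \in I} \big(S_n(F_{J,f}) - S_m(F_{J,f})\big)\Big) d\mu \ge \int_{A(I)} \mathbf{E}\, G\Big(\sup_{n,m \in I} \big(S_n(F_{J,f}) - S_m(F_{J,f})\big)\Big) d\mu$$
$$\ge \int_{A(I)} \mathbf{E}\, G\Big(\sqrt{1 - \varepsilon} \sup_{n,m \in I} \big(Z(S_n(f)) - Z(S_m(f))\big)\Big) d\mu \ge \sqrt{1 - \varepsilon}\ \mathbf{E}\, G\Big(\sqrt{1 - \varepsilon} \sup_{n,m \in I} \big(Z(S_n(f)) - Z(S_m(f))\big)\Big).$$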

This achieves the proof of Lemma 6.2.2.

Two elementary estimates for Gaussian variables (see Chapter 10) will now be necessary. We recall them for convenience: if $R(x) = e^{x^2/2} \int_x^\infty e^{-t^2/2}\, dt$ (Mills' ratio), then for any $x \ge 0$,
$$\frac{2}{\sqrt{x^2 + 4} + x} \le R(x) \le \frac{2}{\sqrt{x^2 + \frac{8}{\pi}} + x} \le \sqrt{\frac{\pi}{2}}. \tag{6.2.11}$$
It follows that for any standard Gaussian random variable $g$, any $T > 0$,
$$\mathbf{E}\, g^2\, 1_{(|g| \ge T)} \le 6 e^{-T^2/4}. \tag{6.2.12}$$
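The two Gaussian estimates (6.2.11) and (6.2.12) are easy to check numerically. The sketch below is an illustration only: the function names are hypothetical, Mills' ratio is expressed through the complementary error function, and the truncated second moment uses the closed form $\mathbf{E}\, g^2 1_{(|g| \ge T)} = \sqrt{2/\pi}\, T e^{-T^2/2} + \mathrm{erfc}(T/\sqrt{2})$, obtained by one integration by parts.

```python
import math

def mills_ratio(x):
    """R(x) = exp(x^2 / 2) * integral from x to infinity of exp(-t^2 / 2) dt."""
    return math.exp(0.5 * x * x) * math.sqrt(math.pi / 2.0) * math.erfc(x / math.sqrt(2.0))

def mills_lower(x):
    """Left-hand side of (6.2.11)."""
    return 2.0 / (math.sqrt(x * x + 4.0) + x)

def mills_upper(x):
    """Middle upper bound of (6.2.11)."""
    return 2.0 / (math.sqrt(x * x + 8.0 / math.pi) + x)

def truncated_second_moment(t):
    """E g^2 1{|g| >= t} for a standard Gaussian g, in closed form."""
    return math.sqrt(2.0 / math.pi) * t * math.exp(-0.5 * t * t) + math.erfc(t / math.sqrt(2.0))
```

At $x = 0$ the three right-hand quantities in (6.2.11) coincide with $\sqrt{\pi/2}$, so the upper bounds are sharp there.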

6.3 Proof of Theorem 6.1.1

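Let $f \in L^\infty(\mu)$ be such that $\|f\|_{2,\mu} \le 1$. By using Lemma 6.2.1 and moment properties of Gaussian random variables, we get
$$\mathbf{E} \int |F_{J,f}|^p\, d\mu = \int \mathbf{E} |F_{J,f}|^p\, d\mu \le C_p^p \int \big(\mathbf{E} |F_{J,f}|^2\big)^{p/2} d\mu = C_p^p \int \Big(\frac{1}{J} \sum_{j \le J} T_j f^2(x)\Big)^{p/2} d\mu(x).$$
Here $C_p$ depends on $p$ only. By assumption
$$\lim_{J \to \infty} \Big\| \frac{1}{J} \sum_{j \le J} T_j f^2 - \|f\|_{2,\mu}^2 \Big\|_{1,\mu} = 0.$$
Along some increasing subsequence of integers, say $\mathcal{J}_0$, $\frac{1}{J} \sum_{j \le J} T_j f^2(x)$ thus converges to $\|f\|_{2,\mu}^2$ for almost all $x$. Since $f \in L^\infty(\mu)$, it follows that for $J \in \mathcal{J}_0$, $\big(\frac{1}{J} \sum_{j \le J} T_j f^2(x)\big)^{p/2}$ is a bounded sequence converging almost surely to $\|f\|_{2,\mu}^p$, and we may apply the dominated convergence theorem. Therefore
$$\limsup_{\mathcal{J}_0 \ni J \to \infty} \mathbf{E} \int |F_{J,f}|^p\, d\mu \le C_p^p\, \|f\|_{2,\mu}^p.$$
Extracting if necessary from $\mathcal{J}_0$ another subsequence, which we call again $\mathcal{J}_0$, and using Jensen's inequality, we may thus conclude that
$$\mathbf{E} \|F_{J,f}\|_{p,\mu} \le 2 C_p \|f\|_{2,\mu}, \qquad \forall J \in \mathcal{J}_0. \tag{6.3.1}$$
Further by Lemma 6.2.2, for any $0 < \varepsilon < 1$ there exists an index $\mathcal{J} \subseteq \mathcal{J}_0$ such that for any $J \in \mathcal{J}$,
$$(1 - \varepsilon)\, \mathbf{E} \sup_{n \in I} Z(S_n(f)) \le \int \mathbf{E} \sup_{n \in I} S_n(F_{J,f})\, d\mu. \tag{6.3.2}$$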


Let 0 < ε < 1 and put u0 = 0, un = ε(1 + ε)n−1 n = 1, 2, . . . . Write E

∞ sup Sn (FJ,f ) dμ = E n∈I

≤

1uk−1 ≤ FJ,f p,μ 1 be fixed. By extracting if necessary another index, we obtain

∀J ∈ J,

E

F A,J 22,μ

A2 ≤ (1 + α) exp − . 4

(6.4.9)

Integrating then inequality (6.4.5) with respect to P allows us to deduce from (6.4.4) and (6.4.9) that for any J ∈ J, 2 A2 (1 + α)#(I )S(I, 2) exp − + E sup |Sn (FA,J )| dμ. 8 n∈I n∈I (6.4.10) In order to estimate E sup |Sn (FA,J )| dμ, γ E sup Z(Sn (f )) ≤

n∈I

a fine evaluation of E exp a FA,J 22,μ where a = E exp

a FA,J 22,μ

1 4α

will be necessary. At first,

2 = E exp a FA,J dμ X

and, by means of Jensen’s inequality, we may continue as follows:

2 2 2 ≤E exp(aFA,J ) dμ ≤ E exp aFA,J dμ + eaA μ(Bαc ), X

Bα

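where the set $B_\alpha$ will be made explicit later on. We already know that $\frac{1}{J} \sum_{j \le J} T_j f^2$ converges in $L^1(\mu)$ and almost everywhere to $\int f^2\, d\mu = 1$, as $J$ tends to infinity along the index $\mathcal{J}$. For what follows, it will be necessary to make this a bit more precise. Let $\delta_k = \delta 2^{-k}$, $k \ge 1$, where $0 < \delta < \inf(\alpha - 1, 1)$ will be defined later on. We can thus extract from the index $\mathcal{J}$ a sequence $\{J_k, k \ge 1\}$ such that
$$\mu\Big\{ \Big| \frac{1}{J_k} \sum_{j \le J_k} T_j f^2 - 1 \Big| > \delta_k \Big\} \le \delta_k.$$
Put
$$\check{B}_\delta = \Big\{ \forall k \ge 1,\ \Big| \frac{1}{J_k} \sum_{j \le J_k} T_j f^2 - 1 \Big| \le \delta_k \Big\}.$$
Then $\mu(\check{B}_\delta) \ge 1 - \sum_{k=1}^\infty \delta_k = 1 - \delta$, and
$$\check{B}_\delta \subset B_\alpha := \Big\{ \forall k \ge 1,\ \frac{1}{J_k} \sum_{j \le J_k} T_j f^2 < \alpha \Big\}.$$
We have thus $\mu(B_\alpha) \ge 1 - \delta$. And on $B_\alpha$,
$$1 - 2a\, \frac{1}{J_k} \sum_{j \le J_k} T_j f^2 > 1 - 2a\alpha = \frac{1}{2},$$
for any $k \ge 1$. Thus,
$$\int_{B_\alpha} \frac{d\mu}{\sqrt{1 - 2a\, \frac{1}{J_k} \sum_{j \le J_k} T_j f^2}} \le \sqrt{2}.$$
As for any $0 \le b < \frac{1}{2}$, $\mathbf{E} \exp\big(b\, N(0,1)^2\big) = \frac{1}{\sqrt{1 - 2b}}$, we have the estimate
$$\int_{B_\alpha} \mathbf{E} \exp\big(a F_{A,J}^2\big)\, d\mu \le \int_{B_\alpha} \mathbf{E} \exp\big(a F_J^2\big)\, d\mu = \int_{B_\alpha} \frac{d\mu}{\sqrt{1 - 2a\, \frac{1}{J_k} \sum_{j \le J_k} T_j f^2}} \le \sqrt{2}.$$
Hence
$$\mathbf{E} \exp\big(a \|F_{A,J_k}\|_{2,\mu}^2\big) \le \sqrt{2} + e^{aA^2} \mu(B_\alpha^c) \le \sqrt{2} + \delta e^{A^2 a}.$$
The extracted subsequence $\{J_k, k \ge 1\}$ relies upon $\delta$. We choose $\delta < (\alpha - 1) e^{-A^2/4\alpha}$. Denote again by $\mathcal{J}$ the sequence $\{J_k, k \ge 1\}$. Then $\mathcal{J}$ relies upon $A$ and $\alpha$, and for any $J$ in $\mathcal{J}$ we have
$$\mathbf{E} \exp\big(a \|F_{A,J}\|_{2,\mu}^2\big) \le \sqrt{2} + \alpha - 1. \tag{6.4.11}$$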
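The identity $\mathbf{E} \exp(b\, N(0,1)^2) = (1 - 2b)^{-1/2}$ for $0 \le b < \frac12$, which drives the bound by $\sqrt{2}$ above, can be confirmed by direct numerical integration. This is a minimal sketch; the function name and the integration window are hypothetical choices made for the illustration.

```python
import math

def gaussian_exp_moment(b, half_width=60.0, steps=120000):
    """Trapezoidal approximation of E exp(b * g^2) for g ~ N(0,1), 0 <= b < 1/2."""
    if not 0.0 <= b < 0.5:
        raise ValueError("the moment is finite only for 0 <= b < 1/2")
    h = 2.0 * half_width / steps
    c = 0.5 - b  # exp(b x^2) * exp(-x^2 / 2) = exp(-c x^2)
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-c * x * x)
    return total * h / math.sqrt(2.0 * math.pi)
```

With $b = \frac14$ the closed form gives exactly $\sqrt{2}$, the value reached when $2a\alpha = \frac12$.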

253

6.4 Proof of Theorem 6.1.6

We now evaluate the quantity E supn∈I |Sn (FA,J )| dμ by considering separately the two integrals E sup |Sn (FA,J )| dμ and E sup |Sn (FA,J )| dμ. Bαc n∈I

Bα n∈I

The first integral can be bounded for any R > 0 by

E sup |Sn (FA,J )| 1{ FA,J 2,μ >R} dμ+ E sup |Sn (FA,J )| 1{ FA,J 2,μ ≤R} dμ. Bα

Bα

n∈I

n∈I

As concerns the second, using the Cauchy–Schwarz inequality gives 1 E sup |Sn (FA,J )| dμ ≤ μ(Bαc ) 2 · E sup |Sn (FA,J )| 2,μ Bαc n∈I

n∈I

≤ (α − 1)1/2 e−A ≤ (α − 1)

2 /8α

1/2 −A2 /8α

e

2 2

#(I )S(I, 2)E FA,J 2,μ #(I )S(I, 2).

Now return to the first term and observe that

Sn (FA,J ) ∞ ≤ S(I, ∞) FA,J ∞ ≤ S(I, ∞)A. Estimate (6.4.11) and the fact that Sn is continuous on L∞ (μ) allows us to bound

E sup |Sn (FA,J )| 1{ FA,J 2,μ >R} dμ Bα

n∈I

by

2 AS(I, ∞)P FA,J 2,μ > R ≤ AS(I, ∞)e−aR E exp a FA,J 22,μ 2 √ ≤ AS(I, ∞)e−aR ( 2 + α − 1). Consider the second integral. Here it is much easier, because we have the straightforward bound

R E sup |Sn (FA,J )| 1{ FA,J 2,μ ≤R} dμ ≤ A∞,2 S, I, . A Bα n∈I By combining all these estimates and returning to the initial inequality, we see that we have arrived at 2 A2 γ E sup Z(Sn (f )) ≤ S(I, 2) (1 + α)#(I ) exp − 8 n∈I 2 2 + S(I, 2)(α − 1)1/2 e−A /8α #(I ) (6.4.12) √ R 2 . + ( 2 + α − 1)AS(I, ∞)e−aR + A∞,2 S, I, A


In this last inequality, J has disappeared. We were free to choose α > 1, but as close to 1 as we wish, which we do now. By letting also γ tend to 1, we have thus obtained 2 √ R 2 2 E sup Z(Sn (f )) ≤ 2#(I )S(I, 2)e−A /8 + 2·AS(I, ∞)e−R /4 +A∞,2 S, I, . A n∈I (6.4.13) This last inequality being satisfied for any f ∈ L∞ (μ) such that f 2,μ = 1, we easily deduce the claimed result by continuity in quadratic mean of Z.

6.5 The case $L^p$, $1 < p < 2$

Let $(X, \mathcal{A}, \mu)$ be some probability space. Let $1 < p \le 2$ and denote by $q$ its conjugate: $\frac{1}{p} + \frac{1}{q} = 1$. Consider a sequence $\{S_n, n \ge 1\}$ of continuous operators from $L^p(\mu)$ to $L^p(\mu)$, and assume that the almost sure boundedness property
$$\mu\{S^* f < \infty\} = 1 \qquad \text{for all } f \in L^r(\mu) \tag{$B_r$}$$
is fulfilled for some $r < p$. Can we again deduce an entropy criterion similar to Corollary 6.1.2? The following theorem ([Weber: 1993b], Theorem 1.4) shows that the answer is affirmative, but the proof will depend this time on more delicate properties of $p$-stable processes, instead of those of Gaussian processes used till now.

6.5.1 Theorem (Third entropy criterion). Let $1 < p \le 2$ with conjugate $q$. Consider a sequence $\{S_n, n \ge 1\}$ of continuous operators from $L^p(\mu)$ to $L^p(\mu)$. Assume that there exists an ergodic endomorphism $\tau$ of the measure space $(X, \mathcal{A}, \mu)$ commuting with each $S_n$. Assume also that for some real $0 < r < p$, property $(B_r)$ is fulfilled. Then there exists a constant $C(r,p) < \infty$ depending on $r$ and $p$ only, such that for any $f \in L^p(\mu)$,
$$\sup_{\varepsilon > 0} \varepsilon \big(\log N_f^p(\varepsilon)\big)^{1/q} \le C(r,p) \|f\|_p, \tag{6.5.1}$$
where $N_f^p(\varepsilon)$ is the minimal number of open $L^p$-balls of radius $\varepsilon$, centered in $C_f$ and enough to cover it. Further $C(r,p)$ tends to infinity as $r$ tends to $p$.

Proof. Let $T$ be the operator associated to $\tau$ through the relation $Tf = f \circ \tau$. We shall replace the Gaussian elements by stable ones. Let $\{\theta_i, i \ge 1\}$ be a sequence of i.i.d. symmetric, $p$-stable random variables of parameter 1 ([Petrov: 1975], [Mijnheer: 1975]). For any $f \in L^p(\mu)$, any positive integer $J$ and any $x \in X$ and $(\omega, \omega') \in \Omega \times \Omega'$, we put
$$F_{f,J}(x) = \frac{1}{J^{1/p}} \sum_{j \le J} \theta_j T^j f(x). \tag{6.5.2}$$
Then $F_J = \{F_{f,J}(x), x \in X\}$ is a $p$-stable random function with spectral measure
$$m = \sum_{j \le J} \delta_{T^j f}.$$

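One can represent $F_J$ as a random mixture of Gaussian random functions; this is a classical fact from $p$-stable random functions. More precisely, there exist a sequence $\{g_i, i \ge 1\}$ of i.i.d. $N(0,1)$ random variables with basic probability space $(\Omega, \mathcal{A}, \mathbf{P})$ and a sequence $\{\eta_j, j \ge 1\}$ of i.i.d. nonnegative random variables with basic probability space $(\Omega', \mathcal{A}', \mathbf{P}')$ such that the random function $H_J$ defined by
$$H_{J,f}(\omega, \omega', x) = \frac{1}{J^{1/p}} \sum_{j \le J} \eta_j(\omega') g_j(\omega) T^j f(x)$$
has the same distribution as $F_J$. See Remark 1.8 in [Marcus–Pisier: 1984] for this fact. We denote in what follows $\tilde{\mathbf{P}} = \mathbf{P} \otimes \mathbf{P}'$. Observe also for any $r < p$,
$$\mathbf{E} |F_J|^r = \big(\mathbf{E} |\theta_1|^r\big) \Big(\frac{1}{J} \sum_{j \le J} |T^j f|^p\Big)^{r/p},$$
since
$$\frac{1}{J^{1/p}} \sum_{j \le J} \theta_j T^j f \overset{\mathcal{D}}{=} \theta_1 \Big(\frac{1}{J} \sum_{j \le J} |T^j f|^p\Big)^{1/p}.$$
Let $f \in L^\infty(\mu)$. In view of Birkhoff's theorem, as well as the dominated convergence theorem, we get
$$\lim_{J \to \infty} \int \Big(\frac{1}{J} \sum_{j \le J} |T^j f|^p\Big)^{r/p} d\mu = \Big(\int |f|^p\, d\mu\Big)^{r/p}.$$
And so for any $J$ large enough, $\mathbf{E} \int |F_J|^r\, d\mu \le 2^r \big(\mathbf{E} |\theta_1|^r\big) \|f\|_p^r$. Thus for any $r < p$ and $J$ large enough,
$$\|F_J\|_{r,\mu \times \tilde{\mathbf{P}}} \le 2 \|\theta_1\|_r \|f\|_{p,\mu}.$$
Besides, from the Banach principle and the assumptions made, we also observe that for any $\varepsilon > 0$, any $J$ large enough, there exist a measurable set $X_\varepsilon^J \subset X$ with $\mu(X_\varepsilon^J) \ge 1 - \sqrt{\varepsilon}$, and a real $C(\varepsilon)$ such that for all $x \in X_\varepsilon^J$,
$$\tilde{\mathbf{P}}\Big\{ \sup_{n \ge 1} |S_n(F_{J,f})| \le C(\varepsilon) \|\theta_1\|_r \|f\|_{p,\mu} \Big\} \ge 1 - 2\sqrt{\varepsilon}. \tag{6.5.3}$$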

Hence √ √ P˜ ω : P sup |Sn (FJ,f (ω, ω , x))| ≤ C(ε) θ1 r f p,μ ≥ 1 − ε ≥ 1 − 3 ε. n≥1

(6.5.4)


We denote by EP the expectation symbol with respect to P. Using now estimate (10.2.2) for Gaussian semi-norms, we show that on XεJ , for any 0 < ε < 1/4, √ 4C(ε) √ θ1 r f p,μ . 1 − ε ≤ P ω : EP sup |Sn (FJ,f ( ·, ω , x))| ≤ 1− (6.5.5) ε n≥1

Consider the p-stable sequence of random variables defined by Sn (FJ,f ) =

1 J

1 p

θj Sn (T j (f )),

n ≥ 1,

j ≤J

and also equal, thanks to the commutation assumption, to

1 J 1/p

θj T j (Sn (f )),

n ≥ 1.

j ≤J

This sequence has thus the same distribution function as the p-stable random function HJ (n) =

1 J 1/p

ηj gj T j (Sn (f )),

n ≥ 1.

j ≤J

Introduce the Gaussian distance on N, $

2 1 EP HJ (n) − HJ (m) dJ,ω ,x (n, m) = 2

%1/2

,

p

as well as the metric associated to HJ , 1/p 1 p , |β(n) − β(m)| dmHJ (β) dJ,x (n, m) = 2 where mHJ denotes the spectral measure of HJ . For any finite subset A ⊂ N, any metric d on N, any ε > 0, we denote by N(A, d, ε) the minimal number of d-balls centered in A and enough to cover A. Moreover let σ (A, d, n) be the smallest ε > 0 such that A can be covered with at most n d-balls centered in A. By Lemma 2.1 in [Marcus–Pisier: 1984], there exists a measurable set 0 with P (0 ) > 21 , in fact the computations made show that the probability can be as close to one assume for only convenience reasons that P (0 ) > √ as we wish, and we shall 1 − ε, such that for any ω ∈ 0 and any positive integer n, σ (N, dJ,ω ,x , n) ≥ β(p)

σ (N, dJ,x , n) 1

(log(n + 1)) q

− 21

,

(6.5.6)

where β(p) depends on p only. We deduce from (6.5.6), as well as (6.5.5) and Sudakov’s minoration (6.2.7) that for any x ∈ XεJ ,

1/q 4C(ε) , √ θ1 r f p,μ ≥ γ (p) sup δ log N (N, dJ,x , δ) 1− ε δ>0

(6.5.7)


where γ (p) > 0. Let I be a finite subset of N such that for any n, m ∈ I with m = n,

Sn (f ) − Sm (f ) 2,μ = 0. In view of the assumptions made, we can find a partial index J depending on I such that √ μ ∀j ∈ J, ∀n, m ∈ I, dJ,x (n, m) ≥ δ(p) (Sn − Sm )(f ) p,μ ≥ 1 − ε (6.5.8) where δ(p) > 0. By combining (6.5.7) and (6.5.8) we get

1/q , C(ε) θ1 r f p,μ ≥ ε(p) sup δ log N (I, · p,μ , δ)

(6.5.9)

δ>0

where $\varepsilon(p) > 0$. We conclude by letting $I$ increase to $\mathbf{N}$. We deduce the claimed result for any $f \in L^p(\mu)$ by proceeding by approximation.

6.5.2 Remarks. The conjugacy lemma allows us to get stronger criteria for matrix summation methods defined on general dynamical systems. Let $(X, \mathcal{A}, \mu)$ be a Lebesgue space and denote by $\mathcal{T}$ the group of automorphisms on $(X, \mathcal{A}, \mu)$. Let $A = \{a_{n,k}, 1 \le k \le N_n, n \ge 1\}$, $N_n$ an increasing sequence of positive integers, be an infinite matrix of real numbers. Define $a_n = \{a_{n,k}, k \ge 1\}$, $n \ge 1$, and assume that the following regularity assumptions are fulfilled:
$$\text{i)}\ \ \|A\| = \sup_{n \ge 1} \|a_n\|_1 < \infty, \qquad \text{ii)}\ \ \lim_{n \to \infty} \sum_{k=1}^{N_n} a_{n,k} = 1. \tag{6.5.10}$$
Put for every $T \in \mathcal{T}$, every $f \in L^p(\mu)$,
$$S_n^T(f) = \sum_{k=1}^{N_n} a_{n,k}\, f \circ T^k, \qquad n = 1, 2, \dots. \tag{6.5.11}$$
Suppose there exists an ergodic operator $T$ such that the sequence of operators $\{S_n^T, n \ge 1\}$ verifies property $(B_p)$, for some $2 \le p < \infty$. Note that the commutation assumption (H2) is automatically satisfied since $T$ is ergodic. Then (Weber [1993a: Theorem 7.7-8])
$$A \text{ is a GB set of } \ell^2, \tag{6.5.12}$$
and the first entropy criterion for instance can be strengthened as follows:
$$\sup_{S \in \mathcal{T}}\ \sup_{f \in L^p(\mu),\ \|f\|_{2,\mu} \le 1} \mathbf{E} \sup_{n \ge 1} |Z(S_n^S(f))| < \infty. \tag{6.5.13}$$
Let us prove (6.5.12) first. By means of Kakutani–Rochlin's lemma (7.2.2), for any $\varepsilon > 0$, any $N \ge 0$, there exists a measurable set $A$ such that $A, TA, \dots, T^{N-1}A$ are pairwise disjoint and $1 - \varepsilon \le N\mu(A) \le 1$. We set $f = 1_A$. Let $n, m$ be such that $N_n \le N_m \le N$. Then, the functions $f \circ T^k$ having pairwise disjoint supports,
$$\|S_n^T(f) - S_m^T(f)\|_{2,\mu} = \Big\| \sum_{k=1}^{N_n} (a_{n,k} - a_{m,k})\, f \circ T^k - \sum_{k=N_n+1}^{N_m} a_{m,k}\, f \circ T^k \Big\|_{2,\mu}$$
$$= \Big( \sum_{k=1}^{N_n} (a_{n,k} - a_{m,k})^2 + \sum_{k=N_n+1}^{N_m} a_{m,k}^2 \Big)^{1/2} \sqrt{\mu(A)} = \|a_n - a_m\|_2 \sqrt{\mu(A)}.$$

By the first entropy criterion, 2 E sup Z(an ) ≤ C μ(A). n:Nn λ < ∞.

f 2,μ ≤1 λ≥0

n≥1

And it follows from Theorem 5.4.3 that C := sup sup sup λp μ sup |Snτ f | > λ < ∞. λ≥0 f p ≤1 τ ∈C

Thus

n≥1

sup sup sup |Snτ f | 1 < ∞.

f p ≤1 τ ∈C

n≥1

The claimed inequality now just follows from the same argument used to prove Theorem 6.1.1 and inequalities (6.1.11).

6.6 A remarkable GB set property

One of the easiest consequences of the first entropy criterion is the following: let $(X, \mathcal{A}, \mu, \tau)$ be an ergodic measurable dynamical system. Consider for any $f \in L^1(\mu)$ and any positive integer $n$ the usual ergodic averages
$$A_n^\tau(f) = \frac{1}{n} \sum_{k=0}^{n-1} f \circ \tau^k,$$
and for any $f \in L^2(\mu)$ the subset of $L^2(\mu)$, $C_f = \{A_n^\tau(f), n \ge 1\}$. Then these sets are always GB sets, and in fact even GC sets (see p. 510). In particular
$$\sup_{\delta > 0} \delta \sqrt{\log N_f(\delta)} \le C \|f\|_2.$$
Now let $A$ be a nonempty subset of $L^2(\mu)$ and form $C(A) = \{A_n^\tau(f), n \ge 1, f \in A\}$. Assume that $A$ is a GB set; can we say that $C(A)$ is again a GB set? More precisely:
$$A \text{ is a GB set} \iff C(A) \text{ is a GB set?}$$
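As a concrete illustration of the ergodic averages $A_n^\tau(f)$, here is a minimal sketch for an irrational rotation of the unit interval. The choice of the rotation angle, of the observable $\cos(2\pi x)$ and of the function names is an assumption made for the example, not something taken from the text.

```python
import math

def ergodic_average(f, tau, x, n):
    """A_n^tau f(x) = (1/n) * sum_{k=0}^{n-1} f(tau^k x)."""
    total = 0.0
    point = x
    for _ in range(n):
        total += f(point)
        point = tau(point)
    return total / n

# Illustrative irrational angle (inverse golden ratio).
ALPHA = (math.sqrt(5.0) - 1.0) / 2.0

def rotation(x):
    """The rotation x -> x + ALPHA (mod 1), an ergodic transformation of [0, 1)."""
    return (x + ALPHA) % 1.0

def observable(x):
    """A centered observable: its integral over [0, 1) is zero."""
    return math.cos(2.0 * math.pi * x)
```

For this observable the averages tend to the space mean 0 at rate $O(1/n)$; the set $C_f$ of the text is the collection of all such averages, viewed as functions of $x$.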


This question was solved in [Weber: 1994] in a much more general setting than the simple one of usual ergodic averages, and is the main result of this section. Apart from the fact that we will work with positive operators, it can be viewed as a logical extension of the first entropy criterion, since it is stated under the same assumptions and obviously contains it.

6.6.1 Theorem. Let $2 \le p < \infty$. Let $\{S_n, n \ge 1\}$ be a sequence of positive continuous operators from $L^p(\mu)$ to $L^p(\mu)$, with $S_1 = \text{Identity}$. Assume that there exists a sequence $\{T_j, j \ge 1\}$ of positive isometries of $L^2(\mu)$ with $T_j(1) = 1$ and such that:
(a) $\forall f \in L^\infty(\mu)$, $\displaystyle \lim_{J \to \infty} \Big\| \frac{1}{J} \sum_{j \le J} T_j f - \int f\, d\mu \Big\|_1 = 0$,
(b) $S_n T_j = T_j S_n$.
Assume that property $(B_p)$ is realized. Let $A$ be any nonempty subset of $L^p(\mu)$ and set $C(A) = \{S_n(f), n \ge 1, f \in A\}$. Then the following equivalence holds:
$$A \text{ is a GB set} \iff C(A) \text{ is a GB set.}$$
Further, there exists a constant $C$ such that, $Z$ being the canonical Gaussian process on $L^2(\mu)$, for any subset $A$ of $L^p(\mu)$,
$$\mathbf{E} \sup_{h \in C(A)} Z(h) \le C \Big\{ \inf_{h \in A} \|h\|_{2,\mu} + \mathbf{E} \sup_{h \in A} Z(h) \Big\}. \tag{6.6.1}$$

Remarks. 1. Before giving the proof of this result, some comments are in order. Since $S_1$ is the identity operator on $L^p(\mu)$, it follows that $C(A)$ is a GB set only if $A$ is.
2. Let $\tau$ be some ergodic endomorphism of $(X, \mathcal{A}, \mu)$. Let $A$ be a GB set. By applying the above theorem with the choices $p = 2$, $S_n = A_n^\tau$, $T_j = T^j$ where $T$ is defined by $Tf = f \circ \tau$, and using Birkhoff's theorem, we deduce that $C(A)$ is a GB set of $L^2(\mu)$, which solves in the affirmative the question raised at the beginning of the section.
3. Put for any positive integer $n$,
$$C_n(A) = \underbrace{C(C(\cdots C(A) \cdots))}_{n \text{ times}}.$$

By iterating Theorem 6.6.1 we find that the sets $C_n(A)$ are GB sets. These sets being increasing, let $C^*(A) = \lim_{n \to \infty} C_n(A)$ be their limit. Is $C^*(A)$ again a GB set?

Proof of Theorem 6.6.1. We shall use again the Gaussian elements defined in (6.2.8). We associate to any $f \in L^p(\mu)$ the Gaussian sequence
$$F_{J,f} = \frac{1}{\sqrt{J}} \sum_{j \le J} g_j T_j(f), \qquad J = 1, 2, \dots,$$

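where $g_1, g_2, \dots$ is a sequence of i.i.d. $N(0,1)$ random variables defined on a joint probability space $(\Omega, \mathcal{B}, \mathbf{P})$ of $(X, \mathcal{A}, \mu)$.

Step 1. By means of the Banach principle, there exists a constant $0 < K < \infty$ such that for any $f \in L^p(\mu)$,
$$\mu\Big\{ \sup_{n \ge 1} |S_n(f)| \ge K \|f\|_{p,\mu} \Big\} \le \frac{1}{4}.$$
Thus for any finite subset $A_0$ of $L^p(\mu)$ and any positive integer $J$, we have in view of the positivity assumption of the operators $S_n$,
$$\mu\Big\{ \sup_{f \in A_0} \sup_{n \ge 1} |S_n(F_{J,f})| \ge K \big\| \sup_{f \in A_0} |F_{J,f}| \big\|_{p,\mu} \Big\} \le \mu\Big\{ \sup_{n \ge 1} S_n\big(\sup_{f \in A_0} |F_{J,f}|\big) \ge K \big\| \sup_{f \in A_0} |F_{J,f}| \big\|_{p,\mu} \Big\} \le \frac{1}{4}.$$
By integrating this inequality with respect to $\mathbf{P}$, next applying Fubini's theorem, we obtain
$$\int_X \mathbf{P}\Big\{ \sup_{f \in A_0} \sup_{n \ge 1} |S_n(F_{J,f})| \ge K \big\| \sup_{f \in A_0} |F_{J,f}| \big\|_{p,\mu} \Big\}\, d\mu \le \frac{1}{4}.$$
Let $D \subset X$ be defined by
$$D = \Big\{ \mathbf{P}\Big\{ \sup_{f \in A_0} \sup_{n \ge 1} |S_n(F_{J,f})| \ge K \big\| \sup_{f \in A_0} |F_{J,f}| \big\|_{p,\mu} \Big\} \ge \frac{1}{2} \Big\}.$$
Then
$$\frac{1}{4} \ge \int_D \mathbf{P}\Big\{ \sup_{f \in A_0} \sup_{n \ge 1} |S_n(F_{J,f})| \ge K \big\| \sup_{f \in A_0} |F_{J,f}| \big\|_{p,\mu} \Big\}\, d\mu \ge \frac{1}{2}\, \mu\Big\{ \mathbf{P}\Big\{ \sup_{f \in A_0} \sup_{n \ge 1} |S_n(F_{J,f})| \ge K \big\| \sup_{f \in A_0} |F_{J,f}| \big\|_{p,\mu} \Big\} \ge \frac{1}{2} \Big\}.$$
Thus
$$\mu\Big\{ \mathbf{P}\Big\{ \sup_{f \in A_0} \sup_{n \ge 1} |S_n(F_{J,f})| \ge K \big\| \sup_{f \in A_0} |F_{J,f}| \big\|_{p,\mu} \Big\} \ge \frac{1}{2} \Big\} \le \frac{1}{2},$$
or else
$$\frac{1}{2} \le \mu\Big\{ \frac{1}{2} \le \mathbf{P}\Big\{ \sup_{f \in A_0} \sup_{n \ge 1} |S_n(F_{J,f})| \le K \big\| \sup_{f \in A_0} |F_{J,f}| \big\|_{p,\mu} \Big\} \Big\}.$$
Put
$$E = \Big\{ \sup_{f \in A_0} \sup_{n \ge 1} |S_n(F_{J,f})| \le K \big\| \sup_{f \in A_0} |F_{J,f}| \big\|_{p,\mu} \Big\}, \qquad F = \Big\{ \big\| \sup_{f \in A_0} |F_{J,f}| \big\|_{p,\mu} \le 4\, \mathbf{E} \big\| \sup_{f \in A_0} |F_{J,f}| \big\|_{p,\mu} \Big\}. \tag{6.6.2}$$
As
$$\mathbf{P}\Big\{ \big\| \sup_{f \in A_0} |F_{J,f}| \big\|_{p,\mu} \ge 4\, \mathbf{E} \big\| \sup_{f \in A_0} |F_{J,f}| \big\|_{p,\mu} \Big\} \le \frac{1}{4},$$
we have
$$\mathbf{P}(E \cap F) \le \mathbf{P}\Big\{ \sup_{f \in A_0} \sup_{n \ge 1} |S_n(F_{J,f})| \le 4K\, \mathbf{E} \big\| \sup_{f \in A_0} |F_{J,f}| \big\|_{p,\mu} \Big\},$$
and also
$$\mathbf{P}(E \cap F) \ge \mathbf{P}\Big\{ \sup_{f \in A_0} \sup_{n \ge 1} |S_n(F_{J,f})| \le K \big\| \sup_{f \in A_0} |F_{J,f}| \big\|_{p,\mu} \Big\} - \frac{1}{4}.$$
By means of (6.6.2) we have for any finite subset $A_0$ of $L^p(\mu)$ and any positive integer $J$,
$$\frac{1}{2} \le \mu\Big\{ \frac{1}{4} \le \mathbf{P}\Big\{ \sup_{f \in A_0} \sup_{n \ge 1} |S_n(F_{J,f})| \le 4K\, \mathbf{E} \big\| \sup_{f \in A_0} |F_{J,f}| \big\|_{p,\mu} \Big\} \Big\}. \tag{6.6.3}$$
Consequently, by using estimate (10.2.2) for Gaussian semi-norms we get
$$\frac{1}{2} \le \mu\Big\{ \mathbf{E} \sup_{f \in A_0} \sup_{n \ge 1} |S_n(F_{J,f})| \le 64K\, \mathbf{E} \big\| \sup_{f \in A_0} |F_{J,f}| \big\|_{p,\mu} \Big\}. \tag{6.6.4}$$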

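Step 2. Fix some $0 < \varepsilon < \frac{1}{2}$ and a positive integer $N$. We shall now proceed by approximation. Let $A$ be a finite subset of $L^p(\mu)$ and assume for any $f, g \in A$ and any two distinct integers $k, l$ in $[1, N]$ that
$$\|S_k(f) - S_l(g)\|_{2,\mu} \ne 0, \qquad \|S_k(f)\|_{2,\mu} \ne 0. \tag{6.6.5}$$
To any element $f$ from $A$, a simple function $f^\varepsilon$ can be associated such that
$$\sup_{f \in A} \|f - f^\varepsilon\|_{2,\mu} \le \varepsilon. \tag{6.6.6}$$
Set $A_0 = \{f^\varepsilon, f \in A\}$. The continuity properties of the operators $S_n$ show, for $\varepsilon$ sufficiently small, that (6.6.5) implies for any $f, g \in A$ and any two distinct integers $k, l$ in $[1, N]$ that
$$\|S_k(f^\varepsilon) - S_l(g^\varepsilon)\|_{2,\mu} \ne 0, \qquad \|S_k(f^\varepsilon)\|_{2,\mu} \ne 0. \tag{6.6.7}$$
From the commutation assumption also follows that
$$\|S_k(F_{J,f^\varepsilon}) - S_l(F_{J,g^\varepsilon})\|_{2,\mathbf{P}}^2 = \|F_{J,S_k(f^\varepsilon) - S_l(g^\varepsilon)}\|_{2,\mathbf{P}}^2 = \frac{1}{J} \sum_{j \le J} \big( T_j[S_k(f^\varepsilon) - S_l(g^\varepsilon)] \big)^2.$$
But
$$\big( T_j[S_k(f^\varepsilon) - S_l(g^\varepsilon)] \big)^2 = T_j\big[ (S_k(f^\varepsilon) - S_l(g^\varepsilon))^2 \big],$$
$\mu$-almost surely. Thereby
$$\|S_k(F_{J,f^\varepsilon}) - S_l(F_{J,g^\varepsilon})\|_{2,\mathbf{P}}^2 = \|F_{J,S_k(f^\varepsilon) - S_l(g^\varepsilon)}\|_{2,\mathbf{P}}^2 = \frac{1}{J} \sum_{j \le J} T_j\big[ (S_k(f^\varepsilon) - S_l(g^\varepsilon))^2 \big]. \tag{6.6.8}$$
But the assumptions made on the sequence $\{T_j, j \ge 1\}$ show for any $f, g \in A$ and any $1 \le k, l \le N$ that
$$\lim_{J \to \infty} \Big\| \|S_k(F_{J,f^\varepsilon}) - S_l(F_{J,g^\varepsilon})\|_{2,\mathbf{P}}^2 - \|S_k(f^\varepsilon) - S_l(g^\varepsilon)\|_{2,\mu}^2 \Big\|_{1,\mu} = 0. \tag{6.6.9}$$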

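Proceeding by extraction, one can define a partial index $\mathcal{J} = \{J_q, q \ge 1\}$ such that for any $f, g \in A$, any $1 \le k, l \le N$ and any $q \ge 1$,
$$\Big\| \|S_k(F_{J_q,f^\varepsilon}) - S_l(F_{J_q,g^\varepsilon})\|_{2,\mathbf{P}}^2 - \|S_k(f^\varepsilon) - S_l(g^\varepsilon)\|_{2,\mu}^2 \Big\|_{1,\mu} \le \frac{\varepsilon}{2^{2q} N^2 \#(A)^2}. \tag{6.6.10}$$
Put for any $q \ge 1$,
$$A_q = \Big\{ \sup_{f,g \in A,\ 1 \le k,l \le N} \Big| \|S_k(F_{J_q,f^\varepsilon}) - S_l(F_{J_q,g^\varepsilon})\|_{2,\mathbf{P}}^2 - \|S_k(f^\varepsilon) - S_l(g^\varepsilon)\|_{2,\mu}^2 \Big| \ge 2^{-q} \Big\}, \tag{6.6.11}$$
then we have
$$\mu(A_q) \le \frac{\varepsilon}{2^q}, \qquad \forall q \ge 1. \tag{6.6.12}$$
Put
$$H = \bigcap_{q \ge 1} A_q^c.$$
Then $\mu(H) \ge 1 - \sum_{q=1}^\infty \mu(A_q) \ge 1 - \varepsilon$, and on $H$, for any $f, g \in A$, any $1 \le k, l \le N$ and any $q \ge 1$,
$$\Big| \|S_k(F_{J_q,f^\varepsilon}) - S_l(F_{J_q,g^\varepsilon})\|_{2,\mathbf{P}}^2 - \|S_k(f^\varepsilon) - S_l(g^\varepsilon)\|_{2,\mu}^2 \Big| \le 2^{-q}.$$
Let
$$\theta := \inf_{1 \le k \ne l \le N,\ f,g \in A} \|S_k(f^\varepsilon) - S_l(g^\varepsilon)\|_{2,\mu}.$$
By (6.6.7) we have $\theta > 0$. We can thus define
$$q^* := \inf\Big\{ q \ge 1 : 2^{-q} \le \frac{\theta^2}{4} \Big\}, \qquad \text{and} \qquad \mathcal{J}^* = \{J_q, q \ge q^*\}. \tag{6.6.13}$$
We note that $\mathcal{J}^*$ depends on $\varepsilon$, $N$ and $A$. On $H$, we have for any $J \in \mathcal{J}^*$ and any $f, g \in A$, $1 \le k \ne l \le N$,
$$\Big| \|S_k(F_{J,f^\varepsilon}) - S_l(F_{J,g^\varepsilon})\|_{2,\mathbf{P}}^2 - \|S_k(f^\varepsilon) - S_l(g^\varepsilon)\|_{2,\mu}^2 \Big| \le \frac{1}{4} \|S_k(f^\varepsilon) - S_l(g^\varepsilon)\|_{2,\mu}^2,$$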


hence 1 Sk (f ε ) − Sl (g ε ) ≤ Sk (FJ,f ε ) − Sl (FJ,g ε ) ≤ 2 Sk (f ε ) − Sl (g ε ) . 2,μ 2,P 2,μ 2 (6.6.14) With (6.6.14), we can apply Slepian’s inequality (6.2.6) on the measurable set H . We obtain on H , for any J ∈ J ∗ , E

sup

Z(h) ≤ 2E sup

sup Sn (FJ,f ε ).

(6.6.15)

Combining (6.6.4) with (6.6.15) finally gives: for any J ∈ J ∗ , E sup Z(h) ≤ 128KE sup |FJ,h | p,μ .

(6.6.16)

h∈{Sn (f ε ),1≤n≤N,f ∈A}

f ∈A0 1≤n≤N

h∈{Sn (f ),1≤n≤N,f ∈A0 }

h∈A0

Step 3. We estimate E suph∈A0 |FJ,h | p,μ for any J ∈ J, under the additional assumptions (6.6.5). We will indicate at the end of the proof how to proceed without them. By means of Jensen’s inequality, $ %1/p p E sup |FJ,h | p,μ ≤ E sup |FJ,h | dμ . X h∈A0

h∈A0

The integrability properties of Gaussian laws imply p p

E sup |FJ,h |p ≤ Cp E sup |FJ,h | , h∈A0

h∈A0

where 0 < Cp < ∞ is a constant depending on p only. Thus E sup |FJ,h | p,μ ≤ Cp

$

%p E sup |FJ,h |

X

h∈A0

1/p dμ

.

(6.6.17)

h∈A0

We shall split the integral in the right-hand side of (6.6.17) in two parts, by integrating first over H , next over H c . Examine the contribution produced by the first integration. Fix some h0 in A. The triangle inequality and the symmetry properties of Gaussian laws imply E sup |FJ,h | ≤ E |FJ,hε0 | + E sup |FJ,g−h | = E |FJ,hε0 | + 2E sup FJ,h . h∈A0

h,g∈A0

h∈A0

Integrating then this inequality over H with respect to μ, then applying the Slepian comparison lemma, imply "

p #p E sup |FJ,h | dμ ≤ E |FJ,hε0 | + 2E sup FJ,h dμ H

H

h∈A0

≤ H

h∈A0

p

E |FJ,hε0 | + 4E sup Z(h) h∈A0

dμ.


Hence,

#p

"

E sup |FJ,h |

H

1/p

≤ 1H E |FJ,hε0 | p,μ + 4E sup Z(h).

dμ

h∈A0

But

(6.6.18)

h∈A0

1H E |FJ,hε | p

and

=

p,μ

0

H

p/2 2 1 ε 2 Tj (h0 ) dμ, πJ j ∈J

p/2 p/2 1 ε 2 ε 2 Tj (h0 ) → (h0 ) dμ , J X j ∈J

as J tends to infinity along J, uniformly in x ∈ H . This shows that

#p

"

E sup |FJ,h |

lim sup J →∞ J ∈J

H

1/p dμ

≤ (2/π )1/2 h0 2,μ + ε + 4E sup Z(h).

h∈A0

h∈A0

(6.6.19) Now consider the integration over H c . By means of Jensen’s inequality

"

Hc

#p

E sup |FJ,h |

1/p dμ

h∈A0

2 ≤ B log [1 + #(A0 )]

But

sup H c h∈A0

p/2 1/p 1 2 Tj (h) . J j ∈J

1 2 sup Tj (h) → sup h2 dμ, h∈A0 J h∈A0 X j ∈J

μ-almost surely as J tends to infinity along J. Since the Tj are positive operators and Tj 1 = 1, we get 1 sup Tj (h)2 ≤ sup h 2∞,μ . h∈A0 J h∈A0 j ∈J

By applying the dominated convergence theorem, we obtain lim sup J →∞ J ∈J

Hc

#p

"

E sup |FJ,h | h∈A0

1/p dμ

2 ≤ B[μ(H c )]1/p log [1 + #(A0 )] sup h 2,μ h∈A0

2 # " ≤ B[2ε]1/p log [1 + #(A)] ε + sup h 2,μ . h∈A

(6.6.20)


By combining now estimates (6.6.16), (6.6.19) and (6.6.20), and using subadditivity of the function $\varphi(x) = x^{1/p}$, $x > 0$, we get

E sup Z(h) ≤ 32K h0 2,μ + ε+4E sup Z(h) h∈{Sn (f ),1≤n≤N,f ∈A0 }

h∈A0

2 # " + B log [1 + #(A)] ε + sup h 2,μ (2ε)1/p .

(6.6.21)

h∈A

The finite-dimensional margins of Gaussian vectors being L2 -continuous, it follows that E sup Z(h) ≤ C(ε) + E sup Z(h) , h∈{Sn (f ),1≤n≤N,f ∈A}

h∈{Sn (f ),1≤n≤N,f ∈A0 }

E sup Z(h) ≤ C(ε) + E sup Z(h), h∈A0

h∈A

where 0 < C(ε) < ∞ and limε→0 C(ε) = 0. Hence

Z(h) ≤ C(ε) + 128K h0 2,μ + ε + 4C(ε) E sup h∈{Sn (f ),1≤n≤N,f ∈A}

2 " # + 4E sup Z(h) + B log [1 + #(A)] ε + sup h 2,μ [2ε]1/p . h∈A

h∈A

But ε is arbitrary as well as h0 in A. We therefore conclude

E sup Z(h) ≤ 128K inf h 2,μ + 4E sup Z(h) . h∈{Sn (f ),1≤n≤N,f ∈A}

(6.6.22)

h∈A

(6.6.23)

h∈A

It is now clear that (6.6.23) remains true when the additional assumptions (6.6.5) are no longer fulfilled. Indeed, it suffices to establish (6.6.23) for $A' = \{h \in A : h \ne 0\}$ and $B = \{h \in \{S_n(f), 1 \le n \le N, f \in A\} : S_n(h) \ne 0\}$. The proof is now achieved by letting $A$ increase to some countable $L^2(\mu)$-dense subset of $A$.

Chapter 7

The central limit theorem for dynamical systems

In any aperiodic dynamical system, there exists a square integrable centered function satisfying the central limit theorem (CLT). This is a famous result due to Burton and Denker, and we provide a complete and detailed proof, involving Kakutani–Rochlin’s lemma. Some additional CLT results for orbits of aperiodic dynamical systems are further established. The CLT for various means generated under the action of irrational rotations is proved next. In the case of Gaussian lacunary Fourier series, we study the convergence in variation of the related density distributions to the Gaussian density.

7.1 Introduction and preliminaries

We begin with some elementary and introductory considerations. Let $(X, \mathcal{A}, \mu, \tau)$ be a measurable dynamical system. Recall Theorem 4.1.1.

Birkhoff's pointwise ergodic theorem. For any $f \in L^1(\mu)$, the limit
$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f(\tau^k x) = \bar{f}(x)$$
exists $\mu$-almost everywhere and in $L^1(\mu)$, and we have $\bar{f} = \mathbf{E}\{f \mid \mathcal{J}\}$, where $\mathcal{J} = \sigma\{A \in \mathcal{A} : \tau^{-1}A = A\}$.

A parallel result in probability theory is the well-known

Strong law of large numbers (SLLN). Let $X, X_1, X_2, \dots$ be a sequence of independent, integrable, identically distributed random variables with basic probability space $(\Omega, \mathcal{B}, \mathbf{P})$, and set $S_n = X_1 + \cdots + X_n$. Then
$$\mathbf{P}\Big\{ \lim_{n \to \infty} \frac{S_n}{n} = \mathbf{E}X \Big\} = 1.$$

It is worth noticing that the SLLN is just a very particular case of the pointwise ergodic theorem. And even in the case of sequences of independent, identically distributed random variables, the pointwise ergodic theorem expresses a much stronger property. As is well known, the SLLN is completed by two fundamental results: the law of the iterated logarithm and the central limit theorem. This last result, which concerns the statistic of this convergence, states as follows:


Central limit theorem (CLT). Let $X, X_1, X_2, \dots$ be a sequence of independent, identically distributed random variables with basic probability space $(\Omega, \mathcal{B}, P)$. Assume that $E X = 0$, $E X^2 = 1$. Then
\[ \lim_{n\to\infty} \frac{S_n}{\sqrt{n}} \overset{\mathcal{D}}{=} N(0, 1). \]
In the late 1980s, a companion to this result was found independently by Brosamler, Fisher and Schatte. The following formulation of this result is due to Lacey and Philipp [1990].

Almost sure central limit theorem (ASCLT). Under the same assumptions,
\[ \lim_{N\to\infty} \frac{1}{\log N} \sum_{j=1}^{N} \frac{1}{j}\, \delta_{\{S_j/\sqrt{j}\}} \overset{\mathcal{D}}{=} N(0, 1). \]
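The logarithmic averages of the ASCLT can be explored along a single simulated path. The sketch below is only an illustration under stated assumptions: the steps are fair ±1 signs (so E X = 0, E X² = 1), the measure δ is tested against half-lines ]−∞, x], and the function names are hypothetical. Convergence of logarithmic averages is slow, so the printed values should be read as rough indications only.

```python
import math
import random

def log_average_indicator(x, n_steps, seed=0):
    """(1/log N) * sum_{j<=N} (1/j) * 1{S_j / sqrt(j) <= x} along one simulated +-1 random walk."""
    rng = random.Random(seed)
    s = 0
    acc = 0.0
    for j in range(1, n_steps + 1):
        s += 1 if rng.random() < 0.5 else -1
        if s <= x * math.sqrt(j):
            acc += 1.0 / j
    return acc / math.log(n_steps)

def std_normal_cdf(x):
    """Distribution function of N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

if __name__ == "__main__":
    for x in (-1.0, 0.0, 1.0):
        # compare the pathwise logarithmic average with the Gaussian value
        print(x, log_average_indicator(x, 200000, seed=1), std_normal_cdf(x))
```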

It is natural to ask whether similar results hold for dynamical systems, and under which conditions. Such questions were, and still are, intensively investigated. There are, however, very few fundamental results, alongside many specific ones. The object of this chapter is to present what is probably the most fundamental result in this area: the theorem of Burton and Denker, recently completed by a fine result of Volný. Next, we focus on the central limit theorem and the almost sure central limit theorem for irrational rotations, which are at the heart of the study of dynamical systems. Finally, in the case of Gaussian lacunary Fourier series, we study a very sharp form of the CLT: the convergence in variation, namely the convergence in the spaces L1(R) and L∞(R) of the related density distributions to the Gaussian density. Before entering into the matter, we give some more comments on the ASCLT and its connection with the CLT. By Theorem 1.6 in [Atlagh–Weber: 2000] both properties are equivalent, and thereby equivalent to the moment conditions E X = 0 and E X^2 = 1. The formulation of the ASCLT we gave is, however, only a weak form of a much stronger phenomenon: not only logarithmic averages converge, but also some series. Let s = {sk , k ≥ 0} be an arbitrary sequence of reals. Put for any positive integer n, Yn(s) = Yn =

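2n ≤k 3/2. In view of Kronecker's lemma, (7.1.2) implies, with the choice $s_k = x_k\sqrt{k}$ and assuming that $E X = 0$, $E X^2 = 1$,
\[ \lim_{n\to\infty} \frac{1}{\log n} \sum_{k=1}^{n} \frac{1}{k} \Big( 1_{\{S_k/\sqrt{k} < x_k\}} - P\Big\{ \frac{S_k}{\sqrt{k}} < x_k \Big\} \Big) \overset{\text{a.s.}}{=} 0. \tag{7.1.3} \]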

By using the CLT and letting $x_k \equiv x$ in (7.1.3), we recover the initial formulation of the ASCLT. In fact, property (7.1.2) remains true even in the absence of a CLT. Indeed, let $0 < p < \infty$ and consider the class $\mathcal{F}_p$ of distribution functions F verifying
\[ (\mathcal{F}_p) \qquad F(-x) \vee (1 - F(x)) = O(x^{-p}), \quad x \to +\infty. \]
In the case $p \ge 1$, we moreover assume that F is a centered distribution function: $\int_{-\infty}^{\infty} x\, F(dx) = 0$. Then

A general formulation. Let $X, X_1, X_2, \dots$ be a sequence of independent, identically distributed random variables with basic probability space $(\Omega, \mathcal{B}, P)$. Let F be the distribution function of X. Assume that $F \in \mathcal{F}_2$. Then, property (7.1.2) holds true. Further, for any sequence $\{x_k, k \ge 1\}$ of reals,
\[ \lim_{n\to\infty} \frac{1}{\log n} \sum_{k=1}^{n} \frac{1}{k} P\Big\{ \frac{S_k}{\sqrt{k}} \le x_k \Big\} = c \;\Longrightarrow\; \lim_{n\to\infty} \frac{1}{\log n} \sum_{k=1}^{n} \frac{1}{k} 1_{\{S_k/\sqrt{k} \le x_k\}} \overset{\text{a.s.}}{=} c. \]

The above formulation ([Giuliano–Weber: 2005], Theorem 1.1) of the ASCLT thus appears as the precise form that takes the quasi-orthogonal property of the geometric blocks (7.1.1) in presence of the CLT. Extensions to the case F ∈ Fp , 0 < p < 2 are given in the same paper, modulo some additional assumptions on F .

7.2 A theorem of Burton and Denker

A main purpose of this section is to exhibit a real-valued function f defined on the phase space X of a given aperiodic dynamical system $(X, \mathcal{A}, \mu, T)$ such that the natural long-term ratio satisfies the central limit theorem:
\[ \mu\Big\{ x \in X : \frac{\sum_{j=0}^{n-1} T^j f(x)}{\big\| \sum_{j=0}^{n-1} T^j f \big\|_2} \le u \Big\} \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} \exp\Big( \frac{-v^2}{2} \Big)\, dv \tag{7.2.1} \]
as n → ∞, for all u ∈ R.


Let (X, A, μ, T ) be a measurable dynamical system. We assume here and throughout the section that (X, A, μ) is a Lebesgue space. Let us clarify that a complete finite measure space is called a nonatomic Lebesgue space, if it is isomorphic (mod 0) to the ordinary Lebesgue space ([0, γ ], L([0, γ ]), λ) for some γ > 0. In other words, there exist sets Z1 ⊂ X of measure 0 and Z2 ⊂ [0, γ ], and a measurable bijection ψ : X \ Z1 → [0, γ ] \ Z2 , such that ψ −1 is measurable and μ = λψ. A nonatomic Lebesgue space joined with finitely or countably many point masses of finite total mass is called a Lebesgue space. We will always assume that μ(X) = 1. For basic results concerning Lebesgue spaces we refer for instance to De la Rue [1993]. In addition, we recall that the given T is said to be aperiodic, if μ{x ∈ X : T n x = x} = 0 for all n ≥ 1. It is instructive to observe that in this case the measure space (X, A, μ) is nonatomic. This fact follows straightforwardly by the Poincaré recurrence theorem. It can be also easily verified by using the following well-known and in the sequel useful result, see Halmos [1956]: Kakutani–Rochlin’s lemma. If T is aperiodic, then for every ε > 0 and for every n ≥ 1 there exists F ∈ A such that the sets F, T −1 (F ) . . . T −(n−1) (F ) are mutually disjoint, and such that we have:

\[ \mu\big( F \cup T^{-1}(F) \cup \dots \cup T^{-(n-1)}(F) \big) > 1 - \varepsilon. \tag{7.2.2} \]
Any set $F \in \mathcal{A}$ satisfying the conclusions of (7.2.2) with the given and fixed ε > 0 and n ≥ 1 will be called an (ε, n)-Kakutani–Rochlin set. The essential fact on such sets needed in the Burton–Denker construction is proved in Corollary 7.2.2 below, see also Remark 7.2.3. The next proposition establishes the main step in its proof. As a preliminary fact in this direction, recall a well-known result on Lebesgue spaces due to Rochlin ([Rochlin: 1962], p. 31):

The factor space of a Lebesgue space with respect to a measurable decomposition is a Lebesgue space. (7.2.3)

In particular, consider, as in Proposition 7.2.1 below, a Lebesgue space $(X, \mathcal{A}, \mu)$ and a σ-algebra $\mathcal{B}$ without atom, generated by a countable family $(B_n)_{n\ge 1}$ of elements from $\mathcal{A}$. Let ζ be the decomposition of X generated by $\mathcal{B}$. That is, we introduce the equivalence relation on X by putting $x' \sim x''$ if and only if $1_B(x') = 1_B(x'')$ for all $B \in \mathcal{B}$, and we put $\zeta = \{[x], x \in X\}$ to denote the set of all equivalence classes. Since for any two points $x', x'' \in X$ we have $1_B(x') = 1_B(x'')$ for all $B \in \mathcal{B}$ if and only if $1_{B_n}(x') = 1_{B_n}(x'')$ for all n ≥ 1, we see that ζ is measurable in the sense of Rochlin, see Rochlin [1962] (pp. 4–5, 26). That is, the decomposition ζ is generated by a countable family of measurable sets $(B_n)_{n\ge 1}$. Notice that $\zeta \subset \mathcal{B} \subset \mathcal{A}$. Moreover, since $\mathcal{B}$ is without atom, we have μ(C) = 0 for all C ∈ ζ. Hence we easily find that $(X, \mathcal{A}, \mu)$ is nonatomic. Let $(X_\zeta, \mathcal{A}_\zeta, \mu_\zeta)$ be the factor space of $(X, \mathcal{A}, \mu)$ with respect to ζ. That is, we have $X_\zeta = \{[x], x \in X\}$, $\mathcal{A}_\zeta = \{ \tilde{A} = \bigcup_{[x] \in X_\zeta} \{[x]\} : \tilde{A} \in \mathcal{A} \}$ and $\mu_\zeta(\tilde{A}) = \mu(\tilde{A})$ for all $\tilde{A} \in \mathcal{A}_\zeta$. By (7.2.3) we know that $(X_\zeta, \mathcal{A}_\zeta, \mu_\zeta)$ is a Lebesgue space. Moreover, since $\mathcal{B}$ is without atom, we see that $(X_\zeta, \mathcal{A}_\zeta, \mu_\zeta)$ is nonatomic. In other words, the Lebesgue space $(X_\zeta, \mathcal{A}_\zeta, \mu_\zeta)$ is isomorphic (mod 0) to the ordinary Lebesgue space $([0, 1], \mathcal{L}([0, 1]), \lambda)$. This fact turns out to be of vital importance in Step 1 of the proof of Proposition 7.2.1 below.

In the sequel, we use the following notation: given a finite measure space $(X, \mathcal{A}, \mu)$ and $A \in \mathcal{A}$, the trace of μ on A is a finite measure on A, denoted and defined by $\mathrm{tr}(\mu, A)(B) = \mu(A \cap B)$ for all $B \in \mathcal{A}$. If $\mathcal{B}$ is a σ-algebra on X and C is a subset of X, then the trace of $\mathcal{B}$ on C is a σ-algebra on C, denoted and defined by $\mathrm{tr}(\mathcal{B}, C) = \{B \cap C : B \in \mathcal{B}\}$. It is instructive to observe that if $(X, \mathcal{A}, \mu)$ is a nonatomic Lebesgue space, and A belongs to $\mathcal{A}$ with μ(A) > 0, then $\big(A, \mathrm{tr}(\mathcal{A}, A), \mu(A)^{-1}\, \mathrm{tr}(\mu, A)\big)$ forms a nonatomic Lebesgue space as well.

7.2.1 Proposition. Let $(X, \mathcal{A}, \mu)$ be a Lebesgue space, and let $\mathcal{B}$ be a σ-algebra without atom, generated by a countable family $(B_n)_{n\ge 1}$ of elements from $\mathcal{A}$. Then for any finite partition $\mathcal{P}$ of X, measurable with respect to $\mathcal{A}$, there exists a σ-algebra $\mathcal{C}$ without atom, independent of $\mathcal{P}$, and generated by a countable family of elements from $\mathcal{B}$.

Proof. The construction of $\mathcal{C}$ is divided into four steps as follows.

Step 1. Let P be an arbitrary element from $\mathcal{A}$. We show that for any θ ∈ ]0, 1[, there exists $A \in \mathcal{B}$ satisfying:
\[ \mu(A) = \theta, \tag{7.2.4} \]
\[ \mu(A \cap P) = \theta \cdot \mu(P). \tag{7.2.5} \]

Let ζ be the measurable decomposition generated by $\mathcal{B}$, and let $(X_\zeta, \mathcal{A}_\zeta, \mu_\zeta)$ be the factor space of $(X, \mathcal{A}, \mu)$ with respect to ζ. Denote by f the conditional μ-measure of P with respect to $\mathcal{B}$. Then f can be regarded as a measurable map from $X_\zeta$ into [0, 1]. According to the remarks stated after (7.2.3) above, we know that $(X_\zeta, \mathcal{A}_\zeta, \mu_\zeta)$ is isomorphic (mod 0) to the ordinary Lebesgue space $([0, 1], \mathcal{L}([0, 1]), \lambda)$. Therefore we can restate Step 1 as follows.

Step 1′. Let f be a measurable function from [0, 1] into itself, and let θ be an element from ]0, 1[. Then we show that there exists $A \in \mathcal{L} := \mathcal{L}([0, 1])$ satisfying
\[ \lambda(A) = \theta, \tag{7.2.4$'$} \]
\[ \int_A f\, d\lambda = \theta \cdot \int_{[0,1]} f\, d\lambda. \tag{7.2.5$'$} \]
Let $\mathcal{L}_\theta$ denote the family of all finite unions of intervals of total length θ, and let $\bar{\mathcal{L}}_\theta$ denote its closure in the topology generated by the metric $d(A, B) = \lambda(A \,\triangle\, B)$ for $A, B \in \mathcal{L}$. It is easily seen that $\bar{\mathcal{L}}_\theta = \{A \in \mathcal{L} : \lambda(A) = \theta\}$, and since obviously $\mathcal{L}_\theta$ is connected, $\bar{\mathcal{L}}_\theta$ is connected as well. Let us define a map ψ from $\bar{\mathcal{L}}_\theta$ into $\mathbb{R}_+$ by
\[ \psi(A) = \frac{1}{\theta} \int_A f\, d\lambda \]
for all $A \in \bar{\mathcal{L}}_\theta$, and let us put $M = \int_{[0,1]} f\, d\lambda$. Then we claim that there exists $A^+ \in \bar{\mathcal{L}}_\theta$ such that $\psi(A^+) \ge M$. Indeed, put $B = f^{-1}([M, 1])$, and first suppose


that λ(B) ≥ θ. Then obviously we can take for the desired $A^+$ any measurable A ⊂ B satisfying λ(A) = θ. Next suppose λ(B) < θ; then we can fix any measurable $C \subset B^c$ satisfying λ(C) = θ − λ(B). Put $A^+ = B \cup C$; then $A^+ \in \bar{\mathcal{L}}_\theta$. Since $(A^+)^c \subset B^c$ and f < M on $B^c$, we have:
\[ \frac{1}{\theta} \int_{A^+} f\, d\lambda = \frac{1}{\theta} \Big( M - \int_{(A^+)^c} f\, d\lambda \Big) \ge \frac{1}{\theta} \big( M - M \cdot \lambda((A^+)^c) \big) = \frac{1}{\theta} \big( M - M(1 - \theta) \big) = M. \]
In a similar way we can find $A^- \in \bar{\mathcal{L}}_\theta$ such that $\psi(A^-) \le M$. Since ψ is continuous, $\psi(\bar{\mathcal{L}}_\theta)$ is an interval, and the claim follows.

Step 2. Consider the case where $\mathcal{P} = \{P, P^c\}$, and let $A \in \mathcal{B}$ with μ(A) > 0 be independent of $\mathcal{P}$. Then we claim that for any θ ∈ ]0, μ(A)[ there exists $B \in \mathcal{B}$ with B ⊂ A, independent of $\mathcal{P}$, such that μ(B) = θ. Indeed, this fact follows by applying Step 1 to the Lebesgue space $\big(A, \mathrm{tr}(\mathcal{A}, A), \mu(A)^{-1}\, \mathrm{tr}(\mu, A)\big)$ with the nonatomic σ-algebra $\mathrm{tr}(\mathcal{B}, A)$.

Step 3. Consider the case where $\mathcal{P} = \{P, P^c\}$. Then we claim that there exists a σ-algebra $\mathcal{C}$ without atom, independent of $\mathcal{P}$, and generated by a countable family of elements from $\mathcal{B}$. Indeed, by using Step 1 and Step 2, we can recursively construct an increasing sequence of finite partitions $\{\mathcal{C}_n\}_{n\ge 1}$ of X whose elements are from $\mathcal{B}$, such that

(7.2.6) each $\mathcal{C}_n$ consists of $2^n$ atoms of measure $2^{-n}$,

(7.2.7) each atom in $\mathcal{C}_n$ is the union of two atoms from $\mathcal{C}_{n+1}$,

(7.2.8) each $\mathcal{C}_n$ is independent of $\mathcal{P}$.
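The intermediate value argument of Step 1′, on which the recursive construction relies, can be made concrete. The following sketch is a simplification under stated assumptions: it takes f(x) = x² on [0, 1] (an illustrative choice, so M = 1/3), and it searches only among the intervals A_t = [t, t + θ], which form a continuous path inside the family of sets of measure θ. The names `psi` and `find_balanced_interval` are hypothetical.

```python
def psi(t, theta):
    """(1/theta) * integral of f(x) = x**2 over A_t = [t, t + theta], assuming t + theta <= 1."""
    return ((t + theta) ** 3 - t ** 3) / (3.0 * theta)

def find_balanced_interval(theta, tol=1e-12):
    """Bisection for t with psi(t) = M, where M = integral of x**2 over [0, 1] = 1/3."""
    M = 1.0 / 3.0
    lo, hi = 0.0, 1.0 - theta  # psi(lo) <= M <= psi(hi), and psi is increasing in t
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi(mid, theta) < M:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    theta = 0.3
    t = find_balanced_interval(theta)
    # the interval [t, t + theta] has measure theta and carries the fraction theta of the integral of f
    print(t, theta * psi(t, theta), theta / 3.0)
```

Here the set A⁻ = [0, θ] gives ψ ≤ M and A⁺ = [1 − θ, 1] gives ψ ≥ M, so continuity of t ↦ ψ(A_t) yields a set satisfying both (7.2.4′) and (7.2.5′).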

Let $\mathcal{C}$ be the smallest σ-algebra on X containing all the atoms of each partition $\mathcal{C}_n$ for n ≥ 1. Then by (7.2.6) and (7.2.7) we see that $\mathcal{C}$ is without atom, and by (7.2.8) we may easily verify that $\mathcal{C}$ is independent of $\mathcal{P}$. Thus the claim follows.

Step 4. Consider the general case where $\mathcal{P} = \{P_1, \dots, P_n\}$. Then we claim that we can find the σ-algebra $\mathcal{C}$ as stated in Proposition 7.2.1. For this, apply Step 3 with the partition $\mathcal{P}_1 = \{P_1, P_1^c\}$ and the σ-algebra $\mathcal{B}$. In this way, we can find a σ-algebra $\mathcal{C}_1$ without atom, independent of $\mathcal{P}_1$, and generated by a countable family of elements from $\mathcal{B}$. Then Step 3 may be applied with the partition $\mathcal{P}_2 = \{P_2, P_2^c\}$ and the σ-algebra $\mathcal{C}_1$. In this way we can find a σ-algebra $\mathcal{C}_2$ without atom, generated by a countable family of elements from $\mathcal{C}_1$, independent of $\mathcal{P}_2$, and thus of $\mathcal{P}_1 \cup \mathcal{P}_2$ as well. Continuing in this way, we finally obtain a σ-algebra $\mathcal{C}_n$ without atom, generated by a countable family of elements from $\mathcal{B}$, and independent of $\mathcal{P}$. The claim then follows with $\mathcal{C} = \mathcal{C}_n$. These facts complete the proof.

7.2.2 Corollary. Let $(X, \mathcal{A}, \mu)$ be a nonatomic Lebesgue space, let T be a measure-preserving transformation of X, let $F \in \mathcal{A}$ with μ(F) > 0 be a given and fixed set, and let $\pi_l$ be a finite partition of $T^{-l}(F)$ with elements from $\mathcal{A}$ for all 0 ≤ l < n with some n ≥ 1. Then for any $\gamma_1, \dots, \gamma_p \ge 0$ with $\sum_{k=1}^{p} \gamma_k = 1$ and p ≥ 1, there exists a partition $\alpha = \{A_1, \dots, A_p\}$ of F with elements from $\mathcal{A}$ such that
\[ \mu\big( T^{-l}(A_k) \cap B \big) = \gamma_k \cdot \mu(B) \]
for all 1 ≤ k ≤ p, all $B \in \pi_l$, and all 0 ≤ l < n.

Proof. Applying Proposition 7.2.1 to the nonatomic Lebesgue space
\[ \big( F, \mathrm{tr}(\mathcal{A}, F), \mu(F)^{-1}\, \mathrm{tr}(\mu, F) \big) \]
with the partition $\pi_0$ and the σ-algebra $\mathrm{tr}(\mathcal{A}, F)$, we can find a countably generated σ-algebra $\mathcal{A}_0 \subset \mathrm{tr}(\mathcal{A}, F)$ without atom and independent of $\pi_0$ with respect to $\mu(F)^{-1}\, \mathrm{tr}(\mu, F)$. We proceed by induction. Suppose that the σ-algebra $\mathcal{A}_l$ is already constructed for some 0 ≤ l < n − 1. Then $T^{-(l+1)}(\mathcal{A}_l)$ is a countably generated σ-algebra without atom on $T^{-(l+1)}(F)$, and we can apply Proposition 7.2.1 to the Lebesgue space
\[ \big( T^{-(l+1)}(F), \mathrm{tr}(\mathcal{A}, T^{-(l+1)}(F)), \mu(F)^{-1}\, \mathrm{tr}(\mu, T^{-(l+1)}(F)) \big) \]
with the partition $\pi_{l+1}$ and the σ-algebra $T^{-(l+1)}(\mathcal{A}_l)$. In this way we get a countably generated σ-algebra $\mathcal{B}_{l+1} \subset T^{-(l+1)}(\mathcal{A}_l)$ without atom, independent of $\pi_{l+1}$ with respect to $\mu(F)^{-1}\, \mathrm{tr}(\mu, T^{-(l+1)}(F))$. Then we may define the countably generated σ-algebra $\mathcal{A}_{l+1}$ as follows:
\[ \mathcal{A}_{l+1} = \{ A \in \mathcal{A}_l : T^{-(l+1)}(A) \in \mathcal{B}_{l+1} \}. \]
The σ-algebra $\mathcal{A}_{n-1} \subset \mathrm{tr}(\mathcal{A}, F)$ obtained at the end is clearly such that $T^{-l}(\mathcal{A}_{n-1})$ is independent of $\pi_l$ with respect to $\mu(F)^{-1}\, \mathrm{tr}(\mu, T^{-l}(F))$ for all 0 ≤ l < n. Besides, as it is without atom, we can find a partition $\alpha = \{A_1, \dots, A_p\} \subset \mathcal{A}_{n-1}$ of F such that $\mu(F)^{-1} \mu(A_k) = \gamma_k$ for all 1 ≤ k ≤ p. Hence we get
\[ \mu(F)^{-1} \mu(T^{-l}(A_k) \cap B) = \mu(F)^{-1} \mu(T^{-l}(A_k)) \cdot \mu(F)^{-1} \mu(B) = \gamma_k \cdot \mu(F)^{-1} \mu(B) \]
for all 1 ≤ k ≤ p, all $B \in \pi_l$, and all 0 ≤ l < n. This fact completes the proof.

7.2.3 Remark. We shall use Corollary 7.2.2 with n = NL and F being an (ε, NL)-Kakutani–Rochlin set of an aperiodic dynamical system $(X, \mathcal{A}, \mu, T)$, where ε > 0 and N, L ≥ 1 with N being even. Putting $p = 2^{N/2}$ and $\gamma_k = 2^{-N/2}$ for all 1 ≤ k ≤ p, in this way we may conclude the following: If $F \in \mathcal{A}$ is an (ε, NL)-Kakutani–Rochlin set, and $\pi_l$ is a finite partition of $T^{-l}(F)$ with elements from $\mathcal{A}$ for all 0 ≤ l < NL, then there exists a finite partition α of F into $2^{N/2}$ sets from $\mathcal{A}$ such that
\[ \mu\big( T^{-l}(A) \cap B \big) = 2^{-N/2} \cdot \mu(B) \]
for all A ∈ α, all $B \in \pi_l$, and all 0 ≤ l < NL. It is instructive to observe that in this case we have $\mu(A) = 2^{-N/2} \mu(F)$ for all A ∈ α.


Given a dynamical system $(X, \mathcal{A}, \mu, T)$, we shall use CLT(μ, T) to denote the set of all functions $f \in L^2(\mu)$ with $\int f\, d\mu = 0$ satisfying the central limit theorem as stated in (7.2.1) above. The main result of this section is the well-known theorem of Burton–Denker establishing that CLT(μ, T) ≠ ∅ whenever T is aperiodic. More precisely, we have:

7.2.4 Theorem. If $(X, \mathcal{A}, \mu, T)$ is an aperiodic dynamical system, then there exists $f \in L^2(\mu)$ with $\int f\, d\mu = 0$ satisfying the central limit theorem:
\[ \mu\Big\{ x \in X : \frac{\sum_{j=0}^{n-1} T^j f(x)}{\big\| \sum_{j=0}^{n-1} T^j f \big\|_2} \le u \Big\} \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} \exp\Big( \frac{-v^2}{2} \Big)\, dv \]
as n → ∞, for all u ∈ R.

Proof. Let N, K, L ≥ 1 be given and fixed positive integers, such that N is even and 1 ≤ K < N is odd. Let $F \in \mathcal{A}$ be an (ε, NL)-Kakutani–Rochlin set, and let $\pi_l$ be a finite partition of $T^{-l}(F)$ for all 0 ≤ l < NL. Then by Remark 7.2.3 there exists a partition α of F into $2^{N/2}$ sets from $\mathcal{A}$ satisfying:
\[ \mu\big( T^{-l}(A) \cap B \big) = 2^{-N/2} \cdot \mu(B) \tag{7.2.9} \]
for all A ∈ α, all $B \in \pi_l$, and all 0 ≤ l < NL. The partition α can be written as follows:
\[ \alpha = \big\{ A(\varepsilon_0, \varepsilon_1, \dots, \varepsilon_{N/2-1}) : \varepsilon_i \in \{-1, 1\} \big\}. \]
Define a measurable function $g^{2l} : F \to \{-1, 1\}$ by putting
\[ g^{2l}(x) = \varepsilon_l \quad \text{if } x \in A(\varepsilon_0, \dots, \varepsilon_l, \dots, \varepsilon_{N/2-1}) \]
for all 0 ≤ l < N/2. Define a measurable function $g^{(2l+K)(\mathrm{mod}\, N)} : F \to \{-1, 1\}$ by putting
\[ g^{(2l+K)(\mathrm{mod}\, N)}(x) = -g^{2l}(x) \]
for all x ∈ F and all 0 ≤ l < N/2. In this way a function $g^l : F \to \{-1, 1\}$ is defined for all 0 ≤ l < N. Finally, we define a measurable function g : X → {−1, 0, 1} as follows:
\[ g(x) = \begin{cases} g^l\big( T^{jN+l}(x) \big), & \text{if } x \in T^{-(jN+l)}(F) \text{ for some } 0 \le j < L \text{ and } 0 \le l < N; \\ 0, & \text{if } x \notin \bigcup_{l=0}^{NL-1} T^{-l}(F). \end{cases} \]
We shall use $S_m(f)(x)$ to denote $\sum_{j=0}^{m-1} f(T^j(x))$ whenever m ≥ 1 and x ∈ X. In addition, the following two facts on the Kakutani–Rochlin tower $F, T^{-1}(F), \dots, T^{-NL+1}(F)$ clarify the entire construction and will be freely used below. Namely, we have
\[ T(F) \subset \Big( \bigcup_{l=0}^{NL-1} T^{-l}(F) \Big)^c \cup T^{-NL+1}(F), \tag{7.2.10} \]


\[ T\Big( \Big( \bigcup_{l=0}^{NL-1} T^{-l}(F) \Big)^c \Big) \subset \Big( \bigcup_{l=0}^{NL-1} T^{-l}(F) \Big)^c \cup T^{-NL+1}(F). \tag{7.2.11} \]
Both statements follow straightforwardly.

7.2.5 Lemma. Under the hypotheses stated above we have:

(a) Random functions $g^0, g^2, \dots, g^{N-2}$ as well as $g^1, g^3, \dots, g^{N-1}$ are independent and identically distributed with respect to the probability measure $\mu(F)^{-1}\, \mathrm{tr}(\mu, F)$. Moreover, we have
\[ \mu(F)^{-1}\, \mathrm{tr}(\mu, F)\{ g^l = \pm 1 \} = \frac{1}{2} \tag{7.2.12} \]
for all 0 ≤ l < N.

(b) Random functions $g \circ T^i$ for 0 ≤ i < K restricted to $T^{-l}(F)$ are independent and identically distributed with respect to the probability measure $\mu(F)^{-1}\, \mathrm{tr}(\mu, T^{-l}(F))$ whenever K ≤ l < NL. Moreover, we have
\[ \mu(F)^{-1}\, \mathrm{tr}(\mu, T^{-l}(F))\{ g \circ T^i = \pm 1 \} = \frac{1}{2} \tag{7.2.13} \]
for all 0 ≤ i < K.

(c) Random function $S_m(g)$ restricted to $T^{-l}(F)$ is independent of $\pi_l$ with respect to the probability measure $\mu(F)^{-1}\, \mathrm{tr}(\mu, T^{-l}(F))$ whenever 1 ≤ m ≤ N and N ≤ l < NL.

(d) Let K ≤ m < N − K and 0 ≤ l < N be given and fixed. Then there exists a set J ⊂ {0, 1, . . . , m − 1} of cardinality K − 1, K or K + 1 such that
\[ \sum_{i=0}^{m-1} g(T^{i+l}(x)) = \sum_{i \in J} g(T^{i+l}(x)) \tag{7.2.14} \]
whenever $x \in \bigcup_{j=1}^{L} T^{-jN+1}(F)$. Furthermore, the random functions $g \circ T^{i+l}$ with i ∈ J restricted to $T^{-l}(F)$ are independent and identically distributed with respect to the probability measure $\mu(F)^{-1}\, \mathrm{tr}(\mu, T^{-l}(F))$. Moreover, we have
\[ \mu(F)^{-1}\, \mathrm{tr}(\mu, T^{-l}(F))\{ g \circ T^{i+l} = \pm 1 \} = \frac{1}{2} \]
for all i ∈ J.

(e) The following inequality is valid:
\[ |S_m(g)(x)| \le 2(K + 1) \tag{7.2.15} \]
for all m ≥ 1 and all x ∈ X outside of a μ-null set.

(f) The following inequalities are valid:
\[ (1 - \delta) m \le \| S_m(g) \|_2^2 \le (1 + \delta) m \tag{7.2.16} \]
for all 1 ≤ m ≤ K, with δ → 0 if $K(\varepsilon + L^{-1}) \to 0$ when K, L → ∞ and ε → 0 (like maps from N into N, respectively ]0, ∞[, as n → ∞).


(g) The following inequalities are valid:
\[ (1 - \delta) K \le \| S_m(g) \|_2^2 \le (1 + \delta) K \tag{7.2.17} \]
for all K ≤ m < N − K, with δ → 0 if $K(\varepsilon + L^{-1}) \to 0$ when K, L → ∞ and ε → 0 (like maps from N into N, respectively ]0, ∞[, as n → ∞).

Proof. (a) The statements follow straightforwardly by definition and the fact that by (7.2.9) we have $\mu(A) = 2^{-N/2} \mu(F)$ for all A ∈ α.

(b) Let us fix K ≤ l < NL; then by definition we have $g(T^i(x)) = g^{(l-i)(\mathrm{mod}\, N)}(T^l(x))$ for all $x \in T^{-l}(F)$ and all 0 ≤ i < K. Hence we can easily verify that $(g \circ T^0, g \circ T^1, \dots, g \circ T^{K-1})$ is equally distributed as $(\pm g^{j_1}, \pm g^{j_2}, \dots, \pm g^{j_K})$ with a choice of signs ± and for some (different) $j_1, j_2, \dots, j_K$ from {0, 2, . . . , N − 2} (both depending on the given l). Thus the statements follow straightforwardly by (7.2.12).

(c) Let $A = \{ x \in T^{-l}(F) : g(T^j(x)) = \varepsilon_j, \ \forall\, 0 \le j < m \}$ with some given and fixed $\varepsilon_j \in \{-1, 1\}$ for 0 ≤ j < m, and let $B \in \pi_l$. If $x \in T^{-l}(F)$, then $T^j(x) \in T^{-(l-j)}(F)$ and thus by definition we have $g(T^j(x)) = g^{(l-j)(\mathrm{mod}\, N)}(T^l(x))$. Hence we easily find that $A = \bigcup_{C \in \beta} T^{-l}(C)$ for some subfamily β ⊂ α. Denote $\nu_l = \mu(F)^{-1}\, \mathrm{tr}(\mu, T^{-l}(F))$; then by (7.2.9) we have:
\[ \begin{aligned} \nu_l(A \cap B) &= \nu_l\Big( \bigcup_{C \in \beta} T^{-l}(C) \cap B \Big) = \sum_{C \in \beta} \nu_l\big( T^{-l}(C) \cap B \big) \\ &= \mu(F)^{-1} \sum_{C \in \beta} \mu\big( T^{-l}(C) \cap B \big) = \mu(F)^{-1} \sum_{C \in \beta} 2^{-N/2} \mu(B) \\ &= \mu(F)^{-1} \sum_{C \in \beta} 2^{-N/2} \mu(F) \cdot \mu(F)^{-1} \mu(B) \\ &= \sum_{C \in \beta} \mu(F)^{-1} \mu(T^{-l}(C)) \cdot \mu(F)^{-1} \mu(B) \\ &= \sum_{C \in \beta} \nu_l(T^{-l}(C)) \cdot \nu_l(B) = \nu_l(A) \cdot \nu_l(B). \end{aligned} \]

This fact easily completes the proof of (c).

(d) We shall verify the case where l = 0 and $x \in T^{-N+1}(F)$. Other cases follow in exactly the same manner by using periodicity in the definition of g. If m = K or K + 1, there is nothing to be proved. Thus consider the case where K + 2 ≤ m < N − K and look at any odd number p satisfying K ≤ p < m. Then p = 2j + K for some j ≥ 0, and therefore members of the sum $\sum_{i=0}^{m-1} g(T^i(x))$ which correspond to indices 2j and 2j + K cancel. Arguing in this way for any odd number between K and m, we may cancel m − K, m − K + 1 or m − K − 1 indices (depending on m and K). Doing so, at the end we will have only those $g \circ T^i$ which are independent with respect to $\mu(F)^{-1}\, \mathrm{tr}(\mu, F)$. These facts complete the proof of (d).

(e) By definition of g, we easily find that $\sum_{l=0}^{N-1} g(T^l(x)) = 0$ whenever
\[ x \in \bigcup_{j=1}^{L} T^{-jN+1}(F). \]
Therefore by (7.2.10), (7.2.11) and the definition of g we may conclude:
\[ S_m(g)(x) = \sum_{k=0}^{p-1} g^{i_k}(y) + \sum_{k=0}^{q-1} g^{j_k}(z) \tag{7.2.18} \]
for all x ∈ X outside of a μ-null set, with some y, z ∈ F, some 0 ≤ p, q ≤ K + 1, and some $0 \le i_0, \dots, i_{p-1}, j_0, \dots, j_{q-1} < N$, where the corresponding term on the right-hand side equals zero if p or q equals zero. Hence the claim follows straightforwardly by the fact that $|g^l(x)| \le 1$ for all x ∈ X and all 0 ≤ l < N.

(f) Suppose that $G = \bigcup_{l=N}^{NL-1} T^{-l}(F)$. Then we have $\mu(G) = N(L-1)\mu(F) \ge N(L-1)(1-\varepsilon)N^{-1}L^{-1} \ge 1 - \varepsilon - L^{-1}$, and thus
\[ \int_{G^c} |S_m(g)|^2\, d\mu \le m^2 \mu(G^c) \le m^2 (\varepsilon + L^{-1}). \tag{7.2.19} \]
On the other hand we have
\[ \int_G |S_m(g)|^2\, d\mu = \sum_{l=N}^{NL-1} \int_{T^{-l}(F)} |S_m(g)|^2\, d\mu. \tag{7.2.20} \]
Fix N ≤ l < NL. Since m ≤ K, by (b) the functions $g \circ T^i$ for 0 ≤ i < m restricted to $T^{-l}(F)$ are independent with respect to $\mu(F)^{-1}\, \mathrm{tr}(\mu, T^{-l}(F))$ and we have
\[ \int_{T^{-l}(F)} g(T^i(x))\, \mu(dx) = \int_{T^{-l}(F)} g^{(l-i)(\mathrm{mod}\, N)}(T^l(x))\, \mu(dx) = \int_F g^{(l-i)(\mathrm{mod}\, N)}(x)\, \mu(dx) = 0. \tag{7.2.21} \]
Thus we get
\[ \int_{T^{-l}(F)} |S_m(g)|^2\, d\mu = \sum_{i=0}^{m-1} \int_{T^{-l}(F)} [g(T^i(x))]^2\, \mu(dx) = m \mu(F). \tag{7.2.22} \]
Now by (7.2.19), (7.2.20) and (7.2.22) we easily conclude:
\[ \begin{aligned} \| S_m(g) \|_2^2 &= \int_{G^c} |S_m(g)|^2\, d\mu + \int_G |S_m(g)|^2\, d\mu \\ &\le m^2 (\varepsilon + L^{-1}) + N(L-1) m \mu(F) \\ &\le m^2 (\varepsilon + L^{-1}) + m \le m\big( 1 + K(\varepsilon + L^{-1}) \big). \end{aligned} \]


Moreover, by (7.2.20) and (7.2.22) we get
\[ \| S_m(g) \|_2^2 \ge N(L-1) m \mu(F) \ge N(L-1) m (1-\varepsilon) N^{-1} L^{-1} = m(1 - L^{-1})(1 - \varepsilon) \ge m(1 - \varepsilon - L^{-1}). \]
These facts complete the proof of (7.2.16).

(g) Let $G = \bigcup_{l=N}^{NL-1} T^{-l}(F)$; then as above $\mu(G) \ge 1 - \varepsilon - L^{-1}$. Thus by (e) we have
\[ \int_{G^c} |S_m(g)|^2\, d\mu \le 4(K+1)^2 \mu(G^c) \le 4(K+1)^2 (\varepsilon + L^{-1}). \tag{7.2.23} \]
On the other hand we have
\[ \int_G |S_m(g)|^2\, d\mu = \sum_{l=N}^{NL-1} \int_{T^{-l}(F)} |S_m(g)|^2\, d\mu. \tag{7.2.24} \]
Fix N ≤ l < NL, and define $l' = jN - 1 - l$ for some j ≥ 2 in such a way that $0 \le l' < N$. Denote by J the set of all indices determined by (d) being applied to m, $l'$ and $T^{-jN+1}(F)$. Let $x \in T^{-l}(F)$ be a point for which there exists $y \in T^{-jN+1}(F)$ satisfying $T^{l'}(y) = x$. Then by (d) above we have
\[ S_m(g)(x) = \sum_{i=0}^{m-1} g(T^{i+l'}(y)) = \sum_{i \in J} g(T^{i+l'}(y)) = \sum_{i \in J} g(T^i(x)). \tag{7.2.25} \]
Since T is measure-preserving, the set of all x's from $T^{-l}(F)$ satisfying the property stated above has a μ-outer measure equal to μ(F). But the functions on the left and right-hand side of (7.2.25) are measurable, and thus (7.2.25) remains valid for μ-a.e. $x \in T^{-l}(F)$. Moreover, by (7.2.25) we may easily conclude that the functions $g \circ T^i$ for i ∈ J restricted to $T^{-l}(F)$ are independent with respect to $\mu(F)^{-1}\, \mathrm{tr}(\mu, T^{-l}(F))$. Therefore by (7.2.21) we get
\[ \int_{T^{-l}(F)} |S_m(g)|^2\, d\mu = \sum_{i \in J} \int_{T^{-l}(F)} [g(T^i(x))]^2\, \mu(dx) = \#(J) \cdot \mu(F) \le (K+1) \mu(F). \tag{7.2.26} \]
Now by (7.2.23), (7.2.24) and (7.2.26) we easily conclude:
\[ \begin{aligned} \| S_m(g) \|_2^2 &= \int_{G^c} |S_m(g)|^2\, d\mu + \int_G |S_m(g)|^2\, d\mu \\ &\le 4(K+1)^2 (\varepsilon + L^{-1}) + N(L-1)(K+1) \mu(F) \\ &\le 4(K+1)^2 (\varepsilon + L^{-1}) + (K+1) \\ &= K\big( 1 + 4(1 + K^{-1})^2 \cdot K(\varepsilon + L^{-1}) + K^{-1} \big). \end{aligned} \]


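Moreover, by (7.2.24) and (7.2.26) we get
\[ \begin{aligned} \| S_m(g) \|_2^2 &\ge N(L-1)(K-1) \mu(F) \ge N(L-1)(K-1)(1-\varepsilon) N^{-1} L^{-1} \\ &= (K-1)(1 - L^{-1})(1 - \varepsilon) \ge K\big( 1 - \varepsilon - K^{-1} - L^{-1} - \varepsilon K^{-1} L^{-1} \big). \end{aligned} \]
These facts complete the proof of Lemma 7.2.5.

To conclude the preliminary part of the proof let us notice that by (a) in Lemma 7.2.5 we have:
\[ \begin{aligned} \int_X g\, d\mu &= \sum_{l=0}^{NL-1} \int_{T^{-l}(F)} g(x)\, \mu(dx) \\ &= \sum_{l=0}^{NL-1} \int_{T^{-l}(F)} g^{l(\mathrm{mod}\, N)}(T^l(x))\, \mu(dx) \\ &= \sum_{l=0}^{NL-1} \int_F g^{l(\mathrm{mod}\, N)}(x)\, \mu(dx) = 0. \end{aligned} \tag{7.2.27} \]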
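The sign pattern behind g can be checked on a toy model. The sketch below is only an illustration under stated assumptions: it fixes one point of F, so that $g^0, \dots, g^{N-1}$ become a list of N signs, draws the signs $g^{2l}$ at random, and sets $g^{(2l+K)(\mathrm{mod}\, N)} = -g^{2l}$ as in the construction. It then verifies that the signs over a full block of N levels sum to zero and that sums over cyclic windows of consecutive levels stay bounded by K + 1, a bound specific to this toy model and compatible with the bound 2(K + 1) of (7.2.15). The function names are hypothetical.

```python
import random

def build_signs(N, K, seed=0):
    """Return [g^0, ..., g^(N-1)] with g^(2l) random in {-1, 1} and g^((2l+K) mod N) = -g^(2l)."""
    if N % 2 != 0 or K % 2 != 1 or not (1 <= K < N):
        raise ValueError("need N even and 1 <= K < N odd")
    rng = random.Random(seed)
    g = [0] * N
    for l in range(N // 2):
        eps = rng.choice((-1, 1))
        g[2 * l] = eps                # even levels carry the free signs
        g[(2 * l + K) % N] = -eps     # the paired odd level carries the opposite sign
    return g

def max_cyclic_window_sum(g):
    """Largest |sum| over all cyclic windows of consecutive levels."""
    N = len(g)
    best = 0
    for start in range(N):
        s = 0
        for length in range(1, N + 1):
            s += g[(start + length - 1) % N]
            best = max(best, abs(s))
    return best

if __name__ == "__main__":
    N, K = 20, 5
    g = build_signs(N, K, seed=3)
    print(g, sum(g), max_cyclic_window_sum(g))
```

Only the levels whose partner falls outside the window contribute to a window sum, which is why roughly K terms survive however long the window is.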

We proceed by constructing a function f satisfying the statement of the theorem. Let $\{N_n, n \ge 1\}$, $\{K_n, n \ge 1\}$ and $\{L_n, n \ge 1\}$ be increasing sequences of positive integers with $N_n$ being even and $0 \le K_n < N_n$ being odd for all n ≥ 1. Let $\{\varepsilon_n\}_{n\ge 1}$ be a decreasing sequence of positive real numbers converging to zero, and let $F_n$ be an $(\varepsilon_n, N_n L_n)$-Kakutani–Rochlin set for every n ≥ 1. According to the preceding construction and statement (c) in Lemma 7.2.5 we may conclude that for each n ≥ 1 there exists a measurable function $g_n$ from X into {−1, 0, 1} such that $S_m(g_n)$ restricted to $T^{-l}(F_n)$ is independent of the partition generated by $\{ g_{n-1} \circ T^i \mid 0 \le i < N_n \}$ with respect to the probability measure $\mu(F_n)^{-1}\, \mathrm{tr}(\mu, T^{-l}(F_n))$ for all $1 \le m \le N_n$ and all $N_n \le l < N_n L_n$, with $g_0$ being zero. In addition we assume that the given numbers satisfy the following four conditions:
\[ K_n < 2^{-1} N_{n-1} \quad \text{for all } n > 1; \tag{7.2.28} \]
\[ K_n (\varepsilon_n + L_n^{-1}) \to 0, \quad n \to \infty; \tag{7.2.29} \]
\[ \frac{1}{a_n \sqrt{K_n}} \sum_{j<n} a_j K_j \to 0, \quad n \to \infty; \tag{7.2.30} \]
\[ \frac{\sqrt{K_n}}{a_n} \sum_{j>n} a_j \to 0, \quad n \to \infty. \tag{7.2.31} \]
Here $\{a_n, n \ge 1\}$ is a sequence of positive real numbers, and we put $f = \sum_{n \ge 1} a_n g_n$. It is instructive to observe that by (7.2.31) we have $(a_j) \in \ell^1$, and thus $(a_j) \in \ell^2$ as well. Hence by (7.2.27) and the fact that $|g_n| \le 1$ for all n ≥ 1 we easily find that $f \in L^2(\mu)$ and $\int f\, d\mu = 0$.
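Condition (7.2.30) forces the weights $a_n$ to decay very fast relative to the growth of $K_n$. As a rough check, the following sketch evaluates the left-hand side of (7.2.30) in exact rational arithmetic for one candidate choice, $a_1 = 1/2$, $a_n = (a_{n-1})^4$ and $K_n = (a_n)^{-4}$. The helper names are hypothetical, and the sketch ignores the parity requirement on $K_n$; it only illustrates the rate of decay.

```python
from fractions import Fraction

def weights(n_max):
    """a_1 = 1/2 and a_n = a_{n-1}**4, returned as exact fractions [a_1, ..., a_{n_max}]."""
    a = [Fraction(1, 2)]
    while len(a) < n_max:
        a.append(a[-1] ** 4)
    return a

def lhs_7_2_30(a, n):
    """(1 / (a_n * sqrt(K_n))) * sum_{j<n} a_j * K_j with K_j = a_j**(-4), for 1-based n >= 2."""
    K = [1 / x ** 4 for x in a]
    sqrt_K_n = 1 / a[n - 1] ** 2          # exact square root of K_n = a_n**(-4)
    partial = sum(a[j] * K[j] for j in range(n - 1))
    return partial / (a[n - 1] * sqrt_K_n)

if __name__ == "__main__":
    a = weights(5)
    for n in range(2, 6):
        print(n, float(lhs_7_2_30(a, n)))
```

With this choice the quantity reduces to $a_n \sum_{j<n} (a_j)^{-3}$, which is dominated by its last term $a_{n-1}$.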


7.2.6 Remark. Conditions (7.2.30) and (7.2.31) may seem to exclude each other. In order to show that (7.2.28)–(7.2.31) can be satisfied, we may proceed as follows:

Step 1. First choose $\{a_n, n \ge 1\}$ and $\{K_n, n \ge 1\}$ to satisfy (7.2.30) and (7.2.31). For example, put $a_1 = 2^{-1}$ and $a_n = (a_{n-1})^4$ for all n ≥ 2, and let $K_n = (a_n)^{-4}$ for all n ≥ 1. Then (7.2.30) and (7.2.31) become:
\[ a_n \sum_{j<n} (a_j)^{-3} \to 0, \quad n \to \infty, \tag{7.2.32} \]
\[ (a_n)^{-3} \sum_{j>n} a_j \to 0, \quad n \to \infty, \tag{7.2.33} \]
which is easily verified, since $\{a_j\}_{j\ge 1}$ is decreasing fast enough.

Step 2. In this step the $K_n$'s are already given, so choose the $N_n$'s to satisfy (7.2.28), and choose the $\varepsilon_n$'s (small enough) and the $L_n$'s (large enough) to satisfy (7.2.29). This completes the choice.

In addition we shall introduce an auxiliary sequence that will be of vital importance in the rest. Given m ≥ 1, we define $n_m = \sup\{ n \ge 1 : K_n \le m \}$. Further, we denote:
\[ f_m = a_{n_m} g_{n_m} + a_{n_m+1} g_{n_m+1} \quad \text{and} \quad A_m = K_{n_m} |a_{n_m}|^2 + m |a_{n_m+1}|^2 \]
for all m ≥ 1. Then we have:

7.2.7 Lemma. Under the hypotheses stated above we have
\[ \frac{1}{\sqrt{A_m}} \| S_m(f_m) \|_2 \to 1 \tag{7.2.34} \]
as m → ∞. Moreover, we have
\[ \frac{1}{\sqrt{A_m}} \| S_m(f) - S_m(f_m) \|_2 \to 0 \tag{7.2.35} \]
as m → ∞.

Proof. (7.2.34): Fix m ≥ 1 and put $G = \bigcup_{l=N_{n_m+1}}^{N_{n_m+1} L_{n_m+1} - 1} T^{-l}(F_{n_m+1})$; then we have:
\[ \| S_m(f_m) \|_2^2 \le 2 \int_{G^c} [S_m(a_{n_m} g_{n_m})]^2\, d\mu + 2 \int_{G^c} [S_m(a_{n_m+1} g_{n_m+1})]^2\, d\mu + \int_G [S_m(f_m)]^2\, d\mu. \tag{7.2.36} \]
By (e) in Lemma 7.2.5 we easily find:
\[ \int_{G^c} [S_m(a_{n_m} g_{n_m})]^2\, d\mu \le 2 |a_{n_m}|^2 (K_{n_m} + 1)^2 \mu(G^c) \le 8 |a_{n_m}|^2 K_{n_m} \cdot K_{n_m+1} (\varepsilon_{n_m+1} + L_{n_m+1}^{-1}). \tag{7.2.37} \]


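Moreover, since $|g_n| \le 1$ for all n ≥ 1, we then have
\[ \int_{G^c} [S_m(a_{n_m+1} g_{n_m+1})]^2\, d\mu \le |a_{n_m+1}|^2 m^2 \mu(G^c) \le |a_{n_m+1}|^2 m \cdot K_{n_m+1} (\varepsilon_{n_m+1} + L_{n_m+1}^{-1}). \tag{7.2.38} \]
By construction $S_m(a_{n_m+1} g_{n_m+1})$ restricted to $T^{-l}(F_{n_m+1})$ is independent of $S_m(a_{n_m} g_{n_m})$ with respect to $\mu(F_{n_m+1})^{-1}\, \mathrm{tr}(\mu, T^{-l}(F_{n_m+1}))$, for all $N_{n_m+1} \le l < N_{n_m+1} L_{n_m+1}$, since by definition $m < K_{n_m+1} < N_{n_m+1}$. Moreover it is easily verified that by definition and (7.2.28) we have $K_{n_m} \le m < N_{n_m} - K_{n_m}$. Therefore (g) in Lemma 7.2.5 may be applied. In this way by (7.2.27), and (f) and (g) in Lemma 7.2.5 with (7.2.29), we get
\[ \begin{aligned} \int_G [S_m(f_m)]^2\, d\mu &= \sum_{l=N_{n_m+1}}^{N_{n_m+1} L_{n_m+1} - 1} \int_{T^{-l}(F_{n_m+1})} [S_m(a_{n_m} g_{n_m}) + S_m(a_{n_m+1} g_{n_m+1})]^2\, d\mu \\ &= \sum_{l=N_{n_m+1}}^{N_{n_m+1} L_{n_m+1} - 1} \Big( \int_{T^{-l}(F_{n_m+1})} [S_m(a_{n_m} g_{n_m})]^2\, d\mu + \int_{T^{-l}(F_{n_m+1})} [S_m(a_{n_m+1} g_{n_m+1})]^2\, d\mu \Big) \\ &\le \| S_m(a_{n_m} g_{n_m}) \|_2^2 + \| S_m(a_{n_m+1} g_{n_m+1}) \|_2^2 \\ &\le |a_{n_m}|^2 K_{n_m} (1 + \delta') + |a_{n_m+1}|^2 m (1 + \delta'') \end{aligned} \tag{7.2.39} \]
where $\delta' \vee \delta'' \to 0$ as m → ∞. Moreover, by (7.2.36)–(7.2.39) we get:
\[ \| S_m(f_m) \|_2^2 \le K_{n_m} |a_{n_m}|^2 \cdot \delta/2 + m |a_{n_m+1}|^2 \cdot \delta/2 + A_m (1 + \delta/2) = A_m (1 + \delta), \tag{7.2.40} \]
where $\delta = 32\big( \delta' \vee \delta'' \vee K_{n_m+1} (\varepsilon_{n_m+1} + L_{n_m+1}^{-1}) \big) \to 0$ as m → ∞. On the other hand by independence, (7.2.37) + (7.2.38), and (e) and (f) in Lemma 7.2.5 with (7.2.28) + (7.2.29), we similarly get:
\[ \begin{aligned} \| S_m(f_m) \|_2^2 &\ge \int_G [S_m(f_m)]^2\, d\mu \ge \int_G [S_m(a_{n_m} g_{n_m})]^2\, d\mu + \int_G [S_m(a_{n_m+1} g_{n_m+1})]^2\, d\mu \\ &= \| S_m(a_{n_m} g_{n_m}) \|_2^2 + \| S_m(a_{n_m+1} g_{n_m+1}) \|_2^2 - \int_{G^c} [S_m(a_{n_m} g_{n_m})]^2\, d\mu - \int_{G^c} [S_m(a_{n_m+1} g_{n_m+1})]^2\, d\mu \\ &\ge |a_{n_m}|^2 K_{n_m} (1 - \delta') + |a_{n_m+1}|^2 m (1 - \delta'') \\ &\quad - 8 |a_{n_m}|^2 K_{n_m} \cdot K_{n_m+1} (\varepsilon_{n_m+1} + L_{n_m+1}^{-1}) - |a_{n_m+1}|^2 m \cdot K_{n_m+1} (\varepsilon_{n_m+1} + L_{n_m+1}^{-1}) \\ &\ge \Big( 1 - \frac{\delta}{2} \Big) \big( |a_{n_m}|^2 K_{n_m} + |a_{n_m+1}|^2 m \big) - \frac{\delta}{2} \big( K_{n_m} |a_{n_m}|^2 + m |a_{n_m+1}|^2 \big) \\ &= A_m (1 - \delta), \end{aligned} \tag{7.2.41} \]
where $\delta = 16\big( \delta' \vee \delta'' \vee K_{n_m+1} (\varepsilon_{n_m+1} + L_{n_m+1}^{-1}) \big) \to 0$ for m → ∞. Thus the statement follows straightforwardly by (7.2.40) + (7.2.41), and the proof of (7.2.34) is complete.

(7.2.35): We have already noticed that by (7.2.28) above we have $K_{n_m} \le m < N_{n_m} - K_{n_m}$ for all m ≥ 1. Thus (g) in Lemma 7.2.5 may be applied. In this way we get
\[ \begin{aligned} \frac{1}{\sqrt{A_m}} \| S_m(f) - S_m(f_m) \|_2 &\le \frac{1}{\sqrt{A_m}} \sum_{n=1}^{n_m - 1} \| S_m(a_n g_n) \|_2 + \frac{1}{\sqrt{A_m}} \sum_{n=n_m+2}^{\infty} \| S_m(a_n g_n) \|_2 \\ &\le \frac{1}{\sqrt{A_m}} \sum_{n=1}^{n_m - 1} 2(K_n + 1) a_n + \frac{1}{\sqrt{A_m}} \sum_{n=n_m+2}^{\infty} m \cdot a_n \\ &\le \frac{4}{\sqrt{A_m}} \sum_{n=1}^{n_m - 1} a_n K_n + \frac{\sqrt{m}}{a_{n_m+1}} \sum_{n=n_m+2}^{\infty} a_n \\ &\le \frac{4}{\sqrt{a_{n_m}^2 K_{n_m}}} \sum_{n=1}^{n_m - 1} a_n K_n + \sqrt{\frac{K_{n_m+1}}{a_{n_m+1}^2}} \sum_{n=n_m+2}^{\infty} a_n, \end{aligned} \]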

being valid for all m ≥ 1. By (7.2.30) and (7.2.31) above the right-hand side tends to zero as m → ∞. This fact completes the proof of (7.2.25). √ Let us put Xm = Sm (f )/ Sm (f ) 2 and Ym = Sm (fm )/ Am for m ≥ 1, and let ϕXm and ϕYm denote the characteristic function of Xm and Ym respectively, for all m ≥ 1. Then by (7.2.34) and (7.2.35) in Lemma 7.2.7 we may easily conclude that Xm − Ym → 0 in L2 (μ) as m → ∞. Therefore we have ϕXm (t) − ϕYm (t) → 0 as m → ∞, for all t ∈ R. Thus by the continuity theorem the main proof will be completed as soon as we show that ϕYm (t) → exp{−t 2 /2} as m → ∞, for all t ∈ R. This fact is established in the next lemma. 7.2.8 Lemma. Under the hypotheses stated above we have

Sm (fm ) E exp it √ → exp −t 2 /2 Am as m → ∞, for all t ∈ R. ,Nnm +1 Lnm +1 −1 −l T (Fnm +1 ) for Proof. Let t ∈ R be given and fixed. Put G = l=N nm +1 m ≥ 1. Since μ(G) → 1 as m → ∞, we may be concerned only with the integration

7.2 A theorem of Burton and Denker

283

over G and proceed as follows. Denote νl = μ(Fnm +1 )−1 tr (μ, T −l (Fnm +1 )) for all Nnm +1 ≤ l < Nnm +1 Lnm +1 with m ≥ 1. As in the proof of Lemma 7.2.7 we may conclude that Sm (anm +1 gnm +1 ) restricted to T −l (Fnm +1 ) is independent of Sm (anm gnm ) with respect to νl for all Nnm +1 ≤ l < Nnm +1 Lnm +1 with m ≥ 1. Hence we get: Sm (fm ) exp it √ dμ Am G Nnm +1 Lnm +1 −1 Sm (fm ) = μ(Fnm +1 ) · exp it √ dνl Am T −l (Fnm +1 ) l=Nnm +1 (7.2.42) Nnm +1 Lnm +1 −1 Sm (anm gnm ) = μ(Fnm +1 ) exp it dνl √ −l (F A T ) m n +1 m l=Nnm +1 Sm (anm +1 gnm +1 ) × exp it dνl √ Am T −l (Fnm +1 ) being valid for all m ≥ 1. Since m < Knm +1 , then by (b) in Lemma 7.2.5 we see that the random functions gnm +1 T i for 0 ≤ i ≤ m restricted to T −l (Fnm +1 ) are independent and identically distributed with respect to νl for all Nnm +1 ≤ l < Nnm +1 Lnm +1 with m ≥ 1. Moreover, we have νl {gnm +1 T i = ±1} = 1/2 for all 0 ≤ i ≤ m with m ≥ 1. Thus by Berry–Esséen’s theorem (see Feller [1971] p. 542), we may conclude: u Sm (gnm +1 )(x) −v 2 1 −l sup νl x ∈ T (FNm +1 ) : exp dv ≤u −√ √ 2 m 2π −∞ u∈R 3 ≤√ m for all Nnm +1 ≤ l < Nnm +1 Lnm +1 with m ≥ 1. Hence we can deduce: Sm (anm +1 gnm +1 ) 1 m|anm +1 |2 2 exp it dνl = exp − t +ηm (7.2.43) √ 2 Am Am T −l (Fnm +1 ) → 0 as m → ∞. for all Nnm +1 ≤ l < Nnm +1 Lnm +1 with m ≥ 1, where ηm Moreover, it may be worthwhile to mention that this statement follows even more directly by the fact that the pointwise convergence of a sequence of characteristic functions to a characteristic function is uniform over every bounded interval. (This fact in turn relies upon Prohorov’s theorem and the continuity theorem). Indeed, having this it is enough to notice that |m|anm +1 |2 /Am | ≤ 1 for all m ≥ 1, and then apply the preceding fact to the interval [−t, t] by using the central limit theorem and the continuity theorem. However, note that, even though it is irrelevant for our purposes, = η (t). 
As an alternative proof of (7.2.43) we can also recall in this way we get ηm m that supx∈R | Fn (x) − F (x) |→ 0 as n → ∞, provided that Fn (x) → F (x) for all x ∈ R as n → ∞ and F is continuous, and supx∈R | Fn (x) − F (x) |→ 0 as n → ∞, if and only if supt∈R | ϕn (t) − ϕ(t) |→ 0 as n → ∞. Here Fn and F are distribution


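functions, and $\varphi_n$ and $\varphi$ are the associated characteristic functions for $n \ge 1$. Now by (7.2.42) and (7.2.43) we get:

$$\int_G \exp\Big\{it\,\frac{S_m(f_m)}{\sqrt{A_m}}\Big\}\,d\mu = \Big(\exp\Big\{-\frac{1}{2}\,\frac{m|a_{n_m+1}|^2}{A_m}\,t^2\Big\} + \eta'_m\Big) \cdot \int_G \exp\Big\{it\,\frac{S_m(a_{n_m}g_{n_m})}{\sqrt{A_m}}\Big\}\,d\mu \tag{7.2.44}$$

being valid for $m \ge 1$, where $\eta'_m \to 0$ as $m \to \infty$. We repeat the same procedure for the level $n_m$ instead of the level $n_m + 1$. By (7.2.28) above we have $K_{n_m} \le m < N_{n_m} - K_{n_m}$, so by (d) in Lemma 7.2.5 we can extract from the sequence $\{g_{n_m}T^i : 0 \le i < m\}$ a subsequence of cardinality $K_{n_m} - 1$, $K_{n_m}$ or $K_{n_m} + 1$ containing functions which are, being restricted to $T^{-l}(F_{n_m})$, mutually independent and identically distributed with respect to $\rho_l := \mu(F_{n_m})^{-1}\,\mathrm{tr}(\mu, T^{-l}(F_{n_m}))$, and taking values $\pm 1$ with probability $1/2$, where $N_{n_m} \le l < N_{n_m}L_{n_m}$. In this way we obtain:

$$\int_{T^{-l}(F_{n_m})} \exp\Big\{it\,\frac{S_m(a_{n_m}g_{n_m})}{\sqrt{A_m}}\Big\}\,d\rho_l = \exp\Big\{-\frac{1}{2}\,\frac{K_{n_m}|a_{n_m}|^2}{A_m}\,t^2\Big\} + \eta''_m \tag{7.2.45}$$

for all $N_{n_m} \le l < N_{n_m}L_{n_m}$ with $m \ge 1$, where $\eta''_m \to 0$ as $m \to \infty$. Put for $m \ge 1$, $H = \bigcup_{l=N_{n_m}}^{N_{n_m}L_{n_m}-1} T^{-l}(F_{n_m})$. Then it is easily verified that $\mu(G \,\triangle\, H) \to 0$ as $m \to \infty$. Hence by (7.2.34) and (7.2.35) we may conclude:

$$\begin{aligned}
\int_G \exp\Big\{it\,\frac{S_m(f_m)}{\sqrt{A_m}}\Big\}\,d\mu
&= \Big(\exp\Big\{-\frac{1}{2}\,\frac{m|a_{n_m+1}|^2}{A_m}\,t^2\Big\} + \eta'_m\Big) \cdot \int_H \exp\Big\{it\,\frac{S_m(a_{n_m}g_{n_m})}{\sqrt{A_m}}\Big\}\,d\mu + \delta_m \\
&= \Big(\exp\Big\{-\frac{m|a_{n_m+1}|^2}{2A_m}\,t^2\Big\} + \eta'_m\Big)\Big(\exp\Big\{-\frac{K_{n_m}|a_{n_m}|^2}{2A_m}\,t^2\Big\} + \eta''_m\Big)\,\mu(H) + \delta_m
\end{aligned} \tag{7.2.46}$$

where $\eta'_m \to 0$, $\eta''_m \to 0$, $\delta_m \to 0$, $\mu(G) \to 1$ and $\mu(H) \to 1$ as $m \to \infty$. Thus letting $m \to \infty$ in (7.2.46) we get:

$$\int \exp\Big\{it\,\frac{S_m(f_m)}{\sqrt{A_m}}\Big\}\,d\mu \to \exp\{-t^2/2\}$$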

as m → ∞, for all t ∈ R. This fact completes the proofs of Lemma 7.2.5 and Theorem 7.2.8.

7.3 The central limit theorem for orbits

Throughout the whole section we suppose that $(X, \mathcal{A}, \mu, T)$ is a given aperiodic dynamical system. We recall that $\mathrm{CLT}(\mu, T)$ denotes the set of all functions $f \in L^2(\mu)$ with $\int f\,d\mu = 0$ satisfying the central limit theorem as stated in (7.2.1) above. According to Theorem 7.2.4 we have $\mathrm{CLT}(\mu, T) \ne \emptyset$. Given $f \in \mathrm{CLT}(\mu, T)$ we write $\mathrm{Orb}(f) = \{fT^j : j \ge 0\}$. The set $\mathrm{Orb}(f)$ is called the orbit of $f$. The smallest linear subspace of $L^2(\mu)$ containing $\mathrm{Orb}(f)$ is denoted by $\mathrm{span}(\mathrm{Orb}(f))$. The main aim of this section is to investigate the central limit theorems involving elements of $\mathrm{span}(\mathrm{Orb}(f))$. It turns out that a complete description of weak Gaussian limits in this case can be obtained. The result is presented in Theorem 7.3.3 and then extended in Theorem 7.3.5. The proof relies upon the next two lemmas which are also of interest in themselves. Let $N_d(\mu, \Sigma)$ denote the $d$-dimensional Gaussian distribution with mean vector $\mu$ and covariance matrix $\Sigma$. With $U$ we denote the matrix having all entries equal to 1. We begin as follows.

7.3.1 Lemma. Let $f \in L^2(\mu)$ be an arbitrary function, and let $f_1, \dots, f_d \in \mathrm{Orb}(f)$ be arbitrary elements for some $d \ge 1$. For all $\alpha = (\alpha_1, \dots, \alpha_d) \in \mathbb{R}^d$ and $m \ge 1$, put $A_m = \int |S_m(f)|^2\,d\mu$ and $A_m(\alpha) = \int \big|S_m\big(\sum_{k=1}^d \alpha_k f_k\big)\big|^2\,d\mu$. Suppose that $A_m \to \infty$ as $m \to \infty$. Then we have:

$$\frac{1}{\sum_{k=1}^d \alpha_k\,\sqrt{A_m}}\,\Big\| S_m\Big(\sum_{k=1}^d \alpha_k f_k\Big) - S_m\Big(\sum_{k=1}^d \alpha_k \cdot f\Big) \Big\|_2 \to 0, \tag{7.3.1}$$

$$\frac{\sqrt{A_m(\alpha)}}{\sqrt{A_m}} \to \Big|\sum_{k=1}^d \alpha_k\Big| \tag{7.3.2}$$

as $m \to \infty$, for all $\alpha = (\alpha_1, \dots, \alpha_d) \in \mathbb{R}^d$ with $\sum_{k=1}^d \alpha_k \ne 0$.

(7.3.2)

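Proof. Let $\alpha = (\alpha_1, \dots, \alpha_d) \in \mathbb{R}^d$ with $\sum_{k=1}^d \alpha_k \ne 0$ be given and fixed. It is no restriction to assume that $f_1 = fT^{p_1}$, $f_2 = fT^{p_2}$, ..., $f_d = fT^{p_d}$ for some $0 \le p_1 < p_2 < \dots < p_d$. Then we have

$$\begin{aligned}
S_m\Big(\sum_{k=1}^d \alpha_k f_k\Big) - S_m\Big(\sum_{k=1}^d \alpha_k \cdot f\Big)
&= \sum_{k=1}^d \alpha_k \Big( \sum_{j=0}^{m-1} fT^{j+p_k} - \sum_{j=0}^{m-1} fT^j \Big) \\
&= \sum_{k=1}^d \alpha_k \Big( \sum_{j=m}^{m+p_k-1} fT^j - \sum_{j=0}^{p_k-1} fT^j \Big)
\end{aligned}$$

being valid for all $m \ge 1$, where the first term of the first sum in the last line reads zero in the case when $p_1$ equals zero. Hence we get

$$\Big\| S_m\Big(\sum_{k=1}^d \alpha_k f_k\Big) - S_m\Big(\sum_{k=1}^d \alpha_k \cdot f\Big) \Big\|_2 \le 2 \sum_{k=1}^d |\alpha_k|\,p_k\,\|f\|_2 \tag{7.3.3}$$

for all $m \ge 1$. Thus (7.3.1) follows straightforwardly from the fact that $A_m \to \infty$ as $m \to \infty$. Moreover (7.3.2) is an immediate consequence of (7.3.1). The proof is complete.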
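The telescoping step behind (7.3.3) can be checked numerically. The following is a minimal sketch, not part of the original argument, under the assumption that $f$ takes the values $\pm 1$ along one orbit, so that the pointwise analogue of (7.3.3) holds with the sup-norm of $f$ (equal to 1) in place of $\|f\|_2$. The shifts, weights and helper names are hypothetical choices made only for illustration.

```python
import math
import random

def birkhoff_sum(seq, start, m):
    """Sum of the m consecutive orbit values seq[start], ..., seq[start + m - 1]."""
    return sum(seq[start:start + m])

def shifted_sum_gap(seq, m, shifts, alphas):
    """|S_m(sum_k a_k f T^{p_k}) - (sum_k a_k) S_m(f)| evaluated along one orbit."""
    base = birkhoff_sum(seq, 0, m)
    combo = sum(a * birkhoff_sum(seq, p, m) for a, p in zip(alphas, shifts))
    return abs(combo - sum(alphas) * base)

random.seed(0)
shifts = [0, 3, 7]            # hypothetical p_1 < p_2 < p_3
alphas = [1.0, -0.5, 2.0]     # hypothetical weights with nonzero sum
# Pointwise analogue of the right-hand side of (7.3.3), with sup|f| = 1.
bound = 2 * sum(abs(a) * p for a, p in zip(alphas, shifts))

m_values = [10, 100, 1000, 10000]
# seq[j] plays the role of f(T^j x); independent fair signs are an assumption.
seq = [random.choice((-1, 1)) for _ in range(max(m_values) + max(shifts))]
gaps = {m: shifted_sum_gap(seq, m, shifts, alphas) for m in m_values}

for m in m_values:
    # For fair signs A_m = m, so the last column mimics the ratio in (7.3.1).
    print(m, gaps[m], gaps[m] / math.sqrt(m))
```

The gap stays below a constant that does not depend on $m$, so dividing by $\sqrt{A_m}$ forces it to zero, which is exactly how (7.3.1) is obtained from (7.3.3).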


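7.3.2 Lemma. Let $f \in \mathrm{CLT}(\mu, T)$ be an arbitrary function, and let $f_1, \dots, f_d \in \mathrm{Orb}(f)$ be arbitrary elements for some $d \ge 1$. Put

$$A_m = \int |S_m(f)|^2\,d\mu \quad\text{and}\quad A_m(\alpha) = \int \Big|S_m\Big(\sum_{k=1}^d \alpha_k f_k\Big)\Big|^2\,d\mu,$$

for all $\alpha = (\alpha_1, \dots, \alpha_d) \in \mathbb{R}^d$ and $m \ge 1$. Suppose that $A_m \to \infty$ as $m \to \infty$. Then we have

$$\frac{1}{\sqrt{A_m(\alpha)}}\,S_m\Big(\sum_{k=1}^d \alpha_k f_k\Big) \xrightarrow{D} N(0, 1), \tag{7.3.4}$$

$$\frac{1}{\sqrt{A_m}}\,S_m\Big(\sum_{k=1}^d \alpha_k f_k\Big) \xrightarrow{D} N\Big(0, \Big(\sum_{k=1}^d \alpha_k\Big)^2\Big) \tag{7.3.5}$$

as $m \to \infty$, for all $\alpha = (\alpha_1, \dots, \alpha_d) \in \mathbb{R}^d$ with $\sum_{k=1}^d \alpha_k \ne 0$.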

Proof. Let $\alpha = (\alpha_1, \dots, \alpha_d) \in \mathbb{R}^d$ with $\sum_{k=1}^d \alpha_k \ne 0$ be given and fixed. Then by (7.3.1) in Lemma 7.3.1 we easily find

$$\int \exp\bigg\{it\,\frac{S_m\big(\sum_{k=1}^d \alpha_k f_k\big)}{\sum_{k=1}^d \alpha_k\,\sqrt{A_m}}\bigg\}\,d\mu - \int \exp\Big\{it\,\frac{S_m(f)}{\sqrt{A_m}}\Big\}\,d\mu \to 0$$

as $m \to \infty$, for all $t \in \mathbb{R}$. Thus (7.3.4) follows by the continuity theorem and the fact that $f \in \mathrm{CLT}(\mu, T)$. Moreover (7.3.5) follows straightforwardly by (7.3.4) and (7.3.2) in Lemma 7.3.1. The proof is complete.

7.3.3 Theorem. Let $f \in \mathrm{CLT}(\mu, T)$ be an arbitrary function, and let $f_1, \dots, f_d \in \mathrm{Orb}(f)$ be arbitrary elements for some $d \ge 1$. Write $F = (f_1, \dots, f_d)$, and put $A_m = \int |S_m(f)|^2\,d\mu$ for all $m \ge 1$. Suppose that $A_m \to \infty$ as $m \to \infty$. Then we have

$$\frac{1}{\sqrt{A_m}}\,S_m(F) \xrightarrow{D} N_d(0, U) \tag{7.3.6}$$

as $m \to \infty$. More generally, suppose that $f_1, \dots, f_d \in \mathrm{span}(\mathrm{Orb}(f))$ are arbitrary elements. Then we have

$$f_k = \sum_{i=1}^{N_k} \beta_i^k f_i^k$$

for some $f_i^k \in \mathrm{Orb}(f)$ and $\beta_i^k \in \mathbb{R}$ with $1 \le i \le N_k$ and $1 \le k \le d$. Suppose moreover that $\sum_{i=1}^{N_k} \beta_i^k \ne 0$ for all $1 \le k \le d$, and that $A_m \to \infty$ as $m \to \infty$. Then we have

$$\frac{1}{\sqrt{A_m}}\,S_m(F) \xrightarrow{D} N_d(0, \Sigma) \tag{7.3.7}$$


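as $m \to \infty$, where $\Sigma = (\Sigma_{kl})_{k,l=1}^d$ is given by

$$\Sigma_{kl} = \sum_{i=1}^{N_k} \beta_i^k \cdot \sum_{i=1}^{N_l} \beta_i^l$$

for all $1 \le k, l \le d$.

Proof. First we prove (7.3.6). Since the set $D = \{(\alpha_1, \dots, \alpha_d) \in \mathbb{R}^d : \sum_{k=1}^d \alpha_k \ne 0\}$ is dense in $\mathbb{R}^d$, then by the continuity theorem it suffices to show

$$\int \exp\Big\{i\,\frac{\alpha \cdot S_m(F)}{\sqrt{A_m}}\Big\}\,d\mu \to \int \exp\{i\,\alpha \cdot \bar{X}\}\,d\mu \tag{7.3.8}$$

for all $\alpha \in D$, as $m \to \infty$, where $\bar{X} = (X, X, \dots, X)$ with $X \sim N(0, 1)$. Again by the continuity theorem, for (7.3.6) it is enough to show that

$$\frac{\alpha \cdot S_m(F)}{\sqrt{A_m}} \xrightarrow{D} \alpha \cdot \bar{X} \tag{7.3.9}$$

for all $\alpha \in D$, as $m \to \infty$. Notice that we have

$$\alpha \cdot S_m(F) = S_m\Big(\sum_{k=1}^d \alpha_k f_k\Big), \qquad \alpha \cdot \bar{X} = \sum_{k=1}^d \alpha_k X$$

for all $\alpha = (\alpha_1, \dots, \alpha_d) \in \mathbb{R}^d$ and all $m \ge 1$. Hence we see that for (7.3.9) it suffices to show that

$$\frac{1}{\sqrt{A_m}}\,S_m\Big(\sum_{k=1}^d \alpha_k f_k\Big) \xrightarrow{D} \sum_{k=1}^d \alpha_k X \tag{7.3.10}$$

for all $\alpha = (\alpha_1, \dots, \alpha_d) \in D$, as $m \to \infty$. However, this is precisely the statement (7.3.5) in Lemma 7.3.2. Thus (7.3.8)–(7.3.10) are valid, and the proof of (7.3.6) is complete.

Next we prove (7.3.7). We use the same argument as for (7.3.6). In this way we obtain that it suffices to show that

$$\frac{1}{\sqrt{A_m}}\,S_m\Big(\sum_{k=1}^d \alpha_k f_k\Big) \xrightarrow{D} \sum_{k=1}^d \alpha_k X_k \tag{7.3.11}$$

for all $\alpha = (\alpha_1, \dots, \alpha_d)$ that belong to a dense subset $D$ of $\mathbb{R}^d$, as $m \to \infty$, where $X = (X_1, \dots, X_d) \sim N_d(0, \Sigma)$. Notice that we have

$$S_m\Big(\sum_{k=1}^d \alpha_k f_k\Big) = S_m\Big(\sum_{k=1}^d \sum_{i=1}^{N_k} \alpha_k \beta_i^k f_i^k\Big) \tag{7.3.12}$$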


for all $\alpha = (\alpha_1, \dots, \alpha_d) \in \mathbb{R}^d$ and all $m \ge 1$. Choose $D$ to be the set of all $\alpha = (\alpha_1, \dots, \alpha_d) \in \mathbb{R}^d$ such that $\sum_{k=1}^d \sum_{i=1}^{N_k} \alpha_k \beta_i^k \ne 0$. Then $D$ is dense in $\mathbb{R}^d$. Hence by (7.3.2) in Lemma 7.3.1 and (7.3.5) in Lemma 7.3.2 we get

$$\frac{1}{\sqrt{A_m}}\,S_m\Big(\sum_{k=1}^d \alpha_k f_k\Big) \xrightarrow{D} \sum_{k=1}^d \sum_{i=1}^{N_k} \alpha_k \beta_i^k X$$

as $m \to \infty$, with $X \sim N(0, 1)$. Thus (7.3.11) follows with $X_k = \sum_{i=1}^{N_k} \beta_i^k X$ for all $1 \le k \le d$, and the proof of (7.3.7) is complete. These facts complete the proof of the theorem.

7.3.4 Example. Let $(X, \mathcal{A}, \mu, T)$ be equal to $(S^{\mathbb{N}}, \mathcal{A}^{\mathbb{N}}, \nu^{\mathbb{N}}, \theta)$ with $S = \{-1, 1\}$, $\mathcal{A} = 2^S$, $\nu\{\pm 1\} = 1/2$ and $\theta(s_1, s_2, \dots) = (s_2, s_3, \dots)$ for $(s_1, s_2, \dots) \in S^{\mathbb{N}}$. Then $T$ is strongly mixing, and thus aperiodic as well. Let $f$ be the projection onto the first coordinate. Then $fT^j = \varepsilon_j$ form a Rademacher sequence for $j \ge 1$. By the classical central limit theorem we have

$$\frac{\sum_{j=0}^{m-1} T^j f(x)}{\big\|\sum_{j=0}^{m-1} T^j f\big\|_2} = \frac{1}{\sqrt{m}} \sum_{j=0}^{m-1} \varepsilon_j \xrightarrow{D} N(0, 1)$$

as $m \to \infty$. Thus $f \in \mathrm{CLT}(\mu, T)$. However, if we put $g = f - fT$, then we have

$$\frac{\sum_{j=0}^{m-1} T^j g(x)}{\big\|\sum_{j=0}^{m-1} T^j g\big\|_2} = \frac{1}{\sqrt{2}}(\varepsilon_1 - \varepsilon_m) \sim \frac{1}{\sqrt{2}}(\varepsilon_1 + \varepsilon_2) \not\sim N(0, \sigma^2)$$

for all $m \ge 1$ and any $\sigma^2 > 0$. Therefore $g \notin \mathrm{CLT}(\mu, T)$. Hence we see that the results of Lemma 7.3.1, Lemma 7.3.2 and Theorem 7.3.3 are as optimal as possible in general.

A close look at the method of the proof of Theorem 7.3.3, through Lemma 7.3.1 and Lemma 7.3.2, shows that these results might be generalized in the number of elements in the orbit being involved. The result can be formulated in two steps, as is shown in the next theorem. We clarify that $\sum_{k=1}^\infty \gamma_k f_k$ stands for the $L^2(\mu)$-limit of $\sum_{k=1}^m \gamma_k f_k$ as $m \to \infty$ whenever $(\gamma_k)_{k=1}^\infty \in \ell_1$ and $f_k \in \mathrm{Orb}(f)$ for $k \ge 1$ with $f \in L^2(\mu)$. Notice that under these circumstances the limit exists.

7.3.5 Theorem. Let $f \in \mathrm{CLT}(\mu, T)$ be an arbitrary function, and let $f_k = fT^{p_k} \in \mathrm{Orb}(f)$ be arbitrary elements for some $p_k \ge 0$ with $k \ge 1$. Let $(\alpha_k)_{k=1}^\infty \in \ell_1$ satisfy

$$\sum_{k=1}^\infty |\alpha_k|\,p_k < \infty, \tag{7.3.13}$$

$$\sum_{k=1}^\infty \alpha_k \ne 0. \tag{7.3.14}$$


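Put $A_m = \int |S_m(f)|^2\,d\mu$ and $A_m(\alpha) = \int \big|S_m\big(\sum_{k=1}^\infty \alpha_k f_k\big)\big|^2\,d\mu$ for all $m \ge 1$. Suppose that $A_m \to \infty$ as $m \to \infty$. Then we have

$$\frac{1}{\sum_{k=1}^\infty \alpha_k\,\sqrt{A_m}}\,\Big\| S_m\Big(\sum_{k=1}^\infty \alpha_k f_k\Big) - S_m\Big(\sum_{k=1}^\infty \alpha_k \cdot f\Big) \Big\|_2 \to 0, \tag{7.3.15}$$

$$\frac{\sqrt{A_m(\alpha)}}{\sqrt{A_m}} \to \Big|\sum_{k=1}^\infty \alpha_k\Big|, \tag{7.3.16}$$

$$\frac{1}{\sqrt{A_m(\alpha)}}\,S_m\Big(\sum_{k=1}^\infty \alpha_k f_k\Big) \xrightarrow{D} N(0, 1), \tag{7.3.17}$$

$$\frac{1}{\sqrt{A_m}}\,S_m\Big(\sum_{k=1}^\infty \alpha_k f_k\Big) \xrightarrow{D} N\Big(0, \Big(\sum_{k=1}^\infty \alpha_k\Big)^2\Big) \tag{7.3.18}$$

as $m \to \infty$. More generally, suppose that $f_k = \sum_{i=1}^\infty \beta_i^k f_i^k$ for some $f_i^k = fT^{p_i^k} \in \mathrm{Orb}(f)$ with $p_i^k \ge 0$ and $(\beta_i^k)_{i=1}^\infty \in \ell_1$ for $1 \le k \le d$ and $i \ge 1$. Suppose moreover that

$$\sum_{i=1}^\infty |\beta_i^k|\,p_i^k < \infty, \tag{7.3.19}$$

$$\sum_{i=1}^\infty \beta_i^k \ne 0 \tag{7.3.20}$$

for all $1 \le k \le d$. Write $F = (f_1, \dots, f_d)$, and suppose that $A_m \to \infty$ as $m \to \infty$. Then we have

$$\frac{1}{\sqrt{A_m}}\,S_m(F) \xrightarrow{D} N_d(0, \Sigma) \tag{7.3.21}$$

as $m \to \infty$, where $\Sigma = (\Sigma_{kl})_{k,l=1}^d$ is given by $\Sigma_{kl} = \sum_{i=1}^\infty \beta_i^k \cdot \sum_{i=1}^\infty \beta_i^l$ for all $1 \le k, l \le d$.

Proof. We point out (7.3.3) in the proof of Lemma 7.3.1 in order to understand how conditions (7.3.13) and (7.3.19) apply. The rest of the proof can be carried out along the lines of the proofs of Lemma 7.3.1, Lemma 7.3.2 and Theorem 7.3.3. We shall omit the details.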
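The failure of the central limit theorem for the coboundary $g = f - fT$ of Example 7.3.4 can be made concrete by exact enumeration. The sketch below is only an illustration under stated assumptions: coordinates are indexed from 0 (so the telescoped sum reads $\varepsilon_0 - \varepsilon_m$ rather than $\varepsilon_1 - \varepsilon_m$), and the helper names are hypothetical.

```python
import itertools
import math
from collections import Counter

def coboundary_sum(eps, m):
    """S_m(g) for g = f - f T, evaluated along the sign sequence eps[0..m]."""
    return sum(eps[j] - eps[j + 1] for j in range(m))

def normalized_law(m):
    """Exact law of S_m(g) / ||S_m(g)||_2 under independent fair signs."""
    law = Counter()
    for eps in itertools.product((-1, 1), repeat=m + 1):
        s = coboundary_sum(eps, m)
        assert s == eps[0] - eps[m]   # the sum telescopes to two coordinates
        law[s] += 1
    total = 2 ** (m + 1)
    second_moment = sum(s * s * c for s, c in law.items()) / total
    norm = math.sqrt(second_moment)   # L2 norm of S_m(g)
    return {round(s / norm, 12): c / total for s, c in law.items()}

# The normalized law is computed for several m; it never changes.
laws = {m: normalized_law(m) for m in (1, 2, 5, 10)}
for m, law in laws.items():
    print(m, sorted(law.items()))
```

For every $m$ the normalized sum is supported on three points, so no Gaussian limit is possible, in line with the conclusion $g \notin \mathrm{CLT}(\mu, T)$.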

7.4 A theorem of Volný

Let $(X, \mathcal{A}, \mu, T)$ be an aperiodic dynamical system. A natural question arising from the central limit theorem of Burton and Denker is certainly the following: is it possible to find an $f \in L^2(\mu)$ with $\int f\,d\mu = 0$, $\int f^2\,d\mu = 1$ satisfying the following central limit theorem:

$$\mu\Big\{x \in X : \frac{1}{\sqrt{n}} \sum_{j=0}^{n-1} T^j f(x) \le u\Big\} \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} \exp\Big(-\frac{v^2}{2}\Big)\,dv$$

as $n \to \infty$, for all $u \in \mathbb{R}$? In other words, is it possible to replace in Theorem 7.2.4 the intrinsic CLT normalizing factors $\big\|\sum_{j=0}^{n-1} T^j f\big\|_2$ by $\sqrt{n}$? The answer turns out to be affirmative and follows from a fine result of Volný [1999], who even showed the existence of the invariance principle in any aperiodic dynamical system. Volný's approach is still based on the use of Kakutani–Rochlin towers.

7.4.1 Theorem. Let $(X, \mathcal{A}, \mu)$ be a nonatomic probability space. Let $T : X \to X$ be an ergodic aperiodic automorphism. There exist a function $f \in L^2(\mu)$ and independent

random variables Zj ∼ N 0, 2(log log 3 log log 2) , such that n−1 n−1 1 log log log log n 1/2 j max √ T f− Zj = O . 1≤j ≤n n log n j =0

j =0

A slight extension of this result to a class of weighted averages (with monotonic weights) is given in Kristensen [2002: Corollary 9.5]. By elementary considerations, the answer to the above question is now easily deduced from this result. It also follows from the proof that a strong invariance principle holds.

7.4.2 Theorem. Let $(X, \mathcal{A}, \mu)$ be a nonatomic probability space. Let $T : X \to X$ be an ergodic aperiodic automorphism. There exist a function $f \in L^2(\mu)$ and a Brownian motion $B = \{B_t, t \ge 0\}$ such that

$$\lim_{n\to\infty} \frac{\sum_{j=0}^{n-1} T^j f - B(n)}{\sqrt{n \log\log n}} \overset{\text{a.s.}}{=} 0.$$

A key ingredient of the proof is the following very useful approximation property.

7.4.3 Proposition. Let $(X, \mathcal{A}, \mu, T)$ be an ergodic aperiodic measurable dynamical system. Let $\{X_i, i \in \mathbb{N}\}$ be a strictly stationary ergodic sequence defined on a joint probability space. Let $\mathcal{A}_1 = \{A_i, 1 \le i \le K\}$ be a finite measurable partition of $X$, let $n$ be a positive integer and $\varepsilon$ a positive real. Then there exists a simple function $f$ such that the following holds:

(1) There exists a random vector $(X'_0, \dots, X'_n)$ defined on $(X, \mathcal{A}, \mu)$ that is distributed as $(X_0, \dots, X_n)$ and such that $\mu\{\exists\, 0 \le \ell \le n : |fT^\ell - X'_\ell| > \varepsilon\} \le \varepsilon$.


(2) The partition F generated by {f T , 0 ≤ ≤ n} is ε-independent of A1 : μ(A) − μ(A|B) ≤ ε. A∈A1 ,B∈F μ(B)>0

A straightforward consequence of this proposition is the following statement, which is Proposition 1 in Volný [1999].

7.4.4 Proposition. Let $(X, \mathcal{A}, \mu, T)$ be an ergodic aperiodic measurable dynamical system. Let $\{\varepsilon_k, k \ge 1\}$, $\{\alpha_k, k \ge 1\}$ be two sequences of positive reals. Let also $\{\ell_k, k \ge 1\}$ be a sequence of positive integers. Then there exist a sequence $\{f_k, k \ge 1\} \subset L^2(\mu)$ with $\int f_k\,d\mu = 0$ and a triangular array of independent random variables $X_{k,\ell} \sim N(0, \alpha_k^2)$, $0 \le \ell \le \ell_k$, $k \ge 1$ such that

$$\sup_{k\ge 1}\ \sup_{0\le \ell\le \ell_k} \frac{\|f_k T^\ell - X_{k,\ell}\|}{\varepsilon_k} \le 1.$$

7.5 CLT for rotations

In this section, we focus on the CLT for irrational rotations. In this case, harmonic analysis methods operate efficiently not only to treat the case of ergodic means, but also some other typical means having a more complicated structure. Let $\theta \in\, ]0, 1[\, \cap\, \mathbb{Q}^c$ and consider on $(\mathbb{T}, \lambda)$ the rotation $\tau x = x + \theta \pmod 1$, $x \in X$. Let $S_N^\tau(f)(t) = \sum_{k=0}^{N-1} f(\tau^k t)$ denote the usual ergodic sums. Consider also the following sums:

(A) $S_N^\tau(f, g)(t) = \sum_{k=0}^{N-1} f(\tau^k t)\,g(\tau^{2k} t)$ (nonlinear ergodic sums),

(B) $U_N^\tau(f)(t) = \sum_{k=0}^{N-1} \sigma_k f(\tau^k t)$ (weighted ergodic sums),

(C) $\sum_{k=0}^{N-1} f(\tau^{k^2} t)$ (ergodic sums along the squares).

The nice linearity properties inherited from the structure of usual ergodic sums and allowing us in particular to work with coboundaries, naturally no longer exist for these sums. We study the CLT property for the means associated to each of these sums, including the usual ergodic sums, and show that this can be done using an elementary approach. In case (C), the method will be combined with the circle method. The study also extends, by means of the same method, to CLT and/or ASCLT for sample paths of some classes of Gaussian random Fourier series.


The principle can be described as follows: let $\{[a_j, b_j[,\ j \ge 1\}$ be a sequence of non-overlapping intervals with $\lim_{j\to\infty} a_j = \lim_{j\to\infty} b_j = 0$. Let $\mathcal{L} = \{\ell_j, j \ge 1\}$ be a lacunary sequence verifying

$$\{\ell_j \theta\} \in [a_j, b_j[ \qquad (\forall j \ge 1).$$

Then, for an appropriate choice of the intervals $[a_j, b_j[$, mild assumptions on the Fourier coefficients of $f = \frac{1}{2} \sum_{k\in\mathbb{Z}} \beta_k e_{\ell_k}$ will ensure that $f$ satisfies the CLT, and for the cases (A), (B) and (C) as well.

Notation–Construction. Let $f \in L^2(\lambda)$. We first propose a construction adapted to the study of the usual sums $S_N f := S_N^\tau f = \sum_{k=0}^{N-1} f\tau^k$, and will say that a function $f \in L^2(\lambda)$ satisfies the CLT when

$$\frac{S_N f}{\|S_N f\|_2} \xrightarrow{D} N(0, 1),$$

and that $f$ satisfies the ASCLT when

$$\frac{1}{\log N} \sum_{n=1}^{N} \frac{1}{n}\,\delta_{\frac{S_n f}{\|S_n f\|_2}} \xrightarrow{D} N(0, 1).$$

We thus follow the point of view already considered in Section 7.2, which consists of studying the CLT with respect to the natural $L^2(\lambda)$-normalizing factor. Let first $\varepsilon = \{\varepsilon_n, n \ge 1\}$ be a sequence of reals in $]0, 1]$ decreasing to 0, and put $\eta_n = \varepsilon_1 \varepsilon_2^2 \cdots \varepsilon_n^2$. To the sequence $\varepsilon$ we associate an increasing sequence of positive integers $\mathcal{N} = \{N_j, j \ge 1\}$ which is defined as follows:

$$N_1 \ge 1 \text{ is arbitrary and for } n \ge 2,\quad N_n = \frac{16^{n-1} \cdot N_1}{\varepsilon_n \eta_{n-1}}. \tag{7.5.1}$$

Next we associate to $\theta$ and the sequence $\varepsilon$ another increasing sequence of positive integers $\mathcal{L} = \{\ell_j, j \ge 1\}$ as follows: for any $n \ge 1$ (since $\{\{n\theta\}, n \ge 1\}$ is everywhere dense in $[0, 1]$) $\ell_n$ is chosen so that

$$\frac{\eta_n}{2 \cdot 16^n N_1} \le \{\ell_n \theta\} \le \frac{\eta_n}{16^n N_1}. \tag{7.5.2}$$

We next partition $\mathbb{N}$ into consecutive blocks

$$I_j = [N_j, N_{j+1}[\,, \qquad j = 1, 2, \dots \tag{7.5.3}$$

and will estimate oscillations of partial sums over these blocks. Let $\beta = \{\beta_k, k \ge 1\} \in \ell_2$ be such that $\beta_0 = 0$, $\beta_n = \beta_{-n} > 0$, $n \ne 0$ and $\sum_{k=1}^\infty \beta_k^2 = 1$. Let $e(t) = \exp(2i\pi t)$, $e_n(t) = e(nt)$, $n \in \mathbb{Z}$. Define

$$f(t) = \frac{1}{2} \sum_{k\in\mathbb{Z}} \beta_k e_{\ell_k}(t) = \sum_{k=1}^\infty \beta_k \cos 2\pi \ell_k t.$$


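Let $g = \{g_k, k \ge 1\}$, $g' = \{g'_k, k \ge 1\}$ be two independent sequences of i.i.d. random variables defined on a joint probability space $(\Omega, \mathcal{A}, \mathbf{P})$ and such that $\mathbf{E}\,g_k = \mathbf{E}\,g'_k = 0$, and $\mathbf{E}\,g_k^2 = \mathbf{E}\,g_k'^2 = 1$. We assume $g_k$, $g'_k$ to be sub-Gaussian random variables. Let $\gamma = \{\gamma_k, k \ge 1\}$ be defined by $\gamma_k = g_k + i g'_k$. Associate to $f$ the random Fourier series

$$X(t) = X_f(t) = \sum_{k=1}^\infty \beta_k\,\Re\big(\gamma_k e_{\ell_k}(t)\big) = \sum_{k\in\mathbb{N}} \beta_k \big(g_k \cos 2\pi \ell_k t - g'_k \sin 2\pi \ell_k t\big). \tag{7.5.4}$$

Next, put for any integer $N \ge 1$ and any real number $\theta \in [-1, 1]$,

$$V_N(\theta) = \frac{1}{N} \sum_{k=0}^{N-1} e^{2i\pi k\theta} = \frac{e^{2i\pi N\theta} - 1}{N(e^{2i\pi\theta} - 1)}.$$

Plainly, for any integer $N$,

$$S_N(f)(t) = N \sum_{k=1}^\infty \beta_k\,\Re\big(V_N(\ell_k\theta)\,e_{\ell_k}(t)\big),$$

$$S_N(X_f)(t) = N \sum_{k=1}^\infty \beta_k\,\Re\big(\gamma_k e_{\ell_k}(t)\,V_N(\ell_k\theta)\big).$$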
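The kernel representation of the ergodic sums can be sanity-checked numerically. The following is a small sketch under stated assumptions: a finite trigonometric polynomial replaces the infinite series, the frequencies, coefficients, rotation angle and evaluation point are hypothetical, and the ergodic sum is written in the form $N \sum_k \beta_k \Re\big(e_{\ell_k}(t) V_N(\ell_k\theta)\big)$ that follows from summing the geometric series.

```python
import cmath
import math

def kernel(N, theta):
    """V_N(theta) = (1/N) * sum_{k<N} exp(2*pi*i*k*theta)."""
    return sum(cmath.exp(2j * math.pi * k * theta) for k in range(N)) / N

def kernel_closed_form(N, theta):
    """Geometric-sum formula for V_N; requires theta not to be an integer."""
    num = cmath.exp(2j * math.pi * N * theta) - 1
    den = N * (cmath.exp(2j * math.pi * theta) - 1)
    return num / den

def ergodic_sum(betas, freqs, theta, N, t):
    """Direct sum of f(t + n*theta), n < N, for f = sum_k beta_k cos(2 pi l_k .)."""
    return sum(
        b * math.cos(2 * math.pi * l * (t + n * theta))
        for n in range(N)
        for b, l in zip(betas, freqs)
    )

def ergodic_sum_via_kernel(betas, freqs, theta, N, t):
    """The same sum written as N * sum_k beta_k Re(e_{l_k}(t) V_N(l_k theta))."""
    return N * sum(
        b * (cmath.exp(2j * math.pi * l * t) * kernel(N, l * theta)).real
        for b, l in zip(betas, freqs)
    )

theta = math.sqrt(2) - 1      # an irrational rotation angle (assumption)
betas = [0.8, 0.5, 0.3]       # hypothetical Fourier coefficients
freqs = [1, 4, 17]            # hypothetical integer frequencies l_k
N, t = 50, 0.37
direct = ergodic_sum(betas, freqs, theta, N, t)
via_kernel = ergodic_sum_via_kernel(betas, freqs, theta, N, t)
print(direct, via_kernel)
```

The two values agree up to rounding, which is why all later estimates on $S_N$ reduce to estimates on the numbers $V_N(\ell_k\theta)$.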

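The following estimates are valid for any integer $n \ge 1$ and rely upon the choice of sequences $\mathcal{N}$ and $\mathcal{L}$:

$(L_n)$ $\quad \forall N \le N_n$, $\ |V_N(\{\ell_n\theta\}) - 1| \le 2\varepsilon_n$,

$(N_n)$ $\quad \forall N \ge N_n$, $\forall m < n$, $\ |V_N(\{\ell_m\theta\})| \le \varepsilon_n$.

These are easily deduced from standard estimates of the kernels $V_N$: for any integer $N \ge 1$ and $y \in [-1, 1]$,

$$|V_N(y) - 1| \le 32Ny, \qquad |V_N(y)| \le \frac{1}{2Ny}. \tag{7.5.5}$$

Then, on the one hand (if $N \le N_n$)

$$|V_N(\{\ell_n\theta\}) - 1| \le 32N\{\ell_n\theta\} \le 32N_n\{\ell_n\theta\} \le 32\,\frac{16^{n-1} \cdot N_1}{\varepsilon_n \eta_{n-1}}\,\frac{\eta_n}{16^n N_1} = 2\varepsilon_n,$$

and on the other (if $N \ge N_n$, $n > m$)

$$|V_N(\{\ell_m\theta\})| \le \frac{1}{2N\{\ell_m\theta\}} \le \frac{1}{2N_n\{\ell_m\theta\}} \le \frac{16^m N_1\,\varepsilon_n \eta_{n-1}}{2(1/2)\,16^{n-1} \cdot N_1\,\eta_m} \le \varepsilon_n.$$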
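The construction (7.5.1)–(7.5.2) and the resulting properties $(L_n)$ and $(N_n)$ can be tried out for the first two levels. What follows is a sketch under explicit assumptions, not the author's procedure: $\theta = \sqrt{2} - 1$, $\varepsilon = (1/2, 1/4)$ and $N_1 = 1$ are hypothetical choices, $\ell_n$ is found by brute-force search, $N_n$ is rounded up to an integer, and $(N_n)$ is only tested on a finite window of values of $N$.

```python
import cmath
import math

def frac(x):
    """Fractional part {x}."""
    return x - math.floor(x)

def kernel(N, y):
    """V_N(y) = (1/N) * sum_{k<N} exp(2*pi*i*k*y)."""
    return sum(cmath.exp(2j * math.pi * k * y) for k in range(N)) / N

def build_sequences(theta, eps, N1=1, max_search=10**6):
    """Return the lists (N_n) and (l_n), n = 1..len(eps), following (7.5.1)-(7.5.2)."""
    Ns, ls = [], []
    eta_prev, last_l = 1.0, 0
    for n, e in enumerate(eps, start=1):
        # eta_n = eps_1 * eps_2^2 * ... * eps_n^2
        eta = e if n == 1 else eta_prev * e * e
        N_n = N1 if n == 1 else math.ceil(16 ** (n - 1) * N1 / (e * eta_prev))
        lo, hi = eta / (2 * 16 ** n * N1), eta / (16 ** n * N1)
        # Brute-force search for the next l with {l * theta} in [lo, hi].
        l = next(c for c in range(last_l + 1, max_search) if lo <= frac(c * theta) <= hi)
        Ns.append(N_n)
        ls.append(l)
        eta_prev, last_l = eta, l
    return Ns, ls

def property_L(theta, eps, Ns, ls):
    """(L_n): |V_N({l_n theta}) - 1| <= 2 eps_n for every N <= N_n."""
    return all(
        abs(kernel(N, frac(l * theta)) - 1) <= 2 * e
        for e, N_n, l in zip(eps, Ns, ls)
        for N in range(1, N_n + 1)
    )

def property_N(theta, eps, Ns, ls, window=200):
    """(N_n): |V_N({l_m theta})| <= eps_n for m < n, tested for N_n <= N < N_n + window."""
    return all(
        abs(kernel(N, frac(ls[m] * theta))) <= eps[n]
        for n in range(1, len(eps))
        for m in range(n)
        for N in range(Ns[n], Ns[n] + window)
    )

theta = math.sqrt(2) - 1
eps = [0.5, 0.25]
Ns, ls = build_sequences(theta, eps)
print(Ns, ls, property_L(theta, eps, Ns, ls), property_N(theta, eps, Ns, ls))
```

The search is only practical for the first few levels, since the target intervals in (7.5.2) shrink at least like $16^{-n}$; the sketch is meant to illustrate the mechanism, not to produce the whole sequence.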

Decomposition of partial sums. In order to avoid unnecessarily heavy notation, write more simply SN = SN (Xf ),


unless the case requires something more explicit. The decomposition of SN goes as follows: put for any positive integers N and j , SN = SˇN + SˆN

where SˇN = N

+ sN SˇN = SN

where SN =N

∞ k>j ∞

" # βk $ γk · ek VN (k θ )) , " # βk $ γk · ek VN (k θ )) ,

(7.5.6)

k<j

TN = N

∞

" # βk $ γk · ek ,

bj =

k>j

∞

βk2

1/2 .

k>j

Observe for N ∈ Ij that TN /N bj depend on j only and equals

k>j βk $ γk ek j := . bj Put for j ≥ 1 and N ∈ Ij ,

τN = Nbj .

As will be seen later on, mild assumptions on the sequences ε and β will ensure that P We also put YN =

SN 2,λ = 1 = 1. N →∞ τN lim

SˇN − TN , τN

ZN =

SN , τN

UN =

SˆN − SN . τN

(7.5.7)

Then for N ∈ Ij we have the relation: SτNN − j = YN + ZN + UN .

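Main estimates. Put $\zeta_k = e_{\ell_k} N \big(V_N(\ell_k\theta) - 1\big) = z_k + i z'_k$. Then $\Re(\gamma_k \zeta_k) = g_k z_k - g'_k z'_k$, and thus

$$\check{S}_N - T_N = \sum_{k>j} \beta_k \big(g_k z_k - g'_k z'_k\big).$$

Hence

$$\|\check{S}_N - T_N\|_{2,\mathbf{P}}^2 = \sum_{k\ge j+1} \beta_k^2 \big((z_k)^2 + (z'_k)^2\big) = \sum_{k\ge j+1} \beta_k^2 N^2 |V_N(\ell_k\theta) - 1|^2.$$

As $N \in I_j$ and $k > j$, we have $N < N_k$ and by property $(L_k)$, $|V_N(\ell_k\theta) - 1| \le 2\varepsilon_k$. Thus

$$\|\check{S}_N - T_N\|_{2,\mathbf{P}}^2 \le 4N^2 \sum_{k>j} \beta_k^2 \varepsilon_k^2 \le 4\varepsilon_{j+1}^2 N^2 \sum_{k>j} \beta_k^2 = 4\varepsilon_{j+1}^2 b_j^2 N^2.$$

Hence

$$\forall j \ge 1,\ \forall N \in I_j, \qquad \|\check{S}_N - T_N\|_{2,\mathbf{P}} \le 2\varepsilon_{j+1} b_j N \quad\text{and}\quad \|Y_N\|_{2,\mathbf{P}} \le 2\varepsilon_{j+1}. \tag{7.5.8}$$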


2 2 2 2 2 2 2 2 2 By property (Nj ) SN k<j βk N |VN (k θ )| ≤ 4 k<j βk N εj ≤ 4N εj , 2,P = since k < j and N ≥ Nj . Thus S

∀j ≥ 1, ∀N ∈ Ij ,

N 2,P

≤ 2εj N

and

ZN

≤ 2εj /bj .

2,P

(7.5.9)

" # The perturbation term is the quantity sN = βj N $ γj .ej VN (j θ ) (N ∈ Ij ). In this case |VN (j θ)| remains undetermined. We have sN

∀j ≥ 1, ∀N ∈ Ij ,

2,P

≤ βj N

and

UN

2,P

≤ βj /bj .

(7.5.10)

The series j (βj /bj )2 diverges in general, but we may choose sequence β so that the series j (βj /bj )a converges for a > 2. This will require that we work in some suitable Orlicz space in order to analyse efficiently the oscillation of partial sums SN around j . As a simple consequence of the previous estimates, we get SN − j

τN

2,P

≤ 4εj /bj + βj /bj ,

(7.5.11)

a bound which clearly limits the choice of the sequence β, one that cannot in effect grow faster than geometrically. Oscillations of partial sums. The oscillation of normalized partial sums SτNN around j and over the blocks Ij can be very precisely evaluated. Let indeed r > 2 and introduce the oscillation function Wr = Put r =

∞ j =1

∞

r 1/r SN sup − j .

j =1 N ∈Ij

(7.5.12)

τN

$ j +1 % % ∞ $ εj βj r 1/r 1 1/2 r 1/r log + . bj εk bj

(7.5.13)

j =1

k=1

7.5.1 Proposition. For any r > 2, Wr G,ν ≤ Cr r . Proof. It will be convenient to work with the following quantities: Yj = sup |YN |r ,

Zj = sup |ZN |r , N ∈Ij

N∈Ij

Yr =

∞

Yj

Uj = sup |UN |r ,

1/r ,

Zr =

∞

j =1

According to the decomposition Wr ≤ Cr (Yr + Zr + Ur ).

j =1 SN τN

Zj

N ∈Ij

1/r ,

Ur =

∞

Uj

1/r

(7.5.14) .

j =1

− j = YN + ZN + UN , we have the inequality


s Put for s > 0, Gs (x) = e|x| − ns 0 : Gs dμ ≤ 1 (h ∈ L0 (μ)). α D

s Let also Mr < ∞ be such that e|x| ≤ 2 Gs (x) + 1 if |x| ≥ Mr . Since g and g are sub-Gaussian random variables, YN , ZN , UN are sub-Gaussian too and belong to LG (P). As for any finite index I ,

sup |fi | G,ν ≤ i∈I

2 log 2

1/2

2 sup fi G,ν #(I ), i∈I

we have for all t ∈ X, by using estimate (7.5.8),

2

(Yj )1/r (t) G,P = sup |YN (t)| G,P ≤ C sup YN (t) 2,P log #(Ij ) N ∈Ij

N ∈Ij

2

≤ Cεj +1 log #(Ij ). 2 This implies that (Yj )1/r G,ν ≤ Cεj +1 log #(Ij ). Therefore, $ %2 (Yj )1/r exp dν ≤ 2 2 Cεj +1 log #(Ij ) X× and consequently,

(Yj ) G2/r " #r dν ≤ 1. 2 Cεj +1 log #(Ij ) X× r

2 j , we deduce by Hence, Yj G2/r ,ν ≤ Cr εj +1 log #(Ij ) . Since Yrr = ∞ j =1 Y r

2 means of the triangle inequality that Yrr G2/r ,ν ≤ Cr ∞ j =1 εj +1 log #(Ij ) . We denote by B the bound obtained. Then r Yr G2/r dν ≤ 1. B X×

But

exp X×

Yr B 1/r

2

dν =

exp

r 2/r Y r

B

X×

=

Yrr B

+ <Mr

≤ exp(Mr )

2/r

Yrr B

dν

≥Mr

+2

$

G2/r X×

≤ exp(Mr )2/r + 4.

exp

r 2/r Y r

B

r Y r

B

dν %

+ 1 dν


It follows that

Yr G,ν ≤ Cr

∞ !

2 #r 1/r . εj +1 log #(Ij )

(7.5.15)

j =1

Using now estimate (7.5.9) and arguing similarly, we get

Zr G,ν ≤ Cr

∞ ! εj 2 j =1

bj

r 1/r

log #(Ij )

.

(7.5.16)

Finally, estimate (7.5.10) together with the same line of arguments also leads to

Ur G,ν ≤ Cr

∞ $ j =1

βj bj

%r 1/r

.

(7.5.17)

j +1 Notice that log #(Ij ) ≤ log Nj +1 ≤ C k=1 log(1/εk ). Since Wr ≤ Cr (Yr +Zr +Ur ) we have obtained

Wr G,ν ≤ Cr r , (7.5.18) as requested. 7.5.2 Remark. From the very proof of Proposition 7.5.1 follows that the conditions βj 2 log j = 0, j →∞ bj lim

εj2 log Nj +1 j ≥1

bj2

< ∞,

(7.5.19)

are sufficient to imply

P

SN 2 lim sup − j = 0 (in L (λ) and λ- a.e.) = 1.

j →∞ N ∈Ij

τN

(7.5.20)

CLT. In view of proving a CLT for $X$ defined in (7.5.4), we modify very slightly the definition of $f$. At first, we assume that $(\ell_k)$ is $r$-lacunary with $r \ge 3$. Let $\beta^1 = \{\beta_k^1, k \ge 1\}$, $\beta^2 = \{\beta_k^2, k \ge 1\}$, $\beta^1, \beta^2 \in \ell_2$ and let $\rho = \{\rho_k, k \ge 1\}$ be defined by $\rho_k = \beta_k^1 + i\beta_k^2$. Put

$$F(t) = \sum_{k=1}^\infty \Re\big(\rho_k e_{\ell_k}(t)\big) \qquad (\forall t \in \mathbb{T}). \tag{7.5.21}$$

7.5.3 Theorem. Assume that the following condition is satisfied:

$$\lim_{j\to\infty} \frac{\varepsilon_j + |\rho_j|}{\big(\sum_{k>j} |\rho_k|^2\big)^{1/2}} = 0. \tag{7.5.22}$$

Then $F \in \mathrm{CLT}$.


Some elementary estimates for products of complex exponentials collected in the following technical lemma will be necessary. 7.5.4 Lemma. Let be a finite index and {zk , k ∈ } be complex numbers. 1 2" 1 2 |zk |3 |zk | # (a) Assume that h0 = k∈ e− 2 zk 41 e 2 |zk | |zk |4 + 13 |1+z e ≤ 1/2. Then, k| ) zk k∈ e − 1 ≤ 2h0 . 1 2 ) 2 zk k∈ (1 + zk )e (b) Assume zk purely imaginary, and let h = ≤ 1/2. Then,

k∈

" |zk |4 4

e|zk | + √|zk |

3

2

3

1+|zk |

e|zk |+ 2

|zk |2 2

#

( ( 1 2 ezk − (1 + zk )e 2 zk ≤ 2h, k∈

Proof of Lemma 7.5.4. Put ak =

k∈ ezk 1 z2

− 1, k ∈ . We use the following inequal-

(1+zk )e 2 k 1 2 ity valid for any complex number z: ez − (1 + z)e 2 z

1

≤ 41 |1 + z||z|4 e 2 |z| + 13 |z|3 e|z| . # 3 1 2 "1 1 2 |zk | Then we get |ak | ≤ |e− 2 zk | 4 e 2 |zk | |zk |4 + 13 |1+z e|zk | . Thus k∈ |ak | ≤ h0 ≤ 21 , k| which by means of inequality 3.8.8 p. 314 of Mitrinovi´c [1970] implies ) 2 zk k∈ e ≤ h0 . − 1 − a k 1 2 1−h ) 0 (1 + zk )e 2 zk 2

k∈

k∈

1 2

1

Hence (a) follows. As for (b), observe that |1+zk ||e 2 zk | = |1+i|zk ||e− 2 |zk | ≤ 1, hence ) ) 1 2 1 2 2 zk ≤ 1. Multiplying both sides of inequality (a) by 2 zk k∈ (1 + zk )e k∈ (1 + zk )e gives % ( $1 ( 1 2 1 |zk |3 2 2 ezk − (1 + zk )e 2 zk ≤ 2 e|zk |+|zk | /2 = 2h. |zk |4 e|zk | + 2 4 3 1 + |zk |2 k∈ k∈ k∈ 2

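Proof of Theorem 7.5.3. We use the decomposition of partial sums previously made and the Salem–Zygmund method, which we display for convenience. We observe that $S_N F(t) = N \sum_{k=1}^\infty \Re\big(\rho_k V_N(\ell_k\theta)\,e_{\ell_k}(t)\big)$. We introduce some notation, putting for any $j \ge 1$, $N \in I_j$ and $t \in X$:

$$\check{S}_N F(t) = N \sum_{k>j} \Re\big(\rho_k V_N(\ell_k\theta)\,e_{\ell_k}(t)\big), \qquad \hat{S}_N F(t) = N \sum_{k\le j} \Re\big(\rho_k V_N(\ell_k\theta)\,e_{\ell_k}(t)\big),$$

$$\hat{S}_N F(t) = S'_N F + s_N F, \qquad S'_N F = N \sum_{k<j} \Re\big(\rho_k V_N(\ell_k\theta)\,e_{\ell_k}(t)\big),$$

$$F_j(t) = \sum_{k>j} \Re\big(\rho_k e_{\ell_k}(t)\big), \qquad c_j = \Big(\sum_{k>j} |\rho_k|^2\Big)^{1/2}, \qquad \sigma_N = N c_j.$$

Then, similarly to previous notation,

$$\|\check{S}_N F - N F_j\|_2^2 = N^2 \sum_{k>j} |\rho_k|^2\,|V_N(\ell_k\theta) - 1|^2 \le 4N^2 \varepsilon_{j+1}^2 c_j^2,$$

$$\|S'_N F\|_2^2 \le 4N^2 c_0^2 \varepsilon_j^2, \qquad \|s_N F\|_2 \le N|\rho_j|.$$

Thus

$$\Big\|\frac{S_N F}{\sigma_N} - \frac{F_j}{c_j}\Big\|_2 \le \Big\|\frac{\check{S}_N F}{\sigma_N} - \frac{F_j}{c_j}\Big\|_2 + \Big\|\frac{\hat{S}_N F}{\sigma_N}\Big\|_2 \le 2\varepsilon_{j+1} + \frac{2\varepsilon_j + |\rho_j|}{c_j} \to 0,$$

as $N$ tends to infinity under the assumptions made. And by the triangle inequality

$$\lim_{N\to\infty} \frac{\|S_N F\|_2}{\sigma_N} = 1.$$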

Using now the elementary inequality |eiu − eiv | ≤ 2| sin((v − u)/2)| ≤ |v − u| gives 1 SN F

Fj exp(iλSN F (t)/σN ) − exp(iλFj (t)/cj ) dt ≤ |λ| − σN cj 1 0 2εj + |ρj | ≤ |λ| 2εj +1 + → 0, cj as N tends to infinity. Let now κ : N → N be some increasing map satisfying |ρk |2 /cj2 ≤ εj2 (j ≥ 1), k>κ(j )

" # and write j = [j, κ(j )[, Fj = j j |ρ | k>j k k>j


which tends to 0 as j tends to infinity. Redefining J if necessary, we deduce that for all t ∈ T and j ≥ J , ( j j zk (t) exp[zk (t)2 /2] ≤ C3 sup |ρk |/ck . exp(iλFj (t)/bj ) − k>j

k∈j

Integrating now on X leads for j ≥ J to

0

1

exp(iλFj (t)/bj ) −

(

dt ≤ C3 sup |ρk |/ck .

j j zk (t) exp[zk (t)2 /2]

k>j

k∈j

Put for any integer j ≥ 1, ( ( j (λ, t) = (1 + iλzk (t)), j

Bj (t) =

k∈j

j

(zk (t))2 .

k∈j

1) )

We have 0 j (λ, t)dt = 1. Indeed the product being k 1 + iλ[βk1 cos 2π k t + βk2 sin 2πk t] is representable as a sum of 1 plus a linear combination of cos 2π(k1 ± · · · ± kr ) or sin 2π(k1 ± · · · ± kr ). Since we assumed (k ) to be r-lacunary with r ≥ 3, the fact that the representation of a number n as n = k1 ± · · · ± kr is unique allows us to conclude our argument. We can thus factorize as follows:

1(

0

2 2 (λ, t). exp −λ Bj (t)/2 dt − exp(−λ /4)

j

=

1(

0

j

1(

≤

(λ, t) exp −λ2 Bj (t)/2 − exp(−λ2 /4)dt.

0

Now

" 2 # 2 (λ, t) exp −λ Bj (t)/2 − exp(−λ /4) dt

j

( (λ, t) =

(

j

j

(1 + |λ|2 |zk (t)|2 )1/2 ≤ exp(λ2 Bj (t)/2).

j κj

(βk1 )2 +(βk2 )2 . 2

$

Then since 0 ≤ Bj (t) ≤ 2,

%

λ2 1 − Bj (t) dt 2 2 $ % 2 32 /4 1 (βk1 )2 + (βk2 )2 ≤ e + Cj 2 . 2 2 cj2 k>κ

1 − exp

−

j

It remains to observe that

Cj 22 ≤

1 1 |ρk |4 /cj4 ≤ sup |ρk |2 /ck2 → 0, 4 4 k>j k∈j

as j tends to infinity by assumption. Collecting now these various estimates finally gives: for |λ| ≤ and j ≥ J = J (), N ∈ Ij , 1 2εj + |ρj | SN F (t) λ2 exp(iλ )dt − exp(− ) ≤ 2εj +1 + σN 4 cj 0 + εj + C3 sup |ρk |/ck + C2 exp(32 /4) sup |ρk |2 /ck2 → 0 k>j

k>j

as $N$ tends to infinity. This completes the proof.

Let us now consider the random Fourier series defined in (7.5.4) and assume that the sequences $g$, $g'$ are independent and Gaussian.

7.5.5 Theorem (CLT for sample paths). Assume that

$$\lim_{j\to\infty} \frac{|\beta_j|\,(\log j)^{1/2}}{b_j} = 0, \qquad \sum_{j\ge 1} \frac{\varepsilon_j^2 \log N_{j+1}}{b_j^2} < \infty. \tag{7.5.23}$$

Then almost all sample paths of $X$ satisfy the CLT.

Proof of Theorem 7.5.5. We may write

$$X(\omega, t) = \sum_{k\in\mathbb{N}} \Re\big(\rho_k(\omega)\,e_{\ell_k}(t)\big) \quad\text{with}\quad \rho_k(\omega) = \beta_k g_k(\omega) + i\beta_k g'_k(\omega) \qquad (k \ge 1).$$


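Put

$$\varphi_N(\omega)^2 = \sum_{k>j} \big(g_k(\omega)^2 + g'_k(\omega)^2\big)\,\beta_k^2 \qquad (N \in I_j,\ j \ge 1).$$

Consider the Gaussian chaos of order 2,

$$B_j = \sum_{k>j} \frac{\beta_k^2}{b_j^2}\,\big(g_k^2 + g_k'^2 - 2\big).$$

By the hypercontractivity properties of Gaussian chaos (Ledoux–Talagrand [1991: p. 65]), for any integer $q \ge 2$,

$$\|B_j\|_{q,\mathbf{P}} \le q\,\|B_j\|_{2,\mathbf{P}}.$$

Let $w_j = \|B_j\|_{2,\mathbf{P}}$ and $\alpha_j = \frac{1}{2e w_j}$. Then $\mathbf{E} \exp(\alpha_j |B_j|) \le 1 + \alpha_j \|B_j\|_{1,\mathbf{P}} + \sum_{q=2}^\infty \frac{1}{q!} (\alpha_j q)^q w_j^q$. Since $n! > \sqrt{2\pi n}\,n^n e^{-n} \exp\big\{\frac{1}{12n + 1/4}\big\}$ for any positive $n$ (Mitrinović [1970: p. 183]), we deduce

$$\sum_{q=2}^\infty \frac{1}{q!} (\alpha_j w_j q)^q \le \frac{1}{2\sqrt{\pi}} \sum_{q=2}^\infty 2^{-q} = \frac{1}{4\sqrt{\pi}}.$$

Thus $\mathbf{E} \exp(\alpha_j |B_j|) \le C$ (with $C = 1 + \frac{1}{2e} + \frac{1}{4\sqrt{\pi}}$). Applying then Tchebycheff's inequality gives, for any positive real $\eta$,

$$\mathbf{P}\{|B_j| \ge \eta\} \le C \exp\Big\{-\frac{\eta}{2e w_j}\Big\}.$$

Now $\|B_j\|_{2,\mathbf{P}}^2 = 2\big(\mathbf{E}\,N(0,1)^4 - 1\big) \sum_{k>j} \beta_k^4 / b_j^4$. By assumption,

$$\sum_{k>j} \frac{\beta_k^4}{b_j^4} = o\Big(\sum_{k>j} \frac{\beta_k^2}{b_j^2 \log^2 j}\Big) = o\Big(\frac{1}{\log^2 j}\Big).$$

Thus $w_j = o\big(\frac{1}{\log j}\big)$. Let $\rho > 0$ be such that $2e\rho < \eta$. For $j$ large, $w_j \le \rho / \log j$, and so

$$\mathbf{P}\{|B_j| \ge \eta\} \le C \exp\Big\{-\frac{\eta}{2e w_j}\Big\} \le C \exp\Big\{-\frac{\eta}{2e\rho} \log j\Big\}.$$

By the Borel–Cantelli lemma and using the fact that $\eta$ can be arbitrarily small, we deduce $|B_j| \overset{\text{a.s.}}{=} o(1)$ or,

$$\lim_{j\to\infty} \frac{1}{b_j^2} \sum_{k>j} \big(g_k^2 + (g'_k)^2\big)\,\beta_k^2 \overset{\text{a.s.}}{=} 2.$$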

Therefore NϕN (ω) 1 2 gk (ω) + (gk (ω))2 βk2 = 2, = lim 2 N→∞ j →∞ b τN j lim

k>j


P-almost surely. The assumption of Theorem 7.5.3 here reduces to lim

1/2

εj + gj2 (ω) + (gj (ω))2 |βj |

j →∞

bj

= 0, g (ω)2 +g (ω)2

< which is trivially satisfied under the assumptions made, since supk≥1 k log kk ∞, P-almost surely. Applying now Theorem 7.5.3 gives for P-almost all trajectories of X(ω), SN (X(ω))/N ϕN (ω) "⇒ N (0, 1), √ as N tends to infinity. And thus SN (X(ω))/( 2τN ) ⇒ N (0, 1), as N tends to infinity, P-almost surely. Now we deduce from Remark 7.5.2 that SN (X) − j lim sup

j →∞ N∈Ij

τN

2,λ

= 0 "⇒

SN (X) lim sup

j →∞ N ∈Ij

τN

2,λ

− j 2,λ = 0,

P-almost surely. But we have seen that 1 2 gk (ω) + (gk (ω))2 βk2 = 2, 2 j →∞ b j k>j

lim j 22,λ = lim

j →∞

P-almost surely. Thus P limN →∞

SN (X(ω) 2,λ τN

=

√ 2 = 1. Finally,

SN (X(ω))/ SN (X(ω) 2,λ "⇒ N (0, 1), as N tends to infinity, P-almost surely. This achieves the proof. 7.5.6 Examples. We end the section with some examples. Example 1. Put for k ≥ 3 (α > 2, b > 1), −1/2

βk = |k|(log |k|)b .

εk = k −α/2 ,

n−1 ) −2 and so Nn 3 exp(α n log n), Then n−1 k=2 εk = exp k=2 α log k 3 exp(α n log n); 2 where α > 0 and α > 0 depend on α only. Besides bj 3 |k|≥j +1 |k|−1 (log |k|)−b 3 β √ (log j )−b+1 . Further bjj log j 3 √1j → 0, as j tends to infinity, and εj2 log Nj +1 j ≥1

bj2

3

(log j )b−1 j log j j ≥1

jα

< ∞.

Thus r < ∞ for any r > 2. Consequently Wr (X(ω)) < ∞ and X(ω) ∈ CLT for almost all ω ∈ .


Example 2. Put for k ≥ 3 (c > b > 1), εk = exp[−(log k)c ],

βk2 = exp[−(log k)b ] − exp[−(log(k + 1))b ]. β2

Then βj2 3 jb (log j )b−1 exp[−(log(j ))b ]. Hence bj2 3 jb (log j )b−1 . Further εn ηn−1 3 j

exp[−n(log n)c ], and so Nn 3 exp[−n(log n)c ]. Therefore εj2 log Nj +1 j ≥1

bj2

3

exp[−(log j )c + (log j )b ]j (log j )c < ∞.

j ≥1

Thus r < ∞ for any r > 2. Consequently Wr (X(ω)) < ∞, X(ω) is continuous and X(ω) ∈ CLT for almost all ω ∈ . The third example will be used in the next section. Example 3. Put for k ≥ 1, βk2 = log−4 (k + 2) exp[−k log−4 k],

εk2 = k −4 exp[−k log−4 k].

− x − x The following lemma is elementary (i.e., uses e (log x)4 3 (log1x)4 e (log x)4 , x large). For any positive integer j put dj =

Nj ≤N j

ak2 bi2

|aj | 2 k>j ak

1/2 = 0,

1/2 = lim j →∞

|bj | i>j

bi2

1/2 = 0.

Then τ (f, g) SN D −→ N (0, 1). τ

SN (f, g) 2

Proof. Plainly τ SN (f, g)(t)

∞ N = ak bi k,i (t, θ ), 2 k,i=1

where we put "

# k,i (t, θ ) = cos 2π t (k + i )$ VN (k − 2i )θ "

# + cos 2π t (k − i )$ VN (k + 2i )θ "

# − sin 2π t (k + i )4 VN (k + 2i )θ "

# − sin 2π t (k − i )4 VN (k − 2i )θ .

(7.5.25)

Since L is assumed to be ρ-lacunary with ρ ≥ 3, the decomposition n = k + i is unique; hence τ

SN (f, g) 22 =

∞ 2

2 N 2 2 2

ak bi VN (k + 2i )θ + VN (k − 2i )θ . 4 k,i=1


Next put for any j ≥ 1 and N ∈ Ij N τ SˇN (f, g) = ak bi k,i (t, θ ), 2 k,i>j

τ SN (f, g) =

N 2

ak bi k,i (t, θ ),

k or i <j

N N τ sN (f, g) = aj bi j,i (t, θ ) + bj ak k,j (t, θ ), 2 2 i≥j

(7.5.26)

k≥j

N Rjτ (f, g) = ak bi {cos 2π t (k + i ) + cos 2π t (k − i ) 2 k,i>j

− sin 2π t (k + i ) −sin 2π t (k − i )}. Then, τ Sˇ (f, g) − R τ (f, g) 2 N j 2 2

2 N 2 2 2

= ak bi VN (k + 2i )θ − 1 + VN (k − 2i )θ − 1 . 4 k,i>j

Define for u ∈ R, | u | = dist(u, Z); ( | u | ∈ [0, 1/2] and | u + v | ≤ | u | + | v | ). Then VN (ψ) = VN (| ψ |). We assume that N1 is chosen so that ε1 /(16N1 ) ≤ 1/8. Thus | i θ | = {i θ} and | 2i θ | = {2i θ } = 2{i θ }, (i ≥ 1). Combining these elementary observations with (7.5.5) gives

VN (k ± 2i )θ − 1 = VN | (k ± 2i )θ | − 1 ≤ 32N| (k ± 2i )θ | ≤ 32N | k θ | + 2| i θ | η ηi k (7.5.27) ≤ 64Nj +1 + 16k N1 16i N1 6416j N1 ηk ηi ≤ ≤ 8εj +1 . + ηi εj +1 16k N1 16i N1 τ 2 Therefore SˇN (f, g) − Rjτ (f, g) 2 ≤ 4N 2 k,i>j ak2 bi2 εj2+1 and so τ Sˇ (f, g) − R τ (f, g) ≤ 2N f 2 g 2 εj +1 . (7.5.28) N j 2 τ (f, g), we have As regarding SN 2 τ S (f, g) 2 = N N 2 4

2

2

ak2 bi2 VN (k + 2i )θ + VN (k − 2i )θ .

k or i <j

Observe that {k θ } = | k θ | ≤ 1/8, {2i θ } = 2{i θ } = | 2i θ | ≤ 1/4. Hence | (k ± 2λi )θ | = {k θ } ± 2{λi θ }.


7.5 CLT for rotations

Thus, by (7.5.5) VN (k ± 2λi )θ ≤ 1/2Nj {k θ } ± 2{λi θ }. By distinguishing the three cases k = i, k > i and k < i, it is easy to show that {k θ } ± 2{λi θ } ≥ 3 {h θ } (h = k ∧ i). 4 Since h < j we have {h θ} ≥ Therefore

3 ηj −1 ηj −1 1 16j −1−h "⇒ {k θ } ± 2{λi θ } ≥ . j −1 . . j −1 ηh ≥ j −1 2 16 N1 2.16 N1 8 16 N1 j −1

VN (k ± 2λi )θ ≤ 8 . ηj −1 εj . 16 N1 ≤ 3εj . 3 16j −1 N1 ηj −1

(7.5.29)

Thus 9N 2 εj2 τ τ 3N εj 2 2 S (f, g) 2 ≤ "⇒ SN a b (f, g) 2 ≤

f 2 g 2 . k i N 2 4 2 k or i <j

(7.5.30) Finally 1/2 1/2 τ 2 s (f, g) ≤ N |aj | . b + |b | ak2 j i N 2 i>j

(7.5.31)

k>j

Summarizing, we find by combining estimates (7.5.28), (7.5.30) and (7.5.31), τ S (f, g) − R τ (f, g) N j 2 τ τ τ τ ˇ ≤ SN (f, g) − Rj (f, g) 2 + SN (f, g) 2 + sN (f, g) 2 (7.5.32) $ 1/2 1/2 % ≤ 4N f 2 g 2 εj + N |aj | bi2 + |bj | ak2 . i>j

Therefore

k>j

τ τ Rjτ (f, g) SN (f, g) 2 SN (f, g) R τ (f, g) − 1 ≤ R τ (f, g) − R τ (f, g) 2 2 2 2 j j j εj ≤ 4 f 2 g 2 2 2 1/2 k,i>j ak bi $ % |aj | |bj | + + .

2 1/2 2 1/2 k>j ak i>j bi

Under the assumptions made, we deduce τ Rjτ (f, g) SN (f, g) = 0, − lim sup j →∞ N ∈Ij Rjτ (f, g) 2

Rjτ (f, g) 2 2 τ (f, g)

SN 2 = 0. lim sup − 1 2 2 1/2 j →∞ N ∈Ij N k,i>j ak bi )

(7.5.33)


Thus the normalized partial sums are close, in the $L^2(\lambda)$-norm, to the sequence of normalized remainders $R_j^\tau(f,g)/\|R_j^\tau(f,g)\|_2$. We are led to the same situation as the one treated in detail before, and we deduce that

\[ \frac{S_N^\tau(f,g)}{\|S_N^\tau(f,g)\|_2} \xrightarrow{\;D\;} N(0,1), \tag{7.5.34} \]

as $N$ tends to infinity. We now pass to the study of another example.

Weighted ergodic sums. Let $\sigma = \{\sigma_k, k \ge 0\}$ be a sequence of reals and put $S_N = \sum_{k=0}^{N-1}\sigma_k$. Consider for $f \in L^2(m)$ the weighted sums

\[ U_N^\tau f = \sum_{k=0}^{N-1} \sigma_k\, f\circ\tau^k \qquad (N \ge 1). \tag{7.5.35} \]

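The kernels corresponding to the averages of these weighted sums are defined by $W_N(t) = \frac{1}{S_N}\sum_{k=0}^{N-1}\sigma_k e^{2i\pi kt}$. We have the easy estimates, valid for any $M \ge N \ge 0$ and $t \in X$,

\[ \begin{aligned}
&\text{i)}\quad |W_M(t) - W_N(t)| \le 4\pi Mt\,\frac{S_M - S_N}{S_M},\\
&\text{ii)}\quad |W_M(t) - 1| \le 4\pi Mt,\\
&\text{iii)}\quad |W_N(t)| \le \frac{2\sigma_\infty}{S_N|\sin \pi t|} \qquad \Big(\sigma_\infty = \sigma_0 + \sum_{k=0}^{\infty}|\sigma_k - \sigma_{k+1}|\Big).
\end{aligned} \tag{7.5.36} \]

The first estimate comes from the equation

\[ W_M(t) - W_N(t) = \sum_{k=0}^{M-1}\frac{\sigma_k}{S_M}\big(e^{2i\pi kt} - 1\big) - \sum_{k=0}^{N-1}\frac{\sigma_k}{S_N}\big(e^{2i\pi kt} - 1\big); \]

the last is obtained by Abel summation. Let $\mathcal N = \{N_j, j \ge 1\}$ be some rapidly increasing sequence of integers greater than 2 and write again $I_j = [N_j, N_{j+1}[$. We assume that $\eta_j := N_j/N_{j+1}$ tends to 0 as $j$ tends to infinity. We choose a sequence $\mathcal L = \{\ell_j, j \ge 1\}$ so that

\[ \frac{1}{N_k} - \frac{1}{N_k^2} \le \{\ell_k\theta\} < \frac{1}{N_k}. \]

Let $\beta^1 = (\beta_k^1)_{k\ge1}$, $\beta^2 = (\beta_k^2)_{k\ge1}$, $\beta^1, \beta^2 \in \ell^2$, put $\rho = (\rho_k)_{k\ge1}$ with $\rho_k = \beta_k^1 + i\beta_k^2$, and consider

\[ F(t) = \sum_{k=1}^{\infty} \Re\big(\rho_k e_{\ell_k}(t)\big) \qquad (\forall t \in \mathbb T). \]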
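Estimate ii) of (7.5.36) can be checked numerically. The following is a minimal sketch, assuming positive weights; the function names and the weight sequence used in the usage note are illustrative choices, not taken from the text.

```python
import cmath
from math import pi

def kernel(weights, n, t):
    """W_N(t) = (1/S_N) * sum_{k<N} sigma_k e^{2 i pi k t}, with S_N = sum_{k<N} sigma_k.

    Assumes the first n weights are positive, so that S_N > 0.
    """
    s_n = sum(weights[:n])
    total = sum(w * cmath.exp(2j * pi * k * t) for k, w in enumerate(weights[:n]))
    return total / s_n

def estimate_ii_gap(weights, n, t):
    """Return 4*pi*N*t - |W_N(t) - 1|; nonnegative when estimate ii) holds."""
    return 4 * pi * n * t - abs(kernel(weights, n, t) - 1)
```

For instance, with the hypothetical weights `sigma_k = 1/(k+1)`, calling `estimate_ii_gap(weights, 50, 0.01)` returns a positive number, in line with the bound $|e^{2i\pi kt} - 1| \le 2\pi kt$ applied term by term.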

7.5.9 Theorem. Assume that SN ≥ N α for some positive α. If


i) limj →∞

ηj

k>j

ρk2

1/2 = 0, ρ2

k = 0 (∀0 < a ≤ 1), ii) limj →∞ aj ≤k≤j 1/2 k>j

ρk2

then

UNτ F D −→ N (0, 1).

UNτ F 2

If ρk2 = √ k −1 (log k)−2 , then condition (ii) is satisfied, and condition (i) means limj →∞ ηj / log j = 0. Theorem 7.5.10 applies in this case provided that N grows sufficiently fast. Proof. By estimates (7.5.36): for N ∈ Ij , WN (k θ ) − 1 = WN ({k θ }) − 1 ≤ 4π ηj +1 WN (k θ ) ≤ 8σ∞ η[j a]−1 (k < [j a]). π Put for any j ≥ 1, N ∈ Ij and t ∈ X: " # Uˇ N (F ) = SN $ ρk WN (k θ )ek , k>j +1

uN (F ) = SN

(k > j + 1),

UN (F ) = SN

"

#

$ ρk VN (k θ )ek ,

" # $ ρk WN (k θ )ek ,

k≤[j a]

Fj (t) =

[j a] 0.

N −1 k=0

f τ P (k)

Proof. We use the circle method. Let N = {Nk , k ≥ 1} be increasing to infinity and put $ % 1 1 1 k = − 2, . Nk Nk Nk 1/ h

Let 0 < h < 1/3 be fixed. Put for k ≥ 1 and N ≥ Nk , 2−h k (N) = α ∈ k : ∃q ≤ N h and a with (a, q) = 1 : α − qa < N2 . Then, λ(k (N)) ≤

#

2−h a 2 2−h " λ k ∩ qa − N2 ,q + N .

q≤N h

(a,q)=1 dist(a/q,k )

1 N 2−h

.

Let α ∈ N=2p ,N ≥N 1/ h k (N )c ; then for all p ≥ p0 , q ≤ (2p )h and (a, q) = 1, k α − a < 2p2−h . Let 2p ≤ N < 2p . For q ≤ N h , we have q ≤ (2p )h and for a such 2−h q (2 ) that (a, q) = 1, 2−h 1 α − a ≥ 2 ≥ 2−h . p 2−h q (2 ) N 1 Therefore, for any p ≥ p0 and 2p ≤ N < 2p the inequality α − qa ≥ N 2−h , is fulfilled h for any q ≤ N and a with (a, q) = 1. This implies that λ(∗k ) ≥

1 λ(k ) > 0. 2

By using Birkhoff’s theorem: for almost all x ∈ X and integers k ≥ 1, #{0 ≤ j < J : x + j θ ∈ ∗k (mod 1)} = λ(∗k ). J →∞ J lim

Thus we can find an x ∈ X and an increasing sequence of positive integers L = {k , j ≥ 1} such that x + k θ ∈ ∗k (mod 1) (∀k ≥ 1). Define, for ρ = {ρk , k ≥ 1} ∈ 2 , F =

∞

$ ρk .ek +x/θ . k=1

The system of functions (ek +x/θ , k ≥ 1) is orthonormal, and τ N F =N

∞

$ ρk .ek +x/θ · QN (k θ + x) . k=1

h a) We fix N. We first consider the summation block1 corresponding to N ≤ Nk < a We have the following estimate: if α − q < q 2 and (a, q) = 1,

N 2−h .

N 1 q 1/2 2iπ αn2 1+ε 1 e . + + 2 < Cε N q N N n=1

We choose ε = h/4 and assume N1 > 2. Since x + k θ ∈ ∗k ⊂ k (mod 1), we have 1 1 1 − 2 ≤ | x + k θ | = {x + k θ } < , Nk Nk Nk


and so N 1 1 Nk 1/2 2 e2iπ(x+k θ )n < Ch N 1+h/4 + ≤ Ch N 1−h/4 . + 2 Nk N N n=1

Therefore |QN (x + k θ )| ≤ Ch N −h/4

(N h ≤ Nk < N 2−h )

and X

2 $ ρk · ek +x/θ · NQN (k θ + x) dt

k: N h ≤Nk j

Our first objective is to prove the following proposition. 7.6.2 Proposition. There exists an absolute constant C such that for any j large enough and |λ| ≤ j ,

1 0

2 Fj (t) |ρk | −λ2 /2 ≤ Cej sup exp(iλ )dt − e . cj k>j ck

Proof. Let ε = {εj , j ≥ 1} be a decreasing sequence of reals satisfying εj ≤ sup |ρk |.

(7.6.8)

k>j

Let now κ : N → N be some increasing function satisfying for any positive integer j , |ρk |2 /cj2 ≤ εj2 , (7.6.9) k>κ(j )

7.6 Lacunary series and convergence in variation


" # and put j = [j, κ(j )[, Fj = j κj

≤

2j 32 /4 " 2 e j εj 2

(7.6.13)

# + Cj 2 .

It remains to observe that 1 1 1 |ρk |4 /cj4 ≤ sup |ρk |2 /cj2 ≤ sup |ρk |2 /ck2 .

Cj 22 ≤ 4 4 k>j 4 k>j

(7.6.14)

k∈j

Putting together estimates (7.6.9) to (7.6.14) finally gives: there exists J2 < ∞ such that for any j ≥ J2 , |λ| ≤ j and N ∈ Ij , 1 2

(t) F λ j )dt − exp − exp(iλ cj 2 0 3j 3 2 1 sup |ρk | + 2j e 4 j εj2 + sup |ρk | ≤ cj k≥j 2cj k>j 2 |ρk | 3 3 2 43 2j |ρk | ≤ sup j + j e ≤ Cej sup . 2 k>j ck k>j ck This proves Proposition 7.6.2. Application to Gaussian random Fourier series. Now we pass to random Fourier series. Let g = (gk )k∈N , g = (gk )k∈N be two independent sequences of N (0, 1) distributed


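random variables defined on a probability space $(\Omega, \mathcal A, \mathbf P)$ different from $(\mathbb T, \lambda)$. Let $\beta = (\beta_k)_{k\in\mathbb N} \in \ell^2$. Put $b_j = \big(\sum_{k>j}\beta_k^2\big)^{1/2}$. We assume that the following condition is satisfied:

\[ \lim_{k\to\infty} e^{b_k^{-\varepsilon}} \sup_{j\ge k} |\beta_j|\sqrt{\log j} = 0 \quad \text{for some } \varepsilon > 0. \tag{B1} \]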

Let

−2/δ

k = bk

,

(7.6.15)

where δ is chosen sufficiently large to satisfy 2

2 e j lim j sup |βj | log j = 0. k→∞ bj j ≥k

(7.6.16)

It is enough to take δ > 4/ε, where ε is defined in (B1). For k ≥ 1 we put ρk = βk (gk + 1 2 2 2 1

2 2 = 2 igk ). Then with the preceding notation, cj = k>j |ρk | k>j gk +gk βk . Consider for t ∈ T, ω ∈ the following Gaussian random Fourier series:

X(ω, t) = $ ρk (ω)ek (t) k∈N

=

∞

βk gk (ω) cos 2π k t + gk (ω) sin 2π k t ,

k=1

Tj (ω, t) =

1

$ ρk (ω)ek (t) cj k>j

1

= βk gk (ω) cos 2π k t + gk (ω) sin 2π k t , cj

(7.6.17)

k>j

1

j (ω, t) = √ $ ρk (ω)ek (t) bj 2 k>j 1

= √ βk gk (ω) cos 2π k t + gk (ω) sin 2π k t . bj 2 k>j 7.6.3 Proposition. Under conditions (B1) and ∞

βk2 log2 k < ∞,

(B2)

k=1

we have sup

|λ|≤j

0

1

exp(iλTj (ω, t))dt − e

−λ2 /2 a.s.

= O e

2j

√ |βj | log j . bj


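Proof. Consider the Gaussian chaos of order 2,

\[ B_j = \sum_{k>j} \frac{\beta_k^2}{b_j^2}\big\{g_k^2 + (g_k')^2 - 2\big\}. \]

By the hypercontractivity properties of Gaussian chaos (see for instance Ledoux and Talagrand [1991: inequality 3.8]), for any integer $q \ge 2$,

\[ \|B_j\|_{q,\mathbf P} \le q\,\|B_j\|_{2,\mathbf P}. \]

Let $w_j = \|B_j\|_{2,\mathbf P}$ and $\alpha_j = \frac{1}{2ew_j}$. Then,

\[ \mathbf E \exp(\alpha_j|B_j|) \le 1 + \alpha_j\|B_j\|_{1,\mathbf P} + \sum_{q=2}^{\infty} \frac{1}{q!}(\alpha_j q)^q w_j^q. \]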

Using the elementary estimate n! > integer n, we deduce

√ " 2π nnn e−n exp

1 12n+ 41

#

valid for any positive

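\[ \sum_{q=2}^{\infty} \frac{1}{q!}(\alpha_j w_j q)^q \le \frac{1}{2\sqrt{\pi}} \sum_{q=2}^{\infty} 2^{-q} = \frac{1}{4\sqrt{\pi}}. \]

Thus $\mathbf E \exp(\alpha_j|B_j|) \le C$ (with $C = 1 + \frac{1}{2e} + \frac{1}{4\sqrt{\pi}}$). Applying then Tchebycheff's inequality gives for any positive real $\eta$,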

ηj . P |Bj | ≥ ηj ≤ C exp − 2ewj

(7.6.18)

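Now $\|B_j\|_{2,\mathbf P}^2 = 2\big(\mathbf E N(0,1)^4 - 1\big)\sum_{k>j}\beta_k^4/b_j^4$, and by assumption (B1) we have for any $k$ large enough $|\beta_k| \le (\log k)^{-1/2}\exp(-b_k^{-\varepsilon})$. As $\exp(b_k^{-\varepsilon}) \ge C_p b_k^{-\varepsilon p}$, we get the bound $|\beta_k| \le C(\log k)^{-1/2} b_k^2$ for a suitable choice of $p$. Thus

\[ \sum_{k>j} \frac{\beta_k^4}{b_j^4} \le \frac{C}{\log j}\sum_{k>j}\beta_k^2 \le \frac{C}{\log^2 j}\sum_{k\ge1}\beta_k^2\log k \le \frac{C}{\log^2 j}, \tag{7.6.19} \]

where we have used (B2) to get the last inequality. Henceforth, for $j$ large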

P |Bj | ≥ ηj

ηj ≤ C exp − 2ewj

≤ exp(−Cηj log j ).

By the Borel–Cantelli lemma and using the fact that η can be arbitrarily small, we deduce a.s. |Bj | = o(−1 (7.6.20) j ).


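In particular,

\[ \lim_{j\to\infty} \frac{1}{b_j^2}\sum_{k>j}\big(g_k^2 + (g_k')^2\big)\beta_k^2 \overset{\text{a.s.}}{=} 2. \tag{7.6.21} \]

It follows that $\lim_{j\to\infty} c_j/b_j \overset{\text{a.s.}}{=} \sqrt{2}$. As $\mathbf E \sup_{k\ge1}\big(g_k^2 + (g_k')^2\big)/\log k < \infty$, we have

\[ \sup_{k>j}|\rho_k| = \sup_{k>j} \frac{|\beta_k|\big(g_k^2 + (g_k')^2\big)^{1/2}\sqrt{\log k}}{\sqrt{\log k}} \le C \sup_{k>j}|\beta_k|\sqrt{\log k}, \]

where $C$ is a random variable with finite expectation. Therefore, by (7.6.16),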

lim e

j →∞

2j

2 |ρk | √ ej a.s. sup ≤ 2C lim sup |βk | log k = 0. j →∞ bj k>j k>j ck

This implies that assumption (H2 ) is satisfied for almost all sample paths of X. Applying now Proposition 7.6.2 gives 1 √ |βk | log k 2j −λ2 /2 a.s. exp(iλTj (ω, t))dt − e . sup = O e sup bk 0 |λ|≤j k>j This proves Proposition 7.6.3. We close the section by establishing the following corollary. 7.6.4 Corollary. Under assumptions (B1), (B2) and with the choice of j defined above, 1 2 1 2 sup max 0 exp(iλTj (ω, t))dt − e−λ /2 , 0 exp(iλj (ω, t))dt − e−λ /2 |λ|≤j

equals o(−1 j ) almost surely. Proof. By assumption, we deduce from Proposition 7.6.3 the inequality concerning Tj . Now 1! exp(iλTj (t)) − exp(iλj (t)) dt 0 1 Tj (t) − j (t)dt ≤ |λ| 0 (7.6.22) cj 1 |Tj (t)|dt = |λ|1 − √ bj 2 0 cj cj ≤ |λ|1 − √ Tj 2,λ = |λ|1 − √ . bj 2 bj 2


But,

1 −

2 2 2 cj k>j gk + (gk ) βk . − 1 √ ≤ 2bj2 bj 2

The proof is then achieved by using (7.6.20).

Small values of some trigonometric series. In this section we give conditions under which, for any $\alpha > 0$, the following estimate is fulfilled:

\[ M(\alpha) = \sup_j \int_0^1 \frac{dt}{\Big(\sum_{k\ge j}\beta_k^2\sin^2\pi\ell_k t \Big/ \sum_{i\ge j}\beta_i^2\Big)^{\alpha}} < \infty. \tag{E} \]

First, we prove a series of intermediate results. 7.6.5 Proposition. Let K > 0 be an integer, (γn )n≥1 be a positive sequence with 2 n≥1 γn < ∞, and L = {n , n ≥ 1} be a sequence of integers such that n+1 is a multiple of n for every n ≥ 1. Set r = K /1 . Then, for any ε > 0, we have λ

n≥1

γn2 sin2 π n x

0. Then we have 1 λ(I ∩ A) ≤ λ(I )λ(A) + λ(A). s Proof.,It suffices to prove the estimate asserted by the lemma for sets of the form A = s−1 i=0 ( + (i/s)), where is an interval of length λ() ≤ 1/s. Indeed, once this is proved, the result follows since the Borel σ -field coincides with the monotone class generated by the Boole algebra of disjoint unions of sets (Ai )i≤1 , where each Ai has the same form as A. , We turn now to the proof of the assertion for the set A = s−1 i=0 ( + (i/s)) with as above. The maximal number of such i ∈ {0, . . . , s − 1} for which + (i/s) intersects I is bounded above by (λ(I )/(1/s)) + 1 = λ(I )s + 1. Then

λ(I ∩ A) ≤ (λ(I )s + 1)λ() = λ(I )sλ() + which completes the proof of the lemma.

sλ() 1 λ(A), = λ(I ) + s s


Proof of Proposition 7.6.5. We have that λ x ∈ T: γn2 sin2 π n x < ε n≥1

γn2 sin2 π n x < ε ≤ λ x ∈ T : γ12 sin2 π 1 x < ε, n≥K

(7.6.23)

n = λ x ∈ T : γ12 sin2 π x < ε, γn2 sin2 π rx < ε , 1 n≥K

and it follows now from Lemma 7.6.6 that γn2 sin2 π n x < ε λ x ∈ T: n≥1

ε 1 ≤ λ x ∈ T : sin π x < 2 + λ x ∈ T: γn2 sin2 π(n /1 )x < ε K γ1 n≥K 2

ε1/2 1 ≤ λ x ∈ T : 2x < + λ x ∈ T: γn2 sin2 π(n /1 )x < ε γ1 K

=

1/2 ε

γ1

+

n≥K

1

r

λ x ∈ T:

γn2 sin2 π n x < ε .

n≥K

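7.6.7 Proposition. Assume that $\{\gamma_n, n \ge 1\}$ is a sequence of positive reals satisfying $\sum_{n\ge1}\gamma_n^2 < \infty$ and let $(\ell_n)_{n\ge1}$ be a sequence of integers such that $\ell_{n+1}$ is a multiple of $\ell_n$ for every $n \ge 1$. Furthermore, let $m > 1$ and $K_m > K_{m-1} > \dots > K_0 \ge 1$ be some integers. Then,

\[ \lambda\Big\{x \in \mathbb T : \sum_{n\ge1}\gamma_n^2\sin^2\pi\ell_n x < \varepsilon\Big\} \le \prod_{j=1}^{m}\Big(\frac{\varepsilon^{1/2}}{\gamma_{K_{j-1}}} + \frac{\ell_{K_{j-1}}}{\ell_{K_j}}\Big). \]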

Proof. Follows from Proposition 7.6.5 by induction. As an immediate consequence, we have the following corollary. 7.6.8 Corollary. Under the assumptions of Proposition 7.6.7, let us additionally suppose that for s = 1, . . . , m, Ks /Ks−1 ≥ ρ and that γn ≥ γ > 0 for n = 1, . . . , Km . Then we have

λ x∈T:

n≥1

γn2 sin2 π n x

pm , we have pm ≥ kδm , thus γmp ≥ 2−δmp , and we can continue with 1 m 1 λ x∈T: γn2 sin2 π n x < 2−p ≤ 2− 2 p 2δmp + 2−p ≤ 2m · 2−p( 2 −δm)m n≥1

$\le 2^m \cdot 2^{-pm/6}$. Define

\[ d_\gamma(x) = \sum_{n=1}^{\infty}\gamma_n^2\sin^2\pi\ell_n x. \]

Thus for any sequence $\gamma$ satisfying (H3), we have proved the following assertion: for any integer $m \ge 1$, there exists a number $p_m$ depending on $m$ and $\gamma$ only, such that for any $p > p_m$ the following estimate holds true:

\[ \lambda\big\{x \in \mathbb T : d_\gamma(x) < 2^{-p}\big\} \le 2^m \cdot 2^{-pm/6}. \tag{7.6.24} \]

Since $m \ge 1$ is an arbitrary integer, the latter relation implies that

\[ \int_0^1 d_\gamma^{-\alpha}(x)\,dx < \infty, \tag{7.6.25} \]

for every $\alpha > 0$. Throughout the rest of the section, we assume that for some $b > 1$, $\beta_k = k^{-1/2}(\log k)^{-b/2}$. We now pass to establishing, for every $\alpha > 1$, the relation (E). We are going to make use of the following asymptotics ($b > 1$):

\[ \sum_{k\ge j}\beta_k^2 = \sum_{k\ge j} k^{-1}(\log k)^{-b} \asymp \frac{1}{(\log j)^{b-1}}, \qquad \sum_{k\ge j}\beta_k^4 = \sum_{k\ge j} k^{-2}(\log k)^{-2b} \asymp \frac{1}{j(\log j)^{2b}}. \]
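The second asymptotic relation can be probed numerically. The sketch below assumes $\beta_k^2 = k^{-1}(\log k)^{-b}$ and truncates the infinite sum at a cutoff, which is a numerical convenience rather than part of the text. It returns the ratio of the truncated tail sum to $1/(j(\log j)^{2b})$, which should remain bounded away from 0 and infinity.

```python
from math import log

def beta_sq(k, b):
    """beta_k^2 = k^{-1} (log k)^{-b}, defined for k >= 2."""
    return 1.0 / (k * log(k) ** b)

def tail_fourth_power_ratio(j, b, cutoff=200_000):
    """(sum_{j <= k < cutoff} beta_k^4) * j * (log j)^{2b}.

    The cutoff is an assumption made for the numerics; the neglected
    remainder is of order 1/cutoff.
    """
    tail = sum(beta_sq(k, b) ** 2 for k in range(j, cutoff))
    return tail * j * log(j) ** (2 * b)
```

For $j = 1000$ and $b = 2$ the ratio stays strictly between 0 and 1, since the logarithmic factor decays slowly along the tail.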


A positive integer A will be chosen later. Fix a certain integer p ≥ 1. Then for arbitrary integers m ≥ 1 and j ≥ Ap we obtain 8 λ βk2 sin2 π k x βi2 < 2−p k≥j

=λ

i≥j

k≥j

≤λ

βk2 cos 2π k x

k≥j

≤

k≥j

8

βk2 (1/2 − (1/2) cos 2π k x)

βi2

−1

8

βi2

βi2 < 2−p

i≥j

> 1 − 2−(p−1)

i≥j

βk2 cos 2π k x > 1 .

i≥j

In view of our assumption that $\ell_{k+1}$ is a multiple of $\ell_k$ ($k \ge 1$), the sequence $(\cos 2\pi\ell_k x)_{k\ge1}$ is a reversed sequence of bounded martingale differences, and we may apply the following deviation bound.

7.6.9 Lemma. Let $X_1, \dots, X_n$ be a sequence of bounded martingale differences so that $|X_i| \le c_i$, $i = 1, \dots, n$. Then for every $x > 0$ we have

\[ \mathbf P\Big\{\sum_{i=1}^{n} X_i > x\Big\} \le \exp\Big(-\frac{x^2}{2\sum_{i=1}^{n} c_i^2}\Big). \]
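The deviation bound of Lemma 7.6.9 can be compared with an exact tail in the simplest case. The sketch below assumes independent symmetric signs $X_i = \pm 1$ (a particular bounded martingale difference sequence with $c_i = 1$); the function names are illustrative.

```python
from math import comb, exp

def rademacher_tail(n, x):
    """Exact P{X_1 + ... + X_n > x} for independent symmetric signs X_i = +-1."""
    # The sum equals 2k - n when exactly k of the n signs are +1.
    favourable = sum(comb(n, k) for k in range(n + 1) if 2 * k - n > x)
    return favourable / 2 ** n

def deviation_bound(x, c):
    """Right-hand side of Lemma 7.6.9: exp(-x^2 / (2 * sum of c_i^2))."""
    return exp(-x * x / (2.0 * sum(ci * ci for ci in c)))
```

For example, `rademacher_tail(40, 12)` can be compared with `deviation_bound(12, [1.0] * 40)`; the exact tail is the smaller of the two.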

From Lemma 7.6.9 and the previous calculations we see that for j ≥ Ap, λ

βk2 sin2 π k x

k≥j

8

βi2 < 2−p ≤ exp

i≥j

−

k≥j

2

βk2

k≥j

2

βk4

Aj (log(Aj ))2b ≤ exp −C(b) (log(Aj ))2b−2

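$\le \exp\big(-C(b)Aj\log(Aj)^2\big) \le \exp(-C(b)Aj) \le \exp(-C(b)Ap)$. Now we choose $A$ to satisfy the relation

\[ A = A(b,\kappa) > C^{-1}(b)\,\kappa\log 2. \tag{7.6.26} \]

Thus we get that for every integer $p > 1$ and $j \ge Ap$,

\[ \lambda\Big\{\sum_{k\ge j}\beta_k^2\sin^2\pi\ell_k x \Big/ \sum_{i\ge j}\beta_i^2 < 2^{-p}\Big\} \le 2^{-\kappa p}. \tag{7.6.27} \]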


Let us consider now case j ≤ Ap with a certain fixed integer p ≥ 1. For arbitrary integers m ≥ 1 and j ≤ Ap we have then 8 λ βk2 sin2 π k x βi2 < 2−p k≥j

=λ

i≥j

βk2

8

k≥j

βi2

−1

sin2 π k x < 2−p ) .

i≥j

Notice that for k ∈ [j, j + mp] with j ≤ Ap we have k ≤ (A + m)p and βk2

βi2

−1

2 ≥ β(A+m)p

i≥j

βik

−1

i≥1

1 1 ≥ C(b, m, A) . (A + m)p(log((A + m)p))2b p(log p)2b 2 2 −p We apply now Corollary 7.6.8 with γn2 = βn+j i≥j βi (n = 1, 2, . . . ), ε = 2 , −1 / p Ks = j + sp(s = 0, . . . , m) and r = 2 to obtain for j ≤ Ap the relation ≥ C(b)

λ

βk2 sin2 π k x

k≥j

8

βi2

i≥j

≤

−p/2 2 p(log p)2b

C(b, m, A)

+2

−p

m

.

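Now, by choosing $m > 2\kappa$, we may conclude that for every $p > 1$ and $j \le Ap$, with $A = A(b,\kappa)$, the estimate

\[ \lambda\Big\{\sum_{k\ge j}\beta_k^2\sin^2\pi\ell_k x \Big/ \sum_{i\ge j}\beta_i^2 \le 2^{-p}\Big\} \le C(b,\kappa)\,2^{-\kappa p} \tag{7.6.28} \]

holds true. The estimate (7.6.28) combined with the inequality (7.6.27) shows that for every $b > 1$ and $\kappa > 1$ there exists a constant $C(b,\kappa)$ such that for every $p > 1$ we have

\[ \lambda\Big\{\sum_{k\ge j}\beta_k^2\sin^2\pi\ell_k x \Big/ \sum_{i\ge j}\beta_i^2 \le 2^{-p}\Big\} \le C(b,\kappa)\,2^{-\kappa p}. \]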

This implies (E) for every α < κ exactly in the same way as after the proof of Lemma 7.6.6. We have therefore proved the following result. 7.6.10 Proposition. Assume that βk = k −1/2 (log k)−b/2 where b > 1. Then property (E) is realized for every α > 0. Local time and density distribution. In this section, we show that the distribution function of j (ω, · ) (see definition in (7.6.17)) is for almost all ω, absolutely continuous with respect to the Lebesgue measure. Our approach relies upon the properties of local times for Gaussian processes; we refer to Section 10.3. Put for j ≥ 1 and


A ∈ B(R), j (A) = j (ω, A) = λ{0 ≤ t ≤ 1 : j (ω, t) ∈ A}, 1 ixu ˆ e j (ω, dx) = eiuj (ω,t) dt. j (u) = j (ω, u) = R

0

In a first step, we show the almost sure existence of a continuous local time for j (ω, · ), namely the density of the distribution function of j (ω, · ). The approach for this is standard. But the result (E) obtained in the previous section is crucial here. Existence and continuity of the local times of j . Our first objective will be to prove ˆ j (u)du is finite. To begin, we observe that that the integral E R ˆ j (u)2 |u|1+δ du E R 1 1 1+δ |u| E exp{iu[j (s) − j (t)]}dsdt du = R

0

=

R

|u|1+δ 0

=

|v|

1+δ

R

=

0 1 1

R

e

0

−v 2 /2

exp{−u2 / j (s) − j (t) 22,P }dsdt du

dv 0

|v|1+δ e−v

2 /2

1 1

0

1 1

dv 0

0

dsdt

j (s) − j (t) 2+δ 2,P

1 2bj2

dsdt

k≥j

2+δ

βk2 sin2 π k (s − t)

≤ C(δ)M(1 + δ/2), by (E). Hence

sup E j ≥1

Since $

E

|u|>ε

ˆ j (u)du

R

%2

ˆ j (u)2 |u|1+δ du ≤ C(δ)M(2 + δ).

$ ≤E $ ≤

%2

|u|>ε

|u|>ε

2 = δE δε

(7.6.29)

ˆ j (u)du

% |u|−(1+δ) du .E

|u|>ε

ˆ j (u)2 |u|1+δ du

|u|>ε

ˆ j (u)2 |u|1+δ ,

we deduce sup E j ≥1

R

ˆ j (u)du ≤ 2ε +

1/2

2 C(δ)M(1 + δ/2) δεδ

.

(7.6.30)


It follows (see Section 10.3) that j (ω, · ) is absolutely continuous – j (ω, · ) has local times – and x j (ω, x) − j (ω, −∞) = φj (ω, u)du, −∞

where φj (ω, u) ≥ 0, φj (ω, u) ∈

L1 (R).

φj (ω, x) = Put

+∞ −∞

1 1 2 p(x) = √ e−x = 2π π

∞

−∞

Then φj (ω, x) − p(x) = And sup φj (ω, x)−p(x) ≤ x

Moreover ˆ j (u)du. e−iux

e−iux e−u

∞

−∞

2 /2

du

γ (x) = e−x

ˆ j (u) − γ (u) du. e−iux

(7.6.31)

2 /2

.

(7.6.32)

∞ −∞

ˆ j (u)−γ (u)du ≤ I1 (j )+I2 (j )+I3 (j ) , (7.6.33)

where

I1 (j ) =

j

−j

ˆ j (u) − γ (u)du,

I2 (j ) =

|x|≥j

I3 (j ) =

|x|≥j

γ (u)du,

(7.6.34)

ˆ j (u)du,

and j is chosen according to (7.6.15), (7.6.16). The first integral is estimated by Corollary 7.6.4: a.s. (7.6.35) I1 (j ) = o(1). Clearly I2 (j ) = o(1). In order to precisely estimate I3 (j ), it will be necessary to first consider for k < j the integrals ˆ j (u) − ˆ k (u)2 |u|1+δ du. E R

Estimating E

ˆ j (u) − ˆ k (u)2 |u|1+δ du, we shall prove the following lemma. R

7.6.11 Lemma. There exists Cδ finite, such that for any j ≥ k, bk2 − bj2 2 1+δ ˆ ˆ . E j (u) − k (u) |u| du ≤ Cδ bk2 R


Proof. Since 1 1

iu (t)

iu (s) ˆ j (u) − ˆ k (u)2 = E E e j − eiuk (t) dt e j − eiuk (s) ds, 0 0 1 1

iu (t) E e j − eiuk (t) eiuj (s) − eiuk (s) dtds, = 0

0

elementary computations show that E

R

ˆ j (u) − ˆ k (u)2 |u|1+δ du = C(δ)

1 1

k,j (s, t)dsdt, 0

(7.6.36)

0

where k,j (s, t) can be calculated as 1

j (s) − j (t) 2+δ 2,P

−

1

j (t) − k (s) 2+δ 2,P 1 1 − + . 2+δ

k (t) − j (s) 2,P

k (s) − k (t) 2+δ 2,P

Write k,j (s, t) = 1k,j (s, t) + 2k,j (s, t) + 3k,j (s, t), where 1k,j (s, t) = 2k,j (s, t) = 3k,j (s, t) =

1

j (s) − j (t) 2+δ 2,P 1

j (s) − j (t) 2+δ 2,P 1

k (s) − k (t) 2+δ 2,P

− − −

1

j (t) − k (s) 2+δ 2,P 1

k (t) − j (s) 2+δ 2,P 1

j (t) − j (s) 2+δ 2,P

, ,

(7.6.37)

.

The two first expressions are of the same type. We observe that √

2 j (t) − k (s) cos 2π λ t sin 2π λ t cos 2π λ s sin 2π λ s βλ − gλ + − gλ = bj bk bj bk λ>j

1 − bk

kj

=

bk2

− bj2 bk2

1 1 2 1 1 − 2+ − cos 2π λ (t − s) 2 b b b bk bj j j k

+4

βλ2

λ>j

(7.6.38)

bj2 − bk2 2 sin2 π λ (t − s) + bj bj2 bj2 bk2

2 1 1 2 + − βλ cos 2π λ (t − s) bj bj bk λ>j

2 2 1 1 2 − βλ cos 2π λ (t − s). = 2E j (t) − j (s) + bj bj bk λ>j

Hence, 2 2 2 1 1 2 − βλ cos 2π λ (t − s). 2E j (t) − k (s) = 2E j (t) − j (s) + bj bj bk λ>j

(7.6.39) Similarly, 2 2 2 1 1 2 − βλ cos 2π λ (t − s). 2E k (t) − j (s) = 2E j (t) − j (s) + bj bj bk λ>j

Now, we estimate (7.6.39),

11 0

0

(7.6.40) 1k,j (s, t)dsdt. Fix s and t in [0, 1], and write according to

1k,j (s, t) = =

1

j (s) − j (t) 2+δ 2,P 1 A1+δ/2

−

−

1

j (t) − k (s) 2+δ 2,P

1 , (A + a)1+δ/2


2 where A = j (s) − j (t) 22,P and a = b2j b1j − b1k λ>j βλ cos 2π λ (t − s). We have A + a ≥ 0. So if a ≤ 0, then 0 ≤ A + a ≤ A, and 1k,j (s, t) ≤ 0. Now if a ≥ 0, we make use of the elementary inequality (x + y)1+ε − x 1+ε ≤ (1 + ε)y(x + y)ε , valid for any reals x, y, ε > 0, to bound 1k,j (s, t) as follows: (A + a)1+δ/2 − A1+δ/2 a(A + a)δ/2 ≤ (1 + δ/2) 1+δ/2 1+δ/2 (A + a) A (A + a)1+δ/2 A1+δ/2 a ≤ (1 + δ/2) 2+δ/2 . A

1k,j (s, t) =

Hence, by writing a = a(s, t), 1k,j (s, t) ≤ 0.I{a(s, t) ≤ 0} + (1 + δ/2) ≤ 0.I{a(s, t) ≤ 0} + (2 + δ)

a

j (s) − j (t) 2+δ 2,P

I{a(s, t) > 0}

bk2 − bj2

1

bk2

j (s) − j (t) 2+δ 2,P

I{a(s, t) > 0}, (7.6.41)

since |a(s, t)| ≤

21 b j bj

b2 −b2 b −b − b1k bj2 = 2 kbk j ≤ 2 kb2 j . By integrating inequality (7.6.41) k

over [0, 1]2 with respect to dsdt, we obtain 1 0

1 0

≤ (2 + δ)

1k,j (s, t)dsdt

2 bk − bj2 1 1

bk2

0

0

dsdt

j (s) − j (t) 2+δ 2,P

. (7.6.42)

Now, we use Proposition 7.6.10 to observe that

1 1

sup

j ≥1 0

We thus arrive at

0

dsdt

j (s) − j (t) 2+δ 2,P

1 1 0

0

≤ M(1 + δ/4) < ∞.

1k,j (s, t)dsdt ≤ Cδ (

bk2 − bj2 bk2

).

(7.6.43)

).

(7.6.44)

Similarly 0

1 1 0

2k,j (s, t)dsdt

≤ Cδ (

bk2 − bj2 bk2


11 It remains to estimate the last integral: 0 0 3k,j (s, t)dsdt. But, by the elementary inequality used to control 1k,j (s, t), we get 3k,j (s, t)

=

2+δ

j (t) − j (s) 2+δ 2,P − k (s) − k (t) 2,P

2+δ

k (s) − k (t) 2+δ 2,P j (t) − j (s) 2,P j (t) − j (s) 22,P − k (s) − k (t) 22,P 3 . "⇒ |k,j (s, t)| ≤ (1 + δ/2)

2+δ

j (t) − j (s) 2+δ 2,P ∧ k (s) − k (t) 2,P

Now k (s) − k (t) 2 − j (t) − j (s) 2 2,P 2,P 4 2 1 2 1 2 2 = 2 βλ (sin π λ (t − s)) + 4 2 − 2 βλ (sin π λ (t − s)) bk kj ≤4

bk2 − bj2 bk2

+4

bk2 − bj2 bk2

≤8

bk2 − bj2 bk2

.

Therefore, |3k,j (s, t)| ≤ 8

bk2 − bj2 bk2

1

+

j (t) − j (s) 2+δ 2,P

1

k (s) − k (t) 2+δ 2,P

.

By invoking again Proposition 7.6.10, we deduce that 1 1

0

0

|3k,j (s, t)|dsdt

≤ Cδ

bk2 − bj2 bk2

.

(7.6.45)

From (7.6.43), (7.6.44) and (7.6.45), we also have E

b2 − bj2 ˆ k (u)2 |u|1+δ du ≤ Cδ k ˆ j (u) − . bk2 R

(7.6.46)

And the lemma is proved. Proof of Theorem 7.6.1. We use the notation from the preceding section. Put, for any positive integer j , ˆ j (u)du. j = I3 (j ) = (7.6.47) |u|>j

Now, we show how Lemma 7.6.11 can be used to give an almost sure asymptotic estimate for j . Before going further, it is necessary to make some elementary


observations. First, we can write j − k = ˆ k (u)du, and thus k j

ˆ j (u) − ˆ k (u))du −

|u|>j (

ˆ k (u)du + ˆ j (u) −

ˆ k (u)du. k j

$

%2 ˆ k (u)du .

+ 2E k j

$

=E ≤ ≤

%2

ˆ j (u) − ˆ k (u)du

1+δ ( 1+δ 2 )−( 2 )

|u|>j

|u|

−(1+δ)

|u|

|u|>j bk2 − bj2 Cδ −δ j bk2

du E

R

ˆ j (u) − ˆ k (u)2 |u|1+δ du

(7.6.49)

= Cδ (bk2 − bj2 ),

−2/δ

since j = bj according to (7.6.15). And on the other, by using again the Cauchy– Schwarz inequality and (7.6.29), $ %2 −(1+δ) ˆ ˆ k (u)2 |u|1+δ du E k (u) du ≤ |u| du E k A 3 ≤ 2A sup φj (ω, x) − p(x) + 2 + p(x)dx, A |x|>A x∈R +

for any j large enough. Hence, by (7.6.54), 3 lim sup φj (ω, x) − p(x) dx ≤ 2 + p(x)dx. A R |x|>A j →∞ But A is arbitrary now. Letting then A tend to infinity finally gives a.s. lim sup φj (ω, x) − p(x)dx = 0. j →∞

R

To achieve the proof, it remains for us to prove that the series This amounts to requiring that (a) βλ2 log2 λ < ∞, λ

(b)

λ

2

λ log λ λ 0, not too many balls of radius u are needed to cover (N, d). According to a classical criterion, this information implies that the sequence X has an almost sure regular asymptotic behavior. In most cases, not only the sequence converges almost everywhere, but a speed of convergence can also be specified. As we will see in the next sections, (8.1.1) contains two cases of different nature: α > 1 and α = 1. Before going further, it seems natural to put assumption (8.1.1) into a more general framework.


8 The metric entropy method

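Let $\Phi : \mathbb R \to \mathbb R_+$ be a Young function (convex, even, such that $\Phi(0) = 0$ and $\lim_{x\to\infty}\Phi(x) = \infty$). Let $L^\Phi$ denote the subspace of $L^0(\mathbf P)$ formed with elements $f$ such that for some $c > 0$, $\mathbf E\,\Phi(c|f|) < \infty$. The Orlicz norm associated to $\Phi$ is defined by

\[ \|f\|_\Phi = \inf\{\alpha > 0 : \mathbf E\,\Phi(|f|/\alpha) \le 1\}, \qquad f \in L^\Phi. \tag{8.1.2} \]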
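For a random variable taking finitely many values, the infimum in (8.1.2) can be located by bisection, because the expectation inside is nonincreasing in the scaling parameter. The following is a minimal sketch under that assumption; the helper names `power_young` and `exp_young` are illustrative and not part of the text.

```python
from math import exp, log

def power_young(p):
    """Young function t -> |t|^p."""
    return lambda t: abs(t) ** p

def exp_young(alpha):
    """Exponential Young function t -> exp(|t|^alpha) - 1."""
    return lambda t: exp(abs(t) ** alpha) - 1.0

def orlicz_norm(values, probs, phi, tol=1e-12):
    """inf{a > 0 : E phi(|f|/a) <= 1} for f taking values[i] with probability probs[i]."""
    def modular(a):
        return sum(p * phi(abs(v) / a) for v, p in zip(values, probs))
    if all(v == 0 for v in values):
        return 0.0
    lo, hi = 0.0, 1.0
    while modular(hi) > 1.0:       # enlarge hi until the constraint is met
        hi *= 2.0
    while hi - lo > tol * hi:      # bisection: modular is nonincreasing in a
        mid = 0.5 * (lo + hi)
        if modular(mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi
```

With `power_young(2)` the routine reproduces the usual $L^2$ norm, and for the constant function 1 with `exp_young(2)` it returns $(\log 2)^{-1/2}$, the solution of $e^{1/a^2} - 1 = 1$.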

Then $L^\Phi$ endowed with the norm $\|\cdot\|_\Phi$ is a Banach space. In particular, if $\Phi(t) = |t|^p$, $L^\Phi$ is the usual $L^p$ space. But other spaces are important, for instance exponential type Orlicz spaces associated to the exponential functions $\Phi_\alpha(x) = e^{|x|^\alpha} - 1$,

1≤α u} ≤

|Xs −Xt | d(s,t)

1 u

and A = {U > u}, we have from (8.1.4):

U dP ≤ U >u

1 1 , P{U > u}−1 u P{U > u}

so that (8.1.4) implies

\[ \mathbf P\Big\{\frac{|X_s - X_t|}{d(s,t)} > u\Big\} \le \frac{1}{\Phi(u)} \qquad \text{for every } u \ge 0 \text{ and } s, t \in T. \tag{8.1.5} \]

When is of exponential type, (8.1.5) is equivalent to (8.1.3), and so (8.1.3) and (8.1.4) are equivalent. But when is of power type, (8.1.4) is in turn less stringent than (8.1.3). Conditions similar to (8.1.5) were used in Weber [1980]; more precisely it was assumed that for some random variable ,

|Xs − Xt | P > u ≤ P{ > u} d(s, t)

for every u ≥ 0 and s, t ∈ T ,

or else

+ E |Xs − Xt | − d(s, t)u ≤ d(s, t)E ( − u)+

(8.1.5 )

for every u ≥ 0 and s, t ∈ T .

The basic problem investigated under these various conditions can be described as follows: when, for instance, is the following implication true?

\[ \|X_s - X_t\|_\Phi \le d(s,t),\ \forall s, t \in T \ \Longrightarrow\ \Big\|\sup_{s,t\in T}|X_s - X_t|\Big\|_\Phi < \infty. \]

The supremum above is, for the moment, understood only as a lattice supremum in $L^\Phi$; for instance

\[ \mathbf E \sup_{s,t\in T}|X_s - X_t| = \sup\Big\{\mathbf E \sup_{s,t\in T_0}|X_s - X_t|,\ T_0 \text{ finite in } T\Big\}. \tag{8.1.6} \]

A weaker requirement will also be of some relevance: under some of the increment conditions considered above, can we infer that

\[ \mathbf E \sup_{s,t\in T}|X_s - X_t| < \infty\,? \]

Before continuing, it seems natural and necessary to examine what consequences can be drawn from these assumptions concerning finite suprema. A first observation concerns condition (8.1.4). Let $Y_1, \dots, Y_N$ be nonnegative random variables on $(\Omega, \mathcal B, \mathbf P)$ verifying: for any $1 \le n \le N$ and any measurable set $A$,

\[ \int_A Y_n\,d\mathbf P \le \mathbf P(A)\,\Phi^{-1}\Big(\frac{1}{\mathbf P(A)}\Big). \tag{8.1.7} \]

Then, for any measurable set $A$,

\[ \int_A \sup_{n=1}^{N} Y_n\,d\mathbf P \le \mathbf P(A)\,\Phi^{-1}\Big(\frac{N}{\mathbf P(A)}\Big). \tag{8.1.8} \]


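To see this, let $\{A_n, 1 \le n \le N\}$ be a measurable partition of $\Omega$ such that $Y_n = \sup_{i=1}^{N} Y_i$ on $A_n$. Then, by the concavity of $\Phi^{-1}$,

\[ \int_A \sup_{n=1}^{N} Y_n\,d\mathbf P = \sum_{n=1}^{N}\int_{A\cap A_n} Y_n\,d\mathbf P \le \sum_{n=1}^{N}\mathbf P(A\cap A_n)\,\Phi^{-1}\Big(\frac{1}{\mathbf P(A\cap A_n)}\Big) \le \mathbf P(A)\,\Phi^{-1}\Big(\frac{N}{\mathbf P(A)}\Big). \]

Thus assumption (8.1.4) implies for any finite subset $F$ of $T \times T$, and any measurable set $A$,

\[ \int_A \sup_{(s,t)\in F}\frac{|X_s - X_t|}{d(s,t)}\,d\mathbf P \le \mathbf P(A)\,\Phi^{-1}\Big(\frac{\#(F)}{\mathbf P(A)}\Big). \tag{8.1.9} \]

This is also a consequence of assumption (8.1.3), since we have seen that (8.1.3) implies (8.1.4). In particular under (8.1.3), for any $F$ finite in $T \times T$,

\[ \mathbf E \sup_{s,t\in F}|X_s - X_t| \le \Phi^{-1}(\#(F)). \tag{8.1.10} \]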

But very often, under (8.1.3) more can be obtained and in a very elementary way. If $\Phi(t) = |t|^p$ with $1 \le p < \infty$, then for any nonnegative random variables $Y_1, \dots, Y_N$ on $(\Omega, \mathcal B, \mathbf P)$,

\[ \Big\|\sup_{n=1}^{N} Y_n\Big\|_p \le N^{1/p}\sup_{n=1}^{N}\|Y_n\|_p. \tag{8.1.11} \]

The argument is rather straightforward:

\[ \int \sup_{n=1}^{N} Y_n^p\,d\mathbf P \le \sum_{n=1}^{N}\int Y_n^p\,d\mathbf P \le N\sup_{n=1}^{N}\|Y_n\|_p^p. \]
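Inequality (8.1.11) holds for every probability measure, in particular for the empirical measure of a finite sample, so it can be checked on simulated data. The sketch below draws exponential variables with a fixed seed; the choice of distribution, sample sizes and function names are illustrative assumptions.

```python
import random

def lp_norm(sample, p):
    """Empirical L^p norm (mean of |y|^p)^(1/p) under the uniform measure on the sample."""
    return (sum(abs(y) ** p for y in sample) / len(sample)) ** (1.0 / p)

def check_max_inequality(n_vars, n_points, p, seed=0):
    """Return (lhs, rhs) of (8.1.11) for n_vars simulated nonnegative variables."""
    rng = random.Random(seed)
    ys = [[rng.expovariate(1.0) for _ in range(n_points)] for _ in range(n_vars)]
    sup_y = [max(column) for column in zip(*ys)]   # pointwise supremum of Y_1..Y_N
    lhs = lp_norm(sup_y, p)
    rhs = n_vars ** (1.0 / p) * max(lp_norm(y, p) for y in ys)
    return lhs, rhs
```

Calling `check_max_inequality(5, 200, 2)` returns a pair whose first entry does not exceed the second, as the pointwise bound $\sup_n Y_n^p \le \sum_n Y_n^p$ predicts.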

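Now, if $\Phi$ is of exponential type, a similar conclusion can be derived. For instance, let $1 \le \alpha < \infty$ and set $\Phi(t) = e^{|t|^\alpha} - 1$. Then for any nonnegative random variables $Y_1, \dots, Y_N$ on $(\Omega, \mathcal B, \mathbf P)$,

\[ \Big\|\sup_{n=1}^{N} Y_n\Big\|_\Phi \le \max\{1, (K\log N)^{1/\alpha}\}\sup_{n=1}^{N}\|Y_n\|_\Phi, \tag{8.1.12} \]

and we may take $K = 2/\log 2$. This follows from Jensen's inequality. We can assume $\sup_{n=1}^{N}\|Y_n\|_\Phi \le 1$ and $N \ge 2$. Then, as $K\log N \ge 1$,

\[ \int \exp\Big\{\Big(\frac{\sup_{n=1}^{N} Y_n}{(K\log N)^{1/\alpha}}\Big)^{\alpha}\Big\}\,d\mathbf P \le \Big(\int \exp\Big(\sup_{n=1}^{N} Y_n^{\alpha}\Big)\,d\mathbf P\Big)^{\frac{1}{K\log N}} \le (2N)^{\frac{1}{K\log N}} \le 2. \]

This justifies the following definition: we say that a Young function $\Phi$ is regular when there exists a constant $C = C(\Phi)$ depending on $\Phi$ only, such that for any nonnegative random variables $Y_1, \dots, Y_N$,

\[ \Big\|\sup_{n=1}^{N} Y_n\Big\|_\Phi \le C\,\Phi^{-1}(N)\sup_{n=1}^{N}\|Y_n\|_\Phi. \tag{8.1.13} \]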


8.1 Introduction and general results

Versions and separable processes. Let $(T, d)$ be a metric space. Further, let $X = \{X_t, t \in T\}$ be a stochastic process with basic probability space $(\Omega, \mathcal B, \mathbf P)$. A version or a modification of $X$ is a stochastic process $X' = \{X'_t, t \in T\}$ with the same basic probability space, such that for each $t$ in $T$, $\mathbf P\{X_t = X'_t\} = 1$. Suppose for instance that $X$ satisfies the increment condition (8.1.3). By Tchebycheff's inequality, if $\varepsilon > 0$,

\[ \mathbf P\{|X_s - X_t| > \varepsilon\} \le \Big[\Phi\Big(\frac{\varepsilon}{d(s,t)}\Big)\Big]^{-1} \to 0, \]

as $d(s,t)$ tends to 0, so that $X$ is $d$-continuous in probability. If, in addition, $(T, d)$ is separable, let $T'$ be a countable $d$-dense subset of $T$. Then for any $t \in T$, there exists a sequence $\{s_n(t), n \ge 1\}$ contained in $T'$ and such that

\[ \lim_{n\to\infty} d(s_n(t), t) = 0, \qquad \mathbf P\Big\{\lim_{n\to\infty} X_{s_n(t)} = X_t\Big\} = 1. \]

If we now define $X'$, for each $t$ in $T$, by

\[ X'_t = \begin{cases} X_t & \text{if } t \in T', \\ \lim_{n\to\infty} X_{s_n(t)} & \text{if } t \in T\setminus T', \end{cases} \]

then

\[ \begin{cases} \mathbf P\{X_t = X'_t\} = 1, & \forall t \in T, \\ \mathbf P\big\{\sup_{t\in T} X'_t = \sup_{t\in T'} X'_t = \sup_{t\in T'} X_t\big\} = 1. \end{cases} \]

Consequently X is a version of X, and further X depends only on a countable family of random variables, so that there is no measurability problem when working with its supremum. We also note that the d-continuity in probability of the process, instead of condition (8.1.3), suffices for getting the conclusion. As a complement to this notion, we say that a stochastic process X = {Xt , t ∈ T } indexed on T and with basic probability space (, B, P) is d-separable or simply separable, if there exists a countable subset S of T , called a separation set (or separant set), and a null set N of B such that for any ω ∈ N and any t ∈ T , there is a sequence {sn , n ≥ 1} ⊂ S verifying lim d(sn , t) = 0,

n→∞

X(ω, t) = lim X(ω, sn ). n→∞

This is a very convenient notion, which solves measurability problems raised by the study of quantities such as supt∈T Xt , sups,t∈T |Xs − Xt | . . . . When X is d-separable, we have, by definition, P{sup Xt = sup Xt } = 1, t∈T

t∈S

P{ sup |Xs − Xt | = sup |Xs − Xt |} = 1 . . . . s,t∈T

s,t∈S


We shall therefore say that $X$ admits a $d$-separable version, or a $d$-separable modification, if there exists a $d$-separable stochastic process $\widetilde X=\{\widetilde X_t, t\in T\}$ such that $\mathbf P\{X_t=\widetilde X_t\}=1$ for all $t\in T$. For instance, if $X$ satisfies assumption (8.1.3) and $(T,d)$ is separable, then by the very construction of the modification $\widetilde X$ made above, $X$ admits a $d$-separable version, which is precisely $\widetilde X$. Indeed, take $S=\widehat T$, a countable $d$-dense subset of $T$, and observe that for all $\omega\in\Omega$ and any $t\in T$ there is a sequence $\{s_n, n\ge 1\}\subset S$ such that $\lim_{n\to\infty}d(s_n,t)=0$ and $\widetilde X(\omega,t)=\lim_{n\to\infty}\widetilde X(\omega,s_n)$.

It is worth observing, when using these notions, that the separability of $(T,d)$ is a key property. These notions extend to the case where $(T,d)$ is a pseudo-metric space, for instance when the space is totally bounded, namely when the entropy numbers (see below) of the space are all finite. In this case, $(T,d)$ contains a countable $d$-dense subset $\widehat T$: for all $t\in T$, there exists a sequence $\{s_n, n\ge 1\}$ contained in $\widehat T$ such that $\lim_{n\to\infty}d(s_n,t)=0$. If $X$ satisfies assumption (8.1.3), then $X$ also admits a $d$-separable version $\widetilde X$, which may be built exactly as before.

Having defined these notions, we may now turn to our initial purpose: the study of the regularity of stochastic processes from the point of view of the norm properties of their increments. Recall that for any real $u>0$, the entropy number $N(T,d,u)$ of order $u$ of $(T,d)$ is by definition the smallest (possibly infinite) number of open $d$-balls of radius $u$ needed to cover $T$. We write $D=D(T)$ for the diameter of $(T,d)$.

8.1.1 Theorem (Boundedness). Let $\Phi$ be a regular Young function. Let $(T,d)$ be a pseudo-metric space and let $X=\{X_t, t\in T\}$ be a stochastic process satisfying the increment condition (8.1.3). Assume that the entropy integral
\[
I(T,d)=\int_0^D\Phi^{-1}\big(N(T,d,u)\big)\,du \tag{8.1.14}
\]
is convergent. Then $X$ possesses a version $\widetilde X$ which is sample bounded, and there exists a constant $C$ depending on $\Phi$ only such that
\[
\Big\|\sup_{s,t\in T}|\widetilde X_s-\widetilde X_t|\Big\|_\Phi\le C\,I(T,d). \tag{8.1.15}
\]

Proof. We may assume $D>0$, otherwise the result is obvious. By the finiteness of the integral in (8.1.14), $(T,d)$ is totally bounded, hence separable. For any integer $n=0,1,2,\dots$, let $T_n\subset T$ be a set of centers of balls corresponding to a minimal covering of $T$ by open $d$-balls of radius $2^{-n}D$ (so $T_0=\{s_0\}$). Let $\widehat T=\bigcup_{n=0}^\infty T_n$; then $\widehat T$ is a countable $d$-dense subset of $T$. Denote by $s\mapsto\bar s$ a map from $T_n$ to $T_{n-1}$ such that $d(s,\bar s)<2^{-n+1}D$. Finally put, for $n\ge 0$,
\[
M_n=\sup_{s\in T_n}|X_s-X_{s_0}|,\qquad M_n^*=\sup_{0\le j\le n}M_j.
\]


8.1 Introduction and general results

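Then $M_0^*=M_0=0$ and $0\le M_n^*-M_{n-1}^*\le\sup_{s\in T_n}|X_s-X_{\bar s}|$, where $M_n^*=\sup_{0\le j\le n}M_j$ denotes the running maximum of the $M_j$. Indeed, either $M_n^*=M_{n-1}^*$, in which case there is nothing to prove; or $M_n^*>M_{n-1}^*$, and thus $M_n=M_n^*>M_{n-1}^*$. Let $s_\sigma\in T_n$ be such that $M_n=|X_{s_\sigma}-X_{s_0}|$. Since $\bar s_\sigma\in T_{n-1}$,
\[
M_n^*-M_{n-1}^*=|X_{s_\sigma}-X_{s_0}|-M_{n-1}^*\le|X_{s_\sigma}-X_{s_0}|-|X_{\bar s_\sigma}-X_{s_0}|\le|X_{s_\sigma}-X_{\bar s_\sigma}|.
\]
As $\Phi$ is regular, for any $n\ge 1$,
\[
\|M_n^*-M_{n-1}^*\|_\Phi\le C\,\Phi^{-1}\big(\#(T_n)\big)\sup_{s\in T_n}\|X_s-X_{\bar s}\|_\Phi\le C\,2^{-(n-1)}D\,\Phi^{-1}\big(N(T,d,2^{-n}D)\big).
\]
But $M_n^*=M_n^*-M_0^*=\sum_{k=1}^n(M_k^*-M_{k-1}^*)$. Thereby
\[
\|M_n^*\|_\Phi\le\sum_{k=1}^n\|M_k^*-M_{k-1}^*\|_\Phi\le C\sum_{k=1}^n2^{-k+1}D\,\Phi^{-1}\big(N(T,d,2^{-k}D)\big)
\le C\sum_{k=1}^\infty2^{-k+1}D\,\Phi^{-1}\big(N(T,d,2^{-k}D)\big)\le C\int_0^D\Phi^{-1}\big(N(T,d,u)\big)\,du.
\]
The last step uses that $u\mapsto N(T,d,u)$ is nonincreasing, so that $2^{-k-1}D\,\Phi^{-1}(N(T,d,2^{-k}D))\le\int_{2^{-k-1}D}^{2^{-k}D}\Phi^{-1}(N(T,d,u))\,du$; here and below the constant $C$ may change from line to line. We deduce
\[
\Big\|\sup_{s\in\widehat T}|X_s-X_{s_0}|\Big\|_\Phi\le C\int_0^D\Phi^{-1}\big(N(T,d,u)\big)\,du,
\]
where $\widehat T=\bigcup_{n\ge 0}T_n$.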

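By (8.1.3), $X$ is $d$-continuous in probability. Define $\widetilde X$ by $\widetilde X_t=\lim_{\widehat T\ni s\to t}X_s$, the limit being taken along the countable $d$-dense set $\widehat T=\bigcup_{n\ge 0}T_n$. Then $\widetilde X$ is a separable version of $X$, for which we obviously have
\[
\Big\|\sup_{s\in T}|\widetilde X_s-\widetilde X_{s_0}|\Big\|_\Phi\le C\int_0^D\Phi^{-1}\big(N(T,d,u)\big)\,du.
\]
And (8.1.15) now follows from the triangle inequality. Applying estimate (8.1.15) to any ball $B(t,\rho)$ shows that $\widetilde X$ is also almost surely $d$-continuous at each point $t$ of $T$, since as $\rho$ tends to 0,
\[
\Big\|\sup_{s\in B(t,\rho)}|\widetilde X_s-\widetilde X_t|\Big\|_\Phi\le C\int_0^{2\rho}\Phi^{-1}\big(N(B(t,\rho),d,u)\big)\,du\le C\int_0^{2\rho}\Phi^{-1}\big(N(T,d,u)\big)\,du\to 0.
\]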
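The comparison between the dyadic chaining sum and the entropy integral in the proof above can be checked numerically. The following is only a sketch under stated assumptions that are not in the text: $T=[0,1]$ with $d(s,t)=|s-t|$ (so $D=1$) and $\Phi(x)=|x|^\gamma$, hence $\Phi^{-1}(N)=N^{1/\gamma}$. The function names are hypothetical. Since $u\mapsto N(T,d,u)$ is nonincreasing, the partial chaining sum should not exceed four times the integral.

```python
import math


def covering_number(u: float) -> int:
    """Smallest number of open intervals of radius u needed to cover [0, 1].

    Assumption for this sketch: T = [0, 1] with the usual distance.
    n open intervals of radius u cover [0, 1] exactly when 2 * u * n > 1.
    """
    return math.floor(1.0 / (2.0 * u)) + 1


def chaining_sum(gamma: float, levels: int, diameter: float = 1.0) -> float:
    """Partial dyadic sum  sum_{k=1}^{levels} 2^{-k+1} D * N(2^{-k} D)^{1/gamma}."""
    total = 0.0
    for k in range(1, levels + 1):
        radius = diameter * 2.0 ** (-k)
        total += 2.0 * radius * covering_number(radius) ** (1.0 / gamma)
    return total


def entropy_integral(gamma: float, n_max: int) -> float:
    """Integral of N(u)^{1/gamma} over [1/(2 n_max), 1], summed piece by piece.

    N(u) is a step function: it equals 1 on (1/2, 1] and equals n on
    (1/(2n), 1/(2(n-1))] for n >= 2, so each piece is integrated exactly.
    """
    total = 0.5  # contribution of the piece where N(u) = 1
    for n in range(2, n_max + 1):
        total += n ** (1.0 / gamma) * (1.0 / (2 * (n - 1)) - 1.0 / (2 * n))
    return total


if __name__ == "__main__":
    for gamma in (1.5, 2.0, 4.0):
        s = chaining_sum(gamma, levels=12)
        i = entropy_integral(gamma, n_max=2 ** 12)
        print(f"gamma={gamma}: chaining sum={s:.4f}, 4 * integral={4 * i:.4f}")
```

Truncating the integral at $u\ge 1/(2n_{\max})$ with $n_{\max}=2^{K}$ matches the range covered by the first $K$ terms of the sum, so the comparison remains valid for the truncated quantities.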

Theorem 8.1.1 suffices for all the applications of the metric entropy method presented in this chapter. We have chosen this presentation because its proof is also very simple and instructive. We shall now complete it with a corresponding statement concerning sample continuity. In our applications, continuity appears as an already existing property, typically in the important case of random polynomials. However, establishing satisfactory conditions for the sample continuity of a given class of stochastic processes is a more delicate problem. The theorem in view (Theorem 11.6 in [Ledoux–Talagrand: 1991]) reads as follows:


8.1.2 Theorem (Continuity). Let $\Phi$ be a Young function. Let $(T,d)$ be a pseudo-metric space. Let $X=\{X_t, t\in T\}$ be a stochastic process indexed on $T$ and satisfying the increment condition (8.1.4). Then, if the integral condition (8.1.14) is satisfied, $X$ possesses a version $\widetilde X$ which is sample bounded and sample (uniformly) $d$-continuous on $T$. Further, there exists an increasing function $v:\mathbf R_+\to\mathbf R_+$ with $v(0)=0$, depending on condition (8.1.14) only, such that for any $\varepsilon>0$,
\[
\mathbf E\sup_{d(s,t)\le v(\varepsilon)}|\widetilde X_s-\widetilde X_t|\le\varepsilon.
\]

Proof. We use the chaining argument of the proof of Theorem 8.1.1, and make it a little more precise. For any integer $\ell\ge 1$, there are maps $\pi_\ell:T_\ell\to T_{\ell-1}$ satisfying $d(s,\pi_\ell(s))\le D2^{-\ell+1}$. We may also assume that $T$ is finite, so $T=T_N$ for some large integer $N$. Define, for $1\le\ell\le N$, the maps $\sigma_\ell:T_N\to T_\ell$ by $\sigma_\ell=\pi_{\ell+1}\circ\cdots\circ\pi_N$. Note that $\sigma_N$ is the identity of $T_N$. We begin with a first observation. Let $1\le k<N$ and $s\in T_N$. Writing $X_s-X_{\sigma_k(s)}=\sum_{\ell=k+1}^N\big(X_{\sigma_\ell(s)}-X_{\sigma_{\ell-1}(s)}\big)$ and arguing as in the previous proof allows us to get, in view of (8.1.10),
\[
\mathbf E\sup_{s\in T_N}|X_s-X_{\sigma_k(s)}|\le C\int_0^{D2^{-k}}\Phi^{-1}\big(N(T,d,u)\big)\,du.
\]

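Let $\eta>0$ and let $s,t\in T_N$ be such that $d(s,t)\le\eta$. Fix a level $1\le\ell\le N$, with $\sigma_\ell:T_N\to T_\ell$ the chaining map, and consider the set
\[
U=\big\{(x,y)\in T_\ell^2:\ \exists(u,v)\in T_N^2\ \text{such that}\ d(u,v)\le\eta\ \text{and}\ \sigma_\ell(u)=x,\ \sigma_\ell(v)=y\big\}.
\]
Then it is plain that $(\sigma_\ell(s),\sigma_\ell(t))\in U$. Clearly to each pair $\vartheta=(x,y)$ in $U$, a pair $(u_\vartheta,v_\vartheta)$ in $T_N^2$ can be associated, satisfying $\sigma_\ell(u_\vartheta)=x$, $\sigma_\ell(v_\vartheta)=y$ and $d(u_\vartheta,v_\vartheta)\le\eta$. These observations being made, choosing $\vartheta=(x,y)=(\sigma_\ell(s),\sigma_\ell(t))$, we can write, using the triangle inequality,
\[
|X_s-X_t|\le|X_s-X_{\sigma_\ell(s)}|+|X_{\sigma_\ell(s)}-X_{u_\vartheta}|+|X_{u_\vartheta}-X_{v_\vartheta}|+|X_{v_\vartheta}-X_{\sigma_\ell(t)}|+|X_{\sigma_\ell(t)}-X_t|.
\]
The trick is that the third term on the right-hand side (which could be $|X_s-X_t|$) belongs to a set of cardinality less than or equal to $\#(U)$, and $d(u_\vartheta,v_\vartheta)\le\eta$, with $\ell$ and $\eta$ independent. This allows us to get the bound
\[
\sup_{\substack{s,t\in T_N\\ d(s,t)\le\eta}}|X_s-X_t|\le\sup_{\vartheta\in U}|X_{u_\vartheta}-X_{v_\vartheta}|+4\sup_{t\in T_N}|X_t-X_{\sigma_\ell(t)}|.
\]
By the triangle inequality again and (8.1.10), we arrive at
\[
\mathbf E\sup_{\substack{s,t\in T_N\\ d(s,t)\le\eta}}|X_s-X_t|\le\mathbf E\sup_{\vartheta\in U}|X_{u_\vartheta}-X_{v_\vartheta}|+4\,\mathbf E\sup_{t\in T_N}|X_t-X_{\sigma_\ell(t)}|
\le C\eta\,\Phi^{-1}\big(N^2(T,d,D2^{-\ell})\big)+4C\int_0^{D2^{-\ell}}\Phi^{-1}\big(N(T,d,u)\big)\,du.
\]
Letting now $N$ tend to infinity gives
\[
\mathbf E\sup_{\substack{s,t\in\widehat T\\ d(s,t)\le\eta}}|X_s-X_t|\le C\eta\,\Phi^{-1}\big(N^2(T,d,D2^{-\ell})\big)+4C\int_0^{D2^{-\ell}}\Phi^{-1}\big(N(T,d,u)\big)\,du,
\]
where $\widehat T=\bigcup_{n\ge 0}T_n$.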


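The increment assumption (8.1.4) implies that $X$ is $d$-continuous in probability. Define $\widetilde X$ by
\[
\widetilde X_t=\lim_{\widehat T\ni s\to t}X_s,
\]
with $\widehat T=\bigcup_{n\ge 0}T_n$. Then $\widetilde X$ is a version of $X$, and for every level $\ell\ge 1$,
\[
\mathbf E\sup_{d(s,t)\le\eta}|\widetilde X_s-\widetilde X_t|\le C\eta\,\Phi^{-1}\big(N^2(T,d,D2^{-\ell})\big)+4C\int_0^{D2^{-\ell}}\Phi^{-1}\big(N(T,d,u)\big)\,du.
\]
Given $\varepsilon>0$, we choose $\ell$ such that $4C\int_0^{D2^{-\ell}}\Phi^{-1}(N(T,d,u))\,du\le\varepsilon/2$, and then $\eta$ small enough to have $C\eta\,\Phi^{-1}(N^2(T,d,D2^{-\ell}))\le\varepsilon/2$. In this way we get $\mathbf E\sup_{d(s,t)\le\eta}|\widetilde X_s-\widetilde X_t|\le\varepsilon$. One can define $v(\varepsilon)$ to be the largest possible $\eta$. The sample uniform $d$-continuity of $\widetilde X$ on $T$ now follows from a standard application of the Borel–Cantelli lemma.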

8.2 A theorem of Stechkin

Recall first Stechkin's theorem (see Gaposhkin [1966a: Theorem 8.3.5] or Billingsley [1999: Problem 6, p. 102; see also Theorem 12.2]).

8.2.1 Theorem. Let $\xi=\{\xi_i, i\ge 1\}$ be a sequence of random variables satisfying the following assumption:
\[
\mathbf E\Big|\sum_{i\le l\le j}\xi_l\Big|^\gamma\le\Big(\sum_{i\le l\le j}u_l\Big)^\alpha,\qquad 1\le i\le j<\infty, \tag{8.2.1}
\]
where $\{u_i, i\ge 1\}$ is a sequence of nonnegative reals such that the series $\sum_{l=1}^\infty u_l$ converges, and $\alpha>1$, $\gamma>0$. Then the series $\sum_{l=1}^\infty\xi_l$ converges almost surely. Moreover, for $\alpha>1$, one has the bound
\[
\Big\|\sup_{i,j\ge 1}\Big|\sum_{i\le l\le j}\xi_l\Big|\Big\|_\gamma\le C\Big(\sum_{l=1}^\infty u_l\Big)^{\alpha/\gamma},
\]
where the constant $C$ depends on $\alpha$ only.

Note that this statement contains a trivial part: the case $0<\gamma\le 1$. Indeed, (8.2.1) provides $\mathbf E|\xi_l|^\gamma\le u_l^\alpha$. Since $0<\gamma\le 1$,
\[
\sum_{l=1}^\infty\mathbf E(1\wedge|\xi_l|)\le\sum_{l=1}^\infty\mathbf E(1\wedge|\xi_l|)^\gamma\le\sum_{l=1}^\infty\mathbf E|\xi_l|^\gamma\le\sum_{l=1}^\infty u_l^\alpha\le\Big(\sum_{l=1}^\infty u_l\Big)^\alpha<\infty. \tag{8.2.2}
\]
The series $\sum_{l=1}^\infty(1\wedge|\xi_l|)$ thus converges almost surely. But this amounts to saying that the series $\sum_{l=1}^\infty|\xi_l|$ converges almost surely, which is an even stronger conclusion. In what follows, we therefore restrict our attention to the case $\gamma>1$. The statement can also be completed in the case when the series $\sum_{l=1}^\infty u_l$ diverges.


8.2.2 Theorem. Let the random variables $\xi=\{\xi_i, i\ge 1\}$ satisfy assumption (8.2.1) with $\alpha>1$, $\gamma>1$, and let the sequence $\{u_i, i\ge 1\}$ of nonnegative reals be such that the series $\sum_{l=1}^\infty u_l$ diverges. Put, for any integer $L\ge 1$, $U_L=\sum_{1\le l\le L}u_l$ and $S_L=\sum_{1\le l\le L}\xi_l$. Then
\[
\lim_{L\to\infty}\frac{|S_L|}{U_L^{\alpha/\gamma}(\log U_L)^{a/\gamma}}=0\quad(\forall a>1)\quad\text{almost surely.} \tag{8.2.3}
\]

We now give a common proof of both statements by means of the metric entropy approach, thus avoiding the tedious use of a dyadic chaining argument. The important case $\alpha=1$ will be investigated by means of the same method for indicators in the next section (Theorem 8.3.1, see also Remark 8.3.5 for sequences of functions).

Proofs of Theorems 8.2.1 and 8.2.2. Put $U=\{U_L, L\ge 1\}$, $S=\{S_L, L\ge 1\}$. Assumption (8.2.1) can be reformulated as follows:
\[
\|S_j-S_k\|_\gamma\le(U_j-U_k)^{\alpha/\gamma}\qquad(\forall j\ge k\ge 1). \tag{8.2.1$'$}
\]
Step 1. Proof of Theorem 8.2.1. Let $u=\sum_{l=1}^\infty u_l$. First observe that (8.2.1$'$) implies that the sequence $S$ is a Cauchy sequence in $L^\gamma$, thus converging to some element $S_\infty$ of $L^\gamma$. The new sequence obtained by adding to $S$ its limit is again denoted by $S$. Let $0<\varepsilon\le u$ and write
\[
\begin{cases}
I_j(\varepsilon)=[j\varepsilon,(j+1)\varepsilon[, & j=0,1,\dots,[u/\varepsilon],\\
J^*(\varepsilon)=\{0\le j\le[u/\varepsilon]:\ I_j(\varepsilon)\cap U\ne\emptyset\},\\
j_-=\inf\{L:\ U_L\in I_j(\varepsilon)\} & \text{if } j\in J^*(\varepsilon).
\end{cases}
\]
Then for all $L\ge 1$ there exists $j\in J^*(\varepsilon)$ with $0\le U_L-U_{j_-}\le\varepsilon$, and $\#(J^*(\varepsilon))\le[u/\varepsilon]+1\le 2u/\varepsilon$. Said differently, by invoking assumption (8.2.1$'$), we have
\[
\forall L\ge 1,\ \exists j\in J^*(\varepsilon)\ \text{such that}\ \|S_L-S_{j_-}\|_\gamma\le\varepsilon^{\alpha/\gamma}.
\]
Let $N(S,\|\cdot\|_\gamma,\rho)$ be the minimal number of open $\|\cdot\|_\gamma$-balls of radius $\rho$, centered in $S$, needed to cover $S$. Then, for $0<\rho\le u^{\alpha/\gamma}$,
\[
N(S,\|\cdot\|_\gamma,\rho)\le\frac{2u}{\rho^{\gamma/\alpha}}. \tag{8.2.4}
\]
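The covering construction behind (8.2.4) can be sketched in a few lines. This is an illustration only, for a finite truncation of the sequence; the function name `epsilon_net` and the choice $u_l=1/l^2$ are assumptions made for the example. Each occupied interval $I_j(\varepsilon)$ is represented by the first index $L$ with $U_L\in I_j(\varepsilon)$, which plays the role of $j_-$.

```python
import math
from itertools import accumulate


def epsilon_net(partial_sums, eps):
    """Map each occupied interval [j*eps, (j+1)*eps) to the first index L
    (1-based) whose partial sum U_L falls in it; this index plays the role of j_-.
    """
    first_index = {}
    for L, U in enumerate(partial_sums, start=1):
        j = math.floor(U / eps)
        first_index.setdefault(j, L)  # keep only the smallest L in each interval
    return first_index


if __name__ == "__main__":
    # Hypothetical example: u_l = 1 / l^2, truncated at l = 200.
    u_terms = [1.0 / (l * l) for l in range(1, 201)]
    U = list(accumulate(u_terms))   # U[L-1] = u_1 + ... + u_L
    total = U[-1]                   # stands in for u in this finite sketch
    eps = 0.1
    net = epsilon_net(U, eps)
    print(len(net), "representatives; bound 2u/eps =", 2 * total / eps)
```

Because the partial sums are nondecreasing, every $U_L$ lies within $\varepsilon$ above its representative, and the number of representatives is at most $[u/\varepsilon]+1$.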

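We apply Theorem 8.1.1. The corresponding setting is $T=\mathbf N$, $X_n=S_n$, $n\in\mathbf N$, $d(n,m)=\|S_n-S_m\|_\gamma$. The entropy integral is easily estimated:
\[
\int_0^{u^{\alpha/\gamma}}N(S,\|\cdot\|_\gamma,\rho)^{1/\gamma}\,d\rho\le(2u)^{1/\gamma}\int_0^{u^{\alpha/\gamma}}\rho^{-1/\alpha}\,d\rho=C_\alpha u^{\alpha/\gamma}<\infty,
\]
since $\alpha>1$, where the constant $C_\alpha$ depends on $\alpha$ only. Therefore $S$ is convergent almost surely, and we have the uniform bound
\[
\Big\|\sup_{n,m\ge 1}|S_n-S_m|\Big\|_\gamma\le C_\alpha u^{\alpha/\gamma},
\]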


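with a constant $C_\alpha$ depending on $\alpha$ only. We now turn to the proof of the second statement.

Step 2. Proof of Theorem 8.2.2. Let $M>1$ and put, for any integer $k\ge 1$, $I_k=[M^k,M^{k+1}[$. Let $\kappa=\{\kappa_p, p\ge 1\}$ be the sequence defined by $\kappa_p=k$ if $I_k$ is the $p$-th interval such that $I_k\cap U\ne\emptyset$. Let $L_p$ be the set of indices defined by $L\in L_p\Leftrightarrow U_L\in I_{\kappa_p}$. Pick arbitrarily some index in $L_p$, which we write $L_p^*$. Let $a>1$. By assumption (8.2.1),
\[
\mathbf P\big\{|S_{L_p^*}|>\varepsilon M^{\alpha(\kappa_p+1)/\gamma}p^{a/\gamma}\big\}\le\frac{\|S_{L_p^*}\|_\gamma^\gamma}{\varepsilon^\gamma M^{\alpha(\kappa_p+1)}p^a}\le\frac{|U_{L_p^*}|^\alpha}{\varepsilon^\gamma M^{\alpha(\kappa_p+1)}p^a}\le\frac{1}{\varepsilon^\gamma p^a}.
\]
Thus by the Borel–Cantelli lemma,
\[
\mathbf P\Big\{\limsup_{p\to\infty}\frac{|S_{L_p^*}|}{M^{\alpha(\kappa_p+1)/\gamma}p^{a/\gamma}}\le\varepsilon\Big\}=1. \tag{8.2.5}
\]
Examine now the oscillation of $S$ over $L_p$. For $i,j\in L_p$ we have $\|S_i-S_j\|_\gamma\le|U_i-U_j|^{\alpha/\gamma}$. For $j\in L_p$ replace $S_j$ by $S_j'=S_j/M^{\alpha(\kappa_p+1)/\gamma}$, $u_j$ by $u_j'=u_j/M^{\kappa_p+1}$ and $U_j$ by $U_j'=\sum_{l=1}^ju_l'$. Then
\[
\|S_i'-S_j'\|_\gamma\le|U_i'-U_j'|^{\alpha/\gamma}\le 1\qquad(i,j\in L_p).
\]
Let $S_p'=\{S_L', L\in L_p\}$. From the computation made at the previous step, we have the following estimate:
\[
\int_0^1N(S_p',\|\cdot\|_\gamma,\rho)^{1/\gamma}\,d\rho\le 2\int_0^1\rho^{-1/\alpha}\,d\rho<\infty.
\]
Hence by Theorem 8.1.1,
\[
\Big\|\sup_{i,j\in L_p}|S_i'-S_j'|\Big\|_\gamma\le C_\alpha<\infty, \tag{8.2.6}
\]
where $C_\alpha$ depends on $\alpha$ only. By Tchebycheff's inequality, we deduce from the previous bound that $\mathbf P\{\sup_{i,j\in L_p}|S_i'-S_j'|>\varepsilon p^{a/\gamma}\}\le C_\alpha^\gamma/(\varepsilon^\gamma p^a)$, and by the Borel–Cantelli lemma again,
\[
\mathbf P\Big\{\limsup_{p\to\infty}\frac{\sup_{i,j\in L_p}|S_i-S_j|}{M^{\alpha(\kappa_p+1)/\gamma}p^{a/\gamma}}\le\varepsilon\Big\}=1. \tag{8.2.7}
\]
Combining now (8.2.5) with (8.2.7), and writing $S_L=(S_L-S_{L_p^*})+S_{L_p^*}$, easily gives
\[
\mathbf P\Big\{\limsup_{L\to\infty}\frac{|S_L|}{U_L^{\alpha/\gamma}(\log U_L)^{a/\gamma}}\le\varepsilon\,\frac{1+M^{\alpha/\gamma}}{(\log M)^{\alpha/\gamma}}\Big\}=1. \tag{8.2.8}
\]
Since $\varepsilon$ is arbitrary, this implies the result.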


8.2.3 Remark. Very often, Theorem 8.2.2 applies in situations in which $\xi_l=\mathbf 1(A_l)-\mathbf P(A_l)$ and $u_l=\mathbf P(A_l)$. Then $S_L$ expresses the difference $\sum_{l=1}^L\mathbf 1(A_l)-\sum_{l=1}^L\mathbf P(A_l)$. It is worth observing here that if the sequence $\kappa$ is very sparse, a smaller order of magnitude than $U_L^{\alpha/\gamma}(\log U_L)^{a/\gamma}$ can be assigned to the error term $|S_L|$. This follows from (8.2.5) and (8.2.7) and can be read directly from the data.

We continue with a second observation concerning the consistency, from a theoretical point of view, of the treatment proposed for the almost sure convergence of series of functions through the approach described by Theorem 8.2.1. Later we will see in Remark 8.3.5, when treating the limit case $\alpha=1$, that this approach also allows us to recapture (even in a more general form) the Rademacher–Menshov theorem. The very formulation of that theorem does not however allow us to recover classical results on the almost sure convergence of series of independent random variables. Given a sequence $\xi=\{\xi_i, i\ge 1\}$ of centered, square integrable, independent random variables, it indeed requires that the series $\sum_{l\ge 1}\mathbf E\,\xi_l^2(\log l)^2$ converges to ensure the convergence almost everywhere of the series $\sum_{l\ge 1}\xi_l$, whereas it is classical (Petrov [1975: 266]) that the convergence of the series $\sum_{l\ge 1}\mathbf E\,\xi_l^2$ is enough (and necessary). This result is however contained in Theorem 8.2.1. Here is how to get this. First, we leave $L^2$ for $L^p$, $p>2$, where we will apply Theorem 8.2.1. Introduce, for some arbitrary $\varepsilon>0$, the sequence $\xi^\varepsilon$ of truncated random variables:
\[
\xi_l^\varepsilon=\xi_l\mathbf 1\{|\xi_l|\le\varepsilon\},\qquad l\ge 1.
\]
Both sequences $\xi$ and $\xi^\varepsilon$ are equivalent since the series $\sum_l\mathbf P\{\xi_l\ne\xi_l^\varepsilon\}$ converges. Appeal now to Rosenthal's inequality: let $p\ge 2$. There exists a constant $C_p$ depending on $p$ only, such that for any finite sequence $\{x_l\}$ of independent elements of $L^p(\mathbf P)$ with zero expectation,
\[
\mathbf E\Big|\sum_{i\le l\le j}x_l\Big|^p\le C_p\Big(\sum_{i\le l\le j}\mathbf E|x_l|^p+\Big(\sum_{i\le l\le j}\mathbf E\,x_l^2\Big)^{p/2}\Big). \tag{8.2.9}
\]
Assume first that $\xi$ is a sequence of symmetric random variables. Thus
\[
\mathbf E\Big|\sum_{i\le l\le j}\xi_l^\varepsilon\Big|^p\le C_{p,\varepsilon}\Big(\sum_{i\le l\le j}\mathbf E(\xi_l^\varepsilon)^2\Big)^{p/2},
\]
where $C_{p,\varepsilon}$ depends on $p,\varepsilon$ only. For $p>2$, Theorem 8.2.1 applies and we get the result in that case. Now if $\xi$ is not symmetric (but centered), let $\xi'=\{\xi_l', l\ge 1\}$ be an independent copy of $\xi$ defined on a different probability space, with corresponding probability and expectation symbols $\mathbf P'$ and $\mathbf E'$. Let $\xi_l^\varepsilon=\xi_l\mathbf 1\{|\xi_l|\le\varepsilon\}$ and $\xi_l'^{\,\varepsilon}=\xi_l'\mathbf 1\{|\xi_l'|\le\varepsilon\}$. Then $x_l=\xi_l^\varepsilon-\xi_l'^{\,\varepsilon}$ is a symmetric sequence, and by the reasoning made before, the series $\sum_lx_l$ converges. Moreover, by using the uniform bound in Theorem 8.2.1,
\[
\mathbf E\,\mathbf E'\sup_{i,j\ge 1}\Big|\sum_{i\le l\le j}x_l\Big|^p\le C_{p,\varepsilon}\Big(\sum_l\mathbf E\,\xi_l^2\Big)^{p/2},
\]


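so that $\mathbf E'\sup_{j\ge 1}\big|\sum_{l\le j}x_l\big|<\infty$, $\mathbf P$-almost surely. An application of the dominated convergence theorem conditionally to $\xi$ yields that the limit
\[
\lim_{j\to\infty}\mathbf E'\sum_{l\le j}x_l=\lim_{j\to\infty}\sum_{l\le j}\big(\xi_l^\varepsilon-\mathbf E\,\xi_l\mathbf 1\{|\xi_l|\le\varepsilon\}\big)
\]
exists $\mathbf P$-almost everywhere. It now remains to control the sum $\sum_{l\le j}\mathbf E\,\xi_l\mathbf 1\{|\xi_l|\le\varepsilon\}$. But the centering assumption implies that
\[
-\mathbf E\,\xi_l\mathbf 1\{|\xi_l|\le\varepsilon\}=\mathbf E\,\xi_l\mathbf 1\{|\xi_l|>\varepsilon\}=\varepsilon\,\mathbf P\{\xi_l>\varepsilon\}+\int_\varepsilon^\infty\mathbf P\{\xi_l>u\}\,du.
\]
By assumption, the series $\sum_l\mathbf E\,\xi_l^2$ converges. Applying the Tchebycheff inequality to each term of the above expression of $-\mathbf E\,\xi_l\mathbf 1\{|\xi_l|\le\varepsilon\}$, we thus deduce the convergence of the series $\sum_l\mathbf E\,\xi_l\mathbf 1\{|\xi_l|\le\varepsilon\}$. This ensures the convergence almost everywhere of the series $\sum_l\xi_l^\varepsilon$, and thereby of the series $\sum_l\xi_l$, since both sequences $\xi$ and $\xi^\varepsilon$ are equivalent in view of the convergence of the series $\sum_l\mathbf P\{\xi_l\ne\xi_l^\varepsilon\}$.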

8.3 An application to the quantitative Borel–Cantelli lemma

In this section, we discuss various formulations of the quantitative form of the Borel–Cantelli lemma. This is a relatively universal tool with wide fields of application, notably in probability theory, metrical number theory and uniform distribution theory. The section complements the preceding one and is devoted to the case $\alpha=1$ in Stechkin's theorem. We show that the metric entropy approach is relevant there. We have also taken the opportunity to present some classical results, following a natural case-by-case progression from independence to dependence. We have not taken into consideration the various existing conditional versions of the Borel–Cantelli lemma, since they do not contain quantitative aspects. We have isolated as lemmas some useful estimates for suprema of finite families of random variables. We start with elementary considerations concerning the Borel–Cantelli lemma, which we recall for our purpose.

Borel–Cantelli lemma. Let $(\Omega,\mathcal B,\mathbf P)$ be some probability space and $\{A_k, k\ge 1\}$ a sequence of measurable subsets of $\Omega$.
(i) If the series $\sum_{k\ge 1}\mathbf P(A_k)$ converges, then $\mathbf P\{\limsup_{k\to\infty}A_k\}=0$.
(ii) If the series $\sum_{k\ge 1}\mathbf P(A_k)$ diverges and the events are independent, then $\mathbf P\{\limsup_{k\to\infty}A_k\}=1$.

As is well known, the independence assumption on the events $A_k$ is stronger than needed for the conclusion. It suffices indeed that some 0-1 law exists, and that the following correlation condition be satisfied:
\[
\mathbf P(A_k\cap A_l)\le C\,\mathbf P(A_k)\mathbf P(A_l)\qquad(\forall k\ne l), \tag{8.3.1}
\]
where $C$ is some absolute constant. This follows from the


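Paley–Zygmund inequality. For any $g\in L^2(\mathbf P)$ such that $\mathbf P(g\ge 0)=1$ and any real $\lambda\in[0,1]$,
\[
\mathbf P\Big\{g\ge\lambda\int g\,d\mathbf P\Big\}\ge(1-\lambda)^2\frac{\big(\int g\,d\mathbf P\big)^2}{\int g^2\,d\mathbf P}. \tag{8.3.2}
\]
Applying this inequality to $g=\sum_{I\le k\le J}\mathbf 1_{A_k}$ gives
\[
\mathbf P\Big\{\sum_{I\le k\le J}\mathbf 1_{A_k}\ge\lambda\sum_{I\le k\le J}\mathbf P(A_k)\Big\}
\ge(1-\lambda)^2\frac{\big(\sum_{I\le k\le J}\mathbf P(A_k)\big)^2}{\sum_{I\le k\le J}\mathbf P(A_k)+\sum_{I\le k\ne l\le J}\mathbf P(A_k\cap A_l)}
\ge(1-\lambda)^2\frac{\sum_{I\le k\le J}\mathbf P(A_k)}{1+C\sum_{I\le k\le J}\mathbf P(A_k)}, \tag{8.3.3}
\]
which easily implies $\mathbf P(\limsup_{k\to\infty}A_k)=1$ whenever $\mathbf P(\limsup_{k\to\infty}A_k)=0$ or 1. Note that by Fatou's lemma, (8.3.3) also provides an indication of the number of occurrences of the sets $A_k$: for any partial index $J$,
\[
\mathbf P\Big\{\omega:\ \limsup_{J\to\infty}\frac{\#\{1\le k\le J:\ \omega\in A_k\}}{\sum_{I\le k\le J}\mathbf P(A_k)}\ge\lambda\Big\}\ge\frac{(1-\lambda)^2}{C}\qquad(0\le\lambda\le 1).
\]
A great deal of attention has been devoted to getting much better estimates for the quantity
\[
N_J=\#\{1\le k\le J:\ A_k\ \text{occurs}\}. \tag{8.3.4}
\]
Let us first look at the independent case. Since $N_J-\mathbf E\,N_J$ is the sum of the independent centered random variables $\xi_k=\mathbf 1(A_k)-\mathbf P(A_k)$, we may invoke the strong law of large numbers. This one will in fact follow from a stronger result. Let $\varepsilon>0$ and put
\[
\eta_k=\frac{\xi_k}{(\mathbf E\,N_k)^{1/2+\varepsilon}}.
\]
Since the series $\sum_{k\ge 1}\mathbf P(A_k)$ diverges, the series $\sum_{k\ge 1}\mathbf P(A_k)/(\mathbf E\,N_k)^\alpha$ converges for any real $\alpha>1$ (see (4.8.6)). In particular the series $\sum_{k\ge 1}\mathbf E\,\eta_k^2$ converges. The random variables $\eta_k$ being independent, this implies, according to the Two-Series Theorem (Petrov [1975a], p. 266), that the series $\sum_{k\ge 1}\eta_k$ converges almost surely. By Kronecker's lemma it follows that for all $\varepsilon>0$,
\[
\mathbf P\Big\{\lim_{J\to\infty}\frac{N_J-\sum_{1\le k\le J}\mathbf P(A_k)}{\big(\sum_{1\le k\le J}\mathbf P(A_k)\big)^{1/2+\varepsilon}}=0\Big\}=1. \tag{8.3.5}
\]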
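The Paley–Zygmund inequality (8.3.2), on which the discussion above rests, can be checked exactly on a small discrete example. The sketch below is illustrative only: the function name and the three-point distribution are assumptions, not taken from the text. It computes both sides of (8.3.2) for a nonnegative random variable with finitely many values.

```python
def paley_zygmund_sides(values, probs, lam):
    """Return (lhs, rhs) of the Paley-Zygmund inequality
        P{g >= lam * E g}  >=  (1 - lam)^2 * (E g)^2 / E g^2
    for a nonnegative random variable g taking values[i] with probability probs[i].
    """
    mean = sum(v * p for v, p in zip(values, probs))
    second_moment = sum(v * v * p for v, p in zip(values, probs))
    lhs = sum(p for v, p in zip(values, probs) if v >= lam * mean)
    rhs = (1.0 - lam) ** 2 * mean ** 2 / second_moment
    return lhs, rhs


if __name__ == "__main__":
    # Hypothetical distribution: g = 0, 1, 5 with probabilities 0.5, 0.3, 0.2.
    values, probs = [0.0, 1.0, 5.0], [0.5, 0.3, 0.2]
    for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
        lhs, rhs = paley_zygmund_sides(values, probs, lam)
        print(f"lambda={lam}: P={lhs:.4f} >= bound={rhs:.4f}")
```

Taking $g$ to be a count of occurrences, as in (8.3.3), only changes how the first two moments are computed; the comparison itself is the same.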


This result can be made more precise by invoking Kolmogorov's law of the iterated logarithm for sums of independent random variables (Petrov [1975a], Theorem 1, p. 292). For any integer $J\ge 1$, put $B_J=\sum_{k=1}^J\mathbf P(A_k)\big(1-\mathbf P(A_k)\big)$. Then
\[
\mathbf P\Big\{\limsup_{J\to\infty}\frac{N_J-\sum_{k=1}^J\mathbf P(A_k)}{\big(2B_J\log\log B_J\big)^{1/2}}=1\Big\}=1. \tag{8.3.6}
\]
Finally, the statistic of the number of occurrences can also be made precise by invoking the Berry–Esseen inequality (Petrov [1975a], Theorem 3, p. 111):
\[
\sup_{x\in\mathbf R}\Big|\mathbf P\Big\{\frac{N_J-\sum_{1\le k\le J}\mathbf P(A_k)}{B_J^{1/2}}<x\Big\}-\frac{1}{\sqrt{2\pi}}\int_{-\infty}^xe^{-u^2/2}\,du\Big|=O(L_J),
\]
where
\[
L_J=\frac{\sum_{k=1}^J\mathbf E\,\xi_k^3}{\big(\sum_{k=1}^J\mathbf E\,\xi_k^2\big)^{3/2}}=\frac{\sum_{k=1}^J\big(\mathbf P(A_k)+2\mathbf P(A_k)^3-3\mathbf P(A_k)^2\big)}{\big(\sum_{k=1}^J\mathbf P(A_k)(1-\mathbf P(A_k))\big)^{3/2}},
\]
and $L_J\sim\big(\sum_{k=1}^J\mathbf P(A_k)\big)^{-1/2}$ as $J$ tends to infinity, if $\lim_{k\to\infty}\mathbf P(A_k)=0$. Obviously we have a central limit theorem; we also have in fact an almost sure central limit theorem, which we will not describe here. Thus we have a complete picture of the asymptotic behavior of the number of occurrences for the sequence $\{A_k, k\ge 1\}$ in the independent case.

Other forms of the Paley–Zygmund inequality. This inequality is an extremely useful tool, and sometimes other variants turn out to be more appropriate. Observe first that the original Paley–Zygmund inequality is a simple consequence of the Cauchy–Schwarz inequality. We have ($g\ge 0$, $0\le\lambda\le 1$)
\[
\big(\mathbf E\,g\chi\{g\ge\lambda\mathbf E\,g\}\big)^2\le\mathbf E\,g^2\;\mathbf P\{g\ge\lambda\mathbf E\,g\}.
\]
But $\mathbf E\,g\chi\{g\ge\lambda\mathbf E\,g\}=\mathbf E\,g-\mathbf E\,g\chi\{g<\lambda\mathbf E\,g\}\ge(1-\lambda)\mathbf E\,g$. By combining both inequalities, we easily get
\[
\mathbf P\{g\ge\lambda\mathbf E\,g\}\ge(1-\lambda)^2\frac{(\mathbf E\,g)^2}{\mathbf E\,g^2}.
\]
Lemma 8.7.4 also provides the inequality $\mathbf P\{X\ge\mathbf E\,X\}>0$, valid for $X\ge 0$ with $\mathbf E\,X<\infty$. More generally, let $r>s>0$ and $0\le\varepsilon\le 1$. Then for any nonnegative random variable $X\in L^r$,
\[
\mathbf P\{X\ge\varepsilon\|X\|_s\}^{\frac1s-\frac1r}\ge(1-\varepsilon^s)^{\frac1s}\,\frac{\|X\|_s}{\|X\|_r}. \tag{8.3.7}
\]
Indeed, let $X,Y$ be nonnegative random variables. By applying Hölder's inequality (with $p=r/s$), we have $\mathbf E\,X^sY\le\big(\mathbf E\,Y^{\frac{r}{r-s}}\big)^{\frac{r-s}{r}}\big(\mathbf E\,X^r\big)^{\frac sr}$. Choose $Y=\mathbf 1\{X\ge\varepsilon\|X\|_s\}$. As
\[
\mathbf E\,X^sY=\mathbf E\,X^s-\mathbf E\,X^s\mathbf 1\{X<\varepsilon\|X\|_s\}\ge\mathbf E\,X^s-\mathbf E\,X^s\mathbf 1\{X^s\le\varepsilon^s\mathbf E\,X^s\}\ge(1-\varepsilon^s)\mathbf E\,X^s,
\]


and $\mathbf E\,X^sY\le\mathbf P\{X\ge\varepsilon\|X\|_s\}^{\frac{r-s}{r}}\|X\|_r^s$, we get $\mathbf P\{X\ge\varepsilon\|X\|_s\}^{\frac{r-s}{r}}\|X\|_r^s\ge(1-\varepsilon^s)\|X\|_s^s$, or
\[
\mathbf P\{X\ge\varepsilon\|X\|_s\}^{\frac1s-\frac1r}\ge(1-\varepsilon^s)^{\frac1s}\,\frac{\|X\|_s}{\|X\|_r},
\]
as claimed. Inequality (8.3.7) can be viewed as a version of Petrov's inequality: if $X$ is any random variable and $s>r>0$, then $X\in L^s$ implies
\[
\mathbf P\{X\ne 0\}^{\frac1r-\frac1s}\ge\frac{\|X\|_r}{\|X\|_s}.
\]
(See Petrov [1975b], inequality (2), p. 392.)

In the light of the remark made after the statement of the Borel–Cantelli lemma, it is interesting to figure out whether these results, or some of them, can be extended under weaker assumptions than independence. Before going further, it seems worthwhile to point out a kind of subsequence principle for independence observed by Neveu, after works of Fischler [1967], Gillis [1936], Lorentz [1960], Rényi [1958], Sucheston [1960], Visser [1937]. Weak convergence is essential in what follows. Let $A=\{A_k, k\ge 1\}$ be a sequence of measurable subsets of $\Omega$ such that $\liminf_{k\to\infty}\mathbf P(A_k)=\rho\in\,]0,1]$. Then, according to Theorem 2, p. 67 of Neveu [1965], either

• the sequence of indicators $\{\mathbf 1(A_k), k\ge 1\}$ converges weakly in $L^2(\mathbf P)$ to the constant function equal to $\rho$, and then, for any $\varepsilon>0$, there exists a subsequence $n_1<n_2<\cdots$ such that if $B_m=A_{n_m}$, for any two distinct, finite subsets $\mathcal I$ and $\mathcal J$ with $\#(\mathcal I)=I$, $\#(\mathcal J)=J$, the following inequalities are realized:
\[
(1-\varepsilon)\rho^I(1-\rho)^J\le\mathbf P\Big\{\bigcap_{i\in\mathcal I}B_i\cap\bigcap_{j\in\mathcal J}B_j^c\Big\}\le(1+\varepsilon)\rho^I(1-\rho)^J;
\]

• or the sequence of indicators $\{\mathbf 1(A_k), k\ge 1\}$ does not converge weakly in $L^2(\mathbf P)$, and there exist a real $\delta>0$ and a subsequence $n_1<n_2<\cdots$ such that if $B_m=A_{n_m}$, for any finite subset $\mathcal I$ with $\#(\mathcal I)=I$, the following inequality is realized:
\[
\mathbf P\Big\{\bigcap_{i\in\mathcal I}B_i\Big\}\ge(\rho+\delta)^I.
\]

This result also generalizes the Poincaré recurrence theorem (Theorem 3.1.5).

Consider now the dependent case. The first question which comes to mind is whether it is possible to get something under assumption (8.3.1). Without any strengthening of (8.3.1) the answer is negative. This follows from a counterexample by Rieders for strong mixing sequences (cf. Rieders [1993], remark following Theorem 1). One can also use the last part of the proof of Theorem 3, p. 68 in Fischler [1967] to give an elementary construction of a counterexample. Let $\eta>0$ and denote $I=[0,1]$, $J=[0,1+\eta]$. Let $\lambda$ be the Lebesgue measure on the interval $I$, and $\tilde\lambda$ be the


probability measure on $J$ defined by $\tilde\lambda(dx)=(1+\eta)^{-1}\mathbf 1_J(x)\,dx$. On $(I,\lambda)$ let us consider a sequence $\{\varepsilon_n, n\ge 1\}$ of independent (Rademacher) random variables taking values $\pm 1$ with probability 1/2. Define a sequence of events $B=\{B_n, n\ge 1\}$ by $B_n=\{\varepsilon_n=1\}$. We view them as measurable events of the enlarged probability space $(J,\tilde\lambda)$. It is easily checked that
\[
\tilde\lambda(B_n)=\frac{1}{2(1+\eta)},\qquad\tilde\lambda(B_n\cap B_m)=\frac{1}{4(1+\eta)}=(1+\eta)\,\tilde\lambda(B_n)\tilde\lambda(B_m),\qquad\tilde\lambda\big(\limsup_{n\to\infty}B_n\big)=\frac{1}{1+\eta}.
\]

This also provides a simple example of an orthonormal sequence, for which partial sums √ do not satisfy CLT. Indeed, let ξn (x) = 2(1 + η)1[0,1[ (x)εn (x), and put Sn (ξ ) = ξ1 + · · · + ξn , Sn (ε) = ε1 + · · · + εn . Then ξ = {ξn , n ≥ 1} is an orthonormal system ˜ but ξ ∈ in L2 (J, λ), / CLT since √ λ˜ x ∈ J : Sn (ξ )(x)/ n < t √ √ = λ˜ x ∈ I : Sn (ξ )(x)/ n < t + λ˜ x ∈ J \I : Sn (ξ )(x)/ n < t 2 " # √ = λ x ∈ J : 2(1 + η)Sn (ε)(x)/ n < t + 1R+ (t) /(1 + η) 2 " # → P{N (0, 1) < t/ 2(1 + η)} + 1R+ (t) /(1 + η) = P{N (0, 1) < t}. We shall concentrate in what follows on strong laws of large numbers with speed of convergence, rather than the study of the statistic of the occurrences via the CLT. The only comment we shall make in that direction concerns weakly multiplicative systems (WMS), a notion due to Alexits [1961] and later extended by Móricz [1976]. The study of the CLT, and therefore of the characteristic functions of the number of occurrences, indeed requires much stronger information on the correlation properties of the family {ξk , k ≥ 1}, where we have again set ξk = 1(Ak ) − P(Ak ). If for instance this family is for some real 1 ≤ p < 2, a p-WMS system:

1/p E ξi . . . ξi p sup Cr < ∞ where Cr = , r 1 r

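[...] $\gamma>1$ and consider a sequence $\{A_l, l\ge 1\}$ of measurable subsets of $\Omega$. Put $m_l=\mathbf P(A_l)$ and $\xi_l=\mathbf 1(A_l)-m_l$, $l\ge 1$. We assume that the following assumptions are fulfilled:
(i) $\mathbf E\big|\sum_{i\le l\le j}\xi_l\big|^\gamma\le C\sum_{i\le l\le j}m_l$, $0\le i\le j<\infty$,
(ii) the series $\sum_{k=1}^\infty m_k$ diverges.
Then, for every $a>\gamma+1$:
\[
\mathbf P\Big\{\lim_{J\to\infty}\frac{\#\{1\le k\le J:\ A_k\ \text{occurs}\}-\sum_{1\le k\le J}m_k}{\big(\sum_{1\le k\le J}m_k\big)^{1/\gamma}\big(\log\sum_{1\le k\le J}m_k\big)^{a/\gamma}}=0\Big\}=1. \tag{8.3.11}
\]
In the independent case, Theorem 8.3.2 does not bring any more than Theorem 8.3.1 or property (8.3.6), since by Rosenthal's inequality (8.2.9),
\[
\mathbf E\Big|\sum_{i\le l\le j}\xi_l\Big|^\gamma\le C_\gamma\Big(\sum_{i\le l\le j}m_l\Big)^{\gamma/2},\qquad 0\le i\le j<\infty.
\]
To prove Theorem 8.3.2, we begin with a useful lemma.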


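8.3.3 Lemma. Let $\gamma>1$, $0<\beta\le 1$ and consider a finite collection of random variables $E=(X_1,\dots,X_N)\subset L^\gamma(\mathbf P)$, such that
(i) $\sup_{1\le i,j\le N}\|X_i-X_j\|_\gamma\le 1$,
(ii) $N(E,\|\cdot\|_\gamma,\varepsilon)\le C/\varepsilon^{1/\beta}$ $(0<\varepsilon\le 1)$.
Then there exists a constant $K_{\beta,\gamma}$ depending on $\beta,\gamma$ only such that
\[
\Big\|\sup_{1\le i,j\le N}|X_i-X_j|\Big\|_\gamma\le
\begin{cases}
K_{\beta,\gamma}\max(C^\beta,C^{1/\gamma}) & \text{if } \beta\gamma>1,\\
K_{\beta,\gamma}\,C^{1/\gamma}\log\big(\frac{Ne}{C}\big) & \text{if } \beta\gamma=1,\\
K_{\beta,\gamma}\,C^\beta N^{\frac1\gamma-\beta} & \text{if } \beta\gamma<1.
\end{cases} \tag{8.3.12}
\]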

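Note that a straightforward application of inequality (8.1.3) with $\Phi(x)=|x|^\gamma$ would have given $\big\|\sup_{1\le i,j\le N}|X_i-X_j|\big\|_\gamma\le N^{2/\gamma}$, which is a far poorer bound. We shall see in the next lemma that the requirement made on the entropy numbers of the family $E$ is well adapted to our purpose.

Proof. Under our assumption, $N(E,\|\cdot\|_\gamma,\varepsilon)\le\min(C/\varepsilon^{1/\beta},N)$. Apply Theorem 8.1.1 with $\Phi(x)=|x|^\gamma$. The entropy integral in (8.1.14) can be estimated as follows:
\[
\int_0^1N(E,\|\cdot\|_\gamma,\varepsilon)^{1/\gamma}\,d\varepsilon\le\int_0^1\min\Big(\frac{C}{\varepsilon^{1/\beta}},N\Big)^{1/\gamma}d\varepsilon
=\int_0^{(C/N)^\beta}N^{1/\gamma}\,d\varepsilon+C^{1/\gamma}\int_{(C/N)^\beta}^1\varepsilon^{-1/\beta\gamma}\,d\varepsilon
=C^\beta N^{\frac1\gamma-\beta}+C^{1/\gamma}\int_{(C/N)^\beta}^1\varepsilon^{-1/\beta\gamma}\,d\varepsilon.
\]
A direct computation then shows
\[
\int_0^1N(E,\|\cdot\|_\gamma,\varepsilon)^{1/\gamma}\,d\varepsilon\le
\begin{cases}
\max(C^\beta,C^{1/\gamma})\,\dfrac{2\beta\gamma-1}{\beta\gamma-1} & \text{if } \beta\gamma>1,\\[2mm]
C^{1/\gamma}\log\Big(\dfrac{N^\beta}{C^\beta}\,e\Big) & \text{if } \beta\gamma=1,\\[2mm]
C^\beta N^{\frac1\gamma-\beta}\,\dfrac{1}{1-\beta\gamma} & \text{if } \beta\gamma<1.
\end{cases}
\]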

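The result is thus implied by the conclusion of Theorem 8.1.1.

8.3.4 Lemma. Let $\gamma>1$, $0<\beta\le 1$ and consider a finite collection of random variables $E=(X_1,\dots,X_N)\subset L^\gamma(\mathbf P)$, and reals $0\le t_1\le t_2\le\cdots\le t_N\le 1$ such that
\[
\|X_j-X_i\|_\gamma\le(t_j-t_i)^\beta\qquad(\forall 1\le i\le j\le N). \tag{8.3.13}
\]
Then there exists a constant $K_{\beta,\gamma}$ depending on $\beta,\gamma$ only, such that
\[
\Big\|\sup_{1\le i,j\le N}|X_i-X_j|\Big\|_\gamma\le
\begin{cases}
K_{\beta,\gamma} & \text{if } \beta\gamma>1,\\
K_{\beta,\gamma}\log N & \text{if } \beta\gamma=1,\\
K_{\beta,\gamma}N^{\frac1\gamma-\beta} & \text{if } \beta\gamma<1.
\end{cases} \tag{8.3.14}
\]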
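The three regimes in (8.3.12) and (8.3.14) come from the integral $\int_0^1\min(C/\varepsilon^{1/\beta},N)^{1/\gamma}\,d\varepsilon$ computed in the proof of Lemma 8.3.3. The sketch below evaluates that integral in closed form and compares it with the three bounds. It assumes $1\le C\le N$ so that the split point $(C/N)^\beta$ lies in $(0,1]$; the function names and parameter values are hypothetical.

```python
import math


def entropy_integral(C, N, beta, gamma):
    """Closed form of the integral over (0, 1] of min(C / eps^(1/beta), N)^(1/gamma).

    Assumes 1 <= C <= N, so that the split point a = (C/N)^beta lies in (0, 1].
    """
    a = (C / N) ** beta                          # below a, the minimum equals N
    head = C ** beta * N ** (1.0 / gamma - beta)  # = a * N^(1/gamma)
    p = 1.0 / (beta * gamma)                     # exponent of eps in the second piece
    if math.isclose(p, 1.0):
        tail = math.log(1.0 / a)
    else:
        tail = (1.0 - a ** (1.0 - p)) / (1.0 - p)
    return head + C ** (1.0 / gamma) * tail


def regime_bound(C, N, beta, gamma):
    """Right-hand sides obtained in the proof, in the three regimes of beta * gamma."""
    bg = beta * gamma
    if math.isclose(bg, 1.0):
        return C ** (1.0 / gamma) * math.log(math.e * (N / C) ** beta)
    if bg > 1.0:
        return max(C ** beta, C ** (1.0 / gamma)) * (2.0 * bg - 1.0) / (bg - 1.0)
    return C ** beta * N ** (1.0 / gamma - beta) / (1.0 - bg)


if __name__ == "__main__":
    for beta in (1.0, 0.5, 0.25):                # beta*gamma = 2, 1, 0.5 with gamma = 2
        value = entropy_integral(2.0, 100.0, beta, 2.0)
        bound = regime_bound(2.0, 100.0, beta, 2.0)
        print(f"beta={beta}: integral={value:.4f}, bound={bound:.4f}")
```

In the boundary case $\beta\gamma=1$ the bound is attained with equality, which is where the logarithmic factor in (8.3.14) originates.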


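From the lemma above follows the well-known Rademacher–Menshov maximal inequality. Let $X_1,X_2,\dots,X_n$, $n\ge 2$, have zero means and be orthogonal. Then
\[
\mathbf E\max_{1\le j\le n}\Big|\sum_{i=1}^jX_i\Big|^2\le C(\log n)^2\sum_{i=1}^n\mathbf E\,X_i^2,
\]
where $C$ is a universal constant.

Proof. It is similar to the construction made in the proof of Theorem 8.2.1. Let $0<\varepsilon\le 1$ and write
\[
\begin{cases}
I_j(\varepsilon)=[j\varepsilon,(j+1)\varepsilon[, & j=0,1,\dots,[1/\varepsilon],\\
J^*(\varepsilon)=\{0\le j\le[1/\varepsilon]:\ I_j(\varepsilon)\cap\{t_l,1\le l\le N\}\ne\emptyset\},\\
j_-=\inf\{l:\ t_l\in I_j(\varepsilon)\} & \text{if } j\in J^*(\varepsilon).
\end{cases}
\]
Then for all $1\le l\le N$ there exists $j\in J^*(\varepsilon)$ with $0\le t_l-t_{j_-}\le\varepsilon$, and $\#(J^*(\varepsilon))\le[1/\varepsilon]+1\le 2/\varepsilon$. This, by virtue of the assumption made, means that
\[
\forall 1\le l\le N,\ \exists j\in J^*(\varepsilon)\ \text{such that}\ \|X_l-X_{j_-}\|_\gamma\le\varepsilon^\beta.
\]
Thus $N(E,\|\cdot\|_\gamma,\varepsilon^\beta)\le 2/\varepsilon$, or else
\[
N(E,\|\cdot\|_\gamma,\rho)\le\frac{2}{\rho^{1/\beta}}\qquad(0<\rho\le 1). \tag{8.3.15}
\]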

It remains to apply Lemma 8.3.3 to conclude (8.3.14). Now we can pass to the proof of Theorem 8.3.2.

Proof of Theorem 8.3.2. We shall use the notation $S_n=\sum_{l=1}^n\xi_l$, $\mathcal M_n=\sum_{l=1}^nm_l$. For any integer $k\ge 1$, put
\[
N_k=\inf\Big\{n\ge 1:\ \sum_{l=1}^nm_l\ge k\Big\}. \tag{8.3.16}
\]
Then $\mathcal M_{N_k-1}<k\le\mathcal M_{N_k}\le\mathcal M_{N_k-1}+1$. Consider two positive integers $P<Q$; we will first estimate the oscillation of the sums $S_l$ over the block of indices $N_P,N_{P+1},\dots,N_{Q-1}$.
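The blocking indices of definition (8.3.16) are easy to compute for a concrete sequence, and the sandwich inequality stated right after the definition can then be verified directly. The sketch below is illustrative only: the function name `block_indices` and the sequence $m_l=1/\sqrt{l+1}$ are assumptions, chosen so that $0\le m_l\le 1$ and $\sum m_l$ diverges.

```python
from bisect import bisect_left
from itertools import accumulate


def block_indices(m, k_max):
    """Return ([N_1, ..., N_k], cumulative sums), where
    N_k = inf{n >= 1 : m_1 + ... + m_n >= k}  (1-based indices).

    Stops early if the finite sequence m does not reach the level k.
    """
    cumulative = list(accumulate(m))           # cumulative[n-1] = m_1 + ... + m_n
    indices = []
    for k in range(1, k_max + 1):
        pos = bisect_left(cumulative, k)       # first position with cumulative sum >= k
        if pos == len(cumulative):
            break
        indices.append(pos + 1)
    return indices, cumulative


if __name__ == "__main__":
    # Hypothetical example: m_l = 1 / sqrt(l + 1), l = 1, ..., 400.
    m = [(l + 1) ** -0.5 for l in range(1, 401)]
    indices, cumulative = block_indices(m, k_max=10)
    print(indices)
```

The inequality holds because each $m_l$ is at most 1, so the partial sums cannot jump over an integer level by more than one unit.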

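Since, with $\mathcal M_n=\sum_{l=1}^nm_l$,
\[
\sum_{l=N_{k-1}}^{N_k-1}m_l=\mathcal M_{N_k-1}-\mathcal M_{N_{k-1}-1}\le k-(k-1)+1=2,
\]
we deduce from our assumption that
\[
\mathbf E\Big|\sum_{l=N_n}^{N_m-1}\xi_l\Big|^\gamma\le 2C(m-n). \tag{8.3.17}
\]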


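Put
\[
X_h=S_{N_{P+h}-1}\big/\big(2C(Q-P)\big)^{1/\gamma},\qquad t_h=h/(Q-P)\qquad(h=0,\dots,Q-P-1). \tag{8.3.18}
\]
Reformulating then our previous estimate in terms of $X_h$, $t_h$ gives (writing $m=P+j$, $n=P+i$)
\[
\|X_j-X_i\|_\gamma^\gamma\le t_j-t_i\qquad(0\le i\le j\le Q-P-1). \tag{8.3.19}
\]
We can therefore infer from Lemma 8.3.4 that $\big\|\sup_{0\le i\le j\le Q-P-1}|X_j-X_i|\big\|_\gamma\le K_\gamma\log\big((Q-P)e\big)$, or in terms of $S_n$:
\[
\Big\|\sup_{m,n\in[P,Q[}|S_{N_n}-S_{N_m}|\Big\|_\gamma\le K_\gamma(Q-P)^{1/\gamma}\log\big((Q-P)e\big). \tag{8.3.20}
\]
Apply this estimate with the choice $P=2^r$, $Q=2^{r+1}$ and put
\[
B_r=\sup_{2^r\le n,m<2^{r+1}}|S_{N_n}-S_{N_m}|\qquad(r\ge 1).
\]
Let $\varepsilon>0$ be fixed but arbitrary and $a>\gamma+1$. By estimate (8.3.20) and Tchebycheff's inequality,
\[
\mathbf P\big\{B_r>\varepsilon 2^{r/\gamma}r^{a/\gamma}\big\}\le\frac{\mathbf E\,B_r^\gamma}{\varepsilon^\gamma2^rr^a}\le K_\gamma\varepsilon^{-\gamma}r^{\gamma-a},
\]
thus implying that the series $\sum_{r\ge 1}\mathbf P\{B_r>\varepsilon 2^{r/\gamma}r^{a/\gamma}\}$ converges. Hence, by the Borel–Cantelli lemma,
\[
\mathbf P\big\{\exists R<\infty:\ B_r\le\varepsilon 2^{r/\gamma}r^{a/\gamma},\ r\ge R\big\}=1. \tag{8.3.21}
\]
Further, writing $\mathcal M_n=\sum_{l=1}^nm_l$,
\[
\mathbf P\big\{|S_{N_{2^r}}|>\varepsilon 2^{r/\gamma}r^{a/\gamma}\big\}\le\frac{\mathbf E|S_{N_{2^r}}|^\gamma}{\varepsilon^\gamma2^rr^a}\le\frac{C\,\mathcal M_{N_{2^r}}}{\varepsilon^\gamma2^rr^a}\le\frac{C(2^r+1)}{\varepsilon^\gamma2^rr^a}.
\]
We deduce for $a>1$ that the series $\sum_{r\ge 1}\mathbf P\{|S_{N_{2^r}}|>\varepsilon 2^{r/\gamma}r^{a/\gamma}\}$ converges. By invoking the Borel–Cantelli lemma again, we obtain
\[
\mathbf P\big\{\exists R<\infty:\ |S_{N_{2^r}}|\le\varepsilon 2^{r/\gamma}r^{a/\gamma},\ r\ge R\big\}=1. \tag{8.3.22}
\]
Let now $k\ge 1$ and $r\ge 1$ be integers such that $2^r\le k<2^{r+1}$. From the inequality $|S_{N_k}|\le|S_{N_{2^r}}|+|S_{N_k}-S_{N_{2^r}}|$ and (8.3.21)–(8.3.22), it follows that on a measurable set of full measure, $|S_{N_k}|\le 2\varepsilon 2^{r/\gamma}r^{a/\gamma}$ holds true for all $k$ large enough. Since $2^r\le\mathcal M_{N_{2^r}}\le\mathcal M_{N_k}\le\mathcal M_{N_{2^{r+1}}-1}<2^{r+1}$, we also have
\[
|S_{N_k}|\le K_\gamma\,\varepsilon\,\mathcal M_{N_k}^{1/\gamma}(\log\mathcal M_{N_k})^{a/\gamma}, \tag{8.3.23}
\]
for all $k$ large enough, on a measurable set of measure 1.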


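Finally we treat the general case. Let $N$ be some arbitrary positive integer and $k$ an integer such that $N_k\le N<N_{k+1}$. Then, with $\mathcal M_n=\sum_{l=1}^nm_l$, we have $k\le\mathcal M_{N_k}\le\mathcal M_N\le\mathcal M_{N_{k+1}-1}\le k+1$. From (8.3.23) it follows that on a measurable set of full measure, both inequalities below hold true:
\[
\sum_{l=1}^N\mathbf 1(A_l)\ge\sum_{l=1}^{N_k}\mathbf 1(A_l)\ge\mathcal M_{N_k}-K_\gamma\varepsilon\,\mathcal M_{N_k}^{1/\gamma}(\log\mathcal M_{N_k})^{a/\gamma}\ge\mathcal M_N-K_\gamma\varepsilon\,\mathcal M_N^{1/\gamma}(\log\mathcal M_N)^{a/\gamma},
\]
and
\[
\sum_{l=1}^N\mathbf 1(A_l)\le\sum_{l=1}^{N_{k+1}}\mathbf 1(A_l)\le\mathcal M_{N_{k+1}}+K_\gamma\varepsilon\,\mathcal M_{N_{k+1}}^{1/\gamma}(\log\mathcal M_{N_{k+1}})^{a/\gamma}\le\mathcal M_N+K_\gamma\varepsilon\,\mathcal M_N^{1/\gamma}(\log\mathcal M_N)^{a/\gamma},
\]
provided that $N$ is large enough. In other words,
\[
\mathbf P\Big\{\limsup_{N\to\infty}\frac{|S_N|}{\mathcal M_N^{1/\gamma}(\log\mathcal M_N)^{a/\gamma}}\le K_\gamma\varepsilon\Big\}=1. \tag{8.3.24}
\]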

Since $\varepsilon$ is arbitrary, we obtain the stated result.

8.3.5 Remark. 1. It is worth noticing here that we used assumption (i) only to control the behavior of the sums $S_{N_k}$. Thus the following, seemingly weaker condition would have been enough for our purpose:
(i$'$) There exist a real $\eta_0>0$ and a constant $C_0=C(\eta_0)$ depending on $\eta_0$ only such that, for any integers $i\le j$,
\[
\sum_{l=i}^jm_l\ge\eta_0\ \Longrightarrow\ \mathbf E\Big|\sum_{i\le l\le j}\xi_l\Big|^\gamma\le C_0\sum_{i\le l\le j}m_l.
\]
2. The next observation concerns the limit case $\alpha=1$ in Stechkin's theorem. Let $\gamma>1$ and $\xi=\{\xi_i, i\ge 1\}$ be a sequence of random variables satisfying the assumption
\[
\mathbf E\Big|\sum_{i\le l\le j}\xi_l\Big|^\gamma\le\sum_{i\le l\le j}m_l,\qquad 0\le i\le j<\infty, \tag{8.3.25}
\]
where $\{m_l, l\ge 1\}$ is a sequence of reals with $0\le m_l\le 1$. Assume first that the series $\sum_{l=1}^\infty m_l$ diverges. Using the notation from the proof of Theorem 8.3.2 (notably definition (8.3.16), and $\mathcal M_n=\sum_{l=1}^nm_l$), the previous remark together with estimate (8.3.23) imply, for any $a>\gamma+1$, that
\[
\mathbf P\Big\{\lim_{k\to\infty}\frac{S_{N_k}}{\mathcal M_{N_k}^{1/\gamma}(\log\mathcal M_{N_k})^{a/\gamma}}=0\Big\}=1. \tag{8.3.26}
\]

Assume now that the series $\sum_{l=1}^\infty m_l$ converges. We claim that
\[
\sum_{l=1}^\infty m_l(\log l)^\gamma<\infty\ \Longrightarrow\ \text{the series } \sum_{l=1}^\infty\xi_l\ \text{converges almost surely.} \tag{8.3.27}
\]

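Indeed, let us first observe that the sequence $\{S_n, n\ge 1\}$ is a Cauchy sequence in $L^\gamma$, thus converging to some element $S\in L^\gamma$. Next, by Lemma 8.3.3, for any integer $r\ge 1$, it follows that
\[
\Big\|\sup_{2^r\le n,m<2^{r+1}}|S_n-S_m|\Big\|_\gamma\le K_\gamma\,r\Big(\sum_{2^r\le l<2^{r+1}}m_l\Big)^{1/\gamma}.
\]
[...] $\varepsilon>0$,
\[
\sum_{l=1}^L\xi_l=o\Big(\Psi(L)^{\frac1\gamma}(\log L)^{\frac{\gamma+1}{\gamma}+\varepsilon}\Big)\qquad\text{(Gál–Koksma [1950: Theorem 3])},
\]
\[
\sum_{l=1}^L\xi_l=o\Big(\Psi(L)^{\frac1\gamma}(\log L)^{\frac1\gamma+\varepsilon}\Big)\qquad\text{(Gál–Koksma [1950: Theorem 5])},
\]
\[
\sum_{l=1}^L\xi_l=o\Big(L^{\frac12}(\log L)^{\frac32+\frac\sigma2+\varepsilon}\Big)\qquad\text{(Gál–Koksma [1950: Theorem 6])}.
\]
Essentially in each case, we examine a situation of the following type:
\[
\mathbf E\Big|\sum_{i\le l\le j}\xi_l\Big|^\gamma\le\Psi\Big(\sum_{i\le l\le j}u_l\Big)\qquad(\forall 1\le i\le j<\infty), \tag{8.4.3}
\]
where $\{u_i, i\ge 1\}$ is a sequence of nonnegative reals and $\Psi:\mathbf R_+\to\mathbf R_+$ an increasing function. Put $S=\{S_L, L\ge 1\}$, $U=\{U_L, L\ge 1\}$, where for any positive integer $L$, $S_L=\sum_{l=1}^L\xi_l$ and $U_L=\sum_{l=1}^Lu_l$. We shall prove the result below.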


8.4 Application to Gál–Koksma’s theorems

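8.4.1 Theorem. a) Assume that the series $\sum_{l=1}^{\infty} u_l$ converges and that the integral $\int_{+0} \bigl[\Phi^{-1}(\rho^{\gamma})\bigr]^{-1/\gamma}\,d\rho$ is convergent. Then the series $\sum_{l=1}^{\infty}\xi_l$ is convergent almost surely.

b) Assume that the series $\sum_{l=1}^{\infty} u_l$ diverges and that for some real $\eta\ge 0$, $\Phi(x)/x^{1+\eta}$ is nondecreasing. Then, for all $\varepsilon>0$,
\[ \begin{cases} (\eta>0) & P\Bigl\{\lim_{L\to\infty} \dfrac{S_L}{\Phi(U_L)^{1/\gamma}(\log U_L)^{(1+\varepsilon)/\gamma}} = 0\Bigr\} = 1,\\[2mm] (\eta=0) & P\Bigl\{\lim_{L\to\infty} \dfrac{S_L}{\Phi(U_L)^{1/\gamma}(\log U_L)^{1+(1+\varepsilon)/\gamma}} = 0\Bigr\} = 1. \end{cases} \tag{8.4.4} \]

Putting $u_l\equiv 1$ in the above result immediately gives Theorems 3, 5 and 6 of Gál–Koksma [1950].

Proof. The proof of this result follows from a simple modification of the proofs of Theorems 8.2.1 and 8.2.2.

a) Assumption (8.4.3) implies that the sequence $\mathcal{S}$ is a Cauchy sequence in $L^{\gamma}$. The new sequence obtained by adding to $\mathcal{S}$ its limit is again denoted by $\mathcal{S}$. Let $0<\varepsilon\le u$; write again $u=\sum_{l=1}^{\infty}u_l$ and
\[ \begin{cases} I_j(\varepsilon) = [j\varepsilon, (j+1)\varepsilon[, & \text{if } j=0,1,\dots,[u/\varepsilon],\\ J^{*}(\varepsilon) = \bigl\{0\le j\le [u/\varepsilon] : I_j(\varepsilon)\cap\mathcal{U}\neq\emptyset\bigr\},\\ j^{-} = \inf\{L : U_L\in I_j(\varepsilon)\} & \text{if } j\in J^{*}(\varepsilon). \end{cases} \]
Then
\[ \forall L\ge 1,\ \exists j\in J^{*}(\varepsilon) \text{ such that } \|S_L - S_{j^{-}}\|_{\gamma} \le \Phi(\varepsilon)^{1/\gamma}, \]
which implies that
\[ N(\mathcal{S}, \|\cdot\|_{\gamma}, \rho) \le \frac{2u}{\Phi^{-1}(\rho^{\gamma})}, \qquad 0<\rho<\Phi(u)^{1/\gamma}. \tag{8.4.5} \]
Thus
\[ \int_0^{\Phi(u)^{1/\gamma}} N(\mathcal{S}, \|\cdot\|_{\gamma}, \rho)^{1/\gamma}\,d\rho \le I_{\Phi} := \int_0^{\Phi(u)^{1/\gamma}} \Bigl[\frac{2u}{\Phi^{-1}(\rho^{\gamma})}\Bigr]^{1/\gamma} d\rho < \infty, \]
by assumption. Applying Theorem 8.1.1 shows that $\mathcal{S}$ is convergent almost surely and
\[ \Bigl\|\sup_{l,n\ge 1} |S_l - S_n|\Bigr\|_{\gamma} \le K I_{\Phi}, \]
where $K$ is a universal constant.

b) We use the notation and definitions from the proof of Theorem 8.2.2: $\kappa=\{\kappa_p, p\ge 1\}$, $\mathcal{L}_p$, $L^{*}_p$, and $a>1$. On the one hand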

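\[ P\Bigl\{|S_{L^{*}_p}| > \varepsilon\,\Phi(M^{\kappa_p+1})^{1/\gamma} p^{a/\gamma}\Bigr\} \le \frac{\|S_{L^{*}_p}\|_{\gamma}^{\gamma}}{\varepsilon^{\gamma}\,\Phi(M^{\kappa_p+1})\,p^{a}} \le \frac{1}{\varepsilon^{\gamma} p^{a}}. \]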

8 The metric entropy method

Thus by the Borel–Cantelli lemma,
\[ P\Bigl\{\limsup_{p\to\infty} \frac{|S_{L^{*}_p}|}{\Phi(M^{\kappa_p+1})^{1/\gamma} p^{a/\gamma}} \le \varepsilon\Bigr\} = 1. \tag{8.4.6} \]

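On the other hand, put for $j\in\mathcal{L}_p$, $S'_j = S_j/\Phi(M^{\kappa_p+1})^{1/\gamma}$, $u'_j = u_j/\Phi(M^{\kappa_p+1})^{1/\gamma}$, $U'_j=\sum_{l=1}^{j}u'_l$. By assumption
\[ \|S'_i - S'_j\|_{\gamma} \le \Bigl[\frac{\Phi(U_j-U_i)}{\Phi(M^{\kappa_p+1})}\Bigr]^{1/\gamma} \qquad (i,j\in\mathcal{L}_p). \]
Now we use the fact that $\Phi(x)/x^{1+\eta}$ is nondecreasing. Since
\[ \frac{\Phi(U_j-U_i)}{(U_j-U_i)^{1+\eta}} \le \frac{\Phi(M^{\kappa_p+1})}{(M^{\kappa_p+1})^{1+\eta}}, \]
we have
\[ \|S'_i - S'_j\|_{\gamma}^{\gamma} \le \frac{\Phi(U_j-U_i)}{\Phi(M^{\kappa_p+1})} \le \Bigl(\frac{U_j-U_i}{M^{\kappa_p+1}}\Bigr)^{1+\eta} = (t_j-t_i)^{1+\eta}, \]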

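with $t_j = U_j/M^{\kappa_p+1}$, $j\in\mathcal{L}_p$. Applying Lemma 8.3.4 to the family $\mathcal{S}'_p$ allows us to get the following bound for the oscillation of the $S_j$'s over $\mathcal{L}_p$:
\[ \Bigl\|\sup_{i,j\in\mathcal{L}_p} |S_i-S_j|\Bigr\|_{\gamma} \le \begin{cases} K_{\eta,\gamma}\,\Phi(M^{\kappa_p+1})^{1/\gamma} & \text{if } \eta>0,\\ K_{\eta,\gamma}\,\Phi(M^{\kappa_p+1})^{1/\gamma}\log\#(\mathcal{L}_p) & \text{if } \eta=0, \end{cases} \tag{8.4.7} \]
where $K_{\eta,\gamma}$ depends on $\eta$, $\gamma$ only.

• If $\eta>0$, then $P\bigl\{\sup_{i,j\in\mathcal{L}_p}|S'_i-S'_j| > \varepsilon p^{a/\gamma}\bigr\} \le (K_{\eta,\gamma}/\varepsilon)^{\gamma} p^{-a}$, which implies by the Borel–Cantelli lemma that
\[ P\Bigl\{\limsup_{p\to\infty}\ \sup_{i,j\in\mathcal{L}_p} \frac{|S_j-S_i|}{\Phi(M^{\kappa_p+1})^{1/\gamma} p^{a/\gamma}} \le \varepsilon\Bigr\} = 1. \tag{8.4.8} \]

• If $\eta=0$, then $P\bigl\{\sup_{i,j\in\mathcal{L}_p}|S'_i-S'_j| > \varepsilon p^{a/\gamma}\log\#(\mathcal{L}_p)\bigr\} \le (K_{\eta,\gamma}/\varepsilon)^{\gamma} p^{-a}$, and again by the Borel–Cantelli lemma,
\[ P\Bigl\{\limsup_{p\to\infty}\ \sup_{i,j\in\mathcal{L}_p} \frac{|S_j-S_i|}{\Phi(M^{\kappa_p+1})^{1/\gamma} p^{a/\gamma}\log\#(\mathcal{L}_p)} \le \varepsilon\Bigr\} = 1. \tag{8.4.9} \]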

Combining now (8.4.6) with (8.4.8) and letting $\varepsilon$ tend to 0 establishes the result for the case $\eta>0$. Combining finally (8.4.6) with (8.4.9), observing for $L\in\mathcal{L}_p$ that $\#(\mathcal{L}_p)\le M^{\kappa_p+1}\le M U_L$ and $p\le\kappa_p$, and then letting $\varepsilon$ tend to 0, establishes the result for the case $\eta=0$.

Theorems 1, 2 and 4 in Gál–Koksma [1950] contain rather theoretical conditions for almost sure convergence, which in practice amount to redoing the proof for each example considered (hence Theorems 3, 5 and 6).

Consider now the following assumption: for some $\gamma>1$, $\sigma>1$,
\[ E\Bigl|\sum_{i\le l\le j}\xi_l\Bigr|^{\gamma} \le C j^{\gamma-\sigma}(j-i)^{\sigma}\eta(j-i) \qquad (\forall\, 1\le i\le j<\infty), \tag{8.4.10} \]

where $\eta(n)>0$ is nonincreasing and the series $\sum_{n\ge 1}\eta(n)/n$ converges. By Theorem 7 in [Gál–Koksma: 1950],
\[ P\Bigl\{\lim_{L\to\infty}\frac{S_L}{L} = 0\Bigr\} = 1. \]
The proof is given under the additional assumption that $\eta(n)(\log n)^2$ is nondecreasing, and several nice applications to uniform distribution can be found in Koksma–Salem [1950]. In these applications, $\eta(N)=N^{-b}$ for some positive real $b$. It is shown for instance in Koksma–Salem [1950: Section 3], by means of a lemma of Van der Corput, that j 2(1−γ ) 2(1−γ ) 1 e2iπ kf (l) ≤ Ck P −2 j P (j − i)1− P (8.4.11) l=i

with $0<\gamma<1$, provided that $f$ be $p$-times differentiable with $P=2^{p}$, $p\ge 2$. Then the authors study uniform distribution for a class of smooth differentiable functions, using (8.4.11) to satisfy assumption (8.4.10). However, here again, it is possible to apply a metric entropy argument. Consider the following assumption:
\[ E\Bigl|\sum_{i\le l\le j}\xi_l\Bigr|^{\gamma} \le \Psi\Bigl(\sum_{l\le j}u_l\Bigr)\,\Phi\Bigl(\sum_{i\le l\le j}u_l\Bigr) \qquad (\forall\, 1\le i\le j<\infty), \tag{8.4.12} \]

where $\Psi,\Phi:\mathbb{R}_+\to\mathbb{R}_+$ are nondecreasing, $\Phi(x)/x^{1+\rho}$ is nondecreasing for some $\rho\ge 0$ and $\{u_i, i\ge 1\}$ is a sequence of nonnegative reals such that the series $\sum_{l=1}^{\infty}u_l$ diverges. Assumption (8.4.10) corresponds to
\[ \Psi(x)=x^{\gamma-\sigma}, \qquad \Phi(x)=x^{\sigma}\eta(x), \qquad u_l\equiv 1. \]
Let $\sigma>\sigma'>1$. By writing
\[ \frac{\Phi(x)}{x^{\sigma'}} = \frac{x^{\sigma-\sigma'}}{\log^{2}x}\,\eta(x)\log^{2}x, \]
we deduce from the additional assumption, mentioned above, that $\Phi(x)/x^{\sigma'}$ is nondecreasing.

8.4.2 Theorem. Assume that condition (8.4.12) is satisfied, and for some $M>1$, that the series $s^{\gamma}=\sum_{l\ge 1}\Phi(M^{l})\Psi(M^{l})/M^{\gamma l}$ converges. Put
\[ S^{\gamma} = \sum_{k\ge 1}\ \sup_{j\,:\,U_j\in[M^{k},M^{k+1}[} \Bigl(\frac{|S_j|}{M^{k}}\Bigr)^{\gamma}, \]
with the convention that $\sup\emptyset=0$. Then,
\[ \|S\|_{\gamma} \le K_{\gamma}\,s, \]
and in particular
\[ P\Bigl\{\lim_{L\to\infty}\frac{S_L}{U_L}=0\Bigr\}=1, \tag{8.4.13} \]
where $K_{\gamma}$ is a constant depending on $\gamma$ only.

Theorem 7 of Gál–Koksma [1950] follows from this result by putting $u_l\equiv 1$, since the convergence of the series $\sum_{n\ge 1}\eta(n)/n$ implies that of the series $\sum_{l\ge 1}\eta(M^{l})$, and hence the finiteness of $s$.

Proof. Again we use the notation from the proof of Theorem 8.2.2: $\kappa=\{\kappa_p, p\ge 1\}$, $\mathcal{L}_p$, $L^{*}_p$, and $a>1$. On the one hand
\[ \sum_{p}\frac{\|S_{L^{*}_p}\|_{\gamma}^{\gamma}}{M^{\gamma(\kappa_p+1)}} \le \sum_{p}\frac{\Phi(M^{\kappa_p+1})\Psi(M^{\kappa_p+1})}{M^{\gamma(\kappa_p+1)}} \le K_{\gamma}\,s^{\gamma}. \tag{8.4.14} \]
Now, since for $i,j\in\mathcal{L}_p$, $i\le j$,
\[ E\,\frac{|S_j-S_i|^{\gamma}}{\Psi(M^{\kappa_p+1})} \le \Phi(U_j-U_i), \]
we deduce from estimates (8.4.7)
\[ \Bigl\|\sup_{i,j\in\mathcal{L}_p}|S_i-S_j|\Bigr\|_{\gamma} \le K_{\gamma}\,\Psi(M^{\kappa_p+1})^{1/\gamma}\,\Phi(M^{\kappa_p+1})^{1/\gamma}. \tag{8.4.15} \]
Then
\[ \sum_{p}\frac{\bigl\|\sup_{i,j\in\mathcal{L}_p}|S_i-S_j|\bigr\|_{\gamma}^{\gamma}}{M^{\gamma(\kappa_p+1)}} \le K_{\gamma}\sum_{p}\frac{\Psi(M^{\kappa_p+1})\Phi(M^{\kappa_p+1})}{M^{\gamma(\kappa_p+1)}} \le K_{\gamma}\,s^{\gamma}. \tag{8.4.16} \]
By the triangle inequality, (8.4.14) and (8.4.16) imply $\|S\|_{\gamma}\le K_{\gamma}s$, and finally that $\sup_{j\in\mathcal{L}_p}|S_j|/M^{\kappa_p}$ tends to 0 almost surely, as $p$ tends to infinity. Hence (8.4.13).

We conclude with an example of application to diophantine approximation, inspired by a very deep result of Gál [1949]. For $u\ge 0$, let $\{u\}=u-[u]-\frac{1}{2}$, where $[u]$ denotes the greatest integer not exceeding $u$. Let us consider, for a given increasing sequence of positive integers $\mathcal{N}=\{n_i, i\ge 1\}$, the following sums:
\[ \kappa_N(x) = \sum_{i=1}^{N}\{n_i x\} \qquad (N\ge 1). \tag{8.4.17} \]

In the case when $\mathcal{N}=\mathbb{N}$, Khintchin proved that $\kappa_N(x)=o(\log^{1+\varepsilon}N)$ for almost all $x$, where $\varepsilon>0$ is an arbitrarily small positive number. In the general case, Erdős showed that $\kappa_N(x)=o(N^{1/2}\log^{r}N)$ for almost all $x$, where $r$ is some positive constant. Later Gál improved this by showing that for every $\varepsilon>0$,
\[ \kappa_N(x) = o(N^{1/2}\log^{2+\varepsilon}N), \tag{8.4.18} \]
for almost every $x$, and stated that a minor modification in the proof yields the following better bound: for every $\varepsilon>0$,
\[ \kappa_N(x) = o(N^{1/2}\log^{3/2+\varepsilon}N), \tag{8.4.18a} \]
almost surely. In Gál [1949] (to which we refer for the above mentioned results, but see also Baker [1981]), the proof of (8.4.18) is relatively long and appeals to the "Hobson–Plancherel" method. A short proof using Theorem 8.4.1 is however available.

Sketch of proof. Let $(a,b)$ and $[a,b]$ respectively denote the greatest common divisor and the least common multiple of the positive integers $a$ and $b$, and put
\[ \langle a,b\rangle = \frac{(a,b)}{[a,b]}. \]

We introduce the following function
\[ f(N) = \sup \sum_{i,j\le N}\langle n_i,n_j\rangle, \]
where the sup is taken over all $N$-tuples $(n_i)$ of positive integers. By $N$-tuple it is meant a collection of $N$ positive integers all different. We shall make use of the following strong result in Gál [1949: Theorem 2]: there exist two constants $c$ and $C$, such that for all $N$ large enough
\[ cN(\log\log N)^{2} \le f(N) \le CN(\log\log N)^{2}. \tag{8.4.19} \]
As is well known
\[ \int_0^1 \{ax\}\{bx\}\,dx = \frac{1}{12}\langle a,b\rangle, \tag{8.4.20} \]
and so (8.4.19) implies
\[ \int_0^1 \Bigl(\sum_{i\le l\le j}\{n_l x\}\Bigr)^{2}dx \le C(j-i)\bigl(\log\log(j-i)\bigr)^{2}. \tag{8.4.21} \]
Thus, the assumptions of Theorem 8.4.1 are satisfied with $\Phi(u)=u(\log\log u)^{2}$. We deduce for all $\varepsilon>0$
\[ \kappa_N(x) = o(N^{1/2}\log^{3/2+\varepsilon}N), \quad\text{for almost every } x. \]
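The identity (8.4.20) can be checked by exact rational arithmetic. The following is a minimal sketch, not taken from the text: the function name `sawtooth_product_integral` is hypothetical. It uses the fact that on each interval $[k/L,(k+1)/L[$ with $L=[a,b]$ both sawtooth factors are affine, so that Simpson's rule integrates their product exactly.

```python
from fractions import Fraction
from math import gcd


def sawtooth_product_integral(a: int, b: int) -> Fraction:
    """Exact value of the integral over [0, 1] of {a x}{b x},
    where {u} = u - [u] - 1/2 as in the text."""
    g = gcd(a, b)
    lcm = a * b // g
    half = Fraction(1, 2)
    total = Fraction(0)
    for k in range(lcm):
        left, right = Fraction(k, lcm), Fraction(k + 1, lcm)
        # [a x] and [b x] are constant on the open interval (left, right).
        floor_a = (a * k) // lcm
        floor_b = (b * k) // lcm

        def product(x: Fraction) -> Fraction:
            return (a * x - floor_a - half) * (b * x - floor_b - half)

        mid = (left + right) / 2
        # Simpson's rule is exact for the quadratic integrand on this piece.
        total += (right - left) / 6 * (product(left) + 4 * product(mid) + product(right))
    return total


if __name__ == "__main__":
    for a, b in [(1, 1), (1, 2), (4, 6), (5, 7)]:
        expected = Fraction(gcd(a, b), 12 * (a * b // gcd(a, b)))
        print(a, b, sawtooth_product_integral(a, b), expected)
```

For each pair the two printed fractions should coincide, in agreement with $\frac{1}{12}(a,b)/[a,b]$.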

8.5 An application to the supremum of random polynomials

Let $\{p_k, k\ge 1\}$, $\{\theta_k, k\ge 1\}$ be two sequences of reals. Put
\[ \tilde p_N = \max\{[2+|p_k|],\ 1\le k\le N\}, \qquad N=1,2,\dots, \]
where $[x]$ stands for the integer part of $x$. Let also $X=\{X_1,X_2,\dots\}$ and $Y=\{Y_1,Y_2,\dots\}$ be two sequences of real random variables defined on a common probability space $(\Omega,\mathcal{A},P)$. We will be mainly interested in the cases when $X$ and $Y$ are either sequences of centered, independent random variables, or stationary sequences. Consider for $N=1,2,\dots$ the sequence of random trigonometric sums
\[ Z_N(\omega,t) = \sum_{k=1}^{N}\theta_k\bigl\{X_k(\omega)\cos 2\pi p_k t + Y_k(\omega)\sin 2\pi p_k t\bigr\}. \tag{8.5.1} \]
In this section, we show that the metric entropy method can be efficiently applied for estimating the total extrema
\[ Q_N := \sup_{0\le t\le 1}|Z_N(t)|. \tag{8.5.2} \]

We will see that this reduces to applying the metric entropy method in the simplest possible case: the real line provided with the usual distance. This is also why we believe that it is likely the most elementary possible approach. As a particular case of a more general estimate we shall recover the well-known estimate found in Salem–Zygmund's proof or in Kahane [1954: Theorem 7]. It is of interest to mention that Bernstein's inequality for polynomials is not used in this approach, unlike in Salem–Zygmund or Kahane [1968].

Let us first observe in the case when $X$ and $Y$ are independent random variables with $E X_k=E Y_k=0$ and $E X_k^{2}=E Y_k^{2}=1$, that
\[ \begin{aligned} E\bigl|Z_N(s)-Z_N(t)\bigr|^{2} &= E\Bigl|\sum_{k=1}^{N}\theta_k\bigl\{X_k[\cos 2\pi p_k t-\cos 2\pi p_k s]+Y_k[\sin 2\pi p_k t-\sin 2\pi p_k s]\bigr\}\Bigr|^{2}\\ &= \sum_{k=1}^{N}\theta_k^{2}\bigl\{[\cos 2\pi p_k t-\cos 2\pi p_k s]^{2}+[\sin 2\pi p_k t-\sin 2\pi p_k s]^{2}\bigr\}\\ &= 2\sum_{k=1}^{N}\theta_k^{2}\bigl[1-\cos 2\pi p_k(t-s)\bigr] = 4\sum_{k=1}^{N}\theta_k^{2}\sin^{2}\pi p_k(t-s). \end{aligned} \]
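The chain of equalities above is purely trigonometric once the expectation has been taken, so it can be checked numerically. The sketch below is only an illustration; the function name `increment_variance_three_ways` and the sample values are hypothetical.

```python
import math
import random


def increment_variance_three_ways(thetas, ps, s, t):
    """Return the three expressions for E|Z_N(s) - Z_N(t)|^2 appearing
    in the computation: squared differences, 1 - cos form, sin^2 form."""
    two_pi = 2.0 * math.pi
    via_differences = sum(
        th ** 2 * ((math.cos(two_pi * p * t) - math.cos(two_pi * p * s)) ** 2
                   + (math.sin(two_pi * p * t) - math.sin(two_pi * p * s)) ** 2)
        for th, p in zip(thetas, ps))
    via_cosine = 2.0 * sum(th ** 2 * (1.0 - math.cos(two_pi * p * (t - s)))
                           for th, p in zip(thetas, ps))
    via_sine = 4.0 * sum(th ** 2 * math.sin(math.pi * p * (t - s)) ** 2
                         for th, p in zip(thetas, ps))
    return via_differences, via_cosine, via_sine


if __name__ == "__main__":
    rng = random.Random(1)  # arbitrary seed, for reproducibility only
    thetas = [rng.uniform(-1.0, 1.0) for _ in range(5)]
    ps = [rng.uniform(0.0, 10.0) for _ in range(5)]
    print(increment_variance_three_ways(thetas, ps, 0.3, 0.7))
```

The three returned numbers agree up to floating-point rounding.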

Therefore, if we put for $s,t\in[0,1]$,
\[ d_N(s,t) = 2\Bigl(\sum_{k=1}^{N}\theta_k^{2}\sin^{2}\pi p_k(s-t)\Bigr)^{1/2}, \tag{8.5.3} \]
we define in this way a pseudo-metric on $[0,1]$, since $d_N(s,t)=\|Z_N(s)-Z_N(t)\|_2$. This pseudo-metric will play a central role in what follows.

We introduce now an assumption concerning the increments of the process $Z_N(\cdot)$. Consider the Young function $G(t)=\exp(t^{2})-1$, $t$ real, together with the associated Orlicz space $L^{G}(P)$, that is, the set of $\mathcal{A}$-measurable functions $f:\Omega\to\mathbb{R}$ such that $E\,G(af)<\infty$ for some real $0<a<\infty$. We recall that $L^{G}(P)$ is provided with the norm
\[ \forall f\in L^{G}(P), \qquad \|f\|_G = \inf\bigl\{c>0 : E\,G(f/c)\le 1\bigr\}, \]
and that $(L^{G}(P),\|\cdot\|_G)$ is a Banach space. We will assume that for some constant $B$,
\[ \forall N\ge 1,\ \forall\, 0\le s,t\le 1, \qquad \begin{cases} \|Z_N(s)-Z_N(t)\|_G \le B\,d_N(s,t),\\ \|Z_N(s)\|_G \le B\bigl(\sum_{k=1}^{N}\theta_k^{2}\bigr)^{1/2}. \end{cases} \tag{8.5.4} \]
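As an illustration of the norm $\|\cdot\|_G$, it can be computed explicitly for a single standard Gaussian variable. This worked example is an addition for orientation and is not part of the original argument; it uses only the classical formula $E\,e^{a g^{2}}=(1-2a)^{-1/2}$ for $a<1/2$.

```latex
% g is N(0,1) distributed, G(t) = exp(t^2) - 1, and c^2 > 2.
\[
  E\,G(g/c) \;=\; E\,e^{g^{2}/c^{2}} - 1 \;=\; \Bigl(1-\frac{2}{c^{2}}\Bigr)^{-1/2} - 1 .
\]
% The condition E G(g/c) <= 1 is therefore equivalent to
\[
  \Bigl(1-\frac{2}{c^{2}}\Bigr)^{-1/2} \le 2
  \iff 1-\frac{2}{c^{2}} \ge \frac14
  \iff c^{2} \ge \frac83 ,
\]
% so that
\[
  \|g\|_{G} \;=\; \sqrt{8/3} \;\approx\; 1.633 .
\]
```

In particular, for a centered Gaussian variable with variance $\sigma^{2}$ one gets $\|\sigma g\|_G=\sigma\sqrt{8/3}$, which is the kind of comparison between $\|\cdot\|_G$ and $\|\cdot\|_2$ that assumption (8.5.4) encodes.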

These assumptions are satisfied when $X$ and $Y$ are independent Rademacher or Gaussian random variables, but also in other interesting cases (see Examples 1–3 below). We will prove the following result.

8.5.1 Theorem. Under assumption (8.5.4), there exists a constant $C$ (which is a function of the constant $B$ from (8.5.4) only) such that for any integer $N\ge 1$,
\[ \|Q_N\|_G \le C\,(\log\tilde p_N)^{1/2}\Bigl(\sum_{k=1}^{N}\theta_k^{2}\Bigr)^{1/2}. \]

This estimate is optimal. Indeed, assume that $X_n=\xi_{2n}$, $Y_n=\xi_{2n+1}$ where $(\xi_n)_{n\ge 0}$ is a sequence of independent Rademacher random variables. Assume also that $\theta_k=1$ and $p_k=k$ ($k\ge 1$). Then, referring for instance to Proposition 2, p. 129 in Kashin–Saakyan [1989], we have
\[ \forall N\ge 1, \qquad E\,Q_N \ge C\,(N\log N)^{1/2}, \tag{8.5.5} \]

where $C$ is a universal constant.

We shall now first give three nice classes of examples.

Example 1. Assume that $X$ and $Y$ are two stationary centered Gaussian sequences, with finite decoupling coefficient, that is:
\[ p(X) = \sum_{k=1}^{\infty}\frac{|E\,X_1X_k|}{E\,(X_1)^{2}} < \infty, \qquad p(Y) = \sum_{k=1}^{\infty}\frac{|E\,Y_1Y_k|}{E\,(Y_1)^{2}} < \infty. \]
Then, assumption (8.5.4) is satisfied. More precisely, for any $0\le s,t\le 1$,
\[ \begin{cases} \|Z_N(s)-Z_N(t)\|_G \le 9\sqrt{2}\,\max\bigl(p(X),p(Y)\bigr)^{1/2} d_N(s,t),\\ \|Z_N(s)\|_G \le 9\sqrt{2}\,\max\bigl(p(X),p(Y)\bigr)^{1/2}\bigl(\sum_{k=1}^{N}\theta_k^{2}\bigr)^{1/2}. \end{cases} \tag{8.5.4a} \]
So Theorem 8.5.1 does apply in that case. Note that the decoupling assumption is trivially satisfied when both $X$ and $Y$ consist of independent $N(0,1)$ distributed random variables. Observe also that no assumption on the correlation between $X$ and $Y$ is required, and consequently $Z_N$ is not necessarily Gaussian. Finally, recall that the Ornstein–Uhlenbeck process $U_k=W(e^{k})e^{-k/2}$ ($W$ being Brownian motion), $k=1,2,\dots$, is the typical example of a stationary Gaussian sequence with finite decoupling coefficient. For proving the claimed inequalities, we will use the decoupling inequality

stated in Lemma 10.1.9. Let $\lambda$ be some fixed real. By means of the Cauchy–Schwarz inequality:
\[ \begin{aligned} E\,e^{\lambda(Z_N(s)-Z_N(t))} &= E\,e^{\lambda\sum_{k=1}^{N}\theta_k\{X_k(\cos 2\pi p_k s-\cos 2\pi p_k t)+Y_k(\sin 2\pi p_k s-\sin 2\pi p_k t)\}}\\ &\le \Bigl(E\,e^{2\lambda\sum_{k=1}^{N}\theta_k X_k(\cos 2\pi p_k s-\cos 2\pi p_k t)}\; E\,e^{2\lambda\sum_{k=1}^{N}\theta_k Y_k(\sin 2\pi p_k s-\sin 2\pi p_k t)}\Bigr)^{1/2}. \end{aligned} \tag{8.5.6} \]
Put
\[ f_k^{X}(x) = e^{2\lambda\theta_k x(\cos 2\pi p_k s-\cos 2\pi p_k t)}, \qquad f_k^{Y}(x) = e^{2\lambda\theta_k x(\sin 2\pi p_k s-\sin 2\pi p_k t)}, \]
$k=1,\dots,N$, and apply Lemma 10.1.9. We obtain, since $E\,e^{\lambda N(0,1)}=e^{\lambda^{2}/2}$,
\[ \begin{aligned} E\,e^{2\lambda\sum_{k=1}^{N}\theta_k X_k(\cos 2\pi p_k s-\cos 2\pi p_k t)} &\le e^{2\lambda^{2}p(X)\sum_{k=1}^{N}\theta_k^{2}(\cos 2\pi p_k s-\cos 2\pi p_k t)^{2}},\\ E\,e^{2\lambda\sum_{k=1}^{N}\theta_k Y_k(\sin 2\pi p_k s-\sin 2\pi p_k t)} &\le e^{2\lambda^{2}p(Y)\sum_{k=1}^{N}\theta_k^{2}(\sin 2\pi p_k s-\sin 2\pi p_k t)^{2}}. \end{aligned} \]
Hence
\[ \begin{aligned} E\,e^{\lambda(Z_N(s)-Z_N(t))} &\le e^{2\lambda^{2}\max(p(X),p(Y))\sum_{k=1}^{N}\theta_k^{2}\{(\cos 2\pi p_k s-\cos 2\pi p_k t)^{2}+(\sin 2\pi p_k s-\sin 2\pi p_k t)^{2}\}}\\ &= e^{2\lambda^{2}\max(p(X),p(Y))\,d_N^{2}(s,t)}. \end{aligned} \]
Now we shall use the fact that if $U$ is a real random variable such that $E\,e^{\lambda U}\le e^{\lambda^{2}C^{2}}$ ($\forall\lambda\in\mathbb{R}$), then $\|U\|_G\le 9C$. Here we have $C=2^{1/2}\max(p(X),p(Y))^{1/2}d_N(s,t)$. Thus, it follows from the previous estimates that
\[ \|Z_N(s)-Z_N(t)\|_G \le 9\sqrt{2}\,\max\bigl(p(X),p(Y)\bigr)^{1/2} d_N(s,t). \]
Hence the first inequality in (8.5.4a). The second one is deduced by a similar reasoning.

Example 2. Assume that both $X$ and $Y$ are sequences of independent, centered real random variables with unit variance, and that there exists a real constant $M$ such that
\[ \forall k\ge 1, \qquad |X_k|\le M, \qquad |Y_k|\le M. \]
Then, assumption (8.5.4) is satisfied. More precisely, for any $0\le s,t\le 1$,
\[ \begin{cases} \|Z_N(s)-Z_N(t)\|_G \le 9M\,d_N(s,t),\\ \|Z_N(s)\|_G \le 9M\bigl(\sum_{k=1}^{N}\theta_k^{2}\bigr)^{1/2}. \end{cases} \tag{8.5.4b} \]
This is a direct consequence of the following result (Theorem 3.5.1 in [Garsia: 1970]). Let $\{\xi_n, n\ge 1\}$ be independent, uniformly bounded ($|\xi_n|\le M$ a.s. for every $n$), centered random variables with unit variance. Let $\{a_n, n\ge 1\}\in\ell^{2}$ and let $f=\sum_{n=1}^{\infty}a_n\xi_n$. Then
\[ E\exp\Bigl(\frac{|f|^{2}}{16M^{2}\|f\|_2^{2}}\Bigr) \le \sqrt{2}. \tag{8.5.7} \]

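This can also be proved by means of Lemma 4.1 in [Kuipers–Niederreiter: 1971]. According to this lemma, for any bounded random variable $X$ and all real numbers $\alpha$,
\[ E\,e^{\alpha X} \le e^{\alpha E X+\alpha^{2}\|X\|_{\infty}^{2}/2}. \tag{8.5.8} \]
We begin again with (8.5.6) and obtain
\[ \begin{aligned} E\,e^{\lambda(Z_N(s)-Z_N(t))} &= E\,e^{\lambda\sum_{k=1}^{N}\theta_k\{X_k(\cos 2\pi p_k s-\cos 2\pi p_k t)+Y_k(\sin 2\pi p_k s-\sin 2\pi p_k t)\}}\\ &\le \Bigl(E\,e^{2\lambda\sum_{k=1}^{N}\theta_k X_k(\cos 2\pi p_k s-\cos 2\pi p_k t)}\; E\,e^{2\lambda\sum_{k=1}^{N}\theta_k Y_k(\sin 2\pi p_k s-\sin 2\pi p_k t)}\Bigr)^{1/2}. \end{aligned} \]
In view of the quoted lemma,
\[ E\,e^{2\lambda\theta_k X_k(\cos 2\pi p_k s-\cos 2\pi p_k t)} \le e^{4\lambda^{2}\theta_k^{2}(\cos 2\pi p_k s-\cos 2\pi p_k t)^{2}M^{2}/2}. \]
Operating similarly for the "$Y_k$" component gives
\[ \begin{aligned} E\,e^{\lambda(Z_N(s)-Z_N(t))} &\le \Bigl(\prod_{k=1}^{N}e^{2\lambda^{2}\theta_k^{2}(\cos 2\pi p_k s-\cos 2\pi p_k t)^{2}M^{2}}\ \prod_{k=1}^{N}e^{2\lambda^{2}\theta_k^{2}(\sin 2\pi p_k s-\sin 2\pi p_k t)^{2}M^{2}}\Bigr)^{1/2}\\ &= \prod_{k=1}^{N}e^{\lambda^{2}M^{2}\theta_k^{2}\{(\cos 2\pi p_k s-\cos 2\pi p_k t)^{2}+(\sin 2\pi p_k s-\sin 2\pi p_k t)^{2}\}} = e^{\lambda^{2}M^{2}d_N^{2}(s,t)}. \end{aligned} \]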

Hence the first inequality in (8.5.4b), and the second is obtained by a similar reasoning. Theorem 8.5.1 thus applies in that case as well.

Example 3. Let $\mathcal{A}_0\subset\mathcal{A}_1\subset\dots\subset\mathcal{A}$ be an increasing filtration of $\mathcal{A}$ ($\mathcal{A}=\bigvee_{i=0}^{\infty}\mathcal{A}_i$), and assume that $X$ is a sequence of martingale differences adapted to that filtration, with
\[ \forall k\ge 1, \qquad \|X_k\|_{\infty}\le 1. \]
Assume that $Y\equiv 0$. Then assumption (8.5.4) is satisfied. Indeed $Z_N(t)=\sum_{k=1}^{N}d_k^{(t)}$ where $d_k^{(t)}=\theta_kX_k\cos 2\pi p_k t$. Thus $Z_N(t)$ is a sum of martingale differences satisfying $\|d_k^{(t)}\|_{\infty}\le|\theta_k|$. Then by Azuma's inequality, for all nonnegative reals $v$,
\[ P\Bigl\{\Bigl|\sum_{k=1}^{N}d_k^{(t)}\Bigr| > v\Bigr\} \le 2\exp\Bigl(-\frac{v^{2}}{2\sum_{k=1}^{N}\|d_k^{(t)}\|_{\infty}^{2}}\Bigr). \tag{8.5.9} \]
Thereby $\|Z_N(s)\|_G\le C\bigl(\sum_{k=1}^{N}\theta_k^{2}\bigr)^{1/2}$ for some universal constant $C$. Similarly, we have
\[ \|Z_N(s)-Z_N(t)\|_G \le C\Bigl(\sum_{k=1}^{N}\theta_k^{2}(\cos 2\pi p_k s-\cos 2\pi p_k t)^{2}\Bigr)^{1/2} \le C\,d_N(s,t). \tag{8.5.4c} \]
Consequently, Theorem 8.5.1 applies in that case as well.

Proof of Theorem 8.5.1. The key point of the proof is contained in the following elementary observation: the pseudo-metric $d_N(\cdot,\cdot)$ is locally comparable to the usual distance. Indeed, since $|\sin x|\le(|x|\wedge 1)$, we have
\[ d_N^{2}(s,t) \le 4\sum_{k=1}^{N}\theta_k^{2}\bigl[(\pi p_k|s-t|)\wedge 1\bigr]^{2} \le 4\pi^{2}|s-t|^{2}\sum_{k=1}^{N}\theta_k^{2}\Bigl[p_k^{2}\wedge\frac{1}{\pi^{2}|s-t|^{2}}\Bigr]. \tag{8.5.10} \]
We thus deduce that if $\pi|s-t|\le 1/\tilde p_N$, then $p_k^{2}\wedge\frac{1}{\pi^{2}|s-t|^{2}}=p_k^{2}$, $k=1,\dots,N$, and consequently
\[ d_N(s,t) \le 2\pi|s-t|\Bigl(\sum_{k=1}^{N}\theta_k^{2}p_k^{2}\Bigr)^{1/2}. \]
Divide the interval $[0,1[$ into sub-intervals:
\[ I_{N,j} = \Bigl[\frac{j-1}{4\tilde p_N},\frac{j}{4\tilde p_N}\Bigr[, \qquad j=1,\dots,4\tilde p_N. \tag{8.5.11} \]
Since $s,t\in I_{N,j}$ implies $|s-t|\le\frac{1}{4\tilde p_N}\le\frac{1}{\pi\tilde p_N}$, it follows from the previous estimate that
\[ d_N(s,t) \le 2\pi|s-t|\Bigl(\sum_{k=1}^{N}\theta_k^{2}p_k^{2}\Bigr)^{1/2}, \qquad j=1,\dots,4\tilde p_N,\ s,t\in I_{N,j}. \tag{8.5.12} \]
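The comparison between $d_N$ and the usual distance can be illustrated numerically. The following sketch is only an illustration with hypothetical function names and sample data; it evaluates both sides of the Lipschitz-type bound $d_N(s,t)\le 2\pi|s-t|(\sum\theta_k^{2}p_k^{2})^{1/2}$, which in fact holds for all $s,t$ because $|\sin x|\le|x|$.

```python
import math


def d_N(thetas, ps, s, t):
    """Pseudo-metric (8.5.3): 2 * (sum theta_k^2 sin^2(pi p_k (s - t)))^(1/2)."""
    return 2.0 * math.sqrt(sum(th ** 2 * math.sin(math.pi * p * (s - t)) ** 2
                               for th, p in zip(thetas, ps)))


def lipschitz_bound(thetas, ps, s, t):
    """Right-hand side of the bound: 2 pi |s - t| (sum theta_k^2 p_k^2)^(1/2)."""
    return 2.0 * math.pi * abs(s - t) * math.sqrt(sum((th * p) ** 2
                                                      for th, p in zip(thetas, ps)))


if __name__ == "__main__":
    thetas = [1.0, 0.5, 2.0]   # hypothetical sample coefficients
    ps = [1.0, 3.0, 7.5]       # hypothetical sample frequencies
    for s, t in [(0.10, 0.11), (0.25, 0.30), (0.0, 1.0)]:
        print(s, t, d_N(thetas, ps, s, t), lipschitz_bound(thetas, ps, s, t))
```

On short intervals the two quantities are of the same order, which is what makes the partition into the sub-intervals $I_{N,j}$ effective.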

Introduce now the auxiliary process
\[ Y_N(t) = \frac{Z_N(t)-Z_N\bigl(\frac{j-1}{4\tilde p_N}\bigr)}{2\pi\bigl(\sum_{k=1}^{N}\theta_k^{2}p_k^{2}\bigr)^{1/2}}, \qquad j=1,\dots,4\tilde p_N,\ t\in I_{N,j}. \tag{8.5.13} \]
Then we bound $Q_N$ relative to the partition of $[0,1[$ as follows:
\[ Q_N \le \sup_{1\le j\le 4\tilde p_N}\Bigl|Z_N\Bigl(\frac{j-1}{4\tilde p_N}\Bigr)\Bigr| + 2\pi\Bigl(\sum_{k=1}^{N}\theta_k^{2}p_k^{2}\Bigr)^{1/2}\sup_{1\le j\le 4\tilde p_N}\ \sup_{t\in I_{N,j}}|Y_N(t)|. \tag{8.5.14} \]
We are now in an easy setting, because we have to estimate the local extrema $\sup\{|Y_N(t)|, t\in I_{N,j}\}$ of a stochastic process with increments locally bounded by the usual distance. Indeed, from (8.5.4), (8.5.12): for any $s,t\in I_{N,j}$, $\|Y_N(s)-Y_N(t)\|_G\le B|s-t|$, $j=1,2,\dots,4\tilde p_N$. In order to estimate $Q_N$, we will need two simple tools. The first follows from inequality (8.1.12):
\[ \Bigl\|\sup_{1\le j\le n}|f_j|\Bigr\|_G \le \bigl([2/\log 2]\log n\bigr)^{1/2}\sup_{1\le j\le n}\|f_j\|_G, \qquad \forall n\ge 2,\ \forall f_1,\dots,f_n. \tag{8.5.15} \]

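From (8.5.4) and (8.5.15) it follows that
\[ \begin{aligned} \|Q_N\|_G &\le \bigl([2/\log 2]\log 4\tilde p_N\bigr)^{1/2}\Bigl\{\sup_{j=1,\dots,4\tilde p_N}\Bigl\|Z_N\Bigl(\frac{j-1}{4\tilde p_N}\Bigr)\Bigr\|_G + 2\pi\Bigl(\sum_{k=1}^{N}\theta_k^{2}p_k^{2}\Bigr)^{1/2}\sup_{j=1,\dots,4\tilde p_N}\Bigl\|\sup_{t\in I_{N,j}}|Y_N(t)|\Bigr\|_G\Bigr\}\\ &\le \bigl([2/\log 2]\log 4\tilde p_N\bigr)^{1/2}\Bigl\{B\Bigl(\sum_{k=1}^{N}\theta_k^{2}\Bigr)^{1/2} + 2\pi\Bigl(\sum_{k=1}^{N}\theta_k^{2}p_k^{2}\Bigr)^{1/2}\sup_{j=1,\dots,4\tilde p_N}\Bigl\|\sup_{t\in I_{N,j}}|Y_N(t)|\Bigr\|_G\Bigr\}. \end{aligned} \tag{8.5.16} \]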

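The second tool is Theorem 8.1.1. Now, we estimate $\bigl\|\sup_{t\in I_{N,j}}|Y_N(t)|\bigr\|_G$. By taking account of (8.5.12) and since $\operatorname{diam}(I_{N,j},|\cdot|)=1/4\tilde p_N$, we must first estimate $N(I_{N,j},|\cdot|,u)$ for $0<u\le 1/4\tilde p_N$; obviously
\[ N(I_{N,j},|\cdot|,u) \le 1+\Bigl[\frac{1/4\tilde p_N}{2u}\Bigr] \le 1+\frac{1/4\tilde p_N}{2u} \le \frac{1}{2u\tilde p_N}. \]
Thus
\[ I(I_{N,j},|\cdot|) \le \int_0^{1/4\tilde p_N}\sqrt{\log\frac{2}{4u\tilde p_N}}\,du \overset{(u=\frac{v}{4\tilde p_N})}{=} \frac{1}{4\tilde p_N}\int_0^{1}\sqrt{\log\frac{2}{v}}\,dv \le \frac{C}{\tilde p_N}. \]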
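The constant in the last bound comes from the finite integral $\int_0^1\sqrt{\log(2/v)}\,dv$. The sketch below (hypothetical function names, midpoint rule chosen only for simplicity) approximates this integral and the resulting bound on the entropy integral over one sub-interval $I_{N,j}$.

```python
import math


def entropy_integral_unit(n_steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of sqrt(log(2/v)) over (0, 1)."""
    h = 1.0 / n_steps
    return h * sum(math.sqrt(math.log(2.0 / ((i + 0.5) * h))) for i in range(n_steps))


def entropy_integral_bound(p_tilde: int, n_steps: int = 100_000) -> float:
    """Bound (1 / (4 p_tilde)) * integral_0^1 sqrt(log(2/v)) dv for one sub-interval."""
    return entropy_integral_unit(n_steps) / (4.0 * p_tilde)


if __name__ == "__main__":
    print("integral over (0,1):", entropy_integral_unit())
    for p_tilde in (3, 10, 100):
        print(p_tilde, entropy_integral_bound(p_tilde))
```

By Jensen's inequality the integral lies between $\sqrt{\log 2}$ and $\sqrt{1+\log 2}$, so the bound is indeed of order $1/\tilde p_N$.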

It follows from (8.5.4), Theorem 8.1.1 and from the fact that $Y_N\bigl(\frac{j-1}{4\tilde p_N}\bigr)=0$, that for any countable subset $E$ of $I_{N,j}$,
\[ \Bigl\|\sup_{t\in E}|Y_N(t)|\Bigr\|_G \le \Bigl\|\sup_{s,t\in E}|Y_N(s)-Y_N(t)|\Bigr\|_G \le \frac{C}{\tilde p_N}, \tag{8.5.17} \]
where $C$ depends on $B$ only. But the $\omega$-trajectories $t\mapsto Z_N(t,\omega)$ are continuous for each $\omega\in\Omega$, and so are those of the auxiliary process $Y_N$. By specifying estimate (8.5.17) for a countable dense subset of $I_{N,j}$, we have in fact shown
\[ \Bigl\|\sup_{t\in I_{N,j}}|Y_N(t)|\Bigr\|_G \le \frac{C}{\tilde p_N}. \]
By putting this estimate in (8.5.16), we thus obtain
\[ \|Q_N\|_G \le C(\log 4\tilde p_N)^{1/2}\Bigl\{\Bigl(\sum_{k=1}^{N}\theta_k^{2}\Bigr)^{1/2}+\frac{1}{\tilde p_N}\Bigl(\sum_{k=1}^{N}\theta_k^{2}p_k^{2}\Bigr)^{1/2}\Bigr\} \le C(\log\tilde p_N)^{1/2}\Bigl(\sum_{k=1}^{N}\theta_k^{2}\Bigr)^{1/2}. \tag{8.5.18} \]
We have therefore proved Theorem 8.5.1.
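The optimality example following Theorem 8.5.1 ($\theta_k=1$, $p_k=k$, Rademacher coefficients) can be explored by simulation. The sketch below is an illustration only: the function name `grid_sup`, the seed, the grid size and the sample size are arbitrary choices, and the supremum over $[0,1]$ is replaced by a maximum over a finite grid, which can only underestimate $Q_N$.

```python
import math
import random


def grid_sup(xs, ys, grid_size):
    """Approximate Q_N = sup_t |Z_N(t)| on a uniform grid, with theta_k = 1, p_k = k."""
    best = 0.0
    for i in range(grid_size):
        t = i / grid_size
        z = 0.0
        for k, (x, y) in enumerate(zip(xs, ys), start=1):
            angle = 2.0 * math.pi * k * t
            z += x * math.cos(angle) + y * math.sin(angle)
        best = max(best, abs(z))
    return best


rng = random.Random(0)  # fixed seed, for reproducibility only
n_terms = 64
xs = [rng.choice((-1, 1)) for _ in range(n_terms)]
ys = [rng.choice((-1, 1)) for _ in range(n_terms)]
q_grid = grid_sup(xs, ys, grid_size=1024)
ratio = q_grid / math.sqrt(n_terms * math.log(n_terms))
print(f"grid sup = {q_grid:.3f}, ratio to (N log N)^(1/2) = {ratio:.3f}")
```

Repeating the experiment over several seeds and values of $N$ gives an empirical feeling for the order $(N\log N)^{1/2}$ appearing in (8.5.5); a single run proves nothing about the constants.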

8.5.2 Remark. The same proof combined with a simple form of the Borell–Sudakov–Tsirelson inequality (operating the same way as in the proof of Corollary 8.5.5) also serves to establish a multidimensional version of Theorem 8.5.1. Let $m$ be some positive integer. Let $\{p_k, k\ge 1\}$ be a sequence of elements of $\mathbb{R}^{m}_+$, and write $\tilde p_N=\max\{[2+p_k^{i}],\ 1\le k\le N,\ 1\le i\le m\}$; here we have denoted $p_k=(p_k^{1},\dots,p_k^{m})$. For $t\in[0,1]^{m}$, define analogously to (8.5.1),
\[ Z_N^{m}(\omega,t) = \sum_{k=1}^{N}\theta_k\bigl\{X_k(\omega)\cos 2\pi\langle p_k,t\rangle + Y_k(\omega)\sin 2\pi\langle p_k,t\rangle\bigr\}, \qquad Q_N^{m} = \sup_{t\in[0,1]^{m}}|Z_N^{m}(t)|. \]
The pseudo-metric corresponding to (8.5.3) is defined for $s,t\in[0,1]^{m}$ by
\[ d_{N,m}(s,t) = 2\Bigl(\sum_{k=1}^{N}\theta_k^{2}\sin^{2}\pi\langle p_k,t-s\rangle\Bigr)^{1/2}. \tag{8.5.3$'$} \]
When for instance $X$ and $Y$ are independent random variables with $E X_k=E Y_k=0$ and $E X_k^{2}=E Y_k^{2}=1$, then $E\bigl|Z_N^{m}(s)-Z_N^{m}(t)\bigr|^{2}=d_{N,m}^{2}(s,t)$. Analogously, we will assume that for some constant $B$,
\[ \forall N\ge 1,\ \forall s,t\in[0,1]^{m}, \qquad \begin{cases} \|Z_N^{m}(s)-Z_N^{m}(t)\|_G \le B\,d_{N,m}(s,t),\\ \|Z_N^{m}(s)\|_G \le B\bigl(\sum_{k=1}^{N}\theta_k^{2}\bigr)^{1/2}. \end{cases} \tag{8.5.4$'$} \]
The following is left as an exercise: under assumption (8.5.4$'$), there exists a constant $C$ (which is a function of $m$ and the constant $B$ from (8.5.4$'$) only) such that for any integer $N\ge 1$,
\[ \|Q_N^{m}\|_G \le C\,(\log\tilde p_N)^{1/2}\Bigl(\sum_{k=1}^{N}\theta_k^{2}\Bigr)^{1/2}. \]

Some applications. We give four applications of Theorem 8.5.1, the first one establishing a precise uniform estimate of exponential sums of the form
\[ \sum_{k=1}^{N}U_k\theta_k e^{2i\pi p_k t}, \qquad N=1,2,\dots, \]
where $U=\{U_k, k\ge 1\}$ is a sequence of weakly dependent random variables; the second one provides a global uniform estimate of the sequence formed by the differences of these polynomials. In that case, we will assume that the sequence $U$ is Gaussian. The third application provides a similar global uniform estimate for sequences of independent symmetric random variables. A fourth application to a variant of the initial problem is given in Theorem 8.5.8. We first establish the following corollary.

8.5.3 Corollary. (a) Let $U=\{U_k, k\ge 1\}$ be a sequence of independent, centered real random variables. We assume that there exists a real $M<\infty$ such that $|U_k|\le M$ a.s. for any $k\ge 1$. Then
\[ \Bigl\|\sup_{0\le t\le 1}\Bigl|\sum_{k=1}^{N}U_k\theta_k e^{2i\pi p_k t}\Bigr|\Bigr\|_G \le CM\Bigl(\log\tilde p_N\sum_{k=1}^{N}\theta_k^{2}\Bigr)^{1/2}, \tag{8.5.19a} \]
where $C$ is a universal constant.

(b) Let $V=\{V_k, k\ge 1\}$ be a centered, stationary Gaussian sequence with finite decoupling coefficient $p(V)$ (see Example 1). Then
\[ \Bigl\|\sup_{0\le t\le 1}\Bigl|\sum_{k=1}^{N}V_k\theta_k e^{2i\pi p_k t}\Bigr|\Bigr\|_G \le C\sqrt{p(V)}\Bigl(\log\tilde p_N\sum_{k=1}^{N}\theta_k^{2}\Bigr)^{1/2}, \tag{8.5.19b} \]
where $C$ is a universal constant.

(c) Let $U=\{U_k, k\ge 1\}$ be a sequence of independent, centered real random variables. Then
\[ E\sup_{0\le t\le 1}\Bigl|\sum_{k=1}^{N}U_k e^{2i\pi p_k t}\Bigr| \le C\min\Bigl\{(\log\tilde p_N)^{1/2}\Bigl(\sum_{k=1}^{N}E\,U_k^{2}\Bigr)^{1/2},\ \sum_{k=1}^{N}E|U_k|\Bigr\}, \tag{8.5.19c} \]
where $C$ is a universal constant.

Proof. For establishing (8.5.19a), we apply Theorem 8.5.1 to $X=U$, $Y=0$, next to $X=0$, $Y=U$. This provides the desired estimate for both the imaginary and real part; the result follows by putting these estimates together. We operate similarly for establishing (8.5.19b), by applying Theorem 8.5.1 to $X=V$, $Y=0$, next to $X=0$, $Y=V$. Now, to prove part (c) of the statement, we use a well-known randomization trick, often called a symmetrization procedure. Let $U'=\{U'_k, k\ge 1\}$ be an independent copy of $U$. Let also $\varepsilon=\{\varepsilon_k, k\ge 1\}$ be a Rademacher sequence which is assumed to be independent from $U$ and $U'$, and denote by $E'$, $E_{\varepsilon}$ the corresponding expectation symbols. The sequence $\{U_k-U'_k, k\ge 1\}$ is a sequence of symmetric independent random variables and has thus the same law as $\{\varepsilon_k(U_k-U'_k), k\ge 1\}$. Then,
\[ \begin{aligned} E\sup_{0\le t\le 1}\Bigl|\sum_{k=1}^{N}U_k e^{2i\pi p_k t}\Bigr| &= E\sup_{0\le t\le 1}\Bigl|\sum_{k=1}^{N}(U_k-E'U'_k)e^{2i\pi p_k t}\Bigr|\\ &\le E\,E'\sup_{0\le t\le 1}\Bigl|\sum_{k=1}^{N}(U_k-U'_k)e^{2i\pi p_k t}\Bigr|\\ &= E\,E'E_{\varepsilon}\sup_{0\le t\le 1}\Bigl|\sum_{k=1}^{N}\varepsilon_k(U_k-U'_k)e^{2i\pi p_k t}\Bigr|\\ &\le 2E\,E_{\varepsilon}\sup_{0\le t\le 1}\Bigl|\sum_{k=1}^{N}\varepsilon_kU_k e^{2i\pi p_k t}\Bigr|\\ &\le C(\log\tilde p_N)^{1/2}E\Bigl(\sum_{k=1}^{N}U_k^{2}\Bigr)^{1/2} \qquad\text{(by (8.5.19a))}\\ &\le C(\log\tilde p_N)^{1/2}\Bigl(\sum_{k=1}^{N}E\,U_k^{2}\Bigr)^{1/2}. \end{aligned} \]
The second bound in (8.5.19c) is the trivial one. The proof is now complete.

8.5.4 Remark. One might think that the bound $(\log\tilde p_N)^{1/2}\bigl(\sum_{k=1}^{N}E\,U_k^{2}\bigr)^{1/2}$ in (8.5.19c) is always better than the trivial bound $\sum_{k=1}^{N}E|U_k|$. This is however not the case. Consider the following instructive example. We assume that each random variable $U_k$ takes only two values as follows:
\[ U_k = \begin{cases} 1/k & \text{with probability } 1-\varepsilon_k,\\ -(1-\varepsilon_k)/(k\varepsilon_k) & \text{with probability } \varepsilon_k, \end{cases} \]
where $0<\varepsilon_k<1$ and $\varepsilon_k$ decreases to 0. Then $E\,U_k=0$, $E\,U_k^{2}=(1-\varepsilon_k)/k^{2}+(1-\varepsilon_k)^{2}/(k^{2}\varepsilon_k)$. Assume that $\lim_{k\to\infty}k^{2}\varepsilon_k=1$. Then $E\,U_k^{2}\sim 1$ as $k$ tends to infinity. And so $(\log\tilde p_N)^{1/2}\bigl(\sum_{k=1}^{N}E\,U_k^{2}\bigr)^{1/2}\sim(N\log\tilde p_N)^{1/2}$, as $N$ tends to infinity. But $E|U_k|=2(1-\varepsilon_k)/k$, so that $\sum_{k=1}^{N}E|U_k|\sim C\log N$, which provides a much better bound.

Estimate (8.5.19b) can be considerably strengthened. This is the object of the next corollary. A uniform bound for the increments $\sum_{k=N+1}^{M}V_k\theta_k e^{2i\pi p_k t}$ can be obtained, but the normalizing factors $\bigl(\log\tilde p_M\sum_{k=N+1}^{M}\theta_k^{2}\bigr)^{1/2}$ should be slightly modified. It will be necessary to have for all positive integers $M$, $\log\tilde p_M\ge C\log M$, $C$ being some constant depending on the data. We will therefore work with $\bigl(\log\bar p_M\sum_{k=N+1}^{M}\theta_k^{2}\bigr)^{1/2}$ where
\[ \bar p_M = \max(\tilde p_M, M) = \max\bigl(\max\{[2+|p_k|],\ 1\le k\le M\},\ M\bigr). \tag{8.5.20} \]
If $\{p_k, k\ge 1\}$ is an increasing sequence of positive integers, or if for some $\delta>0$, $p_k\ge k^{\delta}$, then $\log\bar p_M$ and $\log\tilde p_M$ are of comparable order. But this is no longer the case when $\{p_k, k\ge 1\}$ grows slower than polynomially, as happens for the Dirichlet sums $\sum_{k=1}^{N}V_k\theta_k k^{-it}$.

8.5.5 Corollary. Let $V=\{V_k, k\ge 1\}$ be a centered stationary Gaussian sequence with finite decoupling coefficient $p(V)$. Then
\[ \Bigl\|\sup_{N<M}\ \sup_{0\le t\le 1}\frac{\bigl|\sum_{k=N+1}^{M}V_k\theta_k e^{2i\pi p_k t}\bigr|}{\bigl(\log\bar p_M\sum_{k=N+1}^{M}\theta_k^{2}\bigr)^{1/2}}\Bigr\|_G \le C_0\sqrt{p(V)}, \]
where $C_0$ is a universal constant.
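Before turning to the proof, the moment computations in the two-valued example of Remark 8.5.4 can be verified by exact arithmetic. The sketch below is an illustration only; the function name `two_valued_moments` and the choice $\varepsilon_k=1/k^{2}$ (valid for $k\ge 2$) are assumptions made for the example.

```python
from fractions import Fraction


def two_valued_moments(k: int, eps: Fraction):
    """Exact E U, E U^2 and E|U| for the variable of Remark 8.5.4:
    U = 1/k with probability 1 - eps, U = -(1 - eps)/(k eps) with probability eps."""
    high = Fraction(1, k)
    low = -(1 - eps) / (k * eps)
    mean = (1 - eps) * high + eps * low
    second = (1 - eps) * high ** 2 + eps * low ** 2
    absolute = (1 - eps) * abs(high) + eps * abs(low)
    return mean, second, absolute


if __name__ == "__main__":
    for k in (2, 3, 10, 100):
        eps = Fraction(1, k * k)  # then k^2 * eps_k = 1, as assumed in the remark
        mean, second, absolute = two_valued_moments(k, eps)
        print(k, mean, float(second), absolute, 2 * (1 - eps) / k)
```

The mean vanishes exactly, the second moment approaches 1 as $k$ grows, and $E|U_k|$ coincides with $2(1-\varepsilon_k)/k$.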

Proof. It is enough to establish a similar estimate for each of the imaginary and real parts. We put
\[ \begin{cases} L^{(\cos)}_{N,M} = \sup_{0\le t\le 1}\dfrac{\bigl|\sum_{k=N+1}^{M}V_k\theta_k\cos 2\pi p_k t\bigr|}{\bigl(\log\bar p_M\sum_{k=N+1}^{M}\theta_k^{2}\bigr)^{1/2}} & (N<M),\\[3mm] L^{(\sin)}_{N,M} = \sup_{0\le t\le 1}\dfrac{\bigl|\sum_{k=N+1}^{M}V_k\theta_k\sin 2\pi p_k t\bigr|}{\bigl(\log\bar p_M\sum_{k=N+1}^{M}\theta_k^{2}\bigr)^{1/2}} & (N<M),\\[3mm] L^{(\cos)} = \sup_{N<M}L^{(\cos)}_{N,M}, \qquad L^{(\sin)} = \sup_{N<M}L^{(\sin)}_{N,M}. \end{cases} \tag{8.5.21} \]
It thus suffices to show that
\[ E\,L^{(\cos)} \le C\sqrt{p(V)}, \qquad E\,L^{(\sin)} \le C\sqrt{p(V)}. \tag{8.5.22} \]
Then by estimate (10.2.2) for Gaussian semi-norms,
\[ \|L^{(\cos)}\|_G \le C\sqrt{p(V)}, \qquad \|L^{(\sin)}\|_G \le C\sqrt{p(V)}. \]
Hence, the desired result follows by combining these estimates. We prove now (8.5.22). For convenience we recall (10.4.4): if $G_1,\dots,G_N$ are Gaussian random vectors with values in a separable Banach space $(B,\|\cdot\|)$, then
\[ E\sup_{1\le k\le N}\|G_k\| \le C\Bigl\{\sup_{1\le k\le N}E\|G_k\| + E\sup_{1\le k\le N}\sigma_k|g_k|\Bigr\}, \]
where $\sigma_k=\sup_{f\in B^{*},\|f\|\le 1}\bigl(E\langle f,G_k\rangle^{2}\bigr)^{1/2}$, $k=1,\dots,N$, $\{g_k, 1\le k\le N\}$ is a sequence of independent $N(0,1)$ distributed random variables, and $C$ is a universal constant. From this we deduce
\[ E\,L^{(\cos)} \le C\Bigl\{\sup_{N<M}E\,L^{(\cos)}_{N,M} + E\sup_{N<M}|\lambda_{N,M}|\sigma_{N,M}\Bigr\}, \]
where
\[ \sigma_{N,M} = \sup_{0\le t\le 1}\frac{\bigl\|\sum_{k=N+1}^{M}V_k\theta_k\cos 2\pi p_k t\bigr\|_2}{\bigl(\log\bar p_M\sum_{k=N+1}^{M}\theta_k^{2}\bigr)^{1/2}}, \]
and $(\lambda_{N,M})_{N<M}$ is a sequence of independent $N(0,1)$ distributed random variables. By a computation similar to the one made in Example 1, we also obtain
\[ \Bigl\|\sum_{k=N+1}^{M}V_k\theta_k\cos 2\pi p_k t\Bigr\|_G \le C\sqrt{p(V)}\Bigl(\sum_{k=N+1}^{M}\theta_k^{2}\cos^{2}(2\pi p_k t)\Bigr)^{1/2} \le C\sqrt{p(V)}\Bigl(\sum_{k=N+1}^{M}\theta_k^{2}\Bigr)^{1/2}. \]
Hence, $\bigl\|\sum_{k=N+1}^{M}V_k\theta_k\cos 2\pi p_k t\bigr\|_2\le C\sqrt{p(V)}\bigl(\sum_{k=N+1}^{M}\theta_k^{2}\bigr)^{1/2}$, and therefore
\[ \sigma_{N,M} \le C\sqrt{p(V)}\,(\log\bar p_M)^{-1/2}. \]
By Theorem 8.5.1, we already know that $\sup_{N<M}E\,L^{(\cos)}_{N,M}\le C\sqrt{p(V)}$. Consider now the other part. First, we re-index the sequence as follows: put $m_1=1$, $m_k=1+\sum_{j=2}^{k}(j-1)$ ($k\ge 2$). Next, put for any $M\ge 1$ and any $l\in[m_M,m_{M+1}[$, $g_l:=\lambda_{l-m_M,M}$, $s_l=(\log\bar p_M)^{1/2}$. Observe that $s_l\ge(\log M)^{1/2}\ge C(\log l)^{1/2}$. Thus
\[ E\sup_{N<M}|\lambda_{N,M}|\sigma_{N,M} \le C\,E\sup_{l\ge 1}\frac{|g_l|}{s_l} \le C\sup_{l\ge 1}\frac{\sqrt{\log l}}{s_l}\ E\sup_{l\ge 1}\frac{|g_l|}{\sqrt{\log l}} \le C < \infty. \]
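The re-indexing used above lists the pairs $(N,M)$, $N<M$, as a single sequence. A minimal sketch of this bookkeeping follows; the function names are hypothetical, and it is assumed (as the formula $g_l=\lambda_{l-m_M,M}$ suggests) that $N$ ranges over $0,\dots,M-1$.

```python
def block_start(M: int) -> int:
    """m_M = 1 + sum_{j=2}^{M} (j - 1) = 1 + M (M - 1) / 2."""
    return 1 + M * (M - 1) // 2


def pair_to_index(N: int, M: int) -> int:
    """Index l = m_M + N attached to the pair (N, M), 0 <= N < M."""
    if not 0 <= N < M:
        raise ValueError("expected 0 <= N < M")
    return block_start(M) + N


def index_to_pair(l: int):
    """Inverse map: the unique (N, M) with m_M <= l < m_{M+1} and N = l - m_M."""
    M = 1
    while block_start(M + 1) <= l:
        M += 1
    return l - block_start(M), M


if __name__ == "__main__":
    for l in range(1, 11):
        print(l, index_to_pair(l))
```

Since $m_{M+1}-m_M=M$, each block $[m_M,m_{M+1}[$ contains exactly the $M$ admissible values of $N$, and $l\le M^{2}$ on that block, which gives $\log l\le 2\log M$ as used in the estimate $s_l\ge C(\log l)^{1/2}$.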

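Hence $E\,L^{(\cos)}\le C\sqrt{p(V)}$. By arguing identically, we establish an estimate of the same order for $E\,L^{(\sin)}$. Hence (8.5.22). The corollary is thus proved.

We will now prove the following result.

8.5.6 Theorem. Let $W=\{W_k, k\ge 1\}$ be a sequence of independent, symmetric real random variables. Then,
\[ \Bigl\|\sup_{N<M}\ \sup_{0\le t\le 1}\frac{\bigl|\sum_{k=N+1}^{M}W_k e^{2i\pi p_k t}\bigr|}{\bigl(\log\bar p_M\sum_{k=N+1}^{M}W_k^{2}\bigr)^{1/2}}\Bigr\|_G \le C, \tag{8.5.23} \]
where $C$ is a universal constant.

Observe that, by means of the Cauchy–Schwarz inequality,
\[ \frac{\bigl|\sum_{k=N+1}^{M}W_k e^{2i\pi p_k t}\bigr|}{\bigl(\log\bar p_M\sum_{k=N+1}^{M}W_k^{2}\bigr)^{1/2}} \le \frac{\sum_{k=N+1}^{M}|W_k|}{\bigl(\log\bar p_M\sum_{k=N+1}^{M}W_k^{2}\bigr)^{1/2}} \le \frac{(M-N)^{1/2}}{(\log\bar p_M)^{1/2}}. \]
In particular, if $\{p_m, m\ge 1\}$ is $\lambda$-lacunary ($\lambda>1$), that is $p_{m+1}\ge\lambda p_m$ for all $m\ge 1$, then
\[ \sup_{N<M}\ \sup_{0\le t\le 1}\frac{\bigl|\sum_{k=N+1}^{M}W_k e^{2i\pi p_k t}\bigr|}{\bigl(\log\bar p_M\sum_{k=N+1}^{M}W_k^{2}\bigr)^{1/2}} \le C, \]
where $C$ is a constant depending on $\lambda$ only. So Theorem 8.5.6 is only interesting when $\{p_m, m\ge 1\}$ grows at most geometrically.

Proof. Since the sequence $W$ is symmetric, it has the same distribution as the sequence $W'=(\varepsilon_kW_k)_{k=1}^{\infty}$, where $\varepsilon=\{\varepsilon_k, k\ge 1\}$ is a sequence of independent Rademacher random variables, which is also independent from the sequence $W$. Let $P$ be some fixed positive integer. Let also $g=\{g_k, k\ge 1\}$ be a sequence of independent $N(0,1)$ distributed random variables, also independent from the sequence $W$. By Corollary 8.5.5,
\[ E_g\,G\Bigl(\sup_{N<M\le P}\ \sup_{0\le t\le 1}\frac{\bigl|\sum_{k=N+1}^{M}g_kW_k e^{2i\pi p_k t}\bigr|}{C_0\bigl(\log\bar p_M\sum_{k=N+1}^{M}W_k^{2}\bigr)^{1/2}}\Bigr) \le 1, \]
where $C_0$ is the same constant as in Corollary 8.5.5. Since $|g|=\{|g_k|, k\ge 1\}$ and $\operatorname{sign}(g)=\{\operatorname{sign}(g_k), k\ge 1\}$ are independent sequences, by Jensen's inequality
\[ \begin{aligned} 1 &\ge E_g\,G\Bigl(\sup_{N<M\le P}\ \sup_{0\le t\le 1}\frac{\bigl|\sum_{k=N+1}^{M}g_kW_k e^{2i\pi p_k t}\bigr|}{C_0\bigl(\log\bar p_M\sum_{k=N+1}^{M}W_k^{2}\bigr)^{1/2}}\Bigr)\\ &\ge E_{\operatorname{sign}(g)}\,G\Bigl(E_{|g|}\sup_{N<M\le P}\ \sup_{0\le t\le 1}\frac{\bigl|\sum_{k=N+1}^{M}g_kW_k e^{2i\pi p_k t}\bigr|}{C_0\bigl(\log\bar p_M\sum_{k=N+1}^{M}W_k^{2}\bigr)^{1/2}}\Bigr)\\ &\ge E_{\operatorname{sign}(g)}\,G\Bigl(\sqrt{\tfrac{2}{\pi}}\sup_{N<M\le P}\ \sup_{0\le t\le 1}\frac{\bigl|\sum_{k=N+1}^{M}\operatorname{sign}(g_k)W_k e^{2i\pi p_k t}\bigr|}{C_0\bigl(\log\bar p_M\sum_{k=N+1}^{M}W_k^{2}\bigr)^{1/2}}\Bigr)\\ &= E_{\varepsilon}\,G\Bigl(\sqrt{\tfrac{2}{\pi}}\sup_{N<M\le P}\ \sup_{0\le t\le 1}\frac{\bigl|\sum_{k=N+1}^{M}\varepsilon_kW_k e^{2i\pi p_k t}\bigr|}{C_0\bigl(\log\bar p_M\sum_{k=N+1}^{M}W_k^{2}\bigr)^{1/2}}\Bigr). \end{aligned} \]
By integrating with respect to $W$, using symmetry of the law of $W$ and finally letting $P$ tend to infinity, we obtain
\[ E\,G\Bigl(\sqrt{\tfrac{2}{\pi}}\sup_{N<M}\ \sup_{0\le t\le 1}\frac{\bigl|\sum_{k=N+1}^{M}W_k e^{2i\pi p_k t}\bigr|}{C_0\bigl(\log\bar p_M\sum_{k=N+1}^{M}W_k^{2}\bigr)^{1/2}}\Bigr) \le 1. \]
This means that
\[ \Bigl\|\sup_{N<M}\ \sup_{0\le t\le 1}\frac{\bigl|\sum_{k=N+1}^{M}W_k e^{2i\pi p_k t}\bigr|}{\bigl(\log\bar p_M\sum_{k=N+1}^{M}W_k^{2}\bigr)^{1/2}}\Bigr\|_G \le C_0\sqrt{\frac{\pi}{2}}, \]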

hence the announced result.

It is now easy to deduce from Theorem 8.5.6 (except for the constant 2 in (8.5.24)) the well-known estimate of Salem–Zygmund (see Kahane [1968] or Salem–Zygmund [1954]), which we now recall.

8.5.7. Salem–Zygmund's estimate. Let $\{n_k, k\ge 1\}$, $\{p_k, k\ge 1\}$ be two increasing sequences of integers and $\{a_k, k\ge 1\}$ a sequence of reals. Let also $\varepsilon=\{\varepsilon_k, k\ge 1\}$ be a sequence of independent Rademacher random variables defined on a probability space $(\Omega,\mathcal{B},P)$. Then maxnk 1).

Theorem 8.5.6 can be used to get a simple sufficient condition for uniform convergence of random Fourier series. The condition is expressed by means of the convergence of a series whose terms depend on the sequence $p$. When the order of magnitude of this sequence is known, this condition can be easier to check than the remarkable characterization ([Ledoux–Talagrand: 1991] Theorem 13.6 and Corollary 13.9) of that property by Marcus and Pisier, in terms of the so-called Dudley entropy integral.

8.5.8 Theorem. Suppose there exist integers $0=:n_0<n_1<n_2<\cdots$ such that the following condition is satisfied: the series
\[ \sum_{i=0}^{\infty}\sqrt{\log p_{n_{i+1}}}\ E\Bigl(\sum_{k=n_i+1}^{n_{i+1}}|W_k|^{2}\Bigr)^{1/2} \]
converges. Then the sequence of partial sums $S_n(\omega,t):=\sum_{k=1}^{n}W_k(\omega)e^{2i\pi p_k t}$, $n=1,2,\dots$, converges in $C$, for $P$-almost all $\omega$.

Proof. Put
\[ R = \sup_{N<M}\ \sup_{0\le t\le 1}\frac{\bigl|\sum_{k=N+1}^{M}W_k e^{2i\pi p_k t}\bigr|}{\bigl(\log p_M\sum_{k=N+1}^{M}W_k^{2}\bigr)^{1/2}}. \]
By Theorem 8.5.6, $E\,R<\infty$, so that
\[ \|S_{n_{i+1}}-S_{n_i}\|_C \le R\Bigl(\log p_{n_{i+1}}\sum_{k=n_i+1}^{n_{i+1}}|W_k|^{2}\Bigr)^{1/2}, \]

for any $i\ge 1$, and moreover
\[ \sup_{n_i\le n\le n_{i+1}}\|S_n-S_{n_i}\|_C \le R\sup_{n_i\le n\le n_{i+1}}\Bigl(\sum_{k=n_i+1}^{n}|W_k|^{2}\Bigr)^{1/2}\log^{1/2}p_n = R\Bigl(\sum_{k=n_i+1}^{n_{i+1}}|W_k|^{2}\Bigr)^{1/2}\log^{1/2}p_{n_{i+1}}. \]
Thus by the triangle inequality, for all $r\ge 1$,
\[ \sup_{u,v\ge r}\|S_u-S_v\|_C \le R\sum_{i\ge r}\Bigl(\sum_{k=n_i+1}^{n_{i+1}}|W_k|^{2}\Bigr)^{1/2}\log^{1/2}p_{n_{i+1}}. \]
This last inequality shows, by the assumption made and Fatou's lemma, that
\[ \sup_{u,v\ge r}\|S_u-S_v\|_C \to 0 \]
as $r$ tends to infinity, almost surely. The result easily follows.

$L^{p}$-norms of random polynomials. The study of the behavior of $L^{p}$-norms of random polynomials built from sequences of i.i.d. random variables requires a radically different approach. Borwein and Lockhart [2001] investigated this question. Their approach is based on convergence results of moments in the central limit theorem for triangular arrays of i.i.d. random variables. The case of arrays of independent random variables was considered in [Cuny–Weber: 2006], where a theorem of convergence of moments, with speed of convergence, in the CLT for triangular arrays of independent random variables is further established. We begin by introducing the necessary notation. Let $X_{n,k}$, $1\le k\le k_n$, $n\ge 1$ be a triangular array of real centered independent, square integrable random variables and set for every $n\ge 1$ and $1\le j,k\le k_n$,
\[ \sigma_{n,j}^{2}=E\,X_{n,j}^{2}, \qquad s_{n,k}^{2}=\sum_{j=1}^{k}\sigma_{n,j}^{2}, \qquad s_n=s_{n,k_n}, \qquad S_{n,k}=\sum_{j=1}^{k}X_{n,j}, \qquad S_n=S_{n,k_n}. \]

Introduce the (generalized) Lindeberg condition (also called Lyapunov's condition) of order $\nu\ge 2$:
\[ \sum_{j=1}^{k_n}E\,|X_{n,j}|^{\nu}\mathbf{1}_{\{|X_{n,j}|>\varepsilon s_n\}} = o(s_n^{\nu}), \quad (\forall\varepsilon>0) \quad n\to\infty. \tag{$L_{\nu}$} \]
This condition is, for $\nu>2$, equivalent to
\[ \sum_{j=1}^{k_n}E\,|X_{n,j}|^{\nu} = o(s_n^{\nu}), \qquad n\to\infty. \tag{$L_{\nu}$} \]

According to Lindeberg’s theorem (see for instance Hall and Heyde [1980]), under

S2 (L2 ), Ssnn converges in law to the standard normal law; and since E s 2n = 1, we n have 2 Sn,k lim E = 1 = m2 , (8.5.28) 2 n→∞ sn,k where m2 = E W 2 and W is a variable with standard normal law. More generally, for ν ν > 0, write mν := E |W | . • (0 < ν ≤ 2). Let Xn,k , 1 ≤ k ≤ kn , n ≥ 1 be a triangular array of real centered independent, square integrable random variables. Assume that (L2 ) holds. Then, E |Sn |ν = mν n→∞ snν lim

• (ν > 2). Let $\{Y_k,\ 1\le k\le n\}$ be real centered independent random variables, with finite moment of order ν. Write $S_n=\sum_{k=1}^nY_k$ and $s_n=\big(\sum_{k=1}^nE\,Y_k^2\big)^{1/2}$. Then, there exists a universal constant C such that
$$\Big|E\Big|\frac{S_n}{s_n}\Big|^\nu-m_\nu\Big|\le C\,\frac{\sum_{k=1}^nE\,|Y_k|^\nu}{s_n^\nu}\qquad\text{for } 2<\nu\le 3, \qquad (8.5.29a)$$
$$\Big|E\Big|\frac{S_n}{s_n}\Big|^\nu-m_\nu\Big|\le C\Big(\frac{\sum_{k=1}^nE\,|Y_k|^\nu}{s_n^\nu}+\frac{\sum_{k=1}^nE\,|Y_k|^3}{s_n^3}\Big)\qquad\text{for } 3<\nu\le 5, \qquad (8.5.29b)$$
and, for ν > 5,
$$\Big|E\Big|\frac{S_n}{s_n}\Big|^\nu-m_\nu\Big|\le\Big(\frac{C\nu}{\log\nu}\Big)^{\nu}\Big(\frac{\sum_{k=1}^nE\,|Y_k|^\nu}{s_n^\nu}+\frac{\sum_{k=1}^nE\,|Y_k|^3}{s_n^3}+\frac{\sum_{k=1}^nE\,|Y_k|^3}{s_n^3}\cdot\frac{\sum_{k=1}^nE\,|Y_k|^{\nu-3}}{s_n^{\nu-3}}\Big). \qquad (8.5.29c)$$
As a corollary we obtain

8.5.9 Theorem. Let ν > 2. Let $\{X_{n,k},\ 1\le k\le k_n,\ n\ge 1\}$ be a triangular array of real centered independent random variables, having moments of order ν. Assume that $(L_\nu)$ holds. Then $E\,|S_n|^\nu/s_n^\nu$ converges to $m_\nu$ as n tends to infinity, with the speed given above. Further, if ν ≥ 3, the rate of convergence can be simplified:
$$\Big|E\Big|\frac{S_n}{s_n}\Big|^\nu-m_\nu\Big|\le\Big(\frac{C\nu}{\log\nu}\Big)^{\nu}\max_{h\in\{1,\frac{1}{\nu-2}\}}\Big(\frac{\sum_{k=1}^{k_n}E\,|X_{n,k}|^\nu}{s_n^\nu}\Big)^{h},$$
where C is a universal constant.

We refer to Cuny and Weber [2006] for these results and comparisons with earlier results. Theorem 8.5.9 can be used to prove the following result extending Borwein and Lockhart's theorem to triangular arrays of independent random variables.


8.5.10 Theorem. Let $\{X_{n,k},\ 1\le k\le k_n,\ n\ge 1\}$ be a triangular array of real centered independent random variables, with $E\,X_{n,k}^2=1$, satisfying the Lindeberg condition $(L_\nu)$ of order ν ≥ 2. We have
$$\lim_{n\to\infty}\frac{1}{2\pi\,k_n^{\nu/2}}\int_0^{2\pi}E\,|q_n(\theta)|^{\nu}\,d\theta=\Gamma\Big(1+\frac{\nu}{2}\Big),$$
where $q_n(\theta)=\sum_{k=1}^{k_n}X_{n,k}e^{ik\theta}$ and $\Gamma(s)=\int_0^\infty u^{s-1}e^{-u}\,du$ is the usual Gamma function.
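The limit in Theorem 8.5.10 can be explored with a small Monte Carlo experiment. The sketch below uses Rademacher coefficients (so $k_n=n$ and $E\,X_{n,k}^2=1$) and estimates $\frac{1}{2\pi n^{\nu/2}}\int_0^{2\pi}E|q_n(\theta)|^\nu d\theta$. The function name, the number of trials and the uniform grid of 4n angles are illustrative assumptions; the grid average reproduces the integral exactly only when $|q_n|^\nu$ is a trigonometric polynomial of degree below 4n, which is the case for ν = 2 and ν = 4.

```python
import cmath
import math
import random


def normalized_moment(n, nu, trials=200, seed=0):
    """Monte Carlo estimate of (1/2pi) int_0^{2pi} E|q_n(theta)|^nu dtheta / n^(nu/2).

    q_n(theta) = sum_{k=1}^{n} X_k exp(i k theta) with independent random signs X_k.
    """
    rng = random.Random(seed)
    m = 4 * n  # number of equally spaced angles theta_j = 2 pi j / m
    roots = [cmath.exp(2j * math.pi * r / m) for r in range(m)]
    total = 0.0
    for _ in range(trials):
        signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        acc = 0.0
        for j in range(m):
            # exp(i k theta_j) is the root of unity with index (k * j) mod m
            q = sum(signs[k - 1] * roots[(k * j) % m] for k in range(1, n + 1))
            acc += abs(q) ** nu
        total += acc / m
    return total / trials / n ** (nu / 2)


if __name__ == "__main__":
    for nu in (2, 4):
        print(nu, normalized_moment(32, nu), math.gamma(1 + nu / 2))
```

For ν = 2 the normalized moment equals 1 for every sample (Parseval), in agreement with Γ(2) = 1; for ν = 4 the estimate should be close to Γ(3) = 2 up to sampling error and a correction of order 1/n.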

8.6 Application to a.s. convergence of weighted series of contractions

In this section the convergence properties, in mean and almost everywhere, of series of contractions (of an arbitrary Hilbert space) with random weights are investigated. The uniform estimates of random polynomials established in the previous section will be combined with the spectral inequality to obtain sharp conditions ensuring the existence of universal sets on which these series converge in mean and also almost everywhere, for arbitrary contractions. The general approach is again based on the metric entropy method. Let (X, F, μ) be some probability space. Consider the randomly weighted series of contractions
$$\sum_{k=1}^{\infty}W_k(\omega)T^{p_k}, \qquad (8.6.1)$$
where $\{W_k,\ k\ge 1\}$ is a sequence of independent, mean zero, square integrable random variables, defined on some probability space (Ω, B, P), and T is a linear contraction in a Hilbert space H, while $\{p_k,\ k\ge 1\}$ is a nondecreasing sequence of nonnegative integers with $p_1>1$, and ω ∈ Ω. Consider first the convergence in mean of the series (8.6.1). One can establish the following theorem.

8.6.1 Theorem. Suppose that there exist integers $0:=n_0<n_1<n_2<\cdots$ such that the following condition is satisfied:
$$\sum_{j=0}^{\infty}\min\Big\{(\log p_{n_{j+1}})^{1/2}\Big(\sum_{k=n_j+1}^{n_{j+1}}E\,W_k^2\Big)^{1/2},\ \sum_{k=n_j+1}^{n_{j+1}}E\,|W_k|\Big\}<\infty. \qquad (8.6.2)$$
Then there exists a (universal) sequence of P-integrable random variables $M=\{M_J,\ J\ge 1\}$ defined on (Ω, B, P), which converges to zero P-a.s. and in P-mean, such that for any Hilbert space H and any contraction T in H we have
$$\sup_{R>n_J}\Big\|\sum_{k=n_J+1}^{R}W_k(\omega)T^{p_k}\Big\|\le M_J(\omega) \qquad (8.6.3)$$


for all ω ∈ Ω and all J ≥ 1. In particular, there exists a (universal) P-null set N* ∈ B such that the series
$$\sum_{k=1}^{\infty}W_k(\omega)T^{p_k} \qquad (8.6.4)$$
converges in operator norm for all ω ∈ Ω\N*, whenever H is a Hilbert space and T is a contraction in H.

We will see in Theorem 8.6.2 that condition (8.6.2) is in fact already enough to imply the existence of a (universal) P-null set N* ∈ B such that: for each ω ∈ Ω\N*, for any probability space (X, F, μ), any contraction T on $L^2(\mu)$, any $f\in L^2(\mu)$, if we define
$$\forall\omega\in\Omega,\ \forall x\in X,\ \forall n\ge 1,\qquad S_n(\omega,x)=\sum_{k=1}^{n}W_k(\omega)T^{p_k}f(x), \qquad (8.6.5)$$
the sequence $S_{n_k}(\omega,\cdot)$ converges μ-almost surely. If in addition to condition (8.6.2) we have
$$\sum_{k}\min\Big\{\log^2(n_{k+1}-n_k)\,\log p_{n_{k+1}}\sum_{j=n_k+1}^{n_{k+1}}E\,(W_j)^2,\ E\Big(\sum_{j=n_k+1}^{n_{k+1}}|W_j|\Big)^2\Big\}<\infty, \qquad (8.6.6)$$
then one also has the existence of a (universal) P-null set N* ∈ B such that for each ω ∈ Ω\N*, for any probability space (X, F, μ), any contraction T on $L^2(\mu)$, any $f\in L^2(\mu)$, the sequence $\{S_n,\ n=1,2,\ldots\}$ converges μ-almost surely.

Proof of Theorem 8.6.1. Fix some N ≥ 1 and let $R>n_N\ge 1$. Let R′ ≥ 0 be defined by $n_{N+R'}<R\le n_{N+R'+1}$. Let f ∈ H with spectral measure $\mu_f$ with respect to T. Then,
$$\Big\|\sum_{k=n_N+1}^{R}W_k(\omega)T^{p_k}f\Big\|$$

≤

N+ i+1 R −1 n

Wk (ω)T

pk

f +

k=ni +1

i=N

R

Wk (ω)T pk f

k=nN+R +1

∞ ∞ Wk (ω)T pk f + ni+1

≤

i=N

≤

i=N

+

j =0 nN+j nN

k=nN +1

t∈T k=n +1 i

i=N

+

∞

sup

j =N nj

α > 1/2, β > 2. Then, there exists a (universal) P-null set N* ∈ B such that for each ω ∈ Ω\N*, for any probability space (X, F, μ), any contraction T on $L^2(\mu)$, any $f\in L^2(\mu)$, the sequences $\{S_n(\omega,\cdot),\ n\ge 1\}$, $\{R_n(\omega,\cdot),\ n\ge 1\}$ defined by
$$\forall x\in X,\ \forall n\ge 1,\qquad S_n(\omega,x)=\sum_{k=1}^{n}\frac{Z_k(\omega)}{k^{\alpha}}\,T^kf(x) \qquad (8.6.9)$$
and
$$\forall x\in X,\ \forall n\ge 1,\qquad R_n(\omega,x)=\sum_{k=1}^{n}\frac{Z_k(\omega)}{\sqrt{k}\,\log^{\beta}k}\,T^kf(x) \qquad (8.6.10)$$

converge μ-almost surely.

Proof. Set $N_k=2^k$ for all k ∈ N. By Theorem 8.6.2, it is enough to verify conditions (8.6.2) and (8.6.6). Let α > 1/2; then for the series (8.6.9), condition (8.6.2) becomes
$$\sum_{i=0}^{\infty}\sqrt{\log(2^{i+1})}\Big(\sum_{k=2^i+1}^{2^{i+1}}\frac{E\,(|Z_k|^2)}{k^{2\alpha}}\Big)^{1/2}\le K\,(E\,|Z_1|^2)^{1/2}\sum_{i=0}^{\infty}\frac{\sqrt{i+1}}{2^{(\alpha-\frac12)i}}<\infty.$$
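For the weights $Z_k/k^\alpha$ of (8.6.9), with dyadic blocks $n_i=2^i$ and $p_k=k$, the block terms $\sqrt{\log 2^{i+1}}\,\big(\sum_{k=2^i+1}^{2^{i+1}}E|Z_k|^2/k^{2\alpha}\big)^{1/2}$ entering condition (8.6.2) can be evaluated numerically and compared with the geometric majorant $\sqrt{(i+1)\log 2}\;2^{-(\alpha-\frac12)i}$. The sketch below assumes $E|Z_k|^2=1$; the function names are hypothetical.

```python
import math


def block_term(i, alpha, second_moment=1.0):
    """i-th dyadic block term for the weights Z_k / k^alpha.

    Uses n_i = 2^i and p_k = k, and assumes E|Z_k|^2 = second_moment for all k.
    """
    lo, hi = 2 ** i + 1, 2 ** (i + 1)
    block = sum(second_moment / k ** (2 * alpha) for k in range(lo, hi + 1))
    return math.sqrt(math.log(2 ** (i + 1))) * math.sqrt(block)


def geometric_bound(i, alpha, second_moment=1.0):
    """Majorant sqrt((i + 1) log 2 * second_moment) * 2^(-(alpha - 1/2) i).

    It follows from the fact that the block has 2^i terms, each at most 2^(-2 alpha i).
    """
    return math.sqrt((i + 1) * math.log(2) * second_moment) * 2 ** (-(alpha - 0.5) * i)


# Partial sum over the first 16 blocks for alpha = 3/4.
partial_sum = sum(block_term(i, alpha=0.75) for i in range(16))
```

Since the majorant is summable exactly when α > 1/2, the partial sums stay bounded in that range, which is the content of the estimate above.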

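And concerning condition (8.6.6), we find
$$\sum_{k=1}^{\infty}\min\Big\{\log^2(2^{k+1}-2^k)\,\log 2^{k+1}\sum_{j=2^k+1}^{2^{k+1}}\frac{E\,|Z_j|^2}{j^{2\alpha}},\ E\Big(\sum_{j=2^k+1}^{2^{k+1}}\frac{|Z_j|}{j^{\alpha}}\Big)^2\Big\}\le K\,E\,(|Z_1|^2)\sum_{k=1}^{\infty}\frac{k^3}{2^{(2\alpha-1)k}}<\infty.$$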

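Let β > 2; then for the series (8.6.10), condition (8.6.2) becomes
$$\sum_{i=0}^{\infty}\sqrt{\log(2^{i+1})}\Big(\sum_{k=2^i+1}^{2^{i+1}}\frac{E\,(|Z_k|^2)}{k\,\log^{2\beta}k}\Big)^{1/2}\le K\,(E\,|Z_1|^2)^{1/2}\sum_{i=1}^{\infty}\frac{1}{i^{(\beta-\frac12)}}<\infty.$$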

As for condition (8.6.6),
$$\sum_{k=1}^{\infty}\min\Big\{\log^2(2^{k+1}-2^k)\,\log 2^{k+1}\sum_{j=2^k+1}^{2^{k+1}}\frac{E\,(|Z_j|^2)}{j\,\log^{2\beta}j},\ E\Big(\sum_{j=2^k+1}^{2^{k+1}}\frac{|Z_j|}{j^{1/2}\log^{\beta}j}\Big)^2\Big\}\le K\,E\,(|Z_1|^2)\sum_{k=1}^{\infty}\frac{1}{k^{2\beta-3}}<\infty.$$

Hence, conditions (8.6.2) and (8.6.6) are fulfilled for the series (8.6.9) and (8.6.10). This completes the proof of Corollary 8.6.4.

8.6.5 Remark. If α ≤ 1/2 and P{|Z_1| > 0} > 0, then the series (8.6.9) does not converge. To see this, it is enough to take T = Identity and α = 1/2 in (8.6.9). Then we have
$$\forall\omega\in\Omega,\ \forall x\in X,\qquad\sum_{k=1}^{\infty}\frac{Z_k(\omega)}{\sqrt{k}}\,T^kf(x)=f(x)\sum_{k=1}^{\infty}\frac{Z_k(\omega)}{\sqrt{k}}.$$
But by the 0-1 law and the central limit theorem, the series on the right-hand side diverges almost surely. The case α < 1/2 is treated in exactly the same manner, and this completes the proof of our claim. Corollary 8.6.4 also improves earlier results of Rosenblatt [1988], which have a factor n instead of $\sqrt{n}\,\log^{\beta}n$ and the Rademacher sequence instead of a general sequence of independent, symmetric, identically distributed random variables. By combining Corollary 8.6.4 with Kronecker's lemma, we get

8.6.6 Theorem. If $\{Z_k,\ k\ge 1\}$ is a sequence of independent, symmetric, square integrable, identically distributed random variables on a probability space (Ω, B, P) and if β > 2, then there exists a (universal) P-null set N* ∈ B such that for each ω ∈ Ω\N*, for any probability space (X, F, μ), any contraction T on $L^2(\mu)$, any $f\in L^2(\mu)$, the sequence
$$A_n(\omega,x)=\frac{1}{\sqrt{n}\,\log^{\beta}n}\sum_{k=1}^{n}Z_k(\omega)T^kf(x),\qquad x\in X,\ n\ge 1, \qquad (8.6.11)$$
converges to zero μ-almost surely.

8.6.7 Remark. 1. In the previous applications, we only considered the i.i.d. case. Naturally Theorem 8.6.2 applies to the non-i.i.d. case as well. The corresponding results are left as exercises.
2. The almost sure convergence of the weighted means
$$\frac{1}{n}\sum_{k=1}^{n}Z_k(\omega)T^kf \qquad (8.6.12)$$

was studied by several authors. In Assani [1998], the almost sure convergence to zero of these means is established when {Zk , k ≥ 1} is an i.i.d. sequence of symmetric random variables, such that E (|Z1 |p ) < ∞ for some 1 < p < ∞ and T is the transformation induced by a measure-preserving transformation. In Rosenblatt [1988], these means


are studied when T is a contraction on $L^p(\mu)$, 1 < p < ∞, and $\{Z_k,\ k\ge 1\}$ is a Rademacher sequence. And in Schneider–Weber [1996], a Gaussian technique is used to prove the almost sure convergence of the means (8.6.12), notably when the sequence $\{Z_k,\ k\ge 1\}$ is positive.

Now let $\{Z_k,\ k\ge 1\}$ be as in Theorem 8.6.6, and let (X, F, μ) be a probability space, T a contraction on $L^1(\mu)$, which is also assumed to be a contraction on any $L^p(\mu)$ (p ≥ 1). Consider the series
$$\sum_{k=1}^{\infty}\frac{Z_k(\omega)}{k}\,T^kf(x). \qquad (8.6.13)$$

By using the above results and a complex interpolation method, we will prove the almost sure convergence of the series (8.6.13), for all $f\in L^p(\mu)$, p > 1.

8.6.8 Theorem. Let $\{Z_k,\ k\ge 1\}$ be a sequence of independent, symmetric, square integrable, identically distributed random variables on some probability space (Ω, B, P). Then, there exists a (universal) P-null set N* ∈ B, such that for each ω ∈ Ω\N*, for any probability space (X, F, μ), any contraction T on $L^1(\mu)$ which is also a contraction on every $L^p(\mu)$, and for any $f\in L^p(\mu)$ (p > 1), the series defined in (8.6.13) converges μ-almost surely. Further, if
$$S^*(f)=S^*(\omega,f)=\sup_{n\ge 1}\Big|\sum_{k=1}^{n}\frac{Z_k(\omega)}{k}\,T^kf\Big|, \qquad (8.6.14)$$
then we have the strong maximal inequality
$$\forall\omega\in\Omega\setminus N^*,\ \forall p>1,\ \forall f\in L^p(\mu),\qquad\|S^*(f)\|_p\le C(p,\omega)\,\|f\|_p. \qquad (8.6.15)$$
Proof. Let α > 1/2, z ∈ C with 0 ≤ ℜ(z) ≤ 1, $N_j=2^j$, j = 1, 2, .... Let also ν : X → N* be a measurable map and define, for ω ∈ Ω and p ≥ 1, the operators in $L^p(\mu)$,
$$\forall f\in L^p(\mu),\qquad S_z^{\nu}(f)=\sum_{j=1}^{\nu}\frac{Z_j(\omega)}{j^{(\alpha+\frac{z}{2})}}\,T^j(f) \qquad (8.6.16)$$
as well as
$$S_z^*(f)=\sup_{\nu\ge 1}|S_z^{\nu}(f)|. \qquad (8.6.17)$$

First we establish a useful estimate for Szν (f ) 2 when f ∈ L2 (μ). Let x ∈ X, then there exists a positive integer k0 = k0 (ν), such that 2k0 < ν(x) ≤ 2k0 +1 . Thus ν(x) Zj (ω) j |Szν (f )|(x) = T f (x) (α+ 2z ) j j =1 2k0 n Zj (ω) j ≤ T f (x) + max (α+ 2z ) 2k0 0 such that sup E |θk |α < ∞. k≥1

There exists a (universal) P-null set N ∗ ∈ B such that for each ω ∈ \N ∗ , for any probability space (X, F , μ), any contraction T on L2 (μ) and any f ∈ L2 (μ), if is the contraction defined in (a2), one has 1 pk +θk (ω) 1 pk T (f ) = lim T ((f )) μ-almost surely. n→∞ n n→∞ n n

n

k=1

k=1

lim

8.6.12 Remarks. Corollary 8.6.11 implies that if {pk , k ≥ 1} is a sequence of positive integers which satisfies (a1) and which is 2-good for the pointwise ergodic theorem, then there exists a (universal) P-null set N ∗ ∈ B such that for each ω ∈ \N ∗ , the perturbed sequence {pk + θk (ω), k ≥ 1} is also 2-good for the pointwise ergodic theorem. Thus it follows: 1) If {pk , k ≥ 1} is the sequence {k d , k ≥ 1}, d ≥ 1 or the sequence of prime numbers, then the perturbed sequence {pk + θk (ω), k ≥ 1} is 2-good for the pointwise ergodic theorem. Furthermore, if d = 1 (resp. d ≥ 2) and τ is ergodic (resp. τ n is ergodic for each n ∈ N), then for any ω ∈ \N ∗ one has 1 f τ pk +θk (ω) = n→∞ n n

k=1

(f )dμ =

lim

X

f dμ

μ-almost surely.

X

2) On the other hand, we can deduce from Corollary 8.6.11 that if $\{p_k,\ k\ge 1\}$ is a sequence of positive integers which satisfies (a1) and which is 2-bad for the ergodic theorem (i.e., there exist an $f\in L^2(\mu)$ and $X_f\in\mathcal A$ with $\mu(X_f)>0$ such that, for each $x\in X_f$, $\lim_{n\to\infty}\frac1n\sum_{k=1}^nf\circ\tau^{p_k}(x)$ does not exist), then there exists a (universal) P-null set N* ∈ B such that for each ω ∈ Ω\N*, the sequence $\{p_k+\theta_k(\omega),\ k\ge 1\}$ is also bad for the pointwise ergodic theorem. This was observed in [Schneider: 1997], where a weaker form of Corollary 8.6.11 was proved using Gaussian techniques.
3) Several other papers dealing with this subject and with suprema of random polynomials appeared after [Boukhari–Weber: 2002], and we may cite the works of Cohen [2004–2006] in collaboration with Cuny, Jones and Lin. These papers explore some larger settings (multidimensional cases, with valuable and quite interesting extensions to $L^p$-contractions with 1 < p ≤ 2) but do not significantly improve upon the results presented in this section. For other sources using the metric entropy method, we also refer to [Gamet–Weber: 2000]. For improvements based on the majorizing measure method, see Section 9.6. It seems pretty clear that the two aforementioned methods, when combined with spectral theory and ergodic theory, are the most appropriate for tackling these questions.

Problem 9. Only sufficient conditions are given. Find necessary conditions for these “universal” convergence properties.

8.7 An application to random perturbation of intersective sets

Given a set S ⊂ Z and a sequence $I=\{I_n,\ n\ge 1\}$ of intervals of increasing length contained in Z, let
$$b(S,I)=\limsup_{|I_n|\to\infty}\frac{|S\cap I_n|}{|I_n|},\qquad b(S)=\sup_{I}b(S,I),$$
where the supremum is taken over all collections I of intervals. Here and henceforth, for a finite set B we will use |B| to denote its cardinality. We call b(S) the Banach density of S. If the limit
$$d(S):=\lim_{N\to\infty}\frac{|S\cap[1,N]|}{N}$$
exists, this is by definition the density of S. Suppose (X, B, μ, T) is a measurable dynamical system. Recall that a sequence of natural numbers $k=\{k_n,\ n\ge 1\}$ is 2-nice if, given any dynamical system (X, B, μ, T) and any $f\in L^2(\mu)$,
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}f(T^{k_n}x)=E_T(f)(x),$$
μ-almost everywhere. Here $E_T(f)$ denotes as usual the conditional expectation of f with respect to the σ-algebra B(T) of T-invariant measurable subsets of X.
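The difference between the density d(S) and the Banach density b(S) can be seen on finite data. The sketch below computes the counting ratio |S ∩ [1, N]|/N and, as a finite-range proxy for the Banach density, the largest proportion of S found in any window of a given length. Both function names are hypothetical, and a finite computation can only suggest the limiting values, not establish them.

```python
def counting_density(members, n):
    """|S ∩ [1, n]| / n for a set S given as a Python set of integers."""
    return sum(1 for k in range(1, n + 1) if k in members) / n


def max_window_density(members, length, lo, hi):
    """Largest |S ∩ I| / |I| over intervals I of `length` consecutive integers in [lo, hi].

    A finite-range proxy for the Banach density, which is defined through
    sequences of intervals whose lengths tend to infinity.
    """
    if length <= 0 or hi - lo + 1 < length:
        raise ValueError("window does not fit in the range")
    count = sum(1 for k in range(lo, lo + length) if k in members)
    best = count
    for start in range(lo + 1, hi - length + 2):
        # slide the window one step: add the new right end, drop the old left end
        count += (start + length - 1 in members) - (start - 1 in members)
        best = max(best, count)
    return best / length


evens = set(range(2, 2001, 2))
with_block = evens | set(range(1000, 1100))  # even numbers plus one full block
```

With these sets, `max_window_density(evens, 100, 1, 2000)` is 0.5 while `max_window_density(with_block, 100, 1, 2000)` is 1.0: a single long block raises the window proportion without changing the counting ratio much.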


Say that a sequence of natural numbers $k=\{k_n,\ n\ge 1\}$ is multiply intersective if, given any subset E of the natural numbers of positive Banach density, there exists another subset R of Z with d(R) existing and not less than b(E), such that for each finite subset $\{n_1,\ldots,n_r\}$ of R we have $b(E\cap(E+k_{n_1})\cap\cdots\cap(E+k_{n_r}))>0$. We say that k is intersective if, given any subset E of Z of positive Banach density, there exists k in k such that E ∩ (E + k) ≠ ∅. Let us first comment on results related to that property. The interest in intersective sets dates from the 1970s, and immediately postdates Furstenberg's famous ergodic theoretic proof of Szemerédi's theorem (see Furstenberg [1981]). A number of authors (Furstenberg [1981], Kamae and Mendès France [1978], Sárközy [1978]) showed by strikingly diverse arithmetic and analytic means that special arithmetic sequences like the squares $\{k_r=r^2,\ r\ge 1\}$ are intersective. In Bertrand-Mathis [1986] it is shown that a sequence of integers being intersective is equivalent to it having the Poincaré recurrence property. The relation of the intersectivity property of a sequence to other properties of an integer sequence is explored in Bourgain [1987]. The natural numbers are shown to be multiply intersective in Ruzsa [1978]. New families of multiply intersective sequences are given in Nair [1998] and Nair–Zaris [2001]. Suppose $\theta=\{\theta_n,\ n\ge 1\}$ denotes a sequence of N-valued independent, identically distributed random variables with basic probability space (Ω, A, P), with a P-complete σ-field A. We assume k is 2-nice and that there exist 0 < α < 1 and B > 1/α such that
$$k_n=O(e^{n^{\alpha}}),\qquad E\,\log_+^{B}|\theta_1|<\infty. \qquad (E(\alpha,B))$$
Then we say that (k, θ) is a good pair. We will establish the following theorem. 8.7.1 Theorem. Suppose that (k, θ) is a good pair.
Then for P-almost all $(\theta_i)$, given any set E contained in the natural numbers with b(E) > 0, there exists a set R contained in the natural numbers with density d(R) existing and d(R) ≥ b(E), such that for any finite set $\{n_1,\ldots,n_r\}$ contained in R,
$$b\big(E\cap(E+k_{n_1}+\theta_{n_1})\cap\cdots\cap(E+k_{n_r}+\theta_{n_r})\big)>0.$$
Before giving the proof, we proceed with a series of lemmas. Consider a sequence $\theta,\theta_1,\theta_2,\ldots$ of Z-valued, independent random variables defined on a probability space (Ω, B, P), and satisfying $P\{k_i+\theta_i\ge 0\}=1$, i = 1, 2, .... Introduce again the sequence of random polynomials
$$U_N(t)=\sum_{n=1}^{N}\big(e^{2i\pi t(k_n+\theta_n)}-E\,e^{2i\pi t(k_n+\theta_n)}\big),\qquad N=1,2,\ldots.$$


Assume that the following condition in which : N → N is increasing, is satisfied: #1/2 " log+ (kM + θM ) A(k, θ, ) = E sup < ∞. (8.7.1) (M) M≥1 According to Theorem 8.5.7, |UM (t) − UN (t)| ≤ C · A(k, θ, ), (M − N)1/2 (M) N <M 0≤t≤1

E sup sup

(8.7.2)

where C is a universal constant. The following lemma is related to condition (8.7.1). 8.7.2 Lemma. Assume that θ is an i.i.d. sequence and that condition E(α, B) is satisfied. Then condition (8.7.1) is realized with (t) = t α/2 . Proof. With this choice of , we have for T large, #1/2 " log+ (kM + θM ) 2 α P sup P kM + θM > e4T M > 2T ≤ (M) M≥1 M≥1 2 α ≤ P θM > e T M M≥1

≤

2B αB P logB M + θ1 > T

M≥1

≤

E logB + θ1 M −αB ≤ CT −2B , T 2B M≥1

where C depends on α, B, θ1 only. The result readily follows. 8.7.3 Lemma. Suppose that (k, θ ) is a good pair. Let be the contraction operator defined by ∞ f = E f T θ1 = P{θ1 = n}f T n . n=0

∗

There exist a measurable set of full measure, such that for any ω ∈ , any dynamical system (X, B, μ, T ), and any f ∈ L2 (μ), N 1 f T kn +θn (ω) = E T f (x) = 1. μ x : lim N →∞ N n=1

Proof. It follows from (8.7.2) and Lemma 8.7.2 that there exists a nonnegative P-integrable random variable such that |UM (t) − UN (t)| ≤ , 1/2 M α/2 N <M 0≤t≤1 (M − N) sup sup

(8.7.3)


P-almost surely. By the spectral inequality (Proposition 1.2.2), it follows that if · 2 denotes the standard norm on L2 (μ), kn kn +θn − N 0. Otherwise, P{X ≥ E X} = 0, and thus X ≤ E X a.s. Hence X ∞ = E X. But E ( X ∞ − X) = 0, whence X = X ∞ a.s. This contradicts our assumption, so P{X ≥ E X} > 0. 8.7.5 Lemma. Let (k, θ ) be a good pair. Suppose that (X, B, μ, T ) is a dynamical system, with T invertible, and let B ∈ B with μ(B) > 0. Let Bk denote T −k B for each integer k. Then for almost all θ with respect to P, there exists a subset R = Rk,θ of the natural numbers with d(R) ≥ μ(B) such that for each finite set F contained in R we have * μ Bkn +θn > 0. n∈F

Proof. Let be the universal measurable set of unit mass associated to the pair (k, θ ); is the set { < ∞} where is defined in (8.7.3). Then, for any ω ∈ , any dynamical system (X, B, μ, T ), and any f ∈ L2 (μ), N 1 μ x : lim f T kn +θn (ω) = E T f (x) = 1. N →∞ N n=1

(8.7.7)


We note that ∞ n E f (x)μ(dx) = P{θ1 = n} f T (x)μ(dx) = f dμ. X

X

n=0


(8.7.8)

X

Throughout the rest of the proof, we fix ω ∈ ∗ , and write more simply θn instead of θn (ω). Let B ∈ B with μ(B) > 0. Let also P (N) denote the collection of finite subsets of N. For any F ∈ P (N), let * BF = Bkn +θn , n∈F

and let NF = {x ∈ X : χBF (x) > χBF ∞ }. Here we have used χ to denote the indicator function. Now let + NF , N= F ∈P (N)

and let N =

+

T m N.

m∈N

If f = χA , then f ∞ = 0 (resp. f ∞ = 1) if μ(A) = 0 (resp. μ(A) > 0). Therefore, NF = BF if μ(BF ) = 0, and NF = ∅ if μ(BF ) > 0. So that the set N is exactly + + NF = BF . F :μ(BF )=0

F :μ(BF )=0

This in particular implies that N, and hence N is a null set. Put B = B ∩ N c. Define for x ∈ X the return times set Rx = {n ∈ N : x ∈ T −(kn +θn ) B }. By (8.7.7),

N 1 μ x : d(Rx ) = lim χBk +θ (x) = E T (χB )(x) = 1. n n N →∞ N n=1

As by (8.7.8),

X

E T (χB )(x)μ(dx) = μ(B), we deduce from Lemma 8.7.4 that μ x : E T (χB )(x) ≥ μ(B) > 0.


Thus, there exist x0 in X such that if R = Rx0 , then d(R) ≥ μ(B). We now prove that

*

Bkn +θn > 0

μ

n∈F

for each finite set F contained in R. First, observe that x0 ∈ BF = n∈F Bk n +θn . We claim that x0 ∈ / N. Indeed, since * * T kn +θn x0 ∈ B = B ∩ N c = B ∩ (T m N )c = B ∩ T mN c , m∈N

m∈N

we have T kn +θn x0 ∈ T m N c , which with the choice m = kn + θn , implies x0 ∈ N c . But * x ∈ X : |χBF (x)| ≤ χBF ∞ , Nc = F ∈P (N)

and χBF (x0 ) = 1, since x0 ∈ BF ⊂ BF . Hence χBF ∞ ≥ 1, which ensures that μ(BF ) > 0 as required. We now give the proof of the theorem. According to our assumption, there exists a sequence of finite intervals I = {In , n ≥ 1} with strictly increasing lengths such that |E ∩ In | . n→∞ |In |

b(E) = lim

Let denote {0, 1}Z . Consider the point ζ = {χE (n), −∞ < n < ∞} in and let T denote the two-sided shift on defined by ∞ T (xn )∞ −∞ = (xn+1 )−∞ .

Now let X denote the closure of the orbit {T m ζ : m ∈ Z} in the product topology on and let X0 denote the set {x ∈ X : x1 = 1}. Let δx be the Dirac mass on the point x, and let 1 μN = δT m ζ , |IN | m∈IN

By a known argument (Furstenberg [1981: 73]), there exists a probability measure μ supported on X and preserved by T which is a weak star limit of the sequence of measures {μN , N ≥ 1}. In addition, passing to a subsequence {INs , s ≥ 1} if necessary, for every continuous function on , f dμ = lim f dμNs .

s→∞


This means that 1 δT m ζ (X0 ) = b(E) > 0. s→∞ |IN |

μ(X0 ) = lim μNs (X0 ) = lim s→∞

m∈IN

By Lemma 8.7.5 we have * * * μ(X0 T −kn1 −θn1 X0 ··· T −knr −θnr X0 ) * * * = lim μNs (X0 T −kn1 −θn1 X0 ··· T −knr −θnr X0 ) s→∞

* * * 1 δT m ζ (X0 T −kn1 −θn1 X0 ··· T −knr −θnr X0 ) s→∞ |IN | m∈IN * * * = b(E (E + kn1 + θn1 ) · · · (E + knr + θnr )) > 0

= lim

as required for every finite subset {n1 , . . . , nr } of R, thereby concluding the proof of Theorem 8.7.1.

8.8 An application to the discrepancy of some random sequences

In this section, we give another application by estimating the discrepancy of $\{\{nx\},\ n\ge 1\}$ when n is sampled by a random walk. Several examples involving the diophantine approximation properties of x are further considered. The metric entropy method is combined here with the Erdős–Turán inequality. For a real x, let ‖x‖ denote the distance from x to the nearest integer, namely,
$$\|x\|=\min_{m\in\mathbb Z}|x-m|=\min\big(\{x\},\,1-\{x\}\big),$$
where {x} denotes the fractional part of x. Now, let ψ be a nondecreasing positive function, defined at least on positive integers. An irrational number y is of type < ψ if q‖qy‖ ≥ 1/ψ(q) for all positive integers q. If ψ is a constant function, then an irrational number y of type < ψ is also called of constant type. Let η be a positive real number (or infinity). The irrational number y is of type η if η is the supremum of all γ for which
$$\liminf_{q\to\infty,\ q\ \mathrm{integer}}q^{\gamma}\|qy\|=0.$$
It is classical (Dirichlet's theorem) that we have $\liminf_{q\to\infty}q^{\gamma}\|qy\|=0$ for any γ < 1 and for any irrational y. Therefore the type of an irrational number is always greater than or equal to 1.
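To make the definition of type concrete, the following sketch evaluates q‖qy‖ over a finite range of q for the golden ratio, a standard example of an irrational with bounded partial quotients. The helper names and the cut-off q ≤ 1000 are illustrative assumptions; a finite minimum is only a sanity check, not a proof that q‖qy‖ stays bounded below for all q.

```python
import math


def dist_to_nearest_int(x):
    """Distance ||x|| from the real number x to the nearest integer."""
    frac = x - math.floor(x)
    return min(frac, 1.0 - frac)


def min_scaled_distance(y, q_max):
    """Minimum of q * ||q y|| over the integers 1 <= q <= q_max."""
    return min(q * dist_to_nearest_int(q * y) for q in range(1, q_max + 1))


golden = (1 + math.sqrt(5)) / 2
lower = min_scaled_distance(golden, 1000)
```

Here `lower` is about 0.382 and is attained at q = 1, which is consistent with the golden ratio being of type < ψ for a constant ψ, that is, of constant type.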


For a sequence $s=\{s_n,\ n\ge 1\}$ of real numbers, the discrepancy of s modulo 1 is defined by
$$N\,D_N(s)=\sup_{I\subset[0,1[}\Big|\sum_{\substack{n=1\\ s_n\in I}}^{N}1-N|I|\Big|.$$
Recall the Erdős–Turán inequality (for a proof see e.g., Harman [1998: Theorem 5.5]): there exists an absolute constant C such that for any positive integers L and N,
$$N\,D_N(s)\le\frac{N}{L+1}+C\sum_{h=1}^{L}\frac{1}{h}\Big|\sum_{n=1}^{N}e^{2i\pi hs_n}\Big|. \qquad (8.8.1)$$
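The two quantities related by the Erdős–Turán inequality can be computed for a finite sequence. The sketch below is a simplification under stated assumptions: it computes the star discrepancy, which only uses intervals [0, a[ anchored at 0 and is therefore at most the discrepancy defined above and at least half of it, together with the weighted exponential sums $\sum_{h\le L}\frac1h|\sum_{n\le N}e^{2i\pi hs_n}|$. The absolute constant C is not specified in the text, so no numerical bound is assembled; the function names are hypothetical.

```python
import cmath
import math


def star_discrepancy(points):
    """sup over 0 <= a <= 1 of |#{n : x_n < a} / N - a| for points x_n in [0, 1)."""
    xs = sorted(points)
    n = len(xs)
    # For sorted points the supremum is attained next to one of the points.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def weighted_exponential_sums(seq, L):
    """sum_{h=1}^{L} (1/h) * |sum_n exp(2 i pi h s_n)|."""
    total = 0.0
    for h in range(1, L + 1):
        inner = sum(cmath.exp(2j * math.pi * h * x) for x in seq)
        total += abs(inner) / h
    return total


midpoints = [(2 * i + 1) / 20 for i in range(10)]  # centred grid, N = 10
lattice = [n / 16 for n in range(1, 17)]           # s_n = n / N with N = 16
```

For the centred grid the star discrepancy is 1/(2N) = 0.05, and for the lattice the exponential sums vanish for every h < 16, so both sides of the inequality are small, as expected for evenly spread points.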

Now let X be a Z-valued random variable, with characteristic function ϕ, and let X = {Xn , n ≥ 1} be a sequence of independent copies of X. Put for any positive integer n, Sn = X1 + · · · + Xn , S0 = 0. Fix some x ∈ [0, 1[, and consider the sequence x = {Sn x}, n ≥ 1 . Results concerning the uniform distribution modulo 1 of the sequence x are in Holewijn [1973], Robbins [1973], Schatte [1984,1988], and in Kesten [1964] a variant of the problem is considered. The main result of the section is the theorem below giving an estimate of the discrepancy of the sequence x. The diophantine approximation properties of x are naturally involved there. Before stating the main result of this section, we shall introduce an extra function. Let : R+ → R+ be nondecreasing and such that for any m ∈ N, m h=1

1 ≤ (m). h|ϕ(hx) − 1|

8.8.1 Theorem. Let L : R+ → R+ be nondecreasing and such that L is concave. For any τ > 3/2, a.s.

DN (x) = O

$

%1/2

1 L(N ) + log L(N ) L(N ) N

logτ N .

(8.8.2)

8.8.2 Remarks. 1. Theorem 8.8.1 completes some results of Schatte [1988], in which only the non-lattice case is treated. Schatte considered sums Zn = Y1 + · · · + YN (mod 1), where Y1 , Y2 , . . . are independent copies of a random variable with values in [0, 1[ . Let ζ = {Zn , n ≥ 1}. Under the condition sup P(Zn < x) − x = O(n−3/2 ), (8.8.3) 0≤x≤1

it is proved in Schatte [1988: Theorems 1 and 2] that 2 a.s.

DN (ζ ) = O N −1/2 log N .

(8.8.4)


And if

sup P(Zn < x) − x = O(n−4 ),


(8.8.5)

0≤x≤1

a law of the iterated logarithm holds: let X0 be a random variable with values in [0, 1[, which is independent of the sequence Y1 , Y2 , . . . . Then n 1 a.s. lim sup √ 1[0,u( (X0 + Zj ) − nu = σ (u), n log log n n→∞

(8.8.6)

j =1

where σ 2 (u) = u − u2 + 2

∞

E 1[0,u( (U )1[0,u( (U + Zj ) − u2 < ∞

(8.8.7)

j =1

and U is a uniformly distributed random variable, which is independent of the sequence Y1 , Y2 , . . . . Schatte’s conditions are always satisfied when the distribution function of X1 possesses an absolute component; they are equally satisfied for some singular and discrete distribution functions, but not for lattice distributions. These results also remain valid when Y1 , Y2 , . . . are independent copies of a random variable with values in [0, y[, y being an arbitrary positive real and Zn = Y1 + · · · + YN (mod y). A general remark can however be made on this point. By putting Sn = nj=1 [Xi ]+ n n j =1 {Xi }, we have that {Sn y} = {Zn + j =1 [Xi ]y}. By considering characteristic functions, we easily see that there is no reason in general for the discrepancies of ({Sn y})n and (Zn )n to be comparable. In the following examples, we deduce from Theorem 8.8.1 discrepancy results for lattice random variables, depending on the diophantine approximation properties of x. 2. Assume that X is Z-valued and that there exists a constant C < ∞ such that for any t ∈ [−1/2, 1/2[, 1 − ϕ(t) ≥ C|t|. Since ϕ(hx) = ϕ(hx), we therefore have m h=1

1 1 =O . h|1 − ϕ(hx)| hhx m

h=1

If x is of irrational type < B, then (Kuipers–Niederreiter [1971: Lemma 3.3]) for any ε > 0 m 1 (8.8.8) = O(mB−1+ε ). hhx h=1

Let x be of irrational type < B and take (m) = mB−1+ε . We choose L(m) := mα/(B−1+ε) for some 0 < α < 1. Then L(m) = mα . We deduce from Theorem 8.8.1, for any σ > 2, DN (x) = O(N −α/(B−1+ε) + N (α−1)/2 logσ N ). a.s.


And, by taking α = (B − 1)/(B + 1), for any σ > 2,
$$D_N(x)\overset{a.s.}{=}O\big(N^{-1/(1+B)}\log^{\sigma}N\big). \qquad (8.8.9)$$

3. If X is a Bernoulli sequence and x is of type < B, then (8.8.9) is fulfilled. This is to be compared with the well-known fact that the discrepancy N (x) of the sequence

a.s. {nx} satisfies N (x) = O N −1/B+ε , for any ε > 0. There is a moderated loss of precision in that limit case. 4. Assume that: a

a) X is Z-valued with E X = 0, and E X2 log+ |X| < ∞, for some a > 0. b) x is irrational, and the partial quotients of the continued fraction expansion of x are bounded by a fixed number M. 2 According to Theorem 9.3.4 in Kawata [1972], 1 − $ϕ(t) = 21 (E X2 )t 2 + O | logt |t| |a as t → 0. And thus, m h=1

1 1 =O . h|1 − ϕ(hx)| hhx2 m

h=1

In view of Haber–Osgood [1969: 385], for any t > 1, C1 mt ≤

m h=1

1 ≤ C2 mt , hxt

where the constants C1 , C2 depend on M and t only. Then m h=1

1 ≤ C3 m2 , hhx2

where C3 depends on M and X only. We can thus choose (m) = m2 , and L(m) =

a.s. mα/2 , 0 < α < 1. Applying (8.8.2) gives, for any σ > 2, DN (x) = O N −α/2 + N (α−1)/2 logσ N . Taking α = 1/2, we obtain: Under conditions a) and b), for any σ > 2, a.s.

DN (x) = O N −1/4 logσ N .

(8.8.10)

The proof of Theorem 8.8.1 follows from a series of lemmas. Put for any integers N ≥ 1, m ≥ 0, N (m) =

N

e2iπ mSn x .

(8.8.11)

n=1

8.8.3 Lemma. For any two integers N ≥ P ≥ 1, one has the following estimate: 2 E N (m) − P (m) ≤

7(N − P ) ∧ (N − P )2 . |ϕ(mx) − 1|

(8.8.12)
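The estimate (8.8.12) can be checked numerically. Writing Q = N − P and φ = ϕ(mx), expanding the square and using the independence of the increments shows that the left-hand side equals $Q+2\Re\sum_{d=1}^{Q-1}(Q-d)\varphi^d$. One closed form for this sum, obtained by summing geometric series, is $\sum_{d=1}^{Q-1}(Q-d)z^d=\frac{z(z^Q-1)}{(z-1)^2}-\frac{Qz}{z-1}$ for z ≠ 1. The sketch below compares the direct sum, this closed form and the bound of the lemma; the function names and the choice of X uniform on {0, 1} are assumptions made for illustration.

```python
import cmath
import math


def second_moment_direct(phi, Q):
    """Q + 2 Re sum_{d=1}^{Q-1} (Q - d) phi^d."""
    return Q + 2 * sum((Q - d) * phi ** d for d in range(1, Q)).real


def second_moment_closed(phi, Q):
    """Same quantity via z (z^Q - 1) / (z - 1)^2 - Q z / (z - 1), valid for z != 1."""
    closed = phi * (phi ** Q - 1) / (phi - 1) ** 2 - Q * phi / (phi - 1)
    return Q + 2 * closed.real


def lemma_bound(phi, Q):
    """min(7 Q / |phi - 1|, Q^2), the right-hand side of (8.8.12)."""
    return min(7 * Q / abs(phi - 1), Q ** 2)


def bernoulli_cf(t):
    """E exp(2 i pi t X) for X uniform on {0, 1} (an illustrative choice of X)."""
    return (1 + cmath.exp(2j * math.pi * t)) / 2


x = (math.sqrt(5) - 1) / 2  # an irrational x in [0, 1[
checks = [
    (second_moment_direct(bernoulli_cf(m * x), Q), lemma_bound(bernoulli_cf(m * x), Q))
    for m in range(1, 6)
    for Q in (1, 2, 10, 50)
]
```

In every pair of `checks` the first entry (the exact second moment) stays below the second (the bound), in line with the lemma.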


Proof. An elementary computation shows for integers N > P ≥ 1 that 2 E N (m) − P (m) " # " = (N − P ) + (N − P − 1) ϕ(mx) + ϕ(−mx) + (N − P − 2) ϕ(mx)2 # " # + ϕ(−mx)2 + · · · + ϕ(mx)N −P −1 + ϕ(−mx)N −P −1 . But

ϕ(mx)k + ϕ(−mx)k = E eimxSk + e−imxSk = 2E cos mxSk = 2$(E eimxSk ) = 2$(ϕ(mx)k ),

for any k ≥ 1. And thus, 2 E N (m) − P (m) = (N − P ) + 2$ N − P − 1)ϕ(mx)

+ (N − P − 2)ϕ(mx)2 + · · · + ϕ(mx)N −P −1 .

For any z ∈ C and Q ∈ N,

Q−1 d=1

(Q − d)zd = QzQ−1 −

Q z−1

+

zQ −1 . (z−1)2

Therefore,

2 E N (m) − P (m) = (N − P ) + 2$ N − P )ϕ(mx)N −P −1 (N − P ) ϕ(mx)N −P − 1 − + ϕ(mx) − 1 (ϕ(mx) − 1)2 7(N − P ) 2 ∧ (N − P ) . ≤ |ϕ(mx) − 1|

This proves the lemma. Put now for any positive integer n, Un =

L(n) h=1

1 n (h). h

(8.8.13)

8.8.4 Lemma. For any two integers n > l ≥ 1, L(l) 2 1 E Un − Ul ≤ 14 (n − l)(1 + log L(l)) h ϕ(hx) − 1 h=1 L(n) + n log L(l)

1 . hϕ(hx) − 1 h=L(l)+1 L(n)

(8.8.14)

Proof. Plainly L(l) 1

Ul − Un = |l (h)| − |n (h)| − h h=1

L(n) h=L(l)+1

1 |n (h)| := A − B. h


By the Cauchy–Schwarz inequality, and by Lemma 8.8.3, L(l) L(l) 2 1 1 EA ≤ E l (h) − n (h) h h 2

h=1

h=1

L(l) L(l) 1 1 , ≤ 7(n − l) h h ϕ(hx) − 1 h=1

EB ≤ 2

L(n) h=L(l)+1

≤ 7n

h=1

1 E h

L(n) h=L(l)+1

1 h

L(n) h=L(l)+1

2 1 n (h) h

L(n)

1 . ϕ(hx) − 1 h h=L(l)+1

Lemma 8.8.4 thus follows. Put := L; then for any n > l ≥ 1,

Un − Ul 22

L(n) ≤ 14 (n − l)(l)(1 + log L(l)) + n (n) − (l) log L(l) ≤ 14(n − l)(n) log eL(n),

since by concavity assumption of ,

(n)−(l) n−l

(n) n .

≤

8.8.5 Proposition. For any τ > 3/2, #1/2 τ a.s. " Un = O (n)n log L(n) log n .

(8.8.15)

Proof. By the remark made above, for any n > l ≥ 1,

Un − Ul 22 ≤ 14(n − l)(n) log eL(n),

Un 22 ≤ 14n(n) log eL(n).

Let a > 1/2. By Tchebycheff’s inequality, " #1/2 a P |U2p | > (2p )2p log eL(2p ) p ≤ Cp−2a , and by the first form of the Borel–Cantelli lemma, " #1/2 a a.s. |U2p | = O( (2p )2p log L(2p ) p ). p , 2p+1 [. Now, examine the oscillation of Un over the interval 2 [2 " #1/2 p p p Put Un = Un / (2 )2 log eL(2 ) . Then E Un − Ul ≤ C ( n−l 2p ). Applying Lemma 8.3.3 gives U − U ≤ Cp. sup n l 2 2p ≤n,m 3/2. By the Tchebycheff inequality, P

sup 2p ≤n,m 3/2, " #1/2 τ a.s.

NDN (x) = O N/L(N ) + (N)N log L(N ) log N ,

(8.8.17)

which proves our claim.

8.9 An application to random Dirichlet polynomials

We close this chapter by giving an application of the metric entropy method to the study of the supremum of some classes of random Dirichlet polynomials. We begin with some general considerations. Let $\{d_n,\ n\ge 1\}$ be a sequence of real numbers. Let s = σ + it denote a complex number. The supremum of the Dirichlet polynomials $P(s)=\sum_{n=2}^{N}d_nn^{-s}$ over lines {s = σ + it, t ∈ R} is naturally related to that of the corresponding Dirichlet series, via the abscissa of uniform convergence
$$\sigma_u=\inf\Big\{\sigma:\ \sum_{n=2}^{\infty}d_nn^{-\sigma-it}\ \text{converges uniformly over } t\in\mathbb R\Big\},$$
through the relation
$$\sigma_u=\limsup_{N\to\infty}\frac{\log\sup_{t\in\mathbb R}\big|\sum_{n=2}^{N}d_nn^{-it}\big|}{\log N}.$$
We refer to Bohr [1952], Helson [1967] or Hardy and Riesz [1915] for this background and related results. This naturally justifies the investigation of the supremum of Dirichlet polynomials. Studies of random Dirichlet polynomials and random Dirichlet series were developed notably in Halász [1983] and Queffélec [1980], [1983], [1995]; see also Lifshits–Weber [2007], [2009a] and references therein. Such investigations concerning random


Dirichlet series and random power series go back to earlier works of Hartman [1939], Clarke [1969], Dvoretzky–Erdős [1955], [1959]. Let us indicate some useful general results. For instance, let $\xi=\{\xi,\xi_n,\ n\ge 1\}$ be a sequence of i.i.d. random variables and let $\sigma_c$ and $\sigma_a$ be, respectively, the almost sure abscissa of convergence and of absolute convergence of the Dirichlet series $\sum_{n=1}^{\infty}\xi_nn^{-s}$. If ξ ≠ 0 holds with positive probability, let $k_\xi:=\sup\{\gamma:\ E\,|\xi|^{\gamma}<\infty\}$. The connection between the abscissas $\sigma_c$ and $\sigma_a$ and the integrability of ξ has been clarified in [Clarke: 1969]. We have the implications:
$$\begin{array}{lcl}
k_\xi=0 & \Longrightarrow & \sigma_a=\sigma_c=\infty\\
0<k_\xi\le 1 & \Longrightarrow & \sigma_a=\sigma_c=1/k_\xi\\
(k_\xi>1\ \text{and}\ E\,\xi\ne 0) & \Longrightarrow & \sigma_a=\sigma_c=1\\
(k_\xi>1\ \text{and}\ E\,\xi=0) & \Longrightarrow & \sigma_a=1\ \text{and}\ \sigma_c=\max(1/k_\xi,\,1/2).
\end{array} \qquad (8.9.1)$$

Now let $\varepsilon = \{\varepsilon_i, i \ge 1\}$ be a sequence of independent Rademacher random variables ($P\{\varepsilon_i = \pm 1\} = 1/2$) defined on a basic probability space $(\Omega, \mathcal{A}, P)$. The following result is due to Bayart, Konyagin and Quéffelec [2003/2004]. Let $\{a_n, n \ge 1\}$ be a sequence of complex numbers, then:

– If $\limsup_{N\to\infty} \frac{1}{\log\log N} \sum_{n=1}^{N} |a_n|^2 = \gamma > 0$, then for almost all choices of signs $\varepsilon_n = \pm 1$, the series $\sum_{n=1}^{\infty} \varepsilon_n a_n n^{it}$ diverges for each $t \in \mathbb{R}$.

– The result is nearly optimal: if $0 < \delta_n \to 0$, there exists a sequence $\{a_n, n \ge 1\}$ such that $\limsup_{N\to\infty} \frac{\delta_N}{\log\log N} \sum_{n=1}^{N} |a_n|^2 > 0$, but for each $\omega$, the series $\sum_{n=1}^{\infty} \varepsilon_n(\omega) a_n n^{it}$ converges for at least one $t \in \mathbb{R}$.

In relation with the above, we may quote Hedenmalm and Saksman's extension [2003] of Carleson's result, namely the convergence for almost all $t$ of the Dirichlet series

$$\sum_{n=1}^{\infty} \varepsilon_n(\omega) a_n n^{-1/2+it}$$

under the assumption $\sum_{n=1}^{\infty} |a_n|^2 < \infty$. A simple and elegant proof is given in Konyagin and Quéffelec [2001/2002, p. 158/159]. The growth of random Dirichlet series was studied in [Yu: 1978/95]. Now consider the random Dirichlet polynomials

$$D(s) = \sum_{n=2}^{N} \varepsilon_n d_n n^{-\sigma - it}.$$

(8.9.2)

When $d_n \equiv 1$, some results are known. If $\sigma = 0$, then for some absolute constant $C$, and all integers $N \ge 2$,

$$C^{-1} \frac{N}{\log N} \le E \sup_{t\in\mathbb{R}} \Big|\sum_{n=2}^{N} \varepsilon_n n^{-it}\Big| \le C \frac{N}{\log N}.$$

(8.9.3)


This has been proved by Halász and was later extended by Queffélec to the range of values $0 \le \sigma < 1/2$. Queffélec provided a probabilistic proof of the original result, using Bernstein's inequality for polynomials. For some constant $C_\sigma$ depending on $\sigma$ only, and all integers $N \ge 2$,

$$C_\sigma^{-1} \frac{N^{1-\sigma}}{\log N} \le E \sup_{t\in\mathbb{R}} \Big|\sum_{n=2}^{N} \varepsilon_n n^{-\sigma - it}\Big| \le C_\sigma \frac{N^{1-\sigma}}{\log N}.$$

(8.9.4)

A basic reduction step is used for establishing these results. We first introduce a useful notion. A set of numbers $\varphi_1, \varphi_2, \ldots, \varphi_k$ is linearly independent if no linear relation $a_1\varphi_1 + a_2\varphi_2 + \ldots + a_k\varphi_k = 0$, with integral coefficients, not all zero, holds between them. For a proof of the classical result below, we refer to Hardy and Wright [1979; Theorem 442].

Kronecker's theorem. If $\varphi_1, \varphi_2, \ldots, \varphi_k, 1$ are linearly independent, $\theta_1, \theta_2, \ldots, \theta_k$ are arbitrary, and $N$, $\varepsilon$ are positive, then there are integers $n > N$, $n_1, n_2, \ldots, n_k$ such that

$$\max_{1\le m\le k} |n\varphi_m - n_m - \theta_m| < \varepsilon.$$

Consequently, the set of points $(\{n\varphi_1\}, \{n\varphi_2\}, \ldots, \{n\varphi_k\})$ is dense in $\mathbb{T}^k$. Let $p_1, p_2, \ldots, p_k$ be different primes. By the fundamental theorem of arithmetic, $\log p_1, \log p_2, \ldots, \log p_k$ are linearly independent. This will enable us to replace the Dirichlet polynomial by some relevant trigonometric polynomial. We introduce the necessary notation. Let $2 = p_1 < p_2 < \cdots$ be the sequence of consecutive primes. If $n = \prod_{j=1}^{\tau} p_j^{a_j(n)}$, we write $a(n) = \{a_j(n), 1 \le j \le \tau\}$. Let $\pi(N)$ denote, as usual, the number of prime numbers that are less than or equal to $N$. Let us fix $N$, put $\mu = \pi(N)$, and define, for $z = (z_1, \ldots, z_\mu) \in \mathbb{T}^\mu$,

$$Q(z) = \sum_{n=2}^{N} d_n n^{-\sigma} e^{2i\pi \langle a(n), z\rangle}.$$

H. Bohr's observation states that

$$\sup_{t\in\mathbb{R}} \Big|\sum_{n=2}^{N} d_n n^{-(\sigma+it)}\Big| = \sup_{z\in\mathbb{T}^\mu} |Q(z)|.$$

(8.9.5)
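The easy half of Bohr's observation can be checked numerically: for every real $t$, the point $z_j = -t \log p_j / (2\pi) \pmod 1$ of the torus reproduces the value of the Dirichlet polynomial, since $n^{-it} = e^{2i\pi\langle a(n), z\rangle}$. The sketch below is only an illustration; the helper names (`primes_up_to`, `exponent_vector`, `dirichlet_value`, `torus_value`) and the numerical choices $N = 30$, $\sigma = 0.25$, $t = 3.7$ are assumptions made here and do not come from the text.

```python
import cmath
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            for m in range(p * p, n + 1, p):
                sieve[m] = False
    return [p for p in range(2, n + 1) if sieve[p]]

def exponent_vector(n, primes):
    """a(n) = (a_1(n), ..., a_mu(n)) with n = prod p_j^{a_j(n)}."""
    a = []
    for p in primes:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        a.append(e)
    return a

def dirichlet_value(d, sigma, t):
    """sum_n d_n n^{-(sigma + it)} for a dict d: n -> d_n."""
    return sum(dn * n ** (-sigma) * cmath.exp(-1j * t * math.log(n))
               for n, dn in d.items())

def torus_value(d, sigma, z, primes):
    """Q(z) = sum_n d_n n^{-sigma} exp(2 i pi <a(n), z>)."""
    total = 0j
    for n, dn in d.items():
        a = exponent_vector(n, primes)
        phase = 2 * math.pi * sum(aj * zj for aj, zj in zip(a, z))
        total += dn * n ** (-sigma) * cmath.exp(1j * phase)
    return total

N = 30
primes = primes_up_to(N)
d = {n: 1.0 for n in range(2, N + 1)}
sigma, t = 0.25, 3.7
# Point of the torus attached to the real number t.
z = [(-t * math.log(p) / (2 * math.pi)) % 1.0 for p in primes]
lhs = dirichlet_value(d, sigma, t)
rhs = torus_value(d, sigma, z, primes)
```

This only shows that the supremum over $t$ is at most the supremum over the torus; the reverse inequality is where Kronecker's theorem is needed.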

Remark. Naturally, no similar reduction occurs when considering the supremum over a given bounded interval $I$. However, when the length of $I$ is of exponential size with respect to the degree of $P$, precisely when $|I| \ge e^{(1+\varepsilon)\omega N (\log N\omega) \log N}$,


the related supremum becomes comparable, for $\omega$ large, to the one taken on the real line, with an error term of order $O(\omega^{-1})$. This is in turn a rather general phenomenon due to the existence of "localized" versions of Kronecker's theorem; and in the present case to Turán's estimate (see [Weber: 2008] for a slightly improved form of it using a probabilistic approach, and references therein). When the length is of sub-exponential order, the study however still belongs to the field of application of the general theory of regularity of stochastic processes.

Now consider the following natural extension. For any integer $n > 1$, let $P^+(n)$ denote the largest prime divisor of $n$. Let $1 \le M < N$ be two positive integers and define

$$S(N, M) = \{2 \le n \le N : P^+(n) \le M\}.$$

Since $S(N, N) = [2, N]$, these sets naturally generalize the notion of interval of integers. By using the standard notation $\Psi(N, M) := \#(S(N, M))$, $u = (\log N)/\log M$, we have (see Tenenbaum [1990: 405])

$$\Psi^*(N, M) := \frac{\Psi(N, M)}{N} = \rho(u) + O\Big(\frac{1}{\log M}\Big),$$

(8.9.6)

uniformly for $N \ge M \ge 2$, where $\rho(u)$ is the Dickman function, namely the unique continuous function on $[0, \infty[$, having a derivative on $]0, \infty[$, and such that

$$\rho(v) = 1, \quad 0 \le v \le 1, \qquad v\rho'(v) + \rho(v-1) = 0, \quad v > 1.$$

It is known that $\rho(u) > 0$ for all $u > 0$. By setting $M = N^\varepsilon$ in (8.9.6) we see that $\Psi(N, N^\varepsilon) \sim N\rho(\varepsilon^{-1})$ for any fixed $0 < \varepsilon \le 1$. In view of (8.9.6), we shall refer to $\Psi^*$ as a Dickman-type function. Fix some positive integer $\tau \le \pi(N)$ and put

$$E_\tau = E_\tau(N) = \{2 \le n \le N : P^+(n) \le p_\tau\}.$$

Note that for $\tau = \mu$ we have $E_\mu = \{2, \ldots, N\}$. The $E_\tau$-based Dirichlet polynomials were considered in [Quéffelec: 1995].

8.9.1 Theorem. (a) Upper bound. Let $0 \le \sigma < 1/2$. Then there exists a constant $C_\sigma$ such that for any integer $N \ge 2$ it is true that

$$E \sup_{t\in\mathbb{R}} \Big|\sum_{n\in E_\tau} \varepsilon_n n^{-\sigma - it}\Big| \le
\begin{cases}
C_\sigma \dfrac{N^{1/2-\sigma}\tau^{1/2}}{(\log N)^{1/2}} & \text{if } N^{1/2} \le \tau \le N,\\[2mm]
C_\sigma \dfrac{N^{3/4-\sigma}}{(\log N)^{1/2}} & \text{if } \dfrac{N^{1/2}}{\log N} \le \tau \le N^{1/2},\\[2mm]
C_\sigma N^{1/2-\sigma}\tau^{1/2} & \text{if } 1 \le \tau \le \dfrac{N^{1/2}}{\log N}.
\end{cases}$$


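(b) Lower bound. Let $0 \le \sigma < 1/2$. Then there exists a constant $C_\sigma$ such that for every $N \ge 2$,

$$E \sup_{t\in\mathbb{R}} \Big|\sum_{n\in E_\tau} \varepsilon_n n^{-\sigma - it}\Big| \ge C_\sigma \frac{N^{1/2-\sigma}\tau^{1/2}}{(\log \tau)^{1/2}} \cdot \Psi^*\Big(\frac{N}{p_\tau}, p_{\tau/2}\Big)^{1/2}.$$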

Proof of the upper bound in Theorem 8.9.1. The principle of the proof of the upper bound is as follows. Once we have reduced the question to the study of a random polynomial $Q$ on the multidimensional torus by using (8.9.5), the proof consists of two different steps based on a decomposition $Q = Q_1 + Q_2$. Our study of the supremum of the polynomial $Q_1$ is made by using the metric entropy method. Our investigation of the supremum of the polynomial $Q_2$ is undertaken by using first the contraction principle, reducing the study to that of a complex-valued Gaussian process. The latter task is carried out by means of Slepian's comparison lemma, and by a careful study of the $L^2$-metric induced by this process.

Now, we turn to the rigorous proof of the upper bound and introduce some notation. We can represent $E_\tau$ as the union of disjoint sets

$$E_j = \{2 \le n \le N : P^+(n) = p_j\}, \quad j = 1, \ldots, \tau.$$

For $z \in \mathbb{T}^\tau$ we put

$$Q(z) = \sum_{j=1}^{\tau} \sum_{n\in E_j} \varepsilon_n n^{-\sigma} e^{2i\pi\langle a(n), z\rangle}.$$

By (8.9.5) we have

$$\sup_{t\in\mathbb{R}} \Big|\sum_{j=1}^{\tau} \sum_{n\in E_j} \varepsilon_n n^{-\sigma - it}\Big| = \sup_{z\in\mathbb{T}^\tau} |Q(z)|.$$

Let $1 \le \nu < \tau$ be fixed. Write $Q = Q_1 + Q_2$ where

$$Q_1(z) = \sum_{P^+(n)\le p_\nu} \varepsilon_n n^{-\sigma} e^{2i\pi\langle a(n), z\rangle}, \qquad Q_2(z) = \sum_{p_\nu < P^+(n) \le p_\tau} \varepsilon_n n^{-\sigma} e^{2i\pi\langle a(n), z\rangle}.$$

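$N/p_\tau$. Then take the unique $k \in (\nu, \tau]$ such that $N/p_k < m \le N/p_{k-1}$. We have

$$\begin{aligned}
K_m^2 = \sum_{\nu<j\le k-1} (mp_j)^{-2\sigma} &\le m^{-2\sigma} \sum_{j\le k-1} p_j^{-2\sigma} \le C_\sigma m^{-2\sigma} \sum_{j\le k} (j\log j)^{-2\sigma}\\
&\le C_\sigma m^{-2\sigma} \frac{k^{1-2\sigma}}{(\log k)^{2\sigma}} \le C_\sigma m^{-2\sigma} \frac{k}{p_k^{2\sigma}} \le C_\sigma m^{-2\sigma} \frac{k}{(N/m)^{2\sigma}} = C_\sigma \frac{k}{N^{2\sigma}}.
\end{aligned}$$

Since $k \log k \le C p_k \le C \frac{N}{m}$, we have

$$k \le C \frac{N}{m} \Big(\log \frac{N}{m}\Big)^{-1}.$$

We arrive at $K_m \le C_\sigma N^{-\sigma} \big(\frac{N}{m}\big)^{1/2} \big(\log \frac{N}{m}\big)^{-1/2}$. It follows that

$$\begin{aligned}
\sum_{m\le N/p_\nu} K_m &\le C_\sigma N^{-\sigma} \sum_{m\le N/p_\nu} \Big(\frac{N}{m}\Big)^{1/2} \Big(\log \frac{N}{m}\Big)^{-1/2}\\
&\le C_\sigma N^{1-\sigma} \int_0^{1/p_\nu} u^{-1/2} (\log(1/u))^{-1/2}\, du\\
&\le C_\sigma N^{1-\sigma} p_\nu^{-1/2} (\log p_\nu)^{-1/2} \le \frac{C_\sigma N^{1-\sigma}}{\nu^{1/2} \log \nu}.
\end{aligned}$$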

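Now define a second Gaussian process by putting, for all $\gamma \in G$,

$$Y(\gamma) = \sum_{\nu<j\le\tau} \Big(\frac{N^{1-2\sigma}}{p_j}\Big)^{1/2} \alpha_j \xi'_j + \sum_{m\le N/p_\nu} K_m \beta_m \xi''_m := Y'_\gamma + Y''_\gamma,$$

where $\xi'_i$, $\xi''_j$ are independent $N(0, 1)$ random variables. It follows from (8.9.7) and (8.9.8) that for some suitable constant $C_\sigma$, one has the comparison relations: for all $\gamma, \gamma' \in G$,

$$\|X_\gamma - X_{\gamma'}\|_2 \le C_\sigma \|Y_\gamma - Y_{\gamma'}\|_2.$$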


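By virtue of the comparison Lemma 10.2.3, since $X_0 = Y_0 = 0$, we have

$$E \sup_{\gamma\in G} |X_\gamma| \le 2 E \sup_{\gamma\in G} X_\gamma \le 2C_\sigma E \sup_{\gamma\in G} Y_\gamma \le 2C_\sigma E \sup_{\gamma\in G} |Y_\gamma|.$$

It remains to evaluate the supremum of $Y$. First of all,

$$E \sup_{\gamma\in G} |Y'(\gamma)| \le N^{\frac12-\sigma} \sum_{\nu<j\le\tau} p_j^{-1/2}.$$

By (8.9.10), we have

$$\sum_{\nu<j\le\tau} p_j^{-1/2} \le \sum_{1<j\le\tau} p_j^{-1/2} \le \frac{C\tau^{1/2}}{(\log \tau)^{1/2}},$$

thus

$$E \sup_{\gamma\in G} |Y'(\gamma)| \le C N^{\frac12-\sigma} \frac{\tau^{1/2}}{(\log \tau)^{1/2}}.$$

(8.9.10)

To control the supremum of $Y''$, we use our estimates for the sums of $K_m$ and write

$$E \sup_{\gamma\in G} |Y''(\gamma)| \le \sum_{m\le N/p_\nu} K_m \le C_\sigma \Big(\frac{N^{1-\sigma}}{\nu^{1/2}\log \nu} + \frac{N^{1-\sigma}}{\tau^{1/2}\log \tau}\Big) \le \frac{C_\sigma N^{1-\sigma}}{\nu^{1/2}\log \nu}.$$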

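(8.9.11)

Now, we turn to the supremum of $Q_1$. Towards this aim, introduce the auxiliary Gaussian process

$$\Upsilon(z) = \sum_{P^+(n)\le p_\nu} n^{-\sigma} \big\{\theta_n \cos 2\pi\langle a(n), z\rangle + \theta'_n \sin 2\pi\langle a(n), z\rangle\big\}, \quad z \in \mathbb{T}^\nu,$$

where $\theta_i$, $\theta'_j$ are independent $N(0, 1)$ random variables. By symmetrization,

$$E \sup_{z\in\mathbb{T}^\nu} |Q_1(z)| \le \sqrt{8\pi}\, E \sup_{z\in\mathbb{T}^\nu} |\Upsilon(z)|,$$

so that we are again led to evaluating the supremum of a real-valued Gaussian process. For $z, z' \in \mathbb{T}^\nu$ put $\|\Upsilon(z) - \Upsilon(z')\|_2 := d(z, z')$, and observe that

$$d(z, z')^2 = 4 \sum_{n:\,P^+(n)\le p_\nu} \frac{1}{n^{2\sigma}} \sin^2\big(\pi\langle a(n), z - z'\rangle\big)$$

(8.9.12)

$$\begin{aligned}
&\le 4\pi^2 \sum_{n:\,P^+(n)\le p_\nu} \frac{1}{n^{2\sigma}} \big|\langle a(n), z - z'\rangle\big|^2
\le 4\pi^2 \sum_{n:\,P^+(n)\le p_\nu} n^{-2\sigma} \Big[\sum_{j=1}^{\nu} a_j(n) |z_j - z'_j|\Big]^2\\
&= 4\pi^2 \sum_{n:\,P^+(n)\le p_\nu} \ \sum_{j_1,j_2=1}^{\nu} a_{j_1}(n) a_{j_2}(n) |z_{j_1} - z'_{j_1}|\,|z_{j_2} - z'_{j_2}|\, n^{-2\sigma}\\
&= 4\pi^2 \sum_{j_1,j_2=1}^{\nu} |z_{j_1} - z'_{j_1}|\,|z_{j_2} - z'_{j_2}| \sum_{n:\,P^+(n)\le p_\nu} a_{j_1}(n) a_{j_2}(n)\, n^{-2\sigma}\\
&\le 4\pi^2 \sum_{j_1,j_2=1}^{\nu} |z_{j_1} - z'_{j_1}|\,|z_{j_2} - z'_{j_2}| \sum_{b_1,b_2=1}^{\infty} b_1 b_2 \sum_{\substack{n\le N\\ a_{j_1}(n)=b_1,\ a_{j_2}(n)=b_2}} n^{-2\sigma}\\
&\le 4\pi^2 \sum_{j_1,j_2=1}^{\nu} |z_{j_1} - z'_{j_1}|\,|z_{j_2} - z'_{j_2}| \sum_{b_1,b_2=1}^{\infty} b_1 b_2\, p_{j_1}^{-2b_1\sigma} p_{j_2}^{-2b_2\sigma} \sum_{\substack{k\le N p_{j_1}^{-b_1} p_{j_2}^{-b_2}\\ P^+(k)\le p_\nu}} k^{-2\sigma}\\
&\le C_\sigma N^{1-2\sigma} \sum_{j_1,j_2=1}^{\nu} |z_{j_1} - z'_{j_1}|\,|z_{j_2} - z'_{j_2}| \sum_{b_1,b_2=1}^{\infty} b_1 b_2\, p_{j_1}^{-2b_1\sigma} p_{j_2}^{-2b_2\sigma} \big[p_{j_1}^{-b_1} p_{j_2}^{-b_2}\big]^{1-2\sigma}\\
&= C_\sigma N^{1-2\sigma} \sum_{j_1,j_2=1}^{\nu} |z_{j_1} - z'_{j_1}|\,|z_{j_2} - z'_{j_2}| \sum_{b_1,b_2=1}^{\infty} b_1 b_2\, p_{j_1}^{-b_1} p_{j_2}^{-b_2}\\
&= C_\sigma N^{1-2\sigma} \Big[\sum_{j=1}^{\nu} |z_j - z'_j| \sum_{b=1}^{\infty} b\, p_j^{-b}\Big]^2.
\end{aligned}$$

Thus,

$$d(z, z') \le C_\sigma N^{1/2-\sigma} \sum_{j=1}^{\nu} |z_j - z'_j| \sum_{b=1}^{\infty} b\, p_j^{-b}.$$

(8.9.13)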

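Now we explore the entropy properties of the metric space $(\mathbb{T}^\nu, d)$. Towards this aim, take $\varepsilon \in (0, 1)$ and cover $\mathbb{T}^\nu$ by rectangular cells so that, if $z$ and $z'$ belong to the same cell, we have

$$|z_j - z'_j| \le
\begin{cases}
\dfrac{\varepsilon}{\log\log \nu}, & 1 \le j \le \nu^{1/2},\\[2mm]
\varepsilon, & \nu^{1/2} < j \le \nu.
\end{cases}$$

(8.9.14)

Thus, every cell is a product of two cubes of different size and dimension. The necessary number of cells $M(\varepsilon)$ is bounded as follows:

$$M(\varepsilon) \le \Big(\frac{\log\log \nu}{\varepsilon}\Big)^{[\nu^{1/2}]} \varepsilon^{-(\nu - [\nu^{1/2}])} = (1/\varepsilon)^{\nu} (\log\log \nu)^{[\nu^{1/2}]}.$$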

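Let us now evaluate the distance $d(z, z')$ for $z, z'$ satisfying (8.9.14). By (8.9.13) we have

$$d(z, z') \le C_\sigma N^{1/2-\sigma} \{d_1 + d_2 + d_3\},$$

where

$$d_1 = \sum_{j=1}^{\nu} |z_j - z'_j| \sum_{b=2}^{\infty} b\, p_j^{-b}, \qquad d_2 = \sum_{\nu^{1/2}<j\le\nu} |z_j - z'_j|\, p_j^{-1}, \qquad d_3 = \sum_{j\le\nu^{1/2}} |z_j - z'_j|\, p_j^{-1}.$$

For any $j \ge 1$ we have

$$\sum_{b=2}^{\infty} b\, p_j^{-b} = \sum_{b=2}^{\infty} b \Big(\frac{2}{p_j}\Big)^{b} 2^{-b} \le \Big(\frac{2}{p_j}\Big)^{2} \sum_{b=2}^{\infty} b\, 2^{-b} = C p_j^{-2}.$$

(8.9.15)

Hence,

$$d_1 \le \sum_{j=1}^{\nu} C p_j^{-2} \max_{j\le\nu} |z_j - z'_j| \le C\varepsilon.$$

Similarly,

$$\begin{aligned}
d_2 &\le \sum_{\nu^{1/2}<j\le\nu} p_j^{-1} \max_{\nu^{1/2}<j\le\nu} |z_j - z'_j| \le C \sum_{\nu^{1/2}<j\le\nu} (j\log j)^{-1}\, \varepsilon\\
&\le C \int_{\nu^{1/2}}^{\nu} \frac{du}{u\log u}\, \varepsilon = C \Big(\log\log \nu - \log\frac{\log \nu}{2}\Big) \varepsilon = C(\log 2)\, \varepsilon.
\end{aligned}$$

Finally,

$$d_3 \le \sum_{j=1}^{\nu} p_j^{-1} \max_{j\le\nu^{1/2}} |z_j - z'_j| \le C \sum_{j=1}^{\nu} (j\log j)^{-1}\, \frac{\varepsilon}{\log\log \nu} \le C\varepsilon.$$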
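Two elementary computations are used in the estimates of $d_1$ and $d_2$: the numerical series $\sum_{b\ge 2} b\,2^{-b}$ in (8.9.15), and the fact that $\int_{\nu^{1/2}}^{\nu} du/(u\log u) = \log 2$ whatever $\nu$ is. The following sketch checks both numerically; the function names and the midpoint quadrature (after the substitution $u = e^x$) are choices made for this illustration only.

```python
import math

def integral_du_over_u_log_u(nu, steps=100_000):
    """Midpoint rule for the integral of du/(u log u) over [sqrt(nu), nu].

    With u = e^x the integrand becomes dx/x on [log(nu)/2, log(nu)].
    """
    a, b = 0.5 * math.log(nu), math.log(nu)
    h = (b - a) / steps
    return sum(h / (a + (i + 0.5) * h) for i in range(steps))

def series_b_two_pow_minus_b(terms=200):
    """Partial sum of sum_{b >= 2} b 2^{-b}; the tail beyond 200 terms is negligible."""
    return sum(b * 2.0 ** (-b) for b in range(2, terms))

value_small = integral_du_over_u_log_u(100)
value_large = integral_du_over_u_log_u(10 ** 8)
series_value = series_b_two_pow_minus_b()
```

The independence of the integral from $\nu$ is what makes the two-scale cells in (8.9.14) efficient: the coordinates $j > \nu^{1/2}$ contribute a bounded amount although they are cut at the coarse scale $\varepsilon$.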

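By summing up the three estimates, we have $d(z, z') \le C_\sigma N^{1/2-\sigma}\varepsilon$, which enables the evaluation of the metric entropy. Let $N(\mathbb{T}^\nu, d, u)$ be the minimal number of balls of radius $u$ that cover the space $(\mathbb{T}^\nu, d)$. We have

$$\log N(\mathbb{T}^\nu, d, C_\sigma N^{1/2-\sigma}\varepsilon) \le \log M(\varepsilon) \le \nu|\log \varepsilon| + \nu^{1/2} \cdot \log\log\log \nu.$$

Observe also that

$$\|\Upsilon(z)\|_2 \le C_\sigma N^{1/2-\sigma}, \quad z \in \mathbb{T}^\nu.$$

(8.9.16)

Hence, $D := \operatorname{diam}(\mathbb{T}^\nu, d) \le C_\sigma N^{\frac12-\sigma}$, and by the classical Dudley entropy theorem (see (10.3.9) and (10.3.10)), for any fixed $z \in \mathbb{T}^\nu$,

$$\begin{aligned}
E \sup_{z'\in\mathbb{T}^\nu} |\Upsilon(z') - \Upsilon(z)| &\le C_\sigma \int_0^{D} [\log N(\mathbb{T}^\nu, d, u)]^{1/2}\, du\\
&\le C_\sigma \int_0^{C_\sigma N^{1/2-\sigma}} [\log N(\mathbb{T}^\nu, d, u)]^{1/2}\, du\\
&= C_\sigma N^{1/2-\sigma} \int_0^1 [\log N(\mathbb{T}^\nu, d, C_\sigma N^{1/2-\sigma}\varepsilon)]^{1/2}\, d\varepsilon\\
&\le C_\sigma N^{1/2-\sigma} \int_0^1 \big[\nu|\log \varepsilon| + \log\log\log \nu \cdot \nu^{1/2}\big]^{1/2}\, d\varepsilon\\
&\le C_\sigma N^{1/2-\sigma} \nu^{1/2}.
\end{aligned}$$

Using again (8.9.16), we have

$$E \sup_{z'\in\mathbb{T}^\nu} |\Upsilon(z')| \le C_\sigma N^{1/2-\sigma} \nu^{1/2}.$$

(8.9.17)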
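The last step of the entropy integral rests on $\int_0^1 |\log\varepsilon|^{1/2}\, d\varepsilon = \Gamma(3/2) = \sqrt{\pi}/2$ and on the fact that the term $\nu^{1/2}\log\log\log\nu$ is of lower order than $\nu$. The sketch below evaluates the integral numerically for a given $\nu$; the function name `entropy_integral`, the midpoint rule, and the sample values of $\nu$ are assumptions of this illustration, which requires $\nu \ge 16$ so that $\log\log\log\nu \ge 0$.

```python
import math

def entropy_integral(nu, steps=200_000):
    """Midpoint approximation of the integral over (0, 1) of
    sqrt(nu * |log e| + sqrt(nu) * log log log nu) de."""
    c = math.sqrt(nu) * math.log(math.log(math.log(nu)))
    h = 1.0 / steps
    return sum(math.sqrt(-nu * math.log((i + 0.5) * h) + c) * h
               for i in range(steps))

# Ratio to nu^{1/2}: it stays bounded, and it approaches sqrt(pi)/2 as nu grows.
ratio_1e4 = entropy_integral(10 ** 4) / math.sqrt(10 ** 4)
ratio_1e8 = entropy_integral(10 ** 8) / math.sqrt(10 ** 8)
gamma_three_halves = math.gamma(1.5)  # equals sqrt(pi) / 2
```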

The final stage of the proof provides the optimal choice of the parameter $\nu$ balancing the quantities (8.9.10), (8.9.11), and (8.9.17). As the theorem's claim suggests, we consider three cases.

Case 1. $N^{1/2} \le \tau \le N$. Obviously, this case contains the results of Halász and Queffélec. In this case we choose

$$\nu = \frac{\tau}{\log N},$$

thus balancing (8.9.10) and (8.9.17). We obtain from both terms the bound $C_\sigma \frac{N^{1/2-\sigma}\tau^{1/2}}{(\log N)^{1/2}}$ while the term (8.9.11) is negligible. The correctness condition $\nu \le \tau$ is obvious.

Case 2. $N^{1/2}(\log N)^{-1} \le \tau \le N^{1/2}$. In this case we choose $\nu = N^{1/2}(\log N)^{-1}$, thus balancing (8.9.11) and (8.9.17). We obtain from both terms the bound $C_\sigma \frac{N^{3/4-\sigma}}{(\log N)^{1/2}}$ while the term (8.9.10) is negligible. The correctness condition $\nu \le \tau$ is obvious for the range under consideration.

Case 3. $1 \le \tau \le N^{1/2}(\log N)^{-1}$. Here we just set $\nu = \tau$. This means that we do not need to split the polynomial into two parts. Formally, the quantities (8.9.10) and (8.9.11) are not necessary and we obtain the bound $C_\sigma N^{1/2-\sigma}\tau^{1/2}$ directly from (8.9.17). The upper bound is now proved completely.
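A quick consistency check on the three cases: the three expressions of the upper bound agree at the two transition points $\tau = N^{1/2}$ and $\tau = N^{1/2}/\log N$, so the bound is a continuous function of $\tau$. The sketch below encodes the bound without its constant $C_\sigma$; the function name `upper_bound_shape` and the values $N = 10^6$, $\sigma = 0.25$ are hypothetical choices made here.

```python
import math

def upper_bound_shape(N, tau, sigma):
    """Upper bound of Theorem 8.9.1(a) as a function of tau, constant C_sigma omitted."""
    root = math.sqrt(N)
    log_n = math.log(N)
    if tau >= root:                       # Case 1: N^{1/2} <= tau <= N
        return N ** (0.5 - sigma) * math.sqrt(tau) / math.sqrt(log_n)
    if tau >= root / log_n:               # Case 2: N^{1/2}/log N <= tau <= N^{1/2}
        return N ** (0.75 - sigma) / math.sqrt(log_n)
    return N ** (0.5 - sigma) * math.sqrt(tau)   # Case 3: 1 <= tau <= N^{1/2}/log N

N, sigma = 10 ** 6, 0.25
t1 = math.sqrt(N)                 # transition between Case 1 and Case 2
t2 = math.sqrt(N) / math.log(N)   # transition between Case 2 and Case 3
just_below = 1.0 - 1e-12
```

In particular the bound does not depend on $\tau$ in the middle range, which reflects the fixed choice $\nu = N^{1/2}(\log N)^{-1}$ made in Case 2.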


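Proof of the lower bound in Theorem 8.9.1. Let $d = \{d_n, n \ge 1\}$ be a sequence of reals. Recall that by (8.9.5) we have

$$\sup_{t\in\mathbb{R}} \Big|\sum_{j=1}^{\tau} \sum_{n\in E_j} d_n \varepsilon_n n^{-\sigma - it}\Big| = \sup_{z\in\mathbb{T}^\tau} |Q(z)|$$

where

$$Q(z) = \sum_{j=1}^{\tau} \sum_{n\in E_j} d_n \varepsilon_n n^{-\sigma} e^{2i\pi\langle a(n), z\rangle}.$$

Consider the subset $Z$ of $\mathbb{T}^\tau$ defined by

$$Z = \big\{z = \{z_j, 1 \le j \le \tau\} : z_j = 0 \text{ if } j \le \tau/2, \text{ and } z_j \in \{0, 1/2\} \text{ if } j \in (\tau/2, \tau]\big\}.$$

Observe that the imaginary part of $Q$ vanishes on $Z$, since for any $z \in Z$ and any $n$ it is true that

$$e^{2i\pi\langle a(n), z\rangle} = \cos(2\pi\langle a(n), z\rangle) = (-1)^{2\langle a(n), z\rangle}.$$

Hence, $Q$ takes the following simple form on $Z$,

$$Q(z) = \sum_{\tau/2<j\le\tau} \sum_{n\in E_j} d_n \varepsilon_n n^{-\sigma} (-1)^{2\langle a(n), z\rangle}.$$

This is no longer a trigonometric polynomial, but simply a finite rank Rademacher process. For $j \in (\tau/2, \tau]$ define

$$L_j = \Big\{n = p_j \tilde{n} : \tilde{n} \le \frac{N}{p_j} \text{ and } P^+(\tilde{n}) \le p_{\tau/2}\Big\}.$$

Since $E_j \supset L_j$, $j = 1, \ldots, \tau$, the sets $L_j$ are pairwise disjoint. Put, for $z \in Z$,

$$Q'(z) = \sum_{\tau/2<j\le\tau} \sum_{n\in L_j} d_n \varepsilon_n n^{-\sigma} (-1)^{2\langle a(n), z\rangle}.$$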

We now recall a useful fact.

8.9.2 Lemma. Let $X = \{X_z, z \in Z\}$ and $Y = \{Y_z, z \in Z\}$ be two finite sets of random variables defined on a common probability space. We assume that $X$ and $Y$ are independent and that the random variables $Y_z$ are all centered. Then

$$E \sup_{z\in Z} |X_z + Y_z| \ge E \sup_{z\in Z} |X_z|.$$

Proof. Let $\mathcal{F}$ be the $\sigma$-field generated by $X$. Then, by independence and since the $Y_z$ are centered,

$$\begin{aligned}
E \sup_{z\in Z} |X_z + Y_z| &= E\Big[E\Big(\sup_{z\in Z} |X_z + Y_z| \,\Big|\, \mathcal{F}\Big)\Big]\\
&\ge E \sup_{z\in Z} \big|E(X_z + Y_z \mid \mathcal{F})\big|\\
&= E \sup_{z\in Z} |X_z + E\,Y_z| = E \sup_{z\in Z} |X_z|.
\end{aligned}$$

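Clearly, since $\{Q(z) - Q'(z), z \in Z\}$ and $\{Q'(z), z \in Z\}$ are independent,

$$E \sup_{z\in Z} |Q(z)| \ge E \sup_{z\in Z} |Q'(z)|.$$

We now proceed to a direct evaluation of $Q'(z)$ by proving

8.9.3 Proposition. There exists a universal constant $c$ such that for any system of coefficients $\{d_n, n \ge 1\}$,

$$c \sum_{\tau/2<j\le\tau} \Big[\sum_{n\in L_j} d_n^2 n^{-2\sigma}\Big]^{1/2} \le E \sup_{z\in Z} |Q'(z)| \le \sum_{\tau/2<j\le\tau} \Big[\sum_{n\in L_j} d_n^2 n^{-2\sigma}\Big]^{1/2}.$$

Proof. For any $n \in L_j$, we have $2\langle a(n), z\rangle = 2z_j$, so that

$$\sum_{n\in L_j} d_n \varepsilon_n n^{-\sigma} (-1)^{2\langle a(n), z\rangle} = (-1)^{2z_j} \sum_{n\in L_j} d_n \varepsilon_n(\omega) n^{-\sigma}.$$

Thus

$$Q'(z) = \sum_{\tau/2<j\le\tau} (-1)^{2z_j} \sum_{n\in L_j} d_n \varepsilon_n(\omega) n^{-\sigma}.$$

Let $\omega \in \Omega$. We can select $z_j = z_j(\omega) = 0$ or $1/2$, $\tau/2 < j \le \tau$, according to the sign $+$ or $-$ of the sum $\sum_{n\in L_j} d_n \varepsilon_n(\omega) n^{-\sigma}$. This implies that

$$\sup_{z\in Z} Q'(z) = \sum_{\tau/2<j\le\tau} \Big|\sum_{n\in L_j} d_n \varepsilon_n n^{-\sigma}\Big|.$$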
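The selection of signs used here is the elementary identity $\max_{s\in\{-1,1\}^J} \sum_j s_j S_j = \sum_j |S_j|$ for real numbers $S_j$, the maximum being attained at $s_j = \operatorname{sign}(S_j)$. A brute-force check over all sign patterns is sketched below; the function name `sup_over_signs` and the random test data are assumptions of this illustration.

```python
import itertools
import random

def sup_over_signs(block_sums):
    """Maximum of sum_j s_j * S_j over all sign patterns s in {-1, 1}^J."""
    return max(
        sum(s * x for s, x in zip(signs, block_sums))
        for signs in itertools.product((1, -1), repeat=len(block_sums))
    )

random.seed(0)
# Stand-ins for the block sums S_j = sum_{n in L_j} d_n eps_n n^{-sigma}.
block_sums = [random.uniform(-1.0, 1.0) for _ in range(8)]
best = sup_over_signs(block_sums)
l1_norm = sum(abs(x) for x in block_sums)
```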

Now we shall use the well-known Khintchin inequalities. Let $\{\varepsilon_i, 1 \le i \le N\}$ be a Rademacher sequence. For any $0 < p < \infty$, there exist positive finite constants $c_p$, $C_p$ depending on $p$ only, such that for any finite sequence $\{a_i, 1 \le i \le N\}$ of real numbers

$$c_p \Big(\sum_{i=1}^{N} a_i^2\Big)^{1/2} \le \Big\|\sum_{i=1}^{N} a_i \varepsilon_i\Big\|_p \le C_p \Big(\sum_{i=1}^{N} a_i^2\Big)^{1/2}.$$

See Kashin and Saakyan [1989]. Further $C_p \le K\sqrt{p}$, $p \ge 1$, where $K$ is numerical.
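For $p = 1$ and short sequences the Khintchin inequalities can be checked exactly by enumerating all sign patterns. In the sketch below the upper constant $C_1 = 1$ follows from the Cauchy–Schwarz inequality; the lower constant $c_1 = 1/\sqrt{2}$ is the known sharp value for $p = 1$ and is quoted here as an outside fact, not taken from the text. The function name `rademacher_l1_norm` and the test sequences are choices made for this illustration.

```python
import itertools
import math

def rademacher_l1_norm(a):
    """Exact value of E|sum_i a_i eps_i|, by enumeration of the 2^N sign patterns."""
    total = sum(
        abs(sum(s * x for s, x in zip(signs, a)))
        for signs in itertools.product((1, -1), repeat=len(a))
    )
    return total / 2 ** len(a)

def l2_norm(a):
    return math.sqrt(sum(x * x for x in a))

samples = [[1.0, 1.0], [1.0, 2.0, 3.0], [0.5] * 10, [5.0, 0.1, 0.1, 0.1]]
ratios = [rademacher_l1_norm(a) / l2_norm(a) for a in samples]
```

The sequence $(1, 1)$ gives the ratio $1/\sqrt{2}$ exactly, so the lower constant cannot be improved for $p = 1$.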


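Consequently,

$$\begin{aligned}
E \sup_{z\in Z} Q'(z) = \sum_{\tau/2<j\le\tau} E\Big|\sum_{n\in L_j} d_n \varepsilon_n n^{-\sigma}\Big| &\ge c \sum_{\tau/2<j\le\tau} \Big(E\Big|\sum_{n\in L_j} d_n \varepsilon_n n^{-\sigma}\Big|^2\Big)^{1/2}\\
&= c \sum_{\tau/2<j\le\tau} \Big[\sum_{n\in L_j} d_n^2 n^{-2\sigma}\Big]^{1/2}.
\end{aligned}$$

The upper bound immediately follows from the Cauchy–Schwarz inequality.

8.9.4 Corollary. If $(d_n)$ is a multiplicative system, we have

$$E \sup_{z\in Z} Q'(z) \ge c\, N^{-\sigma} \sum_{\tau/2<j\le\tau} d_{p_j} \Big[\sum_{\substack{\tilde{n}\le N/p_j\\ P^+(\tilde{n})\le p_{\tau/2}}} d_{\tilde{n}}^2\Big]^{1/2}.$$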

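Now we can finish the proof of Theorem 8.9.1. If $d_n \equiv 1$, we get from the above corollary that

$$\begin{aligned}
E \sup_{z\in\mathbb{T}^\tau} \Big|\sum_{j=1}^{\tau} \sum_{n\in E_j} \varepsilon_n n^{-\sigma} e^{2i\pi\langle a(n), z\rangle}\Big| &\ge E \sup_{z\in Z} |Q'(z)|\\
&\ge \frac{C}{N^\sigma} \sum_{\tau/2<j\le\tau} \Big(\#\big\{m \le N/p_j : P^+(m) \le p_{\tau/2}\big\}\Big)^{1/2}\\
&= \frac{C}{N^\sigma} \sum_{\tau/2<j\le\tau} \Psi\Big(\frac{N}{p_j}, p_{\tau/2}\Big)^{1/2}.
\end{aligned}$$

Since

$$\Psi\Big(\frac{N}{p_j}, p_{\tau/2}\Big) \ge \Psi\Big(\frac{N}{p_\tau}, p_{\tau/2}\Big) = \frac{N}{p_\tau}\, \Psi^*\Big(\frac{N}{p_\tau}, p_{\tau/2}\Big) \ge \frac{cN}{\tau\log\tau}\, \Psi^*\Big(\frac{N}{p_\tau}, p_{\tau/2}\Big),$$

we obtain

$$\begin{aligned}
E \sup_{z\in\mathbb{T}^\tau} \Big|\sum_{j=1}^{\tau} \sum_{n\in E_j} \varepsilon_n n^{-\sigma} e^{2i\pi\langle a(n), z\rangle}\Big| &\ge \frac{c}{N^\sigma} \cdot \frac{\tau}{2} \Big[\frac{cN}{\tau\log\tau}\, \Psi^*\Big(\frac{N}{p_\tau}, p_{\tau/2}\Big)\Big]^{1/2}\\
&= c\, N^{1/2-\sigma} \Big(\frac{\tau}{\log\tau}\Big)^{1/2} \Psi^*\Big(\frac{N}{p_\tau}, p_{\tau/2}\Big)^{1/2},
\end{aligned}$$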

as asserted.

Remark. Theorem 8.9.1 was extended in Lifshits–Weber [2009a] to weighted random Dirichlet polynomials $D(s) = \sum_{n=2}^{N} \varepsilon_n d(n) n^{-s}$, under moderate conditions on the weights $d(n)$. In fact the approach can be used with slight modifications to treat the case when $d(n)$ is a non-negative sub-multiplicative function, namely

$$d(nm) \le d(n)d(m) \quad \text{provided } (n, m) = 1,$$

(8.9.18)

and satisfies

$$p \mid n \ \Longrightarrow\ d(n) \le C\, d\Big(\frac{n}{p}\Big), \quad \text{and} \quad d(p^j) \le C_1 \lambda^j,$$

(8.9.19)

for some positive $C$, $C_1$, $\lambda$ with $\lambda < \sqrt{2}$, any prime number $p$, any integers $n$, $j$. Clearly, if $C < \sqrt{2}$, the second property is implied by the first. But this is not always so, as the following example shows. Fix some prime number $P_1$ as well as some reals $1 < \lambda_1 < \sqrt{2}$, $C_1 \ge 1$, and put

$$d(n) = \begin{cases} C_1 \lambda_1^j & \text{if } P_1^j \,\|\, n,\\ 1 & \text{if } (n, P_1) = 1. \end{cases}$$

(8.9.20)

Condition (8.9.19) is satisfied by the divisor function $d(n) = \sum_{\delta\mid n} 1$, or if $d(n) = \lambda^{\Omega(n)}$ where $\Omega(n) = \sum_{p^\nu \| n} \nu$ is the prime divisor sum function; but also for multiplicative functions such that

$$\frac{d(p^a)}{d(p^{a-1})} \le \lambda, \quad a = 1, 2, \ldots.$$

(8.9.21)

Other remarkable examples are

$$d_K(n) = \begin{cases} 1 & \text{if } (n, K) = 1,\\ 0 & \text{if } (n, K) > 1, \end{cases}$$

where $K$ is some positive integer, and the truncated divisor function $d_N(n) = \#\{k \le N : k \mid n\}$, where $N \ge 1$ is some fixed integer. These examples are studied in [Weber: 2009a] where significant simplifications of the approach are provided, yielding also strictly better bounds than in Theorem 8.9.1.

8.9.10. Other results. In this section we apply the same technique to some other sets of coefficients. Let $\{d_n, n \ge 1\}$ be a sequence of multiplicative weights: $d_{nm} = d_n d_m$ whenever $n$, $m$ are coprime. Write

$$B_m = \sum_{2\le n\le m} d_n^2.$$

(8.9.22)

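By choosing $\tau = \mu := \pi(N)$ in the lower bound of Proposition 8.9.6, we get

$$E \sup_{z\in\mathbb{T}^\mu} \Big|\sum_{n=2}^{N} d_n \varepsilon_n n^{-\sigma} e^{2i\pi\langle a(n), z\rangle}\Big| \ge E \sup_{z\in Z} |Q'(z)| \ge CN^{-\sigma} \sum_{\mu/2<j\le\mu} d_{p_j} \Big[\sum_{\substack{\tilde{n}\le N/p_j\\ P^+(\tilde{n})\le p_{\mu/2}}} d_{\tilde{n}}^2\Big]^{1/2}.$$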


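Note that for large $N$ in the case $\tau = \mu$ the sets $L_j$ reduce to $\big\{n = p_j \tilde{n} : \tilde{n} \le \frac{N}{p_j}\big\}$. Indeed, if $\tilde{n} \le \frac{N}{p_j}$ and if there is an $s \ge \mu/2$ such that $p_s \mid \tilde{n}$, then this implies that $N \ge p_j p_s \ge p_{\mu/2}^2 \sim (\mu\log\mu)^2/4 \sim N^2/4$, which is impossible for large $N$. Thus necessarily $P^+(\tilde{n}) \le p_{\mu/2}$. Thereby,

$$E \sup_{z\in\mathbb{T}^\mu} \Big|\sum_{n=2}^{N} d_n \varepsilon_n n^{-\sigma} e^{2i\pi\langle a(n), z\rangle}\Big| \ge CN^{-\sigma} \sum_{\mu/2<j\le\mu} d_{p_j} \Big[\sum_{\tilde{n}\le N/p_j} d_{\tilde{n}}^2\Big]^{1/2} = CN^{-\sigma} \sum_{\mu/2<j\le\mu} d_{p_j} B_{N/p_j}^{1/2}.$$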

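We have obtained

8.9.5 Proposition. There exist universal constants $C$, $N_0$ such that for any $0 \le \sigma < 1/2$, any integer $N \ge N_0$ and any multiplicative sequence of weights $\{d_n, n \ge 1\}$,

$$E \sup_{t\in\mathbb{R}} \Big|\sum_{n=2}^{N} \varepsilon_n d_n n^{-\sigma - it}\Big| \ge CN^{-\sigma} \sum_{\mu/2<j\le\mu} d_{p_j} B_{N/p_j}^{1/2},$$

where $B_m$ is defined in (8.9.22).

Apply this to the case $d_n = d(n)$, where $d(n) = \#\{d : d \mid n\}$ is the divisor function. Although these weights are very irregular, their sums behave regularly, in particular,

$$\sum_{n=1}^{N} d^2(n) \sim \frac{N\log^3 N}{\pi^2}$$

as $N$ tends to infinity. The last estimate immediately provides $B_m \sim (m/\pi^2)\log^3 m$, hence (noticing that $d_{p_j} = 2$ and $\mu \sim N/\log N$),

$$\begin{aligned}
\sum_{\mu/2<j\le\mu} d_{p_j} B_{N/p_j}^{1/2} &\sim \sum_{\mu/2<j\le\mu} 2\Big(\frac{N}{p_j\pi^2}\Big)^{1/2} \log^{3/2}\frac{N}{p_j} = \frac{2N^{1/2}}{\pi} \sum_{\mu/2<j\le\mu} \frac{1}{p_j^{1/2}} \log^{3/2}\frac{N}{p_j}\\
&\sim \frac{2N^{1/2}}{\pi} \sum_{\mu/2<j\le\mu} \frac{1}{(j\log j)^{1/2}} \log^{3/2}\frac{N}{j\log j}\\
&\approx N^{1/2} \sum_{\mu/2<j\le\mu} \frac{1}{(j\log j)^{1/2}} \approx N^{1/2} \frac{\mu^{1/2}}{(\log\mu)^{1/2}} \sim \frac{N}{\log N}.
\end{aligned}$$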
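The quantities $B_m$ of (8.9.22) are easy to tabulate for the divisor function with a sieve. The sketch below does so; the function names are hypothetical. No numerical comparison with the asymptotic $\frac{N\log^3 N}{\pi^2}$ is claimed here, since the lower-order terms are of order $N\log^2 N$ and are not small at the sizes one can tabulate.

```python
def divisor_counts(limit):
    """d[n] = number of divisors of n for 0 <= n <= limit (with d[0] = 0)."""
    d = [0] * (limit + 1)
    for k in range(1, limit + 1):
        for multiple in range(k, limit + 1, k):
            d[multiple] += 1
    return d

def B(m, d):
    """B_m = sum over 2 <= n <= m of d_n^2, as in (8.9.22)."""
    return sum(d[n] ** 2 for n in range(2, m + 1))

d = divisor_counts(1000)
sum_up_to_10 = sum(d[n] ** 2 for n in range(1, 11))
b_values = [B(m, d) for m in (10, 100, 1000)]
```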

Now, let $\{P_k, k \in K\}$ be a finite set of mutually coprime numbers. Consider the set of integers

$$E = \Big\{n : n = \prod_{k\in K} P_k^{\alpha_k},\ \alpha_k \in \{0, 1\}\Big\}$$

and the associated Dirichlet polynomial

$$D_E(t) = \sum_{n\in E} \varepsilon_n n^{-\sigma - it} = \sum_{n=2}^{N} \varepsilon_n \chi_E(n) n^{-\sigma - it},$$

where $N = \prod_{k\in K} P_k$. We prove the following.

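8.9.6 Proposition. There exists a universal constant $C$ such that, for any $\sigma \ge 0$ and any $\{P_k, k \in K\}$,

$$E \sup_{t\in\mathbb{R}} |D_E(t)| \ge C \prod_{k\in K} \big(1 + P_k^{-2\sigma}\big)^{1/2} \sup_{G\subseteq K} \frac{\sum_{j\in G} P_j^{-\sigma}}{\prod_{k\in G} \big(1 + P_k^{-2\sigma}\big)^{1/2}}.$$

Proof. By (8.9.5) we have

$$\sup_{t\in\mathbb{R}} |D_E(t)| = \sup_{z\in\mathbb{T}^\mu} |Q(z)|$$

where $\mu = |K|$ and

$$Q(z) = \sum_{n=2}^{N} \chi_E(n) \varepsilon_n n^{-\sigma} e^{2i\pi\langle a(n), z\rangle}.$$

Let $A \subset K$ and $B = K \setminus A$. We assume that both $A$ and $B$ are nonempty sets. Define for $j \in B$,

$$B_j = \{n \in E : \alpha_k = 0 \text{ if } k \in B,\ k \ne j,\ \alpha_j = 1\}$$

and $Z \subset \mathbb{T}^\mu$ by

$$Z = \big\{z = \{z_k, k \in K\} : z_k = 0 \text{ if } k \in A, \text{ and } z_k \in \{0, 1/2\} \text{ if } k \in B\big\}.$$

For $j \in B$, $n \in B_j$ and $z \in Z$, we have $2\langle a(n), z\rangle = 2\sum_{k\in K} \alpha_k z_k = 2z_j \in \{0, 1\}$, hence $(-1)^{2\langle a(n), z\rangle} = \pm 1$, so that similarly to our previous lower bound,

$$\sup_{z\in Z} |Q(z)| \ge \sum_{j\in B} \Big|\sum_{n\in B_j} \varepsilon_n n^{-\sigma}\Big|,$$

almost surely. Hence

$$\begin{aligned}
E \sup_{z\in Z} |Q(z)| &\ge C \sum_{j\in B} \Big(E\Big|\sum_{n\in B_j} \varepsilon_n n^{-\sigma}\Big|^2\Big)^{1/2}\\
&= C \sum_{j\in B} P_j^{-\sigma} \Big[\sum_{(\alpha_k)_{k\in A}\in\{0,1\}^A} \prod_{k\in A} P_k^{-2\sigma\alpha_k}\Big]^{1/2}\\
&= C \prod_{k\in A} \big(1 + P_k^{-2\sigma}\big)^{1/2} \sum_{j\in B} P_j^{-\sigma}.
\end{aligned}$$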
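The last equality is the expansion $\sum_{\alpha\in\{0,1\}^A} \prod_{k\in A} P_k^{-2\sigma\alpha_k} = \prod_{k\in A} (1 + P_k^{-2\sigma})$. It can be verified directly on a small example, as in the sketch below; the function names and the sample values of $P_k$ and $\sigma$ are assumptions made for this illustration.

```python
import itertools
import math

def sum_over_exponent_patterns(P, sigma):
    """Sum over all alpha in {0,1}^A of prod_k P_k^{-2 sigma alpha_k}."""
    return sum(
        math.prod(p ** (-2 * sigma * a) for p, a in zip(P, alpha))
        for alpha in itertools.product((0, 1), repeat=len(P))
    )

def product_form(P, sigma):
    """prod_k (1 + P_k^{-2 sigma})."""
    return math.prod(1 + p ** (-2 * sigma) for p in P)

P = [2, 3, 5, 7, 11]   # mutually coprime numbers indexed by A
sigma = 0.3
lhs = sum_over_exponent_patterns(P, sigma)
rhs = product_form(P, sigma)
```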


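Therefore

$$\begin{aligned}
E \sup_{t\in\mathbb{R}} |D_E(t)| &\ge C \sup_{A\subseteq K,\ A\ne K}\ \prod_{k\in A} \big(1 + P_k^{-2\sigma}\big)^{1/2} \sum_{j\in A^c} P_j^{-\sigma}\\
&= C \prod_{k\in K} \big(1 + P_k^{-2\sigma}\big)^{1/2} \sup_{A\subseteq K,\ A\ne K} \frac{\sum_{j\in A^c} P_j^{-\sigma}}{\prod_{k\in A^c} \big(1 + P_k^{-2\sigma}\big)^{1/2}}.
\end{aligned}$$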

Chapter 9

The majorizing measure method

The majorizing measure method, which originates from a well-known paper of Garsia, Rodemich and Rumsey, is presented in the exponential case first, in an introductory way. Next a general approach initiated by Talagrand is described. For the proofs, we however followed a recent and elegant simplification of these techniques introduced by Bednorz. An application and an illustration of the method appears in Section 9.3, where a criterion for the convergence of averages of random variables satisfying suitable increment conditions is established. Several applications in ergodic theory are given. The chapter concludes with another application giving rise to a strict sharpening of the Salem–Zygmund estimate for random polynomials.

9.1 Introduction – the exponential case

In a famous article of Garsia, Rodemich and Rumsey [1970], a real variable lemma was established and was then used to establish a new type of sufficient conditions for the convergence almost everywhere of stochastic processes. Unlike the metric entropy method, the kind of conditions obtained is expressed by means of a family of integrals analysing the local scattering of the parameter space, when endowed with a suitable metric (generally induced by the relevant stochastic process). Since this original work, more than thirty years have gone by, and during this period, considerable developments of this method, hereafter called “the majorizing measure method”, were obtained mainly under the impulse of Talagrand, after isolated but productive efforts of Fernique. In 1985, Talagrand solved the open question of characterizing the regularity (sample boundedness and sample continuity) of Gaussian processes, by means of the existence of a majorizing measure. This deep result was later published in Talagrand [1987]. The same year, Talagrand announced during a famous conference in Strasbourg a series of deep results of the same kind concerning non-Gaussian processes. These results are stated and proved in another famous paper Talagrand [1990], and are at the center of this chapter. In Section 9.2 we present some of them, as well as a recent simplified approach due to Bednorz [2006a]. In Section 9.3, we apply these results to obtain a very useful almost sure convergence criterion for averages of sequences of random variables satisfying increment conditions of Gál–Koksma type. This substantially completes the work done in Section 8.4. Now, we return to the seminal work of Garsia, Rodemich, Rumsey, and first to the above mentioned real variable lemma. Let (T , d) be a metric space and μ be a Borel probability on T . Let f : T → R be a Borel


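function. Let also $A$, $B$ be two Borel subsets of $T$ with positive measure. Put

$$\tilde{f}(s, t) = \frac{f(s) - f(t)}{d(s, t)}\, \chi_{\{d(s,t)\ne 0\}}(s, t) \quad \forall\, s, t \in T, \qquad \delta(A, B) = \sup_{s\in A,\, t\in B} d(s, t),$$

(9.1.1)

where $\chi$ denotes the indicator function. Then for any convex function $\Psi : \mathbb{R} \to \mathbb{R}^+$, and any positive real $c$,

$$\Big|\frac{1}{\mu(A)} \int_A f(x)\mu(dx) - \frac{1}{\mu(B)} \int_B f(x)\mu(dx)\Big| \le c\,\delta(A, B) \cdot \Psi^{-1}\Big(\int_A \int_B \Psi\Big(\frac{f(s) - f(t)}{c\,d(s, t)}\Big) \frac{\mu(ds)\mu(dt)}{\mu(A)\mu(B)}\Big).$$

By twice applying Jensen's inequality, we indeed get

$$\begin{aligned}
\Big|\frac{1}{\mu(A)} \int_A f(x)\mu(dx) - \frac{1}{\mu(B)} \int_B f(x)\mu(dx)\Big| &= \Big|\int_A \int_B \big(f(u) - f(v)\big) \frac{\mu(du)\mu(dv)}{\mu(A)\mu(B)}\Big|\\
&= c\,\Psi^{-1}\Big(\Psi\Big(\Big|\int_A \int_B d(u, v) \frac{\tilde{f}(u, v)}{c} \frac{\mu(du)\mu(dv)}{\mu(A)\mu(B)}\Big|\Big)\Big)\\
&\le \delta(A, B)\, c\,\Psi^{-1}\Big(\int_A \int_B \Psi\Big(\frac{\tilde{f}(u, v)}{c}\Big) \frac{\mu(du)\mu(dv)}{\mu(A)\mu(B)}\Big)\\
&\le \delta(A, B)\, c\,\Psi^{-1}\Big(\frac{\int_T \int_T \Psi\big(\frac{\tilde{f}(u, v)}{c}\big)\mu(du)\mu(dv)}{\mu(A)\mu(B)}\Big).
\end{aligned}$$

Now if $\Psi$ is a Young function and if $\|\tilde{f}\|_{\Psi,\mu\times\mu} < \infty$, then choosing $c = \|\tilde{f}\|_{\Psi,\mu\times\mu}$ gives

$$\Big|\frac{1}{\mu(A)} \int_A f(x)\mu(dx) - \frac{1}{\mu(B)} \int_B f(x)\mu(dx)\Big| \le \|\tilde{f}\|_{\Psi,\mu\times\mu}\, \delta(A, B)\, \Psi^{-1}\Big(\frac{1}{\mu(A)\mu(B)}\Big).$$

(9.1.2)

This is the basic inequality. If $A$, $B$ are $d$-balls centered at some point $t_0 \in T$: $A = B(t_0, \varepsilon_1)$, $B = B(t_0, \varepsilon_2)$, where we set $B(t, \varepsilon) = B_d(t, \varepsilon) = \{s \in T : d(s, t) \le \varepsilon\}$, then

$$\frac{1}{\mu(B(t_0, \varepsilon))} \int_{B(t_0,\varepsilon)} f(x)\mu(dx)$$

represents an approximation of $f(t_0)$, and inequality (9.1.2) gives us a hint on how this approximation can be controlled.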


We shall now describe how this can be used by studying the regularity of a class of stochastic processes with exponential moments. Consider for $\alpha \ge 1$, $t \in \mathbb{R}$ the exponential Young functions

$$\Psi_\alpha(t) = e^{|t|^\alpha} - 1,$$

with Orlicz norms

$$\|f\|_{\Psi_\alpha} = \inf\Big\{c > 0 : \int_T \Psi_\alpha\Big(\frac{f}{c}\Big)\, d\mu \le 1\Big\}.$$

(9.1.3)
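For a function taking finitely many values, the Orlicz norm (9.1.3) can be computed by bisection, because $c \mapsto \int \Psi_\alpha(f/c)\, d\mu$ is nonincreasing. The sketch below does this for a discrete probability measure; the function names, the overflow guard, and the tolerance are choices made for this illustration. As a check, a function identically equal to $1$ has norm $(\log 2)^{-1/\alpha}$.

```python
import math

def psi(alpha, t):
    """Psi_alpha(t) = exp(|t|^alpha) - 1, returning +inf instead of overflowing."""
    x = abs(t) ** alpha
    return math.inf if x > 700.0 else math.expm1(x)

def orlicz_norm(values, weights, alpha, tol=1e-12):
    """inf{c > 0 : sum_i weights[i] * Psi_alpha(values[i] / c) <= 1}.

    `weights` are the masses of a discrete probability measure.
    """
    if all(v == 0 for v in values):
        return 0.0

    def modular(c):
        return sum(w * psi(alpha, v / c) for v, w in zip(values, weights))

    lo, hi = 0.0, 1.0
    while modular(hi) > 1.0:      # enlarge hi until it is admissible
        hi *= 2.0
    while hi - lo > tol * hi:     # bisection on the admissibility threshold
        mid = 0.5 * (lo + hi)
        if modular(mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi

norm_const_alpha1 = orlicz_norm([1.0], [1.0], alpha=1.0)
norm_const_alpha2 = orlicz_norm([1.0], [1.0], alpha=2.0)
norm_two_point = orlicz_norm([1.0, 3.0], [0.5, 0.5], alpha=2.0)
```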

9.1.1 Theorem. Let $(T, d)$ be a compact metric space. Let $D(T)$ denote the diameter of $(T, d)$. Let $X = \{X(\omega, t), \omega \in \Omega, t \in T\}$ be a stochastic process with basic probability space $(\Omega, \mathcal{A}, P)$. Assume that the following increment condition is satisfied: for all $s, t \in T$

$$\|X(s) - X(t)\|_{\Psi_\alpha} \le d(s, t).$$

(9.1.4)

Let $\mu$ be a Borel probability measure on $T$ such that

$$L = \sup_{t\in T} \int_0^{D(T)/2} \Psi_\alpha^{-1}\Big(\frac{1}{\mu(B_d(t, u))}\Big)\, du < \infty.$$

Put for $s, t \in T$,

$$\tilde{X} = \tilde{X}(s, t) = \frac{X(s) - X(t)}{d(s, t)}\, \chi_{\{d(s,t)\ne 0\}}(s, t).$$

Then $X$ admits a $d$-separable version, which we denote again by $X$. Further $\tilde{X} \in L^{\Psi_\alpha}(T^2, \mu\times\mu)$, $P$-almost surely, and

$$P\Big\{\omega : \sup_{t\in T} \Big|X(\omega, t) - \int_T X(\omega, t)\, d\mu(t)\Big| \le 12 L\, \|\tilde{X}\|_{\Psi_\alpha,\mu\times\mu}\Big\} = 1.$$

Furthermore, for any $\rho > 0$,

$$P\Big\{\omega : \sup_{\substack{s,t\in T\\ d(s,t)\le\rho}} |X(\omega, s) - X(\omega, t)| \le 40\, \|\tilde{X}\|_{\Psi_\alpha,\mu\times\mu} \sup_{t\in T} \int_0^{\rho/2} \Psi_\alpha^{-1}\Big(\frac{1}{\mu(B(t, u))}\Big)\, du\Big\} = 1.$$

Proof. The proof consists of a simple variation of the original proof in Garsia–Rodemich–Rumsey, and will also use some ideas from Preston [1971]. Since $(T, d)$ is compact, it is $d$-separable. Let $T_0$ be a countable $d$-dense subset of $T$. By assumption (9.1.4), $X$ is $d$-continuous in probability. It is thus easily seen that $X$ possesses a version which is $d$-separable and admits $T_0$ as separation set (Section 8.1). We denote this version again by $X$. To any Borel subset $A$ of $T$ with $\mu(A) > 0$, we associate the random variable

$$X_A = \frac{1}{\mu(A)} \int_A X(t)\mu(dt).$$


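We get from (9.1.2),

$$|X_A - X_B| \le \delta(A, B)\, \|\tilde{X}\|_{\Psi_\alpha,\mu\times\mu}\, \Psi_\alpha^{-1}\Big(\frac{1}{\mu(A)\mu(B)}\Big).$$

(9.1.5)

Now put for any $t \in T$ and $r > 0$,

$$X_r = X_r(t) = \frac{1}{\mu(B(t, r))} \int_{B(t,r)} X(u)\, \mu(du).$$

Set $r_n = D(T)2^{-n}$, $n \ge 0$. By (9.1.5),

$$\begin{aligned}
|X_{r_n} - X_{r_{n-1}}| &\le (r_n + r_{n-1})\, \|\tilde{X}\|_{\Psi_\alpha,\mu\times\mu}\, \Psi_\alpha^{-1}\Big(\frac{1}{\mu^2(B(t, r_n))}\Big)\\
&= 3D(T)2^{-n}\, \|\tilde{X}\|_{\Psi_\alpha,\mu\times\mu}\, \Psi_\alpha^{-1}\Big(\frac{1}{\mu^2(B(t, r_n))}\Big)\\
&\le 6\, \|\tilde{X}\|_{\Psi_\alpha,\mu\times\mu} \int_{r_{n+1}}^{r_n} \Psi_\alpha^{-1}\Big(\frac{1}{\mu^2(B(t, u))}\Big)\, du\\
&\le 12\, \|\tilde{X}\|_{\Psi_\alpha,\mu\times\mu} \int_{r_{n+1}}^{r_n} \Psi_\alpha^{-1}\Big(\frac{1}{\mu(B(t, u))}\Big)\, du,
\end{aligned}$$

(9.1.6)

since $\Psi_\alpha^{-1}(u^2) = (\log(1 + u^2))^{1/\alpha} \le (\log(1 + u)^2)^{1/\alpha} \le 2(\log(1 + u))^{1/\alpha}$.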
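The constants $6$ and $12$ in (9.1.6) come from two elementary facts: $r_n + r_{n-1} = 3D(T)2^{-n} = 6(r_n - r_{n+1})$, and $\Psi_\alpha^{-1}(u^2) \le 2\Psi_\alpha^{-1}(u)$ for $u \ge 0$ and $\alpha \ge 1$. Both are checked numerically in the sketch below, on a grid of sample values chosen only for this illustration; `psi_inv` is a hypothetical helper name.

```python
import math

def psi_inv(alpha, u):
    """Inverse of Psi_alpha on [0, inf): Psi_alpha^{-1}(u) = (log(1 + u))^{1/alpha}."""
    return math.log1p(u) ** (1.0 / alpha)

D = 1.0                                   # diameter D(T), normalised to 1 here
radii = [D * 2.0 ** (-n) for n in range(0, 30)]

# r_n + r_{n-1} compared with 6 (r_n - r_{n+1}), for 1 <= n <= 28
radius_gaps = [
    (radii[n] + radii[n - 1], 6.0 * (radii[n] - radii[n + 1]))
    for n in range(1, 29)
]

# Psi_alpha^{-1}(u^2) compared with 2 Psi_alpha^{-1}(u)
grid = [10.0 ** k for k in range(-6, 13)]
alphas = [1.0, 1.5, 2.0, 5.0]
inverse_pairs = [
    (psi_inv(a, u * u), 2.0 * psi_inv(a, u)) for a in alphas for u in grid
]
```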

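From assumption (9.1.4) it follows that $E\,\Psi_\alpha\big(\frac{X(s)-X(t)}{d(s,t)}\big) \le 1$, for any $s, t \in T$ with $d(s, t) \ne 0$. And if $d(s, t) = 0$, then $X(s) = X(t)$ almost surely. Integrating now the latter inequality with respect to $\mu\times\mu$ over $T\times T$, next using Fubini's theorem, yields

$$E \int_T \int_T \Psi_\alpha(\tilde{X}(s, t))\, \mu(ds)\mu(dt) \le 1.$$

Hence $\tilde{X} \in L^{\Psi_\alpha}(T^2, \mu\times\mu)$ $P$-almost surely. Let now $\{u_n, n \ge 1\}$ be a sequence of reals decreasing to $0$ and such that $\sum_n 2^{-n}/u_n < \infty$. By Tchebycheff's inequality

$$\begin{aligned}
P\big\{|X_{r_n}(t) - X_{r_{n-1}}(t)| > u_n\big\} &\le \frac{1}{u_n} \int_{B(t,r_n)} \int_{B(t,r_{n-1})} E|X(u) - X(v)|\, \frac{\mu(du)\mu(dv)}{\mu(B(t, r_n))\mu(B(t, r_{n-1}))}\\
&\le \frac{C_\alpha}{u_n} \int_{B(t,r_n)} \int_{B(t,r_{n-1})} \|X(u) - X(v)\|_{\Psi_\alpha}\, \frac{\mu(du)\mu(dv)}{\mu(B(t, r_n))\mu(B(t, r_{n-1}))}\\
&\le \frac{C_\alpha}{u_n} \int_{B(t,r_n)} \int_{B(t,r_{n-1})} \frac{d(u, v)\, \mu(du)\mu(dv)}{\mu(B(t, r_n))\mu(B(t, r_{n-1}))} \le \frac{3D(T)2^{-n} C_\alpha}{u_n}.
\end{aligned}$$

And so by the Borel–Cantelli lemma, we get

$$P\big\{\lim_{n\to\infty} X_{r_n}(t) = X(t)\big\} = 1.$$

(9.1.7)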


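Owing to the fact that $X_{r_0}(t) = \int_T X(t)\, \mu(dt)$, we deduce

$$\begin{aligned}
\Big|X(t) - \int_T X(t)\, \mu(dt)\Big| = \lim_{n\to\infty} \Big|X_{r_n}(t) - \int_T X(t)\, \mu(dt)\Big| &\le \sum_{n=1}^{\infty} |X_{r_n}(t) - X_{r_{n-1}}(t)|\\
&\le 12\, \|\tilde{X}\|_{\Psi_\alpha,\mu\times\mu} \sum_{n=1}^{\infty} \int_{r_{n+1}}^{r_n} \Psi_\alpha^{-1}\Big(\frac{1}{\mu(B(t, u))}\Big)\, du\\
&= 12\, \|\tilde{X}\|_{\Psi_\alpha,\mu\times\mu} \int_0^{D(T)/2} \Psi_\alpha^{-1}\Big(\frac{1}{\mu(B(t, u))}\Big)\, du.
\end{aligned}$$

(9.1.8)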

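Passing to the supremum over all $t$ varying in $T$ gives the first inequality of the statement. Now let $s, t \in T$ be fixed with $d(s, t) = 2r > 0$, and put successively

$$A = B(s, r) \cup B(t, r), \qquad B = B(s, r), \qquad C = B(t, r).$$

Then $\delta(A, B) \le 4r$, and $\delta(A, C) \le 4r$. But,

$$|X(s) - X(t)| \le |X(s) - X_r(s)| + |X_r(s) - X_A| + |X_A - X_r(t)| + |X_r(t) - X(t)|.$$

(9.1.9)

From (9.1.5) we deduce

$$\begin{aligned}
|X_r(s) - X_A| &\le 4r\, \|\tilde{X}\|_{\Psi_\alpha,\mu\times\mu}\, \Psi_\alpha^{-1}\Big(\frac{1}{\mu(A)\mu(B)}\Big) \le 4r\, \|\tilde{X}\|_{\Psi_\alpha,\mu\times\mu}\, \Psi_\alpha^{-1}\Big(\frac{1}{\mu^2(B(s, r))}\Big)\\
&\le 8\, \|\tilde{X}\|_{\Psi_\alpha,\mu\times\mu} \int_0^r \Psi_\alpha^{-1}\Big(\frac{1}{\mu(B(s, u))}\Big)\, du \quad \text{almost surely.}
\end{aligned}$$

Operating similarly for the other terms in (9.1.9) gives, in view of (9.1.8),

$$|X(s) - X(t)| \le 40\, \|\tilde{X}\|_{\Psi_\alpha,\mu\times\mu} \sup_{\theta\in T} \int_0^{\frac{d(s,t)}{2}} \Psi_\alpha^{-1}\Big(\frac{1}{\mu(B(\theta, u))}\Big)\, du \quad \text{almost surely.}$$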

Passing again to the supremum over all s and t such that d(s, t) < ρ and varying in T , gives the second inequality of the statement. We notice from this proof and from (9.1.9) particularly, that it was also possible to work directly with the original process X, and control its supremum over any countable subset of T with the help of (9.1.9), thereby avoiding separability considerations. This has interest for boundedness, when controlling lattice suprema defined in (8.1.6).


9.2 A general approach

In several remarkable papers [1987], [1990], [1992], [1994], [1996c], [2001] and also in a recent book [2005], Talagrand showed that the majorizing measure method is in turn a rather general approach to treat problems such as sample boundedness or sample continuity of stochastic processes. It applies not only to the exponential case but equally well to the power case, with some complications inherent to this important case. For simplicity of the exposition concerning sample boundedness, we will understand suprema as lattice suprema as in (8.1.6). Consider $\varphi : \mathbb{R}^+ \to \mathbb{R}^+$ such that $\varphi(0) = 0$ and $\varphi$ is strictly increasing and continuous. Let $\psi = \varphi^{-1}$ and set for $x > 0$,

$$\Phi(x) = \int_0^x \varphi(t)\, dt, \qquad \Psi(x) = \int_0^x \psi(t)\, dt.$$

Then $\Phi$ and $\Psi$ are called conjugate Young functions, and we have Young's inequality

$$uv \le \Phi(u) + \Psi(v), \qquad (u \ge 0,\ v \ge 0).$$

(9.2.1)
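As an illustration of (9.2.1), take $\varphi(t) = t^{p-1}$ with $p > 1$. Then $\psi(t) = t^{1/(p-1)}$, $\Phi(u) = u^p/p$ and $\Psi(v) = v^q/q$ with $q = p/(p-1)$, and Young's inequality becomes the classical $uv \le u^p/p + v^q/q$, with equality when $v = \varphi(u)$. The sketch below checks this on a grid; the function names and the grid are choices made for this example.

```python
import math

def Phi(u, p):
    """Phi(u) = integral of t^{p-1} over [0, u] = u^p / p."""
    return u ** p / p

def Psi(v, p):
    """Conjugate function: integral of t^{1/(p-1)} over [0, v] = v^q / q, q = p/(p-1)."""
    q = p / (p - 1.0)
    return v ** q / q

grid = [0.0, 0.1, 0.5, 1.0, 2.0, 7.5]
exponents = [1.5, 2.0, 3.0]
# Young gap Phi(u) + Psi(v) - u v, which should never be negative.
gaps = [Phi(u, p) + Psi(v, p) - u * v for p in exponents for u in grid for v in grid]
# Equality case v = phi(u) = u^{p-1}.
equality_gaps = [Phi(u, p) + Psi(u ** (p - 1.0), p) - u * u ** (p - 1.0)
                 for p in exponents for u in grid]
```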

We say that a function $f : \mathbb{R} \to \mathbb{R}^+$ satisfies the $\Delta_2$-condition with constant $C$ if for all $x \ge 1$, we have $f(2x) \le Cf(x)$. Typical examples are power functions $f(x) = |x|^p$, $p \ge 1$. The general result below is essentially due to Assouad; for a proof see Talagrand [1990: Theorem 2.3].

9.2.1 Theorem. Let $(T, d)$ be a metric space and let $\Phi$ be a Young function. Then the following are equivalent:

(a) For any stochastic process $\{X_t, t \in T\}$ that satisfies

$$\|X_s - X_t\|_\Phi \le d(s, t) \quad \text{for any } s, t \in T,$$

(9.2.2)

we have $P\{\sup_{t,u\in T} |X_t - X_u| < \infty\} > 0$.

(b) For each $\varepsilon > 0$, there is $A > 0$, such that for each stochastic process $\{X_t, t \in T\}$ that satisfies (9.2.2), we have $P\{\sup_{t,u\in T} |X_t - X_u| \ge A\} \le \varepsilon$.

(c) There exists a constant $S$ such that for each stochastic process $\{X_t, t \in T\}$ that satisfies (9.2.2), we have $E \sup_{t,u\in T} |X_t - X_u| \le S$.

(d) There exists a constant $M$, a positive linear functional $\theta$ on the space $G$ of continuous bounded functions on $T\times T \setminus \Delta(T)$, $\Delta(T)$ being the diagonal of $T$, with $\theta(1) = 1$, such that for any Lipschitz function $f$ on $T$ we have the implication

$$\theta\Big(\Phi\Big(\frac{f(t) - f(u)}{d(t, u)}\Big)\Big) \le 1 \ \Longrightarrow\ \sup_{t,u\in T} |f(t) - f(u)| \le M.$$

Moreover, these conditions imply that $T$ is totally bounded, and if $S$, $M$ are chosen minimal, we have $M \le S \le 2M$.


The following important result extends Theorem 9.1.1 to the power case. The first part of the statement is Theorem 4.6 in Talagrand [1990]; the second part follows from Theorem 2.9 in the same paper.

9.2.2 Theorem. Let $(T, d)$ be a compact metric space. Let $\Phi$ be a Young function and assume that $\Psi$ satisfies the $\Delta_2$-condition with constant $C$.

(a) Assume that there is a probability measure $m$ on $T$ such that
\[ \sup_{t \in T} \int_0^{D(T)} \Phi^{-1}\Big(\frac{1}{m(B(t, \varepsilon))}\Big)\,d\varepsilon \le A. \]

Then there exists a universal constant $K$ such that for any stochastic process $X = \{X_t, t \in T\}$ that satisfies (9.2.2), we have $E \sup_{t,u \in T} |X_t - X_u| \le S$ with $S = KA(1 + \log C)$.

(b) Further assume that $\lim_{x \to \infty} \Phi(x)/x = \infty$. If $X$ is separable, then it is moreover sample continuous.

A probability measure $m$ such that
\[ \sup_{t \in T} \int_0^{D(T)} \Phi^{-1}\Big(\frac{1}{m(B(t, \varepsilon))}\Big)\,d\varepsilon < \infty \tag{9.2.3} \]
is called a majorizing measure. The requirement that $\Psi$ satisfy the $\Delta_2$-condition is met if $\Phi(x) = |x|^p$, $p > 1$, but fails if $\Phi(x) = |x| \,(\log(1 + |x|))^{\beta}$, $\beta > 0$. The theorem is obtained as a combination of several results and, what is essential, the approach consists of approximating $(T, d)$ by ultrametric spaces. A metric space $(U, \delta)$ is ultrametric when the metric satisfies the stronger condition
\[ \delta(u, v) \le \max\big(\delta(u, w), \delta(w, v)\big) \qquad (u, v, w \in U). \]
In an ultrametric space, two balls of equal radius are either disjoint or identical, which makes the structure of these spaces rather simple.

Let $S(T, d, \Phi)$ be the smallest constant $S$ such that for any stochastic process that satisfies (9.2.2), we have $E \sup_{t,u \in T} |X_t - X_u| \le S$. By Theorem 9.2.2, we know that $S(T, d, \Phi) \le K A(T, d, \Phi)(1 + \log C)$, where
\[ A(T, d, \Phi) := \inf_{m \in \mathcal{P}(T)} \sup_{t \in T} \int_0^{D} \Phi^{-1}\Big(\frac{1}{m(B(t, \varepsilon))}\Big)\,d\varepsilon. \]
When $(T, d)$ is ultrametric, a two-sided inequality holds:
\[ \frac{1}{8} A(T, d, \Phi) \le S(T, d, \Phi) \le K(1 + \log C) A(T, d, \Phi). \tag{9.2.4} \]
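The remark that two balls of equal radius in an ultrametric space are either disjoint or identical can be checked on a small example. The sketch below uses binary words of length 4 with the distance $2^{-(\text{length of the common prefix})}$, a standard ultrametric chosen here purely for illustration; the space and all function names are assumptions, not objects from the text.

```python
from itertools import product

def common_prefix_len(x, y):
    """Length of the longest common prefix of two words."""
    n = 0
    for a, b in zip(x, y):
        if a != b:
            break
        n += 1
    return n

def delta(x, y):
    """Prefix distance on words of equal length: 0 if x == y, else 2**(-common prefix length)."""
    return 0.0 if x == y else 2.0 ** (-common_prefix_len(x, y))

U = ["".join(w) for w in product("01", repeat=4)]   # the 16 binary words of length 4

def ball(center, radius):
    """Closed ball of the given radius, as a frozenset of words."""
    return frozenset(u for u in U if delta(center, u) <= radius)

def is_ultrametric():
    """Check delta(u, v) <= max(delta(u, w), delta(w, v)) for all triples."""
    return all(delta(u, v) <= max(delta(u, w), delta(w, v))
               for u in U for v in U for w in U)

def equal_radius_balls_disjoint_or_identical(radius):
    """Check the dichotomy for all pairs of closed balls of one fixed radius."""
    balls = {ball(u, radius) for u in U}
    return all(a == b or a.isdisjoint(b) for a in balls for b in balls)
```

In this example the balls of radius $2^{-j}$ are exactly the sets of words sharing a prefix of length $j$, which is the tree structure exploited when approximating $(T, d)$ by ultrametric spaces.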


In the general case, Talagrand showed (Theorem 1.2 in the same paper) that
\[ \frac{1}{4} \inf_{m \in \mathcal{P}(T)} \sup_{t \in T} \int_0^{D} \psi\Big(\frac{1}{m(B(t, \varepsilon))}\Big)\,d\varepsilon \le S(T, d, \Phi), \tag{9.2.5} \]
which is always weaker than $\frac{1}{4} A(T, d, \Phi) \le S(T, d, \Phi)$, and strictly weaker for instance if $\Phi(x) = |x|^p$, $p > 1$. However, when $\Phi$ increases fast enough (essentially faster than $x^{\alpha \log\log x}$ for some $\alpha > 0$), both inequalities are equivalent, thus giving a complete understanding of the condition $S(T, d, \Phi) < \infty$.

These lower counterparts are, however, satisfactory only from a theoretical point of view. Indeed, in Weber [1999: Section 4] we showed by means of Birkhoff's theorem and a theorem of Tandori that it is possible to find two stochastic processes $X^i = \{X^i_t, t \in \mathbb{N}\} \subset L^2(P)$, $i = 1, 2$, with increments satisfying
\[ \|X^1_s - X^1_t\|_2 \le \|X^2_s - X^2_t\|_2 \qquad (\forall s, t \in \mathbb{N}) \]
and such that $X^2$ is almost surely convergent, whereas $X^1$ is not.

Recently Bednorz [2006a] (see Theorems 1.2 and 3.1) has proposed a simplified and slightly more general approach, although much inspired by Talagrand's paper. Ultrametric spaces are, however, not involved in Bednorz's proofs. Their main feature lies in the role played by an adapted calibration of the balls of the parameter space. This is a nice and also pedagogical approach, which we shall present now.

Bednorz's approach. Let $(T, d)$ be a fixed metric space and $m$ a fixed Borel probability on $(T, d)$ such that $\operatorname{supp}(m) = T$. For $a, b \ge 0$ let $G_{a,b}$ be the family of functions $\Phi : \mathbb{R}_+ \to \mathbb{R}$ which are increasing, continuous with $\Phi(0) = 0$ and such that
\[ x \le a + b\,\frac{\Phi(xy)}{\Phi(y)} \qquad \text{for } x \ge 0,\ y \ge \Phi^{-1}(1). \]
Note that each Young function is in $G_{1,1}$. Let $B(T)$ be the space of all bounded Borel functions on $T$ and $C(T)$ the space of continuous functions on $T$. Given a function $\Phi$ in $G_{a,b}$, define
\[ s(x) = \int_0^{D(T)} \Phi^{-1}\Big(\frac{1}{m(B(x, \varepsilon))}\Big)\,d\varepsilon, \qquad S = \sup_{x \in T} s(x), \qquad \tilde{S} = \int_T s(u)\,m(du). \]

9.2.3 Theorem. Suppose $\Phi \in G_{a,b}$ and let $R > 2$. Then there exists a probability measure $\nu$ on $T \times T$ such that for each bounded continuous function $f$ on $T$ the inequality
\[ \Big| f(t) - \int_T f(u)\,m(du) \Big| \le aA\,s(t) + bB\,\tilde{S} \int_{T \times T} \Phi\Big(\frac{|f(u) - f(v)|}{d(u, v)}\Big)\,\nu(du, dv) \]
holds for all $t \in T$, where
\[ A = \frac{R^3}{(R-1)(R-2)}, \qquad B = \frac{R^2}{R-1}. \]

A consequence of this result is

9.2.4 Theorem. If $\Phi$ is a Young function and $m$ is a majorizing measure, then for any stochastic process $X = \{X_t, t \in T\}$ that satisfies (9.2.2), we have
\[ E \sup_{t,u \in T} |X_t - X_u| \le 32 S. \]
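The constant 32 can be traced numerically. The sketch below assumes the bound $2aAS + 2bB\tilde{S}$ obtained in the proof of Theorem 9.2.4, with $a = b = 1$ (a Young function lies in $G_{1,1}$), $\tilde{S} \le S$, and $A = R^3/((R-1)(R-2))$, $B = R^2/(R-1)$ as computed in the proof of Theorem 9.2.3. That the constant corresponds to the choice $R = 4$ is an inference from these formulas, not a statement made in the text; the function names are illustrative.

```python
def A(R):
    """A = R^3 / ((R - 1)(R - 2)), as obtained in the proof of Theorem 9.2.3."""
    return R**3 / ((R - 1.0) * (R - 2.0))

def B(R):
    """B = R^2 / (R - 1)."""
    return R**2 / (R - 1.0)

def constant(R, a=1.0, b=1.0):
    """Factor 2aA + 2bB multiplying S once S-tilde is bounded by S."""
    return 2.0 * a * A(R) + 2.0 * b * B(R)

# Scan R over a grid in (2, 22) and locate the minimizer of the factor.
grid = [2.0 + 0.01 * i for i in range(1, 2000)]
best_R = min(grid, key=constant)
```

Algebraically, $2A + 2B = 4R^2/(R-2)$, which is minimal at $R = 4$ with value 32.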

Theorems 9.2.3 and 9.2.4 apply even if $\Phi(x) = |x|$. It should be noted, however, that the estimate given in Theorem 9.2.3 is not homogeneous in $f$.

Proof of Theorem 9.2.3. Define the integer $k_0$ by the condition $R^{k_0} \le \Phi^{-1}(1) < R^{k_0+1}$. Next put, for $k > k_0$ and any $x \in T$,
\[ r_k(x) := \min\Big\{ \varepsilon \ge 0 : \Phi^{-1}\Big(\frac{1}{m(B(x, \varepsilon))}\Big) \le R^k \Big\}. \tag{9.2.6} \]
If $k = k_0$ we simply set $r_{k_0}(x) \equiv D(T)$. The first important fact is that:
\[ \text{For } k \ge k_0, \text{ the functions } r_k \text{ are 1-Lipschitz.} \tag{9.2.7} \]
Indeed, from the elementary inclusion $B(t, \rho) \subset B(s, \rho + d(s, t))$, valid for arbitrary $s, t, \rho$, we deduce
\[ \Phi^{-1}\Big(\frac{1}{m(B(s, r_k(t) + d(s, t)))}\Big) \le \Phi^{-1}\Big(\frac{1}{m(B(t, r_k(t)))}\Big) \le R^k, \]
which implies $r_k(s) \le r_k(t) + d(s, t)$, and similarly $r_k(t) \le r_k(s) + d(s, t)$; hence $|r_k(s) - r_k(t)| \le d(s, t)$ as claimed. Now observe that
\begin{align*}
\sum_{k=k_0}^{K} r_k(x)(R^k - R^{k-1}) &= -r_{k_0}(x) R^{k_0 - 1} + R^{k_0}\big(r_{k_0}(x) - r_{k_0+1}(x)\big) + \cdots + R^{K-1}\big(r_{K-1}(x) - r_K(x)\big) + R^K r_K(x) \\
&\le \sum_{k=k_0}^{K-1} R^k \big(r_k(x) - r_{k+1}(x)\big) + R^K r_K(x) \\
&\le \int_0^{r_{k_0}(x)} \Phi^{-1}\Big(\frac{1}{m(B(x, u))}\Big)\,du + R^K r_K(x).
\end{align*}
Hence
\begin{align*}
\sum_{k=k_0}^{\infty} r_k(x)(R^k - R^{k-1}) &\le \int_0^{D(T)} \Phi^{-1}\Big(\frac{1}{m(B(x, u))}\Big)\,du + \limsup_{K \to \infty} R^K r_K(x) \\
&= \int_0^{D(T)} \Phi^{-1}\Big(\frac{1}{m(B(x, u))}\Big)\,du.
\end{align*}
And consequently
\[ \sum_{k=k_0}^{\infty} r_k(x) R^k \le \frac{R}{R-1} \int_0^{D(T)} \Phi^{-1}\Big(\frac{1}{m(B(x, u))}\Big)\,du = \frac{R}{R-1}\, s(x). \tag{9.2.8} \]
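The radii $r_k$ of (9.2.6), their 1-Lipschitz property (9.2.7) and the bound (9.2.8) can be explored on a toy example. The sketch below assumes a finite set of integer points on the line with the uniform measure, closed balls, $\Phi(x) = x$ (so $k_0 = 0$) and $R = 3$; these choices, and all names in the code, are hypothetical and serve only to keep the arithmetic exact.

```python
from fractions import Fraction

R = 3                                   # any R > 2 would do
POINTS = [i * i for i in range(12)]     # hypothetical finite T inside the real line
N = len(POINTS)                         # uniform measure: m(B) = count / N
DIAM = max(POINTS) - min(POINTS)        # D(T)

def ball_count(x, eps):
    """Number of points in the closed ball B(x, eps)."""
    return sum(1 for y in POINTS if abs(x - y) <= eps)

def r(k, x):
    """r_k(x) from (9.2.6) for Phi(x) = x; r_0 is set to D(T) since k0 = 0."""
    if k == 0:
        return DIAM
    # m(B(x, eps)) only jumps at distances from x, so the minimum is one of them.
    for eps in sorted({abs(x - y) for y in POINTS}):
        # Phi^{-1}(1 / m(B(x, eps))) <= R^k  is equivalent to  N <= count * R^k
        if N <= ball_count(x, eps) * R**k:
            return eps

def s(x):
    """s(x): integral over [0, D(T)] of 1 / m(B(x, eps)), an exact step-function integral."""
    cuts = sorted({abs(x - y) for y in POINTS} | {DIAM})
    return sum(Fraction(N, ball_count(x, lo)) * (hi - lo)
               for lo, hi in zip(cuts, cuts[1:]))

K_MAX = 3                               # R**3 >= N, hence r_k = 0 for all k >= 3

def lipschitz_ok():
    """Check |r_k(u) - r_k(v)| <= d(u, v) for all k and all pairs of points."""
    return all(abs(r(k, u) - r(k, v)) <= abs(u - v)
               for k in range(K_MAX + 1) for u in POINTS for v in POINTS)

def bound_ok():
    """Check sum_k r_k(x) R^k <= R/(R-1) * s(x), i.e. (9.2.8), at every point."""
    return all(sum(r(k, x) * R**k for k in range(K_MAX + 1))
               <= Fraction(R, R - 1) * s(x) for x in POINTS)
```

Because the measure has atoms here, $r_k(x) = 0$ as soon as $R^k \ge N$, so the series in (9.2.8) is a finite sum.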

Now introduce for $k \ge k_0$ the notation
\[ B_k(x) = B(x, r_k(x)), \qquad S_k f(x) = \frac{1}{m(B_k(x))} \int_{B_k(x)} f(u)\,m(du) := \fint_{B_k(x)} f(u)\,m(du). \]
The operators $S_k$ satisfy the following properties:
\[ S_k(1) = 1, \qquad f \le g \implies S_k f \le S_k g \ \text{ and } \ |S_k f| \le S_k |f|, \qquad S_k S_{k_0} f = S_{k_0} f, \qquad f \in C(T) \implies f(x) = \lim_{k \to \infty} S_k f(x). \tag{9.2.9} \]
Now observe this: let $i, j \ge k_0$ and take $v$ in $B_i(u) = B(u, r_i(u))$. By (9.2.7), $|r_j(v) - r_j(u)| \le d(u, v) \le r_i(u)$. Hence
\[ S_i r_j(u) = \fint_{B_i(u)} r_j(v)\,m(dv) \le \fint_{B_i(u)} r_j(u)\,m(dv) + \fint_{B_i(u)} r_i(u)\,m(dv) = r_j(u) + r_i(u), \]
and so, for $i, j \ge k_0$,
\[ S_i r_j \le r_j + r_i. \]
This will permit us to establish a key ingredient of the proof, namely the inequality
\[ S_m S_{m-1} \cdots S_{k+1} r_k \le \sum_{i=k}^{m} 2^{i-k} r_i. \tag{9.2.10} \]
If $m = k + 1$, this reduces to $S_{k+1} r_k \le r_k + 2 r_{k+1}$, which is clear by what precedes. Now, if for $m - 1 > k \ge k_0$,
\[ S_{m-1} \cdots S_{k+1} r_k \le \sum_{i=k}^{m-1} 2^{i-k} r_i, \]
then
\begin{align*}
S_m S_{m-1} \cdots S_{k+1} r_k &\le S_m \sum_{i=k}^{m-1} 2^{i-k} r_i = \sum_{i=k}^{m-1} 2^{i-k} S_m r_i \le \sum_{i=k}^{m-1} 2^{i-k} (r_m + r_i) \\
&= \sum_{i=k}^{m-1} 2^{i-k} r_i + r_m \sum_{i=k}^{m-1} 2^{i-k} \le \sum_{i=k}^{m} 2^{i-k} r_i,
\end{align*}
as claimed. Finally note that
\begin{align*}
\sum_{k=k_0}^{m-1} \sum_{i=k}^{m} 2^{i-k} r_i R^k &= \sum_{k=k_0}^{m-1} \sum_{i=k}^{m} \Big(\frac{2}{R}\Big)^{i-k} r_i R^i \le \sum_{j=0}^{\infty} \Big(\frac{2}{R}\Big)^{j} \sum_{i=k_0}^{m} r_i R^i \\
&\le \frac{R}{R-2} \sum_{i=k_0}^{\infty} r_i R^i. \tag{9.2.11}
\end{align*}

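Now from the fact that
\begin{align*}
f(t) - \int_T f\,dm &= \lim_{m \to \infty} S_m f(t) - S_{k_0} f(t) = \lim_{m \to \infty} \big( S_m f(t) - S_m S_{m-1} \cdots S_{k_0} f(t) \big) \\
&= \lim_{m \to \infty} \sum_{k=k_0}^{m-1} \big( S_m S_{m-1} \cdots S_{k+2} S_{k+1} f(t) - S_m S_{m-1} \cdots S_{k+1} S_k f(t) \big),
\end{align*}
we get the bound
\begin{align*}
\Big| f(t) - \int_T f\,dm \Big| &\le \lim_{m \to \infty} \Big| \sum_{k=k_0}^{m-1} S_m S_{m-1} \cdots S_{k+2} S_{k+1} (I - S_k) f(t) \Big| \\
&\le \lim_{m \to \infty} \sum_{k=k_0}^{m-1} S_m S_{m-1} \cdots S_{k+2} \big| S_{k+1} (I - S_k) f \big|(t). \tag{9.2.12}
\end{align*}
But
\[ S_{k+1}(I - S_k) f(w) = \fint_{B_{k+1}(w)} \fint_{B_k(u)} \big( f(u) - f(v) \big)\,m(dv)\,m(du). \]
And so
\[ |S_{k+1}(I - S_k) f(w)| \le \fint_{B_{k+1}(w)} \fint_{B_k(u)} |f(u) - f(v)|\,m(dv)\,m(du). \tag{9.2.13} \]
Using the fact that $\Phi \in G_{a,b}$, with $x = \dfrac{|f(u) - f(v)|}{R^{k+1} d(u, v)}$, $y = R^{k+1}$, yields
\[ \frac{|f(u) - f(v)|}{R^{k+1} d(u, v)} \le a + \frac{b}{\Phi(R^{k+1})}\, \Phi\Big(\frac{|f(u) - f(v)|}{d(u, v)}\Big). \]
Let $v \in B_k(u)$; then by definition $d(u, v) \le r_k(u)$. Note also that the inequality $m(B_{k+1}(w)) \ge 1/\Phi(R^{k+1})$ holds for any $w \in T$, by construction. Incorporating these two ingredients into the above now leads to the more suitable form
\[ |f(u) - f(v)| \le a\, r_k(u) R^{k+1} + b\, m(B_{k+1}(w))\, r_k(u) R^{k+1}\, \Phi\Big(\frac{|f(u) - f(v)|}{d(u, v)}\Big). \tag{9.2.14} \]
Thus with (9.2.13) and (9.2.14),
\[ |S_{k+1}(I - S_k) f(w)| \le a R^{k+1} S_{k+1} r_k(w) + b \int_T r_k(u) R^{k+1} \fint_{B_k(u)} \Phi\Big(\frac{|f(u) - f(v)|}{d(u, v)}\Big)\,m(du)\,m(dv). \tag{9.2.15} \]
By (9.2.10), $S_m S_{m-1} \cdots S_{k+1} r_k \le \sum_{i=k}^{m} 2^{i-k} r_i$. Therefore
\[ S_m S_{m-1} \cdots S_{k+2} \big| S_{k+1}(I - S_k) f \big|(t) \le aR \sum_{i=k}^{m} 2^{i-k} r_i(t) R^k + bR \int_T r_k(u) R^k \fint_{B_k(u)} \Phi\Big(\frac{|f(u) - f(v)|}{d(u, v)}\Big)\,m(du)\,m(dv). \]
In view of (9.2.11), (9.2.12) and then (9.2.8), we get
\begin{align*}
\Big| f(t) - \int_T f\,dm \Big| &\le a\,\frac{R^2}{R-2} \sum_{k=k_0}^{\infty} r_k(t) R^k + bR \sum_{k=k_0}^{\infty} \int_T r_k(u) R^k \fint_{B_k(u)} \Phi\Big(\frac{|f(u) - f(v)|}{d(u, v)}\Big)\,m(du)\,m(dv) \\
&\le aA\,s(t) + bR \sum_{k=k_0}^{\infty} \int_T r_k(u) R^k \fint_{B_k(u)} \Phi\Big(\frac{|f(u) - f(v)|}{d(u, v)}\Big)\,m(du)\,m(dv), \tag{9.2.16}
\end{align*}
where $A = \dfrac{R^3}{(R-1)(R-2)}$. Let $\nu$ be the probability measure on $T \times T$ defined by
\[ \nu(g) := \frac{1}{M} \sum_{k=k_0}^{\infty} \int_T r_k(u) R^k \fint_{B_k(u)} g(u, v)\,m(du)\,m(dv) \qquad \text{for } g \in B(T \times T), \]
where $M = \sum_{k=k_0}^{\infty} \int_T r_k(u) R^k\,m(du)$. By (9.2.8) we have
\[ M \le \frac{R}{R-1} \int_T s(u)\,m(du) = \frac{R}{R-1}\, \tilde{S}. \]
Hence
\[ \Big| f(t) - \int_T f\,dm \Big| \le aA\,s(t) + bB\,\tilde{S} \int_{T \times T} \Phi\Big(\frac{|f(u) - f(v)|}{d(u, v)}\Big)\,\nu(du, dv), \tag{9.2.17} \]
where $B = \dfrac{R^2}{R-1}$. The proof is now complete.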

Proof of Theorem 9.2.4. To prove the result, we may replace the process $\{X(t), t \in T\}$ by $\{X(t) - X(t_0), t \in T\}$, where $t_0$ is arbitrary in $T$. As $X(t) - X(t_0)$ is integrable by (9.2.2), we may also assume for the proof that $X(t)$ is integrable. Let $(\Omega, \mathcal{B}, P)$ be the underlying probability space on which $X$ is defined. First assume that $\mathcal{B}$ is finite. We identify points in each atom of $\mathcal{B}$ and so assume that $\Omega$ is finite. Observe from (9.2.2) that
\[ |X(\omega, s) - X(\omega, t)| \le d(s, t)\, \Phi^{-1}\big(1/P(\{\omega\})\big). \tag{9.2.18} \]
This means that the trajectories of $X$ are Lipschitz and bounded, and thereby bounded and continuous. Now, from Theorem 9.2.3 and the triangle inequality, it also follows that there exists a probability measure $\nu$ on $T \times T$ such that for each bounded continuous function $f$ on $T$,
\[ \sup_{s,t \in T} |f(s) - f(t)| \le 2aAS + 2bB\,\tilde{S} \int_{T \times T} \Phi\Big(\frac{|f(u) - f(v)|}{d(u, v)}\Big)\,\nu(du, dv). \]
Therefore
\[ E \sup_{s,t \in T} |X(s) - X(t)| \le 2aAS + 2bB\,\tilde{S} \int_{T \times T} E\,\Phi\Big(\frac{|X(u) - X(v)|}{d(u, v)}\Big)\,\nu(du, dv) \le 2aAS + 2bB\,\tilde{S}. \]
In the general case, we have to show for any finite subset $T_0$ of $T$ that
\[ E \sup_{s,t \in T_0} |X(s) - X(t)| \le 2aAS + 2bB\,\tilde{S}. \]
We may assume that $\mathcal{B}$ is countably generated. So there exists an increasing sequence $\mathcal{B}_n$ of finite $\sigma$-fields whose union generates $\mathcal{B}$. As $E|X(t)| < \infty$, the conditional expectations $X_n(t) = E(X(t) \mid \mathcal{B}_n)$ are well defined. Observe by Jensen's inequality that
\[ E\,\Phi\Big(\frac{|X_n(s) - X_n(t)|}{d(s, t)}\Big) \le E\,\Phi\Big(\frac{|X(s) - X(t)|}{d(s, t)}\Big) \le 1. \]
Hence
\[ E \sup_{s,t \in T_0} |X_n(s) - X_n(t)| \le 2aAS + 2bB\,\tilde{S}. \]
Owing to the fact that $X_n(t) \to X(t)$ $P$-almost surely and in $L^1(P)$ for each $t \in T_0$, we conclude that
\[ E \sup_{s,t \in T_0} |X(s) - X(t)| \le 2aAS + 2bB\,\tilde{S} \]
for any finite $T_0 \subset T$, as requested.

9.2.5 Remark. It is natural to ask whether, under the existence of a majorizing measure, the following implication is true:
\[ \|X_t - X_u\|_{\Phi} \le d(t, u),\ \forall t, u \in T \implies \Big\| \sup_{t,u \in T} |X_t - X_u| \Big\|_{\Phi} < \infty. \]


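By Theorem 9.1.2 and Proposition 2.7 in Talagrand [1990], if there exists a Young function $\Phi$ and $\alpha > 0$ such that
\[ a \ge \Phi^{-1}\big(\tfrac{1}{2}\big) \ \text{ and } \ b \ge 1 \implies \Phi(ab) \ge \alpha\, \Phi(a)\, \Phi(b), \tag{9.2.19} \]
then
\[ \Big\| \sup_{t,u \in T} |X_t - X_u| \Big\|_{\Phi} \le K(\Phi)\, S(T, d, \Phi)/\alpha, \tag{9.2.20} \]
where $K(\Phi)$ depends on $\Phi$ only. This applies if $\Phi(x) = |x|^p$, $p > 1$, in which case the answer is yes; the answer is no if $\Phi(x) = |x| \log(1 + |x|)$, by Proposition 2.9 in the same paper. The answer is also naturally yes for the exponential functions $\Phi_\alpha(x) = e^{|x|^\alpha} - 1$, $\alpha \ge 1$, considered in Section 9.1.1. Bednorz [2006a: Theorem 2.1] has also considered this problem and proved the following.

9.2.6 Proposition. Let $\Phi \in G_{a,b}$. Let $\alpha \ge 0$, $\beta \ge 0$ and let $\vartheta : \mathbb{R}_+ \to \mathbb{R}$ be increasing and continuous with $\vartheta(0) = 0$, $\lim_{x \to \infty} \vartheta(x) = \infty$, such that
\[ \vartheta(x) \le \alpha + \beta\, \frac{\Phi(xy)}{\Phi(y)} \qquad \text{for } x \ge 0,\ y \ge 0. \tag{9.2.21} \]
Then for each bounded continuous function $f$ on $T$ the following inequality holds:
\[ \sup_{t \in T} \vartheta\Big( \frac{\big| f(t) - \int_T f(u)\,m(du) \big|}{K} \Big) \le \alpha + \beta \int_{T \times T} \Phi\Big(\frac{|f(u) - f(v)|}{d(u, v)}\Big)\,\nu(du, dv), \tag{9.2.22} \]
where $K = (aA + bB)S$ and $A$, $B$, $\nu$ are as in Theorem 9.2.3.

Proof. Given $f$, let $c$ be defined by
\[ \vartheta(c) = \alpha + \beta \int_{T \times T} \Phi\Big(\frac{|f(u) - f(v)|}{d(u, v)}\Big)\,\nu(du, dv). \]
In view of (9.2.21), applied with $x = c$ and $y = |f(u) - f(v)|/(c\,d(u, v))$, for all $u, v \in T$,
\[ \Phi\Big(\frac{|f(u) - f(v)|}{c\,d(u, v)}\Big) \big( \vartheta(c) - \alpha \big) \le \beta\, \Phi\Big(\frac{|f(u) - f(v)|}{d(u, v)}\Big). \]
Thereby
\[ \int_{T \times T} \Phi\Big(\frac{|f(u) - f(v)|}{c\,d(u, v)}\Big)\,\nu(du, dv) \le \frac{\beta}{\vartheta(c) - \alpha} \int_{T \times T} \Phi\Big(\frac{|f(u) - f(v)|}{d(u, v)}\Big)\,\nu(du, dv) = 1. \]
Using now Theorem 9.2.3, we obtain
\[ \frac{1}{c} \sup_{t \in T} \Big| f(t) - \int_T f(u)\,m(du) \Big| \le \sup_{t \in T} \Big( aA\,s(t) + bB\,\tilde{S} \int_{T \times T} \Phi\Big(\frac{|f(u) - f(v)|}{c\,d(u, v)}\Big)\,\nu(du, dv) \Big) \le (aA + bB)S = K. \]
Since $\vartheta$ is increasing,
\[ \sup_{t \in T} \vartheta\Big( \frac{\big| f(t) - \int_T f(u)\,m(du) \big|}{K} \Big) = \vartheta\Big( \sup_{t \in T} \frac{\big| f(t) - \int_T f(u)\,m(du) \big|}{K} \Big) \le \vartheta(c) = \alpha + \beta \int_{T \times T} \Phi\Big(\frac{|f(u) - f(v)|}{d(u, v)}\Big)\,\nu(du, dv), \]
as requested.

Problem 10. Let $(\Omega, \mathcal{A}, P)$ be a probability space, $(T, d)$ a compact metric space and $\mu$ a Radon probability on $T$. Let $1 < p < \infty$. Consider a stochastic process $X = \{X(\omega, t), \omega \in \Omega, t \in T\}$ with increments satisfying the assumption
\[ E \int_T \int_T \Big| \frac{X(s) - X(t)}{d(s, t)} \Big|^p \mu(ds)\,\mu(dt) < \infty. \]
Find conditions ensuring that $X$ is sample bounded.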

9.3 A useful criterion

Let $\xi = \{\xi_l, l \ge 1\}$ be a sequence of random variables defined on some probability space $(\Omega, \mathcal{A}, P)$. Let $m = \{m_l, l \ge 1\}$ be a sequence of positive reals with partial sums $M_n = \sum_{l=1}^{n} m_l$. Assume that $(\xi, m)$ are linked by the increment condition
\[ E \Big| \sum_{l=i}^{j} \xi_l \Big|^2 \le \sum_{l=i}^{j} m_l \qquad (i \le j). \]
In Chapter 8, we used the metric entropy method to obtain various criteria for the almost everywhere convergence of the series $\sum_{l \ge 1} \xi_l$ under the above assumption or similar ones. Here, instead of studying the almost everywhere convergence of the series $\sum_{l \ge 1} \xi_l$, we are rather interested in finding fine convergence criteria for the averages $\frac{1}{v_n} \sum_{l \le n} \xi_l$, where $v_n$ are suitable normalizing factors. The convergence of these averages can often be efficiently established via Kronecker's lemma, once the series $\sum_{l \ge 1} \xi_l$ is shown to be convergent almost everywhere. However, the two convergence properties are basically different, and it seemed natural to develop a separate study for the averages. Because these properties are close, it also appeared appropriate to use a finer approach, namely the majorizing measure method. We shall prove by means of this method the existence of a simple general criterion, built solely from the sequence $m$, which yields remarkably efficient uniform bounds for suitable averages of the random variables $\xi_l$.

We assume from now on, and throughout the whole section, that the sequence $m = \{m_l, l \ge 1\}$ has partial sums $M_n$ verifying
\[ M_n \uparrow \infty \tag{9.3.1} \]
as $n$ tends to infinity; and we let $M = \{M_n, n \ge 1\}$. We will further assume that $m$ does not increase faster than exponentially. To be precise, we assume the following growth condition: for any $\rho$ large enough,
\[ C_m(\rho) = \sup_{n \ge 1} \sum_{k > n} \frac{m_k \rho^{-k}}{m_n \rho^{-n}} < \infty. \tag{9.3.2} \]
We also consider sequences of random variables $\xi$ satisfying a more general type of increment condition. Let $1 < p < \infty$ and $q = p/(p - 1)$ be fixed. Let $\Psi : \mathbb{R}_+ \to \mathbb{R}_+$ be increasing. We assume that
\[ \Psi(x)/x^p \ \text{is nonincreasing.} \tag{9.3.3} \]
This implies that there exists a constant $1 < C < \infty$ such that
\[ \Psi(2x) \le C\, \Psi(x) \qquad (\forall x \ge 0); \tag{9.3.4} \]
for instance $C = 2^p$, since $\Psi(2x)/(2x)^p \le \Psi(x)/x^p$. As typical examples, we have the functions $\Psi(x) = x^{\alpha} (\log(1 + x))^{\beta}$, with $0 < \alpha < p$, $\beta \in \mathbb{R}$, or $\alpha = p$ and $\beta \in \mathbb{R}_-$. Consider the more general assumption:
\[ E \Big| \sum_{l=i}^{j} \xi_l \Big|^p \le \Psi\Big( \sum_{l=i}^{j} m_l \Big) \qquad (i \le j). \tag{9.3.5} \]
Let $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$ denote a continuous increasing concave function such that $\varphi^p$ is convex and $\varphi(0) = 0$. The question studied can be described as follows.

Problem. Given $\varphi$, find conditions ensuring the existence of a constant $K$ (depending on $p$, $m$, $\Psi$ and $\varphi$ only) such that any sequence of random variables $\xi$ satisfying the increment condition (9.3.5) verifies
\[ \Big\| \sup_{n \ge 1} \frac{\big| \sum_{l=1}^{n} \xi_l \big|}{\varphi(M_n)} \Big\|_p \le K \qquad \text{and} \qquad \frac{1}{\varphi(M_n)} \sum_{l=1}^{n} \xi_l \xrightarrow{\ \text{a.s.}\ } 0. \tag{9.3.6} \]


We introduce a definition.

9.3.1 Definition. A function $\varphi$ enjoying property (9.3.6) will be called $(p, \Psi, m)$-admissible, or more simply admissible.

The difficulty in applying the majorizing measure method, when compared to other methods, lies in the fact that one has not only to imagine the measure, but also to invent an argument that goes with it and shows that this measure does satisfy the majorizing measure condition. Once this step is performed, the method yields efficient bounds. Introduce the following conditions linking $\Psi$ and $\varphi$:
\[ \text{(a)}\ \ \varphi(x)/\Psi(x)^{1/p} \ \text{is nondecreasing}, \qquad \text{(b)}\ \ \int_{\lambda}^{\infty} \frac{\Psi(t)^{1/p}}{t\, \varphi(t)}\,dt < \infty \ \text{ for some } \lambda > 0. \tag{9.3.7} \]
Finally, we define a class of functions of particular relevance.

9.3.2 Definition. Let $\mathcal{L}$ be the class of functions defined as follows:
\[ \mathcal{L} = \Big\{ L : \mathbb{R}_+ \to \mathbb{R}_+ : \frac{L(t)}{t^p} \ \text{is nonincreasing and} \ \int_{\lambda}^{\infty} \frac{dt}{L(t)} < \infty \ \text{for some } \lambda > 0 \Big\}. \]
The following criterion is the main result of the section.

9.3.3 Theorem. Assume that $(\Psi, \varphi)$ satisfy condition (9.3.7). Assume further that $(m, \Psi, \varphi)$ are linked by the following condition: there exists $L \in \mathcal{L}$ such that
\[ \sup_{n \ge 1} \frac{L(M_n)^{1/p}}{\varphi(M_n)} \Big[ \Big( \frac{\Psi(m_n)}{m_n} \Big)^{1/p} + \int_{\Psi(m_n)}^{\Psi(M_n)} \frac{dt}{t^{1/q}\, \Psi^{-1}(t)^{1/p}} \Big] < \infty. \tag{9.3.8} \]
Then $\varphi$ is admissible.

The criterion we obtain is directly expressed in terms of the sequences $m$ and $M$, which is not possible by means of the metric entropy method, since the latter uses, by definition, covering numbers. This also makes it very easy to use. In some important cases, condition (9.3.8) can be simplified. Assume that $m$ is a bounded sequence. Then condition (9.3.8) is equivalent to: there exists $L \in \mathcal{L}$ such that
\[ \sup_{n \ge 1} \frac{L(M_n)^{1/p}}{\varphi(M_n)} \int_{\Psi(m_n)}^{\Psi(M_n)} \frac{dt}{t^{1/q}\, \Psi^{-1}(t)^{1/p}} < \infty. \tag{9.3.9} \]
This is immediate since $\Psi(x) \le x^p$; so $\Psi(m_n)/m_n \le m_n^{p-1}$, and $m$ is bounded. If $\Psi(x) \le x$, then $x^{1/q}\, \Psi^{-1}(x)^{1/p} \ge x$, and condition (9.3.8) reduces to: there exists $L \in \mathcal{L}$ such that
\[ \sup_{n \ge 1} \frac{L(M_n)^{1/p}}{\varphi(M_n)} \log \frac{\Psi(M_n)}{\Psi(m_n)} < \infty. \tag{9.3.10} \]
In the next statements, we apply Theorem 9.3.3 to the case $\Psi(x) = x^{\beta}$, $0 < \beta \le p$.


9.3.4 Corollary (0 < β < 1). ϕ is admissible if there exists L ∈ L such that ∞ L(Mn )1/p (β−1)/p dt < ∞ and (b) mn < ∞. (a) sup 1−β/p ϕ(t) n≥1 ϕ(Mn ) m1 t If mn ≥ c > 0, then for any L ∈ L, ϕ(t) = L(t)1/p is admissible; and, for instance, ϕ(t) equals t 1/p logτ/p (1 + t) with τ > 1. The first assertion is immediate. Concerning the second, if ϕ(t) = L(t)1/p , then (a) is fulfilled and we observe by Hölder’s inequality that ∞ 1/q ∞ dt 1/p ∞ dt dt ≤ < ∞, 1−(β/p) L(t)1/p q(1−(β/p)) m1 t m1 L(t) m1 t since 1 − (β/p) > 1/q. Thus (b) is satisfied too. 9.3.5 Corollary (β = 1). ϕ is admissible if there exists L ∈ L such that ∞ L(Mn )1/p dt Mn < ∞ and (b) log < ∞. (a) sup 1/q mn ϕ(t) n≥1 ϕ(Mn ) m1 t If log

Mn = O log Mn , mn

for any L ∈ L, ϕ(t) = L(t)1/p log t is admissible; and for instance ϕ(t) equals t 1/p log1+τ/p (1 + t) with τ > 1. Here again the first assertion is immediate; as for the second, one uses Hölder’s inequality to show (b). When ml ≡ 1, one recovers Theorem 3 of Gál–Koksma [1950]. The last condition on the growth of the sequence m is satisfied when ml ≥ l −c for some 0 ≤ c < 1. The critical case occurs when ml = l −1 . When the random variables ξl are indicators, it is possible to overcome that difficulty. The key observation to treat this case is that when "(x) = x, or more generally when " is subadditive, assumption (9.3.5) is preserved when replacing the sequence ξ by a sequence of sums on consecutive blocks of the ξl ’s. Let indeed {n k , k ≥ 1} be some increasing sequence of integers, and put γk = nk−1 ≤l 1 is admissible. Concerning Case c), we note that the increment condition (9.3.5) is trivially satisfied, when for instance ml = ξl p . The condition however forces ϕ to satisfy limt→∞ ϕ(t)/t = ∞, which is not surprising here. One thus always has, with τ > 1, n ξl l=1 sup < ∞.

n n τ p n≥1 l=1 ξl p log 1 + l=1 ξl p There are some applications in ergodic theory. 9.3.8 Proposition. Let 1 < p < ∞, q = p/(p − 1). Let T be power-bounded on Lp , f ∈ Lp (P) and 0 < α < 1. Assume that n

1 n1−α

weakly

T l f −−−−→ 0.

(9.3.12)

l=1

Let τ > 1 and put ⎧ n 1 l ⎪ ⎪ l=1 T f ⎨ n1/p (log n)τ/p n Tn τ f = n1/p (log1n)1+τ/p l=1 T l f ⎪ n ⎪ 1 l ⎩ l=1 T f n1−α (log n)τ Then,

if (1 − α)p < 1, if (1 − α)p = 1, if (1 − α)p > 1.

a.s. Tn τ f −→ 0 and sup Tn τ f p < ∞.

(9.3.13)

n≥1

According to a result of Derriennic and Lin [2001: Proposition 2.18], for T a contraction, assumption (9.3.12) is equivalent to 1

sup n≥1

n1−α

n l=1

T l f < ∞. p

(9.3.14)


Now, if T is power bounded, T is a contraction in an equivalent norm (Krengel [1985; p. 110]), and Proposition 2.18 of Derriennic and Lin still applies to give (9.3.14). The increment condition (9.3.5) is fulfilled with "(x) = x p(1−α) . Proposition 9.3.8 thus follows at once from Corollaries 9.3.4, 9.3.5 and 9.3.6. 9.3.9 Remarks. Some comparisons with existing results are necessary. 1. In the particular case that T is induced on Lp by a Dunford–Schwartz operator, Corollary 3.7 of Derriennic and Lin [2001] gives rates of convergence under assumption 1 (9.3.14). When (1 − α)p < 1, the rate there is n1/p nl=1 T l f → 0 a.e., which is better than what Proposition 9.3.8 yields. On the other hand, when (1 − α)p ≥ 1, Proposition 9.3.8 provides a better rate than Derriennic and Lin [2001]. 2. For the particular case that T is induced by a Dunford–Schwartz operator and 1 n l f ∈ (I − T )α Lp , which implies limn→∞ n1−α l=1 T f p = 0 by Corollary 2.15 in Derriennic and Lin [2001], more precise information is given in Derriennic and Lin [2001], Theorem 3.2. 3. For T power-bounded on Lp and f ∈ Lp satisfying (9.3.14), the rates obtained here are better than in Cohen and Lin [2003: Corollary 1]. 4. For T unitary on L2 and f ∈ L2 satisfying (1.14), the rates obtained in [Gaposhkin: 1979], Theorem 3, cases (vii), (iv), and (iii) are better. Before passing to another application, we shall consider a variant of assumption (9.3.13) useful for L2 -applications. Let : R+ → R+ be some nondecreasing function, and consider the following type of increment assumption. j j j p E ξl ≤ ml " ml l=i

l=1

(i ≤ j ).

(9.3.15)

l=i

We further assume and " to also satisfy the condition below: (Mn ) − (Mm ) "(Mn − Mm ) ≤B (Mn ) "(Mm )

(m ≤ n),

(9.3.16)

where B is an absolute constant. 9.3.10 Theorem. Assume that ", satisfy condition (9.3.16). Further, assume that p, m, " and ϕ satisfy conditions (9.3.7) and (9.3.8). Then, there exists a constant K < ∞, such that any sequence ξ = {ξl , l ≥ 1} of random variables satisfying the increment condition (9.3.15) verifies 1

n

(Mn )1/p φ(Mn )

l=1

ξl −→ 0 and sup a.s.

n

l=1 ξl 1/p φ(Mn ) p n≥1 (Mn )

≤ K.


The proof is given in Section 9.5. The main argument will consist of the fact that, under conditions (9.3.15) and (9.3.16), the increments of the averages considered are controlled in the same manner as those of the preceding averages. In view of our next theorem, we shall specialize this result to L2 -spaces and "(x) = x. Condition (9.3.15) becomes j j j 2 E ξl ≤ ml ml l=i

l=1

(i ≤ j ).

(9.3.17)

l=i

9.3.11 Theorem. Assume that is concave. Further assume ϕ is such that there exists L ∈ L satisfying the condition ∞ Mn L(Mn )1/2 dt sup < ∞ and log < ∞ for some λ > 0. √ ϕ(M ) m tϕ(t) n n n≥1 λ Then, there exists a constant K depending on m, , and ϕ only, such that any sequence ξ = {ξl , l ≥ 1} of random variables satisfying the increment condition (9.3.17) also verifies n n ξl a.s. l=1 l=1 ξl sup ≤ K. −→ 0 and 1/2 1/2 ϕ(Mn ) 2 (Mn ) ϕ(Mn ) n≥1 (Mn ) If log

Mn ∼ log Mn , mn

one can take ϕ(t) = L(t)1/2 log t for any L ∈ L; for instance ϕ(t) = t 1/2 logτ (1 + t) with τ > 3/2. Indeed, when p = 2 and "(x) = x, condition (9.3.16) reduces to (Mn ) − (Mm ) Mn − Mm ≤B (Mn ) Mm

(m ≤ n).

Since is concave, for m ≤ n, (Mn ) − (Mm ) (Mn ) (Mn ) ≤ ≤ . Mn − M m Mn Mm This implies (9.3.16) with B = 1. Theorem 9.3.11 then follows from Theorem 9.3.10 and the fact that, in the case under consideration, conditions (9.3.7) and (9.3.8) reduce to the conditions stated in Corollary 9.3.7. In the case ml ≡ 1, Theorem 9.3.11 also complements Theorem 7 in Gál–Koksma [1950], where under the assumption j p E ξl ≤ Cj p−σ (j − i)σ η(j − i) l=i

(i ≤ j )


with p > σ > 1 and η(n) > 0 nonincreasing such that the series n≥1 η(n)/n converges, it is proved that L1 L l=1 ξl tends to 0 almost surely when L tends to infinity. Here the case p = 2 is considered and (9.3.17) with (x) = x s , s ∈ ]0, 1[ reads as follows: j 2 E ξl ≤ Cj s (j − i) (i ≤ j ). l=i

This corresponds to η(x) = x 1−σ , s = 2 − σ in Theorem 7 of Gál–Koksma [1950]. Applying Theorem 9.3.10 gives for any τ > 3/2,

L

s+1 2

1

L

logτ L

l=1

ξl → 0

almost surely when L tends to infinity, which is better than what is obtained by applying Theorem 7 in Gál–Koksma [1950]. Now, we pass to our next application to ergodic theory. Consider the following data. = {θl , l ≥ 1} is a sequence of reals, such that n = 1≤l≤n θl2 ↑ ∞. P = {pl , l ≥ 1} is an increasing sequence of positive integers. T is a contraction in L2 (P). Introduce the sequence of complex numbers ζl (x) = θl e2iπpl x

(x ∈ [0, 1[= R/Z).

Let · ∞ denote the supremum norm on C([0, 1[). We shall assume that the following condition is realized: there exists a sequence m and a concave nondecreasing function : R+ → R+ , such that 1/2 1/2 ζl ≤ ml ml (i ≤ j ). (9.3.18) i≤l≤j

∞

1≤l≤j

i≤l≤j

Condition (9.3.18) usually describes a situation where ml ∼ |θl |2 , but are not equal. Some examples are given in Section 9.6. Our next application is related to the study of the ergodic sums n θl T pl f (n ≥ 1). k=1

9.3.12 Theorem. Assume that ϕ is such that there exists L ∈ L with ∞ L(Mn )1/2 dt Mn < ∞ for some λ > 0. sup < ∞ and log √ mn tϕ(t) n≥1 ϕ(Mn ) λ


Then, there exists a real K, such that for any f ∈ L2 (μ): n n pl pl a.s. k=1 θl T f k=1 θl T f ≤ K, and −→ 0. sup 1/2 1/2 ϕ(Mn ) 2 n (Mn ) ϕ(Mn ) n≥1 n (Mn ) Moreover, if log

Mn = O(Mn ), mn

(9.3.19)

one can choose ϕ(t) = L(t)1/2 log t, for any L ∈ L; and for instance ϕ(t) = √ t logτ (1 + t) with τ > 3/2. Then, for any f ∈ L2 (μ), n pl k=1 θl T f sup " ≤ K f 2 #1/2 τ n≥1 (Mn )Mn log (1 + Mn ) 2 and

n

(9.3.20) T pl f

a.s. k=1 θl → #1/2 τ (Mn )Mn log (1 + Mn )

"

0.

This result is proved and applied in Section 9.5. In the applications, Mn ∼ n .

9.4 Proof of Theorem 9.3.3

The proof is long. We pause to outline the steps. In Step 0, we specialize Theorem 9.2.2 to our setting. Step 1 is an intermediate step consisting of the regularization of the sequence $m$. Some specific functions built from this sequence, from $\Psi$ and from $\varphi$ are used later on, and they require such a regularization to be employed efficiently. In Step 2, a great deal of effort is devoted to estimating the increments $\|Y_n - Y_m\|_p$ for $m \le n$, according as $M_m \le M_n/2$ or $M_m \ge M_n/2$. This preliminary work is indispensable. Finally, in Step 3, we come to the core of the proof. We construct a measure $\mu$ on $\mathbb{N}$ and show that a family of local integrals attached to it is uniformly bounded. This establishes that $\mu$ is a majorizing measure and consequently enables us to conclude the proof.

0) Let $(T, d)$ be a compact metric space and denote by $D$ the diameter of $T$. For $x \in T$ and $\varepsilon > 0$, consider a separable stochastic process $X = \{X_t, t \in T\}$ indexed by $T$, defined on some probability space $(\Omega, \mathcal{A}, P)$ and satisfying the increment condition
\[ \|X_s - X_t\|_p \le d(s, t) \qquad (s, t \in T). \tag{9.4.1} \]
Assume that there exists a probability measure $\mu$ on $T$ such that
\[ \sup_{x \in T} \int_0^{D} \frac{d\varepsilon}{\mu(B(x, \varepsilon))^{1/p}} = M. \tag{9.4.2} \]
It follows from Theorem 9.2.2 that $X$ is sample continuous and moreover
\[ \Big\| \sup_{s,t \in T} (X_s - X_t) \Big\|_p \le K_p M, \tag{9.4.3} \]
where $K_p$ depends on $p$ only. We recall that $X$ is separable (with respect to the metric $d$) if there exist a countable $d$-dense subset $T_0$ of $T$ and a null set $N$ of $\mathcal{A}$ such that for any $\omega \notin N$ and any $t \in T$, $X_t(\omega) = \lim_{T_0 \ni s \to t} X_s(\omega)$. In our case, this is not important because we work with sequences of random variables; so $T = \mathbb{N}$, and the sample continuity property simply means here that the sequence studied converges almost surely.

With this tool in hand, our task will consist in proving the existence of a majorizing measure on $\mathbb{N}$ provided with a specific metric: the one induced by the $L^p$-increments of the sequence $\sum_{l=1}^{n} \xi_l / \varphi(M_n)$, $n \ge 1$. The majorizing measure is built in Step 3, but some preliminary steps are necessary.

1) Let $\rho > 1$ be some fixed real which we assume to be sufficiently large for condition (9.3.2) to be realized. Without loss of generality, we can assume
\[ m_1 \le \frac{m_2}{2(1 + C_m(\rho))}. \tag{9.4.4} \]

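If this condition is not satisfied, we first replace $\Psi(x)$ by $\tilde{\Psi}(x) = 2^p\, \Psi(x)$. Then we let $\tilde{\xi}_1$ be a random variable satisfying $E|\tilde{\xi}_1|^p \le \Psi\big(m_1/2(1 + C_m(\rho))\big)$. We also replace $m$ by $\tilde{m}$ defined by $\tilde{m}_i = m_{i-1}$ for $i \ge 2$ and $\tilde{m}_1 = \Psi^{-1}\big(E|\tilde{\xi}_1|^p\big)$. In place of $\xi$, we then consider enlarged families $\tilde{\xi}$ defined as follows: $\tilde{\xi}_i = \xi_{i-1}$ for $i \ge 2$. Then $\tilde{m}_1 \le \tilde{m}_2/2(1 + C_m(\rho))$ and
\[ E \Big| \sum_{l=i}^{j} \tilde{\xi}_l \Big|^p \le \Psi\Big( \sum_{l=i}^{j} \tilde{m}_l \Big) \le \tilde{\Psi}\Big( \sum_{l=i}^{j} \tilde{m}_l \Big) \qquad (2 \le i \le j), \]
\begin{align*}
E \Big| \sum_{l=1}^{j} \tilde{\xi}_l \Big|^p &\le 2^{p-1} \Big( E \Big| \sum_{l=2}^{j} \tilde{\xi}_l \Big|^p + E |\tilde{\xi}_1|^p \Big) \\
&\le 2^{p-1} \Big( \Psi\Big( \sum_{l=2}^{j} \tilde{m}_l \Big) + \Psi(\tilde{m}_1) \Big) \le \tilde{\Psi}\Big( \sum_{l=1}^{j} \tilde{m}_l \Big).
\end{align*}
It follows that condition (9.3.5) is satisfied with the function $\tilde{\Psi}$ and the new sequence $\tilde{m}$, for any sequence $\tilde{\xi}$ obtained from $\xi$ by adding $\tilde{\xi}_1$, as well as condition (9.4.4). Moreover, the new sequence $\tilde{m}$ satisfies condition (9.3.2) with $C_{\tilde{m}}(\rho) = \big( \frac{m_1}{\tilde{m}_1} \vee 1 \big) C_m(\rho)$.

We now regularize the sequence $m$. Consider the new sequence $m' = \{m'_l, l \ge 1\}$ defined by
\[ m'_l = \sum_{k=1}^{\infty} \rho^{-|k-l|}\, m_k \qquad (l \ge 1) \tag{9.4.5} \]
and write $M'_n = \sum_{l=1}^{n} m'_l$, $n \ge 1$. Then,
\[ \text{i)}\ \ m_l \le m'_l, \qquad \text{ii)}\ \ \rho^{-1} \le \frac{m'_{l+1}}{m'_l} \le \rho, \qquad \text{iii)}\ \ M'_n \ge M'_{n+1}/2\rho. \tag{9.4.6} \]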
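The regularization (9.4.5) and properties i) and ii) of (9.4.6) can be illustrated on a finite example. The sketch below assumes a finite weight sequence (terms beyond the listed ones are treated as zero, which is a simplification of the infinite sum in the text); the function name `regularize` and the sample weights are hypothetical.

```python
def regularize(m, rho):
    """m'_l = sum over k of rho**(-|k-l|) * m_k, for a finite list m (missing terms taken as 0)."""
    n = len(m)
    return [sum(rho ** (-abs(k - l)) * m[k] for k in range(n)) for l in range(n)]

# Hypothetical, deliberately irregular weights.
m = [1.0, 50.0, 0.01, 3.0, 200.0, 0.5, 0.5, 10.0]
rho = 4.0
m_reg = regularize(m, rho)

# Consecutive ratios of the regularized sequence; ii) says they lie in [1/rho, rho].
ratios = [m_reg[l + 1] / m_reg[l] for l in range(len(m) - 1)]
```

Property ii) holds because $\big| |k - l - 1| - |k - l| \big| \le 1$, so each term of $m'_{l+1}$ is within a factor $\rho$ of the corresponding term of $m'_l$; the original weights may have arbitrarily wild ratios (here 50 followed by 0.01).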

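Assertions i) and ii) are elementary; as for iii), we have by ii) that $M'_n \ge (M'_{n+1} - m'_1)/\rho$. But, in view of (9.3.2) and (9.4.4),
\[ m'_1 = m_1 + \sum_{k=2}^{\infty} \rho^{-(k-1)} m_k \le m_1 \big( 1 + C_m(\rho) \big) \le m_2/2 \le m'_2/2 \le M'_{n+1}/2. \]
Hence $M'_n \ge M'_{n+1}/2\rho$. Observe now that
\[ M'_n = \sum_{l=1}^{n} \sum_{k=1}^{\infty} \rho^{-|k-l|} m_k = \sum_{k=1}^{\infty} m_k \sum_{l=1}^{n} \rho^{-|k-l|}, \]
and
\begin{align*}
\sum_{l=1}^{n} \rho^{-|k-l|} &\le \rho^{-(k-n)+1}/(\rho - 1) && (k > n), \\
\sum_{l=1}^{n} \rho^{-|k-l|} &\le \rho/(\rho - 1) && (k = n), \\
\sum_{l=1}^{n} \rho^{-|k-l|} &\le (\rho + 1)/(\rho - 1) && (k < n).
\end{align*}
Thus,
\begin{align*}
M_n \le M'_n &\le \sum_{k=1}^{n} m_k \sum_{l=1}^{n} \rho^{-|k-l|} + \sum_{k > n} m_k \sum_{l=1}^{n} \rho^{-|k-l|} \\
&\le \frac{\rho + 1}{\rho - 1} \big[ M_n + m_n C_m(\rho) \big] \le C_\rho M_n, \tag{9.4.7}
\end{align*}
where we put $C_\rho = \big( \frac{\rho + 1}{\rho - 1} \big) [1 + C_m(\rho)]$, and $C_m(\rho)$ is defined by condition (9.3.2). Hence,
\[ M_n \le M'_n \le C_\rho M_n. \tag{9.4.8} \]
Now, consider the following conditions: there exists $L \in \mathcal{L}$ such that
\[ \sup_{n \ge 1} \frac{L(M'_n)^{1/p}}{\varphi(M'_n)} \Big[ \Big( \frac{\Psi(m'_n)}{m'_n} \Big)^{1/p} + \int_{\Psi(m'_n)}^{\Psi(M'_n)} \frac{dt}{t^{1/q}\, \Psi^{-1}(t)^{1/p}} \Big] < \infty, \tag{9.3.8$'$} \]
\[ E \Big| \sum_{l=i}^{j} \xi_l \Big|^p \le \Psi\Big( \sum_{l=i}^{j} m'_l \Big) \qquad (i \le j). \tag{9.3.5$'$} \]
Since $M_n$, $M'_n$ are commensurable and $m_n \le m'_n$, we have the implications (9.3.5) $\Rightarrow$ (9.3.5$'$) and (9.3.8) $\Rightarrow$ (9.3.8$'$).

Assume that we have proved the theorem with $m'$ in place of $m$. Let $\xi$ satisfy (9.3.5), and thus (9.3.5$'$). Then $\frac{1}{\varphi(M'_n)} \sum_{l=1}^{n} \xi_l$ converges almost surely to 0 and verifies
\[ \Big\| \sup_{n \ge 1} \frac{\big| \sum_{l=1}^{n} \xi_l \big|}{\varphi(M'_n)} \Big\|_p \le K. \]
Since $\varphi(M_n) \ge \varphi(C_\rho M_n)/C_\rho \ge \varphi(M'_n)/C_\rho$ by concavity of $\varphi$, we have that $\frac{1}{\varphi(M_n)} \sum_{l=1}^{n} \xi_l$ converges almost surely to 0, and
\[ \Big\| \sup_{n \ge 1} \frac{\big| \sum_{l=1}^{n} \xi_l \big|}{\varphi(M_n)} \Big\|_p \le K. \]
It is therefore enough to prove the theorem under the following additional assumption on $m$:
\[ \text{a)}\ \ \rho^{-1} \le m_{l+1}/m_l \le \rho, \qquad \text{b)}\ \ M_n \ge M_{n+1}/2\rho. \tag{9.4.9} \]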

2) Put for any integer n ≥ 1, n Yn =

l=1 ξl

φ(Mn )

.

(9.4.10)

Clearly, for any m ≤ n,

Yn − Ym p ≤ "(Mm )1/p

"(Mn − Mm )1/p ϕ(Mn ) − ϕ(Mm ) + . ϕ(Mn )ϕ(Mm ) ϕ(Mn )

We estimate the right-hand side according as Mm ≥ Mn /2 or Mm ≤ Mn /2. If Mm ≥ Mn /2, by concavity of ϕ, ϕ(Mn ) − ϕ(Mm ) ϕ(Mn ) − ϕ(Mm ) (Mn − Mm ) = 1/p "(Mn − Mm ) Mn − Mm "(Mn − Mm )1/p ϕ(Mm ) (Mn − Mm ) ≤ Mm "(Mn − Mm )1/p ϕ(Mm ) "(Mm )1/p Mn − Mm = "(Mm )1/p "(Mn − Mm )1/p Mm ϕ(Mm ) ≤ , "(Mm )1/p

(9.4.11)

461

9.4 Proof of Theorem 9.3.3 ϕ(Mn )−ϕ(Mm ) since "(x)/x p is nonincreasing and Mn −Mm ≤ Mm . Thus "(M 1/p ≤ n −Mm ) which implies

"(Mm )1/p

ϕ(Mm ) , "(Mm )1/p

"(Mn − Mm )1/p ϕ(Mn ) − ϕ(Mm ) ≤ . ϕ(Mn )ϕ(Mm ) ϕ(Mn )

Hence by (9.4.11),

Yn − Ym p ≤ 2

"(Mn − Mm )1/p ϕ(Mn )

if m ≤ n and Mm ≥ Mn /2.

(9.4.12)

Now, consider the case m ≤ n with Mm ≤ Mn /2. Since ϕ(Mn /2) ≤ ϕ(Mn )/21/p by convexity of ϕ p , we have ϕ(Mn ) − ϕ(Mm ) 1 − 2−1/p "(Mm )1/p ≥ "(Mm )1/p . ϕ(Mn )ϕ(Mm ) ϕ(Mm ) ϕ(x) But "(x) 1/p is nondecreasing; then estimate with

ϕ(Mn ) "(Mn )1/p

≥

ϕ(Mm ) "(Mm )1/p

and so we can continue our

ϕ(Mn ) − ϕ(Mm ) "(Mn )1/p "(Mm )1/p ≥ (1 − 2−1/p ) . ϕ(Mn )ϕ(Mm ) ϕ(Mn ) Thus

ϕ(Mn ) − ϕ(Mm ) "(Mn − Mm )1/p "(Mm )1/p ≥ (1 − 2−1/p ) . ϕ(Mn )ϕ(Mm ) ϕ(Mn )

Set

γp =

2 − 2−1/p . 1 − 2−1/p

(9.4.13)

Then by (9.4.11),

Yn − Ym p ≤ γp "(Mm )1/p .

ϕ(Mn ) − ϕ(Mm ) ϕ(Mn )ϕ(Mm )

if m ≤ n and Mm ≤ Mn /2. (9.4.14)

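Finally remark that if $n$ is sufficiently large, say $n \ge n_1$, then
\[
m \ge n \;\Longrightarrow\; \frac{\Psi(M_m - M_n)^{1/p}}{\varphi(M_m)} \le \frac{\Psi(M_m)^{1/p}}{\varphi(M_m)} \le \frac{\Psi(M_n)^{1/p}}{\varphi(M_n)} \le \Psi(m_1)^{1/p}\cdot\frac{\varphi(M_n) - \varphi(m_1)}{\varphi(M_n)\varphi(m_1)}. \tag{9.4.15}
\]
Observe indeed that, from (9.3.7) (b), we have
\[
\lim_{n\to\infty} \frac{\Psi(M_n)^{1/p}}{\varphi(M_n)} = 0,
\]
and besides,
\[
\lim_{n\to\infty} \Psi(m_1)^{1/p}\,\frac{\varphi(M_n) - \varphi(m_1)}{\varphi(M_n)\varphi(m_1)} = \frac{\Psi(m_1)^{1/p}}{\varphi(m_1)}.
\]
Define $n_1$ so that for $n \ge n_1$,
\[
\frac{\Psi(M_n)^{1/p}}{\varphi(M_n)} \le \frac12\,\frac{\Psi(m_1)^{1/p}}{\varphi(m_1)} \le \Psi(m_1)^{1/p}\,\frac{\varphi(M_n) - \varphi(m_1)}{\varphi(M_n)\varphi(m_1)}.
\]
This and (9.3.7) (a) prove our claim. By combining now successively (9.4.12) with (9.4.15) and (9.4.14) with (9.4.15), next using (9.3.7) (a), we get
\[
\|Y_n - Y_m\|_p \le
\begin{cases}
2\,\Psi(m_1)^{1/p}\,\dfrac{\varphi(M_n) - \varphi(m_1)}{\varphi(M_n)\varphi(m_1)} & \text{if } m \ge n \ge n_1 \text{ and } M_n \ge M_m/2, \\[2ex]
\gamma_p\,\Psi(m_1)^{1/p}\,\dfrac{\varphi(M_m) - \varphi(m_1)}{\varphi(M_m)\varphi(m_1)} & \text{if } m \ge n \ge n_1 \text{ and } M_n \le M_m/2.
\end{cases}
\]
Concerning the last case, we have by using (9.4.15) again
\[
\Psi(m_1)^{1/p}\,\frac{\varphi(M_m) - \varphi(m_1)}{\varphi(M_m)\varphi(m_1)} \le \frac{\Psi(m_1)^{1/p}}{\varphi(m_1)} \le 2\,\Psi(m_1)^{1/p}\,\frac{\varphi(M_n) - \varphi(m_1)}{\varphi(M_n)\varphi(m_1)},
\]
since $n \ge n_1$. As $\gamma_p > 2$, we have obtained
\[
\|Y_n - Y_m\|_p \le 2\gamma_p\,\Psi(m_1)^{1/p}\,\frac{\varphi(M_n) - \varphi(m_1)}{\varphi(M_n)\varphi(m_1)} \qquad \text{if } m \ge n \ge n_1. \tag{9.4.16}
\]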
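The step "As $\gamma_p > 2$" can be checked numerically. The short Python sketch below is only an illustration: the function name `gamma_p` is a label chosen here, not notation from the text. It evaluates the constant of (9.4.13) and compares it with the equivalent form $1 + 1/(1 - 2^{-1/p})$, which makes the lower bound 2 visible.

```python
def gamma_p(p: float) -> float:
    """Constant gamma_p of (9.4.13): (2 - 2^(-1/p)) / (1 - 2^(-1/p))."""
    q = 2.0 ** (-1.0 / p)
    return (2.0 - q) / (1.0 - q)


def gamma_p_alt(p: float) -> float:
    """Equivalent form 1 + 1/(1 - 2^(-1/p)); the second term exceeds 1 for p > 0."""
    q = 2.0 ** (-1.0 / p)
    return 1.0 + 1.0 / (1.0 - q)


if __name__ == "__main__":
    # Illustrative values of p only; the theorem's range of p is not restated here.
    for p in (1.0, 1.5, 2.0, 4.0, 10.0):
        print(f"p = {p:5.1f}   gamma_p = {gamma_p(p):.6f}")
```

Since $0 < 2^{-1/p} < 1$ for every $p > 0$, the second form shows that $\gamma_p > 2$ and that $\gamma_p$ grows with $p$.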

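Now let $n \ge n_1$ and $m \le n$. Then, by (9.4.12), (9.4.14), (9.3.7) (a) and (9.4.15): if $M_m \ge M_n/2$, then
\[
\|Y_n - Y_m\|_p \le 2\,\frac{\Psi(M_n - M_m)^{1/p}}{\varphi(M_n)} \le 2\,\frac{\Psi(M_n)^{1/p}}{\varphi(M_n)} \le \frac{\Psi(m_1)^{1/p}}{\varphi(m_1)} \le 2\,\Psi(m_1)^{1/p}\,\frac{\varphi(M_n) - \varphi(m_1)}{\varphi(M_n)\varphi(m_1)}, \tag{9.4.17}
\]
and if $M_m \le M_n/2$, then
\[
\|Y_n - Y_m\|_p \le \gamma_p\,\Psi(M_m)^{1/p}\cdot\frac{\varphi(M_n) - \varphi(M_m)}{\varphi(M_n)\varphi(M_m)} \le \gamma_p\,\Psi(m_1)^{1/p}\,\frac{\varphi(M_n) - \varphi(m_1)}{\varphi(M_n)\varphi(m_1)}. \tag{9.4.18}
\]
Therefore,
\[
\sup_{m \ge 1} \|Y_n - Y_m\|_p \le 2\gamma_p\,\Psi(m_1)^{1/p}\cdot\frac{\varphi(M_n) - \varphi(m_1)}{\varphi(M_n)\varphi(m_1)} \qquad (n \ge n_1). \tag{9.4.19}
\]

3) Fix now $n \ge n_1$, and put for $k = 1, 2, \dots, n-1$,
\[
\varepsilon_k = \varepsilon_k^{(n)} = 2\,\Psi(M_k)^{1/p}\cdot\frac{\varphi(M_n) - \varphi(M_k)}{\varphi(M_n)\varphi(M_k)}. \tag{9.4.20}
\]
Then,
\[
\sup_{m \ge 1} \|Y_n - Y_m\|_p \le \gamma_p\,\varepsilon_1^{(n)}. \tag{9.4.21}
\]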


9.4 Proof of Theorem 9.3.3

By concavity of ϕ and (9.4.9-b), we have that ϕ(Mk+1 ) ≤ Thus, for k + 1 < n, and p = [p] + 2, εk εk+1

Mk+1 Mk ϕ(Mk )

≤ 2ρϕ(Mk ).

"(Mk ) 1/p ϕ(Mk+1 ) ϕ(Mn ) − ϕ(Mk ) ϕ(Mn ) − ϕ(Mk ) ≤ρ "(Mk+1 ) ϕ(Mk ) ϕ(Mn ) − ϕ(Mk+1 ) ϕ(Mn ) − ϕ(Mk+1 ) ϕ p (M ) − ϕ p (M ) n k =ρ p ϕ (Mn ) − ϕ p (Mk+1 ) ϕ(M )p−1 + ϕ(M )p−2 ϕ(M ) + · · · + ϕ(M )p−1 n n k+1 k+1 · . ϕ(Mn )p−1 + ϕ(Mn )p−2 ϕ(Mk ) + · · · + ϕ(Mk )p−1

=

Since ϕ p is convex, then ϕ p is also convex and ϕ p (Mn ) − ϕ p (Mk ) = ϕ p (Mn ) − ϕ p (Mk+1 )

ϕ p (Mn )−ϕ p (Mk ) Mn −Mk ϕ p (Mn )−ϕ p (Mk+1 ) Mn −Mk+1

=1+

·

Mn − Mk M n − Mk ≤ Mn − Mk+1 Mn − Mk+1

Mk+1 − Mk mk+1 ≤1+ ≤ 1 + ρ. Mn − Mk+1 mk+2

Thus, εk εk+1

ϕ(Mn )p−1 + ϕ(Mn )p−2 ϕ(Mk+1 ) + · · · + ϕ(Mk+1 )p−1 ϕ(Mn )p−1 + ϕ(Mn )p−2 ϕ(Mk ) + · · · + ϕ(Mk )p−1 ϕ(Mn )p−1 + ρϕ(Mn )p−2 ϕ(Mk ) + · · · + ρ p−1 ϕ(Mk )p−1 ≤ ρ(1 + ρ) ϕ(Mn )p−1 + ϕ(Mn )p−2 ϕ(Mk ) + · · · + ϕ(Mk )p−1 p ≤ ρ (1 + ρ). ≤ ρ(1 + ρ)

Put η = ρ p (1 + ρ); we have shown εk ≤ η, εk+1

k = 1, 2, . . . , n − 2.

(9.4.22)

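We denote $B(n, \varepsilon) = \{m \ge 2 : \|Y_n - Y_m\|_p < \varepsilon\}$. Let $\mu$ be the measure defined on the set of integers $\{2, 3, \dots\}$ by
\[
\mu\{n\} = c \int_{M_{n-1}}^{M_n} \Big(\frac{\Psi(t)^{1/p}}{t\varphi(t)} + \frac{1}{L(t)}\Big)\,dt, \tag{9.4.23}
\]
with
\[
c = \Big(\int_{m_1}^{\infty} \Big(\frac{\Psi(t)^{1/p}}{t\varphi(t)} + \frac{1}{L(t)}\Big)\,dt\Big)^{-1}.
\]
By Step 0 and (9.4.21), it suffices to establish that
\[
\sup_{n \ge n_1} \int_0^{\gamma_p \varepsilon_1^{(n)}} \frac{d\varepsilon}{\mu(B(n, \varepsilon))^{1/p}} < \infty.
\]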

We fix n ≥ n1 . Let kn be the unique integer such that εkn +1

0, √ mn tϕ(t) n≥1 ϕ(Mn ) λ then there exists a constant K such that n θl T pl f k=1 ≤ K, sup 1/2 ϕ(M ) n n≥1 n (Mn ) 2

n

and

pl k=1 θl T f n (Mn )1/2 ϕ(Mn )

a.s.

−→ 0.

The first part of the theorem follows by replacing $f$ by $g/\|g\|_2$ for arbitrary $g \in L^2(P)$. The second part of the theorem similarly follows from the second half of Theorem 9.3.10.

Now we give some examples of application of Theorem 9.3.12.

1. Consider a sequence $\Theta = \{\theta_k, k \ge 1\}$ of independent, symmetric real-valued random variables, as well as an increasing sequence of integers $P = \{p_k, k \ge 1\}$. Let $(X, \mathcal{F}, \mu)$ be an arbitrary probability space, and $T$ any contraction of $L^2(\mu)$. In this example, we study the growth of the weighted ergodic sums
\[
\sum_{l=1}^{n} \theta_l(\omega)\,T^{p_l} f
\]
when $\omega$ belongs to a measurable set of full measure, which is universal in the sense that the estimates of the magnitude of the considered sums are independent of the contraction $T$ and of $f \in L^2(\mu)$. We shall introduce conditions on the sequences $\Theta$ and $P$; some of them are very weak. All these conditions are also natural, in regard to the optimality of the result we obtain below.

Condition ($P$): there exists $\Phi : \mathbb{R}^+ \to \mathbb{R}^+$ nondecreasing, concave such that
\[
p_l = O\big(e^{\Phi(l)}\big).
\]
Condition ($\Theta$): i) For any $l$, $P\{|\theta_l| = 0\} = 0$; and ii) $n = O\big(\sum_{l=1}^{n} \theta_l^2\big)$ almost surely.

Condition ii) is weak. If the $\theta_l$'s are identically distributed, condition ii) is always satisfied. This follows from the strong law of large numbers. Condition i) is natural in regard to the studied averages. Put for any positive integer $n$, $\Theta_n = \sum_{l=1}^{n} \theta_l^2$.


9 The majorizing measure method

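9.6.1 Theorem. Let $\tau > 3/2$. There exists a measurable set $\Omega^*$ with $P(\Omega^*) = 1$, and for any $\omega \in \Omega^*$, a real $K_\omega < \infty$, such that for any probability space $(X, \mathcal{F}, \mu)$, any contraction $T$ on $L^2(\mu)$, any $f \in L^2(\mu)$, we have
\[
\bigg\| \sup_{n \ge 1} \frac{\big|\sum_{l=1}^{n} \theta_l(\omega)\,T^{p_l} f\big|}{\big[\Phi(\Theta_n(\omega))\,\Theta_n(\omega)\,\log^{\tau}(1 + \Theta_n(\omega))\big]^{1/2}} \bigg\|_{2,\mu} \le K_\omega \|f\|_2
\]
and
\[
\frac{\sum_{l=1}^{n} \theta_l(\omega)\,T^{p_l} f}{\big[\Phi(\Theta_n(\omega))\,\Theta_n(\omega)\,\log^{\tau}(1 + \Theta_n(\omega))\big]^{1/2}} \xrightarrow{\ \text{a.s.}\ } 0.
\]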

• The stated result expresses a rather general form of an ergodic theorem with weights sampled by sequences of independent random variables. There is indeed no moment assumption at all. When some integrability property is moreover known, $\Theta_n$ can be replaced by a suitable deterministic sequence in the normalizing sequence.

• Take $P$ such that for some $B < \infty$, $p_n = O(n^B)$, and an i.i.d. sequence $\Theta$ satisfying condition ($\Theta$) i). Conditions ($P$) and ($\Theta$) are satisfied with $\Phi(t) = B \log t$. Let $b > 2$; Theorem 9.6.1 applies with, as a normalizing factor, $\Theta_n(\omega)^{1/2} \log^{b}(1 + \Theta_n(\omega))$. Further, if $\theta_1$ is square integrable, for any $b > 2$ there exists a measurable set $\Omega^*$ with $P(\Omega^*) = 1$, and for any $\omega \in \Omega^*$, a real $K_\omega < \infty$, such that: for any probability space $(X, \mathcal{F}, \mu)$, any contraction $T$ on $L^2(\mu)$, any $f \in L^2(\mu)$, we have
\[
\bigg\| \sup_{n \ge 1} \frac{\big|\sum_{l=1}^{n} \theta_l(\omega)\,T^{p_l} f\big|}{\sqrt{n}\,\log^{b} n} \bigg\|_{2,\mu} \le K_\omega \|f\|_2,
\qquad\text{and}\qquad
\frac{\sum_{l=1}^{n} \theta_l(\omega)\,T^{p_l} f}{\sqrt{n}\,\log^{b} n} \xrightarrow{\ \text{a.s.}\ } 0.
\]

• Take $P$ such that for some $0 < \delta < 1$, $p_n = O(e^{n^{\delta}})$, and an i.i.d. sequence $\Theta$ satisfying condition ($\Theta$) i). Conditions ($P$) and ($\Theta$) are satisfied with $\Phi(t) = t^{\delta}$. Then, for $b > 3/2$, Theorem 9.6.1 applies with normalizing factor $\Theta_n^{(1+\delta)/2} \log^{b}(1 + \Theta_n)$, or $n^{(1+\delta)/2} \log^{b} n$ if $\theta_1$ is square integrable.

Proof. Fix $\tau > 3/2$ and $\rho > 0$. By Theorem 8.5.6, there exists a universal constant $C$ such that
\[
E \sup_{N < M}\ \sup_{0 \le t \le 1} \frac{\big|\sum_{k=N+1}^{M} \theta_k e^{2i\pi p_k t}\big|}{\big(\log p_M \sum_{k=N+1}^{M} \theta_k^2\big)^{1/2}} \le C.
\]
Put for positive integers $n$ and $t \in [0, 1[$, $\zeta_n(t) = \theta_n e^{2i\pi p_n t}$.


9.6 Proof of Theorem 9.3.12 and some examples

It follows that ζl i≤l≤j

∞

≤C

log pj

j

1/2

1/2 2 1/2 j θl , i≤l≤j

with E C < ∞. Conditions

(P ) and () imply j = O j and log pj = O (j ) . Thus log pj = O (j ) . Replacing C by λC for some suitable λ if necessary gives ζl i≤l≤j

∞

1/2 2 1/2 ≤ C j θl ≤C

i≤l≤j

(|θl | ∨ l −ρ )

2 1/2

1≤l≤j

(|θl | ∨ l −ρ )

2 1/2

.

i≤l≤j

Thus condition (9.3.18) of Theorem 9.3.12 is fulfilled. Now, in view of condition ($\Theta$) ii), the sequence $\{(|\theta_l| \vee l^{-\rho})^2, l \ge 1\}$ clearly satisfies condition (9.3.19). The conditions for the application of Theorem 9.3.12 are fulfilled, and the proof is achieved by applying the second half of this theorem, and by observing, by means of condition ($\Theta$) ii), that
\[
\sum_{l=1}^{n} (|\theta_l| \vee l^{-\rho})^2 = O(\Theta_n) \quad \text{a.s.}
\]

2. Consider a sequence $Q = \{Q_k, k \ge 1\}$ of independent random variables with values in $\mathbb{N}$, as well as an increasing sequence of integers $P = \{p_k, k \ge 1\}$ and a sequence of reals $A = \{a_k, k \ge 1\}$ such that $A_n = \sum_{l=1}^{n} a_l^2 \uparrow \infty$.

Condition ($P$): there exists $\Phi : \mathbb{R}^+ \to \mathbb{R}^+$ nondecreasing, concave such that
\[
p_n = O\big(e^{\Phi(A_n)}\big).
\]
Condition ($Q$):
\[
E \sup_{n \ge 1} \frac{\log^+(p_n + Q_n)}{\Phi(A_n)} < \infty.
\]
Condition ($A$):
\[
\log \frac{A_n}{|a_n|} = O(\log A_n).
\]

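9.6.2 Theorem. Let $\tau > 3/2$. There exists a measurable set $\Omega^*$ with $P(\Omega^*) = 1$, and for any $\omega \in \Omega^*$, a real $K_\omega < \infty$, such that for any probability space $(X, \mathcal{F}, \mu)$, any contraction $T$ on $L^2(\mu)$, any $f \in L^2(\mu)$, we have
\[
\text{a)}\quad \bigg\| \sup_{n \ge 1} \frac{\big|\sum_{l=1}^{n} a_l \big(T^{p_l + Q_l(\omega)} f - E\,T^{p_l + Q_l} f\big)\big|}{\big[\log^{\tau}(1 + A_n)\,\Phi(A_n)\,A_n\big]^{1/2}} \bigg\|_{2,\mu} \le K_\omega \|f\|_2
\]
and
\[
\text{b)}\quad \frac{\sum_{l=1}^{n} a_l \big(T^{p_l + Q_l(\omega)} f - E\,T^{p_l + Q_l} f\big)}{\big[\log^{\tau}(1 + A_n)\,\Phi(A_n)\,A_n\big]^{1/2}} \xrightarrow{\ \text{a.s.}\ } 0.
\]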


• Theorem 9.6.2 provides optimal results for this type of ergodic averages. Here are two examples.

• Let $0 \le c < 1$. Take $n^{-c} \le a_n \le 1$. Then condition ($A$) is verified. Choose $p_n = O(e^{n^{\alpha}})$, for some $0 < \alpha < 1$, and $Q$ an i.i.d. sequence such that $E \log_+^{B} Q_1 < \infty$ for some $B > 1/\alpha$. Then conditions ($P$) and ($Q$) are satisfied with $\Phi(t) = t^{\alpha}$. For $\tau > 3/2$, Theorem 9.6.2 thus applies with normalizing factor $n^{(1+\alpha)/2} \log^{\tau} n$. In the case when $a_n \equiv 1$, by Corollary 8.6.11,
\[
\frac{1}{n} \sum_{l=1}^{n} \big(T^{p_l + Q_l(\omega)} f - E\,T^{p_l + Q_l} f\big) \xrightarrow{\ \text{a.s.}\ } 0.
\]
Here we obtain that for any $\tau > 3/2$,
\[
\frac{1}{n^{(1+\alpha)/2} \log^{\tau} n} \sum_{l=1}^{n} \big(T^{p_l + Q_l(\omega)} f - E\,T^{p_l + Q_l} f\big) \xrightarrow{\ \text{a.s.}\ } 0,
\]
and a maximal inequality.

• Take $A$ as before, $p_n = O(n^{B})$ for some $B < \infty$, and $Q$ an i.i.d. sequence such that $E\,Q_1^{\delta} < \infty$ for some $\delta > 0$. Choose $\Phi(t) = B \log t$. Then conditions ($P$), ($Q$), ($A$) are satisfied and for any $b > 2$, Theorem 9.6.2 applies with normalizing factor $\sqrt{n} \log^{b} n$. The same kind of comments can be made for the case $a_l \equiv 1$.

Proof. Fix $\tau > 3/2$. In view of conditions ($P$) and ($Q$),
\[
E \sup_{j \ge 1} \frac{\log(1 + p_j + Q_j)}{\Phi(A_j)} < \infty. \tag{9.6.1}
\]

Put for any positive integer $l$,
\[
\zeta_l(x) = a_l \big\{ e^{2i\pi x(p_l + Q_l)} - E\,e^{2i\pi x(p_l + Q_l)} \big\}.
\]
By Lemma 8.7.2, if for some increasing function $G : \mathbb{N} \to \mathbb{N}$ the following condition is satisfied
\[
C(Q, G) = E \sup_{j \ge 1} \Big( \frac{\log(1 + p_j + Q_j)}{G(j)} \Big)^{1/2} < \infty,
\]
then
\[
E \sup_{N < M}\ \sup_{0 \le t \le 1} \frac{\big|\sum_{l=N+1}^{M} \zeta_l(t)\big|}{\big(G(M) \sum_{k=N+1}^{M} a_k^2\big)^{1/2}} \le C\,C(Q, G).
\]
In view of (9.6.1), we can choose $G$ by putting $G(j) = \Phi(A_j)$. It follows that
\[
\Big\| \sum_{i \le l \le j} \zeta_l \Big\|_{\infty} \le C\,\Phi(A_j)^{1/2} \Big( \sum_{i \le l \le j} a_l^2 \Big)^{1/2}.
\]
Then condition (9.3.18) is satisfied. Since, in view of condition ($A$), condition (9.3.19) is also verified, the conditions for the application of Theorem 9.3.12 are fulfilled, and the proof is achieved by applying the second half of this theorem.


9.7 A stronger form of Salem–Zygmund’s estimate

The majorizing measure method allows us to obtain a new and strictly sharper estimate of the supremum of random trigonometric sums. The improvement is seen by considering the case when the characters are indexed on sub-exponentially growing sequences of integers. Several remarkable examples will be studied.

Let $p = \{p_k, k \ge 1\}$, $\theta = (\theta_k)_{k \ge 1}$ be two sequences of reals, and denote $\tilde p_N = \max\{2 + |p_k|, 1 \le k \le N\}$. Let also $X = \{X_1, X_2, \dots\}$ and $Y = \{Y_1, Y_2, \dots\}$ be two sequences of real random variables defined on a common probability space $(\Omega, \mathcal{A}, P)$. We will be mainly interested in the cases when $X$ and $Y$ are sequences of centered, independent random variables. Consider for $N = 1, 2, \dots$ the sequence of random trigonometric sums
\[
Z_N(\omega, t) = \sum_{k=1}^{N} \theta_k \{ X_k(\omega) \cos 2\pi p_k t + Y_k(\omega) \sin 2\pi p_k t \}, \qquad Q_N := \sup_{0 \le t \le 1} |Z_N(t)|. \tag{9.7.1}
\]
Put for $s, t \in [0, 1]$,
\[
d_N(s, t) = 2 \Big( \sum_{k=1}^{N} \theta_k^2 \sin^2 \pi p_k (s - t) \Big)^{1/2}. \tag{9.7.2}
\]
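The metric $d_N$ of (9.7.2) is the $L^2$ distance between $Z_N(s)$ and $Z_N(t)$ as soon as the $X_k$, $Y_k$ are centered, mutually uncorrelated and of unit variance: the variance of the increment is the sum of the squared coefficients of $X_k$ and $Y_k$, and $(\cos a - \cos b)^2 + (\sin a - \sin b)^2 = 4\sin^2\frac{a-b}{2}$. The Python sketch below checks this identity numerically. The function names and the sample sequences are illustrative assumptions, not objects from the text.

```python
import math


def d_N(theta, p, s, t):
    """Metric (9.7.2): 2 * sqrt(sum_k theta_k^2 sin^2(pi p_k (s - t)))."""
    return 2.0 * math.sqrt(
        sum(th * th * math.sin(math.pi * pk * (s - t)) ** 2
            for th, pk in zip(theta, p))
    )


def l2_increment(theta, p, s, t):
    """L2 norm of Z_N(s) - Z_N(t), assuming centered, uncorrelated,
    unit-variance X_k and Y_k: sum of squared coefficients."""
    var = 0.0
    for th, pk in zip(theta, p):
        dc = math.cos(2 * math.pi * pk * s) - math.cos(2 * math.pi * pk * t)
        ds = math.sin(2 * math.pi * pk * s) - math.sin(2 * math.pi * pk * t)
        var += th * th * (dc * dc + ds * ds)
    return math.sqrt(var)


if __name__ == "__main__":
    theta = [1.0, 0.5, 0.25, 0.125]   # hypothetical weights
    p = [1.0, 3.0, 7.0, 20.0]         # hypothetical frequencies
    for s, t in [(0.3, 0.11), (0.9, 0.05), (0.5, 0.5)]:
        print(s, t, d_N(theta, p, s, t), l2_increment(theta, p, s, t))
```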

When $X$ and $Y$ are independent random variables with $E X_k = E Y_k = 0$ and $E X_k^2 = E Y_k^2 = 1$, then $d_N(s, t) = \|Z_N(s) - Z_N(t)\|_2$. Recall briefly the setting considered in Section 8.5. We assumed that for some constant $B$,
\[
\|Z_N(s) - Z_N(t)\|_G \le B\,d_N(s, t), \qquad \|Z_N(s)\|_G \le B \Big( \sum_{k=1}^{N} \theta_k^2 \Big)^{1/2}, \qquad \forall N \ge 1,\ \forall\, 0 \le s, t \le 1, \tag{8.5.4}
\]
where $G(x) = e^{x^2} - 1$. These assumptions are satisfied when $X$ and $Y$ are independent Rademacher or Gaussian random variables and in other interesting cases (see Examples 1–3, Section 8.5). We have shown in Theorem 8.5.1 that under assumption (8.5.4), there exists a constant $C$ (which is a function of the constant $B$ from (8.5.4) only) such that for any integer $N \ge 1$,
\[
\|Q_N\|_G \le C\,(\log \tilde p_N)^{1/2} \Big( \sum_{k=1}^{N} \theta_k^2 \Big)^{1/2}.
\]

This followed from estimate (8.5.18), which we recall for our purpose:

QN G ≤ C (log 4p˜ N )1/2

N

θk2

1/2

k=1

≤ C (log p˜ N )1/2

N k=1

θk2

1 2 2 1/2 θk pk p˜ N N

+

k=1

1/2 .


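And we see that $\|Q_N\|_G$ is controlled by two different quantities:
\[
a_N = \Big( \sum_{k=1}^{N} \theta_k^2 \Big)^{1/2}, \qquad b_N = \Big( \frac{1}{\tilde p_N^{\,2}} \sum_{k=1}^{N} \theta_k^2 p_k^2 \Big)^{1/2}. \tag{9.7.3}
\]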

Obviously $b_N \le a_N$. But $b_N$ is not necessarily of the same order as $a_N$; we may have $b_N \ll a_N$. Indeed, if $p$ increases very fast, say exponentially, and $\theta$ no more than polynomially, then the appropriate order of $b_N$ can be $\sup_{1 \le k \le N} |\theta_k|$, which is quite different from $a_N$. So the natural question to be drawn from this is: which of $a_N$ and $b_N$ really reflects the appropriate size for the order of $\|Q_N\|_G$? As will be seen, the answer turns out to be a bit subtle.

From now on, we assume for simplicity that the sequence $p$ is an increasing sequence of positive reals greater than 1. Put for $r = 1, \dots, N$,
\[
\varepsilon_r^2 = p_r^{-2} \sum_{k=1}^{r} \theta_k^2 p_k^2 + \sum_{k=r+1}^{N} \theta_k^2 \tag{9.7.4}
\]

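and observe first that the sequence $\varepsilon_r$, $r = 1, \dots, N$, is decreasing. Indeed,
\[
\varepsilon_r^2 = p_r^{-2} \sum_{k=1}^{r} \theta_k^2 p_k^2 + \sum_{k=r+1}^{N} \theta_k^2
> p_{r+1}^{-2} \sum_{k=1}^{r} \theta_k^2 p_k^2 + \frac{\theta_{r+1}^2 p_{r+1}^2}{p_{r+1}^2} + \sum_{k=r+2}^{N} \theta_k^2 = \varepsilon_{r+1}^2.
\]
Moreover $\varepsilon_1^2 = \sum_{k=1}^{N} \theta_k^2 = a_N^2$, whereas
\[
\varepsilon_N^2 = p_N^{-2} \sum_{k=1}^{N} \theta_k^2 p_k^2 = \frac{[2 + p_N]^2}{p_N^2}\, b_N^2.
\]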
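The monotonicity of $(\varepsilon_r)$ and the two endpoint identities can be illustrated on a concrete example. The Python sketch below evaluates (9.7.4) directly; the sequences `theta` and `p` are hypothetical choices (polynomially decaying weights, exponentially growing frequencies), and the function names are labels introduced here.

```python
import math


def eps_squared(theta, p):
    """Return [eps_1^2, ..., eps_N^2] as defined in (9.7.4).

    theta[k-1], p[k-1] hold theta_k, p_k; p is assumed increasing with p_k > 1.
    """
    n = len(theta)
    out = []
    for r in range(1, n + 1):
        head = sum(theta[k] ** 2 * p[k] ** 2 for k in range(r)) / p[r - 1] ** 2
        tail = sum(theta[k] ** 2 for k in range(r, n))
        out.append(head + tail)
    return out


def a_N_and_b_N(theta, p):
    """a_N and b_N of (9.7.3), with p~_N = 2 + p_N for increasing positive p."""
    p_tilde = 2.0 + p[-1]
    a = math.sqrt(sum(th ** 2 for th in theta))
    b = math.sqrt(sum(th ** 2 * pk ** 2 for th, pk in zip(theta, p))) / p_tilde
    return a, b


if __name__ == "__main__":
    N = 8
    theta = [1.0 / k for k in range(1, N + 1)]   # hypothetical weights
    p = [2.0 ** k for k in range(1, N + 1)]      # hypothetical frequencies
    e2 = eps_squared(theta, p)
    a, b = a_N_and_b_N(theta, p)
    print("eps_r^2:", [round(v, 6) for v in e2])
    print("a_N^2 =", a * a, " eps_1^2 =", e2[0])
    print("((2+p_N)/p_N)^2 b_N^2 =", ((2 + p[-1]) / p[-1]) ** 2 * b * b,
          " eps_N^2 =", e2[-1])
```

In this example $\varepsilon_N$ is much smaller than $\varepsilon_1$, which is the situation $b_N \ll a_N$ described above.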

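9.7.1 Theorem. Under assumption (8.5.4), there exist constants $C_i$, $i = 0, 1, 2$ (which are functions of the constant $B$ from (8.5.4) only) such that for any integer $N \ge 1$,
\[
\Big\| \sup_{s,t \in T} |Z_N(s) - Z_N(t)| \Big\|_G \le C_0 \Big\{ \varepsilon_N \sqrt{\log p_N} + \sum_{r=2}^{N} (\varepsilon_{r-1} - \varepsilon_r) \sqrt{\log p_r} \Big\},
\]
and
\[
\Big\| \sup_{t \in T} |Z_N(t)| \Big\|_G \le C_1 \varepsilon_1 + C_2 \Big\{ \varepsilon_N \sqrt{\log p_N} + \sum_{r=2}^{N} (\varepsilon_{r-1} - \varepsilon_r) \sqrt{\log p_r} \Big\}.
\]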

The last inequality follows from the first and assumption (8.5.4) by the triangle inequality. The right-hand side being clearly bounded above by $\max(C_1, C_2)\,\varepsilon_1 \sqrt{\log p_N}$, it follows that Theorem 9.7.1 contains Theorem 8.5.1. Before giving the proof, we are first going to establish a lemma. Let $\psi(x) = \sqrt{\log(x + 1)}$, $x \ge 0$.

9.7.2 Lemma. For any positive integer $N$,
\[
\sup_{\alpha \in \mathbb{R}} \int_0^{2\varepsilon_1} \psi\Big( \frac{1}{\lambda(B_{d_N}(\alpha, \varepsilon))} \Big)\,d\varepsilon \le C \varepsilon_N \psi(\pi p_N) + 2 \sum_{r=2}^{N} (\varepsilon_{r-1} - \varepsilon_r)\,\psi(\pi p_r),
\]
where $B_{d_N}(\alpha, \varepsilon)$ is the $d_N$-ball of radius $\varepsilon$ centered at point $\alpha$, and $C$ is an absolute constant.

Proof. Let $1 \le r < N$ and let $\alpha, \beta \in \mathbb{R}$ be such that $d_N^2(\alpha, \beta)$

≤4

N

θk2

≤ |α − β|

2. 2/N |u −1|


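The residual terms in Lemma 9.8.2, inequality (4), $(\varepsilon_{N-1} - \varepsilon_N)\sqrt{\log p_N}$ and $(\varepsilon_{N-2} - \varepsilon_{N-1})\sqrt{\log p_{N-1}}$, make a contribution which is at most $N^{(1-\beta)/2} \le N^{1 - \frac{\beta + \alpha}{2}}$. It follows that
\[
\sum_{r=2}^{N} (\varepsilon_{r-1} - \varepsilon_r) \sqrt{\log p_r} =
\begin{cases}
O\big(N^{1 - \frac{\beta + \alpha}{2}}\big) & \text{if } \beta + \alpha < 2, \\
O(\log N) & \text{if } \beta + \alpha = 2, \\
O(1) & \text{if } \beta + \alpha > 2.
\end{cases}
\]
From Lemma 9.8.2 we also have that
\[
\varepsilon_N \sqrt{\log p_N} = O\big(N^{\frac{1 - \beta}{2}}\big), \qquad \varepsilon_1 \sqrt{\log p_N} = O\big(N^{\frac{1 - \alpha}{2}}\big).
\]
Consider the case $\beta + \alpha < 2$. By Theorem 9.7.1,
\[
\Big\| \sup_{t \in T} |Z_N(t)| \Big\|_G \le C(\alpha, \beta)\,N^{1 - \frac{\beta + \alpha}{2}}, \tag{9.8.2a}
\]
whereas by Theorem 8.5.1,
\[
\Big\| \sup_{t \in T} |Z_N(t)| \Big\|_G \le C(\alpha, \beta)\,N^{\frac{1 - \alpha}{2}}. \tag{9.8.2b}
\]
As we assumed $\beta > 1$, it follows that $1 - \frac{\beta + \alpha}{2} < \frac{1 - \alpha}{2}$, therefore implying that Theorem 9.7.1 is strictly stronger than Theorem 8.5.1. In the case $\beta + \alpha \ge 2$, this fact is evident.

Now if $\beta = 1$, we find with Theorem 9.7.1 an estimate which is $O(N^{\frac{1 - \alpha}{2}})$, whereas with Theorem 8.5.1 we get $O((N^{1 - \alpha} \log N)^{1/2})$. In particular, in the exponential case $\alpha = 0$, we find an order of type $O(N^{1/2})$, again strictly better than $O((N \log N)^{1/2})$.

Finally, consider for $M > N$ the increment
\[
Q_{N,M} := \sup_{t \in T} |Z_M(t) - Z_N(t)|. \tag{9.8.3}
\]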

This case is a bit more delicate and the corresponding sequence (εr ) is given by εr2 = pr−2 2 and εN +1 = the use of the M−1 k=N +2

r

θk2 pk2 +

k=N +1

M

θk2 ,

r = N + 1, . . . , M

(9.8.4)

k=r+1

M 2 2 = p −2 θ 2 p 2 . The previous calculations k=N +1 θk , ε k=N +1 M M r k k 2 2 r 2 2 trivial bound k=N +1 θk pk ≤ k=1 θk pk show here that

M

2

(εk−1 − εk ) log pk ≤ C

M−1

N +2

x 2 ϕ(x) ψ(x)[(M) − (x)]

2 M 1/2 εN +1 log pM ≤ C [(M) − (N )] , ϕ(M) M 1/2 2 x . dx εM log pM ≤ C N ψ(x)ϕ(x)

and

1/2

dx, (9.8.5)


9.8 Some examples and discussion

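For the last estimate, we used the fact that
\[
\varepsilon_M \sqrt{\log p_M} = \Big( p_M^{-2} \log p_M \sum_{k=N+1}^{M} \theta_k^2 p_k^2 \Big)^{1/2} \le \Big( \sum_{k=N+1}^{M} \theta_k^2 \log p_k \Big)^{1/2} = \Big( \sum_{k=N+1}^{M} \frac{k}{\psi(k)\varphi(k)} \Big)^{1/2}.
\]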

Choose again for the discussion ψ(x) = x α , ϕ(x) = x β with β ≥ 1, 0 ≤ α < 1. Assume first that β > 1, α + β < 2 and for technical reasons M ≥ N + 6. We shall distinguish when η := M−N M is small or not as M, N tend to infinity. With the change of variables x = Mu, the integral in (9.8.5) is rewritten as 1−1/M $ 1−2β−α %1/2 α+β) u M 1−( 2 ) du. 1−β − 1| |u (N +2)/M α+β)

Since α + β < 2, the integral converges. The order is thus at most M 1−( 2 ) . But if η is small, since (N + 2)/M = 1 − η + 2/M, we see a contribution of the integration near 1. Operating the change of variables u = 1 − h, we get 1−1/M $ 1−2β−α %1/2 η−2/M u dh M − N 1/2 du ≤ C , ≤ C √ α,β α,β 1−β − 1| M h 1−η+2/M |u 1/M where we used the fact that η − 3/M ≤ η/2, since η > 6/M. Consequently, we get M−1 k=N+2

2 α+β) M − N 1/2 (εk−1 − εk ) log pk ≤ Cα,β M 1−( 2 ) . M

By (9.8.5) we have 2

εM log pM ≤

M−N 1/2 α+β−1 N

1 1/2 Cα,β N α+β−2

Cα,β

(9.8.6)

if M − N ≤ N, if M − N ≥ N.

Thus we get by Theorem 9.7.1, ⎧

M−N 1/2 1−( α+β ⎪ 2 ) ( M−N )1/2 C + M ⎨ α,β α+β−1 M N QN,M ≤

G α+β ⎪ 1/2 1 1−( 2 ) M−N 1/2 ⎩Cα,β + M ( ) M N α+β−2

if M − N ≤ N, if M − N ≥ N. (9.8.7a)

We deduce from Theorem 8.5.1 that QN,M G

1/2

1−α 1/2 Cα,β (M−NN)M , Cα,β [N 1−β − M 1−β ]M 1−α β ≤

1−β 1−α 1/2 Cα,β N M ,

if M − N ≤ N, if M − N ≥ N. (9.8.7b)


Thus here again Theorem 9.7.1 provides better bounds than Theorem 8.5.1. If α + β = 2, we find by Theorem 9.7.1 that

M C log e , M − N ≤ N, α,β QN,M ≤

N M−N 1/2 G , M − N ≥ N, Cα,β log e M N M whereas if α + β > 2,

M M − N ≤ N, QN,M ≤ Cα,β log e N , M M−N 1/2 G , M − N ≥ N, Cα,β log e N M again better than those obtained via Theorem 8.5.1. 9.8.3. The polynomial case. Consider another case: pk = k s/2 , θk2 = log1 k . This corresponds to the choice ψ(x) = x/(s log x) and ϕ(x) = 1/ log x. In that case, we will see that εr 3 ε1 . This means that there is only one big ball at the origin. Theorems 8.5.1 and 9.7.1 will produce similar estimates. As said before, this example is also very instructive for the sequel. At first, pr−2

r

θk2 pk2 ∼

k=1

r , (2s + 1) log r

r k=1

θk2 ∼

r log r

(r → ∞).

N r 1 2 And εr2 = pr−2 rk=1 θk2 pk2 + N k=r+1 θk ∼ (2s+1) log r + k=r+1 log r . By distinguishing the cases r ≤ N/2 and r ≥ N/2, we easily see that for N large, C1

N N ≤ εr2 ≤ C2 , log N log N

1 ≤ r ≤ N,

C1 , C2 , . . . being absolute constants, therefore showing that εr 3 ε1 [recall that these numbers are defined once the value of N has been fixed]. −2 2 Now as pr−1 − pr−2 ∼ 2s/r 2s+1 , we get εr−1 − εr2 ∼ 2s/ log r, and combining these estimates 3 log N 1 εr−1 − εr 3 sC3 (r → ∞). N log r Consequently N

2

3

(εr−1 − εr ) log pr ∼ s

3/2

r=2

C4

N √ log N 1 ∼ s 3/2 C5 N √ N log r r=2

√ √ √ √ and εN log pN ∼ N, ε1 log pN ∼ N. Then N √ 2 2 (εr−1 − εr ) log pr ∼ N εN log pN + r=2


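when $N$ tends to infinity. Hence by Theorems 8.5.1 or 9.7.1,
\[
\Big\| \sup_{t \in T} |Z_N(t)| \Big\|_G \le C(s) \sqrt{N}. \tag{9.8.8}
\]
It is interesting to observe in this example that
\[
\sum_{r=2}^{N} (\varepsilon_{r-1} - \varepsilon_r) \sqrt{\log p_r} \asymp \frac{\sum_{r=1}^{N} \theta_r^2 \sqrt{\log p_r}}{\big( \sum_{r=1}^{N} \theta_r^2 \big)^{1/2}}, \tag{9.8.9a}
\]
and by the Cauchy–Schwarz inequality this is less than $\big( \sum_{r=1}^{N} \theta_r^2 \log p_r \big)^{1/2}$, which has the same order in $N$. As one also always has
\[
\varepsilon_N \sqrt{\log p_N} = \Big( p_N^{-2} \log p_N \sum_{k=1}^{N} \theta_k^2 p_k^2 \Big)^{1/2} \le \Big( \sum_{k=1}^{N} \theta_k^2 \log p_k \Big)^{1/2}, \tag{9.8.9b}
\]
we have by Theorem 9.7.1 the bound
\[
\Big\| \sup_{t \in T} |Z_N(t)| \Big\|_G \le C \Big( \sum_{k=1}^{N} \theta_k^2 \log p_k \Big)^{1/2}. \tag{9.8.9c}
\]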

N √ 2 1/2 . It is That expression is of course much more useful than log pN r=1 θr therefore interesting to determine whether a set of conditions on p and θ guaranteeing the validity of (9.8.9c) is possible to define. This goes as follows. We assume that there exists a sequence c = {ck , k ≥ 1} of reals and a real number , 0 < ≤ 1, such that ⎧ 2r 2 2 1) lim supr→∞ 2r ⎪ k=1 θk / k=r θk < ∞, ⎪ ⎪ r ⎨2) lim sup 2 2 −2 − p −2 ]c−2 [p r→∞ r r k=1 θk pk < ∞, r+1 (C) r ⎪ 3) lim supr→∞ k=1 ck2 / rk=1 θk2 < ∞, ⎪ ⎪ ⎩ 4) p[r/2] ≥ pr . # " r −2 2 −1 if p = k s (s > 0) or if Observe at first that [pr−2 − pr+1 ] behaves like k k=1 pk pk = 2k , in which case it is also like pr−2 . Practically (C2) reads as follows: r 2 2 k=1 θk pk < ∞, lim sup 2 (C2 ) r 2 p r→∞ cr k=1 k which is satisfied in many cases. Condition (C1) is satisfied once we have that r 2 θ varying function near infinity. The rek=1 k 3 κ(r), where κ is some regularly 2 diverges. θ quirement also implies that the series ∞ k=1 k Condition (C3) complements (C2) on comparing the growth of θ and c. Finally, condition (C4) means that the sequence p grows at most polynomially.


9.8.4 Proposition. Under assumption (C), there exists a constant $C$ such that for all $N$ large enough,
\[
\Big\| \sup_{t \in T} |Z_N(t)| \Big\|_G \le C \Big( \sum_{r=1}^{N} \theta_r^2 \log p_r \Big)^{1/2}.
\]

Proof. By assumption, for some suitable real 0 < c < 1 we have for all r large enough: 1)

2r

θk2 ≥ c

r

2)

2r

θk2 ,

1

−2 c[pr−2 − pr+1 ]

r

θk2 pk2 ≤ cr2 ,

k=1

3)

r

θk2 ≥ c

k=1

r

ck2 .

k=1

Using 1) and (C3) we get −2 2 εr2 ≥ εN = pN

N

−2 2 θk2 pk2 ≥ pN p[N/2]

k=1

θk2 ≥ c2

N/2≤k≤N

N

θk2 = c2 ε12 .

k=1

Now by (C2) and estimate 3) above −2 2 2 [pr−1 − pr−2 ] r−1 cr2 k=1 pk θk ≤ ≤ εr−1 − εr = # " N " # . 1/2 N 2 1/2 εr−1 + εr c k=1 θk2 c2 k=1 ck 2 εr−1 − εr2

Therefore, by applying the Cauchy–Schwarz inequality, N

N 2 (εr−1 − εr ) log pr ≤

r=2

r=2

√ N 1/2 1 2 cr2 log pr c log p . r " N 2 #1/2 ≤ r c2 c2 r=2 k=1 ck

One concludes by applying Theorem 9.7.1. There is an interesting case where Proposition 9.8.4 applies. We assume that X and Y are either independent i.i.d. Rademacher sequences or independent i.i.d. N (0, 1) sequences. Let U = {Uk , k ≥ 1} be a sequence of independent random variables defined on a joint probability space (ϒ, F , ). Consider also a sequence c = {ck , k ≥ 1} of reals and choose in (9.7.1) θk = ck Uk ,

k = 1, 2, . . .

(9.8.10)

It is clear with the choice made for√ X and Y that condition (8.5.4) is satisfied, condi√ tionally to U (one can take B = 18 2, or B = 18 π in the Gaussian or Rademacher


case, see Section 8.5, Example 1). We now impose on U to satisfy the two following weighted strong laws of large numbers: N 2 2 N 2 2 2 k=1 ck Uk a.s. k=1 pk ck Uk a.s. lim N = a1 , lim = a2 , (9.8.11) N 2 2 2 N→∞ N →∞ k=1 ck k=1 pk ck where 0 < a1 , a2 < ∞. When the random variables Uk are moreover identically distributed and a = E U12 < ∞, according to Theorem 4.8.1 the strong laws in (9.8.11) are respectively verified as soon as r r 2 2 2 1 1 k=1 ck k=1 pk ck lim sup #{r : ≤ t} < ∞, lim sup ≤ t} < ∞, #{r : cr2 pr2 cr2 t→∞ t t→∞ t (9.8.12) in which case a1 = a2 = a. Condition (9.8.12) allows us to catch a wide range of examples,for instance pk = k s and ck = k β with s ≥ 1 and β real are suitable. Put H (r) = rk=1 ck2 , r ≥ 1. We do assume that the sequence p is polynomially growing and that the extra assumption linking both p and c holds as well: there exists C > 1 such that for any r large enough, a)

H (2r) ≥ CH (r),

b)

−2 [pr−2 − pr+1 ]

r

(9.8.13)

ck2 pk2 ≤ Ccr2 .

k=1

2 The requirement (9.8.13a), implying the divergence of the series ∞ k=1 ck , is satisfied for instance if H (r) 3 κ(r) where κ is a regularly varying function with positive Karamata index, but not if κ is slowly varying. Let us look at the effect of assumptions (9.8.11), (9.8.13) on the control of the quantities appearing in conditions (C1), (C2) and (C3). On the one hand, for any C > C > 1, by using (9.8.11) and (9.8.13a), 2r 2 2 2r 2 2 r 2 2 H (2r) k=1 ck Uk 8 k=1 ck Uk k=1 ck Uk = ≥ C, r 2U 2 H (2r) H (r) H (r) c k=1 k k r 2 2 2 2 almost surely, for r large. So that 2r k=r+1 ck Uk ≥ (C − 1) k=1 ck Uk , r large, thus implying that condition (C1) is checked. On the other hand, by (9.8.11) and (9.8.13b) r r r c2 U 2 p 2 k=1 k k k −2 −2 −2 2 2 2 −2 2 2 [pr − pr+1 ] ck Uk pk = [pr − pr+1 ] ck pk r 2 2 k=1 ck pk k=1 k=1 −2 ≤ 2[pr−2 − pr+1 ]

r

ck2 pk2 ≤ 2Ccr2 ,

k=1

almost surely, for r large. This implies that condition (C2) is satisfied. Finally, con cerning condition (C3), we observe by assumption (9.8.11) that limr→∞ (a1

)−1 ,

so that it is trivially satisfied. Consequently we can state:

r 2 k=1 ck 2U 2 c k=1 k k

r

=


9.8.5 Corollary. The sequences X and Y being fixed as before, let p be polynomially growing. Let also U be a sequence of independent random variables defined on a joint probability space (ϒ, F , ). Let c be a sequence of reals. We assume that U, p and c satisfy conditions (9.8.11) and (9.8.13). If θ is defined by (9.8.10), for almost all υ in ϒ, there exists Cυ < ∞ such that for all N , N 1/2 2 sup |ZN (t)| ≤ Cυ c log p . r r G t∈T

r=1

And specifying this for i.i.d. square integrable sequences, we get: 9.8.6 Corollary. The sequences X and Y being fixed as before, let p be polynomially growing. Now let U be a sequence of i.i.d. square integrable random variables defined on a joint probability space (ϒ, F , ). Let p and c satisfy (9.8.12), (9.8.13). With θ defined by (9.8.10), for almost all υ in ϒ, there exists Cυ < ∞ such that for all N , N 1/2 2 sup |ZN (t)| ≤ Cυ c log p . r r G t∈T

r=1

9.8.7. Arithmetical weights. So far we have been concerned with regular (decreasing) weights, except for Corollaries 9.8.5 and 9.8.6, in which we considered random independent weights. In this example we study one symptomatic case of weights arising from number theory. Let $d(n) = \#\{d : d \mid n\}$ be the divisor function and consider the case $p_k = [k^{s/2}]$, $\theta_k = d(k)$. In this case the weights are very irregular, but their sums behave regularly. According to equation 18.2.1, p. 263 of [Hardy–Wright: 1979] and equation (B), p. 81 of [Ramanujan: 1916] (see [Wilson: 1922] for a proof) we recall, in effect, that
\[
\sum_{n=1}^{N} d(n) \sim N \log N, \qquad \sum_{n=1}^{N} d^2(n) \sim \frac{N \log^3 N}{\pi^2} \tag{9.8.14}
\]
as $N$ tends to infinity. It follows from Theorem 8.5.1 or Theorem 9.7.1 that

QN G ≤ C(s)N 1/2 (log N )2 . This case is also an example where the sums of the weights grow to infinity. It is natural to also compare when the weights are growing. We shall perform this on the limit case: pk2 = M k , where M > 1 is fixed. We assume that there exists a r 2 nondecreasing differentiable function such that (r) = k=1 θk /r, and r r−1 x (x) ≤ c0 (x). RecallAbel summation: k=1 uk yk = j =1 Dj (yj −yj +1 )+Dr yr , j k where Dj = k=1 uk . Applying it with uk = 1, yk = M gives the relation r−1 M r+1 −1 2 r j j =1 j M (M − 1). Applying it now with uk = θk arbitrary and M−1 = M r −


using the latter relation gives r

θk2 pk2 = (r)rM r −

r−1

(j )j M j (M − 1)

j =1

k=1

r−1

M r+1 − 1 j M j (M − 1) = (r) . ≥ (r) rM r − M −1 j =1

Conversely as rM r = r

θk2 pk2

M r+1 −1 M−1

+

r−1

= (r)rM − r

j =1 (j )j M r−1

j (M

− 1),

(j )j M j (M − 1)

j =1

k=1

M r+1 − 1 + j M j (M − 1)[(r) − (j )]. M −1 r−1

= (r)

j =1

But, as (r) − (j ) ≤ (r − j ) (j ) and r−1

j M (M − 1)(r − j ) (j ) ≤ C j

j =1

r−1

M j (M − 1)(r − j )(j )

j =1

≤ C(r)M r

r−1

M −k (M − 1)k,

k=1

M + C k=1 M −k (M − 1)k . Consequently, for we get k=1 θk2 pk2 ≤ (r)M M−1 some constants C1 , C2 depending on M and only, one has C1 (r)M r ≤ rk=1 θk2 pk2 ≤ C2 (r)M r . And this now implies that r

C1

N r=2

√

r

∞

√ √ N N 2 (r) r (r) r (εr−1 − εr ) log pr ≤ C2 ≤ . √ D(N) − D(r) r=2 D(N ) − D(r) r=2

x Fix some α > 1 such that c0 log(1/α) < 1. Since (x) ≤ (xα) + xα (u)du ≤ x (xα) + c0 xα ((u)/u)du ≤ (xα) + [c0 log(1/α)](x), it follows that (x) ≤ cα (xα). Thus √ N √ 2 (r) r (N α) (εr−1 − εr ) log pr ≥ C1 r ≥ C1 √ √ N (N ) N (N ) N ≥r≥N α r=2 r=2

N

≥ Cα N (N )1/2 . But in view of Theorem 8.5.1, QN G ≤ CN (N )1/2 , so that in this case both theorems produce equivalent estimates.


9.9 Uniform convergence of random Fourier series

Let $C$ be the space of complex-valued continuous functions on $T$ equipped with the sup-norm $\|f\| = \sup_{0 \le t \le 1} |f(t)|$, $f \in C$. Let $U = \{U_k, k \ge 1\}$ be a sequence of independent symmetric real random variables, and let $p$ be a nondecreasing sequence of positive integers. In Theorem 8.5.8 we showed that the condition: there exist integers $0 := n_0 < n_1 < n_2 < \cdots$ such that the series
\[
\sum_{i=0}^{\infty} E \Big( \sum_{k=n_i+1}^{n_{i+1}} |U_k|^2 \Big)^{1/2} \log^{1/2} p_{n_{i+1}} \tag{9.8.15}
\]

converges is enough to ensure the uniform convergence of the random Fourier series 2iπpk t for almost all ω. W (ω)e k k≥1 However, it is clear from the previous section that this condition is only efficient for polynomially growing sequences p. In concrete cases, it is often enough to choose nk = 2k to obtain a sharp sufficient condition on U and p. But there are examples (for instance Rademacher Fourier series with p and θ defined by (9.8.19)) for which the k

correct choice is nk = 22 , which show that the appearance of the sequence (nk )k in the above condition is meaningful. In what follows, we would like to use the results from the previous section to investigate this question more specifically. We will restrict the scope of the study to Rademacher random Fourier series. Let ε = {εk , k ≥ 1}, ε = {εk , k ≥ 1} be two independent Rademacher sequences. We assume in (9.7.1) that X = ε, Y = ε and define for integers M ≥ N: ZN,M (ω, t) = ZM (ω, t)−ZN (ω, t) =

M

θk εk (ω) cos 2πpk t +εk (ω) sin 2πpk t .

k=N +1

(9.8.16) We investigate the uniform convergence of the series ∞

θk εk (ω) cos 2πpk t + εk (ω) sin 2πpk t .

k=1

Consider first the polynomial case. We establish another type of sufficient condition for uniform convergence in which we get rid of the sequence (nk ). We consider sequences p and θ linked by the following conditions: 1 2 2 (i) ∀N ≥ 1, θ p = o θk2 , k k 2 pm k≤m k≤m (9.8.17) −2 −2 (ii) ∃ < ∞ : [pm−1 − pm ] θk2 pk2 ≤ θm2 . k≤m

The examples studied in the previous section justify introduction of the following set: (9.8.18) D = (p, θ) : condition (9.8.16) is fulfilled .


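The pairs $(p, \theta)$ studied in 9.8.1 and 9.8.3 belong to $D$, as well as for instance the pair defined by
\[
p_k^2 = e^{\log^{\theta} k}, \qquad \theta_k^2 = \frac{1}{k \log^{\mu} k}, \tag{9.8.19}
\]
where $\mu > 1$ and $\theta > 0$.

9.9.1 Theorem. Let $(p, \theta) \in D$. Assume that a) $\sum_{r=1}^{\infty} \theta_r^2 \log p_r < \infty$, √ θ2 log pr b) limN→∞ lim supM→∞ N x, V > y} ≤ P{U > x}P{V > y}. In the next lemma are other similar useful estimates.

10.1.3 Lemma. Let $(U, V)$ be jointly Gaussian centered random variables and let $x \ge 0$.

(a) If $\|U\|_2 \ge \|V\|_2$, then
\[
P\{U > x, V > x\} \le P\{U > x\}\,\exp\Big( -\frac12 \Big( \frac{x\,\|U - V\|_2}{2\,\|U\|_2^2} \Big)^2 \Big).
\]
(b) Assuming for some $0 < \alpha \le 1$ that $2\,\big|\|U\|_2^2 - \|V\|_2^2\big| \le (1 - \alpha^2)\,\|U - V\|_2^2$, then
\[
P\{U > x, V > x\} \le \min\big( P\{U > x\}, P\{V > x\} \big)\,\exp\Big( -\frac12 \Big( \frac{\alpha x\,\|U - V\|_2}{2 \max(\|U\|_2^2, \|V\|_2^2)} \Big)^2 \Big).
\]

Proof. Plainly $P\{U > x, V > x\} \le P\{U + V > 2x\} = \Psi\big( \frac{2x}{\|U + V\|_2} \big)$. If we write $\big( \frac{2x}{\|U + V\|_2} \big)^2 = \frac{x^2}{\|U\|_2^2} + b^2$, then
\[
b^2 = x^2 \Big( \frac{4}{\|U + V\|_2^2} - \frac{1}{\|U\|_2^2} \Big) = x^2\,\frac{4\|U\|_2^2 - \|U + V\|_2^2}{\|U\|_2^2\,\|U + V\|_2^2}.
\]
But $4\|U\|_2^2 - \|U + V\|_2^2 = 3\|U\|_2^2 - \|V\|_2^2 - 2\langle U, V \rangle = 2(\|U\|_2^2 - \|V\|_2^2) + \|U - V\|_2^2$. If $\|U\|_2^2 \ge \|V\|_2^2$, we get
\[
b^2 \ge \frac{x^2\,\|U - V\|_2^2}{4\,\|U\|_2^4},
\]
and consequently
\[
P\{U > x, V > x\} \le P\{U > x\}\,\exp\Big( -\frac12\,\frac{x^2\,\|U - V\|_2^2}{4\,\|U\|_2^4} \Big).
\]
Now if $\big|\|U\|_2^2 - \|V\|_2^2\big| \le \frac{1 - \alpha^2}{2}\,\|U - V\|_2^2$ for some $0 < \alpha < 1$, then we have $4\|U\|_2^2 - \|U + V\|_2^2 \ge \alpha^2 \|U - V\|_2^2$, and so
\[
b^2 \ge \frac{\alpha^2 x^2\,\|U - V\|_2^2}{4 \max(\|U\|_2^2, \|V\|_2^2)^2},
\]
which implies
\[
P\{U > x, V > x\} \le P\{U > x\}\,\exp\Big( -\frac12 \Big( \frac{\alpha x\,\|U - V\|_2}{2 \max(\|U\|_2^2, \|V\|_2^2)} \Big)^2 \Big),
\]
and also
\[
P\{U > x, V > x\} \le P\{V > x\}\,\exp\Big( -\frac12 \Big( \frac{\alpha x\,\|U - V\|_2}{2 \max(\|U\|_2^2, \|V\|_2^2)} \Big)^2 \Big).
\]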

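Hence the lemma.

We conclude this part with an interesting lemma allowing us to express the correlation of Gaussian pairs in terms of a probability involving their signs.

10.1.4 Lemma. Let $(U, V)$ be jointly Gaussian centered random variables and let $\rho = E\Big(\frac{U}{\|U\|_2} \cdot \frac{V}{\|V\|_2}\Big)$. Then
\[ P\{U \ge 0,\, V \ge 0\} - \frac14 = \frac{1}{2\pi} \arcsin \rho. \]

Proof. We may assume that $\|U\|_2 = \|V\|_2 = 1$. Let $Z$ be an $N(0,1)$ distributed random variable, which we assume to be independent of $U$. Then $(U, V)$ and $(U, \rho U + \sqrt{1-\rho^2}\, Z)$ have the same law. Put $H(\rho) = P\{U \ge 0, V \ge 0\}$. Assume $0 \le \rho \le 1$, then
\[ H(\rho) = \int_{-\infty}^{\infty} P\Big\{U > \sup\Big(0, -\frac{\sqrt{1-\rho^2}\, z}{\rho}\Big)\Big\}\, e^{-z^2/2}\, \frac{dz}{\sqrt{2\pi}} \]
\[ = \int_{-\infty}^{0} P\Big\{U > -\frac{\sqrt{1-\rho^2}\, z}{\rho}\Big\}\, e^{-z^2/2}\, \frac{dz}{\sqrt{2\pi}} + \frac12 \int_0^{\infty} e^{-z^2/2}\, \frac{dz}{\sqrt{2\pi}} \]
\[ = \int_0^{\infty} \Big( \int_{\theta\sqrt{1-\rho^2}/\rho}^{\infty} e^{-x^2/2}\, \frac{dx}{\sqrt{2\pi}} \Big)\, e^{-\theta^2/2}\, \frac{d\theta}{\sqrt{2\pi}} + \frac14. \]
Besides
\[ H'(\rho) = \int_0^{\infty} \frac{d}{d\rho} \Big[ \int_{\theta\sqrt{1-\rho^2}/\rho}^{\infty} e^{-x^2/2}\, \frac{dx}{\sqrt{2\pi}} \Big]\, e^{-\theta^2/2}\, \frac{d\theta}{\sqrt{2\pi}}. \]
As
\[ \frac{d}{d\rho} \int_{\theta\sqrt{1-\rho^2}/\rho}^{\infty} e^{-x^2/2}\, \frac{dx}{\sqrt{2\pi}} = \frac{\theta\, (1-\rho^2)^{-1/2}\, \rho^{-2}}{\sqrt{2\pi}}\, e^{-\theta^2(1-\rho^2)/2\rho^2}, \]
we thus have
\[ H'(\rho) = \frac{1}{2\pi\rho^2\sqrt{1-\rho^2}} \int_0^{\infty} \theta\, e^{-\theta^2(1-\rho^2)/2\rho^2}\, e^{-\theta^2/2}\, d\theta = \frac{1}{2\pi\rho^2\sqrt{1-\rho^2}} \int_0^{\infty} \theta\, e^{-\theta^2/(2\rho^2)}\, d\theta = \frac{1}{2\pi\sqrt{1-\rho^2}} = \frac{1}{2\pi}\, (\arcsin\rho)'. \tag{10.1.13} \]
Since $H(0) = 1/4$, we get $H(\rho) - 1/4 = \int_0^{\rho} \frac{du}{2\pi\sqrt{1-u^2}} = \frac{1}{2\pi} \arcsin\rho$. Hence
\[ P\{U \ge 0,\, V \ge 0\} - \frac14 = \frac{1}{2\pi} \arcsin\rho. \]
Now we observe that
\[ H(\rho) = \int_0^{\infty} P\Big\{0 < U < -\frac{\theta\sqrt{1-\rho^2}}{\rho}\Big\}\, e^{-\theta^2/2}\, \frac{d\theta}{\sqrt{2\pi}} \qquad \text{if } -1 \le \rho \le 0. \]
Further
\[ H'(\rho) = \int_0^{\infty} \frac{d}{d\rho} \Big[ \int_0^{-\theta\sqrt{1-\rho^2}/\rho} e^{-x^2/2}\, \frac{dx}{\sqrt{2\pi}} \Big]\, e^{-\theta^2/2}\, \frac{d\theta}{\sqrt{2\pi}} = \frac{1}{\rho^2\sqrt{2\pi(1-\rho^2)}} \int_0^{\infty} \theta\, e^{-\theta^2(1-\rho^2)/2\rho^2}\, e^{-\theta^2/2}\, \frac{d\theta}{\sqrt{2\pi}} = \frac{1}{2\pi\sqrt{1-\rho^2}}. \tag{10.1.14} \]
Hence $H(0) - H(\rho) = 1/4 - H(\rho) = \int_{\rho}^{0} \frac{du}{2\pi\sqrt{1-u^2}} = -\frac{1}{2\pi} \arcsin\rho$. Thereby
\[ P\{U \ge 0,\, V \ge 0\} - \frac14 = \frac{1}{2\pi} \arcsin\rho. \]
From these facts, the lemma follows easily.

Gaussian vectors. A centered real random vector $X = (X_1, \ldots, X_N)$ is Gaussian if for any reals $a_1, \ldots, a_N$, the random variable $\sum_{i=1}^N a_i X_i$ is centered Gaussian. There exists an $N \times N$ nonnegative definite matrix $A$ such that for any $B \in \mathcal{B}(\mathbb{R}^N)$, with $x = (x_1, \ldots, x_N)$,
\[ P\{X \in B\} = \frac{1}{(2\pi)^{N/2}\sqrt{\det A}} \int_B e^{-\frac12\, {}^t x A^{-1} x}\, dx_1 \ldots dx_N. \tag{10.1.15} \]
It is always possible to diagonalize $X$ so that its distribution follows the canonical Gaussian law
\[ \gamma_N(x_1, \ldots, x_N) := (2\pi)^{-N/2}\, e^{-\frac12 (x_1^2 + \cdots + x_N^2)}. \tag{10.1.16} \]
Let indeed $\Gamma = (E X_i X_j)_{1 \le i,j \le N} = A\, {}^t\! A$ be the covariance matrix of $X$. The law of $X$ is completely defined by $\Gamma$ and identical to the law of $A(Y_1, \ldots, Y_N)$, where $(Y_1, \ldots, Y_N)$ has law $\gamma_N$.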
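The arcsine identity of Lemma 10.1.4 lends itself to a quick numerical sanity check. The following is only a minimal sketch, assuming unit variances and using the representation $V = \rho U + \sqrt{1-\rho^2}\, Z$ from the proof; the function names, the sample size and the seed are illustrative choices, not part of the text.

```python
import math
import random


def orthant_probability(rho, n_samples=200_000, seed=0):
    """Monte Carlo estimate of P{U >= 0, V >= 0} for a standard
    Gaussian pair (U, V) with correlation rho.

    V is built as rho*U + sqrt(1 - rho^2)*Z with Z independent of U,
    exactly as in the proof of Lemma 10.1.4.
    """
    rng = random.Random(seed)
    c = math.sqrt(1.0 - rho * rho)
    hits = 0
    for _ in range(n_samples):
        u = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        v = rho * u + c * z
        if u >= 0.0 and v >= 0.0:
            hits += 1
    return hits / n_samples


def arcsine_formula(rho):
    """Right-hand side of Lemma 10.1.4: 1/4 + arcsin(rho) / (2*pi)."""
    return 0.25 + math.asin(rho) / (2.0 * math.pi)


if __name__ == "__main__":
    for rho in (-0.8, -0.3, 0.0, 0.5, 0.9):
        print(rho, orthant_probability(rho), arcsine_formula(rho))
```

The two printed columns should agree up to the Monte Carlo error, which is of order $n^{-1/2}$ for $n$ samples.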


Rotational invariance. Gaussian laws possess a remarkable rotational invariance property, which is worth first presenting for pairs of random variables before switching to Gaussian vectors.

10.1.5 Lemma. If $X$ and $Y$ are independent and non-constant random variables and if $U = pX + qY$ and $V = aX - bY$ are independent, where $p$, $q$, $a$ and $b$ are all real and non-zero, then $X$ and $Y$ are normally distributed, and hence so are $U$ and $V$.

The case $p = q = a = b = 1$ is the well-known theorem of Bernstein [1941], who further assumed that $X$ and $Y$ have finite, equal variances and positive densities, and is also stated in Gelbaum [1985: Theorem 1], who was apparently unaware of Bernstein’s result. But the quoted work of Gelbaum contains many other interesting aspects, which we shall mention later on. Bernstein’s theorem was extended by Gnedenko [1948] who proved Lemma 10.1.5 without moment condition. The general form we stated is due to Quine and Seneta [1999: Theorem 2].

This remarkable property should, however, rather be attributed to Kac [1939]. In an early, little-known paper, Kac showed this: if $X$ and $Y$ are independent random variables and if for every $\vartheta$, the random variables $X \cos\vartheta + Y \sin\vartheta$ and $X \sin\vartheta - Y \cos\vartheta$ are independent, then $X$ and $Y$ are normally distributed. In fact, the assumption is used only for the values $\vartheta = \pi/4$ and $\vartheta = 3\pi/4$, which requires that $(X+Y)/\sqrt2$ and $(X-Y)/\sqrt2$ are independent, and also that $(-X+Y)/\sqrt2$ and $(X+Y)/\sqrt2$ are independent. This is verified once $X + Y$ and $X - Y$ are independent, since independence is not affected by scalar multiplication. Kac’s paper precedes even Bernstein’s; see in this regard the nice discussion in Quine and Seneta [1999: Section 3]. His proof, based on characteristic functions and the Cauchy method, is simple and elegant and extends to the finite-dimensional case as quoted at the end of the paper. We find it worth including here.

Kac’s proof.
We may assume $X$ and $Y$ symmetric; the general case indeed follows from a routine argument. Their characteristic functions are real. Let $A$, $B$ be the characteristic functions of $X$ and $Y$ respectively: $A(x) = E\, e^{ixX}$ and $B(y) = E\, e^{iyY}$. By assumption
\[ E\, e^{ix(X+Y) + iy(X-Y)} = E\, e^{ix(X+Y)}\, E\, e^{iy(X-Y)} = A(x)B(x)A(y)B(-y), \]
\[ E\, e^{ix(-X+Y) + iy(X+Y)} = E\, e^{ix(-X+Y)}\, E\, e^{iy(X+Y)} = A(-x)B(x)A(y)B(y). \]
But
\[ E\, e^{ix(X+Y) + iy(X-Y)} = E\, e^{i(x+y)X}\, E\, e^{i(x-y)Y} = A(x+y)B(x-y), \]
\[ E\, e^{ix(-X+Y) + iy(X+Y)} = E\, e^{i(-x+y)X}\, E\, e^{i(x+y)Y} = A(-x+y)B(x+y). \]


Comparing the two above equalities gives
\[ A(x+y)B(x-y) = A(x)A(y)B(x)B(y) = A(y-x)B(y+x), \]
since $A(x) = A(-x)$, $B(x) = B(-x)$ by the symmetry assumption. By letting $x = y$, we get $A(2x) = B(2x)$. And so we arrive at the functional equation
\[ A(x+y)A(x-y) = A^2(x)A^2(y). \tag{10.1.17} \]
In particular $A(2x) = A^4(x)$, so that, $A$ being real, $A(x) \ge 0$. Repeated application of $A(2x) = A^4(x)$ produces
\[ A(x/2^k) = [A(x)]^{1/4^k}. \]
But $A$ is continuous and $A(x/2^k) \to 1$ as $k$ tends to infinity. This implies that $A(x) > 0$ for every $x$. The rest of the proof is based on the well-known method of Cauchy. Replacing successively $x$ by $2x, 3x, \ldots$ allows us to obtain for arbitrary integers $p$ and $q$,
\[ A(px/q) = [A(x)]^{p^2/q^2}. \]
And since $A$ is continuous,
\[ A(x) = e^{kx^2}, \qquad (e^k = A(1)). \]
As $0 < A(x) \le 1$ one has $k \le 0$.

A related result is the well-known Darmois–Skitovič theorem (see Darmois [1953] and Skitovič [1953], see also King and Lukacs [1954]), which states as follows.

10.1.6 Lemma. Let $X_1, \ldots, X_n$ be mutually independent random variables. Then
\[ U = \sum_{j=1}^n a_j X_j \quad \text{and} \quad V = \sum_{j=1}^n b_j X_j \]
are independent if and only if each $X_j$ with a non-zero coefficient in both sums is normally distributed and $\sum_{j=1}^n a_j b_j \operatorname{Var}(X_j) = 0$.

The proof depends on forming differences of the logarithmic characteristic functions and applying a theorem of Marcinkiewicz. There is a recent formulation of this result (Quine and Seneta [1999: Theorem 1]), close to Lemma 10.1.5.

10.1.7 Lemma. If $X_1, \ldots, X_n$ are independent and non-constant random variables and if
\[ U = \sum_{j=1}^n X_j \quad \text{and} \quad V = \sum_{j=1}^n b_j X_j \]
are independent, where the numbers $b_1, \ldots, b_n$ are all distinct and nonzero, then $X_1, \ldots, X_n$ are normally distributed.
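The functional equation (10.1.17) at the heart of Kac's proof can be illustrated numerically. The sketch below is only an illustration under our own choices: it evaluates the defect $A(x+y)A(x-y) - A^2(x)A^2(y)$ for the Gaussian characteristic function $e^{kx^2}$, where it vanishes, and for the characteristic function $\cos x$ of a symmetric $\pm 1$ variable, where it does not. The function names and the test points are hypothetical.

```python
import math


def gaussian_cf(x, k=-0.5):
    """Characteristic function exp(k*x^2) of a centered Gaussian law (k <= 0)."""
    return math.exp(k * x * x)


def sign_cf(x):
    """Characteristic function cos(x) of a symmetric +1/-1 random variable."""
    return math.cos(x)


def kac_defect(cf, x, y):
    """Left side minus right side of A(x+y) A(x-y) = A(x)^2 A(y)^2."""
    return cf(x + y) * cf(x - y) - cf(x) ** 2 * cf(y) ** 2


if __name__ == "__main__":
    for x, y in ((0.7, 1.3), (2.0, 0.1), (1.0, 1.0)):
        print(x, y, kac_defect(gaussian_cf, x, y), kac_defect(sign_cf, x, y))
```

For the Gaussian case the defect is zero up to rounding, since $(x+y)^2 + (x-y)^2 = 2x^2 + 2y^2$; the symmetric sign variable already fails the equation at $x = y = 1$.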


The Darmois–Skitovič theorem has an extension for Banach-valued random variables, thus completing the previous description (see Krakowiak [1985]). It was observed long ago that under the kind of assumptions made in the above lemmas, direct computations imply that $X_1, \ldots, X_n$ have moments of any order. We may refer to Lancaster [1960] for instance.

Now let $(Y_1, \ldots, Y_N)$ be a Gaussian vector with law $\gamma_N$ (defined in (10.1.16)). The rotational invariance property can be described as follows. If $U$ is an orthogonal matrix on $\mathbb{R}^N$, then $U(Y_1, \ldots, Y_N)$ has law $\gamma_N$. Consequently, for any sequence of reals $a_1, \ldots, a_N$, the random variable $\sum_{i=1}^N a_i Y_i$ follows the same law as $Y_1 \big(\sum_{i=1}^N a_i^2\big)^{1/2}$. And thus for any $0 < p < \infty$,
\[ \Big\| \sum_{i=1}^N a_i Y_i \Big\|_p = \|Y_1\|_p \Big( \sum_{i=1}^N a_i^2 \Big)^{1/2}. \tag{10.1.18} \]
Another way to describe this property is the following: let $X$ be a Gaussian vector in $\mathbb{R}^N$, and let $Y$ be an independent copy of $X$. Then for any $\eta$, the vector obtained from $(X, Y)$ by a rotation of angle $\eta$,
\[ (X \sin\eta + Y \cos\eta,\; X \cos\eta - Y \sin\eta), \tag{10.1.19} \]
has the same law as $(X, Y)$. It suffices, indeed, to compare their covariance matrices.

Having defined and commented on this important property, we now continue with other classical Gaussian correlation estimates. The following lemma has self-evident practical interest: combined with Lemma 10.1.4, it allows us to characterize (Maruyama’s result) mixing properties of Gaussian dynamical systems, see Section 3.3.6.

10.1.8 Lemma. Let $X = (X_1, \ldots, X_N)$ be a Gaussian centered vector such that $E X_n^2 = 1$ for $1 \le n \le N$ and let $r(n, m) = E X_n X_m$ be its covariance function. Let $\mathcal{A}$ be a partition of $\{1, \ldots, N\}$ and denote by $\sigma$ a generic element of $\mathcal{A}$. Let $x = (x_1, \ldots, x_N)$ and $y = (y_1, \ldots, y_N)$, with distinct coordinates, be such that $-\infty < x_n < y_n < +\infty$ for $1 \le n \le N$. Denote also by $I_n$ the interval $(x_n, y_n)$, and put for each $\sigma \in \mathcal{A}$,
\[ V_\sigma = \prod_{n \in \sigma} I_n, \qquad V = \prod_{\sigma \in \mathcal{A}} V_\sigma, \qquad X(\sigma) = (X_n,\, n \in \sigma). \]
Then there exists a constant $C_V$ depending on $V$ only, such that
\[ \Big| P\{X \in V\} - \prod_{\sigma \in \mathcal{A}} P\{X(\sigma) \in V_\sigma\} \Big| \le C_V \sum_{\sigma \ne \sigma'} \sum_{n \in \sigma} \sum_{m \in \sigma'} |E X_n X_m|. \]

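Proof. Let $\Gamma_1$ denote the covariance matrix of $X$. For each $\sigma \in \mathcal{A}$, let $\widetilde{X}(\sigma)$ be a Gaussian vector having the same law as $X(\sigma)$ and such that the $\widetilde{X}(\sigma)$ are mutually independent. Let $\Gamma_0$ denote the covariance matrix of $(\widetilde{X}(\sigma),\, \sigma \in \mathcal{A})$.

(1) Assume that $\Gamma_1$ is invertible and write $\Gamma(\lambda) = \lambda \Gamma_1 + (1-\lambda) \Gamma_0$, for $\lambda \in [0, 1]$. Then $\Gamma(\lambda)$ is invertible. Put
\[ g_\lambda(u) = \frac{1}{(2\pi)^{n/2}\sqrt{\det \Gamma(\lambda)}}\, e^{-\frac12\, {}^t u\, \Gamma(\lambda)^{-1} u}, \qquad F(\lambda) = \int_V g_\lambda(u)\, du. \tag{10.1.20} \]
Then $F(\lambda)$ has a derivative which may be evaluated as
\[ F'(\lambda) = \int_V \frac{\partial}{\partial\lambda} \big(g_\lambda(u)\big)\, du \quad \text{where} \quad \frac{\partial}{\partial\lambda} \big(g_\lambda(u)\big) = \frac12 \operatorname{tr}\Big( \frac{\partial \Gamma(\lambda)}{\partial\lambda} \cdot \frac{\partial^2}{\partial u^2}\, g_\lambda(u) \Big). \tag{10.1.21} \]
But
\[ \frac{\partial \Gamma(\lambda)}{\partial\lambda}(\alpha, \beta) = \begin{cases} r(\alpha, \beta) & \text{if } \alpha \in \sigma,\ \beta \in \sigma',\ \sigma \ne \sigma'; \\ 0 & \text{otherwise.} \end{cases} \]
Consequently
\[ \frac{\partial}{\partial\lambda} \big(g_\lambda(u)\big) = \frac12 \sum_{\sigma \ne \sigma'} \sum_{\alpha \in \sigma,\, \beta \in \sigma'} r(\alpha, \beta)\, \frac{\partial^2}{\partial u_\alpha \partial u_\beta} \big(g_\lambda(u)\big). \tag{10.1.22} \]
Thus
\[ F'(\lambda) = \frac12 \sum_{\sigma \ne \sigma'} \sum_{\alpha \in \sigma,\, \beta \in \sigma'} r(\alpha, \beta) \int_V \frac{\partial^2}{\partial u_\alpha \partial u_\beta} \big(g_\lambda(u)\big)\, du. \]
And so
\[ \Big| P\{X \in V\} - \prod_{\sigma \in \mathcal{A}} P\{X(\sigma) \in V_\sigma\} \Big| = \Big| \int_0^1 F'(\lambda)\, d\lambda \Big| \le \frac12 \sum_{\sigma \ne \sigma'} \sum_{\alpha \in \sigma,\, \beta \in \sigma'} |r(\alpha, \beta)| \cdot \int_0^1 \Big| \int_V \frac{\partial^2}{\partial u_\alpha \partial u_\beta} \big(g_\lambda(u)\big)\, du \Big|\, d\lambda. \]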

(10.1.23) Put α,β g(u) = g(u1 , . . . , yα , . . . , yβ , . . . ) − g(u1 , . . . , xα , . . . , yβ , . . . ) − g(u1 , . . . , yα , . . . , xβ , . . . ) + g(u1 , . . . , yα , . . . , xβ , . . . ). Then

V

∂2 (gλ (u)) du ∂uα ∂uβ y1 y2 yβ yα ∂2 du1 du2 . . . duα = (gλ (u)) duβ . x1 x2 xα xβ ∂uα ∂uβ du1 . . . dun = x ≤u ≤y α,β g(u) j j j duα duβ j =α,j =β ∗ du1 . . . dun ≤ |α,β g(u)| ≤ (x, y, λr(α, β)), duα duβ Rn−2 (x,y)

(10.1.24)


where the above sum runs over the set {(yα , yβ ), (xα , yβ ), (yα , xβ ), (yα , xβ )}. Thereby 1 ( P{X(σ ) ∈ Vσ } = F (λ)dλ ≤ CV |r(α, β)|, P{X ∈ V } − 0

σ ∈A

σ =σ α∈σ β∈σ

with CV = 4 max (x, y, ρ) : |ρ| ≤ 1, x, y ∈ {xα , yα , α ∈ A} . As for x = y, sup (x, y, ρ) < ∞,

−1≤ρ≤1

and it follows by assumption that CV is finite. (2) If 1 is not invertible, let be a Gaussian vector in RN with i.i.d. N (0, 1) distributed components; and put for u real, u = 0, Xu = X + uN,

u = + uN.

The covariance matrices are then invertible, and the first step of the proof shows that the conclusion of the lemma is verified by Xu . Further Xu (α, β) = r(α, β) + u2 . We then observe that it suffices to let u tend to 0 for concluding identically for X. Finally we quote a remarkable decoupling inequality due to Klein–Landau–Shucker [1982: Theorem 1]. For a proof we refer to the original paper. 10.1.9 Lemma. Let T = {Tk , k ≥ 1} be a stationary, centered Gaussian sequence with finite decoupling coefficient p(T ), that is: p(T ) :=

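\[ \sum_{k=1}^{\infty} \frac{|E\, T_1 T_k|}{E\, T_1^2} < \infty. \]
Let $\{f_k,\, k \ge 1\}$ be a sequence of complex-valued Borel-measurable functions. Then, for each finite subset $J$ of $\mathbb{N}$,
\[ \Big| E \prod_{j \in J} f_j(T_j) \Big| \le \prod_{j \in J} \| f_j(T_1) \|_{p(T)}. \tag{10.1.25} \]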

Gaussian processes. A family $X = \{X_t,\, t \in T\}$ of random variables with common basic probability space $(\Omega, \mathcal{A}, P)$ is a centered Gaussian process if any finite linear combination
\[ \sum_{k=1}^n a_k X_{t_k} \]
with $a_k$ reals and $t_k \in T$ is a centered real Gaussian random variable. The law of the Gaussian process $X$ is completely determined by its covariance function $\Gamma(s, t) = E X_s X_t$, $s, t \in T$.

A more abstract way to define Gaussian processes usually goes as follows. Let $H$ be a Hilbert space; a Gaussian process is a (linear) isometry $T : H \to L^2(P)$ such that:

(i) For any two orthogonal elements $x, y \in H$, $T(x)$ and $T(y)$ are independent.

(ii) For any $x \in H$, $T(x)$ is centered normally distributed with $E\, T(x)^2 = \|x\|^2$.

We see from Lemma 10.1.5 that the second requirement is redundant. Indeed, if $x$ and $y$ are orthogonal so are $x + y$ and $x - y$; whence $T(x)$ and $T(y)$ are normally distributed. The other requirement $E\, T(x)^2 = \|x\|^2$ is implied by the fact that $T$ is an isometry. We therefore have another simpler definition:

10.1.10 Definition. A Gaussian process is a linear isometry $T : H \to L^2(P)$ such that if $x, y \in H$ are orthogonal, then $T(x)$ and $T(y)$ are centered independent.

The comparison between the two definitions is easy. Let $X = \{X_t,\, t \in T\}$ be a centered Gaussian process with basic probability space $(\Omega, \mathcal{A}, P)$. Let $H = \operatorname{span}\{X_t\}$ and $T$ be the identity operator. Clearly $X$ is the restriction of $T$ to some subset of $H$. As the law of these random variables is determined by their finite margins, the rotational invariance properties stated before extend to these variables. Thus if $X$ is a Gaussian process, or a Gaussian random variable with value in a Banach space (see Definition 10.1.11), and if $X_1, \ldots, X_N$ are independent copies of $X$, for any sequence $a_1, \ldots, a_N$ of reals, $\sum_{i=1}^N a_i X_i$ has the same law as $X_1 \big(\sum_{i=1}^N a_i^2\big)^{1/2}$.

The finitely additive Gaussian cylinder measure on $H$ induced by $T$ is not extendable to a countably additive measure on the $\sigma$-algebra $\mathcal{B}_T(H)$ generated by the cylinders of $H$ if $H$ is infinite-dimensional. The following remarkable example is quoted in Gelbaum [1985]. If $S$ is a domain in $\mathbb{R}^2$ and if the two-dimensional Lebesgue measure of $S$ is finite, say equal to 1, let $H$ be the set of $\mathbb{R}$-valued square integrable harmonic functions on $S$ and finally let $T$ be any endomorphism of $H$. Then by a result of Hemasinha [1983], $T$ induces a countably additive measure on $\mathcal{B}_T(H)$. Therefore no such $T$ can satisfy either of the requirements (i) and (ii) above.

Let $H$ be a Hilbert space. The canonical Gaussian process $Z = \{Z_h,\, h \in H\}$ on $H$ is the Gaussian centered process with covariance function given by $\Gamma(h, h') = \langle h, h' \rangle$, for any $h, h' \in H$. By Zorn’s lemma, any Hilbert space admits an orthonormal basis, although not necessarily countable. Assume that $H$ admits a countable orthonormal basis $\{h_n,\, n \ge 1\}$, which is realized if and only if $H$ is separable. Let also $\gamma = \{g_n,\, n \ge 1\}$ be a sequence of i.i.d. $N(0, 1)$ distributed random variables with basic probability space $(\Omega, \mathcal{A}, P)$. Then $Z$ can be defined as follows: for any $h \in H$,
\[ Z_h = \sum_{n=1}^{\infty} g_n \langle h, h_n \rangle. \tag{10.1.26} \]
We easily verify that $E\, Z_h Z_{h'} = \langle h, h' \rangle$ for any $h, h' \in H$. Any centered Gaussian process $X = \{X_t,\, t \in T\}$ can be represented as the restriction of the canonical Gaussian process to some suitable subset $B$ of $H$. Let indeed $H = L^2(\Omega, \mathcal{A}, P)$, and consider


the restriction of Z on H to B = {Xt , t ∈ T }. Then X and ZB = {Zb , b ∈ B} have the same laws since their covariance functions are identical by construction. We shall introduce the notions of Gaussian measure and of Gauss space and review some of their important properties. For the proofs and for more about these spaces, we refer to the original works of Borell [1975–77] (see also Gross [1967]), which are clearly written and accessible. There are other remarkable sources: for instance Ledoux and Talagrand [1991], Lifshits [1995], Talagrand [2005] and the work of Ehrhard [1983], [1984a], [1984b]. Gauss spaces. Let E denote a locally convex Hausdorff space over the field of real numbers. 10.1.11 Definition. A Radon probability measure μ on E is said to be a (centered) Gaussian Radon measure on E if the image measure ξ(μ) is a (centered) Gaussian Radon measure on R for every ξ belonging to the topological dual E of E. The pair (E, μ) is called a Gauss space. A random variable X with value in E is Gaussian if f (X) is a real Gaussian for any f ∈ E . Equivalently, a Radon probability measure μ on E is said to be centered Gaussian if for independent random variables X, Y with common law μ, X + Y and X − Y are independent and have the same distribution. The class of all (centered) Gaussian Radon measures on E is denoted by G(E) (resp. (G0 (E)). Every μ ∈ G(E) has barycenter b ∈ E. Setting μ0 = μ(· + b), we also denote by E2 (μ) the closure of E in L2 (μ). If (H, · ) is a Hilbert space, the canonical cylinder measure on H is denoted by γH . The Fourier transform γˆH (x) of 2 γH equals e− x /2 . In the theorem below (Borell [1975: Theorem 2.1]), we list a few basic properties of Gauss spaces. 10.1.12 Theorem. Suppose μ ∈ G(E). Then a) μ has barycenter b ∈ E, b) every measure ξ μ0 , ξ ∈ E2 (μ), has barycenter ξ ∈ E. The map : E2 (μ) → E is linear and injective. We define H (μ) = range(),

˜ 2. h˜ = −1 h, h ∈ H (μ) and h 2 = μ(h)

Then, c) (H(μ), · ) is a Hilbert space and the canonical injection θ of (H(μ), · ) into E is weakly continuous. Furthermore θ (γH (μ) ) = μ0 . Let μ ∈ G(E) and write μx ( · ) = μ0 (· − x), x ∈ E. As a corollary we get # " ˜ 2 μh = e(h− h /2) · μ0 , h ∈ H(μ). (10.1.27) The Hilbert space H(μ) introduced in Theorem 10.1.12 is called the reproducing kernel Hilbert space (RKHS) of μ. Borell proved (see Theorem 7.1 in the aforementioned


paper) that
\[ \mathcal{H}(\mu) \text{ is separable.} \tag{10.1.28} \]
We define
\[ O(\mu) = \{h \in \mathcal{H}(\mu) : \|h\| \le 1\}, \qquad \mu \in G(E). \tag{10.1.29} \]
Then $O(\mu)$ is a compact subset of $E$ and we have the important relation
\[ \mu_0(\xi^2) = \max_{O(\mu)} \xi^2, \qquad \xi \in E'. \tag{10.1.30} \]

10.2 0-1 laws, integrability and comparison lemmas

0-1 laws. The rotational invariance of Gaussian laws has an important consequence: a general 0-1 law (Fernique [1975: Theorem 1.2.1]) which can be stated as follows.

10.2.1 Proposition. Let $(E, \mathcal{E})$ be a measurable vector space. Let $(\Omega, \mathcal{B}, P)$ be a probability space. Consider a Gaussian vector $X : (\Omega, \mathcal{B}, P) \to (E, \mathcal{E})$. Then for any subspace $V$ of $E$, we have $P\{X \in V\} = 0$ or $1$.

Proof. It is rather immediate. Let $Y$ be an independent copy of $X$ and put
\[ B_\vartheta = \{X \cos\vartheta + Y \sin\vartheta \in V,\; X \sin\vartheta - Y \cos\vartheta \notin V\}. \]
Let $\vartheta_1 \ne \vartheta_2$ and assume that $X \cos\vartheta_1 + Y \sin\vartheta_1 \in V$ and $X \cos\vartheta_2 + Y \sin\vartheta_2 \in V$. The determinant of the $2 \times 2$ matrix
\[ \begin{pmatrix} \cos\vartheta_1 & \sin\vartheta_1 \\ \cos\vartheta_2 & \sin\vartheta_2 \end{pmatrix} \]
being non-zero, it follows that $X$ and $Y$ belong to $V$ as well, and so do $X \sin\vartheta_1 - Y \cos\vartheta_1$ and $X \sin\vartheta_2 - Y \cos\vartheta_2$. Thus the sets $B_\vartheta$ are disjoint. Since they have the same probability, this one must be 0. In other words it follows that $P(B_0) = P\{X \in V\}(1 - P\{X \in V\}) = 0$, as claimed.

Integrability. Let $N : (E, \mathcal{E}) \to (\mathbb{R}_+, \mathcal{B}(\mathbb{R}_+))$ be a measurable semi-norm on $E$. A plain but useful consequence of the 0-1 law is that
\[ P\{N(X) < \infty\} = 0 \text{ or } 1. \tag{10.2.1} \]

When N is the usual sup-norm, say N(X) = supn≥1 |Xn |, if X = {Xn , n ≥ 1}, this fact has been known for a long time, according to the discussion and related references (starting in 1951) given in the introduction of Landau and Shepp [1970]. When P{supn≥1 |Xn | < ∞} = 1, the possible exponential integrability of the supremum of X was conjectured by Varadhan in 1967, and proved by Landau and Shepp in the above quoted paper, and independently by Fernique [1970] for general seminorms. We shall indeed establish, as a direct consequence of the rotational invariance of Gaussian laws, that if P{N(X) < ∞} > 0, then N (X) is exponentially integrable.


10.2.2 Theorem. Let $(E, \mathcal{E})$ be a measurable vector space. Let $(\Omega, \mathcal{B}, P)$ be a probability space. Consider a Gaussian vector $X : (\Omega, \mathcal{B}, P) \to (E, \mathcal{E})$. Let $N : (E, \mathcal{E}) \to \mathbb{R}_+$ be a measurable semi-norm on $E$ and assume that $P\{N(X) < \infty\} > 0$. Then $E\, N(X) < \infty$ and in fact there exists an absolute constant $K$ such that
\[ E \exp\Big( \frac{N(X)^2}{K\, (E\, N(X))^2} \Big) \le 2. \tag{10.2.2} \]

The proof is elementary but has some degree of elegance. Proof. Let Y be an independent copy of X. Let 0 < u ≤ v be two reals. Then X−Y X+Y ≤ u, N >v P N(X) ≤ u P N (X) > v = P N √ √ 2 2 (10.2.3) v−u 2 ≤ P N(X) > √ , 2

√ √ √ where we used the fact that N X+Y ≤ N X−Y + 2 sup(N (X), N (Y )). Let τ > 0 2 2 be fixed. Choose s such that

δ := P N(X) ≤ s > 1/2, Put

δ log 1−δ

and

√

tn = ( 2 + 1) 2(n+1)/2 − 1 s,

≥ τ.

n = 0, 1, . . . .

Then tn+1 − s = tn and from (10.2.3) applied with u = s and v = tn+1 we get P N(X) ≤ s}P N (X) > tn+1 ≤ 2P2 N (X) > tn . Letting xn = P N (X) > tn /P N(X) ≤ s}, the latter inequality means xn+1 ≤ xn2 . n Iterating this inequality leads to xn+1 ≤ x02 ; and so for n = 0, 1, . . . , n 1−δ 2 n P N(X) > 2 · 2n/2 s ≤ P N (X) > tn+1 ≤ δ = δe−2 τ . δ n n Let c = τ/2. Then P exp sc2 N(X)2 > e2 c = P N (X) > 2n/2 s ≤ e−2 τ and thus ∞ ∞ n c n n 2 2n c 2 c 2n−1 c e −e P exp 2 N(X) > e e−2 τ e2 c < ∞. ≤ s n=0

n=0

This establishes that

E exp

τ N (X)2 2s 2

≤ C,


where C = C(τ ) depends on τ only. We fix τ , say τ = log 2 so that C is now an absolute constant. By Jensen’s inequality, we may find a real 0 < η < 1 small enough for the following inequality to be true:

E exp η

τ N (X)2 2s 2

≤ E exp

τ N (X)2 2s 2

η

≤ C η = eη(log C) ≤ 2.

(10.2.4)

This notably implies that E N(X) < ∞. Now observe that the reasoning we just made is valid for any Gaussian vector X with value in E and satisfying P{N 0. (X) < ∞} > = X/E N (X). But δ = P N (X ) ≤ 3 = This is in particular the case of X P N(X) ≤ 3E N(X) ≥ 2/3 and

log

δ 1−δ

≥ log 2 = τ,

so that s = 3 is suitable there. Application of (10.2.4) to X yields

E exp

N(X) KE N(X)

2

≤ 2,

(10.2.5)

where $K = (18/\eta\tau)^{1/2}$, as claimed.

Comparison lemmas. This is, after the rotational invariance of Gaussian laws, the second fundamental property of Gaussian processes (Fernique [1975]).

10.2.3 Lemma. Let $T$ be a finite set and consider two Gaussian (centered) processes $X = \{X_t,\, t \in T\}$ and $Y = \{Y_t,\, t \in T\}$. Assume that for any $s, t \in T$,
\[ d_Y(s, t) \le d_X(s, t), \tag{10.2.6} \]
where $d_X(s, t) = \|X_s - X_t\|_2$ and similarly for $d_Y$. Then for any convex increasing function $f : \mathbb{R} \to \mathbb{R}_+$,
\[ E f\Big( \sup_{T \times T} (Y_s - Y_t) \Big) \le E f\Big( \sup_{T \times T} (X_s - X_t) \Big). \tag{10.2.7} \]
In particular
\[ E \sup_{t \in T} Y_t \le E \sup_{t \in T} X_t. \tag{10.2.8} \]

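Proof. Let $n = \#(T)$. It suffices to prove the lemma when $f$ is a smooth, twice differentiable convex function, since any convex increasing function is the upper envelope of such functions. Let $\Gamma_X$, $\Gamma_Y$ denote the covariance matrices of $X$, $Y$ respectively.

(1) Assume first that $\Gamma_X$, $\Gamma_Y$ are invertible and write $\Gamma(\lambda) = \lambda \Gamma_X + (1-\lambda) \Gamma_Y$, for $\lambda \in [0, 1]$. Then $\Gamma(\lambda)$ is invertible. Put for $x = (x_t)_{t \in T} \in \mathbb{R}^T$,
\[ g_\lambda(x) = \frac{1}{(2\pi)^{n/2}\sqrt{\det \Gamma(\lambda)}}\, e^{-\frac12\, {}^t x\, \Gamma(\lambda)^{-1} x}, \qquad H(\lambda) = \int_{\mathbb{R}^T} f\Big( \sup_{s,t \in T} (x_s - x_t) \Big)\, g_\lambda(x)\, dx. \]
Arguing as along the lines (10.1.20) to (10.1.22), we find that $H(\lambda)$ has a derivative and
\[ H'(\lambda) = \int_{\mathbb{R}^T} f\Big( \sup_{s,t \in T} (x_s - x_t) \Big)\, \frac{\partial}{\partial\lambda} \big(g_\lambda(x)\big)\, dx, \qquad \frac{\partial}{\partial\lambda} \big(g_\lambda(x)\big) = \frac12 \operatorname{tr}\Big( \frac{\partial \Gamma(\lambda)}{\partial\lambda} \cdot \frac{\partial^2}{\partial x^2}\, g_\lambda(x) \Big). \]
Developing further the expression of $H'(\lambda)$ leads to
\[ H'(\lambda) = \sum_{s \in T} \sum_{\substack{t \in T \\ t \ne s}} \frac{\partial}{\partial\lambda} \big[ \Gamma(\lambda)(s, s) - 2\Gamma(\lambda)(s, t) + \Gamma(\lambda)(t, t) \big]\, J(s, t) = \sum_{s \in T} \sum_{\substack{t \in T \\ t \ne s}} \big[ d_X^2(s, t) - d_Y^2(s, t) \big]\, J(s, t), \]
where the $J(s, t)$ are positive integrals. It follows that $H'(\lambda) \ge 0$, and so $H(1) \ge H(0)$, which establishes (10.2.7).

(2) If $\Gamma_X$ or $\Gamma_Y$ is not invertible, we proceed as in the second part of the proof of Lemma 10.1.8.

An immediate consequence of this lemma is the well-known Sudakov minoration: there exists a universal constant $K$ such that
\[ E \sup_{t \in T} X(t) \ge K \inf_{\substack{s,t \in T \\ s \ne t}} d_X(s, t)\, \sqrt{\log \#(T)}. \tag{10.2.9} \]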

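Proof. It suffices to prove (10.2.9) when $T$ is finite, say $T = \{1, \ldots, N\}$. Let $\lambda_j$, $1 \le j \le N$, be independent $N(0, 1)$ distributed random variables and write $\rho = \inf_{1 \le i \ne j \le N} \|X_i - X_j\|_2$. Put
\[ Y_j = \frac{\rho}{\sqrt2}\, \lambda_j, \qquad 1 \le j \le N. \]
By construction $\|X_i - X_j\|_2 \ge \|Y_i - Y_j\|_2$ for all $i$ and $j$, so that condition (10.2.6) is satisfied. And by Lemma 10.2.3,
\[ E \sup_{1 \le j \le N} X_j \ge \frac{\rho}{\sqrt2}\, E \sup_{1 \le j \le N} \lambda_j. \]
Using the symmetry of the Gaussian laws, we have that
\[ E \sup_{1 \le i,j \le N} |\lambda_i - \lambda_j| = E \sup_{1 \le j \le N} \lambda_j + E \sup_{1 \le j \le N} (-\lambda_j) = 2 E \sup_{1 \le j \le N} \lambda_j. \]
Now
\[ E \sup_{1 \le j \le N} \lambda_j = \frac12\, E \sup_{1 \le i,j \le N} |\lambda_i - \lambda_j| \ge \frac12 \Big( E \sup_{1 \le j \le N} |\lambda_j| - E |\lambda_1| \Big) = \frac12 \Big( E \sup_{1 \le j \le N} |\lambda_j| - \Big(\frac{2}{\pi}\Big)^{1/2} \Big). \]
But for any $T > 0$,
\[ E \sup_{1 \le j \le N} |\lambda_j| \ge T\, P\Big\{ \sup_{1 \le j \le N} |\lambda_j| > T \Big\} = T \big( 1 - P\{|\lambda_1| < T\}^N \big) \ge T \big( 1 - e^{N \log(1 - P\{|\lambda_1| > T\})} \big) \ge T \big( 1 - e^{-N P\{|\lambda_1| > T\}} \big). \]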

√ Choosing T = 2 log N implies NP{|λ1 | > T } ≤ 1; and so E supN j =1 |λi | ≥ √ C log N . The result follows easily. Error term in Slepian’s comparison lemma. It is also possible to bound the difference between the terms in (10.2.8). Put 2 (s, t) − dY2 (s, t)|. γ 2 = sup |dX s,t∈T

Then there exists a universal constant $C$ such that
\[ \Big| E \sup_{t \in T} X_t - E \sup_{t \in T} Y_t \Big| \le C \gamma \sqrt{\log \#(T)}. \tag{10.2.10} \]
This follows from a simple application of the previous lemma. Let $N = \{N_t,\, t \in T\}$ where the components $N_t$ are independent and $N(0, 1)$ distributed, and assume that $N$, $X$ and $Y$ are mutually independent. Put
\[ Z = \frac{\gamma}{\sqrt2}\, N + Y. \]
Then for $s \ne t$,
\[ d_Z^2(s, t) := \gamma^2 + d_Y^2(s, t) \ge \sup_{u,v \in T} |d_X^2(u, v) - d_Y^2(u, v)| + d_Y^2(s, t) \ge d_X^2(s, t). \]
By Lemma 10.2.3,
\[ E \sup_{t \in T} X_t \le E \sup_{t \in T} Z_t = E \sup_{t \in T} \Big( \frac{\gamma}{\sqrt2}\, N_t + Y_t \Big) \le \frac{\gamma}{\sqrt2}\, E \sup_{t \in T} N_t + E \sup_{t \in T} Y_t. \]
Considering now $Z' = \frac{\gamma}{\sqrt2}\, N + X$, we obtain similarly
\[ E \sup_{t \in T} Y_t \le \frac{\gamma}{\sqrt2}\, E \sup_{t \in T} N_t + E \sup_{t \in T} X_t. \]
Therefore
\[ \Big| E \sup_{t \in T} X_t - E \sup_{t \in T} Y_t \Big| \le \frac{\gamma}{\sqrt2}\, E \sup_{t \in T} N_t \le C \gamma\, (\log \#(T))^{1/2}, \]
as claimed. This inequality was observed by Chatterjee [2005], who proved it by different arguments and also considered the non-centered case.
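Both Sudakov's minoration and the error bound above rest on the fact that the expected maximum of $n$ independent $N(0,1)$ variables is of order $\sqrt{\log n}$. The sketch below is only a Monte Carlo illustration of that order of magnitude, comparing the empirical mean of the maximum with the classical upper bound $\sqrt{2 \log n}$; the function names, trial count and seed are our own choices and no particular value of the constants $K$ or $C$ is claimed.

```python
import math
import random


def expected_max_estimate(n, trials=2000, seed=0):
    """Monte Carlo estimate of E max(g_1, ..., g_n) for i.i.d. N(0,1) g_i."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.gauss(0.0, 1.0) for _ in range(n))
    return total / trials


def sqrt_two_log(n):
    """Classical upper bound sqrt(2 log n) on E max of n standard Gaussians."""
    return math.sqrt(2.0 * math.log(n))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        est = expected_max_estimate(n)
        print(n, est, sqrt_two_log(n), est / sqrt_two_log(n))
```

The printed ratio stays below 1 and increases slowly with $n$, in line with a two-sided estimate of order $\sqrt{\log n}$.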


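Talagrand’s strengthening. A fundamental observation made by Talagrand is that the Sudakov minoration we just considered is only a piece of a stronger minoration inequality, which actually leads, when combined with a chaining argument, to the proof of the majorizing measure conjecture. Let $\{X_t,\, t \in T\}$ be a Gaussian process and denote $d(s, t) = \|X_s - X_t\|_2$. Consider points $\{t_\ell,\, 1 \le \ell \le m\}$ of $T$ such that $d(t_\ell, t_k) \ge a$ if $\ell \ne k$. Let $\sigma > 0$ and attach to each $\ell$, $1 \le \ell \le m$, a finite set $H_\ell \subset B_d(t_\ell, \sigma)$. Let
\[ H = \bigcup_{\ell=1}^m H_\ell. \]
Then we have
\[ E \sup_{t \in H} X_t \ge \frac{a}{C_1} \sqrt{\log m} - C_2\, \sigma \sqrt{\log m} + \min_{1 \le \ell \le m} E \sup_{t \in H_\ell} X_t. \tag{10.2.11} \]
In particular, if $\sigma \le a/(2 C_1 C_2)$,
\[ E \sup_{t \in H} X_t \ge \frac{a}{2 C_1} \sqrt{\log m} + \min_{1 \le \ell \le m} E \sup_{t \in H_\ell} X_t. \tag{10.2.12} \]
The proof we shall give is taken from Talagrand [2005] (see p. 34), to which we refer the reader for more about Gaussian processes, Rademacher processes and majorizing measures. There is no loss to assume $m \ge 2$. Consider the random variables
\[ Y_\ell = \sup_{t \in H_\ell} X_t - X_{t_\ell} = \sup_{t \in H_\ell} (X_t - X_{t_\ell}), \quad 1 \le \ell \le m, \qquad V = \max_{1 \le \ell \le m} |Y_\ell - E\, Y_\ell|. \]
By the concentration inequality (10.4.5),
\[ P\{|Y_\ell - E\, Y_\ell| \ge u\} \le 2 e^{-u^2/2\sigma^2}. \]
Thus $P\{V \ge u\} \le 2m\, e^{-u^2/2\sigma^2}$, and so using inequality (10.1.3) and the above,
\[ E\, V = \int_0^{\infty} P\{V \ge u\}\, du \le \int_0^{\infty} \min\big(1,\, 2m\, e^{-u^2/2\sigma^2}\big)\, du \le \int_0^{\sigma\sqrt{2 \log 2m}} du + 2m \int_{\sigma\sqrt{2 \log 2m}}^{\infty} e^{-u^2/2\sigma^2}\, du \]
\[ = \sigma \sqrt{2 \log 2m} + 2m\sigma \int_{\sqrt{2 \log 2m}}^{\infty} e^{-v^2/2}\, dv \le \sigma \sqrt{2 \log 2m} + \sigma \Big(\frac{\pi}{2}\Big)^{1/2} \le C_2\, \sigma \sqrt{\log m}. \]
But for each $\ell$, $V \ge E\, Y_\ell - Y_\ell$, and so $Y_\ell \ge \min_{1 \le \ell \le m} E\, Y_\ell - V$, which implies
\[ \sup_{t \in H_\ell} X_t = Y_\ell + X_{t_\ell} \ge X_{t_\ell} + \min_{1 \le \ell \le m} E\, Y_\ell - V. \]
Hence
\[ \sup_{t \in H} X_t \ge \max_{1 \le \ell \le m} X_{t_\ell} + \min_{1 \le \ell \le m} E\, Y_\ell - V. \]
Passing to expectation gives
\[ E \sup_{t \in H} X_t \ge E \max_{1 \le \ell \le m} X_{t_\ell} + \min_{1 \le \ell \le m} E\, Y_\ell - C_2\, \sigma \sqrt{\log m}. \]
Since $E\, Y_\ell = E \sup_{t \in H_\ell} X_t$, to conclude it remains to apply Sudakov’s minoration to the first term of the right-hand side.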
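The bound on $E\, V$ in the proof above can be checked in closed form. The sketch below is only an illustration: it evaluates $\int_0^\infty \min(1, 2m\, e^{-u^2/2\sigma^2})\, du$ exactly through the complementary error function and compares it with $\sigma\sqrt{2\log 2m} + \sigma(\pi/2)^{1/2}$. The function names are our own, and the identity $\int_a^\infty e^{-u^2/2\sigma^2}\, du = \sigma\sqrt{\pi/2}\, \operatorname{erfc}(a/(\sigma\sqrt2))$ is the only external ingredient.

```python
import math


def truncated_tail_integral(m, sigma):
    """Exact value of the integral over u >= 0 of min(1, 2m*exp(-u^2/(2*sigma^2))).

    The integrand equals 1 up to u0 = sigma*sqrt(2*log(2m)) and equals the
    Gaussian term beyond u0, whose integral is expressed with erfc.
    """
    u0 = sigma * math.sqrt(2.0 * math.log(2.0 * m))
    tail = (2.0 * m * sigma * math.sqrt(math.pi / 2.0)
            * math.erfc(u0 / (sigma * math.sqrt(2.0))))
    return u0 + tail


def closed_form_bound(m, sigma):
    """Upper bound sigma*sqrt(2*log(2m)) + sigma*sqrt(pi/2) used in the proof."""
    return sigma * math.sqrt(2.0 * math.log(2.0 * m)) + sigma * math.sqrt(math.pi / 2.0)


if __name__ == "__main__":
    for m in (2, 10, 1000):
        print(m, truncated_tail_integral(m, 1.0), closed_form_bound(m, 1.0))
```

For every $m \ge 1$ the first value is at most the second, because $\operatorname{erfc}(x) \le e^{-x^2}$ for $x \ge 0$; both grow like $\sigma\sqrt{\log m}$.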

10.3 Regularity and irregularity of Gaussian processes

Let $X = \{X_t,\, t \in T\}$ be a Gaussian process indexed on $T$ and with basic probability space $(\Omega, \mathcal{B}, P)$. Let $d_X(s, t) = \|X_s - X_t\|_2$ be the natural pseudo-metric induced by $X$ on $T$. The following useful fact is easy to verify: in order that $X$ has a $d_X$-separable version or modification, it is necessary and sufficient that $(T, d_X)$ be separable.

Two fundamental properties are relevant in this section: the almost sure boundedness and almost sure continuity of sample paths. Let $X = \{X_t,\, t \in T\}$ be a Gaussian process indexed on an arbitrary parameter set $T$. We endow $T$ with the pseudo-metric $d_X(s, t)$ and assume that $(T, d_X)$ is separable, so that $X$ possesses a ($d_X$-separable) version which we shall denote again by $X$. In this case, there is no ambiguity to say:

$X$ is sample bounded if $P\{\omega : \sup_{t \in T} |X_t(\omega)| < \infty\} = 1$;

$X$ is sample $d_X$-continuous if $P\{\omega : t \mapsto X_t(\omega) \text{ is } d_X\text{-continuous}\} = 1$.

These properties lead to a fine notion of compactness in a Hilbert space. Let $(H, \|\cdot\|)$ be a Hilbert space and let $Z$ be the canonical Gaussian process on $H$.

10.3.1 Definition. We say that $A$ is a GB (for Gaussian bounded) subset of $H$ if the restriction of $Z$ on $A$ possesses a version which is sample bounded. We also say that $A$ is a GC (for Gaussian continuous) subset of $H$ if the restriction of $Z$ on $A$ possesses a version which is sample $\|\cdot\|$-continuous.

The 0-1 laws and integrability properties of Gaussian vectors (previous section) show that $X$ is sample bounded if and only if
\[ E \sup_{t \in T} |X(t)| < \infty. \tag{10.3.1} \]
As, for every $t_0 \in T$,
\[ \sup_{t \in T} X(t) \le \sup_{t \in T} |X(t)| \le \sup_{(s,t) \in T \times T} \big( X(t) - X(s) \big) + |X(t_0)| \]
and
\[ E \sup_{(s,t) \in T \times T} \big( X(t) - X(s) \big) = 2 E \sup_{t \in T} X(t), \]
we have
\[ E \sup_{t \in T} X(t) \le E \sup_{t \in T} |X(t)| \le 2 E \sup_{t \in T} X(t) + \inf_{t_0 \in T} E |X(t_0)|. \tag{10.3.2} \]
It follows that $X$ is also sample bounded if and only if $E \sup_{t \in T} X(t) < \infty$.

As for the sample path continuity, first examine the oscillation properties of Gaussian processes established by Ito and Nisio [1968] and Belyaev [1961]. Let (T , δ) be a separable metric space. Let X = {Xt , t ∈ T } be a Gaussian process on T . We assume that X is dX -separable. We also assume that the identity mapping i : (T , δ) → (T , dX ) is uniformly continuous. Then under these conditions the δ-oscillation of X, WX(ω) (t) = lim lim

u→0 ε→0

sup

δ(s,t) 0. (10.3.24) T

T

This condition means that the parameter space has finite energy integral with respect to the kernel $K(y) = e^{b^2/y^2}$. This implies that $T$ is sufficiently large so that the sample paths have “enough time” to visit every set of positive measure. This approach has also been extended to non-Gaussian processes in a little-known paper by Berman [1984], and certainly deserves further investigation.

10.4 Gaussian suprema

The isoperimetric inequality. The fundamental result is a Brunn–Minkowski type isoperimetric inequality in Gauss spaces (Section 10.1) discovered independently by Borell [1975] and Sudakov–Tsyrelson [1974]. Let $E$ be a locally convex Hausdorff space. Let $\mu \in G(E)$. We set $\mu_*(A) = \sup\{\mu(K) : K \text{ compact},\ K \subseteq A\}$, whenever $A \subseteq E$. Recall that we have set $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt$ in (10.1.2), and that $O(\mu)$ denotes the unit ball of the RKHS of $\mu$, see (10.1.29).

10.4.1 Theorem. Suppose that $A$ is a $\mu$-measurable subset of $E$. Choose $a \in \mathbb{R}$ so that $\mu(A) = \Phi(a)$. Then, for all $t > 0$,
\[ \mu_*\big(A + t\, O(\mu)\big) \ge \Phi(a + t). \]


Equality occurs if $A$ is a half space. In particular, if $A + \mathcal{H}(\mu) = A$, then $\mu(A) = 0$ or $1$.

The proof in Borell [1975] is based on the Brunn–Minkowski inequality for spherical space. It is worth mentioning that in an earlier paper, Landau and Shepp [1970] already used this inequality to prove that if $X$ is a centered Gaussian vector in $\mathbb{R}^n$, $C$ a convex set and $s$ a real such that $P\{X \in C\} \ge \Phi(s)$, then if $s > 0$, for any $a > 1$, $P\{X \in aC\} \ge \Phi(as)$. We point out another useful inequality valid for all $\mu$-measurable subsets $A$ and $B$ of $E$, and every $0 < \lambda < 1$:
\[ \mu_*\big(\lambda A + (1-\lambda) B\big) \ge \mu^{\lambda}(A)\, \mu^{1-\lambda}(B). \tag{10.4.1} \]
And so if $\mu \in G_0(E)$ and $A$ is a convex Borel measurable subset of $E$, symmetric about the origin, then
\[ \mu(A) \ge \mu(A + x), \qquad x \in E. \tag{10.4.2} \]
This is a fundamental inequality in Gauss spaces. A remarkable property enjoyed by $\mu$-measurable subsets $A$ with positive measure states as follows: if $\mu(A) > 0$, then there exists a positive number $\delta$ such that
\[ \delta\, O(\mu) \subseteq A - A. \tag{10.4.3} \]
We refer for these results to Borell [1975] and also to Section 2.3 in Ledoux and Talagrand [1991].

Let us give some important consequences of the isoperimetric inequality. Let $(B, \|\cdot\|)$ be a Banach space such that for some countable subset $D$ of the unit ball of $B'$, $\|x\| = \sup_{f \in D} |f(x)|$. If $X$ is a random variable in $B$, the study of the distribution of $\|X\|$ thus amounts to estimating the supremum of countably many random variables $\{f(X),\, f \in D\}$. Consider now $X$ Gaussian in $B$; by this we mean that $\{f(X),\, f \in D\}$ is a Gaussian process, or equivalently that every finite linear combination $\sum_i \alpha_i f_i(X)$, $\alpha_i \in \mathbb{R}$, $f_i \in D$, is Gaussian. The behavior of $P\{\|X\| > t\}$ is determined by two parameters: the median $M = M(X)$, that is a number satisfying both
\[ P\{\|X\| \le M\} \ge \frac12 \quad \text{and} \quad P\{\|X\| \ge M\} \ge \frac12, \]
and
\[ \sigma = \sigma(X) = \sup_{f \in D} \big( E f^2(X) \big)^{1/2}. \]
Set $D = \{f_n,\, n \ge 1\}$. Let $\gamma$ be the canonical Gaussian distribution on $\mathbb{R}^{\mathbb{N}}$. By applying the Gram–Schmidt orthonormalization procedure to the sequence $\{f_n(X),\, n \ge 1\}$, we can write
\[ f_n(X) = \sum_{j=1}^n a_j^n g_j, \qquad n \ge 1. \]


The meaning of these equalities is that if $x = \{x_j, j\ge 1\} \in \mathbb{R}^{\mathbb{N}}$, the sequence $\{f_n(X), n\ge1\}$ has the same distribution as the sequence $\{\sum_{j=1}^n a_j^n x_j, n \ge 1\}$ under $\gamma$. Consequently, the study of the distribution of $\|X\|$ amounts to that of $\|x\| = \sup_{n\ge1}|f_n(x)|$ under $\gamma$. Note also that
\[ \sigma = \sup_{h \in O(\gamma)} \|h\|, \qquad\text{where}\quad O(\gamma) = \Big\{h : \sum_{j\ge1}|h_j|^2 \le 1\Big\}. \]

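The next result is a very important consequence of inequality (10.4.1).

10.4.2 Theorem. If $X$ is a Gaussian random variable with values in a Banach space $(B, \|\cdot\|)$, with median $M$ and supremum of weak variances $\sigma$, then for every $t > 0$,
\[ \mathbf{P}\big\{\big|\|X\| - M\big| > t\big\} \le 2\Psi(t/\sigma) \le e^{-t^2/2\sigma^2}, \]
where $\Psi(u) = \mathbf{P}\{N(0,1) > u\} = 1 - \Phi(u)$ is the standard Gaussian tail.

Proof. Indeed, let $A = \{x \in \mathbb{R}^{\mathbb{N}} : \|x\| \le M\}$. Then $A_t$ is the Hilbertian neighborhood of order $t$ of $A$ and by Theorem 10.4.1, $\gamma_*(A_t) \ge \Phi(t)$. Further, if $x \in A_t$, $x = a + th$, $a \in A$, $h \in O(\gamma)$, then
\[ \|x\| \le M + t\|h\| \le M + t\sigma. \]
Thus $A_t \subset \{x \in \mathbb{R}^{\mathbb{N}} : \|x\| \le M + t\sigma\}$ and so $\gamma\{x \in \mathbb{R}^{\mathbb{N}} : \|x\| \le M + t\sigma\} \ge \Phi(t)$. Operating similarly with $A = \{x \in \mathbb{R}^{\mathbb{N}} : \|x\| \ge M\}$ shows that $\gamma\{x \in \mathbb{R}^{\mathbb{N}} : \|x\| \ge M - t\sigma\} \ge \Phi(t)$.

Theorem 10.4.1 also allows us to estimate suprema of finitely many Gaussian vectors: there exists a universal constant $C$ such that if $G_1, \dots, G_N$ are Gaussian random vectors with values in $(B, \|\cdot\|)$, then
\[ \mathbf{E} \sup_{1\le k\le N} \|G_k\| \le C\Big\{ \sup_{1\le k\le N} \mathbf{E}\|G_k\| + \mathbf{E} \sup_{1\le k\le N} \sigma_k |g_k| \Big\}, \tag{10.4.4} \]
where $\sigma_k = \sup_{f\in B',\, \|f\|\le 1} \big(\mathbf{E}\langle f, G_k\rangle^2\big)^{1/2}$, $k = 1, \dots, N$, and $\{g_k, 1 \le k \le N\}$ is a sequence of independent $N(0,1)$ distributed random variables.

Now we specify Theorems 10.4.1, 10.4.2 for suprema of Gaussian processes. If $X = \{X_t, t \in T\}$, $T$ finite, is a centered Gaussian process and $\sigma = \sup_{t\in T}(\mathbf{E} X_t^2)^{1/2}$, it follows that for $u \ge 0$, we have
\[ \mathbf{P}\Big\{\Big|\sup_{t\in T} X_t - \mathbf{E}\sup_{t\in T} X_t\Big| \ge u\Big\} \le 2e^{-u^2/2\sigma^2}. \tag{10.4.5} \]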
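As a sanity check of Theorem 10.4.2 in the simplest possible setting, the following Python sketch takes $B = \mathbb{R}$ and $X = \sigma g$ with $g$ standard normal. This scalar example is an illustrative assumption, not one taken from the text; here $\|X\| = \sigma|g|$ and the median is $M = \sigma\,\Phi^{-1}(3/4)$. The helper names `gauss_tail` and `deviation_probability` are hypothetical. The code evaluates the exact deviation probability so that it can be compared with the two bounds of the theorem.

```python
from math import erfc, exp, sqrt
from statistics import NormalDist


def gauss_tail(u):
    """Standard Gaussian tail Psi(u) = P{N(0,1) > u}."""
    return 0.5 * erfc(u / sqrt(2.0))


def deviation_probability(t, sigma=1.0):
    """Exact P{ | |X| - M | > t } for X = sigma * g, g ~ N(0,1), M = median of |X|."""
    m = NormalDist().inv_cdf(0.75)   # median of |g|, since P{|g| <= m} = 1/2
    s = t / sigma                    # rescale to the standard normal case
    upper = 2.0 * gauss_tail(m + s)  # P{|g| > m + s}
    # P{|g| < m - s}; this event is empty as soon as s >= m
    lower = 1.0 - 2.0 * gauss_tail(m - s) if s < m else 0.0
    return upper + lower


if __name__ == "__main__":
    sigma = 2.0
    for t in (0.5, 1.0, 2.0, 4.0):
        exact = deviation_probability(t, sigma)
        bound1 = 2.0 * gauss_tail(t / sigma)
        bound2 = exp(-t * t / (2.0 * sigma * sigma))
        print(f"t={t}: exact={exact:.5f}  2*Psi(t/sigma)={bound1:.5f}  exp bound={bound2:.5f}")
```

In this one-dimensional case the first inequality is strict for $t > 0$, which illustrates that the theorem trades sharpness for complete generality in the Banach space.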

If $X = \{X_t, t \in T\}$ is a Gaussian process, the most general result on the tail distribution of $\sup_{t\in T} X(t)$ is derived from Theorem 10.4.1. Assume there exists $w$ such that
\[ \mathbf{P}\Big\{\sup_{t\in T} X(t) > w\Big\} \le \frac12. \]


10 Gaussian processes

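Then, for all $u \ge w$,
\[ \mathbf{P}\Big\{\sup_{t\in T} X(t) > u\Big\} \le \Psi\Big(\frac{u-w}{\sigma(X)}\Big), \tag{10.4.6} \]
where $\sigma(X) = \sup_{t\in T}(\mathbf{E} X(t)^2)^{1/2}$, and for any real $u$,
\[ \mathbf{P}\Big\{\sup_{t\in T} X(t) > u\Big\} \le 2\Psi\Big(\frac{u-w}{\sigma(X)}\Big). \]
Let us list some more or less classical estimates.

Some typical results. Assume that $\sigma(X) = 1$. Then
\[ \mathbf{P}\Big\{\sup_{t\in T} X(t) > u\Big\} \le C(w)\,e^{wu}\,\Psi(u), \]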

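where the constant $C(w)$ depends only on $w$. This bound cannot be improved. However, it is too crude for many important cases. Consider several examples. Let $Y = \{Y(t), t \in \mathbb{R}\}$ be a stationary Gaussian process satisfying $\mathbf{E} Y_t^2 \equiv 1$ and having continuous sample paths. Then, for every $\varepsilon > 0$,
\[ \mathbf{E} \exp\Big\{\frac12\Big(\sup_{0\le t\le 1}|Y_t| - \varepsilon\Big)^2\Big\} < \infty, \]
which implies
\[ \forall u \ge \varepsilon, \qquad \mathbf{P}\Big\{\sup_{0\le t\le 1} Y_t > u\Big\} \le C(\varepsilon)\,e^{\varepsilon u}\,\Psi(u). \]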

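Better formulations of this result are established in Talagrand [1984]. Let further $\{B(t), 0 \le t < \infty\}$ be a Brownian motion. It is well known that
\[ \forall \lambda \ge 0, \qquad \mathbf{P}\Big\{\sup_{0\le t\le 1} B(t) > \lambda\Big\} = 2\mathbf{P}\{B(1) > \lambda\} = 2\Psi(\lambda). \]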
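The identity for the Brownian supremum rests on the reflection principle. A discrete analogue, which is a swapped-in illustration and not the argument of the text, holds for the simple symmetric random walk $S_n$: for every integer $m \ge 1$, $\mathbf{P}\{\max_{k\le n} S_k \ge m\} = 2\mathbf{P}\{S_n > m\} + \mathbf{P}\{S_n = m\}$. The Python sketch below checks this by exact path counting; the function names are hypothetical.

```python
from math import comb


def paths_reaching(n, m):
    """Number of +-1 paths of length n whose running maximum reaches level m >= 1."""
    alive = {0: 1}  # position -> number of paths that have not yet reached m
    hit = 0
    for step in range(n):
        nxt = {}
        for pos, cnt in alive.items():
            for move in (-1, 1):
                new = pos + move
                if new >= m:
                    # every continuation of a path that has reached m counts
                    hit += cnt * 2 ** (n - step - 1)
                else:
                    nxt[new] = nxt.get(new, 0) + cnt
        alive = nxt
    return hit


def reflection_count(n, m):
    """#{S_n >= m} + #{S_n > m}, the count predicted by the reflection principle."""
    def end_count(s):
        # number of paths of length n ending at s
        if (n + s) % 2 or abs(s) > n:
            return 0
        return comb(n, (n + s) // 2)

    at_least = sum(end_count(s) for s in range(m, n + 1))
    strictly = sum(end_count(s) for s in range(m + 1, n + 1))
    return at_least + strictly


if __name__ == "__main__":
    n, m = 12, 3
    print(paths_reaching(n, m), reflection_count(n, m), 2 ** n)
```

Dividing both counts by $2^n$ gives the two probabilities; the extra term $\mathbf{P}\{S_n = m\}$ disappears in the Brownian limit, which leaves the factor 2.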

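Let $\{Y(t), t \in \mathbb{R}\}$ be a Gaussian process satisfying, for some $0 < \alpha < 1$,
\[ \big(\mathbf{E}|Y_s - Y_t|^2\big)^{1/2} \asymp |s-t|^{\alpha} \qquad\text{as } |t-s| \to 0. \]
For these processes, the following asymptotic estimate is established in Pickands [1969]:
\[ \mathbf{P}\Big\{\sup_{0\le t\le 1} Y_t > \lambda\Big\} \asymp \lambda^{1/\alpha}\,\Psi(\lambda), \qquad \lambda \to \infty. \]
Talagrand characterized the class of Gaussian processes $X = \{X(t), t \in T\}$ satisfying
\[ \lim_{u\to\infty} \frac{\mathbf{P}\{\sup_{t\in T} X(t) > u\}}{\Psi(u)} = 1. \tag{10.4.7} \]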


More precisely, let $T$ be a compact metric space on which a real separable centered Gaussian process $X$ with continuous covariance is indexed. Assume that $(T, d_X)$ is separable and that $\{X(t), t \in T\}$ has almost surely bounded sample paths. Then (10.4.7) is equivalent to the condition: there exists a unique $\tau \in T$ such that
\[ \sup_{t\in T} \mathbf{E} X^2(t) = \mathbf{E} X^2(\tau) = 1, \tag{10.4.8} \]
and
\[ \mathbf{E} \sup_{a(t) \ge 1-h^2} \big(X(t) - a(t)X(\tau)\big) = o(h) \qquad\text{as } h \to 0, \tag{10.4.9} \]
where $a(t) = \mathbf{E} X(t)X(\tau)$. In [Dobrić–Marcus–Weber: 1988] the following application is given. Let $2 < p < \infty$ and consider an infinite sequence $1 = \sigma_1 > \sigma_2 \ge \sigma_3 \ge \cdots$ of positive reals satisfying $\sum_{k=1}^{\infty} \sigma_k^p < \infty$. Let $\{g_k, k \ge 1\}$ be a sequence of independent normal random variables, with $g_k \stackrel{\mathcal{D}}{=} N(0, \sigma_k)$, so that
\[ \sum_{k=1}^{\infty} |g_k|^p < \infty \qquad\text{a.s.} \]
Then
\[ \lim_{u\to\infty} \frac{\mathbf{P}\big\{\big(\sum_{k=1}^{\infty} |g_k|^p\big)^{1/p} > u\big\}}{\Psi(u)} = 2. \tag{10.4.10a} \]
If further $1 = \sigma_1 = \cdots = \sigma_n > \sigma_{n+1} \ge \sigma_{n+2} \ge \cdots$, then
\[ \lim_{u\to\infty} \frac{\mathbf{P}\big\{\big(\sum_{k=1}^{\infty} |g_k|^p\big)^{1/p} > u\big\}}{\Psi(u)} = 2n. \]

The relationship between L(h) = E

sup

(X(t) − a(t)X(τ ))

a(t)≥1−h2

and the existence of a function (u) such that P supt∈T X(t) > u ≤1 lim u→∞ (u)"(u) has been further investigated.

(≥ 1)

(10.4.10b)


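Independent case. Let $\{\zeta_k, k \ge 1\}$ be a sequence of independent standard $N(0,1)$ random variables. Let $\sigma_k > 0$ and $\sigma = \sup_k \sigma_k$. Observe first that
\[ \mathbf{E} \sup_{k\ge1} \sigma_k|\zeta_k| < \infty \iff \sum_{k\ge1} e^{-\delta/\sigma_k^2} < \infty,\ \forall \delta > 0 \tag{10.4.11} \]
\[ \iff \lim_{\varepsilon\to0} \varepsilon^2 \log \#\{k : \sigma_k \ge \varepsilon\} = 0. \]
The first equivalence follows from the Borel–Cantelli lemma and integrability properties of Gaussian semi-norms. We now indicate how the second one is obtained. We may assume $\sigma = 1$. Put $M(\delta) = \sum_{k\ge1} e^{-\delta/\sigma_k^2}$. If $\lim_{\varepsilon\to0} \varepsilon^2 \log \#\{k : \sigma_k \ge \varepsilon\} = 0$, then given any positive real $\delta$, there exists a positive integer $k_\delta$ such that
\[ 0 < \varepsilon \le 2^{-k_\delta} \implies \#\{k : \sigma_k \ge \varepsilon\} \le e^{\delta/(8\varepsilon^2)}. \]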

Therefore

e

−δ/σk2

=

k :σk ≤2−kδ

≤

∞

e

k=kδ 2−k−1 0, we have

\[ \mathbf{E} \sup_{k\ge1} \sigma_k|\zeta_k| \le \Big(2\log^+ \sum_{k=1}^{\infty} \exp\{-m\sigma_k^{-2}\}\Big)^{1/2} \sigma + 3m^{1/2} + 2\sigma. \tag{10.4.12} \]
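The bound (10.4.12) can be compared with the exact expectation on a small example. The Python sketch below is only an illustration under stated assumptions: it takes the finite sequence $\sigma_k = 1/k$, $1 \le k \le 50$ (an arbitrary choice), reads $m$ as an arbitrary positive parameter, and uses the identity $\mathbf{E}\sup_k \sigma_k|\zeta_k| = \int_0^\infty \big(1 - \prod_k \operatorname{erf}(x/(\sigma_k\sqrt2))\big)\,dx$, valid for independent $\zeta_k$. The function names and the Simpson quadrature parameters are hypothetical.

```python
from math import erf, exp, log, sqrt


def expected_sup(sigmas, x_max=12.0, n=6000):
    """E sup_k sigma_k |zeta_k| for independent N(0,1) zeta_k, by Simpson's rule.

    Uses P{sup_k sigma_k |zeta_k| <= x} = prod_k erf(x / (sigma_k sqrt 2)).
    n must be even; x_max should be large compared with max(sigmas).
    """
    def survival(x):
        prod = 1.0
        for s in sigmas:
            prod *= erf(x / (s * sqrt(2.0)))
        return 1.0 - prod

    h = x_max / n
    total = survival(0.0) + survival(x_max)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * survival(i * h)
    return total * h / 3.0


def upper_bound(sigmas, m):
    """Right-hand side of (10.4.12) for a finite sequence and a parameter m > 0."""
    sigma = max(sigmas)
    series = sum(exp(-m / s ** 2) for s in sigmas)
    log_plus = max(log(series), 0.0) if series > 0.0 else 0.0
    return sqrt(2.0 * log_plus) * sigma + 3.0 * sqrt(m) + 2.0 * sigma


if __name__ == "__main__":
    sigmas = [1.0 / k for k in range(1, 51)]
    print("E sup =", expected_sup(sigmas))
    for m in (0.05, 0.5, 2.0):
        print("m =", m, "bound =", upper_bound(sigmas, m))
```

On such a rapidly decreasing sequence the bound is far from tight, since the additive term $2\sigma$ alone already exceeds the expectation; the estimate is designed for sequences with many $\sigma_k$ of comparable size.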

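Indeed, let $J_n = \{k : 2^{-n-1}\sigma < \sigma_k \le 2^{-n}\sigma\}$ and $N_n = \#(J_n)$. Set $L_n = 2^{-n}\sigma(2\log N_n)^{1/2}$ and $S_n = \sup_{k\in J_n} \sigma_k|\zeta_k|$. Then,
\[ \mathbf{E} S_n \mathbf{1}_{\{S_n > L_n\}} \le \sum_{k\in J_n} \mathbf{E}\, \sigma_k|\zeta_k| \mathbf{1}_{\{\sigma_k|\zeta_k| > L_n\}} \le 2^{-n}\sigma N_n\, \mathbf{E} |\zeta_1| \mathbf{1}_{\{|\zeta_1| > (2\log N_n)^{1/2}\}} = 2^{-n}\sigma N_n (2/\pi)^{1/2} N_n^{-1} \le 2^{-n}\sigma. \]
Further,
\[ \mathbf{E} \sup_{k\ge1} \sigma_k|\zeta_k| = \mathbf{E} \sup_{n : N_n > 0} S_n \le \sup_{n : N_n>0} L_n + \sum_{n : N_n > 0} \mathbf{E} S_n \mathbf{1}_{\{S_n > L_n\}} \le \sup_{n : N_n>0} L_n + \sum_{n=0}^{\infty} 2^{-n}\sigma \le \sup_{n : N_n>0} L_n + 2\sigma. \]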

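Moreover, for each $n$ such that $N_n > 0$, we have
\[ \Big(2\log^+ \sum_{k=1}^{\infty} \exp\{-m\sigma_k^{-2}\}\Big)^{1/2} \sigma \ge \Big(2\log^+\big(N_n \exp\{-2^{2n+2} m\sigma^{-2}\}\big)\Big)^{1/2} 2^{-n}\sigma \ge \big(2\log N_n - 2^{2n+3} m\sigma^{-2}\big)_+^{1/2}\, 2^{-n}\sigma \ge L_n - 2^{3/2} m^{1/2}, \]
which proves (10.4.12).

Let us also briefly discuss an elementary approach often called the double sum method, which goes back to earlier works of Sirao, Watanabe and Pickands, and was later used by Kôno, Adler, Piterbarg, Weber and others. This simple method, which consists of a judicious use of the correlation inequalities for Gaussian pairs, is often efficient for treating concrete problems on suprema. Let $X = (X_1, \dots, X_N)$ be a centered Gaussian vector. There is no loss of generality in assuming that
\[ \|X_1\| \le \|X_2\| \le \cdots \le \|X_N\|. \]
By Lemma 10.1.2, for any $1 \le j \le N$,
\[ \sum_{i=1}^{j-1} \mathbf{P}\{X_i > x, X_j > x\} \le \mathbf{P}\{X_j > x\} \sum_{i=1}^{j-1} \exp\Big\{-\frac12 \Big(\frac{x\,\|X_i - X_j\|}{2\|X_j\|^2}\Big)^2\Big\}. \]
By using the elementary inequality
\[ \mathbf{P}\Big\{\bigcup_{j=1}^{N} A_j\Big\} \ge \sum_{j=1}^{N} \mathbf{P}\{A_j\} - \sum_{1\le i<j\le N} \mathbf{P}\{A_i \cap A_j\} \]
we deduce, for all $x \ge 0$,
\[ 1 - \max_{1\le j\le N} \sum_{i=1}^{j-1} \exp\Big\{-\frac12 \Big(\frac{x\,\|X_i - X_j\|}{2\|X_j\|^2}\Big)^2\Big\} \le \frac{\mathbf{P}\{\sup_{1\le j\le N} X_j > x\}}{\sum_{j=1}^{N} \mathbf{P}\{X_j > x\}} \le 1. \]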

Now assume we are given a separable centered Gaussian process $X = \{X(t), t \in T\}$ with almost surely bounded sample paths. Put, for all $\varepsilon > 0$,
\[ m_X(\varepsilon) = \sup_{t\in T} \mathbf{E} \sup_{d_X(u,t)\le\varepsilon} X(u). \]


Recall that $\sigma(X) = \sup_{t\in T} \|X(t)\|$. Let $S = \{t_1, \dots, t_N\}$ be a fixed finite subset of $T$. We order $S$ according to the increasing order of the variances of the $X(t_i)$'s:
\[ \|X(t_1)\| \le \|X(t_2)\| \le \cdots \le \|X(t_N)\|. \]
To avoid trivialities, assume that $\|X(t_1)\| > 0$ and that $\varepsilon = \inf\{\|X(t_i) - X(t_j)\|,\ 1 \le i \ne j \le N\}$ is also positive. Consider for $1 \le j \le N$ and $k \ge 1$ the sets
\[ I_k(j) = \{i : i < j,\ k\varepsilon < \|X(t_i) - X(t_j)\| \le (k+1)\varepsilon\}. \]
Plainly, for any $j \le N$,
\[ \sum_{i=1}^{j-1} \exp\Big\{-\frac12 \Big(\frac{x\,\|X(t_i) - X(t_j)\|}{2\|X(t_j)\|^2}\Big)^2\Big\} \le \sum_{k=1}^{\infty} \#(I_k(j)) \exp\Big\{-\frac12 \Big(\frac{k\varepsilon x}{2\sigma(X)^2}\Big)^2\Big\}. \]

And by Sudakov’s inequality, we have for any j ≤ N and k ≥ 1, 2 sup X(s) ≤ k0 mX ((k+1)ε). ε log #(Ik (j )) ≤ k0 E sup X(s) ≤ k0 E

X(s)−X(tj ) 1 is some fixed parameter. Then for all k ≥ 1, and so ∞ k=1

exp

k0 mX ((k + 1)ε) 2 1 kεx − ε 2 2σ (X)

2

≤

∞ k=1

e−H

x2 8σ 4 (X)

≥

∞

2 k2

≤ 0

This provides j −1 i=1

P{X(ti ) > x, X(tj ) > x} ≤

1 P{X(tj ) > x}, H

m2X ((k+1)ε k 2 ε4

e−H

2 u2

+

H2 , ε2

du

x} ≥ P{sup X(t) > x} ≥ 1 − t∈T

t∈S

1 H

P{X(t) > x}.

t∈S

Now, let MX (ε) be the maximal cardinality of the subsets S of T , such that

X(s) − X(t) > ε if s = t and s, t ∈ S. Define

H x , ≤ , ε(x) = inf ε > 0 : max supk≥1 mX ((k+1))ε 2 2 ε kε (2σ (X)) where mX (ε) = sup E t∈T

sup

X(t)−X(s) ≤ε

We have ε(x) ≤ D(X). By Theorem 10.4.1, P{sup X(t) > x + 2mX (ε(x))} ≤ P{ t∈T

s∈S(x)

≤"

X(s).

X(t) > x + 2mX (ε(x))}

sup

t: X(s)−X(t) ≤ε(x)

x #(S(ε(x))). σ (X)

10.4.3 Proposition. Let X = {X(t), t ∈ T } be a separable centered Gaussian process having almost surely bounded sample paths. Let H > 1 be an arbitrary fixed parameter. Let D(X) = sups,t∈T X(t) − X(s) and γ (X) = mint∈T X(t) . For all x verifying x ≥ (2σ (X))2 max

E supT X

we have

D(X)2

P{sup X(t) > x} ≥ 1 − t∈T

P{sup X(t) > x+2mX (ε(x))} ≤ " t∈T

,

H D(X)

,

ε(x) ≤ D(X),

1 x " MX (ε(x)), H γ (X)

x MX (ε(x)) ≤ γ (X)

H P{sup X(t) > x}. H −1 t∈T

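Some examples. If $\|X(t) - X(s)\| \sim |s-t|^{\alpha}$ for some $0 < \alpha \le 1$, then
\[ c_\alpha^{-1} x^{1/\alpha}\,\Psi(x) \le \mathbf{P}\Big\{\sup_{t\in[0,1]} X(t) > x\Big\} \le c_\alpha x^{1/\alpha}\,\Psi(x). \]
If $\|X(t) - X(s)\| \sim |\log|s-t||^{-\beta}$ for some $\beta > \frac12$, then
\[ c_\beta^{-1} x^{2/(2\beta+1)} \le \log\Big(\frac{\mathbf{P}\{\sup_{t\in[0,1]} X(t) > x\}}{\Psi(x)}\Big) \le c_\beta x^{2/(2\beta+1)}. \]
If $\|X(t) - X(s)\| \sim \exp\{-|\log|s-t||^{\gamma}\}$ for some $0 < \gamma \le 1$, then
\[ c_\gamma^{-1} (\log x)^{1/\gamma} \le \log\Big(\frac{\mathbf{P}\{\sup_{t\in[0,1]} X(t) > x\}}{\Psi(x)}\Big) \le c_\gamma (\log x)^{1/\gamma}. \]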


Before investigating in more detail the properties of the specific class of Gaussian processes defined by Stein's elements (Chapters 5 and 6), let us briefly comment on the best known Gaussian process, the Brownian motion, and discuss one of its powerful applications through the famous Skorokhod embedding scheme.

The Brownian motion. This is likely the most investigated Gaussian process, since it plays a quasi-universal role in probability theory. The Brownian motion, which is also called the Wiener process, is a centered Gaussian process $W = \{W(t), t \ge 0\}$ defined (and thus characterized) by its covariance function $\mathbf{E} W(s)W(t) = s \wedge t$. Consequently $\mathbf{E} W(s)^2 = s$. In particular $W(0) = 0$, and if $0 \le u \le v \le s \le t$,
\[ \mathbf{E}\,(W(v) - W(u))(W(t) - W(s)) = v - v - u + u = 0. \]
And for any $c \ge 0$, $0 \le s \le t$,
\[ \mathbf{E}\,(W(t+c) - W(s+c))^2 = \mathbf{E}\,(W(t) - W(s))^2 = t - s. \]
Thus $W$ is a Gaussian process with orthogonal, and thus independent, stationary increments. It also follows that $\{W(ct), t \ge 0\}$ has the same law as $\{\sqrt{c}\,W(t), t \ge 0\}$ for any positive real $c$. Notice also that $-W$ and $W$ have the same law. The sample paths of $W$ are almost surely continuous. Below are some of the distributional properties of $W$: for any $u \ge 0$,
\[ \mathbf{P}\Big\{\sup_{0\le t\le T} W(t) \ge u\Big\} = 2\mathbf{P}\{W(T) \ge u\} = 2\Psi\Big(\frac{u}{\sqrt{T}}\Big), \tag{10.4.13} \]
and, for $T = 1$ (the general case follows from the scaling property above, replacing $u$ by $u/\sqrt{T}$),
\[ \mathbf{P}\Big\{\sup_{0\le t\le 1} |W(t)| \le u\Big\} = \frac{1}{\sqrt{2\pi}} \int_{-u}^{u} \sum_{k\in\mathbb{Z}} (-1)^k e^{-(x-2ku)^2/2}\,dx = \frac{4}{\pi} \sum_{k=0}^{\infty} \frac{(-1)^k}{2k+1}\, e^{-\pi^2(2k+1)^2/(8u^2)}. \tag{10.4.14} \]
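The two series in (10.4.14) look very different, and a quick numerical comparison is a useful check. The Python sketch below is an illustration for the case $T = 1$; the function names and truncation levels are assumptions. The first series is evaluated through $\frac{1}{\sqrt{2\pi}}\int_{-u}^{u} e^{-(x-2ku)^2/2}dx = \Phi(u - 2ku) - \Phi(-u-2ku)$, so no numerical integration is needed.

```python
from math import erf, exp, pi, sqrt


def std_normal_cdf(x):
    """Distribution function Phi of N(0,1)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def image_series(u, terms=60):
    """First expression in (10.4.14): alternating sum over reflected Gaussian masses."""
    total = 0.0
    for k in range(-terms, terms + 1):
        sign = -1.0 if k % 2 else 1.0
        total += sign * (std_normal_cdf(u - 2 * k * u) - std_normal_cdf(-u - 2 * k * u))
    return total


def eigen_series(u, terms=400):
    """Second expression in (10.4.14): rapidly convergent for small u."""
    return (4.0 / pi) * sum(
        (-1.0) ** k / (2 * k + 1) * exp(-pi ** 2 * (2 * k + 1) ** 2 / (8.0 * u * u))
        for k in range(terms)
    )


if __name__ == "__main__":
    for u in (0.5, 1.0, 2.0, 3.0):
        print(u, image_series(u), eigen_series(u))
```

The first form converges quickly for large $u$ and the second for small $u$, which is the practical reason both are recorded.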

A bit less known is the following estimate related to the local infimum of $|W|$. Let $0 < a < b < \infty$. Then for any $c > 0$ and any real $M$,
\[ \mathbf{P}\Big\{\inf_{a\le t\le b} |W(t) - M| \ge c\Big\} = \int_{|v|>c} \Big(1 - 2\Psi\Big(\frac{|v| - c}{\sqrt{b-a}}\Big)\Big) \frac{e^{-(M+v)^2/(2a)}}{\sqrt{2\pi a}}\,dv. \tag{10.4.15} \]
This is easily obtained by using the so-called "reflection principle", together with the intermediate value theorem, which gives
\[ \mathbf{P}\Big\{\inf_{a\le t\le b} |W(t) - M| \ge c\Big\} = \mathbf{P}\Big\{\inf_{a\le t\le b} W(t) \ge M + c\Big\} + \mathbf{P}\Big\{\sup_{a\le t\le b} W(t) \le M - c\Big\}. \]


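Let $x \ge 0$. Then $\mathbf{P}\{\inf_{a\le t\le b} |W(t) - M| \ge c \mid W(a) = M \pm x\} = 0$ if $0 \le x \le c$; and if $x > c$,
\[ \mathbf{P}\Big\{\inf_{a\le t\le b} |W(t) - M| \ge c \,\Big|\, W(a) = M + x\Big\} = \mathbf{P}\Big\{\sup_{a\le t\le b} (W(a) - W(t)) \le x - c\Big\}, \]
\[ \mathbf{P}\Big\{\inf_{a\le t\le b} |W(t) - M| \ge c \,\Big|\, W(a) = M - x\Big\} = \mathbf{P}\Big\{\sup_{a\le t\le b} (W(t) - W(a)) \le x - c\Big\}. \]
Therefore
\[ \mathbf{P}\Big\{\inf_{a\le t\le b} |W(t) - M| = 0\Big\} = 2\int_{\mathbb{R}} \Psi\Big(\frac{|v|}{\sqrt{b-a}}\Big) \frac{e^{-(M+v)^2/(2a)}}{\sqrt{2\pi a}}\,dv \le C \min\Big(1, \sqrt{\frac{b-a}{a}}\Big)\, e^{-\frac{M^2}{8\max(a,\,b-a)}}. \tag{10.4.16} \]
In particular,
\[ \mathbf{P}\Big\{\inf_{a\le t\le b} |W(t)| = 0\Big\} = 1 - \frac{2}{\pi} \arctan\sqrt{\frac{a}{b-a}}. \tag{10.4.17} \]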
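The closed form (10.4.17) can be checked against the integral in (10.4.16) with $M = 0$. The Python sketch below does this with a composite Simpson rule; the function names, the truncation of the integral at $12\sqrt{a}$ and the number of nodes are illustrative assumptions.

```python
from math import atan, erfc, exp, pi, sqrt


def gauss_tail(u):
    """Standard Gaussian tail Psi(u) = P{N(0,1) > u}."""
    return 0.5 * erfc(u / sqrt(2.0))


def zero_probability_integral(a, b, n=4000):
    """Integral of (10.4.16) with M = 0, for 0 < a < b.

    The integrand is even in v, so we integrate over [0, 12 sqrt(a)] by
    Simpson's rule (n even) and double the result.
    """
    upper = 12.0 * sqrt(a)
    h = upper / n

    def integrand(v):
        return gauss_tail(v / sqrt(b - a)) * exp(-v * v / (2.0 * a)) / sqrt(2.0 * pi * a)

    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    half_line = total * h / 3.0
    return 2.0 * (2.0 * half_line)  # factor 2 of the formula times factor 2 of symmetry


def zero_probability_closed_form(a, b):
    """Right-hand side of (10.4.17)."""
    return 1.0 - (2.0 / pi) * atan(sqrt(a / (b - a)))


if __name__ == "__main__":
    for a, b in ((1.0, 2.0), (0.5, 3.0), (2.0, 2.1)):
        print(a, b, zero_probability_integral(a, b), zero_probability_closed_form(a, b))
```

Since $\arctan\sqrt{a/(b-a)} = \arcsin\sqrt{a/b}$, the closed form is the classical arcsine-type law for the existence of a zero of $W$ in $[a, b]$.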

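And for every positive real $c$,
\[ \mathbf{P}\Big\{0 < \inf_{a\le t\le b} |W(t)| < c\Big\} = 2\int_0^{c/\sqrt{a}} \Big(1 - 2\Psi\Big(\frac{u\sqrt{a}}{\sqrt{b-a}}\Big)\Big) \frac{e^{-u^2/2}}{\sqrt{2\pi}}\,du + 4\int_{c/\sqrt{a}}^{\infty} \Big(\Psi\Big(\frac{u\sqrt{a} - c}{\sqrt{b-a}}\Big) - \Psi\Big(\frac{u\sqrt{a}}{\sqrt{b-a}}\Big)\Big) \frac{e^{-u^2/2}}{\sqrt{2\pi}}\,du. \]
Concerning both the local and the uniform modulus of continuity, Lévy proved the following result:
\[ \lim_{h\to0}\ \sup_{0\le s\le 1-h}\ \sup_{0\le t\le h} \frac{|W(s+t) - W(s)|}{\sqrt{2h\log(1/h)}} \stackrel{\text{a.s.}}{=} 1, \qquad \lim_{h\to0}\ \sup_{0\le s\le 1-h} \frac{|W(s+h) - W(s)|}{\sqrt{2h\log(1/h)}} \stackrel{\text{a.s.}}{=} 1. \tag{10.4.18} \]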

We refer to Csörgö and Révész [1981], Theorem 1.1.1, for a thorough treatment of the asymptotic properties of the increments of $W$. The central role of the Brownian motion can be illustrated by the powerful randomization procedure introduced by Skorokhod, which we shall describe because of its usefulness and its wide range of applications.

The Skorokhod embedding. Let $W = \{W(t), t \ge 0\}$ denote a standard Brownian motion. Any centered measure $\mu$ on the real line embeds into $W$: there exists a stopping time $\tau$ such that $W(\tau) \stackrel{\mathcal{D}}{=} \mu$, and further $\{W(\tau \wedge t) : t \ge 0\}$ is a uniformly bounded martingale. In fact $\tau$ is the first exit time of $W$ from a random interval containing 0. An explicit construction of $\tau$, which is the Skorokhod stopping time, is given in [Sawyer: 1974], Section 2; see also [Obloj: 2004], p. 332. This has proved to be an extremely fertile idea, which usually applies as follows. Let $0 < \eta < 1$ and set $A_\eta = \{|\tau - \mathbf{E}\tau| \le \eta\,\mathbf{E}\tau\}$. Assuming $\mathbf{E}\tau < \infty$, one first controls the set $A_\eta^c$ by showing, via a suitable use of Tchebycheff's inequality, that its probability is small. Additional knowledge of the moments $\|\tau - \mathbf{E}\tau\|_p$ is then required. Next, on the set $A_\eta$, the problem studied is transferred to a "Brownian environment", by translating it into local properties (on the interval $](1-\eta)\mathbf{E}\tau, (1+\eta)\mathbf{E}\tau[$) of the sample paths of $W$, which are generally tractable. For the first step, the Burkholder, Davis, Gundy and Millar inequalities (see Proposition 2.1 in [Obloj: 2004] and [Davis: 1976], p. 697, or estimates (1.10) in [Sawyer: 1974]) are useful. For any $1 \le p < \infty$, there exist universal constants $c_p$, $C_p$ such that
\[ c_p\, \mathbf{E}\tau^{p/2} \le \mathbf{E}|W(\tau)|^p = \int_{\mathbb{R}} |x|^p \mu(dx) \le C_p\, \mathbf{E}\tau^{p/2}. \]

A careful analysis of the integrability properties of τ is made in [Sawyer: 1974] (see Theorem 1). For instance, for any α ≥ 0, there is a constant Cα depending on α only such that 1/2 2 E e(ατ ) ≤ Cα eα|x| μ(dx). R

When $\mu$ is a symmetric measure, the latter estimate is even two-sided ([Sawyer: 1974], Theorems 2–3). See also Lemma A.2 on p. 272 in Hall and Heyde [1980]. Another important construction has been given in [Fisher: 1992], which turns out to be well adapted for treating questions involving weighted sums of i.i.d. random variables. Fisher's construction takes care of the "scale change" role played by the weights, and in turn uses the fact that if $\xi$ is a real random variable satisfying $\mathbf{E}\xi^2 < \infty$ and $\mathbf{E}\xi = 0$, and $\lambda$ is a fixed positive real, then on a possibly larger probability space, there exist a Brownian motion $W$ and stopping times $T$ and $T_\lambda$ such that
\[ W(T) \stackrel{\mathcal{D}}{=} \xi, \qquad W(T_\lambda) \stackrel{\mathcal{D}}{=} \lambda\xi, \qquad T_\lambda \stackrel{\mathcal{D}}{=} \lambda^2 T. \]
This applies as follows. Let $w = \{w_\ell, \ell \ge 1\}$ be a sequence of positive real numbers, $\xi = \{\xi_\ell, \ell \ge 1\}$ be centered i.i.d. random variables with unit variance, and denote $\Upsilon_N = \sum_{\ell=1}^{N} w_\ell \xi_\ell$. Then there exists a probability space with a Brownian motion $\{W(t), t \ge 0\}$ and non-negative i.i.d. random variables $\{\tau_\ell, \ell \ge 1\}$ with $\mathbf{E}\tau_\ell = 1$, such that
\[ (\Upsilon_1, \Upsilon_2, \dots, \Upsilon_N, \dots) \stackrel{\mathcal{D}}{=} \Big(W(w_1^2\tau_1),\ W(w_1^2\tau_1 + w_2^2\tau_2),\ \dots,\ W\Big(\sum_{\ell=1}^{N} w_\ell^2\tau_\ell\Big),\ \dots\Big) \]
and, moreover, for each real number $r \ge 1$, $\mathbf{E}(\tau_1^{r/2}) \le C_r^r\, \mathbf{E}(|\xi_1|^r)$, where $C_r^r = 2(8/\pi^2)^{r-1}\,\Gamma(r+1)$. See Fisher [1992; Theorem 2.2] and Lin–Weber [2009; Theorem 3.6], in which a more direct approach than Fisher's is proposed, on the basis of an idea due to Breiman.

10.5 Oscillations of Gaussian Stein’s elements


Problem 11. Let Z be a centered square integrable random variable. Let x be some arbitrary real number. Show that for each η > 0 T |W (t) − x| = 0 , − 1| > η + P inf P Z=x ≤P | |t−E T |≤ηE T ET D

where T be a stopping time, such that W (T ) = Z and E T = E Z2 . Deduce that there exists an absolute constant C, such that for every real x, and 0 < a < b < ∞, x2 1 T − E T s 1/2 − 8E T . E P Z = x ≤ inf + Cη e 0 M > 0, J →∞

μ a.s. x. And since the law of $\limsup_{J\to\infty} |F_J(\cdot, x)|$ does not depend on the first $g_n$'s, this implies by the 0-1 law that $\mathbf{P}\{\limsup_{J\to\infty} |F_J(\cdot, x)| = \infty\} = 1$, μ a.s. x. The regularity of the $F_J$'s will thus be reflected by the magnitude of their oscillations. The study of these oscillations is the main purpose of the present work. We introduce some


convenient notation: Jˆ = Jˆf (x) =

J

(f T j (x))2 ,

AJ = AJ,f (x) = Jˆ/J ,

j =1

Af = Af (x) = sup AJ , J ≥1

f = f (x) =

∞

(AJ +1 − AJ )2

1/2 .

J =1

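10.5.1 Theorem (Boundedness of the oscillations). Let $\{J_k, k \ge 1\}$ be an increasing sequence of positive integers. If for some $M > 0$ the series
\[ Q_1 = \sum_{k=1}^{\infty} \exp\{-MJ_k/(J_{k+1} - J_k)\} \]
converges, then $Q_2 = \sup_k \frac{J_{k+1} - J_k}{J_k} < \infty$ and for each $f \in L^2(X, \mu)$, we have
\[ \int_X \mathbf{E} \sup_k\ \sup_{\theta_1,\theta_2\in[J_k,J_{k+1}]} |F_{\theta_1,f} - F_{\theta_2,f}|\,d\mu \le K, \tag{10.5.1} \]
where the finite constant $K$ depends only on $M$, $Q_1$, $Q_2$ and on the integrals over $X$ of the $3/4$-th powers of the two functionals of $f$ introduced above (the supremum $A_f$ and the square function built on the increments $A_{J+1} - A_J$). In particular, we have
\[ \mu \times \mathbf{P}\Big\{\sup_k\ \sup_{\theta_1,\theta_2\in[J_k,J_{k+1}]} |F_{\theta_1,f} - F_{\theta_2,f}| < \infty\Big\} = 1. \tag{10.5.2} \]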

The size of blocks in this statement is nearly the best possible. Indeed, we will also prove

10.5.2 Theorem. Let $\{J_k, k \ge 1\}$ be an increasing sequence of positive integers satisfying the two following assumptions:
(H1) the sequence $\{J_{k+1} - J_k, k \ge 1\}$ is nondecreasing,
(H2) the sequence $\big\{\frac{J_{k+1} - J_k}{J_k}, k \ge 1\big\}$ is nonincreasing.
Assume that there exists some ergodic dynamical system $(X, \mathcal{A}, \mu, T)$ and $f \in L^2(X, \mu)$, $f \ne 0$, such that (10.5.2) holds. Then, for some positive real $M$,
\[ Q_1 = \sum_{k=1}^{\infty} \exp\{-MJ_k/(J_{k+1} - J_k)\} < \infty. \tag{10.5.3} \]
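To get a feeling for the block-size condition on $Q_1$, the following Python sketch evaluates partial sums of the series for two illustrative sequences that both satisfy (H1) and (H2): the squares $J_k = k^2$ and the geometric sequence $J_k = 2^k$. These sequences, the function names and the value $M = 1$ are assumptions chosen for illustration; they are not examples from the text.

```python
from math import exp


def q1_partial_sum(blocks, big_m, n_terms):
    """Partial sum of Q1 = sum_k exp(-M * J_k / (J_{k+1} - J_k)) for J_k = blocks(k)."""
    return sum(
        exp(-big_m * blocks(k) / (blocks(k + 1) - blocks(k)))
        for k in range(1, n_terms + 1)
    )


def squares(k):
    """J_k = k^2: blocks of length 2k + 1, short compared with J_k."""
    return k * k


def powers_of_two(k):
    """J_k = 2^k: blocks as long as J_k itself."""
    return 2 ** k


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, q1_partial_sum(squares, 1.0, n), q1_partial_sum(powers_of_two, 1.0, n))
```

For the squares the terms behave like $e^{-Mk/2}$ and the series converges for every $M > 0$, so Theorem 10.5.1 applies. For the geometric sequence every term equals $e^{-M}$, the series diverges for every $M$, and Theorem 10.5.2 then indicates that (10.5.2) cannot hold along such blocks.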

These results on oscillations can be complemented by a study of the sojourn time of the sequence FJ in a given measurable subset ⊂ R1 . Consider for large the frequencies 1 d (, x, ω) = 1{FJ (x,ω)∈} . (10.5.4) J =1


10.5.3 Proposition (Invariance principle). Let f ∈ L2 (X, μ) with f 2 = 1. Let ⊂ R be such that λ(∂) = 0. Then, for μ-almost all x ∈ X,

W (t) D lim d (, x, · ) = I = λ 0 ≤ t ≤ 1 : √ ∈ , →∞ t

(10.5.5)

where {Wt , t ≥ 0} is the Wiener process. As a corollary we get 10.5.4 Corollary. For any interval , we have μ × P (x, ω) : lim inf d (, x, ω) = 0, lim sup d (, x, ω) = 1 = 1. →∞

→∞

Oscillations – sufficient conditions. In this part, we prove Theorem 10.5.1. By the maximal Lemma 4.1.2, the maximal operator $A$ is weak-(2,1): for any nonnegative real $B$,
\[ B\,\mu\{A_f > B\} \le \|f\|_2^2. \tag{10.5.8} \]

According to Theorem 4.2.4, we also know that the second operator is strong-(2,2):

f 2 ≤ C f 2 ,

(10.5.9)

where $C$ is an absolute constant. This clearly shows that the constant $K$ occurring in (10.5.1) depends on $M$, $Q_1$, $Q_2$, and $\|f\|_2$ only. We can now pass to the

Proof of Theorem 10.5.1. Fix some $x \in X$, and let $W(\cdot) = W^x(\cdot)$ be a Wiener process such that for any $J$,
\[ W(\hat{J}) = \sum_{j=1}^{J} f(T^j x)\,g_j = J^{1/2} F_J. \tag{10.5.10} \]
Then, for any integer $k$ and $\theta_1, \theta_2 \in [J_k, J_{k+1}]$ we have
\[ F_{\theta_1} - F_{\theta_2} = \theta_1^{-1/2} W(\hat\theta_1) - \theta_2^{-1/2} W(\hat\theta_2) = (\theta_1^{-1/2} - \theta_2^{-1/2})\,W(\hat\theta_1) + \theta_2^{-1/2}\big(W(\hat\theta_1) - W(\hat\theta_2)\big), \]
so that
\[ |F_{\theta_1} - F_{\theta_2}| \le 2^{-1} J_k^{-3/2}(J_{k+1} - J_k) \sup_{u\in[0,\hat{J}_{k+1}]} |W(u)| + 2J_k^{-1/2} \sup_{u\in[\hat{J}_k,\hat{J}_{k+1}]} |W(u) - W(\hat{J}_k)|. \]


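Concerning the first half of the last expression, we have
\begin{align*}
\frac12 J_k^{-3/2}(J_{k+1} - J_k) \sup_{[0,\hat{J}_{k+1}]} |W|
&= \frac12\, \frac{J_{k+1} - J_k}{J_k} \Big(\frac{J_{k+1}}{J_k}\Big)^{1/2} \Big(\frac{\hat{J}_{k+1}}{J_{k+1}}\Big)^{1/2} \hat{J}_{k+1}^{-1/2} \sup_{u\in[0,\hat{J}_{k+1}]} |W(u)| \\
&\le \frac12 \Big(\frac{J_{k+1} - J_k}{J_k}\Big)^{1/2} Q_2^{1/2}(Q_2 + 1)^{1/2} A^{1/2}\, \hat{J}_{k+1}^{-1/2} \sup_{u\in[0,\hat{J}_{k+1}]} |W(u)| \\
&= K \Big(\frac{J_{k+1} - J_k}{J_k}\Big)^{1/2} A^{1/2} \sup_{u\in[0,1]} |W_{1,k}(u)|,
\end{align*}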

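where $W_{1,k}$ is a Wiener process. Concerning the second half, we observe
\[ 2J_k^{-1/2} \sup_{u\in[\hat{J}_k,\hat{J}_{k+1}]} |W(u) - W(\hat{J}_k)| = 2\Big(\frac{\hat{J}_{k+1} - \hat{J}_k}{J_k}\Big)^{1/2} \sup_{u\in[0,1]} |W_{2,k}(u)|, \]
where $W_{2,k}$ is another Wiener process. Moreover, writing $A_k$ for $\hat{J}_k/J_k$,
\[ \frac{\hat{J}_{k+1} - \hat{J}_k}{J_k} = \frac{\hat{J}_{k+1}}{J_{k+1}} - \frac{\hat{J}_k}{J_k} + \frac{\hat{J}_{k+1}}{J_k} - \frac{\hat{J}_{k+1}}{J_{k+1}} = A_{k+1} - A_k + \frac{\hat{J}_{k+1}}{J_{k+1}} \cdot \frac{J_{k+1} - J_k}{J_k} \le |A_{k+1} - A_k| + \frac{J_{k+1} - J_k}{J_k}\,A. \]
Putting now all our estimates together leads us to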

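\[ \mathbf{E} \sup_k\ \sup_{\theta_1,\theta_2\in[J_k,J_{k+1}]} |F_{\theta_1} - F_{\theta_2}| \le KA^{1/2}\, \mathbf{E} \sup_k \Big(\frac{J_{k+1} - J_k}{J_k}\Big)^{1/2} \sup_{0\le u\le1} |W_{1,k}(u)| + 2\,\mathbf{E} \sup_k |A_{k+1} - A_k|^{1/2} \sup_{0\le u\le1} |W_{2,k}(u)|, \]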

(10.5.11) where Wi,k are Wiener processes (there is no assumption concerning their mutual independence). Now, we are ready to apply the following lemma which goes back to more general results on Gaussian processes. Applying then (10.5.11) to the first part of (10.5.10), with the choices m = M, σk = ( Jk+1Jk−Jk )1/2 , produces a bound equal to KA1/2 . Applying next (10.5.11) to the second half of (10.5.10) with the choices m = 1, σk = |Ak+1 − Ak |1/2 , σ ≤ A1/2 , also leads to the bound ∞ 1/2 exp{−|Ak+1 − Ak |−1 } σ + KA1/2 + K. KA1/2 2 log+ k=1


with u = |Ak+1 − Ak |−1 and thus Now, we apply the obvious inequality e−u ≤ u−2 ∞ 2 replace the sum in the last expression by = k=1 |Ak+1 − Ak |2 . It remains then to study the integral X

A(x)1/2 [log+ (x) + 1]μ(dx).

We use the inequality log+ ≤ 21/4 , and next apply Hölder’s inequality, which provides 2/3 1/3 A(x)1/2 (x)1/4 μ(dx) ≤ A(x)3/4 μ(dx) (x)3/4 μ(dx) ≤ K. X

X

X

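Theorem 10.5.1 is thus proved.

Oscillations – necessary conditions. In this part, we prove Theorem 10.5.2. We split the proof into four steps.

(1) Exponential consolidation of the sequence $\{J_k, k \ge 1\}$. Under assumption (H2), $b = \sup_k \frac{J_{k+1} - J_k}{J_k} < \infty$. Put $B = (b+1)^2$ and for each integer $\ell$,
\[ \mathcal{J}_\ell = \{k : J_k \in [B^\ell, B^{\ell+1})\}, \qquad N_\ell = \#(\mathcal{J}_\ell). \]
Let $k, k' \in \mathcal{J}_\ell$ with $k \le k'$. Then, by (H1), (H2), we have
\[ J_{k+1} - J_k \le J_{k'+1} - J_{k'} = \frac{J_{k'+1} - J_{k'}}{J_{k'}} \cdot J_k \cdot \frac{J_{k'}}{J_k} \le \frac{J_{k+1} - J_k}{J_k} \cdot J_k \cdot \frac{B^{\ell+1}}{B^{\ell}} = B(J_{k+1} - J_k). \]
Thus,
\[ \max_{k\in\mathcal{J}_\ell} (J_{k+1} - J_k) \le B \min_{k\in\mathcal{J}_\ell} (J_{k+1} - J_k). \tag{10.5.12} \]
By the definition of $B$ we also have for each $k \in \mathcal{J}_\ell$,
\[ J_{k+1} = J_k + (J_{k+1} - J_k) \le J_k(1+b) \le J_k B^{1/2} \le B^{\ell+3/2}, \qquad J_k = J_{k-1} + (J_k - J_{k-1}) \le J_{k-1}(1+b) \le J_{k-1}B^{1/2}. \]
It follows that $\sup_{k\in\mathcal{J}_\ell} J_{k+1} \le B^{\ell+3/2}$ and $\inf_{k\in\mathcal{J}_\ell} J_k \le B^{\ell+1/2}$. Hence the following chain of inequalities is true:
\begin{align*}
\frac{N_\ell}{\max_{k\in\mathcal{J}_\ell} \frac{J_k}{J_{k+1} - J_k}}
&\ge \frac{N_\ell \cdot \min_{k\in\mathcal{J}_\ell}(J_{k+1} - J_k)}{B^{\ell+1}}
\ge \frac{N_\ell \cdot \max_{k\in\mathcal{J}_\ell}(J_{k+1} - J_k)}{B^{\ell+2}} \\
&\ge \frac{\sum_{k\in\mathcal{J}_\ell}(J_{k+1} - J_k)}{B^{\ell+2}}
\ge \frac{B^{\ell+1} - B^{\ell+\frac12}}{B^{\ell+2}} = \frac{B - \sqrt{B}}{B^2}. \tag{10.5.13}
\end{align*}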


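Similarly, we also have
\begin{align*}
\frac{N_\ell}{\min_{k\in\mathcal{J}_\ell} \frac{J_k}{J_{k+1} - J_k}}
&\le \frac{N_\ell \cdot \max_{k\in\mathcal{J}_\ell}(J_{k+1} - J_k)}{B^{\ell}}
\le \frac{B \cdot N_\ell \cdot \min_{k\in\mathcal{J}_\ell}(J_{k+1} - J_k)}{B^{\ell}} \\
&\le \frac{B \cdot \sum_{k\in\mathcal{J}_\ell}(J_{k+1} - J_k)}{B^{\ell}}
\le \frac{B(B^{\ell+3/2} - B^{\ell})}{B^{\ell}} = B(B^{3/2} - 1). \tag{10.5.14}
\end{align*}
Consequently, condition (10.5.3) can be rewritten in the following more convenient form: for some $M \in (0, \infty)$,
\[ \sum_{\ell\ge1} N_\ell\, e^{-MN_\ell} < \infty. \tag{10.5.3$^*$} \]
Indeed
\[ (10.5.3) \iff \sum_{k=1}^{\infty} e^{-MJ_k/(J_{k+1} - J_k)} < \infty \iff \sum_{\ell} \sum_{k\in\mathcal{J}_\ell} e^{-MJ_k/(J_{k+1} - J_k)} < \infty, \]
and thus (10.5.3) implies
\[ \sum_{\ell\ge1} N_\ell\, e^{-\frac{MB^2}{B - \sqrt{B}} N_\ell} < \infty. \]
In the opposite direction, we also have
\[ (10.5.3^*) \iff \sum_{\ell\ge1} N_\ell\, e^{-MB(B^{3/2}-1)\min_{k\in\mathcal{J}_\ell} J_k/(J_{k+1} - J_k)} < \infty \]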

"⇒

e

Jk k+1 −Jk

−MB(B 3/2 −1) J

0. According to Birkhoff's theorem, the inequality $\big|\hat{J}_k/J_k - \|f\|_2^2\big| \le \varepsilon$ holds for μ-almost all $x$ in $X$ provided that $k$ is large enough, say $k \ge k_0(x)$. Moreover

Jk+1

k =

j =Jk +1

f (T j x)2 ≤ (Jk+1 − Jk ) f 2∞ .

Then N

∗k =

k=1

k =

k∈J

(Jˆk+1 − Jˆk ) = sup Jˆk+1 − inf Jˆk k∈J

k∈J

k∈J

≥ ( f 2 − ε) sup Jk+1 − ( f 2 + ε) inf Jk k∈J

k∈J

+ 21

≥ ( f 2 − ε)B +1 − ( f 2 + ε)B √ √ # " ≥ f 2 (B − B) − ε(B + B) B = CB , with C > 0, provided that ε is chosen small enough, which we do. Let now 0 < α < 1 be fixed. Then N

∗k =

(1−α)N

∗k +

N

k=(1−α)N +1 ∗ N (1−α)N + αN f 2∞

∗k ≤ N ∗(1−α)N + αN · sup k k∈J

k=1

k=1

≤

sup (Jk+1 − Jk ). k∈J


Recall, according to (10.5.12), that N · sup (Jk+1 − Jk ) ≤ BN inf (Jk+1 − Jk ) ≤ B k∈J

k∈J

(Jk+1 − Jk )

k∈J

≤ B(B +3/2 − B ) = (B 5/2 − B)B . We thus have N ∗(1−α)N ≥

N

∗k − α f 2∞ (B 5/2 − B)B

k=1

≥ [C − α f 2∞ (B 5/2 − B)]B = C1 B , where C1 > 0, provided that α is chosen sufficiently small, which we do assume. The implication (10.5.15) ⇐⇒ (10.5.3∗ ) finally results from the following estimates: k∈J

e

−

A2 Jk+1 2(Jˆk+1 −Jˆk )

≥

e

2 B +3/2 2k

−A

e

2 B +3/2 2∗ k

−A

2 +3/2

≥ αN e

A B − 2 ∗

(1−α)N

(1−α)N ≤k≤N

k∈J

≥ αN e

≥

A2 B 3/2 B N − 2C1 B

2 3/2

= αN e

B − A 2C

1

N

.

Densities. In this part, we give the proofs of Proposition 10.5.3 and its Corollary 10.5.4. We start with the Proof of Proposition 10.5.3. By virtue of Birkhoff’s ergodic theorem, Jˆ(x) = 1, J →∞ J lim

(10.5.16)

μ-almost surely. Fix an x satisfying the above property. We will use the natural embedding of FJ into the Wiener process. More precisely, if W˜ is a Wiener process, then we have the equalities of the laws J D FJ (x, · ), J ≥ 1 = J −1/2 W˜ f (T j x)2 , J ≥ 1 = J −1/2 W˜ (Jˆ(x)), J ≥ 1

j =1

D = (J /)−1/2 W (Jˆ(x)/), J ≥ 1 ,

where W (u) = W˜ (u)−1/2 also is a Wiener process. Thus, D

d (, x, · ) = dW =

1 1{(J /)−1/2 W (Jˆ(x)/)∈} . J =1

10.6 Tightness of Gaussian Stein’s elements


It will be more convenient to work with the object W dˆ = −1

J =1

1{(Jˆ/)−1/2 W (Jˆ(x)/)∈} .

W This can be viewed as dˆ = λ (V ), where λ = λ (x) = −1 J =1 δθj / is a deterministic nonnegative measure on R+ , in the definition of which δa stands for the Dirac measure at the point a and V = V (ω) = {t ∈ [0, 1] : t −1/2 W (t) ∈ }. Then as a direct consequence of (10.5.16), we have that λ converges weakly to the restricted Lebesgue measure λ(1) (dt) = 1[0,1] (t)dt, as tends to infinity. Since P{λ(1) (∂A) = 0} = 1, weak convergence implies W D dˆ = λ (V ) −→ λ(1) (V ) = I,

(10.5.17)

W

almost surely. We deduce that dˆ → I , almost surely. Moreover the property (10.5.16) together with the condition λ(∂) = 0 easily imply W lim E |dW − dˆ | = 0.

l→∞

D

D

It follows now from (10.5.17) that dW → I as tends to infinity. So d (, x, ·) → I as tends to infinity. Proof of Corollary 10.5.4. Let 0 < ε < 1 be fixed. It follows from Proposition 10.5.3 that P ω : lim sup d (, x, ω) ≥ 1 − ε ≥ lim sup P ω : d (, x, ω) ≥ 1 − ε →∞

→∞

= P{I ≥ 1 − ε} > 0. And by applying the 0-1 law we show that P ω : lim sup→∞ d (, x, ω) = 1 = 1. The proof is thus achieved.

10.6 Tightness of Gaussian Stein’s elements We continue the study of the Gaussian Stein sequences undertaken in the preceding section. We now examine their tightness properties in two essential cases: the spaces Lp (T), 1 < p < ∞ and the space C(T) of continuous functions on the torus T. We begin with a useful criterion, which is in fact a corollary of a general result of Skorohod (see Fernique [1985: Lemma 1.3]).


10.6.1 Proposition. Let $\{g_n, n \in \mathbb{N}\}$ be a sequence of Gaussian measures defined on a separable Banach space $B$. Assume $\{g_n, n \in \mathbb{N}\}$ converges to $g_0$ for the weak topology of measures on $B$. Then there exists a Gaussian vector $\{x_n, n \in \mathbb{N}\}$ with values in $B^{\mathbb{N}}$, such that
(a) $\lim_{n\to\infty} x_n = x_0$ in $L^r(B)$, for all $r \ge 0$,
(b) $\mathcal{L}(x_n) = g_n$, $n = 1, 2, \dots$.

The next proposition is useful for studying the relative compactness of the Gaussian Stein sequence. Let $(G, d)$ be a compact metric space and let $\tau : G \to G$ be continuous and such that the following properties are satisfied:
(a) $(G, \tau)$ is a minimal system.
(b) $d(\tau u, \tau v) = d(u, v)$ for any $u, v \in G$.
Let $\mu$ be a Borel measure on $G$ which is left invariant under the action of $\tau$. For any $x$, let $V_\varepsilon(x) = \{u \in G : d(u, x) \le \varepsilon\}$. Then (a) and (b) imply $\mu(V_\varepsilon(x)) = \mu(V_\varepsilon(0))$. Let $1 \le p \le \infty$ be fixed and put for any $f \in L^p(\mu)$, any $x \in G$ and any $\varepsilon > 0$,
\[ f^{(\varepsilon)}(x) = \frac{1}{\mu(V_\varepsilon(0))} \int_{V_\varepsilon(x)} f(u)\,d\mu(u). \]
The following criterion of relative compactness in $L^p(\mu)$ is due to Kolmogorov [1985: 148].

10.6.2 Proposition. Let $F$ be a subset of $L^p(\mu)$. Then $F$ is relatively compact in $L^p(\mu)$ if and only if the two following conditions are fulfilled:
(a) $\sup_{f\in F} \|f\|_{p,\mu} < \infty$.
(b) For any $\delta > 0$, there exists $\varepsilon > 0$ such that $\sup_{f\in F} \|f - f^{(\varepsilon)}\|_{p,\mu} \le \delta$.

From this criterion, one can deduce that the associated Gaussian Stein's sequence
\[ \forall f \in L^p(\mu),\ \forall J \ge 1, \qquad F^{\tau}_{J,f} = \frac{1}{\sqrt{J}} \sum_{j\le J} g_j\, f \circ \tau^j \]
is, for any $2 \le p < \infty$ and $f \in L^p(\mu)$, relatively compact in $L^p(\mu)$. This allows us to establish a delicate extension of Bourgain's entropy criterion (Corollary 5.2.7 and Theorem 5.2.4 in [Weber: 1998b]). We specify in what follows $G = \mathbb{T}$ and $\mu = \lambda$, the normalized Lebesgue measure on $\mathbb{T}$.

10.6.3 Theorem. Let $\{S_n, n \ge 1\}$ be a sequence of $L^2(\mu)$–$L^\infty(\mu)$ contractions commuting with rotations. Assume that the property $(C_p)$ is realized. Then for any $f \in L^p(\mu)$, the set $C_f$ is a GC set of $L^2(\mu)$.


In the proof of this result, the tightness properties of the sequence $\{F^{\tau}_{J,f}, J \ge 1\}$, where $\tau$ is an irrational rotation ($\tau x = x + \vartheta \bmod 1$, $\vartheta$ irrational), are crucial. We shall exhibit more general classes inspired by this example, and establish their tightness in $L^p(\mathbb{T})$ or $C(\mathbb{T})$. We will also study examples of non-tightness. Let $\Lambda$ be the family of all triangular arrays $\lambda = \{\lambda_{J,j}, 1 \le j \le J, J \ge 1\}$ with $\lambda_{J,j} \in [0,1]$ for all $j$ and $J$. Let $\{g_j, j \ge 1\}$ be a sequence of independent $N(0,1)$ distributed random variables defined on a common probability space $(\Omega, \mathcal{A}, \mathbf{P})$. We study the tightness properties of the families of random elements
\[ F_{J,f,\lambda}(x) = \frac{1}{\sqrt{J}} \sum_{j=1}^{J} g_j\, f(x + \lambda_{J,j}) \tag{10.6.1} \]
in $L^p(\mathbb{T})$ or $C(\mathbb{T})$. The symbol "+" denotes the addition operation of the additive group $\mathbb{T} = \mathbb{R}/\mathbb{Z} = [0,1)$. Two types of arrays are of special interest: the sequences $\lambda_{J,j} = \lambda_j$ and the array corresponding to randomized Riemann sums, $\lambda_{J,j} = j/J$.

Tightness in $L^p$.

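Let $p \in [1, \infty]$. Put for $f \in L^p(\mathbb{T})$,
\[ \omega_f(u) = \sup_{0\le h\le u} \|f(\cdot + h) - f(\cdot)\|_p. \tag{10.6.2} \]
The modulus of continuity of a function $f \in C(\mathbb{T})$ coincides with that of the space $L^\infty(\mathbb{T})$.

10.6.4 Theorem. Let $p \in [1, \infty)$ and $F$ be a subset of $L^p(\mathbb{T})$. Then $F$ is relatively compact if and only if
\[ \sup_{f\in F} \|f\|_p < \infty \qquad\text{and}\qquad \lim_{u\to0}\ \sup_{f\in F} \omega_f(u) = 0. \]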

This Lp version of the Arzela–Ascoli theorem (for a proof see [Dunford–Schwartz: 1958], p. 298) is a very convenient criterion of tightness, and will not be applied directly. But it helps to better understand the following criterion of tightness of a family of measures. 10.6.5 Theorem. Let p ∈ [1, ∞). A family of random functions with sample paths in Lp (T) is tight if and only if lim sup P{ F > M} = 0 and for any ε > 0 lim sup P{ωf (u) > ε} = 0. u→0 F ∈

M→∞ F ∈

The criterion yields the simplified Gaussian version. 10.6.6 Theorem. Let p ∈ [1, ∞). A family of centered Gaussian random functions with sample paths in Lp (T) is tight if p

(i) sup E F p < ∞ and F ∈

(ii) lim sup E ωf (u)p = 0. u→0 F ∈


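In our case, concerning (i) we have the estimate
\[ \mathbf{E}\|F_{J,f,\lambda}\|_p^p = \mathbf{E} \int_{\mathbb{T}} |F_{J,f,\lambda}|^p\,d\lambda = J^{-p/2} \int_{\mathbb{T}} \mathbf{E}\Big|\sum_{j=1}^{J} f(x + \lambda_{J,j})\,g_j\Big|^p \lambda(dx) = c_p\,J^{-p/2} \int_{\mathbb{T}} \Big(\sum_{j=1}^{J} f^2(x + \lambda_{J,j})\Big)^{p/2} \lambda(dx), \]
with $c_p = \mathbf{E}|g_1|^p$. If $p \ge 2$, the discrete Hölder inequality yields
\[ \sum_{j=1}^{J} f^2(x + \lambda_{J,j}) \le \Big(\sum_{j=1}^{J} |f|^p(x + \lambda_{J,j})\Big)^{2/p} \Big(\sum_{j=1}^{J} 1\Big)^{1-2/p} = \Big(\sum_{j=1}^{J} |f|^p(x + \lambda_{J,j})\Big)^{2/p} J^{1-2/p}. \]
Hence
\[ \mathbf{E}\|F_{J,f,\lambda}\|_p^p \le c_p\,J^{-1} \int_{\mathbb{T}} \sum_{j=1}^{J} |f(x + \lambda_{J,j})|^p\,\lambda(dx) = c_p\,\|f\|_p^p. \tag{10.6.3} \]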
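The only analytic ingredient in the passage to (10.6.3) is the discrete Hölder step $\sum_j a_j^2 \le (\sum_j |a_j|^p)^{2/p} J^{1-2/p}$ for $p \ge 2$. The short Python sketch below, with a hypothetical helper name and arbitrary sample values, evaluates the gap between the two sides.

```python
def holder_gap(values, p):
    """Return RHS - LHS of  sum a_j^2 <= (sum |a_j|^p)^(2/p) * J^(1 - 2/p),  for p >= 2."""
    j = len(values)
    lhs = sum(a * a for a in values)
    rhs = sum(abs(a) ** p for a in values) ** (2.0 / p) * j ** (1.0 - 2.0 / p)
    return rhs - lhs


if __name__ == "__main__":
    sample = [0.3, -1.2, 2.0, 0.0, 0.7]  # stands for the values f(x + lambda_{J,j})
    for p in (2, 3, 4.5, 10):
        print(p, holder_gap(sample, p))
```

The gap vanishes when $p = 2$ or when all $|a_j|$ are equal, so no constant is lost in (10.6.3) for functions of constant modulus.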

The latter inequality serves as a powerful instrument of “closure”. In the case 1 ≤ p < 2, we still have a Hölder estimate p/2

p p/2 p E FJ,f, p ≤ E FJ,f, 2 ≤ E FJ,f, 22 ≤ f p , (10.6.4) which is not always efficient, especially for f ∈ Lp (T)\L2 (T), but will be useful in the counterexample given in Section 10.6.9. We show now that indicator functions fa = χ[0,a) generate tight families in Lp (T), 1 ≤ p < ∞. A closing procedure will enable us to extend this result on the class of arbitrary functions f ∈ Lp , 2 ≤ p < ∞, while for 1 ≤ p < 2 the general result is false. 10.6.7 Theorem. The family of random functions = FJ,fa , , a ∈ [0, 1), λ ∈ , J ∈ N is tight in each Lp (T), 1 ≤ p < ∞. Proof. In order to keep transparent notation for intervals, we may consider, without loss of generality, only the case a ≤ 1/2. During this proof we will use a simplified notation F for FJ,fa , . Our estimates will be uniform over these parameters. We apply Theorem 10.6.6. For the moments, we already have p

p

E F p ≤ cq f q ,

q = max(2, p).

(10.6.5)

This bound is uniform over . Now we pass to the modulus of continuity. Let M ≥ 5 be a fixed integer and let u = M −1 . For each integer k = 0, . . . , M − 1, let tk = k/M


and Ik = [tk , tk + 2a). Then for each x ∈ Ik we have F (x) − F (tk ) = J −1/2 gj − gj := J −1/2 Wk+ (x) − Wk− (x) . j ≤J tk 0 and each J , the intervals of length ωf−1 (r) form a covering of T by sets of diameter not exceeding r with respect to the metric dJ generated by the process FJ . It follows that R3 R R2 log ω−1 (r)dr. log 1/ω−1 (r) dr = log N(T, dJ , r)dr ≤ 0

f

0

0

f

By the change of variables r = ωf (u) and integration by parts, we obtain

R

2

log N (T, dJ , r)dr ≤

0

ωf−1 (R)

log udωf (u)

0

ωf−1 (R) = ωf (u) log u + 0

ωf−1 (R) 0

ωf (u) du. 2u log u

Further, the main contribution comes from the integral term, since the function ωf is monotone and we have for each u, √ u

ωf (u) log u ≤ 2 u

ωf (v) dv. 2v log v

Letting u = ωf−1 (R), we obtain

R 0

2

log N(T, dJ , r)dr ≤ 2 0

ωf−1 (R)

ωf (u) du → 0 2u log u

as R tends to 0. Since the latter bound is uniform over J , the tightness easily follows from the Ascoli–Arzela theorem.

Part IV Three studies

Chapter 11

Riemann sums

The almost sure convergence of Riemann sums of Lebesgue integrable functions has turned out, since the fundamental paper of Rudin, to involve deep arithmetical aspects. The arithmetical characterization of that property is an open and certainly hard question. Riemann sums also have important connections with various problems from number theory, among them the Riemann Hypothesis, through their link with Farey sequences. This chapter provides easy access to the main results of the theory, as well as to the various methods elaborated by their authors. The last two sections are devoted to some recent advances.

11.1 Introduction

In this chapter, we are mainly interested in the study of the almost sure convergence of Riemann sums of Lebesgue integrable functions. We will state and comment on essential results, discuss their links and also give indications of proofs. The final sections are devoted to some recent advances.

The chapter is organized as follows: in Section 11.2 we introduce Jessen's theorem on convergence almost everywhere of Riemann sums along chains of integers. This is likely the first result of the theory. The proof is sketched and comments about its optimality are added. Rudin's theorem is the second fundamental result and shows for instance the irregular behavior of Riemann sums along the sequence of primes. A striking example, derived from this result and Dirichlet's theorem on the distribution of primes in arithmetic progressions, shows that the convergence almost everywhere of Riemann sums along a given sequence definitely depends on the arithmetical structure of that sequence. Section 11.3 is devoted to results of individual type. It is indeed possible to obtain sufficient conditions on the function $f$, sometimes quite sharp, ensuring the convergence almost everywhere of the Riemann sums of $f$. These conditions are often expressed in terms of the integral modulus of continuity of $f$. The results are mainly due to Marcinkiewicz and Salem. In the next section, the concepts of breadth and dimension are introduced and used to establish new convergence results for specific classes of functions. This is continued with Bourgain's approach, which we already discussed in Chapter 6. In Section 11.6 we study the connections of Riemann sums with number theory, and in particular their link (Mikolás' works) with the Riemann Hypothesis through the study of Farey sequences, and with the prime number theorem. Finally, in Sections 11.7 and 11.8, recent results related to the Marcinkiewicz–Zygmund conjecture and square functions of averages of Riemann sums are stated and proved.


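Let $f$ be any measurable function on $\mathbb{T}$. Define for $n = 1, 2, \ldots$ and $x \in \mathbb{T}$ the Riemann sums of $f$ as follows:

\[ R_n(f)(x) = \frac{1}{n} \sum_{j=0}^{n-1} f\Big(x + \frac{j}{n}\Big). \tag{11.1.1} \]

When $x = 0$, we simply write

\[ R_n(f) = \frac{1}{n} \sum_{j=0}^{n-1} f\Big(\frac{j}{n}\Big) \tag{11.1.2} \]

for the usual Riemann sums considered in Section 11.6. We begin with a first important property of Riemann sums. Write for $\ell \in \mathbb{Z}$, $e_\ell(x) = e^{2i\pi \ell x}$. Then for all $n \ge 1$,

\[ R_n(e_\ell)(x) = e_\ell(x)\, \frac{1}{n} \sum_{j=0}^{n-1} e^{2i\pi \ell j/n} = e_\ell(x)\, \delta_{n \mid \ell}. \tag{11.1.3} \]

Hence for $f \in L^2(\mathbb{T})$, $f \sim \sum_{\ell \in \mathbb{Z}} a_\ell e_\ell$, the Riemann sums of $f$ can also be rewritten as

\[ R_n(f) = \sum_{\ell \in \mathbb{Z},\ n \mid \ell} a_\ell e_\ell. \tag{11.1.4} \]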
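Property (11.1.3) is easy to check numerically. The following is a minimal sketch, not part of the original text: the function names `riemann_sum`, `character` and `identity_gap` are illustrative choices, and the sample point $x = 0.37$ is arbitrary.

```python
import cmath

def riemann_sum(f, n, x=0.0):
    """R_n(f)(x) = (1/n) * sum_{j=0}^{n-1} f(x + j/n), as in (11.1.1)."""
    return sum(f(x + j / n) for j in range(n)) / n

def character(ell):
    """The character e_ell(x) = exp(2*i*pi*ell*x)."""
    return lambda x: cmath.exp(2j * cmath.pi * ell * x)

def identity_gap(n, ell, x):
    """|R_n(e_ell)(x) - e_ell(x) * delta_{n | ell}|; should be close to 0."""
    expected = character(ell)(x) if ell % n == 0 else 0.0
    return abs(riemann_sum(character(ell), n, x) - expected)

# Largest deviation over a small grid of (n, ell); only rounding error remains.
worst = max(identity_gap(n, ell, 0.37)
            for n in range(1, 8) for ell in range(-12, 13))
print(worst)
```

The operator $R_n$ thus acts on Fourier series as a projection onto the frequencies divisible by $n$, which is exactly what (11.1.4) expresses.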

We shall comment on this property by means of the infinite Möbius inversion due to Hartman and Wintner [1947: p. 853]. Consider the following two infinite systems of linear equations

\[ \sum_{m=1}^{\infty} x_{nm} = y_n, \quad n = 1, 2, \ldots, \tag{11.1.5} \]

\[ \sum_{m=1}^{\infty} \mu(m)\, y_{nm} = x_n, \quad n = 1, 2, \ldots, \tag{11.1.6} \]

where $\mu(\cdot)$ is the Möbius function, see (11.6.1). If $x_n = O(n^{-1-\eta})$ for some $\eta > 0$, then (11.1.5) has a unique solution, which is given by (11.1.6), namely $x_n = \sum_{m=1}^{\infty} \mu(m) y_{nm}$, $n = 1, 2, \ldots$. Conversely, if $y_n = O(n^{-1-\eta})$ for some $\eta > 0$, then (11.1.6) has a unique solution, which is given by (11.1.5). In our case, this shows that if the Fourier coefficients of $f$ satisfy the condition

\[ a_n = O(|n|^{-1-\eta}) \quad \text{for some } \eta > 0, \tag{11.1.7} \]

then $f$ can be reconstructed from its Riemann sums. More precisely,

\[ a_n e_n(x) = \sum_{m} \mu(m) R_{nm}(f)(x). \tag{11.1.8} \]
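As an illustration of (11.1.8), here is a small numerical sketch. It rests on assumptions that are not in the text: $f$ is taken to be a trigonometric polynomial with positive frequencies only, so that the sum over $m$ is finite (since $R_{nm}(f) = 0$ once $nm$ exceeds the degree), and all names and coefficient values are hypothetical.

```python
import cmath

def mobius(m):
    """Moebius function mu(m), computed by trial division."""
    result, p = 1, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:      # p^2 divides the original m
                return 0
            result = -result
        p += 1
    return -result if m > 1 else result

def make_poly(coeffs):
    """f(x) = sum_ell a_ell * exp(2*i*pi*ell*x) for a dict {ell: a_ell}, ell >= 1."""
    return lambda x: sum(a * cmath.exp(2j * cmath.pi * ell * x)
                         for ell, a in coeffs.items())

def riemann_sum(f, n, x):
    """R_n(f)(x) as in (11.1.1)."""
    return sum(f(x + j / n) for j in range(n)) / n

def recover_term(f, n, degree, x):
    """sum_{m >= 1} mu(m) R_{nm}(f)(x), truncated where R_{nm}(f) vanishes."""
    return sum(mobius(m) * riemann_sum(f, n * m, x)
               for m in range(1, degree // n + 1))

# Hypothetical coefficients a_1, ..., a_6 (a_5 = 0).
coeffs = {1: 1.0, 2: -0.5, 3: 0.25j, 4: 2.0, 6: 1.5}
f = make_poly(coeffs)
x = 0.3
gaps = [abs(recover_term(f, n, 6, x)
            - coeffs.get(n, 0) * cmath.exp(2j * cmath.pi * n * x))
        for n in range(1, 7)]
print(max(gaps))
```

Each Fourier term $a_n e_n(x)$ is recovered from the Riemann sums $R_{nm}(f)(x)$ alone, up to rounding error.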


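11.2 The results of Jessen and Rudin

The problem under consideration can be presented as follows. When $f$ is Riemann integrable on $\mathbb{T}$, for any real $x$,

\[ \lim_{n \to \infty} R_n(f)(x) = \int_{\mathbb{T}} f \, d\lambda. \tag{11.2.1} \]

When $f$ is only Lebesgue integrable, $\{R_n(f), n \ge 1\}$ converges to $\int_{\mathbb{T}} f \, d\lambda$ in the mean. Indeed, let us first consider $f \in L^2(\mathbb{T})$ with Fourier expansion $f \sim \sum_{\ell} a_\ell e_\ell$ and $a_0 = \int_{\mathbb{T}} f \, d\lambda = 0$. As $R_n f = \sum_{n \mid \ell} a_\ell e_\ell$ by (11.1.4), we have

\[ \| R_n(f) \|_2^2 \le \sum_{|\ell| \ge n} a_\ell^2 \to 0 \tag{11.2.2} \]

as $n$ tends to infinity. And so $\lim_{n \to \infty} \| R_n(f) - \int_{\mathbb{T}} f \, d\lambda \|_2 = 0$. Now assume $f \in L^1(\mathbb{T})$ and let $\{f_k, k \ge 1\} \subset L^2(\mathbb{T})$ approximate $f$ in $L^1(\mathbb{T})$. Let $\varepsilon > 0$ be fixed, and choose $k$ large enough such that $\| f_k - f \|_1 \le \varepsilon$. Then

\[ \Big\| R_n(f) - \int_{\mathbb{T}} f \, d\lambda \Big\|_1 \le \| R_n(f) - R_n(f_k) \|_1 + \Big\| R_n(f_k) - \int_{\mathbb{T}} f_k \, d\lambda \Big\|_1 + \Big| \int_{\mathbb{T}} f_k \, d\lambda - \int_{\mathbb{T}} f \, d\lambda \Big|, \]

and since $R_n$ is an $L^1(\mathbb{T})$ contraction,

\[ \Big\| R_n(f) - \int_{\mathbb{T}} f \, d\lambda \Big\|_1 \le \| f - f_k \|_1 + \Big\| R_n(f_k) - \int_{\mathbb{T}} f_k \, d\lambda \Big\|_1 + \Big| \int_{\mathbb{T}} f_k \, d\lambda - \int_{\mathbb{T}} f \, d\lambda \Big| \le 2\varepsilon + \Big\| R_n(f_k) - \int_{\mathbb{T}} f_k \, d\lambda \Big\|_2. \]

Letting $n$ tend to infinity, we obtain

\[ \limsup_{n \to \infty} \Big\| R_n(f) - \int_{\mathbb{T}} f \, d\lambda \Big\|_1 \le 2\varepsilon. \]

Since $\varepsilon$ is arbitrary, we get

\[ \lim_{n \to \infty} \Big\| R_n(f) - \int_{\mathbb{T}} f \, d\lambda \Big\|_1 = 0. \]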

It is natural to inquire about the possible almost everywhere convergence of these sums. A first study was made by Hahn [1914], who considered the approximation of the Lebesgue integral by Riemann sums. A first result was obtained in Jessen [1934: Theorem A]. We introduce a preparatory definition.

11.2.1 Definition. A sequence of positive integers $\{n_k, k \ge 1\}$ is a chain if, for any $k \ge 1$, $n_k \mid n_{k+1}$.


11.2.2 Theorem. Let $\{n_k, k \ge 1\}$ be a chain. Assume that $f \in L^1(\mathbb{T})$. Then

\[ \lim_{k \to \infty} R_{n_k} f(x) = \int_{\mathbb{T}} f \, d\lambda \quad \text{almost everywhere.} \]

As noted by Marcinkiewicz and Salem [1940: Theorem 1], this result is in a certain sense best possible. Indeed, when $S = \{2^n, n \ge 1\}$, to every positive and increasing function $\omega$ such that $\lim_{x \to \infty} \omega(x)/\log x = 0$, a function $f$ can be associated satisfying

\[ \int_{\mathbb{T}} |f| \, \omega(|f|) \, d\lambda < \infty \quad \text{and} \quad \int_{\mathbb{T}} \sup_{s \ge 0} |R_{2^s}(f)| \, d\lambda = \infty. \tag{11.2.3} \]

Jessen's result is based on the following observation: since $f$ is 1-periodic, $R_n(f)$ is $\frac{1}{n}$-periodic for any $n \ge 1$, and thus $\frac{1}{m}$-periodic if $m$ divides $n$. Consequently, since $R_{n_k} f(x)$ is $\frac{1}{n_k}$-periodic for any $k$,

\[ \limsup_{n_k \to \infty} R_{n_k} f(x) = C \]

for almost every $x$, where $C$ is some constant. It suffices in fact that for infinitely many $p$, $n_p$ divides $n_m$ whenever $m$ is large enough. Let $B$ be some fixed real and put $E_k = \{R_{n_k}(f) > B\}$. Then $E_k$ as well as $E_k^c$ are $\frac{1}{n_k}$-periodic. Put $E = \{\sup_{1 \le k \le N} R_{n_k}(f) > B\}$. We have

\[ E = E_N + E_N^c \cap E_{N-1} + E_N^c \cap E_{N-1}^c \cap E_{N-2} + \cdots + E_N^c \cap \cdots \cap E_2^c \cap E_1. \]

Set

\[ A_k = E_N^c \cap \cdots \cap E_{k+1}^c \cap E_k. \]

Then $A_k$ is $\frac{1}{n_k}$-periodic. Thus,

\[ \int_{A_k} f(x) \, dx = \int_{A_k} f\Big(x + \frac{j}{n_k}\Big) dx = \int_{A_k} R_{n_k}(f)(x) \, dx \ge B \lambda(A_k). \]

Consequently, by summing over $k$,

\[ \int_E f \, d\lambda \ge B \lambda(E). \]

Letting $N$ tend to infinity leads to

\[ \int_{\{\sup_{k \ge 1} R_{n_k}(f) > B\}} f \, d\lambda \ge B \lambda\Big\{ \sup_{k \ge 1} R_{n_k}(f) > B \Big\}. \]

If $B < C$, then $\lambda\{\sup_{k \ge 1} R_{n_k}(f) > B\} = 1$. The above relation thus shows

\[ \int_{\mathbb{T}} f \, d\lambda \ge B \cdot 1 = B. \]

Hence $C \le \int_{\mathbb{T}} f \, d\lambda$. Replacing $f$ by $-f$ also gives

\[ \int_{\mathbb{T}} f \, d\lambda \le \liminf_{n_k \to \infty} R_{n_k}(f)(x) \quad \text{almost everywhere,} \]

and the result follows.

Ursell [1937: p. 231] showed that Riemann sums converge almost everywhere along the whole sequence of integers for monotone square summable functions. He also gave a simple example ($f(x) = |x|^{-\delta}$, $1/2 < \delta < 1$) showing that the convergence almost everywhere of Riemann sums of $L^1(\mathbb{T})$ functions does not hold in general. The next result is due to Marcinkiewicz and Zygmund [1937: Theorems 3 and 3′].

11.2.3 Theorem. There exists $f \in L^1(\mathbb{T})$ such that $\limsup_{n \to \infty} R_{2n+1}(f)(x) = \infty$ almost everywhere.

Much later Rudin [1964: p. 322] showed that, even for bounded functions, Riemann sums may not converge almost everywhere.

11.2.4 Theorem. Let $S$ be an increasing sequence of positive integers satisfying the following property: for any $N \ge 1$, there is a set $S_N$ of $N$ elements of $S$, none of which divides the least common multiple (l.c.m.) of the others. Then there is a measurable subset $A$ of $\mathbb{T}$ such that if $f = 1_A$, $\{R_n(f), n \in S\}$ does not converge almost everywhere.

For instance, $S$ can be a sequence of primes. The theorem implies that there is no maximal inequality for Riemann sums. Indeed, otherwise by means of the Banach principle, the set of elements of $L^2(\mathbb{T})$ for which $\{R_n(f), n \ge 1\}$ converges almost everywhere would be closed. And since $\{R_n(f), n \ge 1\}$ does converge almost everywhere for finite linear combinations of the characters $e_n$, this set would also be everywhere dense in $L^2(\mathbb{T})$, thus providing a contradiction.

By combining this theorem with Jessen's result, and using Dirichlet's theorem on primes in arithmetic progressions, Rudin also built a sequence $S = \{n_k, k \ge 1\}$ possessing a striking property. The construction goes as follows. Let $n_1 = 1$ and assume $n_k$ is defined. There exists an integer $r > 1$ such that $q = 1 + r n_k$ is a prime. Then we set $n_{k+1} = r n_k$. On the one hand, by means of Jessen's theorem,

(a) for any $f \in L^1(\mathbb{T})$, $\lambda\{x : \lim_{S \ni n \to \infty} R_n(f)(x) = \int_{\mathbb{T}} f \, d\lambda\} = 1$.

And on the other, by invoking this time Rudin's theorem,

(b) there exists $f \in L^\infty(\mathbb{T})$ such that $\lambda\{x : \lim_{S \ni n \to \infty} R_{n+1}(f)(x) = \int_{\mathbb{T}} f \, d\lambda\} = 0$.

This clearly shows that the problem depends on the arithmetical structure of $S$.

We indicate, before closing this section, a slight generalization of Jessen's result. The fact that for $f \sim \sum_{\ell \in \mathbb{Z}} a_\ell e_\ell$ the Riemann sums of $f$ can be expressed by $R_n(f) = \sum_{n \mid \ell} a_\ell e_\ell$ leads to a natural generalization of the problem in $L^2$-spaces. Assume we are given a fixed set of indices $\mathcal{N}$ together with $\{a_\ell, \ell \in \mathbb{Z}\} \in \ell^2$. Let $\mu$ be a Borel probability measure on $[0, 1]$. Let $\{\psi_\ell, \ell \in \mathbb{Z}\}$ be an orthonormal sequence of $L^2(\mu)$ and define the generalized Riemann sums

\[ R_n = R_n^{(a)} = \sum_{\ell \in \mathbb{Z},\ n \mid \ell} a_\ell \psi_\ell. \]
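Returning to Rudin's sequence $S = \{n_k\}$ built earlier in this section from Dirichlet's theorem, the construction can be sketched as follows. The text only requires some integer $r > 1$ with $1 + r n_k$ prime; taking the smallest such $r$ at each step is an assumption made here for definiteness, and the function names are illustrative.

```python
def is_prime(q):
    """Primality test by trial division (adequate for small q)."""
    if q < 2:
        return False
    d = 2
    while d * d <= q:
        if q % d == 0:
            return False
        d += 1
    return True

def rudin_chain(length):
    """n_1 = 1 and n_{k+1} = r * n_k, with r > 1 minimal such that 1 + r*n_k is prime."""
    seq = [1]
    while len(seq) < length:
        n = seq[-1]
        r = 2
        while not is_prime(1 + r * n):
            r += 1
        seq.append(r * n)
    return seq

S = rudin_chain(7)
print(S)                   # a chain: each term divides the next
print([n + 1 for n in S])  # the shifted sequence consists of primes
```

By construction $S$ is a chain, so Jessen's theorem applies to it, while the shifted sequence $\{n + 1, n \in S\}$ consists of primes, to which Rudin's theorem applies.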

The investigation of the almost everywhere convergence problem of the sums $R_n$ along the index set $\mathcal{N}$, for all orthonormal systems, simultaneously generalizes the study of the convergence almost everywhere of Riemann sums and that of orthogonal series. It is naturally quite a hard task, since even for chains the periodicity argument used for proving the convergence of Riemann sums is no longer available for arbitrary orthogonal systems. A slight extension of Jessen's theorem can however be obtained.

11.2.5 Theorem. Let $\mathcal{N} = \{n_k, k \ge 1\}$ be a chain and put

\[ E_k = \{n : n_k \mid n\}, \quad F_k = E_k \setminus E_{k+1} \quad \text{and} \quad \delta_k^2 = \sum_{n \in F_k} a_n^2. \]

If for some $\varepsilon > 0$ the series

\[ \sum_{n \ge 1} \delta_n^2 \Big(\log \frac{1}{\delta_n}\Big)^{2} \Big(\log\log \frac{1}{\delta_n}\Big)^{2} \Big(\log\log\log \frac{1}{\delta_n}\Big)^{2+\varepsilon} \]

converges, then the sequence $(R_n, n \in \mathcal{N})$ converges almost everywhere.

Notice that the latter condition is of the same type as in Marcinkiewicz–Salem [1940] (see e.g., condition (11.3.10)). Extensions of Jessen's theorem for locally compact groups were also obtained by Ross–Stromberg [1967] and more recently by Ross–Willis [1997]. A generalization of Jessen's theorem to one-parameter groups of measure-preserving transformations was given in Civin [1955]. Let $T(\varepsilon)$ be such a group. If $f$ is an integrable function satisfying $f(s) = f(T(1)s)$, then the result asserts that the sequence of sums $f_n(s) = 2^{-n} \sum_{i=1}^{2^n} f(T(i 2^{-n}) s)$ converges almost everywhere as $n \to \infty$.

To conclude this section, let us also mention that an approach to convergence of Riemann sums using ultrafilters was proposed by Witt (see [Mühlbach: 1962]). This was pointed out to us by Wefelscheidt.

11.3 Individual theorems of spectral type

The main contributions are due to Marcinkiewicz and Salem [1940]. Various types of results are presented here, leading to deep insight. Compared with the preceding section, the approach developed is different. The authors studied regularity assumptions on $f$ under which the associated sequence of Riemann sums converges almost everywhere. The conditions are often expressed in terms of the integral modulus of continuity of $f$. For instance:

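11.3.1 Theorem. Under the condition

\[ \int_{\mathbb{T}} \big[ f(x + t) - f(x) \big]^2 dx = O(t^{\varepsilon}) \quad (\varepsilon > 0), \tag{11.3.1} \]

the sequence $\{R_n(f), n \ge 1\}$ converges a.e. to $\int_{\mathbb{T}} f \, d\lambda$.

Indeed let $f(x) \sim \sum_{\nu \in \mathbb{Z}} a_\nu e^{2i\pi \nu x}$ with $a_0 = \int_{\mathbb{T}} f \, d\lambda = 0$. Then

\[ R_n f(x) = \sum_{\nu \in \mathbb{Z},\ n \mid \nu} a_\nu e^{2i\pi \nu x} = \sum_{\ell \in \mathbb{Z}} a_{n\ell} e^{2i\pi n \ell x}. \]

Thus

\[ \int_{\mathbb{T}} |R_n f|^2 d\lambda = \sum_{\ell \in \mathbb{Z}} |a_{n\ell}|^2, \]

and

\[ \sum_{n \ge 1} \int_{\mathbb{T}} R_n^2 f \, d\lambda = \sum_{n \ge 1} \sum_{\ell \in \mathbb{Z}} |a_{n\ell}|^2 = \sum_{\nu \in \mathbb{Z}} a_\nu^2 \, d(|\nu|), \]

where $d(k)$ is the number of divisors of $k$. But for all $\delta > 0$ (Hardy–Wright [1979: Theorem 315])

\[ d(k) = O(k^{\delta}). \]

Therefore the series $\sum_{n \ge 1} \int_{\mathbb{T}} R_n^2 f \, d\lambda$ converges once we know that $\sum_{\nu \in \mathbb{Z}} a_\nu^2 |\nu|^{\delta}$ converges for some $\delta > 0$. Now by condition (11.3.1) the integral

\[ \int_{\mathbb{T}} \int_{\mathbb{T}} \frac{|f(x + t) - f(x - t)|^2}{t^r} \, dt \, dx \]

converges if $r < 1 + \varepsilon$. Further, by the Parseval relation we have

\[ \int_{\mathbb{T}} |f(x + t) - f(x - t)|^2 dx = 4 \sum_{\nu \in \mathbb{Z}} a_\nu^2 \sin^2(2\pi \nu t), \]

so that

\[ \int_{\mathbb{T}} \int_{\mathbb{T}} \frac{|f(x + t) - f(x - t)|^2}{t^r} \, dt \, dx = 4 \sum_{\nu \in \mathbb{Z}} a_\nu^2 \int_{\mathbb{T}} \frac{\sin^2 2\pi \nu t}{t^r} \, dt \ge C \sum_{\nu \in \mathbb{Z}} a_\nu^2 |\nu|^{r-1}. \]

Consequently

\[ \sum_{n \ge 1} \int_{\mathbb{T}} R_n^2 f(x) \, dx < \infty, \]

which easily leads to $R_n f(x) \to 0$ for almost all $x$, and this is exactly the assertion of Theorem 11.3.1. When replacing Riemann sums by their averages

\[ A_n(f) = \frac{1}{n} \sum_{k=1}^{n} R_k(f), \quad n = 1, 2, \ldots, \tag{11.3.2} \]

assumption (11.3.1) can be essentially weakened.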
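The divisor-counting identity $\sum_{n \ge 1} \sum_{\ell} |a_{n\ell}|^2 = \sum_{\nu} a_\nu^2 \, d(|\nu|)$ used in the proof of Theorem 11.3.1 can be checked on a finite set of coefficients. The sketch below is only illustrative: the coefficient values are hypothetical, $a_0 = 0$ is assumed as in the proof, and integer coefficients are used so that the comparison is exact.

```python
def num_divisors(k):
    """d(k): number of positive divisors of k >= 1."""
    return sum(1 for d in range(1, k + 1) if k % d == 0)

def sum_of_squared_norms(coeffs):
    """sum_{n >= 1} ||R_n f||_2^2 = sum_n sum_{ell : n | ell} |a_ell|^2."""
    top = max(abs(ell) for ell in coeffs)
    return sum(abs(a) ** 2
               for n in range(1, top + 1)
               for ell, a in coeffs.items() if ell % n == 0)

def divisor_weighted_sum(coeffs):
    """sum_nu |a_nu|^2 * d(|nu|)."""
    return sum(abs(a) ** 2 * num_divisors(abs(ell)) for ell, a in coeffs.items())

# Hypothetical nonzero Fourier coefficients {frequency: a_frequency}, with a_0 = 0.
coeffs = {1: 2, -1: 1, 6: 3, -4: 1, 12: 2}
lhs = sum_of_squared_norms(coeffs)
rhs = divisor_weighted_sum(coeffs)
print(lhs, rhs)
```

Each frequency $\nu$ is counted once for every $n$ dividing it, which is where the divisor function $d(|\nu|)$ comes from.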


11.3.2 Theorem (Marcinkiewicz–Salem [1940]). Under the condition

\[ \int_{\mathbb{T}} \int_{\mathbb{T}} \frac{|f(x + t) - f(x)|^2}{t \log \frac{2}{t}} \, dt \, dx < \infty \tag{11.3.3} \]

the sequence $\{A_n(f), n \ge 1\}$ converges almost everywhere to $\int_{\mathbb{T}} f \, d\lambda$.

Note that condition (11.3.3) is satisfied if for instance

\[ \int_{\mathbb{T}} |f(x + t) - f(x)|^2 dx = O\Big( \frac{1}{\log^2 |\log t|} \Big), \tag{11.3.4} \]

which is essentially less restrictive than (11.3.1). The authors conjectured that $\{A_n(f), n \ge 1\}$ converges almost everywhere for every $f \in L^2(\mathbb{T})$. This famous conjecture is still unsolved. In support of it, Bourgain provided an affirmative answer for the logarithmic averages (see Theorem 11.5.1).

Marcinkiewicz and Salem also observed the arithmetical nature of the problem. Let $f = \sum_{p \text{ prime}} c_p e_p$ with $c_p \to 0$ as $p$ tends to infinity. Then $R_n(f)(x) = 0$ almost everywhere if $n$ is not a prime and $R_n(f)(x) = c_n e_n + c_{-n} e_{-n}$ otherwise. Consequently $R_n(f)(x) \to 0$ uniformly, outside a measurable set of zero measure. But we may have $f$ essentially bounded in no interval, which is rather surprising. Note also that if $f(x) \sim \sum_{\ell \in \mathbb{Z}} c_\ell e^{2i\pi \ell x}$ with $c_0 = 0$,

\[ \sum_{p \text{ prime}} \int_{\mathbb{T}} |R_p f(x)|^2 dx \le \sum_{\nu \in \mathbb{Z}} |c_\nu|^2 \omega(\nu), \]

where $\omega(\nu)$ is the number of primes dividing $\nu$. Since $\omega(\nu) = O\big( \frac{\log \nu}{\log\log \nu} \big)$, it follows that $R_p(f)(x) \to 0$ almost everywhere whenever

\[ \sum_{|\nu| \ge 3} |c_\nu|^2 \frac{\log |\nu|}{\log\log |\nu|} < \infty. \tag{11.3.5} \]

The latter condition is satisfied in particular if

\[ \int_{\mathbb{T}} \big[ f(x + t) - f(x) \big]^2 dx = O\Big( \frac{1}{\log^2 \frac{1}{t}} \Big), \tag{11.3.6} \]

which is a much weaker condition than (11.3.1). We also mention the following criterion due to Salem [1948: p. 60], providing a sufficient condition for the convergence almost everywhere of Riemann sums $R_{n_k}(f)$ along a given sequence of integers $\{n_k, k \ge 1\}$, when the integral modulus of continuity of $f$ is sufficiently smooth.

11.3.3 Theorem. Assume that for some $\varepsilon > 0$,

\[ \int_{\mathbb{T}} |f(x + t) - f(x)| \, dx = O\Big( \frac{1}{|\log t|^{1+\varepsilon}} \Big). \tag{11.3.7} \]

Let $\{n_k, k \ge 1\}$ be an increasing sequence of positive integers such that, for some $\delta < \varepsilon$,

\[ \sum_{k \ge 1} \Big( \frac{1}{\log n_k} \Big)^{1+\delta} < \infty. \tag{11.3.8} \]

Then $\lim_{k \to \infty} R_{n_k} f(x) = \int_{\mathbb{T}} f \, d\lambda$ almost everywhere.

11.4 Breadth and dimension

These results are essentially due to Baker [1976], Dubins–Pitman [1979], Révész–Ruzsa [1991] and Bugeaud–Weber [1998]. We begin with a preparatory definition.

11.4.1 Definition. Let $A \subset L^1(\mathbb{T})$. A sequence $S = \{n_k, k \ge 1\}$ of positive integers is called an $\hat{A}$-sequence if for every $f \in A$,

\[ \lim_{k \to \infty} R_{n_k}(f) = \int_{\mathbb{T}} f \, d\lambda \quad \text{almost everywhere.} \]

In this section we write $L = L^1(\mathbb{T})$ and $M = L^\infty(\mathbb{T})$. Given two arbitrary sequences of positive integers $S_1$ and $S_2$, we also write $S_1 \vee S_2$ for the new sequence obtained by ordering (according to the natural order) the set $\{[s_1, s_2], s_1 \in S_1, s_2 \in S_2\}$, where as usual $[s_1, s_2]$ is the least common multiple of $s_1$ and $s_2$.

11.4.2 Theorem (Baker [1976]). If $S_1 = \{m_k, k \ge 1\}$ and $S_2 = \{n_k, k \ge 1\}$ are two $\hat{M}$-sequences, then $S_1 \vee S_2$ is again an $\hat{M}$-sequence.

The proof relies upon the fact that

\[ R_m(R_n(f)) = R_{[m,n]}(f). \tag{11.4.1} \]
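Relation (11.4.1) holds for any 1-periodic function and can be tested numerically. The following sketch uses an arbitrary smooth 1-periodic test function `f` (a hypothetical choice) and compares both sides at a sample point.

```python
import math

def riemann_sum(f, n):
    """Return the function x -> R_n(f)(x) = (1/n) * sum_{j<n} f(x + j/n)."""
    return lambda x: sum(f(x + j / n) for j in range(n)) / n

def f(x):
    """An arbitrary smooth 1-periodic test function (hypothetical choice)."""
    return math.exp(math.cos(2 * math.pi * x)) + math.sin(6 * math.pi * x)

def composition_gap(m, n, x):
    """|R_m(R_n f)(x) - R_[m,n] f(x)|, where [m,n] is the least common multiple."""
    lcm = m * n // math.gcd(m, n)
    lhs = riemann_sum(riemann_sum(f, n), m)(x)
    rhs = riemann_sum(f, lcm)(x)
    return abs(lhs - rhs)

gaps = [composition_gap(m, n, 0.137) for (m, n) in [(4, 6), (3, 5), (6, 6), (2, 8)]]
print(max(gaps))
```

The reason is that the points $i/m + j/n$ (mod 1) run through the multiples of $1/[m,n]$, each one attained exactly $\gcd(m, n)$ times.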

Recall the notion of $\Sigma$-sequences introduced by Cassels [1950].

11.4.3 Definition. Let $\mu_k$ be the number of fractions $\frac{j}{m_k}$ ($0 < j < m_k$) which are not equal to $\frac{l}{m_q}$ ($l$ integer, $q < k$). We say that $\{m_k, k \ge 1\}$ is a $\Sigma$-sequence if the following condition is satisfied:

\[ \liminf_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} \frac{\mu_k}{m_k} > 0. \]

The interest of this notion lies in the fact that if $\{m_k, k \ge 1\}$ is a $\Sigma$-sequence, then the system of inequalities

\[ \{m_k x\} < \psi(k) \quad (k = 1, 2, \ldots), \]

where $\psi$ is a nonincreasing function, admits infinitely many solutions for almost all $x$ when the series $\sum_{k \ge 1} \psi(k)$ diverges. Conversely, there exists an example of a decreasing function $\psi$ such that the series $\sum_{k \ge 1} \psi(k)$ is convergent, and for which the previous system of inequalities has only finitely many solutions for almost all $x$. Baker's proof is partially based on this property. It is also interesting to observe that almost all sequences are $\Sigma$-sequences, although it is easy to exhibit some which are not. We mention a second result due to Baker [1976: Theorem 3.1].

11.4.4 Theorem. Let $\{m_k, k \ge 1\}$ be a $\Sigma$-sequence with $\liminf_{k \to \infty} k^{-1} \log m_k = 0$. Then $\{m_k, k \ge 1\}$ is not an $\hat{L}$-sequence.

Baker, however, suggested that the assumption of $\{m_k, k \ge 1\}$ being a $\Sigma$-sequence is not likely to be well adapted to this problem, and also established the following remarkable result [1976: Theorem 3.2].

11.4.5 Theorem. Let $\varepsilon > 0$. Assume that $\{m_k, k \ge 1\}$ is a sequence such that:

\[ \forall k \ge 1, \quad m_k = O\Big( \exp\big( k^{1/2} (\log k)^{-7/2 - \varepsilon} \big) \Big). \]

Then $\{m_k, k \ge 1\}$ is not an $\hat{L}$-sequence.

Now we introduce a generalization of the notion of a chain used by Dubins–Pitman [1979]. For sets of positive integers $S_1, \ldots, S_d$, put

\[ [S_1, \ldots, S_d] = \big\{ [n_1, \ldots, n_d] : n_i \in S_i, \ i = 1, \ldots, d \big\}. \tag{11.4.2} \]

Let $S$ be a set of positive integers. The dimension of $S$ is the least positive integer $d$ such that $S$ is a subset of $[S_1, \ldots, S_d]$ for some choice of chains $S_1, \ldots, S_d$. Jessen's theorem was extended by Dubins and Pitman, who proved

11.4.6 Theorem. If $S$ has dimension $d$ and $f \in L(\log^+ L)^{d-1}$, then

\[ \lambda\Big\{ x : \lim_{S \ni n \to \infty} R_n(f)(x) = \int_{\mathbb{T}} f \, d\lambda \Big\} = 1. \tag{11.4.3} \]

Here $L(\log^+ L)^{d-1}$ denotes the set of Lebesgue measurable functions $f$ on $\mathbb{T}$ such that

\[ \int_{\mathbb{T}} |f| (\log^+ |f|)^{d-1} d\lambda < \infty, \]

where it is understood that $\log^+ x = \log_e x$ if $x \ge 1$ and equals 0 for $0 < x \le 1$. A partial result ($d = 2$, $f$ bounded) was proved in Baker [1976]. The proof of that result consists of associating to the sequence $S$ a converse $d$-martingale bounded in $L \log^{d-1} L$. The result then follows from a suitable extension to converse martingales of a maximal inequality for martingales with several parameters. By considering the sequence of dimension two $S = \{2^i 3^j, i \ge 1, j \ge 1\}$, the authors also showed that it is not possible to improve Theorem 11.4.6, replacing $L \log L$ by $L$.


Nair [1995] suggested a more elementary proof avoiding the use of martingale theory. His argument is based on dominated estimates, Baker's observation on property (11.4.1) for Riemann sums, and an induction argument on the dimension of $S$. In [Bugeaud–Weber: 1998] it is shown that for no $d \ge 2$ can $L(\log^+ L)^{d-1}$ in Theorem 11.4.6 be replaced by $L(\log^+ L)^{d-2}$, which solves a conjecture by Dubins and Pitman [1979]. For $d = 2$, this assertion is due to Baker. The proof of the general case consists of modifications of Baker's arguments, which are based on an elementary but rather technical lemma. Recall a notion introduced in Dubins–Pitman [1979].

11.4.7 Definition. We say that a set $K$ of integers has breadth at most $d$ if the least common multiple of every finite subset of $K$ is the least common multiple of at most $d$ elements of that subset. The least such $d$ is called the breadth of $K$ and, if no such $d$ exists, we say that $K$ has infinite breadth.

Rudin's theorem can be reformulated as follows: if $\{n_k, k \ge 1\}$ is a strictly increasing sequence of integers with infinite breadth, there exist bounded measurable functions $f$ on $\mathbb{T}$ such that $\{R_{n_k} f, k \ge 1\}$ does not converge almost everywhere. Indeed, as $\{n_k, k \ge 1\}$ has infinite breadth, for every $r \ge 2$ there exist $k_1, \ldots, k_r$ such that $n_{k_i}$ does not divide the least common multiple of $n_{k_1}, \ldots, n_{k_{i-1}}, n_{k_{i+1}}, \ldots, n_{k_r}$, for $1 \le i \le r$.

There exist sets of integers which are neither of infinite breadth nor of finite dimension, and consequently the almost everywhere convergence properties of Riemann sums along these sets are unknown. Such a sequence has been given explicitly by Dubins–Pitman [1979: Section 3b]. Let $p_1 < p_2 < \cdots$ be the sequence of consecutive primes and consider the set $E_1$ of all numbers of the type $p_1 \cdots p_{j-1} \check{p}_j p_{j+1} \cdots p_k$, for $k \ge 2$ and $1 \le j \le k$, where the symbol $\check{\ }$ means that $p_j$ is excluded.
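For a finite set of integers, the breadth of Definition 11.4.7 can be computed by brute force. The sketch below is illustrative only (exponential in the size of the set, and the function names are hypothetical): for each subset it finds the smallest number of its elements having the same least common multiple, and returns the largest such number.

```python
from functools import reduce
from itertools import combinations
from math import gcd

def lcm(values):
    """Least common multiple of an iterable of positive integers."""
    return reduce(lambda a, b: a * b // gcd(a, b), values, 1)

def breadth(K):
    """Breadth of a finite, nonempty set K of positive integers (brute force)."""
    K = sorted(set(K))
    best = 1
    for size in range(1, len(K) + 1):
        for subset in combinations(K, size):
            target = lcm(subset)
            # Smallest d such that some d elements of the subset already have lcm = target.
            need = next(d for d in range(1, size + 1)
                        if any(lcm(c) == target for c in combinations(subset, d)))
            best = max(best, need)
    return best

chain = [2, 4, 8, 16]
primes = [2, 3, 5, 7]
two_dim = [2**i * 3**j for i in range(1, 4) for j in range(1, 4)]
# Truncation of E_1 to k <= 4: products p_1...p_k with one prime p_j omitted.
e1_truncated = [3, 2, 15, 10, 6, 105, 70, 42, 30]
print(breadth(chain), breadth(primes), breadth(two_dim), breadth(e1_truncated))
```

A chain has breadth 1, a set of $r$ distinct primes has breadth $r$, and the truncations of $\{2^i 3^j\}$ and of $E_1$ used here both have breadth 2.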
In [Bugeaud–Weber: 1998], for any fixed $d$, a sequence $\{n_k, k \ge 1\}$ with infinite dimension and finite breadth is constructed which is not an $L(\log^+ L)^d$-sequence. The construction goes as follows: let $l$ be a positive integer. With the above notation, consider the set $E_l$ of all integers $n$, arranged in increasing order, such that

\[ n = p_1^{a_1} \cdots p_{j-1}^{a_{j-1}} \check{p}_j p_{j+1} \cdots p_k, \]

for $k \ge 2$, $1 \le j \le k$ and $l \ge a_1 \ge \cdots \ge a_{j-1} \ge 1$. Then $E_l$ has infinite dimension and breadth not exceeding $l + 1$. The proof uses the following extension of a theorem of Baker.

11.4.8 Lemma. If the sequence $\{n_k, k \ge 1\}$ satisfies the growth condition

\[ n_k = O\big( \exp( k^{1/(2d+5)} ) \big), \]

then $\{n_k, k \ge 1\}$ is not an $L(\log^+ L)^d$-sequence.

The same paper also contains the following result concerning the sequence $E_1$ (of finite breadth and infinite dimension).


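11.4.9 Proposition. Let $f \sim \sum_{\nu=0}^{\infty} a_\nu e_\nu$, where $\{a_\nu, \nu \ge 0\}$ satisfies

\[ \sum_{\nu=0}^{\infty} a_\nu^2 \frac{\log \nu}{\log\log \nu} < \infty. \]

Then

\[ \lambda\Big\{ \lim_{E_1 \ni n \to \infty} R_n(f) = \int_{\mathbb{T}} f \, d\lambda \Big\} = 1. \]

Concerning averaging along $E_1$, writing $E_1 = \{n_k, k \ge 1\}$,

\[ \lambda\Big\{ \lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} R_{n_k}(f) = \int_{\mathbb{T}} f \, d\lambda \Big\} = 1 \]

holds for all $f \in L^2(\mathbb{T})$.

Proof. Let $t > 0$ and $k_0$ be fixed. Then

\begin{align*}
\lambda\Big\{ \sup_{\substack{1 \le j \le k+1 \\ k \ge k_0}} \big| R_{p_1 \cdots \check{p}_j \cdots p_{k+1}}(f) - R_{p_1 \cdots p_{k+1}}(f) \big| > t \Big\}
&\le \frac{1}{t^2} \sum_{\substack{1 \le j \le k+1 \\ k \ge k_0}} \int \Big| \sum_{\substack{p_1 \cdots \check{p}_j \cdots p_{k+1} \mid \ell \\ (p_j, \ell) = 1}} a_\ell e_\ell \Big|^2 d\lambda \\
&\le \frac{1}{t^2} \sum_{\substack{1 \le j \le k+1 \\ k \ge k_0}} \ \sum_{\substack{p_1 \cdots \check{p}_j \cdots p_{k+1} \mid \ell \\ (p_j, \ell) = 1}} a_\ell^2.
\end{align*}

Given an arbitrary number $\ell$, if $k_2 > k_1 \ge k_0$ are such that

\[ p_1 \cdots \check{p}_{j_1} \cdots p_{k_1+1} \mid \ell, \quad p_{j_1} \nmid \ell, \qquad p_1 \cdots \check{p}_{j_2} \cdots p_{k_2+1} \mid \ell, \quad p_{j_2} \nmid \ell, \]

then $j_1 = j_2$. Defining thus $k(\ell)$ as being the index corresponding to the smallest $j$ such that $p_j$ does not divide $\ell$, we get

\[ \lambda\Big\{ \sup_{\substack{1 \le j \le k+1 \\ k \ge k_0}} \big| R_{p_1 \cdots \check{p}_j \cdots p_{k+1}}(f) - R_{p_1 \cdots p_{k+1}}(f) \big| > t \Big\} \le \frac{1}{t^2} \sum_{\ell \ge p_1 \cdots p_{k_0}} (k(\ell) - k_0 - 1) \, a_\ell^2. \]

But $\ell \ge p_1 \cdots p_{k(\ell)-2}$, which gives

\[ k(\ell) = O\Big( \frac{\log \ell}{\log\log \ell} \Big), \]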

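and allows us to conclude the first half of the proposition. Concerning the second half, observe that

\begin{align*}
\int \Big( \frac{1}{N^2} \sum_{\substack{j \le k+1 \\ k \le N}} \big[ R_{p_1 \cdots p_{k+1}}(f) - R_{p_1 \cdots \check{p}_j \cdots p_{k+1}}(f) \big] \Big)^2 d\lambda
&= \frac{1}{N^4} \sum_{\ell=0}^{\infty} a_\ell^2 \Big( \# \big\{ j \le k+1, \ k \le N : p_j \nmid \ell, \ p_1 \cdots \check{p}_j \cdots p_{k+1} \mid \ell \big\} \Big)^2 \\
&\le \frac{1}{N^4} \sum_{\ell=0}^{\infty} a_\ell^2 \, (N - k(\ell))^2 \le \frac{1}{N^2} \sum_{\ell=0}^{\infty} a_\ell^2.
\end{align*}

Therefore

\[ \sum_{N \ge 1} \int \Big( \frac{1}{N^2} \sum_{\substack{j \le k+1 \\ k \le N}} \big[ R_{p_1 \cdots p_{k+1}}(f) - R_{p_1 \cdots \check{p}_j \cdots p_{k+1}}(f) \big] \Big)^2 d\lambda < \infty, \]

which, combined with Jessen's theorem, implies

\[ \lim_{N \to \infty} \frac{1}{N^2} \sum_{\substack{j \le k+1 \\ k \le N}} R_{p_1 \cdots \check{p}_j \cdots p_{k+1}}(f) = \int_{\mathbb{T}} f \, d\lambda, \]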

and this easily allows us to get the second half of the proposition.

Révész and Ruzsa [1991] considered this problem in a wider arithmetical setting, independently of the works of Baker and Dubins–Pitman. The following notion is introduced.

11.4.10 Definition. A sequence $S$ of positive integers has Rudin-dimension $d$ when there exist sets $S_l = \{n_{k_1}, \ldots, n_{k_l}\} \subset S$ such that

\[ \forall i \in [1, l], \quad n_{k_i} \nmid [n_{k_1}, \ldots, n_{k_{i-1}}, n_{k_{i+1}}, \ldots, n_{k_l}], \]

if and only if $l \le d$.

Then a sequence of Rudin-dimension 1 is a chain, whereas a sequence of infinite Rudin-dimension is simply a Rudin sequence, namely a sequence satisfying the requirement of Theorem 11.2.4. That notion is in fact equivalent to the notion of breadth, since a sequence $S$ is of finite Rudin-dimension $d$ if and only if it has breadth equal to $d$.

11.4.11 Theorem. If $S_1$ and $S_2$ have Rudin-dimension $\alpha$ and $\beta$ respectively, then the Rudin-dimension $\gamma$ of the sequence $S_1 \vee S_2$ satisfies $\gamma \le \alpha + \beta$.


Since one can find sequences for which the latter inequality is in fact an equality, the result is also optimal. Observe that a sequence of integers which is built from a given set of $d$ primes is of Rudin-dimension $d$. One could believe, in view of this result, that any sequence with large dimension can be built by means of sequences of smaller dimension. This, however, is not true. Révész and Ruzsa indeed showed the existence of a sequence of dimension 3 which cannot be represented by means of a finite number of chains. The proof is based on Van der Waerden's theorem. Révész and Ruzsa [1991] also established the following remarkable result.

11.4.12 Theorem. Let $S$ be a sequence of integers with Rudin-dimension equal to $d$. If $S(n) = \#([1, n] \cap S)$, then there exists a positive constant $C$ such that for all $n \ge 1$,

\[ S(n) < C (\log n)^d. \]

11.5 Bourgain's results

The metric entropy criteria of Bourgain [1988a] were studied in detail in Chapter 6. Rudin's theorem can be deduced from Corollary 6.1.8. Indeed, for every $r \ge 2$, there exist $k_1, \ldots, k_r$ such that for $1 \le i \le r$, $n_{k_i}$ does not divide the least common multiple of $n_{k_1}, \ldots, n_{k_{i-1}}, n_{k_{i+1}}, \ldots, n_{k_r}$. Hence, there are distinct primes $p_1, \ldots, p_r$ such that

\[ v_{p_i}(n_{k_i}) > v_{p_i}(n_{k_j}) \quad \text{whenever } i \ne j, \]

where $v_p$ denotes the $p$-adic valuation. Put $N = \mathrm{lcm}(n_{k_1}, \ldots, n_{k_r}) / (p_1 \cdots p_r)$ and notice that $n_{k_i}$ does not divide $N$ for $1 \le i \le r$. Consider the set of integers

\[ E = \big\{ n = N p_1^{\alpha_1} \cdots p_r^{\alpha_r} : \alpha_i \in \{0, 1\} \big\} \]

and the function

\[ f = \frac{1}{\sqrt{2^r}} \sum_{n \in E} e^{2i\pi n x}. \]

Then

\[ R_{n_{k_s}}(f) = \frac{1}{\sqrt{2^r}} \sum_{n \in E \cap N p_s \mathbb{N}} e^{2i\pi n x}, \]

and for $1 \le s \ne t \le r$,

\[ \big\| R_{n_{k_s}}(f) - R_{n_{k_t}}(f) \big\|_2 = \frac{1}{\sqrt{2}}. \]

Thus $C(1/\sqrt{2}) = \infty$ and this completes the proof. Akcoglu–Bellow–Jones–Losert–Reinhold-Larsson–Wierdl [1996: Theorem A.2] showed that the strong sweeping out property also takes place there. Slight extensions of Rudin's result are given in Ruch [1997], [1998a], Ruch and Weber [1997].
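The construction above can be followed on a concrete example. The sketch below takes the hypothetical triple $(12, 18, 50)$, none of which divides the least common multiple of the other two, finds the primes $p_i$, builds $N$ and $E$, and computes the squared $L^2$ distances exactly. It uses only $R_m(e_n) = e_n \delta_{m \mid n}$ and the orthonormality of the characters; the function names are illustrative.

```python
from fractions import Fraction
from functools import reduce
from itertools import product
from math import gcd, prod

def lcm_all(values):
    """Least common multiple of a list of positive integers."""
    return reduce(lambda a, b: a * b // gcd(a, b), values, 1)

def valuation(p, n):
    """p-adic valuation v_p(n)."""
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return v

def prime_factors(n):
    """Set of primes dividing n."""
    out, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            out.add(p)
            n //= p
        p += 1
    if n > 1:
        out.add(n)
    return out

def distinguishing_prime(i, ns):
    """A prime p with v_p(ns[i]) > v_p(ns[j]) for every j != i."""
    for p in sorted(prime_factors(ns[i])):
        if all(valuation(p, ns[i]) > valuation(p, ns[j])
               for j in range(len(ns)) if j != i):
            return p
    raise ValueError("ns[i] divides the lcm of the other elements")

def squared_distances(ns):
    """Exact values of ||R_{n_s} f - R_{n_t} f||_2^2 for the function f built from ns."""
    r = len(ns)
    primes = [distinguishing_prime(i, ns) for i in range(r)]
    N = lcm_all(ns) // prod(primes)
    E = [N * prod(p**a for p, a in zip(primes, alphas))
         for alphas in product((0, 1), repeat=r)]
    result = {}
    for s in range(r):
        for t in range(s + 1, r):
            # Frequencies of f kept by exactly one of R_{n_s}, R_{n_t}.
            count = sum(1 for n in E if (n % ns[s] == 0) != (n % ns[t] == 0))
            result[(s, t)] = Fraction(count, 2**r)
    return result

dist = squared_distances([12, 18, 50])
print(dist)
```

Every pair is at squared distance $1/2$, that is, at distance $1/\sqrt{2}$, in agreement with the computation above.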

For instance, if $p_1 < p_2 < \cdots$ is a sequence of primes and $\lambda_1, \lambda_2, \ldots$ is a sequence of positive reals such that $\sigma_N = \sum_{k=1}^{N} \lambda_k \uparrow \infty$, there exists $f \in L^\infty(\mathbb{T})$ such that the averages

\[ B_N(f) = \frac{1}{\sigma_N} \sum_{i=1}^{N} \lambda_i R_{p_i}(f), \quad N = 1, 2, \ldots \]

do not converge almost everywhere.

Related to Marcinkiewicz–Salem's conjecture, Bourgain [1988d: Theorem 1.10] proved the following beautiful result, already mentioned right after Theorem 11.3.2.

11.5.1 Theorem. For any $f \in L^2(\mathbb{T})$,

\[ \lim_{N \to \infty} \frac{1}{\log N} \sum_{n=1}^{N} \frac{1}{n} R_n(f) = \int_{\mathbb{T}} f \, d\lambda \quad \text{almost everywhere.} \]

Sketch of proof. The proof consists of proving the maximal inequality

\[ \Big\| \sup_{N = 2^{2^s}} \Big| \frac{1}{N} \sum_{n=1}^{N} R_n(f) \Big| \Big\|_2 \le C \| f \|_2. \]

Let $f \in L^2(\mathbb{T})$, $f(t) \sim \sum_{k \ge 1} \hat{f}(k) e^{2i\pi k t}$; then

\[ \frac{1}{N} \sum_{n=1}^{N} R_n f(t) = \sum_{k \ge 1} \frac{d(k, N)}{N} \hat{f}(k) e^{2i\pi k t}, \]

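where $d(k, N) = \#\{1 \le n \le N : n \mid k\}$. Let

\[ \chi_n(k) = \begin{cases} 1 & \text{if } n \mid k, \\ 0 & \text{otherwise.} \end{cases} \]

Define $P^* = \{p^j, \ p \text{ prime}, \ j \ge 1\}$. Notice that

\[ \chi_n(k) = \prod_{\substack{p^j, \ j \ge 1 \\ p^j \nmid k}} (1 - \chi_{p^j}(n)) = \prod_{\substack{v \in P^* \\ v \nmid k}} (1 - \chi_v(n)). \]

Thus

\[ d(k, N) = \sum_{n=1}^{N} \chi_n(k) = \sum_{n=1}^{N} \prod_{\substack{v \in P^* \\ v \nmid k}} (1 - \chi_v(n)). \]

Consider the multipliers

\[ \mu_k^{(N)} = \prod_{\substack{v \in P^*, \ v \le N \\ v \nmid k}} (1 - v^{-1}) = \prod_{\substack{v \in P^* \\ v \le N}} \big[ (1 - v^{-1}) + v^{-1} \chi_v(k) \big]. \]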
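The product formula for $d(k, N)$ and the two expressions of the multipliers $\mu_k^{(N)}$ can be verified for small values. The sketch below is illustrative (hypothetical function names, exact rational arithmetic); since $\chi_v(n) = 0$ for $v > n$, the products over $P^*$ are truncated at $v \le N$.

```python
from fractions import Fraction

def prime_powers(limit):
    """Elements of P* = {p^j : p prime, j >= 1} not exceeding limit."""
    out = []
    for p in range(2, limit + 1):
        if all(p % q for q in range(2, int(p**0.5) + 1)):  # p is prime
            v = p
            while v <= limit:
                out.append(v)
                v *= p
    return sorted(out)

def chi(n, k):
    """chi_n(k) = 1 if n divides k, 0 otherwise."""
    return 1 if k % n == 0 else 0

def d(k, N):
    """d(k, N) = #{1 <= n <= N : n | k}."""
    return sum(chi(n, k) for n in range(1, N + 1))

def d_via_product(k, N):
    """sum_{n <= N} prod_{v in P*, v not dividing k} (1 - chi_v(n))."""
    P = prime_powers(N)
    total = 0
    for n in range(1, N + 1):
        term = 1
        for v in P:
            if k % v != 0:
                term *= 1 - chi(v, n)
        total += term
    return total

def multiplier(k, N):
    """mu_k^(N) = prod_{v in P*, v <= N} [(1 - 1/v) + (1/v) chi_v(k)]."""
    result = Fraction(1)
    for v in prime_powers(N):
        result *= (1 - Fraction(1, v)) + Fraction(1, v) * chi(v, k)
    return result

def multiplier_restricted(k, N):
    """mu_k^(N) = prod over v in P*, v <= N, v not dividing k, of (1 - 1/v)."""
    result = Fraction(1)
    for v in prime_powers(N):
        if k % v != 0:
            result *= 1 - Fraction(1, v)
    return result

print(d(12, 6), d_via_product(12, 6), multiplier(6, 5))
```

Both descriptions of $d(k, N)$ agree, as do the two products defining $\mu_k^{(N)}$.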


One checks that

\[ (1 - v^{-1}) + v^{-1} \chi_v(k) = |\hat{\mu}_v(k)|^2, \]

where $\mu_v$ is the probability measure on $\mathbb{T}$ defined by

\[ \mu_v = \sqrt{1 - v^{-1}} \, \delta_0 + \frac{1 - \sqrt{1 - v^{-1}}}{v} \sum_{j=0}^{v-1} \delta_{j/v}, \]

and $\delta_x$ denotes the Dirac measure at the point $x$. The leading idea of the proof consists of replacing the multipliers $\frac{d(k, N)}{N}$ by $\mu_k^{(N)}$, and using Rota's theorem [Rota: 1962].

11.5.2 Theorem. Let $(X, \mathcal{A}, \mu)$ be a probability space. Let $\{T_n, n \ge 1\}$ be positive operators, which are contractions on both $L^1(\mu)$ and $L^\infty(\mu)$ and map the constant-1 function to itself. Then the sequence of operators $T_1 \cdots T_n T_n^* \cdots T_1^*$ yields a bounded operator on $L^p(\mu)$, $p > 1$.

In particular, if the $T_n$ are given by convolution on $\mathbb{T}$ with a probability measure $\mu_n$, one gets the inequality

\[ \Big\| \sup_n \Big| \sum_{k \in \mathbb{Z}} \hat{f}(k) \prod_{j=1}^{n} |\hat{\mu}_j(k)|^2 e^{2i\pi k t} \Big| \Big\|_p \le C \| f \|_p. \tag{11.5.1} \]

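In [Bourgain: 1990], a proof of this is given using the martingale maximal inequality: if $\{E_N, N \ge 1\}$ is a sequence of refining expectation operators on a probability space, then

\[ \Big\| \sup_N |E_N f| \Big\|_p \le C \| f \|_p. \tag{11.5.2} \]

By (11.5.1),

\[ \Big\| \sup_{N \ge 1} \Big| \sum_{k \ge 1} \mu_k^{(N)} \hat{f}(k) e^{2i\pi k t} \Big| \Big\|_2 \le C \| f \|_2. \]

Let

\[ M_1 f = \sup_{s \ge 1} \Big| \sum_{k \ge 1} \frac{d(k, 2^{2^s})}{2^{2^s}} \hat{f}(k) e^{2i\pi k t} \Big|, \qquad M_2 f = \sup_{s \ge 1} \Big| \sum_{k \ge 1} \mu_k^{(2^{2^s})} \hat{f}(k) e^{2i\pi k t} \Big|. \]

Then

\[ M_1 f \le M_2 f + \Big( \sum_{s \ge 1} \Big| \sum_{k \ge 1} \Big[ \frac{d(k, 2^{2^s})}{2^{2^s}} - \mu_k^{(2^{2^s})} \Big] \hat{f}(k) e^{2i\pi k t} \Big|^2 \Big)^{1/2}. \]

By integrating and using Fubini's theorem, we get for any $f \in L^2(\mathbb{T})$,

\[ \| M_1 f \|_2 \le \| M_2 f \|_2 + \sup_{k \ge 1} \Big( \sum_{s \ge 1} \Big| \frac{d(k, 2^{2^s})}{2^{2^s}} - \mu_k^{(2^{2^s})} \Big|^2 \Big)^{1/2} \| f \|_2. \]

The proof will be finished once we know that

\[ \sup_{k \ge 1} \sum_{s \ge 1} \Big| \frac{d(k, 2^{2^s})}{2^{2^s}} - \mu_k^{(2^{2^s})} \Big|^2 \le C < \infty. \]

This is the main step. In the course of the proof the following interesting fact is also established: for all $N$ and $k$,

\[ \frac{d(k, N)}{N} \le C \mu_k^{(N)}. \]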

11.6 Connection with number theory

Riemann sums can be connected to Farey sequences, and through this link to the Riemann Hypothesis (RH). This remarkable fact was observed and developed by Mikolás [1949a], [1949b], [1951]. By comparing the convergence of averages associated to Farey sequences of a periodic function $f$ with those of the Riemann sums of $f$, and then studying the error of approximation made in this convergence (for a class of functions with bounded derivative), Mikolás obtained a quite interesting equivalent reformulation of (RH) of functional analysis type. Although Mikolás's work still motivates number theorists, it seems to be little known. One can however quote the papers of Kanemitsu–Yoshimoto [1996] and Yoshimoto [1998]. We begin by discussing the link between Farey sequences and Riemann sums and recall some useful estimates concerning the Euler and Möbius functions. For clarity of exposition, we will display the arguments leading to the establishment of this link. At the end of this section some other results connecting Riemann sums with number theory, and especially with the prime number theorem, are presented.

Farey sequences. Let $x \ge 1$ be a given real; we denote by

\[ F_x = \Big\{ \frac{k}{n}, \ 0 < k \le n \le x, \ (k, n) = 1 \Big\} \]

the Farey sequence of order $x$. The $\nu$-th term is denoted by $\rho_\nu^x$, or $\rho_\nu$ when there is no confusion. The number of these fractions is

\[ \Phi(x) = \sum_{n=1}^{[x]} \varphi(n) \quad (n > 1), \]

where $\varphi(n)$ is the Euler function

\[ \varphi(n) = \#\{m \le n, \ (m, n) = 1\}, \quad n > 1. \]

Let $\mu$ be the Möbius function

\[ \mu(n) = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{if } p^2 \mid n, \\ (-1)^k & \text{if } n = p_1 \cdots p_k. \end{cases} \tag{11.6.1} \]


1 From the formula ζ (s) = Wright [1979: 287]):

∞

μ(n) n=1 ns ,

s > 1, we get the following estimate (Hardy–

3 2 x + O(x log x). π2 Recall also for later use (Hardy–Wright [1979: 270]) that (x) =

M(x) =

x

x

μ(n) = o(x) and

n=1

|μ(n)| =

n=1

√ 6 x + O( x). 2 π

(11.6.2)

For an arbitrary real-valued function h defined on [0, 1], we have already denoted the associated Riemann sums by

1 k h . n n n

Rn (h) =

(11.6.3)

k=1

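The link between Farey sequences and Riemann sums is deduced via the Möbius inversion formula: if $g(n) = \sum_{d \mid n} f(d)$, then
\[ f(n) = \sum_{d \mid n} \mu(d)\, g\Bigl(\frac{n}{d}\Bigr). \]
See Hardy–Wright [1979; p. 266]. Let
\[ U(n) = \sum_{\substack{k \le n \\ (k,n)=1}} h\Bigl(\frac{k}{n}\Bigr), \qquad V_n = \sum_{d \mid n} U(d). \]
Then
\[ V_n = \sum_{d \mid n} \sum_{\substack{k \le d \\ (k,d)=1}} h\Bigl(\frac{k}{d}\Bigr) = \sum_{k=1}^{n} h\Bigl(\frac{k}{n}\Bigr), \]
and so, letting $F(d) = d\,R_d(h)$,
\[ U(n) = \sum_{d \mid n} \mu\Bigl(\frac{n}{d}\Bigr) \sum_{\ell=1}^{d} h\Bigl(\frac{\ell}{d}\Bigr) = \sum_{d \mid n} \mu\Bigl(\frac{n}{d}\Bigr) F(d). \]
We deduce
\[ \sum_{\nu=1}^{\Phi(x)} h(\rho_\nu) = \sum_{n=1}^{[x]} U(n) = \sum_{n=1}^{[x]} \sum_{d \mid n} \mu\Bigl(\frac{n}{d}\Bigr) F(d) = \sum_{d\delta \le x} \mu(\delta) F(d) = \sum_{d=1}^{[x]} F(d) \sum_{\delta=1}^{[x/d]} \mu(\delta) = \sum_{d=1}^{[x]} d\,R_d(h)\, M\Bigl(\Bigl[\frac{x}{d}\Bigr]\Bigr). \tag{11.6.4} \]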
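Identity (11.6.4) can be checked exactly for small integer $x$. The sketch below is only an illustration under our own naming (`farey_sum`, `mertens_side`, and the test function `h` are hypothetical helpers, not taken from the text); it uses exact rational arithmetic so that both sides can be compared without rounding.

```python
from fractions import Fraction
from math import gcd


def mobius(n):
    """Moebius function by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # a square divides the original n
            result = -result
        p += 1
    return -result if n > 1 else result


def mertens(y):
    """M(y) = sum of mu(k) for 1 <= k <= y."""
    return sum(mobius(k) for k in range(1, y + 1))


def riemann_sum(h, n):
    """R_n(h) = (1/n) * sum_{k=1}^{n} h(k/n), as in (11.6.3)."""
    return Fraction(1, n) * sum(h(Fraction(k, n)) for k in range(1, n + 1))


def farey_sum(h, x):
    """Left-hand side of (11.6.4): sum of h over the Farey fractions of order x."""
    return sum(h(Fraction(k, n))
               for n in range(1, x + 1)
               for k in range(1, n + 1) if gcd(k, n) == 1)


def mertens_side(h, x):
    """Right-hand side of (11.6.4): sum_{d <= x} d * R_d(h) * M([x/d])."""
    return sum(d * riemann_sum(h, d) * mertens(x // d) for d in range(1, x + 1))


if __name__ == "__main__":
    h = lambda t: t * t + 3 * t       # any test function will do
    for x in range(1, 13):
        print(x, farey_sum(h, x), mertens_side(h, x))
```

Taking $h \equiv 1$ in the same identity gives $\Phi(x) = \sum_{d \le x} d\, M([x/d])$, which is the normalisation used in (11.6.5).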


11.6 Connection with number theory

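Taking $h \equiv 1$ in (11.6.4) gives $\Phi(x) = \sum_{n \le [x]} n\, M([x]/n)$. Thus for any real $A$,
\[ \frac{1}{\Phi(x)} \sum_{1 \le \nu \le \Phi(x)} h(\rho_\nu) - A = \frac{1}{\Phi(x)} \sum_{1 \le n \le [x]} n\,\bigl(R_n(h) - A\bigr)\, M([x]/n). \tag{11.6.5} \]
If $R_u(h) \to A$ as $u \to \infty$, then
\[ \frac{1}{\Phi(x)} \sum_{\nu=1}^{\Phi(x)} h(\rho_\nu) \to A, \tag{11.6.6} \]
as $x \to \infty$, by Toeplitz's criterion, which we recall now.

11.6.1 Lemma. Let $t_1, t_2, \ldots$ be a sequence of reals converging to 0, and let $\{a_{k,l},\ k, l \ge 1\}$ be an array of reals satisfying the following conditions:
(1) $\forall l$, $\lim_{k \to \infty} a_{k,l} = 0$,
(2) $S(k) = |a_{k,1}| + |a_{k,2}| + \cdots + |a_{k,k}| = O(1)$.
Then the new sequence $\{t'_k, k \ge 1\}$ defined by $t'_k = a_{k,1} t_1 + a_{k,2} t_2 + \cdots + a_{k,k} t_k$ converges to zero as well.

For a proof see Kuipers–Niederreiter [1971], p. 75. We show that conditions (1) and (2) are indeed satisfied:
(1) for all fixed $n$, $\dfrac{n\,|M(x/n)|}{\Phi(x)} \le \dfrac{x}{\Phi(x)} \sim \dfrac{\pi^2}{3x} \to 0$, $x \to \infty$,
(2) $\dfrac{1}{\Phi(x)} \sum_{n=1}^{[x]} n\,\bigl|M\bigl(\tfrac{x}{n}\bigr)\bigr| \le \dfrac{x^2}{\Phi(x)} \sim \dfrac{\pi^2}{3}$, $x \to \infty$.
We can thus state

11.6.2 Theorem. Let $h$ be such that the Riemann sums $R_n(h)$ converge to a (finite) real $A$ as $n$ tends to infinity. Then the associated Farey averages converge to $A$:
\[ F_n h = \frac{1}{\Phi(x)} \sum_{\nu=1}^{\Phi(x)} h(\rho_\nu^x) \to A. \]
Thus if $\lim_{n\to\infty} R_n(h) = \int_{\mathbb T} h(t)\,dt$, then $\lim_{n\to\infty} F_n h = \int_{\mathbb T} h(t)\,dt$ as well.

We shall now estimate the error of approximation
\[ \frac{1}{\Phi(x)} \sum_{\nu=1}^{\Phi(x)} h(\rho_\nu^x) - \int_{\mathbb T} h(t)\,dt, \tag{11.6.7} \]
and its connection with (RH).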


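If $h$ has bounded derivative on $[0,1]$, then $d\bigl(R_d(h) - \int_{\mathbb T} h(t)\,dt\bigr) = O(1)$. Using this, we get
\[ \sum_{\nu=1}^{\Phi(x)} h(\rho_\nu^x) - \Phi(x) \int_{\mathbb T} h(t)\,dt = O(x \log x). \tag{11.6.8} \]
This may however be easily improved. By using the simple relation (Landau [1927: II, p. 176]),
\[ \sum_{\nu=1}^{\Phi(x)} \Bigl(\rho_\nu^x - \frac{\nu}{\Phi(x)}\Bigr)^2 = O(1) \tag{11.6.9} \]
and writing that
\[ \sum_{\nu=1}^{\Phi(x)} h(\rho_\nu^x) = \sum_{\nu=1}^{\Phi(x)} \Bigl[ h(\rho_\nu^x) - h\Bigl(\frac{\nu}{\Phi(x)}\Bigr) \Bigr] + \sum_{\nu=1}^{\Phi(x)} h\Bigl(\frac{\nu}{\Phi(x)}\Bigr), \]
we get
\[ \sum_{\nu=1}^{\Phi(x)} h(\rho_\nu^x) - \Phi(x) \int_{\mathbb T} h(t)\,dt = O(1) \sum_{\nu=1}^{\Phi(x)} \Bigl| \rho_\nu^x - \frac{\nu}{\Phi(x)} \Bigr| + O(1). \]
And so, in view of the estimate $\Phi(x) \sim \frac{3}{\pi^2} x^2$ and the Cauchy–Schwarz inequality, we arrive at
\[ \sum_{\nu=1}^{\Phi(x)} h(\rho_\nu^x) - \Phi(x) \int_{\mathbb T} h(t)\,dt = O\biggl( x \Bigl[ \sum_{\nu=1}^{\Phi(x)} \Bigl(\rho_\nu^x - \frac{\nu}{\Phi(x)}\Bigr)^2 \Bigr]^{1/2} \biggr). \tag{11.6.10} \]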

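Thus by (11.6.9), $\sum_{\nu=1}^{\Phi(x)} h(\rho_\nu^x) - \Phi(x) \int_{\mathbb T} h(t)\,dt = O(x)$. Now, recall Franel's identity (Franel [1924] or Landau [1927: II, 173])
\[ \sum_{\nu=1}^{\Phi(x)} \Bigl(\rho_\nu^x - \frac{\nu}{\Phi(x)}\Bigr)^2 = \frac{1}{12\,\Phi(x)} \biggl( \sum_{a=1}^{[x]} \sum_{b=1}^{[x]} M\Bigl(\frac{x}{a}\Bigr) M\Bigl(\frac{x}{b}\Bigr) \frac{(a,b)^2}{ab} - 1 \biggr), \tag{11.6.11} \]
and Tchudakov's result [1936: p. 591–602] on the error of approximation in the prime number theorem
\[ \pi(x) - \int_2^x \frac{du}{\log u} = O\bigl( x e^{-c_1 (\log x)^{\gamma}} \bigr). \tag{11.6.12} \]
By using its analogue for the Möbius function (Fogels [1940])
\[ M(x) = O\bigl( x e^{-c_2 (\log x)^{\gamma}} \bigr), \tag{11.6.13} \]
where $\gamma \in \bigl]\tfrac12, \tfrac{11}{21}\bigr[$ and $c_1 = c_1(\gamma)$, $c_2 = c_2(\gamma)$ are constants, we get the much better estimate (Mikolás [1949a])
\[ \sum_{\nu=1}^{\Phi(x)} \Bigl(\rho_\nu^x - \frac{\nu}{\Phi(x)}\Bigr)^2 = O\bigl( x e^{-c_3 (\log x)^{\gamma}} \bigr). \tag{11.6.14} \]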
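Franel's identity (11.6.11) lends itself to an exact check for small integer $x$. The following sketch is ours and purely illustrative: the function names `franel_lhs` and `franel_rhs` are hypothetical, the Mertens function is computed naively, and rational arithmetic is used so that the two sides can be compared exactly.

```python
from fractions import Fraction
from math import gcd


def mobius(n):
    """Moebius function by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def mertens(y):
    """M(y) = sum of mu(k) for 1 <= k <= y."""
    return sum(mobius(k) for k in range(1, y + 1))


def farey(x):
    """Farey fractions k/n with 0 < k <= n <= x, in increasing order."""
    return sorted(Fraction(k, n)
                  for n in range(1, x + 1)
                  for k in range(1, n + 1) if gcd(k, n) == 1)


def franel_lhs(x):
    """Sum over nu of (rho_nu - nu/Phi(x))^2."""
    rho = farey(x)
    size = len(rho)                       # this is Phi(x)
    return sum((r - Fraction(nu, size)) ** 2
               for nu, r in enumerate(rho, start=1))


def franel_rhs(x):
    """(1/(12 Phi(x))) * (double sum of M(x/a) M(x/b) (a,b)^2/(ab) - 1)."""
    size = len(farey(x))
    m = {a: mertens(x // a) for a in range(1, x + 1)}
    double_sum = sum(Fraction(m[a] * m[b] * gcd(a, b) ** 2, a * b)
                     for a in range(1, x + 1)
                     for b in range(1, x + 1))
    return (double_sum - 1) / (12 * size)


if __name__ == "__main__":
    for x in range(1, 11):
        print(x, franel_lhs(x), franel_rhs(x))
```

For $x = 3$ the Farey fractions are $1/3, 1/2, 2/3, 1$, and both sides equal $1/72$.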


Our next theorem relates to the Riemann Hypothesis, which we briefly recall.

The Riemann Hypothesis. The Riemann zeta function, defined on the half-plane $\{s : \Re s > 1\}$ by the series
\[ \zeta(s) = \sum_{n=1}^{\infty} n^{-s}, \]
admits a meromorphic continuation to the entire complex plane, with a unique, simple pole of residue 1 at $s = 1$. In the half-plane $\{s : \Re s \le 0\}$, the Riemann zeta function has simple zeros at $-2, -4, -6, \ldots$, and only at these points, which are called trivial zeros. There also exist non-trivial zeros in the strip $\{s : 0 < \Re s < 1\}$. See for instance [Blanchard: 1969] (Propositions IV.10 and IV.11, p. 84) and [Titchmarsh: 1951]. The Riemann Hypothesis (RH) asserts that all non-trivial zeros of the function $\zeta$ have real part $1/2$. If the RH is true we have the well-known relations, the first implying the second:
\[ M(x) = O\Bigl( x^{\frac12 + c_4 \frac{\log\log\log x}{\log\log x}} \Bigr), \qquad \sum_{\nu=1}^{\Phi(x)} \Bigl(\rho_\nu^x - \frac{\nu}{\Phi(x)}\Bigr)^2 = O\Bigl( x^{1 + c_5 \frac{\log\log\log x}{\log\log x}} \Bigr). \tag{11.6.15} \]

These estimates allow us to establish the first part of the following result (Mikolás [1949: Theorems 3, 4]). The proof of the second part relies upon Dirichlet series machinery.

11.6.3 Theorem. Assume that $h$ has a bounded derivative. Then
\[ \sum_{\nu=1}^{\Phi(x)} h(\rho_\nu^x) = \Phi(x) \int_{\mathbb T} h(t)\,dt + O\bigl( x e^{-c (\log x)^{\gamma}} \bigr), \]
where $\gamma \in \bigl]\tfrac12, \tfrac{11}{21}\bigr[$ and $c = c(\gamma)$ is a constant. And if (RH) is true, then for every $\varepsilon > 0$,
\[ \sum_{\nu=1}^{\Phi(x)} h(\rho_\nu^x) = \Phi(x) \int_{\mathbb T} h(t)\,dt + O\bigl( x^{\frac12 + \varepsilon} \bigr). \]
Conversely, if $h$ has a bounded derivative and
(i) $\sum_{\nu=1}^{\Phi(x)} h(\rho_\nu^x) = \Phi(x) \int_{\mathbb T} h(t)\,dt + O\bigl( x^{\frac12 + \varepsilon} \bigr)$,
(ii) $F(s) = \sum_{n=1}^{\infty} \dfrac{1}{n^s} \Bigl( n R_n(h) - n \int_{\mathbb T} h(t)\,dt \Bigr)$ is regular and has no zero in the strip $\Re(s) > \tfrac12$,
then (RH) is true.

A remarkable consequence of this result is the following


11.6.4 Theorem. Let $f \in C^3([0,1])$ be such that $f'(t)$ is not identically 0, and
\[ \frac{|f'(1) - f'(0)|}{\int_{\mathbb T} |f'''(t)|\,dt} > \frac{3\zeta(3)}{2\pi} \approx 0.574\ldots, \]
then
\[ \text{(RH)} \iff \sum_{\nu=1}^{\Phi(x)} f(\rho_\nu^x) = \Phi(x) \int_{\mathbb T} f(t)\,dt + O\bigl( x^{\frac12 + \varepsilon} \bigr), \qquad \forall \varepsilon > 0. \]
Examples are $f(t) = e^{\lambda t}$, $\lambda \ne 0$, $|\lambda| < 2\pi/(3\zeta(3))$, or $f(t) = \cos \tau t$ ($0 < \tau \le \tfrac{\pi}{2}$). The proof consists of establishing condition (ii) of Theorem 11.6.3 under the assumptions made. To prove that $F$ has no zero in the strip $\Re(s) > \tfrac12$, Mikolás used the Euler–Maclaurin sum-formula at order 1: for $\varphi$ having a continuous derivative in the interval $(a,b)$,
\[ \sum_{a \le n \le b} \varphi(n) = \int_a^b \varphi(x)\,dx + \int_a^b \bigl( x - [x] - \tfrac12 \bigr) \varphi'(x)\,dx + \bigl( a - [a] - \tfrac12 \bigr) \varphi(a) - \bigl( b - [b] - \tfrac12 \bigr) \varphi(b), \]
to estimate the sum $n R_n(h) - n \int_{\mathbb T} h(t)\,dt$. This allows us to write $F$ as a difference of two Dirichlet series, and reduces the study of the zeros of $F$ to finding good bounds for these two Dirichlet series. Yoshimoto [1998] recently showed that the constant $\frac{3\zeta(3)}{2\pi}$ can be slightly sharpened by √ $ % 2 3 2 2 π + log 2 − · 6 3 3

Other equivalent reformulations of the RH. Among the many equivalent reformulations of the RH, the following one, due to Robin [1984], is likely one of the most striking and at the same time one of the simplest. Let an integer $n$ be termed "colossally abundant" if, for some $\varepsilon > 0$, $\sigma(n)/n^{1+\varepsilon} \ge \sigma(m)/m^{1+\varepsilon}$ for $m < n$ and $\sigma(n)/n^{1+\varepsilon} > \sigma(m)/m^{1+\varepsilon}$ for $m > n$. Using colossally abundant numbers, Robin showed that the RH is true if and only if
\[ \frac{\sigma(n)}{n} < e^{\gamma} \log\log n \qquad \text{for } n > 5040, \]
where $\sigma(n)$ is the sum of divisors of $n$ and $\gamma$ is Euler's constant. Let $\{x_n, n \ge 1\}$ be the sequence of colossally abundant numbers. In the same paper, he also showed that the sequence $\{\sigma(x_n)/(x_n \log\log x_n),\ n \ge 1\}$ contains an infinite number of local extrema. In relation with Robin's result, Lagarias [2002] showed that the RH is true if and only if
\[ \sigma(n) \le H_n + e^{H_n} \log H_n, \]

where $H_n = \sum_{j \le n} 1/j$ is the $n$-th harmonic number.
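The two criteria of Robin and Lagarias can be explored numerically on a small range. The sketch below is only an illustration and proves nothing about the RH: the function names are ours, the value of Euler's constant is hard-coded to double precision, and the arithmetic is floating point, which is adequate for the small ranges used here.

```python
from math import exp, log

EULER_GAMMA = 0.5772156649015329   # Euler's constant, double precision


def sigma_table(limit):
    """Return sig with sig[n] = sum of the divisors of n for 1 <= n <= limit."""
    sig = [0] * (limit + 1)
    for d in range(1, limit + 1):
        for multiple in range(d, limit + 1, d):
            sig[multiple] += d
    return sig


def robin_holds(n, sig):
    """Robin's inequality sigma(n)/n < e^gamma * log log n (meaningful for n >= 3)."""
    return sig[n] < exp(EULER_GAMMA) * n * log(log(n))


def lagarias_failures(limit, sig):
    """Return the n <= limit violating sigma(n) <= H_n + exp(H_n) * log(H_n)."""
    failures = []
    harmonic = 0.0
    for n in range(1, limit + 1):
        harmonic += 1.0 / n
        if sig[n] > harmonic + exp(harmonic) * log(harmonic):
            failures.append(n)
    return failures


if __name__ == "__main__":
    sig = sigma_table(10000)
    print(sig[5040], robin_holds(5040, sig))      # 5040 itself violates Robin's inequality
    print(all(robin_holds(n, sig) for n in range(5041, 10001)))
    print(lagarias_failures(3000, sig))
```

The threshold $n > 5040$ in Robin's criterion is visible here: $\sigma(5040) = 19344$ exceeds $e^{\gamma} \cdot 5040 \cdot \log\log 5040$, whereas no violation occurs in the range $5040 < n \le 10000$.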

Grytczuk [2007] investigated the upper bound for $\sigma(n)$ with some different $n$. Let $(2, m) = 1$ and $m = \prod_{j=1}^{k} p_j^{\alpha_j}$, where the $p_j$ are prime numbers and $\alpha_j \ge 1$. Then, for all odd positive integers $m > 3^9/2$, $\sigma(2m)$

) k h k h k h E δS =S e2iπ( n − m )Sl = E δ{S −S =0} e2iπ( n − m )(Sl −S ) e2iπ( n − m )S k h k h = E δ{S −S =0} e2iπ( n − m )(Sl −S ) E e2iπ( n − m )S k h = P S = S E e2iπ( n − m )S . And so E

T n∈[i,j ]∩N

=

a2 E

n∈[i,j ∧]∩N

≥1

+2

2 Rn f˜(x) dx

n−1 1 2iπ kS /n 2 e n k=0

a a P S = S E

>

n∈[i,j ∧]∩N m∈[i,j ∧ ]∩N

n−1 m−1 1 2iπ k − h S n m := S1 + S2 . e nm k=0 h=0

(11.7.8)
We treat the sum $S_2$ first. Plainly, for $u > v$, $S_u - S_v \overset{\mathcal D}{=} S_{u-v}$, and so $P\{S_u = S_v\} = 2^{-(u-v)}$. Then,

k h n−1 m−1 e2iπ n − m + 1 1 S2 = a a 2−( −) nm 2 n∈[i,j ∧]∩N

:= S2 + S2 .

−(− )

n∈[i,j ∧]∩N m∈[i,j ∧ ]∩N

k=0 h=0

n−1 m−1

1 e2iπ nm k=0 h=0

k

h n−m

2

+1

(11.7.9)

11.7 Riemann sums and the randomly sampled trigonometric system


By interchanging the notation of the indices ( with , n with m and k with h), we get

−2iπ nk − mh n−1 m−1 +1 1 e S2 = a a 2−( −) . nm 2 n∈[i,j ∧]∩N 1/2 and put 3 ϕ =

2a log ,

τ =

sin ϕ /2 . ϕ /2

We assume sufficiently large for τ to be greater than (a /a)1/2 . This is realized once i is large enough, say i ≥ i0 . Consider the sector

If

πu [m,n]

A = [0, ϕ [ ∪ ]π − ϕ , π [ , πu ∈ / A , then cos [m,n] ≤ cos ϕ . And

Ac = [0, π [\A .

π u 2 cos ≤ (cos ϕ ) ≤ e−2 sin (ϕ /2) . [m, n]

But 2 sin2 (ϕ /2) = 2(ϕ /2)2 τ2 ≥ a log . We deduce 2πu 0≤u 3C

1 n 1 m

:ϕ()>m

ϕ ' (y)

dy + f 2 y

1 n 1 m

≤

dy

1 , 3

and using (11.8.16), for some M suitably chosen, 1 . 3

P{A > M} ≤

(11.8.30)

There is, moreover, no loss in assuming M > 1. These inequalities imply, in view of (11.8.17), that B=

1 m2

2 a2 dm () − dn () ≤ 3CM

:ϕ()>m

1 n 1 m

ϕ ' (y)

dy + f 2 y

1 n 1 m

dy .

(11.8.31) Now consider the term D :=

1 1 − n m

2

a2 dn2 (),

(11.8.32)

:ϕ()>m

in the right-hand side of (11.8.10). Again, by means of the same argument used to obtain (11.8.12), we have D≤

1 1 − n m

2

j :ϕ(Sj )>m

aS2j dn2 (Sj ) ≤ A .

1 1 − n m

2

j :ϕ(j )>m

aj2

2 1 , k|Sj 1≤k≤n

(11.8.33)


and similarly to (11.8.18),

2

=

1

d|S 1≤d≤n

1. 1 +2 [d,δ]|S 1≤d m

a2

√ :ϕ()> m

1 [d, δ]

√ :ϕ()> m

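n 1. Let finally $0 < \beta < 1$. Then the last expression in (12.1.20) is at most
\[ C \Bigl(\frac{d}{h}\Bigr)^{\beta} \Bigl(\frac{n}{d}\Bigr)^{1-\beta} = C\, \frac{1}{h^{\beta}}\, n^{1-\beta} d^{2\beta - 1} \]
and thus the $\lambda^*_N$ is less than
\[ C\, \frac{1}{h^{\beta}}\, n^{1-\beta} \sum_{d \mid h} d^{2\beta - 1}. \tag{12.1.21} \]
Let $0 < \varepsilon < \min(\beta, 1 - \beta)$. Since the number of divisors of $h$ is $O(h^{\varepsilon})$, for $\beta \ge 1/2$ the sum in (12.1.21) is $O(h^{2\beta - 1 + \varepsilon})$ and thus the expression in (12.1.21) is less than
\[ C\, \frac{1}{h^{\beta}}\, n^{1-\beta} h^{2\beta - 1 + \varepsilon} \le C n^{1-\beta}. \]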


12.1 Introduction and mean convergence

If $\beta < 1/2$, then the sum in (12.1.21) is $O(h^{\varepsilon})$ and thus the expression in (12.1.21) is bounded by
\[ C\, \frac{1}{h^{\beta}}\, n^{1-\beta} h^{\varepsilon} \le C n^{1-\beta}. \]
Thus in both cases the expression in (12.1.21) is $O(n^{1-\beta})$, and the lemma is proved.

To prove Corollary 12.1.6, it suffices to show that
\[ \sum_{k=1}^{N} \frac{(n_h, n_k)}{[n_h, n_k]} \le C \log n_h. \tag{12.1.22} \]
Fix $d \mid n_h$ and compute the sum in (12.1.22) for those $1 \le k \le N$ such that $(n_h, n_k) = d$. In view of the assumption of Corollary 12.1.6, this restricted sum clearly cannot exceed
\[ \sum_{1 \le k \le N,\ d \mid n_k} \frac{d^2}{n_h n_k} \le \frac{d^2}{n_h} \cdot \frac{A}{d}. \]
Summing for all $d \mid n_h$, and using the fact that the sum of divisors of $n_h$ is $O(n_h \log n_h)$, we get (12.1.22).

Our next theorem gives a necessary and sufficient condition for the mean convergence of the series $\sum_{k=1}^{\infty} c_k f(n_k x)$ in terms of the coefficients $c_k$ and the Fourier coefficients of $f$. Despite its precise character, it is mainly of theoretical interest, since its number-theoretical character makes it difficult to apply in concrete cases.

12.1.8 Theorem. Let $f \in L^2(\mathbb T)$ have complex Fourier series $f \sim \sum_{k \in \mathbb Z} \varphi_k e_k$ with $\varphi_0 = \int_{\mathbb T} f(t)\,dt = 0$ and $e_k(x) = \exp(2\pi i k x)$. Let $\{n_k, k \ge 1\}$ be an increasing sequence of positive integers. Then $\sum_{k=1}^{\infty} c_k f(n_k x)$ converges in the mean if and only if the following conditions are fulfilled:
\[ \text{a)}\quad \lim_{R \to \infty} \sup_{P \ge R} \sum_{|n| > n_R} \Bigl| \sum_{\substack{n_k \mid n \\ k \le P}} \varphi_{n/n_k} c_k \Bigr|^2 = 0, \qquad \text{b)}\quad \sum_{n} \Bigl| \sum_{n_k \mid n} \varphi_{n/n_k} c_k \Bigr|^2 < \infty. \tag{12.1.23} \]
If both sequences $\{\varphi_n, n \in \mathbb Z\}$ and $c$ have constant signs then (12.1.23a) follows from (12.1.23b), so that the sequence $\{S_N^{\mathcal N}(c, f), N \ge 1\}$ converges in mean if and only if condition (12.1.23b) holds. Also, if $\sum_{n} \bigl( \sum_{n_k \mid n} |\varphi_{n/n_k} c_k| \bigr)^2 < \infty$, then the sequence $\{S_N^{\mathcal N}(c, f), N \ge 1\}$ converges in mean.

Proof. Observe that
\[ S_N^{\mathcal N}(c, f) = \sum_{n} e_n \sum_{\substack{n_k \mid n \\ k \le N}} \varphi_{n/n_k} c_k = \sum_{|n| \le n_N} e_n \sum_{n_k \mid n} \varphi_{n/n_k} c_k + \sum_{|n| > n_N} e_n \sum_{\substack{n_k \mid n \\ k \le N}} \varphi_{n/n_k} c_k. \]


12 A study of the system (f (nx))

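Let $M \ge N \ge R$. Then,
\[ \bigl\langle S_N^{\mathcal N}(c, f), S_M^{\mathcal N}(c, f) \bigr\rangle = \sum_{n} \Bigl( \sum_{\substack{n_k \mid n \\ k \le N}} \varphi_{n/n_k} c_k \Bigr) \overline{\Bigl( \sum_{\substack{n_k \mid n \\ k \le M}} \varphi_{n/n_k} c_k \Bigr)} = \sum_{|n| \le n_R} \Bigl| \sum_{\substack{n_k \mid n \\ k \le N}} \varphi_{n/n_k} c_k \Bigr|^2 + \sum_{|n| > n_R} \Bigl( \sum_{\substack{n_k \mid n \\ k \le N}} \varphi_{n/n_k} c_k \Bigr) \overline{\Bigl( \sum_{\substack{n_k \mid n \\ k \le M}} \varphi_{n/n_k} c_k \Bigr)}. \]
Thus
\[ \Bigl| \bigl\langle S_N^{\mathcal N}(c, f), S_M^{\mathcal N}(c, f) \bigr\rangle - \sum_{|n| \le n_R} \Bigl| \sum_{\substack{n_k \mid n \\ k \le N}} \varphi_{n/n_k} c_k \Bigr|^2 \Bigr| \le \Bigl( \sum_{|n| > n_R} \Bigl| \sum_{\substack{n_k \mid n \\ k \le N}} \varphi_{n/n_k} c_k \Bigr|^2 \Bigr)^{1/2} \Bigl( \sum_{|n| > n_R} \Bigl| \sum_{\substack{n_k \mid n \\ k \le M}} \varphi_{n/n_k} c_k \Bigr|^2 \Bigr)^{1/2} \le \sup_{P \ge R} \sum_{|n| > n_R} \Bigl| \sum_{\substack{n_k \mid n \\ k \le P}} \varphi_{n/n_k} c_k \Bigr|^2 \to 0, \]
as $R$ tends to infinity by assumption. Consequently,
\[ \lim_{R \to \infty} \sup_{M, N \ge R} \Bigl| \bigl\langle S_N^{\mathcal N}(c, f), S_M^{\mathcal N}(c, f) \bigr\rangle - \sum_{|n| \le n_R} \Bigl| \sum_{\substack{n_k \mid n \\ k \le N}} \varphi_{n/n_k} c_k \Bigr|^2 \Bigr| = 0. \]
In other words,
\[ \lim_{M, N \to \infty} \bigl\langle S_N^{\mathcal N}(c, f), S_M^{\mathcal N}(c, f) \bigr\rangle = A := \sum_{n} \Bigl| \sum_{n_k \mid n} \varphi_{n/n_k} c_k \Bigr|^2 < \infty. \]
And also
\[ \lim_{N \to \infty} \bigl\| S_N^{\mathcal N}(c, f) \bigr\|_2^2 = A. \]
These two facts then imply that
\[ \lim_{N, M \to \infty} \bigl\| S_N^{\mathcal N}(c, f) - S_M^{\mathcal N}(c, f) \bigr\|_2 = 0, \]
as required. Conversely, if the sequence $\{S_N^{\mathcal N}(c, f), N \ge 1\}$ converges in mean, it is then bounded in mean:
\[ \sup_{N \ge 1} \bigl\| S_N^{\mathcal N}(c, f) \bigr\|_2 = B < \infty. \]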


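But as
\[ \bigl\| S_N^{\mathcal N}(c, f) \bigr\|_2^2 = \sum_{|n| \le n_N} \Bigl| \sum_{n_k \mid n} \varphi_{n/n_k} c_k \Bigr|^2 + \sum_{|n| > n_N} \Bigl| \sum_{\substack{n_k \mid n \\ k \le N}} \varphi_{n/n_k} c_k \Bigr|^2, \]
this implies that $A \le B$. Now let $f^*$ denote the limit in mean of the sequence $\{S_N^{\mathcal N}(c, f), N \ge 1\}$. From
\[ \bigl| \langle f^*, e_n \rangle - \langle S_N^{\mathcal N}(c, f), e_n \rangle \bigr| \le \bigl\| f^* - S_N^{\mathcal N}(c, f) \bigr\|_2 \]
we deduce
\[ \langle f^*, e_n \rangle = \lim_{N \to \infty} \langle S_N^{\mathcal N}(c, f), e_n \rangle = \lim_{N \to \infty} \sum_{\substack{n_k \mid n \\ k \le N}} \varphi_{n/n_k} c_k = \sum_{n_k \mid n} \varphi_{n/n_k} c_k. \]
Thus $f^* = \sum_{n \in \mathbb Z} e_n \sum_{n_k \mid n} \varphi_{n/n_k} c_k$. Let $R$ be some positive integer and define $H_R = \{e_n, |n| \le n_R\}$. Let $p_R$ be the projection onto the orthogonal complement $H_R^{\perp}$ of $H_R$. Then,
\[ \Bigl| \| p_R(f^*) \|_2 - \| p_R(S_N^{\mathcal N}(c, f)) \|_2 \Bigr| \le \| p_R(f^*) - p_R(S_N^{\mathcal N}(c, f)) \|_2 \le \| f^* - S_N^{\mathcal N}(c, f) \|_2 \to 0, \]
as $N$ tends to infinity. Thus,
\[ \sup_{N \ge R} \Bigl| \| p_R(f^*) \|_2 - \| p_R(S_N^{\mathcal N}(c, f)) \|_2 \Bigr| \le \sup_{N \ge R} \| f^* - S_N^{\mathcal N}(c, f) \|_2 \to 0, \]
as $R$ tends to infinity. Now, by the triangle inequality,
\[ \sup_{N \ge R} \| p_R(S_N^{\mathcal N}(c, f)) \|_2 = \sup_{N \ge R} \Bigl( \sum_{|n| > n_R} \Bigl| \sum_{\substack{n_k \mid n \\ k \le N}} \varphi_{n/n_k} c_k \Bigr|^2 \Bigr)^{1/2} \le \sup_{N \ge R} \Bigl| \| p_R(f^*) \|_2 - \| p_R(S_N^{\mathcal N}(c, f)) \|_2 \Bigr| + \| p_R(f^*) \|_2 \to 0, \]
as $R$ tends to infinity. This completes the proof.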

12.2 Almost sure convergence – sufficient conditions

Let $f \in L^2(\mathbb T)$ with $\int_{\mathbb T} f(t)\,dt = 0$ and let $\mathcal N = \{n_k, k \ge 1\}$ be an increasing sequence of positive integers. Using standard terminology, we call the pair $(f, \mathcal N)$ (or, equivalently, the sequence $f(n_k x)$) a convergence system if for any $c \in \ell^2$, the series $\sum_{k=1}^{\infty} c_k f(n_k x)$ converges for almost all $x \in \mathbb T$. This is the simplest and strongest type of behavior of $\sum_{k=1}^{\infty} c_k f(n_k x)$, but it holds only in a few special situations. By Carleson's theorem [1966], $\{\cos 2\pi n x\}$ and $\{\sin 2\pi n x\}$ are convergence systems. More generally, Gaposhkin [1968] proved (using Carleson's theorem) the following result:

12.2.1 Theorem. Let $f \in \mathrm{Lip}_\alpha(\mathbb T)$ for $\alpha > 1/2$ and $\int_{\mathbb T} f(t)\,dt = 0$. Then $\{f(nx), n = 1, 2, \ldots\}$ is a convergence system.

Another classical result, proved by Kac [1943] for the Lipschitz class and extended substantially by Gaposhkin [1966b] is the following 12.2.2 Theorem. Let f ∈ L2 (T) with modulus of continuity ω2 (δ, f ) = sup

T f (t)dt

= 0 and assume that the square 1/2

|f (x + h) − f (x)| dx 2

0

0 0).

(12.2.1)

Let $\{n_k, k \ge 1\}$ be a sequence of positive reals satisfying the Hadamard gap condition
\[ n_{k+1}/n_k \ge q > 1, \qquad k = 1, 2, \ldots. \tag{12.2.2} \]
Then $f(n_k x)$ is a convergence system.

These theorems describe the known situations when $f(n_k x)$ is a convergence system. All conditions of these results are sharp. Gaposhkin [1966a] showed that Theorem 12.2.2 becomes false if we replace the right-hand side of (12.2.1) by $O\bigl( (\log \tfrac{1}{\delta})^{-1/2} \bigr)$, and Berkes [1997] proved that the condition $f \in \mathrm{Lip}_\alpha(\mathbb T)$, $\alpha > 1/2$, in Theorem 12.2.1 and the Hadamard gap condition (12.2.2) in Theorem 12.2.2 are also best possible: there exists a function $f \in \mathrm{Lip}_{1/2}(\mathbb T)$ with $\int_{\mathbb T} f(t)\,dt = 0$ such that for any positive sequence $\{\varepsilon_k, k \ge 1\}$ tending to 0, there exists an increasing sequence $\mathcal N$ of integers satisfying
\[ n_{k+1}/n_k \ge 1 + \varepsilon_k, \qquad k = 1, 2, \ldots \tag{12.2.3} \]
and $c \in \ell^2$ such that the series $\sum_{k=1}^{\infty} c_k f(n_k x)$ diverges almost everywhere. Going beyond the conditions of Theorems 12.2.1 and 12.2.2, the almost everywhere convergence behavior of $\sum_{k=1}^{\infty} c_k f(n_k x)$ becomes very complicated, and examples show that the properties of $\sum_{k=1}^{\infty} c_k f(n_k x)$ are determined by a delicate interplay between the coefficient sequence $\{c_k, k \ge 1\}$, the smoothness properties of $f$, and the growth speed and number-theoretic properties of $\{n_k, k \ge 1\}$. In this section we give a detailed study of this behavior and prove several convergence results for such series. Our main interest will be to find convergence criteria of the type $\sum_{k=1}^{\infty} c_k^2\, \omega(k) < \infty$, where $\omega(k) \to \infty$ is some positive sequence (Weyl multiplier) depending on $f$ and $\{n_k, k \ge 1\}$. Before formulating our results, we first give an equivalent reformulation of the convergence system property of $\sum_{k=1}^{\infty} c_k f(n_k x)$ in terms of maximal operators. The following result is due to Nikishin [1970b].


12.2.3 Proposition. A pair $(f, \mathcal N)$ is a convergence system if and only if for any $\varepsilon > 0$, $0 < \delta < 1$, there exist a set $A_{\varepsilon,\delta} \subset (0,1)$ with Lebesgue measure $\ge 1 - \varepsilon$ and a constant $C_{\varepsilon,\delta} > 0$ such that for arbitrary $c \in \ell^2$ we have
\[ \int_{A_{\varepsilon,\delta}} \sup_{N \ge 1} \Bigl| \sum_{k=1}^{N} c_k f(n_k x) \Bigr|^{1-\delta} dx \le C_{\varepsilon,\delta} \Bigl( \sum_{k=1}^{\infty} c_k^2 \Bigr)^{(1-\delta)/2}. \]
We now prove an analogous statement involving a weak $(2,2)$ type inequality.

12.2.4 Proposition. A pair $(f, \mathcal N)$ is a convergence system if and only if there exists a constant $C$ such that for any $c \in \ell^2$ the following maximal inequality holds:
\[ \sup_{t \ge 0} t^2\, \lambda\Bigl\{ \sup_{N \ge 1} \bigl| S_N^{\mathcal N}(c, f) \bigr| > t \| c \|_2 \Bigr\} \le C. \]
Proof. Given a pair $(f, \mathcal N)$, consider the $L^2(\mathbb T)$-operators $S_N^{\mathcal N, f}$, $N = 1, 2, \ldots$, defined via the isomorphism $c \to g$ if $g \sim \sum_k c_{|k|} e_k$ by
\[ S_N^{\mathcal N, f}(g) = \sum_{k=1}^{N} c_k f(n_k\, \cdot). \]
Consider also the family of pointwise measurable transformations on $\mathbb T$: $\tau_j x = jx \bmod 1$. For any integer $j \ge 2$, the transformation $\tau_j$ preserves the normalized Lebesgue measure $\lambda$. It is in turn an exact endomorphism (Section 3.3) and is in particular strongly mixing. Let $T_j g = g \circ \tau_j$. That the $T_j$'s are commuting positive $L^2$-isometries preserving 1 is better viewed on the Fourier expansion of $g$, since if $g \sim \sum_{m \in \mathbb Z} g_m e_m$, then $T_j g \sim \sum_{m \in \mathbb Z} g_m e_{mj}$, which readily implies
\[ T_k(T_j g) = T_j(T_k g) \qquad (j, k = 1, 2, \ldots). \tag{12.2.4} \]
Proceeding next by approximation, we deduce that (12.2.4) holds for any $g \in L^p(\mathbb T)$, $0 < p \le \infty$. This in particular implies that the sequence of operators $S_N^{\mathcal N, f}$ commutes with $E$: for any $g \in L^2(\mathbb T)$,
\[ S_N^{\mathcal N, f}(T_j g) = T_j(S_N^{\mathcal N, f} g) \qquad (N, j = 1, 2, \ldots). \]
Further, for any $g \in L^2(\mathbb T)$,
\[ \lim_{J \to \infty} \Bigl\| \frac{1}{J} \sum_{j=1}^{J} T_j g - \int_{\mathbb T} g\,d\lambda \Bigr\|_2 = 0. \tag{12.2.5} \]


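Since strong convergence implies weak convergence, it follows for any $u, v \in L^2(\mathbb T)$ that
\[ \lim_{J \to \infty} \frac{1}{J} \sum_{j=1}^{J} \langle T_j u, v \rangle = \langle u, 1 \rangle \langle v, 1 \rangle. \]
Choosing $u = \chi\{A\}$, $v = \chi\{B\}$, where $A$, $B$ are Borel sets of $\mathbb T$ and $\chi$ denotes the indicator function, we deduce
\[ \lim_{J \to \infty} \frac{1}{J} \sum_{j=1}^{J} \lambda(T_j^{-1} A \cap B) = \lambda(A)\lambda(B). \]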
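The averaged mixing relation just obtained can be observed exactly on intervals. In the sketch below, which is ours and only illustrative, we take $A = [0, a)$ and $B = [0, b)$ and read $T_j^{-1} A$ as the preimage $\tau_j^{-1} A = \bigcup_{i=0}^{j-1} [i/j, (i+a)/j)$ under $\tau_j x = jx \bmod 1$; the names `preimage_overlap` and `cesaro_average` are hypothetical helpers.

```python
from fractions import Fraction


def preimage_overlap(j, a, b):
    """Lebesgue measure of tau_j^{-1}[0, a) intersected with [0, b), tau_j x = j x mod 1."""
    total = Fraction(0)
    for i in range(j):
        left = Fraction(i, j)          # tau_j^{-1}[0, a) is the union of [i/j, (i+a)/j)
        right = (i + a) / j
        total += max(Fraction(0), min(b, right) - left)
    return total


def cesaro_average(J, a, b):
    """(1/J) * sum_{j=1}^{J} lambda(tau_j^{-1}[0, a) intersected with [0, b))."""
    return sum(preimage_overlap(j, a, b) for j in range(1, J + 1)) / J


if __name__ == "__main__":
    a, b = Fraction(1, 2), Fraction(1, 3)
    # measure preservation: taking B = [0, 1) returns lambda(A) = a for every j
    print([preimage_overlap(j, a, Fraction(1)) for j in range(1, 6)])
    # the Cesaro averages approach lambda(A) * lambda(B) = 1/6
    print(float(cesaro_average(300, a, b)), float(a * b))
```

Each individual term differs from $\lambda(A)\lambda(B)$ by at most $a/j$, since only the last, partially covered block $[i/j, (i+1)/j)$ contributes an error; this is why the averages converge, though only at a logarithmic-over-linear rate in this example.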

From this it follows easily that for any $a > 1$ and Borel sets $A$, $B$ of $\mathbb T$, there exists $T \in E$ such that
\[ \lambda(T^{-1} A \cap B) \le a\, \lambda(A) \lambda(B). \tag{12.2.6} \]
Thus Proposition 12.2.4 is just a consequence of Theorem 5.2.1.

Proposition 12.2.4 implies that a pair $(f, \mathcal N)$ is a convergence system only if the maximal operator
\[ \sup_{N \ge 1} \bigl| S_N^{\mathcal N}(c, f) \bigr| \]
belongs to $L^p(\mathbb T)$ with $p < 2$. This has a consequence concerning convergence in mean. Say by analogy that a pair $(f, \mathcal N)$ is an $L^p$-convergence system if for any $g \in L^2(\mathbb T)$ the sequence $\{S_N^{\mathcal N, f} g, N \ge 1\}$ converges in $L^p(\mathbb T)$.

12.2.5 Corollary. Assume that the pair $(\mathcal N, f)$ is a convergence system. Then, it is also an $L^p$-convergence system for any $p < 2$.

Proof. Define $\omega_R = \sup_{N, M \ge R} |S_N^{\mathcal N}(c, f) - S_M^{\mathcal N}(c, f)|$. By assumption $\lim_{R \to \infty} \omega_R = 0$ a.e., and by the above remark $\omega_1 \in L^p(\mathbb T)$, $p < 2$. Thus by Fatou's lemma,
\[ 0 = \mathbf E \limsup_{N, M \to \infty} \bigl| S_N^{\mathcal N}(c, f) - S_M^{\mathcal N}(c, f) \bigr|^p \ge \limsup_{N, M \to \infty} \mathbf E \bigl| S_N^{\mathcal N}(c, f) - S_M^{\mathcal N}(c, f) \bigr|^p. \]
The previous results summarize the basic equivalence of a.e. convergence results and maximal inequalities for $f(n_k x)$. In Theorem 12.2.16 at the end of this section we will in fact prove a maximal inequality that leads to various a.e. convergence results for $\sum_{k=1}^{\infty} c_k f(n_k x)$. Except for this result, however, our approach to a.e. convergence will be different, and we will use a combination of martingale and quasi-orthogonality arguments to achieve our goal. Theorems 12.2.1 and 12.2.2 above show that the convergence properties of $\sum_{k=1}^{\infty} c_k f(n_k x)$ depend sensitively on the smoothness properties of $f$, and we start with a few preliminary remarks concerning smoothness criteria. Let $f \in L^2(\mathbb T)$ with $\int_{\mathbb T} f(t)\,dt = 0$ have Fourier series
\[ f \sim \sum_{k=1}^{\infty} (a_k \cos 2\pi k x + b_k \sin 2\pi k x) \tag{12.2.7} \]


and let
\[ r_f(N) = \sum_{k=N}^{\infty} (a_k^2 + b_k^2). \tag{12.2.8} \]
Given an integer $m \ge 1$, let $[f]_m$ denote the function in $[0,1)$ which takes the constant value $m \int_{k/m}^{(k+1)/m} f(t)\,dt$ in the interval $[k/m, (k+1)/m)$ ($k = 0, 1, \ldots, m-1$). In probabilistic terms, $[f]_m$ is the conditional expectation of $f$ with respect to the $\sigma$-field generated by the intervals $[k/m, (k+1)/m)$. Let
\[ r_f^*(N) = \| f - [f]_N \|. \tag{12.2.9} \]
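The block averages $[f]_m$ and the quantity $r_f^*(m)$ are easy to approximate numerically. The following sketch is ours: `block_average_error` is a hypothetical helper that replaces the integrals by midpoint sums, and the tent function is merely a convenient Lipschitz test function with mean zero. For this particular function and even $m$, $f$ is linear with slope $\pm 1$ on every block, so a direct computation gives $\| f - [f]_m \| = 1/(m\sqrt{12})$, which the code reproduces up to discretisation error.

```python
from math import sqrt


def block_average_error(f, m, samples=200):
    """Approximate ||f - [f]_m|| in L^2[0, 1) using midpoint sampling on each block."""
    total = 0.0
    width = 1.0 / (m * samples)
    for k in range(m):
        values = [f((k * samples + i + 0.5) * width) for i in range(samples)]
        mean = sum(values) / samples   # approximates the value of [f]_m on [k/m, (k+1)/m)
        total += sum((v - mean) ** 2 for v in values) * width
    return sqrt(total)


def tent(t):
    """A Lipschitz (alpha = 1) function on [0, 1) with integral zero."""
    return abs(t - 0.5) - 0.25


if __name__ == "__main__":
    for m in (2, 4, 8, 16, 32):
        print(m, block_average_error(tent, m), 1.0 / (m * sqrt(12.0)))
```

Doubling $m$ halves the error, in line with the decay $r_f^*(N) = O(N^{-\alpha})$ for Lipschitz functions of order $\alpha$ (here $\alpha = 1$).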

The speed of convergence of $r_f^*(N)$ to zero clearly measures the smoothness of $f$; for example, if $f$ is a $\mathrm{Lip}(\alpha)$ function, then $r_f^*(N) = O(N^{-\alpha})$. A simple connection between $r_f(N)$ and $r_f^*(N)$ is given by the following lemma, due essentially to Ibragimov [1962]. Its proof will be given after the proofs of Theorems 12.2.7 and 12.2.2.

12.2.6 Lemma. Let $\lambda > 1$ and $g(t) = f(\lambda t)$. Then we have, for any $m \ge \lambda$,
\[ \| g - [g]_m \| \le C \bigl( (m/\lambda)^{-1/2} + r_f((m/\lambda)^{1/3}) \bigr) \tag{12.2.10} \]
where $C$ is a positive constant depending only on $f$. In particular, for any $N \ge 1$ we have $r_f^*(N) \le C(N^{-1/2} + r_f(N^{1/3}))$. Thus if $r_f(N) = O(N^{-\alpha})$ for some $0 < \alpha \le 1$, then $r_f^*(N) = O(N^{-\alpha/3})$.

Turning to the convergence behavior of $\sum c_k f(n_k x)$, we first study the lacunary case, i.e., we assume that $\{n_k, k \ge 1\}$ grows very rapidly. If $\{n_k, k \ge 1\}$ satisfies the Hadamard gap condition (12.2.2), then by Theorem 12.2.2 the system $f(n_k x)$ is a convergence system under mild smoothness conditions on $f$. We now investigate the case when $\{n_k, k \ge 1\}$ grows with a sub-exponential speed, i.e., it satisfies the gap condition $n_{k+1}/n_k \ge 1 + \varepsilon_k$, $k \ge k_0$, where $\varepsilon_k$ tends to 0. A remarkable result on trigonometric series with sub-Hadamard gaps was proved by Erdős [1962], who showed that if $\{n_k, k \ge 1\}$ is a sequence of positive integers satisfying
\[ n_{k+1}/n_k \ge 1 + c k^{-\beta}, \qquad k = 1, 2, \ldots \tag{12.2.11} \]
for some $c > 0$, $\beta < 1/2$, then $\sin 2\pi n_k x$ satisfies the central limit theorem, i.e.,
\[ \lim_{N \to \infty} \lambda\Bigl\{ x \in (0,1) : (N/2)^{-1/2} \sum_{k=1}^{N} \sin 2\pi n_k x \le t \Bigr\} = (2\pi)^{-1/2} \int_{-\infty}^{t} e^{-u^2/2}\,du. \]


Moreover, this result becomes false for $\beta = 1/2$. Thus, under (12.2.11) with $\beta < 1/2$ the sequence $\sin 2\pi n_k x$ behaves like a sequence of independent random variables, and this is no longer valid if $\beta = 1/2$. Our next theorem gives a strong convergence property of series $\sum_{k=1}^{\infty} c_k f(n_k x)$ under the Erdős gap condition (12.2.11). Define, for any $* > 0$,
\[ \tau_{k,*}(c) = \sup_{L \ge k^{*+1}} \sum_{\ell = L}^{L + [k^{*}]} |c_\ell|. \]
12.2.7 Theorem. Let $f \in L^\infty(\mathbb T)$ with $\int_{\mathbb T} f(t)\,dt = 0$ and $r_f(N) = O(N^{-\alpha})$ for some $\alpha > 0$. Let $\{n_k, k \ge 1\}$ be a sequence of positive integers satisfying the gap condition (12.2.11) with some $\beta < 1/2$, and let $c \in \ell^2$ with $\tau_{k,*}(c) = o(1)$ ($k \to \infty$) for all $0 < * < 1$. Assume that $\sum_{k=1}^{\infty} c_k f(n_k x)$ and all of its subseries converge in $L^2(\mathbb T)$ norm. Then $\sum_{k=1}^{\infty} c_k f(n_k x)$ also converges a.e.

It seems likely that Theorem 12.2.7 remains valid without the technical condition $\tau_{k,*}(c) = o(1)$, but this remains open. This condition is certainly satisfied if $c_k = O(k^{-1/2})$ which, in turn, holds if $c \in \ell^2$ and $\{c_k, k \ge 1\}$ is monotone. Note that if $X_k$ are independent random variables, then under suitable moment conditions, mean convergence of $\sum_{k=1}^{\infty} X_k$ implies a.e. convergence of the same series. Theorem 12.2.7 establishes a similar property for $\sum_{k=1}^{\infty} c_k f(n_k x)$. Note that the central limit theorem is in general not valid for $f(n_k x)$ under the gap condition (12.2.11) with $\beta < 1/2$, despite Erdős' theorem mentioned above (see Kac [1949: 645]).

12.2.8 Corollary. Let $f \in L^\infty(\mathbb T)$ with $\int_{\mathbb T} f(t)\,dt = 0$ and $r_f(N) = O(N^{-\alpha})$ for some $\alpha > 0$. Assume that the Dirichlet series
\[ \sum_{n=1}^{\infty} a_n n^{-s} \qquad\text{and}\qquad \sum_{n=1}^{\infty} b_n n^{-s} \tag{12.2.12} \]
are regular and bounded in the half-plane $\Re(s) > 0$. Let $\{n_k, k \ge 1\}$ be a sequence of positive integers satisfying the gap condition (12.2.11) with some $\beta < 1/2$. Then $\sum_{k=1}^{\infty} c_k f(n_k x)$ converges a.e. provided $c \in \ell^2$ and $c_k = O(k^{-1/2})$.

Corollary 12.2.8 connects the a.e. convergence of lacunary series $\sum_{k=1}^{\infty} c_k f(n_k x)$ to the classical Wintner theory, showing that the boundedness of the associated Dirichlet series (12.2.12) implies not only mean, but actually a.e. convergence in the lacunary case. We will show that this result is best possible: if the boundedness condition on the Dirichlet series (12.2.12) is not satisfied, there exists a sequence $\{n_k, k \ge 1\}$ satisfying (12.2.11) for all $\beta < 1/2$, and a positive nonincreasing sequence $c \in \ell^2$ such that $\sum_{k=1}^{\infty} c_k f(n_k x)$ diverges almost everywhere. On the other hand, if we are interested in the a.e. convergence of $\sum_{k=1}^{\infty} c_k f(n_k x)$ under more stringent coefficient conditions $\sum_{k=1}^{\infty} c_k^2\, \omega(k) < \infty$, $\omega(k) \to \infty$, then the condition on the Dirichlet series can be dropped, as the following result shows.


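12.2.9 Lemma. Let $f \in \mathrm{Lip}_\alpha(\mathbb T)$ for some $0 < \alpha \le 1$ and assume that $\int_{\mathbb T} f(t)\,dt = 0$. Let $\{n_k, k \ge 1\}$ be an increasing sequence of positive integers and put
\[ \omega(j) := \max\Bigl( \sum_{1 \le \ell \le j} \Bigl(\frac{n_\ell}{n_j}\Bigr)^{\alpha},\ \sum_{k \ge j} \Bigl(\frac{n_j}{n_k}\Bigr)^{\alpha} \Bigr). \]
Then
\[ \int_{\mathbb T} \Bigl| \sum_{k=1}^{N} c_k f(n_k x) \Bigr|^2 dx \le C \sum_{k=1}^{N} c_k^2\, \omega(k) \]
with some constant $C$. In particular, if $\sum_{k=1}^{\infty} c_k^2\, \omega(k) < \infty$, then $\sum_{k=1}^{\infty} c_k f(n_k x)$ converges in $L^2$ norm.

In particular, if $n_k = [\exp(k/(\log k)^{\tau})]$, then $\omega(j) = (\log j)^{\rho}$, and in the case $n_k = [\exp(k^{\eta})]$, $0 < \eta < 1$, then $\omega(j) = j^{1-\eta}$. We supplement Theorem 12.2.7 with another result reducing the almost everywhere convergence of $\sum_{k=1}^{N} c_k f(n_k x)$ to mean convergence under an additional assumption on the size of the tail sums $\sum_{k > N} c_k^2$, or, alternatively, under the assumption $\sum_{k=1}^{\infty} c_k^2\, \omega(k) < \infty$ for a suitable $\omega(k) \to \infty$.

12.2.10 Theorem. Let $f \in \mathrm{Lip}_\alpha(\mathbb T)$ for some $0 < \alpha \le 1$ and assume that $\int_{\mathbb T} f(t)\,dt = 0$. Let $\{n_k, k \ge 1\}$ be an increasing sequence of positive integers and $c \in \ell^2$. Assume that $\sum_{k=1}^{\infty} c_k f(n_k x)$ converges in $L^2$ norm and
\[ \lim_{R \to \infty} \Bigl( \sum_{k > R} c_k^2 \Bigr)^{1/2} \Bigl( \sum_{k > R} n_k^{-2} \Bigr)^{1/2} \Bigl( \sum_{k \le R} n_k^{\alpha} \Bigr)^{1/\alpha} = 0. \tag{12.2.13} \]
Then $\sum_{k=1}^{\infty} c_k f(n_k x)$ converges almost everywhere.

If the sequence $\{n_k, k \ge 1\}$ satisfies the Hadamard gap condition (12.2.2), relation (12.2.13) trivially holds whenever $c \in \ell^2$. If, on the other hand, $\{n_k, k \ge 1\}$ grows slower than exponentially, condition (12.2.13) imposes a restriction on the tail sums $\sum_{k > R} c_k^2$, which is very mild if $\{n_k, k \ge 1\}$ grows near exponentially. For example, if $n_k = [e^{k/(\log k)^{\tau}}]$ for some $\tau > 0$, then (12.2.13) reduces to
\[ \sum_{k > R} c_k^2 = O\bigl( (\log R)^{-\tau(1 + 2/\alpha)} \bigr). \]
If $n_k = [e^{k/(\log\log k)^{\tau}}]$, then (12.2.13) becomes
\[ \sum_{k > R} c_k^2 = O\bigl( (\log\log R)^{-\tau(1 + 2/\alpha)} \bigr), \]
and if $n_k = [e^{k^{\gamma}}]$, $0 < \gamma < 1$, then we get
\[ \sum_{k > R} c_k^2 = O\bigl( R^{-(1-\gamma)(1 + 2/\alpha)} \bigr). \]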


The latter case corresponds to the Erdős gap condition (12.2.11), and thus we see that the conditions of Theorem 12.2.10 are more restrictive than those of Theorem 12.2.7. On the other hand, in Theorem 12.2.10 we do not assume regularity conditions like $c_k = O(k^{-1/2})$.

Proof of Theorem 12.2.10. We follow Kac [1943]. For almost all points $t_0$,
\[ f^*(t_0) = \lim_{h \to 0} \frac{1}{h} \int_{t_0}^{t_0 + h} f^*(u)\,du. \tag{12.2.14} \]
Now, since $\sum_{k \ge 1} c_k f(n_k\,\cdot)$ converges in mean to $f^*$, by Parseval's relation,
\[ \int_{t_0}^{t_0 + h} f^*(u)\,du = \sum_{k \ge 1} c_k \int_{t_0}^{t_0 + h} f(n_k u)\,du. \tag{12.2.15} \]
We shall use the following estimate: there exists a constant $C$ such that for any $0 \le a < b < 1$ and any positive integer $k$,
\[ \Bigl| \int_a^b f(n_k u)\,du \Bigr| \le C n_k^{-1}. \tag{12.2.16} \]
Let $\chi$ be the characteristic function of the interval $[a, b]$, extended with period 1 onto the whole real line. Suppose that
\[ \chi(x) = \sum_{m \in \mathbb Z} a_m e_m(x). \]
By Parseval's relation,
\[ \int_a^b f(n_k u)\,du = \sum_{m \in \mathbb Z} \varphi_m a_{n_k m}. \]
We have $a_m = O(1/|m|)$ (see (12.2.39)); and thus we get
\[ \Bigl| \int_a^b f(n_k u)\,du \Bigr| \le \Bigl( \sum_{m \in \mathbb Z} |\varphi_m|^2 \Bigr)^{1/2} \Bigl( \sum_{m \in \mathbb Z} |a_{n_k m}|^2 \Bigr)^{1/2} \le C \| f \|_2 / n_k. \]
Combining (12.2.15) with (12.2.16) gives
\[ \Bigl| \int_{t_0}^{t_0 + h} f^*(u)\,du - \sum_{k=1}^{R} c_k \int_{t_0}^{t_0 + h} f(n_k u)\,du \Bigr| \le C \Bigl( \sum_{k > R} c_k^2 \Bigr)^{1/2} \Bigl( \sum_{k > R} \Bigl(\frac{1}{n_k}\Bigr)^2 \Bigr)^{1/2}. \]
Since $f$ belongs to $\mathrm{Lip}_\alpha(\mathbb T)$,
\[ \Bigl| \sum_{k=1}^{R} c_k \int_{t_0}^{t_0 + h} \bigl[ f(n_k u) - f(n_k t_0) \bigr]\,du \Bigr| \le C |h|^{1 + \alpha} \sum_{k=1}^{R} |c_k| n_k^{\alpha}. \]
Therefore
\[ \Bigl| \frac{1}{h} \int_{t_0}^{t_0 + h} f^*(u)\,du - \sum_{k=1}^{R} c_k f(n_k t_0) \Bigr| \le C \biggl( |h|^{-1} \Bigl( \sum_{k > R} c_k^2 \Bigr)^{1/2} \Bigl( \sum_{k > R} \Bigl(\frac{1}{n_k}\Bigr)^2 \Bigr)^{1/2} + |h|^{\alpha} \sum_{k=1}^{R} |c_k| n_k^{\alpha} \biggr). \]
Choosing $h = h_R = \bigl( \sum_{k=1}^{R} n_k^{\alpha} \bigr)^{-1/\alpha}$ and observing that
\[ |h_R|^{\alpha} \sum_{k=1}^{R} |c_k| n_k^{\alpha} = \frac{\sum_{k=1}^{R} |c_k| n_k^{\alpha}}{\sum_{k=1}^{R} n_k^{\alpha}} \to 0, \]
as $R$ tends to infinity, since $c_k$ tends to 0 as $k$ tends to infinity, finally shows, in view of condition (12.2.13),
\[ \lim_{R \to \infty} \Bigl[ \frac{1}{h_R} \int_{t_0}^{t_0 + h_R} f^*(u)\,du - \sum_{k=1}^{R} c_k f(n_k t_0) \Bigr] = 0. \]
The proof is completed by combining the above result with (12.2.14).

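Proof of Lemma 12.2.9. From $f \in \mathrm{Lip}_\alpha(\mathbb T)$ it follows (see Zygmund [1959: 324]) that
\[ \sum_{\ell = n+1}^{\infty} (a_\ell^2 + b_\ell^2) \le D n^{-2\alpha}. \]
Let $j \le k$ be fixed positive integers. Using Parseval's relation yields
\[ \int_{\mathbb T} \varphi(n_j x) \varphi(n_k x)\,dx = \sum_{r n_j = s n_k} (a_r a_s + b_r b_s). \]
The relation $j \le k$ together with $r n_j = s n_k$ implies that $s \ge 1$ and $r \ge n_k/n_j$. Using the inequality $|a_r a_s + b_r b_s| \le (a_r^2 + b_r^2)^{1/2} (a_s^2 + b_s^2)^{1/2}$ and the Cauchy–Schwarz inequality we get
\[ \Bigl| \int_{\mathbb T} \varphi(n_j x) \varphi(n_k x)\,dx \Bigr| \le \Bigl( \sum_{r \ge n_k/n_j} (a_r^2 + b_r^2) \Bigr)^{1/2} \Bigl( \sum_{s \ge 1} (a_s^2 + b_s^2) \Bigr)^{1/2} \le B \Bigl(\frac{n_j}{n_k}\Bigr)^{\alpha}. \]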

Thus

T 1≤j 0 by the choice of *. By the periodicity of f and 1 0 f dx = 0 we clearly have for any real L and λ ≥ 1, L+1 1 2 f (λx)dx ≤ |f (x)|dx λ L

0

and thus (12.2.24) shows that the last expression of (12.2.23) cannot exceed C|cν | 1 ≤ C|k | ≤ Ck −2 . mν mt

ν∈k


Hence we proved (12.2.22) and thus (12.2.20). It is now easy to complete the proof of Lemma 12.2.11. By (12.2.18) and wellknown properties of conditional expectations we have E (|Dk − Tk |2 | Fk−1 ) = E |Dk − Tk |2 ≤ Ck −8 1 and thus by the Tchebycheff inequality

P E (|Dk − Tk | | Fk−1 ) ≥ k −2 ≤ P E |Dk − Tk |2 | Fk−1 ≥ k −4 ≤ Ck −4 . Together with (12.2.20) this yields (12.2.19). Set Dk = Dk −E (Dk | Fk−1 ); clearly (Dk , Fk ) is a martingale difference sequence and hence orthogonal. Also,

E (Dk | Fk−1 ) ≤ E ((Dk − Tk ) | Fk−1 ) + E (Tk | Fk−1 )

(12.2.25)

≤ Dk − Tk + Ck −2 ≤ C(k −4 + k −2 ) by (12.2.18) and (12.2.20). By the assumptions of Theorem 12.2.7, in L2 (T) norm and thus n Tk → 0

∞

k=1 Tk

converges

as m, n → ∞.

k=m

Consequently, using the orthogonality of Dk , (12.2.18) and (12.2.25) we get n

E Dk2

k=m

1/2

n n n = Dk ≤ Dk + E (Dk | Fk−1 ) k=m

k=m

k=m

k=m

k=m

k=m

n n n n ≤ Dk + C k −2 ≤ Tk + C k −2 → 0 k=m

(12.2.26) 2 as m, n → ∞. Thus ∞ k=1 E Dk < ∞ and thus the martingale convergence theorem implies that k Dk is a.e. convergent. Now k E (D k | Fk−1 ) is a.e. convergent by Lemma 12.2.11 and the Borel–Cantelli lemma, further k (Tk − Dk ) is a.e. convergent by (12.2.18) and the Beppo Levi theorem. Thus T is a.e. convergent; for the same k k reason k Tk is also a.e. convergent, where Tk = cν f (nν x). ν∈k

Hence setting SN =

ν≤N

cν f (nν x),

Nk = 2

[i * ] i≤k


we proved that SNk is a.e. convergent. To prove the theorem it remains to show that Mk → 0 a.e. where Mk = max |SN − SNk |. Nk ≤N 0 and thus using g1 (x) =

N

(ak cos 2π kλx + bk sin 2π kλx)

k=1

and the linearity of the operation g → [g] and the fact that [g]m ≤ g we get |g1 − [g1 ]m | ≤

N

2π kλ(|ak | + |bk |)m−1

k=1

≤ 2π λm−1

N

k2

∞ 1/2 !

k=1

ak2

1/2

+

∞

k=1

bk2

1/2

≤ Cλm−1 N 3/2

k=1

(12.2.27) with some constant C depending on f . Further, by the periodicity of f and f1 we have λ 1 2 2 2 −1 f2 (λx) dx = 4λ f2 (t)2 dt

g2 − [g2 ]m ≤ 2 g2 = 4 0 0 [λ]+1 1 (12.2.28) −1 2 −1 f2 (t) dt ≤ 4λ f2 (t)2 dt ≤ 4λ 0

0

≤ 8 f − f2 = 8r(N). 2


Using relations (12.2.27)–(12.2.28) we get
\[ \| g - [g]_m \| \le C(\lambda m^{-1} N^{3/2} + r(N)) \tag{12.2.29} \]
whence the statement of the lemma follows by choosing $N = [(m/\lambda)^{1/3}]$.

We turn now to the nonlacunary case, i.e., the case when no growth condition on $\{n_k, k \ge 1\}$ is assumed. As we already indicated, in this case the number-theoretic structure of the sequence $\{n_k, k \ge 1\}$ will play an important role in the convergence behavior of $\sum_{k=1}^{\infty} c_k f(n_k x)$. The notion of a quasi-orthogonal system is of particular relevance in the study of the convergence in mean and/or almost everywhere of series $\sum_{k} c_k f(n_k x)$. In this direction, we will establish the following general result. Here, and in the sequel, let $L(x) = \log(x \vee 1)$ for $x \in \mathbb R$.

12.2.12 Theorem. Let $f \in L^2(\mathbb T)$ with $\int_{\mathbb T} f(x)\,dx = 0$. Let $\{n_k, k \ge 1\}$ be an increasing sequence of positive integers and assume that there exists a sequence $\{C_k, k \ge 1\}$ of positive integers such that
\[ \sum_{k=1}^{\infty} r_f^*(C_k)^2 < \infty \tag{12.2.30} \]

and

(nh , nk ) (nh , nk )Ck L < ∞. (12.2.31) nk nh h≥1 k>h ∞ 2 2 Then the series ∞ k=1 ck f (nk x) converges a.e. provided k=1 ck (log k) < ∞. sup ch

The following theorem describes what happens if condition (12.2.31) of Theorem 12.2.12 is not assumed. 12.2.13 Theorem. Let f ∈ L2 (T) with T f (x)dx = 0. Let {nk , k ≥ 1} be an increasing sequence of positive integers and assume that there exists a sequence {ck , k ≥ 1} of positive integers and a positive nondecreasing sequence (λk ) such that λ2k /λk = O(1) and ∞ rf∗ (ck )2 /λk < ∞, (12.2.32) k=1

(nh , nk ) (nh , nk )ck sup ch L ≤ λN . nk nh 1≤h≤N h 1/2. Let {nk , k ≥ 1} be an increasing sequence of integers and let (λk ) be a positive nondecreasing sequence such that λ2k /λk = O(1) and sup

N

nh , nk α ≤ λN .

1≤h≤N k=1

Then $\sum_{k=1}^{\infty} c_k f(n_k x)$ converges a.e. provided $\sum_{k=1}^{\infty} c_k^2 (\log k)^2 \lambda_k < \infty$.

Before proving Theorems 12.2.12–12.2.14, we give some applications.

12.2.15 Corollary. (i) Let $f \in L^2(\mathbb{T})$ with $\int_{\mathbb{T}} f(x)\,dx = 0$ and $r_f(n) = O(n^{-\alpha})$. Let $\{n_k, k \ge 1\}$ be an increasing sequence of coprime integers such that $n_k \ge k^{\beta}$ with some $\beta > 1 + 1/(2\alpha)$. Then $\sum_{k=1}^{\infty} c_k f(n_k x)$ converges almost everywhere provided $\sum_{k=1}^{\infty} c_k^2 (\log k)^2 < \infty$.

(ii) Let $f \in L^2(\mathbb{T})$ with $\int_{\mathbb{T}} f(x)\,dx = 0$ and with Fourier coefficients satisfying $a_k = O(k^{-\alpha})$, $b_k = O(k^{-\alpha})$, $\alpha > 1/2$. Let $\{n_k, k \ge 1\}$ be an increasing sequence of pairwise coprime integers such that $\sum_{k=1}^{\infty} n_k^{-\alpha} < \infty$. Then $\sum_{k=1}^{\infty} c_k f(n_k x)$ converges almost everywhere provided $\sum_{k=1}^{\infty} c_k^2 (\log k)^2 < \infty$.

(iii) Let $f \in L^2(\mathbb{T})$ have Fourier coefficients $O(1/k)$ (for example, let $f \in BV(0,1)$) and let $\{n_k, k \ge 1\}$ be a sequence of integers such that for any $d \ge 1$ we have $\sum_{d \mid n_k} n_k^{-1} \le A/d$ with an absolute constant $A$. Then $\sum_{k=1}^{\infty} c_k f(n_k x)$ converges almost everywhere provided $\sum_{k=1}^{\infty} c_k^2 (\log k)^2 \log n_k < \infty$.

(iv) Let $f \in L^2(\mathbb{T})$ with $\int_{\mathbb{T}} f(x)\,dx = 0$ and $r_f^*(n) = O(n^{-\alpha})$. Then the series $\sum_{k=1}^{\infty} c_k f(kx)$ converges almost everywhere provided $\sum_{k=1}^{\infty} c_k^2 k^{\beta} < \infty$ for some $\beta > 1/(1 + 2\alpha)$.

(v) Let $f \in L^2(\mathbb{T})$ with $\int_{\mathbb{T}} f(x)\,dx = 0$ and with Fourier coefficients satisfying $a_k = O(k^{-\alpha})$, $b_k = O(k^{-\alpha})$, $1/2 < \alpha < 1$. Then $\sum_{k=1}^{\infty} c_k f(kx)$ converges almost everywhere provided $\sum_{k=1}^{\infty} c_k^2 k^{1-\alpha} (\log k)^2 < \infty$.

(vi) Let $f \in L^2(\mathbb{T})$ with $\int_{\mathbb{T}} f(x)\,dx = 0$ and with Fourier coefficients satisfying $a_k = O(k^{-\alpha})$, $b_k = O(k^{-\alpha})$, $\alpha > 1/2$. Let $n_k = k^r$, where $r$ is an integer with $r > 1/\alpha$. Then $\sum_{k=1}^{\infty} c_k f(n_k x)$ converges almost everywhere provided $\sum_{k=1}^{\infty} c_k^2 (\log k)^2 < \infty$.

Proof of Theorem 12.2.14. We follow the proof of Theorem 12.1.3 with minor modifications, using the same notation. The assumption $\sum_{k=1}^{\infty} c_k^2 (\log k)^2 \lambda_k < \infty$ and the estimates in the second line of (12.1.18) with $\gamma = 2$ show that $\sum_{\nu=1}^{\infty} \nu^2 \int_{\mathbb{T}} Z_\nu^2\,dx < \infty$ and thus $\sum_{\nu=1}^{\infty} \nu^2 Z_\nu^2 < \infty$ almost everywhere.
Hence (12.1.17) implies that $\sum_{k=2^m+1}^{2^n} c_k f(n_k t) \to 0$ almost everywhere as $m, n \to \infty$ and thus the partial sums $\sum_{k=1}^{2^N} c_k f(n_k x)$ converge a.e. Now (12.1.16) and the Rademacher–Menshov inequality (see Section 8.3) imply
\[
\int_{\mathbb{T}} \max_{2^N+1 \le m \le 2^{N+1}} \Big| \sum_{k=2^N+1}^{m} c_k f(n_k t) \Big|^2 dt
\le C_3 \lambda_{2^{N+1}} \sum_{k=2^N+1}^{2^{N+1}} c_k^2 (\log 2^N)^2
\le C_4 \sum_{k=2^N+1}^{2^{N+1}} c_k^2 (\log k)^2 \lambda_k. \tag{12.2.34}
\]
Summing these relations for $N = 1, 2, \ldots$ and using $\sum_{k=1}^{\infty} c_k^2 (\log k)^2 \lambda_k < \infty$, it follows that
\[
\max_{2^N+1 \le m \le 2^{N+1}} \Big| \sum_{k=2^N+1}^{m} c_k f(n_k t) \Big|^2 \to 0 \quad \text{almost everywhere,}
\]

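completing the proof of Theorem 12.2.14.

Proof of Theorem 12.2.12. Let $f_k = [f]_{C_k}(n_k\,\cdot)$. By the Cauchy–Schwarz inequality we get
\[
\sum_{k=1}^{\infty} |c_k|\,\|f(n_k\,\cdot) - f_k(\cdot)\| = \sum_{k=1}^{\infty} |c_k|\,\|f(\cdot) - [f]_{C_k}(\cdot)\| = \sum_{k=1}^{\infty} |c_k|\, r_f^*(C_k)
\le \Big(\sum_{k=1}^{\infty} c_k^2 \lambda_k\Big)^{1/2} \Big(\sum_{k=1}^{\infty} r_f^*(C_k)^2/\lambda_k\Big)^{1/2}
\]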

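0. Hence by the Minkowski inequality,
\[
E\big(Y_j^{(k)}\big)^2 \ge \tfrac{1}{2} L(\psi(k)) \tag{12.3.5}
\]
for $k \ge k_0$. Also $|Y_j^{(k)}| \le |X_j^{(k)}| + K \le \text{constant} \cdot \psi(k)$ and thus setting
\[
Z_k = \frac{1}{(r_k L\psi(k))^{1/2}} \sum_{j=1}^{r_k} Y_j^{(k)}, \qquad
\sigma_k^2 = E\Big(\sum_{j=1}^{r_k} Y_j^{(k)}\Big)^2 \ge \tfrac{1}{2}\, r_k L\psi(k),
\]
we get from the central limit theorem with Berry–Esseen remainder term,
\[
P\{Z_k \ge 1\} \ge P\Big\{\sum_{j=1}^{r_k} Y_j^{(k)} \ge 2\sigma_k\Big\} \ge (1 - \Phi(2)) - C\,\frac{r_k\,\psi(k)^3}{(r_k L\psi(k))^{3/2}}
\ge 1 - \Phi(2) - o(1) \ge 0.02 \qquad (k \ge k_0)
\]
provided $r_k$ grows so rapidly that $r_k^{1/2} L(\psi(k))^{3/2} \ge \psi(k)^4$. Since the $Z_k$ are independent, the Borel–Cantelli lemma implies $P\{Z_k \ge 1 \text{ infinitely often}\} = 1$, i.e., $\sum_{k \ge 1} Z_k$ is a.e. divergent, which, in view of (12.3.3), yields that
\[
\sum_{k=1}^{\infty} \frac{1}{(r_k L(\psi(k)))^{1/2}} \sum_{j=1}^{r_k} X_j^{(k)} \tag{12.3.6}
\]
is a.e. divergent. Let now
\[
\mathcal{N} := \bigcup_{k=1}^{\infty} \bigcup_{j=1}^{r_k} I_j^{(k)}. \tag{12.3.7}
\]
Then the sum in (12.3.6) is of the form $\sum_{i=1}^{\infty} c_i f(n_i x)$ where
\[
\sum_{i=1}^{\infty} c_i^2 = \sum_{k=1}^{\infty} \frac{r_k}{r_k L(\psi(k))} = \sum_{k=1}^{\infty} \frac{1}{L(\psi(k))} < +\infty.
\]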

Finally, denote by $1 + \rho_k$ the smallest of the ratios $(j+1)/j$, $1 \le j \le \psi(k) - 1$; clearly $\rho_k > 0$. Given $\varepsilon_k \downarrow 0$ one can choose $r_k$ growing so rapidly that
\[
\rho_k \ge \varepsilon_{r_{k-1}}, \qquad k = 1, 2, \ldots. \tag{12.3.8}
\]
Now if $n_s$ and $n_{s+1}$ belong to the same set $I_j^{(k)}$, then clearly $s \ge r_{k-1}$, and thus by (12.3.7) we get $n_{s+1}/n_s \ge 1 + \rho_k \ge 1 + \varepsilon_{r_{k-1}} \ge 1 + \varepsilon_s$. Since $n_{s+1}/n_s \ge 2$ if $n_s$ and $n_{s+1}$ belong to different $I_j^{(k)}$'s, we proved that $\{n_k, k \ge 1\}$ satisfies
\[
n_{k+1}/n_k \ge 1 + \varepsilon_k \qquad (k \ge k_0). \tag{12.3.9}
\]
This completes the proof of Theorem 12.3.1.

There are few results concerning the bounded case, namely the case when in the series $\sum_k c_k f(n_k x)$, $f$ is not smooth but only bounded. We first consider the case of primes and prove the following result.

12.3.4 Theorem. Let $P := (P_k)$ be an increasing sequence of prime numbers. Let $c = \{c_k, k \ge 1\}$ be a sequence of positive reals such that
\[
\sum_k c_k^2 < \infty, \qquad \sum_k c_k = \infty. \tag{12.3.10}
\]
Then there exists a function $f \in L^\infty(\mathbb{T})$ with $\int_{\mathbb{T}} f(t)\,dt = 0$ such that the series $\sum_{k=1}^{\infty} c_k f(P_k x)$ diverges on a set with positive measure.

Theorem 12.3.4 will be deduced from the following

12.3.5 Theorem. Let $P := (P_k)$ be an increasing sequence of prime numbers. Let $c = \{c_k, k \ge 1\}$ be a sequence of positive reals such that
\[
\sum_k c_k^2 < \infty, \qquad \sum_k c_k = \infty. \tag{12.3.11}
\]
Put $C_n = \sum_{k \le n} c_k$ and consider the weighted sums
\[
S_n f = \frac{1}{C_n} \sum_{k \le n} c_k f(P_k x). \tag{12.3.12}
\]
Then there exists a function $f \in L^\infty(\mathbb{T})$ with $\int_{\mathbb{T}} f(t)\,dt = 0$ such that the sequence $\{S_n f, n \ge 1\}$ diverges on a set with positive measure.

Proof of Theorem 12.3.4. Assuming that Theorem 12.3.5 is valid, there exists a bounded measurable function $f$ such that $(S_n f)_n$ does not converge almost everywhere. Then the partial sums $\sum_{k \le n} c_k f(P_k x)$ do not converge almost everywhere either. Otherwise, this would imply, in view of the assumption that the series $\sum_k c_k$ diverges, that $(S_n f(x))_n$ tends to 0 almost everywhere, a contradiction. Hence the result.
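The last step of this proof rests on an elementary fact: if $\sum_k c_k a_k$ converges while $C_n = \sum_{k \le n} c_k \to \infty$, then the weighted means $C_n^{-1}\sum_{k \le n} c_k a_k$ tend to 0. The following Python sketch merely illustrates this on one deterministic example; the sequences `c` and `a` and the helper `weighted_means` are illustrative assumptions and do not come from the text.

```python
from itertools import accumulate

def weighted_means(c, a):
    """Return S_n = (1/C_n) * sum_{k<=n} c_k a_k for n = 1..len(c), with C_n = c_1 + ... + c_n."""
    partial = list(accumulate(ck * ak for ck, ak in zip(c, a)))
    weights = list(accumulate(c))
    return [p / w for p, w in zip(partial, weights)]

n = 100_000
# Hypothetical example: c_k = 1/k, so sum c_k diverges while sum c_k^2 converges.
c = [1.0 / k for k in range(1, n + 1)]
# a_k = (-1)^k makes sum c_k a_k a convergent alternating series.
a = [(-1.0) ** k for k in range(1, n + 1)]
means = weighted_means(c, a)

for idx in (10, 100, 1000, n):
    print(idx, means[idx - 1])
```

The printed means shrink in absolute value, but only at the logarithmic rate $1/C_n$, which is the rate the argument needs and no more.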

12.3 Almost sure convergence – necessary conditions


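To prove Theorem 12.3.5, we use Bourgain’s entropy criterion in $L^\infty$ (Corollary 6.1.8) and Lemma 6.1.5.

Proof of Theorem 12.3.5. Let $\{T_N, N \ge 1\}$ be integers such that $T_N - T_{N-1}$ increases to infinity with $N$. Define
\[
\Lambda_N = \Big\{ u = P_{T_{N-1}+1}^{\alpha_{T_{N-1}+1}} \cdots P_{T_N}^{\alpha_{T_N}} : \alpha_i \in \{0,1\} \text{ and } (\alpha_{T_{N-1}+1}, \ldots, \alpha_{T_N}) \ne (0, \ldots, 0) \Big\},
\qquad
f_N = \frac{1}{\big(2^{T_N - T_{N-1}} - 1\big)^{1/2}} \sum_{u \in \Lambda_N} e_u. \tag{12.3.13}
\]
Let $T_{N-1} < R \le T_N$. Then,
\[
\langle S_R(f_N), f_N \rangle = \frac{1}{C_R}\,\frac{1}{2^{T_N - T_{N-1}} - 1} \sum_{k \le R} c_k \sum_{u \in \Lambda_N} \sum_{v \in \Lambda_N} \langle e_{uP_k}, e_v \rangle.
\]
Let $u, v \in \Lambda_N$ and $k \le R$. Then $\langle e_{uP_k}, e_v \rangle = 1$ if and only if $uP_k = v$. Writing $u = P_{T_{N-1}+1}^{\alpha_{T_{N-1}+1}} \cdots P_{T_N}^{\alpha_{T_N}}$, $v = P_{T_{N-1}+1}^{\beta_{T_{N-1}+1}} \cdots P_{T_N}^{\beta_{T_N}}$, this means that
\[
P_k\, P_{T_{N-1}+1}^{\alpha_{T_{N-1}+1}} \cdots P_{T_N}^{\alpha_{T_N}} = P_{T_{N-1}+1}^{\beta_{T_{N-1}+1}} \cdots P_{T_N}^{\beta_{T_N}}.
\]
This equation has solutions if and only if $k$ belongs to the interval $]T_{N-1}, T_N]$, and then the solutions are given by
\[
\alpha_k = 0, \qquad \beta_k = 1, \qquad \alpha_j = \beta_j \text{ otherwise.}
\]
Hence,
\[
\langle f_N(P_k\,\cdot), f_N \rangle = \frac{2^{T_N - T_{N-1} - 1} - 1}{2^{T_N - T_{N-1}} - 1} \ge \frac{1}{4}. \tag{12.3.14}
\]
Consequently, for any integer $N \ge 1$ and any $T_{N-1} < R \le T_N$,
\[
\langle S_R(f_N), f_N \rangle = \frac{1}{C_R} \sum_{k \le R} c_k \langle f_N(P_k\,\cdot), f_N \rangle = \frac{1}{C_R} \sum_{\substack{k \le R \\ k \in ]T_{N-1}, T_N]}} c_k \langle f_N(P_k\,\cdot), f_N \rangle \ge \frac{1}{4}.
\]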

(12.3.15) The proof is achieved by applying Lemma 6.1.5 and the entropy criterion in $L^\infty$.

The next two theorems will concern subsequences $N$ generated by infinitely many primes.

12.3.6 Theorem. Let $P = \{P_1, P_2, \ldots\}$ be an increasing sequence of positive pairwise coprime integers, and denote by $C(P)$ the infinite-dimensional chain generated by $P$. Let $c = \{c_k, k \ge 1\}$ be a sequence of positive reals such that the series $\sum_{k=1}^{\infty} c_k$ diverges. Define for any measurable function $f : \mathbb{T} \to \mathbb{R}$ the weighted sums
\[
S_n f(x) = \frac{1}{\sum_{j \in C(P) \cap [1,n]} c_j} \sum_{j \in C(P) \cap [1,n]} c_j f(jx).
\]
Assume that
\[
\limsup_{i \to \infty} \frac{\sum_{j \in C(P) \cap [\frac{1}{2} P_1^{2i},\, P_1^{2i}]} c_j}{\sum_{j \in C(P) \cap [1,\, P_1^{2i}]} c_j} > 0. \tag{12.3.16}
\]

Then there exists a bounded measurable function $f$ such that $(S_n f)_n$ does not converge almost everywhere.

From Theorem 12.3.6 one can obtain

12.3.7 Theorem. Let $P = \{P_1, P_2, \ldots\}$ be an increasing sequence of positive pairwise coprime integers, and denote by $C(P)$ the infinite-dimensional chain generated by $P$. Let $c = \{c_k, k \ge 1\}$ be a sequence of positive reals such that
\[
\sum_k c_k^2 < \infty, \qquad \sum_k c_k = \infty.
\]
Assume that condition (12.3.16) is satisfied. Then, there exists a bounded measurable function $f$ such that $\big(\sum_{k \le n} c_k f(P_k\,\cdot)\big)_n$ does not converge almost everywhere.

The proof of Theorem 12.3.7 is similar to the proof of Theorem 12.3.5, so it is omitted.

Proof of Theorem 12.3.6. Let $s$ be some fixed positive integer. Put for any integer $T \ge 0$,
\[
A_T = \big\{ n = P_1^{\alpha_1} \cdots P_s^{\alpha_s} : P_1^{T} \le n < P_1^{T+1},\ \alpha_i \ge 0,\ i = 1, \ldots, s \big\}. \tag{12.3.17}
\]
By replacing $\alpha_1$ by $\alpha_1 + 1$, one can easily verify that
\[
\#(A_T) \le \#(A_{T+1}). \tag{12.3.18}
\]
As for $n = P_1^{\alpha_1} \cdots P_s^{\alpha_s} \in A_T$, necessarily $0 \le \alpha_1 + \cdots + \alpha_s \le T$, so we also deduce
\[
\#(A_T) \le T^s. \tag{12.3.19}
\]
Then, for any $d > 0$, there exists an integer $T > 0$ such that
\[
\#(A_{T+d}) \le 2\,\#(A_T). \tag{12.3.20}
\]
Indeed, otherwise, $\#(A_{T+d}) > 2\,\#(A_T)$ for any $T$, would imply for any integer $n$, $\#(A_{nd}) > B 2^n$, where $B$ is some positive constant, which contradicts (12.3.19). Choose $d$ such that $P_1^d \le P_s$. Any element $j \in C(P)$ such that $j \le P_1^d$ can be thus expressed as $j = P_1^{\alpha_1} \cdots P_r^{\alpha_r}$ with $r \le s$. Put for any $i = 0, \ldots, d$,
\[
f^{(i)}(x) = \frac{1}{\#(A_{T+i})^{1/2}} \sum_{n \in A_{T+i}} e^{2i\pi nx}, \tag{12.3.21}
\]
and let $f = f^{(0)}$. Next, put for any $i = 0, \ldots, [d/2]$,
\[
\varphi_i = \frac{f^{(2i-1)} + f^{(2i)}}{\sqrt{2}}, \tag{12.3.22}
\]

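and let for any integer $j$, $f_j(x) = f(jx)$. The set of functions $f^{(i)}$ is a sub-orthonormal system of $L^2$ and the same property holds true for the system of functions $\varphi_i$. Moreover $\|f_j\| = 1$ for any $j$.

Let $1 \le i \le [d/2]$, $j \in [P_1^{2i-1}, P_1^{2i}] \cap C(P)$, and examine $f_j$. Let $n \in A_T$. Then $nj$ may be written as $nj = P_1^{\beta_1} \cdots P_s^{\beta_s}$. Moreover, $P_1^{T+2i-1} \le nj < P_1^{T+2i+1}$. It follows that we have the implication
\[
n \in A_T \text{ and } j \in [P_1^{2i-1}, P_1^{2i}] \cap C(P) \implies nj \in A_{T+2i-1} \cup A_{T+2i}.
\]
We may thus write
\[
f_j(x) = \frac{1}{\#(D)^{1/2}} \sum_{m \in D} e^{2i\pi mx},
\]
where $D \subset A_{T+2i-1} \cup A_{T+2i}$ and $\#(D) = \#(A_T)$. Hence,
\[
\sqrt{2}\,\langle f_j, \varphi_i \rangle = \frac{1}{[\#(A_T)\,\#(A_{T+2i-1})]^{1/2}} \sum_{m \in D \cap A_{T+2i-1}} 1 + \frac{1}{[\#(A_T)\,\#(A_{T+2i})]^{1/2}} \sum_{m \in D \cap A_{T+2i}} 1
\ge \frac{1}{\sqrt{2}\,\#(A_T)} \cdot \#(A_T) = \frac{1}{\sqrt{2}},
\]
and so for any $1 \le i \le [d/2]$, $P_1^{2i-1} \le j \le P_1^{2i}$,
\[
\langle f_j, \varphi_i \rangle \ge \frac{1}{2}. \tag{12.3.23}
\]
Further, $\langle f_j, \varphi_k \rangle \ge 0$ for any $j$ and $k$. Thus,
\[
\langle S_{P_1^{2i}}(f), \varphi_i \rangle = \frac{1}{\sum_{j \in C(P) \cap [1, P_1^{2i}]} c_j} \sum_{j \in C(P) \cap [1, P_1^{2i}]} c_j \langle f_j, \varphi_i \rangle
\ge \frac{1}{\sum_{j \in C(P) \cap [1, P_1^{2i}]} c_j} \sum_{j \in C(P) \cap [\frac{1}{2} P_1^{2i}, P_1^{2i}]} c_j \langle f_j, \varphi_i \rangle
\ge \frac{1}{2}\, \frac{\sum_{j \in C(P) \cap [\frac{1}{2} P_1^{2i}, P_1^{2i}]} c_j}{\sum_{j \in C(P) \cap [1, P_1^{2i}]} c_j}.
\]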

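We have obtained for any $i = 1, \ldots, [d/2]$,
\[
\langle S_{P_1^{2i}}(f), \varphi_i \rangle \ge \frac{1}{2}\, \frac{\sum_{j \in C(P) \cap [\frac{1}{2} P_1^{2i}, P_1^{2i}]} c_j}{\sum_{j \in C(P) \cap [1, P_1^{2i}]} c_j}. \tag{12.3.24}
\]
Now, by assumption
\[
\limsup_{i \to \infty} \frac{\sum_{j \in C(P) \cap [\frac{1}{2} P_1^{2i}, P_1^{2i}]} c_j}{\sum_{j \in C(P) \cap [1, P_1^{2i}]} c_j} > 0.
\]
We may find an increasing sequence $(i_\lambda)_\lambda$ of integers as well as a positive real $c$, such that
\[
\frac{\sum_{j \in C(P) \cap [\frac{1}{2} P_1^{2i_\lambda}, P_1^{2i_\lambda}]} c_j}{\sum_{j \in C(P) \cap [1, P_1^{2i_\lambda}]} c_j} \ge 2c \qquad (\lambda = 1, 2, \ldots).
\]
Consequently, for any $\lambda$ such that $i_\lambda \le d$,
\[
\langle S_{P_1^{2i_\lambda}}(f), \varphi_{i_\lambda} \rangle \ge c. \tag{12.3.25}
\]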

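Let $p$ be a positive integer such that $pc \ge 1$. Lemma 6.1.5 applied with the choices $R = [D/2]$, $T = [[D/2]/13]$ with $D = \#\{\lambda : i_\lambda \le d\}$ and $p$ shows that
\[
N\Big( \big\{ S_{P_1^{2i}}(f),\ i \le \tfrac{D}{2} \big\},\ \tfrac{c}{2} \Big) \ge T. \tag{12.3.26}
\]
But $d$ is arbitrary, thus
\[
\sup_{f \in L^\infty,\ \|f\|_2 \le 1} N\Big( \big\{ S_{P_1^{2i}}(f),\ i \ge 1 \big\},\ \tfrac{c}{2} \Big) = \infty.
\]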

Applying now Bourgain’s entropy criterion in L∞ (Corollary 6.1.8) achieves the proof.
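The counting facts (12.3.18)–(12.3.20) used in the proof above are easy to explore by brute force. The sketch below is only an illustration under stated assumptions: the generators `(2, 3, 5)`, the value `d = 2` (chosen so that $P_1^d \le P_s$) and the helper name `chain_block` are hypothetical choices, not taken from the text. It enumerates $A_T$, checks monotonicity, checks the safe polynomial bound $\#(A_T) \le (T+1)^s$ (which serves the same purpose as (12.3.19)), and searches for a $T$ satisfying (12.3.20).

```python
def chain_block(primes, T):
    """Sorted list of n = p1^a1 * ... * ps^as with p1^T <= n < p1^(T+1)."""
    lo, hi = primes[0] ** T, primes[0] ** (T + 1)
    found = set()

    def extend(idx, value):
        # All generators assigned: keep the product if it lies in the block.
        if idx == len(primes):
            if lo <= value < hi:
                found.add(value)
            return
        p = primes[idx]
        # Try every exponent of primes[idx] that keeps the product below hi.
        while value < hi:
            extend(idx + 1, value)
            value *= p

    extend(0, 1)
    return sorted(found)

primes = (2, 3, 5)   # hypothetical pairwise coprime generators P_1 < P_2 < P_3
d = 2                # P_1^d = 4 <= P_3 = 5
sizes = [len(chain_block(primes, T)) for T in range(12)]

monotone = all(sizes[T] <= sizes[T + 1] for T in range(len(sizes) - 1))
polynomial = all(sizes[T] <= (T + 1) ** len(primes) for T in range(len(sizes)))
good_T = next(T for T in range(1, len(sizes) - d) if sizes[T + d] <= 2 * sizes[T])

print(sizes, monotone, polynomial, good_T)
```

Because the block sizes grow only polynomially in $T$, a doubling over every window of length $d$ cannot persist, which is exactly the contradiction invoked after (12.3.20).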

12.4 Random sequences

In this section we investigate the convergence of the series $\sum_{k=1}^{\infty} c_k f(n_k x)$ where $\{n_k, k \ge 1\}$ is a random sequence of real numbers. Specifically, we will investigate the model when $n_k = X_1 + \cdots + X_k$, where the $X_k$ are independent, identically distributed random variables defined on some probability space $(\Omega, \mathcal{A}, P)$. We will not assume that $X_1$ is integer valued or $X_1 > 0$; we assume only that the distribution of $X_1$ is nondegenerate. If the random walk $\{\sum_{k=1}^{n} X_k, n \ge 1\}$ is transient, we have $|n_k| \to \infty$ a.s. On the other hand, if the random walk is recurrent and $X_1$ is nonlattice, $\{n_k, k \ge 1\}$ is dense in $\mathbb{R}$ with probability 1. We begin our investigations with the study of random trigonometric sums of the form
\[
\sum_{n=1}^{\infty} c_n e^{itS_n(\omega)} \tag{12.4.1}
\]
where $\{c_k, k \ge 1\} \in \ell^2$; the terms of this sum are functions defined on the product space $\Omega \times \mathbb{T}$, endowed with the product probability $P \times \lambda$.

12.4.1 Theorem. Let $X_1$ be nondegenerate with characteristic function $\phi$ and let $S_n = \sum_{k=1}^{n} X_k$ be the corresponding random walk. Then for any $c \in \ell^2$ and any real $t$ for which
\[
\rho = \max(|\phi(t)|, |\phi(2t)|, |\phi(-t)|, |\phi(-2t)|) < 1 \tag{12.4.2}
\]
the series (12.4.1) converges with probability 1. Consequently, the series (12.4.1) converges for almost all $(t, \omega) \in \mathbb{T} \times \Omega$, provided $c \in \ell^2$.

Since $X_1$ is nondegenerate, (12.4.2) holds for all but countably many $t$'s. If $X_1$ is nonlattice, then $|\phi(t)| < 1$ for all $t \ne 0$; otherwise there exists a $t_0 > 0$ such that $|\phi(t)| = 1$ if and only if $t = kt_0$, $k \in \mathbb{Z}$. If $X_1$ is degenerate, then $S_n = cn$ with some constant $c$, and the statement of Theorem 12.4.1 reduces to Carleson’s theorem, which is of course not contained in our result. But it is interesting to note that for all other random walks, the above formulated “random” version of Carleson’s theorem is valid. This seems paradoxical at first sight, since the random walk $S_n$ can be recurrent, e.g., it is possible that $S_n = 0$ for infinitely many $n$. However, by the theory of random walks the set $H = \{n : S_n = 0\}$ is thin (e.g., it has $O(n^{1/2})$ elements in the interval $[0, n]$) and Theorem 12.4.1 shows that $\sum_{k \in H} |c_k| < \infty$ even if $\sum_{k=1}^{\infty} |c_k| = \infty$.

Applying Theorem 8.2.1 with $\gamma = 4$, $\alpha = 2$, $u_k = c_k^2$, for the proof of Theorem 12.4.1 it suffices to prove the following

12.4.2 Lemma. For any real $c_1, \ldots, c_N$ we have
\[
E\Big| \sum_{k=1}^{N} c_k e^{itS_k} \Big|^4 \le \frac{1}{(1-\rho)^2} \Big( \sum_{k=1}^{N} c_k^2 \Big)^2, \tag{12.4.3}
\]
where $\rho$ is defined by (12.4.2).

Proof. In the case $\rho = 1$ the lemma is obvious, so we can assume $\rho < 1$. Clearly for any real $c_1, \ldots, c_N$ we have
\[
E\Big| \sum_{k=1}^{N} c_k e^{itS_k} \Big|^4 = \sum_{1 \le j,k,l,m \le N} c_j c_k c_l c_m\, E\, e^{it(S_j - S_k + S_l - S_m)}. \tag{12.4.4}
\]

We now claim that
\[
\big| E\, e^{it(\pm S_j \pm S_k \pm S_l \pm S_m)} \big| \le \rho^{(|j-k| + |l-m|)} \qquad (j \ge k \ge l \ge m) \tag{12.4.5}
\]
provided in the last exponent there are two positive and two negative signs. Clearly we can assume that the sign of $S_j$ in (12.4.5) is positive; otherwise we replace $t$ by $-t$. There are three cases:

(a) $\big| E\, e^{it(S_j - S_k + S_l - S_m)} \big| = \big| E\, e^{it(S_j - S_k)} \big|\, \big| E\, e^{it(S_l - S_m)} \big| = |\phi(t)|^{j-k} |\phi(t)|^{l-m} \le \rho^{(|j-k| + |l-m|)}$,

(b) $\big| E\, e^{it(S_j - S_k - S_l + S_m)} \big| = \big| E\, e^{it(S_j - S_k)} \big|\, \big| E\, e^{-it(S_l - S_m)} \big| = |\phi(t)|^{j-k} |\phi(-t)|^{l-m} \le \rho^{(|j-k| + |l-m|)}$,

(c) $\big| E\, e^{it(S_j + S_k - S_l - S_m)} \big| = \big| E\, e^{it(S_j - S_k) + 2it(S_k - S_l) + it(S_l - S_m)} \big| = |\phi(t)|^{j-k} |\phi(2t)|^{k-l} |\phi(t)|^{l-m} \le \rho^{(|j-k| + |l-m|)}$,

proving (12.4.5). Thus splitting the sum on the right-hand side of (12.4.4) into 24 subsums corresponding to a fixed relative order of $j, k, l, m$ and in each such sum renaming the indices $j, k, l, m$ so that they will be nonincreasing in the renamed order, we get
\[
E\Big| \sum_{k=1}^{N} c_k e^{itS_k} \Big|^4 \le 24 \sum_{N \ge j \ge k \ge l \ge m \ge 1} |c_j||c_k||c_l||c_m|\, \rho^{(|j-k| + |l-m|)}. \tag{12.4.6}
\]
Summing the right-hand side of (12.4.6) first for those indices $(j, k, l, m)$ for which $j - k = r$ and $l - m = s$ are fixed, we get by Cauchy’s inequality,
\[
\sum_{1 \le k,\, k+r,\, m,\, m+s \le N} |c_k||c_{k+r}||c_m||c_{m+s}|\, \rho^{r+s}
\le \rho^{r+s} \Big( \sum_{1 \le k,\, k+r \le N} |c_k||c_{k+r}| \Big) \Big( \sum_{1 \le m,\, m+s \le N} |c_m||c_{m+s}| \Big)
\]
\[
\le \rho^{r+s} \Big( \sum_{1 \le k \le N} c_k^2 \Big)^{1/2} \Big( \sum_{1 \le k+r \le N} c_{k+r}^2 \Big)^{1/2} \Big( \sum_{1 \le m \le N} c_m^2 \Big)^{1/2} \Big( \sum_{1 \le m+s \le N} c_{m+s}^2 \Big)^{1/2}
\le \rho^{r+s} \Big( \sum_{1 \le j \le N} c_j^2 \Big)^2.
\]
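As a sanity check on (12.4.6) and the Cauchy step above, the fourth moment can be computed exactly for a short Rademacher walk by enumerating all paths. The sketch below is an illustration under assumptions: the step distribution ($\pm 1$ with probability 1/2, so that $\phi(t) = \cos t$), the coefficients `c` and the value `t = 1.0` are hypothetical choices, and the bound tested carries the constant 24 that results from summing (12.4.6) over $r, s \ge 0$.

```python
import cmath
import itertools
import math

def fourth_moment(c, t):
    """Exact E|sum_k c_k exp(i t S_k)|^4 for a +-1 random walk, by enumerating all 2^N paths."""
    n = len(c)
    total = 0.0
    for steps in itertools.product((-1, 1), repeat=n):
        s = 0          # current position S_k of the walk
        z = 0j         # running value of sum_k c_k exp(i t S_k)
        for ck, x in zip(c, steps):
            s += x
            z += ck * cmath.exp(1j * t * s)
        total += abs(z) ** 4
    return total / 2 ** n

def rho(t):
    """rho from (12.4.2) when phi(t) = cos(t), which is an even function."""
    return max(abs(math.cos(t)), abs(math.cos(2 * t)))

c = [1.0, -0.5, 0.25, 2.0, 1.5, -1.0, 0.75, 0.5]   # hypothetical coefficients
t = 1.0                                            # hypothetical frequency with rho(t) < 1
lhs = fourth_moment(c, t)
bound = 24.0 / (1.0 - rho(t)) ** 2 * sum(x * x for x in c) ** 2

print(lhs, bound)
```

Exhaustive enumeration is only feasible for small $N$; it is used here because it avoids Monte Carlo noise.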

Now summing for $r$ and $s$ we get Lemma 12.4.2.

We turn now to the convergence of the series (12.4.1) in $L^p(\mathbb{T} \times \Omega)$ for $p > 2$. For simplicity, we consider the case $p = 4$.

12.4.3 Proposition. Let $X = \{X, X_i, i \ge 1\}$ be a sequence of independent, identically distributed, lattice random variables defined on some probability space $(\Omega, \mathcal{A}, P)$. We

assume that the random walk Sn = X1 + · · · + Xn , n ≥ 1 is transient. Then, E

n 4 ck e2ıπ αSk dα

T k=1

n ≤ 4G(0, 0) |ck |2 k=1

+6

|ci ||cj ||ck ||cl | P Sk − Si = ±(Sj − Sl )

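1≤i≤k 0 such that for $t \ge a \ge a_0$, $\int_a^{\infty} t^m e^{-\pi t}\,dt \le (2/\pi)\, a^m e^{-\pi a}$. Applying these remarks with $m = \sigma^* + 2$ to the right-hand side of (13.7.18) gives for $a$ large,
\[
\frac{C_1}{a^{1/2+\sigma^*}} \sum_{d=a}^{\infty} d^{\sigma^*+2} e^{-\pi d} \le \frac{C_2}{a^{1/2+\sigma^*}}\, a^{\sigma^*+2} e^{-\pi a} = C_2\, a^{3/2} e^{-\pi a}. \tag{13.7.19}
\]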

Combining (13.7.17) with (13.7.18) and (13.7.19) gives for a large, ∞ √ 2 1 1 d − 21 s ζ (s) − υ(2G a, 4Q d)γ + π s k(d) k(d) s(1 − s) 2 a a d=a

C (s) −1/2 + C1 (s) max Pk(d) + C2 a 3/2 e−π a . ≤ d≥a a (13.7.20)


13 Divisors and random walks

We deduce ∞ √ 1 1 2 d 1 υ(2Gk(d) a, 4Qk(d) d)γ = + π − 2 s s ζ (s). a→∞ a a s(1 − s) 2 d=a (13.7.21) This achieves the proof.

lim

We conclude this section with some results related to the famous Lindelöf Hypothesis, and involving another random walk, the Cauchy random walk. The proofs being long and very technical, we refer to the original paper [Lifshits–Weber: 2006]. We begin with basic results.

The Lindelöf Hypothesis. The Lindelöf Hypothesis (LH) asserts that
\[
\zeta\big(\tfrac{1}{2} + it\big) = O(t^{\varepsilon}) \tag{13.7.22}
\]
for every positive $\varepsilon$. Up to now, the best known result towards (13.7.22) is due to Huxley [2005]:
\[
\zeta\big(\tfrac{1}{2} + it\big) = O(t^{32/205+\varepsilon}) \qquad (\forall \varepsilon > 0),
\]
and $32/205 = 0.156097561\ldots$. The validity of the Riemann Hypothesis implies ([Titchmarsh: 1951], Theorem 14.14) that
\[
\zeta\big(\tfrac{1}{2} + it\big) = O\Big( \exp\Big( A\, \frac{\log t}{\log\log t} \Big) \Big), \tag{13.7.23}
\]
$A$ being a constant, which is even a stronger form of LH, the latter being strictly weaker than the Riemann Hypothesis. The validity of (13.7.22) is equivalent to any of the three following assertions (see [Titchmarsh: 1951], Chapter XIII):
\[
\frac{1}{T} \int_1^T \big| \zeta\big(\tfrac{1}{2} + it\big) \big|^{2k}\,dt = O(T^{\varepsilon}), \qquad k = 1, 2, \ldots, \tag{13.7.24}
\]
\[
\frac{1}{T} \int_1^T |\zeta(\sigma + it)|^{2k}\,dt = O(T^{\varepsilon}), \qquad \sigma > \tfrac{1}{2},\ k = 1, 2, \ldots, \tag{13.7.25}
\]
\[
\lim_{T \to \infty} \frac{1}{T} \int_1^T |\zeta(\sigma + it)|^{2k}\,dt = \sum_{n=1}^{\infty} \frac{d_k^2(n)}{n^{2\sigma}}, \qquad \sigma > \tfrac{1}{2},\ k = 1, 2, \ldots, \tag{13.7.26}
\]
where $d_k(n)$ denotes the number of representations of integer $n$ as a product of $k$ factors. There is also a reformulation due to Backlund of the LH in terms of the location of the zeros of $\zeta$; (13.7.22) is equivalent to
\[
N(\sigma, T+1) - N(\sigma, T) = o(\log T) \quad \text{for every } \sigma > 1/2. \tag{13.7.27}
\]
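The divisor function $d_k(n)$ appearing in (13.7.26) can be computed by a short recursion over divisors. The sketch below is illustrative only; it assumes the usual convention that representations are counted as ordered $k$-tuples, and the helper names `d` and `mean_value_series` are hypothetical.

```python
from functools import lru_cache
from math import pi

@lru_cache(maxsize=None)
def d(k, n):
    """Number of ordered k-tuples of positive integers whose product is n."""
    if k == 1:
        return 1
    # Fix the first factor m | n and count (k-1)-tuples with product n // m.
    return sum(d(k - 1, n // m) for m in range(1, n + 1) if n % m == 0)

def mean_value_series(k, sigma, terms):
    """Partial sum of sum_n d_k(n)^2 / n^(2*sigma), the right-hand side of (13.7.26)."""
    return sum(d(k, n) ** 2 / n ** (2 * sigma) for n in range(1, terms + 1))

print(d(2, 12), d(3, 4))
print(mean_value_series(1, 1.0, 10_000), pi ** 2 / 6)
```

For $k = 1$ the series reduces to $\zeta(2\sigma)$, which gives a quick consistency check at $\sigma = 1$.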

It is thus natural to study the asymptotic behavior of the zeta function along the critical line σ = 1/2 by modelling the time t with a random walk. The Cauchy random

13.7 The functional equation and the Lindelöf Hypothesis


walk turns out to be most appropriate because of the smoothness of the Cauchy characteristic function, which also “preserves” the structure of the Riemann zeta function. Let $X_1, X_2, \ldots$ denote an infinite sequence of independent Cauchy distributed random variables (with characteristic function $\phi(t) = e^{-|t|}$); then the time $t$ is modelled by the sequence of partial sums $S_n = X_1 + \cdots + X_n$. In order to understand the behavior of $\zeta(\tfrac{1}{2} + it)$ when $t$ tends to infinity, one may investigate the almost sure asymptotic behavior of the system
\[
\zeta_n := \zeta\big(\tfrac{1}{2} + iS_n\big), \qquad n = 1, 2, \ldots. \tag{13.7.28}
\]
Put for any positive integer $n$,
\[
Z_n = \zeta\big(\tfrac{1}{2} + iS_n\big) - E\,\zeta\big(\tfrac{1}{2} + iS_n\big) = \zeta_n - E\,\zeta_n. \tag{13.7.29}
\]

A complete second-order theory of the system $\{Z_n, n \ge 1\}$ is developed in [Lifshits–Weber: 2006]. The most striking fact is that this system nearly behaves like a system of non-correlated variables, i.e., the variables $Z_n$ are weakly orthogonal. More precisely

13.7.2 Theorem. There exist constants $C$, $C_0$ such that
\[
E\,|Z_n|^2 = \log n + C + o(1), \qquad n \to \infty,
\]
\[
\big| E\, Z_n \overline{Z_m} \big| \le C_0 \max\Big( \frac{1}{n}, \frac{1}{2^{m-n}} \Big), \qquad \text{for } m > n + 1.
\]
The explicit value of $C$ is
\[
C = \gamma - 2 + 2\int_0^1 \varphi(\alpha)\,d\alpha + 2\int_1^{\infty} \Big( \varphi(\alpha) - \frac{1}{2\alpha} \Big)\,d\alpha,
\]
where $\gamma$ is the Euler constant and $\varphi(\alpha) = \dfrac{\alpha e^{\alpha} - 2e^{\alpha} + \alpha + 2}{2\alpha^2 (e^{\alpha} - 1)}$.

Combining then Theorem 13.7.2 with Theorem 9.3.11 also allows us to prove the following theorem [Lifshits–Weber: 2006], which displays a rather slow growth of the Riemann zeta function on the critical line, when sampled by the Cauchy random walk.

13.7.3 Theorem. For any real $b > 2$,
\[
\lim_{n \to \infty} \frac{\sum_{k=1}^{n} \zeta(\tfrac{1}{2} + iS_k) - n}{n^{1/2} (\log n)^b} \stackrel{\text{a.s.}}{=} 0,
\]
and
\[
\Big\| \sup_{n \ge 1} \frac{\big| \sum_{k=1}^{n} \zeta(\tfrac{1}{2} + iS_k) - n \big|}{n^{1/2} (\log n)^b} \Big\|_2 < \infty.
\]
The notation a.s. means that the corresponding property holds with probability one. Very likely, results similar to the above theorems are valid when sampling with a large class of random walks with discrete or continuous steps. However, the necessary moment expressions we obtain for the Cauchy distribution are by far more explicit than in other cases, e.g., for Gaussian or Bernoulli distributions. The approach is based on the following classical approximation result (Theorem 4.11 in [Titchmarsh: 1951]): letting, as usual, $s = \sigma + it$, we have
\[
\zeta(s) = \sum_{n \le x} \frac{1}{n^s} - \frac{x^{1-s}}{1-s} + O(x^{-\sigma}), \tag{13.7.30}
\]

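uniformly for $\sigma \ge \sigma_0 > 0$, $|t| \le T_x := 2\pi x/K$, where $K$ is any constant $> 1$.

13.7.4 Remark. Clearly (13.7.24) is equivalent to
\[
\int_{T/2}^{T} \big| \zeta(1/2 + it) \big|^{2k}\,dt = O_\varepsilon(T^{1+\varepsilon}), \qquad k = 1, 2, \ldots.
\]
This is also equivalent to
\[
\int_{n/2}^{n} \Big| \sum_{m=1}^{n} m^{-s} \Big|^{2k}\,dt = O(n^{1+\varepsilon}), \qquad k = 1, 2, \ldots.
\]
Indeed, apply (13.7.30) with $\sigma_0 = 1/2$. Minkowski’s inequality yields
\[
\bigg| \Big( \frac{2}{T_n} \int_{T_n/2}^{T_n} \big| \zeta(1/2 + it) \big|^{2k}\,dt \Big)^{1/2k} - \Big( \frac{2}{T_n} \int_{T_n/2}^{T_n} \Big| \sum_{m=1}^{n} m^{-(1/2+it)} - \frac{n^{1/2-it}}{1/2 - it} \Big|^{2k}\,dt \Big)^{1/2k} \bigg| \le C n^{-1/2}.
\]
As
\[
\int_{T_n/2}^{T_n} \Big| \frac{n^{1/2-it}}{1/2 - it} \Big|^{2k}\,dt \le C n^k \int_{T_n/2}^{\infty} \frac{dt}{(1/4 + t^2)^k} \le C_k n^{1-k},
\]
we get
\[
\bigg| \Big( \frac{2}{T_n} \int_{T_n/2}^{T_n} \big| \zeta(\tfrac{1}{2} + it) \big|^{2k}\,dt \Big)^{1/2k} - \Big( \frac{2}{T_n} \int_{T_n/2}^{T_n} \Big| \sum_{m=1}^{n} m^{-(1/2+it)} \Big|^{2k}\,dt \Big)^{1/2k} \bigg| \le C_k n^{-1/2},
\]
which implies the claimed equivalence.

The Lindelöf Hypothesis and Fourier inversion formula. It is striking to observe that (13.7.26) is “almost” a Fourier inversion formula. If $\nu$ is a distribution function on $\mathbb{R}$ and $\hat\nu(t) = \int_{\mathbb{R}} e^{itx}\,\nu(dx)$ denotes its characteristic function, then (see remarks on “Continuous time and Fourier inversion formula” in Section 1.4)
\[
\lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} e^{-itx_0}\,\hat\nu(t)\,dt = \nu\{x_0\}. \tag{13.7.31}
\]
From this result also follows that
\[
\lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} |\hat\nu(t)|^2\,dt = \sum_{x \in \mathbb{R}} \nu(\{x\})^2.
\]
And actually, for any positive integer $N$,
\[
\lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} |\hat\nu(t)|^{2N}\,dt = \sum_{x \in \mathbb{R}} \nu^{*N}(\{x\})^2. \tag{13.7.32}
\]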
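For a purely atomic measure the mean-square identity preceding (13.7.32) can be checked in closed form, since $\frac{1}{2T}\int_{-T}^{T} e^{it(x_j - x_l)}\,dt = \frac{\sin((x_j - x_l)T)}{(x_j - x_l)T}$. The sketch below is a hedged illustration: the atoms $-\log k$ with weights $k^{-\sigma}$, the values `sigma = 0.75`, `n = 6`, `T = 1e7` and the helper name `mean_square` are assumptions chosen for the example.

```python
from math import log, sin

def mean_square(points, weights, T):
    """Exact value of (1/2T) * integral_{-T}^{T} |sum_j w_j exp(i t x_j)|^2 dt."""
    total = 0.0
    for xj, wj in zip(points, weights):
        for xl, wl in zip(points, weights):
            gap = (xj - xl) * T
            # Diagonal terms contribute w_j^2; off-diagonal terms decay like 1/T.
            total += wj * wl * (1.0 if gap == 0.0 else sin(gap) / gap)
    return total

sigma, n = 0.75, 6
points = [-log(k) for k in range(1, n + 1)]        # atoms at -log k
weights = [k ** -sigma for k in range(1, n + 1)]   # masses k^(-sigma)
limit = sum(w * w for w in weights)                # sum of squared atom masses

print(mean_square(points, weights, 1e7), limit)
```

The off-diagonal terms are of size $O(1/T)$, so the mean square approaches the sum of squared masses as $T$ grows.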

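Apply (13.7.32) to the measure $\mu = \sum_{k=1}^{n} \frac{1}{k^{\sigma}}\,\delta_{\{-\log k\}}$, where $\delta_{\{x\}}$ is the Dirac measure at point $x$, and $\sigma > 1/2$. Then
\[
\hat\mu(t) = \sum_{k=1}^{n} k^{-(\sigma + it)},
\]
and so
\[
\lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} \Big| \sum_{k=1}^{n} \frac{1}{k^{\sigma + it}} \Big|^{2N}\,dt
= \sum_{x \in \mathbb{R}} \Big( \sum_{k_1 \cdots k_N = e^x} \frac{1}{k_1^{\sigma} \cdots k_N^{\sigma}} \Big)^2
= \sum_{\substack{Y = e^x \\ Y \in \mathbb{N}}} \frac{\#\{Y = \prod_{i=1}^{N} k_i : k_i \le n\}^2}{Y^{2\sigma}}
= \sum_{Y \in \mathbb{N}} \frac{d_{N,n}^2(Y)}{Y^{2\sigma}}, \tag{13.7.33}
\]
where $d_{N,n}(Y)$ denotes the number of representations of $Y$ as a product of $N$ factors less than or equal to $n$. And clearly
\[
\lim_{n \to \infty} \sum_{Y \in \mathbb{N}} \frac{d_{N,n}^2(Y)}{Y^{2\sigma}} = \sum_{m=1}^{\infty} \frac{d_N^2(m)}{m^{2\sigma}}. \tag{13.7.34}
\]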

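Introduce according to (13.7.30) the measures
\[
\mu_n = \sum_{k=1}^{n} \frac{1}{k^{\sigma}}\,\delta_{\{-\log k\}}, \qquad
\nu_n = n^{1-\sigma}\,\delta_{\{-\log n\}}, \qquad
m(dx) = \chi_{[0,\infty)}(x)\, e^{-(1-\sigma)x}\,dx, \tag{13.7.35}
\]
where $\delta_{\{x\}}$ is the Dirac measure at point $x$, and $\sigma > 1/2$. Then
\[
\hat\mu_n(t) = \sum_{k=1}^{n} k^{-s}, \qquad
\hat\nu_n(t) = n^{1-s}, \qquad
\hat m(t) = \frac{1}{1-s}. \tag{13.7.36}
\]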

Therefore
\[
\sum_{k \le n} \frac{1}{k^s} - \frac{n^{1-s}}{1-s} = \hat\mu_n(t) - \hat\nu_n(t) \cdot \hat m(t) := \hat m_n(t) \quad \text{with } m_n = \mu_n - \nu_n * m. \tag{13.7.37}
\]
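The quantity in (13.7.37) is the approximant of (13.7.30). As a rough numerical illustration, the sketch below evaluates it at real arguments where $\zeta$ is known in closed form ($\zeta(2) = \pi^2/6$, $\zeta(4) = \pi^4/90$). This is an assumption-laden check rather than part of the argument: it takes $t = 0$ and real $s > 1$, which lies outside the range $\sigma < 1$ needed for the measure $m$ but inside the range of validity of (13.7.30); the helper name `zeta_approx` is hypothetical.

```python
from math import fsum, pi

def zeta_approx(s, x):
    """sum_{n<=x} n^(-s) - x^(1-s)/(1-s): the main terms of (13.7.30), for real s != 1 and integer x."""
    return fsum(n ** -s for n in range(1, x + 1)) - x ** (1 - s) / (1 - s)

x = 1000
err2 = abs(zeta_approx(2.0, x) - pi ** 2 / 6)   # compare with zeta(2)
err4 = abs(zeta_approx(4.0, x) - pi ** 4 / 90)  # compare with zeta(4)

print(err2, x ** -2.0)
print(err4, x ** -4.0)
```

In both cases the error stays below $x^{-\sigma}$, in line with the remainder term $O(x^{-\sigma})$.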

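We introduce the semi-norms
\[
\|f\|_{T,M} = \Big( \frac{1}{2T} \int_{-T}^{T} |f(t)|^M\,dt \Big)^{1/M}, \qquad \|f\|_M = \limsup_{T \to \infty} \|f\|_{T,M}.
\]
Choose in what follows $M = 2N$, $N$ some fixed integer, and write $T_n = 2\pi n/K$, $n \ge 1$. We have
\[
\sup_{T \le T_n} \Big| \|\zeta(\sigma + i\,\cdot)\|_{T,2N} - \|\hat m_n\|_{T,2N} \Big| \le \sup_{T \le T_n} \|\zeta(\sigma + i\,\cdot) - \hat m_n\|_{T,2N} \le C n^{-\sigma}. \tag{13.7.38}
\]
From (13.7.38) follows that
\[
\limsup_{n \to \infty}\, \sup_{T \le T_n} \Big| \|\zeta(\sigma + i\,\cdot)\|_{T,2N} - \|\hat m_n\|_{T,2N} \Big| = 0. \tag{13.7.39}
\]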

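Let $\varepsilon > 0$, and choose $\ell_0$ so that
\[
\sum_{k=\ell_0+1}^{\infty} \frac{1}{k^{2\sigma}} \le \varepsilon. \tag{13.7.40}
\]
Write for $n_2 \ge n_1$, $\mu_{n_1,n_2} = \sum_{k=n_1+1}^{n_2} \frac{1}{k^{\sigma}}\,\delta_{\{-\log k\}}$, so that $\mu_n = \mu_{0,\ell_0} + \mu_{\ell_0,n}$, and $\hat\mu_n(t) = \hat\mu_{0,\ell_0} + \hat\mu_{\ell_0,n}$. Write also
\[
m_{\ell_0,n} = \mu_{\ell_0,n} - \nu_n * m.
\]
By (13.7.37),
\[
m_n = \mu_{0,\ell_0} + m_{\ell_0,n} \qquad \text{and} \qquad \hat m_n(t) = \hat\mu_{0,\ell_0}(t) + \hat m_{\ell_0,n}(t).
\]
In view of (13.7.33),
\[
\|\hat\mu_{0,\ell_0}\|_{T,2N}^{2N} = \frac{1}{2T} \int_{-T}^{T} \Big| \sum_{k=1}^{\ell_0} \frac{1}{k^{\sigma + it}} \Big|^{2N}\,dt \to \sum_{Y \in \mathbb{N}} \frac{d_{N,\ell_0}^2(Y)}{Y^{2\sigma}}. \tag{13.7.41}
\]
Choosing then $\ell_0$ large enough (depending on $\varepsilon$ and $\sigma$ only), so that $\Big| \sum_{Y \in \mathbb{N}} \frac{d_{N,\ell_0}^2(Y)}{Y^{2\sigma}} - \sum_{m=1}^{\infty} \frac{d_N^2(m)}{m^{2\sigma}} \Big| \le \varepsilon/2$, we get for all $T$ large enough, say $T \ge T_\varepsilon$,
\[
\Big| \|\hat\mu_{0,\ell_0}\|_{T,2N}^{2N} - \sum_{m=1}^{\infty} \frac{d_N^2(m)}{m^{2\sigma}} \Big|
\le \Big| \frac{1}{2T} \int_{-T}^{T} \Big| \sum_{k=1}^{\ell_0} \frac{1}{k^{\sigma + it}} \Big|^{2N}\,dt - \sum_{Y \in \mathbb{N}} \frac{d_{N,\ell_0}^2(Y)}{Y^{2\sigma}} \Big|
+ \Big| \sum_{Y \in \mathbb{N}} \frac{d_{N,\ell_0}^2(Y)}{Y^{2\sigma}} - \sum_{m=1}^{\infty} \frac{d_N^2(m)}{m^{2\sigma}} \Big|
\le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \tag{13.7.42}
\]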

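This is in particular true for $T \ge T_{n-1}$, assuming $n$ large enough. By the triangle inequality,
\[
\Big| \|\hat m_n\|_{T,2N} - \|\hat\mu_{0,\ell_0}\|_{T,2N} \Big| \le \|\hat m_{\ell_0,n}\|_{T,2N}.
\]
And so for all $n$ large enough and $T_{n-1} \le T \le T_n$,
\[
\Big| \|\hat m_n\|_{T,2N} - \Big( \sum_{m=1}^{\infty} \frac{d_N^2(m)}{m^{2\sigma}} \Big)^{1/2N} \Big| \le \|\hat m_{\ell_0,n}\|_{T,2N} + \varepsilon. \tag{13.7.43}
\]
Therefore by (13.7.39), for all $n$ sufficiently large, say $n \ge n_\varepsilon$, and for all $T$ such that $T_{n-1} \le T \le T_n$,
\[
\Big| \|\zeta(\sigma + i\,\cdot)\|_{T,2N} - \Big( \sum_{m=1}^{\infty} \frac{d_N^2(m)}{m^{2\sigma}} \Big)^{1/2N} \Big| \le 2\varepsilon + \|\hat m_{\ell_0,n}\|_{T,2N}. \tag{13.7.44}
\]
We may further (and do) assume $n_\varepsilon > \ell_0$. In order to prove (13.7.26), it thus suffices to evaluate for $n$ large (at least $n \ge n_\varepsilon$) and $T_{n-1} \le T \le T_n$,
\[
\|\hat m_{\ell_0,n}\|_{T,2N}^{2N} = \frac{1}{2T} \int_{-T}^{T} |\hat m_{\ell_0,n}(t)|^{2N}\,dt \tag{13.7.45}
\]
and in turn to establish that
\[
\sup_{T_{n-1} \le T \le T_n} \frac{1}{T} \int_{-T}^{T} |\hat m_{\ell_0,n}(t)|^{2N}\,dt
\]
is small enough for large $n$.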

13.8 An extremal divisor case

Let $(\Omega, \mathcal{A}, P)$ be some probability space on which a Rademacher sequence $\varepsilon = \{\varepsilon_i, i \ge 1\}$ is defined. Consider the sequence of partial sums $S_N = \varepsilon_1 + \cdots + \varepsilon_N$, $N = 1, 2, \ldots$. Put
\[
N_1 = 1, \qquad N_k = \inf\{ N > N_{k-1} : N \text{ even and } N \mid S_{N^2} \} \qquad (k > 1). \tag{13.8.1}
\]
That this random sequence is well defined can be deduced from our result below. Before stating it, we observe that, for no sequence of positive reals $\{a_k, k \ge 1\}$ tending to infinity, can the ratios $N_k/a_k$ converge in probability to a random variable that is positive almost surely. Indeed ([Billingsley: 1999], p. 147) this would imply the validity of a randomly selected central limit theorem, namely
\[
\frac{S_{N_k}}{\sqrt{N_k}} \stackrel{D}{\Longrightarrow} N(0,1),
\]
which is naturally impossible. The object of this section is to prove the following result, exhibiting an exponential growth of the sequence $(N_k)_k$.

13.8.1 Theorem. Put $s = 2\sum_{j \in \mathbb{Z}} e^{-2\pi^2 j^2}$. For any $\tau > 7/8$,
\[
\log N_k = \frac{k}{s} + O(k^{\tau}) \quad \text{almost surely.}
\]
The result extends to Bernoulli sequences $\beta = \{\beta_i, i \ge 1\}$. Write $B_N = \beta_1 + \cdots + \beta_N$, $N = 1, 2, \ldots$. Since $\varepsilon_i \stackrel{D}{=} 2\beta_i - 1$, Theorem 13.8.1 gives the same estimate for the sequence
\[
M_1 = 1, \qquad M_k = \inf\{ M > M_{k-1} : M \text{ even and } M \mid 2B_{M^2} \} \qquad (k > 1).
\]

The proof will rely upon several intermediate results which are of independent interest. We start with a first lemma.

13.8.2 Lemma. For $N$ even,
\[
N\, P\{N \mid S_{N^2}\} = s + O\Big( \frac{\log^{5/2} N}{N} \Big).
\]
Proof. We use the formula $N \delta_{N \mid S_{N^2}} = \sum_{j=0}^{N-1} e^{2i\pi j S_{N^2}/N}$, which by direct integration produces
\[
N\, P\{N \mid S_{N^2}\} = \sum_{j=0}^{N-1} \cos^{N^2}\Big( \frac{2\pi j}{N} \Big). \tag{13.8.2}
\]
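Identity (13.8.2) and the limiting constant $s$ can be checked numerically for small even $N$. The following Python sketch is an illustration only: it uses the representation $S_{N^2} = 2B - N^2$ with $B$ binomial, which is an assumption made for the example, and the helper names `prob_divides` and `cosine_sum` are hypothetical.

```python
from fractions import Fraction
from math import comb, cos, exp, pi

def prob_divides(n):
    """Exact P{ n | S_{n^2} } for a +-1 walk, writing S = 2B - n^2 with B ~ Binomial(n^2, 1/2)."""
    steps = n * n
    hits = sum(comb(steps, b) for b in range(steps + 1) if (2 * b - steps) % n == 0)
    return Fraction(hits, 2 ** steps)

def cosine_sum(n):
    """Right-hand side of (13.8.2): sum over j of cos(2*pi*j/n) raised to the power n^2."""
    return sum(cos(2 * pi * j / n) ** (n * n) for j in range(n))

# The constant s of Theorem 13.8.1, truncated to |j| <= 5 (further terms are negligible).
s = 2 * sum(exp(-2 * pi * pi * j * j) for j in range(-5, 6))

for n in (4, 6, 10):
    print(n, float(n * prob_divides(n)), cosine_sum(n))
print(cosine_sum(50), s)
```

Only the indices $j$ near $0$ and near $N/2$ contribute noticeably to the cosine sum, which is why the value is already very close to $s$ for moderate even $N$.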

N −1

N 2 We evaluate the trigonometric sum j =0 cos 2πj . Let a > a > 3 be fixed and N put for positive integers N, √ sin ϕN /2 2a log N ϕN = , τN = . N ϕN /2 We assume N sufficiently large for τN to be greater than (a /a)1/2 . Consider the sector A0N =] − ϕN , ϕN [ ∪ ]π − ϕN , π + ϕN [. Put

$√ % $√ % N 2πj 2a log N 2a log N AN = + 1 or 0 ≤ | − j | ≤ +1 . , 0 ≤ |j | ≤ N 2π 2 2π 2πj Since cos 2πj / AN , we have N ≤ cos ϕN for N ∈

/ N 0≤j 2a log N

= −4s

√

e

+ 16

e−2π

√

2 k2

− s2

2π k> 2a log M

−2π 2 j 2

2πj > 2a log N

− 4s

√

e−2π

2 k2

2π k> 2a log M

e

−2π 2 j 2 −2π 2 k 2

.

√ 2πj >√ 2a log N 2πk> 2a log M

And, by using estimate (13.8.6),

2 2 2 2 e−2π j +2 4 e−2π k +2 −s 2 ≤ C N −a +M −a +N −a M −a . 4 2πj N

∈I1N

2πk M M ∈I1


Write U =

$

j k cos 2π + N M

2πj N ∈AN 2π k ∈A M M

%M 2 $

2πj cos N

Since a > 3, we have obtained C (log N)1/2 (log M)1/2 5/2 U≤ N )1/2 3/4 C (log M) M(log +M 2 N (log N )

%N 2 −M 2

− s . 2

(Case 1), (Case 2).

(13.8.31)

(13.8.32)

(B) Now, by the first step of the proof of Lemma 13.8.2,

M 2 N 2 −M 2 cos 2π( j + k ) cos 2πj ≤ N −(a −2) .

2πj N

N

∈A / N

M

N

(13.8.33)

(C) Finally, we estimate the sum N 2 −M 2 M 2 cos 2π j + k cos 2πj .

2πj N

N

∈AN , 2πk / M M ∈A

By considering, successively, the cases

2π

2πj N

M

N

∈ IiN , i = 1, 2, 3, 4, one sees that the sets

2π k 2πj k j , + ∈ / AM , ∈ IiN , N M M N

i = 2, 3, 4

are obtained from the set

j 2π k 2πj k 2π , + ∈ / AM , ∈ I1N , N M M N

by the transformations I → I π and I → I s defined in the proof of Lemma 13.8.2. Using now the fact that AcM is invariant under the transformation I → I s finally allows us to write

cos 2π

2πj N ∈AN 2π k ∈A M / M

=8

k M j + N M

2πj N N ∈I1 ϕM < 2πk M
