Probabilistic Diophantine Approximation: Randomness in Lattice Point Counting

Springer Monographs in Mathematics József Beck Probabilistic Diophantine Approximation Randomness in Lattice Point Cou...

Author: József Beck | 2014

Springer Monographs in Mathematics

József Beck

Probabilistic Diophantine Approximation Randomness in Lattice Point Counting

More information about this series at http://www.springer.com/series/3733

József Beck Department of Mathematics Rutgers University Piscataway, NJ, USA

ISSN 1439-7382 ISSN 2196-9922 (electronic) ISBN 978-3-319-10740-0 ISBN 978-3-319-10741-7 (eBook) DOI 10.1007/978-3-319-10741-7 Springer Cham Heidelberg New York Dordrecht London Library of Congress Control Number: 2014950069 © Springer International Publishing Switzerland 2014 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. 
The publisher makes no warranty, express or implied, with respect to the material contained herein. Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)

Preface

We could choose “randomness of $\sqrt{2}$” as an alternative subtitle of the book. Indeed, the book connects two seemingly unrelated concepts, namely, (1) $\sqrt{2}$, symbolizing the class of quadratic irrationals, including the theory of quadratic number fields in general, and (2) randomness. These two concepts, representing algebra (the science of order and structure) and probability theory (the science of disorder), are the endpoints of a long chain of relations/implications. The periodicity of the continued fraction of $\sqrt{2}$ (or any other quadratic irrational) means self-similarity. Self-similarity leads to independence (e.g., via Markov chains; here we refer to the well-known probabilistic concept), and independence ensures (nearly) perfect randomness. In particular, we prove some unexpected probabilistic results:

$$\text{quadratic irrational} \Longrightarrow \text{periodic continued fraction} \Longrightarrow \text{self-similarity} \Longrightarrow \text{independence (or independence via Markov chains)} \Longrightarrow \text{randomness: central limit theorem and the law of the iterated logarithm}$$

This diagram may summarize the book in a nutshell.

The reason why we decided not to choose “randomness of $\sqrt{2}$” to be the subtitle is that it would perhaps mislead the reader. The reader would probably expect us to prove the apparent randomness of the digit distribution in the usual decimal expansion

$$\sqrt{2} = 1.414213562373095048801688724209698078569671875376948\ldots$$

Unfortunately, we cannot make any progress with this famous old problem; it remains open and hopeless (to read more about this and other related famous open problems the reader may jump ahead right now to Sect. 2.5: A Giant Leap in number theory). What we study instead is the “irrational rotation” by any quadratic irrational, say, by $\sqrt{2}$. We study the global and local behavior of the irrational rotation from a probabilistic viewpoint; this explains the title of the book, “probabilistic diophantine approximation.”

Consider the linear sequence $n\alpha$, $n = 1, 2, 3, \ldots$: it is perfectly regular, it is an infinite arithmetic progression. Even if we take it modulo one, and $\alpha$ is an arbitrary (but fixed) irrational, the sequence $n\alpha$ (mod 1), called irrational rotation, still features a lot of regularities. For example, (1) we have infinitely many Bounded Error Intervals, (2) we have infinitely many Bounded Error Initial Segments, (3) every initial segment has at most three different “gaps,” and (4) there is an extremely strong restriction on the induced permutations. These are all strong “anti-randomness” type regularity properties of the irrational rotation $n\alpha$ (mod 1), $n = 1, 2, 3, \ldots$ (properties (1)–(4) will be explained in depth in Sect. 1.1). These regularities show that the irrational rotation is highly non-random in many respects. This is why the irrational rotation (with an underlying nested structure) is also called a quasi-periodic sequence.

Also we know from number theory that the key to understanding the irrational rotation $n\alpha$ (mod 1), $n = 1, 2, 3, \ldots$, is to know the continued fraction for $\alpha$. The quadratic irrationals have the most regular continued fraction: the class of quadratic irrationals is characterized by the property of (ultimately) periodic continued fraction, for example,

$$\sqrt{2} = 1 + \cfrac{1}{2 + \cfrac{1}{2 + \cdots}} = [1; 2, 2, 2, \ldots] = [1; \overline{2}].$$

Despite these regularities of the irrational rotation, our first main result exhibits “full-blown randomness.” For example, how much time does the irrational rotation $n\alpha$ (mod 1), $n = 1, 2, 3, \ldots$, spend in the first half $[0, 1/2)$ of the unit interval $[0, 1)$? Well, we prove a central limit theorem for every quadratic irrational $\alpha$ (e.g., $\alpha = \sqrt{2}$). More precisely, let $\alpha$ be an arbitrary real root of a quadratic equation with integer coefficients, say, $\alpha = \sqrt{2}$. Given any rational number $0 < x < 1$ (say, $x = 1/2$) and any positive integer $n$, we count the number of elements of the sequence $\alpha, 2\alpha, 3\alpha, \ldots, n\alpha$ modulo 1 that fall into the subinterval $[0, x)$. We prove that this counting number satisfies a central limit theorem in the following sense. First, we subtract the “expected number” $nx$ from the counting number and study the typical fluctuation of this difference as $n$ runs in a long interval $1 \le n \le N$. Depending on $\alpha$ and $x$, we may need an extra additive correction of constant times logarithm of $N$; furthermore, what we always need is a multiplicative correction: division by (another) constant times square root of logarithm of $N$. If $N$ is large, the distribution of this renormalized counting number, as $n$ runs in $1 \le n \le N$, is very close to the standard normal distribution (bell-shaped curve), and the corresponding error term tends to zero as $N$ tends to infinity. This is one of the main results of the book (see Theorem 1.1). The proof is rather complicated and long; it has many interesting detours and by-products. For example, the exact determination of the key constant factors (in the additive and multiplicative norming), which depend on $\alpha$ and $x$, requires surprisingly deep algebraic tools such as Dedekind sums, the class number of quadratic fields, and generalized class number formulas.

Perhaps the reader is wondering: why are the quadratic irrationals (like $\sqrt{2}$) special and worth spending hundreds of pages on? The answer is that the quadratic irrationals play a central role in diophantine approximation for several reasons. They are the “most anti-rational real numbers” (officially called badly approximable numbers), and at the same time they represent the most uniformly distributed irrational rotations. A third reason is Pell’s equation $x^2 - dy^2 = \pm 1$ ($d \ge 2$ is square-free), which is of course closely related to $\sqrt{d}$. Also, and this is the message of our book, the best way to understand the local and global randomness of the irrational rotation is to focus on the class of quadratic irrationals. This class gives the most elegant and striking results with the simplest proofs. Some of these results extend to almost every real number, some of them do not extend. We will elaborate on each one of these issues later.

The quadratic irrational rotation demonstrates the coexistence of order and randomness; a novelty here is the much smaller norming factor $\sqrt{\log n}$ (instead of the usual $\sqrt{n}$). The $\sqrt{\log n}$ comes from the fact that the underlying problem is about “generalized digit sums” with the surprising twist that the base of the number system is an irrational number (namely, the fundamental unit, e.g., it is $1 + \sqrt{2}$ for $\alpha = \sqrt{2}$). Also $\sqrt{\log n}$ represents the minimum; it corresponds to the most uniformly distributed irrational rotations.

Our second main subject is motivated by the classical Pell’s equation.
Finding the integral solutions of (say) $x^2 - 2y^2 = \pm 1$ means counting lattice points in a long and narrow tilted hyperbolic region that we call a “hyperbolic needle.” Of course, we basically know everything about Pell’s equation (this is why Pell’s equation is included in every undergraduate number theory course), but what happens if we translate the “hyperbolic needle”? What is the asymptotic number of lattice points inside (note that the area is infinite)? Well, for a typical translated copy of the “hyperbolic needle,” which corresponds to an “inhomogeneous Pell inequality,” we prove a “law of the iterated logarithm,” which describes the asymptotic number of integral solutions in a strikingly precise way. In other words, the classical Circle Problem of Gauss is wide open, but here we can solve an analogous Hyperbola Problem. This result is a good illustration of the full power of the probabilistic viewpoint in number theory.

In general, consider the inhomogeneous diophantine inequality

$$\|n\alpha - \beta\| < \frac{c}{n}, \tag{0.1}$$

where $\alpha$ is irrational, $\beta$ and $c > 0$ are arbitrary real numbers, and $n$ is the variable. An old result of Kronecker states that inequality (0.1) has infinitely many integral solutions $n$ if $c = 3$; this is how Kronecker proved that the irrational rotation $n\alpha$ (mod 1) is dense in the unit interval. What can we say about the number of solutions $n$ of inequality (0.1)? Consider the special case $\alpha = \sqrt{2}$ of (0.1):

$$\|n\sqrt{2} - \beta\| < \frac{c}{n}, \tag{0.2}$$

and let $F(\sqrt{2}; \beta; c; N)$ denote the number of integral solutions $n$ of inequality (0.2) satisfying $1 \le n \le N$; this counting function is about the local behavior of the irrational rotation $n\sqrt{2}$ (mod 1). We can describe the true order of $F(\sqrt{2}; \beta; c; N)$, as $N \to \infty$, in an extremely precise way for almost every $\beta$. We prove that the number of solutions $F(\sqrt{2}; \beta; c; e^n)$ of (0.2) oscillates between the sharp bounds ($\varepsilon > 0$)

$$2cn - \sigma\sqrt{n}\sqrt{(2 + \varepsilon)\log\log n} < F(\sqrt{2}; \beta; c; e^n) < 2cn + \sigma\sqrt{n}\sqrt{(2 + \varepsilon)\log\log n} \tag{0.3}$$

as $n \to \infty$ for almost every $\beta$; see Theorem 5.6 in Part II of the book. Note that $\sigma = \sigma(\sqrt{2}, c) > 0$ is a positive constant, and (0.3) fails with $2 - \varepsilon$ instead of $2 + \varepsilon$. (The reason why in (0.3) we switched from $N$ to the exponentially sparse sequence $e^n$ is that the counting function $F(\sqrt{2}; \beta; c; N)$ is slowly changing in the sense that, as $N$ runs in $e^n < N < e^{n+1}$, $F(\sqrt{2}; \beta; c; N)$ makes only an additive constant change.)

Observe that inequality (0.2) is (basically) equivalent to the inhomogeneous Pell inequality

$$-c' \le (x + \beta)^2 - 2y^2 \le c', \tag{0.4}$$

where $c' = 2\sqrt{2}c$. Notice that inequality (0.4) determines a long and narrow tilted hyperbola region (“hyperbolic needle”). The message of (0.3) is, roughly speaking, that for almost all translations, the number of lattice points in long and narrow hyperbola segments of any fixed quadratic irrational slope equals the area plus an error term which is never much larger than the square root of the area.

Notice that (0.3) is a perfect analog of Khinchin’s law of the iterated logarithm in probability theory (describing the maximum fluctuations of the digit sums of a typical real number $\beta$; the factor $\log\log n$ in (0.3) explains the name “iterated logarithm”). We also have an analogous central limit theorem: the renormalized counting function

$$\frac{F(\sqrt{2}; \beta; c; e^n) - 2cn}{\sigma\sqrt{n}}, \qquad 0 \le \beta < 1,$$

has a standard normal limit distribution with error term $O(n^{-1/4}(\log n)^3)$ as $n \to \infty$ [$\sigma = \sigma(\sqrt{2}, c) > 0$ is the same positive constant as in (0.3)]. Formally,

$$\max_{\lambda} \left| \text{measure}\left\{ \beta \in [0, 1) : F(\sqrt{2}; \beta; c; e^n) - 2cn \le \lambda\sigma\sqrt{n} \right\} - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\lambda} e^{-u^2/2}\, du \right| = O\left(n^{-1/4}(\log n)^3\right), \tag{0.5}$$

where the maximum is taken over all $-\infty < \lambda < \infty$ (and of course measure means the one-dimensional Lebesgue measure). The proofs of the innocent-looking results (0.3) and (0.5) are quite difficult (in spite of the fact that most of the arguments are “elementary”). Note that here “independence” comes from a good approximation by modified Rademacher functions.

The book is basically “lattice point counting” in disguise. This explains the subtitle, “randomness in lattice point counting.” The main results are proved by the same scheme: we represent a natural lattice point counting function in the form

$$X_1 + X_2 + X_3 + \ldots + \text{negligible},$$

where $X_1, X_2, X_3, \ldots$ are independent random variables. This way we can directly apply some classical results of probability theory (such as the central limit theorem and the law of the iterated logarithm). We have the following questions: (a) how to construct the independent random variables $X_1, X_2, X_3, \ldots$, (b) how to compute the expectation, and finally (c) how to compute the variance. These are surprisingly difficult questions.

Of course (0.3) and (0.5) extend to all quadratic irrationals. They also extend to some other special numbers for which we know the continued fraction expansion (e.g., $e$, $e^2$, $\sqrt{e}$). Some of the main results about quadratic irrationals (e.g., Theorems 1.1 and 1.2) do not extend to almost every $\alpha$. The reason is that the continued fraction digits (officially called partial quotients) of a typical real number $\alpha$ exhibit a very irregular behavior (see Sect. 6.10). Some other results, including (0.3) and (0.5), do have an analog for almost every $\alpha$. There is, however, a difference: the norming factor $\sqrt{n}$ is replaced by $\sqrt{n \log n}$, and also the error term is much weaker (see Sect. 6.10). The kind of “randomness” we prove in the book requires some knowledge about the continued fraction expansion of the real number $\alpha$. This is why the best way to demonstrate this “randomness” is to study the class of quadratic irrationals.
Unfortunately, we know very little about the continued fraction of algebraic numbers of degree $\ge 3$. This explains why we cannot prove anything about (say) the “randomness of $\sqrt[3]{2}$”; this is why we can prove strong results about the “randomness of $e$,” and can prove nothing about the “randomness of $\pi$.”
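The periodicity that makes quadratic irrationals tractable is easy to observe numerically. The following is a minimal sketch, not taken from the book: it computes the partial quotients of $\sqrt{d}$ with the classical exact integer recurrence for quadratic surds. The function name `sqrt_partial_quotients` is hypothetical, introduced only for this illustration.

```python
from math import isqrt


def sqrt_partial_quotients(d, count):
    """Return the first `count` partial quotients of sqrt(d).

    Works with complete quotients of the form (m + sqrt(d)) / q and
    updates the integers m, q exactly, so no floating point is used.
    Assumes d is a positive integer that is not a perfect square.
    """
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    m, q, a = 0, 1, a0
    digits = [a0]
    while len(digits) < count:
        m = q * a - m            # next numerator offset
        q = (d - m * m) // q     # next denominator (division is exact)
        a = (a0 + m) // q        # next partial quotient
        digits.append(a)
    return digits


if __name__ == "__main__":
    print(sqrt_partial_quotients(2, 8))   # [1, 2, 2, 2, 2, 2, 2, 2]
    print(sqrt_partial_quotients(7, 9))   # [2, 1, 1, 1, 4, 1, 1, 1, 4]
```

The output for $d = 2$ reproduces the expansion $[1; 2, 2, 2, \ldots]$ quoted earlier in the preface; no comparably simple recurrence is known for numbers such as $\sqrt[3]{2}$.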


Besides “randomness,” the other main subject of the book is “Area Principle versus superirregularity” (see Part II, starting with Sect. 5.1).

The traditional meaning of probabilistic diophantine approximation is that it is a collection of results best illustrated by the following classical 0–1 law of Khinchin. If $\psi(n) > 0$ is a nonincreasing sequence, then the diophantine inequality $n\|n\alpha\| < \psi(n)$ has infinitely many integral solutions $n$ for almost every $\alpha$ if $\sum_{n=1}^{\infty} \psi(n) = \infty$; on the other hand, if $\sum_{n=1}^{\infty} \psi(n) < \infty$ then $n\|n\alpha\| < \psi(n)$ has only finitely many integral solutions $n$ for almost every $\alpha$. The subtitle of our book (randomness in lattice point counting) emphasizes the fact that what we do here is very different. We develop a new direction of research on the borderline of probability theory and number theory (including algebraic number theory). We switch the focus from almost every $\alpha$ to special numbers (like quadratic irrationals and $e$), and switch from 0–1 laws to more sophisticated probabilistic results such as the central limit theorem and the law of the iterated logarithm.

One of the challenges we faced in writing this book was that the experts in probability theory tend to know very little algebraic number theory and vice versa: the experts in algebraic number theory do not really care much about probability theory. These two groups, “algebraists” and “probabilists,” are in fact very different kinds of mathematicians with totally different taste and different intuitions. It is hard to find a middle ground satisfying both groups, not to mention the readers who know little probability theory and little algebraic number theory. This forced us to include a lot of examples and “detours.”

The book grew from five partly-survey-partly-research papers of ours written between 1991 and 2000 (see [Be1, Be2, Be3, Be4, Be5]) and four more recent papers starting from 2010 (see [Be7, Be8, Be9, Be10]). In a nutshell, our work is a far-reaching extension of some classical results of Hardy–Littlewood and Ostrowski from the period of 1914–1920. In particular, we added the unifying “probabilistic viewpoint,” which is completely missing from the old papers. It is interesting to point out that for the generation of Hardy, number theory and probability sounded like a strange mismatch. Hardy once dismissively declared: “probability is not a notion of pure mathematics but of philosophy or physics” (Hardy made this statement before Kolmogorov’s axioms “legitimized” probability theory as a well-founded chapter in measure theory).

The main results of the book are Theorems 1.1, 1.2, 5.4, 5.6 (all about “randomness”) and the subject of “Area Principle versus superirregularity” (see, respectively, Proposition 1.18, Theorems 5.7 and 5.3, Sects. 5.4–5.10). Since the two parts of the book are quite independent, the reader may start reading Part II first. We would recommend that the reader start with Sects. 1.1, 1.2, 5.1, and 5.2. An alternative way is to start with Sect. 2.5 and then go to Sects. 1.1, 1.2, 5.1, and 5.2.

The book is more or less self-contained. It should be readable to everybody with some basic knowledge of mathematics (second-year graduate students and up) who is interested in number theory and probability theory.
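The two counting functions behind the main results described earlier in this preface can be explored with a few lines of code. The sketch below is only an illustration under stated assumptions, not the book's method: `visits_first_half` and `pell_inequality_solutions` are hypothetical names, the first uses exact integer arithmetic for $\alpha = \sqrt{2}$ and $x = 1/2$, and the second relies on ordinary floating point, which is adequate only for moderate $N$.

```python
from math import isqrt, sqrt


def visits_first_half(n):
    """Count k in 1..n with {k*sqrt(2)} in [0, 1/2), exactly.

    {x} < 1/2 holds exactly when floor(2x) is even, and
    floor(2*k*sqrt(2)) = floor(sqrt(8*k*k)) = isqrt(8*k*k).
    """
    return sum(1 for k in range(1, n + 1) if isqrt(8 * k * k) % 2 == 0)


def pell_inequality_solutions(beta, c, N):
    """List the n in 1..N with ||n*sqrt(2) - beta|| < c/n.

    ||x|| is the distance from x to the nearest integer.
    Floating point is an assumption of this sketch.
    """
    root2 = sqrt(2.0)
    hits = []
    for n in range(1, N + 1):
        x = n * root2 - beta
        if abs(x - round(x)) < c / n:
            hits.append(n)
    return hits


if __name__ == "__main__":
    n = 7
    # Counting number minus the "expected number" n/2.
    print(visits_first_half(n) - n / 2)              # 0.5
    # Homogeneous case beta = 0 with c = 1, up to N = 10.
    print(pell_inequality_solutions(0.0, 1.0, 10))   # [1, 2, 3, 5, 7]
```

The length of the list returned by `pell_inequality_solutions` is the counting function $F(\sqrt{2}; \beta; c; N)$ of (0.2); by (0.3), for a typical $\beta$ it should be close to $2c \log N$.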

A few words about the notation. We constantly use the (rather standard) notation $\{x\}$, $\|x\|$, $\lfloor x \rfloor$, $\lceil x \rceil$, which mean, in this order, the fractional part of a real number $x$, the distance of $x$ from the nearest integer, and the lower and upper integral parts of $x$ (for example, $x = \{x\} + \lfloor x \rfloor$ and $\|x\| = \min\{\{x\}, 1 - \{x\}\}$). A less well-known notation is

$$((x)) = \begin{cases} \{x\} - \frac{1}{2}, & \text{if } x \text{ is not an integer;} \\ 0, & \text{otherwise} \end{cases}$$

for the “sawtooth function,” which is permanently used in Part I of the book starting from Sect. 2.1.

Throughout, the letter $c$ (or $c_0, c_1, c_2, \ldots$) denotes a generic constant, i.e., a positive constant that we could but do not care to determine. This constant may be absolute, or may depend upon the parameters involved in the theorem in question; it will not generally be the same constant. The well-known $O$-notation involves constants implicitly. It will generally be obvious on what, if any, parameters these constants depend.

The natural (base $e$) logarithm is denoted by $\log$ (instead of $\ln$, which we do not use in the book). We use $\log_2$ for the iterated logarithm, so $\log_2 x = \log\log x$; we use $\log x / \log 2$ to denote the binary (i.e., base 2) logarithm of $x$.

We are sure there are many errors in this first version of the book. We welcome any corrections, suggestions, and comments.

Piscataway, NJ, USA
March 2014

József Beck
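As a small companion to the notation introduced in this preface, the three functions $\{x\}$, $\|x\|$, and $((x))$ can be written out directly. This is only a sketch; the names `frac`, `dist_to_nearest_int`, and `sawtooth` are hypothetical, and exact rational inputs (`fractions.Fraction`) are used in the example so that the identities can be checked without rounding.

```python
from fractions import Fraction
from math import floor


def frac(x):
    """Fractional part {x} = x - floor(x)."""
    return x - floor(x)


def dist_to_nearest_int(x):
    """Distance ||x|| = min({x}, 1 - {x}) from x to the nearest integer."""
    f = frac(x)
    return min(f, 1 - f)


def sawtooth(x):
    """Sawtooth ((x)): {x} - 1/2 if x is not an integer, and 0 otherwise."""
    if x == floor(x):
        return 0
    return frac(x) - Fraction(1, 2)


if __name__ == "__main__":
    x = Fraction(7, 4)
    print(frac(x), dist_to_nearest_int(x), sawtooth(x))  # 3/4 1/4 1/4
    print(sawtooth(3))                                   # 0
```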

Contents

Part I Global Aspects: Randomness of the Irrational Rotation

1 What Is “Probabilistic” Diophantine Approximation?
1.1 The Giant Leap in Uniform Distribution
1.1.1 From Quasi-Periodicity to Randomness
1.1.2 Summary in a Nutshell
1.2 Randomness in Lattice Point Counting
1.2.1 A Key Tool: Ostrowski’s Explicit Formula
1.2.2 Counting Lattice Points in General
1.3 First Warm-Up: Van der Corput Sequence – When Independence Is Given
1.3.1 Digit Sums and Generalized Digit Sums
1.3.2 A Decomposition Trick
1.3.3 Concluding Remark
1.4 Second Warm-Up: Markov Chains and the Area Principle
1.4.1 Statistical Independence and Markov Chains
1.4.2 Long Runs of Heads
1.5 The Golden Ratio and Markov Chains: The Simplest Case of Theorem 1.2
1.5.1 Constructing the Underlying (Homogeneous) Markov Chain
1.5.2 How to Approximate with a Sum of Independent Random Variables
1.5.3 Solving the Parity Problem
1.5.4 Concluding Remarks

2 Expectation, and Its Connection with Quadratic Fields
2.1 Computing the Expectation in General (I)
2.1.1 An Important Detour: How to Guess Proposition 2.1?
2.1.2 Quadratic Fields in a Nutshell
2.1.3 Another Detour: Formulating a “Positivity Conjecture”
2.1.4 Proposition 2.1 and Some Works of Hardy and Littlewood
2.2 Computing the Expectation in General (II)
2.2.1 The Expectation in Theorem 1.1
2.2.2 An Analog of Proposition 2.1
2.2.3 Periodicity in Proposition 2.9
2.3 Fourier Series and a Problem of Hardy and Littlewood (I)
2.3.1 Badly Approximable Numbers
2.3.2 The Hardy–Littlewood Series
2.3.3 Doubling and Halving in Continued Fractions
2.3.4 A Geometric Interpretation
2.4 Fourier Series and a Problem of Hardy and Littlewood (II)
2.5 A Detour: The Giant Leap in Number Theory
2.5.1 Looking at the “Big Picture”
2.6 Connection with Quadratic Fields (I)
2.6.1 A Detour: Another Class Number Formula
2.6.2 How to Compute the Class Number in General: The Complex Case

3 Variance, and Its Connection with Quadratic Fields
3.1 Computing the Variance
3.1.1 Guiding Intuition
3.1.2 An Alternative Form of the Guiding Intuition
3.2 Connection with Quadratic Fields (II)
3.2.1 A Convenient Special Case: When the Class Number Is One
3.2.2 The Class Number for Real Quadratic Fields: Illustrations
3.2.3 The Dedekind’s Zeta Function at s=2: A Formula Involving Characters
3.2.4 An Alternative Formula Due to Siegel: Proposition 3.7
3.3 Connection with Quadratic Fields (III)
3.3.1 The General Case: Computing the Variance for an Arbitrary Quadratic Irrational
3.3.2 Computing the Variance in Theorem 1.1: A Special Case
3.3.3 Computing the Variance in Theorem 1.1: The General Case
3.3.4 The Case of Symmetric Intervals

4 Proving Randomness
4.1 Completing the Proof of Theorem 1.2
4.1.1 Renewal Versus Self-Similarity
4.1.2 Ergodic Markov Chains: Exponentially Fast Convergence to the Stationary Distribution
4.2 How to Use Lemma 4.2 to Find the Analog of (1.223) in General?
4.3 Completing the Proof of Theorem 1.1
4.4 The Fourier Series Approach
4.4.1 Guiding Intuition
4.4.2 Constructing a Sum $\tilde{X}_1 + \tilde{X}_2 + \tilde{X}_3 + \ldots$ of Almost Independent Random Variables
4.4.3 Defining the Truly Independent Random Variables $X_1, X_2, X_3, \ldots$
4.5 More Results in a Nutshell

Part II Local Aspects: Inhomogeneous Pell Inequalities

5 Pell’s Equation, Superirregularity and Randomness
5.1 From Pell Equation to Superirregularity
5.1.1 Pell’s Equation: Bounded Fluctuations
5.1.2 The Area Principle
5.1.3 The Giant Leap in the Inhomogeneous Case: Extra Large Fluctuations
5.2 Randomness and the Area Principle
5.3 Proving Theorem 5.3 and the Lemmas
5.4 The Riesz Product
5.4.1 The Method of Nested Intervals vs. the Riesz Product
5.4.2 The “Rectangle Property”, and a Key Result: Theorem 5.11
5.5 Starting the Proof of Theorem 5.11 Using Riesz Product
5.5.1 What are the Trivial Errors and How to Synchronize Them
5.5.2 Geometric Ideas
5.5.3 An Important Consequence of the “Rectangle Property”
5.5.4 Choosing a Short Vertical Translation
5.5.5 Summarizing the Vague Geometric Intuition
5.6 More on the Riesz Product
5.6.1 Applying Super-Orthogonality
5.6.2 Single Term Domination: Clarifying the Technical Details
5.6.3 A Combination of the Rectangle Property and the Pigeonhole Principle
5.7 Completing the Case Study
5.7.1 Verifying (5.152)
5.7.2 A Combination of the Rectangle Property and the Pigeonhole Principle
5.7.3 A Combination of the Rectangle Property and the Pigeonhole Principle
5.7.4 A Combination of the Rectangle Property and the Pigeonhole Principle
5.8 Completing the Proof of Theorem 5.11
5.9 Yet Another Generalization of Theorem 5.3
5.9.1 Step One
5.9.2 Step Two: Small “Digit” $a_i$ Implies “Local” Rectangle Property
5.9.3 Step Three: Employing the Riesz Product Technique
5.9.4 Step Four: Constructing a Cantor Set
5.10 General Point Sets: Theorem 5.19
5.10.1 Statistical Version of the Rectangle Property: An Average Argument
5.10.2 Consequences of Inequality (5.327)
5.11 The Area Principle in General

6 More on Randomness
6.1 Starting the Proof of Theorem 5.4: Blocks-and-Gaps Decomposition
6.2 Completing the Blocks-and-Gaps Decomposition
6.3 Estimating the Variance
6.4 Applying Probability Theory
6.4.1 Central Limit Theorem with Explicit Error Term
6.5 Conclusion of the Proof of Theorem 5.4
6.6 Proving the Three Lemmas: Part One
6.6.1 Properties of the Auxiliary Functions in (6.222) and (6.223)
6.6.2 Deduction of Lemma 6.6 from Lemmas 6.4 and 6.5
6.7 Proving the Three Lemmas: Part Two
6.8 Starting the Proof of Theorem 5.6
6.9 Completing the Proof of Theorem 5.6
6.10 More Results in a Nutshell
6.10.1 Combining the Logarithmic Density with the Central Limit Theorem

References

Index

Part I

Global Aspects Randomness of the Irrational Rotation

Chapter 1

What Is “Probabilistic” Diophantine Approximation?

1.1 The Giant Leap in Uniform Distribution

We discuss some surprising new developments concerning $\sqrt{2}$, and in general the class of quadratic irrationals. We use $\sqrt{2}$ as the representative for the whole class. These results provide some rigorous evidence for a mysterious general phenomenon that we call the Giant Leap. In a nutshell, it is about the unexpected randomness of explicit sequences (Giant Leap to full-blown randomness). The reader may jump ahead to Sect. 2.5 for a detailed discussion of this issue.

The history of $\sqrt{2}$ is quite remarkable. Every mathematician knows that the discovery of irrational numbers by the Pythagorean school (namely, that numbers like $\sqrt{2}$ and the golden ratio $(1+\sqrt{5})/2$ are irrational; the Ancient Greeks called them “incommensurable”) caused a great deal of shock. The Pythagoreans looked upon integers as the essence of all things in the universe. When they realized that the integers did not suffice to measure even a simple geometric object such as the length of the diagonal of a unit square, they must have felt cheated by the gods.

However, a modern student (say, a good undergraduate student) has a hard time understanding the magnitude of this philosophical crisis 2,500 years ago. The modern student remembers the well-known theorem from high school that a real number is rational if and only if its decimal expansion (an infinite series(!)) is eventually periodic. Now it is very easy to construct decimal expansions which are obviously not periodic. For example, take a decimal expansion which is increasingly dominated by zeros:

$$\alpha = 0.01001000100001000001000000100000001\ldots \tag{1.1}$$

It is clearly nonperiodic, since the length of the blocks of consecutive 0s (separated by 1s) tends to infinity; of course, there are infinitely many similar examples.

The Ancient Greeks had a totally different way of discovering irrational numbers. Instead of studying infinite series (the Ancient Greeks knew little calculus), they were focusing on intuitive geometry, thoroughly studying regular polygons (equilateral triangle, square, regular pentagon, etc.) and also the regular polyhedra (regular tetrahedron, cube, etc.). By using Pythagoras’ theorem, they were able to express many natural geometric distances, say, the height of the equilateral triangle, the diagonal of the square, the diagonal of the regular pentagon, the height of the regular tetrahedron, and the space diagonal of the cube in terms of square roots (i.e., quadratic irrationals). If each side is one, we obtain the numbers $\sqrt{3}/2$, $\sqrt{2}$, $(1+\sqrt{5})/2$, $\sqrt{2/3}$, and $\sqrt{3}$ in this order.

The Ancient Greeks called two (positive) distances $d_0$ and $d_1$ commensurable (i.e., their ratio is rational) if they can be both measured with the same unit $u$ so that $d_0 = mu$ and $d_1 = nu$, where $m$ and $n$ are natural numbers. Before discovering irrational numbers, the Ancient Greeks probably felt intuitively that the search for such a common unit (basically the Euclidean algorithm) would always terminate. It was a shock, therefore, when in the fifth century B.C. a member of the Pythagorean school, probably Hippasus of Metapontum, discovered examples of incommensurable (i.e., irrational) geometric distances. The first example was most likely the ratio diagonal/side in the regular pentagon (i.e., the golden ratio $(1+\sqrt{5})/2$), due to the fact that the pentagram (regular pentagon with the five diagonals) was the official symbol of the Pythagorean brotherhood. By iterating the pentagram for the inscribed pentagon, we obtain an infinity of smaller and smaller similar pentagons. Converting this self-similar picture into a continued fraction, we obtain

$$\frac{\text{diagonal}}{\text{side}} = \frac{1+\sqrt{5}}{2} = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cdots}} = [1; 1, 1, 1, 1, 1, \ldots] = [1; \overline{1}], \tag{1.2}$$

that is, we have an example where the Euclidean algorithm never terminates. We emphasize the difference between the artificially constructed irrational number in (1.1) and the quadratic numbers that the Ancient Greeks proved to be irrational. The quadratic numbers represent genuinely interesting natural geometric distances; they deserve to be called special numbers. The real number in (1.1), on the other hand, is just an artificial counterexample.

Equation (1.2) gives the continued fraction for the golden ratio. The irrationality of $\sqrt{2}$ and $\sqrt{3}$ was probably proved by the Greeks using analogous geometric considerations, by studying self-similar pictures. A self-similar picture can be converted into a recurrence relation, for example,

$$\sqrt{2} = 1 + (\sqrt{2} - 1) = 1 + \frac{1}{\sqrt{2}+1} = 1 + \frac{1}{2 + (\sqrt{2}-1)}, \tag{1.3}$$

and the recurrence relation in (1.3) leads to the familiar continued fraction for $\sqrt{2}$:

$$\sqrt{2} = 1 + \cfrac{1}{2 + \cfrac{1}{2 + (\sqrt{2}-1)}} = [1; 2, 2, 2, \ldots] = [1; \overline{2}]. \tag{1.4}$$


Now we jump ahead in time a couple of thousand years to Lagrange’s famous theorem, which generalizes (1.2) and (1.4) as follows. A real number $\alpha$ is said to be a quadratic irrational if it can be written in the form $\alpha = (a + \sqrt{d})/b$, where $a$, $b \ne 0$, $d \ge 2$ are integers and $d$ is not a perfect square. An equivalent definition is that $\alpha$ is a root of a quadratic equation $Ax^2 + Bx + C = 0$ with integral coefficients such that the discriminant $B^2 - 4AC \ge 2$ is not a perfect square.

Lagrange’s Theorem. The continued fraction which represents a quadratic irrational is always (ultimately) periodic.

For example,

$$\frac{24 - \sqrt{15}}{17} = [1; 5, \overline{2, 3}]$$

(the bar indicates the period). We also have

Converse of Lagrange’s Theorem. If the continued fraction of $\alpha$ is (ultimately) periodic then $\alpha$ is a quadratic irrational.

Continued fractions play a key role in the theory of the Pell equation $x^2 - dy^2 = 1$. We know that the Pell equation has infinitely many integral solutions if the integer $d \ge 2$ is not a perfect square. The well-known cyclic structure of all integral solutions is a by-product of the periodicity of the continued fraction of $\sqrt{d}$. It is also well known how to read out the least solution from the period of the continued fraction of $\sqrt{d}$. As an illustration, take $d = 29$, and consider

$$\sqrt{29} = [5; 2, 1, 1, 2, 10, 2, 1, 1, 2, 10, 2, 1, 1, 2, 10, \ldots] = [5; \overline{2, 1, 1, 2, 10}],$$

where the bar again indicates the period. The length of the period is 5, an odd number, implying that the numerator and the denominator of the fifth convergent

$$[5; 2, 1, 1, 2] = \frac{70}{13}$$

give the least positive solution $x = 70$, $y = 13$ (i.e., $x = x_0 > 0$, $y = y_0 > 0$ for which $y_0$ is least) of the Pell equation $x^2 - 29y^2 = -1$ with “$-1$” instead of “$+1$.” In order to get the least solution of $x^2 - 29y^2 = +1$ we need the tenth convergent (i.e., we repeat the period)

$$[5; 2, 1, 1, 2, 10, 2, 1, 1, 2] = \frac{9801}{1820},$$

and the least solution is the pair $x = 9801$ and $y = 1820$. Sometimes the least solution is huge. A striking example is the Pell equation $x^2 - 61y^2 = 1$ for which the least solution is $x = 1{,}766{,}319{,}049$ and $y = 226{,}153{,}980$; another one is $x^2 - 109y^2 = 1$ for which the least solution is $x = 158{,}070{,}671{,}986{,}249$ and $y = 15{,}140{,}424{,}455{,}100$. Roughly speaking, the length of the period of $\sqrt{d}$ describes the logarithm of the least solution of the Pell equation (for example, the length of the period of $\sqrt{61}$ is 11).
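The recipe for reading the least Pell solution off the period of $\sqrt{d}$ can be tried out in a few lines. The following is a minimal sketch, not taken from the book: the function names are hypothetical, and it relies on the standard fact that the period of $\sqrt{d}$ ends with the partial quotient $2\lfloor\sqrt{d}\rfloor$.

```python
from math import isqrt

def sqrt_cf_period(d):
    """Return (a0, period) with sqrt(d) = [a0; period repeated]; d must not be a perfect square."""
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    m, den, a = 0, 1, a0
    period = []
    while a != 2 * a0:              # the period of sqrt(d) ends with 2*a0
        m = den * a - m
        den = (d - m * m) // den
        a = (a0 + m) // den
        period.append(a)
    return a0, period

def convergent(a0, digits):
    """Numerator and denominator of the finite continued fraction [a0; digits...]."""
    p_prev, q_prev, p, q = 1, 0, a0, 1
    for a in digits:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
    return p, q

def least_pell_solution(d):
    """Least positive solution (x, y) of x^2 - d*y^2 = +1, read off from the period."""
    a0, period = sqrt_cf_period(d)
    digits = period[:-1]
    if len(period) % 2 == 1:        # odd period: run through the period twice
        digits = period + period[:-1]
    return convergent(a0, digits)
```

For $d = 29$ the period comes out as `[2, 1, 1, 2, 10]`, the truncated convergent `convergent(5, [2, 1, 1, 2])` is `(70, 13)`, and `least_pell_solution(29)` returns `(9801, 1820)`, in agreement with the discussion above.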



The remarkable connection between continued fractions and higher arithmetic, especially quadratic fields, is a well-known story, and it can be found in many books on number theory (see, e.g., [Ha-Wr]). Here we focus on a completely different, hardly known angle: the equally fascinating connection between quadratic irrationals and randomness. As a first illustration, we formulate and prove a central limit theorem related to the uniform distribution of the sequence $n\sqrt{2}$ (mod 1), $n = 1, 2, 3, \ldots$.

If $\alpha$ is rational then the sequence $n\alpha$ (mod 1), $n = 1, 2, 3, \ldots$, is clearly periodic. On the other hand, if $\alpha$ is irrational, then the fractional parts $0 < \{n\alpha\} < 1$, $n = 1, 2, 3, \ldots$, represent distinct points in the unit interval $(0, 1)$. The sequence $n\alpha$ (mod 1), $n = 1, 2, 3, \ldots$, is often called the irrational rotation, due to the familiar representation of the unit torus as a circle of unit circumference. What can we say about the distribution of the irrational rotation? We are going to achieve a “Giant Leap” from the perfectly regular, periodic behavior to randomness in three steps.

[Figure: the unit torus drawn as a circle of unit circumference, with $0 = 1 = 2 = \ldots$ identified and the points $\alpha$, $2\alpha$, $3\alpha$, and $12\alpha$ marked.]

Step One: The irrational rotation is dense in $(0, 1)$.

Step Two: The irrational rotation is uniformly distributed in $(0, 1)$.

Step Three: The quadratic irrational rotation, counted in any fixed interval $(0, x)$ with rational endpoint $x$, exhibits a central limit theorem.

Step Three is the new result here. Why quadratic irrationals? Well, the quadratic irrationals play a special role. Besides the deep connection with number theory (Pell’s equation is just one example), we have to point out that the quadratic irrationals are in the class of the “most anti-rational” real numbers (officially called badly approximable numbers; this will be explained below). This “anti-rational” property of the quadratic irrationals is a consequence of the boundedness of the continued fraction “digits” (= partial quotients); boundedness follows from periodicity.



How about the “anti-rational” property of other interesting numbers such as $e$, $\pi$, $\sqrt[3]{2}$, and $\log 2$? Well, $e$ is almost as “anti-rational” as (say) $\sqrt{2}$, but we know hardly anything about the “anti-rational” property of $\pi$ or $\sqrt[3]{2}$ or $\log 2$ (because we can prove very little about the continued fraction for $\pi$ or $\sqrt[3]{2}$ or $\log 2$).

For better understanding of Step Three, we have to briefly talk about Step One and Step Two, which are of course well-known classical results. Notice that Step One is just a one-dimensional special case of Kronecker’s famous general theorem that he proved in 1884: if $1, \alpha_1, \alpha_2, \ldots, \alpha_k$ are linearly independent over the rationals, then the $k$-dimensional sequence

$$(n\alpha_1, n\alpha_2, \ldots, n\alpha_k), \quad n = 1, 2, 3, \ldots \text{ modulo one}, \tag{1.5}$$

is dense in the $k$-dimensional unit cube.

Density, important as it is, does not tell the whole truth about the global distribution of the irrational rotation: Step Two above claims the much stronger property of uniform distribution. We recall that an infinite sequence in the unit interval is said to be uniformly distributed if for any subinterval $I \subset (0, 1)$ the density of the elements of the sequence that fall into $I$ exists, and it equals the length $|I|$ of the subinterval. The uniform distribution of the irrational rotation was discovered and proved around 1910 (Bohl, Sierpinski, H. Weyl). For later purposes we include a short proof of this important result.

Short Proof of Uniform Distribution. It is based on a simple but very useful observation of Hecke that if subintervals have some special length then the counting error is bounded. First a notation: for any interval $I \subset (0, 1)$ write

$$Z_\alpha(N; I) = \sum_{\substack{1 \le n \le N:\\ n\alpha \in I \ (\mathrm{mod}\ 1)}} 1, \tag{1.6}$$

and call $Z_\alpha(N; I)$ the “counting function.” The counting function (1.6) is simply the partial sum of the interval-hitting sequence.

Lemma on Bounded Error Intervals. Let $I \subset (0, 1)$ be a half-open interval of length $|I| = \{k\alpha\}$ (fractional part) where $k \ge 1$ is some integer. Then for every $N$

$$|Z_\alpha(N; I) - N|I|| < k. \tag{1.7}$$

Proof. First let $k = 1$. Since each step $\alpha$ of the irrational rotation is the same as the length of interval $I$, the equality

$$Z_\alpha(N; I) = \lfloor N\alpha \rfloor \ \text{or}\ \lceil N\alpha \rceil \tag{1.8}$$

(meaning the lower or upper integral part) is obvious: every interval $[m, m + 1)$, where $m$ is an integer, contains exactly one multiple $n\alpha$ with $n\alpha \in I$ (mod 1).



If $k \ge 2$ then we simply decompose the sequence $n\alpha$, $n = 1, 2, 3, \ldots$, into $k$ arithmetic progressions of the same gap $k$ and apply (1.8) for each. This implies (1.7). □

Using this lemma we can quickly prove the uniform distribution of the irrational rotation. It clearly suffices to deal with intervals of the type $I = [0, \gamma)$ where $0 < \gamma < 1$ is arbitrary. Since the irrational rotation is dense (“Step One”), for every $\varepsilon > 0$ there exist natural numbers $m_1$ and $m_2$ such that

$$\gamma - \varepsilon < \{m_1\alpha\} < \gamma < \{m_2\alpha\} < \gamma + \varepsilon. \tag{1.9}$$

Write $I_1 = [0, \{m_1\alpha\})$ and $I_2 = [0, \{m_2\alpha\})$; then clearly

$$Z_\alpha(N; I_1) \le Z_\alpha(N; I) \le Z_\alpha(N; I_2). \tag{1.10}$$

By (1.7), for every $N$ and $j = 1, 2$,

$$|Z_\alpha(N; I_j) - N|I_j|| < m_j. \tag{1.11}$$

Combining (1.9)–(1.11), for every $N$,

$$|Z_\alpha(N; I) - N|I|| < \max\{m_1, m_2\} + \varepsilon N. \tag{1.12}$$

Dividing (1.12) by $N$, letting $N \to \infty$, and then taking $\varepsilon \to 0$, uniform distribution follows. □
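Hecke's bound (1.7) is easy to observe numerically. The sketch below is only an illustration under stated assumptions: the helper name `max_discrepancy` is hypothetical, the interval is taken as $I = [0, \{k\alpha\})$ with left endpoint 0, and floating-point arithmetic is trusted for moderate $N$ (the points $\{n\alpha\}$ stay far from the endpoint compared with rounding errors).

```python
import math

def max_discrepancy(alpha, n_max, length):
    """Largest |Z_alpha(N; I) - N*|I|| over 1 <= N <= n_max, for I = [0, length)."""
    hits, worst = 0, 0.0
    for n in range(1, n_max + 1):
        if (n * alpha) % 1.0 < length:      # n*alpha (mod 1) falls into I
            hits += 1
        worst = max(worst, abs(hits - n * length))
    return worst

alpha = math.sqrt(2)
# Intervals of the special length {k*alpha}: the error stays below k for every N.
bounded = [max_discrepancy(alpha, 3000, (k * alpha) % 1.0) for k in range(1, 6)]
# A rational endpoint such as 1/2 is not covered by the lemma.
half = max_discrepancy(alpha, 3000, 0.5)
```

Each entry `bounded[k - 1]` stays below `k`, as (1.7) predicts, whereas no such uniform bound is claimed for `half`.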

Note that the usual proof is based on Weyl’s criterion [We], which is by far the most flexible approach: it easily generalizes to higher dimensions, gives nontrivial results for power sequences like $n^2\alpha$ and $n^3\alpha$, and for many other cases. Weyl’s criterion says that a sequence $x_n$, $n = 1, 2, 3, \ldots$, is uniformly distributed modulo one if and only if

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} e^{2\pi i k x_n} = \int_0^1 e^{2\pi i k x}\, dx = 0 \tag{1.13}$$

for every integer $k \ne 0$ (notice that the case $k = 0$ is trivial).

There is a third proof, using continued fractions, which has the great advantage of providing a sharp estimate of the error term. This quantitative approach goes back to Ostrowski [Os] and to Hardy and Littlewood [Ha-Li1, Ha-Li2] (independent work around 1920). First we recall some well-known facts from the theory of continued fractions (see, e.g., the books [Kh2] or [La]). If


$$\alpha = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots}} = [a_0; a_1, a_2, \ldots],$$

then the $j$th convergent

$$\frac{p_j}{q_j} = [a_0; a_1, \ldots, a_{j-1}]$$

has the property that

$$p_j q_{j-1} - p_{j-1} q_j = (-1)^j, \tag{1.14}$$

implying that $p_j$ and $q_j$ are relatively prime; the denominators $q_j$ satisfy the recurrence formula $q_1 = 1$, $q_2 = a_1$, $q_j = a_{j-1} q_{j-1} + q_{j-2}$ for all $j \ge 3$, and finally,

$$\left| \alpha - \frac{p_j}{q_j} \right| < \frac{1}{q_j q_{j+1}}, \tag{1.15'}$$

implying the weaker inequality that will suffice for our purposes here:

$$\left| \alpha - \frac{p_j}{q_j} \right| < \frac{1}{q_j^2}. \tag{1.15''}$$

Quantitative proof of uniform distribution. It is based on the following

Lemma on Bounded Error Initial Segments. The special initial segment $k\alpha$, $1 \le k \le q_n$, where $q_n$ is a convergent denominator, is particularly well distributed in the sense that, for every subinterval $I \subset (0, 1)$ and for every integer $n \ge 1$, the discrepancy of the counting function [see (1.6)] is bounded:

$$|Z_\alpha(q_n; I) - q_n|I|| \le 3. \tag{1.16}$$

Proof. By (1.15'')

$$\left| k\alpha - \frac{k p_n}{q_n} \right| < \frac{k}{q_n^2} \le \frac{1}{q_n} \tag{1.17}$$

for all $1 \le k \le q_n$. Since $p_n$ and $q_n$ are relatively prime, the sequence $kp_n/q_n$, $1 \le k \le q_n$ (mod 1) is just a permutation of the equidistant set $j/q_n$, $1 \le j \le q_n$, for which we have

$$Z_{1/q_n}(q_n; I) = \lfloor q_n|I| \rfloor \ \text{or}\ \lceil q_n|I| \rceil. \tag{1.18}$$



By (1.17)

$$|Z_\alpha(q_n; I) - Z_{1/q_n}(q_n; I)| \le 2,$$

and combining this with (1.18), the lemma follows. □

By using this lemma we can easily estimate the discrepancy

$$|Z_\alpha(N; I) - N|I|| \tag{1.19}$$

for a general $N$. Assume $q_{n-1} \le N < q_n$. In view of the recurrence relation $q_j = a_{j-1} q_{j-1} + q_{j-2}$ (for all $j \ge 3$) we can write $N$ in the form

$$N = b_{n-1} q_{n-1} + b_{n-2} q_{n-2} + \ldots + b_1 q_1, \tag{1.20}$$

where $1 \le b_{n-1} \le a_{n-1}$, $0 \le b_j \le a_j$ for $2 \le j \le n - 2$, and $0 \le b_1 \le a_1 - 1$. Combining the trivial identity

$$Z_\alpha(m + q_j; I) - Z_\alpha(m; I) = Z_\alpha(q_j; I - m\alpha) \tag{1.21}$$

with (1.16) and (1.20), we have

$$|Z_\alpha(N; I) - N|I|| \le 3(b_{n-1} + b_{n-2} + \ldots + b_1),$$

which, in view of $b_j \le a_j$, immediately implies the following

Discrepancy Lemma. For every integer $N \ge 1$ and every subinterval $I \subset (0, 1)$

$$|Z_\alpha(N; I) - N|I|| \le 3(a_1 + a_2 + \ldots + a_{n-1}), \tag{1.22}$$

where $q_{n-1} \le N < q_n$. In fact, we have the slightly sharper form

$$|Z_\alpha(N; I) - N|I|| \le 3(a_1 + \ldots + a_{n-2} + N/q_{n-1}). \tag{1.23}$$

□
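The Discrepancy Lemma (1.22) can be checked by brute force for a concrete rotation. The following sketch makes several assumptions that are not part of the text: it fixes $\alpha = \sqrt{2} = [1; 2, 2, 2, \ldots]$, uses intervals $I = [0, x)$, relies on floating-point fractional parts, and the helper names are hypothetical. The denominators follow the indexing used above ($q_1 = 1$, $q_2 = a_1$).

```python
import math

def denominators(a):
    """[q_1, q_2, ...] with q_1 = 1, q_2 = a_1, q_j = a_{j-1} q_{j-1} + q_{j-2}; a[0] = a_0."""
    q = [1, a[1]]
    for j in range(2, len(a)):
        q.append(a[j] * q[-1] + q[-2])
    return q

def lemma_bound(n_value, a, q):
    """3*(a_1 + ... + a_{n-1}), where n is determined by q_{n-1} <= N < q_n."""
    n = next(j + 1 for j, qj in enumerate(q) if n_value < qj)   # q[j] is q_{j+1}
    return 3 * sum(a[1:n])

def lemma_holds(alpha, a, x, n_max):
    """Check |Z_alpha(N; [0, x)) - N*x| <= bound of (1.22) for all 1 <= N <= n_max."""
    q = denominators(a)
    hits = 0
    for n_value in range(1, n_max + 1):
        if (n_value * alpha) % 1.0 < x:
            hits += 1
        if abs(hits - n_value * x) > lemma_bound(n_value, a, q):
            return False
    return True

alpha = math.sqrt(2)
a = [1] + [2] * 12          # partial quotients of sqrt(2); enough for N < q_13 = 33461
```

With these definitions, `lemma_holds(alpha, a, 0.5, 3000)` confirms (1.22) for the first half of the unit interval; in practice the observed discrepancy is much smaller than the bound, which grows like a constant times $\log N$ here.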

To prove uniform distribution we have to check that

$$\frac{Z_\alpha(N; I)}{N} \to |I| \tag{1.24}$$

as $N \to \infty$ for all subintervals $I \subset (0, 1)$. From the recurrence formula $q_j = a_{j-1} q_{j-1} + q_{j-2}$ (for all $j \ge 3$) we have

$$q_{2j+1} \ge (1 + a_1 a_2)(1 + a_3 a_4) \cdots (1 + a_{2j-1} a_{2j}), \tag{1.25}$$

and trivially

$$q_{2j+2} \ge a_{2j+1} q_{2j+1}. \tag{1.26}$$

Using the general fact

$$\frac{a_1 + \ldots + a_k}{(1 + a_1 a_2)(1 + a_3 a_4) \cdots (1 + a_{k-1} a_k)} \to 0 \tag{1.27}$$

as $k \to \infty$ through the even integers, and combining (1.23), (1.25) and (1.26), we obtain (1.24) where $q_{n-1} \le N < q_n$. This completes the quantitative proof of uniform distribution. □

Let’s return to (1.22) in the Discrepancy Lemma: note without proof that the upper bound $(a_1 + a_2 + \ldots + a_{n-1})$ is basically sharp apart from the constant factor. The max-discrepancy, i.e., the discrepancy taken over all $N$ in $q_{n-1} \le N < q_n$ and over all subintervals $I \subset (0, 1)$, does fluctuate as much as constant times $(a_{n-1} + a_{n-3} + a_{n-5} + \ldots)$; this result is due to Hardy and Littlewood and, independently, to Ostrowski. If $q_{n-1} \le N < q_n$, then from $q_j = a_{j-1} q_{j-1} + q_{j-2}$, very roughly,

$$q_n \approx (1 + a_1)(1 + a_2) \cdots (1 + a_{n-1}). \tag{1.28}$$

Under side condition (1.28) the minimum of the critical digit sum $(a_1 + a_2 + \ldots + a_{n-1})$ is attained when

$$\max_n \frac{a_1 + a_2 + \ldots + a_n}{n} = O(1), \tag{1.29}$$

i.e., when the average digit size is bounded, and so the smallest possible max-discrepancy for all irrational rotations is (positive) constant times $\log N$, with equality (apart from a constant factor) for the class of $\alpha$ satisfying (1.29). For quadratic irrationals the average digit size is clearly bounded (a by-product of periodicity), so (1.29) applies, and implies that the quadratic irrational rotation $n\alpha$, $n = 1, 2, 3, \ldots$ (mod 1), has max-discrepancy $c_\alpha \log N$. The smallest values of the constant factor $c_\alpha > 0$ occur for numbers like the golden ratio $(1 + \sqrt{5})/2 = [1; 1, 1, 1, \ldots]$ and $\sqrt{2} = [1; 2, 2, 2, \ldots]$ that have very small continued fraction digits; see the more recent works of Dupain [Du] and Dupain and Sós [Du-So].

Summarizing, we have a very good understanding of the max-discrepancy of the quadratic irrational rotation: it is always (positive) constant times $\log N$, that is, as small as possible, where the constant factor depends on $\alpha$. The numbers $\alpha$ which are badly approximable by rationals give the “most uniform” irrational rotation and vice versa.

The first new result is about the typical discrepancy (instead of the max-discrepancy).

Step Three: The quadratic irrational rotation, counted in any fixed interval $(0, x)$ with rational endpoint $x$, exhibits a central limit theorem with standard deviation $c\sqrt{\log N}$.



Step Three is in perfect harmony with the mysterious Giant Leap phenomenon that we will discuss in detail in Sect. 2.5. The Giant Leap refers to the dramatic change that happens when we switch from rationals to irrationals, and especially to quadratic irrationals. The rational rotation exhibits extremely simple periodic behavior; the quadratic irrational rotation, on the other hand, exhibits full-blown randomness, including a delicate central limit theorem. Note that the quadratic irrational rotation is at the other end of the spectrum, since the quadratic irrationals are (among) the most “anti-rational” numbers. Here is the precise statement.

Theorem 1.1 (Central limit theorem). Let $\alpha$ be any quadratic irrational and consider any interval $I = [0, x)$ with rational endpoint $0 < x < 1$. There are effectively computable constants $C_1 = C_1(\alpha, x)$ and $C_2 = C_2(\alpha, x) > 0$ such that, for any real numbers $-\infty < A < B < \infty$, the density of integers $N \ge 2$ for which $A$
nx C C1 log N C C2 plog N ˇ R1 !1 u2 =2 d u p1 e 2

as long as D O .log N /1=10 .



The exponent $1/10$ is certainly not best possible, and with a little extra effort we could easily prove a better constant, but to find the best exponent is not our main goal here.

Hecke’s Lemma on Bounded Error Intervals shows that our condition “endpoint $x$ is rational” cannot be relaxed to “any $x$”; indeed, if $x = \{\alpha\}$, or $x = \{k\alpha\}$ for some integer $k \ge 1$ (i.e., $x$ is the fractional part of an integer multiple of $\alpha$), then the fluctuation is bounded (instead of having average size $\sqrt{\log N}$).

Note that the first constant factor $C_1 = C_1(\alpha, x)$ in (1.30) can be both zero and nonzero, but the second factor $C_2 = C_2(\alpha, x) > 0$ is always strictly positive. For example, if $I = [0, 1/2)$ (i.e., $x = 1/2$) and $\alpha = \sqrt{2}$, then [see (2.86)]

$$C_1 = C_1(\sqrt{2}, 1/2) = \frac{1}{8 \log(1 + \sqrt{2})} \tag{1.32}$$

and [see (3.127)]

$$C_2 = C_2(\sqrt{2}, 1/2) = \frac{1}{8} \left( \frac{3}{\sqrt{2} \log(1 + \sqrt{2})} \right)^{1/2}. \tag{1.33}$$

On the other hand, if $I$ remains the first half $[0, 1/2)$ of the unit interval, but $\sqrt{2}$ is replaced by $\sqrt{3}$ or the golden ratio $(1 + \sqrt{5})/2$, then the corresponding first constant factor $C_1$ is zero [see (2.90) and (2.91)], that is, we don’t need the additive logarithmic term in the numerator of (1.30).

Another example is $\alpha = \sqrt{7}$ and $I = [0, 1/2)$; then [see (2.92)]

$$C_1(\sqrt{7}, 1/2) = \frac{1}{4 \log(8 + 3\sqrt{7})}.$$

Note that the number $8 + 3\sqrt{7}$ in the denominator comes from the least positive solution $x = 8$, $y = 3$ of Pell’s equation $x^2 - 7y^2 = \pm 1$; this $8 + 3\sqrt{7}$ is called the fundamental unit in the real quadratic field $\mathbb{Q}(\sqrt{7})$. The reason why the fundamental unit shows up in both $C_1$ and $C_2$ will be explained in the proofs.

Note also that Theorem 1.1 can be easily generalized for any interval $I = (x_1, x_2)$ where both endpoints are rational. For example, taking the symmetric intervals $I = (-x, x)$ (instead of $I = [0, x)$) the first constant factor $C_1$ is always zero.

Note in advance that the explicit evaluation of the variance constants $C_2$ is based on explicit finite formulas that we call “generalized class number formulas.” It involves surprisingly deep number theory (see Sects. 3.1–3.3).

The basic idea of the proof of Theorem 1.1 is the following. As $n$ runs in an interval $0 < n < N$, we set up an approximation of $Z_\alpha(n; I) - nx$ with a sum of independent and identically distributed random variables. (Note in advance that the independence will come from an underlying homogeneous Markov chain.) Despite the simplicity of this approach, the details are complicated, and the proof of Theorem 1.1 is rather long.



1.1.1 From Quasi-Periodicity to Randomness

Let’s return to Hecke’s Lemma on Bounded Error Intervals: it is a very strong “anti-randomness” type limitation on the irrational rotation. By the way, later we need the following stronger form of Hecke’s Lemma.

Lemma on Just Intervals. Let $I \subset (0, 1)$ be an arbitrary half-open interval of length $|I| = \{q_k\alpha\}$ for some integer $k \ge 0$, where $q_k$ is the $k$-th convergent denominator of $\alpha$. Then for any integer $N \ge 1$,

$$|Z_\alpha(N; I) - N|I|| < 2.$$

We give a proof of this lemma at the end of the section.

Another strong regularity property of the irrational rotation is the Lemma on Bounded Error Initial Segments. A third strong regularity property is the so-called Three-distance theorem. We don’t need it for the rest, but this elegant result is definitely worth mentioning. Let $0 < \alpha < 1$ be an arbitrary irrational number, let $n$ be a natural number, and let $0 < y_1 < y_2 < \ldots < y_n < 1$ be the first $n$ terms of the fractional part sequence $\{k\alpha\}$, $1 \le k \le n$, arranged in increasing order. H. Steinhaus made the surprising conjecture that the gaps $y_{j+1} - y_j$, $j = 0, 1, \ldots, n$ (where $y_0 = 0$ and $y_{n+1} = 1$), attain at most three different values. Moreover, if there are three different values, say, $0 < \delta_1 < \delta_2 < \delta_3$, then $\delta_1 + \delta_2 = \delta_3$. This beautiful conjecture was proved by Sós [So1] and Świerczkowski [Sw], and it is now called the “three-distance theorem.”

It was Sós [So1] who noticed a very interesting by-product of the proof of the Three-distance theorem.

Lemma on Restricted Permutations. Let $\alpha$ be an arbitrary irrational, and let $P$ be the permutation of the set $1, 2, \ldots, n$ such that

$$0 < \{p(1)\alpha\} < \{p(2)\alpha\} < \ldots < \{p(n)\alpha\} < 1.$$

Then the whole permutation $P: p(1), p(2), \ldots, p(n)$ can be reconstructed from the knowledge of $p(1)$ and $p(n)$; the point is that we don’t need to know $\alpha$.

It is worth mentioning that there is another interesting “three-distance theorem,” which goes as follows. Besides $\alpha$ and $n$, let $0 < b < 1$ be an arbitrary real number. The “gaps” between the successive values of $k$, $1 \le k \le n$, for which $\{k\alpha\} < b$ can have at most three lengths, and if there are three, one will be the sum of the other two (this was also a conjecture of Steinhaus).
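The first Three-distance theorem is pleasant to watch on a computer. Here is a small sketch under stated assumptions: the function name `gap_lengths` is hypothetical, the computation uses floating-point fractional parts, and gaps are treated as equal when they differ by less than a tolerance `tol` (needed because equal gaps agree only up to rounding).

```python
import math

def gap_lengths(alpha, n, tol=1e-9):
    """Distinct gap lengths between 0, the sorted points {k*alpha} (1 <= k <= n), and 1."""
    points = [0.0] + sorted((k * alpha) % 1.0 for k in range(1, n + 1)) + [1.0]
    gaps = sorted(b - a for a, b in zip(points, points[1:]))
    distinct = [gaps[0]]
    for g in gaps[1:]:
        if g - distinct[-1] > tol:      # a genuinely new gap length
            distinct.append(g)
    return distinct

alpha = math.sqrt(2)                    # only the fractional part of alpha matters
three_gap_ok = all(len(gap_lengths(alpha, n)) <= 3 for n in range(1, 301))
```

When three lengths $\delta_1 < \delta_2 < \delta_3$ occur, the returned list `d` also satisfies `d[0] + d[1] ≈ d[2]` up to rounding, matching $\delta_1 + \delta_2 = \delta_3$.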

1.1.2 Summary in a Nutshell

The linear sequence $n\alpha$, $n = 1, 2, 3, \ldots$, is perfectly regular: it is an infinite arithmetic progression. Even if we take it modulo one, a lot of regularities are still preserved. For example, (1) Hecke’s Lemma on Bounded Error Intervals and its stronger form, (2) the Lemma on Just Intervals, (3) the Lemma on Bounded Error Initial Segments, (4) the Three-distance theorem, and (5) the Lemma on Restricted Permutations are all strong “anti-randomness” type regularity properties of the irrational rotation. These regularities demonstrate that the irrational rotation is highly non-random in many respects, and explain why the irrational rotation is called a quasi-periodic sequence.

Nevertheless, our Theorem 1.1, a central limit theorem, clearly exhibits full-blown “randomness.” The price that we pay is the much smaller norming factor $\sqrt{\log n}$ instead of the usual $\sqrt{n}$. The message, in fact the basic message of the book, is that even under very restrictive regularity conditions such as quasi-periodicity, randomness eventually prevails.

We have a very good understanding of the irrational rotation $n\alpha$ (mod 1), $n = 1, 2, 3, \ldots$, which is a linear sequence. By comparison, we know much, much less about the polynomial sequences such as $n^2\alpha$ (mod 1), $n^3\alpha$ (mod 1), $n^4\alpha$ (mod 1), and so on, where $\alpha$ is a given special number, say, a quadratic irrational. Computer experimentation indicates full-blown randomness with standard deviation $\sqrt{n}$ (instead of $\sqrt{\log n}$), but basically there is no mathematical tool to prove it (especially for degree $\ge 3$).

Finally, as we promised, we conclude this section with a

Proof of the Lemma on Just Intervals. Let $q_1 = 1$, $q_2 = a_1$, $q_3 = a_2 a_1 + 1$, $\ldots$ be the convergent denominators for $\alpha$. In the special case $q_1 = 1$ we already proved the statement, see (1.8). Now assume that $q_k$ is an arbitrary convergent denominator, $I \subset [0, 1)$ is an arbitrary half-open interval of length $|I| = \{q_k\alpha\} < 1/2$, and we study the counting function

$$Z_\alpha(N; I) = \sum_{\substack{1 \le n \le N:\\ n\alpha \in I \ (\mathrm{mod}\ 1)}} 1.$$

First assume that $N$ is divisible by $q_k$, and consider the arithmetic progressions for $a = 1, 2, \ldots, q_k$:

$$i q_k + a, \quad i = 0, 1, 2, \ldots, M - 1, \quad \text{with } M = \frac{N}{q_k}. \tag{1.34}$$

For brevity write $\beta = |I| = \{q_k\alpha\}$; then by (1.34)

$$Z_\alpha(N; I) = \sum_{a=1}^{q_k} Z_\beta(M; I - (q_k - a)\alpha), \tag{1.35}$$

where $I - t$ denotes the translated copy of interval $I$ modulo one. The point here is that the intervals $I - (q_k - a)\alpha$, as $a = 1, 2, \ldots, q_k$, are pairwise disjoint and also uniformly distributed in the unit interval.



To prove disjointness, notice that if $I - j\alpha$ and $I - l\alpha$ overlap for some $0 \le j < l < q_k$, then

$$\|(l - j)\alpha\| < |I| = \|q_k\alpha\|,$$

which contradicts the well-known local minimum property of $\|q_k\alpha\|$ ($\|m\alpha\| < \|q_k\alpha\|$ implies that $m > q_k$). To prove uniform distribution of the translated intervals, we simply refer to the Lemma on Bounded Error Initial Segments. Combining disjointness with uniform distribution, by (1.8) we have

$$Z_\alpha(N; I) = \sum_{a=1}^{q_k} Z_\beta(M; I - (q_k - a)\alpha) = q_k \lfloor M|I| \rfloor + \eta, \tag{1.36}$$

where

$$\eta = \lfloor q_k \{M|I|\} \rfloor \ \text{or}\ \lceil q_k \{M|I|\} \rceil \tag{1.37}$$

(lower or upper integral part). Since $N = q_k M$, we can rewrite (1.36) and (1.37) as follows:

$$Z_\alpha(N; I) = \lfloor N|I| \rfloor \ \text{or}\ \lceil N|I| \rceil, \tag{1.38}$$

which proves the lemma in the special case when $N$ is divisible by $q_k$.

In the general case we write $N = N_1 + r$, where $N_1$ is divisible by $q_k$ and $0 \le r < q_k$. Clearly

$$Z_\alpha(N; I) = Z_\alpha(N_1; I) + Z_\alpha(r; I - N_1\alpha). \tag{1.39}$$

Since $0 \le r < q_k$, and again using the local minimum property of $\|q_k\alpha\| = |I|$, we have

$$0 \le Z_\alpha(r; I - N_1\alpha) \le 1. \tag{1.40}$$

Also,

$$|I| = |q_k\alpha - p_k| < \frac{1}{q_k}. \tag{1.41}$$

Combining (1.38)–(1.41) we have

$$|Z_\alpha(N; I) - N|I|| \le |Z_\alpha(N_1; I) - N_1|I|| + |Z_\alpha(r; I - N_1\alpha) - r|I|| < 1 + 1 = 2,$$

completing the proof of the lemma. □

1.2 Randomness in Lattice Point Counting

First note that the counting function

$$Z_\alpha(N; I) = \sum_{\substack{1 \le n \le N:\\ n\alpha \in I \ (\mathrm{mod}\ 1)}} 1$$

of the irrational rotation has an alternative geometric meaning: it counts lattice points in a long tilted narrow strip of slope $\alpha$.

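Indeed, let $I$ be the interval $(0, \gamma)$. We push down the line $y = \alpha x$ of slope $\alpha$ by the length $\gamma$ of interval $I$, and consider the long tilted narrow parallelogram with vertices

$$(0, 0), \quad (0, -\gamma), \quad \left(N + \tfrac{1}{2},\ \alpha\left(N + \tfrac{1}{2}\right)\right), \quad \left(N + \tfrac{1}{2},\ \alpha\left(N + \tfrac{1}{2}\right) - \gamma\right);$$

we denote this parallelogram by $P(\gamma; N)$. Clearly the area of parallelogram $P(\gamma; N)$ is $\gamma(N + \frac{1}{2})$. Let $L(\gamma; N)$ denote the number of lattice points in parallelogram $P(\gamma; N)$. It is easy to see that, with $I = (0, \gamma)$,

$$Z_\alpha(N; I) - N\gamma = \sum_{1 \le n \le N:\ 0 < \ldots} \cdots$$

[…]

$$\sum_{\substack{b^2 + ac = 7:\\ a > 0,\ c > 0}} a = (1 + 7) + 2(1 + 6 + 2 + 3) + 2(1 + 3) = 40,$$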

where $(1 + 7)$ corresponds to $b = 0$, $2(1 + 6 + 2 + 3)$ corresponds to $b = \pm 1$, and $2(1 + 3)$ corresponds to $b = \pm 2$. Thus we have

$$C_4(\sqrt{7}) = \left( \frac{40}{240 \sqrt{7} \log(8 + 3\sqrt{7})} \right)^{1/2} = \left( \frac{1}{6 \sqrt{7} \log(8 + 3\sqrt{7})} \right)^{1/2}.$$

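Finally, we have the analogous formula

$$C_4(\sqrt{71}) = \left( \frac{1}{240 \sqrt{71} \log(3480 + 413\sqrt{71})} \sum_{\substack{b^2 + ac = 71:\\ a > 0,\ c > 0}} a \right)^{1/2}.$$

Since

$$\sum_{\substack{b^2 + ac = 71:\\ a > 0,\ c > 0}} a = 1160,$$

we have

$$C_4(\sqrt{71}) = \left( \frac{1160}{240 \sqrt{71} \log(3480 + 413\sqrt{71})} \right)^{1/2} = \left( \frac{29}{6 \sqrt{71} \log(3480 + 413\sqrt{71})} \right)^{1/2}.$$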
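The two finite sums entering these constants, over all integer triples with $b^2 + ac = d$, $a > 0$, $c > 0$, are easy to evaluate by brute force. The sketch below is only a check of the arithmetic; the function name `triple_sum` is hypothetical and the naive divisor search is chosen for readability, not speed.

```python
from fractions import Fraction
from math import isqrt

def triple_sum(d):
    """Sum of a over all integer triples (a, b, c) with b^2 + a*c = d, a > 0, c > 0."""
    total = 0
    for b in range(-isqrt(d), isqrt(d) + 1):
        m = d - b * b                   # need a*c = m with a, c >= 1
        if m <= 0:
            continue
        total += sum(a for a in range(1, m + 1) if m % a == 0)
    return total

sum_7 = triple_sum(7)                   # (1+7) + 2(1+6+2+3) + 2(1+3)
sum_71 = triple_sum(71)
ratio_7 = Fraction(sum_7, 240)          # numerical factor in front of 1/(sqrt(7) log(...))
ratio_71 = Fraction(sum_71, 240)        # numerical factor in front of 1/(sqrt(71) log(...))
```

The values are `sum_7 == 40` and `sum_71 == 1160`, so the reduced factors are $1/6$ and $29/6$, as in the displayed formulas. One can also confirm that $3480^2 - 71 \cdot 413^2 = 1$, i.e., that $3480 + 413\sqrt{71}$ comes from a solution of Pell's equation.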

Note that both real quadratic fields $\mathbb{Q}(\sqrt{7})$ and $\mathbb{Q}(\sqrt{71})$ have class number one: this is why we could use the elegant Siegel’s formula. If the class number of the real quadratic field is not one, then we have to switch to a more complicated algorithm.

The basic idea of the proof of Theorem 1.2 is the same as that of Theorem 1.1: as $n$ runs in the interval $0 < n < N$, we approximate $S_\alpha(n)$ with a sum of independent and identically distributed random variables. Again the independence comes from an underlying (homogeneous) Markov chain.

Theorems 1.1 and 1.2 are our main results describing the asymptotic behavior of the irrational rotation from a global viewpoint. The proofs are very long. This is why we decided to include two warm-up sections: Sects. 1.3 and 1.4.

1.2.1 A Key Tool: Ostrowski’s Explicit Formula

Our proof of Theorem 1.2 will use a somewhat complicated but very useful formula, due to Ostrowski (see [Os]), expressing the sum $S_\alpha(n)$ in terms of the basic parameters of the continued fraction expansion of $\alpha$. First we recall the well-known recurrence relations for the denominators $q_i$ of the convergents $p_i/q_i$ of $\alpha = [a_0; a_1, a_2, \ldots]$:
\[ q_1 = 1, \quad q_2 = a_1, \quad\text{and for all } i \ge 1, \quad q_{i+2} = a_{i+1} q_{i+1} + q_i. \]
In view of this, there is a unique way to express an arbitrary positive integer $n$ as a linear combination of the $q_i$’s as follows:
\[ n = \sum_i{}^{*}\, b_i q_i, \qquad 0 \le b_i = b_i(n) \le a_i \ \text{ for } i \ge 2, \qquad 0 \le b_1 = b_1(n) \le a_1 - 1, \tag{1.54} \]
where $*$ indicates the Extra Rule that if $b_i = a_i$ then $b_{i-1} = 0$.

The only new parameter in Ostrowski’s explicit formula below is $\varepsilon_i = \varepsilon_i(\alpha) = q_i\alpha - p_i$, where $\mathrm{sign}(\varepsilon_i) = \pm 1$ denotes the usual sign. It is well known that, for every $\alpha$, as $i$ runs, $\varepsilon_i$ forms an alternating sequence (in fact, an alternating sequence whose absolute values decrease to zero at least exponentially fast).
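The expansion (1.54) can be computed greedily, starting from the largest denominator. The sketch below is only an illustration under our own conventions: the list `a` holds $a_1, a_2, \ldots$ (the term $a_0$ plays no role), and the function names `denominators` and `ostrowski_digits` are hypothetical, not taken from the text.

```python
def denominators(a, count):
    """Return [q_1, ..., q_count] with q_1 = 1, q_2 = a_1, q_{i+2} = a_{i+1} q_{i+1} + q_i.

    `a` is the list [a_1, a_2, ...] of partial quotients.
    """
    q = [1, a[0]]
    while len(q) < count:
        i = len(q) - 1                    # q[i] is the most recent denominator
        q.append(a[i] * q[i] + q[i - 1])
    return q[:count]

def ostrowski_digits(n, a):
    """Digits [b_1, ..., b_L] of n >= 1 in the expansion n = sum b_i q_i of (1.54)."""
    q = denominators(a, len(a) + 1)
    if n >= q[-1]:
        raise ValueError("n is too large: supply more partial quotients")
    b = [0] * len(a)
    for i in range(len(a) - 1, -1, -1):   # greedy: largest denominator first
        b[i], n = divmod(n, q[i])
    return b

a = [2] * 6                               # a_1, ..., a_6 for sqrt(2) = [1; 2, 2, 2, ...]
print(denominators(a, 7))                 # [1, 2, 5, 12, 29, 70, 169]
print(ostrowski_digits(100, a))           # [1, 0, 0, 0, 1, 1], i.e. 100 = 1 + 29 + 70
```

The greedy choice automatically respects the Extra Rule: if $b_i = a_i$, the remainder is smaller than $q_{i+1} - a_i q_i = q_{i-1}$, which forces $b_{i-1} = 0$.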

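Proposition 1.3 (Ostrowski’s explicit formula). Let $q_\ell \le n < q_{\ell+1}$, and write $n = \sum_{1\le i\le \ell} b_i q_i$ as in (1.54). Then
\[ S_\alpha(n) = \sum_{1\le i\le \ell} \mathrm{sign}(\varepsilon_i)\, b_i \left( \frac{b_i q_i |\varepsilon_i|}{2} + \sum_{1\le j<i} b_j q_j |\varepsilon_i| + \frac{|\varepsilon_i|}{2} - \frac{1}{2} \right). \tag{1.55} \]

Proof. […]

Case 1: $\varepsilon_\ell > 0$. In this case
\[ \sum_{k=1}^{b_\ell q_\ell} \left\{ \frac{k p_\ell}{q_\ell} + \frac{k\varepsilon_\ell}{q_\ell} \right\} = \sum_{k=1}^{b_\ell q_\ell} \left\{ \frac{k p_\ell}{q_\ell} \right\} + \sum_{k=1}^{b_\ell q_\ell} \frac{k\varepsilon_\ell}{q_\ell} = \frac{b_\ell q_\ell (q_\ell - 1)}{2 q_\ell} + \frac{b_\ell q_\ell (b_\ell q_\ell + 1)\varepsilon_\ell}{2 q_\ell} = \frac{b_\ell q_\ell}{2} - \frac{b_\ell}{2} + \frac{b_\ell}{2}(b_\ell q_\ell + 1)\varepsilon_\ell. \]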

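In the last step we used the facts that $k\varepsilon_\ell/q_\ell < 1/q_\ell$ for $k \le b_\ell q_\ell$ $(< q_{\ell+1})$ and also that the residue of $kp_\ell$ modulo $q_\ell$, as $k$ runs, is a permutation of $0, 1, 2, \ldots, q_\ell - 1$. Therefore,
\[ S_1 = -\frac{b_\ell}{2} + \frac{b_\ell}{2}(b_\ell q_\ell + 1)\varepsilon_\ell. \]

Case 2: $\varepsilon_\ell < 0$. In this case, if $k \not\equiv 0 \pmod{q_\ell}$, we have
\[ \left\{ \frac{k p_\ell}{q_\ell} + \frac{k\varepsilon_\ell}{q_\ell} \right\} = \left\{ \frac{k p_\ell}{q_\ell} \right\} + \frac{k\varepsilon_\ell}{q_\ell}, \]
since $|k\varepsilon_\ell/q_\ell| < 1/q_\ell$ for $1 \le k \le b_\ell q_\ell$; and if $k \equiv 0 \pmod{q_\ell}$ and $1 \le k \le b_\ell q_\ell$, then
\[ \left\{ \frac{k p_\ell}{q_\ell} + \frac{k\varepsilon_\ell}{q_\ell} \right\} = 1 + \frac{k\varepsilon_\ell}{q_\ell}. \]
Repeating the summations in Case 1, again we have
\[ S_1 = \frac{b_\ell}{2} + \frac{b_\ell}{2}(b_\ell q_\ell + 1)\varepsilon_\ell. \]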



Next we rewrite $S_2$:
\[ S_2 = \sum_{m=1}^{n'} \{ b_\ell q_\ell \alpha + m\alpha \} - \frac{n'}{2}, \qquad\text{where}\quad n' = \sum_{1\le i\le \ell-1} b_i q_i. \]
[…]

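since $b_\ell q_\ell + n' < q_{\ell+1}$ and $|\varepsilon_\ell| < 1/q_{\ell+1}$. Thus we have
\[ \{ b_\ell q_\ell\alpha + m\alpha \} = \{ b_\ell q_\ell\alpha \} + \{ m\alpha \} - 1, \]
which equals $(1 + b_\ell\varepsilon_\ell) + \{m\alpha\} - 1 = b_\ell\varepsilon_\ell + \{m\alpha\}$, and so again we have $S_2 = n' b_\ell \varepsilon_\ell + S_\alpha(n')$.

Summarizing,
\[ S_\alpha(n) = S_\alpha(n') - \frac{b_\ell}{2}\bigl( 1 - b_\ell q_\ell|\varepsilon_\ell| - 2n'|\varepsilon_\ell| - |\varepsilon_\ell| \bigr)\,\mathrm{sign}(\varepsilon_\ell), \]
and Ostrowski’s formula (1.55) follows by induction. □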

Ostrowski used his formula to study the maximum fluctuation of the sum $S_\alpha(n)$ as $\alpha$ is fixed and $n$ runs in a long interval. As an illustration, we mention without proof the following result.

Proposition 1.4 (Ostrowski’s large fluctuation result). Suppose the partial quotients of $\alpha = [a_0; a_1, \ldots]$ form a bounded sequence: $a_i \le A$ for all $i$ (this covers the class of quadratic irrationals). Then there are positive constants $0 < c_1 < 1$ and $c_2 > 0$ (possibly depending on $A$) such that, for every sufficiently large $N$, the interval $c_1 N < n < N$ contains an integer $n_1$ with the property
\[ S_\alpha(n_1) > c_2 \log N, \]
and also the interval $c_1 N < n < N$ contains another integer $n_2$ with
\[ S_\alpha(n_2) < -c_2 \log N. \]

1.2.2 Counting Lattice Points in General

We conclude this section with a short general discussion about lattice point problems. It is fair to say that there is no such thing as a coherent “lattice point theory” (yet). What we have instead are two unrelated subjects: (a) the two famous old lattice point problems and a lot of related partial results and (b) Minkowski’s well-known lattice point theorem(s), as the basic result(s) of the so-called geometry of numbers.

A possible vague description of what “lattice point theory” should mean may go like this: the main question is to determine, or at least estimate, the number of lattice points in a “reasonable” region in the plane and in higher dimensions. Notice that the one-dimensional problem is trivial. The only “reasonable” set in the real line is an interval, and every interval $[a, b) \subset \mathbb{R}$ contains either $\lfloor b - a \rfloor$ or $\lceil b - a \rceil$ integers (lower or upper integral part). By contrast, the two-dimensional problem is far from trivial. What are the “reasonable” sets in the plane? The first novelty here is that we have many natural candidates, such as

1. polygons,
2. smooth regions like the circle, and other quadratic shapes (ellipse, hyperbola), and
3. all convex regions.

Some natural questions have an easy answer (e.g., Pick’s theorem about lattice polygons; see below); other problems are extremely hard and have been open for more than 200 years (e.g., Gauss’s well-known Circle Problem). Theorem 1.1 (or Theorem 1.2) is in the middle in the sense that it is a lattice point counting result that is neither simple nor hopeless. In the rest of the section we collect some simple results that will be repeatedly used later.

1.2.2.1 Pick’s Theorem: Complete Answer for Lattice Polygons

A polygon is called simple if it does not intersect itself. A simple polygon divides the plane into two regions: a bounded and simply connected “inside” (or interior) and an unbounded “outside” (this is a special case of a well-known theorem of Jordan). In the rest a polygon always means a simple polygon. Let $P$ be a lattice polygon, meaning that every vertex of $P$ is a lattice point $(k, l) \in \mathbb{Z}^2$. Let $B(P)$ denote the number of lattice points on the boundary of $P$, let $I(P)$ denote the number of lattice points inside $P$, and finally let $A(P)$ denote the area of $P$.

Proposition 1.5 (Pick’s theorem). Every simple lattice polygon $P$ satisfies the equation
\[ \frac{1}{2} B(P) + I(P) = A(P) + 1. \]

A lattice triangle or parallelogram is called empty if it contains no lattice point inside, and contains, respectively, 3 or 4 lattice points on the boundary (the “vertices”). We have the following simple corollary of Proposition 1.5.

Corollary 1.6. Every empty lattice triangle or parallelogram has area, respectively, 1/2 or 1.

The standard way of proving Pick’s theorem is to prove Corollary 1.6 first and then extend it to arbitrary polygons by induction (since every polygon is a union of triangles).

It is fair to say that Theorem 1.2 (i.e., counting lattice points in right triangles of irrational slope $\alpha$, where one vertex is the origin and one side is on the $x$-axis) is the simplest case beyond Pick’s theorem. And the simplest case already exhibits a central limit theorem.

Pick’s theorem (Proposition 1.5) was an “exact result”; here is another one. Consider the following “half-open” version of the unit square:
\[ P = \{ (x, y) : 0 \le x < 1 \text{ and } 0 \le y < 1 \}. \tag{1.56} \]
In other words, from the closed unit square $[0, 1]^2$ we remove the top side and also the right-hand side; this is how we get $P$. $P$ contains exactly one lattice point (the origin), and every translated copy $P + \mathbf{v}$ of $P$ contains exactly one lattice point in the plane.

Similarly, let $P$ be an arbitrary (not necessarily empty) lattice parallelogram; in fact, we assume that $P$ is “half open” the same way as (1.56). Then again every translated copy $P + \mathbf{v}$ of $P$ contains the same number of lattice points, and the common value is the area of $P$. In general, we can extend it to all centrally symmetric polygons. Indeed, every centrally symmetric polygon can be decomposed into parallelograms; we leave the easy proof to the reader. Thus we obtain the following simple but elegant result.

Proposition 1.7. Let $P$ be a centrally symmetric lattice polygon with half-open border the same way as (1.56). Then every translated copy $P + \mathbf{v}$ of $P$ contains the same number of lattice points, and the common value is the area of $P$. □

Here is another simple result.

Proposition 1.8. Let $A \subset \mathbb{R}^2$ be a Lebesgue measurable set in the plane with finite measure (that we call the “area”). Then
\[ \int_0^1 \!\! \int_0^1 |(A + \mathbf{x}) \cap \mathbb{Z}^2| \, d\mathbf{x} = \mathrm{area}(A), \tag{1.57} \]
where $A + \mathbf{x}$ is the translated copy of the set $A$, translated by the vector $\mathbf{x} \in \mathbb{R}^2$. □
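Both exact results are easy to test on a small example. The sketch below is an illustration only: it takes the lattice parallelogram spanned by $u = (3, 1)$ and $v = (1, 2)$ (our choice, area 5), counts lattice points in the half-open version for two translations, and then checks Pick’s theorem on the closed parallelogram. The function name `lattice_points` and its parameters are hypothetical.

```python
from fractions import Fraction
from math import floor, ceil, gcd

def lattice_points(u, v, shift=(0, 0), half_open=True):
    """Count lattice points z = shift + s*u + t*v with 0 <= s, t < 1 (half_open=True)
    or with 0 < s, t < 1 (half_open=False, i.e. the open interior)."""
    det = u[0] * v[1] - u[1] * v[0]
    if det == 0:
        raise ValueError("u and v must be linearly independent")
    sx, sy = Fraction(shift[0]), Fraction(shift[1])
    xs = [sx + i * u[0] + j * v[0] for i in (0, 1) for j in (0, 1)]
    ys = [sy + i * u[1] + j * v[1] for i in (0, 1) for j in (0, 1)]
    count = 0
    for x in range(floor(min(xs)), ceil(max(xs)) + 1):
        for y in range(floor(min(ys)), ceil(max(ys)) + 1):
            dx, dy = x - sx, y - sy
            s = (dx * v[1] - dy * v[0]) / det     # Cramer's rule, exact arithmetic
            t = (u[0] * dy - u[1] * dx) / det
            if half_open:
                inside = 0 <= s < 1 and 0 <= t < 1
            else:
                inside = 0 < s < 1 and 0 < t < 1
            count += inside
    return count

u, v = (3, 1), (1, 2)
area = abs(u[0] * v[1] - u[1] * v[0])                                # 5
print(lattice_points(u, v))                                          # 5
print(lattice_points(u, v, shift=(Fraction(1, 3), Fraction(2, 7))))  # 5

# Pick's theorem for the closed parallelogram: B/2 + I = A + 1.
boundary = 2 * (gcd(*u) + gcd(*v))                                   # 4
interior = lattice_points(u, v, half_open=False)                     # 4
print(Fraction(boundary, 2) + interior == area + 1)                  # True
```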

Finally, we mention the almost trivial

Proposition 1.9. Let $S \subset \mathbb{R}^2$ be a region inside a simple curve, and assume that the curve has a well-defined finite arc length (= perimeter of $S$). Then
\[ \mathrm{Area}(S) - O(\mathrm{Perimeter}(S)) \le |S \cap \mathbb{Z}^2| \le \mathrm{Area}(S) + O(\mathrm{Perimeter}(S)) + 1. \tag{1.58} \]

Note that Proposition 1.9 is basically best possible. Indeed, let $S$ be the square $[-\varepsilon, n + \varepsilon]^2$: it has area $(n + 2\varepsilon)^2 = n^2 + 4\varepsilon n + 4\varepsilon^2 = n^2 + o(1)$ if $\varepsilon > 0$ is small enough, the perimeter of $S$ is $4n + o(1)$, the number of lattice points inside $S$ is $(n + 1)^2 = n^2 + 2n + 1$, thus we have
\[ \text{number of lattice points inside } S = \mathrm{Area} + \frac{1}{2}\,\mathrm{Perimeter} + 1 + o(1). \tag{1.59} \]

Here $S$ is an axis-parallel square; the situation is completely different for tilted squares where the slope is a (say) quadratic irrational, such as $\sqrt{2}$ or $\sqrt{3}$. Then the maximum fluctuation (around the area) drops from $\pm\mathrm{Perimeter}$ in (1.59) to $\pm\log(\mathrm{Perimeter})$, and the typical fluctuation drops further to $\pm\sqrt{\log(\mathrm{Perimeter})}$. In fact, we have a central limit theorem, a variant of Theorem 1.2; see Sect. 4.5.


1.3 First Warm-Up: Van der Corput Sequence (When Independence Is Given)

In 1935 van der Corput [Co] constructed his famous “digit reversal sequence” $t_0, t_1, t_2, \ldots$, which in many respects can be considered an oversimplified model for the irrational rotation. At the same time, it is the simplest example of a “most uniform” infinite sequence in the unit interval. The van der Corput sequence goes as follows:
\[ 0,\ \tfrac{1}{2},\ \tfrac{1}{4},\ \tfrac{3}{4},\ \tfrac{1}{8},\ \tfrac{5}{8},\ \tfrac{3}{8},\ \tfrac{7}{8},\ \tfrac{1}{16},\ \tfrac{9}{16},\ \tfrac{5}{16},\ \tfrac{13}{16},\ \tfrac{3}{16},\ \tfrac{11}{16},\ \tfrac{7}{16},\ \tfrac{15}{16}, \]
\[ \tfrac{1}{32},\ \tfrac{17}{32},\ \tfrac{9}{32},\ \tfrac{25}{32},\ \tfrac{5}{32},\ \tfrac{21}{32},\ \tfrac{13}{32},\ \tfrac{29}{32},\ \tfrac{3}{32},\ \tfrac{19}{32},\ \tfrac{11}{32},\ \tfrac{27}{32},\ \tfrac{7}{32},\ \tfrac{23}{32},\ \tfrac{15}{32},\ \tfrac{31}{32},\ \ldots \]
Note that here $t_1 = 1/2$ is obtained from $t_0 = 0$ by a shift of $1/2$, then the first two elements $t_0, t_1$ are shifted by $1/4$, then the first four elements $t_0, t_1, t_2, t_3$ are shifted by $1/8$, then the first eight elements are shifted by $1/16$, then the first sixteen elements are shifted by $1/32$, and so on.

An alternative definition of $t_n$ is the following. We write down $n$ in binary form (say, $13 = 8 + 4 + 1 = 1101$), then we write its digits in reverse order and prefix it with “0” and “.” like this:
\[ t_{13} = 0.1011 = \frac{1}{2} + \frac{1}{8} + \frac{1}{16} = \frac{11}{16}. \]
In general, if
\[ n = 2^{k_1} + 2^{k_2} + 2^{k_3} + \cdots \quad\text{with}\quad k_1 > k_2 > k_3 > \cdots \ge 0, \tag{1.60} \]
then
\[ t_n = 2^{-k_1-1} + 2^{-k_2-1} + 2^{-k_3-1} + \cdots. \tag{1.61} \]

The van der Corput sequence $t_0, t_1, t_2, \ldots$ exhibits a clear-cut dyadic nested structure; it is well illustrated by the following three properties of the sequence.

Property A: The set $\{t_i : 0 \le i < 2^k\}$ of the first $2^k$ elements of the van der Corput sequence is the equidistant set $\{j 2^{-k} : 0 \le j < 2^k\}$ in different order.

Property B: Let $I \subset (0, 1)$ be an arbitrary half-open subinterval of length $2^{-k}$ for some integer $k \ge 1$, and let $n$ be an arbitrary integer divisible by $2^k$. Then the number of elements of the set $\{t_i : 0 \le i < n\}$ that fall into the interval $I$ is exactly $n 2^{-k}$.

Property C (“Two Distances”): If $2^k \le n < 2^{k+1}$ then the consecutive points of the set $\{t_i : 0 \le i < n\}$ have at most two distances: $2^{-k}$ and $2^{-k-1}$.
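The digit-reversal definition (1.60)–(1.61) and Properties A–C can be checked directly with exact rational arithmetic. The following sketch is illustrative only; the function name `van_der_corput` and the sample interval are our own choices.

```python
from fractions import Fraction

def van_der_corput(n):
    """t_n: reflect the binary digits of n about the binary point, as in (1.60)-(1.61)."""
    t, weight = Fraction(0), Fraction(1, 2)
    while n:
        n, bit = divmod(n, 2)
        t += bit * weight
        weight /= 2
    return t

print(van_der_corput(13))                          # 11/16

# Property A: the first 2^k terms are the multiples of 2^-k in some order.
k = 4
first = sorted(van_der_corput(i) for i in range(2 ** k))
print(first == [Fraction(j, 2 ** k) for j in range(2 ** k)])     # True

# Property B: a half-open interval of length 2^-k receives exactly n / 2^k
# of the first n points when 2^k divides n.
k, n = 3, 24
left = Fraction(1, 10)
hits = sum(left <= van_der_corput(i) < left + Fraction(1, 2 ** k) for i in range(n))
print(hits)                                        # 3

# Property C: for 2^k <= n < 2^(k+1), consecutive gaps are 2^-k or 2^-(k+1).
def gaps(n):
    pts = sorted(van_der_corput(i) for i in range(n))
    return {b - a for a, b in zip(pts, pts[1:])}

print(gaps(11) <= {Fraction(1, 8), Fraction(1, 16)})             # True
```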



We have the perfect analogs of Properties A–C for the irrational rotation $\alpha, 2\alpha, 3\alpha, \ldots$ (mod 1). The Three-distance theorem mentioned in Sect. 1.1 is an obvious analog of Property C. The Lemma on Bounded Error Initial Segments in Sect. 1.1 is an analog of Property A, and the Lemma on Just Intervals is the analog of Property B. We can say, intuitively speaking, that the van der Corput sequence $t_0, t_1, t_2, \ldots$ behaves like a “fake irrational rotation where $\alpha = \sqrt{2}$ is replaced by $1/2$ (and $1/4$ and $1/8$ and so on).”

Since $t_k$ is uniformly distributed in the unit interval $[0, 1)$ (see Properties A and B), it is natural to take the difference $t_k - 1/2$; in fact, we study the sum
\[ S(n) = \sum_{k=0}^{n-1} \left( t_k - \frac{1}{2} \right), \tag{1.62} \]
which is a perfect analog of sum (1.43). As a warm-up result for Theorem 1.1 (and Theorem 1.2), we are going to prove the following central limit theorem for $S(n)$ as $n$ runs in the interval $0 \le n < 2^m$, where $m \ge 2$ is any integer.

Proposition 1.10 (Central limit theorem for the van der Corput sequence). For any integer $m \ge 2$ and any real numbers $-\infty < A < B < \infty$ […]

[…] $> c_1 \dfrac{\log\log N}{\log\log\log N}$,

where $c_1 > 0$ is a positive absolute constant. After van Aardenne-Ehrenfest’s breakthrough results the main question was

Question 1.12. How large is the max-discrepancy $\Delta(N)$?

In 1972 Schmidt [Schm] settled this problem by proving that
\[ \Delta(N) > c_2 \log N, \]
where $c_2 > 0$ is a positive absolute constant (e.g., $c_2 = 1/50$ is a good choice). The order of magnitude $\log N$ in Schmidt’s theorem is the best possible. There are several infinite sequences with max-discrepancy $\Delta(N) = O(\log N)$; the van der Corput sequence is perhaps the simplest construction. Further examples are the irrational rotation $k\alpha$ (mod 1), $k = 1, 2, 3, \ldots$, where $\alpha$ is any quadratic irrational (this follows from the Discrepancy Lemma in Sect. 1.1, and it goes back to the early works of Hardy–Littlewood [Ha-Li1, Ha-Li2] and Ostrowski [Os]).

Between van Aardenne-Ehrenfest (1945–1949) and W. M. Schmidt (1972), the most important work was done by Roth [Ro], who proved in 1954 that the $L_2$-discrepancy is of order at least $\sqrt{\log N}$. More precisely, let $\alpha_1, \alpha_2, \ldots, \alpha_N$ be an arbitrary $N$-element point set in the unit interval $[0, 1)$, and define the $L_2$-discrepancy as
\[ \Delta_2(N) = \Delta_2(N; \alpha_1, \ldots, \alpha_N) = \left( \frac{1}{N} \sum_{n=1}^{N} \int_0^1 \Bigl( \sum_{\substack{1\le i\le n:\\ 0\le\alpha_i<x}} 1 - nx \Bigr)^2 dx \right)^{1/2}. \]



Roth’s theorem says that for any $N$-element set in the unit interval
\[ \Delta_2(N) = \Delta_2(N; \alpha_1, \ldots, \alpha_N) > c_3 \sqrt{\log N}, \]
where $c_3 > 0$ is a positive absolute constant (e.g., $c_3 = 1/20$ is a good choice). In 1956 Davenport [Da] proved that the order of magnitude $\sqrt{\log N}$ in Roth’s theorem is best possible. Davenport considered the following “symmetric” $2N$-element point set coming from the irrational rotation:
\[ S_\alpha^{\pm} = S_\alpha^{\pm}(N) = \{ k\alpha \ (\mathrm{mod}\ 1) : k = \pm 1, \pm 2, \pm 3, \ldots, \pm N \}, \]
where $\alpha$ is a badly approximable number, meaning that $a_n = O(1)$ where $a_n$ is the $n$th partial quotient in the continued fraction $\alpha = [a_0; a_1, a_2, a_3, \ldots]$ of $\alpha$ (in other words, the partial quotients are bounded; this is certainly the case for the quadratic irrationals, since periodicity implies boundedness). Davenport actually proved that for any $n \ge 2$

0

Z

1=2 0

B B @

X 1knW x k3 > ;

(1.64)



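then by repeated application of (1.64) and (1.61) we have
\[ S(n) = -\frac{1}{2} + \left( -\frac{1}{2} + \frac{1}{2^{k_1-k_2+1}} \right) + \left( -\frac{1}{2} + \frac{1}{2^{k_2-k_3+1}} + \frac{1}{2^{k_1-k_3+1}} \right) + \left( -\frac{1}{2} + \frac{1}{2^{k_3-k_4+1}} + \frac{1}{2^{k_2-k_4+1}} + \frac{1}{2^{k_1-k_4+1}} \right) + \cdots \tag{1.65} \]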

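We prefer to rewrite (1.65) in the following way: if $2^{m-1} \le n < 2^m$ and
\[ n = \delta_0 2^{m-1} + \delta_1 2^{m-2} + \delta_2 2^{m-3} + \cdots + \delta_{m-1}, \tag{1.66} \]
where $\delta_0 = 1$, $\delta_i \in \{0, 1\}$ for $1 \le i \le m - 1$, then
\[ S(n) = -\frac{1}{2} \left( \sum_{i=0}^{m-1} \delta_i - \sum_{0\le i<j<m} \delta_i \delta_j 2^{i-j} \right). \tag{1.67} \]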
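Formula (1.67) can be compared with the direct definition (1.62) of $S(n)$. The sketch below is an illustration under our own naming (`S_direct`, `S_digits`); it allows a leading digit $\delta_0 = 0$, so that $n$ may run over the whole range $0 \le n < 2^m$, and it also compares the average of $S(n)$ over that range with the closed form $-m/8 - 1/4 + 2^{-m-2}$ of (1.69).

```python
from fractions import Fraction

def van_der_corput(n):
    """t_n: reflect the binary digits of n about the binary point."""
    t, weight = Fraction(0), Fraction(1, 2)
    while n:
        n, bit = divmod(n, 2)
        t += bit * weight
        weight /= 2
    return t

def S_direct(n):
    """S(n) = sum_{k=0}^{n-1} (t_k - 1/2), straight from the definition."""
    return sum((van_der_corput(k) - Fraction(1, 2) for k in range(n)), Fraction(0))

def S_digits(n, m):
    """Right-hand side of (1.67) for 0 <= n < 2**m (the leading digit may be 0)."""
    d = [(n >> (m - 1 - i)) & 1 for i in range(m)]        # d[0] is the top digit
    linear = sum(d)
    pairs = sum(d[i] * d[j] * Fraction(1, 2 ** (j - i))
                for i in range(m) for j in range(i + 1, m))
    return Fraction(-1, 2) * (linear - pairs)

m = 6
print(all(S_direct(n) == S_digits(n, m) for n in range(2 ** m)))   # True

mean = sum(S_digits(n, m) for n in range(2 ** m)) / 2 ** m
print(mean == Fraction(-m, 8) - Fraction(1, 4) + Fraction(1, 2 ** (m + 2)))   # True
```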

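Notice that (1.67) is a simplified version of Ostrowski’s formula (1.55). We can clearly extend the range $2^{m-1} \le n < 2^m$ in (1.67) to $0 \le n < 2^m$ by letting $\delta_0 = 0$; then, as long as $n$ runs in the interval $0 \le n < 2^m$, the binary digits $\delta_0 = \delta_0(n)$, $\delta_1 = \delta_1(n)$, $\delta_2 = \delta_2(n)$, $\ldots$, $\delta_{m-1} = \delta_{m-1}(n)$ of $n$ form independent random variables.

It is very easy to evaluate the “expectation”
\[ E_m = 2^{-m} \sum_{n=0}^{2^m-1} S(n) = -\frac{1}{2} \left( \sum_{i=0}^{m-1} \frac{1}{2} - \sum_{0\le i<j<m} \frac{1}{4}\, 2^{i-j} \right), \tag{1.68} \]
since $\Pr[\delta_i\delta_j = 1] = \Pr[\delta_i = 1]\Pr[\delta_j = 1] = 1/2 \cdot 1/2 = 1/4$ and so $\Pr[\delta_i\delta_j = 0] = 3/4$ for all $i \ne j$; here of course Pr stands for probability. We can easily find the exact value of sum (1.68):
\[ E_m = -\frac{m}{4} + \frac{1}{8} \left( \frac{1}{2} + \Bigl( \frac{1}{2} + \frac{1}{4} \Bigr) + \Bigl( \frac{1}{2} + \frac{1}{4} + \frac{1}{8} \Bigr) + \cdots + \Bigl( \frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2^{m-1}} \Bigr) \right) \]
\[ = -\frac{m}{4} + \frac{1}{8} \left( \Bigl( 1 - \frac{1}{2} \Bigr) + \Bigl( 1 - \frac{1}{4} \Bigr) + \Bigl( 1 - \frac{1}{8} \Bigr) + \cdots + \Bigl( 1 - \frac{1}{2^{m-1}} \Bigr) \right) \]
\[ = -\frac{m}{4} + \frac{m-1}{8} - \frac{1}{8}\Bigl( 1 - \frac{1}{2^{m-1}} \Bigr) = -\frac{m}{8} - \frac{1}{4} + 2^{-m-2}. \tag{1.69} \]

Next we compute the “variance” ($E$ means expectation)
\[ V_m = E\bigl( S(n) - E_m \bigr)^2 = \frac{1}{4}\, E \left( \sum_{i=0}^{m-1} \Bigl( \delta_i - \frac{1}{2} \Bigr) - \sum_{0\le i<j<m} \Bigl( \delta_i\delta_j - \frac{1}{4} \Bigr) 2^{i-j} \right)^2 = \mathrm{DiagPart} + \mathrm{OffDiagPart}, \tag{1.70} \]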


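where the diagonal and off-diagonal parts are defined as follows:
\[ \mathrm{DiagPart} = \frac{1}{4}\, E \left( \sum_{i=0}^{m-1} \Bigl( \delta_i - \frac{1}{2} \Bigr)^2 + \sum_{0\le i<j<m} \Bigl( \delta_i\delta_j - \frac{1}{4} \Bigr)^2 4^{i-j} \right) \tag{1.71} \]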

and 1 0 X 1 1 A 1 @ ıi2 ıi1 OffDiagPart D E 2 4 2 2 0i 10

Let n denote the characteristic function of event Fn : n D 1 or 0 depending on whether Fn holds or fails. The sum X n (1.145) XM D 10 0 is arbitrarily small but fixed. Note that the order of magnitude p of log d is roughly around the length of the period of the continued fraction for d . The elegant Hirzebruch-Meyer-Zagier formula (HMZ-formula) was discovered in the 1970s. It states that h.p/ D

\[ h(-p) = \frac{-a_1 + a_2 - a_3 \pm \cdots + a_{2s}}{3}, \tag{2.19} \]
where $p \equiv 3 \pmod 4$ is a prime $> 3$, $h(p) = 1$, and $a_1, a_2, \ldots, a_{2s}$ forms the period of $\sqrt{p}$ (since $p \equiv 3 \pmod 4$, the length of the period has to be even). (Note that both (2.17) and (2.19) fail for $p = 3$, because $\mathbb{Q}(\sqrt{-3})$ has too many automorphisms: 6 instead of the usual 2, a technical nuisance in algebraic number theory.)

Combining the HMZ-formula with (2.14)–(2.17), we conclude
\[ M_{\sqrt{p}}(N) = \frac{h(-p)}{4\log\eta}\,\log N + O\bigl( (\log\log N)^3 \bigr) = \frac{-a_1 + a_2 - \cdots + a_{2s}}{12\log\eta}\,\log N + O\bigl( (\log\log N)^3 \bigr) \]
\[ = \frac{-a_1 + a_2 - a_3 \pm \cdots + (-1)^{\ell} a_\ell}{12} + O\bigl( (\log\log N)^3 \bigr), \tag{2.20} \]
where $\ell$ is the last index for which $q_\ell \le N$ and $\eta$ is the fundamental unit of $\mathbb{Q}(\sqrt{p})$ (in the last equation we heavily used the periodicity of the continued fraction for $\sqrt{p}$).

Summarizing, by using the HMZ-formula, we just managed to prove (2.20), at least under some strong technical conditions (for example, we assumed that $p \equiv 3 \pmod 4$ is a prime $> 3$ with $h(p) = 1$, and also in (2.20) we have the ugly but negligible error term $O((\log\log N)^3)$). Nevertheless, from (2.20) it was quite easy to guess that Proposition 2.1 must hold for arbitrary $\alpha$ (not just for quadratic irrationals), and this is exactly how we came up with the right conjecture (2.4).
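The HMZ-formula (2.19) is easy to test by machine. The sketch below is illustrative only: `sqrt_period` computes one period of the continued fraction of $\sqrt{d}$ by the standard integer recurrence, and `hmz_class_number` (a name of our own) evaluates the alternating sum divided by 3. The comparison values for $h(-p)$ are the standard tabulated class numbers of imaginary quadratic fields, and all primes used satisfy $p \equiv 3 \pmod 4$ with $h(p) = 1$.

```python
from math import isqrt

def sqrt_period(d):
    """One period [a_1, ..., a_L] of sqrt(d) = [a_0; a_1, ..., a_L, a_1, ...]."""
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    period, m, den, a = [], 0, 1, a0
    while a != 2 * a0:                 # the period always ends with 2*a_0
        m = den * a - m
        den = (d - m * m) // den
        a = (a0 + m) // den
        period.append(a)
    return period

def hmz_class_number(p):
    """(-a_1 + a_2 - a_3 + ... + a_{2s}) / 3 for a prime p = 3 (mod 4), p > 3."""
    total = sum((-1) ** j * a for j, a in enumerate(sqrt_period(p), start=1))
    assert total % 3 == 0
    return total // 3

print(sqrt_period(7))                                   # [1, 1, 1, 4]
print([hmz_class_number(p) for p in (7, 11, 19, 23, 31, 47, 71)])
# [1, 1, 1, 3, 3, 5, 7]
```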


Because we know a completely elementary proof of Proposition 2.1, reversing the argument, we can produce an elementary proof for the HMZ-formula. Later we will give a precise proof of (2.12) and (2.14); (2.12) is Proposition 2.16 and (2.14) is Proposition 2.20. (The interested reader can find all the details, and much more, about quadratic fields in the well-written book of Zagier [Za4] (it is in German), or in the classic Borevich–Shafarevich: Number Theory.)

2.1.3 Another Detour: Formulating a “Positivity Conjecture”

The first line in (2.20) raises a very interesting question. If a prime $p$ satisfies the condition of the HMZ-formula, the expectation equals
\[ M_{\sqrt{p}}(N) = \frac{h(-p)}{4\log\eta}\,\log N + \text{negligible error}, \]
where $\eta$ denotes the fundamental unit of $\mathbb{Q}(\sqrt{p})$. Here the class number is trivially $\ge 1$, and also $\eta > 1$, implying $\log\eta > 0$; therefore,
\[ M_{\sqrt{p}}(N) = c\log N + \text{negligible error}, \]
where $c = c(p) > 0$ is a positive constant. By Proposition 2.1, the error term here is in fact $O(1)$, and in general, for any quadratic irrational $\alpha$, $M_\alpha(N) = c\log N + O(1)$, where $c = c(\alpha)$ is a constant (expressed in terms of the period of $\alpha$). Is it true that if $\alpha = \sqrt{d}$, the corresponding constant factor is always nonnegative, that is, $M_{\sqrt{d}}(N) = c\log N + O(1)$ with $c \ge 0$? We guess the answer is “yes,” and we refer to this as the “positivity conjecture.”

If the length of the period of $\sqrt{d}$ is odd, the “positivity conjecture” is trivial. Indeed, by formula (2.4) the corresponding alternating sum “cancels out,” implying that the constant factor is zero, i.e., $M_{\sqrt{d}}(N) = O(1)$ (the same holds for any quadratic irrational with odd period). Thus, the nontrivial case is when the length of the period of $\sqrt{d}$ is even. It is well known that then the period has the symmetric form with a central term
\[ \sqrt{d} = [a_0; \overline{a_1, a_2, \ldots, a_t, a_{t+1}, a_t, \ldots, a_2, a_1, 2a_0}], \]
where $a_0 = \lfloor \sqrt{d} \rfloor$ and $a_{t+1}$ denotes the central term. Applying the alternating sum in formula (2.4), we have


a1 C a2 a3 ˙ C O.1/ D 12 1 1 0 0 t X log N C O.1/: D @2 @ .1/j aj A C .1/t C1at C1 C 2a0 A log j D1 Mpd D

The positivity of the constant factor $c = c(d)$ in $M_{\sqrt{d}}(N) = c\log N + O(1)$ is, therefore, equivalent to the positivity of the alternating sum formed from the period:
\[ 2\sum_{j=0}^{t} (-1)^j a_j + (-1)^{t+1} a_{t+1} > 0. \]
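The alternating sum over one period can be tabulated quickly. The sketch below is only a numerical illustration for non-square $d < 100$ with even period; `sqrt_period` uses the standard integer recurrence for the continued fraction of $\sqrt{d}$, and the name `period_alternating_sum` is ours.

```python
from math import isqrt

def sqrt_period(d):
    """One period [a_1, ..., a_L] of sqrt(d) = [a_0; a_1, ..., a_L, a_1, ...]."""
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    period, m, den, a = [], 0, 1, a0
    while a != 2 * a0:                 # the period always ends with 2*a_0
        m = den * a - m
        den = (d - m * m) // den
        a = (a0 + m) // den
        period.append(a)
    return period

def period_alternating_sum(d):
    """-a_1 + a_2 - a_3 + ... taken over one period of sqrt(d)."""
    return sum((-1) ** j * a for j, a in enumerate(sqrt_period(d), start=1))

even_period = {d: period_alternating_sum(d)
               for d in range(2, 100)
               if isqrt(d) ** 2 != d and len(sqrt_period(d)) % 2 == 0}

print(even_period[7], even_period[46], even_period[71])   # 3 18 21
print(all(s > 0 for s in even_period.values()))           # True
```

For an even period of length $2t + 2$ this sum coincides with $2\sum_{j=0}^{t} (-1)^j a_j + (-1)^{t+1} a_{t+1}$, because the period is symmetric and ends with $2a_0$.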

We checked the tables for $d < 100$, and this alternating sum is indeed positive when the period of $\sqrt{d}$ is even. Since the “positivity conjecture” is certainly not true for arbitrary quadratic irrational $\alpha$, its hypothetical truth in the special case $\alpha = \sqrt{d}$ is probably closely related to the arithmetic of the real quadratic field $\mathbb{Q}(\sqrt{d})$ (or perhaps the complex field $\mathbb{Q}(\sqrt{-d})$).

Let’s return now to Proposition 2.1. We include an elementary (but far from easy) proof.

Proof of Proposition 2.1. We use Dedekind sums. To explain where the Dedekind sum comes from, we rewrite (2.1) and (2.2) in the following form:
\[ M_\alpha(N) = \frac{1}{N} \sum_{k=1}^{N} (N + 1 - k) \left( \{k\alpha\} - \frac{1}{2} \right) = \left( \frac{N+1}{N} - \frac{1}{2} \right) \sum_{k=1}^{N} \left( \{k\alpha\} - \frac{1}{2} \right) - \sum_{k=1}^{N} \left( \frac{k}{N} - \frac{1}{2} \right) \left( \{k\alpha\} - \frac{1}{2} \right), \tag{2.21} \]
where the last sum
\[ \sum_{k=1}^{N} \left( \frac{k}{N} - \frac{1}{2} \right) \left( \{k\alpha\} - \frac{1}{2} \right) \]
in (2.21) strongly resembles a Dedekind sum
\[ D(H, K) = \sum_{j=1}^{K-1} \left( \frac{j}{K} - \frac{1}{2} \right) \left( \left\{ \frac{jH}{K} \right\} - \frac{1}{2} \right), \tag{2.22} \]
where we always assume that $H$ and $K \ge 1$ are relatively prime integers.



Dedekind sums [i.e., (2.22)] originally appeared in Dedekind’s study of elliptic functions and theta-functions. Luckily we don’t need to know anything about these (rather technical) subjects; we can just work with definition (2.22). The key fact about Dedekind sums is the following reciprocity formula, a highly surprising and nontrivial result.

Lemma 2.3 (Dedekind’s reciprocity formula). We have
\[ D(H, K) + D(K, H) = \frac{1}{12} \left( \frac{H}{K} + \frac{K}{H} + \frac{1}{HK} \right) - \frac{1}{4}. \tag{2.23} \]
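Definition (2.22) and the reciprocity formula (2.23) can be verified exactly for small arguments. The sketch below is illustrative only; the function name `dedekind` is ours, and the computation uses the fact that $\{jH/K\} = (jH \bmod K)/K$.

```python
from fractions import Fraction
from math import gcd

def dedekind(h, k):
    """D(h, k) = sum_{j=1}^{k-1} (j/k - 1/2) * ({j*h/k} - 1/2), as an exact fraction."""
    half = Fraction(1, 2)
    return sum(((Fraction(j, k) - half) * (Fraction(j * h % k, k) - half)
                for j in range(1, k)), Fraction(0))

def reciprocity_rhs(h, k):
    """Right-hand side of (2.23)."""
    return Fraction(1, 12) * (Fraction(h, k) + Fraction(k, h) + Fraction(1, h * k)) - Fraction(1, 4)

print(dedekind(2, 5))                                            # 0
print(dedekind(3, 7) + dedekind(7, 3) == reciprocity_rhs(3, 7))  # True

ok = all(dedekind(h, k) + dedekind(k, h) == reciprocity_rhs(h, k)
         for h in range(1, 30) for k in range(1, 30) if gcd(h, k) == 1)
print(ok)                                                        # True
```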

Note that the definition of $D(H, K)$ and $D(K, H)$ automatically includes the condition that “$H \ge 1$ and $K \ge 1$ are relatively prime integers.” For a proof of this classical result, see, e.g., the book [Ra-Gr]. From Lemma 2.3 we will derive

Lemma 2.4. If $1 \le H < K$ are relatively prime then
\[ D(H, K) = \frac{a_1 - a_2 + a_3 - \cdots + (-1)^{\ell-1} a_\ell}{12} + O(1), \tag{2.24} \]
where
\[ \frac{H}{K} = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}} = [a_1, a_2, a_3, \ldots, a_\ell]. \tag{2.25} \]
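Lemma 2.4 can be explored numerically as well. In the sketch below (an illustration, with function names of our own) the partial quotients come from the Euclidean algorithm, so the last quotient is at least 2 whenever $K > 1$; we then measure how far $D(H, K)$ is from the alternating sum divided by 12.

```python
from fractions import Fraction
from math import gcd

def dedekind(h, k):
    """D(h, k) = sum_{j=1}^{k-1} (j/k - 1/2) * ({j*h/k} - 1/2), as an exact fraction."""
    half = Fraction(1, 2)
    return sum(((Fraction(j, k) - half) * (Fraction(j * h % k, k) - half)
                for j in range(1, k)), Fraction(0))

def quotients(h, k):
    """[a_1, ..., a_l] from k = a_1*h + h_1, h = a_2*h_1 + h_2, ... (Euclidean algorithm)."""
    a = []
    while h:
        a.append(k // h)
        k, h = h, k % h
    return a

def error_term(h, k):
    """D(h, k) minus (a_1 - a_2 + a_3 - ...)/12."""
    alternating = sum((-1) ** i * a for i, a in enumerate(quotients(h, k)))
    return dedekind(h, k) - Fraction(alternating, 12)

print(quotients(2, 5))            # [2, 2]
print(error_term(2, 5))           # 0

worst = max(abs(error_term(h, k))
            for k in range(2, 40) for h in range(1, k) if gcd(h, k) == 1)
print(worst <= Fraction(1, 4))    # True
```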

Note that the error term $O(1)$ in (2.24) has absolute value $\le 1/4$.

Proof. The continued fraction $H/K = [a_1, a_2, a_3, \ldots, a_\ell]$ is equivalent to the Euclidean algorithm
\[ K = a_1 H + H_1, \quad H = a_2 H_1 + H_2, \quad H_1 = a_3 H_2 + H_3, \quad \ldots, \quad H_{\ell-2} = a_\ell H_{\ell-1}, \]
where $H_{\ell-1} = \gcd(H, K) = 1$ (gcd denotes the greatest common divisor). We apply Lemma 2.3 with the short notation
\[ g(x, y) = \frac{1}{12} \left( \frac{x}{y} + \frac{y}{x} + \frac{1}{xy} \right) - \frac{1}{4} \]
as follows: write $K = H_{-1}$, $H = H_0$, then
\[ D(H, K) = D(H_0, H_{-1}) = g(H_{-1}, H_0) - D(H_{-1}, H_0) = g(H_{-1}, H_0) - D(H_1, H_0); \]



here we used the first equation of the Euclidean algorithm. Repeating the same argument, we have
\[ D(H, K) = g(H_{-1}, H_0) - D(H_1, H_0) = g(H_{-1}, H_0) - \bigl( g(H_0, H_1) - D(H_0, H_1) \bigr) = g(H_{-1}, H_0) - g(H_0, H_1) + D(H_2, H_1); \]
here we used the second equation of the Euclidean algorithm. Repeating the same argument several times, we have
\[ D(H, K) = g(H_{-1}, H_0) - g(H_0, H_1) + g(H_1, H_2) - g(H_2, H_3) \pm \cdots + (-1)^{\ell-1} g(H_{\ell-2}, H_{\ell-1}) + (-1)^{\ell} D(H_{\ell-2}, H_{\ell-1}). \]
Note that the last term here is in fact zero; indeed, $H_{\ell-1} = \gcd(H, K) = 1$ implies that $D(H_{\ell-2}, H_{\ell-1}) = 0$. Moreover, by using the notation
\[ f(x, y) = \frac{x}{y} + \frac{y}{x}, \]
we have (with the convention $H_\ell = 0$)
\[ \sum_{i=0}^{\ell-1} (-1)^i f(H_{i-1}, H_i) = \sum_{i=0}^{\ell-1} (-1)^i \left( \frac{H_{i-1}}{H_i} + \frac{H_i}{H_{i-1}} \right) = \frac{H_0}{H_{-1}} + \sum_{i=0}^{\ell-1} (-1)^i \frac{H_{i-1} - H_{i+1}}{H_i} \]
\[ = \frac{H}{K} + \sum_{i=0}^{\ell-1} (-1)^i \frac{a_{i+1} H_i}{H_i} = \frac{H}{K} + \sum_{i=0}^{\ell-1} (-1)^i a_{i+1}. \]
Since
\[ g(x, y) = \frac{1}{12} f(x, y) + \frac{1}{12xy} - \frac{1}{4}, \]


combining the facts above, we conclude
\[ D(H, K) = g(H_{-1}, H_0) - g(H_0, H_1) + g(H_1, H_2) - g(H_2, H_3) \pm \cdots + (-1)^{\ell-1} g(H_{\ell-2}, H_{\ell-1}) \]
\[ = \frac{H}{12K} - \frac{1 + (-1)^{\ell-1}}{8} + \frac{a_1 - a_2 + a_3 - \cdots + (-1)^{\ell-1} a_\ell}{12} + \frac{1}{12} \left( \frac{1}{KH} - \frac{1}{HH_1} + \frac{1}{H_1H_2} - \cdots + \frac{(-1)^{\ell-1}}{H_{\ell-2}H_{\ell-1}} \right). \]
The last term (the alternating sum together with its factor $1/12$) has absolute value $\le 1/12$, and because $1 \le H < K$, the total error is at most $\max\{1/4, 1/12 + 1/12\} = 1/4$, completing the deduction of Lemma 2.4 from Lemma 2.3. □

Next we derive Proposition 2.1 from Lemma 2.4 in the special case $N = q_r$, i.e., when $N$ happens to be a convergent denominator of $\alpha$; see Lemma 2.5. But first we introduce a notation that simplifies the treatment of Dedekind sums. Let
\[ ((x)) = \begin{cases} \{x\} - \frac{1}{2}, & \text{if } x \text{ is not an integer;} \\ 0, & \text{otherwise.} \end{cases} \]

Note that $y = ((x))$ is usually called the “sawtooth function.” By using this new notation, we can rewrite (2.22) in a shorter form:
\[ D(H, K) = \sum_{j=1}^{K-1} \left(\!\!\left( \frac{j}{K} \right)\!\!\right) \left(\!\!\left( \frac{jH}{K} \right)\!\!\right), \tag{2.26} \]
where, as usual, we assume that $H$ and $K \ge 1$ are relatively prime integers. Notice that extending the summation in (2.26) from 1 to $K$ makes no difference (it just adds a zero to the sum).

Now we are ready to formulate and prove an important special case of Proposition 2.1.

Lemma 2.5. We have
\[ M_\alpha(q_r) = \frac{-a_1 + a_2 - a_3 \pm \cdots + (-1)^{r-1} a_{r-1}}{12} + O(1), \tag{2.27} \]
where $\alpha = [a_1, a_2, a_3, \ldots]$ and $p_r/q_r = [a_1, a_2, \ldots, a_{r-1}]$ is the $r$th convergent of $\alpha$. The implicit error term $O(1)$ is less than 5 in absolute value for all $\alpha$ and $r$.
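A floating-point experiment gives a feeling for Lemma 2.5. The sketch below is an illustration under our own assumptions: we take the periodic continued fraction $\alpha = [1, 50, 1, 50, \ldots]$, which satisfies $\alpha^2 + 50\alpha - 50 = 0$, so $\alpha = \sqrt{675} - 25$; the function name `M` is ours, and double precision is assumed to be accurate enough for $N$ in the low thousands.

```python
from math import sqrt

def M(alpha, N):
    """M_alpha(N) = (1/N) * sum_{k=1}^{N} (N + 1 - k) * ({k*alpha} - 1/2)."""
    total = 0.0
    for k in range(1, N + 1):
        total += (N + 1 - k) * ((k * alpha) % 1.0 - 0.5)
    return total / N

a = [1, 50, 1, 50]                      # a_1, ..., a_4
alpha = sqrt(675) - 25                  # [1, 50, 1, 50, 1, 50, ...]

# Denominators with q_1 = 1, q_2 = a_1, q_{i+2} = a_{i+1} q_{i+1} + q_i.
q = [1, a[0]]
for i in range(1, len(a)):
    q.append(a[i] * q[i] + q[i - 1])
print(q)                                # [1, 1, 51, 52, 2651]

predicted = sum((-1) ** j * aj for j, aj in enumerate(a, start=1)) / 12
print(predicted)                        # (-1 + 50 - 1 + 50)/12, about 8.17
print(abs(M(alpha, q[-1]) - predicted) < 5)
```

Here $q_5 = 2651$ is the denominator of $[a_1, \ldots, a_4]$, so the prediction of (2.27) with $r = 5$ is $(-a_1 + a_2 - a_3 + a_4)/12$.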



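Proof. We recall (2.21) with $N = q_r$:
\[ M_\alpha(q_r) = \left( \frac{q_r + 1}{q_r} - \frac{1}{2} \right) \sum_{k=1}^{q_r} \left( \{k\alpha\} - \frac{1}{2} \right) - \sum_{k=1}^{q_r} \left( \frac{k}{q_r} - \frac{1}{2} \right) \left( \{k\alpha\} - \frac{1}{2} \right). \tag{2.28} \]
First we focus on the following subsum of (2.28):
\[ S^* = \sum_{k=1}^{q_r} \left( \frac{k}{q_r} - \frac{1}{2} \right) \left( \{k\alpha\} - \frac{1}{2} \right) = \sum_{k=1}^{q_r} \left(\!\!\left( \frac{k}{q_r} \right)\!\!\right) ((k\alpha)). \tag{2.29} \]
We compare $S^*$ to the Dedekind sum
\[ D(p_r, q_r) = \sum_{k=1}^{q_r} \left(\!\!\left( \frac{k}{q_r} \right)\!\!\right) \left(\!\!\left( \frac{k p_r}{q_r} \right)\!\!\right), \tag{2.30} \]
where $p_r/q_r$ is the $r$th convergent of $\alpha$. We recall the well-known fact from diophantine approximation that
\[ \left| \alpha - \frac{p_r}{q_r} \right| < \frac{1}{q_r^2}, \]
which implies that the inequality
\[ \left| k\alpha - \frac{k p_r}{q_r} \right| < \frac{k}{q_r^2} \le \frac{1}{q_r} \tag{2.31} \]
holds for all $1 \le k \le q_r$. By (2.31) we have
\[ |S^* - D(p_r, q_r)| < 1. \tag{2.32} \]
On the other hand, by Lemma 2.4,
\[ \left| D(p_r, q_r) - \frac{a_1 - a_2 + a_3 - \cdots + (-1)^r a_{r-1}}{12} \right| \le \frac{1}{4}. \tag{2.33} \]
Combining (2.32) and (2.33) we have
\[ \left| S^* - \frac{a_1 - a_2 + a_3 - \cdots + (-1)^r a_{r-1}}{12} \right| \le \frac{1}{4} + 1 = \frac{5}{4}. \tag{2.34} \]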


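Another application of (2.31) gives
\[ \left| \sum_{k=1}^{q_r-1} \bigl( \{k\alpha\} - 1/2 \bigr) \right| \le \left| \sum_{j=1}^{q_r-1} \left( \frac{j}{q_r} \pm \frac{1}{q_r} - \frac{1}{2} \right) \right| \le \left| \sum_{j=1}^{q_r-1} \left( \frac{j}{q_r} - \frac{1}{2} \right) \right| + q_r \cdot \frac{1}{q_r} = 0 + 1 = 1. \tag{2.35} \]
Applying (2.34) and (2.35) in (2.28), we conclude that
\[ \left| M_\alpha(q_r) + \frac{a_1 - a_2 + a_3 - \cdots + (-1)^r a_{r-1}}{12} \right| \le \frac{5}{4} + \left| \frac{q_r + 1}{q_r} - \frac{1}{2} \right| + \left| \frac{q_r + 1}{q_r} - \frac{1}{2} \right| \cdot \left| \{q_r\alpha\} - \frac{1}{2} \right| \le \frac{5}{4} + 2 \left| \frac{q_r + 1}{q_r} - \frac{1}{2} \right| < 5, \]
and Lemma 2.5 follows. □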

The last step is to derive the general Proposition 2.1 from the special case Lemma 2.5. There are many ways to reduce the general case to Lemma 2.5; see, e.g., Beck [Be4]. Here we follow a nice idea of Schoissengeier [Scho], involving telescoping sums, which seems to be the best treatment of the general case.

Let $N \ge 1$ be an arbitrary integer. Consider the Ostrowski expansion of $N$ [see (1.54)]:
\[ N = \sum_{i=1}^{r} b_i q_i, \quad\text{where } 0 \le b_i \le a_i \text{ and } b_i = a_i \text{ implies } b_{i-1} = 0 \text{ (“Extra Rule”).} \tag{2.36} \]
Here $a_i$ is the $i$th partial quotient of the continued fraction of $\alpha = [a_1, a_2, a_3, \ldots]$ and $p_i/q_i = [a_1, \ldots, a_{i-1}]$ is the $i$th convergent of $\alpha$. We are motivated by the following telescoping sum equation:
\[ \sum_{i=1}^{N} \frac{N + 1 - i}{N} \left(\!\!\left( \frac{i p_r}{q_r} \right)\!\!\right) = \frac{1}{N} \sum_{k=1}^{r} \left( \sum_{i=1}^{N_k} (N_k + 1 - i) \left(\!\!\left( \frac{i p_k}{q_k} \right)\!\!\right) - \sum_{j=1}^{N_{k-1}} (N_{k-1} + 1 - j) \left(\!\!\left( \frac{j p_{k-1}}{q_{k-1}} \right)\!\!\right) \right), \tag{2.37} \]
where $N_k$ is the $k$th partial sum of (2.36): $N_k = \sum_{i=1}^{k} b_i q_i$.



We are going to evaluate the terms of the telescoping sum (2.37). The next lemma, clearly motivated by Eq. (2.37), can be considered as a generalization, or new version, of Lemma 2.5. The idea is to involve the Dedekind sum D.pk ; qk /, just like we did in the proof of Lemma 2.5. Pj Lemma 2.6. If Nj D i D1 bi qi then Nk X

.Nk C 1 i /

i D1

D bk qk D.pk ; qk / C

ipk qk

N k1 X

.Nk1 C 1 j /

j D1

jpk1 qk1

D

bk1 .1 C .1/k /.2Nk1 C 1 .bk1 C 1/qk1 /C 4

C .1/kC1

Nk1 .Nk1 C 1/.Nk1 C 2/ : 6qk qk1

(2.38)

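Proof of Lemma 2.6. We basically repeat the proof of Lemma 2.5. Write
\[ \sum_{i=1}^{N_k} (N_k + 1 - i) \left(\!\!\left( \frac{i p_k}{q_k} \right)\!\!\right) = \Sigma_1 + \Sigma_2, \tag{2.39} \]
where
\[ \Sigma_1 = \sum_{i=1}^{b_k q_k} (N_k + 1 - i) \left(\!\!\left( \frac{i p_k}{q_k} \right)\!\!\right) \quad\text{and}\quad \Sigma_2 = \sum_{i=b_k q_k+1}^{N_k} (N_k + 1 - i) \left(\!\!\left( \frac{i p_k}{q_k} \right)\!\!\right). \]
We evaluate $\Sigma_1$ first. Since $((x)) = 0$ if $x$ is an integer, we take out the $i$’s that are divisible by $q_k$:
\[ \Sigma_1 = \sum_{t=0}^{b_k-1} \sum_{i=tq_k+1}^{(t+1)q_k-1} (N_k + 1 - i) \left(\!\!\left( \frac{i p_k}{q_k} \right)\!\!\right) = \sum_{t=0}^{b_k-1} \sum_{j=1}^{q_k-1} (N_k + 1 - tq_k - j) \left(\!\!\left( \frac{j p_k}{q_k} \right)\!\!\right) = -b_k \sum_{j=1}^{q_k-1} j \left(\!\!\left( \frac{j p_k}{q_k} \right)\!\!\right), \tag{2.40} \]
since
\[ \sum_{j=1}^{K-1} \left(\!\!\left( \frac{jH}{K} \right)\!\!\right) = 0. \]
Thus by (2.40),
\[ \Sigma_1 = -b_k q_k \sum_{j=1}^{q_k-1} \left( \frac{j}{q_k} - \frac{1}{2} \right) \left(\!\!\left( \frac{j p_k}{q_k} \right)\!\!\right) = -b_k q_k D(p_k, q_k), \tag{2.41} \]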

justifying the first term Pon the Pright-handPside of (2.38). P Next we evaluate 2 3 , where 2 is the second term in (2.39) and 3 is the negative term on the left-hand side of (2.38): X 3

D

N k1 X

jpk1 qk1

.Nk1 C 1 j /

j D1

:

(2.42)

We recall the well-known fact from the theory of continued fraction: pk1 .1/k1 pk D C ; qk qk1 qk1 qk

(2.43)

and so, if j Nk1 then

jpk qk

D

.1/k1 j jpk1 C qk1 qk1 qk

D

jpk1 qk1

C

.1/k1 j ; qk1 qk

(2.44)

when j is not divisible by qk1 , and

jpk qk

D

jpk1 qk1

C

.1/k1 j 1 C .1/k1 ; C qk1 qk 2

when j is divisible by qk1 . Thus we can rewrite

Nk X

.Nk C 1 i /

i Dbk qk C1

D

N k1 X

2

ipk qk

.Nk bk qk C 1 j /

j D1

D

P

N k1 X j D1

.Nk C 1 j /

[see (2.39)] in the form D

jpk1 qk1

jpk1 qk1

D

;

(2.45)



and applying (2.44) and (2.45) we have [note that X 2

C bk1

D

P 3

is defined in (2.42)]

Nk1 .1/k1 j X C .Nk C 1 j /j C 3 qk1 qk j D1

X

.bk1 C 1/qk1 Nk1 C 1 : 2

1 C .1/k1 2

(2.46) t u

Combining (2.41), (2.42), and (2.46), Lemma 2.6 follows.

By using Lemma 2.6, we are ready to complete the proofPof Proposition 2.1. Let’s return to (2.36). First we extend the definition of Nk D kiD1 bk qk for all k > r in the trivial way: put bi D 0 for i > r. We sum up both sides of Lemma 2.6 as k D 1; 2; 3; : : :; the left-hand side of (2.38) gives r X

.N C 1 k/..k˛//;

(2.47)

kD1

and the right hand side of (2.38) gives X

C

1

X 1

X 2

D

r X bj j D1

X 3

D

r X

.1/j

j D1

4

D

X

D

2

C

r X

X 3

where

(2.48)

bi qi D.pi ; qi /;

i D1

.1 C .1/j C1 /.2Nj C 1 .bj C 1/qj /;

1 X

.1/j

j D1

Nj .Nj C 1/.Nj C 2/ D 6qj qj C1

Nj .Nj C 1/.Nj C 2/ N.N C 1/.N C 2/ prC1 C ˛ ; 6qj qj C1 6 qrC1

where in the last step we used (2.43) and the fact pi =qi ! ˛ as i ! 1.


First we evaluate r X

P

1.

By Lemma 2.4,

bi qi D.pi ; qi / D

r X

i D1

bi qi

i D1

D


r X .1/j aj 1 j D1

12

a1 a2 ˙ C .1/i ai 1 C 12 4

.N Nj 1 / C

D

N D 4

0 1 r r X .1/j aj 1 X .1/j 1 aj 1 Nj 1 DN@ C C A; 12 12 N 4 j D1 j D1

(2.49)

where ji j < 1 and jj < 1 are appropriate constants. Since the sequence Nj D Pj i D1 bi qi increases at least exponentially fast, an upper bound like k X

Ni 4NkC1

(2.50)

i D1

is trivial. Combining (2.49) and (2.50), a1 a2 ˙ C .1/r ar1 C 0 . max aj / C 00 ; 1j r 12 i D1 (2.51) 00 where j 0 j 4 and jP j 1=4. Next we estimate 2 from above: r X

bi qi D.pi ; qi / D N

X 2

r r X 1X 1 bi Ni . max aj / Ni 3N. max aj /; 1j r 2 i D1 2 1j r i D1

(2.52)

where in the last step wePused (2.50). Finally, we estimate 3 from above. Since Nj D

j X

bi qi and qj C1 aj qj bj qj ;

i D1

we have ˇ ˇ ˇ ˇ r r ˇ X ˇX N .N C 1/.N C 2/ j ˇ ˇ .1/j j j .bj C 1/2 qj 2N. max aj /: ˇ ˇ 1j r 6qj qj C1 ˇ j D1 ˇj D1

(2.53)



We also have N.N C 1/.N C 2/ 6

ˇ ˇ ˇ N3 N prC1 ˇˇ ˇ : ˇ˛ ˇ 2 qrC1 3 3qrC1

(2.54)

Combining (2.47), (2.48), (2.51)–(2.54), we obtain r 1 X .N C 1 k/..k˛// D M˛ .N / D N kD1

D

a1 a2 ˙ C .1/r ar1 C . max aj /; 1j r 12

(2.55)

where the coefficient of $\max_{1\le j\le r} a_j$ has absolute value less than 10. Equation (2.55) completes the proof of Proposition 2.1. □

Note that our original proof of Proposition 2.1 was a much longer, brute force deduction from Ostrowski’s formula (1.55) (see [Be2, Be3]). Later Schoissengeier [Scho] pointed out the connection with Dedekind sums and some related results of Knuth [Kn1], which made the proof substantially shorter. The proof above follows the Schoissengeier–Knuth approach.

2.1.4 Proposition 2.1 and Some Works of Hardy and Littlewood It is interesting to note that, a few weeks after we completed our proof of Proposition 2.1 (November 1995), we accidentally noticed the following technical lemma in Hardy–Littlewood [Ha-Li2]. “Lemma 14”: If ˛ D Œa0 I a1 ; a2 ; then l 1 X 1 k 2 .1/ ˛i C C O . max ai / ; M˛ .N / D 1i l 12 i D1 ˛i where l is the least index such that ql N , and ˛i D ai C

1 1 ai C1 C ai C2 C

D Œai I ai C1 ; ai C2 ; :

(2.56)



By using the trivial identity $\alpha_i = a_i + \frac{1}{\alpha_{i+1}}$, the alternating sum in “Lemma 14” becomes
\[ -\left( \alpha_1 + \frac{1}{\alpha_1} \right) + \left( \alpha_2 + \frac{1}{\alpha_2} \right) - \left( \alpha_3 + \frac{1}{\alpha_3} \right) \pm \cdots = -a_1 + a_2 - a_3 \pm \cdots + (-1)^i a_i \pm \cdots. \tag{2.57} \]
The surprising conclusion is that from “Lemma 14” we can obtain a somewhat weaker version of Proposition 2.1 in one line. Note that (2.56) is weaker, because the error term $O\bigl( (\max_{1\le i\le l} a_i)^2 \bigr)$ is the square of the linear error term $O(\max_{1\le i\le l} a_i)$ in Proposition 2.1. Note that Hardy and Littlewood proved their “Lemma 14” by using a different kind of reciprocity formula (namely, the reciprocity formula for the theta functions).

A related development is that, about 10 years later, in 1930, Hardy and Littlewood [Ha-Li3] studied the following (diophantine) series:
\[ \sum_{n=1}^{\infty} \frac{1}{n \sin(\pi n\alpha)} \tag{2.58} \]
and made a very interesting discovery. Though the terms of the series (2.58) do not tend to zero for any $\alpha$, Hardy and Littlewood managed to prove the next best thing; namely, that for the special value $\alpha = \sqrt{2}$ the partial sums of (2.58) remain uniformly bounded, i.e.,
\[ \sum_{n=1}^{N} \frac{1}{n \sin(\pi n\alpha)} = O(1). \tag{2.59} \]

In general, if $\alpha = \sqrt{a^2+1}$, $a$ is odd, then the partial sums are similarly $O(1)$. On the other hand, Hardy and Littlewood noticed that for $\alpha = \sqrt{6}/2 - 1$ the $N$th partial sum is $c\log N + O(1)$ with $c \ne 0$.

What is going on here? The proof of the "$O(1)$-theorem" for $\alpha = \sqrt{a^2+1}$, $a$ is odd, was so complicated, mysterious, and ad hoc that in his Introduction to the Collected Papers of G.H. Hardy, Vol. 1, Davenport listed the "real understanding" of this paper as a major research problem in diophantine approximation.

Now here is our "real understanding": the "$O(1)$-theorem" of Hardy and Littlewood is a simple corollary of Proposition 2.1. Indeed, all that we need is the simple identity

$$\sum_{n=1}^{N} \frac{1}{n\sin(\pi n\alpha)} = 2\pi M_\alpha(N) - 2\pi M_{\alpha/2}(N) + O\Big(\max_{1\le i\le l} a_i\Big), \tag{2.60}$$

where $l$ is the last index such that $q_l \le N$.

Equation (2.60) is an easy consequence of two facts. The first one is (2.12):

$$M_\alpha(N) = -\frac{1}{2\pi}\sum_{n=1}^{N} \frac{1}{n\tan(\pi n\alpha)} + O\Big(\max_{1\le i\le k} a_i\Big),$$

where $k$ is the last index for which $q_k \le N$, and the second fact is a simple trigonometric identity:

$$\frac{1}{\tan(\beta)} - \frac{1}{\tan(2\beta)} = \frac{2\cos^2(\beta) - \cos(2\beta)}{2\sin(\beta)\cos(\beta)} = \frac{1}{\sin(2\beta)}.$$

Applying the identity with $\beta = \pi n\alpha/2$ and then (2.12) for both $\alpha/2$ and $\alpha$ gives (2.60). It seems very likely that Hardy and Littlewood overlooked the simple application of Proposition 2.1 via (2.60) (the weaker error term (2.56) would be fine here). This is why they had to develop a complicated ad hoc method in [Ha-Li3].

We will return to the Hardy–Littlewood series $\sum_n 1/(n\sin(\pi n\alpha))$ in Sect. 2.3.
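The trigonometric identity and the behaviour of the partial sums can be explored numerically. The following is only a floating-point sketch, not part of the argument: the function names `cot_difference` and `hl_partial_sums` are hypothetical, and double precision limits how far $N$ can sensibly be pushed. It compares $\alpha=\sqrt{2}$ (the case $a=1$, odd) with $\alpha=\sqrt{5}=\sqrt{2^2+1}$ (an even case).

```python
import math

def cot_difference(beta):
    """Left side of the identity: 1/tan(beta) - 1/tan(2*beta)."""
    return 1.0 / math.tan(beta) - 1.0 / math.tan(2.0 * beta)

def hl_partial_sums(alpha, N):
    """Partial sums of sum_{n>=1} 1/(n*sin(pi*n*alpha)), for n = 1..N."""
    sums, total = [], 0.0
    for n in range(1, N + 1):
        total += 1.0 / (n * math.sin(math.pi * n * alpha))
        sums.append(total)
    return sums

if __name__ == "__main__":
    # The identity 1/tan(b) - 1/tan(2b) = 1/sin(2b) at a few sample angles.
    for beta in (0.3, 1.1, 2.5):
        print(beta, cot_difference(beta), 1.0 / math.sin(2.0 * beta))
    # Partial sums at a few values of N for an odd and an even case.
    for alpha, name in ((math.sqrt(2), "sqrt(2)"), (math.sqrt(5), "sqrt(5)")):
        s = hl_partial_sums(alpha, 20000)
        print(name, [round(s[N - 1], 3) for N in (10, 100, 1000, 20000)])
```

Printing the sums at several values of $N$ makes it possible to compare the two regimes described above (bounded versus logarithmic growth) without relying on any particular numerical constant.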

2.2 Computing the Expectation in General (II)

2.2.1 The Expectation in Theorem 1.1

Next we switch from the saw-tooth function $((x))$ to the characteristic function

$$\chi_\gamma(x) = \begin{cases} 1, & \text{if } 0 \le x < \gamma; \\ 0, & \text{if } \gamma \le x < 1, \end{cases} \tag{2.61}$$

of the interval $[0,\gamma)$, where $0 < \gamma < 1$, and extend it periodically modulo 1. Then, apart from the points of discontinuity, we get the simple equation

$$\chi_\gamma(x) - \gamma = ((x-\gamma)) - ((x)). \tag{2.62}$$

The sum

$$\sum_{k=1}^{n} \chi_\gamma(k\alpha)$$

is the counting function for the irrational rotation: it counts the integers $k$ in $1 \le k \le n$ for which $k\alpha \in [0,\gamma)$ modulo 1. Theorem 1.1 is about this counting function. Therefore, to prove Theorem 1.1, we have to determine the corresponding expectation: by (2.62) we need to evaluate the generalized Dedekind sum

$$D(H,K;c) = \sum_{j=1}^{K-1} \Big(\!\Big(\frac{j}{K}\Big)\!\Big)\Big(\!\Big(\frac{jH+c}{K}\Big)\!\Big), \tag{2.63}$$

where $c$, the "shift constant," is an arbitrary real number (by (2.62) we use $c = \gamma$ or $c = 1-\gamma$; it doesn't matter which one). The following lemma, a reciprocity law due to Dieter [Di], describes the connection between the ordinary Dedekind sum and its generalization (2.63). For later application, we have to include a proof.

Lemma 2.7. Let $1 \le H < K$ be relatively prime integers, and let $0 < c < K$ be a real number. Then

$$D(H,K;c) + D(K,H;c) = D(H,K) + D(K,H) + \frac{\lfloor c\rfloor\lceil c\rceil}{2HK} - \frac{1}{2}\Big\lfloor \frac{c}{H}\Big\rfloor + \frac{1}{4}E(H,c), \tag{2.64}$$

where

$$E(H,c) = \begin{cases} 0, & \text{if } c \not\equiv 0 \bmod H; \\ 1, & \text{if } c \equiv 0 \bmod H. \end{cases} \tag{2.65'}$$
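The reciprocity law can be checked on small cases with exact rational arithmetic. The sketch below is illustrative only: it takes (2.64) with the signs $+\lfloor c\rfloor\lceil c\rceil/(2HK)$, $-\tfrac12\lfloor c/H\rfloor$, $+\tfrac14 E(H,c)$ (the reading that agrees with (2.72) in the proof), and the helper names `sawtooth`, `dedekind`, and `reciprocity_gap` are hypothetical.

```python
from fractions import Fraction
from math import floor, ceil

def sawtooth(x):
    """((x)) = {x} - 1/2 for non-integer x, and 0 for integer x."""
    x = Fraction(x)
    frac = x - floor(x)
    return Fraction(0) if frac == 0 else frac - Fraction(1, 2)

def dedekind(H, K, c=0):
    """Generalized Dedekind sum D(H, K; c) of (2.63); c = 0 gives D(H, K)."""
    c = Fraction(c)
    return sum(sawtooth(Fraction(j, K)) * sawtooth((j * H + c) / K)
               for j in range(1, K))

def reciprocity_gap(H, K, c):
    """Left side minus right side of (2.64); zero when the law holds."""
    c = Fraction(c)
    E = 1 if (c / H).denominator == 1 else 0   # E(H, c) from (2.65')
    lhs = dedekind(H, K, c) + dedekind(K, H, c)
    rhs = (dedekind(H, K) + dedekind(K, H)
           + Fraction(floor(c) * ceil(c), 2 * H * K)
           - Fraction(floor(c / H), 2)
           + Fraction(E, 4))
    return lhs - rhs
```

For instance, `reciprocity_gap(3, 5, 2)` and `reciprocity_gap(2, 3, Fraction(1, 2))` both evaluate to zero, covering an integer and a non-integer shift.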

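Proof. First assume that $c$ is a natural number; we prove (2.64) by induction on $c$. Clearly

$$\Big(\!\Big(\frac{jH+c+1}{K}\Big)\!\Big) = \Big(\!\Big(\frac{jH+c}{K}\Big)\!\Big) + \frac{1}{K} - \frac{1}{2}\,\delta\Big(\frac{jH+c}{K}\Big) - \frac{1}{2}\,\delta\Big(\frac{jH+c+1}{K}\Big), \tag{2.66}$$

where in this section we use the notation $\delta(x) = 1$ if $x$ is an integer and 0 otherwise ("Kronecker delta"). By (2.63) and (2.66),

$$D(H,K;c+1) = \sum_{j=1}^{K-1}\Big(\!\Big(\frac{j}{K}\Big)\!\Big)\Big(\!\Big(\frac{jH+c}{K}\Big)\!\Big) + \frac{1}{K}\sum_{j=1}^{K-1}\Big(\!\Big(\frac{j}{K}\Big)\!\Big) - \frac{1}{2}\sum_{j=1}^{K-1}\Big(\!\Big(\frac{j}{K}\Big)\!\Big)\left[\delta\Big(\frac{jH+c}{K}\Big) + \delta\Big(\frac{jH+c+1}{K}\Big)\right]. \tag{2.67}$$

Since $1 \le H < K$ are relatively prime, there exist two integers $h'$ and $k'$ such that

$$Hh' + Kk' = 1. \tag{2.68}$$

If $j \equiv -h'c \pmod K$ then $jH + c \equiv 0 \pmod K$; and because the saw-tooth function $((x))$ is odd (so that the middle sum in (2.67) vanishes), we can rewrite (2.67) as follows:

$$D(H,K;c+1) = D(H,K;c) + \frac{1}{2}\Big(\!\Big(\frac{h'c}{K}\Big)\!\Big) + \frac{1}{2}\Big(\!\Big(\frac{h'(c+1)}{K}\Big)\!\Big).$$

It follows by induction on $c$ that

$$D(H,K;c) = D(H,K;0) + \sum_{j=1}^{c-1}\Big(\!\Big(\frac{h'j}{K}\Big)\!\Big) + \frac{1}{2}\Big(\!\Big(\frac{h'c}{K}\Big)\!\Big). \tag{2.69}$$

For every $j$ with $1 \le j \le K-1$ [see (2.68)]

$$\Big(\!\Big(\frac{h'j}{K}\Big)\!\Big) = \Big(\!\Big(\frac{j - k'Kj}{HK}\Big)\!\Big) = \Big(\!\Big(\frac{j}{HK} - \frac{k'j}{H}\Big)\!\Big) = \frac{j}{HK} - \Big(\!\Big(\frac{k'j}{H}\Big)\!\Big) - \frac{1}{2}\,\delta\Big(\frac{k'j}{H}\Big). \tag{2.70}$$

Adding (2.69) to itself with $H$ and $K$ interchanged, and using (2.70), we have

$$D(H,K;c) + D(K,H;c) = D(H,K) + D(K,H) + S,$$

where

$$S = \sum_{j=1}^{c-1}\left(\frac{j}{HK} - \frac{1}{2}\,\delta\Big(\frac{k'j}{H}\Big)\right) + \frac{c}{2HK} - \frac{1}{4}\,\delta\Big(\frac{k'c}{H}\Big). \tag{2.71}$$

The evaluation of the last line in (2.71) is easy: we have

$$S = \frac{c^2}{2HK} - \frac{1}{2}\Big\lfloor\frac{c}{H}\Big\rfloor + \frac{1}{4}\,\delta\Big(\frac{c}{H}\Big). \tag{2.72}$$

Equations (2.71) and (2.72) complete the proof when $c$ is any integer. For an arbitrary real number we use the identity

$$D(H,K;c+\gamma) = D(H,K;c) + \frac{1}{2}\Big(\!\Big(\frac{h'c}{K}\Big)\!\Big), \tag{2.73}$$

where $c \ge 0$ is an integer and $0 < \gamma < 1$ [$h'$ is defined by (2.68)]. The proof of (2.73) is easy:

$$\sum_{j=1}^{K-1}\Big(\!\Big(\frac{j}{K}\Big)\!\Big)\Big(\!\Big(\frac{jH+c+\gamma}{K}\Big)\!\Big) = \sum_{j=1}^{K-1}\Big(\!\Big(\frac{j}{K}\Big)\!\Big)\left[\Big(\!\Big(\frac{jH+c}{K}\Big)\!\Big) + \frac{\gamma}{K} - \frac{1}{2}\,\delta\Big(\frac{jH+c}{K}\Big)\right] = D(H,K;c) + 0 - \frac{1}{2}\Big(\!\Big(\frac{-h'c}{K}\Big)\!\Big) = D(H,K;c) + \frac{1}{2}\Big(\!\Big(\frac{h'c}{K}\Big)\!\Big),$$

because $-h'Hc + c \equiv 0 \pmod K$, and (2.73) follows. When $0 < \gamma < 1$, Eqs. (2.73) and (2.70) imply that

$$D(H,K;c+\gamma) + D(K,H;c+\gamma) = D(H,K;c) + D(K,H;c) + \frac{c}{2HK} - \frac{1}{4}\,\delta\Big(\frac{c}{H}\Big).$$

This completes the proof of Lemma 2.7. □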

Lemma 2.7 leads to the following analog of Lemma 2.4; see Knuth [Kn1]. Again we need the proof.

Lemma 2.8. Let $1 \le H < K$ be relatively prime integers and let $0 < c < K$ be a real number. Let

$$\frac{H}{K} = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots}} = [a_1, a_2, a_3, \ldots, a_\ell];$$

then

$$D(H,K;c) - D(H,K) = \frac{-b_1 + b_2 - b_3 \pm \cdots + (-1)^\ell b_\ell}{2} + \frac{c_0^2}{2KH} - \frac{c_1^2}{2HH_1} + \frac{c_2^2}{2H_1H_2} \mp \cdots + (-1)^{\ell-1}\frac{c_{\ell-1}^2}{2H_{\ell-2}H_{\ell-1}} + O(1), \tag{2.74}$$

where the terms $b_i$, $c_i$, $H_i$ in (2.74) are determined by two Euclidean algorithms as follows. Let $H_{-1} = K$, $H_0 = H$, and define $H_i$ by the first Euclidean algorithm

$$K = a_1H + H_1,\quad H = a_2H_1 + H_2,\quad H_1 = a_3H_2 + H_3,\ \ldots,\ H_{\ell-2} = a_\ell H_{\ell-1}, \tag{2.75}$$

where $H_{\ell-1} = \gcd(H,K) = 1$ (gcd denotes the greatest common divisor); then by using (2.75), we define the integers $b_i$ and the real numbers $c_i$ via the second Euclidean algorithm

$$c = c_0 = b_1H_0 + c_1,\quad c_1 = b_2H_1 + c_2,\quad c_2 = b_3H_2 + c_3,\ \ldots,\ c_{\ell-1} = b_\ell H_{\ell-1} + c_\ell, \tag{2.76}$$

where $0 \le c_1 < H_0$, $0 \le c_2 < H_1$, $\ldots$, and $0 \le c_\ell < 1$ (note that $H_\ell = 0$). The error term $O(1)$ in (2.74) has absolute value $\le 1$.

Proof. First assume that $c$ is an integer; then $c_\ell = 0$. Write $\Delta(h,k;c) = D(h,k;c) - D(h,k)$ and

$$F(h,k,c) = \frac{c^2}{2hk} - \frac{1}{2}\Big\lfloor\frac{c}{h}\Big\rfloor + \frac{1}{4}\,\delta\Big(\frac{c}{h}\Big);$$

then by Lemma 2.7,

$$\Delta(h,k;c) = F(h,k,c) - \Delta(k,h;c) = F(h,k,c) - \Delta(k \ (\mathrm{mod}\ h),\, h;\, c \ (\mathrm{mod}\ h)). \tag{2.77}$$

Combining the Euclidean algorithms (2.75) and (2.76) with (2.77), we have

$$\Delta(H_j, H_{j-1}; c_j) = F(H_j, H_{j-1}, c_j) - \Delta(H_{j+1}, H_j; c_{j+1}) \tag{2.78}$$

for $j = 0, 1, 2, \ldots, \ell-1$. Write $F_j = F(H_j, H_{j-1}, c_j)$; then by repeated application of (2.78), we have

$$\Delta(H,K;c) = F_0 - F_1 + F_2 - F_3 \pm \cdots + (-1)^{\ell-1}F_{\ell-1} = \sum_{j=0}^{\ell-1}(-1)^j\left(\frac{c_j^2}{2H_{j-1}H_j} - \frac{1}{2}b_{j+1} + \frac{1}{4}\,\delta\Big(\frac{c_j}{H_j}\Big)\right)$$

$$= \frac{-b_1 + b_2 - b_3 \pm \cdots + (-1)^\ell b_\ell}{2} + \sum_{j=0}^{\ell-1}(-1)^j\frac{c_j^2}{2H_{j-1}H_j} + \frac{(-1)^{\ell-1}}{4}. \tag{2.79}$$

Equation (2.79) proves Lemma 2.8 if $c$ is an integer. If $c$ is not an integer then we simply apply (2.73). □
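The two Euclidean algorithms (2.75) and (2.76) are easy to run mechanically. The following sketch uses exact rational arithmetic so that non-integer shifts $c$ are handled without rounding; the function name `two_euclid` and the return layout are choices made here for illustration, not notation from the text.

```python
from fractions import Fraction
from math import floor

def two_euclid(H, K, c):
    """Run (2.75) and (2.76) for coprime integers 1 <= H < K and real 0 < c < K.

    Returns (a, Hs, b, cs) with
      a  = [a_1, ..., a_l]        partial quotients of H/K,
      Hs = [H_0, H_1, ..., H_l]   remainders, H_0 = H and H_l = 0,
      b  = [b_1, ..., b_l]        integer quotients of the second algorithm,
      cs = [c_0, c_1, ..., c_l]   remainders, with 0 <= c_i < H_{i-1}.
    """
    a, Hs = [], [H]
    prev, cur = K, H
    while cur != 0:                      # first algorithm: H_{i-2} = a_i H_{i-1} + H_i
        q, r = divmod(prev, cur)
        a.append(q)
        Hs.append(r)
        prev, cur = cur, r
    b, cs = [], [Fraction(c)]
    for i in range(len(a)):              # second algorithm: c_{i} = b_{i+1} H_i + c_{i+1}
        q = floor(cs[-1] / Hs[i])
        b.append(q)
        cs.append(cs[-1] - q * Hs[i])
    return a, Hs, b, cs
```

With $H/K = 29/70$ (a convergent of $\sqrt2-1$) and $c = K/2 = 35$, the call `two_euclid(29, 70, 35)` gives `a = [2, 2, 2, 2, 2]` and `b = [1, 0, 1, 0, 1]`. With the Fibonacci pair $H/K = 4181/6765$ and $c = \tfrac34 K$, the $b$-sequence is `[1, 0, 0, 0, 1, 0]` repeated three times.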


2.2.2 An Analog of Proposition 2.1

Let $0 < \alpha < 1$ be any irrational and let $0 < \gamma < 1$ be any rational number. To prove Theorem 1.1 about the irrational rotation, first we need to know the average ("expectation")

$$M_\alpha(\gamma; N) = \frac{1}{N}\sum_{n=1}^{N} S_\alpha(\gamma; n), \tag{2.80}$$

where

$$S_\alpha(\gamma; n) = \sum_{k=1}^{n}\big(\chi_\gamma(k\alpha) - \gamma\big) \tag{2.81}$$

is the deviation of the counting function from its expected value $\gamma n$, and the characteristic function $\chi_\gamma(x)$ is defined in (2.61). By using (2.62) we have

$$S_\alpha(\gamma; n) = \sum_{k=1}^{n}\big(((k\alpha-\gamma)) - ((k\alpha))\big),$$

and

$$M_\alpha(\gamma; N) = \frac{1}{N}\sum_{k=1}^{N}(N+1-k)\big(((k\alpha-\gamma)) - ((k\alpha))\big).$$

Repeating the proof of Proposition 2.1 with some natural modifications, we obtain the following analogous result.

Proposition 2.9. For any irrational $\alpha > 0$, any real number $0 < \gamma < 1$, and any integer $N \ge 1$,

$$M_\alpha(\gamma; N) = \frac{b_1 - b_2 + b_3 \mp \cdots + (-1)^{\ell-1}b_\ell}{2} - \frac{c_0^2}{2KH} + \frac{c_1^2}{2HH_1} - \frac{c_2^2}{2H_1H_2} \pm \cdots + (-1)^{\ell}\frac{c_{\ell-1}^2}{2H_{\ell-2}H_{\ell-1}} + \theta\max_{1\le j\le \ell} b_j, \tag{2.82}$$

where $|\theta| < 10$, $\alpha = [a_1, a_2, \ldots]$, the index $\ell = \ell(\alpha, N)$ is defined as the last integer $j$ such that $q_j \le N$, where $p_j/q_j$ is the $j$-th convergent of $\alpha$, and finally the terms $b_i$, $c_i$, $H_i$ in (2.82) are determined by the two Euclidean algorithms (2.75) and (2.76) with $c = c_0 = (1-\gamma)K$, $K = q_\ell$, $H = p_\ell$ (i.e., $H/K = p_\ell/q_\ell$). □

Next we show some illustrations.

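Example 2.10. First let $\gamma = 1/2$. We begin with $\alpha = \sqrt2$, and evaluate $M_{\sqrt2}(1/2; N)$, i.e., the corresponding expectation in Theorem 1.1. The continued fraction $\sqrt2 - 1 = [2, 2, 2, \ldots] = [\overline{2}]$ gives that $2 = a_1 = a_2 = a_3 = \cdots$ in (2.75). Next we compute $b_i$, $c_i$, $H_i$ in (2.76) as follows:

$$c = c_0 = (1-\gamma)K = \frac12(2H + H_1) = H + \frac12 H_1,$$

implying $b_1 = 1$, and

$$c_1 = \frac12 H_1 = 0\cdot H_1 + \frac12 H_1, \ \text{implying } b_2 = 0; \text{ and}$$

$$c_2 = \frac12 H_1 = \frac12(2H_2 + H_3) = H_2 + \frac12 H_3, \ \text{implying } b_3 = 1;$$

and so on. Thus we obtain the periodic sequences

$$b_1 = 1,\ b_2 = 0,\ b_3 = 1,\ b_4 = 0,\ \ldots,\quad b_i = \frac12\big(1 + (-1)^{i-1}\big);$$

$$c_0 = \frac12 K,\quad c_1 = c_2 = \frac12 H_1,\quad c_3 = c_4 = \frac12 H_3,\quad c_5 = c_6 = \frac12 H_5,\ \ldots$$

Hence we have

$$\frac{b_1 - b_2 + b_3 - b_4 \pm \cdots}{2} = \frac{1 - 0 + 1 - 0 + 1 - 0 + \cdots}{2} \tag{2.83}$$

and

$$\frac{c_0^2}{2KH} - \frac{c_1^2}{2HH_1} + \frac{c_2^2}{2H_1H_2} \mp \cdots = \frac{K}{8H} + \frac18 H_1\left(\frac{1}{H_2} - \frac{1}{H}\right) + \frac18 H_3\left(\frac{1}{H_4} - \frac{1}{H_2}\right) + \frac18 H_5\left(\frac{1}{H_6} - \frac{1}{H_4}\right) + \cdots \tag{2.84}$$

Since

$$\frac18 H_{2i+1}\left(\frac{1}{H_{2i+2}} - \frac{1}{H_{2i}}\right) = \frac18\cdot\frac{H_{2i+1}(H_{2i} - H_{2i+2})}{H_{2i+2}H_{2i}} = \frac18\cdot\frac{2H_{2i+1}\cdot H_{2i+1}}{H_{2i+2}H_{2i}} = \frac{H_{2i+1}^2}{4H_{2i+2}H_{2i}} = \frac14 + \text{exponentially small}, \tag{2.85}$$

applying (2.83)–(2.85) in Proposition 2.9, by (2.82) we have

$$M_{\sqrt2}\Big(\frac12; N\Big) = \left(\frac{1-0}{2} - \frac14\right)\cdot\frac12\cdot\frac{\log N}{\log(1+\sqrt2)} + O(1),$$

where in the last step we used the fact that [see (2.79)]

$$q_\ell = \frac{(1+\sqrt2)^\ell - (1-\sqrt2)^\ell}{2\sqrt2} \approx N \quad\text{implies}\quad \ell = \frac{\log N}{\log(1+\sqrt2)} + O(1).$$

Thus we obtain

$$M_{\sqrt2}\Big(\frac12; N\Big) = \frac18\cdot\frac{\log N}{\log(1+\sqrt2)} + O(1), \tag{2.86}$$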

which proves (1.32). In the special case $\gamma = 1/2$ we have the ad hoc identity

$$\chi_{1/2}(x) - \frac12 = ((2x)) - 2((x)), \tag{2.87}$$

which gives the equation [see (2.62) and (2.80)]

$$M_\alpha\Big(\frac12; N\Big) = M_{2\alpha}(N) - 2M_\alpha(N). \tag{2.88}$$

By using (2.88), we can easily double-check (2.86). What it means is that we apply Proposition 2.1 for both $\alpha = \sqrt2 = [1; \overline{2}]$ and

$$2\alpha = 2\sqrt2 = \sqrt8 = [2; 1, 4, 1, 4, 1, 4, \ldots] = [2; \overline{1, 4}].$$

The length of the period of $\alpha = \sqrt2$ is odd, so the corresponding alternating sum in Proposition 2.1 cancels out. Thus we have

$$M_{\sqrt2}\Big(\frac12; N\Big) = M_{2\sqrt2}(N) + O(1) = \frac{-1 + 4 - 1 + 4 - 1 + 4 \mp \cdots}{12} + O(1) = \frac{-1+4}{12}\cdot\frac12\cdot\frac{\log N}{\log(1+\sqrt2)} + O(1) = \frac18\cdot\frac{\log N}{\log(1+\sqrt2)} + O(1), \tag{2.89}$$

which gives back (2.86). In Eq. (2.89) we used the fact that the $(2i)$th convergent $p_{2i}/q_{2i}$ of $\sqrt8$ satisfies the equation

$$p_{2i} \pm q_{2i}\sqrt8 = (3 \pm \sqrt8)^i$$

(due to the fact that the least positive solution of $x^2 - 8y^2 = \pm1$ is $x = 3$, $y = 1$), which implies

$$q_{2i} = \frac{1}{2\sqrt8}\Big((3+\sqrt8)^i - (3-\sqrt8)^i\Big) \approx (3+\sqrt8)^i = (1+\sqrt2)^{2i}.$$

The ad hoc equation (2.88) gives a shortcut for $\gamma = 1/2$ with any quadratic irrational $\alpha$. For example, if $\alpha = \sqrt3 = [1; \overline{1, 2}]$ then

$$2\alpha = 2\sqrt3 = \sqrt{12} = [3; \overline{2, 6}].$$

Thus by (2.88) and Proposition 2.1,

$$M_{\sqrt3}\Big(\frac12; N\Big) = M_{2\sqrt3}(N) - 2M_{\sqrt3}(N) = \frac{1}{12}\left(\frac{-2+6}{2}\cdot\frac{\log N}{\log(2+\sqrt3)} - 2(-1+2)\frac{\log N}{\log(2+\sqrt3)}\right) + O(1) = O(1), \tag{2.90}$$

since the $(2i)$th convergent $p_{2i}/q_{2i}$ of $\sqrt3$ satisfies the equation

$$p_{2i} \pm q_{2i}\sqrt3 = (2 \pm \sqrt3)^i,$$

which implies

$$q_{2i} = \frac{1}{2\sqrt3}\Big((2+\sqrt3)^i - (2-\sqrt3)^i\Big) \approx (2+\sqrt3)^i;$$

similarly, the $i$th convergent denominator for $2\sqrt3$ is about $(2+\sqrt3)^i$ (because the least positive solution of $x^2 - 12y^2 = \pm1$ is $x = 7$, $y = 2$, and $7 + 2\sqrt{12} = (2+\sqrt3)^2$).

Next consider the golden ratio $\alpha = (\sqrt5+1)/2$. Then $\alpha = [1; \overline{1}]$ and $2\alpha = [3; \overline{4}]$. Since the length of the period is odd for both continued fractions, by (2.88) and Proposition 2.1,

$$M_{(\sqrt5+1)/2}\Big(\frac12; N\Big) = O(1). \tag{2.91}$$

The last example in this section is $\alpha = \sqrt7$ (again $\gamma = 1/2$). We need the following facts: $\sqrt7 = [2; \overline{1, 1, 1, 4}]$, $\sqrt{28} = [5; \overline{3, 2, 3, 10}]$, the least positive solutions of $x^2 - 7y^2 = \pm1$ and $x^2 - 28y^2 = \pm1$ are, respectively, $x = 8$, $y = 3$ and $x = 127$, $y = 24$ with the relation $127 + 24\sqrt{28} = (8 + 3\sqrt7)^2$. Combining these facts with (2.88) and Proposition 2.1, we have


$$M_{\sqrt7}\Big(\frac12; N\Big) = M_{2\sqrt7}(N) - 2M_{\sqrt7}(N) = \frac{\log N}{12}\left(\frac{-3+2-3+10}{\log(127+24\sqrt{28})} - 2\cdot\frac{-1+1-1+4}{\log(8+3\sqrt7)}\right) + O(1) = -\frac{\log N}{4\log(8+3\sqrt7)} + O(1). \tag{2.92}$$
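The continued fractions quoted in these examples, and the alternating sums over one period that enter Proposition 2.1, can be reproduced with the classical integer recurrence for $\sqrt n$. This is a standard textbook algorithm, sketched here for convenience; the names `sqrt_cf` and `alternating_sum` are hypothetical.

```python
from math import isqrt

def sqrt_cf(n):
    """Continued fraction of sqrt(n) for a non-square n: returns (a0, period)."""
    a0 = isqrt(n)
    if a0 * a0 == n:
        raise ValueError("n must not be a perfect square")
    m, d, a = 0, 1, a0
    period = []
    # The period of sqrt(n) ends exactly when the partial quotient 2*a0 appears.
    while a != 2 * a0:
        m = d * a - m
        d = (n - m * m) // d
        a = (a0 + m) // d
        period.append(a)
    return a0, period

def alternating_sum(period):
    """-a_1 + a_2 - a_3 + ... taken over one period."""
    return sum((-1) ** i * a for i, a in enumerate(period, start=1))
```

For example, `sqrt_cf(28)` returns `(5, [3, 2, 3, 10])` and `alternating_sum([3, 2, 3, 10])` is `6`, the numerator $-3+2-3+10$ that appears in (2.92).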

Next we discuss examples where $\gamma \ne 1/2$.

Example 2.11. Next let $\gamma = 1/3$ and $\alpha = \sqrt2$. Then $\sqrt2 = [1; \overline{2}]$ gives that $2 = a_1 = a_2 = a_3 = \cdots$ in (2.75). We compute $b_i$, $c_i$, $H_i$ in (2.76) as follows:

$$c = c_0 = (1-\gamma)K = \frac23 K = \frac23(2H + H_1) = H + \frac13 H + \frac23 H_1,$$

implying $b_1 = 1$, and similarly

$$c_1 = \frac13 H + \frac23 H_1 = \frac13(2H_1 + H_2) + \frac23 H_1 = H_1 + \frac13 H_1 + \frac13 H_2, \ \text{implying } b_2 = 1; \text{ and}$$

$$c_2 = \frac13 H_1 + \frac13 H_2 = \frac13(2H_2 + H_3) + \frac13 H_2 = H_2 + \frac13 H_3, \ \text{implying } b_3 = 1; \text{ and}$$

$$c_3 = \frac13 H_3 = 0\cdot H_3 + \frac13 H_3, \ \text{implying } b_4 = 0; \text{ and}$$

$$c_4 = \frac13 H_3 = \frac13(2H_4 + H_5) = 0\cdot H_4 + \frac23 H_4 + \frac13 H_5, \ \text{implying } b_5 = 0; \text{ and}$$

$$c_5 = \frac23 H_4 + \frac13 H_5 = \frac23(2H_5 + H_6) + \frac13 H_5 = H_5 + \frac23 H_5 + \frac23 H_6, \ \text{implying } b_6 = 1; \text{ and}$$

$$c_6 = \frac23 H_5 + \frac23 H_6 = \frac23(2H_6 + H_7) + \frac23 H_6 = 2H_6 + \frac23 H_7, \ \text{implying } b_7 = 2; \text{ and}$$

$$c_7 = \frac23 H_7 = 0\cdot H_7 + \frac23 H_7, \ \text{implying } b_8 = 0; \text{ and}$$

$$c_8 = \frac23 H_7 = \frac23(2H_8 + H_9) = H_8 + \frac13 H_8 + \frac23 H_9, \ \text{implying } b_9 = 1; \text{ and so on;}$$

back to the beginning. Thus we get the periodic sequence for $b_1, b_2, b_3, \ldots$:

1, 1, 1, 0, 0, 1, 2, 0, 1, 1, 1, 0, 0, 1, 2, 0, 1, 1, 1, 0, 0, 1, 2, 0, ...

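Therefore, we obtain

$$\frac{b_1 - b_2 + b_3 - b_4 \pm \cdots}{2} = \frac12\,(1 - 1 + 1 - 0 + 0 - 1 + 2 - 0)\,\frac{\log N}{8\log(1+\sqrt2)} + O(1), \tag{2.93}$$

and

$$\frac{c_0^2}{2KH} - \frac{c_1^2}{2HH_1} + \frac{c_2^2}{2H_1H_2} \mp \cdots = \frac{1}{18}\left(\frac{(2K)^2}{KH} - \frac{(H+2H_1)^2}{HH_1} + \frac{(H_1+H_2)^2}{H_1H_2} - \frac{H_3^2}{H_2H_3} + \frac{H_3^2}{H_3H_4} - \frac{(2H_4+H_5)^2}{H_4H_5} + \frac{(2H_5+2H_6)^2}{H_5H_6} - \frac{(2H_7)^2}{H_6H_7}\right)\frac{\log N}{8\log(1+\sqrt2)} + O(1). \tag{2.94}$$

Since by (2.75)

$$\frac{H_i - H_{i+2}}{H_{i+1}} = a_{i+2} = 2,$$

we can rewrite the bracketed sum in (2.94), including the factor $1/18$, as follows:

$$\frac{1}{18}\left(\frac{4(K-H_1)}{H} - 4 - \frac{H-H_2}{H_1} + 2 + \frac{H_1-H_3}{H_2} + \frac{H_3-H_5}{H_4} - 4 - \frac{4(H_4-H_6)}{H_5} + 8 + \frac{4(H_5-H_7)}{H_6}\right) = \frac{1}{18}(8 - 4 - 2 + 2 + 2 + 2 - 4 - 8 + 8 + 8) = \frac23,$$

implying

$$\text{sum}(2.94) = \frac{\log N}{12\log(1+\sqrt2)} + O(1). \tag{2.95}$$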


Applying (2.93)–(2.95) in (2.82), we have

$$M_{\sqrt2}\Big(\frac13; N\Big) = \left(\frac18 - \frac{1}{12}\right)\frac{\log N}{\log(1+\sqrt2)} + O(1) = \frac{\log N}{24\log(1+\sqrt2)} + O(1). \tag{2.96}$$

Next let $\gamma = 2/3$ and $\alpha = \sqrt2$, then a similar calculation gives the same answer:

$$M_{\sqrt2}\Big(\frac23; N\Big) = \frac{\log N}{24\log(1+\sqrt2)} + O(1). \tag{2.97}$$

We can easily double-check (2.96) and (2.97) by using the ad hoc equation

$$\Big(\chi_{1/3}(x) - \frac13\Big) + \Big(\chi_{2/3}(x) - \frac23\Big) = ((3x)) - 3((x)), \tag{2.98}$$

which leads to [see (2.62) and (2.80)]

$$M_\alpha\Big(\frac13; N\Big) + M_\alpha\Big(\frac23; N\Big) = M_{3\alpha}(N) - 3M_\alpha(N). \tag{2.99}$$

Notice that (2.98) and (2.99) are analogs of (2.87) and (2.88).

We have $3\sqrt2 = \sqrt{18} = [4; \overline{4, 8}]$, and so by Proposition 2.1,

$$M_{3\sqrt2}(N) = \frac{1}{12}\cdot\frac{-4+8}{2}\cdot\frac{\log N}{2\log(1+\sqrt2)} + O(1), \tag{2.100}$$

because the least positive solution of $x^2 - 18y^2 = \pm1$ is $x = 17$, $y = 4$, and so the $(2i)$th convergent $p_{2i}/q_{2i}$ of $\sqrt{18}$ satisfies the equation

$$p_{2i} \pm q_{2i}\sqrt{18} = (17 \pm 4\sqrt{18})^i,$$

which implies

$$q_{2i} \approx (17 + 4\sqrt{18})^i = (1+\sqrt2)^{4i}.$$

Since the length of the period of $\sqrt2$ is odd, by (2.99) and (2.100),

$$M_{\sqrt2}\Big(\frac13; N\Big) + M_{\sqrt2}\Big(\frac23; N\Big) = M_{3\sqrt2}(N) - 3M_{\sqrt2}(N) = \frac{\log N}{12\log(1+\sqrt2)} + O(1),$$
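The ad hoc identities used in these double-checks are pointwise statements about step functions, so they can be tested exactly at rational sample points. The sketch below checks (2.62) in its centered form $\chi_\gamma(x)-\gamma = ((x-\gamma))-((x))$, together with (2.87) and (2.98); the sample points $k/q$ with $q$ coprime to 6 are an assumption made here to stay away from the points of discontinuity, and the function names are hypothetical.

```python
from fractions import Fraction
from math import floor

def sawtooth(x):
    """((x)) = {x} - 1/2 for non-integer x, and 0 for integer x."""
    frac = x - floor(x)
    return Fraction(0) if frac == 0 else frac - Fraction(1, 2)

def chi(gamma, x):
    """Characteristic function of [0, gamma), extended periodically mod 1."""
    return 1 if x - floor(x) < gamma else 0

def check_identities(q=97):
    """Check (2.62), (2.87), (2.98) at x = k/q, 0 < k < q (q coprime to 6)."""
    half, third = Fraction(1, 2), Fraction(1, 3)
    for k in range(1, q):
        x = Fraction(k, q)
        for gamma in (half, third, 2 * third):
            if chi(gamma, x) - gamma != sawtooth(x - gamma) - sawtooth(x):
                return False
        if chi(half, x) - half != sawtooth(2 * x) - 2 * sawtooth(x):
            return False
        lhs = (chi(third, x) - third) + (chi(2 * third, x) - 2 * third)
        if lhs != sawtooth(3 * x) - 3 * sawtooth(x):
            return False
    return True
```

Calling `check_identities()` returns `True`; summing the identities over $x = k\alpha$ is what turns them into (2.88) and (2.99).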

which is in agreement with (2.96) and (2.97).

Example 2.12. Let $\gamma = 1/4$ and $\alpha = (\sqrt5+1)/2 = [1; \overline{1}]$. Then $1 = a_1 = a_2 = a_3 = \cdots$ in (2.75),

$$c = c_0 = (1-\gamma)K = \frac34 K = \frac34(H + H_1) = H + \frac14(3H_1 - H), \tag{2.101}$$

implying $b_1 = 1$. Note that $3H_1 > H$, since $H/H_1$ is very close to the golden ratio $\alpha = (\sqrt5+1)/2 < 3$. We have

$$3H_1 - H = 3H_1 - (H_1 + H_2) = 2H_1 - H_2 = 2(H_2 + H_3) - H_2 = H_2 + 2H_3, \tag{2.102}$$

and so

$$c_1 = \frac14 H_2 + \frac12 H_3 = 0\cdot H_1 + c_2 = 0\cdot H_2 + c_3, \ \text{implying } b_2 = b_3 = 0; \text{ and}$$

$$c_3 = \frac14 H_2 + \frac12 H_3 = \frac14(H_3 + H_4) + \frac12 H_3 = \frac34 H_3 + \frac14 H_4 < H_3, \ \text{implying } b_4 = 0; \text{ and}$$

$$c_4 = \frac34 H_3 + \frac14 H_4 = \frac34(H_4 + H_5) + \frac14 H_4 = H_4 + \frac34 H_5, \ \text{implying } b_5 = 1; \text{ and}$$

$$c_5 = \frac34 H_5 = 0\cdot H_5 + \frac34 H_5, \ \text{implying } b_6 = 0; \text{ and}$$

$$c_6 = \frac34 H_5 = \frac34(H_6 + H_7) = H_6 + \frac14(3H_7 - H_6),$$

which is the same as the beginning. Thus we get the periodic sequence for $b_1, b_2, b_3, \ldots$:

1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, ...,

implying

$$\frac{b_1 - b_2 + b_3 - b_4 \pm \cdots}{2} = \frac12\,(1 - 0 + 0 - 0 + 1 - 0)\,\frac{\log N}{6\log\frac{\sqrt5+1}{2}} + O(1), \tag{2.103}$$

and

$$\frac{c_0^2}{2KH} - \frac{c_1^2}{2HH_1} + \frac{c_2^2}{2H_1H_2} \mp \cdots = \frac{1}{32}\,S_0\,\frac{\log N}{6\log\frac{\sqrt5+1}{2}} + O(1), \tag{2.104}$$

where

$$S_0 = \frac{9K^2}{KH} + (H_2 + 2H_3)^2\left(-\frac{1}{HH_1} + \frac{1}{H_1H_2} - \frac{1}{H_2H_3}\right) + \frac{(3H_3 + H_4)^2}{H_3H_4} - \frac{9H_5^2}{H_4H_5}.$$

The critical sum $S_0$ in the middle of (2.104) equals (with $\alpha = (\sqrt5+1)/2$)

$$S_0 = 9\alpha + (\alpha+2)^2\left(-\alpha^{-5} + \alpha^{-3} - \alpha^{-1}\right) + \frac{(3\alpha+1)^2}{\alpha} - 9\alpha^{-1}, \tag{2.105}$$

and using the simple facts $\alpha^2 = 1 + \alpha$ and $\alpha^{-2} = 1 - \alpha^{-1}$, it is easy to evaluate (2.105): $S_0 = 24$. Returning to (2.104), we have

$$\text{sum}(2.104) = \frac{1}{32}(24)\frac{\log N}{6\log\frac{\sqrt5+1}{2}} + O(1). \tag{2.106}$$

Applying (2.103)–(2.106) in (2.82), we have

$$M_{(\sqrt5+1)/2}\Big(\frac14; N\Big) = \left(1 - \frac{24}{32}\right)\frac{\log N}{6\log\frac{\sqrt5+1}{2}} + O(1) = \frac{\log N}{24\log\frac{\sqrt5+1}{2}} + O(1). \tag{2.107}$$

2.2.3 Periodicity in Proposition 2.9

Let's return to Proposition 2.9 and Eq. (2.82). The periodicity of $b_1, b_2, b_3, \ldots$ in the examples above was not an accident: we prove that if the sequence $a_1, a_2, a_3, \ldots$ is periodic and $c/K$ is a rational number, then $b_1, b_2, b_3, \ldots$ is also periodic (but the length of the period is not necessarily the same). Indeed, write $c/K = s/t$ where $1 \le s < t$ are relatively prime integers. Then by (2.75) and (2.76),

$$c = c_0 = \frac{s}{t}K = \frac{s}{t}(a_1H + H_1) = b_1H + c_1,$$

where ($\lfloor x\rfloor$ and $\{x\}$ denote the lower integral part and the fractional part of $x$)

$$b_1 = \Big\lfloor\frac{sa_1}{t}\Big\rfloor \quad\text{and}\quad c_1 = \Big\{\frac{sa_1}{t}\Big\}H + \frac{s}{t}H_1 = \frac{s_1}{t}H + \frac{s}{t}H_1,$$

and here we assume that $c_1 < H$. Similarly,

$$c_1 = \frac{s_1}{t}H + \frac{s}{t}H_1 = \frac{s_1}{t}(a_2H_1 + H_2) + \frac{s}{t}H_1 = b_2H_1 + c_2,$$

where

$$b_2 = \Big\lfloor\frac{s_1a_2 + s}{t}\Big\rfloor \quad\text{and}\quad c_2 = \Big\{\frac{s_1a_2 + s}{t}\Big\}H_1 + \frac{s_1}{t}H_2 = \frac{s_2H_1 + s_1H_2}{t},$$

and again we assume that $c_2 < H_1$. Repeating this argument, for every $i \ge 0$ we have

$$c_i = \frac{s_iH_{i-1} + s_{i-1}H_i}{t}, \tag{2.108}$$

where $0 \le s_i, s_{i-1} < t$ are integers, and we always assume that $c_i < H_{i-1}$. The periodicity of $a_i$ means that

$$a_i = a_{i+L} \ \text{holds for (say)}\ M_1 \le i \le M_2, \tag{2.109}$$

and here we assume that $(M_2 - M_1)/L$ is a very large integer. Consider now the sequence with gap $L$ [see (2.109)]:

$$c_{M_1},\ c_{M_1+L},\ c_{M_1+2L},\ c_{M_1+3L},\ \ldots,\ c_{M_2};$$

by (2.108) we have

$$c_{M_1+jL} = \frac{s'_jH_{M_1+jL-1} + s''_jH_{M_1+jL}}{t} < H_{M_1+jL-1}, \tag{2.110}$$

where $0 \le s'_j, s''_j < t$ are integers. If $(M_2 - M_1)/L$ is larger than $t^2$, then by the Pigeonhole Principle there is a repetition among the pairs $(s'_j, s''_j)$, $j = 0, 1, 2, \ldots$, and the first repetition implies the periodicity of the sequence $b_1, b_2, b_3, \ldots$ in the rest of the interval $M_1 \le i \le M_2$ [see (2.109)]. Of course, we cannot predict the length of the period, but it is certainly less than $L(t^2+1)$.

Warning! It may happen that our assumption

$$c_i = \frac{s_iH_{i-1} + s_{i-1}H_i}{t} < H_{i-1}, \quad 0 \le s_i, s_{i-1} < t,$$

in (2.108) is violated; for example, see Eq. (2.101) in Example 2.12 (where $\alpha = (\sqrt5+1)/2$ and $\gamma = 1/4$):

$$c_0 = \frac34(H + H_1) > H,$$

since $H/H_1$ is very close to $\alpha = (\sqrt5+1)/2 < 3$. This is why we cannot write

$$c_0 = 0\cdot H + c_1 \ \text{with}\ c_1 = \frac34(H + H_1);$$

instead we have to use

$$c_0 = H + \frac{3H_1 - H}{4} = H + c_1,$$

where in $c_1$ we face a negative(!) coefficient:

$$0 < c_1 = -\frac14 H + \frac34 H_1 < H. \tag{2.111}$$

For $\alpha = (\sqrt5+1)/2 < 3$ we can use the ad hoc fact [see (2.102)]

$$3H_1 - H = H_2 + 2H_3, \tag{2.112}$$

which simply eliminates the "negativity problem" in (2.111). Next we show that this trick always works; we can always eliminate the "negativity problem." To prove this, assume that for some $i$ we have, just like in (2.110), the reverse of (2.108):

$$c_i = \frac{s_iH_{i-1} + s_{i-1}H_i}{t} > H_{i-1}, \quad 0 \le s_i, s_{i-1} < t. \tag{2.113}$$

Then we rewrite (2.113) in the form

$$c_i = H_{i-1} + c'_i \quad\text{where}\quad c'_i = \frac{s_{i-1}H_i - (t - s_i)H_{i-1}}{t}$$

and $0 \le c'_i < H_{i-1}$. In (2.75) we have the recurrence formula $H_{i-1} = a_{i+1}H_i + H_{i+1}$, so with $r_i = t - s_i$,

$$s_{i-1}H_i - r_iH_{i-1} = s_{i-1}H_i - r_i(a_{i+1}H_i + H_{i+1}) = s_i^{(1)}H_i - r_iH_{i+1},$$

where $s_i^{(1)} = s_{i-1} - r_ia_{i+1} \ge 1$.

Case 1: $s_i^{(1)} \ge r_i$.

By using $H_i = a_{i+2}H_{i+1} + H_{i+2}$, we have the following analog of (2.112):

$$s_i^{(1)}H_i - r_iH_{i+1} = s_i^{(1)}(a_{i+2}H_{i+1} + H_{i+2}) - r_iH_{i+1} = (s_i^{(1)}a_{i+2} - r_i)H_{i+1} + s_i^{(1)}H_{i+2}, \tag{2.114}$$

which eliminates the "negativity problem."

Case 2: $s_i^{(1)} < r_i$.

Then again we use (2.114):

$$s_i^{(1)}H_i - r_iH_{i+1} = (s_i^{(1)}a_{i+2} - r_i)H_{i+1} + s_i^{(1)}H_{i+2}. \tag{2.115}$$

If $(s_i^{(1)}a_{i+2} - r_i)$ is positive, then we are done; if it is negative, then clearly $r_{i+2} = |s_i^{(1)}a_{i+2} - r_i| < s_i^{(1)}$, and we can rewrite (2.115) in the form

$$s_i^{(1)}H_i - r_iH_{i+1} = s_i^{(1)}H_{i+2} - r_{i+2}H_{i+1} \quad\text{where}\quad r_i > r_{i+2} \ge 0. \tag{2.116}$$

The decreasing property in (2.116) guarantees that, repeating this argument less than $t$ times, the negative coefficient eventually disappears [i.e., turns into a positive coefficient like in (2.112)]. In other words, in both cases we can eliminate the "negativity problem."

By getting rid of the "negativity problem," we are safe to say that the Pigeonhole Principle argument above always works. As a consequence, we obtain the periodicity of $b_1, b_2, b_3, \ldots$. Combining this periodicity with Lemma 2.7 and Proposition 2.9 [see Eq. (2.82)], we have

Proposition 2.13. If $\alpha$ is a quadratic irrational and $0 < \gamma < 1$ is a rational number, then there is a constant $c = c(\alpha, \gamma)$ such that

$$M_\alpha(\gamma, N) = c\log N + O(1) \tag{2.117}$$

holds for every integer $N \ge 2$.

2.3 Fourier Series and a Problem of Hardy and Littlewood (I)

It is a standard exercise in every Fourier analysis course to compute the Fourier coefficients of the sawtooth function

$$((x)) = -\sum_{j=1}^{\infty}\frac{\sin(2\pi jx)}{\pi j}, \tag{2.118}$$

where $((x)) = \{x\} - 1/2$ if $x$ is not an integer and 0 otherwise. We want to apply (2.118) in both

$$S_\alpha(n) = \sum_{k=1}^{n}((k\alpha)) \quad\text{and}\quad M_\alpha(N) = \frac{1}{N}\sum_{n=1}^{N}S_\alpha(n) = \frac{1}{N}\sum_{k=1}^{N}(N+1-k)((k\alpha)),$$

but we have to be a little bit careful, since the Fourier series in (2.118) is not absolutely convergent. Instead of (2.118) we actually use a finite version with a small error term.

First we recall Abel's transformation ("discrete integration by parts"):

$$\sum_{j=1}^{m}a_jb_j = a_1(b_1 - b_2) + (a_1 + a_2)(b_2 - b_3) + (a_1 + a_2 + a_3)(b_3 - b_4) + \cdots + (a_1 + \cdots + a_{m-1})(b_{m-1} - b_m) + (a_1 + \cdots + a_m)b_m. \tag{2.119}$$

We also need the well-known summation formula

$$\sum_{j=1}^{m}\sin(j\beta) = \frac{\cos(\beta/2) - \cos((2m+1)\beta/2)}{2\sin(\beta/2)}, \tag{2.120}$$

which implies the useful upper bound

$$\left|\sum_{j=1}^{m}\sin(j\beta)\right| \le \frac{1}{|\sin(\beta/2)|}. \tag{2.121}$$

The pointwise convergence of the Fourier series in (2.118) follows from (2.119) and (2.121), and the equality of the two sides in (2.118) follows from Fejér's well-known theorem in Fourier analysis. By (2.119) and (2.121), for any $T \ge 1$,

$$\left|((x)) + \sum_{j=1}^{T}\frac{\sin(2\pi jx)}{\pi j}\right| < \frac{2}{T|\sin(\pi x)|} \le \frac{1}{T\|x\|}, \tag{2.122}$$

where $\|x\|$ denotes, as usual, the distance of $x$ from the nearest integer. It follows that

$$\left|S_\alpha(n) + \sum_{j=1}^{T}\sum_{k=1}^{n}\frac{\sin(2\pi jk\alpha)}{\pi j}\right| < \frac{1}{T}\sum_{k=1}^{n}\frac{1}{\|k\alpha\|}. \tag{2.123}$$
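The truncation bound (2.122) is easy to probe numerically. The sketch below evaluates the left-hand side in floating point and compares it with $1/(T\|x\|)$; it is an illustration under the usual double-precision caveats, and the function names are hypothetical.

```python
import math

def sawtooth(x):
    """((x)) = {x} - 1/2 for non-integer x, and 0 for integer x."""
    frac = x - math.floor(x)
    return 0.0 if frac == 0.0 else frac - 0.5

def dist_to_nearest_int(x):
    """||x||, the distance from x to the nearest integer."""
    return abs(x - round(x))

def truncation_error(x, T):
    """| ((x)) + sum_{j<=T} sin(2*pi*j*x)/(pi*j) |, the left side of (2.122)."""
    partial = sum(math.sin(2 * math.pi * j * x) / (math.pi * j)
                  for j in range(1, T + 1))
    return abs(sawtooth(x) + partial)

if __name__ == "__main__":
    for x in (0.1, 0.37, 0.93, math.sqrt(2)):
        for T in (10, 100, 1000):
            bound = 1.0 / (T * dist_to_nearest_int(x))
            print(x, T, truncation_error(x, T), bound)
```

The error shrinks like $1/T$ for fixed $x$ and deteriorates as $x$ approaches an integer, which is exactly why the diophantine sum $\sum_k 1/\|k\alpha\|$ appears in (2.123).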



2.3.1 Badly Approximable Numbers

We need to estimate the diophantine sum

$$\sum_{k=1}^{n}\frac{1}{\|k\alpha\|}$$

from above for the class of quadratic irrational $\alpha$. Our argument below (a standard application of the Pigeonhole Principle) will work even for a larger class of reals, called badly approximable numbers. A real number $\alpha$ is called badly approximable, if there is a positive constant $c_0 = c_0(\alpha) > 0$ such that

$$k\|k\alpha\| \ge c_0 > 0 \ \text{holds for all integers}\ k \ge 1.$$

One can easily characterize this class in terms of the continued fraction: $\alpha$ is badly approximable if and only if the sequence $a_1, a_2, a_3, \ldots$ of partial quotients in $\alpha = [a_0; a_1, a_2, a_3, \ldots]$ is bounded, i.e., there is a threshold $M_0 = M_0(\alpha) < \infty$ such that $a_k \le M_0$ holds for all $k \ge 1$. The well-known fact from diophantine approximation

$$\alpha = \frac{p_i}{q_i} + \frac{(-1)^{i+1}}{q_i(q_{i+1} + \theta q_i)},$$

where $p_i/q_i = [a_0; a_1, \ldots, a_{i-1}]$ is the $i$th convergent of $\alpha$, $q_{i+1} = a_iq_i + q_{i-1}$, and $0 < \theta = \theta(i) < 1$, implies that $c_0$ and $M_0$ are basically reciprocals of each other (apart from an absolute constant factor). Note that every quadratic irrational is badly approximable, since periodicity implies boundedness.

Lemma 2.14. Assume that $\alpha$ is badly approximable, and $k\|k\alpha\| \ge c_0 > 0$ holds for all integers $k \ge 1$. Then for any integer $n$,

$$\sum_{k=1}^{n}\frac{1}{\|k\alpha\|} \le \frac{4}{c_0}\,n\log\frac{n}{c_0}\Big/\log 2.$$

In general, for any m > 2 we have X n : n 2

(2.124)

We claim that the set Aj has at most 2j C1 elements. Indeed, if jAj j > 2j C1 then by the Pigeonhole Principle there exist 1 k1 < k2 n such that ki 2 Aj , i D 1; 2, and jfk1 ˛g fk2 ˛gj
0. Thus we have n X kD1

X j 1

D

XX 1 1 D kk˛k kk˛k j 1 k2Aj

n 2j 1 c0

4n c0

jAj j

X j 1W2j n=c0

1 X n j C1 2 D c0 j 1 2j 1

1
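0 for all integers $k \ge 1$. Then for any $n$ and $T$,

$$S_\alpha(n) = \sum_{j=1}^{T}\frac{\cos((2n+1)\pi j\alpha) - \cos(\pi j\alpha)}{2\pi j\sin(\pi j\alpha)} + \theta_1\frac{4n\log(n/c_0)}{\log 2\cdot c_0T} \tag{2.125}$$

and

$$M_\alpha(N) = \frac{1}{N}\sum_{n=1}^{N}S_\alpha(n) = -\sum_{j=1}^{T}\frac{1}{2\pi j\tan(\pi j\alpha)} - \sum_{j=1}^{T}\frac{\sin(2\pi j\alpha) - \sin(2(N+1)\pi j\alpha)}{4\pi Nj\sin^2(\pi j\alpha)} + \theta_2\frac{4N\log(N/c_0)}{\log 2\cdot c_0T}, \tag{2.126}$$

where $|\theta_1| < 1$ and $|\theta_2| < 1$. □

The only novelty in the proof of (2.126) is the use of the summation formula

$$\sum_{n=1}^{N}\cos(n\beta + \gamma) = \frac{\sin((N+\frac12)\beta + \gamma) - \sin(\frac12\beta + \gamma)}{2\sin(\beta/2)}, \tag{2.127}$$

instead of (2.120).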

2.3.2 The Hardy–Littlewood Series

Now we return to the numerical series

$$\sum_{n=1}^{\infty}\frac{1}{n\sin(\pi n\alpha)}, \quad \alpha \text{ is irrational}, \tag{2.128}$$

briefly mentioned at the end of Sect. 2.1. First notice that the series (2.128) cannot be convergent, since the terms do not tend to zero for any $\alpha$. Indeed, the inequality $\|n\alpha\| < 1/n$ holds for infinitely many values of $n$, for example, let $n = q_j$ where $p_j/q_j$ is the $j$th convergent of $\alpha$. The inequality $\|n\alpha\| < 1/n$ combined with the trivial fact $|\sin(\pi n\alpha)| \le \pi\|n\alpha\|$ implies that (2.128) contains infinitely many terms that have absolute value $\ge 1/\pi$. Thus the convergence is out of the question.

Nevertheless, Hardy and Littlewood made the very interesting discovery that for the special value $\alpha = \sqrt2$ the partial sums of (2.128) remain uniformly bounded, that is,

$$\sum_{n=1}^{N}\frac{1}{n\sin(\pi n\alpha)} = O(1). \tag{2.129}$$

Equation (2.129) represents a miraculous cancellation; we can consider it the next best thing to convergence.

Note that Hardy and Littlewood actually proved the slightly more general result that if $\alpha = \sqrt{a^2+1}$, $a$ is odd, then the partial sums always remain bounded. On the other hand, for many other quadratic irrationals the $N$th partial sum is $c\log N + O(1)$ with $c \ne 0$ (Hardy and Littlewood gave the example $\alpha = \sqrt6/2 - 1$).

What is going on here? We will give a very transparent proof of (2.129) by using the following improved version of (2.126).

Proposition 2.16. If $\alpha$ is badly approximable, then for any $N$,

$$M_\alpha(N) = -\sum_{j=1}^{N}\frac{1}{2\pi j\tan(\pi j\alpha)} + O(1), \tag{2.130}$$

where the implicit constant $O(1) = O_\alpha(1)$ is independent of $N$.

We postpone the proof of Proposition 2.16 to the next section. Besides Proposition 2.16, we also need the following simple trigonometric identity:

$$\frac{1}{\tan(\beta)} - \frac{1}{\tan(2\beta)} = \frac{2\cos^2(\beta) - \cos(2\beta)}{2\sin(\beta)\cos(\beta)} = \frac{1}{\sin(2\beta)}. \tag{2.131}$$

By using (2.131), we obtain

$$\sum_{n=1}^{N}\frac{1}{n\sin(\pi n\alpha)} = \sum_{n=1}^{N}\frac{1}{n\tan(\pi n\alpha/2)} - \sum_{n=1}^{N}\frac{1}{n\tan(\pi n\alpha)},$$

and combining this with Proposition 2.16, we get the equation

$$\sum_{n=1}^{N}\frac{1}{n\sin(\pi n\alpha)} = 2\pi M_\alpha(N) - 2\pi M_{\alpha/2}(N) + O(1). \tag{2.132}$$

If $\alpha$ is a quadratic irrational, then $\alpha/2$ is also a quadratic irrational; therefore, combining Eq. (2.132) with Proposition 2.1, we obtain

Proposition 2.17. If $\alpha$ is a quadratic irrational, then there is a constant $c = c(\alpha)$ such that

$$\sum_{n=1}^{N}\frac{1}{n\sin(\pi n\alpha)} = c\log N + O(1), \tag{2.133}$$

where the constant factor $c = c(\alpha)$ can be determined by using (2.132) and Proposition 2.16.

Now we are in a position to understand why the constant factor $c(\alpha)$ in (2.133) equals 0 for $\alpha = \sqrt2$, and why in general it equals 0 for any $\alpha = \sqrt{m^2+1}$ where $m \ge 1$ is an odd integer. The advantage of $\alpha = \sqrt{m^2+1}$ is that it has a particularly simple continued fraction:

$$\alpha = [m; 2m, 2m, 2m, \ldots] = [m; \overline{2m}]$$

and

Case 1: if $m$ is odd, then $\alpha/2 = [(m-1)/2; \overline{1, 1, m-1}]$,
Case 2: if $m$ is even, then $\alpha/2 = [m/2; \overline{4m, m}]$.

In Case 1 both $\alpha$ and $\alpha/2$ have periods of odd length, so by Proposition 2.1 and (2.132), the partial sums of the series (2.128) are $O(1)$. On the other hand, in Case 2, $\alpha/2$ has a period of even length, so the partial sums of the series (2.128) have the form $c(\alpha)\log N + O(1)$ where $c(\alpha)$ is never zero.

Now we clearly understand why in the "$O(1)$-theorem" of Hardy and Littlewood the condition "$m$ is odd" was necessary. Indeed, if $\alpha = \sqrt{m^2+1}$ and $m$ is even, then there is no $O(1)$-theorem: by (2.132) and Case 2 above,

$$\sum_{n=1}^{N}\frac{1}{n\sin(\pi n\alpha)} = O(1) - 2\pi M_{\alpha/2}(N) = \frac{2\pi}{12}(4m - m)\frac{\log N}{2\log(m + \sqrt{m^2+1})} + O(1) = \frac{\pi m}{4\log(m + \sqrt{m^2+1})}\log N + O(1),$$

since $x = m$ and $y = 1$ is the least solution of Pell's equation $x^2 - (m^2+1)y^2 = \pm1$.

In view of (2.132) it is natural to ask the following related question: How to compute the continued fraction for $\alpha/2$ from the continued fraction for $\alpha$? Well, if $\alpha = [a_0; a_1, a_2, a_3, \ldots]$ then

$$\alpha/2 = [a_0/2; 2a_1, a_2/2, 2a_3, a_4/2, \ldots, a_{2i}/2, 2a_{2i+1}, \ldots]$$

if this formula does make sense, i.e., if $a_{2i}$ is even for every $i \ge 0$. Under this "parity condition," by using (2.132) and Proposition 2.16, it is very easy to characterize those quadratic irrationals for which the partial sums of the series (2.128) are $O(1)$. Indeed, if the length $s$ of the period $a_{j+1}, a_{j+2}, \ldots, a_{j+s}$ of $\alpha$ is odd, then the necessary and sufficient condition for an "$O(1)$-theorem" is $\sum_{i=j+1}^{j+s}(-1)^ia_i = 0$. On the other hand, if the length of the period is even, then there is no "$O(1)$-theorem" whatsoever.

For example, if $\alpha = \sqrt{41} = [6; 2, 2, 12, 2, 2, 12, \ldots] = [6; \overline{2, 2, 12}]$ then the "parity condition" holds:

$$\frac{\alpha}{2} = \frac{\sqrt{41}}{2} = [3; 4, 1, 24, 1, 4, 6, 4, 1, 24, 1, 4, 6, \ldots] = [3; \overline{4, 1, 24, 1, 4, 6}],$$

and by (2.132) we have

$$\sum_{n=1}^{N}\frac{1}{n\sin(\pi n\alpha)} = O(1) - 2\pi M_{\alpha/2}(N) = \frac{2\pi}{12}\cdot\frac{4 - 1 + 24 - 1 + 4 - 6}{6}\cdot\frac{\log N}{3\log(32 + 5\sqrt{41})} + O(1) = \frac{2\pi}{9}\cdot\frac{\log N}{\log(32 + 5\sqrt{41})} + O(1),$$

since $x = 32$ and $y = 5$ is the least solution of Pell's equation $x^2 - 41y^2 = \pm1$.

The general case, when the "parity condition" is violated, is technically more complicated and somewhat unpleasant. We guess that this technical difficulty was the reason why Hardy and Littlewood restricted their study to the very special quadratic irrationals $\alpha = \sqrt{m^2+1} = [m; \overline{2m}]$ having the simplest possible ("one digit period") continued fraction.

How to obtain the continued fraction for $\alpha/2$ in general, assuming we know $\alpha = [a_0; a_1, a_2, a_3, \ldots]$? There is an interesting general procedure to answer this question, even when the "parity condition" is violated. We learned it from Richard Bumby (Rutgers University), an expert in continued fractions, who claims that the procedure goes back to Hurwitz. What Hurwitz was really interested in was to find the continued fraction for $e/2$ and $2e$, based on the knowledge of Euler's classical solution for $e$:

$$e = [2; 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, 1, \ldots, 1, 2i, 1, \ldots]. \tag{2.134}$$

2.3.3 Doubling and Halving in Continued Fractions

The procedure consists of three operations. The first two, H = "halving," D = "doubling," are perfectly natural; the third, S = "special operation," is the tricky one. For example, to get the continued fraction for $e/2$, first we apply the "halving operation" H to the first "digit" 2 in (2.134): this gives 1, and next comes the "doubling operation" D applied to the second "digit" 1 in (2.134), and so on. There are nine rules.

1. H(2n) = nD (i.e., D comes next)
2. Dn = (2n)H
3. H(2n + 1) = n,1S
4. Dn,1 = (2n + 1)S
5. S(2n) = 1,n − 1,1S
6. S(2n + 1) = 1,nD
7. S1,n = (2n + 1)H
8. S1,n,1 = (2n + 2)S
9. S2 = 2S

Note that rules 1 and 2 are obvious, but the rest of the rules require a little bit of work with continued fractions. For example, to prove rule 3, we may proceed as follows ($n \ge 1$, $m \ge 1$ are integers and $x > 1$ is a real):

$$\frac{(2n+1) + \cfrac{1}{m + \frac1x}}{2} = n + \frac12 + \frac{1}{2(m + \frac1x)} = n + \frac{m + 1 + \frac1x}{2m + \frac2x} = n + \cfrac{1}{\cfrac{2m + \frac2x}{m + 1 + \frac1x}} = n + \cfrac{1}{1 + \cfrac{m - 1 + \frac1x}{m + 1 + \frac1x}}$$

$$= n + \cfrac{1}{1 + \cfrac{1}{\cfrac{m + 1 + \frac1x}{m - 1 + \frac1x}}} = n + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{2}{m - 1 + \frac1x}}} = n + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{\frac{m - 1 + \frac1x}{2}}}}.$$

Assume now that $m = 2k+1$ where $k \ge 1$ is an integer, then

$$\frac{(2n+1) + \cfrac{1}{m + \frac1x}}{2} = n + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{k + \frac{1}{2x}}}},$$

which proves the combination of rules 3 and 6. A similar argument proves the rest of the cases; we leave the details to the reader.

We illustrate the application of these rules by determining the continued fractions of $e/2$ and $2e$ (first published by Hurwitz). To get $e/2$ we proceed on the "digits" in (2.134); we start with the "halving operation" applied on 2 (the first "digit" of $e$):

H2 ⟹ rule 1 ⟹ 1 (D comes next)
D1 ⟹ rule 2 ⟹ 2 (H comes next)
H2 ⟹ rule 1 ⟹ 1 (D comes next)
D1,1 ⟹ rule 4 ⟹ 3 (S comes next)
S4 ⟹ rule 5 ⟹ 1,1,1 (S comes next)
S1,1 ⟹ rule 7 ⟹ 3 (H comes next)
H6 ⟹ rule 1 ⟹ 3 (D comes next)
D1,1 ⟹ rule 4 ⟹ 3 (S comes next)
S8 ⟹ rule 5 ⟹ 1,3,1 (S comes next)
S1,1 ⟹ rule 7 ⟹ 3 (H comes next)
H(10) ⟹ rule 1 ⟹ 5 (D comes next)
D1,1 ⟹ rule 4 ⟹ 3 (S comes next)
S(12) ⟹ rule 5 ⟹ 1,5,1 (S comes next)
S1,1 ⟹ rule 7 ⟹ 3 (H comes next)

and so on. We applied the following rules: 1, 2, 1, 4, 5, 7, 1, 4, 5, 7, 1, 4, 5, 7, 1, 4, 5, 7, ...


This sequence shows periodicity; the period is 1, 4, 5, 7, and we obtain

$$e/2 = [1; 2, 1, 3, 1, 1, 1, 3, 3, 3, 1, 3, 1, 3, 5, 3, 1, 5, 1, 3, \ldots]. \tag{2.135}$$

It is easy to recognize the linear pattern in (2.135):

$$e/2 = [1; 2, 1, 3, 1, 1, 1, 3, 3, 3, 1, 3, 1, 3, 5, 3, 1, 5, 1, 3, \ldots, 2i+1, 3, 1, 2i+1, 1, 3, \ldots].$$

Similarly, to get $2e$ we proceed on the "digits" in (2.134), but of course here we start with the "doubling operation" applied on 2:

D2,1 ⟹ rule 4 ⟹ 5 (S comes next)
S2 ⟹ rule 9 ⟹ 2 (S comes next)
S1,1 ⟹ rule 7 ⟹ 3 (H comes next)
H4 ⟹ rule 1 ⟹ 2 (D comes next)
D1,1 ⟹ rule 4 ⟹ 3 (S comes next)
S6 ⟹ rule 5 ⟹ 1,2,1 (S comes next)
S1,1 ⟹ rule 7 ⟹ 3 (H comes next)
H8 ⟹ rule 1 ⟹ 4 (D comes next)
D1,1 ⟹ rule 4 ⟹ 3 (S comes next)
S(10) ⟹ rule 5 ⟹ 1,4,1 (S comes next)
S1,1 ⟹ rule 7 ⟹ 3 (H comes next)
H(12) ⟹ rule 1 ⟹ 6 (D comes next)
D1,1 ⟹ rule 4 ⟹ 3 (S comes next)
S(14) ⟹ rule 5 ⟹ 1,6,1 (S comes next)

and so on. We applied the following rules: 4, 9, 7, 1, 4, 5, 7, 1, 4, 5, 7, 1, 4, 5, 7, 1, 4, 5, ... This sequence shows periodicity with the same period as for $e/2$, and we obtain

$$2e = [5; 2, 3, 2, 3, 1, 2, 1, 3, 4, 3, 1, 4, 1, 3, 6, 3, 1, 6, 1, \ldots].$$

It is easy to recognize the linear pattern here:

$$2e = [5; 2, 3, 2, 3, 1, 2, 1, 3, 4, 3, 1, 4, 1, 3, 6, 3, 1, 6, 1, \ldots, 2i, 3, 1, 2i, 1, 3, \ldots]. \tag{2.136}$$
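The expansions (2.135) and (2.136) can be cross-checked without the H/D/S rules. The sketch below uses a different, brute-force route: it evaluates a long truncation of Euler's expansion (2.134) as an exact rational, halves or doubles it, and re-expands the result. It also implements the simple halving formula that applies under the "parity condition." All function names are hypothetical, and the truncation length is an assumption chosen generously so that the leading partial quotients are stable.

```python
from fractions import Fraction

def e_partial_quotients(blocks):
    """Euler's expansion (2.134): [2; 1, 2, 1, 1, 4, 1, ..., 1, 2i, 1, ...]."""
    terms = [2]
    for i in range(1, blocks + 1):
        terms += [1, 2 * i, 1]
    return terms

def cf_value(terms):
    """Exact rational value of the finite continued fraction [t0; t1, ..., tn]."""
    value = Fraction(terms[-1])
    for t in reversed(terms[:-1]):
        value = t + 1 / value
    return value

def cf_expand(x, count):
    """First `count` partial quotients of the rational number x."""
    out = []
    for _ in range(count):
        q = x.numerator // x.denominator
        out.append(q)
        x -= q
        if x == 0:
            break
        x = 1 / x
    return out

def halve_even_parity(terms):
    """[a0/2; 2a1, a2/2, 2a3, ...]; valid only if every a_{2i} is even."""
    if any(t % 2 for t in terms[0::2]):
        raise ValueError("parity condition violated")
    return [t // 2 if i % 2 == 0 else 2 * t for i, t in enumerate(terms)]

if __name__ == "__main__":
    e_approx = cf_value(e_partial_quotients(20))
    print(cf_expand(e_approx / 2, 20))   # compare with (2.135)
    print(cf_expand(2 * e_approx, 20))   # compare with (2.136)
    print(halve_even_parity([6] + [2, 2, 12] * 4))  # sqrt(41)/2
```

This does not replace the nine rules, which work digit by digit and explain the periodic pattern; it only confirms the resulting partial quotients.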

2.3.4 A Geometric Interpretation

We conclude Sect. 2.3 with the interesting observation that the partial sums of the Hardy–Littlewood series [see (2.128)]

\[ \sum_{n=1}^{N} \frac{1}{n \sin(\pi n\alpha)}, \qquad \alpha \text{ is irrational}, \tag{2.137} \]



have a nice geometric meaning: the partial sums represent the "average error" in yet another natural lattice point counting problem. To justify this claim, we go back to Sect. 1.2, where we counted lattice points inside the axes-parallel right triangle bounded by the lines $y = \alpha x$, $y = 0$, $x = n$ (we excluded the lattice points on the boundary). Here we slightly modify the problem: let $0 < \gamma < 1$; we shift the line $y = \alpha x$ to the parallel line $y = \alpha(x - \gamma)$ passing through the point $(\gamma, 0)$. This point is the left corner of our new triangle; the lines $y = 0$, $x = n$ remain unchanged. In other words, we just shift the left corner of the right triangle from the origin $(0, 0)$ to $(\gamma, 0)$. Counting the lattice points inside the new triangle vertically, we obtain the following sum [an analog of (1.47)]:

\[ \lfloor \alpha - \gamma\alpha \rfloor + \lfloor 2\alpha - \gamma\alpha \rfloor + \lfloor 3\alpha - \gamma\alpha \rfloor + \cdots + \lfloor (n-1)\alpha - \gamma\alpha \rfloor = \sum_{k=1}^{n-1} \left( k\alpha - \gamma\alpha - \frac12 - \Big( \{k\alpha - \gamma\alpha\} - \frac12 \Big) \right) = E_{\alpha,\gamma}(n-1) - S_{\alpha,\gamma}(n-1), \tag{2.138} \]

where

\[ E_{\alpha,\gamma}(m) = \alpha \binom{m+1}{2} - \left( \gamma\alpha + \frac12 \right) m \]

and

\[ S_{\alpha,\gamma}(m) = \sum_{k=1}^{m} ((k\alpha - \gamma\alpha)). \]
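As a sanity check of the decomposition (2.138), here is a small numerical sketch. It assumes the sawtooth convention $((x)) = \{x\} - 1/2$ for non-integer $x$; the function names and the test values $\alpha = \sqrt2$, $\gamma = 1/2$, $n = 1000$ are illustrative choices, not taken from the text.

```python
import math

def lattice_count(alpha, gamma, n):
    """Sum of floor(k*alpha - gamma*alpha) for k = 1, ..., n-1."""
    return sum(math.floor(k * alpha - gamma * alpha) for k in range(1, n))

def expectation(alpha, gamma, m):
    """E(m) = alpha * C(m+1, 2) - (gamma*alpha + 1/2) * m."""
    return alpha * m * (m + 1) / 2 - (gamma * alpha + 0.5) * m

def error_term(alpha, gamma, m):
    """S(m): sum of the sawtooth {x} - 1/2 at x = k*alpha - gamma*alpha."""
    total = 0.0
    for k in range(1, m + 1):
        x = k * alpha - gamma * alpha
        total += (x - math.floor(x)) - 0.5
    return total

alpha, gamma, n = math.sqrt(2), 0.5, 1000
lhs = lattice_count(alpha, gamma, n)
rhs = expectation(alpha, gamma, n - 1) - error_term(alpha, gamma, n - 1)
print(lhs, rhs)
```

Up to floating-point rounding the two printed numbers agree, since (2.138) is an exact identity.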

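Just like in Sect. 1.2, we consider $E_{\alpha,\gamma}(n-1)$ the "expectation," and $S_{\alpha,\gamma}(n-1)$ is the "error term" (i.e., the deviation from the expected value). By using the Fourier series of the sawtooth function [see (2.118)], we have

\[ ((x - \gamma\alpha)) = -\sum_{j=1}^{\infty} \frac{\sin(2\pi j(x - \gamma\alpha))}{\pi j}, \]

and so we have the (formal) equation

\[ S_{\alpha,\gamma}(m) = -\sum_{j=1}^{\infty} \frac{1}{\pi j} \sum_{k=1}^{m} \sin(2\pi j(k\alpha - \gamma\alpha)) = -\sum_{j=1}^{\infty} \frac{1}{\pi j} \cdot \frac{\cos\!\big(2\pi j\alpha(\tfrac12 - \gamma)\big) - \cos\!\big(2\pi j\alpha(m + \tfrac12 - \gamma)\big)}{2\sin(\pi j\alpha)}. \]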



Now we choose $\gamma = 1/2$, that is, the left corner of our right triangle is the point $(1/2, 0)$ (instead of the origin). Then

\[ S_{\alpha,1/2}(m) = \sum_{j=1}^{\infty} \frac{\cos(2\pi m j\alpha) - 1}{2\pi j \sin(\pi j\alpha)}, \tag{2.139} \]

implying that in the average

\[ M_{\alpha,1/2}(N) = \frac1N \sum_{m=1}^{N} S_{\alpha,1/2}(m) \]

we have the new factor $\sin(\pi j\alpha)$ in the denominator instead of $\tan(\pi j\alpha)$ that we have in $M_\alpha(N)$; see (2.125), (2.126) and (2.139). Now assume that $\alpha$ is badly approximable; then the proof of Proposition 2.16 can be easily adapted for the similar $M_{\alpha,1/2}(N)$, and it gives the following analog of (2.130):

\[ M_{\alpha,1/2}(N) = -\sum_{j=1}^{N} \frac{1}{2\pi j \sin(\pi j\alpha)} + O(1), \tag{2.140} \]

where the implicit constant $O(1) = O_\alpha(1)$ is independent of $N$. Comparing (2.137) to (2.140), we see the geometric interpretation of the initial segment of the Hardy–Littlewood series. It represents the "average error" in a lattice point counting problem: namely, counting lattice points in axes-parallel right triangles of slope $\alpha$ (where $\alpha$ is badly approximable), bounded by the horizontal axis, where the left corner is the fixed half-integer point $(1/2, 0)$; see the picture below.



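2.4 Fourier Series and a Problem of Hardy and Littlewood (II)

The whole section is devoted to the proof of Proposition 2.16. By using Lemma 2.15 with the choice $T \approx N \log N$, we have

\[ M_\alpha(N) = -\sum_{j=1}^{N} \frac{1}{2\pi j \tan(\pi j\alpha)} - S_1 - S_2 + O(1), \tag{2.141} \]

where

\[ S_1 = \sum_{j=1}^{N} \frac{\sin(2\pi j\alpha) - \sin(2\pi(N+1)j\alpha)}{4\pi N j \sin^2(\pi j\alpha)} \tag{2.142} \]

and

\[ S_2 = \sum_{j=N+1}^{T} \frac{1}{2\pi j} \left( \frac{1}{\tan(\pi j\alpha)} + \frac{\sin(2\pi j\alpha) - \sin(2\pi(N+1)j\alpha)}{2N \sin^2(\pi j\alpha)} \right). \tag{2.143} \]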

Since the irrational rotation is uniformly distributed, we have the "plausible" approximation

\[ \frac{1}{M_2 - M_1} \sum_{M_1 \le k < M_2} f(k\alpha) \approx \int_0^1 f(x)\,dx, \tag{2.144} \]

where f .x/ is a “nice” periodic function with period one. We can make the “plausible” approximation (2.144) precise by using the so-called Koksma’s inequality. Lemma 2.18 (“Koksma’s inequality”). Let X D fx1 ; : : : ; xn g be an arbitrary n-element point set in the unit interval [0,1), then ˇ ˇ n Z 1 ˇ .X / Z 1 ˇ1 X ˇ ˇ f .xi / f .x/ dx ˇ jf 0 .x/j dx; ˇ ˇ ˇn n 0 0 i D1 where of course f 0 is the derivative of f (i.e., we assume that f is smooth), and ˇ ˇ ˇ ˇX ˇ ˇ 1 ny ˇ .X / D sup ˇ ˇ ˇ 0 qj . Thus we have Z˛ .nI I / D Z˛ .M I I / C O.1/; and so by (2.150), jZ˛ .nI I / njI jj D jZ˛ .M I I / M jI j C O.1/ njI jj jZ˛ .M I I / M jI jj C O.1/ C njI j D

D

max bj

rj 0 is a constant (note that the case of quadratic irrationals is trivial). In view of (2.199) Roth’s inequality (2.200) is nearly best possible (since

2.5 A Detour: The Giant Leap in Number Theory

$\varepsilon > 0$ can be arbitrarily small), but a more delicate analysis reveals that there is plenty of room for improvement in (2.200). Indeed, (2.200) is equivalent to

\[ q\,\|q\alpha\| > \frac{c(\alpha, \varepsilon)}{q^{\varepsilon}} \tag{2.201} \]

for every integer $q \ge 1$, where $\|x\|$ denotes the distance of a real $x$ from the nearest integer. On the other hand, for every real algebraic number of degree $\ge 3$, including $\alpha = \sqrt[3]{2}$, computer experimentation seems to support the much stronger inequality

\[ q\,\|q\alpha\| > \frac{c(\alpha, \varepsilon)}{\log q\,(\log\log q)^{1+\varepsilon}} \tag{2.202} \]

for every integer $q \ge 3$, and also that (2.202) is best possible in the sense that we cannot delete $\varepsilon > 0$. Notice that there is an exponential gap between (2.201) and (2.202). By the way, (2.202) is certainly true for almost every real $\alpha$; the proof is easy.

A serious handicap of Roth's theorem (or Thue–Siegel–Roth theorem) is that the constant $c = c(\alpha, \varepsilon) > 0$ is ineffective: we cannot replace it with an explicit constant. The reason is that the proof technique ("Thue method") is indirect: it involves a hypothetical assumption that there is a large "bad" $q$, which behaves wickedly, and the constant $c = c(\alpha, \varepsilon) > 0$ depends on the size of this "bad" $q$ ($q$ is finite, but in principle it can be arbitrarily large). Nevertheless effective results were obtained by A. Baker in the 1960s (for which he was awarded the Fields medal in 1970). For example, in 1964 Baker proved the explicit result

\[ q\,\|q\sqrt[3]{2}\| > \frac{10^{-6}}{q^{0.955}} \tag{2.203} \]

that holds for every integer $q \ge 1$. The point here is the effective constant $10^{-6}$ in the numerator and the exponent $0.955 < 1$ in the denominator (notice that (2.203) with 1 instead of 0.955 is trivial, since $\sqrt[3]{2}$ is a cubic number). We have to admit, therefore, that there is a humiliating exponential gap between the apparent truth [i.e., conjecture (2.202)] and what contemporary mathematics can do: the ineffective (2.201) and the effective (2.203), due to two Fields medalists. (Nevertheless, even a "weak" result like (2.203) has remarkable consequences in the theory of diophantine equations.)

Conjecture (2.202) for real algebraic numbers (of degree $\ge 3$), a special case of the vague Giant Leap phenomenon, features "randomness." Where does this pseudorandomness come from? This is a fundamental open problem, and we are nowhere near understanding it (not to mention answering it). For more about this exciting general issue, see Wolfram [Wo] and Beck [Be6].
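The kind of computer experimentation mentioned above can be imitated on a very small scale. The sketch below is only an illustration under stated assumptions: it uses double-precision floats (adequate for $q \le 10^4$, not for a serious search), and the names `q_times_norm` and `Q_MAX` are hypothetical. It checks Baker's bound (2.203) on this range and records where $q\,\|q\sqrt[3]{2}\| \log q$ is smallest.

```python
import math

ALPHA = 2 ** (1 / 3)   # cube root of 2 in double precision

def dist_to_nearest_integer(x):
    """The norm ||x||: distance of x from the nearest integer."""
    return abs(x - round(x))

def q_times_norm(q):
    """The quantity q * ||q * alpha|| appearing in (2.201)-(2.203)."""
    return q * dist_to_nearest_integer(q * ALPHA)

Q_MAX = 10_000

# Baker's effective inequality (2.203) on the range 1 <= q <= Q_MAX.
baker_ok = all(q_times_norm(q) > 1e-6 / q ** 0.955 for q in range(1, Q_MAX + 1))

# Smallest value of q*||q*alpha||*log q, the quantity relevant to (2.202).
smallest, where = min((q_times_norm(q) * math.log(q), q) for q in range(3, Q_MAX + 1))
print(baker_ok, smallest, where)
```

A range this short says nothing about the truth of conjecture (2.202); it only shows what is being measured.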



With some exaggeration we may even include the celebrated Riemann Hypothesis as another example of the Giant Leap. In the history of mathematics the set of primes served as the first example of what one would call a "random set." The Riemann Hypothesis (arguably the most famous open problem in mathematics) is equivalent to a problem about the "randomness" of the primes in the following way. The starting point is Riemann's remarkable Explicit Formula for the prime-counting function $\pi(x) = \sum_{p \le x} 1$, which involves the nontrivial zeros of the Riemann zeta function. Instead of the original formula, nowadays it is customary to discuss a simplified version, due to von Mangoldt, where the plain prime-counting function $\pi(x)$ is replaced with a weighted version ("Mangoldt sum")

\[ \psi(x) = \sum_{1 \le n \le x} \Lambda(n), \tag{2.204} \]

where $\Lambda(n) = \log p$ if $n$ is a power of $p$ ($p$ always stands for a prime) and $\Lambda(n) = 0$ if $n$ is not a prime power. Riemann's Explicit Formula in prime number theory goes as follows:

\[ \psi(x) = x - \sum_{\rho} \frac{x^{\rho}}{\rho} + O(1), \tag{2.205} \]

where $\rho$ runs through the nontrivial zeta-zeros (meaning the zeros in the vertical strip with real part between 0 and 1). Riemann described the number of the nontrivial zeta-zeros (say) in the vertical box where the imaginary part is between 0 and $T$ ($T$ is "large"): the number is

\[ \frac{1}{2\pi}\, T \log T - \frac{1 + \log(2\pi)}{2\pi}\, T + O(\log T). \tag{2.206} \]

In sharp contrast to the number, we can prove very little about the location of the nontrivial zeta-zeros. What we can prove is much, much less than the Riemann Hypothesis, which claims that the nontrivial zeta-zeros are all on the critical line (vertical line with […] 0, $y_0 > 0$ and $y_0$ is least). The meaning of "primary representations" in (2.211) will be explained in the proof below.

Proof. First we give a precise definition of the infinite series

\[ \sum_{\substack{(x,y) \ne (0,0):\\ \text{primary representations}}} \frac{1}{x^2 - dy^2} \tag{2.212} \]

in the middle of (2.211), and prove the convergence. Note that $x^2 - dy^2$ is the principal (binary quadratic) form of discriminant $4d$, and the theory of quadratic forms of discriminant $4d$ is equivalent to the theory of the real quadratic field $\mathbb{Q}(\sqrt d)$. We assume that the reader is somewhat familiar with the simplest concepts and facts about quadratic forms and quadratic fields (see, for example, the book [Za4]). We recall the well-known fact that, given any integer $A \ne 0$, if the equation $x^2 - dy^2 = A$ has one integral solution $(x, y)$, then the equation has infinitely many integral solutions. Indeed, if $x_1^2 - dy_1^2 = A$ and $u^2 - dv^2 = 1$, then the product formula

2.6 Connection with Quadratic Fields (I)

\[ (x_1 + y_1\sqrt d)(u + v\sqrt d) = (x_1 u + y_1 v d) + (x_1 v + y_1 u)\sqrt d = x_2 + y_2\sqrt d \tag{2.213} \]

leads to a new solution $x_2 = x_1 u + y_1 v d$, $y_2 = x_1 v + y_1 u$ of the equation $x^2 - dy^2 = A$. Since Pell's equation $u^2 - dv^2 = 1$ has infinitely many solutions, generated by the least solution, product formula (2.213) gives rise to infinitely many solutions of $x^2 - dy^2 = A$. The two solutions, $(x_1, y_1)$ and $(x_2, y_2)$, related by the product formula (2.213), are called associates; this defines an equivalence relation on the set of all solutions of $x^2 - dy^2 = A$. Let $R_d(A)$ denote the number of equivalence classes. Note that $R_d(A)$ is always finite and satisfies the inequality

\[ R_d(A) \le \tau(|A|), \tag{2.214} \]

where $\tau(n)$ is the divisor function, i.e., $\tau(n)$ is the number of (positive) divisors of $n$, including 1 and $n$ itself. Inequality (2.214) is a classical result (it is in fact a corollary of an exact formula for $R_d(A)$, due to Dirichlet). Now we are ready to define the precise meaning of series (2.212):

\[ \sum_{\substack{(x,y) \ne (0,0):\\ \text{primary representations}}} \frac{1}{x^2 - dy^2} = \sum_{A \ne 0} \frac{R_d(A)}{A} = \sum_{n=1}^{\infty} \frac{R_d(n) - R_d(-n)}{n}. \tag{2.215} \]

To prove the convergence in (2.215), we describe a definite way of selecting a representative solution from each equivalence class; we call these representatives the primary solutions of $x^2 - dy^2 = A$. First we take the conjugate of the product formula (2.213):

\[ (x_1 - y_1\sqrt d)(u - v\sqrt d) = x_2 - y_2\sqrt d, \tag{2.216} \]

and then take the ratio of (2.213) and (2.216):

\[ \frac{x_1 + y_1\sqrt d}{x_1 - y_1\sqrt d} \cdot \frac{u + v\sqrt d}{u - v\sqrt d} = \frac{x_2 + y_2\sqrt d}{x_2 - y_2\sqrt d}. \tag{2.217} \]

We have $u + v\sqrt d = \pm\eta^m$ for some integer $m$ (where $\eta = \eta_d$ is the fundamental unit), and so $u - v\sqrt d = \pm\eta^{-m}$. Returning to (2.217), we have

\[ \frac{x_2 + y_2\sqrt d}{x_2 - y_2\sqrt d} = \frac{x_1 + y_1\sqrt d}{x_1 - y_1\sqrt d}\, \eta^{2m}. \tag{2.218} \]

In view of (2.218) there is just one choice of $m$ (for a given $x_1$ and $y_1$) which will ensure that

\[ 1 < \frac{x_2 + y_2\sqrt d}{x_2 - y_2\sqrt d} \le \eta_d^2. \tag{2.219} \]



Equation (2.219) does not change if we replace $(x_2, y_2)$ with $(-x_2, -y_2)$, so we can further ensure that

\[ x_2 - y_2\sqrt d > 0. \tag{2.220} \]

The particular solution $x = x_2$, $y = y_2$ of $x^2 - dy^2 = A$ that satisfies (2.219) and (2.220) will be called primary. To prove the convergence in (2.215), we estimate the sums

\[ \sum_{n=1}^{N} R_d(n) \quad\text{and}\quad \sum_{n=1}^{N} R_d(-n) \]

by employing a simple lattice point counting argument. (It is worthwhile to point out that the same lattice point counting argument is used in the proof of Dirichlet's class number formula for real quadratic fields $h(d) \log \eta_d = \sqrt d\, L(1, \chi_d)$.) We will show that

\[ \sum_{n=1}^{N} R_d(n) = c_0(d) N + O(\sqrt N) \tag{2.221$'$} \]

and

\[ \sum_{n=1}^{N} R_d(-n) = c_0(d) N + O(\sqrt N) \tag{2.221$''$} \]

with the same constant factor $c_0(d)$ (which is of course independent of $N$).


To prove (2.221), we use (2.219) and (2.220), which tell us that the sum $\sum_{n=1}^{N} R_d(n)$ equals the number of lattice points $(x, y) \in \mathbb{Z} \times \mathbb{Z}$ satisfying the three requirements:

\[ 0 < x^2 - dy^2 \le N, \qquad x - y\sqrt d > 0, \qquad 1 < \frac{x + y\sqrt d}{x - y\sqrt d} \le \eta_d^2. \tag{2.222} \]

The region defined by (2.222) is a sector of a hyperbola bounded by two half lines through the origin; we call it a "hyperbolic triangle," and denote it by $H(N) = H_d(N)$; see the picture. The left corner of the "hyperbolic triangle" $H(N) = H_d(N)$ is the origin $(0, 0)$, the lower right corner is the point $(\sqrt N, 0)$, and the upper right corner is the intersection of the hyperbola $x^2 - dy^2 = N$ and the positive side of the line

\[ \frac{x + y\sqrt d}{x - y\sqrt d} = \eta_d^2. \]

It is not too difficult to determine the area of $H(N)$: we have

\[ \mathrm{Area}(H_d(N)) = \frac{N}{2\sqrt d} \log \eta_d. \tag{2.223} \]

We outline the proof of (2.223). First we change the coordinates from $x, y$ to $u, v$ where $u = x - y\sqrt d$ and $v = x + y\sqrt d$ and compute the determinant

\[ \frac{\partial(u, v)}{\partial(x, y)} = \begin{vmatrix} 1 & -\sqrt d \\ 1 & \sqrt d \end{vmatrix} = 2\sqrt d. \tag{2.224} \]
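The comparison between the lattice point count in (2.222) and the area (2.223) can be tried out directly. The sketch below rests on two assumptions of mine that are not stated in the text: $\eta_d$ is taken to be $t + u\sqrt d$ for the least positive solution of $t^2 - du^2 = 1$, and the upper boundary line in (2.222) is rewritten in the equivalent integer form $t y = u x$ (which follows by expanding $x + y\sqrt d = \eta_d^2 (x - y\sqrt d)$), so that no floating-point comparison is needed on the boundary. All function names are illustrative.

```python
import math

def least_pell_solution(d):
    """Least positive (t, u) with t*t - d*u*u == 1 (brute force; small d only)."""
    u = 1
    while True:
        t = math.isqrt(d * u * u + 1)
        if t * t == d * u * u + 1:
            return t, u
        u += 1

def count_hyperbolic_triangle(d, N):
    """Count lattice points with y >= 1, t*y <= u*x and x*x - d*y*y <= N."""
    t, u = least_pell_solution(d)
    count = 0
    # On the line t*y = u*x we have x*x - d*y*y = (y/u)**2, hence y <= u*sqrt(N).
    for y in range(1, u * (math.isqrt(N) + 1) + 1):
        x_min = -((-t * y) // u)            # ceiling of t*y/u
        x_max = math.isqrt(N + d * y * y)   # largest x with x*x - d*y*y <= N
        count += max(0, x_max - x_min + 1)
    return count

def area(d, N):
    """Right-hand side of (2.223) with eta = t + u*sqrt(d)."""
    t, u = least_pell_solution(d)
    return N * math.log(t + u * math.sqrt(d)) / (2 * math.sqrt(d))

N = 10**6
print(count_hyperbolic_triangle(2, N), area(2, N))
```

For $d = 2$ the count and the area should differ by an amount of order $\sqrt N$, in line with (2.221$'$).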

In the u; v-plane, the hyperbolic triangle H.N / [defined by (2.222)] is given by 0 < uv N; u > 0; u < v u2 : These conditions are equivalent to 0 1 is an absolute constant to be specified later. We divide the left-hand side of (2.235) into two parts: X p

1yN;1xN d W jx 2 dy 2 jm

x2

X X 1 D C ; 2 1 2 dy

(2.236)



where X 1

D

X p 1yN;1xN p dW jx 2 dy 2 jm;jxy d j […]

\[ \frac{1}{\varphi} > 1 \quad\text{and}\quad -1 < \frac{1}{\psi} < 0. \]

It is well known that under this condition the continued fraction for $1/\varphi$ is purely periodic; in fact, we have

\[ \frac{1}{\varphi} = \frac{3}{\sqrt{19} - 2} = \frac{\sqrt{19} + 2}{5} = [\overline{1; 3, 1, 2, 8, 2}]. \]

The period 1, 3, 1, 2, 8, 2 of $1/\varphi$ is basically the same as the period of $\sqrt{19}$:

\[ \sqrt{19} = [4; \overline{2, 1, 3, 1, 2, 8}]. \]

Here "basically the same" means that the two periods, 1, 3, 1, 2, 8, 2 and 2, 1, 3, 1, 2, 8, can be transformed into each other by a cyclic permutation. Checking the remaining 11 forms on the list, it is easy to see that each has a period that can be transformed into the period of $\sqrt{19}$ by a cyclic permutation. Thus the class number of $\mathbb{Q}(\sqrt{19})$ is one.

Example 3.4. Let $d = 79$, then the discriminant $D = 4 \cdot 79 = 316 = b^2 - 4ac$, implying that $b$ is even. By condition (3.67'), 0 […] $1/\varphi > 1$ and $-1 < 1/\psi < 0$,


3 Variance, and Its Connection with Quadratic Fields

so the continued fraction for $1/\varphi$ is purely periodic:

\[ \frac{1}{\varphi} = [\overline{1; 7, 1, 16}]. \]

The period 1, 7, 1, 16 of $1/\varphi$ is exactly the same as the period of $\sqrt{79} = [8; \overline{1, 7, 1, 16}]$. Next consider the form $3x^2 + 14xy - 10y^2$ on the list; it has the following "first and second roots":

\[ \varphi = \frac{-7 + \sqrt{79}}{3}, \qquad \psi = \frac{-7 - \sqrt{79}}{3}. \]

Again

\[ \frac{1}{\varphi} > 1 \quad\text{and}\quad -1 < \frac{1}{\psi} < 0, \]

so the continued fraction for $1/\varphi$ is purely periodic:

\[ \frac{1}{\varphi} = \frac{3}{\sqrt{79} - 7} = \frac{\sqrt{79} + 7}{10} = [\overline{1; 1, 1, 2, 3, 5}]. \]

Despite the fact that the reverse 5, 3, 2, 1, 1, 1 of this period can be transformed into the previous period 1, 5, 3, 2, 1, 1 by some cyclic permutation, the three periods

\[ 1, 5, 3, 2, 1, 1 \quad\text{and}\quad 1, 7, 1, 16 \quad\text{and}\quad 1, 1, 1, 2, 3, 5 \tag{3.68} \]

are substantially different (i.e., neither period can be transformed into another by a cyclic permutation). Moreover, checking the remaining forms on the list, it is easy to see that each has a period that can be obtained by some cyclic permutation from one of the three periods in (3.68). Thus the class number of $\mathbb{Q}(\sqrt{79})$ is 3.

These examples illustrate an effective method (studying the periods of continued fractions) to determine the class number of positive discriminants in general. Let's now return to (3.65'), and in particular to the value of $\zeta_K(2)$.
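The period computations in these examples are easy to reproduce with exact integer arithmetic. The following sketch uses the standard recurrence for the continued fraction of $(P + \sqrt d)/Q$; it is not the book's procedure for listing reduced forms, and the names `cf_period` and `same_up_to_rotation` are hypothetical. It assumes that $d$ is a positive non-square integer, that $Q$ divides $d - P^2$, and that $Q$ stays positive, which holds for the inputs used here.

```python
import math

def cf_period(P, Q, d):
    """Continued fraction of (P + sqrt(d)) / Q as (pre-period, period)."""
    r = math.isqrt(d)
    seen = {}          # state (P, Q) -> index of the digit produced from it
    digits = []
    while (P, Q) not in seen:
        seen[(P, Q)] = len(digits)
        a = (P + r) // Q           # floor of (P + sqrt(d)) / Q for Q > 0
        digits.append(a)
        P = a * Q - P
        Q = (d - P * P) // Q
    start = seen[(P, Q)]
    return digits[:start], digits[start:]

def same_up_to_rotation(p, q):
    """True if the list q is a cyclic permutation of the list p."""
    return len(p) == len(q) and any(p[i:] + p[:i] == q for i in range(len(p)))

print(cf_period(0, 1, 19))    # sqrt(19)
print(cf_period(2, 5, 19))    # (sqrt(19) + 2) / 5
print(cf_period(0, 1, 79))    # sqrt(79)
print(cf_period(7, 10, 79))   # (sqrt(79) + 7) / 10
```

Comparing periods with `same_up_to_rotation` mirrors the "cyclic permutation" test used in the text to separate classes.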

3.2.3 The Dedekind Zeta Function at s=2: A Formula Involving Characters

Our goal is to supply an explicit evaluation of $\zeta_K(2)$ in the form of a finite character sum, see Proposition 3.6. Since it has a relatively short and elementary proof, we

3.2 Connection with Quadratic Fields (II)


decided to include it. It illustrates the kind of number-theoretic arguments that are needed here. Proposition 3.6 is based on the well-known product formula

\[ \zeta_K(s) = \zeta(s) L(s, \chi_D), \tag{3.69} \]

where $\zeta(s)$ is the Riemann zeta function and $L(s, \chi_D)$ is the Dirichlet L-function of the real Dirichlet character $\chi_D$ corresponding to discriminant $D$ (= the discriminant of $K$):

\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} \quad\text{and}\quad L(s, \chi_D) = \sum_{n=1}^{\infty} \frac{\chi_D(n)}{n^s}. \]

We know since Euler that

\[ \zeta(2) = \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}. \]

The difficult part is to determine $L(2, \chi_D)$.

Lemma 3.5. For positive discriminant $D$ we have

\[ L(2, \chi_D) = \frac{\pi^2}{D^{5/2}} \sum_{n=1}^{D-1} \chi_D(n) n^2. \tag{3.70} \]

Proof. Since the quadratic field $K$ is real, its character $\chi_D(n)$ is even, i.e., $\chi_D(-n) = \chi_D(n)$; thus we have

\[ L(2, \chi_D) = \sum_{n=1}^{\infty} \chi_D(n) n^{-2} = \frac12 \sum_{a=1}^{D-1} \chi_D(a) \varphi(a, D), \tag{3.71} \]

where

\[ \varphi(a, D) = \sum_{n \equiv a \ (\mathrm{mod}\ D)} n^{-2}, \tag{3.72} \]

and the summation in (3.72) is extended over all integers (positive and negative). We clearly have

\[ \varphi(a, D) = \sum_{m \in \mathbb{Z}} (mD + a)^{-2} = D^{-2} f(a/D), \]



where

\[ f(x) = \sum_{m \in \mathbb{Z}} \frac{1}{(m + x)^2}. \]

It is not too hard to see that

\[ f(x) = \frac{\pi^2}{\sin^2(\pi x)}. \tag{3.73} \]

Indeed, we have the well-known formula

\[ \frac{\pi}{\tan(\pi x)} = \sum_{m \in \mathbb{Z}} \frac{1}{m + x}, \]

and taking the derivative of both sides, (3.73) follows. Combining (3.71)–(3.73) we already obtain a representation of $L(2, \chi_D)$ in the form of a finite sum, but because of the denominators $\sin^2(\pi a/D)$, the evaluation of this sum is very inconvenient for large $D$. To obtain the elegant/convenient formula (3.70), we involve a Fourier expansion and the so-called Gauss sum.

Let $F(x) = 0$ if $x$ is an integer, and let $F(b/D) = f(b/D)$ for all $b \in \mathbb{Z}$ when $b$ is not divisible by $D$. The function $F(b/D)$ is periodic in the integral variable $b$ with period $D$ and therefore has a finite Fourier expansion

\[ F(b/D) = \sum_{n=0}^{D-1} \gamma_n e^{2\pi i n b/D}. \tag{3.74} \]

It is easy to determine the constant term $\gamma_0$ in (3.74):

\[ \gamma_0 = \frac1D \sum_{b=0}^{D-1} F(b/D) = \frac1D \sum_{b=1}^{D-1} f(b/D) = \frac{D^2}{D} \sum_{b=1}^{D-1} \sum_{m \in \mathbb{Z}} (mD + b)^{-2} = 2D \sum_{\substack{n \ge 1:\\ n \not\equiv 0 \ (\mathrm{mod}\ D)}} n^{-2} \]
\[ = 2D \left( \sum_{n \ge 1} n^{-2} - D^{-2} \sum_{n \ge 1} n^{-2} \right) = 2D (1 - D^{-2}) \frac{\pi^2}{6} = \frac{\pi^2 (D-1)(D+1)}{3D}. \tag{3.75} \]

To determine a general coefficient $\gamma_n$, $1 \le n < D$, the standard recipe is to multiply (3.74) by $e^{-2\pi i n b/D}$ and take the sum over $b = 0, 1, \ldots, D-1$:



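\[ \frac1D \sum_{b=1}^{D-1} f(b/D) e^{-2\pi i n b/D} = \frac1D \sum_{b=0}^{D-1} F(b/D) e^{-2\pi i n b/D} = \frac1D \sum_{m=0}^{D-1} \gamma_m \sum_{b=0}^{D-1} e^{2\pi i (m-n) b/D} = \gamma_n. \]

Since $f(x)$ is an even function (see (3.73)), we have

\[ \gamma_n = \frac{1}{2D} \sum_{b=1}^{D-1} f(b/D) \left( e^{2\pi i n b/D} + e^{-2\pi i n b/D} \right) = -\frac{2\pi^2}{D} \sum_{b=1}^{D-1} \frac{e^{2\pi i n b/D} + e^{-2\pi i n b/D}}{(e^{\pi i b/D} - e^{-\pi i b/D})^2}. \tag{3.76} \]

Write $y = y(b) = e^{\pi i b/D}$; then, motivated by (3.76), we study the rational function

\[ \frac{y^{2n} + y^{-2n} - 2}{(y - y^{-1})^2} = \frac{y^{2n} - 1}{y^2 - 1} \cdot \frac{1 - y^{-2n}}{1 - y^{-2}} = (1 + y^2 + y^4 + \cdots + y^{2n-2})(1 + y^{-2} + y^{-4} + \cdots + y^{-2n+2}) \]
\[ = n + (n-1)(y^2 + y^{-2}) + (n-2)(y^4 + y^{-4}) + (n-3)(y^6 + y^{-6}) + \cdots + (y^{2n-2} + y^{-2n+2}). \tag{3.77} \]

By using (3.77), we can easily evaluate the Fourier coefficient $\gamma_n$ with $1 \le n < D$:

\[ \gamma_n = -\frac{2\pi^2}{D} n(D - n) + \gamma_0. \tag{3.78} \]

Indeed, for $1 \le k < D$ we have

\[ \sum_{b=1}^{D-1} e^{2\pi i k b/D} = -1 + \frac{1 - e^{2\pi i k}}{1 - e^{2\pi i k/D}} = -1. \tag{3.79} \]

Using (3.79) in (3.77), we have with $y = y(b) = e^{\pi i b/D}$:

\[ \sum_{b=1}^{D-1} \frac{y^{2n} + y^{-2n} - 2}{(y - y^{-1})^2} = (D-1)n - 2\big( (n-1) + (n-2) + \cdots + 1 \big) = (D-1)n - n(n-1) = n(D - n). \tag{3.80} \]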
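Identity (3.80) is a finite statement and can be tested numerically. The sketch below evaluates the left-hand side with complex floating-point arithmetic for a few moduli; the function name `sum_380` and the chosen values of $D$ are illustrative assumptions.

```python
import cmath

def sum_380(n, D):
    """Left-hand side of (3.80) with y = exp(pi*i*b/D), summed over b = 1..D-1."""
    total = 0
    for b in range(1, D):
        y = cmath.exp(1j * cmath.pi * b / D)
        total += (y ** (2 * n) + y ** (-2 * n) - 2) / (y - 1 / y) ** 2
    return total

for D in (5, 8, 13):
    for n in range(1, D):
        # Each value should agree with n*(D - n) up to rounding.
        print(D, n, round(sum_380(n, D).real, 6), n * (D - n))
```

The imaginary parts are zero up to rounding error, as expected for a sum of squared ratios of sines.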



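On the other hand, by (3.76),

\[ \sum_{b=1}^{D-1} \frac{y^{2n} + y^{-2n} - 2}{(y - y^{-1})^2} = -\frac{\gamma_n D}{2\pi^2} + \frac12 \sum_{b=1}^{D-1} \frac{1}{\sin^2(\pi b/D)} = -\frac{\gamma_n D}{2\pi^2} + \frac{\gamma_0 D}{2\pi^2}. \]

Combining this with (3.80), (3.78) follows. Returning to (3.71)–(3.74), we have

\[ L(2, \chi_D) = \frac12 \sum_{a=1}^{D-1} \chi_D(a) D^{-2} f(a/D) = \frac{1}{2D^2} \sum_{a=1}^{D-1} \chi_D(a) \left( \sum_{n=0}^{D-1} \gamma_n e^{2\pi i n a/D} \right) \]
\[ = \frac{1}{2D^2} \sum_{a=1}^{D-1} \chi_D(a) \left( \sum_{n=0}^{D-1} \left( -\frac{2\pi^2}{D} n(D - n) + \gamma_0 \right) e^{2\pi i n a/D} \right) = -\frac{\pi^2}{D^3} \sum_{a=1}^{D-1} \chi_D(a) \sum_{n=0}^{D-1} n(D - n) e^{2\pi i n a/D} \]
\[ = -\frac{\pi^2}{D^3} \sum_{n=0}^{D-1} n(D - n) \left( \sum_{a=1}^{D-1} \chi_D(a) e^{2\pi i n a/D} \right). \tag{3.81} \]

Here the sum

\[ S(n, D) = \sum_{a=1}^{D-1} \chi_D(a) e^{2\pi i n a/D} \]

is the famous Gauss sum, which shows up everywhere in algebraic number theory. We need two facts about the Gauss sum.

Fact 1: $S(n, D) = \chi_D(n) S(1, D)$, where of course

\[ S(1, D) = \sum_{a=1}^{D-1} \chi_D(a) e^{2\pi i a/D}. \]

Fact 2: $S(1, D) = \sqrt D$.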



Note that the proof of Fact 1 is easy if $n$ and $D$ are relatively prime: indeed, then $\chi_D^2(n) = 1$, and so

\[ S(n, D) = \chi_D(n) \sum_{a=1}^{D-1} \chi_D(an) e^{2\pi i n a/D} = \chi_D(n) S(1, D), \]

since the numbers $an$, $a = 1, 2, \ldots, D-1$ run through a complete set of residues modulo $D$. If $n$ and $D$ have a common divisor $\ge 2$, then both sides of the equality in Fact 1 are equal to 0; the proof of this case requires some extra work that we skip. The proof of Fact 2 is far from easy; we have to refer to the textbooks. Now using Facts 1 and 2 in (3.81), we obtain

\[ L(2, \chi_D) = -\frac{\pi^2}{D^3} \sum_{n=0}^{D-1} n(D - n) \chi_D(n) \sqrt D. \tag{3.82} \]

By using the fact $\chi_D(D - n) = \chi_D(n)$, we can simplify (3.82):

\[ \sum_{n=1}^{D-1} n \chi_D(n) = \frac12 \sum_{n=1}^{D-1} \big( n \chi_D(n) + (D - n) \chi_D(D - n) \big) = \frac12 \sum_{n=1}^{D-1} (n + D - n) \chi_D(n) = \frac D2 \sum_{n=1}^{D-1} \chi_D(n) = 0. \]

Using this in (3.82), formula (3.70) follows, and the proof of Lemma 3.5 is complete. □

Returning to (3.69), and applying Lemma 3.5, with $K = \mathbb{Q}(\sqrt d)$, where $D$ is the discriminant, we obtain the following result.

Proposition 3.6. We have

\[ \zeta_K(2) = \zeta(2) L(2, \chi_D) = \frac{\pi^2}{6} \cdot \frac{\pi^2}{D^{5/2}} \sum_{n=1}^{D-1} \chi_D(n) n^2 = \frac{\pi^4}{6 D^{5/2}} \sum_{n=1}^{D-1} \chi_D(n) n^2. \tag{3.83} \]

This is a well-known "folklore" result (as far as we know, it goes back to Hecke, Klingen, Siegel, and was simplified by others). For example, with $d = 5$ we have $D = 5$ and $\chi_D(n)$ is the classical quadratic residue symbol ("Legendre symbol"), so for $K = \mathbb{Q}(\sqrt 5)$ (3.83) gives



\[ \zeta_K(2) = \frac{\pi^4}{150\sqrt 5} (1^2 - 2^2 - 3^2 + 4^2) = \frac{2\pi^4}{75\sqrt 5}. \tag{3.84} \]
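Lemma 3.5 and the value (3.84) can be compared with a brute-force partial sum of the L-series. The sketch below is restricted, as an assumption of convenience, to prime discriminants $D = p \equiv 1 \pmod 4$, where $\chi_D$ is the Legendre symbol and Euler's criterion gives a one-line implementation; the function names and the truncation at 200000 terms are illustrative.

```python
import math

def legendre(n, p):
    """Legendre symbol (n/p) for an odd prime p, via Euler's criterion."""
    r = pow(n % p, (p - 1) // 2, p)   # r is 0, 1 or p - 1
    return -1 if r == p - 1 else r

def L2_partial(p, terms):
    """Partial sum of L(2, chi) = sum of chi(n) / n^2."""
    return sum(legendre(n, p) / n**2 for n in range(1, terms + 1))

def L2_closed_form(p):
    """Right-hand side of (3.70) with D = p."""
    s = sum(legendre(n, p) * n * n for n in range(1, p))
    return math.pi**2 * s / p**2.5

for p in (5, 13):
    print(p, L2_partial(p, 200_000), L2_closed_form(p))

# Dedekind zeta value of Q(sqrt 5) at s = 2 via (3.83), to compare with (3.84).
zeta_K2 = math.pi**2 / 6 * L2_closed_form(5)
print(zeta_K2, 2 * math.pi**4 / (75 * math.sqrt(5)))
```

The partial sums converge slowly (the tail after $N$ terms is at most of order $1/N$), so agreement to four or five decimals is what one should expect here.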

3.2.4 An Alternative Formula Due to Siegel: Proposition 3.7

Proposition 3.6 is an elegant, satisfying result, but there is a more efficient way to evaluate the Dedekind zeta function $\zeta_K(s)$ at $s = 2$. Since the proof is much more difficult than that of Proposition 3.6, we have to skip it and just briefly state the result itself. This new approach is based on two deep arithmetic facts. The first fact is the so-called functional equation of the Dedekind zeta function, applied at the special value $s = 2$:

\[ \zeta_K(-1) = \frac{1}{4\pi^4} (4p)^{3/2} \zeta_K(2) \tag{3.85} \]

(note that $4p$ is the discriminant). The second fact is a remarkable formula that implies that $60\,\zeta_K(-1)$ is always an integer(!):

\[ \zeta_K(-1) = \frac{1}{60} \sum_{\substack{b^2 + ac = p:\\ a > 0,\ c > 0}} a, \tag{3.86} \]

where the sum is over all ways of writing $p = b^2 + ac$ with $a$, $c$ positive integers (integer $b$ can be positive, negative, and zero). Formula (3.86) is due to Siegel (see, e.g., in Zagier [Za2]), and it gives a fast way of computing $\zeta_K(2)$. Combining (3.85) and (3.86), we have

\[ \zeta_K(2) = \frac{\pi^4}{120\, p^{3/2}} \sum_{\substack{b^2 + ac = p:\\ a > 0,\ c > 0}} a. \tag{3.87} \]

Applying (3.87) in (3.65'), we have

\[ V_{\sqrt p}(N) = \frac1N \sum_{n=1}^{N} \big( S_{\sqrt p}(n) - M_{\sqrt p}(N) \big)^2 = \left( \frac{1}{240\sqrt p} \sum_{\substack{b^2 + ac = p:\\ a > 0,\ c > 0}} a \right) \frac{\log N}{\log \eta_p} + O\big( \log\log N \sqrt{\log N} \big), \tag{3.88} \]



where $p \equiv 3$ (mod 4) is a prime, the class number $h(p) = 1$, and $\eta_p$ is the fundamental unit in $K = \mathbb{Q}(\sqrt p)$, i.e., $\eta_p = t_0 + u_0\sqrt p$ where $(t_0, u_0)$ is the least positive solution of Pell's equation $t^2 - pu^2 = 1$.

Remark. It seems very likely that (3.88) holds for any $\alpha = \sqrt d$ where $d \ge 2$ is a square-free integer (generalizing the special case $d = p \equiv 3$ (mod 4) prime with class number $h(p) = 1$; of course, for arbitrary $d \ge 2$ the factor $\eta_p$ should be replaced by $\eta_d$, the fundamental unit in $K = \mathbb{Q}(\sqrt d)$). We call it a conjecture and challenge the experts of algebraic number theory to prove it.

This conjecture of mine on the variance is clearly motivated by the results that we know about the expectation. We recall that at the beginning of Sect. 2.1 we explained how a deep result in algebraic number theory (the Hirzebruch–Meyer–Zagier class number formula, see (2.19)) implies Proposition 2.1 in the special case $\alpha = \sqrt p$, where $p \equiv 3$ (mod 4) is a prime $> 3$, and $h(p) = 1$. But Proposition 2.1 is a far more general result, which holds for any real $\alpha$ (not just for the special quadratic irrationals $\alpha = \sqrt p$). Similarly, here we derived our variance formula (3.88) from another deep result in algebraic number theory (Siegel's formula (3.86)), and it is reasonable to expect that (3.88) has a far-reaching generalization (something like Proposition 2.1), far beyond the reach of Siegel's formula. Unfortunately, we have no clue what this (hypothetical) generalization may look like; perhaps the reader can help me.

What we certainly know is that Eq. (3.88) remains true for any positive $d \equiv 2$ (mod 4) if the class number of $\mathbb{Q}(\sqrt d)$ is one; for example, this happens for $d = 2, 6, 14, 22$. Let $f(d)$ denote the critical sum in (3.88):

\[ f(d) = \sum_{\substack{b^2 + ac = d:\\ a > 0,\ c > 0}} a. \tag{3.89} \]

For example, we have

\[ f(2) = 5, \quad f(3) = 10, \quad f(6) = 30, \quad f(7) = 40. \tag{3.90} \]

By (3.88),

\[ \lim_{N \to \infty} \frac{V_{\sqrt d}(N)}{\log N} = c(d) = \frac{f(d)}{240 \sqrt d \log \eta_d}. \tag{3.91} \]

For example, by (3.90) and (3.91) we have

\[ c(2) = \frac{1}{48\sqrt 2 \log(1 + \sqrt 2)}, \qquad c(3) = \frac{1}{24\sqrt 3 \log(2 + \sqrt 3)}, \tag{3.92} \]

\[ c(6) = \frac{1}{8\sqrt 6 \log(5 + 2\sqrt 6)}, \qquad c(7) = \frac{1}{6\sqrt 7 \log(8 + 3\sqrt 7)}. \tag{3.93} \]
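The values in (3.90) and the constants in (3.92) are quick to recompute. In the sketch below the sum (3.89) is organized by $b$: for each integer $b$ with $b^2 < d$ the admissible values of $a$ are exactly the divisors of $d - b^2$. The function names are illustrative, and the unit $\eta$ is passed in by hand rather than computed.

```python
import math

def sigma(n):
    """Sum of the positive divisors of n."""
    return sum(a for a in range(1, n + 1) if n % a == 0)

def f(d):
    """Sum of a over all (a, b, c) with b*b + a*c = d, a > 0, c > 0."""
    total = 0
    b = 0
    while b * b < d:
        weight = 1 if b == 0 else 2      # b and -b contribute the same values of a
        total += weight * sigma(d - b * b)
        b += 1
    return total

def c_const(d, eta):
    """The limit constant in (3.91) for a given unit eta."""
    return f(d) / (240 * math.sqrt(d) * math.log(eta))

print([f(d) for d in (2, 3, 6, 7)])
print(c_const(2, 1 + math.sqrt(2)), c_const(3, 2 + math.sqrt(3)))
```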



Notice that (3.92) justifies the values of the constant factors $C_4(\sqrt 2)$ and $C_4(\sqrt 3)$ in the denominator in Theorem 1.2, see (1.52) and (1.53).

The golden ratio $\alpha = (1 + \sqrt 5)/2$ is not covered by the key formula (3.88). One of the (annoying) peculiarities of algebraic number theory is that the real quadratic fields $\mathbb{Q}(\sqrt d)$ require a slightly different treatment if $d$ is a square-free positive integer with $d \equiv 1$ (mod 4) ($d \ge 5$). Then the algebraic integers of $\mathbb{Q}(\sqrt d)$ have the form

\[ x + y\,\frac{\sqrt d - 1}{2}, \quad x, y \in \mathbb{Z}, \quad\text{with norm}\quad \left( x + y\,\frac{\sqrt d - 1}{2} \right) \left( x - y\,\frac{\sqrt d + 1}{2} \right) = \frac{(2x - y)^2 - dy^2}{4}. \tag{3.94} \]

Therefore, if $\alpha = (1 + \sqrt d)/2$ and $\|j\alpha\|$ is "small," then

\[ \sin^2(\pi j\alpha) \approx \pi^2 \|j\alpha\|^2 = \pi^2 (j\alpha - \ell)^2 \approx \frac{\pi^2}{dj^2} \left( \ell + j\,\frac{\sqrt d - 1}{2} \right)^2 \left( \ell - j\,\frac{\sqrt d + 1}{2} \right)^2 = \frac{\pi^2 \big( (2\ell - j)^2 - dj^2 \big)^2}{4 \cdot 4dj^2}, \tag{3.95} \]

where $\ell = \ell(j, d)$ is the nearest integer to $j\alpha$. Notice that (3.95) is an analog of (3.48). We can repeat the argument of the case $d \equiv 3$ (mod 4) above with the slight modification that the new Pell equation is

\[ \frac{(2x - y)^2 - dy^2}{4} = \pm 1, \tag{3.96} \]

which is equivalent to $t^2 - du^2 = \pm 4$ and $t \equiv u$ (mod 2). For example, if $d = 5, 13, 17$ then the fundamental units are

\[ \eta_5 = \frac{1 + \sqrt 5}{2}, \qquad \eta_{13} = \frac{3 + \sqrt{13}}{2}, \qquad \eta_{17} = 4 + \sqrt{17}, \tag{3.97} \]

and they all have norm $-1$. The following version of (3.88) covers all cases.

Proposition 3.7. Assume that the class number of the real quadratic field $\mathbb{Q}(\sqrt d)$ is one. Let

\[ \alpha = \sqrt d \ \text{ if } d \equiv 2, 3 \ (\mathrm{mod}\ 4) \quad\text{and}\quad \alpha = \frac{1 + \sqrt d}{2} \ \text{ if } d \equiv 1 \ (\mathrm{mod}\ 4). \]



Then the variance

\[ V_\alpha(N) = \frac1N \sum_{n=1}^{N} \big( S_\alpha(n) - M_\alpha(N) \big)^2 = \left( \frac{1}{120\sqrt D} \sum_{\substack{B^2 + 4ac = D:\\ a > 0,\ c > 0}} a \right) \frac{\log N}{\log \eta_d} + O\big( \log\log N \sqrt{\log N} \big), \tag{3.98} \]

where $\eta_d$ is the fundamental unit in $\mathbb{Q}(\sqrt d)$ and $D$ is the discriminant of $\mathbb{Q}(\sqrt d)$, i.e., $D = 4d$ if $d \equiv 2, 3$ (mod 4) and $D = d$ if $d \equiv 1$ (mod 4) (and, accordingly, $B = 2b$ or $b$).

Remarks. When we compute the standard deviation (i.e., we take the square root), the relatively large error term $O(\log\log N \sqrt{\log N})$ in (3.98) becomes a negligible $O(\log\log N)$:

\[ \sqrt{V_\alpha(N)} = c_\alpha \sqrt{\log N} + O(\log\log N). \]

Let $F(D)$ denote the critical sum in (3.98):

\[ F(D) = \sum_{\substack{B^2 + 4ac = D:\\ a > 0,\ c > 0}} a. \tag{3.99} \]

For example, we have

\[ F(5) = 2, \quad F(13) = 10, \quad F(17) = 20. \tag{3.100} \]

By (3.98),

\[ \lim_{N \to \infty} \frac{V_\alpha(N)}{\log N} = c(d) = \frac{F(D)}{120 \sqrt D \log \eta_d}. \tag{3.101} \]

For example, by (3.97), (3.100) and (3.101) we have

\[ c(5) = \frac{1}{60\sqrt 5 \log\big( (1 + \sqrt 5)/2 \big)}, \qquad c(13) = \frac{1}{12\sqrt{13} \log\big( (3 + \sqrt{13})/2 \big)}, \tag{3.102} \]

\[ c(17) = \frac{1}{6\sqrt{17} \log(4 + \sqrt{17})}. \tag{3.103} \]
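The sum (3.99) can be enumerated in the same spirit. This sketch, with the illustrative name `F`, loops over $B \ge 0$ with $B^2 < D$ and $D - B^2$ divisible by 4; it also checks the compatibility $F(4d) = f(d)$ with the values in (3.90), which is what makes (3.98) reduce to (3.88) when $D = 4d$.

```python
def F(D):
    """Sum of a over all (a, B, c) with B*B + 4*a*c = D, a > 0, c > 0."""
    total = 0
    B = 0
    while B * B < D:
        rest = D - B * B
        if rest % 4 == 0:
            m = rest // 4                          # a * c = m
            divisor_sum = sum(a for a in range(1, m + 1) if m % a == 0)
            total += (1 if B == 0 else 2) * divisor_sum   # B and -B both count
        B += 1
    return total

print([F(D) for D in (5, 13, 17)])      # values in (3.100)
print([F(4 * d) for d in (2, 3, 6, 7)])  # should reproduce (3.90)
```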



3.3 Connection with Quadratic Fields (III)

3.3.1 The General Case: Computing the Variance for an Arbitrary Quadratic Irrational

In Sect. 3.2 we explained how to compute the variance for the special quadratic irrationals $\alpha = \sqrt d$ for $d \equiv 2$ or 3 (mod 4) and $\alpha = (1 + \sqrt d)/2$ for $d \equiv 1$ (mod 4), assuming the class number of the real quadratic field $K = \mathbb{Q}(\sqrt d)$ is one. In these cases the computation of the variance is equivalent to finding the exact value of the Dedekind zeta function $\zeta_K(s)$ at $s = 2$. The general form of a quadratic irrational is

\[ \alpha = \frac{-b \pm \sqrt{\Delta}}{2a}, \]

that is, $\alpha$ is a root of $ax^2 + bx + c = 0$ and $\Delta = b^2 - 4ac > 0$ is the discriminant. We have $\Delta = Dm^2$, where $D$ is a fundamental discriminant (i.e., the discriminant of a real quadratic field). To evaluate the corresponding variance

\[ \sum_{\substack{(x,y) \ne (0,0):\\ \text{primary representations}}} \frac{1}{(ax^2 + bxy + cy^2)^2}, \tag{3.104} \]

the Dedekind zeta function $\zeta_K(s)$ has to be replaced by the zeta function $\zeta(s, M)$ of the complete $\mathbb{Z}$-module $M = \mathbb{Z} \cdot 1 + \mathbb{Z}\alpha$ of $K = \mathbb{Q}(\sqrt d)$. For notational simplicity, we assume that $\Delta = D$, that is, the discriminant of the primitive indefinite form $ax^2 + bxy + cy^2$ is a fundamental discriminant (since the switch from $\Delta = D$ to the general case $\Delta = Dm^2$ is a routine matter in algebraic number theory). The fundamental discriminant means that we work with the zeta function $\zeta_K(s, A)$ of the corresponding ideal class $A$ of $K = \mathbb{Q}(\sqrt d)$. In general $\zeta_K(s, A)$ does not have an Euler-type decomposition as (3.130), but it does have a functional equation relating $\zeta_K(s, A)$ and $\zeta_K(1 - s, A)$ (proved by Hecke): with

\[ F(s) = D^{s/2} \pi^{-s} \Gamma^2(s/2)\, \zeta_K(s, A) \tag{3.105} \]

we have $F(s) = F(1 - s)$. In the special case $s = 2$ we have

\[ \zeta_K(2, A) = \frac{4\pi^4}{D^{3/2}}\, \zeta_K(-1, A). \tag{3.106} \]


We know two effective algorithms to evaluate $\zeta_K(-1, A)$ in the form of a finite sum: one is due to Zagier [Za3] and the other one is due to Shintani [Shi]. Both are developed to the point of explicit calculations. Since the details become quite involved and technical, we stop here and refer the interested reader to the readable papers [Za3] and [Shi].

3.3.2 Computing the Variance in Theorem 1.1: A Special Case

Theorem 1.1 is about the asymptotic behavior of the discrepancy

\[ S_\alpha(\gamma; n) = \sum_{\substack{1 \le k \le n:\\ k\alpha \in (0, \gamma)\ (\mathrm{mod}\ 1)}} 1 \; - \; \gamma n \tag{3.107} \]

of the counting function of the irrational rotation $k\alpha$ (mod 1). For simplicity, assume that $\gamma = 1/2$. The characteristic function $\chi_{1/2}$ of the first half of the unit interval $(0, 1/2)$ has the Fourier series

\[ \chi_{1/2}(x) - \frac12 = \sum_{j \ge 1:\ \mathrm{odd}} \frac{2\sin(2\pi jx)}{\pi j}, \]

and so we have

\[ S_\alpha(1/2; n) = \sum_{k=1}^{n} \chi_{1/2}(k\alpha) - \frac n2 = -\sum_{j \ge 1:\ \mathrm{odd}} \frac{\cos((2n + 1)\pi j\alpha) - \cos(\pi j\alpha)}{\pi j \sin(\pi j\alpha)}. \]

By repeating the argument of Sects. 2.3 and 2.4 we can prove the following analog of Proposition 2.16:

\[ M_\alpha(1/2; N) = \frac1N \sum_{n=1}^{N} S_\alpha(1/2; n) = \sum_{1 \le j \le N:\ \mathrm{odd}} \frac{1}{\pi j \tan(\pi j\alpha)} + O(1), \tag{3.108} \]

assuming $\alpha$ is badly approximable. For simplicity, we just consider the special case $\alpha = \sqrt 2$. Repeating the arguments of Sect. 3.1, we can easily prove the following result for the corresponding variance:

\[ V_{\sqrt 2}(1/2; N) = \frac1N \sum_{n=1}^{N} \big( S_{\sqrt 2}(1/2; n) - M_{\sqrt 2}(1/2; N) \big)^2 = \]



\[ = \frac{1}{2\pi^2} \sum_{k=1:\ \mathrm{odd}}^{N} \frac{1}{k^2 \sin^2(\pi k\sqrt 2)} + \text{negligible} = \frac{4}{\pi^4} \sum_{1 \le x \le \sqrt 2 N} \ \sum_{y=1:\ \mathrm{odd}}^{N} \frac{1}{(x^2 - 2y^2)^2} + \text{negligible}. \tag{3.109} \]

From (3.109) we will derive that

\[ V_{\sqrt 2}(1/2; N) = \frac{3\sqrt 2}{2^7} \cdot \frac{\log N}{\log(1 + \sqrt 2)} + O\big( \log\log N \sqrt{\log N} \big). \]

The new difficulty in (3.109) is the condition "$y$ is odd"; without it the evaluation of the sum on the right-hand side of (3.109) is easy:

\[ \sum_{1 \le x \le \sqrt 2 N} \ \sum_{y=1}^{N} \frac{1}{(x^2 - 2y^2)^2} = \left( \sum_{\substack{(x,y) \ne (0,0):\\ \text{primary representations}}} \frac{1}{(x^2 - 2y^2)^2} \right) \frac{\log N}{\log(1 + \sqrt 2)} + \text{negligible}, \tag{3.110} \]

where the sum on the right-hand side of (3.110) is $\zeta_K(2)$ with $K = \mathbb{Q}(\sqrt 2)$ (and $1 + \sqrt 2$ is the fundamental unit). The condition "$y$ is odd" in (3.109) means that we have to involve an extra case study modulo 8. We have

\[ \zeta_K(s) = \sum_{n=1}^{\infty} \frac{R_2(n)}{n^s}, \tag{3.111} \]

where $R_2(n)$ is the number of primary representations of $x^2 - 2y^2 = \pm n$. We decompose (3.111) modulo 8 as follows:

$\zeta_K(s) = \sum$ […] 0 is not a complete square. The integral solutions of $Ax^2 + Bxy + Cy^2 = m$, where $m \ne 0$ is a given integer, are described by the automorphisms of the binary form (see the unimodular substitution in (3.59)):

\[ x = \frac{t - uB}{2}\, x_1 - Cu\, y_1, \qquad y = Au\, x_1 + \frac{t + uB}{2}\, y_1, \tag{3.129} \]

where $(t, u)$ is a solution of Pell's equation $t^2 - Du^2 = 4$. Since all solutions of Pell's equation $t^2 - Du^2 = 4$ are given by the formula (see (3.60))

\[ \frac{t + u\sqrt D}{2} = \pm \left( \frac{t_0 + u_0\sqrt D}{2} \right)^n, \]

where $(t_0, u_0)$ is the least positive solution, it suffices to study (3.129) with $t = t_0$, $u = u_0$:

\[ x_2 = \frac{t_0 - u_0 B}{2}\, x_1 - Cu_0\, y_1, \qquad y_2 = Au_0\, x_1 + \frac{t_0 + u_0 B}{2}\, y_1. \tag{3.130} \]

Replacing $(x_1, y_1)$ in (3.130) with $(x_2, y_2)$, we obtain $(x_3, y_3)$. Replacing $(x_2, y_2)$ in (3.130) with $(x_3, y_3)$, we obtain $(x_4, y_4)$, and so on. Also, we can go backward and obtain $(x_0, y_0)$, $(x_{-1}, y_{-1})$, $(x_{-2}, y_{-2})$, and so on. In view of (3.128), we have to study the sequence $(x_i, y_i)$ modulo $r_2$ (= denominator of $\gamma$):

\[ \big( x_i \ (\mathrm{mod}\ r_2),\ y_i \ (\mathrm{mod}\ r_2) \big), \quad i \in \mathbb{Z}. \tag{3.131} \]

Equation (3.130) implies that (3.131) is a periodic infinite sequence, and the length of the period is clearly $\le r_2^2$. This periodicity explains that the proof of the special case $\gamma = 1/2$ (and $\alpha = \sqrt 2$) above perfectly illustrates the general case, and we have

\[ \sum_{1 \le x \le \alpha N} \ \sum_{y=1}^{N} \frac{\sin^2(\pi y r_1/r_2)}{(Ax^2 + Bxy + Cy^2)^2} = c'(\alpha, \gamma) \log N + \text{negligible}. \]
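The periodicity of (3.131) can be observed directly. The sketch below builds the substitution (3.130) for a given form, assuming the standard automorph with entries $(t_0 - u_0 B)/2$, $-Cu_0$, $Au_0$, $(t_0 + u_0 B)/2$ (which has determinant 1 because $t_0^2 - Du_0^2 = 4$), and follows a pair modulo $r_2$ until it returns. The Pell solver is brute force and only meant for small discriminants; all names are illustrative.

```python
import math

def least_pell_4(D):
    """Least positive (t, u) with t*t - D*u*u == 4 (brute force; small D only)."""
    u = 1
    while True:
        t = math.isqrt(D * u * u + 4)
        if t * t == D * u * u + 4:
            return t, u
        u += 1

def automorph(A, B, C):
    """Return the map (x1, y1) -> (x2, y2) of (3.130) for A*x^2 + B*x*y + C*y^2."""
    D = B * B - 4 * A * C
    t, u = least_pell_4(D)
    def step(x, y):
        # t - u*B and t + u*B are even, so the integer divisions are exact.
        return ((t - u * B) // 2 * x - C * u * y,
                A * u * x + (t + u * B) // 2 * y)
    return step

def period_mod(step, x, y, r):
    """Length of the period of the orbit of (x, y) modulo r under `step`."""
    start = (x % r, y % r)
    state, length = start, 0
    while True:
        state = tuple(v % r for v in step(*state))
        length += 1
        if state == start:
            return length

step = automorph(1, 0, -2)               # the form x^2 - 2y^2, D = 8
print(step(3, 1))                        # another solution of x^2 - 2y^2 = 7
print([period_mod(step, 3, 1, r) for r in (2, 3, 5, 7)])
```

Because the substitution is invertible modulo $r_2$, every orbit is purely periodic, and the period cannot exceed the number $r_2^2$ of residue pairs.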



Combining this with (3.128), we obtain

\[ V_\alpha(\gamma; N) = \frac1N \sum_{n=1}^{N} \big( S_\alpha(\gamma; n) - M_\alpha(\gamma; N) \big)^2 = c''(\alpha, \gamma) \log N + \text{negligible}. \tag{3.132} \]

Note that the constant factor $c''(\alpha, \gamma)$ in (3.132) is clearly nonzero. Indeed, by the periodicity of (3.131) and by (3.128), it suffices to find a single integer $m \ne 0$ such that $Ax^2 + Bxy + Cy^2 = m$ with some $y \not\equiv 0$ (mod $r_2$) (because then $\sin^2(\pi y r_1/r_2) \ne 0$). But this is trivial: $x = 0$, $y = 1$ is a good choice. Again we can guarantee that "negligible" in (3.132) actually means $O(\log\log N \sqrt{\log N})$. Thus we have

Proposition 3.8. If $\alpha$ is a quadratic irrational, and $0 < \gamma < 1$ is a rational number, then

\[ V_\alpha(\gamma; N) = \frac1N \sum_{n=1}^{N} \big( S_\alpha(\gamma; n) - M_\alpha(\gamma; N) \big)^2 = c''(\alpha, \gamma) \log N + O\big( \log\log N \sqrt{\log N} \big), \]

where $c''(\alpha, \gamma) > 0$ is a strictly positive constant. □

3.3.4 The Case of Symmetric Intervals

What happens if the asymmetric interval $(0,\beta)$ in Proposition 3.8 is replaced with the symmetric interval $(-\beta,\beta)$? It is easy to answer this question. Let $\chi_{\pm\beta}$ denote the characteristic function of the interval $(-\beta,\beta)$ with a rational $0<\beta<1/2$. We write $\beta=r_1/r_2$ where $1\le r_1<r_2$ are relatively prime integers. It is easy to compute the Fourier series of $\chi_{\pm\beta}(x)$:
\[
\chi_{\pm\beta}(x)-2\beta=\sum_{j\in\mathbb{Z}:\ j\ne 0}\frac{\sin(2\pi j r_1/r_2)}{\pi j}\,e^{2\pi i jx}.
\]
Our goal is to study
\[
S_\alpha(\pm\beta;n)=\sum_{\substack{1\le k\le n:\\ k\alpha\in(-\beta,\beta)\ (\mathrm{mod}\ 1)}}1\;-\;2\beta n.
\]


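By using the Fourier series, we have
\[
S_\alpha(\pm\beta;n)=\sum_{k=1}^{n}\chi_{\pm\beta}(k\alpha)-2\beta n
=\sum_{j\in\mathbb{Z}:\ j\ne 0}\frac{\sin(2\pi j r_1/r_2)}{\pi j}\left(\sum_{k=1}^{n}e^{2\pi i jk\alpha}\right)
=\sum_{j\in\mathbb{Z}:\ j\ne 0}\frac{\sin(2\pi j r_1/r_2)}{\pi j}\cdot\frac{e^{2\pi i jn\alpha}-1}{1-e^{-2\pi i j\alpha}}.
\]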

By repeating the argument of Sects. 2.3 and 2.4 we can prove the following analog of Proposition 2.1: if $\alpha$ is badly approximable,
\[
M_\alpha(\pm\beta;N)=\frac{1}{N}\sum_{n=1}^{N}S_\alpha(\pm\beta;n)
=-\sum_{|j|\le N:\ j\ne 0}\frac{\sin(2\pi j r_1/r_2)}{\pi j\,(1-e^{-2\pi i j\alpha})}+O(1)
=-\sum_{j=1}^{N}\frac{\sin(2\pi j r_1/r_2)}{\pi j}+O(1),
\]
because
\[
\frac{1}{1-e^{-2\pi i j\alpha}}+\frac{1}{1-e^{2\pi i j\alpha}}=1.
\]
Since
\[
\sum_{j=1}^{r_2}\sin(2\pi j r_1/r_2)=0,
\]
by Abel's transformation (2.119) we have
\[
\sum_{j=1}^{N}\frac{\sin(2\pi j r_1/r_2)}{j}=O(1).
\]
Therefore, with $\beta=r_1/r_2$,
\[
M_\alpha(\pm\beta;N)=\frac{1}{N}\sum_{n=1}^{N}S_\alpha(\pm\beta;n)=O(1).
\]



Repeating the arguments of Sect. 3.1, we can easily prove the following result for the corresponding variance:
\[
V_\alpha(\pm\beta;N)=\frac{1}{N}\sum_{n=1}^{N}S_\alpha^2(\pm\beta;n)
=\sum_{k=1}^{N}\frac{\sin^2(2\pi k r_1/r_2)}{2\pi^2k^2\sin^2(\pi k\alpha)}+\text{negligible}.
\]
Comparing this to (3.128), we find the following equality between the variances of the “symmetric” and “asymmetric” cases:
\[
V_\alpha(\pm\beta;N)=V_\alpha(2\beta;N)+\text{negligible},
\]
where “negligible” is the usual error term $O(\sqrt{\log N}\,\log\log N)$.

Chapter 4

Proving Randomness

4.1 Completing the Proof of Theorem 1.2

In Sect. 1.5 we almost proved Theorem 1.2 in the special case of the golden ratio. The missing part was the variance, but we took care of this particular issue in Sect. 3.2. The golden ratio has the simplest continued fraction among all quadratic irrationals, and this extreme simplicity (the length of the period is one, and every partial quotient is one) is rather misleading. What we do in this section is basically a “proof by examples.” We replace the golden ratio with some increasingly more complicated quadratic irrationals, such as
\[
\alpha=\sqrt{3}=[1;\overline{1,2}]\quad\text{and}\quad\sqrt{7}=[2;\overline{1,1,1,4}]\quad\text{and}\quad\sqrt{19}=[4;\overline{2,1,3,1,2,8}]
\]
in Theorem 1.2, and explain how to modify the arguments of Sect. 1.5. We show that the numbers $\sqrt{3}$, $\sqrt{7}$, $\sqrt{19}$ represent the general case well. The method of Sect. 1.5 was to set up an approximation (in the special case of the golden ratio $\alpha=(1+\sqrt{5})/2$)
\[
S_\alpha(n)-M_\alpha(N)=X_1+X_2+X_3+\ldots+\text{negligible},
\]
where the $X_i$s are independent and identically distributed random variables (as $n$ runs in $0<n<N$). For an arbitrary quadratic irrational $\alpha$ the construction of these independent random variables is somewhat more complicated, and this construction is the bulk of Sect. 4.1. We start with $\sqrt{3}$. Again we are going to use Ostrowski's explicit formula (1.55), so the first step is Ostrowski's expansion of $n$ with respect to $\alpha=\sqrt{3}$, see (1.54). Since
\[
\alpha=\sqrt{3}=[1;1,2,1,2,1,2,\ldots]=[a_0;a_1,a_2,a_3,\ldots],
\]


it is easy to determine the denominator $q_i$ of the convergents $p_i/q_i$ of $\sqrt{3}$. We have
\[
q_1=1,\quad q_2=a_1=1,\quad q_i=a_{i-1}q_{i-1}+q_{i-2}\ \text{ for } i\ge 3,\ \text{ that is, }\ q_{2j}=q_{2j-1}+q_{2j-2}\ \text{ and }\ q_{2j-1}=2q_{2j-2}+q_{2j-3}\ \text{ for } j\ge 2. \tag{4.1}
\]

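We also have
\[
p_{2j}\pm q_{2j}\sqrt{3}=(p_2\pm q_2\sqrt{3})^{j}=(2\pm\sqrt{3})^{j}, \tag{4.2}
\]
implying
\[
q_{2j}=\frac{1}{2\sqrt{3}}\Bigl((2+\sqrt{3})^{j}-(2-\sqrt{3})^{j}\Bigr). \tag{4.3}
\]
By (4.1), $q_{2j}=q_{2j-1}+q_{2j-2}$, and so (4.3) gives that
\[
\begin{aligned}
q_{2j-1}=q_{2j}-q_{2j-2}
&=\frac{1}{2\sqrt{3}}\Bigl((2+\sqrt{3})^{j}-(2-\sqrt{3})^{j}\Bigr)-\frac{1}{2\sqrt{3}}\Bigl((2+\sqrt{3})^{j-1}-(2-\sqrt{3})^{j-1}\Bigr)\\
&=\frac{1}{2\sqrt{3}}\Bigl((\sqrt{3}+1)(2+\sqrt{3})^{j-1}+(\sqrt{3}-1)(2-\sqrt{3})^{j-1}\Bigr).
\end{aligned} \tag{4.4}
\]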
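The closed forms (4.2)–(4.4) are easy to confirm with exact integer arithmetic. The sketch below is only an illustration (the helper names are ours): it builds the convergents of $\sqrt{3}$ with the indexing $p_i/q_i=[a_0;a_1,\ldots,a_{i-1}]$ and compares them with powers of $2+\sqrt{3}$ stored as integer pairs $(u,v)$ meaning $u+v\sqrt{3}$. Writing $(2+\sqrt{3})^{j-1}=u+v\sqrt{3}$, the right-hand side of (4.4) simplifies to $u+v$, which is what the last helper returns.

```python
from itertools import cycle, islice


def convergents(a, count):
    """Return [(p_1, q_1), ..., (p_count, q_count)] with p_i/q_i = [a_0; a_1, ..., a_{i-1}]."""
    p_prev, q_prev = 1, 0          # formal starting values (p_0, q_0)
    p, q = a[0], 1                 # (p_1, q_1)
    out = [(p, q)]
    for i in range(1, count):
        p, p_prev = a[i] * p + p_prev, p
        q, q_prev = a[i] * q + q_prev, q
        out.append((p, q))
    return out


def power_2_plus_sqrt3(j):
    """Integers (u, v) with (2 + sqrt(3))^j = u + v*sqrt(3)."""
    u, v = 1, 0
    for _ in range(j):
        u, v = 2 * u + 3 * v, u + 2 * v
    return u, v


def q_odd_closed_form(j):
    """q_{2j-1} according to (4.4), evaluated exactly as u + v."""
    u, v = power_2_plus_sqrt3(j - 1)
    return u + v


# Partial quotients of sqrt(3) = [1; 1, 2, 1, 2, ...]
a = [1] + list(islice(cycle([1, 2]), 24))
conv = convergents(a, 24)          # conv[i - 1] == (p_i, q_i)
```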

Following (1.54), the corresponding Ostrowski expansion of $n$ is a unique way to express an arbitrary positive integer $n$ as a linear combination of the $q_i$s as follows:
\[
n=\sum_{i}b_iq_i,\qquad 0\le b_{2j}\overset{*}{\le}2=a_{2j},\qquad 0\le b_{2j-1}\overset{*}{\le}1=a_{2j-1}\quad\text{for } i\ge 1 \tag{4.5}
\]
and $b_1=b_1(n)=0$, where * indicates the Extra Rule that if $b_i=b_i(n)=a_i$ then $b_{i-1}=b_{i-1}(n)=0$ ($i\ge 2$). The only new parameter in Ostrowski's formula (1.55) is $\varepsilon_i=\varepsilon_i(\sqrt{3})=q_i\sqrt{3}-p_i$. By (4.2) we have
\[
\varepsilon_{2j}=q_{2j}\sqrt{3}-p_{2j}=-(2-\sqrt{3})^{j}, \tag{4.6}
\]
and by (4.1),
\[
\varepsilon_{2j-1}=q_{2j-1}\sqrt{3}-p_{2j-1}=(q_{2j}-q_{2j-2})\sqrt{3}-(p_{2j}-p_{2j-2})=\varepsilon_{2j}-\varepsilon_{2j-2}=(\sqrt{3}-1)(2-\sqrt{3})^{j-1}. \tag{4.7}
\]


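Thus we have
\[
q_{2j}|\varepsilon_{2j}|=\frac{1}{2\sqrt{3}}\Bigl(1-(2-\sqrt{3})^{2j}\Bigr), \tag{4.8}
\]
and
\[
q_{2j-1}|\varepsilon_{2j-1}|=\frac{1}{\sqrt{3}}\Bigl(1+(2-\sqrt{3})^{2j-1}\Bigr). \tag{4.9}
\]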
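For readers who want to experiment, the Ostrowski digits in (4.5) can be produced by a greedy procedure: repeatedly subtract the largest available multiple of the largest $q_i\le n$. The sketch below is a minimal illustration for $\sqrt{3}$, not the construction used in the proofs; the names are ours, and only $q_i$ with $i\ge 2$ are used so that $b_1=0$. The greedy choice automatically gives $b_i\le a_i$ and the Extra Rule, because the remainder after taking $b_i=a_i$ copies of $q_i$ is smaller than $q_{i+1}-a_iq_i=q_{i-1}$.

```python
from itertools import cycle, islice


def denominators(a, count):
    """Return [None, q_1, ..., q_count] with q_1 = 1, q_2 = a_1, q_i = a_{i-1} q_{i-1} + q_{i-2}."""
    q = [None, 1, a[1]]
    for i in range(3, count + 1):
        q.append(a[i - 1] * q[i - 1] + q[i - 2])
    return q


def ostrowski(n, q):
    """Greedy Ostrowski digits {i: b_i} (nonzero digits only) of an integer 0 < n < q[-1]."""
    if not 0 < n < q[-1]:
        raise ValueError("n must satisfy 0 < n < q[-1]")
    digits = {}
    i = len(q) - 1
    while n > 0:
        while q[i] > n:            # stops at some i >= 2 because q_2 = 1 <= n
            i -= 1
        digits[i], n = divmod(n, q[i])
    return digits


a = [1] + list(islice(cycle([1, 2]), 24))   # sqrt(3) = [1; 1, 2, 1, 2, ...]
q = denominators(a, 20)                      # q_1, ..., q_20 = 1, 1, 3, 4, 11, 15, ...
example = ostrowski(10, q)                   # 10 = 2*q_4 + 2*q_2 = 2*4 + 2*1
```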

By using these facts in Ostrowski’s formula (1.55), for 0 n < qm we have (we recall that b1 D 0) 1 0 1 X q q j" j j j" b 1 j i i i i AD Sp3 .n/ D .1/i C1 bi @ C bj A qi j"i j C C@ 2 2 q 2 i i D2 2j 0

B 1 p B 240 19 @

C aC A

log N

p

log.8 C 3 7/

C O.log log N

p log N /;

1

0 Vp19 .N / D

p log N p C O.log log N log N /; log.2 C 3/

1

0 Vp7 .N / D

C aC A

X b 2 CacD19W a>0;c>0

C aC A

p p CO.log log N log N /: log.170 C 39 19/ log N



What we actually need is the standard deviation: q

p p 1=2 p Vp3 .N / D 24 3 log.2 C 3/ log N C O.log log N /;

q

p p 1=2 p Vp7 .N / D 6 7 log.8 C 3 7/ log N C O.log log N /;

q Vp19 .N / D

33 p p 40 19 log.170 C 39 19/

1=2

p log N C O.log log N /:

Note that the effect of the error term $O(\log\log N)$ is obviously negligible. The rest of the proof of Theorem 1.2 for an arbitrary quadratic irrational $\alpha$ goes along the lines of the arguments in Sect. 1.5 (i.e., the special case of the golden ratio $\alpha=(\sqrt{5}+1)/2$); we mean the arguments from (1.202) to the end. Note that in Sect. 1.5 we had a minor “parity problem” (due to the fact that the length of the period of the golden ratio is odd) that we could easily eliminate by the ad hoc trick of “slightly extending the groups of odd size in (1.203) going up to the first single 0.” Our illustrative examples in this section, $\alpha=\sqrt{3}$, $\sqrt{7}$, $\sqrt{19}$, all have the property that the length of the period is even, avoiding the “parity problem.” In general, if the length of the period of the continued fraction for $\alpha$ is odd, then we have two options to eliminate the “parity problem.” Either we use the ad hoc trick of Sect. 1.5, or we simply double the period (pretending that the double period is the period). Note that in Proposition 1.19 the integral parameter $n$ runs in the special interval $0\le n<q_m$, where $q_m$ is the $m$th Fibonacci number. Similarly, for $\alpha=\sqrt{3}$ we studied the special interval $0\le n<q_{2m}$, where $q_{2m}$ is the denominator of the $(2m)$th convergent of $\sqrt{3}$, see (4.11). We made this restriction for the sake of simplicity: to guarantee that the initial distribution is the stationary distribution. We can easily generalize from the special interval $0\le n<q_{2m}$ (for $\sqrt{3}$) to an arbitrary interval $0\le n<N$ by using the simple trick explained in the Concluding Remarks at the end of Sect. 1.5, see (1.237)–(1.239). Of course, the same applies for any quadratic irrational $\alpha$ (not just for $\sqrt{3}$).
There are two more technical details that we have to address here: namely, (1) how to prove the analog of (1.223) in general, that is, the fact that the correlation tends to zero exponentially fast and (2) how to prove the analog of (1.208) in general (“exponentially small tail probability for the return time”). We begin with the generalization of (1.223). In Sect. 1.5 we could carry out a direct approach: we could prove (1.223) by a direct computation, because the transition matrix of the corresponding Markov chain was a small 2-by-2 matrix, and it was easy to determine the eigenvalues explicitly. In the general case we can avoid the explicit calculation of the eigenvalues by using the following indirect “metric space” approach.



4.1.2 Ergodic Markov Chains: Exponentially Fast Convergence to the Stationary Distribution

This is a basic result in the theory of Markov chains. In spite of its importance, most books in probability theory either avoid the proof, or just give a (trivial) illustration on a 2-by-2 matrix, or refer to very general results in ergodic theory. To make our book more or less self-contained, we decided to include a short proof. A finite homogeneous Markov chain is called ergodic if there is an integer $s_0\ge 1$ such that for any two states, say, states $i$ and $j$, the $s_0$-step transition probability $p_{i,j}(s_0)$ from $i$ to $j$ is strictly positive: $p_{i,j}(s_0)>0$ for any $i$ and $j$. Ergodicity is equivalent to the fact that the $s_0$th power $A^{s_0}$ of the transition matrix $A=(p_{i,j})_{i,j}$ has the property that every entry is strictly positive. The term ergodic (which comes from statistical mechanics) will be justified by Lemma 4.2. Assume that a finite and homogeneous Markov chain with transition matrix $A$ has $r$ states $1,2,\ldots,r$, and let $\pi=\{\pi_1,\ldots,\pi_r\}$ be a probability distribution on $1,2,\ldots,r$. Then $\pi A^k$ defines a probability distribution for any integer $k\ge 1$. Let $\pi'=\{\pi'_1,\ldots,\pi'_r\}$ and $\pi''=\{\pi''_1,\ldots,\pi''_r\}$ be two arbitrary probability distributions on $1,2,\ldots,r$. We define the “distance”
\[
\mathrm{dist}(\pi',\pi'')=\frac{1}{2}\sum_{i=1}^{r}|\pi'_i-\pi''_i|,
\]

which turns the space of all probability distributions on $1,2,\ldots,r$ into a complete metric space. Trivially $0\le\mathrm{dist}(\pi',\pi'')\le 1$. The following simple result plays a key role here. For the sake of completeness we include the short proof.

Lemma 4.1. If $Q=(q_{i,j})_{i,j}$ is an $r$-by-$r$ stochastic matrix, and $q_{i,j}\ge\delta>0$ for all $1\le i,j\le r$, then
\[
\mathrm{dist}(\pi'Q,\pi''Q)\le(1-\delta)\,\mathrm{dist}(\pi',\pi'').
\]

Proof. First note that
\[
\mathrm{dist}(\pi',\pi'')=\frac{1}{2}\sum_{i=1}^{r}|\pi'_i-\pi''_i|
=\frac{1}{2}\sum_{i=1}^{r}(\pi'_i-\pi''_i)^{+}+\frac{1}{2}\sum_{i=1}^{r}(\pi''_i-\pi'_i)^{+}
=\sum_{i=1}^{r}(\pi'_i-\pi''_i)^{+}, \tag{4.32}
\]
where $(x)^{+}=x$ if $x>0$ and $(x)^{+}=0$ if $x\le 0$ (the “positive part” of $x$). In the last step of (4.32) we used the trivial fact that
\[
0=1-1=\sum_{i=1}^{r}\pi'_i-\sum_{i=1}^{r}\pi''_i=\sum_{i=1}^{r}(\pi'_i-\pi''_i)=\sum_{i=1}^{r}(\pi'_i-\pi''_i)^{+}-\sum_{i=1}^{r}(\pi''_i-\pi'_i)^{+}.
\]
By (4.32) we have
\[
\mathrm{dist}(\pi'Q,\pi''Q)=\sum_{j=1}^{r}\left(\sum_{i=1}^{r}(\pi'_i-\pi''_i)q_{i,j}\right)^{+}. \tag{4.33}
\]
We claim that for some $j$
\[
\left(\sum_{i=1}^{r}(\pi'_i-\pi''_i)q_{i,j}\right)^{+}=0. \tag{4.34}
\]
Indeed, otherwise
\[
\sum_{i=1}^{r}\pi'_iq_{i,j}>\sum_{i=1}^{r}\pi''_iq_{i,j}\quad\text{for all } j,
\]
and adding them up leads to the contradiction
\[
1=\sum_{j=1}^{r}\sum_{i=1}^{r}\pi'_iq_{i,j}>\sum_{j=1}^{r}\sum_{i=1}^{r}\pi''_iq_{i,j}=1.
\]
Therefore, at least one index $j=j_0$ is “missing” in (4.33), and so we have
\[
\begin{aligned}
\mathrm{dist}(\pi'Q,\pi''Q)
&=\sum_{j=1:\ j\ne j_0}^{r}\left(\sum_{i=1}^{r}(\pi'_i-\pi''_i)q_{i,j}\right)^{+}
\le\sum_{i=1}^{r}(\pi'_i-\pi''_i)^{+}\left(\sum_{j=1:\ j\ne j_0}^{r}q_{i,j}\right)\\
&=\sum_{i=1}^{r}(\pi'_i-\pi''_i)^{+}\bigl(1-q_{i,j_0}\bigr)
\le\sum_{i=1}^{r}(\pi'_i-\pi''_i)^{+}(1-\delta)=(1-\delta)\,\mathrm{dist}(\pi',\pi''),
\end{aligned}
\]
completing the proof of Lemma 4.1. □
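The contraction in Lemma 4.1 can also be observed numerically. The following Python sketch (our own illustration, with hypothetical helper names) draws random stochastic matrices with strictly positive entries, takes $\delta$ to be the smallest entry, and checks the inequality for random pairs of distributions; a small tolerance absorbs floating-point rounding.

```python
import random


def dist(p, q):
    """Half the L1 distance between two probability vectors."""
    return 0.5 * sum(abs(x - y) for x, y in zip(p, q))


def times(p, Q):
    """Row vector p multiplied by the matrix Q."""
    return [sum(p[i] * Q[i][j] for i in range(len(p))) for j in range(len(Q[0]))]


def random_distribution(r, rng):
    """A random probability vector of length r with strictly positive entries."""
    weights = [rng.random() + 0.05 for _ in range(r)]
    total = sum(weights)
    return [w / total for w in weights]


def contraction_holds(r, rng, tol=1e-12):
    """Check dist(p'Q, p''Q) <= (1 - delta) dist(p', p'') for one random instance."""
    Q = [random_distribution(r, rng) for _ in range(r)]    # each row sums to 1
    delta = min(min(row) for row in Q)                      # every entry is >= delta > 0
    p1, p2 = random_distribution(r, rng), random_distribution(r, rng)
    return dist(times(p1, Q), times(p2, Q)) <= (1 - delta) * dist(p1, p2) + tol
```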



Now let $\pi_0$ be a probability distribution on $1,2,\ldots,r$, and let $\pi_n=\pi_0A^n$ where $A$ is the $r$-by-$r$ transition matrix of an ergodic Markov chain. Lemma 4.1 implies that the sequence $\pi_n=\pi_0A^n$, $n=1,2,3,\ldots$, of probability distributions forms a Cauchy sequence in our complete metric space. Indeed, by Lemma 4.1,
\[
\mathrm{dist}(\pi_n,\pi_{n+m})=\mathrm{dist}(\pi_0A^{n},\pi_0A^{n+m})\le(1-\delta)\,\mathrm{dist}(\pi_0A^{n-s_0},\pi_0A^{n+m-s_0})=(1-\delta)\,\mathrm{dist}(\pi_{n-s_0},\pi_{n+m-s_0}), \tag{4.35}
\]
where every entry of $A^{s_0}$ is $\ge\delta>0$ (“ergodicity”). Iterating (4.35) we have
\[
\begin{aligned}
\mathrm{dist}(\pi_n,\pi_{n+m})&\le(1-\delta)\,\mathrm{dist}(\pi_{n-s_0},\pi_{n+m-s_0})\le(1-\delta)^2\,\mathrm{dist}(\pi_{n-2s_0},\pi_{n+m-2s_0})\\
&\le(1-\delta)^3\,\mathrm{dist}(\pi_{n-3s_0},\pi_{n+m-3s_0})\le\ldots\le(1-\delta)^k\,\mathrm{dist}(\pi_{n-ks_0},\pi_{n+m-ks_0})\le(1-\delta)^k
\end{aligned} \tag{4.36}
\]
as long as $n\ge ks_0$. By choosing $k\to\infty$ in (4.36), we conclude that $\pi_n=\pi_0A^n$, $n=1,2,3,\ldots$, forms a Cauchy sequence. Since the metric space is complete, the limit exists, and $\lim_n\pi_n=\pi$ is a probability distribution. We have
\[
\pi A=\lim_{n\to\infty}\pi_nA=\lim_{n\to\infty}\pi_0A^{n}A=\lim_{n\to\infty}\pi_{n+1}=\pi. \tag{4.37}
\]
Next we show that the invariance property $\pi=\pi A$ uniquely determines the probability distribution $\pi$ (see (4.37)). Indeed, assume that $\pi'=\pi'A$ and $\pi''=\pi''A$, then
\[
\pi'=\pi'A=\pi'A^2=\ldots=\pi'A^{s_0},
\]
and similarly
\[
\pi''=\pi''A=\pi''A^2=\ldots=\pi''A^{s_0},
\]
and by Lemma 4.1,
\[
\mathrm{dist}(\pi',\pi'')=\mathrm{dist}(\pi'A^{s_0},\pi''A^{s_0})\le(1-\delta)\,\mathrm{dist}(\pi',\pi'')
\]
for some $\delta>0$, which implies that $\mathrm{dist}(\pi',\pi'')=0$, i.e., $\pi'=\pi''$, proving the uniqueness.


We have thus obtained that for any initial probability distribution $\pi_0$ on $1,2,\ldots,r$, the limit
\[
\lim_{n\to\infty}\pi_0A^{n}=\pi\quad\text{exists, and it is independent of } \pi_0.
\]
We call the uniquely determined probability distribution $\pi$ the stationary distribution of the ergodic Markov chain. Let $i$ be an arbitrary state; by choosing the initial distribution $\pi_0$ to be 1 on state $i$ and 0 on the rest of the states, $\pi_0A^n$ is equal to the sequence of $n$-step transition probabilities $p_{i,j}(n)$, $j=1,2,\ldots,r$ (the starting point, state $i$, is fixed). Let $\pi=\{p_1,\ldots,p_r\}$ denote the stationary distribution. Then by (4.36),
\[
\frac{1}{2}\sum_{j=1}^{r}|p_{i,j}(n)-p_j|=\mathrm{dist}\Bigl(\pi_0A^{n},\lim_{\ell\to\infty}\pi_0A^{\ell}\Bigr)\le(1-\delta)^{n/s_0-1}\le(1-\delta)^{-1}(1-\varepsilon)^{n} \tag{4.38}
\]
with $1-\varepsilon=(1-\delta)^{1/s_0}<1$. Since Eq. (4.38) holds for every $i=1,2,\ldots,r$, it proves

Lemma 4.2. In every finite ergodic Markov chain the speed of convergence of the $n$-step transition probability $p_{i,j}(n)$ to the limit $p_j$ (= the stationary probability of state $j$) is exponential (independently of state $i$).
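As a concrete illustration of Lemma 4.2, consider a small three-state chain of our own choosing (it does not appear in the text): its transition matrix $A$ has zero entries, but every entry of $A^2$ is at least $\delta=1/4$, so it is ergodic with $s_0=2$. Because $A$ is doubly stochastic, the stationary distribution is uniform. The sketch checks the bound $\mathrm{dist}\le(1-\delta)^{\lfloor n/s_0\rfloor}$, which is what (4.36) gives with $k=\lfloor n/s_0\rfloor$.

```python
def dist(p, q):
    """Half the L1 distance between two probability vectors."""
    return 0.5 * sum(abs(x - y) for x, y in zip(p, q))


def times(p, A):
    """Row vector p multiplied by the matrix A."""
    return [sum(p[i] * A[i][j] for i in range(len(p))) for j in range(len(A[0]))]


def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    return [times(row, Y) for row in X]


def n_step_row(A, i, n):
    """The vector (p_{i,j}(n))_j: start in state i and apply A n times."""
    p = [1.0 if k == i else 0.0 for k in range(len(A))]
    for _ in range(n):
        p = times(p, A)
    return p


# Hypothetical example chain: A has zeros, A^2 is strictly positive.
A = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
s0 = 2
delta = min(min(row) for row in matmul(A, A))   # smallest entry of A^2
stationary = [1.0 / 3.0] * 3                     # uniform, since A is doubly stochastic
```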

4.2 How to Use Lemma 4.2 to Find the Analog of (1.223) in General?

Let $\mathcal{A}$ denote the set of all admissible sequences $(B_1,B_2,\ldots,B_\ell)$ (i.e., satisfying the Extra Rule), where $\ell\ge 1$ is the length of the period of the quadratic irrational $\alpha$ in Theorem 1.2. Consider now the Markov chain, where the set of states is $\mathcal{A}$, that is, every admissible sequence represents a state, see the picture below. Why is it ergodic, and what is the corresponding $s_0$? We represent an admissible sequence (a vector) with a boldface letter, so $\mathbf{i}=(B_1,B_2,\ldots,B_\ell)$ denotes a state of the Markov chain. Let $\mathbf{0}=(0,0,\ldots,0)$; since $p_{\mathbf{i},\mathbf{0}}>0$ for any state $\mathbf{i}$, and also $p_{\mathbf{0},\mathbf{j}}>0$ for any state $\mathbf{j}$, the 2-step transition probabilities $p_{\mathbf{i},\mathbf{j}}(2)\ge p_{\mathbf{i},\mathbf{0}}\,p_{\mathbf{0},\mathbf{j}}>0$ are all strictly positive. This proves the ergodicity with $s_0=2$. Therefore, the $n$-step transition probability $p_{\mathbf{i},\mathbf{j}}(n)$ converges to the limit $p_{\mathbf{j}}$ (= the stationary probability of state $\mathbf{j}$) exponentially fast.



Let $\mathbf{i}^{(k)}$ denote the $k$th coordinate of state $\mathbf{i}$ (which is an $\ell$-dimensional vector). By using the exponentially fast convergence in Lemma 4.2, we have
\[
\begin{aligned}
p(b',b'';k_1,k_2;n)&=\sum_{\mathbf{i}:\ \mathbf{i}^{(k_1)}=b'}p_{\mathbf{i}}\sum_{\mathbf{j}:\ \mathbf{j}^{(k_2)}=b''}p_{\mathbf{i},\mathbf{j}}(n)
=\sum_{\mathbf{i}:\ \mathbf{i}^{(k_1)}=b'}p_{\mathbf{i}}\sum_{\mathbf{j}:\ \mathbf{j}^{(k_2)}=b''}p_{\mathbf{j}}+O\bigl((1-\varepsilon)^{n}\bigr)\\
&=\Biggl(\sum_{\mathbf{i}:\ \mathbf{i}^{(k_1)}=b'}p_{\mathbf{i}}\Biggr)\Biggl(\sum_{\mathbf{j}:\ \mathbf{j}^{(k_2)}=b''}p_{\mathbf{j}}\Biggr)+O\bigl((1-\varepsilon)^{n}\bigr).
\end{aligned}
\]
By using the notation
\[
p(b',b'';k_1,k_2)=\Biggl(\sum_{\mathbf{i}:\ \mathbf{i}^{(k_1)}=b'}p_{\mathbf{i}}\Biggr)\Biggl(\sum_{\mathbf{j}:\ \mathbf{j}^{(k_2)}=b''}p_{\mathbf{j}}\Biggr),
\]
we conclude that $p(b',b'';k_1,k_2;n)$ converges to $p(b',b'';k_1,k_2)$ exponentially fast (as $n\to\infty$). Now this is the analog of (1.223) in the general case. Next, we formulate a general form of inequality (1.208). Similarly to Lemma 4.2, this is another important result in the theory of Markov chains. Again, for the sake of completeness, we include a short proof.


Lemma 4.3 (Exponentially small tail probability for the return time). Consider an arbitrary finite ergodic (homogeneous) Markov chain; let $s$ denote the number of states. Let 0 be any state, and let $\tau$ denote the return time from state 0 to itself (of course $\tau$ is a random variable). Then there is a real number $\gamma$ with $0<\gamma<1$ (possibly depending on the Markov chain) such that
\[
\Pr[\tau>ns]<\gamma^{n}\quad\text{for every integer } n\ge 1.
\]

Proof. Let $\ldots,X_{-1},X_0,X_1,X_2,\ldots$ denote our Markov chain. We have
\[
\Pr[X_1=a_1,X_2=a_2,\ldots,X_\ell=a_\ell\mid X_0=a_0,X_{-1}=a_{-1},\ldots,X_{-k}=a_{-k}]
=\Pr[X_1=a_1,X_2=a_2,\ldots,X_\ell=a_\ell\mid X_0=a_0],
\]
which expresses the basic property that, conditional upon the present (“$X_0=a_0$”), the future (“$X_1=a_1,X_2=a_2,\ldots,X_\ell=a_\ell$”) does not depend on the past (“$X_{-1}=a_{-1},\ldots,X_{-k}=a_{-k}$”). Let $a$ be an arbitrary state, and write
\[
r_a(n)=\Pr[X_1\ne 0,\ldots,X_n\ne 0\mid X_0=a].
\]
In the special case $a=0$ we obtain $r_0(n)=\Pr[\tau>n]$. We prove the lemma in two steps. First, we show that, for any integer $n\ge 1$,
\[
\Pr[\tau>ns]=r_0(ns)\le\Bigl(\max_a r_a(s)\Bigr)^{n},
\]
where $s$ is the number of states in our Markov chain. Second, we show that
\[
\max_a r_a(s)<1.
\]
Combining the two steps, the lemma immediately follows. The proof of the first step is a simple induction: for any state $b$,
\[
r_b(n_1+n_2)\le r_b(n_1)\max_a r_a(n_2)\le\max_a r_a(n_1)\,\max_a r_a(n_2)=r(n_1)r(n_2),\quad\text{where } r(k)=\max_a r_a(k).
\]
Similarly,
\[
r_b(n_1+n_2+n_3)\le r(n_1)r(n_2)r(n_3),
\]
and in general
\[
r_b(ns)\le r^{n}(s),
\]
proving the first step. The proof of the second step is equally simple. Let $m$ be the minimum value of $k\ge 1$ such that $r_a(k)<1$ (ergodicity implies that $m$ is finite). This is equivalent to the fact that there is a positive product
\[
p_{a_0,a_1}p_{a_1,a_2}\cdots p_{a_{m-1},a_m}>0
\]
of transition probabilities $p_{a,b}=\Pr[X_{i+1}=b\mid X_i=a]$, where $a_0=a,a_1,a_2,\ldots,a_{m-1},a_m=0$ is some sequence of states, and $m$ has the minimum property. We claim the inequality
\[
m\le s=\text{number of states}.
\]
Suppose $m>s$, then the sequence $a_1,a_2,\ldots,a_{m-1},a_m=0$ must have a repetition: $a_i=a_j$ for some $1\le i<j\le m$. But this clearly contradicts the minimum property of $m$. Indeed, if $j=m$, i.e., $a_i=a_m=0$, then
\[
p_{a_0,a_1}p_{a_1,a_2}\cdots p_{a_{i-1},a_i}>0
\]
is a shorter product; otherwise consider
\[
p_{a_0,a_1}p_{a_1,a_2}\cdots p_{a_{i-1},a_i}\,p_{a_j,a_{j+1}}\cdots p_{a_{m-1},a_m}>0,
\]
which is also shorter. This completes the proof of Lemma 4.3. □
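The quantities $r_a(n)$ in the proof are easy to compute: conditioning on the first step gives $r_a(n)=\sum_{b\ne 0}p_{a,b}\,r_b(n-1)$ with $r_b(0)=1$. The sketch below applies this recursion to a small three-state chain chosen by us purely for illustration (it is not a chain from the text) and checks both steps of the proof, with $\gamma=\max_a r_a(s)$.

```python
def survival(A, target, n):
    """Return [r_a(n)] over all states a, where
    r_a(n) = Pr[X_1 != target, ..., X_n != target | X_0 = a]."""
    states = range(len(A))
    r = [1.0] * len(A)                       # r_a(0) = 1
    for _ in range(n):
        # first step goes to some b != target, then avoid target for the remaining steps
        r = [sum(A[a][b] * r[b] for b in states if b != target) for a in states]
    return r


# Hypothetical ergodic chain with s = 3 states; state 0 is the reference state.
A = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
s = len(A)
gamma = max(survival(A, 0, s))               # max_a r_a(s), the second step of the proof


def tail(n):
    """Pr[return time to state 0 exceeds n*s] = r_0(n*s)."""
    return survival(A, 0, n * s)[0]
```

For this chain $\gamma=1/2$, so the tail probability $\Pr[\tau>3n]$ is at most $2^{-n}$.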

Finally, note that, by using the above-mentioned analog of (1.223) and Lemma 4.3 (as an analog of (1.208)) the same way as we used (1.223) and (1.208) in Sect. 1.5 (i.e., in the special case of the golden ratio), we can easily complete the proof of Theorem 1.2 for any quadratic irrational $\alpha$.

4.3 Completing the Proof of Theorem 1.1

4.4 The Fourier Series Approach

In Sect. 4.1 we completed the proof of Theorem 1.2 by using Ostrowski's explicit formula (1.55). An alternative approach is to work with the (truncated) Fourier series representation (see (3.4)):
\[
S_\alpha(n)-M_\alpha(N)=\sum_{j=1}^{T}\frac{\cos((2n+1)\pi j\alpha)}{2\pi j\sin(\pi j\alpha)}+O(\log\log N) \tag{4.39}
\]
that holds for every $n$ in $1\le n\le N$, where $T=T(N)=N\log N$ and $\alpha$ is any badly approximable number.
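The shape of the summands in (4.39) can be traced back to an elementary identity. If one sums the sawtooth Fourier series $-\sum_j\sin(2\pi jx)/(\pi j)$ term by term over $x=k\alpha$, $1\le k\le n$, each frequency $j$ contributes a sum $\sum_{k=1}^{n}\sin(k\theta)$ with $\theta=2\pi j\alpha$, and
\[
\sum_{k=1}^{n}\sin(k\theta)=\frac{\cos(\theta/2)-\cos((n+\tfrac12)\theta)}{2\sin(\theta/2)} .
\]
The $n$-dependent part is exactly $\cos((2n+1)\pi j\alpha)/(2\sin(\pi j\alpha))$. This is only a heuristic reading of (4.39); the rigorous statement with its error term is (3.4). A minimal Python check of the identity (function names are ours):

```python
import math


def direct_sum(n, theta):
    """Sum of sin(k * theta) for k = 1, ..., n, computed term by term."""
    return sum(math.sin(k * theta) for k in range(1, n + 1))


def closed_form(n, theta):
    """Closed form (cos(theta/2) - cos((n + 1/2) theta)) / (2 sin(theta/2))."""
    return (math.cos(theta / 2) - math.cos((n + 0.5) * theta)) / (2 * math.sin(theta / 2))


def max_discrepancy(alpha, j_max, n_max):
    """Largest gap between the two expressions over 1 <= j <= j_max, 1 <= n <= n_max,
    with theta = 2*pi*j*alpha as in the Fourier series approach."""
    worst = 0.0
    for j in range(1, j_max + 1):
        theta = 2 * math.pi * j * alpha
        for n in range(1, n_max + 1):
            worst = max(worst, abs(direct_sum(n, theta) - closed_form(n, theta)))
    return worst
```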


In this section we prove Theorem 1.1 by working out the details of the Fourier series approach. We recall that Theorem 1.1 is about the asymptotic behavior of the discrepancy
\[
S_\alpha(\beta;n)=\sum_{\substack{1\le k\le n:\\ k\alpha\in(0,\beta)\ (\mathrm{mod}\ 1)}}1\;-\;\beta n \tag{4.40}
\]
of the counting function of the irrational rotation $k\alpha$ (mod 1). Note that Sós [So2] has developed an analog of Ostrowski's formula (1.55) for $S_\alpha(\beta;n)$, and it is possible to prove Theorem 1.1 by basically repeating the arguments of Sect. 4.1 with Sós's formula instead of Ostrowski's formula. The advantage of the Fourier series approach is that it is far more flexible and works well even when we don't know any “explicit formula.” For simplicity, we discuss first the special case $\alpha=\sqrt{2}$ and $\beta=1/2$ of Theorem 1.1. We then have the following analog of (4.39):
\[
S_{\sqrt{2}}(1/2;n)-M_{\sqrt{2}}(1/2;N)=\sum_{1\le j\le T:\ \text{odd}}\frac{\cos((2n+1)\pi j\sqrt{2})}{\pi j\sin(\pi j\sqrt{2})}+O(\log\log N) \tag{4.41}
\]
for every $n$ in $1\le n\le N$, and again $T=T(N)=N\log N$. The proof is the same as that of (4.39). The only difference between (4.39) and (4.41) is that in the latter $j$ is restricted to the odd integers (i.e., half of the integers, explaining the extra factor of 2 in the denominator in (4.39)). Note that both (4.39) and (4.41) are based on the Fourier series
\[
((x))=-\sum_{j=1}^{\infty}\frac{\sin(2\pi jx)}{\pi j},
\]
and also, (4.41) uses identity (2.62):
\[
\chi_\beta(x)-\beta=((x-\beta))-((x)).
\]

4.4.1 Guiding Intuition

As $n$ runs in the interval $1\le n\le N$, it is plausible to expect that the difference $S_{\sqrt{2}}(1/2;n)-M_{\sqrt{2}}(1/2;N)$ can be approximated by the following “stochastic variant” of (3.4):
\[
S_{\sqrt{2}}(1/2;n)-M_{\sqrt{2}}(1/2;N)=\sum_{1\le j\le N:\ \text{odd}}\frac{\cos(2\pi Y_j)}{2\pi j\sin(\pi j\alpha)}+\text{negligible}, \tag{4.42}
\]



where $Y_1,Y_2,\ldots,Y_N$ are independent random variables, each one uniformly distributed in the unit interval $0\le Y_j\le 1$. The guiding intuition gives a good insight, but it is far too vague. The actual method of the proof of Theorem 1.1 follows the technique of Sects. 1.5 and 4.1: as $n$ runs in the interval $0<n<N$, we approximate $S_{\sqrt{2}}(1/2;n)-M_{\sqrt{2}}(1/2;N)$ with a sum of independent and identically distributed random variables:
\[
S_{\sqrt{2}}(1/2;n)-M_{\sqrt{2}}(1/2;N)=X_1+X_2+X_3+\ldots+\text{negligible},
\]
see (4.83). And again, the independence comes from an underlying (homogeneous) Markov chain, see (4.45). We carry out the approximation in two steps: first we construct a sequence $\tilde X_1,\tilde X_2,\tilde X_3,\ldots$ of almost independent random variables (see (4.71) and (4.72)) in a way similar to (1.204)–(1.206), and in the second step, after some “truncation” and “linearization,” we obtain the truly independent random variables $X_1,X_2,X_3,\ldots$, see (4.83). Similarly to Sect. 1.4, we obtain our Markov chain from working with the $(1+\sqrt{2})$-scale representation of real numbers. Write $\eta=1+\sqrt{2}$ for the fundamental unit in $\mathbb{Q}(\sqrt{2})$. For simplicity assume that the $N$ in (4.41) has the special form
\[
N=\eta^{m}=(1+\sqrt{2})^{m}.
\]
(Note in advance that, to go from the special values $N=\eta^{m}$ to arbitrary $N$s, we just repeat the argument of the Concluding Remark at the end of Sect. 1.5, see (1.237)–(1.239).) We recall that $p_j\pm q_j\sqrt{2}=(1\pm\sqrt{2})^{j}$, and so
\[
q_j=\frac{(1+\sqrt{2})^{j}-(1-\sqrt{2})^{j}}{2\sqrt{2}}\quad\text{and}\quad p_j=\frac{(1+\sqrt{2})^{j}+(1-\sqrt{2})^{j}}{2}, \tag{4.43}
\]
where $p_j/q_j$ is the $j$th convergent of $\sqrt{2}$. Similarly to (4.11) and (1.189), we write an arbitrary real number $\xi$ in the interval $0<\xi<N=\eta^{m}$ in the $\eta=(1+\sqrt{2})$-scale form
\[
\xi=b_{m-1}\eta^{m-1}+b_{m-2}\eta^{m-2}+b_{m-3}\eta^{m-3}+\ldots, \tag{4.44}
\]
where $b_j\in\{0,1,2\}$, and, as usual, (4.44) satisfies the Extra Rule: $b_j=2$ implies $b_{j-1}=0$ (this makes representation (4.44) unique). As $\xi$ runs in the interval 0
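k then the factor $(\sqrt{2}-1)^{k-\ell}=(1+\sqrt{2})^{\ell-k}$ in (4.50) is close to an integer: by (4.43) with $j\ge 1$,
\[
(1+\sqrt{2})^{j}=2p_j-(1-\sqrt{2})^{j}. \tag{4.51}
\]
Combining (4.50) with (4.51), we have
\[
\begin{aligned}
\xi j\sqrt{2}\equiv{}&\sum_{\ell<m:\ h(j)\le\ell\le H(j)}(-1)^{\ell+1}b_\ell\varepsilon_\ell
+\sum_{d\ge 1}(\sqrt{2}-1)^{d}\sum_{\ell<m:\ h(j)\le\ell+d\le H(j)}(-1)^{\ell+d+1}b_\ell\varepsilon_{\ell+d}\\
&+\sum_{d\ge 1}(1-\sqrt{2})^{d}\sum_{\ell<m:\ h(j)\le\ell-d\le H(j)}(-1)^{\ell-d}b_\ell\varepsilon_{\ell-d}\quad\text{modulo } 1.
\end{aligned} \tag{4.52}
\]
By using the notation
\[
\begin{aligned}
G(\xi,j)={}&\sum_{\ell<m:\ h(j)\le\ell\le H(j)}(-1)^{\ell+1}b_\ell\varepsilon_\ell\\
&+\sum_{d\ge 1}(\sqrt{2}-1)^{d}\Biggl(\ \sum_{\ell<m:\ h(j)-d\le\ell\le H(j)-d}(-1)^{\ell+d+1}b_\ell\varepsilon_{\ell+d}
+\sum_{\ell<m:\ h(j)+d\le\ell\le H(j)+d}(-1)^{\ell}b_\ell\varepsilon_{\ell-d}\Biggr),
\end{aligned} \tag{4.53}
\]
we can rewrite (4.52) as follows:
\[
\xi j\sqrt{2}\equiv G(\xi,j)\quad\text{modulo } 1, \tag{4.54}
\]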

and also we can rewrite the numerator in (4.41), which is the special case $\xi=n=$ integer:
\[
\cos\bigl((2\xi+1)\pi j\sqrt{2}\bigr)=\cos\bigl(2\pi\xi j\sqrt{2}+\pi j\sqrt{2}\bigr)=\cos\bigl(2\pi G(\xi,j)+\pi j\sqrt{2}\bigr). \tag{4.55}
\]
Next we estimate the denominator $j\sin(\pi j\sqrt{2})$ in (4.41). By (4.49) we have (where $\|y\|$ denotes, as usual, the distance of $y$ from the nearest integer) ‖j

p

X

2k D k

"k .1

hkH

p

X p p 2/k k . 2 1/h . 2 1/k h L D L.N / D 3 log log N:

(4.59)

This particular choice of $L=L(N)$ is motivated by the fact that
\[
(\sqrt{2}-1)^{L}=(\sqrt{2}-1)^{3\log\log N}<(\log N)^{-2} \tag{4.60}
\]
is “very small.” This “throw away” procedure formally means that we write
\[
S(\xi)=S^{(1)}(\xi)+S^{(2)}(\xi), \tag{4.61}
\]
where
\[
S^{(1)}(\xi)=\sum_{\substack{1\le j\le T:\ \text{odd}\\ H(j)-h(j)\le L}}\frac{\cos((2\xi+1)\pi j\sqrt{2})}{\pi j\sin(\pi j\sqrt{2})}, \tag{4.62}
\]
\[
S^{(2)}(\xi)=\sum_{\substack{1\le j\le T:\ \text{odd}\\ H(j)-h(j)>L}}\frac{\cos((2\xi+1)\pi j\sqrt{2})}{\pi j\sin(\pi j\sqrt{2})}, \tag{4.63}
\]

and $S^{(1)}(\xi)$ is expected to be the dominant part. First we estimate the quadratic average of $S^{(2)}(n)$ (which is expected to be the minor part of $S(n)$) as $n$ runs through the integers in $1\le n\le N$.

Lemma 4.4. We have
\[
\frac{1}{N}\sum_{n=1}^{N}\bigl(S^{(2)}(n)\bigr)^{2}=O\Biggl(\ \sum_{\substack{1\le j\le T:\ \text{odd}\\ H(j)-h(j)>L}}\frac{1}{j^{2}\sin^{2}(\pi j\sqrt{2})}\Biggr)+O\bigl((\log\log N)^{2}\bigr).
\]
The proof of Lemma 4.4 is basically the same as that of Proposition 3.1; we leave the details of the proof to the reader. In fact, the proof of Lemma 4.4 is simpler, because if $H(j)-h(j)>L=L(N)=3\log\log N$, then by (4.57), (4.59), and (4.60)
\[
\frac{1}{j\,|\sin(\pi j\sqrt{2})|}=O\bigl((\log N)^{-2}\bigr), \tag{4.64}
\]
and combining (4.64) with inequality (3.113), we have
\[
\sum_{\substack{1\le j\le T:\ \text{odd}\\ H(j)-h(j)>L}}\frac{1}{j^{2}\sin^{2}(\pi j\sqrt{2})}=O(1). \tag{4.65}
\]
Combining Lemma 4.4 with (4.65), we obtain
\[
\frac{1}{N}\sum_{n=1}^{N}\bigl(S^{(2)}(n)\bigr)^{2}=O\bigl((\log\log N)^{2}\bigr). \tag{4.66}
\]

Applying Chebyshev's inequality (see (1.150)) with (4.66), we see that the contribution of $S^{(2)}(n)$ in our central limit theorem (Theorem 1.1) is totally negligible. It means that we can focus on the dominant part $S^{(1)}(n)$ of $S(n)$, see (4.58), (4.61), and (4.62). We study $S^{(1)}(n)$ as the integral variable $n$ runs in $0<n<N=\eta^{m}$. In fact, we extend $S^{(1)}(n)$ to $S^{(1)}(\xi)$ (see (4.62)), where $\xi$ is any real number in $0<\xi<N=\eta^{m}=(1+\sqrt{2})^{m}$. The $\eta$-scale representation of $\xi$ (see (4.44)) gives the Markov chain (4.45):
\[
b_{m-1}=b_{m-1}(\xi),\quad b_{m-2}=b_{m-2}(\xi),\quad b_{m-3}=b_{m-3}(\xi),\ \ldots,
\]
and the initial distribution is in fact the stationary distribution. To get independence, we apply the basic trick of Sect. 1.4: we work with “0,” “1,” and “20” (instead of “0,” “1,” “2”). Formally,
\[
b_{m-1}(\xi),\,b_{m-2}(\xi),\,b_{m-3}(\xi),\ldots=B_1(\xi),\,B_2(\xi),\,B_3(\xi),\ldots \tag{4.67}
\]
where $B_i=B_i(\xi)=0$ or 1 or 20, and the right-hand side of (4.67) forms a sequence of independent random variables with common distribution (see (4.73))
\[
\Pr[B_i=0]=\Pr[B_i=1]=\sqrt{2}-1\quad\text{and}\quad\Pr[B_i=20]=(\sqrt{2}-1)^{2}=3-2\sqrt{2}, \tag{4.68}
\]
where Pr (“probability”) means $\eta^{-m}$ times the ordinary one-dimensional Lebesgue measure (as $\xi$ runs in $0<\xi<N=\eta^{m}=(1+\sqrt{2})^{m}$).
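A short numerical sketch may help to see how the blocks “0,” “1,” “20” fit together. It checks that the three probabilities in (4.68) add up to 1, draws independent blocks with these probabilities, and confirms that the resulting digit strings obey the Extra Rule and represent numbers below $(1+\sqrt{2})^{m}$. The helper names are ours, and cutting the concatenated blocks off after exactly $m$ digits is a simplification made only for this illustration.

```python
import math
import random

SQRT2 = math.sqrt(2.0)
ETA = 1.0 + SQRT2                       # the scale 1 + sqrt(2)
PROBS = {"0": SQRT2 - 1, "1": SQRT2 - 1, "20": (SQRT2 - 1) ** 2}


def sample_digits(m, rng):
    """Concatenate i.i.d. blocks "0", "1", "20" and keep the first m digits
    b_{m-1}, b_{m-2}, ..., b_0 (leading digit first)."""
    blocks, weights = list(PROBS), list(PROBS.values())
    digits = []
    while len(digits) < m:
        block = rng.choices(blocks, weights=weights)[0]
        digits.extend(int(ch) for ch in block)
    return digits[:m]


def obeys_extra_rule(digits):
    """True if every digit 2 is immediately followed by a digit 0."""
    return all(digits[k + 1] == 0 for k in range(len(digits) - 1) if digits[k] == 2)


def value(digits):
    """The number sum_k b_k * ETA^k represented by the digit string."""
    m = len(digits)
    return sum(b * ETA ** (m - 1 - k) for k, b in enumerate(digits))
```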



4.4.2 Constructing a Sum $\tilde X_1+\tilde X_2+\tilde X_3+\ldots$ of Almost Independent Random Variables

We apply the usual decomposition technique of Sects. 1.3 and 1.5; here $\log N$ plays the role of $m$. Let $0<\delta<1/2$ be a constant to be specified later, and define parameter $R$ as
\[
R=R(N)=\lfloor(\log N)^{\delta}\rfloor. \tag{4.69}
\]
We decompose the sequence $B_1,B_2,B_3,\ldots$ on the right-hand side of (4.67) into groups of size $R=R(m)$ (defined in (4.69)):
\[
B_1,B_2,B_3,\ldots,B_R;\quad B_{R+1},B_{R+2},B_{R+3},\ldots,B_{2R};\quad B_{2R+1},B_{2R+2},B_{2R+3},\ldots,B_{3R};\quad\text{and so on}. \tag{4.70}
\]
Let $B_{(i-1)R+j}$ be an arbitrary element of the $i$th group in (4.70), where $i=1,2,3,\ldots$ and $1\le j\le R$. We have $B_{(i-1)R+j}=0$ or 1 or 20, that is, $B_{(i-1)R+j}=b_k$ or $b_kb_{k-1}$ for some $k$ (see (4.67)); let $I_i$ denote the set of all indices $k$ or $k,k-1$ that occur this way in the $i$th group. Notice that $I_i$ is an interval (of consecutive integers), and these intervals are disjoint and decreasing: the elements of $I_{i+1}$ are all smaller than those of $I_i$. By using decomposition (4.70) and these intervals $I_i$, we can rewrite $S^{(1)}(\xi)$ in (4.62) in the form
\[
S^{(1)}(\xi)=\sum_{\substack{1\le j\le T:\ \text{odd}\\ H(j)-h(j)\le L}}\frac{\cos((2\xi+1)\pi j\sqrt{2})}{\pi j\sin(\pi j\sqrt{2})}
=\tilde X_1+\tilde X_2+\tilde X_3+\cdots+\tilde Y_1+\tilde Y_2+\tilde Y_3+\cdots, \tag{4.71}
\]
where the $\tilde X_i$s are defined by the disjoint intervals $I_i$ above:
\[
\tilde X_i=\tilde X_i(\xi)=\sum_{\substack{1\le j\le T:\ \text{odd}\\ H(j)-h(j)\le L,\ \{h(j)-L,\,H(j)+L\}\subset I_i}}\frac{\cos((2\xi+1)\pi j\sqrt{2})}{\pi j\sin(\pi j\sqrt{2})}, \tag{4.72}
\]
and
\[
\tilde Y_i=\tilde Y_i(\xi)=\sum_{\substack{1\le j\le T:\ \text{odd}\\ H(j)-h(j)\le L,\ h(j)-L\in I_{i+1},\ H(j)+L\in I_i}}\frac{\cos((2\xi+1)\pi j\sqrt{2})}{\pi j\sin(\pi j\sqrt{2})}. \tag{4.73}
\]



We emphasize the similarity to (1.204)–(1.206). We have the upper bounds
\[
|\tilde X_i|\le\text{const}\cdot\Biggl(\ \sum_{\substack{1\le j\le T=N\log N:\\ H(j)-h(j)\le L,\ \{h(j)-L,\,H(j)+L\}\subset I_i}}\frac{1}{j\,\|j\sqrt{2}\|}\Biggr)=O(|I_i|\cdot L)=O\bigl((\log N)^{\delta}\log\log N\bigr),
\]
and
\[
|\tilde Y_i|\le\text{const}\cdot\Biggl(\ \sum_{\substack{1\le j\le T=N\log N:\\ H(j)-h(j)\le L,\ h(j)-L\in I_{i+1},\ H(j)+L\in I_i}}\frac{1}{j\,\|j\sqrt{2}\|}\Biggr)=O(L^{2})=O\bigl((\log\log N)^{2}\bigr),
\]
where in both cases we applied Lemma 2.14. These upper bounds play the role of (1.221) in Sect. 1.5. As usual, our plan is to show that the first sum $\tilde X_1+\tilde X_2+\tilde X_3+\cdots$ in (4.71) is the dominating part. The new condition in (4.72) means that we keep those $j$s for which the whole $L$-neighborhood of the interval $[h(j),H(j)]$ is still inside of $I_i$. The motivation for this comes from the fact (see (4.60))
\[
(\sqrt{2}-1)^{L}=(\sqrt{2}-1)^{3\log\log N}<(\log N)^{-2}=\text{very small}, \tag{4.74}
\]

which implies that the tail $\sum_{d>L}$ of the series in (4.53) is negligible, so we can safely cut the series off at $d=L=3\log\log N$. The “cutoff-the-tail” argument means that we consider the following truncated version of $G(\xi,j)$ in (4.53):
\[
\begin{aligned}
G^{*}(\xi,j)={}&\sum_{\ell<m:\ h(j)\le\ell\le H(j)}(-1)^{\ell+1}b_\ell\varepsilon_\ell\\
&+\sum_{d=1}^{L}(\sqrt{2}-1)^{d}\Biggl(\ \sum_{\ell<m:\ h(j)-d\le\ell\le H(j)-d}(-1)^{\ell+d+1}b_\ell\varepsilon_{\ell+d}
+\sum_{\ell<m:\ h(j)+d\le\ell\le H(j)+d}(-1)^{\ell}b_\ell\varepsilon_{\ell-d}\Biggr).
\end{aligned} \tag{4.75}
\]

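By (4.54) and (4.74), we can rewrite the numerator:
\[
\cos\bigl((2\xi+1)\pi j\sqrt{2}\bigr)=\cos\bigl(2\pi G(\xi,j)+\pi j\sqrt{2}\bigr)=\cos\bigl(2\pi G^{*}(\xi,j)+\pi j\sqrt{2}\bigr)+O\bigl((\log N)^{-2}\bigr). \tag{4.76}
\]
Next we study the denominator $j\sin(\pi j\sqrt{2})$. By (4.48) and (4.49),
\[
j\sqrt{2}=\sum_{h\le k\le H}\varepsilon_kq_k\sqrt{2}=\sum_{h\le k\le H}\varepsilon_k(q_k\sqrt{2}-p_k)+\sum_{h\le k\le H}\varepsilon_kp_k
=-\sum_{h\le k\le H}\varepsilon_k(1-\sqrt{2})^{k}+\sum_{h\le k\le H}\varepsilon_kp_k. \tag{4.77}
\]
Every convergent numerator $p_k$ of $\sqrt{2}$ is odd (since $p_1=1$, $p_2=3$, and $p_i=2p_{i-1}+p_{i-2}$ for all $i\ge 3$), thus (4.77) implies that (we also use (4.48))
\[
j\sin(\pi j\sqrt{2})=j\sin\Biggl(\pi\Biggl(-\sum_{h\le k\le H}\varepsilon_k(1-\sqrt{2})^{k}+\sum_{h\le k\le H}\varepsilon_k\Biggr)\Biggr)
=(-1)^{1+\sum_{h\le k\le H}\varepsilon_k}\Biggl(\sum_{h\le k\le H}\varepsilon_kq_k\Biggr)\sin\Biggl(\pi\sum_{h\le k\le H}\varepsilon_k(1-\sqrt{2})^{k}\Biggr). \tag{4.78}
\]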

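The sum $S^{(1)}(n)$ (see (4.62)) is defined for $j$s with very small $\|j\sqrt{2}\|$ (apart from a very few “small” $j$s), so we can safely apply the approximation $\sin(x)=x+O(x^{3})$:
\[
\sin\Biggl(\pi\sum_{h\le k\le H}\varepsilon_k(1-\sqrt{2})^{k}\Biggr)=\pi\sum_{h\le k\le H}\varepsilon_k(1-\sqrt{2})^{k}+\text{negligible}, \tag{4.79}
\]
and for the same reason in (4.76) we can ignore the term $\pi j\sqrt{2}$:
\[
\cos\bigl((2\xi+1)\pi j\sqrt{2}\bigr)=\cos\bigl(2\pi G^{*}(\xi,j)\bigr)+\text{negligible}. \tag{4.80}
\]
Let's return to (4.78): by (4.43) we have
\[
\begin{aligned}
\Biggl(\sum_{h\le k_1\le H}\varepsilon_{k_1}q_{k_1}\Biggr)\Biggl(\sum_{h\le k_2\le H}\varepsilon_{k_2}(1-\sqrt{2})^{k_2}\Biggr)
&=\Biggl(\sum_{h\le k_1\le H}\varepsilon_{k_1}\frac{(1+\sqrt{2})^{k_1}-(1-\sqrt{2})^{k_1}}{2\sqrt{2}}\Biggr)\Biggl(\sum_{h\le k_2\le H}\varepsilon_{k_2}(1-\sqrt{2})^{k_2}\Biggr)\\
&=\sum_{h\le k_1\le H}\ \sum_{h\le k_2\le H}\varepsilon_{k_1}\varepsilon_{k_2}(-1)^{k_2}(1+\sqrt{2})^{k_1-k_2}+\text{negligible}.
\end{aligned} \tag{4.81}
\]
It is easy to see that the single largest term in (4.81) comes from the choice $k_1=H(j)$, $k_2=h(j)$, and the absolute value of sum (4.81) is between two positive constant multiples of $(1+\sqrt{2})^{H-h}$.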

4.4.3 Defining the Truly Independent Random Variables $X_1,X_2,X_3,\ldots$

Besides $G^{*}(\xi,j)$ in (4.75), we also need the new functions:
\[
G_1(j)=1+\sum_{h\le k\le H}\varepsilon_k\quad\text{and}\quad
G_2(j)=\sum_{h\le k_1\le H}\ \sum_{h\le k_2\le H}\varepsilon_{k_1}\varepsilon_{k_2}(-1)^{k_2}(1+\sqrt{2})^{k_1-k_2}. \tag{4.82}
\]
We approximate $\tilde X_i$ (see (4.72)) with
\[
X_i=X_i(\xi)=\sum_{\substack{1\le j\le T:\ \text{odd}\\ H(j)-h(j)\le L,\ \{h(j)-L,\,H(j)+L\}\subset I_i}}(-1)^{G_1(j)}\,\frac{\cos(2\pi G^{*}(\xi,j))}{\pi^{2}G_2(j)}. \tag{4.83}
\]
Similarly, we approximate $\tilde Y_i$ (see (4.73)) with
\[
Y_i=Y_i(\xi)=\sum_{\substack{1\le j\le T:\ \text{odd}\\ H(j)-h(j)\le L,\ h(j)-L\in I_{i+1},\ H(j)+L\in I_i}}(-1)^{G_1(j)}\,\frac{\cos(2\pi G^{*}(\xi,j))}{\pi^{2}G_2(j)}. \tag{4.84}
\]
Equations (4.78)–(4.82) make both approximations perfectly reasonable. Let's return to (4.67): as the real variable $\xi$ runs in the interval $0<\xi<N=\eta^{m}=(1+\sqrt{2})^{m}$, the sequence
\[
B_1(\xi),\,B_2(\xi),\,B_3(\xi),\ldots \tag{4.85}
\]

forms independent and identically distributed random variables (where Bi D Bi . / D 0 or 1 or 20, and the common distribution is described in (4.68)). The


237

random variables X1 D X1 . /, X2 D X2 . /, X3 D X3 . /; : : : in (4.83) were defined in such a way that they depend only on disjoint sets of B` ’s. More precisely, Xi D Xi . / depends only on the i th group in (4.70) (described by the interval Ii ). It follows that X1 ; X2 ; X3 ; : : : are independent random variables. Similarly, Y1 ; Y2 ; Y3 ; : : : are independent random variables. Yi may depend on both Xi and Xi C1 , but Yi is independent of all X` with ` 62 fi; i C 1g. By independence, EX` Yi D 0 if ` 62 fi; i C 1g. As an analog of (1.223), we need an upper bound for the absolute value of m

Z

m

EX` Yi D

X` . /Yi . / d 0

if ` 2 fi; i C 1g. It is more convenient to go back to (4.72) and (4.73): EXQ` YQi D m

Z

m

XQ` . /YQi . / 0

if ` 2 fi; i C 1g. By repeating the argument of Proposition 3.2, X

jEXQ` YQi j

1

1j T W od d H.j /h.j /L;h.j /L2Ii C1 ;H.j /CL2Ii

2

j 2 sin .j

p

2/

C

CO .log log N /2 D O.L/ C O .log log N /2 D O .log log N /2 if ` 2 fi; i C 1g. Since X` D XQ ` Cnegligible and Yi D YQi Cnegligible, the same upper bound holds for jEX` Yi j as well: EX` Yi D O .log log N /2 if ` 2 fi; i C 1g. Note that X1 ; X2 ; X3 ; : : : are identically distributed. This is a consequence of the “translation invariance” of the functions G1 .j /; G2 .j / (see (4.82) and (4.83)) and the linearity of G . ; j / (see (4.75)). The meaning of “translation invariance” is explained in Eq. (4.86). First we recall (4.48): j D

X

"k qk ; where "k 2 f1; 0; 1g;

(16.48’)

h.j /kH.j /

and, replacing each qk with qkC2i , define the “translate” j.i / D

X h.j /kH.j /

"k qkC2i

(16.48”)



where $i = 0, \pm 1, \pm 2, \pm 3, \ldots$. Then $j$ and $j^{(i)}$ have the same parity (due to the recurrence $q_1 = 1$, $q_2 = 2$, $q_\ell = 2q_{\ell-1} + q_{\ell-2}$ for all $\ell \ge 3$; the parity is important, since in (4.41) the integral parameter $j$ has to be odd), and also (see (4.82))

$$G_1(j) = G_1(j^{(i)}) \quad\text{and}\quad G_2(j) = G_2(j^{(i)}). \tag{4.86}$$

Similarly, $Y_1, Y_2, Y_3, \ldots$ are also identically distributed.

Equation (4.83) is an analog of (1.205), and similarly Eq. (4.84) is an analog of (1.206). Repeating the arguments that follow (1.205) in Sect. 1.5 (e.g., applying Kolmogorov's inequality), we obtain Theorem 1.1 in the special case $\alpha = \sqrt{2}$ with the rational parameter equal to $1/2$. To prove the general case, where $\alpha$ is an arbitrary quadratic irrational and the rational parameter is arbitrary in the open interval $(0,1)$, we just have to repeat the arguments of Sect. 4.1. To compute the expectation, we use the results of Sects. 2.1 and 2.2. To compute the variance, we use the results of Sects. 3.1–3.3. Finally, as an analog of (4.56) and (4.57) in the general case, we use the following technical lemma describing an important property of the so-called Ostrowski representation (see also (1.54)).

Let $\alpha$ be an arbitrary irrational: $\alpha = [a_0; a_1, a_2, a_3, \ldots]$, and write

$$[a_0; a_1, \ldots, a_{i-1}] = \frac{p_i}{q_i}. \tag{4.87}$$

By using the denominators $q_i$ in (4.87), every integer $n \ge 1$ can be written in the form (where $q_1 = 1$, $q_2 = a_1$)

$$n = \sum_{i \ge m} b_i q_i, \quad\text{where } m > 0,\ b_m \ne 0,\ b_i \in \{0, 1, \ldots, a_i\} \text{ for } i > m;\ b_i = a_i \text{ implies } b_{i-1} = 0;\ \text{and } b_1 < a_1. \tag{4.88}$$

This is the Ostrowski representation of $n$ defined by $\alpha$ (see (4.87)).

Lemma 4.5. For every $n$ written in the form (4.88) we have the lower bound

$$\|n\alpha\| = \Big|\sum_{i \ge m} b_i (q_i\alpha - p_i)\Big| \ge |q_{m+1}\alpha - p_{m+1}|, \tag{4.89}$$

and also the upper bound

$$\|n\alpha\| \le |q_{m-1}\alpha - p_{m-1}| \quad\text{if } m \ge 2. \tag{4.90}$$

Since the proof is tricky, we include it. To prove (4.89), we write $\theta_i = q_i\alpha - p_i$; then

$$\theta_i = a_{i-1}\theta_{i-1} + \theta_{i-2}; \tag{4.91}$$

note that

$$\theta_i = (-1)^{i-1}|\theta_i|, \quad\text{and}\quad |\theta_{i-2}| = a_{i-1}|\theta_{i-1}| + |\theta_i|. \tag{4.92}$$

We have

$$(-1)^{m-1}\Big(\sum_{j=m}^{\infty} b_j\theta_j\Big) = b_m|\theta_m| - b_{m+1}|\theta_{m+1}| + b_{m+2}|\theta_{m+2}| - b_{m+3}|\theta_{m+3}| \pm \cdots \ge b_m|\theta_m| - b_{m+1}|\theta_{m+1}| - b_{m+3}|\theta_{m+3}| - b_{m+5}|\theta_{m+5}| - \cdots. \tag{4.93}$$

Since $b_m \ne 0$ we have $b_{m+1} \le a_{m+1} - 1$, and using the recurrence formula (4.92), $|\theta_{i-2}| = a_{i-1}|\theta_{i-1}| + |\theta_i|$, repeatedly, we obtain

$$b_m|\theta_m| - b_{m+1}|\theta_{m+1}| \ge |\theta_{m+1}| + |\theta_{m+2}|, \qquad |\theta_{m+2}| - b_{m+3}|\theta_{m+3}| \ge |\theta_{m+4}|, \qquad |\theta_{m+4}| - b_{m+5}|\theta_{m+5}| \ge |\theta_{m+6}|,$$

and so on. Applying these inequalities in (4.93), we have

$$(-1)^{m-1}\Big(\sum_{j=m}^{\infty} b_j\theta_j\Big) \ge (b_m - 1)|\theta_m| + |\theta_{m+1}|,$$

which proves (4.89). On the other hand, by a telescoping sum argument

$$(-1)^{m-1}\Big(\sum_{j=m}^{\infty} b_j\theta_j\Big) \le b_m|\theta_m| + b_{m+2}|\theta_{m+2}| + b_{m+4}|\theta_{m+4}| + \cdots \le b_m|\theta_m| + (|\theta_{m+1}| - |\theta_{m+3}|) + (|\theta_{m+3}| - |\theta_{m+5}|) + (|\theta_{m+5}| - |\theta_{m+7}|) + \cdots = b_m|\theta_m| + |\theta_{m+1}| \le |\theta_{m-1}|$$

by (4.92); this proves (4.90), and Lemma 4.5 follows. Thus the proof of Theorem 1.1 is complete.

□
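The two bounds of Lemma 4.5 can be checked numerically. The following is a minimal sketch, assuming $\alpha = \sqrt{2} = [1; 2, 2, 2, \ldots]$ and the indexing of (4.87); the function names are illustrative and not from the text, and the digits are produced by the usual greedy algorithm, which for this $\alpha$ yields a representation of the form (4.88).

```python
import math

def convergents_sqrt2(count):
    """p_i, q_i with [a_0; a_1, ..., a_{i-1}] = p_i / q_i for alpha = sqrt(2).

    Index 0 holds the formal pair (p_0, q_0) = (1, 0); q_1 = 1, q_2 = a_1 = 2.
    """
    a = [1] + [2] * count                 # sqrt(2) = [1; 2, 2, 2, ...]
    p, q = [1, a[0]], [0, 1]
    for i in range(1, count):
        p.append(a[i] * p[i] + p[i - 1])  # p_{i+1} = a_i p_i + p_{i-1}
        q.append(a[i] * q[i] + q[i - 1])  # q_{i+1} = a_i q_i + q_{i-1}
    return p, q

def ostrowski_digits(n, q):
    """Greedy digits {i: b_i} (nonzero only) with n = sum of b_i * q_i."""
    digits = {}
    for i in range(len(q) - 1, 0, -1):
        if q[i] <= n:
            digits[i], n = divmod(n, q[i])
    return digits

def dist_to_nearest_integer(x):
    return abs(x - round(x))

def lemma_holds(n_max=2000, tol=1e-9):
    """Check (4.89) for all n <= n_max, and (4.90) whenever m >= 2."""
    alpha = math.sqrt(2)
    p, q = convergents_sqrt2(15)
    theta = [abs(q[i] * alpha - p[i]) for i in range(len(q))]
    for n in range(1, n_max + 1):
        m = min(ostrowski_digits(n, q))   # index of the lowest nonzero digit
        d = dist_to_nearest_integer(n * alpha)
        if d < theta[m + 1] - tol:
            return False
        if m >= 2 and d > theta[m - 1] + tol:
            return False
    return True
```

For instance, $27 = 2\cdot 12 + 2 + 1$ has lowest index $m = 1$, so only the lower bound (4.89) applies to it. The floating-point tolerance `tol` is an assumption of this sketch, not part of the lemma.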



4.5 More Results in a Nutshell

There are many more results that can be proved by this method. As a first illustration, consider the following, equally interesting and natural, problem of counting lattice points in a right-angled triangle, a variant of the problem in Sect. 1.2. It was Hardy–Littlewood [Ha-Li1, Ha-Li2] and Ostrowski [Os] who, independently of each other and at about the same time (around 1914–1920), started to investigate the problem of counting lattice points inside the right-angled triangle whose perpendicular sides are on the two coordinate axes and whose long side is on the line $\alpha x + y = t$; here $\alpha$ (the negative of the slope) is a fixed irrational. Let $\Delta = \Delta(\alpha; t)$ denote this right triangle; the vertices are the origin $O = (0, 0)$, $A = (t/\alpha, 0)$, and $B = (0, t)$; we assume that $t$ is a “large” positive number. The number of lattice points inside $\Delta(\alpha; t)$ equals (by vertical counting)

$$T_\alpha(t) = \sum_{k=1}^{\lfloor t/\alpha \rfloor} \lfloor t - k\alpha \rfloor = \frac{t^2}{2\alpha} - \frac{t}{4\alpha} - \frac{t}{4} + O(1) + Z_\alpha(t), \tag{4.94}$$

where

$$Z_\alpha(t) = \sum_{k=1}^{\lfloor t/\alpha \rfloor} \Big(\{k\alpha - t\} - \frac{1}{2}\Big) \tag{4.95}$$

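is analogous to $S_\alpha(n)$ in (1.43) ($\{y\}$ denotes, as usual, the fractional part of $y$). Note that the quadratic function

$$\frac{t^2}{2\alpha} - \frac{t}{4\alpha} - \frac{t}{4}$$

in (4.94) represents the area (= expectation). Indeed, consider the smaller and similar right-angled triangle where the lower left corner is $(1/2, 1/2)$ (instead of $(0, 0)$), but the long side is still on the $AB$-line; the area of this smaller triangle is exactly

$$\frac{1}{2}\Big(\frac{t}{\alpha} - \frac{1}{2}\Big)\Big(t - \frac{1}{2}\Big) = \frac{t^2}{2\alpha} - \frac{t}{4\alpha} - \frac{t}{4} + \frac{1}{8}.$$

To study the crucial part $Z_\alpha(t)$ in (4.94), we use the familiar Fourier series of the sawtooth function

$$((x)) = \{x\} - \frac{1}{2} = -\sum_{j=1}^{\infty} \frac{\sin(2\pi j x)}{\pi j}$$

in (4.95), with $x = k\alpha - t$, and obtain

$$Z_\alpha(t) = -\sum_{m=1}^{\infty} \frac{1}{\pi m}\Big(\sum_{k=1}^{\lfloor t/\alpha \rfloor} \sin\big(2\pi m (k\alpha - t)\big)\Big) = \sum_{m=1}^{\infty} \frac{\cos(2\pi m\alpha\{t/\alpha\} - \pi m\alpha) - \cos(2\pi m\{t\} - \pi m\alpha)}{2\pi m \sin(\pi m\alpha)}, \tag{4.96}$$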

where we used the identity

$$\sum_{k=1}^{n} \sin(k\beta) = \frac{\cos(\tfrac{1}{2}\beta) - \cos\big((n + \tfrac{1}{2})\beta\big)}{2\sin(\tfrac{1}{2}\beta)}.$$
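The summation identity for $\sum_{k=1}^{n}\sin(k\beta)$ is easy to confirm numerically. A minimal sketch, assuming the values $\beta = 2\pi m\sqrt{2}$ that arise for $\alpha = \sqrt{2}$ (the function names are illustrative):

```python
import math

def sine_sum_direct(n, beta):
    """Left-hand side: sin(beta) + sin(2*beta) + ... + sin(n*beta)."""
    return sum(math.sin(k * beta) for k in range(1, n + 1))

def sine_sum_closed_form(n, beta):
    """Right-hand side: (cos(beta/2) - cos((n + 1/2)*beta)) / (2*sin(beta/2))."""
    return (math.cos(beta / 2) - math.cos((n + 0.5) * beta)) / (2 * math.sin(beta / 2))

def max_discrepancy(m_max=5, n_max=50):
    """Largest gap between both sides over beta = 2*pi*m*sqrt(2)."""
    worst = 0.0
    for m in range(1, m_max + 1):
        beta = 2 * math.pi * m * math.sqrt(2)
        for n in range(1, n_max + 1):
            gap = abs(sine_sum_direct(n, beta) - sine_sum_closed_form(n, beta))
            worst = max(worst, gap)
    return worst
```

The closed form is what turns the inner sum over $k$ in (4.96) into the two cosine terms divided by $2\sin(\pi m\alpha)$.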

If the real variable $t$ runs in the “long” interval $0 < t < N$ and $N \to \infty$, then $\alpha\{t/\alpha\} \pmod 1$ and $\{t\}$ are “asymptotically independent” variables (a corollary of the uniform distribution of $n\alpha \pmod 1$). It follows via routine calculations that

$$\frac{1}{N}\int_0^N Z_\alpha(t)\,dt = \text{negligible}$$

and

$$\frac{1}{N}\int_0^N Z_\alpha^2(t)\,dt = \sum_{m=1}^{N} \frac{1}{4\pi^2 m^2 \sin^2(\pi m\alpha)} + \text{negligible}. \tag{4.97}$$

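Comparing this to the variance of

$$S_\alpha(n) = \sum_{k=1}^{n}\Big(\{k\alpha\} - \frac{1}{2}\Big)$$

(as $n$ runs in $1 \le n \le N$) in (3.12), we see an extra factor of 2 in (4.97), and so in the special case $\alpha = \sqrt{2}$, by (3.63), (3.87), (3.92) we have

$$\frac{1}{N}\int_0^N Z_{\sqrt{2}}^2(t)\,dt = \sum_{m=1}^{N} \frac{1}{4\pi^2 m^2 \sin^2(\pi m\sqrt{2})} + \text{negligible} = \frac{1}{24\sqrt{2}}\cdot\frac{\log N}{\log(1 + \sqrt{2})} + \text{negligible}. \tag{4.98}$$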



By using the proof technique of Sect. 4.3 (“Fourier series approach”) and applying it to (4.96), we can easily obtain the following analog of Theorem 1.1: for any integer $N \ge 3$ and any real numbers $-\infty < A < B < \infty$

$$\frac{1}{N}\,\text{measure}\Big\{0 < t < N :\ A \le \frac{Z_{\sqrt{2}}(t)}{c_1\sqrt{\log N}} \le B\Big\} = \frac{1}{\sqrt{2\pi}}\int_A^B e^{-u^2/2}\,du + O\big((\log N)^{-1/10}\log\log N\big), \tag{4.99}$$

where $c_1 = \big(24\sqrt{2}\log(1 + \sqrt{2})\big)^{-1/2}$ and measure stands for the usual one-dimensional Lebesgue measure. Note that (4.99) remains true if the long interval $0 < t < N$ is replaced with a constant size interval, say, $N - 1 < t < N$.

Another similar lattice point problem arises when the large square $[-t, t]^2$ centered at the origin is rotated by an angle $\theta$, where the slope $\tan(\theta) = \alpha$ is irrational; the center remains the origin. For simplicity, let $\alpha = \sqrt{2}$; by a routine application of Poisson's summation formula, the number $L(t) = L(\sqrt{2}; t)$ of lattice points inside the tilted square (of slope $\sqrt{2}$) equals

$$L(t) = 4t^2 + \sum_{\mathbf{n} = (n_1, n_2) \in \mathbb{Z}^2:\ \mathbf{n} \ne \mathbf{0}} \frac{\sin\big(2\pi t (n_1\sqrt{2} - n_2)/\sqrt{3}\big)}{\pi (n_1\sqrt{2} - n_2)/\sqrt{3}} \cdot \frac{\sin\big(2\pi t (n_1 + n_2\sqrt{2})/\sqrt{3}\big)}{\pi (n_1 + n_2\sqrt{2})/\sqrt{3}}, \tag{4.100}$$

where the term $4t^2$ (the area of the tilted square) comes from the contribution of $\mathbf{n} = \mathbf{0}$. It follows via routine calculations that

$$\frac{1}{N}\int_0^N \big(L(t) - 4t^2\big)\,dt = \text{negligible}$$

and (let $|\mathbf{n}| = \sqrt{n_1^2 + n_2^2}$)

$$\frac{1}{N}\int_0^N \big(L(t) - 4t^2\big)^2\,dt = \sum_{\mathbf{n} \in \mathbb{Z}^2:\ 0 < |\mathbf{n}| \le N} \frac{9}{4\pi^4 (n_1\sqrt{2} - n_2)^2 (n_1 + n_2\sqrt{2})^2} + \text{negligible}.$$

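Since

$$n_1\sqrt{2} - n_2 = \frac{2n_1^2 - n_2^2}{n_1\sqrt{2} + n_2} \quad\text{and}\quad n_1 + n_2\sqrt{2} = \frac{n_1^2 - 2n_2^2}{n_1 - n_2\sqrt{2}},$$

we have

$$\frac{1}{N}\int_0^N \big(L(t) - 4t^2\big)^2\,dt = \frac{9}{4\pi^4}\sum_{\mathbf{n} \in \mathbb{Z}^2:\ |\mathbf{n}| \le N,\ n_1 n_2 > 0} \frac{1}{(2n_1^2 - n_2^2)^2}\Big(\frac{n_1\sqrt{2} + n_2}{n_1 + n_2\sqrt{2}}\Big)^2 + \frac{9}{4\pi^4}\sum_{\mathbf{n} \in \mathbb{Z}^2:\ |\mathbf{n}| \le N,\ n_1 n_2 < 0} \frac{1}{(n_1^2 - 2n_2^2)^2}\Big(\frac{n_1 - n_2\sqrt{2}}{n_1\sqrt{2} - n_2}\Big)^2 + \text{negligible}$$

$$= \frac{9}{2\pi^4}\sum_{\mathbf{n} \in \mathbb{Z}^2:\ |\mathbf{n}| \le N,\ n_1 > 0,\ n_2 > 0} \Big(\frac{\sqrt{2} + \sqrt{2}}{1 + 2}\Big)^2\Big(\frac{1}{(2n_1^2 - n_2^2)^2} + \frac{1}{(n_1^2 - 2n_2^2)^2}\Big) + \text{negligible}$$

$$= \frac{8}{\pi^4}\sum_{\mathbf{n} \in \mathbb{Z}^2:\ |\mathbf{n}| \le N,\ n_1 > 0,\ n_2 > 0} \frac{1}{(n_1^2 - 2n_2^2)^2} + \text{negligible}.$$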

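Clearly

$$\sum_{\mathbf{n} \in \mathbb{Z}^2:\ |\mathbf{n}| \le N,\ n_1 > 0,\ n_2 > 0} \frac{1}{(n_1^2 - 2n_2^2)^2} = \zeta_K(2)\,\frac{\log N}{\log(1 + \sqrt{2})} + \text{negligible},$$

where $\zeta_K(s)$ is the Dedekind zeta function of the real quadratic field $K = \mathbb{Q}(\sqrt{2})$ (see (3.64)). By Siegel's formula (3.87) and (3.92),

$$\zeta_K(2) = \frac{\pi^4}{120 \cdot 2^{3/2}}\sum_{b^2 + ac = 2:\ a > 0,\ c > 0} a = \frac{\pi^4}{120 \cdot 2^{3/2}}\cdot 5 = \frac{\pi^4}{48\sqrt{2}},$$

and so

$$\frac{1}{N}\int_0^N \big(L(t) - 4t^2\big)^2\,dt = \frac{1}{6\sqrt{2}}\cdot\frac{\log N}{\log(1 + \sqrt{2})} + \text{negligible}.$$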

Comparing this to (4.43), we see an extra factor of 4 here. This factor of 4 is “explained” by the geometric intuition that our square has four tilted sides; on the other hand, the right-angled triangle has only one tilted side (the horizontal and vertical sides do not count).

By using the proof technique of Sect. 4.3 (“Fourier series approach”) and applying it to (4.100), we can easily obtain the following analog of Theorem 1.1: for any integer $N \ge 3$ and any real numbers $-\infty < A < B < \infty$

$$\frac{1}{N}\,\text{measure}\Big\{0 < t < N :\ A \le \frac{L(\sqrt{2}; t) - 4t^2}{c_2\sqrt{\log N}} \le B\Big\} = \frac{1}{\sqrt{2\pi}}\int_A^B e^{-u^2/2}\,du + O\big((\log N)^{-1/10}\log\log N\big), \tag{4.101}$$

where $c_2 = \big(6\sqrt{2}\log(1 + \sqrt{2})\big)^{-1/2} = 2c_1$ (see (4.99)) and again measure stands for the one-dimensional Lebesgue measure. Again (4.101) remains true if the long interval $0 < t < N$ is replaced with a short one like $N - 1 < t < N$.

Of course, we can generalize both (4.99) and (4.101) from $\alpha = \sqrt{2}$ to any quadratic irrational. Note that (4.99) and (4.101) are both central limit theorems, similar to Theorems 1.1 and 1.2. What happens when we switch from $\alpha = \sqrt{2}$ (or any quadratic irrational) to a typical real $\alpha$, i.e., in the case of almost every $\alpha$? Perhaps it surprises the reader to learn that for a typical $\alpha$ we cannot expect a central limit theorem. The reason behind it is in the distribution of the partial quotients $a_1, a_2, a_3, \ldots$ in the continued fraction for $\alpha = [a_0; a_1, a_2, a_3, \ldots]$. If $\alpha$ is a quadratic irrational then the partial quotients form a bounded (and periodic) sequence. In sharp contrast, for a typical $\alpha$, by a well-known theorem of Kusmin, the density of $a_n = 1$ is

$$\frac{\log(4/3)}{\log 2} = .415\ldots \approx 41.5\,\%;$$

see (2.198), and in general, for any fixed integer $k \ge 1$, the density of $a_n = k$ is

$$\frac{\log\frac{(k+1)^2}{k(k+2)}}{\log 2} = \frac{1}{\log 2}\log\Big(1 + \frac{1}{k(k+2)}\Big) \approx \frac{1}{k^2\log 2}, \tag{4.102}$$

see (2.197). So the average size of the partial quotients equals

$$\sum_{k=1}^{\infty} k\cdot\frac{\log\frac{(k+1)^2}{k(k+2)}}{\log 2} \approx \frac{1}{\log 2}\sum_{k=1}^{\infty} k\cdot\frac{1}{k^2} = \frac{1}{\log 2}\sum_{k=1}^{\infty}\frac{1}{k} = \infty \tag{4.103}$$

for almost every $\alpha$. The divergence in (4.103) is the reason behind the failure of the central limit theorem for almost every $\alpha$. (Note in advance that the fact $\sum_{k=1}^{n}\frac{1}{k} = \log n + O(1)$ will be used again in Sect. 6.10, explaining the extra factor of $\log n$ for the “almost every $\alpha$” type results in Part 1.3.)

There is a limit theorem here, proved by Kesten [Ke], which is not a central limit theorem. It goes as follows:

$$\text{area}\Big\{(\alpha, \beta) \in [0, 1)^2 :\ \frac{\sum_{k=1}^{N}\big(\|k\alpha + \beta\| - 1/4\big)}{c_3\log N} \le A\Big\} \to \frac{1}{\pi}\int_{-\infty}^{A}\frac{du}{1 + u^2}, \tag{4.104}$$

where $c_3 > 0$ is an absolute constant and, of course, $\|y\|$ is the distance of $y$ from the nearest integer (its average value is 1/4).
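The densities in (4.102) and the divergence in (4.103) can be illustrated numerically. A minimal sketch (the function names are illustrative, and `log1p` is used only to keep the logarithm accurate for large $k$):

```python
import math

def kuzmin_density(k):
    """Density of the partial quotient value k, as in (4.102)."""
    return math.log1p(1.0 / (k * (k + 2))) / math.log(2)

def total_density(K):
    """Sum of the densities for k = 1..K; it telescopes and tends to 1."""
    return sum(kuzmin_density(k) for k in range(1, K + 1))

def truncated_mean(K):
    """Partial sum of k * density(k) for k = 1..K, the truncation of (4.103)."""
    return sum(k * kuzmin_density(k) for k in range(1, K + 1))

if __name__ == "__main__":
    print("density of a_n = 1:", kuzmin_density(1))
    for K in (10**2, 10**4, 10**6):
        # The truncated mean keeps growing, roughly like log(K) / log(2).
        print(K, total_density(K), truncated_mean(K))
```

The total density converges quickly, while the truncated mean grows without bound at a logarithmic rate, which is the divergence that rules out a central limit theorem for almost every $\alpha$.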



The limit distribution on the right-hand side of (4.104) is called the Cauchy distribution, and it is “degenerate” in the sense that it has neither expectation nor variance (both give divergent integrals). The appearance of the “degenerate” Cauchy distribution is quite natural, since the square-integral

$$\int_0^1\!\!\int_0^1 \Big(\sum_{k=1}^{N}\big(\|k\alpha + \beta\| - 1/4\big)\Big)^2 d\alpha\,d\beta = \text{const}\cdot N \tag{4.105}$$

is exponentially larger than the norming factor $\log N$ in (4.104). (Note that the proof of (4.105) is based on (4.102).)

On the other hand, by switching from $\alpha = \sqrt{2}$ to $\alpha = e$ (or its relatives such as $\sqrt{e}$, $e^2$, $\sqrt[3]{e}$), we can save the central limit theorem. As an illustration, we show an analog of Theorem 1.2. First we recall

$$e = [2; 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, 1, \ldots, 1, 2i, 1, \ldots], \tag{4.106}$$

and next we apply (3.12) with $\alpha = e$:

$$\text{variance} = V_e(N) = \frac{1}{8\pi^2}\sum_{j=1}^{N}\frac{1}{j^2\sin^2(\pi j e)} + \text{negligible} = \frac{1}{8\pi^4}\sum_{j=1}^{N}\frac{1}{j^2\|je\|^2} + \text{negligible}. \tag{4.107}$$

Let $p_i/q_i$ denote the $i$th convergent of $e$, and let $q_k \le N < q_{k+1}$. By using (4.106) and the recurrence formula $q_i = a_{i-1}q_{i-1} + q_{i-2}$ (and also formula (1.28)), we have

$$k = (3 + o(1))\frac{\log N}{\log\log N}.$$

It is easy to see that the main contribution of (4.107) comes from the $j$'s of the form $j = mq_i$ (with $m = 1, 2, 3, \ldots$):

$$V_e(N) = \frac{1}{8\pi^4}\Big(\sum_{m \ge 1}\frac{1}{m^4}\Big)\Big(\sum_{1 \le i \le 3\log N/\log\log N} a_i^2\Big) + \text{negligible} = \frac{1}{8\pi^4}\cdot\frac{\pi^4}{90}\Big(\sum_{1 \le \ell \le \log N/\log\log N}(2\ell)^2\Big) + \text{negligible}$$

$$= \frac{1}{8\pi^4}\cdot\frac{\pi^4}{90}\cdot\frac{4}{3}\Big(\frac{\log N}{\log\log N}\Big)^3 + \text{negligible} = \frac{1}{540}\Big(\frac{\log N}{\log\log N}\Big)^3 + \text{negligible}, \tag{4.108}$$

describing the variance. On the other hand, to determine the mean value, we use Proposition 2.1: for $\alpha = e$ the continued fraction (4.106) gives the alternating sum

$$(-1 + 2 - 1) + (1 - 4 + 1) + (-1 + 6 - 1) + \cdots + (-1)^i(1 - 2i + 1),$$

which equals $i - 1$ if $i$ is odd and $-i$ if $i$ is even. Thus by Proposition 2.1,

$$M_e(N) = \frac{1}{N}\sum_{n=1}^{N} S_e(n) = O(\log N/\log\log N), \tag{4.109}$$

which is the true order of magnitude. Equations (4.108) and (4.109) tell us that the mean value $M_e(N)$ is negligible compared to the standard deviation

$$\sqrt{V_e(N)} = \frac{1}{6\sqrt{15}}\Big(\frac{\log N}{\log\log N}\Big)^{3/2} + \text{negligible}. \tag{4.110}$$
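The pattern (4.106) and the alternating sum used for the mean value are simple to reproduce. A minimal sketch, assuming only the pattern of partial quotients stated in (4.106); the function names are illustrative:

```python
from fractions import Fraction

def e_partial_quotients(count):
    """First `count` partial quotients of e = [2; 1, 2, 1, 1, 4, 1, 1, 6, ...]."""
    quotients = [2]
    block = 1
    while len(quotients) < count:
        quotients.extend([1, 2 * block, 1])   # the block (1, 2i, 1)
        block += 1
    return quotients[:count]

def convergent(quotients):
    """Value of the finite continued fraction [a_0; a_1, ..., a_r] as a Fraction."""
    p_prev, p = 1, quotients[0]
    q_prev, q = 0, 1
    for a in quotients[1:]:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
    return Fraction(p, q)

def alternating_block_sum(i):
    """(-1+2-1) + (1-4+1) + ... + (-1)^i (1 - 2i + 1)."""
    return sum((-1) ** j * (2 - 2 * j) for j in range(1, i + 1))

if __name__ == "__main__":
    print(e_partial_quotients(13))
    print(convergent(e_partial_quotients(10)))           # a rational approximation of e
    print([alternating_block_sum(i) for i in range(1, 7)])
```

The alternating sum stays of size $O(i)$ while the sum of squares of the partial quotients in (4.108) grows like $i^3$, which is why the mean is negligible next to the standard deviation.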

We can prove the following analog of Theorem 1.2 for $\alpha = e$: for any integer $N \ge 3$ and any real numbers $-\infty < A < B < \infty$

$$\frac{1}{N}\,\text{measure}\Big\{1 \le n \le N :\ A \le \frac{S_e(n)}{c_4(\log N/\log\log N)^{3/2}} \le B\Big\} = \frac{1}{\sqrt{2\pi}}\int_A^B e^{-u^2/2}\,du + O\big((\log N)^{-3/10}\log\log N\big), \tag{4.111}$$

where $c_4 = 6^{-1}\,15^{-1/2}$ (see (4.110)).

What happens for an arbitrary irrational $\alpha = [a_0; a_1, a_2, a_3, \ldots]$? How far can we generalize Theorem 1.2? Proposition 2.1 expresses the mean value $M_\alpha(N)$ in terms of the partial quotients $a_i$. For the variance we don't have a similar elegant formula, but we still have (3.12)

$$V_\alpha(N) = \frac{1}{N}\sum_{n=0}^{N}\big(S_\alpha(n) - M_\alpha(N)\big)^2 = \frac{1}{8\pi^2}\sum_{n=1}^{N}\frac{1}{(n\sin(\pi n\alpha))^2} + \text{negligible}. \tag{4.112}$$

It is easy to see that the right-hand side of (4.112) is between two absolute constant multiples of $\sum_{i:\ q_i \le N} a_i^2$. We can prove the following central limit theorem: assume that

$$\frac{a_m^2}{\sum_{i=1}^{m} a_i^2} \to 0 \quad\text{as } m \to \infty, \tag{4.113}$$

then

$$\frac{1}{N}\,\text{measure}\Big\{1 \le n \le N :\ A \le \frac{S_\alpha(n) - M_\alpha(N)}{\sqrt{V_\alpha(N)}} \le B\Big\} \to \frac{1}{\sqrt{2\pi}}\int_A^B e^{-u^2/2}\,du \tag{4.114}$$

for any fixed values of $-\infty < A < B < \infty$ as $N \to \infty$. Observe that (4.113) is basically the necessary condition (called Lindeberg condition) that the components are “individually negligible.” This is why (4.114) is the most general result that we can hope for.

Can we prove a similar central limit theorem for, say, $\alpha = \sqrt[3]{2}$? Well, unfortunately we know almost nothing about the continued fraction for $\sqrt[3]{2}$ (or any other real algebraic number of degree $\ge 3$). In particular, we don't have the slightest clue whether criterion (4.113) applies for $\alpha = \sqrt[3]{2}$ or not.

The proofs of (4.111) and (4.114) are somewhat more complicated than that of Theorem 1.1 (see Sect. 4.3), due to the fact that in general the continued fraction is not periodic, so we do not obtain a homogeneous Markov chain. Nevertheless, the usual decomposition technique (see Sects. 1.5, 4.1, 4.3) still works, as the Markov chain (“short-term memory”) can be successfully replaced by exponentially weak dependence. The best way to handle exponentially weak dependence is to involve martingales. We will give the full details somewhere else.

Part II

Local Aspects Inhomogeneous Pell Inequalities

Chapter 5

Pell’s Equation, Superirregularity and Randomness

5.1 From Pell Equation to Superirregularity

5.1.1 Pell's Equation: Bounded Fluctuations

Our starting point is the well-known Pell's equation, a standard part of any introductory course on number theory. The theory of Pell's equation, while mostly elementary, is nevertheless one of the most beautiful chapters in the whole of mathematics. Also, it is very important, since the concept of units plays a key role in algebraic number theory. We illustrate the main results on the concrete equation $x^2 - 2y^2 = \pm 1$. This equation has infinitely many integral solutions; in fact, the set of all integral solutions $(x_k, y_k) \in \mathbb{Z}^2$ forms a cyclic group generated by the least positive solution. More precisely, we have

$$x_k + y_k\sqrt{2} = \pm(1 + \sqrt{2})^k, \quad k \in \mathbb{Z}.$$

All integral solutions of $x^2 - 2y^2 = 1$ are given by

$$x_k + y_k\sqrt{2} = \pm(1 + \sqrt{2})^{2k}$$

and all of $x^2 - 2y^2 = -1$ by

$$x_k + y_k\sqrt{2} = \pm(1 + \sqrt{2})^{2k+1}.$$

In particular, all positive integer solutions of $x^2 - 2y^2 = 1$ are given by

$$x_k + y_k\sqrt{2} = (1 + \sqrt{2})^{2k} = (3 + 2\sqrt{2})^k, \quad k = 1, 2, 3, \ldots$$

Taking the algebraic conjugate $x_k - y_k\sqrt{2} = (3 - 2\sqrt{2})^k$ and adding/subtracting these two equations, we obtain the explicit formulas

$$x_k = \frac{(3 + 2\sqrt{2})^k + (3 - 2\sqrt{2})^k}{2} \quad\text{and}\quad y_k = \frac{(3 + 2\sqrt{2})^k - (3 - 2\sqrt{2})^k}{2\sqrt{2}}. \tag{5.1}$$

Since $0 < 3 - 2\sqrt{2} < 1$ (in fact, $0 < 3 - 2\sqrt{2} < 1/5$), we have

$$x_k = \text{the nearest integer to } \frac{1}{2}(3 + 2\sqrt{2})^k$$

and

$$y_k = \text{the nearest integer to } \frac{1}{2\sqrt{2}}(3 + 2\sqrt{2})^k.$$

If $k$ is large, the error is very small. For example, the tenth solution of $x^2 - 2y^2 = 1$ in positive integers is the pair

$$x_{10} = 22{,}619{,}537 \quad\text{and}\quad y_{10} = 15{,}994{,}428.$$

On the other hand,

$$\frac{1}{2}(3 + 2\sqrt{2})^{10} = 22{,}619{,}536.99999998895\ldots$$

and

$$\frac{1}{2\sqrt{2}}(3 + 2\sqrt{2})^{10} = 15{,}994{,}428.000000007815\ldots.$$

Let $F(N) = F(\sqrt{2}; 1; N)$ denote the number of positive integer solutions of the Pell equation $x^2 - 2y^2 = 1$ up to $N$ in the sense $x \ge 1$ and $1 \le y \le N$. (For the simplicity of notation it is more convenient to restrict the second variable $y$.) We have

$$k \le F(N) \iff \frac{(3 + 2\sqrt{2})^k - (3 - 2\sqrt{2})^k}{2\sqrt{2}} \le N,$$

which implies the asymptotic formula

$$F(N) = F(\sqrt{2}; 1; N) = \frac{\log N}{\log(3 + 2\sqrt{2})} + O(1). \tag{5.2}$$
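The explicit solutions and the bounded error in (5.2) can be reproduced with exact integer arithmetic. A minimal sketch (the function names are illustrative), using the fact that multiplying $x + y\sqrt{2}$ by $3 + 2\sqrt{2}$ maps $(x, y)$ to $(3x + 4y,\ 2x + 3y)$:

```python
import math

def pell_solutions(count):
    """First `count` positive solutions (x_k, y_k) of x^2 - 2*y^2 = 1."""
    solutions = []
    x, y = 3, 2                              # least positive solution
    for _ in range(count):
        solutions.append((x, y))
        x, y = 3 * x + 4 * y, 2 * x + 3 * y  # multiply by 3 + 2*sqrt(2)
    return solutions

def pell_count(N):
    """F(N): number of positive solutions of x^2 - 2*y^2 = 1 with 1 <= y <= N."""
    count = 0
    x, y = 3, 2
    while y <= N:
        count += 1
        x, y = 3 * x + 4 * y, 2 * x + 3 * y
    return count

def main_term(N):
    """Main term log N / log(3 + 2*sqrt(2)) of (5.2)."""
    return math.log(N) / math.log(3 + 2 * math.sqrt(2))

if __name__ == "__main__":
    print(pell_solutions(10)[-1])
    for N in (10, 10**3, 10**6, 10**12):
        print(N, pell_count(N), main_term(N))
```

In every case the difference between the count and the main term stays below 1 in absolute value, which is the bounded fluctuation stated in (5.2).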


Formula (5.2) says that the counting function $F(N) = F(\sqrt{2}; 1; N)$ has an extremely predictable, almost deterministic behavior: it is $\text{const}\cdot\log N$ plus some totally negligible bounded error term.

Note that (5.2) has some far-reaching generalizations. Let $[\gamma_1, \gamma_2]$ be an arbitrary interval, and let $F(\sqrt{2}; [\gamma_1, \gamma_2]; N)$ denote the number of positive integer solutions of the Pell inequality $\gamma_1 \le x^2 - 2y^2 \le \gamma_2$, $x \ge 1$ and $1 \le y \le N$. By using the theory of indefinite binary quadratic forms, it is easy to prove the following analog of (5.2):

$$F(\sqrt{2}; [\gamma_1, \gamma_2]; N) = c_0(\sqrt{2}; \gamma_1, \gamma_2)\log N + O(1), \tag{5.3}$$

where the constant factor $c_0(\sqrt{2}; \gamma_1, \gamma_2)$ is independent of $N$. What is more, we can switch from $\sqrt{2}$ to any other quadratic irrational $\alpha$ (i.e., $\alpha$ is a root of a quadratic equation $Ax^2 + Bx + C = 0$ with integral coefficients such that the discriminant $B^2 - 4AC \ge 2$ is not a complete square).

Let's go back to (5.3), that is, to the special case $\alpha = \sqrt{2}$. For example, if $-2 < \gamma_1 \le -1 < 1 \le \gamma_2 < 2$, then

$$c_0(\sqrt{2}; \gamma_1, \gamma_2) = \frac{2}{\log(3 + 2\sqrt{2})} = \frac{1}{\log(1 + \sqrt{2})}; \tag{5.4}$$

if $-1 < \gamma_1 \le 1 \le \gamma_2 < 2$, then

$$c_0(\sqrt{2}; \gamma_1, \gamma_2) = \frac{1}{\log(3 + 2\sqrt{2})}; \tag{5.5}$$

and finally if $-1 < \gamma_1 \le \gamma_2 < 1$, then of course

$$c_0(\sqrt{2}; \gamma_1, \gamma_2) = 0. \tag{5.6}$$

5.1.2 The Area Principle

It is very interesting to compare these well-known asymptotic results about the number of solutions of the Pell equation/inequality to what we like to call the “naive area principle,” a natural guiding intuition in “lattice point theory.” It goes as follows: if a “nice region” has a “large” area, then it should contain a “large” number of lattice points, “close” to the area. In the rest we refer to this vague intuition as the Area Principle. Of course, the heart of the matter is how to define “nice region” precisely.

Consider, for example, the infinite open horizontal strip of height one: $0 < y < 1$, $-\infty < x < \infty$; it has infinite area, but it contains no lattice point. The reader is likely to agree that the infinite strip is a “nice region,” so the Area Principle is clearly violated here.



A less trivial example comes from the Pell inequality

$$-\frac{1}{2} \le x^2 - 2y^2 \le \frac{1}{2}, \tag{5.7}$$

which is a hyperbolic region of infinite area, and contains no lattice point except the origin. The reader is again likely to agree that the hyperbolic region (5.7) is also “nice,” so this is again a violation of the Area Principle.

Next we switch from (5.7) to the general Pell inequality

$$\gamma_1 \le x^2 - 2y^2 \le \gamma_2, \tag{5.8}$$

where $-\infty < \gamma_1 < \gamma_2 < \infty$ are arbitrary real numbers. Of course, the hyperbolic region (5.8) has infinite area. What we want to compute is the area of a finite segment. Consider the finite region

$$H(\sqrt{2}; [\gamma_1, \gamma_2]; N) = \big\{(x, y) \in \mathbb{R}^2 :\ \gamma_1 \le x^2 - 2y^2 \le \gamma_2 \text{ where } x \ge 1 \text{ and } 1 \le y \le N\big\}. \tag{5.9}$$

If $N$ is very large compared to the pair of constants $\gamma_1, \gamma_2$, then the finite region $H(\sqrt{2}; [\gamma_1, \gamma_2]; N)$ looks like a “hyperbolic needle.”

It is easy to estimate the area of the “hyperbolic needle” $H(\sqrt{2}; [\gamma_1, \gamma_2]; N)$:

$$\text{area}\big(H(\sqrt{2}; [\gamma_1, \gamma_2]; N)\big) = \frac{\gamma_2 - \gamma_1}{2\sqrt{2}}\log N + O(1), \tag{5.10}$$

where the implicit constant in $O(1)$ is independent of $N$ (but may depend on $\gamma_1$ and $\gamma_2$). The proof of (5.10) is based on the familiar factorization

$$x^2 - 2y^2 = (x + y\sqrt{2})(x - y\sqrt{2}) \tag{5.11}$$

and on the computation of the Jacobian of the corresponding substitution [this explains the factor $2\sqrt{2}$ in the denominator in (5.10)]. The details are easy, and go as follows. In view of the factorization (5.11), it is more convenient to compute the area of the following slight variant of region (5.9): let

$$H^*(\sqrt{2}; [\gamma_1, \gamma_2]; N) = \big\{(x, y) \in \mathbb{R}^2 :\ \gamma_1 \le x^2 - 2y^2 \le \gamma_2 \text{ where } 1 \le x + y\sqrt{2} \le 2\sqrt{2}N\big\}. \tag{5.12}$$

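Consider the substitution

$$u_1 = x + y\sqrt{2}, \quad u_2 = x - y\sqrt{2}, \tag{5.13′}$$

which is equivalent to

$$x = \frac{u_1 + u_2}{2}, \quad y = \frac{u_1 - u_2}{2\sqrt{2}}, \tag{5.13″}$$

and the corresponding Jacobian is

$$\frac{\partial(x, y)}{\partial(u_2, u_1)} = \begin{vmatrix} 1/2 & 1/2 \\ -2^{-3/2} & 2^{-3/2} \end{vmatrix} = 2^{-3/2}.$$

Applying the substitution (5.13′) and (5.13″), we have

$$\text{area}\big(H^*(\sqrt{2}; [\gamma_1, \gamma_2]; N)\big) = \iint_{H^*(\sqrt{2}; [\gamma_1, \gamma_2]; N)} 1\,dx\,dy = \frac{1}{2\sqrt{2}}\int_{1 \le u_1 \le 2\sqrt{2}N}\Big(\int_{\gamma_1/u_1 \le u_2 \le \gamma_2/u_1} 1\,du_2\Big)du_1$$

$$= \frac{1}{2\sqrt{2}}\int_1^{2\sqrt{2}N}\frac{\gamma_2 - \gamma_1}{u_1}\,du_1 = \frac{\gamma_2 - \gamma_1}{2\sqrt{2}}\log N + O(1). \tag{5.14}$$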

A simple geometric consideration shows that

$$\text{area}\big(H(\sqrt{2}; [\gamma_1, \gamma_2]; N)\big) = \text{area}\big(H^*(\sqrt{2}; [\gamma_1, \gamma_2]; N)\big) + O(1),$$

and so (5.14) implies (5.10).

Now let's return to the Area Principle. Comparing (5.3) with (5.9) and (5.10), it is “reasonable” to expect, in view of the Area Principle, that the counting function $F(\sqrt{2}; [\gamma_1, \gamma_2]; N)$ is “close” to the area of the hyperbolic needle $H(\sqrt{2}; [\gamma_1, \gamma_2]; N)$. In other words, it is “reasonable” to expect that

$$c_0(\sqrt{2}; \gamma_1, \gamma_2) = \frac{\gamma_2 - \gamma_1}{2\sqrt{2}}. \tag{5.15}$$

Unfortunately, the Area Principle is “almost always” violated in the quantitative sense that (5.15) fails for the overwhelming majority of the choices $-\infty < \gamma_1 < \gamma_2 < \infty$. In fact, the left-hand side and the right-hand side of (5.15) have completely different behavior: the left-hand side of (5.15) has discrete jumps and the right-hand side is a continuous function of $\gamma_1$ and $\gamma_2$. For example, as $\gamma_1$ and $\gamma_2$ run in the interval $-2 < \gamma_1 < \gamma_2 < 2$, the constant factor $c_0(\sqrt{2}; \gamma_1, \gamma_2)$ has only three possible values [see (5.4)–(5.6)]:

$$0, \quad \frac{1}{\log(3 + 2\sqrt{2})}, \quad \frac{2}{\log(3 + 2\sqrt{2})}.$$

This shows, in a quantitative way, how the general Pell inequality [see (5.8)] $\gamma_1 \le x^2 - 2y^2 \le \gamma_2$ violates the Area Principle.
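The gap between the lattice point count in (5.3) and the area coefficient in (5.15) can be seen by brute force. A minimal sketch (the function names are illustrative; since $x^2 - 2y^2$ is an integer, the real bounds are first rounded inward to integers):

```python
import math

def count_pell_inequality(gamma1, gamma2, N):
    """Number of integer pairs (x, y) with x >= 1, 1 <= y <= N and
    gamma1 <= x^2 - 2*y^2 <= gamma2."""
    lo, hi = math.ceil(gamma1), math.floor(gamma2)
    total = 0
    for y in range(1, N + 1):
        upper = 2 * y * y + hi              # x^2 may be at most this
        if upper < 1:
            continue
        lower = max(2 * y * y + lo, 1)      # x^2 must be at least this
        x_min = math.isqrt(lower - 1) + 1   # smallest x with x^2 >= lower
        x_max = math.isqrt(upper)           # largest x with x^2 <= upper
        total += max(0, x_max - x_min + 1)
    return total

def area_coefficient(gamma1, gamma2):
    """Coefficient of log N in the area formula (5.10)."""
    return (gamma2 - gamma1) / (2 * math.sqrt(2))

if __name__ == "__main__":
    N = 10**4
    for g1, g2 in ((-0.5, 0.5), (-1, 1), (-1.9, 1.9)):
        count = count_pell_inequality(g1, g2, N)
        print((g1, g2), count, count / math.log(N), area_coefficient(g1, g2))
```

For $[-1/2, 1/2]$ the count is zero although the area grows like $\frac{1}{2\sqrt{2}}\log N$, and the counts for $[-1, 1]$ and $[-1.9, 1.9]$ coincide although the areas differ, in line with the discrete jumps described above.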

5.1.3 The Giant Leap in the Inhomogeneous Case: Extra Large Fluctuations

Using the familiar factorization (5.11), we can rewrite the Pell equation $x^2 - 2y^2 = \pm 1$, restricted to positive integers, as follows:

$$|x^2 - 2y^2| \le 1 \iff |y\sqrt{2} - x|\cdot(y\sqrt{2} + x) \le 1 \iff \|y\sqrt{2}\|\cdot(y\sqrt{2} + x) \le 1, \tag{5.16}$$

where $\|z\|$ denotes, as usual, the distance of a real number $z$ from the nearest integer. Notice that in (5.16) $x$ is the nearest integer to $y\sqrt{2}$ (= an irrational number, namely, an integral multiple of $\sqrt{2}$ where $y \ge 1$). Since $y\sqrt{2} = x + o(1)$, (5.16) is “basically” equivalent to the “vague” inequality

$$\|y\sqrt{2}\| \le \frac{1 + o(1)}{2\sqrt{2}\,y}. \tag{5.17}$$

The vagueness of (5.17) comes from the additive term $o(1)$, which tends to 0 as $y \to \infty$. Formula (5.17) is ambiguous, but we are sure every mathematician understands what we are talking about here. An expert in number theory would classify (5.17) as a basic problem in diophantine approximation. Next we give a nutshell summary of diophantine approximation. The classical problem in the theory of diophantine approximation is to find “good” rational approximations of irrational numbers. More precisely, we want to decide whether an inequality $\|n\alpha\|$
0 > ˛ 0 . For simplicity assume that the interval Œ1 ; 2 is symmetric to 0, i.e., Œ1 ; 2 D Œ; . Also, assume that we are interested in the positive integral solutions of (5.24). Since ˛ > 0 > ˛ 0 , for “large” positive x and y the second factor .x ˛ 0 y C 2 / in (5.24) is also “large” positive, implying that the first factor .x ˛y C 1 / in (5.24) has to be very small. That is, x has to be the nearest integer to .y˛ 1 /. It follows that the symmetric version of (5.21) a11 x 2 C a12 xy C a22 y 2 C a13 x C a23 y ;

(5.25)

where > 0 is a given real number, is equivalent to the diophantine inequality ky˛ 1 k
0; y D v0 > 0 is the least positive solution of Pell’s equation x 2 Dy 2 D 4. As a by-product, we obtain that the number of positive integral solutions of 1 Q.x; y/ 2 with 1 x N; 1 y N has the simple asymptotic form c log N C O.1/, where c D c.a11 ; a12 ; a22 ; 1 ; 2 / is a constant and the error term O.1/ is uniformly bounded as N ! 1. (For a more detailed proof, see Lang’s book [La].) Exactly the same holds if there is a nonzero linear part a13 x C a23 y in (5.21), but its effect “cancels out”: 1 in (5.24) is an integer.



Finally, if 1 is not an integer, then we call (5.24) an inhomogeneous Pell inequality. In view of (5.26), an inhomogeneous Pell inequality (5.24) is basically equivalent to an inhomogeneous diophantine inequality kn˛ ˇk
0 and for almost every $a_{13}, a_{23}$. The number of solutions (1) exhibits extra large fluctuations (proportional to the area!), (2) satisfies an elegant central limit theorem, and (3) satisfies a shockingly precise law of the iterated logarithm; see Theorems 5.3, 5.4, and 5.6.

Because it represents the whole difficulty, for notational simplicity we formulate the results in the special case of discriminant $D = 8$. It corresponds to the most famous quadratic irrational $\alpha = \sqrt{2}$. Since the class number of discriminant $D = 8$ is one, the general form of an inhomogeneous Pell inequality of discriminant $D = 8$ is

$$\gamma_1 \le (x + \beta_1)^2 - 2(y + \beta_2)^2 \le \gamma_2 \tag{5.28}$$

where $\gamma_1 < \gamma_2$ and $\beta_1, \beta_2 \in [0, 1)$ are fixed constants. For notational simplicity we restrict ourselves to symmetric intervals $[-\gamma, \gamma]$ in (5.28); note that everything works similarly for general intervals $[\gamma_1, \gamma_2]$. The factorization

$$(x + \beta_1)^2 - 2(y + \beta_2)^2 = (x + \beta - y\sqrt{2})(x + \beta' + y\sqrt{2}), \tag{5.29}$$

where $\beta = \beta_1 - \beta_2\sqrt{2}$ and $\beta' = \beta_1 + \beta_2\sqrt{2}$, clearly indicates that the asymptotic number of integral solutions of (5.28) heavily depends on the “local” behavior of $n\sqrt{2} \bmod 1$. In fact, (5.28) is essentially equivalent to the inhomogeneous diophantine inequality

$$\|n\sqrt{2} - \beta\| < \frac{c}{n} \tag{5.30}$$

with $c = \gamma/(2\sqrt{2})$. To turn the vague term “essentially equivalent” into a precise statement, let $F(\sqrt{2}; \beta_1, \beta_2; \gamma; N)$ be the number of integral solutions $(x, y) \in \mathbb{Z}^2$ of (5.28) with $\gamma_2 = \gamma$, $\gamma_1 = -\gamma$ satisfying $1 \le y \le N$ and $x \ge 1$. It means counting lattice points in a long and narrow hyperbola segment.



Let $F(\sqrt{2}; \beta; c; N)$ be the number of integral solutions $n$ of (5.30) satisfying $1 \le n \le N$, where $\beta = \beta_1 - \beta_2\sqrt{2}$. Now essentially equivalent means that, for almost every pair $\beta_1, \beta_2$,

$$F(\sqrt{2}; \beta_1, \beta_2; \gamma; N) - F(\sqrt{2}; \beta; c; N) = O(1)$$

as $N \to \infty$, where $c = \gamma/(2\sqrt{2})$ (and of course $\beta = \beta_1 - \beta_2\sqrt{2}$). More precisely, we have

Lemma 5.1. Let $\gamma > 0$ and $\beta_2$ be arbitrary real numbers. Then for almost every $\beta_1$ there exists a finite $0 < C(\beta_1, \beta_2, \gamma) < \infty$ such that

$$\int_0^1 C(\beta_1, \beta_2, \gamma)\,d\beta_1 < \infty$$

and

$$\big|F(\sqrt{2}; \beta_1, \beta_2; \gamma; N) - F(\sqrt{2}; \beta; c; N)\big| < C(\beta_1, \beta_2, \gamma) \quad\text{for all } N \ge 1,$$

where $c = \gamma/(2\sqrt{2})$ and $\beta = \beta_1 - \beta_2\sqrt{2}$.

We postpone the simple proof to Sect. 5.3. In view of Lemma 5.1 it suffices to study the special case $\beta_2 = 0$, $\beta_1 = \beta$:

$$-\gamma \le (x + \beta)^2 - 2y^2 \le \gamma \tag{5.31}$$

where $\gamma > 0$ and $\beta \in [0, 1)$ are fixed constants. For simplicity, let $F(\sqrt{2}; \beta; \gamma; N)$ denote the number of integral solutions $(x, y) \in \mathbb{Z}^2$ of (5.31) satisfying $1 \le y \le N$ and $x \ge 1$. $F(\sqrt{2}; \beta; \gamma; N)$ counts the number of lattice points in a long and narrow hyperbola segment (“hyperbolic needle”) located along a line of slope $1/\sqrt{2}$ (if $\beta = 0$ then the line is $y = x/\sqrt{2}$), see Fig. 5.1.

Fig. 5.1



In the special case $\gamma = 1$ and $\beta = 0$, (5.31) becomes the simplest Pell equation $x^2 - 2y^2 = \pm 1$. The integral solutions $(x_k, y_k)$ form a cyclic group generated by the smallest positive solution $x = y = 1$ in the well-known way: $x_k + y_k\sqrt{2} = (1 + \sqrt{2})^k$, implying the familiar asymptotic formula

$$F(\sqrt{2}; \beta = 0; \gamma = 1; N) = \frac{\log N}{\log(1 + \sqrt{2})} + O(1), \tag{5.32}$$

where $1 + \sqrt{2}$ is the fundamental unit of the real quadratic field $\mathbb{Q}(\sqrt{2})$.

In sharp contrast to the bounded fluctuation in the homogeneous case $\beta = 0$, the inhomogeneous case can exhibit “extra large fluctuations proportional to the area,” see Theorem 5.3. To explain this, first we have to compute the mean value of $F(\sqrt{2}; \beta; \gamma; N)$ as $\beta$ runs in the unit interval $0 \le \beta < 1$.

Lemma 5.2. We have

$$\int_0^1 F(\sqrt{2}; \beta; \gamma; N)\,d\beta = \frac{\gamma}{\sqrt{2}}\log N + O(1), \tag{5.33}$$

where the implicit constant in $O(1)$ is independent of $N$ (but may depend on $\gamma$). Moreover, for an arbitrary subinterval $0 \le a < b \le 1$ we have the limit formula

$$\lim_{N \to \infty}\frac{\frac{1}{b - a}\int_a^b F(\sqrt{2}; \beta; \gamma; N)\,d\beta}{\log N} = \frac{\gamma}{\sqrt{2}}. \tag{5.34}$$

Formulas (5.33) and (5.34) express the almost trivial geometric fact that the average number of lattice points contained in all the translated copies of a given region (a hyperbola segment in our special case) is precisely the area of the region, see Lemma 5.8. We will give a detailed proof of Lemma 5.2 in Sect. 5.3.

Now we are ready to formulate our first, and weakest, extra large fluctuation result, demonstrating that the fluctuations can be proportional to the area. This result is hardly more than a warm-up for, or simplest illustration of, the main results that will come later.

Theorem 5.3. For $\gamma = 1/2$ there are continuum many “divergence points” $\beta^* \in [0, 1)$ in the sense that

$$\limsup_{n \to \infty}\frac{F(\sqrt{2}; \beta^*; \gamma = 1/2; n)}{\log n} > \liminf_{n \to \infty}\frac{F(\sqrt{2}; \beta^*; \gamma = 1/2; n)}{\log n}. \tag{5.35}$$

Note that the fluctuation $\text{const}\cdot\log n$ in (5.35) is as large as possible apart from a constant factor. This follows from Lemma 5.5 in the next section. It is fair to say that Theorem 5.3 represents a sophisticated violation of the Area Principle.


We postpone the proof of Theorem 5.3 to Sect. 5.3. Note that Theorem 5.3 has a far-reaching generalization: it holds for every $\gamma > 0$, and we actually have the stronger inequality

$$\limsup_{n \to \infty}\frac{F(\sqrt{2}; \beta^*; \gamma; n)}{\log n} > \frac{\gamma}{\sqrt{2}} > \liminf_{n \to \infty}\frac{F(\sqrt{2}; \beta^*; \gamma; n)}{\log n}. \tag{5.36}$$

We will return to the stronger (5.36) later in Sect. 5.4, see Theorem 5.11. Another far-reaching generalization of Theorem 5.3 will be discussed in Sect. 5.9, see Theorem 5.14. Finally, an extra large fluctuation type result for arbitrary point sets (instead of the set $\mathbb{Z}^2$ of lattice points) will be discussed in Sect. 5.9, see Theorem 5.19. We refer to these extra large fluctuation type results as super-irregularity. Sections 5.4–5.9 are all devoted to super-irregularity.

5.2 Randomness and the Area Principle

Equations (5.32) and (5.35) display the two extreme cases: (1) the totally negligible bounded fluctuations around the main value $\text{const}\cdot\log n$ (which is in the range of the area) and (2) the extra large fluctuations proportional to the area (“superirregularity”). But what kind of fluctuations do we have for a typical $0 < \beta < 1$? We show that for a typical $\beta$, the asymptotic number of solutions $F(\sqrt{2}; \beta; \gamma; N)$, as $N \to \infty$, justifies the Area Principle. And beyond that a more thorough look reveals “randomness.”

Talking about randomness, we note that the two most important parameters of a random variable are the expectation (or mean value) and the variance. By (5.33)

$$\text{expectation} = \int_0^1 F(\sqrt{2}; \beta; \gamma; N)\,d\beta = \frac{\gamma}{\sqrt{2}}\log N + O(1).$$

Explaining why the natural scaling is exponential. Note that for any $1 < M < N$, the counting function is “slowly changing” in the following sense:

$$F(\sqrt{2}; \beta; \gamma; N) - F(\sqrt{2}; \beta; \gamma; M) = O(\log(N/M)), \tag{5.37}$$

where $\text{const}\cdot\log(N/M)$ is the corresponding area. The geometric reason behind this is the exponentially sparse occurrence of the lattice points in the corresponding long and narrow tilted hyperbola. The proof of (5.37) is a straightforward application of Lemma 5.5.

We have the following corollary of (5.37). If $M = cN$, i.e., $n$ runs in $cN < n < N$ with some constant $0 < c < 1$, then the fluctuation of $F(\sqrt{2}; \beta; \gamma; N)$ is a trivial $O(1)$. This negligible constant size change $O(1)$ in (5.37), as $n$ runs in $cN < n < N$, explains why it is more natural to switch to the exponential scaling $F(\sqrt{2}; \beta; \gamma; e^N)$. In the rest of this discussion we will often prefer the exponential scaling.



The variance comes from the following result: for any $\gamma > 0$ there is a positive effective constant $\sigma = \sigma(\gamma) > 0$ such that

$$\lim_{N \to \infty}\frac{1}{N}\int_0^1\Big(F(\sqrt{2}; \beta; \gamma; e^N) - \frac{\gamma}{\sqrt{2}}N\Big)^2 d\beta = \sigma^2(\gamma).$$

The proof of this limit formula is far from easy: it is based on a combination of Fourier analysis (Poisson's summation formula, Parseval's formula) and the arithmetic of the quadratic number field $\mathbb{Q}(\sqrt{2})$.

The first probabilistic result, nicely fitting our general scheme of “determinism vs. randomness,” is the following.

Theorem 5.4 (“central limit theorem”). The renormalized counting function

$$\frac{F(\sqrt{2}; \beta; \gamma; e^N) - \frac{\gamma}{\sqrt{2}}N}{\sigma(\gamma)\sqrt{N}}, \quad 0 \le \beta < 1,$$

has a standard normal limit distribution with error term $O(N^{-1/4}(\log N)^3)$ as $N \to \infty$. Formally,

$$\max_{\lambda}\,\Big|\,\text{meas}\Big\{\beta \in [0, 1) :\ \frac{F(\sqrt{2}; \beta; \gamma; e^N) - \frac{\gamma}{\sqrt{2}}N}{\sigma(\gamma)\sqrt{N}} \le \lambda\Big\} - \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\lambda} e^{-u^2/2}\,du\,\Big| = O\big(N^{-1/4}(\log N)^3\big),$$

where the maximum is taken over all $-\infty < \lambda < \infty$.

To give at least a very vague intuition behind Theorem 5.4, we write

$$G_j(\beta) = F(\sqrt{2}; \beta; \gamma; e^j) - F(\sqrt{2}; \beta; \gamma; e^{j-1}), \quad j = 1, 2, \ldots, N.$$

That is, $G_j(\beta)$ is the number of integral solutions $n \in \mathbb{N}$ of (5.31) satisfying $e^{j-1} < n \le e^j$. Note that $G_j(\beta)$ is a bounded function. This follows from Lemma 5.5, and from the obvious geometric fact that any short hyperbola segment, corresponding to $G_j$, is “basically a rectangle.” More precisely, any short hyperbola segment, corresponding to $G_j$, can be approximated by an inscribed rectangle $R_1$ of slope $1/\sqrt{2}$ and a circumscribed rectangle $R_2$ of slope $1/\sqrt{2}$ such that the ratio of the two areas is uniformly bounded by an absolute constant. It is time now to formulate

Lemma 5.5. Every tilted rectangle of slope $1/\sqrt{2}$ and area $1/5$ contains at most one lattice point.



We postpone the proof of this simple but important result to the next section. Lemma 5.5 can be easily generalized: the same proof gives that for any quadratic irrational $\alpha$ there is a positive constant $c_0 = c_0(\alpha) > 0$ such that every tilted rectangle of slope $\alpha$ and area $c_0$ contains at most one lattice point.

Our key intuition is that the bounded function $G_j(\beta)$ resembles the $j$th Rademacher function, so the sum

$$F(\sqrt{2};\beta;\gamma;e^N) - \frac{\gamma}{\sqrt{2}}N = \sum_{j=1}^{N}\Bigl(G_j(\beta) - \frac{\gamma}{\sqrt{2}}\Bigr),$$

as a function of $\beta \in [0,1)$, behaves like a sum of $N$ independent Bernoulli variables (“$N$-step random walk”):

$$F(\sqrt{2};\beta;\gamma;e^N) - \frac{\gamma}{\sqrt{2}}N \approx \pm 1 \pm 1 \pm \cdots \pm 1\quad (N \text{ terms}). \tag{5.38}$$

Our next result, Theorem 5.6, can be interpreted as an analog of the famous law of the iterated logarithm in probability theory. We show that the number of solutions $F(\sqrt{2};\beta;\gamma;e^n)$ of (5.31) oscillates between the sharp bounds ($\varepsilon > 0$)

$$\frac{\gamma}{\sqrt{2}}n - \sigma\sqrt{n}\sqrt{(2+\varepsilon)\log\log n} < F(\sqrt{2};\beta;\gamma;e^n) < \frac{\gamma}{\sqrt{2}}n + \sigma\sqrt{n}\sqrt{(2+\varepsilon)\log\log n} \tag{5.39}$$

as $n \to \infty$ for almost every $\beta$. Note that $\sigma = \sigma(\gamma) > 0$ is the same as in Theorem 5.4, and (5.39) fails for $2 - \varepsilon$ instead of $2 + \varepsilon$ (where $\varepsilon > 0$). Here the main term $\frac{\gamma}{\sqrt{2}}n$ means the “area,” so (5.39) can be considered as a highly sophisticated justification of the Area Principle.

Equation (5.39) is particularly interesting in view of the fact that the classical Circle Problem is unsolved, and seems to be hopeless by the current techniques. As we pointed out in Sect. 5.1, we do not know the true order of the fluctuations of the error term $Z(\mathbf{c};R) - \pi R^2$, $R \to \infty$, for any fixed center $\mathbf{c}$. What (5.39) means is that we can solve a Hyperbola Problem instead of the Circle Problem. More precisely, we can solve the hyperbola version of the Circle Problem at least for almost every “center.” We show that, for almost every “center” (i.e., for almost every value of the translation parameter $\beta$), the number of lattice points asymptotically equals the area plus an error, which even in the worst case scenario is roughly around the square root of the area. (For circles the corresponding maximum error is conjectured to be roughly around the square root of the circumference, which is the fourth root of the area.)

The law of the iterated logarithm is one of the most famous results in classical probability theory, and it describes the “maximum fluctuation” in the infinite (one-dimensional) random walk. The term infinite random walk refers to an infinite sequence of random Bernoulli trials, where each trial is tossing a fair coin. Of course, “coin tossing” belongs to the physical world; it is not a mathematical concept. But there is a well-known pure mathematical problem, which is considered

“equivalent”: we can study the digit distribution of a typical real number written in the binary form

$$\beta = \frac{b_1}{2} + \frac{b_2}{2^2} + \frac{b_3}{2^3} + \cdots,$$

where each $b_i = 0$ or $1$ (for simplicity assume that $0 < \beta < 1$). The infinite 0-1 sequence

$$b_1 = b_1(\beta),\ b_2 = b_2(\beta),\ b_3 = b_3(\beta),\ \ldots,$$

i.e., the sequence of binary digits of $0 < \beta < 1$, represents an infinite Heads-and-Tails sequence, say, 1 is Heads and 0 is Tails. The sum

$$B_n = B_n(\beta) = b_1 + b_2 + b_3 + \cdots + b_n$$

counts the number of 1s (“Heads”) among the first $n$ binary digits of $0 < \beta < 1$. Borel's classical theorem about normal numbers asserts that

$$\frac{B_n(\beta)}{n} \to \frac{1}{2}\quad\text{for almost all } 0 < \beta < 1.$$

Let $S_n = S_n(\beta)$ denote the corresponding error term

$$S_n = S_n(\beta) = 2B_n(\beta) - n = \text{number of Heads} - \text{number of Tails}.$$

That is, $S_n = S_n(\beta)$ represents the number of Heads minus the number of Tails among the first $n$ random trials (“coin tossings”). A well-known theorem of Khinchin (see [Kh1]) asserts that

$$\limsup_{n\to\infty}\frac{S_n(\beta)}{\sqrt{2n\log\log n}} = 1\quad\text{for almost all } 0 < \beta < 1.$$

Notice that Khinchin's theorem is a far-reaching quantitative improvement on Borel's famous theorem on “normal numbers.” The “long form” of Khinchin's theorem says that, for any $\varepsilon > 0$ and for almost every $\beta$, we have the following two statements:

1. $S_n(\beta) < (1+\varepsilon)\sqrt{2n\log\log n}$ for all sufficiently large values of $n$, and

2. $S_n(\beta) > (1-\varepsilon)\sqrt{2n\log\log n}$ holds for infinitely many values of $n$.
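The coin-tossing picture above can be explored numerically. The following is a minimal sketch, assuming that pseudo-random bits from Python's `random` module are an acceptable stand-in for the binary digits of a “typical” $\beta$; the function names `heads_minus_tails` and `lil_ratio` are hypothetical helpers introduced only for this illustration. No finite simulation can verify an almost-sure limit statement, so the output is only meant to make the normalization $\sqrt{2n\log\log n}$ tangible.

```python
import math
import random

def heads_minus_tails(bits):
    """Return the running values S_n = 2*B_n - n for a 0-1 sequence."""
    s, out = 0, []
    for b in bits:
        s += 1 if b == 1 else -1   # +1 for Heads (digit 1), -1 for Tails (digit 0)
        out.append(s)
    return out

def lil_ratio(s_values, n_min=16):
    """Largest S_n / sqrt(2 n log log n) over n_min <= n <= len(s_values)."""
    best = -math.inf
    for n in range(n_min, len(s_values) + 1):
        norm = math.sqrt(2.0 * n * math.log(math.log(n)))
        best = max(best, s_values[n - 1] / norm)
    return best

rng = random.Random(2014)   # fixed seed, for reproducibility only (an assumption)
bits = [rng.getrandbits(1) for _ in range(200_000)]
S = heads_minus_tails(bits)
print("B_n / n       =", sum(bits) / len(bits))   # Borel: tends to 1/2
print("max LIL ratio =", lil_ratio(S))            # Khinchin: the limsup is 1 almost surely
```

The cutoff `n_min=16` is an arbitrary choice that keeps $\log\log n$ positive and away from zero.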



This strikingly elegant and precise result is the simplest form of the so-called law of the iterated logarithm, usually called Khinchin's form.

About 20 years later Erdős proved the following ultimate convergence–divergence criterion (conjectured by Kolmogorov), which contains Khinchin's form as a simple corollary (see [Ko, Er, Fe3]). Let $\varphi(n)$ be an arbitrary positive increasing function of $n$. Then for almost every $\beta$,

$$S_n(\beta) \le \varphi(n)\sqrt{n}\quad\text{or}\quad S_n(\beta) > \varphi(n)\sqrt{n} \tag{5.40}$$

hold, respectively, for all sufficiently large $n$ or for infinitely many $n$ if and only if the series

$$\sum_{n=1}^{\infty}\frac{\varphi(n)}{n}\,e^{-\varphi^2(n)/2}\quad\text{converges or diverges.} \tag{5.41}$$

Exactly the same holds in the negative direction, for the inequality $S_n(\beta) < -\varphi(n)\sqrt{n}$.

Notice that Khinchin's theorem is a special case of Erdős's theorem with $\varphi(n) = ((2+\varepsilon)\log\log n)^{1/2}$. Indeed, the series (5.41) is convergent or divergent depending on whether we have $\varepsilon > 0$ or $\varepsilon \le 0$.

We can obtain a much more precise result by choosing an arbitrarily large but fixed value of the integer $k \ge 5$ and an arbitrarily small but fixed value of $\varepsilon > 0$, and write

$$\varphi(n) = \varphi_\varepsilon(n) = \bigl(2\log_2 n + 3\log_3 n + 2\log_4 n + \ldots + 2\log_{k-1} n + 2(1+\varepsilon)\log_k n\bigr)^{1/2}. \tag{5.42}$$

Warning: here we use the space-saving notation $\log_2 n = \log\log n$, i.e., it means the iterated logarithm (and not the base 2 logarithm), and in general $\log_k n = \log(\log_{k-1} n)$ denotes the $k$ times iterated logarithm of $n$. With this choice of $\varphi_\varepsilon(n)$,

$$\sum_{n=1}^{\infty}\frac{\varphi_\varepsilon(n)}{n}\,e^{-\varphi_\varepsilon^2(n)/2} \approx \sum_{n}\frac{1}{n\,\log n\,\log_2 n\,\log_3 n\cdots\log_{k-1} n\,(\log_k n)^{1+\varepsilon}},$$

where the last sum is convergent or divergent depending on whether we have $\varepsilon > 0$ or $\varepsilon \le 0$.



This example clearly illustrates the remarkable precision of Erdős's theorem.

Let's return to (5.39). The fact that it is an analog of Khinchin's law of the iterated logarithm suggests the vague intuition that the lattice point counting function $F(\sqrt{2};\beta;\gamma;e^n)$ behaves like a “generalized digit sum” (as $\beta$ runs in $0 < \beta < 1$). What we are going to actually formulate below (see Theorem 5.6) is a refinement of (5.39): we work with the $\varphi_\varepsilon(n)$ defined in (5.42).

Theorem 5.6 (“law of the iterated logarithm”). Let $k \ge 5$ be an integer, and write

$$\varphi_\varepsilon(n) = \bigl(2\log_2 n + 3\log_3 n + 2\log_4 n + \ldots + 2\log_{k-1} n + 2(1+\varepsilon)\log_k n\bigr)^{1/2}, \tag{5.43}$$

where $\varepsilon \ge 0$ is a fixed constant. Choosing $\varepsilon > 0$ in (5.43), for almost every $\beta$,

$$F(\sqrt{2};\beta;\gamma;e^n) \le \frac{\gamma}{\sqrt{2}}n + \sigma\,\varphi_\varepsilon(n)\sqrt{n} \tag{5.44}$$

holds for all sufficiently large integers $n$. On the other hand, choosing $\varepsilon = 0$ in (5.43), for almost every $\beta$,

$$F(\sqrt{2};\beta;\gamma;e^n) > \frac{\gamma}{\sqrt{2}}n + \sigma\,\varphi_0(n)\sqrt{n} \tag{5.45}$$

holds for infinitely many $n$. Exactly the same holds for the negative direction

$$F(\sqrt{2};\beta;\gamma;e^n) \ge \frac{\gamma}{\sqrt{2}}n - \sigma\,\varphi_\varepsilon(n)\sqrt{n}$$

and

$$F(\sqrt{2};\beta;\gamma;e^n) < \frac{\gamma}{\sqrt{2}}n - \sigma\,\varphi_0(n)\sqrt{n},$$

respectively.

Remarks. We could also prove an analog of the Erdős type ultimate convergence–divergence criterion, but the proof of Theorem 5.6 is already very long, and the convergence–divergence criterion would require some (annoyingly long) extra technical discussions.

By Lemma 5.1, $F(\sqrt{2};\beta;c;N) = F(\sqrt{2};\beta;\gamma;N) + O(1)$ as $N \to \infty$, where $c = \gamma/(2\sqrt{2})$. So Lemma 5.1 implies that Theorems 5.4 and 5.6 remain true if $F(\sqrt{2};\beta;\gamma;N)$ is replaced with the number of solutions $F(\sqrt{2};\beta;c;N)$ of the inhomogeneous diophantine inequality (5.30).

Theorem 5.6 emphasizes the dramatic difference between rational $\beta$ and almost every $\beta$. For every rational $\beta$ the counting function has the form

$$F(\sqrt{2};\beta;\gamma;N) = c(\gamma)\log N + O(1)\quad\text{as } N \to \infty \tag{5.46}$$


for all $\gamma > 0$, and it remains true if $\sqrt{2}$ is replaced by any quadratic irrational. This bounded size fluctuation around the main term $c\cdot\log N$ (which is typically not the area, but it is in the range of the area) jumps up considerably for a “typical” $\beta$. By Theorem 5.6 we have square root size fluctuations (square root of the area) around the main term (= area), and this holds for almost every $\beta$ and all $\gamma > 0$.

The bounded size fluctuation around $c\cdot\log N$ in (5.46) follows from a general principle that recurrence implies periodicity in a fixed modulus. We illustrate this general principle in the special case of $\alpha = \sqrt{2}$. Let $\beta = r/s$ be a rational number; then

$$\gamma_1 \le (x+\beta)^2 - 2y^2 \le \gamma_2 \iff \gamma_1 s^2 \le (xs+r)^2 - 2(ys)^2 \le \gamma_2 s^2,$$

so it suffices to study the asymptotic number of integral solutions $u, v \in \mathbb{Z}$ of the quadratic equation

$$u^2 - 2v^2 = m\quad\text{where } \gamma_1 s^2 \le m \le \gamma_2 s^2,\ u \equiv r \pmod{s},\ v \equiv 0 \pmod{s}.$$

Every solution of $u^2 - 2v^2 = m$ has the form of either $u_{2k} + v_{2k}\sqrt{2} = (1+\sqrt{2})^{2k}(a + b\sqrt{2})$ where $a + b\sqrt{2}$ is a fundamental solution of $x^2 - 2y^2 = m$, or $u_{2k+1} + v_{2k+1}\sqrt{2} = (1+\sqrt{2})^{2k+1}(a + b\sqrt{2})$ where $a + b\sqrt{2}$ is a fundamental solution of $x^2 - 2y^2 = -m$ (there are only a finite number of fundamental solutions). To prove (5.46) it suffices to show that the sequence of vectors $(u_{2k}, v_{2k})$, $k \in \mathbb{Z}$ is periodic modulo $s$, and similarly the sequence of vectors $(u_{2k+1}, v_{2k+1})$, $k \in \mathbb{Z}$ is periodic modulo $s$.

We start with the even case $(u_{2k}, v_{2k})$, $k \in \mathbb{Z}$. We have

$$u_{2k} + v_{2k}\sqrt{2} = (1+\sqrt{2})^{2k}(a + b\sqrt{2}) = (p_k + q_k\sqrt{2})(a + b\sqrt{2}),$$

which implies that

$$u_{2k} = a\,p_k + 2b\,q_k\quad\text{and}\quad v_{2k} = a\,q_k + b\,p_k,$$

so it suffices to show that the sequence of vectors $(p_k, q_k)$, $k \in \mathbb{Z}$ is periodic modulo $s$. Since

$$p_{k+1} + q_{k+1}\sqrt{2} = (p_k + q_k\sqrt{2})(3 + 2\sqrt{2}),$$

we have the recurrence formula

$$p_{k+1} = 3p_k + 4q_k\quad\text{and}\quad q_{k+1} = 2p_k + 3q_k.$$

The Pigeonhole Principle implies that there are two integers $1 \le \ell_1 < \ell_2 \le s^2 + 1$ such that the vectors $(p_{\ell_1}, q_{\ell_1})$ and $(p_{\ell_2}, q_{\ell_2})$ are the same modulo $s$.
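The pigeonhole step, and the periodicity it leads to, can be checked numerically. The following is a minimal sketch that iterates the recurrence $p_{k+1} = 3p_k + 4q_k$, $q_{k+1} = 2p_k + 3q_k$ modulo $s$, starting from $(p_1, q_1) = (3, 2)$, and reports the first repetition. The function name `period_mod` is a hypothetical helper, and the sample moduli are arbitrary choices.

```python
def period_mod(s):
    """Find 1 <= l1 < l2 <= s*s + 1 with (p_l1, q_l1) == (p_l2, q_l2) modulo s.

    Here p_k + q_k*sqrt(2) = (3 + 2*sqrt(2))**k, so that
    p_{k+1} = 3 p_k + 4 q_k and q_{k+1} = 2 p_k + 3 q_k.
    """
    p, q = 3 % s, 2 % s            # (p_1, q_1) reduced modulo s
    first_seen = {}
    for k in range(1, s * s + 2):  # s*s + 1 vectors, but only s*s residue pairs
        if (p, q) in first_seen:
            return first_seen[(p, q)], k
        first_seen[(p, q)] = k
        p, q = (3 * p + 4 * q) % s, (2 * p + 3 * q) % s
    raise AssertionError("unreachable by the Pigeonhole Principle")

for s in (2, 3, 5, 7, 12):
    l1, l2 = period_mod(s)
    print(f"s = {s}: repetition at l1 = {l1}, l2 = {l2}, period {l2 - l1}")
```

Because the recurrence matrix has determinant $3\cdot 3 - 4\cdot 2 = 1$, it is invertible modulo every $s$, which is why a single repetition forces periodicity in both directions of $k$.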
Combining this fact with the recurrence formula, we conclude that the sequence of vectors $(p_k, q_k)$, $k \in \mathbb{Z}$ is periodic modulo $s$ with period $\ell_2 - \ell_1$. This settles the even case $(u_{2k}, v_{2k})$, $k \in \mathbb{Z}$. The odd case $(u_{2k+1}, v_{2k+1})$, $k \in \mathbb{Z}$ goes similarly.

Next we focus on a simple consequence of Theorem 5.6. Let $c > 0$ be arbitrarily small but fixed, then by Theorem 5.6 the inhomogeneous diophantine inequality

$$\|n\sqrt{2} - \beta\| < \frac{c}{n} \tag{5.47}$$

has infinitely many integer solutions n 1 for almost every ˇ (in the sense of the Lebesgue measure). Inequality (5.47) corresponds to the hyperbola segment (ˇ is fixed): c ; x 1; x

jy ˇj
1=2, there is an interval I2 D Œa; b with some 0 < a < b < 1 (a and b are generic numbers) such that ˇ2 2 I2 and F .ˇI D 1=2I N2 /

1 log N3 : 8

Since $1/4 < 1/2$, there is an interval $I_3 = [a, b]$ with $0 < a < b < 1$ such that $\beta_3 \in I_3$ and

$$F(\beta;\gamma = 1/2;N_3) > \frac{1}{8}\log N_3\quad\text{for all } \beta \in I_3. \tag{5.83}$$

We can clearly assume that $I_3$ is a proper subinterval of $I(0)$. Write $I(0,0) = I_3$. Similarly, there is another subinterval $I(0,1)$ such that $I(0,0) \cup I(0,1) \subset I(0)$, $I(0,0)$ and $I(0,1)$ are disjoint, and

$$F(\beta;\gamma = 1/2;N_3^{(1)}) > \frac{1}{8}\log N_3^{(1)}\quad\text{for all } \beta \in I(0,1). \tag{5.84}$$

There are similar disjoint subintervals $I(1,0)$ and $I(1,1)$ of $I(1)$.

Next let $\mathbf{n} = (n_1, n_2) \in \mathbb{Z}^2$ be a lattice point such that $\beta_4 = n_1 - n_2\sqrt{2} \in I(0,0)$. Since the equation $|x^2 - 2y^2| \le 3/4$ does not have a non-trivial integral solution, $F(\beta_4;\gamma = 3/4;N)$
N3 . Since 3=4 > 1=2, there is an interval I4 D Œa; b with 0 < a < b < 1 such that ˇ4 2 I4 and F .ˇI D 1=2I N4 /
p > lim inf ; lim sup n!1 log n log n n!1 2

(5.87)

where $\frac{\gamma}{\sqrt{2}}\log n + O(1)$ is the area of the corresponding hyperbolic region. What is more, (5.87) holds for continuum many “divergence points” $\beta = \beta(\gamma) \in [0,1)$.

The proof of Theorem 5.3 was based on an elementary argument that we may call the “method of nested intervals.” To prove (5.87) we need a new idea: we apply a more sophisticated “Riesz product argument.” The Riesz product is a powerful tool in Fourier analysis. A typical application is to prove large fluctuations for lacunary trigonometric series. To compare the “method of nested intervals” to the method of Riesz product, we give a simple illustration, see Facts 1 and 2 below. (Fact 2 is actually a well-known theorem of S. Sidon.)

Consider a finite cosine sum

$$F(x) = \sum_{j=1}^{N} a_j\cos(2\pi n_j x),\quad\text{where } a_j = \pm 1 \text{ for all } 1 \le j \le N, \tag{5.88}$$

and $1 \le n_1 < n_2 < \ldots < n_N$ are integers. We study the following question: What can we say about $\max_{0 \le x \le 1} F(x)$? Under different extra conditions we have different results. We begin with

Fact 1. If the strong gap condition $n_{j+1}/n_j \ge 8$ holds for every $1 \le j \le N-1$, then

$$\max_{0 \le x \le 1} F(x) \ge \frac{N}{2}.$$

The proof of Fact 1 is almost trivial. Let

$$J_1 = \{x \in [0,1] : \cos(2\pi n_1 x) \text{ falls between } a_1/2 \text{ and } a_1\}.$$

Since $a_1 = \pm 1$, the set $J_1$ contains a closed subinterval $I_1$ of length

$$|I_1| \ge \frac{1}{4n_1}.$$

Next let

$$J_2 = \{x \in I_1 : \cos(2\pi n_2 x) \text{ falls between } a_2/2 \text{ and } a_2\}.$$

Since $a_2 = \pm 1$, the set $J_2$ contains a closed subinterval $I_2$ of length

$$|I_2| \ge \frac{1}{4n_2}.$$


Next let

$$J_3 = \{x \in I_2 : \cos(2\pi n_3 x) \text{ falls between } a_3/2 \text{ and } a_3\},$$

and so on. At the end of this process, we obtain a nested sequence of closed intervals $[0,1] \supset I_1 \supset I_2 \supset \cdots \supset I_N$ such that $a_k\cos(2\pi n_k x) \ge 1/2$ for all $x \in I_k$, $k = 1, 2, \ldots, N$. Then we clearly have

$$x \in I_N \implies F(x) \ge \frac{N}{2},$$

proving Fact 1. This was a typical application of the “method of nested intervals.”

Next comes the “Riesz product argument.” The problem that we study is the following: What happens if the strong gap condition $n_{j+1}/n_j \ge 8$ is replaced by the weaker $n_{j+1}/n_j \ge 1 + \varepsilon > 1$, where $\varepsilon > 0$ is an arbitrarily small but fixed constant? Can we still prove a linear lower bound like $\max_{0 \le x \le 1} F(x) \ge c\cdot N$ with some constant $c = c(\varepsilon) > 0$ depending only on the value of $\varepsilon$? Unfortunately, the “method of nested intervals” hopelessly collapses, and we need a new approach: it is exactly the “Riesz product argument.” The following result, a well-known theorem of Sidon in Fourier analysis, is much deeper than Fact 1.

Fact 2 (Sidon's theorem). If the weak gap condition

$$\frac{n_{j+1}}{n_j} \ge 1 + \varepsilon > 1 \tag{5.89}$$

holds for every $1 \le j \le N-1$, where $0 < \varepsilon < 1/2$ is a fixed constant, then for $F(x)$ defined in (5.88) we have

$$\max_{0 \le x \le 1} F(x) \ge c\cdot N\quad\text{with } c = \frac{\varepsilon}{4}\cdot\frac{1}{\log\frac{2}{\varepsilon}}.$$

To prove Sidon's theorem, we define a Riesz product as follows. Let $1 = i(1) < i(2) < \ldots < i(M)$ be a subsequence of $1, 2, 3, \ldots, N$ such that

$$\frac{n_{i(j+1)}}{n_{i(j)}} \ge \frac{2}{\varepsilon}\quad\text{holds for all } j = 1, 2, \ldots, M-1, \tag{5.90}$$

and consider the product

$$R(x) = \prod_{j=1}^{M}\bigl(1 + a_{i(j)}\cos(2\pi n_{i(j)} x)\bigr). \tag{5.91}$$



Since $a_{i(j)} = \pm 1$, the Riesz product $R(x)$ is obviously nonnegative, i.e., $R(x) \ge 0$. The Riesz product $R(x)$ is used as a test function: first we evaluate the integral

$$\int_0^1 F(x)R(x)\,dx = \sum_{j=1}^{M} a_{i(j)}^2\int_0^1\cos^2(2\pi n_{i(j)} x)\,dx = \frac{M}{2}. \tag{5.92}$$

Indeed, multiplying out the Riesz product $R(x)$, and using Euler's formula $\cos y = (e^{iy} + e^{-iy})/2$, we obtain terms like

$$a_{i(j_1)}a_{i(j_2)}a_{i(j_3)}\cdots a_{i(j_k)}\,e^{2\pi i(\pm n_{i(j_1)} \pm n_{i(j_2)} \pm n_{i(j_3)} \pm \cdots \pm n_{i(j_k)})x}, \tag{5.93}$$

where we call (5.93) a product of length $k \ge 1$. We distinguish two cases.

Case 1: $k = 1$ (“short products”). Multiplying the corresponding terms with $F(x)$ and integrating from 0 to 1, we obtain

$$\sum_{j=1}^{M} a_{i(j)}^2\int_0^1\cos^2(2\pi n_{i(j)} x)\,dx = \frac{M}{2},$$

which is exactly (5.92). Next assume

Case 2: $k \ge 2$ (“long products”). We can clearly write $1 \le j_1 < j_2 < \ldots < j_k$; then by using the elementary inequalities

$$1 + \frac{\varepsilon}{2} + \Bigl(\frac{\varepsilon}{2}\Bigr)^2 + \Bigl(\frac{\varepsilon}{2}\Bigr)^3 + \cdots < 1 + \varepsilon\quad\text{and}\quad 1 - \frac{\varepsilon}{2} - \Bigl(\frac{\varepsilon}{2}\Bigr)^2 - \Bigl(\frac{\varepsilon}{2}\Bigr)^3 - \cdots > \frac{1}{1+\varepsilon}$$

if $0 < \varepsilon < 1/2$, we obtain that $|\pm n_{i(j_1)} \pm n_{i(j_2)} \pm n_{i(j_3)} \pm \cdots \pm n_{i(j_k)}|$ falls between

$$(1+\varepsilon)\,n_{i(j_k)}\quad\text{and}\quad\frac{1}{1+\varepsilon}\,n_{i(j_k)},$$

and it differs from $n_{i(j_k)}$ itself, because by (5.90) the lower-order terms cannot cancel each other. Comparing this to the gap condition (5.89), we see that $F(x)$ and the “long products” of $R(x)$ represent disjoint sets of exponential functions $e^{2\pi i\ell x}$, $\ell \in \mathbb{Z}$, and using the orthogonality of these functions, the contribution of Case 2 in $\int_0^1 F(x)R(x)\,dx$ is zero. This proves (5.92).



The same argument shows that

$$\int_0^1 R(x)\,dx = 1. \tag{5.94}$$

Since $R(x) \ge 0$, (5.94) means that the integral $\int_0^1 F(x)R(x)\,dx$ is a “weighted average” of $F(x)$ (with nonnegative weights). So by (5.92),

$$\max_{0 \le x \le 1} F(x) \ge \int_0^1 F(x)R(x)\,dx = \frac{M}{2}. \tag{5.95}$$

The inequality

$$(1+\varepsilon)^r > \frac{2}{\varepsilon}\quad\text{clearly holds with } r = \frac{2}{\varepsilon}\log\frac{2}{\varepsilon};$$

thus by (5.89) and (5.90) we can choose

$$M \ge \frac{N}{r} = \frac{\varepsilon}{2}\cdot\frac{N}{\log\frac{2}{\varepsilon}}, \tag{5.96}$$

and (5.95), (5.96) complete the proof of Sidon's theorem. □
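The two integral identities (5.92) and (5.94) behind Sidon's theorem can be checked numerically on a small example. The following is a minimal sketch under stated assumptions: the value $\varepsilon = 2/5$, the frequencies $n_{j+1} = \lceil 7n_j/5\rceil$, the sign pattern, and the grid size `K` are all arbitrary illustrative choices, not taken from the text. Since $F$ and $R$ are trigonometric polynomials, averaging over a uniform grid with more points than the largest frequency reproduces the integrals up to rounding.

```python
import math

# Illustrative parameters (assumptions, not from the text): eps = 2/5, so that
# 1 + eps = 7/5 and 2/eps = 5; integer arithmetic avoids rounding issues.
freqs = [1]
while len(freqs) < 12:
    freqs.append(-(-7 * freqs[-1] // 5))   # n_{j+1} = ceil(7 n_j / 5) >= (1 + eps) n_j
signs = [1 if j % 3 else -1 for j in range(len(freqs))]   # arbitrary a_j = +1 or -1

# Greedy subsequence i(1) < i(2) < ... with n_{i(j+1)} >= (2/eps) n_{i(j)} = 5 n_{i(j)}
chosen = [0]
for j in range(1, len(freqs)):
    if freqs[j] >= 5 * freqs[chosen[-1]]:
        chosen.append(j)
M = len(chosen)

def F(x):
    """The cosine sum (5.88)."""
    return sum(a * math.cos(2 * math.pi * n * x) for a, n in zip(signs, freqs))

def R(x):
    """The Riesz product (5.91) over the chosen subsequence."""
    prod = 1.0
    for j in chosen:
        prod *= 1.0 + signs[j] * math.cos(2 * math.pi * freqs[j] * x)
    return prod

# K exceeds every frequency occurring in F * R, so grid averages equal the integrals.
K = 1024
grid = [m / K for m in range(K)]
mean_R = sum(R(x) for x in grid) / K            # compare with (5.94)
mean_FR = sum(F(x) * R(x) for x in grid) / K    # compare with (5.92), i.e. M / 2
max_F = max(F(x) for x in grid)                 # compare with (5.95)
print(M, mean_R, mean_FR, max_F)
```

The greedy choice of the subsequence is one simple way to satisfy (5.90); any subsequence with the same gap property would serve equally well in the argument.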

5.4.2 The “Rectangle Property”, and a Key Result: Theorem 5.11

Let's return now to Theorem 5.3 and (5.87). We restate Theorem 5.3 in a slightly different form. We recall the notation in (5.70):

$$H_N(\sqrt{2};\gamma) = \Bigl\{(x,y) \in \mathbb{R}^2 :\ -\gamma \le x^2 - 2y^2 \le \gamma \text{ where } 1 \le x + y\sqrt{2} \le 2\sqrt{2}N\Bigr\}, \tag{5.97}$$

that is, $H_N(\sqrt{2};\gamma)$ is a long, narrow, tilted hyperbolic needle of slope $1/\sqrt{2}$. Its area is $\frac{\gamma}{\sqrt{2}}\log N + O(1)$, see (5.71). Theorem 5.3 states, roughly speaking, that, in the special case $\gamma = 1/2$, there are two translated copies of the same tilted hyperbolic needle $H_N(\sqrt{2};\gamma = 1/2)$ such that one is substantially richer in lattice points than the other: the discrepancy is proportional to the area (“extra large deviation”). More precisely, there is a positive absolute constant $c > 0$ such that, for infinitely many integers $N = N_i$ (where $N_i \to \infty$), there are two translated copies $\mathbf{x}_1^{(i)} + H_N(\sqrt{2};\gamma)$ and $\mathbf{x}_2^{(i)} + H_N(\sqrt{2};\gamma)$ of the tilted hyperbolic needle $H_N(\sqrt{2};\gamma = 1/2)$ such that

$$\Bigl|\,|\mathbb{Z}^2 \cap (\mathbf{x}_1^{(i)} + H_N(\sqrt{2};\gamma = 1/2))| - |\mathbb{Z}^2 \cap (\mathbf{x}_2^{(i)} + H_N(\sqrt{2};\gamma = 1/2))|\,\Bigr| > c\log N = c\log N_i. \tag{5.98}$$

Because of the periodicity of the lattice points, we can clearly assume that the pairs $\mathbf{x}_1^{(i)}$, $\mathbf{x}_2^{(i)}$ of vectors are all in the unit square $[0,1)^2$ ($i \to \infty$).

The extra large deviation result (5.98), which is equivalent to Theorem 5.3, can be generalized in several stages. The first generalization is (5.87), or at least an equivalent form, as follows.

Proposition 5.9. Let $\gamma > 0$ be an arbitrary but fixed real number and let $N \ge 2$ be an integer. We study the number of lattice points in the translated copies of the tilted hyperbolic needle $H_N(\sqrt{2};\gamma)$ of area $\frac{\gamma}{\sqrt{2}}\log N + O(1)$. There is a translated copy $\mathbf{x}_1 + H_N(\sqrt{2};\gamma)$ such that

$$|\mathbb{Z}^2 \cap (\mathbf{x}_1 + H_N(\sqrt{2};\gamma))| > \frac{\gamma}{\sqrt{2}}\log N + \delta'\log N, \tag{5.99′}$$

where $\delta' = \delta'(\gamma) > 0$ is a positive constant, independent of $N$. Similarly, there is another translated copy $\mathbf{x}_2 + H_N(\sqrt{2};\gamma)$ such that

$$|\mathbb{Z}^2 \cap (\mathbf{x}_2 + H_N(\sqrt{2};\gamma))| < \frac{\gamma}{\sqrt{2}}\log N - \delta'\log N, \tag{5.99″}$$

where $\delta' = \delta'(\gamma) > 0$ is the same positive constant as in (5.99′).

Note that Proposition 5.9 immediately gives the existence of a single “divergence point” $\beta = \beta(\gamma) \in [0,1)$ in (5.87). To prove continuum many “divergence points” $\beta = \beta(\gamma) \in [0,1)$, we just have to combine Proposition 5.9 with the routine Cantor set argument in the proof of Theorem 5.3.

Next comes the second stage of generalization: we replace the set $\mathbb{Z}^2$ of lattice points in the plane with an arbitrary subset $A \subset \mathbb{Z}^2$ of positive density. Here is an illustration. We say that a lattice point $\mathbf{n} = (n_1, n_2) \in \mathbb{Z}^2$ is coprime (or visible) if the coordinates $n_1$ and $n_2$ are relatively prime. The alternative name visible is explained by the geometric fact that if a lattice point $\mathbf{n} = (n_1, n_2) \in \mathbb{Z}^2$ is not coprime, then $\mathbf{n}$ is not visible from the origin (since $\mathbf{n}$ is “behind the back” of $(n_1/d, n_2/d) \in \mathbb{Z}^2$, where $d \ge 2$ is the greatest common divisor of $n_1$ and $n_2$, i.e., $(n_1/d, n_2/d)$ is between $(0,0)$ and $(n_1, n_2)$). Let $\mathbb{Z}^2_{\text{coprime}}$ denote the set of coprime lattice points in the plane. It is well known from number theory that $\mathbb{Z}^2_{\text{coprime}}$ is a positive density subset of $\mathbb{Z}^2$, and the density is $6/\pi^2$.

Now let $A$ be an arbitrary subset of $\mathbb{Z}^2$ of positive density $\delta = \delta(A) > 0$. There is a natural generalization of Proposition 5.9 where we replace $\mathbb{Z}^2$ with $A$; the price that we pay is that, due to the lack of periodicity of a general subset $A$, the translations are not necessarily in the unit square anymore.
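The density $6/\pi^2$ of the coprime lattice points is easy to observe by direct counting. The following is a minimal sketch, assuming that it suffices to count in the positive quadrant $1 \le a, b \le n$ (by symmetry the other quadrants behave the same way); the function name `coprime_fraction` and the sample sizes are hypothetical choices made for this illustration.

```python
import math

def coprime_fraction(n):
    """Fraction of lattice points (a, b) with 1 <= a, b <= n and gcd(a, b) = 1."""
    count = sum(1 for a in range(1, n + 1) for b in range(1, n + 1)
                if math.gcd(a, b) == 1)
    return count / (n * n)

for n in (10, 100, 400):
    # The second printed number is the limiting density 6 / pi^2.
    print(n, coprime_fraction(n), 6 / math.pi ** 2)
```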


Proposition 5.10. Let $A \subset \mathbb{Z}^2$ be an arbitrary subset of positive density $\delta = \delta(A) > 0$. Let $\gamma > 0$ be an arbitrary but fixed real number and let $N \ge 2$ be an integer. We study the number of elements of $A$ in the translated copies of the tilted hyperbolic needle $H_N(\sqrt{2};\gamma)$ of area $\frac{\gamma}{\sqrt{2}}\log N + O(1)$. We restrict our attention to the translated copies inside a square $[0,M]^2$. Assume that $M/N$ is sufficiently large depending only on $\gamma$ and $\delta$. Then there is a translated copy $\mathbf{x}_1 + H_N(\sqrt{2};\gamma)$ such that $\mathbf{x}_1 + H_N(\sqrt{2};\gamma) \subset [0,M]^2$ and

$$|A \cap (\mathbf{x}_1 + H_N(\sqrt{2};\gamma))| > \delta\frac{\gamma}{\sqrt{2}}\log N + \delta'\log N, \tag{5.100′}$$

where $\delta' = \delta'(\gamma, \delta) > 0$ is a positive constant, independent of $N$ and $M$. Similarly, there is another translated copy $\mathbf{x}_2 + H_N(\sqrt{2};\gamma)$ such that $\mathbf{x}_2 + H_N(\sqrt{2};\gamma) \subset [0,M]^2$ and

$$|A \cap (\mathbf{x}_2 + H_N(\sqrt{2};\gamma))| < \delta\frac{\gamma}{\sqrt{2}}\log N - \delta'\log N, \tag{5.100″}$$

where $\delta' = \delta'(\gamma, \delta) > 0$ is the same positive constant as in (5.100′).

It turns out that the only relevant property of a lattice point set $A \subset \mathbb{Z}^2$ that we really use in the proof of Proposition 5.10 is the “rectangle property” in Lemma 5.5: every tilted rectangle of slope $1/\sqrt{2}$ and area $1/5$ contains at most one lattice point. (Of course, the concrete value $1/5$ of the constant is secondary.) The third stage of generalization goes far beyond the family of lattice point sets $A \subset \mathbb{Z}^2$: the only requirement is that the point set satisfies the “rectangle property.”

Theorem 5.11. Let $\mathcal{P}$ be a finite set of points in the square $[0,M]^2$ with density $\delta$, i.e., the number of elements of $\mathcal{P}$ is $|\mathcal{P}| = \delta\cdot M^2$. We study the number of elements of $\mathcal{P}$ in the translated copies of the tilted hyperbolic needle $H_N(\sqrt{2};\gamma)$ of area $\frac{\gamma}{\sqrt{2}}\log N + O(1)$. We restrict our attention to the translated copies inside a square $[0,M]^2$. Assume that $\mathcal{P}$ satisfies the following “rectangle property”: there is a positive constant $c_1 = c_1(\mathcal{P}) > 0$ such that every tilted rectangle of slope $1/\sqrt{2}$ and of area $c_1$ contains at most one element of the set $\mathcal{P}$. Furthermore, assume that both $N$ and $M/N$ are “large” in the precise sense of (5.102).

Then there is a translated copy $\mathbf{x}_1 + H_N(\sqrt{2};\gamma)$ such that $\mathbf{x}_1 + H_N(\sqrt{2};\gamma) \subset [0,M]^2$ and

$$|\mathcal{P} \cap (\mathbf{x}_1 + H_N(\sqrt{2};\gamma))| > \delta\frac{\gamma}{\sqrt{2}}\log N + \delta'\log N, \tag{5.101′}$$

where $\delta' = \delta'(c_1, \gamma, \delta) > 0$ is a positive constant, independent of $N$ and $M$.

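Similarly, there is another translated copy $\mathbf{x}_2 + H_N(\sqrt{2};\gamma)$ such that $\mathbf{x}_2 + H_N(\sqrt{2};\gamma) \subset [0,M]^2$ and

$$|\mathcal{P} \cap (\mathbf{x}_2 + H_N(\sqrt{2};\gamma))| < \delta\frac{\gamma}{\sqrt{2}}\log N - \delta'\log N \tag{5.101″}$$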

with the same ı 0 D ı 0 .c1 ; ; ı/ > 0 as in (5.1010); namely, 107 c1 107 c12 p 0 0 12 ı D ı .c1 ; ; ı/ D 10 ı min : ; c1 ; ; 20 2 2 Finally, the assumption that both N and M=N are “large” goes as follows: N 2

10 C 1

;

1 N < 2n N; 2

C 1 .N C 2 / n p o: M > 1011 7 107 c 2 ı c1 min 20 ; c1 ; 10 2 c1 ; 2 1

(5.102)

Note that Propositions 1.15 and 1.16 are special cases of Theorem 5.11 (with $\mathcal{P} = \mathbb{Z}^2$ and $\mathcal{P} = A$). Unfortunately, the proof of Theorem 5.11 is rather difficult and long. The complicated details cover the next four sections. But the main idea is quite simple: it is basically a sophisticated application of the Riesz product.

5.5 Starting the Proof of Theorem 5.11 Using Riesz Product

Since the proof is long and complicated, a convenient notation makes a big difference. It is much simpler for us to work with hyperbolic regions in the usual horizontal–vertical position (instead of the tilted position). It means that, instead of working with the set $\mathbb{Z}^2$ of lattice points in the plane and the family of tilted hyperbolic needles of a fixed (quadratic) irrational slope (i.e., the setup of Theorem 5.11), we rotate back. In other words, we rotate $\mathbb{Z}^2$ by a (quadratic) irrational slope, and consider the family of hyperbolic needles in the usual horizontal–vertical position.

Let $\gamma > 0$ be an arbitrary real number and let $N \ge 2$ be a (large) integer: let $H_\gamma(N)$ denote the hyperbolic region $-\gamma \le xy \le \gamma$, $1 \le x \le N$, see Fig. 5.2. Notice that $H_\gamma(N)$ is basically the horizontal–vertical version of the tilted hyperbolic needle $H_N(\sqrt{2};\gamma)$ [see (5.70) or (5.97)]. To emphasize the difference between the tilted and the horizontal–vertical versions, we made a major change in the notation: notice that we switched the location of the parameters $\gamma$ and $N$. Again we refer to $H_\gamma(N)$ as a “hyperbolic needle.” The area of $H_\gamma(N)$ equals the integral


Fig. 5.2

$$\text{area}(H_\gamma(N)) = 2\int_1^N\frac{\gamma\,dx}{x} = 2\gamma\log N. \tag{5.103}$$

Let $\text{rot}_\alpha\mathbb{Z}^2$ denote the rotated copy of $\mathbb{Z}^2$ by the angle $\theta$ where $\tan\theta = \alpha =$ slope (we assume that the origin is the fixpoint of the rotation). If $\alpha$ ($\ne 0$) is a quadratic irrational, then the continued fraction for $\alpha$ is (eventually) periodic. This is a well-known number-theoretic fact; for example, if $\alpha = \sqrt{2}/2$ then

$$\frac{\sqrt{2}}{2} = \frac{1}{\sqrt{2}} = \cfrac{1}{1 + \cfrac{1}{2 + \cfrac{1}{2 + \cdots}}} = [1, 2, 2, 2, \ldots] = [1, \overline{2}].$$

Periodicity implies that the continued fraction “digits” (officially called partial quotients) form a bounded sequence. Boundedness yields (via some well-known elementary facts from the theory of continued fractions) that

$$k\|k\alpha\| \ge c_0 = c_0(\alpha) > 0\quad\text{for all integers } k \ge 1, \tag{5.104}$$

where $c_0 = c_0(\alpha) > 0$ is some positive constant depending only on $\alpha$, and $\|\cdot\|$ denotes the distance from the nearest integer. For example, if $\alpha = \sqrt{2}/2 = 1/\sqrt{2}$ then (5.104) follows from the factorization $x^2 - 2y^2 = (x - y\sqrt{2})(x + y\sqrt{2})$: if $x$ and $y$ are integers then

$$1 \le |x^2 - 2y^2| = |x - y\sqrt{2}|\cdot|x + y\sqrt{2}| = |x\alpha - y|\,\sqrt{2}\,|x + y\sqrt{2}|,$$

and we choose $x = k$, $y =$ the nearest integer to $k\alpha$; this explains why in the special case $\alpha = 1/\sqrt{2}$ the choice $c_0 = 1/4$ in (5.104) works.
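The claim that $c_0 = 1/4$ works in (5.104) for $\alpha = 1/\sqrt{2}$ can be tested over a finite range of $k$. The following is a minimal sketch: it evaluates $\|k\alpha\|$ through the same factorization, using $k/\sqrt{2} - y = (k^2 - 2y^2)/(\sqrt{2}(k + y\sqrt{2}))$ with an exact integer numerator, so no two nearly equal floating-point numbers are subtracted. The function name `dist_to_nearest_integer` and the range $k \le 10^5$ are hypothetical choices for illustration; a finite check is of course not a proof.

```python
import math

SQRT2 = math.sqrt(2.0)

def dist_to_nearest_integer(k):
    """||k / sqrt(2)||, computed from the exact integer k*k - 2*y*y.

    Uses k/sqrt(2) - y = (k*k - 2*y*y) / (sqrt(2) * (k + y*sqrt(2))),
    which avoids subtracting two nearly equal floating-point numbers.
    """
    y0 = math.isqrt(k * k // 2)   # floor(k / sqrt(2))
    return min(abs(k * k - 2 * y * y) / (SQRT2 * (k + y * SQRT2))
               for y in (y0, y0 + 1))

values = {k: k * dist_to_nearest_integer(k) for k in range(1, 100_001)}
k_min = min(values, key=values.get)
print("smallest k * ||k / sqrt(2)|| for k <= 100000:", values[k_min], "at k =", k_min)
```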



Inequality (5.104) has an important geometric interpretation:

every axes-parallel rectangle of area $c_1(\alpha) > 0$ contains at most one element of the rotated copy $\text{rot}_\alpha\mathbb{Z}^2$ of $\mathbb{Z}^2$, (5.105)

where $c_1 = c_1(\alpha) > 0$ is another positive constant depending only on $\alpha$. For example, if $\alpha = \sqrt{2}/2$, then by Lemma 5.5, $c_1 = 1/5$ is a good choice in (5.105). The following statement is just a slight generalization of Theorem 5.11.

Proposition 5.12. Let $\mathcal{P}$ be a finite set of points in the square $[0,M]^2$ with density $\delta$, i.e., the number of elements of $\mathcal{P}$ is $|\mathcal{P}| = \delta\cdot M^2$. We study the number of elements of $\mathcal{P}$ in the translated copies of the hyperbolic needle $H_\gamma(N)$. Assume that $\mathcal{P}$ satisfies the following “rectangle property”: there is a positive constant $c_1 = c_1(\mathcal{P}) > 0$ such that every axes-parallel rectangle of area $c_1$ contains at most one element of the set $\mathcal{P}$. Furthermore, assume that both $N$ and $M/N$ are “large” in the precise sense of (5.107).

Then there is a translated copy $\mathbf{x}_1 + H_\gamma(N)$ of the hyperbolic needle $H_\gamma(N)$ such that $\mathbf{x}_1 + H_\gamma(N) \subset [0,M]^2$ and

$$|\mathcal{P} \cap (\mathbf{x}_1 + H_\gamma(N))| \ge \delta\cdot 2\gamma\log N + \delta'\log N, \tag{5.106′}$$

where $\delta' = \delta'(c_1, \gamma, \delta) > 0$ is a positive constant, independent of $N$ and $M$, to be defined below. Similarly, there is a translated copy $\mathbf{x}_2 + H_\gamma(N)$ of the hyperbolic needle $H_\gamma(N)$ such that $\mathbf{x}_2 + H_\gamma(N) \subset [0,M]^2$ and

$$|\mathcal{P} \cap (\mathbf{x}_2 + H_\gamma(N))| \le \delta\cdot 2\gamma\log N - \delta'\log N \tag{5.106″}$$

with the same ı 0 D ı 0 .c1 ; ; ı/ > 0 as in (5.1060); namely, 0

0

ı D ı .c1 ; ; ı/ D 10

12

107 c1 107 c12 p ı min : ; c1 ; ; 20 2 2

Finally, the assumption that both N and M=N are “large” goes as follows: N 2

10 C 1

;

1 N < 2n N; 2

C 1 .N C 2 / o: M > 1011 n p 7 107 c 2 ı c1 min 20 ; c1 ; 10 2 c1 ; 2 1

(5.107)

Remarks. The term $\delta\cdot 2\gamma\log N$ in (5.106′) and (5.106″) represents the “expectation,” since the set $\mathcal{P}$ has density $\delta$ and the hyperbolic needle $H_\gamma(N)$ has area $2\gamma\log N$. The extra terms $\pm\delta'\log N$ mean that the deviation from the expectation is proportional to the expectation, justifying the name “extra large deviation.”

The constant factors $10^{-12}$ and $10^{11}$ are certainly very far from the best possible. Since the proof is complicated, my primary goal is to present the basic ideas in the simplest form; I don't care too much about optimizing the constant factors.

Proof of Proposition 5.12. Let $f(\mathbf{x})$ denote the point-counting function

$$f(\mathbf{x}) = |\mathcal{P} \cap (\mathbf{x} + H_\gamma(N))|. \tag{5.108}$$

If $\mathbf{x} \in [0, M-N] \times [\gamma, M-\gamma]$ then clearly

$$\mathbf{x} + H_\gamma(N) \subset [0,M]^2. \tag{5.109}$$

This explains why we choose the rectangle $[0, M-N] \times [\gamma, M-\gamma]$ to be our underlying domain in the proof. Let

$$\Delta(\mathbf{x}) = f(\mathbf{x}) - \delta\cdot\text{area}(H_\gamma(N)) = f(\mathbf{x}) - \delta\cdot 2\gamma\log N \tag{5.110}$$

denote the discrepancy function; $\Delta(\mathbf{x})$ deserves its name if (5.109) holds.

In order to show that $\Delta(\mathbf{x}) > \delta'\log N > 0$ holds for some $\mathbf{x} = \mathbf{x}_1$, we apply the “test function method,” initiated by Roth [Ro]. The basic idea of the proof is to construct a positive test function $T(\mathbf{x}) > 0$ such that

$$\frac{1}{(M-N)(M-2\gamma)}\int_0^{M-N}\int_\gamma^{M-\gamma}\Delta(\mathbf{x})T(\mathbf{x})\,d\mathbf{x} > \text{const}\cdot\log N > 0, \tag{5.111}$$

and

$$\frac{1}{(M-N)(M-2\gamma)}\int_0^{M-N}\int_\gamma^{M-\gamma}T(\mathbf{x})\,d\mathbf{x} < \text{const}. \tag{5.112}$$

Combining (5.111) and (5.112) with the general (trivial) inequality

$$\int\Delta(\mathbf{x})T(\mathbf{x})\,d\mathbf{x} \le \max_{\mathbf{x}}\Delta(\mathbf{x})\int T(\mathbf{x})\,d\mathbf{x},$$

which holds for any positive function $T(\mathbf{x}) > 0$, we obtain what we want:

$$\max_{\mathbf{x}}\Delta(\mathbf{x}) > \text{const}\cdot\log N \tag{5.113}$$

with some positive constant $\text{const} > 0$.

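Similarly, to verify the other direction $\Delta(\mathbf{x}) < -\delta'\log N < 0$ for some $\mathbf{x} = \mathbf{x}_2$, we construct a positive test function $T(\mathbf{x}) > 0$ such that

$$\frac{1}{(M-N)(M-2\gamma)}\int_0^{M-N}\int_\gamma^{M-\gamma}\Delta(\mathbf{x})T(\mathbf{x})\,d\mathbf{x} < -\text{const}\cdot\log N < 0, \tag{5.114}$$

and again

$$\frac{1}{(M-N)(M-2\gamma)}\int_0^{M-N}\int_\gamma^{M-\gamma}T(\mathbf{x})\,d\mathbf{x} < \text{const}. \tag{5.115}$$

Clearly (5.114) and (5.115) imply the other direction

$$\min_{\mathbf{x}}\Delta(\mathbf{x}) < -\text{const}\cdot\log N < 0$$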

with some positive constant const> 0. Let’s return to (5.111) and (5.112). We will express the test function T .x/ in terms of “modified Rademacher functions” (sometimes called Haar wavelets)—this is another idea that we borrow from Roth’s pioneering paper [Ro]. The benefit of working with modified Rademacher functions is that we have orthogonality, and what is more, we have “super-orthogonality,” see the key property below. Note that Roth himself simply took the sum of certain “modified Rademacher functions” (and applied the Cauchy–Schwarz inequality, instead of (5.113); for his purposes orthogonality was sufficient). It was Halász’s innovation to express T .x/ as a Riesz product of modified Rademacher functions, see Halász [Ha]. The main point is that the Riesz product takes advantage of the “super-orthogonality.” (Halász used this method, among many other things, to give an elegant new proof of Schmidt’s well-known discrepancy theorem, see [Schm].) Here we develop an adaptation of the Roth–Halász method for hyperbolic regions. Following the Roth–Halász method, we will express the test function T .x/ in the form of a Riesz product of “modified Rademacher functions” T .x/ D

Y

.1 C Rj . x//;

(5.116)

j 2J

where 0 < < 1 is an appropriate constant to be specified later, and Rj .x/, j 2 J are certain modified Rademacher functions to be defined below (J is some appropriate index-set). We assume that the test function T .x/ is zero outside of the rectangle Œ0; M N Œ; M . Suppose that 102 > 1 > 0 and 102 > 2 > 0 are (small) positive real numbers (to be specified later) such that M N M 2 D D 2m ; m 1 is an integer: 1 2

(5.117)



Let j be an arbitrary integer in the interval 0 j n where 2n N , that is, n D log N= log 2 C O.1/ (binary logarithm): we decompose the rectangle Œ0; M N Œ; M into 2m 2m D 4m disjoint translated copies of the small rectangle Œ0; 2j 1 Œ0; 2j 2 :

(5.118)

We call these congruent copies of the small rectangle (5.118) j -cells. For each one of the 4m j -cells we independently choose one of the following three patterns: C, C, and 0, see Fig. 5.3. t u

Fig. 5.3

As Fig. 5.3 shows, the pattern C actually means a two-dimensional pattern as follows: we divide the j -cell into four congruent subrectangles, and define a step function on the j -cell, which is C1 on the upper right, 1 on the upper left, 1 on the lower right, and C1 on the lower left subrectangle. Similarly, the pattern C means the step function with 1 on the upper right, C1 on the upper left, C1 on the lower right, and 1 on the lower left subrectangle. Finally, pattern 0 means that the step function is zero on the whole j -cell. In the rest we refer to these two-dimensional patterns simply as 0, C, and C (representing the top rows in Fig. 5.3). By making an independent choice of C, C, and 0 for each j -cell, we obtain a particular modified Rademacher function Rj .x/ of order j , defined over the whole rectangle Œ0; M N Œ; M . We define Rj .x/ to be 0 outside of the rectangle Œ0; M N Œ; M . Since for each one of the 4m j -cells there are three options (namely, C, C, m and 0), the total number of modified Rademacher functions Rj .x/ of order j is 34 . 4m Let R.j / denote the family of all 3 modified Rademacher functions of order j . This means the notation Rj .x/ is somewhat ambiguous in the sense that it represents any element of the huge family R.j /. Super-orthogonality: Key Property of the modified Rademacher functions. If k 1 and 0 j1 < : : : < jk n, then any product Rj1 .x/ Rjk .x/ of k modified Rademacher functions has the property that, in every elementary cell of size 2j1 1 2jk 2 , the product Rj1 .x/ Rjk .x/ equals to one of the three familiar patterns: 0, C, and C (meaning the two-dimensional patterns):



Note that an elementary cell of size $2^{j_1}\lambda_1 \times 2^{-j_k}\lambda_2$ arises as a nonempty intersection of a $j_1$-cell and a $j_k$-cell (where $j_1 < j_k$). The proof of the “key property” is almost trivial: it is based on the fact that, for any $k \ge 2$, the intersection of any $k$ cells of different orders $j_1 < \cdots < j_k$ is either empty or equals the intersection of the $j_1$-cell and the $j_k$-cell (i.e., the intersection of the first and the last). We emphasize that in each one of the three patterns the integral of the corresponding step function is zero.

Since every modified Rademacher function $R_j(\mathbf{x})$ has values $\pm 1$ or $0$, and $0 < \rho < 1$, it is clear that the Riesz product (5.116) defines a positive test function $T(\mathbf{x})$. The index-set $J$, a subset of $\{0, 1, 2, \ldots, n\}$, will be specified later. Note in advance that $J$ is a “large” subset of $\{0, 1, 2, \ldots, n\}$ in the sense that $|J| \ge \text{const}\cdot(n+1)$.

Next we check the second requirement (5.112) of the test function:

\[ \frac{1}{(M-N)(M-2\gamma)} \int_0^{M-N}\!\!\int_\gamma^{M-\gamma} T(\mathbf{x})\, d\mathbf{x} = O(1). \tag{5.119} \]
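The three cell patterns can be made concrete with a few lines of Python. This is only an illustrative sketch: the labels `"-+"`, `"+-"`, `"0"` (named after the top row of each pattern), the unit-square coordinates `(u, v)`, and the function names are assumptions introduced here, not notation from the text. The sketch checks the two facts used above: each pattern integrates to zero over its cell, and a factor $1 + \rho R_j$ stays positive when $0 < \rho < 1$.

```python
def pattern_value(pattern, u, v):
    """Value of a cell pattern at relative position (u, v) in [0,1) x [0,1).

    pattern: "-+", "+-" or "0" (hypothetical labels, named after the top row).
    u: horizontal position (0 = left edge); v: vertical position (0 = bottom edge).
    """
    if pattern == "0":
        return 0
    right = u >= 0.5
    upper = v >= 0.5
    # "-+": +1 on upper right and lower left, -1 on upper left and lower right.
    sign = 1 if right == upper else -1
    return sign if pattern == "-+" else -sign


def cell_integral(pattern, grid=4):
    """Midpoint-rule integral over the unit cell (exact for these step functions when grid is even)."""
    h = 1.0 / grid
    total = sum(pattern_value(pattern, (a + 0.5) * h, (b + 0.5) * h)
                for a in range(grid) for b in range(grid))
    return total * h * h


def riesz_factor(rho, r):
    """One factor 1 + rho * r of the Riesz product, for r in {-1, 0, 1}."""
    return 1 + rho * r
```

Because the values of a pattern are $\pm 1$ or $0$, `riesz_factor(rho, r)` lies between `1 - rho` and `1 + rho`, which is the reason the product is a positive function.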

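Multiplying out the Riesz product (5.116), we have

\[ T(\mathbf{x}) = \prod_{j\in J}\bigl(1+\rho R_j(\mathbf{x})\bigr) = 1 + \rho\sum_{j\in J} R_j(\mathbf{x}) + \rho^2 \sum_{j_1<j_2:\ j_i\in J} R_{j_1}(\mathbf{x})R_{j_2}(\mathbf{x}) + \rho^3 \sum_{j_1<j_2<j_3:\ j_i\in J} R_{j_1}(\mathbf{x})R_{j_2}(\mathbf{x})R_{j_3}(\mathbf{x}) + \cdots \tag{5.120} \]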
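The expansion (5.120) is the usual identity $\prod_{j}(1+\rho r_j) = \sum_{S} \rho^{|S|}\prod_{j\in S} r_j$, summed over all subsets $S$ of the index set. A minimal numerical sketch, with hypothetical function names and a fixed list of values $r_j \in \{-1, 0, 1\}$ standing in for $R_j(\mathbf{x})$ at one point $\mathbf{x}$:

```python
from itertools import combinations
from math import prod


def riesz_product(values, rho):
    """Left-hand side: the product of the factors 1 + rho * r_j."""
    return prod(1 + rho * r for r in values)


def riesz_expansion(values, rho):
    """Right-hand side: 1 + linear part + quadratic part + cubic part + ..."""
    total = 0.0
    for k in range(len(values) + 1):
        # k = 0 contributes the constant 1 (the empty product).
        for subset in combinations(values, k):
            total += rho ** k * prod(subset)
    return total
```

For example, with `values = [1, -1, 0, 1]` and `rho = 0.3` both functions return the same number up to rounding.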

Using (5.120) in (5.119), we have 1 .M N /.M 2 / C

X k1

k .M N /.M 2 /

Z

M N

Z

M

T .x/ d x D 1C 0

X j1 const log N > 0 0

(5.122)

holds with some positive constant const $> 0$. The verification of (5.122) is by far the most difficult part of the proof. This is where we make the critical decision of how to choose an appropriate modified Rademacher function $R_j(\mathbf{x})$ from the huge family $\mathcal{R}(j)$ of size $3^{4^m}$. We choose the “best” $R_j(\mathbf{x}) \in \mathcal{R}(j)$ in order to “synchronize the trivial errors.” (If we don’t synchronize the trivial errors, then they might cancel out and we cannot guarantee extra large deviation.) The synchronization argument is the heart of the proof.

5.5.1 What are the Trivial Errors and How to Synchronize Them

By (5.108) and (5.110), the discrepancy function equals

\[ \Delta(\mathbf{x}) = |\mathcal{P} \cap (\mathbf{x} + H_\gamma(N))| - \delta\cdot \mathrm{area}(H_\gamma(N)), \tag{5.123} \]
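The discrepancy function (5.123) is easy to evaluate for a finite point set. The sketch below assumes the description of the needle used later in the text (the region between $y = \gamma/x$ and $y = -\gamma/x$ for $1 \le x \le N$), so that its area is $2\gamma \log N$; the function names and the tuple representation of points are hypothetical.

```python
import math


def needle_area(gamma, N):
    """Area between y = gamma/x and y = -gamma/x for 1 <= x <= N, i.e. 2 * gamma * log N."""
    return 2 * gamma * math.log(N)


def in_needle(t, s, gamma, N):
    """True if the vector (t, s) lies in the hyperbolic needle."""
    return 1 <= t <= N and abs(s) <= gamma / t


def discrepancy(x, points, gamma, N, delta):
    """Number of points in the translate x + needle, minus delta times the area of the needle."""
    count = sum(1 for (a, b) in points if in_needle(a - x[0], b - x[1], gamma, N))
    return count - delta * needle_area(gamma, N)
```

A point $P$ belongs to $\mathbf{x} + H$ exactly when the difference $P - \mathbf{x}$ belongs to $H$, which is what the membership test inside `discrepancy` uses.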

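and so we can write

\[ \begin{aligned} \int_0^{M-N}\!\!\int_\gamma^{M-\gamma} \Delta(\mathbf{x})T(\mathbf{x})\, d\mathbf{x} &= \int_0^{M-N}\!\!\int_\gamma^{M-\gamma} \Biggl(\sum_{P_i\in \mathcal{P}\cap(\mathbf{x}+H_\gamma(N))} 1 \;-\; \delta\cdot\mathrm{area}(H_\gamma(N))\Biggr) T(\mathbf{x})\, d\mathbf{x} \\ &= \int_0^{M-N}\!\!\int_\gamma^{M-\gamma} \Biggl(\sum_{P_i\in \mathcal{P}\cap(\mathbf{x}+H_\gamma(N))} 1\Biggr) T(\mathbf{x})\, d\mathbf{x} \;-\; \delta\cdot\mathrm{area}(H_\gamma(N))\,(M-N)(M-2\gamma), \end{aligned} \tag{5.124} \]

where in the last step we used (5.121), and $P_1, P_2, P_3, \ldots$ denote the elements of the given point set $\mathcal{P}$. Next we change the order of summation and integration:

\[ \int_0^{M-N}\!\!\int_\gamma^{M-\gamma} \Biggl(\sum_{P_i\in \mathcal{P}\cap(\mathbf{x}+H_\gamma(N))} 1\Biggr) T(\mathbf{x})\, d\mathbf{x} = \sum_{P_i\in\mathcal{P}} \int_{P_i - H_\gamma(N)} T(\mathbf{x})\, d\mathbf{x}, \tag{5.125} \]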

where $P_i - H_\gamma(N)$ denotes the reflected and translated copy of the hyperbolic needle $H_\gamma(N)$:

\[ P_i - H_\gamma(N) = \{P_i - \mathbf{w} : \mathbf{w} \in H_\gamma(N)\}. \tag{5.126} \]

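Combining (5.124) and (5.125), we have

\[ \frac{1}{(M-N)(M-2\gamma)} \int_0^{M-N}\!\!\int_\gamma^{M-\gamma} \Delta(\mathbf{x})T(\mathbf{x})\, d\mathbf{x} = \sum_{P_i\in\mathcal{P}} \frac{1}{(M-N)(M-2\gamma)} \int_{P_i - H_\gamma(N)} T(\mathbf{x})\, d\mathbf{x} \;-\; \delta\cdot\mathrm{area}(H_\gamma(N)). \tag{5.127} \]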

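To evaluate (5.127), we multiply out the Riesz product [see (5.120)]:

\[ T(\mathbf{x}) = \prod_{j\in J}\bigl(1+\rho R_j(\mathbf{x})\bigr) = 1 + \rho\sum_{j\in J} R_j(\mathbf{x}) + \rho^2 \sum_{j_1<j_2:\ j_i\in J} R_{j_1}(\mathbf{x})R_{j_2}(\mathbf{x}) + \rho^3 \sum_{j_1<j_2<j_3:\ j_i\in J} R_{j_1}(\mathbf{x})R_{j_2}(\mathbf{x})R_{j_3}(\mathbf{x}) + \cdots, \tag{5.128} \]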

that is, we have 1 plus the “linear part” plus the “quadratic part” plus the “cubic part” and so on. Note that “1” in fact means the characteristic function $\chi_B$ of the rectangle $B = [0, M-N] \times [\gamma, M-\gamma]$ (since by definition the modified Rademacher functions are all zero outside of $B$). We begin with the contribution of $1 = \chi_B$ in (5.127):

\[ \sum_{P_i\in\mathcal{P}} \frac{1}{(M-N)(M-2\gamma)} \int_{B\cap(P_i - H_\gamma(N))} 1\, d\mathbf{x} = \sum_{P_i\in\mathcal{P}} \frac{1}{(M-N)(M-2\gamma)}\, \mathrm{area}\bigl(B \cap (P_i - H_\gamma(N))\bigr), \tag{5.129} \]

where $B = [0, M-N] \times [\gamma, M-\gamma]$.

5.5.2 Geometric Ideas

Next we study the contribution of the “linear part” [see (5.128)] in (5.127). Synchronization means that we want to make the sum

\[ \sum_{P_i\in\mathcal{P}} \int_{P_i - H_\gamma(N)} R_j(\mathbf{x})\, d\mathbf{x} \tag{5.130} \]

“large positive” (for every $j \in J$, where the index-set $J \subset \{0, 1, 2, \ldots, n\}$ will be specified later). We decompose the underlying rectangle $B = [0, M-N] \times [\gamma, M-\gamma]$ into $j$-cells. Let $C$ be an arbitrary $j$-cell; it has size $2^j\lambda_1 \times 2^{-j}\lambda_2$. Consider a single term in (5.130) and restrict it to the $j$-cell $C$. The geometric meaning of the integral

\[ \int_{C\cap(P_i - H_\gamma(N))} R_j(\mathbf{x})\, d\mathbf{x} \tag{5.131} \]

plays a crucial role in the argument below; see Fig. 5.4.

Fig. 5.4

Since the $j$-cell is very small, the hyperbola arc $P_i - H_\gamma(N)$ can be approximated by its tangent line locally; this explains the tilted straight line segment in Fig. 5.4. The arrows indicate the inside of the hyperbolic needle (i.e., the arc on the picture is the upper arc of the needle). The value of integral (5.131) heavily depends on which one of the three patterns happens to show up in the restriction of $R_j(\mathbf{x})$ to the $j$-cell $C$: the $-+$ pattern and the $+-$ pattern give two integrals where the values are negatives of each other, and of course the $0$ pattern gives zero integral.

How to choose the right pattern ($-+$ or $+-$ or $0$) in an arbitrary $j$-cell $C$? Well, for a fixed point the choice is trivial: for every fixed point $P_i \in \mathcal{P}$, exactly one of the two patterns $-+$ and $+-$ will make the integral (5.131) positive (since the sum of the two integrals is zero). The problem is that we are dealing with a large sum

\[ \sum_{P_i\in\mathcal{P}} \int_{C\cap(P_i - H_\gamma(N))} R_j(\mathbf{x})\, d\mathbf{x} \tag{5.132} \]

instead of a single term (5.131), and we have to make (5.132) positive. The difficulty is that different points may prefer different patterns, say, for $P_{i_1}$ the pattern $-+$ will make the integral (5.131) positive, and for another point $P_{i_2}$ the pattern $+-$ will make the integral (5.131) positive.

To overcome this difficulty, we will apply the Single Dominant Term Rule, which means the following. If the sum (5.132) is “dominated” by a single term (5.131), then by an appropriate choice between the patterns $-+$ and $+-$, we can always make this dominant term positive, and show that the contribution of the rest of the terms in (5.132) is relatively negligible. If there is no dominant term in (5.132), then we choose the $0$ pattern. Of course, we have to precisely define what “domination” means. The success of the Single Dominant Term Rule is based on the fact that “single term domination” is quite typical: it happens very often among the $4^m$ $j$-cells.

What is a “single term domination” in (5.132)? To explain this, we have to talk about slopes. The slope of the diagonal of a $j$-cell is $4^{-j}\lambda_2/\lambda_1 \approx 4^{-j}$, since $\lambda_1$ and $\lambda_2$ are almost equal (we don’t distinguish between positive and negative slopes). Since the hyperbola is a smooth curve, the intersection of a (translated and reflected) hyperbolic needle $P_i - H_\gamma(N)$ with the $j$-cell $C$ is “almost” like the intersection of $C$ with a half plane or the intersection of $C$ with two “nearly parallel” half planes. Since half planes have well-defined constant slopes, as an intuitive oversimplification, I will use the terms “half plane” and “slope” for the intersections $C \cap (P_i - H_\gamma(N))$ (we don’t distinguish between positive and negative slopes).
“Single term domination” occurs if there is exactly one half plane, meaning some $C \cap (P_i - H_\gamma(N))$, with slope close to $4^{-j}$, and $P_i - H_\gamma(N)$ intersects only one of the four subrectangles (where the pattern is constant) of $C$, namely, the lower right subrectangle, and this intersection is a “large triangle.” The intersection requirement “large triangle from the lower right subrectangle” guarantees that the integral (5.131) is “far from zero.” The integral (5.131) of this dominant term (“far from zero”) is called the trivial error.

Note that the reflected hyperbolic needle $-H_\gamma(N)$ has two long arcs: the upper arc, which is increasing, and the lower arc, which is decreasing (the lower arc is under the upper arc). When we say “$P_i - H_\gamma(N)$ intersects $C$”, then it always means that (at least) one of the two long arcs of $P_i - H_\gamma(N)$ intersects $C$. For example, in the trivial error mentioned above the intersection comes from the upper arc.
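The bookkeeping behind the Single Dominant Term Rule can be sketched in code. The geometric definition of “domination” is only made precise later in the text, so the sketch below substitutes a purely numerical stand-in: a term counts as dominant when its absolute value is at least `ratio` times the total absolute value of all other terms. That criterion, the default `ratio = 2.0`, the pattern labels, and the function name are all assumptions made for illustration.

```python
def choose_pattern(integrals, ratio=2.0):
    """Pick a pattern for one j-cell.

    integrals[i] is the value of the integral (5.131) for the point P_i when the
    cell carries the pattern "-+"; the pattern "+-" flips the sign of every term.
    Returns (pattern, resulting value of the sum (5.132)).
    """
    if not integrals:
        return "0", 0.0
    k = max(range(len(integrals)), key=lambda i: abs(integrals[i]))
    dominant = integrals[k]
    rest = sum(abs(v) for i, v in enumerate(integrals) if i != k)
    if dominant == 0 or abs(dominant) < ratio * rest:
        return "0", 0.0  # no dominant term: choose the zero pattern
    sign = 1 if dominant > 0 else -1
    pattern = "-+" if sign > 0 else "+-"
    return pattern, sign * sum(integrals)
```

With `ratio = 2.0`, a cell that receives a nonzero pattern always has a sum that is at least half of the dominant term in absolute value, because the remaining terms can cancel at most half of it.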



5.5.3 An Important Consequence of the “Rectangle Property”

As we said above, “single term domination” means that there is exactly one half plane (i.e., some $C \cap (P_i - H_\gamma(N))$) with slope very close to $4^{-j}$. It is important to point out that we cannot have two half planes with slopes very close to $4^{-j}$ such that both are upper arcs. Indeed, if $C \cap (P_{i_1} - H_\gamma(N))$ and $C \cap (P_{i_2} - H_\gamma(N))$ are both upper arcs with slopes very close to $4^{-j}$ (see Fig. 5.5), then the two points $P_{i_1}$ and $P_{i_2}$ have to be in the same axes-parallel rectangle of area $c_1$ (namely, in an axes-parallel rectangle where the slope of the diagonal is close to $4^{-j}$). But two points in the same axes-parallel rectangle of area $c_1$ is impossible: it contradicts the hypothesis of Proposition 5.12.

Fig. 5.5

What can happen, however, is that we have two half planes with slopes very close to $4^{-j}$ such that one is an upper arc and the other one is a lower arc. For example, it can happen that $C \cap (P_{i_1} - H_\gamma(N))$ is an upper arc and $C \cap (P_{i_2} - H_\gamma(N))$ is a lower arc with both slopes close to $4^{-j}$ (we don’t distinguish between positive and negative slopes). To overcome this difficulty, we switch to a $2 \times 2$ configuration of $j$-cells. More precisely, instead of working with a single $j$-cell $C$, we switch to a $2 \times 2$ configuration of four neighboring $j$-cells $C_1$, $C_2$, $C_3$, and $C_4$, where $C_1$ denotes the upper left, $C_2$ is the upper right, $C_3$ is the lower left and $C_4$ is the lower right member of the $2 \times 2$ configuration.

The simple geometric idea is the following. Assume that the upper arc of $P_{i_1} - H_\gamma(N)$ intersects both $C_2$ and $C_3$ satisfying the requirement “large triangle from the lower right subrectangle (where the pattern is constant)”. Then obviously the lower arc of $P_{i_2} - H_\gamma(N)$ cannot intersect both of $C_2$ and $C_3$ (we assumed that the slopes are close to $4^{-j}$). Therefore, either $C_2$ or $C_3$ will be a $j$-cell with “single term domination.” That is, we can always save at least one from the four neighboring $j$-cells $C_1$, $C_2$, $C_3$, and $C_4$; see Fig. 5.6 (where $C_3$ has “single term domination”).

Fig. 5.6

5.5.4 Choosing a Short Vertical Translation

Next we explain how to satisfy the intersection requirement “large triangle from the lower right subrectangle (where the pattern is constant).” This is very important, since this requirement guarantees that the dominant integral (5.131) is “far from zero.” First we pick an arbitrary point $P_i \in \mathcal{P}$; then of course the hyperbolic needle $P_i - H_\gamma(N)$ has a “long” arc such that the slope is close to $4^{-j}$; “long” in fact means “length of roughly $2^j$.” Therefore, for each point $P_i \in \mathcal{P}$ there is a $j$-cell $C$ such that the intersection $C \cap (P_i - H_\gamma(N))$ has slope close to $4^{-j}$. But, unfortunately, nothing guarantees that $P_i - H_\gamma(N)$ intersects only one of the four subrectangles (where the pattern is constant).

The solution is very simple: we apply a “short” vertical translation for the point set $\mathcal{P}$ (but of course the modified Rademacher functions and the test function $T(\mathbf{x})$ remain fixed in the rectangle $B = [0, M-N] \times [\gamma, M-\gamma]$). “Short” vertical translation means that the length of the vertical translation runs from 0 to 1. For a $j$-cell already a translation of length from 0 to $2^{-j}\lambda_2$ suffices: as the point $P_i$ is moving up vertically, the intersection $C \cap (P_i - H_\gamma(N))$ changes and has “good” positions where $P_i - H_\gamma(N)$ intersects only the lower right subrectangle (where the pattern is constant), and at the same time, this intersection is a “large triangle.” Since the slope is close to $4^{-j}$, a positive constant percentage of the translations is “good.” If we apply translations from 0 to 1, then it will work for all $j$.

It follows from a standard averaging argument that there is a vertical translation $0 < t_0 < 1$ (in fact, the majority will do) which is “good” for “many” pairs $(P_i, j)$ at the same time, where $P_i \in \mathcal{P}$ is a given point and $j \in \{0, 1, 2, \ldots, n\}$ is an order (of the modified Rademacher function). Here “many” means positive constant percentage of all pairs.

Of course, a vertical translation has a bad side effect: some points leave the underlying square $[0, M]^2$. But, luckily for us, it suffices to use “short” translations of length $\le 1$, which means that we just lose relatively few points, namely, those that are close to the border. The hypothesis of Proposition 5.12 (see the “rectangle property”) guarantees that there are just $O(M)$ points close to the border, which is negligible compared to the number $\delta M^2$ of the points in $\mathcal{P}$ (linear is negligible compared to quadratic; we assume that $\delta$ is fixed and $M$ is “large”).

5.5.5 Summarizing the Vague Geometric Intuition

A “typical” vertical translation of length $0 < t_0 < 1$ has the property that, for a positive constant percentage of the pairs $(j, C)$, where $j \in \{0, 1, 2, \ldots, n\}$ and $C$ is a $j$-cell, we have “single term domination,” implying (and here we skipped a lot of technical details!)

\[ \sum_{P_i\in\mathcal{P}} \int_{C\cap(P_i - H_\gamma(N))} R_j(\mathbf{x})\, d\mathbf{x} \;\ge\; \frac{1}{2} \int_{C\cap(P_{i_0} - H_\gamma(N))} R_j(\mathbf{x})\, d\mathbf{x} \;\ge\; \text{const} > 0, \tag{5.133} \]

where $P_{i_0}$ is the “dominating” point, i.e., the intersection $C \cap (P_{i_0} - H_\gamma(N))$ has slope close to $4^{-j}$, and this intersection is a “large triangle from the lower right subrectangle of $C$ (where the pattern is constant).” We will explain the missing details of (5.133) later, including an explicit value for “const.” The Single Term Domination Rule and (5.133) give

\[ \sum_{j\in J} \sum_{P_i\in\mathcal{P}} \frac{1}{(M-N)(M-2\gamma)} \int_{P_i - H_\gamma(N)} R_j(\mathbf{x})\, d\mathbf{x} \;\ge\; \text{const}\cdot|J| \;\ge\; \text{const}\cdot(n+1) > 0. \tag{5.134} \]

The geometric intuition requires that $j \in J$ satisfies an inequality like

\[ \max\Bigl\{1, \frac{1}{\gamma}\Bigr\} \le 2^j \le \min\Bigl\{N, \frac{N}{\gamma}\Bigr\}. \tag{5.135} \]

To guarantee (5.135), we choose $J$ to be the interval

\[ J:\quad \frac{\log \max\{1, 1/\gamma\}}{\log 2} \;\le\; j \;\le\; \frac{\log N}{\log 2} - \frac{\log(\max\{1, \gamma\})}{\log 2}. \tag{5.136} \]
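The index set $J$ can be listed explicitly for given parameters. The sketch below assumes that (5.136) bounds $j$ from below by $\log_2 \max\{1, 1/\gamma\}$ and from above by $\log_2 N - \log_2 \max\{1, \gamma\}$, where $\gamma$ is the needle parameter; the function name is hypothetical.

```python
import math


def index_set(N, gamma):
    """Integers j with log2(max(1, 1/gamma)) <= j <= log2(N) - log2(max(1, gamma))."""
    lower = math.log2(max(1.0, 1.0 / gamma))
    upper = math.log2(N) - math.log2(max(1.0, gamma))
    return list(range(math.ceil(lower), math.floor(upper) + 1))
```

For fixed $\gamma$ the length of this interval is $\log_2 N$ minus a constant, which is in line with $J$ being a “large” subset of $\{0, 1, \ldots, n\}$.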

We emphasize that this was just an “intuitive” proof for (5.134). We will return to (5.133) and (5.134) later and show how to make the whole thing perfectly precise and explicit. This concludes Sect. 5.5. We complete the proof of Proposition 5.12 in the next three sections. Note that (5.134) is the most difficult part.

5.6 More on the Riesz Product

5.6.1 Applying Super-Orthogonality

Next we turn to the contribution of the “quadratic,” “cubic,” and even higher order parts of the Riesz product [see (5.128)] in (5.127). Let $k \ge 2$ and let $0 \le j_1 < \cdots < j_k \le n$ be $k$ orders written as an increasing sequence. Let $C^*$ be an elementary cell of size $2^{j_1}\lambda_1 \times 2^{-j_k}\lambda_2$: $C^*$ is the intersection of $k$ cells of orders $j_1 < \cdots < j_k$. Super-orthogonality yields that the product $R_{j_1}(\mathbf{x}) \cdots R_{j_k}(\mathbf{x})$ (of $k$ modified Rademacher functions of given orders) restricted to $C^*$ equals one of the following three patterns: $-+$ or $+-$ or $0$. Assume that the translated and reflected hyperbolic needle $P_i - H_\gamma(N)$ intersects $C^*$; let slope $= \text{slope}(C^* \cap (P_i - H_\gamma(N)))$ denote the slope of the intersection $C^* \cap (P_i - H_\gamma(N))$ (we don’t distinguish between positive and negative slopes). A simple geometric consideration shows that, roughly speaking, the integral

\[ \frac{1}{\mathrm{area}(C^*)} \int_{C^*\cap(P_i - H_\gamma(N))} R_{j_1}(\mathbf{x}) \cdots R_{j_k}(\mathbf{x})\, d\mathbf{x} \]

is “negligible” unless the slope of the intersection $C^* \cap (P_i - H_\gamma(N))$ is “close” to $2^{-(j_1+j_k)}$ (= the slope of the diagonal of $C^*$, which has size $2^{j_1}\lambda_1 \times 2^{-j_k}\lambda_2$). The precise statement goes as follows:

\[ \left| \frac{1}{\mathrm{area}(C^*)} \int_{C^*\cap(P_i - H_\gamma(N))} R_{j_1}(\mathbf{x}) \cdots R_{j_k}(\mathbf{x})\, d\mathbf{x} \right| \le \min\left\{ \text{slope}\cdot 2^{j_1+j_k},\ \frac{1}{\text{slope}\cdot 2^{j_1+j_k}} \right\}. \tag{5.137} \]

Note that (5.137) is a straightforward corollary of the geometry of the three possible patterns of $R_{j_1}(\mathbf{x}) \cdots R_{j_k}(\mathbf{x})$ in $C^*$. The hyperbolic needle $H_\gamma(N)$ is bounded by the long curves $y = \gamma/x$ and its negative $y = -\gamma/x$ (where $1 \le x \le N$), and the slope is the derivative $(\gamma/x)' = -\gamma x^{-2}$. The number of elementary cells $C^*$ of size $2^{j_1}\lambda_1 \times 2^{-j_k}\lambda_2$ intersecting a fixed hyperbolic needle $P_i - H_\gamma(N)$ is estimated from above by the simple expression

\[ 2\left( \frac{N}{2^{j_1}\lambda_1} + \frac{2\gamma}{2^{-j_k}\lambda_2} \right). \tag{5.138} \]


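Here the factor 2 comes from the two long boundary curves (hyperbolas), the first fraction comes from the pointed end of the hyperbolic needle, and the second fraction comes from the wide part of the needle. A more detailed explanation of (5.138) goes as follows. Let’s start with the pointed end of the hyperbolic needle $H_\gamma(N)$ (bounded by the curves $y = \gamma/x$ and $y = -\gamma/x$, where $1 \le x \le N$).

Case A: When $x$ runs in the interval $N \ge x \ge \sqrt{\gamma}\, 2^{(j_1+j_k)/2}$, then the slope of the intersection $C^* \cap (P_i - H_\gamma(N))$ is $\gamma x^{-2}$, which is less than $2^{-(j_1+j_k)}$ (= the slope of the diagonal of $C^*$). Therefore, in this range $P_i - H_\gamma(N)$ intersects less than

\[ 2\cdot\frac{N}{2^{j_1}\lambda_1} \]

elementary cells $C^*$ of size $2^{j_1}\lambda_1 \times 2^{-j_k}\lambda_2$.

Case B: When $x$ runs in the interval $\sqrt{\gamma}\, 2^{(j_1+j_k)/2} \ge x \ge 1$, then the slope of the intersection $C^* \cap (P_i - H_\gamma(N))$ is larger than $2^{-(j_1+j_k)}$ (= the slope of the diagonal of $C^*$). Therefore, in this range $P_i - H_\gamma(N)$ intersects less than

\[ 2\cdot\frac{2\gamma}{2^{-j_k}\lambda_2} \]

elementary cells $C^*$ (of size $2^{j_1}\lambda_1 \times 2^{-j_k}\lambda_2$). In Case A we look at the hyperbola $xy = \gamma$ as $y = \gamma/x$ and in Case B we look at it as $x = \gamma/y$, i.e., in Case B we switch the role of the coordinate axes. Thus by (5.137) and (5.138) we have

\[ \begin{aligned} \left| \int_{P_i - H_\gamma(N)} R_{j_1}(\mathbf{x}) \cdots R_{j_k}(\mathbf{x})\, d\mathbf{x} \right| \le{}& 2\cdot\frac{N}{2^{j_1}\lambda_1}\cdot \mathrm{area}(C^*)\cdot \frac{2}{N} \int_{x=\sqrt{\gamma}\,2^{(j_1+j_k)/2}}^{N} \frac{\gamma\, 2^{j_1+j_k}}{x^2}\, dx \\ &+ 2\cdot\frac{2\gamma}{2^{-j_k}\lambda_2}\cdot \mathrm{area}(C^*)\cdot \frac{1}{\gamma} \int_{y=\sqrt{\gamma}\,2^{-(j_1+j_k)/2}}^{\gamma} \frac{\gamma\, 2^{-(j_1+j_k)}}{y^2}\, dy. \end{aligned} \tag{5.139} \]

Since $\mathrm{area}(C^*) = 2^{j_1}\lambda_1 \cdot 2^{-j_k}\lambda_2$, by using this fact in (5.139) we have

\[ \left| \int_{P_i - H_\gamma(N)} R_{j_1}(\mathbf{x}) \cdots R_{j_k}(\mathbf{x})\, d\mathbf{x} \right| \le 4\lambda_2 2^{-j_k}\cdot \frac{\gamma\, 2^{j_1+j_k}}{\sqrt{\gamma}\,2^{(j_1+j_k)/2}} + 4\lambda_1 2^{j_1}\cdot \frac{\gamma\, 2^{-(j_1+j_k)}}{\sqrt{\gamma}\,2^{-(j_1+j_k)/2}} = 4\sqrt{\gamma}\,(\lambda_1+\lambda_2)\, 2^{(j_1-j_k)/2}. \tag{5.140} \]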
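The arithmetic behind the closed form in (5.140) can be checked numerically. The sketch assumes the two terms have the shape “$4\lambda_2 2^{-j_k}$ times the Case A tail integral” and “$4\lambda_1 2^{j_1}$ times the Case B tail integral”, with the integrals extended to infinity (which only enlarges them); `lam1`, `lam2` and the function names are hypothetical stand-ins for the cell parameters and are not taken from the text.

```python
import math


def case_a_term(gamma, lam2, j1, jk):
    s = j1 + jk
    a = math.sqrt(gamma) * 2 ** (s / 2)   # crossover: gamma / a**2 == 2**(-s)
    tail = gamma * 2 ** s / a             # integral of gamma * 2**s / x**2 over [a, infinity)
    return 4 * lam2 * 2 ** (-jk) * tail


def case_b_term(gamma, lam1, j1, jk):
    s = j1 + jk
    b = math.sqrt(gamma) * 2 ** (-s / 2)  # the same crossover in the y-coordinate
    tail = gamma * 2 ** (-s) / b          # integral of gamma * 2**(-s) / y**2 over [b, infinity)
    return 4 * lam1 * 2 ** j1 * tail


def closed_form(gamma, lam1, lam2, j1, jk):
    return 4 * math.sqrt(gamma) * (lam1 + lam2) * 2 ** ((j1 - jk) / 2)
```

The point of the closed form is the factor $2^{(j_1-j_k)/2}$: the contribution of a product of Rademacher functions decays geometrically in the gap between its smallest and largest order.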



Let’s return now to (5.127)–(5.129). We recall the notation B D Œ0; M N Œ; M . We have 1 .M N /.M 2 /

Z

M N

Z

M

.x/T .x/ d x D 0

0

1 X area B \ .Pi H .N // A ı area.H .N // C D@ .M N /.M 2 / P 2P i

C

X X j 2J Pi 2P

C

X

X

k

1 .M N /.M 2 /

X

j1 0. Let $(A_1, A_2)$ denote the coordinates of the lower left corner of the $j$-cell $C$. The intersection of the vertical line $x = A_1$ with the upper arcs of $P_{i_0} - H_\gamma(N)$ and $P_i - H_\gamma(N)$ gives two points: the hypothesis of Case 1 implies that these intersection points are “close” to each other. More precisely, with $x = 1 + a_{i_0} - A_1$ (where $a_{i_0} - A_1 > 0$ and the additive term “1” comes from the fact that the hyperbolic needle $H_\gamma(N)$ begins at 1), we have the upper bound

\[ \left| \Bigl(b_{i_0} + \frac{\gamma}{x}\Bigr) - \Bigl(b_i + \frac{\gamma}{x+h}\Bigr) \right| < 2\cdot 2^{-j}\lambda_2. \tag{5.153} \]

Since $b_i - b_{i_0} = v$, we can rewrite (5.153) as follows:

\[ \left| \frac{\gamma}{x} - \frac{\gamma}{x+h} - v \right| = \left| \frac{\gamma h}{x(x+h)} - v \right| < 2^{-j+1}\lambda_2. \tag{5.154} \]

On the other hand, we know that the slope of the upper arc of $C \cap (P_{i_0} - H_\gamma(N))$ satisfies the inequality

\[ \frac{5}{6}\, 4^{-j} \le \frac{\gamma}{x^2} \le \frac{7}{6}\, 4^{-j}. \tag{5.155} \]

We claim that if $\lambda_1$ (and so $\lambda_2$) is a small constant, then the upper arc of $P_{i_0} - H_\gamma(N)$ intersects a “large number” of $j$-cells (different from $C$) such that the slope is still almost equal to $4^{-j}$. Indeed, the horizontal size of $C$ is $2^j\lambda_1$, and, assuming that (5.155) holds, the inequality

\[ \frac{5}{6}\, 4^{-j} \le \frac{\gamma}{(x + \ell 2^j\lambda_1)^2} \le \frac{7}{6}\, 4^{-j} \]

has constant times $\lambda_1^{-1}$ consecutive integer solutions in $\ell$. If $\lambda_1 > 0$ is small then of course $\lambda_1^{-1}$ is a “large number,” justifying our claim. Returning to (5.154) and (5.155), and using the substitution $x \to x + \ell 2^j\lambda_1$, we have the inequalities

\[ \left| \frac{\gamma h}{(x + \ell 2^j\lambda_1)(x + \ell 2^j\lambda_1 + h)} - v \right| < 2^{-j+1}\lambda_2 \tag{5.156} \]

and

\[ \frac{5}{6}\, 4^{-j} \le \frac{\gamma}{(x + \ell 2^j\lambda_1)^2} \le \frac{7}{6}\, 4^{-j}. \tag{5.157} \]
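The number of integers $\ell$ satisfying (5.157) can be counted directly. In the sketch below `lam1` stands for the horizontal cell parameter (cell width $2^j \cdot$ `lam1`); the function names are hypothetical. Taking square roots, (5.157) says that $x + \ell 2^j\lambda_1$ lies between $\sqrt{6\gamma/7}\,2^j$ and $\sqrt{6\gamma/5}\,2^j$, an interval of length about $0.17\sqrt{\gamma}\,2^j$, so roughly $0.17\sqrt{\gamma}/\lambda_1$ consecutive integers qualify.

```python
import math


def count_solutions(gamma, lam1, j, x):
    """Number of integers l with (5/6) 4^-j <= gamma / (x + l * 2^j * lam1)^2 <= (7/6) 4^-j."""
    step = 2 ** j * lam1
    lo = math.sqrt(6 * gamma / 7) * 2 ** j
    hi = math.sqrt(6 * gamma / 5) * 2 ** j
    first = math.ceil((lo - x) / step)
    last = math.floor((hi - x) / step)
    return max(0, last - first + 1)


def count_brute_force(gamma, lam1, j, x, search=1000):
    """The same count, testing the two-sided inequality for every l in a window."""
    step = 2 ** j * lam1
    total = 0
    for l in range(-search, search + 1):
        t = x + l * step
        if t > 0 and (5 / 6) * 4.0 ** (-j) <= gamma / t ** 2 <= (7 / 6) * 4.0 ** (-j):
            total += 1
    return total
```

For $\gamma = 1$, `lam1 = 0.01`, $j = 5$ and $x = 32$ the count comfortably exceeds $\sqrt{\gamma}/(10\lambda_1) = 10$.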

If (5.155) holds, then there are at least $\sqrt{\gamma}/(10\lambda_1)$ consecutive integer solutions $\ell$ of (5.157). The basic idea is the following: if $\ell$ runs through these integer solutions of (5.157), and of course $\gamma$, $x$, $h$, $v$ remain fixed, then the function (= function of $\ell$)

\[ \frac{\gamma h}{(x + \ell 2^j\lambda_1)(x + \ell 2^j\lambda_1 + h)} \tag{5.158} \]

has “substantially different” values, and we expect only a very few of them to be “very close” to a fixed $v$ in the quantitative sense of (5.156) (of course, here we assume that $\lambda_2$ is “small”). Next we work out the details of this intuition. We begin with the following corollary of (5.157):

\[ \sqrt{\frac{6\gamma}{5}}\; 2^j \ge x + \ell 2^j\lambda_1 \ge \sqrt{\frac{6\gamma}{7}}\; 2^j, \tag{5.159} \]

and using this in (5.158), we have the good approximation

\[ \frac{\gamma h}{(x + \ell 2^j\lambda_1)(x + \ell 2^j\lambda_1 + h)} \approx \frac{\gamma h}{\sqrt{\gamma}\,2^j\bigl(\sqrt{\gamma}\,2^j + h\bigr)} = \frac{h}{2^j\Bigl(2^j + \frac{h}{\sqrt{\gamma}}\Bigr)}. \tag{5.160} \]

Next we distinguish two cases. First assume that $0 < h \le \sqrt{c_1}\, 2^{j-1}$, where $c_1 > 0$ is the positive constant in the “rectangle property.” Then the “rectangle property” yields that

\[ |v| \ge \frac{c_1}{h} \ge \frac{c_1}{\sqrt{c_1}\, 2^{j-1}} = 2\sqrt{c_1}\, 2^{-j}, \tag{5.161} \]

and also h 2j 2j C

ph

h j 2 2j

p 2h j 2 C h pc1 C h 2 c1 C 1

(5.169)

Combining (5.167)–(5.169), (5.166) follows. Let’s return to (5.165) and (5.166), and apply it in (5.156). We obtain that, among the $\text{const}/\lambda_1$ consecutive integer values of $\ell$ satisfying (5.157), only $\text{const}\cdot\bigl(1 + \sqrt{\gamma/c_1}\bigr)$ will satisfy (5.156). More explicitly, it is safe to say that

\[ \text{at most } 10\left(1 + \sqrt{\frac{\gamma}{c_1}}\right) \text{ values of } \ell \text{ will satisfy both (5.156) and (5.157).} \tag{5.170} \]

The next step is

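5.6.3 A Combination of the Rectangle Property and the Pigeonhole Principle

We recall (5.164): $h > \sqrt{c_1}\, 2^{j-1}$. We consider the following power-of-two type decomposition:

\[ 2^{r-1}\sqrt{c_1}\, 2^j < h \le 2^r\sqrt{c_1}\, 2^j, \qquad r = 0, 1, 2, \ldots \tag{5.171} \]

We claim that, for a fixed point $P_{i_0} = (a_{i_0}, b_{i_0}) \in \mathcal{P}$ and for a fixed integer $r \ge 0$, there are at most

\[ 10\cdot 2^r \sqrt{\frac{\gamma}{c_1}} \tag{5.172} \]

other points $P_i = (a_i, b_i) \in \mathcal{P}$ ($P_i \ne P_{i_0}$) such that $h = h_i = a_i - a_{i_0} > 0$ and $v = v_i = b_i - b_{i_0}$ satisfy (5.156) [and implicitly (5.157)] and (5.171). To prove (5.172), first note that if $h = h_i$ satisfies (5.171), then by (5.160) and (5.171),

\[ \frac{\gamma h}{(x + \ell 2^j\lambda_1)(x + \ell 2^j\lambda_1 + h)} \approx \frac{h}{2^j\Bigl(2^j + \frac{h}{\sqrt{\gamma}}\Bigr)} \approx \frac{2^r\sqrt{c_1}\, 2^j}{2^j\Bigl(2^j + \frac{1}{\sqrt{\gamma}}\, 2^r\sqrt{c_1}\, 2^j\Bigr)} = \frac{2^{-j}}{\frac{1}{\sqrt{\gamma}} + \frac{1}{2^r\sqrt{c_1}}}, \]

so a solution of (5.156) gives the approximation

\[ v = v_i \approx 2^{-j}\left( \frac{1}{\frac{1}{\sqrt{\gamma}} + \frac{1}{2^r\sqrt{c_1}}} \pm 2\lambda_2 \right). \tag{5.173} \]

Assuming

\[ \lambda_2 < \frac{1}{8}\cdot \frac{1}{\frac{1}{\sqrt{\gamma}} + \frac{1}{\sqrt{c_1}}}, \tag{5.174} \]

(5.173) yields the good approximation

\[ v = v_i \approx \frac{2^{-j}}{\frac{1}{\sqrt{\gamma}} + \frac{1}{2^r\sqrt{c_1}}}. \tag{5.175} \]

Now suppose that (5.172) is not true. If there are more than $10\cdot 2^r\sqrt{\gamma/c_1}$ other points $P_i = (a_i, b_i) \in \mathcal{P}$ ($P_i \ne P_{i_0}$) such that $h = h_i = a_i - a_{i_0} > 0$ and $v = v_i = b_i - b_{i_0}$ satisfy (5.156) [and implicitly (5.157)] and (5.171), then by the Pigeonhole Principle and (5.175) there must exist two points $P_{i_1}, P_{i_2} \in \mathcal{P}$ ($i_1 \ne i_2$) such that

\[ v_{i_1} \approx \frac{2^{-j}}{\frac{1}{\sqrt{\gamma}} + \frac{1}{2^r\sqrt{c_1}}} \approx v_{i_2} \qquad\text{and}\qquad |h_{i_1} - h_{i_2}| \le \frac{2^r\sqrt{c_1}\, 2^j}{10\cdot 2^r\sqrt{\gamma/c_1}} = \frac{c_1 2^j}{10\sqrt{\gamma}}. \]

Since the product

\[ \frac{2^{-j}}{\frac{1}{\sqrt{\gamma}} + \frac{1}{2^r\sqrt{c_1}}} \cdot \frac{c_1 2^j}{10\sqrt{\gamma}} = \frac{c_1}{10\Bigl(1 + \frac{\sqrt{\gamma}}{2^r\sqrt{c_1}}\Bigr)} \]

is less than $c_1$, we obtain that there exists an axes-parallel rectangle of area less than $c_1$ containing at least two points of $\mathcal{P}$ (namely, $P_{i_1}$ and $P_{i_2}$). This contradicts the rectangle property and proves (5.172).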
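The rectangle property used in this argument is easy to test on a finite point set. In the form applied in the case study ($h|v| \ge c_1$ for any two distinct points, where $h$ and $v$ are the horizontal and vertical coordinate differences), a brute-force check over all pairs looks as follows; the function name and the tuple representation of points are assumptions made for illustration.

```python
from itertools import combinations


def has_rectangle_property(points, c1):
    """True if every pair of points satisfies |h| * |v| >= c1.

    If some pair had |h| * |v| < c1, both points would fit into an
    axes-parallel rectangle of area less than c1.
    """
    return all(abs(p[0] - q[0]) * abs(p[1] - q[1]) >= c1
               for p, q in combinations(points, 2))
```

Two points that share a coordinate always fail this test for any $c_1 > 0$, since the rectangle they span is degenerate.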


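If $h = h_i$ falls into the interval (5.171), then

\[ \text{slope of } C \cap (P_i - H_\gamma(N)) = \frac{\gamma}{(x+h)^2} \le \frac{\gamma}{h^2} < \frac{4\gamma}{c_1 4^r}\, 4^{-j}, \tag{5.176} \]

where $4^{-j}$ (almost) equals the slope of the diagonals of the $j$-cell $C$. By (5.176) we have

\[ \left| \frac{1}{\mathrm{area}(C)} \int_{C\cap(P_i - H_\gamma(N))} R_j(\mathbf{x})\, d\mathbf{x} \right| \le \frac{10\gamma}{c_1 4^r}. \tag{5.177} \]

What is more, (5.177) holds for all $j$-cells $C$ satisfying the property

\[ \frac{5}{6}\, 4^{-j} \le \text{slope of } C \cap (P_{i_0} - H_\gamma(N)) \le \frac{7}{6}\, 4^{-j}. \tag{5.178} \]

Let’s return now to (5.152). Combining (5.170)–(5.172) and (5.177) we have

\[ \begin{aligned} \sum_{\substack{P_i\in\mathcal{P}:\ i\ne i_0 \\ \text{Case 1}}}\ \sum_{\substack{C:\ \text{all } j\text{-cells} \\ \text{satisfying (5.178)}}} \left| \frac{1}{\mathrm{area}(C)} \int_{C\cap(P_i - H_\gamma(N))} R_j(\mathbf{x})\, d\mathbf{x} \right| &\le \sum_{r\ge 0} 10\left(1 + \sqrt{\frac{\gamma}{c_1}}\right)\cdot 10\cdot 2^r\sqrt{\frac{\gamma}{c_1}}\cdot \frac{10\gamma}{c_1 4^r} \\ &= 10^3\left( \Bigl(\frac{\gamma}{c_1}\Bigr)^{3/2} + \Bigl(\frac{\gamma}{c_1}\Bigr)^{2} \right) \sum_{r\ge 0} 2^{-r} = 2\cdot 10^3\left( \Bigl(\frac{\gamma}{c_1}\Bigr)^{3/2} + \Bigl(\frac{\gamma}{c_1}\Bigr)^{2} \right). \end{aligned} \tag{5.179} \]

Since there are at least $\sqrt{\gamma}/(10\lambda_1)$ consecutive integer solutions $\ell$ of (5.157) (assuming (5.155) holds), we have

\[ \sum_{\substack{C:\ \text{all } j\text{-cells} \\ \text{satisfying (5.178)}}} 1 \ \ge\ \frac{\sqrt{\gamma}}{10\lambda_1}. \tag{5.180} \]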

Let’s return to (5.152). As we said above, in order to prove (5.152), we distinguish four cases. Inequalities (5.179) and (5.180) complete Case 1. The remaining three cases will be discussed in the next section. These cases are quite similar to Case 1, but there are some annoying differences in the minor details. We will be able to finish the proof of Proposition 1.19 in Sect. 5.8.



5.7 Completing the Case Study 5.7.1 Verifying (5.152) Let’s return to (5.151) and (5.152). Again we assume that there is a dominating point Pi0 D Pi0 .j; C/ 2 P such that 1. C \ .Pi0 H .N // has slope between 56 4j and 76 4j ; 2. Pi0 H .N / intersects only the lower right subrectangle of the j -cell C, and the intersection is a “large triangle”; 1 3. to be a “large triangle” means that the area is 20 of the area of C, that is, 1 2 the area is 32 . Let Pi ¤ Pi0 be another point in P such that Pi H .N / intersects C, i.e., the upper or lower arc of the boundary of the hyperbolic needle Pi H .N / does intersect the j -cell C. Next we discuss the second case, which is similar to the first case: roughly speaking, we switch the roles of horizontal and vertical. Case 2: The upper arc of Pi H .N / intersects C, and the slope is larger than the slope of the “dominant needle” Pi0 H .N / (see Fig. 5.8 below). Let Pi0 D .ai0 ; bi0 /, Pi D .ai ; bi / denote the coordinates of the two points. By the hypothesis of Case 2, ai0 > ai . Write h D ai0 ai > 0 and v D bi0 bi ; where again h stands for horizontal and v stands for vertical. The “rectangle property” guarantees that hjvj c1 > 0. Let .A1 ; A2 / denote the coordinates of the upper left corner of the j -cell C. The intersection of the horizontal line y D A2 with the upper arcs of Pi0 H .N / and Pi H .N / give two points: the hypothesis of Case 2 implies that these intersection points are “close” to each other in the following quantitative sense. Write y D A2 bi0 , then we have the upper bound ˇ ˇ ˇ ˇˇ ˇ ai a < 2 2j 1 : i 0 ˇ yCv y ˇ

(5.181)

Since ai0 ai D h > 0, we can rewrite (5.181) as follows: ˇ ˇ ˇ ˇ ˇ ˇ ˇ v ˇ j ˇ ˇ ˇ ˇ ˇ y y C v hˇ D ˇ y.y C v/ hˇ < 2 2 1 : We emphasize that y C v > 0. Indeed, otherwise 0 y C v D .A2 bi0 / C .bi0 bi / D A2 bi ;

(5.182)


implying $b_i \ge A_2$, which means that the whole upper arc of $P_i - H_\gamma(N)$ is above the $j$-cell $C$. But this is impossible, since in Case 2 we assumed that the upper arc of $P_i - H_\gamma(N)$ does intersect $C$.

Fig. 5.8

Since we switch the roles of horizontal and vertical, we focus on the reciprocal of the slope: we know that the reciprocal of the slope of the upper arc of $C \cap (P_{i_0} - H_\gamma(N))$ satisfies the inequality

\[ \frac{6}{7}\, 4^{j} \le \frac{\gamma}{y^2} \le \frac{6}{5}\, 4^{j}. \tag{5.183} \]

We claim that if $\lambda_2$ (and so $\lambda_1$) is a small constant, then the upper arc of $P_{i_0} - H_\gamma(N)$ intersects a “large number” of $j$-cells (different from $C$) such that the reciprocal of the slope is still almost equal to $4^{j}$. Indeed, the vertical size of $C$ is $2^{-j}\lambda_2$, and, assuming that (5.183) holds, the inequality

\[ \frac{6}{7}\, 4^{j} \le \frac{\gamma}{(y + \ell 2^{-j}\lambda_2)^2} \le \frac{6}{5}\, 4^{j} \]

has constant times $\lambda_2^{-1}$ consecutive integer solutions in $\ell$. If $\lambda_2 > 0$ is small then of course $\lambda_2^{-1}$ is a “large number,” justifying our claim. Returning to (5.182) and (5.183), and using the substitution $y \to y + \ell 2^{-j}\lambda_2$, we have the inequalities:

\[ \left| \frac{\gamma v}{(y + \ell 2^{-j}\lambda_2)(y + \ell 2^{-j}\lambda_2 + v)} - h \right| < 2^{j+1}\lambda_1 \tag{5.184} \]

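and

\[ \frac{6}{7}\, 4^{j} \le \frac{\gamma}{(y + \ell 2^{-j}\lambda_2)^2} \le \frac{6}{5}\, 4^{j}. \tag{5.185} \]

If (5.183) holds, then there are at least $\sqrt{\gamma}/(10\lambda_2)$ consecutive integer solutions $\ell$ of (5.185). The basic idea is the same as in Case 1: if $\ell$ runs through these integer solutions of (5.185), and of course $\gamma$, $y$, $h$, $v$ remain fixed, then the function (= function of $\ell$)

\[ \frac{\gamma v}{(y + \ell 2^{-j}\lambda_2)(y + \ell 2^{-j}\lambda_2 + v)} \tag{5.186} \]

has “substantially different” values, and we expect only a very few of them to be “close” to a fixed $h$ in the quantitative sense of (5.184) (of course, here we assume that $\lambda_1$ is “small”). Next we work out the details of this intuition. We begin with the following corollary of (5.185):

\[ \sqrt{\frac{5\gamma}{6}}\; 2^{-j} \le y + \ell 2^{-j}\lambda_2 \le \sqrt{\frac{7\gamma}{6}}\; 2^{-j}, \tag{5.187} \]

and using this in (5.186), we have the good approximation

\[ \frac{\gamma v}{(y + \ell 2^{-j}\lambda_2)(y + \ell 2^{-j}\lambda_2 + v)} \approx \frac{\gamma v}{\sqrt{\gamma}\,2^{-j}\bigl(\sqrt{\gamma}\,2^{-j} + v\bigr)} = \frac{v}{2^{-j}\Bigl(2^{-j} + \frac{v}{\sqrt{\gamma}}\Bigr)}. \tag{5.188} \]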

Next we distinguish three cases. First assume that $v$ is negative. Since $y + v > 0$, we have $y^{-1} < (y+v)^{-1}$, and so by (5.184) we have

\[ 2^{j+1}\lambda_1 > |h| = h. \tag{5.189} \]

Combining (5.189) with the rectangle property, we obtain

\[ |v| \ge \frac{c_1}{h} > \frac{c_1}{2\lambda_1}\, 2^{-j}, \tag{5.190} \]
and using (5.190) in (5.188), we have

2j

v 2j C

D

p1

pv

D

2j

jvj pv

2j

2j 2j p > 1 D 2j ; 1 p jvj2j

D

(5.191)


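assuming

\[ \lambda_1 < \frac{c_1}{2\sqrt{\gamma}}. \tag{5.192} \]

Combining (5.184) and (5.188)–(5.191), we conclude

\[ 2^{j+1}\lambda_1 > h > \frac{1}{2}\sqrt{\gamma}\, 2^j - 2^{j+1}\lambda_1, \]

which is an obvious contradiction if

\[ \lambda_1 < \frac{\sqrt{\gamma}}{8}. \tag{5.193} \]

This proves that $v > 0$.

Next assume that $0 < v \le \sqrt{c_1}\, 2^{-j-1}$, where $c_1 > 0$ is the positive constant in the rectangle property. The rectangle property yields that

\[ h \ge \frac{c_1}{v} \ge \frac{c_1}{\sqrt{c_1}\, 2^{-j-1}} = 2\sqrt{c_1}\, 2^{j}, \tag{5.194} \]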
p c1 j 2 : 2

(5.195)

and also

2j

v j 2 C

pv

p : D q 2v p 2j C v c1 C v 2 c1 C 1

(5.202)

Combining (5.200)–(5.202), (5.199) follows. Let’s return to (5.198) and (5.199) and apply it in (5.184). We obtain that, among the $\text{const}/\lambda_2$ consecutive integer values of $\ell$ satisfying (5.185), only $\text{const}\cdot\bigl(1 + \sqrt{\gamma/c_1}\bigr)$ will satisfy (5.184). More explicitly, it is safe to say that

\[ \text{at most } 10\left(1 + \sqrt{\frac{\gamma}{c_1}}\right) \text{ values of } \ell \text{ will satisfy both (5.184) and (5.185).} \tag{5.203} \]

Just like in Case 1, the next step is

5.7.2 A Combination of the Rectangle Property and the Pigeonhole Principle

We recall (5.197): $v > \sqrt{c_1}\, 2^{-j-1}$. We consider the following power-of-two type decomposition:

\[ 2^{r-1}\sqrt{c_1}\, 2^{-j} < v \le 2^r\sqrt{c_1}\, 2^{-j}, \qquad r = 0, 1, 2, \ldots \tag{5.204} \]


We claim that, for a fixed point Pi0 D .ai0 ; bi0 / 2 P and for a fixed integer r 0, there are at most r r 10 2 (5.205) c1 other points Pi D .ai ; bi / 2 P (Pi ¤ Pi0 ) such that h D hi D ai0 ai > 0 and v D vi D bi0 bi > 0 satisfy (5.184) [and implicitly (5.185)] and (5.204). To prove (5.205), first note that if v D vi satisfies (5.204), then by (5.188) and (5.204), .y C

v v j C `2 2 C v/ 2j 2j C

`2j 2 /.y

p 2r c1 2j D p 2j .2j C p1 2r c1 2j /

p1

pv

2j ; C p1c1 2r

so a solution of (5.184) gives the approximation h D hi 2 j

!

1 p1

C

p1 2r c1

˙ 21 :

(5.206)

Assuming

1 < 8

1 p1

C

p1 c1

;

(5.207)

(5.206) yields the good approximation h D hi 2 j

1 p1

C

p1 2r c1

:

(5.208)

Now suppose that (5.205) is not true. If there are more than r 10

r 2 c1

other points Pi D .ai ; bi / 2 P (Pi ¤ Pi0 ) such that h D hi D ai0 ai > 0 and v D vi D bi0 bi > 0 satisfy (5.184) [and implicitly (5.185)] and (5.204), then by the Pigeonhole Principle and (5.208) there must exist two points Pi1 ; Pi2 2 P (i1 ¤ i2 ) such that hi1 2j

1 p1

C

p1 2r c1

hi 2



and jvi1 vi2 j

p j c1 2 c1 2j D p : q 10 10 c1 2r

2r

Since the product 2j

1 p1

C

p1 2r c1

c1 c1 2j q p D 1 C c1 2r

is less than c1 , we obtain that there exists an axes-parallel rectangle of area less than c1 containing at least two points of P (namely, Pi1 and Pi2 ). This contradicts the rectangle property, and proves (5.205). If v D vi falls into the interval (5.204), then 2 4j ; .y C v/2 v c1 4 r (5.209) where 4j (almost) equals the reciprocal of the slope of the diagonals of the j -cell C. By (5.209) we have the reciprocal of the slope of C \ .Pi H .N // D

ˇ ˇZ ˇ ˇ 1 ˇ ˇ Rj .x/ d xˇ 10 r : ˇ ˇ area.C/ ˇ C\.Pi H .N // c1 4

(5.210)

What is more, (5.210) holds for all j -cells C satisfying (5.178): 5 j 7 4 slope of C \ .Pi0 H .N // 4j : 6 6 Let’s return now to (5.152). Combining (5.203)–(5.205) and (5.210) we have the perfect analog of (5.179): X

X


ˇ ˇZ ˇ ˇ 1 ˇ ˇ Rj .x/ d xˇ ˇ ˇ area.C/ ˇ C\.Pi H .N //

r r r 10 1 C 2 10 r D 10 c c c 1 1 14 r0 X

D 103

c1

3=2

C

1 0 2 ! X 3=2 2 ! r 3 @ : 2 A D 2 10 C c1 c c 1 1 r0 (5.211)

This completes Case 2.



Next comes Case 3: The lower arc of $P_i - H_\gamma(N)$ intersects $C$, and the slope is smaller than the slope of the “dominant needle” $P_{i_0} - H_\gamma(N)$ (see Fig. 5.9). We emphasize that we don’t distinguish between positive and negative slopes. Again let $P_{i_0} = (a_{i_0}, b_{i_0})$, $P_i = (a_i, b_i)$ denote the coordinates of the two points. By the hypothesis of Case 3, $a_i > a_{i_0}$. Write $h = h_i = a_i - a_{i_0} > 0$ and $v = v_i = b_i - b_{i_0}$, where of course $h$ stands for horizontal and $v$ stands for vertical. It is obvious from the geometry of Case 3 that $v = v_i = b_i - b_{i_0} > 0$. The rectangle property guarantees that $hv \ge c_1 > 0$.

Let $(A_1, A_2)$ denote the coordinates of the lower left corner of the $j$-cell $C$. The intersection of the vertical line $x = A_1$ with the upper arc of $P_{i_0} - H_\gamma(N)$ and the lower arc of $P_i - H_\gamma(N)$ gives two points: the hypothesis of Case 3 implies that these intersection points are “close” to each other. More precisely, just like in Case 1 we write $x = 1 + a_{i_0} - A_1$; then we have the upper bound

\[ \left| \Bigl(b_{i_0} + \frac{\gamma}{x}\Bigr) - \Bigl(b_i - \frac{\gamma}{x+h}\Bigr) \right| < 2\cdot 2^{-j}\lambda_2. \tag{5.212} \]

Fig. 5.9

Since $b_i - b_{i_0} = v$, we can rewrite (5.212) as follows:

\[ \left| \frac{\gamma}{x} + \frac{\gamma}{x+h} - v \right| < 2^{-j+1}\lambda_2. \tag{5.213} \]



Notice that (5.213) is an analog of (5.154) in Case 1: the only difference is that the “minus” is replaced by “plus.” This means that we can basically repeat the argument in Case 1; in fact, the “plus” just helps and makes Case 3 simpler than Case 1. Furthermore, we know that the slope of the upper arc of $C \cap (P_{i_0} - H_\gamma(N))$ satisfies the inequality

\[ \frac{5}{6}\, 4^{-j} \le \frac{\gamma}{x^2} \le \frac{7}{6}\, 4^{-j}. \tag{5.214} \]

Again, if $\lambda_1$ (and so $\lambda_2$) is a small constant, then the upper arc of $P_{i_0} - H_\gamma(N)$ intersects a “large number” of $j$-cells (different from $C$) such that the slope is still almost equal to $4^{-j}$. Indeed, the horizontal size of $C$ is $2^j\lambda_1$, and, assuming that (5.214) holds, the inequality

\[ \frac{5}{6}\, 4^{-j} \le \frac{\gamma}{(x + \ell 2^j\lambda_1)^2} \le \frac{7}{6}\, 4^{-j} \]

has constant times $\lambda_1^{-1}$ consecutive integer solutions in $\ell$. Using the substitution $x \to x + \ell 2^j\lambda_1$ in (5.213) and (5.214), we have the inequalities

\[ \left| \frac{\gamma}{x + \ell 2^j\lambda_1} + \frac{\gamma}{x + \ell 2^j\lambda_1 + h} - v \right| < 2^{-j+1}\lambda_2 \tag{5.215} \]

and

\[ \frac{5}{6}\, 4^{-j} \le \frac{\gamma}{(x + \ell 2^j\lambda_1)^2} \le \frac{7}{6}\, 4^{-j}. \tag{5.216} \]

p

If (5.214) holds, then there are at least 101 consecutive integer solutions ` of (5.216). The basic idea is the same: if ` runs through these integer solutions of (5.216), and of course , x, h, v remain fixed, then the function (=function of `) C x C `2j 1 x C `2j 1 C h

(5.217)

has “substantially different” values, and we expect only a very few of them to be “very close” to a fixed v in the quantitative sense of (5.215) (we assume that 2 is “small”). To work out the details of this intuition, we begin with the following corollary of (5.216): r

6 j 2 x C `2j 1 5

r

6 j 2 ; 7

(5.218)



and using this in (5.217), we have the good approximation C p j Cp j : x C `2j 1 x C `2j 1 C h 2 2 C h

(5.219)

Next we distinguish two cases. First assume that c1 0 < h p 2j 2 ; where c1 > 0 is the positive constant in the rectangle property. Then jvj

p c1 c1 p j 1 D 2 c1 2j : h c1 2

(5.220)

On the other hand, by (5.215) and (5.175), v2 p

p C 2j C1 2 < 4 2j ; j 2

(5.221)

assuming p : 2 < 2

(5.222)

Since (5.220) and (5.221) contradict each other, we can assume that c1 h > p 2j 2 ;

(5.223)

which is analogous to (5.164) in Case 1. Next, similarly to Case 1, we go back to the basic idea. We claim that if we switch ` to ` C 1 in the function [see (5.217)] ; C j x C `2 1 x C `2j 1 C h

(5.224)

then (5.224) changes at least as much as 1 2j 2 :

(5.225)

(Notice that (5.225) is analogous to (5.166).) Indeed, (5.225) immediately follows from the routine estimation ! 1 1 1 1 1 D p j 1 : p j p j 1 j p 2 2 C 2 1 2 1C 2j



Let’s return to (5.224) and (5.225) and apply it in (5.215). We obtain that at most 10 values of ` will satisfy both (5.215) and (5.216):

(5.226)

Just like in Case 1, the next step is

5.7.3 A Combination of the Rectangle Property and the Pigeonhole Principle. We recall (5.223): c1 h > p 2j 2 : We consider the following power-of-two type decomposition: c1 c1 2r1 p 2j 1 < h 2r p 2j 1 ; r D 0; 1; 2; : : :

(5.227)

We claim that, for a fixed point Pi0 D .ai0 ; bi0 / 2 P and for a fixed integer r 0, there are at most 10 2r

(5.228)

other points Pi D .ai ; bi / 2 P (Pi ¤ Pi0 ) such that h D hi D ai ai0 > 0 and v D vi D bi bi0 satisfy (5.215) [and implicitly (5.216)] and (5.227). To prove (5.228), first note that if h D hi satisfies (5.227), then by (5.219) and (5.227), C p j Cp j j j x C `2 1 x C `2 1 C h 2 2 C h

p

2

j

1C

!

1 1C

c1 r1 2

;

so a solution of (5.215) gives the approximation p v D vi 2j

1C

1 1C

c1 r1 2

! ˙ 2j C1 2 :

(5.229)



Assuming 2
0 and v D vi D bi bi0 satisfy (5.215) [and implicitly (5.216)] and (5.227), then by the Pigeonhole Principle and (5.231) there must exist two points Pi1 ; Pi2 2 P (i1 ¤ i2 ) such that p vi1 2j

1C

!

1 1C

c1 r1 2

vi2

and jhi1 hi2 j

2r pc1 2j 1 10

2r

D

c1 2j p : 20

Since the product p

2

j

1C

1 1C

c1 r1 2

!

c 1 2j p 2

is less than c1 , we obtain that there exists an axes-parallel rectangle of area less than c1 containing at least two points of P (namely, Pi1 and Pi2 ). This contradicts the rectangle property and proves (5.228). If h D hi falls into the interval (5.227), then slope of C \ .Pi H .N // D

.=c1 /2 j 4 ; .x C h/2 h2 4r2

(5.232)

where 4j (almost) equals the slope of the diagonals of the j -cell C. By (5.232) we have ˇ ˇZ ˇ ˇ 1 .=c1 /2 ˇ ˇ Rj .x/ d xˇ 10 r2 : (5.233) ˇ ˇ area.C/ ˇ C\.Pi H .N // 4



What is more, (5.233) holds for all j -cells C satisfying (5.178): 5 j 7 4 slope of C \ .Pi0 H .N // 4j : 6 6 Let’s return now to (5.152). Combining (5.226)–(5.228) and (5.233) we have X

X


X

ˇZ ˇ ˇ ˇ 1 ˇ ˇ Rj .x/ d xˇ ˇ ˇ ˇ area.C/ C\.Pi H .N //

10 10 2r 10

r0

D 42 103

.=c1 /2 D 4r2

0 1 2 X 2 @ 2r A D 32 103 : c1 c1

(5.234)

r0

This completes Case 3. Finally, we study Case 4: The lower arc of Pi H .N / intersects C, and the slope is larger than the slope of the “dominant needle” Pi0 H .N / (see Fig. 5.10). As usual, we don’t distinguish between positive and negative slopes. Again let Pi0 D .ai0 ; bi0 /, Pi D .ai ; bi / denote the coordinates of the two points. By the hypothesis of Case 4, ai0 > ai . We want positive real numbers: we write h D hi D ai0 ai > 0 and v D vi D bi bi0 > 0; where h stands for horizontal and v stands for vertical. The rectangle property guarantees that hv c1 > 0. Let .A1 ; A2 / denote the coordinates of the lower left corner of the j -cell C. In Case 4 we have bi > A2 > bi0 and bi A2 > A2 bi0 . The intersection of the horizontal line y D A2 with the upper arc of Pi0 H .N / and the lower arc of Pi H .N / gives two points: the hypothesis of Case 4 implies that these intersection points are relatively close to each other in the following quantitative sense. Write y D A2 bi0 > 0; then bi A2 D .bi bi0 / y D v y > y, and we have the upper bound ˇ ˇ ˇ ˇˇ ˇ ai a < 2 2j 1 : i0 ˇ vy y ˇ

(5.235)



Since ai0 ai D h > 0, we can rewrite (5.235) as follows: ˇ ˇ ˇ ˇ ˇ ˇ < 2 2j 1 : h ˇ y vy ˇ

(5.236)

Fig. 5.10

Now we basically repeat the argument of Case 2. But, just like Case 3 was a simpler version of Case 1, Case 4 is a simpler version of Case 2. Case 4 is similar to Case 3 in the following technical detail: the two critical functions f3 .y/ D

C and f4 .y/ D y yCh y vy

(5.237)



are similar in the sense that if y increases (decreases) then both parts of the function increase (decrease) at the same time. We refer to this fact by saying “f3 .y/ and f4 .y/ are in synchrony.” Since in both Cases 2 and 4 we switch the role of horizontal and vertical, here again we focus on the reciprocal of the slope: we know that the reciprocal of the slope of the upper arc of C \ .Pi0 H .N // satisfies the inequality 6 6 j 4 2 4j : 7 y 5

(5.238)

We know that if 2 (and so 1 ) is a small constant, then the upper arc of Pi0 H .N / intersects a “large number” of j -cells (different from C) such that the reciprocal of the slope is still almost equal to 4j . Returning to (5.236) and (5.238), and using the substitution y D y C `2j 2 , we have the inequalities ˇ ˇ ˇ ˇ j C1 ˇ ˇ ˇ y C `2j v .y C `2j / hˇ < 2 1 2 2

(5.239)

and 6 6 j 4j : 4 j 2 7 .y C `2 2 / 5

(5.240)

p

If (5.238) holds, then there are at least 102 consecutive integer solutions ` of (5.240). The basic idea is the same: if ` runs through these integer solutions of (5.240), and of course , x, h, v remain fixed, then the function (=function of `) j y C `2 2 v .y C `2j 2 /

(5.241)

has “substantially different” values, and we expect only a very few of them to be “close” to a fixed h in the quantitative sense of (5.240) (of course, here we assume that 1 is “small”). Next we work out the details of this intuition. We begin with the following corollary of (5.240): r

6 j 2 y C `2j 2 7

r

6 j 2 : 5

(5.242)

Since “f3 .y/ and f4 .y/ are in synchrony” [see (5.237)], we can basically repeat the argument of (5.224)–(5.226) in Case 3 and obtain that if we switch ` to ` C 1 in the function [see (5.241)] ; y C `2j 2 v .y C `2j 2 /

(5.243)



then (5.243) changes at least as much as 2 2j 2 :

(5.244)

(Notice that (5.244) is analogous to (5.199) and (5.225).) Thus we obtain that at most 10 values of ` will satisfy both (5.239) and (5.240):

(5.245)

Just like in Cases 1–3, the next step is

5.7.4 A Combination of the Rectangle Property and the Pigeonhole Principle. Since in Case 4 we have v .y C `2j 2 / > y C `2j 2 H) v > 2.y C `2j 2 /;

(5.246)

by (5.242) we can assume that r v>2

6 j 2 : 7

(5.247)

We consider the following power-of-two type decomposition: r 2

r1

6 j C2 < v 2r 2 7

r

6 j C2 r D 0; 1; 2; : : : 2 7

(5.248)

We claim that, for a fixed point Pi0 D .ai0 ; bi0 / 2 P and for a fixed integer r 0, there are at most 100

r 2 c1

(5.249)

other points Pi D .ai ; bi / 2 P (Pi ¤ Pi0 ) such that h D hi D ai0 ai > 0 and v D vi D bi bi0 > 0 satisfy (5.239) [and implicitly (5.240)] and (5.248). To prove (5.249), first note that if v D vi satisfies (5.248), then by (5.239), (5.242) and (5.246), h D hi
0 satisfy (5.239) [and implicitly (5.240)] and (5.248), then by the Pigeonhole Principle and (5.250) there must exist two points Pi1 ; Pi2 2 P (i1 ¤ i2 ) such that p maxfhi1 ; hi2 g 2 2j ; and 2r jvi1 vi2 j

q

6 j C2 7 2 100 c1 2r

q c1

D

6 7

p 2j : 25

Since the product p

q r c1 67 6 j j 2 p 2 D c1 7

is less than c1 , we obtain that there exists an axes-parallel rectangle of area less than c1 containing at least two points of P (namely, Pi1 and Pi2 ). This contradicts the rectangle property and proves (5.249). If v D vi falls into the interval (5.248), then 1 2 r 4j ; 2 .y C v/ v 4 (5.252) j where 4 (almost) equals the reciprocal of the slope of the diagonals of the j -cell C. By (5.252) we have ˇ ˇZ ˇ 10 ˇ 1 ˇ ˇ R .x/ d x (5.253) ˇ r: ˇ j ˇ 4 area.C/ ˇ C\.Pi H .N // the reciprocal of the slope of C \ .Pi H .N // D

What is more, (5.253) holds for all j -cells C satisfying (5.178): 5 j 7 4 slope of C \ .Pi0 H .N // 4j : 6 6



Let’s return now to (5.152). Combining (5.245), (5.248), (5.249), and (5.253) we have ˇZ ˇ ˇ ˇ X X 1 ˇ ˇ Rj .x/ d xˇ ˇ ˇ area.C/ ˇ C\.Pi H .N // Pi 2PW i ¤i0 CW all j cel ls satisfying (5.178) C ase 4

X

10 100

r0

r 10 2 r D c1 4

0 1 X D 104 @ 2r A D 2 104 : c1 r0 c1

(5.254)

This completes Case 4.

5.8 Completing the Proof of Theorem 5.11 Here we finally complete the proof of Proposition 5.12. Let’s return to (5.151) and (5.152); we are now ready to clarify the technical details of the Single Term Domination. Let Pi0 2 P and j 2 J be arbitrary. The slope x2 of the hyperbolic needle Pi0 H .N / satisfies Property I: 5 j 7 4 2 4j 6 x 6

(5.255)

if and only if r

6p j 2 x 7

r

6p j 2 ; 5

which is an interval of length r

6 5

r ! p j 6 p j 2 > 2 : 7 6

Since a j -cell C has horizontal side 1 2j , there are more than p j 2 6 1 2j

p D 61

j -cells C such that the slope of the intersection C \ .Pi0 H .N // satisfies Property I [see (5.255)].



It would be not too difficult to prove directly—by usingpsome familiar arguments from uniform distribution—that, among these more than 61 j -cells C, at least 1 % has the following additional property that we call Property II: Pi0 H .N / intersects only the lower right subrectangle of C, and the 1 intersection is a “large triangle”, meaning that the area is 32 of the area of C, i.e., 1 2 the area is 32 . It is technically simpler, however, to force Property II in an indirect way: by using the trick of short vertical translations; see Fig. 5.11 below. This geometric trick was already mentioned at the end of Sect. 5.5. More precisely, for every real number t0 in 0 < t0 < 1, consider all j -cells C such that with B D Œ0; M N Œ; M we have C \ ..Pi0 C .0; t0 // H .N // B

(5.256)

and 5 j 7 4 slope of C \ ..Pi0 C .0; t0 // H .N // 4j : 6 6

(5.257)

A simple geometric consideration shows that, for (say) at least 5 % of the pairs .t0 ; C/, where C satisfies (5.256) and (5.257), C \ ..Pi0 C .0; t0 // H .N // also satisfies Property II. That is, .Pi0 C .0; t0 // H .N / intersects only the lower right subrectangle of C, and the intersection is a “large triangle,” meaning that the area is 1 1 2 32 of the area of C, i.e., the area is 32 .

Fig. 5.11

For the proof of the positive direction (5.1060), we choose the pattern C in every j -cell C satisfying (5.256) and (5.257) [and of course we choose the negative pattern C for the negative direction (5.10600)]. Then we have Z C\..Pi0 C.0;t0 //H .N //

Rj .x/ d x

1 2 : 32

(5.258)

Finally, if the j -cell C does not satisfy both (5.256) and (5.257), then we choose the 0 pattern. Therefore, by (5.258) and Cases 1–4, we have


Z

0 1

@

t0 D0

.Pi0 C.0;t0 //H .N /

X

X

j 2J

Pi0 2PW for all 0 0 and > 0 be arbitrary real numbers. Then there is an effectively computable positive constant ı 0 D ı 0 . / > 0 depending only on > 0 such that for every sufficiently large integer N there exist two real numbers ˇ1 .N /, ˇ2 .N / in the unit interval 0 ˇ1 .N / < ˇ2 .N / < 1 such that jF .˛I ˇ1 .N /I I N / F .˛I ˇ2 .N /I I N /j > ı 0 log N: We just outline the proof in a couple of sentences, since it is basically the same as that of Theorem 5.14 (without the Cantor set construction). Indeed, let q` N < q`C1 . Since q`C1 D a` q` C q`1 .a` C 1/q` , we have 1

N a` C 1: q`

Again we distinguish two cases. Case 1: We have `1 k X p ˘ jp ai C N=q` 100 2 log N: i D1

Then, by repeating the argument of Case 1 in the proof of Theorem 5.14, we obtain Proposition 5.18 (see Lemma 5.17).


Next comes Case 2: We have `1 X p

k ˘ jp ai C N=q` > 100 2 log N:

i D1

Then F .˛I ˇ D 0I I N / > 100 2 log N; and so we can choose ˇ1 .N / D 0. Finally, for ˇ2 .N / we can choose any “below average point”: any ˇ2 .N / D ˇ with F .˛I ˇI I N / .2 C o.1// log N , see (5.283).

5.10 General Point Sets: Theorem 5.19 Let’s return to Theorem 5.11. What happens if we drop the “rectangle property” in Theorem 5.11, or—what is basically the same thing—in Proposition 5.12? Can we still prove “extra large deviations” for hyperbolic needles? This is what we discuss here. Let P be a finite point set of density ı > 0 in a “large” square Œ0; M 2 , i.e., jPj D ıM 2 . We just make a very mild technical assumption: we assume that P is “not clustered.” More precisely, we introduce a new concept called the “separation constant” D .P/. We say that P is -separated for some constant 0 if the (usual Euclidean) distance between any two points of the set P is at least . For example, the set of lattice points in the plane is clearly 1-separated, i.e., D 1. Our basic idea is the following: we show that if P is -separated with some not too small constant > 0, then the rectangle property holds, at least in a weak statistical sense, for the majority of the directions—we p call them the “good” directions. (For example, in Theorem 5.11 the slope 1= 2 is a concrete “good” direction.) This is how we will be able to save the Riesz product argument in the proof of Theorem 5.11 (or Proposition 5.12), and still prove “extra large deviations” (proportional to the area) for hyperbolic needles, at least for the majority of the directions. In the rest of the section we work out the details of the vague intuition—this will give us Theorem 5.19. The obvious handicap of this “majority approach” is that, for an arbitrary point set P, which is “not clustered,” we cannot predict that a given concrete direction is “good” or not. Another, purely technical, shortcoming is that in Theorem 5.19 we cannot get rid of the assumption that P is “not clustered.” This technical difficulty is rather counterintuitive, since, at least at first sight, “clusters” actually seem to help creating



“extra large deviations”. Nevertheless, some technical difficulties prevent me from adapting the Riesz product technique for “clustered” point sets P. It remains an interesting open problem to decide whether or not in Theorem 5.19 the separation constant D .P/ plays any role. In Theorem 5.19 we change the underlying set: we switch from the “large” square Œ0; M 2 to the “large” disk ˚ disk.0I M / D x 2 R I 2 W jxj M

(5.319)

of radius M centered at the origin. (The reason behind this change is rotation invariance: Theorems 5.3 and 5.11 were about the translated copies, and Theorem 5.19 is about the rotated and translated copies of the hyperbolic needle.) Let P be a finite point set of density ı > 0 in the “large” disk disk.0I M /, i.e., jPj D ıM 2 (we assume that the radius M is “large”). We also assume that P is “not clustered.” More precisely, we assume that P is -separated for some positive constant D .P/ > 0. The goal is to count the number of elements of P in the rotated and translated copies of our usual hyperbolic needle H .N /. Let 102 > > 0 be a (small) positive real numbers (to be specified later). Let j be an arbitrary integer in the interval 0 j n where 2n N , that is, n D log N= log 2 C O.1/ (binary logarithm). We decompose the “large” disk disk.0I M / [see (5.319)] into disjoint translated copies of the small rectangle Œ0; 2j 1 Œ0; 2j 2 ;

(5.320)

i.e., we form a rectangle lattice starting from the origin. We just focus on the copies of (5.320) which are inside the large disk disk.0I M /, i.e., we ignore the copies of (5.320) that intersect the boundary circle or are outside of the disk. Note that there are O.2j M / copies of (5.320) that intersect the boundary circle of the large disk. If 2j D o.M / then there are .1 C o.1//M 22 copies of (5.320) that are inside the large disk disk.0I M /; we call these translated copies of the small rectangle (5.320) j -cells. More precisely, we call them j -cells of angle 0. In general, let be an arbitrary angle in 0 < . Let Rot denote the rotation of the plane by the angle ; we assume that the fixpoint of Rot is the origin. We decompose the “large” disk disk.0I M / into disjoint translates of the rotated copy Rot Œ0; 2j 1 Œ0; 2j 2

(5.321)

of the small rectangle (5.320). We just focus on the translated copies of (5.321) which are inside the large disk disk.0I M /. Again, if 2j D o.M / then there are .1 C o.1//M 2 2 translated copies of (5.321) that are inside the large disk disk.0I M /; we call these translated copies of the small rectangle (5.321) j -cells of angle .


We want to prove, in a quantitative form, that if P is “not clustered”, then for a “typical” angle 2 Œ0; /, the overwhelming majority of the j -cells of angle that contain at least one point from P contain exactly one point from P. A quantitative result like this—a statistical version of the “rectangle property”—will serve as a substitute for the “rectangle property,” and it will suffice to save the Riesz product technique developed in Sects. 5.5–5.8.

5.10.1 Statistical Version of the Rectangle Property: An Average Argument Let Pi1 ; Pi2 2 P (i1 ¤ i2 ) be an arbitrary pair of points. Define the “angle-set” angle .Pi1 ; Pi2 I j /D f 2 Œ0; / W there is a j -cell of angle containing both Pi1 ; Pi2 g. The angle-set angle.Pi1 ; Pi2 I j / is clearly measurable; let jangle.Pi1 ; Pi2 I j /j denote the usual one-dimensional Lebesgue measure (“length”). A simple geometric consideration shows that jangle.Pi1 ; Pi2 I j /j < 2

2j ; jPi1 Pi2 j

(5.322)

where 2j is the length of the short side of a j -cell and jPi1 Pi2 j is the (usual euclidean) distance of the two points. The basic idea is to estimate the following double sum: X

jangle.Pi1 ; Pi2 I j /j
const area.H 0 / D const log N > 0 P 2P\H 0

with some positive constant, and similarly there is another translated (or rotated and translated) copy H 00 of the hyperbolic needle H .N / with negative ˙1-discrepancy: X

'.P / < const area.H 00 / D const log N < 0

P 2P\H 00

with some negative constant. Note that the Riesz product technique can be easily adapted to prove extra large fluctuations for the $\pm 1$-discrepancy. For example, here is the $\pm 1$-discrepancy analog of Proposition 5.12 (=basically Theorem 5.11).

Proposition 5.20 (“$\pm 1$-discrepancy for translated copies”). Let $\mathcal{P}$ be a finite set of points in the square $[0, M]^2$ with density $\delta$, i.e., the number of elements of $\mathcal{P}$ is $|\mathcal{P}| = \delta M^2$. Let $\varphi : \mathcal{P} \to \{-1, +1\}$ be an arbitrary “2-coloring” of the point set $\mathcal{P}$. We study the $\pm 1$-discrepancy

$$\sum_{P \in \mathcal{P} \cap H} \varphi(P)$$

for the translated copies $H$ of the hyperbolic needle $H_\gamma(N)$. Assume that $\mathcal{P}$ satisfies the following “rectangle property”: there is a positive constant $c_1 = c_1(\mathcal{P}) > 0$ such that every axes-parallel rectangle of area $c_1$ contains at most one element of the set $\mathcal{P}$. Furthermore, assume that both $N$ and $M/N$ are “large” in the precise sense of (5.334). Then there is a translated copy $H = \mathbf{x}_1 + H_\gamma(N)$ of the hyperbolic needle $H_\gamma(N)$ such that $H \subset [0, M]^2$ and

$$\sum_{P \in \mathcal{P} \cap H} \varphi(P) \ge \delta' \log N, \tag{5.331}$$

where $\delta' = \delta'(c_1, \gamma, \delta) > 0$ is a positive constant, independent of $N$ and $M$, to be specified below in (5.333). Similarly, there is another translated copy $H = \mathbf{x}_2 + H_\gamma(N)$ of the hyperbolic needle $H_\gamma(N)$ such that $H \subset [0, M]^2$ and

$$\sum_{P \in \mathcal{P} \cap H} \varphi(P) \le -\delta' \log N, \tag{5.332}$$



with the same ı 0 D ı 0 .c1 ; ; ı/ > 0 as in (5.331); namely, 0

0

ı D ı .c1 ; ; ı/ D 10

12

107 c1 107 c12 p ı min : ; c1 ; ; 20 2 2

(5.333)

Finally, the assumption that both N and M=N are “large” goes as follows:

10 C 1

N 2

;

1 N < 2n N; 2

C 1 .N C 2 / o: n p M > 1011 7 107 c 2 ı c1 min 20 ; c1 ; 10 2 c1 ; 2 1

(5.334)

As we said, the proof is a straightforward adaptation of the arguments in Sects. 5.5–5.8. Similarly, one can easily prove the following analog of Theorem 5.19. Proposition 5.21 (“˙1-discrepancy for rotated and translated copies”). Let P be a finite set of points in the disk disk.0I M / with density ı, i.e., the number of elements of P is jPj D ı M 2 . Let ' W P ! f1; C1g be an arbitrary “2-coloring” of the point set P. We study the ˙1-discrepancy X

'.P /

P 2P\H

for the rotated and translated copies H of the hyperbolic needle H .N /. Assume that P is -separated with some > 0. Furthermore, assume that both N and M=N are sufficiently large depending only on , ı, and . Then there is a measurable subset A Œ0; 2/ such that A is larger than (say) 99 99 % of the interval Œ0; 2/ (i.e., the Lebesgue measure of A is larger than 100 2), and for every angle 2 A there is a translate H D x1 CRot H .N / of the rotated copy Rot H .N / of the hyperbolic needle H .N /—rotated by angle —such that H disk.0I M / and X

'.P / ı 0 log N;

(5.335)

P 2P\H

where ı 0 D ı 0 .; ; ı/ > 0 is a positive constant, independent of N and M . Similarly, there is another translate H D x2 C Rot H .N / of the rotated copy Rot H .N / such that H disk.0I M / and X

'.P / ı 0 log N;

P 2P\H

where ı 0 D ı 0 .; ; ı/ > 0 is the same positive constant as in (5.335).

(5.336)


We want to point out that in Proposition 5.21, which is about the $\pm 1$-discrepancy of hyperbolic needles, we definitely need some extra condition implying “$\mathcal{P}$ is not too clustered.” Indeed, it is easy to construct an extremely clustered point set $\mathcal{P}$ for which the $\pm 1$-discrepancy of the hyperbolic needles is negligible. For example, we can start with a “typical” point set in general position and split up every point into a pair of points extremely close to each other. Join the two points of each such pair by a straight line segment; we refer to these line segments as the “very short line segments.” Consider the particular 2-coloring of the point set where the two points in each extremely close pair have different “colors”: one is $+1$ and the other is $-1$. We can easily guarantee that this particular 2-coloring has negligible $\pm 1$-discrepancy for the family of all hyperbolic needles congruent to $H_\gamma(N)$. If the original point set was in general position and the point pairs are close enough, then each arc of any congruent copy of $H_\gamma(N)$ intersects at most two “very short line segments.” Since the boundary of $H_\gamma(N)$ consists of four arcs, the $\pm 1$-discrepancy is at most $4 \cdot 2 = 8$, which is indeed negligible.

5.11 The Area Principle in General

Proof of Theorem 5.7. We use the theory of continued fractions. This is of course not surprising, since the complete solution of the homogeneous inequality (5.57), or (5.18), was determined by Euler and Lagrange exactly by using the tool of continued fractions. We note in advance that the last step in the proof is an application of the Chebyshev inequality. We use the Ostrowski representation of integers with respect to any fixed irrational $0 < \alpha < 1$, given by the continued fraction

$$\alpha = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots}} = [a_1, a_2, a_3, \ldots],$$

$[a_1, a_2, \ldots, a_{k-1}] = p_k/q_k$ with $q_1 = 1$, $q_2 = a_1$, $q_n = a_{n-1} q_{n-1} + q_{n-2}$ for all $n \ge 3$. Since $q_n = a_{n-1} q_{n-1} + q_{n-2}$, every positive integer $n$ can be written in the form

$$n = \sum_{i=1}^{k} d_i q_i, \quad d_i \text{ are integers}, \tag{5.337}$$

where 0 di ai (see [Os]). An analog of the Ostrowski representation of integers can be developed for the representation of the real number ˇ. Write n D qn ˛ pn ; then n D an1 n1 C n2 :

(5.338)
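The representation (5.337) can be computed greedily from the largest denominator downward. The following is a minimal sketch, assuming the indexing $q_1 = 1$, $q_2 = a_1$, $q_n = a_{n-1}q_{n-1} + q_{n-2}$ stated above; the function names `denominators` and `ostrowski_digits` are hypothetical and chosen only for this illustration.

```python
def denominators(a):
    """Return [q_1, ..., q_K] for partial quotients a = [a_1, ..., a_K], K >= 2.

    Uses the indexing of the text: q_1 = 1, q_2 = a_1,
    q_n = a_{n-1} * q_{n-1} + q_{n-2} for n >= 3.
    """
    q = [1, a[0]]
    for n in range(3, len(a) + 1):
        q.append(a[n - 2] * q[-1] + q[-2])  # a[n-2] holds a_{n-1}
    return q


def ostrowski_digits(n, a):
    """Greedy digits [d_1, ..., d_K] with n = sum(d_i * q_i).

    Illustrative helper (not from the text): divides by q_K, q_{K-1}, ...
    in turn. Requires 0 <= n < q_{K+1} so that every digit stays <= a_i.
    """
    q = denominators(a)
    q_next = a[-1] * q[-1] + q[-2]  # q_{K+1}
    if not 0 <= n < q_next:
        raise ValueError("n must satisfy 0 <= n < q_{K+1}")
    digits = [0] * len(q)
    for i in reversed(range(len(q))):
        digits[i], n = divmod(n, q[i])
    return digits


# Example with alpha = sqrt(2) - 1 = [2, 2, 2, ...]: q = 1, 2, 5, 12, ...
a = [2] * 8
print(denominators(a))          # [1, 2, 5, 12, 29, 70, 169, 408]
print(ostrowski_digits(11, a))  # [1, 0, 2, 0, 0, 0, 0, 0], i.e. 11 = 1*1 + 2*5
```

The remainder after dividing by $q_{i+1}$ is smaller than $q_{i+1} = a_i q_i + q_{i-1}$, which is why each greedy digit satisfies $0 \le d_i \le a_i$.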



Note that n D .1/n1 jn j; and jn2 j D an1 jn1 j C jn j:

(5.339)
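A quick numerical sanity check of the alternating-sign pattern and of the recurrence (5.339) is possible for a concrete $\alpha$. The sketch below assumes $\alpha = \sqrt{2} - 1 = [2, 2, 2, \ldots]$ and takes $p_1 = 0$, $p_2 = 1$, which is what the values $\alpha$ and $a_1\alpha - 1$ of the first two quantities $q_n\alpha - p_n$ force; the variable name `eps` and the function `convergent_data` are hypothetical labels for this illustration, and floating-point arithmetic is used, so the checks hold only up to rounding.

```python
import math


def convergent_data(a):
    """Return (p, q) with p = [p_1..p_K], q = [q_1..q_K].

    Both sequences follow x_n = a_{n-1} * x_{n-1} + x_{n-2},
    started from q_1 = 1, q_2 = a_1 and (assumed) p_1 = 0, p_2 = 1.
    """
    q = [1, a[0]]
    p = [0, 1]
    for n in range(3, len(a) + 1):
        q.append(a[n - 2] * q[-1] + q[-2])
        p.append(a[n - 2] * p[-1] + p[-2])
    return p, q


alpha = math.sqrt(2) - 1
a = [2] * 12
p, q = convergent_data(a)
eps = [qn * alpha - pn for pn, qn in zip(p, q)]  # eps[n-1] = q_n*alpha - p_n

# Sign pattern: q_n*alpha - p_n has the sign (-1)^(n-1).
signs_ok = all(eps[n - 1] * (-1) ** (n - 1) > 0 for n in range(1, 13))

# Recurrence on absolute values, as in (5.339), up to rounding error.
recurrence_ok = all(
    abs(abs(eps[n - 3]) - (a[n - 2] * abs(eps[n - 2]) + abs(eps[n - 1]))) < 1e-9
    for n in range(3, 13)
)
print(signs_ok, recurrence_ok)
```

The same data can be used to spot-check the classical bound $|q_m\alpha - p_m| < 1/q_{m+1}$ that the proof invokes later.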

In the theorem we can assume without loss of generality that 0 < ˛ < 1, so 1 D ˛ > 0 and 2 D a1 ˛ 1 < 0. Now every real number ˇ in the interval ˛ ˇ < 1 ˛ of length one (any interval of length one is fine, since the theorem is about modulo one) can be written in the form ˇD

1 X

bi i ; bi are integers;

(5.340)

i D1

where $0 \le b_1 \le a_1 - 1$ and $0 \le b_i \le a_i$ for $i \ge 2$. We can make representation (5.340) unique by enforcing the Extra Rule

$$b_i = a_i \text{ implies } b_{i-1} = 0 \text{ for all } i \ge 2, \tag{5.341}$$

and we also require that

$$b_{2i+1} \ne a_{2i+1} \text{ for infinitely many } i. \tag{5.342}$$
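The digit constraints and the Extra Rule (5.341) are easy to test mechanically on finite prefixes. The sketch below is a brute-force illustration under stated assumptions: the names `is_permissible` and `count_permissible` are hypothetical, condition (5.342) is not checked because it concerns infinite sequences only, and the partial quotients are an arbitrary sample. It also confirms, for that sample, the count used later in the proof: the number of permissible prefixes $(b_1, \ldots, b_{k-1})$ equals $q_k$.

```python
from itertools import product


def denominators(a):
    """q_1 = 1, q_2 = a_1, q_n = a_{n-1} q_{n-1} + q_{n-2}; returns [q_1..q_K]."""
    q = [1, a[0]]
    for n in range(3, len(a) + 1):
        q.append(a[n - 2] * q[-1] + q[-2])
    return q


def is_permissible(b, a):
    """Check 0 <= b_1 <= a_1 - 1, 0 <= b_i <= a_i (i >= 2) and the Extra Rule
    'b_i = a_i implies b_{i-1} = 0' on the finite prefix b = (b_1, ..., b_m)."""
    if not 0 <= b[0] <= a[0] - 1:
        return False
    for i in range(1, len(b)):
        if not 0 <= b[i] <= a[i]:
            return False
        if b[i] == a[i] and b[i - 1] != 0:
            return False
    return True


def count_permissible(a, m):
    """Count permissible prefixes of length m by exhaustive enumeration."""
    ranges = [range(a[0])] + [range(a[i] + 1) for i in range(1, m)]
    return sum(is_permissible(b, a) for b in product(*ranges))


a = [2, 3, 1, 4, 2]          # sample partial quotients (an assumption)
q = denominators(a)          # [1, 2, 7, 9, 43]
for m in range(1, 5):
    # prefixes of length m = k - 1 are counted by q_k = q[m] (0-based list)
    print(m, count_permissible(a, m), q[m])
```

The agreement reflects the recursion $N(m) = a_m N(m-1) + N(m-2)$ for the number of permissible prefixes, which is the same recursion the denominators satisfy.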

Note that the minimum value of representation (5.340)–(5.342) is attained at a2 2 C a4 4 C a6 6 C : : : D .1 C 3 / C .3 C 5 / C .5 C 7 / C : : : D D 1 D ˛;

(5.343)

and similarly the maximum value of representation (5.340)–(5.342) is attained at .a1 1/1 C a3 3 C a5 5 C : : : D .a1 1/1 C .2 C 4 / C .4 C 6 / C : : : D D .a1 1/1 2 D .a1 1/˛ .1 a1 ˛/ D .1 ˛/;

(5.344)

but because of (5.342), equality in (5.344) cannot occur. This explains the interval $-\alpha \le \beta < 1 - \alpha$.

Inserted Remark. Note that representation (5.340)–(5.342) was independently introduced by Cassels [Ca2], Descombes [De], and Sós [So1], and it was used constantly by Sós in her research on the irregularities of the irrational rotation (see, e.g., [So2, So3]).



By (5.337) and (5.340) (we use to indicate equality modulo one) n˛ ˇ D

k X

di qi ˛

i D1

k X

di .qi ˛ pi /

i D1

1 X

bi i

i D1 1 X

bi .qi ˛ pi /

i D1

k X

.di bi /i

i D1

1 X

bj j .mod 1/:

(5.345)

j >k

The term $\|n\alpha - \beta\|$ is particularly small if

$$d_i = b_i \text{ for } 1 \le i \le k \tag{5.346}$$

and also

$$0 = b_{k+1} = b_{k+2} = \cdots = b_{k+\ell}, \tag{5.347}$$

meaning a relatively long zero-block of ` consecutive coefficients bj —the same idea as in Sect. 5.4. By (5.345)–(5.347) ˇ ˇ ˇ ˇ 1 ˇ ˇ X kn˛ ˇk ˇˇ bj j ˇˇ I ˇ ˇj >kC`

(5.348)

the larger `, the better inequality (5.348). First we need the technical Lemma 5.22. If bm ¤ 0 then j

1 X

bj j j bm jm j C jmC1 j:

(5.349)

j Dm

Proof. We have 0 .1/

m1 @

1 X

1 bj j A D bm jm jbmC1jmC1 jCbmC2jmC2 jbmC3jmC3 j˙

j Dm

bm jm j bmC1 jmC1 j bmC3 jmC3 j bmC5 jmC5 j

(5.350)



Since bm ¤ 0 we have bmC1 amC1 1, and using the recurrence formula (5.339): jn2 j D an1 jn1 j C jn j repeatedly, we obtain bm jm j bmC1 jmC1 j jmC1 j C jmC2 j; jmC2 j bmC3 jmC3 j jmC4 j; jmC4 j bmC5 jmC5 j jmC6 j; and so on. Applying these inequalities in (5.350), we have 0 .1/m1 @

1 X

1 bj j A .bm 1/jm j C jmC1 j:

(5.351)

j Dm

On the other hand, by a telescoping sum argument 0 .1/m1 @

1 X

1 bj j A bm jm j C bmC2 jmC2 j C bmC4 jmC4 j C

(5.352)

j Dm

bm jm j C .jmC1 j jmC3 j/ C .jmC3 j jmC5 j/ C .jmC5 j jmC7 j/ C D bm jm j C jmC1 j: Equations (5.351) and (5.352) prove Lemma 5.22.

t u

We recall the following well-known fact from the theory of continued fraction: ˇ ˇ ˇ ˇ 1 1 ˇ˛ pm ˇ < ” jm j D jqm ˛ pm j < : ˇ ˇ qm qm qmC1 qmC1

(5.353)

By Lemma 5.22 and (5.353) we have the following upper bound in (5.348): kn˛ ˇk
r C 1:

(5.370)

Now we are ready to prove (5.369) by induction on .t r/. We have Ar;t j .B/ D qt j C1 jr j C .1/t j r qr jt j C1j for both j D 1; 2, and returning to (5.370), we conclude Ar;t .B/ D at .qt jr j C .1/t 1r qr jt j/ C qt 1 jr j C .1/t 2r qr jt 1 j D D jr j.at qt C qt 1 / C .1/t r qr .at jt j C jt 1 j/ D D qt C1 jr j C .1/t r qr jt C1 j; proving (5.369), and this completes the proof of Lemma 5.24.

t u

Now it is easy to compute the measure of the intersection (5.368). First assume that the distance d D k2 k1 `1 1 is 1. We know from the proof of Lemma 5.23 that the number of permissible sequences .b1 ; b2 ; : : : ; bk1 1 / satisfying (5.340)– (5.342) is qk1 if bk1 D B1 ¤ ak1 and qk1 1 if bk1 D B1 D ak1 . By Lemma 5.24 the number of permissible sequences .bk1 C`1 C1 D B2 ¤ 0; bk1 C`1 C2 ; : : : ; bk2 1 / of length d is qk2 jk1 C`1 C1 j C .1/d C1 qk1 C`1 C1 jk2 j if bk2 D B3 ¤ ak2 and qk2 1 jk1 C`1 C1 j C .1/d qk1 C`1 C1 jk2 1 j if bk2 D B3 D ak2 :



Finally, note that, just like in Lemma 5.23, the tail series 1 X

b i i

i Dk2 C`2 C2

completely fills out an interval of length jk2 C`2 C1 j. Write X D S.bk1 D B1 ; bk1 C`1 C1 D B2 /

(5.371a)

Y D S.bk2 D B3 ; bk2 C`2 C1 D B4 /:

(5.371b)

and

Lemma 5.25. We have jmeas.X \ Y / meas.X /meas.Y /j 22d ; meas.X /meas.Y / where d D k2 .k1 C `1 C 1/ 1 is the “distance”. Proof. We distinguish four cases. We begin with Case 1: Assume that d D k2 k1 `1 1 is 1, B1 ¤ ak1 , B3 ¤ ak2 Then we have meas.X \ Y / D qk1 qk2 jk1 C`1 C1 j C .1/d C1 qk1 C`1 C1 jk2 j jk2 C`2 C1 j: On the other hand, by Lemma 5.23, meas.X / D qk1 jk1 C`1 C1 j and meas.Y / D qk2 jk2 C`2 C1 j: It follows that qk C` C1 jk2 j jmeas.X \ Y / meas.X /meas.Y /j D 1 1 : meas.X /meas.Y / qk2 jk1 C`1 C1 j

(5.372)

We need the almost trivial inequality

$$\frac{q_{i+d}}{q_i} \ge 2^{\lfloor d/2 \rfloor},$$

which follows from the successive application of the recurrence

$$q_i = a_{i-1} q_{i-1} + q_{i-2} \ge q_{i-1} + q_{i-2} \ge 2 q_{i-2}, \tag{5.373a}$$
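The growth estimate can be checked exhaustively on any finite list of partial quotients. The following sketch uses exact integer arithmetic; the function names and the sample lists are assumptions made for illustration, and the slowest-growing case (all $a_i = 1$, Fibonacci-type denominators) is included because it is the one where the bound is tightest.

```python
def denominators(a):
    """q_1 = 1, q_2 = a_1, q_n = a_{n-1} q_{n-1} + q_{n-2}; returns [q_1..q_K]."""
    q = [1, a[0]]
    for n in range(3, len(a) + 1):
        q.append(a[n - 2] * q[-1] + q[-2])
    return q


def growth_holds(a):
    """True if q_{i+d} >= 2^floor(d/2) * q_i for every admissible pair (i, d)."""
    q = denominators(a)
    return all(
        q[i + d] >= 2 ** (d // 2) * q[i]
        for i in range(len(q))
        for d in range(len(q) - i)
    )


print(growth_holds([1] * 20))                   # Fibonacci-type denominators
print(growth_holds([3, 1, 4, 1, 5, 9, 2, 6]))   # arbitrary sample quotients
```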



and we also need the following analog of (5.373a): ji j 2bd=2c : ji Cd j

(5.373b)

By (5.372) and (5.373), we have jmeas.X \ Y / meas.X /meas.Y /j 21d ; meas.X /meas.Y /

(5.374)

where d D k2 .k1 C `1 C 1/ 1 is the “distance.” Inequality (5.374) justifies the term exponentially weak dependence, which is the reason behind the Area Principle (a “zero–one law”). Case 2: Assume that d D k2 .k1 C `1 C 1/ 1, B1 D ak1 , B3 D ak2 Then [see (5.371)] meas.X \ Y / D qk1 1 qk2 1 jk1 C`1 C1 j C .1/d qk1 C`1 C1 jk2 1 j jk2 C`2 C1 j; and by Lemma 5.23, meas.X / D qk1 1 jk1 C`1 C1 j and meas.Y / D qk2 1 jk2 C`2 C1 j: Combining these facts with (5.373), we obtain qk C` C1 jk2 1 j jmeas.X \ Y / meas.X /meas.Y /j D 1 1 22d ; meas.X /meas.Y / qk2 1 jk1 C`1 C1 j

(5.375)

which is basically the same as (5.374) (we lost an irrelevant factor of 2). It is easy to check that (5.375) remains true for the remaining two cases with d 1: Case 3: B1 ¤ ak1 , B3 D ak2 and Case 4: B1 D ak1 , B3 ¤ ak2 . In all four cases we have exponentially weak dependence. This completes the proof of Lemma 5.25. t u Now we are ready to complete the proof of Theorem 5.7: we simply use the exponentially weak dependence in a Chebyshev’s inequality as follows. (The most difficult part is to find a good notation.) Let k;`;B1 ;B2 denote the characteristic function of the set S.bk D B1 ; bkC`C1 D B2 / defined by (5.357)–(5.359): ( k;`;B1 ;B2 .ˇ/ D

1; if ˇ 2 S.bk D B1 ; bkC`C1 D B2 /I 0; if ˇ 62 S.bk D B1 ; bkC`C1 D B2 /:

We have a probabilistic viewpoint: the interval ˛ ˇ < 1 ˛ of length one is considered the whole probability space, and the usual “length” (one-dimensional Lebesgue measure), denoted by meas.: : :/, is the probability. So the expectation



E k;`;B1 ;B2 D meas .S.bk D B1 ; bkC`C1 D B2 // ; and the sum [see (5.356)] X

k;`;B1 ;B2 .ˇ/

(5.376)

m0 kM 2

counts the number of integral solutions of the diophantine inequality kn˛ ˇk D O. .n//

(5.377)

(the implicit constant in (5.377) is absolute) in the range 1 n qM , since by (5.361) B1 qk n .B1 C 2/qk qM : Here M is a parameter; we choose M ! 1 at the end of the proof. To apply Chebyshev’s inequality, we need to compute the variance 0 E@

X

12 . k;`;B1 ;B2 E k;`;B1 ;B2 /A D

m0 kM 2

D

X

. k;`;B1 ;B2 E k;`;B1 ;B2 /2 C

m0 kM 2

C2

X

E. k1 ;`1 ;B1 ;B2 E1 /. k2 ;`2 ;B3 ;B4 E2 /;

(5.378)

m0 k1 k1 C `1 C 1 or k2 D k1 C `1 C 1; B2 D B3 :



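By (5.374) and (5.375)

\[
\bigl| E\,\chi_{A_1}\chi_{A_2} - \Pr(A_1)\Pr(A_2) \bigr| \le 2^{2-d}\,\Pr(A_1)\Pr(A_2), \tag{5.380}
\]

where $d = k_2 - (k_1+\ell_1+1) \ge 1$. Using these facts in (5.378), we have

\[
\text{Variance in (5.378)} \le \sum_{m_0 \le k_1 \le M-2} \Pr(A_1) + {\sum}_1 + {\sum}_2, \tag{5.381}
\]

where

\[
{\sum}_1 = \sum_{A_1:\ m_0 \le k_1 \le M-2}\ \ \sum_{\substack{A_2:\ k_1+\ell_1+1 = k_2 \le M-2\\ B_2 = B_3}} \Pr(A_1 \cap A_2) \tag{5.382}
\]

and [see (5.380)]

\[
{\sum}_2 = \sum_{A_1:\ m_0 \le k_1 \le M-2} \Pr(A_1) \Bigl( \sum_{d \ge 1}\ \sum_{A_2:\ k_1+\ell_1+1+d = k_2 \le M-2} \Pr(A_2)\, 2^{2-d} \Bigr). \tag{5.383}
\]

Since the sets $A_2$ with fixed $k_2$ are pairwise disjoint, we have

\[
{\sum}_1 \le \sum_{m_0 \le k_1 \le M-2} \Pr(A_1), \tag{5.384}
\]

and similarly

\[
{\sum}_2 \le \sum_{m_0 \le k_1 \le M-2} \Pr(A_1) \Bigl( \sum_{d \ge 1} 2^{2-d} \Bigr) = 4 \sum_{m_0 \le k_1 \le M-2} \Pr(A_1). \tag{5.385}
\]

Combining (5.381)–(5.385) we obtain

\[
\text{Variance in (5.378)} \le 6 \sum_{m_0 \le k_1 \le M-2} \Pr(A_1). \tag{5.386}
\]

By Chebyshev’s inequality and (5.386), for any $\lambda > 0$

\[
\Pr\Bigl[\ \Bigl| \sum_{m_0 \le k_1 \le M-2} \chi_{A_1} - \sum_{m_0 \le k_1 \le M-2} \Pr(A_1) \Bigr| \ge \lambda \Bigr]
\le \lambda^{-2} \Bigl( 6 \sum_{m_0 \le k_1 \le M-2} \Pr(A_1) \Bigr). \tag{5.387}
\]

Write

\[
T = T(M) = \sum_{m_0 \le k_1 \le M-2} \Pr(A_1);
\]

then by (5.367) and (5.379),

\[
T = T(M) \to \infty \quad \text{as } M \to \infty. \tag{5.388}
\]

We choose

\[
\lambda = \lambda(M) = \tfrac{1}{2}\, T(M);
\]

then by (5.387),

\[
\Pr\Bigl[\ \sum_{m_0 \le k_1 \le M-2} \chi_{A_1} \ge \tfrac{1}{2}\, T(M) \Bigr] \ge 1 - \frac{24}{T(M)}. \tag{5.389}
\]

Taking $M \to \infty$, by (5.388) and (5.389) we obtain

\[
\sum_{k \ge m_0} \chi_{k,\ell,B_1,B_2}(\beta) = \sum_{k \ge m_0} \chi_{A_1} = \infty
\]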

for almost every ˇ 2 Œ˛; 1 ˛/, and by (5.376) and (5.377) this gives infinitely many integral solutions of the diophantine inequality kn˛ ˇk D O. .n//:

(5.390)

Since the implicit constant in (5.390) is absolute, the proof of Theorem 5.7 is complete. □
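The constants in the Chebyshev step can be checked with a few lines of exact arithmetic. The following Python sketch is an illustration only and is not part of the original text; the function names and the variable `lam` (standing for the Chebyshev threshold) are hypothetical. It sums the geometric weights $2^{2-d}$ that produce the factor 4 in (5.385), and evaluates the bound $6T/(T/2)^2 = 24/T$ behind (5.389), which tends to 0 as $T(M) \to \infty$.

```python
from fractions import Fraction


def dependence_weight_sum(terms: int) -> Fraction:
    """Partial sum of 2^(2-d) for d = 1..terms; the full series equals 4."""
    return sum(Fraction(4, 2 ** d) for d in range(1, terms + 1))


def chebyshev_failure_bound(T: Fraction) -> Fraction:
    """Chebyshev bound Var / lam^2 with Var <= 6*T and lam = T/2, i.e. 24/T."""
    variance_bound = 6 * T   # bound (5.386), with T the sum of the Pr(A_1)
    lam = T / 2              # threshold chosen in the proof
    return variance_bound / lam ** 2


if __name__ == "__main__":
    print("sum of 2^(2-d), d = 1..30:", float(dependence_weight_sum(30)))
    for T in (Fraction(48), Fraction(480), Fraction(4800)):
        print("T =", T, " failure probability <=", chebyshev_failure_bound(T))
```

The bound is only informative once $T(M) > 24$, which is why the proof needs $T(M) \to \infty$.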

Chapter 6

More on Randomness

6.1 Starting the Proof of Theorem 5.4: Blocks-and-Gaps Decomposition

We recall [see (5.31)] that $F(\sqrt{2};\beta;\gamma;N)$ denotes the number of lattice points in the long and narrow hyperbola segment (“hyperbolic needle”) located along the line $y = (x+\beta)/\sqrt{2}$ of slope $1/\sqrt{2}$:

\[
H_\gamma(\sqrt{2};\beta;N) = \bigl\{ (x,y) \in \mathbb{Z}^2 :\ -\gamma \le (x+\beta)^2 - 2y^2 \le \gamma,\ 0 < y \le N,\ x > 0 \bigr\}. \tag{6.1}
\]

In the special case $\beta = 0$ the line is $y = x/\sqrt{2}$ passing through the origin, and we simply write

\[
H_\gamma(\sqrt{2};N) = H_\gamma(\sqrt{2};0;N) = \bigl\{ (x,y) \in \mathbb{Z}^2 :\ -\gamma \le x^2 - 2y^2 \le \gamma,\ 0 < y \le N,\ x > 0 \bigr\}. \tag{6.2}
\]

In Theorem 5.4 we study the case where $\beta$ runs in the unit interval $0 \le \beta < 1$; then

\[
F(\sqrt{2};\beta;\gamma;N) = \bigl| \bigl( H_\gamma(\sqrt{2};N) - v(\beta) \bigr) \cap \mathbb{Z}^2 \bigr|, \tag{6.3}
\]

where we use the standard notation that $S + v$ means the translated copy of a set $S$, translated by the vector $v$, and in (6.3) the vector is $v(\beta) = (\beta, 0)$.

We also recall the well-known fact that the set of all positive integral solutions $(p_i, q_i) \in \mathbb{Z}^2$ of Pell’s equation $x^2 - 2y^2 = \pm 1$ forms a cyclic group generated by the least positive solution; formally,

\[
p_i \pm q_i \sqrt{2} = (1 \pm \sqrt{2})^i, \quad i \ge 0,
\]



where all positive integral solutions of $x^2 - 2y^2 = 1$ are given by the even indices,

\[
p_{2i} \pm q_{2i} \sqrt{2} = (1 \pm \sqrt{2})^{2i},
\]

and all of $x^2 - 2y^2 = -1$ by the odd indices,

\[
p_{2i+1} \pm q_{2i+1} \sqrt{2} = (1 \pm \sqrt{2})^{2i+1}.
\]

It follows that

\[
p_i = \frac{1}{2} \Bigl( (1+\sqrt{2})^i + (1-\sqrt{2})^i \Bigr)
\quad \text{and} \quad
q_i = \frac{1}{2\sqrt{2}} \Bigl( (1+\sqrt{2})^i - (1-\sqrt{2})^i \Bigr), \tag{6.4}
\]

and in particular we have $(p_0, q_0) = (1, 0)$, $(p_1, q_1) = (1, 1)$, $(p_2, q_2) = (3, 2)$, $(p_3, q_3) = (7, 5)$, and so on.

For every integer $i \ge 0$ we define a “hyperbolic triangle” $T_i = T_i(\gamma) = T_i(\sqrt{2};\gamma)$ as follows. Let $L_i$ denote the half line starting from the origin $(0,0)$ and passing through the lattice point $(p_i, q_i)$. The “hyperbolic triangle” $T_i = T_i(\gamma)$ is bounded by the lines $L_i$, $L_{i+2}$ and the hyperbola $x^2 - 2y^2 = -\gamma$ in the positive quadrant if $i \ge 1$ is odd, and bounded by the lines $L_i$, $L_{i+2}$ and the hyperbola $x^2 - 2y^2 = \gamma$ in the positive quadrant if $i \ge 0$ is even. This means that $T_i = T_i(\gamma)$ is below or above the line $y = x/\sqrt{2}$ depending on whether $i \ge 0$ is even or odd. Note that $T_i = T_i(\gamma)$ has vertices $(0,0)$, $(p_i, q_i)$, and $(p_{i+2}, q_{i+2})$.

We also use the fact that the matrix $A = \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}$ is a fundamental automorphism of $\pm(x^2 - 2y^2)$ (indeed, $x_1^2 - 2y_1^2 = (x+2y)^2 - 2(x+y)^2 = -(x^2 - 2y^2)$), and

\[
A^i = \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}^{i}, \quad i \in \mathbb{Z},
\]

give rise to infinitely many automorphisms preserving the lattice points and the area. In particular, we have

\[
A \begin{pmatrix} p_i \\ q_i \end{pmatrix}
= \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} p_i \\ q_i \end{pmatrix}
= \begin{pmatrix} p_{i+1} \\ q_{i+1} \end{pmatrix},
\]

which implies $A T_i = T_{i+1}$. Thus we have $T_i = A^i T_0$, and in general $A^j T_i = T_{i+j}$. The matrix $A$ has determinant $-1$ (explaining why it preserves the area), and all hyperbolic triangles have the same area

\[
\operatorname{area}\bigl(T_i(\gamma)\bigr) = \frac{\gamma \log(1+\sqrt{2})}{\sqrt{2}}. \tag{6.5}
\]
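The recurrence and the closed form are easy to check by machine. The following Python sketch is illustrative only and not from the original text; all function names are hypothetical. It generates the pairs $(p_i, q_i)$ by iterating the matrix $A$, confirms that $p_i^2 - 2q_i^2$ alternates between $+1$ and $-1$, compares the result with the closed form (6.4), and evaluates the area formula (6.5).

```python
import math


def pell_pairs(count):
    """Return (p_i, q_i) for i = 0..count-1 by iterating A = [[1, 2], [1, 1]]."""
    pairs = []
    p, q = 1, 0  # (p_0, q_0)
    for _ in range(count):
        pairs.append((p, q))
        p, q = p + 2 * q, p + q  # one application of the matrix A
    return pairs


def closed_form(i):
    """Evaluate formula (6.4) in floating point and round to integers."""
    s = math.sqrt(2.0)
    a, b = (1 + s) ** i, (1 - s) ** i
    return round((a + b) / 2), round((a - b) / (2 * s))


def triangle_area(gamma):
    """Common area gamma * log(1 + sqrt(2)) / sqrt(2) of the hyperbolic triangles."""
    return gamma * math.log(1 + math.sqrt(2)) / math.sqrt(2)


if __name__ == "__main__":
    for i, (p, q) in enumerate(pell_pairs(8)):
        print(i, (p, q), "p^2 - 2q^2 =", p * p - 2 * q * q, "closed form:", closed_form(i))
    print("area of T_i(1):", triangle_area(1.0))
```

The floating point comparison is only reliable for moderate $i$; the integer recurrence is exact for all $i$.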


What we are interested in is the one-dimensional family of translations by the vectors $v(\beta) = (\beta, 0)$, $0 \le \beta < 1$ [see (6.3)]; nevertheless it turns out to be very useful to involve an extra dimension, and to study translations by all two-dimensional vectors $v \in \mathbb{R}^2$ (so we can take advantage of the rich geometry of the plane). This explains why we focus on the lattice point counting function

\[
f_i(v) = \bigl| (T_i - v) \cap \mathbb{Z}^2 \bigr|, \quad v \in \mathbb{R}^2, \tag{6.6}
\]

where $T_i = T_i(\gamma)$. Since $\mathbb{Z}^2$ is periodic, the function $f_i(v)$ is defined on the unit torus $v \in [0,1)^2 = \mathbb{R}^2/\mathbb{Z}^2$. The fact $A^j T_i = T_{i+j}$ implies that

\[
f_i(v), \quad v \in [0,1)^2, \quad i = 0, 1, 2, 3, \ldots,
\]

is a stationary sequence. This term in probability theory means that the joint cumulative distribution is invariant under the time shift, which in this special case is equivalent to

\[
\operatorname{area}\bigl\{ v \in [0,1)^2 :\ f_i(v) \le a_0,\ f_{i+1}(v) \le a_1,\ \ldots,\ f_{i+\ell}(v) \le a_\ell \bigr\}
= \operatorname{area}\bigl\{ v \in [0,1)^2 :\ f_{j+i}(v) \le a_0,\ f_{j+i+1}(v) \le a_1,\ \ldots,\ f_{j+i+\ell}(v) \le a_\ell \bigr\}
\]

for all integers $i, \ell \ge 0$, $j \ge 1$ and reals $a_0, a_1, \ldots, a_\ell$, where $j$ is the time shift.

Classical probability theory is mainly about independent random variables. The study of mixing stationary processes in discrete (and continuous) time came up later as a natural extension of independent identically distributed random variables. It is well known since the 1960s (or perhaps even earlier) that a discrete stationary process with exponentially fast mixing exhibits a central limit theorem (CLT). Exponentially fast mixing in our special case would mean the following:

\[
\sup_{\substack{(E_1,E_2)\ \text{with time gap } j:\\ \Pr[E_1] > 0}} \bigl| \Pr[E_2 \mid E_1] - \Pr[E_2] \bigr| \le c^{-j} \quad \text{with some } c > 1 \text{ for all } j \ge 1, \tag{6.7}
\]

where the pair $(E_1, E_2)$ runs through all possible events of the form

\[
E_1 = \bigl\{ v \in [0,1)^2 :\ f_i(v) \le a_0,\ f_{i+1}(v) \le a_1,\ \ldots,\ f_{i+\ell}(v) \le a_\ell \bigr\},
\]
\[
E_2 = \bigl\{ v \in [0,1)^2 :\ f_{j+i}(v) \le a_0,\ f_{j+i+1}(v) \le a_1,\ \ldots,\ f_{j+i+\ell}(v) \le a_\ell \bigr\}
\]

with time gap $j$, and of course

\[
\Pr[E_2 \mid E_1] = \frac{\Pr[E_1 \cap E_2]}{\Pr[E_1]}
\]

denotes the conditional probability with Pr = area = two-dimensional Lebesgue measure.



Unfortunately we cannot prove (6.7) (it may be false). This means we don’t see any shortcut way to prove our CLT (Theorem 5.4) by directly applying some existing result in probability theory. What we can prove is the weaker version of (6.7):

\[
\bigl| \Pr[E_2 \mid E_1] - \Pr[E_2] \bigr| \le c^{-j} \quad \text{with some } c > 1 \text{ for all } j \ge 1 \tag{6.8}
\]

holds for the “majority” of the pairs E1 ; E2 of events with PrŒE1 > 0 and time gap j . We refer to (6.8) as “exponentially fast majority mixing.” Unfortunately it is a long, nontrivial technical task to make “exponentially fast majority mixing” precise, and to derive from it a CLT. To do so, we borrow a decomposition technique from probability theory. It goes back to the works of S.N. Bernstein in the 1920s; we call it a “blocks-and-gaps” decomposition. Sections 6.1 and 6.2 are about the application of this method. We summarize the results of this method at the beginning of Sect. 6.3 in Lemma 6.3. (A reader in rush may jump ahead to Lemma 6.3 right now.) Another idea is to employ “Rademacher like functions”. Let 0 r0 < r1 < r2 < r3 < : : : be an arbitrary sequence of integers. A sequence '1 .x/; '2 .x/; '3 .x/; : : : of functions defined on the unit interval 0 x < 1 is called a sequence of Rademacher like functions of type 0 r0 < r1 < r2 < r3 < : : : if the following two properties hold: 1. 'j .x/ is a step function such that it is constant on every subinterval a2rj x < .a C 1/2rj , 0 a < 2rj integer, j 1; 2. the distribution of 'j .x/ on the longer subinterval a2rj 1 x < .a C 1/2rj 1 is independent of the value of a, where 0 a < 2rj 1 integer. It is obvious from the definition that a sequence of Rademacher like functions forms a sequence of independent random variables. Let 0 1 < 2 be arbitrary integers, and consider the lattice point counting function representing a “block” [see (6.6)] f .1 ; 2 I v/ D

X 1 i 2

fi .v/ D

X ˇ ˇ ˇ.Ti v/ \ ZZ2 ˇ ; v 2 R I 2;

(6.9)

1 i 2

where Ti D Ti . /. Since ZZ2 is periodic, the function f .1 ; 2 I v/ is actually defined on the unit torus v 2 Œ0; 1/2 D R I 2 =ZZ2 . p p 12 has eigenvalues 1 C 2 and 1 2; the eigenvector The matrix A D 11 p p . 2; 1/ of 1 C 2 represents the magnifying p p direction for the positive powers of A, and the eigenvector . 2; 1/ of 1 2 represents the “shrinking” direction. 2 The magnifying direction explains why we tilt the p half-open unit square Œ0; 1/ in such a way that the vertical side has slope 1= p2, thatp is, we consider the halfopen parallelogram with vertices .0; 0/; .1; 0/; . 2; 1/; . 2 C 1; 1/; let P0 denote this half-open parallelogram. Notice that P0 is equivalent to the unit square Œ0; 1/2 modulo one, i.e., the distribution of (6.9) is exactly the same as that of


f .1 ; 2 I v/ D

X

fi .v/ D

1 i 2

X ˇ ˇ ˇ.Ti v/ \ ZZ2 ˇ ; v 2 P0 ;


(6.10)

1 i2

where the longer sides of the parallelogram P0 are parallel to the magnifying 12 direction of matrix A D . 11 Given integers r 0 and 0 a < 2r , let P0 .rI p a/ denote thephalf-open parallelogram with vertices .a2r ; 0/, ..a C 1/2r ; 0/, . 2 C a2r ; 1/, . 2 C .a C 1/2r ; 1/. Notice that P0 is the disjoint union of P0 .rI a/, 0 a < 2r . We say that an interval .a2r ; .a C 1/2r / is 0-robust with respect to the lattice point counting function f .1 ; 2 I v/ if f .1 ; 2 I v/ is constant on the parallelogram v 2 P0 .rI a/. For later application we introduce now a generalization of the concept of 0robust intervals. Let s 0 be an arbitrary integer, p and let P ps denote the half-open parallelogram with vertices .0; 0/, .1; 0/, .2s 2; 2s /, .2s 2 C 1; 2s /. Again let r 0, 0 a < 2r be integers, and let Ps .rI parallelogram p a/ denote the half-open p with vertices .a2r ; 0/, ..aC1/2r ; 0/, .2s 2Ca2r ; 2s /, .2s 2C.aC1/2r ; 2s /. Notice that Ps is the disjoint union of Ps .rI a/, 0 a < 2r . We say that an interval .a2r ; .a C 1/2r / is s-robust with respect to the lattice point counting function f .1 ; 2 I v/ if f .1 ; 2 I v/ is constant on the parallelogram v 2 Ps .rI a/. If fi .v/ is constant on the parallelogram Ps .rI a/ for every 1 i 2 then of course f .1 ; 2 I v/ is also constant on the parallelogram Ps .rI a/. Let Ps;0 .r/ denote the parallelogram satisfying the following three properties: 1. Ps;0 .r/ is centered at the origin; 2. Ps;0 .r/ has two horizontal sides of length 2rC1 on the lines y D 2s and y D 2s ; p 3. the other two sides have slope 1= 2. Let 2Ps;0 .r/ D f2x W x 2 Ps;0 .r/g denote the twice as large magnified copy of Ps;0 .r/. Let i be an integer with 1 i 2 . We define the Ps;0 .r/-neighborhood of the boundary curve @Ti of the hyperbolic triangle Ti D Ti . / as follows (@ denotes the boundary) Ps;0 .r/-neighborhood-of-@Ti D fx C y W x 2 @Ti and y 2 Ps;0 .r/g :

(6.11)

If the translated copy .Ps;0 .r/-neighborhood-of-@Ti / .a2r ; 0/ of (6.11) [translated by the vector .a2r ; 0/] does not contain a lattice point 2 ZZ2 , then fi .v/ is clearly constant on the parallelogram Ps .rI a/. It follows that if .Ps;0 .r/-neighborhood-of-@Ti / .a2r ; 0/ does not contain a lattice point 2 ZZ2 for any 1 i 2 , then f .1 ; 2 I v/ is constant on the parallelogram Ps .rI a/.



We clearly have 2r

r 1 2X

ˇ ˇ ˇ..Ps;0 .r/-neighborhood-of-@Ti / .a2r ; 0// \ ZZ2 ˇ

aD0

Z

ˇ ˇ ˇ..2Ps;0 .r/-neighborhood-of-@Ti / v/ \ ZZ2 ˇ d v D

v2Ps

Z D v2Ai Ps

ˇ ˇ ˇ..2Ps;0 .r/-neighborhood-of-@Ti / v/ \ ZZ2 ˇ d v;

(6.12)

12 is measure-preserving, and of where we used the fact that the matrix A D 11 course 2Ps;0 .r/-neighborhood-of-@Ti means that in (6.11) we replace Ps;0 .r/ with the twice as large copy 2Ps;0 .r/. We have Z ˇ ˇ ˇ..2Ps;0 .r/-neighborhood-of-@Ti / v/ \ ZZ2 ˇ d v D v2Ai Ps

Z D v2Ai Ps

Z D

ˇ i ˇ ˇA ..2Ps;0 .r/-neighborhood-of-@Ti / v/ \ ZZ2 ˇ d v D

ˇ i ˇ ˇ A .2Ps;0 .r// -neighborhood-of-@T0 w \ ZZ2 ˇ d w;

(6.13)

w2Ps

since Ai Ti D Ai Ai T0 D T0 , where T0 D T0 . / is the hyperbolic triangle with vertices .0; 0/; .; 0/; .3; 2 /. We say that a lattice point n 2 ZZ2 is relevant in equation (6.13) if n 2 Ai .2Ps;0.r// -neighborhood-of-@T0 w holds for some w 2 Ps : (6.14) i The sides of .2Ps;0 .r// are parallel to the magnifying p the parallelogram A sC4 p that eigenvector . 2; 1/ p have length 2 .1 C 2/i and the other two sides have rC4 length 2 .1 C 2/i . Combining this with (6.14), we obtain that there are less than

p p

104 1 C 2s .1 C 2/i 1 C 2r .1 C 2/i .1 C 2 / (6.15) lattice points that are relevant in equation (6.13) [see (6.14)].



Similarly, we obtain the trivial upper bound area of Ai .2Ps;0 .r// -neighborhood-of-@T0 p p

104 2s .1 C 2/i C 2r .1 C 2/i :

(6.16)

Combining the trivial fact [see (6.14)] n 2 Ai .2Ps;0.r// -neighborhood-of-@T0 w ” ” w 2 Ai .2Ps;0 .r// -neighborhood-of-@T0 n with Fubini’s theorem (“continuous double counting”), we obtain the upper bound Z

ˇ i ˇ ˇ A .2Ps;0.r// -neighborhood-of-@T0 w \ ZZ2 ˇ d w w2Ps

Œnumber of relevant lattice points in (6.13) AREA;

(6.17)

AREA D area of Ai .2Ps;0.r// -neighborhood-of-@T0 :

(6.18)

where

Combining (6.12)–(6.18), we have 2r

r 1 2X

ˇ ˇ ˇ..Ps;0 .r/-neighborhood-of-@Ti / .a2r ; 0// \ ZZ2 ˇ 108 .1 C 2 /

aD0

p p p p

1 C 2s .1 C 2/i 1 C 2r .1 C 2/i 2s .1 C 2/i C 2r .1 C 2/i : (6.19) Switching to the union set [

Ti ;

1 i2

by (6.19) we obtain 2r

ˇ ˇ ˇ ˇ

r 1 2X ˇ

aD0

2

r

r 1 2X

Ps;0 .r/-neighborhood-of-@

[ 1 i 2

! Ti

ˇ ˇ ˇ .a2r ; 0/ \ ZZ2 ˇ ˇ !

X ˇ ˇ ˇ..Ps;0 .r/-neighborhood-of-@Ti / .a2r ; 0// \ ZZ2 ˇ

aD0 1 i 2



108 .1 C 2 /

X p 1 C 2s .1 C 2/i 1 i 2

p p p

1 C 2r .1 C 2/i 2s .1 C 2/i C 2r .1 C 2/i

p 108 .1 C 2 /.2 1 C 1/ 1 C 2s .1 C 2/1

p p p 1 C 2r .1 C 2/2 2s .1 C 2/1 C 2r .1 C 2/2 :

(6.20)

Trivial geometric consideration gives that if Ps;0 .r/-neighborhood-of-@

[

! .a2r ; 0/

Ti

1 i 2

does not contain a lattice point 2 ZZ2 then f .1 ; 2 I v/ is constant on the parallelogram Ps .rI a/. Combining this with (6.20) we obtain that there are at most

p 2r 108 .1 C 2 /.2 1 C 1/ 1 C 2s .1 C 2/1

p p p 1 C 2r .1 C 2/2 2s .1 C 2/1 C 2r .1 C 2/2 integers a with 0 a < 2r such that the set Ps;0 .r/-neighborhood-of-@

[

! Ti

.a2r ; 0/

1 i 2

contains a lattice point. This proves the following lemma. Lemma 6.1. There are at most

p 2r 108 .1 C 2 /.2 1 C 1/ 1 C 2s .1 C 2/1

p p p 1 C 2r .1 C 2/2 2s .1 C 2/1 C 2r .1 C 2/2 integers a in 0 a < 2r such that the interval .a2r ; .a C 1/2r / is not s-robust with respect to the lattice point counting function f .1 ; 2 I v/. Now we are ready to start the “blocks-and-gaps” decomposition and to define our Rademacher like functions. We proceed by induction.



Let B1 D

[

[

Ti and B2 D

` 1 we have Z .z/ D

1

cos.x/ sin.z=x/ dx D

0

p p p D p z1=4 sin.2 z/ C cos.2 z/ C O.z1=24 / 2 2

(6.93)

and Z

1

‰.z/ D

sin.x/ sin.z=x/ dx D 0

p p p D p z1=4 sin.2 z/ cos.2 z/ C O.z1=24 / ; 2 2

(6.94)

and finally for 0 < z 1 we have p p j.z/j 3 z and j‰.z/j 2 z:

(6.95)

We postpone the proof of Lemma 6.5 to Sects. 6.6 and 6.7. By Lemma 6.5, 2 . 2 n=2/ C ‰ 2 . 2 n=2/ D O . n/1=2 C O.1/:

(6.96)



Also we use the well-known number-theoretic fact that the divisor function is relatively small: .n/ D O.n" / for any " > 0. Combining this with (6.92) and (6.96), we have 1 X R˙ .n/ 2 2 . n=2/ C ‰ 2 . 2 n=2/ D 2 n nD1

! ! 1 1 X X p n"3=2 C O.1/ n"2 D D O. / nD1

nD1

p D O. / C O.1/;

(6.97)

proving the boundedness of series (6.89). By (6.89), the “variance constant” 2 D 2 . / is a sum of infinitely many terms 0, but this fact alone does not guarantee that 2 > 0, and it is even less clear why 2 D 2 . / cannot be “extremely close to zero.” The following lemma settles this issue. Lemma 6.6. There are absolute constants 0 < c1 < c2 (independent of ) such that c1 < 2 . / < c2 for all 0 < 1 and p p c1 < 2 . / < c2 for all > 1: Moreover, we have the asymptotic formula 1 X 2 . / R˙ .n/ 2 p : p D p !1 2 log.1 C 2/ nD1 n3=2

lim

We postpone the proof of Lemma 6.6 to Sects. 6.6 and 6.7. The following lemma is the link between Lemmas 6.3 and 6.4. Let Eˆh denote the expectation of the random variable ˆh .v/, v 2 P0 ; formally, Z ˆh .v/ d v: (6.98) Eˆh D P0

Similarly, let Eˆh denote the expectation of the random variable ˆh .v/, v 2 P0 . Write ˆh;0 D ˆh Eˆh and ˆh;0 D ˆh Eˆh :

(6.99)

6.3 Estimating the Variance


Lemma 6.7. Under the condition of Lemma 6.3, we have (using the same notation) ˇ ˇ !1=2 ˇ b

1=2 ˇˇ X p ˇ 2 ˇ Variance ˇ ˆh .v/ C ˆh .v/ ./b.k C 3`/ log.1 C 2/ C O.1/ ˇ ˇ ˇ ˇ hD1 104 .1 C 2 / C b".I k; `/104 .1 C 2 /.2 1 C 1/ C

p b".I k; `/104 .1 C 2 /.2 1 C 1/;

where ".I k; `/ is defined in (6.83). Similarly, ˇ

1=2 ˇˇ p ˇ ˇ Varv2P ˆh .v/ 1=2 2 . /3` log.1 C 2/ C O.1/ ˇ 0 ˇ ˇ 104 .1 C 2 / C ".I k; `/104 .1 C 2 /3` C

p ".I k; `/104 .1 C 2 /3`:

Finally, we have ˇ !ˇ b ˇ X ˇˇ ˇ ˆh .v/ C ˆh .v/ ˇ ".I k; `/104 .1 C 2 /b 2 .k C 3`/: ˇEv2P0 f .1 ; 2 I v/ ˇ ˇ hD1

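Proof of Lemma 6.7. By Lemma 5.8 and (6.5),

\[
\int_{P_0} f_i(v)\, dv
= \int_{P_0} \bigl| (T_i(\gamma) - v) \cap \mathbb{Z}^2 \bigr|\, dv
= \int_{[0,1)^2} \bigl| (T_i(\gamma) - v) \cap \mathbb{Z}^2 \bigr|\, dv
= \operatorname{area}\bigl(T_i(\gamma)\bigr)
= \frac{\gamma \log(1+\sqrt{2})}{\sqrt{2}}. \tag{6.100}
\]

Equation (6.100) means that the random variable $f_i(v)$, $v \in P_0$, has expectation

\[
E f_i = \frac{\gamma \log(1+\sqrt{2})}{\sqrt{2}}. \tag{6.101}
\]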
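Identity (6.100) can be sanity-checked numerically for $T_0(\gamma)$. The following Python sketch is an illustration only and not from the original text. It rests on an assumption read off from the definition in Sect. 6.1: $T_0(\gamma)$ is taken to be the part of the wedge between the positive $x$-axis and the ray through $(3,2)$ that satisfies $x^2 - 2y^2 \le \gamma$. Instead of integrating exactly, it averages the lattice point count over a midpoint grid of translations $v \in [0,1)^2$, so the agreement with the area is only approximate. The names `in_T0` and `average_count` are hypothetical.

```python
import math


def in_T0(x, y, gamma=1.0):
    """Assumed membership test for T_0(gamma): wedge 0 <= y <= 2x/3 cut by x^2 - 2y^2 <= gamma."""
    return y >= 0.0 and 3.0 * y <= 2.0 * x and x * x - 2.0 * y * y <= gamma


def average_count(gamma=1.0, m=100):
    """Average of |(T_0 - v) ∩ Z^2| over an m-by-m midpoint grid of v in [0,1)^2."""
    # Every point of T_0(gamma) satisfies x <= 3*sqrt(gamma) and y <= 2*sqrt(gamma).
    x_max = int(math.ceil(3.0 * math.sqrt(gamma))) + 1
    y_max = int(math.ceil(2.0 * math.sqrt(gamma))) + 1
    total = 0
    for a in range(m):
        v1 = (a + 0.5) / m
        for b in range(m):
            v2 = (b + 0.5) / m
            # A lattice point n lies in T_0 - v exactly when n + v lies in T_0.
            for n1 in range(-1, x_max):
                for n2 in range(-1, y_max):
                    if in_T0(n1 + v1, n2 + v2, gamma):
                        total += 1
    return total / (m * m)


if __name__ == "__main__":
    exact_area = math.log(1 + math.sqrt(2)) / math.sqrt(2)
    print("grid average:", average_count(1.0, 100), " area from (6.5):", exact_area)
```

Refining the grid (larger `m`) should bring the average closer to the area, at a quadratic cost in running time.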

We are going to apply Lemma 6.4 with [see (6.78)] p p 1 1 K D p .1 C 2/1 1 and L D p .1 C 2/2 ; 2 2 2 2 where 1 D d C ` C 1 and 2 D d C .3b C 1/` C bk:

(6.102)

In view of (6.4) K is the nearest integer to q1 1 and L is the nearest integer to q2 , and combining this with (6.86), we obtain that the symmetric set-difference



HK;L . / n

[

! Ti . / [

1 i 2

[

! Ti . / n HK;L . /

1 i 2

p can be easily covered by less than 104 .1 C 2 / rectangles that all have slope 1= 2 and area 1=5. So by Lemma 5.5, ˇˇ ˇ ˇ ˇˇ.HK;L . / v/ \ ZZ2 ˇ f .1 ; 2 I v/ˇ 104 .1 C 2 /:

(6.103)

Moreover, by (6.87) and (6.5) we have p area .HK;L . // D p log.L=K/ D p log.1 C 2/2 1 C1 D 2 2 ! [ p D p .2 1 C 1/ log.1 C 2/ D area Ti . / D Ef .1 ; 2 /; 2 1 i 2 (6.104) where Ef .1 ; 2 / denotes the expected value of the random variable f .1 ; 2 I v/, v 2 P0 . t u By (6.103), ˇ 2 ˇ Ev2P0 ˇ.HK;L . / v/ \ ZZ2 ˇ f .1 ; 2 I v/ 108 .1 C 2 /2 :

(6.105)

We recall Minkowski’s inequality:

\[
\| F + G \|_p \le \| F \|_p + \| G \|_p \quad \text{for } 1 \le p \le \infty, \tag{6.106}
\]

where k : : : kp denotes the Lp -norm. Note that (6.106) plays the role of the triangle inequality in the Lp -space, and it will be repeatedly used below. Combining (6.103)–(6.105), and Minkowski’s inequality in the special case p D 2, we have (Var stands for the variance) ˇ ˇ ˇ ˇ1=2 ˇ ˇ .Varv2P0 f .1 ; 2 I v//1=2 ˇ ˇ Varv2P0 ˇ.HK;L . / v/ \ ZZ2 ˇ ˇ 2 1=2 ˇ Ev2P0 ˇ.HK;L . / v/ \ ZZ2 ˇ f .1 ; 2 I v/ 104 .1C 2 /: By repeated application of Minkowski’s inequality with p D 2, we have ˇ !1=2 ˇˇ ˇ b X ˇ ˇ ˇ.Varv2P f .1 ; 2 I v//1=2 Varv2P ˇ ˆh .v/ C ˆh .v/ 0 0 ˇ ˇ ˇ ˇ hD1

(6.107)


Varv2P0


!!1=2 b X f .1 ; 2 I v/ ˆh .v/ C ˆh .v/ hD1

0 @Ev2P0 f .1 ; 2 I v/

b X

!2 11=2 ˆh .v/ C ˆh .v/ A C

hD1

C Ev2P0 f .1 ; 2 I v/

b X

ˆh .v/ C ˆh .v/

! :

(6.108)

b" ;

(6.109)

hD1

We recall the following corollary of Lemma 6.3: ( area v 2 P0 W f .1 ; 2 I v/ ¤

b X

ˆh .v/ C ˆh .v/

)

hD1

where

2 p p p " D ". I k; `/ D 108 .1C 2 /4k 1 C .1 C 2/`C1 .1C 2/`C1 C400.1C 2/`=2 C

2 p p p k k C 108 .1 C 2 /12` 1 C .1 C 2/ 3 C1 .1 C 2/ 3 C1 C 400 .1 C 2/k=6 : (6.110) Furthermore, by Lemma 5.5, max f .1 ; 2 I v/ 104 .1 C 2 /.2 1 C 1/;

(6.111)

b X ˆh .v/ C ˆh .v/ 104 .1 C 2 /.2 1 C 1/:

(6.112)

v2P0

and similarly max v2P0

hD1

By (6.109)–(6.112), ˇ !ˇ b ˇ X ˇˇ ˇ ˆh .v/ C ˆh .v/ ˇ b" 104 .1 C 2 /.2 1 C 1/ D ˇEv2P0 f .1 ; 2 I v/ ˇ ˇ hD1

D b" 104 .1 C 2 /b.k C 3`/;

(6.113)



and 0 @Ev2P0 f .1 ; 2 I v/

b X

!2 11=2 ˆh .v/ C ˆh .v/ A

hD1

p b" 104 .1 C 2 /.2 1 C 1/:

(6.114)

Combining (6.107), (6.108), (6.113), and (6.114), the triangle inequality gives ˇ !1=2 ˇˇ ˇ b X ˇ ˇ ˇ ˇ ˇ ˇ Varv2P ˇ.HK;L . / v/ \ ZZ2 ˇ 1=2 Varv2P ˆh .v/ C ˆh .v/ 0 0 ˇ ˇ ˇ ˇ hD1 ˇ ˇ ˇ ˇ1=2 ˇ ˇ ˇ Varv2P0 ˇ.HK;L . / v/ \ ZZ2 ˇ .Varv2P0 f .1 ; 2 I v//1=2 ˇ C ˇ !1=2 ˇˇ ˇ b X ˇ ˇ ˇ C ˇˇ.Varv2P0 f .1 ; 2 I v//1=2 Varv2P0 ˆh .v/ C ˆh .v/ ˇ ˇ ˇ hD1 p b" 104 .1 C 2 /.2 1 C 1/: (6.115) By using (6.88) in Lemma 6.4 with the choice (6.102), we have

104 .1 C 2 / C b" 104 .1 C 2 /.2 1 C 1/ C

p ˇ ˇ Varv2P0 ˇ.HK;L . / v/ \ ZZ2 ˇ D 2 . /.2 1 C 1/ log.1 C 2/ C O.1/: (6.116) Combining (6.115) and (6.116), we have ˇ 11=2 ˇˇ 0 ˇ b ˇ ˇ

1=2 X p ˇ 2 ˇ @Varv2P0 ˆh .v/ C ˆh .v/ A ˇ ˇ . /.2 1 C 1/ log.1 C 2/ C O.1/ ˇ ˇ ˇ ˇ hD1

p b" 104 .1 C 2 /.2 1 C 1/: (6.117) Repeating the proof of (6.117) with 2 1 C 1 D 3` instead of 2 1 C 1 D b.k C 3`/, we obtain 104 .1 C 2 / C b" 104 .1 C 2 /.2 1 C 1/ C

ˇ

1=2 ˇˇ p ˇ ˇ Varv2P ˆh .v/ 1=2 2 . /3` log.1 C 2/ C O.1/ ˇ 0 ˇ ˇ 104 .1 C 2 / C " 104 .1 C 2 /3` C

p

" 104 .1 C 2 /3`:

Combining (6.113), (6.117)–(6.118), and (6.110), Lemma 6.7 follows.

(6.118)


6.4 Applying Probability Theory

We are now ready to prove Theorem 5.4. Theorem 5.4 is about the typical fluctuations of the lattice point counting function

\[
F(\sqrt{2};\beta;\gamma;N) = \bigl| \bigl( H_\gamma(\sqrt{2};N) - v(\beta) \bigr) \cap \mathbb{Z}^2 \bigr|, \tag{6.119}
\]

where parameter ˇ runs in the interval 0 ˇ < 1, i.e., we study the effect of the one-dimensional family of translations by the vectors v.ˇ/ D .ˇ; 0/ [see (6.1)–(6.3)]. As we explained at the beginning of Sect. 5.1, it is natural to switch from the linear scale N to the exponential scale e N . Let I0 D I0 . I N / denote the largest integer i such that thephyperbolic triangle Ti D Ti . / is still contained in the hyperbolic needle H . 2I e N /. By definition, ˚ I0 D I0 . I N / D max i 2 ZZ W qi C2 e N ; and using (6.4): p p

p 1 1 qi D p .1 C 2/i .1 2/i D nearest integer to p .1 C 2/i ; 2 2 2 2 we obtain that p N C log.2 2= / I0 D I0 . I N / D p 2; log.1 C 2/

(6.120)

where the slightly ambiguous (6.120) means either the upper or the lower integral part of the right-hand side. The set-difference [ p Ti . / H . 2I e N / n 0i I0 . IN /

p can be easily covered by less than 104 .1 C 2 / rectangles that all have slope 1= 2 and area 1=5. The first consequence of this fact is the straightforward inequality 0 p N

area H . 2I e / area @

[

0i I0 . IN /

1

104 p .1C 2 /; Ti . /A area H . 2I e N / 5



and the second consequence via Lemma 5.5 is the following: X

X

p fi .v.ˇ// F . 2I ˇI I e N /

0i I0 . IN /

fi .v.ˇ// C 104 .1 C 2 /

0i I0 . IN /

(6.121) for every vector v.ˇ/ D .ˇ; 0/, 0 ˇ < 1 (and of course for every > 0). We choose jp k jp k I0 and k D I0 .log I0 /2 f0 or 1 or 2g (6.122) bD in such a way that k is divisible by 3. Then

p p p I0 I0 .log I0 /2 D I0 I0 .log I0 /2 I0 bk > >

p

p

p I0 1 I0 .log I0 /2 3 > I0 I0 .log I0 /2 C 4 :

(6.123)

For simplicity, we assume first that I0 [defined in (6.120)] has the special form I0 D I0 . I N / D 2 D .3b C 1/` C bk

(6.124)

(see (6.78) with d D 0). By (6.123) and (6.124), p p I0 I0 .log I0 /2 C 4 ; .log I0 /2 ` < 3b C 1 3b C 1 and by (6.122), p 1 1 bC1 b 1 1 I0 C > > > ; 3 3b 3b C 1 3b C 1 3b C 1 3 3b so we have the upper and lower bounds 1 1 .log I0 /2 1 ` < .log I0 /2 C 2: 3 3

(6.125)

By (6.124), X

fi .v.ˇ// D f .1 ; 2 I v.ˇ// C f .0; 1 1I v.ˇ//; where 1 1 D `

0i I0 . IN /

(6.126) (see (6.78) with d D 0). Since ` is “relatively small,” the dominating part of (6.126) is f .1 ; 2 I v.ˇ//. In view of Lemma 6.3 the distribution of



f .1 ; 2 I v.ˇ//; v.ˇ/ D .ˇ; 0/; 0 ˇ < 1; is “almost the same” as that of f .1 ; 2 I v/, v 2 P0 . Moreover, we have the equality f .1 ; 2 I v/ D

b X

ˆh . v/ C

hD1

b X

ˆh .v/

(6.127)

hD1

for the “overwhelming majority” of v 2 P0 . Since parameter ` is “small” compared to k, the sum b X

ˆh .v/ is the dominating part in (6.127):

hD1

This is a sum of independent and identically distributed random variables, so it is natural to apply the standard CLT in probability theory. For later applications we use a more general version that goes beyond identically distributed components. (Note that we already used such a version in Sect. 1.3, see (1.90).)

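6.4.1 Central Limit Theorem with Explicit Error Term (Berry–Esseen version)

Let $Z_1, Z_2, \ldots, Z_n$ be independent random variables with expectation $E Z_i = 0$, variance $E Z_i^2 < \infty$, and also $E|Z_i|^3 < \infty$ for all $1 \le i \le n$. Write

\[
W = \sum_{i=1}^{n} E|Z_i|^3 \quad \text{and} \quad V = \sum_{i=1}^{n} E Z_i^2 .
\]

Then for every real $\lambda$

\[
\Bigl| \Pr\Bigl[ \frac{Z_1 + Z_2 + \cdots + Z_n}{\sqrt{V}} \ge \lambda \Bigr] - \frac{1}{\sqrt{2\pi}} \int_{\lambda}^{\infty} e^{-u^2/2}\, du \Bigr| < \frac{40\, W}{V^{3/2}}. \tag{6.128}
\]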
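For a concrete feel of the error term in (6.128), consider the simplest case of independent random signs $Z_i = \pm 1$ (probability $1/2$ each), where $V = W = n$ and the right-hand side becomes $40/\sqrt{n}$. The following Python sketch is illustrative only and not from the original text; the function names are hypothetical. It computes the exact largest gap between the tail probability of the normalized sum and the Gaussian tail, and compares it with the bound.

```python
import math


def normal_upper_tail(lam):
    """(1/sqrt(2*pi)) times the integral of exp(-u^2/2) from lam to infinity."""
    return 0.5 * math.erfc(lam / math.sqrt(2.0))


def max_tail_gap(n):
    """Largest gap between Pr[(Z_1+...+Z_n)/sqrt(n) >= lam] and the normal tail
    for independent signs Z_i = +1 or -1 with probability 1/2 each."""
    probs = [math.comb(n, k) / 2.0 ** n for k in range(n + 1)]
    gap = 0.0
    tail = 1.0  # Pr[sum >= current atom], starting at the smallest atom
    for k in range(n + 1):
        lam = (2 * k - n) / math.sqrt(n)   # normalized value of the k-th atom
        phi = normal_upper_tail(lam)
        gap = max(gap, abs(tail - phi))    # lam exactly at the atom
        tail -= probs[k]
        gap = max(gap, abs(tail - phi))    # lam just above the atom
    return gap


def berry_esseen_bound(n):
    """Right-hand side 40 * W / V^(3/2) of (6.128) with W = V = n."""
    return 40.0 * n / n ** 1.5


if __name__ == "__main__":
    for n in (10, 100, 400):
        print(n, "gap:", max_tail_gap(n), " bound:", berry_esseen_bound(n))
```

Because the tail probability is a step function and the Gaussian tail is monotone, it suffices to test the threshold at and just above each atom. In this example the true gap is far below $40/\sqrt{n}$; the constant 40 is not sharp, but both quantities decay at the same rate $n^{-1/2}$.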

In order to apply (6.128) P we need some information about the second and third central moments of the sum bhD1 ˆh .v/, v 2 P0 . By using the notation (6.98) and (6.99), and the independence relations at the end of Lemma 6.3, we have (Var stands for variance) !2 b b X X Var D ˆh C ˆh D E ˆh;0 C ˆh;0 hD1

hD1



D

b X

Var.ˆh / C

hD1

C

b X

Eˆh;0 ˆh;0 C

hD1

b X

Var.ˆh /C

hD1 b1 X

EˆhC1;0 ˆh;0 :

(6.129)

hD1

We apply the Cauchy–Schwarz inequality: q ˇ ˇ p ˇEˆh;0 ˆh;0 ˇ Var.ˆh / Var.ˆh /; and similarly q ˇ p ˇ ˇEˆhC1;0 ˆh;0 ˇ Var.ˆhC1 / Var.ˆh /: Using these inequalities in (6.129), we obtain ˇ ˇ b b ˇ ˇ X X ˇ ˇ Var.ˆh /ˇ ˆh C ˆh ˇVar ˇ ˇ hD1

hD1

q p bVar.ˆ1 / C .2b 1/ Var.ˆ1 / Var.ˆ1 /; which implies ˇ !1=2 ˇˇ !1=2 ˇ b b X X ˇ ˇ ˇ ˇ Var Var.ˆh / ˆh C ˆh ˇ ˇ ˇ ˇ hD1 hD1 q p bVar.ˆ1 / C .2b 1/ Var.ˆ1 / Var.ˆ1 /

1=2 1=2 Pb P Var bhD1 ˆh C ˆh C Var.ˆ / h hD1 bVar.ˆ1 / .2b 1/ 1=2 C pb Pb Var hD1 ˆh C ˆh

q Var.ˆ1 /:

We recall (6.120) and (6.125): p N C log.2 2= / p 2 I0 D I0 . I N / D log.1 C 2/

(6.130)



and 1 1 .log I0 /2 1 ` < .log I0 /2 C 2: 3 3 We use the elementary fact that given arbitrary constants C1 > 1 and C2 < 1, the inequality .log N /2

C1

> N C2

(6.131)

holds for every sufficiently large value of N . It follows via simple calculations that the choice of parameters k [see (6.122)] and ` implies the following upper bound for ".I k; `/ [defined in (6.83)]: ".I k; `/

1012 .1 C 2 /2 : N8

Thus by Lemma 6.7, ˇ ˇ !1=2 ˇ b

1=2 ˇˇ X p ˇ ˇ Var ˇ 2 . /b.k C 3`/ log.1 C 2/ C O.1/ ˆh C ˆ h ˇ ˇ ˇ ˇ hD1 104 .1 C 2 / C

1010 .1 C 2 /4 ; N2

(6.132)

and ˇ

1=2 ˇˇ p ˇ ˇ Varˆh 1=2 2 . /3` log.1 C 2/ C O.1/ ˇ ˇ ˇ 104 .1 C 2 / C

1010 .1 C 2 /2 : N2

(6.133)

Next we study the third moment. To estimate the third central moment of $\Phi_1(v)$, $v \in P_0$, we are going to use the following well-known moment inequality: let $X$ be a random variable, then

\[
\bigl( E|X|^3 \bigr)^{1/3} \le \bigl( E|X|^4 \bigr)^{1/4}. \tag{6.134}
\]

(Note that (6.134) is a special case of the general inequality

\[
\bigl( E|X|^u \bigr)^{1/u} \le \bigl( E|X|^v \bigr)^{1/v} \quad \text{for all } 0 < u \le v,
\]

which follows from Jensen’s inequality applied for the convex function $x^{v/u}$, $x > 0$.)
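The monotonicity of the moment norms is easy to observe on a small example. The following Python sketch is illustrative only and not from the original text; the distribution and the name `moment_norm` are made up for the illustration. It evaluates $(E|X|^u)^{1/u}$ for $u = 1, 2, 3, 4$ on a four-point distribution.

```python
def moment_norm(values, probs, u):
    """(E|X|^u)^(1/u) for a discrete random variable X taking values[i] with probability probs[i]."""
    return sum(p * abs(x) ** u for x, p in zip(values, probs)) ** (1.0 / u)


# A hypothetical four-point distribution (the probabilities sum to 1).
values = [-2.0, 0.5, 1.0, 3.0]
probs = [0.1, 0.4, 0.3, 0.2]
norms = [moment_norm(values, probs, u) for u in (1, 2, 3, 4)]

if __name__ == "__main__":
    for u, norm in zip((1, 2, 3, 4), norms):
        print("u =", u, " (E|X|^u)^(1/u) =", norm)
```

The printed values are nondecreasing in $u$; the pair $u = 3$, $v = 4$ is the case (6.134) used in the text.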



In view of (6.134), it suffices to estimate the fourth central moment Ev2P0 .ˆ1;0 . v//4 :

(6.135)

It is based on another application of Lemma 6.3 where the blocks and the gaps all have the same size 3`. First we divide k with 6` [see (6.122)–(6.125)]: k D b 6` C r ; where the remainder is in the interval 0 r < 6`:

(6.136)

We specify the integral parameters “b 1; d 0; ` 1; k 3” in Lemma 6.3 to be b ; d ; ` ; k as follows: ` D ` D k =3; d D 0; b is defined in (6.136);

(6.137)

and of course ` is defined in (6.124)–(6.125). Write [see (6.78)] 1 D ` C 1 D ` C 1 and 2 D .3b C 1/` C b k D ` C 6`b :

(6.138)

By Lemma 6.3 there exist two sequences of Rademacher like functions '1 ; '2 ; : : : ; 'b and ' 1 .x/; ' 2 .x/; : : : ; ' b .x/

such that the extensions ˆh , ˆh , 1 h b , defined in (6.62) and (6.74), have the following approximation property:

f .1 ; 2 I v/

D

b X

ˆh .

hD1

v/ C

b X

ˆh .v/ for all v 2 P0

(6.139)

hD1

with the possible exception of vs of total area at most 2b

! p `C1 2 p `C1 p `=2 .1 C 2/ C 400 .1 C 2/ 10 .1 C /12` 1 C .1 C 2/ : 8

2

(6.140) We also need the simple fact max

1hb ;v2P0

ˇ ˇo n ˇ ˇ jˆh .v/j ; ˇˆh .v/ˇ 104 .1 C 2 /3`;

which is a standard application of Lemma 5.5.

(6.141)



Write

ˆh;0 D ˆh Eˆh and ˆh;0 D ˆh Eˆh ; that is, the extra 0 in the index indicates that the expectation is 0. By using the independence of the Rademacher like functions, we have 14 0 b X 4 2 2 ˆh;0 A D b E ˆ1;0 C 3b .b 1/ E ˆ1;0 E@ hD1

4 4 b max ˆ1;0 .v/ C 3b b 1 max ˆ1;0 .v/ v2P0

v2P0

4 2 4 2 3 b max ˆ1;0 .v/ 3 b 104 .1 C 2 /3` ; v2P0

(6.142)

where in the last step we used (6.141). Similarly, 0 14 b X 2 4 ˆh;0 A 3 b 104 .1 C 2 /3` ; E@

(6.143)

hD1

Applying Minkowski’s inequality with p D 4 [see (6.106)], by (6.142) and (6.143) we have 0 14 b

X 2 4 E@ ˆh;0 C ˆh;0 A 24 3 b 104 .1 C 2 /3` :

(6.144)

hD1

Note that (6.144) is the main step toward the estimation of (6.135). The rest is routine estimations with a few more applications of Minkowski’s inequality. The details go as follows. We have b

X .1/ .1/ ˆ1 .v/ ˆh C ˆh D ˆ1 .v/ f .k1 ; k2 I v/C

hD1 b X

C

.1/ .1/ f .k1 ; k2 I v/

ˆh C ˆh D 1 .v/ C 2 .v/ C 3 .v/;

(6.145)

hD1

where .1/

.1/

1 .v/ D ˆ1 .v/ f .k1 ; k2 I v/;

(6.146)



2 .v/ D f .2 C 1; k2 I v/; .1/

(6.147)

b

X ˆh C ˆh ;

3 .v/ D f .1 ; 2 I v/

(6.148)

hD1

since 1 D ` C 1 D k1 and 2 D ` C 6`b < k2 D ` C k [so k2 2 < 6`, see (6.136)–(6.138)]. Combining (6.135)–(6.139) with (6.131), we have that 3 .v/, v 2 P0 is zero except for a possible subset of P0 with area 1010 .1 C 2 /N 6 , and also p max j3 .v/j 104 .1 C 2 /k < 104 .1 C 2 /2 N : .1/

.1/

.1/

v2P0

It follows that p 4 Ev2P0 .3 .v/ E3 /4 1010 .1 C 2 /N 6 104 .1 C 2 /2 N <
0 and " > 0 we have Z

.1C"/C"

e u

2 =2

d u 2":

(6.202)

Indeed, Z

.1C"/C"

e

u2 =2

Z

.1C"/C"

du 2

e u d u D 2 e e .1C"/" D

D 2e

1 e "" 2e . C 1/" 2";



where we used the elementary inequalities e u =2 2e u , 1 e u u, and .u C 1/e u 1 that hold for all u 0. Finally notice that (6.191) immediately follows from (6.198)–(6.202). This settles the special case where 2

I0 D I0 . I N / D .3b C 1/` C bk

(6.203)

holds for some integer b [see (6.164)]. In the general case we have I0 D I0 . I N / D .3b C 1/` C bk C %

(6.204)

with some integers b and 0 % < k C 3`, where % is the “remainder.” Then we choose 2 D .3b C 1/` C bk C minf%; kg: Again using 1 1 D `, we clearly have X

fi .v.ˇ// D f .1 ; 2 I v.ˇ// C f .0; 1 1I v.ˇ// C f .2 C 1; I0 I v.ˇ//;

0i I0 . IN /

and so ˇ ˇ ˇ ˇ ˇ ˇ X ˇ fi .v.ˇ// f .1 ; 2 I v.ˇ//ˇˇ O .log N /2 ; ˇ ˇ ˇ0i I0 . IN / which is analogous to (6.166). A straightforward adaptation of Lemma 6.3 with “remainder” gives that the distribution of f .1 ; 2 I v.ˇ//; v.ˇ/ D .ˇ; 0/; 0 ˇ < 1 is “almost the same” as that of f .1 ; 2 I v/, v 2 P0 , and we have the equality f .1 ; 2 I v/ D

b X hD1

ˆh .v/ C

b X

ˆh .v/ C ˆhC1 .v/

hD1

for the “overwhelming majority” of v 2 P0 . Here the last term ˆhC1 .v/ corresponds to the “short tail sum” f ..3b C1/`Cbk C1; 2 I v.ˇ//, representing the contribution of the “remainder.”


Since the “remainder” can be smaller than k, b X

ˆh .v/ C ˆhC1 .v/

hD1

is a sum of independent, but not necessarily identically distributed random variables (due to the last term ˆhC1 .v/). This change does not lead to a new problem, since the Berry–Esseen form of the CLT (6.128) still applies. The rest of the proof of the general case is the same as it was in the special case (6.203). This proves (6.191) in the general case (6.204). It does not mean, however, that the proof of Theorem 5.4 is complete. We still have to prove three lemmas from Sect. 6.3, namely, Lemmas 6.4– 6.6. It is the subject of the next two sections.

6.6 Proving the Three Lemmas: Part One

To prove Lemmas 6.4–6.6 we will apply Poisson's summation formula, study some nonelementary functions, estimate integrals of exponential functions, and finally, to compute the variance, we use Parseval's formula. The details will be rather troublesome. To prove Lemma 6.4, we need the two-dimensional Poisson's summation formula, which basically means that we work with Fourier series. For the convenience of the reader we start with the one-dimensional case. Assume that a series

$$G(x) = \sum_{k\in\mathbb{Z}} g(x+k) \tag{6.205}$$

converges uniformly for $0 \le x < 1$, and also assume that the sum $G(x)$ is represented by its Fourier series (where of course $i = \sqrt{-1}$):

$$G(x) = \lim_{N\to\infty} \sum_{n=-N}^{N} c_n e^{2\pi i n x} = \sum_{n=-\infty}^{\infty} c_n e^{2\pi i n x}, \tag{6.206}$$

where the Fourier coefficients $c_n$ in (6.206) are calculated in the usual way:

$$c_n = \int_0^1 G(t) e^{-2\pi i n t}\,dt = \sum_{k\in\mathbb{Z}} \int_0^1 g(t+k) e^{-2\pi i n t}\,dt = \sum_{k\in\mathbb{Z}} \int_k^{k+1} g(t) e^{-2\pi i n t}\,dt = \int_{-\infty}^{\infty} g(t) e^{-2\pi i n t}\,dt. \tag{6.207}$$


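Applying (6.207) in (6.206) with $x = 0$, we obtain (the one-dimensional form of) Poisson's summation formula:

$$\sum_{k\in\mathbb{Z}} g(k) = G(0) = \sum_{n\in\mathbb{Z}} \int_{-\infty}^{\infty} g(t) e^{-2\pi i n t}\,dt. \tag{6.208}$$

What we really need here is the two-dimensional form of (6.208), which can be proved exactly the same way:

$$\sum_{\mathbf{k}\in\mathbb{Z}^2} g(\mathbf{k}) = \sum_{\mathbf{n}\in\mathbb{Z}^2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(\mathbf{t}) e^{-2\pi i\, \mathbf{n}\cdot\mathbf{t}}\,d\mathbf{t}, \tag{6.209}$$

where $\mathbf{n}\cdot\mathbf{t} = n_1 t_1 + n_2 t_2$ is the usual dot (or inner) product. If $g = \chi_B$, that is, if the function $g$ is the characteristic function of a bounded region $B \subset \mathbb{R}^2$ in the plane, then the left-hand side of (6.209) counts the number of lattice points in the region $B$:

$$\sum_{\mathbf{k}\in\mathbb{Z}^2} \chi_B(\mathbf{k}) = |B \cap \mathbb{Z}^2| = \sum_{\mathbf{n}\in\mathbb{Z}^2} \int_B e^{-2\pi i\, \mathbf{n}\cdot\mathbf{t}}\,d\mathbf{t} \tag{6.210}$$

with the usual dot product $\mathbf{n}\cdot\mathbf{t} = n_1 t_1 + n_2 t_2$. If we switch from region $B$ to the translated copy $B + \mathbf{v}$, then by (6.210) and the equivalence $\mathbf{t} \in B + \mathbf{v} \Leftrightarrow \mathbf{t} - \mathbf{v} \in B$, we have

$$|(B+\mathbf{v}) \cap \mathbb{Z}^2| = \sum_{\mathbf{n}\in\mathbb{Z}^2} \int_{B+\mathbf{v}} e^{-2\pi i\, \mathbf{n}\cdot\mathbf{t}}\,d\mathbf{t} = \sum_{\mathbf{n}\in\mathbb{Z}^2} \int_{B} e^{-2\pi i\, \mathbf{n}\cdot(\mathbf{t}+\mathbf{v})}\,d\mathbf{t} = \sum_{\mathbf{n}\in\mathbb{Z}^2} \left( \int_{B} e^{-2\pi i\, \mathbf{n}\cdot\mathbf{t}}\,d\mathbf{t} \right) e^{-2\pi i\, \mathbf{n}\cdot\mathbf{v}}, \tag{6.211}$$

which is a Fourier series in terms of the translation vector $\mathbf{v} \in [0,1)^2$ (since the set of lattice points is periodic modulo one). We recall the definition of the hyperbolic region $H_{K,L} = H_{K,L}(\gamma)$ [see (6.86)]:

$$H_{K,L}(\gamma) = \left\{ (x,y) \in \mathbb{R}^2 : -\gamma \le x^2 - 2y^2 \le \gamma \ \text{ where } K \le x + y\sqrt{2} \le L \right\}. \tag{6.212}$$

For every $\mathbf{v} \in \mathbb{R}^2$ let $H_{K,L}(\gamma) + \mathbf{v}$ denote the translated copy of region $H_{K,L}(\gamma)$, and consider the periodic function

$$h(K,L;\mathbf{v}) = h(\gamma;K,L;\mathbf{v}) = |(H_{K,L}(\gamma) + \mathbf{v}) \cap \mathbb{Z}^2| - \mathrm{Area}(H_{K,L}(\gamma)). \tag{6.213}$$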
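As a numerical sanity check of Poisson's summation formula (6.208), the following minimal sketch compares both sides for a Gaussian test function. The choice $g(t) = e^{-\pi a t^2}$, the function name `poisson_sides`, and the truncation parameter `terms` are illustrative assumptions, not part of the text; the only outside fact used is the standard Gaussian integral $\int_{-\infty}^{\infty} e^{-\pi a t^2} e^{-2\pi i n t}\,dt = a^{-1/2} e^{-\pi n^2/a}$.

```python
import math


def poisson_sides(a, terms=40):
    """Return both sides of (6.208) for the test function g(t) = exp(-pi*a*t^2), a > 0."""
    # Left-hand side: sum of g over the integers (truncated; the tail is negligible).
    lhs = sum(math.exp(-math.pi * a * k * k) for k in range(-terms, terms + 1))
    # Right-hand side: the Fourier transform of g at the integer n equals
    # a**(-1/2) * exp(-pi * n^2 / a).
    rhs = sum(math.exp(-math.pi * n * n / a) for n in range(-terms, terms + 1))
    rhs /= math.sqrt(a)
    return lhs, rhs


if __name__ == "__main__":
    for a in (0.5, 1.0, 2.0):
        lhs, rhs = poisson_sides(a)
        print(f"a = {a}: lattice sum = {lhs:.12f}, Fourier side = {rhs:.12f}")
```

The two columns should agree to machine precision, which is exactly the one-dimensional statement that is then lifted to the plane in (6.209).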


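The first statement of Lemma 6.4 is about the integral

$$\int_{P_0} \left( |(H_{K,L}(\gamma) + \mathbf{v}) \cap \mathbb{Z}^2| - \mathrm{Area}(H_{K,L}(\gamma)) \right)^2 d\mathbf{v} = \int_0^1\!\!\int_0^1 h^2(K,L;\mathbf{v})\,d\mathbf{v}. \tag{6.214}$$

By using (6.211) for the function $h(K,L;\mathbf{v})$ introduced in (6.213), we have

$$h(K,L;\mathbf{v}) = \sum_{\mathbf{n}\in\mathbb{Z}^2:\ \mathbf{n}\ne\mathbf{0}} \left( \int_{H_{K,L}(\gamma)} e^{-2\pi i\, \mathbf{n}\cdot\mathbf{t}}\,d\mathbf{t} \right) e^{-2\pi i\, \mathbf{n}\cdot\mathbf{v}}. \tag{6.215}$$

Combining (6.215) with Parseval's formula,

$$\int_0^1\!\!\int_0^1 h^2(\gamma;K,L;\mathbf{v})\,d\mathbf{v} = \sum_{\mathbf{n}\in\mathbb{Z}^2:\ \mathbf{n}\ne\mathbf{0}} \left| \int_{H_{K,L}(\gamma)} e^{-2\pi i\, \mathbf{n}\cdot\mathbf{t}}\,d\mathbf{t} \right|^2. \tag{6.216}$$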

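We want to evaluate the integral

$$\int_{H_{K,L}(\gamma)} e^{-2\pi i\, \mathbf{n}\cdot\mathbf{t}}\,d\mathbf{t}$$

for every $\mathbf{n} \in \mathbb{Z}^2$ with $\mathbf{n} \ne \mathbf{0}$. Unfortunately, there is no explicit formula for this integral; instead we are going to express it in terms of two nonelementary functions $\Omega(z)$ and $\Psi(z)$.

Motivated by the factorization $x^2 - 2y^2 = (x + y\sqrt{2})(x - y\sqrt{2})$, we apply the following substitution:

$$u_1 = x + y\sqrt{2}, \qquad u_2 = x - y\sqrt{2}, \tag{6.217}$$

which is equivalent to

$$x = \frac{u_1 + u_2}{2}, \quad y = \frac{u_1 - u_2}{2\sqrt{2}} \quad \text{with Jacobian } \left| \frac{\partial(x,y)}{\partial(u_1,u_2)} \right| = \left| \det \begin{pmatrix} 1/2 & 1/2 \\ 2^{-3/2} & -2^{-3/2} \end{pmatrix} \right| = 2^{-3/2}. \tag{6.218}$$

Applying the substitution (6.217) and (6.218),

$$\begin{aligned}
\int_{H_{K,L}(\gamma)} e^{-2\pi i (n_1 x + n_2 y)}\,dx\,dy
&= \int_{H_{K,L}(\gamma)} e^{-2\pi i \left( n_1 (u_1+u_2)/2 + n_2 (u_1 - u_2)/2\sqrt{2} \right)}\,dx\,dy \\
&= \frac{1}{2\sqrt{2}} \int_{K \le u_1 \le L} e^{-2\pi i u_1 (n_1\sqrt{2} + n_2)/2\sqrt{2}} \left( \int_{-\gamma/u_1 \le u_2 \le \gamma/u_1} e^{-2\pi i u_2 (n_1\sqrt{2} - n_2)/2\sqrt{2}}\,du_2 \right) du_1 \\
&= \frac{1}{2\sqrt{2}} \int_K^L e^{-2\pi i u_1 (n_1\sqrt{2} + n_2)/2\sqrt{2}}\, \frac{2\sqrt{2}}{\pi (n_1\sqrt{2} - n_2)} \sin\!\left( \pi\gamma (n_1\sqrt{2} - n_2)/u_1\sqrt{2} \right) du_1.
\end{aligned} \tag{6.219}$$

Making the substitution $u = 2\pi u_1 (n_1\sqrt{2} + n_2)/2\sqrt{2}$ in the last line of (6.219), we conclude that

$$\begin{aligned}
\int_{H_{K,L}(\gamma)} e^{-2\pi i (n_1 x + n_2 y)}\,dx\,dy
&= \frac{\sqrt{2}}{\pi^2 (2n_1^2 - n_2^2)} \int_{\pi(n_1\sqrt{2}+n_2)K/\sqrt{2}}^{\pi(n_1\sqrt{2}+n_2)L/\sqrt{2}} e^{-iu} \sin\!\left( \pi^2\gamma (2n_1^2 - n_2^2)/2u \right) du \\
&= \frac{\sqrt{2}}{\pi^2 (2n_1^2 - n_2^2)} \int_{\pi(n_1\sqrt{2}+n_2)K/\sqrt{2}}^{\pi(n_1\sqrt{2}+n_2)L/\sqrt{2}} \cos(u) \sin\!\left( \pi^2\gamma (2n_1^2 - n_2^2)/2u \right) du \\
&\quad - i\, \frac{\sqrt{2}}{\pi^2 (2n_1^2 - n_2^2)} \int_{\pi(n_1\sqrt{2}+n_2)K/\sqrt{2}}^{\pi(n_1\sqrt{2}+n_2)L/\sqrt{2}} \sin(u) \sin\!\left( \pi^2\gamma (2n_1^2 - n_2^2)/2u \right) du,
\end{aligned} \tag{6.220}$$

where $\mathbf{n} = (n_1, n_2) \in \mathbb{Z}^2$ is a lattice point $\ne \mathbf{0}$. By (6.211),

$$|(H_{K,L}(\gamma) + \mathbf{v}) \cap \mathbb{Z}^2| - \mathrm{Area}(H_{K,L}(\gamma)) = \sum_{\mathbf{n}\in\mathbb{Z}^2:\ \mathbf{n}\ne\mathbf{0}} \frac{\sqrt{2}}{\pi^2 (2n_1^2 - n_2^2)}\, e^{-2\pi i\, \mathbf{n}\cdot\mathbf{v}} \int(\mathbf{n}), \tag{6.221}$$

where $\int(\mathbf{n})$ is defined as

$$\int_{\pi(n_1\sqrt{2}+n_2)K/\sqrt{2}}^{\pi(n_1\sqrt{2}+n_2)L/\sqrt{2}} \left( \cos(u) \sin\!\left( \pi^2\gamma (2n_1^2 - n_2^2)/2u \right) - i \sin(u) \sin\!\left( \pi^2\gamma (2n_1^2 - n_2^2)/2u \right) \right) du.$$

To evaluate the right-hand side of (6.220), we need to study the auxiliary functions

$$\Omega(z) = \int_0^\infty \cos(x) \sin(z/x)\,dx \tag{6.222}$$

and

$$\Psi(z) = \int_0^\infty \sin(x) \sin(z/x)\,dx. \tag{6.223}$$

In particular, we have to show that the infinite integrals in (6.222) and (6.223) are both convergent (i.e., the functions are well defined).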
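The reduction from the two-dimensional integral over $H_{K,L}(\gamma)$ to the one-dimensional integral in (6.220) can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the sign convention $e^{-2\pi i\,\mathbf{n}\cdot\mathbf{t}}$, the prefactor $\sqrt{2}/(\pi^2(2n_1^2-n_2^2))$, and the limits $\pi(n_1\sqrt{2}+n_2)K/\sqrt{2}$, $\pi(n_1\sqrt{2}+n_2)L/\sqrt{2}$ obtained by carrying out the substitution (6.217) and (6.218); the function names and the plain midpoint rule are hypothetical choices.

```python
import cmath
import math

SQRT2 = math.sqrt(2.0)


def double_integral(n1, n2, gamma, K, L, m=300):
    """Midpoint rule for the integral of exp(-2*pi*i*(n1*x + n2*y)) over H_{K,L}(gamma),
    evaluated in the coordinates u1 = x + y*sqrt(2), u2 = x - y*sqrt(2)."""
    total = 0j
    h1 = (L - K) / m
    for i in range(m):
        u1 = K + (i + 0.5) * h1
        half_width = gamma / u1              # the region is |u2| <= gamma / u1
        h2 = 2.0 * half_width / m
        for j in range(m):
            u2 = -half_width + (j + 0.5) * h2
            x = (u1 + u2) / 2.0
            y = (u1 - u2) / (2.0 * SQRT2)
            total += cmath.exp(-2j * math.pi * (n1 * x + n2 * y)) * h1 * h2
    return total / (2.0 * SQRT2)             # Jacobian 2**(-3/2)


def single_integral(n1, n2, gamma, K, L, m=20000):
    """Midpoint rule for the one-dimensional form: prefactor times the
    integral of exp(-i*u) * sin(z/u) from a to b."""
    d = 2 * n1 * n1 - n2 * n2
    a = math.pi * (n1 * SQRT2 + n2) * K / SQRT2
    b = math.pi * (n1 * SQRT2 + n2) * L / SQRT2
    z = math.pi ** 2 * gamma * d / 2.0
    h = (b - a) / m
    total = 0j
    for i in range(m):
        u = a + (i + 0.5) * h
        total += cmath.exp(-1j * u) * math.sin(z / u) * h
    return SQRT2 / (math.pi ** 2 * d) * total


if __name__ == "__main__":
    print(double_integral(1, 1, 0.5, 1.0, 3.0))
    print(single_integral(1, 1, 0.5, 1.0, 3.0))
```

Both quadratures are crude, so agreement to three or four decimal places is all that should be expected; the point is only that the algebra of the substitution is consistent.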


6.6.1 Properties of the Auxiliary Functions in (6.222) and (6.223)

In fact, we study the integrals

$$\int_a^b \cos(x) \sin(z/x)\,dx \quad \text{for all } 0 \le a < b \le \infty \text{ and } z \ne 0,$$

and

$$\int_a^b \sin(x) \sin(z/x)\,dx \quad \text{for all } 0 \le a < b \le \infty \text{ and } z \ne 0.$$

First we show that the limit

$$\lim_{N\to\infty} \int_{1/N}^{N} \cos(x) \sin(z/x)\,dx \quad \text{exists for all } z \ne 0.$$

This limit is the formal definition of $\Omega(z)$. To prove the limit, we assume $z > 0$; by using integration by parts,

$$\int_{\sqrt{z}}^{N} \cos(x) \sin(z/x)\,dx = \sin(N) \sin(z/N) - \sin^2(\sqrt{z}) + \int_{\sqrt{z}}^{N} \sin(x) \cos(z/x)\, z x^{-2}\,dx. \tag{6.224}$$

Also, by making the substitution $y = z/x$,

$$\int_{1/N}^{\sqrt{z}} \cos(x) \sin(z/x)\,dx = \int_{\sqrt{z}}^{zN} \cos(z/y) \sin(y)\, z y^{-2}\,dy. \tag{6.225}$$

We assume that $N$ is large enough to yield $1/N < \sqrt{z} < \min\{N, zN\}$; then by (6.224) and (6.225),

$$\int_{1/N}^{N} \cos(x) \sin(z/x)\,dx = \sin(N) \sin(z/N) - \sin^2(\sqrt{z}) + \int_{\sqrt{z}}^{N} \sin(x) \cos(z/x)\, z x^{-2}\,dx + \int_{\sqrt{z}}^{zN} \sin(x) \cos(z/x)\, z x^{-2}\,dx. \tag{6.226}$$
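The integration by parts behind (6.224) is easy to confirm numerically. The following sketch compares the left-hand side $\int_{\sqrt{z}}^{N}\cos(x)\sin(z/x)\,dx$ with the boundary terms plus the remaining integral; the helper `simpson`, the function name `check_6_224`, and the sample values of $z$ and $N$ are illustrative assumptions.

```python
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0


def check_6_224(z, N):
    """Return (left-hand side, right-hand side) of the integration by parts identity."""
    r = math.sqrt(z)
    lhs = simpson(lambda x: math.cos(x) * math.sin(z / x), r, N)
    boundary = math.sin(N) * math.sin(z / N) - math.sin(r) ** 2
    rhs = boundary + simpson(lambda x: math.sin(x) * math.cos(z / x) * z / x ** 2, r, N)
    return lhs, rhs


if __name__ == "__main__":
    print(check_6_224(2.0, 30.0))
```

The gain from the identity is visible in the integrand on the right: the factor $z x^{-2}$ makes the integral absolutely convergent as $N \to \infty$, which is what the limit argument needs.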

Taking the limit $N \to \infty$ in (6.226), we have

$$\begin{aligned}
\Omega(z) &= \lim_{N\to\infty} \int_{1/N}^{N} \cos(x) \sin(z/x)\,dx \\
&= 2 \int_{\sqrt{z}}^{\infty} \sin(x) \cos(z/x)\, z x^{-2}\,dx - \sin^2(\sqrt{z}),
\end{aligned} \tag{6.227}$$

and the infinite integral in the second line is clearly convergent, since

$$\int_{\sqrt{z}}^{\infty} x^{-2}\,dx = z^{-1/2} < \infty.$$

Of course, $\Omega(0) = 0$ and $\Omega(-z) = -\Omega(z)$. Next we show that the limit

$$\lim_{N\to\infty} \int_{1/N}^{N} \sin(x) \sin(z/x)\,dx \quad \text{exists for all } z \ne 0.$$

This limit is the formal definition of $\Psi(z)$. To prove the limit, let $z > 0$, and repeating the arguments above, we have

$$\int_{\sqrt{z}}^{N} \sin(x) \sin(z/x)\,dx = -\cos(N) \sin(z/N) + \cos(\sqrt{z}) \sin(\sqrt{z}) - \int_{\sqrt{z}}^{N} \cos(x) \cos(z/x)\, z x^{-2}\,dx, \tag{6.228}$$

and

$$\int_{1/N}^{\sqrt{z}} \sin(x) \sin(z/x)\,dx = \int_{\sqrt{z}}^{zN} \sin(z/y) \sin(y)\, z y^{-2}\,dy,$$

and also

$$\int_{1/N}^{N} \sin(x) \sin(z/x)\,dx = -\cos(N) \sin(z/N) + \cos(\sqrt{z}) \sin(\sqrt{z}) + \int_{\sqrt{z}}^{zN} \sin(x) \sin(z/x)\, z x^{-2}\,dx - \int_{\sqrt{z}}^{N} \cos(x) \cos(z/x)\, z x^{-2}\,dx.$$

Taking the limit $N \to \infty$, we have

$$\begin{aligned}
\Psi(z) &= \lim_{N\to\infty} \int_{1/N}^{N} \sin(x) \sin(z/x)\,dx \\
&= -\int_{\sqrt{z}}^{\infty} \cos\!\left( x + \frac{z}{x} \right) z x^{-2}\,dx + \sin(2\sqrt{z})/2,
\end{aligned} \tag{6.229}$$

and again the infinite integral in the second line is clearly convergent for the same reason as (6.227). Of course, $\Psi(0) = 0$ and $\Psi(-z) = -\Psi(z)$. Equations (6.227) and (6.229) show that the functions $\Omega(z)$ and $\Psi(z)$ are well defined. Their asymptotic behavior is described by Lemma 6.5. On the other hand, the limit constant $\sigma^2(\gamma)$ in Lemma 6.4 is described by Lemma 6.6. We conclude Sect. 6.6 deriving Lemma 6.6 from Lemmas 6.4 and 6.5. The proofs of Lemmas 6.4 and 6.5 are postponed to the next section.

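6.6.2 Deduction of Lemma 6.6 from Lemmas 6.4 and 6.5

First note that the asymptotic formula at the end of Lemma 6.6 immediately follows from Lemma 6.5. Again applying Lemma 6.5, there is an absolute constant $c_3 > 0$ such that if $z \ge c_3$ then

$$\Omega^2(z) + \Psi^2(z) \ge \frac{1}{2} \cdot \frac{\pi}{8} z^{1/2} \left( (\sin(2\sqrt{z}) + \cos(2\sqrt{z}))^2 + (\sin(2\sqrt{z}) - \cos(2\sqrt{z}))^2 \right) = \frac{1}{2} \cdot \frac{\pi}{8} z^{1/2} \cdot 2 = \frac{\pi}{8} z^{1/2}, \tag{6.230}$$

and also

$$\Omega^2(z) + \Psi^2(z) \le 2 \cdot \frac{\pi}{8} z^{1/2} \cdot 2 = \frac{\pi}{2} z^{1/2}. \tag{6.231}$$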

We distinguish three cases. Case 1: > c3 Then by Lemma 6.4 with n D 1, and also by (6.230), we have 2 . / D

8 log.1 C

8 log.1 C p

p

2/

2 . 2 n=2/ C ‰ 2 . 2 n=2/

1=2 2 2 p =2 p D : 2/ 8 log.1 C 2/

(6.232)



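On the other hand, by Lemma 6.4 and (6.97),

$$\sigma^2(\gamma) \le c_4 \sqrt{\gamma} \tag{6.233}$$

with some absolute constant $c_4$. Next we assume that $\gamma > 0$ is "small." We recall (6.227):

$$\begin{aligned}
\Omega(z) &= 2 \int_{\sqrt{z}}^{\infty} \sin(x) \cos(z/x)\, z x^{-2}\,dx - \sin^2(\sqrt{z}) \\
&= 2 \int_{\sqrt{z}}^{1} \sin(x) \cos(z/x)\, z x^{-2}\,dx + 2 \int_{1}^{\infty} \sin(x) \cos(z/x)\, z x^{-2}\,dx - \sin^2(\sqrt{z}).
\end{aligned}$$

If $z > 0$ is "small" then

$$\begin{aligned}
\int_{\sqrt{z}}^{1} \sin(x) \cos(z/x)\, z x^{-2}\,dx &= \int_{\sqrt{z}}^{1} x\, z x^{-2}\,dx + O(z) = z \int_{\sqrt{z}}^{1} x^{-1}\,dx + O(z) \\
&= z \log \frac{1}{\sqrt{z}} + O(z) = \frac{1}{2} z \log \frac{1}{z} + O(z),
\end{aligned}$$

and

$$\int_{1}^{\infty} \sin(x) \cos(z/x)\, z x^{-2}\,dx = O\!\left( z \int_{1}^{\infty} x^{-2}\,dx \right) = O(z).$$

Thus, for $0 < z < 1/2$ we have

$$\Omega(z) = z \log \frac{1}{z} + O(z).$$

It follows that there is a (possibly small) constant $c_5 > 0$ such that, for all $0 < z < c_5$,

$$\frac{1}{2} z \log \frac{1}{z} < \Omega(z) < 2 z \log \frac{1}{z}. \tag{6.234}$$

Next we switch from $\Omega(z)$ to $\Psi(z)$: by definition,

$$\Psi(z) = \int_{0}^{\pi/2} \sin(x) \sin(z/x)\,dx + \int_{\pi/2}^{\infty} \sin(x) \sin(z/x)\,dx,$$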


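and clearly

$$\left| \int_{0}^{\pi/2} \sin(x) \sin(z/x)\,dx \right| \le \int_{0}^{\pi/2} x \cdot (z/x)\,dx = \frac{\pi z}{2}.$$

By integration by parts [similarly to (6.228)], where the boundary term vanishes because $\cos(\pi/2) = 0$,

$$\left| \int_{\pi/2}^{\infty} \sin(x) \sin(z/x)\,dx \right| = \left| \int_{\pi/2}^{\infty} \cos(x) \cos(z/x)\, z x^{-2}\,dx \right| \le z \int_{\pi/2}^{\infty} x^{-2}\,dx = \frac{2z}{\pi}.$$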

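Therefore,

$$|\Psi(z)| \le \frac{\pi z}{2} + \frac{2z}{\pi} < 3z \quad \text{for all } z > 0. \tag{6.235}$$

By (6.234) and (6.235) there is a (small) constant $c_6 > 0$ such that for all $0 < z \le c_6\ (< 1/2)$,

$$\frac{1}{4} z^2 \log^2 \frac{1}{z} < \Omega^2(z) + \Psi^2(z) < 5 z^2 \log^2 \frac{1}{z}. \tag{6.236}$$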

Now we are ready to discuss Case 2: 0 < < c6 =10 Then by Lemma 6.4 and (6.236), 2 . /

1 log.1 C

p

X 2/ 1nc6 =5

2 R˙ .n/ 2 n=2 log2 2 n

2 ; 2 n

(6.237)

where R˙ .n/ denotes the number of primary representations of x 2 2y 2 D ˙n. The special case d D 2 in (2.221) gives log.1 C 1 X R˙ .n/ D p N 1nN 2

p

2/

C O N 1=2 :

(6.238)

Combining (6.237) and (6.238) with Abel’s transformation (2.119), 2 . / with some absolute constant c7 > 0.

c7 2 D c7

(6.239)
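The average in (6.238) can be illustrated by brute force. The sketch below assumes that a primary solution of $x^2 - 2y^2 = \pm n$ is a pair of integers $v, w \ge 0$ with $1 < (v + w\sqrt{2})/|v - w\sqrt{2}| \le (1+\sqrt{2})^2$, as in the definition referred to as (2.219); a short computation shows that this condition is equivalent to "$w \ge 1$ and $v \ge 2w$" or "$1 \le v \le w$", which lets the code stay in exact integer arithmetic. The function name `count_primary` is hypothetical.

```python
import math


def count_primary(N):
    """Count integer pairs (v, w) that are primary solutions of
    |v^2 - 2 w^2| = n for some 1 <= n <= N, i.e. the sum of R_pm(n) over n <= N.

    Assumed definition of 'primary': v, w >= 0 and
    1 < (v + w*sqrt(2)) / |v - w*sqrt(2)| <= (1 + sqrt(2))**2,
    which is equivalent to (w >= 1 and v >= 2*w) or (1 <= v <= w).
    """
    # A primary solution satisfies v + w*sqrt(2) <= (1 + sqrt(2)) * sqrt(n).
    bound = int((1 + math.sqrt(2)) * math.sqrt(N)) + 2
    count = 0
    for w in range(1, bound):
        for v in range(1, bound):
            n = abs(v * v - 2 * w * w)   # never 0, since sqrt(2) is irrational
            if n <= N and (v >= 2 * w or v <= w):
                count += 1
    return count


if __name__ == "__main__":
    N = 20000
    print(count_primary(N) / N, math.log(1 + math.sqrt(2)) / math.sqrt(2))
```

Geometrically the count is the number of lattice points in a region of area $N \log(1+\sqrt{2})/\sqrt{2}$, which is where the main term of (6.238) comes from.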



On the other hand, by Lemma 6.4, (6.236), (6.238), and Lemma 6.5, 2 . /

C

24 log.1 C

p

X 2/ 1nc6 =5

2 R˙ .n/ 2 n=2 log2 2 n

2 C 2 n

X R˙ .n/ X R˙ .n/ p p O. n/ D O. / C O. / D O. /: n2 n3=2

n>c6 =5

(6.240)

n>c6 =5

By (6.239) and (6.240) there are constants $0 < c_8 < c_9$ such that

$$0 < c_8 \gamma < \sigma^2(\gamma) < c_9 \gamma \quad \text{for all } 0 < \gamma < c_6/10. \tag{6.241}$$

It remains to discuss Case 3: $c_6/10 \le \gamma \le c_3$

We show that there are constants $0 < c_{10} < c_{11}$ such that in this range of $\gamma$,

$$c_{10} < \sigma^2(\gamma) < c_{11}. \tag{6.242}$$

The upper bound is trivial from Lemma 6.4 and (6.97). To prove the lower bound, we simply choose the least complete square $m^2$ such that $z = \pi^2 \gamma m^2/2 \ge c_3$. Then by (6.230),

$$\Omega^2(z) + \Psi^2(z) \ge \frac{\pi}{8} z^{1/2} \ge c_{12} > 0,$$

and of course $R_{\pm}(m^2) \ge 1$ (since $x^2 - 2y^2 = -m^2$ has the solution $x = y = m$). Now the lower bound in (6.242) is trivial from Lemma 6.4: we just use the single term $n = m^2$. Combining (6.232), (6.233), (6.239), (6.240), and (6.242), Lemma 6.6 follows.

Concluding Remark. Lemma 6.6 tells us that in the two different ranges $0 < \gamma \le 1$ and $\gamma > 1$ we have two different exponents of $\gamma$, namely, 1 and 1/2, to describe the order of $\sigma^2(\gamma)$. Here we give an intuitive explanation for this somewhat surprising phenomenon. We recall the definition of region $H_\gamma(\sqrt{2}; N)$:

$$\left\{ (x,y) \in \mathbb{R}^2 : -\gamma \le x^2 - 2y^2 \le \gamma, \ \text{where } 0 \le y \le e^N,\ x \ge 0 \right\},$$

i.e., $H_\gamma(\sqrt{2}; N)$ denotes an exponentially long and narrow tilted "hyperbolic needle" of area $\gamma N/\sqrt{2} + O(1)$.

First assume that $\gamma$ is "very small"; say, $0 < \gamma < 10^{-2}$. Divide the region $H_\gamma(\sqrt{2}; N)$ into segments $H_1, H_2, H_3, \ldots$ such that each $H_i$ is covered by a rectangle of slope $1/\sqrt{2}$ and area 1/5. (Note that slope $1/\sqrt{2}$ comes from $x^2 = 2y^2$, which is equivalent to $y/x = \pm 1/\sqrt{2}$; on the other hand, area 1/5 comes from Lemma 5.5.) Then the area of each segment $H_i$ is about $\gamma \log(1/\gamma)$, and the number of segments $H_i$ is about $N/\log(1/\gamma)$. By Lemma 5.5, each translate $H_i + \mathbf{v}$, $\mathbf{v} \in \mathbb{R}^2$, contains at most one lattice point. Note that $H_i$ and $H_{i+k}$ have dramatically different shapes as the gap $k$ is increasing: the change is larger than $k$ times iterated doubling–halving (doubling in one direction, halving in another direction). Therefore, it is plausible to assume that the occurrence of a lattice point in $H_i + \mathbf{v}$ and in $H_{i+k} + \mathbf{v}$, as $\mathbf{v}$ runs in the unit square, is (almost) an independent event if $k$ is "large." By using the additivity of the variance for independent components, we have

$$\begin{aligned}
\mathrm{Variance}\left( |(H_\gamma(\sqrt{2}; N) + \mathbf{v}) \cap \mathbb{Z}^2| - \mathrm{Area}(H_\gamma(\sqrt{2}; N)) \right)
&\approx \sum_{1 \le i \le O(N/\log(1/\gamma))} \mathrm{Variance}\left( |(H_i + \mathbf{v}) \cap \mathbb{Z}^2| - \mathrm{Area}(H_i) \right) \\
&\approx \sum_{1 \le i \le O(N/\log(1/\gamma))} \mathrm{Area}(H_i) \approx \gamma \log(1/\gamma) \cdot N/\log(1/\gamma) = \gamma N,
\end{aligned}$$

which perfectly fits Lemma 6.6 for the range $0 < \gamma \le 1$.

Next assume that $\gamma$ is "very large," say, $\gamma > 10^{2}$. In this case we divide the region $H_\gamma(\sqrt{2}; N)$ into segments $H_1, H_2, H_3, \ldots$ such that each $H_i$ has area $\gamma$. The parts $H_i$, $1 \le i \le N/\sqrt{2}$, have a doubling–halving behavior: the next part $H_{i+1}$ is twice as long and half as narrow as $H_i$, which is a dramatic change in the shapes. We recall that $x_1 = x + 2y$, $y_1 = x + y$ is a basic automorphism of the quadratic form $x^2 - 2y^2$. Indeed,

$$x_1^2 - 2y_1^2 = (x + 2y)^2 - 2(x + y)^2 = -(x^2 - 2y^2).$$

Applying a proper power $k$ for any segment $H_i$, the automorphism $A^k$ with $A = \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}$ maps the long and narrow tilted region $H_i$ into a "round" shape of size about $\sqrt{\gamma} \times \sqrt{\gamma}$ (the area is $\gamma$). Since the perimeter of such a "round" shape is $O(\sqrt{\gamma})$, it is clear that

$$\mathrm{Variance}\left( |(H_i + \mathbf{v}) \cap \mathbb{Z}^2| - \mathrm{Area}(H_i) \right) \approx \sqrt{\gamma}.$$

Again assuming independence for the different parts $H_i$, we obtain

$$\mathrm{Variance}\left( |(H_\gamma(\sqrt{2}; N) + \mathbf{v}) \cap \mathbb{Z}^2| - \mathrm{Area}(H_\gamma(\sqrt{2}; N)) \right) \approx \sum_{1 \le i \le N/\sqrt{2}} \sqrt{\gamma} \approx \sqrt{\gamma}\, N,$$

which fits Lemma 6.6 for $\gamma > 1$. This completes our "intuitive understanding" of Lemma 6.6.
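The automorphism $x_1 = x + 2y$, $y_1 = x + y$ used in the Concluding Remark is simple to experiment with. The following sketch (function names are illustrative assumptions) iterates the map on an integer point and checks that the value of $x^2 - 2y^2$ keeps its absolute value and changes sign at each step.

```python
def step(x, y):
    """One application of the automorphism x1 = x + 2y, y1 = x + y."""
    return x + 2 * y, x + y


def form(x, y):
    """The quadratic form x^2 - 2 y^2."""
    return x * x - 2 * y * y


def orbit(x, y, k):
    """The points (x, y), A(x, y), ..., A^k(x, y)."""
    pts = [(x, y)]
    for _ in range(k):
        x, y = step(x, y)
        pts.append((x, y))
    return pts


if __name__ == "__main__":
    for i, (x, y) in enumerate(orbit(3, 1, 6)):
        print(i, (x, y), form(x, y))
```

Starting from $(3, 1)$ the form takes the values $7, -7, 7, \ldots$ while the points run out along the asymptote $y/x = 1/\sqrt{2}$; this is the doubling–halving mechanism that maps a long narrow segment $H_i$ to a round shape.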



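6.7 Proving the Three Lemmas: Part Two

It remains to prove Lemmas 6.4 and 6.5. We begin with the

Proof of Lemma 6.5. In view of (6.229) it is natural to study the integral

$$I = \int_{\sqrt{z}}^{\infty} \cos\!\left( x + \frac{z}{x} \right) z x^{-2}\,dx; \tag{6.243}$$

also, we assume $z > 1$. We make the substitution $x = \sqrt{z} + y$:

$$\begin{aligned}
x + \frac{z}{x} &= (\sqrt{z} + y) + \frac{z}{\sqrt{z} + y} = (\sqrt{z} + y) + \frac{\sqrt{z}}{1 + \frac{y}{\sqrt{z}}} \\
&= (\sqrt{z} + y) + \sqrt{z} \left( 1 + \sum_{k=1}^{\infty} \left( -\frac{y}{\sqrt{z}} \right)^k \right) \\
&= 2\sqrt{z} + \frac{y^2}{\sqrt{z}} \left( 1 + \sum_{k=1}^{\infty} \left( -\frac{y}{\sqrt{z}} \right)^k \right).
\end{aligned} \tag{6.244}$$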

Before applying (6.244), first we split the integral (6.243) into two parts: I D I1 C I2 where Z I1 D

p zCz

cos.x C

p z

Z I2 D

z / zx 2 dx and x

(6.245)

z / zx 2 dx; x

(6.246)

1

p

cos.x C zCz

where the value of the constant parameter in 1=4 < < 1=2 will be specified later (note in advance that D 7=24 will be a good choice). To evaluate the integral p in (6.245), we use the substitution y D x z and (6.244), and also use the trigonometric identity cos.˛ C ˇ/ D cos.˛/ cos.ˇ/ sin.˛/ sin.ˇ/ as follows: Z I1 D

p zCz

p z

cos.x C

z / zx 2 dx D x


Z

z

D 0


!! 1 X y2 z y k cos 2 z C p 1 C dy D p p z z . z C y/2 p

kD1

p D cos.2 z/

Z

z

0

Z

p

z

sin.2 z/ 0

!! 1 X y2 y k 1 cos p 1 C p dy p z z .1 C y= z/2 kD1

!! 1 X y2 1 y k sin p 1 C p dy: p z z .1 C y= z/2

(6.247)

kD1

Making the substitution u D yz1=4 in (6.247), we have Z I1 D Z

p D z1=4 cos.2 z/

p zCz

cos.x C

p z

z1=4

cos u2 1 C 0

1=4

z

p sin.2 z/

Z

z / zx 2 dx D x

1 X

!! .uz1=4 /k

kD1 1 X .uz1=4 /k 1C

z1=4

sin u

2

0

!!

kD1

To evaluate (6.248), we use Lemma 6.8. We have Z 1 Z cos.u2 / d u D 0

1 0

1 du .1 C uz1=4 /2 1 d u: .1 C uz1=4 /2 (6.248) t u

p sin.u2 / d u D p ; 2 2

(6.249)

and for any M > 1, ˇZ ˇ ˇ ˇ

1

M

ˇ ˇZ ˇ ˇ ˇ 2 ˇˇ 1 2 2 ˇ cos.u / d uˇ < 2 ; ˇ sin.u / d uˇˇ < 2 : M M M 2

(6.250)
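The value of the Fresnel integrals in (6.249) can be checked by direct quadrature. The sketch below integrates $\cos(u^2)$ and $\sin(u^2)$ over $[0, M]$ with Simpson's rule and compares the result with $\sqrt{\pi}/(2\sqrt{2})$; the helper names and the step count are illustrative assumptions. The comparison uses only the fact that the tail beyond $M$ is at most $1/M$ in absolute value, which follows from a single integration by parts and is not meant to reproduce the exact constant in (6.250).

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0


# Common limit of both Fresnel integrals, as in (6.249).
LIMIT = math.sqrt(math.pi) / (2.0 * math.sqrt(2.0))


def fresnel_partial(M, n=200000):
    """Integrals of cos(u^2) and sin(u^2) over [0, M]."""
    c = simpson(lambda u: math.cos(u * u), 0.0, M, n)
    s = simpson(lambda u: math.sin(u * u), 0.0, M, n)
    return c, s


if __name__ == "__main__":
    for M in (5.0, 10.0):
        c, s = fresnel_partial(M)
        print(M, c - LIMIT, s - LIMIT)
```

The printed differences oscillate and shrink roughly like $1/M$, which is the slow, conditionally convergent behavior that makes a tail estimate of the type (6.250) necessary.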

Remark. The two integrals in (6.249) are the so-called Fresnel integrals. For the sake of completeness we include a proof.

Proof of Lemma 6.8. To prove (6.249) we use Cauchy's integral theorem for complex variables. Let $\Gamma = \Gamma_1 \cup \Gamma_2 \cup \Gamma_3$ be the closed curve, where $\Gamma_1$ is the interval $[0, R]$ on the real axis; $\Gamma_2$ is the arc $Re^{i\vartheta}$ where $0 \le \vartheta \le \pi/4$, of course $i = \sqrt{-1}$, and $\Gamma_3$ is the line segment $\{ re^{i\pi/4} : R \ge r \ge 0 \}$ returning to the origin. Since $f(w) = e^{-w^2}$ is an analytic function (where $w = x + iy$), by Cauchy's theorem,

$$0 = \oint_{\Gamma} f(w)\,dw = \sum_{j=1}^{3} \int_{\Gamma_j} f(w)\,dw.$$

We have

$$\int_{\Gamma_1} f(w)\,dw = \int_0^R e^{-x^2}\,dx \to \int_0^\infty e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2} \quad \text{as } R \to \infty,$$

$$\int_{\Gamma_2} f(w)\,dw \to 0 \quad \text{as } R \to \infty,$$

$$\int_{\Gamma_3} f(w)\,dw = -\frac{1+i}{\sqrt{2}} \int_0^R e^{-ix^2}\,dx \to -\frac{1+i}{\sqrt{2}} \int_0^\infty (\cos(x^2) - i \sin(x^2))\,dx$$

as $R \to \infty$. Summarizing, with $R \to \infty$ we have

$$0 = \frac{\sqrt{\pi}}{2} - \frac{1+i}{\sqrt{2}} \int_0^\infty (\cos(x^2) - i \sin(x^2))\,dx,$$

and (6.249) follows. Next we prove (6.250). We work with sin; the same argument works for cos. Let m be the least integer such that m M 2 . We have Z

Z

1

.m/1=2

2

2

sin.x / dx D M

sin.x / dx C M

1 X

Aj

j D0

where Z Aj D

..mCj C1//1=2

sin.x 2 / dx: ..mCj //1=2

P Notice that 1 j D0 Aj is an alternating series such that jAj j jAj C1 j and Aj ! 0 as j ! 1. Thus we have ˇ ˇ ˇ1 ˇ ˇX ˇ ˇ ˇ A j ˇ jA0 j; ˇ ˇj D0 ˇ



and so ˇZ ˇ ˇ ˇ

1 M

ˇ Z ˇ sin.x 2 / dx ˇˇ

..mC1//1=2

j sin.x 2 /j dx M

..m C 1//1=2 M .M 2 C 2/1=2 M
1, we have Z I1 D

p zCz

p z

cos.x C

z / zx 2 dx D x

(6.254)



p p p D p z1=4 cos.2 z/ sin.2 z/ C O z1=12 ; 2 2

(6.255)

which gives a good estimate for the first integral I1 in (6.245). It remains to estimate the second integral I2 in (6.245): Z I2 D

1 p

cos.x C zCz

z / zx 2 dx with D 7=24: x

(6.256)

To estimate $I_2$ we apply a general lemma about exponential sums.

Lemma 6.9. Let $F(x)$ and $G(x)$ be real-valued functions, $F$ is differentiable with derivative $F'$, $F(x)$ and $F'(x)/G(x)$ are both monotonic throughout the interval $a \le x \le b$. Then

$$\left| \int_a^b e^{iF(x)} G(x)\,dx \right| \le 2 \left( \left| \frac{G(a)}{F'(a)} \right| + \left| \frac{G(b)}{F'(b)} \right| \right).$$

Remark. This is a standard tool in analytic number theory; nevertheless, for the sake of completeness, we include a proof.

Proof. The basic idea is the same as that of the simpler inequality (6.249). Suppose, for example, that $F(x)$ is monotone increasing, i.e., $F'(x) > 0$ for $a \le x \le b$. Let $F^{-1}$ denote the inverse function to $F$; it is also increasing. Applying the substitution $x = F^{-1}(u)$,

$$\int_a^b e^{iF(x)} G(x)\,dx = \int_{F(a)}^{F(b)} e^{iu} \frac{G(F^{-1}(u))}{F'(F^{-1}(u))}\,du = \int_{F(a)}^{F(b)} e^{iu} h(u)\,du \quad \text{with } h(u) = \frac{G(F^{-1}(u))}{F'(F^{-1}(u))}; \tag{6.257}$$

note that $h(u)$ is a monotone function. By integration by parts,

$$\int_{F(a)}^{F(b)} e^{iu}\,dh(u) + \int_{F(a)}^{F(b)} i e^{iu} h(u)\,du = e^{iF(b)} h(F(b)) - e^{iF(a)} h(F(a)). \tag{6.258}$$

The first integral is estimated from above as follows:

$$\left| \int_{F(a)}^{F(b)} e^{iu}\,dh(u) \right| \le \left| \int_{F(a)}^{F(b)} 1\,dh(u) \right| = |h(F(b)) - h(F(a))|. \tag{6.259}$$

Combining (6.257)–(6.259), Lemma 6.9 follows. □

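To estimate (6.256) we use Lemma 6.9 with

$$F(x) = x + \frac{z}{x} \quad \text{and} \quad G(x) = z x^{-2}.$$

Then

$$F'(x) = 1 - \frac{z}{x^2} \quad \text{and} \quad \frac{F'(x)}{G(x)} = \left( 1 - \frac{z}{x^2} \right) \frac{x^2}{z} = \frac{x^2 - z}{z},$$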

and both are positive for x >

p p z. If x > z C z with D 7=24, then 1

F 0 .x/ x2 z 2z 2 C 1 D > D z 2 ; G.x/ z z and by Lemma 6.9, ˇZ ˇ ˇ 1 ˇ z 1 ˇ ˇ 2 cos.x C / zx dx ˇ 2z 2 D 2z5=24 : jI2 j D ˇ p ˇ zCz ˇ x

(6.260)

Combining (6.243), (6.245), (6.255), and (6.260), we obtain for z > 1, Z

1 p

cos.x C z

z / zx 2 dx D x

p p p D p z1=4 sin.2 z/ cos.2 z/ C O z1=24 : 2 2

(6.261)

Using (6.261) and (6.229), we have the asymptotic formula in Lemma 6.8 for ‰.z/ with z > 1. If 0 < z 1, then we just use the trivial estimation in (6.229): ˇZ ˇ ˇ 1 p z 2 ˇˇ ˇ j‰.z/j ˇ p cos x C zx dx ˇ C j sin.2 z/=2j ˇ z ˇ x Z z

1

p

z

x 2 dx C

p p p z z D p C z D 2 z: z

This completes the proof of Lemma 6.8 for ‰.z/. Next we discuss .z/, see (6.227). Using the trigonometric identity z z z 2 sin.x/ cos. / D sin.x C / C sin.x / x x x



in (6.227), we have Z .z/ D 2

1

p

p sin.x/ cos.z=x/zx 2 dx sin2 . z/ D

z

Z D Z C

1

p

sin.x C z

1

p

sin.x z

z /zx 2 dxC x

p z /zx 2 dx sin2 . z/: x

(6.262)

The first integral Z I D

1

sin.x C

p z

z /zx 2 dx x

is analogous to (6.243), so, not surprisingly, we just repeat the arguments above. Similarly to (6.245), I D I1 C I2 where Z I1 D

p zCz p

sin.x C z

Z I2 D

z / zx 2 dx and x

1 p zCz

sin.x C

z / zx 2 dx; x

and similarly to (6.248) I1 D z

1=4

p sin.2 z/

Z

z1=4 2

cos u

1C

0

1=4

Cz

p cos.2 z/

Z

1 X

!! .uz

1=4 k

/

kD1

z1=4

sin u

2

1C

0

1 X

!! 1=4 k

.uz

/

kD1

1 du .1 C uz1=4 /2

1 d u: .1 C uz1=4 /2

By using Lemma 6.9 as above, we eventually obtain the following analog of (6.261): for z > 1, Z

1

p

sin.x C z

z / zx 2 dx D x

p p p D p z1=4 sin.2 z/ C cos.2 z/ C O z1=24 : 2 2

(6.263)


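Next we estimate the second integral in (6.262):

$$\int_{\sqrt{z}}^{\infty} \sin\!\left( x - \frac{z}{x} \right) z x^{-2}\,dx.$$

Now we apply Lemma 6.9 with

$$F(x) = x - \frac{z}{x} \quad \text{and} \quad G(x) = z x^{-2}.$$

Then

$$F'(x) = 1 + \frac{z}{x^2} \quad \text{and} \quad \frac{F'(x)}{G(x)} = \left( 1 + \frac{z}{x^2} \right) \frac{x^2}{z} = \frac{x^2 + z}{z},$$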

and by Lemma 6.9, ˇ ˇZ ˇ ˇ 1 z ˇ ˇ sin.x / zx 2 dx ˇ 2: ˇ p ˇ ˇ zCz x

(6.264)

By (6.262)–(6.264) we obtain the asymptotic formula in Lemma 6.8 for .z/ with z > 1. If 0 < z 1, then we just use the trivial estimation in (6.262): Z j.z/j 2z

1 p z

p p z x 2 dx C sin2 . z/ 2 p C z D 3 z: z t u

This completes the proof of Lemma 6.8. Next we discuss the Proof of Lemma 6.4. By (6.220) and Parseval’s formula, Z 1Z

1

2 j.HK;L . / C v/ \ ZZ2 j Area.HK;L . // d v D

0 0

X

D

2

n2ZZ Wn¤0

C

1 .2n21 n22 /2

X 2

n2ZZ Wn¤0

1 2 .2n1 n22 /2

Z

.n1

.n1

Z

p

p

2Cn2 /K=

.n1 .n1

2Cn2 /L=

p 2

p 2

cos.u/ sin

p p 2Cn2 /L= 2

p

2Cn2 /K=

p 2

2

sin.u/ sin

.2n21

2

.2n21

!2

n22 /=2u

du

n22 /=2u

C !2

du

:

(6.265)

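Equation (6.265) displays the integrals

$$\int_a^b \cos(u) \sin\!\left( \frac{z}{u} \right) du \quad \text{and} \quad \int_a^b \sin(u) \sin\!\left( \frac{z}{u} \right) du \tag{6.266}$$

with

$$a = a(\mathbf{n}) = \pi (n_1\sqrt{2} + n_2) K/\sqrt{2}, \quad b = b(\mathbf{n}) = \pi (n_1\sqrt{2} + n_2) L/\sqrt{2}, \quad z = \pi^2 \gamma (2n_1^2 - n_2^2)/2. \tag{6.267}$$

Clearly

$$\int_a^b \cos(u) \sin\!\left( \frac{z}{u} \right) du = \Omega(z) - \int_0^a \cos(u) \sin\!\left( \frac{z}{u} \right) du - \int_b^\infty \cos(u) \sin\!\left( \frac{z}{u} \right) du, \tag{6.268}$$

and

$$\int_a^b \sin(u) \sin\!\left( \frac{z}{u} \right) du = \Psi(z) - \int_0^a \sin(u) \sin\!\left( \frac{z}{u} \right) du - \int_b^\infty \sin(u) \sin\!\left( \frac{z}{u} \right) du. \tag{6.269}$$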

To estimate the tail integrals in (6.268) and (6.269), we use the simple

Lemma 6.10. If $0 < a < b < \infty$ and $z > 0$ then

$$\left| \int_0^a \cos(u) \sin\!\left( \frac{z}{u} \right) du \right| \le a, \qquad \left| \int_0^a \sin(u) \sin\!\left( \frac{z}{u} \right) du \right| \le a,$$

$$\left| \int_b^\infty \cos(u) \sin\!\left( \frac{z}{u} \right) du \right| \le \frac{2z}{b}, \qquad \left| \int_b^\infty \sin(u) \sin\!\left( \frac{z}{u} \right) du \right| \le \frac{2z}{b}.$$

Proof. The first line is trivial. To prove the second line, we apply integration by parts:

$$\left| \int_b^\infty \cos(u) \sin\!\left( \frac{z}{u} \right) du \right| = \left| -\sin(b) \sin(z/b) + \int_b^\infty \sin(u) \cos\!\left( \frac{z}{u} \right) z u^{-2}\,du \right| \le (z/b) + z \int_b^\infty u^{-2}\,du = \frac{2z}{b}.$$

Similar argument works for the other one. □


To prove Lemma 6.4, we basically repeat the proof of Proposition 2.20, or, what is very similar, the proof of Proposition 3.2 (in the special case $\alpha = \sqrt{2}$). In fact, what we are going to do next is a somewhat simpler version. Let $A > 0$ be a positive integer; if $x = v \ge 0$, $y = w \ge 0$ is a primary solution of $x^2 - 2y^2 = \pm A$, then by definition [see (2.219)]

$$\pm A = v^2 - 2w^2 = (v + w\sqrt{2})(v - w\sqrt{2}) \quad \text{with} \quad 1 < \frac{v + w\sqrt{2}}{|v - w\sqrt{2}|} \le (1 + \sqrt{2})^2,$$

implying

$$\sqrt{A} < v + w\sqrt{2} \le (1 + \sqrt{2}) \sqrt{A}. \tag{6.270}$$

It follows from the classical product formula (2.213) that for every integer $j$,

$$(v + w\sqrt{2})(1 + \sqrt{2})^j = X + Y\sqrt{2} \quad \text{gives a solution } x = X,\ y = Y \text{ of } x^2 - 2y^2 = \pm A. \tag{6.271}$$

Now let's return to (6.265). Let $A > 0$ be a fixed integer; write $z = \pi^2 \gamma A/2$. If $2n_1^2 - n_2^2 = \pm A$ then $n_1\sqrt{2} + n_2 = (v + w\sqrt{2})(1 + \sqrt{2})^j$ for some integer $j$. We begin with [see (6.267)]

Case 1: Suppose that

$$0 < a = a(\mathbf{n}) = \pi (n_1\sqrt{2} + n_2) K/\sqrt{2} < 1, \quad \text{and} \quad b = b(\mathbf{n}) = \pi (n_1\sqrt{2} + n_2) L/\sqrt{2} > z = \pi^2 \gamma A/2.$$

By using Lemma 6.10 in (6.268) and (6.269),

$$\int_a^b \cos(u) \sin\!\left( \frac{z}{u} \right) du = \Omega(z) + O(a) + O(z/b),$$

$$\int_a^b \sin(u) \sin\!\left( \frac{z}{u} \right) du = \Psi(z) + O(a) + O(z/b),$$

and so

$$\begin{aligned}
\left( \int_a^b \cos(u) \sin\!\left( \frac{z}{u} \right) du \right)^2 + \left( \int_a^b \sin(u) \sin\!\left( \frac{z}{u} \right) du \right)^2
&= \Omega^2(z) + \Psi^2(z) + O(a + z/b)\left( |\Omega(z)| + |\Psi(z)| \right) \\
&= \Omega^2(z) + \Psi^2(z) + O(a + z/b) \sqrt{\Omega^2(z) + \Psi^2(z)},
\end{aligned} \tag{6.272}$$

where in the last step we used the Cauchy–Schwarz inequality.


By (6.271), for every fixed integer A > 0 there are as many as log.b=a/ 2 log A log.L=K/ 2 log A p C O.1/ D p C O.1/ log.1 C 2/ log.1 C 2/

(6.273)

p p p integer values of j such that n1 C n2 2 D .v C w 2/.1 C 2/j satisfies the conditions of Case 1. The total contribution of Case 1 with a fixed integer A > 0 (i.e., 2n21 n22 D ˙A) in (6.265) is equal to 2

A

log.L=K/ 2 log A p C O.1/ 2 .z/ C ‰ 2 .z/ C log.1 C 2/

p C O A2 2 .z/ C ‰ 2 .z/ ;

(6.274)

where z D 2 A=2. Note that (6.274) is a consequence of (6.272) and (6.273); also, the error term comes from a convergent geometric series [due to the exponential nature of (6.271) and the effect of the factor .a.n/ C z=b.n// in (6.272)]. For a fixed integer A > 0, the contribution of Case p 1 represents thepoverwhelming p majority in (6.265): the rest of the j s with n1 C n2 2 D .v C w 2/.1 C 2/j make a total contribution p A2 O.log A/ 2 .z/ C ‰ 2 .z/ D A2 O.log A/O. A/I

(6.275)

this is a corollary of Lemma 6.10. Following the proof of Proposition 2.20 (or Proposition 3.2), we split the big sum (6.265) into two parts depending on a threshold M D .log.L=K//c (where the value of the constant c > 1 in the exponent will be specified soon): X 1

X

D

2

n2ZZ Wn¤0 j2n21 n22 jM

X

C

2


1 2 .2n1 n22 /2

1 .2n21 n22 /2

Z

Z

!2

b.n/

cos.u/ sin.z.n/=u/ d u

C

a.n/

!2

b.n/

sin.u/ sin.z.n/=u/ d u

;

(6.276)

a.n/

and X 2

D

X 2

n2ZZ Wn¤0 j2n21 n22 j>M

1 .2n21 n22 /2

Z

!2

b.n/

cos.u/ sin.z.n/=u/ d u a.n/

C


X

C

2

n2ZZ Wn¤0 j2n21 n22 j>M

1 2 .2n1 n22 /2

Z


!2

b.n/

sin.u/ sin.z.n/=u/ d u

;

(6.277)

a.n/

p p p where b.n/ D 2.n1 2 C n2 /L, a.n/ D .n1 2 C n2 /K= 2, z.n/ D 2 .2n21 n22 /=2. By Lemma 6.5, (6.274), and (6.275), X 2

! X R˙ .A/ p DO .log.L=K/ C O.log A// A : A2 A>M

Using the upper bound with the divisor function 0 R˙ .A/ .A/ D Ao.1/ , by (6.278) we have X

D log.L=K/ O

2

X

(6.278) P d jA

1D

! A3=2Co.1/

D log.L=K/ O.M 1=3 / D O.1/;

A>M

(6.279) if M D .log.L=K//c with c D 3. Returning to (6.276), by (6.274) and (6.275), X

X

D

1

2


X

C

2


D

.2n21

log.L=K/ C O.log j2n21 n22 j/ X 1 .n/C p 2 2 0 n2 / log.1 C 2/

1=2 O.1/ 2 . 2 .2n21 n22 /=2/ C ‰ 2 . 2 .2n21 n22 /=2/ D 2 2 2 .2n1 n2 /

X 4 log.L=K/ X R˙ .n/ 2 2 . n=2/ C ‰ 2 . 2 n=2/ C ; p 2 3 log.1 C 2/ 1nM n (6.280)

where X 0

.n/ D 2 . 2 .2n21 n22 /=2/ C ‰ 2 . 2 .2n21 n22 /=2/

and X 3

D

X R˙ .n/ O.log n/ 2 . 2 n=2/ C ‰ 2 . 2 n=2/ : 2 n 1nM

(6.281)



By (6.97), X 3

D O.1/:

(6.282)

Again by (6.97), X R˙ .n/ 2 . 2 n=2/ C ‰ 2 . 2 n=2/ D 2 n n>M X

DO

! 3=2Co.1/

n

D O.M 1=3 / D O .log.L=K//1 ;

(6.283)

n>M

since M D .log.L=K//3 . By (6.279)–(6.283), 1 4 log.L=K/ X R˙ .n/ 2 2 . n=2/ C ‰ 2 . 2 n=2/ C O.1/: p 2 1 2 log.1 C 2/ nD1 n (6.284) By (6.265), (6.276), (6.277), and (6.284),

X

C

Z 1Z 0

1

X

D

2 j.HK;L . / C v/ \ ZZ2 j Area.HK;L . // d v D 2 . / log.L=K/CO.1/;

0

(6.285)

where 2 . / D

4 log.1 C

p

1 X R˙ .n/ 2 2 . n=2/ C ‰ 2 . 2 n=2/ : 2 2/ nD1 n

(6.286)

Since P0 and Œ0; 1/2 are equivalent modulo one [see (6.213) and (6.214)], Lemma 6.4 follows from (6.285) and (6.286). This completes the proof of Theorem 5.4. t u

6.8 Starting the Proof of Theorem 5.6 The proof is based on Lemma 6.3 and a general form of the law of the iterated logarithm (LIL) in probability theory (see Feller’s theorem below). We apply Lemma 6.3 for every integer j 20 with the following choice of parameters. Let 1 D 1 .j / D 3 2j C 1; 2 D 2 .j / D 3 2j C1 I

(6.287)


moreover, let i D i.j / denote the integer satisfying the inequality 2i j 3 C j 2 < 2i C1 ;

(6.288)

and define k D kj and ` D `j such that kj C 3`j D 3 2i and `j D b22i=3 c:

(6.289)

So `j is in the range of j 2 ; formally, `j j 2 ; furthermore, kj j 3 and kj is divisible by 3. Finally, let d D dj D 1 .j / 1 `j 3 2i `j :

(6.290)

Combining (6.287)–(6.290), we have 2 .j / dj `j D 3 2j C1 3 2j D 3 2j D 2j i .kj C 3`j / D bj .kj C 3`j /; i:e:; the choice b D bj D 2j i satisfies (6.78):

(6.291)

Note that bj is in the range of 2j =j 3 : bj 2j =j 3 . By Lemma 6.3, for every j 20 there exist two sequences of Rademacher like functions such that the first sequence 'j;1; 'j;2 ; : : : ; 'j;bj has type rj;0 D dj < rj;1 < rj;2 < : : : < rj;bj where $ rj;h D

% p log.1 C 2/ .dj C .3h 1/`j C hkj / for 1 h bj ; log 2

(6.292)

the second sequence ' j;1 .x/; ' j;2 .x/; : : : ; ' j;bj .x/ has type r j;0 < r j;1 < r j;2 < : : : < r j;bj where $

p % log.1 C 2/ 1 r j;h D dj C .3h C 1/`j C .h C /kj for 0 h bj ; log 2 3 (6.293) and the usual extensions ˆj;h , ˆj;h , 1 h bj , defined in (6.62) and (6.74) have the following approximation property:



f .1 .j /; dj C `j C h.kj C 3`j /I v/ D

h X

ˆj;h .v/ C ˆj;h .v/

sD1

for all integers 1 h bj and for all v 2 P0 .r bj I a/

(6.294)

with the possible exception of at most bj ".I kj ; `j /2r bj integers a in 0 a < 2r bj , where ".I k; `/ is defined in (6.83). Note that (6.294) follows from (6.84) and (6.85). The special case h D bj in (6.294) is particularly useful: it gives f .1 .j /; dj C `j C bj .kj C 3`j /I v/ D f .1 .j /; 2 .j /I v/:

(6.295)

Next we apply Lemma 6.7: for every j 20 we have ˇ0 ˇ 11=2 ˇ ˇ bj ˇ

X p 1=2 ˇˇ ˇ@ 2 A ˆj;h .v/ C ˆj;h .v/ . /bj .kj C 3`j / log.1 C 2/ C O.1/ ˇ Var ˇ ˇ ˇ ˇ ˇ hD1

104 .1 C 2 / C bj ".I kj ; `j /104 .1 C 2 /.2 .j / 1 .j / C 1/C q C

bj ".I kj ; `j /104 .1 C 2 /.2 .j / 1 .j / C 1/;

(6.296)

where ".I k; `/ is defined in (6.83). Furthermore, ˇ

1=2 ˇˇ p ˇ ˇ Varv2P ˆj;h .v/ 1=2 2 . /3`j log.1 C 2/ C O.1/ ˇ 0 ˇ ˇ 104 .1 C 2 / C ".I kj ; `j /104 .1 C 2 /3`j C

q

".I kj ; `j /104 .1 C 2 /3`j ; (6.297)

and, finally, ˇ 0 1ˇ ˇ ˇ bj X ˇ ˇ ˇEv2P @f .1 .j /; 2 .j /I v/ ˇ A ˆ .v/ C ˆ .v/ j;h j;h 0 ˇ ˇ ˇ ˇ hD1 ".I kj ; `j /104 .1 C 2 /bj2 .kj C 3`j /:

(6.298)

rj C1;0 rj;bj :

(6.299)

We claim



Indeed, in view of (6.292) it suffices to check the inequality dj C .3bj 1/`j C bj kj dj C1 :

(6.300)

We can derive (6.300) from (6.287)–(6.291) as follows: dj C .3bj 1/`j C bj kj D .dj C `j / C bj .kj C 3`j / 2`j D 2 .j / 2`j D 3 2j C1 2`j ;

and dj C1 D 3 2j C1 `j C1 ; so it remains to show that `j C1 2`j , and it is trivial from the definition of `j [see (6.288) and (6.289)] and the fact .j C 1/3 C .j C 1/2 < 2 for j 4: j3 C j2 For every j 20 and 1 hj bj let Xm D Xm .v/ D ˆj;hj .v/; v 2 P0 where m D

X

b C hj :

(6.301)

20 <j

Inequality (6.299) implies that the infinite sequence X1 ; X2 ; X3 ; : : : represents independent random variables:

(6.302)

Next we prove the following analog of (6.299): r j C1;0 r j;bj :

(6.303)

Indeed, in view of (6.293) it suffices to check the inequality 1 1 dj C .3bj C 1/`j C .bj C /kj dj C1 C `j C1 C kj C1 : 3 3

(6.304)

We can derive (6.304) from (6.287)–(6.291) as follows: 1 1 1 1 dj C.3bj C1/`j C.bj C /kj D .dj C`j /Cbj .kj C3`j /C kj D 2 .j /C kj D 32j C1 C kj ; 3 3 3 3

and 1 1 dj C1 C `j C1 C kj C1 D 3 2j C1 C kj C1 ; 3 3



so it remains to show that kj C1 kj , and it is trivial from the definition of kj [see (6.289)] kj D 3 2i.j / b22i.j /=3c and the fact i.j C 1/ i.j /: For every j 20 and 1 hj bj let X

Ym D Ym .v/ D ˆj;hj .v/; v 2 P0 where m D

b C hj :

(6.305)

20 <j

Inequality (6.303) implies that the infinite sequence Y1 ; Y2 ; Y3 ; : : : represents independent random variables:

(6.306)

We use the elementary fact that given arbitrary constants c1 > 1 and c2 < 1, the inequality j2

c1 > 2c2 j

(6.307)

holds for every sufficiently large integer j . Combining this elementary fact with (6.83) and the definitions of kj ; `j ; bj (see (6.287)–(6.291), in particular, `j j 2 , kj j 3 , and bj 2j =j 3 ), we obtain via routine calculations bj ".I kj ; `j / D O

1 j2

:

P Since j 1 1=j 2 is convergent, the Borel–Cantelli lemma and (6.294) and (6.295) imply the following [we use the notation of (6.301) and (6.305)]: for almost every ˇ 2 Œ0; 1/, with v.ˇ/ D .ˇ; 0/ we have that the sum

m X

.Xn .v.ˇ// C Yn .v.ˇ/// D

nD1

b X X ' ;h .ˇ/ C ' ;h .ˇ/ C 20 <j hD1

C

hj X 'j;s .ˇ/ C ' j;s .ˇ/ differs from sD1

X

f .1 . /; 2 . /I v.ˇ// C f .1 .j /; 3 2j C 3hj 2i.j / I v.ˇ// by not more

20 <j

than an absolute constant C . I ˇ/ < 1:

(6.308)

We emphasize that, for almost every ˇ 2 Œ0; 1/, the same C . I ˇ/ holds in (6.308) simultaneously for all m 1.



The next step is to apply a deep theorem from probability theory: it is Feller’s general form of the LIL (see Feller [Fe3]). Feller’s Theorem. Let Z1 ; Z2 ; Z3 ; : : : be an infinite sequence of independent random variables such that the upper bounds (Var stands for variance) max jZm j ƒm .Var.Z1 C Z2 C : : : C Zm //1=2 hold for some sequence ƒm , m 1. Let m , m 1, denote an increasing sequence 1 1 2 3 : : :. Assume that ƒm D O 3 m :

(6.309)

If the series 1 X

VarZm 2 m e m =2 is divergent Var.Z C Z C : : : C Z / 1 2 m mD1

(6.310a)

then with probability one m X

.Zs EZs / > m .Var.Z1 C Z2 C : : : C Zm //1=2

sD1

hold for infinitely many integers m; on the other hand, if the series 1 X

VarZm 2 m e m =2 is convergent Var.Z1 C Z2 C : : : C Zm / mD1

(6.310b)

then with probability one m X

.Zs EZs / m .Var.Z1 C Z2 C : : : C Zm //1=2

sD1

hold for all sufficiently large integers m. The same statements hold for the negative side m X

.Zs EZs / < m .Var.Z1 C Z2 C : : : C Zm //1=2 :

sD1

The deduction of Theorem 5.6 from Feller’s theorem is similar to how we derived Theorem 5.4 from the Berry–Esseen theorem in Sect. 6.5. A novelty is that we heavily use the fact that log log x is a very slowly changing function (see, e.g., Lemma 6.11), and also the different choice of the parameters in Lemma 6.3 leads to different estimations.



The key step in the proof is to apply Feller’s theorem to the infinite sequence $X_1, X_2, X_3, \ldots$ defined in (6.301):
\[
X_m = X_m(v) = \Phi_{j,h_j}(v),\quad v \in P_0, \qquad\text{where}\quad m = \sum_{20 \le \nu < j} b_\nu + h_j.
\]
This means we have to estimate the variance
\[
\mathrm{Var}(X_1 + X_2 + \cdots + X_m) = \sum_{n=1}^{m} \mathrm{Var}(X_n).
\]

Formula (6.291) implies that bj .kj C 3`j / D 3 2j , so by (6.296) for every 20,

Var

b X

ˆ ;h .v/ C ˆ ;h .v/

!1=2

p 1=2 D 2 . /3 2 log.1 C 2/ C O .1/;

hD1

(6.311) where we used the definitions (6.287)–(6.291) and the definition of ".I kj ; `j / [see (6.83)]:

2 p p p ".I kj ; `j / D 108 .1C 2 /4kj 1 C .1 C 2/`j C1 .1C 2/`j C1 C400.1C 2/`j =2 C

kj p p p kj C1 2 3 C10 .1 C /12`j 1 C .1 C 2/ .1 C 2/ 3 C1 C 400 .1 C 2/kj =6 ; 8

2

and applied the elementary fact (6.307). Repeating the same argument for (6.297), we have p 1=2 1=2 2 Varv2P0 ˆ ;h .v/ D . /3` log.1 C 2/ C O .1/:

(6.312)

We recall (6.130): ˇ !1=2 ˇˇ !1=2 ˇ b b X X ˇ ˇ ˇ ˇ Var Var.ˆ ;h / ˆ ;h C ˆ ;h ˇ ˇ ˇ ˇ hD1 hD1 .2b 1/ b Var.ˆ ;1 / 1=2 C pb Pb Var hD1 ˆ ;h C ˆ ;h

q Var.ˆ ;1 /:

(6.313)

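Combining (6.311), (6.312), and (6.313), and using the fact that $b_\nu$ and $\ell_\nu$ are, respectively, in the range of $2^\nu/\nu^3$ and $\nu^2$, we have
\[
\begin{aligned}
\left(\sum_{h=1}^{b_\nu} \mathrm{Var}(\Phi_{\nu,h})\right)^{1/2}
&= \left(\sigma^2(\gamma)\,3\cdot 2^\nu \log(1+\sqrt2)\right)^{1/2} + O_\gamma(1) + O\!\left(b_\nu \ell_\nu 2^{-\nu/2}\right) + O\!\left(\sqrt{b_\nu \ell_\nu}\right) \\
&= \left(\sigma^2(\gamma)\,3\cdot 2^\nu \log(1+\sqrt2)\right)^{1/2} + O\!\left(2^{\nu/2}/\sqrt{\nu}\right).
\end{aligned} \tag{6.314}
\]
Combining (6.314) with the equality $A^2 - B^2 = (A-B)(A+B)$, we have
\[
\left|\sum_{h=1}^{b_\nu} \mathrm{Var}(\Phi_{\nu,h}) - \sigma^2(\gamma)\,3\cdot 2^\nu \log(1+\sqrt2)\right|
\le O\!\left(2^{\nu/2}/\sqrt{\nu}\right)\cdot O\!\left(2^{\nu/2}\right) = O\!\left(2^{\nu}/\sqrt{\nu}\right). \tag{6.315}
\]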

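By using (6.315) we are now ready to estimate the variance: with $m = \sum_{20 \le \nu < j} b_\nu + h_j$ we have
\[
\begin{aligned}
\mathrm{Var}(X_1 + X_2 + \cdots + X_m)
&= \sum_{20 \le \nu < j}\ \sum_{h=1}^{b_\nu} \mathrm{Var}\,\Phi_{\nu,h} + \sum_{s=1}^{h_j} \mathrm{Var}\,\Phi_{j,s} \\
&= \sum_{20 \le \nu < j} \sigma^2(\gamma)\,3\cdot 2^\nu \log(1+\sqrt2) + \sigma^2(\gamma)\,h_j\,3\cdot 2^{i(j)} \log(1+\sqrt2) + \sum_{20 \le \nu \le j} O\!\left(2^\nu/\sqrt{\nu}\right) \\
&= \sigma^2(\gamma)\,3\left(2^j - 2^{20} + h_j 2^{i(j)}\right)\log(1+\sqrt2) + O\!\left(2^j/\sqrt{j}\right),
\end{aligned} \tag{6.316}
\]
where $i(j)$ is defined in (6.288). By (6.315) and (6.316),
\[
\frac{\mathrm{Var}\,X_m}{\mathrm{Var}(X_1 + X_2 + \cdots + X_m)}
= \frac{1}{b_j}\cdot\frac{\sigma^2(\gamma)\,3\cdot 2^j \log(1+\sqrt2) + O\!\left(2^j/\sqrt{j}\right)}{\sigma^2(\gamma)\,3\left(2^j - 2^{20} + h_j 2^{i(j)}\right)\log(1+\sqrt2) + O\!\left(2^j/\sqrt{j}\right)},
\]
which implies
\[
\frac{1}{2 b_j}\left(1 + O\!\left(1/\sqrt{j}\right)\right) \le \frac{\mathrm{Var}\,X_m}{\mathrm{Var}(X_1 + X_2 + \cdots + X_m)} \le \frac{1}{b_j}\left(1 + O\!\left(1/\sqrt{j}\right)\right). \tag{6.317}
\]
Now we recall the statement of Theorem 5.6: let $q \ge 5$ be an arbitrarily large but fixed integer, and write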



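\[
\phi_\varepsilon(n) = \left(2\log_2 n + 3\log_3 n + 2\log_4 n + \cdots + 2\log_{q-1} n + 2(1+\varepsilon)\log_q n\right)^{1/2}, \tag{6.318}
\]
where $\log_2 n = \log\log n$ means the iterated logarithm (and not the base 2 logarithm), and in general $\log_k n = \log(\log_{k-1} n)$ denotes the $k$ times iterated logarithm of $n$. Note that with this choice of $\phi_\varepsilon(n)$, the sum
\[
\sum_{n=1}^{\infty} \frac{\phi_\varepsilon(n)}{n}\, e^{-\phi_\varepsilon^2(n)/2} \approx \sum_{n} \frac{1}{n \log n\, \log_2 n\, \log_3 n \cdots \log_{q-1} n\, (\log_q n)^{1+\varepsilon}}
\]
is convergent or divergent depending on whether we have $\varepsilon > 0$ or $\varepsilon \le 0$.

Let $\gamma > 0$ be fixed; then Theorem 5.6 states that for almost every $0 \le \beta < 1$,
\[
F(\sqrt2; \beta; \gamma; e^n) > \frac{\gamma}{\sqrt2}\, n + \phi_0(n)\,\sigma(\gamma)\sqrt{n} \tag{6.319}
\]
holds for infinitely many $n$’s [i.e., if we choose $\varepsilon = 0$ in (6.318)]. Exactly the same holds for the “negative side”
\[
F(\sqrt2; \beta; \gamma; e^n) < \frac{\gamma}{\sqrt2}\, n - \phi_0(n)\,\sigma(\gamma)\sqrt{n}.
\]
On the other hand, choosing $\varepsilon > 0$ in (6.318), for almost every $0 \le \beta < 1$ we have the opposite inequality
\[
F(\sqrt2; \beta; \gamma; e^n) \le \frac{\gamma}{\sqrt2}\, n + \phi_\varepsilon(n)\,\sigma(\gamma)\sqrt{n} \tag{6.320}
\]
for all sufficiently large values of $n$, and the same holds for the “negative side”
\[
F(\sqrt2; \beta; \gamma; e^n) \ge \frac{\gamma}{\sqrt2}\, n - \phi_\varepsilon(n)\,\sigma(\gamma)\sqrt{n}.
\]
To prove (6.319), we recall (6.161) and (6.162) (with $e^n$ instead of $e^N$): for every vector $v(\beta) = (\beta, 0)$, $0 \le \beta < 1$ and $\gamma > 0$,
\[
\sum_{0 \le i \le I_0(\gamma; n)} f_i(v(\beta)) \le F(\sqrt2; \beta; \gamma; e^n) \le \sum_{0 \le i \le I_0(\gamma; n)} f_i(v(\beta)) + 10^4(1+\gamma^2), \tag{6.321}
\]
where
\[
I_0 = I_0(\gamma; n) = \frac{n + \log(2\sqrt2/\gamma)}{\log(1+\sqrt2)} - 2. \tag{6.322}
\]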



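Let $j$ denote the integer satisfying $3\cdot 2^j \le I_0(\gamma; n) < 3\cdot 2^{j+1}$, and write
\[
I_0(\gamma; n) = 3\cdot 2^j + 3h_j 2^{i(j)} + \varrho \quad\text{with}\quad 0 \le h_j < b_j \ \text{ and } \ 0 \le \varrho < 3\cdot 2^{i(j)}, \tag{6.323}
\]
where $\varrho$ is the (negligible) “remainder.” Write
\[
\kappa_1 = \kappa_1(20) = 3\cdot 2^{20} + 1 \quad\text{and}\quad \kappa_2 = 3\cdot 2^j + 3h_j 2^{i(j)}. \tag{6.324}
\]
Then
\[
\sum_{0 \le i \le I_0(\gamma; n)} f_i(v(\beta)) = f(\kappa_1, \kappa_2; v(\beta)) + f(0, \kappa_1 - 1; v(\beta)) + f(\kappa_2 + 1, I_0; v(\beta)),
\]
and so
\[
\left|\sum_{0 \le i \le I_0(\gamma; n)} f_i(v(\beta)) - f(\kappa_1, \kappa_2; v(\beta))\right| \le O\!\left(j^3\right) = O\!\left((\log n)^3\right). \tag{6.325}
\]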

Combining (6.308), (6.321), and (6.325), we have that, for almost every ˇ 2 Œ0; 1/ and every integer n 2, hj b X X X p F . 2I ˇI I e n / D ' ;h .ˇ/ C ' ;h .ˇ/ C 'j;s .ˇ/ C ' j;s .ˇ/ C 20 <j hD1

sD1

C O;ˇ .1/ C O .log n/3 : Involving the expectations in (6.326), we obtain p F . 2I ˇI I e n / p n D 2 D

C

b X X

' ;hI0 .ˇ/ C

hj X

20 <j hD1

sD1

b X X

hj X

20 <j hD1

' ;hI0 .ˇ/ C

sD1

'j;sI0 .ˇ/C

' j;sI0 .ˇ/C

(6.326)



0 CEˇ2Œ0;1/ @

b X X 20 <j hD1

1 hj X ' ;h.ˇ/ C ' ;h .ˇ/ C 'j;s .ˇ/ C ' j;s .ˇ/ A p nC 2 sD1

C O;ˇ .1/ C O .log n/3 ;

(6.327)

' ;hI0 .ˇ/ D ' ;h.ˇ/ Eˇ2Œ0;1/ ' ;h .ˇ/

(6.328)

where

means that we subtracted the expectation, and the same for ' ;hI0 .ˇ/ (the extra 0 in the index refers to zero expectation). We recall that by definition the two families ' ;h.ˇ/; 0 ˇ < 1 and ˆ ;h .v/; v 2 P0 have the same distribution, and similarly ' ;h .ˇ/; 0 ˇ < 1 and ˆ ;h .v/; v 2 P0 have the same distribution. This is why we can replace the sum in (6.327) b X X

' ;hI0 .ˇ/ C

20 <j hD1

hj X

'j;sI0 .ˇ/

sD1

with .X1 EX1 /C.X2 EX2 /C: : :C.Xm EXm / D

b X X

ˆ ;hI0 .v/ C

20 <j hD1

hj X

ˆj;sI0 .v/;

sD1

and similarly, we can replace the other sum in (6.327) b X X

' ;hI0 .ˇ/ C

20 <j hD1

hj X

' j;sI0 .ˇ/

sD1

with .Y1 EY1 /C.Y2 EY2 /C: : : C.Ym EYm / D

b X X 20 <j hD1

ˆ ;hI0 .v/ C

hj X

ˆj;sI0 .v/;

sD1

where again ˆ ;hI0 .v/ and ˆ ;hI0 .v/ mean, as usual, that we subtracted the expectations (the extra 0 in the index refers to zero expectation).



6.9 Completing the Proof of Theorem 5.6

As we already said after formulating Feller’s theorem, the key step in the proof of Theorem 5.6 is to apply Feller’s theorem to the sequence $X_1, X_2, X_3, \ldots$. We choose
\[
\lambda_m = \left(2\log_2 m + 3\log_3 m + 2\log_4 m + \cdots + 2\log_q m + 2\log_{q+1} m\right)^{1/2}, \tag{6.329}
\]
that is, in (6.318) we replace $q$ with $q+1$ and write $\varepsilon = 0$. Then the sum
\[
\sum_{m=1}^{\infty} \frac{\lambda_m}{m}\, e^{-\lambda_m^2/2} \approx \sum_{m} \frac{1}{m \log m\, \log_2 m\, \log_3 m \cdots \log_q m\, \log_{q+1} m} = \infty \tag{6.330}
\]
is divergent. This implies requirement (6.310a). Indeed, we have
\[
m = \sum_{20 \le \nu < j} b_\nu + h_j \le \sum_{20 \le \nu \le j} b_\nu
\]
and $b_j \approx j^{-3} 2^j$, which imply the existence of absolute constants $0 < C_1 < C_2$ such that $C_1 \le m/b_j \le C_2$. Combining this fact with (6.317) and (6.330), we obtain that requirement (6.310a) is satisfied with the “deviation factor” $\lambda_m$ in (6.329). Moreover, we clearly have
\[
\max |X_m| = O\!\left(j^3\right) = O\!\left((\log m)^3\right),
\]
where we use the fact that $m$ is in the range of $j^{-3} 2^j$. It follows from (6.316) that $(\mathrm{Var}(X_1 + X_2 + \cdots + X_m))^{1/2}$ is in the range of $2^{j/2}$, or equivalently in the range of $\sqrt{m}\,(\log m)^{3/2}$, so
\[
\max |X_m| = O\!\left((\log m)^3\right) = O\!\left(\Lambda_m \left(\mathrm{Var}(X_1 + X_2 + \cdots + X_m)\right)^{1/2}\right)
\]
holds with the choice $\Lambda_m = (\log m)^2 m^{-1/2}$. It follows that requirement (6.309), $\Lambda_m = O(\lambda_m^{-3})$, is trivially satisfied with the “deviation factor” $\lambda_m$ in (6.329). Since both requirements (6.309) and (6.310a) are satisfied, we can apply Feller’s theorem for $X_1, X_2, X_3, \ldots$, and obtain that with probability one (i.e., for almost every $v \in P_0$)
\[
\sum_{s=1}^{m} (X_s - EX_s) > \lambda_m \left(\mathrm{Var}(X_1 + X_2 + \cdots + X_m)\right)^{1/2} \tag{6.331}
\]
holds for infinitely many integers $m$. We extend the discrete sequence (6.329) to the continuous function



\[
\phi(x) = \left(2\log_2 x + 3\log_3 x + 2\log_4 x + \cdots + 2\log_q x + 2\log_{q+1} x\right)^{1/2}, \tag{6.332}
\]
so $\phi(m) = \lambda_m$ for (sufficiently large) positive integers $m$ (note that $\log_{q+1} x$, $q \ge 5$, is well defined only for relatively large real numbers). For later application (motivated by the fact that $b_j \approx j^{-3} 2^j$) we need to estimate the difference
\[
\phi\!\left(x(\log x)^4\right) - \phi(x).
\]
The following simple lemma is based on the fact that $\log\log x$ is a very slowly increasing function.

Lemma 6.11. For $q \ge 5$ we have
\[
\phi\!\left(x(\log x)^4\right) - \phi(x) < 8(q+1)\,\frac{\log_2 x}{\log x}.
\]
Moreover, we have
\[
\phi(x) > \phi_0(x) + \frac{\log_{q+1} x}{2\sqrt{\log_2 x}},
\]
where $\phi_0(x)$ is defined in (6.318) by choosing $\varepsilon = 0$.

Proof. Note that $\phi(y)$ is monotone increasing. By using the mean value theorem in calculus, we have
\[
\begin{aligned}
\phi(2y) - \phi(y) &\le y \max_{y \le u \le 2y} \phi'(u) \\
&= y \max_{y \le u \le 2y} \frac{2(\log u)^{-1} + 3(\log_2 u)^{-1}(\log u)^{-1} + 2(\log_3 u)^{-1}(\log_2 u)^{-1}(\log u)^{-1} + \cdots}{2u\left(2\log_2 u + 3\log_3 u + 2\log_4 u + \cdots + 2\log_q u + 2\log_{q+1} u\right)^{1/2}} \\
&\le \frac{q+1}{\log y}.
\end{aligned} \tag{6.333}
\]
Let $p$ denote the integer satisfying $2^{p-1} < (\log x)^4 \le 2^p$. Using a standard power-of-two decomposition, and applying (6.333), we have
\[
\phi\!\left(x(\log x)^4\right) - \phi(x) \le \sum_{1 \le s \le p} \left(\phi(2^s x) - \phi(2^{s-1} x)\right) \le p\,\frac{q+1}{\log x} < 8(q+1)\,\frac{\log_2 x}{\log x},
\]
proving the first statement of the lemma.

To prove the second statement, we use the fact $A^2 - B^2 = (A-B)(A+B)$:
\[
\phi(x) - \phi_0(x) = \frac{2\log_{q+1} x}{\phi(x) + \phi_0(x)} > \frac{2\log_{q+1} x}{3\sqrt{\log_2 x}} > \frac{\log_{q+1} x}{2\sqrt{\log_2 x}},
\]

which completes the proof of Lemma 6.11. Next we estimate the “expectation part” of (6.327): 0

1 hj b X X X Eˇ2Œ0;1/ @ ' ;h .ˇ/ C ' ;h .ˇ/ C 'j;s .ˇ/ C ' j;s .ˇ/ A p n D 2 20 <j hD1 sD1 0

1 hj b X X X D Ev2P0 @ ˆ ;h .v/ C ˆ ;h .ˇ/ C ˆj;s .v/ C ˆj;s .v/ A p n: 2 20 <j hD1 sD1 (6.334) By using (6.298) and (6.324), we have ˇ ˇ 0 1 ˇ ˇ hj b X X X ˇ ˇ ˇEv2P @ A ˆ ;h .v/ C ˆ ;h .ˇ/ C ˆj;s .v/ C ˆj;s .v/ f .1 ; 2 I v/ˇˇ 0 ˇ ˇ ˇ 20 <j hD1 sD1

ˇ !ˇ b X ˇˇ X ˇˇ ˆ ;h .v/ C ˆ ;h .v/ ˇ C ˇEv2P0 f .1 . /; 2 . /I v/ ˇ ˇ 20 <j

hD1

ˇ 0 1ˇ ˇ ˇ hj X ˇ ˇ ˇ @ A ˆj;s .v/ C ˆj;s .v/ ˇˇ C ˇEv2P0 f .1 .j /; 2 I v/ ˇ ˇ sD1

X

". I k ; ` /104 .1 C 2 /b 2 .k C 3` / D O .1/;

(6.335)

20 j

where the last step is the usual routine estimation combining the key inequality (6.307) with the definition of $\varepsilon(\gamma; k, \ell)$ in (6.83) and the definitions of $k_j, \ell_j, b_j$ (see (6.287)–(6.291), in particular, $\ell_j \approx j^2$, $k_j \approx j^3$, $b_j \approx j^{-3} 2^j$). Moreover, by (6.5) and (6.324),
\[
E_{v \in P_0} f(\kappa_1, \kappa_2; v) = \frac{\gamma}{\sqrt2}\,(\kappa_2 - \kappa_1 + 1)\log(1+\sqrt2) = \frac{\gamma}{\sqrt2}\,3\left(2^j + h_j 2^{i(j)} - 2^{20}\right)\log(1+\sqrt2). \tag{6.336}
\]



Combining (6.335) and (6.336), we have 1 hj b X X X Ev2P0 @ ˆ ;h .v/ C ˆ ;h .ˇ/ C ˆj;s .v/ C ˆj;s .v/ A D 0

20 <j hD1

sD1

p D p 3 2j C hj 2i.j / 220 log.1 C 2/ C O .1/: 2

(6.337)

By (6.322) and (6.323), p p 3 2j C hj 2i.j / 220 log.1 C 2/ D .I0 %/ log.1 C 2/ D n C O .log n/3 ; (6.338) and combining (6.338) with (6.334) and (6.337), we have 1 hj b X X X ' ;h .ˇ/ C ' ;h .ˇ/ C 'j;s .ˇ/ C ' j;s .ˇ/ A p n D Eˇ2Œ0;1/ @ 2 20 <j hD1 sD1 0

0 D Ev2P0 @

b X X

20 <j hD1

1 hj X ˆ ;h .v/ C ˆ ;h .ˇ/ C ˆj;s .v/ C ˆj;s .v/ A p n D 2 sD1

D O .log n/3 :

(6.339)

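By (6.301),
\[
m = \sum_{20 \le \nu < j} b_\nu + h_j, \tag{6.340}
\]
and by (6.287)–(6.291) and (6.338),
\[
\sum_{20 \le \nu < j} b_\nu\, 3\cdot 2^{i(\nu)} + h_j\, 3\cdot 2^{i(j)} = 3\sum_{20 \le \nu < j} 2^\nu + h_j\, 3\cdot 2^{i(j)} = 3\left(2^j + h_j 2^{i(j)} - 2^{20}\right) = \frac{n}{\log(1+\sqrt2)} + O\!\left((\log n)^3\right). \tag{6.341}
\]
By (6.288), $2^{i(j)} \le j^3 + j^2 < 2^{i(j)+1}$, and combining this fact with (6.340) and (6.341), we have
\[
m(\log m)^4 > n \quad\text{for all sufficiently large } n. \tag{6.342}
\]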



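By Lemma 6.11 and (6.342),
\[
\begin{aligned}
\lambda_m &= \phi\!\left(m(\log m)^4\right) - \left(\phi\!\left(m(\log m)^4\right) - \phi(m)\right) \ge \phi(n) - 8(q+1)\,\frac{\log_2 m}{\log m} \ge \phi(n) - 10(q+1)\,\frac{\log_2 n}{\log n} \\
&\ge \phi_0(n) + \frac{\log_{q+1} n}{2\sqrt{\log_2 n}} - 10(q+1)\,\frac{\log_2 n}{\log n} \ge \phi_0(n) + \frac{\log_{q+1} n}{3\sqrt{\log_2 n}}.
\end{aligned} \tag{6.343}
\]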

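By using (6.316), (6.338), and (6.343), we can estimate the right-hand side of (6.331) as follows:
\[
\begin{aligned}
\lambda_m \left(\mathrm{Var}(X_1 + X_2 + \cdots + X_m)\right)^{1/2}
&= \lambda_m \left(\sigma^2(\gamma)\,3\left(2^j - 2^{20} + h_j 2^{i(j)}\right)\log(1+\sqrt2) + O\!\left(2^j/\sqrt{j}\right)\right)^{1/2} \\
&= \lambda_m \left(\sigma(\gamma)\sqrt{n} + O\!\left(\sqrt{n}/\sqrt{j}\right)\right) = \lambda_m \left(\sigma(\gamma)\sqrt{n} + O\!\left(\sqrt{n}/\sqrt{\log n}\right)\right) \\
&\ge \phi_0(n)\,\sigma(\gamma)\sqrt{n} + \frac{\log_{q+1} n}{3\sqrt{\log_2 n}}\,\sigma(\gamma)\sqrt{n} + O\!\left(\sqrt{n \log\log n/\log n}\right).
\end{aligned} \tag{6.344}
\]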

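Combining (6.331) and (6.344) we obtain that for almost every $v \in P_0$,
\[
\sum_{20 \le \nu < j}\ \sum_{h=1}^{b_\nu} \Phi_{\nu,h;0}(v) + \sum_{s=1}^{h_j} \Phi_{j,s;0}(v)
\ge \phi_0(n)\,\sigma(\gamma)\sqrt{n} + \frac{\log_{q+1} n}{3\sqrt{\log_2 n}}\,\sigma(\gamma)\sqrt{n} + O\!\left(\sqrt{n \log\log n/\log n}\right) \tag{6.345}
\]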

hold for infinitely many positive integers n, where the connection between the pair .j; hj / and the integer n is described by (6.291)–(6.293). Next we estimate .Y1 EY1 / C.Y2 EY2 / C : : : C.Ym EYm / D

b X X

ˆ ;hI0 .v/ C

20 <j hD1

hj X

ˆj;sI0 .v/

sD1

and show that its contribution is negligible. We apply Feller’s theorem to the sequence Y1 ; Y2 ; Y3 ; : : : with the choice p p m D 2 log2 m D 2 log log m:

(6.346)



(In this step we do not have to be very careful; this explains the “generous” choice in (6.346).) Then the sum p 1 X m 2m =2 X log log m p n C 0 .n/. / n 2 hold for infinitely many positive integers n. Indeed, it follows from the fact that the term p logqC1 n . / n p 3 log2 n is larger than the dominant term O

p

n log log n= log n

in the last line of (6.356) if n is large enough. This proves (6.319), so the proof of the first half of Theorem 5.6 is complete. Next we prove the other half (6.320). Again the key step is to apply Feller’s theorem to the sequence X1 ; X2 ; X3 ; : : :. This time we choose



m D "=2 .m/ D

1=2 "

D 2 log2 m C 3 log3 m C 2 log4 m C : : : C 2 logq1 m C 2 1 C logq m ; 2 (6.357) that is, in (6.318) we replace " > 0 with "=2 > 0. Then the sum 1 X m 2m =2 X 1 e p . / n; 4 log2 n

(6.368)

and the right-hand side of (6.368) is larger than the dominant term O

p

n log log n= log n

in the last line of (6.366) if n is large enough. Since (6.367) is exactly (6.320), the proof of Theorem 5.6 is complete. t u



6.10 More Results in a Nutshell

There are many more results that can be proved by the method of Chap. VI. As a first illustration, consider the following modification of Theorem 5.6. In Theorem 5.6 we choose a fixed $\gamma > 0$ and study the asymptotic behavior of $F(\sqrt2; \beta; \gamma; e^n)$ as $n \to \infty$ for almost every $\beta$. This raises the following natural question: Is there an analog of Theorem 5.6 which works simultaneously for all $\gamma > 0$? The answer is yes; for simplicity we just formulate such a result in the Khinchin form (LIL stands for the law of the iterated logarithm).

Simultaneous version of the LIL: Let $\varepsilon > 0$ be an arbitrarily small but fixed constant. Then for almost every $\beta$,
\[
\frac{\gamma}{\sqrt2}\, n - \sigma(\gamma)\sqrt{n}\,\sqrt{(2+\varepsilon)\log\log n} < F(\sqrt2; \beta; \gamma; e^n) < \frac{\gamma}{\sqrt2}\, n + \sigma(\gamma)\sqrt{n}\,\sqrt{(2+\varepsilon)\log\log n} \tag{6.369}
\]
holds for all $\gamma > 0$ and for all sufficiently large $n$, i.e., for all $n > n_0(\beta, \gamma)$. Note that (6.369) is sharp in the sense that $2 + \varepsilon$ cannot be replaced by $2 - \varepsilon$. Note that one can even upgrade the Simultaneous LIL from “all $\gamma > 0$” to “all intervals $[\gamma_1, \gamma_2]$.”

Let’s compare Theorems 5.4 and 5.6. The former is a “global” result describing the case where $N = e^n$ is fixed and $\beta$ runs in the unit interval $0 < \beta < 1$; the latter describes the “individual” behavior of almost every $\beta$ as $N = e^n \to \infty$. One may think that, in the long run as $N \to \infty$, we should have an “individual” CLT as follows: for almost every $0 < \beta < 1$,
\[
\frac{1}{M}\left|\left\{1 \le n \le M :\ F(\sqrt2; \beta; \gamma; e^n) - \gamma n/\sqrt2 > \lambda\,\sigma(\gamma)\sqrt{n}\right\}\right| \to \frac{1}{\sqrt{2\pi}} \int_{\lambda}^{\infty} e^{-u^2/2}\, du. \tag{6.370}
\]
But (6.370) is false! We just give a vague/intuitive explanation why it fails (this vague argument can be easily turned into a precise proof). In view of our basic probabilistic intuition (5.38), it suffices to study the analogous question for the symmetric random walk. Let $X_1, X_2, X_3, \ldots$ be an infinite sequence of independent random variables with
\[
\Pr[X_i = 1] = \Pr[X_i = -1] = 1/2 \quad\text{for all } i \ge 1,
\]
and write $S_n = X_1 + X_2 + \cdots + X_n$. If $S_k \ge 0$ then we say that, after the $k$th step, the random walk is “on the positive side” (zero is included). We need the following well-known combinatorial/probabilistic result.



Lemma 6.12. The probability that the symmetric random walk spends exactly $2k$ of the first $2n$ steps on the positive side equals
\[
\binom{2k}{k}\binom{2n-2k}{n-k}\, 2^{-2n}.
\]
We postpone the proof to the end of the section. From Lemma 6.12 we can easily derive the so-called Arc Sine Law, a well-known “paradox” of the random walk, which is the reason why (6.370) fails. Let $T_{2n}^{+}$ denote the total time of the random walk $S_1, S_2, S_3, \ldots, S_{2n}$ (of the first $2n$ steps) on the positive side. Then by Lemma 6.12, for any $0 < x < 1$,
\[
\Pr[T_{2n}^{+} \le 2nx] = \sum_{0 \le k \le nx} \binom{2k}{k}\binom{2n-2k}{n-k}\, 2^{-2n}.
\]
By using Stirling’s formula $n! = (n/e)^n \sqrt{2\pi n}\,(1 + o(1))$,
\[
\begin{aligned}
\Pr[T_{2n}^{+} \le 2nx] &= (1 + o(1)) \sum_{0 \le k \le nx} \frac{1}{\pi\sqrt{k(n-k)}} = (1 + o(1)) \int_0^{nx} \frac{dt}{\pi\sqrt{t(n-t)}} \\
&= (1 + o(1))\, \frac{1}{\pi} \int_0^{x} \frac{du}{\sqrt{u(1-u)}} = (1 + o(1))\, \frac{2}{\pi} \int_0^{\sqrt{x}} \frac{dy}{\sqrt{1-y^2}} = (1 + o(1))\, \frac{2}{\pi} \arcsin\sqrt{x},
\end{aligned} \tag{6.371}
\]
where we applied two substitutions: first $u = t/n$ and then $y = \sqrt{u}$ (of course arcsin means the “inverse sine”, and $o(1) \to 0$ if $n \to \infty$).

Equation (6.371) tells us that $T_{2n}^{+}/2n$ has a distribution function which is approximately $\frac{2}{\pi}\arcsin\sqrt{x}$ when $n$ is sufficiently large. It is often called the “Arc Sine Law of visiting the positive side.” This Arc Sine Law is rather surprising; with some exaggeration one may even call it a “mathematical paradox.” One may think that, in a long run of $2n$ steps, the random walk typically spends close to one-half of the time on the positive side and one-half of the time on the negative side (here “close” means a possible discrepancy tending to zero as $n$ tends to infinity). But the truth is quite different. By (6.371), the probability that a random walk of $2n$ steps spends $2nx$ steps or less on the positive side is approximately $\frac{2}{\pi}\arcsin\sqrt{x}$. Since $u = 1/2$ is in fact the minimum(!) of the density function $\frac{1}{\pi}(u(1-u))^{-1/2}$ in (6.371), we see that, with relatively large probability, the proportion of the time spent on the positive side is near to 0 or 1, but not near to 1/2. We can say, therefore, that in a long run of tosses of a fair coin,



there is a good chance that one face (either Heads or Tails) will dominate (meaning that it will lead the other for a disproportionate amount of time). The Arc Sine Law (more or less) explains why (6.370) is false. But we can restore common sense in the form of an individual CLT, if we switch from the ordinary asymptotic density
\[
\mathrm{density}(A) = \lim_{M \to \infty} \frac{1}{M} \sum_{n \in A:\ n \le M} 1 \tag{6.372}
\]
to the so-called
\[
\text{logarithmic density}(A) = \lim_{M \to \infty} \frac{1}{\log M} \sum_{n \in A:\ n \le M} \frac{1}{n}, \tag{6.373}
\]
where $A \subset \mathbb{N}$ is an arbitrary (usually infinite) subset of the natural numbers. (Of course, the limits in (6.372) and (6.373) do not necessarily exist for an arbitrary $A$.) The name “logarithmic density” comes from the well-known fact that
\[
\sum_{n=1}^{M} \frac{1}{n} = \log M + O(1) = \log M + \gamma_0 + O(1/M)
\]
(here $\gamma_0 = .5772\ldots$ is Euler’s constant, but its value is irrelevant). In the following result CLT stands for the central limit theorem.

Individual CLT for the logarithmic density: For any real numbers $\gamma > 0$ and $\lambda$, the set
\[
\left\{ n \in \mathbb{N} :\ F(\sqrt2; \beta; \gamma; e^n) - \gamma n/\sqrt2 > \lambda\,\sigma(\gamma)\sqrt{n} \right\}
\]
has logarithmic density equal to
\[
\frac{1}{\sqrt{2\pi}} \int_{\lambda}^{\infty} e^{-u^2/2}\, du
\]
for almost every $0 < \beta < 1$.

Our two results so far (Simultaneous LIL and Individual CLT) are stated, for the sake of simplicity, in the special case $\alpha = \sqrt2$. Needless to say, both can be extended for every quadratic irrational $\alpha$. The next natural question is: What happens for almost every $\alpha$? Let $F(\alpha; \beta; c; N)$ denote the number of integral solutions $1 \le n \le N$ of the inhomogeneous diophantine inequality $\|n\alpha - \beta\|$
0, for almost every $\alpha$
\[
\max_{\lambda} \left| \mathrm{meas}\left\{ \beta \in [0,1) :\ F(\alpha; \beta; c; e^n) - 2cn \ge \lambda\,\sigma_0(c)\sqrt{n \log n} \right\} - N(\lambda) \right| \to 0 \tag{6.376}
\]
as $n \to \infty$, where
\[
N(\lambda) = \frac{1}{\sqrt{2\pi}} \int_{\lambda}^{\infty} e^{-u^2/2}\, du
\]
is the tail probability of the normal distribution; the maximum at the beginning of (6.376) is taken over all $-\infty < \lambda < \infty$; $\sigma_0(c) > 0$ denotes a constant depending only on $c$. Similarly to the proofs of Theorems 5.4 and 5.6, the proof of the “CLT for almost every $\alpha$” is also based on an approximation with a sum of independent random variables:
\[
F(\alpha; \beta; c; e^n) - 2cn = X_1 + X_2 + X_3 + \cdots + X_n + U_n, \tag{6.377}
\]
where $X_1, X_2, X_3, \ldots$ are independent random variables, and $U_n$ is “negligible” (they all have zero expectation). Combining the “variance lemma” Lemma 6.4 and Kusmin’s theorem (see (4.102) and (4.103)), it is not too difficult to see that the distribution of the dominating part $X_1 + X_2 + X_3 + \cdots + X_n$ in (6.377) is similar to the distribution of the simpler sum $Z_1 + Z_2 + Z_3 + \cdots + Z_n$ of independent random variables with zero expectations:
\[
X_1 + X_2 + X_3 + \cdots + X_n \approx Z_1 + Z_2 + Z_3 + \cdots + Z_n, \tag{6.378}
\]

where, for simplicity, assume that $n$ is a power of 2: $n = 2^k$; assume that $2^{k-1}$ of the random variables $Z_i$ in (6.378) have the distribution
\[
\Pr[Z_i = 1] = \Pr[Z_i = -1] = 1/2;
\]
$2^{k-2}$ of the random variables $Z_i$ in (6.378) have the distribution
\[
\Pr[Z_i = 2] = 1/4 \quad\text{and}\quad \Pr[Z_i = -2/3] = 3/4;
\]
$2^{k-3}$ of the random variables $Z_i$ in (6.378) have the distribution
\[
\Pr[Z_i = 4] = 1/8 \quad\text{and}\quad \Pr[Z_i = -4/7] = 7/8;
\]
$2^{k-4}$ of the random variables $Z_i$ in (6.378) have the distribution
\[
\Pr[Z_i = 8] = 1/16 \quad\text{and}\quad \Pr[Z_i = -8/15] = 15/16;
\]
and so on; finally assume that two of the random variables $Z_i$ in (6.378) have the distribution
\[
\Pr[Z_i = 2^{k-2}] = 2^{-k+1} \quad\text{and}\quad \Pr[Z_i = -2^{k-2}/(2^{k-1}-1)] = (2^{k-1}-1)/2^{k-1},
\]
and the last two of the random variables $Z_i$ in (6.378) have the distribution
\[
\Pr[Z_i = 2^{k-1}] = 2^{-k} \quad\text{and}\quad \Pr[Z_i = -2^{k-1}/(2^{k}-1)] = (2^{k}-1)/2^{k}.
\]
Indeed, $Z_1, Z_2, \ldots, Z_n$ reflects the distribution in (4.102) in the following sense:
\[
\sum_{i=2^{k-1}}^{2^k - 1} \frac{\log\frac{(i+1)^2}{i(i+2)}}{\log 2} \approx \frac{1}{\log 2} \sum_{i=2^{k-1}}^{2^k - 1} \frac{1}{i^2} \approx \frac{1}{\log 2}\, 2^{-k}.
\]
Note that
\[
\sum_{(k/2)+C \le j \le k} 2^{k-j}\, 2^{-j} = \sum_{(k/2)+C \le j \le k} 2^{k-2j} < 2^{-2C+1} \tag{6.379}
\]
is very small if $C$ is a large constant. Inequality (6.379) implies that in (6.378) we can “truncate the tail” in the sense that it suffices to restrict (6.378) to the cases $1 \le j < (k/2) + C$ with the property that $2^{k-j}$ of the random variables $Z_i$ in (6.378) have the distribution
\[
\Pr[Z_i = 2^{j-1}] = 2^{-j} \quad\text{and}\quad \Pr[Z_i = -2^{j-1}/(2^j - 1)] = (2^j - 1)/2^j.
\]
Indeed, by (6.379) the probability of the event “$Z_i \ge 2^{(k/2)+C}$ occurs for at least one index $1 \le i \le n = 2^k$” is less than $2^{-2C+1}$, which is a negligible probability if $C$ is a large constant. Let $\sum_i Z_i$ denote the truncated version of the sum $\sum_{i=1}^{n} Z_i$ in (6.378). The good news is that we can apply the Berry–Esseen form of the CLT [see (6.128)] for the truncated sum $\sum_i Z_i$. Indeed, $V =$

X i

EZi2

X 1j A Vn

X p k>A Vn

p PrŒSn D k PrŒSm > A Vm jSn D k N 2 .A/ D

p PrŒSn D k PrŒSm Sn > A Vm k N 2 .A/:

(6.386)

If we make the stronger assumption m > 2n, then by the CLT, Z

1 cov.m ; n / p 2 1 p 2

Z

1

e

u2 =2

A 1

e A

u2 =2

Z

1 p 2

1 p 2

Z

!

1 p p A pVm u Vn Vm Vn

1

e A

t 2 =2

e

t 2 =2

dt

d u;

dt

du



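where, of course, the last line is just $N^2(A)$ in disguise. Thus we have
\[
\mathrm{cov}(\xi_m, \xi_n) = O\!\left(\sqrt{\frac{V_n}{V_m - V_n}}\right) = O\!\left(\sqrt{\frac{n}{m-n}}\right). \tag{6.387}
\]
If $n < m \le 2n$ then we simply use the trivial upper bound
\[
|\mathrm{cov}(\xi_m, \xi_n)| \le 1. \tag{6.388}
\]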

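Notice that Lemma 6.13 is equivalent to the following strong law of large numbers: for every rational number $A$, the random variables $\xi_n = \xi_{A,n}$ defined in (6.385) have the property
\[
\frac{\sum_{n=1}^{M} \frac{\xi_n - E\xi_n}{n}}{\sum_{n=1}^{M} 1/n} \to 0 \quad\text{with probability one.} \tag{6.389}
\]
(Indeed, the set of rationals is countable, and a countable union of zero measure sets is still a zero measure set.) To prove (6.389), we compute the corresponding second moment:
\[
\begin{aligned}
E\left[\left(\sum_{n=1}^{M} \frac{\xi_n - E\xi_n}{n}\right)^{2}\right]
&= \sum_{n=1}^{M} \frac{E(\xi_n - E\xi_n)^2}{n^2} + 2 \sum_{1 \le n < m \le M} \frac{\mathrm{cov}(\xi_m, \xi_n)}{mn} \\
&= O(1) + 2 \sum_{\substack{1 \le n < m \le M \\ m \le 2n}} \frac{\mathrm{cov}(\xi_m, \xi_n)}{mn} + 2 \sum_{1 \le 2n < m \le M} \frac{\mathrm{cov}(\xi_m, \xi_n)}{mn}.
\end{aligned} \tag{6.390}
\]
First we use (6.388):
\[
\sum_{\substack{1 \le n < m \le M \\ m \le 2n}} \frac{\mathrm{cov}(\xi_m, \xi_n)}{mn} = O\!\left(\sum_{n=1}^{M} \frac{n}{n^2}\right) = O\!\left(\sum_{n=1}^{M} \frac{1}{n}\right) = O(\log M). \tag{6.391}
\]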



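Next we use (6.387):
\[
\sum_{1 \le 2n < m \le M} \frac{\mathrm{cov}(\xi_m, \xi_n)}{mn} = O\!\left(\sum_{1 \le 2n < m \le M} \frac{1}{mn} \sqrt{\frac{n}{m-n}}\right) = O\!\left(\sum_{1 < m \le M} \frac{1}{m} \sum_{1 \le n < m/2} \frac{1}{n} \sqrt{\frac{n}{m-n}}\right). \tag{6.392}
\]
We have
\[
\sum_{1 \le n < m/2} \frac{1}{n} \sqrt{\frac{n}{m-n}} = \sum_{k \ge 1}\ \sum_{m 2^{-k-1} \le n < m 2^{-k}} \frac{1}{n} \sqrt{\frac{n}{m-n}} = \sum_{k \ge 1} O\!\left(\frac{m}{2^{k+1}} \cdot \frac{2^{k+1}}{m} \cdot 2^{-k/2}\right) = \sum_{k \ge 1} O\!\left(2^{-k/2}\right) = O(1).
\]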
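The $O(1)$ bound for the inner sum is easy to examine numerically. The following Python sketch is an illustration only and is not part of the original text; the function name `inner_sum` is hypothetical. Since the summand equals $1/\sqrt{n(m-n)}$, which is decreasing for $0 < n < m/2$, comparison with the integral $\int_0^{m/2} dt/\sqrt{t(m-t)} = \pi/2$ suggests that the sum stays below $\pi/2$ for every $m$.

```python
from math import pi, sqrt


def inner_sum(m):
    """Sum of (1/n) * sqrt(n / (m - n)) over all integers 1 <= n < m/2."""
    # (m + 1) // 2 is the smallest integer >= m/2, so the range covers n < m/2.
    return sum(sqrt(n / (m - n)) / n for n in range(1, (m + 1) // 2))


if __name__ == "__main__":
    for m in (10, 100, 10_000, 100_000):
        print(m, inner_sum(m), pi / 2)
```

The printed values increase slowly with $m$ while remaining bounded, in line with the dyadic-block estimate above.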

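Using this in (6.392), we have
\[
\sum_{1 \le 2n < m \le M} \frac{\mathrm{cov}(\xi_m, \xi_n)}{mn} = O\!\left(\sum_{1 \le m \le M} \frac{1}{m}\right) = O(\log M). \tag{6.393}
\]
Combining (6.390), (6.391), and (6.393), we have
\[
\text{Variance} = E\left[\left(\sum_{n=1}^{M} \frac{\xi_n - E\xi_n}{n}\right)^{2}\right] = O(\log M). \tag{6.394}
\]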

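By Chebyshev’s inequality and (6.394), for any $\eta > 0$ we have the upper bound
\[
\Pr\left[\frac{1}{\log M}\,|L_M| > \eta\right] = \frac{O(1)}{\eta^2 \log M}, \qquad\text{where}\quad L_M = \sum_{n=1}^{M} \frac{\xi_n - E\xi_n}{n}.
\]
By choosing $M = M_k = e^{k^2}$, we have
\[
\sum_{k=1}^{\infty} \frac{1}{\log M_k} = \sum_{k=1}^{\infty} \frac{1}{k^2} = O(1). \tag{6.395}
\]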



So by (6.395), for every $\eta > 0$,
\[
\sum_{k=1}^{\infty} \Pr\left[\frac{1}{\log M_k}\,|L_{M_k}| > \eta\right] < \infty.
\]
By the (trivial) Borel–Cantelli lemma, for every $\eta > 0$, with probability one,
\[
\frac{1}{\log M_k}\,|L_{M_k}| < \eta \quad\text{for all sufficiently large } k. \tag{6.396}
\]
If $M_k = e^{k^2} < n < M_{k+1} = e^{(k+1)^2}$ then
\[
\frac{1}{\log M_k}\,|L_n - L_{M_k}| = O\!\left(\frac{1}{k^2} \sum_{e^{k^2} < j < e^{(k+1)^2}} \frac{1}{j}\right) = O\!\left(\frac{1}{k^2} \log\frac{e^{(k+1)^2}}{e^{k^2}}\right) = O\!\left(\frac{1}{k^2} \log e^{2k+1}\right) = O\!\left(\frac{1}{k}\right),
\]
which tends to zero as $k \to \infty$. Combining this with (6.396), (6.389) follows, completing the outline of the proof of Lemma 6.13. □

Finally, we include the

Proof of Lemma 6.12. It is a combinatorial argument based on a recurrence formula. Let $p_{2n}(2k)$ denote the probability in question. By symmetry,
\[
p_{2n}(2k) = p_{2n}(2n - 2k). \tag{6.397}
\]

Let $1 \le k \le n-1$, and consider the event corresponding to $p_{2n}(2k)$. Then the time $T$ of the first return to the origin equals $T = 2r$ for some $1 \le r \le n-1$. The time interval $(0, T)$ is spent entirely on the strictly positive or strictly negative side, and each possibility has probability 1/2. Thus we have the key recurrence formula
\[
p_{2n}(2k) = \sum_{r=1}^{k} \frac{1}{2} \Pr[T = 2r]\, p_{2n-2r}(2k - 2r) + \sum_{r=1}^{n-k} \frac{1}{2} \Pr[T = 2r]\, p_{2n-2r}(2k),
\]
where the first sum represents the case of “strictly positive in the time interval $(0, T)$” and the second sum represents the case of “strictly negative in the time interval $(0, T)$.”


Since $1 \le k \le n-1$, we can apply induction:
\[
p_{2n}(2k) = \sum_{r=1}^{k} \frac{1}{2} \Pr[T = 2r]\, u_{2k-2r}\, u_{2n-2k} + \sum_{r=1}^{n-k} \frac{1}{2} \Pr[T = 2r]\, u_{2k}\, u_{2n-2r-2k}, \tag{6.398}
\]
where
\[
u_{2\ell} = \Pr[S_{2\ell} = 0] = \binom{2\ell}{\ell}\, 2^{-2\ell}.
\]
We use the obvious equality (where, as usual, $\Pr[A|B]$ denotes the conditional probability)
\[
u_{2\ell} = \Pr[S_{2\ell} = 0] = \sum_{r=1}^{\ell} \Pr[S_{2\ell} = 0 \mid T = 2r]\, \Pr[T = 2r] = \sum_{r=1}^{\ell} \Pr[S_{2\ell-2r} = 0]\, \Pr[T = 2r] = \sum_{r=1}^{\ell} u_{2\ell-2r}\, \Pr[T = 2r]
\]
in (6.398), and obtain
\[
p_{2n}(2k) = \frac{1}{2}\, u_{2k}\, u_{2n-2k} + \frac{1}{2}\, u_{2n-2k}\, u_{2k} = u_{2k}\, u_{2n-2k} = \binom{2k}{k}\binom{2n-2k}{n-k}\, 2^{-2n},
\]
proving Lemma 6.12 for $1 \le k \le n-1$. It remains to prove the missing cases $k = 0$ and $k = n$. They follow from the symmetry (6.397) and the identity
\[
\sum_{k=0}^{n} \binom{2k}{k}\binom{2n-2k}{n-k}\, 2^{-2n} = 1. \tag{6.399}
\]
To prove (6.399), we start from the binomial series with power $-1/2$:
\[
(1-x)^{-1/2} = \sum_{k=0}^{\infty} \binom{-1/2}{k} (-1)^k x^k = \sum_{k=0}^{\infty} \binom{2k}{k}\, 2^{-2k} x^k. \tag{6.400}
\]


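Squaring both sides of (6.400), we have
\[
\begin{aligned}
\sum_{n=0}^{\infty} x^n = (1-x)^{-1} = \left((1-x)^{-1/2}\right)^2
&= \left(\sum_{k=0}^{\infty} \binom{2k}{k}\, 2^{-2k} x^k\right)\left(\sum_{m=0}^{\infty} \binom{2m}{m}\, 2^{-2m} x^m\right) \\
&= \sum_{n=0}^{\infty} \left(\sum_{k=0}^{n} \binom{2k}{k}\binom{2n-2k}{n-k}\, 2^{-2n}\right) x^n.
\end{aligned}
\]
Comparing the coefficients of $x^n$, identity (6.399) follows. This completes the proof of Lemma 6.12. □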
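Both the identity (6.399) and the Arc Sine approximation (6.371) lend themselves to a quick numerical check. The Python sketch below is an illustration only and is not part of the original text; the function names are hypothetical. It evaluates the probabilities of Lemma 6.12 exactly with rational arithmetic and compares the resulting distribution function with $\frac{2}{\pi}\arcsin\sqrt{x}$.

```python
from fractions import Fraction
from math import asin, comb, pi, sqrt


def p2n(n, k):
    """Probability from Lemma 6.12: binom(2k,k) * binom(2n-2k,n-k) * 2^(-2n)."""
    return Fraction(comb(2 * k, k) * comb(2 * n - 2 * k, n - k), 4 ** n)


def total_probability(n):
    """Left-hand side of identity (6.399); it should equal 1 exactly."""
    return sum(p2n(n, k) for k in range(n + 1))


def cdf_positive_time(n, x):
    """Pr[T_{2n}^+ <= 2nx], summing the lemma's probabilities over k <= nx."""
    return float(sum(p2n(n, k) for k in range(n + 1) if k <= n * x))


def arcsine_cdf(x):
    """Limiting distribution function (2/pi) * arcsin(sqrt(x)) from (6.371)."""
    return 2 / pi * asin(sqrt(x))


if __name__ == "__main__":
    print(all(total_probability(n) == 1 for n in range(30)))
    for x in (0.1, 0.25, 0.5, 0.9):
        print(x, cdf_positive_time(500, x), arcsine_cdf(x))
```

For moderate $n$ the two distribution functions already agree to a couple of decimal places, and the mass visibly concentrates near $x = 0$ and $x = 1$ rather than near $x = 1/2$.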

References

[Aa1] van Aardenne-Ehrenfest, T.: Proof of the impossibility of a just distribution of an infinite sequence of points over an interval, Proc. Kon. Ned. Akad. v. Wetensch. 48 (1945), 266–271.
[Aa2] van Aardenne-Ehrenfest, T.: On the impossibility of a just distribution, Proc. Kon. Ned. Akad. v. Wetensch. 52 (1949), 734–739.
[Be1] Beck, J.: Randomness of n√2 mod 1 and a Ramsey property of the hyperbola, Colloquia Math. Soc. János Bolyai 60, Sets, Graphs and Numbers, Budapest, Hungary (1992), 23–66.
[Be2] Beck, J.: Diophantine approximation and quadratic fields, Number Theory, Eds.: Győry/Pethő/Sós, Walter de Gruyter GmbH, Berlin - New York 1998, pp. 55–93.
[Be3] Beck, J.: From probabilistic diophantine approximation to quadratic fields, Random and Quasi-Random Point Sets, Lecture Notes in Statistics 138, Springer-Verlag New York 1998, pp. 1–49.
[Be4] Beck, J.: Randomness in lattice point problems, Discrete Mathematics 229 (2001), pp. 29–45.
[Be5] Beck, J.: Lattice point problems: crossroads of number theory, probability theory, and Fourier analysis, Fourier Analysis and Convexity (conference in Milan, Italy, July 2001) Eds.: L. Brandolini et al., Applied and Numerical Harmonic Analysis, Birkhäuser-Verlag, Boston MA 2004, pp. 1–35.
[Be6] Beck, J.: Inevitable Randomness in Discrete Mathematics, University Lecture Series Vol. 49, Amer. Math. Soc. 2009.
[Be7] Beck, J.: Randomness of the square root of 2 and the giant leap, Part 1, Periodica Math. Hungarica, 60, no. 2 (2010), 137–242.
[Be8] Beck, J.: Randomness of the square root of 2 and the giant leap, Part 2, Periodica Math. Hungarica, 62, no. 2 (2011), 127–246.
[Be9] Beck, J.: Lattice point counting and the probabilistic method, Journal of Combinatorics, 1, no. 2 (2010), 171–232.
[Be10] Beck, J.: Superirregularity in Panorama of Discrepancy Theory, Editors: William Chen, Anand Srivastav, Giancarlo Travaglini, Springer Verlag 2014, pp. 1–87.
[Be-Ch] Beck, J. and Chen, W.W.L.: Irregularities of Distribution, Cambridge Tracts in Mathematics 89, Cambridge University Press, Cambridge, 1987.
[Ca2] Cassels, J.W.: Über lim x|ϑx + α − y|, Math. Ann. 127 (1954), 288–304.
[Cha] Chazelle, B.: The Discrepancy Method, Cambridge University Press, Cambridge, 2000.
[Co] van der Corput, J.G.: Verteilungsfunktionen. I and II. Proc. Kon. Ned. Akad. v. Wetensch. 38 (1935), 813–821 and 1058–1066.

© Springer International Publishing Switzerland 2014 J. Beck, Probabilistic Diophantine Approximation, Springer Monographs in Mathematics, DOI 10.1007/978-3-319-10741-7


[Da] Davenport, H.: Note on irregularities of distribution, Mathematika 3 (1956), 131–135.
[De] Descombes, I.R.: Sur la répartition des sommets d’une ligne polygonale réguliere nonfermée, Ann. Sci. de l’École Normale Sup. 75 (1956), 284–355.
[Di] Dieter, U.: Das Verhalten der Kleinschen Functionen gegenüber Modultransformationen und verallgemeinerte Dedekindsche Summen, Journ. Reine Angew. Math. 201 (1959), 37–70.
[Du] Dupain, Y.: Discrépance de la suite, Ann. Inst. Fourier 29 (1979), 81–106.
[Du-So] Dupain, Y. and Sós, Vera T.: On the discrepancy of nα sequences, Topics in classical number theory, Colloquium, Budapest 1981, vol. 1, Colloq. Math. Soc. János Bolyai 34, 355–387.
[El] Elliott, P.D.T.A.: Probabilistic number theory, vol. 1 and 2, Springer 1979–80.
[Er] Erdős, P.: On the law of the iterated logarithm, Ann. of Math. 43 no. 2 (1942), 419–436.
[Er-Hu] Erdős, P. and Hunt, G.A.: Changes of sign of sums of random variables, Pacific Journal Math. 3 (1953), 673–687.
[Fe1] Feller, W.: An Introduction to Probability Theory and its Applications, Vol. 1 (3rd edn), Wiley, New York, 1969.
[Fe2] Feller, W.: An Introduction to Probability Theory and its Applications, Vol. 2 (2nd edn), Wiley, New York, 1971.
[Fe3] Feller, W.: The general form of the so-called law of the iterated logarithm, Trans. Amer. Math. Soc. 54 (1943), 373–402.
[Ha] Halász, G.: On Roth’s method in the theory of irregularities of point distributions, Recent Progress in Analytic Number Theory, Vol. 2, pp. 79–94, London, Academic Press 1981.
[Ha-Li1] Hardy, G.H. and Littlewood, J.: The lattice-points of a right-angled triangle. I, Proc. London Math. Soc. 3 (1920), 15–36.
[Ha-Li2] Hardy, G.H. and Littlewood, J.: The lattice-points of a right-angled triangle. II, Abh. Math. Sem. Hamburg 1 (1922), 212–249.
[Ha-Li3] Hardy, G.H. and Littlewood, J.: Some problems of Diophantine approximation: A series of cosecants, Bull. Calcutta Math. Soc. 20 (1930), 251–266.
[Ha-Wr] Hardy, G.H. and Wright, E.M.: An introduction to the theory of numbers, 5th edition, Clarendon Press, Oxford 1979.
[Ka] Kac, M.: Probability methods in some problems of analysis and number theory, Bull. Amer. Math. Soc. 55 (1949), 641–665.
[Ke] Kesten, H.: Uniform distribution mod 1, Ann. of Math. 71 (1960), 445–471, and Part II in Acta Arithm. 7 (1961), 355–360.
[Kh1] Khinchin, A.: Über einen Satz der Wahrscheinlichkeitsrechnung, Fundamenta Math. 6 (1924), 9–20.
[Kh2] Khinchin, A.: Continued Fractions, English translation, P. Noordhoff, Groningen, The Netherlands 1963.
[Kn1] Knuth, D.E.: Notes on generalized Dedekind sums, Acta Arithmetica 33 (1977), 297–325.
[Kn2] Knuth, D.E.: The art of computer programming, volume 3, Addison-Wesley 1998.
[Ko] Kolmogorov, A.: Das Gesetz des iterierten Logarithmus, Math. Annalen 101 (1929), 126–135.
[La] Lang, S.: Introduction to Diophantine Approximations, Addison-Wesley 1966.
[Ma] Matousek, J.: Geometric Discrepancy, Algorithms and Combinatorics 18, Springer-Verlag, Berlin 1999.
[Os] Ostrowski, A.: Bemerkungen zur Theorie der Diophantischen Approximationen. I. Abh. Hamburg Sem. 1 (1922), 77–99.
[Ra-Gr] Rademacher, H. and Grosswald, E.: Dedekind Sums, Math. Assoc. Amer., Carus Monograph No. 16 (1972).
[Ro] Roth, K.F.: Irregularities of distribution, Mathematika 1 (1954), 73–79.
[Schm] Schmidt, W.M.: Irregularities of distribution. VII, Acta Arithmetica 21 (1972), 45–50.
[Scho] Schoissengeier, J.: Another proof of a theorem of J. Beck, Monatshefte für Mathematik 129 (2000), 147–151.


[Shi] Shintani, T.: On evaluation of zeta functions of totally real algebraic number fields at non-positive integers, Journ. Fac. Sci. Univ. Tokyo 23 (1976), 393–417.
[So1] Sós, Vera T.: On the distribution mod 1 of the sequence {nα}, Ann. Univ. Sci. Budapest 1 (1958), 127–234.
[So2] Sós, Vera T.: On the discrepancy of the sequence {nα}, Coll. Math. Soc. János Bolyai 13 (1974), 359–367.
[So3] Sós, Vera T.: On strong irregularities of the distribution of {nα} sequences, Studies in Pure Math. (1983), 685–700.
[So-Za] Sós, Vera T. and Zaremba, S.K.: The mean-square discrepancies of some two-dimensional lattices, Studia Sci. Math. Hungarica 14 (1979), 255–271.
[Sw] Swierczkowski, S.: On successive settings of an arc on the circumference of a circle, Fund. Math. 46 (1958), 187–189.
[We] Weyl, H.: Über die Gleichverteilung von Zahlen mod Eins, Math. Ann. 77 (1916), 313–352.
[Wo] Wolfram, S.: A new kind of science, Wolfram Media 2002.
[Za1] Zagier, D.B.: Nombres de classes et fractions continues, Journ. Arithmetiques de Bordeaux, Asterisque 24–25 (1975), 81–97.
[Za2] Zagier, D.B.: On the values at negative integers of the zeta-function of a real quadratic field, Enseignement Math. (2) 22 (1976), 55–95.
[Za3] Zagier, D.B.: Valeurs des fonctions zeta des corps quadratiques reels aux entiers negatifs, Journ. Arithmetiques de Caen, Asterisque 41–42 (1977), 135–151.
[Za4] Zagier, D.B.: Zeta-funktionen und quadratische Körper, Hochschultext, Springer 1981.

Index

A
Area Principle, 44–59, 251–254, 260–279, 346, 355–367

B
Badly approximable numbers, 6, 32, 118–120, 148, 226
Beck–Chen, 30
Beck, J., 93, 145
Binary quadratic form, 21, 84, 148, 179, 251, 257
  reduced, 183
Birkhoff, 143
Blocks-and-gaps decomposition, 371–393
Bohl, 7
Borel, E., 137
Bumby, 123

C
Cassels, J.W., 356
Chazelle, B., 30
Continued fraction
  convergent, 60
  partial quotient, 6, 32, 79, 85, 93, 118, 142, 144

D
Davenport, H., 32
Descombes, I.R., 356
Dieter, U., 101
Dupain-Sós, 11
Dupain, Y., 11

E
Elliott, P.D.T.A., 147
Erdős-Hunt, 473, 474
Erdős, P., 265
Euler, 85, 142, 187, 255, 355
Extra Rule, 23, 46, 48, 49, 60, 65, 66, 93, 208, 209, 212, 214, 216, 223, 228, 356

F
Feller, W., 49, 56, 73
Fibonacci number, 47, 60, 67, 76, 77, 219
Formula
  Dedekind’s reciprocity, 89
  Hirzebruch-Mayer-Zagier (HMZ), 85–87, 161, 163, 164, 193
  Ostrowski’s explicit, 23–26, 46, 77, 207, 226
  Parseval’s, 32, 33, 262, 423, 425, 441
  Poisson’s summation, 242, 262, 423, 424

G
Gauss, 143, 182, 256
  measure, 144
Giant Leap, 1–16, 137–147, 254–261

H
Halász, G., 290
Hardy–Littlewood, 8, 18, 31, 98–100, 125, 239
  series, 100, 120–123, 125, 127
Hardy–Wright, 6, 154
Hurwitz, 123, 124, 255
Hyperbolic
  needle, 252, 253, 259, 273, 283–286, 288, 294–296, 298, 300, 301, 305, 312, 336, 347, 348, 351–355, 371, 395, 403
  triangle, 151, 152, 372, 375, 376, 390, 393, 403

I
Inequality
  Koksma’s, 128, 136, 172
  Kolmogorov’s, 73, 74, 238
Irrational rotation, 6–8, 11, 12, 14, 15, 17, 23, 29–32, 100, 105, 128, 129, 163, 197, 227, 356, 367

K
Kac, M., 147
Kesten, H., 244
Khinchin, A., 8, 144, 264
Knuth, D.E., 98, 103, 141
Kolmogorov, A., 140, 265
Komlós, 52
Kronecker, 7, 101, 255
Kusmin, 143, 144, 244, 471

L
Lang, S., 8, 142, 257
Law of the iterated logarithm (LIL), 258, 263, 266, 446, 451, 468, 470, 473
Lemma
  Borel-Cantelli, 56, 450, 477
  on Bounded Error Initial Segments, 9–10, 14–16, 30
  on Bounded Error Intervals, 7–9, 13–15
  Discrepancy, 10, 11, 31, 129, 171
  Hecke’s, 13–15
  on Just Intervals, 14, 15, 30
  on Restricted Permutations, 14

M
Markov chain
  ergodic, 211, 219–223
  stationary distribution, 50, 68, 77
Matousek, J., 30

O
Ostrowski, A., 8, 11, 18, 23, 31, 239, 355
Ostrowski’s large fluctuation result, 26

P
Pell’s equation, 6, 13, 21, 80–82, 85, 122, 123, 149, 179, 180, 182, 193, 203, 249–367, 371
Pell’s inequalities
  homogeneous, 255–261
  inhomogeneous, 255–261, 471

Q
Quadratic field
  class number, 21, 22, 84, 161–165, 181–186
  Dedekind zeta function, 181, 192, 196, 199, 243
  Dirichlet character, 163, 187
  Dirichlet L-function, 187
  fundamental unit, 13, 84, 86, 162, 165, 180, 193–195, 198, 199, 215, 217, 260
  norm, 83, 171
  primary representation, 83, 148, 149, 153, 161, 165, 180, 196, 198
  unique factorization, 84
  unit, 84
Quadratic irrational, 3–6, 11, 12, 15, 20, 26, 28, 31, 64, 78, 80–83, 87, 88, 116, 118, 121–123, 141, 144, 167, 176, 180, 196–197, 203, 204, 207, 213, 219, 223, 226, 238, 243, 244, 251, 255, 258, 267, 286, 470

R
Rademacher function, 263
  modified, 290–294, 298–300
Rademacher-Grosswald, 89
Rademacher like function, 374, 378, 380–383, 385, 386, 389, 390, 392, 393, 408, 409
Riesz product, 279–311, 343–345, 347–349, 352, 353
Roth, K.F., 31, 144, 289, 290

S
Sawtooth function, 91, 116, 126, 240
Schmidt, W.M., 31
Schoissengeier, J., 93, 98
Shintani, T., 197
Sierpinski, H., 7, 138
Sós, 14, 227, 356, 361
Sós-Zaremba, 33
Sum
  Dedekind, 64, 78, 88, 89, 91, 92, 94, 98, 100, 101
  Gauss, 188, 190
Super-irregularity, 261, 272
Swierczkowski, S., 14

T
Theorem
  central limit, 6, 11–13, 18, 20–23, 27, 28, 30–43, 65, 67, 72, 73, 76, 78, 139, 141, 148, 167, 181, 210, 232, 243–247, 258, 262–266, 373, 405–413, 470
  Converse of Lagrange’s, 5–7
  Feller’s, 446, 451–457, 461, 462, 464–466
  Lagrange’s, 5
  Roth’s, 32, 145
  three-distance, 14, 15, 30

U
Uniform distribution, 3–16, 30, 129, 141, 241, 330

V
Van Aardenne-Ehrenfest, 31
Van der Corput, J.G., 29, 30
  sequence, 29–43, 59
Von Mises, 140, 141

W
Weyl, H., 7, 8
Weyl’s criterion, 8
Wolfram, S., 145

Z
Zagier, D.B., 85, 87, 148, 162, 183, 184, 192, 197, 257
