Introduction to Number Theory


Author: L.-K. Hua


Hua Loo Keng

Introduction to Number Theory Translated from the Chinese by Peter Shiu With 14 Figures

Springer-Verlag Berlin Heidelberg New York 1982

Hua Loo Keng Institute of Mathematics Academia Sinica Beijing The People's Republic of China

Peter Shiu Department of Mathematics University of Technology Loughborough Leicestershire LE 11 3 TU United Kingdom

ISBN 3-540-10818-1 Springer-Verlag Berlin Heidelberg New York ISBN 0-387-10818-1 Springer-Verlag New York Heidelberg Berlin

Library of Congress Cataloging in Publication Data. Hua, Loo-Keng, 1910-. Introduction to number theory. Translation of: Shu lun tao yin. Bibliography: p. Includes index. 1. Numbers, Theory of. I. Title. QA241.H7513. 512'.7. 82-645. ISBN 0-387-10818-1 (U.S.). AACR2

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically those of translation, reprinting, reuse of illustrations, broadcasting, reproduction by photocopying machine or similar means, and storage in data banks. Under § 54 of the German Copyright Law where copies are made for other than private use a fee is payable to "Verwertungsgesellschaft Wort", Munich. © Springer-Verlag Berlin Heidelberg 1982 Printed in Germany

Typesetting: Buchdruckerei Dipl.-Ing. Schwarz' Erben KG, Zwettl. Printing and binding: Konrad Triltsch, Würzburg 2141/3140-543210

Preface to the English Edition

The reasons for writing this book have already been given in the preface to the original edition and it suffices to append a few more points.

In the original edition I collected various recent results in number theory and put them in a textbook suitable for teaching purposes. The book contains: the elementary proof of the prime number theorem due to Selberg and Erdős; Roth's theorem; A. O. Gelfond's solution to Hilbert's seventh problem; Siegel's theorem on the class number of binary quadratic forms; Linnik's proof of the Hilbert-Waring theorem; Selberg's sieve method and Schnirelman's theorem on the Goldbach problem; Vinogradov's result concerning least quadratic non-residues. It also contains some of my own results, for example, on the estimation of complete trigonometric sums, on least primitive roots, and on the Prouhet-Tarry problem. The reader can see that the book is much influenced by the work of Landau, Hardy, Mordell, Davenport, Vinogradov, Erdős and Mahler.

In the quarter of a century between the two editions of the book there have been, of course, many new and exciting developments in number theory, and I am grateful to Professor Wang Yuan for incorporating many new results which will guide the reader to the literature concerning the latest developments.

It has been doubtful in the past whether number theory is a "useful" branch of mathematics. It is futile to get too involved in the argument but it may be relevant to point out some specific examples of applications. The fundamental principle behind the Public Key Code is the following: it is not difficult to construct a large prime number but it is not easy to factorize a large composite integer. For example, it only takes 45 seconds computing time to find the first prime exceeding $2^{200}$ (namely $2^{200} + 235$, a number with 61 digits), but the computing time required to factorize a product of two primes, each with 61 digits, exceeds 4 million million years. According to Fermat's theorem: if $p$ is prime then $a^{p-1} \equiv 1 \pmod{p}$, and if $n$ is composite then $a^{\varphi(n)} \equiv 1 \pmod{n}$, $\varphi(n) < n - 1$ (in both cases for $a$ coprime to the modulus). The determination of whether $n$ is prime by this method is quite fast and this is included in the book.

Next, the location of the zeros of the Riemann Zeta function is a problem in pure mathematics. However, an interesting problem emerged during calculations of these zeros: can mathematicians always rely on the results obtained from computing machines, and if there are mistakes in the machines how do we find out? Generally speaking, calculations by machines have to be accepted by faith. For this reason Rosser, Schoenfeld and Yohe were particularly careful when they used computers to calculate the zeros of the Riemann Zeta function. In their critical examination of the program they discovered that there were several logical errors in the machine itself. The machine had been in use for some years and no-one had found these errors until the three mathematicians wanted to scrutinize the results on a problem which has no practical applications.

Apart from these there are applications from algebraic number theory and from the theory of rational approximations to real numbers which we need not mention.

Finally I must point out that this English edition owes its existence to Professor Heini Halberstam for suggesting it, to Dr. Peter Shiu for translating it and to Springer-Verlag for publishing it. I am particularly grateful to Peter Shiu for his excellent translation and to Springer-Verlag for their beautiful printing.

March 1981, Beijing

Hua Loo-Keng

Preface to the Original Edition

This preface has been revised more than once. The reason is that, during the last fifteen years, the author's knowledge of mathematics has changed and the needs of the readers are different. Moreover the content of the book has been so expanded during this period that the old preface has become quite unsuitable. Everything is still very clear in my memory. The plan for the book was conceived round about 1940 when I first lectured on number theory at Kwang Ming University. I had written some 85 thousand words (characters) for the first draft and I estimated that another 25 thousand words were needed to complete the manuscript. But where was I to publish the work? I therefore could not summon up the energy required to complete the project. Later when lecturing in America I made additions and revisions to the manuscripts, but these were made for my teaching requirements and not with a view to publishing the book. The real effort required for the task was given after the liberation. Since our country has very few reference books there is need for a broad introductory text in number theory. It seems a little peculiar that, even though we have been busier after the liberation, with the help of comrades the project actually has progressed faster. The book has also increased in size with the addition of new chapters and the incorporation of recent results which are within its scope. Apart from giving a broad introduction to number theory and some of its fundamental principles the author has also tried to emphasize several points to its readers. First there is a close relationship between number theory and mathematics as a whole. In the history of mathematics we often see the various problems, methods and concepts in number theory having a significant influence on the progress of mathematics. On the other hand there are also frequent instances of applying the methods and results of the other branches of mathematics to solve concrete problems in number theory. 
However it is often not easy to see this relationship in many existing introductory books. Indeed many "self-contained" books for beginners in number theory give an erroneous impression to their readers that number theory is an isolated and independent branch of mathematics. In this book the author tries to highlight this relationship within the scope of elementary number theory. For example: the relationship between the prime number theorem and Fourier series (the limitation on the nature of the book does not allow us to describe the relationship between the prime number theorem and integral functions); the partition problem, the four squares problem and their relationship to modular functions, the theory of quadratic forms, modular transformations and their relationship to Lobachevskian geometry etc.



Secondly an important progression in mathematics is the development of abstract concepts from concrete examples. Specific concrete examples are often the basis of abstract notions and the methods employed on the examples are frequently the source of deep and powerful techniques in advanced mathematics. One cannot go very far by merely learning bare definitions and methods from abstract notions without knowing the source of the definitions in the concrete situation. Indeed such an approach may lead to insurmountable difficulties later in research situations. The history of mathematics is full of examples in which whole subjects were developed from methods employed to tackle practical problems, for example, in mechanics and in physics. As for mathematics itself the most fundamental notions are "numbers" and "shapes". From "shapes" we have geometric intuition and from "numbers" we have arithmetic operations which are rich sources for mathematics. In this book the author tries to bring out the concrete examples underlying the abstract notions hoping that the readers may remember them when they make further advances in mathematics. For example, in Chapter 4 and Chapter 14, concrete examples are given to illustrate abstract algebra; indeed the example on finite fields describes the situation of general finite fields.

Thirdly, for beginners engaging in research, a most difficult feature to grasp is that of quality, that is, the depth of a problem. Sometimes authors work courageously and at length to arrive at results which they believe to be significant and which experts consider to be shallow. This can be explained by the analogy of playing chess. A master player can dispose of a beginner with ease no matter how hard the latter tries.
The reason is that, even though the beginner may have planned a good number of moves ahead, by playing often the master has met many similar and deeper problems; he has read standard works on various aspects of the game so that he can recall many deeply analyzed positions. This is the same in mathematical research. We have to play often with the masters (that is, try to improve on the results of famous mathematicians); we must learn the standard works of the game (that is, the "well-known" results). If we continue like this our progress becomes inevitable. This book attempts to direct the reader to work in this way. Although the nature of the book excludes the very deep results in number theory the author introduces different methods with varying depths. For example, in the estimation of the partition function $p(n)$, the simplest of algebraic methods is used first to get a rough estimate, then using a slightly deeper method the asymptotic formula for $\log p(n)$ is obtained. It is also indicated how an asymptotic formula for $p(n)$ can be obtained by a Tauberian method and how an asymptotic expansion for $p(n)$ can be obtained using results in advanced modular function theory and methods in analytic number theory. It is then easy to judge the various levels of depth in the methods used by following the successive improvement of results.

The book is not written for a university course; its content far exceeds the syllabus for a single course in number theory. However lecturers can use it as a course text by taking Chapters 1 - 6 together with a suitable selection from the other chapters. Actually the book does not demand much previous knowledge in mathematics. Second year university students could understand most of the book, and those who know advanced calculus could understand the whole book apart from Sections 9.2, 12.14, 12.15 and 17.9 where some knowledge of complex function theory is required. Those studying by themselves should not find any special difficulties either.

I am eternally grateful to the following comrades: Yue Min Yi, Wang Yuan, Wu Fang, Yan Shi Jian, Wei Dao Zheng, Xu Kong Shi and Ren Jian Hua. Since 1953, when I began my lectures, they have continually given me suggestions, and sometimes even offered to help with the revision. They have also assisted me throughout the stages of publication, particularly comrade Yue Min Yi. I would also like to thank Professor Zhang Yuan Da for his valuable suggestion on a method of preparing the manuscript for the typesetter.

Although we have collectively laboured over the book it must still contain many mistakes. I should be grateful if readers would inform me of these, whether they are misprints, errors in content, or other suggestions. There is much material that appears here for the first time in a book, as well as some unpublished research material, so that there must be plenty of room for improvement. Concerning this point we invite the readers for their valuable contributions.

September 1956, Beijing

Hua Loo-Keng

Table of Contents

List of Frequently Used Symbols

Chapter 1. The Factorization of Integers
1.1 Divisibility
1.2 Prime Numbers and Composite Numbers
1.3 Prime Numbers
1.4 Integral Modulus
1.5 The Fundamental Theorem of Arithmetic
1.6 The Greatest Common Factor and the Least Common Multiple
1.7 The Inclusion-Exclusion Principle
1.8 Linear Indeterminate Equations
1.9 Perfect Numbers
1.10 Mersenne Numbers and Fermat Numbers
1.11 The Prime Power in a Factorial
1.12 Integral Valued Polynomials
1.13 The Factorization of Polynomials
Notes

Chapter 2. Congruences
2.1 Definition
2.2 Fundamental Properties of Congruences
2.3 Reduced Residue System
2.4 The Divisibility of $2^{p-1} - 1$ by $p^2$
2.5 The Function $\varphi(m)$
2.6 Congruences
2.7 The Chinese Remainder Theorem
2.8 Higher Degree Congruences
2.9 Higher Degree Congruences to a Prime Power Modulus
2.10 Wolstenholme's Theorem

Chapter 3. Quadratic Residues
3.1 Definitions and Euler's Criterion
3.2 The Evaluation of Legendre's Symbol
3.3 The Law of Quadratic Reciprocity
3.4 Practical Methods for the Solutions
3.5 The Number of Roots of a Quadratic Congruence
3.6 Jacobi's Symbol
3.7 Two Terms Congruences
3.8 Primitive Roots and Indices
3.9 The Structure of a Reduced Residue System

Chapter 4. Properties of Polynomials
4.1 The Division of Polynomials
4.2 The Unique Factorization Theorem
4.3 Congruences
4.4 Integer Coefficients Polynomials
4.5 Polynomial Congruences with a Prime Modulus
4.6 On Several Theorems Concerning Factorizations
4.7 Double Moduli Congruences
4.8 Generalization of Fermat's Theorem
4.9 Irreducible Polynomials mod $p$
4.10 Primitive Roots
4.11 Summary

Chapter 5. The Distribution of Prime Numbers
5.1 Order of Infinity
5.2 The Logarithm Function
5.3 Introduction
5.4 The Number of Primes is Infinite
5.5 Almost All Integers are Composite
5.6 Chebyshev's Theorem
5.7 Bertrand's Postulate
5.8 Estimation of a Sum by an Integral
5.9 Consequences of Chebyshev's Theorem
5.10 The Number of Prime Factors of $n$
5.11 A Prime Representing Function
5.12 On Primes in an Arithmetic Progression
Notes

Chapter 6. Arithmetic Functions
6.1 Examples of Arithmetic Functions
6.2 Properties of Multiplicative Functions
6.3 The Möbius Inversion Formula
6.4 The Möbius Transformation
6.5 The Divisor Function
6.6 Two Theorems Related to Asymptotic Densities
6.7 The Representation of Integers as a Sum of Two Squares
6.8 The Methods of Partial Summation and Integration
6.9 The Circle Problem
6.10 Farey Sequence and Its Applications
6.11 Vinogradov's Method of Estimating Sums of Fractional Parts
6.12 Application of Vinogradov's Theorem to Lattice Point Problems
6.13 $\Omega$-results
6.14 Dirichlet Series
6.15 Lambert Series
Notes

Chapter 7. Trigonometric Sums and Characters
7.1 Representation of Residue Classes
7.2 Character Functions
7.3 Types of Characters
7.4 Character Sums
7.5 Gauss Sums
7.6 Character Sums and Trigonometric Sums
7.7 From Complete Sums to Incomplete Sums
7.8 Applications of the Character Sum $\sum_{x=1}^{p} \left(\frac{x^2 + ax + b}{p}\right)$
7.9 The Problem of the Distribution of Primitive Roots
7.10 Trigonometric Sums Involving Polynomials
Notes

Chapter 8. On Several Arithmetic Problems Associated with the Elliptic Modular Function
8.1 Introduction
8.2 The Partition of Integers
8.3 Jacobi's Identity
8.4 Methods of Representing Partitions
8.5 Graphical Method for Partitions
8.6 Estimates for $p(n)$
8.7 The Problem of Sums of Squares
8.8 Density
8.9 A Summary of the Problem of Sums of Squares

Chapter 9. The Prime Number Theorem
9.1 Introduction
9.2 The Riemann $\zeta$-Function
9.3 Several Lemmas
9.4 A Tauberian Theorem
9.5 The Prime Number Theorem
9.6 Selberg's Asymptotic Formula
9.7 Elementary Proof of the Prime Number Theorem
9.8 Dirichlet's Theorem
Notes

Chapter 10. Continued Fractions and Approximation Methods
10.1 Simple Continued Fractions
10.2 The Uniqueness of a Continued Fraction Expansion
10.3 The Best Approximation
10.4 Hurwitz's Theorem
10.5 The Equivalence of Real Numbers
10.6 Periodic Continued Fractions
10.7 Legendre's Criterion
10.8 Quadratic Indeterminate Equations
10.9 Pell's Equation
10.10 Chebyshev's Theorem and Khintchin's Theorem
10.11 Uniform Distributions and the Uniform Distribution of $n\vartheta \pmod 1$
10.12 Criteria for Uniform Distributions

Chapter 11. Indeterminate Equations
11.1 Introduction
11.2 Linear Indeterminate Equations
11.3 Quadratic Indeterminate Equations
11.4 The Solution to $ax^2 + bxy + cy^2 = k$
11.5 Method of Solution
11.6 Generalization of Soon Go's Theorem
11.7 Fermat's Conjecture
11.8 Markoff's Equation
11.9 The Equation $x^3 + y^3 + z^3 + w^3 = 0$
11.10 Rational Points on a Cubic Surface
Notes

Chapter 12. Binary Quadratic Forms
12.1 The Partitioning of Binary Quadratic Forms into Classes
12.2 The Finiteness of the Number of Classes
12.3 Kronecker's Symbol
12.4 The Number of Representations of an Integer by a Form
12.5 The Equivalence of Forms mod $q$
12.6 The Character System for a Quadratic Form and the Genus
12.7 The Convergence of the Series $K(d)$
12.8 The Number of Lattice Points Inside a Hyperbola and an Ellipse
12.9 The Limiting Average
12.10 The Class Number: An Analytic Expression
12.11 The Fundamental Discriminants
12.12 The Class Number Formula
12.13 The Least Solution to Pell's Equation
12.14 Several Lemmas
12.15 Siegel's Theorem
Notes

Chapter 13. Unimodular Transformations
13.1 The Complex Plane
13.2 Properties of the Bilinear Transformation
13.3 Geometric Properties of the Bilinear Transformation
13.4 Real Transformations
13.5 Unimodular Transformations
13.6 The Fundamental Region
13.7 The Net of the Fundamental Region
13.8 The Structure of the Modular Group
13.9 Positive Definite Quadratic Forms
13.10 Indefinite Quadratic Forms
13.11 The Least Value of an Indefinite Quadratic Form

Chapter 14. Integer Matrices and Their Applications
14.1 Introduction
14.2 The Product of Matrices
14.3 The Number of Generators for Modular Matrices
14.4 Left Association
14.5 Invariant Factors and Elementary Divisors
14.6 Applications
14.7 Matrix Factorizations and Standard Prime Matrices
14.8 The Greatest Common Factor and the Least Common Multiple
14.9 Linear Modules

Chapter 15. p-adic Numbers
15.1 Introduction
15.2 The Definition of a Valuation
15.3 The Partitioning of Valuations into Classes
15.4 Archimedian Valuations
15.5 Non-Archimedian Valuations
[...]

List of Frequently Used Symbols

[...] is Kronecker's symbol; see §12.3.
ind $n$ denotes the index of $n$; see §3.8.
$\partial^{\circ} f$ denotes the degree of the polynomial $f(x)$.
$\ll$, $O$, $o$, $\sim$: see §5.1.
$\omega(n)$ denotes the number of distinct prime divisors of $n$.
$\Omega(n)$ denotes the total number of prime divisors of $n$.
$\max(a, b, \ldots, c)$ denotes the greatest number among $a, b, \ldots, c$.
$\min(a, b, \ldots, c)$ denotes the least number among $a, b, \ldots, c$.
$\Re s$ denotes the real part of the complex number $s$.
$\gamma$ denotes Euler's constant.
$\{a, b, c\}$ represents the quadratic form $ax^2 + bxy + cy^2$; see §12.1.
$(z_1, z_2, z_3, z_4)$ denotes the cross ratio of the four points $z_1, z_2, z_3, z_4$; see §13.3.
$A \sim B$ means that the matrices $A$ and $B$ are left associated.
$N(\mathfrak{M})$ denotes the norm of $\mathfrak{M}$; see §14.9.
$\{a_n\}$ denotes the sequence $a_1, a_2, \ldots$.
$\sim$ is an equivalence sign; see §12.1, §13.6, §14.5 and §16.12.

$[a_0, a_1, \ldots, a_N]$ or $a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots + \cfrac{1}{a_N}}}$ denotes a finite continued fraction.
$p_n/q_n = [a_0, a_1, \ldots, a_n]$ is the $n$-th convergent of a continued fraction.
$S(\alpha) = \alpha^{(1)} + \alpha^{(2)} + \cdots + \alpha^{(n)}$ is the trace of $\alpha$.
$N(\alpha) = \alpha^{(1)} \alpha^{(2)} \cdots \alpha^{(n)}$ is the norm of $\alpha$.
$\Delta(\alpha_1, \ldots, \alpha_n)$ denotes the discriminant of $\alpha_1, \ldots, \alpha_n$; $\Delta = \Delta(R(\vartheta))$ denotes the discriminant of the integral basis for $R(\vartheta)$. See §16.3 and §16.4.

[...] $n > 1$. If $n$ is prime, then there is nothing to prove. Suppose now that $n$ is not prime and that $q_1$ is the least proper divisor. By Theorem 1.3, $q_1$ must be a prime number. Let $n = q_1 n_1$, $1 < n_1 < n$. If $n_1$ is prime, then the required result is proved; otherwise we let $q_2$ be the least prime divisor of $n_1$ giving

$$n = q_1 q_2 n_2, \qquad 1 < n_2 < n_1 < n.$$

Continuing the argument we have $n > n_1 > n_2 > \cdots > 1$, and the process must terminate before $n$ steps so that eventually we have

$$n = q_1 q_2 \cdots q_s,$$

where $q_1, \ldots, q_s$ are prime numbers. The theorem is proved. □
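The proof above is effectively an algorithm: split off the least divisor greater than 1 (which is necessarily prime) and repeat on the quotient. A minimal Python sketch of that procedure follows; the function names are illustrative and are not taken from the book.

```python
def least_divisor(n: int) -> int:
    """Return the least divisor d of n with d > 1 (this is n itself when n is prime)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def prime_factors(n: int) -> list[int]:
    """Factor n > 1 by repeatedly splitting off the least divisor, as in the proof."""
    if n <= 1:
        raise ValueError("n must exceed 1")
    factors = []
    while n > 1:
        q = least_divisor(n)  # the least divisor > 1 is always prime
        factors.append(q)
        n //= q               # the quotients decrease: n > n_1 > n_2 > ... > 1
    return factors


print(prime_factors(360))  # [2, 2, 2, 3, 3, 5]
```

Because the least divisor is removed first each time, the primes come out in non-decreasing order.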

We can arrange the prime numbers in Theorem 2.1 as follows:

$$n = p_1^{a_1} p_2^{a_2} \cdots p_k^{a_k}, \qquad a_1 > 0,\ a_2 > 0,\ \ldots,\ a_k > 0, \qquad p_1 < p_2 < \cdots < p_k.$$

[...] $\max(a, b, \ldots, k, l)$ positive integers. The number of integers without the properties $\alpha, \beta, \ldots$ is $N - \max(a, b, \ldots, k, l)$. The required result follows from Theorem 7.1. □

Theorem 7.1 can also be used to prove the following two theorems:

Theorem 7.3.

$$[a_1, \ldots, a_n] = a_1 \cdots a_n \,(a_1, a_2)^{-1} \cdots (a_{n-1}, a_n)^{-1} (a_1, a_2, a_3) \cdots (a_1, \ldots, a_n)^{(-1)^{n+1}}.$$

Theorem 7.4.

$$(a_1, \ldots, a_n) = a_1 \cdots a_n \,[a_1, a_2]^{-1} \cdots [a_{n-1}, a_n]^{-1} [a_1, a_2, a_3] \cdots [a_1, \ldots, a_n]^{(-1)^{n+1}}.$$

1.8 Linear Indeterminate Equations

[...] $a > 0$, $b > 0$. Then every integer greater than $ab - a - b$ is representable as $ax + by$ ($x \ge 0$, $y \ge 0$). Moreover, $ab - a - b$ is not representable as such.


Proof. From Theorem 8.2 we know that the solutions to the equation $n = ax + by$ take the form

$$x = x_0 + bt, \qquad y = y_0 - at.$$

We now select $t$ so that $x$ and $y$ are non-negative. We can choose $t$ so that $0 \le y_0 - at < a$, or $0 \le y_0 - at \le a - 1$. From the hypothesis, we have

$$(x_0 + bt)a = n - (y_0 - at)b > ab - a - b - (a - 1)b = -a,$$

or $x_0 + bt > -1$, so that $x_0 + bt \ge 0$.

Finally, suppose if possible that

$$ab - a - b = ax + by, \qquad x \ge 0, \quad y \ge 0.$$

Then we have

$$ab = (x + 1)a + (y + 1)b.$$

Since $(a, b) = 1$, it follows that $a \mid y + 1$, $b \mid x + 1$, so that $y + 1 \ge a$ and $x + 1 \ge b$, hence

$$ab = (x + 1)a + (y + 1)b \ge 2ab,$$

which is impossible. □

The above theorem can be interpreted as follows: If $a > 0$, $b > 0$, $(a, b) = 1$, then $ab - a - b$ is the largest integer not representable as $ax + by$ ($x \ge 0$, $y \ge 0$). We can generalize this to the following problem: Let $a, b, c$ be three positive integers satisfying $(a, b, c) = 1$. Determine the largest integer not representable as $ax + by + cz$ ($x \ge 0$, $y \ge 0$, $z \ge 0$). This is an unsolved problem.
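The two-variable statement is easy to confirm by exhaustive search for small coprime pairs. The sketch below is only a numerical check, not a proof; the helper names and the search bound `2*a*b` are choices made for this illustration.

```python
from math import gcd


def representable(n: int, a: int, b: int) -> bool:
    """True when n = a*x + b*y has a solution with x >= 0 and y >= 0."""
    return any((n - a * x) % b == 0 for x in range(n // a + 1))


def largest_non_representable(a: int, b: int) -> int:
    """Brute-force search for the largest n >= 0 that is not of the form ax + by.

    Assumes a, b > 0 with gcd(a, b) == 1; returns -1 when every n >= 0 is representable.
    """
    if a <= 0 or b <= 0 or gcd(a, b) != 1:
        raise ValueError("a and b must be coprime positive integers")
    # Search well beyond the claimed value ab - a - b.
    misses = [n for n in range(2 * a * b) if not representable(n, a, b)]
    return misses[-1] if misses else -1


for a, b in [(3, 5), (5, 7), (4, 9)]:
    print(a, b, largest_non_representable(a, b), a * b - a - b)
```

For each pair the last two printed numbers agree, for example 7 for $(a, b) = (3, 5)$.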

Exercise 1. Let $a > 0$, $b > 0$ and $(a, b) = 1$. Then the number of non-negative solutions to the equation $ax + by = n$ is equal to

$$\left[\frac{n}{ab}\right] \quad \text{or} \quad \left[\frac{n}{ab}\right] + 1.$$

(Hint: $[\alpha] - [\beta] = [\alpha - \beta]$ or $[\alpha - \beta] + 1$.)

Exercise 2. Let $a, b, c$ be positive integers satisfying $(a, b) = (b, c) = (c, a) = 1$. Determine the largest integer not representable as

$$bcx + cay + abz, \qquad x \ge 0, \quad y \ge 0, \quad z \ge 0.$$

(Answer: $2abc - ab - bc - ca$.)
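The stated answer to Exercise 2 can be spot-checked numerically. The sketch below marks which integers below a chosen limit are non-negative combinations of $bc$, $ca$, $ab$; the function name and the limit `4*a*b*c` are assumptions made for this illustration, and the check is no substitute for the proof the exercise asks for.

```python
def largest_gap(parts: list[int], limit: int) -> int:
    """Largest n < limit that is not a non-negative integer combination of parts."""
    reachable = [False] * limit
    reachable[0] = True
    for n in range(1, limit):
        # n is reachable if removing one part leaves a reachable number
        reachable[n] = any(n >= p and reachable[n - p] for p in parts)
    gaps = [n for n in range(limit) if not reachable[n]]
    return gaps[-1] if gaps else -1


a, b, c = 2, 3, 5  # pairwise coprime
found = largest_gap([b * c, c * a, a * b], limit=4 * a * b * c)
claimed = 2 * a * b * c - a * b - b * c - c * a
print(found, claimed)  # 29 29
```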


Exercise 3. Determine the number of solutions to

$$x + 2y + 3z = n, \qquad x \ge 0, \quad y \ge 0, \quad z \ge 0.$$

(Hint: The required number is the coefficient of $x^n$ in the power series expansion for

$$\frac{1}{(1 - x)(1 - x^2)(1 - x^3)}.$$

The power series can be obtained by the method of partial fractions. Answer:

$$\frac{(n + 3)^2}{12} - \frac{7}{72} + \frac{(-1)^n}{8} + \frac{2}{9}\cos\frac{2n\pi}{3}.)$$

Exercise 4. (Ancient Chinese publication.) Cockerel one, five cents; chicken one, three cents; baby chicks three, one cent. One hundred cents are paid for one hundred birds. How many cockerels, chickens and baby chicks are there?
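Both exercises lend themselves to a quick enumeration. The sketch below compares the closed form quoted in the answer to Exercise 3 with a direct count, and lists the solutions of the hundred-birds puzzle of Exercise 4 by searching over the two larger birds. Function names are illustrative only.

```python
from math import cos, pi


def count_solutions(n: int) -> int:
    """Number of (x, y, z) >= 0 with x + 2y + 3z = n; x is fixed once y and z are chosen."""
    return sum((n - 3 * z) // 2 + 1 for z in range(n // 3 + 1))


def closed_form(n: int) -> float:
    """The expression quoted as the answer to Exercise 3."""
    return (n + 3) ** 2 / 12 - 7 / 72 + (-1) ** n / 8 + 2 / 9 * cos(2 * n * pi / 3)


def hundred_birds() -> list[tuple[int, int, int]]:
    """All (cockerels, chickens, chicks) with 100 birds costing 100 cents in total."""
    solutions = []
    for cockerels in range(21):        # 5 cents each, so at most 20
        for chickens in range(34):     # 3 cents each, so at most 33
            chicks = 100 - cockerels - chickens
            # Prices in thirds of a cent: 15, 9 and 1 per bird, 300 in total.
            if chicks >= 0 and 15 * cockerels + 9 * chickens + chicks == 300:
                solutions.append((cockerels, chickens, chicks))
    return solutions


print([count_solutions(n) for n in range(8)])   # [1, 1, 2, 3, 4, 5, 7, 8]
print(hundred_birds())  # [(0, 25, 75), (4, 18, 78), (8, 11, 81), (12, 4, 84)]
```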

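1.9 Perfect Numbers

Theorem 9.1. Let $\sigma(n)$ denote the sum of the divisors of $n$. If $n = p_1^{a_1} \cdots p_s^{a_s}$, then

$$\sigma(n) = \frac{p_1^{a_1 + 1} - 1}{p_1 - 1} \cdots \frac{p_s^{a_s + 1} - 1}{p_s - 1}.$$

Proof. All the divisors of $n$ are of the form

$$p_1^{x_1} \cdots p_s^{x_s}, \qquad 0 \le x_1 \le a_1, \ \ldots, \ 0 \le x_s \le a_s.$$

Therefore we have

$$\sigma(n) = \sum_{x_1 = 0}^{a_1} \cdots \sum_{x_s = 0}^{a_s} p_1^{x_1} \cdots p_s^{x_s} = \sum_{x_1 = 0}^{a_1} p_1^{x_1} \sum_{x_2 = 0}^{a_2} p_2^{x_2} \cdots \sum_{x_s = 0}^{a_s} p_s^{x_s} = \frac{p_1^{a_1 + 1} - 1}{p_1 - 1} \cdots \frac{p_s^{a_s + 1} - 1}{p_s - 1}. \qquad \square$$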
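As a sanity check on Theorem 9.1, the product formula can be compared with a direct summation over divisors. The following Python sketch does this; the trial-division factorization and the function names are choices made for this illustration.

```python
def factorize(n: int) -> dict[int, int]:
    """Return {prime: exponent} for n >= 1, by trial division."""
    factors: dict[int, int] = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def sigma_formula(n: int) -> int:
    """sigma(n) computed as the product of (p^(a+1) - 1)/(p - 1) over prime powers of n."""
    result = 1
    for p, a in factorize(n).items():
        result *= (p ** (a + 1) - 1) // (p - 1)
    return result


def sigma_direct(n: int) -> int:
    """sigma(n) computed by summing every divisor of n."""
    return sum(d for d in range(1, n + 1) if n % d == 0)


print(sigma_formula(360), sigma_direct(360))  # 1170 1170
```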

An immediate consequence of this theorem is:

Theorem 9.2. If $(m, n) = 1$, then $\sigma(mn) = \sigma(m)\sigma(n)$. □

Note: $\sigma(n)$ is called an arithmetic function. An arithmetic function possessing the property of Theorem 9.2 is called a multiplicative function.

Definition. A positive integer $n$ is called a perfect number if $\sigma(n) = 2n$.

Examples of perfect numbers are:

$$6 = 1 + 2 + 3, \qquad 28 = 1 + 2 + 4 + 7 + 14.$$
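The definition and the multiplicative property can both be explored by brute force. The sketch below lists the perfect numbers below 10000 and tests $\sigma(mn) = \sigma(m)\sigma(n)$ on small coprime pairs; the bounds are arbitrary choices for this illustration.

```python
from math import gcd


def sigma(n: int) -> int:
    """Sum of all positive divisors of n, pairing each divisor d with n // d."""
    total = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
        d += 1
    return total


perfect = [n for n in range(1, 10000) if sigma(n) == 2 * n]
print(perfect)  # [6, 28, 496, 8128]

multiplicative = all(
    sigma(m * n) == sigma(m) * sigma(n)
    for m in range(1, 60)
    for n in range(1, 60)
    if gcd(m, n) == 1
)
print(multiplicative)  # True
```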


Theorem 9.3. Let $p = 2^n - 1$ be prime. Then

$$\tfrac{1}{2} p(p + 1) = 2^{n-1}(2^n - 1)$$

is perfect. Moreover, every even perfect number is of this form.

Proof. 1) From Theorem 9.1 we have

$$\sigma\left(\tfrac{1}{2} p(p + 1)\right) = \frac{2^n - 1}{2 - 1} \cdot \frac{p^2 - 1}{p - 1} = (2^n - 1)(p + 1) = p(p + 1).$$

2) Let $a$ be any even perfect number. Set

$$a = 2^{n-1} u, \qquad u > 1, \qquad 2 \nmid u,$$

where $n > 1$ because $a$ is even. Then, by Theorem 9.2,

$$2^n u = 2a = \sigma(a) = \frac{2^n - 1}{2 - 1}\,\sigma(u),$$

and so

$$\sigma(u) = \frac{2^n u}{2^n - 1} = u + \frac{u}{2^n - 1}.$$

But $u$ and $u/(2^n - 1)$ are both divisors of $u$. Since $\sigma(u)$ is the sum of all the divisors of $u$, it follows that $u$ has only two divisors, so that $u$ is prime and $u/(2^n - 1) = 1$. The theorem is proved. □

Exercise 1. Verify that $\sigma(m) = \sigma(n) = m + n$ has the following three solutions:

[...] $m = 9363584$, $n = 9437056$.

Exercise 2. Prove that if a positive integer is the product of its proper divisors, then it must be a cube of a prime or a product of two distinct primes.
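Theorem 9.3 and the surviving pair from Exercise 1 can both be checked directly. The sketch below builds $2^{n-1}(2^n - 1)$ for each small $n$ with $2^n - 1$ prime, confirms that each value is perfect, and verifies $\sigma(m) = \sigma(n) = m + n$ for the pair quoted above. The helper names and the exponent bound are assumptions of this illustration.

```python
def sigma(n: int) -> int:
    """Sum of all positive divisors of n."""
    total = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
        d += 1
    return total


def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for the small inputs used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def even_perfect_numbers(max_exponent: int) -> list[int]:
    """2^(n-1) * (2^n - 1) for every n <= max_exponent with 2^n - 1 prime."""
    return [
        2 ** (n - 1) * (2 ** n - 1)
        for n in range(2, max_exponent + 1)
        if is_prime(2 ** n - 1)
    ]


candidates = even_perfect_numbers(13)
print(candidates)                                   # [6, 28, 496, 8128, 33550336]
print(all(sigma(v) == 2 * v for v in candidates))   # True

m, n = 9363584, 9437056
print(sigma(m) == sigma(n) == m + n)                # True
```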

1.10 Mersenne Numbers and Fermat Numbers

Whether there exists an odd perfect number is a famous difficult problem. From the previous section we see that the determination of even perfect numbers is reduced to the determination of Mersenne primes, that is prime numbers of the form $2^n - 1$, since there is a one-to-one correspondence between Mersenne primes and even perfect numbers. Whether there exist infinitely many Mersenne primes is another difficult unsolved problem in number theory.

Theorem 10.1. If $n > 1$ and $a^n - 1$ is prime, then $a = 2$ and $n$ is prime.


Proof. If $a > 2$, then $(a - 1) \mid (a^n - 1)$ so that $a^n - 1$ cannot be prime. Again, if $a = 2$ and $n = kl$, where $k$ is a proper divisor of $n$, then $(2^k - 1) \mid (2^n - 1)$ so that $2^n - 1$ cannot be prime. □

The problem of the primality of $2^n - 1$ is thus reduced to that of $2^p - 1$ where $p$ is prime. We usually write

$$M_p = 2^p - 1$$

for a Mersenne prime. Up to the present (1981) $M_p$ has been proved prime for

$$p = 2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607, 1279, 2203, 2281, 3217, 4253, 4423, 9689, 9941, 11213, 19937, 21701, 23209, 44497$$

so that there are 27 perfect numbers known to us.

Similarly to the Mersenne numbers, there are the so-called Fermat numbers.

Theorem 10.2. If $2^m + 1$ is prime, then $m = 2^n$.

Proof. If $m = qr$, where $q > 1$ is an odd divisor of $m$, then we have

$$2^{qr} + 1 = (2^r)^q + 1 = (2^r + 1)(2^{r(q-1)} - \cdots + 1)$$

and $1 < 2^r + 1 < 2^{qr} + 1$, so that $2^m + 1$ cannot be prime. □
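Theorems 10.1 and 10.2 can be illustrated for small exponents by plain trial division. The sketch below lists the exponents $n \le 31$ with $2^n - 1$ prime and the exponents $m \le 20$ with $2^m + 1$ prime. It is only a small-range check and uses trial division rather than any specialised Mersenne test; the ranges are arbitrary choices for this illustration.

```python
def is_prime(n: int) -> bool:
    """Trial division over odd candidates; fine for numbers up to about 2^31."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


# Exponents n with 2^n - 1 prime: by Theorem 10.1 every one of them is prime.
mersenne_exponents = [n for n in range(2, 32) if is_prime(2 ** n - 1)]
print(mersenne_exponents)                        # [2, 3, 5, 7, 13, 17, 19, 31]
print(all(is_prime(n) for n in mersenne_exponents))  # True

# Exponents m with 2^m + 1 prime: by Theorem 10.2 every one is a power of 2.
fermat_exponents = [m for m in range(1, 21) if is_prime(2 ** m + 1)]
print(fermat_exponents)                          # [1, 2, 4, 8, 16]
```

Note that a prime exponent is necessary but not sufficient: $2^{11} - 1 = 23 \times 89$ is composite.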

Let

$$F_n = 2^{2^n} + 1.$$

We call $F_n$ a Fermat number, and the first five Fermat numbers

$$F_0 = 3, \quad F_1 = 5, \quad F_2 = 17, \quad F_3 = 257, \quad F_4 = 65537$$

are all primes. On this evidence Fermat conjectured that $F_n$ is prime for all $n$. However, in 1732, Euler showed that

$$F_5 = 2^{2^5} + 1 = 641 \times 6700417$$

so that Fermat's conjecture is false.

Note: The divisibility of $F_5$ by 641 can be proved as follows: Let $a = 2^7$, $b = 5$ so that $a - b^3 = 3$, $1 + ab - b^4 = 1 + 3b = 2^4$. Therefore

$$2^{2^5} + 1 = (2a)^4 + 1 = (1 + ab - b^4)a^4 + 1 = (1 + ab)a^4 + (1 - a^4 b^4),$$

and this must be divisible by $1 + ab = 2^4 + 5^4 = 641$.
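Each step of the Note is an integer identity, so the argument can be replayed with exact arithmetic. The short sketch below does so; the function name `fermat_number` is introduced here for illustration.

```python
def fermat_number(n: int) -> int:
    """F_n = 2^(2^n) + 1."""
    return 2 ** (2 ** n) + 1


print([fermat_number(n) for n in range(5)])  # [3, 5, 17, 257, 65537]

a, b = 2 ** 7, 5
assert a - b ** 3 == 3
assert 1 + a * b - b ** 4 == 1 + 3 * b == 2 ** 4
assert 1 + a * b == 2 ** 4 + 5 ** 4 == 641

f5 = fermat_number(5)
# F_5 = (1 + ab) a^4 + (1 - a^4 b^4), and 1 + ab divides 1 - (ab)^4.
assert f5 == (1 + a * b) * a ** 4 + (1 - a ** 4 * b ** 4)
assert (1 - a ** 4 * b ** 4) % (1 + a * b) == 0
print(f5, f5 == 641 * 6700417)  # 4294967297 True
```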


It has been found that many Fermat numbers $F_n$ are composite, but no Fermat prime has been found apart from the first five numbers. Therefore Fermat's conjecture has been a most unfortunate one, and indeed it is now conjectured that there are only finitely many Fermat primes. There is an interesting geometry problem associated with $F_n$: Gauss proved that if $F_n$ is prime, then a regular polygon with $F_n$ sides can be constructed using only straight edge and compass.

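1.11 The Prime Power in a Factorial

Theorem 11.1. Let $p$ be a prime number. Then the (exact) power of $p$ that divides $n!$ is given by

$$\left[\frac{n}{p}\right] + \left[\frac{n}{p^2}\right] + \left[\frac{n}{p^3}\right] + \cdots.$$

(There are only finitely many non-zero terms in this series.)

Proof. From

$$n! = 1 \cdot 2 \cdots (p - 1) \cdot p \cdot (p + 1) \cdots (2p) \cdots (p - 1)p \cdots p^2 \cdots$$

we see that there are $\left[\frac{n}{p}\right]$ multiples of $p$, $\left[\frac{n}{p^2}\right]$ multiples of $p^2$, and so on. The theorem follows. □

Theorem 11.2. The number

$$\binom{n}{r} = \frac{n!}{r!\,(n - r)!}$$

is an integer.

Proof. We use the fact that $[\alpha] - [\beta]$ is either $[\alpha - \beta]$ or $[\alpha - \beta] + 1$. From Theorem 11.1 we see that the power of $p$ in $\binom{n}{r}$ is

$$\sum_{m} \left( \left[\frac{n}{p^m}\right] - \left[\frac{r}{p^m}\right] - \left[\frac{n - r}{p^m}\right] \right),$$

a non-negative integer. □

Example. If $n = 1000$, $p = 3$, then

$$\left[\frac{1000}{3}\right] = 333, \quad \left[\frac{1000}{3^2}\right] = \left[\frac{333}{3}\right] = 111, \quad \left[\frac{1000}{3^3}\right] = 37, \quad \left[\frac{1000}{3^4}\right] = 12, \quad \left[\frac{1000}{3^5}\right] = 4, \quad \left[\frac{1000}{3^6}\right] = 1.$$

Therefore the exact power of 3 which divides $1000!$ is

$$333 + 111 + 37 + 12 + 4 + 1 = 498.$$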
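Theorem 11.1 translates directly into a short loop over the powers of $p$, and the expression in the proof of Theorem 11.2 follows from it. A minimal Python sketch, with function names chosen for this illustration:

```python
from math import comb, factorial


def power_in_factorial(n: int, p: int) -> int:
    """Exponent of the prime p in n!, i.e. [n/p] + [n/p^2] + [n/p^3] + ..."""
    total = 0
    q = p
    while q <= n:
        total += n // q
        q *= p
    return total


def power_in_binomial(n: int, r: int, p: int) -> int:
    """Exponent of the prime p in C(n, r), as in the proof of Theorem 11.2."""
    return (
        power_in_factorial(n, p)
        - power_in_factorial(r, p)
        - power_in_factorial(n - r, p)
    )


def valuation(value: int, p: int) -> int:
    """Exponent of p in value, found by repeated division (used only as a cross-check)."""
    count = 0
    while value % p == 0:
        value //= p
        count += 1
    return count


print(power_in_factorial(1000, 3))                          # 498
print(valuation(factorial(1000), 3))                        # 498
print(power_in_binomial(10, 4, 2), valuation(comb(10, 4), 2))  # 1 1
```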

Exercise 1. Determine the exact power of 7 which divides $10000!$.

Exercise 2. Determine the exact power of 5 which divides $\binom{1000}{500}$.

Exercise 3. Prove that if $r + s + \cdots + t = n$, then

$$\frac{n!}{r!\, s! \cdots t!}$$

is an integer. Prove further that if $n$ is prime and $\max(r, s, \ldots, t) < n$, then the above number is a multiple of $n$.

1.12 Integral Valued Polynomials

Definition. By an integral valued polynomial we mean a polynomial f(x) in the variable x which takes only integer values whenever x is an integer.

Example. Polynomials with integer coefficients are integral valued polynomials. The polynomial

$$\binom{x}{r} = \frac{x(x-1)\cdots(x-r+1)}{r!}$$

is an integral valued polynomial.

We shall write $\Delta f(x)$ for $f(x+1) - f(x)$.

Theorem 12.1. $\Delta\binom{x}{r} = \binom{x}{r-1}$.

Proof.

$$\Delta\binom{x}{r} = \frac{(x+1)x\cdots(x-r+2)}{r!} - \frac{x(x-1)\cdots(x-r+1)}{r!} = \frac{x\cdots(x-r+2)}{r!}\bigl((x+1) - (x-r+1)\bigr) = \binom{x}{r-1}. \qquad □$$

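Theorem 12.2. Every integral valued polynomial of degree k can be written as

$$a_k\binom{x}{k} + a_{k-1}\binom{x}{k-1} + \cdots + a_1\binom{x}{1} + a_0,$$

where $a_k, \dots, a_0$ are integers. Moreover, given any set of integers $a_k, \dots, a_0$, the above is an integral valued polynomial.

Proof. Any polynomial f(x) of degree k can be written as

$$f(x) = \alpha_k\binom{x}{k} + \alpha_{k-1}\binom{x}{k-1} + \cdots + \alpha_1\binom{x}{1} + \alpha_0.$$

Now

$$\Delta f(x) = \alpha_k\binom{x}{k-1} + \alpha_{k-1}\binom{x}{k-2} + \cdots + \alpha_1.$$

Writing $\Delta^2 f(x)$ for $\Delta(\Delta f(x))$, and $\Delta^r f(x) = \Delta(\Delta^{r-1} f(x))$, we see that

$$f(0) = \alpha_0, \quad (\Delta f(x))_{x=0} = \alpha_1, \quad \dots, \quad (\Delta^r f(x))_{x=0} = \alpha_r, \quad \dots.$$

If f(x) is integral valued, then so are $\Delta f(x), \Delta^2 f(x), \dots$. Therefore $f(0), (\Delta f(x))_{x=0}, \dots, (\Delta^r f(x))_{x=0}, \dots$ are all integers; that is, $\alpha_k, \dots, \alpha_0$ are integers. The last part of the theorem is trivial. □

The same method can be used to prove: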
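The proof of Theorem 12.2 is constructive: the coefficients are the iterated differences of f at 0. A minimal sketch follows, assuming f is supplied as a Python callable of degree at most k that returns exact rationals; the helper names are illustrative.

```python
from fractions import Fraction
from math import comb


def binomial_basis_coefficients(f, k):
    """Return [alpha_0, ..., alpha_k] with f(x) = sum of alpha_r * C(x, r),
    where alpha_r = (Delta^r f)(0), for a polynomial f of degree at most k."""
    values = [Fraction(f(x)) for x in range(k + 1)]
    coeffs = []
    for _ in range(k + 1):
        coeffs.append(values[0])                       # (Delta^r f)(0)
        values = [values[i + 1] - values[i]            # apply Delta once more
                  for i in range(len(values) - 1)]
    return coeffs


def is_integral_valued(f, k):
    """Theorem 12.2: f is integral valued exactly when every alpha_r is an integer."""
    return all(c.denominator == 1 for c in binomial_basis_coefficients(f, k))


def evaluate(coeffs, x):
    """Evaluate sum of alpha_r * C(x, r) at a non-negative integer x."""
    return sum(c * comb(x, r) for r, c in enumerate(coeffs))


f = lambda x: Fraction(x * (x + 1) * (2 * x + 1), 6)   # 1^2 + 2^2 + ... + x^2
g = lambda x: Fraction(x * x, 2)                        # not integral valued

print(binomial_basis_coefficients(f, 3), is_integral_valued(f, 3), is_integral_valued(g, 2))
```

Only the k + 1 values f(0), ..., f(k) are used, which is also the content of Exercise 4 below.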

Theorem 12.3. Let f(x) be an integral valued polynomial. A necessary and sufficient condition for f(x) to be a multiple of m for every integer x is that

$$m \mid (a_k, \dots, a_0),$$

where $a_k, \dots, a_0$ are the integers given in Theorem 12.2. □

Theorem 12.4 (Fermat). Let p be a prime number. Then, for any integer x, $x^p - x$ is a multiple of p.

Proof. If p = 2, then the result follows at once from $x^2 - x = x(x - 1)$. Assume therefore that p > 2, and let $f(x) = x^p - x$. Now f(0) = 0 and

$$\Delta f(x) = (x+1)^p - x^p - (x+1) + x = p\left(\frac{1}{p}\binom{p}{1}x^{p-1} + \cdots + \frac{1}{p}\binom{p}{p-1}x\right),$$

where the coefficients $\frac{1}{p}\binom{p}{j}$ (by Exercise 11.3) are all integers. With x = 0, we see that f(1) is a multiple of p; with x = 1, we see that f(2) is a multiple of p; and so on. Therefore f(x) is always a multiple of p if $x \ge 0$. If x is a negative integer, we can deduce the result from

$$x^p - x = -\bigl[(-x)^p - (-x)\bigr].$$

The theorem is proved. □


Exercise 1. Generalize Theorems 12.2 and 12.3 to several variables.

Exercise 2. Prove that n(n + 1)(2n + 1) is a multiple of 6.

Exercise 3. Prove that, as m and n run through the set of all positive integers,

$$m + \tfrac{1}{2}(m + n - 1)(m + n - 2)$$

also runs through the whole set of positive integers, and with no repetition.

Exercise 4. Prove that if a polynomial of degree k takes integer values for k + 1 successive integers, then it must be an integral valued polynomial.

Exercise 5. If f(-x) = -f(x), then we call f(x) an odd polynomial. Prove that an odd integral valued polynomial can be written as

ao

X(X+l) + allx (x) 1 + a 2 "2 3 + ... + am mx(x+m-l) 2m - 1 '

where at. ... ,am are integers.

1.13 The Factorization of Polynomials

Theorem 13.1. Let g(x) and h(x) be two polynomials with integer coefficients:

$$g(x) = a_l x^l + \cdots + a_0, \quad a_l \ne 0,$$
$$h(x) = b_m x^m + \cdots + b_0, \quad b_m \ne 0,$$

and

$$g(x)h(x) = c_{l+m}x^{l+m} + \cdots + c_0.$$

Then

$$(c_{l+m}, \dots, c_0) = (a_l, \dots, a_0)(b_m, \dots, b_0).$$

Proof. We may assume without loss that $(a_l, \dots, a_0) = 1$, $(b_m, \dots, b_0) = 1$. Suppose that $p \mid (c_{l+m}, \dots, c_0)$ and

$$p \mid (a_l, \dots, a_{u+1}), \quad p \nmid a_u; \qquad p \mid (b_m, \dots, b_{v+1}), \quad p \nmid b_v.$$

From the definition we have

$$c_{u+v} = \sum_{s+t=u+v} a_s b_t,$$

and apart from the term $a_u b_v$, each term is a multiple of p. Since $p \nmid a_u b_v$, it follows that $p \nmid c_{u+v}$, and so $p \nmid (c_{l+m}, \dots, c_0)$, contradicting our assumption. Therefore no prime can divide $(c_{l+m}, \dots, c_0)$. □


Definition. Let f(x) be a polynomial with rational coefficients. Suppose that there are two non-constant polynomials g(x) and h(x) with rational coefficients such that f(x) = g(x)h(x). Then f(x) is said to be reducible. Irreducible means not reducible.

Example. $x^2 - 2$ and $x^2 + 1$ are irreducible polynomials, whereas $3x^2 + 8x + 4$ is reducible and the factorization is (3x + 2)(x + 2).

Theorem 13.2 (Gauss). Let f(x) be a polynomial with integer coefficients. If f(x) = g(x)h(x) where g(x) and h(x) are polynomials with rational coefficients, then there exists a rational number $\gamma$ such that

$$\gamma g(x), \qquad \frac{1}{\gamma}h(x)$$

have integer coefficients.

Proof. We may assume that the greatest common factor of the coefficients of f(x) is 1. There are integers M, N such that

$$Mg(x) = a_l x^l + \cdots + a_0, \quad a_i \text{ integer};$$
$$Nh(x) = b_m x^m + \cdots + b_0, \quad b_i \text{ integer};$$
$$MNf(x) = c_{l+m}x^{l+m} + \cdots + c_0.$$

From our assumption and Theorem 13.1 we have

$$MN = (c_{l+m}, \dots, c_0) = (a_l, \dots, a_0)(b_m, \dots, b_0).$$

Let

$$\gamma = \frac{M}{(a_l, \dots, a_0)} = \frac{(b_m, \dots, b_0)}{N}$$

and the required result follows. □

Theorem 13.3 (Eisenstein). Let $f(x) = c_n x^n + \cdots + c_0$ be a polynomial with integer coefficients. If $p \nmid c_n$, $p \mid c_i$ $(0 \le i < n)$ and $p^2 \nmid c_0$, then f(x) is irreducible.

Proof. Suppose, if possible, that f(x) is reducible. By Theorem 13.2 we have that

$$f(x) = g(x)h(x),$$
$$g(x) = a_l x^l + \cdots + a_0, \quad h(x) = b_m x^m + \cdots + b_0, \quad l + m = n, \quad l > 0, \quad m > 0,$$

where $a_j$ and $b_k$ are integers. From $c_0 = a_0 b_0$ and $p \mid c_0$ we see that either $p \mid a_0$ or $p \mid b_0$. Suppose that $p \mid a_0$. Then, from $p^2 \nmid a_0 b_0 = c_0$ we deduce that $p \nmid b_0$. Next, the coefficients of g(x) cannot all be multiples of p, since otherwise $p \mid c_n$. We can therefore suppose that $p \mid (a_0, \dots, a_{r-1})$, $p \nmid a_r$, $1 \le r \le l$. From

$$c_r = a_r b_0 + \cdots + a_0 b_r$$

we deduce that $p \nmid c_r$. But $r \le l < n$ and so we have a contradiction. The theorem is proved. □

As a corollary we have:
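The hypotheses of Theorem 13.3 are purely arithmetical conditions on the coefficients, so they can be tested mechanically. The sketch below is illustrative (the function name and the coefficient ordering, lowest degree first, are choices made here). A result of `False` only means the criterion does not apply for that prime; it says nothing about reducibility.

```python
from math import comb


def eisenstein_applies(coeffs, p):
    """coeffs = [c_0, c_1, ..., c_n] with n >= 1.
    True when p does not divide c_n, p divides c_i for i < n, and p^2 does not divide c_0."""
    *lower, leading = coeffs
    return (leading % p != 0
            and all(c % p == 0 for c in lower)
            and coeffs[0] % (p * p) != 0)


print(eisenstein_applies([-2, 0, 1], 2))   # x^2 - 2 with p = 2
print(eisenstein_applies([4, 8, 3], 2))    # 3x^2 + 8x + 4 = (3x + 2)(x + 2)

# ((y + 1)^p - 1) / y has coefficients C(p, j + 1), j = 0, ..., p - 1.
p = 5
shifted = [comb(p, j + 1) for j in range(p)]
print(shifted, eisenstein_applies(shifted, p))
```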

Theorem 13.4. $x^m - p$ is irreducible, so that $\sqrt[m]{p}$ is an irrational number. □

Theorem 13.5. The polynomial

$$\frac{x^p - 1}{x - 1} = x^{p-1} + \cdots + x + 1$$

is irreducible.

Proof. Write x = y + 1 so that we have

$$\frac{1}{y}\bigl((y+1)^p - 1\bigr) = y^{p-1} + py^{p-2} + \binom{p}{2}y^{p-3} + \cdots + p.$$

It is easy to see that each coefficient, apart from the first, is a multiple of p, and that the constant term is not a multiple of $p^2$. The result therefore follows from Theorem 13.3. □

Exercise. Prove that the following polynomials are irreducible:

Notes

1.1. Up to the present there are 27 known Mersenne primes, namely $M_p = 2^p - 1$ where

p = 2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607, 1279, 2203, 2281, 3217, 4253, 4423, 9689, 9941, 11213, 19937, 21701, 23209, 44497.

The twelfth Mersenne prime, namely $M_{127}$, was found by Lucas in 1876 and the remaining fifteen have been found since 1952 with the aid of electronic computers. Thus $M_{44497}$, which has 13395 digits and was discovered in 1979, is the largest known prime (see [54]).

1.2. It is known that any odd perfect number must (i) exceed $10^{50}$ (see [26]), (ii) have a prime factor exceeding 100110 (see [27]).

Chapter 2. Congruences

2.1 Definition

Let m be a natural number. If a - b is a multiple of m, then we say that a and b are congruent mod m, and we write $a \equiv b \pmod m$. If a, b are not congruent mod m, then we write $a \not\equiv b \pmod m$.

Example. $31 \equiv -9 \pmod{10}$. If a, b are integers, then we always have $a \equiv b \pmod 1$.

The notion of congruence occurs frequently and even in our daily lives; for example we may consider the days of the week as a congruence problem with modulus 7. Again in the ancient calendar in our country we count the years with respect to the modulus 60. Indeed our country made some significant contribution to the theory of congruence. For example, the Chinese remainder theorem originates from ancient publications concerning solutions to problems such as the following: There is a certain number. When divided by three this number has remainder two; when divided by five, it has remainder three; when divided by seven, it has remainder two. What is the number? With our notation here, the number concerned is an integer x such that $x \equiv 2 \pmod 3$, $x \equiv 3 \pmod 5$, $x \equiv 2 \pmod 7$. The problem is therefore a problem of the solutions to simultaneous congruences.

2.2 Fundamental Properties of Congruences

Theorem 2.1. (i) $a \equiv a \pmod m$ (reflexive); (ii) if $a \equiv b \pmod m$, then $b \equiv a \pmod m$ (symmetric); (iii) if $a \equiv b$, $b \equiv c \pmod m$, then $a \equiv c \pmod m$ (transitive). □

These three properties show that being congruent is an equivalence relation. The set of integers can then be partitioned into equivalence classes so that integers in each class are congruent among themselves, and two integers from different classes are not congruent. We call these equivalence classes residue classes. It is clear that, for the modulus m, we have precisely m residue classes: the classes whose members have remainder r = 0, 1, 2, ..., m - 1 when divided by m. If we select one member from each residue class, then the set of numbers formed is called a complete residue system.


Theorem 2.2. If $a \equiv b$, $a_1 \equiv b_1 \pmod m$, then we have $a + a_1 \equiv b + b_1$, $a - a_1 \equiv b - b_1$, $aa_1 \equiv bb_1 \pmod m$. □

Theorem 2.2 has the following interpretation: Let A, B be any two residue classes from which we select any representatives a, b. Denote by C the residue class which contains a + b (or a - b or ab). Then C depends on A, B but not on the representatives a, b. In other words, the sum of any two integers from A, B must belong to C. We can therefore define C to be the sum of the two classes A, B and we denote it by C = A + B. Similarly we can define A - B and A · B. We see from Theorem 2.2 that, with respect to residue classes mod m, the operations of addition, subtraction and multiplication are closed. We note that division is not always possible; for example $3 \cdot 2 \equiv 1 \cdot 2$, $2 \equiv 2 \pmod 4$, but $3 \not\equiv 1 \pmod 4$. However we do have the following:

Theorem 2.3. If $ac \equiv bd$, $c \equiv d \pmod m$ and (c, m) = 1, then $a \equiv b \pmod m$.

Proof. From $(a - b)c + b(c - d) = ac - bd \equiv 0 \pmod m$, we have $m \mid (a - b)c$. But (c, m) = 1, so that $m \mid a - b$. □

We denote by 0 the residue class of all multiples of m. Then A + 0 = A and A · 0 = 0. Again, if we let I be the residue class of integers with remainder 1 when divided by m, then A · I = A. From our example and Theorem 2.3 we see that from A · B = A · C we may not deduce that B = C; but if the members of A are coprime with m (Note: if A has one member which is coprime with m, then every member must also be coprime with m), then we have B = C. If we take m to be a prime number, then apart from the class 0, every class is coprime with m. Therefore, for a prime modulus, the operations of addition, subtraction, multiplication and division are closed, except that we cannot divide by the class 0.

2.3 Reduced Residue System

As we said earlier, if a residue class A contains an element which is coprime with m, then every element of A is coprime with m, and we call A a class coprime with m. If A and m are coprime, then we can, by Theorem 2.3, define B/A. In particular, we write $A^{-1}$ for I/A. For example:

A         0  1  2  3  4
A^{-1}    ×  1  3  2  4          (mod 5)

A         0  1  2  3  4  5
A^{-1}    ×  1  ×  ×  ×  5       (mod 6)

A         0  1  2  3  4  5  6
A^{-1}    ×  1  4  5  2  3  6    (mod 7)


The sign "×" in the table means "undefined".

Definition. We denote by $\varphi(m)$ the number of residue classes (mod m) coprime with m. This function $\varphi(m)$ is called Euler's function. If we select one member of each residue class coprime with m:

$$a_1, a_2, \dots, a_{\varphi(m)},$$

then we call this set of integers a reduced residue system.

Example. $\varphi(1) = 1$, $\varphi(2) = 1$, $\varphi(3) = 2$, $\varphi(4) = 2$.

We may also describe $\varphi(m)$ as the number of positive integers not exceeding m and coprime with m. If m = p is a prime, then $\varphi(p) = p - 1$.

Theorem 3.1. Let $a_1, a_2, \dots, a_{\varphi(m)}$ be a reduced residue system, and suppose that (k, m) = 1. Then $ka_1, ka_2, \dots, ka_{\varphi(m)}$ is also a reduced residue system.

Proof. Clearly we have $(ka_i, m) = 1$, so that each $ka_i$ represents a residue class coprime with m. If $ka_i \equiv ka_j \pmod m$, then, since (k, m) = 1, we have $a_i \equiv a_j \pmod m$. Therefore the members $ka_i$ represent distinct residue classes. The theorem is proved. □

Theorem 3.2 (Euler). If (k, m) = 1, then $k^{\varphi(m)} \equiv 1 \pmod m$.

Proof. From Theorem 3.1 we have

$$\prod_{\nu=1}^{\varphi(m)} (ka_\nu) \equiv \prod_{\nu=1}^{\varphi(m)} a_\nu \pmod m.$$

Since $(m, a_\nu) = 1$, it follows that $k^{\varphi(m)} \equiv 1 \pmod m$. □

Taking m = p we have Fermat's theorem (Theorem 1.12.4).
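Theorems 3.1 and 3.2 can be checked numerically for small moduli. The following sketch computes a reduced residue system by brute force (adequate only for small m; the function names are illustrative) and tests both statements.

```python
from math import gcd


def reduced_residue_system(m):
    """The positive integers not exceeding m and coprime with m."""
    return [a for a in range(1, m + 1) if gcd(a, m) == 1]


def phi(m):
    """Euler's function, counted directly from the definition."""
    return len(reduced_residue_system(m))


m, k = 20, 3
system = reduced_residue_system(m)
# Theorem 3.1: multiplying by k permutes the reduced residue classes.
print(sorted(k * a % m for a in system) == system)
# Theorem 3.2: k^phi(m) is congruent to 1 mod m.
print(phi(m), pow(k, phi(m), m))
```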

Theorem 3.3. Let p be a prime. Then, for all integers a, we have $a^p \equiv a \pmod p$. □

2.4 The Divisibility of $2^{p-1} - 1$ by $p^2$

In 1828 Abel asked whether there are primes p and integers a such that $a^{p-1} \equiv 1 \pmod{p^2}$. According to Jacobi, if $p \le 37$, then the above has the solutions p = 11, a = 3 or 9; p = 29, a = 14; and p = 37, a = 18. Recent research work on Fermat's last theorem has added some impetus to this problem. We have the following result concerning Fermat's last theorem: Let p be an odd prime. If there are integers x, y, z such that $x^p + y^p + z^p = 0$, $p \nmid xyz$, then

$$2^{p-1} \equiv 1 \pmod{p^2} \qquad (1)$$

and

$$3^{p-1} \equiv 1 \pmod{p^2}, \qquad (2)$$

and more recently we know also that $n^{p-1} \equiv 1 \pmod{p^2}$ for n = 2, 3, ..., 47. We do not know if there exists a prime p such that both (1) and (2) hold.

Definition. If $a^{p-1} \equiv 1 \pmod{p^2}$, then we call a a Fermat solution.

It is clear that the product of two Fermat solutions is a Fermat solution, and that the product of a Fermat solution and a non-Fermat solution is a non-Fermat solution. In the prime factorization of a non-Fermat solution there must be a prime divisor which is a non-Fermat solution.

Theorem 4.1. Let a, b be two Fermat solutions with respect to p. Then there does not exist q such that $qp = a \pm b$, $p \nmid q$.

Proof. From the definition we have

$$a^p \equiv a, \quad b^p \equiv b \pmod{p^2}. \qquad (3)$$

If $qp = a \pm b$, $p \nmid q$, then $a^p = (\mp b + qp)^p \equiv \mp b^p \pmod{p^2}$, giving $a^p \pm b^p \equiv 0 \pmod{p^2}$. Substituting this into (3) yields $a \pm b = qp \equiv 0 \pmod{p^2}$, which is a contradiction. □

Theorem 4.2. 3 is a Fermat solution with respect to 11.

Proof. We have $3^5 = 243 \equiv 1 \pmod{11^2}$ so that $3^{10} \equiv 1 \pmod{11^2}$. □

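Theorem 4.3. 2 is a Fermat solution with respect to 1093.

Proof. Let p = 1093. Then $3^7 = 2187 = 2p + 1$, so that

$$3^{14} \equiv 4p + 1 \pmod{p^2}; \qquad (4)$$

also

$$2^{14} = 16384 = 15p - 11, \qquad 2^{28} \equiv -330p + 121 \pmod{p^2},$$

so that

$$3^2 \cdot 2^{28} \equiv -2970p + 1089 \equiv -2969p - 4 \equiv 310p - 4 \pmod{p^2},$$
$$3^2 \cdot 2^{28} \cdot 7 \equiv 2170p - 28 \equiv -16p - 28 \pmod{p^2}.$$

Therefore

$$3^2 \cdot 2^{26} \cdot 7 \equiv -4p - 7 \pmod{p^2}.$$

From the binomial theorem we have

$$3^{14} \cdot 2^{182} \cdot 7^7 \equiv -(4p + 7)^7 \equiv -7^7 - 7 \cdot 7^6 \cdot 4p = -7^7(1 + 4p) \pmod{p^2},$$

and hence

$$3^{14} \cdot 2^{182} \equiv -4p - 1 \pmod{p^2}. \qquad (5)$$

From (4) and (5) we have

$$2^{182} \equiv -1 \pmod{p^2}.$$

Therefore

$$2^{1092} \equiv 1 \pmod{p^2}. \qquad □$$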

Theorem 4.4. 3 is a non-Fermat solution with respect to 1093.

Proof. If 3 were a Fermat solution, then so would $3^7$ be one. Since -1 is clearly a Fermat solution, and $3^7 - 1 = 2p$, we obtain the required contradiction from Theorem 4.1. □
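With fast modular exponentiation, Theorems 4.2 to 4.4 can be confirmed directly. This is only a numerical check, not a substitute for the proofs; the helper name below is illustrative.

```python
def is_fermat_solution(a: int, p: int) -> bool:
    """True when a^(p-1) is congruent to 1 modulo p^2."""
    return pow(a, p - 1, p * p) == 1


p = 1093
print(is_fermat_solution(3, 11))    # Theorem 4.2
print(is_fermat_solution(2, p))     # Theorem 4.3
print(is_fermat_solution(3, p))     # Theorem 4.4 asserts this is False

# 2^182 is congruent to -1 mod p^2; its sixth power gives Theorem 4.3.
print(pow(2, 182, p * p) == p * p - 1)
```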

Theorem 4.5. There exists no prime p < 100 which satisfies (1) and (2) simultaneously.

Proof. Suppose that 2 and 3 are both Fermat solutions. Then $2^l$, $3^m$ and $2^l 3^m$ are all Fermat solutions, and of course 1 is also a Fermat solution. The theorem now follows from Theorem 4.1 and the following calculations:

$2 = 3 - 1$, $3 = 2 + 1$, $5 = 2 + 3$, $7 = 2^2 + 3$, $11 = 2 + 3^2$,
$13 = 2^2 + 3^2$, $17 = 2^3 + 3^2$, $19 = 2^4 + 3$, $23 = -2^2 + 3^3$, $29 = 2 + 3^3$,
$31 = 2^2 + 3^3$, $37 = 2^6 - 3^3$, $41 = 2^5 + 3^2$, $43 = 2^4 + 3^3$, $47 = 2^4 \cdot 3 - 1$,
$53 = 2 \cdot 3^3 - 1$, $59 = 2^5 + 3^3$, $61 = 2^6 - 3$, $67 = 2^6 + 3$, $71 = 2^3 \cdot 3^2 - 1$,
$73 = 2^6 + 3^2$, $79 = -2 + 3^4$, $83 = 2 + 3^4$, $89 = 2^3 + 3^4$, $97 = 2^4 + 3^4$. □

Recently Lehmer has proved that if $p \le 253{,}747{,}889$, then there must exist $m \le 47$ such that $m^{p-1} \not\equiv 1 \pmod{p^2}$. This makes some contribution towards Fermat's last theorem.

2.5 The Function $\varphi(m)$

Theorem 5.1. Let (m, m') = 1, and let x run over a complete residue system mod m, and x' run over a complete residue system mod m'. Then mx' + m'x runs over a complete residue system mod mm'.

Proof. Consider the mm' numbers mx' + m'x. If

$$mx' + m'x \equiv my' + m'y \pmod{mm'},$$

then

$$mx' \equiv my' \pmod{m'}, \qquad m'x \equiv m'y \pmod m.$$

From (m, m') = 1 we have $x' \equiv y' \pmod{m'}$, $x \equiv y \pmod m$. The theorem is proved. □

Theorem 5.2. Let (m, m') = 1, and let x run over a reduced residue system mod m, and x' run over a reduced residue system mod m'. Then mx' + m'x runs over a reduced residue system mod mm'.

Proof. 1) We first prove that mx' + m'x is coprime with mm'. Suppose otherwise. Then there exists a prime p such that $p \mid (mm', mx' + m'x)$. If $p \mid m$, then $p \mid m'x$. Since (m, m') = 1, it follows that $p \nmid m'$ and so $p \mid x$. Thus $p \mid (m, x)$, which is impossible. The case $p \mid m'$ is similar.

2) We next prove that every integer a coprime with mm' must be congruent mod mm' to an integer of the form mx' + m'x, (x, m) = (x', m') = 1. By Theorem 5.1 there are integers x, x' such that $a \equiv mx' + m'x \pmod{mm'}$. We now prove that (x, m) = (x', m') = 1. If $(x, m) = d \ne 1$, then $(a, m) = (mx' + m'x, m) = (m'x, m) = (x, m) = d \ne 1$, which contradicts the hypothesis. Similarly we must have (x', m') = 1.

3) We have already proved in Theorem 5.1 that the numbers mx' + m'x are incongruent. Therefore the theorem is proved. □

We have in fact proved that 1, then dmust divide b, or else there is no solution. We then have

=

=

=

=

(2)


We have already proved that (2) has a unique solution $x_1$ satisfying $0 \le x_1 < m/d$, and that $x = x_1 + (m/d)t$ are all solutions to (2). Therefore

$$x_1, \quad x_1 + \frac{m}{d}, \quad \dots, \quad x_1 + (d-1)\frac{m}{d}$$

are all the incongruent (mod m) solutions to (1). We have therefore proved the following:

Theorem 6.1. If $(a, m) \mid b$, then there are (a, m) incongruent (mod m) solutions to (1). Otherwise (1) has no solution. □

Theorem 6.2. A necessary and sufficient condition for the congruence $a_1x_1 + \cdots + a_nx_n + b \equiv 0 \pmod m$ to have a solution $(x_1, \dots, x_n)$ is that $(a_1, \dots, a_n, m) \mid b$. If this condition is satisfied, then the number of incongruent (mod m) solutions is $m^{n-1}(a_1, \dots, a_n, m)$.

Proof. The case n = 1 is settled by Theorem 6.1. We now proceed by induction. Let $(a_1, \dots, a_n, m) = d$ and $(a_1, \dots, a_{n-1}, m) = d_1$, so that $(d_1, a_n) = d$. From Theorem 6.1 we know that there are $d \cdot (m/d_1)$ solutions to

$$a_nx_n + b \equiv 0 \pmod{d_1}, \qquad 0 \le x_n < m.$$

Corresponding to a solution $x_n$ we set

$$\frac{a_nx_n + b}{d_1} = b_1.$$

From the induction hypothesis, the number of solutions to the congruence $a_1x_1 + \cdots + a_{n-1}x_{n-1} + b_1d_1 \equiv 0 \pmod m$ is $m^{n-2}(a_1, \dots, a_{n-1}, m) = m^{n-2}d_1$. Therefore the total number of solutions is given by

$$\frac{md}{d_1} \cdot m^{n-2}d_1 = m^{n-1}d$$

as required. □
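The argument leading to Theorem 6.1 is effectively an algorithm. The sketch below assumes that congruence (1) has the form $ax + b \equiv 0 \pmod m$, in line with the case n = 1 of Theorem 6.2; the function name is illustrative. It relies on Python's three-argument `pow` with exponent -1 (available from Python 3.8) for the modular inverse.

```python
from math import gcd


def solve_linear_congruence(a: int, b: int, m: int) -> list:
    """All x with 0 <= x < m and a*x + b congruent to 0 (mod m)."""
    d = gcd(a, m)
    if b % d != 0:
        return []                       # no solution unless (a, m) divides b
    a1, b1, m1 = a // d, b // d, m // d
    # Unique solution x1 modulo m/d of the reduced congruence a1*x + b1 = 0.
    x1 = 0 if m1 == 1 else (-b1 * pow(a1, -1, m1)) % m1
    # The d incongruent solutions x1, x1 + m/d, ..., x1 + (d-1)m/d.
    return [x1 + t * m1 for t in range(d)]


print(solve_linear_congruence(6, 3, 9))   # (6, 9) = 3 divides 3: three solutions
print(solve_linear_congruence(2, 1, 4))   # (2, 4) = 2 does not divide 1: none
```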

2.7 The Chinese Remainder Theorem

Theorem 7.1. Let m be the least common multiple of $m_1$ and $m_2$. The condition for the solubility of the simultaneous congruences

$$x \equiv a_1 \pmod{m_1}, \qquad x \equiv a_2 \pmod{m_2}$$

is

$$(m_1, m_2) \mid a_1 - a_2. \qquad (1)$$

If (1) holds, then the solution is unique mod m.


Proof. 1) Let $(m_1, m_2) = d$. If the simultaneous congruences have a solution, then $x \equiv a_1$, $x \equiv a_2 \pmod d$ and hence $d \mid a_1 - a_2$.

2) If $d \mid a_1 - a_2$, then the solutions to $x \equiv a_1 \pmod{m_1}$ are given by $x = a_1 + m_1y$. Substituting this into the second congruence gives $a_1 + m_1y \equiv a_2 \pmod{m_2}$. From the proof of Theorem 6.1 this congruence has a unique solution y mod $m_2/d$. Therefore the simultaneous congruences have a unique solution x mod m. □

Theorem 7.2. If $(m_i, m_j) = 1$ $(1 \le i < j \le n)$, then the simultaneous congruences

$$x \equiv a_i \pmod{m_i}, \qquad 1 \le i \le n,$$

have a unique solution mod $m_1 \cdots m_n$.

Proof. Apply mathematical induction to Theorem 7.1. □

Let us now discuss the ancient method of solutions to this type of problem. We already stated the problem of "What is the number?" in §1. The solution to this problem was published as a song in 1593, and it goes as follows:

"Three people walking together, 'tis rare that one be seventy,
Five cherry blossom trees, twenty one branches bearing flowers,
Seven disciples reunite for the half-moon,
Take away (multiple of) one hundred and five and you shall know."

We recall that the problem was to solve the simultaneous congruences $x \equiv 2 \pmod 3$, $x \equiv 3 \pmod 5$, $x \equiv 2 \pmod 7$. The meaning of the song here is as follows: Multiply by 70 the remainder of x when divided by 3, multiply by 21 the remainder of x when divided by 5, multiply by 15 (the number of days in half a Chinese (synodic) month) the remainder of x when divided by 7. Add the three results together, and then subtract a suitable multiple of 105 and you shall have the required smallest solution. For our specific example, we have

$$2 \times 70 + 3 \times 21 + 2 \times 15 = 233$$

and on subtracting twice 105 we have the required solution 23.

How do we explain this ancient method of solution, and in particular where do 70, 21, 15 come from? The answer is as follows: 70 is a multiple of 5 and 7 which has remainder 1 when divided by 3; 21 is a multiple of 3 and 7 which has remainder 1 when divided by 5; 15 is a multiple of 3 and 5 which has remainder 1 when divided by 7. It follows that 70a + 21b + 15c must have remainders a, b and c when divided by 3, 5 and 7 respectively.

We may further investigate how they obtained 70, 21 and 15. They had to solve

$$x \equiv 0 \pmod{m_1}, \qquad x \equiv 0 \pmod{m_2}, \qquad x \equiv 1 \pmod{m_3};$$

how did they find $x = m_1m_2y$, where y satisfies $m_1m_2y \equiv 1 \pmod{m_3}$? The answer is that they used their own version of the Euclidean algorithm to solve the indeterminate equation

$$m_1m_2y - m_3z = 1.$$
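The recipe of the song can be sketched in a few lines. The code below is a modern illustration, not the historical procedure: it obtains each multiplier from a modular inverse (Python's `pow(x, -1, m)`, which internally plays the role of the Euclidean algorithm), and it assumes the moduli are pairwise coprime as in Theorem 7.2. The function names are chosen here.

```python
from math import prod


def crt_multipliers(moduli):
    """For pairwise coprime moduli, the numbers that are congruent to 1
    modulo one modulus and congruent to 0 modulo all the others."""
    M = prod(moduli)
    return [(M // m) * pow(M // m, -1, m) for m in moduli]


def crt_solve(remainders, moduli):
    """Least non-negative x with x congruent to r_i (mod m_i) for each i."""
    M = prod(moduli)
    total = sum(r * c for r, c in zip(remainders, crt_multipliers(moduli)))
    return total % M


print(crt_multipliers([3, 5, 7]))          # the 70, 21, 15 of the song
print(crt_solve([2, 3, 2], [3, 5, 7]))     # 233 reduced by multiples of 105
```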

The following exercises are all from ancient Chinese publications. Exercises 2,3, 4 are dated 1275. Exercise 1. Replace 3, 5, 7 by 3, 7, 11 and determine the three numbers which correspond to 70, 21, 15. Exercise 2. Seven with remainder one, eight with remainder two, nine with remainder three. What is the number? Exercise 3. Eleven with left over three, twelve with left over two, thirteen with left over one. What is the number? Exercise 4. Two with left over one, five with left over two, seven with left over three, nine with left over four. What is the number? Exercise 5. There is a number. It has no remainder when divided by five. It has a remainder ten when divided by seven hundred and fifteen. It has a remainder one hundred and forty when divided by two hundred and forty seven. It has a remainder two hundred and forty five when divided by three hundred and ninety one. It has a remainder one hundred and nine when divided by one hundred and eighty seven. May we ask what is the number? (Answer: Ten thousand and twenty.)

2.8 Higher Degree Congruences

Let m be a fixed natural number, and let $f(x) = a_nx^n + \cdots + a_0$ be a polynomial with integer coefficients. We now discuss the congruence

$$f(x) \equiv 0 \pmod m. \qquad (1)$$

If $x_0$ is a solution, then $x_0 + mt$ is also a solution. This means that if $x_0$ satisfies (1), then each member of the residue class represented by $x_0$ also satisfies (1). Therefore, when we speak of the number of solutions to (1) we mean the number of incongruent solutions. The number of solutions to a higher degree congruence is quite irregular. For example:

1. The congruence $x^3 - x = (x - 1)x(x + 1) \equiv 0 \pmod 6$ has six solutions.
2. The congruence $x^2 + 1 \equiv 0 \pmod 3$ has no solution.
3. The congruence $(x - 1)(x - p - 1) \equiv 0 \pmod{p^2}$ has p solutions, namely 1, p + 1, 2p + 1, ..., (p - 1)p + 1.

We see therefore that the solutions to higher degree congruences are difficult and complicated. The following theorem helps a little.


Theorem 8.1. Let $(m_1, m_2) = 1$. Then the number of solutions to the congruence

$$f(x) \equiv 0 \pmod{m_1m_2} \qquad (2)$$

is the product of the numbers of solutions to the congruences

$$f(x) \equiv 0 \pmod{m_1}, \qquad (3)$$
$$f(x) \equiv 0 \pmod{m_2}. \qquad (4)$$

If

$$m = m_1m_2 = p_1^{l_1} \cdots p_s^{l_s} \qquad (p_1 < p_2 < \cdots < p_s)$$

is the standard prime factorization of m, then the number of solutions to (2) is the product of the numbers of solutions to the s congruences:

$$f(x) \equiv 0 \pmod{p_i^{l_i}}, \qquad 1 \le i \le s.$$

Proof. It is clear that each solution to (2) is also a solution to (3) and (4). Conversely, let $c_1$ and $c_2$ be solutions to (3) and (4) respectively, and let c be a solution of $c \equiv c_1 \pmod{m_1}$ and $c \equiv c_2 \pmod{m_2}$. The solution c exists and is unique mod m according to the Chinese remainder theorem. Moreover, this c satisfies (2) because $m_1 \mid f(c)$, $m_2 \mid f(c)$ so that $m \mid f(c)$. □
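For small moduli the number of solutions can simply be counted, which makes the three examples of this section and the multiplicativity in Theorem 8.1 easy to observe. A brute-force sketch follows (the function name and the coefficient ordering, lowest degree first, are choices made here).

```python
def count_solutions(coeffs, m):
    """Number of x in [0, m) with f(x) congruent to 0 (mod m),
    where coeffs = [a_0, a_1, ..., a_n]."""
    def f(x):
        value = 0
        for c in reversed(coeffs):      # Horner's rule, reduced mod m
            value = (value * x + c) % m
        return value
    return sum(1 for x in range(m) if f(x) == 0)


cubic = [0, -1, 0, 1]                         # x^3 - x
print(count_solutions(cubic, 6))              # example 1
print(count_solutions([1, 0, 1], 3))          # example 2: x^2 + 1 mod 3
p = 5
print(count_solutions([p + 1, -(p + 2), 1], p * p))   # example 3: (x-1)(x-p-1)
# Theorem 8.1 with m1 = 2, m2 = 3:
print(count_solutions(cubic, 2) * count_solutions(cubic, 3))
```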

2.9 Higher Degree Congruences to a Prime Power Modulus

Theorem 9.1. Let p be a prime number. The number of solutions (including repeated ones) to the congruence

$$f(x) = a_nx^n + \cdots + a_0 \equiv 0 \pmod p \qquad (1)$$

does not exceed n.

Proof. We can assume that $p \nmid a_n$. The theorem becomes trivial if (1) has no solutions. If a is a solution, then we can write

$$f(x) = (x - a)f_1(x) + r_1,$$

where we see that $p \mid r_1$ by substituting a for x. Therefore $f(x) \equiv (x - a)f_1(x) \pmod p$. If a is also a solution to $f_1(x) \equiv 0 \pmod p$, then we have similarly that $f_1(x) \equiv (x - a)f_2(x) \pmod p$, and in this case we call a a repeated solution to $f(x) \equiv 0 \pmod p$. If $f(x) \equiv (x - a)^hg_1(x) \pmod p$ where $g_1(a) \not\equiv 0 \pmod p$, then we call a a repeated solution of order h to $f(x) \equiv 0 \pmod p$. From our proof so far, we see that the degree of $g_1(x)$ is n - h. Suppose now that b is another solution. Then

$$0 \equiv f(b) \equiv (b - a)^hg_1(b) \pmod p.$$

Since $p \nmid (b - a)$, it follows that $g_1(b) \equiv 0 \pmod p$. If b is a repeated solution of order k to $g_1(x) \equiv 0 \pmod p$, then we have, as before,

$$f(x) \equiv (x - a)^h(x - b)^kg_2(x) \pmod p.$$

Proceeding in this way we have

$$f(x) \equiv (x - a)^h(x - b)^k \cdots (x - c)^lg(x) \pmod p,$$

where g(x) is a polynomial of degree $n - h - k - \cdots - l$ and $g(x) \equiv 0 \pmod p$ has no solution. The theorem is proved. □

Since 1, 2, ..., p - 1 are solutions to $x^{p-1} \equiv 1 \pmod p$ we see that

$$x^{p-1} - 1 \equiv (x - 1)(x - 2) \cdots (x - (p - 1)) \pmod p. \qquad (2)$$

Substituting x = 0 into this, and noting that p - 1 is even if p > 2, we have:

Theorem 9.2 (Wilson). If p is a prime, then $(p - 1)! \equiv -1 \pmod p$. □
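Both the polynomial congruence (2) and Wilson's theorem are easy to test for small primes. The sketch below expands the product of linear factors coefficient by coefficient modulo p; the function names are illustrative.

```python
from math import factorial


def product_of_linear_factors_mod_p(p):
    """Coefficients (lowest degree first, reduced mod p) of (x-1)(x-2)...(x-(p-1))."""
    poly = [1]
    for j in range(1, p):
        new = [0] * (len(poly) + 1)
        for i, c in enumerate(poly):       # multiply poly by (x - j)
            new[i + 1] = (new[i + 1] + c) % p
            new[i] = (new[i] - j * c) % p
        poly = new
    return poly


def wilson_holds(n):
    """True when (n-1)! is congruent to -1 (mod n)."""
    return factorial(n - 1) % n == n - 1


p = 7
print(product_of_linear_factors_mod_p(p))  # x^(p-1) - 1 reduced mod p
print([n for n in range(2, 30) if wilson_holds(n)])
```

The second line of output lists exactly the primes below 30, reflecting the known converse of Wilson's theorem (which is not needed in the text).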

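Theorem 9.3. Let $f'(x) = na_nx^{n-1} + \cdots + 2a_2x + a_1$. If $f(x) \equiv 0$, $f'(x) \equiv 0 \pmod p$ have no common solution, then the two congruences $f(x) \equiv 0 \pmod{p^l}$ and $f(x) \equiv 0 \pmod p$ have the same number of solutions.

Proof. We prove this by induction on l, the case l = 1 being trivial. Let $x_1$ be a solution to $f(x) \equiv 0 \pmod{p^{l-1}}$, so that

$$f(x_1 + p^{l-1}y) \equiv f(x_1) + p^{l-1}yf'(x_1) \pmod{p^l},$$

because $(x + p^{l-1}y)^n \equiv x^n + np^{l-1}yx^{n-1} \pmod{p^l}$. But $p \nmid f'(x_1)$ so that there exists a unique y (mod p) such that

$$\frac{f(x_1)}{p^{l-1}} + yf'(x_1) \equiv 0 \pmod p,$$

that is, $f(x_1 + p^{l-1}y) \equiv 0 \pmod{p^l}$. Hence each solution mod $p^{l-1}$ gives rise to exactly one solution mod $p^l$. □

Theorem 9.4. The congruence $x^{p-1} \equiv 1 \pmod{p^l}$ has p - 1 solutions.

Proof. This is an immediate consequence of Theorem 9.3. □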
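The induction step in the proof of Theorem 9.3 is a lifting procedure that can be carried out explicitly. The following sketch assumes, as in the theorem, that f and f' have no common solution mod p; the function names and the coefficient ordering (lowest degree first) are choices made here.

```python
def poly_eval(coeffs, x, m):
    """Evaluate a_0 + a_1 x + ... + a_n x^n modulo m by Horner's rule."""
    value = 0
    for c in reversed(coeffs):
        value = (value * x + c) % m
    return value


def derivative(coeffs):
    """Coefficients of f'(x)."""
    return [i * c for i, c in enumerate(coeffs)][1:]


def lift_solutions(coeffs, p, l):
    """Solutions of f(x) = 0 (mod p^l), obtained by lifting those mod p.
    Assumes f and f' have no common solution mod p."""
    df = derivative(coeffs)
    sols = [x for x in range(p) if poly_eval(coeffs, x, p) == 0]
    modulus = p
    for _ in range(l - 1):
        lifted = []
        for x1 in sols:
            # Solve f(x1)/modulus + y f'(x1) = 0 (mod p) for the unique y.
            fx = poly_eval(coeffs, x1, modulus * p) // modulus
            y = (-fx * pow(poly_eval(df, x1, p), -1, p)) % p
            lifted.append(x1 + modulus * y)
        sols, modulus = lifted, modulus * p
    return sorted(sols)


# Theorem 9.4 with p = 5, l = 3: x^4 = 1 (mod 125) has 4 solutions.
print(lift_solutions([-1, 0, 0, 0, 1], 5, 3))
```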

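2.10 Wolstenholme's Theorem

Theorem 10.1. Let p be a prime number greater than 3, and denote by $\frac{1}{s}$ an integer $s^*$ such that $ss^* \equiv 1 \pmod{p^2}$. Then we have

$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{p-1} \equiv 0 \pmod{p^2}.$$

Proof. Let

$$(x - 1)(x - 2) \cdots (x - (p - 1)) = x^{p-1} - s_1x^{p-2} + \cdots + s_{p-1}, \qquad (1)$$

so that $s_{p-1} = (p - 1)!$. Since

$$(x - 1)(x - 2) \cdots (x - (p - 1)) \equiv x^{p-1} - 1 \pmod p, \qquad (2)$$

it follows that

$$p \mid (s_1, \dots, s_{p-2}). \qquad (3)$$

We set x = p in (1). Then

$$(p - 1)! = p^{p-1} - s_1p^{p-2} + \cdots - s_{p-2}p + s_{p-1},$$

or

$$p^{p-2} - s_1p^{p-3} + \cdots + s_{p-3}p - s_{p-2} = 0.$$

Since p > 3, we have, by (3), that

$$s_{p-2} \equiv 0 \pmod{p^2},$$

or

$$p^2 \mid (p - 1)!\left(1 + \frac{1}{2} + \cdots + \frac{1}{p-1}\right),$$

or

$$1^* + 2^* + \cdots + (p - 1)^* \equiv 0 \pmod{p^2},$$

as required. □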
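Theorem 10.1 can be tested in both of its readings: with $1/s$ interpreted as an inverse modulo $p^2$, and with the sum taken as an ordinary rational number whose numerator is then divisible by $p^2$. The sketch below checks both for a few small primes; the function names are illustrative.

```python
from fractions import Fraction


def inverse_sum_mod_p2(p):
    """1* + 2* + ... + (p-1)* modulo p^2, where s* is the inverse of s mod p^2."""
    m = p * p
    return sum(pow(s, -1, m) for s in range(1, p)) % m


def harmonic_numerator(p):
    """Numerator of 1 + 1/2 + ... + 1/(p-1) in lowest terms."""
    return sum(Fraction(1, s) for s in range(1, p)).numerator


for p in (5, 7, 11, 13):
    print(p, inverse_sum_mod_p2(p), harmonic_numerator(p) % (p * p))

# The hypothesis p > 3 is needed: for p = 3 the sum is 3/2.
print(inverse_sum_mod_p2(3), harmonic_numerator(3))
```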

Chapter 3. Quadratic Residues

3.1 Definitions and Euler's Criteria

Definition 1. Let m be an integer greater than 1, and suppose that (m, n) = 1. If $x^2 \equiv n \pmod m$ is soluble, then we call n a quadratic residue mod m; otherwise we call n a quadratic non-residue mod m.

We can now divide the set of integers coprime with m into two classes: the class of quadratic residues and the class of quadratic non-residues.

Example. The numbers 1, 2, 4 are quadratic residues and 3, 5, 6 are quadratic non-residues mod 7.

Definition 2 (Legendre's symbol). Let p be an odd prime, and suppose that $p \nmid n$. We let

$$\left(\frac{n}{p}\right) = \begin{cases} 1 & \text{if } n \text{ is a quadratic residue mod } p, \\ -1 & \text{if } n \text{ is a quadratic non-residue mod } p. \end{cases}$$

It is easy to see that if $n \equiv n' \pmod p$ and $p \nmid n$, then $\left(\frac{n}{p}\right) = \left(\frac{n'}{p}\right)$.

Theorem 1.1. Let p > 2. There are t
2,

then

$$\left(\frac{-1}{p}\right) = (-1)^{\frac{1}{2}(p-1)}. \qquad □$$

In other words, -1 is a quadratic residue or non-residue mod p, according to whether $p \equiv 1$ or 3 (mod 4). It follows from this that the odd prime divisors of $x^2 + 1$ must be congruent to 1 (mod 4).

Theorem 2.2 (Gauss's Lemma). Let p > 2, $p \nmid n$. Denote by m the number of least positive residues of the $\frac{1}{2}(p - 1)$ numbers $n, 2n, \dots, \frac{1}{2}(p - 1)n$ (mod p) which exceed p/2. Then

$$\left(\frac{n}{p}\right) = (-1)^m.$$

Example 1. p = 7, n = 10. We have

$$10, 20, 30 \equiv 3, 6, 2 \pmod 7.$$

There is exactly one least positive residue which exceeds $\frac{7}{2}$. Therefore m = 1 and $\left(\frac{10}{7}\right) = -1$.

Example 2. p = 11, n = 2. We have the residues 2, 4, 6, 8, 10 (mod 11), and there are three which exceed $\frac{11}{2}$. Therefore $\left(\frac{2}{11}\right) = -1$.
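The count m in Gauss's lemma is straightforward to compute, and the result can be compared with Euler's criterion, $\left(\frac{n}{p}\right) \equiv n^{\frac{1}{2}(p-1)} \pmod p$. A small sketch with illustrative function names:

```python
def gauss_lemma_m(n, p):
    """Number of k in 1..(p-1)/2 whose least positive residue of k*n mod p exceeds p/2."""
    return sum(1 for k in range(1, (p - 1) // 2 + 1) if 2 * ((k * n) % p) > p)


def legendre_by_gauss(n, p):
    """Legendre symbol from Gauss's lemma (p an odd prime not dividing n)."""
    return (-1) ** gauss_lemma_m(n, p)


def legendre_by_euler(n, p):
    """Legendre symbol from Euler's criterion (p an odd prime not dividing n)."""
    r = pow(n, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


print(gauss_lemma_m(10, 7), legendre_by_gauss(10, 7))   # Example 1
print(gauss_lemma_m(2, 11), legendre_by_gauss(2, 11))   # Example 2
```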

Proof of Theorem 2.2. Let $l = \frac{1}{2}(p - 1) - m$, and let $a_1, \dots, a_l$ be those residues which are less than p/2, and $b_1, \dots, b_m$ be those residues which are greater than p/2. Then

$$\prod_{s=1}^{l} a_s \prod_{t=1}^{m} b_t \equiv \prod_{k=1}^{\frac{1}{2}(p-1)} kn = \left(\frac{p-1}{2}\right)!\; n^{\frac{1}{2}(p-1)} \pmod p. \qquad (1)$$

Since $1 \le p - b_t \le \frac{1}{2}(p - 1)$ it follows that $a_s$ and $p - b_t$ are $\frac{1}{2}(p - 1)$ integers in the interval from 1 to $\frac{1}{2}(p - 1)$. We now prove that they are distinct by proving $a_s \ne p - b_t$. Suppose, if possible, $a_s + b_t = p$. Then there are integers x, y such that

$$xn + yn \equiv 0 \pmod p,$$

or

$$x + y \equiv 0 \pmod p, \qquad 1 \le x \le \tfrac{1}{2}(p - 1), \quad 1 \le y \le \tfrac{1}{2}(p - 1),$$

which is impossible. Therefore

$$\prod_{s=1}^{l} a_s \prod_{t=1}^{m} (p - b_t) = \left(\frac{p-1}{2}\right)!.$$

From (1) we see that the left hand side of this equation is

$$\equiv (-1)^m \prod_{s=1}^{l} a_s \prod_{t=1}^{m} b_t \equiv (-1)^m n^{\frac{1}{2}(p-1)} \left(\frac{p-1}{2}\right)! \pmod p.$$

Therefore

$$n^{\frac{1}{2}(p-1)} \equiv (-1)^m \pmod p.$$

From Euler's criterion we see that $\left(\frac{n}{p}\right) \equiv (-1)^m \pmod p$, and so $\left(\frac{n}{p}\right) = (-1)^m$. □

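If we take n = 2 in Theorem 2.2, then

$$2, \; 2 \cdot 2, \; 2 \cdot 3, \; \dots, \; \tfrac{1}{2}(p - 1) \cdot 2$$

are already in the interval from 0 to p. We can now determine the number of integers k satisfying $\frac{p}{2} < 2k < p$, or $\frac{p}{4} < k < \frac{p}{2}$, which gives

$$m = \left[\frac{p}{2}\right] - \left[\frac{p}{4}\right].$$

Let p = 8a + r, r = 1, 3, 5, 7. Then

$$m = 2a + \left[\frac{r}{2}\right] - \left[\frac{r}{4}\right] \equiv 0, 1, 1, 0 \pmod 2.$$

Therefore we have:

Theorem 2.3. If p > 2, then

$$\left(\frac{2}{p}\right) = (-1)^{\frac{1}{8}(p^2-1)}. \qquad □$$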

In other words 2 is a quadratic residue or non-residue mod p, according to whether $p \equiv \pm 1$ or $\pm 3$ (mod 8). It follows from this that every odd prime divisor of $x^2 - 2$ must be congruent to $\pm 1$ (mod 8).

Exercise. Let n be a positive integer such that 4n + 3 and 8n + 7 are primes. Prove that $2^{4n+3} - 1 = M_{4n+3}$ is composite. Use this to prove the following concerning Mersenne numbers:

$23 \mid M_{11}$, $47 \mid M_{23}$, $167 \mid M_{83}$, $263 \mid M_{131}$, $359 \mid M_{179}$, $383 \mid M_{191}$, $479 \mid M_{239}$, $503 \mid M_{251}$.
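The divisibility statements in the Exercise can be confirmed numerically without forming the large Mersenne numbers, since $q \mid M_p$ exactly when $2^p \equiv 1 \pmod q$. The following sketch (names chosen for illustration) is a check only and does not replace the requested proof.

```python
def divides_mersenne(q: int, p: int) -> bool:
    """True when q divides M_p = 2^p - 1."""
    return pow(2, p, q) == 1


# Pairs (q, p) with p = 4n + 3 and q = 8n + 7 = 2p + 1, as listed in the Exercise.
pairs = [(23, 11), (47, 23), (167, 83), (263, 131),
         (359, 179), (383, 191), (479, 239), (503, 251)]

for q, p in pairs:
    print(q, p, q == 2 * p + 1, divides_mersenne(q, p))
```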

3.3 The Law of Quadratic Reciprocity

Theorem 3.1. Let p, q be two distinct odd primes. Then

$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{1}{2}(p-1) \cdot \frac{1}{2}(q-1)}.$$

In other words, if $p \equiv q \equiv 3 \pmod 4$, then exactly one of the two congruences $x^2 \equiv p \pmod q$, $x^2 \equiv q \pmod p$ is soluble. Otherwise the two congruences are either both soluble or both insoluble. This is the famous and important Law of Quadratic Reciprocity in elementary number theory which was discovered by Legendre and proved by Gauss, who named it "the queen of number theory". The later research work on algebraic number theory by Kummer, Eisenstein, Hilbert, Takagi, Artin and Furtwängler seems to justify the name.

Proof. We do not, for the moment, exclude the case q = 2, and we suppose that p, q are distinct primes. When $1 \le k \le \frac{1}{2}(p - 1)$ we can write

$$kq = q_kp + r_k, \qquad 1 \le r_k \le p - 1.$$

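Let

$$a = \sum_{s=1}^{l} a_s, \qquad b = \sum_{t=1}^{m} b_t,$$

where $a_s$ and $b_t$ are defined in the previous section. Then we have

$$\sum_{k=1}^{\frac{1}{2}(p-1)} r_k = a + b. \qquad (1)$$

We saw in the proof of Gauss's lemma that $a_s$, $p - b_t$ are the same as $1, 2, \dots, \frac{1}{2}(p - 1)$. Therefore

$$\frac{p^2 - 1}{8} = 1 + 2 + \cdots + \frac{1}{2}(p - 1) = a + mp - b, \qquad (2)$$

and

$$\frac{p^2 - 1}{8}q = \sum_{k=1}^{\frac{1}{2}(p-1)} kq = p\sum_{k=1}^{\frac{1}{2}(p-1)} q_k + \sum_{k=1}^{\frac{1}{2}(p-1)} r_k = p\sum_{k=1}^{\frac{1}{2}(p-1)} q_k + a + b. \qquad (3)$$

Subtracting (2) from (3), we have

$$\frac{p^2 - 1}{8}(q - 1) = p\sum_{k=1}^{\frac{1}{2}(p-1)} q_k - mp + 2b,$$

or

$$\frac{p^2 - 1}{8}(q - 1) \equiv \sum_{k=1}^{\frac{1}{2}(p-1)} q_k - m \pmod 2. \qquad (4)$$

1) (Alternative proof of Theorem 2.3). We take q = 2 so that the $q_k$ are all 0, and hence

$$\frac{p^2 - 1}{8} \equiv -m \pmod 2.$$

2) Let q > 2. Then

$$m \equiv \sum_{k=1}^{\frac{1}{2}(p-1)} q_k \pmod 2.$$

Therefore, since $q_k = \left[\frac{kq}{p}\right]$,

$$\left(\frac{q}{p}\right) = (-1)^m = (-1)^{\sum_{k=1}^{\frac{1}{2}(p-1)} \left[\frac{kq}{p}\right]}.$$

Similarly we have

$$\left(\frac{p}{q}\right) = (-1)^{\sum_{l=1}^{\frac{1}{2}(q-1)} \left[\frac{lp}{q}\right]},$$

so that

$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\sum_{k=1}^{\frac{1}{2}(p-1)} \left[\frac{kq}{p}\right] + \sum_{l=1}^{\frac{1}{2}(q-1)} \left[\frac{lp}{q}\right]}.$$

If we can prove that

$$\sum_{k=1}^{\frac{1}{2}(p-1)} \left[\frac{kq}{p}\right] + \sum_{l=1}^{\frac{1}{2}(q-1)} \left[\frac{lp}{q}\right] = \frac{p-1}{2} \cdot \frac{q-1}{2},$$

or even only that the two sides are congruent (mod 2), then the theorem will follow. It suffices therefore to prove the following lemma.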

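Lemma.

$$\sum_{k=1}^{\frac{1}{2}(p-1)} \left[\frac{kq}{p}\right] + \sum_{l=1}^{\frac{1}{2}(q-1)} \left[\frac{lp}{q}\right] = \frac{p-1}{2} \cdot \frac{q-1}{2}.$$

Proof. Consider the rectangle with vertices

$$(0, 0), \quad (0, \tfrac{1}{2}q), \quad (\tfrac{1}{2}p, 0), \quad (\tfrac{1}{2}p, \tfrac{1}{2}q).$$

[Figure: the rectangle with vertices (0, 0), ($\frac{1}{2}p$, 0), (0, $\frac{1}{2}q$) and its diagonal from the origin.]

The diagonal from the origin does not pass through any lattice point (a point with integer coordinates). This is because if (x, y) is a lattice point on the diagonal, then xq - yp = 0 and so $p \mid x$, $q \mid y$, showing that (x, y) must lie outside the rectangle. The total number of lattice points in the rectangle is $\frac{1}{2}(p - 1) \cdot \frac{1}{2}(q - 1)$. The numbers of lattice points in the two triangular regions below and above the diagonal are respectively

$$\sum_{k=1}^{\frac{1}{2}(p-1)} \left[\frac{kq}{p}\right], \qquad \sum_{l=1}^{\frac{1}{2}(q-1)} \left[\frac{lp}{q}\right].$$

The lemma is therefore proved. □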
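The lattice point count in the Lemma, and the law of reciprocity that follows from it, can be checked for small primes. In the sketch below the function names are illustrative, and the Legendre symbol is computed from Euler's criterion.

```python
def floor_sum(p, q):
    """Sum of [k*q/p] for k = 1, ..., (p-1)/2."""
    return sum(k * q // p for k in range(1, (p - 1) // 2 + 1))


def lemma_holds(p, q):
    """The Lemma: the two floor sums add up to (p-1)/2 * (q-1)/2."""
    return floor_sum(p, q) + floor_sum(q, p) == ((p - 1) // 2) * ((q - 1) // 2)


def legendre(n, p):
    """Legendre symbol via Euler's criterion (p an odd prime not dividing n)."""
    r = pow(n, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


p, q = 7, 11
print(floor_sum(p, q), floor_sum(q, p), lemma_holds(p, q))
print(legendre(p, q) * legendre(q, p), (-1) ** (((p - 1) // 2) * ((q - 1) // 2)))
```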

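Example 1. Determine those primes p > 3 of which 3 is a quadratic residue.

From the law of quadratic reciprocity we have

$$\left(\frac{3}{p}\right) = \left(\frac{p}{3}\right)(-1)^{\frac{p-1}{2}}.$$

Now

$$\left(\frac{p}{3}\right) = \begin{cases} \left(\frac{1}{3}\right) = 1, & \text{if } p \equiv 1 \pmod 3, \\ \left(\frac{-1}{3}\right) = -1, & \text{if } p \equiv 2 \pmod 3; \end{cases} \qquad (-1)^{\frac{p-1}{2}} = \begin{cases} 1, & \text{if } p \equiv 1 \pmod 4, \\ -1, & \text{if } p \equiv -1 \pmod 4. \end{cases}$$

It follows from the Chinese remainder theorem that

$$\left(\frac{3}{p}\right) = \begin{cases} 1, & \text{if } p \equiv \pm 1 \pmod{12}, \\ -1, & \text{if } p \equiv \pm 5 \pmod{12}. \end{cases}$$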

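Example 2. Determine those primes $p \ne 5$ for which 5 is a quadratic residue.

From the law of quadratic reciprocity we have $\left(\frac{5}{p}\right) = \left(\frac{p}{5}\right)$, and

$$\left(\frac{1}{5}\right) = 1, \quad \left(\frac{2}{5}\right) = (-1)^{\frac{5^2-1}{8}} = -1, \quad \left(\frac{3}{5}\right) = \left(\frac{-2}{5}\right) = -1, \quad \left(\frac{4}{5}\right) = 1,$$

so that

$$\left(\frac{5}{p}\right) = \begin{cases} 1, & \text{if } p \equiv \pm 1 \pmod 5,\\ -1, & \text{if } p \equiv \pm 2 \pmod 5. \end{cases}$$

Example 3. Determine those primes p for which 10 is a quadratic residue.

From Example 2 and the Chinese remainder theorem we have

$$\left(\frac{10}{p}\right) = \begin{cases} 1, & \text{if } p \equiv \pm 1, \pm 3, \pm 9, \pm 13 \pmod{40},\\ -1, & \text{if } p \equiv \pm 7, \pm 11, \pm 17, \pm 19 \pmod{40}. \end{cases}$$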
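The residue-class descriptions in Examples 1 to 3 can be tested against a direct computation. The sketch below is an illustration only: it evaluates the Legendre symbol by Euler's criterion, $n^{(p-1)/2} \equiv (n/p) \pmod p$, which is a standard fact assumed here rather than taken from this passage, and the helper names `legendre` and `is_prime` are hypothetical.

```python
def legendre(n, p):
    """Legendre symbol (n/p) for an odd prime p not dividing n (Euler's criterion)."""
    return 1 if pow(n, (p - 1) // 2, p) == 1 else -1


def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))


# Compare the symbol with the residue classes stated in Examples 1, 2 and 3.
for p in filter(is_prime, range(7, 2000)):
    assert legendre(3, p) == (1 if p % 12 in (1, 11) else -1)
    assert legendre(5, p) == (1 if p % 5 in (1, 4) else -1)
    assert legendre(10, p) == (1 if p % 40 in (1, 3, 9, 13, 27, 31, 37, 39) else -1)
```

The classes ±1, ±3, ±9, ±13 (mod 40) are written out as the least positive residues 1, 39, 3, 37, 9, 31, 13, 27.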

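Example 4. Determine the solubility of $x^2 \equiv -1457 \pmod{2389}$.

Here $p = 2389$ is a prime. Since $-1457 = -31 \times 47$ it follows from

$$\left(\frac{-1}{2389}\right) = 1, \qquad \left(\frac{31}{2389}\right) = \left(\frac{2389}{31}\right) = \left(\frac{2}{31}\right) = 1,$$

$$\left(\frac{47}{2389}\right) = \left(\frac{39}{47}\right) = \left(\frac{3}{47}\right)\left(\frac{13}{47}\right) = -\left(\frac{47}{3}\right)\left(\frac{47}{13}\right) = -\left(\frac{2}{3}\right)\left(\frac{8}{13}\right) = -\left(\frac{2}{3}\right)\left(\frac{2}{13}\right) = -1,$$

that $\left(\frac{-1457}{2389}\right) = -1$, so that the congruence is not soluble.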

Exercise 1. Show that (;3) = 1,

G~) =

- 1.

Exercise 2. Show that $\left(\frac{195}{1901}\right) = -1$, $\left(\frac{74}{101}\right) = -1$, $\left(\frac{365}{1847}\right) = 1$.

1,



Exercise 3. Show that if $p \equiv \pm 1$ or $\pm 5 \pmod{24}$, then $\left(\frac{6}{p}\right) = 1$; if $p \equiv \pm 7$ or $\pm 11 \pmod{24}$, then $\left(\frac{6}{p}\right) = -1$.

3.4 Practical Methods for the Solutions

Although the theory above is simple and beautiful, it is nevertheless rather negative. By this we mean the following. If, following our theory, the congruence is insoluble, then the problem is finished. However, if the congruence is soluble, we may further ask for the actual solutions to the congruence, and the method does not give us the solutions. In actual fact, when p is large, the determination of the solutions to $x^2 \equiv n \pmod p$ is no easy matter. However, if $p \equiv 3 \pmod 4$ or $p \equiv 5 \pmod 8$, then we have the following methods.

1) $p \equiv 3 \pmod 4$. Since $\left(\frac{n}{p}\right) = 1$, we have $n^{\frac12(p-1)} \equiv 1 \pmod p$, and so …

… $\pm x_1 + 2^{l-1}$ are actually incongruent solutions we see that the congruence has exactly four solutions. When $p > 2$ and $l = 1$, the result is trivial, and the remaining part of the theorem follows from Theorem 2.9.3. □

xi -

From the results of Chapter 2 we can determine the number of solutions to a quadratic congruence to any integer modulus m.

3.6 Jacobi's Symbol

Throughout this section m denotes a positive odd integer.

Definition. Let the standard factorization of m be $p_1 \cdots p_t$, where the $p_r$ may be repeated. If $(n, m) = 1$ then we define Jacobi's symbol by

$$\left(\frac{n}{m}\right) = \prod_{r=1}^{t} \left(\frac{n}{p_r}\right).$$

Examples. $\left(\frac{1}{m}\right) = 1$. If $(a, m) = 1$, then $\left(\frac{a^2}{m}\right) = 1$.

Note: If $\left(\frac{n}{m}\right) = 1$, it does not follow that $x^2 \equiv n \pmod m$ is soluble.

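Theorem 6.1. Let m and m' be positive odd integers.
(i) If $n \equiv n' \pmod m$ and $(n, m) = 1$, then $\left(\frac{n}{m}\right) = \left(\frac{n'}{m}\right)$.
(ii) If $(n, m) = (n, m') = 1$, then $\left(\frac{n}{m}\right)\left(\frac{n}{m'}\right) = \left(\frac{n}{mm'}\right)$.
(iii) If $(n, m) = (n', m) = 1$, then $\left(\frac{n}{m}\right)\left(\frac{n'}{m}\right) = \left(\frac{nn'}{m}\right)$. □

Theorem 6.2. $\left(\frac{-1}{m}\right) = (-1)^{\frac12(m-1)}$.

Proof. It suffices to prove that

$$\sum_{i=1}^{t} \frac{p_i - 1}{2} \equiv \frac{\prod_{i=1}^{t} p_i - 1}{2} \pmod 2,$$

which certainly holds when $t = 1$. Given any two odd integers u, v we always have

$$\frac{u-1}{2} + \frac{v-1}{2} \equiv \frac{uv-1}{2} \pmod 2 \tag{1}$$

(or $(u-1)(v-1) \equiv 0 \pmod 4$). It follows by induction that

$$\sum_{i=1}^{t} \frac{p_i-1}{2} = \sum_{i=1}^{t-1} \frac{p_i-1}{2} + \frac{p_t-1}{2} \equiv \frac{\prod_{i=1}^{t-1} p_i - 1}{2} + \frac{p_t - 1}{2} \equiv \frac{\prod_{i=1}^{t} p_i - 1}{2} \pmod 2.$$

□

Theorem 6.3. $\left(\frac{2}{m}\right) = (-1)^{\frac18(m^2-1)}$.

Proof. This is similar to the above, except that we replace (1) by

$$\frac{u^2v^2 - 1}{8} \equiv \frac{u^2-1}{8} + \frac{v^2-1}{8} \pmod 2.$$

□

Theorem 6.4. Let m, n be coprime positive odd integers. Then

$$\left(\frac{n}{m}\right)\left(\frac{m}{n}\right) = (-1)^{\frac{m-1}{2}\cdot\frac{n-1}{2}}.$$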



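Proof. Let $m = \prod p$, $n = \prod q$. Then

$$\left(\frac{n}{m}\right)\left(\frac{m}{n}\right) = \prod_{p}\prod_{q} \left(\frac{q}{p}\right)\left(\frac{p}{q}\right) = \prod_{p}\prod_{q} (-1)^{\frac{p-1}{2}\cdot\frac{q-1}{2}} = (-1)^{\frac{n-1}{2}\cdot\frac{m-1}{2}},$$

where we have used (1). □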
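Theorems 6.1 to 6.4 together give a way to evaluate Jacobi's symbol by repeated reduction and inversion, with no factorization of the denominator. The following is a sketch of that standard procedure, not code from the text; the function name `jacobi` is chosen here, and the convention of returning 0 when n and m are not coprime is an assumption added for completeness.

```python
def jacobi(n, m):
    """Jacobi symbol (n/m) for a positive odd integer m.

    Uses only the rules of this section: reduction of n modulo m,
    (2/m) = (-1)^((m^2-1)/8), and the reciprocity law for odd coprime n, m.
    Returns 0 if n and m have a common factor (an added convention).
    """
    if m <= 0 or m % 2 == 0:
        raise ValueError("m must be a positive odd integer")
    n %= m
    result = 1
    while n != 0:
        while n % 2 == 0:            # pull out factors of 2 with Theorem 6.3
            n //= 2
            if m % 8 in (3, 5):
                result = -result
        n, m = m, n                  # invert the symbol with Theorem 6.4
        if n % 4 == 3 and m % 4 == 3:
            result = -result
        n %= m                       # reduce with Theorem 6.1 (i)
    return result if m == 1 else 0


assert jacobi(383, 443) == 1
assert jacobi(-1457, 2389) == -1

# A symbol equal to 1 does not imply solubility: (2/15) = 1,
# yet x^2 = 2 (mod 15) has no solution.
assert jacobi(2, 15) == 1
assert all(x * x % 15 != 2 for x in range(15))
```

The last two assertions illustrate the Note following the definition of Jacobi's symbol.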

In using Legendre's symbol we must always ensure that the denominator is a prime. In using Jacobi's symbol, however, we can avoid the factorization process. For example:

$$\left(\frac{383}{443}\right) = -\left(\frac{443}{383}\right) = -\left(\frac{60}{383}\right) = -\left(\frac{2^2}{383}\right)\left(\frac{15}{383}\right) = -\left(\frac{15}{383}\right) = \left(\frac{383}{15}\right) = \left(\frac{8}{15}\right) = \left(\frac{2}{15}\right) = 1.$$

If we delete the condition that m, n are positive in Theorem 6.4, then we have:

Theorem 6.5. Let m, n be coprime odd integers. If m, n are both negative, then

$$\left(\frac{n}{|m|}\right)\left(\frac{m}{|n|}\right) = -(-1)^{\frac{m-1}{2}\cdot\frac{n-1}{2}}.$$

Otherwise, the required value is $(-1)^{\frac12(m-1)\cdot\frac12(n-1)}$.

…

… $p > 2$, then $\varphi(p^l)$ is even, so that m cannot have two distinct odd prime divisors. If m has a primitive root, then m must be of the form $2^l$, $p^l$ or $2^c p^l$. If $c \ge 2$, then $\varphi(2^c) = 2^{c-1}$ is also even, and so $2^c p^l$ cannot have primitive roots. Therefore m must be of the form $2^l$, $p^l$ or $2p^l$.

2) $m = 2^l$. If $l = 1$, then 1 is a primitive root. If $l = 2$, then 3 is a primitive root. Let $l \ge 3$. We prove by induction that for all odd a, we have

$$a^{2^{l-2}} \equiv 1 \pmod{2^l}.$$

This is easy, since if $a^{2^{l-2}} = 1 + 2^l k$, then $a^{2^{l-1}} = (1 + 2^l k)^2 \equiv 1 \pmod{2^{l+1}}$. Therefore there is no primitive root for $m = 2^l$ ($l > 2$).

3) $m = p^l$. The case $l = 1$ has already been settled in §8. Let g be a primitive root of p. If $g^{p-1} - 1 \not\equiv 0 \pmod{p^2}$, then we take $r = g$; if $g^{p-1} - 1 \equiv 0 \pmod{p^2}$, then we take $r = g + p$. We then have

$$r^{p-1} - 1 \not\equiv 0 \pmod{p^2}.$$

Therefore such an r is a primitive root of $p^2$. Let $r^{p-1} - 1 = kp$, $p \nmid k$. Since

…, $s \ge 0$,

we can prove as before that

…

Hence

$$r^{p^{l-2}(p-1)} \equiv 1 + kp^{l-1} \pmod{p^l}, \qquad l \ge 2. \tag{1}$$

If the order of r is e, then $e \mid (p-1)p^{l-1} = \varphi(p^l)$. Since r is a primitive root of p, we see that $(p-1) \mid e$. We deduce from (1) that $e = \varphi(p^l)$; that is, r is a primitive root of $p^l$.

4) $m = 2p^l$. We take g to be a primitive root of $p^l$. If g is odd, then g is also a primitive root of $2p^l$; if g is even, then $g + p^l$ is a primitive root of $2p^l$. □

Theorem 9.2. Let $l > 2$. Then the order of 5 with respect to the modulus $2^l$ is $2^{l-2}$.

Proof. We first prove that, for $a \ge 3$,

$$5^{2^{a-3}} \equiv 1 + 2^{a-1} \pmod{2^a}.$$

This clearly holds when $a = 3$, and we now proceed by induction. We have

$$5^{2^{a-2}} = \left(5^{2^{a-3}}\right)^2 \equiv (1 + 2^{a-1} + k2^a)^2 \equiv 1 + 2^a \pmod{2^{a+1}}.$$

Therefore $5^{2^{l-3}} \not\equiv 1 \pmod{2^l}$ and $5^{2^{l-2}} \equiv 1 \pmod{2^l}$. That is, the order of 5 is $2^{l-2}$ (mod $2^l$). □

Theorem 9.3. Let $l > 2$. Then, given any odd a, there exists b such that

$$a \equiv (-1)^{\frac{a-1}{2}}\, 5^b \pmod{2^l}, \qquad b \ge 0.$$

Proof. If $a \equiv 1 \pmod 4$, then by Theorem 9.2, $5^b$ ($0 \le b < 2^{l-2}$) gives $2^{l-2}$ distinct numbers mod $2^l$; moreover they are all congruent to 1 (mod 4). Therefore there must be an integer b such that $a \equiv 5^b \pmod{2^l}$. If $a \equiv 3 \pmod 4$, then $-a \equiv 1 \pmod 4$, and the required result follows from the above. □
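Theorems 9.2 and 9.3 can be confirmed by brute force for small moduli. The following sketch is an illustration only; the helper `order` is a name chosen here, and it computes the multiplicative order by repeated multiplication.

```python
def order(a, m):
    """Multiplicative order of a modulo m, assuming (a, m) = 1."""
    k, x = 1, a % m
    while x != 1:
        x = x * a % m
        k += 1
    return k


for l in range(3, 12):
    m = 2 ** l
    # Theorem 9.2: the order of 5 modulo 2^l is 2^(l-2).
    assert order(5, m) == 2 ** (l - 2)
    # Theorem 9.3: every odd a is congruent to (-1)^((a-1)/2) * 5^b for some b.
    powers = {pow(5, b, m) for b in range(2 ** (l - 2))}
    for a in range(1, m, 2):
        sign = 1 if a % 4 == 1 else -1
        assert (sign * a) % m in powers
```

The check multiplies a by the sign $(-1)^{(a-1)/2}$ and tests membership among the powers of 5, which is equivalent to the statement of Theorem 9.3.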

Theorem 9.4. Let $m = 2^l \cdot p_1^{l_1} \cdots p_s^{l_s}$ (standard factorization) with $l \ge 0$, $l_1 > 0, \ldots, l_s > 0$. We define $\delta$ to be 0 or 1 or 2 according to whether $l = 0, 1$ or $l = 2$ or $l > 2$ respectively. Then the reduced residue system of m can be represented by the products of $s + \delta$ numbers.

Proof. 1) Suppose that $m = m'm''$, $(m', m'') = 1$. Let $a_1, \ldots, a_{\varphi(m')}$ be a reduced residue system mod m', and that $a_i \equiv 1 \pmod{m''}$ (this is always possible). Let $b_1, \ldots, b_{\varphi(m'')}$ be a reduced residue system mod m'' and that $b_j \equiv 1 \pmod{m'}$. Then the $a_i b_j$ represent a reduced residue system mod $m'm''$, and their number is $\varphi(m'm'')$. Also, if $a_i b_j \equiv a_s b_t \pmod{m'm''}$, then $a_i \equiv a_s \pmod{m'}$, $b_j \equiv b_t \pmod{m''}$.

2) From Theorems 9.1 and 9.3 we know that the reduced residue system mod m, where $m = p^l$ ($p > 2$), consists of the powers of a single number. If $m = 2^l$ where $l > 1$, then the reduced residue system consists of the products of powers of $\delta$ numbers. Combining this with 1), the theorem is proved. □

This theorem points out an important principle. In group theory this result is known as the Fundamental Theorem of Abelian groups.

Exercise. Prove that if $k < p$, $n = kp^2 + 1$ and

$$2^k \not\equiv 1, \qquad 2^{n-1} \equiv 1 \pmod n,$$

then n is a prime number. Hints: (i) First prove that n has a prime divisor congruent to 1 (mod p). Let d be the least positive integer such that $2^d \equiv 1 \pmod n$. Deduce that $d \nmid k$, $d \mid n - 1$ and $p \mid d$. Then obtain the conclusion from $p \mid d \mid \varphi(n)$. (ii) Deduce from $n = kp^2 + 1 = (up + 1)(vp + 1)$ that n cannot be composite.

Note: Taking $p = 2^{127} - 1$, $k = 180$, Miller and Wheeler proved, with the aid of a computer, that $180(2^{127} - 1)^2 + 1$ is prime. (Nature 168 (1951), 838).
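The criterion in this exercise needs only two modular exponentiations, which is why it suited an early computer. The sketch below is an illustration under stated assumptions: p is taken to be prime without being checked, the condition $2^k \not\equiv 1 \pmod n$ is included because the hint $d \nmid k$ relies on it, and the function name `certifies_prime` is hypothetical.

```python
def certifies_prime(k, p):
    """Check the hypotheses of the exercise for n = k*p^2 + 1.

    Assumes p is prime (not verified here). Returns True when
    k < p, 2^k is not congruent to 1, and 2^(n-1) is congruent to 1 (mod n),
    in which case the exercise shows that n is prime.
    """
    n = k * p * p + 1
    return k < p and pow(2, k, n) != 1 and pow(2, n - 1, n) == 1


print(certifies_prime(2, 3))               # n = 19
print(certifies_prime(2, 5))               # n = 51 = 3 * 17
print(certifies_prime(180, 2**127 - 1))    # the Miller and Wheeler number
```

A result of False says only that the hypotheses fail; it does not by itself decide whether n is composite.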



The least primitive roots for primes less than 5000. An asterisk indicates that 10 is a primitive root.

p    p-1    g    p    p-1    g    p    p-1    g

3 5 7* 11 13 17* 19* 23* 29* 31 37 41 43 47* 53 59* 61* 67 71 73 79 83 89 97* 101 103 107 109* 113* 127 131* 137 139 149* 151 157 163 167* 173 179* 181* 191 193* 197 199 211 223* 227 229* 233* 239

2 22 2·3 2·5 22.3 24 2.3 2 2·11 22.7 2·3·5 22.3 2 23 .5 2·3·7 2·23 22.13 2·29 22.3.5 2·3·11 2·5·7 23 .3 2 2·3·13 2·41 23 .11 25 .3 22.5 2 2·3·17 2·53 22.3 3 24 .7 2.3 2.7 2·5·13 23 ·17 2·3·23 22.37 2.3.5 2 22.3.13 2.3 4 2·83 22.43 2·89 22.3 2.5 2·5·19 26 .3 22.7 2 2.3 2·11 2·3·5·7 2·3·37 2·113 22.3.19 23 ·29 2·7·17

2 2 3 2 2 3 2 5 2 3 2 6 3 5 2 2 2 2 7 5 3 2 3 5 2 5 2 6 3 3 2 3 2 2 6 5 2 5 2 2 2 19 5 2 3 2 3 2 6 3 7

241 251 257* 263* 269* 271 277 281 283 293 307 311 313* 317 331 337* 347 349 353 359 367* 373 379* 383* 389* 397 401 409 419* 421 431 433* 439 443 449 457 461* 463 467 479 487* 491* 499* 503* 509* 521 523 541* 547 557 563

24 .3.5 2.5 3 23 2·131 22 ·67 2.3 3 .5 22.3.23 23 .5.7 2·3·47 22.73 "2.3 2.17 2·5·31 23 .3.13 22.79 2·3·5·11 24.3.7 2·173 22.3.29 25 ·11 2·179 2·3·61 22.3.31 2.3 3 .7 2·191 22.97 22.3 2·11 24.5 2 23 .3.17 2·11·19 22.3.5.7 2·5·43 24.3 3 2·3·73 2·13·17 26 .7 23 .3.19 22.5.23 2·3·7·11 2·233 2·239 2.3 5 2.5.7 2 2·3·83 2·251 22 ·127 22.5.13 2.3 2.29 22.3 3 .5 2·3·7·13 22 ·139 2·281

7 6 3 5 2 6 5 3 3 2 5 17 10 2 3 10 2 2 3 7 6 2 2 5 2 5 3 21 2 2 7 5 15 2 3 13 2 3 2 13 3 2 7 5 2 3 2 2 2 2 2

569 571* 577* 587 593* 599 601 607 613 617 619* 631 641 643 647* 653 659* 661 673 677 683 691 701* 709* 719 727* 733 739 743* 751 757 761 769 773 787 797 809 811* 821* 823* 827 829 839 853 857* 859 863* 877 881 883 887*

23 .71 2·3·5·19 26 .3 2 2·293 24 .37 2·13 ·23 23 .3.5 2 2·3·101 22.3 2·17 23 .7.11 2·3·103 2.3 2.5.7 27 .5 2·3·107 2·17·19 22 ·163 2·7·47 22.3.5.11 25 .3.7 22.13 2 2·11·31 2·3·5·23 22.5 2·7 22.3.59 2·359 2.3.11 2 22.3.61 2.3 2.41 2·7·53 2.3.5 3 22.3 3 .7 22.5.19 28 .3 22 ·193 2·3·131 22 ·199 23 ·101 2.3 4.5 22.5.41 2·3·137 2·7·59 22.3 2.23 2·419 22.3.71 23 ·107 2·3·11·13 2·431 22.3.73 24.5.11 2.3 2.72 2·443

3 3 5 2 3 7 7 3 2 3 2 3 3 11 5 2 2 2 5 2 5 3 2 2 11 5 6 3 5 3 2 6 11 2 2 2 3 3 2 3 2 2 11 2 3 2 5 2 3 2 5



p

p-1

g

p

p-1

g

p

p-1

g

907 911 919 929 937* 941* 947 953* 967 971* 977* 983* 991 997 1009 1013 1019* 1021* 1031 1033* 1039 1049 1051* 1061 1063* 1069* 1087* 1091* 1093 1097* 1103* 1109* 1117 1123 1129 1151 1153* 1163 1171* 1181* 1187 1193* 1201 1213* 1217* 1223* 1229* 1231 1237 1249 1259* 1277 1279

2·3·151 2'5'7·13 2'3 3 '17 25 ·29 23 '3 2'13 22'5.47 2'11·43 23 '7'17 2·3·7·23 2·5'97 24 ,61 2·491 2.3 2-5-11 22'3'83 24 '3 2'7 22 ·11·23 2'509 22. 3· 5 ·17 2'5·103 23 .3.43 2·3·173 23 ·131 2'3'5 2'7 22'5'53 2'3 2'59 22'3'89 2'3·181 2'5·109 22'3'7'13 23 ·137 2'19·29 22 ·277 22'3 2'31 2·3·11·17 23 '3'47 2'5 2'23 27 .3 2 2'7·83 2'3 2'5'13 22'5'59 2·593 22 ·149 24 '3'5 2 22.3'101 26 ·19 2·13·47 22,307 2·3·5·41 22'3'103 25 .3.13 2·17'37 22'11.29 2.3 2'71

2 17 7 3 5 2 2 3 5 6 3 5 6 7 11 3 2 10 14 5 3 3 7 2 3 6 3 2 5 3 5 2 2 2 11 17 5 5 2 7 2 3 11 2 3 5 2 3 2 7 2 2 3

1283 1289 1291* 1297* 1301* 1303* 1307 1319 1321 1327* 1361 1367* 1373 1381* 1399 1409 1423 1427 1429* 1433* 1439 1447* 1451 1453 1459 1471 1481 1483 1487* 1489 1493 1499 1511 1523 1531* 1543* 1549* 1553* 1559 1567* 1571* 1579* 1583* 1597 1601 1607* 1609 1613 1619* 1621* 1627 1637 1657

2·641 23 '7.23 2'3'5'43 24 '3 4 22. 52 ·13 2·3·7·31 2·653 2·659 23 .3.5.11 2'3·13·17 24 '5'17 2·683 22'7 3 22'3'5'23 2·3·233 27'11 2'3 2'79 2·23'31 22. 3· 7 ·17 23 ·179 2'719 2'3'241 2'5 2.29 22. 3.11 2 2'3 6 2.3.5'7 2 23 '5'37 2·3·13·19 2'743 24 .3'31 22'373 2'7·107 2·5·151 2'761 2.3 2. 5 ·17 2·3·257 22'3 2'43 24 '97 2·19·41 2'3 3 '29 2·5·157 2'3'263 2·7·113 22. 3· 7 ·19 26 ,5 2 2 ·11· 73 23 '3'67 22'13'31 2·809 22'3 4 '5 2·3·271 22,409 23 ,3 2 ,23

2 6 2 10 2 6 2 13 13 3 3 5 2 2 13 3 3 2 6 3 7 3 2 2 5 6 3 2 5 14 2 2 11 2 2 5 2 3 19 3 2 3 5 11 3 5 7 3 2 2 3 2 11

1663* 1667 1669 1693 1697* 1699 1709* 1721 1723 1733 1741* 1747 1753 1759 1777* 1783* 1787 1789* 1801 1811* 1823* 1831 1847* 1861* 1867 1871 1873* 1877 1879 1889 1901 1907 1913* 1931 1933 1949* 1951 1973 1979* 1987 1993* 1997 1999 2003 2011 2017* 2027 2029* 2039 2053 2063* 2069* 2081

2·3·277 2'7 2'17 22'3'139 22'3 2'47 25 '53 2·3·283 22'7'61 23 '5'43 2·3'7·41 22'433 22'3'5'29 2· 32·97 22'3'73 2'3'293 24 '3'37 2.3 4 '11 2·19'47 22'3'149 23 '3 2.5 2 2·5·181 2·911 2'3·5·61 2·13'71 22'3'5'31 2'3'311 2'5·11'17 24 '3 2'13 22'7.67 2·3'313 25 '59 22. 32'19 2'953 23 ·239 2·5·193 22'3'7'23 22 ·487 2'3'5 2'13 22'17'29 2·23·43 2·3·331 22'3'83 22'499 2.3 3 .37 2·7·11·13 2·3·5·67 25 .3 2.7 2 ·1013 22'3'13 2 2·1019 22.3 3 '19 2·1031 22 ·11·47 25 .5'13

3 2 2 2 3 3 3 3 3 2 2 2 7 6 5 10 2 6 11 6 5 3 5 2 2 14 10 2 6 3 2 2 3 2 5 2 3 2 2 2 5 2 3 5 3 5 2 2 7 2 5 2 3



p

p-I

g

p

p-I

g

p

p-I

g

2083 2087 2089 2099* 211l 21l3* 2129 2131 2137* 2141* 2143* 2153* 2161 2179* 2203 2207* 2213 2221* 2237 2239 2243 2251* 2267 2269* 2273* 2281 2287 2293 2297* 2309* 2311 2333 2339* 2341* 2347 2351 2357 2371* 2377 2381 2383* 2389* 2393 2399 2411* 2417* 2423* 2437* 2441 2447* 2459* 2467 2473*

2·3·347 2·7·149 23 .3 2.29 2·1049 2·5·21l 26 .3 ·Il 24.7.19 2·3·5·71 23 .3.89 22.5.107 2.3 2.7.17 23 ·269 24.3 3 .5 2.3 2.1l 2 2·3·367 2·1l03 22.7.79 22.3.5.37 22.13.43 2·3·373 2·19· 59 2.3 2.5 3 2·11·103 22.34.7 25 ·71 23 .3.5.19 2.3 2.127 22.3.191 23 .7.41 22.577 2·3·5·7·1l 22. II· 53 2·7·167 22.3 2.5.13 2·3·17·23 2.5 2 .47 22 .19.31 2·3·5·79 23 .3 3 ·Il 22.5.7.17 2·3·397 22.3.199 23 .13.23 2· 11·109 2·5·241 24 ·151 2·7·173 22 .3.7.29 23 .5.61 2 ·1223 2 ·1229 2.3 2.137 23 .3.103

2 5 7 2 7 5 3 2 10 2 3 3 23 7 5 5 2 2 2 3 2 7 2 2 3 7 19 2 5 2 3 2 2 7 3 13 2 2 5 3 5 2 3 II 6 3 5 2 6 5 2 2 5

2477 2503 2521 2531 2539* 2543* 2549* 2551 2557 2579* 2591 2593* 2609 2617* 2621* 2633* 2647 2657* 2659 2663* 2671 2677 2683 2687* 2689 2693 2699* 2707 271l 2713* 2719 2729* 2731 2741* 2749 2753* 2767* 2777* 2789* 2791 2797 2801 2803 2819* 2833* 2837 2843 2851* 2857 2861* 2879 2887 2897*

22.619 2.3 2.139 23 .3 2.5.7 2·5·1l·23 2.3 3 .47 2·31·41 4.7 2.13 2.3.5 3 .17 22.3 2.71 2·1289 2·5·7·37 25 .3 4 24.163 23 .3.109 22.5.131 23 .7.47 2.3 3 .7 2 25 .83 2·3·443 2·1l 3 2·3·5·89 22.3.223 2.3 2.149 2·17·79 27 .3.7 22.673 2·19·71 2·3·1l·41 2·5·271 23 .3.113 2.3 2.151 23 .11.31 2·3·5·7·13 22.5.137 22.3.229 26 .43 2·3·461 23 .347 22.17.41 2.3 2.5.31 22.3.233 24.5 2.7 2·3·467 2 ·1409 24.3.59 22 ·709 2.7 2.29 2.3.5 2.19 23 .3.7.17 22.5. II· 13 2 ·1439 2·3·13·37 24.181

2 3 17 2 2 5 2 6 2 2 7 7 3 5 2 3 3 3 2 5 7 2 2 5 19 2 2 2 7 5 .3 3 3 2 6 3 3 3 2 6 2 3 2 2 5 2 2 2 II 2 7 5 3

2903* 2909* 2917 2927* 2939* 2953 2957 2963 2969 2971* 2999 3001 301l* 3019* 3023* 3037 3041 3049 3061 3067 3079 3083 3089 3109 3119 3121 3137* 3163 3167* 3169 3181 3187 3191 3203 3209 3217 3221* 3229 3251* 3253 3257* 3259* 3271 3299*' 3301* 3307 3313* 3319 3323 3329 3331* 3343* 3347

2·1451 22.727 22.3 6 2·7·1l·19 2·13·1l3 23 .3 3 .41 22.739 2 ·1481 23 .7.53 2.3 3 .5 ·Il 2 ·1499 23 .3.5 3 2·5·7·43 2·3·503 2 ·151l 22·3·1l·23 25 .5.19 23 .3.127 22.3 2.5.17 2·3·7·73 2.3 4.19 2·23·67 24 ·193 22.3.7.37 2·1559 24.3.5.13 26 .7 2 2·3·17·31 2·1583 22·3 2.1l 22.3.5.53 2.3 3 .59 2·5·1l·29 2·1601 23 .401 24.3.67 22.5.7.23 22.3.269 2.5 3 .13 22.3.271 23 .11.37 2·3·181 2·3·5·109 2·17·97 22.3. 52. II 2·3·19·29 24.3 2.23 2·3·7·79 2·1l·151 28 .13 2.3 2 .5.37 2·3·557 2·7·239

5 2 5 5 2 13 2 2 3 10 17 14 2 2 5 2 3 II 6 2 6 2 3 6 7 7 3 3 5 7 7 2 II 2 3 5 10 6 6 2 3 3 3 2 6 2 10 6 2 3 3 5 2



p

p-1

g

P

p-1

g

3359 3361 3371* 3373 3389* 3391 3407* 3413 3433* 3449 3457 3461* 3463* 3467 3469* 3491 3499 3511 3517 3527* 3529 3533 3539* 3541 3547 3557 3559 3571* 3581* 3583 3593* 3607* 3613 3617* 3623* 3631 3637 3643 3659* 3671 3673* 3677 3691 3697 3701* 3709* 3719 3727* 3733 3739 3761 3767 3769

2·23·73 25 '3.5.7 2·5·337 22.3.281 22. 7 .11 2 2·3·5·113 2·13·131 22'853 23 • 3 ·11·13 23 ·431 27 ,3 3 22'5'173 2·3·577 2·1733 22. 3 .17 2 2'5'349 2·3·11·53 2'3 3 '5'13 22'3'293 2·41·43 23 ,3 2,7 2 22 ·883 2·29·61 22'3.5'59 2'3 2.197 22.7.127 2·3'593 2·3·5'7·17 22.5.179 2.3 2.199 23 '449 2·3·601 22'3'7'43 25 '113 2'1811 2.3'5.11 2 22. 32. 101 2·3·607 2·31'59 2·5'367 23 '3 3 '17 22'919 2'3 2'5'41 24 .3'7.11 22'5 2'37 22 • 32·103 2.11.13 2 2· 34 ·23 22'3'311 2·3·7·89 24 '5'47 2'7·269 23 '3'157

11 22 2 5 3 3 5 2 5 3 7 2 3 2 2 2 2 7 "2 5 17 2 2 7 2 2 3 2 2 3 3 5 2 3 5 21 2 2 2 13 5 2 2 5 2 2 7 3 2 7 3 5 7

3779* 3793 3797 3803 3821* 3823 3833* 3847* 3851* 3853 3863* 3877 3881 3889 3907 3911 3917 3919 3923 3929 3931 3943* 3947 3967* 3989* 4001 4003 4007* 4013 4019* 4021 4027 4049 4051* 4057* 4073* 4079 4091* 4093 4099 4111 4127 4129 4133 4139* 4153* 4157 4159 4177* 4201 4211* 4217* 4219*

2 '1889 24 '3'79 22 ·13· 73 2 ·1901 22'5'191 2'3'7 2'13 23 '479 2·3·641 2· 52. 7·11 22. 32·107 2·1931 22'3'17'19 23 '5'97 24 '3 5 2.3 2'7'31 2·5·17·23 22 ·11· 89 2·3·653 2·37'53 23 ·491 2·3·5·131 2'3 3 '73 2 ·1973 2·3·661 22 ·997 25 .5 3 2·3·23·29 2·2003 22'17'59 2'7 2'41 22'3'5'67 2·3·11·61 24 .11.23 2.3 4 '5 2 23 • 3 '13 2 23 '509 2·2039 2'5·409 22.3.11'31 2·3·683 2·3·5·137 2·2063 25 • 3 ·43 22 ·1033 2·2069 23 .3.173 22 ·1039 2.3 3 . 7'11 24 .3 2'29 23 '3'5 2'7 2'5'421 23 '17'31 2·3·19'37

2 5 2 2 3 3 3 5 2 2 5 2 13 11 2 13 2 3 2 3 2 3 2 6 2 3 2 5 2 2 2 3 3 10 5 3 11 2 2 2 17 5 13 2 2 5 2 3 5 11 6 3 2

p

4229* 4231 4241 4243 4253 4259* 4261* 4271 4273 4283 4289 4297 4327* 4337* 4339* 4349* 4357 4363 4373 4391 4397 4409 4421* 4423* 4441 4447* 4451* 4457* 4463* 4481 4483 4493 4507 4513 4517 4519 4523 4547 4549 4561 4567* 4583* 4591 " 4597 4603 4621 4637 4639 4643 4649 4651* 4657 4663

p-1

g

22'7'151 2'3 2'5'47 24 '5'53 2·3· 7 ·101 22 ·1063 2'2129 22'3.5'71 2'5'7·61 24 .3.89 2·2141 26 '67 23 '3'179 2·3'7·103 24 .271 2'3 2.241 22 '1087 22. 32.11 2 2·3·727 22 ·1093 2·5·439 22'7'157 23 ·19·29 22. 5·13 ·17 2· 3 ·11· 67 23 • 3· 5· 37, 2'3 2'13'19 2'5 2'89 23 '557 2·23·97 27 '5'7 2'3 3 '83 22 ·1123 2·3'751 25 .3'47 22 ·1129 2.3 2.251 2·7'17·19 2'2273 22'3'379 24 .3.5'19 2'3'761 2·29'79 2· 33 • 5 ·17 22'3'383 2·3 ·13'59 22 .3.5'7.11 22'19'61 2'3'773 2·11·211 23 '7'83 2'3'5 2'31 24 .3'97 2.3 2'7.37

2 3 3 2 2 2 2 7 5 2 3 5 3 3 10 2 2 2 2 14 2 3 3 3 21 3 2 3 5 3 2 2 2 7 2 3 5 2 6 11 3 5 11

5 2 2 2 3 5 3 3 15 3



p       p-1              g     p       p-1              g     p       p-1              g
4673*   2^6·73           3     4793*   2^3·599          3     4931*   2·5·17·29        6
4679    2·2339           11    4799    2·2399           7     4933    2^2·3^2·137      2
4691*   2·5·7·67         2     4801    2^6·3·5^2        7     4937*   2^3·617          3
4703*   2·2351           5     4813    2^2·3·401        2     4943*   2·7·353          7
4721    2^4·5·59         6     4817*   2^4·7·43         3     4951    2·3^2·5^2·11     6
4723    2·3·787          2     4831    2·3·5·7·23       3     4957    2^2·3·7·59       2
4729    2^3·3·197        17    4861    2^2·3^5·5        11    4967*   2·13·191         5
4733    2^2·7·13^2       5     4871    2·5·487          11    4969    2^3·3^3·23       11
4751    2·5^3·19         19    4877    2^2·23·53        2     4973    2^2·11·113       2
4759    2·3·13·61        3     4889    2^3·13·47        3     4987    2·3^2·277        2
4783*   2·3·797          6     4903    2·3·19·43        3     4993    2^7·3·13         5
4787    2·2393           2     4909    2^2·3·409        6     4999    2·3·7^2·17       3
4789    2^2·3^2·7·19     2     4919    2·2459           13
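Entries of the table can be recomputed directly. The following is a minimal sketch, not the method used to compile the table: it relies on the standard test that g is a primitive root of p exactly when $g^{(p-1)/r} \not\equiv 1 \pmod p$ for every prime divisor r of p - 1, and the function names are chosen here for illustration.

```python
def prime_factors(n):
    """Distinct prime factors of n, found by trial division."""
    factors, d = [], 2
    while d * d <= n:
        if n % d == 0:
            factors.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def is_primitive_root(g, p):
    """True if g is a primitive root of the odd prime p."""
    return g % p != 0 and all(
        pow(g, (p - 1) // r, p) != 1 for r in prime_factors(p - 1)
    )


def least_primitive_root(p):
    """Smallest positive primitive root of the odd prime p (column g)."""
    return next(g for g in range(1, p) if is_primitive_root(g, p))


print(prime_factors(4942))            # prime divisors of 4943 - 1
print(least_primitive_root(191))      # column g for p = 191
print(is_primitive_root(10, 7))       # asterisk column for p = 7
```

The asterisk column corresponds to `is_primitive_root(10, p)`; for instance 7 carries an asterisk in the table while 13 does not.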

Chapter 4. Properties of Polynomials

4.1 The Division of Polynomials

We consider polynomials $f(x)$ with rational coefficients and we denote by $\partial^\circ f$ the degree of the polynomial.

Definition 1.1. Let $f(x)$ and $g(x)$ be two polynomials with $g(x)$ not identically zero. If there is a polynomial $h(x)$ such that $f(x) = g(x)h(x)$, then we say that $g(x)$ divides $f(x)$, and we write $g(x) \mid f(x)$ or $g \mid f$. If $g(x)$ does not divide $f(x)$, then we write $g \nmid f$.

Clearly we have the following: (i) $f \mid f$; (ii) if $f \mid g$ and $g \mid f$, then f and g differ only by a constant divisor, and we call them associated polynomials; (iii) if $f \mid g$ and $g \mid h$, then $f \mid h$; (iv) if $f \mid g$, then $\partial^\circ f \le \partial^\circ g$. If $f \mid g$ and $g \nmid f$, then we call f a proper divisor of g and it is easy to see that, in this case, $\partial^\circ f < \partial^\circ g$.

Theorem 1.1. Let $f(x)$ and $g(x)$ be any two polynomials with $g(x)$ not identically zero. Then there are two polynomials $q(x)$ and $r(x)$ such that $f = q \cdot g + r$, where either $r = 0$ or $\partial^\circ r < \partial^\circ g$.

Proof. We prove this by induction on the degree of f. If $\partial^\circ f < \partial^\circ g$, then we can take $q = 0$, $r = f$. If $\partial^\circ f \ge \partial^\circ g$, we let

$$f = \alpha_n x^n + \cdots, \qquad g = \beta_m x^m + \cdots, \qquad \partial^\circ f = n, \quad \partial^\circ g = m,$$

so that

$$\partial^\circ\!\left(f - \frac{\alpha_n}{\beta_m} x^{n-m} g\right) < n.$$

From the induction hypothesis, there are two polynomials $h(x)$ and $r(x)$ such that

$$f - \frac{\alpha_n}{\beta_m} x^{n-m} g = hg + r,$$

where either $r = 0$ or $\partial^\circ r < \partial^\circ g$. We now put

$$q = h + \frac{\alpha_n}{\beta_m} x^{n-m},$$

so that $f = qg + r$ as required. □
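The induction in the proof of Theorem 1.1 is ordinary long division: subtract a multiple of g that cancels the leading term, then repeat. The sketch below is one way to carry this out with exact rational arithmetic; the representation (coefficient lists with the highest degree first and a non-zero leading coefficient) and the name `poly_divmod` are assumptions made here for illustration.

```python
from fractions import Fraction


def poly_divmod(f, g):
    """Divide f by g over the rationals.

    f and g are coefficient lists, highest degree first, and g[0] != 0.
    Returns (q, r) with f = q*g + r, where r is [] (zero) or deg r < deg g.
    """
    f = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 0)
    r = f[:]
    while len(r) >= len(g):
        c = r[0] / g[0]                          # alpha_n / beta_m in the proof
        q[len(q) - (len(r) - len(g)) - 1] = c    # coefficient of x^(n-m)
        for i in range(len(g)):
            r[i] -= c * g[i]
        r.pop(0)                                 # the leading term has cancelled
    while r and r[0] == 0:                       # strip leading zeros of r
        r.pop(0)
    return q, r


# (x^3 - 1) = (x^2 + x + 1)(x - 1) + 0
print(poly_divmod([1, 0, 0, -1], [1, -1]))
# (x^2 + 1) = (x/2 - 1/4)(2x + 1) + 5/4
print(poly_divmod([1, 0, 1], [2, 1]))
```

Using `Fraction` keeps the coefficients rational, matching the setting of this section, where division by the leading coefficient $\beta_m$ is always possible.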


Definition 1.2. By an ideal we mean a set I of polynomials satisfying the following conditions: (i) If $f, g \in I$, then $f + g \in I$; (ii) If $f \in I$ and h is any polynomial, then $fh \in I$.

Example. The multiples of a fixed polynomial $f(x)$ form an ideal.

Theorem 1.2. Given any ideal I, there exists a polynomial $f \in I$ such that any polynomial in I is a multiple of f; that is, I is the ideal of the set of multiples of f.

Proof. Let f be a polynomial in I with the least degree. If g is a polynomial in I which is not a multiple of f, then, according to Theorem 1.1, there are polynomials $q(x)$ and $r(x)$ ($r \ne 0$) such that

$$g = qf + r, \qquad \partial^\circ r < \partial^\circ f.$$

Since $f \in I$, it follows from (ii) that $qf \in I$, and hence from (i) that $g - qf \in I$, that is $r \in I$. But this contradicts the minimal degree property of f. The theorem is proved. □

Definition 1.3. Let f and g be two polynomials. Consider the set of polynomials of the form $mf + ng$ where m, n are polynomials. From Theorem 1.2 we see that this set is identical with the set of polynomials which are multiples of a polynomial d. We call this polynomial d the greatest common divisor of f and g, and we write $(f, g) = d$. For the sake of uniqueness we shall take the leading coefficient of $(f, g)$ to be 1, that is, a monic polynomial.

Theorem 1.3. The greatest common divisor $(f, g)$ has the following properties: (i) There are two polynomials m, n such that $(f, g) = mf + ng$; (ii) For every pair of polynomials m, n we have $(f, g) \mid mf + ng$; (iii) If $l \mid f$ and $l \mid g$, then $l \mid (f, g)$. □

Definition 1.4. If $(f, g) = 1$, then we say that f and g are coprime.

Theorem 1.4. Let p be an irreducible polynomial. If $p \mid fg$, then either $p \mid f$ or $p \mid g$.

Proof. If $p \nmid f$, then $(f, p) = 1$. Thus, from Theorem 1.3 there are polynomials m, n such that $mf + np = 1$ so that $mfg + ngp = g$. Since $p \mid fg$, it follows that $p \mid g$. □

4.2 The Unique Factorization Theorem

Theorem 2.1. Any polynomial can be factorized into a product of irreducible polynomials. If associated polynomials are treated as identical, then, apart from the ordering of the factors, this factorization is unique. □

The theorem can be proved by mathematical induction on the degree of the polynomial.

Theorem 2.2. Let $f(x)$ and $g(x)$ be two polynomials with rational coefficients, and let $f(x)$ be irreducible. Suppose that $f(x) = 0$ and $g(x) = 0$ have a common root. Then $f(x) \mid g(x)$.

Proof. Since f and g have a common zero, it follows that $(f, g) \ne 1$. Let $d(x)$ be the greatest common factor of $f(x)$ and $g(x)$. Then $d(x)$ and $f(x)$ are associated polynomials, because $f(x)$ is irreducible. Therefore $f(x) \mid g(x)$. □

From this theorem we deduce the following: If $f(x)$ is an irreducible polynomial of degree n, then the zeros

$$\theta^{(1)}, \ldots, \theta^{(n)}$$

are distinct. Moreover, if $\theta^{(i)}$ is a zero of another polynomial $g(x)$ with rational coefficients, then the other $n - 1$ numbers are also zeros of $g(x)$.

Theorem 2.3. Let f and g be monic polynomials:

$$f = p_1^{a_1} \cdots p_s^{a_s}, \qquad g = p_1^{b_1} \cdots p_s^{b_s}, \qquad a_\nu \ge 0, \ b_\nu \ge 0,$$

where the $p_\nu$ are distinct irreducible monic polynomials. Then

$$(f, g) = p_1^{c_1} \cdots p_s^{c_s},$$

where $c_\nu = \min(a_\nu, b_\nu)$. □

Definition 2.1. Let f and g be two polynomials. Polynomials which are divisible by both f and g are called common multiples of f and g. Those common multiples which have the least degree are called the least common multiples, and we denote by $[f, g]$ the monic least common multiple.

Theorem 2.4. Under the same hypothesis as Theorem 2.3 we have

$$[f, g] = p_1^{d_1} \cdots p_s^{d_s},$$

where $d_\nu = \max(a_\nu, b_\nu)$. □

From this we deduce:

Theorem 2.5. A least common multiple divides every common multiple. □

Theorem 2.6. Let f, g be monic polynomials. Then $fg = [f, g](f, g)$. □



4.3 Congruences

Let $m(x)$ be a polynomial. If $m(x) \mid f(x) - g(x)$, then we say that $f(x)$ is congruent to $g(x)$ modulo $m(x)$ and we write

$$f(x) \equiv g(x) \pmod{m(x)}.$$

With respect to any modulus $m(x)$ we have: (i) $f \equiv f \pmod m$; (ii) if $f \equiv g \pmod m$, then $g \equiv f \pmod m$; (iii) if $f \equiv g$, $g \equiv h \pmod m$, then $f \equiv h \pmod m$; (iv) if $f \equiv g$, $f_1 \equiv g_1 \pmod m$, then $f \pm f_1 \equiv g \pm g_1$, $ff_1 \equiv gg_1 \pmod m$.

Being congruent is an equivalence relation which partitions the set of polynomials into equivalence classes. From (iv) we see that addition and multiplication can be defined on these classes. We denote by 0 the class whose members are divisible by $m(x)$. If $m(x)$ is irreducible we can even define division on the set of equivalence classes (except by 0, of course). Specifically, if $f(x)$ is not a multiple of $m(x)$, then there are polynomials $a(x)$, $b(x)$ such that $a(x)f(x) + b(x)m(x) = 1$, which means that there is a polynomial $a(x)$ such that $a(x)f(x) \equiv 1 \pmod{m(x)}$. We state this as a theorem.

Theorem 3.1. Let $m(x)$ be irreducible. Then any non-zero equivalence class has a reciprocal. That is, if A is a non-zero equivalence class, then there exists a class B such that for any polynomials $f(x)$ and $g(x)$ in A and B respectively we have $f(x)g(x) \equiv 1 \pmod{m(x)}$. □

We now give an example to illustrate the ideas in this section. Let $m(x) = x^2 + 1$, an irreducible polynomial. Each equivalence class contains a unique polynomial $ax + b$ which we may take as the representative. The addition and subtraction of classes is given by $ax + b \pm (a_1x + b_1) = (a \pm a_1)x + (b \pm b_1)$. Multiplication is given by

$$(ax + b)(a_1x + b_1) = aa_1x^2 + (ab_1 + a_1b)x + bb_1 \equiv (ab_1 + a_1b)x + bb_1 - aa_1 \pmod{x^2 + 1}.$$

Using the ordered pair $(a, b)$ to denote the class containing $ax + b$ we then have

$$(a, b) \pm (a_1, b_1) = (a \pm a_1, b \pm b_1), \qquad (a, b)(a_1, b_1) = (ab_1 + ba_1, \ bb_1 - aa_1).$$

From

$$(ax + b)(-ax + b) \equiv a^2 + b^2 \pmod{x^2 + 1},$$

we see that the inverse of $(a, b)$ is $\left(\dfrac{-a}{a^2 + b^2}, \dfrac{b}{a^2 + b^2}\right)$. In other words we have the arithmetic of the complex number $ai + b$.

Extending the idea here, if $m(x)$ is a monic polynomial of degree n, then each equivalence class possesses a unique polynomial with degree less than n, say

$$a_1x^{n-1} + a_2x^{n-2} + \cdots + a_n,$$

and the arithmetic of the congruence modulo $m(x)$ becomes the arithmetic of these polynomials. The sum of two such polynomials is obtained by adding the corresponding coefficients, and the product is the ordinary product polynomial reduced modulo $m(x)$.

Exercise 1. Let $\alpha_1, \alpha_2, \alpha_3$ be distinct. Determine a quadratic polynomial $f(x)$ satisfying $f(\alpha_1) = \beta_1$, $f(\alpha_2) = \beta_2$, $f(\alpha_3) = \beta_3$.

Answer: The Lagrange interpolation formula

$$f(x) = \beta_1 \frac{(x - \alpha_2)(x - \alpha_3)}{(\alpha_1 - \alpha_2)(\alpha_1 - \alpha_3)} + \beta_2 \frac{(x - \alpha_3)(x - \alpha_1)}{(\alpha_2 - \alpha_3)(\alpha_2 - \alpha_1)} + \beta_3 \frac{(x - \alpha_1)(x - \alpha_2)}{(\alpha_3 - \alpha_1)(\alpha_3 - \alpha_2)}.$$

Exercise 2. Let $m_1(x)$ and $m_2(x)$ be two non-associated irreducible polynomials. Let $f_1(x)$ and $f_2(x)$ be two given polynomials. Prove that there exists a polynomial $f(x)$ such that $f(x) \equiv f_i(x) \pmod{m_i(x)}$, $i = 1, 2$.
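The worked example of this section, arithmetic of classes modulo $x^2 + 1$, can be written out in a few lines. This is a sketch for illustration; the pair $(a, b)$ stands for the class of $ax + b$ as in the text, while the function names `mul` and `inverse` are chosen here.

```python
from fractions import Fraction


def mul(u, v):
    """Product of the classes (a, b) and (a1, b1) modulo x^2 + 1."""
    (a, b), (a1, b1) = u, v
    return (a * b1 + b * a1, b * b1 - a * a1)


def inverse(u):
    """Inverse of a non-zero class (a, b) modulo x^2 + 1."""
    a, b = Fraction(u[0]), Fraction(u[1])
    norm = a * a + b * b
    return (-a / norm, b / norm)


u, v = (2, 3), (1, -4)            # the classes of 2x + 3 and x - 4
print(mul(u, v))                  # compare with (3 + 2i)(-4 + i) = -14 - 5i
print(mul(u, inverse(u)))         # the unit class (0, 1)
```

The first product is the pair (-5, -14), that is the class of $-5x - 14$, in agreement with the complex product $-14 - 5i$ under the correspondence $ax + b \leftrightarrow ai + b$.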

4.4 Integer Coefficients Polynomials

It is clear that the set of integer coefficients polynomials is closed with respect to addition, subtraction and multiplication. A set of integer coefficients polynomials is called an ideal if (i) $f + g$ belongs to the set whenever f and g belong to the set, (ii) $fg$ belongs to the set whenever f belongs to the set, and g is any integer coefficients polynomial.

Theorem 4.1. (Hilbert) Every ideal A possesses a finite number of polynomials $f_1, \ldots, f_n$ with the following property: Every polynomial $f \in A$ is representable as $f = g_1f_1 + \cdots + g_nf_n$ where $g_1, \ldots, g_n$ are integer coefficients polynomials.

Proof. 1) Denote by B the set of leading coefficients of members of A. We claim that B forms an integral modulus. To see this, we observe that if $a, b \in B$, where $f(x) = ax^n + \cdots$, $g(x) = bx^m + \cdots$, then by (ii) we know that $f(x)x^m$, $g(x)x^n \in A$ so that

$$f(x)x^m \pm g(x)x^n = (a \pm b)x^{m+n} + \cdots$$

are in A. Therefore $a \pm b \in B$ which proves our claim. From Theorem 1.4.3 members of B are multiples of an integer d. Let the corresponding polynomial with leading coefficient d be

$$f_1 = dx^l + d_1x^{l-1} + \cdots.$$

2) Let $f \in A$. Then there are two polynomials $q(x)$ and $r(x)$ such that $f(x) = q(x)f_1(x) + r(x)$ where $\partial^\circ r < \partial^\circ f_1$ or $r = 0$. This is certainly so if the degree of f is less than that of $f_1$. If $f(x) = ax^n + \cdots + a_n$ ($n \ge l$), then by 1) we see that $d \mid a$, and

$$f(x) - \frac{a}{d}\,x^{n-l}f_1(x)$$

is a polynomial with degree at most $n - 1$. If the degree here is greater than or equal to l, then its leading coefficient is again divisible by d. Continuing the argument we see that our claim is valid.

3) If every member of A has degree at least l, then the theorem is proved. Otherwise we let d' be the greatest common divisor of the leading coefficients of members of A whose degrees are less than l, and we let

$$f_2 = d'x^{l'} + d'_1x^{l'-1} + \cdots \qquad (d \mid d')$$

be the corresponding polynomial in A. From the above, we see that members of A whose degree lies between l' and l can be written as $f(x) = q(x)f_2(x) + r(x)$ where $\partial^\circ r < \partial^\circ f_2$ or $r = 0$. Continuing this argument the theorem is proved. □

4.5 Polynomial Congruences with a Prime Modulus

In this section all the polynomials have integer coefficients and p is a fixed prime number.

Definition 5.1. If the corresponding coefficients of two polynomials $f(x)$ and $g(x)$ differ by multiples of p, then we say that $f(x)$ and $g(x)$ are congruent modulo p, and we write $f(x) \equiv g(x) \pmod p$. By the degree $\partial^\circ f$ of $f(x)$ modulo p we mean the highest degree of $f(x)$ whose coefficient is not a multiple of p.

For example $7x^2 + 16x + 9 \equiv 2x + 2 \pmod 7$, and $\partial^\circ(7x^2 + 16x + 9) = 1 \pmod 7$. But with respect to the modulus 3, $\partial^\circ(7x^2 + 16x + 9) = 2$.

Clearly we have (i) $f(x) \equiv f(x) \pmod p$; (ii) if $f \equiv g \pmod p$, then $g \equiv f \pmod p$; (iii) if $f \equiv g$, $g \equiv h \pmod p$, then $f \equiv h \pmod p$; (iv) if $f \equiv g$, $f_1 \equiv g_1 \pmod p$, then $f \pm f_1 \equiv g \pm g_1$ and $ff_1 \equiv gg_1 \pmod p$. We note particularly that

$$(f(x))^p \equiv f(x^p) \pmod p.$$
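The congruence $(f(x))^p \equiv f(x^p) \pmod p$ can be checked on a concrete polynomial by multiplying out with coefficients reduced modulo p. The following sketch is an illustration only; polynomials are stored as coefficient lists with the lowest degree first, and all function names are chosen here.

```python
def poly_mul_mod(f, g, p):
    """Product of f and g with coefficients reduced mod p (lowest degree first)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return out


def poly_pow_mod(f, n, p):
    """n-th power of f, coefficients reduced mod p."""
    result = [1]
    for _ in range(n):
        result = poly_mul_mod(result, f, p)
    return result


def substitute_power(f, p):
    """Coefficients of f(x^p), reduced mod p."""
    out = [0] * ((len(f) - 1) * p + 1)
    for i, a in enumerate(f):
        out[i * p] = a % p
    return out


f = [9, 16, 7, 1]                 # x^3 + 7x^2 + 16x + 9
for p in (2, 3, 5, 7):
    assert poly_pow_mod(f, p, p) == substitute_power(f, p)
```

Both sides are compared as full coefficient lists of length $3p + 1$, so the check covers every coefficient, including those that vanish modulo p.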

Definition 5.2. Let $f(x)$ and $g(x)$ be polynomials with $g(x)$ not identically zero mod p. If there is a polynomial $h(x)$ such that $f(x) \equiv h(x)g(x) \pmod p$, then we say that $g(x)$ divides $f(x)$ modulo p. We call $g(x)$ a divisor of $f(x)$ modulo p, and we write $g(x) \mid f(x) \pmod p$.

Example. From $x^5 + 3x^4 - 4x^3 + 2 \equiv (2x^2 - 3)(3x^3 - x^2 + 1) \pmod 5$ we see that $2x^2 - 3 \mid x^5 + 3x^4 - 4x^3 + 2 \pmod 5$.

We have the following: (i) $f(x) \mid f(x) \pmod p$; (ii) if $f(x) \mid g(x)$ and $g(x) \mid f(x) \pmod p$, then $f(x)$ and $g(x)$ differ only by a constant factor; that is, there exists an integer a such that $f(x) \equiv ag(x) \pmod p$. In this case we say that $f(x)$ and $g(x)$ are associated modulo p. It is easy to see that every polynomial has $p - 1$ associates modulo p. Moreover, there is a unique monic associated polynomial. (iii) If $f \mid g$, $g \mid h \pmod p$, then $f \mid h \pmod p$. (iv) Let $f(x)$ and $g(x)$ be two polynomials with $g(x)$ not identically zero modulo p. Then there are two polynomials $q(x)$ and $r(x)$ such that $f(x) \equiv q(x)g(x) + r(x) \pmod p$, where either $\partial^\circ r < \partial^\circ g$, or $r(x) \equiv 0 \pmod p$.

Definition 5.3. If a polynomial $f(x)$ cannot be factorized into a product of two polynomials with smaller degrees mod p, then we say that $f(x)$ is an irreducible polynomial mod p, or that $f(x)$ is prime mod p.

Example. We take $p = 3$. There are three non-associated linear polynomials, namely $x$, $x + 1$, $x + 2$, which are irreducible. There are nine non-associated quadratic polynomials, namely $x^2$, $x^2 + x$, $x^2 + 2x$, $x^2 + 1$, $x^2 + x + 1$, $x^2 + 2x + 1$, $x^2 + 2$, $x^2 + x + 2$, $x^2 + 2x + 2$. Of these there are 6 ($= (x + a)(x + b)$) which are reducible, and the three irreducible ones are $x^2 + 1$, $x^2 + x + 2$, $x^2 + 2x + 2$.
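The count in the example with p = 3 can be reproduced by listing all products $(x + a)(x + b)$ and removing them from the nine monic quadratics. The sketch below is an illustration; coefficient triples $(1, b, c)$ stand for $x^2 + bx + c$, and the names used are chosen here.

```python
from itertools import product

p = 3
linear = [(1, a) for a in range(p)]                                  # x + a
quadratics = [(1, b, c) for b, c in product(range(p), repeat=2)]     # x^2 + bx + c


def times(u, v):
    """(x + a)(x + b) mod p as the triple (1, a + b, ab)."""
    return (1, (u[1] + v[1]) % p, (u[1] * v[1]) % p)


reducible = {times(u, v) for u in linear for v in linear}
irreducible = [f for f in quadratics if f not in reducible]
print(len(reducible), irreducible)
# prints: 6 [(1, 0, 1), (1, 1, 2), (1, 2, 2)]
```

The three triples are $x^2 + 1$, $x^2 + x + 2$ and $x^2 + 2x + 2$, as stated in the example.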

We note that if a polynomial whose leading coefficient is not divisible by p is irreducible mod p, then it is irreducible, and from this we deduce that $x^2 + 2x + 2$ has no rational zeros. The determination of the number of irreducible polynomials mod p of degree n is an interesting problem which we shall solve in §9.

Theorem 5.1. Any polynomial can be written as a product of irreducible polynomials mod p, and this product representation is unique apart from associates and ordering of the factors. □

We can define, similarly to §1, the greatest common divisor and the least common multiple. If we denote by $(f, g)$ the monic greatest common divisor, then we have

Theorem 5.2. Given polynomials $f(x)$ and $g(x)$, there are polynomials $m(x)$ and $n(x)$ such that $m(x)f(x) + n(x)g(x) \equiv (f(x), g(x)) \pmod p$. □

4.6 On Several Theorems Concerning Factorizations

Definition 6.1. Let $f(x) = a_n x^n + a_{n-1}x^{n-1} + \cdots$ be a polynomial. The polynomial $n a_n x^{n-1} + (n - 1)a_{n-1}x^{n-2} + \cdots$ is called the derivative of $f(x)$ and is denoted by $f'(x)$.

Clearly we have $(f(x) + g(x))' = f'(x) + g'(x)$, and it is not difficult to prove that $(f(x)g(x))' = f'(x)g(x) + g'(x)f(x)$.

Definition 6.2. If a polynomial $f(x)$ is divisible by the square of a non-constant polynomial mod $p$, then we say that $f(x)$ has repeated factors mod $p$. For example, $x^5 + x^4 - x^3 - x^2 + x + 1$ has the repeated factor $(x^2 + 1)^2$ modulo 3.

Theorem 6.1. A necessary and sufficient condition for $f(x)$ to have repeated factors is that the degree of $(f(x), f'(x))$ is at least 1. □

Theorem 6.2. If $p \nmid n$, then $x^n - 1$ has no repeated factors mod $p$. □

Theorem 6.3. Let $(m, n) = d$. Then $(x^m - 1, x^n - 1) = x^d - 1$. □
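Theorems 6.1 and 6.3 lend themselves to a direct computation with the Euclidean algorithm mod $p$. The sketch below is an illustration only; the representation (coefficient lists, lowest degree first) and all function names are assumptions made here, not notation from the text.

```python
def trim(f, p):
    """Reduce coefficients mod p and drop leading zero coefficients."""
    f = [c % p for c in f]
    while f and f[-1] == 0:
        f.pop()
    return f


def poly_rem(f, g, p):
    """Remainder of f on division by g modulo the prime p."""
    r, g = trim(f, p), trim(g, p)
    inv = pow(g[-1], -1, p)  # inverse of the leading coefficient of g
    while len(r) >= len(g):
        shift = len(r) - len(g)
        c = (r[-1] * inv) % p
        for i, a in enumerate(g):
            r[i + shift] = (r[i + shift] - c * a) % p
        r = trim(r, p)
    return r


def poly_gcd(f, g, p):
    """Monic greatest common divisor of f and g modulo p."""
    f, g = trim(f, p), trim(g, p)
    while g:
        f, g = g, poly_rem(f, g, p)
    inv = pow(f[-1], -1, p)
    return [(c * inv) % p for c in f]


def derivative(f, p):
    return trim([i * c for i, c in enumerate(f)][1:], p)


# f = x^5 + x^4 - x^3 - x^2 + x + 1, the example of Definition 6.2 (p = 3)
f = [1, 1, -1, -1, 1, 1]
print(poly_gcd(f, derivative(f, 3), 3))          # coefficients of x^2 + 1
# Theorem 6.3 with m = 6, n = 4, d = 2, computed mod 5
print(poly_gcd([-1, 0, 0, 0, 0, 0, 1], [-1, 0, 0, 0, 1], 5))
```

The first gcd has degree 2, which agrees with the repeated factor $x^2 + 1$; the second is $x^2 - 1$ written with coefficients reduced mod 5.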

Theorem 6.4. Let (m, n) = d. Then

4.7 Double Moduli Congruences

Definition 7.1. Let $p$ be a prime number and $\varphi(x)$ be a polynomial. If $f_1(x) - f_2(x)$ is a multiple of $\varphi(x)$ mod $p$, then we say that $f_1$ and $f_2$ are congruent to the double moduli $p$, $\varphi(x)$ and we write

$$f_1(x) \equiv\equiv f_2(x) \pmod{\!\!\text{d } p, \varphi(x)}.$$

For example, $x^5 + 3x^4 + x^2 + 4x + 3 \equiv\equiv 0$ (modd 5, $2x^2 - 3$).

Double moduli congruences have the following properties:
1) $f(x) \equiv\equiv f(x)$ (modd $p$, $\varphi(x)$);
2) If $f \equiv\equiv g$ (modd $p$, $\varphi$), then $g \equiv\equiv f$ (modd $p$, $\varphi$);
3) If $f \equiv\equiv g$ and $g \equiv\equiv h$ (modd $p$, $\varphi$), then $f \equiv\equiv h$ (modd $p$, $\varphi$);
4) If $f \equiv\equiv g$ and $f_1 \equiv\equiv g_1$ (modd $p$, $\varphi$), then $f \pm f_1 \equiv\equiv g \pm g_1$ and $f f_1 \equiv\equiv g g_1$ (modd $p$, $\varphi$);
5) Suppose that the degree of $\varphi(x)$ (mod $p$) is $n$. Then every polynomial is congruent to one of the following polynomials

$$a_0 + a_1 x + \cdots + a_{n-1}x^{n-1}, \qquad 0 \le a_i \le p - 1. \qquad (1)$$

It is clear that there are $p^n$ polynomials in (1), no two of them are congruent (modd $p$, $\varphi(x)$), and any polynomial must be congruent to one of them (modd $p$, $\varphi(x)$).

Definition 7.2. We call the $p^n$ polynomials in (1) a complete residue system (modd $p$, $\varphi(x)$). By discarding those polynomials which are not coprime with $\varphi(x)$ we have a reduced residue system (modd $p$, $\varphi(x)$).

Theorem 7.1. Let $(g(x), \varphi(x)) = 1$. Then, as $f(x)$ runs through a complete (or reduced) residue system (modd $p$, $\varphi(x)$), so does $f(x)g(x)$.

Proof. If $g(x)f_1(x) \equiv\equiv g(x)f_2(x)$ (modd $p$, $\varphi(x)$), then from $(g(x), \varphi(x)) = 1$ we deduce that $f_1(x) \equiv\equiv f_2(x)$ (modd $p$, $\varphi(x)$). The required result follows easily from this. □
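Reduction to the double modulus is ordinary polynomial division carried out with coefficients mod $p$. The following sketch is illustrative only: coefficient lists (lowest degree first) and the function names are choices made here. It reduces the example of Definition 7.1 and checks Theorem 7.1 for $p = 3$, $\varphi(x) = x^2 + 1$, $g(x) = x + 1$.

```python
from itertools import product


def poly_mul(f, g, p):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return out


def reduce_modd(f, phi, p):
    """Residue of f (modd p, phi) as a list of deg(phi) coefficients.

    Assumes phi is not identically zero mod p.
    """
    phi = [c % p for c in phi]
    while phi[-1] == 0:
        phi.pop()
    n = len(phi) - 1
    r = [c % p for c in f] + [0] * max(0, n - len(f))
    inv = pow(phi[-1], -1, p)
    # Eliminate the coefficients of x^k for k = deg f, ..., n in turn.
    for k in range(len(r) - 1, n - 1, -1):
        c = (r[k] * inv) % p
        for i, a in enumerate(phi):
            r[k - n + i] = (r[k - n + i] - c * a) % p
    return r[:n]


# x^5 + 3x^4 + x^2 + 4x + 3 reduced (modd 5, 2x^2 - 3)
print(reduce_modd([3, 4, 1, 0, 3, 1], [-3, 0, 2], 5))

# Theorem 7.1: multiplication by g permutes a complete residue system
p, phi, g = 3, [1, 0, 1], [1, 1]
images = {tuple(reduce_modd(poly_mul(list(f), g, p), phi, p))
          for f in product(range(p), repeat=2)}
print(len(images))
```

The set `images` has $p^n = 9$ elements, so multiplication by $g$ maps the complete residue system onto itself.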

4.8 Generalization of Fermat's Theorem

Let p be a prime number, and (x)) is called the order of $f(x)$. As before, it can be proved that $l$ divides $p^n - 1$, and that there are precisely $\varphi(l)$ polynomials having order $l$. There are therefore $\varphi(p^n - 1)$ polynomials with order $p^n - 1$, and these polynomials are called the primitive roots (modd $p$, $\varphi(x)$). If $f(x)$ is a primitive root, then $(f(x))^v$, $v = 1, 2, \ldots, p^n - 1$ represent all the non-zero incongruent polynomials, modd $p$, $\varphi(x)$. It is not difficult to prove that the product $\prod_v (X - f_v(x))$, where $f_v$ runs over all the primitive roots, is equal to

$$\frac{(X^{p^n-1} - 1)\prod_{q_i, q_j}\bigl(X^{(p^n-1)/q_i q_j} - 1\bigr)\cdots}{\prod_{q_i}\bigl(X^{(p^n-1)/q_i} - 1\bigr)\cdots} \qquad (1)$$

where $q_i$ runs over all the distinct prime divisors of $p^n - 1$.

Exercise. Prove that the product of all the non-zero incongruent polynomials is congruent to $-1$ (modd $p$, $\varphi(x)$).

4.11 Summary

We may summarize the discussions of this chapter in the language of modern algebra or abstract algebra. We have a set of objects which we denote by $R$. The number of objects in $R$ may be finite or infinite.

1. If the operations of addition and subtraction can be defined in $R$, and $R$ is closed under these operations, then we call $R$ an integral modulus. For example: The set of even integers forms an integral modulus; the set of polynomials with even integer coefficients forms an integral modulus. An integral modulus is also known as an Abelian group.

2. If the operations of addition, subtraction and multiplication can be defined in $R$, and $R$ is closed under them, then we call $R$ a ring. For example: The set of integers forms a ring; the set of polynomials with integer coefficients forms a ring.

3. By an ideal $E$, we mean a subset of a ring $R$ which satisfies the following conditions: i) if $a, b \in E$, then $a - b \in E$; ii) if $a \in E$ and $r \in R$, then $ar \in E$. For example: The subset of even integers forms an ideal in the ring of integers. In the ring of polynomials with integer coefficients, we may form the ideal of polynomials having the form $f(x)(x^2 + 1) + 2g(x)x$, where $f$ and $g$ run over all polynomials with integer coefficients.

4. If the operations of addition, subtraction, multiplication and division (except by 0) can be defined in $R$, and $R$ is closed under them, then we call $R$ a field. For example: The set of rational numbers forms a field. The residue classes modulo a fixed irreducible polynomial form a field, which is known as an algebraic extension field in modern algebra. Next, take a prime number $p$ and an irreducible polynomial $\varphi(x)$ of degree $n$. The residue classes with respect to the double modulus $p$ and $\varphi(x)$ form a field with $p^n$ elements.

Students who master the various concrete examples discussed in this chapter will find it easier to learn the abstract concepts of modern algebra.

Chapter 5. The Distribution of Prime Numbers

In this chapter we give some basic results concerning the distribution of prime numbers. The reader will only require some knowledge of the calculus - this chapter is a first introduction to analytic number theory and we shall omit all the deeper investigations.

5.1 Order of Infinity

In the discussion of the distribution of prime numbers we must understand the notion of the comparison of the order of growth between two functions. We often use the symbols $\sim$, $\ll$, $O$, $o$,

the meanings of which we shall now give. Let n be a positive integer which tends to infinity (or x a continuous variable which tends to infinity). Let -: n(2k+1) >-: _ _ _ - _ ..,....,.,..._ _ 7 72 k + 1 - 8 t(k + I)

1 2k + 2

>-: -

I

n

>-: - - -

78 H(2 k+1) 78 H(n)·

(8)


This holds for all n

~

2. Therefore I

H(n)

8

n

0

-~n(n)--~

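$$\sum_{p \le \xi} \log\Bigl(1 - \frac{1}{p}\Bigr) = -\sum_{p \le \xi} \frac{1}{p} + \sum_{p \le \xi}\Bigl[\log\Bigl(1 - \frac{1}{p}\Bigr) + \frac{1}{p}\Bigr]$$

$$= -\log\log\xi - c_7 + O\Bigl(\frac{1}{\log\xi}\Bigr) + \sum_{p \ge 2}\Bigl(\log\Bigl(1 - \frac{1}{p}\Bigr) + \frac{1}{p}\Bigr) - \sum_{p > \xi}\Bigl(\log\Bigl(1 - \frac{1}{p}\Bigr) + \frac{1}{p}\Bigr)$$

$$= -\log\log\xi + c_{13} + O\Bigl(\frac{1}{\log\xi}\Bigr),$$

where

$$c_{13} = -c_7 + \sum_{p \ge 2}\Bigl(\log\Bigl(1 - \frac{1}{p}\Bigr) + \frac{1}{p}\Bigr).$$

Therefore

$$\prod_{p \le \xi}\Bigl(1 - \frac{1}{p}\Bigr) = e^{-\log\log\xi + c_{13} + O(1/\log\xi)} = \frac{e^{c_{13}}}{\log\xi}\, e^{O(1/\log\xi)} = \frac{c_{12}}{\log\xi}\Bigl(1 + O\Bigl(\frac{1}{\log\xi}\Bigr)\Bigr) \qquad (c_{12} = e^{c_{13}}),$$

where we have used

$$e^{O(1/\log\xi)} = 1 + O\Bigl(\frac{1}{\log\xi}\Bigr).$$

The theorem is proved. □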
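The product formula just proved can be observed numerically. The sketch below is an illustration added here; it compares $\log\xi \prod_{p \le \xi}(1 - 1/p)$ with $e^{-\gamma}$, where $\gamma$ is Euler's constant. That the constant in the theorem equals $e^{-\gamma}$ is the classical theorem of Mertens and is assumed here rather than taken from this passage.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def mertens_product(xi):
    """Product of (1 - 1/p) over the primes p <= xi."""
    prod = 1.0
    for p in primes_up_to(xi):
        prod *= 1.0 - 1.0 / p
    return prod


EULER_GAMMA = 0.5772156649015329  # Euler's constant, hard-coded

for xi in (10**3, 10**4, 10**5):
    print(xi, mertens_product(xi) * math.log(xi), math.exp(-EULER_GAMMA))
```

The second and third columns should agree to within a relative error of order $1/\log\xi$, and in practice much more closely.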

Theorems 9.2 and 9.3 are quantitative elaborations of Theorems 4.3 and 4.5.

Exercise 1. Let $p_n$ denote the $n$-th prime. Prove that there are constants $c_1$, $c_2$ such that

$$c_1 n\log n < p_n < c_2 n\log n, \qquad n \ge 2.$$

Exercise 2. Prove that there exists a positive constant $c$ such that

$$\varphi(n) > \frac{cn}{\log\log n}, \qquad n \ge 3.$$

Exercise 3. Prove that the infinite series

$$\sum_p \frac{1}{p(\log\log p)^h}$$

converges or diverges according to whether $h > 1$ or $h \le 1$. Here $\sum_p$ represents the summation over all the prime numbers.

5.10 The Number of Prime Factors of n

Let $n$ be a positive integer. We denote by $\omega(n)$ the number of distinct prime factors of $n$ and by $\Omega(n)$ the total number of prime factors of $n$. That is, if $n = p_1^{a_1} \cdots p_s^{a_s}$, then

$$\omega(n) = s, \qquad \Omega(n) = a_1 + \cdots + a_s. \qquad (1)$$

If $n$ is a prime, then $\omega(n) = \Omega(n) = 1$; but as $n$ tends to infinity through powers of 2, then

$$\Omega(n) = \frac{\log n}{\log 2} \to \infty;$$

and if $n = p_1 p_2 \cdots p_s$ is the product of the first $s$ primes, then as $n \to \infty$, $\omega(n) = s \to \infty$. Thus the behaviours of $\omega(n)$ and $\Omega(n)$ are rather irregular and there is certainly no asymptotic formula for them. However, we do have the following:
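The two functions are easy to compute by trial division. The following is a minimal sketch for small $n$; the function name is chosen here for illustration.

```python
def omega_pair(n):
    """Return (omega(n), Omega(n)): distinct and total prime factor counts."""
    distinct = total = 0
    d = 2
    while d * d <= n:
        if n % d == 0:
            distinct += 1
            while n % d == 0:
                n //= d
                total += 1
        d += 1
    if n > 1:  # whatever remains is a single prime factor
        distinct += 1
        total += 1
    return distinct, total


for n in (97, 12, 2**10, 2 * 3 * 5 * 7 * 11):
    print(n, omega_pair(n))
```

The three behaviours described above (a prime, a power of 2, a product of the first primes) appear directly in the output.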

Theorem 10.1. There are positive constants $c_1$, $c_2$ such that

$$\sum_{n \le x} \omega(n) = x\log\log x + c_1 x + o(x), \qquad (2)$$

$$\sum_{n \le x} \Omega(n) = x\log\log x + c_2 x + o(x). \qquad (3)$$

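Proof. 1) We have

$$\sum_{n \le x}\omega(n) = \sum_{n \le x}\sum_{p \mid n} 1 = \sum_{p \le x}\Bigl[\frac{x}{p}\Bigr] = x\sum_{p \le x}\frac{1}{p} + O(\pi(x)),$$

and so (2) follows from Theorem 9.2 and Theorem 6.2.

2) We have

$$\sum_{n \le x}\Omega(n) = \sum_{n \le x}\sum_{p^m \mid n} 1 = \sum_{p^m \le x}\Bigl[\frac{x}{p^m}\Bigr] = \sum_{n \le x}\omega(n) + \sum_{\substack{p^m \le x \\ m \ge 2}}\Bigl[\frac{x}{p^m}\Bigr],$$

and, by Theorem 6.2,

$$\sum_{\substack{p^m \le x \\ m \ge 2}} 1 \le \sum_{p^2 \le x}\Bigl[\frac{\log x}{\log p}\Bigr] \le \frac{\log x}{\log 2}\sum_{p^2 \le x} 1 = \frac{\log x}{\log 2}\,\pi(\sqrt{x}) = o(x).$$

Therefore

$$\sum_{n \le x}\Omega(n) = \sum_{n \le x}\omega(n) + x\sum_{\substack{p^m \le x \\ m \ge 2}}\frac{1}{p^m} + o(x).$$

But the series

$$\sum_p \sum_{m \ge 2}\frac{1}{p^m} = \sum_p\Bigl(\frac{1}{p^2} + \frac{1}{p^3} + \cdots\Bigr) = \sum_p \frac{1}{p(p-1)} = c$$

converges, so that

$$\sum_{n \le x}\Omega(n) = \sum_{n \le x}\omega(n) + x(c + o(1)) + o(x) = x\log\log x + c_2 x + o(x). \qquad □$$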
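The two averages in Theorem 10.1 can be tabulated with a sieve. The sketch below is illustrative; the numerical values $0.2615$ (Mertens' constant, the classical value of $c_1$) and $0.7732$ (the value of $\sum_p 1/(p(p-1))$, which is $c_2 - c_1$) are known constants quoted here as assumptions, not taken from the text.

```python
import math


def omega_tables(x):
    """Sieve omega(n) and Omega(n) for all n <= x."""
    small = [0] * (x + 1)  # omega
    big = [0] * (x + 1)    # Omega
    for p in range(2, x + 1):
        if small[p] == 0:  # untouched by any smaller prime, so p is prime
            for m in range(p, x + 1, p):
                small[m] += 1
            q = p
            while q <= x:  # each prime power p^k dividing m adds 1 to Omega(m)
                for m in range(q, x + 1, q):
                    big[m] += 1
                q *= p
    return small, big


x = 10**5
small, big = omega_tables(x)
s1, s2 = sum(small[1:]), sum(big[1:])
print(s1 / x - math.log(math.log(x)))  # slowly approaches c_1 (about 0.2615)
print((s2 - s1) / x)                   # approaches c_2 - c_1 (about 0.7732)
```

The second quantity converges quickly because the error in step 2) of the proof is only of order $\pi(\sqrt{x})\log x$; the first converges slowly.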

Theorem 10.2 (Hardy-Ramanujan). Let $\varepsilon > 0$, and let $f(n)$ denote either $\omega(n)$ or $\Omega(n)$. Then the number of positive integers $n \le x$ satisfying

$$|f(n) - \log\log n| > (\log\log n)^{\frac12 + \varepsilon} \qquad (4)$$

is $o(x)$, as $x \to \infty$.

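Proof (Turán). Since $\log\log x - 1 < \log\log n \le \log\log x$ when $x^{1/e} < n \le x$, and the number of positive integers $n \le x^{1/e}$ is $[x^{1/e}] = o(x)$, it suffices to prove that the number of positive integers $n \le x$ satisfying

$$|f(n) - \log\log x| > (\log\log x)^{\frac12 + \varepsilon} \qquad (5)$$

is $o(x)$ as $x \to \infty$. Next, from $\Omega(n) \ge \omega(n)$, and by (2) and (3)

$$\sum_{n \le x}(\Omega(n) - \omega(n)) = O(x)$$

so that the number of positive integers $n \le x$ satisfying $\Omega(n) - \omega(n) > (\log\log x)^{\frac12}$ is

$$O\Bigl(\frac{x}{(\log\log x)^{\frac12}}\Bigr) = o(x).$$

Therefore we need only consider the case $f(n) = \omega(n)$. We consider a pair $p$, $q$ of distinct prime divisors of $n$ ($p, q$ and $q, p$ are treated as two different pairs). Each $p$ may take $\omega(n)$ values and for each fixed $p$, $q$ may take $\omega(n) - 1$ values. Therefore we have

$$\omega(n)(\omega(n) - 1) = \sum_{\substack{pq \mid n \\ p \ne q}} 1 = \sum_{pq \mid n} 1 - \sum_{p^2 \mid n} 1.$$

Summing over $n = 1, 2, \ldots, [x]$ we have

$$\sum_{n \le x}\omega^2(n) - \sum_{n \le x}\omega(n) = \sum_{pq \le x}\Bigl[\frac{x}{pq}\Bigr] - \sum_{p^2 \le x}\Bigl[\frac{x}{p^2}\Bigr]. \qquad (6)$$

Since

$$\sum_{p^2 \le x}\Bigl[\frac{x}{p^2}\Bigr] \le x\sum_p \frac{1}{p^2} = O(x)$$

and

$$\sum_{pq \le x}\Bigl[\frac{x}{pq}\Bigr] = x\sum_{pq \le x}\frac{1}{pq} + O(x),$$

it follows from (2) and (6) that

$$\sum_{n \le x}\omega^2(n) = x\sum_{pq \le x}\frac{1}{pq} + O(x\log\log x). \qquad (7)$$

Now

$$\Bigl(\sum_{p \le \sqrt{x}}\frac{1}{p}\Bigr)^2 \le \sum_{pq \le x}\frac{1}{pq} \le \Bigl(\sum_{p \le x}\frac{1}{p}\Bigr)^2,$$

and $\sum_{p \le \xi} 1/p = \log\log\xi + O(1)$, so that both the outer terms in the above are

$$(\log\log x + O(1))^2 = (\log\log x)^2 + O(\log\log x).$$

It now follows from (7) that

$$\sum_{n \le x}\omega^2(n) = x(\log\log x)^2 + O(x\log\log x), \qquad (8)$$

and so

$$\sum_{n \le x}(\omega(n) - \log\log x)^2 = \sum_{n \le x}\omega^2(n) - 2\log\log x\sum_{n \le x}\omega(n) + [x](\log\log x)^2$$

$$= x(\log\log x)^2 + O(x\log\log x) - 2\log\log x\,(x\log\log x + O(x)) + (x + O(1))(\log\log x)^2 = O(x\log\log x). \qquad (9)$$

Given any $\delta > 0$, if there are $\delta x$ positive integers $n \le x$ such that (5) holds, then

$$\sum_{n \le x}(\omega(n) - \log\log x)^2 \ge \delta x(\log\log x)^{1 + 2\varepsilon} \qquad (10)$$

which contradicts (9). Therefore the number of positive integers $n \le x$ such that (5) holds is $o(x)$, and the theorem is proved. □

From this we see that

$$\omega(n) \sim \log\log n \quad\text{and}\quad \Omega(n) \sim \log\log n$$

for almost all $n$.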
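Turán's argument is a second-moment estimate followed by a counting step, and both can be seen numerically. The sketch below is an illustration under choices made here: $x = 10^5$ and $\varepsilon = 1/4$. It computes the mean square in (9) and the proportion of exceptional $n$ in (5).

```python
import math


def omega_sieve(x):
    """omega(n) for all n <= x."""
    w = [0] * (x + 1)
    for p in range(2, x + 1):
        if w[p] == 0:  # p is prime
            for m in range(p, x + 1, p):
                w[m] += 1
    return w


x = 10**5
eps = 0.25  # illustrative choice of epsilon
w = omega_sieve(x)
L = math.log(math.log(x))

second_moment = sum((w[n] - L) ** 2 for n in range(1, x + 1)) / x
exceptional = sum(1 for n in range(1, x + 1)
                  if abs(w[n] - L) > L ** (0.5 + eps)) / x
print(L, second_moment, exceptional)
```

Each exceptional $n$ contributes more than $(\log\log x)^{1+2\varepsilon}$ to the sum in (9), so `exceptional * L ** (1 + 2 * eps)` can never exceed `second_moment`; this is exactly the step from (9) to (10).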

5.11 A Prime Representing Function

Theorem 11.1 (Miller). There exists a fixed number $\alpha$ such that if

$$2^{\alpha} = \alpha_1, \quad 2^{\alpha_1} = \alpha_2, \quad \ldots, \quad 2^{\alpha_n} = \alpha_{n+1}, \quad \ldots,$$

then $[\alpha_n]$ is always prime.

Proof. We construct a sequence of primes $\{p_n\}$ by induction: Take $p_1 = 3$. By Theorem 7.1 there exists a prime $p_{n+1}$ satisfying

$$2^{p_n} < p_{n+1} < 2^{p_n + 1}.$$

If $p_{n+1} + 1 = 2^{p_n + 1}$, then $p_{n+1} = 2^{p_n + 1} - 1$ cannot be prime (because it has the divisor $2^{\frac12(p_n + 1)} - 1$). Therefore

$$2^{p_n} < p_{n+1} < p_{n+1} + 1 < 2^{p_n + 1}.$$

5.12 On Primes in an Arithmetic Progression
0), then Dirichlet's theorem follows. For if there exists $n$ such that $an + b = p_1$ ($> b$) is prime, and (replacing $a$ by $ap_1$) there exists $n$ such that $ap_1 n + b = p_2$ ($> p_1$) is prime, and so on, then there are infinitely many primes of the form $an + b$.

Theorem 12.1. Let $k > 1$. Then there are infinitely many primes of the form $kn + 1$.

From what we said earlier it suffices to prove that there always exists a prime of the form $kn + 1$.

The roots of the equation $x^k = 1$ are given by

$$e^{2\pi i a/k}, \qquad a = 0, 1, \ldots, k - 1.$$

Let

$$F_n(x) = \prod_{(a, n) = 1}\bigl(x - e^{2\pi i a/n}\bigr),$$

where the product is over a reduced set of residues $a$ mod $n$. Clearly we have

$$x^k - 1 = \prod_{n \mid k} F_n(x),$$

where the product is over the divisors $n$ of $k$, since each root on the left hand side must occur on the right hand side, and conversely without any repetition. Let

$$x^k - 1 = F_k(x)G_k(x),$$

where $G_k(x)$ is the least common multiple of the various polynomials $x^n - 1$ ($n \mid k$, $n < k$), and its leading coefficient is 1. Therefore $G_k(x)$ is an integer coefficient polynomial, and by Theorem 1.13.2 we see that $F_k(x)$ is also an integer coefficient polynomial. If $x$ is an integer not equal to $\pm 1$, then

$$F_k(x)G_k(x) = x^k - 1 \ne 0,$$

that is, $F_k(x)$ and $G_k(x)$ are non-zero integers.

Lemma 1. Let $n$ be a proper divisor of $k$. Then for all integers $x \ne \pm 1$, we have

$$\Bigl(\frac{x^k - 1}{x^n - 1},\ x^n - 1\Bigr) \Bigm|\ \frac{k}{n}.$$

Proof. Let $x^n - 1 = y$, $k = nd$. Then

$$\frac{x^k - 1}{x^n - 1} = \frac{(y + 1)^d - 1}{y} = y^{d-1} + \binom{d}{1}y^{d-2} + \cdots + \binom{d}{2}y + d \equiv d \pmod y. \qquad □$$

Lemma 2. Let $x$ be an integer not equal to $\pm 1$. Then each common prime divisor of $F_k(x)$ and $G_k(x)$ must be a divisor of $k$.

Proof. Let $p \mid (F_k(x), G_k(x))$. From

$$p \mid G_k(x) = \prod_{\substack{n \mid k \\ n < k}} F_n(x)$$

$$\ldots > 0.67 \prod_{\substack{p \mid n \\ p > 2}}\frac{p - 1}{p - 2}\ \prod_{p > 2}\Bigl(1 - \frac{1}{(p - 1)^2}\Bigr)\frac{n}{(\log n)^2}.$$

It follows, of course, from this that every sufficiently large even integer is a sum of a prime and an integer having at most two prime factors. The proof of Chen's theorem is given in the book "Sieve Methods" by H. Halberstam and H. E. Richert [28] where there is also a comprehensive bibliography.

5.2. Concerning the prime twins problem J. R. Chen [20] also proved that there are infinitely many primes $p$ such that $p + 2$ is either a prime or has two prime factors.

5.3. H. Iwaniec (unpublished) has proved that there are infinitely many integers $n$ such that $n^2 + 1$ is either a prime or has two prime factors.

5.4. The principle of the "large sieve" was invented by Yu. Linnik and A. Renyi, and was substantially developed by K. F. Roth [50] and E. Bombieri [9] (see also the books by H. L. Montgomery [44] and E. Bombieri [10]). From his result Bombieri deduced the following theorem on the average value of $\pi(x; k, l)$: Given any $A > 0$, there exists $B = B(A) > 0$ such that

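$$\sum_{k \le x^{1/2}/\log^B x}\ \max_{(l, k) = 1}\Bigl|\pi(x; k, l) - \frac{\operatorname{li} x}{\varphi(k)}\Bigr| = O\Bigl(\frac{x}{\log^A x}\Bigr).$$

Theorem 3.1. We have

$$\sum_{d \mid n}\mu(d) = \sum_{d \mid n}\mu(n/d) = \Delta(n) = \begin{cases} 1, & \text{if } n = 1, \\ 0, & \text{if } n \ne 1. \end{cases}$$

Proof. This follows from taking $f(d) = 1$ in Theorem 2.3. □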
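The identity $\sum_{d \mid n}\mu(d) = \Delta(n)$ is easily confirmed for small $n$. A minimal sketch, with function names chosen here for illustration:

```python
def mobius(n):
    """Moebius function: 0 if a square divides n, else (-1)^(number of primes)."""
    result = 1
    d = 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:  # d^2 divided the original n
                return 0
            result = -result
        d += 1
    if n > 1:
        result = -result
    return result


def delta_via_mobius(n):
    """Sum of mu(d) over the divisors d of n."""
    return sum(mobius(d) for d in range(1, n + 1) if n % d == 0)


print([delta_via_mobius(n) for n in range(1, 13)])
```

The list starts with 1 and is 0 thereafter.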

Theorem 3.2. Let $0 < \eta_0 \le \eta_1$ and let $h(k)$ be a completely multiplicative function which is not identically zero. If for any $\eta$ satisfying $\eta_0 \le \eta \le \eta_1$ we have

$$g(\eta) = \sum_{1 \le k \le \eta_1/\eta} f(k\eta)h(k), \qquad (1)$$

then

$$f(\eta) = \sum_{1 \le k \le \eta_1/\eta} \mu(k)g(k\eta)h(k); \qquad (2)$$

the converse also holds.


6. Arithmetic Functions

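Proof. From (1) we have

$$\sum_{1 \le k \le \eta_1/\eta}\mu(k)g(k\eta)h(k) = \sum_{1 \le k \le \eta_1/\eta}\mu(k)h(k)\sum_{1 \le m \le \eta_1/(k\eta)} f(mk\eta)h(m).$$

Let $mk = r$. From Theorem 3.1 we have

$$\sum_{1 \le k \le \eta_1/\eta}\mu(k)g(k\eta)h(k) = \sum_{1 \le k \le \eta_1/\eta}\mu(k)\sum_{\substack{1 \le r \le \eta_1/\eta \\ k \mid r}} f(r\eta)h(k)h\Bigl(\frac{r}{k}\Bigr)$$

$$= \sum_{1 \le r \le \eta_1/\eta} f(r\eta)h(r)\sum_{k \mid r}\mu(k) = \sum_{1 \le r \le \eta_1/\eta} f(r\eta)h(r)\Delta(r) = f(\eta)h(1) = f(\eta),$$

which proves (2). Suppose instead that (2) holds. Then

$$\sum_{1 \le k \le \eta_1/\eta} f(k\eta)h(k) = \sum_{1 \le k \le \eta_1/\eta} h(k)\sum_{1 \le m \le \eta_1/(k\eta)}\mu(m)g(mk\eta)h(m)$$

$$= \sum_{1 \le k \le \eta_1/\eta}\ \sum_{\substack{1 \le r \le \eta_1/\eta \\ k \mid r}}\mu(r/k)g(r\eta)h(k)h(r/k) = \sum_{1 \le r \le \eta_1/\eta} g(r\eta)h(r)\sum_{k \mid r}\mu(r/k)$$

$$= \sum_{1 \le r \le \eta_1/\eta} g(r\eta)h(r)\Delta(r) = g(\eta),$$

which proves (1). □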

We can extend this theorem as follows:

Theorem 3.3. Let $\xi_0 \ge 1$ and let $H(k)$ be a completely multiplicative function which is not identically zero. If for all real $\xi$ satisfying $1 \le \xi \le \xi_0$ we have

$$G(\xi) = \sum_{1 \le k \le \xi} F(\xi/k)H(k), \qquad (3)$$

then we have, for such $\xi$,

$$F(\xi) = \sum_{1 \le k \le \xi}\mu(k)G(\xi/k)H(k); \qquad (4)$$

the converse also holds.

Proof. Let $f(\eta) = F(1/\eta)$ and $g(\eta) = G(1/\eta)$. Then from (3) and (4) we have

$$g(\eta) = G(1/\eta) = \sum_{1 \le k \le 1/\eta} F\Bigl(\frac{1}{k\eta}\Bigr)H(k) = \sum_{1 \le k \le 1/\eta} f(\eta k)H(k),$$

$$f(\eta) = F(1/\eta) = \sum_{1 \le k \le 1/\eta}\mu(k)G\Bigl(\frac{1}{k\eta}\Bigr)H(k) = \sum_{1 \le k \le 1/\eta}\mu(k)g(\eta k)H(k).$$

These are just formulae (1) and (2) with $\eta_1 = 1$, $1/\xi_0 = \eta_0$. □

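We now apply this to the following:

Theorem 3.4. When $\xi \ge 1$ we have

$$\Bigl|\sum_{1 \le k \le \xi}\frac{\mu(k)}{k}\Bigr| \le 1. \qquad (5)$$

Proof. In (3) we set $F(\xi) = 1$, $H(k) = 1$ so that

$$G(\xi) = \sum_{1 \le k \le \xi} 1 = [\xi].$$

From (4) we have

$$\sum_{1 \le k \le \xi}\mu(k)\Bigl[\frac{\xi}{k}\Bigr] = 1. \qquad (6)$$

If $1 \le \xi < 2$, then (5) clearly holds. Suppose now that $\xi \ge 2$, and let $x = [\xi]$. Then

$$\Bigl|x\sum_{k=1}^{x}\frac{\mu(k)}{k} - 1\Bigr| = \Bigl|\sum_{k=1}^{x}\mu(k)\Bigl(\frac{x}{k} - \Bigl[\frac{x}{k}\Bigr]\Bigr)\Bigr| = \Bigl|\sum_{k=2}^{x}\mu(k)\Bigl(\frac{x}{k} - \Bigl[\frac{x}{k}\Bigr]\Bigr)\Bigr| \le \sum_{k=2}^{x} 1 = x - 1.$$

Therefore

$$x\Bigl|\sum_{k=1}^{x}\frac{\mu(k)}{k}\Bigr| \le 1 + (x - 1) = x,$$

which gives (5). □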
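Both (6) and (5) can be verified exactly with rational arithmetic. The following sketch is illustrative; the sieve and its name are choices made here.

```python
from fractions import Fraction


def mobius_table(n):
    """mu(k) for 0 <= k <= n (the entry for k = 0 is unused)."""
    mu = [1] * (n + 1)
    composite = [False] * (n + 1)
    for p in range(2, n + 1):
        if not composite[p]:
            for m in range(p, n + 1, p):
                if m > p:
                    composite[m] = True
                mu[m] = -mu[m]       # one more prime factor
            for m in range(p * p, n + 1, p * p):
                mu[m] = 0            # divisible by a square
    return mu


N = 200
mu = mobius_table(N)
for x in range(1, N + 1):
    # formula (6) with xi = x
    assert sum(mu[k] * (x // k) for k in range(1, x + 1)) == 1
    # formula (5)
    assert abs(sum(Fraction(mu[k], k) for k in range(1, x + 1))) <= 1
print("checked up to", N)
```

Using `Fraction` avoids any rounding in the partial sums of $\mu(k)/k$.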

6.4 The Möbius Transformation

Another consequence of Theorem 3.3 is the following:

Theorem 4.1. Let $h(k)$ be a completely multiplicative function which is not identically zero, and let $n_0$ be a positive integer. If for all $n$ satisfying $1 \le n \le n_0$, we have

$$g(n) = \sum_{d \mid n} f(d)h\Bigl(\frac{n}{d}\Bigr), \qquad (1)$$

then, for such $n$, we have

$$f(n) = \sum_{d \mid n}\mu(d)g\Bigl(\frac{n}{d}\Bigr)h(d); \qquad (2)$$

the converse also holds.

Proof. We define $F(\xi)$ by setting $F(\xi) = f(\xi)$ when $\xi$ is an integer and $F(\xi) = 0$ if $\xi$ is not an integer, and we define $G(\xi)$ similarly. We can rewrite (1) and (2) as

$$G(n) = g(n) = \sum_{d \mid n} f(d)h\Bigl(\frac{n}{d}\Bigr) = \sum_{k \mid n} f\Bigl(\frac{n}{k}\Bigr)h(k) = \sum_{1 \le k \le n} F\Bigl(\frac{n}{k}\Bigr)h(k)$$

and

$$F(n) = f(n) = \sum_{d \mid n}\mu(d)g\Bigl(\frac{n}{d}\Bigr)h(d) = \sum_{d \mid n}\mu(d)G\Bigl(\frac{n}{d}\Bigr)h(d) = \sum_{1 \le d \le n}\mu(d)G\Bigl(\frac{n}{d}\Bigr)h(d).$$

From the definition of $F(\xi)$ and $G(\xi)$ these two formulae can also be written as

$$G(\xi) = \sum_{1 \le k \le \xi} F\Bigl(\frac{\xi}{k}\Bigr)h(k), \qquad F(\xi) = \sum_{1 \le k \le \xi}\mu(k)G\Bigl(\frac{\xi}{k}\Bigr)h(k).$$

Here $\xi$ satisfies $1 \le \xi \le n_0$. Conversely (1) and (2) can be deduced from these formulae. The theorem now follows from Theorem 3.3 with $\xi_0 = n_0$. □

Definition. If

$$g(n) = \sum_{d \mid n} f(d) = \sum_{d \mid n} f\Bigl(\frac{n}{d}\Bigr),$$

then we call $g(n)$ the Möbius transform of $f(n)$. We also call $f(n)$ the inverse Möbius transform of $g(n)$. From Theorem 4.1 we have

$$f(n) = \sum_{d \mid n}\mu(d)g\Bigl(\frac{n}{d}\Bigr) = \sum_{d \mid n}\mu\Bigl(\frac{n}{d}\Bigr)g(d).$$
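The transform and its inverse can be written down directly from the definition. In the sketch below, which is only an illustration with names chosen here, $f(n) = n$ is transformed into the sum of divisors and then recovered by the inversion formula.

```python
def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def mobius(n):
    """Moebius function by trial division."""
    result = 1
    d = 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            result = -result
        d += 1
    return -result if n > 1 else result


def mobius_transform(f, n):
    """g(n) = sum of f(d) over d | n."""
    return sum(f(d) for d in divisors(n))


def inverse_mobius_transform(g, n):
    """f(n) = sum of mu(d) g(n/d) over d | n."""
    return sum(mobius(d) * g(n // d) for d in divisors(n))


def sigma(n):
    # Moebius transform of the identity function: the sum of the divisors of n
    return mobius_transform(lambda d: d, n)


recovered = [inverse_mobius_transform(sigma, n) for n in range(1, 21)]
print(sigma(12), recovered)
```

The recovered values are $1, 2, \ldots, 20$, as the inversion formula predicts.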

From Theorem 2.2 we see that the Möbius transform, and the inverse Möbius transform, of a multiplicative function is multiplicative.

Example 1. From Theorem 3.1 we see that $\Delta(n)$ is the Möbius transform of $\mu(n)$.

Example 2. From $\sigma_\lambda(n) = \sum_{d \mid n} d^\lambda$, we see that $\sigma_\lambda(n)$ is the Möbius transform of the multiplicative function $E_\lambda(n) = n^\lambda$, and therefore $\sigma_\lambda(n)$ is a multiplicative function. Since

$$\sigma_\lambda(p^l) = \sum_{m=0}^{l} p^{m\lambda} = \frac{p^{\lambda(l+1)} - 1}{p^\lambda - 1} \qquad (\lambda \ne 0),$$

we deduce that if $n = \prod_v p_v^{l_v}$, then

$$\sigma_\lambda(n) = \prod_v \frac{p_v^{\lambda(l_v + 1)} - 1}{p_v^\lambda - 1}.$$

In particular, when $\lambda = 0$, we have

$$d(n) = \sigma_0(n) = \prod_v (l_v + 1),$$

which we already proved in an earlier exercise.

Example 3. The function $E_0(n) = 1$ is the Möbius transform of $\Delta(n)$.

Example 4. Let $n$ be fixed and let the integers $1, 2, \ldots, a, \ldots, n$ be partitioned into distinct classes according to the value of the greatest common divisor $(n, a)$. If $d = (n, a)$, then we can write $n = dk$ and $1 = (k, a/d)$. Now the number of integers $a$ satisfying $1 = (k, a/d)$ is precisely ...

Theorem 5.2. Let $\varepsilon > 0$. Then

$$d(n) = O(n^\varepsilon). \qquad (1)$$

Here the $O$-constant depends on $\varepsilon$.

Proof. Let $n = \prod_{p \mid n} p^a$ be the standard factorization of $n$. We have

$$\frac{d(n)}{n^\varepsilon} = \prod_{p \mid n}\frac{a + 1}{p^{a\varepsilon}}.$$

If $p^\varepsilon \ge 2$, then $p^{a\varepsilon} \ge 2^a \ge a + 1$. Therefore ...

... 1 be such that the congruence

$$l^2 \equiv -1 \pmod n \qquad (1)$$

has a solution. Then there exists a unique pair of integers $x$, $y$ satisfying

$$x^2 + y^2 = n, \quad x > 0, \quad y > 0, \quad (x, y) = 1, \quad y \equiv lx \pmod n. \qquad (2)$$

Proof. Clearly if (2) is soluble, then so is (1). A necessary condition for (1) to be soluble is that $n$ is representable as

$$n = 2^a p_1^{\lambda_1}\cdots p_s^{\lambda_s}, \qquad a = 0 \text{ or } 1,$$

and $p_i$ ($i = 1, 2, \ldots, s$) is a prime $\equiv 1 \pmod 4$. We now use induction to prove the theorem.

1) We consider first the case $n = p^\lambda$. If $\lambda = 1$, then from $l^2 + 1 \equiv 0 \pmod p$ we see that when $(x, p) = 1$, we have $x^2 l^2 + x^2 \equiv 0 \pmod p$. We shall presently choose $y$ and $x$ so that $x^2 l^2 \equiv y^2 \pmod p$, and $x^2 < p$, $y^2 < p$. Let $x$ and $y$ take the values $0, 1, \ldots, [\sqrt{p}]$ and consider the various differences $xl - y$. Since there are $([\sqrt{p}] + 1)^2 > p$ such differences, there must be two which are congruent mod $p$. Let $x_1 l - y_1 \equiv x_2 l - y_2 \pmod p$, or $(x_1 - x_2)l \equiv y_1 - y_2 \pmod p$, and we can assume that $x_1 - x_2 > 0$ so that $x_1 - x_2 < \sqrt{p}$, $|y_1 - y_2| < \sqrt{p}$ and this then gives our desired $x$ and $y$. For this pair $x$, $y$ we have $x^2 + y^2 = tp$, and it is easy to see that $t = 1$, $(x, y) = 1$. The congruence $y \equiv mx \pmod p$ is soluble, and from $x^2(1 + m^2) \equiv 0 \pmod p$ we see that $m \equiv \pm l$. If $m = l$, then we take the pair $(x, y)$, while if $m = -l$, then we take the pair $(y, x)$.

Now assume that $p \ne 2$ and that the theorem holds for $n = p^\lambda$. Let $(-l)^2 \equiv -1 \pmod{p^{\lambda+1}}$ so that there exist $u$, $v$ such that

$$u^2 + v^2 = p^\lambda, \quad u > 0, \quad v > 0, \quad (u, v) = 1, \quad v \equiv -lu \pmod{p^\lambda}.$$

When $n = p^{\lambda+1}$, we have

$$p^{\lambda+1} = (xu + yv)^2 + (xv - yu)^2 = X^2 + Y^2 \qquad (X > 0,\ Y > 0).$$

First we have $(X, Y) = 1$, since otherwise $p \mid (X, Y)$, but

$$X \equiv xu + yv \equiv xu - l^2 xu \equiv xu(1 - l^2) \not\equiv 0 \pmod p,$$

which is impossible. Next, because $(X, p) = 1$, the congruence $Xm \equiv Y \pmod{p^{\lambda+1}}$ is soluble. Thus $X^2 + X^2 m^2 \equiv 0 \pmod{p^{\lambda+1}}$ or $1 + m^2 \equiv 0 \pmod{p^{\lambda+1}}$. From Theorem 2.9.3 this congruence has only two solutions, so that $m = \pm l$. The desired result follows from the discussion in the case $\lambda = 1$.

2) Let $n = ab$, $a > 1$, $b > 1$, $(a, b) = 1$, and suppose that

$$l^2 \equiv -1 \pmod n,$$

$$u^2 + v^2 = a, \quad u > 0, \quad v > 0, \quad (u, v) = 1, \quad v \equiv lu \pmod a,$$

$$x^2 + y^2 = b, \quad x > 0, \quad y > 0, \quad (x, y) = 1, \quad y \equiv lx \pmod b.$$

From Theorem 7.3 we have

$$n = ab = (xv + yu)^2 + (xu - yv)^2 = X^2 + Y^2.$$

(If $xu - yv > 0$, then let $xu - yv = Y$; otherwise we let $xu - yv = -Y$.) We now prove the following:

(i) $(X, Y) = 1$. Let $p \mid (X, Y)$. Then

$$xv + yu = ps, \qquad xu - yv = pt,$$

or

$$x(u^2 + v^2) = p(sv + tu), \qquad y(u^2 + v^2) = p(su - tv).$$

Since $(x, y) = 1$, we must have $p \mid (u^2 + v^2)$, that is $p \mid a$. Similarly $p \mid b$. But this contradicts $(a, b) = 1$.

(ii) $X \equiv lY \pmod n$. From our assumption we have

$$xv + yu \equiv lxu - lyv \equiv l(xu - yv) \pmod a,$$

$$xv + yu \equiv -lyv + lxu \equiv l(xu - yv) \pmod b.$$

Since $(a, b) = 1$, it follows that $X \equiv lY \pmod n$.

3) Uniqueness. Suppose that there are two pairs $(X, Y)$, $(X', Y')$ both satisfying the conditions. Then

$$n^2 = (XX' + YY')^2 + (XY' - YX')^2.$$

But

$$XX' + YY' \equiv XX'(1 + l^2) \equiv 0 \pmod n,$$

so that

$$XX' + YY' = n, \qquad XY' - YX' = 0.$$

From $XY' - YX' = 0$, we have

$$\frac{X}{X'} = \frac{Y}{Y'} = c,$$


6.7 The Representation of Integers as a Sum of Two Squares

so that $X^2 + Y^2 = c^2(X'^2 + Y'^2)$ giving $c = \pm 1$. Also from $X > 0$, $X' > 0$ we see that $c = 1$. The proof of our theorem is complete. □

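Proof of Theorem 7.2. From Theorem 7.1 and Theorem 7.4 we see that the number of solutions to $x^2 + y^2 = n$, $(x, y) = 1$ is $4V(n)$. We now consider the equation $x^2 + y^2 = n$, and we partition the various solutions into sets according to $(x, y) = d$. The number of solutions satisfying $(x, y) = d$ is equal to the number of solutions satisfying

$$\Bigl(\frac{x}{d}\Bigr)^2 + \Bigl(\frac{y}{d}\Bigr)^2 = \frac{n}{d^2}, \qquad \Bigl(\frac{x}{d}, \frac{y}{d}\Bigr) = 1,$$

that is $4V(n/d^2)$. Therefore

$$r(n) = 4\sum_{d^2 \mid n} V\Bigl(\frac{n}{d^2}\Bigr) = 4\sum_{d \mid n} V\Bigl(\frac{n}{d}\Bigr)\lambda(d),$$

where $\lambda(d) = 1$ or 0 according to whether $d$ is a square or not. Since $V(n)$ and $\lambda(n)$ are both multiplicative it follows that $r(n)/4$ is multiplicative. Since $\delta(n)$ is also multiplicative the theorem will follow if we show that $r(n) = 4\delta(n)$ when $n = p^m$. Now, if $2 \mid m$, then

$$\frac{r(p^m)}{4} = V(p^m) + V(p^{m-2}) + \cdots + V(p^2) + V(1) = \begin{cases} 0 + \cdots + 0 + 1 = 1, & \text{if } p = 2, \\ 0 + \cdots + 0 + 1 = 1, & \text{if } p \equiv 3 \pmod 4, \\ 2 + \cdots + 2 + 1 = \frac{m}{2}\cdot 2 + 1 = m + 1, & \text{if } p \equiv 1 \pmod 4, \end{cases}$$

and if $2 \nmid m$, then

$$\frac{r(p^m)}{4} = V(p^m) + V(p^{m-2}) + \cdots + V(p) = \begin{cases} 1, & \text{if } p = 2, \\ 0, & \text{if } p \equiv 3 \pmod 4, \\ m + 1, & \text{if } p \equiv 1 \pmod 4. \end{cases}$$

On the other hand we have

$$\delta(p^m) = 1 + \chi(p) + \cdots + \chi(p^m) = \begin{cases} 1 + 0 + 0 + \cdots + 0 = 1, & \text{if } p = 2, \\ 1 - 1 + \cdots + 1 = 1, & \text{if } p \equiv 3 \pmod 4,\ 2 \mid m, \\ 1 - 1 + \cdots - 1 = 0, & \text{if } p \equiv 3 \pmod 4,\ 2 \nmid m, \\ 1 + 1 + \cdots + 1 = m + 1, & \text{if } p \equiv 1 \pmod 4. \end{cases}$$

The theorem is proved. □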


Theorem 7.5. Denote by $A$ and $B$ the number of divisors of $n$ which are congruent to 1 and 3 (mod 4) respectively. Then $r(n) = 4(A - B)$.

Proof. This is an immediate consequence of Theorem 7.2. □

Theorem 7.6. Let $\varepsilon > 0$. Then $r(n) = O(n^\varepsilon)$.

Proof. Since $r(n) \le 4d(n)$, the required result follows from Theorem 5.2. □
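Theorem 7.5 is easy to test against a direct count of the solutions of $x^2 + y^2 = n$. A minimal sketch follows; the function names are chosen here, and $r(n)$ counts ordered pairs of integers with signs.

```python
import math


def r_bruteforce(n):
    """Number of integer pairs (x, y) with x^2 + y^2 = n."""
    count = 0
    bound = math.isqrt(n)
    for x in range(-bound, bound + 1):
        y2 = n - x * x
        y = math.isqrt(y2)
        if y * y == y2:
            count += 1 if y == 0 else 2  # y and -y are distinct unless y = 0
    return count


def r_by_divisors(n):
    """4(A - B), with A, B the numbers of divisors congruent to 1, 3 (mod 4)."""
    a = sum(1 for d in range(1, n + 1) if n % d == 0 and d % 4 == 1)
    b = sum(1 for d in range(1, n + 1) if n % d == 0 and d % 4 == 3)
    return 4 * (a - b)


print([(n, r_bruteforce(n), r_by_divisors(n)) for n in (1, 5, 21, 25, 65)])
```

For example $n = 25$ has the divisors 1, 5, 25, all congruent to 1 (mod 4), giving $r(25) = 12$.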

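6.8 The Methods of Partial Summation and Integration

Theorem 8.1 (Abel). Let $a \le b$ and let $n$ vary in $a \le n \le b$. Let $\tau_n$ and $\varepsilon_n$ be complex numbers and

$$s_n = \sum_{m=a}^{n}\tau_m.$$

Then

$$\Bigl|\sum_{n=a}^{b}\tau_n\varepsilon_n\Bigr| \le \max_{a \le n \le b}|s_n|\Bigl(\sum_{a \le m \le b-1}|\varepsilon_m - \varepsilon_{m+1}| + |\varepsilon_b|\Bigr). \qquad (1)$$

Proof. Let $s_{a-1} = 0$. Then

$$\sum_{n=a}^{b}\tau_n\varepsilon_n = \sum_{n=a}^{b}(s_n - s_{n-1})\varepsilon_n = \sum_{n=a}^{b}s_n\varepsilon_n - \sum_{n=a}^{b-1}s_n\varepsilon_{n+1} = \sum_{n=a}^{b-1}s_n(\varepsilon_n - \varepsilon_{n+1}) + s_b\varepsilon_b,$$

so that (1) follows. □

Theorem 8.2. In the previous theorem if $\varepsilon_n$ is a positive decreasing sequence, then

$$\Bigl|\sum_{n=a}^{b}\tau_n\varepsilon_n\Bigr| \le \max_{a \le n \le b}|s_n|\,\varepsilon_a. \qquad (2)$$

We now apply this to the following: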

Theorem 8.3. If $s > 0$, then

$$\Bigl|\sum_{n \ge a}\frac{\chi(n)}{n^s}\Bigr| \le \frac{1}{a^s},$$

so that the series $\sum_{n=1}^{\infty}\chi(n)/n^s$ converges when $s > 0$.

Proof. We have $\chi(a) + \chi(a + 1) + \chi(a + 2) + \chi(a + 3) = 0$, so that

$$\Bigl|\sum_{n=a}^{b}\chi(n)\Bigr| \le 1.$$

From Theorem 8.2 we deduce that

$$\Bigl|\sum_{n=a}^{b}\frac{\chi(n)}{n^s}\Bigr| \le \frac{1}{a^s}.$$

Since the right hand side is independent of $b$, the theorem follows. □

Note: In the next section we shall require

$$\sum_{n=1}^{\infty}\frac{\chi(n)}{n} = 1 - \frac13 + \frac15 - \frac17 + \cdots = \frac{\pi}{4}.$$

This can be proved using the series expansion for $\tan^{-1}x$ in ordinary calculus.

Analogous to Theorems 8.1 and 8.2 we have:

Theorem 8.4. Let $\xi \le \eta$ and let $x$ vary in $\xi \le x \le \eta$. Suppose that $f(x)$ and $g(x)$ are continuous and $g(x)$ is differentiable. Let

$$f_1(x) = \int_\xi^x f(t)\,dt.$$

Then

$$\Bigl|\int_\xi^\eta f(x)g(x)\,dx\Bigr| \le \max_{\xi \le x \le \eta}|f_1(x)|\Bigl(\int_\xi^\eta|g'(x)|\,dx + |g(\eta)|\Bigr).$$

Moreover, if $g'(x) \le 0$ and $g(x) > 0$, then

$$\Bigl|\int_\xi^\eta f(x)g(x)\,dx\Bigr| \le g(\xi)\max_{\xi \le x \le \eta}|f_1(x)|.$$


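Proof. From integration by parts we have

$$\int_\xi^\eta f(x)g(x)\,dx = \int_\xi^\eta g(x)\,df_1(x) = g(\eta)f_1(\eta) - \int_\xi^\eta f_1(x)g'(x)\,dx,$$

and hence

$$\Bigl|\int_\xi^\eta f(x)g(x)\,dx\Bigr| \le \max_{\xi \le x \le \eta}|f_1(x)|\Bigl(|g(\eta)| + \int_\xi^\eta|g'(x)|\,dx\Bigr).$$

The last part of the theorem is also clear. □

Example. Let $a > 0$. Prove that

$$\Bigl|\int_a^\infty\cos x^2\,dx\Bigr| = \Bigl|\int_{a^2}^\infty\frac{\cos y}{2\sqrt{y}}\,dy\Bigr| \le \frac{1}{2a}\max_{a^2 \le \xi}\Bigl|\int_{a^2}^{\xi}\cos y\,dy\Bigr| \le \frac{1}{a}.$$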

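6.9 The Circle Problem

Theorem 9.1.

$$\sum_{0 \le n \le x} r(n) = \pi x + O(\sqrt{x}).$$

Proof. From Theorem 7.2 we have

$$\sum_{1 \le n \le x} r(n) = 4\sum_{1 \le n \le x}\sum_{d \mid n}\chi(d) = 4\sum_{1 \le d \le x}\chi(d)\sum_{\substack{1 \le n \le x \\ d \mid n}} 1 = 4\sum_{1 \le d \le x}\chi(d)\Bigl[\frac{x}{d}\Bigr].$$

Here we divide the sum into two parts. From Theorem 8.3 we have

$$4\sum_{1 \le d \le \sqrt{x}}\chi(d)\Bigl[\frac{x}{d}\Bigr] = 4x\sum_{1 \le d \le \sqrt{x}}\frac{\chi(d)}{d} + O(\sqrt{x}) = 4x\sum_{d=1}^{\infty}\frac{\chi(d)}{d} + O(\sqrt{x}) = \pi x + O(\sqrt{x});$$

the other part is

$$4\sum_{\sqrt{x} < d \le x}\chi(d)\Bigl[\frac{x}{d}\Bigr],$$

and from Theorem 8.2, since $[x/d]$ is positive and decreasing in $d$, we have

$$\Bigl|\sum_{\sqrt{x} < d \le x}\chi(d)\Bigl[\frac{x}{d}\Bigr]\Bigr| \le \sqrt{x}.$$

The theorem is proved. □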
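The sum $\sum_{0 \le n \le x} r(n)$ is the number of integer pairs $(u, v)$ with $u^2 + v^2 \le x$, which can be counted column by column. The sketch below is an illustration with names chosen here; the last printed column is the error divided by $\sqrt{x}$, which the theorem says stays bounded.

```python
import math


def lattice_points_in_circle(x):
    """Number of integer pairs (u, v) with u^2 + v^2 <= x."""
    r = math.isqrt(x)
    # For each u there are 2*floor(sqrt(x - u^2)) + 1 admissible values of v.
    return sum(2 * math.isqrt(x - u * u) + 1 for u in range(-r, r + 1))


for x in (10**2, 10**4, 10**6):
    n = lattice_points_in_circle(x)
    print(x, n, (n - math.pi * x) / math.sqrt(x))
```

In practice the normalized error is far smaller than the bound proved here, in line with the sharper estimates discussed later in the chapter.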

Another proof of the theorem is the following: Clearly $\sum_{0 \le n \le x} r(n)$ is the number of pairs of integers $u$, $v$ satisfying $u^2 + v^2 \le x$. In other words the sum is the number of lattice points inside the circle centred at the origin with radius $\sqrt{x}$. This circle has area $\pi x$. We partition the plane into unit squares with orthogonal lines passing through the lattice points. To each point $(u, v)$ in our circle we assign the square whose four corners have the coordinates $(u, v)$, $(u + 1, v)$, $(u, v + 1)$, $(u + 1, v + 1)$. These squares must lie inside the circle $u^2 + v^2 = (\sqrt{x} + \sqrt{2})^2$ and they include the circle $u^2 + v^2 = (\sqrt{x} - \sqrt{2})^2$. Therefore

$$\pi(\sqrt{x} - \sqrt{2})^2 \le \sum_{0 \le n \le x} r(n) \le \pi(\sqrt{x} + \sqrt{2})^2$$

and the required result follows at once. We observe that this second proof can be used as a proof for

$$1 - \frac13 + \frac15 - \frac17 + \cdots = \frac{\pi}{4}.$$

Concerning the problem of the number of lattice points inside a closed curve, the Czech mathematician M. V. Jarnik proved the following:

Theorem 9.2. Let $l \ge 1$ be the length of a rectifiable simple closed curve and let $A$ be the area of the region bounded by the curve. If $N$ is the number of lattice points inside the curve, then

$$|A - N| < l.$$

Proof (Steinhaus). We first prove the following two simple lemmas.

Lemma 1. Let $C$ be a rectifiable curve inside a unit square with the two end points on the boundary of the square. If $C$ crosses the two diagonals of the square, then its length must be at least 1.

Proof. If the two end points are on the opposite sides of the square, then the result follows at once. Suppose next that the two end points are on two adjacent sides of the square as shown in the diagram. It is easy to see that ...

A similar argument applies when the two end points are on the same side of the square.

Lemma 2. Let $C$ be a rectifiable curve inside a unit square with the two end points on the boundary of the square so that the square is partitioned into two regions. Suppose that $C$ does not pass through the centre of the square, and denote by $\Delta$ the region which does not contain the centre. Then the area of $\Delta$ must be less than the length of $C$.

Proof. We consider separately the cases shown in the following diagrams:

[Diagrams: five configurations of the curve $C$ in the unit square, with end points $\alpha$ and $\beta$.]

Let $A$ be the area of the region $\Delta$ and $l$ be the length of $C$. In the first two cases it is easy to see that every point of $C$ is of distance at most $l$ from the base line $\alpha\beta$ so that $\Delta$ must lie inside a rectangle with sides 1 and $l$ and hence $A < l$. In the remaining three cases we see from Lemma 1 that $l \ge 1$ and so $A < 1 \le l$.

We can now proceed to prove the theorem. Denote by $I$ the region inside the curve. We form a net of unit squares in the plane with the lines

$$x = m + \tfrac12, \qquad y = n + \tfrac12 \qquad (m, n = 0, \pm 1, \pm 2, \ldots).$$

Let Qb Q2,' .. , Qk be those squares which contain part of the boundary of I, let C i be the part of the curve in Qi' let Q i be the intersection of Qi and I, and define

{I,

N.= , 0,

if Q i contains a lattice point, otherwise.

We let Ai be the area of Qi, Ii the length of Ci, so that our theorem will follow if we can prove that IAi - Nil < 1;. Now the case when the whole of Ilies inside a Q follows at once since I;?; 1. We can assume therefore that Ci is made up of a number of sections of the curve and Qi is partitioned into regions DlS).


If the lattice point does not lie in any DlS) so that it lies on Ci, then Ni = 0, o < Ai < 1 and Ii ~ 1 so that our required result follows. If the lattice point lies inside a Dl S) we denote by AlS) the area of Dl S). If Dl S) is not in I, then Ni = 0, Ai ~ 1 - AlS); if DlS) is in I, then Ni = 1, 1 - Ai ~ 1 - AlS) and, from Lemma 2, we have 1 - AlS) < Ii' The theorem is proved. D

It is clear that Theorem 9.1 is an immediate consequence of Theorem 9.2.

Exercise 1. Find the asymptotic formula for the number of lattice points inside an ellipse centred at the origin.

Exercise 2. Prove that the number of lattice points inside the sphere $u^2 + v^2 + w^2 \le x$ is given by

$$\tfrac43\pi x^{3/2} + O(x).$$

Exercise 3. Generalize the previous exercise to a sphere in $n$ dimensions.

Exercise 4. Determine the order of $\sum_{n \le x} r^2(n)$.

Exercise 5. The number of lattice points inside the circle $u^2 + v^2 \le x$ with coprime coordinates is given by

$$\frac{6}{\pi}x + O(\sqrt{x}\log x).$$

6.10 Farey Sequence and Its Applications

The Farey sequence was discovered well over a hundred years ago, but its significance in number theory has been revealed only in modern times.

Definition 1. By the Farey sequence of order $n$ we mean the fractions in the interval from 0 to 1, whose denominators are $\le n$, arranged in ascending order of magnitude. That is, they are numbers of the form

$$\frac{a}{b}, \qquad 0 \le a \le b \le n, \qquad (a, b) = 1,$$

arranged into an increasing sequence. We denote by $\mathfrak{F}_n$ the Farey sequence of order $n$.

Example: $\mathfrak{F}_7$ is the sequence

$$\frac01, \frac17, \frac16, \frac15, \frac14, \frac27, \frac13, \frac25, \frac37, \frac12, \frac47, \frac35, \frac23, \frac57, \frac34, \frac45, \frac56, \frac67, \frac11.$$

The total number of fractions in $\mathfrak{F}_n$ is $1 + \sum_{m=1}^{n}\varphi(m)$. These fractions divide the interval $0 \le x \le 1$ into $\sum_{m=1}^{n}\varphi(m)$ parts, and $\mathfrak{F}_{n+1}$ is obtained from adding the $\varphi(n + 1)$ numbers

$$\frac{a}{n + 1}, \qquad (a, n + 1) = 1,$$

o 2, 1 ~ m ~ A 1/3, (a, m) = 1, k ~ 1. Suppose that M+m-l

S

=

L

{fix)},

x=M

where fix) has a continuous second derivative in M a 9 f'(M) =-+-,

m

(a,m)

m2 1

A

~

If"(x) I ~

~

x

~

= 1,

M

+m

191
I/A we see that f"(x) does not change sign. We can therefore assume without loss that/"(x) > O. Then we have ( m) (m m "fiM) - m 2 < I/I(y) < m "fiM) + m2

2

) + 21 m A"k ,

or mfiM) - 1 < I/I(y) < mj(M)

+ 1 + tk.

The result follows from taking c = mj(M) - 1 and h 11.1. D

= 2 + k/2 in Theorem

Theorem 11.3. Let $k \ge 1$ and let $f(x)$ have a continuous second derivative in $M \le x \le M + m$, and

$$\frac{1}{A} \le |f''(x)| \le \frac{k}{A}.$$

Then

$$S = \sum_{x=M}^{M+m-1}\{f(x)\} = \tfrac12 m + O(\Delta),$$

where
Proof We take 1: = A 1/3 , M = M 1. We see from Theorem 10.6 that there exist a 1 ,m,8 1 such that (7)

From Theorem 11.2 we have

M,+m,-1

L

x=M,

We next take M2 such that

8'

+ -.!..(k + 5), 2

+ m1 and again from Theorem 10.6 there exist a2, m2, 8 2

M1

=

I {fix)} = -ml 2

and

M2+m2-1

L

X=M2

I {fix)} = -m2

2

8'

+ ~(k + 5), 2

Continuing this way, if after s steps we have

o~ M + m -

I - Ms+l < 1:,

then

IS - t(m1 + ... + ms) - t(M + m - Ms+ 1)1 s

~ 2(k

or (since Ms+1 = M

I

+ 5) + 2(M + m -

M s + 1),

+ m1 + ... + ms) IS - tml < ts(k + 5) + t(1: + I).

(8)

We now have to estimate s. Suppose that 0 < q < 1:, (p, q) = 1. If p, q are given, we can estimate how many m1,'" ,ms are equal to q. From 1f"(x)1 > I/A and its

6.11 Vinogradov's Method of Estimating Sums of Fractional Parts


continuity we know thatf"(x) does not change sign. It follows that the set of values x satisfying I ---:;;;f'(x):;;;-+-

pip q

forms an interval. Let

Xl> X2

q1:

q

q1:

(9)

be any two points in the interval, so that

Hence X2

I

f

I

f"(t) dt
0, R(x)

= nx + O(x~+e).

(See Note 6.1.) A famous problem in number theory is the conjecture that

$$R(x) = \pi x + O(x^{\frac14 + \varepsilon}).$$

We require the following result for the proof of Theorem 12.1.

Theorem 12.2. Let $f(x)$ have a continuous second derivative in the interval $Q \le x \le R$, and let

$$\sigma(x) = \int_0^x\bigl(\tfrac12 - \{t\}\bigr)\,dt.$$

135

6.12 Application of Vinogradov's Theorem to Lattice Point Problems

Then R

I

f(x)

=

ff(X) dx

+ (t -

{R})f(R) -

(t -

{Q})f(Q) - (f(R)f'(R)

Q<x':;R Q

R

+ (f(Q)f'(Q) + f

(f(X)f"(x) dx.

Q

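Proof. Let $x_1$ be an integer, $Q \le \alpha < \beta \le R$, $x_1 < \alpha < \beta < x_1 + 1$. From integration by parts we have

$$-\int_\alpha^\beta f(x)\,dx = \int_\alpha^\beta f(x)\,\frac{d}{dx}\Bigl(\tfrac{1}{2} - \{x\}\Bigr)dx = \bigl(\tfrac{1}{2} - \{\beta\}\bigr)f(\beta) - \bigl(\tfrac{1}{2} - \{\alpha\}\bigr)f(\alpha) - \sigma(\beta)f'(\beta) + \sigma(\alpha)f'(\alpha) + \int_\alpha^\beta \sigma(x)f''(x)\,dx. \tag{1}$$

Letting $\alpha \to x_1$, $\beta \to x_1 + 1$ we have

$$-\int_{x_1}^{x_1+1} f(x)\,dx = -\tfrac{1}{2}f(x_1 + 1) - \tfrac{1}{2}f(x_1) + \int_{x_1}^{x_1+1} \sigma(x)f''(x)\,dx.$$

From this it follows that

$$-\int_{[Q]+1}^{[R]} f(x)\,dx = -\sum_{[Q]+1 \le x \le [R]} f(x) + \tfrac{1}{2}f([Q]+1) + \tfrac{1}{2}f([R]) + \int_{[Q]+1}^{[R]} \sigma(x)f''(x)\,dx. \tag{2}$$

If in (1) we let $\alpha = Q$, $\beta \to [Q] + 1$, then

$$-\int_Q^{[Q]+1} f(x)\,dx = -\tfrac{1}{2}f([Q]+1) - \bigl(\tfrac{1}{2} - \{Q\}\bigr)f(Q) + \sigma(Q)f'(Q) + \int_Q^{[Q]+1} \sigma(x)f''(x)\,dx. \tag{3}$$

Similarly we have

$$-\int_{[R]}^{R} f(x)\,dx = \bigl(\tfrac{1}{2} - \{R\}\bigr)f(R) - \tfrac{1}{2}f([R]) - \sigma(R)f'(R) + \int_{[R]}^{R} \sigma(x)f''(x)\,dx. \tag{4}$$

The required formula is obtained by adding (2), (3) and (4). □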
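The summation formula of Theorem 12.2 can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names (`frac`, `sigma`, `simpson`, `rhs`, `lhs`) and the test function $f(x) = x^3$ are illustrative assumptions. It uses the closed form $\sigma(x) = \{x\}(1 - \{x\})/2$, which follows from the definition of $\sigma$, and integrates $\sigma f''$ piece by piece between consecutive integers with Simpson's rule, which is exact here because the integrand is a cubic on each piece.

```python
from math import floor

def frac(x):
    """Fractional part {x}."""
    return x - floor(x)

def sigma(x):
    """sigma(x) = int_0^x (1/2 - {t}) dt = {x}(1 - {x})/2, which vanishes at integers."""
    t = frac(x)
    return t * (1 - t) / 2

def simpson(g, a, b):
    """Simpson's rule on [a, b]; exact for polynomials of degree <= 3."""
    return (b - a) / 6 * (g(a) + 4 * g((a + b) / 2) + g(b))

def rhs(f, df, d2f, F, Q, R):
    """Right-hand side of Theorem 12.2; F is an antiderivative of f (assumed supplied)."""
    # Split [Q, R] at the integers strictly inside, so sigma is a polynomial on each piece.
    cuts = [Q] + [float(n) for n in range(floor(Q) + 1, floor(R) + 1) if Q < n < R] + [R]
    tail = sum(simpson(lambda x: sigma(x) * d2f(x), a, b) for a, b in zip(cuts, cuts[1:]))
    return (F(R) - F(Q)
            + (0.5 - frac(R)) * f(R) - (0.5 - frac(Q)) * f(Q)
            - sigma(R) * df(R) + sigma(Q) * df(Q)
            + tail)

def lhs(f, Q, R):
    """Sum of f(x) over integers x with Q < x <= R."""
    return sum(f(n) for n in range(floor(Q) + 1, floor(R) + 1))

# Illustrative choice: f(x) = x^3 with its derivatives and an antiderivative.
f = lambda x: x ** 3
df = lambda x: 3 * x ** 2
d2f = lambda x: 6 * x
F = lambda x: x ** 4 / 4

print(lhs(f, 0.3, 5.7), rhs(f, df, d2f, F, 0.3, 5.7))
```

For functions whose second derivative is not linear, Simpson's rule on each unit piece is only an approximation, so a finer quadrature would be needed.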

Proof of Theorem 12.1. By considering the diagram associated with the circle problem it is easy to see that R(x)

I

= I + 4[Jx] + 8

[Jx - u2 ]

x

-

4[

0 0, then the formula R(x) = nx

+ O(xi -')

does not hold. Actually we shall prove a very general result. In this section K, Kb K 2 , K3 represent absolute constants. At various places we may use the same symbol to denote different constants, but this should not cause any confusion. Let c> °and let ai, a2,'" be integers satisfying °Theorem : ;:; al ::;:; a213.1::;:; (Erdos-Fuchs). .. '. Let fin) denote the number of solutions to the equation ai

+ aj =

n, and r(x) =

I

f(n)

so that r(x) is the number of pairs of integers ah aj satisfying ai formula cannot hold.

+ aj ::;:; x.

Then the

139

6.13 D-Results

We shall first deal with the following auxiliary results. Theorem 13.2. Let an be real numbers such that co n= -

converges uniformly, and that

00

I:'= -co a; converges.

Then

1t

-1t

Proof Clearly we have co

co

I

1t/I(.9W =

I

anamei(n-m)8.

n=-oo m=-oo

The required result follows from integrating term by term over - n to n.

°

I:"

Theorem 13.3. Let bn ~ and let q>(z) = 0( < n, z = re i8 (0 < r < I), then we have

°
(zW d.9

~~ 6n

-at

1q>(zW d.9.

-1t

Proof We introduce the function q(.9) =

{I -I~I, 0,

when

1.91

when

0(

~

0(,

< 1.91

~

n.

Then we have

f at

f 1t

1q>(zW d.9

-at

~

f 1t

Iq(.9WIq>(zW d.9 =

m,~

1

bnbmrn+m

-1t

Iq(.9)1 2 ei(n-m)8 d.9.

-1t

When m =I n, we have a

1t

o

-1t

=

4 O(n - m)2

(

0

1-

sin(n - m)O() O(n - m)

~

0,



while when m = n,

f "

Iq(.9)12 d.9

=

23!Y. ,

-1[

and therefore we have

-a

-n

Theorem 13.4. Suppose that

Izl < 1 and let co

n=O

Then there exist constants c, C such that Yn O< c < ----;:-=t (l-r)-t

(1 -

where 61(r) --+ 0 as r --+ 1. In the first sum there are at most r)-t terms, each of which is at most (1 - r)-i, so that the sum is at most (1 - r)-!. From Theorem 13.4 the second sum is

1 1- r

:::; 6(r)10g-1--(1 - r)-i.

Together we have 00

I

bnrn:::; K(1 - r)-~

1

+ 6(r)10g-1--(1

- r)-i

1- r

n= 1

o

= O(10g - 1_1_(1 - r)-i). 1- r

Theorem 13.6. Let $f(x)$ and $g(x)$ be two continuous real functions in the interval $(a, b)$. Then

$$\Bigl|\int_a^b f(x)g(x)\,dx\Bigr| \le \Bigl(\int_a^b f^2(x)\,dx \int_a^b g^2(x)\,dx\Bigr)^{1/2}.$$

Proof. Let $\lambda$ be any real number and consider

$$\lambda^2 \int_a^b f^2(x)\,dx + 2\lambda \int_a^b f(x)g(x)\,dx + \int_a^b g^2(x)\,dx = \int_a^b \bigl(\lambda f(x) + g(x)\bigr)^2\,dx \ge 0.$$

The discriminant of the quadratic expression cannot be positive and so the theorem follows. □

Proof of Theorem 13.1. Suppose that

t < r < 1, z = reiiJ., 1 -

r < oc < n12. Let

00

so that we have at once 00

g2(Z)

=

I

f(n)z"

"=0

and 00

(1 - z) -lg2(Z)

I

=

r(n)z".

"=0

If formula (1) holds, then 00

(1 - Z)-lg2(Z) = c

I

nz"

+ h(z)

"=0

= cz(1 - Z)-2 + h(z),

(2)

where 00

I

h(z) =

v"z",

"=0

We shall now derive a contradiction. From (2) we have

f e
3. There cannot be any real primitive character either if I = 1. For if m = 2m', 2,rm', then from n

== n' (mod m'),

(n,m) = 1,

(n',m) = 1


we deduce that n == n' (mod m) giving x(n) = x(n') so that x(n) is improper. Summarizing, the possibility for the existence of real primitive character occurs when

where Pi are distinct odd primes and a = 0,2,3. Moreover, if the character is primitive, then Cv = q>(p)/2 or

(~).

(x(n,p))"Hp-l) = e"iindn =

Thus, if a = 0, then the real primitive character is the Jacobi symbol (n,m)

=

1.

If a = 2, then the real primitive character is n-l ( n )

(- 1)-2- m/4 '

and if a

=

(n,m) = 1,

3, then there are two types of real primitive character:

)~n2 - (m~8) ,

(- 1

1)

n - 1 n - 1 ( __ n ) (_ 1)-2-+-82

m/8

= (_

(n,m)

1)~(n-2)2-9)

=

1,

( _n ) ,

m/8

(n,m)

=

1.

7.4 Character Sums

Let

$$S(a, \chi) = \sum_{n=1}^{m} \chi(n)\,e^{2\pi i a n/m}.$$

Theorem 4.1. Let $(m_1, m_2) = 1$ and let $\chi$ be factorized into

$$\chi(n) = \chi_1(n)\chi_2(n),$$

where $\chi_1(n)$ is a character mod $m_1$ and $\chi_2(n)$ is a character mod $m_2$. Then

$$S(a, \chi) = \chi_1(m_2)\chi_2(m_1)\,S(a, \chi_1)\,S(a, \chi_2).$$

Proof. Let $n = m_1 n_2 + m_2 n_1$. Then as $n_1, n_2$ run over the complete sets of residues mod $m_1$, mod $m_2$ respectively, $n$ runs over the complete set of residues mod $m_1 m_2$. Therefore

$$S(a, \chi) = \chi_1(m_2)\chi_2(m_1) \sum_{n_1=1}^{m_1} \sum_{n_2=1}^{m_2} \chi_1(n_1)\chi_2(n_2)\,e^{2\pi i a(m_1 n_2 + m_2 n_1)/m_1 m_2} = \chi_1(m_2)\chi_2(m_1)\,S(a, \chi_1)\,S(a, \chi_2).$$

Thus the study of character sums mod $m$ is reduced to that of character sums to a prime power modulus.
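The factorization of Theorem 4.1 can be tested on a small example. The sketch below is illustrative only: it assumes the moduli $m_1 = 5$, $m_2 = 7$ and takes $\chi_1, \chi_2$ to be the Legendre symbols mod 5 and mod 7, so that $\chi = \chi_1\chi_2$ is a character mod 35. The function names are hypothetical.

```python
import cmath

def legendre(n, p):
    """Legendre symbol (n/p) for an odd prime p, viewed as a real character mod p."""
    r = pow(n % p, (p - 1) // 2, p)   # Euler's criterion: r is 0, 1 or p - 1
    return -1 if r == p - 1 else r

def char_sum(a, chi, m):
    """S(a, chi) = sum_{n=1}^{m} chi(n) e^{2 pi i a n / m}."""
    return sum(chi(n) * cmath.exp(2j * cmath.pi * a * n / m) for n in range(1, m + 1))

# Assumed example moduli and characters (not taken from the text).
m1, m2 = 5, 7
chi1 = lambda n: legendre(n, m1)
chi2 = lambda n: legendre(n, m2)
chi = lambda n: chi1(n) * chi2(n)

def factorization_error(a):
    """|S(a, chi) - chi1(m2) chi2(m1) S(a, chi1) S(a, chi2)|; should be ~0."""
    lhs = char_sum(a, chi, m1 * m2)
    rhs = chi1(m2) * chi2(m1) * char_sum(a, chi1, m1) * char_sum(a, chi2, m2)
    return abs(lhs - rhs)

print(max(factorization_error(a) for a in range(1, 36)))
```

The printed maximum should be at the level of floating-point rounding error.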

Theorem 4.2. Let $m = p^l$. If $p \mid a$ and $\chi$ is a primitive character, or if $p \nmid a$ and $\chi$ is an improper character (but we exclude the case $l = 1$, $\chi = \chi_0$), then

$$S(a, \chi) = 0.$$

Proof. We make the substitution

$$n = x(1 + p^{l-1}y).$$

When $1 \le x \le p^{l-1}$, $p \nmid x$ and $1 \le y \le p$, the number $n$ runs over the reduced residue system mod $p^l$, and conversely. Therefore

$$S(a, \chi) = \sum_{\substack{x=1 \\ p \nmid x}}^{p^{l-1}} \chi(x)\,e^{2\pi i a x/p^l} \sum_{y=1}^{p} \chi(1 + p^{l-1}y)\,e^{2\pi i a x y/p}.$$

If $\chi(n)$ is improper, then $\chi(1 + p^{l-1}y) = 1$, so that

$$S(a, \chi) = \begin{cases} 0, & \text{if } p \nmid a, \\ p \displaystyle\sum_{x=1}^{p^{l-1}} \chi(x)\,e^{2\pi i a x/p^l}, & \text{if } p \mid a. \end{cases}$$

If $\chi(n)$ is primitive, then there exists $u$ such that $\chi(1 + p^{l-1}u) \ne 1$; now $p \mid a$ and so from

$$\chi(1 + p^{l-1}u) \sum_{y=1}^{p} \chi(1 + p^{l-1}y) = \sum_{y=1}^{p} \chi(1 + p^{l-1}(y + u)) = \sum_{y=1}^{p} \chi(1 + p^{l-1}y),$$

we have

$$\sum_{y=1}^{p} \chi(1 + p^{l-1}y) = 0.$$

Therefore $S(a, \chi) = 0$ also. □

We shall write $\tau(\chi) = S(1, \chi)$.


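If $(a, m) = 1$, then

$$\chi(a)S(a, \chi) = \sum_{n=1}^{m} \chi(an)\,e^{2\pi i a n/m} = S(1, \chi).$$

Theorem 4.3. Let

$$c_q(n) = \sum_{(a, q) = 1} e^{2\pi i a n/q},$$

where $a$ runs over a reduced set of residues mod $q$. Then

1) $c_q(n)$ is a multiplicative function of $q$; that is if $(q_1, q_2) = 1$, then $c_{q_1}(n)c_{q_2}(n) = c_{q_1 q_2}(n)$;

2)
$$c_{p^l}(n) = \begin{cases} p^l - p^{l-1}, & \text{if } p^l \mid n, \\ -p^{l-1}, & \text{if } p^l \nmid n,\ p^{l-1} \mid n, \\ 0, & \text{if } p^{l-1} \nmid n; \end{cases}$$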

3)

Proof. 1) can be proved by the substitution $a = q_1 a_2 + q_2 a_1$ with the familiar method described earlier. 2) follows from

$$c_{p^l}(n) = \sum_{a=1}^{p^l} e^{2\pi i a n/p^l} - \sum_{a=1}^{p^{l-1}} e^{2\pi i a n/p^{l-1}}.$$

3) follows from 1) and 2). □

Theorem 4.4. If $\chi(n)$ is a primitive character, then

$$|\tau(\chi)| = \sqrt{m}.$$

Proof. First consider the case $m = p^l$. We have easily

$$|\tau(\chi)|^2 = \tau(\chi)\overline{\tau(\chi)} = \sum_{n=1}^{p^l} \chi(n)\,e^{2\pi i n/p^l} \sum_{q=1}^{p^l} \bar{\chi}(q)\,e^{-2\pi i q/p^l} = \sum_{n=1}^{p^l} \chi(n)\,e^{2\pi i n/p^l} \sum_{q=1}^{p^l} \bar{\chi}(nq)\,e^{-2\pi i n q/p^l} = \sum_{q=1}^{p^l} \bar{\chi}(q) \sum_{\substack{n=1 \\ p \nmid n}}^{p^l} e^{2\pi i (1-q)n/p^l}.$$

If $p^{l-1} \nmid (q - 1)$, then from Theorem 4.3, the inner sum on the right hand side in the above is 0. We need therefore only examine the situation when $p^{l-1} \mid (q - 1)$, that is $q = 1 + p^{l-1}u$, $0 \le u \le p - 1$. But now clearly

$$|\tau(\chi)|^2 = p^l - p^{l-1} - \sum_{u=1}^{p-1} \bar{\chi}(1 + p^{l-1}u)\,p^{l-1} = p^l - p^{l-1} \sum_{u=1}^{p} \bar{\chi}(1 + p^{l-1}u).$$

Now if $\chi(n)$ is primitive, then there exists $v$ such that $\chi(1 + p^{l-1}v) \ne 0, 1$ so that $\bar{\chi}(1 + p^{l-1}v) \ne 0, 1$. From

$$\bar{\chi}(1 + p^{l-1}v) \sum_{u=1}^{p} \bar{\chi}(1 + p^{l-1}u) = \sum_{u=1}^{p} \bar{\chi}(1 + p^{l-1}(u + v)) = \sum_{u=1}^{p} \bar{\chi}(1 + p^{l-1}u),$$

we have

$$\sum_{u=1}^{p} \bar{\chi}(1 + p^{l-1}u) = 0.$$

Therefore the case $m = p^l$ is proved, and the general case follows at once from Theorem 4.1. □
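Parts 1) and 2) of Theorem 4.3 and the modulus in Theorem 4.4 lend themselves to a direct numerical check. The following sketch is an illustration under stated assumptions: the function names are hypothetical, the sums are evaluated by brute force, and Theorem 4.4 is only tested for the Legendre character modulo a few odd primes (a primitive character mod $p$).

```python
import cmath
from math import gcd

def ramanujan_sum(q, n):
    """c_q(n): sum of e^{2 pi i a n / q} over 1 <= a <= q with (a, q) = 1."""
    total = sum(cmath.exp(2j * cmath.pi * a * n / q)
                for a in range(1, q + 1) if gcd(a, q) == 1)
    return round(total.real)   # the sum is an integer; rounding removes float noise

def prime_power_formula(p, l, n):
    """Closed form 2) of Theorem 4.3 for c_{p^l}(n)."""
    if n % p ** l == 0:
        return p ** l - p ** (l - 1)
    if n % p ** (l - 1) == 0:
        return -p ** (l - 1)
    return 0

def legendre(n, p):
    """Legendre symbol (n/p) for an odd prime p."""
    r = pow(n % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def tau_legendre(p):
    """tau(chi) = S(1, chi) for the Legendre character mod an odd prime p."""
    return sum(legendre(n, p) * cmath.exp(2j * cmath.pi * n / p) for n in range(1, p + 1))

# Multiplicativity in q for the coprime pair (4, 9), and |tau(chi)|^2 for p = 7.
print(ramanujan_sum(4, 6) * ramanujan_sum(9, 6), ramanujan_sum(36, 6))
print(abs(tau_legendre(7)) ** 2)
```

The first line should print two equal integers, and the second a value close to 7.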

We see therefore that

$$\tau(\chi) = \varepsilon\sqrt{m}, \qquad |\varepsilon| = 1.$$

However, the determination of $\varepsilon$ is no easy matter. For real primitive characters we know much more and in the next section we shall determine $\varepsilon$ when $\chi$ is a real primitive character.

Theorem 4.5. Let $\chi$ be a real primitive character. Then, for odd $m$, we have

$$\tau(\chi) = \begin{cases} \pm\sqrt{m}, & \text{if } m \equiv 1 \pmod 4, \\ \pm i\sqrt{m}, & \text{if } m \equiv 3 \pmod 4. \end{cases}$$

Proof. This is similar to the proof of Theorem 4.4. If $m = p$, then

$$(\tau(\chi))^2 = \sum_{q=1}^{p} \chi(q) \sum_{n=1}^{p-1} e^{2\pi i (1+q)n/p} = \chi(-1)p.$$

We already have

$$\chi(-1) = \Bigl(\frac{-1}{p}\Bigr) = (-1)^{\frac{p-1}{2}},$$

so that the theorem follows. □

7.5 Gauss Sums

The trigonometric sum

$$S(n, m) = \sum_{x=0}^{m-1} e^{2\pi i x^2 n/m}, \qquad (n, m) = 1,$$

is the famous Gauss sum. In this formula the summation can be taken over any complete set of residues mod $m$.

Theorem 5.1. If $(m, m') = 1$, then

$$S(n, mm') = S(nm', m)\,S(nm, m').$$

Proof. Let $x = my + m'z$. Then

$$S(n, mm') = \sum_{x=1}^{mm'} e^{2\pi i x^2 n/mm'} = \sum_{y=1}^{m'} \sum_{z=1}^{m} e^{2\pi i n(my + m'z)^2/mm'} = \sum_{y=1}^{m'} e^{2\pi i m n y^2/m'} \sum_{z=1}^{m} e^{2\pi i m' n z^2/m},$$

and hence the result. □
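The multiplicative relation of Theorem 5.1 is easy to confirm by brute force. The sketch below assumes a hypothetical helper `gauss_sum` and the example values $n = 2$, $m = 9$, $m' = 7$; it reduces $x^2 n$ modulo $m$ before exponentiating only to limit rounding error.

```python
import cmath

def gauss_sum(n, m):
    """S(n, m) = sum_{x=0}^{m-1} e^{2 pi i x^2 n / m}."""
    # e^{2 pi i k / m} depends only on k mod m, so reduce x^2 n first.
    return sum(cmath.exp(2j * cmath.pi * ((x * x * n) % m) / m) for x in range(m))

# Assumed example: n = 2, m = 9, m' = 7 (pairwise coprime).
n, m, m_prime = 2, 9, 7
left = gauss_sum(n, m * m_prime)
right = gauss_sum(n * m_prime, m) * gauss_sum(n * m, m_prime)
print(abs(left - right))
```

The printed difference should be at the level of rounding error. The same helper can be used to inspect individual values such as `gauss_sum(1, p)` for a small odd prime `p`.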

We see that in order to evaluate a Gauss sum we need only deal with the case m=pl. Theorem 5.2. Let

b= {

1, 2,

when p is an odd prime, when p = 2.

Then, for 1 ~ 2b, we have

Proof Let x = y

+ p'-bZ. Then, from

2(1- b) ~ I, we have

y= 1 z= 1 pl-d

=

L

pd

L e41tiyzn/pd

e21tiy2n/pl.

y=l

z=l

pl-c;

=

pb

L

e21tiy2n/pl

y=l ply

p'-d-l

=

pb

L

e21tix2n/pl-2.

x=l

When p > 2, this is what is required. When p pl- 3

P

L

x=l

the result also follows.

D

=

2, then from

pl- 2 e21tix2n/pl-2

=

L

x=l

e21tix2n/pl->,



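From this theorem we see that the crucial points in the evaluation of a Gauss sum rest on the determination of

$$S(n, 2), \quad S(n, 4), \quad S(n, 8) \quad \text{and} \quad S(n, p),$$

$p$ an odd prime.

Theorem 5.3. If $2 \nmid n$, then

$$S(n, 2) = 0, \qquad S(n, 4) = 2(1 + i^n), \qquad S(n, 8) = 4e^{\frac{\pi i}{4}n}.$$

Proof. Clearly we have

$$S(n, 2) = 1 + e^{\frac{2\pi i}{2}n} = 1 - 1 = 0,$$

$$S(n, 4) = 1 + e^{\frac{2\pi i}{4}n} + e^{\frac{2\pi i}{4}4n} + e^{\frac{2\pi i}{4}9n} = 1 + i^n + 1 + i^n = 2(1 + i^n),$$

$$S(n, 8) = 2\bigl(1 + e^{\frac{2\pi i}{8}n} + e^{\frac{2\pi i}{8}4n} + e^{\frac{2\pi i}{8}9n}\bigr) = 4e^{\frac{\pi i}{4}n}.$$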

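Theorem 5.4. If $p$ is an odd prime, then

$$S(n, p) = \Bigl(\frac{n}{p}\Bigr)S(1, p) = \Bigl(\frac{n}{p}\Bigr)\tau(\chi).$$

Here $\chi(a) = \bigl(\frac{a}{p}\bigr)$.

Proof. The number of solutions to the congruence

$$x^2 \equiv u \pmod p$$

is $1 + \bigl(\frac{u}{p}\bigr)$, and therefore

$$\sum_{x=1}^{p} e^{2\pi i x^2 n/p} = \sum_{u=1}^{p} \Bigl(1 + \Bigl(\frac{u}{p}\Bigr)\Bigr)e^{2\pi i u n/p} = \sum_{u=1}^{p} \Bigl(\frac{u}{p}\Bigr)e^{2\pi i u n/p} = \Bigl(\frac{n}{p}\Bigr) \sum_{v=1}^{p} \Bigl(\frac{v}{p}\Bigr)e^{2\pi i v/p},$$

which is the required result. □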

Theorem 5.5.

$$S(1, p) = \begin{cases} \sqrt{p}, & \text{if } p \equiv 1 \pmod 4, \\ i\sqrt{p}, & \text{if } p \equiv 3 \pmod 4. \end{cases}$$

Proof. From the above theorem and Theorem 4.5 we have

$$S(1, p) = \begin{cases} \pm\sqrt{p}, & \text{if } p \equiv 1 \pmod 4, \\ \pm i\sqrt{p}, & \text{if } p \equiv 3 \pmod 4, \end{cases}$$

which, combining into a single formula, gives

$$\tfrac{1}{2}(1 + i^p)(1 - i)S(1, p) = \pm\sqrt{p}.$$

If we can prove that

$$\Re\{\tfrac{1}{2}(1 + i^p)(1 - i)S(1, p)\} > -\sqrt{p},$$

where $\Re\{x\}$ represents the real part of $x$, then the theorem will follow. Now it is easy to see that

$$S(1, p) - 1 = \sum_{x=1}^{p-1} e^{2\pi i x^2/p} = \sum_{x=1}^{\frac{1}{2}(p-1)} \bigl(e^{2\pi i x^2/p} + e^{2\pi i (p-x)^2/p}\bigr) = 2\sum_{x=1}^{\frac{1}{2}(p-1)} e^{2\pi i x^2/p}. \tag{1}$$

Let $f(x)$ be any function. Then

$$\sum_{x=1}^{\frac{1}{2}(p-1)} f(x) + \sum_{x=1}^{\frac{1}{2}(p-1)} f\Bigl(\frac{p}{2} - x\Bigr) = \sum_{x=1}^{p-1} f\Bigl(\frac{x}{2}\Bigr).$$

This formula clearly holds because the first term on the left hand side is merely the sum of those terms on the right hand side when $x$ is even, and the second term is the sum on the right hand side when $x$ is odd. We take $f(x) = e^{2\pi i x^2/p}$ and note that $f(\frac{p}{2} - x) = i^p e^{2\pi i x^2/p}$. Then, from (1), we have

$$\tfrac{1}{2}(1 + i^p)(S(1, p) - 1) = \sum_{x=1}^{p-1} e^{2\pi i x^2/4p} = W + Z, \tag{2}$$

where

$$W = \sum_{x \le \sqrt{p}} e^{2\pi i x^2/4p}, \qquad Z = \sum_{\sqrt{p} < x \le p-1} e^{2\pi i x^2/4p}. \tag{3}$$

From (2) we have

$$\tfrac{1}{2}(1 + i^p)(1 - i)S(1, p) - \tfrac{1}{2}(1 + i^p)(1 - i) = (1 - i)(W + Z).$$

Since $\Re\{\tfrac{1}{2}(1 + i^p)(1 - i)\}$ is 1 or 0, it follows that

$$\Re\{\tfrac{1}{2}(1 + i^p)(1 - i)S(1, p)\} \ge \Re\{(1 - i)(W + Z)\} \ge \Re\{(1 - i)W\} - \sqrt{2}\,|Z|. \tag{4}$$

From $\cos x + \sin x \ge 1$ when $0 \le x \le \pi/2$, we deduce that

$$\Re\{(1 - i)W\} = \sum_{x \le \sqrt{p}} \Bigl(\cos\frac{\pi x^2}{2p} + \sin\frac{\pi x^2}{2p}\Bigr) \ge [\sqrt{p}] \ge \tfrac{1}{2}\sqrt{p}. \tag{5}$$

Jp On the other hand, if we write in Z, x:S;;

nx 2p

= cosec-,

Wx

then (6)

Therefore, from (3) and (6) we have p-l

L

2iZ =

x~q+

(v x -

Vx -

dwx,

1

that is

21Z1

=

Pil

I

viwx -

+ Vp-lWp -

Wx + l )

VqWq+ll

x~q+l

p-l

I

~

(Wx -

Wx+l)

+ Wp + W q + l = 2wq + l

x~q+l

r:

2p q+l

~--~2vp

(because

Wx

(7)

is decreasing). From (4), (5) and (7) we finally have

The theorem is therefore proved.

0

Summarizing we have the following result: Theorem 5.6. If m is odd, then

S(n,m)=

{ (:)fo, fo, .(n)

if m == 1 (mod 4), if m == 3

(mod 4).

1 -

m

Proof We use induction on the number of distinct prime divisors of m. If m = pi, then we have by Theorems 5.2 and 5.4, that I

S(n,p) =

{'

p2,

if 21/,

pt(l-1)S(n,p) =

(~)pt X2 have the moduli PI, m' respectively, and x(n) = XI(n)X2(n). Therefore, from Theorem 3.6.4 and the induction hypothesis, we have

{fi:} ifi: . {P} iP ~~ {fi:} {P} ifi: iP

r(x) = ( -m')(PI) - . PI m' = (-

1)

2

2'

•

== 1 (mod 4) if m == 3 (mod 4) 2) a = 2. Let m = 22m'. If m' = 1, then x(l) = 1, if m

{ Jp1m' =Fm, = iJplm' = iFm,

or

X(-l)=l,

or

X( - 1) = - 1.

X(3)

= - 1 so that

4

I

r(x) =

x(n)e21tin/4

= e21ti /4 - e61ti /4 = 2i.

n= I

If m' > 1, then from Theorem 4.1 and 1)

m'-1(4)

r(x) = (- 1)-2- m' 2i

.{P=i

== 1 (mod 4) if m' == 3 (mod 4) if m'

Fm, ip=Fm,

3) a

or

X( - 1)

= - 1,

or

X( - 1)

=

1.

= 3. Let m = 23 m'. When m' = 1, we have B

r(x)=

I

n= I

.

x(n)e 21t1n /B =

{e 21ti /B _ e61ti /B _ el 01ti/8 + eI41ti/8 = j8, if X( - 1) = 1,

.

.

.

.

e21t '/B + e6",/B - el 0",/8 - eI41t ,/8 = ij8, if X( - 1) =

Suppose that m' > 1. If x(n) = (- 1)t
=

IQ-lL

e27tina

I= 11 -

e27tiQa . 1 - e 27t1a

n=O

::::;

2

I

1

=---

11 - e 27tia l

Isin n!Y.1

1

~-

"" 2
(when 0 ::::; ~ ::::;

t, sin n~ ~ 2~, so that Isin n~1 ~ 2< 0).

Theorem 7.3. If2,(q, then

Im-lL

q-lL

I

e27tix2/q - m e27tix2/q ::::; x=o q x=o

Jq log q.

Proof Clearly we can assume that m ::::; q. From Theorem 7.1 we have

m-lL

x=O

e27tix2/q

q-lL q-l =m L =

e27tix2/qg(x)

x=o

e27tix2/q

1

q-l q-lL

+_L

qx=o

e27ti(x2+nx)/q

qn=lx=O

1

-27tinm/q - e . . 1-e 27t1n/q

From the formula for a Gauss sum we have

q-l Ix~o

e 27ti(x 2+nx)/q

I= Iq-l x~o

e27ti(X + tn)2/q 1* ::::;

so that

Iqil

x=o

e27tix2/q _ m

qi

1

e27tix2/q

I

q x=o

q-l ~-L-1

"" Jq n=l 2(~) *

Here

t represents the solution to the congruence 2x == 1 (mod q).

Jq,



I

~-

t(q-l)q

t(q-l)

I

I -=Jq I Jq n=l n n=l n < Jqt('I1)(_IOg(1 -~) + IOg(1 + ~)) n=l 2n 2n t(q-l)

= Jq

I n= 1

+ log(2n + I))

(-log(2n - I)

= Jqlogq.

0

Theorem 7.4 (polya). Let p be an odd prime, I character modp. Then

~

~

m

p, and X be a non-principal

I:t~ X(x) I< Jp logp. Proof From Theorem 7.1 we have

m-l p-l I x(x) = I x(x)g(x) x=o

x=o

m P-

=-

I

1

x(x)

Px=o

IP-l

+- I

x(x)

Px=O

p-l l_e-21tinm/p I e21tinx/p . n=l I-e 21t1n/p

From Theorem 2.3, Theorem 4.4 and Theorem 7.2 we have

m-l I JP-1II -_e-21tinm/p _21tin/p IIP-l I x(x)e21tinx/p I I I x(x) ~ - I x=O Pn=l I e x=O I p-l I ~ r.: I - ( ) < Jplogp. 0 V Pn =12 ~ p

This theorem has the following application: Theorem 7.5. Let p be an odd prime and dl(p - I). Then there is always a d-th power non-residue modp which is less than Jp logp.

Proof Let R represent a d-th power residue not exceeding m. Then R=

where X(x)

=

mid

I

d

m

I - I e 21tia ind x/d = - I I e 21tia ind x/d x=l da=l da=lX=l

e21tiindx/d. From Theorem 7.4, we have d-I r.: IR- dml 2. From Theorem 9.1 we have Jl(k)

0=

k

L -- L

Ih(p)l- 1

a

L L'

klp-1q>(k) u=1 a=O n=-a (u,k)= 1

e21tiuindn/\


where If means that we omit the term n = O. On the right hand side of this equation the term k = 1 is equal to Ih(p)l- 1

I

a=O

a

Ih(p)l- 1

n= -a

a=O

If 1 = I

2a = Ih(p)12 - Ih(p)l·

For those terms in which k -:f 1 we use Theorem 9.2, taking A

I

a

Ih(p)l-l [

=

Ih(p)1 - 1, so that

Ih( )1 2

a~o n~~a x(n) ~ Ih(p)lpt -

;

,

where

Therefore Ih(p)12 - Ih(p)1

~ (lh(P)lpt -

I 1J1((~)1 ({)(k)

2 Ih(P;1 )

p

= 2m (lh(P)lpt

_

klp-l ({)

Ih~;12).

That is Ih(p)1 ~

2mpt

+1

1 + 2m/p

t < 2mpt.

0

From Theorem 9.3 we immediately deduce: Theorem 9.4.

If p == 1 (mod 4),

then we have the primitive root

Proof We have to prove that Ih(p)1 is a primitive root. Suppose otherwise, so that - Ih(p)1 is now a primitive root. But Ih(pW

== 1

(modp),

I
1, pI II (kak, ... , 2a2, al), and that f.ll' ... ,f.lr are distinct roots of f'(x) == 0

o ~ x pl/4H, we have

INiH (~)I

< eH, P is the Kronecker symbol. He also used this to give an estimate for n2(p), where (11) p the least quadratic non-residue mod p, namely n2(p) = O(p±Je +e). Burgess's method can be generalized and extended to give estimates for the least primitive root h(p) and the least d-th power non-residue nip): h(p) = O(p±+') (see D. A. Burgess [13J and Y. Wang [62J), nip)=O(pl/A+,), A=4e 1 - 1 / d (d~2); nip) = O(pB), B = (log log d + 2)j410g d (d> e33 ) (see Y. Wang [63J). n=N+l

Chapter 8. On Several Arithmetic Problems Associated with the Elliptic Modular Function

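8.1 Introduction

The following four important functions frequently occur in the theory of elliptic modular functions:

$$q_0 = \prod_{n=1}^{\infty} (1 - q^{2n}), \qquad q_1 = \prod_{n=1}^{\infty} (1 + q^{2n}), \qquad q_2 = \prod_{n=1}^{\infty} (1 + q^{2n-1}), \qquad q_3 = \prod_{n=1}^{\infty} (1 - q^{2n-1}).$$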

Following the tradition in the theory of elliptic modular functions we use $q$ to represent the variable, which can be real or complex and which satisfies $|q| < 1$. The four infinite products then clearly converge. We do not give any deep discussion on the properties of the elliptic modular function in this chapter. Indeed we do not even define an elliptic modular function and instead we shall study the following associated arithmetic problems: the partition of integers, the sum of four squares, and the transformation of power series related to $q_0, q_1, q_2, q_3$. The problems of convergence arising in the chapter are very simple and any reader familiar with advanced calculus can easily supply the details. (In §8 we also use n-dimensional multiple integration.) We shall therefore omit all discussions on convergence in this chapter.

The following is the first and simplest relationship between $q_1, q_2, q_3$.

Theorem 1.1. If $|q| < 1$, then

$$q_1 q_2 q_3 = 1.$$

Proof. We have

$$q_2 q_3 = \prod_{n=1}^{\infty} (1 - q^{2(2n-1)}).$$

We rearrange the terms in $q_1$ by taking out all the powers of 2 from $2n$ giving

$$q_1 = \prod_{n=1}^{\infty} (1 + q^{2(2n-1)}) \prod_{n=1}^{\infty} (1 + q^{4(2n-1)}) \prod_{n=1}^{\infty} (1 + q^{8(2n-1)}) \cdots.$$

From this we see that

$$\begin{aligned} q_1 q_2 q_3 &= \prod_{n=1}^{\infty} (1 - q^{2(2n-1)}) \prod_{n=1}^{\infty} (1 + q^{2(2n-1)}) \prod_{n=1}^{\infty} (1 + q^{4(2n-1)}) \prod_{n=1}^{\infty} (1 + q^{8(2n-1)}) \cdots \\ &= \prod_{n=1}^{\infty} (1 - q^{4(2n-1)}) \prod_{n=1}^{\infty} (1 + q^{4(2n-1)}) \prod_{n=1}^{\infty} (1 + q^{8(2n-1)}) \cdots \\ &= \prod_{n=1}^{\infty} (1 - q^{8(2n-1)}) \prod_{n=1}^{\infty} (1 + q^{8(2n-1)}) \cdots = \cdots = 1. \qquad \Box \end{aligned}$$

The theorem can also be proved from the equation

$$q_0 q_1 q_2 q_3 = \prod_{n=1}^{\infty} (1 - q^n) \prod_{n=1}^{\infty} (1 + q^n) = \prod_{n=1}^{\infty} (1 - q^{2n}) = q_0.$$
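The identity $q_1 q_2 q_3 = 1$ can be observed numerically by truncating the four products. The sketch below is illustrative: the function name `truncated_products`, the cut-off of 200 factors and the sample values of $q$ are assumptions, and the truncation is only harmless because $|q|^{400}$ is negligible for the values chosen.

```python
def truncated_products(q, terms=200):
    """Return (q0, q1, q2, q3) with each infinite product cut off after `terms` factors."""
    q0 = q1 = q2 = q3 = 1.0
    for n in range(1, terms + 1):
        q0 *= 1 - q ** (2 * n)
        q1 *= 1 + q ** (2 * n)
        q2 *= 1 + q ** (2 * n - 1)
        q3 *= 1 - q ** (2 * n - 1)
    return q0, q1, q2, q3

for q in (0.1, 0.3, -0.5, 0.7):
    q0, q1, q2, q3 = truncated_products(q)
    # Theorem 1.1 predicts that the product is 1 up to truncation and rounding error.
    print(q, q1 * q2 * q3)
```

For $|q|$ close to 1 more factors would be needed before the truncated product settles.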

8.2 The Partition of Integers

Let $n$ be a positive integer. Any collection of positive integers whose sum is equal to $n$ is said to form a partition of $n$. For example:

$$5 = 4 + 1 = 3 + 2 = 3 + 1 + 1 = 2 + 2 + 1 = 2 + 1 + 1 + 1 = 1 + 1 + 1 + 1 + 1,$$

so that there are 7 partitions of 5. We denote by $p(n)$ the number of partitions of $n$, so that in the above example we have $p(5) = 7$. If we restrict to those partitions of $n$ in which each term in the partition does not exceed $r$, then we denote by $p_r(n)$ the number of such partitions. For example, $p_3(5) = 5$.

Theorem 2.1. If $|q| < 1$, then

$$1 + \sum_{n=1}^{\infty} p_r(n)q^n = \frac{1}{(1 - q)(1 - q^2) \cdots (1 - q^r)}.$$

Proof. The right hand side of the equation above is equal to

$$(1 + q + q^2 + \cdots + q^{x_1} + \cdots)(1 + q^2 + (q^2)^2 + \cdots + (q^2)^{x_2} + \cdots)(1 + q^3 + (q^3)^2 + \cdots + (q^3)^{x_3} + \cdots) \cdots (1 + q^r + (q^r)^2 + \cdots + (q^r)^{x_r} + \cdots),$$

and the coefficient of $q^n$ is the number of non-negative integer solutions to

$$x_1 + 2x_2 + 3x_3 + \cdots + r x_r = n,$$

which is $p_r(n)$. □
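The generating function in Theorem 2.1 translates directly into a way of computing $p_r(n)$: multiply the power series by one factor $1/(1 - q^k)$ at a time. The following is a minimal sketch under that reading; the function names are hypothetical.

```python
def partitions_with_parts_at_most(n, r):
    """p_r(n): number of partitions of n into parts each not exceeding r.

    Expands 1 / ((1 - q)(1 - q^2)...(1 - q^r)) up to q^n by multiplying in
    one geometric series 1 / (1 - q^k) at a time.
    """
    coeff = [1] + [0] * n              # the power series 1
    for k in range(1, r + 1):
        for j in range(k, n + 1):
            coeff[j] += coeff[j - k]   # in-place multiplication by 1 / (1 - q^k)
    return coeff[n]

def partitions(n):
    """p(n): parts larger than n cannot occur, so p(n) = p_n(n)."""
    return partitions_with_parts_at_most(n, n)

print(partitions(5), partitions_with_parts_at_most(5, 3))
```

The two printed values can be compared with $p(5) = 7$ and $p_3(5) = 5$ from the example in the text.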

We can prove similarly:

If Iql

or

x _ n -

(l - q2m-2n+2)(1 - q2m-2n+4) .. .. (1 - q2m) X q (l _ q2m+2n)(1 _ q2m+2n-2) ... (1 _ q2m+2) o· n2

From (3) we have (l - q4m)(1 - q4m-2) ... (1 _ q2m+2) Xo=------~----~------~~---

(l - q2)(l - q4) ... (1 _ q2m)

so that when 0

~ n ~ m -

,

1, n2

X n -

q X' (l _ q2)(1 _ q4) ... (1 _ q2m) n'

where X' n -

(1 - q2m-2n+2)(l _ q2m-2n+4) ... (1 _ q2m) (1 - q2m+2) ... (1 _ q4m) (1 _ q2m+2n)(l _ q2m+2n-2) ... (l _ q2m+2) (4)

It follows that (2) can be written as (1 - q2)(l - q4) ... (1 - q2m)({)m(z)

= X~ +

m

I

qn 2 (zn

+ z-n)x~.

(5)

n=l

As m --+ 00, X~ --+ 1 so that the identity in the theorem follows. However we still have to justify the process of taking the limit of the individual terms in the series. Let Uo. m

= X o, if 1 ~ n

~

if n > m,

m,


so that co

L un,m'

({)m(z) =

(6)

n=O

As m --+

00,

the term un,m --+ Un where (n > 0).

We have co

n (l + Iql2k) = Kl

IX~I
(using Theorem 6.6)

6(c/2)

which proves (3). 2) We next prove: Given any positive e there exists A (= A(e)) such that I p(n) > _e(C-s)n t . A

We use induction on n, but the choice of A will not be made clear until later. From Theorems 6.3 and 6.4 together with the induction hypothesis we see that (4)


8.6 Estimates for p(n)

Since e- X

;?;

I - x, the double sum is

I

;?;

Pk 2 ) I - -I (c - e) -,2 n'

le-t(c-e)lkn-t (

Ik';n

c-e

=Il--' I2 2n'

(say).

For any positive t we always have e- X

I

le-t(c-e)lkn-t

Ik>n

=

(5)

O(x- t ), so that

(ni I -h= o(n- I: I: =0

11

it (lk)

~~t)

Ik>n

h

11-itk-it)

1= 1 k= 1

= O(n- it ),

if t> 8.

(6)

From this and Theorem 6.6 we have 2n 1 n

II> 3(c -

e)

2n 2 n

C3Jn

2 -

2n 2 n

(I

I)

In

= 3c2 + -3- (c _ e)2 - c2 - C3 n (7)

(using

I

I

2

(c - e)

-2"=2 c

J x-

3

dx>2ec- 3 ).

c-e

On the other hand, by the binomial theorem and Theorem 6.5,

I2 = I

k 2 [3e- t (c-e)lkn-t

Ik';n n

~

co

I

I

k 2

k=1

1 3 e-·t(c-e)lkn-t

1=1 e - t(c -e)kn - t

n

~ 12

I

k 2

k=1

=0 ( n

(l-e

I

t(c

e)kn

1)4

t(c

e)kn

t)2 ).

n

k= 1

(l -

e

We divide the sum in the bracket into two parts: n

I = I k=1

k,.j~

+

I

j~ e-tcx,

o

which gives L _ (1 _ k ..

Jn

e-t~c-e)kn-t)2 = 0 (n

L k ..

Jn

:2) =

O(n).

In the second part t(e - e)kn- t ~ t(e - e) and

so that

From this and (8) we see that L2 = O(n2).

(9)

Collecting (4), (5), (7), (9) we have I np(n) > _e(c-e)n\(l A

+ 2ee- 1 )n -

e4Jn).

When

e4 )2 (2ee-

n> - -1 we have I

p(n) > _e(c-e)n t .

(10)

A

When n :::; e;(2ee- 1 )- 2 we take A large enough so that (10) holds. The theorem is proved. 0

8.7 The Problem of Sums of Squares Let r.(n) denote the number of sets of integer solutions (x h

xi + ... + x; = n. From Theorem 6.7.5 we already have r2(n)

=

L (- l)t(U-l), uln

... ,

x s ) to the equation


where U runs over the odd divisors of n. This theorem is clearly equivalent to the following: Theorem 7.1.

if Iql
2 n= 1 and 1

Ck

00

+I

= -Uk

2

I

00

Uk+1

1=1

+I

U,Uk+1 -

-

k- 1

I

2 , =1

1=1

U,Uk-I'

Now and so that

Theorem 7.6.

Proof In Theorem 7.5 we take 9

G+ n~o

n~o

U4n+ 1 -

1

1

00

16 1

= -

16 1

+- I 2

+ I (-

1)mc2m

m=l 00

nUn

+ I (-

n=l

l)mu2m (l

+ U2m

-

m)

m=l

00

00

I

= - +16

nUn

2 n=l 1 00

2 1

y

00

I

= - +-

U4n+3

= nl2 giving

(2m -

I)U2m-1

m=l

+ I (-

l)mU2m (l

+ U2m)

m=l

00

+2 I

(2m -

I)U4m-2

m=l

1

1

= - +-

00

I

00

(2m -

16 1

2 m=l 1 00

16

2n=1

=-+-

I

4.j'n

nUn'

I)U2m-1

+ I

m=l

0

(2m -

I)U4m-2

(by Theorem 4)



Theorem 7.2 now follows easily from Theorem 7.1 and Theorem 7.6.

From Theorem 7.2 we deduce at once:

Theorem 7.7. $r_4(n)/8$ is a multiplicative function. □

Theorem 7.8 (Lagrange). Every positive integer is the sum of four squares. □

Apart from these we also have the following application:

Theorem 7.9 (Jacobi). $q_2^8 - q_3^8 = 16 q q_1^8$.

If we substitute the representation formulae in §1 into this identity then we have

$$\Bigl\{\prod_{n=1}^{\infty} (1 + q^{2n-1})\Bigr\}^8 - \Bigl\{\prod_{n=1}^{\infty} (1 - q^{2n-1})\Bigr\}^8 = 16q\Bigl\{\prod_{n=1}^{\infty} (1 + q^{2n})\Bigr\}^8.$$

(Jacobi called this result "aequatio identica satis abstrusa".)

00

(qoq~)4

=

-L r4(n)(- l)nqn n=O

and (2qoqi)4 =

C=~oo qn(n+l)r,

we see that our required identity is equivalent to

Let s4(n) denote the number of solutions to (4)

where n must be odd. Thus our theorem has the following arithmetical interpretation: if n is odd, then s4(n) is equal to 2r4(n). We multiply equation (4) by 4 and from completing squares we have (2XI

+ 1)2 + ... + (2X4 + 1)2 = 4n.

The r4(4n) solutions to the Diophantine equation


have only two types: (i) Yt>Y2,Y3,Y4 all odd, (ii) Yt>Y2,Y3,Y4 all even. From this it follows that


I

r4(4n)=S

m=sI(m+2m)=3(sIm)=3r4(n), min

ml2n

min

and hence

The theorem is proved.

0

Exercise 1. Use the following method to prove that 1

I

1

n2

1

+ 22 + 3 2 + 4 2 + ... = 6".

Obtain the asymptotic formula

for the number A(x) of lattice points inside the four dimensional sphere

Find another representation for A(x) with Theorem 7.2 and compare the results. Note. From this exercise and (6.14.2) we deduce at once that

~ J1(n) = ~

L..

n=l

n

2

n

2·

Exercise 2. Show that

Exercise 3. Use the identity (1 - cosn.9) cot2 t.9

= (2n - 1) + 4(n - l)cos.9 + 4(n - 2) cos 2.9 + ...

+ 4cos(n to prove that

1).9 + cosn.9



I 21 {-cot -8 8 2

1 12

X

+ - + --(1 1-

+ -3- (3 1 X

1-

=

- cos 8)

X

- cos 38)

X3

2X2

+ --(1 1-

- cos 28)

X2

+ ... }2

(~cot2~8 + ~)2 + ~{~(5 + cos 8) + 2 8 2 12 12 1 - X 1-

3

3X

+ 13_

3

X3

(5

X

2

X

2

(5

+ cos 28)

+ cos 38) + ... } .

8.8 Density Let r,(n, q) denote the number of solutions to

xi+"'+x;=n

(modq).

(1)

Consider the substitution

Xi + ... + x; = y. There can be q' values on the left hand side and q values on the right hand side. This means that corresponding to one value of y there are, on average, q'-1 solutions. We now consider the ratio between the number of solutions and the average number A

(

LJqn

)

=

rln, q)

,-1'

q

Let

we call this the p-density of the congruence (1). We also define oo(n)

1

= lim~-o 2(j

r··f

dX1 ... dx.,

which we call the real density of the congruence (1). We now calculate the values of the various densities. Theorem 8.1. When s is even the real density is equal to

(2)


Proof We have, with polar coordinates, 21t

II

(l_x 2 _y 2)a- 1dxdy= I

1

d9I(l-p2)a-1PdP=~' o

o

We next use induction to prove the result: !...

, V=

dx "'dx 1-

xi - ... - x; > 0

n2

=--

'G}

1

Let

= Yv-2 JI

Xv

- xi - x~

(v=3,oo.,s).

Then

v, =

II

,-2

xi -

(l -

x~)-2-dx1

dX2

l-xi-xi > 0

= ~ V,_ 2 = (n:)/2 . -

-

2

,

2 .

We then have . I ( oo(n) = hmb-O

22 3 (2) n Also, from Theorem 8.1, we have



so that p>2

Ifn is odd, then the theorem is proved. Ifn is even, then, from Theorem 8.6, we have


Theorem 8.8.

D

If s = 8, then = 16( - l)n I

bs(n)

(- l)d d 3 .

din

Proof Let n = 2tn', 2,rn'. Then

}]2 op(n) = 1516 ,(4)1 n'-3u3(n') = 96n4n'-3 u3 (n'). Also, from Theorem 8.1, we have

n 3 oo(n) = _n 6' 4

so that oo(n)

n op(n) = 16 . 23tu3(n'). p>2

Also o2(n)

=

(l - 2- 3 (t+l). 15)(1 - t)-I;

hence When n is even

I (- l)dd 3 =

- u3(n') + 23u3(n') + 23.2 u3(n') + ... + 2 3 . uj(n') t

din

= -


2U3(n')

+

23 (t+ 1) _ 1 23 _ 1 u3(n')

D

Exercise 1. Prove the following: Let s = 2r. If r is even, then


If r is odd, then L(r)6s(2 tn')

= (( ~, 1) + ( -r 1)2(1-r)(d 1))n'l-rPr-1(n'),

where

1:

L(r) =

n= 1

X(7) , n

and x(n) = 0, 1,0, - 1 when n == 0, 1,2,3 (mod 4). Also

pt(n) = L (~)qt. q qln

Exercise 2. Prove that 2(n)

=

2r2(n).

Exercise 3. Prove that

8.9 A Summary of the Problem of Sums of Squares In the previous section we proved that r 4(n) = 4(n), but is this a mere coincidence? Actually we can prove that, for 3 ::;;; s ::;;; 8, we have r.(n)

= .(n),

and that this is no longer true if s > 8. Up to the present rs(n) has been explicitly evaluated for s'::;;; 24. For example: r3(n)

16 = -n!X2(n)K( - 4n) 1t

f1

(I + - + ... +---;=--t 1

1

P

p21n

P

where the definition of"C is p2tln, p2(t+ 1),tn, K( - 4n)

=

I Lco (-_ 4n) -, m=l

m

m

and if 4 -an == 7 (mod 8), if 4 -an == 3 (mod 8), if 4 -an == 1,2,5,6 (mod 8),



and here the definition of a is 4a ln, 4a +1,rn.

where u1't(n)

= I(-

1)dd 1 1,

din

and T(n) is the coefficient in the power series expansion 00

q«l - q)(l - q2) ... )24 =

I

T(n)qn

n=1

and if nl2 is not an integer, then T(nI2) = O. From Theorem 3.6 we have 00

«1 - q)(l -

q2)(l - q3) ... )3

=

I (- 1)n(2n + l)qtn(n+

1),

n=O

so that

«-

T(n) =

lY'(2x1

+ 1) + ... + (-

ly8(2xs

+ 1»

txdxl + 1)+· .. +tx8(t8+ 1)=n-1

I Y;+"'+yi=8n

s

I (- l)t(Yi-1)Yi' i=l

2,j'Yl'''Y8

The following table records the mathematicians who did the evaluations:

s                                  r_s(n)
2, 4, 6, 8                         Jacobi, 1828
3                                  Dirichlet
5, 7                               Eisenstein, Smith, Minkowski
10, 12                             Liouville, 1864, 1866
14, 16, 18                         Glaisher, 1907
20, 22, 24                         Ramanujan, 1916
9, 11, 13, 15, 17, 19, 21, 23      Lomadze, 1949

Chapter 9. The Prime Number Theorem

9.1 Introduction

The main aim of this chapter is to prove the following formula:

$$\pi(x) \sim \frac{x}{\log x}. \tag{1}$$

Here $\pi(x)$ denotes the number of primes not exceeding $x$, and the formula (1) is the famous prime number theorem. In this chapter we shall give two proofs. The first proof makes use of some rather deep analytic tools (the reader needs to know a little advanced calculus and complex function theory) but is relatively straightforward, the fundamental idea being due to N. Wiener. Although the other proof does not require much analytic knowledge and can indeed be classified as an elementary proof, it is more difficult to understand. This proof is due to Erdős and Selberg. One of the difficult problems in the long history of prime number theory is the search for an "elementary proof" of the prime number theorem and success came in 1949.

In the following sections we do not give a direct proof of the formula (1). Instead we prove two formulae, each of which is equivalent to (1). Suppose that $x > 0$. Let

$$\vartheta(x) = \sum_{p \le x} \log p, \tag{2}$$

$$\psi(x) = \sum_{n \le x} \Lambda(n) = \sum_{p^m \le x} \log p. \tag{3}$$

In formula (3) $\Lambda(n)$ is the von Mangoldt function of Example 6 in §6.1. $\vartheta(x)$ and $\psi(x)$ are called Chebyshev's functions. It is easy to see that

$$\psi(x) = \vartheta(x) + \vartheta(x^{1/2}) + \vartheta(x^{1/3}) + \cdots \tag{4}$$

and

$$\psi(x) = \sum_{p \le x} \Bigl[\frac{\log x}{\log p}\Bigr] \log p, \tag{5}$$
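The relations (3), (4) and (5) between Chebyshev's functions can be compared numerically for a moderate $x$. The sketch below is an illustration only: the function names are hypothetical, $x$ is assumed to be an integer at least 2, and the integer part $[\log x/\log p]$ in (5) is computed by counting powers of $p$ rather than with floating-point logarithms, to avoid rounding at exact prime powers.

```python
from math import log

def primes_up_to(x):
    """Sieve of Eratosthenes: all primes p <= x (x an integer >= 2)."""
    sieve = [True] * (x + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(x ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, x + 1, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def theta(x, k=1):
    """theta(x^{1/k}): sum of log p over primes p with p^k <= x."""
    return sum(log(p) for p in primes_up_to(x) if p ** k <= x)

def mangoldt(n, primes):
    """Lambda(n) = log p if n is a power of the prime p, and 0 otherwise."""
    for p in primes:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return log(p) if n == 1 else 0.0
    return 0.0   # only reached for n = 1

def psi_by_definition(x):
    """psi(x) as the sum of Lambda(n) over n <= x, formula (3)."""
    primes = primes_up_to(x)
    return sum(mangoldt(n, primes) for n in range(1, x + 1))

def psi_by_formula_4(x):
    """psi(x) = theta(x) + theta(x^{1/2}) + theta(x^{1/3}) + ..."""
    total, k = 0.0, 1
    while 2 ** k <= x:          # theta(x^{1/k}) vanishes once 2^k > x
        total += theta(x, k)
        k += 1
    return total

def psi_by_formula_5(x):
    """psi(x) = sum over p <= x of [log x / log p] log p."""
    total = 0.0
    for p in primes_up_to(x):
        m, power = 0, p
        while power <= x:       # m = largest exponent with p^m <= x
            m += 1
            power *= p
        total += m * log(p)
    return total

x = 1000
print(psi_by_definition(x), psi_by_formula_4(x), psi_by_formula_5(x))
print(theta(x) / x, psi_by_definition(x) / x, len(primes_up_to(x)) * log(x) / x)
```

The first line should show three values that agree up to rounding; the second shows the three ratios whose limits are compared in the theorem that follows.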

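where $[\xi]$ denotes the integer part of $\xi$.

Theorem 1.1. We have

$$\varlimsup_{x \to \infty} \frac{\pi(x)}{x(\log x)^{-1}} = \varlimsup_{x \to \infty} \frac{\vartheta(x)}{x} = \varlimsup_{x \to \infty} \frac{\psi(x)}{x} \tag{6}$$

and

$$\varliminf_{x \to \infty} \frac{\pi(x)}{x(\log x)^{-1}} = \varliminf_{x \to \infty} \frac{\vartheta(x)}{x} = \varliminf_{x \to \infty} \frac{\psi(x)}{x}. \tag{7}$$

Proof. From (4) and (5) we derive easily

$$\vartheta(x) \le \psi(x) \le \sum_{p \le x} \frac{\log x}{\log p} \log p = \pi(x) \log x,$$

so that

$$\varlimsup_{x \to \infty} \frac{\vartheta(x)}{x} \le \varlimsup_{x \to \infty} \frac{\psi(x)}{x} \le \varlimsup_{x \to \infty} \frac{\pi(x)}{x(\log x)^{-1}}.$$

Now let $0 < \alpha < 1$, $x > 1$. Then

$$\vartheta(x) \ge \sum_{x^\alpha < p \le x} \log p \ge \{\pi(x) - \pi(x^\alpha)\} \log x^\alpha \ge \alpha\{\pi(x) - x^\alpha\} \log x.$$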

xOl 0 (i = 1, 2, 3, 4) such that for x ;::: 2, (8)

and

(9)

Also from Theorem 1.1 we see at once that in order to prove formula (1) we need only prove that

$$\psi(x) \sim x \tag{10}$$

or

$$\vartheta(x) \sim x. \tag{11}$$

Before we prove formula (10) we need some preparation.

9.2 The Riemann ζ-Function

From now on we write $s = \sigma + it$ for a complex number with $\sigma$ and $t$ real. The function defined by the series

$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} \qquad (\sigma > 1) \tag{1}$$

is called the Riemann ζ-function. Let $a > 1$. When $\sigma \ge a$, because

$$\Bigl|\sum_{n=N}^{\infty} \frac{1}{n^s}\Bigr| \le \sum_{n=N}^{\infty} \frac{1}{n^\sigma} \le \sum_{n=N}^{\infty} \frac{1}{n^a},$$

we see that the series for $\zeta(s)$ is uniformly convergent. Since $a$ is any real number greater than 1, it follows that $\zeta(s)$ is an analytic function in the half plane $\sigma > 1$.

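Theorem 2.1. Let

$$h(s) = \zeta(s) - \frac{1}{s-1}.$$

Then $h(s)$ is analytic in the half plane $\sigma > 0$, and

$$|h(s)| \le \frac{|s|}{\sigma} \qquad (\sigma > 0).$$

Proof. Let

$$f_n(s) = n^{-s} - \int_{n}^{n+1} u^{-s}\,du,$$

so that, for $\sigma > 1$,

$$\zeta(s) = \frac{1}{s-1} + \sum_{n=1}^{\infty} f_n(s). \tag{2}$$

Since

$$|n^{-s} - u^{-s}| = \left|\int_{n}^{u} s v^{-s-1}\,dv\right| \le |s| \int_{n}^{n+1} v^{-\sigma-1}\,dv \qquad (n \le u \le n+1),$$

we have

$$|f_n(s)| = \left|\int_{n}^{n+1} (n^{-s} - u^{-s})\,du\right| \le |s| \int_{n}^{n+1} v^{-\sigma-1}\,dv.$$

Suppose that $0 < a \le \sigma \le b$, $-T \le t \le T$. Then

$$\sum_{n=N}^{\infty} |f_n(s)| \le |s| \int_{N}^{\infty} v^{-a-1}\,dv \le \frac{\sqrt{b^{2} + T^{2}}}{a N^{a}},$$

so that the series $\sum_{n=1}^{\infty} f_n(s)$ is uniformly convergent in $0 < a \le \sigma \le b$, $-T \le t \le T$. Since $a$ can be arbitrarily near 0, and $b$, $T$ can be arbitrarily large, it follows that $h(s) = \sum_{n=1}^{\infty} f_n(s)$ is analytic in the half plane $\sigma > 0$. From this we see that (2) can be used as an analytic continuation for $\zeta(s)$ into the half plane $\sigma > 0$, and $s = 1$ is the only simple pole with residue 1. From (2) we derive at once

$$\left|\zeta(s) - \frac{1}{s-1}\right| = \left|\sum_{n=1}^{\infty} f_n(s)\right| \le |s| \int_{1}^{\infty} v^{-\sigma-1}\,dv = \frac{|s|}{\sigma} \qquad (\sigma > 0). \qquad \square$$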
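The series in the proof of Theorem 2.1 is easy to evaluate numerically. The sketch below is an illustration only, not part of the original text: the names `f_n` and `h_partial` are assumptions, and the integral $\int_n^{n+1} u^{-s}\,du$ is replaced by its closed form $((n+1)^{1-s} - n^{1-s})/(1-s)$, valid for $s \ne 1$. It compares a partial sum at $s = 2$ with $\zeta(2) - 1 = \pi^2/6 - 1$, and checks the bound $|s|/\sigma$ at a point of the strip $0 < \sigma < 1$ where the original series for $\zeta(s)$ diverges.

```python
import math

def f_n(n, s):
    """f_n(s) = n^{-s} - integral of u^{-s} over [n, n+1], for s != 1."""
    s = complex(s)
    integral = ((n + 1) ** (1 - s) - n ** (1 - s)) / (1 - s)
    return n ** (-s) - integral

def h_partial(s, terms):
    """Partial sum f_1(s) + ... + f_terms(s), approximating h(s) = zeta(s) - 1/(s-1)."""
    return sum(f_n(n, s) for n in range(1, terms + 1))

# At s = 2 the limit is zeta(2) - 1 = pi^2/6 - 1.
approx = h_partial(2, 20000)
print(approx.real, math.pi ** 2 / 6 - 1)

# At s = 1/2 + 3i the partial sums stay below the bound |s| / sigma.
s = complex(0.5, 3.0)
print(abs(h_partial(s, 20000)), abs(s) / s.real)
```

The second comparison holds for every partial sum, not only in the limit, because each $|f_n(s)|$ is bounded by $|s|\int_n^{n+1} v^{-\sigma-1}\,dv$.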

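Theorem 2.2. In the half plane $\sigma \ge 1$, $\zeta(s) \ne 0$.

Proof. When $\sigma > 1$ the series $\sum_{n=1}^{\infty} (1/n^{s})$ converges absolutely so that from Theorem 5.4.4

$$\zeta(s) = \prod_{p} \left(1 - \frac{1}{p^{s}}\right)^{-1}; \tag{3}$$

here the product is over all primes $p$. Since each factor in the product is non-zero and the product converges absolutely, it follows that $\zeta(s) \ne 0$ when $\sigma > 1$. Since $\zeta(s)$ has a pole at $s = 1$ we are left to prove: when $t \ne 0$,

$$\zeta(1 + it) \ne 0.$$

Now consider the function

$$\varphi_\varepsilon(t) = \zeta^{3}(1+\varepsilon)\,\zeta^{4}(1+\varepsilon+it)\,\zeta(1+\varepsilon+2it) \qquad (\varepsilon > 0,\ t \ne 0).$$

From (3) we know that

$$|\varphi_\varepsilon(t)| = \prod_{p} a_p,$$

where

$$a_p = \left|1 - \frac{1}{p^{1+\varepsilon}}\right|^{-3} \left|1 - \frac{1}{p^{1+\varepsilon+it}}\right|^{-4} \left|1 - \frac{1}{p^{1+\varepsilon+2it}}\right|^{-1},$$

so that

$$\log a_p = \sum_{m=1}^{\infty} \frac{1}{m}\, p^{-(1+\varepsilon)m} \bigl(3 + 4\cos(mt\log p) + \cos(2mt\log p)\bigr).$$

From $3 + 4\cos\theta + \cos 2\theta = 2(1 + \cos\theta)^{2} \ge 0$, we have $\log a_p \ge 0$, that is

$$|\varphi_\varepsilon(t)| \ge 1. \tag{4}$$

Suppose that $\zeta(1 + it) = 0$. Then

$$\zeta(1 + \varepsilon + it) = \int_{1}^{1+\varepsilon} \zeta'(\sigma + it)\,d\sigma = O(\varepsilon).$$

From Theorem 2.1, we have

$$\varepsilon\,\zeta(1 + \varepsilon) = O(1),$$

so that, for any small $\varepsilon$, we have

$$\varphi_\varepsilon(t) = O(\varepsilon),$$

and this contradicts (4). □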
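The whole argument rests on the trigonometric inequality $3 + 4\cos\theta + \cos 2\theta \ge 0$. A small numerical sketch follows (an illustration only; the function names `trig_form` and `log_a_p` and the truncation length `terms` are assumptions, not from the text). It checks the identity with $2(1+\cos\theta)^2$ on a grid and evaluates a truncated version of the series for $\log a_p$.

```python
import math

def trig_form(theta):
    """The combination 3 + 4 cos(theta) + cos(2 theta) used in the proof."""
    return 3 + 4 * math.cos(theta) + math.cos(2 * theta)

def log_a_p(p, eps, t, terms=60):
    """Truncated series for log a_p: sum over m of p^{-(1+eps)m} / m * trig_form(m t log p)."""
    return sum(
        p ** (-(1 + eps) * m) / m * trig_form(m * t * math.log(p))
        for m in range(1, terms + 1)
    )

# Largest deviation from the closed form 2 (1 + cos theta)^2 on a grid of angles.
grid = [k * 2 * math.pi / 1000 for k in range(1001)]
print(max(abs(trig_form(th) - 2 * (1 + math.cos(th)) ** 2) for th in grid))

# Every term of the series is non-negative, so the truncated sums are too.
for p in (2, 3, 5, 7):
    print(p, log_a_p(p, 0.1, 14.0))
```

The weights 3, 4, 1 matter: they are exactly the exponents of the three zeta factors in $\varphi_\varepsilon(t)$, chosen so that a zero of order one at $1+it$ would outweigh the pole of order one at $s = 1$.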

Theorem 2.3. Let

$$\frac{\zeta'(s)}{\zeta(s)} + \frac{1}{s-1} = g(s).$$

When $\sigma \ge 1$, $g(s)$ has a continuous first derivative.

Proof. Differentiating the function $h(s)$ in Theorem 2.1 we have

$$\zeta'(s) = -\frac{1}{(s-1)^{2}} + h'(s).$$

Here $h'(s)$ is infinitely differentiable in $\sigma > 0$. Also from Theorem 2.2, we see that

$$\frac{1}{\zeta(s)} = \frac{s-1}{1 + (s-1)h(s)}$$

is regular in the half plane $\sigma \ge 1$, so that $1 + (s-1)h(s) \ne 0$ in the same half plane. Therefore

$$\frac{\zeta'(s)}{\zeta(s)} = -\frac{\left(\dfrac{1}{(s-1)^{2}} - h'(s)\right)(s-1)}{1 + (s-1)h(s)} = -\frac{1}{s-1} + g(s),$$

and here $g(s)$ has the required property stated in the theorem. □

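9.3 Several Lemmas

Theorem 3.1. If $f(x)$ has a continuous first derivative, then

$$\int_{a}^{b} f(x) e^{ixt}\,dx = O\!\left(\frac{1}{t}\right). \tag{1}$$

Proof. From integration by parts we have

$$\int_{a}^{b} f(x) e^{ixt}\,dx = \frac{1}{it}\left\{\bigl[f(x) e^{ixt}\bigr]_{a}^{b} - \int_{a}^{b} f'(x) e^{ixt}\,dx\right\} = O\!\left(\frac{1}{t}\right). \qquad \square$$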

Theorem 3.2.

$$\int_{-\infty}^{\infty} \frac{\sin x}{x}\,dx = \pi. \tag{2}$$

Proof. Let

$$J = \int_{0}^{\infty} e^{-kx}\,\frac{\sin \alpha x}{x}\,dx \qquad (1 \le \alpha \le 2,\ 0 \le k \le 1).$$

Fix $k > 0$ so that the integrand is now a continuous function of $\alpha$ and $x$, and the partial derivative with respect to $\alpha$ is $e^{-kx}\cos \alpha x$, which is also continuous. From the convergence of the integral

$$\int_{0}^{\infty} e^{-kx}\,dx$$

we see that the integral

$$\int_{0}^{\infty} e^{-kx}\cos \alpha x\,dx$$

converges uniformly in $1 \le \alpha \le 2$. We can therefore differentiate $J$ under the integral sign giving

$$\frac{\partial J}{\partial \alpha} = \int_{0}^{\infty} e^{-kx}\cos \alpha x\,dx = \frac{k}{\alpha^{2} + k^{2}}.$$

Here the right hand side is obtained from integrating by parts twice. From integration formulae we have

$$J = \tan^{-1}\frac{\alpha}{k}.$$

With $\alpha$ fixed, when $0 \le k \le 1$, $J$ is uniformly convergent so that $J$ is continuous for $0 \le k \le 1$. Therefore

$$\lim_{k \to 0+} J = \int_{0}^{\infty} \frac{\sin \alpha x}{x}\,dx = \lim_{k \to 0+} \tan^{-1}\frac{\alpha}{k} = \frac{\pi}{2}.$$

Taking in particular $\alpha = 1$, we have

$$\int_{-\infty}^{\infty} \frac{\sin x}{x}\,dx = 2\int_{0}^{\infty} \frac{\sin x}{x}\,dx = \pi. \qquad \square$$
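The damped integral $J$ used in this proof has the closed form $\tan^{-1}(\alpha/k)$ for $k > 0$, and this can be checked by direct quadrature. The sketch below is illustrative only: the composite Simpson rule, the truncation point `upper`, and the step count `n` are choices made here, not part of the text.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n subintervals (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def J(alpha, k, upper=60.0, n=60000):
    """Approximate the integral of e^{-k x} sin(alpha x) / x over [0, upper].

    For k > 0 the tail beyond `upper` is at most e^{-k * upper} / (k * upper).
    """
    def integrand(x):
        if x == 0.0:
            return alpha  # limit of sin(alpha x) / x as x -> 0
        return math.exp(-k * x) * math.sin(alpha * x) / x
    return simpson(integrand, 0.0, upper, n)

for alpha, k in ((1.0, 1.0), (2.0, 0.5), (1.0, 0.25)):
    print(alpha, k, J(alpha, k), math.atan(alpha / k))
```

As $k$ decreases the values approach $\pi/2$, which is the limit taken in the proof; the quadrature itself becomes unreliable for very small $k$ because the truncated tail is no longer negligible.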

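Theorem 3.3. Let $a < 0 < b$. If $f(x)$ has a continuous second derivative, then

$$\lim_{\omega \to \infty} \frac{1}{\pi} \int_{a}^{b} f(x)\,\frac{\sin \omega x}{x}\,dx = f(0). \tag{3}$$

Proof. We consider

$$\int_{a}^{b} (f(x) - f(0))\,\frac{\sin \omega x}{x}\,dx.$$

At the point 0, $(f(x) - f(0))/x$ has a continuous first derivative so that from Theorem 3.1 we have

$$\lim_{\omega \to \infty} \int_{a}^{b} (f(x) - f(0))\,\frac{\sin \omega x}{x}\,dx = 0,$$

that is

$$\lim_{\omega \to \infty} \frac{1}{\pi} \int_{a}^{b} f(x)\,\frac{\sin \omega x}{x}\,dx = f(0) \lim_{\omega \to \infty} \frac{1}{\pi} \int_{a}^{b} \frac{\sin \omega x}{x}\,dx = f(0)\,\frac{1}{\pi} \lim_{\omega \to \infty} \int_{a\omega}^{b\omega} \frac{\sin x}{x}\,dx = f(0)\,\frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin x}{x}\,dx,$$

and the result follows from Theorem 3.2. □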

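Theorem 3.4. Let $\lambda > 0$, and

$$K_\lambda(x) = \begin{cases} 1 - \dfrac{|x|}{2\lambda}, & \text{if } |x| \le 2\lambda, \\ 0, & \text{if } |x| > 2\lambda. \end{cases}$$

Then

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} K_\lambda(t) e^{ixt}\,dt = k_\lambda(x), \tag{4}$$

where

$$k_\lambda(x) = \begin{cases} \dfrac{2}{\sqrt{2\pi}\,\lambda} \left(\dfrac{\sin \lambda x}{x}\right)^{2}, & \text{if } x \ne 0, \\[2mm] \dfrac{2\lambda}{\sqrt{2\pi}}, & \text{if } x = 0. \end{cases}$$

Proof. It is easy to see that

$$k_\lambda(x) = \frac{2}{\sqrt{2\pi}} \int_{0}^{2\lambda} \left(1 - \frac{t}{2\lambda}\right) \cos xt\,dt. \tag{5}$$

If $x = 0$, then clearly

$$k_\lambda(x) = \frac{1}{\sqrt{2\pi}} \cdot 2\lambda.$$

If $x \ne 0$, then integration by parts gives the required result at once. □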
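Formula (5) can be checked against the closed form of $k_\lambda(x)$ by numerical integration. The following is a sketch under stated assumptions: the closed form used is $k_\lambda(x) = \frac{2}{\sqrt{2\pi}\,\lambda}\left(\frac{\sin\lambda x}{x}\right)^2$ with value $2\lambda/\sqrt{2\pi}$ at $x = 0$, which is what integrating (5) by parts gives; the function names and the midpoint rule are choices made here for illustration.

```python
import math

def K(lam, x):
    """Triangle function K_lambda(x) = 1 - |x| / (2 lam) for |x| <= 2 lam, else 0."""
    return max(0.0, 1.0 - abs(x) / (2.0 * lam))

def k_closed(lam, x):
    """Closed form of k_lambda(x), including the limiting value at x = 0."""
    if x == 0.0:
        return 2.0 * lam / math.sqrt(2.0 * math.pi)
    return 2.0 / (math.sqrt(2.0 * math.pi) * lam) * (math.sin(lam * x) / x) ** 2

def k_numeric(lam, x, n=20000):
    """Formula (5) by the midpoint rule: (2 / sqrt(2 pi)) * integral over [0, 2 lam]."""
    h = 2.0 * lam / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += K(lam, t) * math.cos(x * t)
    return 2.0 / math.sqrt(2.0 * math.pi) * total * h

for lam, x in ((1.0, 0.0), (1.0, 1.3), (2.5, 0.7), (0.5, 4.0)):
    print(lam, x, k_numeric(lam, x), k_closed(lam, x))
```

Note that $k_\lambda(x) \ge 0$ everywhere, which is the property that makes this kernel useful in the Tauberian argument of §9.4.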

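Theorem 3.5. We have

$$K_\lambda(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} k_\lambda(t) e^{ixt}\,dt. \tag{6}$$

In particular, with $\lambda = 1$, $x = 0$, we have

$$\frac{1}{\pi} \int_{-\infty}^{\infty} \left(\frac{\sin t}{t}\right)^{2} dt = 1. \tag{7}$$

Proof. We first consider the integral

$$I(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\omega}^{\omega} k_\lambda(t) e^{ixt}\,dt = \frac{2}{\sqrt{2\pi}} \int_{0}^{\omega} k_\lambda(t) \cos xt\,dt.$$

From (5) we have

$$I(\omega) = \frac{2}{\pi} \int_{0}^{\omega}\!\!\int_{0}^{2\lambda} \left(1 - \frac{u}{2\lambda}\right) \cos ut \cos xt\,du\,dt$$

$$= \frac{1}{\pi} \int_{0}^{2\lambda} \left(1 - \frac{u}{2\lambda}\right) du \int_{0}^{\omega} \bigl(\cos(u + x)t + \cos(u - x)t\bigr)\,dt$$

$$= \frac{1}{\pi} \int_{0}^{2\lambda} \left(1 - \frac{u}{2\lambda}\right) \left(\frac{\sin(u + x)\omega}{u + x} + \frac{\sin(u - x)\omega}{u - x}\right) du.$$

If $x > 2\lambda$ we have $\lim_{\omega \to \infty} I(\omega) = 0$ from Theorem 3.1; if $0 < x < 2\lambda$ we see from Theorem 3.1 and Theorem 3.3 that in the above formula the limit of the first term is 0 and the limit of the second term is $1 - x/2\lambda$. Since the integral in (6) is a continuous function of $x$, we see that $K_\lambda(2\lambda) = 0$, $K_\lambda(0) = 1$. The theorem is proved. □

Theorem 3.6. Let $f(t) \ge 0$ $(0 \le t < \infty)$, and for any $T > 0$, the interval $0 \le t \le T$ can be divided into a finite number of sections in each of which $f(t)$ is continuous. Suppose further that, for any $\varepsilon > 0$, the integral

$$\int_{0}^{\infty} e^{-\varepsilon t} f(t)\,dt$$

converges. Then

$$\lim_{\varepsilon \to 0} \int_{0}^{\infty} e^{-\varepsilon t} f(t)\,dt = \int_{0}^{\infty} f(t)\,dt. \tag{8}$$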

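Proof. Since $f(t) \ge 0$, $\int_{0}^{T} f(t)\,dt$ increases with respect to $T$ so that $\int_{0}^{\infty} f(t)\,dt$ exists either as a finite number or $\infty$. Now

$$\int_{0}^{\infty} e^{-\varepsilon t} f(t)\,dt \le \int_{0}^{\infty} f(t)\,dt,$$

so that

$$\limsup_{\varepsilon \to 0} \int_{0}^{\infty} e^{-\varepsilon t} f(t)\,dt \le \int_{0}^{\infty} f(t)\,dt.$$

On the other hand

$$\int_{0}^{\infty} e^{-\varepsilon t} f(t)\,dt \ge \int_{0}^{T} e^{-\varepsilon t} f(t)\,dt \ge e^{-\varepsilon T} \int_{0}^{T} f(t)\,dt,$$

so that

$$\liminf_{\varepsilon \to 0} \int_{0}^{\infty} e^{-\varepsilon t} f(t)\,dt \ge \int_{0}^{T} f(t)\,dt.$$

Letting $T \to \infty$ we have

$$\liminf_{\varepsilon \to 0} \int_{0}^{\infty} e^{-\varepsilon t} f(t)\,dt \ge \int_{0}^{\infty} f(t)\,dt,$$

and the theorem is proved. □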

9.4 A Tauberian Theorem

Definition. If $f(x)$ is defined in $-\infty < x$
0, then f(x)

-+

I (x

-+ 00).

=

I,

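Proof. From Theorem 3.5 we have

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} k_\lambda(x - t)\,dt = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin^{2} u}{u^{2}}\,du = 1,$$

so that, without loss of generality, we can suppose that $l = 0$. If $f(x) \not\to 0$, then there exist $\delta > 0$ and a sequence $(x_n)$ $(x_n \to \infty)$ such that $f(x_n) < -\delta$ $(n = 1, 2, \ldots)$ or $f(x_n) > \delta$ $(n = 1, 2, \ldots)$. Assume without loss that $f(x_n) > \delta$ $(n = 1, 2, \ldots)$. (The case $f(x_n) < -\delta$ can be proved in the same way.) Since $f(x)$ is slowly decreasing, there exist $x_0 = x_0(\delta)$ and $\eta = \eta(\delta)$ such that, for $x \ge x_0$ and $x \le y \le x + 2\eta$,

$$f(y) - f(x) \ge -\frac{\delta}{2}$$

holds. Take a particular $x$ in $(x_n)$. Then

$$f(y) > \frac{\delta}{2} \qquad (x \le y \le x + 2\eta). \tag{2}$$

From (2), when $x \ge x_0$ and $x$ in $(x_n)$, we have, with $M$ a bound for $|f(t)|$,

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} k_\lambda(x + \eta - t) f(t)\,dt \ge \frac{\delta}{2\sqrt{2\pi}} \int_{x}^{x+2\eta} k_\lambda(x + \eta - t)\,dt - \frac{M}{\sqrt{2\pi}} \int_{-\infty}^{x} k_\lambda(x + \eta - t)\,dt - \frac{M}{\sqrt{2\pi}} \int_{x+2\eta}^{\infty} k_\lambda(x + \eta - t)\,dt$$

$$= \frac{\delta}{2\sqrt{2\pi}} \int_{-\eta}^{\eta} k_\lambda(v)\,dv - \frac{2M}{\sqrt{2\pi}} \int_{\eta}^{\infty} k_\lambda(v)\,dv = \frac{\delta}{\pi} \int_{0}^{\lambda\eta} \frac{\sin^{2} w}{w^{2}}\,dw - \frac{2M}{\pi} \int_{\lambda\eta}^{\infty} \frac{\sin^{2} w}{w^{2}}\,dw \to \frac{\delta}{2} \qquad (\lambda \to \infty),$$

so that there exists a suitably large $\lambda_0$ such that

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} k_{\lambda_0}(x + \eta - t) f(t)\,dt > \frac{\delta}{4}.$$

Let $x$ increase without bound in $(x_n)$ so that

$$\lim_{\substack{x \to \infty \\ x \in \{x_n\}}} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} k_{\lambda_0}(x + \eta - t) f(t)\,dt \ge \frac{\delta}{4},$$

which contradicts our supposition. Therefore $f(x) \to 0$ and the theorem is proved. □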

Theorem 4.2 (Ikehara). Let $h(t)$ be non-decreasing in $0 \le t < \infty$, and suppose that for any finite $T$, $h(t)$ has only a finite number of discontinuities in $0 \le t \le T$. Suppose also that the integral

$$f(s) = \int_{0}^{\infty} e^{-st} h(t)\,dt \qquad (\sigma > 1) \tag{3}$$

converges, and that given any finite a > 0, there exists a constant A such that lim (j(s) 0, A > O. From Theorem 3.4 and the uniform convergence of

f 00

(a(t) - A(t))e-(e+iy)t dt

-00

in Iyl :::; 2A, it follows that

f f f

h.(x)

= 2~

(a(t) - A(t))e-et dt

f

2A

21n

KA(y)ei(x-t)y dy

-u

-00

=

f 2A

00

00

K;.(y)e ixy dy

-2A

(a(t) - A(t))e-(e+iy)t dt

-00

2A

= _I ~

K;.(y)eiXY(fll

+ e + iy) -

~)dY. e+ry .

-2A

From (4) we have

f 2A

. hmh.(x) .... 0

= -1 2n

g(y)K;.(y)e iXY dy.

(8)

-2A

From Theorem 3.1 we have lim 1imh.(x)

=

O.

(9)

x- 00 £-0

On the other hand, from Theorem 3.6, we have 00

limh.(x) = lim .... 0

e ...

~(f kA(x -

ov2n

fo f fo f

t)a(t)e-· t dt - A

o

00

=

k;.(x - t)e-· t dt)

0

fo- f 00

kA(x - t)a(t)dt -

o

f 00

kA(x - t)dt

0

00

=

-00

k;.(x - t)(a(t) - A(t))dt

= I;.(x),



and so from (8) we see that /;.(x) exists. This proves 1), and now 2) follows from (9). Finally we prove 3). From the definition of A(t) we see that it suffices to prove that a(t) is a bounded slowly decreasing function. From (7) we have

f

f

00

~

lim x-oo

V 2n

00

k;.(x - t)a(t)dt = lim x-oo

~

V 2n

-00

-00

f;'x

A

=--lim

fo x-oo

k;.(x - t)A(t)dt

A(Sin U)2 - -n u

du

-00

f 00

=A -

n

(sinu)2 - - du=A, u

-00

so that there exists Xo such that, when x

fo f

;?; xo,

00

k..{x - t)a(t)dt < A

+ 1;

-00

that is

f ( t)2 ( -"It) 00

sin -t-

a x

dt < n(A

+ 1)

-00

Since the integrand is non-negative, substituting x

+ 2/fi for

x, we have

J~

f ei~tya(x+ ~-Ddt O. We have

III


so that lim {a(x

+ J) -

~

a(x)}

O.

x'" 00

b ... O

This means that a(x) is slowly decreasing. The theorem is proved.

0

9.5 The Prime Number Theorem

In this section we apply Ikehara's theorem to prove the prime number theorem. We do not give a direct proof of the prime number theorem; instead we prove the equivalent theorem (see §1):

Theorem 5.1. $\psi(x) \sim x$.

Proof. From the definition of $\psi(x)$ we see that $\psi(x)$ is a non-negative increasing function with only finitely many discontinuities in the interval $0 \le t \le T$. When $\sigma > 1$ we have, from Theorem 1.2 and formula (6.14.5), that

$$\int_{0}^{\infty} e^{-st} \psi(e^{t})\,dt = \int_{1}^{\infty} u^{-(1+s)} \psi(u)\,du = \sum_{n=1}^{\infty} \int_{n}^{n+1} u^{-(1+s)} \psi(u)\,du = \sum_{n=1}^{\infty} \sum_{m \le n} \Lambda(m) \int_{n}^{n+1} u^{-(s+1)}\,du$$

$$= \frac{1}{s} \sum_{n=1}^{\infty} \bigl(n^{-s} - (n+1)^{-s}\bigr) \sum_{m \le n} \Lambda(m) = \frac{1}{s} \lim_{N \to \infty} \sum_{n=1}^{N} \bigl(n^{-s} - (n+1)^{-s}\bigr) \sum_{m \le n} \Lambda(m)$$

$$= \frac{1}{s} \lim_{N \to \infty} \left\{\sum_{n=1}^{N} \Lambda(n) n^{-s} - \Bigl(\sum_{m \le N} \Lambda(m)\Bigr)(N+1)^{-s}\right\} = \frac{1}{s} \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^{s}} = -\frac{1}{s} \cdot \frac{\zeta'(s)}{\zeta(s)} \qquad (\sigma > 1).$$

From Theorem 2.3 we see that the function

$$-\frac{1}{s} \cdot \frac{\zeta'(s)}{\zeta(s)} - \frac{1}{s-1} = -\frac{1}{s}\left(\frac{\zeta'(s)}{\zeta(s)} + \frac{1}{s-1}\right) - \frac{1}{s}$$

has a continuous derivative in $\sigma \ge 1$, so that for any $a > 0$ the function is uniformly continuous in $1 \le \sigma \le 2$, $|t| \le a$, and therefore there is a continuously differentiable function $g(t)$ satisfying

$$\lim_{\sigma \to 1} \left(-\frac{1}{s} \cdot \frac{\zeta'(s)}{\zeta(s)} - \frac{1}{s-1}\right) = g(t)$$

in $|t| \le a$ uniformly. From Theorem 4.2 we see that

$$\lim_{t \to \infty} e^{-t} \psi(e^{t}) = 1.$$

Let $e^{t} = x$. Then

$$\lim_{x \to \infty} \frac{\psi(x)}{x} = 1,$$

which proves the theorem. □

Exercise 1. Let $p_n$ be the $n$-th prime number. Prove that the prime number theorem is equivalent to

$$\lim_{n \to \infty} \frac{p_n}{n \log n} = 1.$$

Exercise 2. Use the prime number theorem to deduce that

$$M(x) = \sum_{n \le x} \mu(n) = o(x).$$

Exercise 3. Use the prime number theorem to deduce that

Exercise 4. Let n

= p~! ... p%k and define w(n)

= k,

Let 1tk(X)

8k (x) =

I

I

=

1,

'tk(X)

I

=

1,

n~x

n~x

co(n) = Q(n) = k

Q(n)=k

log (p 1

••• Pk),

pl ••• Pk~X

o

(x)

=

I

1.

pl ••• Pk~X

k

(Note: Here the sum is over all primes Pi>'" ,Pk satisfying Pi ••• Pk :::; x; the same set of primes Pi>'" ,Pk with a different ordering is treated differently.) Prove: kx(loglogX)k-l 0' (x) '" ---k

logx

(k

~

2),

(k ~ 2),

x(loglogX)k-l

1tk(X) '" 'tk(X) '" ~--:---

(k-l)!logx

(k ~ 2).

9.6 Selberg's Asymptotic Formula

Throughout §6 - 8 we use the letters $q$ and $r$ to represent prime numbers.

Theorem 6.1 (Selberg). Let $x \ge 1$. Then

$$\vartheta(x) \log x + \sum_{p \le x} \vartheta\!\left(\frac{x}{p}\right) \log p = 2x \log x + O(x) \tag{1}$$

and

$$\sum_{p \le x} \log^{2} p + \sum_{pq \le x} \log p \log q = 2x \log x + O(x). \tag{2}$$
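The left-hand side of formula (1) is a finite sum and can be computed directly, which gives a feeling for the size of the $O(x)$ term. The following is a minimal sketch, not part of the original text; the helper names are illustrative assumptions, and $\vartheta(x/p)$ is evaluated at $\lfloor x/p \rfloor$, which does not change its value.

```python
import math
from bisect import bisect_right
from itertools import accumulate

def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes p <= n."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]

def selberg_lhs(x):
    """theta(x) log x + sum over primes p <= x of theta(x/p) log p, for an integer x >= 2."""
    ps = primes_up_to(x)
    logs = [math.log(p) for p in ps]
    prefix = [0.0] + list(accumulate(logs))  # prefix[i] = sum of log p over the first i primes

    def theta(y):
        # number of primes <= y located by binary search
        return prefix[bisect_right(ps, y)]

    return theta(x) * math.log(x) + sum(theta(x // p) * lp for p, lp in zip(ps, logs))

for x in (10**3, 10**4, 10**5):
    # Selberg's formula says this normalised difference stays bounded
    print(x, (selberg_lhs(x) - 2 * x * math.log(x)) / x)
```

The printed quantity is $(\text{LHS} - 2x\log x)/x$; the theorem asserts only that it is bounded, not that it converges.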

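We first prove the following:

Lemma. Let $F(x)$ and $G(x)$ be two functions defined for $x \ge 1$ and satisfying

$$G(x) = \sum_{1 \le n \le x} F\!\left(\frac{x}{n}\right) \log x.$$

Then

$$\sum_{n \le x} \mu(n)\, G\!\left(\frac{x}{n}\right) = F(x) \log x + \sum_{n \le x} F\!\left(\frac{x}{n}\right) \Lambda(n).$$

Proof. We have, from §6.4, $\Lambda(n) = \sum_{d \mid n} \mu(d) \log \frac{n}{d}$ so that

$$\sum_{n \le x} \mu(n)\, G\!\left(\frac{x}{n}\right) = \sum_{n \le x} \mu(n) \sum_{m \le x/n} F\!\left(\frac{x}{mn}\right) \log \frac{x}{n} = \sum_{l \le x} F\!\left(\frac{x}{l}\right) \sum_{n \mid l} \mu(n) \log \frac{x}{n}$$

$$= \sum_{l \le x} F\!\left(\frac{x}{l}\right) \sum_{n \mid l} \mu(n) \left(\log \frac{x}{l} + \log \frac{l}{n}\right) = \sum_{l \le x} F\!\left(\frac{x}{l}\right) \log \frac{x}{l} \sum_{n \mid l} \mu(n) + \sum_{l \le x} F\!\left(\frac{x}{l}\right) \Lambda(l)$$

$$= F(x) \log x + \sum_{l \le x} F\!\left(\frac{x}{l}\right) \Lambda(l). \qquad \square$$

Proof of Theorem 6.1. Let $\gamma$ be Euler's constant. From §5.8 we have

$$\sum_{n \le x} \frac{1}{n} = \log x + \gamma + O\!\left(\frac{1}{x}\right).$$

Also

$$\sum_{n \le x} \log n = \int_{1}^{x} \log t\,dt + O(\log x) = x \log x - x + O(\log x).$$

We apply the lemma with

$$F(x) = \psi(x) - x + \gamma + 1$$

so that

$$G(x) = \log x \sum_{1 \le n \le x} \psi\!\left(\frac{x}{n}\right) - x \log x \sum_{n \le x} \frac{1}{n} + (\gamma + 1) x \log x + O(\log x) = O(\log^{2} x) = O(\sqrt{x}).$$

From the lemma we have

$$F(x) \log x + \sum_{n \le x} F\!\left(\frac{x}{n}\right) \Lambda(n) = O\!\left(\sum_{n \le x} \sqrt{\frac{x}{n}}\right) = O(x). \tag{4}$$

From Theorem 5.9.1 we have

$$\sum_{n \le x} \frac{\Lambda(n)}{n} = \log x + O(1). \tag{5}$$

Therefore, from (3), (4), (5) and Theorem 1.2 we have

$$\psi(x) \log x + \sum_{n \le x} \psi\!\left(\frac{x}{n}\right) \Lambda(n) = x \log x + x \sum_{n \le x} \frac{\Lambda(n)}{n} - (\gamma + 1) \log x - (\gamma + 1) \sum_{n \le x} \Lambda(n) + O(x) = 2x \log x + O(x). \tag{6}$$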

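We also have

$$\sum_{n \le x} \psi\!\left(\frac{x}{n}\right) \Lambda(n) - \sum_{p \le x} \vartheta\!\left(\frac{x}{p}\right) \log p = \sum_{mn \le x} \Lambda(m) \Lambda(n) - \sum_{pq \le x} \log p \log q = O\Biggl(\sum_{\substack{p^{\alpha} q^{\beta} \le x \\ \alpha \ge 2,\ \beta \ge 1}} \log p \log q\Biggr)$$

$$= O\Biggl(\sum_{\substack{p^{\alpha} \le x \\ \alpha \ge 2}} \log p \sum_{\substack{q^{\beta} \le x/p^{\alpha} \\ \beta \ge 1}} \log q\Biggr) = O\Biggl(x \sum_{\substack{p^{\alpha} \le x \\ \alpha \ge 2}} \frac{\log p}{p^{\alpha}}\Biggr) = O\Biggl(x \sum_{p \le \sqrt{x}} \frac{\log p}{p(p-1)}\Biggr) = O(x) \tag{7}$$

and

$$\psi(x) = \vartheta(x) + \vartheta(x^{1/2}) + \cdots + \vartheta\bigl(x^{1/[\log x/\log 2]}\bigr) = \vartheta(x) + O(\log x \cdot \vartheta(x^{1/2})) = \vartheta(x) + O(x^{1/2} \log x). \tag{8}$$

Formula (1) now follows from (6), (7) and (8). Also from

$$\vartheta(x) \log x - \sum_{p \le x} \log^{2} p = \sum_{p \le x} \log p \log \frac{x}{p} = \sum_{p \le x} \log p \left(\sum_{n \le x/p} \frac{1}{n} + O(1)\right)$$

$$= \sum_{n \le x} \frac{1}{n} \sum_{p \le x/n} \log p + O(\vartheta(x)) = O\!\left(x \sum_{n \le x} \frac{1}{n^{2}}\right) + O(x) = O(x),$$

formula (2) follows at once. □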

9.7 Elementary Proof of the Prime Number Theorem

Let

$$R(x) = \vartheta(x) - x. \tag{1}$$

We know from Theorem 1.1 that the prime number theorem is equivalent to

$$\lim_{x \to \infty} \frac{R(x)}{x} = 0. \tag{2}$$

Before we prove (2) we first establish the following lemmas.

Lemma 1. If $x \ge 3$, then

$$\sum_{pq \le x} \frac{\log p \log q}{pq} = \frac{1}{2} \log^{2} x + O(\log x),$$

$$\sum_{pq \le x} \frac{\log p \log q}{pq \log pq} = \log x + O(\log\log x),$$

$$\sum_{p \le x} \frac{\log p}{p \log \frac{2x}{p}} = O(\log\log x).$$

Proof Let A(n) = Ip,;;.logp/p. From Theorem 5.9.1 we have A(n) = logn where r. = 0(1). Therefore "

L...

pq';;x

logp log q "logp" log q "logp x - - - = L... - - L... - - = L. --logpq

p';;x P

x

q

p';;x p

p

+ O(logx)

q~P

x = I (A(n) - A(n - 1)) log- + O(logx) n~x

n

+ r.



I

A(n)

n~x-l

=I

n~x

{lOg~ n

logn .lOg(1

109_x_} + O(logx) n+1

+~) + n

o( I

n~x

109(1

+ ~)) + O(logx) n

1

= 2log2 X + O(logx). Using the same method we have, by partial summations, logplogq

I

pq ~ x

pq logpq

= logx + O(loglogx).

Also from

I

1 2x nlogn

n "'x ~

1 1 1( 1 -1 ) =-I-+Ilogx n"'x n n"'x n 2x logx ~

f

logn

~

x

I -1

=

n~x n

du 2 ulog u

+ 0(1)

-2x n

f

=

1

2x u

f

~

I

x

~n~x 2 du

ulog u

x

+ 0(1) = -du- + 0(1) = ulogu

2

O(loglogx),

2

we have

I

logp

I

=

'" p log2x p

'"

p~x

I

n~x

=

(A(n) - A(n -

1 1»-2x

logn

n~x

{logn -log(n - I)} _1_ 1og2x n

o( I

1

'" nlog2x n

+

o( I 'n n~x

2x

logn

12x ) log-n+1

) = O(loglogx).

n~x

The lemma is proved.

0

Lemma 2.

$$\vartheta(x) + \sum_{pq \le x} \frac{\log p \log q}{\log pq} = 2x + O\!\left(\frac{x}{\log x}\right) \qquad (x \ge 2).$$


Proof Let

I

B(n) =

logplogq,

=

C(n)

pq~n

I

log2p.

p~n

Then we have 8(x)

+ =

'" logp log q L.:. pq~x logpq

I

C(n) - C(n - I) n~x logn

= C([x]) log[x]

= 2x

+ L

B(n) - B(n - 1) n~x logn

+ B([x]) + I log[x]

{C(n)

{_l_ _ logn

1 } log(n + I)

IOg( 1 +~)

+ o (_x_) + I

(2nlogn

n~x-l

logx

+ B(n)}

n~x-l

+ O(n))

lognlog(n

n

+ 1)

=2X+O(_X), log x

and the lemma is proved.

0

Lemma 3.

$$R(x) \log x = \sum_{pq \le x} \frac{\log p \log q}{\log pq}\, R\!\left(\frac{x}{pq}\right) + O(x \log\log x) \qquad (x \ge 3).$$

Proof From Lemma 1 and Lemma 2 we have

x) logp I 8 ( - logp=2x I - - Ilogp I p~x p p~x p p~x

log q log r --x logqr

qr~p

+

o(x I

~

IOgP)

p~x

=

2xlogx -

2x plog-p

logqlogr 8 (x) qr~x log qr qr

I

+ O(xloglogx).

Substituting this into Selberg's asymptotic formula (that is, formula (6.1)), we have 8(x) log x = I

pq~x

logplogq 8 (x) logpq pq

+ O(x log log x).

The result follows from substituting (I) into this and applying Lemma 1. Lemma 4.

IR(x)l::::; _1 I IR(~)I logx n~x n

+o

(x loglogxlog x)

(x

~

3).

0



Proof Substituting (1) into formula (6.1) we have

I R(~)IOgp + O(x),

R(x)logx = -

p"'x

p

so that from Lemma 3 we see that

2IR(x)llogx:::;; I IR(~)IIOgp + I logplogq p"'x p pq"'x logpq

IR(~)I + O(x log log x). pq

From Lemma 2 and partial summations, and noting that Iial - Ihl! :::;; la - hi, we see that

2IR(x)llogx:::;;

)1)

I (Ilogp+ I 10gpIOgq)(IR(~)I_IR(_x n"'x-l p"'n pq"'n logpq n n+I

I

+0 (

:::;; 2

p"'x

+ I 10gpIOgq) + O(xloglogx) pq"'x logpq

n"'~-l n(IR(~)I-IRC: 1)1)

I (~) I-I

+0( ~2

logp

I

I _n R n",x-llog2n n

(n I

R -x) +0

-..:: n"'x I

+ o(x

R (_x ) n+1

II)

+ O(x log log x)

I -n- ((x) 8 - -8 (( n",x_llog2n x-))) n n+ 1

1_))

I _n_(~ _ _

n",x_llog2n n

n+ I

+O(xloglogx).


I -

n ((x) (x)) 8 - -8 n n+1

n",x_llog2n

=

I

2"'n"x-l

= o(x

I

(_n _ n_1_) =

8 (~) n log2n

n"'x nlogn

1 ) log2(n - I)

+ O(x)

O(xloglogx),

so that

n~JR(~)1 + O(xloglogx),

2IR(x)llogx:::;; 2 and the required result follows. Lemma 5.

If x >

0

I, then

I

n~x

8(n) -2

n

= logx + 0(1),

and

L 8(~) =

xlogx

n

n~x

+ O(x).

Proof Since

L ~= L ~- L ~=~+O(~)+O(~), X

p";"";xn

";;'pn

">xn

P

P

we have

L

I

8(n)

L 2" L logp = p~x L logp p:::=;n~x L n2 n p~n

-2 =

n:::=;x

n

n:::=;x

L IOgp(~p + O(~) + O(~)) = P x

=

logx

+ 0(1)

p";x

and

L logp . (~+ 0(1») =

=

P

p";x

Lemma 6. logn L -R(n) = "";x

-

n

xlogx

+ O(X). 0

I (x)- + O(X). L -R(n)R "";x n

n

Proof From Selberg's formula (that is (6.2» and partial summations we have

x x IOg2 plog- + L logplogqlog- = 2xlogx + O(X). p";x P pq";x pq Substituting

L

log~ = L ~ + O(~), P

p";"";x

n

P

x log-= pq

into the above formula and interchanging the summations we have

L -I L log2p + L -I L logp L logq = 2xlogx + O(x);

n~x

n

p~n

n~x

n

p~n

x q~;

that is logn L -8(n) + L

(x)

I -8(n)8 - = 2xlogx + O(x). n n:S;xn n The required result follows from substituting (I) into this formula and then apply Lemma 5. 0 n~x



Lemma 7. Let 0 Xo, IR(x)1 < ux.

(3)

Then there exists Xu such that, when x > xu, the interval subinterval (y, eby) with the property that

IR;Z) I< u ~ u when y ~

Z

~ eby. Here

2

«(1 -

U)16 X,

x) contains a

,

c5 = u(1 - u)/32.

Proof From Lemma 6 we have logn I In~x-nR(n) ~

~R(n)R(~) n

I

x n

+II

~R(n)R(~)1 n

n<xo n

xo~n~~

+

x

~R(n)R(;)

I

+ O(x)

-

Xl,

Ix,,t,;;;x lO! n R(n) I< u (x + x') logx + O(x), 2

where x' = (l - U)16 X • Suppose that R(n) does not change sign in (x' (x' ~ y ~ x) so that

~

n

~

x). Then there must exist y

R(Y) I I logn < u (x + x')logx + O(x). I-y x'';;;n';;;x 2

From (l -

U)16

1- u < --I

+ 15u '

we see that

I< u IR(y) y

u(l

+ 7u) + 0 8

(_1_) log x

(4)

Xl)'

But if R(n) changes sign in (x', x), then clearly there exists y (x' IR(Y)I = O(1ogy) so that (4) still holds.

~

y

~

x) such that



When I < Y < Y' we have, by Lemma 2,

L

y «(1 + 70')/(1 + 15O'))x, then e-bYI>

1-0'

+ 150'

I

x>

Xl

so that we can take Y = e-bYI' The lemma is proved.

0

Proof of the prime number theorem. We already know that there exist e > 0 and x~ such that, for X > Xo, 8(x) > ex

(this is Theorem 1.2). From Selberg's formula, we have 8(x)

= 2x - _1_

L 8(~)IOgp + o(~) p log x

= 2x -

L 8(~)IOgp

logx p~x

_1_ logx

x

p~--;-

xo

p

(6)



::;;; 2x - ex log x logx

I+ 0 (-

logx

x

L

logp)

+ 0 (-x-) logx

xo' xo, e > 0).

From (I) we have

IR(x)1 < O'o(x)

(x> XO, 0'0

II -~ I,

=

0 < 0'0
xu, the interval «1 - (T)16 X, x) contains a subinterval (y, eOy) (J = (T(l - (T)/32) such that when y :;;; z :;;; eOy we have RI(Z) I < _1_ . (T + (T2. Z (()(k) 2

I

7) First use Theorem 8.2 to show that (To and Xo exist such that 0 < when x> Xo

(To

< 1 and

(To

IRI(X) I < - x . (()(k)

Then use this together with 4) and 6) to prove that lim Rl(x) =

o.

X

x-+ 00

Notes 9.1. The present best result on the error term of the prime number theorem, namely n(x)

~

= Ii x + O(xe-c(logX)'),

c a positive constant,

is due to I. M. Vinogradov and H. M. Korobov and is based on estimates on trigonometric sums.

9.2. In recent years a number of mathematicians have obtained error term estimates in Selberg's elementary proof of the prime number theorem. For example:

$$\pi(x) = \operatorname{li} x + O\!\left(\frac{x}{\log^{A} x}\right),$$

where $A$ is any constant however large and the $O$-constant depends on $A$ (see E. Bombieri [8] and E. Wirsing [64]). An even better estimate is given by H. Diamond and J. Steinig [21].

Chapter 10. Continued Fractions and Approximation Methods

10.1 Simple Continued Fractions

By a finite continued fraction we mean an expression

$$a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{a_N}}}}.$$

We shall see that, as $N \to \infty$, the expression here tends to a definite number; we call the infinite continued fraction a continued fraction. It is convenient to denote the above expression by

$$a_0 + \frac{1}{a_1 +}\,\frac{1}{a_2 +} \cdots \frac{1}{a_N} \qquad \text{or} \qquad [a_0, a_1, \ldots, a_N].$$

It is easy to see that

$$[a_0] = \frac{a_0}{1}, \qquad [a_0, a_1] = \frac{a_1 a_0 + 1}{a_1}.$$

In general, we let $[a_0, a_1, \ldots, a_n] = p_n/q_n$, $0 \le n \le N$, where $p_n$, $q_n$ are polynomials in $a_0, a_1, \ldots, a_n$. These polynomials are linear in any one $a_i$, and the denominator $q_n$ is independent of $a_0$. We call $p_n/q_n$ the $n$-th convergent of $[a_0, a_1, \ldots, a_N]$.

Theorem 1.1. The convergents satisfy the following:

$$p_0 = a_0, \quad p_1 = a_1 a_0 + 1, \quad p_n = a_n p_{n-1} + p_{n-2} \quad (2 \le n \le N),$$

$$q_0 = 1, \quad q_1 = a_1, \quad q_n = a_n q_{n-1} + q_{n-2} \quad (2 \le n \le N). \qquad \square$$

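Theorem 1.2. The convergents satisfy the following:

$$p_n q_{n-1} - p_{n-1} q_n = (-1)^{n-1} \qquad (n \ge 1), \tag{1}$$

or

$$\frac{p_n}{q_n} - \frac{p_{n-1}}{q_{n-1}} = \frac{(-1)^{n-1}}{q_n q_{n-1}},$$

and

$$p_n q_{n-2} - p_{n-2} q_n = (-1)^{n} a_n \qquad (n \ge 2). \qquad \square \tag{2}$$

Definition. Let $a_0$ be an integer, and $a_1, a_2, \ldots$ be positive integers. Then

$$a_0 + \frac{1}{a_1 +}\,\frac{1}{a_2 +} \cdots$$

is called a simple continued fraction.

We shall only deal with simple continued fractions in this chapter. From Theorems 1.1 and 1.2 we deduce at once:

Theorem 1.3. (i) If $n > 1$, then $q_n \ge q_{n-1} + 1$, so that $q_n \ge n$.

(ii)

$$\frac{p_{2n+1}}{q_{2n+1}} < \frac{p_{2n-1}}{q_{2n-1}}, \qquad \frac{p_{2n}}{q_{2n}} > \frac{p_{2n-2}}{q_{2n-2}}.$$

(iii) Every convergent of a simple continued fraction is a reduced fraction. □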
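The recurrences of Theorem 1.1 translate directly into a short program. The following is a minimal sketch, not part of the original text; the function name `convergents` and the starting values $p_{-1} = 1$, $q_{-1} = 0$ (a common convention that makes the recurrence valid from $n = 1$) are assumptions made here.

```python
import math

def convergents(a):
    """Return [(p_0, q_0), ..., (p_N, q_N)] for the continued fraction [a_0, a_1, ..., a_N]."""
    p_prev, q_prev = 1, 0   # conventional p_{-1} = 1, q_{-1} = 0
    p, q = a[0], 1          # p_0 = a_0, q_0 = 1
    out = [(p, q)]
    for a_n in a[1:]:
        # p_n = a_n p_{n-1} + p_{n-2},  q_n = a_n q_{n-1} + q_{n-2}
        p, p_prev = a_n * p + p_prev, p
        q, q_prev = a_n * q + q_prev, q
        out.append((p, q))
    return out

cs = convergents([3, 7, 15, 1, 292])
print(cs)

for n in range(1, len(cs)):
    (p1, q1), (p0, q0) = cs[n], cs[n - 1]
    # Theorem 1.2 (1): p_n q_{n-1} - p_{n-1} q_n = (-1)^{n-1}
    print(n, p1 * q0 - p0 * q1, math.gcd(p1, q1))
```

The determinant column alternates between $+1$ and $-1$, and the last column is always 1, which is the content of Theorem 1.3 (iii).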

Let $\alpha$ be a real number. We take $a_0 = [\alpha]$ and we let $\alpha'_1 = 1/(\alpha - [\alpha])$. We then take $a_1 = [\alpha'_1]$ and we let $\alpha'_2 = 1/(\alpha'_1 - [\alpha'_1])$. We continue in this way by taking $a_n = [\alpha'_n]$ and defining $\alpha'_{n+1} = 1/(\alpha'_n - [\alpha'_n])$. It is clear that if this process terminates, then $\alpha$ must be a rational number. Conversely, if $\alpha$ is a rational number $p/q$ where $(p, q) = 1$, then $a_0 = [p/q]$ and

1

O~-
1X2n-2' Next from Theorem 1.2 (1), 1X1 ~ 1X2n+1 ~ 1X2n ~ 1X2, so that limIX2n and limlX2n+1 exist. Finally, from Theorem 1.2 and Theorem 1.3 (i), we have 11X2n - 1X2n -11 = l/q2nq2n-l ::;;; 1/2n(2n - 1) so that limIX2n = limIX2n-1' 0 Exercise. Prove that

Pn =

-1 al 1

ao 1 0

0 - 1 a2

0 0 - 1

0 0 0

0 0 0

0 0 0

.......................................

0 0

0 0

0 0

0 0

1 an-l 0

- 1 an

and that qn is the determinant above with the first row and first column omitted. Exercise 2. The sequence (un) = (1, 1,2,3,5,8, 13, ... ), where Ul = U2 = 1, Ui + 1 = Ui - 1 + Ui (i > 1), is called the Fibonacci sequence. Prove that (i) Un +2/Un + 1 is the n-th convergent of (1 + )/2; (ii) in the continued fraction [ao, at. .. .], if ai = 2 (i > 0) and an = 1 (n # i), then for m > i we have

J5

Pm Ui+1 Um-i+3 + Ui Um-i+l qm Ui Um-i+3 + Ui-1 Um-i+l Exercise 3. A synodic month is the period of time between two new moons, and is 29.5306 days. When projected onto the star sphere, the path of the moon intersects the ecliptic (the path of the sun) at the ascending and the descending nodes. A draconic month is the period of time for the moon to return to the same node, and is 27.2123 days. Show that solar and lunar eclipses occur in cycles with a period of 18 years 10 days.

10.2 The Uniqueness of a Continued Fraction Expansion

Definition. We call $\alpha'_n = [a_n, a_{n+1}, \ldots]$ the $(n+1)$-th complete quotient of $[a_0, a_1, \ldots, a_n, \ldots]$.

Theorem 2.1. We have

$$\alpha = \alpha'_0, \qquad \alpha = \frac{\alpha'_1 a_0 + 1}{\alpha'_1}, \qquad \alpha = \frac{\alpha'_n p_{n-1} + p_{n-2}}{\alpha'_n q_{n-1} + q_{n-2}}.$$

If $\alpha$ is rational, then this holds up to $n = N$.

Proof. Use mathematical induction. □

Theorem 2.2. We always have $a_n = [\alpha'_n]$, except when $\alpha$ is rational and $a_N = 1$ in which case we have $a_{N-1} = [\alpha'_{N-1}] - 1$. Therefore there are only two representations to a rational number.

Proof. We have $\alpha'_n = a_n + 1/\alpha'_{n+1}$. If $\alpha$ is irrational or if $\alpha$ is rational and $n \ne N - 1$, then $\alpha'_{n+1} > 1$ so that $a_n < \alpha'_n < a_n + 1$, as required. If $\alpha$ is rational and $n = N - 1$, $\alpha'_{n+1} = 1$, then $a_n = [\alpha'_n] - 1$. □

Theorem 2.3. The representation of an irrational number by a simple continued fraction is unique.

Proof. Suppose that $\alpha = [a_0, a_1, a_2, \ldots] = [b_0, b_1, b_2, \ldots]$. Certainly we have $a_0 = [\alpha] = b_0$, and similarly $a_1 = b_1$. Suppose now that $a_k = b_k$ for $k < n$, and we have to prove that $a_n = b_n$. From $\alpha = [a_0, \ldots, a_{n-1}, \alpha'_n] = [a_0, \ldots, a_{n-1}, \beta'_n]$, we have

$$\alpha = \frac{\alpha'_n p_{n-1} + p_{n-2}}{\alpha'_n q_{n-1} + q_{n-2}} = \frac{\beta'_n p_{n-1} + p_{n-2}}{\beta'_n q_{n-1} + q_{n-2}},$$

so that $(\alpha'_n - \beta'_n)(p_{n-1} q_{n-2} - p_{n-2} q_{n-1}) = 0$. From Theorem 1.2 we deduce that $\alpha'_n = \beta'_n$ and therefore $a_n = [\alpha'_n] = [\beta'_n] = b_n$. □

Theorem 2.4. We have

$$q_n \alpha - p_n = \frac{(-1)^{n} \delta_n}{q_{n+1}}, \qquad 0 < \delta_n < 1,$$

and $\delta_n/q_{n+1}$ is a decreasing function of $n$. (If $\alpha$ is rational, then this holds only for $1 \le n \le N - 2$, and $\delta_{N-1} = 1$.)

Proof. We have

so that Pn IX~+1Pn + Pn-1 IX - - = ~-=----qn IX~+lqn+qn-1

Pn qn

- (Pnqn-1 - qnPn-1) qn(IX~+lqn + qn+d

(- I)n qn(IX~+lqn

+ qn-1)'


and hence (j= n

qn+1 rx n+1qn + qn-1

an+1qn+qn-1 rx n+1qn + qn-1

I

I

From this we see that 0 < (jn < 1 except when rxn + 1 rx~ = 1 + l/rx~+ 1 we have

= rx~ + l '

Also, from

1

-----;::: -------rx~+lqn

+ qn-1

(an+1

+ l)qn + qn-1

In the last inequality, equality sign holds only when rxn+ 1 rational and n = N - 1. 0

= rx~+ p

that is when rx is

From this theorem we deduce:

Theorem 2.5. If $\alpha$ is irrational, then $\lim p_n/q_n = \alpha$. □

Theorem 2.6.

$$\left|\alpha - \frac{p_n}{q_n}\right| \le \frac{1}{q_n q_{n+1}} < \frac{1}{q_n^{2}},$$

with the equality sign only when $\alpha$ is rational and $n = N - 1$. □

10.3 The Best Approximation

Let $\alpha$ be a real number. Among the rational numbers with denominators not exceeding $N$, there is one which is closest to $\alpha$, and we call it the best rational approximation to $\alpha$. We now prove that the convergents $p_n/q_n$ are the best rational approximations to $\alpha$.

Theorem 3.1. Suppose that $n \ge 1$, $0 < q \le q_n$ and $p/q \ne p_n/q_n$. Then $|p_n/q_n - \alpha| < |p/q - \alpha|$.

Proof. It suffices to prove that $|p_n - q_n \alpha| < |p - q\alpha|$. (i) If $\alpha = [\alpha] + \frac{1}{2}$, then $p_1/q_1 = \alpha$ and the result follows at once. (ii) If $\alpha < [\alpha] + \frac{1}{2}$, then the result holds when $n = 0$, and if $\alpha > [\alpha] + \frac{1}{2}$, then the result holds when $n = 1$. We now assume as induction hypothesis that the result holds for $n - 1$, and proceed to prove by induction. If $q \le q_{n-1}$, then from the induction hypothesis $|p_{n-1} - q_{n-1}\alpha| < |p - q\alpha|$, so that we may assume that $q_n \ge q > q_{n-1}$. If $q = q_n$, then

255

lOA Hurwitz's Theorem

Also

If qn+l = 2, then n = 1, and al = a2 = 1, giving 1

IX

l' 1

= ao + -

-

-

1 + 1 + a3

+ ...

t

which shows that ao + < IX < ao + 1, and our required result clearly holds. We may therefore assume that qn+ 1 > 2, that is

and so

II!..q -

IX

I ~ II!.. - Pn 1-IPn q

qn

qn

IX

I ~ ~ -Ipn qn

qn

IX

I > Ipn qn

IX

I·

We may now assume that qn > q > qn-l' Let us write upn + vPn-l = p, uqn + vqn-l = q, so that u(Pnqn-l - Pn-lqn) = pqn-l - qPn-l' From Theorem 1.2 we have u = ± (pqn-l - qPn-l), and similarly v = ± (pqn - qPn). The numbers u, v cannot be zero, and in fact from qn > q = uqn + vqn-l we see that they are of opposite signs. Now from Theorem 2A,Pn - qnlX andpn-l - qn-llX have opposite signs, and therefore u(Pn - qnlX) and V(Pn-l - qn-llX) have the same sign. Finally from p-qlX=U(Pn-qnlX)+V(Pn-l-qn-llX) we see that Ip-qlXl>IPn-l-qn-llXl > IPn - qnlXl· D

Example. From $\pi = [3, 7, 15, 1, 292, 1, 1, \ldots]$ we obtain the convergents

$$\frac{3}{1},\ \frac{22}{7},\ \frac{333}{106},\ \frac{355}{113},\ \frac{103993}{33102},\ \frac{104348}{33215},\ \ldots.$$

In the year 500 A.D. Chao Jung-Tze obtained both the crude estimate 22/7 and the good estimate 355/113 (this is more than a thousand years earlier than the earliest European record due to Otto). More interesting still, the two estimates of Chao belong to the family of best approximations to $\pi$; in other words there is no fraction with denominator less than 113 which is closer to $\pi$ than 355/113 is. From Theorem 2.6 we have

$$\left|\pi - \frac{355}{113}\right| < \frac{1}{113 \times 33102}.$$

[…] $d > 0$.
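The claim that no fraction with denominator below 113 is closer to $\pi$ than 355/113 can be confirmed by exhaustive search. The sketch below is an illustration only, not part of the original text: the function name `best_approximation` is an assumption, and `math.pi` (a double-precision value) stands in for $\pi$, which is accurate enough for denominators of this size.

```python
import math
from fractions import Fraction

def best_approximation(alpha, max_den):
    """Fraction p/q with 1 <= q <= max_den that minimises |alpha - p/q| (brute force)."""
    best, best_err = None, None
    for q in range(1, max_den + 1):
        p = round(alpha * q)          # nearest numerator for this denominator
        err = abs(alpha - p / q)
        if best_err is None or err < best_err:
            best, best_err = Fraction(p, q), err
    return best

for bound in (7, 106, 113):
    frac = best_approximation(math.pi, bound)
    print(bound, frac, abs(math.pi - frac.numerator / frac.denominator))

# The error bound quoted from Theorem 2.6 for the convergent 355/113.
print(abs(math.pi - 355 / 113), 1 / (113 * 33102))
```

The search is strict about ties, so unreduced duplicates such as 44/14 never replace 22/7. The last line shows how sharp the bound $1/(q_n q_{n+1})$ is in this case: the true error is only slightly smaller than $1/(113 \times 33102)$, reflecting the large partial quotient 292.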

e

e

e

Proof From = [ao, al>"" ak-l> '1], we have = (Pk-I'1 + Pk-2)/(qk-l'1 + qk-2) and we see that the condition c > d > 0 is necessary. The sufficiency of the condition can be proved by induction on d. D

e

Theorem 5.3. A necessary and sufficient condition for two irrational numbers and '1 to be equivalent is that = [ao, al>' .. , am, co, Cl>' .. ] and '1 = [b o, bl> ... , bn , co, Cl,' .. J. In other words their continued fractions expansions are eventually identical.

e

Proof I) Let W = [co, Cl>' .. J. Then

+ Pm-l , e= [ao,al> ... ,am,w] =WPm wqm + qm-l Thus wand eare equivalent. Similarly wand '1 are equivalent, and hence eand '1 are equivalent. 2) Let eand '1 be equivalent, and '1 = (ae + b)/(ce + d), ad - bc = ± l. We may assume that ce + d> O. We expand einto continued fractions: e= [ao, ... , ak, ak+ l>' ..] = [ao,· .. ,ak-l> IX~] =

(IX~Pk-l

+ Pk-2)(a.~qk-l + qk_2)-l.

259

10.5 The Equivalence of Real Numbers

It follows that '1 = (Prx~

+ R)(Qrx~ + S)-l,

aPk-2 + bqk-2, Q = CPk-1 + dqk-t. S= CPk-2 satisfying PS - QR = ± 1. From Theorem 2.4 we have Pk-2

where P = aPk-1 + bqk-t. R = + dqk-2; P, Q, R, S are integers

(j'

+ --,

= ~qk-2

l(jl < 1, WI < 1,

qk-2

so that (c~

Q=

c(j

+ d)qk-1 + - ,

S

qk-1

=(c~

c(j'

+ d)qk-2 + --. qk-2

From c~ + d> 0, qk-2 ~ k - 2 and by Theorem 1.3, qk-1 ~ qk-2 + 1 we see that Q > S > when k is sufficiently large. It follows from Theorem 5.2 that '1 = [b o,· .. , bn , rx~] and the necessity of the condition is established. 0

°

Denote by M(rx) the greatest number such that, for any e > 0, the inequality

Irx -

~ I ~ (M(rx) 1_ e)q;

has infinitely many solutions. For example Pi rx - qi

M«fi -

1)/2) =

fi. Let

1 Aiqi

= .,.---z ;

then 1 Ai = ( - l)i(' rxi + 1

qi-1) , +~

and qi

q;/qi-1

ai+qi-1

ai+ ai-1+qi-2

Therefore

i-co

i-oo

If rx and {J are equivalent, then ai = bi for all large i. We have therefore proved the following Theorem 5.4.

If rx and {J are equivalent,

then M(rx)

=

fi

M({J).

0

(fi -

We see therefore that if A > and if rx is equivalent to 1)/2, then the inequality Irx - p/ql < 1/Aq2 has only finitely many solutions. We may now ask for the value of M(rx) when rx is not equivalent to 1)/2. We have the following

(fi -

260


(J5 -

result: If oe is not equivalent to 1)/2, then M(oe) ~)8. Specifically, for such oe, the inequality loe - p/ql < 1/)8 q2 has infinitely many solutions. Also, if oe is equivalent to 1 + ,J2, then M(oe) = )8. For the general situation, we need the following: Definition 5.3. By a Markoff number we mean a positive integer u such that there are integers v, w satisfying u 2 + v2 + w2 = 3uvw. The first eleven Markoff numbers are 1,2,5,13,29,34,89,169,194,233,433. (We shall prove in the next chapter that the number of Markoff numbers is infinite.) It can be proved that if oe u = ~ (J9U 2 2u

-

4

+ u + 2V) w

where u, v, ware related by the definition of the Markoff number u, then M(oe u) = J9u 2 - 4/u. Furthermore if oe is not equivalent to oe u for I ::;:; u ::;:; v, then the inequality .

loe -~I
O. Now (J( = [ao, ... , an- b f3J. If 13 ~ 1, then 13 = (J(~ ( = [an> an+ b" .]), and this means that p/q is a convergent of (J(. If 0 < 13 < 1, then [an-I + 1/13] = an-I + c, c > 0 so that (J( = [ao, ... , an-2, an- b + c, . .. ] and we see that [ao, . .. , an-I] is not a convergent. Therefore the required necessary and sufficient condition is that 13 ~ 1; in other words we have: Theorem 7.1 (Legendt:e). Let e9- = q2(J( - pq, e = ± 1, 0 < 9- < 1, and let p/q = [ao, .. . ,an-I], ( - 1t- 1 = e. Then, a necessary and sufficient conditionfor p/q to be a convergent of (J( is that

D Since the right hand side of the above inequality exceeds t we deduce at once the following

Theorem 7.2. If a rational number $p/q$ satisfies $|\alpha - p/q| < 1/(2q^2)$, then it is a convergent of $\alpha$. □

Theorem 7.3. Let $p, q$ be positive integers satisfying $|p^2 - \alpha^2q^2| < \alpha$. Then $p/q$ is a convergent of $\alpha$.

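Proof. Let $\alpha^2q^2 - p^2 = \varepsilon\delta\alpha$, $\varepsilon = \pm1$, $0 \le \delta < 1$. Then $\alpha q - p = \varepsilon\delta\alpha/(\alpha q + p)$, so that

$$\vartheta = \varepsilon q(\alpha q - p) = \frac{\delta\alpha q}{\alpha q + p} = \frac{\delta\alpha q_{n-1}}{\alpha q_{n-1} + p_{n-1}}, \qquad (-1)^{n-1} = \varepsilon.$$

From Theorem 7.1 we see that it suffices to prove that

$$\frac{\delta\alpha q_{n-1}}{\alpha q_{n-1} + p_{n-1}} < \frac{q_{n-1}}{q_{n-1} + q_{n-2}},$$

or that $\delta\alpha(q_{n-1} + q_{n-2}) < \alpha q_{n-1} + p_{n-1}$. Now this inequality clearly holds when $n = 2$ so that it suffices to establish $\alpha q_{n-1} - p_{n-1} < \alpha(q_{n-1} - q_{n-2})$ for $n > 2$. But $\alpha q_{n-1} - p_{n-1} = \varepsilon\delta\alpha/(\alpha q_{n-1} + p_{n-1})$, and by Theorem 1.3 we have

$$q_{n-1} - q_{n-2} \ge 1 > \frac{1}{\alpha q_{n-1} + p_{n-1}}.$$

The theorem is proved. □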

10.8 Quadratic Indeterminate Equations

In this and the next sections $d$ denotes a positive integer which is not a perfect square. We consider the equation

0< III
yd = (X2,Y2) = I we deduce that Xl = X2, Yl = Y2 contrary to our assumption. We have therefore proved

°

Theorem 9.1. Pell's equation $x^2 - dy^2 = 1$ has a non-trivial solution. □

From Theorem 7.3 we see that $x/y = p_{n-1}/q_{n-1}$ must be a convergent of $\sqrt d$, and from Theorem 8.2 we know that there exists $n$ such that $(-1)^nQ_n = 1$.

Theorem 9.2. Let $n$ be the least positive integer satisfying $(-1)^nQ_n = 1$. Then all the solutions to the equation $x^2 - dy^2 = 1$ are given by

$$x + \sqrt d\,y = \pm(p_{n-1} + \sqrt d\,q_{n-1})^m, \qquad m = 0, \pm1, \pm2, \ldots.$$

10.9 Pell's Equation

Proof. Let $\varepsilon = p_{n-1} + \sqrt d\,q_{n-1} > 1$. Because $\pm1/(x + \sqrt d\,y) = \pm(x - \sqrt d\,y)$, it suffices to show that all positive solutions to $x^2 - dy^2 = 1$ are given by $x + y\sqrt d = \varepsilon^m$ $(m > 0)$. Let $(x, y)$ be such a solution, so that $x + y\sqrt d > 1$. We may choose $m$ so that $\varepsilon^m \le x + y\sqrt d < \varepsilon^{m+1}$, or $1 \le \varepsilon^{-m}(x + y\sqrt d) < \varepsilon$. Let $\varepsilon^m = x_0 + y_0\sqrt d$ and

$$(x_0 - y_0\sqrt d)(x + y\sqrt d) = X + Y\sqrt d,$$

and we shall prove that $X + Y\sqrt d = 1$. Since $\sqrt d$ is irrational, it follows that

$$(x_0 + y_0\sqrt d)(x - y\sqrt d) = X - Y\sqrt d.$$

On multiplying the equations together we have $X^2 - dY^2 = 1$. Suppose now that $1 < X + \sqrt d\,Y < \varepsilon$. Then

$$0 < \varepsilon^{-1} < (X + \sqrt d\,Y)^{-1} = X - \sqrt d\,Y < 1.$$

We deduce easily that

$$2X = (X + \sqrt d\,Y) + (X - \sqrt d\,Y) > 1 + \varepsilon^{-1} > 0,$$
$$2\sqrt d\,Y = (X + \sqrt d\,Y) - (X - \sqrt d\,Y) > 1 - 1 = 0.$$

It follows from these that $X > 0$, $Y > 0$ and $X + \sqrt d\,Y < \varepsilon = p_{n-1} + \sqrt d\,q_{n-1}$. Now $x = \sqrt{1 + dy^2}$ increases with $y$, so that $x + \sqrt d\,y$ increases as $y$ increases. We deduce from the above that $Y < q_{n-1}$ and $X < p_{n-1}$, so that $X/Y$ is a convergent with denominator less than $q_{n-1}$. This is impossible; therefore $X + \sqrt d\,Y = 1$. □

We see from the above that the equation $x^2 - dy^2 = 1$ is always soluble, but the equation $x^2 - dy^2 = -1$ may have no solution. For example, since $x^2 \equiv 0, 1 \pmod 4$ so that $x^2 - 3y^2 \equiv x^2 + y^2 \equiv 0, 1, 2 \pmod 4$, we see that the equation $x^2 - 3y^2 = -1$ is insoluble. In fact this example shows that $x^2 - dy^2 = -1$ is insoluble whenever $d \equiv 3 \pmod 4$. However if $x_0, y_0$ satisfy $x_0^2 - dy_0^2 = -1$, then, by defining $x_1, y_1$ with $x_1 + \sqrt d\,y_1 = (x_0 + \sqrt d\,y_0)^2$ we see that $x_1^2 - dy_1^2 = 1$. It is not difficult to prove that if $x^2 - dy^2 = -1$ is soluble, then all the solutions to $x^2 - dy^2 = \pm1$ are given by $\pm(p_{n-1} + \sqrt d\,q_{n-1})^m$, $m = 0, \pm1, \pm2, \ldots$, where $n$ is the least positive integer satisfying $(-1)^nQ_n = -1$.
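The construction in Theorem 9.2 can be tried numerically. The following Python sketch is an illustration only and is not part of the original text; the function name `pell_fundamental` is our own. It expands $\sqrt d$ as a continued fraction by the usual integer recurrence and stops at the first convergent $p/q$ with $p^2 - dq^2 = 1$.

```python
from math import isqrt


def pell_fundamental(d):
    """Least positive solution (x, y) of x^2 - d*y^2 = 1, found among the
    convergents p/q of the continued fraction of sqrt(d)."""
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    # The current complete quotient is (sqrt(d) + m) / den, with partial quotient a.
    m, den, a = 0, 1, a0
    p_prev, p = 1, a0   # numerators of successive convergents
    q_prev, q = 0, 1    # denominators of successive convergents
    while p * p - d * q * q != 1:
        m = den * a - m
        den = (d - m * m) // den
        a = (a0 + m) // den
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
    return p, q
```

For instance, `pell_fundamental(15)` gives the pair $(4, 1)$, so that $4 + \sqrt{15}$ generates the solutions of $x^2 - 15y^2 = 1$.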


10.10 Chebyshev's Theorem and Khintchin's Theorem

Let $\vartheta$ be an irrational number. According to Theorem 2.4 there are infinitely many integers $x, y$ satisfying

$$|x\vartheta - y| < \frac1x. \tag{1}$$

It follows that if $\varepsilon > 0$, then there exists an integer $x$ such that $x\vartheta$ differs from $[x\vartheta]$ by less than $\varepsilon$. In other words the number 0 is a limit point of the point set

$$x\vartheta - [x\vartheta], \qquad x = 1, 2, 3, \ldots. \tag{2}$$

An immediate problem arising from this is the determination of the set of limit points of the point set (2). For this Chebyshev has proved that each point in the interval $(0, 1)$ is a limit point of the point set (2). In fact he proved the following stronger result.

Theorem 10.1. Let $\vartheta$ be any irrational number and $\beta$ be any real number. Then there are infinitely many integers $x, y$ satisfying

$$|\vartheta x - y - \beta| < \frac3x.$$

/31 0 such that p

(i

l(il
0. Then the inequality

$$|x\vartheta - y - \beta| < \frac{1 + \varepsilon}{\sqrt5\,x} \tag{8}$$

has infinitely many solutions in integers $x > 0$, and $y$.

Proof. By Theorem 4.3 there are infinitely many coprime pairs of integers $p, q$ such that $\vartheta = p/q + \delta/q^2$, where $0 < |\delta| < 1/\sqrt5$. We may assume that $\delta > 0$ since otherwise we can replace $\vartheta, \beta$ by $-\vartheta, -\beta$. Let $\xi_1, \xi_2$ be real numbers satisfying $\xi_2 - \xi_1 \ge 1$, and we shall specify them later. We can choose $x, y$ such that

$$px - qy = [q\beta], \qquad \xi_1q \le x < \xi_2q. \tag{9}$$

Then we have

$$|x\vartheta - y - \beta| = \left|\frac{px}{q} + \frac{\delta x}{q^2} - y - \frac{[q\beta]}{q} - \frac{\tau}{q}\right| \tag{10}$$

where $\tau = q\beta - [q\beta]$. We want to show that

_~ ~ ~ (x; _-r) < ~, or -r2

1

J5

x 2b ~ q2

x-r

-r2

-r2

1

J5.

---~---+- I (the left hand side is merely ~2 - ~1) we obtain .. 2 < ~ +

(I -

+ 2c5 (I -

fi)

+ c5 2 = I -

fi

+ c5

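$< 1/\sqrt5 - \delta$. Let $\eta > 0$. We may specify $x$ and $y$ such that

$$px - qy = [q\beta] + 1, \qquad \eta q \le x < (1 + \eta)q,$$

and similarly to (10) we have

$$|x\vartheta - y - \beta| = \left|\frac{x\delta}{q^2} + \frac{1 - \tau}{q}\right| = \frac1q\left(\frac{x\delta}{q} + (1 - \tau)\right) < \frac1q\left\{(1 + \eta)\delta + \frac{1}{\sqrt5} - \delta\right\} \le \frac1q\cdot\frac{1 + \eta}{\sqrt5} < \frac{(1 + \eta)^2}{x\sqrt5}.$$

Since $\eta$ is arbitrary, the theorem is proved. □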

Exercise. Let $\vartheta$ be an irrational number such that, given any $\varepsilon > 0$, there always exist integers $x, y$ satisfying $|x\vartheta - y| < \varepsilon/x$. Prove that if $\delta > 0$ and $\beta$ is real, then there exist integers $x, y$ such that $|x\vartheta - y - \beta| < (1 + \delta)/3x$.

10.11 Uniform Distributions and the Uniform Distribution of $n\vartheta$ (mod 1)

Chebyshev's theorem in the last section states that the point set $\{x\vartheta\} = x\vartheta - [x\vartheta]$, $x = 1, 2, 3, \ldots$ is dense in the interval $(0, 1)$, in the sense that each point in $(0, 1)$ is a limit point of the set. We may ask about the distribution of this point set in the interval $(0, 1)$. In other words, if $(a, b)$ is a subinterval of $(0, 1)$, then as $x$ takes the values $1, 2, \ldots, n$ does the interval $(a, b)$ receive the "correct proportion" of points? Let us define precisely what we mean by the "correct proportion".

Definition. Let $P_i$ $(i = 1, 2, 3, \ldots)$ be a point set in the interval $(0, 1)$. Let $0 \le a < b \le 1$, and for each positive integer $n$ denote by $N_n(a, b)$ the number of points $P_1, P_2, \ldots, P_n$ that lie in the interval $(a, b)$. If $\lim_{n\to\infty} N_n(a, b)/n = b - a$ always holds, then we say that the point set $P_i$ $(i = 1, 2, 3, \ldots)$ is uniformly distributed in $(0, 1)$.
0 such that p

J

IJI
$12/\varepsilon$ and then choose $n > 2(q + 6)/\varepsilon$. It follows that we have $n(b - a) - n\varepsilon \le N_n(a, b) \le n(b - a) + n\varepsilon$. This proves that $\lim_{n\to\infty} N_n(a, b)/n = b - a$. □
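The limit $N_n(a, b)/n \to b - a$ for the points $\{x\vartheta\}$ can be observed experimentally. The Python sketch below is an illustration and not part of the original text; the function name `proportion_in_interval` is our own, and floating point arithmetic is assumed to be accurate enough for the modest values of $n$ used.

```python
from math import sqrt


def proportion_in_interval(theta, a, b, n):
    """Return N_n(a, b) / n for the points {x * theta}, x = 1, ..., n."""
    count = sum(1 for x in range(1, n + 1) if a < (x * theta) % 1.0 < b)
    return count / n


if __name__ == "__main__":
    # For an irrational theta the proportion should approach b - a.
    for n in (100, 1000, 10000):
        print(n, proportion_in_interval(sqrt(2), 0.2, 0.5, n))
```

For a rational value such as $\vartheta = 1/2$ the points only take the values $0$ and $1/2$, so the proportion cannot tend to $b - a$ for every interval.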

10.12 Criteria for Uniform Distributions

Theorem 12.1 (Weyl). A necessary and sufficient condition for the sequence $(x_n)$, $0 \le x_n \le 1$, to be uniformly distributed in $(0, 1)$ is that the equation

$$\lim_{n\to\infty}\frac{f(x_1) + \cdots + f(x_n)}{n} = \int_0^1 f(x)\,dx \tag{1}$$

holds for every Riemann integrable function $f(x)$ in $(0, 1)$.

Proof. We first establish the necessity of the condition (1).

1) Let $f(x)$ be defined to be $c$ or 0 according to whether $a \le x \le b$ or not. Then clearly

$$\lim_{n\to\infty}\frac{f(x_1) + \cdots + f(x_n)}{n} = c\lim_{n\to\infty}\frac{N_n(a, b)}{n} = c(b - a),$$

and

$$\int_0^1 f(x)\,dx = c(b - a).$$

Therefore the equation (1) holds for this function $f(x)$.

2) The equation (1) is linear in the sense that if it holds for $f_1, \ldots, f_k$, then it holds for $c_1f_1 + \cdots + c_kf_k$. From 1) we see that (1) holds for all step functions.

3) It is a simple exercise to show that if $f$ is Riemann integrable, and $\varepsilon > 0$, then there are two step functions $\varphi_\varepsilon(x)$, $\Phi_\varepsilon(x)$ such that $\varphi_\varepsilon(x) \le f(x) \le \Phi_\varepsilon(x)$ and

$$\int_0^1 (\Phi_\varepsilon(t) - \varphi_\varepsilon(t))\,dt < \varepsilon.$$
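$$\lim_{n\to\infty}\frac1n\left|\sum_{\nu=1}^{n} e^{2\pi imx_\nu}\right| = 0$$

holds for all $m \ne 0$.

Proof. There is no need to prove the necessity part. For the sufficiency part we define

$$g(x) = \begin{cases} 1, & \text{if } 0 \le x < a, \\ 0, & \text{if } a \le x < 1. \end{cases}$$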

~-1>+1

Co =

= a + b - 21'/;

~-I>

and when n ¥- 0,

~-I>

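It follows that $|C_n| \le 1/(\delta(n\pi)^2)$ and

$$S_{\eta,\delta}(x) = \frac{g_{\eta,\delta}(x_1) + \cdots + g_{\eta,\delta}(x_k)}{k} = \frac1k\sum_{j=1}^{k}\sum_{n=-\infty}^{\infty} C_ne^{2\pi inx_j} = \sum_{n=-\infty}^{\infty} C_n\,\frac1k\sum_{j=1}^{k} e^{2\pi inx_j}.$$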

Thus we have

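We observe that

$$\left|\sum_{|n|>N} C_n\,\frac1k\sum_{j=1}^{k} e^{2\pi inx_j}\right| \le \frac{2}{\delta\pi^2}\sum_{n>N}\frac{1}{n^2}.$$

Let $\varepsilon > 0$ and choose $N$ so that the right hand side of this inequality is less than $\varepsilon$. With $N$ fixed we see from

$$\lim_{k\to\infty}\frac1k\sum_{j=1}^{k} e^{2\pi inx_j} = 0$$

that for all large $k$,

$$\left|\sum_{\substack{n=-N \\ n\ne0}}^{N} C_n\,\frac1k\sum_{j=1}^{k} e^{2\pi inx_j}\right| < \varepsilon.$$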

Thus, given any pair of fixed 1'/, b we have

or

$$\lim_{k\to\infty} S_{\eta,\delta}(x) = a + \delta - 2\eta.$$

Now let

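From $S_{\delta,0}(x) \le S(x) \le S_{0,\delta}(x)$ we deduce that

$$a - 2\delta \le \liminf_{k\to\infty} S(x) \le \limsup_{k\to\infty} S(x) \le a + \delta$$

for any $\delta$. Therefore $\lim_{k\to\infty} S = a$ as required. □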

For a clearer description of uniform distribution it is best to use the unit circle to represent the interval. Let $\xi_n = e^{2\pi ix_n}$, $n = 1, 2, \ldots$ so that the sequence $(x_n)$ in $0 \le x_n \le 1$ is now transformed into a sequence on the unit circle. An advantage of using this description is the removal of the special properties of the end points 0, 1 in the interval $(0, 1)$. Take any arc of the unit circle with length $2\pi\alpha$ $(\alpha < 1)$. Then any uniformly distributed sequence will have the proportion $\alpha$ of its points on this arc. Moreover, since $e^{2\pi ix_n} = e^{2\pi i(x_n + d)}$ for any integer $d$, it does not even matter if the sequence $(x_n)$ lies outside the interval $(0, 1)$. In other words we may define uniform distribution of $f(x)$, mod 1 by the uniform distribution of the fractional parts of $f(x)$ in $(0, 1)$. A necessary and sufficient condition for the uniform distribution of $f(x)$, mod 1 is then

$$\lim_{n\to\infty}\frac1n\sum_{x=1}^{n} e^{2\pi imf(x)} = 0, \qquad m \ne 0.$$

An interpretation of this condition is that the centre of gravity of the sequence of points $e^{2\pi imf(x)}$, $x = 1, 2, \ldots$, $(m \ne 0)$ is the centre of the circle. It is clear that if $f(x)$ is uniformly distributed mod 1, then so is $mf(x)$ for any non-zero integer $m$. The most interesting unsolved problem concerning this is whether $e^x$ is uniformly distributed mod 1.
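The "centre of gravity" interpretation of the exponential sum criterion lends itself to a small experiment. The following Python sketch is an illustration, not part of the original text; the name `weyl_mean` is our own. It computes the modulus of the mean of the points $e^{2\pi imx}$ for a finite list of values $x$.

```python
import cmath
from math import pi, sqrt


def weyl_mean(xs, m):
    """Modulus of (1/n) * sum of exp(2*pi*i*m*x) over the n values x in xs."""
    n = len(xs)
    return abs(sum(cmath.exp(2j * pi * m * x) for x in xs)) / n


if __name__ == "__main__":
    xs = [k * sqrt(2) for k in range(1, 10001)]
    for m in (1, 2, 3):
        # These means should be close to 0 for an irrational multiplier.
        print(m, weyl_mean(xs, m))
```

For the sequence $x_k = k/4$ and $m = 4$ every term of the sum equals 1, so the mean has modulus 1 and the criterion fails, as it should for a sequence taking only four values mod 1.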

Theorem 12.3. A necessary and sufficient condition for the uniform distribution of $f(x)$, mod 1 is that

$$\lim_{n\to\infty}\frac1n\sum_{x=1}^{n}\{f(x) + a\} = \frac12$$

for every real number $a$.

Proof. Necessity. Let $f(x)$ be uniformly distributed, mod 1. Then $f(x) + a$ is also uniformly distributed, mod 1. Therefore we need only establish the case $a = 0$. Let $x_m = \{f(m)\}$. Then, by Theorem 12.1, we have

$$\lim_{n\to\infty}\frac1n\sum_{x=1}^{n}\{f(x)\} = \int_0^1 x\,dx = \frac12.$$

Sufficiency. Let $0 \le b \le 1$. Then

$$\frac1n\sum_{x=1}^{n}\{f(x) + 1 - b\} = \frac1n{\sum}_1(\{f(x)\} + 1 - b) + \frac1n{\sum}_2(\{f(x)\} - b),$$

where in $\sum_1$, $x$ runs through those integers $1, 2, \ldots, n$ such that $\{f(x)\} < b$, and in $\sum_2$, $x$ runs through those integers $1, 2, \ldots, n$ such that $\{f(x)\} \ge b$. We see therefore that

$$\frac1n\sum_{x=1}^{n}\{f(x) + 1 - b\} = n^{-1}\sum_{x=1}^{n}\{f(x)\} + n^{-1}N_n(0, b) - b.$$

Letting $n \to \infty$ and observing the hypothesis we see that

$$\lim_{n\to\infty}\frac1nN_n(0, b) = b$$

as required. □

Chapter 11. Indeterminate Equations

11.1 Introduction

By indeterminate equations we mean equations in which the number of unknowns exceeds the number of equations given, and in which the unknowns are subject to further constraints such as being integers, or positive integers, or rationals etc. Apart from equations of the first and second degrees, the discussion on indeterminate equations is very scattered. The complicated nature of the subject is illustrated by the fact that Volume II of Dickson's History of Number Theory devotes over eight hundred pages to such equations. The study of these equations has a long history. In the third century Diophantus attempted a systematic study and in fact nowadays indeterminate equations are often called Diophantine equations. In our country indeterminate equations have an even longer history; for example Soon Go gave the general solution of $x^2 + y^2 = z^2$ in integers $x, y, z$ much earlier than the west.

11.2 Linear Indeterminate Equations

From Theorem 2.6.2 we see that a necessary and sufficient condition for the equation $a_1x_1 + a_2x_2 + \cdots + a_nx_n = N$ to have a solution is that $(a_1, \ldots, a_n) \mid N$. Suppose now that $a_1 > 0, \ldots, a_n > 0$, $(a_1, \ldots, a_n) = 1$. We ask for the asymptotic formula for the number of solutions to the equation

$$a_1x_1 + a_2x_2 + \cdots + a_nx_n = N, \qquad x_\nu \ge 0 \quad (\nu = 1, 2, \ldots, n). \tag{1}$$

Theorem 2.1. Let $(a_1, \ldots, a_n) = 1$, and denote by $A(N)$ the number of solutions to (1). Then we have

$$\lim_{N\to\infty}\frac{A(N)}{N^{n-1}} = \frac{1}{(n-1)!\,a_1\cdots a_n}.$$

Proof. 1) Since $(a_1, \ldots, a_n) = 1$, the number $A(N)$ is the coefficient of $x^N$ in the power series for

$$f(x) = \frac{1}{1 - x^{a_1}}\cdot\frac{1}{1 - x^{a_2}}\cdots\frac{1}{1 - x^{a_n}}.$$

Let $1, \zeta_1, \zeta_2, \ldots, \zeta_t$ be the roots of $(1 - x^{a_1})\cdots(1 - x^{a_n}) = 0$, with multiplicities $n, l_1, l_2, \ldots, l_t$ respectively. Since $(a_1, \ldots, a_n) = 1$ we have $l_i \le n - 1$ $(i = 1, 2, \ldots, t)$. We have, by partial fractions,

$$f(x) = \frac{A_n}{(1 - x)^n} + \cdots + \frac{A_1}{1 - x} + \frac{B_{l_1}}{(\zeta_1 - x)^{l_1}} + \cdots + \frac{B_1}{\zeta_1 - x} + \cdots \tag{2}$$

where $A, B, \ldots, P$ are constants.

2) Denote by $\psi(N)$ the coefficient of $x^N$ in the power series expansion of

$$\frac{A}{(\alpha - x)^l} = A\alpha^{-l}\left(1 - \frac{x}{\alpha}\right)^{-l}.$$

Then, by the binomial theorem expansion, we have

$$\psi(N) = A\alpha^{-l}\,\frac{(-l)(-l-1)\cdots(-l-N+1)}{N!}\left(-\frac1\alpha\right)^N = A\alpha^{-l}\,\frac{(N+l-1)(N+l-2)\cdots(N+1)}{(l-1)!}\left(\frac1\alpha\right)^N,$$

so that

$$\lim_{N\to\infty}\frac{\psi(N)\,\alpha^{l+N}}{N^{l-1}} = \frac{A}{(l-1)!}. \tag{3}$$

Applying this to the various terms in (2) and observing that $l_i \le n - 1$ we see that

$$\lim_{N\to\infty}\frac{A(N)}{N^{n-1}} = \frac{A_n}{(n-1)!},$$

and from (2) we have

$$A_n = \lim_{x\to1}\frac{(1 - x)^n}{(1 - x^{a_1})\cdots(1 - x^{a_n})} = \frac{1}{a_1\cdots a_n}.$$ □

Theorem 2.2. Equation (1) is always soluble if $N$ is sufficiently large. □

Exercise. Let $(a, b) = 1$, $a > 0$, $b > 0$. Show that the number of solutions to $ax + by = N$, $x \ge 0$, $y \ge 0$ is given by

$$\frac{N - (bl + am)}{ab} + 1$$

where $l$ and $m$ are the least non-negative solutions to $bl \equiv N \pmod a$ and $am \equiv N \pmod b$ respectively.
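The formula in the exercise is easily compared with a direct count. The Python sketch below is an illustration and not part of the original text; the function names are our own. It counts the solutions of $ax + by = N$ by search and by the formula, finding $l$ and $m$ by trial over a complete residue system.

```python
from math import gcd


def count_by_search(a, b, N):
    """Number of pairs x >= 0, y >= 0 with a*x + b*y = N, by direct search."""
    return sum(1 for x in range(N // a + 1) if (N - a * x) % b == 0)


def count_by_formula(a, b, N):
    """The same count from (N - (b*l + a*m)) / (a*b) + 1, for coprime a, b."""
    if gcd(a, b) != 1:
        raise ValueError("a and b must be coprime")
    # l, m are the least non-negative solutions of b*l = N (mod a), a*m = N (mod b).
    l = next(t for t in range(a) if (b * t - N) % a == 0)
    m = next(t for t in range(b) if (a * t - N) % b == 0)
    return (N - (b * l + a * m)) // (a * b) + 1
```

The division in the last line is exact, since $N - bl - am$ is divisible by both $a$ and $b$.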


11.3 Quadratic Indeterminate Equations

We shall solve the equation

$$ax^2 + bxy + cy^2 + dx + ey + f = 0. \tag{1}$$

We write $D = b^2 - 4ac$. If $D = 0$, then we multiply (1) by $4a$ giving $(2ax + by)^2 + 4adx + 4aey + 4af = 0$, which is not a difficult equation to solve. Let $2ax + by = t$ so that

$$t^2 + 2(2ae - bd)y + 4af = -2dt,$$
$$(t + d)^2 = 2(bd - 2ae)y + d^2 - 4af.$$

The number $t$ can be obtained from the congruence $(t + d)^2 \equiv d^2 - 4af \pmod{2(bd - 2ae)}$, and so $x, y$ can be solved.

We now assume that $D \ne 0$. Multiplying (1) by $D^2$ we have

$$a(Dx)^2 + b(Dx)(Dy) + c(Dy)^2 + dD(Dx) + eD(Dy) + fD^2 = 0. \tag{2}$$

Substituting $Dx = x' + 2cd - be$, $Dy = y' + 2ae - bd$ into (2) we have

$$a(x' + 2cd - be)^2 + b(x' + 2cd - be)(y' + 2ae - bd) + c(y' + 2ae - bd)^2 + dD(x' + 2cd - be) + eD(y' + 2ae - bd) + fD^2 = 0,$$

or

$$ax'^2 + bx'y' + cy'^2 = k, \tag{3}$$

where

$$-k = a(2cd - be)^2 + b(2cd - be)(2ae - bd) + c(2ae - bd)^2 + dD(2cd - be) + eD(2ae - bd) + fD^2.$$

We see therefore that whether (1) is soluble depends on whether (3) has solutions satisfying

$$x' \equiv be - 2cd, \qquad y' \equiv bd - 2ae \pmod D.$$

Our first priority is therefore to solve (3).
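The reduction from (1) to (3) is purely mechanical, and a short program can confirm that the terms linear in $x'$, $y'$ do cancel. The Python sketch below is an illustration, not part of the original text; the names `reduce_conic` and `conic_value` are our own, and only the case $D \ne 0$ is treated.

```python
def conic_value(a, b, c, d, e, f, x, y):
    """Left hand side of a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0."""
    return a * x * x + b * x * y + c * y * y + d * x + e * y + f


def reduce_conic(a, b, c, d, e, f):
    """Return (D, shift_x, shift_y, k) where x' = D*x - shift_x and
    y' = D*y - shift_y turn the equation into a*x'^2 + b*x'*y' + c*y'^2 = k."""
    D = b * b - 4 * a * c
    if D == 0:
        raise ValueError("this sketch covers only the case D != 0")
    shift_x = 2 * c * d - b * e
    shift_y = 2 * a * e - b * d
    minus_k = (a * shift_x ** 2 + b * shift_x * shift_y + c * shift_y ** 2
               + d * D * shift_x + e * D * shift_y + f * D * D)
    return D, shift_x, shift_y, -minus_k
```

With these values one has $ax'^2 + bx'y' + cy'^2 - k = D^2(ax^2 + bxy + cy^2 + dx + ey + f)$ identically, so integer solutions of (1) correspond to solutions of (3) in the stated residue classes mod $D$.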

11.4 The Solutions to $ax^2 + bxy + cy^2 = k$

We shall solve

$$ax^2 + bxy + cy^2 = k. \tag{1}$$

Let $d = b^2 - 4ac$. We shall assume that $d$ is not a perfect square, and that $(a, b, c) = 1$. We need only find those solutions satisfying $(x, y) = 1$, and we call these the proper solutions.

Theorem 4.1. Let $x, y$ be a proper solution to (1). Then there are two uniquely determined integers $s$ and $r$ satisfying

$$xs - yr = 1, \tag{2}$$

and the integer

$$l = (2ax + by)r + (bx + 2cy)s$$

satisfies

$$l^2 \equiv d \pmod{4k}, \qquad 0 \le l < 2k.$$

Proof. Let $r_0, s_0$ be a solution to (2). Then the general solution to (2) is $r = r_0 + hx$, $s = s_0 + hy$ where $h$ is any integer. Thus

$$l = (2ax + by)r_0 + (bx + 2cy)s_0 + 2h(ax^2 + bxy + cy^2) = l_0 + 2hk,$$

so that we may choose a unique $h$ such that $0 \le l < 2k$. Finally we have

$$l^2 = [(2ax + by)r + (bx + 2cy)s]^2 = 4(ar^2 + brs + cs^2)(ax^2 + bxy + cy^2) + (b^2 - 4ac)(xs - yr)^2 \equiv d \pmod{4k}. \tag{3}$$

□
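The number $l$ attached to a proper solution can be computed directly from the proof of Theorem 4.1. The Python sketch below is an illustration, not part of the original text; the names `egcd` and `class_invariant` are our own. It solves $xs - yr = 1$ by the extended Euclidean algorithm and then reduces $l$ modulo $2|k|$.

```python
def egcd(a, b):
    """Return (g, u, v) with a*u + b*v = g, where |g| is the gcd of a and b."""
    old_r, r = a, b
    old_u, u = 1, 0
    old_v, v = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
        old_v, v = v, old_v - q * v
    return old_r, old_u, old_v


def class_invariant(a, b, c, x, y):
    """For a proper solution (x, y) of a*x^2 + b*x*y + c*y^2 = k, return (l, k)
    with 0 <= l < 2|k| as in Theorem 4.1."""
    k = a * x * x + b * x * y + c * y * y
    g, u, v = egcd(x, y)
    if abs(g) != 1:
        raise ValueError("(x, y) is not a proper solution")
    s, r = g * u, -g * v          # then x*s - y*r = 1
    l = (2 * a * x + b * y) * r + (b * x + 2 * c * y) * s
    return l % (2 * abs(k)), k
```

For the form $x^2 - 15y^2$ (so that $d = 60$) the two solutions $(11, 2)$ and $(14, 3)$ of $x^2 - 15y^2 = 61$ give different values of $l$, so by Theorem 4.2 neither can be obtained from the other through a solution of $t^2 - du^2 = 4$.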

Theorem 4.2. Let $(x_1, y_1)$ and $(x_2, y_2)$ be two proper solutions corresponding to the same number $l$ in the previous theorem. Then we have

$$2ax_1 + (b + \sqrt d)y_1 = \bigl(2ax_2 + (b + \sqrt d)y_2\bigr)\frac{t + u\sqrt d}{2}, \tag{4}$$

where $t$ and $u$ are integers satisfying

$$t^2 - du^2 = 4. \tag{5}$$

Conversely, if $(x_2, y_2)$ is a proper solution, then the numbers $x_1, y_1$ defined by (4) also give a proper solution and both solutions correspond to the same number $l$.

Proof. 1) We first show that

$$t = \frac{(2ax_1 + by_1)(2ax_2 + by_2) - dy_1y_2}{2ak}, \qquad u = -\frac{x_1y_2 - x_2y_1}{k} \tag{6}$$

are the suitable integers; that is we show that $t$ and $u$ are integers satisfying (5). From

$$\frac{t \pm u\sqrt d}{2} = \frac{(2ax_1 + by_1)(2ax_2 + by_2) - dy_1y_2 \mp 2a(x_1y_2 - x_2y_1)\sqrt d}{4ak} = \frac{(2ax_1 + by_1 \pm \sqrt d\,y_1)(2ax_2 + by_2 \mp \sqrt d\,y_2)}{(2ax_2 + by_2 + \sqrt d\,y_2)(2ax_2 + by_2 - \sqrt d\,y_2)}$$

we see that (4) follows. Next from

$$\frac{t^2 - du^2}{4} = \frac{t + \sqrt d\,u}{2}\cdot\frac{t - \sqrt d\,u}{2} = 1$$

we see that $t$ and $u$ satisfy (5). Also

$$2ax_1 + by_1 = (2ax_1 + by_1)(s_1x_1 - r_1y_1) = (2ax_1 + by_1)s_1x_1 - ly_1 + (bx_1 + 2cy_1)s_1y_1 \equiv -ly_1 \pmod{2k}. \tag{7}$$

Similarly we have

$$2ax_2 + by_2 \equiv -ly_2 \pmod{2k}.$$

Therefore

$$2a(x_1y_2 - x_2y_1) \equiv 0 \pmod{2k}, \qquad (b + l)(x_1y_2 - x_2y_1) \equiv 0 \pmod{2k}.$$

Similarly we have

$$2c(x_1y_2 - x_2y_1) \equiv 0 \pmod{2k}, \qquad (b - l)(x_1y_2 - x_2y_1) \equiv 0 \pmod{2k}.$$

But

$$(2a, b + l, b - l, 2c) = (2a, 2b, 2c, b + l) \le 2,$$

so that

$$x_1y_2 - x_2y_1 \equiv 0 \pmod k.$$

This shows that $u$ is an integer. Therefore $t^2$ is an integer, and since $t$ is rational, $t$ itself must be an integer.

2) Suppose that

$$2ax_1 + (b + \sqrt d)y_1 = \bigl(2ax_2 + (b + \sqrt d)y_2\bigr)\left(\frac{t + u\sqrt d}{2}\right),$$

and $t^2 - du^2 = 4$. Then

$$x_1 = \frac{t - bu}{2}x_2 - cuy_2, \qquad y_1 = aux_2 + \frac{t + bu}{2}y_2.$$

Let $r_1, s_1$ correspond to the solution $x_1, y_1$. Then

$$r_2 = \frac{t + bu}{2}r_1 + cus_1, \qquad s_2 = -aur_1 + \frac{t - bu}{2}s_1$$

correspond to the solution $x_2, y_2$, because $x_2s_2 - y_2r_2 = 1$.

Finally, let $l_1$ and $l_2$ correspond to $(x_1, y_1)$ and $(x_2, y_2)$ respectively. Then

$$l_1 = (2ar_1 + bs_1)x_1 + (br_1 + 2cs_1)y_1 = (2ar_1 + bs_1)\left(\frac{t - bu}{2}x_2 - cuy_2\right) + (br_1 + 2cs_1)\left(aux_2 + \frac{t + bu}{2}y_2\right)$$

$$= \left\{2a\left(r_1\frac{t + bu}{2} + s_1cu\right) + b\left(s_1\frac{t - bu}{2} - r_1au\right)\right\}x_2 + \left\{b\left(r_1\frac{t + bu}{2} + s_1cu\right) + 2c\left(s_1\frac{t - bu}{2} - r_1au\right)\right\}y_2$$

$$= (2ar_2 + bs_2)x_2 + (br_2 + 2cs_2)y_2 = l_2.$$

□

We shall now separate our discussion into two cases depending on the sign of $d$.

Theorem 4.3. Suppose that $d < 0$. Let

$$w = \begin{cases} 2, & \text{if } d < -4, \\ 4, & \text{if } d = -4, \\ 6, & \text{if } d = -3. \end{cases}$$

Then there are $w$ proper solutions to (1) that correspond to the same $l$.

Proof. From Theorem 4.2 we see that it suffices to show that the equation $t^2 - du^2 = 4$ has $w$ solutions. If $d < -4$, then clearly $t = \pm2$, $u = 0$ are the only solutions, so that $w = 2$. If $d = -4$, then $t^2 + 4u^2 = 4$ has the four solutions $t = \pm2$, $u = 0$ and $t = 0$, $u = \pm1$. Finally if $d = -3$, then $t^2 + 3u^2 = 4$ has the six solutions $t = \pm1$, $u = \pm1$; $t = \pm2$, $u = 0$. □

Theorem 4.4. Let $d > 0$. Then all the solutions to the equation $x^2 - dy^2 = 4$ can be obtained as follows: Let $x_0, y_0$ be a solution in which $x_0 + y_0\sqrt d$ is least $(x_0 > 0, y_0 > 0)$. Then all the solutions are given by

$$\frac{x + y\sqrt d}{2} = \pm\left(\frac{x_0 + y_0\sqrt d}{2}\right)^n, \qquad n = 0, \pm1, \pm2, \ldots.$$

Proof. Since the equation $x^2 - dy^2 = 1$ does possess a solution we see that $x_0, y_0$ exist. The rest of the proof is the same as that in Theorem 10.9.2. □

Let

$$\varepsilon = \frac{x_0 + y_0\sqrt d}{2}, \qquad \bar\varepsilon = \frac{x_0 - y_0\sqrt d}{2}.$$

Definition. Let $d > 0$. By a primary solution to (1) we mean a solution which satisfies

$$2ax + (b - \sqrt d)y > 0, \qquad 1 \le \left|\frac{2ax + (b + \sqrt d)y}{2ax + (b - \sqrt d)y}\right| < \varepsilon^2.$$

If we write $L = 2ax + (b + \sqrt d)y$, $\bar L = 2ax + (b - \sqrt d)y$, then the condition above becomes

$$\bar L > 0, \qquad 1 \le \left|\frac{L}{\bar L}\right| < \varepsilon^2.$$

Theorem 4.5. Let $d > 0$. If the equation (1) has proper primary solutions which correspond to the same $l$, then it has a unique proper primary solution.

Proof. From Theorem 4.2 we know that if $x_0, y_0$ is a proper primary solution to (1), then, on denoting by $L_0$ the associated number $L$, every proper solution of (1) corresponding to the same $l$ can be represented by $L = \pm L_0\varepsilon^n$. We have

$$\left|\frac{L}{\bar L}\right| = \left|\frac{L_0}{\bar L_0}\right|\varepsilon^{2n},$$

so that $1 \le |L/\bar L| < \varepsilon^2$ only when $n = 0$, and in this case $\bar L = \bar L_0 > 0$. □

When $d > 0$ we set $w = 1$. We can now generalize the definition of a primary solution: When $d > 0$, the definition is as given previously; when $d < 0$ any proper solution is also called a primary solution. Combining Theorems 4.3 and 4.5 we now have

Theorem 4.6. If, corresponding to the same $l$, the equation (1) has proper primary solutions, then there are $w$ proper primary solutions. □

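Theorem 4.5 suggests that in solving $ax^2 + bxy + cy^2 = k$ there is no need to search for integer points on the whole hyperbola. The primary solution occurs in a finite part of the hyperbola, and having obtained the primary solution we may use the formula $L = \pm L_0\varepsilon^n$ to find all the other solutions. That is, if $\varepsilon$ is known, all the solutions can be obtained in a finite number of steps. Specifically, from

$$L_0\bar L_0 = 4ak, \qquad \bar L_0 > 0, \qquad 1 \le \left|\frac{L_0}{\bar L_0}\right| < \varepsilon^2,$$

we see that $|L_0|^2 = |L_0/\bar L_0|\,|L_0\bar L_0| < 4\varepsilon^2|ak|$, or $|\bar L_0| \le |L_0| < 2\varepsilon\sqrt{|ak|}$, giving, since $2\sqrt d\,y = L_0 - \bar L_0$,

$$|y| \le 2\varepsilon\sqrt{|ak|/d}.$$

That is we need only find a solution which satisfies $0 < y \le 2\varepsilon\sqrt{|ak|/d}$ and the rest can be obtained from $L = \pm L_0\varepsilon^n$. When $a > 0$, $k > 0$ we deduce from $\bar L > 0$ and $L\bar L > 0$ that $L > 0$, and whence $\bar L < L$ so that $0 < y < \varepsilon\sqrt{ak/d}$.

11.5 Method of Solution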

If $k > \sqrt d$, then we can still reduce it to the case when $k < \sqrt d$. Suppose that $x, y$ is a proper solution to (1). Then there are $x_1, y_1$ such that

$$xy_1 - yx_1 = 1. \tag{2}$$

Multiplying (1) by $x_1^2 - dy_1^2$ we have

$$(xx_1 - dyy_1)^2 - d(xy_1 - yx_1)^2 = \delta k(x_1^2 - dy_1^2),$$

or

$$(xx_1 - dyy_1)^2 - d = \delta k(x_1^2 - dy_1^2).$$

Let $x_0, y_0$ be a solution to (2). Then all the solutions to (2) are given by $x_1 = x_0 + tx$, $y_1 = y_0 + ty$ so that $xx_1 - dyy_1 = xx_0 - dyy_0 + (x^2 - dy^2)t = xx_0 - dyy_0 + \delta tk$. We may therefore choose $t$ so that

$$|xx_1 - dyy_1| \le \frac k2.$$

Let $|xx_1 - dyy_1| = l$. Then

$$x_1^2 - dy_1^2 = \frac{l^2 - d}{\delta k} = \eta h, \qquad \eta = \pm1, \quad h > 0.$$

Therefore $h = |l^2 - d|/k < k$, since $l^2 \le k^2/4$ and $d < k^2$.

From this we see that from a solution to (1) we arrived at a similar equation with a number $k$ which is smaller. If this number is still greater than $\sqrt d$ we can repeat the argument. This suggests the following procedure. We first solve for all those $l$ satisfying $l^2 \equiv d \pmod k$, $0 \le l \le k/2$, and we let them be $l_1, l_2, \ldots, l_t$. Set $(l_i^2 - d)/\delta k = \eta_ih_i$, $\eta_i = \pm1$, $h_i > 0$ and solve the system $x_i^2 - dy_i^2 = \eta_ih_i$ $(1 \le i \le t)$. Suppose that $h_i < \sqrt d$. Then we use the method of continued fractions. Let $x_i, y_i$ be a solution. Then

$$x = \frac{-\delta dy_i \pm l_ix_i}{\eta_ih_i}, \qquad y = \frac{-\delta x_i \pm l_iy_i}{\eta_ih_i} \tag{3}$$

is a solution to (1). This is because from

$$\eta_ih_i(x + \sqrt d\,y) = (x_i + \sqrt d\,y_i)(-\delta\sqrt d \pm l_i)$$

we have $x^2 - dy^2 = \delta k$ at once. Further, if $x, y$ in (3) are integers, then they are solutions to (1).

If $h_i > \sqrt d$, then we proceed to obtain a specific solution to $x_i^2 - dy_i^2 = \eta_ih_i$. Then all the solutions to (1) can be obtained. We illustrate this with an example.

Example. We wish to solve

$$x^2 - 15y^2 = 61. \tag{4}$$

We first solve $l^2 \equiv 15 \pmod{61}$, $0 \le l \le 61/2$. This means solving $l^2 = 15 + 61h$, $l^2 \le 900$, or finding $h$ so that $15 + 61h$ is a square. Letting $h$ run over $0 \le h \le [900/61] = 14$ we see that there is only one suitable $h$, namely $h = 10$, $l = 25$. We now have to solve

$$x_1^2 - 15y_1^2 = 10. \tag{5}$$

Observing that $10 > \sqrt{15}$ we now consider $l^2 = 15 + 10h$, $l \le 10/2 = 5$. This gives $h = 1$, $l = 5$ so that we have to solve

$$x_2^2 - 15y_2^2 = 1. \tag{6}$$

From the method of continued fractions, the solutions to (6) are given by $x_2 + \sqrt{15}\,y_2 = \pm(4 + \sqrt{15})^n$. Therefore $x_1 + \sqrt{15}\,y_1 = \pm(4 + \sqrt{15})^n(5 \pm \sqrt{15})$ and so

$$x + \sqrt{15}\,y = \pm(4 + \sqrt{15})^n(5 \pm \sqrt{15})(25 \pm \sqrt{15})/10.$$

Here the three signs $\pm$ are independent so that either

$$x + \sqrt{15}\,y = \pm(4 + \sqrt{15})^n(14 \pm 3\sqrt{15})$$

or

$$x + \sqrt{15}\,y = \pm(4 + \sqrt{15})^n(11 \pm 2\sqrt{15}).$$

Alternatively we can use the inequality at the end of §4, that is $0 < y < \varepsilon\sqrt{ak/d}$. For this example we have $0 < y \le 7$ and we can construct the following table:

y             1     2     3     4     5     6     7
15(2y - 1)   15    45    75   105   135   165   195
15y^2        15    60   135   240   375   540   735
15y^2 + 61   76   121   196   301   436   601   796

Observe that in the second row of this table each term increases by 30, and in the third row the i-th term is the sum of the (i - 1)-th term and the i-th term of the second row.
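The table method amounts to testing whether $15y^2 + 61$ is a perfect square for the finitely many admissible $y$, after which further solutions arise on multiplying by $4 + \sqrt{15}$. The Python sketch below is an illustration, not part of the original text; the function names are our own, and the bound `y_max` has to be supplied from the inequality quoted above.

```python
from math import isqrt


def small_solutions(d, k, y_max):
    """All (x, y) with x > 0, 1 <= y <= y_max and x^2 - d*y^2 = k."""
    found = []
    for y in range(1, y_max + 1):
        value = d * y * y + k
        x = isqrt(value)
        if x * x == value:
            found.append((x, y))
    return found


def next_solution(x, y, d, t, u):
    """Multiply x + y*sqrt(d) by t + u*sqrt(d), where t^2 - d*u^2 = 1."""
    return t * x + d * u * y, u * x + t * y
```

Here `small_solutions(15, 61, 7)` picks out the two squares $121 = 11^2$ and $196 = 14^2$ in the last row of the table, and `next_solution(11, 2, 15, 4, 1)` gives the solution $(74, 19)$.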

Exercise 1. Solve the following indeterminate equations.

(a) $3x^2 - 8xy + 7y^2 - 4x + 2y = 109$,

(b) $3xy + 2y^2 - 4x - 3y = 12$,

(c) $9x^2 - 12xy + 4y^2 + 3x + 2y = 12$,

(d) $x^2 - 8xy - 17y^2 + 72y - 75 = 0$.

Exercise 2. Let $k < \sqrt d$. Show that the solutions to $ax^2 + bxy + cy^2 = k$ can be obtained from the continued fraction expansions of the roots of the equation $ax^2 + bx + c = 0$. Try and generalize the results in this section.

11.6 Generalization of Soon Go's Theorem

Let us consider the equation $x^2 + y^2 = z^2$. If $(x, y) = d > 1$, then $d$ also divides $z$. We may therefore assume that $(x, y) = 1$, and we need only consider positive solutions. Next, if $x, y$ are both odd, then $x^2 + y^2 \equiv 2 \pmod 4$, so that $z^2$ is divisible by 2 but not by 4; since this is impossible we see that $x$ and $y$ must be of opposite parity. We shall assume that $x$ is even.

Theorem 6.1. The solutions of the equation $x^2 + y^2 = z^2$ satisfying $x > 0$, $y > 0$, $z > 0$, $(x, y) = 1$, $2 \mid x$ are given by

$$x = 2ab, \qquad y = a^2 - b^2, \qquad z = a^2 + b^2,$$

where $a, b$ are coprime integers of opposite parity satisfying $a > b > 0$. There is a one to one correspondence between $(x, y, z)$ and $(a, b)$. □
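Theorem 6.1 gives at once a way of listing the primitive solutions. The Python sketch below is an illustration, not part of the original text; the function name `primitive_triples` and the cut-off `max_a` are our own.

```python
from math import gcd


def primitive_triples(max_a):
    """Triples (x, y, z) = (2ab, a^2 - b^2, a^2 + b^2) for coprime a > b > 0
    of opposite parity with a <= max_a."""
    triples = []
    for a in range(2, max_a + 1):
        for b in range(1, a):
            if gcd(a, b) == 1 and (a - b) % 2 == 1:
                triples.append((2 * a * b, a * a - b * b, a * a + b * b))
    return triples
```

With `max_a = 4` the list consists of $(4, 3, 5)$, $(12, 5, 13)$, $(8, 15, 17)$ and $(24, 7, 25)$.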

On putting $\xi = x/z$, $\eta = y/z$ the equation $x^2 + y^2 = z^2$ becomes $\xi^2 + \eta^2 = 1$ and we deduce from Theorem 6.1 that the unit circle $\xi^2 + \eta^2 = 1$ has infinitely many rational points given by

$$\xi = \frac{2ab}{a^2 + b^2}, \qquad \eta = \frac{a^2 - b^2}{a^2 + b^2}.$$

We generalize the problem and ask if every second degree conic possesses infinitely many rational points. The answer is no; for example the hyperbola $\xi^2 - 3\eta^2 = 2$ has no rational points. For if we put $\xi = x/z$, $\eta = y/z$, $(x, y, z) = 1$ then we have $x^2 - 3y^2 = 2z^2$, so that $x^2 \equiv 2z^2 \pmod 3$, which implies $3 \mid x$ and $3 \mid z$, and whence $3 \mid y$, contradicting $(x, y, z) = 1$. However, we do have the following:

Theorem 6.2. Let a second degree conic, not a pair of straight lines, have rational coefficients. If the conic has one rational point, then it has infinitely many rational points.

Proof. We may assume that the conic passes through the origin; otherwise we can translate the origin to the rational point concerned. The conic can be written as $S_2(\xi, \eta) + S_1(\xi, \eta) = 0$, where $S_i(\xi, \eta)$ is homogeneous in $\xi$ and $\eta$ with degree $i$. If $S_1(\xi, \eta) \equiv 0$, then the original conic is a pair of straight lines, and if $S_2(\xi, \eta) \equiv 0$, then the original conic is a straight line. Therefore $S_1(\xi, \eta)$ and $S_2(\xi, \eta)$ are not identically zero. Now put $\eta = \zeta\xi$ so that $\xi S_2(1, \zeta) + S_1(1, \zeta) = 0$ giving

$$\xi = -\frac{S_1(1, \zeta)}{S_2(1, \zeta)}, \qquad \eta = -\frac{\zeta S_1(1, \zeta)}{S_2(1, \zeta)}.$$

There are therefore infinitely many rational points. □
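The proof of Theorem 6.2 is constructive: each rational slope $\zeta$ with $S_2(1, \zeta) \ne 0$ gives a rational point. The Python sketch below is an illustration, not part of the original text; the function name `rational_points` is our own, and the conic is assumed to be given already in the form $A\xi^2 + B\xi\eta + C\eta^2 + D\xi + E\eta = 0$ passing through the origin.

```python
from fractions import Fraction


def rational_points(A, B, C, D, E, slopes):
    """Rational points on A*xi^2 + B*xi*eta + C*eta^2 + D*xi + E*eta = 0
    obtained by intersecting with the lines eta = zeta * xi."""
    points = []
    for zeta in slopes:
        zeta = Fraction(zeta)
        s2 = A + B * zeta + C * zeta * zeta   # S_2(1, zeta)
        s1 = D + E * zeta                     # S_1(1, zeta)
        if s2 == 0:
            continue                          # the line meets the conic only at the origin
        xi = -s1 / s2
        points.append((xi, zeta * xi))
    return points
```

For the circle $\xi^2 + \eta^2 - 2\xi = 0$, which is the unit circle translated so as to pass through the origin, the slope $\zeta$ gives the point $\bigl(2/(1 + \zeta^2),\, 2\zeta/(1 + \zeta^2)\bigr)$.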

Theorem 6.3. Let $A, B, C$ be rational numbers, not all zero. Suppose that $B^2 - 4AC$ is a square. Then the conic

$$A\xi^2 + B\xi\eta + C\eta^2 + D\xi + E\eta + F = 0 \tag{1}$$

has infinitely many rational points. In other words, if the asymptotes of a hyperbola have rational points, then the hyperbola has infinitely many rational points; a parabola has infinitely many rational points.

Proof. Write $L^2 = B^2 - 4AC$, so that

$$A\xi^2 + B\xi\eta + C\eta^2 = A\left(\left(\xi + \frac{B}{2A}\eta\right)^2 + \left(\frac CA - \frac{B^2}{4A^2}\right)\eta^2\right) = A\left(\xi + \frac{B}{2A}\eta - \frac{L}{2A}\eta\right)\left(\xi + \frac{B}{2A}\eta + \frac{L}{2A}\eta\right).$$

If $L \ne 0$ we set

$$\xi' = \xi + \frac{B + L}{2A}\eta, \qquad \eta' = \xi - \frac{-B + L}{2A}\eta,$$

and solving for $\xi$ and $\eta$ and substituting into (1) we have

$$A\xi'\eta' + D'\xi' + E'\eta' + F' = 0,$$

which gives

$$\xi' = -\frac{E'\eta' + F'}{A\eta' + D'}.$$

Therefore (1) has infinitely many rational points. If $L = 0$ we set $\xi' = \xi + B\eta/2A$, $\eta' = -\eta$ giving $A\xi'^2 + D'\xi' + E'\eta' + F' = 0$. If $E' \ne 0$, then $\eta' = -(A\xi'^2 + D'\xi' + F')/E'$ so that there are infinitely many rational points. If $E' = 0$, then the original curve is not a second degree conic. □

Note: Theorems 6.2 and 6.3 raise the following problem. Let

$$f(x_1, x_2, \ldots, x_n) = 0 \tag{2}$$

be a homogeneous second degree equation in $x_1, x_2, \ldots, x_n$ with integer coefficients, not factorizable into a product of linear terms. We ask if there are infinitely many lattice points satisfying (2). We see from Theorem 6.2 that if $n \ge 3$ and if (2) has a non-zero lattice point, then there are infinitely many lattice points. But when does it have a lattice point? For example: $x_1^2 + x_2^2 + \cdots + x_n^2 = 0$ certainly has no non-zero lattice point. We therefore have to assume that $f(\xi_1, \ldots, \xi_n) = 0$ has other real solutions. It can be proved that, under this assumption, and for $n \ge 5$, the equation (2) has integer solutions, and indeed infinitely many solutions (this is Meyer's theorem). The result does not hold when $n = 4$. For if $x_1^2 + x_2^2 + x_3^2 - 7x_4^2 = 0$, then we may assume that $(x_1, x_2, x_3, x_4) = 1$. Now from $x_1^2 + x_2^2 + x_3^2 + x_4^2 \equiv 0 \pmod 8$, and $x^2 \equiv 0, 1, 4 \pmod 8$ we can deduce that $2 \mid (x_1, x_2, x_3, x_4)$ which is a contradiction.

11.7 Fermat's Conjecture

Fermat claimed that when $n \ge 3$ the equation $x^n + y^n = z^n$ has no positive integer solutions in $x, y, z$. This has been proved for $2 < n < 125000$, and even this modest amount of result involves some pioneering work by mathematicians. In order to prove Fermat's claim it suffices to establish the case when $n = 4$ and when $n$ is an odd prime. For if $n$ has an odd prime divisor $p$, then

$$(x^{n/p})^p + (y^{n/p})^p = (z^{n/p})^p,$$

and if $n$ has no odd prime divisors, then $n = 2^k$ $(k \ge 2)$ and

$$(x^{2^{k-2}})^4 + (y^{2^{k-2}})^4 = (z^{2^{k-2}})^4.$$

The case $n = 4$ can be settled using Fermat's method of infinite descent. In fact we have

Theorem 7.1. The equation $x^4 + y^4 = z^4$ has no positive integer solutions. □

11.8 Markoff's Equation

We introduced in §10.5 Markoff's equation

$$x^2 + y^2 + z^2 = 3xyz \tag{1}$$

and we stated the relationship between Markoff numbers and continued fractions. We shall now study this equation.

Theorem 8.1. Let $x_0, y_0, z_0$ be a solution to (1). Then so is $x_0, y_0, 3x_0y_0 - z_0$.

Proof.

$$x_0^2 + y_0^2 + (3x_0y_0 - z_0)^2 = x_0^2 + y_0^2 + z_0^2 - 6x_0y_0z_0 + 9x_0^2y_0^2 = -3x_0y_0z_0 + 9x_0^2y_0^2 = 3x_0y_0(3x_0y_0 - z_0).$$ □

Theorem 8.2. Every solution of (1) can be generated from Theorem 8.1 with $x = y = z = 1$ as an initial solution.

Proof. 1) If $x = y = z$, then clearly $x = y = z = 1$.

2) If $x = y \ne z$, then $2x^2 + z^2 = 3x^2z$. Hence $x^2 \mid z^2$ or $x \mid z$. Let $z = wx$ so that $2 + w^2 = 3wx$ $(w > 0)$ and hence $w \mid 2$, giving $w = 1$ or 2. But $x \ne z$ so that $w = 2$ giving $x = 1$, $y = 1$, $z = 2$ and this is a solution generated by $(1, 1, 1)$ from Theorem 8.1.

3) We can now assume that $x < y < z$. If we can establish that $3xy - z < z$, then we can reduce the value of $x + y + z$, so that after a finite number of successive steps $x, y, z$ cannot be all different which means that we have reduced the present case to 1) or 2). This is what we shall prove. From $z^2 - 3xyz + x^2 + y^2 = 0$ we have

$$2z = 3xy \pm \sqrt{9x^2y^2 - 4(x^2 + y^2)}.$$

If $2z = 3xy - \sqrt{9x^2y^2 - 4(x^2 + y^2)}$, then from $9x^2y^2 - 4(x^2 + y^2) > x^2y^2$ we see that

$$2z < 3xy - xy = 2xy,$$

or $z < xy$. But $3xyz = x^2 + y^2 + z^2 < 3z^2$, so that $xy < z$ giving a contradiction. Therefore

$$2z = 3xy + \sqrt{9x^2y^2 - 4(x^2 + y^2)} > 3xy,$$

as required. □

Example. Starting with $(1, 1, 1)$ we have $(1, 1, 2)$ and then $(1, 2, 5)$; $(1, 5, 13)$; $(2, 5, 29)$. Continuing we have the following table for $x \le y \le z < 1000$.

z    1    2    5   13   29   34   89  169  194  233  433  610  985
y    1    1    2    5    5   13   34   29   13   89   29  233  169
x    1    1    1    1    2    1    1    2    5    1    5    1    2

Note: Observe that this is also a method of descent. Fortunately there is no more descent after $x = y = z = 1$. We see therefore that Fermat's method of infinite descent can be used either to prove that there is no solution, or to prove that there are infinitely many solutions.
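Theorems 8.1 and 8.2 together describe an algorithm: starting from $(1, 1, 1)$, replace one of the three numbers by its companion root, and repeat. The Python sketch below is an illustration, not part of the original text; the function name `markoff_triples` and the bound `limit` are our own.

```python
def markoff_triples(limit):
    """All solutions x <= y <= z < limit of x^2 + y^2 + z^2 = 3xyz that can be
    reached from (1, 1, 1) by the replacement of Theorem 8.1."""
    seen = set()
    stack = [(1, 1, 1)]
    while stack:
        x, y, z = sorted(stack.pop())
        if z >= limit or (x, y, z) in seen:
            continue
        seen.add((x, y, z))
        # Replace each of the three numbers in turn by its companion root.
        stack.append((x, y, 3 * x * y - z))
        stack.append((x, 3 * x * z - y, z))
        stack.append((3 * y * z - x, y, z))
    return sorted(seen, key=lambda t: (t[2], t[1], t[0]))
```

With `limit = 1000` this reproduces the thirteen columns of the table above. Cutting off at `limit` loses nothing, because every step away from $(1, 1, 1)$ increases the largest member of the triple.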

Exercise 1. Generalize the discussion here to the equation $x_1^2 + x_2^2 + \cdots + x_n^2 = nx_1x_2\cdots x_n$.

Exercise 3. Show that the equation $2x^4 - y^4 = z^4$ has infinitely many solutions.

11.9 The Equation $x^3 + y^3 + z^3 + w^3 = 0$

The number 1729 is the smallest positive integer representable as the sum of two cubes in two different ways. That is $1729 = 10^3 + 9^3 = 12^3 + 1^3$. There are other numbers having this property, for example: $2^3 + 34^3 = 15^3 + 33^3$, $9^3 + 15^3 = 2^3 + 16^3$. In fact we even have

$$70^3 + 560^3 = 198^3 + 552^3 = 315^3 + 525^3,$$
$$121170^3 + 969360^3 = 545275^3 + 908775^3 = 342738^3 + 955512^3 = 336455^3 + 956305^3,$$

and $3^3 + 4^3 + 5^3 = 6^3$, $1^3 + 6^3 + 8^3 = 9^3$. The solutions to the equation $x^3 + y^3 + z^3 + w^3 = 0$ present a very interesting problem. Unfortunately we still have not obtained a formula for all the solutions. The Euler-Binet formula below provides all the rational solutions.

Theorem 9.1. The rational solutions to the equation

$$W^3 + 3W(X^2 + Y^2 + Z^2) + 6XYZ = 0$$

are given by

$$W = -6\rho abc, \qquad X = \rho a(a^2 + 3b^2 + 3c^2),$$
$$Y = \rho b(a^2 + 3b^2 + 9c^2), \qquad Z = 3\rho c(a^2 + b^2 + 3c^2).$$

Here $(a, b, c) = 1$, and $\rho$ is a rational number.

Proof. We rewrite the given equation as

$$\begin{vmatrix} W & 3Z & -3Y \\ -Z & W & 3X \\ Y & -X & W \end{vmatrix} = 0,$$

so that there must be integers $a, b, c$ not all 0 and $(a, b, c) = 1$ such that

$$Wa + 3Zb - 3Yc = 0,$$
$$-Za + Wb + 3Xc = 0,$$
$$Ya - Xb + Wc = 0.$$

Solving these for $X, Y, Z, W$, the required result follows. □
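The formulae of Theorem 9.1 can be checked by substitution. The Python sketch below is an illustration, not part of the original text; the names `euler_binet` and `cubic_form` are our own, and the parameter `rho` may be an integer or a `Fraction`.

```python
from fractions import Fraction


def euler_binet(a, b, c, rho=1):
    """The values (W, X, Y, Z) of Theorem 9.1 for parameters a, b, c and rho."""
    W = -6 * rho * a * b * c
    X = rho * a * (a * a + 3 * b * b + 3 * c * c)
    Y = rho * b * (a * a + 3 * b * b + 9 * c * c)
    Z = 3 * rho * c * (a * a + b * b + 3 * c * c)
    return W, X, Y, Z


def cubic_form(W, X, Y, Z):
    """W^3 + 3W(X^2 + Y^2 + Z^2) + 6XYZ, which should vanish on these values."""
    return W ** 3 + 3 * W * (X * X + Y * Y + Z * Z) + 6 * X * Y * Z


if __name__ == "__main__":
    print(euler_binet(1, 1, 1), cubic_form(*euler_binet(1, 1, 1)))
    print(cubic_form(*euler_binet(2, -1, 3, Fraction(1, 5))))
```

For $a = b = c = 1$, $\rho = 1$ the values are $W = -6$, $X = 7$, $Y = 13$, $Z = 15$, and indeed $-216 - 7974 + 8190 = 0$.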

Let

+ f3 + y + Yfr) such that

Yfd~l

very small, there are pairs of numbers

(~1>

Yf1),

and the ratios

~,4~2 , ... ,4r are nearly equal. Therefore follows. 0

1

£'_

Yf1

Yf2

Yfr

~slYfs

are distinct ratios and the required result

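Exercise 1. Show that the rational solutions to $\alpha^3 + \beta^3 + \gamma^3 + \delta^3 = 0$ can be obtained from

$$\alpha = \sigma\bigl(-(\xi - 3\eta)(\xi^2 + 3\eta^2) + 1\bigr), \qquad \beta = \sigma\bigl((\xi + 3\eta)(\xi^2 + 3\eta^2) - 1\bigr),$$
$$\gamma = \sigma\bigl((\xi^2 + 3\eta^2)^2 - (\xi + 3\eta)\bigr), \qquad \delta = -\sigma\bigl((\xi^2 + 3\eta^2)^2 - (\xi - 3\eta)\bigr),$$

where $\sigma$, $\xi$, $\eta$ are rational numbers.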

then the linear substitution ~' = (;( 1 ~ + PI '1/2, '1' = '1, " = ,will make the left hand side of (14) linear in '1' and so the theorem is proved. D Theorem 10.4. If a non-degenerate cubic surface has a rational point, then it has infinitely many rational points.

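Proof. We may assume that the surface passes through the origin so that it can be written as

$$S_3(\xi, \eta, \zeta) + S_2(\xi, \eta, \zeta) + S_1(\xi, \eta, \zeta) = 0, \qquad (16)$$

where $S_i(\xi, \eta, \zeta)$ are homogeneous in $\xi, \eta, \zeta$ with degree $i$.

1) If $S_1(\xi, \eta, \zeta) \equiv 0$, then $S_3(\xi, \eta, \zeta) + S_2(\xi, \eta, \zeta) = 0$, so that, on putting $\xi = \alpha\zeta$, $\eta = \beta\zeta$,

$$\zeta S_3(\alpha, \beta, 1) + S_2(\alpha, \beta, 1) = 0,$$

giving $\zeta = -S_2(\alpha, \beta, 1)/S_3(\alpha, \beta, 1)$. Observe that if $S_3(\alpha, \beta, 1) \equiv 0$, then the original surface is not a cubic, and if $S_2(\alpha, \beta, 1) \equiv 0$, then the cubic surface is a degenerate one.

2) If $S_1(\xi, \eta, \zeta) \not\equiv 0$, then under the transformation $S_1(\xi, \eta, \zeta) \to \xi$, we have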

If S3(~' '1, ') and

S2(~' '1, 0)

are not both identically zero, then we let, = 0 giving


Y

=

If S2(~' '1, 0) == 0, then S2(~' '1, 0 = (Ll(~' '1, O· We let Z = 1/(, X = ~/(, '11( so that S3(X, Y, 1)

+ ZL 1 (X,

Y, 1)

+ Z2

=

°

which gives

and this is included in Theorem 10.2 so that the required result follows. If S3(~' '1, 0) == 0, we let S3(~' '1, 0 = (T2(~' '1, 0, and this reduces to Theorem 10.3. The theorem is proved. D

Notes

11.1. The problem of the existence of solutions to the famous equation

$$x^2 = y^n + 1$$

has been settled by K. Chao [16]. He proved that, apart from $n = 3$, $x = \pm 3$, $y = 2$, there are no integer solutions.

Chapter 12. Binary Quadratic Forms

12.1 The Partitioning of Binary Quadratic Forms into Classes

Definition. For fixed integers $a, b, c$ the homogeneous quadratic polynomial

$$F = F(x, y) = ax^2 + bxy + cy^2$$

is called a binary quadratic form, or simply a form, and is denoted by $\{a, b, c\}$. The integer $d = b^2 - 4ac$ is called the discriminant of the form. It is easy to see that $d \equiv 0$ or $1 \pmod 4$.

Theorem 1.1. A necessary and sufficient condition for $F$ to be factorized into a product of two linear forms with integer coefficients is that $d$ is a perfect square.

Proof. 1) Let $d$ be a perfect square, and $a \ne 0$. Then the equation

$$ax^2 + bx + c = a\left\{\left(x + \frac{b}{2a}\right)^2 - \frac{d}{4a^2}\right\} = 0$$

has rational roots, and therefore, by Theorem 1.13.2, the form can be factorized into a product of two linear forms with integer coefficients. If $a = 0$, then clearly $F(x, y) = (bx + cy)y$.

2) If $ax^2 + bxy + cy^2 = (rx + sy)(tx + uy)$, then

$$d = b^2 - 4ac = (st + ru)^2 - 4rt \cdot su = (st - ru)^2. \qquad \square$$
We shall assume from now on that $d$ is not a perfect square. If $d < 0$, $a > 0$, then

$$4aF(x, y) = (2ax + by)^2 - dy^2,$$

and so $F(x, y) \ge 0$ for all $x, y$, and $F(x, y) = 0$ if and only if $x = y = 0$. We call such a form a positive definite form. If $d < 0$, $a < 0$, then $F \le 0$ for all $x, y$, and we call the form a negative definite form. Since a negative definite form becomes a positive definite form on multiplication by $-1$, we shall only deal with positive definite forms which we shall simply call definite forms.

If $d > 0$, then

$$F(1, 0) = a, \qquad F(b, -2a) = ab^2 - b \cdot b \cdot 2a + c \cdot 4a^2 = -da.$$

If $a \ne 0$, then the two values here have different signs. If $c \ne 0$ we can similarly choose two values which have different signs. If $a = c = 0$, then

$$F(1, 1) = b, \qquad F(1, -1) = -b$$

again have different signs. Thus when $d > 0$, the form $F(x, y)$ can take both positive and negative values, and we therefore call such a form an indefinite form.

Definition. Let the integer coefficient substitution

$$x = rX + sY, \qquad y = tX + uY, \qquad (ru - st = 1)$$

transform $F(x, y)$ into $G(X, Y)$; we say that $F$ is transformed into $G$ via $\begin{pmatrix} r & s \\ t & u \end{pmatrix}$. The two forms $F$ and $G$ are then said to be equivalent, and we write $F \sim G$ to denote this.

More specifically, let $F = \{a, b, c\}$ and $G = \{a_1, b_1, c_1\}$. Then we have

$$a_1 = ar^2 + brt + ct^2, \qquad (1)$$

$$b_1 = 2ars + b(ru + st) + 2ctu = 2ars + b(1 + 2st) + 2ctu, \qquad (2)$$

$$c_1 = as^2 + bsu + cu^2, \qquad (3)$$

and we derive at once

$$b_1^2 - 4a_1c_1 = (2ars + b(ru + st) + 2ctu)^2 - 4(ar^2 + brt + ct^2)(as^2 + bsu + cu^2) = (b^2 - 4ac)(ru - st)^2 = b^2 - 4ac = d.$$

We see therefore that equivalent forms have the same discriminant. Also, if $d < 0$, $a > 0$, then $a_1 = F(r, t) \ge 0$. Since $a_1 = 0$ implies $r = t = 0$, which is impossible, we see that $a_1 > 0$. In other words forms which are equivalent to a positive definite form are themselves positive definite.
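The transformation formulas (1) to (3) and the invariance of the discriminant can be illustrated with a short sketch. The function names `transform`, `discriminant` and `evaluate` are chosen here for illustration only; forms are represented as integer triples `(a, b, c)`.

```python
def discriminant(form):
    """d = b^2 - 4ac for the form {a, b, c}."""
    a, b, c = form
    return b * b - 4 * a * c


def evaluate(form, x, y):
    """F(x, y) = a x^2 + b x y + c y^2."""
    a, b, c = form
    return a * x * x + b * x * y + c * y * y


def transform(form, r, s, t, u):
    """Apply x = rX + sY, y = tX + uY with ru - st = 1; return {a1, b1, c1}."""
    if r * u - s * t != 1:
        raise ValueError("the substitution must satisfy ru - st = 1")
    a, b, c = form
    a1 = a * r * r + b * r * t + c * t * t              # formula (1)
    b1 = 2 * a * r * s + b * (r * u + s * t) + 2 * c * t * u  # formula (2)
    c1 = a * s * s + b * s * u + c * u * u              # formula (3)
    return (a1, b1, c1)


if __name__ == "__main__":
    F = (2, 1, 2)
    G = transform(F, 3, 2, 1, 1)
    print(G, discriminant(F), discriminant(G))
```

For $F = \{2, 1, 2\}$ and the substitution with $r = 3$, $s = 2$, $t = 1$, $u = 1$ this produces $G = \{23, 33, 12\}$, and both forms have discriminant $-15$.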

Theorem 1.2. (i) $F \sim F$ (reflexive). (ii) If $F \sim G$, then $G \sim F$ (symmetric). (iii) If $F \sim G$, $G \sim H$, then $F \sim H$ (transitive). □

We omit the simple proof for this theorem. The relation of being equivalent partitions the set of forms with discriminant $d$ into classes, so that all the forms in one class are equivalent among themselves, and two forms from two different classes are not equivalent. It is clear that forms from the same class represent identical sets of integers. For if $k = G(X, Y)$, then $k = F(rX + sY, tX + uY)$.

12.2 The Finiteness of the Number of Classes

Theorem 2.1. In every class of forms there is always one which satisfies the condition

$$|b| \le |a| \le |c|.$$

Proof. Let $a$ be an integer with the least absolute value from the set of non-zero integers representable by forms in the class concerned. Let $\{a_0, b_0, c_0\}$ be any form in the class. Then there exist $r, t$ such that

$$a = a_0r^2 + b_0rt + c_0t^2$$

and $(r, t) = 1$, since otherwise $a/(r, t)^2$ is also representable by $\{a_0, b_0, c_0\}$, and $|a|/(r, t)^2 < |a|$, which is impossible. We can fix $s$ and $u$ so that $ru - st = 1$. Then $\{a_0, b_0, c_0\}$ is transformed into $\{a, b', c'\}$ via $\begin{pmatrix} r & s \\ t & u \end{pmatrix}$. Now the transformation $\begin{pmatrix} 1 & h \\ 0 & 1 \end{pmatrix}$ transforms $\{a, b', c'\}$ into $\{a, b, c\}$ where $b = 2ah + b'$. We can choose $h$ so that $|b| \le |a|$. Since $c$ is representable by $\{a, b, c\}$, and this form also belongs to the class containing $\{a_0, b_0, c_0\}$, it follows that $|c| \ge |a|$. (Note that $c \ne 0$, because $c = 0$ implies that $d$ is a perfect square.) □

Theorem 2.2. The number of classes is finite.

Proof. 1) $d > 0$ (indefinite). From Theorem 2.1 we have

$$|ac| \ge b^2 = d + 4ac > 4ac,$$

so that $ac < 0$. Also

$$4a^2 \le 4|ac| = -4ac = d - b^2 \le d,$$

so that

$$|a| \le \frac{\sqrt{d}}{2},$$

and hence, by Theorem 2.1,

$$|b| \le \frac{\sqrt{d}}{2}.$$

There are therefore only finitely many possible values for $a$ and $b$. Since $c = (b^2 - d)/4a$, the required result follows.

2) $d < 0$ (definite). Assuming that $a > 0$ we have, from Theorem 2.1,

so that

o a. Then t must be zero, since otherwise from ct 2 > at 2 and (4) we deduce that a > a which is impossible. Therefore t = 0, ru = 1. Now from (3), we have b' = 2ars

+ b == b

(mod 2a).

Since - a a' (= a). It remains to consider the case a = a' = c = c'. Here we must have b = ± b', and from b ~ and b' ~ we arrive at b = b'. D

°

°

Note. The case of the indefinite forms is not this easy.

Definition. We call a form which satisfies (1) a reduced form.

Exercise 1. Verify the following table of all the reduced forms for $0 < -d \le 20$.

    d      reduced forms {a, b, c}
    -3     {1, 1, 1}
    -4     {1, 0, 1}
    -7     {1, 1, 2}
    -8     {1, 0, 2}
    -11    {1, 1, 3}
    -12    {1, 0, 3}, {2, 2, 2}
    -15    {1, 1, 4}, {2, 1, 2}
    -16    {1, 0, 4}, {2, 0, 2}
    -19    {1, 1, 5}
    -20    {1, 0, 5}, {2, 2, 3}

Exercise 2. Prove that when $d = -48$ there are four reduced forms:

$$\{1, 0, 12\}, \quad \{2, 0, 6\}, \quad \{3, 0, 4\}, \quad \{4, 4, 4\}.$$
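The table of Exercise 1 and the list in Exercise 2 can be reproduced by a direct search. The sketch below assumes that a positive definite form $\{a, b, c\}$ is reduced when $-a < b \le a < c$, or $0 \le b \le a = c$; this is the usual convention and is consistent with the table above, but the precise statement of condition (1) is not legible in this copy, so treat it as an assumption. The function name `reduced_forms` is illustrative.

```python
def reduced_forms(d):
    """List the reduced positive definite forms (a, b, c) with b^2 - 4ac = d < 0.

    Assumed reduction condition: -a < b <= a < c, or 0 <= b <= a = c.
    Imprimitive forms such as (2, 2, 2) are included.
    """
    if d >= 0 or d % 4 not in (0, 1):
        raise ValueError("d must be a negative integer congruent to 0 or 1 mod 4")
    forms = []
    a = 1
    # |b| <= a <= c gives -d = 4ac - b^2 >= 3a^2, which bounds a.
    while 3 * a * a <= -d:
        for b in range(-a + 1, a + 1):
            numerator = b * b - d          # equals 4ac
            if numerator % (4 * a) == 0:
                c = numerator // (4 * a)
                if c > a or (c == a and b >= 0):
                    forms.append((a, b, c))
        a += 1
    return forms


if __name__ == "__main__":
    for d in (-3, -4, -7, -8, -11, -12, -15, -16, -19, -20, -48):
        print(d, reduced_forms(d))
```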

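12.3 Kronecker's Symbol

Definition. Let $m > 0$, $d \equiv 0$ or $1 \pmod 4$ and $d$ not a perfect square. The Kronecker symbol $\left(\frac{d}{m}\right)$ is defined by

$$\left(\frac{d}{p}\right) = 0, \quad \text{if } p \mid d;$$

$$\left(\frac{d}{2}\right) = \begin{cases} 1, & \text{if } d \equiv 1 \pmod 8, \\ -1, & \text{if } d \equiv 5 \pmod 8; \end{cases}$$

$$\left(\frac{d}{p}\right) = \text{Legendre's symbol} \quad (p \text{ odd prime}, \; p \nmid d).$$

If $m = \prod_{r=1}^{v} p_r$ where $p_r$ are primes, then

$$\left(\frac{d}{m}\right) = \prod_{r=1}^{v} \left(\frac{d}{p_r}\right).$$

The following are very easy to prove:

(i) If $(d, m) > 1$, then $\left(\frac{d}{m}\right) = 0$.

(ii) If $(d, m) = 1$, then $\left(\frac{d}{m}\right) = \pm 1$.

(iii) If $m_1 > 0$, $m_2 > 0$, then $\left(\frac{d}{m_1m_2}\right) = \left(\frac{d}{m_1}\right)\left(\frac{d}{m_2}\right)$.

Theorem 3.1. If $m > 0$, $(m, d) = 1$, then the Kronecker symbol is given by

$$\left(\frac{d}{m}\right) = \begin{cases} \left(\dfrac{m}{|d|}\right), & \text{when } d \text{ is odd}, \\[2ex] \left(\dfrac{2}{m}\right)^{b} (-1)^{\frac{u-1}{2} \cdot \frac{m-1}{2}} \left(\dfrac{m}{|u|}\right), & \text{when } d = 2^b u, \; 2 \nmid u. \end{cases}$$

Here $\left(\frac{2}{m}\right)$, $\left(\frac{m}{|d|}\right)$, $\left(\frac{m}{|u|}\right)$ are all Jacobi symbols.

Proof. 1) Let $d$ be odd. From the definition of the Kronecker symbol and Theorem 3.6.5 we have

$$\left(\frac{d}{m}\right) = \left(\frac{m}{|d|}\right).$$

2) Let $d = 2^b u$, $2 \nmid u$. Then $b \ge 2$, and $m$ is odd, so that

$$\left(\frac{d}{m}\right) = \left(\frac{2}{m}\right)^{b} \left(\frac{u}{m}\right) = \left(\frac{2}{m}\right)^{b} (-1)^{\frac{u-1}{2} \cdot \frac{m-1}{2}} \left(\frac{m}{|u|}\right). \qquad \square$$

From this theorem we deduce that

$$\left(\frac{d}{m}\right) = \left(\frac{d}{n}\right) \quad \text{whenever } m > 0, \; n > 0, \; m \equiv n \pmod{|d|}.$$

Therefore we have:

Theorem 3.2. The Kronecker symbol $\left(\frac{d}{m}\right)$ is a real character mod $|d|$.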
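The definition of the Kronecker symbol translates directly into a short routine. The sketch below is one possible implementation, not taken from the text: the names `jacobi` and `kronecker` are illustrative, the odd part of $m$ is handled by the standard binary algorithm for the Jacobi symbol, and each factor 2 of $m$ is handled by the rule for $\left(\frac{d}{2}\right)$ given above.

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0, computed by quadratic reciprocity."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):        # (2/n) = -1 exactly when n = 3, 5 mod 8
                result = -result
        a, n = n, a                     # reciprocity step
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def kronecker(d, m):
    """Kronecker symbol (d/m) for d = 0 or 1 mod 4 and m > 0."""
    if d % 4 not in (0, 1) or m <= 0:
        raise ValueError("need d = 0 or 1 (mod 4) and m > 0")
    sign = 1
    while m % 2 == 0:
        m //= 2
        if d % 2 == 0:
            return 0                    # 2 divides d
        if d % 8 == 5:                  # (d/2) = -1 when d = 5 mod 8
            sign = -sign
    return sign * jacobi(d, m)


if __name__ == "__main__":
    print([kronecker(-4, m) for m in range(1, 9)])
```

The printed list for $d = -4$ has period 4, in agreement with Theorem 3.2.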

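Theorem 3.3. Suppose that $m > 0$, $n > 0$ and $m \equiv -n \pmod{|d|}$. Then

$$\left(\frac{d}{m}\right) = \begin{cases} \left(\dfrac{d}{n}\right), & \text{if } d > 0, \\[2ex] -\left(\dfrac{d}{n}\right), & \text{if } d < 0. \end{cases}$$

Proof. Since $m \equiv n(|d| - 1) \pmod{|d|}$, Theorem 3.2 gives

$$\left(\frac{d}{m}\right) = \left(\frac{d}{n}\right)\left(\frac{d}{|d| - 1}\right),$$

and it follows from Theorem 3.1 that, when $d$ is odd,

$$\left(\frac{d}{|d| - 1}\right) = \left(\frac{|d| - 1}{|d|}\right) = \left(\frac{-1}{|d|}\right) = (-1)^{\frac{|d|-1}{2}} = \begin{cases} 1, & \text{if } d > 0, \\ -1, & \text{if } d < 0. \end{cases}$$

When $d$ is even, we let $d = 2^b u$, $2 \nmid u$, $b \ge 2$. Then, from Theorem 3.1, we have

$$\left(\frac{d}{|d| - 1}\right) = \left(\frac{2}{|d| - 1}\right)^{b} (-1)^{\frac{u-1}{2}} \left(\frac{|d| - 1}{|u|}\right) = (-1)^{\frac{u-1}{2}} \left(\frac{-1}{|u|}\right) = (-1)^{\frac{u-1}{2} + \frac{|u|-1}{2}} = \begin{cases} 1, & \text{if } d > 0, \\ -1, & \text{if } d < 0. \end{cases}$$

The theorem is proved. □

Theorem 3.4. Let $k > 0$ and $(d, k) = 1$. The number of solutions to the congruence

$$x^2 \equiv d \pmod{4k} \qquad (1)$$

is equal to

$$2 \sum_{f \mid k} \left(\frac{d}{f}\right),$$

where the sum is over all positive square-free divisors $f$ of $k$.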
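Theorem 3.4 lends itself to a brute-force check. The sketch below counts the residues $x$ mod $4k$ with $x^2 \equiv d$ and compares the count with $2\sum_f \left(\frac{d}{f}\right)$ over square-free $f \mid k$. All function names are illustrative; the Kronecker symbol is computed by a helper written for this example (Jacobi symbol for the odd part of $m$, plus the rule for the prime 2).

```python
from math import gcd, isqrt


def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def kronecker(d, m):
    """Kronecker symbol (d/m) for d = 0 or 1 mod 4 and m > 0."""
    sign = 1
    while m % 2 == 0:
        m //= 2
        if d % 2 == 0:
            return 0
        if d % 8 == 5:
            sign = -sign
    return sign * jacobi(d, m)


def is_squarefree(n):
    return all(n % (p * p) for p in range(2, isqrt(n) + 1))


def count_roots(d, k):
    """Number of x in [0, 4k) with x^2 = d (mod 4k)."""
    return sum(1 for x in range(4 * k) if (x * x - d) % (4 * k) == 0)


def theorem_3_4(d, k):
    """Right-hand side of Theorem 3.4; requires gcd(d, k) = 1."""
    divisors = [f for f in range(1, k + 1) if k % f == 0 and is_squarefree(f)]
    return 2 * sum(kronecker(d, f) for f in divisors)


if __name__ == "__main__":
    for d, k in [(-4, 5), (5, 4), (-7, 2), (5, 11)]:
        print(d, k, count_roots(d, k), theorem_3_4(d, k))
```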

If $x$ is a solution to (1) then so is $x + 2k$. Hence, by the theorem, the number of solutions to

$$x^2 \equiv d \pmod{4k}, \qquad 0 \le x < 2k,$$

is $\sum_{f \mid k} \left(\frac{d}{f}\right)$, where $f$ again runs over the positive square-free divisors of $k$.

A form $\{a, b, c\}$ with $(a, b, c) = 1$ is said to be primitive. If $(a, b, c) = g > 1$, then we say that $\{a, b, c\}$ is imprimitive. Clearly

$$\left\{\frac{a}{g}, \frac{b}{g}, \frac{c}{g}\right\}$$

is a primitive form with discriminant $d/g^2$. Also, if

$$\{a, b, c\} \sim \{a_1, b_1, c_1\}$$

then the two forms are either both primitive or both imprimitive. We denote by $h(d)$ the number of classes of primitive forms with discriminant $d$. Clearly the number of classes of forms with discriminant $d$ is equal to

From each class of primitive forms we select a representative (for definite forms we consider the primitive positive definite forms) giving a representative system which we denote by

$$F_1, \ldots, F_{h(d)}.$$

Theorem 4.1. Let $k > 0$, $(k, d) = 1$, and denote by $\psi(k)$ the total number of primary solutions to

$$k = F_1(x, y), \quad \ldots, \quad k = F_{h(d)}(x, y).$$

Then

$$\psi(k) = w \sum_{n \mid k} \left(\frac{d}{n}\right).$$

(For the definitions of primary solution and $w$, see §4 in the previous chapter.)

Proof. We begin by considering the solutions to the congruence

$$l^2 \equiv d \pmod{4k}, \qquad 0 \le l < 2k.$$

For a given solution $l$ we can determine an integer $m$ from $l^2 - 4km = d$. This then gives a form $\{k, l, m\}$ which is easily seen to be primitive and with discriminant $d$. Therefore $\{k, l, m\}$ is equivalent to one and only one $F_i$. Also, from Theorem 11.4.3, we know that there are $w$ proper primary solutions corresponding to each $l$. Therefore the total number of proper primary solutions to

$$k = F_1(x, y), \quad \ldots, \quad k = F_{h(d)}(x, y)$$

is

$$w \sum_{f \mid k} \left(\frac{d}{f}\right),$$

where $f$ runs over the square-free divisors of $k$. Also the total number of primary solutions is

$$\psi(k) = w \sum_{\substack{g^2 \mid k \\ g > 0}} \; \sum_{f \mid (k/g^2)} \left(\frac{d}{f}\right)$$

(since $(k, d) = 1$, so that $(k/g^2, d) = 1$). Since $(g^2, d) = 1$ it follows that

$$\psi(k) = w \sum_{\substack{g^2 \mid k \\ g > 0}} \; \sum_{f \mid (k/g^2)} \left(\frac{d}{fg^2}\right) = w \sum_{n \mid k} \left(\frac{d}{n}\right).$$

(This is because any integer $n$ can be written as $fg^2$ where $f$ is square-free and $g > 0$. Also $g^2 \mid k$, $f \mid (k/g^2)$ and $n \mid k$ are equivalent.) □

Consider now the following application of the theorem. It is easy to prove that $h(-4) = 1$ so that $\psi(k)$ is the number of solutions to $k = x^2 + y^2$. Therefore:

Theorem 4.2. The number of solutions to $x^2 + y^2 = k$ is equal to four times the difference between the number of divisors of $k$ which are congruent to 1 (mod 4) and the number which are congruent to 3 (mod 4). □

This agrees completely with Theorem 6.7.5.

Exercise 1. Let $m$ be odd. The number of solutions to $x^2 + 2y^2 = 2^l m$ is $2\sigma$ where $\sigma$ is the difference between the number of divisors of $m$ which are congruent to 1 or 3 (mod 8) and the number which are congruent to 5 or 7 (mod 8).

Exercise 2. The number of solutions to $x^2 + xy + y^2 = k$ is $6E(k)$ where $E(k)$ is the number of divisors of $k$ of the form $3h + 1$ minus the number of divisors of the form $3h + 2$.

Exercise 3. Let $m$ be odd and consider the number of solutions to the equation $x^2 + 3y^2 = 2^l m$. If $l$ is odd, then this number is zero; if $l = 0$, then this number is $2E(m)$; if $l$ is positive and even, then this number is $6E(m)$. Here $E(m)$ has the same definition as earlier.

Exercise 4. If $m$ is odd, then the equation $x^2 + 3y^2 = 4m$ has $E(m)$ positive odd solutions.

Exercise 5. Let $m$ be odd and consider the number of solutions to the equation $x^2 + 4y^2 = 2^k m$. When $k = 0$, this number is $2E$; when $k = 1$, this number is 0; when $k \ge 2$, this number is $4E$. Here $E$ is the number of divisors of $m$ congruent to 1 (mod 4) minus the number of divisors of $m$ congruent to 3 (mod 4).

Exercise 6. Denote by $e(n)$ the number of divisors of $n$ congruent to 1, 2, 4 (mod 7) minus the number of those congruent to 3, 5, 6 (mod 7). The number of solutions to $x^2 + xy + 2y^2 = n > 0$ is then $2e(n)$.

Exercise 7. If $m$ is odd, then $e(2^a m) = (a + 1)e(m)$. Let $3 \nmid t$. If $b$ is odd, then $e(3^b t) = 0$ and if $b$ is even, then $e(3^b t) = e(t)$.

Exercise 8. Let $m$ be positive and odd. The numbers of solutions to $m = x^2 + 7y^2$ and $2m = x^2 + 7y^2$ are $2e(m)$ and 0 respectively. The number of solutions to $4k = x^2 + 7y^2$ is $2e(k)$.

Exercise 9. Let $m$ be positive and odd. Then there are $e(m)$ positive integer solutions to $x^2 + 7y^2 = 8m$.

Exercise 10. The number of solutions to $x^2 + xy + 3y^2 = m > 0$ is twice the difference between the number of divisors of $m$ congruent to 1, 3, 4, 5, 9 (mod 11) and the number of those congruent to 2, 6, 7, 8, 10 (mod 11).
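Theorem 4.2 and Exercise 2 above can be confirmed numerically for small $k$ by counting representations directly. The following is a minimal sketch with illustrative function names; the search box used in `count_solutions` is only claimed to be adequate for the two forms $x^2 + y^2$ and $x^2 + xy + y^2$ tested here.

```python
from math import isqrt


def count_solutions(a, b, c, k):
    """Count integer pairs (x, y) with a x^2 + b x y + c y^2 = k by exhaustive search.

    The bound 2*isqrt(k) + 2 is large enough for x^2 + y^2 and x^2 + xy + y^2.
    """
    bound = 2 * isqrt(k) + 2
    return sum(
        1
        for x in range(-bound, bound + 1)
        for y in range(-bound, bound + 1)
        if a * x * x + b * x * y + c * y * y == k
    )


def divisor_excess(k, modulus, plus, minus):
    """(# divisors of k with residue in `plus`) minus (# with residue in `minus`)."""
    total = 0
    for n in range(1, k + 1):
        if k % n == 0:
            if n % modulus in plus:
                total += 1
            elif n % modulus in minus:
                total -= 1
    return total


if __name__ == "__main__":
    for k in (1, 2, 5, 25, 65):
        print(k, count_solutions(1, 0, 1, k), 4 * divisor_excess(k, 4, {1}, {3}))
```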

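12.5 The Equivalence of Forms mod q

Let $q$ be a prime number. Suppose that there is a substitution with integer coefficients

$$x = rX + sY, \qquad y = tX + uY, \qquad (ru - st, q) = 1 \qquad (1)$$

such that

$$ax^2 + bxy + cy^2 \equiv a_1X^2 + b_1XY + c_1Y^2 \pmod q. \qquad (2)$$

Then we say that the two forms $\{a, b, c\}$ and $\{a_1, b_1, c_1\}$ are equivalent mod $q$. If we denote by $d$ and $d_1$ the discriminants for $\{a, b, c\}$ and $\{a_1, b_1, c_1\}$, then clearly

$$d_1 \equiv d(ru - st)^2 \pmod q. \qquad (3)$$

From (3) we see that if $\{a, b, c\}$ and $\{a_1, b_1, c_1\}$ are equivalent mod $p$, then

$$\left(\frac{d}{p}\right) = \left(\frac{d_1}{p}\right).$$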

Let us take $q$ to be a prime $p > 2$. Suppose that the discriminant of $\{a, b, c\}$ is $d$ where $p \nmid d$. Then $\{a, b, c\}$ must be equivalent mod $p$ to a form $\{a_1, 0, c_1\}$. This is because $p \nmid (a, b, c)$, and if $p \nmid a$ then letting

$$X \equiv x + \frac{b}{2a}y, \qquad Y \equiv y \pmod p$$

we have

$$ax^2 + bxy + cy^2 \equiv a\left(x + \frac{b}{2a}y\right)^2 - \frac{d}{4a}y^2 \equiv aX^2 - \frac{d}{4a}Y^2 \pmod p,$$

and similarly if $p \nmid c$; if $p \mid (a, c)$, then taking $x = X + Y$, $y = X - Y$ we have

$$ax^2 + bxy + cy^2 \equiv bxy \equiv bX^2 - bY^2 \pmod p.$$

Therefore we can assume from now on that $p \mid b$ and $p \nmid ac$.

Lemma 1. If $p \nmid ac$, then there are $x, y$ such that $ax^2 + cy^2 \equiv 1 \pmod p$.

Proof. Let $x, y$ run over $0, 1, \ldots, p - 1$ separately. Then $ax^2$ and $1 - cy^2$ separately take $(p + 1)/2$ distinct values mod $p$. Since there are only $p$ residue classes, the two sets of values must overlap. Therefore there are $x, y$ such that

$$ax^2 \equiv 1 - cy^2 \pmod p$$

as required. □
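The counting argument in Lemma 1 is constructive in practice: one tabulates the values of $ax^2$ and looks for a value of $1 - cy^2$ in the table. A minimal sketch follows; the function name `solve_unit_representation` is illustrative.

```python
def solve_unit_representation(a, c, p):
    """Find (x, y) with a x^2 + c y^2 = 1 (mod p), for an odd prime p not dividing a*c.

    Mirrors the proof of Lemma 1: the (p + 1)/2 values of a x^2 and the
    (p + 1)/2 values of 1 - c y^2 cannot be disjoint among p residue classes.
    """
    if (a * c) % p == 0:
        raise ValueError("p must not divide a*c")
    values_of_ax2 = {}
    for x in range(p):
        values_of_ax2.setdefault(a * x * x % p, x)   # remember one x per value
    for y in range(p):
        x = values_of_ax2.get((1 - c * y * y) % p)
        if x is not None:
            return x, y
    return None  # not reached when p is an odd prime with p not dividing a*c


if __name__ == "__main__":
    print(solve_unit_representation(3, 5, 7))
```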

Let $1 \equiv ar^2 + ct^2 \pmod p$ and let $s, u$ be any pair of integers satisfying $p \nmid ru - st$. With $s, u$ fixed, we let

$$b_1 \equiv 2ars + 2ctu, \qquad c_1 \equiv as^2 + cu^2 \pmod p$$

so that $\{a, 0, c\} \sim \{1, b_1, c_1\}$ mod $p$. If $d_1$ is the discriminant of the second form, then from our discussions we have

$$\{1, b_1, c_1\} \sim \left\{1, 0, -\frac{d_1}{4}\right\} \sim \{1, 0, -d_1\} \pmod p.$$

Summarizing we have:

Theorem 5.1. Let the discriminant of $\{a, b, c\}$ be $d$, and $p > 2$, $p \nmid d$. Let $r$ be any quadratic non-residue mod $p$. Then

$$\{a, b, c\} \sim \{1, 0, -1\} \sim \{0, 1, 0\} \pmod p$$

if $\left(\frac{d}{p}\right) = 1$, and

$$\{a, b, c\} \sim \{1, 0, -r\} \pmod p$$

if $\left(\frac{d}{p}\right) = -1$. Also $\{1, 0, -1\}$ and $\{1, 0, -r\}$ cannot be equivalent mod $p$. □

Corollary. If $p$ is an odd prime that does not divide $d$, then any two forms with discriminant $d$ must be equivalent mod $p$. □

When $q = 2$ and the forms have odd discriminants we have:

Theorem 5.2. Any form with an odd discriminant must be equivalent mod 2 to exactly one of the following: $\{0, 1, 0\}$, $\{1, 1, 1\}$. More specifically, we have

$$\{a, b, c\} \sim \{0, 1, 0\} \pmod 2 \quad \text{if } 2 \mid ac;$$

$$\{a, b, c\} \sim \{1, 1, 1\} \pmod 2 \quad \text{if } 2 \nmid ac.$$

Proof. Since $2 \nmid d$ it follows that $2 \nmid b$. Consequently if $2 \nmid ac$, then

$$ax^2 + bxy + cy^2 \equiv x^2 + xy + y^2 \pmod 2;$$

if $2 \mid ac$, then either $2 \mid a$ or $2 \mid c$. But if $2 \mid a$ then

$$ax^2 + bxy + cy^2 \equiv xy + cy^2 \equiv y(x + cy) \pmod 2$$

so that $\{a, b, c\} \sim \{0, 1, 0\}$ (mod 2), and similarly if $2 \mid c$. Finally $\{0, 1, 0\}$ and $\{1, 1, 1\}$ cannot be equivalent mod 2 so that the theorem is proved. □

Corollary. Any two forms with the same odd discriminant must be equivalent mod 2. □

We next consider the case when $p$ divides the discriminant of the forms.

Lemma 2. Let $n$ be any given integer. Then there are two integers $x, y$ such that $(x, y) = 1$ and $(F(x, y), n) = 1$.

Proof. Let $q$ be any prime number. Since $F(x, y)$ is a primitive form, $q \nmid (a, b, c)$. If $q \nmid a$, then $q \nmid F(1, 0)$; if $q \nmid c$, then $q \nmid F(0, 1)$; if $q \mid (a, c)$ and $q \nmid b$, then $q \nmid F(1, 1)$. Therefore the lemma follows if $n = q$.

Let $q_1, \ldots, q_t$ be all the distinct prime divisors of $n$. From the above, there are integers $x_i, y_i$ such that $q_i \nmid F(x_i, y_i)$. From the Chinese remainder theorem there are two integers $X, Y$ such that

$$X \equiv x_i \pmod{q_i}, \qquad Y \equiv y_i \pmod{q_i}, \qquad i = 1, 2, \ldots, t.$$

Clearly we have

$$(F(X, Y), n) = 1.$$

Now let $x = X/(X, Y)$, $y = Y/(X, Y)$. Then $(x, y) = 1$ and

$$(F(x, y), n) = 1. \qquad \square$$

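Consider now $p > 2$, $p \mid d$ where $d$ is the discriminant of the form $\{a, b, c\}$. Since $p \nmid (a, c)$ we may assume that $p \nmid a$. It is easily seen that

$$\{a, b, c\} \sim \{a, 0, 0\} \pmod p.$$

Theorem 5.3. Let $p > 2$ and let the forms $\{a, b, c\}$ and $\{a_1, b_1, c_1\}$ have discriminants $d$ and $d_1$ respectively where $p \mid d$, $p \mid d_1$. A necessary and sufficient condition for $\{a, b, c\}$ and $\{a_1, b_1, c_1\}$ to be equivalent mod $p$ is that

$$\left(\frac{k}{p}\right) = \left(\frac{k_1}{p}\right),$$

where $k$ and $k_1$ are any integers representable by $\{a, b, c\}$ and $\{a_1, b_1, c_1\}$ respectively and satisfying $(k, d) = 1$, $(k_1, d_1) = 1$.

Proof. That $k$ and $k_1$ exist follows from Lemma 2. Let $k \equiv ax^2 + bxy + cy^2 \pmod p$, $(k, p) = 1$. Then

$$4ak \equiv (2ax + by)^2 - dy^2 \equiv (2ax + by)^2 \pmod p.$$

Thus $\left(\frac{k}{p}\right)$ is constant and is equal to $\left(\frac{a}{p}\right)$. Suppose now that $\{a, b, c\}$ and $\{a_1, b_1, c_1\}$ are equivalent mod $p$. Then, from the definition of equivalence,

$$\left(\frac{k}{p}\right) = \left(\frac{a}{p}\right) = \left(\frac{a_1}{p}\right) = \left(\frac{k_1}{p}\right).$$

Conversely, if $\left(\frac{k}{p}\right) = \left(\frac{k_1}{p}\right)$ then $\left(\frac{a}{p}\right) = \left(\frac{a_1}{p}\right)$, so that there is an integer $z$ such that

$$a \equiv a_1z^2 \pmod p$$

and hence

$$\{a, b, c\} \sim \{a, 0, 0\} \sim \{a_1, 0, 0\} \sim \{a_1, b_1, c_1\} \pmod p. \qquad \square$$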

It remains to consider the situation when $p = 2$ and $2 \mid d$. We first introduce the following symbols:

$$\delta(k) = (-1)^{\frac{k-1}{2}}, \quad \text{if } \frac{d}{4} \equiv 0 \text{ or } 3 \pmod 4;$$

$$\varepsilon(k) = (-1)^{\frac{k^2-1}{8}}, \quad \text{if } \frac{d}{4} \equiv 0 \text{ or } 2 \pmod 8;$$

$$\delta(k)\varepsilon(k) = (-1)^{\frac{k-1}{2} + \frac{k^2-1}{8}}, \quad \text{if } \frac{d}{4} \equiv 0 \text{ or } 6 \pmod 8;$$

where $k$ is an odd integer representable by $\{a, b, c\}$. Since $2 \mid d$ implies $2 \mid b$ we shall assume that $b = 0$ and consider $d = -4ac$.

Theorem 5.4. A necessary and sufficient condition for two forms satisfying $\frac{d}{4} \equiv 3 \pmod 4$ to be equivalent mod 4 is that they should have the same $\delta$.

Proof. Since $d = -4ac$, it follows that $ac \equiv 1 \pmod 4$, that is $a \equiv c \pmod 4$. If $2 \nmid k$ and $k$ is representable as

$$k = ax^2 + cy^2,$$

then, since $x, y$ must have opposite parity, it follows that $k \equiv a \pmod 4$ and hence $\delta(k) = \delta(a)$. The theorem can easily be deduced from this. □

The same method can be used to prove the following theorems:

Theorem 5.5. A necessary and sufficient condition for two forms satisfying $\frac{d}{4} \equiv 2 \pmod 8$ to be equivalent mod 8 is that they should have the same $\varepsilon$. □

Theorem 5.6. A necessary and sufficient condition for two forms satisfying $\frac{d}{4} \equiv 6 \pmod 8$ to be equivalent mod 8 is that they should have the same $\delta\varepsilon$. □

Theorem 5.7. A necessary and sufficient condition for two forms satisfying $\frac{d}{4} \equiv 0 \pmod 4$ to be equivalent mod 4 is that they should have the same $\delta$. □

Theorem 5.8. A necessary and sufficient condition for two forms satisfying $\frac{d}{4} \equiv 0 \pmod 8$ to be equivalent mod 8 is that they should have the same $\delta$ and $\varepsilon$. □

Exercise 1. Any two forms satisfying $\frac{d}{4} \equiv 2 \pmod 4$ are equivalent mod 4.

Exercise 2. Any two forms satisfying $\frac{d}{4} \equiv 1 \pmod 4$ are equivalent mod 4.

Exercise 3. Any forms satisfying $\frac{d}{4} \equiv 1 \pmod 4$ must be equivalent mod 8 to exactly one of

Deduce also that any two forms with the same discriminant $d$ which satisfies $\frac{d}{4} \equiv 1 \pmod 4$ must be equivalent mod 8.

Exercise 4. Let $q$ be any positive integer. A necessary and sufficient condition for two quadratic forms to be equivalent mod $q$ is that they have the same character system (see Definition 1 in the next section).

12.6 The Character System for a Quadratic Form and the Genus

It follows at once from the definitions that any two quadratic forms which are equivalent are also equivalent mod $q$ for any $q$.

Definition 1. Let $p_1, \ldots, p_s$ be the odd prime divisors of $d$. If $(k, 2d) = 1$ and $k$ is representable by $F(x, y)$ then, from the previous section, we see that

$$\left(\frac{k}{p_1}\right), \ldots, \left(\frac{k}{p_s}\right), \quad \delta(k), \quad \varepsilon(k), \quad \delta(k)\varepsilon(k) \qquad (1)$$

do not depend on $k$. We call them the character system for $F(x, y)$. Since two equivalent quadratic forms have the same character system we can speak of the character system of an equivalence class of forms.

Definition 2. If two quadratic forms with the same discriminant $d$ have the same values for each of the characters, then we say that they belong to the same genus.

It is easily seen that a genus is formed from various equivalence classes of forms. It can be proved that each genus has the same number of equivalence classes. Since this fact falls more naturally in the study of ideals in a quadratic field we do not give the proof here.

The importance of the notion of genus comes from the discussion of the representation of integers by quadratic forms. Let $F(x, y)$ be a fixed primitive quadratic form. We now discuss the Diophantine equation

$$k = F(x, y). \qquad (2)$$

If $h(d) = 1$, then this problem can be solved with Theorem 4.1. But if $h(d) \ne 1$, then we only have certain incomplete results from Theorem 4.1. For example if $\psi(k) = 0$, then (2) has no solutions; but if $\psi(k) \ne 0$, is (2) soluble then? If it is soluble, then how many solutions are there? These questions cannot be answered by Theorem 4.1. The introduction of the notion of genus helps partly to answer these questions.

Example 1. $d = -96$. There are four positive definite reduced primitive forms:

$$\{1, 0, 24\}, \quad \{3, 0, 8\}, \quad \{4, 4, 7\}, \quad \{5, 2, 5\}.$$

From Theorem 4.1 we only know that if $k$ is representable by these four forms, then the total number of solutions is

$$\psi(k) = 2 \sum_{n \mid k} \left(\frac{-96}{n}\right),$$

where $n$ runs over all the positive divisors of $k$. In order to calculate the character system we first select $k$ coprime with $d$ and representable by the forms. We take $k = 1, 11, 7, 5$ and obtain

    Form         (k/3)    δ(k)    ε(k)
    {1, 0, 24}    +1       +1      +1
    {3, 0, 8}     -1       -1      -1
    {4, 4, 7}     +1       -1      +1
    {5, 2, 5}     -1       +1      -1

This table shows that each genus has one equivalence class. Therefore, when $k \equiv 1, 11, 7, 5 \pmod{12}$, $\psi(k)$ represents the number of solutions of the first, the second, the third and the fourth form respectively. More specifically, if $k \equiv 1 \pmod{12}$, then $\psi(k) = 2\sum_{n \mid k} \left(\frac{-96}{n}\right)$ represents the number of solutions to $x^2 + 24y^2 = k$. At the same time we have proved that this equation has no solution if $k \equiv 11, 7, 5 \pmod{12}$.

Example 2. $d = -15$. There are two positive definite reduced primitive forms:

$$\{1, 1, 4\}, \quad \{2, 1, 2\}.$$

Taking $k = 1$ and 17 will give

$$\left(\frac{k}{3}\right) = \left(\frac{k}{5}\right) = 1 \qquad \text{and} \qquad \left(\frac{k}{3}\right) = \left(\frac{k}{5}\right) = -1$$

respectively. We can then perform the calculations for $k \equiv 1, 4 \pmod{15}$ and $k \equiv 2, 8 \pmod{15}$. We conclude that if $k \equiv 7, 11, 13$ or $14 \pmod{15}$, then $k$ is not representable by either of the two forms. If $k \equiv 1, 4 \pmod{15}$ then there are $2\sum_{n \mid k} \left(\frac{-15}{n}\right)$ ways to represent $k$ by $\{1, 1, 4\}$; if $k \equiv 2, 8 \pmod{15}$, then there is the same number of ways to represent $k$ by $\{2, 1, 2\}$.

From these two examples we see that if each genus contains only one equivalence class, then the number of solutions to (2) is completely determined when $(k, 2d) = 1$. We tabulate all the discriminants $d > -400$ in which the genus has only one equivalence class in the following table, where we have also included all the positive definite reduced primitive forms.

    -d     reduced primitive forms a,b,c
    3      1,1,1
    4      1,0,1
    7      1,1,2
    8      1,0,2
    11     1,1,3
    12     1,0,3
    15     1,1,4    2,1,2
    16     1,0,4
    19     1,1,5
    20     1,0,5    2,2,3
    24     1,0,6    2,0,3
    27     1,1,7
    28     1,0,7
    32     1,0,8    3,2,3
    35     1,1,9    3,1,3
    36     1,0,9    2,2,5
    40     1,0,10   2,0,5
    43     1,1,11
    48     1,0,12   3,0,4
    51     1,1,13   3,3,5
    52     1,0,13   2,2,7
    60     1,0,15   3,0,5
    64     1,0,16   4,4,5
    67     1,1,17
    72     1,0,18   2,0,9
    75     1,1,19   3,3,7
    84     1,0,21   2,2,11   3,0,7    5,4,5
    88     1,0,22   2,0,11
    91     1,1,23   5,3,5
    96     1,0,24   3,0,8    4,4,7    5,2,5
    99     1,1,25   5,1,5
    100    1,0,25   2,2,13
    112    1,0,28   4,0,7
    115    1,1,29   5,5,7
    120    1,0,30   2,0,15   3,0,10   5,0,6
    123    1,1,31   3,3,11
    132    1,0,33   2,2,17   3,0,11   6,6,7
    147    1,1,37   3,3,13
    148    1,0,37   2,2,19
    160    1,0,40   4,4,11   5,0,8    7,6,7
    163    1,1,41
    168    1,0,42   2,0,21   3,0,14   6,0,7
    180    1,0,45   2,2,23   5,0,9    7,4,7
    187    1,1,47   7,3,7
    192    1,0,48   3,0,16   4,4,13   7,2,7
    195    1,1,49   3,3,17   5,5,11   7,1,7
    228    1,0,57   2,2,29   3,0,19   6,6,11
    232    1,0,58   2,0,29
    235    1,1,59   5,5,13
    240    1,0,60   3,0,20   4,0,15   5,0,12
    267    1,1,67   3,3,23
    280    1,0,70   2,0,35   5,0,14   7,0,10
    288    1,0,72   4,4,19   8,0,9    8,8,11
    312    1,0,78   2,0,39   3,0,26   6,0,13
    315    1,1,79   5,5,17   7,7,13   9,9,11
    340    1,0,85   2,2,43   5,0,17   10,10,11
    352    1,0,88   4,4,23   8,0,11   8,8,13
    372    1,0,93   2,2,47   3,0,31   6,6,17

Exercise. Study, as in the examples, the cases $d = -20, -24, -32, -35, -51, -75$.
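The conclusion of Example 1 can be tested numerically: for $k$ coprime to 6 the number of solutions of $x^2 + 24y^2 = k$ should equal $2\sum_{n \mid k} \left(\frac{-96}{n}\right)$ when $k \equiv 1 \pmod{12}$ and should be zero when $k \equiv 5, 7, 11 \pmod{12}$. The sketch below uses illustrative function names; since every divisor $n$ of such $k$ is odd, the symbol $\left(\frac{-96}{n}\right)$ is computed as a Jacobi symbol.

```python
from math import gcd, isqrt


def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def psi(k, d=-96):
    """psi(k) = 2 * sum over divisors n of k of (d/n); k must be odd here."""
    return 2 * sum(jacobi(d, n) for n in range(1, k + 1) if k % n == 0)


def representations(k):
    """Number of integer pairs (x, y) with x^2 + 24 y^2 = k."""
    count = 0
    y = 0
    while 24 * y * y <= k:
        rest = k - 24 * y * y
        x = isqrt(rest)
        if x * x == rest:
            # signs of x and y are independent; zero has only one sign
            count += (1 if x == 0 else 2) * (1 if y == 0 else 2)
        y += 1
    return count


if __name__ == "__main__":
    for k in (1, 25, 49, 73, 85, 145):
        print(k, representations(k), psi(k))
```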


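12.7 The Convergence of the Series K(d)

Let

$$K(d) = \sum_{n=1}^{\infty} \left(\frac{d}{n}\right) \frac{1}{n}. \qquad (1)$$

This is a very important series. Since $\left(\frac{d}{n}\right)$ is a real character mod $|d|$, it follows from Theorem 7.2.3 that

$$\sum_{n=1}^{|d|} \left(\frac{d}{n}\right) = 0.$$

Moreover we see from Theorem 6.8.2 that the series $K(d)$ is convergent.

Theorem 7.1.

$$\lim_{\tau \to \infty} \frac{1}{\tau} \sum_{\substack{1 \le k \le \tau \\ (k, d) = 1}} \sum_{n \mid k} \left(\frac{d}{n}\right) = \frac{\varphi(|d|)}{|d|} K(d).$$

Proof. Let $A(\tau; d, n)$ denote the number of positive integers not exceeding $\tau/n$ and coprime with $d$. Then

$$\frac{1}{\tau} \sum_{\substack{1 \le k \le \tau \\ (k, d) = 1}} \sum_{n \mid k} \left(\frac{d}{n}\right) = \frac{1}{\tau} \sum_{n=1}^{\infty} \left(\frac{d}{n}\right) \sum_{\substack{1 \le k \le \tau \\ (k, d) = 1 \\ n \mid k}} 1 = \frac{1}{\tau} \sum_{n=1}^{\infty} \left(\frac{d}{n}\right) \sum_{\substack{1 \le k \le \tau/n \\ (k, d) = 1}} 1 = \sum_{n=1}^{\infty} \left(\frac{d}{n}\right) \frac{A(\tau; d, n)}{\tau}. \qquad (2)$$

Since $A(\tau; d, n)$ does not increase as $n$ increases, and

$$\frac{A(\tau; d, n)}{\tau} \le \frac{1}{n},$$

it follows from Theorem 6.8.2 that the series (2) converges uniformly in $\tau$. Also, for fixed $n$, we have

$$\lim_{\tau \to \infty} \frac{A(\tau; d, n)}{\tau} = \frac{\varphi(|d|)}{|d|} \cdot \frac{1}{n}.$$

Therefore

$$\lim_{\tau \to \infty} \frac{1}{\tau} \sum_{\substack{1 \le k \le \tau \\ (k, d) = 1}} \sum_{n \mid k} \left(\frac{d}{n}\right) = \sum_{n=1}^{\infty} \left(\frac{d}{n}\right) \lim_{\tau \to \infty} \frac{A(\tau; d, n)}{\tau} = \frac{\varphi(|d|)}{|d|} \sum_{n=1}^{\infty} \left(\frac{d}{n}\right) \frac{1}{n}. \qquad \square$$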

12.8 The Number of Lattice Points Inside a Hyperbola and an Ellipse

Theorem 8.1. Let $m > 0$ and let there be an ellipse centre at the origin, or a hyperbola centre at the origin (the two curves of the hyperbola together with two lines passing through the origin). Denote by $I$ the (finite) area of the region. Magnify the original figure by $\sqrt{\tau}$ (that is replacing $\xi$ and $\eta$ by $\xi\sqrt{\tau}$ and $\eta\sqrt{\tau}$), and denote by $V(\tau)$ the number of lattice points in the magnified figure whose coordinates satisfy

$$\xi \equiv \xi_0 \pmod m, \qquad \eta \equiv \eta_0 \pmod m.$$

Then

$$\lim_{\tau \to \infty} \frac{V(\tau)}{\tau} = \frac{I}{m^2}.$$

Proof. We form a net in the original figure with the orthogonal lines

$$\xi = \frac{\xi_0 + rm}{\sqrt{\tau}}, \qquad \eta = \frac{\eta_0 + sm}{\sqrt{\tau}},$$

where $r, s$ run over the integers. This gives a net of squares with side length $m/\sqrt{\tau}$. Denote by $W(\tau)$ the number of squares whose "south-west corners" lie inside the ellipse or the hyperbola. Then clearly

$$V(\tau) = W(\tau).$$

Since the area of each square in the net is $m^2/\tau$ it follows at once from the fundamental theorem of calculus that

$$\lim_{\tau \to \infty} \frac{m^2}{\tau} W(\tau) = I,$$

and hence the required result. □

12.9 The Limiting Average

Denote by $\psi(k, F)$ the number of proper representations of $k$ by $F$, and let

$$H(\tau, F) = \sum_{\substack{1 \le k \le \tau \\ (k, d) = 1}} \psi(k, F), \qquad \tau > 1.$$

The aim of this section is to evaluate

$$\lim_{\tau \to \infty} \frac{1}{\tau} H(\tau, F).$$

Theorem 9.1. As $x, y$ both run over a complete residue system mod $|d|$, there are precisely $|d|\varphi(|d|)$ sets of $x, y$ such that $F(x, y)$ is coprime with $d$.
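Theorem 9.1 is easy to confirm by enumeration for a few primitive forms. The following sketch uses illustrative function names and simply counts the pairs $(x, y)$ modulo $|d|$ for which $F(x, y)$ is coprime with $d$, comparing the count with $|d|\varphi(|d|)$.

```python
from math import gcd


def euler_phi(n):
    """Euler's function, by direct count (adequate for small n)."""
    return sum(1 for j in range(1, n + 1) if gcd(j, n) == 1)


def coprime_pairs(a, b, c):
    """Number of (x, y) mod |d| with gcd(F(x, y), d) = 1, where d = b^2 - 4ac."""
    modulus = abs(b * b - 4 * a * c)
    return sum(
        1
        for x in range(modulus)
        for y in range(modulus)
        if gcd(a * x * x + b * x * y + c * y * y, modulus) == 1
    )


def predicted(a, b, c):
    """The value |d| * phi(|d|) given by Theorem 9.1 for a primitive form."""
    modulus = abs(b * b - 4 * a * c)
    return modulus * euler_phi(modulus)


if __name__ == "__main__":
    for form in [(1, 1, 1), (1, 0, 1), (2, 1, 2), (1, 1, -1), (1, 0, 5)]:
        print(form, coprime_pairs(*form), predicted(*form))
```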

Proof. It suffices to prove that if $p^l \,\|\, d$, $l > 0$, then there are $p^l\varphi(p^l)$ sets of $x, y$ in a complete residue system mod $p^l$ such that $p \nmid F(x, y)$. For let the standard factorization for $|d|$ be $\prod_i p_i^{l_i}$. Then, since $(d, F(x, y)) = 1$ is equivalent to $p \nmid F(x, y)$ for every prime $p \mid d$, it follows from the Chinese remainder theorem that, as $x, y$ run over a complete residue system mod $|d|$, there are

$$\prod_{p^l \| d} p^l\varphi(p^l) = |d|\varphi(|d|)$$

sets of $x, y$ for which $F(x, y)$ is coprime with $d$.

Since $(a, b, c) = 1$, we have $p \nmid (a, c)$. We now assume that $p \nmid a$.

1) Suppose that $p > 2$. Since $(p, 4a) = 1$, it follows from

$$4aF = (2ax + by)^2 - dy^2 \not\equiv 0 \pmod p$$

that

$$2ax + by \not\equiv 0 \pmod p,$$

and conversely. For any given value of $y$ (there are $p^l$ values) there are $p - 1$ distinct values for $x$ mod $p$, because $p \nmid 2a$. There are thus $p^{l-1}(p - 1) = \varphi(p^l)$ values for $x$ mod $p^l$. The required result is proved.

2) Suppose that $p = 2$. Now $2 \mid d$ implies $2 \mid b$. The condition

$$ax^2 + bxy + cy^2 \equiv 1 \pmod 2$$

becomes

$$ax + cy \equiv 1 \pmod 2.$$

Since corresponding to each value of $y$ (there are $2^l$ values) there are $2^{l-1}$ values $x$ (mod $2^l$) which satisfy the above equation, the theorem is proved. □

Theorem 9.2. We have

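$$\lim_{\tau \to \infty} \frac{H(\tau, F)}{\tau} = \begin{cases} \dfrac{2\pi}{\sqrt{|d|}} \cdot \dfrac{\varphi(|d|)}{|d|}, & \text{if } d < 0, \\[2ex] \dfrac{\log \varepsilon}{\sqrt{d}} \cdot \dfrac{\varphi(d)}{d}, & \text{if } d > 0. \end{cases}$$

Proof. If $d < 0$, we let $U(\tau) = U(\tau, F, x_0, y_0)$ denote the number of solutions to

$$0 \le F(x, y) \le \tau, \qquad x \equiv x_0 \pmod{|d|}, \qquad y \equiv y_0 \pmod{|d|}.$$

If $d > 0$, then we let $U(\tau) = U(\tau, F, x_0, y_0)$ denote the number of solutions to

$$0 \le F(x, y) \le \tau, \qquad L > 0, \qquad 1 \le \left|\frac{L}{\bar{L}}\right| < \varepsilon^2, \qquad x \equiv x_0 \pmod{|d|}, \qquad y \equiv y_0 \pmod{|d|}.$$

Here the definitions for $L$, $\bar{L}$, $\varepsilon$ are the same as in §11.4. Let $x_0, y_0$ both run over the complete residue system mod $|d|$ such that $(F(x_0, y_0), d) = 1$. Then

$$\sum_{\substack{(x_0, y_0) \\ (F(x_0, y_0), d) = 1}} U(\tau) = \sum_{\substack{1 \le k \le \tau \\ (k, d) = 1}} \psi(k, F) = H(\tau, F),$$

and hence

$$\lim_{\tau \to \infty} \frac{H(\tau, F)}{\tau} = \lim_{\tau \to \infty} \frac{1}{\tau} \sum_{\substack{(x_0, y_0) \\ (F(x_0, y_0), d) = 1}} U(\tau).$$

By Theorem 9.1 we see that our theorem follows if we can prove that, for each set of $x_0, y_0$, we have

$$\lim_{\tau \to \infty} \frac{U(\tau)}{\tau} = \begin{cases} \dfrac{2\pi}{\sqrt{|d|}} \cdot \dfrac{1}{d^2}, & \text{if } d < 0, \\[2ex] \dfrac{\log \varepsilon}{\sqrt{d}} \cdot \dfrac{1}{d^2}, & \text{if } d > 0. \end{cases}$$

Also, by Theorem 8.1, we need now only evaluate the area for the ellipse $F(x, y) \le 1$ ($d < 0$), and the area for the hyperbola region $0 \le F(x, y) \le 1$, $L > 0$, $1 \le |L/\bar{L}| < \varepsilon^2$ ($d > 0$).

1) Suppose that $d < 0$. It is well known that the area of the ellipse $ax^2 + bxy + cy^2 \le 1$ is $2\pi/\sqrt{|d|}$. The theorem is therefore proved.

2) Suppose that $d > 0$, and we may assume that $a > 0$. Since

$$L = 2ax + (b + \sqrt{d})y, \qquad \bar{L} = 2ax + (b - \sqrt{d})y,$$

so that

$$L\bar{L} = 4a(ax^2 + bxy + cy^2),$$

and hence $\bar{L} > 0$. The required area for the hyperbola is

$$I = \iint dx\,dy,$$

where the integration is over $L\bar{L} \le 4a$, $\bar{L} > 0$, $1 \le L/\bar{L} < \varepsilon^2$. We make the substitution

$$\frac{L}{2\sqrt{a}} = \rho, \qquad \frac{\bar{L}}{2\sqrt{a}} = \sigma,$$

whose Jacobian has the value

$$\begin{vmatrix} \dfrac{\partial\rho}{\partial x} & \dfrac{\partial\rho}{\partial y} \\[1.5ex] \dfrac{\partial\sigma}{\partial x} & \dfrac{\partial\sigma}{\partial y} \end{vmatrix} = -\sqrt{d}.$$

Therefore

$$I = \frac{1}{\sqrt{d}} \iint d\rho\,d\sigma,$$

where the integration is over $\rho\sigma \le 1$, $\sigma > 0$, $\sigma \le \rho < \varepsilon^2\sigma$. This is the region formed by the two straight lines from the points $(1, 1)$ and $(\varepsilon, 1/\varepsilon)$ to $(0, 0)$ together with the rectangular hyperbola joining the points $(1, 1)$ and $(\varepsilon, 1/\varepsilon)$. Therefore

$$\sqrt{d}\,I = \int_0^1 d\rho \int_{\rho/\varepsilon^2}^{\rho} d\sigma + \int_1^{\varepsilon} d\rho \int_{\rho/\varepsilon^2}^{1/\rho} d\sigma = \int_0^1 \left(\rho - \frac{\rho}{\varepsilon^2}\right) d\rho + \int_1^{\varepsilon} \left(\frac{1}{\rho} - \frac{\rho}{\varepsilon^2}\right) d\rho = \int_1^{\varepsilon} \frac{d\rho}{\rho} = \log \varepsilon.$$

This gives

$$I = \frac{\log \varepsilon}{\sqrt{d}},$$

and the theorem is proved. □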

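12.10 The Class Number: An Analytic Expression

Theorem 10.1.

$$h(d) = \begin{cases} \dfrac{w\sqrt{|d|}}{2\pi} K(d), & \text{if } d < 0, \\[2ex] \dfrac{\sqrt{d}}{\log \varepsilon} K(d), & \text{if } d > 0. \end{cases}$$

Proof. Let

$$F_1, \ldots, F_{h(d)}$$

be a representative system. From Theorem 4.1 we have

$$\sum_F H(\tau, F) = \sum_F \sum_{\substack{1 \le k \le \tau \\ (k, d) = 1}} \psi(k, F) = \sum_{\substack{1 \le k \le \tau \\ (k, d) = 1}} \psi(k) = w \sum_{\substack{1 \le k \le \tau \\ (k, d) = 1}} \sum_{n \mid k} \left(\frac{d}{n}\right).$$

From Theorem 7.1 and Theorem 9.2 we have

$$h(d) \cdot \frac{\varphi(|d|)}{|d|} \cdot \begin{cases} \dfrac{2\pi}{\sqrt{|d|}}, & \text{if } d < 0, \\[2ex] \dfrac{\log \varepsilon}{\sqrt{d}}, & \text{if } d > 0, \end{cases} \;=\; w\,\frac{\varphi(|d|)}{|d|} K(d),$$

as required. □

Therefore our problem becomes that of the determination of the sum of the series

$$K(d) = \sum_{n=1}^{\infty} \left(\frac{d}{n}\right) \frac{1}{n}.$$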
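For $d < 0$ the formula of Theorem 10.1 can be tried out numerically by truncating the series $K(d)$. The sketch below is an illustration under stated assumptions, not a method from the text: the names are invented here, $w$ is taken to be 6, 4, 2 for $d = -3$, $d = -4$ and $d < -4$ respectively (the usual values for the number of automorphs, assumed to agree with §11.4), and the truncation at 200000 terms is a pragmatic choice justified only by the bounded partial sums of the character.

```python
import math


def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def kronecker(d, m):
    """Kronecker symbol (d/m) for d = 0 or 1 mod 4 and m > 0."""
    sign = 1
    while m % 2 == 0:
        m //= 2
        if d % 2 == 0:
            return 0
        if d % 8 == 5:
            sign = -sign
    return sign * jacobi(d, m)


def class_number_estimate(d, terms=200_000):
    """Approximate h(d) = w * sqrt(|d|) / (2 pi) * K(d) for d < 0."""
    if d >= 0:
        raise ValueError("this sketch only treats negative discriminants")
    w = 6 if d == -3 else 4 if d == -4 else 2       # assumed values of w
    period = abs(d)
    # (d/n) depends only on n mod |d| (Theorem 3.2); residue 0 gives 0.
    chi = [0] + [kronecker(d, r) for r in range(1, period)]
    k_partial = sum(chi[n % period] / n for n in range(1, terms + 1))
    return w * math.sqrt(period) / (2 * math.pi) * k_partial


if __name__ == "__main__":
    for d in (-3, -4, -7, -15, -20, -23):
        print(d, round(class_number_estimate(d), 4))
```

The printed values should be close to the class numbers obtained by counting reduced primitive forms, for instance $h(-15) = 2$ from the table in §12.6.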

12.11 The Fundamental Discriminants

Definition. By a fundamental discriminant we mean a discriminant $d$ which has no odd prime square divisor, and $d$ is odd or $d \equiv 8$ or $12 \pmod{16}$.

For example: 5, 8, 12, 13, 17, 21, 24, 28, 29, ... are fundamental discriminants.

Theorem 11.1. Each discriminant $d$ is uniquely expressible as $fm^2$ where $f$ is a fundamental discriminant.

Proof. 1) If $d$ is odd, then we let $m^2$ be the largest square that divides $d$. Write $d = fm^2$ for the required result.

2) If $d$ is even, then we first write $d = qr^2$ where $r^2$ is the largest square that divides $d$. Clearly $2 \mid r$. If $q \equiv 1 \pmod 4$, then $q$ is a fundamental discriminant. If $q \equiv 2$ or $3 \pmod 4$, then we take $f = 4q$ so that from $4q \equiv 8$ or $12 \pmod{16}$ we see that $f$ is a fundamental discriminant.

3) Uniqueness. Let $d = fm^2$, $m > 0$ and $f$ be a fundamental discriminant. If $f$ is odd, then $f$ has no square divisor so that $m^2$ is the largest square divisor of $d$. If $f$ is even, then $f \equiv 8$ or $12 \pmod{16}$, hence $4 \nmid f/4$ and therefore $(2m)^2$ is the largest square divisor of $d$. From this we see that the uniqueness property follows. □
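The proof of Theorem 11.1 is effectively an algorithm: strip the largest square $r^2$ from $d$ and adjust by a factor 4 when the square-free part $q$ is not congruent to 1 mod 4. A minimal sketch follows; the function name `fundamental_decomposition` is illustrative and trial division is used only because the inputs are small.

```python
def fundamental_decomposition(d):
    """Return (f, m) with d = f * m^2, f a fundamental discriminant and m > 0.

    Follows the proof of Theorem 11.1. Requires d = 0 or 1 (mod 4), d != 0.
    """
    if d == 0 or d % 4 not in (0, 1):
        raise ValueError("d must be a non-zero integer congruent to 0 or 1 mod 4")
    q, r = d, 1
    p = 2
    while p * p <= abs(q):
        while q % (p * p) == 0:     # remove the square factor p^2
            q //= p * p
            r *= p
        p += 1
    if q % 4 == 1:                  # q itself is a fundamental discriminant
        return q, r
    return 4 * q, r // 2            # q = 2 or 3 (mod 4); r is even in this case


if __name__ == "__main__":
    for d in (-12, -48, -96, 8, 12, 45, 20, -20):
        print(d, fundamental_decomposition(d))
```

For example $-96 = (-24) \cdot 2^2$ and $45 = 5 \cdot 3^2$.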

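Theorem 11.2. Let $d = fm^2$ be the representation in Theorem 11.1. Then

$$K(d) = \prod_{p \mid m} \left(1 - \left(\frac{f}{p}\right)\frac{1}{p}\right) K(f).$$

Proof. We have

$$K(d) = \sum_{n=1}^{\infty} \left(\frac{d}{n}\right)\frac{1}{n} = \sum_{n=1}^{\infty} \left(\frac{m^2f}{n}\right)\frac{1}{n} = \sum_{\substack{n=1 \\ (m, n) = 1}}^{\infty} \left(\frac{f}{n}\right)\frac{1}{n}.$$

Let the standard factorization of $m$ be $p_1^{l_1} \cdots p_s^{l_s}$. Then from Theorem 1.7.1 we have

$$K(d) = K(f) - \sum_{p_i \mid m} \left(\frac{f}{p_i}\right)\frac{1}{p_i} K(f) + \cdots = \prod_{p \mid m} \left(1 - \left(\frac{f}{p}\right)\frac{1}{p}\right) K(f). \qquad \square$$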

We see from this theorem that we need only determine the values for Exercise. Show that if d is a fundamental discriminant then character mod Idl.

12.12 The Class Number Formula We now assume that d is a fundamental discriminant. Let

Ji = {+Ji, i~1,

Theorem 12.1.

If 0
land d == 0 or 1 (mod 4). Also let xo, Yo be the solution to

in which Xo

+ flyo

is least (xo > 0, Yo> 0) and let B=

Xo

+ flyo 2

.

The aim of this section is to prove that

Let d ~ m 2f, where 1 is a fundamental discriminant. Theorem 13.1. Letl> 0, and A* be the least non-negative integer

1+ 1 I I

A

A*

(I)

1(jj---::=-A* + l) I -n I "'~"2 jj. a

a= 1 n= 1

== A (mod!). Then

12.13 The Least Solution to Pell's Equation

Proof We can prove, from Theorem 3.3, that

f ±(£)n =

0,

a=ln=l

so that

(I)

(I)

A A* 2:2:-=2:2:-. n n a

a=l n=l

a

a=l n=l

Also we can use the same method as in Theorem 7.9.2 to prove that

A*+I 1£ f(£)I~~(Jj_A*+I) Jj' I

a=ln=l

and so the required result follows.

n

""2

D

Theorem 13.2. Let d> 1. Then

Proof A direct computation gives

if (m,n)

=

I

if (m,n) > 1. Therefore



(Sincef~ 5 implies 1 < ~

Theorem 13.3. Let d

t.Jj; also Lklm 1 ::;:; m.)

0

5. Then K(d)
0).

(4)

From (4) we see that, given any $\sigma_0 > 0$, the series for $L_d(s)$ is uniformly convergent in any finite region in the half plane $\sigma \ge \sigma_0$. Since $\sigma_0$ can be any small positive number it follows that $L_d(s)$ is analytic in the half plane $\sigma > 0$. We now let $n_1 = 1$ and $n_2 \to \infty$ so that (2) follows. From Theorem 10.1 and $h(d) \ge 1$, $\log \varepsilon > 0$, we see that

$$L_d(1) = K(d) > 0.$$

Separating the sum for $L_d(1)$ into two parts:

$$L_d(1) = \sum_{n=1}^{\infty} \left(\frac{d}{n}\right)\frac{1}{n} = \sum_{n=1}^{|d|} \left(\frac{d}{n}\right)\frac{1}{n} + \sum_{n=|d|+1}^{\infty} \left(\frac{d}{n}\right)\frac{1}{n},$$

we see that the first part satisfies

$$\left|\sum_{n=1}^{|d|} \left(\frac{d}{n}\right)\frac{1}{n}\right| \le \sum_{n=1}^{|d|} \frac{1}{n} < 1 + \log |d|,$$

while, by (4), the second part satisfies

II

(~)~I ~_Idl
1).

(2)

From (1) and (2) we have

$$f(s) = \prod_{p} \left(\sum_{m=0}^{\infty} a_{p^m}\, p^{-ms}\right) \qquad (\sigma > 1). \tag{3}$$

Suppose now that the standard factorization of $n$ is $p_1^{l_1} \cdots p_r^{l_r}$. We define

$$a_n = a_{p_1^{l_1}} \cdots a_{p_r^{l_r}}$$

so that $a_n$ is defined for all natural numbers $n$, and is a multiplicative function satisfying $a_1 = 1$, $a_n \ge 0$. Again, by Theorem 5.4.4 and (3) we see that

$$f(s) = \sum_{n=1}^{\infty} a_n n^{-s} \qquad (\sigma > 1),$$

where $a_n$ has the requirement stated in the theorem. □

Theorem 15.2. Let d and d 1 be two fundamental discriminants, Idl > dd 1 is a discriminant. Let f(s) be defined in Theorem 15.1 and let

Id11 >

1. Then

Then, for 0 < J < a 

2

C1P jdd1IC2 (1-a), I-a

where $C_1, C_2$ are positive constants depending on $\delta$.

Proof. The function $f(s) - \rho/(s-1)$ is analytic in $|s - 2| < 1$ and has the Taylor expansion

$$f(s) - \frac{\rho}{s-1} = \sum_{m=0}^{\infty} (b_m - \rho)(2 - s)^m, \tag{4}$$

where

$$b_0 = f(2), \qquad b_m = (-1)^m \frac{f^{(m)}(2)}{m!} \quad (m = 1, 2, \ldots).$$

By Theorem 15.1, we have $f(2) \ge 1$, and

$$(-1)^m f^{(m)}(2) = \sum_{n=1}^{\infty} a_n n^{-2} \log^m n \ge 0 \quad (m = 1, 2, \ldots),$$

that is

$$b_0 \ge 1, \qquad b_m \ge 0 \quad (m = 1, 2, \ldots). \tag{5}$$



From Theorems 14.2 and 14.3 we know thatj(s) - pl(s - 1) is analytic in the right hand half plane (J > 0, so that the expansion (4) actually holds for Is - 21 < 2. We now apply Theorem 14.1 to give an upper bound for Ibm - pI. For this purpose, we first seek an upper bound for If(s) - pl(s - 1)1 on the circle Is - 21 = (2 - (j)/~ where ~ is a number satisfying 0 < ~ < 1 and 1 < (2 - (j)/~ < 2. From Theorem 14.2 and Theorem 14.3 we have Ij(s)1

~ (_1 _ + ~) (~)3Iddtl2 Is - 11 (J

Since Is - 21 = (2 1;1

(j)/~

(s ¥- 1,

(J

> 0).

(6)

(J

we have

~ (2 + 2 ~ (j)

I

(2 _ 2

)-1 ,

~ (j).

1 (2-(j Is - 11 ~ -~- - 1

and hence

(j)

2( Is-21=-~, where C3 is a positive constant depending only on (j and 14.3, we have Ipl ~ Idd1 12 , so that i j(S) - -p-i s-l

~ C4 1dd1 12 ,

~.

Also, from Theorem

2-(j

Is-21 =-~-,

(7)

and, from the maximum modulus theorem, we see that (7) also holds for Is - 21 ~ (2 - (j)g. Therefore, from Theorem 14.1 we have that Ibm - pi

~ C 1dd

l l2

4

(2

~ (j

r'

m = 0, 1,2, . . . .

(8)

We can now obtain a lower bound for $f(a)$ from the expansion (4). We have

$$f(a) = \frac{\rho}{a-1} + \sum_{m=0}^{m_0-1} (b_m - \rho)(2-a)^m + \sum_{m=m_0}^{\infty} (b_m - \rho)(2-a)^m,$$

and, by (5), we have

$$\sum_{m=0}^{m_0-1} (b_m - \rho)(2-a)^m \ge 1 - \rho \sum_{m=0}^{m_0-1} (2-a)^m = 1 - \rho\,\frac{(2-a)^{m_0} - 1}{1-a},$$

while, by (8), we have

giving (9)


12.15 Siegel's Theorem

We now choose

so that

mo
I),

and

(2 - a)mo
O. Then

$$\frac{1}{L_d(1)} = O(|d|^{\varepsilon}).$$

Proof. We can assume that $0 < \varepsilon < \tfrac12$. Let

$$f(s) = \zeta(s) L_d(s) L_{d_1}(s) L_{dd_1}(s), \qquad \rho = L_d(1) L_{d_1}(1) L_{dd_1}(1), \tag{10}$$

where d 1 is chosen as follows: If there is a fundamental discriminant d 1 such that Ld,(a) has a zero in I - e < u < 1, then we take this d 1 to be the d 1 in (10) and we denote by a any zero of Ld,(u) in this interval, so that j(a) = O. If there is no fundamental discriminant d 1 such that Ld,(u) has a zero in 1 - e < u < I, then we take any fundamental discriminant d 1 • In this case, if j(u) has zeros in 1 - e < u < 1, then we take a to be anyone of its zeros so thatf(a) = 0; ifj(u) has no zero either in the interval, then we take a to be any point in the interval I - e 0 by Theorem 14.3 and sincej(s) - p/(s - I) is analytic in the right hand half plane we see thatf(u) --+ - 00 as u tends to I from the left, and we deduce thatj(u) is negative in I - e Id11. From Theorem 15.2 (taking J = we have

(11)

t so that 0 < J < I -

e < a < 1),



where C 1 and C2 are absolute positive constants. Therefore 2C I - < __ 1 L (I)L (1)ldd IC2(I-a) = CL (1)ldI C2 (I-a) Ld(l) 1 _ a d, dd, 1 dd, , where

is a constant which does not depend on d. When Idl > Id1 1 > 1 we have L dd ,(I) ~ 2

+ loglddd
0 we have, by Theorem

15.3 and Theorem 14.3, (12) and so by Theorem 10.1

Cloldlt-~ ~ h(d) { loge 1 } ~ Cllldlt+~ which is the required result. 2) If d is not a fundamental discriminant and d = fm 2 , where fis a fundamental discriminant, then from

K(d) = plmn (1 - ([)~)K(f), P P


we have C13m-~K(f)

::;:; K(d) ::;:;

C12m~K(f).

From (12) we arrive at C14Idlt-~::;:; IdltK(d) ::;:; C15Idlt+~

and the theorem follows from Theorem 10.1.

D

Notes

12.1. The method of D. A. Burgess (see Note 7.2) can be used to give an improved estimate on the least solution $\varepsilon = (x_0 + \sqrt{d}\,y_0)/2$ to Pell's equation $x^2 - dy^2 = 4$, $d > 0$, $d \equiv 0$ or $1 \pmod 4$. The result is: corresponding to every $\delta > 0$ there exists a constant $c(\delta)$ such that

$$\log \varepsilon < \left(\tfrac14 + \delta\right) \sqrt{d}\, \log d$$

whenever $d > c(\delta)$ (see Y. Wang [63]).

Chapter 13. Unimodular Transformations

13.1 The Complex Plane

Let $z = x + yi$ be a complex number which is represented by a point $P$ on a plane with coordinates $(x, y)$. From the origin $O$ we construct a directed line to $P$ and we call this line the vector $OP$. There is a bijection between $z$ and $P$ so that every complex number now corresponds to a vector from the origin.

(Figure: the point $P(x, y)$ and the vector $OP$, with axes $x$, $y$ and origin $O$.)

The distance from $O$ to $P$, also known as the length of the vector $OP$, is given by $\rho = \sqrt{x^2 + y^2}$, and is the same as the absolute value of $z$. The angle $\theta$ measured from the positive $x$-axis to the vector $OP$ is called the argument of $z$. We have

$$x = \rho \cos\theta, \qquad y = \rho \sin\theta,$$

and $(\rho, \theta)$ are referred to as the polar coordinates of the point $(x, y)$. Clearly we have

$$z = x + yi = \rho(\cos\theta + i \sin\theta) = \rho e^{i\theta}.$$

We usually write $\arg z = \theta$. The circle centre $c$ with radius $r$ ($\ge 0$) can be represented by the equation

$$|z - c| = r,$$

and the particular circle $|z| = 1$ is called the unit circle. We next investigate the bilinear transformation

$$z' = \frac{az + b}{cz + d}, \tag{1}$$

where $a, b, c, d$ are (in general complex) constants, and $ad - bc \ne 0$. This transformation maps a point $z$ ($\ne -d/c$) in the plane into another point $z'$. Corresponding to the point $z = -d/c$ we introduce an ideal point, called the point at infinity, for its image and we write $z' = \infty$. Our discussion is concerned with the plane together with this ideal point. This is often called the extended complex plane, but in this chapter we shall simply call it the complex plane. Corresponding to the point $z = \infty$ we have the image $z' = a/c$. If we solve (1) for $z$, we have

$$z = \frac{-dz' + b}{cz' - a},$$

which is also a bilinear transformation known as the inverse transformation of (1). We see therefore that the transformation (1) is a bijection from the complex plane onto itself.

Let us place a sphere on the complex plane with point of contact at the origin. We may refer to this point of contact as the "south-pole", and the point on the sphere which is diametrically opposite to this as the "north-pole". Consider a line joining a point $z$ on the plane to the "north-pole". This line crosses the sphere at a point, and if we map the point $z$ onto this point and the point at infinity onto the "north-pole" we see at once that this sets up a bijection between the complex plane and the surface of the sphere. This replacement of the abstract notion of the complex plane with an ideal point by the concrete notion of the surface of a sphere is due to Riemann, and we often call the sphere here the Riemann sphere.
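The bilinear transformation (1), its inverse, and the convention for the point at infinity can be tried out numerically. The following Python sketch is only an illustration: the names `INF`, `apply` and `inverse`, and the choice of representing the ideal point by a complex number with infinite real part, are assumptions of the sketch and not part of the text.

```python
import math

# Stand-in for the ideal point at infinity (an assumption of this sketch).
INF = complex(math.inf, 0)

def apply(m, z):
    """Image of z under z' = (a z + b) / (c z + d), with m = (a, b, c, d)."""
    a, b, c, d = m
    if z == INF:
        return INF if c == 0 else a / c     # z = infinity goes to a/c
    den = c * z + d
    return INF if den == 0 else (a * z + b) / den   # z = -d/c goes to infinity

def inverse(m):
    """Coefficients of the inverse transformation z = (-d z' + b) / (c z' - a)."""
    a, b, c, d = m
    return (-d, b, c, -a)
```

With `m = (2, 1j, 1, 3)`, for example, `apply(m, -3)` returns `INF` and `apply(inverse(m), apply(m, z))` returns `z` up to rounding.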

13.2 Properties of the Bilinear Transformation

Corresponding to a bilinear transformation $A$:

$$z' = \frac{az + b}{cz + d}, \tag{1}$$

there is a matrix

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \tag{2}$$

whose determinant is $ad - bc$ ($\ne 0$), which we call the determinant of the transformation. Note that different matrices may correspond to the same transformation, since

$$\begin{pmatrix} a\rho & b\rho \\ c\rho & d\rho \end{pmatrix} \qquad (\rho \ne 0)$$

all represent the same transformation (1). However it is not difficult to prove that, apart from this situation, there is no other matrix which corresponds to the transformation (1). We can choose $\rho$ so that $\rho^2(ad - bc) = 1$, so that there is always a unit determinant matrix to represent the bilinear transformation $A$. It is easy to show that there are only two unit determinant matrices which correspond to a given bilinear transformation, namely the matrices

$$\begin{pmatrix} \pm a & \pm b \\ \pm c & \pm d \end{pmatrix}$$

(the same sign being taken throughout).

Let there be another bilinear transformation $B$:

$$z'' = \frac{a'z' + b'}{c'z' + d'}, \tag{3}$$

so that we have the bilinear transformation $C$:

$$z'' = \frac{a'(az + b) + b'(cz + d)}{c'(az + b) + d'(cz + d)} = \frac{(a'a + b'c)z + a'b + b'd}{(c'a + d'c)z + c'b + d'd} \tag{4}$$

with corresponding matrix

$$\begin{pmatrix} a'a + b'c & a'b + b'd \\ c'a + d'c & c'b + d'd \end{pmatrix}$$

known as the product of the two matrices $\begin{pmatrix} a' & b' \\ c' & d' \end{pmatrix}$ and $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$, and we write

$$\begin{pmatrix} a'a + b'c & a'b + b'd \\ c'a + d'c & c'b + d'd \end{pmatrix} = \begin{pmatrix} a' & b' \\ c' & d' \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$

The transformation (4) is also referred to as the product of the transformations (3) and (1) and we write $C = BA$. Note however that $BA$ is not necessarily the same as $AB$. We denote by $A^{-1}$ the inverse transformation to $A$. The transformation $z' = z$ is called the identity transformation and is denoted by $E$. We have

$$AA^{-1} = A^{-1}A = E.$$
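The correspondence between the product of transformations and the product of matrices in (4) can be checked on an example. In this Python sketch the names `compose` and `apply` and the sample coefficients are hypothetical; a transformation is stored as the tuple `(a, b, c, d)`.

```python
def apply(m, z):
    """Image of z under z' = (a z + b) / (c z + d)."""
    a, b, c, d = m
    return (a * z + b) / (c * z + d)

def compose(B, A):
    """Matrix of the product C = BA (apply A first, then B), as in (4)."""
    a, b, c, d = A
    a2, b2, c2, d2 = B
    return (a2 * a + b2 * c, a2 * b + b2 * d,
            c2 * a + d2 * c, c2 * b + d2 * d)

A = (1, 2, 3, 4)
B = (0, -1, 1, 1)
z = 0.5 + 1j
# Applying A and then B agrees with applying the product matrix once.
difference = abs(apply(compose(B, A), z) - apply(B, apply(A, z)))
```

Here `compose(B, A)` is `(-3, -4, 4, 6)` while `compose(A, B)` is `(2, 1, 4, 1)`, which illustrates that $BA$ and $AB$ need not coincide.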

Definition 1.* Let a set of bilinear transformations have the following three properties: (i) it contains the identity transformation, (ii) the product of any two transformations in the set is also in the set, (iii) the inverse of any transformation is also in the set. Then we say that the set of transformations form a group.

* The three properties here are interrelated, but they suffice for our purpose of keeping matters simple and easy.

Example 1. The set of all bilinear transformations form a group.

Example 2. The set of all bilinear transformations with real coefficients form a group.

Example 3. The set of all bilinear transformations with real coefficients and positive determinants form a group.

Example 4. The set of all bilinear transformations with integer coefficients $a, b, c, d$ satisfying $ad - bc = \pm 1$ form a group.

Example 5. The set of bilinear transformations with complex integer (that is $a = a' + a''i$, with $a', a''$ integers) coefficients form a group.

Definition 2. If the image of $z_0$ under the transformation $A$ is $z_0$ itself, then we call $z_0$ a fixed point of $A$.

In general a bilinear transformation has two distinct fixed points (from $z' = z$). They are the two roots of the quadratic equation

$$cz^2 + (d - a)z - b = 0. \tag{5}$$

If $z_1, z_2$ are the two roots of this equation, then we can rewrite the transformation in the standard form

$$\frac{z' - z_1}{z' - z_2} = \lambda\,\frac{z - z_1}{z - z_2}. \tag{6}$$

Taking $z = \infty$ so that $z' = a/c$ we can specify $\lambda$ as

$$\lambda = \frac{a - cz_1}{a - cz_2}.$$

It is easy to show that $\lambda$ satisfies the quadratic equation

$$\lambda + \frac{1}{\lambda} = \frac{a^2 + d^2 + 2bc}{ad - bc} = \frac{(a + d)^2}{ad - bc} - 2. \tag{7}$$

If $|\lambda| = 1$, $\lambda \ne 1$, then we say that the transformation is elliptic. If $\lambda$ is real and not equal to $\pm 1$, then we say that the transformation is hyperbolic. If $\lambda$ is complex and $|\lambda| \ne 1$, then we say that the transformation is loxodromic.

If $c = 0$ and $d - a \ne 0$, then one of the fixed points is the point at infinity. Taking $z_2 = \infty$ equation (6) then becomes

$$z' - z_1 = \lambda(z - z_1). \tag{8}$$

If the two fixed points coincide, that is $z_1 = z_2$, then

$$(a - d)^2 + 4bc = 0 \quad \text{or} \quad (a + d)^2 + 4(bc - ad) = 0. \tag{9}$$



A transformation satisfying this condition is said to be parabolic. Substituting (9) into (7) gives $\lambda = 1$ and the standard equation (6) becomes

$$\frac{1}{z' - z_1} = \frac{1}{z - z_1} + k, \tag{10}$$

where $z_1 = (a - d)/2c$, $k = 2c/(a + d)$. In particular when $c = 0$, $a - d = 0$, this fixed point becomes the point at infinity and the transformation then becomes

$$z' = z + k, \qquad k = b/a.$$

If on the repeated applications of a transformation the product becomes the identical transformation then we call the transformation a transformation of finite order. In this case, the period of the transformation is defined to be the least number of applications required to result in the identical transformation. Writing $z^{(n)}$ for the image of $z$ after $n$ applications, repeated applications of (10) and (6) give

$$\frac{1}{z^{(n)} - z_1} = \frac{1}{z - z_1} + nk, \qquad \frac{z^{(n)} - z_1}{z^{(n)} - z_2} = \lambda^n\,\frac{z - z_1}{z - z_2},$$

so that the parabolic, hyperbolic and loxodromic transformations are not of finite order. Only for elliptic transformations do we have $\lambda^n = 1$ and the period is the least positive integer $n$ such that $\lambda^n = 1$. When $n = 2$ so that $\lambda = -1$ we call the transformation an involution.
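The classification above depends only on the quantity $\lambda$ determined by equation (7). The following Python sketch solves (7) for one root $\lambda$ and applies the definitions literally; the function names and the tolerance `tol` are assumptions made for the illustration, and the identity transformation (for which $\lambda = 1$ as well) is not distinguished from the parabolic case.

```python
import cmath

def multiplier(a, b, c, d):
    """One root of lambda + 1/lambda = (a + d)^2 / (ad - bc) - 2, equation (7)."""
    t = (a + d) ** 2 / (a * d - b * c) - 2
    return (t + cmath.sqrt(t * t - 4)) / 2

def classify(a, b, c, d, tol=1e-9):
    lam = multiplier(a, b, c, d)
    if abs(lam - 1) < tol:
        return "parabolic"        # also returned for the identity map
    if abs(abs(lam) - 1) < tol:
        return "elliptic"         # |lambda| = 1, lambda != 1
    if abs(lam.imag) < tol:
        return "hyperbolic"       # lambda real, not equal to +1 or -1
    return "loxodromic"           # lambda complex with |lambda| != 1
```

The other root of (7) is $1/\lambda$, which leads to the same classification, so the choice of root does not matter here.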

13.3 Geometric Properties of the Bilinear Transformation

Theorem 3.1. The cross ratio is invariant under a bilinear transformation.

Proof. Let

$$z_i' = \frac{az_i + b}{cz_i + d},$$

so that

$$z_i' - z_j' = \frac{(ad - bc)(z_i - z_j)}{(cz_i + d)(cz_j + d)},$$

and hence, on forming the cross ratio of four image points, the factors $ad - bc$ and $cz_i + d$ cancel, leaving the cross ratio of the four original points.

Given any three points $z_1, z_2, z_3$ there exists a bilinear transformation which maps them onto any three specified points $z_1', z_2', z_3'$. This transformation can be written down explicitly by

$$\frac{z' - z_1'}{z' - z_2'} = \frac{z_3' - z_1'}{z_3' - z_2'} \cdot \frac{z_3 - z_2}{z_3 - z_1} \cdot \frac{z - z_1}{z - z_2}, \tag{1}$$

or

$$\frac{z_3' - z_2'}{z_3' - z_1'} \cdot \frac{z' - z_1'}{z' - z_2'} = \frac{z_3 - z_2}{z_3 - z_1} \cdot \frac{z - z_1}{z - z_2},$$

or

$$(z_1' z_2' z_3' z') = (z_1 z_2 z_3 z). \tag{2}$$

If there is a bilinear transformation with the above property, then by Theorem 3.1, once $z$ has been specified, $z'$ must satisfy (2). That is, $z'$ is uniquely determined. Therefore a bilinear transformation with the above property is unique. In other words, (2) is a general form for a bilinear transformation. Let $A_1, A_2, A_3, P$ be the points representing $z_1, z_2, z_3, z$ respectively. Then we have

where the direction of the signed angle is as shown in the diagram. From this we see that if the cross ratio is a real number, then

must be a multiple of $\pi$, and hence $P$ lies on the circle through the three points $A_1, A_2, A_3$.

If $(z_1 z_2 z_3 z)$ is real, then by (2), $(z_1' z_2' z_3' z')$ is also real, so that as $z$ describes the circle through $z_1, z_2, z_3$, the point $z'$ will describe the circle through $z_1', z_2', z_3'$, and conversely. We have therefore proved that a bilinear transformation maps circles into circles. Note however that, in the present context, a straight line is interpreted as a circle with infinite radius.
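Theorem 3.1 and the three-point construction can be tested numerically. In the Python sketch below the ordering of the four arguments inside `cross_ratio` is an assumption (it is one common convention, chosen to agree with the explicit three-point formula above); the function names and sample points are hypothetical.

```python
def cross_ratio(z1, z2, z3, z4):
    # Assumed convention: (z3 - z2)(z4 - z1) / ((z3 - z1)(z4 - z2)).
    return ((z3 - z2) * (z4 - z1)) / ((z3 - z1) * (z4 - z2))

def mobius(m, z):
    a, b, c, d = m
    return (a * z + b) / (c * z + d)

def map_three_points(z, src, dst):
    """Image of z under the transformation sending src[i] to dst[i] (i = 0, 1, 2)."""
    z1, z2, z3 = src
    w1, w2, w3 = dst
    k = cross_ratio(z1, z2, z3, z)       # equals the cross ratio of the images
    r = k * (w3 - w1) / (w3 - w2)        # r = (z' - w1) / (z' - w2)
    return (w1 - r * w2) / (1 - r)       # solve for z'

m = (2 + 1j, 1, 1, 3 - 1j)
pts = [0.3 + 0.2j, -1 + 2j, 4, 1j]
imgs = [mobius(m, p) for p in pts]
```

The cross ratio of `pts` agrees with that of `imgs`, and `map_three_points(pts[3], pts[:3], imgs[:3])` reproduces `imgs[3]`, in line with the uniqueness statement.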



Theorem 3.2. A bilinear transformation preserves the angle of intersection between two circles. That is, if two circles intersect with angle $\theta$, then the two image circles of a bilinear transformation also intersect with angle $\theta$.

Proof. Let the two circles intersect at $z_1$ and $z_2$, and take two points $z_3, z_4$ in the neighbourhood of $z_1$ on the two circles. The argument of the cross ratio $\arg(z_3 z_4 z_1 z_2)$ is $\angle z_3 z_2 z_4 - \angle z_3 z_1 z_4$. As $z_3$ and $z_4$ both tend to $z_1$ this gives the value of the angle of intersection for the two circles. Since the cross ratio is invariant under the bilinear transformation, the theorem is proved. □

13.4 Real Transformations

We now consider the transformation

$$z' = \frac{az + b}{cz + d}, \qquad ad - bc \ne 0,$$

where $a, b, c, d$ are real numbers. Here we cannot always choose $\rho$ so that $\rho^2(ad - bc) = 1$; we can only choose $\rho$ so that $\rho^2(ad - bc) = \pm 1$. From now on we shall assume that

$$ad - bc = \pm 1.$$

The set of all real bilinear transformations with determinant 1 form a group which we denote by $\mathfrak{R}$. Clearly members of this group map the real axis onto itself. Moreover, given any three real numbers, there is a member which maps them onto any three specified real numbers.

Theorem 4.1. Members of $\mathfrak{R}$ map the upper half plane (that is $y > 0$) onto itself.

Proof. Let $z' = x' + iy'$, $z = x + iy$, $\bar z = x - iy$. Then

$$2iy' = \frac{az + b}{cz + d} - \frac{a\bar z + b}{c\bar z + d} = \frac{2(ad - bc)iy}{|cz + d|^2}. \tag{1}$$

Since $ad - bc = 1$, this gives $y' = y/|cz + d|^2$, so that $y' > 0$ whenever $y > 0$. □

Definition 1. A semicircle centred on the $x$-axis lying in the upper half plane is called a geodesic.

From Theorem 4.1 and Theorem 3.2 we have:

Theorem 4.2. Members of $\mathfrak{R}$ transform geodesics into geodesics. □

Let $z_1, z_2$ be any two points in the upper half plane. If a member of $\mathfrak{R}$ maps $z_1, z_2$ into $z_1', z_2'$ respectively, then clearly

$$\left|\frac{z_2 - z_1}{\bar z_1 - z_2}\right|^2 = \left|\frac{z_2' - z_1'}{\bar z_1' - z_2'}\right|^2.$$

Take $z_2 = z + \Delta z$, $z_1 = z$ and letting $\Delta z \to 0$ we have

$$\left|\frac{dz}{2y}\right|^2 = \left|\frac{dz'}{2y'}\right|^2,$$

or

$$\frac{dx^2 + dy^2}{y^2} = \frac{dx'^2 + dy'^2}{y'^2}.$$

From this we see that the metric

$$\frac{dx^2 + dy^2}{y^2} \tag{2}$$

is invariant under transformations in $\mathfrak{R}$. The area

$$\frac{dx\,dy}{y^2} \tag{3}$$

which corresponds to this metric is also invariant under transformations in $\mathfrak{R}$. Readers who are not familiar with differential geometry can prove the invariance of (2) and (3) under members of $\mathfrak{R}$ by a direct method.

Theorem 4.3. Let $z_1, z_2$ be two points on the upper half plane and let $C$ be a smooth curve lying in the upper half plane joining $z_1$ and $z_2$. Then the value of the integral

$$\int_C \frac{\sqrt{dx^2 + dy^2}}{y}$$

is minimum when $C$ is (part of) a geodesic.

Proof. Construct a circle centred on the $x$-axis passing through $z_1$ and $z_2$. Denote its centre by $(t, 0)$ so that the circle is described by



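$$x = t + \rho \cos\theta, \qquad y = \rho \sin\theta.$$

Let $\theta = \theta_1$ and $\theta_2$ when $z = z_1$ and $z_2$ respectively. Now the curve $C$ can be described by

$$x = t + \rho(\theta)\cos\theta, \qquad y = \rho(\theta)\sin\theta$$

and hence

$$\int_C \frac{\sqrt{dx^2 + dy^2}}{y} = \int_{\theta_1}^{\theta_2} \frac{\sqrt{(\rho'(\theta)\cos\theta - \rho(\theta)\sin\theta)^2 + (\rho'(\theta)\sin\theta + \rho(\theta)\cos\theta)^2}}{\rho(\theta)\sin\theta}\,d\theta$$

$$= \int_{\theta_1}^{\theta_2} \sqrt{1 + \left(\frac{\rho'(\theta)}{\rho(\theta)}\right)^2}\,\frac{d\theta}{\sin\theta} \ge \int_{\theta_1}^{\theta_2} \frac{d\theta}{\sin\theta} = \log \frac{\tan\frac12\theta_2}{\tan\frac12\theta_1}.$$

This shows that the value of the integral is minimum when and only when $\rho'(\theta) = 0$, that is when $\rho(\theta) = \rho$ is constant. □

Figure 1

The above proof actually gives the minimum value of the integral along the geodesic. We can interpret the value geometrically as follows: Let the geodesic through $z_1, z_2$ intersect the $x$-axis at the points $A, B$ with its centre at $C$ (see Fig. 1). Then we have

$$\tan\tfrac12\theta_1 = \frac{Bz_1}{z_1A}, \qquad \tan\tfrac12\theta_2 = \frac{Bz_2}{z_2A}.$$

Therefore the minimum value of the integral is

$$\log \frac{\tan\frac12\theta_2}{\tan\frac12\theta_1} = \log\left(\frac{Bz_2}{z_2A} \Big/ \frac{Bz_1}{z_1A}\right).$$

Definition 2. The minimum value of the integral in Theorem 4.3 is called the non-Euclidean distance between the two points $z_1$ and $z_2$.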
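The minimum value $\log(\tan\frac12\theta_2/\tan\frac12\theta_1)$ from the proof of Theorem 4.3 can be computed directly from the two points. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, the absolute value is taken so that the order of the points does not matter, and two points with the same real part are handled by the limiting case of a vertical line (a circle of infinite radius), for which the integral of $dy/y$ is used.

```python
import math

def noneuclidean_distance(z1, z2):
    """Non-Euclidean distance between two points of the upper half plane."""
    x1, y1, x2, y2 = z1.real, z1.imag, z2.real, z2.imag
    if x1 == x2:
        # Degenerate geodesic: a vertical line; integrate dy / y.
        return abs(math.log(y2 / y1))
    # Centre (t, 0) of the circle through z1 and z2, from |z1 - t| = |z2 - t|.
    t = (abs(z2) ** 2 - abs(z1) ** 2) / (2 * (x2 - x1))
    theta1 = math.atan2(y1, x1 - t)
    theta2 = math.atan2(y2, x2 - t)
    return abs(math.log(math.tan(theta2 / 2) / math.tan(theta1 / 2)))
```

As a check on the invariance of the metric (2), the distance between $i$ and $1 + 2i$ is unchanged after applying the real transformation $z' = (2z + 1)/(z + 1)$, whose determinant is 1.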

Definition 3. In this chapter the curvilinear triangular region between three geodesics will be called a triangle.

Theorem 4.4. The non-Euclidean area

$$\iint \frac{dx\,dy}{y^2}$$

of a triangle $ABC$ is given by

$$\pi - \angle A - \angle B - \angle C.$$

Figure 2

Proof. 1) We first consider the case $\angle B = \angle C = 0$ (see Fig. 3).

Figure 3

It is not difficult to prove that there is a real bilinear transformation which maps $B$ to the point at infinity, $C$ to the point $1$, $D$ to the point $-1$ (or $C$ to $-1$ and $D$ to $1$), and that the determinant is positive*. Thus Fig. 3 is transformed into Fig. 4. Let the coordinate for $A$ be $(x_0, y_0)$. Then

$$\int_{x_0}^{1} \int_{\sqrt{1 - x^2}}^{\infty} \frac{dx\,dy}{y^2} = \int_{x_0}^{1} \frac{dx}{\sqrt{1 - x^2}} = \sin^{-1} 1 - \sin^{-1} x_0 = \pi - \angle A.$$

* The real transformation which maps $B, C, D$ into $z' = \infty, \pm 1, \mp 1$ is given by

$$z' = \pm\,\frac{(D - 2B + C)z + (BC - 2DC + BD)}{(C - D)z + (D - C)B}$$

and the value of the determinant is $\pm 2(D - C)(C - B)(B - D)$.

Figure 4

Figure 5

2) If $\angle C = 0$, then we use a real transformation to map $C$ to $\infty$, giving Fig. 5. From 1) we have

$$\triangle ABC = \triangle BDC - \triangle ADC = (\pi - \angle B) - (\pi - (\pi - \angle A)) = \pi - \angle A - \angle B.$$

3) If none of $\angle A, \angle B, \angle C$ is zero as in Fig. 6,

Figure 6

then, by 2), we have

$$\triangle ABC = \triangle ADC - \triangle ABD = (\pi - \angle C - \angle A - \angle BAD) - [\pi - (\pi - \angle B) - \angle BAD] = \pi - \angle A - \angle B - \angle C. \qquad \square$$

From this theorem we see that the sum of the interior angles of a triangle is at most two right angles, and its value can be any number between $0$ and $\pi$. What we have described here is a model of the famous Lobachevskian geometry which is an important tool in the study of modular functions.
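Theorem 4.4 can be checked numerically on a simple case. The Python sketch below takes the triangle bounded by the vertical lines $x = \pm\frac12$ and the unit semicircle, whose vertices are the point at infinity and $(\pm\frac12, \frac{\sqrt3}{2})$, with interior angles $0, \pi/3, \pi/3$. The function name, the step count and the use of the midpoint rule are choices made for this illustration only.

```python
import math

def area_above_unit_circle(x_left, x_right, steps=200_000):
    """Midpoint-rule value of the integral of dx dy / y^2 over the region
    x_left <= x <= x_right, y >= sqrt(1 - x^2)."""
    h = (x_right - x_left) / steps
    total = 0.0
    for k in range(steps):
        x = x_left + (k + 0.5) * h
        # Inner integral of dy / y^2 from sqrt(1 - x^2) to infinity.
        total += h / math.sqrt(1.0 - x * x)
    return total

area = area_above_unit_circle(-0.5, 0.5)
predicted = math.pi - 0 - math.pi / 3 - math.pi / 3   # pi minus the three angles
```

Both `area` and `predicted` are approximately $\pi/3 \approx 1.0472$.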

13.5 Unimodular Transformations

Definition. Let $a, b, c, d$ be integers satisfying $ad - bc = 1$. Then the transformation

$$z' = \frac{az + b}{cz + d} \tag{1}$$

is called a unimodular transformation.


It is easy to see that unimodular transformations form a group. From (7) in §2 we have

$$\lambda + \lambda^{-1} = (a + d)^2 - 2.$$

The discriminant of this quadratic equation is

$$[(a + d)^2 - 2]^2 - 4 = (a + d)^2[(a + d)^2 - 4].$$

In our discussion we may assume that $a + d \ge 0$, since otherwise we can replace $a, b, c, d$ by $-a, -b, -c, -d$.

1) If $a + d > 2$, then the transformation is hyperbolic and there are two real fixed points. These two fixed points are the roots of the quadratic equation

$$cz^2 + (d - a)z - b = 0.$$

The condition for this quadratic equation to have rational roots is that

$$(d - a)^2 + 4bc = (a + d)^2 - 4 = u^2,$$

where $u$ is an integer. Since the only integer solutions for $x^2 - y^2 = 4$ are $x = \pm 2$, $y = 0$, it follows that the fixed points of a hyperbolic transformation must be irrational numbers which are the roots of a quadratic equation with rational coefficients. We call such numbers quadratic algebraic numbers.

2) If $a + d = 2$, then $\lambda = 1$ and we have the parabolic transformation

$$\frac{1}{z' - (a - 1)/c} = \frac{1}{z - (a - 1)/c} + c.$$

If $c = 0$, then $a = d = 1$ and we have

$$z' = z + b.$$

The former has the rational number $(a - 1)/c$ as the fixed point while the latter has $\infty$ as the fixed point.

3) If $a + d = 1$, then $\lambda^2 + \lambda + 1 = 0$ and so $\lambda$ is $\rho = e^{2\pi i/3} = (-1 + \sqrt{-3})/2$ or $\rho^2$. The fixed points are then given by

$$z_1 = \frac{a + \rho^2}{c}, \qquad z_2 = \frac{a + \rho}{c},$$

and the standard form for the transformation is

$$\frac{z' - (a + \rho^2)/c}{z' - (a + \rho)/c} = \rho\,\frac{z - (a + \rho^2)/c}{z - (a + \rho)/c}.$$

This is an elliptic transformation whose period is 3. Replacing $\rho$ by $\rho^2$ will give another elliptic transformation with period 3.

4) If $a + d = 0$, then the equation for $\lambda$ is $(\lambda + 1)^2 = 0$ so that $\lambda = -1$, and the fixed points are the roots of



$$cz^2 - 2az - b = 0,$$

that is

$$z = \frac{a \pm i}{c}.$$

The standard form for the transformation is

$$\frac{z' - (a + i)/c}{z' - (a - i)/c} = -\,\frac{z - (a + i)/c}{z - (a - i)/c}.$$

This is an elliptic transformation with period 2. Summarizing we have:

Theorem 5.1. If $a + d = 0$, then the unimodular transformation (1) is an involution; if $a + d = \pm 1$, then it is a transformation with period 3; if $a + d = \pm 2$ then it is parabolic and its fixed point is either a rational number or the point at infinity; if $|a + d| > 2$, then it is hyperbolic and its fixed points are real quadratic algebraic numbers. □
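Theorem 5.1 lends itself to a quick experiment with integer matrices. In the Python sketch below, a unimodular transformation is stored as a tuple `(a, b, c, d)`, the period is found by repeated multiplication until the matrix is plus or minus the identity (both represent the identity transformation), and the search bound is an arbitrary choice. All names are hypothetical, and the identity itself, which has $a + d = 2$, is reported under the parabolic label.

```python
def mat_mul(A, B):
    a, b, c, d = A
    e, f, g, h = B
    return (a * e + b * g, a * f + b * h, c * e + d * g, c * f + d * h)

def period(M, bound=12):
    """Least n >= 1 with M^n equal to +I or -I, or None if none up to `bound`."""
    P = M
    for n in range(1, bound + 1):
        if P in ((1, 0, 0, 1), (-1, 0, 0, -1)):
            return n
        P = mat_mul(P, M)
    return None

def kind(M):
    """Type of a unimodular transformation according to |a + d| (Theorem 5.1)."""
    a, b, c, d = M
    assert a * d - b * c == 1
    names = {0: "involution", 1: "elliptic, period 3", 2: "parabolic"}
    return names.get(abs(a + d), "hyperbolic")
```

For example, $z' = -1/z$ has period 2 and $z' = -1/(z + 1)$ has period 3, while $z' = z + 1$ and $z' = (2z + 1)/(z + 1)$ never return to the identity.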

13.6 The Fundamental Region

Definition 1. Let $z, z'$ be two points on the upper half plane. Suppose that there is a unimodular transformation which maps $z$ into $z'$. Then we say that $z$ and $z'$ are equivalent, and we write $z \sim z'$.

Clearly we have (i) $z \sim z$; (ii) if $z \sim z'$, then $z' \sim z$; (iii) if $z \sim z'$, $z' \sim z''$, then $z \sim z''$.

We shall consider the following region in the upper half plane:

$$D: \quad -\tfrac12 \le x < \tfrac12, \qquad \begin{cases} x^2 + y^2 > 1 & \text{when } x > 0,\\ x^2 + y^2 \ge 1 & \text{when } x \le 0. \end{cases}$$

Figure 7


Definition 2. We call the points in $D$ reduced points, and the region $D$ the fundamental region. This region $D$ is a triangle with interior angles $0, \pi/3, \pi/3$.

Theorem 6.1. No two reduced points are equivalent.

Proof. Let $z, z'$ be two distinct reduced points and suppose that

$$z' = \frac{az + b}{cz + d}.$$

Then, by (1) in §4, we have

$$y' = \frac{y}{|cz + d|^2}.$$

We have

$$|cz + d|^2 = c^2 z\bar z + cd(z + \bar z) + d^2 = c^2(x^2 + y^2) + 2cdx + d^2 \ge c^2 - |cd| + d^2 > 1$$

where we must exclude the exceptional cases: $c = \pm 1$, $d = 0$, or $c = 0$, $d = \pm 1$, or $c = d = 1$. Therefore, apart from these exceptional cases, we always have $y' < y$.

When $c = d = 1$, $|cz + d|^2 = 1$ only when $z = \rho$. From $a - b = 1$ and $\rho^2 + \rho + 1 = 0$, we have

$$z' = \frac{a\rho + b}{\rho + 1} = -\frac{a\rho + b}{\rho^2} = -a\rho^2 - b\rho = -\rho^2 + b.$$

Therefore $\Im(z') = \sqrt{3}/2$. If $z' \in D$, then $z' = \rho$ which contradicts $z, z'$ being distinct points. We also have

$$z = \frac{dz' - b}{-cz' + a}$$

so that y 1, then z cannot be a reduced point. If Izl = 1, then z must lie on the arc of the circle from p to i, and z' (= - liz) lies on the arc of the circle from p + 1 to i. If z =I i, then z' cannot be a reduced point, and if z = i, then z' = i = z, contradicting our assumption. 0

Theorem 6.2. The number of points in the rectangle $-\tfrac12 \le x < \tfrac12$, $y \ge \gamma$ ($\gamma > 0$) which are equivalent to a fixed point is finite. That is, if we partition the rectangle into sets of mutually equivalent points, then each set has only a finite number of points.

Proof. Let $z = x + iy$ and

$$z' = \frac{az + b}{cz + d}.$$

Then we have

$$y' = \frac{y}{|cz + d|^2}.$$

If $y' \ge \gamma$, then

$$c^2(x^2 + y^2) + 2cdx + d^2 \le \frac{y}{\gamma},$$

or

$$(cx + d)^2 + c^2y^2 \le \frac{y}{\gamma},$$

and clearly there can only be a finite number of integers $c, d$ satisfying this. Let $(c', d')$ be any such pair of integers, and $(c', d') = 1$. Then all the solutions $(a, b)$ of the equation $ad' - bc' = 1$ can be represented by

$$a = a' + mc', \qquad b = b' + md'$$

where $a', b'$ is a fixed solution (that is $a'd' - b'c' = 1$), and $m$ is any integer. Thus

$$z' = \frac{az + b}{c'z + d'} = \frac{a'z + b'}{c'z + d'} + m.$$

There can only be one $m$ such that $-\tfrac12 \le x' < \tfrac12$. Therefore corresponding to each pair $(c', d')$ with $(c', d') = 1$ there is only one set $a, b$ such that $-\tfrac12 \le x' < \tfrac12$. Therefore the number of points in the rectangle which are equivalent to $z$ is finite. □


Theorem 6.3. Every point in the upper half plane is equivalent to a unique reduced point. Proof Let z = Xo

+ iyo, Yo > O. We take the unique integer m satisfying -t~xo

+m 1, then z' is a reduced point and there is nothing more to prove. If Iz'l = I and z' lies on the arc from p to i, then it is a reduced point, and if it lies on the arc from I + p to i, then the transformation - liz will give the former situation. If Iz'l < I, then we let I

and

z"= - -

z'

Choose m' such that

z'" =z"

+ m',

-t~ x'"

Yo·

From Theorem 6.2 we see that there can only be a finite number of such points. Therefore every point must be equivalent to a reduced point. Also, from Theorem 6.1, there cannot be two equivalent reduced points. The theorem is proved. □

In order to appreciate the significance of this theorem the reader should try to give direct proofs of the following two exercises which are immediate consequences of the theorem.

Exercise 1. All the points

$$z = \frac{a + i}{c}, \qquad a^2 + bc + 1 = 0$$

are equivalent to $i$.

Exercise 2. All the points

$$z = \frac{a + \rho}{c}, \qquad a(1 - a) - bc = 1$$

are equivalent to $\rho$.
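The reduction used in the proof of Theorem 6.3 (translate into the strip $-\frac12 \le x < \frac12$, then apply $z' = -1/z$ when the point lies inside the unit circle or on the right-hand arc) can be sketched in floating point. The function name, the tolerance `eps` used near the boundary, and the step limit are assumptions of this sketch; an exact treatment of boundary points would need exact arithmetic.

```python
import math

def reduce_point(z, eps=1e-12, max_steps=10_000):
    """Return (w, (a, b, c, d)) with w = (a z + b)/(c z + d) a reduced point
    and a, b, c, d integers satisfying ad - bc = 1."""
    a, b, c, d = 1, 0, 0, 1
    w = z
    for _ in range(max_steps):
        m = -math.floor(w.real + 0.5)       # translation into -1/2 <= x < 1/2
        w += m
        a, b = a + m * c, b + m * d         # left-multiply by (1 m; 0 1)
        n2 = abs(w) ** 2
        if n2 < 1 - eps or (n2 <= 1 + eps and w.real > eps):
            w = -1 / w                      # apply z' = -1/z
            a, b, c, d = -c, -d, a, b       # left-multiply by (0 -1; 1 0)
        else:
            return w, (a, b, c, d)
    raise RuntimeError("no reduced point found within max_steps")
```

For the point $z = (2 + i)/5$ of Exercise 1 (with $a = 2$, $c = 5$, $b = -1$) the sketch returns a value within rounding error of $i$, together with the matrix $(2, -1, 1, 0)$, that is $z' = (2z - 1)/z$.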



13.7 The Net of the Fundamental Region

Theorem 7.1. Suppose that $z$ is not a fixed point of any unimodular transformation. Let $U, V$ be two distinct unimodular transformations. Then $Uz \ne Vz$, where $Uz$ represents the image of $z$ under $U$.

Proof. If $Uz = Vz$, then $z = U^{-1}Vz$, so that $z$ is a fixed point of a unimodular transformation. □

Theorem 7.2. The set of all triangular images of the fundamental region forms a covering of the upper half plane with no overlaps.

Proof. The first part of the theorem follows from Theorem 6.3. If $U$ and $V$ are two distinct unimodular transformations whose triangular images of $D$ overlap, then the mapping $U^{-1}V$ must map $D$ into a triangular region which overlaps with $D$. Let $z$ be a point in this overlap. Then there must be a point in $D$ which is equivalent to $z$, and this is impossible if $z$ is in $D$. □

We can explain this theorem in terms of tile covering. In ordinary space we can cover regions without overlaps using equal size square tiles, and by this we mean that each tile can be "translated" from one place to another which is occupied by another tile. Here the fundamental region is the shape of our new tile, and "translation" is now a unimodular transformation. The above theorem then tells us that such a tile can be used to cover the upper half plane with no overlaps. This is the interpretation of non-Euclidean geometry, and with this language the notion of a fundamental region becomes clearer, and generalization becomes easier.

We can alter the definition of a fundamental region as follows: Any region in the upper half plane is called a fundamental region if (i) any point must be equivalent to a point in the region; (ii) no two points in the region are equivalent.

Take any point $z$ in the upper half plane which is not a fixed point of any unimodular transformation. Construct the points $z_1, z_2, \ldots$ which are equivalent to $z$, and then construct the perpendicular bisectors of $(z, z_i)$, that is, the sets of points which are at equal non-Euclidean distances from $z$ and $z_i$. Discard the part on the side of $z_i$. Then the remaining part is a fundamental region. (The reader should supply the proof for this, and also determine the fundamental region corresponding to $z = 2i$.)

We remark that besides being important theoretically Lobachevskian geometry also has useful applications in number theory and in function theory.

We note the following: The fixed points of an elliptic transformation with period 2 lie on the lines joining vertices with angles $\pi/3$. The fixed points of an elliptic transformation with period 3 are the common vertices of six triangles. The fixed points of parabolic transformations are those points with infinitely many lines through them. The fixed points of a hyperbolic transformation cannot be vertices of any triangle (and it is even clearer that they cannot lie on the sides).

Figure 8

13.8 The Structure of the Modular Group

Let us denote by $S$ and $T$ the transformations $z' = z + 1$ and $z' = -1/z$ respectively. Then $S^{-1}$ denotes the transformation $z' = z - 1$. The three transformations transform a fundamental region into the three neighbouring regions, and conversely the transformation which maps a neighbouring region into a fundamental region must be one of $S$, $T$ or $S^{-1}$. Let $M$ be any unimodular transformation, and $z$ be any point in the fundamental region $D$. We join $z$ to $Mz$ by a curve not passing through any vertices. Suppose that the various regions that this curve crosses are

$$D, D_1, D_2, \ldots, D_n.$$

Also, denote by $M_i$ the unimodular transformation which maps $D$ into $D_i$. Now $M_1 = S$, $T$ or $S^{-1}$. Suppose that $M_k$ can be represented as a product of $S$ and $T$. Since $M_k^{-1}$ maps $D_k$ into $D$, and $D_{k+1}$ is a neighbouring region to $D_k$, it follows that $M_k^{-1}$ maps $D_{k+1}$ into $D'_{k+1}$, a neighbouring region of $D$. But $D'_{k+1}$ can be mapped into $D$ via a transformation $M'^{-1}$ ($= S$, $T$ or $S^{-1}$). That is

$$M'^{-1} M_k^{-1} D_{k+1} = D$$

or

$$D_{k+1} = M_k M' D.$$

Therefore $M_{k+1} = M_k M'$ can be represented as a product of $S$ and $T$, and hence so can $M$ itself. We have therefore proved:

Theorem 8.1. Any unimodular transformation is representable as a product of $S$ and $T$. □

Theorem 8.1 has the following explicit interpretation: If

$$M = S^{m_1} T S^{m_2} T \cdots T S^{m_\nu},$$

then

$$z' = m_1 - \cfrac{1}{m_2 - \cfrac{1}{m_3 - \cfrac{1}{m_4 - \cdots - \cfrac{1}{m_\nu + z}}}}.$$

This shows clearly the relationship between unimodular transformations and continued fractions. It is easy to see that

Note. If we extend the definition of a unimodular transformation to

$$z' = \frac{az + b}{cz + d}, \qquad ad - bc = \pm 1,$$

then we can have the result

$$z' = m_1 + \cfrac{1}{m_2 + \cfrac{1}{m_3 + \cdots + \cfrac{1}{m_v + z}}}.$$
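The relationship between unimodular transformations and continued fractions can be tried out numerically. The following Python sketch is not part of the text: it assumes that $S$ and $T$ are represented by the integer matrices $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ and $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, and the function names `st_exponents` and `rebuild` are hypothetical. Given integers $a, b, c, d$ with $ad - bc = 1$, it produces exponents $m_1, \ldots, m_v$ such that the matrix is $\pm S^{m_1}TS^{m_2}T \cdots TS^{m_v}$; both signs represent the same transformation.

```python
# Assumed matrix representatives of z' = z + 1 and z' = -1/z.
S = [[1, 1], [0, 1]]
T = [[0, -1], [1, 0]]


def mat_mul(A, B):
    """Product of two 2 by 2 integer matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def s_power(m):
    """The matrix S^m, i.e. z' = z + m."""
    return [[1, m], [0, 1]]


def st_exponents(a, b, c, d):
    """Exponents m_1, ..., m_v with (a b; c d) = +-S^m_1 T S^m_2 T ... T S^m_v."""
    assert a * d - b * c == 1
    exps = []
    while c != 0:
        q = a // c                      # floor division keeps |a - q*c| < |c|
        exps.append(q)
        a, b = a - q * c, b - q * d     # multiply on the left by S^(-q)
        a, b, c, d = c, d, -a, -b       # multiply on the left by T^(-1)
    exps.append(a * b)                  # now a = d = +-1, matrix is +-S^(a*b)
    return exps


def rebuild(exps):
    """Multiply out S^m_1 T S^m_2 T ... T S^m_v."""
    M = s_power(exps[0])
    for m in exps[1:]:
        M = mat_mul(mat_mul(M, T), s_power(m))
    return M
```

For example, `st_exponents(2, 1, 1, 1)` gives the exponents 2, 1, which corresponds to $z' = 2 - 1/(1 + z) = (2z + 1)/(z + 1)$.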

13.9 Positive Definite Quadratic Forms

Let $w$ be any complex number in the upper half plane, $p$ be a positive real number, and consider the quadratic form

$$F(x, y) = p(x - wy)(x - \bar{w}y) = px^2 - p(w + \bar{w})xy + pw\bar{w}y^2.$$

If we apply the unimodular transformation

$$w = \frac{aw' + b}{cw' + d},$$

then the above becomes

$$\frac{p\bigl((cw' + d)x - (aw' + b)y\bigr)\bigl((c\bar{w}' + d)x - (a\bar{w}' + b)y\bigr)}{|cw' + d|^2} = \frac{p\bigl(dx - by - w'(-cx + ay)\bigr)\bigl(dx - by - \bar{w}'(-cx + ay)\bigr)}{|cw' + d|^2},$$

or

$$\frac{p(X - w'Y)(X - \bar{w}'Y)}{|cw' + d|^2},$$

where

$$X = dx - by, \qquad Y = -cx + ay.$$

Therefore

$$\{p,\ -p(w + \bar{w}),\ pw\bar{w}\} \sim \left\{\frac{p}{|cw' + d|^2},\ -\frac{p(w' + \bar{w}')}{|cw' + d|^2},\ \frac{pw'\bar{w}'}{|cw' + d|^2}\right\}. \tag{1}$$

We also note that

$$w - \bar{w} = \frac{w' - \bar{w}'}{|cw' + d|^2}.$$

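Starting from any positive definite form

$$\{\alpha, \beta, \gamma\}$$

where $\alpha, \beta, \gamma$ are real ($\alpha > 0$) and $\beta^2 - 4\alpha\gamma < 0$, we have, by comparing with the left hand side of (1), that

$$p = \alpha, \qquad w = \frac{-\beta + \sqrt{\beta^2 - 4\alpha\gamma}}{2\alpha}.$$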

Assuming that w' is in the fundamental region, then from (1) we have that

- I :::; w'

+ w'·
1, ~

w'w'

if w' if w'

1,

+ W' > 0, + w':::; O.

Substituting {IX', p', y'} into the right hand side of (1) we then have that

-

P' 1 < ---;-:::; IX

1,

r

--;> 1, IX

if

P' I; 2) W'l < - I, 0 < w~ < I; 3) - 1 < w~ < 0, and so w~ < - 2. I)

- I

o

w~

W'2

2)

W'1

o

- I

w~

3)

w'1

- I

w~

0

There is nothing to prove for 1), and 2) follows at once with the transformation



z' = z + 1. For case 3) we let z"=-z'-l

so that

- I
1, we see that I must be positive) satisfies

G ~)(~

-~) = C

=:)

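and we see, from the method of (9), that the right hand side of this equation is a product of $S$ and $T$. The inductive argument is complete. □

Note: Positive modular matrices can also be expressed as a product of

$$\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}. \tag{10}$$

This is because

$$\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}^{-1}\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$$

Theorem 1.2. Any modular matrix can be expressed as a product of the matrices

$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}. \tag{11}$$

That is, the group of modular matrices can be generated by these two matrices.

Proof. If a modular matrix $M$ is not positive, then

$$M\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$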

14. Integer Matrices and Their Applications

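is positive. It follows from the note above that any modular matrix is expressible as a product of the three matrices

$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}.$$

But

$$\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$

so that the theorem follows. □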

Definition 1. Let $M$ and $N$ be two matrices. Suppose that there is a modular matrix $U$ such that

$$M = UN.$$

Then we say that $N$ is left associated to $M$, and we denote this by $M \overset{L}{=} N$. Clearly left association has the following three properties: (i) $M \overset{L}{=} M$ (reflexive); (ii) if $M \overset{L}{=} N$, then $N \overset{L}{=} M$ (symmetric); (iii) if $M \overset{L}{=} N$, $N \overset{L}{=} P$, then $M \overset{L}{=} P$ (transitive). A similar definition can be given for right association.

Theorem 1.3. Any matrix is left associated to a matrix of the form

$$\begin{pmatrix} a & 0 \\ c & d \end{pmatrix}, \qquad a \geq 0, \quad d \geq 0; \tag{12}$$

if $a > 0$, then $0 \leq c < a$.

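Proof. Corresponding to the matrix

$$M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

there are integers $r$, $s$ such that

$$rb + sd = 0, \qquad (r, s) = 1.$$

Now there are integers $u$, $v$ such that $rv - su = 1$ so that

$$U = \begin{pmatrix} r & s \\ u & v \end{pmatrix}$$

is a positive modular matrix, and

$$UM = \begin{pmatrix} a_1 & 0 \\ c_1 & d_1 \end{pmatrix}.$$

If $a_1 < 0$, then we multiply this matrix by $\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}$ which will give a matrix with $a_1 \geq 0$, and similarly we can make $d_1 \geq 0$. Therefore every matrix is left associated to a matrix of the form

$$\begin{pmatrix} a & 0 \\ c & d \end{pmatrix}, \qquad a \geq 0, \quad d \geq 0.$$

If $a > 0$, then we can choose $q$ so that $0 \leq qa + c < a$, and from

$$\begin{pmatrix} 1 & 0 \\ q & 1 \end{pmatrix}\begin{pmatrix} a & 0 \\ c & d \end{pmatrix} = \begin{pmatrix} a & 0 \\ qa + c & d \end{pmatrix}$$

we see that the theorem is proved. □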

Definition 2. We call the matrix in (12) the normal form of Hermite.

Theorem 1.4. The normal form of Hermite for a non-singular matrix is unique.

Proof. We first note that the normal form of Hermite $\begin{pmatrix} a & 0 \\ c & d \end{pmatrix}$ for a non-singular matrix cannot have $a$ or $d$ equal to zero. Now if

$$\begin{pmatrix} s & t \\ u & v \end{pmatrix}\begin{pmatrix} a & 0 \\ c & d \end{pmatrix} = \begin{pmatrix} a_1 & 0 \\ c_1 & d_1 \end{pmatrix}, \qquad sv - tu = \pm 1,$$

then, from $td = 0$, we have $t = 0$. Also, from $sa = a_1 > 0$, $vd = d_1 > 0$ and $sv = \pm 1$ we see that $s = v = 1$. Finally, from $ua + c = c_1$, $0 \leq c < a$, $0 \leq c_1 < a_1 = a$ we see that $u = 0$. The theorem is proved. □
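The proof of Theorem 1.3 is constructive, and the steps can be sketched in Python. This is only an illustration under stated assumptions: the helper names `ext_gcd`, `mat_mul` and `hermite_2x2` are hypothetical and do not come from the text, and the matrix is passed as a list of two rows. The function returns a modular matrix $U$ together with the normal form of Hermite $H = UM$.

```python
def ext_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b) >= 0."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y


def mat_mul(A, B):
    """Product of two 2 by 2 integer matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def hermite_2x2(M):
    """Return (U, H) with U modular and H = U*M = (a 0; c d) in Hermite form."""
    (a, b), (c, d) = M
    g, _, _ = ext_gcd(b, d)
    # Choose coprime r, s with r*b + s*d = 0.
    r, s = (1, 0) if g == 0 else (d // g, -b // g)
    _, x, y = ext_gcd(r, s)             # r*x + s*y == 1
    U = [[r, s], [-y, x]]               # u = -y, v = x, so r*v - s*u = 1
    H = mat_mul(U, M)                   # the (1, 2) entry is now zero
    for i in range(2):                  # make the diagonal non-negative
        if H[i][i] < 0:
            U[i] = [-t for t in U[i]]
            H[i] = [-t for t in H[i]]
    if H[0][0] > 0:                     # reduce c into the range 0 <= c < a
        q = -(H[1][0] // H[0][0])
        U[1] = [U[1][j] + q * U[0][j] for j in range(2)]
        H[1] = [H[1][j] + q * H[0][j] for j in range(2)]
    return U, H
```

By Theorem 1.4 the matrix $H$ does not depend on the particular choices of $r, s, u, v$ when $M$ is non-singular, although $U$ is then determined as well.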

Exercise. Investigate the situation for a singular matrix.

Definition 3. Let there be two modular matrices $U$ and $V$ such that

$$UMV = N.$$

Then we say that $M$ and $N$ are equivalent, and we write $M \sim N$. Clearly, being equivalent has the three properties of being reflexive, symmetric, and transitive.

Theorem 1.5. Any matrix is equivalent to a matrix of the form

$$\begin{pmatrix} a_1 & 0 \\ 0 & a_1a_2 \end{pmatrix}, \qquad a_1 \geq 0, \quad a_2 \geq 0. \tag{13}$$

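Proof. Consider the matrix

$$M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$

Since the theorem becomes trivial if $M$ is the null matrix we can assume that $a \neq 0$, and indeed we can even assume that $a > 0$. We first prove that $M$ must be equivalent to a matrix of the form

$$\begin{pmatrix} a_1 & b_1 \\ c_1 & d_1 \end{pmatrix}, \qquad a_1 \mid (b_1, c_1, d_1).$$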

We use induction on $a$, the case $a = 1$ being trivial. When $a > 1$ and $a \nmid b$, we can choose $q$ so that $0 < aq + b < a$ and we consider

where the leading element is a positive integer less than $a$. If $a \mid b$ and $a \nmid c$, then we choose $q'$ such that $0 < aq' + c < a$, and we consider

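where the leading element is once again a positive integer less than $a$. Finally, if $a \mid (b, c)$, but $a \nmid d$, we let $c = c'a$ so that

$$\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ -c' & 1 \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a & (1 - c')b + d \\ * & * \end{pmatrix}$$

and $a \nmid \{(1 - c')b + d\}$ which reduces back to the case when $a \nmid b$. The inductive argument is now complete.

Now $a_1 \mid (b_1, c_1, d_1)$. We let $b_1 = a_1b_2$, $c_1 = a_1c_2$, and $d_1 = a_1d_2$, and we consider

$$\begin{pmatrix} 1 & 0 \\ -c_2 & 1 \end{pmatrix}\begin{pmatrix} a_1 & a_1b_2 \\ a_1c_2 & a_1d_2 \end{pmatrix}\begin{pmatrix} 1 & -b_2 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} a_1 & 0 \\ 0 & a_1(d_2 - b_2c_2) \end{pmatrix}$$

where we can assume that $a_1 > 0$, since otherwise we can multiply by $\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}$. Similarly we can assume that $a_2 = d_2 - b_2c_2 \geq 0$. The theorem is proved. □

Definition 4. We call the matrix in (13) the normal form of Smith.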

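We summarize our result as follows: By Theorem 1.2 any modular matrix is a product of the matrices

$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$$

From

$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} c & d \\ a & b \end{pmatrix}$$

and

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} b & a \\ d & c \end{pmatrix}$$

we see that the effect of multiplying by $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ or its inverse is merely the interchanging of the two rows or the two columns of the matrix. Again, from

$$\begin{pmatrix} 1 & \pm 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a \pm c & b \pm d \\ c & d \end{pmatrix}$$

and

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} 1 & \pm 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} a & b \pm a \\ c & d \pm c \end{pmatrix}$$

we see that the effect of multiplying by $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ or by its inverse $\begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}$ is the addition or subtraction of the second row to the first row, or the first column to the second column of the matrix. We call these operations here the elementary transformations of the matrix. We can therefore restate Theorem 1.5 as follows: We can use elementary transformations to reduce a given matrix to the normal form of Smith.

Now the greatest common factor of the elements of a matrix is invariant under an elementary transformation, and so from Theorem 1.5 we have $(a, b, c, d) = a_1$. Also

$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc = \pm a_1^2a_2.$$

Therefore we have

Theorem 1.6. The normal form of Smith for a given matrix is unique. □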
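The two invariants used in the argument for Theorem 1.6 determine the normal form of Smith of a 2 by 2 matrix directly. A minimal Python sketch, assuming only the relations $(a, b, c, d) = a_1$ and $ad - bc = \pm a_1^2a_2$ stated above (the function name `smith_2x2` is hypothetical):

```python
from math import gcd


def smith_2x2(a, b, c, d):
    """Diagonal (a1, a1*a2) of the Smith normal form of (a b; c d)."""
    a1 = gcd(gcd(a, b), gcd(c, d))      # greatest common factor of the elements
    if a1 == 0:                         # the null matrix is its own normal form
        return (0, 0)
    a2 = abs(a * d - b * c) // (a1 * a1)
    return (a1, a1 * a2)
```

For the matrix with rows $(2, 4)$ and $(6, 8)$ this gives $a_1 = 2$ and $a_2 = 2$, so the normal form is the diagonal matrix with entries 2 and 4.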

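14.2 The Product of Matrices

Let $a_{11}, a_{12}, \ldots, a_{mn}$ be integers. We call the array

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$

an $m$ by $n$ matrix and we sometimes denote it by $A^{(m,n)}$. If $m = n$, then we denote it by $A^{(n)}$ and we call it a square matrix of order $n$. Let $B$ be an $n$ by $l$ matrix

$$B = \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1l} \\ b_{21} & b_{22} & \cdots & b_{2l} \\ \vdots & \vdots & & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nl} \end{pmatrix}.$$

We define the product matrix of $A$ and $B$ by

$$AB = C = \begin{pmatrix} c_{11} & \cdots & c_{1l} \\ \vdots & & \vdots \\ c_{m1} & \cdots & c_{ml} \end{pmatrix}, \qquad c_{rs} = \sum_{t=1}^{n} a_{rt}b_{ts} \quad (r = 1, \ldots, m;\ s = 1, \ldots, l). \tag{1}$$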

We see from the definition that the product matrix of $A$ and $B$ exists only when the number of columns in $A$ is the same as the number of rows in $B$. Note also that, when $AB$ and $BA$ both exist, they may be different. If $AB = BA$, then we say that $A$ and $B$ commute. However we always have $(AB)D = A(BD)$ whenever either side of this equation exists. If $A$ and $B$ are square matrices, then the determinant of $AB$ is the product of the determinants of $A$ and $B$. A square matrix whose determinant is zero is called a singular matrix, otherwise we call it a non-singular matrix. Modular matrices are those square matrices whose determinants equal $\pm 1$ and positive modular matrices are those whose determinants equal 1. Clearly the product of two (positive) modular matrices is a (positive) modular matrix.

The square matrix

$$\Lambda = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}$$

where each element not on the main diagonal is zero is called a diagonal matrix, and we denote it simply by $\Lambda = [\lambda_1, \lambda_2, \ldots, \lambda_n]$. If $\lambda_1 = \lambda_2 = \cdots = \lambda_n = 1$ then

$$\Lambda = I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}$$

and we call $I$ the unit matrix. Clearly we have $AI = IA = A$ for any square matrix $A$ of order $n$. If the square matrices $A$ and $B$ satisfy $AB = I$, then we call $B$ the inverse of $A$ and we denote it by $A^{-1}$.

Consider a square matrix $A$ ($= A^{(n)}$). By the cofactor of the element $a_{ij}$ we mean the determinant of the square matrix of order $(n - 1)$ obtained by removing the $i$-th row and the $j$-th column of $A$. If we attach the sign $(-1)^{i+j}$ to the cofactor of $a_{ij}$ then we call it the algebraic cofactor of $a_{ij}$ and we denote it by $A_{ij}$. The matrix

$$A_0 = \begin{pmatrix} A_{11} & A_{21} & \cdots & A_{n1} \\ A_{12} & A_{22} & \cdots & A_{n2} \\ \vdots & \vdots & & \vdots \\ A_{1n} & A_{2n} & \cdots & A_{nn} \end{pmatrix},$$

that is, the matrix obtained from $A$ by replacing each element $a_{rs}$ with the algebraic cofactor $A_{sr}$ of $a_{sr}$, is called the adjoint matrix of $A$. It is not difficult to prove that

$$AA_0 = A_0A = aI,$$

where $a$ is the determinant of $A$. It follows that if $A$ is a modular matrix, then its inverse exists, and that $A^{-1} = \pm A_0$. Conversely, if $A$ has an inverse, then it must be a modular matrix. If $AB = I$, then from $B = (\pm A_0A)B = \pm A_0(AB) = \pm A_0$ we see that the inverse is unique and that $AA^{-1} = A^{-1}A = I$. Also, if $A$ and $B$ both have inverses, then $(AB)^{-1} = B^{-1}A^{-1}$.

A 1 by $n$ matrix $(x_1, \ldots, x_n)$, where we no longer restrict the elements to be integers, is called a vector, and we write $x = (x_1, \ldots, x_n)$. We should take care that this notation is not confused with the greatest common factor symbol $(x_1, \ldots, x_n) = d$. We shall use the convention that $(x_1, \ldots, x_n)$ by itself always represents a vector, while $(x_1, \ldots, x_n) = d$ means the greatest common factor of $x_1, \ldots, x_n$. Also we shall always use the letters $x$ and $y$ to denote a vector with $n$ terms. The equation

$$y = xB \tag{2}$$

represents the system of linear equations

$$y_i = \sum_{j=1}^{n} x_jb_{ji}, \qquad 1 \leq i \leq l.$$
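The identity $AA_0 = A_0A = aI$ for the adjoint matrix can be checked numerically. The following Python sketch is an illustration only; the helper names `minor`, `det`, `adjoint` and `mat_mul` are hypothetical, and the determinant is computed by cofactor expansion along the first row, which is adequate for small matrices.

```python
def minor(A, i, j):
    """Matrix obtained from A by removing row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Determinant by cofactor expansion along the first row."""
    if not A:
        return 1                        # determinant of the empty matrix
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j))
               for j in range(len(A)))


def adjoint(A):
    """Adjoint matrix: entry (r, s) is the algebraic cofactor of a_{sr}."""
    n = len(A)
    return [[(-1) ** (r + s) * det(minor(A, s, r)) for s in range(n)]
            for r in range(n)]


def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[r][t] * B[t][s] for t in range(len(B)))
             for s in range(len(B[0]))] for r in range(len(A))]
```

For a modular matrix such as the one with rows $(2, 1)$ and $(1, 1)$, the adjoint matrix is the inverse itself, since the determinant is 1.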

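If $n = l$ and $B$ is non-singular, then (2) is called a transformation. Corresponding to integers $x_1, \ldots, x_n$ the transformation gives integers $y_1, \ldots, y_n$, but not conversely. However, if $B$ is a modular matrix, then when $y_1, \ldots, y_n$ are integers, the numbers $x_1, \ldots, x_n$ must also be integers. In this case we call (2) a modular transformation.

Example 1. Let $r \neq 1$, and $y_1 = -x_r$, $y_r = x_1$, $y_i = x_i$ ($i \neq 1$, $i \neq r$). This is a modular transformation whose corresponding matrix is obtained from $I$ by multiplying the first row by $-1$ and then interchanging it with the $r$-th row (or multiplying the $r$-th column by $-1$ and then interchanging it with the first column). We denote this matrix by $E_r$ so that

$$E_r = \begin{pmatrix} 0 & 0 & \cdots & 1 & \cdots & 0 \\ 0 & 1 & \cdots & 0 & \cdots & 0 \\ \vdots & & \ddots & & & \vdots \\ -1 & 0 & \cdots & 0 & \cdots & 0 \\ \vdots & & & & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & \cdots & 1 \end{pmatrix}, \tag{3}$$

where the elements $1$ and $-1$ off the main diagonal are in the positions $(1, r)$ and $(r, 1)$ respectively.

Example 2. Let $r \neq 1$, and $y_i = x_i$ ($i \neq r$), $y_r = x_r + x_1$. This too is a modular transformation and its corresponding matrix is

$$V_r = \begin{pmatrix} 1 & 0 & \cdots & 1 & \cdots & 0 \\ 0 & 1 & \cdots & 0 & \cdots & 0 \\ \vdots & & \ddots & & & \vdots \\ 0 & 0 & \cdots & 0 & \cdots & 1 \end{pmatrix}, \tag{4}$$

where the element 1 off the main diagonal is in the position $(1, r)$; that is the matrix obtained from $I$ by adding the $r$-th row to the first row (or adding the first column to the $r$-th column).

It is easy to prove that $V_r$ is representable as a product of $V_2$ and $E_i$. In fact, if $r > 2$, then (5)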

The proof is as follows: Let

so that

, ... ,

E,E,E, V,E,E,E,t

~(

t1 + ;:

t r)

.

But

_(t1 ~. t r

)

Vrt -

,

tn

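so that (5) follows.

Example 3. For fixed distinct $r$ and $s$ we let $y_i = x_i$ ($i \neq s$) and $y_s = x_s + x_r$. Then this is also a modular transformation whose matrix is obtained from $I$ by adding the $s$-th row to the $r$-th row (or adding the $r$-th column to the $s$-th column). We denote this matrix by $V_{rs}$ so that

$$V_{rs} = \begin{pmatrix} 1 & & & & \\ & \ddots & & 1 & \\ & & \ddots & & \\ & & & \ddots & \\ & & & & 1 \end{pmatrix}, \tag{6}$$

where the element 1 off the main diagonal is in the position $(r, s)$ and all the elements not shown are zero.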

When $s > 1$, $V_{rs} = E_r^{-1}V_sE_r$, and $V_{r1} = E_r^{-1}V_r^{-1}E_r$. Therefore $V_{rs}$ can also be represented as a product of $V_2$ and $E_2, \ldots, E_n$.

The matrices $V_{rs}$ ($1 \leq r \leq n$, $1 \leq s \leq n$, $r \neq s$) together with all the products formed by them form a group which we denote by $\mathfrak{M}_n$. We saw, from the note following Theorem 1.1, that the group $\mathfrak{M}_2$, generated by the matrices

$$V_{21} = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \quad\text{and}\quad V_{12} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},$$

is identical with the group of all 2 by 2 positive modular matrices. We now prove the corresponding result for $n$ by $n$ positive modular matrices.

Theorem 2.1. The group $\mathfrak{M}_n$ is the group of all $n$ by $n$ positive modular matrices.

It is clear that each matrix in $\mathfrak{M}_n$ is a positive modular matrix so that we only have to prove that every positive modular matrix is in $\mathfrak{M}_n$, that is every positive modular matrix can be expressed as a product of the matrices $V_{rs}$. For this purpose we shall first establish the following two theorems.

Theorem 2.2. If $(x_1, \ldots, x_n) = d$, then there exists $U \in \mathfrak{M}_n$ such that

$$(x_1, \ldots, x_n)U = (d, 0, \ldots, 0).$$

Proof. Consider first the case $n = 2$. If $(x_1, x_2) = d$, then there are integers $r$ and $s$ such that

$$rx_1 + sx_2 = d, \qquad (r, s) = 1.$$

We take $u = -x_2/d$, $v = x_1/d$ so that

$$vx_2 + ux_1 = 0, \qquad vr - us = 1.$$

Thus

$$(x_1, x_2)\begin{pmatrix} r & u \\ s & v \end{pmatrix} = (d, 0)$$

and $P = \begin{pmatrix} r & u \\ s & v \end{pmatrix}$ is a positive modular matrix. Since we already know that $P \in \mathfrak{M}_2$ by the note following Theorem 1.1, the case $n = 2$ is proved.

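We now proceed by induction on $n$. Let $(x_{n-1}, x_n) = d_1$, so that there exists $P \in \mathfrak{M}_2$ such that

$$(x_{n-1}, x_n)P = (d_1, 0).$$

Let

$$V^{(n)} = \begin{pmatrix} 1 & \cdots & 0 & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & \cdots & 1 & 0 & 0 \\ 0 & \cdots & 0 & r & u \\ 0 & \cdots & 0 & s & v \end{pmatrix}.$$

It is easy to see that $V^{(n)} \in \mathfrak{M}_n$ and that

$$(x_1, \ldots, x_n)V^{(n)} = (x_1, \ldots, x_{n-2}, d_1, 0).$$

From the induction hypothesis we have $V^{(n-1)} \in \mathfrak{M}_{n-1}$ such that

$$(x_1, \ldots, x_{n-2}, d_1)V^{(n-1)} = (d, 0, \ldots, 0).$$

We now let

$$V_1^{(n)} = \begin{pmatrix} V^{(n-1)} & 0 \\ 0 & 1 \end{pmatrix}$$

so that

$$(x_1, \ldots, x_n)V^{(n)}V_1^{(n)} = (d, 0, \ldots, 0).$$

It is easy to see that $V^{(n)}V_1^{(n)} \in \mathfrak{M}_n$, so that the theorem is proved. □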
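The inductive proof of Theorem 2.2 translates into a short procedure: work from the right, each time replacing the pair $(x_{k-1}, x_k)$ by (their greatest common factor, 0) with a 2 by 2 positive modular matrix $P$ embedded in the unit matrix. The Python sketch below follows that scheme for $n \geq 2$; the names `ext_gcd`, `reduce_row` and `vec_mat` are hypothetical, and no attempt is made to express $U$ as a product of the matrices $V_{rs}$.

```python
def ext_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b) >= 0."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y


def reduce_row(x):
    """Return (U, d) with det U = 1 and x*U = (d, 0, ..., 0), d = gcd of x."""
    n = len(x)
    x = list(x)
    U = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for k in range(n - 1, 0, -1):
        a, b = x[k - 1], x[k]
        g, r, s = ext_gcd(a, b)         # r*a + s*b == g
        if g == 0:
            continue                    # both entries are already zero
        u, v = -b // g, a // g          # a*u + b*v == 0 and r*v - s*u == 1
        for row in U:                   # multiply on the right by P = (r u; s v)
            row[k - 1], row[k] = (row[k - 1] * r + row[k] * s,
                                  row[k - 1] * u + row[k] * v)
        x[k - 1], x[k] = g, 0
    return U, x[0]


def vec_mat(x, U):
    """Row vector times matrix."""
    return [sum(x[i] * U[i][j] for i in range(len(x))) for j in range(len(U))]
```

As in Theorem 2.3, when $d \neq 0$ the first row of $U^{-1}$ is then $(x_1/d, \ldots, x_n/d)$.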

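Theorem 2.3. Let $(a_{11}, a_{12}, \ldots, a_{1n}) = d$. Then there is a matrix in $\mathfrak{M}_n$ whose first row is

$$\left(\frac{a_{11}}{d}, \frac{a_{12}}{d}, \ldots, \frac{a_{1n}}{d}\right).$$

Proof. By Theorem 2.2 there is a matrix $U$ in $\mathfrak{M}_n$ such that

$$(a_{11}, a_{12}, \ldots, a_{1n})U = (d, 0, \ldots, 0)$$

and so the matrix $U^{-1}$ is a suitable candidate. □

Proof of Theorem 2.1. The case $n = 2$ is already established. We now use induction on $n$. Let

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$$

be any positive modular matrix. Clearly $(a_{11}, a_{12}, \ldots, a_{1n}) = 1$. On multiplying $A$ by the matrix $U$ in Theorem 2.3 we have

$$AU = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ a'_{21} & a'_{22} & \cdots & a'_{2n} \\ \vdots & \vdots & & \vdots \\ a'_{n1} & a'_{n2} & \cdots & a'_{nn} \end{pmatrix}.$$

The matrix

$$V = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ -a'_{21} & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ -a'_{n1} & 0 & \cdots & 1 \end{pmatrix}$$

is in $\mathfrak{M}_n$, and

$$VAU = \begin{pmatrix} 1 & 0 \\ 0 & A_1 \end{pmatrix}, \qquad A_1 = \begin{pmatrix} a'_{22} & \cdots & a'_{2n} \\ \vdots & & \vdots \\ a'_{n2} & \cdots & a'_{nn} \end{pmatrix}. \tag{7}$$

From the induction hypothesis, the matrix $A_1$ is in $\mathfrak{M}_{n-1}$, and so the matrix $\begin{pmatrix} 1 & 0 \\ 0 & A_1 \end{pmatrix}$ is in $\mathfrak{M}_n$. From (7) we see that the theorem follows. □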

0

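14.3 The Number of Generators for Modular Matrices

We proved in §1 that any 2 by 2 positive modular matrix can be expressed as a product of the matrices $V_{21} = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$ and $V_{12} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$. We now discuss the general case, and ask for the matrices whose products give all possible $n$ by $n$ positive modular matrices, that is we want to know the generators of the group $\mathfrak{M}_n$. From the definition for $\mathfrak{M}_n$ any matrix in it is a product of $V_{rs}$, and from the previous section we know that each $V_{rs}$ is expressible as a product of the following $n$ matrices:

$$E_2 = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ -1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}, \quad \ldots, \quad E_n = \begin{pmatrix} 0 & 0 & \cdots & 0 & 1 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \\ -1 & 0 & \cdots & 0 & 0 \end{pmatrix}, \quad V_2 = \begin{pmatrix} 1 & 1 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}.$$

Thus $\mathfrak{M}_n$ can be generated by the $n$ matrices $E_2, E_3, \ldots, E_n, V_2$. Let

$$U_1 = \begin{pmatrix} 0 & 0 & \cdots & 0 & (-1)^{n-1} \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & (-1)^{n-1} \\ I^{(n-1)} & 0 \end{pmatrix}.$$

It is not difficult to prove that each of $E_2, E_3, \ldots, E_n$ is expressible as a product of $U_1$ and $E_2$. In fact, we have

$$\begin{aligned} E_r &= (E_2U_1)^{r-2}E_2(E_2U_1)^{n-r+1}, && \text{if } n \text{ is even}, \\ E_r &= (E_2^{-1}U_1)^{r-2}E_2(E_2^{-1}U_1)^{n-r+1}, && \text{if } n \text{ is odd, } r \text{ is even}, \\ E_r &= (E_2^{-1}U_1)^{r-2}E_2^{-1}(E_2^{-1}U_1)^{n-r+1}, && \text{if } n \text{ and } r \text{ are odd}. \end{aligned} \tag{1}$$

The proof of (1) is similar to that of (2.5). Thus $\mathfrak{M}_n$ can be generated by the three matrices $U_1$, $V_2$, $E_2$. If we write

$$U^* = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}$$

then it is easy to verify that $E_2 = U^{*-1}V_2U^{*-1}$, so that $\mathfrak{M}_n$ can also be generated by the three matrices

$$U_1 = \begin{pmatrix} 0 & (-1)^{n-1} \\ I^{(n-1)} & 0 \end{pmatrix}, \quad U_2 = \begin{pmatrix} 1 & 1 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}, \quad U^* = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}. \tag{2}$$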

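When $n = 2$ we saw that $\mathfrak{M}_2$ can actually be generated by the two matrices $U_1 = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ and $U_2 = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$. We now ask whether $\mathfrak{M}_n$ ($n \geq 3$) can also be generated by $U_1$ and $U_2$; that is whether $U^*$ is expressible as a product of $U_1$ and $U_2$. We first examine the cases $n = 3$ and 4.

(1) For $n = 3$, we have

$$U_1 = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \qquad U_2 = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad U^* = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$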

In the following we call the position for the $i$-th row and the $j$-th column the "position $(i, j)$". Consider the operation of multiplying $U_2$ by $U_1$ on the left and $U_1^{-1}$ on the right. We see from

$$S = U_1U_2U_1^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}, \qquad T = U_1^{-1}U_2U_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}$$

that successive applications of the above operation will leave the elements in the main diagonal invariant, whereas the element 1 not on the main diagonal will take up the successive positions $(1, 2)$, $(2, 3)$, $(3, 1)$. Similarly the elements in the three positions $(3, 2)$, $(1, 3)$, $(2, 1)$ will be permuted along a rail as shown in the diagram.

    (1, 1)   (1, 2)   (1, 3)
    (2, 1)   (2, 2)   (2, 3)
    (3, 1)   (3, 2)   (3, 3)

Consequently in order to obtain the element 1 in the position $(2, 1)$ we have first to produce this element in one of the positions $(1, 3)$ or $(3, 2)$. Now if we multiply $T$ by $U_2^{-1}$ on the left and $U_2$ on the right, it will give rise to the element 1 in the position $(3, 2)$; that is

$$U_2^{-1}TU_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix}.$$

The operation of multiplying by $U_1^{-1}$ on the left and $U_1$ on the right will make the element 1 in the position $(3, 2)$ in the matrix $U_2^{-1}TU_2$ move to the position $(2, 1)$, that is

$$W = U_1^{-1}U_2^{-1}TU_2U_1 = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}.$$

Therefore we need only to annihilate the element 1 in the position $(2, 3)$ to give the required matrix $U^*$, and this can be accomplished by multiplying by $S^{-1}$ on the left; that is

$$S^{-1}W = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = U^*.$$

Therefore, for $n = 3$, we have

$$U^* = U_1U_2^{-1}U_1^{-1} \cdot U_1^{-1}U_2^{-1} \cdot U_1^{-1}U_2U_1 \cdot U_2U_1. \tag{3}$$

(2) For n = 4 we have

UI

=

CO

0 1 0 0 0 1 0 o 0

u,~o

-~) o ' 0

0 1 0 0 1 0 0

~}

U'~G D

0 1 0 0 Similarly to the case n = 3, we start with

0 0 1 0

T~ U~·U,U. ~( ~

o1 0 0 0 ) o 1 0

.

- 1 001 We can produce the element - 1 in the position (4,2) by multiplying Tby u-; I on the left and U2 on the right; that is

U-; I TU 2

=( ~ ! H). -1

-1

0

1



Again, the operation of multiplying by $U_1^{-1}$ on the left and $U_1$ on the right will move the element $-1$ from the position $(4, 2)$ to the position $(3, 1)$; that is

-1

-1

U 1 (U 2 TU 2) U 1

1 0 0 0) 0 1 0 0

=( _ 1 0 1 I 000

(4)

.

1

Performing the first operation of multiplying by $U_2^{-1}$ on the left and $U_2$ on the right will now produce the element $-1$ in the position $(3, 2)$; that is

U 2-1( U 1-1 U 2-1 TU2 U 1 ) U 2

=

(~ ~ ~ ~) _ 1

_ I

I

I

o

0

0

I

.

Performing the second operation of multiplying by $U_1^{-1}$ on the left and $U_1$ on the right will now move the element $-1$ in the position $(3, 2)$ to the position $(2, 1)$; that is

At this point we observe that the elements of the matrix below the main diagonal match those of $U^{*-1}$, and the problem now is the annihilation of the elements 1 above the main diagonal. From (4) we have

o

0

1

o o

1 0

and hence

o

0

1 0

o o

1

0

Therefore, for n = 4, we have U*-l

=

U~lU~lU-;lU~lU-;lU1U2U1U1U~lU-;lU~lU-;lU~1

x~~~~~~.

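If we write $U = U_2U_1$, then (3) and (5) become

$$\begin{aligned} U^* &= U_1^{-1}U^{-1}U_1U_1U^{-1}U_1^{-1}U^2 && (n = 3), \\ U^{*-1} &= U_1^{-1}(U^{-1})^2U_1UU_1(U^{-1})^2U_1^{-1}U^3 && (n = 4), \end{aligned} \tag{6}$$

and in general we have

$$U^{*(-1)^{n-1}} = U_1^{-1}(U^{-1})^{n-2}U_1U^{n-3}U_1(U^{-1})^{n-2}U_1^{-1}U^{n-1}. \tag{7}$$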

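The reader can follow the proof of (2.5) to prove (7). Therefore we have

Theorem 3.1. The group $\mathfrak{M}_n$ of positive modular matrices can be generated by the two matrices

$$U_1 = \begin{pmatrix} 0 & 0 & \cdots & 0 & (-1)^{n-1} \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & (-1)^{n-1} \\ I^{(n-1)} & 0 \end{pmatrix}, \qquad U_2 = \begin{pmatrix} 1 & 1 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}.$$

In other words, any positive modular matrix is expressible as a product of $U_1$ and $U_2$. □

Any modular matrix which is not positive will become so on multiplying by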

U3 =

(~.~o ~ ..

0

.. ::: ..

...

~)

.

I

Therefore we have

Theorem 3.2. The group of all modular matrices can be generated by the three matrices $U_1$, $U_2$ and $U_3$. In other words any modular matrix is expressible as a product of the matrices $U_1$, $U_2$ and $U_3$. □

14.4 Left Association

Definition 1. Let $A$ and $B$ be two square matrices. Suppose that there is a modular matrix $U$ such that

$$A = UB.$$

Then we say that $B$ is left associated to $A$ and we write $A \overset{L}{=} B$. Clearly left association is reflexive, symmetric and transitive.

Theorem 4.1. Any square matrix is left associated to a matrix of the form

$$\begin{pmatrix} b_{11} & 0 & 0 & \cdots & 0 & 0 \\ b_{21} & b_{22} & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ b_{n-1,1} & b_{n-1,2} & b_{n-1,3} & \cdots & b_{n-1,n-1} & 0 \\ b_{n1} & b_{n2} & b_{n3} & \cdots & b_{n,n-1} & b_{nn} \end{pmatrix} \tag{1}$$

where $b_{vv} \geq 0$. Also if $b_{vv} > 0$, then $0 \leq b_{iv} < b_{vv}$ ($i > v$).

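Proof. The case $n = 2$ has already been proved (Theorem 1.3). We now proceed by induction on $n$. Let

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$$

be any square matrix. If there is a non-zero element in the last column of the matrix $A$, then we let $(a_{1n}, a_{2n}, \ldots, a_{nn}) = b_{nn}$. There are integers $b_1, b_2, \ldots, b_n$ such that

$$b_1a_{1n} + b_2a_{2n} + \cdots + b_na_{nn} = b_{nn}.$$

By Theorem 2.3 there is a modular matrix $V$ whose first row is $(b_1, b_2, \ldots, b_n)$. We interchange the first row of $V$ with its $n$-th row to give a modular matrix $U$ whose $n$-th row is $(b_1, b_2, \ldots, b_n)$. We then have

$$UA = \begin{pmatrix} a'_{11} & \cdots & a'_{1,n-1} & a'_{1n} \\ \vdots & & \vdots & \vdots \\ a'_{n-1,1} & \cdots & a'_{n-1,n-1} & a'_{n-1,n} \\ a'_{n1} & \cdots & a'_{n,n-1} & b_{nn} \end{pmatrix}.$$

It is easy to see that $a'_{1n}, \ldots, a'_{n-1,n}$ are linear combinations of $a_{1n}, a_{2n}, \ldots, a_{nn}$ and are therefore divisible by $b_{nn}$. Therefore

$$A \overset{L}{=} \begin{pmatrix} a''_{11} & \cdots & a''_{1,n-1} & 0 \\ \vdots & & \vdots & \vdots \\ a''_{n-1,1} & \cdots & a''_{n-1,n-1} & 0 \\ a'_{n1} & \cdots & a'_{n,n-1} & b_{nn} \end{pmatrix}. \tag{2}$$

The above still holds even when all the elements in the last column of $A$ are zero, except that we have $b_{nn} = 0$. It follows from the induction hypothesis that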

A~

where b vv

;;::

0, biv

=

bll b 21

0 b 22

0 0

0 0

............................... bn-I,I

bn- I,2

b~I

b~2

bn-I,n-I b~,n-I

0 (i < v), and if b vv > 0, then 0

~

biv
0, then there exists an integer qn-l such that

Therefore

o o

o o

A:!;, b~l

bn - l •n b~.n-l

b~2

l

0

bnn where b~i = qn-Ibn-l.i + b~i (1 ~ i ~ n - 1), 0 ~ b~.n-l < bn - l •n The theorem follows from repeated applications of this. 0

l .

Definition 2. We call a square matrix of the form (1) the normal form of Hermite. Exercise. Prove that the normal form of Hermite for a non-singular square matrix is unique.

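14.5 Invariant Factors and Elementary Divisors

Definition 1. Let $A$ ($= A^{(m,n)}$) and $B$ ($= B^{(m,n)}$) be two matrices. Suppose that there are two modular matrices $U$ ($= U^{(m)}$), $V$ ($= V^{(n)}$) such that

$$A = UBV.$$

Then we say that $A$ and $B$ are equivalent and we write $A \sim B$. Clearly equivalence has the three properties of being reflexive, symmetric and transitive.

Theorem 5.1. Any matrix $A$ ($= A^{(m,n)}$) must be equivalent to a matrix of the form

$$\begin{pmatrix} d_1 & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & d_1d_2 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & 0 & d_1d_2d_3 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & d_1 \cdots d_m & 0 & \cdots & 0 \end{pmatrix} \qquad (m \leq n) \tag{1}$$

or

$$\begin{pmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_1d_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & d_1 \cdots d_n \\ 0 & 0 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix} \qquad (m \geq n) \tag{2}$$

where $d_i \geq 0$.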

Proof. Let $A = (a_{11}, a_{12}, \ldots, a_{1k})$ be a 1 by $k$ matrix where $k$ is any positive integer ($k > 1$). By Theorem 2.2 there is a positive modular matrix $U$ such that

$$AU = (d, 0, 0, \ldots, 0)$$

and so the required result is proved. Also, from

$$U'A' = (AU)' = (d, 0, 0, \ldots, 0)',$$

where $U'$ is the transposed matrix of $U$, we see that the theorem also holds for $k$ by 1 matrices.

We now proceed by induction on the number of rows of the matrix $A$. Let $A$ be any given matrix. If $A = 0$, then the result is trivial. If $A \neq 0$, then we may assume that $a_{11} \neq 0$ and indeed we can even assume that $a_{11} > 0$. We first prove that $A$ must be equivalent to a matrix of the form:

$$A \sim A_1 = \begin{pmatrix} a'_{11} & a'_{12} & \cdots & a'_{1n} \\ \vdots & \vdots & & \vdots \\ a'_{m1} & a'_{m2} & \cdots & a'_{mn} \end{pmatrix}, \qquad a'_{11} \mid a'_{ij} \quad (1 \leq i \leq m,\ 1 \leq j \leq n).$$

This is clearly so if $a_{11} = 1$. When $a_{11} > 1$, if $a_{11} \nmid a_{i_0j_0}$ then we can move $a_{i_0j_0}$ to one of the positions occupied by $a_{12}$, $a_{21}$, $a_{22}$, by means of row or column interchanging. Therefore, using the method of proof for Theorem 1.5, we can change the leading element to a positive integer which is less than $a_{11}$, and an inductive argument completes the first part of the proof. Now from

o

0

o ••••••••••••••••

a~l

L

0

a~l

a~2

x

we have

OJ

0

a~l 1

o

0

a~l o ( = ...... ~~ ...... .. ~n. 0 a"

o

a~2

...

...

0 ) a"

a~n

'

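Therefore, from the induction hypothesis, we have

$$A \sim \begin{pmatrix} a'_{11} & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & d'_2 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & 0 & d'_2d'_3 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & d'_2 \cdots d'_m & 0 & \cdots & 0 \end{pmatrix} \qquad (m \leq n) \tag{4}$$

or

$$A \sim \begin{pmatrix} a'_{11} & 0 & \cdots & 0 \\ 0 & d'_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & d'_2 \cdots d'_n \\ 0 & 0 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix} \qquad (m \geq n). \tag{5}$$

Since $a'_{11} \mid a'_{ij}$, and $d'_2$ is a linear combination of the elements of $A_1$, it follows that $a'_{11} \mid d'_2$. If we let $a'_{11} = d_1$, $d'_2 = d_1d_2$, $d'_3 = d_3$, $d'_4 = d_4, \ldots$, then the theorem follows from (4) and (5). □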

Definition 2. We call matrices of the form (1) or (2) the normal forms of Smith.

In the proof of Theorem 5.1 the operations that we use are: the interchange of rows (or columns); the addition of an integer multiple of a row (or column) to another row (or column); the multiplication of a row (or column) by $-1$. We call these operations the elementary operations of matrices. We can therefore restate Theorem 5.1 as follows: any matrix can be reduced to the normal form of Smith by elementary operations.

After the interchange of two rows (or columns) or the multiplication of a row (or column) by $-1$, the $i$ by $i$ sub-determinants of the resulting matrix are either the same as the $i$ by $i$ sub-determinants of the original matrix, or differ by their signs only. Again if we add an integer multiple of a row (or column) to another row (or column) the $i$ by $i$ sub-determinants of the resulting matrix are either the same as the $i$ by $i$ sub-determinants of the original matrix, or are such sub-determinants with the addition of an integer multiple of other $i$ by $i$ sub-determinants. It follows that the greatest common factor of all the $i$ by $i$ sub-determinants of a matrix is invariant under any elementary operation. Therefore we have

Theorem 5.2. Let $A \sim B$. Then the greatest common factors of the $i$ by $i$ sub-determinants of the two matrices $A$ and $B$ are the same. □

Meanwhile we see from (1) and (2) that

$$d_1^{\,i}d_2^{\,i-1} \cdots d_i \qquad (i = 1, 2, \ldots)$$

are the greatest common factors of the $i$ by $i$ sub-determinants of $A$. Therefore we have

Theorem 5.3. The normal form of Smith for a matrix is unique. □
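Theorem 5.2 gives a direct, if inefficient, way of computing the diagonal of the normal form of Smith: if $D_i$ denotes the greatest common factor of all the $i$ by $i$ sub-determinants (with $D_0 = 1$), then the $i$-th diagonal element is $D_i/D_{i-1}$. The Python sketch below illustrates this; the function names are hypothetical, and enumerating every sub-determinant is only practical for small matrices.

```python
from itertools import combinations
from math import gcd


def det(A):
    """Determinant by cofactor expansion along the first row."""
    if not A:
        return 1
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))


def smith_diagonal(A):
    """Diagonal elements d_1, d_1*d_2, ... of the Smith normal form of A."""
    m, n = len(A), len(A[0])
    size = min(m, n)
    diag, prev = [], 1
    for i in range(1, size + 1):
        g = 0                           # gcd of all i by i sub-determinants
        for rows in combinations(range(m), i):
            for cols in combinations(range(n), i):
                g = gcd(g, det([[A[r][c] for c in cols] for r in rows]))
        if g == 0:                      # all larger sub-determinants vanish too
            break
        diag.append(g // prev)
        prev = g
    return diag + [0] * (size - len(diag))
```

For the matrix with rows $(2, 4, 4)$, $(-6, 6, 12)$, $(10, 4, 16)$ the greatest common factors are $D_1 = 2$, $D_2 = 4$, $D_3 = 624$, so the diagonal is $2, 2, 156$.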

Definition 3. Let the non-zero elements of the normal form of Smith in (1) or (2) for a matrix $A$ be

$$d_1,\ d_1d_2,\ \ldots,\ d_1d_2 \cdots d_k \qquad (k \leq \min(m, n)).$$

We call these numbers the invariant factors of $A$ of orders $1, 2, \ldots, k$ respectively. The number $k$ is called the rank of the matrix $A$. Let

$$d_1 \cdots d_i = p_1^{e_{i1}} \cdots p_l^{e_{il}}$$

be the standard prime factorization of an invariant factor. We call the prime power $p_j^{e_{ij}}$ an elementary divisor of the matrix $A$.

It is easy to see that the indices of the elementary divisors satisfy:

$$e_{1j} \leq e_{2j} \leq \cdots \leq e_{kj} \qquad (1 \leq j \leq l).$$

It also follows from the definition that if two matrices have the same invariant factors, then they have the same rank and the same elementary divisors. Conversely if the ranks are the same and the elementary divisors are the same, then the invariant factors are the same. Therefore we have

Theorem 5.4. A necessary and sufficient condition for two m by n matrices to be equivalent is that they should have the same rank and the same elementary divisors. D

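14.6 Applications

Let us consider the solutions to the system of linear equations

$$y_i = \sum_{j=1}^{n} x_ja_{ji} \qquad (1 \leq i \leq m, \quad n \geq m), \tag{1}$$

with integer coefficients, and given integers $y_i$; that is we consider the integer solutions to

$$y = xA. \tag{2}$$

We saw in the previous section that there are two modular matrices $U$ ($= U^{(n)}$) and $V$ ($= V^{(m)}$) such that

$$UAV = \begin{pmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_1d_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & d_1 \cdots d_m \\ 0 & 0 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix} = D. \tag{3}$$

We now let

$$yV = y^* = (y'_1, \ldots, y'_m), \qquad xU^{-1} = x^* = (x'_1, \ldots, x'_n),$$

so that, from (2),

$$y^* = x^*D, \tag{4}$$

or

$$y'_i = d_1 \cdots d_i\,x'_i \qquad (1 \leq i \leq m). \tag{5}$$

A necessary and sufficient condition for (1) to have a solution is that (5) has a solution. If $d_1 \cdots d_k \neq 0$, $d_{k+1} = 0$, then a necessary and sufficient condition for (5) to have a solution is that

$$y'_{k+1} = \cdots = y'_m = 0. \tag{6}$$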

From (3) we have (7)

Now, if (6) holds, then we have, by (7), that (8)

conversely, if (8) holds, then

and from Theorem 5.2 we have y~+ 1

= ... = y~ = 0,

which is formula (6). Therefore a necessary and sufficient condition for (1) to have a solution is that (8) holds; that is, we have

Theorem 6.1. A necessary and sufficient condition for the system (1) to have a solution is that the two matrices $A$ and $\begin{pmatrix} A \\ y \end{pmatrix}$ have the same invariant factors. □

If (5) holds, then we have

$$x'_1 = \frac{y'_1}{d_1}, \quad x'_2 = \frac{y'_2}{d_1d_2}, \quad \ldots, \quad x'_k = \frac{y'_k}{d_1 \cdots d_k}. \tag{9}$$

This means that $x'_1, x'_2, \ldots, x'_k$ are uniquely determined, and $x'_{k+1}, \ldots, x'_n$ can be any integers. Thus, if $t_1, \ldots, t_{n-k}$ are $n - k$ arbitrary integers, then

$$x_i = \sum_{j=1}^{k} x'_ju_{ji} + \sum_{l=1}^{n-k} t_lu_{k+l,i} = x_i^{(0)} + \sum_{l=1}^{n-k} t_lu_{k+l,i} \qquad (1 \leq i \leq n), \tag{10}$$

where $x_1^{(0)}, \ldots, x_n^{(0)}$ is a set of solutions to (1) when $t_1 = t_2 = \cdots = t_{n-k} = 0$.

14.7 Matrix Factorizations and Standard Prime Matrices

Definition 7.1. Let $A$ and $B$ be two square matrices, and suppose that there is a matrix $C$ such that $A = CB$. Then we call $B$ a right divisor of $A$, or we say that $B$ right-divides $A$, and we write $B \mid A$. Clearly we have (i) $A \mid A$; (ii) if $A \mid B$ and $B \mid C$, then $A \mid C$. We can define a left divisor and left-divide similarly.

Definition 7.2. Let $A$ be a non-singular square matrix which is not a modular matrix. Suppose that for any factorization $A = BC$, we always have either $B$ or $C$ a modular matrix. Then we call $A$ an irreducible matrix or a prime matrix. Otherwise we call $A$ a composite matrix.

Let A be a non-singular square matrix. By Theorem 5.1 there are two modular matrices U and V such that (1)

It is easy to decompose $[d_1, d_1d_2, \ldots, d_1 \cdots d_n]$ into prime matrices. More specifically its factors are of the form $P = [1, \ldots, 1, p, 1, \ldots, 1]$ where $p$ is a prime number, and the number of such factors is the number of prime factors in $d_1 \cdot d_1d_2 \cdots d_1d_2 \cdots d_n$. Therefore we have

= [1, ... , l,p, 1, ... ,1],

(2)

where any two $P$ can be interchanged. Consequently we have the following two theorems.

Theorem 7.1. A necessary and sufficient condition for a square matrix to be a prime matrix is that its determinant is a prime number. □
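The decomposition of a diagonal matrix $[e_1, \ldots, e_n]$ with positive elements into factors of the form $P = [1, \ldots, 1, p, 1, \ldots, 1]$ can be sketched as follows. This is an illustration only: the function names are hypothetical, diagonal matrices are represented by the list of their diagonal elements, and the prime factors are found by trial division.

```python
def prime_factors(n):
    """Prime factors of a positive integer n, with multiplicity."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def prime_matrix_factors(diag):
    """Split the diagonal matrix [e_1, ..., e_n] into factors [1,...,p,...,1]."""
    factors = []
    for pos, e in enumerate(diag):
        for p in prime_factors(e):
            P = [1] * len(diag)
            P[pos] = p                  # prime p in position pos, 1 elsewhere
            factors.append(P)
    return factors
```

For the diagonal matrix $[2, 12]$ this produces four factors, which agrees with the number of prime factors of the determinant $24 = 2^3 \cdot 3$. Since the factors are diagonal, any two of them can be interchanged.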

Theorem 7.2. Any composite square matrix can be factorized into a product of a finite number of prime matrices, and the number of factors is equal to the number of prime divisors of the determinant of the matrix. □

This type of factorization does not possess the uniqueness property. This is because we can always insert $WW^{-1}$ (where $W$ is a modular matrix) in between two factors $P_i$, $P_{i+1}$ so that $P_iW$ and $W^{-1}P_{i+1}$ are now different factors from $P_i$ and $P_{i+1}$. However, if we impose certain restrictions on the form of the factors, then we may have a sort of uniqueness theorem.

Definition 7.3. If a prime matrix is expressible as $U^{-1}[1, \ldots, 1, p]U$, where $U$ is a modular matrix, then we call it a standard prime matrix.

It is clear that every prime matrix must be left associated to a standard prime matrix. We now rewrite (2) as the following:

(3)

where any two $V^{-1}PV$ can be interchanged, and they are all standard prime matrices. Therefore we have:

Theorem 7.3. Any composite square matrix must be left associated with a product of a finite number of interchangeable standard prime matrices. □

Definition 7.4. By the standard factorization of $A$ we mean

$$A = WV^{-1}P_1P_2 \cdots P_sV, \tag{4}$$

where $W$ and $V$ are modular matrices, and $P_1, \ldots, P_s$ are of the form in (2). It is clear that $P_1, \ldots, P_s$ are uniquely determined by $A$, apart from ordering. Before we prove our uniqueness theorem we need the following definition:

Definition 7.5. Let $A$ be a non-singular square matrix. A modular matrix $U$ satisfying

$$AUA_0 \equiv 0 \pmod{|A|}$$

is called an adjoint modular matrix of $A$. Here $A_0$ is the adjoint matrix of $A$, and $|A|$ is the absolute value of the determinant of $A$.

Since the elements of the matrix $AUA_0$ are all multiples of $|A|$, it follows that the elements of the matrix $(1/|A|)AUA_0$ are integers. Moreover, on taking the determinant, we see that it is actually a modular matrix.

Theorem 7.4. The set of all adjoint modular matrices of $A$ forms a group.

391


Proof Let U and V be adjoint modular matrices of A. From AUAoAVAo =

± IAI' AUVA o == 0

(modIAI2)

we see that UV is an adjoint modular matrix of A. Also, from

we have 1 -AUA o ' AU-lAo == 0 IAI

(modIAI),

and (l/IAI)AUA o is a modular matrix, so that U- 1 is also an adjoint modular matrix of A. The theorem therefore follows. - D Definition 7.6. The group of adjoint modular matrices of A is called the adjoint group of A. Theorem 7.5. Let (5)

be any standard factorization of $A$. Then there exists an adjoint modular matrix $U$ of $A$ such that $V_1 = VU$, $W_1 = (\pm 1/|A|)AU^{-1}A_0WU$, where $W$ and $V$ are the matrices in (4).

Proof. From (4) and (5) we have

$$A = WV^{-1}P_1P_2 \cdots P_sV = W_1V_1^{-1}P_1P_2 \cdots P_sV_1,$$

$$AV^{-1}V_1 = WV^{-1}V_1W_1^{-1}A.$$

It follows easily that $U = V^{-1}V_1$ is an adjoint modular matrix of $A$, and that

$$\frac{\pm 1}{|A|}AUA_0 = WUW_1^{-1}.$$

□

This theorem therefore gives the relationship between any two standard factorizations of $A$. Concerning the interchangeability of the standard prime matrices we have the following two theorems:

Theorem 7.6. Let $P = [1, \ldots, 1, p]$ and $Q = U^{-1}[1, \ldots, 1, q]U$ be two interchangeable standard prime matrices. Then $Q$ must be of the form (6)

where $r = q$ or $1$. Also, if $r = q$, then $Q_1 = I$, and if $r = 1$, then $Q_1$ is a standard prime matrix.



Proof Let

From $PQ = QP$ we have

$$\begin{pmatrix} Q_1 & x \\ py & pr \end{pmatrix} = \begin{pmatrix} Q_1 & xp \\ y & rp \end{pmatrix}. \tag{7}$$

It follows at once that $y = (0, \ldots, 0)$.

Next, let

so that, from $UQ = [1, \ldots, 1, q]U$, we have

$$\begin{pmatrix} U_1Q_1 & x_1r \\ y_1Q_1 & ur \end{pmatrix} = \begin{pmatrix} U_1 & x_1 \\ qy_1 & qu \end{pmatrix}. \tag{8}$$

If $u \neq 0$, then we have $r = q$. In this case, from $x_1r = x_1$, we deduce that

and hence $u = \pm 1$, and that $U_1$ is a modular matrix. From $U_1Q_1 = U_1$ it follows that $Q_1 = I$. If $u = 0$, then

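so that $r = 1$. From $U_1Q_1 = U_1$ and the fact that $Q_1$ cannot be $I$, it follows that $U_1$ is singular. By Theorem 5.1 there are two modular matrices $V_1$ and $W_1$ such that $V_1U_1W_1 = [\lambda_1, \ldots, \lambda_{n-2}, 0]$, $\lambda_i \geq 0$. Therefore, if we let

$$V = \begin{pmatrix} V_1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad W = \begin{pmatrix} W_1 & 0 \\ 0 & 1 \end{pmatrix},$$

then

$$X = VUW = \begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 & 0 & c_1 \\
0 & \lambda_2 & \cdots & 0 & 0 & c_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & \cdots & \lambda_{n-2} & 0 & c_{n-2} \\
0 & 0 & \cdots & 0 & 0 & c_{n-1} \\
d_1 & d_2 & \cdots & d_{n-2} & d_{n-1} & 0
\end{pmatrix}.$$

From $|c_{n-1}d_{n-1}\lambda_1 \cdots \lambda_{n-2}| = |X| = 1$ we see that $\lambda_1 = \cdots = \lambda_{n-2} = 1$, $c_{n-1} = \pm 1$, $d_{n-1} = \pm 1$; here $|X|$ denotes the absolute value of the determinant of $X$. Next we let

$$Y = \begin{pmatrix}
1 & 0 & \cdots & 0 & \pm c_1 & 0 \\
0 & 1 & \cdots & 0 & \pm c_2 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & \pm c_{n-2} & 0 \\
0 & 0 & \cdots & 0 & 1 & 0 \\
0 & 0 & \cdots & 0 & 0 & 1
\end{pmatrix}, \qquad
Z = \begin{pmatrix}
1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0 & 0 \\
\pm d_1 & \pm d_2 & \cdots & \pm d_{n-2} & 1 & 0 \\
0 & 0 & \cdots & 0 & 0 & 1
\end{pmatrix} = \begin{pmatrix} Z_1 & 0 \\ 0 & 1 \end{pmatrix},$$

where in the matrices $Y$ and $Z$ the ambiguous signs are determined by the opposite signs of $c_{n-1}$ and $d_{n-1}$ respectively. We then have

$$YXZ = \begin{pmatrix}
1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0 & 0 \\
0 & 0 & \cdots & 0 & 0 & c_{n-1} \\
0 & 0 & \cdots & 0 & d_{n-1} & 0
\end{pmatrix}.$$

It follows from

$$XW^{-1}QW = VUQW = V[1, \ldots, 1, q]UW = [1, \ldots, 1, q]X$$

that

$$YXZ \cdot Z^{-1}W^{-1}QWZ = Y[1, \ldots, 1, q]XZ = [1, \ldots, 1, q]YXZ,$$

or

$$(WZ)^{-1}Q(WZ) = (YXZ)^{-1}[1, \ldots, 1, q]YXZ = [1, \ldots, 1, q, 1].$$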

Therefore we have

This proves that $Q_1$ is a standard prime matrix. □



Theorem 7.7. Corresponding to any set of interchangeable standard prime matrices $P_1, \ldots, P_s$, there is a modular matrix $U$ such that $U^{-1}P_iU$ are all diagonal matrices.

Proof. The theorem is trivial if $s = 1$. We now proceed by induction and assume that the theorem holds when the number of matrices is less than $s$. Corresponding to $P_s$, there is a modular matrix $U_s$ such that $U_s^{-1}P_sU_s = [1, \ldots, 1, p_s]$. Let

$$U_s^{-1}P_iU_s = Q_i, \qquad i = 1, 2, \ldots, s.$$

It is clear that these $Q_i$ are interchangeable standard prime matrices. By Theorem 7.6 we have

$$Q_i = \begin{pmatrix} Q_i^* & 0 \\ 0 & r_i \end{pmatrix}, \qquad 1 \leq i \leq s,$$

where $r_i = p_i$ or $1$. Also if $r_i = p_i$, then $Q_i^* = I$, and $Q_i$ is of diagonal form; if $r_i = 1$, then $Q_i^*$ is a standard prime matrix. Since the matrices $Q_i$ are interchangeable we may assume that $r_1 = r_2 = \cdots = r_t = 1$, $r_{t+1} = p_{t+1}, \ldots, r_s = p_s$ ($0 \leq t \leq s - 1$). If $t = 0$, then the theorem follows at once. Otherwise, from the induction hypothesis, there is, corresponding to the interchangeable standard prime matrices $Q_1^*, \ldots, Q_t^*$, a modular matrix $U^*$ such that $U^{*-1}Q_i^*U^*$ ($1 \leq i \leq t$) are of diagonal form. Now take

$$U_1 = \begin{pmatrix} U^* & 0 \\ 0 & 1 \end{pmatrix}$$

so that $U_1^{-1}Q_iU_1$ ($1 \leq i \leq s$) are all of diagonal form. The theorem follows on taking $U = U_sU_1$. □

Exercise. Examine the properties of the adjoint group of the matrix $A = [d_1, d_1d_2, \ldots, d_1 \cdots d_n]$.
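As a possible starting point for this exercise, the defining congruence of Definition 7.5 can be tested numerically. The Python sketch below is illustrative only: the helper names (`det`, `adjugate`, `matmul`, `is_adjoint_modular`) are introduced here and are not from the text, and the adjoint matrix $A_0$ is assumed to be the classical adjugate, so that $AA_0 = (\det A)I$.

```python
def det(M):
    """Determinant of a small integer matrix by cofactor expansion."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total


def adjugate(M):
    """Adjugate (transposed cofactor matrix), so that M * adjugate(M) = det(M) * I."""
    n = len(M)
    if n == 1:
        return [[1]]
    cof = [[(-1) ** (i + j)
            * det([r[:j] + r[j + 1:] for k, r in enumerate(M) if k != i])
            for j in range(n)] for i in range(n)]
    return [[cof[j][i] for j in range(n)] for i in range(n)]


def matmul(X, Y):
    """Product of two integer matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def is_adjoint_modular(A, U):
    """Check that U is modular and A U A_0 == 0 (mod |A|), as in Definition 7.5."""
    if abs(det(U)) != 1:
        return False
    m = abs(det(A))
    product = matmul(matmul(A, U), adjugate(A))
    return all(entry % m == 0 for row in product for entry in row)
```

For $A = [1, 3]$ (that is, $d_1 = 1$, $d_2 = 3$) the check accepts a modular $U$ whose upper right entry is a multiple of 3 and rejects one whose upper right entry is 1, which may suggest what to look for in the general case.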

14.8 The Greatest Common Factor and the Least Common Multiple

Definition 8.1. Let $A$ and $B$ be two square matrices, not both equal to $0$. Let $D$ be a common right divisor of $A$ and $B$ such that any common right divisor is also a right divisor of $D$. Then we call $D$ a right greatest common divisor of $A$ and $B$. Suppose that $A$ and $B$ are both right divisors of the square matrix $M$ ($\neq 0$), and that $M$ is a right divisor of any square matrix having both $A$ and $B$ as right divisors. Then we call $M$ a left least common multiple of $A$ and $B$.

Similar definitions for left greatest common divisors and right least common multiples can be given. We shall only discuss right greatest common divisors and left least common multiples and, for the sake of simplicity, we shall call them greatest common divisors and least common multiples.


We define the sum of the two matrices $A = (a_{ij})$ and $B = (b_{ij})$ by $A + B = (a_{ij} + b_{ij})$.

Theorem 8.1. Let $A$ and $B$ be two square matrices which are not both $0$. Then their greatest common divisor $D$ exists. Moreover there are square matrices $P$ and $Q$ such that

$$PA + QB = D.$$

Proof. Consider the $2n$ by $n$ matrix

By Theorem 5.1 there are two modular matrices $U$ ($= U^{(2n)}$), $V$ ($= V^{(n)}$) such that

We denote by

where $U_{ij}$ are $n$ by $n$ matrices. Then, from the above, we have (1)

and so (2)

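and hence any right divisor of $A$ and $B$ must be a right divisor of $D$. Also, if we let

$$\begin{pmatrix} U_{11} & U_{12} \\ U_{21} & U_{22} \end{pmatrix}^{-1} = \begin{pmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{pmatrix},$$

where $X_{ij}$ are $n$ by $n$ matrices, then from (1) we have

$$\begin{pmatrix} A \\ B \end{pmatrix} = \begin{pmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{pmatrix}\begin{pmatrix} D \\ 0 \end{pmatrix},$$

and so $A = X_{11}D$, $B = X_{21}D$, and therefore $D$ is a greatest common divisor of $A$ and $B$. On taking $U_{11} = P$, $U_{12} = Q$ the theorem is proved. □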
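The construction in this proof can be imitated on small examples. The following Python sketch is an illustration under stated assumptions rather than the method of the text: instead of invoking Theorem 5.1 it applies integer row operations (a Euclidean elimination) to the stacked $2n$ by $n$ matrix formed from $A$ and $B$, recording the operations so that the first $n$ rows yield $D$ together with $P$ and $Q$. The function names `right_gcd` and `matmul` are hypothetical.

```python
def matmul(X, Y):
    """Product of two integer matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def right_gcd(A, B):
    """Return (D, P, Q) with P*A + Q*B = D, where D is a right gcd of A and B.

    Rows of the stacked matrix [A; B] are reduced by unimodular row
    operations; an identity block appended on the right records them.
    """
    n = len(A)
    rows = [list(r) + [1 if i == k else 0 for k in range(2 * n)]
            for i, r in enumerate(list(A) + list(B))]
    r = 0  # index of the next pivot row
    for c in range(n):
        while True:
            nonzero = [i for i in range(r, 2 * n) if rows[i][c] != 0]
            if not nonzero:
                break  # column already cleared below row r
            piv = min(nonzero, key=lambda i: abs(rows[i][c]))
            rows[r], rows[piv] = rows[piv], rows[r]
            cleared = True
            for i in range(r + 1, 2 * n):
                q = rows[i][c] // rows[r][c]
                rows[i] = [x - q * y for x, y in zip(rows[i], rows[r])]
                if rows[i][c] != 0:
                    cleared = False  # a smaller remainder is left; repeat
            if cleared:
                r += 1
                break
    D = [row[:n] for row in rows[:n]]
    P = [row[n:2 * n] for row in rows[:n]]
    Q = [row[2 * n:] for row in rows[:n]]
    return D, P, Q
```

With $A = [2, 2]$ and $B = [3, 3]$ the sketch returns a modular $D$, in line with the fact that these two matrices have no non-trivial common right divisor; by Theorem 8.2 the result is only determined up to a modular left factor.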



Theorem 8.2. Let the square matrices $A$ and $B$ have a non-singular greatest common divisor $D$. Then any greatest common divisor of $A$ and $B$ must be of the form $UD$, where $U$ is a modular matrix.

Proof. Let $D_1$ be any greatest common divisor. Then, from the definition, we have $D = RD_1$ and $D_1 = SD$, and hence $D = RSD$. Since $D$ is non-singular, $RS = I$, and on taking the determinants we see that $R$ and $S$ are modular matrices. □

We now consider least common multiples. If the two matrices are both singular, then a least common multiple need not exist. For example

$$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$$

have no least common multiple. This is because every matrix having $\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ as a right divisor must take the form $\begin{pmatrix} a & 0 \\ c & 0 \end{pmatrix}$, and every matrix having $\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$ as a right divisor must take the form $\begin{pmatrix} a & a \\ c & c \end{pmatrix}$, and these two forms are equal only when $a = c = 0$. However, we have the following:

Theorem 8.3. Let $A$ and $B$ be two non-singular square matrices. Then their least common multiple $M$ exists. Moreover, $M$ is non-singular, and any least common multiple is of the form $UM$ where $U$ is a modular matrix.

Proof. From (1) we have

We let

so that $M$ is a common multiple of $A$ and $B$. We now prove that $M$ is a least common multiple. Let $M_1$ be any common multiple of $A$ and $B$. Then a greatest common divisor $M_2$ of $M$ and $M_1$ is also a common multiple of $A$ and $B$. Let $M_2 = KA = LB$

so that (4)

Denote by $A_0$ and $B_0$ the adjoint matrices of $A$ and $B$, so that $AA_0 = aI$, and $BB_0 = bI$ where $a$ and $b$ are the determinants of $A$ and $B$. Since $A$, $B$ are non-singular, we have $a \neq 0$, $b \neq 0$ and, from (4),

$$U_{21} = HK, \qquad -U_{22} = HL.$$


Therefore we have, from (3), that

so that H is a modular matrix and H- 1 exists. We see from

that $M$ is a least common multiple. We next prove that $M$ is non-singular. From the definition of a least common multiple it suffices to prove that $A$ and $B$ have a non-singular common multiple. From Theorem 5.1 there are two modular matrices $U_1$, $V_1$ such that

Let

It is easily seen that $M^*$ is a non-singular square matrix, and that

This matrix $M^*$ thus serves our purpose. If $M_3$ is any least common multiple, then from the definition, we have

and so

$$M = EFM, \qquad I = EF;$$

thus $E$, $F$ are modular matrices, and the theorem is proved. □

Theorem 8.4. Let $A$ be a square matrix. Then, corresponding to each non-zero integer $m$, there are two square matrices $R$ and $Q$ such that (1) $A = mQ$ or (2) $A = mQ + R$, and $0 < |R| < |m|^n$, where $|R|$ denotes the absolute value of the determinant of $R$.

Proof. By Theorem 5.1 there are two modular matrices $U$ and $V$ such that

$$A = U[d_1, \ldots, d_n]V \qquad (d_i \geq 0,\ 1 \leq i \leq n).$$

There are integers $q_i$ and $r_i$ ($> 0$) such that

$$d_i = mq_i + r_i, \qquad 0 < r_i \leq |m| \qquad (1 \leq i \leq n).$$

Let


so that (5)

If $r_i = |m|$ ($1 \leq i \leq n$), then $R_1 = |m|I = \pm mI$ so that, from (5), we have

$$A = mU(Q_1 \pm I)V = mQ,$$

which proves (1). If there exists $j$ such that $0 < r_j < |m|$, then $0 < |R_1| = r_1r_2 \cdots r_n < |m|^n$, and so, from (5), we have

$$A = mUQ_1V + UR_1V = mQ + R$$

and

This proves (2). □

Theorem 8.5. Let $B$ be a non-singular square matrix. Then, corresponding to any square matrix $A$, there exist two square matrices $Q$ and $C$ such that (1) $A = QB$, or (2) $A = QB + C$, and $0 < |C| < |B|$.

Proof. Let $B_0$ be the adjoint matrix of $B$ so that $BB_0 = B_0B = bI$, where $b$ is the determinant of $B$. From the previous theorem there are two square matrices $Q$ and $R$ such that

$$AB_0 = bQ \tag{6}$$

or

$$AB_0 = bQ + R,$$

0
••• ,xn } the set of all linear forms

1) =

with integer coefficients al> ... ,an. If

is another linear form in

1),

then we define

Definition 9.1. Let $\mathfrak{M}$ be a subset of $\mathfrak{D}$ with the property that if $y_1$, $y_2$ are in $\mathfrak{M}$, then so are $y_1 \pm y_2$. Then we call $\mathfrak{M}$ a module.

Clearly $\mathfrak{D}$ itself is a module. The subset of linear forms $0, \pm x_1, \pm 2x_1, \ldots$ also forms a module. The module formed by the subset whose only member is $0 = 0x_1 + \cdots + 0x_n$ will be excluded from our discussions.

Definition 9.2. Suppose that the module $\mathfrak{M}$ contains the forms $y_1, \ldots, y_l$ such that any form in $\mathfrak{M}$ can be expressed uniquely as

$$b_1y_1 + \cdots + b_ly_l,$$

where $b_1, \ldots, b_l$ are integers. Then we say that $\mathfrak{M}$ has dimension $l$, and we call $y_1, \ldots, y_l$ a basis for $\mathfrak{M}$.

It follows at once from the definition that $y_1, \ldots, y_l$ are linearly independent, that is, $a_1y_1 + \cdots + a_ly_l = 0$ implies $a_1 = \cdots = a_l = 0$.

Theorem 9.1. Every module has a basis and has dimension at most $n$.

Proof. Let $l$ ($\leq n$) be the integer such that, for each member of $\mathfrak{M}$, the coefficients of $x_{l+1}, \ldots, x_n$ are all zero, but there is a member of $\mathfrak{M}$ whose coefficient of $x_l$ is not zero. It follows that the set of coefficients of $x_l$ forms a non-zero integral modulus. We denote by $b_l$ the least positive integer in this integral modulus, and we let the corresponding linear form in $\mathfrak{M}$ be

Now the coefficient of $x_l$ for any member $y$ of $\mathfrak{M}$ must be a multiple of $b_l$ so that

$$y = y' + gy_l$$

where $g$ is an integer, and $y'$ is a linear form of the indeterminates $x_1, \ldots, x_{l-1}$.



Consider now the set of all such forms $y'$. We can determine an integer $l'$ ($\leq l - 1$) such that, for each $y'$, the coefficients of $x_{l'+1}, \ldots, x_{l-1}$ are all zero, but there is a $y'$ whose coefficient of $x_{l'}$ is not zero. As before we can determine a linear form

where $b'_{l'}$ is the least positive coefficient of $x_{l'}$ among all forms $y'$. Let $y' = y'' + g'y_{l'}$, where $g'$ is an integer and $y''$ is a linear form in $x_1, \ldots, x_{l'-1}$. Proceeding inductively we see that $\mathfrak{M}$ has a basis $y_l, y_{l'}, \ldots$ with at most $n$ members. The theorem is proved. □

Theorem 9.2. The dimension of a module is independent of the choice of bases.

Proof. Let $y_1, \ldots, y_l$ and $z_1, \ldots, z_{l'}$ be any two bases for a module $\mathfrak{M}$ and suppose, if possible, that $l \neq l'$. We may assume that $l > l'$. From the definition of a basis there are integers $a_{ij}$ and $b_{ij}$ such that

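$$\begin{pmatrix} y_1 \\ \vdots \\ y_l \end{pmatrix} = \begin{pmatrix} a_{11} & \cdots & a_{1l'} & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ a_{l1} & \cdots & a_{ll'} & 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} z_1 \\ \vdots \\ z_{l'} \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} z_1 \\ \vdots \\ z_{l'} \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} b_{11} & \cdots & b_{1l} \\ \vdots & & \vdots \\ b_{l'1} & \cdots & b_{l'l} \\ 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} y_1 \\ \vdots \\ y_l \end{pmatrix},$$

where $(a_{ij})$ and $(b_{ij})$ are $l \times l$ square matrices. Therefore

$$\begin{pmatrix} y_1 \\ \vdots \\ y_l \end{pmatrix} = (a_{ij})(b_{ij}) \begin{pmatrix} y_1 \\ \vdots \\ y_l \end{pmatrix}.$$

But $y_1, \ldots, y_l$ are linearly independent, so that $(a_{ij})(b_{ij}) = I$. Since the determinant of the left hand side is zero, we have a contradiction and therefore the theorem is proved. □

From now on we shall only consider modules with dimension $n$. Let $y_1, \ldots, y_n$ be a basis for a module $\mathfrak{M}$. Then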


14.9 Linear Modules

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}. \tag{1}$$

Therefore corresponding to each $n$ dimensional module and its basis $y_1, \ldots, y_n$ there is a square matrix

$$A = (a_{ij}) \tag{2}$$

which is non-singular because $y_1, \ldots, y_n$ are linearly independent. Conversely, corresponding to each non-singular square matrix $A$, we can determine a set of linearly independent linear forms $y_1, \ldots, y_n$ which can then be used as a basis to determine an $n$ dimensional module $\mathfrak{M}$. This then sets up a relationship between $n$ dimensional modules and non-singular square matrices of order $n$.

We now ask: What is the relationship between the two matrices corresponding to the two different bases of a module? Let $z_1, \ldots, z_n$ be another basis for $\mathfrak{M}$ with the corresponding matrix $B = (b_{ij})$ so that

Since both $y_1, \ldots, y_n$ and $z_1, \ldots, z_n$ are bases, there are two square matrices $U = (u_{ij})$, $V = (v_{ij})$ such that

and so

Since $y_1, \ldots, y_n$ are linearly independent, we deduce that $UV = I$, that is $U$ and $V$ are modular matrices. Now

so that

$$B = VA. \tag{3}$$



Therefore matrices corresponding to the same module are left associated. Conversely, two non-singular square matrices which are left associated correspond to the same module. If we partition the family of all non-singular matrices of order $n$ into classes by left association, then each class represents a module, and modules represented by distinct classes are different. We may therefore speak of "the matrix $A$ associated with $\mathfrak{M}$" to mean that $A$ is a member of the class of matrices which represent $\mathfrak{M}$. From Theorem 4.1 we see that, for an $n$ dimensional module $\mathfrak{M}$, we can select a basis $y_1, \ldots, y_n$ such that

(4)

where $a_{\nu\nu} > 0$ ($1 \leq \nu \leq n$), and $0 \leq a_{\mu\nu} < a_{\nu\nu}$ ($\mu > \nu$). This is a standard form for a basis, or a standard basis.

Theorem 9.3. Let $\mathfrak{M}$ and $\mathfrak{N}$ be two modules. A necessary and sufficient condition for $\mathfrak{N}$ to be included in $\mathfrak{M}$ is that the matrix associated with $\mathfrak{M}$ is a right divisor of the matrix associated with $\mathfrak{N}$.

Proof. Let the bases for $\mathfrak{M}$ and $\mathfrak{N}$ be $y_1, \ldots, y_n$ and $z_1, \ldots, z_n$ and let the associated matrices be $A = (a_{ij})$ and $B = (b_{ij})$ respectively. If $\mathfrak{N}$ is included in $\mathfrak{M}$, then

so that $B = CA$. Conversely, if $B = CA$, then

so that $\mathfrak{N}$ is included in $\mathfrak{M}$. □

Definition 9.3. Suppose that the difference between two linear forms $z_1$ and $z_2$ is a member of the module $\mathfrak{M}$. Then we say that $z_1$ and $z_2$ are congruent mod $\mathfrak{M}$, and we write $z_1 \equiv z_2 \pmod{\mathfrak{M}}$.

The relation of being congruent mod $\mathfrak{M}$ is reflexive, symmetric and transitive, so that the family of all linear forms is partitioned into equivalence classes mod $\mathfrak{M}$. The number of such classes is called the norm of $\mathfrak{M}$, and is denoted by $N(\mathfrak{M})$, the existence of which has yet to be proved. Clearly $\mathfrak{M}$ itself is an equivalence class.


Theorem 9.4. Let $A$ correspond to the module $\mathfrak{M}$. Then $N(\mathfrak{M}) = |A|$.

Proof. Since the matrices associated with $\mathfrak{M}$ have the same absolute value for their determinants, we may assume that the basis chosen is the standard basis in (4). Any linear form

$$a_1x_1 + \cdots + a_nx_n$$

gives another one with $0 \leq a_n < a_{nn}$, by subtracting a suitable multiple of $y_n = a_{n1}x_1 + \cdots + a_{nn}x_n$. We may further subtract multiples of $y_{n-1} = a_{n-1,1}x_1 + \cdots + a_{n-1,n-1}x_{n-1}$ so that $0 \leq a_{n-1} < a_{n-1,n-1}$, and so on. Thus every linear form is congruent to a linear form

$$a_1x_1 + \cdots + a_nx_n, \qquad 0 \leq a_\nu < a_{\nu\nu} \qquad (1 \leq \nu \leq n).$$

The total number of such linear forms is $a_{11}a_{22} \cdots a_{nn} = |A|$, and moreover no two such linear forms are congruent. The theorem is proved. □

From Theorem 9.3 and Theorem 9.4 we have

Theorem 9.5. Let $\mathfrak{N} \subset \mathfrak{M}$ and let $A$ and $B$ be matrices associated with $\mathfrak{M}$ and $\mathfrak{N}$ respectively. Then, in the partitioning of $\mathfrak{M}$ into congruent classes mod $\mathfrak{N}$, the number of classes is equal to

$$\frac{N(\mathfrak{N})}{N(\mathfrak{M})} = \frac{|B|}{|A|}.$$

□
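The reduction used in the proof of Theorem 9.4 can be checked on a small case. The sketch below is an illustration only: `reduce_form` and `count_classes` are hypothetical names, a linear form is represented by its tuple of coefficients, and the matrix is assumed to be a standard basis as in (4), that is, lower triangular with positive diagonal.

```python
from itertools import product


def reduce_form(a, A):
    """Reduce the coefficient tuple a modulo the module with standard basis A.

    Row v of A holds the coefficients of y_v, which involve x_1, ..., x_v only,
    so multiples of y_n, y_{n-1}, ... are subtracted in turn.
    """
    a = list(a)
    for v in range(len(A) - 1, -1, -1):
        q = a[v] // A[v][v]
        for j in range(v + 1):
            a[j] -= q * A[v][j]
    return tuple(a)


def count_classes(A, box):
    """Count distinct reduced forms among all forms with coefficients in [-box, box]."""
    n = len(A)
    reps = {reduce_form(a, A) for a in product(range(-box, box + 1), repeat=n)}
    return len(reps)
```

For the basis $y_1 = 2x_1$, $y_2 = x_1 + 3x_2$ the count over a modest box of forms comes out as $6 = 2 \cdot 3$, which agrees with $|A|$.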

The set $\mathfrak{D} = \{x_1, \ldots, x_n\}$ with indeterminates $x_1, \ldots, x_n$ can also be represented by other indeterminates. If we let

where $U = (u_{ij})$ is a modular matrix, then $\mathfrak{D}$ can also be represented by $x_1', \ldots, x_n'$; that is $\mathfrak{D} = \{x_1, \ldots, x_n\} = \{x_1', \ldots, x_n'\}$. Let a module $\mathfrak{M}$, together with its basis $y_1, \ldots, y_n$ corresponding to the indeterminates $x_1, \ldots, x_n$, have the associated matrix $A$. We now consider the associated matrix corresponding to a change of indeterminates to $x_1', \ldots, x_n'$. From

we see that the required matrix corresponding to the indeterminates $x_1', \ldots, x_n'$ is $AU$. This means that the relation of right association corresponds to the change of indeterminates, or the change of basis for the representation of $\mathfrak{D}$. Also, from (3) we see that the relation of left association corresponds to the change of basis for the module. It therefore follows from Theorem 5.1 that each fixed $n$ dimensional module $\mathfrak{M}$, after suitable changes of bases for the module and for $\mathfrak{D}$, has an associated matrix which is a diagonal matrix ($d_1 > 0, \ldots, d_n > 0$).

From Theorem 7.2 and Theorem 9.5 we have

Theorem 9.6. Let $\mathfrak{M}$ be an $n$ dimensional module. Then there is a chain (5)

such that

($1 \leq i \leq l$) are prime numbers. □

The set of forms belonging to both the modules $\mathfrak{M}_1$ and $\mathfrak{M}_2$ is also a module which is called the intersection of $\mathfrak{M}_1$ and $\mathfrak{M}_2$, and we denote it by $\mathfrak{M}_m$. Also the set of forms obtained from addition and subtraction of members belonging to $\mathfrak{M}_1$ and $\mathfrak{M}_2$ forms a module which is called the sum of $\mathfrak{M}_1$ and $\mathfrak{M}_2$, and we denote it by $\mathfrak{M}_d$. We then have:

Theorem 9.7. Let the matrices $M_1$, $M_2$, $M_m$, $M_d$ be associated with the modules $\mathfrak{M}_1$, $\mathfrak{M}_2$, $\mathfrak{M}_m$, $\mathfrak{M}_d$ respectively. Then $M_m$ is a least common multiple of $M_1$ and $M_2$, and $M_d$ is a greatest common divisor of $M_1$ and $M_2$.

Proof. If $M_3 = B_1M_1 = B_2M_2$ is a common multiple of $M_1$ and $M_2$, and $\mathfrak{M}_3$ is the module with which the matrix $M_3$ is associated, then

and hence

Thus $M_m$ is a least common multiple of $M_1$ and $M_2$. The proof that $M_d$ is a greatest common divisor of $M_1$ and $M_2$ is similar. □

Chapter 15. p-adic Numbers

15.1 Introduction

The purpose of this chapter is to introduce the theory of p-adic numbers due to Hensel. This theory has extensive applications in number theory, algebraic geometry and algebraic functions, and is an important theory in the study of modern algebra. Before we give the rigorous definitions we give a simple introduction as to how we obtain the p-adic numbers. We recall the method of solution to the congruence

$$f(x) \equiv 0 \pmod{p^l} \tag{1}$$

which we discussed in chapter 2; here $f(x)$ is a polynomial with integer coefficients and $p$ is a prime number. Our method was first to solve the congruence

$$f(x) \equiv 0 \pmod{p}. \tag{2}$$

If (2) has a solution $a_0$ ($0 \leq a_0 < p$) and $f'(a_0) \not\equiv 0 \pmod{p}$, then we let $x = a_0 + py$, and we consider the congruence

$$f(a_0 + py) \equiv 0 \pmod{p^2},$$

O,::;;y --n

10gqJ(ao)

(1)

We may view $(\log\varphi(a))/\log\varphi(a_0)$ as a lower bound for the set of rational numbers satisfying (1). If $\varphi'$ and $\varphi$ are equivalent, then $(\log\varphi'(a))/\log\varphi'(a_0)$ also acts as a lower bound for this set of rational numbers. It follows that, for any rational $a \neq 0$,

$$\frac{\log\varphi(a)}{\log\varphi(a_0)} = \frac{\log\varphi'(a)}{\log\varphi'(a_0)}.$$

This means that there exists a positive constant $s$, depending only on $\varphi$ and $\varphi'$, such that

$$\frac{\log\varphi'(a)}{\log\varphi(a)} = \frac{\log\varphi'(a_0)}{\log\varphi(a_0)} = s > 0$$

holds for all rational $a \neq 0$. Therefore $\varphi'(a) = \varphi^s(a)$. □

Definition 3.2. Let $\varphi$ be a valuation and suppose that there exists a positive integer $n_0 > 1$ such that $\varphi(n_0) > 1$. Then we call $\varphi$ an Archimedian valuation. A non-Archimedian valuation $\varphi$ is one such that $\varphi(n) \leq 1$ for all positive integers $n$.

The valuation $\varphi(a) = |a|$ is Archimedian; the identical valuation $\varphi_0$ and the p-adic valuation $\varphi(a) = |a|_p$ are non-Archimedian.
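To make the distinction concrete, the p-adic valuation can be computed exactly with rational arithmetic. The following Python sketch is illustrative; the name `abs_p` is introduced here, and the normalization $|a|_p = p^{-k}$, where $p^k$ is the exact power of $p$ in $a$, is the usual one and is assumed to match the definition given earlier in the chapter.

```python
from fractions import Fraction


def abs_p(a, p):
    """p-adic absolute value |a|_p of a rational number a, with |0|_p = 0."""
    a = Fraction(a)
    if a == 0:
        return Fraction(0)
    num, den, k = a.numerator, a.denominator, 0
    while num % p == 0:   # powers of p in the numerator raise the order
        num //= p
        k += 1
    while den % p == 0:   # powers of p in the denominator lower it
        den //= p
        k -= 1
    return Fraction(1, p) ** k
```

Every positive integer $n$ then satisfies `abs_p(n, p) <= 1`, which is the non-Archimedian condition of Definition 3.2, whereas the ordinary absolute value already exceeds 1 at $n = 2$.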

15.4 Archimedian Valuations

Theorem 4.1. Any Archimedian valuation is equivalent to the absolute value valuation.

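Proof. Let $\varphi$ be an Archimedian valuation and let $n$, $n'$ be two integers greater than 1. We represent $n'$ as

$$n' = a_0 + a_1n + \cdots + a_\nu n^\nu, \qquad 0 \leq a_i < n, \qquad a_\nu \neq 0.$$

Then

$$\varphi(n') \leq \varphi(a_0) + \varphi(a_1)\varphi(n) + \cdots + \varphi(a_\nu)\varphi(n)^\nu.$$

From $\varphi(a_i) \leq a_i < n$ ($i = 0, 1, \ldots, \nu$), we see that

$$\varphi(n') \leq n(1 + \varphi(n) + \varphi(n)^2 + \cdots + \varphi(n)^\nu) \leq n(1 + \nu)\max(1, \varphi(n)^\nu).$$

From the representation of $n'$ we know that $n^\nu \leq n'$ so that $\nu \leq \log n'/\log n$, and hence

$$\varphi(n') \leq n\left(1 + \frac{\log n'}{\log n}\right)\max(1, \varphi(n)^{\log n'/\log n}).$$

Substituting $n'^h$ for $n'$ we have

$$\varphi(n')^h \leq n\left(1 + h\frac{\log n'}{\log n}\right)\max(1, \varphi(n)^{h\log n'/\log n}),$$

or

$$\varphi(n') \leq \left(n\left(1 + h\frac{\log n'}{\log n}\right)\right)^{1/h}\max(1, \varphi(n)^{\log n'/\log n}).$$

Letting $h \to \infty$ we have

$$\varphi(n') \leq \max(1, \varphi(n)^{\log n'/\log n}). \tag{1}$$

This holds for all $n, n' > 1$. By the Archimedian property there exists $n_0 > 1$ such that $\varphi(n_0) > 1$. Therefore $1 < \max(1, \varphi(n)^{\log n_0/\log n})$ and whence $1 < \varphi(n)^{\log n_0/\log n}$. Therefore $\varphi(n) > 1$ whenever $n > 1$ and (1) may be rewritten as

$$\varphi(n') \leq \varphi(n)^{\log n'/\log n},$$

or

$$\frac{\log\varphi(n')}{\log n'} \leq \frac{\log\varphi(n)}{\log n}.$$

By the symmetry of $n$ and $n'$ we deduce that

$$\frac{\log\varphi(n')}{\log n'} = \frac{\log\varphi(n)}{\log n},$$

and this implies the existence of a positive constant $s$, depending only on $\varphi$, such that

$$\frac{\log\varphi(n)}{\log n} = s > 0, \qquad n > 1.$$

Therefore $\varphi(n) = n^s$. Also, from $\varphi(n) \leq n$ we know that $s \leq 1$. Next, from $\varphi(-n) = \varphi(n)$ we see that $\varphi(n) = |n|^s$ for all integers $n$ such that $|n| > 1$. Finally, from 2) we see that, for all rational numbers $a$,

$$\varphi(a) = |a|^s, \qquad 0 < s \leq 1.$$

□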

15.5 Non-Archimedian Valuations

We saw in §2 that for the p-adic valuation $\varphi(a) = |a|_p$ we have $|a + b|_p \leq \max(|a|_p, |b|_p)$ with equality when $|a|_p \neq |b|_p$. We now prove that all non-Archimedian valuations share this property.

Theorem 5.1. Let $\varphi$ be a non-Archimedian valuation. Then

3') $\varphi(a + b) \leq \max(\varphi(a), \varphi(b))$.

Also, if $\varphi(a) \neq \varphi(b)$, then

3'') $\varphi(a + b) = \max(\varphi(a), \varphi(b))$.

Conversely, if a valuation $\varphi$ satisfies 3') then $\varphi$ is non-Archimedian.


Proof. From the binomial theorem

$$(a + b)^n = a^n + \binom{n}{1}a^{n-1}b + \cdots + \binom{n}{n-1}ab^{n-1} + b^n,$$

and the inequality $\varphi(n) \leq 1$, which holds for a non-Archimedian valuation $\varphi$ and positive integers $n$, we see that

$$\varphi((a + b)^n) \leq \varphi(a)^n + \varphi(a)^{n-1}\varphi(b) + \cdots + \varphi(a)\varphi(b)^{n-1} + \varphi(b)^n \leq (n + 1)\max(\varphi(a)^n, \varphi(b)^n),$$

or

$$\varphi(a + b) \leq (n + 1)^{1/n}\max(\varphi(a), \varphi(b)),$$

and 3') follows from this by letting $n \to \infty$. If $\varphi(a) \neq \varphi(b)$, we may assume that $\varphi(b) <$ 0) is always a valuation regardless of whether $s \leq 1$. This is because, from 3'),

$$\varphi^s(a + b) \leq \max(\varphi^s(a), \varphi^s(b)) \leq \varphi^s(a) + \varphi^s(b)$$

which gives 3).

Given any non-Archimedian valuation $\varphi$ we put

$$w(a) = -\log\varphi(a),$$

where the base of the logarithm is any real number greater than 1. The choice of the base has little relevance because $\varphi^s$ ($s > 0$) is also a non-Archimedian valuation. From the properties of $\varphi$ we see that $w$ has the following properties.

i) If $a \neq 0$, then $w(a)$ is a real number, and $w(0) = \infty$;
ii) $w(ab) = w(a) + w(b)$;


iii) $w(a + b) \geq \min(w(a), w(b))$;
iv) $w(a + b) = \min(w(a), w(b))$ if $w(a) \neq w(b)$.

If $\varphi$ is not the identical valuation, then there must be a rational $a_0$ such that $0 < w(a_0) < \infty$. We also note that $w(1) = 0$, $w(-a) = w(a)$ and $w(n) \geq 0$ for integers $n$.

Theorem 5.2. The following is a necessary and sufficient condition for the equivalence of two non-identical non-Archimedian valuations $\varphi$ and $\varphi'$. There exists $s > 0$ such that, for every rational $a \neq 0$,

$$w'(a) = sw(a),$$

where $w'(a) = -\log\varphi'(a)$ and $w(a) = -\log\varphi(a)$. □

Theorem 5.3. Every non-identical non-Archimedian valuation $\varphi$ is equivalent to some p-adic valuation $|a|_p$.

Proof. First $w(n) \geq 0$ for integers $n$, and from $\varphi \neq \varphi_0$ there exists an integer $m \neq 1$ such that $w(m) > 0$. We next show that the set of integers satisfying this inequality forms a modulus. This is easy since if $w(n) > 0$ and $w(n') > 0$, then $w(n \pm n') \geq \min(w(n), w(n')) > 0$ by iii). From Theorem 1.4.3 we know that there exists a least positive integer $g$ in the modulus such that $g$ divides every member of the modulus. Obviously $g > 1$, and we now prove that $g$ is a prime number. Suppose, if possible, that $g = g'g''$ ($g' > 1$, $g'' > 1$). Then

$$w(g) = w(g'g'') = w(g') + w(g'').$$

Since $w(g)$ is positive and $w(g')$, $w(g'')$ are non-negative, it follows that at least one of $w(g')$ and $w(g'')$ is positive. But $g'$ and $g''$ are less than $g$ and this contradicts $g$ being the least positive integer in the modulus. Therefore $g$ is a prime number which we shall now denote by $p$.

We have now proved that $w(n) = 0$ if $p \nmid n$, and $w(n) > 0$ if $p \mid n$. Corresponding to any rational number $a \neq 0$ we have the unique representation $a = (r/s)p^l$ ($s > 0$), where $r$, $s$ are coprime integers, $p \nmid rs$ and $l$ is an integer. Therefore

$$w(a) = w\left(\frac{r}{s}\right) + lw(p) = w(r) - w(s) + lw(p) = lw(p).$$

Now

$$w'(a) = -\log|a|_p = l\log p,$$

so that

$$w(a) = \frac{w(p)}{\log p}w'(a).$$

Let $s = w(p)/\log p$ and the result follows from Theorem 5.2. □


15.6 The $\varphi$-Extension of the Rationals

Readers who are familiar with Cantor's method for the construction of real numbers in mathematical analysis should have no difficulty with this and the next section.

Let $\varphi$ be a valuation, and we shall write $\{a_n\}$ for a sequence $a_1, a_2, \ldots, a_n, \ldots$ of rational numbers.

Definition 6.1. By a fundamental sequence, or a $\varphi$-convergent sequence, we mean a sequence $\{a_n\}$ which satisfies the following condition: Given any rational number $\varepsilon > 0$, there exists a positive integer $N$ ($= N(\varepsilon)$) such that $\varphi(a_m - a_n) < \varepsilon$ whenever $m, n > N$.

For example, the constant sequence, where $a_1 = a_2 = \cdots = a_n = \cdots = a$, is a fundamental sequence which we shall denote by $\{a\}$. If $\{a_n\}$ is a fundamental sequence, then there exists $A$ such that $\varphi(a_n) \leq A$ for all $n$. We define the sum, the difference and the product of two sequences by

$$\{a_n\} \pm \{b_n\} = \{a_n \pm b_n\}, \qquad \{a_n\}\{b_n\} = \{a_nb_n\}.$$

From

and

we see at once that the sum, the difference and the product of two fundamental sequences are fundamental.

Definition 6.2. Let $\{a_n\}$ be a sequence such that there exists a rational number $a$ with the following property: Given any rational number $\varepsilon > 0$, there exists a positive integer $N$ ($= N(\varepsilon)$) such that $\varphi(a_n - a) < \varepsilon$ whenever $n > N$. Then we say that $\{a_n\}$ has the $\varphi$-limit $a$, and we write $\varphi\text{-}\lim_{n \to \infty} a_n = a$.

Obviously the $\varphi$-limit of $\{a\}$ is $a$. From $\varphi(a_m - a_n) \leq \varphi(a_m - a) + \varphi(a_n - a)$ we see that the existence of a $\varphi$-limit implies that the sequence is fundamental. Note, however, that the converse does not follow; that is, not every fundamental sequence possesses a $\varphi$-limit.

Let $\{a_n\}$ and $\{b_n\}$ have the $\varphi$-limits $a$ and $b$. Then the sum, the difference and the product also have $\varphi$-limits, namely $a + b$, $a - b$ and $ab$ respectively. Also, if $\varphi\text{-}\lim_{n \to \infty} a_n = a$, then $\lim_{n \to \infty} \varphi(a_n) = \varphi(a)$.

Definition 6.3. By a null sequence we mean a sequence having the $\varphi$-limit 0. We denote by $\{0\}$ the class of all null sequences.

Example 1. If $\varphi(a) = |a|$, then $\{a_n = 1/n\}$ is a null sequence.

Example 2. If $\varphi(a) = |a|_p$, then $\{a_n = p^n\}$ is a null sequence.

It is easy to prove that the sum of two null sequences is a null sequence; so is the product of a null sequence and a fundamental sequence. We now define the quotient of two sequences. Let $\{b_n\}$ be a non-null sequence. Then the quotient $\{a_n\}/\{b_n\}$ is defined to be the sequence $\{a_nb_n^{-1}\}$. Observe that since $\{b_n\}$ is not a null sequence we may discard those terms which are zero without affecting the discussion. If $\{a_n\}$ is a fundamental sequence but not a null sequence, then there exists a positive rational number $c$ and a positive integer $N$ such that $\varphi(a_n) > c > 0$ whenever $n > N$. It is not difficult to deduce from this that the quotient $\{a_n\}/\{b_n\}$ ($\{b_n\}$ not null) of the fundamental sequences is a fundamental sequence.

Definition 6.4. Let $\{a_n\}$ and $\{b_n\}$ be two fundamental sequences whose difference $\{a_n - b_n\}$ is a null sequence. Then we say that $\{a_n\}$ and $\{b_n\}$ are congruent and we write $\{a_n\} \equiv \{b_n\} \pmod{\{0\}}$.

Being congruent is an equivalence relation and the set of fundamental sequences is now partitioned into equivalence classes. From each class we may select a fundamental sequence $\{a_n\}$ to represent the class $\overline{\{a_n\}}$. We can now define the sum, the difference, the product and the quotient of two classes $\overline{\{a_n\}}$ and $\overline{\{b_n\}}$. We let $\{a_n\}$ and $\{b_n\}$ be the representatives respectively and we define $\overline{\{a_n\}} \pm \overline{\{b_n\}} = \overline{\{a_n \pm b_n\}}$, $\overline{\{a_n\}} \cdot \overline{\{b_n\}} = \overline{\{a_nb_n\}}$, and when $\overline{\{b_n\}} \neq \{0\}$ we define $\overline{\{a_n\}} \cdot \overline{\{b_n\}}^{-1} = \overline{\{a_nb_n^{-1}\}}$. It is easy to verify that the definitions are independent of the choices of the representatives.

The aggregate of these classes is called the $\varphi$-extension of the rationals, and each class is called a number in the $\varphi$-extension. When $\varphi(a) = |a|$, the $\varphi$-extension coincides with the set of real numbers. When $\varphi(a) = |a|_p$ we call the $\varphi$-extension the set of p-adic numbers. This then gives a rigorous definition of a p-adic number. Our next task is to give a concrete representation of a p-adic number.

The aggregate of classes contains the class $\overline{\{a\}}$ ($a$ rational), and each fundamental sequence in the class is $\varphi$-convergent to the same rational number $a$, that is, $a$ is their $\varphi$-limit. We shall write $\overline{\{a\}} = a$, since now there is a one-to-one correspondence between these classes and the set of rational numbers. Since there are fundamental sequences which are not $\varphi$-convergent to any rational number we see that the $\varphi$-extension of the rationals is an aggregate which is larger than the set of all rational numbers. In general we let $\overline{\{a_n\}}$ be the number to which the fundamental sequence in it $\varphi$-converges. That is, we define

We should add that, when $\{a_n\}$ and $\{a_n'\}$ belong to the same class, $\varphi\text{-}\lim_{n \to \infty} a_n = \varphi\text{-}\lim_{n \to \infty} a_n'$.


In the above discussion the valuation is defined only in the field of the rationals. We shall now extend its definition to the $\varphi$-extension of the rationals.

We should point out that in this definition, $\varphi(\overline{\{a_n\}})$ is independent of the choice of $\{a_n\}$. That is, if $\{a_n\} \equiv \{a_n'\} \pmod{\{0\}}$, then $\lim_{n \to \infty} \varphi(a_n) = \lim_{n \to \infty} \varphi(a_n')$. This can easily be proved from $|\varphi(a_n) - \varphi(a_n')| \leq \varphi(a_n - a_n')$.

It is convenient to use Greek letters $\alpha, \beta, \gamma, \ldots$ to denote the classes. We have the following three properties for $\varphi(\alpha)$:

1) $\varphi(\alpha) \geq 0$ with equality if and only if $\alpha = \{0\}$;
2) $\varphi(\alpha\beta) = \varphi(\alpha)\varphi(\beta)$;
3) $\varphi(\alpha + \beta) \leq \varphi(\alpha) + \varphi(\beta)$.

Exercise 1. Show that equivalent valuations give the same extension of the rationals.

Exercise 2. Let $\varphi$ be a non-Archimedian valuation. Prove that $\{a_n\}$ is convergent if and only if $\lim_{n \to \infty} \varphi(a_{n+1} - a_n) = 0$.

15.7 The Completeness of the Extension

In the previous section we constructed the $\varphi$-extension of the rationals from the fundamental sequences of rational numbers and we saw that the $\varphi$-extension is larger than the set of rationals. We then extended the domain of definition of $\varphi$ from the rationals to that of the $\varphi$-extension, giving a definition of $\varphi(\alpha)$ where $\alpha$ is a class of fundamental sequences. We now ask the following: If we repeat the process to obtain another $\varphi$-extension from the $\varphi$-extension already obtained, do we have a still larger aggregate than the first $\varphi$-extension? If the answer is no, then we say that the extension is complete.

In order to discuss this we have to consider sequences $\{\alpha_l\}$ of classes, and define the terms fundamental sequences, classes, $\varphi$-limit, null sequences etc. all over again. It turns out that the $\varphi$-extension is complete, but we shall omit the proof.

Theorem 7.1. The $\varphi$-extension of the rationals is complete in the sense that every fundamental (or $\varphi$-convergent) sequence $\{\alpha_l\}$ has a $\varphi$-limit. □

15.8 The Representation of p-adic Numbers

In this section we let $\varphi(a) = |a|_p$, and we examine the representation of p-adic numbers.

1) We first consider the p-adic representation of a rational number

$$\frac{a}{b}, \qquad (a, b) = 1, \qquad p \nmid b.$$

For this we examine the solution of the congruence

$$bx \equiv a \pmod{p^l}, \qquad 0 \leq x$$

/ > L(8)), 1 -p

we know that $\{x_l\}$ is $\varphi$-convergent. This means that the limit

in the $\varphi$-extension is the p-adic representation of the rational number $a/b$ ($p \nmid b$).

2) We next deal with the p-adic representation for the rational number

$$\frac{a}{p^mb}, \qquad (a, b) = 1, \qquad m \geq 0.$$

The general p-adic representation of a rational number is the power series

$$p^{-m}(a_0 + a_1p + a_2p^2 + \cdots), \qquad 0 \leq a_i < p, \qquad m \geq 0. \tag{1}$$

If, for this power series (1), we have

$$a_{l+\nu} = a_{l+\nu+t} = a_{l+\nu+2t} = \cdots = a_{l+\nu+nt} = \cdots \qquad (\nu = 1, 2, \ldots, t),$$

where $l$ and $t$ are fixed integers, $t \geq 1$, then we say that (1) is periodic, and in this case we may rewrite it as

$$p^{-m}\bigl((a_0 + a_1p + \cdots + a_lp^l) + p^{l+1}(a_{l+1} + a_{l+2}p + \cdots + a_{l+t}p^{t-1}) + p^{l+t+1}(a_{l+1} + a_{l+2}p + \cdots + a_{l+t}p^{t-1}) + \cdots\bigr),$$

or simply

where

$$A = a_0 + a_1p + \cdots + a_lp^l, \qquad B = a_{l+1} + a_{l+2}p + \cdots + a_{l+t}p^{t-1}.$$

Theorem 8.1. The p-adic representation of a rational number is a periodic power series in $p$; conversely a periodic power series in $p$ is a rational number.
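The periodicity asserted by Theorem 8.1 is easy to observe numerically. The following is a minimal sketch, not taken from the text: the function name `padic_digits` is hypothetical, it assumes $p \nmid b$, and it uses Python's built-in modular inverse `pow(x, -1, p)` (available from Python 3.8) to solve the congruence for each digit.

```python
from fractions import Fraction

def padic_digits(a, b, p, count):
    """Return the first `count` p-adic digits of a/b, assuming p does not divide b."""
    if b % p == 0:
        raise ValueError("this sketch assumes that p does not divide b")
    x = Fraction(a, b)
    digits = []
    for _ in range(count):
        # The digit d solves (denominator) * d == numerator (mod p).
        d = (x.numerator * pow(x.denominator, -1, p)) % p
        digits.append(d)
        # x - d is divisible by p, so the quotient again has denominator prime to p.
        x = (x - d) / p
    return digits

print(padic_digits(1, 3, 5, 7))   # digits of 1/3 in the 5-adic numbers
print(padic_digits(-1, 1, 5, 4))  # digits of -1 in the 5-adic numbers
```

For $1/3$ with $p = 5$ the digits become periodic after the first one, and for $-1$ every digit equals $p - 1$, in agreement with the theorem.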

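Proof. 1) If

$$\alpha = p^{-m}\bigl(A + p^{l+1} B + p^{l+t+1} B + p^{l+2t+1} B + \cdots\bigr),$$

where

$$A = a_0 + a_1 p + \cdots + a_l p^l, \qquad B = a_{l+1} + a_{l+2} p + \cdots + a_{l+t} p^{t-1},$$

then

$$\alpha p^m - A = p^{l+1} B + p^{l+t+1} B + p^{l+2t+1} B + \cdots = p^{l+1} B (1 + p^t + p^{2t} + \cdots).$$

Now

$$1 + p^t + p^{2t} + \cdots + p^{kt} = \frac{1 - p^{(k+1)t}}{1 - p^t},$$

$$\left| \frac{1 - p^{(k+1)t}}{1 - p^t} - \frac{1}{1 - p^t} \right|_p = \left| \frac{p^{(k+1)t}}{1 - p^t} \right|_p = p^{-(k+1)t} < \varepsilon,$$

so that

$$1 + p^t + p^{2t} + \cdots + p^{kt} + \cdots = \frac{1}{1 - p^t}.$$

Therefore

$$\alpha p^m - A = p^{l+1} B \, \frac{1}{1 - p^t},$$

or

$$\alpha = p^{-m} A + p^{l+1-m} B \, \frac{1}{1 - p^t},$$

so that $\alpha$ is a rational number.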


2) We first consider the rational number

$$\alpha = \frac{r}{s}, \qquad |\alpha| < 1, \qquad (r, s) = 1, \qquad s > 0, \qquad r < 0, \qquad p \nmid s. \tag{2}$$

Let the index of $p$ (mod $s$) be $t$, that is $t$ is the least positive integer satisfying $p^t \equiv 1 \pmod{s}$. Let $1 - p^t = ms$, $m < 0$, so that

$$\alpha = \frac{r}{s} = \frac{mr}{1 - p^t}.$$

Since $|\alpha| < 1$, the number $mr$ has the representation

$$mr = b_0 + b_1 p + \cdots + b_{t-1} p^{t-1}, \qquad 0 \le b_i < p.$$

Therefore

$$\alpha = (b_0 + b_1 p + \cdots + b_{t-1} p^{t-1})(1 + p^t + p^{2t} + \cdots) = (b_0 + b_1 p + \cdots + b_{t-1} p^{t-1}) + p^t (b_0 + b_1 p + \cdots + b_{t-1} p^{t-1}) + \cdots,$$

which shows that $\alpha$ has a periodic power series in $p$ as representation.

Next, let $\alpha$ be any positive rational, say $\alpha = a/b$, $(a, b) = 1$, $p^m \,\|\, b$. Then $\alpha$ has the representation

$$p^m \alpha = a_0 + a_1 p + \cdots + a_v p^v + \frac{r}{s},$$

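$0 \le a_i < p$, […] $\varphi$-convergent, it follows that its $\varphi$-limit

$$p^{-m}(a_0 + a_1 p + a_2 p^2 + \cdots), \qquad 0 \le a_i < p, \quad m \ge 0, \tag{3}$$

does represent a p-adic number.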


Therefore a power series in $p$ of the form (3) represents a p-adic number. We now ask: how is a given p-adic number represented? From the previous section we know that any p-adic number is a limit of a $\varphi$-convergent sequence $\{a_l\}$ in the $\varphi$-extension of the rationals. But any rational number $a_l$ can be represented as a power series in $p$ of the form (3).

Our problem is thus solved if we can show that the limit of $\{a_l\}$, in the $\varphi$-extension of the rationals, also has this representation. Corresponding to each positive integer $t$, there exists a positive integer $L$ ($= L(t)$) such that

$$|a_l - a_{l'}|_p < \frac{1}{p^t}, \qquad l' > l > L.$$

This shows that, when $l > L$, the first $t + k$ ($k \ge 0$) terms of the power series in $p$ representing $a_l, a_{l+1}, a_{l+2}, \ldots$ must be equal. Since $t$ can be arbitrarily large the required result follows. We have proved that all the power series in $p$ of the form (3), finite or infinite, together give the whole set of p-adic numbers.
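The first half of the proof of Theorem 8.1 sums a periodic power series in $p$ by means of the geometric series $1 + p^t + p^{2t} + \cdots = 1/(1 - p^t)$. The sketch below evaluates that closed form with exact rational arithmetic; the function name `periodic_to_fraction` and its argument layout (a list `head` of the digits $a_0, \ldots, a_l$ and a list `period` of the repeating digits) are assumptions made for illustration.

```python
from fractions import Fraction

def periodic_to_fraction(p, m, head, period):
    """Value of p^(-m) * (A + p^(l+1) * B / (1 - p^t)) for a periodic power series in p.

    head   -- digits a_0, ..., a_l of the non-periodic part (may be empty)
    period -- the t repeating digits a_(l+1), ..., a_(l+t)
    """
    A = sum(d * p**i for i, d in enumerate(head))
    B = sum(d * p**i for i, d in enumerate(period))
    t = len(period)
    # len(head) equals l + 1, the exponent of p in front of B.
    return Fraction(1, p**m) * (A + Fraction(p**len(head) * B, 1 - p**t))

print(periodic_to_fraction(5, 0, [2], [3, 1]))  # the series 2 + 3*5 + 1*5^2 + 3*5^3 + ...
print(periodic_to_fraction(5, 0, [], [4]))      # the series 4 + 4*5 + 4*5^2 + ...
```

The two calls recover $1/3$ and $-1$ respectively, the rational numbers whose 5-adic digits are $2, 3, 1, 3, 1, \ldots$ and $4, 4, 4, \ldots$.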

15.9 Application

Although the notion of a p-adic number is introduced as such only in this chapter, it has appeared several times already in this book. An example of this was pointed out in the beginning of this chapter. The generalization of this example is known as Hensel's Lemma.

Theorem 9.1 (Hensel). Let $f(x)$ be a polynomial with integer coefficients, and $f(x) \equiv g_0(x) h_0(x) \pmod{p}$, where $g_0(x)$ and $h_0(x)$ are coprime polynomials. Then there are two polynomials $g(x)$, $h(x)$ with p-adic coefficients such that $g(x) \equiv g_0(x)$, $h(x) \equiv h_0(x) \pmod{p}$, and $f(x) = g(x) h(x)$.

Proof. Let $g_l(x)$, $h_l(x)$ be two polynomials satisfying

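$$g_l(x) \equiv g_0(x), \qquad h_l(x) \equiv h_0(x) \pmod{p}$$

and

$$f(x) \equiv g_l(x) h_l(x) \pmod{p^l}.$$

Clearly $g_l$ and $h_l$ are coprime (mod $p$). Let

$$g_{l+1}(x) = g_l(x) + p^l \varphi(x)$$

and

$$h_{l+1}(x) = h_l(x) + p^l \psi(x),$$

so that we have

$$g_{l+1}(x) \equiv g_l(x), \qquad h_{l+1}(x) \equiv h_l(x) \pmod{p^l}.$$

Let

$$\frac{f(x) - g_l(x) h_l(x)}{p^l} \equiv t(x) \pmod{p}.$$

Since $g_l(x)$ and $h_l(x)$ are coprime (mod $p$), there are two polynomials $\varphi(x)$ and $\psi(x)$ such that $t(x) \equiv \varphi(x) h_l(x) + \psi(x) g_l(x) \pmod{p}$. Therefore

$$f(x) - g_{l+1}(x) h_{l+1}(x) \equiv f(x) - g_l(x) h_l(x) - p^l \bigl(\varphi(x) h_l(x) + \psi(x) g_l(x)\bigr) \equiv p^l \bigl(t(x) - \varphi(x) h_l(x) - \psi(x) g_l(x)\bigr) \equiv 0 \pmod{p^{l+1}}.$$

Since the degree of $t(x)$ does not exceed the degree of $g_l(x) h_l(x)$ we may assume that the degrees of $\varphi(x)$ and $\psi(x)$ do not exceed the degrees of $g_l(x)$ and $h_l(x)$ respectively. The coefficients of $g_l(x)$ and $h_l(x)$ are $\varphi$-convergent, and so they converge to $g(x)$ and $h(x)$ respectively. The theorem is proved. □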

Note: Lemma 7.10.1 can be given an interpretation in p-adic numbers.
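The lifting step in the proof of Theorem 9.1 can be illustrated in the simplest special case, where $g_0(x) = x - r_0$ is a linear factor, that is, $r_0$ is a simple root of $f$ modulo $p$. The sketch below is an illustration under that assumption and not the general factor-lifting procedure of the theorem; the names `poly_eval` and `hensel_lift_root` are hypothetical. At each step the root is corrected by a multiple of $p^l$, exactly as $g_l$ is corrected by $p^l \varphi(x)$ in the proof.

```python
def poly_eval(coeffs, x):
    """Evaluate the polynomial with coefficients [c0, c1, c2, ...] at x (Horner's rule)."""
    value = 0
    for c in reversed(coeffs):
        value = value * x + c
    return value

def hensel_lift_root(coeffs, r0, p, k):
    """Lift a simple root r0 of f modulo p to a root of f modulo p**k."""
    deriv = [i * c for i, c in enumerate(coeffs)][1:]  # coefficients of f'
    if poly_eval(coeffs, r0) % p != 0:
        raise ValueError("r0 is not a root of f modulo p")
    d = poly_eval(deriv, r0) % p
    if d == 0:
        raise ValueError("r0 is not a simple root modulo p")
    d_inv = pow(d, -1, p)  # inverse of f'(r0) modulo p (Python 3.8+)
    r = r0 % p
    for l in range(1, k):
        t = (poly_eval(coeffs, r) // p**l) % p  # f(r) = p^l * t with t taken mod p
        s = (-t * d_inv) % p                    # s satisfies t + s * f'(r0) == 0 (mod p)
        r += s * p**l                           # now f(r) == 0 (mod p^(l+1))
    return r

# A square root of 2 in the 7-adic numbers, correct modulo 7^4, starting from 3^2 == 2 (mod 7).
r = hensel_lift_root([-2, 0, 1], 3, 7, 4)
print(r, (r * r - 2) % 7**4)
```

The successive values of `r` agree modulo increasing powers of $p$, so they form a $\varphi$-convergent sequence whose limit is a p-adic root of $f$.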

Chapter 16. Introduction to Algebraic Number Theory

16.1 Algebraic Numbers

Definition 1.1. By an algebraic number we mean a number $\vartheta$ which is a root of the algebraic equation

$$f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 = 0, \tag{1}$$

where the coefficients $a_r$ are rational numbers.

Examples of algebraic numbers are $\sqrt{2}$, $i = \sqrt{-1}$ and the rational numbers themselves. By clearing the denominators of all the fractions $a_r$ in equation (1) we obtain an algebraic equation with integer coefficients. From now on we shall call an ordinary integer a rational integer to distinguish it from an algebraic integer, which we shall define later. We see therefore that algebraic numbers may also be defined as the roots of algebraic equations with rational integer coefficients. If the equation (1) is irreducible and $a_n \ne 0$, then we call $n = \partial^{\circ} f$ the degree of the algebraic number $\vartheta$. For example, rational numbers have degree 1, and the number $i$ has degree 2. Let the equation (1) be irreducible and let $\vartheta^{(1)}, \vartheta^{(2)}, \ldots, \vartheta^{(n)}$ be all its roots. From Theorem 4.2.2 we know that the $\vartheta^{(j)}$ are distinct, and if $\vartheta^{(j)}$ satisfies a rational coefficient equation $g(x) = 0$, then so do the remaining $n - 1$ numbers. We see therefore that the degree of an algebraic number is uniquely determined.

Theorem 1.1. The sum, the difference, the product and the quotient (not dividing by zero) of two algebraic numbers are algebraic. □

With the aid of the symmetric polynomial theorem, the proof of this theorem is only a simple exercise, and we shall omit many of the proofs in this chapter.

Definition 1.2. If the irreducible algebraic equation defining $\vartheta$ has rational integer coefficients and leading coefficient 1, then we call $\vartheta$ an algebraic integer.

Examples of algebraic integers are $\sqrt{2}$, $i$, $(1 + \sqrt{5})/2$, and the rational integers themselves.

Theorem 1.2. Any algebraic integer which is rational must be a rational integer. □


Theorem 1.3. The sum, the difference and the product of two algebraic integers are algebraic integers. □

Theorem 1.4. Let $\vartheta$ be an algebraic number. Then there exists a natural number $q$ such that $q\vartheta$ is an algebraic integer. □

Definition 1.3. If $\vartheta$ and $\vartheta^{-1}$ are both algebraic integers, then we call $\vartheta$ a unit.

Examples of units are $i$ and $3 - 2\sqrt{2}$.

Theorem 1.5. A necessary and sufficient condition for $\vartheta$ to be a unit is that $\vartheta$ satisfies an algebraic equation with rational integer coefficients, and with leading coefficient 1 and last coefficient $\pm 1$. □

16.2 Algebraic Number Fields

Definition 2.1. Let $F$ be a set of complex numbers with at least two distinct members. Suppose that, given any two members in $F$, their sum, difference, product and quotient (not dividing by 0) are also members of $F$. Then we call $F$ a number field, or simply a field.

An example of a field is the set of rational numbers which we shall, from now on, denote by $R$. It is clear that every number field must contain the rational field $R$.

Theorem 2.1. Let $\vartheta$ be an algebraic number of degree $n$. Then the set of numbers of the form

$$a_0 + a_1 \vartheta + \cdots + a_{n-1} \vartheta^{n-1}, \tag{1}$$

where $a_k$ are rational numbers, forms a field. Moreover numbers represented by (1) are all distinct. □

Definition 2.2. The field in Theorem 2.1 is called the single extension of $R$ by $\vartheta$, and we shall denote it by $R(\vartheta)$.

For example, $R(i)$ is the field of numbers of the form $a + ib$ where $a$ and $b$ are rational numbers.

Theorem 2.2. If $\vartheta \ne 0$, then $R(\vartheta)$ is the largest set of numbers obtained from $\vartheta$ by means of addition, subtraction, multiplication and division (except by 0). □

Definition 2.3. Let $\vartheta_1, \ldots, \vartheta_l$ be algebraic numbers. The field obtained from addition, subtraction, multiplication and division (except by 0) of these numbers is called a finite extension of $R$ and is denoted by $R(\vartheta_1, \ldots, \vartheta_l)$.


Theorem 2.3. Every finite extension of $R$ is a single extension. That is, given any finite extension $R(\vartheta_1, \ldots, \vartheta_l)$, there exists an algebraic number $\vartheta$ such that $R(\vartheta) = R(\vartheta_1, \ldots, \vartheta_l)$. □

From this theorem we need only consider single extensions $R(\vartheta)$, which we now call algebraic number fields. We also call the degree of $\vartheta$ the degree of the field $R(\vartheta)$. For example, $R(i)$ is a quadratic field, and $R$ is the only field with degree 1.

Theorem 2.4. Let $D$ run over all the rational integers not equal to 1 with no square divisors. Then $R(\sqrt{D})$ runs over all the quadratic fields. □

16.3 Basis

In this section $R(\vartheta)$ denotes an algebraic number field of degree $n$. We set $\vartheta = \vartheta^{(1)}$, and let $\vartheta^{(2)}, \ldots, \vartheta^{(n)}$ denote the remaining $n - 1$ roots of the irreducible equation defining $\vartheta$. From the previous section we see that each number $\alpha \in R(\vartheta)$ is representable as

$$\alpha = \alpha(\vartheta) = a_0 + a_1 \vartheta + \cdots + a_{n-1} \vartheta^{n-1},$$

where $a_j$ are rational numbers.

Definition 3.1. Let $\alpha^{(1)} = \alpha$. We put $\alpha^{(k)} = \alpha(\vartheta^{(k)})$, $k = 2, 3, \ldots, n$ and we call them the conjugates of $\alpha$. We also call the numbers

$$S(\alpha) = \alpha^{(1)} + \cdots + \alpha^{(n)} = \alpha(\vartheta^{(1)}) + \cdots + \alpha(\vartheta^{(n)}),$$

$$N(\alpha) = \alpha^{(1)} \cdots \alpha^{(n)} = \alpha(\vartheta^{(1)}) \cdots \alpha(\vartheta^{(n)}),$$

the trace and the norm of $\alpha$ respectively.

It is easy to see that $S(\alpha + \beta) = S(\alpha) + S(\beta)$ and $N(\alpha\beta) = N(\alpha) N(\beta)$. Also, from the symmetric polynomial theorem, we see that $S(\alpha)$ and $N(\alpha)$ are rational numbers, and if $\alpha$ is rational then $S(\alpha) = n\alpha$ and $N(\alpha) = \alpha^n$. Next, if $\alpha$ is an algebraic integer, then so are the $\alpha^{(i)}$, and hence so are $S(\alpha)$ and $N(\alpha)$; but $S(\alpha)$ and $N(\alpha)$ are known to be rational so that they must be rational integers. If $\alpha$ is a unit, then from $N(\alpha) N(\alpha^{-1}) = N(\alpha\alpha^{-1}) = N(1) = 1$ and the fact that $N(\alpha)$, $N(\alpha^{-1})$ are rational integers we deduce that $N(\alpha) = \pm 1$. Conversely, if $\alpha$ is an algebraic integer and $N(\alpha) = \pm 1$, then $\alpha^{-1} = \pm \alpha^{(2)} \cdots \alpha^{(n)}$ is also an algebraic integer and so $\alpha$ must be a unit. Therefore a necessary and sufficient condition for an algebraic integer $\alpha$ to be a unit is that $N(\alpha) = \pm 1$.

Theorem 3.1. Let $\alpha \in R(\vartheta)$ and let the irreducible equation satisfied by $\alpha$ be $h(x) = 0$, $\partial^{\circ} h = l$. Also, let $g(x) = \prod_{v=1}^{n} (x - \alpha^{(v)})$. Then $g(x)$ is a polynomial with rational coefficients, and $g(x) = c\,(h(x))^{n/l}$, where $l \mid n$ and $c$ is a rational number.



Proof. That $g(x)$ is a polynomial with rational coefficients follows at once from the symmetric polynomial theorem. Let $\alpha = \alpha(\vartheta)$. Then from $h(\alpha) = 0$ we have $h(\alpha^{(v)}) = h(\alpha(\vartheta^{(v)})) = 0$, so that every root of $g(x) = 0$ is also a root of $h(x) = 0$. Since $h(x)$ is an irreducible polynomial we must have $h(x) \mid g(x)$. Let $g(x) = h(x) g_1(x)$. If $g_1(x)$ is a constant, then the required result is proved; otherwise $g_1(x)$ has zeros and these must also be zeros of $h(x)$, so that $h(x) \mid g_1(x)$. Let $g_1(x) = h(x) g_2(x)$. We can repeat the argument, and since the degree of $g(x)$ is finite we finally obtain $g(x) = c\,(h(x))^{n/l}$. □

From this theorem we see that if $\alpha$ is an algebraic number of degree $l$, then there are $l$ distinct numbers among $\alpha^{(1)}, \ldots, \alpha^{(n)}$ and each of them occurs $n/l$ times.

Definition 3.2. Suppose that there exists a set of numbers $\alpha_1, \ldots, \alpha_m$ in $R(\vartheta)$ such that any number in $R(\vartheta)$ is uniquely representable as $a_1 \alpha_1 + \cdots + a_m \alpha_m$ where $a_j$ ($1 \le j \le m$) are rational numbers. Then we call $\alpha_1, \ldots, \alpha_m$ a basis for $R(\vartheta)$.

It is easy to see that no one of $\alpha_1, \ldots, \alpha_m$ is expressible as a linear combination of the other $m - 1$ numbers with rational coefficients. From Theorem 2.1 we know that $1, \vartheta, \ldots, \vartheta^{n-1}$ forms a basis for $R(\vartheta)$, so that a basis certainly exists. Following the proof of Theorem 14.9.2 the reader can easily prove

Theorem 3.2. Every basis for $R(\vartheta)$ has precisely $n$ elements. □

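Let $\alpha_1, \ldots, \alpha_n$ and $\beta_1, \ldots, \beta_n$ be two bases for $R(\vartheta)$. Then, from the definition of a basis, there are rational numbers $a_{jk}$ ($1 \le j, k \le n$) such that

$$\alpha_j = \sum_{k=1}^{n} a_{jk} \beta_k \quad (1 \le j \le n), \qquad |a_{jk}| = \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix} \ne 0.$$

Definition 3.3. Let $\alpha_1, \ldots, \alpha_n \in R(\vartheta)$. By the discriminant of $\alpha_1, \ldots, \alpha_n$ we mean the number

$$\Delta(\alpha_1, \ldots, \alpha_n) = \begin{vmatrix} \alpha_1^{(1)} & \cdots & \alpha_n^{(1)} \\ \vdots & & \vdots \\ \alpha_1^{(n)} & \cdots & \alpha_n^{(n)} \end{vmatrix}^2.$$

Theorem 3.3. The discriminant $\Delta(\alpha_1, \ldots, \alpha_n)$ possesses the following properties:

1) $\Delta(\alpha_1, \ldots, \alpha_n)$ is a rational number; and if $\alpha_1, \ldots, \alpha_n$ are algebraic integers, then $\Delta(\alpha_1, \ldots, \alpha_n)$ is a rational integer.

2) Let $\alpha_1, \ldots, \alpha_n$ and $\beta_1, \ldots, \beta_n$ be two bases for $R(\vartheta)$, and $\alpha_j = \sum_{k=1}^{n} a_{jk} \beta_k$ ($1 \le j \le n$). Then

$$\Delta(\alpha_1, \ldots, \alpha_n) = |a_{jk}|^2 \, \Delta(\beta_1, \ldots, \beta_n).$$

3) A necessary and sufficient condition for $\alpha_1, \ldots, \alpha_n$ to be a basis for $R(\vartheta)$ is that $\Delta(\alpha_1, \ldots, \alpha_n) \ne 0$. □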

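Theorem 3.4. Suppose that, among the numbers $\vartheta^{(1)}, \ldots, \vartheta^{(n)}$, $r_1$ of them are real, and $r_2$ pairs of them are complex conjugates ($r_1 + 2r_2 = n$). Then, for any basis $\alpha_1, \ldots, \alpha_n$ of $R(\vartheta)$, we have

$$(-1)^{r_2} \Delta(\alpha_1, \ldots, \alpha_n) > 0.$$

Proof. From Theorem 3.3 we need only examine the case $\alpha_1 = 1$, $\alpha_2 = \vartheta$, $\ldots$, $\alpha_n = \vartheta^{n-1}$. Now

$$\Delta(1, \vartheta, \ldots, \vartheta^{n-1}) = \prod_{1 \le j < k \le n} (\vartheta^{(j)} - \vartheta^{(k)})^2.$$

Let us denote by $\bar{\vartheta}$ the complex conjugate of $\vartheta$. When $\vartheta^{(k)} \ne \bar{\vartheta}^{(j)}$ we have $\bigl((\vartheta^{(j)} - \vartheta^{(k)})(\bar{\vartheta}^{(j)} - \bar{\vartheta}^{(k)})\bigr)^2 > 0$, and $(\vartheta^{(j)} - \bar{\vartheta}^{(j)})^2 < 0$. Therefore

$$(-1)^{r_2} \Delta(1, \vartheta, \ldots, \vartheta^{n-1}) > 0.$$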

16.4 Integral Basis

In the remaining part of this chapter we shall use the word integer to mean an algebraic integer.

Definition 4.1. Let $\omega_1, \ldots, \omega_m$ be $m$ integers in $R(\vartheta)$. If every integer in $R(\vartheta)$ can be expressed uniquely as $a_1 \omega_1 + \cdots + a_m \omega_m$, where $a_1, \ldots, a_m$ are rational integers, then we call $\omega_1, \ldots, \omega_m$ an integral basis for $R(\vartheta)$.

Theorem 4.1. An integral basis exists. More specifically, let $\omega_1, \ldots, \omega_n$ be a basis where $\omega_j$ ($1 \le j \le n$) are integers such that $|\Delta(\omega_1, \ldots, \omega_n)|$ is least. Then $\omega_1, \ldots, \omega_n$ is an integral basis.

Proof. We can choose a natural number $q$ so that $q\vartheta$ is an integer, and now $1, q\vartheta, (q\vartheta)^2, \ldots, (q\vartheta)^{n-1}$ are integers which form a basis for $R(\vartheta)$. Therefore a basis $\alpha_1, \ldots, \alpha_n$ consisting of integers certainly exists. We shall now prove that the set $\omega_1, \ldots, \omega_n$ of integers forming a basis which makes $|\Delta(\omega_1, \ldots, \omega_n)|$ least is an integral basis. Suppose the contrary. Then there exists an integer $\omega = a_1 \omega_1 + \cdots + a_n \omega_n$, where some $a_i$ is not a rational integer. We may assume without loss that $a_1$ is not a rational integer, say $a_1 = g + t$ where $g$ is a rational integer and $0 < t < 1$. Then $\omega_1' = \omega - g\omega_1 = t\omega_1 + a_2 \omega_2 + \cdots + a_n \omega_n$ is also an integer, and $\omega_1', \omega_2, \ldots, \omega_n$ still forms a basis for $R(\vartheta)$. But, by Theorem 3.3,

$$|\Delta(\omega_1', \omega_2, \ldots, \omega_n)| = t^2 \, |\Delta(\omega_1, \omega_2, \ldots, \omega_n)| < |\Delta(\omega_1, \omega_2, \ldots, \omega_n)|,$$

contradicting the minimal property of $|\Delta(\omega_1, \ldots, \omega_n)|$. The theorem is proved. □

From this theorem we see that an integral basis is a basis, so that each integral basis consists of $n$ elements.

Theorem 4.2. All integral bases have the same discriminant. That is, if $\omega_1, \ldots, \omega_n$ and $\omega_1', \ldots, \omega_n'$ are two integral bases, then $\Delta(\omega_1, \ldots, \omega_n) = \Delta(\omega_1', \ldots, \omega_n')$. □



Definition 4.2. By the discriminant of the field $R(\vartheta)$ we mean the discriminant of its integral basis. We shall denote the discriminant of $R(\vartheta)$ by $\Delta(R(\vartheta))$ or simply $\Delta$.

Theorem 4.3 (Stickelberger). The discriminant of a field satisfies $\Delta \equiv 0$ or $1 \pmod{4}$.

Proof. Let $i_1, \ldots, i_n$ be a permutation of $1, 2, \ldots, n$, and let $\delta_{i_1, \ldots, i_n}$ be $1$ or $-1$ depending on whether $i_1, \ldots, i_n$ is an even or odd permutation. Then, from the expansion of a determinant, we have

$$|\omega_j^{(i)}| = \sum_{(i_1, \ldots, i_n)} \delta_{i_1, \ldots, i_n} \, \omega_1^{(i_1)} \cdots \omega_n^{(i_n)} = \sum_{(i_1, \ldots, i_n)} \omega_1^{(i_1)} \cdots \omega_n^{(i_n)} + 2\eta = a + 2\eta,$$

where $\eta$ is an algebraic integer, and $a = \sum_{(i_1, \ldots, i_n)} \omega_1^{(i_1)} \cdots \omega_n^{(i_n)}$ is a symmetric function of $\vartheta^{(1)}, \ldots, \vartheta^{(n)}$, so that $a$ is rational and hence a rational integer. Therefore

$$\Delta = (a + 2\eta)^2 = a^2 + 4\eta(\eta + a).$$

Since the integer $\eta(\eta + a) = (\Delta - a^2)/4$ is rational, it is a rational integer. Therefore $\Delta \equiv a^2 \equiv 0$ or $1 \pmod{4}$. □

We shall now examine the quadratic field $R(\sqrt{D})$ where $D$ is a square-free rational integer. Each number in $R(\sqrt{D})$ is representable as $\alpha = (a + b\sqrt{D})/2$ where $a$, $b$ are rational numbers. The trace and the norm of $\alpha$ are given by

$$S(\alpha) = a, \qquad N(\alpha) = \frac{a^2 - b^2 D}{4}.$$

Theorem 4.4. In the quadratic field $R(\sqrt{D})$, a necessary and sufficient condition for $\alpha$ to be an integer is that $a$, $b$ are both rational integers satisfying

$$\begin{aligned} a &\equiv b \pmod{2}, && \text{when } D \equiv 1 \pmod{4}; \\ a &\equiv b \equiv 0 \pmod{2}, && \text{when } D \equiv 2, 3 \pmod{4}. \end{aligned} \tag{1}$$

Proof. Since, in a quadratic field, $\alpha$ is an integer if and only if $S(\alpha)$, $N(\alpha)$ are rational integers, the sufficiency of the condition (1) follows at once. Conversely, if $\alpha$ is an integer, then $a$ and $(a^2 - b^2 D)/4$ are rational integers, so that

$$b^2 D = a^2 - 4 \cdot \frac{a^2 - b^2 D}{4}$$

is also a rational integer. Since $D$ is square-free, the rational number $b$ must be a rational integer. The necessity of the condition (1) now follows from $a^2 - b^2 D \equiv 0 \pmod{4}$. □
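For rational integers $a$, $b$, Theorem 4.4 says that condition (1) holds exactly when the norm $(a^2 - b^2 D)/4$ is a rational integer (the trace $a$ already is one). The following sketch checks this equivalence by brute force over a small range; the function names are hypothetical and the list of square-free values of $D$ is only a sample.

```python
def satisfies_condition_1(a, b, D):
    """Condition (1) of Theorem 4.4 for alpha = (a + b*sqrt(D))/2, with a, b rational integers."""
    if D % 4 == 1:
        return (a - b) % 2 == 0
    return a % 2 == 0 and b % 2 == 0

def has_integral_norm(a, b, D):
    """True when the norm (a^2 - b^2 * D)/4 is a rational integer."""
    return (a * a - b * b * D) % 4 == 0

# Compare the two tests for a sample of square-free D and small a, b.
sample_D = [-7, -5, -3, -2, -1, 2, 3, 5, 6, 13]
agree = all(
    satisfies_condition_1(a, b, D) == has_integral_norm(a, b, D)
    for D in sample_D
    for a in range(-6, 7)
    for b in range(-6, 7)
)
print(agree)
```

Note that Python's `%` returns a non-negative remainder, so `D % 4 == 1` also handles negative $D$ such as $-3$ or $-7$.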


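When $D \equiv 1 \pmod{4}$, $(1 + \sqrt{D})/2$ is an integer in $R(\sqrt{D})$. From

$$\frac{a + b\sqrt{D}}{2} = \frac{a - b}{2} + b \, \frac{1 + \sqrt{D}}{2}$$

and

$$\begin{vmatrix} 1 & \sqrt{D} \\ 1 & -\sqrt{D} \end{vmatrix}^2 = 4D, \qquad \begin{vmatrix} 1 & \dfrac{1 + \sqrt{D}}{2} \\ 1 & \dfrac{1 - \sqrt{D}}{2} \end{vmatrix}^2 = D,$$

we have the following:

Theorem 4.5. Let $D$ be a square-free rational integer, and let

$$\Delta = D, \quad \omega = \frac{1 + \sqrt{D}}{2}, \qquad \text{when } D \equiv 1 \pmod{4};$$

$$\Delta = 4D, \quad \omega = \sqrt{D}, \qquad \text{when } D \equiv 2, 3 \pmod{4}.$$

Then $\Delta$ is the discriminant of $R(\sqrt{D})$, and $1, \omega$ is an integral basis. The numbers $1$, $(\Delta + \sqrt{\Delta})/2$ also form an integral basis. □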
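The rule in Theorem 4.5 is short enough to state as a function. The sketch below (function names hypothetical) computes $\Delta$ from a square-free $D \ne 1$ and can be used to confirm on examples that the result is always $\equiv 0$ or $1 \pmod 4$, as Stickelberger's theorem requires.

```python
def is_square_free(D):
    """True when no square k*k with k >= 2 divides D (D nonzero)."""
    n = abs(D)
    k = 2
    while k * k <= n:
        if n % (k * k) == 0:
            return False
        k += 1
    return True

def quadratic_field_discriminant(D):
    """Discriminant of R(sqrt(D)) for a square-free rational integer D != 0, 1."""
    if D in (0, 1) or not is_square_free(D):
        raise ValueError("D must be a square-free rational integer other than 0 and 1")
    # Python's % gives a remainder in {0, 1, 2, 3}, also for negative D.
    return D if D % 4 == 1 else 4 * D

for D in (-5, -3, -1, 2, 3, 5):
    print(D, quadratic_field_discriminant(D))
```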

From this theorem we see that, in a quadratic field, we may choose an integer $\omega$ such that $1, \omega$ form an integral basis. This is not true in general; that is, if $R(\vartheta)$ is a field of degree $n \ge 3$, we may not always find an integer $\omega$ such that $1, \omega, \ldots, \omega^{n-1}$ is an integral basis for $R(\vartheta)$.

Example. Let $\alpha$ be a zero of $f(x) = x^3 - x^2 - 2x - 8$. We shall prove that no integer $\omega$, with the property that $1, \omega, \omega^2$ is an integral basis for $R(\alpha)$, exists. Since $\pm 1, \pm 2, \pm 4, \pm 8$ are not zeros of $f(x)$, we know that $f(x)$ is irreducible so that $R(\alpha)$ is definitely a cubic field. It is easy to show that $\Delta(1, \alpha, \alpha^2) = -4 \times 503$. Since $\beta = 4/\alpha$ is a zero of $g(y) = y^3 + y^2 + 2y - 8$, it follows that $\beta$ is an integer in $R(\alpha)$. Let us denote by $\alpha'$ and $\alpha''$ the two remaining zeros of $f(x)$. Then

$$\Delta(1, \alpha, \beta) = \begin{vmatrix} 1 & \alpha & 4/\alpha \\ 1 & \alpha' & 4/\alpha' \\ 1 & \alpha'' & 4/\alpha'' \end{vmatrix}^2 = \frac{4^2}{(N(\alpha))^2} \begin{vmatrix} 1 & \alpha & \alpha^2 \\ 1 & \alpha' & \alpha'^2 \\ 1 & \alpha'' & \alpha''^2 \end{vmatrix}^2 = -503.$$

Since $\Delta(1, \alpha, \beta) \ne 0$, the numbers $1, \alpha, \beta$ form a basis. Indeed $1, \alpha, \beta$ must be an integral basis for $R(\alpha)$, since otherwise the discriminant $\Delta$ of the field must satisfy $|\Delta| < 503$, and from Theorem 3.3 there exists a natural number $a \ne 1$ such that $-503 = a^2 \Delta$, which is impossible because 503 is a prime number. Now let $\omega$ be any integer in $R(\alpha)$. Then there are rational integers $a$, $b$, $c$ such that $\omega = a + b\alpha + c\beta$. Now

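$$\alpha^2 = \alpha + 2 + \frac{8}{\alpha} = 2 + \alpha + 2\beta, \qquad \beta^2 = -\beta - 2 + \frac{8}{\beta} = -2 + 2\alpha - \beta,$$

so that

$$\begin{aligned} \omega^2 &= a^2 + b^2(2 + \alpha + 2\beta) + c^2(-2 + 2\alpha - \beta) + 2ab\alpha + 8bc + 2ac\beta \\ &= (a^2 + 2b^2 - 2c^2 + 8bc) + (b^2 + 2c^2 + 2ab)\alpha + (2b^2 - c^2 + 2ac)\beta, \end{aligned}$$

and hence

$$\Delta(1, \omega, \omega^2) = \begin{vmatrix} 1 & a & a^2 + 2b^2 - 2c^2 + 8bc \\ 0 & b & b^2 + 2c^2 + 2ab \\ 0 & c & 2b^2 - c^2 + 2ac \end{vmatrix}^2 \Delta(1, \alpha, \beta) \equiv 0 \pmod{4 \cdot 503}.$$

Therefore $1, \omega, \omega^2$ cannot be an integral basis for $R(\alpha)$.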
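Two computational claims in this example can be checked directly. The sketch below, with hypothetical function names, evaluates the standard discriminant formula for a cubic $ax^3 + bx^2 + cx + d$ (which equals $\Delta(1, \alpha, \alpha^2)$ when the cubic is monic), and expands the $3 \times 3$ determinant above to confirm over a small range that it is always even, which is what forces the factor 4 in $\Delta(1, \omega, \omega^2)$.

```python
def cubic_discriminant(a, b, c, d):
    """Discriminant of a*x^3 + b*x^2 + c*x + d."""
    return (18 * a * b * c * d - 4 * b**3 * d + b * b * c * c
            - 4 * a * c**3 - 27 * a * a * d * d)

def power_basis_determinant(a, b, c):
    """Determinant expressing 1, w, w^2 in terms of 1, alpha, beta, where w = a + b*alpha + c*beta."""
    alpha_coeff = b * b + 2 * c * c + 2 * a * b   # coefficient of alpha in w^2
    beta_coeff = 2 * b * b - c * c + 2 * a * c    # coefficient of beta in w^2
    # Expansion along the first column of the determinant in the text.
    return b * beta_coeff - c * alpha_coeff

print(cubic_discriminant(1, -1, -2, -8))  # discriminant of x^3 - x^2 - 2x - 8

always_even = all(
    power_basis_determinant(a, b, c) % 2 == 0
    for a in range(-5, 6)
    for b in range(-5, 6)
    for c in range(-5, 6)
)
print(always_even)
```

The determinant simplifies to $2b^3 - b^2 c - bc^2 - 2c^3$, and $b^2 c + bc^2 = bc(b + c)$ is always even, which explains the parity found by the search.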

16.5 Divisibility

Definition 5.1. Let $\alpha$ and $\beta$ be two integers. Suppose that there exists an integer $\gamma$ such that $\alpha = \beta\gamma$. Then we say that $\beta$ divides $\alpha$ and we write $\beta \mid \alpha$. We also say $\beta$ is a divisor of $\alpha$, or that $\alpha$ is a multiple of $\beta$.

Theorem 5.1. Let

$$g(x) = \alpha_l x^l + \cdots + \alpha_0, \quad \alpha_l \ne 0, \qquad h(x) = \beta_m x^m + \cdots + \beta_0, \quad \beta_m \ne 0,$$

where the numbers $\alpha$, $\beta$ are integers, and let

$$g(x) h(x) = \gamma_{l+m} x^{l+m} + \cdots + \gamma_0.$$

If there exists an integer $\delta$ satisfying $\delta \mid \gamma_u$ ($0 \le u \le l + m$), then $\delta \mid \alpha_v \beta_w$ ($0 \le v \le l$, $0 \le w \le m$). □

The consideration of divisibility leads naturally to the problem of factorization of algebraic integers and the uniqueness of factorization. However, the factorization of integers in the field of all algebraic numbers has little meaning since an integer may be a product of infinitely many integers. For example $2 = 2^{1/2} \times 2^{1/4} \times 2^{1/8} \times \cdots$. From this we see that we must somehow restrict the domain of the divisors, and therefore we only discuss the factorization problem within a certain algebraic field $R(\vartheta)$. Next, there may be infinitely many units in an algebraic field. If $\varepsilon$ is a unit, then every integer may be written as $\alpha = \varepsilon \cdot \varepsilon^{-1}\alpha$, and therefore $\alpha$ has infinitely many


factorizations whenever $R(\vartheta)$ has infinitely many units. For example, the numbers $(1 + \sqrt{2})^n$ ($n = \pm 1, \pm 2, \ldots$) are all units in $R(\sqrt{2})$ so that integers in $R(\sqrt{2})$ have infinitely many factorizations. In order to avoid this difficulty we introduce the notion of association.

Definition 5.2. Two integers $\alpha$, $\beta$ which differ only by a unit factor are called associates of each other.

Being associates is an equivalence relation.

Definition 5.3. Let $\alpha$ be an integer in $R(\vartheta)$. If there exist non-unit integers $\beta$, $\gamma$ such that $\alpha = \beta\gamma$, then we say that $\alpha$ is non-prime; otherwise we call $\alpha$ a prime in $R(\vartheta)$.

Theorem 5.2. Every algebraic integer in $R(\vartheta)$ can be factorized into a product of primes in $R(\vartheta)$.

Proof. If $\alpha$ is a prime, then there is nothing to prove. If $\alpha = \beta\gamma$ where $\beta$, $\gamma$ are not units, then $|N(\alpha)| = |N(\beta)| \cdot |N(\gamma)|$. Since $\beta$, $\gamma$ are not units the natural numbers $|N(\beta)|$, $|N(\gamma)|$ are proper divisors of $|N(\alpha)|$, so that $|N(\alpha)| > |N(\beta)| > 1$ and $|N(\alpha)| > |N(\gamma)| > 1$. The proof can now be completed by induction on $|N(\alpha)|$. □

It remains to consider the uniqueness of the factorization, and this is an important problem in algebraic number theory. We shall now examine the quadratic field $R(\sqrt{-5})$ and show that there is no unique factorization. Since $-5 \equiv 3 \pmod{4}$, every integer in the field takes the form $\alpha = a + b\sqrt{-5}$ where $a$, $b$ are rational integers. We shall show that $2$, $3$, $1 \pm \sqrt{-5}$ are primes in the field, and that $2$, $3$ are not associates of $1 \pm \sqrt{-5}$, so that from $6 = 2 \cdot 3 = (1 + \sqrt{-5})(1 - \sqrt{-5})$ we see that there is no unique factorization in $R(\sqrt{-5})$. First, $2$, $3$ cannot be associates of $1 \pm \sqrt{-5}$ because $|N(2)| = 4$, $|N(3)| = 9$ and $|N(1 \pm \sqrt{-5})| = 6$. Next, if $2$ is non-prime in $R(\sqrt{-5})$, we let

$$2 = \alpha\beta, \qquad |N(\alpha)| > 1, \qquad |N(\beta)| > 1.$$

Write $\alpha = a + b\sqrt{-5}$. Then, from $|N(2)| = 4$, we have $|N(\alpha)| = a^2 + 5b^2 = 2$ and this is impossible. Therefore $2$ is a prime in $R(\sqrt{-5})$. Similarly $3$, $1 \pm \sqrt{-5}$ are also primes in $R(\sqrt{-5})$. In order to overcome this problem Kummer invented the notion of ideals.
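The norm argument for $R(\sqrt{-5})$ can be replayed mechanically. In the sketch below an integer $a + b\sqrt{-5}$ is stored as the pair `(a, b)`; this representation and the function names are assumptions made for illustration. The search confirms that no integer of the field has norm 2 or 3, which is exactly what makes $2$, $3$ and $1 \pm \sqrt{-5}$ primes.

```python
def norm(a, b):
    """Norm of a + b*sqrt(-5)."""
    return a * a + 5 * b * b

def multiply(x, y):
    """Product of two integers of R(sqrt(-5)), each given as a pair (a, b)."""
    (a, b), (c, d) = x, y
    return (a * c - 5 * b * d, a * d + b * c)

def elements_of_norm(n):
    """All pairs (a, b) with a^2 + 5*b^2 == n."""
    bound = int(n ** 0.5) + 1
    return [(a, b)
            for a in range(-bound, bound + 1)
            for b in range(-bound, bound + 1)
            if norm(a, b) == n]

print(multiply((1, 1), (1, -1)))                  # (1 + sqrt(-5)) * (1 - sqrt(-5))
print(elements_of_norm(2), elements_of_norm(3))   # no integers of norm 2 or 3
print(elements_of_norm(1))                        # the units of the field
```

Since the only elements of norm 1 are $\pm 1$, and the four numbers have norms 4, 9, 6, 6, none of $2$, $3$ can be an associate of $1 \pm \sqrt{-5}$.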

16.6 Ideals

We shall now consider a fixed algebraic number field $R(\vartheta)$ of degree $n$.

Definition 6.1. Let $\alpha_1, \ldots, \alpha_q$ be any $q$ integers in $R(\vartheta)$. The set of integers of the form

$$\eta_1 \alpha_1 + \cdots + \eta_q \alpha_q,$$

where $\eta_1, \ldots, \eta_q$ are integers in $R(\vartheta)$, is called an ideal generated by $\alpha_1, \ldots, \alpha_q$, and is denoted by $[\alpha_1, \ldots, \alpha_q]$. We shall use the capital Gothic letters $\mathfrak{A}, \mathfrak{B}, \mathfrak{C}, \mathfrak{D}, \ldots$ to denote ideals.

Definition 6.2. An ideal $[\alpha]$ generated by a single integer $\alpha$ is called a principal ideal.

The set $[0]$ containing only the integer 0 is an ideal, but we shall assume that our ideals are distinct from $[0]$. The ideal $[1]$ contains all the integers in $R(\vartheta)$, and is called the unit ideal which we shall denote by $\mathfrak{O}$.

Theorem 6.1. Ideals possess the following properties:
1) If $\alpha$, $\beta$ are in the ideal, then so are $\alpha \pm \beta$;
2) If $\alpha$ is in the ideal and $\eta$ is an integer in $R(\vartheta)$, then $\eta\alpha$ is in the ideal. □

We see from this theorem that if $1 \in \mathfrak{A}$, then $\mathfrak{A} = [1]$.

Definition 6.3. Let $\mathfrak{A} = [\alpha_1, \ldots, \alpha_q]$ and $\mathfrak{B} = [\beta_1, \ldots, \beta_r]$ be two ideals. If $\mathfrak{A}$ and $\mathfrak{B}$ contain exactly the same integers in $R(\vartheta)$, then we say that they are equal and we write $\mathfrak{A} = \mathfrak{B}$.

Theorem 6.2. A necessary and sufficient condition for two ideals $[\alpha_1, \ldots, \alpha_q]$ and $[\beta_1, \ldots, \beta_r]$ to be equal is that there are integers $\xi_{ij}$, $\eta_{ji}$ ($1 \le i \le q$, $1 \le j \le r$) such that

$$\alpha_i = \sum_{j=1}^{r} \xi_{ij} \beta_j, \qquad \beta_j = \sum_{i=1}^{q} \eta_{ji} \alpha_i.$$

In particular, if $[\alpha] = [\beta]$, then $\alpha$ and $\beta$ are associates. □

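Let $\alpha_1, \ldots, \alpha_q$ be any $q$ rational integers with greatest common factor $d$. Then there are rational integers $x_1, \ldots, x_q$ such that $d = x_1 \alpha_1 + \cdots + x_q \alpha_q$, and hence, in the rational number field, $[\alpha_1, \ldots, \alpha_q] = [d]$. In other words there are only principal ideals in the rational number field. On the other hand we know from our discussion in the last section that, in $R(\sqrt{-5})$, the ideal $[2, 1 + \sqrt{-5}]$ cannot be reduced to a principal ideal, so that non-principal ideals exist.

Definition 6.4. Let $\mathfrak{A} = [\alpha_1, \ldots, \alpha_q]$ and $\mathfrak{B} = [\beta_1, \ldots, \beta_r]$ be two ideals. We call the ideal $[\alpha_1 \beta_1, \ldots, \alpha_1 \beta_r, \alpha_2 \beta_1, \ldots, \alpha_2 \beta_r, \ldots, \alpha_q \beta_r]$ the product of $\mathfrak{A}$ and $\mathfrak{B}$; we shall denote it by $\mathfrak{A} \cdot \mathfrak{B}$.

Theorem 6.3. The product of $\mathfrak{A}$ and $\mathfrak{B}$ is independent of the choices $\alpha_i$, $\beta_i$. That is, if

$$[\alpha_1, \ldots, \alpha_q] = [\alpha_1', \ldots, \alpha_s'], \qquad [\beta_1, \ldots, \beta_r] = [\beta_1', \ldots, \beta_t'],$$

then

$$[\alpha_1 \beta_1, \ldots, \alpha_1 \beta_r, \alpha_2 \beta_1, \ldots, \alpha_q \beta_r] = [\alpha_1' \beta_1', \ldots, \alpha_1' \beta_t', \alpha_2' \beta_1', \ldots, \alpha_s' \beta_t']. \qquad □$$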


16.7 Unique Factorization Theorem for Ideals

This can easily be proved from the definition of equality for ideals. Also we have $\mathfrak{O} \cdot \mathfrak{A} = \mathfrak{A}$ for any ideal $\mathfrak{A}$, and multiplication of ideals is commutative and associative. We can then use induction to define $\mathfrak{A}_1 \cdots \mathfrak{A}_m$ and $\mathfrak{A}^m$, where $m$ is a natural number, and show that the usual rules of indices hold.

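Definition 6.5. Let $\mathfrak{A}$, $\mathfrak{B}$ be two ideals. Suppose that there exists an ideal […] $(\mathfrak{A}_1, \ldots, \mathfrak{A}_{m-1}, \mathfrak{A}_m) = ((\mathfrak{A}_1, \ldots, \mathfrak{A}_{m-1}), \mathfrak{A}_m)$. If $(\mathfrak{A}, \mathfrak{B}) = \mathfrak{O}$ then we say that $\mathfrak{A}$, $\mathfrak{B}$ are coprime. It is easy to see that if $(\mathfrak{A}, \mathfrak{B}) = \mathfrak{D}$, then $(\mathfrak{A}\mathfrak{C}, \mathfrak{B}\mathfrak{C}) = \mathfrak{D}\mathfrak{C}$ for any ideal $\mathfrak{C}$.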


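Theorem 7.5. Let $\mathfrak{P}$ be a prime ideal. Suppose that $\mathfrak{P} \mid \mathfrak{A}\mathfrak{B}$ and $\mathfrak{P} \nmid \mathfrak{A}$. Then $\mathfrak{P} \mid \mathfrak{B}$.

Proof. Since $\mathfrak{P} \nmid \mathfrak{A}$ we have $(\mathfrak{P}, \mathfrak{A}) = \mathfrak{O}$ and so $(\mathfrak{P}\mathfrak{B}, \mathfrak{A}\mathfrak{B}) = \mathfrak{B}$. Since $\mathfrak{P} \mid \mathfrak{A}\mathfrak{B}$ we now have $\mathfrak{P} \mid \mathfrak{B}$. □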

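Theorem 7.6. Every ideal has finitely many distinct divisors.

Proof. Given the ideal $\mathfrak{A}$ we choose $\mathfrak{B}$ and a natural number $a$ such that $\mathfrak{A} \cdot \mathfrak{B} = [a]$. Therefore $\mathfrak{A}$ contains $a$, and any divisor of $\mathfrak{A}$ also contains $a$. Thus it suffices to show that there is at most a finite number of ideals containing a fixed natural number. Let $\mathfrak{M} = [\alpha_1, \ldots, \alpha_m]$ be an ideal which contains $a$, and let $\omega_1, \ldots, \omega_n$ be an integral basis for $R(\vartheta)$ so that each $\alpha_j$ can be written as $\alpha_j = g_{j1} \omega_1 + \cdots + g_{jn} \omega_n$ ($1 \le j \le m$), where $g_{jk}$ are rational integers. Now set

$$g_{jk} = a q_{jk} + r_{jk} \quad (0 \le r_{jk} < a), \qquad \beta_j = \sum_{k=1}^{n} q_{jk} \omega_k, \qquad \gamma_j = \sum_{k=1}^{n} r_{jk} \omega_k,$$

so that $\alpha_j = a\beta_j + \gamma_j$. Since $a$ lies in $\mathfrak{M}$, we have

$$\mathfrak{M} = [a\beta_1 + \gamma_1, \ldots, a\beta_m + \gamma_m, a] = [\gamma_1, \ldots, \gamma_m, a].$$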

Since there is at most a finite number of sets $\gamma_1, \ldots, \gamma_m$ the required result follows. □

Theorem 7.7 (Fundamental theorem for ideals). Any ideal $\mathfrak{A}$ distinct from $\mathfrak{O}$ can be factorized into a product of prime ideals. Furthermore, apart from the ordering of the factors, this factorization is unique.

Proof. Since each ideal has at most a finite number of divisors we can use induction on the number of divisors of $\mathfrak{A}$. We first establish the existence of a factorization. If $\mathfrak{A}$ is a prime ideal, then there is nothing more to prove; otherwise we let $\mathfrak{A} = \mathfrak{B}\mathfrak{C}$ ($\mathfrak{B} \ne \mathfrak{O}$, $\mathfrak{C} \ne \mathfrak{O}$). Since the numbers of divisors of $\mathfrak{B}$ and of $\mathfrak{C}$ are less than that of $\mathfrak{A}$, the required result follows by induction. We now prove the uniqueness of the factorization. Suppose that

$$\mathfrak{A} = \mathfrak{P}_1 \mathfrak{P}_2 \cdots \mathfrak{P}_l = \mathfrak{P}_1' \mathfrak{P}_2' \cdots \mathfrak{P}_m', \qquad m \ge 1, \quad l \ge 1.$$

If $\mathfrak{A}$ is a prime ideal, then $l = m = 1$ and there is nothing to prove. If $\mathfrak{A}$ is not a prime ideal, then $l > 1$, $m > 1$. Since $\mathfrak{P}_1 \mid \mathfrak{P}_1' \cdots \mathfrak{P}_m'$, there must be a $\mathfrak{P}_j'$ ($1 \le j \le m$) such that $\mathfrak{P}_1 = \mathfrak{P}_j'$. We may assume without loss that $j = 1$ so that $\mathfrak{P}_2 \cdots \mathfrak{P}_l = \mathfrak{P}_2' \cdots \mathfrak{P}_m'$, and the required result follows from the induction hypothesis. □

We have proved that every ideal distinct from $\mathfrak{O}$ can be written as $\mathfrak{P}_1^{a_1} \mathfrak{P}_2^{a_2} \cdots \mathfrak{P}_r^{a_r}$ where $\mathfrak{P}_j$ are distinct prime ideals, and $a_j$ are natural numbers. The representation is unique apart from the ordering of the $\mathfrak{P}_j$.



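16.8 Basis for Ideals

Let $\omega_1, \ldots, \omega_n$ be an integral basis for $R(\vartheta)$, and let $\mathfrak{A}$ be any ideal of $R(\vartheta)$. Since each member of $\mathfrak{A}$ is representable as a linear combination of $\omega_1, \ldots, \omega_n$ with rational integer coefficients we see, from Theorem 6.1, that $\mathfrak{A}$ can be viewed as a linear module. Also, corresponding to the ideal $\mathfrak{A}$, there is an ideal $\mathfrak{B}$ and a natural number $a$ such that $\mathfrak{A}\mathfrak{B} = [a]$, so that $a\omega_1, \ldots, a\omega_n$ all lie in $\mathfrak{A}$; and since these $n$ numbers are linearly independent we see that $\mathfrak{A}$ is actually a linear module of dimension $n$. From our discussion in Chapter 14, section 9, this module $\mathfrak{A}$ must have a basis, and every basis must have exactly $n$ integers. In particular, we have:

Theorem 8.1. Let $\mathfrak{A}$ be an ideal of $R(\vartheta)$. Then we can find $n$ integers $\alpha_1, \ldots, \alpha_n$ in $\mathfrak{A}$ such that

$$\alpha_j = a_{j1} \omega_1 + a_{j2} \omega_2 + \cdots + a_{jj} \omega_j \qquad (1 \le j \le n),$$

where $a_{ij}$ are rational integers, $a_{ii} > 0$ ($1 \le i \le n$), $0 \le a_{ji} < a_{ii}$ ($1 \le i < j \le n$), and $\alpha_1, \ldots, \alpha_n$ form a standard basis for $\mathfrak{A}$. □

Let $\alpha_1, \ldots, \alpha_n$ and $\beta_1, \ldots, \beta_n$ be two bases for $\mathfrak{A}$ and let

$$\alpha_i = \sum_{j=1}^{n} u_{ij} \beta_j \qquad (i = 1, \ldots, n).$$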

Then the coefficient matrix $(u_{ij})$ must be a modular matrix so that $\Delta(\alpha_1, \ldots, \alpha_n) = \Delta(\beta_1, \ldots, \beta_n)$. Thus the discriminant of a basis of an ideal is independent of the choice of the basis so that we may write this as $\Delta(\mathfrak{A})$.

We shall now examine the standard basis for ideals of the quadratic fields $R(\sqrt{D})$. Let $1, \omega$ be an integral basis for $R(\sqrt{D})$; the definition of $\omega$ is given in Theorem 4.5. From Theorem 8.1 we can find two integers $a$, $b + c\omega$ to form a standard basis. Here $a$, $b$, $c$ are rational integers and we may suppose that $a > 0$, $c > 0$, $0 \le b < a$. However we should note that not all pairs of integers of the above form always form a basis for an ideal; there are other conditions on $a$, $b$, $c$. It is easy to see that $a$, $b + c\omega$ form a standard basis for a certain ideal only when $a\omega$, $\omega(b + c\omega)$ are representable as $xa + y(b + c\omega)$, where $x$, $y$ are rational integers. From $a\omega = xa + y(b + c\omega)$ we have $a = yc$, $ax + by = 0$, so that $c \mid a$, $c \mid b$. Let $a = cm$, $b = cn$. Then from

$$c(n + \omega)\omega = c(n + \omega)(n + \omega + \omega') - c(n + \omega)(n + \omega') = -cN(n + \omega) + c(n + \omega)(n + S(\omega)),$$

where $\omega'$ is the conjugate of $\omega$, and $S(\omega)$ and $N(n + \omega)$ represent the trace of $\omega$ and the norm of $n + \omega$ respectively, we see that a necessary and sufficient condition for $cm$, $c(n + \omega)$ to be a standard basis for a certain ideal is that

$$N(n + \omega) \equiv 0 \pmod{m}. \tag{1}$$

From Theorem 4.5 we see that (1) is equivalent to

$$\Delta \equiv \begin{cases} (2n + 1)^2 \pmod{4m}, & \text{if } D \equiv 1 \pmod{4}; \\ (2n)^2 \pmod{4m}, & \text{if } D \equiv 2, 3 \pmod{4}. \end{cases} \tag{2}$$

Therefore we have:

Theorem 8.2. A necessary and sufficient condition for a pair of integers $cm$, $c(n + \omega)$ ($c > 0$, $m > 0$, $0 \le n < m$) to be a standard basis for a certain ideal of $R(\sqrt{D})$ is that either (1) or (2) holds. □
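Conditions (1) and (2) are easy to test numerically. The sketch below takes $\omega = (1 + \sqrt{D})/2$ for $D \equiv 1 \pmod 4$ and $\omega = \sqrt{D}$ otherwise, so that $N(n + \omega)$ is $n^2 + n + (1 - D)/4$ or $n^2 - D$ respectively; the function names are hypothetical. For a given $m$ it lists the residues $n$ with $0 \le n < m$ for which $m$, $n + \omega$ is a standard basis (the case $c = 1$).

```python
def norm_n_plus_omega(n, D):
    """N(n + omega) for the integral basis 1, omega of R(sqrt(D)), D square-free."""
    if D % 4 == 1:
        return n * n + n + (1 - D) // 4   # omega = (1 + sqrt(D))/2
    return n * n - D                      # omega = sqrt(D)

def satisfies_condition_2(n, D, m):
    """Condition (2): Delta is congruent to (2n+1)^2 or (2n)^2 modulo 4m."""
    if D % 4 == 1:
        return (D - (2 * n + 1) ** 2) % (4 * m) == 0
    return (4 * D - (2 * n) ** 2) % (4 * m) == 0

def standard_basis_residues(D, m):
    """All n with 0 <= n < m such that m, n + omega satisfies condition (1)."""
    return [n for n in range(m) if norm_n_plus_omega(n, D) % m == 0]

print(standard_basis_residues(-5, 2))   # ideals of norm 2 in R(sqrt(-5))
print(standard_basis_residues(-5, 3))   # ideals of norm 3 in R(sqrt(-5))
print(standard_basis_residues(-5, 11))  # none: 11 has no standard basis 11, n + omega
print(standard_basis_residues(5, 11))   # ideals of norm 11 in R(sqrt(5))
```

In $R(\sqrt{-5})$ the value $n = 1$ for $m = 2$ corresponds to the ideal $[2, 1 + \sqrt{-5}]$ mentioned in section 16.6.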

16.9 Congruent Relations

Definition 9.1. If $\mathfrak{A} \mid [\alpha]$, then we say that $\mathfrak{A}$ divides $\alpha$, and we write $\mathfrak{A} \mid \alpha$. It is easy to see that $\mathfrak{A} \mid \alpha$ means that $\alpha$ is in $\mathfrak{A}$.

We can follow the discussion in Chapter 14, section 9, and define a congruent relation on the integers of the field $R(\vartheta)$ with respect to an ideal.

Definition 9.2. If $\mathfrak{A} \mid \alpha - \beta$, where $\alpha$, $\beta$ are integers in $R(\vartheta)$, then we say that $\alpha$ and $\beta$ are congruent modulo $\mathfrak{A}$, and we write $\alpha \equiv \beta \pmod{\mathfrak{A}}$.

The integers of the field $R(\vartheta)$ are now partitioned into equivalence classes, called the residue classes modulo $\mathfrak{A}$. We shall denote by $N(\mathfrak{A})$ the number of these residue classes, and we call $N(\mathfrak{A})$ the norm of $\mathfrak{A}$. From Theorem 14.9.3 we have:

Theorem 9.1. Let $\omega_1, \ldots, \omega_n$ be an integral basis for $R(\vartheta)$, and let $\alpha_1, \ldots, \alpha_n$ be any basis for the ideal $\mathfrak{A}$. If $\alpha_i = \sum_{j=1}^{n} a_{ij} \omega_j$, then $N(\mathfrak{A})$ is equal to the absolute value of the determinant of the coefficients, that is $N(\mathfrak{A}) = \bigl|\,|a_{ij}|\,\bigr|$. □

From this theorem we deduce at once:

Theorem 9.2. Let $\Delta$ be the discriminant of $R(\vartheta)$, and $\Delta(\mathfrak{A})$ be the discriminant of the basis for $\mathfrak{A}$. Then we have $\Delta(\mathfrak{A}) = (N(\mathfrak{A}))^2 \Delta$. □

Theorem 9.3. The norm of a principal ideal $[\alpha]$ satisfies $N([\alpha]) = |N(\alpha)|$. □

Theorem 9.4. $N(\mathfrak{A}\mathfrak{B}) = N(\mathfrak{A}) N(\mathfrak{B})$.

Proof. Since $\mathfrak{A}$ contains $\mathfrak{A}\mathfrak{B}$, from Theorem 14.9.4, the members of $\mathfrak{A}$ are partitioned into residue classes modulo $\mathfrak{A}\mathfrak{B}$, and the number of classes is equal to $N(\mathfrak{A}\mathfrak{B}) / N(\mathfrak{A})$. It remains to prove that the number of classes is also equal to $N(\mathfrak{B})$.



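Let $\beta_1, \ldots, \beta_{N(\mathfrak{B})}$ denote the residue classes mod $\mathfrak{B}$. There exists an integer $\alpha \in \mathfrak{A}$ such that $([\alpha], \mathfrak{A}\mathfrak{B}) = \mathfrak{A}$. Now $\alpha\beta_1, \ldots, \alpha\beta_{N(\mathfrak{B})}$ all lie in $\mathfrak{A}$, and if $j \ne k$ ($1 \le j, k \le N(\mathfrak{B})$), then $\alpha\beta_j \not\equiv \alpha\beta_k \pmod{\mathfrak{A}\mathfrak{B}}$. From $([\alpha], \mathfrak{A}\mathfrak{B}) = \mathfrak{A}$, we know that corresponding to any $\gamma$ in $\mathfrak{A}$, there are integers $\eta$, $\delta$ such that $\gamma = \eta\alpha + \delta$, $\delta \in \mathfrak{A}\mathfrak{B}$. Also, corresponding to the integer $\eta$, there is an integer $\beta$ in $\mathfrak{B}$ and a natural number $j$ ($1 \le j \le N(\mathfrak{B})$) such that $\eta = \beta_j + \beta$ so that $\gamma = \alpha\beta_j + \alpha\beta + \delta \equiv \alpha\beta_j \pmod{\mathfrak{A}\mathfrak{B}}$. This shows that every member of $\mathfrak{A}$ must be congruent to exactly one of $\alpha\beta_1, \ldots, \alpha\beta_{N(\mathfrak{B})}$ modulo $\mathfrak{A}\mathfrak{B}$, and therefore the number of classes concerned must be equal to $N(\mathfrak{B})$ as required. □

Theorem 9.5. Let $\mathfrak{P}$ be a prime ideal, and let $\alpha$ be any integer not divisible by $\mathfrak{P}$. Then $\alpha^{N(\mathfrak{P}) - 1} \equiv 1 \pmod{\mathfrak{P}}$.

Proof. Let $0, \pi_1, \pi_2, \ldots, \pi_{N(\mathfrak{P}) - 1}$ denote the residue classes mod $\mathfrak{P}$. Since $\mathfrak{P} \nmid \alpha$, the numbers $0, \alpha\pi_1, \alpha\pi_2, \ldots, \alpha\pi_{N(\mathfrak{P}) - 1}$ also represent the residue classes mod $\mathfrak{P}$. Therefore

$$\alpha^{N(\mathfrak{P}) - 1} \, \pi_1 \pi_2 \cdots \pi_{N(\mathfrak{P}) - 1} \equiv \pi_1 \pi_2 \cdots \pi_{N(\mathfrak{P}) - 1} \pmod{\mathfrak{P}},$$

and the theorem follows. □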
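Theorem 9.5 can be checked on a small example. The sketch below assumes the standard fact that $[3]$ is a prime ideal of norm $9$ in $R(i)$, so the theorem predicts $\alpha^8 \equiv 1 \pmod{[3]}$ for every integer $\alpha = a + bi$ not divisible by $3$. An integer $a + bi$ is stored as the pair `(a, b)` and reduced modulo 3 coordinate by coordinate; the function names are hypothetical.

```python
def gauss_mul_mod(x, y, q):
    """Product of a + b*i and c + d*i with both coordinates reduced modulo q."""
    (a, b), (c, d) = x, y
    return ((a * c - b * d) % q, (a * d + b * c) % q)

def gauss_pow_mod(x, e, q):
    """x raised to the power e, reduced modulo q, by repeated multiplication."""
    result = (1, 0)
    for _ in range(e):
        result = gauss_mul_mod(result, x, q)
    return result

# Every nonzero residue class modulo [3] should satisfy alpha^(N - 1) == 1 with N = 9.
residues = [(a, b) for a in range(3) for b in range(3) if (a, b) != (0, 0)]
print(all(gauss_pow_mod(x, 8, 3) == (1, 0) for x in residues))
```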

16.10 Prime Ideals

Theorem 10.1. Every prime ideal $\mathfrak{P}$ must divide a rational prime $p$. Moreover, $p$ is the least positive rational integer in $\mathfrak{P}$ so that it is unique.

Proof. From Theorem 7.1 there must exist a rational integer $a$ such that $\mathfrak{P} \mid [a]$. Let $a = \prod p$ be its factorization, so that there must be a prime $p$ such that $\mathfrak{P} \mid [p]$, or $\mathfrak{P} \mid p$. Suppose, if possible, there exists a positive rational integer $b$ such that $b < p$ and $\mathfrak{P} \mid b$. Then $b \in \mathfrak{P}$ so that $(p, b) = 1$ also lies in $\mathfrak{P}$, giving $\mathfrak{P} = [1]$, which is impossible. Therefore $p$ is the least positive rational integer in $\mathfrak{P}$. □

Let the prime ideal factorization for $[p]$ be $\mathfrak{P}_1\mathfrak{P}_2\cdots\mathfrak{P}_t$. Then, on taking the norm, we have $p^n = N([p]) = N(\mathfrak{P}_1)N(\mathfrak{P}_2)\cdots N(\mathfrak{P}_t)$. It follows that the norm of a prime ideal must be a prime power. If $N(\mathfrak{P}) = p^f$, then we call $f$ the degree of $\mathfrak{P}$. Concerning the factorization of $[p]$ there is the following important theorem which we shall not prove.

Theorem 10.2 (Dedekind's discriminant theorem). A necessary and sufficient condition for $\mathfrak{P}^2 \mid p$ is that $p \mid \Delta$. □

Let us examine the factorization of $[p]$ in the quadratic field $R(\sqrt{D})$. Clearly there can only be the following three possibilities.
1) $[p] = \mathfrak{P}$;
2) $[p] = \mathfrak{P}\mathfrak{Q}$, $\mathfrak{P} \neq \mathfrak{Q}$, $N(\mathfrak{P}) = N(\mathfrak{Q}) = p$;
3) $[p] = \mathfrak{P}^2$, $N(\mathfrak{P}) = p$.
Concerning the factorization of $[p]$ in a quadratic field, we have:


Theorem 10.3. Let $\Delta$ be the discriminant of $R(\sqrt{D})$. Then 1), 2) or 3) in the above holds according to $\left(\frac{\Delta}{p}\right) = -1$, $+1$, or $0$. Here $\left(\frac{\Delta}{p}\right)$ is the Kronecker symbol.

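Proof. If $\mathfrak{P}$ is a prime divisor of $[p]$, and $N(\mathfrak{P}) = p$, then either $[p] = \mathfrak{P}\mathfrak{Q}$ or $[p] = \mathfrak{P}^2$. Let $cm$, $c(n + \omega)$ be a standard basis for the ideal. Then $N(\mathfrak{P}) = c^2 m = p$, so that $c = 1$, $m = p$. From (2) in section 8, we now see that $\left(\frac{\Delta}{p}\right)$ is either $+1$ or $0$.

Let us suppose, conversely, that $\left(\frac{\Delta}{p}\right) = +1$ or $0$. We first consider the case $p \neq 2$.

1) If $\left(\frac{\Delta}{p}\right) = 1$, then there exists $a$ such that $p \nmid a$ and $\Delta \equiv a^2 \pmod p$. Since $p \neq 2$, we have $(p, 2a) = 1$ so that
$$[p, a + \sqrt{\Delta}]\,[p, a - \sqrt{\Delta}] = [p]\left[p,\ a + \sqrt{\Delta},\ a - \sqrt{\Delta},\ \frac{a^2 - \Delta}{p}\right] = [p]\left[p,\ a + \sqrt{\Delta},\ 2a,\ \frac{a^2 - \Delta}{p}\right] = [p].$$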

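Also $[p, a + \sqrt{\Delta}] \neq [p, a - \sqrt{\Delta}]$, since otherwise we have $[p, a + \sqrt{\Delta}] = [p, a - \sqrt{\Delta}] = [p, a + \sqrt{\Delta}, 2a] = [1]$ and this is impossible; $[p, a + \sqrt{\Delta}]$ and $[p, a - \sqrt{\Delta}]$ are not $\mathfrak{O}$. Therefore, when $p \neq 2$ and $\left(\frac{\Delta}{p}\right) = 1$, $[p]$ is the product of two distinct prime ideals.

2) If $\left(\frac{\Delta}{p}\right) = 0$, then $p \mid \Delta$, so that
$$[p, \sqrt{\Delta}]^2 = [p, \sqrt{\Delta}]\,[p, \sqrt{\Delta}] = [p]\left[p,\ \sqrt{\Delta},\ \frac{\Delta}{p}\right].$$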

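But $\Delta = D$ or $4D$, $p \neq 2$ and $D$ is square-free, so that $\left(p, \frac{\Delta}{p}\right) = 1$ and hence $[p] = [p, \sqrt{\Delta}]^2$. That is, if $p \neq 2$ and $\left(\frac{\Delta}{p}\right) = 0$, then $[p]$ is the square of a prime ideal.

Let us now consider the case $p = 2$. Since $\left(\frac{\Delta}{2}\right) \neq -1$ we must have $D \equiv 2, 3 \pmod 4$ or $D \equiv 1 \pmod 8$. As before we can prove:
3) When $D \equiv 2 \pmod 4$, we have $\left(\frac{\Delta}{2}\right) = 0$ and $[2] = [2, \sqrt{D}]^2$;
4) When $D \equiv 3 \pmod 4$, we have $\left(\frac{\Delta}{2}\right) = 0$ and $[2] = [2, 1 + \sqrt{D}]^2$;
5) When $D \equiv 1 \pmod 8$, we have $\left(\frac{\Delta}{2}\right) = 1$ and
$$[2] = \left[2,\ \frac{1 + \sqrt{D}}{2}\right] \cdot \left[2,\ \frac{1 - \sqrt{D}}{2}\right].$$
Since the two factors here are distinct, $[2]$ is now the product of two distinct prime ideals. □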
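The trichotomy of Theorem 10.3 is easy to tabulate by machine. The following is a minimal sketch, assuming a square-free $D \neq 1$ and a rational prime $p$ supplied by the caller; the function names are hypothetical, and the Kronecker symbol is evaluated by Euler's criterion for odd $p$ and by the residue of $\Delta$ modulo 8 for $p = 2$.

```python
def field_discriminant(D):
    """Discriminant of R(sqrt(D)), assuming D is square-free and D != 1."""
    return D if D % 4 == 1 else 4 * D


def kronecker_prime(delta, p):
    """Kronecker symbol (delta/p) for a rational prime p (assumed prime, not checked)."""
    if p == 2:
        if delta % 2 == 0:
            return 0
        return 1 if delta % 8 in (1, 7) else -1
    r = pow(delta % p, (p - 1) // 2, p)  # Euler's criterion
    return -1 if r == p - 1 else r


def factorization_type(D, p):
    """Describe the factorization of [p] in R(sqrt(D)) according to Theorem 10.3."""
    symbol = kronecker_prime(field_discriminant(D), p)
    return {
        -1: "prime ideal",
        1: "product of two distinct prime ideals",
        0: "square of a prime ideal",
    }[symbol]
```

For example, in $R(\sqrt{-1})$ the call `factorization_type(-1, 5)` reports a product of two distinct prime ideals, in agreement with $5 = (2 + i)(2 - i)$.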

Theorem 10.3 establishes Dedekind's discriminant theorem for quadratic fields. We shall now examine a specific example for a cubic field. Let $\alpha$ be a zero of $f(x) = x^3 - x^2 - 2x - 8$. We saw in §4 that $R(\alpha)$ is a cubic field with discriminant $-503$, that $1$, $\alpha$, $\beta = 4/\alpha$ form an integral basis, and that $\beta$ is a zero of $g(y) = y^3 + y^2 + 2y - 8$. We now consider the factorization of $[503]$ in $R(\alpha)$. Let $\mathfrak{P}$, $\mathfrak{Q}$, $\mathfrak{R}$ denote prime ideals of $R(\alpha)$. Then the factorization of $[503]$ must take one of the following five situations:

1) $[503] = \mathfrak{P}\mathfrak{Q}\mathfrak{R}$; $\mathfrak{P}$, $\mathfrak{Q}$, $\mathfrak{R}$ distinct and $N(\mathfrak{P}) = N(\mathfrak{Q}) = N(\mathfrak{R}) = 503$;
2) $[503] = \mathfrak{P}^2\mathfrak{Q}$; $\mathfrak{P} \neq \mathfrak{Q}$ and $N(\mathfrak{P}) = N(\mathfrak{Q}) = 503$;
3) $[503] = \mathfrak{P}^3$; $N(\mathfrak{P}) = 503$;
4) $[503] = \mathfrak{P}\mathfrak{Q}$; $N(\mathfrak{P}) = 503$, $N(\mathfrak{Q}) = 503^2$;
5) $[503] = \mathfrak{P}$; $N(\mathfrak{P}) = 503^3$.

In each of the first four situations, $[503]$ has a prime divisor $\mathfrak{P}$ with norm 503. Let us first examine these four situations. Let $a_0$, $b_0 + b_1\alpha$, $c_0 + c_1\alpha + c_2\beta$ be a standard basis for $\mathfrak{P}$ so that $b_0 < a_0$, $c_0 < a_0$, $c_1 < b_1$. Also, since $a_0\alpha$, $a_0\beta$ lie in $\mathfrak{P}$ we have, in addition, that $b_1 \le a_0$, $c_2 \le a_0$, and from $N(\mathfrak{P}) = a_0 b_1 c_2 = 503$, we obtain $a_0 = 503$, $b_1 = 1$, $c_2 = 1$, $c_1 = 0$. Therefore $\mathfrak{P}$ must take the form $[503, a + \alpha, b + \beta]$, and $503$, $a + \alpha$, $b + \beta$ form a standard basis for $\mathfrak{P}$. Since $a + \alpha$, $b + \beta \in \mathfrak{P}$ and $N(\mathfrak{P}) = 503$, we have $N(a + \alpha) \equiv N(b + \beta) \equiv 0 \pmod{503}$. But $a + \alpha$ and $b + \beta$ are the roots of $f(x - a) = 0$ and $g(y - b) = 0$ respectively so that $N(a + \alpha) = |f(-a)|$ and $N(b + \beta) = |g(-b)|$. Therefore $a$ and $b$ satisfy the cubic congruences $a^3 + a^2 - 2a + 8 \equiv 0 \pmod{503}$ and $b^3 - b^2 + 2b + 8 \equiv 0 \pmod{503}$, which give the solutions $a \equiv 149, 149, 204$ and $b \equiv 395, 395, 217 \pmod{503}$. Therefore $\mathfrak{P}$ must be one of the following four ideals:
$$[503,\ 149 + \alpha,\ 395 + \beta], \quad [503,\ 204 + \alpha,\ 217 + \beta], \quad [503,\ 149 + \alpha,\ 217 + \beta], \quad [503,\ 204 + \alpha,\ 395 + \beta].$$

The third ideal is not $\mathfrak{P}$, since otherwise
$$\alpha(217 + \beta) - 217(149 + \alpha) + 65(503) = 4 - 217 \cdot 149 + 65 \cdot 503 = 366$$
would be in $\mathfrak{P}$, and from $(366, 503) = 1$ we would have $\mathfrak{P} = \mathfrak{O}$. Similarly the fourth ideal is not $\mathfrak{P}$. Next, from
$$\begin{aligned}
(149 + \alpha)\alpha &= -46(503) + 150(149 + \alpha) + 2(395 + \beta),\\
(149 + \alpha)\beta &= -117(503) + 149(395 + \beta),\\
(395 + \beta)\alpha &= -117(503) + 395(149 + \alpha),\\
(395 + \beta)\beta &= -310(503) + 2(149 + \alpha) + 394(395 + \beta),
\end{aligned}$$

we see that $503$, $149 + \alpha$, $395 + \beta$ do form a standard basis for the prime ideal $[503, 149 + \alpha, 395 + \beta]$. Similarly $503$, $204 + \alpha$, $217 + \beta$ do form a standard basis for the prime ideal $[503, 204 + \alpha, 217 + \beta]$. Finally the two ideals $[503, 149 + \alpha, 395 + \beta]$ and $[503, 204 + \alpha, 217 + \beta]$ are distinct divisors of the ideal $[503]$ and we therefore conclude that our situation 2) is the only possibility, and computation shows that actually
$$[503] = [503,\ 149 + \alpha,\ 395 + \beta]^2 \cdot [503,\ 204 + \alpha,\ 217 + \beta].$$
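The two cubic congruences used in this example can be solved by exhaustive search, which also exhibits the repeated roots 149 and 395. The sketch below is only an illustration; `roots_mod_p` and `repeated_roots` are hypothetical helper names, and polynomials are given by their coefficient lists, leading coefficient first.

```python
P = 503


def eval_poly_mod(coeffs, x, p):
    """Evaluate a polynomial (leading coefficient first) at x modulo p by Horner's rule."""
    value = 0
    for c in coeffs:
        value = (value * x + c) % p
    return value


def roots_mod_p(coeffs, p):
    """All residues x (0 <= x < p) at which the polynomial vanishes modulo p."""
    return [x for x in range(p) if eval_poly_mod(coeffs, x, p) == 0]


def repeated_roots(coeffs, p):
    """Roots at which the formal derivative also vanishes modulo p."""
    n = len(coeffs) - 1
    derivative = [c * (n - i) for i, c in enumerate(coeffs[:-1])]
    return [x for x in roots_mod_p(coeffs, p) if eval_poly_mod(derivative, x, p) == 0]


# a^3 + a^2 - 2a + 8 == 0 and b^3 - b^2 + 2b + 8 == 0 (mod 503)
a_roots = roots_mod_p([1, 1, -2, 8], P)
b_roots = roots_mod_p([1, -1, 2, 8], P)
```

The integer 366 that rules out the third ideal can be confirmed with the one-line computation `4 - 217 * 149 + 65 * 503`.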


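16.11 Units

We have the following result on units: Among all the units in $R(\vartheta)$ we can choose $r = r_1 + r_2 - 1$ of them, say $\varepsilon_1, \ldots, \varepsilon_r$, such that every unit is representable as $\rho\,\varepsilon_1^{l_1} \cdots \varepsilon_r^{l_r}$ ($l_i = 0, \pm 1, \pm 2, \ldots$); here $\rho$ is a certain root of unity in $R(\vartheta)$.

Here we shall only concern ourselves with quadratic fields $R(\sqrt{D})$. Let a unit be $x + y\omega$ so that $N(x + y\omega) = \pm 1$. We need therefore to solve these equations in rational integers for the units in $R(\sqrt{D})$. Now
$$N(x + y\omega) = (x + y\omega)(x + y\omega') = \begin{cases} \dfrac{(2x + y)^2 - y^2 D}{4}, & \text{if } D \equiv 1 \pmod 4,\\[2mm] x^2 - y^2 D, & \text{if } D \equiv 2, 3 \pmod 4. \end{cases}$$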

When $D < 0$, the equations $(2x + y)^2 - y^2 D = 4$ and $x^2 - y^2 D = 1$ have only finitely many solutions, so that $R(\sqrt{D})$ can have only finitely many units. In fact if we denote by $w$ the number of units in $R(\sqrt{D})$, it is not difficult to show that $w = 6$, $4$ or $2$ according to whether $\Delta = -3$, $-4$ or $\Delta \le -7$.

Consider next $D > 0$. Now the equations $(2x + y)^2 - y^2 D = \pm 4$ and $x^2 - y^2 D = \pm 1$ are the Pell equations we considered in Chapter 10. Therefore there exists a unit $\eta$ in $R(\sqrt{D})$ such that any unit in $R(\sqrt{D})$ is representable as $\pm\eta^n$, $n = 0, \pm 1, \pm 2, \ldots$. This number $\eta$ is called the fundamental unit of $R(\sqrt{D})$.
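For $D < 0$ the count of units can be confirmed by listing the solutions of the norm equations above. This is a rough sketch, assuming a square-free $D < 0$; the function name is hypothetical, and the search box $|x|, |y| \le 2$ is enough because both equations force $|x|$ and $|y|$ to be at most 1.

```python
def count_units_imaginary(D):
    """Count the units x + y*omega of R(sqrt(D)) for a square-free D < 0."""
    if D >= 0:
        raise ValueError("this sketch only treats imaginary quadratic fields")
    count = 0
    for x in range(-2, 3):
        for y in range(-2, 3):
            if D % 4 == 1:
                # omega = (1 + sqrt(D))/2, so 4*N(x + y*omega) = (2x + y)^2 - y^2*D
                is_unit = (2 * x + y) ** 2 - y * y * D == 4
            else:
                # omega = sqrt(D), so N(x + y*omega) = x^2 - y^2*D
                is_unit = x * x - y * y * D == 1
            if is_unit:
                count += 1
    return count
```

The values returned for $D = -3$, $-1$ and $-7$ correspond to the three cases $\Delta = -3$, $\Delta = -4$ and $\Delta \le -7$ of the text.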

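16.12 Ideal Classes

Definition 12.1. Let $\mathfrak{A}$ and $\mathfrak{B}$ be two ideals. Suppose that there exist two principal ideals $[\alpha]$ and $[\beta]$ such that $[\alpha]\mathfrak{A} = [\beta]\mathfrak{B}$. Then we say that the two ideals $\mathfrak{A}$ and $\mathfrak{B}$ belong to the same ideal class, and we write $\mathfrak{A} \sim \mathfrak{B}$.

It is easy to see that being in the same ideal class is an equivalence relation, and moreover we have
1) $\mathfrak{A} \sim \mathfrak{O}$ if and only if $\mathfrak{A}$ is a principal ideal;
2) if $\mathfrak{A} \sim \mathfrak{B}$ and $\mathfrak{C} \sim \mathfrak{D}$, then $\mathfrak{A}\mathfrak{C} \sim \mathfrak{B}\mathfrak{D}$;
3) if $\mathfrak{A}\mathfrak{C} \sim \mathfrak{B}\mathfrak{C}$ then $\mathfrak{A} \sim \mathfrak{B}$.
The ideals of $R(\vartheta)$ are now partitioned into classes called ideal classes.

Theorem 12.1. The number of ideal classes of $R(\vartheta)$ is finite.

Proof. It suffices to show that there exists a positive number $M$, depending only on $R(\vartheta)$, such that every class contains an ideal $\mathfrak{B}$ satisfying $N(\mathfrak{B}) \le M$. This is because there can only be finitely many ideals having a given norm.

Let $\mathfrak{C}$ be any ideal of $R(\vartheta)$. We already know that there exists an ideal $\mathfrak{A}$ such that $\mathfrak{A}\mathfrak{C} \sim \mathfrak{O}$, and if we can choose an ideal $\mathfrak{B}$ such that $\mathfrak{A}\mathfrak{B} \sim \mathfrak{O}$ and $N(\mathfrak{B}) \le M$, then our theorem is proved. This is because $\mathfrak{A}\mathfrak{B} \sim \mathfrak{A}\mathfrak{C}$ so that $\mathfrak{B} \sim \mathfrak{C}$.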



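Let $\omega_1, \ldots, \omega_n$ be an integral basis for $R(\vartheta)$ and let
$$M = \prod_{s=1}^{n} \sum_{m=1}^{n} |\omega_m^{(s)}|.$$
We define the natural number $k$ by $k^n \le N(\mathfrak{A}) < (k + 1)^n$. Among the $(k + 1)^n$ integers $x_1\omega_1 + \cdots + x_n\omega_n$ ($x_m = 0, 1, \ldots, k$) there are at least two which are congruent modulo $\mathfrak{A}$, say
$$y_1\omega_1 + \cdots + y_n\omega_n \equiv z_1\omega_1 + \cdots + z_n\omega_n \pmod{\mathfrak{A}};$$
here $0 \le y_m \le k$, $0 \le z_m \le k$, and we now have the non-zero integer
$$\alpha = \sum_{m=1}^{n} (y_m - z_m)\omega_m$$
in $\mathfrak{A}$. Since $|y_m - z_m| \le k$ it follows that
$$|N(\alpha)| = \left|\prod_{s=1}^{n} \sum_{m=1}^{n} (y_m - z_m)\omega_m^{(s)}\right| \le \prod_{s=1}^{n} \sum_{m=1}^{n} k\,|\omega_m^{(s)}| = k^n M \le M \cdot N(\mathfrak{A}).$$
Since $\alpha$ is in $\mathfrak{A}$ we see that $\mathfrak{A} \mid [\alpha]$, and we may write $[\alpha] = \mathfrak{A}\mathfrak{B}$ which gives $N(\mathfrak{A})N(\mathfrak{B}) = |N(\alpha)| \le M \cdot N(\mathfrak{A})$ or $N(\mathfrak{B}) \le M$ as required. □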

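Theorem 12.2. Let $h$ be the number of ideal classes of $R(\vartheta)$. Then, for any ideal $\mathfrak{A}$, we have $\mathfrak{A}^h \sim \mathfrak{O}$.

Proof. Let $\mathfrak{A}_1, \ldots, \mathfrak{A}_h$ be ideals that belong to different classes. Then so are $\mathfrak{A}\mathfrak{A}_1, \ldots, \mathfrak{A}\mathfrak{A}_h$ and hence $\mathfrak{A}_1 \cdots \mathfrak{A}_h \sim (\mathfrak{A}\mathfrak{A}_1) \cdots (\mathfrak{A}\mathfrak{A}_h)$, or $\mathfrak{A}^h \sim \mathfrak{O}$. □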

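16.13 Quadratic Fields and Quadratic Forms

Let $\Delta$ be the discriminant of the quadratic field $R(\sqrt{D})$. We shall now establish the relationship between the ideal classes of $R(\sqrt{D})$ and the classes of quadratic forms having discriminant $\Delta$.

Let $\mathfrak{A}$ be an ideal of $R(\sqrt{D})$ and let $\alpha_1$, $\alpha_2$ be a basis for $\mathfrak{A}$ satisfying
$$\alpha_1\alpha_2' - \alpha_1'\alpha_2 = N(\mathfrak{A})\sqrt{\Delta}, \qquad (1)$$
where $\alpha_1'$, $\alpha_2'$ are the conjugates of $\alpha_1$, $\alpha_2$. Corresponding to $\mathfrak{A}$ we construct the quadratic form
$$F(x, y) = \frac{N(\alpha_1 x + \alpha_2 y)}{N(\mathfrak{A})} = \frac{(\alpha_1 x + \alpha_2 y)(\alpha_1' x + \alpha_2' y)}{N(\mathfrak{A})} = ax^2 + bxy + cy^2.$$
Since $a = N(\alpha_1)/N(\mathfrak{A})$, $b = (N(\alpha_1 + \alpha_2) - N(\alpha_1) - N(\alpha_2))/N(\mathfrak{A})$, $c = N(\alpha_2)/N(\mathfrak{A})$, and $\alpha_1$, $\alpha_2$, $\alpha_1 + \alpha_2$ are in $\mathfrak{A}$ we see that $a$, $b$, $c$ are rational integers. Also, the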


discriminant of $F(x, y)$ is $b^2 - 4ac = (\alpha_1\alpha_2' - \alpha_1'\alpha_2)^2/N(\mathfrak{A})^2 = \Delta$. We say that $F(x, y)$ is a quadratic form belonging to $\mathfrak{A}$. When $\Delta < 0$ the quadratic field $R(\sqrt{D})$ is imaginary so that $a > 0$ and $F(x, y)$ is positive definite. Also, it is not difficult to see that as $\alpha_1$, $\alpha_2$ run through the bases for $\mathfrak{A}$ satisfying (1) we obtain all the quadratic forms equivalent to $F$.

Theorem 13.1. Every indefinite or positive definite quadratic form $F(x, y) = ax^2 + bxy + cy^2$ with rational integer coefficients and discriminant $\Delta$ belongs to an ideal $\mathfrak{A}$ with basis $\alpha_1$, $\alpha_2$.

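Proof. We first show that $a$, $(b - \sqrt{\Delta})/2$ form a basis for the ideal $\mathfrak{M} = [a, (b - \sqrt{\Delta})/2]$. Observe that $(b - \sqrt{\Delta})/2$ satisfies the equation $x(b - x) = ac$ so that it is an integer. Also we have $\omega = (s(\omega) + \sqrt{\Delta})/2$, where $s(\omega) = 0$ or $1$, and
$$a\omega = \frac{s(\omega) + b}{2}\,a - a\,\frac{b - \sqrt{\Delta}}{2},$$
$$\frac{b - \sqrt{\Delta}}{2}\,\omega = \frac{b - \sqrt{\Delta}}{2} \cdot \frac{s(\omega) - b + b + \sqrt{\Delta}}{2} = \frac{b^2 - \Delta}{4a}\,a + \frac{s(\omega) - b}{2} \cdot \frac{b - \sqrt{\Delta}}{2},$$
where $(s(\omega) \pm b)/2$ and $(b^2 - \Delta)/4a$ are rational integers, so that $a$, $(b - \sqrt{\Delta})/2$ do indeed form a basis for $\mathfrak{M}$.

If $a > 0$ we take $\mathfrak{A} = \mathfrak{M}$, $\alpha_1 = a$, $\alpha_2 = (b - \sqrt{\Delta})/2$, and from $N(\mathfrak{M}) = a$ we have the quadratic form
$$\frac{\left(ax + \tfrac12(b - \sqrt{\Delta})y\right)\left(ax + \tfrac12(b + \sqrt{\Delta})y\right)}{a} = ax^2 + bxy + cy^2,$$
so that $\mathfrak{M}$ is the required ideal.

If $a < 0$, then, since the quadratic form is not negative definite, $\Delta > 0$ and we now take $\mathfrak{A} = \sqrt{\Delta}\,\mathfrak{M}$, $\alpha_1 = a\sqrt{\Delta}$ and $\alpha_2 = (b - \sqrt{\Delta})\sqrt{\Delta}/2$. It is easy to see that $\alpha_1$, $\alpha_2$ form a basis for $\mathfrak{A}$ satisfying (1). Also $N(\mathfrak{A}) = -a\Delta$ and we can now construct the quadratic form
$$\frac{-\Delta\left(ax + \tfrac12(b - \sqrt{\Delta})y\right)\left(ax + \tfrac12(b + \sqrt{\Delta})y\right)}{-a\Delta} = ax^2 + bxy + cy^2. \ □$$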
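The passage from a form to an ideal and back can be traced in a few lines. The sketch below is illustrative only and covers the case $a > 0$: elements $u + v\sqrt{D}$ are stored as pairs `(u, v)` of exact fractions, the norm of the ideal is passed in by the caller, and all function names are hypothetical.

```python
from fractions import Fraction


def norm(element, D):
    """Norm of u + v*sqrt(D), with the element given as the pair (u, v)."""
    u, v = element
    return u * u - D * v * v


def form_belonging_to(alpha1, alpha2, ideal_norm, D):
    """Coefficients (a, b, c) of F(x, y) = N(alpha1*x + alpha2*y) / N(ideal)."""
    total = (alpha1[0] + alpha2[0], alpha1[1] + alpha2[1])
    a = Fraction(norm(alpha1, D)) / ideal_norm
    c = Fraction(norm(alpha2, D)) / ideal_norm
    b = Fraction(norm(total, D) - norm(alpha1, D) - norm(alpha2, D)) / ideal_norm
    return a, b, c


def ideal_basis_for_form(a, b, D):
    """Basis a, (b - sqrt(Delta))/2 from the proof of Theorem 13.1, assuming a > 0."""
    k = 1 if D % 4 == 1 else 2  # sqrt(Delta) = k * sqrt(D)
    return (Fraction(a), Fraction(0)), (Fraction(b, 2), Fraction(-k, 2))


# Round trip for 2x^2 + 2xy + 3y^2 (Delta = -20, D = -5); the ideal has norm a = 2.
alpha1, alpha2 = ideal_basis_for_form(2, 2, -5)
recovered = form_belonging_to(alpha1, alpha2, 2, -5)
```

Here `recovered` holds the coefficients of the form we started from, as the proof predicts when $N(\mathfrak{M}) = a$.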

From the above we see that if $F$ belongs to $\mathfrak{A}$, then every quadratic form equivalent to $F$ also belongs to $\mathfrak{A}$. However, given a quadratic form $F$, there may be two different ideals $\mathfrak{A}$ and $\mathfrak{B}$ to which $F$ belongs. This then establishes a relationship between $\mathfrak{A}$ and $\mathfrak{B}$.

Definition 13.1. Let $\mathfrak{A}$ and $\mathfrak{B}$ be two ideals. Suppose that there are integers $\alpha$ and $\beta$ such that $[\alpha]\mathfrak{A} = [\beta]\mathfrak{B}$ and $N(\alpha\beta) > 0$. Then we say that $\mathfrak{A}$ and $\mathfrak{B}$ are equivalent in the narrower sense, and we write $\mathfrak{A} \approx \mathfrak{B}$.



It is clear that being equivalent in the narrower sense is a special case of being equivalent.

Theorem 13.2. Equivalent quadratic forms belong to ideals which are equivalent in the narrower sense. Conversely, quadratic forms belonging to ideals which are equivalent in the narrower sense are equivalent forms. □

Let $h_0$ denote the number of ideal classes (not in the narrower sense), and let $h$ denote the number of classes under the narrower sense of equivalence. Assume that the discriminant of the field concerned is $\Delta$. Then $h$ is the class number of quadratic forms with discriminant $\Delta$. If $\mathfrak{A} \sim \mathfrak{B}$ then either $\mathfrak{A} \approx \mathfrak{B}$ or $\mathfrak{A} \approx [\sqrt{\Delta}]\mathfrak{B}$, and we deduce that $h \le 2h_0$. In fact, if $\mathfrak{A} \sim \mathfrak{B}$, then there are integers $\alpha$, $\beta$ such that $[\alpha]\mathfrak{A} = [\beta]\mathfrak{B}$.
(i) If $\Delta < 0$, then $N(\alpha\beta) > 0$ so that $\mathfrak{A} \approx \mathfrak{B}$, and whence $h_0 = h$.
(ii) If $\Delta > 0$ and the fundamental unit $\eta$ satisfies $N(\eta) = -1$, then $[\alpha]\mathfrak{A} = [\beta]\mathfrak{B} = [\eta\beta]\mathfrak{B}$ and one of $N(\alpha\beta)$, $N(\alpha\beta\eta)$ must be positive, so that we still have $\mathfrak{A} \approx \mathfrak{B}$ and $h_0 = h$.
(iii) If $\Delta > 0$ and the fundamental unit $\eta$ satisfies $N(\eta) = 1$ then $\mathfrak{A}$ cannot be equivalent in the narrower sense to both $\mathfrak{B}$ and $\mathfrak{B}[\sqrt{\Delta}]$, so that $h_0 = h/2$.
Therefore we have
$$h_0 = \begin{cases} h, & \text{if } \Delta < 0 \text{ or } \Delta > 0,\ N(\eta) = -1;\\[1mm] \dfrac{h}{2}, & \text{if } \Delta > 0,\ N(\eta) = +1. \end{cases}$$
Also if we replace $d$ by $D$ in Theorem 11.4.4 and define $\varepsilon$ accordingly, then
$$\varepsilon = \begin{cases} \eta^2, & \text{if } \Delta > 0,\ N(\eta) = -1;\\ \eta, & \text{if } \Delta > 0,\ N(\eta) = +1. \end{cases}$$

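Again, from our results on the class number in Chapter 12 we have:

Theorem 13.3. Let $h_0$ denote the number of ideal classes. Then
$$h_0 = \frac{w}{2\left(2 - \left(\frac{\Delta}{2}\right)\right)} \sum_{s=1}^{[\frac12|\Delta|]} \left(\frac{\Delta}{s}\right), \quad \text{if } \Delta < 0,$$
$$\eta^{h_0} = \prod_{s=1}^{[\frac12\Delta]} \left(\sin\frac{\pi s}{\Delta}\right)^{-\left(\frac{\Delta}{s}\right)}, \quad \text{if } \Delta > 0. \ □$$

Example 1. In $R(i)$ we have $\Delta = -4$, $w = 4$ so that
$$h_0 = \frac{4}{2(2 - 0)} \sum_{s=1}^{2} \left(\frac{-4}{s}\right) = 1.$$

Example 2. In $R(\sqrt{-3})$ we have $\Delta = -3$, $w = 6$ so that
$$h_0 = \frac{6}{2(2 - (-1))} \sum_{s=1}^{1} \left(\frac{-3}{s}\right) = 1.$$

Example 3. In $R(\sqrt{-5})$ we have $\Delta = -20$, $w = 2$ so that
$$h_0 = \frac{2}{2(2 - 0)} \sum_{s=1}^{10} \left(\frac{-20}{s}\right) = 2.$$

Example 4. In $R(\sqrt{-19})$ we have $\Delta = -19$, $w = 2$ so that
$$h_0 = \frac{2}{2(2 - (-1))} \sum_{s=1}^{9} \left(\frac{-19}{s}\right) = 1.$$

Example 5. In $R(\sqrt{2})$ we have $\Delta = 8$, $\varepsilon = 3 + 2\sqrt{2}$. Since $\eta = 1 + \sqrt{2}$ has norm $-1$ and $\eta^2 = \varepsilon$, $\eta$ is a fundamental unit. Also
$$(1 + \sqrt{2})^{h_0} = \prod_{s=1}^{4} \left(\sin\frac{\pi s}{8}\right)^{-\left(\frac{8}{s}\right)} = \frac{\sin\frac{3\pi}{8}}{\sin\frac{\pi}{8}} = 1 + \sqrt{2},$$
so that $h_0 = 1$.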
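The four imaginary examples can be reproduced mechanically. What follows is a sketch, not a definitive implementation: it assumes the finite-sum class number formula for $\Delta < 0$ in the form $h_0 = \frac{w}{2(2 - (\Delta/2))}\sum_{1 \le s \le |\Delta|/2} (\Delta/s)$, and it evaluates the Kronecker symbol with the usual binary Jacobi-symbol algorithm. Both function names are hypothetical.

```python
def kronecker(a, n):
    """Kronecker symbol (a/n) for an integer a and a natural number n >= 1."""
    if n <= 0:
        raise ValueError("n must be a natural number")
    result = 1
    # Factors of 2 in n: (a/2) is 0 for even a, +1 for a = 1, 7 (mod 8), -1 otherwise.
    while n % 2 == 0:
        n //= 2
        if a % 2 == 0:
            return 0
        if a % 8 in (3, 5):
            result = -result
    # n is now odd: ordinary Jacobi symbol via quadratic reciprocity.
    a %= n
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def class_number_imaginary(delta):
    """Class number h0 of the imaginary quadratic field with discriminant delta < 0."""
    if delta >= 0:
        raise ValueError("this sketch only treats delta < 0")
    w = {-3: 6, -4: 4}.get(delta, 2)  # number of units in the field
    total = sum(kronecker(delta, s) for s in range(1, abs(delta) // 2 + 1))
    numerator = w * total
    denominator = 2 * (2 - kronecker(delta, 2))
    if numerator % denominator != 0:
        raise ArithmeticError("delta does not appear to be a field discriminant")
    return numerator // denominator
```

Calling `class_number_imaginary` with $-4$, $-3$, $-20$ and $-19$ corresponds to Examples 1 to 4.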

16.14 Genus

Let $R(\sqrt{D})$ be a fixed quadratic field with discriminant $\Delta$, and we shall assume in this section that the ideal classes are derived from the equivalence relation on ideals being equivalent in the narrower sense.

Definition 14.1. If a quadratic form $F(x, y)$ belongs to an ideal $\mathfrak{A}$ then we call the character system for $F(x, y)$ (see Definition 12.6.1) the character system for $\mathfrak{A}$. That is, if $p_1, \ldots, p_s$ are the odd prime divisors of $\Delta$, we take an integer $\alpha$ in $\mathfrak{A}$ so that $(N(\alpha)/N(\mathfrak{A}), 2\Delta) = 1$ and we call
$$\left(\frac{N(\alpha)/N(\mathfrak{A})}{p_i}\right) \quad (i = 1, \ldots, s)$$
and
$$\begin{aligned}
&\delta(\alpha) = (-1)^{\frac12\left[\frac{N(\alpha)}{N(\mathfrak{A})} - 1\right]}, && \text{if } \frac{\Delta}{4} = D \equiv 3 \pmod 4;\\
&\varepsilon(\alpha) = (-1)^{\frac18\left[\left(\frac{N(\alpha)}{N(\mathfrak{A})}\right)^2 - 1\right]}, && \text{if } \frac{\Delta}{4} \equiv 2 \pmod 8;\\
&\delta(\alpha)\,\varepsilon(\alpha), && \text{if } \frac{\Delta}{4} \equiv 6 \pmod 8
\end{aligned}$$
the character system for $\mathfrak{A}$. Since ideals belonging to the same class have the same character system we may speak of the character system for an ideal class.

Definition 14.2. Two ideal classes with the same character system are said to belong to the same genus.

There is now a one-to-one correspondence between ideal classes in the quadratic field $R(\sqrt{D})$ and classes of primitive forms having discriminant $\Delta$.



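Theorem 14.1. The values of the character system for $\mathfrak{A}\mathfrak{B}$ correspond to the products of the values of the character systems for $\mathfrak{A}$, $\mathfrak{B}$.

Proof. If $\alpha$, $\beta$ belong to $\mathfrak{A}$, $\mathfrak{B}$ respectively, then $\alpha\beta$ belongs to $\mathfrak{A}\mathfrak{B}$. Also
$$\frac{N(\alpha)}{N(\mathfrak{A})} \cdot \frac{N(\beta)}{N(\mathfrak{B})} = \frac{N(\alpha\beta)}{N(\mathfrak{A}\mathfrak{B})},$$
and if
$$\left(\frac{N(\alpha)}{N(\mathfrak{A})},\ 2\Delta\right) = 1, \qquad \left(\frac{N(\beta)}{N(\mathfrak{B})},\ 2\Delta\right) = 1,$$
then
$$\left(\frac{N(\alpha\beta)}{N(\mathfrak{A}\mathfrak{B})},\ 2\Delta\right) = 1. \ □$$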

From this theorem we deduce at once:
1) The character system for the product of two classes is the product of the two character systems.
2) If $\{\mathfrak{A}\}$ and $\{\mathfrak{B}\}$ belong to a genus, and $\{\mathfrak{A}_1\}$, $\{\mathfrak{B}_1\}$ belong to a genus, then $\{\mathfrak{A}\mathfrak{A}_1\}$ and $\{\mathfrak{B}\mathfrak{B}_1\}$ also belong to a genus.

Definition 14.3. We call the class to which the unit ideal $\mathfrak{O}$ belongs the principal class, and the genus to which the principal class belongs the principal genus. Also, if $\mathfrak{A}\mathfrak{B} = [a]$ where $a$ is a natural number, then we call $\{\mathfrak{B}\}$ the inverse of the class $\{\mathfrak{A}\}$.

From Theorem 7.1 we see that the inverse of any ideal class always exists. Also $\{\mathfrak{O}\}\{\mathfrak{A}\} = \{\mathfrak{A}\}$. Since the values of the character system for the principal class, as well as for all the classes in the principal genus, are all 1, it follows that the product of any two classes in the principal genus, and the inverse of any class in the principal genus, are classes in the principal genus. (The family of all ideal classes forms a group with respect to class multiplication, and the sub-family of ideal classes in the principal genus forms a sub-group.)

Theorem 14.2. Every genus has the same number of classes.

Proof. We let $\mathfrak{G}$ be the principal genus, and we let $\mathfrak{G}\{\mathfrak{A}\}$ denote the family of classes obtained from the product of classes in $\mathfrak{G}$ with $\{\mathfrak{A}\}$. We put all the ideal classes into various families
$$\mathfrak{G},\ \mathfrak{G}\{\mathfrak{A}_2\},\ \mathfrak{G}\{\mathfrak{A}_3\},\ \ldots, \qquad (1)$$
where $\{\mathfrak{A}_i\}$ is any class not belonging to $\mathfrak{G}, \mathfrak{G}\{\mathfrak{A}_2\}, \ldots, \mathfrak{G}\{\mathfrak{A}_{i-1}\}$. It is easy to see that there is no ideal class which belongs to two of the families in (1).

From Theorem 14.1 we know that in each family in (1) all the classes belong to the same genus, and distinct families belong to different genera, so that each family in (1) forms a genus. Since any two classes in $\mathfrak{G}\{\mathfrak{A}_i\}$ are distinct the theorem is proved. □

16.15 Euclidean Fields and Simple Fields

Definition 15.1. If $h_0 = 1$, then we call the field a simple field.

It is clear that, in a simple field, every ideal is a principal ideal. Therefore we have:

Theorem 15.1. The unique factorization theorem holds for integers in a simple field. □

There is a type of simple field, called Euclidean fields, having properties which are very similar to those of the rational field.

Definition 15.2. If, corresponding to any two integers $\xi$, $\eta$ ($\eta \neq 0$) in $R(\sqrt{D})$, there exist two integers $\kappa$, $\lambda$ such that
$$\xi = \kappa\eta + \lambda, \qquad |N(\lambda)| < |N(\eta)|, \qquad (1)$$
then we call $R(\sqrt{D})$ a Euclidean field.

An alternative definition is:

Definition 15.3. If, corresponding to any $\delta$ in $R(\sqrt{D})$, there exists an integer $\kappa$ such that
$$|N(\delta - \kappa)| < 1, \qquad (2)$$
then we call $R(\sqrt{D})$ a Euclidean field.

Theorem 15.2. Every Euclidean field is a simple field.

Proof. Let $R(\sqrt{D})$ be Euclidean. In order to prove that $R(\sqrt{D})$ is simple it suffices to show that every ideal is a principal ideal. Let $\mathfrak{A}$ be any ideal in $R(\sqrt{D})$ and let $\alpha_1$, $\alpha_2$ be a basis for $\mathfrak{A}$, and we may assume without loss that $0 < |N(\alpha_1)| \le |N(\alpha_2)|$. Since $R(\sqrt{D})$ is Euclidean there are integers $\alpha_2'$ and $\beta_2$ such that $\alpha_2 = \alpha_2'\alpha_1 + \beta_2$, $|N(\beta_2)| < |N(\alpha_1)|$. If $\beta_2 \neq 0$, then there are $\alpha_1'$ and $\beta_1$ such that $\alpha_1 = \alpha_1'\beta_2 + \beta_1$, $|N(\beta_1)| < |N(\beta_2)|$. Continuing with the argument, which must terminate after a finite number of steps because $|N(\alpha_1)|$ is a natural number, we arrive at an integer $\alpha$ such that $\mathfrak{A} = [\alpha_1, \alpha_2] = [\alpha]$. The theorem is proved. □
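The descent in this proof is the Euclidean algorithm, and for $R(\sqrt{-1})$ it can be carried out explicitly. The following sketch assumes Gaussian integers stored as pairs `(a, b)` for $a + bi$ and chooses the quotient by rounding each coordinate to a nearest integer, which gives a remainder of norm at most half that of the divisor. The function names are hypothetical.

```python
def gauss_norm(x):
    """Norm a^2 + b^2 of the Gaussian integer x = (a, b)."""
    return x[0] * x[0] + x[1] * x[1]


def gauss_divmod(x, y):
    """Return (q, r) with x = q*y + r and N(r) < N(y), for y != 0."""
    (a, b), (c, d) = x, y
    n = c * c + d * d
    # x / y = x * conj(y) / N(y) = ((ac + bd) + (bc - ad) i) / n
    re, im = a * c + b * d, b * c - a * d
    # Round each coordinate to a nearest integer, in exact integer arithmetic.
    q = ((2 * re + n) // (2 * n), (2 * im + n) // (2 * n))
    r = (a - (q[0] * c - q[1] * d), b - (q[0] * d + q[1] * c))
    return q, r


def gauss_gcd(x, y):
    """A generator alpha of the ideal [x, y], found by repeated division."""
    while y != (0, 0):
        _, r = gauss_divmod(x, y)
        x, y = y, r
    return x
```

For example, `gauss_gcd((5, 0), (1, 2))` returns an associate of $1 + 2i$, so that $[5, 1 + 2i] = [1 + 2i]$.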



Theorem 15.3. There are only five quadratic imaginary Euclidean fields, namely $R(\sqrt{-1})$, $R(\sqrt{-2})$, $R(\sqrt{-3})$, $R(\sqrt{-7})$ and $R(\sqrt{-11})$.

Proof. 1) Let $D \equiv 2, 3 \pmod 4$. Put $\delta = r + s\sqrt{D}$, $\kappa = x + y\sqrt{D}$. Then the condition (2) becomes: corresponding to any pair of rational numbers $r$, $s$ there are rational integers $x$, $y$ such that
$$|(r - x)^2 - D(s - y)^2| < 1. \qquad (3)$$
Setting $r = s = \frac12$ the condition (3) gives $\frac14 + |D|\frac14 < 1$, or $|D| < 3$. Therefore $R(\sqrt{D})$ cannot be Euclidean if $D \le -3$. On the other hand, if $r$, $s$ are given rational numbers we can always find rational integers $x$, $y$ such that $|r - x| \le \frac12$, $|s - y| \le \frac12$ so that corresponding to $D = -1, -2$, the inequalities $|(r - x)^2 - D(s - y)^2| \le \frac14 + |D|\frac14 < 1$ hold so that $R(\sqrt{-1})$ and $R(\sqrt{-2})$ are Euclidean.

2) Let $D \equiv 1 \pmod 4$. Put $\delta = r + s\sqrt{D}$, $\kappa = x + y(1 + \sqrt{D})/2$ so that the condition (2) becomes
$$\left|\left(r - x - \frac{y}{2}\right)^2 - D\left(s - \frac{y}{2}\right)^2\right| < 1.$$
Setting $r = s = \frac14$ we have $\frac1{16} + \frac1{16}|D| < 1$ or $|D| < 15$. Therefore there can only be the three Euclidean fields $R(\sqrt{-3})$, $R(\sqrt{-7})$ and $R(\sqrt{-11})$, and these fields are indeed Euclidean because, given rational numbers $r$, $s$ we may choose rational integers $x$, $y$ such that $|2s - y| \le \frac12$, $|r - x - (y/2)| \le \frac12$, and therefore when $D = -3, -7, -11$,
$$\left|\left(r - x - \frac{y}{2}\right)^2 - D\left(s - \frac{y}{2}\right)^2\right| \le \frac14 + \frac{|D|}{16} \le \frac{15}{16} < 1. \ □$$

In §13 we calculated the class number for $R(\sqrt{-19})$ to be 1. We see therefore that there are simple fields which are not Euclidean. From Theorem 12.15.4 we know that there are only finitely many imaginary fields which are simple. The question then is exactly how many? It is not difficult to prove that $R(\sqrt{D})$ is simple when $D = -1, -2, -3, -7, -11, -19, -43, -67, -163$.
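The nine values of $D$ can be cross-checked by counting reduced forms, using the correspondence of §13 between ideal classes and classes of forms (for $\Delta < 0$ we have $h_0 = h$). This is a minimal sketch: the function name is hypothetical, and the reduction conditions $-a < b \le a \le c$, with $b \ge 0$ when $a = c$, are the standard ones for positive definite forms and are assumed here rather than taken from this section.

```python
from math import gcd


def class_number_by_reduced_forms(delta):
    """Count reduced primitive forms ax^2 + bxy + cy^2 with b^2 - 4ac = delta < 0."""
    if delta >= 0 or delta % 4 not in (0, 1):
        raise ValueError("delta must be a negative discriminant")
    count = 0
    a = 1
    # A reduced form has b^2 <= a^2 <= ac, hence 3a^2 <= 4ac - b^2 = |delta|.
    while 3 * a * a <= -delta:
        for b in range(-a + 1, a + 1):
            if (b * b - delta) % (4 * a) != 0:
                continue
            c = (b * b - delta) // (4 * a)
            if c < a:
                continue
            if a == c and b < 0:
                continue
            if gcd(gcd(a, b), c) != 1:
                continue
            count += 1
        a += 1
    return count


def discriminant(D):
    """Field discriminant for a square-free D."""
    return D if D % 4 == 1 else 4 * D


SIMPLE_D = [-1, -2, -3, -7, -11, -19, -43, -67, -163]
class_numbers = [class_number_by_reduced_forms(discriminant(D)) for D in SIMPLE_D]
```

Every entry of `class_numbers` is 1, whereas `class_number_by_reduced_forms(-20)` gives 2, in agreement with Example 3 of §13.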

It has also been proved that there is at most one more value of $D$, and that if it exists, then $D < -5 \cdot 10^9$. (In fact no extra $D$ exists; see Notes.) Concerning real Euclidean fields we have:

Theorem 15.4. The field $R(\sqrt{D})$ is a real Euclidean field only when $D = 2, 3, 5, 6, 7, 11, 13, 17, 19, 21, 29, 33, 37, 41, 57, 73$. □

Various Chinese mathematicians, including the author, made contributions to this problem, which in principle was eventually settled by Davenport. The proof of the theorem is beyond the scope of this book.


16.16 Lucas's Criterion for the Determination of Mersenne Primes

We first sharpen Theorem 9.5 for the quadratic field $R(\sqrt{D})$, $D > 0$. From Theorem 10.3 we know that all the prime ideals can be separated into three classes according to whether $\left(\frac{\Delta}{p}\right) = 0$, $+1$ or $-1$. We shall write $q$ for a prime number satisfying $\left(\frac{\Delta}{q}\right) = 1$ so that $q = \mathfrak{Q}\mathfrak{Q}'$; we write $r$ for a prime number satisfying $\left(\frac{\Delta}{r}\right) = -1$ so that $r$ itself is a prime ideal in $R(\sqrt{D})$. From Theorem 9.5 we have, if $\mathfrak{Q} \nmid \alpha$, then
$$\alpha^{q-1} \equiv 1 \pmod{\mathfrak{Q}}, \qquad (1)$$
and if $r \nmid \alpha$ then
$$\alpha^{r^2-1} \equiv 1 \pmod r. \qquad (2)$$

Theorem 16.1. Suppose that $q$, $r$ are not 2. If $q \nmid \alpha$, then
$$\alpha^{q-1} \equiv 1 \pmod q, \qquad (3)$$
and if $r \nmid \alpha$, then
$$\alpha^{r+1} \equiv N(\alpha) \pmod r. \qquad (4)$$

Observe that (1) and (3) are equivalent, and that (2) follows from (4).

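Proof. Let $\alpha = a + b(\Delta + \sqrt{\Delta})/2$ where $a$, $b$ are rational integers. Let $p$ be an odd prime so that, from Fermat's theorem,
$$\alpha^p \equiv a^p + b^p\,\frac{\Delta^p + (\sqrt{\Delta})^p}{2^p} \equiv a + \frac{b}{2}\left(\Delta + \Delta^{\frac{p-1}{2}}\sqrt{\Delta}\right) \equiv a + \frac{b}{2}\left(\Delta + \left(\frac{\Delta}{p}\right)\sqrt{\Delta}\right) \pmod p.$$
Therefore if $p = q$, then $\alpha^q \equiv \alpha \pmod q$ which gives (3), and if $p = r$, then $\alpha^r \equiv \alpha' \pmod r$, where $\alpha'$ is the conjugate of $\alpha$, which gives (4). □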
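Both congruences of Theorem 16.1 can be tested by direct computation. The sketch below is an illustration under stated assumptions: it takes $D \equiv 2, 3 \pmod 4$, so that the integers are $a + b\sqrt{D}$ and $\left(\frac{\Delta}{p}\right) = \left(\frac{D}{p}\right)$ for an odd prime $p$, stores $a + b\sqrt{D}$ as the pair `(a, b)`, and uses hypothetical function names.

```python
def quad_mul_mod(x, y, D, m):
    """Multiply a + b*sqrt(D) by c + d*sqrt(D), reducing coefficients mod m."""
    (a, b), (c, d) = x, y
    return ((a * c + D * b * d) % m, (a * d + b * c) % m)


def quad_pow_mod(x, e, D, m):
    """Compute x**e in Z[sqrt(D)] modulo m by repeated squaring."""
    result = (1 % m, 0)
    base = (x[0] % m, x[1] % m)
    while e > 0:
        if e & 1:
            result = quad_mul_mod(result, base, D, m)
        base = quad_mul_mod(base, base, D, m)
        e >>= 1
    return result


def check_theorem_16_1(D, p):
    """Check (3) or (4) for every residue a + b*sqrt(D) modulo an odd prime p not dividing D."""
    legendre = pow(D % p, (p - 1) // 2, p)
    for a in range(p):
        for b in range(p):
            n = (a * a - D * b * b) % p  # N(alpha) mod p
            if legendre == 1:
                # p is of type q: alpha^(q-1) = 1 whenever alpha is coprime to q
                if n != 0 and quad_pow_mod((a, b), p - 1, D, p) != (1, 0):
                    return False
            elif legendre == p - 1:
                # p is of type r: alpha^(r+1) = N(alpha)
                if quad_pow_mod((a, b), p + 1, D, p) != (n, 0):
                    return False
            else:
                raise ValueError("p divides D")
    return True
```

With $D = 2$, the primes 3 and 5 are of type $r$ and the primes 7 and 17 are of type $q$; for instance $(1 + \sqrt{2})^4 = 17 + 12\sqrt{2} \equiv -1 = N(1 + \sqrt{2}) \pmod 3$.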

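Now let $p$ be an odd prime and we shall examine the nature of the Mersenne number $M = M_p = 2^p - 1$. If there exists $\Delta > 0$ such that
$$\left(\frac{\Delta}{M}\right) = -1 \qquad (5)$$
and there exists a unit $\varepsilon$ in $R(\sqrt{\Delta})$ satisfying $N(\varepsilon) = -1$, then we let
$$r_m = \varepsilon^{2^m} + \varepsilon'^{\,2^m},$$
where $\varepsilon'$ is the conjugate of $\varepsilon$.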


Theorem 16.2. A necessary and sufficient condition for $M$ to be prime is that
$$r_{p-1} \equiv 0 \pmod M. \qquad (6)$$

Proof. 1) Assume that $M$ is a prime. From (5) we know that $M$ is of the type $r$, and so from Theorem 16.1 we have $\varepsilon^{M+1} \equiv -1 \pmod M$ and therefore
$$r_{p-1} = \varepsilon^{2^{p-1}} + \varepsilon'^{\,2^{p-1}} = \varepsilon'^{\,2^{p-1}}\left(\varepsilon^{2^p} + 1\right) = \varepsilon'^{\,2^{p-1}}\left(\varepsilon^{M+1} + 1\right) \equiv 0 \pmod M.$$

2) Assume that $M$ is composite, say $M = q_1 \cdots q_s r_1 \cdots r_t$. From (5) we know that at least one of the prime divisors of $M$ is of type $r$. If (6) holds, then $M \mid r_{p-1}$ or
$$\varepsilon^{2^{p-1}} + \varepsilon'^{\,2^{p-1}} \equiv 0 \pmod M,$$
and hence
$$\varepsilon^{2^p} \equiv -1 \pmod M, \qquad (7)$$
and on squaring
$$\varepsilon^{2^{p+1}} \equiv 1 \pmod M. \qquad (8)$$
Let $\mathfrak{P}$ be a prime ideal divisor of $M$ and let $l$ be the least positive integer satisfying $\varepsilon^l \equiv 1 \pmod{\mathfrak{P}}$. Then, by (8), $l \mid 2^{p+1}$, and so by (7), $l = 2^{p+1}$. If $\mathfrak{P}$ is a divisor of a certain $q$, then $\varepsilon^{q-1} \equiv 1 \pmod{\mathfrak{P}}$ by Theorem 16.1, and hence $2^{p+1} \mid q - 1$, which is impossible because $q$ cannot exceed $M$. If $\mathfrak{P}$ is a certain $r$, then $\varepsilon^{r+1} \equiv -1 \pmod r$ by Theorem 16.1. This then gives $2^{p+1} \mid 2(r + 1)$ and so $r = 2^p m - 1$. But $r \le M$ so that $m = 1$, $r = M$. That is $M$ must be prime after all. □

Example. Take $\Delta = 5$, $\varepsilon = (1 + \sqrt{5})/2$ so that
$$r_1 = 3, \qquad r_{m+1} = r_m^2 - 2.$$
If we take $p = 7$, $M_p = 127$, then the residues mod 127 for $r_m$ ($m = 1, 2, 3, 4, 5, 6$) are $3, 7, 47, 48, 16, 0$. Therefore 127 is a prime.

Of course the full power of the theorem is not revealed in this specific example. However, with the aid of electronic computers, the same method can be used to show, for example, that the 687 digit number $M_{2281} = 2^{2281} - 1$ is prime. Indeed all the large known Mersenne primes are found by essentially the same type of method.
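The example translates directly into a short program. This is a sketch for the choice $\Delta = 5$ only, with $r_1 = 3$ and $r_{m+1} = r_m^2 - 2$; it assumes $p \equiv 3 \pmod 4$, which is what makes $M_p \equiv 2 \pmod 5$ and hence $\left(\frac{5}{M_p}\right) = -1$ as condition (5) requires. The function names are hypothetical.

```python
def lucas_residues(p):
    """Residues of r_1, ..., r_{p-1} modulo M = 2**p - 1, where r_1 = 3, r_{m+1} = r_m**2 - 2."""
    M = 2 ** p - 1
    r = 3 % M
    residues = [r]
    for _ in range(p - 2):
        r = (r * r - 2) % M
        residues.append(r)
    return residues


def mersenne_is_prime(p):
    """Apply the criterion r_{p-1} == 0 (mod M_p) for an odd prime p with p % 4 == 3."""
    if p % 4 != 3:
        raise ValueError("with Delta = 5 this sketch needs p % 4 == 3 so that (5/M) = -1")
    return lucas_residues(p)[-1] == 0
```

For $p = 7$ the list returned by `lucas_residues(7)` is the sequence of residues quoted in the example, and `mersenne_is_prime(11)` is false because $2^{11} - 1 = 23 \cdot 89$.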

16.17 Indeterminate Equations

The invention of the theory of ideals to tackle Fermat's problem is an important development in algebraic number theory. From the standpoint of mathematics this theory is far more important than that of settling a difficult problem. Let $p$ be an odd prime and $\rho = e^{2\pi i/p}$. If we can prove that
$$\xi^p + \eta^p = \zeta^p$$
has no integer solutions in the field $R(\rho)$, then obviously Fermat's Last Theorem is established. The expression $\xi^p + \eta^p$ can be factorized into linear terms in $R(\rho)$ so that the problem is easier to start with. Indeed this is Kummer's starting point in his research on Fermat's problem, but the principal difficulty lies with the absence of a unique factorization theorem. It is for this reason that Kummer invented his theory of ideals which has now become an indispensable part of mathematics.

It is not easy to understand Kummer's method. That is, even if we assume that there is unique factorization in $R(\rho)$, we still need a deep theorem of Kummer's before we can settle Fermat's problem. The theorem concerned is as follows: A necessary and sufficient condition for a unit $\varepsilon$ in $R(\rho)$ to be a $p$-th power of another unit is that $\varepsilon$ is congruent to a rational number mod $(1 - \rho)^p$.

We can only consider two simple examples in this book.

Theorem 17.1. The equation
$$\xi^4 + \eta^4 = \tau^2 \qquad (1)$$

has no solution in integers in $R(\sqrt{-1})$.

Proof. The unique factorization theorem holds in the field $R(\sqrt{-1})$, that is every ideal is a principal ideal. We may therefore assume without loss that $(\xi, \eta) = 1$.

1) Let $\lambda = 1 - i$. Then $\lambda$ is irreducible, and $\lambda^2 = -2i$ and $2 = i(1 - i)^2$ are associates. Also $N(2) = 4$ so that every integer in $R(\sqrt{-1})$ must be congruent to one of the four numbers $0, 1, i, 1 - i \pmod 2$. Since $0$, $1 - i$ are divisible by $\lambda$, any integer $\alpha$ not divisible by $\lambda$ must satisfy $\alpha \equiv 1$ or $i \pmod{\lambda^2}$ so that $\alpha = 1 + \beta\lambda^2$ or $\alpha = i + \beta\lambda^2$, and hence
$$\alpha^4 \equiv 1 \pmod{\lambda^6}. \qquad (2)$$
Now let $\xi$, $\eta$, $\tau$ satisfy (1). Suppose, if possible, that $\xi$, $\eta$ are not divisible by $\lambda$. From (2) and (1) we have $2 \equiv \tau^2 \pmod{\lambda^6}$. Since $2 = \lambda^2 i$ we see that $\lambda \mid \tau$. Write $\tau = \lambda\gamma$ so that $\lambda \nmid \gamma$, and $i\lambda^2 \equiv \lambda^2\gamma^2 \pmod{\lambda^6}$ or $\gamma^2 \equiv i \pmod{\lambda^4}$. On squaring this we deduce from (2) that $1 \equiv \gamma^4 \equiv -1 \pmod{\lambda^4}$, which is impossible. Therefore one of $\xi$, $\eta$ is divisible by $\lambda$. By symmetry we may assume that $\lambda \mid \xi$, and we now write $\xi = \lambda^n\delta$, $n \ge 1$, $\lambda \nmid \delta$, so that we have
$$\lambda^{4n}\delta^4 = \tau^2 - \eta^4.$$

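2) We now prove a more general result, namely that there are no integers $\delta$, $\tau$, $\eta$ in $R(\sqrt{-1})$ such that
$$\varepsilon\lambda^{4n}\delta^4 = \tau^2 - \eta^4, \quad \varepsilon \text{ a unit}, \quad \lambda \nmid \delta\eta, \quad (\delta, \eta) = 1, \quad n \ge 1. \qquad (3)$$
The proof is divided into two steps. In the first step we show that if (3) is soluble then $n$ must be at least 2; in the second step we show that if (3) is soluble for a certain $n$, then it is soluble for $n - 1$ also. The theorem therefore follows from this contradiction.

If (3) holds for integers $\delta$, $\tau$, $\eta$ then $\lambda \nmid \tau$. Since $N(\lambda) = 2$ we see that $\tau \equiv 1 \pmod{\lambda}$. Let $\tau = 1 + \mu\lambda$ so that on squaring we have
$$\tau^2 = 1 + 2\mu\lambda + \mu^2\lambda^2.$$
Also, by (2),
$$\eta^4 \equiv 1 \pmod{\lambda^6} \qquad (4)$$
so that, by (3),
$$2\mu\lambda + \mu^2\lambda^2 = \tau^2 - 1 \equiv 0 \pmod{\lambda^4}.$$
Thus $\lambda \mid \mu$ and we may write
$$\tau = 1 + \nu\lambda^2, \qquad \tau^2 = 1 + 2\nu\lambda^2 + \nu^2\lambda^4 = 1 + \lambda^4\nu(i + \nu). \qquad (5)$$
Since $\nu$, $i + \nu$ form a complete residue system mod $\lambda$ we have $\nu(i + \nu) \equiv 0 \pmod{\lambda}$ giving $\tau^2 \equiv 1 \pmod{\lambda^5}$. From (3) and (4) we deduce that $\varepsilon\lambda^{4n}\delta^4 \equiv \tau^2 - \eta^4 \equiv 0 \pmod{\lambda^5}$, and we conclude that $n \ge 2$.

Now assume that $\delta$, $\tau$, $\eta$ satisfy (3) with $n \ge 2$. Then $\varepsilon\lambda^{4n}\delta^4 = (\tau - \eta^2)(\tau + \eta^2)$. From (5) we have $\tau \equiv 1 \pmod{\lambda^2}$, and on the other hand, since $\lambda \nmid \eta$ we have
$$\eta^2 \equiv 1 \pmod{\lambda^2}, \qquad (6)$$
so that both divisors are multiples of $\lambda^2$ and
$$\varepsilon\lambda^{4(n-1)}\delta^4 = \frac{\tau - \eta^2}{\lambda^2} \cdot \frac{\tau + \eta^2}{\lambda^2}. \qquad (7)$$
Since the two divisors on the right are coprime (any common divisor divides their sum $i\tau$ and their difference $i\eta^2$, and $(\tau, \eta) = 1$),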

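it follows from (7) that $\lambda^{4(n-1)}$ must divide one of these two divisors. We may assume that $\lambda^{4(n-1)}$ actually divides the latter divisor, since otherwise we may replace $\eta$ by $i\eta$. From (7) we have
$$\frac{\tau - \eta^2}{\lambda^2} = \varepsilon_1\sigma^4, \qquad \frac{\tau + \eta^2}{\lambda^2} = \varepsilon_2\lambda^{4(n-1)}\varphi^4 \qquad (\lambda \nmid \varphi\sigma,\ (\sigma, \varphi) = 1),$$
where $\varepsilon_1$, $\varepsilon_2$ are two units. Thus
$$\frac{2\eta^2}{\lambda^2} = i\eta^2 = \varepsilon_2\lambda^{4(n-1)}\varphi^4 - \varepsilon_1\sigma^4,$$
or
$$\eta^2 = \varepsilon_3\sigma^4 + \varepsilon_4\lambda^{4(n-1)}\varphi^4,$$
where $\varepsilon_3 = -\varepsilon_1/i$, $\varepsilon_4 = \varepsilon_2/i$ are also units. Since $n \ge 2$, $\lambda \nmid \sigma$ we see from (2) that $\eta^2 \equiv \varepsilon_3 \pmod{\lambda^4}$ and hence, by (6), $1 \equiv \varepsilon_3 \pmod{\lambda^2}$. Therefore $\varepsilon_3$ is either $+1$ or $-1$ and not $\pm i$, that is
$$\varepsilon_4\lambda^{4(n-1)}\varphi^4 = \eta^2 \pm \sigma^4, \qquad \lambda \nmid \varphi\sigma, \qquad (\varphi, \sigma) = 1.$$
If we take the negative sign here then our second step follows at once, and if we take the positive sign then the same result is obtained by replacing $\eta$ by $i\eta$. □

Theorem 17.2. The equation
$$\xi^3 + \eta^3 + \zeta^3 = 0 \qquad (8)$$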

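has no solution in integers in $R(\rho)$, $\rho = (-1 + \sqrt{-3})/2$.

Proof. Since $R(\rho)$ is a simple field we may assume that $(\xi, \eta) = 1$.

1) Let $\lambda = 1 - \rho$, so that $1 - \rho^2 = -\rho^2(1 - \rho) = -\rho^2\lambda$ and $N(\lambda) = -\rho^2\lambda^2 = 3$. Therefore $\lambda$ is irreducible and all the integers are partitioned into three classes represented by $0, 1, -1$. Therefore, if $\lambda \nmid \xi$, then $\xi \equiv \pm 1 \pmod{\lambda}$. We shall now show that
$$\xi^3 \equiv \pm 1 \pmod{\lambda^4}. \qquad (9)$$
We need only consider the $+$ sign case, since otherwise we may replace $\xi$ by $-\xi$. Let $\xi = 1 + \beta\lambda$ so that
$$\begin{aligned}
\xi^3 - 1 &= (\xi - 1)(\xi - \rho)(\xi - \rho^2)\\
&= \beta\lambda(\beta\lambda + 1 - \rho)(\beta\lambda + 1 - \rho^2)\\
&= \beta\lambda(\beta\lambda + \lambda)(\beta\lambda - \rho^2\lambda)\\
&= \lambda^3\beta(\beta + 1)(\beta - \rho^2).
\end{aligned}$$
Since $\beta$, $\beta + 1$, $\beta - \rho^2$ are incongruent mod $\lambda$, and $N(\lambda) = 3$ there must be one of them which is divisible by $\lambda$. We deduce that if $\lambda \nmid \eta$, then
$$\eta^3 \equiv \pm 1 \pmod{\lambda^4}. \qquad (10)$$
Now if $\lambda \nmid \xi\eta\zeta$, then $0 \equiv \xi^3 + \eta^3 + \zeta^3 \equiv \pm 1 \pm 1 \pm 1 \pmod{\lambda^3}$. The possible choices are $\pm 1$, $\pm 3$ and none of them is divisible by $\lambda^3$, so that one of $\xi$, $\eta$, $\zeta$ must be divisible by $\lambda$. Let it be $\zeta = \lambda^n\gamma$, $n \ge 1$, $\lambda \nmid \gamma$ so that
$$\xi^3 + \eta^3 + \lambda^{3n}\gamma^3 = 0, \qquad (\xi, \eta) = 1, \qquad \lambda \nmid \gamma, \qquad n \ge 1.$$

2) We shall now prove a more general result, namely that
$$\xi^3 + \eta^3 + \varepsilon\lambda^{3n}\gamma^3 = 0, \qquad (\xi, \eta) = 1, \qquad \lambda \nmid \gamma, \qquad n \ge 1, \qquad (11)$$
where $\varepsilon$ is a unit, has no integer solutions in $R(\rho)$. As in the proof of Theorem 17.1 we separate into two steps where, in the first step, we show that if (11) has a solution, then $n \ge 2$, and in the second step we show that if (11) has a solution, then $n$ may be replaced by $n - 1$ and there is still a solution. The theorem then follows by this contradiction.

If (11) has a solution, then by (10)
$$-\varepsilon\lambda^{3n}\gamma^3 = \xi^3 + \eta^3 \equiv \pm 1 \pm 1 \pmod{\lambda^4}.$$
Since $+1 + 1$ and $-1 - 1$ are not divisible by $\lambda$ we see that $-\varepsilon\lambda^{3n}\gamma^3 \equiv 0 \pmod{\lambda^4}$, and so $n \ge 2$.

Suppose that $\xi$, $\eta$, $\gamma$ are solutions to (11). From $1 \equiv \rho \equiv \rho^2 \pmod{\lambda}$ we deduce that $\xi + \eta \equiv \xi + \rho\eta \equiv \xi + \rho^2\eta \pmod{\lambda}$ and hence $-\varepsilon\lambda^{3n}\gamma^3 = \xi^3 + \eta^3 = (\xi + \eta)(\xi + \rho\eta)(\xi + \rho^2\eta)$ where the three divisors are all multiples of $\lambda$. It is not difficult to show that $(\xi + \eta)/\lambda$, $(\xi + \rho\eta)/\lambda$, $(\xi + \rho^2\eta)/\lambda$ are pairwise coprime. In fact, for example, from $(\xi + \eta) - (\xi + \rho\eta) = \lambda\eta$ and $\rho(\xi + \eta) - (\xi + \rho\eta) = -\lambda\xi$ we see that $(\xi + \eta)/\lambda$ and $(\xi + \rho\eta)/\lambda$ are coprime. Thus one of the three divisors in the factorization
$$-\varepsilon\lambda^{3(n-1)}\gamma^3 = \frac{\xi + \eta}{\lambda} \cdot \frac{\xi + \rho\eta}{\lambda} \cdot \frac{\xi + \rho^2\eta}{\lambda}$$
must be a multiple of $\lambda^{3(n-1)}$, and we may assume that it is $(\xi + \eta)/\lambda$ since otherwise we can replace $\eta$ by $\rho\eta$ or $\rho^2\eta$. Hence
$$\frac{\xi + \eta}{\lambda} = \varepsilon_1\lambda^{3(n-1)}\mu^3, \qquad \frac{\xi + \rho\eta}{\lambda} = \varepsilon_2\nu^3, \qquad \frac{\xi + \rho^2\eta}{\lambda} = \varepsilon_3 u^3, \qquad (12)$$
where $\varepsilon_1$, $\varepsilon_2$, $\varepsilon_3$ are units and $\mu$, $\nu$, $u$ are pairwise coprime integers not divisible by $\lambda$. From (12) we have
$$0 = (\xi + \eta) + \rho(\xi + \rho\eta) + \rho^2(\xi + \rho^2\eta) = \lambda\left(\varepsilon_1\lambda^{3(n-1)}\mu^3 + \rho\varepsilon_2\nu^3 + \rho^2\varepsilon_3 u^3\right),$$
giving
$$\nu^3 + \varepsilon_4 u^3 + \varepsilon_5\lambda^{3(n-1)}\mu^3 = 0, \qquad (\nu, u) = 1, \qquad \lambda \nmid \mu, \qquad (13)$$
where $\varepsilon_4$, $\varepsilon_5$ are also units. From (13) we have $\nu^3 + \varepsilon_4 u^3 \equiv 0 \pmod{\lambda^2}$ and here, by (10), $\pm 1 \pm \varepsilon_4 \equiv 0 \pmod{\lambda^2}$. Among the units $\pm 1$, $\pm\rho$, $\pm\rho^2$ only $\varepsilon_4 = \pm 1$ can satisfy this congruence. Hence $\varepsilon_4 = \pm 1$ and we see that (13) is the same as (11) with $n$ replaced by $n - 1$. The theorem is proved. □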
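The two key congruences in these proofs, $\alpha^4 \equiv 1 \pmod{\lambda^6}$ in $R(\sqrt{-1})$ and $\xi^3 \equiv \pm 1 \pmod{\lambda^4}$ in $R(\rho)$, can be checked exhaustively. The sketch below relies on two facts assumed for this illustration: $(1 - i)^6 = 8i$ is an associate of 8, and $(1 - \rho)^4 = 9\rho^2$ is an associate of 9, so that it suffices to work with coefficients modulo 8 and modulo 9 respectively. Elements are stored as pairs `(a, b)` for $a + bi$ or $a + b\rho$, and the function names are hypothetical.

```python
def gauss_mul(x, y, m):
    """(a + b*i)(c + d*i) with coefficients reduced mod m."""
    (a, b), (c, d) = x, y
    return ((a * c - b * d) % m, (a * d + b * c) % m)


def eisenstein_mul(x, y, m):
    """(a + b*rho)(c + d*rho) mod m, using rho^2 = -1 - rho."""
    (a, b), (c, d) = x, y
    return ((a * c - b * d) % m, (a * d + b * c - b * d) % m)


def power(x, e, mul, m):
    """x**e by repeated multiplication with the given product rule."""
    result = (1 % m, 0)
    for _ in range(e):
        result = mul(result, x, m)
    return result


def check_fourth_powers():
    """alpha^4 = 1 (mod 8) for every a + b*i not divisible by 1 - i, i.e. with a + b odd."""
    return all(
        power((a, b), 4, gauss_mul, 8) == (1, 0)
        for a in range(8) for b in range(8) if (a + b) % 2 == 1
    )


def check_cubes():
    """xi^3 = +1 or -1 (mod 9) for every a + b*rho not divisible by 1 - rho, i.e. 3 does not divide a + b."""
    return all(
        power((a, b), 3, eisenstein_mul, 9) in ((1, 0), (8, 0))
        for a in range(9) for b in range(9) if (a + b) % 3 != 0
    )
```

Both checks run over complete residue systems, so they confirm the congruences for all integers of the two fields that are prime to $\lambda$.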

16.18 Tables

We conclude this chapter with two tables displaying all the quadratic fields $R(\sqrt{D})$ with $-100 < D \le 100$. We list their integral basis, discriminants, ideal classes and the quadratic forms associated with the ideal classes together with their character systems. We also display the continued fraction representations for $\omega$ and the fundamental units in the second table. More precisely:

In Table I, the first column is the value for $D$. The second column is $\omega$ (see the definition in Theorem 4.5). The third column is the discriminant $d$. The fourth column displays the ideal classes of $R(\sqrt{D})$. The fifth column indicates the relationship between the ideal classes. The sixth column displays the quadratic forms representing the classes of forms corresponding to the ideal classes. The seventh column is the character systems associated with these classes of forms.

In Table II, the first two columns are as before. The third column displays the continued fractions expansion representing $\omega$ when $D$ is square-free and representing $\sqrt{D}$ when $D$ is not square-free. The fourth column is the discriminant $d$. The fifth column displays $x + y\sqrt{D}$ when $D$ is square-free and it is the fundamental unit $\eta$ of $R(\sqrt{D})$; when $D$ is not square-free it displays the least positive integer solutions to $x^2 - y^2 D = \pm 1$ (if $x^2 - y^2 D = -1$ is soluble, then $x + y\sqrt{D}$ satisfies $x^2 - y^2 D = -1$, otherwise $x$, $y$ satisfy $x^2 - y^2 D = +1$). The sixth column is $N(x + y\sqrt{D})$. The last four columns are the same as the last four columns in Table I.



Table I D

00

LI

Ideal classes

-I

)=l

_22

-2

)-2

-3 -5

-6

Quadratic forms

Character systems

(I)

X2 +y2

+1

_23

(I)

2X2+y2

+1

-3

(I)

X2+xy+y2

+1

)-5

-2 2 ·S

(I)

A2

SX2+y2

+1, +1

(2, I +) -S)

A

3X2 +2xy+2y2

-I, -I

)-6

-2 3 .3

(I)

A2

6X2+y2

+ I, +1

A

3x2 +2y2

-I, -I

2X2+xy+y2

+1

1+)-3 2

(2,) -6) -7 -10

-\I

-13

-14

1+)-7 2 )-10

1+)-11

-7

(I)

-5.2 3

(I)

A2

IOx2 + y2

+1, +1

(2,) -10)

A

5x 2 +2y2

-I, -I.

3x2 +xy+y2

+1

A2

13x2 +y2

+1, +1

(2, I +) -13)

A

7X2+2xy+2y2

-I, -I

(I)

[4

14x2 + y2

+1, +1

(3,2+) -14)

[3

6x 2 + 4xy + 3y2

-I, -I

(2,) -14)

[2

7x 2+2y2

+ I, + I

Sx2+2xy+3y2

-I, -I

A2

4X2+xy+y2

+1, +1

(2, I +(0)

A

(I)

3X2+3xy+2y2 17x2+y2

-I, -I

[4

+ I, +1

(3,2+) -17)

[3

7X2+4xy+3y2

-I, -I

(2,1 +) -17)

[2

9x2 + 2xy + 2y2

+ I, + I

(3, I +) -17)

6x 2 +2xy+3y 2

-I, -I

-19

(I)

Sx 2 +xy+y2

+1

-3.2 2 .7

(I)

A2A~

+2Ix2+y2

+1,+1,+1

AAI

6x 2 +6xy+Sy2

-1,-1,+1

AI

7x 2 +3y2

+1,-1,-1

(2, 1+) -21)

A

IIx2+2xy+2y2

-1,+1,-1

(I)

A2

22x2+y2

+1, +1

(2,) -22)

A

IIx 2 +2y2

-I, -I

-II

(I)

)-13

-2 2 .13

(I)

)-14

_7.2 3

2

Relations

(3,1+)-14) -IS

-17

-19 -21

1+)-15 2 )-17

1+)-19 2 )-21

-3·S

-2 2 .17

(I)

(5,3+) -21) (3,)-21)

-22

)-22

-23 .11

Table I (continued)

D | ω | Discriminant | Ideal classes | Relations | Quadratic forms | Character systems
−23 | (1+√−23)/2 | −23 | (1) | I³ | 6x² + xy + y² | +1
 | | | (2, 1+(1+√−23)/2) | I² | 4x² + 3xy + 2y² | +1
 | | | (2, (1+√−23)/2) | I | 3x² + xy + 2y² | +1
−26 | √−26 | −2³·13 | (1) | I⁶ | 26x² + y² | +1, +1
 | | | (5, 3+√−26) | I⁵ | 7x² + 6xy + 5y² | −1, −1
 | | | (3, 1+√−26) | I⁴ | 9x² + 2xy + 3y² | +1, +1
 | | | (2, √−26) | I³ | 13x² + 2y² | −1, −1
 | | | (3, 2+√−26) | I² | 10x² + 4xy + 3y² | +1, +1
 | | | (5, 2+√−26) | I | 6x² + 4xy + 5y² | −1, −1
−29 | √−29 | −2²·29 | (1) | I⁶ | 29x² + y² | +1, +1
 | | | (3, 2+√−29) | I⁵ | 11x² + 4xy + 3y² | −1, −1
 | | | (5, 4+√−29) | I⁴ | 9x² + 8xy + 5y² | +1, +1
 | | | (2, 1+√−29) | I³ | 15x² + 2xy + 2y² | −1, −1
 | | | (5, 1+√−29) | I² | 6x² + 2xy + 5y² | +1, +1
 | | | (3, 1+√−29) | I | 10x² + 2xy + 3y² | −1, −1
−30 | √−30 | −2³·3·5 | (1) | A²A₁² | 30x² + y² | +1, +1, +1
 | | | (2, √−30) | AA₁ | 15x² + 2y² | −1, −1, +1
 | | | (3, √−30) | A₁ | 10x² + 3y² | +1, −1, −1
 | | | (5, √−30) | A | 6x² + 5y² | −1, +1, −1
−31 | (1+√−31)/2 | −31 | (1) | I³ | 8x² + xy + y² | +1
 | | | (2, ω) | I² | 4x² + xy + 2y² | +1
 | | | (2, 1+ω) | I | 5x² + 3xy + 2y² | +1
−33 | √−33 | −2²·3·11 | (1) | A²A₁² | 33x² + y² | +1, +1, +1
 | | | (2, 1+√−33) | AA₁ | 17x² + 2xy + 2y² | −1, −1, +1
 | | | (3, √−33) | A₁ | 11x² + 3y² | −1, +1, −1
 | | | (6, 3+√−33) | A | 7x² + 6xy + 6y² | +1, −1, −1
−34 | √−34 | −2³·17 | (1) | I⁴ | 34x² + y² | +1, +1
 | | | (5, 4+√−34) | I³ | 10x² + 8xy + 5y² | −1, −1
 | | | (2, √−34) | I² | 17x² + 2y² | +1, +1
 | | | (5, 1+√−34) | I | 7x² + 2xy + 5y² | −1, −1
−35 | (1+√−35)/2 | −5·7 | (1) | A² | 9x² + xy + y² | +1, +1
 | | | (5, (5+√−35)/2) | A | 3x² + 5xy + 5y² | −1, −1

Table I (continued)

D | ω | Discriminant | Ideal classes | Relations | Quadratic forms | Character systems
−37 | √−37 | −2²·37 | (1) | A² | 37x² + y² | +1, +1
 | | | (2, 1+√−37) | A | 19x² + 2xy + 2y² | −1, −1
−38 | √−38 | −2³·19 | (1) | I⁶ | 38x² + y² | +1, +1
 | | | (3, 2+√−38) | I⁵ | 14x² + 4xy + 3y² | −1, −1
 | | | (7, 2+√−38) | I⁴ | 6x² + 4xy + 7y² | +1, +1
 | | | (2, √−38) | I³ | 19x² + 2y² | −1, −1
 | | | (7, 5+√−38) | I² | 9x² + 10xy + 7y² | +1, +1
 | | | (3, 1+√−38) | I | 13x² + 2xy + 3y² | −1, −1
−39 | (1+√−39)/2 | −3·13 | (1) | I⁴ | 10x² + xy + y² | +1, +1
 | | | (2, 1+ω) | I³ | 6x² + 3xy + 2y² | −1, −1
 | | | (3, 1+ω) | I² | 4x² + 3xy + 3y² | +1, +1
 | | | (2, ω) | I | 5x² + xy + 2y² | −1, −1
−41 | √−41 | −2²·41 | (1) | I⁸ | 41x² + y² | +1, +1
 | | | (3, 2+√−41) | I⁷ | 15x² + 4xy + 3y² | −1, −1
 | | | (5, 3+√−41) | I⁶ | 10x² + 6xy + 5y² | +1, +1
 | | | (7, 6+√−41) | I⁵ | 11x² + 12xy + 7y² | −1, −1
 | | | (2, 1+√−41) | I⁴ | 21x² + 2xy + 2y² | +1, +1
 | | | (7, 1+√−41) | I³ | 6x² + 2xy + 7y² | −1, −1
 | | | (5, 2+√−41) | I² | 9x² + 4xy + 5y² | +1, +1
 | | | (3, 1+√−41) | I | 14x² + 2xy + 3y² | −1, −1
−42 | √−42 | −3·2³·7 | (1) | A²A₁² | 42x² + y² | +1, +1, +1
 | | | (7, √−42) | AA₁ | 6x² + 7y² | +1, −1, −1
 | | | (3, √−42) | A₁ | 14x² + 3y² | −1, −1, +1
 | | | (2, √−42) | A | 21x² + 2y² | −1, +1, −1
−43 | (1+√−43)/2 | −43 | (1) | | 11x² + xy + y² | +1
−46 | √−46 | −2³·23 | (1) | I⁴ | 46x² + y² | +1, +1
 | | | (5, 3+√−46) | I³ | 11x² + 6xy + 5y² | −1, −1
 | | | (2, √−46) | I² | 23x² + 2y² | +1, +1
 | | | (5, 2+√−46) | I | 10x² + 4xy + 5y² | −1, −1
−47 | (1+√−47)/2 | −47 | (1) | I⁵ | 12x² + xy + y² | +1
 | | | (2, ω) | I⁴ | 6x² + xy + 2y² | +1
 | | | (3, 2+ω) | I³ | 6x² + 5xy + 3y² | +1
 | | | (3, ω) | I² | 4x² + xy + 3y² | +1
 | | | (2, 1+ω) | I | 7x² + 3xy + 2y² | +1
−51 | (1+√−51)/2 | −3·17 | (1) | A² | 13x² + xy + y² | +1, +1
 | | | (3, 1+ω) | A | 5x² + 3xy + 3y² | −1, −1

Table I (continued)

D | ω | Discriminant | Ideal classes | Relations | Quadratic forms | Character systems
−53 | √−53 | −2²·53 | (1) | I⁶ | 53x² + y² | +1, +1
 | | | (3, 2+√−53) | I⁵ | 19x² + 4xy + 3y² | −1, −1
 | | | (9, 8+√−53) | I⁴ | 13x² + 16xy + 9y² | +1, +1
 | | | (2, 1+√−53) | I³ | 27x² + 2xy + 2y² | −1, −1
 | | | (9, 1+√−53) | I² | 6x² + 2xy + 9y² | +1, +1
 | | | (3, 1+√−53) | I | 18x² + 2xy + 3y² | −1, −1
−55 | (1+√−55)/2 | −5·11 | (1) | I⁴ | 14x² + xy + y² | +1, +1
 | | | (2, 1+ω) | I³ | 8x² + 3xy + 2y² | −1, −1
 | | | (5, 2+ω) | I² | 4x² + 5xy + 5y² | +1, +1
 | | | (2, ω) | I | 7x² + xy + 2y² | −1, −1
−57 | √−57 | −3·2²·19 | (1) | A²A₁² | 57x² + y² | +1, +1, +1
 | | | (2, 1+√−57) | AA₁ | 29x² + 2xy + 2y² | −1, −1, +1
 | | | (3, √−57) | A₁ | 19x² + 3y² | +1, −1, −1
 | | | (6, 3+√−57) | A | 11x² + 6xy + 6y² | −1, +1, −1
−58 | √−58 | −2³·29 | (1) | A² | 58x² + y² | +1, +1
 | | | (2, √−58) | A | 29x² + 2y² | −1, −1
−59 | (1+√−59)/2 | −59 | (1) | I³ | 15x² + xy + y² | +1
 | | | (3, (5+√−59)/2) | I² | 7x² + 5xy + 3y² | +1
 | | | (3, (1+√−59)/2) | I | 5x² + xy + 3y² | +1
−61 | √−61 | −2²·61 | (1) | I³ | 61x² + y² | +1, +1
 | | | (5, 3+√−61) | I² | 14x² + 6xy + 5y² | +1, +1
 | | | (5, 2+√−61) | I | 13x² + 4xy + 5y² | +1, +1
 | | | (7, 4+√−61) | AI² | 11x² + 8xy + 7y² | −1, −1
 | | | (7, 3+√−61) | AI | 10x² + 6xy + 7y² | −1, −1
 | | | (2, 1+√−61) | A | 31x² + 2xy + 2y² | −1, −1
−62 | √−62 | −2³·31 | (1) | I⁸ | 62x² + y² | +1, +1
 | | | (3, 2+√−62) | I⁷ | 22x² + 4xy + 3y² | −1, −1
 | | | (7, 1+√−62) | I⁶ | 9x² + 2xy + 7y² | +1, +1
 | | | (11, 2+√−62) | I⁵ | 6x² + 4xy + 11y² | −1, −1
 | | | (2, √−62) | I⁴ | 31x² + 2y² | +1, +1
 | | | (11, 9+√−62) | I³ | 13x² + 18xy + 11y² | −1, −1
 | | | (7, 6+√−62) | I² | 14x² + 12xy + 7y² | +1, +1
 | | | (3, 1+√−62) | I | 21x² + 2xy + 3y² | −1, −1

Table I (continued)

D | ω | Discriminant | Ideal classes | Relations | Quadratic forms | Character systems
−65 | √−65 | −2²·5·13 | (1) | I⁴ | 65x² + y² | +1, +1, +1
 | | | (3, 2+√−65) | I³ | 23x² + 4xy + 3y² | −1, +1, −1
 | | | (9, 4+√−65) | I² | 9x² + 8xy + 9y² | +1, +1, +1
 | | | (3, 1+√−65) | I | 22x² + 2xy + 3y² | −1, +1, −1
 | | | (11, 10+√−65) | AI³ | 15x² + 20xy + 11y² | +1, −1, −1
 | | | (2, 1+√−65) | AI² | 33x² + 2xy + 2y² | −1, −1, +1
 | | | (11, 1+√−65) | AI | 6x² + 2xy + 11y² | +1, −1, −1
 | | | (5, √−65) | A | 13x² + 5y² | −1, −1, +1
−66 | √−66 | −2³·3·11 | (1) | I⁴ | 66x² + y² | +1, +1, +1
 | | | (5, 3+√−66) | I³ | 15x² + 6xy + 5y² | −1, +1, −1
 | | | (3, √−66) | I² | 22x² + 3y² | +1, +1, +1
 | | | (5, 2+√−66) | I | 14x² + 4xy + 5y² | −1, +1, −1
 | | | (7, 2+√−66) | AI³ | 10x² + 4xy + 7y² | +1, −1, −1
 | | | (11, √−66) | AI² | 6x² + 11y² | −1, −1, +1
 | | | (7, 5+√−66) | AI | 13x² + 10xy + 7y² | +1, −1, −1
 | | | (2, √−66) | A | 33x² + 2y² | −1, −1, +1
−67 | (1+√−67)/2 | −67 | (1) | | 17x² + xy + y² | +1
−69 | √−69 | −2²·3·23 | (1) | I⁴ | 69x² + y² | +1, +1, +1
 | | | (7, 6+√−69) | I³ | 15x² + 12xy + 7y² | +1, −1, −1
 | | | (6, 3+√−69) | I² | 13x² + 6xy + 6y² | +1, +1, +1
 | | | (7, 1+√−69) | I | 10x² + 2xy + 7y² | +1, −1, −1
 | | | (5, 1+√−69) | AI³ | 14x² + 2xy + 5y² | −1, −1, +1
 | | | (3, √−69) | AI² | 23x² + 3y² | −1, +1, −1
 | | | (5, 4+√−69) | AI | 17x² + 8xy + 5y² | −1, −1, +1
 | | | (2, 1+√−69) | A | 35x² + 2xy + 2y² | −1, +1, −1
−70 | √−70 | −2³·5·7 | (1) | A²A₁² | 70x² + y² | +1, +1, +1
 | | | (7, √−70) | AA₁ | 10x² + 7y² | −1, −1, +1
 | | | (5, √−70) | A₁ | 14x² + 5y² | +1, −1, −1
 | | | (2, √−70) | A | 35x² + 2y² | −1, +1, −1
−71 | (1+√−71)/2 | −71 | (1) | I⁷ | 18x² + xy + y² | +1
 | | | (2, (3+√−71)/2) | I⁶ | 10x² + 3xy + 2y² | +1
 | | | (5, (7+√−71)/2) | I⁵ | 6x² + 7xy + 5y² | +1

C4Y,

(3)

or

~I
$\max(m, |g_0|)$ such that

$$\Bigl|\sum_{h=0}^{m} g_h Q(h)\Bigr| < 1,$$

then the required contradiction will be obtained from equation (1). By Theorem 6.2 it suffices to prove that, for any fixed $x$, we have

$$\sum_{k=0}^{n} |a_k|\,|x|^k \to 0 \quad\text{as } p \to \infty,$$

but this is easy because, as $p \to \infty$,

$$\sum_{k=0}^{n} |a_k|\,|x|^k \le \frac{|x|^{p-1}\prod_{h=1}^{m}(h+|x|)^p}{(p-1)!} \to 0. \qquad \Box$$


17. Algebraic Numbers and Transcendental Numbers

17.7 The Transcendence of π

Theorem 7.1. The number $\pi$ is irrational.

Proof (Niven). Suppose the contrary, so that $\pi = a/b$ where $a, b$ are positive integers. Let

$$f(x) = \frac{x^n (a - bx)^n}{n!}$$

and

$$F(x) = f(x) - f^{(2)}(x) + \cdots$$

Let

$$P(y) = \prod_{h=1}^{m} (y - a\alpha_h).$$

Since $1 + e^{i\pi} = 0$, it suffices to show that

$$R = \prod_{h=1}^{m} \bigl(e^{0} + e^{\alpha_h}\bigr) \ne 0.$$

Now $R$ can be written as

$$R = 1 + \sum e^{\alpha_h} + \sum e^{\alpha_h + \alpha_j} + \cdots = c + e^{\beta_1} + e^{\beta_2} + \cdots + e^{\beta_r},$$

where $c$ is the number of the $2^m$ terms in which the power of the exponential is zero, and $\beta_1, \beta_2, \dots, \beta_r$ are non-zero numbers. Let $p$ be a prime greater than $\max\bigl(c, a, \prod_{h=1}^{r} |a\beta_h|\bigr)$ and define $f(x)$ by

$$f(x) = \frac{(ax)^{p-1} \prod_{h=1}^{r} (ax - a\beta_h)^p}{(p-1)!}.$$

Similarly to the proof of Theorem 6.3 we have

$$f(x) = \frac{A_{p-1}x^{p-1} + A_p x^p + \cdots}{(p-1)!} = \frac{\gamma_{p,h}(x - \beta_h)^p + \gamma_{p+1,h}(x - \beta_h)^{p+1} + \cdots}{(p-1)!},$$

where the $A_\nu$, being symmetric functions of $a\beta_1, \dots, a\beta_r$ and hence symmetric functions of $a\alpha_1, \dots, a\alpha_m$, are rational integers and $A_{p-1} \not\equiv 0 \pmod p$. On the construction of the corresponding $F(x)$ and $Q(x)$ we have

$$F(0)R = F(0)\Bigl(c + \sum_{h=1}^{r} e^{\beta_h}\Bigr) = cF(0) + \sum_{h=1}^{r} F(\beta_h) + \sum_{h=1}^{r} Q(\beta_h),$$

where

$$cF(0) = c(A_{p-1} + pA_p + \cdots)$$

is a rational integer which is not a multiple of $p$. Also

$$\sum_{h=1}^{r} F(\beta_h) = \sum_{h=1}^{r} \bigl(p\gamma_{p,h} + p(p+1)\gamma_{p+1,h} + \cdots\bigr) = p\sum_{h=1}^{r}\gamma_{p,h} + p(p+1)\sum_{h=1}^{r}\gamma_{p+1,h} + \cdots = pC_p + p(p+1)C_{p+1} + \cdots,$$

where $C_p, C_{p+1}, \dots$, being symmetric functions of $a\beta_1, \dots, a\beta_r$, are integers. It follows that $\sum_{h=1}^{r} F(\beta_h)$ is a multiple of $p$ and hence

$$\Bigl|cF(0) + \sum_{h=1}^{r} F(\beta_h)\Bigr| \ge 1.$$

It only remains to show that, for sufficiently large $p$,

$$\Bigl|\sum_{h=1}^{r} Q(\beta_h)\Bigr| < 1.$$

But, as $p \to \infty$,

$$\sum_{k=0}^{n} |a_k|\,|x|^k \le \frac{(a|x|)^{p-1}\prod_{h=1}^{r}(a|x| + a|\beta_h|)^p}{(p-1)!} \to 0$$

so that the result follows from Theorem 6.2. □

Remark. This theorem settles the problem of "squaring the circle": it is impossible to construct a square equal in area to a given circle, using only straight edge and compass.

Exercise 1. Prove that $\sinh \xi$ is transcendental whenever $\xi$ is rational.

Exercise 2. Prove that $\sin 1$ is transcendental by proving that $e^{i}$ is transcendental.

17.8 Hilbert's Seventh Problem

In the year 1900 Hilbert gave a list of 23 unsolved problems which he believed to be worthy of the attention of mathematicians in the twentieth century. We already mentioned the first part of his seventh problem, and the remaining part is the following: Let $\alpha$ and $\beta$ be algebraic numbers with $\alpha \ne 0, 1$ and $\beta$ irrational. Does it follow that $\alpha^{\beta}$ is transcendental? As specific examples he asked for the proofs of the transcendence of $2^{\sqrt{2}}$ and $e^{\pi} = (-1)^{-i}$.

In 1929 the Russian mathematician A. O. Gelfond made an important contribution to the solution of this problem. He proved the transcendence of $e^{\pi}$ and pointed out that his method can be used to settle Hilbert's problem when $\beta$ lies in an imaginary quadratic field. In 1930 Kusmin used Gelfond's method to settle the case when $\beta$ lies in a real quadratic field and proved in particular that $2^{\sqrt{2}}$ is transcendental. Then in 1934 complete solutions to Hilbert's problem were given independently by Gelfond and Schneider.

It may be of some interest to recall that, when discussing this problem, Hilbert was of the opinion that the solution would not be available before the solutions to the Riemann hypothesis and Fermat's last theorem. It seems therefore that it is very difficult to judge the difficulty of an unsolved problem before a solution is available.

Let $K$ be an algebraic number field of degree $h$, and let $\beta_1, \dots, \beta_h$ be an integer basis, so that every integer in $K$ has the unique representation $a_1\beta_1 + \cdots + a_h\beta_h$ where $a_1, \dots, a_h$ are rational integers. We shall denote by $\overline{|\alpha|}$ the maximum of the moduli of the conjugates $\alpha^{(i)}$ $(1 \le i \le h)$ of $\alpha$, that is

$$\overline{|\alpha|} = \max_{1 \le i \le h} |\alpha^{(i)}|.$$

In the following we let $c, c_1, c_2$ be natural numbers depending on $K$ and its basis $\beta_1, \dots, \beta_h$. It is easy to show that if $\alpha$ is an algebraic integer with $\alpha = a_1\beta_1 + \cdots + a_h\beta_h$, then $|a_i| \le c\,\overline{|\alpha|}$.

Lemma 8.1. Let $0 < M < N$, and let $a_{jk}$ be rational integers satisfying $|a_{jk}| \le A$ $(A \ge 1,\ 1 \le j \le M,\ 1 \le k \le N)$. Then there exists a set of rational integers $x_1, \dots, x_N$, not all zero, satisfying

$$a_{j1}x_1 + \cdots + a_{jN}x_N = 0, \quad 1 \le j \le M, \tag{1}$$

and

$$|x_k| \le \bigl[(NA)^{\frac{M}{N-M}}\bigr], \quad 1 \le k \le N. \tag{2}$$

Proof. Let

$$y_j = a_{j1}x_1 + \cdots + a_{jN}x_N, \quad 1 \le j \le M,$$

so that this defines a mapping from rational integers $(x_1, \dots, x_N)$ to rational integers $(y_1, \dots, y_M)$. We write $H = \bigl[(NA)^{\frac{M}{N-M}}\bigr]$ so that $NA < (H+1)^{\frac{N-M}{M}}$, and hence

$$NAH + 1 \le NA(H + 1) < (H+1)^{\frac{N}{M}}. \tag{3}$$

For any set of integers $(x_1, \dots, x_N)$ satisfying

$$0 \le x_k \le H, \quad 1 \le k \le N, \tag{4}$$

we have

$$-B_j H \le y_j \le C_j H,$$

where $-B_j$ and $C_j$ represent respectively the sum of the negative and positive coefficients of $y_j$, so that the number of values assumed by $y_j$ cannot exceed $NAH + 1$. The number of sets $(x_1, \dots, x_N)$ satisfying (4) is $(H+1)^N$ and the corresponding number of sets $(y_1, \dots, y_M)$ is at most $(NAH + 1)^M$. It follows from (3) that there must be two sets $(x_1', \dots, x_N')$ and $(x_1'', \dots, x_N'')$ which correspond to the same set $(y_1, \dots, y_M)$. Let $x_k = x_k' - x_k''$ $(1 \le k \le N)$ so that $(x_1, \dots, x_N)$ is now the required set satisfying (1) and (2). □
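The pigeonhole argument in the proof of Lemma 8.1 is constructive enough to run on small systems. The following Python sketch, which is an illustration and not part of the text, enumerates the box $0 \le x_k \le H$ of (4), records the image $(y_1, \dots, y_M)$ of each point, and returns the difference of the first two points that collide. The names `floor_root` and `small_solution` are hypothetical, and the search is exponential in $N$, so it is only meant for tiny examples.

```python
from itertools import product


def floor_root(n, k):
    """Largest integer r with r**k <= n, for n >= 1 and k >= 1."""
    lo, hi = 0, 1
    while hi ** k <= n:
        hi *= 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid ** k <= n:
            lo = mid
        else:
            hi = mid
    return lo


def small_solution(a):
    """Given an M x N integer matrix a with 0 < M < N, return (x, H) where x is a
    non-zero integer vector with a.x = 0 and max|x_k| <= H = [(N*A)^(M/(N-M))]."""
    M, N = len(a), len(a[0])
    assert 0 < M < N
    A = max(1, max(abs(v) for row in a for v in row))
    H = floor_root((N * A) ** M, N - M)
    seen = {}
    for x in product(range(H + 1), repeat=N):
        y = tuple(sum(c * v for c, v in zip(row, x)) for row in a)
        if y in seen:
            # Two distinct points of the box have the same image: take the difference.
            return tuple(u - w for u, w in zip(x, seen[y])), H
        seen[y] = x
    raise RuntimeError("no collision found; this would contradict inequality (3)")


if __name__ == "__main__":
    print(small_solution([[1, 2, -3]]))
```

Since both colliding points lie in the box, each coordinate of the difference has absolute value at most $H$, which is exactly the bound (2).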



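Lemma 8.2. Let $0 < p < q$, and let $\alpha_{kl}$ $(1 \le k \le p,\ 1 \le l \le q)$ be integers in $K$ satisfying $\overline{|\alpha_{kl}|} \le A$. Then there exists a set of algebraic integers $\xi_1, \dots, \xi_q$ in $K$, not all zero, satisfying

$$\alpha_{k1}\xi_1 + \cdots + \alpha_{kq}\xi_q = 0, \quad 1 \le k \le p, \tag{5}$$

and

$$\overline{|\xi_l|} \le c_1\bigl(1 + (c_1 q A)^{\frac{p}{q-p}}\bigr), \quad 1 \le l \le q. \tag{6}$$

Proof. Let $\xi_l = x_{l1}\beta_1 + \cdots + x_{lh}\beta_h$ $(1 \le l \le q)$ where $x_{l1}, \dots, x_{lh}$ are rational integers. Let

$$\alpha_{kl}\beta_r = a_{klr1}\beta_1 + \cdots + a_{klrh}\beta_h, \tag{7}$$

where $a_{klr1}, \dots, a_{klrh}$ are also rational integers. For $1 \le k \le p$ we have, from (5), that

$$0 = \sum_{l=1}^{q} \alpha_{kl}\xi_l = \sum_{l=1}^{q} \alpha_{kl}\sum_{r=1}^{h} x_{lr}\beta_r = \sum_{r=1}^{h}\sum_{l=1}^{q} x_{lr}\sum_{u=1}^{h} a_{klru}\beta_u = \sum_{u=1}^{h}\Bigl(\sum_{r=1}^{h}\sum_{l=1}^{q} a_{klru}x_{lr}\Bigr)\beta_u.$$

Since $\beta_1, \dots, \beta_h$ are linearly independent we have the $hp$ equations

$$\sum_{r=1}^{h}\sum_{l=1}^{q} a_{klru}x_{lr} = 0, \quad 1 \le u \le h, \tag{8}$$

with $hq$ unknowns. From (7) and our remark preceding Lemma 8.1 we see that $|a_{klru}| \le c \max_{1 \le i \le h} \overline{|\beta_i|}\, A \le c_2 A$. It now follows from Lemma 8.1 that the system (8) has a non-trivial set of solutions in rational integers satisfying

$$|x_{lr}| \le 1 + (hq c_2 A)^{\frac{p}{q-p}}, \quad 1 \le l \le q, \quad 1 \le r \le h.$$

Therefore

$$\overline{|\xi_l|} \le |x_{l1}|\,\overline{|\beta_1|} + \cdots + |x_{lh}|\,\overline{|\beta_h|} \le c_2 h\bigl(1 + (hq c_2 A)^{\frac{p}{q-p}}\bigr).$$

Taking $c_1 = c_2 h$ the lemma is proved. □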

17.9 Gelfond's Proof

Let $\alpha$ and $\beta$ be algebraic numbers with $\alpha \ne 0, 1$ and $\beta$ irrational; we have to prove that $\alpha^{\beta}$ is transcendental. Suppose the contrary, so that $\gamma = \alpha^{\beta} = e^{\beta\log\alpha}$ (where $\log\alpha$ may be any fixed value of the logarithm of $\alpha$) is also algebraic. We shall derive a contradiction.

Suppose that $\alpha, \beta, \gamma$ lie in an algebraic field $K$ with degree $h$. Let

$$m = 2h + 2, \qquad n = \frac{q^2}{2m},$$

where $q^2 = t$ is a square of a natural number and is a multiple of $2m$. Also, let $\rho_1, \rho_2, \dots, \rho_t$ represent the $t$ numbers

$$(a + b\beta)\log\alpha, \quad 1 \le a \le q, \quad 1 \le b \le q.$$

We introduce the integral function

$$R(x) = \eta_1 e^{\rho_1 x} + \cdots + \eta_t e^{\rho_t x}, \tag{1}$$

where the coefficients $\eta_1, \dots, \eta_t$ are determined by the following conditions. We solve the system of $mn$ homogeneous linear equations

$$(\log\alpha)^{-k} R^{(k)}(l) = 0, \quad 0 \le k \le n - 1, \quad 1 \le l \le m, \tag{2}$$

in the $t = 2mn$ unknowns $\eta_1, \dots, \eta_t$. The coefficients of this system are the numbers

$$(a + b\beta)^k \alpha^{al}\gamma^{bl}, \quad 1 \le l \le m, \quad 1 \le a, b \le q, \quad 0 \le k \le n - 1,$$

which lie in $K$.

Let $c_1, c_2, \dots$ denote natural numbers which are independent of $n$. There exists $c_1$ such that $c_1\alpha$, $c_1\beta$ and $c_1\gamma$ are all integers in $K$, so that on multiplying each of the coefficients of the system by $c_1^{\,n-1}c_1^{\,mq}c_1^{\,mq} = c_1^{\,n-1+2mq}$ $(\le c_2^{\,n})$, the resulting coefficients become integers in $K$. Moreover the absolute value of the conjugates of the various coefficients is at most

$$c_3^{\,n}\, n^{\frac12(n-1)}.$$

It follows from Lemma 8.2 that there is a non-trivial set of integer solutions $\eta_1, \dots, \eta_t$ in $K$ such that

$$\overline{|\eta_k|} \le c_4^{\,n}\, n^{\frac12(n+1)}, \quad 1 \le k \le t.$$

Since the numbers $\rho_1, \dots, \rho_t$ are distinct, the function $R(x)$ is not identically zero. For suppose otherwise; then on expanding the right hand side of (1) we have

$$\eta_1\rho_1^k + \eta_2\rho_2^k + \cdots + \eta_t\rho_t^k = 0, \quad k = 0, 1, 2, \dots$$

which implies $\eta_1 = \eta_2 = \cdots = \eta_t = 0$, a contradiction. Thus we see from (2) that

$$R(x) = a_{n,l}(x - l)^n + a_{n+1,l}(x - l)^{n+1} + \cdots, \quad 1 \le l \le m, \tag{3}$$

where $a_{n,l}, a_{n+1,l}, \dots$ are not all zero. Hence there must be a natural number $r$ such that $R^{(k)}(l) = 0$ for $0 \le k \le r - 1$, $1 \le l \le m$, but for some $l_0$ with $1 \le l_0 \le m$ we have $R^{(r)}(l_0) \ne 0$, so that we see from (3) that $r \ge n$. Let us now examine the number

$$\rho = (\log\alpha)^{-r} R^{(r)}(l_0). \tag{4}$$

This number lies in $K$, and $c_1^{\,r+2mq}\rho$ is an integer in $K$ so that

$$|N(\rho)| > c_1^{-h(r+2mq)} > c_5^{-r}. \tag{5}$$

On the other hand

… (6)

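We now determine a suitable upper bound for $|\rho|$. We apply Cauchy's integral formula to the function

$$S(z) = r!\,\frac{R(z)}{(z - l_0)^r}\prod_{\substack{k=1 \\ k \ne l_0}}^{m}\Bigl(\frac{l_0 - k}{z - k}\Bigr)^{r}.$$

We then have

$$\rho = (\log\alpha)^{-r}S(l_0) = (\log\alpha)^{-r}\,\frac{1}{2\pi i}\int_C \frac{S(z)}{z - l_0}\,dz, \tag{7}$$

where $C$ is the circle $|z| = m(1 + r/q)$, so that $l_0$ $(\le m)$ lies inside $C$. As $z$ varies on the circle we have

$$|R(z)| \le t\max_{1 \le k \le t}|\eta_k|\, e^{(q + q|\beta|)\,|\log\alpha|\, m(1 + r/q)} \le t\,c_4^{\,n}\, n^{\frac12(n+1)}\, c_9^{\,r+q} \le c_{10}^{\,r}\, r^{\frac12(r+3)},$$

$$|z - l_0| \ge |z| - |l_0| \ge m\Bigl(1 + \frac{r}{q}\Bigr) - m = \frac{mr}{q}, \qquad |z - k| \ge \frac{mr}{q}, \quad 1 \le k \le m,$$

$$\Bigl|(z - l_0)^{-r}\prod_{\substack{k=1 \\ k \ne l_0}}^{m}\Bigl(\frac{l_0 - k}{z - k}\Bigr)^{r}\Bigr| \le c_{11}^{\,r}\Bigl(\frac{q}{r}\Bigr)^{mr},$$

$$|S(z)| \le r!\,c_{10}^{\,r}\, r^{\frac12(r+3)}\, c_{11}^{\,r}\Bigl(\frac{q}{r}\Bigr)^{mr} \le c_{12}^{\,r}\, r^{\frac12 r(3 - m) + \frac32}.$$

From (7) we now have

$$|\rho| \le \frac{1}{2\pi}\,\bigl|(\log\alpha)^{-r}\bigr|\int_C \Bigl|\frac{S(z)}{z - l_0}\Bigr|\,|dz| \;\cdots$$

From (6) and (8) we have

…

Replacing $m$ by $2h + 2$ we now have

…

and from (5) we deduce that

$$r^{\frac12 hr - \frac32 h} < c_{14}^{\,r}\, c_5^{\,r} = c_{15}^{\,r}.$$

Since $r \ge n$, this cannot hold for sufficiently large $n$, and the required contradiction is obtained. □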

Notes

17.1. The proof of Roth's theorem has been omitted in this English edition (see J. W. S. Cassels [15]). W. M. Schmidt [51], [52] has given the following important generalization of this famous theorem: Let $\alpha_1, \dots, \alpha_n$ be real algebraic numbers such that $1, \alpha_1, \dots, \alpha_n$ are linearly independent over the rational field $R$. Then, given any $\varepsilon > 0$, the inequality

…

has at most a finite number of sets of integer solutions $q_1, \dots, q_n$.

17.2. A. Baker [2] has made the following important improvement on Thue's theorem: Let $g(x, y)$ be a homogeneous irreducible polynomial of degree $n$ $(\ge 3)$ with rational integer coefficients, and let $m$ be a positive integer. Then all the integer solutions to the equation $g(x, y) = m$ can be effectively determined. More specifically, if $H$ exceeds the absolute values of all the coefficients of $g(x, y)$, then all the integer solutions to $g(x, y) = m$ must satisfy $\max(|x|, |y|) < \exp\{(nH)$ …

… 0, the equation

$$n = x_1^k + \cdots + x_s^k \tag{1}$$

is always soluble in integers $x_\nu$. We now denote by $g(k)$ the least of all integers $s$ with this property. Then Waring's statement becomes: "$g(2) = 4$, $g(3) = 9$, $g(4) = 19$, and so on."

We also denote by $G(k)$ the least number $s$ with the property that (1) is soluble for all sufficiently large $n$. Then clearly we have $G(k) \le g(k)$, but in actual fact there is a great difference between the two numbers. In this chapter we only prove some very special results. The proof of the Waring-Hilbert theorem (that is $g(k) < \infty$) is given in the next chapter. The proof, which Khintchin described as one of the three pearls in number theory, is due to Linnik and is much simpler than the original proof by Hilbert.

18.2 Lower Bounds for g(k) and G(k)

Theorem 2.1. $g(k) \ge 2^k + \bigl[(\tfrac32)^k\bigr] - 2$.

Proof. Let $q = \bigl[(\tfrac32)^k\bigr]$ and consider $n = 2^k q - 1 < 3^k$. This number $n$ can only be the sum of the powers $1^k$ and $2^k$, and in fact the least $s$ for the decomposition is given by

$$n = (q - 1)2^k + (2^k - 1)1^k,$$

that is, $n$ requires $(q - 1)$ lots of $2^k$ and $2^k - 1$ lots of $1^k$, giving $g(k) \ge 2^k + q - 2$. □
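The lower bound of Theorem 2.1 is easy to check numerically for small $k$. The sketch below is an illustration in Python, not part of the text: `lower_bound_g` evaluates $2^k + [(3/2)^k] - 2$ with exact integer arithmetic, and `min_kth_powers` (a hypothetical helper using straightforward dynamic programming) finds the least number of positive $k$-th powers summing to a given $n$, so that the extremal number $n = 2^k q - 1$ from the proof can be tested directly.

```python
def lower_bound_g(k):
    """The bound 2^k + [(3/2)^k] - 2 of Theorem 2.1."""
    q = 3 ** k // 2 ** k  # [(3/2)^k] computed exactly
    return 2 ** k + q - 2


def min_kth_powers(n, k):
    """Least number of positive k-th powers with sum n (dynamic programming)."""
    powers = [b ** k for b in range(1, n + 1) if b ** k <= n]
    best = [0] * (n + 1)
    for v in range(1, n + 1):
        # 1 = 1^k is always available, so the minimum is over a non-empty set.
        best[v] = 1 + min(best[v - p] for p in powers if p <= v)
    return best[n]


if __name__ == "__main__":
    for k in range(2, 6):
        q = 3 ** k // 2 ** k
        n = 2 ** k * q - 1
        print(k, n, min_kth_powers(n, k), lower_bound_g(k))
```

For $k = 2, 3, 4, 5$ the extremal numbers are $n = 7, 23, 79, 223$, and the computed minimum agrees with the bounds $4, 9, 19, 37$.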

From this theorem we see at once that

$$g(2) \ge 4, \quad g(3) \ge 9, \quad g(4) \ge 19, \quad g(5) \ge 37, \ \dots.$$

Theorem 2.2. If $k \ge 2$, then $G(k) \ge k + 1$.

Proof. Denote by $A(N)$ the number of positive integers not exceeding $N$ which are expressible in the form

$$x_1^k + \cdots + x_k^k.$$

We may suppose that $x_1, \dots, x_k$ are arranged so that

$$0 \le x_1 \le x_2 \le \cdots \le x_k \le N^{1/k}.$$

Hence $A(N)$ cannot exceed the number of solutions to this set of inequalities, that is

$$A(N) \le \sum_{x_k=0}^{[N^{1/k}]}\ \sum_{x_{k-1}=0}^{x_k} \cdots \sum_{x_1=0}^{x_2} 1.$$

We claim that the sum on the right hand side is

$$B(N) = \frac{1}{k!}\bigl([N^{1/k}] + 1\bigr)\bigl([N^{1/k}] + 2\bigr)\cdots\bigl([N^{1/k}] + k\bigr).$$

We can use induction to prove this. The claim clearly holds when $k = 1$, and so it remains to prove that

$$\sum_{x=0}^{y}\binom{x + k - 1}{k - 1} = \binom{y + k}{k},$$

and this is easy to establish. When $N \to \infty$, $B(N) \sim N/k! \le N/2$ …

18.3 Cauchy's Theorem

… 1. In this section we discuss the condition for the solubility of the congruence

$$x_1^k + \cdots + x_s^k \equiv n \pmod q.$$

From the Chinese remainder theorem we see that we can restrict our discussion to the congruence

$$x_1^k + \cdots + x_s^k \equiv n \pmod{p^{l}}, \tag{1}$$

where $p$ is a prime number. Since $n = n - 1 + 1^k$ we may also assume in what follows that $p \nmid n$. We first prove the following:

Theorem 3.1 (Cauchy). Let $x_1, \dots, x_m$ be $m$ incongruent numbers (mod $q$) and $y_1, \dots, y_n$ be $n$ incongruent numbers (mod $q$). Suppose that there exists $y_i$ such that $(y_j - y_i, q) = 1$ whenever $j \ne i$. Then the number of incongruent numbers (mod $q$) represented by $x_u + y_v$ $(1 \le u \le m,\ 1 \le v \le n)$ is at least

$$\min(m + n - 1,\ q).$$
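The statement of Theorem 3.1 can be checked exhaustively for small moduli. The following Python sketch is an illustration only, with hypothetical helper names: `has_pivot` tests the hypothesis that some $y_i$ satisfies $(y_j - y_i, q) = 1$ for all $j \ne i$, `sumset_size` counts the residues $x_u + y_v$, and `check_cauchy` runs over all pairs of residue sets modulo $q$.

```python
from itertools import combinations
from math import gcd


def sumset_size(xs, ys, q):
    """Number of distinct residues x + y (mod q) with x in xs and y in ys."""
    return len({(x + y) % q for x in xs for y in ys})


def has_pivot(ys, q):
    """True if some y_i has gcd(y_j - y_i, q) = 1 for every other y_j."""
    return any(all(gcd(yj - yi, q) == 1 for yj in ys if yj != yi) for yi in ys)


def check_cauchy(q):
    """Verify the bound min(m + n - 1, q) for all admissible residue sets mod q."""
    residues = range(q)
    for m in range(1, q + 1):
        for xs in combinations(residues, m):
            for n in range(1, q + 1):
                for ys in combinations(residues, n):
                    if has_pivot(ys, q) and sumset_size(xs, ys, q) < min(m + n - 1, q):
                        return False
    return True


if __name__ == "__main__":
    print(all(check_cauchy(q) for q in range(2, 8)))
    # Without the coprimality hypothesis the bound can fail:
    print(sumset_size([0, 2], [0, 2], 4), has_pivot([0, 2], 4))
```

The last line shows why the hypothesis matters for composite $q$: modulo 4 the sets $\{0, 2\}$ and $\{0, 2\}$ give only 2 sums, fewer than $\min(2 + 2 - 1, 4) = 3$, and indeed no pivot $y_i$ exists there.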

Proof. The theorem is trivial if $n = 1$. Suppose therefore that $n \ge 2$ and we may also assume that $i = 1$. We use an inductive argument. Let $z_1, \dots, z_t$ be incongruent numbers (mod $q$) of the form $x_i + y_j$. If $t = q$ the required result is established. We suppose therefore that $t < q$ and we denote by $X, Y, Z$ the sets $x_1, \dots, x_m$; $y_1, \dots, y_n$; $z_1, \dots, z_t$ respectively. Consider $x_1 + y_1 + \lambda(y_n - y_1)$. When $\lambda = 0, 1$ all such numbers belong to $Z$. Since $(q, y_n - y_1) = 1$ there must exist a $\lambda_0$ such that $x_1 + y_1 + (\lambda_0 - 1)(y_n - y_1) \in Z$ and $x_1 + y_1 + \lambda_0(y_n - y_1) \notin Z$. Let $\delta = x_1 + y_1 + \lambda_0(y_n - y_1) + y_1$. Then $\delta - y_1 \notin Z$ and $\delta - y_n \in Z$. We can arrange $y_1, \dots, y_n$ so that

$$\delta - y_s \notin Z \quad (1 \le s \le r), \qquad \delta - y_s \in Z \quad (r < s \le n).$$

Clearly $r \le n - 1$ …

we have $\min(d + (d - 1)(s - 1),\ p) = p$ so that the theorem follows. 24) $p > 2$, $(p - 1) \nmid k$, $k = p^{t}k_0$, $p \nmid k_0$. From

…

and $(p - 1) \nmid k_0$, we see that $x^k$ runs over at least $(p - 1)/(p - 1, k_0)$ $(> 1)$ incongruent numbers (mod $p$). Therefore

$$x_1^k + \cdots + x_s^k$$

gives

$$\min\Bigl(\frac{p - 1}{(p - 1, k_0)} + \Bigl(\frac{p - 1}{(p - 1, k_0)} - 1\Bigr)(s - 1),\ p^{\gamma}\Bigr)$$

incongruent numbers modpY. From s - 1 ~ 3k

~

2pk -p-l

pY

~ -----~

1 p-l 2 (ko,p - 1)

pY - 1 ----_p_-_l__ l (ko,p - 1)

we see that $x_1^k + \cdots + x_s^k$ $(p \nmid x_1, \dots, x_s)$ gives $p^{\gamma}$ incongruent numbers. The proof of the theorem is complete. □

18.4 Elementary Methods In the study of Waring's problem an elementary method usually gives rather poor results. We now introduce several examples which prove the existence of upper bounds for G(k) and g(k) for some special k. Sometimes we can even determine explicitly such an upper bound, but such a result will not be sharp. From Theorem 8.7.8 we already have that g(2) = 4.


18. Waring's Problem and the Problem of Prouhet and Tarry

Theorem 4.1. $g(4) \le 50$.

Proof. We start with the identity

$$6(a^2 + b^2 + c^2 + d^2)^2 = (a + b)^4 + (a - b)^4 + (c + d)^4 + (c - d)^4 + (a + c)^4 + (a - c)^4 + (b + d)^4 + (b - d)^4 + (a + d)^4 + (a - d)^4 + (b + c)^4 + (b - c)^4.$$

Since $a^2 + b^2 + c^2 + d^2$ can represent any positive integer, it follows that the left hand side of the identity represents $6x^2$ where $x$ is any integer. Now any integer $n$ can be written as

$$n = 6N + r, \quad r = 0, 1, 2, 3, 4, 5,$$

so that

$$n = 6(x_1^2 + x_2^2 + x_3^2 + x_4^2) + r.$$

By the identity $6x_i^2$ is representable as a sum of 12 biquadrates. Therefore $n$ is the sum of at most $4 \times 12 + 5 = 53$ biquadrates. We take one further step. Any $n \ge 81$ is expressible as

$$n = 6N + t$$

where $N \ge 0$, and $t = 0, 1, 2, 81, 16$ and $17$ corresponding to $n \equiv 0, 1, 2, 3, 4, 5 \pmod 6$. But

$$17 = 2^4 + 1.$$

Therefore, following the method above, if $n \ge 81$, then it is the sum of $4 \times 12 + 2 = 50$ biquadrates. We can deal with $n \le 80$ easily: If $n \le 50$, then trivially $n = n \cdot 1^4$. If $50 < n \le 80$, then $n = 3 \cdot 2^4 + (n - 48) \cdot 1^4$ and this is the sum of $3 + n - 48 < 50$ biquadrates. □

The same method together with the identity

$$5040(a^2 + b^2 + c^2 + d^2)^4 = 6\sum(2a)^8 + 60\sum(a \pm b)^8 + \sum(2a \pm b \pm c)^8 + 6\sum(a \pm b \pm c \pm d)^8 \tag{2}$$

can be used to prove that $g(8) < \infty$. In this identity there are 840 8th powers on the right hand side, and since every $n \le 5039$ is expressible as a sum of at most 273 numbers $1^8$ and $2^8$, we see that

$$g(8) \le 840\,g(4) + 273 \le 42273.$$

Theorem 4.2. $G(3) \le 13$.
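The twelve-biquadrate identity used in the proof of Theorem 4.1 can be verified mechanically, and combined with a four-square decomposition it writes any $6x^2$ as a sum of 12 fourth powers. The Python sketch below is only an illustration: `twelve_biquadrates` lists the terms $(u \pm v)^4$ over the six pairs from $(a, b, c, d)$, and `four_squares` is a hypothetical brute-force helper that finds some representation of $x$ as a sum of four squares (it exists by Lagrange's theorem).

```python
from itertools import combinations
from math import isqrt


def twelve_biquadrates(a, b, c, d):
    """The 12 fourth powers (u+v)^4, (u-v)^4 over the six pairs from (a, b, c, d)."""
    terms = []
    for u, v in combinations((a, b, c, d), 2):
        terms.append((u + v) ** 4)
        terms.append((u - v) ** 4)
    return terms


def four_squares(x):
    """Some (a, b, c, d) with a^2 + b^2 + c^2 + d^2 = x, found by brute force."""
    r = isqrt(x)
    for a in range(r + 1):
        for b in range(a, r + 1):
            for c in range(b, r + 1):
                rest = x - a * a - b * b - c * c
                if rest < 0:
                    break
                d = isqrt(rest)
                if d * d == rest:
                    return a, b, c, d
    raise ValueError("no representation found")


if __name__ == "__main__":
    x = 7
    terms = twelve_biquadrates(*four_squares(x))
    print(four_squares(x), sum(terms), 6 * x * x)
```

Each variable occurs in three of the six pairs, which is why the fourth powers add up to $6\sum a^4 + 12\sum a^2b^2 = 6(a^2 + b^2 + c^2 + d^2)^2$.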

Proof. We start with the identity

$$\sum_{i=1}^{4}\bigl((z^3 + x_i)^3 + (z^3 - x_i)^3\bigr) = 8z^9 + 6z^3(x_1^2 + x_2^2 + x_3^2 + x_4^2). \tag{1}$$

If a number is expressible as

$$8z^9 + 6mz^3, \quad 0 \le m \le z^6, \tag{2}$$

then from (1) this number must be a sum of 8 cubes; this is because $m$ is expressible as $x_1^2 + x_2^2 + x_3^2 + x_4^2$, and $x_i \le z^3$. Let $z$ be a positive integer congruent to 1 (mod 6). We denote by $I_z$ the interval

… (3)

Clearly, for sufficiently large $z$, we have $\varphi(z + 6)$ …
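…, $x_s$, with the property that

$$x_1^h + \cdots + x_s^h = y_1^h + \cdots + y_s^h \quad (1 \le h \le k). \tag{1}$$

We also let $M(k)$ denote the least $s$ so that (1) holds, and furthermore,

$$x_1^{k+1} + \cdots + x_s^{k+1} \ne y_1^{k+1} + \cdots + y_s^{k+1}. \tag{2}$$

Theorem 6.1. $M(k) \ge N(k) \ge k + 1$.

Proof. From

$$x_1^h + \cdots + x_k^h = y_1^h + \cdots + y_k^h \quad (1 \le h \le k),$$

we have, by Newton's identities on power sums,

$$\prod_{i=1}^{k}(x - x_i) = \prod_{i=1}^{k}(x - y_i),$$

so that $y_1, \dots, y_k$ is only a permutation of $x_1, \dots, x_k$. □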

Theorem 6.2. $N(k) \le M(k) \le 2^k$.

Proof. Let $x_1, \dots, x_s;\ y_1, \dots, y_s$ be solutions to (1) and (2). Then

$$\sum_{i=1}^{s}\bigl((x_i + d)^h + y_i^h\bigr) = \sum_{i=1}^{s}\bigl(x_i^h + (y_i + d)^h\bigr) \quad (1 \le h \le k + 1), \tag{3}$$

$$\sum_{i=1}^{s}\bigl((x_i + d)^{k+2} + y_i^{k+2}\bigr) \ne \sum_{i=1}^{s}\bigl(x_i^{k+2} + (y_i + d)^{k+2}\bigr). \tag{4}$$

The proofs of these two formulae follow from the expansions of (3), (4) and applying (1), (2). Thus, if $M(k)$ exists, then taking $s = M(k)$ we have $M(k + 1) \le 2M(k)$. But $M(1) = N(1) = 2$, so that the theorem follows by mathematical induction. □

Theorem 6.3. $N(k) \le \tfrac12 k(k + 1) + 1$.

Proof. Suppose that $n > s!\,s^k$. Let $a_i$ $(i = 1, 2, \dots, s)$ run over $1, 2, \dots, n$. Then there are $n^s$ sets $a_1, a_2, \dots, a_s$. Each fixed set $a_1, a_2, \dots, a_s$ has at most $s!$ permutations. It follows that among the $n^s$ sets $a_1, a_2, \dots, a_s$ there are at least $n^s/s!$ sets no two of which are permutations of each other. Let

$$s_h(a) = a_1^h + \cdots + a_s^h, \quad h = 1, 2, \dots, k.$$

Then

$$s \le s_h(a) \le s n^h.$$

Therefore there are at most

$$\prod_{h=1}^{k}(s n^h - s + 1) < s^k n^{\frac12 k(k+1)}$$

sets of different

$$s_1(a),\ s_2(a),\ \dots,\ s_k(a). \tag{5}$$

Take $s = \tfrac12 k(k + 1) + 1$. Then, from $n > s!\,s^k$, we have

$$\frac{n^s}{s!} > s^k n^{s-1} = s^k n^{\frac12 k(k+1)}.$$

Therefore there are at least two different sets $a_1, a_2, \dots, a_s$ such that (5) takes the same values. Since these two sets are not permutations of each other, it follows that $N(k) \le s$, and the theorem is proved. □

We now write

$$[x_1, \dots, x_s]_k = [y_1, \dots, y_s]_k$$

to represent (1) and (2). From Theorem 6.1 and the following examples, we have:

Theorem 6.4. If $k \le 9$, then $M(k) = N(k) = k + 1$.

$[0, 3]_1 = [1, 2]_1$,

$[1, 2, 6]_2 = [0, 4, 5]_2$,

$[0, 4, 7, 11]_3 = [1, 2, 9, 10]_3$,

$[1, 2, 10, 14, 18]_4 = [0, 4, 8, 16, 17]_4$,

$[0, 4, 9, 17, 22, 26]_5 = [1, 2, 12, 14, 24, 25]_5$,

$[0, 18, 27, 58, 64, 89, 101]_6 = [1, 13, 38, 44, 75, 84, 102]_6$,

$[0, 4, 9, 23, 27, 41, 46, 50]_7 = [1, 2, 11, 20, 30, 39, 48, 49]_7$,

$[0, 24, 30, 83, 86, 133, 157, 181, 197]_8 = [1, 17, 41, 65, 112, 115, 168, 174, 198]_8$,

$[0, 3083, 3301, 11893, 23314, 24186, 35607, 44199, 44417, 47500]_9 = [12, 2865, 3519, 11869, 23738, 23762, 35631, 43981, 44635, 47488]_9$. □
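The numerical examples behind Theorem 6.4, and the doubling step in the proof of Theorem 6.2, can both be checked directly. The Python sketch below is an illustration, not part of the text: `is_solution` tests conditions (1) and (2) for a pair of lists, `EXAMPLES` transcribes the nine examples, and `double` (a hypothetical name) builds the pair $(x_i + d) \cup y$, $x \cup (y_i + d)$ appearing in (3) and (4). It assumes $d \ne 0$, which is what makes (4) an inequality.

```python
def power_sums(xs, k):
    """[sum of x^h for x in xs] for h = 1, ..., k."""
    return [sum(x ** h for x in xs) for h in range(1, k + 1)]


def is_solution(xs, ys, k):
    """Conditions (1) and (2): equal power sums for h <= k, unequal for h = k + 1."""
    return (len(xs) == len(ys)
            and power_sums(xs, k) == power_sums(ys, k)
            and sum(x ** (k + 1) for x in xs) != sum(y ** (k + 1) for y in ys))


EXAMPLES = {
    1: ([0, 3], [1, 2]),
    2: ([1, 2, 6], [0, 4, 5]),
    3: ([0, 4, 7, 11], [1, 2, 9, 10]),
    4: ([1, 2, 10, 14, 18], [0, 4, 8, 16, 17]),
    5: ([0, 4, 9, 17, 22, 26], [1, 2, 12, 14, 24, 25]),
    6: ([0, 18, 27, 58, 64, 89, 101], [1, 13, 38, 44, 75, 84, 102]),
    7: ([0, 4, 9, 23, 27, 41, 46, 50], [1, 2, 11, 20, 30, 39, 48, 49]),
    8: ([0, 24, 30, 83, 86, 133, 157, 181, 197],
        [1, 17, 41, 65, 112, 115, 168, 174, 198]),
    9: ([0, 3083, 3301, 11893, 23314, 24186, 35607, 44199, 44417, 47500],
        [12, 2865, 3519, 11869, 23738, 23762, 35631, 43981, 44635, 47488]),
}


def double(xs, ys, d):
    """Theorem 6.2 step: from a solution for k, build one with 2s terms for k + 1 (d != 0)."""
    return [x + d for x in xs] + list(ys), list(xs) + [y + d for y in ys]


if __name__ == "__main__":
    print(all(is_solution(xs, ys, k) for k, (xs, ys) in EXAMPLES.items()))
    xs, ys = EXAMPLES[1]
    print(double(xs, ys, 3))
```

With $d = 3$ the doubling of $[0, 3]_1 = [1, 2]_1$ gives $[3, 6, 1, 2]$ and $[0, 3, 4, 5]$; after cancelling the common term 3 this is the example $[1, 2, 6]_2 = [0, 4, 5]_2$ above.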

18.7 The Problem of Prouhet and Tarry

In this and the next sections we shall prove that

$M(k)$ … 0. For this set $a_1, \dots, a_{j-1}$ we can clearly set $a_j$ so that $\varphi_j > 0$. But $\varphi_1(a_1) = 1$, so that the theorem is proved. □

Theorem 7.3. Let $a_1, \dots, a_k$ be a set of positive integers satisfying Theorem 7.2. Let $Q \ge 1$ and $X_1, \dots, X_k$ be positive integers belonging to the intervals

… $(i = 1, 2, \dots, k)$.

Denote by $N$ the number of sets $(X_1, \dots, X_k)$ such that

$$X_1^k + \cdots + X_k^k, \quad X_1^{k-1} + \cdots + X_k^{k-1}, \quad \dots, \quad X_1 + \cdots + X_k$$

lie in intervals with lengths

$$O(Q^{k-1}),\ O(Q^{k-2}),\ \dots,\ O(Q),\ O(1)$$

respectively. Then

$$N = O(1).$$

Proof. Let $(X_1, \dots, X_k)$ and $(X_1', \dots, X_k')$ be two sets which satisfy the conditions of the theorem. Then

$$\sum_{j=1}^{k}\bigl(X_j^{h} - X_j'^{\,h}\bigr) = O(Q^{h-1}), \quad 1 \le h \le k.$$

Let $Y_i = X_i - X_i'$. Then

$$A_{11}Y_1 + \cdots + A_{1k}Y_k = O(Q^{k-1}), \quad \dots, \quad A_{k1}Y_1 + \cdots + A_{kk}Y_k = O(1),$$

where

$$A_{ij} = X_j^{k-i} + X_j^{k-i-1}X_j' + \cdots + X_j'^{\,k-i} \quad (1 \le i, j \le k).$$

Thus

…

The ratio of the product of the terms of the main diagonal of the determinant $|A_{k-i+1,j}|$ to that of $D_k$ in the previous theorem is clearly greater than

$$k!\,Q^{(k-1)+(k-2)+\cdots+2+1} = k!\,Q^{\frac12 k(k-1)}.$$

Also the ratio of the absolute value of each remaining term in the expansion of $|A_{k-i+1,j}|$ to the corresponding absolute value term for $D_k$ is smaller than

$$2^{\frac12 k(k-1)}\,k!\,Q^{\frac12 k(k-1)}.$$

We now take $H = 2^{\frac12 k(k-1)}$ in Theorem 7.2, so that we have

…

It is then easy to see that

$$\begin{vmatrix} O(Q^{k-1}) & A_{12} & \cdots & A_{1k} \\ \vdots & \vdots & & \vdots \\ O(1) & A_{k2} & \cdots & A_{kk} \end{vmatrix} = O\bigl(Q^{\frac12 k(k-1)}\bigr).$$

Therefore

$$Y_1 = O(1).$$

Similarly we have

$$Y_2 = O(1), \quad \dots, \quad Y_k = O(1). \qquad \Box$$

Theorem 7.4. Suppose that the conditions in Theorem 7.3 are satisfied. Let $\lambda_1 \ge 0, \lambda_2 \ge 0, \dots, \lambda_k \ge 0$. Then the number of sets $(X_1, \dots, X_k)$ such that

$$X_1^k + \cdots + X_k^k, \quad X_1^{k-1} + \cdots + X_k^{k-1}, \quad \dots, \quad X_1 + \cdots + X_k$$

lie in intervals with lengths

$$O(Q^{k-1+\lambda_k}),\ O(Q^{k-2+\lambda_{k-1}}),\ \dots,\ O(Q^{\lambda_1})$$

respectively is

$$O(Q^{\lambda_1 + \cdots + \lambda_k}).$$

Proof. Since an interval with length $O(Q^{k-i+\lambda_{k-i+1}})$ can be divided into $O(Q^{\lambda_{k-i+1}})$ intervals with lengths $O(Q^{k-i})$, the required result follows at once from Theorem 7.3. □

Now let $\beta = k/(k + 1)$ and $a_1, \dots, a_{k+1}$ be a set of positive integers satisfying the conditions of Theorem 7.2 (where we have replaced $k$ by $k + 1$). We suppose that

… $(1 \le u \le k + 1,\ 1 \le v \le l)$.

Denote by $r(n_1, \dots, n_k)$ the number of solutions to the system

$$\sum_{u=1}^{k+1}\sum_{v=1}^{l} y_{uv}^{h} = n_h \quad (1 \le h \le k).$$

We now prove the following theorems:

Theorem 7.5. There exists a set of integers $N_1, \dots, N_k$ such that

$$r(N_1, \dots, N_k) \ge c_1 Q^{(k+1)^2(1-\beta^l) - \frac12 k(k+1)}.$$

Proof. The number of different sets $(y_{uv})$ must be

$$\ge \prod_{u=1}^{k+1}\prod_{v=1}^{l}\tfrac12 a_u Q^{\beta^{v-1}} \ge c_2 Q^{(k+1)(1+\beta+\cdots+\beta^{l-1})} = c_2 Q^{(k+1)^2(1-\beta^l)}.$$

Since $|n_h| \le c_3 Q^h$, the number of different sets $(n_h)$ is

$$\le c_4 Q^{\frac12 k(k+1)}.$$

Therefore there must be a set of integers $N_1, \dots, N_k$ such that

$$r(N_1, \dots, N_k) \ge \frac{c_2}{c_4}\,Q^{(k+1)^2(1-\beta^l) - \frac12 k(k+1)}. \qquad \Box$$

Theorem 7.6. The number of solutions to the system

$$\sum_{u=1}^{k+1}\sum_{v=1}^{l} y_{uv}^{h} = N_h \quad (1 \le h \le k + 1)$$

is at most

$$c_5 Q^{\frac12 k(k+1)(1-\beta^l)}.$$

Proof. From

$$\sum_{u=1}^{k+1} y_{u1}^{h} = N_h - \sum_{u=1}^{k+1}\sum_{v=2}^{l} y_{uv}^{h} \quad (1 \le h \le k + 1)$$

and

… $(1 \le u \le k + 1,\ 1 \le v \le l)$,

we see that

$$y_{11}^{k+1} + \cdots + y_{k+1,1}^{k+1}, \quad y_{11}^{k} + \cdots + y_{k+1,1}^{k}, \quad \dots, \quad y_{11} + \cdots + y_{k+1,1}$$

lie in intervals with lengths

$$O(Q^{(k+1)\beta}),\ O(Q^{k\beta}),\ \dots,\ O(Q^{\beta})$$

respectively. We take $\lambda_u = u\beta - (u - 1) \ge 0$ in Theorem 7.4. Then, from

$$\sum_{u=1}^{k+1}\{u\beta - (u - 1)\} = \tfrac12\beta(k + 1)(k + 2) - \tfrac12 k(k + 1) = \tfrac12 k,$$

we see that the number of sets $(y_{11}, \dots, y_{k+1,1})$ is $O(Q^{k/2})$. Corresponding to each fixed set $(y_{11}, \dots, y_{k+1,1})$ the sums

$$y_{12}^{k+1} + \cdots + y_{k+1,2}^{k+1}, \quad y_{12}^{k} + \cdots + y_{k+1,2}^{k}, \quad \dots$$

clearly lie in intervals with lengths

$$O(Q^{(k+1)\beta^2}),\ O(Q^{k\beta^2}),\ \dots,\ O(Q^{\beta^2})$$

respectively. Replacing $Q$ by $Q^{\beta}$ in Theorem 7.4, we see that the number of different sets $y_{12}, \dots, y_{k+1,2}$ is $O(Q^{k\beta/2})$. Continuing this way the theorem is proved. □

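18.8 Continuation

Theorem 8.1. Denote by $W(k, j)$ the least integer $s$ such that the system

$$\sum_{i=1}^{s} x_{i1}^{h} = \sum_{i=1}^{s} x_{i2}^{h} = \cdots = \sum_{i=1}^{s} x_{ij}^{h} \quad (1 \le h \le k),$$

$$\sum_{i=1}^{s} x_{ip}^{k+1} \ne \sum_{i=1}^{s} x_{iq}^{k+1} \quad (p \ne q,\ 1 \le p, q \le j)$$

is soluble in integers. Then

$$W(k, j) \le (k + 1)\Bigl(\Bigl[\frac{\log\frac12(k + 2)}{\log(1 + \frac1k)}\Bigr] + 1\Bigr).$$

Proof. This theorem is an immediate consequence of the following theorem.

Theorem 8.2. Let

$$s \ge (k + 1)\Bigl(\Bigl[\frac{\log\frac12(k + 2)}{\log(1 + \frac1k)}\Bigr] + 1\Bigr).$$

Then, given any $j$, there are integers $x_{ip}$ $(1 \le i \le s,\ 1 \le p \le j)$ such that the system

$$\sum_{i=1}^{s} x_{i1}^{h} = \cdots = \sum_{i=1}^{s} x_{ij}^{h} \quad (1 \le h \le k), \qquad \sum_{i=1}^{s} x_{ip}^{k+1} \ne \sum_{i=1}^{s} x_{iq}^{k+1} \quad (p \ne q)$$

is soluble.

Proof. Let $r(N_1, \dots, N_k)$ be as defined in the previous section. By Theorem 7.5 there are $N_1, \dots, N_k$ such that

$$r(N_1, \dots, N_k) \ge c_1 Q^{(k+1)^2(1-\beta^l) - \frac12 k(k+1)}.$$

Corresponding to a set of solutions $(y_{uv})$ to the system

$$\sum_{u=1}^{k+1}\sum_{v=1}^{l} y_{uv}^{h} = N_h \quad (1 \le h \le k)$$

there is clearly a number $M$ such that

$$\sum_{u=1}^{k+1}\sum_{v=1}^{l} y_{uv}^{k+1} = M.$$

If such an $M$ has only $e$ $(\le j - 1)$ different values, say $M_1, M_2, \dots, M_e$, then, by Theorem 7.6, the number of solutions to the $e$ systems

$$\sum_{u=1}^{k+1}\sum_{v=1}^{l} y_{uv}^{h} = N_h \quad (1 \le h \le k), \qquad \sum_{u=1}^{k+1}\sum_{v=1}^{l} y_{uv}^{k+1} = M_i \quad (1 \le i \le e)$$

is at most $c_5\, e\, Q^{\frac12 k(k+1)(1-\beta^l)}$. From the definition of $M_i$ the number of solutions to this $e$-system is at least $r(N_1, \dots, N_k)$. On the other hand, if we take

$$l > \frac{\log\frac12(k + 2)}{\log(1 + \frac1k)},$$

then, for large $Q$, we have

$$c_1 Q^{(k+1)^2(1-\beta^l) - \frac12 k(k+1)} > c_5\, e\, Q^{\frac12 k(k+1)(1-\beta^l)},$$

giving a contradiction. Our theorem is proved. □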

Notes

18.1. Concerning the value of $g(k)$ in Waring's problem there is the following result: When $k > 6$ and

…

we have

…

(See Hua [30].) Moreover K. Mahler [41] proved that there exists a constant $k_0$ such that the above inequality holds whenever $k > k_0$. Unfortunately the method, which is based on Roth's theorem, is ineffective in the sense that it does not allow us to compute the value of $k_0$. J. R. Chen [18] proved that $g(5) = 37$. R. Balasubramanian proved that $19 \le g(4) \le 21$ (see [5]).

18.2. I. M. Vinogradov [61] has improved on his own result on $G(k)$ in Waring's problem: For sufficiently large $k$ we have

$$G(k) < k(2\log k + 4\log\log k + 2\log\log\log k + 13).$$

Chapter 19. Schnirelmann Density

19.1 The Definition of Density and its History

The purpose of this chapter is to prove the following two important results:

"There exists a positive integer $c$ such that every positive integer is the sum of at most $c$ primes."

"Let $k$ be any positive integer. Then there exists a positive integer $c_k$ (depending only on $k$) such that every positive integer is the sum of at most $c_k$ $k$-th powers."

These two results are obviously related to the Goldbach problem and the Waring problem. Indeed we can even say that these two results are the most fundamental first steps towards these two famous problems. We shall call them the Goldbach-Schnirelmann theorem and the Waring-Hilbert theorem respectively.

In this chapter we introduce the notion of density created by Schnirelmann. This notion is extremely elementary, and yet it allows us to establish the two historic results. Our proof of the Goldbach-Schnirelmann theorem differs slightly from Schnirelmann's original proof in that we replace the application of Brun's sieve method by Selberg's sieve method. Again our proof of the Waring-Hilbert theorem is not the original proof due to Hilbert, nor that due to Schnirelmann. We shall give instead the proof by Linnik, given in 1943, with some simplifications and modifications. In both these proofs the notion of Schnirelmann density occupies an important place. The definition of density is as follows:

Definition 1. Let $\mathfrak{A}$ denote a set of (distinct) non-negative integers $a$. Denote by $A(n)$ the number of positive integers in $\mathfrak{A}$ which do not exceed $n$; that is

$$A(n) = \sum_{1 \le a \le n} 1.$$

Suppose that there exists a positive number $\alpha$ such that $A(n) \ge \alpha n$ for every positive integer $n$. Then we say that the set $\mathfrak{A}$ has positive density, or that $\mathfrak{A}$ is a positive density set. The greatest $\alpha$ with this property is then called the density of $\mathfrak{A}$.

Obviously we have the following simple properties: (i) Since $A(n) \le n$, it follows that $\alpha \le 1$. (ii) If $\alpha = 1$, then $A(n) = n$ for all $n$ and so $\mathfrak{A}$ must include all the positive integers.

Exercise. Let $\tau \ge 1$. Determine the density of the set $1 + [\tau(n - 1)]$, $n = 1, 2, \ldots$.
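Definition 1 can be explored numerically. The following is only a sketch: the names `counting_function` and `truncated_density` are ours, and since the density is the greatest $\alpha$ with $A(n) \ge \alpha n$ for all $n$, a computation over $1 \le n \le N$ yields only an upper bound for it.

```python
from fractions import Fraction

def counting_function(members, n_max):
    """Return [A(0), A(1), ..., A(n_max)], where A(n) counts members a with 1 <= a <= n."""
    present = set(members)
    counts = [0]
    for n in range(1, n_max + 1):
        counts.append(counts[-1] + (1 if n in present else 0))
    return counts

def truncated_density(members, n_max):
    """Minimum of A(n)/n over 1 <= n <= n_max (an upper bound for the density)."""
    counts = counting_function(members, n_max)
    return min(Fraction(counts[n], n) for n in range(1, n_max + 1))

odd = [a for a in range(1, 201) if a % 2 == 1]
squares = [k * k for k in range(0, 15)]   # 0, 1, 4, ..., 196

print(truncated_density(odd, 200))      # 1/2
print(truncated_density(squares, 196))  # 1/15, attained at n = 195
```

For the squares the minimum keeps decreasing as the range grows, which illustrates that a set may contain 1 and still fail to have positive density.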


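19.2 The Sum of Sets and its Density

We now introduce the symbols $\mathfrak{B}, b, B(n), \beta$ and $\mathfrak{C}, c, C(n), \gamma$. The definitions for them are analogous to those for $\mathfrak{A}, a, A(n), \alpha$: that is, $b \in \mathfrak{B}$, $B(n) = \sum_{1 \le b \le n} 1$, and $\beta$ is the density of the positive density set $\mathfrak{B}$.

Definition. The set of integers of the form $a + b$ ($a \in \mathfrak{A}$, $b \in \mathfrak{B}$) is called the sum of the sets $\mathfrak{A}$, $\mathfrak{B}$, and is denoted by $\mathfrak{C}$. We also write $\mathfrak{A} + \mathfrak{B} = \mathfrak{C}$.

Theorem 2.1. Let $0 \in \mathfrak{A}$ and $\mathfrak{C} = \mathfrak{A} + \mathfrak{B}$. Then $\gamma \ge \alpha + \beta - \alpha\beta$.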

Proof. Since $\beta > 0$ we see that $1 \in \mathfrak{B}$. The following three types of numbers are positive integers in $\mathfrak{C}$; they are all different and are at most $n$.

(i) In $\mathfrak{B}$ we write $b_1 = 1, b_2, \ldots, b_{B(n)}$, the numbers being arranged in increasing order. Since $0 \in \mathfrak{A}$ we see that $b_1, b_2, \ldots, b_{B(n)}$ are members of $\mathfrak{C}$, and that there are $B(n)$ such members.

(ii) Corresponding to any $\nu$ where $1 \le \nu \le B(n) - 1$, the various numbers $a + b_\nu$, with $a \in \mathfrak{A}$ and $1 \le a \le b_{\nu+1} - b_\nu - 1$, are distinct positive integers not exceeding $n$ in the set $\mathfrak{C}$. This is because

$$a + b_\nu \ge 1 + b_\nu > b_\nu$$

and

$$a + b_\nu \le b_{\nu+1} - 1 < b_{\nu+1} \le n,$$

so that

$$b_\nu < a + b_\nu < b_{\nu+1}.$$

It is clear that the two types of numbers in (i) and (ii) are mutually distinct. For each fixed $\nu$ ($1 \le \nu \le B(n) - 1$), there are $A(b_{\nu+1} - b_\nu - 1)$ such numbers $a + b_\nu$ in $\mathfrak{C}$.

(iii) For $a \in \mathfrak{A}$, $1 \le a \le n - b_{B(n)}$, the numbers $a + b_{B(n)}$ are distinct positive integers not exceeding $n$ in the set $\mathfrak{C}$. Since $a + b_{B(n)} \ge 1 + b_{B(n)}$ we see that these numbers of type (iii) are different from those in types (i) and (ii), and there are $A(n - b_{B(n)})$ such numbers $a + b_{B(n)}$.

From the results of (i), (ii) and (iii) we have

$$C(n) \ge B(n) + \sum_{\nu=1}^{B(n)-1} A(b_{\nu+1} - b_\nu - 1) + A(n - b_{B(n)})$$
$$\ge B(n) + \sum_{\nu=1}^{B(n)-1} \alpha(b_{\nu+1} - b_\nu - 1) + \alpha(n - b_{B(n)})$$
$$= B(n) + \alpha\{b_{B(n)} - b_1 - (B(n) - 1) + n - b_{B(n)}\}$$
$$= B(n) + \alpha\{n - B(n)\} \ge (1 - \alpha)\beta n + \alpha n$$
$$= n(\alpha + \beta - \alpha\beta),$$

and hence

$$\frac{C(n)}{n} \ge \alpha + \beta - \alpha\beta, \qquad \gamma \ge \alpha + \beta - \alpha\beta. \qquad \Box$$

Note: This theorem is not the best concerning the density of the sum of sets. The sharpest result should be $\gamma \ge \min(1, \alpha + \beta)$, a theorem proved by Mann in 1942. The proof of Mann's theorem is more complicated, and since there is no fundamental improvement concerned with the applications to the principal results in this chapter, we do not include it in this book. Let us now take $\mathfrak{A}$ and $\mathfrak{B}$ both to be sets of positive integers congruent to 1 mod $q$, and assume also that $0 \in \mathfrak{A}$. Then $\mathfrak{A} + \mathfrak{B}$ includes all the positive integers congruent to 1, 2 mod $q$. Obviously the densities of $\mathfrak{A}$ and $\mathfrak{B}$ are $1/q$ while the density of $\mathfrak{A} + \mathfrak{B}$ is $2/q$. Therefore the result of Mann cannot be improved.
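The example in the Note can be checked on a finite range. This is a sketch only: the helper names are ours, $q = 5$ and the range $N = 300$ are arbitrary choices, and the densities are computed over $1 \le n \le N$, so they are upper bounds for the true densities (here they happen to agree with them).

```python
from fractions import Fraction

def density(members, n_max):
    """Minimum of (number of members in [1, n]) / n over 1 <= n <= n_max."""
    present = set(members)
    count, best = 0, Fraction(1)
    for n in range(1, n_max + 1):
        count += n in present          # True counts as 1
        best = min(best, Fraction(count, n))
    return best

def sumset(a_set, b_set, n_max):
    """All a + b with a in a_set, b in b_set and a + b <= n_max."""
    return {a + b for a in a_set for b in b_set if a + b <= n_max}

N, q = 300, 5
B = {m for m in range(1, N + 1) if m % q == 1}
A = {0} | B                            # same residue class, with 0 adjoined
C = sumset(A, B, N)

alpha, beta, gamma = density(A, N), density(B, N), density(C, N)
print(alpha, beta, gamma)                       # 1/5 1/5 2/5
print(gamma >= alpha + beta - alpha * beta)     # True (Theorem 2.1)
print(gamma == min(Fraction(1), alpha + beta))  # True (equality in Mann's bound)
```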

Theorem 2.2. Let $0 \in \mathfrak{A}$ and $\alpha + \beta \ge 1$. Then $\gamma = 1$; that is the set $\mathfrak{C}$ contains all the positive integers.

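$$\frac{\log 2}{-\log(1 - \alpha)},$$

so that

$$(1 - \alpha)^{s_0/2} \le (1 - \alpha)^{\frac{\log 2}{-\log(1 - \alpha)}} = e^{-\frac{\log 2}{\log(1 - \alpha)} \cdot \log(1 - \alpha)} = \frac{1}{2},$$

and hence

$$\alpha_{s_0/2} \ge 1 - (1 - \alpha)^{s_0/2} \ge 1 - \tfrac{1}{2} = \tfrac{1}{2}.$$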

Since $0 \in \mathfrak{A}_{s_0/2}$ the set $\mathfrak{A}_{s_0} = \mathfrak{A}_{s_0/2} + \mathfrak{A}_{s_0/2}$ must, by Theorem 2.2, include all the positive integers and therefore every positive integer is expressible as the sum of $s_0$ members of $\mathfrak{A}$. $\Box$

Theorem 2.4. Let $\mathfrak{A}^*$ be a collection of non-negative integers, with multiplicity of membership being allowed. Let $\mathfrak{A}$ be the largest set from $\mathfrak{A}^*$ without multiplicity of membership. Let $r(a)$ denote the multiplicity of $a$ in $\mathfrak{A}^*$. Suppose that

$$\frac{1}{n} \cdot \frac{\Big(\sum_{1 \le a \le n} r(a)\Big)^2}{\sum_{1 \le a \le n} r^2(a)} \ge \alpha' \quad (> 0)$$

holds for all $n \ge 1$. Then $\mathfrak{A}$ has a positive density $\alpha \ge \alpha'$.

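Proof. From the Bunyakovsky-Schwarz inequality (Theorem 18.7.1) we have

$$\Big(\sum_{1 \le a \le n} r(a)\Big)^2 \le \sum_{\substack{1 \le a \le n \\ r(a) \ge 1}} 1^2 \sum_{1 \le a \le n} r^2(a) = A(n) \sum_{1 \le a \le n} r^2(a),$$

so that

$$\frac{A(n)}{n} \ge \frac{1}{n} \Big(\sum_{1 \le a \le n} r(a)\Big)^2 \Big/ \sum_{1 \le a \le n} r^2(a) \ge \alpha'. \qquad \Box$$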
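The inequality behind Theorem 2.4 is easy to test on a concrete collection. The sketch below is ours and is not part of the proof: it takes the sums $x^2 + y^2$ with $0 \le x, y \le 10$ as an illustrative collection with multiplicity and compares the number of distinct members in $[1, n]$ with $(\sum r(a))^2 / \sum r^2(a)$.

```python
from collections import Counter

def multiplicities(collection, n):
    """Counter mapping each a with 1 <= a <= n to its multiplicity r(a)."""
    return Counter(a for a in collection if 1 <= a <= n)

def cauchy_schwarz_lower_bound(collection, n):
    """(sum of r(a))^2 / (sum of r(a)^2), a lower bound for A(n)."""
    r = multiplicities(collection, n)
    total = sum(r.values())
    squares = sum(v * v for v in r.values())
    return total * total / squares if squares else 0.0

# Illustrative collection: all sums of two squares, counted with multiplicity.
collection = [x * x + y * y for x in range(0, 11) for y in range(0, 11)]
n = 100
distinct = len({a for a in collection if 1 <= a <= n})   # this is A(n)
bound = cauchy_schwarz_lower_bound(collection, n)
print(distinct >= bound)  # True
```

The point of the theorem is that a lower bound for $\sum r(a)$ together with an upper bound for $\sum r^2(a)$ controls $A(n)$ from below, without any need to know which integers are actually represented.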

19.3 The Goldbach-Schnirelmann Theorem

In §§3-5, the letters $c_1, c_2, \ldots$ denote absolute positive constants. The purpose of §§3-5 is to prove

Theorem 3.1. There exists a positive integer $c$ such that every integer greater than 1 is the sum of at most $c$ prime numbers.

We define $\mathfrak{A}^*$ to be the collection of numbers 1 together with $p_1 + p_2$ where $p_1, p_2$ run through all the prime numbers. Note that members of $\mathfrak{A}^*$ may have multiplicity. We also define $\mathfrak{A}$ to be the largest set from $\mathfrak{A}^*$ without multiplicity of membership. In order to prove Theorem 3.1 it suffices to prove

Theorem 3.2. $\mathfrak{A}$ has positive density

Cl.

By Theorem 2.3 any positive integer $m$ is expressible as a sum of at most $s_0$ members of $\mathfrak{A}$ (that is, a sum of terms involving 1 and numbers of the form $p_1 + p_2$). This implies that $m$ is the sum of at most $2s_0$ numbers which are primes or 1. Therefore, for any $n > 2$, we have

$$n = 2 + (n - 2) = 2 + b \cdot 1 + \sum p,$$

where the number of primes $p$ being summed is at most $2s_0 - b$. Since $2 + b$ is expressible as a sum of at most $b + 1$ primes, it follows that $n$ is expressible as a sum of at most $2s_0 + 1$ primes. Therefore Theorem 3.1 follows from Theorem 3.2.

We now let $r(1) = 1$ and $r(a)$ be the multiplicity of $a$ in the collection $\mathfrak{A}^*$, that is

$$r(a) = \begin{cases} 1, & \text{if } a = 1, \\ \sum_{p_1 + p_2 = a} 1, & \text{if } a \ge 2. \end{cases}$$
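The multiplicity $r(a)$ can be tabulated directly. The following sketch is ours (the function names are not from the text); it counts ordered pairs $(p_1, p_2)$, since both primes run independently through all the primes, and it assumes $n \ge 2$.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n (n >= 2 assumed)."""
    is_prime = [False, False] + [True] * (n - 1)
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, n + 1, p):
                is_prime[m] = False
    return [p for p in range(2, n + 1) if is_prime[p]]

def goldbach_multiplicities(n):
    """List r with r[1] = 1 and r[a] = #{(p1, p2) : p1 + p2 = a} for 2 <= a <= n."""
    primes = primes_up_to(n)
    r = [0] * (n + 1)
    r[1] = 1
    for p1 in primes:
        for p2 in primes:
            if p1 + p2 > n:
                break                  # primes are increasing, so stop early
            r[p1 + p2] += 1
    return r

r = goldbach_multiplicities(20)
print(r[4], r[5], r[10], r[11])        # 1 2 3 0

# A(n): how many integers in [1, n] occur at all in the collection.
A_20 = sum(1 for a in range(1, 21) if r[a] > 0)
print(A_20)
```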

Ll.,;;a

Following Theorem 2.4 our aims are to find a lower bound for O.



L

Jl(mk) Jl(m)-l~m~l;lk sfimk) (m,k)= 1

L

Akg(k) = Jl(k) Jl2(m) sfik) 1 ~m~l;lk fim) (m,k)=l Jl(mk) Jl(m)--, 1 ~m~l;lk sfimk)

L

and hence, by Theorem 6.3.2, Jl(m)

L

sfim)

L

Akmg(km) =

1 ~k~l;lm

A,g(r).

1 ~'~l;

m!,

Therefore, by (7) and (8) we have

Q= The

requi~....d

L

1 ~d~l;

fid) {Jl(d)}2 sf(d)

= 12 s

L

1 ~dq

result follows from (6) and (8).

= : = ~.

Jl2(d) fid)

s

s

D

Theorem 4.3. Let the conditions in Theorem 4.2 hold. multiplicative function, and gl(P) = g(p), then

If glen)

is a completely

M

Nl;~----

L

gl(k)

We first establish the following:

Theorem 4.4. Let fin) be a completely multiplicative function satisfying 0 If f3n ~ 0, then

L

f3nf(n)

IT {1

- fip)}

-1

L

~

L

fin)

13m,

min

~n~l;

1

~

pl;';;=O>P!k m

where pi;';; => plkm means that n/m has only the prime divisors of k m. Proof

L

f3nfin)

IT {I -

fip)}

-1

=

L

00

IT L:

f3nfin)

p!k n

=

L

00

f3nfin)

IT L fipm) = L p!k n

=

L l~n~~

L r=l

p!,=o>p!k n

00

f3J(n)

f(nr) =

L l~n~~

00

f3n

L

,=1 p!,=o>p!k n

m=O

00

f3n

{fip)}m

m=O

L s=l

n!s

pl~ =o>p!k n

fis)

fir)

f(p) < 1.

523

19.4 Selberg's Inequality 00

Lfts)

f3n ?:-

L

s= 1

1

~n~,:

L 1

fts)

~s:::;~

nls

L

fts)

l"'s"'~

f3n

nls

pi; =>Plkn =

L 1 ~n~~

pi; =>Plkn

L

0

f3n·

nls

pi; =>Plkn

Proof of Theorem 4.3. We have, by (4),

+ J-l(p) = _1__ 1 =

f(p) = J-l(1) g(p)

g(1)

1 - g(p) .

g(p)

g(p)

If k is square-free, then, by Theorem 6.2.2,

2(k) ( ) ng1(p) _J-l_ = J-l2(k) gl p = J-l2(k)_:..:..plk~_ _ ftk) plk 1 - gl(P) {I - gl(P)} plk =J-l 2(k)gl(k)n{1-g1(p)}-1. plk

n

The above still holds when k Theorem 4.4,

L

n

(9)

= 1 and when k has square divisors. Therefore, by

J-l2(k) Irk) =

1 "'k"'~ J\

L 1 "'k"'~

?:-

L l"'k"'~

J-l 2(k)gl(k)n{1-g1(p)}-1 plk gl(k)

L J-l2(m). mlk pl~ =>plm

Let dk be the greatest square-free divisor of k, so that dklk. If p\!!-.-, thenplk and

dk

so pldk. Therefore dk is a number satisfying the condition on m, so that

J-l2(k) -k-?:- L gl(k). l"'k"'~ ft) 1H"'~ L

(10)

From (9) we see that J-l 2(k)lf(k) ?:- 0 and so, by (5) and (9) we have that

~ J-l2(k) _ J-l2(k) ~ n{l ( )}-1 . k ....". ftk)g(k) - ftk)gl (k) "" plk - gl P

1,1.1

When $k = 1$ or $k$ is square-free, $g(k) = g_1(k)$, and if $k$ is not square-free, $\mu(k) = 0$; therefore the above holds for all $k$. The theorem now follows from (10) and Theorem 4.2. $\Box$

Theorem 4.5. Let $A \ge 0$, $M \ge 3$ and denote by $\pi(A; M)$ the number of primes between $A$ and $A + M$. Then

$$\pi(A; M) \le \frac{2M}{\log M}\Big\{1 + O\Big(\frac{\log\log M}{\log M}\Big)\Big\}.$$

The implied constant here is independent of $A$ and $M$.

Proof. Let

L

=

S(A;M)

1,

A+JMk2)

6k 1 . 6k 2

1 a 112(k) ~-'-2-1'""-+ 36~6. C7 log ~ kla k

L

We take ~

= a1o,

D

and the theorem follows from (1).

Proof of Theorem 3.4. When n

L 1 ';;a';;n

r2(a) ~

1+

~

2, we have

L c; 4';;a';;n

a2 -4-

112(k ) 112(k ) L _1'""_1_ L _1'""_2_

log a kda

k1

k21a

k2

528


l:::;a~n

k,k2 (k •• k2)

Ia

n

k1k2 (kl> k 2) Since (kl> k 2) ~ min{kl> k 2} ~ Jk 1k 2, it follows that ,,2

-

n

2

2

"

n

r (a) ~Y"+ Cs -1- 4 L. k k 3/2 1 "'a"'n og n 1 "'k •• k2 "'n ( 1 2) L.

1

~ + C~~( ~ log4 n

_1_)2

'::1 k

k

3/2

n3

~ C4-4 -- '

log n


D

Exercise 1. Let $x, k, l$ be positive integers, and $(l, k) = 1$. Denote by $\pi(x; k, l)$ the number of primes in the arithmetic progression $kn + l$ ($n = 1, 2, \ldots$) not exceeding $x$, and let $0 < \delta < 1$. Prove that, for $k < x^\delta$, $\pi(x; k, l) \le$

2x

(

x q>(k) logk

1+0

((lOg log logx

where the implied constant depends at most on

X)2)) ,

(j.

Exercise 2. When $p$, $p + 2$ are both primes, we call them a pair of "prime twins". Denote by $Z_2(N)$ the number of pairs of "prime twins" not exceeding $N$. Prove that

$$Z_2(N) \le c_8 \frac{N}{\log^2 N},$$

and that the series

$$\sum_{p^*} \frac{1}{p^*},$$

where the summation is over all "prime twins" $p^*$, is convergent.

19.6 The Waring-Hilbert Theorem

In §§6-7 the letters $c, c_1, c_2, \ldots$ denote positive constants depending only on $k$. The constants implied by the $O$-symbol also depend at most on $k$. The purpose of §§6-7 is to prove

Theorem 6.1 (Hilbert). Corresponding to each positive integer $k$ there exists a positive integer $c$ such that every positive integer is the sum of at most $c$ $k$-th powers.

We now define $\mathfrak{A}_s^*$ to be the collection of integers $x_1^k + \cdots + x_s^k$ where each $x_m$ runs over all the non-negative integers. We define $\mathfrak{A}_s$ to be the largest set of distinct elements from $\mathfrak{A}_s^*$. Let

The proof of Theorem 6.1 is divided into sections of a chain:

Theorem 6.2. If $k \ge 2$, then $\mathfrak{A}_{c_1}$ has positive density.

We see that Theorem 6.1 can be deduced at once from Theorem 2.3 and Theorem 6.2. We define $r(a)$ to be the number of solutions to

We first prove:

Theorem 6.3. If $n \ge 1$, then

$$\sum_{1 \le a \le n} r(a) \ge c_2(k)\, n^{c_1/k}.$$

Proof Clearly we can assume that n >

L

rea)

= -

Cl.

L

1+

O~a~n

1 ~a~n

Then

L

k =a xk+.··+x , n. We also note that, for any if q

=

0,

if q # O. o


1,,;~,,;nr2(a) ~ O,,;a~qpkC~+ ...~x~!=a ly O'::::;Xi~P 1 ~i~Cl

-II f. ... f. e21ti(X~+ +x~!)aI2 1

...

-

X!=o

o

drx

XC! =0

1

I

= Ixto e21tiXk"12C1 drx

~ cs(k)p 2q-k

o ~

c4(k)n 2C !lk-l

giving Theorem 6.4. Our aim therefore is to prove Theorem 6.5. Exercise. Deduce Theorem 6.5 from Theorem 6.4.

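19.7 The Proof of the Waring-Hilbert Theorem

Theorem 7.1. Let $X, Y \ge 1$, $n$ be an integer, and $q(n)$ denote the number of integer solutions to

$$x_1y_1 + x_2y_2 = n \qquad (|x_m| \le X,\ |y_m| \le Y,\ m = 1, 2). \tag{1}$$

Then

$$q(n) \le \begin{cases} 27X^{3/2}Y^{3/2}, & \text{if } n = 0; \\ 60XY \sum_{d \mid n} \dfrac{1}{d}, & \text{if } n \ne 0. \end{cases} \tag{2}$$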


Proof. 1) $n = 0$. Here the numbers of values taken by $x_1$, $x_2$ and $y_1$ cannot exceed $2X + 1$, $2X + 1$ and $2Y + 1$. When $x_1, x_2, y_1$ are specified, $y_2$ can only take one value. Therefore

$$q(0) \le (2X + 1)^2(2Y + 1) \le (3X)^2(3Y) = 27X^2Y,$$

and similarly $q(0) \le 27XY^2$, and hence

$$q(0) \le \min(27X^2Y, 27XY^2) \le \sqrt{27X^2Y \cdot 27XY^2} = 27X^{3/2}Y^{3/2}.$$

2) $n \ne 0$. We can assume without loss that $X \le Y$. Let $q_1(n)$ be the number of integer solutions to

$$x_1y_1 + x_2y_2 = n \qquad ((x_1, x_2) = 1,\ |x_2| \le |x_1| \le X,\ |y_m| \le Y,\ m = 1, 2). \tag{3}$$

Clearly $x_1 \ne 0$, since otherwise $x_2 = 0$ giving $n = 0$, contradicting our present hypothesis. Next, for a fixed set $x_1, x_2$ with $(x_1, x_2) = 1$, $|x_2| \le |x_1| \le X$ we denote by $q_2(n; x_1, x_2)$ the number of integer solutions in $y_1, y_2$ for (3). From Theorem 1.8.2 we see that (3) is soluble, and if $y_1', y_2'$ is a set of solutions, then all the solutions are given by

$$y_1 = y_1' + x_2t, \qquad y_2 = y_2' - x_1t, \qquad t \text{ integer}.$$

Therefore

$$|t| = \frac{|y_2' - y_2|}{|x_1|} \le \frac{Y + Y}{|x_1|} = \frac{2Y}{|x_1|},$$

and hence the number of values taken by $t$ does not exceed

$$2 \cdot \frac{2Y}{|x_1|} + 1 \le \frac{4Y + X}{|x_1|} \le \frac{5Y}{|x_1|},$$

that is

$$q_2(n; x_1, x_2) \le \frac{5Y}{|x_1|}.$$

Therefore

$$q_1(n) \le \sum_{1 \le |x_1| \le X}\ \sum_{|x_2| \le |x_1|} \frac{5Y}{|x_1|} \le 5Y \sum_{1 \le |x_1| \le X} \frac{2|x_1| + 1}{|x_1|} \le 5Y \cdot 3 \cdot 2X = 30XY.$$

It follows that, with the condition $(x_1, x_2) = 1$, the number of solutions to (1) does not exceed $2 \cdot 30XY = 60XY$. Next, if $(x_1, x_2) = d \ne 1$, $d \mid n$, then we let $x_1' = x_1/d$, $x_2' = x_2/d$, so that we now seek the number of integer solutions to

$$x_1'y_1 + x_2'y_2 = \frac{n}{d} \qquad ((x_1', x_2') = 1,\ |x_m'| \le X/d,\ |y_m| \le Y,\ m = 1, 2),$$

and we see from the above that this number does not exceed $60 \dfrac{X}{d} Y$.

Therefore, when $n \ne 0$,

$$q(n) \le 60XY \sum_{d \mid n} \frac{1}{d}.$$

The proof of the theorem is complete. $\Box$
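For small $X$ and $Y$ the case $n \ne 0$ of Theorem 7.1 can be compared with a direct count. The sketch below is ours: `q` enumerates all quadruples by brute force, `bound` evaluates $60XY\sum_{d \mid n} 1/d$, and the values $X = 2$, $Y = 3$ are arbitrary. It checks a few instances only and proves nothing.

```python
from itertools import product

def q(n, X, Y):
    """Number of integer solutions of x1*y1 + x2*y2 = n with |x_m| <= X, |y_m| <= Y."""
    xs = range(-X, X + 1)
    ys = range(-Y, Y + 1)
    return sum(
        1
        for x1, x2 in product(xs, repeat=2)
        for y1, y2 in product(ys, repeat=2)
        if x1 * y1 + x2 * y2 == n
    )

def bound(n, X, Y):
    """The bound 60*X*Y*sum_{d | n} 1/d of Theorem 7.1, for n != 0."""
    n = abs(n)
    return 60 * X * Y * sum(1 / d for d in range(1, n + 1) if n % d == 0)

X, Y = 2, 3
for n in range(1, 2 * X * Y + 1):
    assert q(n, X, Y) <= bound(n, X, Y)

print(q(1, 1, 1))   # 20
```

The constant 60 is generous for such small ranges; what matters in the sequel is the shape $XY\sum_{d \mid n} 1/d$ of the bound.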

Theorem 6.5 is obviously a consequence of the following

Theorem 7.2. Let $k \ge 2$, and $f(x)$ be a polynomial with degree $k$ having integer valued coefficients:

Then

(4) o

Proof When k

= 2, the left hand side of (4) is the number of integer solutions to

a2

=

0(1),

al

=

O(P),

1~m

~

4.

(5)

Let Xi - Yi = Z;, a2(xi + Yi) + al = Wi (1 ~ i ~ 4). We see that the number of solutions to (5) does not exceed the number of integer solutions to (Zi

= O(P), Wi = O(P), 1 ~ i

~

4).

(6)

If we denote by q(n) the number of integer solutions to

= O(P), Wi = O(P), m = 1, 2 where the constants implied by the O-symbol are the same as those of (6», then the number of solutions to (6) is Llnl ~C6PZ q2(n). From Theorem 7.1, we have

(Zi

=

0(P

6)+ 0 (p4

1

~dl'~~C6PZ d d dl~ I 1 2

(dl.dz) n 1 ~n~c6P2

1)


and the required result (4) follows. Suppose now that k ~ 3. We proceed by mathematical induction, and assume as induction hypothesis that the theorem holds when k is replaced by k - 1. From

£

£

Ix=o e 21tij(X)a.12 = x=o e- 21tij(x)a.

I

e21tij(x + h)a.

-x~h~P-x

P

I' I'

e 21tih 2 n " by " ;;:: 2n " provided that we also replace" R must contain a non-zero lattice point" by "there must be a non-zero lattice which lies in R or on its boundary". D

We can make the result sharper in the following sense.

Theorem 2.3. Denote by $Q$ the mid-point of the line joining the origin $O$ to the point $P$ on the convex body $R$. As $P$ runs over the points of $R$, the point $Q$ describes a convex body which we denote by $R_{1/2}$. Under the hypothesis of Theorem 2.2 we may strengthen the conclusion by assuming that the lattice point concerned lies outside $R_{1/2}$.

Proof. Denote by $\delta$ the greatest distance between $O$ and a boundary point of $R$. Take the integer $N$ satisfying $2^{N-1} \le \delta < 2^N$, so that the distance between $O$ and any boundary point of $R_{2^{-N}}$ is less than 1. Since $R_{2^{-N}}$ has no non-zero lattice point, the lattice point in Theorem 2.2 must lie outside $R_{2^{-N}}$. Therefore there exists an integer $m$ with the property that inside or on the boundary of $R_{2^{-m}}$, but outside $R_{2^{-m-1}}$, there is a lattice point $(x_1, \ldots, x_n)$. Now the lattice point

$$(2^m x_1, \ldots, 2^m x_n)$$

lies inside or on the boundary of $R$ but outside $R_{1/2}$. $\Box$


20. The Geometry of Numbers

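20.3 Linear Forms

Let $a_{rs}$ be real numbers, with the determinant

$$\Delta = |a_{rs}| \ne 0,$$

and let

$$\xi_r = \sum_{s=1}^{n} a_{rs}x_s, \qquad r = 1, 2, \ldots, n. \tag{1}$$

Take $R$ to be the region

$$|\xi_1| \le \lambda_1, \ \ldots, \ |\xi_n| \le \lambda_n.$$

This is a convex body symmetrical about the origin, and its volume is given by

$$\int \cdots \int_{|\xi_1| \le \lambda_1, \ldots, |\xi_n| \le \lambda_n} dx_1\, dx_2 \cdots dx_n = \int \cdots \int_{|\xi_1| \le \lambda_1, \ldots, |\xi_n| \le \lambda_n} \left|\frac{\partial(x_1, x_2, \ldots, x_n)}{\partial(\xi_1, \xi_2, \ldots, \xi_n)}\right| d\xi_1\, d\xi_2 \cdots d\xi_n = \frac{2^n \lambda_1 \lambda_2 \cdots \lambda_n}{|\Delta|}.$$

Therefore if $\lambda_1\lambda_2 \cdots \lambda_n > |\Delta|$, then $R$ contains a non-zero lattice point, and if $\lambda_1\lambda_2 \cdots \lambda_n \ge |\Delta|$, then there is a non-zero lattice point in $R$ or on its boundary. Therefore:

Theorem 3.1. Let $\xi_1, \ldots, \xi_n$ be $n$ real linear forms in $x_1, \ldots, x_n$ with determinant $\Delta$. Let $\lambda_1, \ldots, \lambda_n$ be positive numbers satisfying $\lambda_1\lambda_2 \cdots \lambda_n \ge |\Delta|$. Then there exist integers $x_1, x_2, \ldots, x_n$, not all zero, such that

$$|\xi_1| \le \lambda_1, \ |\xi_2| \le \lambda_2, \ \ldots, \ |\xi_n| \le \lambda_n.$$

Theorem 3.2. The conclusion of Theorem 3.1 can be strengthened to the following: there exist integers $x_1, x_2, \ldots, x_n$, not all zero, such that

$$|\xi_1| \le \lambda_1, \ |\xi_2| < \lambda_2, \ \ldots, \ |\xi_n| < \lambda_n.$$

Proof. Let $\varepsilon > 0$. By Theorem 3.1 there are integers $x_1, \ldots, x_n$, not all zero, such that

$$|\xi_1| \le (1 + \varepsilon)^{n-1}\lambda_1, \quad |\xi_2| \le \frac{\lambda_2}{1 + \varepsilon} < \lambda_2, \quad \ldots, \quad |\xi_n| \le \frac{\lambda_n}{1 + \varepsilon} < \lambda_n.$$

Now let $\varepsilon \to 0$. From the discrete nature of integral points the theorem is proved. $\Box$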


+ 1, and take

If we replace n by n

e. =

x.

(1 ~ v ~ n),

A. =

t 1/n

(1 ~ v ~ n),

1

= t'

An+ 1

then, from Theorem 3.2, we have: Theorem 3.3. There are always integers Xb . .. ,Xn and y, not all 0, such that

and Ix.1 ~ t 1/", where t is any positive number.

D

Again if we take (1

1

v ~ n),

(1 ~ v ~ n),

A+1=•

~

t

then we have: Theorem 3.4. Let 1X1' ••• , IXn be real numbers and t lattice point (X,YbY2, . .. ,Yn) such that 1 IIX.x - Y.I ... , X n ,

not all 0, such that

ei + ... + e; ~ 4 ( ILlI)2/n I n

'

ntn r(tn

+ 1)

+ 1), it

.

543

20.5 Products of Linear Forms

where

J.

~

(" )

. r ~+1

We can rewrite Theorem 4.1 differently. A positive definite quadratic form n

n

L L arsxrx.,

Q(X1>' .. ,Xn) =

ars

= asr

r= 1 s= 1

can be represented by

e

The determinant $\Delta$ of $\xi_1, \ldots, \xi_n$ is equal to the square root of $D = |a_{rs}|$. This is because $A = (a_{rs})$ is a positive definite matrix so that there exists a matrix $B$ such that $A = BB'$, $\Delta = |B| = D^{1/2}$. Therefore Theorem 4.1 can be stated as follows:

Theorem 4.2. Let $Q(x_1, \ldots, x_n)$ be a positive definite quadratic form with determinant $D$. Then there exists a non-zero point $(x_1, \ldots, x_n)$ such that (3)

Let Yn be the least constant with the following property: There exists a non-zero lattice point such that

In §1 we already remarked that Y2 = 2/.j3. Up to the present mathematicians have only determined the values of Yn for 2 ~ n ~ lO: Y4

= Ji, Ys

=

2,

Ys = Y9

18,

= 2,

In general, we know that Yn

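3 is unsolved. We now discuss the product of linear forms. We shall use the following result, known as the arithmetic-geometric means inequality.

Theorem 5.2. If $a_1 \ge 0, \ldots, a_n \ge 0$, then

$$(a_1 \cdots a_n)^{1/n} \le \frac{a_1 + \cdots + a_n}{n}.$$

Proof. 1) $n = 2^k$. We use induction on $k$. Since

$$(a_1a_2)^{1/2} \le \frac{a_1 + a_2}{2}$$

(which is equivalent to $(\sqrt{a_1} - \sqrt{a_2})^2 \ge 0$), we see that the result holds when $k = 1$. Assume now that the result holds when $n = 2^{k-1}$. Then when $n = 2^k$ we have

$$(a_1 \cdots a_{2^k})^{1/2^k} = \Big\{(a_1 \cdots a_{2^{k-1}})^{1/2^{k-1}} (a_{2^{k-1}+1} \cdots a_{2^k})^{1/2^{k-1}}\Big\}^{1/2}$$
$$\le \Big\{\frac{a_1 + \cdots + a_{2^{k-1}}}{2^{k-1}} \cdot \frac{a_{2^{k-1}+1} + \cdots + a_{2^k}}{2^{k-1}}\Big\}^{1/2}$$
$$\le \frac{a_1 + \cdots + a_{2^k}}{2^k}.$$

2) (Backward induction.) We now show that if the result holds for $n + 1$, then it holds for $n$. Take

$$a_{n+1} = \frac{a_1 + \cdots + a_n}{n}.$$

Then, from our induction hypothesis, we have

$$\Big(\frac{1}{n}\, a_1 \cdots a_n (a_1 + \cdots + a_n)\Big)^{\frac{1}{n+1}} = (a_1 \cdots a_{n+1})^{\frac{1}{n+1}} \le \frac{a_1 + \cdots + a_{n+1}}{n + 1}$$
$$= \frac{1}{n + 1}\Big\{a_1 + \cdots + a_n + \frac{1}{n}(a_1 + \cdots + a_n)\Big\} = \frac{a_1 + \cdots + a_n}{n},$$

which gives

$$(a_1 \cdots a_n)^{1/n} \le \frac{a_1 + \cdots + a_n}{n}. \qquad \Box$$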

From Theorem 5.1 and Theorem 5.2 we have at once:

Theorem 5.3. There exists a non-zero lattice point such that

$$|\xi_1 \cdots \xi_n| \le \frac{n!}{n^n}|\Delta|. \qquad \Box$$

Note. We can also deduce from Theorem 3.1 that there is a non-zero lattice point such that

$$|\xi_1 \cdots \xi_n| \le |\Delta|.$$

Since $n! < n^n$ whenever $n > 1$, our Theorem 5.3 here gives a better result. Denote by $\gamma_n$ the least positive constant such that, whenever $\gamma \ge \gamma_n$, there is a non-zero lattice point satisfying

$$|\xi_1 \cdots \xi_n| \le \gamma|\Delta|.$$

Up to the present we only know that $\gamma_2 = 1/\sqrt{5}$ and $\gamma_3 = \frac{1}{7}$ (Davenport).


20.6 Method of Simultaneous Approximations Theorem 6.1. Let OCt> ••• , OC n be real numbers. Then there exist a non-zero lattice point ~

(Xl> ... , xn) and an integer Y

I such that

i = 1,2, ... ,n. Proof We first consider

IXi - ociyl

+ I~I,,;; r,

I ,,;; i,,;; n,

This is a convex body symmetrical about the origin, and its volume is given by

f. f I~d

(

dXl··· dXndy

here ~i = Xi - OCiY, I ,,;; i ,,;; ~n+l

n,)

= y/t

+ I~n+ d <S;,

i= 1, ... ,n

I~d

+ I~n+ d <S;, i= 1, ... ,n

f··f

=Itl I~d

d~l···d~nd~n+l

+ I~n+ d <S;,

i= 1, ... ,n

~i+~"+ 1 ~r

i= 1, ... ,n

~i~O,~n+ l~O

2 + lit I =_ _ rn+l. n

n

+I

Therefore there is a non-zero lattice point (Xl, ... , Xn, y) such that

lx, _ "",I +

I~I.; (n ~ 1)"~'


, ; _n_ (n + 1);;+1, + It I 1

n

I

i = I, ... ,n.


Hence

I

O(i -

Xi

I~

Y

n (n

1 '

i = 1,2, ... , n.

0

+ 1)/+-;;-

This theorem is a slight improvement on Theorem 3.4. The best results at the present are: (Minkowski),

n+ 1 { 1 + (n---l)n+3}1/n

cn~--

n+1

n

(Blichfeldt).

Exercise. Let 0(. = fl. + iy. (v ::: 1, ... , n) be complex numbers. Then there are complex integers Zl>'" ,Zm W such that

20.7 Minkowski's Inequality

For $a_i \ge 0$ ($i = 1, \ldots, n$), $r > 0$ we define

$$M_r(a) = \Big\{\frac{1}{n}(a_1^r + \cdots + a_n^r)\Big\}^{1/r}. \tag{1}$$

When $r < 0$, and some $a_i = 0$, then the equation (1) has no meaning. In this case, we define $(a_1^r + \cdots + a_n^r)^{1/r} = 0$. Therefore, when $a_i \ge 0$, $r \ne 0$ we can always define

$$M_r(a) = \Big\{\frac{1}{n}(a_1^r + \cdots + a_n^r)\Big\}^{1/r}.$$

From now on we denote $a_i \ge 0$ ($i = 1, \ldots, n$) by $(a)$. We write $(a) > 0$ to mean $a_i > 0$ ($i = 1, \ldots, n$), and $(a) \ne 0$ to mean that not all the $a_i$ are zero. We also denote by $\max a$ and $\min a$ the largest and the smallest numbers in $a_i$ respectively. If there are non-zero real numbers $\lambda, \mu$ such that $\lambda a_i = \mu b_i$ ($i = 1, \ldots, n$), then we say that $(a)$ and $(b)$ are proportional.

Theorem 7.1. $\lim_{r \to \infty} M_r(a) = \max a$.

Proof. We can suppose that $r > 0$, so that

$$\Big\{\frac{1}{n}(\max a)^r\Big\}^{1/r} \le M_r(a) \le \{(\max a)^r\}^{1/r},$$

or

$$\Big(\frac{1}{n}\Big)^{1/r} \max a \le M_r(a) \le \max a.$$

Since

$$\lim_{r \to +\infty} \Big(\frac{1}{n}\Big)^{1/r} = \Big(\frac{1}{n}\Big)^0 = 1,$$

we have $\lim_{r \to +\infty} M_r(a) = \max a$. $\Box$

Theorem 7.2. $\lim_{r \to -\infty} M_r(a) = \min a$.

Proof. We can suppose that $r < 0$. We first consider the case $(a) > 0$. We have

$$M_r(a) = \frac{1}{M_{-r}(1/a)},$$

so that by Theorem 7.1,

$$\lim_{r \to -\infty} M_r(a) = \frac{1}{\lim_{-r \to +\infty} M_{-r}(1/a)} = \frac{1}{\max(1/a)} = \min a.$$

Finally when one $a_i = 0$, and $r < 0$, we see that both $M_r(a)$ and $\min a$ are zero. The theorem is proved. $\Box$
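The two limits can be observed numerically. The sketch below is ours: `power_mean` follows the definition (1) together with the convention for $r < 0$ when some $a_i = 0$, and the sample values and exponents $\pm 200$ are arbitrary.

```python
def power_mean(values, r):
    """M_r(a) = ((a_1^r + ... + a_n^r) / n)^(1/r) for r != 0 and a_i >= 0."""
    n = len(values)
    if r < 0 and any(v == 0 for v in values):
        return 0.0                     # convention adopted in the text
    return (sum(v ** r for v in values) / n) ** (1.0 / r)

a = [1.0, 2.0, 4.0]

# The sandwich (1/n)^(1/r) * max a <= M_r(a) <= max a from the proof of Theorem 7.1.
for r in (1, 5, 50, 200):
    m = power_mean(a, r)
    assert (1 / len(a)) ** (1 / r) * max(a) <= m <= max(a)

print(power_mean(a, 200))    # close to max a = 4
print(power_mean(a, -200))   # close to min a = 1
```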

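We write $G(a) = (a_1 \cdots a_n)^{1/n}$ for the geometric mean of $a_i$.

Theorem 7.3. $\lim_{r \to 0} M_r(a) = G(a)$.

Proof. 1) $r < 0$, and some $a_i = 0$. This case is trivial.

2) $r \ne 0$, $(a) > 0$. From (1) we have

$$M_r(a) = \Big\{\frac{1}{n}(a_1^r + \cdots + a_n^r)\Big\}^{1/r} = \exp\Big\{\frac{1}{r}\log\frac{1}{n}(a_1^r + \cdots + a_n^r)\Big\}.$$

We now let $r \to 0$ and apply L'Hospital's rule, giving

$$\lim_{r \to 0} \frac{1}{r}\log\Big\{\frac{1}{n}(a_1^r + \cdots + a_n^r)\Big\} = \lim_{r \to 0} \frac{\frac{1}{n}\sum_{i=1}^{n} a_i^r \log a_i}{\frac{1}{n}(a_1^r + \cdots + a_n^r)} = \frac{1}{n}\sum_{i=1}^{n} \log a_i.$$

Therefore

$$\lim_{r \to 0} M_r(a) = \lim_{r \to 0} \exp\Big\{\frac{1}{r}\log\frac{1}{n}(a_1^r + \cdots + a_n^r)\Big\} = \exp\Big\{\frac{1}{n}\sum_{i=1}^{n}\log a_i\Big\} = G(a).$$

3) $r > 0$, and some $a_i = 0$. We can assume that $a_1 > 0, \ldots, a_s > 0$, $a_{s+1} = a_{s+2} = \cdots = a_n = 0$, $s < n$. Then we have

$$M_r(a) = \Big\{\frac{1}{n}(a_1^r + \cdots + a_s^r)\Big\}^{1/r} = \Big\{\frac{s}{n} \cdot \frac{1}{s}(a_1^r + \cdots + a_s^r)\Big\}^{1/r} = \Big(\frac{s}{n}\Big)^{1/r}\Big\{\frac{1}{s}(a_1^r + \cdots + a_s^r)\Big\}^{1/r}.$$

From our earlier result we have

$$\lim_{r \to 0}\Big\{\frac{1}{s}(a_1^r + \cdots + a_s^r)\Big\}^{1/r} = (a_1 \cdots a_s)^{1/s},$$

and, since $s < n$,

$$\lim_{r \to 0}\Big(\frac{s}{n}\Big)^{1/r} = 0.$$

Therefore

$$\lim_{r \to 0} M_r(a) = \lim_{r \to 0}\Big\{\Big(\frac{s}{n}\Big)^{1/r}\Big\{\frac{1}{s}(a_1^r + \cdots + a_s^r)\Big\}^{1/r}\Big\} = 0 = G(a). \qquad \Box$$

Lemma 1. Let $\alpha + \beta = 1$, $\alpha > 0$, $\beta > 0$. Then for $s \ge 0$, $t \ge 0$, we have

$$s^\alpha t^\beta \le s\alpha + t\beta$$

with equality only when $s = t$.

Proof. The lemma is trivial if $s = t$ or if one of $s$, $t$ is 0. We assume therefore that $s$, $t$ are distinct positive numbers. If $s > t$, then $s/t > 1$. Also, $0 < \alpha < 1$, $1 - \alpha = \beta$, so that, from

$$\Big(\frac{s}{t}\Big)^\alpha - 1 = \alpha\int_1^{s/t} y^{\alpha - 1}\,dy \le \alpha\int_1^{s/t} dy = \alpha\Big(\frac{s}{t} - 1\Big),$$

we have

$$s^\alpha t^\beta \le s\alpha + t\beta.$$

Finally if $s^\alpha t^\beta = s\alpha + t\beta$, then

$$\alpha\int_1^{s/t} y^{\alpha - 1}\,dy = \alpha\int_1^{s/t} dy,$$

or

$$\int_1^{s/t}(y^{\alpha - 1} - 1)\,dy = 0,$$

which is impossible unless $s = t$. $\Box$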

Lemma 2 (Hölder's inequality). Let $\alpha + \beta = 1$, $\alpha > 0$, $\beta > 0$. When $(a)$ and $(b)$ are not proportional we have

$$\sum_{i=1}^{n} a_i^\alpha b_i^\beta < \Big(\sum_{i=1}^{n} a_i\Big)^\alpha \Big(\sum_{i=1}^{n} b_i\Big)^\beta.$$

Proof. Since $(a)$ and $(b)$ are not proportional, there exists $i$ ($1 \le i \le n$) such that

$$\frac{a_i}{\sum_{j=1}^{n} a_j} \ne \frac{b_i}{\sum_{j=1}^{n} b_j}.$$

Therefore, by Lemma 1, n

i= 1

(

.~ aj)~( .~ bj)fJ J= 1

J= 1

1),

(2)



(k < 1).

(3)

Proof. 1) $k > 1$. Here $k' = k/(k - 1) > 1$, $0 < 1/k < 1$, $0 < 1/k' < 1$, $1/k + 1/k' = 1$. By Lemma 2 we have

n

i~l aibi

n

=

i~l (a~)l/k(bnl/k'
0. The theorem

IS


Exercise 1. Prove that we can always select an integer $\alpha$ from the ideal $\mathfrak{a}$ such that

$$|N(\alpha)| \le M \cdot N(\mathfrak{a}).$$

Exercise 2. Prove that, given any ideal class, there is an ideal $\mathfrak{a}$ satisfying

$$N(\mathfrak{a}) \le M.$$

20.11 The Least Value for $|\Delta|$

We saw in the previous section that the discriminant $\Delta$ of an algebraic number field of degree $n$ satisfies

$$|\Delta| \ge \Big(\frac{\pi}{4}\Big)^{2r_2}\Big(\frac{n^n}{n!}\Big)^2.$$

Moreover, from $\Delta \equiv 0$ or $1 \pmod 4$, and $(-1)^{r_2}\Delta > 0$, we can construct the following table:

Table (I)
           r2 = 0        r2 = 1         r2 = 2
  n = 2    Δ ≥ 4         Δ ≤ -3
  n = 3    Δ ≥ 21        Δ ≤ -15
  n = 4    Δ ≥ 116       Δ ≤ -71        Δ ≥ 44
  n = 5    Δ ≥ 680       Δ ≤ -419       Δ ≥ 260

But actually the least value for $|\Delta|$ can be calculated to give

Table (II)
           r2 = 0        r2 = 1         r2 = 2
  n = 2    Δ = 5         Δ = -3
  n = 3    Δ = 49        Δ = -23
  n = 4    Δ = 725       Δ = -275       Δ = 117

The case $n = 2$ in Table (II) follows at once from considering the quadratic fields $R(\sqrt{5})$ and $R(\sqrt{-3})$. When $n = 3$, if $\vartheta$ satisfies $x^3 + x^2 - 2x - 1 = 0$, then the discriminant of $R(\vartheta)$ is 49, and if $\vartheta$ satisfies $x^3 - x - 1 = 0$, then the discriminant of $R(\vartheta)$ is $-23$. When $n = 4$, we let $\vartheta$ be a root of

The following can then be proved: 1) When $a = 7$, $p = 29$, we have $r_2 = 0$, $\Delta = 725$; 2) When $a = 3$, $p = 11$, we have $r_2 = 1$, $\Delta = -275$; 3) When $a = -1$, $p = 13$, we have $r_2 = 2$, $\Delta = 117$.
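The entries of Table (I) and the cubic and quadratic values in Table (II) can be recomputed mechanically. The sketch below is ours and rests on two assumptions: that the lower bound is Minkowski's $|\Delta| \ge (\pi/4)^{2r_2}(n^n/n!)^2$, and that for the polynomials quoted in the text the polynomial discriminant coincides with the field discriminant (in general the two differ by a square factor).

```python
import math

def table_one_entry(n, r2):
    """Least admissible value of Delta allowed by the bound, the sign rule and Delta = 0, 1 (mod 4)."""
    lower = (math.pi / 4) ** (2 * r2) * (n ** n / math.factorial(n)) ** 2
    sign = (-1) ** r2                  # sign of Delta is (-1)^r2
    m = math.ceil(lower)
    while (sign * m) % 4 not in (0, 1):
        m += 1
    return sign * m

def quadratic_discriminant(b, c):
    """Discriminant of x^2 + b x + c."""
    return b * b - 4 * c

def cubic_discriminant(a, b, c):
    """Discriminant of x^3 + a x^2 + b x + c."""
    return a * a * b * b - 4 * b ** 3 - 4 * a ** 3 * c - 27 * c * c + 18 * a * b * c

print([table_one_entry(n, 0) for n in (2, 3, 4, 5)])   # [4, 21, 116, 680]
print([table_one_entry(n, 1) for n in (2, 3, 4, 5)])   # [-3, -15, -71, -419]
print([table_one_entry(n, 2) for n in (4, 5)])         # [44, 260]

print(quadratic_discriminant(-1, -1))   # 5   (x^2 - x - 1, a generator of R(sqrt 5))
print(quadratic_discriminant(1, 1))     # -3  (x^2 + x + 1, a generator of R(sqrt -3))
print(cubic_discriminant(1, -2, -1))    # 49  (x^3 + x^2 - 2x - 1)
print(cubic_discriminant(0, -1, -1))    # -23 (x^3 - x - 1)
```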



The actual construction of Table (II) presents a problem. The case $n = 2$ in the table is very easily settled. When $n \ge 3$, the proof of Theorem 10.3 gives us a method whereby after a "finite number" of calculations we can arrive at the results given in Table (II). However, in actual practice, this method requires the calculations of the roots of about one thousand polynomial equations and the determination of the discriminants of the corresponding algebraic number fields. In order to solve this concrete problem we need a practical method. We now examine the situation when $n = 3$. Suppose that the cubic field $R(\vartheta)$ in our discussion has discriminant $\Delta$ which satisfies $0 < \Delta \le 49$ ($r_2 = 0$), or $-23 \le \Delta < 0$ ($r_2 = 1$). From §10 we see that there is a non-zero integer $\alpha$ in this field such that (1)

and

The degree of $\alpha$ is either 3 or 1. Suppose that the degree of $\alpha$ is definitely 3 so that $\alpha$ cannot be a rational integer and hence $R(\vartheta) = R(\alpha)$. From the inequality (1) we can determine a bound for the coefficients for the equations satisfied by $\alpha$, and the eventual result can be obtained after a finite number of calculations. Unfortunately we have no way of ensuring that $\alpha$ is not a rational integer. On the contrary, from $r > 3$, we see that $\alpha = \pm 1$ do satisfy (1) and $\pm 1$ belong to $R(\vartheta)$; therefore this method is not applicable. Let $\rho > 3$ and consider the convex body $B$:

$$|\xi_1| + |\xi_2| + |\xi_3| \le \rho, \qquad |\xi_1 + \xi_2 + \xi_3| < 3 \quad (< \rho),$$

where

and $\omega_1, \omega_2, \omega_3$ is an integral basis for $R(\vartheta)$. It is easy to see that $B$ is a convex body symmetrical about the origin. Denote by $F(t)$ the area of the intersection between the convex body $A$:

and the plane $\xi_1 + \xi_2 + \xi_3 = t$. Then $F(t) = F(-t)$, and when $t \ge 0$, $F(t)$ is decreasing. Therefore

563

20.11 The Least Value for ILlI 3

Volume of B

=2

p

f F(t)dt = 2~ f FGU )dU o

o p

~ 2~ fF(U)dU = ~ x Volume of A. P

p

o

But

Volume of A

=

{

233!~' 2

3

(

)

r2

when

r2 = 1.

=

0;

3

1 1t P ---

4

when

3!)23'

Therefore, by Minkowski's theorem, there is a non-zero integer a in R(f) satisfying when

(2)

when and (3)

Now we see from (3) that $\alpha$ certainly cannot be a rational integer. Therefore $\alpha$ has degree 3 and $R(\vartheta) = R(\alpha)$. Let the irreducible equation satisfied by $\alpha$ be

$$x^3 - g_1x^2 + g_2x - g_3 = 0. \tag{4}$$

Then $g_3 \ne 0$, and we can assume that $g_3 > 0$. For, if otherwise, from $-\alpha$ satisfying the equation

$$x^3 + g_1x^2 + g_2x + g_3 = 0$$

and $R(\vartheta) = R(\alpha) = R(-\alpha)$, and $-\alpha$ also satisfying (2) and (3), we can replace $g_3$ by $-g_3$. From the relationship between the roots and the coefficients we have

so that $|g_1| \le 2$ and $g_3 = 1$. Finally we find a bound for $g_2$ by

564

20. The Geometry of Numbers Ig21

+ OC(l)OC(3) + OC(2)OC(3)1 ~ IOC(1)OC(2)1 + IOC(l)OC(3)1 + IOC(2)OC(3)1 (loc(l)l + 11X(2)1 + IOC(3)1)2 't'2

=

IOC(1)OC(2)

~

~- 0 and 1X(1)OC(3) < 0, so that

that is $|g_2| \le 3$. Summarizing the above, in any cubic field $R(\vartheta)$ with discriminant $\Delta$ satisfying $0 < \Delta \le 49$ ($r_2 = 0$) or $-23 \le \Delta < 0$ ($r_2 = 1$) there is an integer $\alpha$ such that $R(\vartheta) = R(\alpha)$, and $\alpha$ satisfies an irreducible equation

$$x^3 - g_1x^2 + g_2x - 1 = 0$$

with $|g_1| \le 2$, $|g_2| \le 4$ (when $r_2 = 0$, $|g_2| \le 3$). Therefore, in order to determine cubic fields $R(\vartheta)$ with discriminant $\Delta$ satisfying $0 < \Delta \le 49$ ($r_2 = 0$) or $-23 \le \Delta < 0$ ($r_2 = 1$), we need only examine these irreducible equations. But the number of such equations is at most 45 (at most 35 when $r_2 = 0$). Moreover, when $g_1 = g_2$ the equation has the root 1, and when $g_1 + g_2 + 2 = 0$ the equation has the root $-1$, so that we have no need to examine these reducible equations. Finally since the roots of $x^3 - g_2x^2 + g_1x - 1 = 0$ are the reciprocals of the roots of $x^3 - g_1x^2 + g_2x - 1 = 0$, and $R(\vartheta) = R(1/\vartheta)$, the reciprocal equation to (4) need not be examined either. We are then left with 27 (18 when $r_2 = 0$) equations to be considered. We then calculate the roots $\vartheta$ of these 27 (or 18) equations and then determine the discriminants for $R(\vartheta)$ to arrive at the results for $n = 3$ in Table (II).
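The counts 45, 35, 27 and 18 quoted above can be reproduced by listing the pairs $(g_1, g_2)$. The sketch below is ours: it applies only the three reductions described in the text (root 1, root $-1$, and one equation from each reciprocal pair) and does not test the survivors for irreducibility.

```python
def candidate_equations(g2_bound, g1_bound=2):
    """Pairs (g1, g2) for x^3 - g1 x^2 + g2 x - 1 = 0 left after the reductions in the text."""
    kept = []
    for g1 in range(-g1_bound, g1_bound + 1):
        for g2 in range(-g2_bound, g2_bound + 1):
            if g1 == g2:
                continue               # the equation has the root 1
            if g1 + g2 + 2 == 0:
                continue               # the equation has the root -1
            # The reciprocal equation has coefficients (g2, g1); when it also lies
            # in the family, keep only one member of the pair.
            reciprocal_in_family = abs(g2) <= g1_bound and abs(g1) <= g2_bound
            if reciprocal_in_family and (g2, g1) < (g1, g2):
                continue
            kept.append((g1, g2))
    return kept

print(5 * 9, 5 * 7)                    # 45 35 equations before any reduction
print(len(candidate_equations(4)))     # 27
print(len(candidate_equations(3)))     # 18 (the case r2 = 0)
```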

Bibliography

1. Baker, A.: Linear forms in the logarithm of algebraic numbers. Mathematika 13 (1966) 204-216. (II) Mathematika 14 (1967) 102-107. (III) Mathematika 14 (1967) 220-228. (IV) Mathematika 15 (1968) 204-216
2. Baker, A.: Contribution to the theory of Diophantine equations I: On the representation of integers by binary forms. Phil. Tran. Roy. Soc. London, A 263 (1967) 273-291
3. Baker, A.: On the class number of quadratic fields. Bull. London Math. Soc. 1 (1969) 98-102
4. Baker, A.: Transcendental number theory. Cambridge University Press (1975)
5. Balasubramanian, R.: On Waring's problem: g(4) ≤ 21. Hardy-Ramanujan Journal 2 (1979) 1-32
6. Barban, M. B.: Arithmetic functions on thin sets. [Russian]. Dokl. UzSSR 8 (1961) 9-11
7. Barban, M. B.: The density of the zeros of Dirichlet L-series and the problem of the sum of primes and almost primes. [Russian]. Mat. Sbornik (N. S.) 61 (103) (1963) 418-425
8. Bombieri, E.: Sulle formula di A. Selberg generalizzate per classi di funzioni aritmetiche e le applicazioni al problema del resto nel "Primzahlsatz". Riv. Mat. Univ. Parma 2; 3 (1962) 393-440
9. Bombieri, E.: On the large sieve. Mathematika 12 (1965) 201-225
10. Bombieri, E.: Le grand crible dans la théorie analytique des nombres. Société Mathématique de France 18 (1974)
11. Bombieri, E., and Davenport, H.: Small differences between prime numbers. Proc. Roy. Soc. Ser. A, 293 (1966) 1-18
12. Burgess, D. A.: The distribution of quadratic residues and non-residues. Mathematika 4 (1957) 106-112
13. Burgess, D. A.: On character sums and primitive roots. Proc. London Math. Soc. 12 (1962) 179-192
14. Buchstab, A. A.: New results in the investigation of the Goldbach-Euler problem and the problem of prime pairs. [Russian]. Dokl. Akad. Nauk SSSR 162 (1965) 735-738 = Soviet Math. Dokl. 6 (1965) 729-732
15. Cassels, J. W. S.: An introduction to Diophantine approximation. Cambridge Tracts in Mathematics 45 (1957)
16. Chao, K.: On the diophantine equation $x^2 = y^n + 1$, $xy \ne 0$. Sci. Sin. 14, 3 (1965) 457-460
17. Chen, J. R.: On the circle problem. [Chinese]. Acta Math. Sinica 13 (1963) 299-313
18. Chen, J. R.: On Waring's problem: g(5) = 37. [Chinese]. Acta Math. Sinica 14 (1964) 715-734
19. Chen, J. R.: On the representation of a large even integer as the sum of a prime and the product of at most two primes. [Chinese]. Kexue Tongbao 17 (1966) 385-386
20. Chen, J. R.: On the representation of a large even integer as the sum of a prime and the product of at most two primes. Sci. Sinica 16 (1973) 157-176
21. Diamond, H., and Steinig, G. J.: An elementary proof of the prime number theorem with a remainder term. Inventiones Math. 11 (1970) 199-258
22. Dickson, L. E.: History of the theory of numbers. (Three volumes). Carnegie Institute, Washington (1919, 1920, 1923)
23. Elliot, P. D. T. A., and Halberstam, H.: Some applications of Bombieri's theorem. Mathematika 13 (1966) 196-203
24. Estermann, T.: Introduction to modern prime number theory. Cambridge Tracts in Mathematics 41 (1952)
25. Gauss, C. F.: Disquisitiones arithmeticae. Leipzig, Fleisher, (1801). English translation: A. A. Clarke, Yale University Press (1966)


26. Hagis, Jr., P.: A lower bound for the set of odd perfect numbers. Math. Comp. 27; 121 (1973) 951-953
27. Hagis, Jr., P., and McDaniel, W. L.: On the largest prime divisor of an odd perfect number, II. Math. Comp. 29 (1975) 922-924
28. Halberstam, H., and Richert, H.-E.: Sieve methods. Academic Press, London (1974)
29. Hardy, G. H., and Wright, E. M.: An introduction to the theory of numbers. 4th ed. Oxford (1960)
30. Hua, L. K.: Die Abschätzungen von Exponentialsummen und ihre Anwendung in der Zahlentheorie. Enzykl. Math. Wiss., I, 2, Heft 13. Teil I. Leipzig (1959)
31. Huxley, M. N.: On the difference between consecutive primes. Inventiones Math. 16 (1972) 191-201
32. Huxley, M. N.: Small differences between consecutive primes. Mathematika 20; 2 (1973) 229-232
33. Ingham, A. E.: The distribution of prime numbers. Cambridge Tracts in Mathematics 30 (1932)
34. Kolesnik, G. A.: The refined error term of the divisor problem. [Russian]. Mat. Zametki 6 (1969) 545-554
35. Korobov, N. M.: On the estimation of trigonometric sums and its applications. [Russian]. Uspeki Math. Nauk SSSR 13 (1958) 185-192
36. Landau, E.: Handbuch der Lehre von der Verteilung der Primzahlen. (2 Bände). Leipzig, Teubner (1909)
37. Landau, E.: Vorlesungen über Zahlentheorie. (3 Bände). Leipzig, Hirzel (1927)
38. Landau, E.: Über einige neuere Fortschritte der additiven Zahlentheorie. Cambridge Tracts in Mathematics 35 (1937)
39. Lavrik, A. V., and Soberov, A. S.: On the error term of the elementary proof of the prime number theorem. [Russian]. Dokl. Akad. Nauk SSSR 211 (1973) 534-536
40. Linnik, Yu. V.: The dispersion method in binary additive problems. Leningrad (1961) = Providence, R.I. (1963)
41. Mahler, K.: On the fractional parts of the powers of a rational number, II. Mathematika 4 (1957) 122-124
42. Minkowski, H.: Geometrie der Zahlen. Leipzig, Teubner (1910)
43. Minkowski, H.: Diophantine Approximation. Leipzig, Teubner (1927)
44. Montgomery, H. L.: Topics in Multiplicative Number Theory. Springer Lecture Notes 227 (1971)
45. Pan, C. T.: On the least prime in an arithmetic progression. [Chinese]. Sci. Rec., New Ser. 1 (1957) 283-286
46. Pan, C. T.: On the representation of an even integer as a sum of a prime and an almost prime. [Chinese]. Acta Math. Sinica 12 (1962) 95-106 = Chinese Math.-Acta 3 (1963) 101-112
47. Pan, C. T.: On the representation of even numbers as the sum of a prime and a product of at most 4 primes. [Chinese]. Acta Sci. Natur. Univ. Shangtung 2 (1962) 40-62 = Sci. Sinica 12 (1963) 455-474. [Russian]
48. Pan, C. T., Ding, X. X., and Wang, Y.: On the representation of a large even integer as a sum of a prime and an almost prime. Kexue Tongbao 8 (1975) 358-360
49. Richert, H.-E.: Zur multiplikativen Zahlentheorie. J. reine angew. Math. 206 (1961) 31-38
50. Roth, K. F.: On the large sieves of Linnik and Renyi. Mathematika 12 (1965) 1-9
51. Schmidt, W. M.: Simultaneous approximations to algebraic numbers by rationals. Acta Math. 125 (1970) 189-201
52. Schmidt, W. M.: Diophantine Approximations. Springer Lecture Notes 785 (1980)
53. Sierpiński, W.: Elementary theory of numbers. Warszawa (1964)
54. Slowinski, D.: Searching for the 27th Mersenne prime. J. Recreational Mathematics 11 (1979) 258-261
55. Stark, H. M.: A complete determination of the complex quadratic fields of class number 1. Michigan Math. J. 14 (1967) 1-27
56. Stepanov, S. A.: On the estimation of Weyl's sums with prime denominators. [Russian]. Izv. Akad. Nauk SSSR, Ser. Mat. (1970) 1015-1037
57. Titchmarsh, E. C.: The theory of the Riemann zeta-function. Oxford (1951)
58. Vaughan, R. C.: A note on Snirel'man's approach to Goldbach's problem. Bull. London Math. Soc. 8 (1976) 245-250
59. Vinogradov, A. I.: The density hypothesis for Dirichlet L-series. [Russian]. Izv. Akad. Nauk SSSR, Ser. Mat. 29 (1965) 903-934. Corrigendum: ibid. 30 (1966) 719-720


60. Vinogradov, I. M.: On a new estimation of a function $\zeta(1 + it)$. [Russian]. Izv. Akad. Nauk SSSR, Ser. Mat. 22 (1958) 161-164
61. Vinogradov, I. M.: On the problem of the upper estimation for G(m). [Russian]. Izv. Akad. Nauk SSSR, Ser. Mat. 23 (1959) 637-642
62. Wang, Y.: On the least primitive root. [Chinese]. Acta Math. Sinica 9 (1959) 432-441
63. Wang, Y.: On the estimation of character sums and its applications. [Chinese]. Sci. Record (N. S.) 7 (1964) 78-83
64. Wirsing, E.: Elementare Beweise des Primzahlsatzes mit Restglied, II. J. Reine Angew. Math. 214/215 (1964) 1-18
65. Yin, W. L.: On Dirichlet's divisor problem. Sci. Rec., New Ser. 3 (1959) 131-134

Index

Abel's lemma 120 Aequatio identica satis abstrusa 208 Algebraic number fields 425 Argument 338 Artin, E. 39 Association 431 -, left 368, 382 -, right 368 - modulo p 62 Baker, A. 493, 565 Balasubramanian, R. 513, 565 Barban, M. B. 565 Basis 399, 426 -, integral 427 -, standard 402 Base interchange formula 49 Bertrand's postulate 75, 82 Blichfeldt, H. F. 547 Bombieri, E. 100, 249, 565 Brun, V. 74, 514 Buchstab, A. A. 565 Burgess, D. A. 185, 337, 565

Cassels, J. W. S. 478, 493, 565 Chao Jung-Tze 255 Chao, K. 299, 504, 565 Character 152 - system 314, 445 -, improper 157 -, primitive 156 -, principal 152 -, standard factorization of 156 Chebyshev, P. L. 82 Chen, J. R. 99, 100, 147, 513, 565 Chowla, S. 100 Cofactor 372 -, algebraic 372 Commute 372 Congruent 22, 416 - modulo m 437 - modulo $\mathfrak{M}$ 402 Conjugates 425

Continued fraction 250 -, complete quotient of 252 -, n-th convergent of 250 -, periodic 260 -, simple 251 Convex body 538 - region 535 Coprime 5, 434 Countable 474 Cross ratio 342

Davenport, H. 100, 448, 534, 545, 565 Degree 423, 425 - of III 438 Density, asymptotic 113 -, p- 210 -, real 210 -, Schnirelmann 514 Diamond, H. 249, 565 Dickson, L. E. 276, 565 Dimension 399 Ding, X. X. 566 Diophantine equations 276 Diophantus 276 Dirichlet series 143 Dirichlet's divisor problem 147 Discriminant 300, 426 - of $R(\vartheta)$ 428 -, fundamental 322 Divisor 2, 57, 430 -, elementary 387 -, greatest common 5, 58, 394, 434 -, ideal 433 -, proper 2 -, right 389 Dyson, F. J. 478

Eisenstein, F. G. 39 Elliot, P. D. T. A. 101, 565 Elliptic 341 Enumerable 474 Equipotent 474 Equivalent 257, 350, 369


- form 301 - form mod q 309 - in the narrower sense 443 Erdős, P. 217 Euclidean algorithm 5 Euclidean distance 347 Euler, L. 76 - -Binet formula 290 - -'s constant 88, 112, 483 - -'s criterion 36 - -'s identity 191 Extended complex plane 339 Extension, algebraic 69 -, finite 424 -, single 424 -, ((i- 416

((i-convergent sequence 415 ((i-limit 415 Factor, invariant 387 -, repeated 63 Farey sequence 125 Fermat, P. de 288 - solution 25 - last theorem 151, 451, 488 Fibonacci sequence 252 Field 68, 424 -, Euclidean 447 -, simple 447 Finite order 342 Fixed point 341 Form, binary quadratic 300 -, (in)definite 301 -, primitive 307 -, reduced 304 Franklin, F. 197 Function, arithmetic 102 -, Chebyshev 217 -, (completely) multiplicative 13, 102 -, divisor 103, 111 -, Euler 103 -, generating 143 -, Möbius 103 -, Riemann zeta 144, 219 -, slowly decreasing 226 -, von Mangoldt 103 Fundamental circle 358 Fundamental region 351 Fundamental sequence 415 Furtwängler 39

Gauss, C. F. 37, 39, 47, 329, 565 Gelfond, A. O. 488 Genus 314, 445 -, principal 446

Geodesic 345 Goldbach's problem 74, 99, 151, 514 Graph 195 -, (self-)conjugate 196 Group 340 -, abelian 68 -, adjoint 391 Hagis, Jr., P. 566 Hajos 535, 542 Halberstam, H. 100, 101, 534, 566 Hardy, G. H. 101, 566 Heath-Brown, D. R. 100 Hensel, K. 405 Hensel's lemma 421 Hilbert, D. 39, 483, 488, 494, 514 Heilbronn, H. 329 Hua, L. K. 513, 566 Huxley, M. N. 100, 566 Hyperbolic 341 Ideal 58, 68, 432 - class 441 - divisor 433 -, prime 434 -, principal 432 -, product of 432 -, unit 432 Index 48 Inequality, arithmetic-geometric means 544 -, Bunyakovsky-Schwarz 508 -, Cauchy 330 -, Holder 550 -, Minkowski 547, 553 Ingham, A. E. 566 Integer 1 -, algebraic 423 -, rational 423 Inverse transformation 339 Involution 342 Iwaniec, H. 100 Jarnik, M. V. 123 Jacobi's symbol 44, 159

Khintchin, A. 494 Kolesnik, G. A. 147, 566 Korobov, N. M. 248, 566 Kronecker's symbol 185, 304 Kummer, E. 39, 431, 451 Kusmin 488 Lagrange interpolation formula 61 Lambert series 146 Landau, E. 566 Large sieve 100 Lattice point 40, 112 Lavrik, A. V. 566 Law of Quadratic Reciprocity 39 Lehmer, D. H. 26 Lehmer, D. N. 4 Legendre, A.-M. 39 Legendre's symbol 35, 152 Lobachevskian geometry 348, 354 Loxodromic 341 Linnik, Yu. 100, 101, 494, 514, 566 Littlewood, J. E. 73, 101

Mahler, K. 513, 566 Mann, H. B. 516 Markoff, A. A. 288 Matrix, adjoint (modular) 373, 390 -, composite 389 -, irreducible 389 -, (positive) modular 365, 372 -, (standard) prime 390 Mediant 127 Mersenne number 38, 449 Mersenne prime 450 Miller, J. C. P. 51 Minkowski, H. 544, 547, 556, 566 Möbius (inverse) transform 108 Modular transformation 257 Modulus 4 -, double 64 -, integral 68 Montgomery, H. L. 100, 566 Mordell, L. J. 538 Multiple 2 -, (left) least common 394, 8, 59

Niven, I. 486 Norm 425 - of $\mathfrak{M}$ 402 Normal form of Hermite 369, 384 - - of Smith 370, 386 Null sequence 415 Number, algebraic 423 -, cardinal 474 -, composite 3 -, Markoff 260, 288 -, Mersenne 38, 449 -, perfect 13 -, prime 3 -, square-free 113 -, triangular 191 -, transcendental 476

Order 48, 68 Otto 255

Pan, C. T. 100, 566 Parabolic 342 Partition 187 -, (self) conjugate 195, 196 Period 342 Point at infinity 339 Pólya, G. 185 Polynomial, associated 57 -, integral valued 17 -, (ir)reducible 20, 63 Primary solution 282 Prime in $R(\vartheta)$ 431 - mod p 63 - twins 74 Primitive root 48, 49, 68 Principal class 446 Proper solutions 279

Quadratic algebraic numbers 349 Quadratic, (non)-residue 35

Reduced points 351 Reduced quadratic form 358 Rényi, A. 100 Residue class 22 moth - moddp, rp(x) 67 -, (non)-k-th power 49 -, quadratic (non) 35 Residue system, complete 22, 64 - -, reduced 24, 64 Richert, H. E. 100, 147, 566 Riemann hypothesis 185, 488 Riemann sphere 339 Roth, K. F. 100, 478, 566

Schmidt, W. M. 493, 566 Schneider, T. 488 Schnirelmann, L. 514 Selberg, A. 217, 514 Siegel, C. L. 329, 478 Sierpinski, W. 566 Sieve of Eratosthenes 3 Slowinski, D. 566 Soon Go 276 Squaring the circle 488 Standard factorization 3 Stark, H. M. 566 Steinhaus, H. 123 Steinig, G. J. 249, 565 Stepanov, S. A. 566

Symbol, Jacobi 44, 159 -, Kronecker 185, 304 -, Legendre 35, 152

Takagi, T. 39 Theorem, Bombieri 101 -, Cauchy 497 -, Chebyshev 73, 79, 89, 266 -, Chinese remainder 22, 29 -, Dedekind's discriminant 438 -, Dirichlet 73, 97, 243 -, Eisenstein 20 -, Erdős-Fuchs 138 -, Euler 24, 36, 76 -, Fermat 18, 24 -, Fundamental - of arithmetic 1, 3, 6 -, Fundamental - for ideals 435 -, Gauss 20, 37 -, Hardy-Ramanujan 95 -, Heilbronn-Siegel 329 -, Hermite 485 -, Hilbert 61, 529 -, Hurwitz 256 -, Ikehara 228 -, Jacobi 208 -, Jacobsthal 176 -, Khintchin 266 -, Lagrange 208 -, Landau-Ostrowski-Thue 480 -, Legendre 261 -, Liouville 476 -, Lindemann 486 -, Mayer 288 -, Miller 96 -, Minkowski's Fundamental 535, 538 -, Pólya 172 -, prime number 73 -, Roth 478 -, Selberg 233, 520 -, Schur 329 -, Siegel 331, 335 -, Sierpinski 134 -, Soon Go 286 -, Stickelberger 428 -, Tchebotaref 556 -, Thue 479 -, unique factorization 58 -, Voronoi 137 -, Weyl 270, 272 -, Wilson 33 -, Wolstenholme 33 Thue, A. 478 Titchmarsh, E. C. 566 Trace 425 Transformation 373 -, (uni)modular 348, 373 Triangle 347 Turan, P. 95

Unit 424 -, fundamental 441 - circle 338

Valuation 408, 409 -, (non)Archimedean 411 -, equivalent 410 -, identical 409 -, p-adic 409 Vaughan, R. C. 534, 566 Vinogradov, A. I. 100, 566 Vinogradov, I. M. 74, 173, 248, 513, 567

Wang, Y. 185, 337, 566, 567 Weil, A. 185 Wheeler, D. J. 51 Wiener, N. 217 Wirsing, E. 249, 567 Wright, E. M. 566

Yin, W. L. 137, 147, 567
