2, then $\left(\frac{-1}{p}\right) = (-1)^{\frac12(p-1)}$. □
In other words, $-1$ is a quadratic residue or nonresidue mod $p$ according to whether $p \equiv 1$ or $3 \pmod 4$. It follows from this that every odd prime divisor of $x^2 + 1$ must be congruent to $1 \pmod 4$.

Theorem 2.2 (Gauss's Lemma). Let $p > 2$, $p \nmid n$. Denote by $m$ the number of least positive residues of the $\frac12(p-1)$ numbers $n, 2n, \ldots, \frac12(p-1)n \pmod p$ which exceed $p/2$. Then
$$\left(\frac{n}{p}\right) = (-1)^m.$$
Example 1. $p = 7$, $n = 10$. We have $10, 20, 30 \equiv 3, 6, 2 \pmod 7$. There is exactly one least positive residue which exceeds $\frac72$. Therefore $m = 1$ and $\left(\frac{10}{7}\right) = -1$.
Example 2. $p = 11$, $n = 2$. We have the residues $2, 4, 6, 8, 10 \pmod{11}$, and there are three which exceed $\frac{11}{2}$. Therefore $\left(\frac{2}{11}\right) = -1$.
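As a quick numerical check of Gauss's lemma, the following Python sketch counts $m$ directly and compares $(-1)^m$ with Euler's criterion. The helper names `legendre_euler`, `gauss_m` and `legendre_gauss` are our own illustrative choices, not taken from the text.

```python
def legendre_euler(n: int, p: int) -> int:
    """(n/p) for an odd prime p not dividing n, by Euler's criterion."""
    return 1 if pow(n % p, (p - 1) // 2, p) == 1 else -1   # otherwise the power is p - 1


def gauss_m(n: int, p: int) -> int:
    """m of Theorem 2.2: how many of n, 2n, ..., ((p-1)/2)n have least positive residue > p/2."""
    return sum(1 for k in range(1, (p - 1) // 2 + 1) if 2 * (k * n % p) > p)


def legendre_gauss(n: int, p: int) -> int:
    """(n/p) = (-1)^m, as asserted by Gauss's lemma."""
    return (-1) ** gauss_m(n, p)


print(gauss_m(10, 7), legendre_gauss(10, 7))   # Example 1: prints 1 -1
print(gauss_m(2, 11), legendre_gauss(2, 11))   # Example 2: prints 3 -1
```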
Proof of Theorem 2.2. Let $l = \frac12(p-1) - m$, and let $a_1, \ldots, a_l$ be those residues which are less than $p/2$, and $b_1, \ldots, b_m$ be those residues which are greater than $p/2$. Then
$$\prod_{s=1}^{l} a_s \prod_{t=1}^{m} b_t \equiv \prod_{k=1}^{\frac12(p-1)} kn = \left(\frac{p-1}{2}\right)!\, n^{\frac12(p-1)} \pmod p. \qquad (1)$$
Since $1 \le p - b_t \le \frac12(p-1)$, it follows that the $a_s$ and the $p - b_t$ are $\frac12(p-1)$ integers in the interval from $1$ to $\frac12(p-1)$. We now prove that they are distinct by proving $a_s \ne p - b_t$. Suppose, if possible, $a_s + b_t = p$. Then there are integers $x, y$ such that
$$xn + yn \equiv 0 \pmod p, \qquad 1 \le x \le \tfrac12(p-1), \qquad 1 \le y \le \tfrac12(p-1),$$
or (since $p \nmid n$) $x + y \equiv 0 \pmod p$, which is impossible because $2 \le x + y \le p - 1$. Therefore
$$\prod_{s=1}^{l} a_s \prod_{t=1}^{m} (p - b_t) = \left(\frac{p-1}{2}\right)!.$$
3. Quadratic Residues
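From (1) we see that the left hand side of this equation is
$$\equiv (-1)^m \prod_{s=1}^{l} a_s \prod_{t=1}^{m} (p - b_t) = (-1)^m \left(\frac{p-1}{2}\right)! \pmod p.$$
Cancelling $\left(\frac{p-1}{2}\right)!$, which is prime to $p$, we obtain $n^{\frac12(p-1)} \equiv (-1)^m \pmod p$. From Euler's criterion we see that $\left(\frac{n}{p}\right) \equiv (-1)^m \pmod p$, and so $\left(\frac{n}{p}\right) = (-1)^m$. □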
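If we take $n = 2$ in Theorem 2.2, then $2, 2\cdot 2, 2\cdot 3, \ldots, \frac12(p-1)\cdot 2$ are already in the interval from $0$ to $p$. We can now determine the number of integers $k$ satisfying $\frac p2 < 2k < p$, or $\frac p4 < k < \frac p2$, which gives
$$m = \left[\frac p2\right] - \left[\frac p4\right].$$
Let $p = 8a + r$, $r = 1, 3, 5, 7$. Then
$$m = 2a + \left[\frac r2\right] - \left[\frac r4\right] \equiv 0, 1, 1, 0 \pmod 2.$$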
Therefore we have:

Theorem 2.3. If $p > 2$, then $\left(\frac{2}{p}\right) = (-1)^{\frac18(p^2-1)}$. □
In other words, $2$ is a quadratic residue or nonresidue mod $p$ according to whether $p \equiv \pm 1$ or $\pm 3 \pmod 8$. It follows from this that every odd prime divisor of $x^2 - 2$ must be congruent to $\pm 1 \pmod 8$.

Exercise. Let $n$ be a positive integer such that $4n + 3$ and $8n + 7$ are primes. Prove that $2^{4n+3} - 1 = M_{4n+3}$ is composite. Use this to prove the following concerning Mersenne numbers:
$23 \mid M_{11}$, $47 \mid M_{23}$, $167 \mid M_{83}$, $263 \mid M_{131}$, $359 \mid M_{179}$, $383 \mid M_{191}$, $479 \mid M_{239}$, $503 \mid M_{251}$.
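The exercise can be spot-checked numerically. This Python sketch (the trial-division helper `is_prime` is our own) first compares Theorem 2.3 with Euler's criterion for small primes, then confirms each divisibility $p \mid M_q$ listed above, where $q = 4n + 3$ and $p = 2q + 1 = 8n + 7$.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


# Theorem 2.3: (2/p) = (-1)^((p^2 - 1)/8), checked against Euler's criterion.
for p in range(3, 500):
    if is_prime(p):
        euler = pow(2, (p - 1) // 2, p)          # equals 1 or p - 1
        sign = (-1) ** ((p * p - 1) // 8)
        assert euler == (1 if sign == 1 else p - 1)

# Exercise: q = 4n + 3 and p = 8n + 7 = 2q + 1 both prime imply p | M_q = 2^q - 1.
EXPONENTS = [11, 23, 83, 131, 179, 191, 239, 251]
for q in EXPONENTS:
    n = (q - 3) // 4
    p = 8 * n + 7
    assert q == 4 * n + 3 and is_prime(q) and is_prime(p)
    print(f"{p} | M_{q}: {pow(2, q, p) == 1}")
```

Each line printed is of the form `23 | M_11: True`; the key fact is that $p \equiv 7 \pmod 8$ makes $2$ a quadratic residue, so $2^q = 2^{\frac12(p-1)} \equiv 1 \pmod p$.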
3.3 The Law of Quadratic Reciprocity

Theorem 3.1. Let $p, q$ be two distinct odd primes. Then
$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac12(p-1)\cdot\frac12(q-1)}.$$
In other words, if $p \equiv q \equiv 3 \pmod 4$, then exactly one of the two congruences $x^2 \equiv p \pmod q$, $x^2 \equiv q \pmod p$ is soluble. Otherwise the two congruences are either both soluble or both insoluble. This is the famous and important Law of Quadratic Reciprocity in elementary number theory, which was discovered by Legendre and proved by Gauss, who named it "the queen of number theory". The later research on algebraic number theory by Kummer, Eisenstein, Hilbert, Takagi, Artin and Furtwängler seems to justify the name.

Proof. We do not, for the moment, exclude the case $q = 2$, and we suppose only that $p, q$ are distinct primes with $p$ odd. When $1 \le k \le \frac12(p-1)$ we can write
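$$kq = p\left[\frac{kq}{p}\right] + r_k, \qquad 0 < r_k < p.$$
Let
$$a = \sum_{s=1}^{l} a_s, \qquad b = \sum_{t=1}^{m} b_t,$$
where $a_s$ and $b_t$ are defined in the previous section (with $n = q$). Then we have
$$\sum_{k=1}^{\frac12(p-1)} r_k = a + b. \qquad (1)$$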
We saw in the proof of Gauss's lemma that the $a_s$ and $p - b_t$ are the same as $1, 2, \ldots, \frac12(p-1)$. Therefore
$$\frac{p^2-1}{8} = 1 + 2 + \cdots + \tfrac12(p-1) = a + mp - b, \qquad (2)$$
and
$$\frac{p^2-1}{8}\,q = \sum_{k=1}^{\frac12(p-1)} kq = p\sum_{k=1}^{\frac12(p-1)}\left[\frac{kq}{p}\right] + a + b. \qquad (3)$$
Subtracting (2) from (3), we have
$$\frac{(p^2-1)(q-1)}{8} = p\sum_{k=1}^{\frac12(p-1)}\left[\frac{kq}{p}\right] - mp + 2b,$$
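or
$$\frac{(p^2-1)(q-1)}{8} \equiv \sum_{k=1}^{\frac12(p-1)}\left[\frac{kq}{p}\right] - m \pmod 2. \qquad (4)$$
1) (Alternative proof of Theorem 2.3.) We take $q = 2$, so that the $\left[\frac{2k}{p}\right]$ are all $0$, and hence
$$\frac{p^2-1}{8} \equiv -m \pmod 2.$$
By Gauss's lemma, $\left(\frac{2}{p}\right) = (-1)^m = (-1)^{\frac18(p^2-1)}$.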
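2) Let $q > 2$. Then $q - 1$ is even, so the left side of (4) is even, and
$$m \equiv \sum_{k=1}^{\frac12(p-1)}\left[\frac{kq}{p}\right] \pmod 2.$$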
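Therefore
$$\left(\frac{q}{p}\right) = (-1)^{\sum_{k=1}^{(p-1)/2}[kq/p]}.$$
Similarly we have
$$\left(\frac{p}{q}\right) = (-1)^{\sum_{l=1}^{(q-1)/2}[lp/q]},$$
so that
$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\sum_{k=1}^{(p-1)/2}[kq/p] + \sum_{l=1}^{(q-1)/2}[lp/q]}.$$
If we can prove that
$$\sum_{k=1}^{\frac12(p-1)}\left[\frac{kq}{p}\right] + \sum_{l=1}^{\frac12(q-1)}\left[\frac{lp}{q}\right] \equiv \frac{p-1}{2}\cdot\frac{q-1}{2} \pmod 2,$$
then the theorem will follow. It suffices therefore to prove the following lemma.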
Lemma.
$$\sum_{k=1}^{\frac12(p-1)}\left[\frac{kq}{p}\right] + \sum_{l=1}^{\frac12(q-1)}\left[\frac{lp}{q}\right] = \frac{p-1}{2}\cdot\frac{q-1}{2}.$$
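Proof. Consider the rectangle with vertices $(0,0)$, $(0, \frac12 q)$, $(\frac12 p, 0)$, $(\frac12 p, \frac12 q)$.

[Figure: the rectangle and its diagonal from $(0,0)$ to $(\frac12 p, \frac12 q)$.]

The diagonal from the origin does not pass through any lattice point (a point with integer coordinates) other than the origin inside the rectangle. This is because if $(x, y)$ is a lattice point on the diagonal, then $xq - yp = 0$ and so $p \mid x$, $q \mid y$, showing that $(x, y)$ is either the origin or lies outside the rectangle. The total number of lattice points with positive coordinates in the rectangle is $\frac12(p-1)\cdot\frac12(q-1)$. The numbers of such lattice points in the two triangular regions below and above the diagonal are respectively
$$\sum_{k=1}^{\frac12(p-1)}\left[\frac{kq}{p}\right] \quad\text{and}\quad \sum_{l=1}^{\frac12(q-1)}\left[\frac{lp}{q}\right].$$
The lemma is therefore proved. □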
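The lemma, and the parity formula $(q/p) = (-1)^{\sum [kq/p]}$ used just before it, can be tested on small primes. In this sketch `floor_sums` (a name of our own) returns the two lattice-point counts below and above the diagonal, and the checks compare them with $\frac{p-1}{2}\cdot\frac{q-1}{2}$ and with Euler's criterion.

```python
def floor_sums(p: int, q: int) -> tuple[int, int]:
    """Lattice points strictly below and above the diagonal in the lemma's rectangle."""
    below = sum(k * q // p for k in range(1, (p - 1) // 2 + 1))
    above = sum(l * p // q for l in range(1, (q - 1) // 2 + 1))
    return below, above


def legendre(n: int, p: int) -> int:
    """Euler's criterion for an odd prime p not dividing n."""
    return 1 if pow(n % p, (p - 1) // 2, p) == 1 else -1


ODD_PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43]
for p in ODD_PRIMES:
    for q in ODD_PRIMES:
        if p != q:
            below, above = floor_sums(p, q)
            assert below + above == (p - 1) // 2 * ((q - 1) // 2)      # the lemma
            assert legendre(q, p) == (-1) ** below                     # (q/p) = (-1)^{sum [kq/p]}
            assert legendre(p, q) * legendre(q, p) == (-1) ** ((p - 1) // 2 * ((q - 1) // 2))

print(floor_sums(7, 11))   # (8, 7), and 8 + 7 = 3 * 5
```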
Example 1. Determine those primes $p > 3$ of which $3$ is a quadratic residue. From the law of quadratic reciprocity we have
$$\left(\frac{3}{p}\right) = \left(\frac{p}{3}\right)(-1)^{\frac12(p-1)}.$$
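Now
$$\left(\frac{p}{3}\right) = \begin{cases} 1, & p \equiv 1 \pmod 3,\\ -1, & p \equiv 2 \pmod 3;\end{cases} \qquad (-1)^{\frac12(p-1)} = \begin{cases} 1, & p \equiv 1 \pmod 4,\\ -1, & p \equiv -1 \pmod 4.\end{cases}$$
It follows from the Chinese remainder theorem that
$$\left(\frac{3}{p}\right) = \begin{cases} 1, & p \equiv \pm 1 \pmod{12},\\ -1, & p \equiv \pm 5 \pmod{12}.\end{cases}$$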
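A brute-force check of Example 1, taking Euler's criterion as the reference value of $(3/p)$; the helpers `is_prime` and `legendre` are illustrative.

```python
from math import isqrt


def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))


def legendre(n: int, p: int) -> int:
    """Euler's criterion for an odd prime p not dividing n."""
    return 1 if pow(n % p, (p - 1) // 2, p) == 1 else -1


primes = [p for p in range(5, 2000) if is_prime(p)]
# Example 1: (3/p) = 1 exactly when p ≡ ±1 (mod 12).
assert all(legendre(3, p) == (1 if p % 12 in (1, 11) else -1) for p in primes)
print(sorted({p % 12 for p in primes if legendre(3, p) == 1}))   # [1, 11]
```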
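Example 2. Determine those odd primes $p \ne 5$ of which $5$ is a quadratic residue.

From the law of quadratic reciprocity we have $\left(\frac{5}{p}\right) = \left(\frac{p}{5}\right)$, and
$$\left(\frac15\right) = 1, \qquad \left(\frac25\right) = (-1)^{\frac18(5^2-1)} = -1, \qquad \left(\frac35\right) = \left(\frac53\right) = \left(\frac23\right) = -1, \qquad \left(\frac45\right) = \left(\frac{2^2}{5}\right) = 1,$$
so that
$$\left(\frac{5}{p}\right) = \begin{cases} 1, & p \equiv \pm 1 \pmod 5,\\ -1, & p \equiv \pm 2 \pmod 5.\end{cases}$$
Example 3. Determine those primes $p \ne 2, 5$ of which $10$ is a quadratic residue.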
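From Example 2, Theorem 2.3 and the Chinese remainder theorem we have
$$\left(\frac{10}{p}\right) = \begin{cases} 1, & p \equiv \pm 1, \pm 3, \pm 9, \pm 13 \pmod{40},\\ -1, & p \equiv \pm 7, \pm 11, \pm 17, \pm 19 \pmod{40}.\end{cases}$$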
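Examples 2 and 3 can be checked the same way. The sets below are just $\pm 1, \pm 3, \pm 9, \pm 13$ and $\pm 7, \pm 11, \pm 17, \pm 19$ written as least residues mod 40; the helper names are our own.

```python
from math import isqrt


def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))


def legendre(n: int, p: int) -> int:
    """Euler's criterion for an odd prime p not dividing n."""
    return 1 if pow(n % p, (p - 1) // 2, p) == 1 else -1


primes = [p for p in range(3, 3000) if is_prime(p) and p != 5]

# Example 2: (5/p) = 1 exactly when p ≡ ±1 (mod 5).
assert all(legendre(5, p) == (1 if p % 5 in (1, 4) else -1) for p in primes)

# Example 3: the residue classes mod 40.
RES_10 = {1, 39, 3, 37, 9, 31, 13, 27}         # ±1, ±3, ±9, ±13
NONRES_10 = {7, 33, 11, 29, 17, 23, 19, 21}    # ±7, ±11, ±17, ±19
assert all(p % 40 in RES_10 | NONRES_10 for p in primes)
assert all(legendre(10, p) == (1 if p % 40 in RES_10 else -1) for p in primes)
```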
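Example 4. Determine the solubility of $x^2 \equiv -1457 \pmod{2389}$.

Here $p = 2389$ is a prime. Since $-1457 = -31 \times 47$, it follows from
$$\left(\frac{-1}{2389}\right) = 1, \qquad \left(\frac{31}{2389}\right) = \left(\frac{2389}{31}\right) = \left(\frac{2}{31}\right) = 1,$$
$$\left(\frac{47}{2389}\right) = \left(\frac{2389}{47}\right) = \left(\frac{39}{47}\right) = \left(\frac{3}{47}\right)\left(\frac{13}{47}\right) = -\left(\frac{47}{3}\right)\left(\frac{47}{13}\right) = -\left(\frac{2}{3}\right)\left(\frac{8}{13}\right) = -\left(\frac{2}{3}\right)\left(\frac{2}{13}\right)^3 = -1,$$
that $\left(\frac{-1457}{2389}\right) = -1$, so that the congruence is not soluble.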
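The arithmetic of Example 4 can be confirmed with Euler's criterion and, since $p$ is small, by an exhaustive search over all residues. This is only a sketch; `legendre` is an illustrative helper.

```python
def legendre(n: int, p: int) -> int:
    """Euler's criterion for an odd prime p not dividing n."""
    return 1 if pow(n % p, (p - 1) // 2, p) == 1 else -1


p = 2389
factors = {-1: legendre(-1, p), 31: legendre(31, p), 47: legendre(47, p)}
print(factors)                 # {-1: 1, 31: 1, 47: -1}
print(legendre(-1457, p))      # -1

# Exhaustive cross-check: no x satisfies x^2 ≡ -1457 (mod 2389).
target = -1457 % p
assert not any(x * x % p == target for x in range(p))
```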
Exercise 1. Show that (;3) = 1,
G~) =
 1.
Exercise 2. Show that $\left(\frac{74}{101}\right) = -1$, $\left(\frac{195}{1901}\right) = -1$, $\left(\frac{365}{1847}\right) = 1$.
Exercise 3. Show that if $p \equiv \pm 1$ or $\pm 5 \pmod{24}$, then $\left(\frac{6}{p}\right) = 1$; if $p \equiv \pm 7$ or $\pm 11 \pmod{24}$, then $\left(\frac{6}{p}\right) = -1$.
3.4 Practical Methods for the Solutions

Although the theory above is simple and beautiful, it is nevertheless rather negative. By this we mean the following. If, following our theory, the congruence is insoluble, then the problem is finished. However, if the congruence is soluble, we may further ask for the actual solutions to the congruence, and the method does not give us the solutions. In actual fact, when $p$ is large, the determination of the solutions to $x^2 \equiv n \pmod p$ is no easy matter. However, if $p \equiv 3 \pmod 4$ or $p \equiv 5 \pmod 8$, then we have the following methods.

1) $p \equiv 3 \pmod 4$. Since $\left(\frac{n}{p}\right) = 1$, we have $n^{\frac12(p-1)} \equiv 1 \pmod p$, and so $\left(n^{\frac14(p+1)}\right)^2 \equiv n \pmod p$.

… $\pm x_1 + 2^{l-1}$ are actually incongruent solutions, we see that the congruence has exactly four solutions. When $p > 2$ and $l = 1$, the result is trivial, and the remaining part of the theorem follows from Theorem 2.9.3. □
From the results of Chapter 2 we can determine the number of solutions of a quadratic congruence modulo any positive integer $m$.
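For method 1) ($p \equiv 3 \pmod 4$), the step $n^{\frac12(p-1)} \equiv 1$ gives the classical formula $x \equiv \pm n^{\frac14(p+1)} \pmod p$, since $\left(n^{\frac14(p+1)}\right)^2 = n \cdot n^{\frac12(p-1)}$. A minimal Python sketch, assuming $p$ is a prime with $p \equiv 3 \pmod 4$ and $p \nmid n$ (the function name is hypothetical). It also detects nonresidues, because for them the squared candidate is $-n$ rather than $n$.

```python
def sqrt_mod_p(n: int, p: int):
    """Solve x^2 ≡ n (mod p) for a prime p ≡ 3 (mod 4) with p not dividing n.

    Returns the two solutions in increasing order, or None if (n/p) = -1.
    """
    if p % 4 != 3 or n % p == 0:
        raise ValueError("need p ≡ 3 (mod 4) and p not dividing n")
    x = pow(n, (p + 1) // 4, p)          # candidate n^((p+1)/4)
    if x * x % p != n % p:
        return None                      # x^2 ≡ -n here, so n is a nonresidue
    return min(x, p - x), max(x, p - x)


print(sqrt_mod_p(2, 7))    # (3, 4)
print(sqrt_mod_p(3, 7))    # None
```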
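3.6 Jacobi's Symbol

Throughout this section $m$ denotes a positive odd integer.

Definition. Let the standard factorization of $m$ be $p_1 \cdots p_t$, where the $p_r$ may be repeated. If $(n, m) = 1$, then we define Jacobi's symbol by
$$\left(\frac{n}{m}\right) = \prod_{r=1}^{t}\left(\frac{n}{p_r}\right).$$

Examples. $\left(\frac{1}{m}\right) = 1$. If $(a, m) = 1$, then $\left(\frac{a^2}{m}\right) = 1$.

Note: If $\left(\frac{n}{m}\right) = 1$, it does not follow that $x^2 \equiv n \pmod m$ is soluble. For example, $\left(\frac{2}{15}\right) = \left(\frac23\right)\left(\frac25\right) = (-1)(-1) = 1$, but $x^2 \equiv 2 \pmod 3$, and hence $x^2 \equiv 2 \pmod{15}$, is insoluble.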
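Theorem 6.1. Let $m$ and $m'$ be positive odd integers. (i) If $n \equiv n' \pmod m$ and $(n, m) = 1$, then $\left(\frac{n}{m}\right) = \left(\frac{n'}{m}\right)$. (ii) If $(n, m) = (n, m') = 1$, then $\left(\frac{n}{m}\right)\left(\frac{n}{m'}\right) = \left(\frac{n}{mm'}\right)$. (iii) If $(n, m) = (n', m) = 1$, then $\left(\frac{n}{m}\right)\left(\frac{n'}{m}\right) = \left(\frac{nn'}{m}\right)$. □

Theorem 6.2. $\left(\frac{-1}{m}\right) = (-1)^{\frac12(m-1)}$.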
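Proof. It suffices to prove that
$$\sum_{i=1}^{t}\frac{p_i - 1}{2} \equiv \frac{\prod_{i=1}^{t}p_i - 1}{2} \pmod 2,$$
which certainly holds when $t = 1$. Given any two odd integers $u, v$ we always have
$$\frac{u-1}{2} + \frac{v-1}{2} \equiv \frac{uv-1}{2} \pmod 2 \qquad (1)$$
(or $(u-1)(v-1) \equiv 0 \pmod 4$). It follows by induction that
$$\sum_{i=1}^{t}\frac{p_i-1}{2} = \sum_{i=1}^{t-1}\frac{p_i-1}{2} + \frac{p_t-1}{2} \equiv \frac{\prod_{i=1}^{t-1}p_i - 1}{2} + \frac{p_t-1}{2} \equiv \frac{\prod_{i=1}^{t}p_i - 1}{2} \pmod 2. \qquad □$$

Theorem 6.3. $\left(\frac{2}{m}\right) = (-1)^{\frac18(m^2-1)}$.

Proof. This is similar to the above, except that we replace (1) by
$$\frac{u^2v^2-1}{8} \equiv \frac{u^2-1}{8} + \frac{v^2-1}{8} \pmod 2. \qquad □$$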
Theorem 6.4. Let $m, n$ be coprime positive odd integers. Then
$$\left(\frac{m}{n}\right)\left(\frac{n}{m}\right) = (-1)^{\frac{m-1}{2}\cdot\frac{n-1}{2}}.$$
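Proof. Let $m = \prod p$, $n = \prod q$. Then
$$\left(\frac{m}{n}\right)\left(\frac{n}{m}\right) = \prod_p\prod_q\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = \prod_p\prod_q(-1)^{\frac{p-1}{2}\cdot\frac{q-1}{2}} = (-1)^{\frac{m-1}{2}\cdot\frac{n-1}{2}},$$
where we have used (1). □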
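Theorems 6.1, 6.3 and 6.4 together give a way to evaluate $\left(\frac{n}{m}\right)$ that never factors $m$: reduce $n$ mod $m$, strip factors of $2$, then invert. The following Python sketch is the standard form of that procedure (the name `jacobi` is our own); it returns $0$ when $(n, m) > 1$, a convention not used in the text.

```python
def jacobi(n: int, m: int) -> int:
    """Jacobi symbol (n/m) for a positive odd m; returns 0 if gcd(n, m) > 1."""
    if m <= 0 or m % 2 == 0:
        raise ValueError("m must be a positive odd integer")
    n %= m                                # Theorem 6.1 (i)
    result = 1
    while n != 0:
        while n % 2 == 0:                 # Theorem 6.3: (2/m) = -1 iff m ≡ ±3 (mod 8)
            n //= 2
            if m % 8 in (3, 5):
                result = -result
        n, m = m, n                       # Theorem 6.4: invert the symbol
        if n % 4 == 3 and m % 4 == 3:     # sign (-1)^((m-1)/2 * (n-1)/2)
            result = -result
        n %= m
    return result if m == 1 else 0


print(jacobi(383, 443))   # 1, as in the worked example
print(jacobi(2, 15))      # 1, although x^2 ≡ 2 (mod 15) has no solution
```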
In using Legendre's symbol we must always ensure that the denominator is a prime. In using Jacobi's symbol, however, we can avoid the factorization process. For example:
$$\left(\frac{383}{443}\right) = -\left(\frac{443}{383}\right) = -\left(\frac{60}{383}\right) = -\left(\frac{2^2}{383}\right)\left(\frac{15}{383}\right) = -\left(\frac{15}{383}\right) = \left(\frac{383}{15}\right) = \left(\frac{8}{15}\right) = \left(\frac{2}{15}\right)^3 = 1.$$
If we delete the condition that $m, n$ are positive in Theorem 6.4, then we have:
Theorem 6.5. Let $m, n$ be coprime odd integers. If $m, n$ are both negative, then
$$\left(\frac{m}{|n|}\right)\left(\frac{n}{|m|}\right) = -(-1)^{\frac{m-1}{2}\cdot\frac{n-1}{2}}.$$
Otherwise, the required value is $(-1)^{\frac{m-1}{2}\cdot\frac{n-1}{2}}$.

… then $\varphi(p^l)$ is even, so that $m$ cannot have two distinct odd prime divisors. If $m$ has a primitive root, then $m$ must be of the form $2^l$, $p^l$ or $2^c p^l$. If $c \ge 2$, then $\varphi(2^c) = 2^{c-1}$ is also even, and so $2^c p^l$ cannot have primitive roots. Therefore $m$ must be of the form $2^l$, $p^l$ or $2p^l$.

2) $m = 2^l$. If $l = 1$, then $1$ is a primitive root. If $l = 2$, then $3$ is a primitive root. Let $l \ge 3$. We prove by induction that for all odd $a$ we have
$$a^{2^{l-2}} \equiv 1 \pmod{2^l}.$$
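This is easy, since the case $l = 3$ is $a^2 \equiv 1 \pmod 8$, and if $a^{2^{l-2}} = 1 + k2^l$, then
$$a^{2^{l-1}} = (1 + k2^l)^2 = 1 + k2^{l+1} + k^2 2^{2l} \equiv 1 \pmod{2^{l+1}}.$$
Thus every odd $a$ has order at most $2^{l-2} < \varphi(2^l)$, and therefore there is no primitive root for $m = 2^l$ ($l > 2$).

3) $m = p^l$. The case $l = 1$ has already been settled in §8. Let $g$ be a primitive root of $p$. If $g^{p-1} - 1 \not\equiv 0 \pmod{p^2}$, then we take $r = g$; if $g^{p-1} - 1 \equiv 0 \pmod{p^2}$, then we take $r = g + p$. We then have
$$r^{p-1} - 1 \not\equiv 0 \pmod{p^2},$$
since in the second case $(g + p)^{p-1} \equiv g^{p-1} + (p-1)g^{p-2}p \equiv 1 - g^{p-2}p \pmod{p^2}$. Therefore such an $r$ is a primitive root of $p^2$. Let
$$r^{p-1} - 1 = kp, \qquad p \nmid k.$$
We can prove, as before, that
$$r^{p^{l-2}(p-1)} \equiv 1 + kp^{l-1} \pmod{p^l}, \qquad l \ge 2. \qquad (1)$$
If the order of $r$ is $e$, then $e \mid (p-1)p^{l-1} = \varphi(p^l)$. Since $r$ is a primitive root of $p$, we see that $(p-1) \mid e$. We deduce from (1) that $e = \varphi(p^l)$; that is, $r$ is a primitive root of $p^l$.

4) $m = 2p^l$. We take $g$ to be a primitive root of $p^l$. If $g$ is odd, then $g$ is also a primitive root of $2p^l$; if $g$ is even, then $g + p^l$ is a primitive root of $2p^l$. □

Theorem 9.2. Let $l > 2$. Then the order of $5$ with respect to the modulus $2^l$ is $2^{l-2}$.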
Proof. We first prove that, for $a \ge 3$,
$$5^{2^{a-3}} \equiv 1 + 2^{a-1} \pmod{2^a}.$$
3.9 The Structure of a Reduced Residue System
This clearly holds when $a = 3$, and we now proceed by induction. We have
$$5^{2^{a-2}} = \left(5^{2^{a-3}}\right)^2 = (1 + 2^{a-1} + k2^a)^2 \equiv 1 + 2^a \pmod{2^{a+1}}.$$
Therefore $5^{2^{l-3}} \not\equiv 1 \pmod{2^l}$ and $5^{2^{l-2}} \equiv 1 \pmod{2^l}$. That is, the order of $5$ is $2^{l-2}$. □
Theorem 9.3. Let $l > 2$. Then, given any odd $a$, there exists $b$ such that
$$a \equiv (-1)^{\frac{a-1}{2}}5^b \pmod{2^l}, \qquad b \ge 0.$$
Proof. If $a \equiv 1 \pmod 4$, then by Theorem 9.2 the numbers $5^b$ ($0 \le b < 2^{l-2}$) give $2^{l-2}$ distinct numbers mod $2^l$; moreover they are all congruent to $1 \pmod 4$. Since there are exactly $2^{l-2}$ residues mod $2^l$ which are congruent to $1 \pmod 4$, there must be an integer $b$ such that $a \equiv 5^b \pmod{2^l}$. If $a \equiv 3 \pmod 4$, then $-a \equiv 1 \pmod 4$, and the required result follows from the above. □
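Theorems 9.2 and 9.3 are easy to test for small $l$. In this sketch `order_mod` and `sign_and_exponent` are illustrative helpers; the second finds $b$ by brute-force search, not by any method from the text.

```python
def order_mod(a: int, m: int) -> int:
    """Multiplicative order of a modulo m (assumes gcd(a, m) = 1)."""
    k, x = 1, a % m
    while x != 1:
        x = x * a % m
        k += 1
    return k


def sign_and_exponent(a: int, l: int) -> tuple[int, int]:
    """Return (s, b) with a ≡ s * 5^b (mod 2^l), s = (-1)^((a-1)/2), 0 <= b < 2^(l-2)."""
    m = 2 ** l
    s = 1 if a % 4 == 1 else -1
    target, x = s * a % m, 1          # target ≡ 1 (mod 4)
    for b in range(2 ** (l - 2)):
        if x == target:
            return s, b
        x = x * 5 % m
    raise ValueError("a must be odd")


# Theorem 9.2: the order of 5 modulo 2^l is 2^(l-2) for l > 2.
assert all(order_mod(5, 2 ** l) == 2 ** (l - 2) for l in range(3, 16))
# Theorem 9.3: every odd a is ±5^b modulo 2^l.
for l in range(3, 11):
    for a in range(1, 2 ** l, 2):
        s, b = sign_and_exponent(a, l)
        assert a % 2 ** l == s * pow(5, b, 2 ** l) % 2 ** l
print(sign_and_exponent(3, 4))   # (-1, 3): 3 ≡ -5^3 (mod 16)
```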
Theorem 9.4. Let $m = 2^l p_1^{l_1} \cdots p_s^{l_s}$ (standard factorization) with $l \ge 0$, $l_1 > 0, \ldots, l_s > 0$. We define $\delta$ to be $0$, $1$ or $2$ according to whether $l = 0$ or $1$, $l = 2$, or $l > 2$, respectively. Then the reduced residue system of $m$ can be represented by the products of $s + \delta$ numbers.

Proof. 1) Suppose that $m = m'm''$, $(m', m'') = 1$. Let $a_1, \ldots, a_{\varphi(m')}$ be a reduced residue system mod $m'$ with $a_i \equiv 1 \pmod{m''}$ (this is always possible). Let $b_1, \ldots, b_{\varphi(m'')}$ be a reduced residue system mod $m''$ with $b_j \equiv 1 \pmod{m'}$. Then the $a_ib_j$ represent a reduced residue system mod $m'm''$, and their number is $\varphi(m'm'')$. Also, if $a_ib_j \equiv a_sb_t \pmod{m'm''}$, then $a_i \equiv a_s \pmod{m'}$, $b_j \equiv b_t \pmod{m''}$. 2) From Theorems 9.1 and 9.3 we know that the reduced residue system mod $m$, where $m = p^l$ ($p > 2$), is the product of a single number. If $m = 2^l$ where $l > 1$, then the reduced residue system is the product of $\delta$ numbers. Combining this with 1), the theorem is proved. □
This theorem illustrates an important principle: the reduced residue system mod $m$ is built up, under multiplication, from the powers of $s + \delta$ numbers. In group theory this result is a special case of the Fundamental Theorem of Abelian Groups.

Exercise. Prove that if $k < p$, $n = kp^2 + 1$ and
$$2^{n-1} \equiv 1 \pmod n,$$
then $n$ is a prime number. Hints: (i) First prove that $n$ has a prime divisor congruent to $1 \pmod p$. Let $d$ be the least positive integer such that $2^d \equiv 1 \pmod n$. Deduce that $d \nmid k$, $d \mid n - 1$ and $p \mid d$. Then obtain the conclusion from $p \mid d \mid \varphi(n)$. (ii) Deduce from $n = kp^2 + 1 = (up + 1)(vp + 1)$ that $n$ cannot be composite.

Note: Taking $p = 2^{127} - 1$, $k = 180$, Miller and Wheeler proved, with the aid of a computer, that $180(2^{127} - 1)^2 + 1$ is prime (Nature 168 (1951), 838).
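The hypotheses of the exercise can be checked for the Miller and Wheeler number with Python's built-in three-argument `pow`. The second check, $2^k \not\equiv 1 \pmod n$, is our reading of what the hint's step $d \nmid k$ requires; it is an assumption, not part of the stated exercise. This sketch only verifies hypotheses and is not by itself a primality proof.

```python
p = 2 ** 127 - 1          # the Mersenne prime M_127
k = 180
n = k * p ** 2 + 1        # the number of Miller and Wheeler

assert k < p
print(pow(2, n - 1, n) == 1)   # hypothesis 2^(n-1) ≡ 1 (mod n)
print(pow(2, k, n) != 1)       # 2^k ≢ 1 (mod n), so the order d of 2 does not divide k
print(len(str(n)))             # number of decimal digits of n
```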
The least primitive roots for primes less than 5000. An asterisk indicates that 10 is a primitive root.

p    p − 1    g    p    p − 1    g    p    p − 1    g
3 5 7* 11 13 17* 19* 23* 29* 31 37 41 43 47* 53 59* 61* 67 71 73 79 83 89 97* 101 103 107 109* 113* 127 131* 137 139 149* 151 157 163 167* 173 179* 181* 191 193* 197 199 211 223* 227 229* 233* 239
2 22 2·3 2·5 22.3 24 2.3 2 2·11 22.7 2·3·5 22.3 2 23 .5 2·3·7 2·23 22.13 2·29 22.3.5 2·3·11 2·5·7 23 .3 2 2·3·13 2·41 23 .11 25 .3 22.5 2 2·3·17 2·53 22.3 3 24 .7 2.3 2.7 2·5·13 23 ·17 2·3·23 22.37 2.3.5 2 22.3.13 2.3 4 2·83 22.43 2·89 22.3 2.5 2·5·19 26 .3 22.7 2 2.3 2·11 2·3·5·7 2·3·37 2·113 22.3.19 23 ·29 2·7·17
2 2 3 2 2 3 2 5 2 3 2 6 3 5 2 2 2 2 7 5 3 2 3 5 2 5 2 6 3 3 2 3 2 2 6 5 2 5 2 2 2 19 5 2 3 2 3 2 6 3 7
241 251 257* 263* 269* 271 277 281 283 293 307 311 313* 317 331 337* 347 349 353 359 367* 373 379* 383* 389* 397 401 409 419* 421 431 433* 439 443 449 457 461* 463 467 479 487* 491* 499* 503* 509* 521 523 541* 547 557 563
24 .3.5 2.5 3 23 2·131 22 ·67 2.3 3 .5 22.3.23 23 .5.7 2·3·47 22.73 "2.3 2.17 2·5·31 23 .3.13 22.79 2·3·5·11 24.3.7 2·173 22.3.29 25 ·11 2·179 2·3·61 22.3.31 2.3 3 .7 2·191 22.97 22.3 2·11 24.5 2 23 .3.17 2·11·19 22.3.5.7 2·5·43 24.3 3 2·3·73 2·13·17 26 .7 23 .3.19 22.5.23 2·3·7·11 2·233 2·239 2.3 5 2.5.7 2 2·3·83 2·251 22 ·127 22.5.13 2.3 2.29 22.3 3 .5 2·3·7·13 22 ·139 2·281
7 6 3 5 2 6 5 3 3 2 5 17 10 2 3 10 2 2 3 7 6 2 2 5 2 5 3 21 2 2 7 5 15 2 3 13 2 3 2 13 3 2 7 5 2 3 2 2 2 2 2
569 571* 577* 587 593* 599 601 607 613 617 619* 631 641 643 647* 653 659* 661 673 677 683 691 701* 709* 719 727* 733 739 743* 751 757 761 769 773 787 797 809 811* 821* 823* 827 829 839 853 857* 859 863* 877 881 883 887*
23 .71 2·3·5·19 26 .3 2 2·293 24 .37 2·13 ·23 23 .3.5 2 2·3·101 22.3 2·17 23 .7.11 2·3·103 2.3 2.5.7 27 .5 2·3·107 2·17·19 22 ·163 2·7·47 22.3.5.11 25 .3.7 22.13 2 2·11·31 2·3·5·23 22.5 2·7 22.3.59 2·359 2.3.11 2 22.3.61 2.3 2.41 2·7·53 2.3.5 3 22.3 3 .7 22.5.19 28 .3 22 ·193 2·3·131 22 ·199 23 ·101 2.3 4.5 22.5.41 2·3·137 2·7·59 22.3 2.23 2·419 22.3.71 23 ·107 2·3·11·13 2·431 22.3.73 24.5.11 2.3 2.72 2·443
3 3 5 2 3 7 7 3 2 3 2 3
11 5 2 2 2 5 2 5 3 2 2 11 5 6 3 5 3 2 6 11 2 2 2 3 3 2 3 2 2 11 2 3 2 5 2 3 2 5
p    p − 1    g    p    p − 1    g    p    p − 1    g
907 911 919 929 937* 941* 947 953* 967 971* 977* 983* 991 997 1009 1013 1019* 1021* 1031 1033* 1039 1049 1051* 1061 1063* 1069* 1087* 1091* 1093 1097* 1103* 1109* 1117 1123 1129 1151 1153* 1163 1171* 1181* 1187 1193* 1201 1213* 1217* 1223* 1229* 1231 1237 1249 1259* 1277 1279
2·3·151 2'5'7·13 2'3 3 '17 25 ·29 23 '3 2'13 22'5.47 2'11·43 23 '7'17 2·3·7·23 2·5'97 24 ,61 2·491 2.3 2511 22'3'83 24 '3 2'7 22 ·11·23 2'509 22. 3· 5 ·17 2'5·103 23 .3.43 2·3·173 23 ·131 2'3'5 2'7 22'5'53 2'3 2'59 22'3'89 2'3·181 2'5·109 22'3'7'13 23 ·137 2'19·29 22 ·277 22'3 2'31 2·3·11·17 23 '3'47 2'5 2'23 27 .3 2 2'7·83 2'3 2'5'13 22'5'59 2·593 22 ·149 24 '3'5 2 22.3'101 26 ·19 2·13·47 22,307 2·3·5·41 22'3'103 25 .3.13 2·17'37 22'11.29 2.3 2'71
2 17 7 3 5 2 2 3 5 6 3 5 6 7 11 3 2 10 14 5 3 3 7 2 3 6 3 2 5 3 5 2 2 2 11 17 5 5 2 7 2 3 11 2 3 5 2 3 2 7 2 2 3
1283 1289 1291* 1297* 1301* 1303* 1307 1319 1321 1327* 1361 1367* 1373 1381* 1399 1409 1423 1427 1429* 1433* 1439 1447* 1451 1453 1459 1471 1481 1483 1487* 1489 1493 1499 1511 1523 1531* 1543* 1549* 1553* 1559 1567* 1571* 1579* 1583* 1597 1601 1607* 1609 1613 1619* 1621* 1627 1637 1657
2·641 23 '7.23 2'3'5'43 24 '3 4 22. 52 ·13 2·3·7·31 2·653 2·659 23 .3.5.11 2'3·13·17 24 '5'17 2·683 22'7 3 22'3'5'23 2·3·233 27'11 2'3 2'79 2·23'31 22. 3· 7 ·17 23 ·179 2'719 2'3'241 2'5 2.29 22. 3.11 2 2'3 6 2.3.5'7 2 23 '5'37 2·3·13·19 2'743 24 .3'31 22'373 2'7·107 2·5·151 2'761 2.3 2. 5 ·17 2·3·257 22'3 2'43 24 '97 2·19·41 2'3 3 '29 2·5·157 2'3'263 2·7·113 22. 3· 7 ·19 26 ,5 2 2 ·11· 73 23 '3'67 22'13'31 2·809 22'3 4 '5 2·3·271 22,409 23 ,3 2 ,23
2 6 2 10 2 6 2 13 13 3 3 5 2 2 13 3 3 2 6 3 7 3 2 2 5 6 3 2 5 14 2 2 11 2 2 5 2 3 19 3 2 3 5 11 3 5 7 3 2 2 3 2 11
1663* 1667 1669 1693 1697* 1699 1709* 1721 1723 1733 1741* 1747 1753 1759 1777* 1783* 1787 1789* 1801 1811* 1823* 1831 1847* 1861* 1867 1871 1873* 1877 1879 1889 1901 1907 1913* 1931 1933 1949* 1951 1973 1979* 1987 1993* 1997 1999 2003 2011 2017* 2027 2029* 2039 2053 2063* 2069* 2081
2·3·277 2'7 2'17 22'3'139 22'3 2'47 25 '53 2·3·283 22'7'61 23 '5'43 2·3'7·41 22'433 22'3'5'29 2· 32·97 22'3'73 2'3'293 24 '3'37 2.3 4 '11 2·19'47 22'3'149 23 '3 2.5 2 2·5·181 2·911 2'3·5·61 2·13'71 22'3'5'31 2'3'311 2'5·11'17 24 '3 2'13 22'7.67 2·3'313 25 '59 22. 32'19 2'953 23 ·239 2·5·193 22'3'7'23 22 ·487 2'3'5 2'13 22'17'29 2·23·43 2·3·331 22'3'83 22'499 2.3 3 .37 2·7·11·13 2·3·5·67 25 .3 2.7 2 ·1013 22'3'13 2 2·1019 22.3 3 '19 2·1031 22 ·11·47 25 .5'13
3 2 2 2 3 3 3 3 3 2 2 2 7 6 5 10 2 6 11 6 5 3 5 2 2 14 10 2 6 3 2 2 3 2 5 2 3 2 2 2 5 2 3 5 3 5 2 2 7 2 5 2 3
p    p − 1    g    p    p − 1    g    p    p − 1    g
2083 2087 2089 2099* 211l 21l3* 2129 2131 2137* 2141* 2143* 2153* 2161 2179* 2203 2207* 2213 2221* 2237 2239 2243 2251* 2267 2269* 2273* 2281 2287 2293 2297* 2309* 2311 2333 2339* 2341* 2347 2351 2357 2371* 2377 2381 2383* 2389* 2393 2399 2411* 2417* 2423* 2437* 2441 2447* 2459* 2467 2473*
2·3·347 2·7·149 23 .3 2.29 2·1049 2·5·21l 26 .3 ·Il 24.7.19 2·3·5·71 23 .3.89 22.5.107 2.3 2.7.17 23 ·269 24.3 3 .5 2.3 2.1l 2 2·3·367 2·1l03 22.7.79 22.3.5.37 22.13.43 2·3·373 2·19· 59 2.3 2.5 3 2·11·103 22.34.7 25 ·71 23 .3.5.19 2.3 2.127 22.3.191 23 .7.41 22.577 2·3·5·7·1l 22. II· 53 2·7·167 22.3 2.5.13 2·3·17·23 2.5 2 .47 22 .19.31 2·3·5·79 23 .3 3 ·Il 22.5.7.17 2·3·397 22.3.199 23 .13.23 2· 11·109 2·5·241 24 ·151 2·7·173 22 .3.7.29 23 .5.61 2 ·1223 2 ·1229 2.3 2.137 23 .3.103
2 5 7 2 7 5 3 2 10 2 3 3 23 7 5 5 2 2 2 3 2 7 2 2 3 7 19 2 5 2 3 2 2 7 3 13 2 2 5 3 5 2 3 II 6 3 5 2 6 5 2 2 5
2477 2503 2521 2531 2539* 2543* 2549* 2551 2557 2579* 2591 2593* 2609 2617* 2621* 2633* 2647 2657* 2659 2663* 2671 2677 2683 2687* 2689 2693 2699* 2707 271l 2713* 2719 2729* 2731 2741* 2749 2753* 2767* 2777* 2789* 2791 2797 2801 2803 2819* 2833* 2837 2843 2851* 2857 2861* 2879 2887 2897*
22.619 2.3 2.139 23 .3 2.5.7 2·5·1l·23 2.3 3 .47 2·31·41 4.7 2.13 2.3.5 3 .17 22.3 2.71 2·1289 2·5·7·37 25 .3 4 24.163 23 .3.109 22.5.131 23 .7.47 2.3 3 .7 2 25 .83 2·3·443 2·1l 3 2·3·5·89 22.3.223 2.3 2.149 2·17·79 27 .3.7 22.673 2·19·71 2·3·1l·41 2·5·271 23 .3.113 2.3 2.151 23 .11.31 2·3·5·7·13 22.5.137 22.3.229 26 .43 2·3·461 23 .347 22.17.41 2.3 2.5.31 22.3.233 24.5 2.7 2·3·467 2 ·1409 24.3.59 22 ·709 2.7 2.29 2.3.5 2.19 23 .3.7.17 22.5. II· 13 2 ·1439 2·3·13·37 24.181
2 3 17 2 2 5 2 6 2 2 7 7 3 5 2 3 3 3 2 5 7 2 2 5 19 2 2 2 7 5 .3 3 3 2 6 3 3 3 2 6 2 3 2 2 5 2 2 2 II 2 7 5 3
2903* 2909* 2917 2927* 2939* 2953 2957 2963 2969 2971* 2999 3001 301l* 3019* 3023* 3037 3041 3049 3061 3067 3079 3083 3089 3109 3119 3121 3137* 3163 3167* 3169 3181 3187 3191 3203 3209 3217 3221* 3229 3251* 3253 3257* 3259* 3271 3299*' 3301* 3307 3313* 3319 3323 3329 3331* 3343* 3347
2·1451 22.727 22.3 6 2·7·1l·19 2·13·1l3 23 .3 3 .41 22.739 2 ·1481 23 .7.53 2.3 3 .5 ·Il 2 ·1499 23 .3.5 3 2·5·7·43 2·3·503 2 ·151l 22·3·1l·23 25 .5.19 23 .3.127 22.3 2.5.17 2·3·7·73 2.3 4.19 2·23·67 24 ·193 22.3.7.37 2·1559 24.3.5.13 26 .7 2 2·3·17·31 2·1583 22·3 2.1l 22.3.5.53 2.3 3 .59 2·5·1l·29 2·1601 23 .401 24.3.67 22.5.7.23 22.3.269 2.5 3 .13 22.3.271 23 .11.37 2·3·181 2·3·5·109 2·17·97 22.3. 52. II 2·3·19·29 24.3 2.23 2·3·7·79 2·1l·151 28 .13 2.3 2 .5.37 2·3·557 2·7·239
5 2 5 5 2 13 2 2 3 10 17 14 2 2 5 2 3 II 6 2 6 2 3 6 7 7 3 3 5 7 7 2 II 2 3 5 10 6 6 2 3 3 3 2 6 2 10 6 2 3 3 5 2
p    p − 1    g    p    p − 1    g
3359 3361 3371* 3373 3389* 3391 3407* 3413 3433* 3449 3457 3461* 3463* 3467 3469* 3491 3499 3511 3517 3527* 3529 3533 3539* 3541 3547 3557 3559 3571* 3581* 3583 3593* 3607* 3613 3617* 3623* 3631 3637 3643 3659* 3671 3673* 3677 3691 3697 3701* 3709* 3719 3727* 3733 3739 3761 3767 3769
2·23·73 25 '3.5.7 2·5·337 22.3.281 22. 7 .11 2 2·3·5·113 2·13·131 22'853 23 • 3 ·11·13 23 ·431 27 ,3 3 22'5'173 2·3·577 2·1733 22. 3 .17 2 2'5'349 2·3·11·53 2'3 3 '5'13 22'3'293 2·41·43 23 ,3 2,7 2 22 ·883 2·29·61 22'3.5'59 2'3 2.197 22.7.127 2·3'593 2·3·5'7·17 22.5.179 2.3 2.199 23 '449 2·3·601 22'3'7'43 25 '113 2'1811 2.3'5.11 2 22. 32. 101 2·3·607 2·31'59 2·5'367 23 '3 3 '17 22'919 2'3 2'5'41 24 .3'7.11 22'5 2'37 22 • 32·103 2.11.13 2 2· 34 ·23 22'3'311 2·3·7·89 24 '5'47 2'7·269 23 '3'157
11 22 2 5 3 3 5 2 5 3 7 2 3 2 2 2 2 7 "2 5 17 2 2 7 2 2 3 2 2 3 3 5 2 3 5 21 2 2 2 13 5 2 2 5 2 2 7 3 2 7 3 5 7
3779* 3793 3797 3803 3821* 3823 3833* 3847* 3851* 3853 3863* 3877 3881 3889 3907 3911 3917 3919 3923 3929 3931 3943* 3947 3967* 3989* 4001 4003 4007* 4013 4019* 4021 4027 4049 4051* 4057* 4073* 4079 4091* 4093 4099 4111 4127 4129 4133 4139* 4153* 4157 4159 4177* 4201 4211* 4217* 4219*
2 '1889 24 '3'79 22 ·13· 73 2 ·1901 22'5'191 2'3'7 2'13 23 '479 2·3·641 2· 52. 7·11 22. 32·107 2·1931 22'3'17'19 23 '5'97 24 '3 5 2.3 2'7'31 2·5·17·23 22 ·11· 89 2·3·653 2·37'53 23 ·491 2·3·5·131 2'3 3 '73 2 ·1973 2·3·661 22 ·997 25 .5 3 2·3·23·29 2·2003 22'17'59 2'7 2'41 22'3'5'67 2·3·11·61 24 .11.23 2.3 4 '5 2 23 • 3 '13 2 23 '509 2·2039 2'5·409 22.3.11'31 2·3·683 2·3·5·137 2·2063 25 • 3 ·43 22 ·1033 2·2069 23 .3.173 22 ·1039 2.3 3 . 7'11 24 .3 2'29 23 '3'5 2'7 2'5'421 23 '17'31 2·3·19'37
2 5 2 2 3 3 3 5 2 2 5 2 13 11 2 13 2 3 2 3 2 3 2 6 2 3 2 5 2 2 2 3 3 10 5 3 11 2 2 2 17 5 13 2 2 5 2 3 5 11 6 3 2
p
4229* 4231 4241 4243 4253 4259* 4261* 4271 4273 4283 4289 4297 4327* 4337* 4339* 4349* 4357 4363 4373 4391 4397 4409 4421* 4423* 4441 4447* 4451* 4457* 4463* 4481 4483 4493 4507 4513 4517 4519 4523 4547 4549 4561 4567* 4583* 4591 " 4597 4603 4621 4637 4639 4643 4649 4651* 4657 4663
p − 1
g
22'7'151 2'3 2'5'47 24 '5'53 2·3· 7 ·101 22 ·1063 2'2129 22'3.5'71 2'5'7·61 24 .3.89 2·2141 26 '67 23 '3'179 2·3'7·103 24 .271 2'3 2.241 22 '1087 22. 32.11 2 2·3·727 22 ·1093 2·5·439 22'7'157 23 ·19·29 22. 5·13 ·17 2· 3 ·11· 67 23 • 3· 5· 37, 2'3 2'13'19 2'5 2'89 23 '557 2·23·97 27 '5'7 2'3 3 '83 22 ·1123 2·3'751 25 .3'47 22 ·1129 2.3 2.251 2·7'17·19 2'2273 22'3'379 24 .3.5'19 2'3'761 2·29'79 2· 33 • 5 ·17 22'3'383 2·3 ·13'59 22 .3.5'7.11 22'19'61 2'3'773 2·11·211 23 '7'83 2'3'5 2'31 24 .3'97 2.3 2'7.37
2 3 3 2 2 2 2 7 5 2 3 5 3 3 10 2 2 2 2 14 2 3 3 3 21 3 2 3 5 3 2 2 2 7 2 3 5 2 6 11 3 5 11
5 2 2 2 3 5 3 3 15 3
p    p − 1    g    p    p − 1    g    p    p − 1    g
4673* 4679 4691* 4703* 4721 4723 4729 4733 4751 4759 4783* 4787 4789
26 .73 2·2339 2·5·7·67 2·2351 24 .5.59 2·3·787 23 .3.197 22 .7.13 2 2.5 3 .19 2·3·13·61 2·3·797 2·2393 22 .3 2 .7.19
3 II 2 5 6 2 17 5 19 3 6 2 2
4793* 4799 4801 4813 4817* 4831 4861 4871 4877 4889 4903 4909 4919
23 .599 2·2399 26 .3.5 2 22 .3.401 24 .7.43 2·3·5·7·23 22 .3 5 .5 2·5·487 22 .23.53 23 .13.47 2·3·19·43 22 .3.409 2·2459
3 7 7 2 3 3 II II 2 3 3 6 13
4931* 4933 4937* 4943* 4951 4957 4967* 4969 4973 4987 4993 4999
2·5·17·29 22 .3 2 ·137 23 .617 2·7·358 2·3 2 ·5 2 ·II 22 .3.7.59 2·13·191 23 .3 3 .23 22 ·II·II3 2.3 2 .277 27 .3.13 2.3.7 2 .17
6 2 3 7 6 2 5 II 2 2 5 3
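The entries of the table can be regenerated with the usual test: $g$ is a primitive root of the prime $p$ exactly when $g^{(p-1)/q} \not\equiv 1 \pmod p$ for every prime $q \mid p - 1$. A Python sketch (function names are our own, intended for odd primes $p$), checked against some entries of the table and its asterisks:

```python
def prime_factors(n: int) -> set[int]:
    """Distinct prime factors of n by trial division."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def is_primitive_root(g: int, p: int) -> bool:
    """True iff g^((p-1)/q) ≢ 1 (mod p) for every prime q dividing p - 1."""
    if g % p == 0:
        return False
    return all(pow(g, (p - 1) // q, p) != 1 for q in prime_factors(p - 1))


def least_primitive_root(p: int) -> int:
    g = 2
    while not is_primitive_root(g, p):
        g += 1
    return g


print([least_primitive_root(p) for p in (191, 409, 2161)])   # table entries g = 19, 21, 23
```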
Chapter 4. Properties of Polynomials
4.1 The Division of Polynomials

We consider polynomials $f(x)$ with rational coefficients, and we denote by $\partial^\circ f$ the degree of the polynomial $f$.
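Definition 1.1. Let $f(x)$ and $g(x)$ be two polynomials with $g(x)$ not identically zero. If there is a polynomial $h(x)$ such that $f(x) = g(x)h(x)$, then we say that $g(x)$ divides $f(x)$, and we write $g(x) \mid f(x)$ or $g \mid f$. If $g(x)$ does not divide $f(x)$, then we write $g \nmid f$. Clearly we have the following: (i) $f \mid f$; (ii) if $f \mid g$ and $g \mid f$, then $f$ and $g$ differ only by a constant factor, and we call them associated polynomials; (iii) if $f \mid g$ and $g \mid h$, then $f \mid h$; (iv) if $f \mid g$, then $\partial^\circ f \le \partial^\circ g$. If $f \mid g$ and $g \nmid f$, then we call $f$ a proper divisor of $g$, and it is easy to see that, in this case, $\partial^\circ f < \partial^\circ g$.

Theorem 1.1. Let $f(x)$ and $g(x)$ be any two polynomials with $g(x)$ not identically zero. Then there are two polynomials $q(x)$ and $r(x)$ such that $f = q \cdot g + r$, where either $r = 0$ or $\partial^\circ r < \partial^\circ g$.

Proof. We prove this by induction on the degree of $f$. If $\partial^\circ f < \partial^\circ g$, then we can take $q = 0$, $r = f$. If $\partial^\circ f \ge \partial^\circ g$, we let
$$f = \alpha_n x^n + \cdots, \qquad g = \beta_m x^m + \cdots, \qquad \partial^\circ f = n, \quad \partial^\circ g = m,$$
so that
$$\partial^\circ\left(f - \frac{\alpha_n}{\beta_m}x^{n-m}g\right) < n.$$
From the induction hypothesis, there are two polynomials $h(x)$ and $r(x)$ such that
$$f - \frac{\alpha_n}{\beta_m}x^{n-m}g = hg + r,$$
where either $r = 0$ or $\partial^\circ r < \partial^\circ g$. We now put
$$q = \frac{\alpha_n}{\beta_m}x^{n-m} + h,$$
so that $f = qg + r$ as required. □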
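The proof of Theorem 1.1 is constructive. This Python sketch follows it directly, storing a polynomial as a list of `Fraction` coefficients, lowest degree first (a representation chosen for illustration; the text fixes none), and repeatedly subtracting $\frac{\alpha_n}{\beta_m}x^{n-m}g$.

```python
from fractions import Fraction


def trim(f):
    """Drop zero leading coefficients (lists are lowest degree first)."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def poly_divmod(f, g):
    """Return (q, r) with f = q*g + r and r == [] or deg r < deg g (Theorem 1.1)."""
    r = trim(Fraction(c) for c in f)
    g = trim(Fraction(c) for c in g)
    if not g:
        raise ZeroDivisionError("g must not be identically zero")
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 0)
    while len(r) >= len(g):
        shift = len(r) - len(g)             # n - m
        c = r[-1] / g[-1]                   # alpha_n / beta_m
        q[shift] += c
        for i, b in enumerate(g):
            r[i + shift] -= c * b           # subtract c * x^(n-m) * g
        r = trim(r)                         # the leading term has cancelled
    return trim(q), r


print(poly_divmod([-1, 0, 0, 1], [-1, 1]))   # x^3 - 1 = (x^2 + x + 1)(x - 1) + 0
```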
4. Properties of Polynomials
Definition 1.2. By an ideal we mean a set $I$ of polynomials satisfying the following conditions: (i) if $f, g \in I$, then $f + g \in I$; (ii) if $f \in I$ and $h$ is any polynomial, then $fh \in I$. Example. The multiples of a fixed polynomial $f(x)$ form an ideal.
Theorem 1.2. Given any ideal $I$, there exists a polynomial $f \in I$ such that any polynomial in $I$ is a multiple of $f$; that is, $I$ is the set of multiples of $f$.

Proof. Let $f$ be a nonzero polynomial in $I$ with the least degree. If $g$ is a polynomial in $I$ which is not a multiple of $f$, then, according to Theorem 1.1, there are polynomials $q(x)$ and $r(x)$ ($r \ne 0$) such that
$$g = qf + r, \qquad \partial^\circ r < \partial^\circ f.$$
Since $f \in I$, it follows from (ii) that $-qf \in I$, and hence from (i) that $g - qf \in I$, that is, $r \in I$. But this contradicts the minimal degree property of $f$. The theorem is proved. □

Definition 1.3. Let $f$ and $g$ be two polynomials. Consider the set of polynomials of the form $mf + ng$, where $m, n$ are polynomials. From Theorem 1.2 we see that this set is identical with the set of polynomials which are multiples of a polynomial $d$. We call this polynomial $d$ the greatest common divisor of $f$ and $g$, and we write $(f, g) = d$. For the sake of uniqueness we shall take the leading coefficient of $(f, g)$ to be $1$, that is, $(f, g)$ is a monic polynomial.

Theorem 1.3. The greatest common divisor $(f, g)$ has the following properties: (i) there are two polynomials $m, n$ such that $(f, g) = mf + ng$; (ii) for every pair of polynomials $m, n$ we have $(f, g) \mid mf + ng$; (iii) if $l \mid f$ and $l \mid g$, then $l \mid (f, g)$. □

Definition 1.4. If $(f, g) = 1$, then we say that $f$ and $g$ are coprime.

Theorem 1.4. Let $p$ be an irreducible polynomial. If $p \mid fg$, then either $p \mid f$ or $p \mid g$.

Proof. If $p \nmid f$, then $(f, p) = 1$. Thus, from Theorem 1.3 there are polynomials $m, n$ such that $mf + np = 1$, so that $mfg + npg = g$. Since $p \mid fg$, it follows that $p \mid g$. □
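Theorem 1.3 (i) can be made explicit by the extended Euclidean algorithm, which repeats the division of Theorem 1.1. A self-contained Python sketch (coefficient lists lowest degree first; all helper names are hypothetical), normalizing the result to be monic as in Definition 1.3 and assuming $f, g$ are not both zero:

```python
from fractions import Fraction


def trim(f):
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def add(f, g):
    n = max(len(f), len(g))
    return trim((f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0) for i in range(n))


def scale(f, c):
    return trim(c * a for a in f)


def mul(f, g):
    out = [Fraction(0)] * max(len(f) + len(g) - 1, 0)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return trim(out)


def divmod_poly(f, g):
    """Division with remainder as in Theorem 1.1 (g must be nonzero)."""
    r, g = trim(map(Fraction, f)), trim(map(Fraction, g))
    q = []
    while len(r) >= len(g):
        term = [Fraction(0)] * (len(r) - len(g)) + [r[-1] / g[-1]]   # (alpha/beta) x^(n-m)
        q = add(q, term)
        r = add(r, scale(mul(term, g), -1))
    return q, r


def ext_gcd(f, g):
    """Return (d, m, n) with d = (f, g) monic and d = m*f + n*g."""
    r0, r1 = trim(map(Fraction, f)), trim(map(Fraction, g))
    m0, m1, n0, n1 = [Fraction(1)], [], [], [Fraction(1)]
    while r1:
        q, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        m0, m1 = m1, add(m0, scale(mul(q, m1), -1))
        n0, n1 = n1, add(n0, scale(mul(q, n1), -1))
    lead = r0[-1]
    return scale(r0, 1 / lead), scale(m0, 1 / lead), scale(n0, 1 / lead)


# (x^2 - 1, x^2 - 3x + 2) = x - 1 = (1/3)(x^2 - 1) - (1/3)(x^2 - 3x + 2)
print(ext_gcd([-1, 0, 1], [2, -3, 1]))
```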
4.2 The Unique Factorization Theorem

Theorem 2.1. Any polynomial can be factorized into a product of irreducible polynomials. If associated polynomials are treated as identical, then, apart from the ordering of the factors, this factorization is unique. □
The theorem can be proved by mathematical induction on the degree of the polynomial.

Theorem 2.2. Let $f(x)$ and $g(x)$ be two polynomials with rational coefficients, and let $f(x)$ be irreducible. Suppose that $f(x) = 0$ and $g(x) = 0$ have a common root. Then $f(x) \mid g(x)$.

Proof. Since $f$ and $g$ have a common zero, and $(f, g) = mf + ng$ vanishes there, it follows that $(f, g) \ne 1$. Let $d(x)$ be the greatest common divisor of $f(x)$ and $g(x)$. Then $d(x)$ and $f(x)$ are associated polynomials, because $f(x)$ is irreducible. Therefore $f(x) \mid g(x)$. □
From this theorem we deduce the following: If $f(x)$ is an irreducible polynomial of degree $n$, then its zeros
$$\vartheta^{(1)}, \ldots, \vartheta^{(n)}$$
are distinct. Moreover, if $\vartheta^{(i)}$ is a zero of another polynomial $g(x)$ with rational coefficients, then the other $n - 1$ numbers are also zeros of $g(x)$.

Theorem 2.3. Let $f$ and $g$ be monic polynomials:
$$f = \prod_v P_v^{a_v}, \qquad g = \prod_v P_v^{b_v} \qquad (a_v \ge 0,\ b_v \ge 0),$$
where the $P_v$ are distinct irreducible monic polynomials. Then
$$(f, g) = \prod_v P_v^{c_v},$$
where $c_v = \min(a_v, b_v)$. □

Definition 2.1. Let $f$ and $g$ be two polynomials. Polynomials which are divisible by both $f$ and $g$ are called common multiples of $f$ and $g$. Those common multiples which have the least degree are called the least common multiples, and we denote by $[f, g]$ the monic least common multiple.

Theorem 2.4. Under the same hypothesis as Theorem 2.3 we have
$$[f, g] = \prod_v P_v^{d_v},$$
where $d_v = \max(a_v, b_v)$. □
From this we deduce:

Theorem 2.5. A least common multiple divides every common multiple.

Theorem 2.6. Let $f, g$ be monic polynomials. Then $fg = [f, g](f, g)$. □
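Theorems 2.3, 2.4 and 2.6 reduce to bookkeeping on exponents. In this sketch the irreducible factors are only string labels (hypothetical, standing for distinct monic irreducibles), and Python's `collections.Counter` holds the exponents; its `&` and `|` operators take minima and maxima of counts, while `+` adds them.

```python
from collections import Counter

# Exponent vectors {P_v: a_v} of two monic factorizations (labels are illustrative).
f = Counter({"x": 2, "x+1": 1, "x^2+1": 3})
g = Counter({"x": 1, "x^2+1": 1, "x^2+x+1": 2})

gcd = f & g    # exponents min(a_v, b_v): Theorem 2.3
lcm = f | g    # exponents max(a_v, b_v): Theorem 2.4
print(dict(gcd), dict(lcm))

# min + max = a + b for every v, i.e. fg = [f, g](f, g): Theorem 2.6
assert gcd + lcm == f + g
```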
4.3 Congruences

Let $m(x)$ be a polynomial. If $m(x) \mid f(x) - g(x)$, then we say that $f(x)$ is congruent to $g(x)$ modulo $m(x)$ and we write
$$f(x) \equiv g(x) \pmod{m(x)}.$$
With respect to any modulus $m(x)$ we have: (i) $f \equiv f \pmod m$; (ii) if $f \equiv g \pmod m$, then $g \equiv f \pmod m$; (iii) if $f \equiv g$, $g \equiv h \pmod m$, then $f \equiv h \pmod m$; (iv) if $f \equiv g$, $f_1 \equiv g_1 \pmod m$, then $f \pm f_1 \equiv g \pm g_1$, $ff_1 \equiv gg_1 \pmod m$. Being congruent is an equivalence relation which partitions the set of polynomials into equivalence classes. From (iv) we see that addition and multiplication can be defined on these classes. We denote by $0$ the class whose members are divisible by $m(x)$. If $m(x)$ is irreducible we can even define division on the set of equivalence classes (except by $0$, of course). Specifically, if $f(x)$ is not a multiple of $m(x)$, then there are polynomials $a(x), b(x)$ such that $a(x)f(x) + b(x)m(x) = 1$, which means that there is a polynomial $a(x)$ such that $a(x)f(x) \equiv 1 \pmod{m(x)}$. We state this as a theorem.
Theorem 3.1. Let $m(x)$ be irreducible. Then any nonzero equivalence class has a reciprocal. That is, if $A$ is a nonzero equivalence class, then there exists a class $B$ such that for any polynomials $f(x)$ and $g(x)$ in $A$ and $B$ respectively we have $f(x)g(x) \equiv 1 \pmod{m(x)}$. □

We now give an example to illustrate the ideas in this section. Let $m(x) = x^2 + 1$, an irreducible polynomial. Each equivalence class contains a unique polynomial $ax + b$ which we may take as the representative. The addition and subtraction of classes is given by $ax + b \pm (a_1x + b_1) = (a \pm a_1)x + (b \pm b_1)$. Multiplication is given by
$$(ax + b)(a_1x + b_1) = aa_1x^2 + (ab_1 + a_1b)x + bb_1 \equiv (ab_1 + a_1b)x + bb_1 - aa_1 \pmod{x^2 + 1}.$$
Using the ordered pair $(a, b)$ to denote the class containing $ax + b$ we then have
$$(a, b) \pm (a_1, b_1) = (a \pm a_1, b \pm b_1), \qquad (a, b)(a_1, b_1) = (ab_1 + ba_1, bb_1 - aa_1).$$
From
$$(ax + b)(-ax + b) \equiv a^2 + b^2 \pmod{x^2 + 1}$$
we see that the inverse of $(a, b)$ is $\left(\frac{-a}{a^2 + b^2}, \frac{b}{a^2 + b^2}\right)$. In other words, we have the arithmetic of the complex number $ai + b$. Extending the idea here, if $m(x)$ is a monic polynomial of degree $n$, then each equivalence class possesses a unique polynomial with degree less than $n$, say
$$a_0 + a_1x + \cdots + a_{n-1}x^{n-1},$$
and the arithmetic of the congruence modulo $m(x)$ becomes the arithmetic of these polynomials. The sum of two such polynomials is obtained by adding the corresponding coefficients, and the product is the ordinary product polynomial reduced modulo $m(x)$.

Exercise 1. Let $\alpha_1, \alpha_2, \alpha_3$ be distinct. Determine a quadratic polynomial $f(x)$ satisfying $f(\alpha_1) = \beta_1$, $f(\alpha_2) = \beta_2$, $f(\alpha_3) = \beta_3$.
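Answer: The Lagrange interpolation formula
$$f(x) = \beta_1\frac{(x - \alpha_2)(x - \alpha_3)}{(\alpha_1 - \alpha_2)(\alpha_1 - \alpha_3)} + \beta_2\frac{(x - \alpha_3)(x - \alpha_1)}{(\alpha_2 - \alpha_3)(\alpha_2 - \alpha_1)} + \beta_3\frac{(x - \alpha_1)(x - \alpha_2)}{(\alpha_3 - \alpha_1)(\alpha_3 - \alpha_2)}.$$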
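A small Python sketch that expands the Lagrange formula into coefficients using exact rational arithmetic; `lagrange_quadratic` and `evaluate` are illustrative names. If the three points are collinear, the $x^2$ coefficient comes out $0$, so "quadratic" here means degree at most $2$.

```python
from fractions import Fraction


def lagrange_quadratic(alphas, betas):
    """Coefficients [c0, c1, c2] of f = c0 + c1 x + c2 x^2 with f(alpha_i) = beta_i.

    Expands each term beta_i (x - alpha_j)(x - alpha_k) / ((alpha_i - alpha_j)(alpha_i - alpha_k)).
    """
    coeffs = [Fraction(0)] * 3
    for i in range(3):
        j, k = (t for t in range(3) if t != i)
        ai, aj, ak = (Fraction(alphas[t]) for t in (i, j, k))
        w = Fraction(betas[i]) / ((ai - aj) * (ai - ak))
        coeffs[2] += w                  # x^2 term
        coeffs[1] -= w * (aj + ak)      # x term
        coeffs[0] += w * aj * ak        # constant term
    return coeffs


def evaluate(coeffs, x):
    return sum(c * x ** e for e, c in enumerate(coeffs))


print(lagrange_quadratic((0, 1, 2), (1, 3, 7)))   # coefficients of x^2 + x + 1
```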
Exercise 2. Let $m_1(x)$ and $m_2(x)$ be two nonassociated irreducible polynomials. Let $f_1(x)$ and $f_2(x)$ be two given polynomials. Prove that there exists a polynomial $f(x)$ such that $f(x) \equiv f_i(x) \pmod{m_i(x)}$, $i = 1, 2$.
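Returning to the example $m(x) = x^2 + 1$ of this section, the pair arithmetic $(a, b) \leftrightarrow ax + b$ is short enough to code directly. A Python sketch (function names are our own) that checks the inverse formula and the agreement with complex multiplication of $b + ai$:

```python
from fractions import Fraction

# The class of a*x + b modulo x^2 + 1 is stored as the pair (a, b).


def add(u, v):
    return (u[0] + v[0], u[1] + v[1])


def mul(u, v):
    (a, b), (a1, b1) = u, v
    return (a * b1 + b * a1, b * b1 - a * a1)    # uses x^2 ≡ -1


def inverse(u):
    a, b = u
    norm = Fraction(a * a + b * b)                # (ax + b)(-ax + b) ≡ a^2 + b^2
    if norm == 0:
        raise ZeroDivisionError("the zero class has no inverse")
    return (-a / norm, b / norm)


print(mul((2, 3), (5, 7)))     # (29, 11), matching (3 + 2i)(7 + 5i) = 11 + 29i
print(mul((1, 1), inverse((1, 1))))
```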
4.4 Integer Coefficients Polynomials

It is clear that the set of polynomials with integer coefficients is closed with respect to addition, subtraction and multiplication. A set of integer coefficient polynomials is called an ideal if (i) $f + g$ belongs to the set whenever $f$ and $g$ belong to the set, (ii) $fg$ belongs to the set whenever $f$ belongs to the set and $g$ is any integer coefficient polynomial.

Theorem 4.1 (Hilbert). Every ideal $A$ possesses a finite number of polynomials $f_1, \ldots, f_n$ with the following property: every polynomial $f \in A$ is representable as $f = g_1f_1 + \cdots + g_nf_n$, where $g_1, \ldots, g_n$ are integer coefficient polynomials.
Proof. 1) Denote by $B$ the set of leading coefficients of members of $A$. We claim that $B$ forms an integral modulus. To see this, observe that if $a, b \in B$, where $f(x) = ax^n + \cdots$ and $g(x) = bx^m + \cdots$, then by (ii) we know that $f(x)x^m, g(x)x^n \in A$, so that
$$f(x)x^m \pm g(x)x^n = (a \pm b)x^{m+n} + \cdots$$
are in $A$. Therefore $a \pm b \in B$, which proves our claim. From Theorem 1.4.3 members of $B$ are multiples of an integer $d$. Let the corresponding polynomial with leading coefficient $d$ be
$$f_1(x) = dx^l + d_1x^{l-1} + \cdots.$$
2) Let $f \in A$. Then there are two polynomials $q(x)$ and $r(x)$ such that $f(x) = q(x)f_1(x) + r(x)$, where $\partial r < \partial f_1$ or $r = 0$. This is certainly so if the degree of $f$ is less than that of $f_1$. If $f(x) = ax^n + \cdots + a_n$ $(n \ge l)$, then by 1) we see that $d \mid a$, and
$$f(x) - \frac{a}{d}x^{n-l}f_1(x)$$
is a polynomial with degree at most $n - 1$. If the degree here is greater than or equal to $l$, then its leading coefficient is again divisible by $d$. Continuing the argument, we see that our claim is valid.
3) If every member of $A$ has degree at least $l$, then the theorem is proved. Otherwise we let $d'$ be the greatest common divisor of the leading coefficients of members of $A$ whose degrees are less than $l$, and we let
$$f_2 = d'x^{l'} + d_1'x^{l'-1} + \cdots \qquad (d \mid d')$$
be the corresponding polynomial in $A$. From the above, we see that members of $A$ whose degree lies between $l'$ and $l$ can be written as $f(x) = q(x)f_2(x) + r(x)$, where $\partial r < \partial f_2$ or $r = 0$. Continuing this argument, the theorem is proved. □
4.5 Polynomial Congruences with a Prime Modulus
In this section all the polynomials have integer coefficients and $p$ is a fixed prime number.

Definition 5.1. If the corresponding coefficients of two polynomials $f(x)$ and $g(x)$ differ by multiples of $p$, then we say that $f(x)$ and $g(x)$ are congruent modulo $p$, and we write $f(x) \equiv g(x) \pmod p$. By the degree $\partial^\circ f$ of $f(x)$ modulo $p$ we mean the highest degree among the terms of $f(x)$ whose coefficient is not a multiple of $p$. For example, $7x^2 + 16x + 9 \equiv 2x + 2 \pmod 7$, and $\partial^\circ(7x^2 + 16x + 9) = 1$ with respect to the modulus 7. But with respect to the modulus 3, $\partial^\circ(7x^2 + 16x + 9) = 2$. Clearly we have (i) $f(x) \equiv f(x) \pmod p$; (ii) if $f \equiv g \pmod p$, then $g \equiv f \pmod p$; (iii) if $f \equiv g$ and $g \equiv h \pmod p$, then $f \equiv h \pmod p$; (iv) if $f \equiv g$ and $f_1 \equiv g_1 \pmod p$, then $f \pm f_1 \equiv g \pm g_1$ and $ff_1 \equiv gg_1 \pmod p$. We note particularly that
$$(f(x))^p \equiv f(x^p) \pmod p.$$
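The congruence $(f(x))^p \equiv f(x^p) \pmod p$ and the notion of degree modulo $p$ are easy to check mechanically. This Python sketch makes two assumptions of its own. First, polynomials are stored as coefficient lists, lowest degree first. Second, $f \equiv 0$ is given degree $-1$. The sketch reproduces the example $7x^2 + 16x + 9$ and tests the congruence for a few primes. The helper names are illustrative.

```python
def reduce_mod(coeffs, p):
    """Reduce the coefficients mod p and drop leading zero coefficients."""
    c = [a % p for a in coeffs]
    while c and c[-1] == 0:
        c.pop()
    return c


def degree_mod(coeffs, p):
    """Degree of f modulo p; -1 is used here for f ≡ 0 (mod p)."""
    return len(reduce_mod(coeffs, p)) - 1


def mul_mod(f, g, p):
    """Product of two coefficient lists, reduced mod p."""
    if not f or not g:
        return []
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return reduce_mod(out, p)


def pow_mod(f, e, p):
    """(f(x))^e mod p by repeated multiplication."""
    result = [1]
    for _ in range(e):
        result = mul_mod(result, f, p)
    return result


def compose_x_to_p(f, p):
    """f(x^p): the coefficient of x^i moves to x^(i p)."""
    if not f:
        return []
    out = [0] * ((len(f) - 1) * p + 1)
    for i, a in enumerate(f):
        out[i * p] = a
    return reduce_mod(out, p)


f = [9, 16, 7]                      # 7x^2 + 16x + 9
print(reduce_mod(f, 7), degree_mod(f, 7), degree_mod(f, 3))   # [2, 2] 1 2
```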
Definition 5.2. Let $f(x)$ and $g(x)$ be polynomials with $g(x)$ not identically zero mod $p$. If there is a polynomial $h(x)$ such that $f(x) \equiv h(x)g(x) \pmod p$, then we say that $g(x)$ divides $f(x)$ modulo $p$. We call $g(x)$ a divisor of $f(x)$ modulo $p$, and we write $g(x) \mid f(x) \pmod p$.

Example. From $x^5 + 3x^4 - 4x^3 + 2 \equiv (2x^2 - 3)(3x^3 - x^2 + 1) \pmod 5$ we see that $2x^2 - 3 \mid x^5 + 3x^4 - 4x^3 + 2 \pmod 5$.

We have the following: (i) $f(x) \mid f(x) \pmod p$; (ii) if $f(x) \mid g(x)$ and $g(x) \mid f(x) \pmod p$, then $f(x)$ and $g(x)$ differ only by a constant factor; that is, there exists an integer $a$ such that $f(x) \equiv ag(x) \pmod p$. In this case we say that $f(x)$ and $g(x)$ are associated modulo $p$. It is easy to see that every polynomial has $p - 1$ associates modulo $p$. Moreover, there is a unique monic associated polynomial. (iii) If $f \mid g$ and $g \mid h \pmod p$, then $f \mid h \pmod p$. (iv) Let $f(x)$ and $g(x)$ be two polynomials with $g(x)$ not identically zero modulo $p$. Then there are two polynomials $q(x)$ and $r(x)$ such that $f(x) \equiv q(x)g(x) + r(x) \pmod p$, where either $\partial^\circ r < \partial^\circ g$ or $r(x) \equiv 0 \pmod p$.

Definition 5.3. If a polynomial $f(x)$ cannot be factorized into a product of two polynomials with smaller degrees mod $p$, then we say that $f(x)$ is an irreducible polynomial mod $p$, or that $f(x)$ is prime mod $p$.

Example. We take $p = 3$. There are three nonassociated linear polynomials, namely $x$, $x + 1$, $x + 2$, which are irreducible. There are nine nonassociated quadratic polynomials, namely $x^2$, $x^2 + x$, $x^2 + 2x$, $x^2 + 1$, $x^2 + x + 1$, $x^2 + 2x + 1$, $x^2 + 2$, $x^2 + x + 2$, $x^2 + 2x + 2$. Of these, 6 $(= (x + a)(x + b))$ are reducible, and the three irreducible ones are $x^2 + 1$, $x^2 + x + 2$, $x^2 + 2x + 2$.
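The $p = 3$ example can be reproduced by brute force. A quadratic is reducible mod $p$ exactly when it has a linear factor, that is, a zero mod $p$. This Python sketch therefore simply lists the monic quadratics without zeros. The helper names are illustrative, and the same function can be run for other primes for comparison.

```python
from itertools import product


def has_root_mod(coeffs, p):
    """True if the polynomial (coefficients lowest degree first) has a zero mod p."""
    return any(sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p == 0
               for x in range(p))


def monic_irreducible_quadratics(p):
    """All monic x^2 + b x + c that are irreducible mod p, as tuples (c, b, 1).

    A quadratic is reducible mod p exactly when it has a linear factor x - r,
    i.e. a zero r mod p, so a root test is enough in degree 2.
    """
    return [(c, b, 1) for b, c in product(range(p), repeat=2)
            if not has_root_mod((c, b, 1), p)]


print(monic_irreducible_quadratics(3))   # [(1, 0, 1), (2, 1, 1), (2, 2, 1)]
```

The three tuples are $x^2 + 1$, $x^2 + x + 2$ and $x^2 + 2x + 2$, in agreement with the text. Note that the root test is only valid up to degree 3; in higher degrees a polynomial can be reducible without having a zero.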
We note that if a polynomial is irreducible mod $p$ (and its leading coefficient is not divisible by $p$), then it is irreducible. From this we deduce that $x^2 + 2x + 2$ has no rational zeros. The determination of the number of irreducible polynomials mod $p$ of degree $n$ is an interesting problem, which we shall solve in §9.

Theorem 5.1. Any polynomial can be written as a product of irreducible polynomials mod $p$, and this product representation is unique apart from associates and ordering of the factors. □

We can define, similarly to §1, the greatest common divisor and the least common multiple. If we denote by $(f, g)$ the monic greatest common divisor, then we have the following.

Theorem 5.2. Given polynomials $f(x)$ and $g(x)$, there are polynomials $m(x)$ and $n(x)$ such that $m(x)f(x) + n(x)g(x) \equiv (f(x), g(x)) \pmod p$. □
4.6 On Several Theorems Concerning Factorizations
Definition 6.1. Let $f(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots$ be a polynomial. The polynomial $na_nx^{n-1} + (n - 1)a_{n-1}x^{n-2} + \cdots$ is called the derivative of $f(x)$ and is denoted by $f'(x)$. Clearly we have $(f(x) + g(x))' = f'(x) + g'(x)$, and it is not difficult to prove that $(f(x)g(x))' = f'(x)g(x) + g'(x)f(x)$.

Definition 6.2. If a polynomial $f(x)$ is divisible by the square of a nonconstant polynomial mod $p$, then we say that $f(x)$ has repeated factors mod $p$. For example, $x^5 + x^4 - x^3 - x^2 + x + 1 \equiv (x + 1)(x^2 + 1)^2 \pmod 3$ has the repeated factor $(x^2 + 1)^2$ modulo 3.
Theorem 6.1. A necessary and sufficient condition for $f(x)$ to have repeated factors mod $p$ is that the degree of $(f(x), f'(x))$ is at least 1. □

Theorem 6.2. If $p \nmid n$, then $x^n - 1$ has no repeated factors mod $p$. □

Theorem 6.3. Let $(m, n) = d$. Then $(x^m - 1, x^n - 1) \equiv x^d - 1 \pmod p$. □
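Theorems 6.1 to 6.3 can be explored with Euclid's algorithm for polynomials mod $p$. The following Python sketch is an illustration under stated assumptions, not the book's method:

* polynomials are coefficient lists, lowest degree first;
* $p$ is prime, so leading coefficients can be inverted with Python's built-in `pow(c, -1, p)` (Python 3.8+);
* the gcd is normalized to be monic, as in §4.5.

```python
def trim(f):
    """Drop leading zero coefficients (lists are lowest degree first)."""
    while f and f[-1] == 0:
        f.pop()
    return f


def poly_rem(f, g, p):
    """Remainder of f on division by g modulo p (g not ≡ 0 mod p)."""
    f = trim([a % p for a in f])
    g = trim([a % p for a in g])
    if not g:
        raise ZeroDivisionError("divisor is ≡ 0 (mod p)")
    inv = pow(g[-1], -1, p)            # leading coefficient of g is invertible mod p
    while len(f) >= len(g):
        shift = len(f) - len(g)
        c = f[-1] * inv % p
        for i, b in enumerate(g):
            f[i + shift] = (f[i + shift] - c * b) % p
        trim(f)
    return f


def poly_gcd(f, g, p):
    """Monic greatest common divisor (f, g) mod p, by Euclid's algorithm."""
    f = trim([a % p for a in f])
    g = trim([a % p for a in g])
    while g:
        f, g = g, poly_rem(f, g, p)
    if not f:
        return []
    inv = pow(f[-1], -1, p)
    return [a * inv % p for a in f]


def derivative(f):
    """Formal derivative: a_i x^i -> i a_i x^(i-1)."""
    return [i * a for i, a in enumerate(f)][1:]


def x_power_minus_one(n):
    """Coefficients of x^n - 1."""
    return [-1] + [0] * (n - 1) + [1]


# Theorem 6.3 with m = 12, n = 18, p = 5: the gcd is x^6 - 1 ≡ x^6 + 4.
print(poly_gcd(x_power_minus_one(12), x_power_minus_one(18), 5))  # [4, 0, 0, 0, 0, 0, 1]
```

For Theorem 6.1, `poly_gcd(f, derivative(f), 3)` with `f = [1, 1, -1, -1, 1, 1]` (the example of Definition 6.2) returns $x^2 + 1$, which exposes the repeated factor.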
Theorem 6.4. Let (m, n) = d. Then
4.7 Double Moduli Congruences
Definition 7.1. Let $p$ be a prime number and $\varphi(x)$ be a polynomial. If $f_1(x) - f_2(x)$ is a multiple of $\varphi(x)$ mod $p$, then we say that $f_1$ and $f_2$ are congruent to the double moduli $p$, $\varphi(x)$, and we write
$$f_1(x) \equiv f_2(x) \quad (\mathrm{modd}\ p, \varphi(x)).$$
For example, $x^5 + 3x^4 + x^2 + 4x + 3 \equiv 0\ (\mathrm{modd}\ 5, 2x^2 - 3)$. Double moduli congruences have the following properties:
1) $f(x) \equiv f(x)\ (\mathrm{modd}\ p, \varphi(x))$;
2) if $f \equiv g\ (\mathrm{modd}\ p, \varphi)$, then $g \equiv f\ (\mathrm{modd}\ p, \varphi)$;
3) if $f \equiv g$ and $g \equiv h\ (\mathrm{modd}\ p, \varphi)$, then $f \equiv h\ (\mathrm{modd}\ p, \varphi)$;
4) if $f \equiv g$ and $f_1 \equiv g_1\ (\mathrm{modd}\ p, \varphi)$, then $f \pm f_1 \equiv g \pm g_1$ and $ff_1 \equiv gg_1\ (\mathrm{modd}\ p, \varphi)$;
5) suppose that the degree of $\varphi(x)$ (mod $p$) is $n$. Then every polynomial is congruent to one of the following polynomials:
$$a_0x^{n-1} + a_1x^{n-2} + \cdots + a_{n-1}, \qquad 0 \le a_i \le p - 1. \quad (1)$$
It is clear that there are $p^n$ polynomials in (1), that no two of them are congruent $(\mathrm{modd}\ p, \varphi(x))$, and that any polynomial must be congruent to one of them $(\mathrm{modd}\ p, \varphi(x))$.

Definition 7.2. We call the $p^n$ polynomials in (1) a complete residue system $(\mathrm{modd}\ p, \varphi(x))$. By discarding those polynomials which are not coprime with $\varphi(x)$, we obtain a reduced residue system $(\mathrm{modd}\ p, \varphi(x))$.

Theorem 7.1. Let $(g(x), \varphi(x)) = 1$. Then, as $f(x)$ runs through a complete (or reduced) residue system $(\mathrm{modd}\ p, \varphi(x))$, so does $f(x)g(x)$.

Proof. If $g(x)f_1(x) \equiv g(x)f_2(x)\ (\mathrm{modd}\ p, \varphi(x))$, then from $(g(x), \varphi(x)) = 1$ we deduce that $f_1(x) \equiv f_2(x)\ (\mathrm{modd}\ p, \varphi(x))$. The required result follows easily from this. □
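To see Theorem 7.1 on a small case, this Python sketch multiplies a complete residue system $(\mathrm{modd}\ 3, x^2 + 1)$ by $g(x) = x + 1$. It assumes $\varphi(x)$ is monic, so that $x^n$ can be eliminated directly. The names are hypothetical. When $(g, \varphi) \ne 1$, distinct residues collide. An example is $g = x + 2$ with $x^2 + 1 \equiv (x + 2)(x + 3) \pmod 5$.

```python
from itertools import product


def mul_modd(f, g, phi, p):
    """f(x) g(x) reduced (modd p, phi(x)).

    f, g are tuples (a_0, ..., a_{n-1}) of residues, lowest degree first;
    phi is monic of degree n, given as a list of n + 1 coefficients ending in 1.
    """
    n = len(phi) - 1
    prod = [0] * (2 * n - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            prod[i + j] = (prod[i + j] + a * b) % p
    # Eliminate x^k for k >= n using x^n ≡ -(phi_0 + phi_1 x + ... + phi_{n-1} x^{n-1}).
    for k in range(2 * n - 2, n - 1, -1):
        c, prod[k] = prod[k], 0
        for i in range(n):
            prod[k - n + i] = (prod[k - n + i] - c * phi[i]) % p
    return tuple(prod[:n])


def complete_residue_system(p, n):
    """The p^n polynomials (1): all coefficient tuples with 0 <= a_i <= p - 1."""
    return list(product(range(p), repeat=n))


# p = 3, phi = x^2 + 1 (irreducible mod 3), g = x + 1, which is coprime to phi.
phi, p = [1, 0, 1], 3
system = complete_residue_system(p, 2)
images = [mul_modd(f, (1, 1), phi, p) for f in system]
print(sorted(images) == sorted(system))   # True
```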
4.8 Generalization of Fermat's Theorem
Let $p$ be a prime number, and … is called the order of $f(x)$. As before, it can be proved that $l$ divides $p^n - 1$, and that there are precisely $\varphi(l)$ polynomials having order $l$. There are therefore $\varphi(p^n - 1)$ polynomials with order $p^n - 1$, and these polynomials are called the primitive roots $(\mathrm{modd}\ p, \varphi(x))$. If $f(x)$ is a primitive root, then $(f(x))^v$, $v = 1, 2, \ldots, p^n - 1$, represent all the nonzero incongruent polynomials $(\mathrm{modd}\ p, \varphi(x))$. It is not difficult to prove that the product $\prod_v (X - f_v(x))$, where $f_v$ runs over all the primitive roots, is equal to
$$\frac{(X^{p^n-1} - 1)\prod_{q_i < q_j}\left(X^{(p^n-1)/q_iq_j} - 1\right)\cdots}{\prod_{q_i}\left(X^{(p^n-1)/q_i} - 1\right)\cdots}, \quad (1)$$
where $q_i$ runs over all the distinct prime divisors of $p^n - 1$.

Exercise. Prove that the product of all the nonzero incongruent polynomials is congruent to $-1$ $(\mathrm{modd}\ p, \varphi(x))$.
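For a concrete instance of this section, take $p = 3$ and $\varphi(x) = x^2 + 1$, which is irreducible mod 3. This choice is ours, not the text's. The Python sketch below does three things:

* computes the order of every nonzero residue;
* counts the primitive roots, of which there should be $\varphi(3^2 - 1) = \varphi(8) = 4$;
* checks the Exercise, namely that the product of all nonzero residues is $\equiv -1$.

The order loop only terminates because every nonzero residue is invertible when $\varphi$ is irreducible mod $p$.

```python
from itertools import product

P, PHI = 3, [1, 0, 1]      # illustrative choice: p = 3, phi(x) = x^2 + 1, irreducible mod 3
N = len(PHI) - 1           # degree of phi
ONE = (1,) + (0,) * (N - 1)


def mul_modd(f, g):
    """f(x) g(x) reduced (modd P, PHI); PHI is assumed monic."""
    prod = [0] * (2 * N - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            prod[i + j] = (prod[i + j] + a * b) % P
    for k in range(2 * N - 2, N - 1, -1):
        c, prod[k] = prod[k], 0
        for i in range(N):
            prod[k - N + i] = (prod[k - N + i] - c * PHI[i]) % P
    return tuple(prod[:N])


def order(f):
    """Least l >= 1 with f^l ≡ 1 (modd P, PHI); f must be a nonzero residue."""
    power, l = f, 1
    while power != ONE:
        power, l = mul_modd(power, f), l + 1
    return l


nonzero = [f for f in product(range(P), repeat=N) if any(f)]
orders = {f: order(f) for f in nonzero}
primitive_roots = [f for f, l in orders.items() if l == P ** N - 1]

wilson = ONE
for f in nonzero:
    wilson = mul_modd(wilson, f)

print(len(primitive_roots), wilson)   # 4 (2, 0): four primitive roots, product ≡ -1
```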
4.11 Summary
We may summarize the discussions of this chapter in the language of modern algebra, or abstract algebra. We have a set of objects which we denote by $R$. The number of objects in $R$ may be finite or infinite.
1. If we can define the operations of addition and subtraction in $R$, and these operations are closed in $R$, then we call $R$ an integral modulus. For example, the set of even integers forms an integral modulus, and so does the set of polynomials with even integer coefficients. An integral modulus is also known as an Abelian group.
2. If we can define operations of addition, subtraction and multiplication which are closed in $R$, then we call $R$ a ring. For example, the set of integers forms a ring, and so does the set of integer coefficient polynomials.
3. By an ideal $E$ we mean a subset of a ring $R$ which satisfies the following conditions: i) if $a, b \in E$, then $a - b \in E$; ii) if $a \in E$ and $r \in R$, then $ar \in E$. For example, the subset of even integers forms an ideal in the ring of integers. In the ring of integer coefficient polynomials, we may form the ideal of polynomials having the form $f(x)(x^2 + 1) + 2g(x)x$, where $f$ and $g$ run over all integer coefficient polynomials.
4. If in $R$ we can define the operations of addition, subtraction, multiplication and division (except by 0), and these operations are closed in $R$, then we call $R$ a field. For example, the set of rational numbers forms a field. The residue classes modulo a fixed irreducible polynomial form a field, which is known as an algebraic extension field in modern algebra. Next, take a prime number $p$ and a polynomial $\varphi(x)$ of degree $n$ which is irreducible mod $p$. The residue classes with respect to the double modulus $p$ and $\varphi(x)$ form a field with $p^n$ elements.
Students who master the various concrete examples discussed in this chapter will find it easier to learn the abstract concepts of modern algebra.
Chapter 5. The Distribution of Prime Numbers
In this chapter we give some basic results concerning the distribution of prime numbers. The reader will need only some knowledge of the calculus. This chapter is a first introduction to analytic number theory, and we shall omit all the deeper investigations.
5.1 Order of Infinity
In the discussion of the distribution of prime numbers we must understand how to compare the orders of growth of two functions. We often use the symbols $\ll$, $O$, $o$,
the meanings of which we shall now give. Let n be a positive integer which tends to infinity (or x a continuous variable which tends to infinity). Let : n(2k+1) >: _ _ _  _ ..,....,.,..._ _ 7 72 k + 1  8 t(k + I)
1 2k + 2
>: 
I
n
>:   
78 H(2 k+1) 78 H(n)·
(8)
This holds for all n
~
2. Therefore I
H(n)
8
n
0
~n(n)~
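$$\sum_{p \le \xi}\log\left(1 - \frac1p\right) = -\sum_{p \le \xi}\frac1p + \sum_{p \le \xi}\left[\log\left(1 - \frac1p\right) + \frac1p\right]$$
$$= -\log\log\xi - C_7 + O\left(\frac{1}{\log\xi}\right) + \sum_{p}\left(\log\left(1 - \frac1p\right) + \frac1p\right) - \sum_{p > \xi}\left(\log\left(1 - \frac1p\right) + \frac1p\right)$$
$$= -\log\log\xi + C_{13} + O\left(\frac{1}{\log\xi}\right),$$
since $|\log(1 - 1/p) + 1/p| \le 1/p^2$, so that the sum over $p > \xi$ is $O(1/\xi)$. Here
$$C_{13} = -C_7 + \sum_{p}\left(\log\left(1 - \frac1p\right) + \frac1p\right).$$
Therefore
$$\prod_{p \le \xi}\left(1 - \frac1p\right) = e^{-\log\log\xi + C_{13} + O(1/\log\xi)} = \frac{C_{12}}{\log\xi}\left(1 + O\left(\frac{1}{\log\xi}\right)\right) \qquad (C_{12} = e^{C_{13}}),$$
where we have used $e^{O(1/\log\xi)} = 1 + O(1/\log\xi)$. The theorem is proved. □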
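The proof shows that $\prod_{p \le \xi}(1 - 1/p)\log\xi$ tends to the constant $C_{12}$, but it does not evaluate it. As a numerical illustration only, this Python sketch computes the product with a sieve. The comparison value $e^{-\gamma} \approx 0.5615$, where $\gamma$ is Euler's constant, is the classical evaluation due to Mertens. It is an outside fact, not something derived in the text.

```python
import math


def primes_upto(n):
    """Sieve of Eratosthenes: all primes p <= n."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def mertens_product(xi):
    """prod_{p <= xi} (1 - 1/p), multiplied by log(xi)."""
    prod = 1.0
    for p in primes_upto(xi):
        prod *= 1.0 - 1.0 / p
    return prod * math.log(xi)


EULER_GAMMA = 0.5772156649015329   # Euler's constant (outside fact)
for xi in (10**2, 10**4, 10**6):
    print(xi, round(mertens_product(xi), 4), round(math.exp(-EULER_GAMMA), 4))
```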
Theorems 9.2 and 9.3 are quantitative elaborations of Theorems 4.3 and 4.5.

Exercise 1. Let $p_n$ denote the $n$th prime. Prove that there are constants $c_1, c_2$ such that
$$c_1 n\log n < p_n < c_2 n\log n \qquad (n \ge 2).$$
Exercise 2. Prove that there exists a positive constant $c$ such that
$$\varphi(n) > \frac{cn}{\log\log n}, \qquad n \ge 3.$$
Exercise 3. Prove that the infinite series
$$\sum_p \frac{1}{p(\log\log p)^h}$$
converges or diverges according to whether $h > 1$ or $h \le 1$. Here $\sum_p$ represents the summation over all the prime numbers.
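5.10 The Number of Prime Factors of n
Let $n$ be a positive integer. We denote by $\omega(n)$ the number of distinct prime factors of $n$, and by $\Omega(n)$ the total number of prime factors of $n$. That is, if $n = p_1^{a_1}\cdots p_s^{a_s}$, then
$$\omega(n) = s, \qquad \Omega(n) = a_1 + \cdots + a_s. \quad (1)$$
If $n$ is a prime, then $\omega(n) = \Omega(n) = 1$. But as $n$ tends to infinity through powers of 2, we have
$$\Omega(n) = \frac{\log n}{\log 2} \to +\infty,$$
and if $n = p_1p_2\cdots p_s$ is the product of the first $s$ primes, then as $n \to \infty$, $\omega(n) = s \to \infty$. Thus the behaviour of $\omega(n)$ and $\Omega(n)$ is rather irregular, and there is certainly no asymptotic formula for them. However, we do have the following.

Theorem 10.1. There are positive constants $c_1, c_2$ such that
$$\sum_{n \le x}\omega(n) = x\log\log x + c_1x + o(x), \quad (2)$$
$$\sum_{n \le x}\Omega(n) = x\log\log x + c_2x + o(x). \quad (3)$$

Proof. 1) We have
$$\sum_{n \le x}\omega(n) = \sum_{n \le x}\sum_{p \mid n} 1 = \sum_{p \le x}\left[\frac{x}{p}\right] = \sum_{p \le x}\frac{x}{p} + O(\pi(x)),$$
and so (2) follows from Theorem 9.2 and Theorem 6.2.
2) We have
$$\sum_{n \le x}\Omega(n) = \sum_{n \le x}\sum_{p^m \mid n} 1 = \sum_{p^m \le x}\left[\frac{x}{p^m}\right] = \sum_{n \le x}\omega(n) + \sum_{\substack{p^m \le x\\ m \ge 2}}\left[\frac{x}{p^m}\right]$$
and, by Theorem 6.2,
$$\sum_{\substack{p^m \le x\\ m \ge 2}} 1 \le \sum_{p^2 \le x}\left[\frac{\log x}{\log 2}\right] \le \frac{\log x}{\log 2}\,\pi(\sqrt{x}) = o(x).$$
Therefore
$$\sum_{n \le x}\Omega(n) = \sum_{n \le x}\omega(n) + x\sum_{\substack{p^m \le x\\ m \ge 2}}\frac{1}{p^m} + o(x).$$
But the series
$$\sum_p\sum_{m \ge 2}\frac{1}{p^m} = \sum_p\left(\frac{1}{p^2} + \frac{1}{p^3} + \cdots\right) = \sum_p\frac{1}{p(p - 1)} = c$$
converges, so that
$$\sum_{n \le x}\Omega(n) = \sum_{n \le x}\omega(n) + x(c + o(1)) + o(x) = x\log\log x + c_2x + o(x). \quad \square$$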
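Theorem 10.1 can be illustrated numerically. This Python sketch sieves $\omega(n)$ and $\Omega(n)$ for $n \le 10^5$. It then prints $x^{-1}\sum\omega(n) - \log\log x$ and $x^{-1}\sum\Omega(n) - \log\log x$, which serve as finite-$x$ stand-ins for $c_1$ and $c_2$. By the proof, their difference should be close to the series $c = \sum_p 1/(p(p - 1))$. The cut-off $10^5$ is arbitrary. Since the error terms are only $o(x)$, the printed values are rough.

```python
import math


def omega_tables(x):
    """Lists w, W with w[n] = ω(n) and W[n] = Ω(n) for 0 <= n <= x."""
    w = [0] * (x + 1)
    W = [0] * (x + 1)
    for p in range(2, x + 1):
        if w[p] == 0:                      # no smaller prime divides p, so p is prime
            for m in range(p, x + 1, p):
                w[m] += 1
            q = p
            while q <= x:                  # every p^m dividing n adds 1 to Ω(n)
                for m in range(q, x + 1, q):
                    W[m] += 1
                q *= p
    return w, W


x = 10**5
w, W = omega_tables(x)
L = math.log(math.log(x))
c1_est = sum(w[1:]) / x - L                # finite-x stand-in for c_1 in (2)
c2_est = sum(W[1:]) / x - L                # finite-x stand-in for c_2 in (3)
c = sum(1 / (p * (p - 1)) for p in range(2, x + 1) if W[p] == 1)   # the series c
print(round(c1_est, 3), round(c2_est, 3), round(c2_est - c1_est, 3), round(c, 3))
```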
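Theorem 10.2 (Hardy-Ramanujan). Let $\varepsilon > 0$, and let $f(n)$ denote either $\omega(n)$ or $\Omega(n)$. Then the number of positive integers $n \le x$ satisfying
$$|f(n) - \log\log n| > (\log\log n)^{\frac12 + \varepsilon} \quad (4)$$
is $o(x)$ as $x \to \infty$.

Proof (Turán). We have $\log\log x - 1 < \log\log n \le \log\log x$ when $x^{1/e} < n \le x$. Moreover, the number of positive integers $n \le x^{1/e}$ is $[x^{1/e}] = o(x)$. It therefore suffices to prove that the number of positive integers $n \le x$ satisfying
$$|f(n) - \log\log x| > (\log\log x)^{\frac12 + \varepsilon} \quad (5)$$
is $o(x)$ as $x \to \infty$. Next, from $\Omega(n) \ge \omega(n)$, and by (2) and (3),
$$\sum_{n \le x}(\Omega(n) - \omega(n)) = O(x),$$
so that the number of positive integers $n \le x$ satisfying $\Omega(n) - \omega(n) > (\log\log x)^{1/2}$ is
$$O\left(\frac{x}{(\log\log x)^{1/2}}\right) = o(x).$$
Therefore we need only consider the case $f(n) = \omega(n)$. We consider a pair $p, q$ of distinct prime divisors of $n$ ($p, q$ and $q, p$ are treated as two different pairs). Each $p$ may take $\omega(n)$ values, and for each fixed $p$, $q$ may take $\omega(n) - 1$ values. Therefore we have
$$\omega(n)(\omega(n) - 1) = \sum_{\substack{pq \mid n\\ p \ne q}} 1 = \sum_{pq \mid n} 1 - \sum_{p^2 \mid n} 1.$$
Summing over $n = 1, 2, \ldots, [x]$ we have
$$\sum_{n \le x}\omega^2(n) - \sum_{n \le x}\omega(n) = \sum_{pq \le x}\left[\frac{x}{pq}\right] - \sum_{p^2 \le x}\left[\frac{x}{p^2}\right]. \quad (6)$$
Since
$$\sum_{p^2 \le x}\left[\frac{x}{p^2}\right] \le x\sum_p\frac{1}{p^2} = O(x)$$
and
$$\sum_{pq \le x}\left[\frac{x}{pq}\right] = x\sum_{pq \le x}\frac{1}{pq} + O(x),$$
it follows from (2) and (6) that
$$\sum_{n \le x}\omega^2(n) = x\sum_{pq \le x}\frac{1}{pq} + O(x\log\log x). \quad (7)$$
Now
$$\Big(\sum_{p \le \sqrt{x}}\frac1p\Big)^2 \le \sum_{pq \le x}\frac{1}{pq} \le \Big(\sum_{p \le x}\frac1p\Big)^2,$$
and $\sum_{p \le \sqrt{x}} 1/p = \log\log x + O(1)$. Hence both outer members of the above are
$$(\log\log x + O(1))^2 = (\log\log x)^2 + O(\log\log x).$$
It now follows from (7) that
$$\sum_{n \le x}\omega^2(n) = x(\log\log x)^2 + O(x\log\log x), \quad (8)$$
and so
$$\sum_{n \le x}(\omega(n) - \log\log x)^2 = \sum_{n \le x}\omega^2(n) - 2\log\log x\sum_{n \le x}\omega(n) + [x](\log\log x)^2$$
$$= x(\log\log x)^2 + O(x\log\log x) - 2\log\log x\,(x\log\log x + O(x)) + (x + O(1))(\log\log x)^2 = O(x\log\log x). \quad (9)$$
Given any $\delta > 0$, if there are at least $\delta x$ positive integers $n \le x$ such that (5) holds, then
$$\sum_{n \le x}(\omega(n) - \log\log x)^2 \ge \delta x(\log\log x)^{1 + 2\varepsilon}, \quad (10)$$
which contradicts (9) when $x$ is large. Therefore the number of positive integers $n \le x$ such that (5) holds is $o(x)$, and the theorem is proved. □

From this we see that
$$\omega(n) \sim \log\log n \quad\text{and}\quad \Omega(n) \sim \log\log n$$
for almost all $n$.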
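Turán's argument rests on the bound (9) for $\sum_{n \le x}(\omega(n) - \log\log x)^2$. This Python sketch computes that sum divided by $x\log\log x$. It also reports the proportion of $n \le x$ satisfying (5), for the illustrative choice $\varepsilon = 1/4$. Because $\log\log x$ grows extremely slowly, these figures show only the trend at modest $x$, not the limit.

```python
import math


def omega_upto(x):
    """w[n] = ω(n), the number of distinct prime factors, for 0 <= n <= x."""
    w = [0] * (x + 1)
    for p in range(2, x + 1):
        if w[p] == 0:                      # p is prime
            for m in range(p, x + 1, p):
                w[m] += 1
    return w


def turan_statistics(x, eps=0.25):
    """Return (S / (x loglog x), proportion of n <= x satisfying (5)).

    S = sum_{n <= x} (ω(n) - loglog x)^2 is the sum estimated in (9).
    """
    w = omega_upto(x)
    L = math.log(math.log(x))
    S = sum((w[n] - L) ** 2 for n in range(1, x + 1))
    bad = sum(1 for n in range(1, x + 1) if abs(w[n] - L) > L ** (0.5 + eps))
    return S / (x * L), bad / x


for x in (10**3, 10**4, 10**5):
    print(x, [round(v, 3) for v in turan_statistics(x)])
```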
5.11 A Prime Representing Function
Theorem 11.1 (Miller). There exists a fixed number $\omega$ such that if
$$\alpha_1 = 2^{\omega},\ \alpha_2 = 2^{\alpha_1},\ \ldots,\ \alpha_{n+1} = 2^{\alpha_n},\ \ldots,$$
then $[\alpha_n]$ is always prime.

Proof. We construct a sequence of primes $\{p_n\}$ by induction. Take $p_1 = 3$. By Theorem 7.1 there exists a prime $p_{n+1}$ satisfying
$$2^{p_n} < p_{n+1} < 2^{p_n + 1}.$$
If $p_{n+1} + 1 = 2^{p_n + 1}$, then $p_{n+1} = 2^{p_n + 1} - 1$ cannot be prime, because it has the divisor $2^{\frac12(p_n + 1)} - 1$. Therefore
$$2^{p_n} < p_{n+1} < p_{n+1} + 1 < 2^{p_n + 1}.$$
… then Dirichlet's theorem follows. For suppose there exists $n$ such that $an + b = p_1\ (> b)$ is prime. Replacing $a$ by $ap_1$, suppose there exists $n$ such that $ap_1n + b = p_2\ (> p_1)$ is prime, and so on. Then there are infinitely many primes of the form $an + b$.

Theorem 12.1. Let $k > 1$. Then there are infinitely many primes of the form $kn + 1$.

From what we said earlier, it suffices to prove that there always exists a prime of the form $kn + 1$. The roots of the equation $x^k = 1$ are given by
$$e^{2\pi i a/k}, \qquad a = 0, 1, \ldots, k - 1.$$
Let
$$F_n(x) = \prod_{(a, n) = 1}\left(x - e^{2\pi i a/n}\right),$$
where the product is over a reduced set of residues $a$ mod $n$. Clearly we have
$$x^k - 1 = \prod_{n \mid k}F_n(x),$$
where the product is over the divisors $n$ of $k$. Indeed, each root on the left hand side must occur on the right hand side, and conversely, without any repetition. Let
$$x^k - 1 = F_k(x)G_k(x),$$
where $G_k(x)$ is the least common multiple of the various polynomials $x^n - 1$ $(n \mid k,\ n < k)$, and its leading coefficient is 1. Therefore $G_k(x)$ is an integer coefficient polynomial, and by Theorem 1.13.2 we see that $F_k(x)$ is also an integer coefficient polynomial. If $x$ is an integer not equal to $\pm 1$, then
$$F_k(x)G_k(x) = x^k - 1 \ne 0;$$
that is, $F_k(x)$ and $G_k(x)$ are nonzero integers.

Lemma 1. Let $n$ be a proper divisor of $k$. Then for all integers $x \ne \pm 1$ we have
$$\frac{x^k - 1}{x^n - 1} \equiv \frac{k}{n} \pmod{x^n - 1}.$$
Proof. Let $x^n - 1 = y$ and $k = nd$. Then
$$\frac{x^k - 1}{x^n - 1} = \frac{(y + 1)^d - 1}{y} = y^{d-1} + \binom{d}{1}y^{d-2} + \cdots + \binom{d}{2}y + d \equiv d \pmod y. \quad \square$$
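Lemma 1, and Lemma 2 below, can be spot-checked for small $k$ and $x$. This Python sketch computes $F_k(x)$ from the product formula $F_k(x) = \prod_{d \mid k}(x^d - 1)^{\mu(k/d)}$. That formula follows from $x^k - 1 = \prod_{n \mid k}F_n(x)$ by Mobius inversion. The formula and the helper names are our assumptions, used only for convenience.

```python
from fractions import Fraction
from math import gcd


def mobius(n):
    """μ(n) by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            result = -result
        d += 1
    return -result if n > 1 else result


def F(k, x):
    """F_k(x) = prod_{d | k} (x^d - 1)^{μ(k/d)} for an integer x ≠ ±1."""
    value = Fraction(1)
    for d in range(1, k + 1):
        if k % d == 0 and mobius(k // d) != 0:
            value *= Fraction(x ** d - 1) ** mobius(k // d)
    assert value.denominator == 1       # F_k has integer coefficients
    return int(value)


def G(k, x):
    """G_k(x) = (x^k - 1) / F_k(x)."""
    return (x ** k - 1) // F(k, x)


def lemma1_holds(x, k, n):
    """(x^k - 1)/(x^n - 1) ≡ k/n (mod x^n - 1) for a proper divisor n of k."""
    y = x ** n - 1
    return ((x ** k - 1) // y - k // n) % abs(y) == 0


def prime_factors(m):
    """Set of primes dividing |m|."""
    m, d, out = abs(m), 2, set()
    while d * d <= m:
        while m % d == 0:
            out.add(d)
            m //= d
        d += 1
    if m > 1:
        out.add(m)
    return out


print(F(3, 2), F(4, 3), F(6, 2))   # 7 10 3
print(gcd(F(6, 2), G(6, 2)))       # 3, and 3 divides k = 6, as Lemma 2 requires
```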
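Lemma 2. Let $x$ be an integer not equal to $\pm 1$. Then each common prime divisor of $F_k(x)$ and $G_k(x)$ must be a divisor of $k$.

Proof. Let $p \mid (F_k(x), G_k(x))$. From
$$p \mid G_k(x) = \prod_{n \mid k,\ n < k}F_n(x)$$
…
$$\cdots > 0.67\prod_{\substack{p \mid n\\ p > 2}}\frac{p - 1}{p - 2}\prod_{p > 2}\left(1 - \frac{1}{(p - 1)^2}\right)\frac{n}{\log^2 n}.$$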
It follows, of course, from this that every sufficiently large even integer is a sum of a prime and an integer having at most two prime factors. The proof of Chen's theorem is given in the book "Sieve Methods" by H. Halberstam and H. E. Richert [28], where there is also a comprehensive bibliography.
5.2. Concerning the prime twins problem, J. R. Chen [20] also proved that there are infinitely many primes $p$ such that $p + 2$ is either a prime or has two prime factors.
5.3. H. Iwaniec (unpublished) has proved that there are infinitely many integers $n$ such that $n^2 + 1$ is either a prime or has two prime factors.
5.4. The principle of the "large sieve" was invented by Yu. Linnik and A. Renyi, and was substantially developed by K. F. Roth [50] and E. Bombieri [9] (see also the books by H. L. Montgomery [44] and E. Bombieri [10]). From his result Bombieri deduced the following theorem on the average value of $\pi(x; k, l)$: given any $A > 0$, there exists $B = B(A) > 0$ such that
$$\sum_{k \le x^{1/2}/\log^B x}\ \max_{(l, k) = 1}\left|\pi(x; k, l) - \frac{\operatorname{li} x}{\varphi(k)}\right| = O\left(\frac{x}{\log^A x}\right).$$
… We have
$$\sum_{d \mid n}\mu(d) = \sum_{d \mid n}\mu(n/d) = \Delta(n) = \begin{cases}1, & \text{if } n = 1,\\ 0, & \text{if } n \ne 1.\end{cases}$$
Proof. This follows from taking $f(d) = 1$ in Theorem 2.3. □
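Here is a brute-force check of $\sum_{d \mid n}\mu(d) = \Delta(n)$ in Python. It computes $\mu$ directly from its definition by trial division, so it is a sketch suitable for small $n$ only.

```python
def mobius(n):
    """μ(n): 0 if a square > 1 divides n, otherwise (-1)^(number of prime factors)."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            result = -result
        d += 1
    return -result if n > 1 else result


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def delta(n):
    """Δ(n) = 1 if n = 1, else 0."""
    return 1 if n == 1 else 0


print([sum(mobius(d) for d in divisors(n)) for n in range(1, 13)])
# [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```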
Theorem 3.2. Let $0 < \eta_0 \le \eta_1$ and let $h(k)$ be a completely multiplicative function which is not identically zero. If for any $\eta$ satisfying $\eta_0 \le \eta \le \eta_1$ we have
$$g(\eta) = \sum_{1 \le k \le \eta_1/\eta}f(k\eta)h(k), \quad (1)$$
then
$$f(\eta) = \sum_{1 \le k \le \eta_1/\eta}\mu(k)g(k\eta)h(k); \quad (2)$$
the converse also holds.

Proof. From (1) we have
$$\sum_{1 \le k \le \eta_1/\eta}\mu(k)g(k\eta)h(k) = \sum_{1 \le k \le \eta_1/\eta}\mu(k)h(k)\sum_{1 \le m \le \eta_1/(k\eta)}f(mk\eta)h(m).$$
Let $mk = r$. Since $h$ is completely multiplicative, $h(k)h(r/k) = h(r)$. Moreover $h(1) = 1$, because $h$ is not identically zero. From Theorem 3.1 we therefore have
$$\sum_{1 \le k \le \eta_1/\eta}\mu(k)g(k\eta)h(k) = \sum_{1 \le r \le \eta_1/\eta}f(r\eta)h(r)\sum_{k \mid r}\mu(k) = \sum_{1 \le r \le \eta_1/\eta}f(r\eta)h(r)\Delta(r) = f(\eta)h(1) = f(\eta),$$
which proves (2). Suppose instead that (2) holds. Then
$$\sum_{1 \le k \le \eta_1/\eta}f(k\eta)h(k) = \sum_{1 \le k \le \eta_1/\eta}h(k)\sum_{1 \le m \le \eta_1/(k\eta)}\mu(m)g(mk\eta)h(m) = \sum_{1 \le r \le \eta_1/\eta}g(r\eta)h(r)\sum_{k \mid r}\mu(r/k) = \sum_{1 \le r \le \eta_1/\eta}g(r\eta)h(r)\Delta(r) = g(\eta),$$
which proves (1). □
We can extend this theorem as follows.

Theorem 3.3. Let $\xi_0 \ge 1$ and let $H(k)$ be a completely multiplicative function which is not identically zero. If for all real $\xi$ satisfying $1 \le \xi \le \xi_0$ we have
$$G(\xi) = \sum_{1 \le k \le \xi}F(\xi/k)H(k), \quad (3)$$
then we have, for such $\xi$,
$$F(\xi) = \sum_{1 \le k \le \xi}\mu(k)G(\xi/k)H(k); \quad (4)$$
the converse also holds.

Proof. Let $f(\eta) = F(1/\eta)$ and $g(\eta) = G(1/\eta)$. Then from (3) and (4) we have
$$g(\eta) = G(1/\eta) = \sum_{1 \le k \le 1/\eta}F\Big(\frac{1}{\eta k}\Big)H(k) = \sum_{1 \le k \le 1/\eta}f(\eta k)H(k),$$
$$f(\eta) = F(1/\eta) = \sum_{1 \le k \le 1/\eta}\mu(k)G\Big(\frac{1}{\eta k}\Big)H(k) = \sum_{1 \le k \le 1/\eta}\mu(k)g(\eta k)H(k).$$
These are just formulae (1) and (2) with $\eta_1 = 1$ and $\eta_0 = 1/\xi_0$. □
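We now apply this to the following.

Theorem 3.4. When $\xi \ge 1$ we have
$$\Big|\sum_{1 \le k \le \xi}\frac{\mu(k)}{k}\Big| \le 1. \quad (5)$$

Proof. In (3) we set $F(\xi) = 1$ and $H(k) = 1$, so that
$$G(\xi) = \sum_{1 \le k \le \xi}1 = [\xi].$$
From (4) we have
$$1 = \sum_{1 \le k \le \xi}\mu(k)\Big[\frac{\xi}{k}\Big]. \quad (6)$$
If $1 \le \xi < 2$, then (5) clearly holds. Suppose now that $\xi \ge 2$, and let $x = [\xi]$. Then, by (6),
$$\Big|x\sum_{k=1}^{x}\frac{\mu(k)}{k} - 1\Big| = \Big|\sum_{k=1}^{x}\mu(k)\Big(\frac{x}{k} - \Big[\frac{x}{k}\Big]\Big)\Big| = \Big|\sum_{k=2}^{x}\mu(k)\Big(\frac{x}{k} - \Big[\frac{x}{k}\Big]\Big)\Big| \le \sum_{k=2}^{x}1 = x - 1.$$
Therefore
$$x\Big|\sum_{k=1}^{x}\frac{\mu(k)}{k}\Big| \le 1 + (x - 1) = x,$$
and the required result follows. □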
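Numerically, both identity (6) and inequality (5) are easy to test. This Python sketch sieves $\mu(k)$ up to an arbitrary bound $X = 5000$ and checks (6) at integer points. It then reports the largest value of $|\sum_{k \le \xi}\mu(k)/k|$, using floating point for the partial sums.

```python
def mobius_sieve(x):
    """mu[n] for 1 <= n <= x (mu[0] is unused), by sieving over primes."""
    mu = [1] * (x + 1)
    is_prime = [True] * (x + 1)
    for p in range(2, x + 1):
        if is_prime[p]:
            for m in range(2 * p, x + 1, p):
                is_prime[m] = False
            for m in range(p, x + 1, p):
                mu[m] = -mu[m]              # one more prime factor
            for m in range(p * p, x + 1, p * p):
                mu[m] = 0                   # divisible by p^2
    return mu


def partial_sums(mu, x):
    """[sum_{k <= j} mu(k)/k for j = 1..x] in floating point."""
    out, s = [], 0.0
    for k in range(1, x + 1):
        s += mu[k] / k
        out.append(s)
    return out


X = 5000
mu = mobius_sieve(X)
# Identity (6) at integer points: sum_{k <= x} mu(k) [x / k] = 1.
print(all(sum(mu[k] * (x // k) for k in range(1, x + 1)) == 1 for x in range(1, 301)))
print(max(abs(s) for s in partial_sums(mu, X)))   # 1.0, attained at x = 1
```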
6.4 The Mobius Transformation
Another consequence of Theorem 3.3 is the following.

Theorem 4.1. Let $h(k)$ be a completely multiplicative function which is not identically zero, and let $n_0$ be a positive integer. If for all $n$ satisfying $1 \le n \le n_0$ we have
$$g(n) = \sum_{d \mid n}f(d)h\Big(\frac nd\Big), \quad (1)$$
then, for such $n$, we have
$$f(n) = \sum_{d \mid n}\mu(d)g\Big(\frac nd\Big)h(d); \quad (2)$$
the converse also holds.

Proof. We define $F(\xi)$ by setting $F(\xi) = f(\xi)$ when $\xi$ is an integer and $F(\xi) = 0$ when $\xi$ is not an integer, and we define $G(\xi)$ similarly. We can rewrite (1) and (2) as
$$G(n) = g(n) = \sum_{d \mid n}f(d)h\Big(\frac nd\Big) = \sum_{k \mid n}f\Big(\frac nk\Big)h(k) = \sum_{1 \le k \le n}F\Big(\frac nk\Big)h(k)$$
and
$$F(n) = f(n) = \sum_{d \mid n}\mu(d)g\Big(\frac nd\Big)h(d) = \sum_{d \mid n}\mu(d)G\Big(\frac nd\Big)h(d) = \sum_{1 \le d \le n}\mu(d)G\Big(\frac nd\Big)h(d).$$
From the definition of $F(\xi)$ and $G(\xi)$, these two formulae can also be written as
$$G(\xi) = \sum_{1 \le k \le \xi}F\Big(\frac\xi k\Big)h(k), \qquad F(\xi) = \sum_{1 \le k \le \xi}\mu(k)G\Big(\frac\xi k\Big)h(k).$$
Here $\xi$ satisfies $1 \le \xi \le n_0$. Conversely, (1) and (2) can be deduced from these formulae. The theorem now follows from Theorem 3.3 with $\xi_0 = n_0$. □

Definition. If
$$g(n) = \sum_{d \mid n}f(d) = \sum_{d \mid n}f\Big(\frac nd\Big),$$
then we call $g(n)$ the Mobius transform of $f(n)$. We also call $f(n)$ the inverse Mobius transform of $g(n)$. From Theorem 4.1 we have
$$f(n) = \sum_{d \mid n}\mu(d)g\Big(\frac nd\Big) = \sum_{d \mid n}\mu\Big(\frac nd\Big)g(d).$$
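Theorem 4.1 can be verified on examples. In this Python sketch the test function $f$ and the completely multiplicative weight $h(k) = k$ are arbitrary illustrative choices. With the default $h = 1$, the two functions are the Mobius transform and its inverse. The product formula for $\sigma_\lambda(n)$ in Example 2 can be checked the same way.

```python
def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def mobius(n):
    """μ(n) by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            result = -result
        d += 1
    return -result if n > 1 else result


def one(k):
    return 1


def transform(f, n, h=one):
    """g(n) = sum_{d | n} f(d) h(n/d), formula (1) of Theorem 4.1."""
    return sum(f(d) * h(n // d) for d in divisors(n))


def inverse_transform(g, n, h=one):
    """f(n) = sum_{d | n} mu(d) g(n/d) h(d), formula (2) of Theorem 4.1."""
    return sum(mobius(d) * g(n // d) * h(d) for d in divisors(n))


def f(n):                 # arbitrary test function
    return n * n + 3 * n


def h(k):                 # completely multiplicative weight
    return k


def g(n):
    return transform(f, n, h)


print(all(inverse_transform(g, n, h) == f(n) for n in range(1, 61)))   # True
```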
From Theorem 2.2 we see that the Mobius transform, and the inverse Mobius transform, of a multiplicative function is multiplicative.

Example 1. From Theorem 3.1 we see that $\Delta(n)$ is the Mobius transform of $\mu(n)$.

Example 2. From $\sigma_\lambda(n) = \sum_{d \mid n}d^\lambda$ we see that $\sigma_\lambda(n)$ is the Mobius transform of the multiplicative function $E_\lambda(n) = n^\lambda$. Therefore $\sigma_\lambda(n)$ is a multiplicative function. Since
$$\sigma_\lambda(p^l) = \sum_{m=0}^{l}p^{m\lambda} = \frac{p^{\lambda(l+1)} - 1}{p^\lambda - 1} \qquad (\lambda \ne 0),$$
we deduce that if $n = \prod_v p_v^{l_v}$, then
$$\sigma_\lambda(n) = \prod_v\frac{p_v^{\lambda(l_v+1)} - 1}{p_v^\lambda - 1}.$$
In particular, when $\lambda = 0$, we have
$$d(n) = \sigma_0(n) = \prod_v(l_v + 1),$$
which we already proved in an earlier exercise.

Example 3. The function $E_0(n) = 1$ is the Mobius transform of $\Delta(n)$.

Example 4. Let $n$ be fixed, and let the integers $1, 2, \ldots, a, \ldots, n$ be partitioned into distinct classes according to the value of the greatest common divisor $(n, a)$. If $d = (n, a)$, then we can write $n = dk$ and $1 = (k, a/d)$. Now the number of integers $a$ satisfying $1 = (k, a/d)$ is precisely …

… Then
$$d(n) = O(n^\varepsilon). \quad (1)$$
Here the $O$-constant depends on $\varepsilon$.

Proof. Let $n = \prod_{p \mid n}p^a$ be the standard factorization of $n$. We have … If $p^\varepsilon \ge 2$, then $p^{a\varepsilon} \ge 2^a \ge a + 1$. Therefore …

… be such that the congruence
$$l^2 \equiv -1 \pmod n \quad (1)$$
has a solution. Then there exists a unique pair of integers $x, y$ satisfying
$$n = x^2 + y^2, \quad x > 0, \quad y > 0, \quad (x, y) = 1, \quad y \equiv lx \pmod n. \quad (2)$$
Proof. Clearly if (2) is soluble, then so is (1). A necessary condition for (1) to be soluble is that $n$ is representable as
$$n = 2^a p_1^{\lambda_1}\cdots p_s^{\lambda_s}, \qquad a = 0 \text{ or } 1,$$
where each $p_i$ $(i = 1, 2, \ldots, s)$ is a prime $\equiv 1 \pmod 4$. We now use induction to prove the theorem.

1) We consider first the case $n = p^\lambda$. If $\lambda = 1$, then from $l^2 + 1 \equiv 0 \pmod p$ we see that when $(x, p) = 1$, we have $x^2l^2 + x^2 \equiv 0 \pmod p$. We shall presently choose $y$ and $x$ so that $x^2l^2 \equiv y^2 \pmod p$, with $x^2 < p$ and $y^2 < p$. Let $x$ and $y$ take the values $0, 1, \ldots, [\sqrt p]$, and consider the various differences $xl - y$. Since there are $([\sqrt p] + 1)^2 > p$ such differences, two of them must be congruent mod $p$. Let $x_1l - y_1 \equiv x_2l - y_2 \pmod p$, that is, $(x_1 - x_2)l \equiv y_1 - y_2 \pmod p$. We can assume that $x_1 - x_2 > 0$, so that $x_1 - x_2 < \sqrt p$ and $|y_1 - y_2| < \sqrt p$, and this then gives our desired $x$ and $y$. For this pair $x, y$ we have $x^2 + y^2 = tp$, and it is easy to see that $t = 1$ and $(x, y) = 1$. The congruence $y \equiv mx \pmod p$ is soluble, and from $x^2(1 + m^2) \equiv 0 \pmod p$ we see that $m \equiv \pm l$. If $m \equiv l$, then we take the pair $(x, y)$, while if $m \equiv -l$, then we take the pair $(y, x)$.

Now assume that $p \ne 2$ and that the theorem holds for $n = p^\lambda$. Let $l^2 \equiv -1 \pmod{p^{\lambda+1}}$. Since $(-l)^2 \equiv -1 \pmod{p^\lambda}$, there exist $u, v$ such that
$$p^\lambda = u^2 + v^2, \quad u > 0, \quad v > 0, \quad (u, v) = 1, \quad v \equiv -lu \pmod{p^\lambda}.$$
Let $x, y$ be the pair found above for $n = p$, so that $p = x^2 + y^2$ and $y \equiv lx \pmod p$. When $n = p^{\lambda+1}$, we have
$$p^{\lambda+1} = (xu + yv)^2 + (xv - yu)^2 = X^2 + Y^2 \qquad (X > 0,\ Y > 0).$$
First we have $(X, Y) = 1$. Otherwise $p \mid (X, Y)$, but
$$X \equiv xu + yv \equiv xu - l^2xu \equiv xu(1 - l^2) \not\equiv 0 \pmod p,$$
which is impossible. Next, because $(X, p) = 1$, the congruence $Xm \equiv Y \pmod{p^{\lambda+1}}$ is soluble. Thus $X^2(1 + m^2) \equiv X^2 + Y^2 \equiv 0 \pmod{p^{\lambda+1}}$, or $1 + m^2 \equiv 0 \pmod{p^{\lambda+1}}$. By Theorem 2.9.3 this congruence has only two solutions, so that $m \equiv \pm l$. The desired result follows from the discussion in the case $\lambda = 1$.
2) Let $n = ab$, $a > 1$, $b > 1$, $(a, b) = 1$, and suppose that
$$l^2 \equiv -1 \pmod n,$$
$$u^2 + v^2 = a, \quad u > 0, \quad v > 0, \quad (u, v) = 1, \quad v \equiv lu \pmod a,$$
$$x^2 + y^2 = b, \quad x > 0, \quad y > 0, \quad (x, y) = 1, \quad y \equiv lx \pmod b.$$
From Theorem 7.3 we have
$$n = ab = (xv + yu)^2 + (xu - yv)^2 = X^2 + Y^2.$$
(If $xu - yv > 0$, then let $xu - yv = Y$; otherwise we let $xu - yv = -Y$.) We now prove the following.
(i) $(X, Y) = 1$. Let $p \mid (X, Y)$. Then
$$xv + yu = ps, \qquad xu - yv = pt,$$
or
$$x(u^2 + v^2) = p(sv + tu), \qquad y(u^2 + v^2) = p(su - tv).$$
Since $(x, y) = 1$, we must have $p \mid (u^2 + v^2)$, that is, $p \mid a$. Similarly $p \mid b$. But this contradicts $(a, b) = 1$.
(ii) $X \equiv lY \pmod n$. From our assumption we have
$$xv + yu \equiv lxu - lyv \equiv l(xu - yv) \pmod a,$$
$$xv + yu \equiv -lyv + lxu \equiv l(xu - yv) \pmod b.$$
Since $(a, b) = 1$, it follows that $X \equiv lY \pmod n$.
3) Uniqueness. Suppose that there are two pairs $(X, Y)$, $(X', Y')$ both satisfying the conditions. Then
$$n^2 = (XX' + YY')^2 + (XY' - YX')^2.$$
But
$$XX' + YY' \equiv XX'(1 + l^2) \equiv 0 \pmod n,$$
so that
$$XX' + YY' = n, \qquad XY' - YX' = 0.$$
From $XY' - YX' = 0$, we have
$$\frac{X}{X'} = \frac{Y}{Y'} = c,$$
so that $X^2 + Y^2 = c^2(X'^2 + Y'^2)$, giving $c = \pm 1$. Also, from $X > 0$ and $X' > 0$ we see that $c = 1$. The proof of our theorem is complete. □
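Part 1) of the proof is constructive. This Python sketch follows the pigeonhole argument for a prime $p$. It searches $x = 1, \ldots, [\sqrt p]$ until $xl$ is congruent to some $\pm y$ with $0 < y \le [\sqrt p]$, and it swaps the pair when $y \equiv -lx$. A separate brute-force search checks the uniqueness claim for small $n$. The function names are illustrative.

```python
from math import gcd, isqrt


def two_squares_prime(p, l):
    """Pigeonhole construction of part 1), for a prime p with l^2 ≡ -1 (mod p).

    Returns (x, y) with x^2 + y^2 = p, x, y > 0 and y ≡ l x (mod p).
    """
    assert (l * l + 1) % p == 0
    s = isqrt(p)
    for x in range(1, s + 1):
        r = x * l % p
        if 0 < r <= s and x * x + r * r == p:            # y = r ≡ l x
            return x, r
        if 0 < p - r <= s and x * x + (p - r) ** 2 == p:  # y ≡ -l x: swap the pair
            return p - r, x
    raise ValueError("no representation found")


def representations(n, l):
    """Brute force: all (x, y) with x, y > 0, (x, y) = 1, x^2 + y^2 = n, y ≡ l x (mod n)."""
    out = []
    for x in range(1, isqrt(n) + 1):
        y = isqrt(n - x * x)
        if y > 0 and x * x + y * y == n and gcd(x, y) == 1 and (y - l * x) % n == 0:
            out.append((x, y))
    return out


print(two_squares_prime(13, 5), representations(13, 5))   # (3, 2) [(3, 2)]
```

Any $x \le [\sqrt p]$ for which $xl$ lands within $[\sqrt p]$ of a multiple of $p$ already gives $x^2 + y^2 = p$, because $0 < x^2 + y^2 < 2p$. The equality checks in the loop are therefore safeguards rather than a search.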
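Proof of Theorem 7.2. From Theorem 7.1 and Theorem 7.4 we see that the number of solutions of $x^2 + y^2 = n$ with $(x, y) = 1$ is $4V(n)$. We now consider the equation $x^2 + y^2 = n$, and we partition its solutions into sets according to the value $d = (x, y)$. The number of solutions satisfying $(x, y) = d$ is equal to the number of solutions satisfying
$$\Big(\frac xd\Big)^2 + \Big(\frac yd\Big)^2 = \frac{n}{d^2}, \qquad \Big(\frac xd, \frac yd\Big) = 1,$$
that is, $4V(n/d^2)$. Therefore
$$r(n) = 4\sum_{d^2 \mid n}V\Big(\frac{n}{d^2}\Big) = 4\sum_{d \mid n}V\Big(\frac nd\Big)s(d),$$
where $s(d) = 1$ or $0$ according to whether $d$ is a square or not. Since $V(n)$ and $s(n)$ are both multiplicative, it follows that $r(n)/4$ is multiplicative. Since $\delta(n)$ is also multiplicative, the theorem will follow if we show that $r(n) = 4\delta(n)$ when $n = p^m$. Now, if $2 \mid m$, then
$$\frac{r(p^m)}{4} = V(p^m) + V(p^{m-2}) + \cdots + V(p^2) + V(1) = \begin{cases}0 + \cdots + 0 + 1 = 1, & \text{if } p = 2,\\ 0 + \cdots + 0 + 1 = 1, & \text{if } p \equiv 3 \pmod 4,\\ 2 + \cdots + 2 + 1 = 2\cdot\frac m2 + 1 = m + 1, & \text{if } p \equiv 1 \pmod 4,\end{cases}$$
and if $2 \nmid m$, then
$$\frac{r(p^m)}{4} = V(p^m) + V(p^{m-2}) + \cdots + V(p) = \begin{cases}1, & \text{if } p = 2,\\ 0, & \text{if } p \equiv 3 \pmod 4,\\ 2\cdot\frac{m+1}{2} = m + 1, & \text{if } p \equiv 1 \pmod 4.\end{cases}$$
On the other hand we have
$$\delta(p^m) = 1 + \chi(p) + \cdots + \chi(p^m) = \begin{cases}1 + 0 + \cdots + 0 = 1, & \text{if } p = 2,\\ 1 - 1 + \cdots + 1 = 1, & \text{if } p \equiv 3 \pmod 4,\ 2 \mid m,\\ 1 - 1 + \cdots - 1 = 0, & \text{if } p \equiv 3 \pmod 4,\ 2 \nmid m,\\ 1 + 1 + \cdots + 1 = m + 1, & \text{if } p \equiv 1 \pmod 4.\end{cases}$$
The theorem is proved. □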
Theorem 7.5. Denote by $A$ and $B$ the number of divisors of $n$ which are congruent to 1 and to 3 (mod 4) respectively. Then $r(n) = 4(A - B)$.
Proof. This is an immediate consequence of Theorem 7.2. □
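Theorem 7.5 is easy to test by brute force: count the representations $n = x^2 + y^2$ directly and compare with $4(A - B)$. Here is a Python sketch for small $n$. The names are ours.

```python
from math import isqrt


def r(n):
    """Number of integer pairs (x, y), signs and order counted, with x^2 + y^2 = n (n >= 1)."""
    count = 0
    for x in range(-isqrt(n), isqrt(n) + 1):
        y2 = n - x * x
        y = isqrt(y2)
        if y * y == y2:
            count += 1 if y == 0 else 2     # (x, y) and (x, -y)
    return count


def four_times_A_minus_B(n):
    """4(A - B): A, B count divisors of n that are ≡ 1, ≡ 3 (mod 4)."""
    divs = [d for d in range(1, n + 1) if n % d == 0]
    A = sum(1 for d in divs if d % 4 == 1)
    B = sum(1 for d in divs if d % 4 == 3)
    return 4 * (A - B)


print([(n, r(n), four_times_A_minus_B(n)) for n in (1, 2, 3, 5, 25, 45)])
```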
Theorem 7.6. Let $\varepsilon > 0$. Then $r(n) = O(n^\varepsilon)$.
Proof. Since $r(n) \le 4d(n)$, the required result follows from Theorem 5.2. □
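6.8 The Methods of Partial Summation and Integration
Theorem 8.1 (Abel). Let $a \le b$ and let $n$ vary in $a \le n \le b$. Let $\gamma_n$ and $\varepsilon_n$ be complex numbers, and let
$$s_m = \sum_{n=a}^{m}\gamma_n.$$
Then
$$\Big|\sum_{n=a}^{b}\gamma_n\varepsilon_n\Big| \le \max_{a \le m \le b}|s_m|\Big(\sum_{m=a}^{b-1}|\varepsilon_m - \varepsilon_{m+1}| + |\varepsilon_b|\Big). \quad (1)$$
Proof. Let $s_{a-1} = 0$. Then
$$\sum_{n=a}^{b}\gamma_n\varepsilon_n = \sum_{n=a}^{b}(s_n - s_{n-1})\varepsilon_n = \sum_{n=a}^{b}s_n\varepsilon_n - \sum_{n=a}^{b-1}s_n\varepsilon_{n+1} = \sum_{n=a}^{b-1}s_n(\varepsilon_n - \varepsilon_{n+1}) + s_b\varepsilon_b,$$
so that (1) follows. □

Theorem 8.2. In the previous theorem, if $\varepsilon_n$ is a positive decreasing sequence, then
$$\Big|\sum_{n=a}^{b}\gamma_n\varepsilon_n\Big| \le \max_{a \le m \le b}|s_m|\,\varepsilon_a. \quad (2)$$
□

We now apply this to the following.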
Theorem 8.3. If $s > 0$, then
$$\Big|\sum_{n \ge a}\frac{\chi(n)}{n^s}\Big| \le \frac{1}{a^s},$$
so that the series $\sum_{n=1}^{\infty}\chi(n)/n^s$ converges when $s > 0$.

Proof. We have $\chi(a) + \chi(a + 1) + \chi(a + 2) + \chi(a + 3) = 0$, so that $\big|\sum_{n=a}^{m}\chi(n)\big| \le 1$ for every $m \ge a$. From Theorem 8.2 we deduce that
$$\Big|\sum_{n=a}^{b}\frac{\chi(n)}{n^s}\Big| \le \frac{1}{a^s}.$$
Since the right hand side is independent of $b$, the theorem follows. □
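Here is a small Python check of the summation-by-parts identity in the proof of Theorem 8.1. It also checks the bound $|\sum_{n=a}^{b}\chi(n)n^{-s}| \le a^{-s}$ for Theorem 8.3, where $\chi$ is the non-principal character mod 4. The sample sequences are arbitrary, and indices start at $a = 0$ purely for convenience.

```python
def chi(n):
    """Non-principal character mod 4: 0 for even n, +1 if n ≡ 1, -1 if n ≡ 3 (mod 4)."""
    return (0, 1, 0, -1)[n % 4]


def abel_sides(gamma, eps):
    """Both sides of the summation-by-parts identity in the proof of Theorem 8.1.

    Indices run over a = 0, ..., b = len(gamma) - 1 and s_m = gamma_0 + ... + gamma_m.
    """
    s, total = [], 0
    for g in gamma:
        total += g
        s.append(total)
    b = len(gamma) - 1
    lhs = sum(g * e for g, e in zip(gamma, eps))
    rhs = sum(s[n] * (eps[n] - eps[n + 1]) for n in range(b)) + s[b] * eps[b]
    return lhs, rhs


def chi_block(a, b, s):
    """sum_{n=a}^{b} chi(n) / n^s."""
    return sum(chi(n) / n ** s for n in range(a, b + 1))


print(abel_sides([3, -1, 4, 1, -5, 9], [1, 5, 2, 6, 5, 3]))   # (14, 14)
```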
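Note: In the next section we shall require
$$\sum_{n=1}^{\infty}\frac{\chi(n)}{n} = 1 - \frac13 + \frac15 - \frac17 + \cdots = \frac\pi4.$$
This can be proved using the series expansion for $\tan^{-1}x$ from ordinary calculus.

Analogous to Theorems 8.1 and 8.2 we have the following.

Theorem 8.4. Let $\xi \le \eta$ and let $x$ vary in $\xi \le x \le \eta$. Suppose that $f(x)$ and $g(x)$ are continuous and that $g(x)$ is differentiable. Let
$$I_1(x) = \int_\xi^x f(t)\,dt.$$
Then
$$\Big|\int_\xi^\eta f(x)g(x)\,dx\Big| \le \max_{\xi \le x \le \eta}|I_1(x)|\Big(\int_\xi^\eta|g'(x)|\,dx + |g(\eta)|\Big).$$
Moreover, if $g'(x) \le 0$ and $g(x) > 0$, then
$$\Big|\int_\xi^\eta f(x)g(x)\,dx\Big| \le g(\xi)\max_{\xi \le x \le \eta}|I_1(x)|.$$
Proof. From integration by parts we have
$$\int_\xi^\eta f(x)g(x)\,dx = \int_\xi^\eta g(x)\,dI_1(x) = g(\eta)I_1(\eta) - \int_\xi^\eta I_1(x)g'(x)\,dx,$$
and hence
$$\Big|\int_\xi^\eta f(x)g(x)\,dx\Big| \le \max_{\xi \le x \le \eta}|I_1(x)|\Big(|g(\eta)| + \int_\xi^\eta|g'(x)|\,dx\Big).$$
The last part of the theorem is also clear, since in that case $\int_\xi^\eta|g'(x)|\,dx + |g(\eta)| = g(\xi)$. □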
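Example. Let $a > 0$. Prove that
$$\Big|\int_a^\infty\cos x^2\,dx\Big| \le \frac1a.$$
Putting $x^2 = y$ and applying the second part of Theorem 8.4 with $g(y) = 1/(2\sqrt y)$, we have
$$\Big|\int_a^\infty\cos x^2\,dx\Big| = \Big|\int_{a^2}^\infty\frac{\cos y}{2\sqrt y}\,dy\Big| \le \frac{1}{2a}\max_{\eta \ge a^2}\Big|\int_{a^2}^{\eta}\cos y\,dy\Big| \le \frac1a.$$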
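The inequality of the Example can be compared with numerical values. This Python sketch uses the classical value $\int_0^\infty\cos x^2\,dx = \sqrt{\pi/8}$, an outside fact not proved here, together with Simpson's rule on $[0, a]$. The step count is an arbitrary choice adequate for $a \le 8$.

```python
import math


def cos_x2_tail(a, steps=100_000):
    """Approximate ∫_a^∞ cos(x^2) dx.

    Uses the classical value ∫_0^∞ cos(x^2) dx = sqrt(pi/8) (an outside fact)
    minus Simpson's rule for ∫_0^a cos(x^2) dx.
    """
    if steps % 2:
        steps += 1                          # Simpson's rule needs an even count
    h = a / steps
    total = math.cos(0.0) + math.cos(a * a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * math.cos((k * h) ** 2)
    return math.sqrt(math.pi / 8) - total * h / 3


for a in (0.5, 1, 2, 3, 5, 8):
    print(a, round(cos_x2_tail(a), 5), round(1 / a, 5))
```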
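6.9 The Circle Problem
Theorem 9.1.
$$\sum_{0 \le n \le x}r(n) = \pi x + O(\sqrt x).$$
Proof. From Theorem 7.2 we have
$$\sum_{1 \le n \le x}r(n) = 4\sum_{1 \le n \le x}\sum_{d \mid n}\chi(d) = 4\sum_{1 \le d \le x}\chi(d)\sum_{\substack{1 \le n \le x\\ d \mid n}}1 = 4\sum_{1 \le d \le x}\chi(d)\Big[\frac xd\Big].$$
Here we divide the sum into two parts. From Theorem 8.3 we have
$$4\sum_{1 \le d \le \sqrt x}\chi(d)\Big[\frac xd\Big] = 4x\sum_{1 \le d \le \sqrt x}\frac{\chi(d)}{d} + O(\sqrt x) = 4x\sum_{d=1}^{\infty}\frac{\chi(d)}{d} + O(\sqrt x) = \pi x + O(\sqrt x).$$
The other part is
$$4\sum_{\sqrt x < d \le x}\chi(d)\Big[\frac xd\Big],$$
and from Theorem 8.2, applied with the decreasing sequence $[x/d]$, we have
$$\Big|\sum_{\sqrt x < d \le x}\chi(d)\Big[\frac xd\Big]\Big| \le \sqrt x.$$
The theorem is proved. □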
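Counting lattice points directly shows the $O(\sqrt x)$ behaviour. This Python sketch counts the pairs $(u, v)$ with $u^2 + v^2 \le x$ column by column, for integer $x$ only. It prints the normalized error $(N - \pi x)/\sqrt x$. By the unit-square argument of the second proof, this error stays below $2\sqrt2\,\pi + 2\pi/\sqrt x$ in absolute value.

```python
import math


def lattice_points_in_disc(x):
    """Number of integer pairs (u, v) with u^2 + v^2 <= x, i.e. sum_{0 <= n <= x} r(n)."""
    R = math.isqrt(x)
    return sum(2 * math.isqrt(x - u * u) + 1 for u in range(-R, R + 1))


def normalized_error(x):
    """(N(x) - pi x) / sqrt(x); Theorem 9.1 says this stays bounded."""
    return (lattice_points_in_disc(x) - math.pi * x) / math.sqrt(x)


for x in (10, 100, 10**4, 10**6):
    print(x, lattice_points_in_disc(x), round(normalized_error(x), 3))
```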
Another proof of the theorem is the following. Clearly $\sum_{0 \le n \le x}r(n)$ is the number of pairs of integers $u, v$ satisfying $u^2 + v^2 \le x$. In other words, the sum is the number of lattice points inside the circle with centre at the origin and radius $\sqrt x$. This circle has area $\pi x$. We partition the plane into unit squares with orthogonal lines passing through the lattice points. To each point $(u, v)$ in our circle we assign the square whose four corners have the coordinates $(u, v)$, $(u + 1, v)$, $(u, v + 1)$, $(u + 1, v + 1)$. These squares must lie inside the circle $u^2 + v^2 = (\sqrt x + \sqrt 2)^2$, and they cover the circle $u^2 + v^2 = (\sqrt x - \sqrt 2)^2$. Therefore
$$\pi(\sqrt x - \sqrt 2)^2 \le \sum_{0 \le n \le x}r(n) \le \pi(\sqrt x + \sqrt 2)^2,$$
and the required result follows at once. We observe that this second proof can be used as a proof of
$$1 - \frac13 + \frac15 - \frac17 + \cdots = \frac\pi4.$$
Concerning the problem of the number of lattice points inside a closed curve, the Czech mathematician M. V. Jarnik proved the following.

Theorem 9.2. Let $l \ge 1$ be the length of a rectifiable simple closed curve, and let $A$ be the area of the region bounded by the curve. If $N$ is the number of lattice points inside the curve, then
$$|A - N| < l.$$
Proof (Steinhaus). We first prove the following two simple lemmas.

Lemma 1. Let $C$ be a rectifiable curve inside a unit square with its two end points on the boundary of the square. If $C$ crosses both diagonals of the square, then its length must be at least 1.
Proof. If the two end points are on opposite sides of the square, then the result follows at once. Suppose next that the two end points are on two adjacent sides of the square, as shown in the diagram.
[Diagram: the curve $C$ with end points on two adjacent sides of the unit square.]
It is easy to see that … A similar argument applies when the two end points are on the same side of the square.

Lemma 2. Let $C$ be a rectifiable curve inside a unit square with its two end points on the boundary of the square, so that the square is partitioned into two regions. Suppose that $C$ does not pass through the centre of the square, and denote by $\Delta$ the region which does not contain the centre. Then the area of $\Delta$ must be less than the length of $C$.

Proof. We consider separately the cases shown in the following diagrams.
[Diagrams: the five cases considered.]
Let $A$ be the area of the region $\Delta$ and $l$ the length of $C$. In the first two cases it is easy to see that every point of $C$ is at distance at most $l$ from the base line $\alpha\beta$. Hence $\Delta$ must lie inside a rectangle with sides 1 and $l$, and so $A < l$. In the remaining three cases we see from Lemma 1 that $l \ge 1$, and so $A < 1 \le l$.

We can now proceed to prove the theorem. Denote by $\Gamma$ the region inside the curve. We form a net of unit squares in the plane with the lines
$$x = m + \tfrac12, \qquad y = n + \tfrac12 \qquad (m, n = 0, \pm 1, \pm 2, \ldots).$$
Let $Q_1, Q_2, \ldots, Q_k$ be those squares which contain part of the boundary of $\Gamma$. Let $C_i$ be the part of the curve in $Q_i$, let $Q_i'$ be the intersection of $Q_i$ and $\Gamma$, and define
$$N_i = \begin{cases}1, & \text{if } Q_i' \text{ contains a lattice point},\\ 0, & \text{otherwise}.\end{cases}$$
Let $A_i$ be the area of $Q_i'$ and $l_i$ the length of $C_i$. Our theorem will then follow if we can prove that $|A_i - N_i| < l_i$. The case when the whole of $\Gamma$ lies inside a single $Q$ follows at once, since $l \ge 1$. We can therefore assume that $C_i$ is made up of a number of sections of the curve, and that $Q_i$ is partitioned into regions $D_i^{(s)}$.

Suppose first that the lattice point does not lie in any $D_i^{(s)}$, so that it lies on $C_i$. Then $N_i = 0$, $0 < A_i < 1$ and $l_i \ge 1$, so that our required result follows. Suppose instead that the lattice point lies inside some $D_i^{(s)}$, and denote by $A_i^{(s)}$ the area of $D_i^{(s)}$. If $D_i^{(s)}$ is not in $\Gamma$, then $N_i = 0$ and $A_i \le 1 - A_i^{(s)}$. If $D_i^{(s)}$ is in $\Gamma$, then $N_i = 1$ and $1 - A_i \le 1 - A_i^{(s)}$. In both cases, Lemma 2 gives $1 - A_i^{(s)} < l_i$. The theorem is proved. □
It is clear that Theorem 9.1 is an immediate consequence of Theorem 9.2. Exercise 1. Find the asymptotic formula for the number of lattice points inside an ellipse centre at the origin. Exercise 2. Prove that the number of lattice points inside the sphere u 2 + v2 + w2 ~ x is given by
1nx 3 / 2
+ O(x).
Exercise 3. Generalize the previous exercise to a sphere in ndimensions. Exercise 4. Determine the order of Ln.;xr2(n). Exercise 5. The number of lattice points inside the circle u 2 coordinates is given by 6 x n
+ v2
~
x with coprime
+ O(fi log x).
6.10 Farey Sequence and Its Applications Farey sequence was discovered well over a hundred years ago, but its significance in number theory is revealed only in modern times. "
Definition 1. By the Farey sequence of order n we mean the fractions in the interval from 0 to 1, whose denominators are ~ n, arranged in ascending order of magnitude. That is, they are numbers of the form a
b'
(a, b) = 1,
arranged into an increasing sequence. We denote by tYn the Farey sequence of order
n. Example:
tY7
is the sequence
The total number offractionsin tYn is 1 + L~= 1qJ(m). These fractions divide the interval 0 ~ x ~ 1 into L~=l qJ(m) parts, and tYn+l is obtained from adding the
126 cp(n
6. Arithmetic Functions
+ 1) numbers a
+ 1) =
(a,n
n + l'
1,
o 2, 1 ~ m ~ A 1/3, (a, m) = 1, k ~ 1. Suppose that M+ml
S
=
L
{fix)},
x=M
where fix) has a continuous second derivative in M a 9 f'(M) =+,
m
(a,m)
m2 1
A
~
If"(x) I ~
~
x
~
= 1,
M
+m
191
I/A we see that f"(x) does not change sign. We can therefore assume without loss that/"(x) > O. Then we have ( m) (m m "fiM)  m 2 < I/I(y) < m "fiM) + m2
2
) + 21 m A"k ,
or mfiM)  1 < I/I(y) < mj(M)
+ 1 + tk.
The result follows from taking c = mj(M)  1 and h 11.1. D
= 2 + k/2 in Theorem
132
6. Arithmetic Functions
Theorem 11.3. Let k ~ I and let fix) have a continuous second derivative in M ~ x ~ M + m, and I

A
~
k
If"(x) I ~ . A
Then M+m1 S=
L
x=M
I {fix)} = m 2
+ 0(.1),
where
Proof We take 1: = A 1/3 , M = M 1. We see from Theorem 10.6 that there exist a 1 ,m,8 1 such that (7)
From Theorem 11.2 we have
M,+m,1
L
x=M,
We next take M2 such that
8'
+ .!..(k + 5), 2
+ m1 and again from Theorem 10.6 there exist a2, m2, 8 2
M1
=
I {fix)} = ml 2
and
M2+m21
L
X=M2
I {fix)} = m2
2
8'
+ ~(k + 5), 2
Continuing this way, if after s steps we have
o~ M + m 
I  Ms+l < 1:,
then
IS  t(m1 + ... + ms)  t(M + m  Ms+ 1)1 s
~ 2(k
or (since Ms+1 = M
I
+ 5) + 2(M + m 
M s + 1),
+ m1 + ... + ms) IS  tml < ts(k + 5) + t(1: + I).
(8)
We now have to estimate s. Suppose that 0 < q < 1:, (p, q) = 1. If p, q are given, we can estimate how many m1,'" ,ms are equal to q. From 1f"(x)1 > I/A and its
6.11 Vinogradov's Method of Estimating Sums of Fractional Parts
. 133
continuity we know thatf"(x) does not change sign. It follows that the set of values x satisfying I :;;;f'(x):;;;+
pip q
forms an interval. Let
Xl> X2
q1:
q
q1:
(9)
be any two points in the interval, so that
Hence X2
I
f
I
f"(t) dt
0, R(x)
= nx + O(x~+e).
(See Note 6.1.) A famous problem in number theory is the conjectUfe that R(x) = nx
+ O(xi +e).
We require the following result for the proof of Theorem 12.1. Theorem 12.2. Letj(x) have a continuous second derivative in the interval Q and let x
u(x)
=
fGo
{t} )dt.
~
x
~
R,
135
6.12 Application of Vinogradov's Theorem to Lattice Point Problems
Then R
I
f(x)
=
ff(X) dx
+ (t 
{R})f(R) 
(t 
{Q})f(Q)  (f(R)f'(R)
Q<x':;R Q
R
+ (f(Q)f'(Q) + f
(f(X)f"(x) dx.
Q
Proof Let Xl be an integer, Q tegration by parts we have p
~
~
oc < 13
R, Xl < oc < 13 < Xl
+ 1.
From in
p
 ff(X)dX=
'"
ff(x)~G{X})dX '"
= (t  {f3})f(f3) 
(t 
{oc})f(oc)  (f(f3)f'(/3)
+ (f(oc)f'(oc)
p
+f
(1)
(f(x)f"(x) dx.
'"
Letting oc + Xl> 13 + Xl +I
Xl

+ 1 we have Xl
+ 1) 
f(x)dx =  tf(XI
f
tf(xd
Xl
+
+I
f
(f(x)f"(x)dx.
Xl
From this it follows that [R)

f f(x)dx
I
= 
fix)
+ tf([Q] + 1) + tf([R])
[Q)+ I ':;X':; [R) [Q)+ I [R)
+
f
(2)
(f(X)f"(x) dx.
[Q)+ I
If in (1) we let
(J(
=
Q, 13 + [Q]
+ 1, then
[Q)+ I
f
fix) dx = 2 1f([Q]
+ 1) 
G
{Q} )f(Q)
+ (f(Q)f'(Q)
Q [Q)+ I
+
f Q
(f(X)f"(x) dx.
(3)
136
6. Arithmetic Functions
Similarly we have
f R
j(x)dx
= (t  {R})j(R)  tj([R])  u(R)f'(R)
IR)
f R
+
(4)
u(x)f"(x) dx.
IR)
The required formula is obtained by adding (2), (3) and (4).
D
Proof of Theorem 12.1. By considering the diagram associated with the circle problem it is easy to see that R(x)
I
= I + 4[Jx] + 8
[Jx  u2 ]
x

4[
0 0, then the formula R(x) = nx
+ O(xi ')
does not hold. Actually we shall prove a very general result. In this section K, Kb K 2 , K3 represent absolute constants. At various places we may use the same symbol to denote different constants, but this should not cause any confusion. Let c> °and let ai, a2,'" be integers satisfying °Theorem : ;:; al ::;:; a213.1::;:; (ErdosFuchs). .. '. Let fin) denote the number of solutions to the equation ai
+ aj =
n, and r(x) =
I
f(n)
so that r(x) is the number of pairs of integers ah aj satisfying ai formula cannot hold.
+ aj ::;:; x.
Then the
139
6.13 DResults
We shall first deal with the following auxiliary results. Theorem 13.2. Let an be real numbers such that co n= 
converges uniformly, and that
00
I:'= co a; converges.
Then
1t
1t
Proof Clearly we have co
co
I
1t/I(.9W =
I
anamei(nm)8.
n=oo m=oo
The required result follows from integrating term by term over  n to n.
°
I:"
Theorem 13.3. Let bn ~ and let q>(z) = 0( < n, z = re i8 (0 < r < I), then we have
°
(zW d.9
~~ 6n
at
1q>(zW d.9.
1t
Proof We introduce the function q(.9) =
{I I~I, 0,
when
1.91
when
0(
~
0(,
< 1.91
~
n.
Then we have
f at
f 1t
1q>(zW d.9
at
~
f 1t
Iq(.9WIq>(zW d.9 =
m,~
1
bnbmrn+m
1t
Iq(.9)1 2 ei(nm)8 d.9.
1t
When m =I n, we have a
1t
o
1t
=
4 O(n  m)2
(
0
1
sin(n  m)O() O(n  m)
~
0,
140
6. Arithmetic Functions
while when m = n,
f "
Iq(.9)12 d.9
=
23!Y. ,
1[
and therefore we have
a
n
Theorem 13.4. Suppose that
Izl < 1 and let co
n=O
Then there exist constants c, C such that Yn O< c < ;:=t (lr)t
(1 
where 61(r) + 0 as r + 1. In the first sum there are at most r)t terms, each of which is at most (1  r)i, so that the sum is at most (1  r)!. From Theorem 13.4 the second sum is
1 1 r
:::; 6(r)10g1(1  r)i.
Together we have 00
I
bnrn:::; K(1  r)~
1
+ 6(r)10g1(1
 r)i
1 r
n= 1
o
= O(10g  1_1_(1  r)i). 1 r
Theorem 13.6. Letf(x) and g(x) be two continuous realfunctions in the interval (a, b). Then b
b
b
I ff(x)g(X)dX I:::; (fF(X)dX f a
a
g2(X)dx)t.
a
Proof Let A be any real number and consider b
b
b
A2 f F (X)dX+2A ff(x)g(X)dX+ f g2 (X)dX a
a b
=f a
(Aj{X)
+ g(X))2 dx ~ O.
a
142
6. Arithmetic Functions
The discriminant of the quadratic expression cannot be positive and so the theorem follows. 0 Proof of Theorem 13.1. Suppose that
t < r < 1, z = reiiJ., 1 
r < oc < n12. Let
00
so that we have at once 00
g2(Z)
=
I
f(n)z"
"=0
and 00
(1  z) lg2(Z)
I
=
r(n)z".
"=0
If formula (1) holds, then 00
(1  Z)lg2(Z) = c
I
nz"
+ h(z)
"=0
= cz(1  Z)2 + h(z),
(2)
where 00
I
h(z) =
v"z",
"=0
We shall now derive a contradiction. From (2) we have
f e
3. There cannot be any real primitive character either if I = 1. For if m = 2m', 2,rm', then from n
== n' (mod m'),
(n,m) = 1,
(n',m) = 1
159
7.4 Character Sums
we deduce that n == n' (mod m) giving x(n) = x(n') so that x(n) is improper. Summarizing, the possibility for the existence of real primitive character occurs when
where Pi are distinct odd primes and a = 0,2,3. Moreover, if the character is primitive, then Cv = q>(p)/2 or
(~).
(x(n,p))"Hpl) = e"iindn =
Thus, if a = 0, then the real primitive character is the Jacobi symbol (n,m)
=
1.
If a = 2, then the real primitive character is nl ( n )
( 1)2 m/4 '
and if a
=
(n,m) = 1,
3, then there are two types of real primitive character:
)~n2  (m~8) ,
( 1
1)
n  1 n  1 ( __ n ) (_ 1)2+82
m/8
= (_
(n,m)
1)~(n2)29)
=
1,
( _n ) ,
m/8
(n,m)
=
1.
7.4 Character Sums Let m
S(a, X) =
L x(n)e21tian/m. n=1
Theorem 4.1. Let (mt. m2) = 1 and let X be factorized into
where Xl(n) is a character modml and X2(n) is a character modm2' Then
Proof Let n = mln2 + m2nl' Then as nt.n2 run over the complete sets of residues modmt. modm2 respectively, n runs over the complete set of residues modmlm2'
160
7. Trigonometric Sums and Characters
Therefore ml
Sea, X)
=
Xl (m2)X2(ml) L nl
m2
L Xl (ndxin2)e21tia(mln2 +m2 n d/mlm2
=1
n2::::::
1
Thus the study of character sums mod m is reduced to that of character sums to a prime power modulus.
Theorem 4.2. Let m = pl. If pia and X is a primitive character, or if p,ra and X is an improper character (but we exclude the case I = I, X = Xo), then S(a,x)
0.
=
Proof We make the substitution n = x(l
+ plly).
When I ~ x ~ pll, p,rx and I ~ y ~ p, the number n runs over the reduced residue system mod i, and conversely. Therefore o
p'l
P
Sea, X) = L x(x)e21tiaX/P' L x(l
+ plly)e21tiaXY/P.
y=l
x=l p,/'x
If x(n) is improper, then x(l
+ ily) = I,
Sea, X) = {
so that
O'
if p,ra,
p L x(x)e21tiax/P',
if pia.
p'l
x=l
If x(n) is primitive, then there exists u such that x(l from p
+ pl1U) #
p
x(l +pl1U)L x(l +ily)
= L x(l +pll(y+U»
y=l
y=l p
=
L x(l +ily), y=l
we have p
L x(l Therefore Sea, X) =
°also.
I; now pia and so
+ plly) = 0.
y=l
0
We shall write T(X) = S(I, X)·
161
7.4 Character Sums
If (a,m)
= 1, then m
x(a)S(a, X)
L x(an)e21tian/m
=
n=l
= S(l,X)· Theorem 4.3. Let
L
Cq(n) = (a,
e21tian/q,
q)= 1
where a runs over a reduced set of residues mod q. Then 1) cq(n) is a multiplicative function of q; that is if (qt. q2) = 1, then Cq,(n)Cq2 (n) = Cq,q2(n);
i 2)
Cpl(n)= {
pll,
if iln,
_pll,
if pl,tn, pilin,
0,
if pll,tn;
3)
Proof 1) can be proved by the substitution a = qla2 method described earlier. 2) follows from Cpl(n)
=
3) follows from I) and 2). Theorem 4.4.
pI
p''
a=l
a=l
+ q2al
with the familiar
L e21tian/pl  L e21tian/pll. D
If x(n) is a primitive character, then
Proof First consider the case m = pl. We have easily
1't'(xW =
't'(X)i(X) p'
=
L
p'
x(n)e21tin/pl
q=l
n=l p'
=
L
pI
x(n)e21tin/pl
L
x(nq)e21tinq/pl
q=l
n=l p'
=
L x(q)e21tiq/pl
pI
L X(q) L e21ti(1q)n/pl. q= 1
n
=1
p,tn
If pll ,t(q  I), then from Theorem 4.3, the inner sum on the right hand side in the above is O. We need therefore only examine the situation wh enplll(q  I), that is
162
q
=
7. Trigonometric Sums and Characters
I
+ pl 1U,
0
~ U ~
P  I. But now clearly p1
1c(xW
=
pi  pl 1
L:
_
i(l + pl 1 U)pl 1
u= 1 p
=
L:
pi _ pl1
i(l + i  1u).
u= 1
Now if x(n) is primitive, then there exists v such that x(l
i(l + pl1 V) # 0, 1. From p
p
L:
i(l +pl1 V)
+ pl1 V) #
L:
i(l +i 1u)=
u=l
0, I so that
p
L:
i(l +pl1(U + v)) =
. u= 1
i(l +i 1u),
u=l
we have p
L:
i(l + pl1U) = o.
u= 1
Therefore the case m Theorem 4.1. 0
=
pi is proved, and the general case follows at once from
We see therefore that c(x) =
evlm,
lei =
1.
However, the determination of e is no easy matter. For real primitive characters we know much more and in the next section we shall determine e when X is a real primitive character. Theorem 4.5. Let X be a real primitive character. Then, for odd m, we have
c(X)
=
{± ~
if m == I (mod 4), if m == 3 (mod 4).
±lym Proof This is similar to the proof of Theorem 4.4. If m p
(C(X))2 =
=
L:
X(q)
q=l
L:
e 21ti (1 +q)n/p = X(  I)p.
n=l
We already have x(  I)
so that the theorem follows.
= ( ~ I ) = ( _ I )p; 1,
0
7.5 Gauss Sums The trigonometric sum m1
S(n, m) =
p, then
p1
L:
x=o
e21tiX2n/m,
(n,m) = I
163
7.5 Gauss Sums
is the famous Gauss sum. In this formula the summation can be taken over any complete set of residues mod m. Theorem 5.1. If(m,m') = 1, then
S(n, mm') = S(nm', m)S(nm, m'). Proof Let x
=
my
+ m'z.
Then mm'
S(n, mm') =
L
e21tix2n/mm'
x=l m'
=
m
L L e21tin(my+m'z)2/mm' y= 1 z= 1
=
and hence the result.
m'
m
y=l
z=l
L e21timny2/m' L e21tim'nz2/m
D
We see that in order to evaluate a Gauss sum we need only deal with the case m=pl. Theorem 5.2. Let
b= {
1, 2,
when p is an odd prime, when p = 2.
Then, for 1 ~ 2b, we have
Proof Let x = y
+ p'bZ. Then, from
2(1 b) ~ I, we have
y= 1 z= 1 pld
=
L
pd
L e41tiyzn/pd
e21tiy2n/pl.
y=l
z=l
plc;
=
pb
L
e21tiy2n/pl
y=l ply
p'dl
=
pb
L
e21tix2n/pl2.
x=l
When p > 2, this is what is required. When p pl 3
P
L
x=l
the result also follows.
D
=
2, then from
pl 2 e21tix2n/pl2
=
L
x=l
e21tix2n/pl>,
164
7. Trigonometric Sums and Characters
From this theorem we see that the crucial points in the evaluation of a Gauss sum rest on the determination of S(n,2),
S(n,4),
S(n,8)
and p an odd prime.
S(n,p), Theorem 5.3. If 2,rn, then
= 0, S(n,4) = 2(1 + in), S(n,2)
7ti
= 4e4"n.
S(n,8) Proof Clearly we have 2Jti
S(n,2)
= 1 + eTn = 1  1 = 0,
S(n,4)
=
S(n,8)
= 2(1 + esn + es4n + es9n )
27ti
1 + eTn
27ti 4
27ti 9
+ eT n + eT n = 1 + in + 1 + in = 2(1 + in), 27ti
2ni
27ti
Theorem 5.4. If p is an odd prime, then
= (;)S(I,P) =
S(n,p)
(;)T(X).
Here x(a)
=
(~).
Proof The number of solutions to the congruence x2
== u (modp)
is
and therefore
±
e27tix2n/p
=
x=1
f (1 + (~))e27tiun/p = f (~)e27tiun/p P P
u=1
= (':.)
p
which is the required result.
u=1
±(~)
v=1
0
P
e27tiv/p,
165
7.S Gauss Sums
Theorem 5.5.
=
S(l,p)
if if
{JP, iJP,
== 1 (mod 4), p == 3 (mod 4). p
Proof From the above theorem and Theorem 4.5 we have S(l,p) =
{±±iJP, JP,
if p == 1 (mod 4), if p == 3
(mod 4),
which, combining into a single formula, gives
t(1 + iP)(l 
i)S(1,p)
=
± JP.
If we can prove that
+ i P)(1
91H(1
 i)S(l,p)} > 
JP,
where 91{x} represents the real part of x, then the theorem will follow. Now itis easy to see that p1
I
S(1,p)  1 =
t(pl)
I
e27tix2jp =
x= I
(e27tix2/p
+ e 27ti(pX)2/ p)
x= 1
t(pl)
=
2
I
(1)
e27tix2/p.
x=l
Let j(x) be any function. Then t(pl)
I
t(pl)
j(x)
x=l
(p x ) = I f (x)  . pl
+ I f x=l
2
x=l
2
This formula clearly holds because the first term on the left hand side is merely the sum of those terms on the right hand side when x is even, and the second term is the sum on the right hand side when x is odd. We take j(x) = e27tix2/p and note that j(~  x) = iPe27tix2jp. Then, from (1), we have pl
t(1 + iP)(S(l,p) 
I
1) =
+ Z,
(2)
e27tix2/4P.
(3)
e27tix2/4p = W
x=l
where
W
=
I
e27tix2/4p,
x.;;Jp
Z
I
=
JP<x.;;pl
From (2) we have
t(1 + i P)(1 Since 91H(l
+ i P)(1
 i)S(l,p) 
t(1 + i P)(l
 i) = (1  i)(W + Z).
 i)} is 1 or 0, it follows that
91H(l + i P)(l  i)S(1,p)} ~ 91{(1  i)(W + Z)} ~ 91(1 OW 
filZI.
(4)
166
7. Trigonometric Sums and Characters
From cos x
+ sin x ;::::
1 when 0
9l{(1  i)W}
~
~
x
n12, we deduce that
nx2 nx2) 1 r L  ( cos+ sin ;:::: [vPJ;:::: yp. 2p 2p 2
=
(5)
Jp On the other hand, if we write in Z, x:S;;
nx 2p
= cosec,
Wx
then (6)
Therefore, from (3) and (6) we have pl
L
2iZ =
x~q+
(v x 
Vx 
dwx,
1
that is
21Z1
=
Pil
I
viwx 
+ VplWp 
Wx + l )
VqWq+ll
x~q+l
pl
I
~
(Wx 
Wx+l)
+ Wp + W q + l = 2wq + l
x~q+l
r:
2p q+l
~~2vp
(because
Wx
(7)
is decreasing). From (4), (5) and (7) we finally have
The theorem is therefore proved.
0
Summarizing we have the following result: Theorem 5.6. If m is odd, then
S(n,m)=
{ (:)fo, fo, .(n)
if m == 1 (mod 4), if m == 3
(mod 4).
1 
m
Proof We use induction on the number of distinct prime divisors of m. If m = pi, then we have by Theorems 5.2 and 5.4, that I
S(n,p) =
{'
p2,
if 21/,
pt(l1)S(n,p) =
(~)pt X2 have the moduli PI, m' respectively, and x(n) = XI(n)X2(n). Therefore, from Theorem 3.6.4 and the induction hypothesis, we have
{fi:} ifi: . {P} iP ~~ {fi:} {P} ifi: iP
r(x) = ( m')(PI)  . PI m' = (
1)
2
2'
•
== 1 (mod 4) if m == 3 (mod 4) 2) a = 2. Let m = 22m'. If m' = 1, then x(l) = 1, if m
{ Jp1m' =Fm, = iJplm' = iFm,
or
X(l)=l,
or
X(  1) =  1.
X(3)
=  1 so that
4
I
r(x) =
x(n)e21tin/4
= e21ti /4  e61ti /4 = 2i.
n= I
If m' > 1, then from Theorem 4.1 and 1)
m'1(4)
r(x) = ( 1)2 m' 2i
.{P=i
== 1 (mod 4) if m' == 3 (mod 4) if m'
Fm, ip=Fm,
3) a
or
X(  1)
=  1,
or
X(  1)
=
1.
= 3. Let m = 23 m'. When m' = 1, we have B
r(x)=
I
n= I
.
x(n)e 21t1n /B =
{e 21ti /B _ e61ti /B _ el 01ti/8 + eI41ti/8 = j8, if X(  1) = 1,
.
.
.
.
e21t '/B + e6",/B  el 0",/8  eI41t ,/8 = ij8, if X(  1) =
Suppose that m' > 1. If x(n) = ( 1)t
=
IQlL
e27tina
I= 11 
e27tiQa . 1  e 27t1a
n=O
::::;
2
I
1
=
11  e 27tia l
Isin n!Y.1
1
~
"" 2
(when 0 ::::; ~ ::::;
t, sin n~ ~ 2~, so that Isin n~1 ~ 2< 0).
Theorem 7.3. If2,(q, then
ImlL
qlL
I
e27tix2/q  m e27tix2/q ::::; x=o q x=o
Jq log q.
Proof Clearly we can assume that m ::::; q. From Theorem 7.1 we have
mlL
x=O
e27tix2/q
qlL ql =m L =
e27tix2/qg(x)
x=o
e27tix2/q
1
ql qlL
+_L
qx=o
e27ti(x2+nx)/q
qn=lx=O
1
27tinm/q  e . . 1e 27t1n/q
From the formula for a Gauss sum we have
ql Ix~o
e 27ti(x 2+nx)/q
I= Iql x~o
e27ti(X + tn)2/q 1* ::::;
so that
Iqil
x=o
e27tix2/q _ m
qi
1
e27tix2/q
I
q x=o
ql ~L1
"" Jq n=l 2(~) *
Here
t represents the solution to the congruence 2x == 1 (mod q).
Jq,
172
7. Trigonometric Sums and Characters
I
~
t(ql)q
t(ql)
I
I =Jq I Jq n=l n n=l n < Jqt('I1)(_IOg(1 ~) + IOg(1 + ~)) n=l 2n 2n t(ql)
= Jq
I n= 1
+ log(2n + I))
(log(2n  I)
= Jqlogq.
0
Theorem 7.4 (polya). Let p be an odd prime, I character modp. Then
~
~
m
p, and X be a nonprincipal
I:t~ X(x) I< Jp logp. Proof From Theorem 7.1 we have
ml pl I x(x) = I x(x)g(x) x=o
x=o
m P
=
I
1
x(x)
Px=o
IPl
+ I
x(x)
Px=O
pl l_e21tinm/p I e21tinx/p . n=l Ie 21t1n/p
From Theorem 2.3, Theorem 4.4 and Theorem 7.2 we have
ml I JP1II _e21tinm/p _21tin/p IIPl I x(x)e21tinx/p I I I x(x) ~  I x=O Pn=l I e x=O I pl I ~ r.: I  ( ) < Jplogp. 0 V Pn =12 ~ p
This theorem has the following application: Theorem 7.5. Let p be an odd prime and dl(p  I). Then there is always a dth power nonresidue modp which is less than Jp logp.
Proof Let R represent a dth power residue not exceeding m. Then R=
where X(x)
=
mid
I
d
m
I  I e 21tia ind x/d =  I I e 21tia ind x/d x=l da=l da=lX=l
e21tiindx/d. From Theorem 7.4, we have dI r.: IR dml 2. From Theorem 9.1 we have Jl(k)
0=
k
L  L
Ih(p)l 1
a
L L'
klp1q>(k) u=1 a=O n=a (u,k)= 1
e21tiuindn/\
179
7.9 The Problem of the Distribution of Primitive Roots
where If means that we omit the term n = O. On the right hand side of this equation the term k = 1 is equal to Ih(p)l 1
I
a=O
a
Ih(p)l 1
n= a
a=O
If 1 = I
2a = Ih(p)12  Ih(p)l·
For those terms in which k :f 1 we use Theorem 9.2, taking A
I
a
Ih(p)ll [
=
Ih(p)1  1, so that
Ih( )1 2
a~o n~~a x(n) ~ Ih(p)lpt 
;
,
where
Therefore Ih(p)12  Ih(p)1
~ (lh(P)lpt 
I 1J1((~)1 ({)(k)
2 Ih(P;1 )
p
= 2m (lh(P)lpt
_
klpl ({)
Ih~;12).
That is Ih(p)1 ~
2mpt
+1
1 + 2m/p
t < 2mpt.
0
From Theorem 9.3 we immediately deduce: Theorem 9.4.
If p == 1 (mod 4),
then we have the primitive root
Proof We have to prove that Ih(p)1 is a primitive root. Suppose otherwise, so that  Ih(p)1 is now a primitive root. But Ih(pW
== 1
(modp),
I
1, pI II (kak, ... , 2a2, al), and that f.ll' ... ,f.lr are distinct roots of f'(x) == 0
o ~ x pl/4H, we have
INiH (~)I
< eH, P is the Kronecker symbol. He also used this to give an estimate for n2(p), where (11) p the least quadratic nonresidue mod p, namely n2(p) = O(p±Je +e). Burgess's method can be generalized and extended to give estimates for the least primitive root h(p) and the least dth power nonresidue nip): h(p) = O(p±+') (see D. A. Burgess [13J and Y. Wang [62J), nip)=O(pl/A+,), A=4e 1  1 / d (d~2); nip) = O(pB), B = (log log d + 2)j410g d (d> e33 ) (see Y. Wang [63J). n=N+l
Chapter 8. On Several Arithmetic Problems Associated with the Elliptic Modular Function
8.1 Introduction The following four important functions frequently occur in the theory of elliptic modular functions:
n (l 00
qo
=
q2n),
n~l
n (l + q2n), 00
ql
=
n~l
n (l + q2nl), 00
q2
=
n~l
00
Following the tradition in the theory of elliptic modular functions we use q to represent the variable, which can be real or complex and which satisfies Iql < I. The four infinite products then clearly converge. We do not give any deep discussion on the properties of the elliptic modular function in this chapter. Indeed we do not even define an elliptic modular function and instead we shall study the following associated arithmetic problems: the partition of integers, the sum of four squares, and the transformation of power series related to qo, ql, q2, q3' The problems of convergence arising in the chapter are very simple and any reader familiar with advanced calculus can easily supply the details. (In §8 we also use ndimensional multiple integration). We shall therefore omit all qiscussions on convergence in this chapter. The following is the first and simplest relationship between ql, q2, q3' Theorem 1.1. if Iql < I, tHen Proof We have
n (l 00
q2q3
=
q2(2nl»).
n~l
We rearrange the terms in ql by taking out all the powers of 2 from 2n giving
187
8.2 The Partition of Integers.
ql =
00
00
00
n=l
n=l
n=l
f1 (l + q2(2nl) f1 (l + q4(2nl) f1
(1
+ q8(2nl) ....
From this we see that 00
qlq2q3
=
00
n=l
=
n=l
00
f1
00
(l +q4(2nl)
n=l
00
00
n=l
n=l
f1
00
(1 +q8(2nl) ...
n=l
f1 (l + q4(2nl) f1 (l + q8(2nl)
(1  q4(2nl)
n= 1
=
00
f1 (1_q2(2nl) f1 (1 +q2(2nl) f1
...
00
f1
f1 (1
(1  q8(2nl)
n=l
+ q8(2nl) ... = ... = 1. 0
n=l
The theorem can also be proved from the equation 00
f1 (1
qOqlq2q3 =
 qn)
n=l
00
00
n=l
n=l
f1 (1 + qn) = f1
(1  q2n)
= qo.
8.2 The Partition of Integers Let n be a positive integer. Any collection of positive integers whose sum is equal to n is said to form a partition of n. For example:
5=4+1=3+2=3+1+1=2+2+1 = 2 + 1 + 1 + 1 = 1 + 1 + 1 + 1 + 1, so that there are 7 partitions of 5. We denote by p(n) the number of partitions of n, so that in the above example we have p(5) = 7. Ifwe restrict to those partitions of n in which each term in the partition does not exceed r, then we denote by Pr(n) the number of such partitions. For example, P3(5)
=
5.
Theorem 2.1. If Iql < ~, then 00
1+
n~l Pr(n)qn =
1 (1 _ q)(l _ q2) ... (l _ qr) .
Proof The right hand side of the equation above is equal to
(1 + q + q2 + q3 + ... + qXI + ... ) x (l + q2 + (q2)2 + (q2)3 + ... + (q2)X2 + ... ) x (l + q3 + (q3)2 + (q3)3 + ... + (q3)X3 + ... ) x ... x (1 + qr + (qr)2 + (qr)3 + ... + (qT' + ... ),
188
8. On Several Arithmetic Problems Associated with the Elliptic Modular Function
and the coefficient of qn is the number of nonnegative integers solutions to Xl
which is Pr(n).
+ 2X2 + 3X3 + ... + rXr = n
0
We can prove similarly:
If Iql
or
x _ n 
(l  q2m2n+2)(1  q2m2n+4) .. .. (1  q2m) X q (l _ q2m+2n)(1 _ q2m+2n2) ... (1 _ q2m+2) o· n2
From (3) we have (l  q4m)(1  q4m2) ... (1 _ q2m+2) Xo=~~~~
(l  q2)(l  q4) ... (1 _ q2m)
so that when 0
~ n ~ m 
,
1, n2
X n 
q X' (l _ q2)(1 _ q4) ... (1 _ q2m) n'
where X' n 
(1  q2m2n+2)(l _ q2m2n+4) ... (1 _ q2m) (1  q2m+2) ... (1 _ q4m) (1 _ q2m+2n)(l _ q2m+2n2) ... (l _ q2m+2) (4)
It follows that (2) can be written as (1  q2)(l  q4) ... (1  q2m)({)m(z)
= X~ +
m
I
qn 2 (zn
+ zn)x~.
(5)
n=l
As m + 00, X~ + 1 so that the identity in the theorem follows. However we still have to justify the process of taking the limit of the individual terms in the series. Let Uo. m
= X o, if 1 ~ n
~
if n > m,
m,
190
8. On Several Arithmedc Problems Associated with the 'Elliptic Modular Function
so that co
L un,m'
({)m(z) =
(6)
n=O
As m +
00,
the term un,m + Un where (n > 0).
We have co
n (l + Iql2k) = Kl
IX~I
(using Theorem 6.6)
6(c/2)
which proves (3). 2) We next prove: Given any positive e there exists A (= A(e)) such that I p(n) > _e(Cs)n t . A
We use induction on n, but the choice of A will not be made clear until later. From Theorems 6.3 and 6.4 together with the induction hypothesis we see that (4)
203
8.6 Estimates for p(n)
Since e X
;?;
I  x, the double sum is
I
;?;
Pk 2 ) I  I (c  e) ,2 n'
let(ce)lknt (
Ik';n
ce
=Il' I2 2n'
(say).
For any positive t we always have e X
I
let(ce)lknt
Ik>n
=
(5)
O(x t ), so that
(ni I h= o(n I: I: =0
11
it (lk)
~~t)
Ik>n
h
11itkit)
1= 1 k= 1
= O(n it ),
if t> 8.
(6)
From this and Theorem 6.6 we have 2n 1 n
II> 3(c 
e)
2n 2 n
C3Jn
2 
2n 2 n
(I
I)
In
= 3c2 + 3 (c _ e)2  c2  C3 n (7)
(using
I
I
2
(c  e)
2"=2 c
J x
3
dx>2ec 3 ).
ce
On the other hand, by the binomial theorem and Theorem 6.5,
I2 = I
k 2 [3e t (ce)lknt
Ik';n n
~
co
I
I
k 2
k=1
1 3 e·t(ce)lknt
1=1 e  t(c e)kn  t
n
~ 12
I
k 2
k=1
=0 ( n
(le
I
t(c
e)kn
1)4
t(c
e)kn
t)2 ).
n
k= 1
(l 
e
We divide the sum in the bracket into two parts: n
I = I k=1
k,.j~
+
I
j~ etcx,
o
which gives L _ (1 _ k ..
Jn
et~ce)knt)2 = 0 (n
L k ..
Jn
:2) =
O(n).
In the second part t(e  e)kn t ~ t(e  e) and
so that
From this and (8) we see that L2 = O(n2).
(9)
Collecting (4), (5), (7), (9) we have I np(n) > _e(ce)n\(l A
+ 2ee 1 )n 
e4Jn).
When
e4 )2 (2ee
n>  1 we have I
p(n) > _e(ce)n t .
(10)
A
When n :::; e;(2ee 1 ) 2 we take A large enough so that (10) holds. The theorem is proved. 0
8.7 The Problem of Sums of Squares Let r.(n) denote the number of sets of integer solutions (x h
xi + ... + x; = n. From Theorem 6.7.5 we already have r2(n)
=
L ( l)t(Ul), uln
... ,
x s ) to the equation
205
8.7 The Problem of Sums of Squares
where U runs over the odd divisors of n. This theorem is clearly equivalent to the following: Theorem 7.1.
if Iql
2 n= 1 and 1
Ck
00
+I
= Uk
2
I
00
Uk+1
1=1
+I
U,Uk+1 

k 1
I
2 , =1
1=1
U,UkI'
Now and so that
Theorem 7.6.
Proof In Theorem 7.5 we take 9
G+ n~o
n~o
U4n+ 1 
1
1
00
16 1
= 
16 1
+ I 2
+ I (
1)mc2m
m=l 00
nUn
+ I (
n=l
l)mu2m (l
+ U2m

m)
m=l
00
00
I
=  +16
nUn
2 n=l 1 00
2 1
y
00
I
=  +
U4n+3
= nl2 giving
(2m 
I)U2m1
m=l
+ I (
l)mU2m (l
+ U2m)
m=l
00
+2 I
(2m 
I)U4m2
m=l
1
1
=  +
00
I
00
(2m 
16 1
2 m=l 1 00
16
2n=1
=+
I
4.j'n
nUn'
I)U2m1
+ I
m=l
0
(2m 
I)U4m2
(by Theorem 4)
208
8. On Several Arithmetic Problems Associated with the Elliptic Modular Function
Theorem 7.2 now follows easily from Theorem 7.1 and Theorem 7.6. From Theorem 7.2 we deduce at once: Theorem 7.7. r4(n)/8 is a multiplicative function.
D
Theorem 7.8 (Lagrange). ,Every positive integer is the sum of four squares.
D
Apart from these we also have the following application: Theorem 7.9 (Jacobi). q~  q~ = l6qq~. Ifwe substitute the representation formulae in §1 into this identity then we have
CDI
(1
+ q2nI)Y 
CDI (1  q2nI)Y
=
16q
CDI
(l
+ q2n)y.
(Jacobi called this result "Aequartro identica ratis abstrura".)
00
(qoq~)4
=
L r4(n)( l)nqn n=O
and (2qoqi)4 =
C=~oo qn(n+l)r,
we see that our required identity is equivalent to
Let s4(n) denote the number of solutions to (4)
where n must be odd. Thus our theorem has the following arithmetical interpretation: if n is odd, then s4(n) is equal to 2r4(n). We multiply equation (4) by 4 and from completing squares we have (2XI
+ 1)2 + ... + (2X4 + 1)2 = 4n.
The r4(4n) solutions to the Diophantine equation
209
8.7 The Problem of Sums of Squares
have only two types: (i) Yt>Y2,Y3,Y4 all odd, (ii) Yt>Y2,Y3,Y4 all even. From this it follows that
From Theorem 7.2 we have
I
r4(4n)=S
m=sI(m+2m)=3(sIm)=3r4(n), min
ml2n
min
and hence
The theorem is proved.
0
Exercise 1. Use the following method to prove that 1
I
1
n2
1
+ 22 + 3 2 + 4 2 + ... = 6".
Obtain the asymptotic formula
for the number A(x) of lattice points inside the four dimensional sphere
Find another representation for A(x) with Theorem 7.2 and compare the results. Note. From this exercise and (6.14.2) we deduce at once that
~ J1(n) = ~
L..
n=l
n
2
n
2·
Exercise 2. Show that
Exercise 3. Use the identity (1  cosn.9) cot2 t.9
= (2n  1) + 4(n  l)cos.9 + 4(n  2) cos 2.9 + ...
+ 4cos(n to prove that
1).9 + cosn.9
210
8. On Several Arithmetic Problems Associated with the Elliptic Modular Function
I 21 {cot 8 8 2
1 12
X
+  + (1 1
+ 3 (3 1 X
1
=
 cos 8)
X
 cos 38)
X3
2X2
+ (1 1
 cos 28)
X2
+ ... }2
(~cot2~8 + ~)2 + ~{~(5 + cos 8) + 2 8 2 12 12 1  X 1
3
3X
+ 13_
3
X3
(5
X
2
X
2
(5
+ cos 28)
+ cos 38) + ... } .
8.8 Density Let r,(n, q) denote the number of solutions to
xi+"'+x;=n
(modq).
(1)
Consider the substitution
Xi + ... + x; = y. There can be q' values on the left hand side and q values on the right hand side. This means that corresponding to one value of y there are, on average, q'1 solutions. We now consider the ratio between the number of solutions and the average number A
(
LJqn
)
=
rln, q)
,1'
q
Let
we call this the pdensity of the congruence (1). We also define oo(n)
1
= lim~o 2(j
r··f
dX1 ... dx.,
which we call the real density of the congruence (1). We now calculate the values of the various densities. Theorem 8.1. When s is even the real density is equal to
(2)
211
8.8 Density
Proof We have, with polar coordinates, 21t
II
(l_x 2 _y 2)a 1dxdy= I
1
d9I(lp2)a1PdP=~' o
o
We next use induction to prove the result: !...
, V=
dx "'dx 1
xi  ...  x; > 0
n2
=
'G}
1
Let
= Yv2 JI
Xv
 xi  x~
(v=3,oo.,s).
Then
v, =
II
,2
xi 
(l 
x~)2dx1
dX2
lxixi > 0
= ~ V,_ 2 = (n:)/2 . 

2
,
2 .
We then have . I ( oo(n) = hmbO
22 3 (2) n Also, from Theorem 8.1, we have
214
8. On Several Arithmetic Problems Associated with the Elliptic Modular Function
so that p>2
Ifn is odd, then the theorem is proved. Ifn is even, then, from Theorem 8.6, we have
The theorem is proved.
Theorem 8.8.
D
If s = 8, then = 16(  l)n I
bs(n)
( l)d d 3 .
din
Proof Let n = 2tn', 2,rn'. Then
}]2 op(n) = 1516 ,(4)1 n'3u3(n') = 96n4n'3 u3 (n'). Also, from Theorem 8.1, we have
n 3 oo(n) = _n 6' 4
so that oo(n)
n op(n) = 16 . 23tu3(n'). p>2
Also o2(n)
=
(l  2 3 (t+l). 15)(1  t)I;
hence When n is even
I ( l)dd 3 =
 u3(n') + 23u3(n') + 23.2 u3(n') + ... + 2 3 . uj(n') t
din
= 
The theorem is proved.
2U3(n')
+
23 (t+ 1) _ 1 23 _ 1 u3(n')
D
Exercise 1. Prove the following: Let s = 2r. If r is even, then
215
8.9 A Summary of the Problem of Sums of Squares
If r is odd, then L(r)6s(2 tn')
= (( ~, 1) + ( r 1)2(1r)(d 1))n'lrPr1(n'),
where
1:
L(r) =
n= 1
X(7) , n
and x(n) = 0, 1,0,  1 when n == 0, 1,2,3 (mod 4). Also
pt(n) = L (~)qt. q qln
Exercise 2. Prove that 2(n)
=
2r2(n).
Exercise 3. Prove that
8.9 A Summary of the Problem of Sums of Squares In the previous section we proved that r 4(n) = 4(n), but is this a mere coincidence? Actually we can prove that, for 3 ::;;; s ::;;; 8, we have r.(n)
= .(n),
and that this is no longer true if s > 8. Up to the present rs(n) has been explicitly evaluated for s'::;;; 24. For example: r3(n)
16 = n!X2(n)K(  4n) 1t
f1
(I +  + ... +;=t 1
1
P
p21n
P
where the definition of"C is p2tln, p2(t+ 1),tn, K(  4n)
=
I Lco (_ 4n) , m=l
m
m
and if 4 an == 7 (mod 8), if 4 an == 3 (mod 8), if 4 an == 1,2,5,6 (mod 8),
216
8. On Several Arithmetic Problems Associated with the Elliptic Modular Function
and here the definition of a is 4a ln, 4a +1,rn.
where u1't(n)
= I(
1)dd 1 1,
din
and T(n) is the coefficient in the power series expansion 00
q«l  q)(l  q2) ... )24 =
I
T(n)qn
n=1
and if nl2 is not an integer, then T(nI2) = O. From Theorem 3.6 we have 00
«1  q)(l 
q2)(l  q3) ... )3
=
I ( 1)n(2n + l)qtn(n+
1),
n=O
so that
«
T(n) =
lY'(2x1
+ 1) + ... + (
ly8(2xs
+ 1»
txdxl + 1)+· .. +tx8(t8+ 1)=n1
I Y;+"'+yi=8n
s
I ( l)t(Yi1)Yi' i=l
2,j'Yl'''Y8
The following table records the mathematicians who did the evaluations: s
r.(n)
2,4,6,8 3
5,7 10, 12 14, 16, 18 20,22,24 9, II, 13 15, 17, 19 21,23
Jacobi, 1828 Dirichlet Eisenstein, Smith, Minkowski Liouville, 1864, 1866 Glaisher, 1907 Ramanujan, 1916 Lomadze, 1949
Chapter 9. The Prime Number Theorem
9.1 Introduction The main aim of this chapter is to prove the following formula: X
(I)
n(x) '"   . logx
Here n(x) denotes the number of primes not exceeding x, and the formula (I) is the famous prime number theorem. In this chapter we shall give two proofs. The first proof makes use of some rather deep analytic tools (the reader needs to know a little advanced calculus and complex function theory) but is relatively straightforward, the fundamental idea being due to N. Wiener. Although the other proof does not require much analytic knowledge and can indeed be classified as an elementary proof, it is more difficult to understand. This proof is due to Erdos and Selberg. One of the difficult problems in the long history of prime number theory is the search for an "elementary proof" of the prime number theorem and success came in 1949. In the following sections we do not give a direct proof of the formula (I). Instead we prove two formulae, each of which is equivalent to (I). Suppose that x > O. Let 9(x)
=
L logp,
(2)
p~x
tjJ(x)
=
L
A(n)
L' logp.
=
(3)
In formula (3) A(n) is the von Mangoldt function of Example 6in §6.1. 9(x) and tjJ(x) are called Chebyshev's functions. It is easy to see that tjJ(x) = 9(x)
+ 9(xt) + 9(xt) + ...
(4)
and tjJ(x)
=
L [~Og x] logp, p'iix
(5)
ogp
where [~] denotes the integer part of ~. Theorem 1.1. We have
I· 1m
n(x)
x ... oox(logX)l
9(x) I· tjJ(x) = I· 1m   = Imx"'oo
X
x"'oo
X
(6)
218
9. The Prime Number Theorem
and lim n(x) = lim 8(x) = lim tfJ(x) . x+oox(1ogX)l x+oo X x+oo X
(7)
Proof From (4) and (5) we derive easily 8(x) ~ tfJ(x) ~
logx Iogp = n(x) log x, p"'x logp
I
so that  . 8(x) hm  x+ 00 X
~
 . tfJ(x) hm  x+ 00 X
~
. n(x) hm 1 . x+ 00 x(log x)
Now let 0 < oc < 1, x > 1. Then 8(x) ;:::
I
logp;::: {n(x)  n(x")} log XIX ;::: oc{ n(x)  x"} log x.
xOl 0 (i = 1, 2, 3, 4) such that for x ;::: 2, (8)
and (9) Also from Theorem 1.1 we see at once that in order to prove formula (1) we need only prove that tfJ(x)
or
~
x
(10)
219
9.2 The Riemann ,Function
(11 )
8(x) '" x.
Before we prove formula (10) we need some preparation.
9.2 The Riemann ,Function From now on we write s = u function defined by the series
+ it for
a complex number with u and t real. The
1
00
(s)
I ;
=
(u> 1)
(I)
n=ln
is called the Riemann (function. Let a > I. When u ~ a, because
0011001001 ,,~,,~,,1 L... s ~ f...J O'~ L... a' n=N n
n=N
n
n=N n
we see that the series for (s) is uniformly convergent. Since a is any real number greater than I, it follows that (s) is an analytic function in the half plane u > I.
Theorem 2.1. Let 1
h(s) = (s)   . s I Then h(s) is analytic in the half plane u > 0, and
Ih(s)1
~ 1.1u
(u > 0).
Proof Let
f
n+l
fn(s) = n s

u s du,
n
so that (2)
Since
f
In s  usi
=
1
sv s 1 dv 1
n
f
n+l
u
~ lsi
n
v lJ 
1
dv
(n ~ u ~ n
+ I),
220
9. The Prime Number Theorem
we have
If
n+1
If,,(s) I =
n+1
(n S
~ lsi
u')dul

n
n
Suppose that 0 < a
~
(f
f v a 1dv.
~
~
b, 
T~
t
~
Jb 2 + T2 Na
T. Then
a,
so that the, series L:'= 1f,,(S) is uniformly convergent in 0 < a ~ (f ~ b,  T ~ t ~ T. Since a can be arbitrarily near 0, and b, T can be arbitrarily large it follows that h(s) = L:'= 1 f,,(s) is analytic in the half plane (f > O. From this we see that (2) can be used as an analytic continuation for '(s) into the half plane (f > 0, and s = 1 is the only simple pole with residue 1. From (2) we derive at once co
If,,(S)I~lslfva1dv=~ I '(S)~I=I s 1 n=l
The theorem is proved.
((f> 0).
(f
0
Theorem 2.2. In the half plane (f
~
Proof When (f> 1 the series Theorem 5.4.4
L:'= 1 (lln
1, '(s) # O. S)
converges absolutely so that from
(3)
here the product is over all primes p. Since each factor in the product is nonzero and the product converges absolutely, it follows that '(s) # 0 when (f > 1. Since '(s) has a pole at s = 1 we are left to prove: when t # 0
'(1
+ it) #
O.
Now consider the funct!on (e> 0, t # 0).
From (3) we know that
221
9.2 The Riemann ,Function
where a
P
1 . I1 3
= I1  pI 1+e
1
pI +e+it
1 . I1 4
1
:::::7"
pI +e+2it
11
'
so that
= From 3
00
1
m=l
m
L _p(1
+e)m(3
+ 4cos(mtlogp) + cos (2mtlogp)).
+ 4 cos 9 + cos 29 = 2(1 + cos 9)2
~
0, we have loga p ~ 0, that is
ICfJe(t)1 ~ 1.
Suppose that (I
+ it) = O.
(4)
Then
f
1 +e
(I
+ e + it) =
nO"
+ it) dO" = O(e).
From Theorem 2.1, we have e(1
+ e) = 0(1)
so that, for any small e, we have CfJe(t)
and this contradicts (4).
= O(e),
0
Theorem 2.3. Let
ns) (s) When
0" ~
1
+s _
1 = g(s).
1, g(s) has a continuous first derivative.
Proof Differentiating the function h(s) in Theorem 2.1 we have
1
ns)
=  (s _ 1)2 + h'(s).
Here h'(s) is infinitely differentiable in
is regular in the half plane 0"
~
0"
> O. Also from Theorem 2.2, we see that
1
s 1
(s)
1 + (s  l)h(s)
1, so that 1 + (s  1)h(s) ¥ 0 in the same half plane.
222
9. The Prime Number Theorem
Therefore
_ (
I
_ h'(S))(S _ I)
(s  1)2
I
         =    + g(s), I
+ (s 
I )h(s)
S I
and here g(s) has the required property stated in the theorem.
D
9.3 Several Lemmas Theorem 3.1. If f(x) has a continuous first derivative, then b
ff(x)eiX1dX
=
oG)·
(1)
a
Proof From integration by parts we have b
f f(x)e ixt dx
=
b
h
{[f(x)e ixtJ :  f f'(x)e ixt dX}
a
=0
G)·
a
Theorem 3.2. 00
sinx f dx=n. x
(2)
00
Proof Let 00
sinIXx J= f ekx~dx
(I ::;:;
IX ::;:;
2, 0 ::;:; k ::;:; I).
o
Fix k > 0 so that the integrand is now a continuous function of IX and x, and the partial derivative with respect to IX is e kx cos lXX, which is also continuous. From the convergence of the integral 00
f ekxdx o
we see that the integral 00
f ekxcoslXX dx o
converges uniformly in I ::;:;
IX ::;:;
2. We can therefore differentiate J under the
223
9.3 Several Lemmas
integral sign giving
Here the right hand side is obtained from integrating by parts twice. From integration formulae we have
With IX fixed, when 0
~
~
k
o ~ k ~ 1. Therefore
1, J is uniformly convergent so that J is continuous for
f 00
lim J= kO+
sin IXX IX 1t dx= lim tan 1 =. X kO+ k 2
o
Taking in particular
IX
= 1, we have 00
00
sinx
f sin x
f dx = 2 dx = x
o
1t.
X
o
00
Theorem 3.3. Let a < 0 < h.
If f(x) has a continuous second derivative, then b
1f sinwx lim  f(x)dx = f(O). ro 00 1t
(3)
X
a
Proof We consider b
sinwx f (f(x)  f(O))x dx. a
At the point 0, (f(x)  f(O))/x has a continuous first derivative so that from Theorem 3.1 we have b
sinwx
lim f (f(x)  f(O))dx = 0, (0+
X
00
a
that is
f b
b
sinwx lim 1 f(x)dx X
ro 00 1t a
=
flO) lim 1 fSinwx dx X
ro 00 1t
a
224
9. The Prime Number Theorem
f bro
1 lim =fiO)
sinx dx X
11: ro> 00 cro
f 00
1 =fiO)11:
sinx dx, x
00
and the result follows from Theorem 3.2.
0
Theorem 3.4. Let A > 0, and
K;.(x) =
Ixl { 1 2A.'
if Ixl:;;; 2A, if Ixl > 2A.
0,
Then
f . ./he 00
K;.(t)e,xt dt = k;.(x),
1
(4)
00
where
{ ./he2x =
~(SinAX)2 ,
k;.(x)
if x =I 0,
2A
if x
./he'
= 0.
Proof It is easy to see that
fo f (1 H
k;.(x) =
(5)
;A)cosxtdt.
o
If x = 0, then clearly k;.(x) =
1
M:"2A.
y2n
If x =I 0, then integration by parts gives the required result at once.
0
Theorem 3.5. We have
f . 00
K;.(x)
1 =./he
k;.(t)e,xt dt.
00
(6)
225
9.3 Several Lemmas
In particular, with A = 1, x
= 0, we have 00
(7) 00
Proof We first consider the integral
f . ro
lew) =
1 fo
f ro
k;.(t)e,xtdt =
2 fo
k;.(t)cosxtdt.
0
ro
From (5) we have ro 2;'
lew)
= ~ f f (1 o
;A) cos utcosxtdudt
0 ro
2;'
=
~f
A) du f (cos(u + x)t + cos(u 
( 1  2U
o 2).
= ~f
o
(1  ~)(sin(U ++ 2A
7t
x)t) dt
u
x)w x
+ sin(u 
X)w)dU. u x
o
If x> 2A we have lim ro _ oo lew) = 0 from Theorem 3.1; if 0 < x < 2A we see from Theorem 3.1 and Theorem 3.3 that in the above formula the limit of the first term is o and the limit of the second term is 1  X/2A. Since the integral in (6) is a continuous function of x, we see that K;.(U) = 0, K;.(O) = 1. The theorem is proved. D Theorem 3.6. Letf(t) ~ 0(0 ~ t ~ 00), andforany T > 0, the interval 0 ~ t ~ Tcan be divided into afinite number of sections in each of whichf(t) is continuous. Suppose further that, for any e > 0, the integral 00
converges. Then 00
00
lim f e''f(t) dt = ff(t) dt.
,0
o
(8)
o
Proof Since f(t) ~ 0, S~ f(t) dt increases with respect to T so that S~ f(t) dt exists either as a finite number or 00. Now
226
9. The Prime Number Theorem 00
00
f e'1j(t) dt
~f
f(t) dt,
o
o
so that 00
00
lim f e''l'(t) dt ,"'0
~ ff(t) dt.
o
o
On the other hand T
00
~f
f e'1j(t) dt
T
~ e,T f f(t) dt,
e''l'(t) dt
o
o
o
so that T
00
~ ff(t) dt.
lim f e'1j(t) dt ,"'0
Letting T +
00
o
o
00
00
we have lim f e''l'(t) dt ,"'0
~ ff(t) dt, o
o
and the theorem is proved.
0
9.4 A Tauberian Theorem Definition. If f(x) is defined in 
00
<x
0, then f(x)
+
I (x
+ 00).
=
I,
227
9.4 A Tauberian Theorem
Proof From Theorem 3.5 we have
f
f
co
1 ~ Y 2n
co
k;.(x  t)dt
= 1
n
co
sin 2 u  1, 2du u
co
so that, without loss of generality we .can suppose that I = o. If f(x) 0, then there exists 0 > 0 and a sequence (xn)(xn + 00) such that j(xn) <  0 (n = 1, 2, ... ) or j(xn) > 0. Assume without loss that j(xn) > 0 (n = 1,2, ... ). (The casej(xn) <  0 can be proved in the same way.) Since f(x) is slowly decreasing, there exists Xo = xo(o) and 11 = 11(0) such that
+
o "2
j(y)  f(x) ~
holds. Take a particular x in (xn). Then f(y)
From (2), when x
f
o >"2 ~
(2)
Xo and x in (x n), we have
co
~
k;.(x
+ 11 
t)f(t)dt
00
x+2q
f ~ f fo f 2fo ~f of
~ 2y~ 2n
x
k;.(X+I1 t )dt
~
y 2n
x
f
k ;.(X+I1 t )dt
co
co
k;.(x
+ 11 
t) dt
x+2q
f
x+q
=
_0_
xq
k;.(x  u)du 
xq
k;.(v)dv  ; .
o
;'q
sin2 d w w2M w2 n
=
n
o
o
+ (A. + 00),
2
fo
co
co
q
=
~
f
f
k;.(v)dv
q
co
sin 2 w dw w2
f co
k;.(x  u)du 
~
fo
x+q
k;.(xu)du
228
9. The Prime Number Theorem
so that there exists a suitably large A. o such that 00
1 r::L y2n
f kAO(x + 1'/ 
0 t)f(t)dt >4
00
Let x increase without bound in (xn ) so that 00
lim x+oo xe{Xn}
f
1 r::L y 2n
kAO
(x
+ 1'/ 
0 t)f(t) dt ~ , 4
 00
which contradicts our supposition. Therefore f(x) proved. 0
+
0 and the theorem is
Theorem 4.2 (Ikehara). Let h(t) be nondecreasing in 0 ~ t < 00, and suppose thatfor any finite T, h(t) has only afinite number ofdiscontinuities in 0 ~ t ~ T. Suppose also that the integral
f 00
j(s) =
(0" > 1)
esth(t)dt
(3)
o
converges, and that given any finite a > 0, there exists a constant A such that lim (j(s) 0, A > O. From Theorem 3.4 and the uniform convergence of
f 00
(a(t)  A(t))e(e+iy)t dt
00
in Iyl :::; 2A, it follows that
f f f
h.(x)
= 2~
(a(t)  A(t))eet dt
f
2A
21n
KA(y)ei(xt)y dy
u
00
=
f 2A
00
00
K;.(y)e ixy dy
2A
(a(t)  A(t))e(e+iy)t dt
00
2A
= _I ~
K;.(y)eiXY(fll
+ e + iy) 
~)dY. e+ry .
2A
From (4) we have
f 2A
. hmh.(x) .... 0
= 1 2n
g(y)K;.(y)e iXY dy.
(8)
2A
From Theorem 3.1 we have lim 1imh.(x)
=
O.
(9)
x 00 £0
On the other hand, from Theorem 3.6, we have 00
limh.(x) = lim .... 0
e ...
~(f kA(x 
ov2n
fo f fo f
t)a(t)e· t dt  A
o
00
=
k;.(x  t)e· t dt)
0
fo f 00
kA(x  t)a(t)dt 
o
f 00
kA(x  t)dt
0
00
=
00
k;.(x  t)(a(t)  A(t))dt
= I;.(x),
230
9. The Prime Number Theorem
and so from (8) we see that /;.(x) exists. This proves 1), and now 2) follows from (9). Finally we prove 3). From the definition of A(t) we see that it suffices to prove that a(t) is a bounded slowly decreasing function. From (7) we have
f
f
00
~
lim xoo
V 2n
00
k;.(x  t)a(t)dt = lim xoo
~
V 2n
00
00
f;'x
A
=lim
fo xoo
k;.(x  t)A(t)dt
A(Sin U)2  n u
du
00
f 00
=A 
n
(sinu)2   du=A, u
00
so that there exists Xo such that, when x
fo f
;?; xo,
00
k..{x  t)a(t)dt < A
+ 1;
00
that is
f ( t)2 ( "It) 00
sin t
a x
dt < n(A
+ 1)
00
Since the integrand is nonnegative, substituting x
+ 2/fi for
x, we have
J~
f ei~tya(x+ ~Ddt O. We have
III
231
9.5 The Prime Number Theorem
so that lim {a(x
+ J) 
~
a(x)}
O.
x'" 00
b ... O
This means that a(x) is slowly decreasing. The theorem is proved.
0
9.5 The Prime Number Theorem In this section we apply Ikehara's theorem to prove the prime number theorem. We do not give a direct proof of the prime number theorem; instead we prove the equivalent theorem (see §I): Theorem 5.1. ljJ(x) '" x.
Proof From the definition of ljJ(x) we see that ljJ(x) is a nonnegative increasing function with only finitely many discontinuities in the interval 0 ::::; t ::::; T. When u> I we have, from Theorem 1.2 and formula (6.14.5), that
f 00
f 00
estljJ(et) dt
=
u(l +s)ljJ(u) du
o
n+l =
n~1
n+l
f
u(1 +S)ljJ(u) du =
n~1 m~n A(m)
f
u(s+ 1) du
n
I
=
00
L (n
(n
S 
+ I)S) L
I
=  lim
N
L (n
+ 1),)
(n
S 
SNoo n =l
= ~ lim { s Noo
=~
1:
A(m)
m~n
sn=1
f.
A(n)n S

(
L A(m))(N + I)s}
m~N
n=1
A(n) s n=1 nS
L A(m)
m~n
= _ ~ . ('(s) s
(u> I).
((s)
From Theorem 2.3 we see that the function
 I('(s)    I  I (('(S)  + I) I s ((s)
s I
s ((s)
s I
s
has a continuous derivative in u ~ I, so that for any a > 0 the function is uniformly continuous in I ::::; u::::; 2, It I ::::; a, and therefore there is a continuously differentiable function get) satisfying
232
in
9. The Prime Number Theorem
lim (_
~ ('(s)
.,.+1
S
1_) =
__
(s)
S 
g(t)
1
It I :::; a uniformly. From Theorem 4.2 we see that lim etl/l(et) = 1. t+ 00
Let et
= x. Then lim I/I(x) = 1, xoo x
D
which proves the theorem.
Exercise 1. Letpn be the nth prime number. Prove that the prime number theorem is equivalent to 1l· mPn   = 1. n logn
n+oo
Exercise 2. Use the prime number theorem to deduce that M(x)
I
=
Jl(n)
= o(x).
Exercise 3. Use the prime number theorem to deduce that
Exercise 4. Let n
= p~! ... p%k and define w(n)
= k,
Let 1tk(X)
8k (x) =
I
I
=
1,
'tk(X)
I
=
1,
n~x
n~x
co(n) = Q(n) = k
Q(n)=k
log (p 1
••• Pk),
pl ••• Pk~X
o
(x)
=
I
1.
pl ••• Pk~X
k
(Note: Here the sum is over all primes Pi>'" ,Pk satisfying Pi ••• Pk :::; x; the same set of primes Pi>'" ,Pk with a different ordering is treated differently.) Prove: kx(loglogX)kl 0' (x) '" k
logx
(k
~
2),
(k ~ 2),
x(loglogX)kl
1tk(X) '" 'tk(X) '" ~:
(kl)!logx
(k ~ 2).
233
9.6 Selberg's Asymptotic Formula
9.6 Selberg's Asymptotic Formula Throughout §6  8 we use the letters q and r to represent prime numbers. ~
Theorem 6.1 (Selberg). Let x
+
.9(x)logx
1. Then
L
.9(~)IOgp = 2xlogx + O(x)
(I)
+ O(x).
(2)
p
p""x
and L log2p
+
L logplogq = 2xlogx
We first prove the following: Lemma. Let F(x) and G(x) be two functions defined for x G(x) =
L l~n:::;;x
~
I and satisfying
F(~) log x.
Then
n~/(n)G(~) = F(x)logx + n~x F(~)A(n). Proof We have, from §6.4, A(n) L n""x
=
Ldln Jl(d) log~ so that
Jl(n)G(~) = L Jl(n) n n""x = L I""x
=
L I""x
L
x
m:::;;
F(~) L I
F(~) log~n mn Jl(n)
(IOg~ + 10g~) n
I
nil
F(~)IOg~. LJl(n) + L I I nil
= F(x) log x + L
F(7)A(l)
I""x
F(~) A(l).
D I Proof of Theorem 6.1. Let y be Euler's constant. From §5.8 we have I""x
L ~ = log x + y + 0 (~) .
n~xn
X
Also
x
= L logn = flOgtdt + O(logx) = xlogx  x + O(logx). n:::;:;:x
1
234
9. The Prime Number Theorem
We apply the lemma with
= "'(x)  x + y + 1
F(x)
so that G(x)
= logx l";~";X ",(~)
xlogx

n~x~ + (y + l)xlogx + O(logx)
= 0(log2 x) = O(yIx).
From the lemma we have F(x)logx
+I
n~x
F(~)A(n) = o( I J~) = O(x). n n
(4)
n~x
From Theorem 5.9.1 we have A(n)
I 
n~x
logx
=
n
+ 0(1).
(5)
Therefore, from (3), (4), (5) and Theorem 1.2 we have "'(x) log x
+ n~x "'(~)A(n)
=xlogx+x = 2xlogx
A(n)   ( y + l)logx(y+ I) I A(n) +O(x) n~x n n~x
I
+ O(x).
(6)
From Theorem 1.2 we have
n~x "'(~)A(n) =
o(
Jx .9(~}Ogp
I
logp log q) =
pr.tqP~x a~2,p;;>l
= 0 (x
m~x A(m)A(n)  p~xlogplogq
o( I
=
I
logp
p(%~x a~2
logp ) P,,;J;p(p  1)
I
=
IOgq)
qP~xlprx. P~l
(7)
O(x)
and "'(x) = .9(x) = .9(x)
+ .9(xt) + ... + .9 (x[:::!J) + O(logx . .9(xt)) =
Formula (I) now follows from (6), (7) and (8).
.9(x)
+ O(xt log x).
(8)
235
9.7 Elementary Proof of the Prime Number Theorem
Also from 9(x)logx I log2p= I logplog::'= I logp(I p';;x p';;x P p';;x x
~+O(l))
n~p
1
I I
=
n:::=;x
n
= o(x
formula (2) follows at once.
logp
+ 0(9(x))
x p:S::;;
I ~) + O(x) =
n~xn
O(x),
0
9.7 Elementary Proof of the Prime Number Theorem Let R(x)
=
(l)
9(x)  x.
We know from Theorem 1.1 that the prime number theorem is equivalent to lim R(x) = O. x+
(2)
X
00
Before we prove (2) we first establish the following lemmas. Lemma 1.
If x;?;
3, then logp logq
I
pq
pq:S:;x
I
.
1 = log2 X 2
logplogq
pq';;x pq logpq
+ O(logx),
= logx + O(loglogx),
logp
I = O(log log x). " 2x p~x plogp
Proof Let A(n) = Ip,;;.logp/p. From Theorem 5.9.1 we have A(n) = logn where r. = 0(1). Therefore "
L...
pq';;x
logp log q "logp" log q "logp x    = L...   L...   = L. logpq
p';;x P
x
q
p';;x p
p
+ O(logx)
q~P
x = I (A(n)  A(n  1)) log + O(logx) n~x
n
+ r.
236
9. The Prime Number Theorem
I
A(n)
n~xl
=I
n~x
{lOg~ n
logn .lOg(1
109_x_} + O(logx) n+1
+~) + n
o( I
n~x
109(1
+ ~)) + O(logx) n
1
= 2log2 X + O(logx). Using the same method we have, by partial summations, logplogq
I
pq ~ x
pq logpq
= logx + O(loglogx).
Also from
I
1 2x nlogn
n "'x ~
1 1 1( 1 1 ) =I+Ilogx n"'x n n"'x n 2x logx ~
f
logn
~
x
I 1
=
n~x n
du 2 ulog u
+ 0(1)
2x n
f
=
1
2x u
f
~
I
x
~n~x 2 du
ulog u
x
+ 0(1) = du + 0(1) = ulogu
2
O(loglogx),
2
we have
I
logp
I
=
'" p log2x p
'"
p~x
I
n~x
=
(A(n)  A(n 
1 1»2x
logn
n~x
{logn log(n  I)} _1_ 1og2x n
o( I
1
'" nlog2x n
+
o( I 'n n~x
2x
logn
12x ) logn+1
) = O(loglogx).
n~x
The lemma is proved.
0
Lemma 2. 8(x)
+ I
pq~x
logplogq = 2x logpq
+ o(~) logx
(X
~
2).
237
9.7 Elementary Proof of the Prime Number Theorem
Proof Let
I
B(n) =
logplogq,
=
C(n)
pq~n
I
log2p.
p~n
Then we have 8(x)
+ =
'" logp log q L.:. pq~x logpq
I
C(n)  C(n  I) n~x logn
= C([x]) log[x]
= 2x
+ L
B(n)  B(n  1) n~x logn
+ B([x]) + I log[x]
{C(n)
{_l_ _ logn
1 } log(n + I)
IOg( 1 +~)
+ o (_x_) + I
(2nlogn
n~xl
logx
+ B(n)}
n~xl
+ O(n))
lognlog(n
n
+ 1)
=2X+O(_X), log x
and the lemma is proved.
0
Lemma 3.
logp logq R (x) pq~x logpq pq
R(x)logx = I
+ O(x log log x)
(x
~
3).
Proof From Lemma 1 and Lemma 2 we have
x) logp I 8 (  logp=2x I   Ilogp I p~x p p~x p p~x
log q log r x logqr
qr~p
+
o(x I
~
IOgP)
p~x
=
2xlogx 
2x plogp
logqlogr 8 (x) qr~x log qr qr
I
+ O(xloglogx).
Substituting this into Selberg's asymptotic formula (that is, formula (6.1)), we have 8(x) log x = I
pq~x
logplogq 8 (x) logpq pq
+ O(x log log x).
The result follows from substituting (I) into this and applying Lemma 1. Lemma 4.
IR(x)l::::; _1 I IR(~)I logx n~x n
+o
(x loglogxlog x)
(x
~
3).
0
238
9. The Prime Number Theorem
Proof Substituting (1) into formula (6.1) we have
I R(~)IOgp + O(x),
R(x)logx = 
p"'x
p
so that from Lemma 3 we see that
2IR(x)llogx:::;; I IR(~)IIOgp + I logplogq p"'x p pq"'x logpq
IR(~)I + O(x log log x). pq
From Lemma 2 and partial summations, and noting that Iial  Ihl! :::;; la  hi, we see that
2IR(x)llogx:::;;
)1)
I (Ilogp+ I 10gpIOgq)(IR(~)I_IR(_x n"'xl p"'n pq"'n logpq n n+I
I
+0 (
:::;; 2
p"'x
+ I 10gpIOgq) + O(xloglogx) pq"'x logpq
n"'~l n(IR(~)IIRC: 1)1)
I (~) II
+0( ~2
logp
I
I _n R n",xllog2n n
(n I
R x) +0
..:: n"'x I
+ o(x
R (_x ) n+1
II)
+ O(x log log x)
I n ((x) 8  8 (( n",x_llog2n x))) n n+ 1
1_))
I _n_(~ _ _
n",x_llog2n n
n+ I
+O(xloglogx).
From Theorem 1.2 we have
I 
n ((x) (x)) 8  8 n n+1
n",x_llog2n
=
I
2"'n"xl
= o(x
I
(_n _ n_1_) =
8 (~) n log2n
n"'x nlogn
1 ) log2(n  I)
+ O(x)
O(xloglogx),
so that
n~JR(~)1 + O(xloglogx),
2IR(x)llogx:::;; 2 and the required result follows. Lemma 5.
If x >
0
I, then
I
n~x
8(n) 2
n
= logx + 0(1),
239
9.7 Elementary Proof of the Prime Number Theorem and
L 8(~) =
xlogx
n
n~x
+ O(x).
Proof Since
L ~= L ~ L ~=~+O(~)+O(~), X
p";"";xn
";;'pn
">xn
P
P
we have
L
I
8(n)
L 2" L logp = p~x L logp p:::=;n~x L n2 n p~n
2 =
n:::=;x
n
n:::=;x
L IOgp(~p + O(~) + O(~)) = P x
=
logx
+ 0(1)
p";x
and
L logp . (~+ 0(1») =
=
P
p";x
Lemma 6. logn L R(n) = "";x

n
xlogx
+ O(X). 0
I (x) + O(X). L R(n)R "";x n
n
Proof From Selberg's formula (that is (6.2» and partial summations we have
x x IOg2 plog + L logplogqlog = 2xlogx + O(X). p";x P pq";x pq Substituting
L
log~ = L ~ + O(~), P
p";"";x
n
P
x log= pq
into the above formula and interchanging the summations we have
L I L log2p + L I L logp L logq = 2xlogx + O(x);
n~x
n
p~n
n~x
n
p~n
x q~;
that is logn L 8(n) + L
(x)
I 8(n)8  = 2xlogx + O(x). n n:S;xn n The required result follows from substituting (I) into this formula and then apply Lemma 5. 0 n~x
240
9. The Prime Number Theorem
Lemma 7. Let 0 < u < 1 and suppose that there· exists Xo such that, for x > Xo, IR(x)1 < ux.
(3)
Then there exists Xu such that, when x > xu, the interval subinterval (y, eby) with the property that
IR;Z) I< u ~ u when y ~
Z
~ eby. Here
2
«(1 
U)16 X,
x) contains a
,
c5 = u(1  u)/32.
Proof From Lemma 6 we have logn I In~xnR(n) ~
~R(n)R(~) n
I
x n
+II
~R(n)R(~)1 n
n<xo n
xo~n~~
+
x
~R(n)R(;)
I
+ O(x)

Xl,
Ix,,t,;;;x lO! n R(n) I< u (x + x') logx + O(x), 2
where x' = (l  U)16 X • Suppose that R(n) does not change sign in (x' (x' ~ y ~ x) so that
~
n
~
x). Then there must exist y
R(Y) I I logn < u (x + x')logx + O(x). Iy x'';;;n';;;x 2
From (l 
U)16
1 u < I
+ 15u '
we see that
I< u IR(y) y
u(l
+ 7u) + 0 8
(_1_) log x
(4)
Xl)'
But if R(n) changes sign in (x', x), then clearly there exists y (x' IR(Y)I = O(1ogy) so that (4) still holds.
~
y
~
x) such that
241
9.7 Elementary Proof of the Prime Number Theorem
When I < Y < Y' we have, by Lemma 2,
L
y «(1 + 70')/(1 + 15O'))x, then ebYI>
10'
+ 150'
I
x>
Xl
so that we can take Y = ebYI' The lemma is proved.
0
Proof of the prime number theorem. We already know that there exist e > 0 and x~ such that, for X > Xo, 8(x) > ex
(this is Theorem 1.2). From Selberg's formula, we have 8(x)
= 2x  _1_
L 8(~)IOgp + o(~) p log x
= 2x 
L 8(~)IOgp
logx p~x
_1_ logx
x
p~;
xo
p
(6)
242
9. The Prime Number Theorem
::;;; 2x  ex log x logx
I+ 0 (
logx
x
L
logp)
+ 0 (x) logx
xo' xo, e > 0).
From (I) we have
IR(x)1 < O'o(x)
(x> XO, 0'0
II ~ I,
=
0 < 0'0
xu, the interval «1  (T)16 X, x) contains a subinterval (y, eOy) (J = (T(l  (T)/32) such that when y :;;; z :;;; eOy we have RI(Z) I < _1_ . (T + (T2. Z (()(k) 2
I
7) First use Theorem 8.2 to show that (To and Xo exist such that 0 < when x> Xo
(To
< 1 and
(To
IRI(X) I <  x . (()(k)
Then use this together with 4) and 6) to prove that lim Rl(x) =
o.
X
x+ 00
Notes 9.1. The present best result on the error term of the prime number theorem, namely n(x)
~
= Ii x + O(xec(logX)'),
c a positive constant,
is due to I. M. Vinogradov and H. M. Korobov and is based on estimates on trigonometric sums.
249
Notes
9.2. In recent years a number of mathematicians have obtained error term estimates in Selberg's elementary proof of the prime number theorem. For example: n(x)
= Ii x + 0 (;) log x
where A is any constant however large and the Oconstant d((pending on A (see E. Bombieri [8] and E. Wirsing [64]). An even better estimate is given by H. Diamond and G. J. Steinig [21].
Chapter 10. Continued Fractions and Approximation Methods
10.1 Simple Continued Fractions By a finite continued fraction we mean an expression ao+a1
+
We shall see that, as N + 00, the expression here tends to a definite number; we call the infinite continued fraction a continued fraction. It is convenient to denote the above expression by 1 ao+al

1

1
+ a2 + ... + aN
or
It is easy to see that ao
[ao] =
T'
In general, we let [ao, al> ... ,an] = Pn/qn, 0 ::;;; n ::;;; N, where Pn, qn are polynomials in ao, aI, ... ,an. These polynomials are linear in anyone a, and the denominator qn is independent of ao. We call Pn/qn the nth convergent of [ao, al> ... ,aN]. Theorem 1.1. The convergents satisfy the following: PI =alaO+ 1,
Pn = anPn1
ql = al>
qn
= anqnl
+ Pn2 + qn2
(2::;;; n ::;;; N), (2::;;; n::;;; N).
0
Theorem 1.2. The convergents satisfy the following: (n
~
1),
(1)
251
10.1 Simple Continued Fractions
or Pn
Pn1
(  1)n1
and (n
~
D
2).
(2)
Definition. Let ao be an integer, and a1, a2, ... be positive integers. Then 1 ao+a1
1 + a2
+ ...
is called a simple continued fraction. We shall only deal with simple continued fractions in this chapter. From Theorems 1.1 and 1.2 we deduce at once: Theorem 1.3. (i) lfn > 1, then qn (ii)
~
qn1
P2n+1 P2n1 . q2n q2n2
(iii) Every convergent of a simple continued fraction is a reduced fraction.
0
Let oc be a real number. We take ao = [oc] and we let oc~ = l/(oc  [oc]). We then take a1 = [OC1] and we let oc~ = l/(oc~  [OC'1])' We continue in this way by taking an = [oc~] and defining oc~ + 1 = 1/( oc~  [oc~]). It is clear that if this process terminates, then oc must be a rational number. Conversely, if oc is a rational number p/q where (p, q) = 1, then ao = [p/q] and
1
O~
1X2n2' Next from Theorem 1.2 (1), 1X1 ~ 1X2n+1 ~ 1X2n ~ 1X2, so that limIX2n and limlX2n+1 exist. Finally, from Theorem 1.2 and Theorem 1.3 (i), we have 11X2n  1X2n 11 = l/q2nq2nl ::;;; 1/2n(2n  1) so that limIX2n = limIX2n1' 0 Exercise. Prove that
Pn =
1 al 1
ao 1 0
0  1 a2
0 0  1
0 0 0
0 0 0
0 0 0
.......................................
0 0
0 0
0 0
0 0
1 anl 0
 1 an
and that qn is the determinant above with the first row and first column omitted. Exercise 2. The sequence (un) = (1, 1,2,3,5,8, 13, ... ), where Ul = U2 = 1, Ui + 1 = Ui  1 + Ui (i > 1), is called the Fibonacci sequence. Prove that (i) Un +2/Un + 1 is the nth convergent of (1 + )/2; (ii) in the continued fraction [ao, at. .. .], if ai = 2 (i > 0) and an = 1 (n # i), then for m > i we have
J5
Pm Ui+1 Umi+3 + Ui Umi+l qm Ui Umi+3 + Ui1 Umi+l Exercise 3. A synodic month is the period of time between two new moons, and is 29.5306 days. When projected onto the star sphere, the path of the moon intersects the ecliptic (the path of the sun) at the ascending and the descending nodes. A draconic month is the period of time for the moon to return to the same node, and is 27.2123 days. Show that solar and lunar eclipses occur in cycles with a period of 18 years 10 days.
10.2 The Uniqueness of a Continued Fraction Expansion Definition. We call [ao, at. ... , an, ... ].
IX~
= [an, an+t. ... ] the (n + l)th complete quotient of
253
10.2 The Uniqueness of a Continued Fraction Expansion
Theorem 2.1. We have
IX =
IX~,
IX~ao + 1 IX==IX'l
IX~Pn1 +Pn2
IX
= ",     ,
IXnqn1
+ qn2
If IX is rational, then this holds up to n = N. Proof Use mathematical induction.
0
[IX~], except when IX is rational and aN = 1 in which  1. Therefore there are only two representations to a
Theorem 2.2. We always have an =
case we have aN1 = rational number.
[IX~_l]
Proof WehaveIX~ = an + I/IX~+l' If IX is irrational or iflXisrationai and n ¥ N  1, then IX~ + 1 > 1 so that an < IX~ < an + 1, as required. If IX is rational and n = N  1, IXn + 1 = 1, then an = [IX~]  1. 0 Theorem 2.3. The representation of an irrational number by a simple continued fraction is unique.
Proof Suppose that IX= [aO,aI>a2,"'] = [b o,b 1,b 2, ... ]. Certainly we have ao = [IX] = bo, and similarly a1 = b 1. Suppose now that ak = bk for k < n, and we have to prove that an = bn. From IX = [ao, . .. , anI> IX~] = [ao, . .. , an b P~], we have IX~Pn1 +Pn2 IX = ,    IXnqnl +qn2
P~Pn1 +Pn2 R'
I'nqn1 +qn2
,
so that (IX~  P~)(Pn1qn2  Pn2qn1) = O. From Theorem 1.2 we deduce that IX~ = P~ and therefore an = [IX~] = [P~] = bn· 0 Theorem 2.4. We have
( I)nb n qnIX  Pn =   
0< bn < 1,
and bn/qn + 1 is a decreasing function of n. (If IX is rational, then this holds only for 1 :::; n :::; N  2, and bN1 = 1.) Proof We have
so that Pn IX~+1Pn + Pn1 IX   = ~=qn IX~+lqn+qn1
Pn qn
 (Pnqn1  qnPn1) qn(IX~+lqn + qn+d
( I)n qn(IX~+lqn
+ qn1)'
254
10. Continued Fractions and Approximation Methods
and hence (j= n
qn+1 rx n+1qn + qn1
an+1qn+qn1 rx n+1qn + qn1
I
I
From this we see that 0 < (jn < 1 except when rxn + 1 rx~ = 1 + l/rx~+ 1 we have
= rx~ + l '
Also, from
1
;::: rx~+lqn
+ qn1
(an+1
+ l)qn + qn1
In the last inequality, equality sign holds only when rxn+ 1 rational and n = N  1. 0
= rx~+ p
that is when rx is
From this theorem we deduce: Theorem 2.5.
If rx is irrational, then limpn/qn =
0
rx.
Theorem 2.6. We have
Irx 
Pn qn
I: : ; _1_ < ~, qnqn1
qn
with the equality sign only when rx is rational and n = N  1.
0
10.3 The Best Approximation Let rx be a real number.: Among the rational numbers with denominators not exceeding N, there is one which is closest to rx, and we call it the best rational approximation to rx. We now prove that the convergents Pn/qn are the best rational approximations to rx. Theorem 3.1. Suppose that n;::: 1, 0 < q ::::; qn and p/q :f Pn/qno Then IPn/qn  rxl < Ip/q  rxl·
Proof It suffices to prove that IPn  qnrxl < Ip  qrxl. (i) If rx = [rx] + t, then Pr/q1 = rx and the result follows at once. (ii) If rx < [rx] + t, then the result holds when n = 0, and if rx > [rx] + t, then the result holds when n = 1. We now assume as induction hypothesis that the result holds for n  1, and proceed to prove by induction. If q::::; qnl> then from the induction hypothesis IPn1  qn1rxl < Ip  qrxl, so that we may assume that qn;::: q > qn1' If q = qn, then
255
lOA Hurwitz's Theorem
Also
If qn+l = 2, then n = 1, and al = a2 = 1, giving 1
IX
l' 1
= ao + 


1 + 1 + a3
+ ...
t
which shows that ao + < IX < ao + 1, and our required result clearly holds. We may therefore assume that qn+ 1 > 2, that is
and so
II!..q 
IX
I ~ II!..  Pn 1IPn q
qn
qn
IX
I ~ ~ Ipn qn
qn
IX
I > Ipn qn
IX
I·
We may now assume that qn > q > qnl' Let us write upn + vPnl = p, uqn + vqnl = q, so that u(Pnqnl  Pnlqn) = pqnl  qPnl' From Theorem 1.2 we have u = ± (pqnl  qPnl), and similarly v = ± (pqn  qPn). The numbers u, v cannot be zero, and in fact from qn > q = uqn + vqnl we see that they are of opposite signs. Now from Theorem 2A,Pn  qnlX andpnl  qnllX have opposite signs, and therefore u(Pn  qnlX) and V(Pnl  qnllX) have the same sign. Finally from pqlX=U(PnqnlX)+V(PnlqnllX) we see that IpqlXl>IPnlqnllXl > IPn  qnlXl· D
Example. From n
= [3,7,15,1,292,I,I, ...J we
obtain the convergents
3 22 333 355 103993 104348 106' ill' 33102 ' 33215 , ....
l' 7'
In the year 500 A. D. Chao JungTze obtained both the crude estimate 22/7 and the good estimate 355/113 (this is more than a thousand years earlier than the earliest European record due to Otto). More interesting still the two estimates of Chao belong to the family of best approximations to n; in other words there is no fraction with denominator less than 113 which is closer to n than 355/113 is. From Theorem 2.6 we have 3551
In ill
1
1
< 113 x 33102 d > O.
e
e
e
Proof From = [ao, al>"" akl> '1], we have = (PkI'1 + Pk2)/(qkl'1 + qk2) and we see that the condition c > d > 0 is necessary. The sufficiency of the condition can be proved by induction on d. D
e
Theorem 5.3. A necessary and sufficient condition for two irrational numbers and '1 to be equivalent is that = [ao, al>' .. , am, co, Cl>' .. ] and '1 = [b o, bl> ... , bn , co, Cl,' .. J. In other words their continued fractions expansions are eventually identical.
e
Proof I) Let W = [co, Cl>' .. J. Then
+ Pml , e= [ao,al> ... ,am,w] =WPm wqm + qml Thus wand eare equivalent. Similarly wand '1 are equivalent, and hence eand '1 are equivalent. 2) Let eand '1 be equivalent, and '1 = (ae + b)/(ce + d), ad  bc = ± l. We may assume that ce + d> O. We expand einto continued fractions: e= [ao, ... , ak, ak+ l>' ..] = [ao,· .. ,akl> IX~] =
(IX~Pkl
+ Pk2)(a.~qkl + qk_2)l.
259
10.5 The Equivalence of Real Numbers
It follows that '1 = (Prx~
+ R)(Qrx~ + S)l,
aPk2 + bqk2, Q = CPk1 + dqkt. S= CPk2 satisfying PS  QR = ± 1. From Theorem 2.4 we have Pk2
where P = aPk1 + bqkt. R = + dqk2; P, Q, R, S are integers
(j'
+ ,
= ~qk2
l(jl < 1, WI < 1,
qk2
so that (c~
Q=
c(j
+ d)qk1 +  ,
S
qk1
=(c~
c(j'
+ d)qk2 + . qk2
From c~ + d> 0, qk2 ~ k  2 and by Theorem 1.3, qk1 ~ qk2 + 1 we see that Q > S > when k is sufficiently large. It follows from Theorem 5.2 that '1 = [b o,· .. , bn , rx~] and the necessity of the condition is established. 0
°
Denote by M(rx) the greatest number such that, for any e > 0, the inequality
Irx 
~ I ~ (M(rx) 1_ e)q;
has infinitely many solutions. For example Pi rx  qi
M«fi 
1)/2) =
fi. Let
1 Aiqi
= .,.z ;
then 1 Ai = (  l)i(' rxi + 1
qi1) , +~
and qi
q;/qi1
ai+qi1
ai+ ai1+qi2
Therefore
ico
ioo
If rx and {J are equivalent, then ai = bi for all large i. We have therefore proved the following Theorem 5.4.
If rx and {J are equivalent,
then M(rx)
=
fi
M({J).
0
(fi 
We see therefore that if A > and if rx is equivalent to 1)/2, then the inequality Irx  p/ql < 1/Aq2 has only finitely many solutions. We may now ask for the value of M(rx) when rx is not equivalent to 1)/2. We have the following
(fi 
260
10. Continued Fractions and Approximation Methods
(J5 
result: If oe is not equivalent to 1)/2, then M(oe) ~)8. Specifically, for such oe, the inequality loe  p/ql < 1/)8 q2 has infinitely many solutions. Also, if oe is equivalent to 1 + ,J2, then M(oe) = )8. For the general situation, we need the following: Definition 5.3. By a Markoff number we mean a positive integer u such that there are integers v, w satisfying u 2 + v2 + w2 = 3uvw. The first eleven Markoff numbers are 1,2,5,13,29,34,89,169,194,233,433. (We shall prove in the next chapter that the number of Markoff numbers is infinite.) It can be proved that if oe u = ~ (J9U 2 2u

4
+ u + 2V) w
where u, v, ware related by the definition of the Markoff number u, then M(oe u) = J9u 2  4/u. Furthermore if oe is not equivalent to oe u for I ::;:; u ::;:; v, then the inequality .
loe ~I
O. Now (J( = [ao, ... , an b f3J. If 13 ~ 1, then 13 = (J(~ ( = [an> an+ b" .]), and this means that p/q is a convergent of (J(. If 0 < 13 < 1, then [anI + 1/13] = anI + c, c > 0 so that (J( = [ao, ... , an2, an b + c, . .. ] and we see that [ao, . .. , anI] is not a convergent. Therefore the required necessary and sufficient condition is that 13 ~ 1; in other words we have: Theorem 7.1 (Legendt:e). Let e9 = q2(J(  pq, e = ± 1, 0 < 9 < 1, and let p/q = [ao, .. . ,anI], (  1t 1 = e. Then, a necessary and sufficient conditionfor p/q to be a convergent of (J( is that
D Since the right hand side of the above inequality exceeds t we deduce at once the following
262
10. Continued Fractions and Approximation Methods
Theorem 7.2. If a rational number p/q satisfies loc  p/ql < 1/2q2, then it is a convergent of oc. 0 Theorem 7.3. Let p, q be positive integers satisfying Ip2  oc 2q21 < oc. Then p/q is a convergent of oc.
Proof Let oc 2q2  p2 = eooc, e = that
± 1, 0::;:; 0 < 1. Then ocq  p = eooc/(ocq + p), so
oocq ocq+p
[) = eq(ocq  p) =   =
oocqnl , ocqnI+Pn1
(_I)nl=e.
From Theorem 7.1 we see that it suffices to prove that
or that OOC(qn1 + qn2) < ocqnl + Pnl' Now this inequality clearly holds when n = 2 so that it suffices to establish ocqnl  Pnl < OC(qn1  qn2) for n > 2. But ocqnl  Pnl = eooc/(ocqnl + Pn d, and by Theorem 1.3 we have
qnl  qn2 The theorem is proved.
~
1
I > 
ocqnl+Pn1
0
10.8 Quadratic Indeterminate Equations In this and the next sections d denotes a positve integer which is not a perfect square. We consider the equation
0< III
yd = (X2,Y2) = I we deduce that Xl = X2, Yl = Y2 contrary to our assumption. We have therefore proved
°
Theorem 9.1. The Pel/'s equation X2  dy2
= I has a nontrivial solution. 0
From Theorem 7.3 we see that x/y = Pn t/qnl must be a convergent of and from Theorem 8.2 we know that there exists n such that ( I )nQn = I. Theorem 9.2. Let n be the least positive integer satisfying ( I )nQn solutions to the equation X2  dy2 = I are given by
jd,
= I. Then al/ the
265
10.9 Pell's Equation
+ jdqnl > 1. Because ± 1/(x + jdy) = ± (x  jdy), it suffices to show that all positive solutions to x 2  dy2 = 1 are given by x + yjd = em (m > 0). Let (x,y) be such a solution, so that x + yjd > 1. We may choose m so that em:::; x + yjd < em + l or 1 :::; em(x + yjd) < e. Let
Proof Let e = Pnl
and we shall prove that X (xo
+ Y jd =
1. Since jd is irrational, it follows that
+ Yojd)(x 
yjd)
=
X  Y jd.
On multiplying the equations together we have X 2
1< X

dy 2 = 1. Suppose now that
+ jd Y < e. Then
°< e
1
< (X + jd Y) 1 = X  jd Y < 1.
We deduce easily that 2X = (X + jd Y)
+ (X 
jd Y) > 1 + e  1 > 0,
2jd Y = (X + jd Y)  (X  jd Y) > 1  1 = 0. It follows from these that
X>o,. and
Jl
+ dy2 increases withy, so that x + jdyincreases as y increases. We Now x = deduce from the above that Y < qnl and X < Pn b so that X/Y is a convergent with denominator less than qnl' This is impossible; therefore X + jd Y = 1. D We see from the above that the equation x 2  dy2 = I is always soluble, but the equation x 2  dy2 =  I may have no solution. For example, since x 2 == 0, I (mod 4) so that x 2  3y2 == x 2 + y2 == 0, 1,2 (mod 4), we see that the equation x 2  3y2 =  1 is insoluble. In fact this example shows that x 2  dy2 =  I is insoluble whenever d == 3 (mod 4). However if xo, Yo satisfy x~  dy~ =  1, then, by defining Xl, Yl with Xl + jdYl = (xo + jdYO)2 we see that xi  dyi = 1. It is not difficult to prove that if x 2  dy2 =  1 is soluble, then all the solutions to x 2  dy2 = ± 1 are given by ± (Pn _ 1 + jd qn  d where n is the least positive integer satisfying ( 1)nQn =  1.
266
Continu~d
10.
Fractions and Approximation Methods
10.10 Chebyshev's Theorem and Khintchin's Theorem Let 8 be an irrational number. According to Theorem 2.4 there are infinitely many integers x, y satisfying Ix8 
I
yl 0, then there exists an integer x such that x8 differs from [x8] by less than e. In other words the number 0 is a limit point of the point set
x = 1,2,3, ....
x8  [x8],
(2)
An immediate problem arising from this is the determination of the set of limit points of the point set (2). For this Chebyshev has proved that each point in the interval (0, I) is a limit point of the point set (2). In fact he proved the following stronger result. Theorem 10.1. Let 8 be any irrational number and /3 be any real number. Then there are infinitely many integers x, y satisfying 18x  y 
3 x
/31 0 such that p
(i
l(il
O. Then the inequality
1+e Ix8  y  131 <  
(8)
J5x
has infinitely many solutions in integers x > 0, and y. Proof By Theorem 4.3 there are infinitely many coprime pair of integers p, q such that 8 = p/q + b/q2, where 0 < Ibl < 1/J5. We may assume that b > 0 since otherwise we can replace 8, 13 by  8,  13. Let ~1' ~2 be real numbers satisfying ~2  ~1 ~ 1, and we shall specify them later. We can choose x,y such that px  qy = [qf3],
(9)
Then we have
Ix8  y  131
=
I~x + bx _ q
q2
y _ [qf3] _ ~ q q
I (10)
where
r = qf3 
[qf3]. We want to show that
_~ ~ ~ (x; _r) < ~, or r2
1
J5
x 2b ~ q2
xr
r2
r2
1
J5.
~+ I (the left hand side is merely ~2  ~1) we obtain .. 2 < ~ +
(I 
+ 2c5 (I 
fi)
+ c5 2 = I 
fi
+ c5
< I/fi  c5. Let '1 > 0. We may specify x and y such that px  qy = [qP]
+ I,
'1q
~
x < (l
+ '1)q,
and similarly to (10) we have Ix.9 _ y _
PI = Ixc5 + I q2
q
't I = ~q (xc5q + (l  't»)
I{ I} I I (l + '1)2 <  (I + '1)c5 +   c5 ~ (I + '1) < =cq fi q fi xfi Since '1 is arbitrary, the theorem is proved.
D
Exercise. Let .9 be an irrational number such that, given any B > 0, there always exist integers x,y satisfying Ix.9  yl < B/X. Prove that if c5 > and Pis real, then there exist integers X,y such that Ix.9  y  PI < (I + c5)/3x.
°
10.11 Uniform Distributions and the Uniform Distribution of n9 (mod 1) Chebyshev's theorem in the last section states that the point set {x.9} = x.9  [x.9] , x = 1,2,3, ... is dense in the interval (0, I), in the sense that each point in (0, I) is a limit point of the set. We may ask about the distribution of this point set in the interval (0, I). In other words, if (a, b) is a subinterval of (0, I), then as x takes the values 1,2, ..., n does the interval (a, b) receive the "correct proportion" of points? Let us define precisely what we mean by the "correct proportion" . Definition. Let Pi (i = 1,2,3, ... ) be a point set in the interval (0, I). Let ~ a b ~ I, and for each positive integer n denote by Nn(a, b) the number of P b P 2 , ••• ,Pn that lie in the interval (a, b). Iflimn>ooNn(a,b)/n=ba always holds, then we say that the point set Pi (i = 1,2,3, ... ) is uniformly distributed in (0, I).
°points
0 such that p
J
IJI
12/8 and then choose n > 2(q + 6)/8. It follows that we have neb a) n8~Nn(a, b) ~n(ba) +n8. This proves that lim n _ oo Nn(a, b)/n = b a. 0
10.12 Criteria for Uniform Distributions Theorem 12.1 (Weyl). A necessary and sufficient condition for the sequence (Xn) ,
o ~ Xn ~ I to be uniformly distributed in (0, I) is that the equation 1
lim f(xd n+
00
+ ... + f(xn) =
ff(X) dx
n o
holds for every Riemann integrable function f(x) in (0,1).
(1)
271
10.12 Criteria for Uniform Distributions
°
Proof We first establish the necessity of the condition (1). I) Letf(x) be defined to be cor according to whether a clearly
+ ... + f(xn) =c I·1m
· f(xr) I1m
n
noo
Nn(a, b)
=c
~
(b
x
~
b or not. Then
) a,
n
noo
and I
f f(x) dx
=
c(b  a).
o
Therefore the equation (I) holds for this function f(x). 2) The equation (I) is linear in the sense that if it holds for f1> . .. Jk, then it holds for cdl + ... + Cdk. From 1) we see that (I) holds for all step functions. 3) It is a simple exercise to show that iffis Riemann integrable, and B > 0, then there are two step functions (f).(x), €P.(x) such that (f).(x) ~ f(x) ~ €P.(x) and I
f (€P.(t)  (f).(t)) dt
oo
~n I
±
e21timxv
v=l
1= °
holds for all m :f 0. Proof There is no need to prove the necessity part. For the sufficiency part we define g(x) = {
I,
if 0::;:; x < a,
0,
if a::;:;x(x)dx
~1>+1
Co =
= a + b  21'/;
~I>
and when n ¥ 0,
~I>
It follows that
ICnl
~ 1/b(nn)2
 g~,I>(xd S~,I> () X 
and
+ ... + g~,I>(Xk)
_ 1 ~

k

~
L.
L.
C
21tinx'
ne
kj~ln~oo
J
look
=
I
kn~oo
Cn
I
j~l
e21tinxj.
Thus we have
We observe that
I Cn~ i IInl>N k j~l
e27tinXjl
~; I ~. bn
n>N
n
Let e > 0 and choose N so that the right hand side of this inequality is less than e. With N fixed we see from I k lim  I e27tinXj = 0 k ... oo k j~l
that for all large k,
In~I
Cn 1 Ik N k j~ 1
N
. e 27t ,"Xj
I< e.
n*O
Thus, given any pair of fixed 1'/, b we have
or lim k'"
Now let
00
S~,ix)
= a + b  21'/.
274
10. Continued Fractions and Approximation Methods
From Sb.o(X)
~
S(x)
~
SO.b(X) we deduce that
k+
k+
00
00
for any lJ. Therefore limk_ 00 S = a as required.
D
For a clearer description of uniform distribution it is best to use the unit circle to represent the interval. Let en = e21tixn, n = 1,2, ... so that the sequence (x n) in ~ Xn ~ 1 is now transformed into a sequence on the unit circle. An advantage of using this description is the removal of the special properties of the end points 0, 1 in the interval (0,1). Take any arc of the unit circle with length 2noc (oc < 1). Then any uniformly distributed sequence will have the proportion oc of its points on this arc. Moreover, since e21tixn = e21ti (xn+d), it does not even matter if the sequence (xn) lie outside the interval (0,1). In other words we may define uniform distribution of f(x), mod 1 by the uniform distribution of the fractional parts of f(x) in (0, 1). A necessary and sufficient condition for the uniform distribution off(x), mod 1 is then 1 n lim  I e21timf(x) = 0, m¥O.
°
n+oonx=l
An interpretation of this condition is that the centre of gravity of the sequence of points e21timf(x), x = 1,2, ... , (m ¥ 0) is the centre of the circle. It is clear that iff(x) is uniformly distributed mod 1, then so is mf(x) for any nonzero integer m. The most interesting unsolved problem concerning this is whether eX is uniformly distributed mod 1.
Theorem 12.3. A necessary and sufficient condition for the uniform distribution of f(x) , mod 1 is that 1 n 1 lim  I {f(x)+a}=, noon x =1 2 Proof Necessity. Let fix) be uniformly distributed, mod 1. Then f(x) + a is also uniformly distributed, mod L Therefore we need only establish the case a = 0. Let Xm = {f(m)}. Then, by Theorem 12.1, we have
f 1
lim 1
I
n
noon x =1
Sufficiency. Let
°
~
{f(x)}
=
xdx
1 2
=.
o
b
~
1. Then
1 n i l {f(x) + 1  b} =  II ({f(x)} + 1  b) +  I2 ({f(x)}  b), n x= 1 n n
 I
where in II' X runs through those integers 1,2, ... ,n such that {f(x)} < b, and in I2, x runs through those integers 1,2, ... ,n such that {f(x)} ~ b. We see therefore that 1
n
 I
n x=1
{f(x)
+ 1
n
b} = n 1
I
x=1
{f(x)}
+ n 1N n(0,b)
 b.
275
10.12 Criteria for Uniform Distributions
Letting n +
00
and observing the hypothesis we see that . 1 11m Nn(O,b) n+
as required.
0
00
n
=
b
Chapter 11. Indeterminate Equations
11.1 Introduction By indeterminate equations we mean equations in which the number of unknowns occurring exceed the number of equations given, and that these unknowns are subject to further constraints such as being integers, or positive integers, or rationals etc. Apart from equations of the first and second degrees, the discussion on indeterminate equations is very scattered. The complicated nature of the subject is illustrated by the fact that Volume II of Dickson's History 0/ Number Theory devotes over eight hundred pages on such equations. The study of these equations has a long history. In the third century Diophantus attempted a systematic study and in fact nowadays indeterminate equations are often called Diophantine equations. In our country indeterminate equations have an even longer history; for example Soon Go gave the general solution of X2 + y2 = Z2 in integers x, y, z much earlier than the west.
11.2 Linear Indeterminate Equations From Theorem 2.6.2 we see that a necessary and sufficient condition for the equation alxl + a2x2 + ... + anxn = N to have a solution is that (at. ... ,an)IN. Suppose now that a 1 > 0, ... ,an> 0, (a 1 , ••• ,an) = 1. We ask for the asymptotic formula for the number of solutions to the equation Xv
;?;
0 (v = 1, 2, ... , n).
(1)
Theorem 2.1. Let (at. ... ,an) = 1, and denote by A(N) the number o/solutions to (1). Then we have
Proof 1) Since (at. ... , an) = 1, the number A(N) is the coefficient of XN in the power series for 1
j(x)
1
1
= 1 _ x'" . 1 _ x"2 ... 1 _ x an •
277
Il.2 Linear Indeterminate Equations
Let 1, (I, (2, ... , (I be the roots of (l  x a ,) ••• (l  x an ) = 0, with multiplicities n, 11, 12 , ••• , It respectively. Since (at> ... , an) = 1 we have Ii ~ n  1 (i = 1,2, ... , t). We have, by partial fractions, j{x)=
An (lx)n
Al + ... ++
B" «(IX)I,
Ix
BI + ... + (IX
+ ... (2)
where A, B, . .. , P are constants. 2) Denote by ljJ(N) the coefficient of x N in the power series expansion of A (0( _
xy
=
AO(I
(X)I 1 ~ .
Then, by the binomial theorem expansion, we have ljJ(N)
= AO(
_ ( 1)(  I  1) ... (  I  N I
N!
= AO(I (N + 1
+ 1) ( 
)N
1 0(
l)(N + 1 2) ... (N (II)!
+ 1) (~)N, 0(
so that .
hm
N+oo
ljJ(N). NI
0(1
+N
A (II)!
1
(3)
Applying this to the various terms in (2) and observing that Ii that
~
n  1 we see
and from (2) we have An=lim x+I
(l  X)n (1  x a ,)
•••
(1  x an )
Theorem 2.2. Equation (1) is always soluble
if N
aI··· an
.
D
is sufficiently large.
D
Exercise. Let (a, b) = 1, a> 0, b > 0. Show that the number of solutions to ax + by = N, x ~ 0, y ~ is given by
°
N  (bl + am) ab
+1
where I and m are the least nonnegative solutions to bl == N (mod a) and am == N (mod b) respectively.
278
11. Indeterminate Equations
11.3 Quadratic Indeterminate Equations We shall solve the equation ax 2 + bxy
+ ey2 + dx + ey + f
=
(1)
0.
We write D = b 2  4ae. If D = 0, then we multiply (1) by 4a giving (2ax + by)2 + 4adx + 4aey + 4af = 0, which is not a difficult equation to solve. Let 2ax + by = t so that t 2 + 2(2ae  bd)y
(t
+ d)2 = 2(bd 
+ 4af =  2dt, 2ae)y + d 2  4af
The number t can be obtained from the congruence (t + d)2 == d 2  4af (mod2(bd  2ae)), and so x, y can be solved. We now assume that D "# 0. Multiplying (1) by D2 we have
Substituting Dx = x' a(x'
+ 2ed 
be, Dy
= y' + 2ae  bd into (2) we have
+ 2ed  be)2 + b(x' + 2ed  be)(y' + 2ae  bd) + e(y' + 2ae + dD(x' + 2ed  be) + eD(y' + 2ae  bd) + fD2 = 0,
bd)2
or ax'2
+ bx'y' + ey'2 = k,
(3)
where
= a(2ed  be)2 + b(2ed  be)(2ae  bd) + e(2ae  bd)2
 k
+ dD(2ed 
be)
+ eD(2ae 
bd)
+ fD2.
We see therefore that whether (1) is soluble depends on whether (3) has solutions satisfying x' == be  2ed,
y' == bd  2ae
(mod D).
Our first priority is therefore to solve (3).
11.4 The Solutions to ax2 + bxy
+ cy2
=
k
We shall solve ax 2 + bxy
Let d = b 2

+ ey2 = k.
(1)
4ae. We shall assume that d is not a perfect square, and that
11.4 The Solutions to ax 2 + bxy
+ Cy2 = k
279
(a, b, c) = 1. We need only find those solutions satisfying (x,y) = I, and we call these the proper solutions. Theorem 4.1. Let x, y be a proper solution to (I). Then there are two uniquely determined integers sand r satisfying xs  yr = 1,
(2)
and the integer
1= (2ax
+ by)r + (bx + 2cy)s
satisfies
12 == d (mod 4k),
o ~ 1< 2k.
Proof Let ro, So be a solution to (2). Then the general solution to (2) is r = ro s = So + hy where h is any integer. Thus
1= (2ax
+ hx,
+ by)ro + (bx + 2cy)so + 2h(ax 2 + bxy + cy2) = 10 + 2hk,
so that we may choose a unique h such that 0 [2
(3)
~
I < 2k. Finally we have
= [(2ax + by)r + (bx + 2cY)S]2 = 4(ar2 + brs + cs 2)(ax2 + bxy + cy2) + (b 2  4ac)(xs  yr)2 == d (mod4k).
0
Theorem 4.2. Let (Xl' YI) and (X2' Y2) be two proper solutions corresponding to the same number I in the previous theorem. Then we have
where t and u are integers satisfying
(5) Conversely, if(X2, Y2) is aproper solution, then the numbers Xl> YI defined by (4) also give a proper solution and both solutions correspond to the same number I. Proof I) We first show that t
= «2axl + byr)(2ax2 + bY2)
u =  (XIY2  X2YI)/k
 dYIY2)/2ak,
(6)
are the suitaqle integers; that is we show that t and u are integers satisfying (5). From
280
11. Indeterminate Equations
1+Ujd 2
+ byd(2ax2 + bY2) 
(2axt
dYtYz 4ak
± 2a(XtY2 
X2Yt)jd
+ bYt + jdYt)(2aX 2 + bY2 ± jdY2) (2ax t + bYt + jdYt)(2ax t + bYt  jdYt) (2ax t
+ bYt + jdYl)( 2aX2 + bY2 ± jdY2) (2aX2 + bY2 + jdY2)(2aX2 + bY2  jd Yz) , (2ax t
we see that (4) follows. Next from 12  du 2
4= we see that
1
1 + jd u
2
1
.
jd u
2
=1,
and u satisfy (5). Also
2axt
+ bYt = (2axt + bYt)(StXt  TtYt) = (2axt + bYdStxt  IYt + (bxt + 2cYt)StYt ==  IYt
(mod 2k).
(7)
Similarly we have 2aX2
+ bY2 == 
IYz
(mod 2k).
Therefore 2a(xtYz  X2Yt) == 0
+ 1)(xtYz
(mod 2k),
 x2yd == 0
(mod2k).
2C(XtY2  X2Yt) == 0
(mod2k),
(b  1)(xtYz  X2Yt) == 0
(mod 2k).
(b
Similarly we have
But (2a,b
+ I,b 1,2c) = (2a,2b,2c,b + I) ~ 2,
so that XtYz  X2Yt == 0
(modk).
This shows that u is an integer. Therefore 12 is an integer, and since 1 is rational, itself must be an integer. 2) Suppose that 2axt
and
12  du 2
+ (b + jd)Yt = (2ax2 + (b + jd)Y2)
e
+ ;jd),
= 4. Then t 
Xt
bu
= 2X2  CUY2,
1
Yt
+ bu
= aux2 + 2Y2'
t
11.4 The Solutions to ax 2 + bxy
Let
rio
+ Cy2 =
SI correspond to the solution r2
t
281
k
+ bu
= 2r1 + CUSl,
Xio
Yl. Then S2 =  aurl
t  bu
+ SI 2
correspond to the solution X2, Y2, because
Finally, let II and 12 correspond to (Xio yd and (X2' Y2) respectively. Then
t  bu = (2ar l + bs1) ( 2X2  CUY2 ) + (brl + 2csd ( aux2t + +2bu Y2 )
= { 2a ( r l t  2bu+ s 1cu)
t  bu )} X2 +b (SI2+rlau
bu s 1cu.) +2c (SI2rlau t + bu )} Y2 + { b ( r l t +2
The theorem is proved.
0
We shall now separate our discussion into two cases depending on the sign of d. Theorem 4.3. Suppose that d < 0. Let
if d <  4, if d=  4, if d =  3. Then there are w proper solutions to (l) that correspond to the same l.
°
Proof From Theorem 4.2 we see that it suffices to show that the equation t 2  du 2 = 4 has w solutions. If d <  4, then clearly t = ± 2, U = are the only solutions, so that w = 2. If d =  4, then t 2 + 4u 2 = 4 has the four solutions t = ± 2, U = and t = 0, u = ± 1. Finally if d =  3, then t 2 + 3u 2 = 4 has the six solutions t = ± 1, U = ± 1; t = ± 2, U = 0. 0
°
Theorem 4.4. Let d > 0. Then all the solutions to the equation X2  dy2 = 4 can be obtained as follows: Let xo, Yo be a solution in which Xo + Yofl is least (xo > 0,
282
II. Indeterminate Equations
Yo > 0). Then all the solutions are given by x
+ yfl = + (xo + YOfl)n 2

2
n = 0, ± I, ± 2, ....
'
Proof Since the equation x 2  dy2 = I does possess a solution we see that Xo, Yo exist. The rest of the proof is the same as that in Theorem 10.9.2. 0
Let Xo
+ Yojd
e='
2
_ Xo  Yojd e= 2 .
'
Definition. Let d > O. By a primary solution to (l) we mean a solution which satisfies
2ax
+ (b 
If we write L = 2ax above becomes
fl)y > 0,
I
+ (b + fl)y,
L
+ (b + jd)YI < e2. 2ax + (b  fl)y 2ax + (b  jd)y, then the condition 2ax
~ 1
=
' I
~ I~I < e
2
•
Theorem 4.5. Let d> O. If the equation (I) has proper primary solutions which correspond to the same I, then it has a unique proper primary solution.
Proof From Theorem 4.2 we know that if Xo, Yo is a proper primary solution to (l), then, on denoting by Lo the associated number L, every proper solution of (l) corresponding to the same I can be represented by L = ± Loen. We have
so that I ~ IL/LI < e2 only when n = 0, and in this case L = Lo > O.
0
When d > 0 we set w = I. We can now generalize the definition of a primary solution: When d> 0, the definition is as given previously; when d < 0 any proper solution is also called a primary solution. Combining Theorems 4.3 and 4.5 we now have Theorem 4.6. If, corresponding to the same I, the equation (I) has proper primary solutions, then there are w proper primary solutions. 0
Theorem 4.5 suggests that in solving ax 2 + bxy + cy2 = k there is no need to search for integer points on the whole hyperbola. The primary solution occurs in a finite part of the hyperbola, and having obtained the primary solution we may use the formula L = ± Loen to find all the other solutions. That is, if e is known, all the solutions can be obtained in a finite number of steps. Specifically, from LoLo = 4ak,
Lo > 0,
II
2 I~ Lo Lo <e,
283
11.5 Method of Solution
we see that
or
giving
Iyl :;:; 2sJlakl/d.
°
That is we need only find a solution which satisfies < y :;:; 2sJlakl/d and the rest can be obtained from L = ± Los". When a > 0, k > we deduce from L > and LL > that L > 0, and whence L < L so that
°
°
0 jd, then we can still reduce it to the case when k<jd. Suppose that x, Y is proper solution to (1). Then there are Xl> Yl such that (2)
Multiplying (1) by
x~
 dyi we have
or
Let xo, Yo be a solution to (2). Then all the solutions to (2) are given by XXI  dYYl = xXo  dyyo + (X2  dy2)t = xXo  dyyo + Jtk. We may therefore choose t so that
Xl
= Xo + tx, Yl = Yo + ty so that
k
IXXI 
Let
IXXI 
dyyd
dYYll ::;;;
2·
= l. Then
x~  dyi
f2d
= ~ = 1'/h,
1'/ =
± 1,
h>
o.
Therefore
From this we see that from a solution to (1) we arrived at a similar equation with a number k which is smaller. If this number is still greater than jdwe can repeat the argument. This suggests the following procedure. We first solve for all those I satisfying 12 == d (modk), 0::;;; I::;;; k12, and we let them be 11 , /2, ... , It. Set (l?  d)/Jk = 1'/ihi, 1'/i = ± 1, hi > 0 and solve the system x?  dy? = 1'/ihi (1 ::;;; i::;;; t). Suppose that hi < jd. Then we use the method of continued fractions. Let Xi, Yi be a solution. Then X=
 JdYi ± lixi 1'/ih i
 JXi ± IJ'i Y=1'/ihi
(3)
is a solution to (1). This is because from 1'/ihi(X
+ jdy) = (Xi + jdy;)(  Jjd ± Ii)
we have X2  dy2 = Jk at once. Further, if x, y in (3) are integers, then they are solutions to (1).
285
1l.5 Method of Solution
x; 
If hi > jd, then we proceed to obtain a specific solution to dy; = Yfihi' Then all the solutions to (I) can be obtained. We illustrate this with an example. Example. We wish to solve
x 2  15y2 = 61.
(4)
6r
We first solve 12 == 15 (mod61), 0::;;; I::;;; This means solving f2 = 15 + 61h, f2 ::;;; 900, or finding h so that 15 + 61 h is a square. Letting h run over 0::;;; h ::;;; [900/61] = 14 we see that there is only one suitable h, namely h = 10, 1= 25. We now have to solve
xi 
15yi = 10.
Observing that 10> ji5 we now consider f2 = 15 h = I, 1= 5 so that we have to solve x~  15y~
(5)
+ IOh, I::;;; 1f = 5. This gives
= I.
(6)
From the method of continued fractions, the solutions to (6) are given by X2 +ji5Y2 = ±(4+ji5t. Therefore Xl +ji5Yl = ±(4+ji5)n(5±ji5)and so x
+ ji5 Y = ± (4 + ji5)n(5 ± ji5)(25 ± ji5)/IO.
Here the three signs
± are independent so that either
x
+ ji5y = ± (4 + ji5)n(l4 ± 3ji5)
x
+ ji5 Y = ± (4 + ji5)n(11 ± 2ji5).
or
eJ
Alternatively we can use the inequality at the end of §4, that is 0 < Y < ak/d. For this example we have 0 < y::;;; 7 and we can construct the following table 2
3
4
5
6
7
15
45
75
105
135
165
195
15
60
135
240
375
540
735
76
121
196
301
436
601
796
y
15(2y  I)
Observe that in the second row of this table each term increases by 30, and in the third row the ith term is the sum of the (i  l)th term and the ith term of the second row.
286
11. Indeterminate Equations
Exercise 1. Solve the following indeterminate equations.
+ 7y2
(a)
3x2  8xy
(b)
3xy
(c)
9x 2  12xy
(d)
x 2  8xy  17y2
 4x
+ 2y2 
+ 2y =
109,
4x  3y = 12,
+ 4y2 + 3x + 2y = + 72y 
75
=
12,
0.
Exercise 2. Let k < )d. Show that the solutions to ax 2 + bxy + cy2 = k can be obtained from the continued fractions expansions of the roots of the equation ax 2 + bx + c = 0. Try and generalize the results in this section.
11.6 Generalization of Soon Go's Theorem Let us consider the equation x 2 + y2 = Z2. If (x, y) = d> 1, then d also divides z. We may therefore assume that (x,y) = 1, and we need only consider positive solutions. Next, if x, yare both odd, then x 2 + y2 == 2 (mod 4), so that Z2 is divisible by 2 but not by 4; since this is impossible we see that x and y must be of opposite parity. We shall assume that x is even. Theorem 6.1. The solutions of the equation x 2 + y2 = Z > 0, (x,y) = 1, 21x are given by x
Z2
satisfying x > 0, y > 0,
= 2ab,
where a, b are coprime integers of opposite parity satisfying a > b > 0. There is a one to one correspondence between (x,y,z) and (a, b). 0
e
On putting ~ = x/z, 1'/ = y/z the equation x 2 + y2 = Z2 becomes + 1'/2 = 1 + 1'/2 = 1 has infinitely and we deduce from Theorem 6.1 that the unit circle many rational points given by .
e
We generalize the problem and ask if every second degree conic possesses infinitely 31'/2 = 2 many rational points. The answer is no; for example the hyperbola has no rational points. For if we put ~ = x/z, 1'/ = y/z, (x, y, z) = 1 then we have x 2  3y2 = 2Z2, so that x 2 == 2Z2 (mod 3), which implies 31x and 31z, and whence 31y, contradicting (x,y,z) = 1. However, we do have the following:
e
Theorem 6.2. Let a second degree conic, not a pair of straight lines, have rational coefficients. If the conic has one rational point, then it has infinitely many rational points.
287
11.6 Generalization of Soon Go's Theorem
Proof We may assume that the conic passes through the origin; otherwise we can translate the origin to the rational point concerned. The conic can be written as S2(e, 1]) + Sl(e, 1]) = 0, where Si(e, 1]) is homogeneous in and I] with degree i. If Sl(e, 1]) 0, then the original conic is a pair of straight lines, and if S2(e, 1]) 0, then the original conic is a straight line. Therefore Sl(e, 1]) and S2(e, 1]) are not identically zero. Now put I] = (e so that eS2(l, 0 + Sl(l, 0 = giving
e
=
=
°
There are therefore infinitely many rational points.
0
Theorem 6.3. Let A, B, C be rational numbers, not all zero. Suppose that B2  4AC is a square. Then the conic (I) has infinitely many rational points. In other words, if the asymptotes of a hyperbola has rational points, then the hyperbola has infinitely many rational points; a parabola has infinitely many rational points. Proof Write L2 Ae
= B2  4AC, so that
+ Bel] + CI]2 = =
If L ¥
A ((
A
°
e+
2: Y+ (~ ::2) I]
(e + ;:
I] 
1]2)
2~ 1]) (e + 2: I] + 2~ 1]).
we set 1]' =
and solving for
e
B+L 2A 1],
eand I] and substituting into (I) we have Ae'I]'
+ D'e' + E'I]' + F' = 0,
which gives ,
e= 
E'I]' + F' AI]' +
P' .
Therefore (1) has infinitely many rational points. If L = we set e' = e + BI]/2A, 1]' =  I] giving Af2 + D' e' + E'I]' + F' = 0. If E' ¥ 0, then 1]' =  (Ae'2 + D'e' + F')/E'sothatthereareinfinitelymanyrational points. If E' = 0, then the original curve is not a second degree conic. 0
°
Note: Theorems 6.2 and 6.3 raise the following problem. Let
(2)
288
11. Indeterminate Equations
be a homogeneous second degree equation in Xl, X2, ... , X" with integer coefficients, not factorizable into a product of linear terms. We ask if there are infinitely many lattice points satisfying (2). We see from Theorem 6.2 that if n ~ 3 and if (2) has a nonzero lattice point, then there are infinitely many lattice points. But when does it have a lattice point? For example: xi + x~ + ... + = certainly has no nonzero lattice point. We therefore have to assume thatj(~l> ... ' ~") = has other real solutions. It can be proved that, under this assumption, and for n ~ 5, the equation (2) has integer solutions, and indeed infinitely many solutions (this is Mayer's theorem). The result does not hold when n = 4. For if xi + x~ + x~  7x; = 0, then we may assume that (Xl> X2, X3, X4) = I. Now from xi + x~ + x~ + x; == (mod8), and x 2 == 0,1,4 (mod 8) we can deduce that 21(Xl,X2,X3,X4) which is a contradiction.
x;
° °
°
11. 7 Fermat's Conjecture Fermat claimed that when n ~ 3 the equation x" + y" = z" has no positive integer solutions in x, y, z. This has been proved for 2 < n < 125000, and even this modest amount of result involves some pioneering work by mathematicians. In order to prove Fermat's claim it suffices to establish the case when n = 4 and when n is an odd prime. For if n has an odd prime divisor p, then
and if n has no odd prime divisors, then n = 2k (k ~ 2) and
The case n = 4 can be settled using Fermat's method of infinite descent. In fact we have Theorem 7.1. The equation X4
+ y4 = Z4 has no positive integer solutions.
D
11.8 Markoff's Equation We introduced in §10.5 Markoff's equation (1)
and we stated the relationship between Markoff numbers and continued fractions. We shall now study this equation. Theorem 8.1. Let xo, Yo, Zo be a solution to (1). Then so is xo, Yo, 3xoyo  zoo
289
ll.8 Markoff's Equation
Proof x~
+ y~ + (3xoYo
 zof = x~
+ y~ + z~  6xoYozo + 9x~~ =  3xoYozo + 9x~~ = 3xoYo(3xoYo 
Zo)·
D
Theorem 8.2. Every solution of (1) can be generated from Theorem 8.1 with x = y = z = 1 as an initial solution. Proof 1) If x = y = z, then clearly x = y = Z = 1. 2) If x = y :f z, then 2X2 + Z2 = 3x2z. Hence x 21z2 or xlz. Let z = wx so that 2 + w2 = 3wx (w > 0) and hence w12, giving w = 1 or 2. But x :f z so that w = 2 giving x = 1, y = 1, z = 2 and this is a solution generated by (1, 1, I) from Theorem 8.1. 3) We can now assume that x < y < z. Ifwe can establish that 3xy  z < z, then we can reduce the value of x + y + z, so that after a finite number of successive steps x, y, z cannot be all different which means that we have reduced the present case to 1) or 2). This is what we shall prove. From Z2  3xyz + x 2 + y2 = 0 we have
If then from we see that
2z < 3xy  xy = 2xy, or
z < xy. But
so that xy < z giving a contradiction. Therefore
as required.
D
Example. Starting with (l, 1, I) we have (l, 1,2) and then (l, 2, 5); (l, 5, 13); (2,5,29). Continuing we have the following table for x ::;:; y ::;:; z < 1000. z y
x
2
5
13
29
34
89
169
194
233
433
610
985
2
5
5
13
34
29
13
89
295
233
169
2
5
2
2
290
11. Indeterminate Equations
Note: Observe that this is also a method of descent. Fortunately there is no more descent after x = y = z = 1. We see therefore that Fermat's method of infinite descent can be used either to prove that there is no solution, or to prove that there are infinitely many solutions.
Exercise 1. Generalize the discussion here to the equation nXIX2 ... Xn •
Exercise 3. Show that the equation 2X4  y4
11.9 The Equation x 3
=
xi + x~ + . . . + x; =
Z4 has infinitely many solutions.
+ y3 + Z3 + w3
=
0
The number 1729 is the smallest positive integer representable as the sum of two cubes in two different ways. That is 1729 = 103 + 93 = 12 3 + P. There are other numbers having this property, for example: 23 + 34 3 = 15 3 + 333, 93 + 15 3 = 23 + 16 3 • In fact we even have 70 3 + 560 3 = 98 3 + 552 3 = 315 3 + 525 3, 121170 3 + 969360 3 = 545275 3 + 908775 3 = 342738 3 + 955512 3
= 336455 3 + 956305 3, and 34 + 43 + 53 = 63, P + 63 + 83 = 9 3. The solutions to the equation x 3 + y3 + Z3 + w3 = 0 present a very interesting problem. Unfortunately we still have not obtained a formula for all the solutions. The EulerBinet formula below provides all the rational solutions. Theorem 9.1. The rational solutions to the equation W 3 + 6XYZ = 0 are given by
+ 3 W(X2 +
y2
x = pa(a 2 + 3b 2 + 3c 2),
W=  6pabc, Y = pb(a 2 + 3b 2 + 9c 2),
Z
= 3pc(a 2 + b2 + 3c 2).
Here (a, b, c) = 1, and p is a rational number. Proof We rewrite the given equation as W Z
3Z W
Y
x
 3Y 3X =0, W
so that there must be integers a, b, c not all 0 and (a, b, c)
=
1 such that
+ Z2)
11.9 The Equation x 3
+ y3 + Z3 + w3 =
291
0
+ 3Zb  3Yc = 0, Za + Wb + 3Xc = 0, Ya  Xb + We = 0.
Wa 
Solving these for X, Y, Z, W, the required result follows.
D
Let
+ f3 + y + Yfr) such that
Yfd~l
very small, there are pairs of numbers
(~1>
Yf1),
and the ratios
~,4~2 , ... ,4r are nearly equal. Therefore follows. 0
1
£'_
Yf1
Yf2
Yfr
~slYfs
are distinct ratios and the required result
Exercise 1. Show that the rational solutions to obtained from 0(
=
0'( 
(~
y = O'«e
where
~

3Yf)(e
+ 3Yf2)2

+ 3Yf2) + 1), (~ + 3Yf)),
0(3
+ ++ [33
y3
()3
= 0 can be
+ 3Yf)(~2 + 3Yf2)  1), O'«e + 3Yf2)2  (~  3Yf))
[3 = O'«~
then the linear substitution ~' = (;( 1 ~ + PI '1/2, '1' = '1, " = ,will make the left hand side of (14) linear in '1' and so the theorem is proved. D Theorem 10.4. If a nondegenerate cubic surface has a rational point, then it has infinitely many rational points.
Proof We may assume that the surface passes through the origin so that it can be written as
(16) where Si(~' '1, 0 are homogeneous in ~, '1,' with degree i. 1) If SI(~''1,O == 0, then S3(~''1,') + S2(~''1,') = 0, so that
'S3(t,~, 1) + S2(r~, 1) = 0, giving, =  S2«(;(, P, 1)/S3«(;(, P, 1). Observe that if S3«(;(, P, 1) == 0, then the original surface is not a cubic, and if S2«(;(, P, 1) == 0, then the cubic surface is a degenerated one. 2) If SI(~' '1, ') ¥= 0, then under the transformation SI(~' '1, ') ~, we have
If S3(~' '1, ') and
S2(~' '1, 0)
are not both identically zero, then we let, = 0 giving
299
Notes
Y
=
If S2(~' '1, 0) == 0, then S2(~' '1, 0 = (Ll(~' '1, O· We let Z = 1/(, X = ~/(, '11( so that S3(X, Y, 1)
+ ZL 1 (X,
Y, 1)
+ Z2
=
°
which gives
and this is included in Theorem 10.2 so that the required result follows. If S3(~' '1, 0) == 0, we let S3(~' '1, 0 = (T2(~' '1, 0, and this reduces to Theorem 10.3. The theorem is proved. D
Notes 11.1. The problem of the existence of solutions to the famous equation
x2
=
yn
+ 1,
has been settled by K. Chao [16]. He proved that, apart from n = 3, x = y = 2, there are no integer solutions.
± 3,
Chapter 12. Binary Quadratic Forms
12.1 The Partitioning of Binary Quadratic Forms into Classes Definition. For fixed integers a, b, c the homogeneous quadratic polynomial F = F(x,y) = ax 2 + bxy
+ cy2
is called a binary quadratic form, or simply a form, and is denoted by {a, b, c}. The integer d = b 2  4ac is called the discriminant of the form. It is easy to see that d
== 0 or I (mod 4).
Theorem 1.1. A necessary and sufficient condition for F to be factorized into a product of two linear forms with integer coefficients is that d is a perfect square. Proof 1) Let d be a perfect square, and a ¥ O. Then the equation ax 2 + bx
+ c = a {(x + :aY 
4~2} = 0
has rational roots, and therefore, by Theorem 1.13.2, the form can be factorized into a product of two linear forms with integer coefficients. If a = 0, then clearly F(x,y) = (bx + cy)y. 2) If ax 2 + bxy + cy2 = (rx + sy)(tx + uy), then d = b 2  4ac = (st
The theorem is proved.
+ ru)2
 4rt . su = (st  ru)2.
D
We shall assume from now on that d is not a perfect square. If d < 0, a > 0, then
301
12.1 The Partitioning of Binary Quadratic Forms into Classes
°
and so F(x,y) ~ Oforallx,y,andF(x,y) = ifand only if x = y = 0. We call such a form a positive definite form. If d < 0, a < 0, then F ::;;; for all x, y, and we call the form a negative definite form. Since a negative definite form becomes a positive definite form on multiplication by  1, we shall only deal with positive definite forms which we shall simply call definite forms. If d > 0, then F(1,O)
= a,
F(b,  2a)
= ab 2 
°
b . b . 2a
+ c . 4a 2 =

da.
°
If a =I 0, then the two values here have different signs. If c =I we can similarly choose two values which have different signs. If a = c = 0, then F(l, 1) = b,
F(1,  1)
= 
b
again have different signs. Thus when d > 0, the form F(x, y) can take both positive and negative values, and we therefore call such a form an indefinite form. Definition. Let the integer coefficient substitution x=rX+sY,
y=tX+uY,
(rust=l)
transform F(x,y) into G(X, Y)we say that Fis transformed into G via ( rt
su)'
The two forms F and G are then said to be equivalent, and we write F ~ G to denote this. More specifically, let F = {a, b, c} and G = {al' bi> cd. Then we have (1)
b1
= 2ars + b(ru + st) + 2ctu
+ b(l + 2st) + 2ctu, as 2 +'bsu + cu 2,
= 2ars
(2)
Cl
=
(3)
=
(2ars
and we derive at once
bi 
4alcl

+ b(ru + st) + 2ctU)2 4(ar2 + brt + ct 2)(as2 + bsu + cu 2)
= (b 2 
4ac)(ru  st)2 = b 2  4ac = d.
We see therefore that equivalent forms have the same discriminant. Also, if d < 0, a > 0, then al = F(r, t) ~ 0. Since a 1 = implies r = t = 0 which is impossible we see that al > 0. In other words forms which are equivalent to a positive definite form are themselves positive definite.
°
Theorem 1.2. (i) F ~ F (reflexive). (ii) If F ~ G, then G ~ F (symmetric). (iii) If F ~ G, G ~ H, then F ~ H (transitive).
D
302
12. Binary Quadratic Forms
We omit the simple proof for this theorem. The relation of being equivalent partitions the set of forms with discriminant d into classes, so that all the forms in one class are equivalent among themselves, and two forms from two different classes are not equivalent. It is clear that forms from the same class represent identical sets of integers. For if k = G(X, Y), then k = F(rX + sY, tX + uy).
12.2 The Finiteness of the Number of Classes Theorem 2.1. In every class offorms there is always one which satisfies the condition
Proof Let a be an integer with the least absolute value from the set of nonzero integers representable by forms in the class concerned. Let {ao, bo, co} be any form in the class. Then there exist r, t such that
and (r, t) = 1, since otherwise a/(r, t)2 is also representable by {ao, b o, co}, and lal/(r, t)2 < lal, which is impossible. We can fix sand u so that ru  st = 1. Then {ao, b o, co} is transformed into {a,b',c'} via
G:).
Now the transformation
G~)
transforms {a,b',c'} into
{a, b, c} where b = 2ah + b'. We can choose h so that Ibl ~ lal. Since c is representable by {a, b, c}, and this form also belongs to the class containing {ao, bo, co} it follows that Icl ;::i: 14 (Note that c # 0, because c = 0 implies that d is a perfect square.) D
Theorem 2.2. The number of classes is finite. Proof 1) d> 0 (indefinite). From Theorem 2.1 we have lacl ~ b 2 = d
+ 4ac > 4ac,
so that ac < O. Also 4a 2 ~ 41acl
=  4ac = d  b 2 ~ d
so that
fl
lal~,
2
and hence, by Theorem 2.1
fl
Ibl~·
2
303
12.2 The Finiteness of the Number of Classes
There are therefore only finitely many possible values for a and b. Since c = (b 2  d)/4a, the required result follows. 2) d < 0 (definite). Assuming that a > 0 we have, from Theorem 2.1,
so that
o a. Then t must be zero, since otherwise from ct 2 > at 2 and (4) we deduce that a > a which is impossible. Therefore t = 0, ru = 1. Now from (3), we have b' = 2ars
+ b == b
(mod 2a).
Since  a < b ::::; a and  a =  a' < b' ::::; a' = a we arrive at b = b', and hence c = c' at once. The same conclusion can be obtained if we assume that c' > a' (= a). It remains to consider the case a = a' = c = c'. Here we must have b = ± b', and from b ~ and b' ~ we arrive at b = b'. D
°
°
Note. The case of the indefinite forms is not this easy.
Definition. We call a form which satisfies (I) a reduced form. Exercise 1. Verify the following table of all the reduced forms for d
3
4
7
8
II
a
I I I
I 0 I
I I
I 0 2
I I
b c
2
I 0 3
3
Exercise 2. Prove that when d
= 
 12 2 2 2
 15 I I
4
2 I 2
 16 I 0 4
°< 
d::::; 20.
 19
 20
2 0 2
I I
5
I 0 5
2 2 3
48 there are four reduced forms:
{1,0,12},{2,0,6},{3,0,4},{4,4,4}.
12.3 Kronecker's Symbol Definition. Let m > 0, d == Kronecker's symbol
°
or 1 (mod 4) and d not a perfect square. The
(~) is defined by
(~) =0,
if pld;
G)={~
if d==1 if d==5
(~) =
(mod 8), (mod 8);
Legendre's symbol (p odd prime, p,td).
305
12.3 Kronecker's Symbol
If m
= TI~= 1 Pr
where Pr are primes, then
n
(m~)= r= (~) Pr 1
The following are very easy to prove:
(~) =
(i) If (d,m) > 1, then
(~) = ± 1.
(ii) If (d,m)
= 1, then
(iii) If
0, m2 > 0, then
ml >
O.
(ml~J = (:J (:J. Theorem 3.1. Ifm > 0, (m,d)
= 1, then the Kronecker's symbol is given by when
d) {(I:I). (m = (2)b 
m
d is odd
( 1)~~(m) 2 2 ,
lui
Here(~), (~), (m) are all Jacobi symbols. Idl m lui Proof 1) Let dbe odd. From the definition of the Kronecker's symbol and Theorem 3.6.5 we have
2) Let d
=
2b u, 2,ru. Then b
~
2, and m is odd, so that
_(2)b ( 1)~~(m) (md) _(2)b(U) m m m lui 2
2
.
0
From this theorem we deduce that
Therefore we have: Theorem 3.2. The Kronecker's symbol
(~) is a real character mod Idl·
Theorem 3.3. Suppose that m > 0, n >
°and m ==  n
(mod Idl). Then
if d> 0, if d < 0.
0
306
12. Binary Quadratic Forms
Proof Since
it follows from Theorem 3.1 that, when d is odd,
)
(Idl ~ I) = (Idll~ I) = ( ~t = ( I ={ When d is even, we let d
=
if d < 0.
2 bu, 2,ru, b ;;:: 2. Then, from Theorem 3.1, we have
=
2
The Theorem is proved. Theorem 3.4. Let k >
1
if d> 0,
I,  I,
(Idl 2)b .'C.!.. (Id l  I) (Idl d) _I _ I ( I) Iul= (
t;
"1
1"11
1)2+2
={
I
'
 I,
= (
I)
.'C.!.. ( 2
I)
~
if d> 0, if d < 0.
0
°and (d, k) = I. The number of solutions to the congruence x 2 == d
(1)
(mod 4k)
is equal to
2I(~) Ilk f ' where the sum is over all positive squarefree divisors f of k.
If x is a solution to (1) then so is x solutions to
+ 2k. Hence, by the theorem, the number of
x 2 == d (mod4k),
0~x 1, then we say that {a, b, c} is imprimitive. Clearly
{~, ~,~} g g g
is a primitive form with discriminant d/g 2 • Also, if
{a, b, c} ~ {at. bt. cd then the two forms are either both primitive or both imprimitive. We denote by h(d) the number of classes of primitive forms with discriminant d. Clearly the number of classes of forms with discriminant d is equal to
From each class of primitive forms we select a representative (for definite forms we consider the primitive positive definite forms) giving a representative system which we denote by
Theorem 4.1. Let k > 0, (k, d) = 1, and denote by tjJ(k) the total number of primary solutions to k
=
F 1 (x,y),
... ,
k
= Fh(d)(X,y).
Then tjJ(k) = w I nlk
(~). n
(For the definitions of primary solution and w, see §4 in the previous chapter). Proof We begin by considering the solutions to the congruence [2
== d (mod 4k),
o ~ 1< 2k.
308
12. Binary Quadratic Forms
For a given solution Iwe can determine an integer m from f2  4km = d. This then gives a form {k, I, m} which is easily seen to be primitive and with discriminant d. Therefore {k, I, m} is equivalent to one and only one Fi • Also, from Theorem 11.4.3, we know that there are w proper primary solutions corresponding to each I. Therefore the total number of proper primary solutions to k
= F 1 (x,Y), ... , k = Fh(d)(X,y)
is
wI(~) Jlk f . Also the total number of primary solutions is t/J(k)
=w
I I (~) I f
g21k k g> 0 J g2
(since (k, d) = 1, so that ((k/g2), d) = 1). Since (g2, d) = 1 it follows that t/J(k)
=w
I I ( d) = wI (d)  . :~I~ J It> fg nlk n 2
(This is because any integer n can be written asfg2 wherefis squarefree and g > O. Also g2Ik,fl(k/g 2) and nlk are equivalent.) D Consider now the following application of the theorem. It is easy to prove that = 1 so that t/J(k) is the number of solutions to k = X2 + y2. Therefore:
h(  4)
+ y2 = k is equal to four times the difference between the number of divisors of k which are congruent 1 (mod 4) and the number which are congruent 3 (mod4). D
Theorem 4.2. The number of solutions to X2
This agrees completely with Theorem 6.7.5. Exercise 1. Let m be odd. The number of solutions to X2 + y2 = 2 1m is 20" where 0" is the difference between the number of divisors of m which are congruent 1 or 3 (mod 8) and the number which are congruent 5 or 7 (mod 8). Exercise 2. The number of solutions to X2 + xy + y2 = k is 6E(k) where E(k) is the number of divisors of k of the form 3h + 1 subtracting the number of divisors of the form 3h + 2. Exercise 3. Let m be odd and consider the number of solutions to the equation X2 + 3y2 = 2 1m. If I is odd, then this number is zero; if I = 0, then this number is 2E(m); if I is positive and even, then this number is 6E(m). Here E(m) has the same
definition as earlier.
309
12.5 The Equivalence of Forms modq
Exercise 4. If m is odd, then the equation x 2 + 3y2 = 4m has E(m) positive odd solutions. Exercise 5. Let m be odd and consider the number of solutions to the equation x 2 + 4y2 = 2km. When k = 0, this number is 2E; when k = I, this number is 0; when k ~ 2, this number is 2E. Here E is the number of prime divisors of m
congruent I (mod 4) subtract the number of divisors of k congruent 3 (mod 4). Exercise 6. Denote by e(n) the number of divisors of n congruent 1,2,4 (mod 7) subtract the number of those congruent 3, 5, 6 (mod 7). The number of solutions to x 2 + xy + 2y2 = n > 0 is then 2e(n). Exercise 7. If m is odd, then e(2am ) = (a + I)e(m). Let 3%t. If b is odd, then = 0 and if b is even, then e(3 b t) = e(t).
e(3 b ()
Exercise 8. Let m be positive and odd. The numbers of solutions to m = x 2 + 7y2 and 2m = x 2 + 7y2 are 2e(m) and 0 respectively. The number of solutions to 4k = x 2 + 7y2 is 4e(k). Exercise 9. Let m be positive and odd. Then there are e(m) positive integer solutions to x 2 + 7y2 = 8m. Exercise 10. The number of solutions to x 2 + xy + 3y2 = m > 0 is twice the difference between the number of divisors of m congruent I, 3, 4, 5, 9 (mod II) and the number of those congruent 2, 6, 7, 8, 10 (mod II).
12.5 The Equivalence of Forms mod q Let q be a prime number. Suppose that there is an integer valued coefficients substitution
x=rX+sY,
y = tX + uY,
(ru  st,q) = I
(I)
such that (2)
Then we say that the two forms {a, b, c} and {aI, bb cd are equivalent modq. Ifwe denote by dand d l the discriminants for {a,b,c} and {abbbcd, then clearly (3)
From (3) we see that if {a, b, c} and {ab bb cd are equivalent modp, then
(~) = (;).
310
12. Binary Quadratic Forms
Let us take q to be a prime p > 2. Suppose that the discriminant of {a, b, c} is d where p,j'd. Then {a, b, c} must be equivalent modp to a form {at> 0, cd. This is because p,j'(a, b, c), and if p,j'a then letting b X==x+y, 2a
Y==y
(modp)
we have ax 2 + bxy
+ cy2 == a ( x + b)2 y 2a
d d  _y2 == aX 2  _y2 4a 4a
and similarly if p,j'c; if pl(a, c), then taking x = X ax 2 + bxy
+ cy2 == bxy == bX 2 
+ Y, y = X
(modp),  Y we have
by2 (modp).
Therefore we can assume from now on that plb and p,j'ac. Lemma 1.
If p,j'ac,
then there are x, y such that ax 2 + cy2 == 1 (modp).
Proof Let x, y run over 0, 1, ... ,p  1 separately. Then ax 2 and 1  cy2 separately take (p + 1)/2 distinct values. Therefore there are x, y such that ax 2 == 1  cy2
as required.
(mod p)
0
Let 1 == ar2 + ct 2 (mod p) and let s, u be any pair of integers satisfying p,j'ru  st. With s, u fixed, we let
b l == 2ars + 2ctu,
CI
== as 2 + cu 2 (modp)
so that {a, 0, c} ~ {l, bl> cd modp. If d l is the discriminant of the second form, then from our discussions we have {l,bl>cd
~ {1'0,  ~} ~ {l,0, 
dd
(modp).
Summarizing we have: Theorem 5.1. Let the discriminant of {a, b, c} be d, and p > 2, p,j'd. Let r be any quadratic nonresidue modp. Then
{a,b,c}
if(~) = 1, and
~
{1,0,  l}
~
{O, 1,0}
(modp)
311
12.5 The Equivalence of Forms modq {a,b,c}~{l,O,
r}
(modp)
ifGJ =  1. Also {I, 0,  I} and {I, 0,  r} cannot be equivalent modp.
0
Corollary. If p is an odd prime that does not divide d, then any two forms with discriminant d must be equivalent modp. 0 When q
=
2 and the forms have odd discriminants we have:
Theorem 5.2. Any form with an odd discriminant must be equivalent mod 2 to exactly one of the following {O, 1,0}, {I, 1, I}. More specifically, we have {a,b,c}
~
{O, 1,0}
(mod 2)
if 2lac;
{a,b,c}
~
{I, 1, I}
(mod 2)
if 2,tac.
Proof Since 2,td it follows that 2,tb. Consequently if 2,tac, then ax 2 + bxy
+ cy2 == x 2 + xy + y2
(mod 2);
if 2lac, then either 21a or 21e. But if 21a then ax 2 + bxy
+ cy2 == xy + cy2 == y(x + cy)
(mod 2)
so that {a,b,c} ~ {O, 1,0} (mod2), and similarly if2lc. Finally {O, 1, O} and {I, 1, I} cannot be equivalent mod 2 so that the theorem is proved. 0 Corollary. Any two forms with the same odd discriminant must be equivalent mod2. 0 We next consider the case when p divides the discriminant of the forms. Lemma 2. Let n be any given integer. Then there are two integers x, y such that = 1 and (F(x,y),n) = 1.
(x,y)
Proof Let q be any prime number. Since F(x, y) is a primitive form, q,t(a, b, c). If q,ta, then q,tF(l,O); if q,tc, then q,tF(O, 1); if ql(a, c) and q,tb, then q,tF(l, 1). Therefore the lemma follows if n = q.
Let qi>' .. ,qt be all the distinct prime divisors of n. From the above, there are integers x;, y; such that q;,tF(x;, y;). From the Chinese remainder theorem there are
312
12. Binary Quadratic Forms
two integers X, Y such that X
==
Xi
Y
(mod qi),
= Yi
i
(mod q;),
= 1,2, ... , t.
Clearly we have (F(X, Y),n)
Now let
X
= X/(X,
Y), y
=
= 1.
Y/(X, Y). Then (x,y)
=
1 and
D
(F(x,y),n)=1.
Consider now p > 2, p Id where d is the discriminant of the form {a, b, c}. Since p,./'(a, c) we may assume that p,./'a. It is easily seen that {a,b,c}
~
{a,O,O}
(modp).
Theorem 5.3. Letp > 2 and let theforms {a, b; c} and {aI, bI> cd have discriminants d and d 1 respectively where p Id, pi d 1 • A necessary and sufficient condition for {a, b, c} and {aI> bI> cd to be equivalent mqdp is that
where k and kl are any integers representable by {a, b, c} and {aI' bI> cd respectively and satisfying (k,d) = 1, (k 1 ,d1 ) = 1. Proof That k and kl exist follows from Lemma 2. Let k == ax2 (mod p), (k, p) = 1. Then
Thus
(i)
is constant and is equal to
(~).
+ bxy + cy2
Suppose now that {a, b, c} and
{aI, bI> cd are equivalent modp. Then, from the definition of equivalence,
Conversely, if(i)
(i) = (~) = (~) = (:1). = (:1). (~) = (~ ) then
so that there is an integer z such that
a == alz2 (modp) and hence
It remains to consider the situation whenp following symbols:
= 2 and 21d. We first introduce the
313
12.5 The Equivalence of Forms modq kl
d if =0 or 3 (mod4); 4
(j(k) = (  1)2 ,
d if =0 or 2 (mod 8); 4
k 2 1
e(k) = (  1)8, kl
(j(k)e(k)
k 2 1
d if =0 or 6 (mod8); 4
= ( 1)2+8,
where k is an odd integer representable by {a, b, c}. Since 21 d implies 21 b we shall assume that b = 0 and consider
d=  4ac. Theorem 5.4. A necessary and sufficient condition for two forms satisfying (mod 4) to be equivalent mod 4 is that they should have the same (j.
Proof Since d =  4ac, it follows that ac and k is representable as
1 =3 .
=I (mod 4), that is a =c (mod 4). If 2,rk
then, since x, y must have the same parity it follows that k = a (mod 4) and hence = (j(a). The theorem can easily be deduced from this. 0
(j(k)
The same method can be used to prove the following theorems: Theorem 5.5. A necessary and sufficient condition for two forms satisfying (mod 8) to be equivalent mod 8 is that they should have the same e. 0
1 =2
Theorem 5.6. A necessary and sufficient condition for two forms satisfying (mod 8) to be equivalent mod 8 is that they should have the same ()e. 0
1= 6
Theorem 5.7. A necessary and sufficient condition for two forms satisfying (mod 4) to be equivalent mod 4 is that they should have the same (j. 0
1= 0
Theorem 5.S. A necessary and sufficient condition for two forms satisfying 1= 0 (mod 8) to be equivalent mod 8 is that they should have the same (j and e. 0 Exercise 1. Any two forms satisfying 1 = 2 (mod 4) are equivalent mod 4. Exercise 2. Any two forms satisfying 1
=I (mod4) are equivalent mod4.
Exercise 3. Any forms satisfying 1= I (mod 4) must be equivalent mod 8 to exactly one of
314
12. Binary Quadratic Forms
Deduce also that any two forms with the same discriminant d which satisfies ~ == 1 (mod 4) must be equivalent mod 8. Exercise 4. Let q be any positive integer. A necessary and sufficient condition for two quadratic forms to be equivalent mod q is that they have the same character system (see Definition 1 in the next section).
12.6 The Character System for a Quadratic Form and the Genus It follows at once from the definitions that any two quadratic forms which are equivalent are also equivalent mod q for any q.
Definition 1. Let PI>' .. ,Ps be the odd prime divisors of d. If (k, 2d) = 1 and k is representable by F(x,y) then, from the previous section, we see that
(~) , J(k), e(k), J(k)e(k)
(1)
do not depend on k. We call them the character system for F(x,y). Since two equivalent quadratic forms have the same character system we can speak of the character system of an equivalence class of forms. Definition 2. If two quadratic forms with the same discriminant d have the same values for each of the characters, then we say that they belong to the same genus. It is easily seen that a genus is formed from various equivalence classes offorms. We shall prove that each genus has the same number of equivalence classes. Since this fact falls more naturally in the study of ideals in a quadratic field we do not give the proof here. The importance of the notion of genus comes from the discussion of the representation of integers by quadratic forms. Let F(x, y) be a fixed quadratic primitive form. We now discuss the Diophantine equation k = F(x,y).
(2)
If h(d) = 1, then this problem can be solved with Theorem 4.1. But if h(d) ¥ 1, then we only have certain incomplete results from Theorem 4.1. For example if ljJ(k) = 0, then (2) has no solutions; but if ljJ(k) ¥ 0, is (2) soluble then? If it is soluble, then how many solutions are there? These questions cannot be answered by Theorem 4.1. The introduction of the notion of genus helps partly to answer these questions. Example 1. d
=  96. There are four positive definite reduced primitive forms: {1,0,24},{3,0,8},{4,4,7},{5,2,5}.
12.6 The Character System for a Quadratic Form and the Genus
315
From Theorem 4.1 we only know that if k is representable by these four forms, then the total number of solutions is t/J(k)
=2L (96) , nlk
n
where n runs over all the positive divisors of k. In order to calculate the character system we first select k coprime with d and representable by the forms. We take k = 1,11,7,5
and obtain
Form {1,0,24} {3, 0, 8} {4,4,7} {5,2,5}
(~)
o(k)
B(k)
+1
+1
+1
 1 +1  1
 1  1 +1
+1
 1  1
This table shows that each genus has one equivalence class. Therefore, when k == 1,11,7,5 (mod 12), t/J(k) represents the number of solutions of the first, the second, the third and the fourth form respectively. More specifically, if k == 1 (mod 12), then t/J(k) = 2 Lnlk (  96/n) represents the number of solutions to x 2 + 24y2 = k. At the same time we have proved that this equation has no solution if k == 11,7,5 (mod 12). Example 2. d =  15. There are two positive definite reduced primitive forms:
{l, 1, 4}, {2, 1, 2}.
Taking k = 1 and 17 will give
(~) = (~) = 1
and
(~)=(~)= 1.
We can then perform the calculations for k == 1,4 (mod 15) and k == 2,8 (mod 15). We conclude that if k == 7,11,13 or 14 (mod 15), then k is not representable by either of the two forms. If k == 1,4 (mod 15) then there are 2 Lnlk ( 15/n) ways to represent k by {I, 1, 4} ; if k == 2, 8 (mod 15), then there are the similar number of ways to represent k by {2, 1, 2}. From these two examples we see that if each genus contains only one equivalence class, then the number of solutions to (2) is completely determined when (k,2d) = 1. We tabulate all the discriminants d >  400 in which the genus has only one equivalence class in the followin~ble, where we have also included all the positive definite reduced primitive forms.
316
12. Binary Quadratic Forms
Exercise. Study, as in the examples, the cases d
=  20,  24,  32,  35,  51,
75.
d=3 4 7 8 11 12 15 16 19 20 24 27 28 32 35 36 40 43 48 51 52 60 64 67 72 75 84
88 91
1, 1, 1 1,0, 1 1,1,2 1,0,2 1,1,3 1,0,3 1,1,4 2,1,2 1,0,4 1, 1,5 1,0,5 2,2,3 1,0,6 2,0,3 1, 1,7 1,0,7 1,0,8 3,2,3 1,1,9 3,1,3 1,0,9 2,2,5 1,0,10 2,0,5 1, 1, 11 1,0,12 3,0,4 1,1,13 3,3,5 1,0,13 2,2,7 1,0,15 3,0,5 1,0,16 4,4,5 1, 1, 17 1,0,18 2,0,9 1,1,19 3,3,7 1,0,21 2,2,11 3,0,7 5,4,5 1,0,22 2,0,11 1, 1,23 5,3,5
d=96
99 100 112 115 120
123 132
147 148 160
163 168
180
187 192
1,0,24 3,0,8 4,4,7 5,2,5 1,1,25 5,1,5 1,0,25 2,2,13 1,0,28 4,0,7 1,1,29 5,5,7 1,0,30 2,0,15 3,0,10 5,0,6 1, 1,31 3,3,11 1,0,33 2,2,17 3,0,11 6,6,7 1, 1,37 3,3,13 1,0,37 2,2,19 1,0,40 4,4,11 5,0,8 7,6,7 1,1,41 1,0,42 2,0,21 3,0,14 6,0,7 1,0,45 2,2,23 5,0,9 7,4,7 1,1,47 7,3,7 1,0,48 3,0,16 4,4,13 7,2,7
 d = 195
228
232 235 240
267 280
288
312
315
340
352
372
1,1,49 3,3,17 5,5,11 7,1,7 1,0,57 2,2,29 3,0,19 6,6,11 1,0,58 2,0,29 1,1,59 5,5,13 1,0,60 3,0,20 4,0,15 5,0,12 1,1,67 3,3,23 1,0,70 2,0,35 5,0,14 7,0,10 1,0,72 4,4,19 8,0,9 8,8,11 1,0,78 2,0,39 3,0,26 6,0,13 1,1,79 5,5,17 7,7,13 9,9,11 1,0,85 2,2,43 5,0,17 10, 10, 11 1,0,88 4,4,23 8,0,11 8,8,13 1,0,93 2,2,47 3,0,31 6,6,17
317
12.7 The Convergence of the Series K(d)
12.7 The Convergence of the Series K(d) Let
(d)
00 1 K(d)=I· n= 1
This is a very important series. Since
(1)
n n
(~) is a real character mod Idl, it follows from
Theorem 7.2.3 that
Moreover we see from Theorem 6.8.2 that the series K(d) is convergent. Theorem 7.1. lim ~ t'"
I (~) =
I 1., k., t
00 "C
nlk
n
(()(Idl) K(d). Idl
(k,d)= 1
Proof 1) Let A("C; d, n) denote the number of positive integers not exceeding "Cln and coprime with d. Then 1 (d) 1 (d) "C1 I I (d) n =I n I I=I n I I "C "C 00
1 .,k"tnlk (k, d) = 1
00
n =l
I
l.,k"t (k,d) = 1 nlk
n =l
l.,k"t/n (k,d) = 1
(~)A("C;d,n) .
n= 1
n
(2)
"C
Since A("C; d, n) does not increase as n increases, and
A("C;d,n) 1 :;;;, "C n it follows from Theorem 6.8.2 that the series (2) converges uniformly in "C. Also, for fixed n, we have
.
A("C; d, n)
t"'OO
"C
hm
(()(Idl) I
=.
Idl
n
Therefore
. 1 "L." hm
,,(d) I'1m A("C;d,n) L.. = ;, L." (d) 
t",00"C 1 .,k"tnlk (k,d)= 1
n
n=l
n
= ({)(Idl) Idl
t"'OO
I n= 1
(~) ~ n n
"C
.D
318
12. Binary Quadratic Forms
12.8 The Number of Lattice Points Inside a Hyperbola and an Ellipse Theorem 8.1. Let m > 0 and let there be an ellipse centre at the origin, or a hyperbola centre at the origin (the two curves of the hyperbola together with two lines passing through the origin). Denote by I the (finite) area of the region. Magnify the original figure by (that is replacing ~ and '1 by ~Jr and '1Jr), and denote by V(r) the number of lattice points in the magnified figure whose coordinates satisfy
Jr
~ = ~o
(modm),
'1 = '10
.
I
(modm).
Then V(r)
hm=2' t  co r m Proof We form a net in the original figure with the orthogonal lines ): =
.,
~o
'10 + sm '1 = =
+ "1m
Jr'
Jr
This gives a net of squares with side length mlJr. Denote by W(r) the number of squares whose "southwest corners" lie inside the ellipse or the hyperbola. Then clearly V(r)
=
W(r).
Since the area of each square in the net is m 2 /r it follows at once from the fundamental theorem of calculus that
and hence the required result.
D
12.9 The Limiting Average Denote by I/I(k, F) the number of proper representations of k by F, and let
L
H(r,F)=
I/I(k,F),
1 :::=;k~t (k,d)= 1
The aim of this section is the evaluate .
1
hm  H(r, F). t  00
't
r
> 1.
319
12.9 The Limiting Average
Theorem 9.1. As x, y both run over a complete residue system mod Idl, there are precisely Idlcp(ldl) sets of x, y such that F(x, y) is coprime with d.
Proof It suffices to prove that if plld, I> 0, then there are icp(pl) sets of x, y in a complete residue system modi such that p,tF(x,y). For let the standard P:' Then, since (d, F(x,y)) = 1 and p,tF(x,y) are factorization for Idl be equivalent, it follows from the Chinese remainder theorem that, as x, y run over a complete residue system mod Idl, there are
ni
n plcp(pl)
=
Idlcp(ldl)
plldl
values of F(x,y) which are coprime with d. Since (a, b, c) = 1, we have p,t(a, c). We now assume that p,ta. 1) Suppose that p > 2. Since (p,4a) = 1, it follows from 4aF = (2ax
+ by)2 
dy2
¥= 0 (mod p)
that 2ax
+ by ¥= 0
(modp),
and conversely. For any given value of y (there are pI values) there are p  1 distinct values for xmodp, because p,t2a. There are thus pl1(p  1) = cp(pl) values for xmodpl. The required result is proved. 2) Suppose that p = 2. Now 21d implies 21b. The condition ax 2 + bxy
+ cy2 ==
1 (mod 2)
becomes ax
+ cy ==
1 (mod 2).
Since corresponding to each value of y (there are 21 values) there are 21 1 values x (mod 21) which satisfy the above e9uation, the theorem is proved. D Theorem 9.2. We have
2n cp(ldl) .
11m t+ 00
H(r, F) 1:
=
{
JIdf Idi' log e cp(d)
if d> O.
Jdd' Proof If d < 0, we let U(r)
=
U(1:, F, xo,Yo) denote the number of solutions to
0::::; F(x,y) ::::; x
if d < 0,
== Xo (mod Idl),
1:,
y == Yo
(mod Idl).
If d > 0, then we let U(r) = U(r, F, xo,Yo) denote the number of solutions to
320
12. Binary Quadratic Forms
X = Xo
1::;;;1~10,
O::;;;F(x,y)::;;;r,
(mod Idl),
= Yo
y
(mod Idl).
Here the definitions for L, £, 8 are the same as §11.4. Let xo, Yo both run over the complete residue system mod Idl such that (F(xo, Yo), d) = 1. Then
I
U(r)
I
=
t/I(k, F)
= H(r, F),
'"
U(r).
1 ';k';r (k,d) = 1
(XO,Yo) (F(xo,yo).d) = 1
and hence l' 1 · H( r, F) 11m = 1m 't
t  00
t  00
l'
L.
(XO,YO) (F(xo,yo),d) = 1
By Theorem 9.1 we see that our theorem follows if we can prove that, for each set of xo, Yo, we have
lim U(r) r co
=
{~ :2' log 8 1 .jd d 2 '
r
if d < 0, if d> O.
Also, by Theorem 8.1, we need now only evaluate the area for the ellipse F(x, y) ::;;; 1, (d < 0), and the area for the hyperbola 0::;;; F(x,y) ::;;; 1, r > 0, 1 ::;;;
I~ I
0).
1) Suppose that d < O. It is well known that the area of the ellipse 2 ax + bxy + cy2 ::;;; 1 is 2n/JIdT. The theorem is therefore proved. 2) Suppose that d > 0, and we may assume that a > O. Since L
= 2ax + (b + .jd)y,
£ = 2ax + (b  .jd)y,
so that L£
= 4a(ax 2 + bxy + cy2),
and hence L > O. The required area for the hyperbola is
1= where the integration substitution
IS
ff
dxdy
over L£::;;; 4a, £ > 0, 1::;;; L/£ < L
2Ja= p,
£ =(j
2Ja
82.
We make the
321
12.10 The Class Number: An Analytic Expression
whose Jacobian has the value op
op
ox
oy
ou ox
ou oy
Therefore 1=
~ II dpdu,
where the integration is over pu ~ 1, u > 0, u ~ p < e2 u. This is the region formed by the two straight lines from the points (1,1) and (e, lie) to (0, 0) together with the rectangular hyperbola joining the points (1,1) and (e, lie). Therefore I
Jd I =
e
P
lip
I I I I I I(~ ;) I ~p I~ I; dp
du
+
dp
e
I
=
du
(p  ; ) dp
+
dp
o
l e e
=
p
+
dp = log e.
o
o
This gives
and the theorem is proved.
0
12.10 The Class Number: An Analytic Expression Theorem 10.1.
h(d)
=
{ W~ Jd
K(d),
1K(d), oge
Proof Let
if d < 0, if d> 0.
322
12. Binary Quadratic Forms
be a representative system. From Theorem 4.1 we have
I
I
H(7:, F) =
II/I(k,F)
l~k~T
F
F
(k,d)= 1
I
1 ~k~T (k,d) 1
=w
I/I(k)
=
I (d) .
I
1 ~k~T nlk (k,d) 1
=
n
From Theorem 7.1 and Theorem 9.2 we have h(d) { 2n } cp(I~1) loge Idl'
as required.
= w cp(ldl) K(d) {if d < 0, Idl
if d> 0,
0
Therefore our problem becomes that of the determination of the sum of the series K(d)
=
I l(d)  . 00
n= 1
n n
12.11 The Fundamental Discriminants Definition. By a fundamental discriminant we mean a discriminant d which has no odd prime square divisor, and d is odd or d == 8 or 12 (mod 16). For example: 5, 8,12,13,17,21,24,28,29, ... are fundamental discriminants. Theorem 11.1. Each discriminant d is uniquely expressible as fm 2 where f is a fundamental discriminant. Proof 1) If d is odd, then we let m 2 be the largest square that divides d. Write d = fm 2 for the required result. 2) If d is even, then we first write d = qr2 where r2 is the largest square that divides d. Clearly 21 r. If q == 1 (mod 4), then q is a fundamental discriminant. If q == 2 or 3 (mod 4), then we takef = 4q so that from 4q == 8 or 12 (mod 16) we see
that f is a fundamental discriminant. 3) Uniqueness. Let d = fm 2, m > andfbe a fundamental discriminant. If fis odd, thenfhas no square divisor so that m 2 is the largest square divisor of d. Iffis even, then f == 8 or 12 (mod 16), hence 4%f/4 and therefore (2m)2 is the largest square divisor of d. From this we see that the uniqueness property follows. 0
°
Theorem 11.2. Let d = fm 2 be the representation in Theorem 11.1. Then K(d) =
n (1  ([)~)K(f). P P plm
323
12.12 The Class Number Formula
Proof We have
L (d)  I = L (m2/)  I co
=
K(d)
co
n= 1
n n
n
n= 1
I  L (I) n n'
n
co
n= 1

(m,n)= 1 Let the standard factorization of m be pill . .. p!s. Then from Theorem 1.7.1 we have
K(d) = K(f)  L (£)~K(f) pdm Pi Pi
=
11 (I  ([)~)K(f).
D
p P
plm
We see from this theorem that we need only determine the values for Exercise. Show that if d is a fundamental discriminant then character mod Idl.
12.12 The Class Number Formula We now assume that d is a fundamental discriminant. Let
Ji = {+Ji, i~1,
Theorem 12.1.
If 0
land d == 0 or 1 (mod 4). Also let xo, Yo be the solution to
in which Xo
+ flyo
is least (xo > 0, Yo> 0) and let B=
Xo
+ flyo 2
.
The aim of this section is to prove that
Let d ~ m 2f, where 1 is a fundamental discriminant. Theorem 13.1. Letl> 0, and A* be the least nonnegative integer
1+ 1 I I
A
A*
(I)
1(jj::=A* + l) I n I "'~"2 jj. a
a= 1 n= 1
== A (mod!). Then
327
12.13 The Least Solution to Pelt's Equation
Proof We can prove, from Theorem 3.3, that
f ±(£)n =
0,
a=ln=l
so that
(I)
(I)
A A* 2:2:=2:2:. n n a
a=l n=l
a
a=l n=l
Also we can use the same method as in Theorem 7.9.2 to prove that
A*+I 1£ f(£)I~~(Jj_A*+I) Jj' I
a=ln=l
and so the required result follows.
n
""2
D
Theorem 13.2. Let d> 1. Then
Proof A direct computation gives
if (m,n)
=
I
if (m,n) > 1. Therefore
328
12. Binary Quadratic Forms
(Sincef~ 5 implies 1 < ~
Theorem 13.3. Let d
t.Jj; also Lklm 1 ::;:; m.)
0
5. Then K(d)
0).
(4)
From (4) we see that, given any $\sigma_0 > 0$, the series for $L_d(s)$ is uniformly convergent in any finite region of the half plane $\sigma \ge \sigma_0$. Since $\sigma_0$ can be any small positive number, it follows that $L_d(s)$ is analytic in the half plane $\sigma > 0$. We now let $n_1 = 1$ and $n_2 \to \infty$, so that (2) follows. From Theorem 10.1 and $h(d) \ge 1$, $\log\varepsilon > 0$, we see that
$$L_d(1) = K(d) > 0.$$
Separating the sum for $L_d(1)$ into two parts,
$$L_d(1) = \sum_{n=1}^{\infty}\left(\frac{d}{n}\right)\frac{1}{n} = \sum_{n=1}^{|d|}\left(\frac{d}{n}\right)\frac{1}{n} + \sum_{n=|d|+1}^{\infty}\left(\frac{d}{n}\right)\frac{1}{n},$$
we see that the first part satisfies
$$\left|\sum_{n=1}^{|d|}\left(\frac{d}{n}\right)\frac{1}{n}\right| \le \sum_{n=1}^{|d|}\frac{1}{n} < 1 + \log|d|,$$
while, by (4), the second part satisfies
II
(~)~I ~_Idl
1).
(2)
From (1) and (2) we have
$$f(s) = \prod_{p}\left(\sum_{m=0}^{\infty}a_{p^m}p^{-ms}\right) \qquad (\sigma > 1). \qquad (3)$$
Suppose now that the standard factorization of $n$ is $p_1^{l_1}\cdots p_s^{l_s}$. We define
$$a_n = a_{p_1^{l_1}}\cdots a_{p_s^{l_s}},$$
so that $a_n$ is defined for all natural numbers $n$, and is a multiplicative function satisfying $a_1 = 1$, $a_n \ge 0$. Again, by Theorem 5.4.4 and (3), we see that
$$f(s) = \sum_{n=1}^{\infty}a_nn^{-s} \qquad (\sigma > 1),$$
where $a_n$ has the properties stated in the theorem. □
Theorem 15.2. Let $d$ and $d_1$ be two fundamental discriminants with $|d| > |d_1| > 1$, such that $dd_1$ is a discriminant. Let $f(s)$ be defined as in Theorem 15.1, and let $\rho = L_d(1)L_{d_1}(1)L_{dd_1}(1)$. Then, for $0 < \delta < \alpha < 1$ ($\delta$ being any fixed positive number less than 1), we have
$$f(\alpha) > \frac12 - \frac{C_1\rho}{1 - \alpha}|dd_1|^{C_2(1-\alpha)},$$
where $C_1, C_2$ are positive constants depending on $\delta$.

Proof. The function $f(s) - \rho/(s - 1)$ is analytic in $|s - 2| < 1$ and has the Taylor expansion
$$f(s) - \frac{\rho}{s - 1} = \sum_{m=0}^{\infty}(b_m - \rho)(2 - s)^m, \qquad (4)$$
where
$$b_0 = f(2), \qquad b_m = (-1)^m\frac{f^{(m)}(2)}{m!} \quad (m = 1, 2, \ldots).$$
By Theorem 15.1, we have $f(2) \ge 1$ and
$$(-1)^mf^{(m)}(2) = \sum_{n=1}^{\infty}a_nn^{-2}\log^m n \ge 0 \qquad (m = 1, 2, \ldots),$$
that is,
$$b_0 \ge 1, \qquad b_m \ge 0 \quad (m = 1, 2, \ldots). \qquad (5)$$
From Theorems 14.2 and 14.3 we know that $f(s) - \rho/(s - 1)$ is analytic in the right half plane $\sigma > 0$, so that the expansion (4) actually holds for $|s - 2| < 2$. We now apply Theorem 14.1 to give an upper bound for $|b_m - \rho|$. For this purpose, we first seek an upper bound for $|f(s) - \rho/(s - 1)|$ on the circle $|s - 2| = (2 - \delta)/\xi$, where $\xi$ is a number satisfying $0 < \xi < 1$ and $1 < (2 - \delta)/\xi < 2$. From Theorem 14.2 and Theorem 14.3 we have
~ (_1 _ + ~) (~)3Iddtl2 Is  11 (J
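$(s \ne 1,\ \sigma > 0). \qquad (6)$

Since $|s - 2| = (2 - \delta)/\xi$, we have
$$\sigma \ge 2 - \frac{2 - \delta}{\xi} > 0, \qquad |s - 1| \ge \frac{2 - \delta}{\xi} - 1 > 0,$$
and hence
$$|f(s)| \le C_3|dd_1|^2, \qquad |s - 2| = \frac{2 - \delta}{\xi},$$
where $C_3$ is a positive constant depending only on $\delta$ and $\xi$. Also, from Theorem 14.3, we have $|\rho| \le |dd_1|^2$, so that
$$\left|f(s) - \frac{\rho}{s - 1}\right| \le C_4|dd_1|^2, \qquad |s - 2| = \frac{2 - \delta}{\xi}, \qquad (7)$$
and, from the maximum modulus theorem, we see that (7) also holds for $|s - 2| \le (2 - \delta)/\xi$. Therefore, from Theorem 14.1, we have
$$|b_m - \rho| \le C_4|dd_1|^2\left(\frac{\xi}{2 - \delta}\right)^m, \qquad m = 0, 1, 2, \ldots. \qquad (8)$$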
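We can now obtain a lower bound for $f(\alpha)$ from the expansion (4). We have
$$f(\alpha) = -\frac{\rho}{1 - \alpha} + \sum_{m=0}^{m_0-1}(b_m - \rho)(2 - \alpha)^m + \sum_{m=m_0}^{\infty}(b_m - \rho)(2 - \alpha)^m,$$
and, by (5), we have
$$\sum_{m=0}^{m_0-1}(b_m - \rho)(2 - \alpha)^m \ge 1 - \rho\sum_{m=0}^{m_0-1}(2 - \alpha)^m = 1 - \rho\,\frac{(2 - \alpha)^{m_0} - 1}{1 - \alpha},$$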
while, by (8), we have
giving (9)
12.15 Siegel's Theorem
We now choose
so that
mo
I),
and
(2  a)mo
O. Then
$$\frac{1}{L_d(1)} = O(|d|^{\epsilon}).$$

Proof. We can assume that $0 < \epsilon < \frac12$. Let
$$f(s) = \zeta(s)L_d(s)L_{d_1}(s)L_{dd_1}(s), \qquad \rho = L_d(1)L_{d_1}(1)L_{dd_1}(1), \qquad (10)$$
where $d_1$ is chosen as follows. If there is a fundamental discriminant $d_1$ such that $L_{d_1}(\sigma)$ has a zero in $1 - \epsilon < \sigma < 1$, then we take this $d_1$ to be the $d_1$ in (10), and we denote by $\alpha$ any zero of $L_{d_1}(\sigma)$ in this interval, so that $f(\alpha) = 0$. If there is no fundamental discriminant $d_1$ such that $L_{d_1}(\sigma)$ has a zero in $1 - \epsilon < \sigma < 1$, then we take any fundamental discriminant $d_1$. In this case, if $f(\sigma)$ has zeros in $1 - \epsilon < \sigma < 1$, then we take $\alpha$ to be any one of its zeros, so that $f(\alpha) = 0$; if $f(\sigma)$ has no zero in the interval either, then we take $\alpha$ to be any point in the interval $1 - \epsilon < \sigma < 1$. In this last case, since $f(\sigma)$ has no zero, it has a fixed sign. Moreover, $\rho > 0$ by Theorem 14.3, and since $f(s) - \rho/(s - 1)$ is analytic in the right half plane, we see that $f(\sigma) \to -\infty$ as $\sigma$ tends to $1$ from the left, and we deduce that $f(\sigma)$ is negative in $1 - \epsilon < \sigma < 1$. Therefore, no matter how we choose $d_1$ or $\alpha$, we always have $f(\alpha) \le 0$.
Let $|d| > |d_1|$. From Theorem 15.2 (taking $\delta = \frac12$, so that $0 < \delta < 1 - \epsilon < \alpha < 1$), we have
$$f(\alpha) > \frac12 - \frac{C_1\rho}{1 - \alpha}|dd_1|^{C_2(1-\alpha)}, \qquad (11)$$
where $C_1$ and $C_2$ are absolute positive constants. Therefore, since $f(\alpha) \le 0$,
$$\frac{1}{L_d(1)} < \frac{2C_1}{1 - \alpha}L_{d_1}(1)L_{dd_1}(1)|dd_1|^{C_2(1-\alpha)} = CL_{dd_1}(1)|d|^{C_2(1-\alpha)},$$
where $C$ is a constant which does not depend on $d$. When $|d| > |d_1| > 1$ we have $L_{dd_1}(1) \le 2 + \log|dd_1|$
0 we have, by Theorem
15.3 and Theorem 14.3, (12) and so by Theorem 10.1
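$$C_{10}|d|^{\frac12 - \epsilon} \le h(d)\begin{cases}\log\varepsilon, & d > 0,\\ 1, & d < 0,\end{cases} \le C_{11}|d|^{\frac12 + \epsilon},$$
which is the required result. 2) If $d$ is not a fundamental discriminant and $d = fm^2$, where $f$ is a fundamental discriminant, then from
$$K(d) = \prod_{p \mid m}\left(1 - \left(\frac{f}{p}\right)\frac{1}{p}\right)K(f)$$
we have
$$C_{13}m^{-\epsilon}K(f) \le K(d) \le C_{12}m^{\epsilon}K(f).$$
From (12) we arrive at
$$C_{14}|d|^{\frac12 - \epsilon} \le |d|^{\frac12}K(d) \le C_{15}|d|^{\frac12 + \epsilon},$$
and the theorem follows from Theorem 10.1. □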
Notes

12.1. The method of D. A. Burgess (see Note 7.2) can be used to give an improved estimate for the least solution $\varepsilon = (x_0 + \sqrt{d}\,y_0)/2$ of Pell's equation $x^2 - dy^2 = 4$, $d > 0$, $d \equiv 0$ or $1 \pmod 4$. The result is: corresponding to every $\delta > 0$ there exists a constant $c(\delta)$ such that
$$\log\varepsilon < \left(\tfrac14 + \delta\right)\sqrt{d}\,\log d$$
whenever $d > c(\delta)$ (see Y. Wang [63]).
Chapter 13. Unimodular Transformations
13.1 The Complex Plane

Let $z = x + yi$ be a complex number which is represented by a point $P$ on a plane with coordinates $(x, y)$. From the origin $O$ we construct a directed line to $P$, and we call this line the vector $OP$. There is a bijection between $z$ and $P$, so that every complex number now corresponds to a vector from the origin.

[Figure: the point $P(x, y)$ and the vector $OP$, making the angle $\theta$ with the positive $x$-axis]
The distance from $O$ to $P$, also known as the length of the vector $OP$, is given by $\rho = \sqrt{x^2 + y^2}$, and is the same as the absolute value of $z$. The angle $\theta$ measured from the positive $x$-axis to the vector $OP$ is called the argument of $z$. We have
$$x = \rho\cos\theta, \qquad y = \rho\sin\theta,$$
and $(\rho, \theta)$ are referred to as the polar coordinates of the point $(x, y)$. Clearly we have
$$z = x + yi = \rho(\cos\theta + i\sin\theta) = \rho e^{i\theta}.$$
We usually write $\arg z = \theta$. The circle with centre $c$ and radius $r$ ($\ge 0$) can be represented by the equation
$$|z - c| = r,$$
and the particular circle $|z| = 1$ is called the unit circle. We next investigate the bilinear transformation
$$z' = \frac{az + b}{cz + d}, \qquad (1)$$
where $a, b, c, d$ are (in general complex) constants and $ad - bc \ne 0$. This transformation maps a point $z$ ($\ne -d/c$) in the plane into another point $z'$. Corresponding to the point $z = -d/c$ we introduce an ideal point, called the point at infinity, for its image, and we write $z' = \infty$. Our discussion is concerned with the plane together with this ideal point. This is often called the extended complex plane, but in this chapter we shall simply call it the complex plane. Corresponding to the point $z = \infty$ we have the image $z' = a/c$. If we solve (1) for $z$, we have
$$z = \frac{dz' - b}{-cz' + a},$$
which is also a bilinear transformation, known as the inverse transformation of (1). We see therefore that the transformation (1) is a bijection from the complex plane onto itself. Let us place a sphere on the complex plane with point of contact at the origin. We may refer to this point of contact as the "south pole", and to the point on the sphere which is diametrically opposite to it as the "north pole". Consider a line joining a point $z$ on the plane to the "north pole". This line crosses the sphere at a point, and if we map the point $z$ onto this point and the point at infinity onto the "north pole", we see at once that this sets up a bijection between the complex plane and the surface of the sphere. This replacement of the abstract notion of the complex plane with an ideal point by the concrete notion of the surface of a sphere is due to Riemann, and we often call the sphere here the Riemann sphere.
13.2 Properties of the Bilinear Transformation

Corresponding to a bilinear transformation
$$A:\quad z' = \frac{az + b}{cz + d}, \qquad (1)$$
there is a matrix
$$\begin{pmatrix} a & b\\ c & d\end{pmatrix} \qquad (2)$$
whose determinant is $ad - bc$ ($\ne 0$), which we call the determinant of the transformation. Note that different matrices may correspond to the same transformation, since the matrices
$$\begin{pmatrix} a\rho & b\rho\\ c\rho & d\rho\end{pmatrix} \qquad (\rho \ne 0)$$
all represent the same transformation (1). However, it is not difficult to prove that, apart from this situation, there is no other matrix which corresponds to the transformation (1). We can choose $\rho$ so that $\rho^2(ad - bc) = 1$, so that there is always a unit determinant matrix to represent the bilinear transformation $A$. It is easy to show that there are only two unit determinant matrices which correspond to a given bilinear transformation, namely (when $ad - bc = 1$) the matrices
$$\pm\begin{pmatrix} a & b\\ c & d\end{pmatrix}.$$
Let there be another bilinear transformation
$$B:\quad z'' = \frac{a'z' + b'}{c'z' + d'}, \qquad (3)$$
so that we have the bilinear transformation
$$C:\quad z'' = \frac{a'(az + b) + b'(cz + d)}{c'(az + b) + d'(cz + d)} = \frac{(a'a + b'c)z + a'b + b'd}{(c'a + d'c)z + c'b + d'd}, \qquad (4)$$
with corresponding matrix
$$\begin{pmatrix} a'a + b'c & a'b + b'd\\ c'a + d'c & c'b + d'd\end{pmatrix},$$
known as the product of the two matrices $\begin{pmatrix} a' & b'\\ c' & d'\end{pmatrix}$ and $\begin{pmatrix} a & b\\ c & d\end{pmatrix}$, and we write
$$\begin{pmatrix} a' & b'\\ c' & d'\end{pmatrix}\begin{pmatrix} a & b\\ c & d\end{pmatrix} = \begin{pmatrix} a'a + b'c & a'b + b'd\\ c'a + d'c & c'b + d'd\end{pmatrix}.$$
The transformation (4) is also referred to as the product of the transformations (3) and (1), and we write $C = BA$. Note however that $BA$ is not necessarily the same as $AB$. We denote by $A^{-1}$ the inverse transformation of $A$. The transformation $z' = z$ is called the identity transformation and is denoted by $E$. We have $AA^{-1} = A^{-1}A = E$.
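The correspondence between transformations and matrices can be checked numerically. In this sketch (the helper names `apply` and `mul` are our own; matrices are nested tuples), the matrix product $BA$ gives the same image as applying $A$ first and then $B$, while $AB$ generally gives a different point.

```python
def apply(M, z):
    """Image of z under z' = (a z + b)/(c z + d), where M = ((a, b), (c, d))."""
    (a, b), (c, d) = M
    return (a * z + b) / (c * z + d)


def mul(X, Y):
    """Matrix product XY; as transformations, XY means 'first Y, then X'."""
    (a2, b2), (c2, d2) = X
    (a, b), (c, d) = Y
    return ((a2 * a + b2 * c, a2 * b + b2 * d),
            (c2 * a + d2 * c, c2 * b + d2 * d))


A = ((1, 1), (0, 1))     # z' = z + 1
B = ((0, -1), (1, 0))    # z' = -1/z
z = 0.3 + 2j
print(apply(mul(B, A), z), apply(B, apply(A, z)))   # C = BA: the same value
print(apply(mul(A, B), z))                          # AB: a different value
print(apply(((2, 2), (0, 2)), z), apply(A, z))      # scaled matrix, same map
```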
Definition 1.* Let a set of bilinear transformations have the following three properties: (i) it contains the identity transformation; (ii) the product of any two transformations in the set is also in the set; (iii) the inverse of any transformation in the set is also in the set. Then we say that the set of transformations forms a group.

Example 1. The set of all bilinear transformations forms a group.

Example 2. The set of all bilinear transformations with real coefficients forms a group.

Example 3. The set of all bilinear transformations with real coefficients and positive determinant forms a group.

Example 4. The set of all bilinear transformations with integer coefficients $a, b, c, d$ satisfying $ad - bc = \pm 1$ forms a group.

Example 5. The set of bilinear transformations with complex integer coefficients (that is, $a = a' + a''i$ with $a', a''$ integers, and similarly for $b, c, d$) forms a group.

* The three properties here are interrelated, but they suffice for our purpose of keeping matters simple.
Definition 2. If the image of $z_0$ under the transformation $A$ is $z_0$ itself, then we call $z_0$ a fixed point of $A$. In general a bilinear transformation has two distinct fixed points (obtained from $z' = z$). They are the two roots of the quadratic equation
$$cz^2 + (d - a)z - b = 0. \qquad (5)$$
If $z_1, z_2$ are the two roots of this equation, then we can rewrite the transformation in the standard form
$$\frac{z' - z_1}{z' - z_2} = \lambda\,\frac{z - z_1}{z - z_2}. \qquad (6)$$
Taking $z = \infty$, so that $z' = a/c$, we can specify $\lambda$ as
$$\lambda = \frac{a - cz_1}{a - cz_2}.$$
It is easy to show that $\lambda$ satisfies the quadratic equation
$$\lambda + \frac{1}{\lambda} = \frac{a^2 + d^2 + 2bc}{ad - bc} = \frac{(a + d)^2}{ad - bc} - 2. \qquad (7)$$
If $|\lambda| = 1$, $\lambda \ne 1$, then we say that the transformation is elliptic. If $\lambda$ is real and not equal to $\pm 1$, then we say that the transformation is hyperbolic. If $\lambda$ is complex and $|\lambda| \ne 1$, then we say that the transformation is loxodromic. If $c = 0$ and $d - a \ne 0$, then one of the fixed points is the point at infinity. Taking $z_2 = \infty$, equation (6) then becomes
$$z' - z_1 = \lambda(z - z_1). \qquad (8)$$
If the two fixed points coincide, that is $z_1 = z_2$, then
$$(a - d)^2 + 4bc = 0 \quad\text{or}\quad (a + d)^2 + 4(bc - ad) = 0. \qquad (9)$$
A transformation satisfying this condition is said to be parabolic. Substituting (9) into (7) gives $\lambda = 1$, and the standard equation (6) becomes
$$\frac{1}{z' - z_1} = \frac{1}{z - z_1} + k, \qquad (10)$$
where $z_1 = (a - d)/2c$, $k = 2c/(a + d)$. In particular, when $c = 0$, $a - d = 0$, this fixed point becomes the point at infinity, and the transformation then becomes
$$z' = z + k, \qquad k = b/a.$$
If, on repeated application of a transformation, the product becomes the identity transformation, then we call the transformation a transformation of finite order. In this case, the period of the transformation is defined to be the least number of applications required to produce the identity transformation. Writing $z^{(n)}$ for the image of $z$ after $n$ applications, repeated applications of (10) and (6) give
$$\frac{1}{z^{(n)} - z_1} = \frac{1}{z - z_1} + nk, \qquad \frac{z^{(n)} - z_1}{z^{(n)} - z_2} = \lambda^n\,\frac{z - z_1}{z - z_2},$$
so that the parabolic, hyperbolic and loxodromic transformations are not of finite order. Only for elliptic transformations can we have $\lambda^n = 1$, and the period is then the least positive integer $n$ such that $\lambda^n = 1$. When $n = 2$, so that $\lambda = -1$, we call the transformation an involution.
13.3 Geometric Properties of the Bilinear Transformation

Theorem 3.1. The cross ratio is invariant under a bilinear transformation.

Proof. Let
$$z_i' = \frac{az_i + b}{cz_i + d},$$
so that
$$z_i' - z_j' = \frac{(ad - bc)(z_i - z_j)}{(cz_i + d)(cz_j + d)},$$
and hence, in the cross ratio of $z_1', z_2', z_3', z_4'$, all the factors $ad - bc$ and $cz_i + d$ cancel, leaving the cross ratio of $z_1, z_2, z_3, z_4$. □
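Given any three points $z_1, z_2, z_3$, there exists a bilinear transformation which maps them onto any three specified points $z_1', z_2', z_3'$. This transformation can be written down explicitly as
$$\frac{z' - z_1'}{z' - z_2'}\cdot\frac{z_3' - z_2'}{z_3' - z_1'} = \frac{z - z_1}{z - z_2}\cdot\frac{z_3 - z_2}{z_3 - z_1}, \qquad (1)$$
or
$$\frac{(z_3' - z_2')(z' - z_1')}{(z_3' - z_1')(z' - z_2')} = \frac{(z_3 - z_2)(z - z_1)}{(z_3 - z_1)(z - z_2)},$$
or
$$(z_1'z_2'z_3'z') = (z_1z_2z_3z). \qquad (2)$$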
If there is a bilinear transformation with the above property, then, by Theorem 3.1, once $z$ has been specified, $z'$ must satisfy (2). That is, $z'$ is uniquely determined. Therefore a bilinear transformation with the above property is unique. In other words, (2) is a general form for a bilinear transformation. Let $A_1, A_2, A_3, P$ be the points representing $z_1, z_2, z_3, z$ respectively. Then we have
$$\arg(z_1z_2z_3z) = \angle PA_2A_3 - \angle PA_1A_3,$$
where the direction of each signed angle is as shown in the diagram. From this we see that if the cross ratio is a real number, then $\angle PA_2A_3 - \angle PA_1A_3$ must be a multiple of $\pi$, and hence $P$ lies on the circle through the three points $A_1, A_2, A_3$.

If $(z_1z_2z_3z)$ is real, then by (2), $(z_1'z_2'z_3'z')$ is also real, so that as $z$ describes the circle through $z_1, z_2, z_3$, the point $z'$ describes the circle through $z_1', z_2', z_3'$, and conversely. We have therefore proved that a bilinear transformation maps circles into circles. Note, however, that in the present context a straight line is interpreted as a circle with infinite radius.
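Formula (2) also gives a practical recipe: send $z_1, z_2, z_3$ to $0, \infty, 1$ by $z \mapsto (z_1z_2z_3z)$, then undo the corresponding map for $z_1', z_2', z_3'$. The sketch below (helper names are ours) assumes the cross-ratio convention $(z_1z_2z_3z_4) = \frac{(z_3 - z_2)(z_4 - z_1)}{(z_3 - z_1)(z_4 - z_2)}$, which is the one that makes (1) and (2) agree, and represents the inverse map by the adjugate matrix.

```python
def apply(M, z):
    (a, b), (c, d) = M
    return (a * z + b) / (c * z + d)


def mul(X, Y):
    (a, b), (c, d) = X
    (e, f), (g, h) = Y
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def cross_ratio(z1, z2, z3, z4):
    """(z1 z2 z3 z4) = (z3 - z2)(z4 - z1) / ((z3 - z1)(z4 - z2))."""
    return (z3 - z2) * (z4 - z1) / ((z3 - z1) * (z4 - z2))


def to_standard(z1, z2, z3):
    """Matrix of z -> (z1 z2 z3 z), which sends z1, z2, z3 to 0, infinity, 1."""
    return ((z3 - z2, -(z3 - z2) * z1), (z3 - z1, -(z3 - z1) * z2))


def adjugate(M):
    """Matrix of the inverse transformation (the factor 1/det is irrelevant)."""
    (a, b), (c, d) = M
    return ((d, -b), (-c, a))


def three_point_map(zs, ws):
    """Matrix of the bilinear map with zs[i] -> ws[i], i.e. formula (2)."""
    return mul(adjugate(to_standard(*ws)), to_standard(*zs))


zs, ws = (0, 1, 1j), (2, 3j, -1)
M = three_point_map(zs, ws)
print([apply(M, z) for z in zs])                              # about [2, 3j, -1]
z4 = 2 + 0.5j
print(cross_ratio(*zs, z4), cross_ratio(*ws, apply(M, z4)))   # equal (Theorem 3.1)
```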
Theorem 3.2. A bilinear transformation preserves the angle of intersection between two circles. That is, if two circles intersect with angle $\theta$, then the two image circles under a bilinear transformation also intersect with angle $\theta$.

Proof. Let the two circles intersect at $z_1$ and $z_2$, and take two points $z_3, z_4$ in the neighbourhood of $z_1$ on the two circles. The argument of the cross ratio, $\arg(z_3z_4z_1z_2)$, is $\angle z_3z_2z_4 - \angle z_3z_1z_4$. As $z_3$ and $z_4$ both tend to $z_1$, the angle $\angle z_3z_2z_4$ tends to $0$, while $\angle z_3z_1z_4$ tends to the angle of intersection of the two circles. Since the cross ratio is invariant under the bilinear transformation, the theorem is proved. □
13.4 Real Transformations

We now consider the transformation
$$z' = \frac{az + b}{cz + d}, \qquad ad - bc \ne 0,$$
where $a, b, c, d$ are real numbers. Here we cannot always choose $\rho$ so that $\rho^2(ad - bc) = 1$; we can only choose $\rho$ so that $\rho^2(ad - bc) = \pm 1$. From now on we shall assume that $ad - bc = \pm 1$.

The set of all real bilinear transformations with determinant 1 forms a group, which we denote by $\mathfrak{G}$. Clearly members of this group map the real axis onto itself. Moreover, given any three real numbers, there is a real bilinear transformation (of determinant $\pm 1$) which maps them onto any three specified real numbers.

Theorem 4.1. Members of $\mathfrak{G}$ map the upper half plane (that is, $y > 0$) onto itself.

Proof. Let $z' = x' + iy'$, $z = x + iy$, $\bar z = x - iy$. Then
$$2iy' = z' - \bar z' = \frac{az + b}{cz + d} - \frac{a\bar z + b}{c\bar z + d} = \frac{2(ad - bc)iy}{|cz + d|^2}, \qquad (1)$$
and the required result follows. □
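Identity (1) is easy to confirm numerically. The following sketch (the matrices and the point are chosen by us for illustration) compares $y'$ with $(ad - bc)\,y/|cz + d|^2$ for a real transformation of determinant 1, and shows that the determinant $-1$ map $z \mapsto 1/z$ sends the upper half plane to the lower one.

```python
def image(M, z):
    (a, b), (c, d) = M
    return (a * z + b) / (c * z + d)


M = ((2.0, 1.0), (1.0, 1.0))        # real, ad - bc = 1
z = 0.3 + 0.7j
(a, b), (c, d) = M
w = image(M, z)
predicted = (a * d - b * c) * z.imag / abs(c * z + d) ** 2   # from (1)
print(w.imag, predicted)            # equal, and positive

R = ((0.0, 1.0), (1.0, 0.0))        # z -> 1/z, determinant -1
print(image(R, z).imag)             # negative: the image is in the lower half plane
```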
Definition 1. A semicircle centred on the $x$-axis lying in the upper half plane is called a geodesic.

From Theorem 4.1 and Theorem 3.2 we have:

Theorem 4.2. Members of $\mathfrak{G}$ transform geodesics into geodesics. □
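Let $z_1, z_2$ be any two points in the upper half plane. If a member of $\mathfrak{G}$ maps $z_1, z_2$ into $z_1', z_2'$ respectively, then clearly (by the formula for $z_i' - z_j'$ in §3 and by (1))
$$\frac{|z_2' - z_1'|^2}{y_1'y_2'} = \frac{|z_2 - z_1|^2}{y_1y_2}.$$
Taking $z_2 = z + \Delta z$, $z_1 = z$ and letting $\Delta z \to 0$, we have
$$\frac{|dz|^2}{y^2} = \frac{|dz'|^2}{y'^2}, \quad\text{or}\quad \frac{dx^2 + dy^2}{y^2} = \frac{dx'^2 + dy'^2}{y'^2}.$$
From this we see that the metric
$$\frac{\sqrt{dx^2 + dy^2}}{y} \qquad (2)$$
is invariant under transformations in $\mathfrak{G}$. The area element
$$\frac{dx\,dy}{y^2} \qquad (3)$$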
which corresponds to this metric is also invariant under transformations in $\mathfrak{G}$. Readers who are not familiar with differential geometry can prove the invariance of (2) and (3) under members of $\mathfrak{G}$ by a direct method.

Theorem 4.3. Let $z_1, z_2$ be two points in the upper half plane, and let $C$ be a smooth curve lying in the upper half plane joining $z_1$ and $z_2$. Then the value of the integral
$$\int_C\frac{\sqrt{dx^2 + dy^2}}{y}$$
is minimum when $C$ is (part of) a geodesic.

Proof. Construct a circle with centre on the $x$-axis passing through $z_1$ and $z_2$. Denote its centre by $(t, 0)$, so that the circle is described by
$$x = t + \rho\cos\theta, \qquad y = \rho\sin\theta.$$
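Let $\theta = \theta_1$ and $\theta_2$ when $z = z_1$ and $z_2$ respectively. Now the curve $C$ can be described by
$$x = t + \rho(\theta)\cos\theta, \qquad y = \rho(\theta)\sin\theta,$$
and hence
$$\int_C\frac{\sqrt{dx^2 + dy^2}}{y} = \int_{\theta_1}^{\theta_2}\frac{\sqrt{(\rho'(\theta)\cos\theta - \rho(\theta)\sin\theta)^2 + (\rho'(\theta)\sin\theta + \rho(\theta)\cos\theta)^2}}{\rho(\theta)\sin\theta}\,d\theta$$
$$= \int_{\theta_1}^{\theta_2}\sqrt{1 + \left(\frac{\rho'(\theta)}{\rho(\theta)}\right)^2}\,\frac{d\theta}{\sin\theta} \ge \int_{\theta_1}^{\theta_2}\frac{d\theta}{\sin\theta} = \log\frac{\tan\frac12\theta_2}{\tan\frac12\theta_1}.$$
This shows that the value of the integral is minimum when and only when $\rho'(\theta) = 0$, that is, when $\rho(\theta) = \rho$ is constant. □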
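[Figure 1: the geodesic through $z_1$ and $z_2$, meeting the $x$-axis at $A$ and $B$, with centre $C$]

The above proof actually gives the minimum value of the integral along the geodesic. We can interpret this value geometrically as follows. Let the geodesic through $z_1, z_2$ intersect the $x$-axis at the points $A$, $B$, with its centre at $C$ (see Fig. 1). Then we have
$$\tan\tfrac12\theta_1 = \frac{Bz_1}{z_1A}, \qquad \tan\tfrac12\theta_2 = \frac{Bz_2}{z_2A}.$$
Therefore
$$\log\frac{\tan\frac12\theta_2}{\tan\frac12\theta_1} = \log\left(\frac{Bz_2}{z_2A}\cdot\frac{z_1A}{Bz_1}\right).$$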
Definition 2. The minimum value of the integral in Theorem 4.3 is called the non-Euclidean distance between the two points $z_1$ and $z_2$.
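Definition 2 can be checked against a closed formula. The sketch below is our own: it assumes $\operatorname{Re} z_1 \ne \operatorname{Re} z_2$ (so that the geodesic is a genuine semicircle), locates the centre $(t, 0)$ from $|z_1 - t| = |z_2 - t|$, and evaluates $|\log(\tan\frac12\theta_2/\tan\frac12\theta_1)|$ as in Theorem 4.3. For comparison it uses the standard formula $\cosh^{-1}\bigl(1 + |z_1 - z_2|^2/(2y_1y_2)\bigr)$, which is not stated in the text, and checks invariance under the determinant 1 map $z \mapsto (2z + 1)/(z + 1)$.

```python
import math


def geodesic_distance(z1, z2):
    """Non-Euclidean distance via the semicircle through z1, z2 (Re z1 != Re z2)."""
    x1, y1, x2, y2 = z1.real, z1.imag, z2.real, z2.imag
    t = (x2 * x2 + y2 * y2 - x1 * x1 - y1 * y1) / (2 * (x2 - x1))  # centre (t, 0)
    th1 = math.atan2(y1, x1 - t)     # polar angles about the centre
    th2 = math.atan2(y2, x2 - t)
    return abs(math.log(math.tan(th2 / 2) / math.tan(th1 / 2)))


def closed_form(z1, z2):
    """Standard formula arccosh(1 + |z1 - z2|^2 / (2 y1 y2))."""
    return math.acosh(1 + abs(z1 - z2) ** 2 / (2 * z1.imag * z2.imag))


def g(z):                            # a member of the group: determinant 1
    return (2 * z + 1) / (z + 1)


z1, z2 = 1j, 1 + 1j
print(geodesic_distance(z1, z2), closed_form(z1, z2))   # both about 0.9624
print(geodesic_distance(g(z1), g(z2)))                  # unchanged by g
```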
Definition 3. In this chapter the curvilinear triangular region between three geodesics will be called a triangle.

Theorem 4.4. The non-Euclidean area
$$\iint\frac{dx\,dy}{y^2}$$
of a triangle $ABC$ is given by
$$\pi - \angle A - \angle B - \angle C.$$
[Figure 2: a triangle $ABC$]

Proof. 1) We first consider the case $\angle B = \angle C = 0$ (see Fig. 3).

[Figure 3: the case $\angle B = \angle C = 0$]

It is not difficult to prove that there is a real bilinear transformation which maps $B$ to the point at infinity, $C$ to the point $1$, and $D$ to the point $-1$ (or $C$ to $-1$ and $D$ to $1$), and that its determinant is positive*. Thus Fig. 3 is transformed into Fig. 4. Let the coordinates of $A$ be $(x_0, y_0)$. Then
$$\iint\frac{dx\,dy}{y^2} = \int_{x_0}^{1}\int_{\sqrt{1 - x^2}}^{\infty}\frac{dy}{y^2}\,dx = \int_{x_0}^{1}\frac{dx}{\sqrt{1 - x^2}} = \frac{\pi}{2} - \sin^{-1}x_0 = \pi - \angle A.$$

* The real transformation which maps $B, C, D$ into $z' = \infty, \pm 1, \mp 1$ is given by
$$z' = \pm\frac{(D - 2B + C)z + (BC - 2DC + BD)}{(C - D)z + (D - C)B},$$
and the value of its determinant is $\pm 2(D - C)(C - B)(B - D)$.
[Figure 4: the image of Fig. 3. Figure 5: the case $\angle C = 0$]

2) If $\angle C = 0$, then we use a real transformation to map $C$ to $\infty$, giving Fig. 5. From 1) we have
$$\Delta ABC = \Delta BDC - \Delta ADC = (\pi - \angle B) - (\pi - (\pi - \angle A)) = \pi - \angle A - \angle B.$$
3) If none of $\angle A$, $\angle B$, $\angle C$ is zero, as in Fig. 6,

[Figure 6: the general case]

then, by 2), we have
$$\Delta ABC = \Delta ADC - \Delta ABD = (\pi - \angle C - \angle A - \angle BAD) - [\pi - (\pi - \angle B) - \angle BAD] = \pi - \angle A - \angle B - \angle C. \qquad \square$$
From this theorem we see that the sum of the interior angles of a triangle is at most two right angles, and its value can be any number between $0$ and $\pi$. What we have described here is a model of the famous Lobachevskian geometry, which is an important tool in the study of modular functions.
13.5 Unimodular Transformations

Definition. Let $a, b, c, d$ be integers satisfying $ad - bc = 1$. Then the transformation
$$z' = \frac{az + b}{cz + d} \qquad (1)$$
is called a unimodular transformation.
It is easy to see that unimodular transformations form a group. From (7) in §2 we have
$$\lambda + \lambda^{-1} = (a + d)^2 - 2.$$
The discriminant of this quadratic equation is
$$[(a + d)^2 - 2]^2 - 4 = (a + d)^2[(a + d)^2 - 4].$$
In our discussion we may assume that $a + d \ge 0$, since otherwise we can replace $a, b, c, d$ by $-a, -b, -c, -d$.

1) If $a + d > 2$, then the transformation is hyperbolic and there are two real fixed points. These two fixed points are the roots of the quadratic equation
$$cz^2 + (d - a)z - b = 0.$$
The condition for this quadratic equation to have rational roots is that
$$(d - a)^2 + 4bc = (a + d)^2 - 4 = u^2,$$
where $u$ is an integer. Since the only solutions of $x^2 - y^2 = 4$ are $x = \pm 2$, $y = 0$, it follows that the fixed points of a hyperbolic transformation must be irrational numbers which are roots of a quadratic equation with rational coefficients. We call such numbers quadratic algebraic numbers.

2) If $a + d = 2$, then $\lambda = 1$ and we have the parabolic transformation
$$\frac{1}{z' - (a - 1)/c} = \frac{1}{z - (a - 1)/c} + c.$$
If $c = 0$, then $a = d = 1$ and we have
$$z' = z + b.$$
The former has the rational number $(a - 1)/c$ as its fixed point, while the latter has $\infty$ as its fixed point.

3) If $a + d = 1$, then $\lambda^2 + \lambda + 1 = 0$, and so $\lambda$ is $\rho = e^{2\pi i/3} = (-1 + \sqrt{-3})/2$ or $\rho^2$. The fixed points are then given by
$$z_1 = \frac{a + \rho^2}{c}, \qquad z_2 = \frac{a + \rho}{c},$$
and the standard form for the transformation is
$$\frac{z' - (a + \rho^2)/c}{z' - (a + \rho)/c} = \rho\,\frac{z - (a + \rho^2)/c}{z - (a + \rho)/c}.$$
This is an elliptic transformation whose period is 3. Replacing $\rho$ by $\rho^2$ will give another elliptic transformation with period 3.

4) If $a + d = 0$, then the equation for $\lambda$ is $(\lambda + 1)^2 = 0$, so that $\lambda = -1$, and the fixed points are the roots of
$$cz^2 - 2az - b = 0,$$
that is,
$$z = \frac{a \pm i}{c}.$$
The standard form for the transformation is
$$\frac{z' - (a + i)/c}{z' - (a - i)/c} = -\frac{z - (a + i)/c}{z - (a - i)/c}.$$
This is an elliptic transformation with period 2. Summarizing, we have:

Theorem 5.1. If $a + d = 0$, then the unimodular transformation (1) is an involution; if $a + d = \pm 1$, then it is a transformation with period 3; if $a + d = \pm 2$, then it is parabolic (unless it is the identity) and its fixed point is either a rational number or the point at infinity; if $|a + d| > 2$, then it is hyperbolic and its fixed points are real quadratic algebraic numbers. □
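Theorem 5.1 translates directly into a classification by the trace $a + d$. The sketch below (names are ours) returns the type and the fixed points, choosing the fixed point in the upper half plane in the elliptic cases, and separately finds the period by computing powers of the matrix until $\pm I$ appears; the search bound `limit` is an arbitrary assumption.

```python
from fractions import Fraction


def mul(X, Y):
    (a, b), (c, d) = X
    (e, f), (g, h) = Y
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def classify(a, b, c, d):
    """Type and fixed point(s) of z' = (az + b)/(cz + d), integers, ad - bc = 1."""
    if a * d - b * c != 1:
        raise ValueError("not unimodular")
    t = abs(a + d)
    if t == 2:
        if b == 0 and c == 0:
            return "identity", None
        if c == 0:
            return "parabolic", "infinity"
        return "parabolic", Fraction(a - d, 2 * c)      # double root of (5)
    if t > 2:                                           # here c != 0
        r = ((a + d) ** 2 - 4) ** 0.5                   # (d - a)^2 + 4bc
        return "hyperbolic", ((a - d - r) / (2 * c), (a - d + r) / (2 * c))
    # t = 0 or 1: complex fixed points; c != 0, since c = 0 forces a = d = ±1
    root = ((a - d) + 1j * (4 - (a + d) ** 2) ** 0.5) / (2 * c)
    if root.imag < 0:
        root = root.conjugate()                         # the other root
    return ("involution" if t == 0 else "period 3"), root


def period(a, b, c, d, limit=12):
    """Least n >= 1 with M^n = ±I (the period), or None if none up to limit."""
    M = P = ((a, b), (c, d))
    for n in range(1, limit + 1):
        if P in (((1, 0), (0, 1)), ((-1, 0), (0, -1))):
            return n
        P = mul(P, M)
    return None


print(classify(0, -1, 1, 0), period(0, -1, 1, 0))   # involution with fixed point i
print(classify(0, -1, 1, 1), period(0, -1, 1, 1))   # period 3, fixed point rho
print(classify(2, 1, 1, 1))                         # fixed points (1 ± sqrt 5)/2
```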
13.6 The Fundamental Region

Definition 1. Let $z, z'$ be two points in the upper half plane. Suppose that there is a unimodular transformation which maps $z$ into $z'$. Then we say that $z$ and $z'$ are equivalent, and we write $z \sim z'$. Clearly we have (i) $z \sim z$; (ii) if $z \sim z'$, then $z' \sim z$; (iii) if $z \sim z'$ and $z' \sim z''$, then $z \sim z''$.

We shall consider the following region in the upper half plane:
$$D:\quad -\tfrac12 \le x < \tfrac12, \qquad \begin{cases} x^2 + y^2 > 1 & \text{when } x > 0,\\ x^2 + y^2 \ge 1 & \text{when } x \le 0.\end{cases}$$

[Figure 7: the region $D$, bounded by the lines $x = \pm\frac12$ and the unit circle, with the corner $\rho$]
Definition 2. We call the points in $D$ reduced points, and the region $D$ the fundamental region. This region $D$ is a triangle with interior angles $0, \pi/3, \pi/3$.

Theorem 6.1. No two reduced points are equivalent.

Proof. Let $z, z'$ be two distinct reduced points and suppose that
$$z' = \frac{az + b}{cz + d}.$$
Then, by (1) in §4, we have
$$y' = \frac{y}{|cz + d|^2}.$$
We have
$$|cz + d|^2 = c^2z\bar z + cd(z + \bar z) + d^2 = c^2(x^2 + y^2) + 2cdx + d^2 \ge c^2 - |cd| + d^2 > 1,$$
where we must exclude the exceptional cases $c = \pm 1, d = 0$, or $c = 0, d = \pm 1$, or $c = \pm d = \pm 1$; in the case $c = -d = \pm 1$ we still have $|cz + d|^2 = |z - 1|^2 \ge 2 - 2x > 1$, since $x < \frac12$. Therefore, apart from the cases $c = \pm 1, d = 0$, or $c = 0, d = \pm 1$, or $c = d = \pm 1$, we always have $y' < y$.
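When $c = d = 1$ (after replacing $a, b, c, d$ by $-a, -b, -c, -d$ if necessary), we have $|cz + d|^2 = |z + 1|^2 = 1$ only when $z = \rho$. In that case, from $a - b = 1$ and $\rho^2 + \rho + 1 = 0$, we have
$$z' = \frac{a\rho + b}{\rho + 1} = -\frac{a\rho + b}{\rho^2} = -(a\rho^2 + b\rho) = \rho + a.$$
Therefore $\operatorname{Im} z' = \sqrt{3}/2$. If $z' \in D$, then $z' = \rho$, which contradicts $z, z'$ being distinct points.

We also have
$$z = \frac{dz' - b}{-cz' + a}.$$
Since $|cz + d| \ge 1$ for every pair of integers $(c, d) \ne (0, 0)$ when $z$ is reduced, we always have $y' \le y$; applying the same argument to this inverse transformation and the reduced point $z'$ gives $y \le y'$. Hence $y = y'$ and $|cz + d| = 1$, so that only the exceptional cases need to be considered. If $c = 0$, $d = \pm 1$, then $z' = z \pm b$, which is impossible for two distinct points of the strip $-\frac12 \le x < \frac12$. The case $c = d = 1$ has been dealt with above. If $c = \pm 1$, $d = 0$, then $|z| = 1$, so that $z$ must lie on the arc of the circle from $\rho$ to $i$, and $z'$ ($= -1/z$) lies on the arc of the circle from $\rho + 1$ to $i$. If $z \ne i$, then $z'$ cannot be a reduced point, and if $z = i$, then $z' = i = z$, contradicting our assumption. □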
Theorem 6.2. The number of points in the rectangle $-\frac12 \le x < \frac12$, $y \ge Y$ ($Y > 0$) which are equivalent to a fixed point is finite. That is, if we partition the rectangle into sets of mutually equivalent points, then each set has only a finite number of points.

Proof. Let $z = x + iy$ and
$$z' = \frac{az + b}{cz + d}.$$
Then we have $y' = y/|cz + d|^2$. If $y' \ge Y$, then
$$c^2(x^2 + y^2) + 2cdx + d^2 \le \frac{y}{Y}, \quad\text{or}\quad (cx + d)^2 + c^2y^2 \le \frac{y}{Y},$$
and clearly there can only be a finite number of integers $c, d$ satisfying this. Let $(c', d')$ be any such pair of integers with $(c', d') = 1$. Then all the solutions $(a, b)$ of the equation $ad' - bc' = 1$ can be represented by
$$a = a' + mc', \qquad b = b' + md',$$
where $a', b'$ is a fixed solution (that is, $a'd' - b'c' = 1$) and $m$ is any integer. Thus
$$z' = \frac{az + b}{c'z + d'} = \frac{a'z + b'}{c'z + d'} + m.$$
There can only be one $m$ such that $-\frac12 \le x' < \frac12$. Therefore, corresponding to each pair $(c', d')$ with $(c', d') = 1$, there is only one pair $a, b$ such that $-\frac12 \le x' < \frac12$. Therefore the number of points in the rectangle which are equivalent to $z$ is finite. □
Theorem 6.3. Every point in the upper half plane is equivalent to a unique reduced point.

Proof. Let $z = x_0 + iy_0$, $y_0 > 0$. We take the unique integer $m$ satisfying $-\frac12 \le x_0 + m < \frac12$, and let $z' = z + m$. If $|z'| > 1$, then $z'$ is a reduced point and there is nothing more to prove. If $|z'| = 1$ and $z'$ lies on the arc from $\rho$ to $i$, then it is a reduced point, and if it lies on the arc from $1 + \rho$ to $i$, then the transformation $-1/z$ will give the former situation. If $|z'| < 1$, then we let
$$z'' = -\frac{1}{z'},$$
and choose $m'$ such that
$$z''' = z'' + m', \qquad -\tfrac12 \le x''' < \tfrac12.$$
Then $y''' = y'' = y'/|z'|^2 > y' = y_0$. If $z'''$ is not yet a reduced point, we repeat the process; every point so obtained is equivalent to $z$, lies in the rectangle $-\frac12 \le x < \frac12$, $y \ge y_0$, and has a strictly larger imaginary part than the one before. From Theorem 6.2 we see that there can only be a finite number of such points, so the process must stop. Therefore every point must be equivalent to a reduced point. Also, from Theorem 6.1, there cannot be two equivalent reduced points. The theorem is proved. □

In order to appreciate the significance of this theorem, the reader should try to give direct proofs of the following two exercises, which are immediate consequences of the theorem.

Exercise 1. All the points
$$z = \frac{a + i}{c}, \qquad a^2 + bc + 1 = 0,$$
are equivalent to $i$.

Exercise 2. All the points
$$z = \frac{a + \rho}{c}, \qquad a(1 - a) - bc = 1,$$
are equivalent to $\rho$.
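The proof of Theorem 6.3 is an algorithm: translate into the strip $-\frac12 \le x < \frac12$, invert by $-1/z$ while $|z| < 1$, and finally move a point on the right-hand arc to the left-hand arc. The Python sketch below follows those steps and also records the unimodular matrix used. It works in floating point, so the boundary tests use a small tolerance `tol`, which is our own assumption, and the function name is hypothetical.

```python
import math


def reduce_point(z, tol=1e-12, max_steps=10_000):
    """Return (w, M): w is the reduced point equivalent to z (Im z > 0) and
    M = ((a, b), (c, d)) is integral with ad - bc = 1 and w = (az + b)/(cz + d)."""
    if z.imag <= 0:
        raise ValueError("z must lie in the upper half plane")
    a, b, c, d = 1, 0, 0, 1                      # start from the identity
    for _ in range(max_steps):
        m = -math.floor(z.real + 0.5)            # now -1/2 <= Re(z + m) < 1/2
        z += m
        a, b = a + m * c, b + m * d              # left-multiply by S^m
        if abs(z) < 1 - tol:                     # |z| < 1: apply z -> -1/z
            z = -1 / z
            a, b, c, d = -c, -d, a, b            # left-multiply by T
        else:
            break
    else:
        raise RuntimeError("no reduced point found within max_steps")
    if abs(abs(z) - 1) <= tol and z.real > tol:  # on the arc from 1 + rho to i
        z = -1 / z
        a, b, c, d = -c, -d, a, b
    return z, ((a, b), (c, d))


for z in (complex(1.5, 0.5), (1 + 1j) / 2, complex(0.5, math.sqrt(3) / 2)):
    w, M = reduce_point(z)
    print(z, "->", w, M)
```

The second and third sample points are the cases $a = 1$, $b = -1$, $c = 2$ of Exercise 1 and $a = c = 1$, $b = -1$ of Exercise 2; they reduce to $i$ and $\rho$ respectively.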
13.7 The Net of the Fundamental Region

Theorem 7.1. Suppose that $z$ is not a fixed point of any unimodular transformation other than the identity. Let $U, V$ be two distinct unimodular transformations. Then $Uz \ne Vz$, where $Vz$ represents the image of $z$ under $V$.

Proof. If $Uz = Vz$, then $z = U^{-1}Vz$, so that $z$ is a fixed point of the unimodular transformation $U^{-1}V$, which is not the identity. □

Theorem 7.2. The set of all triangular images of the fundamental region forms a covering of the upper half plane with no overlaps.

Proof. The first part of the theorem follows from Theorem 6.3. If $U$ and $V$ are two distinct unimodular transformations whose triangular images of $D$ overlap, then the mapping $U^{-1}V$ must map $D$ into a triangular region which overlaps with $D$. Let $z$ be a point in this overlap. Then there must be another point in $D$ which is equivalent to $z$, and this is impossible, by Theorem 6.1, since $z$ is in $D$. □

We can explain this theorem in terms of a tile covering. In ordinary space we can cover regions without overlaps using equal square tiles, and by this we mean that each tile can be "translated" from one place to another which is occupied by another tile. Here the fundamental region is the shape of our new tile, and "translation" is now a unimodular transformation. The above theorem then tells us that such a tile can be used to cover the upper half plane with no overlaps. This is the interpretation in non-Euclidean geometry, and in this language the notion of a fundamental region becomes clearer and generalization becomes easier. We can alter the definition of a fundamental region as follows: a region in the upper half plane is called a fundamental region if (i) every point is equivalent to a point in the region; (ii) no two points in the region are equivalent. Take any point $z$ in the upper half plane which is not a fixed point of any unimodular transformation other than the identity. Construct the points $z_1, z_2, \ldots$ which are equivalent to $z$, and then construct the perpendicular bisectors of $(z, z_i)$, that is, the sets of points which have equal non-Euclidean distances from $z$ and $z_i$. Discard the part on the side of $z_i$. Then the remaining part is a fundamental region. (The reader should supply the proof of this, and also determine the fundamental region corresponding to $z = 2i$.) We remark that, besides being important theoretically, Lobachevskian geometry also has useful applications in number theory and in function theory. We note the following: the fixed points of an elliptic transformation with period 2 lie on the lines joining vertices with angles $\pi/3$. The fixed points of an elliptic transformation with period 3 are the common vertices of six triangles. The fixed points of parabolic transformations are those points with infinitely many lines through them. The fixed points of a hyperbolic transformation cannot be vertices of any triangle (and it is even clearer that they cannot lie on the sides).
[Figure 8: the net of images of the fundamental region]
13.8 The Structure of the Modular Group

Let us denote by $S$ and $T$ the transformations $z' = z + 1$ and $z' = -1/z$ respectively. Then $S^{-1}$ denotes the transformation $z' = z - 1$. These three transformations transform the fundamental region into its three neighbouring regions, and conversely the transformation which maps a neighbouring region into the fundamental region must be one of $S$, $T$ or $S^{-1}$. Let $M$ be any unimodular transformation, and $z$ be any point in the fundamental region $D$. We join $z$ to $Mz$ by a curve not passing through any vertices. Suppose that the various regions that this curve crosses are
$$D, D_1, D_2, \ldots, D_n = M(D).$$
Also, denote by $M_i$ the unimodular transformation which maps $D$ into $D_i$. Now $M_1 = S$, $T$ or $S^{-1}$. Suppose that $M_k$ can be represented as a product of $S$ and $T$. Since $M_k^{-1}$ maps $D_k$ into $D$, and $D_{k+1}$ is a neighbouring region of $D_k$, it follows that $M_k^{-1}$ maps $D_{k+1}$ into $D'_{k+1}$, a neighbouring region of $D$. But $D'_{k+1}$ can be mapped into $D$ via a transformation $M'^{-1}$ ($= S$, $T$ or $S^{-1}$). That is,
$$M'^{-1}M_k^{-1}(D_{k+1}) = D, \quad\text{or}\quad M_{k+1} = M_kM'.$$
Therefore $M_{k+1} = M_kM'$ can be represented as a product of $S$ and $T$, and hence so can $M$ itself. We have therefore proved:

Theorem 8.1. Any unimodular transformation is representable as a product of $S$ and $T$. □

Theorem 8.1 has the following explicit interpretation: if
$$M = S^{m_1}TS^{m_2}T\cdots TS^{m_v},$$
then
$$z' = m_1 - \cfrac{1}{m_2 - \cfrac{1}{m_3 - \cfrac{1}{\ddots - \cfrac{1}{m_v + z}}}}.$$
This shows clearly the relationship between unimodular transformations and continued fractions. It is easy to see that

Note. If we extend the definition of a unimodular transformation to
$$z' = \frac{az + b}{cz + d}, \qquad ad - bc = \pm 1,$$
then we can have the result
$$z' = m_1 + \cfrac{1}{m_2 + \cfrac{1}{m_3 + \cfrac{1}{\ddots + \cfrac{1}{m_v + z}}}}.$$
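The proof of Theorem 8.1 is geometric, but the same factorization can also be computed by a Euclidean-type reduction of the first column; this is an alternative technique, not the one used in the text. The sketch below (names are ours) writes a matrix of determinant 1 as $\pm S^{m_1}TS^{m_2}T\cdots TS^{m_v}$, with $S = \begin{pmatrix}1&1\\0&1\end{pmatrix}$ and $T = \begin{pmatrix}0&-1\\1&0\end{pmatrix}$, and checks the result against the continued fraction above.

```python
def mul(X, Y):
    (a, b), (c, d) = X
    (e, f), (g, h) = Y
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


T = ((0, -1), (1, 0))                    # z -> -1/z


def S(k):                                # S^k: z -> z + k
    return ((1, k), (0, 1))


def st_exponents(M):
    """Return (sign, [m1, ..., mv]) with M = sign * S^m1 T S^m2 T ... T S^mv."""
    (a, b), (c, d) = M
    if a * d - b * c != 1:
        raise ValueError("not unimodular")
    exps = []
    while c != 0:
        q = a // c                       # then |a - q c| < |c|
        exps.append(q)                   # split off S^q on the left ...
        a, b = a - q * c, b - q * d
        a, b, c, d = c, d, -a, -b        # ... then T (multiply by T^-1 on the left)
    exps.append(a * b)                   # now c = 0, a = d = ±1: a * S^(ab) remains
    return a, exps


def rebuild(sign, exps):
    P = S(exps[0])
    for m in exps[1:]:
        P = mul(mul(P, T), S(m))
    return tuple(tuple(sign * x for x in row) for row in P)


def continued_fraction(exps, z):
    """m1 - 1/(m2 - 1/(... - 1/(mv + z))), the form given after Theorem 8.1."""
    val = z + exps[-1]
    for m in reversed(exps[:-1]):
        val = m - 1 / val
    return val


M = ((5, 3), (3, 2))
sign, exps = st_exponents(M)
print(sign, exps)                        # 1 [1, -2, -2, 0]
print(rebuild(sign, exps) == M)          # True
```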
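13.9 Positive Definite Quadratic Forms

Let $\omega$ be any complex number in the upper half plane, $\rho$ be a positive real number, and consider the quadratic form
$$F(x, y) = \rho(x - \omega y)(x - \bar\omega y) = \rho x^2 - \rho(\omega + \bar\omega)xy + \rho\omega\bar\omega y^2.$$
If we apply the unimodular transformation
$$\omega = \frac{a\omega' + b}{c\omega' + d},$$
then the above becomes
$$\frac{\rho((c\omega' + d)x - (a\omega' + b)y)((c\bar\omega' + d)x - (a\bar\omega' + b)y)}{|c\omega' + d|^2} = \frac{\rho(dx - by - \omega'(-cx + ay))(dx - by - \bar\omega'(-cx + ay))}{|c\omega' + d|^2},$$
or
$$\frac{\rho(X - \omega'Y)(X - \bar\omega'Y)}{|c\omega' + d|^2},$$
where
$$X = dx - by, \qquad Y = -cx + ay.$$
Therefore
$$\{\rho, -\rho(\omega + \bar\omega), \rho\omega\bar\omega\} \sim \left\{\frac{\rho}{|c\omega' + d|^2}, -\frac{\rho(\omega' + \bar\omega')}{|c\omega' + d|^2}, \frac{\rho\omega'\bar\omega'}{|c\omega' + d|^2}\right\}. \qquad (1)$$
We also note that
$$\omega - \bar\omega = \frac{\omega' - \bar\omega'}{|c\omega' + d|^2}.$$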
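Starting from any positive definite form $\{\alpha, \beta, \gamma\}$, where $\alpha, \beta, \gamma$ are real ($\alpha > 0$) and $\beta^2 - 4\alpha\gamma < 0$, we have, by comparing with the left hand side of (1), that
$$\rho = \alpha, \qquad \omega = \frac{-\beta + \sqrt{\beta^2 - 4\alpha\gamma}}{2\alpha},$$
the square root being taken in the upper half plane.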
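Assuming that $\omega'$ is in the fundamental region, we have
$$-1 \le \omega' + \bar\omega' < 1, \qquad \omega'\bar\omega'\begin{cases} > 1 & \text{if } \omega' + \bar\omega' > 0,\\ \ge 1 & \text{if } \omega' + \bar\omega' \le 0.\end{cases}$$
Substituting $\{\alpha', \beta', \gamma'\}$ into the right hand side of (1), we then have that
$$-1 < \frac{\beta'}{\alpha'} \le 1, \qquad \frac{\gamma'}{\alpha'} \ge 1, \quad\text{with } \frac{\gamma'}{\alpha'} > 1 \text{ if } \beta' < 0.$$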
P' I; 2) W'l <  I, 0 < w~ < I; 3)  1 < w~ < 0, and so w~ <  2. I)
[Figure: the positions of $\omega_1'$ and $\omega_2'$ relative to $-1$ and $0$ in cases 1), 2) and 3)]
There is nothing to prove for 1), and 2) follows at once with the transformation $z' = z + 1$. For case 3) we let $z'' = z' - 1$, so that
 I
1, we see that I must be positive) satisfies
G ~)(~
~) = C
=:)
and we see, from the method of (9), that the right hand side of this equation is a product of Sand T. The inductive argument is complete. D Note: Positive modular matrices can also be expressed as a product of
and
(10)
This is because
(10 01)=(10 11)(11 0)1(1 1 0 11). Theorem 1.2. Any modular matrix can be expressed as a product of the matrices
and
(11)
That is the group of modular matrices can be generated by these two matrices. Proof If a modular matrix M is not positive, then
MG
~)
368
14. Integer Matrices and Their Applications
is positive. It follows from the note above that any modular matrix is expressible as a product of the three matrices
But
G ~) = G~)(~ ~)(~ ~) so that the theorem follows.
D
Definition 1. Let M and N be two matrices. Suppose that there is a modular matrix U such that
M=UN.
Then we say that N is left associated to M, and we denote this by M ~ N. Clearly left association has the following three properties: (i) M ~ M (reflexive); (ii) if M ~ N, then N ~ M (symmetric); (iii) if M ~ N, N ~ P, then M ~ P (transitive). A similar definition can be given for right association. Theorem 1.3. Any matrix is left associated to a matrix of the form
if a > 0, then
°
~ c
(:
~),
d~O;
a ~O,
(12)
< a.
Proof Corresponding to the matrix
there are integers r, s such that rb
+ sd =
=
(r,s)
0,
Now there are integers u, v such that rv  su
=
l.
I so that
is a positive modular matrix, and UM=(a 1 Cl
If al
0)
d1
•
~ 0, then we mUltiply this matrix by (  ~ ~) which will give a matrix with
369
14.1 Introduction ~ 0, and similarly we can make d l to a matrix of the form
al
~
O. Therefore every matrix is left associated
a ~O,
If a > 0, then we can choose q so that 0 ::;;; qa
+ C < a,
and from
0
we see that the theorem is proved.
Definition 2. We call the matrix in (12) the normal form of Hermite. Theorem 1.4. The normal form of Hermite for a nonsingular matrix is unique.
. (a
~) for a nonsingular
Proof We first note that the normal form of Hermite c
matrix cannot have a or d equal to zero. Now if (s u
t)(a v c
0)
d
(a l =
Cl
0)
dl
sv  tu '
= ± 1,
then, from td = 0, we have t = O. Also, from sa = al > 0, vd = d l > 0 and = ± I we see that s = v = 1. Finally, from ua + C = Cl> 0::;;; C < a, 0::;;; Cl < al = a we see that u = O. The theorem is proved. 0
sv
Exercise. Investigate the situation for a singular matrix. Definition 3. Let there be two modular matrices U and V such that UMV=N.
Then we say that M and N are equivalent, and we write M", N. Clearly, being equivalent has the three properties of being reflexive, symmetric, and transitive. Theorem 1.5. Any matrix is equivalent to a matrix of the form ( al
o
0), ala2
al
~ 0,
a2
~ O.
(13)
Proof Consider the matrix
Since the theorem becomes trivial if M is the null matrix we can assume that a :f 0, and indeed we can even assume that a > O. We first prove that M must be equivalent to a matrix of the form
370
14. Integer Matrices and Their Applications
( al C1
bl ), dl
We use induction on a, the case a = I being trivial. When a > I and a,rb, we can choose q so that 0 < aq + b < a and we consider
where the leading element is a positive integer less than a. If alb and a,rc, then we choose q' such that 0 < aq' + c < a, and we consider
where the leading element is once again a positive integer less than a. Finally, if al(b, c), but a,rd, we let c = c'a so that
(oI I) ( I 0) I
 c'
I
(a b) c d
=
(a
(l  c')b
*
*
+ d\, )
and a,r{(l  c')b + d} which reduces back to the case when a,rb. The inductive argument is now complete. Nowatl(bbcl,dl).Weletbl =a 1b2,cI = a1c2, and d l =a 1 d 2 ,andweconsider
2 ~)C:~2
l (_ c
::~:)(~
 ~2)
=
(~l
al(d2
~ b2C2))
where we can assume that al > 0, since otherwise we can multiply by (
 I
0
Similarly we can assume that a2 = d 2  b2C2 ;:::: O. The theorem is proved. Definition 4. We call the matrix in (13) the normal form of Smith.
We summarize our result as follows: By Theorem 1.2 any modular matrix is a product of the matrices
G~), From
and
we see that the effect of mUltiplying by
G~)
or its inverse is merely the
371
14.2 The Product of Matrices
interchanging of the two rows or the two columns of the matrix. Again, from
and
±
1)1 = (a
b
we see that the effect of multiplying by
±
a),
d± c
c
(~ ~) or by its inverse (~ ~) is the
addition or subtraction of the second row to the first row, or the first colu~n to the second column of the matrix. We call these operations here the elementary transformations of the matrix. We can therefore restate Theorem 1.5 as follows: We can use elementary transformations to reduce a given matrix to the normal form of Smith. Now the greatest common factor of the elements of a matrix is invariant under an elementary transformation, and so from Theorem 1.5 we have (a, b, c, d) = al' Also
lac bld = ad 
be =
± a 2l a2'
Therefore we have Theorem 1.6. The normal form of Smith for a given matrix is unique. D
14.2 The Product of Matrices Let all, aI2,"" amn be integers. We call the array
an m by n matrix and we sometimes denote it by A(m,n). Ifm = n, then we denote it by A(n) and we call it a square matrix of order n. Let B be an n by I matrix
B
= (::: . : : . :::) . .
bnl
• • •
We define the product matrix of A and B by
bnl
372
14. Integer Matrices and Their Applications
AB = C =
Cll ( ~~~
Cll) .. : : : ..
Cmi
(r
• •.
~~l.
n
Crs
'
I
=
t=
artbts 1
Cml
= 1, ... , m; s = 1, ... , I).
(1)
We see from the definition that the product matrix of A and B exists only when the number of columns in A is the same as the number of rows in B. Note also that, when AB and BA both exist, they may be different. If AB = BA, then we say that A and B commute. However we always have (AB)D = A(BD) whenever either side of this equation exists. If A and B are square matrices, then the determinant of AB is the product of the determinants of A and B. A square matrix whose determinant is zero is called a singular matrix, otherwise we call it a nonsingular matrix. Modular matrices are those square matrices whose determinants equal ± 1 and positive modular matrices are those whose determinants equal 1. Clearly the product of two (positive) modular matrices is a (positive) modular matrix. The square matrix
where each element not on the main diagonal is zero is called a diagonal matrix, and we denote it simply by A = [Ab A2, ... ,An]. If Ai = A2 = ... = An = 1 then,
A  I 
( o10 0) 1
...
0
........... .
o
0
...
1
and we call !the unit matrix. Clearly we have AI = fA = A for any square matrix A of order n. If the square matrices A and Bsatisfy AB = I, then we call Bthe inverse of A and we denote it by A  1. Consider a square matrix A (= A(n). By the cofactor of the element aij we mean the determinant of the square matrix of order (n  1) obtained by removing the ith row and thejth column of A. If we attach the sign ( l)i+ j to the cofactor of aij then we call it the algebraic cofactor ofaij and we denote it by Aij. Let
Ao =
(~:: . ~:.: ..... ~::) , A in
that is the matrix obtained from
A
A 2n
...
Ann
by replacing each element a rs with the algebraic
373
14.2 The Product of Matrices
cofactor Ars of am is called the adjoint matrix of A. It is not difficult to prove that AAo = AoA = aI,
where a is the determinant of A. It follows that if A is a modular matrix, then its inverse exists, and that A 1 = ± Ao. Conversely, if A has an inverse, then it must be a modular matrix. If AB = I, then from B = (± AoA)B = ± Ao(AB) = ± Ao we see that the inverse is unique and that AA  1 = A  1 A = l. Also, if A and B both have inverses, then (AB)l = B1A l . A 1 by n matrix (Xl, . .. ,xn ), where We no longer restrict the elements to be integers, is called a vector, and we write x = (Xl> ... ,xn). We should take care that this notation here is not to be confused with the greatest common factor symbol (Xl> ... , xn) = d. We shall use the convention that (Xl, ... , xn) by itself always represents a vector, while (Xl> . .. , xn) = d means the greatest common factor of Xl> ••• ,Xn • Also we shall always use the letters X and Y to denote a vector with n terms. The equation y=xB
(2)
represents the system of linear equations Yi
=
L xjbj;,
I :::; i :::; l.
j= 1
If n = I and B is nonsingular, then (2) is called a transformation. Corresponding to integers Xl> ••• ,Xn the transformati9n gives integers Yl> ••. ,Yn, but not conversely. However, if B is a modular matrix, then when Yl> ... ,Yn are integers, the numbers Xl> . .. , Xn must also be integers. In this case we call (2) a modular transformation. Example l. Let r ¥ I, and Yl =  X" Yr = Xl> Yi = Xi (i ¥ I, i ¥ r). This is a modular transformation whose corresponding matrix is obtained from I by mUltiplying the first row by  I and then interchanging it with the rth row (or mUltiplying the rth column by  I and then interchanging it with the first column). We denote this matrix by Er so that
o o
o
o o
I 0
o
o
o
0 r
0
0
1
Example 2. Let r ¥ I, ·and Yi = Xi (i ¥ r), Yr = Xr transformation and its corresponding matrix is
+ Xl.
r.
(3)
This too is a modular
374
14. Integer Matrices and Their Applications
Vr =
( o~ ~ ::: ~
.. ..
.. .. 0
...
0 r
::: .. ~) ...
,
(4)
1
that is the matrix obtained from [by adding the rth row to the first row (or adding the first column to the rth column). . It is easy to prove that Vr is representable as a product of V 2 and E i • In fact, if r > 2, then (5)
The proof is as follows: Let
so that
, ... ,
E,E,E, V,E,E,E,t
~(
t1 + ;:
t r)
.
But
_(t1 ~. t r
)
Vrt 
,
tn
so that (5) follows. Example 3. For fixed distinct rand s we let Yi = Xi (i "# s) and Ys = Xs + X r • Then this is also a modular transformation whose matrix is obtained from [by adding the sth row to the rth row (or adding the rth column to the sth column). We denote this matrix by Vrs so that
375
14.2 The Product of Matrices
o
1 0
Vrs =
o
o
0
o
0
o
o
0
o
o
r
s
o o
r
o
s
(6)
When s > 1, Vrs = E r I VsE" and Vrl = E r I V r I Er. Therefore Vrs can also be represented as a product of V 2 and E 2 , ••• , En. The matrices Vrs (1 ::;:; r ::;:; n, 1 ::;:; s ::;:; n, r ¥ s) together with all the products formed by them forms a group which we denote by Wl n • We saw, from the note following Theorem 1.1, thatthe group Wl 2 , generated by the matrices V21 and V 12 =
(~ ~).
=
G ~)
is identical with the group of all 2 by 2 positive modular
matrices. We now prove the corresponding result for n by n positive modular matrices. Theorem 2.1. The group Wln is the group of all n by n positive modular matrices.
It is clear that each matrix in Wln is a positive modular matrix so that we only have to prove that every positive modular matrix is in Wln , that is every positive modular matrix can be expressed as a product of the matrices Vrs . For this purpose we shall first establish the following two theorems. Theorem 2.2.
If (Xl> .•. ,xn ) = d, then there exists (Xl, ... , xn)U
=
U E Wln such that
(d, 0, ... ,0).
Proof Consider first the case n = 2. If (Xl> X2) = d, then there are integers rand s such that rXI
+ SX2
(r, s) = 1.
d,
=
We take u =  x2/d, v = xdd so that
VX2
+ UXI
=
0,
vr  us = 1.
Thus (Xl>
and P =
G:)
X2)
G:) =
(d,O)
is a positive modular matrix. Since we already know that P E Wl2 by
the note following Theorem 1.1, the case n
=
2 is proved.
376
14. Integer Matrices and Their Applications
We now proceed by induction on n. Let (xnl> xn)
= dl> so that there exists
P E Wl2 such that
Let 1 0 0 0 000 v(n)
=
o
o
0
r
u
0
s
v
It is easy to see that v(n) E Wln and that
From the induction hypothesis we have v(n (Xl> ••• , X n2,
d l ) v(nl)
l) E
= (d, 0, ... ,0).
We now let v(n) 1
=(
Wl n _ 1 and that
v(nl)
0)
0
1
so that (X I> ••• , xn) v(n) v~n) = (d, 0, ... ,0).
It is easy to see that
so that the theorem is proved.
0
Theorem 2.3. Let (all, a12, ... ,al n) = d. Then there isamatrix inWln whosefirst row is a12 aln) (~ d' d , ... , d . Proof By Theorem 2.2 there is a matrix U in Wln such that (all' aq, . .. , aln)U = (d, 0, ... ,0)
and so the matrix U  1 is a suitable candidate. Proof of Theorem 2.1. The case n
on n. Let
0
= 2 is already established. We now use induction
377
14.3 The Number of Generators for Modular Matrices
be any positive modular matrix. Clearly (all, by the matrix U in Theorem 2.3 we have
aI2,""
( ~.2.1I ~~~0 •.
a~l
=
1. On multiplying A
0) ,
"
AU =
al n)
.. : : : ..
~~n.
•••
a~n
a~2
•
The matrix
o
0
0
o o
V=
is in IDln' and
From the induction hypothesis, the matrix
is in IDln_ 1, and so the matrix
is in IDln. From (7) we see that the theorem follows.
0
14.3 The Number of Generators for Modular Matrices We proved in §1 that any 2 by 2 positive modular matrix can be expressed 'as a product of the matrices V21
=
C ~)
and V l2
=
(~ ~). We now discuss the
378
14. Integer Matrices and Their Applications
general case, and ask for the matrices whose products give all possible n by n positive modular matrices  that is we want to know the generators of the group ill'ln. From the definition for ill'ln any matrix in it is a product of V", and from the previous section we know that each Vrs is expressible as a product of the following n matrices: 0 I 0  I 0 0 0 0
E2 =
0 0 0
0 0 0
0
 I
0
, ... , E= n
000 I
0 0
I 0 0
0
o
0
o o o
0
010 o 0 000
Thus ill'ln can be generated by the n matrices E 2, E 3 , ••• ,En' V2. Let
VI =
(~ .. ~ . : : : .. ~ . ( ~. ~". ') . o
0
...
I
0
It is not difficult to prove that each of E 2, E 3 , ••• , En is expressible as a product of VI and E 2 In fact, we have 0
Er
= (E2Vdr2Ez(E2VI)nr+l,
if n is even,
Er = (E;lvIy2E2(E;IVltr+I,
if n is odd, r is even,
Er = (E;lvIy2E;I(E;IVdnr+l,
if nand r are odd.
(1)
The proof of (1) is similar to that of (2.5). Thus ill'ln can be generated by the three matrices VI> V2, E 2 If we write 0
o
V*=
o
0
I
0
o o o
0
000 then it is easy to verify that E2 the three matrices
V1
=
=
V*  1 V2V* I, so that ill'ln can also be generated by
0 0 I 0 0
0 ( ItI 0 0 0 0
..................... 0
0
1
0
379
14.3 The Number of Generators for Modular Matrices
1
o o
o
o o o
0
0 0
U*=
o
0 1 0 0
o o o
(2)
000
000
When n = 2 we saw that Wl2 can actually be generated by the two matrices UI
(0 1)
= 1
(1 1)
0 and U2 = 0
1 . We now ask whether Wln (n ;:::: 3) can also be
generated by U I and U2 ; that is whether U* is expressible as a product of U I and U 2 • We first examine the cases n = 3 and 4. (1) For n = 3, we have UI
=
(0 0 1)
1 1 1
U2 = ( 0
1 0 0 , 010
o
U* =
0
(1 0 0) 1 1 0 001
.
In the following we call the position for the ith row and the jth column the "position (i, j)". Consider the operation of multiplying U2 by UI on the left and U 1 1 on the right. We see from .
that successive applications of the above operation will leave the elements in the main diagonal invariant, whereas the element 1 not on the main diagonal will take up the successive positions (1,2), (2,3), (3,1). Similarly the elements in the three positions (3,2), (l, 3), (2, 1) will be permuted along a rail as shown in the diagram.
(I, I)
(1,2)
(2, \I)~ (2,2) (3, I)
(3,2)
'"
(1,3) (2, 3) (3,3)
Consequently in order to obtain the element 1 in the position (2, 1) we have first to produce this element in one of the positions (l, 3) or (3,2). Now if we mUltiply
380
14. Integer Matrices and Their Applications
T by U; I on the left and U2 on the right, it will give rise to the element 1 in the position (3,2); that is U; I TU2 =
(~ ~ ~).
1 1 1 The operation of multiplying by U ~ 1 on the left and U I on the right will make the element 1 in the position (3,2) in the matrix U ; I TU 2 move to the position (2, 1), that is
w=
U~IU;lTU2UI = (~ ~ ~). o
0
1
Therefore we need only to annihilate the element 1 in the position (2, 3) to give the required matrix U *, and this can be accomplished by mUltiplying by S  I on the left; that is
S'w~G
o1 o
0) 0
= U*.
1
Therefore, for n = 3, we have (3)
(2) For n = 4 we have
UI
=
CO
0 1 0 0 0 1 0 o 0
u,~o
~) o ' 0
0 1 0 0 1 0 0
~}
U'~G D
0 1 0 0 Similarly to the case n = 3, we start with
0 0 1 0
T~ U~·U,U. ~( ~
o1 0 0 0 ) o 1 0
.
 1 001 We can produce the element  1 in the position (4,2) by multiplying Tby u; I on the left and U2 on the right; that is
U; I TU 2
=( ~ ! H). 1
1
0
1
381
14.3 The Number of Generators for Modular Matrices
Again, the operation of mUltiplying by U~ 1 on the left and U 1 on the right will move the element  1 from the position (4,2) to the position (3, 1); that is
1
1
U 1 (U 2 TU 2) U 1
1 0 0 0) 0 1 0 0
=( _ 1 0 1 I 000
(4)
.
1
Performing the first operation of multiplying by U;l on the left and U 2 on the right will now produce the element  I in the position (3,2); that is
U 21( U 11 U 21 TU2 U 1 ) U 2
=
(~ ~ ~ ~) _ 1
_ I
I
I
o
0
0
I
.
Performing the second operation of multiplying by U ~ 1 on the left and U 1 on the right will now move the element  I in the position (3,2) to the position (2, 1); that is
At this point we observe that the elements of the matrix below the main diagonal matches those of U*  1, and the problem now is the anihilation of the elements 1 above the main diagonal. From (4) we have
o
0
1
o o
1 0
and hence
o
0
1 0
o o
1
0
Therefore, for n = 4, we have U*l
=
U~lU~lU;lU~lU;lU1U2U1U1U~lU;lU~lU;lU~1
x~~~~~~.
If we write U
=
U 2Ut> then (3) and (5) become
U* = U*l
~
=
U~lU1U1U1U1U;lU2
(n = 3),
U~1(Ul)2U1U.ul(Ul)2U~lU3
(n
= 4),
(6)
382
14. Integer Matrices and Their Applications
and in general we have U*(_1)"1
=
U 1 1 (U 1 )n 2U 1 U n  3 U1(U1)n2U11 unI.
(7)
The reader can follow the proof of (2.5) to prove (7). Therefore we have Theorem 3.1. The group IDln ofpositive modular matrices can be generated by the two matrices
U1 =
(~ ~
.. ..........
o
0
...
100 010 0 000
~ (~ ?".') , ..
I
0
000
In other words, any positive modular matrix is expressible as a product of U 1 and U2 • D
Any modular matrix which is not positive will become so on multiplying by
U3 =
(~.~o ~ ..
0
.. ::: ..
...
~)
.
I
Therefore we have Theorem 3.2. The group of all modular matrices can be generated by the three matrices U b U2 and U3 • In other words any modular matrix is expressible as a product of the matrices Ub U2 and U3 • D
14.4 Left Association Definition 1. Let A and B be two square matrices. Suppose that there is a modular matrix U such that
A= UB. Then we say that B is left associated to A and we write A ~ B. Clearly left association is reflexive, symmetric and transitive. Theorem 4.1. Any square matrix is left associated to a matrix of the form bl l b21
0 b22
0 0
0 0
0 0 (1)
bn  1,1 bn  1,2 bn  1,3 bn  1,n1 bn1 bn2 bn3 bn ,n1 where bvv ;?; O. Also if bvv > 0, then 0 :::; biv < bvv (i > v).
0 bnn
383
14.4 Left Association
Proof The case n = 2 has already been proved (Theorem 1.3). We now proceed by induction on n. Let
be any square matrix. If there is a nonzero element in the last column of the matrix A, then we let (aI", a2", ... ,a"n) = bnn . There are integers b I , b 2, .. . ,bn such that
By Theorem 2.3 there is a modular matrix V whose first row is (bb b 2, ... , bn). We interchange the first row of V with its nth row to give a modular matrix U whose nth row is (b I , b 2, . .. ,bn). We then have
It is easy to see that a'in' ... ,a~I.n are linear combinations of aln, a2", . .. ,ann and are therefore divisible by bnn . Therefore
0 A~
a~n
0
0
0
0 0
0
a"11 a"21
bnn a~n
bnn
a~,nl a~,nl
0 0 (2)
a~l,nl a~,nl
a~l,l
a"ni
0 bnn
The above still holds even when all the elements in the last column of A are zero, except that we have bnn = O. It follows from the induction hypothesis that
A~
where b vv
;;::
0, biv
=
bll b 21
0 b 22
0 0
0 0
............................... bnI,I
bn I,2
b~I
b~2
bnI,nI b~,nI
0 (i < v), and if b vv > 0, then 0
~
biv
0, then there exists an integer qnl such that
Therefore
o o
o o
A:!;, b~l
bn  l •n b~.nl
b~2
l
0
bnn where b~i = qnIbnl.i + b~i (1 ~ i ~ n  1), 0 ~ b~.nl < bn  l •n The theorem follows from repeated applications of this. 0
l .
Definition 2. We call a square matrix of the form (1) the normal form of Hermite. Exercise. Prove that the normal form of Hermite for a nonsingular square matrix is unique.
14.5 Invariant Factors and Elementary Divisors Definition 1. Let A (= A(m.n») and B (= B(m.n») be two matrices. Suppose that there are two modular matrices U (= u(m»), V (= v(n») such that A
= UBV.
Then we say that A and B are equivalent and we write A '" B. Clearly equivalence has the three properties of being reflexive, symmetric and transitive.
Theorem 5.1. Any matrix A (= A(m.n») must be equivalent to a matrix of the form
o o
0 d l d2 0
d l d2 d3
o
o
o
dl
0 0
0 0 0
0 0 0
0 0 0
(m~n)
(1)
o
or
o o
where di ~ O.
o o
o o
o
o
(m ~ n)
o
(2)
385
14.5 Invariant Factors and Elementary Divisors
Proof Let A = (al b a12, ... ,alk) be a 1 by k matrix where k is any positive integer (k> 1). By Theorem 2.2 there is a positive modular matrix U such that AU = (d, 0, 0, ... ,0)
and so the required result is proved. Also, from
where U' is the transposed matrix of U, we see that the theorem also holds for k by 1 matrices. We now proceed by induction on the number of rows of the matrix A. Let A be any given matrix. If A = 0, then the result is trivial. If A i: 0, then we may assume that all i: 0 and indeed we can even assume that all> O. We first prove that A must be equivalent to a matrix of the form: A", Al =
(;t: . ~:: .... ~:.:) , a~l
a~2
...
(1
~i~m,
1 ~j~n).
a~n
This is clearly so if all = 1. When all > 1, if all,j'aiojo then we can move aiojo to one of the positions occupied by a12, a2b a22, by means of row or column interchanging. Therefore, using the method of proof for Theorem 1.5, we can change the leading element to a positive integer which is less than all, and an inductive argument completes the first part of proof. Now from
o
0
o ••••••••••••••••
a~l
L
0
a~l
a~2
x
we have
OJ
0
a~l 1
o
0
a~l o ( = ...... ~~ ...... .. ~n. 0 a"
o
a~2
...
...
0 ) a"
a~n
'
386
14. Integer Matrices and Their Applications
Therefore, from the induction hypothesis, we have
A '"
(a~o~ . ;~ . ~ . :::......~...... ~ . :::. ~) 0
0
...
d~
.. · d~
0
...
(m
~
n)
(4)
0
or
o d~
A",
o o
o o o o
o o
(m;;:: n).
(S)
o
Since a~ 11 a;j, and d~ is a linear combination of the elements of A 1> it follows that If we let a~I = d1> d~ = d Id 2 , d; = d 3 , d~ = d4 , ••• , then the theorem follows from (4) and (S). 0 a'llld~.
Definition 2. We call matrices of the form (1) or (2) the normal forms of Smith. In the proof of Theorem S.l the operations that we use are: the interchange of rows (or columns), the addition of an integer multiple of a row (or column) to another row (or column); the mu'Itiplication by  1 to a row (or column). We call these operations the elementary operations of matrices. We can therefore restate Theorem S.l as follows: any matrix can be reduced to the normal form of Smith by elementary operations. After the interchange of two rows (or columns) or the multiplication by  1 to a row (or column), the i by i subdeterminants of the resulting matrix are either the same as the i by i subdeterminants of the original matrix, or differ by their signs only. Again if we add an integer multiple of a row (or column) to another row (or column) the i by i subdeterminants of the resulting matrix are either the same as the i by i subdeterminants of the original matrix, or the i by i subdeterminants with the addition of an integer multiple of i by i subdeterminants. It follows that the greatest common factor of all the i by i subdeterminants of a matrix is invariant under any elementary transformation. Therefore we have
Theorem 5.2. Let A '" B. Then the greatest common factors of the i by i subdeterminants of the two matrices A and B are the same. 0 Meanwhile we see from (1) and (2) that
are the greatest common factors of the i by i subdeterminants of A. Therefore we have
387
14.6 Applications
Theorem 5.3. The normal form of Smith for a matrix is unique.
D
Definition 3. Let the nonzero elements of the nOl'Illal form of Smith in (1) or (2) for a matrix A be (k
~
min(m,
n».
We call these numbers the invariantfactors of A of orders 1,2, ... ,k respectively. The number k is called the rank of the matrix A. Let d1
•••
di = p~i1 ... p7/'i
be the standard prime factorization of an invariant factor. We call the prime power
p't an elementary divisor of the matrix A.
It is easy to see that the indices of the elementary divisors satisfy: (1
~j ~
I).
It also follows from th~ definition that if two matrices have the same invariant factors, then they have the same rank and the same elementary divisors. Conversely if the ranks are the same and the elementary divisors are the same, then the invariant factors are the same. Therefore we have
Theorem 5.4. A necessary and sufficient condition for two m by n matrices to be equivalent is that they should have the same rank and the same elementary divisors. D
14.6 Applications Let us consider the solutions to the system of linear equations n
Yi
=
L X/lji
(1 ~ i ~ m,
n;;:: m),
(1)
j= 1
with integer coefficients, and given integers solutions to
Yi 
that is we consider the integer
y=xA,
(2)
We saw in the previous section that there are two modular matrices U (= and V (= v(m) such that
u(n)
388
14. Integer Matrices and Their Applications
o o UAV=
o o
o o
o
o
=D.
(3)
o
We now let y V = y* =
(y~
, ... , y~),
xU 1
= x* =
(x~, ... ,x~),
so that, from (2), y*
=
x*D,
(4)
or (I
~
i
~
(5)
m).
A necessary and sufficient condition for (I) to have a solution is that (5) has a solution. If d 1 ••• dk i: 0, dk + 1 = 0, then a necessary and sufficient condition for (5)
to have a solution is that Y~+l
= ... = y~ = o.
(6)
From (3) we have (7)
Now, if (6) holds, then we have, by (7), that (8)
conversely, if (8) holds, then
and from Theorem 5.2 we have y~+ 1
= ... = y~ = 0,
which is formula (6). Therefore a necessary and sufficient condition for (I) to have a solution is that (8) holds; that is, we have Theorem 6.1. A necessary and sufficient conditionfor the system (I) to have a solution is that there are two matrices A and (;) with the same invariant factors.
D
389
14.7 Matrix Factorizations and Standard Prime Matrices
If (5) holds, then we have I
Xl
y~
X'
= d1 '
... ,
2
(9)
This means that X'1' x~, ... , x~ are uniquely determined, and x~+ l ' ... , x~ can be any integers. Thus, if t1>' .. , t n  k are n  k arbitrary integers, then k Xi
=
L
nk XjUji
+
j= 1
L
t/Uk+/,i
/= 1
nk
= XlO) +
L
t/Uk+/,i
(1 ~ i ~ n),
(10)
/= 1
where
x~O),
. .. , x~O) is set of solution to (1) when tl
=
t2
= ... =
tn 
k
= O.
14.7 Matrix Factorizations and Standard Prime Matrices Definition 7.1. Let A and B be two square matrices, and suppose that there is a matrix C such that A = CB. Then we call B a right divisor of A, or we say that B rightdivides A, and we write BIA. Clearly we have (i) AlA ; (ii) if AlB and BIC, thenAIC. We can define a left divisor and leftdivide similarly. Definition 7.2. Let A be a nonsingular square matrix which is not a modular matrix. Suppose that for any factorization A = BC, we always have either B or C a modular matrix. Then we call A an irreducible matrix or a prime matrix. Otherwise we call A a composite matrix.
Let A be a nonsingular square matrix. By Theorem 5.1 there are two modular matrices U and V such that (1)
It is easy to decompose [dl, d 1 d 2, . .. , d 1 ••• dn ] into prime matrices. More specifically its factors are of the form P = [1, ... , l,p, 1, ... ,1] wherep is a prime number, and the number of such factors is the number of prime factors in d 1 • d 1d 2 ..... d 1d 2 ... dn • Therefore we have P
= [1, ... , l,p, 1, ... ,1],
(2)
where any two P can be interchanged. Consequently we have the following two theorems. Theorem 7.1. A necessary and sufficient conditionfor a square matrix to be a prime matrix is that its determinant is a prime number. 0
390
14. Integer Matrices and Their Applications
Theorem 7.2. Any composite square matrix can befactorized into a product ofafinite number ofprime matrices, and the number offactors is equal to the number ofprime divisors of the determinant of the matrix. D This type of factorization does not possess the uniqueness property. This is because we can always insert WW 1 (where Wis a modular matrix) in between two factors Pi' Pi + 1 so that Pi Wand W 1 Pi + 1 are now different factors from Pi and P i + 1· However, if we impose certain restrictions on the form of the factors, then we may have a sort of uniqueness theorem. Definition 7.3. If a prime matrix is expressible as U 1 [1, ... , l,p] U, where U is a modular matrix, then we call it a standard prime matrix. It is clear that every prime matrix must be left associated to a standard prime matrix. We now rewrite (2) as the following:
(3)
where any two VI PV can be interchanged, and they are all standard prime matrices. Therefore we have: Theorem 7.3. Any composite square matrix must be left associated with a product ofa finite number of interchangeable standard prime matrices. D Definition 7.4. By the standard factorization of A we mean (4)
where Wand V are modular matrices, and PI, ... , P s are of the form in (2). It is clear that P b ..• , Ps are uniquely determined by A, apart from ordering. Before we prove our uniqueness theorem we need the following definition: Definition 7.5. Let A be a nonsingular square matrix. A modular matrix U satisfying AUA o == 0
(modlAI)
is called an adjoint modular matrix of A. Here Ao is the adjoint matrix of A, and IAI is the absolute value of the determinant of A. Since the elements of the matrix AUA o are all multiples oflAI, it follow~ that the elements of the matrix (l/IAI)A UAo are integers. Moreover, on taking the determinant, we see that it is actually a modular matrix. Theorem 7.4. The set of all adjoint modular matrices of A forms a group.
391
14.7 Matrix Factorizations and Standard Prime Matrices
Proof Let U and V be adjoint modular matrices of A. From AUAoAVAo =
± IAI' AUVA o == 0
(modIAI2)
we see that UV is an adjoint modular matrix of A. Also, from
we have 1 AUA o ' AUlAo == 0 IAI
(modIAI),
and (l/IAI)AUA o is a modular matrix, so that U 1 is also an adjoint modular matrix of A. The theorem therefore follows.  D Definition 7.6. The group of adjoint modular matrices of A is called the adjoint group of A. Theorem 7.5. Let (5)
be any standardfactorization of A. Then there exists an adjoint modular matrix U of A such that V 1 = VU, W 1 = (± l/IAI)AU 1 Ao WU where Wand Vare the matrices in (4). Proof From (4) and (5) we have A AV 1 V 1
= WV 1 P 1 P 2
'"
PsV= W 1 V;lP 1 P 2
'"
PsVlo
= WV 1 V 1 W;lA.
It follows easily that U
=
V 1 V 1 is an adjoint modular matrix of A, and that
±1 IAiAUAo
1
= WUW 1
•
D
This theorem therefore gives the relationship between any two standard factorizations of A. Concerning the interchangeability of the standard prime matrices we have the following two theorems: Theorem 7.6. Let P = [1, ... , l,p] and Q = U 1 [I, ... , 1, q]U be two interchangeable standard prime matrices. Then Q must be of the form (6)
where r = q or 1. Also, if r matrix.
= q, then Q1 = I, and if r = 1, then Q1 is a standard prime
392
14. Integer Matrices and Their Applications
Proof Let
From PQ = QP we have (Ql \yy
x) = (Ql pr y
xp). rp
(7)
It follows at once that y
= (0, ... ,0).
Next, let
so that, from VQ = [1, ... , I, qJV, we have
(~11~11 If u =F 0, then we have
r
:1:) = (~:
;~).
= q. In this case, from
Xlr
(8)
= Xl, we deduce that
and hence u = ± 1, and that V 1 is a modular matrix. From V 1 Ql = V 1 it follows that Ql = f. If u = 0, then
so that r = 1. From V 1 Ql = V 1 and Ql cannot be f, it follows that V 1 is singular. By Theorem 5.1 there are two modular matrices V 1 and W 1 such that V1 V 1 W1 = [A!> ... , An 2, OJ, Ai;;:: O. Therefore, if we let
V= (Vlo 0)l ' W= (Wlo 0)I ' then
393
14.7 Matrix Factorizations and Standard Prime Matrices
0 A2
Ai 0
0 0
0 0
Cl C2
...........................
0 0 0 0 d 1 d2
0 Cn2 An2 0 0 Cnl 0 dn 2 dn 1
From !Cnldn1Al ... An2! = !X! = 1 we see that Ai = ... = An2 = 1, Cnl = ± 1, dn 1 = ± 1; here !X! denotes the absolute value of the determinant of X. Next we let 1 0 o 1 Y=
... ...
+ Cl + C2
0 0
+
0 0 0 0 0 0
0 0
0 0 0 0
Cn 2
1 0
0
0
0 0
o
Z=
+dn  2
o
= (Zl
0
o
1 0 0 1
0)l '
where in the matrices Yand Z the ambiguous signs are determined by the opposite signs of Cn  l and dn  1 respectively. We then have 1 0 1
o XYZ=
0 0 0 0 0 0
... ...
0 0
0 0
0 0 0 0 dn 
0 0
1
0 Cn l 0
It follows from
XW 1 QW = VUQW = V[I, ... , l,q]UW = [1, ... , l,q]X,
that YXZZ 1 W 1 QWZ
=
Y[I, ... , l,q]XZ = [1, ... , l,q]YXZ,
or (WZ)lQ(WZ)
= (YXZ)l[I, ... , 1, q] YXZ = [1, ... ,1, q, 1].
Therefore we have
This proves that Ql is a standard prime matrix.
D
394
14. Integer Matrices and Their Applications
Theorem 7.7. Corresponding to any set of interchangeable standard prime matrices $P_1, \dots, P_s$, there is a modular matrix $U$ such that the $U^{-1}P_iU$ are all diagonal matrices.

Proof. The theorem is trivial if $s = 1$. We now proceed by induction and assume that the theorem holds when the number of matrices is less than $s$. Corresponding to $P_s$, there is a modular matrix $U_s$ such that $U_s^{-1}P_sU_s = [1, \dots, 1, p_s]$. Let
$$U_s^{-1}P_iU_s = Q_i, \qquad i = 1, 2, \dots, s.$$
It is clear that these $Q_i$ are interchangeable standard prime matrices. By Theorem 7.6 we have
$$Q_i = \begin{pmatrix} Q_i^* & 0 \\ 0 & r_i \end{pmatrix}, \qquad 1 \le i \le s,$$
where $r_i = p_i$ or $1$. Also, if $r_i = p_i$, then $Q_i^* = I$ and $Q_i$ is of diagonal form; if $r_i = 1$, then $Q_i^*$ is a standard prime matrix. Renumbering the matrices if necessary, we may assume that $r_1 = r_2 = \cdots = r_t = 1$, $r_{t+1} = p_{t+1}, \dots, r_s = p_s$ $(0 \le t \le s-1)$. If $t = 0$, then the theorem follows at once. Otherwise, from the induction hypothesis, there is a modular matrix $U^*$, corresponding to the interchangeable standard prime matrices $Q_1^*, \dots, Q_t^*$, such that the $U^{*-1}Q_i^*U^*$ $(1 \le i \le t)$ are of diagonal form. Now take
$$U_1 = \begin{pmatrix} U^* & 0 \\ 0 & 1 \end{pmatrix},$$
so that the $U_1^{-1}Q_iU_1$ $(1 \le i \le s)$ are all of diagonal form. The theorem follows on taking $U = U_sU_1$. □
Exercise. Examine the properties of the adjoint group of the matrix $A = [d_1, d_1d_2, \dots, d_1 \cdots d_n]$.
14.8 The Greatest Common Factor and the Least Common Multiple

Definition 8.1. Let $A$ and $B$ be two square matrices, not both equal to $0$. Let $D$ be a common right divisor of $A$ and $B$ such that any common right divisor is also a right divisor of $D$. Then we call $D$ a right greatest common divisor of $A$ and $B$. Suppose that $A$ and $B$ are both right divisors of the square matrix $M$ $(\neq 0)$, and that $M$ is a right divisor of any square matrix having both $A$ and $B$ as right divisors. Then we call $M$ a left least common multiple of $A$ and $B$. Similar definitions for left greatest common divisors and right least common multiples can be given. We shall only discuss right greatest common divisors and left least common multiples and, for the sake of simplicity, we shall call them greatest common divisors and least common multiples.
We define the sum of the two matrices $A = (a_{ij})$ and $B = (b_{ij})$ by $A + B = (a_{ij} + b_{ij})$.
Theorem 8.1. Let $A$ and $B$ be two square matrices which are not both $0$. Then their greatest common divisor $D$ exists. Moreover there are square matrices $P$ and $Q$ such that
$$PA + QB = D.$$
Proof. Consider the $2n \times n$ matrix
$$\begin{pmatrix} A \\ B \end{pmatrix}.$$
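By Theorem 5.1 there are two modular matrices $U$ $(= U^{(2n)})$ and $V$ $(= V^{(n)})$ such that
$$U\begin{pmatrix} A \\ B \end{pmatrix}V = \begin{pmatrix} \Lambda \\ 0 \end{pmatrix},$$
where $\Lambda$ is an $n \times n$ diagonal matrix; we put $D = \Lambda V^{-1}$. We denote by
$$U = \begin{pmatrix} U_{11} & U_{12} \\ U_{21} & U_{22} \end{pmatrix},$$
where the $U_{ij}$ are $n \times n$ matrices. Then, from the above, we have
$$\begin{pmatrix} U_{11} & U_{12} \\ U_{21} & U_{22} \end{pmatrix}\begin{pmatrix} A \\ B \end{pmatrix} = \begin{pmatrix} D \\ 0 \end{pmatrix}, \qquad (1)$$
and so
$$U_{11}A + U_{12}B = D, \qquad (2)$$
and hence any right divisor of $A$ and $B$ must be a right divisor of $D$. Also, if we let
$$\begin{pmatrix} U_{11} & U_{12} \\ U_{21} & U_{22} \end{pmatrix}^{-1} = \begin{pmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{pmatrix},$$
where the $X_{ij}$ are $n \times n$ matrices, then from (1) we have
$$\begin{pmatrix} A \\ B \end{pmatrix} = \begin{pmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{pmatrix}\begin{pmatrix} D \\ 0 \end{pmatrix},$$
and so $A = X_{11}D$, $B = X_{21}D$, and therefore $D$ is a greatest common divisor of $A$ and $B$. On taking $U_{11} = P$, $U_{12} = Q$ the theorem is proved. □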
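The proof of Theorem 8.1 is constructive, and it can be sketched in Python. This is a minimal sketch under our own simplifying assumption: instead of the two-sided reduction of Theorem 5.1 it uses only unimodular row operations on the stacked $2n \times n$ matrix (a Hermite-type reduction), which already brings it to the form $\begin{pmatrix} D \\ 0 \end{pmatrix}$ while recording $U$. Matrices are lists of integer rows; the names `right_gcd` and `matmul` are illustrative, not from the text.

```python
def matmul(X, Y):
    """Product of integer matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def right_gcd(A, B):
    """Return (D, P, Q) with P*A + Q*B == D, where D is a right gcd of A and B.

    The stacked matrix (A over B) is row-reduced by unimodular row operations,
    recorded in U, so that U * (A over B) == (D over 0) as in (1).
    """
    n = len(A)
    M = [list(row) for row in A] + [list(row) for row in B]
    U = [[int(i == j) for j in range(2 * n)] for i in range(2 * n)]
    r = 0  # next pivot row
    for c in range(n):
        while True:
            rows = [i for i in range(r, 2 * n) if M[i][c] != 0]
            if not rows:
                break  # column c has no pivot
            p = min(rows, key=lambda i: abs(M[i][c]))
            M[r], M[p] = M[p], M[r]
            U[r], U[p] = U[p], U[r]
            clean = True
            for i in range(r + 1, 2 * n):
                q = M[i][c] // M[r][c]  # Euclidean step: |remainder| < |pivot|
                if q:
                    M[i] = [x - q * y for x, y in zip(M[i], M[r])]
                    U[i] = [x - q * y for x, y in zip(U[i], U[r])]
                clean = clean and M[i][c] == 0
            if clean:
                r += 1
                break
    # There are at most n pivots, so rows n..2n-1 of M are zero; the top block is D.
    D = [row[:] for row in M[:n]]
    P = [row[:n] for row in U[:n]]  # U_11
    Q = [row[n:] for row in U[:n]]  # U_12
    return D, P, Q
```

For example, `right_gcd([[6, 0], [0, 4]], [[4, 0], [0, 6]])` returns $D = [2, 2]$ together with $P$, $Q$ satisfying (2). Because $U$ is modular, $A = X_{11}D$ and $B = X_{21}D$ follow exactly as in the proof.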
Theorem 8.2. Let the square matrices $A$ and $B$ have a nonsingular greatest common divisor $D$. Then any greatest common divisor of $A$ and $B$ must be of the form $UD$, where $U$ is a modular matrix.

Proof. Let $D_1$ be any greatest common divisor. Then, from the definition, we have $D = RD_1$ and $D_1 = SD$, and hence $D = RSD$. Since $D$ is nonsingular, $RS = I$; on taking determinants we see that $R$ and $S$ are modular matrices, so that $D_1 = SD$ has the required form. □
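We now consider least common multiples. If the two matrices are both singular, then a least common multiple need not exist. For example,
$$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}$$
have no least common multiple. This is because every matrix having $\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ as a right divisor must take the form $\begin{pmatrix} a & 0 \\ c & 0 \end{pmatrix}$, and every matrix having $\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}$ as a right divisor must take the form $\begin{pmatrix} a & a \\ c & c \end{pmatrix}$, and these two forms are equal only when $a = c = 0$, whereas a least common multiple must be nonzero. However, we have the following:

Theorem 8.3. Let $A$ and $B$ be two nonsingular square matrices. Then their least common multiple $M$ exists. Moreover, $M$ is nonsingular, and any least common multiple is of the form $UM$ where $U$ is a modular matrix.

Proof. From (1) we have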
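$$U_{21}A + U_{22}B = 0.$$
We let
$$M = U_{21}A = -U_{22}B, \qquad (3)$$
so that $M$ is a common multiple of $A$ and $B$. We now prove that $M$ is a least common multiple. Let $M_1$ be any common multiple of $A$ and $B$. Then a greatest common divisor $M_2$ of $M$ and $M_1$ is also a common multiple of $A$ and $B$, since by Theorem 8.1 it has the form $PM + QM_1$. Let $M_2 = KA = LB$; since $M_2$ is a right divisor of $M$, we may write $M = HM_2$, so that
$$U_{21}A = -U_{22}B = HKA = HLB. \qquad (4)$$
Denote by $A_0$ and $B_0$ the adjoint matrices of $A$ and $B$, so that $AA_0 = aI$ and $BB_0 = bI$, where $a$ and $b$ are the determinants of $A$ and $B$. Since $A$, $B$ are nonsingular, we have $a \neq 0$, $b \neq 0$ and, multiplying (4) on the right by $A_0$ and by $B_0$, we obtain
$$U_{21} = HK, \qquad -U_{22} = HL.$$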
Therefore we have, from (3), that
so that $H$ is a modular matrix and $H^{-1}$ exists. Writing $M_1 = NM_2$ (possible because $M_2$ is a right divisor of $M_1$), we see from
$$M_1 = NM_2 = NH^{-1}M$$
that $M$ is a least common multiple. We next prove that $M$ is nonsingular. From the definition of a least common multiple it suffices to prove that $A$ and $B$ have a nonsingular common multiple. From Theorem 5.1 there are two modular matrices $U_1$, $V_1$ such that
Let
It is easily seen that M* is a nonsingular square matrix, and that
This matrix $M^*$ thus serves our purpose. If $M_3$ is any least common multiple, then from the definition we have
$$M = EM_3, \qquad M_3 = FM,$$
and so $M = EFM$, whence $I = EF$ since $M$ is nonsingular. Thus $E$, $F$ are modular matrices, and the theorem is proved. □
Theorem 8.4. Let $A$ be a square matrix. Then, corresponding to each nonzero integer $m$, there are two square matrices $R$ and $Q$ such that (1) $A = mQ$, or (2) $A = mQ + R$ and $0 < |R| < |m|^n$, where $|R|$ denotes the absolute value of the determinant of $R$.

Proof. By Theorem 5.1 there are two modular matrices $U$ and $V$ such that
$$A = U[d_1, \dots, d_n]V \qquad (d_i \ge 0,\ 1 \le i \le n).$$
There are integers $q_i$ and $r_i$ $(> 0)$ such that
$$d_i = mq_i + r_i, \qquad 0 < r_i \le |m| \qquad (1 \le i \le n).$$
Let $Q_1 = [q_1, \dots, q_n]$ and $R_1 = [r_1, \dots, r_n]$, so that
$$A = U(mQ_1 + R_1)V. \qquad (5)$$
If $r_i = |m|$ $(1 \le i \le n)$, then $R_1 = |m|I = \pm mI$, so that, from (5), we have
$$A = mU(Q_1 \pm I)V = mQ,$$
which proves (1). If there exists $j$ such that $0 < r_j < |m|$, then $0 < |R_1| = r_1r_2 \cdots r_n < |m|^n$, and so, from (5), we have
$$A = mUQ_1V + UR_1V = mQ + R$$
and
$$0 < |R| = |R_1| < |m|^n.$$
This proves (2). □
Theorem 8.5. Let $B$ be a nonsingular square matrix. Then, corresponding to any square matrix $A$, there exist two square matrices $Q$ and $C$ such that (1) $A = QB$, or (2) $A = QB + C$ and $0 < |C| < |B|$.

Proof. Let $B_0$ be the adjoint matrix of $B$, so that $BB_0 = B_0B = bI$, where $b$ is the determinant of $B$. From the previous theorem there are two square matrices $Q$ and $R$ such that
$$AB_0 = bQ \quad\text{or}\quad AB_0 = bQ + R, \quad 0 < |R| < |b|^n. \qquad (6)$$
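$\dots, x_n\}$ the set of all linear forms
$$a_1x_1 + \cdots + a_nx_n$$
with integer coefficients $a_1, \dots, a_n$. If
$$b_1x_1 + \cdots + b_nx_n$$
is another linear form in $\mathfrak{T}$, then we define
$$(a_1x_1 + \cdots + a_nx_n) \pm (b_1x_1 + \cdots + b_nx_n) = (a_1 \pm b_1)x_1 + \cdots + (a_n \pm b_n)x_n.$$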
Definition 9.1. Let $\mathfrak{M}$ be a subset of $\mathfrak{T}$ with the property that if $y_1$, $y_2$ are in $\mathfrak{M}$, then so are $y_1 \pm y_2$. Then we call $\mathfrak{M}$ a module. Clearly $\mathfrak{T}$ itself is a module. The subset of linear forms $0, \pm x_1, \pm 2x_1, \dots$ also forms a module. The module formed by the subset whose only member is $0 = 0x_1 + \cdots + 0x_n$ will be excluded from our discussions.

Definition 9.2. Suppose that the module $\mathfrak{M}$ contains the forms $y_1, \dots, y_l$ such that any form in $\mathfrak{M}$ can be expressed uniquely as
$$b_1y_1 + \cdots + b_ly_l,$$
where $b_1, \dots, b_l$ are integers. Then we say that $\mathfrak{M}$ has dimension $l$, and we call $y_1, \dots, y_l$ a basis for $\mathfrak{M}$. It follows at once from the definition that $y_1, \dots, y_l$ are linearly independent, that is, $a_1y_1 + \cdots + a_ly_l = 0$ implies $a_1 = \cdots = a_l = 0$.

Theorem 9.1. Every module has a basis and has dimension at most $n$.

Proof. Let $l$ $(\le n)$ be the integer such that, for each member of $\mathfrak{M}$, the coefficients of $x_{l+1}, \dots, x_n$ are all zero, but there is a member of $\mathfrak{M}$ whose coefficient of $x_l$ is not zero. It follows that the set of coefficients of $x_l$ forms a nonzero integral modulus. We denote by $b_l$ the least positive integer in this integral modulus, and we let $y_l$ be a corresponding linear form in $\mathfrak{M}$ whose coefficient of $x_l$ is $b_l$. Now the coefficient of $x_l$ for any member $y$ of $\mathfrak{M}$ must be a multiple of $b_l$, so that
$$y = y' + gy_l,$$
where $g$ is an integer, and $y'$ is a linear form in the indeterminates $x_1, \dots, x_{l-1}$.
Consider now the set of all such forms $y'$. We can determine an integer $l'$ $(\le l-1)$ such that, for each $y'$, the coefficients of $x_{l'+1}, \dots, x_{l-1}$ are all zero, but there is a $y'$ whose coefficient of $x_{l'}$ is not zero. As before we can determine a linear form $y_{l'}$ whose coefficient of $x_{l'}$ is $b'_{l'}$, where $b'_{l'}$ is the least positive coefficient of $x_{l'}$ among all forms $y'$. Let $y' = y'' + g'y_{l'}$, where $g'$ is an integer and $y''$ is a linear form in $x_1, \dots, x_{l'-1}$. Proceeding inductively we see that $\mathfrak{M}$ has a basis $y_l, y_{l'}, \dots$ with at most $n$ members. The theorem is proved. □

Theorem 9.2. The dimension of a module is independent of the choice of bases.

Proof. Let $y_1, \dots, y_l$ and $z_1, \dots, z_{l'}$ be any two bases for a module $\mathfrak{M}$ and suppose, if possible, that $l \neq l'$. We may assume that $l > l'$. From the definition of a basis there are integers $a_{ij}$ and $b_{ij}$ such that
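$$\begin{pmatrix} y_1 \\ \vdots \\ y_l \end{pmatrix} = \begin{pmatrix} a_{11} & \cdots & a_{1l'} & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ a_{l1} & \cdots & a_{ll'} & 0 & \cdots & 0 \end{pmatrix}\begin{pmatrix} z_1 \\ \vdots \\ z_{l'} \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} z_1 \\ \vdots \\ z_{l'} \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} b_{11} & \cdots & b_{1l} \\ \vdots & & \vdots \\ b_{l'1} & \cdots & b_{l'l} \\ 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \end{pmatrix}\begin{pmatrix} y_1 \\ \vdots \\ y_l \end{pmatrix},$$
where $(a_{ij})$ and $(b_{ij})$ are $l \times l$ square matrices. Therefore
$$\begin{pmatrix} y_1 \\ \vdots \\ y_l \end{pmatrix} = (a_{ij})(b_{ij})\begin{pmatrix} y_1 \\ \vdots \\ y_l \end{pmatrix}.$$
But $y_1, \dots, y_l$ are linearly independent, so that $(a_{ij})(b_{ij}) = I$. Since the determinant of the left-hand side is zero (the last $l - l'$ columns of $(a_{ij})$ vanish), we have a contradiction, and therefore the theorem is proved. □

From now on we shall only consider modules with dimension $n$. Let $y_1, \dots, y_n$ be a basis for a module $\mathfrak{M}$. Then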
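$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}. \qquad (1)$$
Therefore, corresponding to each $n$-dimensional module and its basis $y_1, \dots, y_n$, there is a square matrix
$$A = (a_{ij}), \qquad (2)$$
which is nonsingular because $y_1, \dots, y_n$ are linearly independent. Conversely, corresponding to each nonsingular square matrix $A$, we can determine a set of linearly independent linear forms $y_1, \dots, y_n$ which can then be used as a basis to determine an $n$-dimensional module $\mathfrak{M}$. This then sets up a relationship between $n$-dimensional modules and nonsingular square matrices of order $n$. We now ask: What is the relationship between the two matrices corresponding to two different bases of a module? Let $z_1, \dots, z_n$ be another basis for $\mathfrak{M}$ with the corresponding matrix $B = (b_{ij})$, so that
$$\begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix} = B\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$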
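Since both $y_1, \dots, y_n$ and $z_1, \dots, z_n$ are bases, there are two square matrices $U = (u_{ij})$, $V = (v_{ij})$ such that
$$\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = U\begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix}, \qquad \begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix} = V\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix},$$
and so
$$\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = UV\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}.$$
Since $y_1, \dots, y_n$ are linearly independent, we deduce that $UV = I$, that is, $U$ and $V$ are modular matrices. Now
$$B\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix} = V\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = VA\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},$$
so that
$$B = VA. \qquad (3)$$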
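Therefore matrices corresponding to the same module are left associated. Conversely, two nonsingular square matrices which are left associated correspond to the same module. If we partition the family of all nonsingular matrices of order $n$ into classes by left association, then each class represents a module, and modules represented by distinct classes are different. We may therefore speak of "the matrix $A$ associated with $\mathfrak{M}$" to mean that $A$ is a member of the class of matrices which represent $\mathfrak{M}$. From Theorem 4.1 we see that, for an $n$-dimensional module $\mathfrak{M}$, we can select a basis $y_1, \dots, y_n$ such that
$$y_1 = a_{11}x_1, \quad y_2 = a_{21}x_1 + a_{22}x_2, \quad \dots, \quad y_n = a_{n1}x_1 + \cdots + a_{nn}x_n, \qquad (4)$$
where $a_{\nu\nu} > 0$ $(1 \le \nu \le n)$ and $0 \le a_{\mu\nu} < a_{\nu\nu}$ $(\mu > \nu)$. This is a standard form for a basis, or a standard basis.

Theorem 9.3. Let $\mathfrak{M}$ and $\mathfrak{N}$ be two modules. A necessary and sufficient condition for $\mathfrak{N}$ to be included in $\mathfrak{M}$ is that the matrix associated with $\mathfrak{M}$ is a right divisor of the matrix associated with $\mathfrak{N}$.

Proof. Let the bases for $\mathfrak{M}$ and $\mathfrak{N}$ be $y_1, \dots, y_n$ and $z_1, \dots, z_n$, and let the associated matrices be $A = (a_{ij})$ and $B = (b_{ij})$ respectively. If $\mathfrak{N}$ is included in $\mathfrak{M}$, then each $z_i$ is an integral combination of $y_1, \dots, y_n$, say
$$\begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix} = C\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad\text{so that}\quad B\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = CA\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},$$
and hence $B = CA$. Conversely, if $B = CA$, then
$$\begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix} = B\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = CA\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = C\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix},$$
so that $\mathfrak{N}$ is included in $\mathfrak{M}$. □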
Definition 9.3. Suppose that the difference between two linear forms $z_1$ and $z_2$ is a member of the module $\mathfrak{M}$. Then we say that $z_1$ and $z_2$ are congruent mod $\mathfrak{M}$, and we write $z_1 \equiv z_2 \pmod{\mathfrak{M}}$. The relation of being congruent mod $\mathfrak{M}$ is reflexive, symmetric and transitive, so that the family of all linear forms is partitioned into equivalence classes mod $\mathfrak{M}$. The number of such classes is called the norm of $\mathfrak{M}$ and is denoted by $N(\mathfrak{M})$; its existence has yet to be proved. Clearly $\mathfrak{M}$ itself is an equivalence class.
Theorem 9.4. Let $A$ correspond to the module $\mathfrak{M}$. Then
$$N(\mathfrak{M}) = |A|.$$
Proof. Since the matrices associated with $\mathfrak{M}$ have the same absolute value for their determinants, we may assume that the basis chosen is the standard basis in (4). Any linear form
$$a_1x_1 + \cdots + a_nx_n$$
gives another one with $0 \le a_n < a_{nn}$ by subtracting a suitable multiple of $y_n = a_{n1}x_1 + \cdots + a_{nn}x_n$. We may further subtract multiples of $y_{n-1} = a_{n-1,1}x_1 + \cdots + a_{n-1,n-1}x_{n-1}$ so that $0 \le a_{n-1} < a_{n-1,n-1}$, and so on. Thus every linear form is congruent to a linear form
$$a_1x_1 + \cdots + a_nx_n, \qquad 0 \le a_\nu < a_{\nu\nu} \quad (1 \le \nu \le n).$$
The total number of such linear forms is $a_{11}a_{22} \cdots a_{nn} = |A|$, and moreover no two such linear forms are congruent. The theorem is proved. □

From Theorem 9.3 and Theorem 9.4 we have

Theorem 9.5. Let $\mathfrak{N} \subset \mathfrak{M}$ and let $A$ and $B$ be matrices associated with $\mathfrak{M}$ and $\mathfrak{N}$ respectively. Then, in the partitioning of $\mathfrak{M}$ into congruence classes mod $\mathfrak{N}$, the number of classes is equal to
$$\frac{N(\mathfrak{N})}{N(\mathfrak{M})} = \frac{|B|}{|A|}. \qquad □$$
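The reduction to the standard basis (4) and the counting argument in the proof of Theorem 9.4 can be sketched in Python. The conventions here are ours: a module is given by the rows of a nonsingular integer matrix, row $i$ listing the coefficients of $y_i$; only unimodular row operations are used, so the result is left associated with the input, as (3) requires. The function names `standard_basis`, `reduce_form` and `norm` are hypothetical.

```python
def standard_basis(A):
    """Row-reduce a nonsingular integer matrix to the lower-triangular form (4).

    Row i lists the coefficients of y_i in x_1, ..., x_n.  Only unimodular row
    operations are used, so the result is left associated with A.
    """
    n = len(A)
    H = [list(row) for row in A]
    for c in range(n - 1, -1, -1):
        # Euclid on column c among rows 0..c, leaving a single pivot in row c.
        while True:
            rows = [i for i in range(c + 1) if H[i][c] != 0]
            p = min(rows, key=lambda i: abs(H[i][c]))  # nonempty since A is nonsingular
            H[c], H[p] = H[p], H[c]
            for i in range(c):
                q = H[i][c] // H[c][c]
                H[i] = [x - q * y for x, y in zip(H[i], H[c])]
            if all(H[i][c] == 0 for i in range(c)):
                break
        if H[c][c] < 0:
            H[c] = [-x for x in H[c]]  # make a_cc > 0
    # Reduce the entries below the diagonal so that 0 <= a_{mu,nu} < a_{nu,nu}.
    for mu in range(n):
        for nu in range(mu - 1, -1, -1):
            q = H[mu][nu] // H[nu][nu]
            H[mu] = [x - q * y for x, y in zip(H[mu], H[nu])]
    return H


def reduce_form(coeffs, H):
    """Canonical representative of a linear form mod the module, as in Theorem 9.4."""
    f = list(coeffs)
    for v in range(len(H) - 1, -1, -1):
        q = f[v] // H[v][v]  # afterwards 0 <= f[v] < a_vv
        f = [x - q * y for x, y in zip(f, H[v])]
    return tuple(f)


def norm(A):
    """N(M) = a_11 a_22 ... a_nn for the module spanned by the rows of A."""
    H = standard_basis(A)
    result = 1
    for i in range(len(H)):
        result *= H[i][i]
    return result
```

For the module with basis $y_1 = 2x_1 + x_2$, $y_2 = x_1 + 3x_2$, `standard_basis([[2, 1], [1, 3]])` gives the standard basis $5x_1$, $2x_1 + x_2$, and the forms fall into $5 = |A|$ classes, in agreement with Theorem 9.4.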
The set $\mathfrak{T} = \{x_1, \dots, x_n\}$ with indeterminates $x_1, \dots, x_n$ can also be represented by other indeterminates. If we let
$$\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = U\begin{pmatrix} x_1' \\ \vdots \\ x_n' \end{pmatrix},$$
where $U = (u_{ij})$ is a modular matrix, then $\mathfrak{T}$ can also be represented by $x_1', \dots, x_n'$; that is, $\mathfrak{T} = \{x_1, \dots, x_n\} = \{x_1', \dots, x_n'\}$. Let a module $\mathfrak{M}$, together with its basis $y_1, \dots, y_n$ corresponding to the indeterminates $x_1, \dots, x_n$, have the associated matrix $A$. We now consider the associated matrix corresponding to a change of indeterminates to $x_1', \dots, x_n'$. From
$$\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = A\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = AU\begin{pmatrix} x_1' \\ \vdots \\ x_n' \end{pmatrix}$$
we see that the required matrix corresponding to the indeterminates $x_1', \dots, x_n'$ is $AU$. This means that the relation of right association corresponds to the change of indeterminates, or the change of basis for the representation of $\mathfrak{T}$. Also, from (3) we see that the relation of left association corresponds to the change of basis for the module. It therefore follows from Theorem 5.1 that each fixed $n$-dimensional module $\mathfrak{M}$, after suitable changes of bases for the module and for $\mathfrak{T}$, has an associated matrix which is a diagonal matrix
$$[d_1, \dots, d_n] \qquad (d_1 > 0, \dots, d_n > 0).$$
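From Theorem 7.2 and Theorem 9.5 we have

Theorem 9.6. Let $\mathfrak{M}$ be an $n$-dimensional module. Then there is a chain
$$\mathfrak{T} = \mathfrak{M}_0 \supset \mathfrak{M}_1 \supset \cdots \supset \mathfrak{M}_l = \mathfrak{M} \qquad (5)$$
such that the numbers $p_i = N(\mathfrak{M}_i)/N(\mathfrak{M}_{i-1})$ $(1 \le i \le l)$ are prime numbers. □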
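The set of forms belonging to both the modules $\mathfrak{M}_1$ and $\mathfrak{M}_2$ is also a module, which is called the intersection of $\mathfrak{M}_1$ and $\mathfrak{M}_2$, and we denote it by $\mathfrak{M}_m$. Also the set of forms obtained from addition and subtraction of members belonging to $\mathfrak{M}_1$ and $\mathfrak{M}_2$ forms a module which is called the sum of $\mathfrak{M}_1$ and $\mathfrak{M}_2$, and we denote it by $\mathfrak{M}_d$. We then have:

Theorem 9.7. Let the matrices $M_1$, $M_2$, $M_m$, $M_d$ be associated with the modules $\mathfrak{M}_1$, $\mathfrak{M}_2$, $\mathfrak{M}_m$, $\mathfrak{M}_d$ respectively. Then $M_m$ is a least common multiple of $M_1$ and $M_2$, and $M_d$ is a greatest common divisor of $M_1$ and $M_2$.

Proof. Since $\mathfrak{M}_m$ is included in both $\mathfrak{M}_1$ and $\mathfrak{M}_2$, Theorem 9.3 shows that $M_m$ is a common multiple of $M_1$ and $M_2$. If $M_3 = B_1M_1 = B_2M_2$ is a common multiple of $M_1$ and $M_2$, and $\mathfrak{M}_3$ is the module with which the matrix $M_3$ is associated, then
$$\mathfrak{M}_3 \subset \mathfrak{M}_1, \qquad \mathfrak{M}_3 \subset \mathfrak{M}_2,$$
and hence $\mathfrak{M}_3 \subset \mathfrak{M}_m$, so that $M_m$ is a right divisor of $M_3$. Thus $M_m$ is a least common multiple of $M_1$ and $M_2$. The proof that $M_d$ is a greatest common divisor of $M_1$ and $M_2$ is similar. □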
Chapter 15. p-adic Numbers
15.1 Introduction

The purpose of this chapter is to introduce the theory of p-adic numbers due to Hensel. This theory has extensive applications in number theory, algebraic geometry and algebraic functions, and is an important theory in the study of modern algebra. Before we give the rigorous definitions we give a simple introduction as to how we obtain the p-adic numbers. We recall the method of solution to the congruence
$$f(x) \equiv 0 \pmod{p^l}, \qquad (1)$$
which we discussed in Chapter 2; here $f(x)$ is a polynomial with integer coefficients and $p$ is a prime number. Our method was first to solve the congruence
$$f(x) \equiv 0 \pmod{p}. \qquad (2)$$
If (2) has a solution $a_0$ $(0 \le a_0 < p)$ and $f'(a_0) \not\equiv 0 \pmod{p}$, then we let
$$x = a_0 + py, \qquad 0 \le y < p,$$
and we consider the congruence
$$f(a_0 + py) \equiv 0 \pmod{p^2},$$
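The step from (2) to the modulus $p^2$ described above can be repeated for $p^3, p^4, \dots$ as long as $f'(a_0) \not\equiv 0 \pmod p$. A minimal Python sketch, assuming $f$ is given by its list of integer coefficients (constant term first); `hensel_lift_root` is a hypothetical helper name, and `pow(x, -1, p)` for the modular inverse needs Python 3.8 or later.

```python
def hensel_lift_root(coeffs, a0, p, k):
    """Lift a root a0 of f(x) = 0 (mod p), with f'(a0) != 0 (mod p), to a root mod p**k."""
    def f(x):
        return sum(c * x**i for i, c in enumerate(coeffs))

    def df(x):
        return sum(i * c * x**(i - 1) for i, c in enumerate(coeffs) if i > 0)

    if f(a0) % p != 0 or df(a0) % p == 0:
        raise ValueError("need f(a0) = 0 and f'(a0) != 0 (mod p)")
    inv = pow(df(a0) % p, -1, p)  # inverse of f'(a0) modulo p
    x = a0 % p
    for l in range(1, k):
        # x solves f(x) = 0 (mod p^l).  Since f(x + p^l y) = f(x) + p^l y f'(x) (mod p^(l+1))
        # and f'(x) = f'(a0) (mod p), choose y with f(x)/p^l + y f'(a0) = 0 (mod p).
        y = (-(f(x) // p**l) * inv) % p
        x += p**l * y
    return x
```

For $f(x) = x^2 + 1$, $p = 5$, $a_0 = 2$, the successive roots are $2, 7, 57, 182$ modulo $5, 5^2, 5^3, 5^4$; these are the partial sums of a 5-adic square root of $-1$.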
$$\frac{m}{n} > \frac{\log\varphi(a)}{\log\varphi(a_0)}. \qquad (1)$$
We may view $\log\varphi(a)/\log\varphi(a_0)$ as the greatest lower bound of the set of rational numbers satisfying (1). If $\varphi'$ and $\varphi$ are equivalent, then $\log\varphi'(a)/\log\varphi'(a_0)$ is also the greatest lower bound of this set of rational numbers. It follows that, for any rational $a \neq 0$,
$$\frac{\log\varphi(a)}{\log\varphi(a_0)} = \frac{\log\varphi'(a)}{\log\varphi'(a_0)}.$$
This means that there exists a positive constant $s$, depending only on $\varphi$ and $\varphi'$, namely
$$s = \frac{\log\varphi'(a_0)}{\log\varphi(a_0)} > 0,$$
such that $\log\varphi'(a) = s\log\varphi(a)$ holds for all rational $a \neq 0$. Therefore $\varphi'(a) = \varphi^s(a)$. □
Definition 3.2. Let $\varphi$ be a valuation and suppose that there exists a positive integer $n_0 > 1$ such that $\varphi(n_0) > 1$. Then we call $\varphi$ an Archimedian valuation. A non-Archimedian valuation $\varphi$ is one such that $\varphi(n) \le 1$ for all positive integers $n$. The valuation $\varphi(a) = |a|$ is Archimedian; the identical valuation $\varphi_0$ and the p-adic valuation $\varphi(a) = |a|_p$ are non-Archimedian.
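The contrast in Definition 3.2 can be checked numerically. The Python sketch below computes $|a|_p = p^{-l}$ for $a = (r/s)p^l$, $p \nmid rs$, exactly with `fractions.Fraction`; the helper names `vp` and `abs_p` are ours, not the book's.

```python
from fractions import Fraction


def vp(n, p):
    """Exponent of the prime p in the nonzero integer n."""
    n, k = abs(n), 0
    while n % p == 0:
        n //= p
        k += 1
    return k


def abs_p(a, p):
    """p-adic valuation: |a|_p = p**(-l) when a = (r/s) p**l with p not dividing rs."""
    a = Fraction(a)
    if a == 0:
        return Fraction(0)
    l = vp(a.numerator, p) - vp(a.denominator, p)
    return Fraction(1, p) ** l
```

For instance `abs_p(12, 2)` is $1/4$ and `abs_p(Fraction(1, 3), 3)` is $3$. Every positive integer has $|n|_p \le 1$, which is the non-Archimedian property, whereas the ordinary absolute value already exceeds 1 at $n_0 = 2$.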
15.4 Archimedian Valuations

Theorem 4.1. Any Archimedian valuation is equivalent to the absolute value valuation.
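Proof. Let $\varphi$ be an Archimedian valuation and let $n$, $n'$ be two integers greater than 1. We represent $n'$ as
$$n' = a_0 + a_1n + \cdots + a_vn^v, \qquad 0 \le a_i < n, \quad a_v \neq 0.$$
Then
$$\varphi(n') \le \varphi(a_0) + \varphi(a_1)\varphi(n) + \cdots + \varphi(a_v)\varphi(n)^v.$$
From $\varphi(a_i) \le a_i < n$ $(i = 0, 1, \dots, v)$, we see that
$$\varphi(n') \le n(1 + \varphi(n) + \varphi(n)^2 + \cdots + \varphi(n)^v) \le n(1 + v)\max(1, \varphi(n)^v).$$
From the representation of $n'$ we know that $n^v \le n'$, so that $v \le \log n'/\log n$, and hence
$$\varphi(n') \le n\left(1 + \frac{\log n'}{\log n}\right)\max\left(1, \varphi(n)^{\log n'/\log n}\right).$$
Substituting $n'^h$ for $n'$ we have
$$\varphi(n')^h \le n\left(1 + h\frac{\log n'}{\log n}\right)\max\left(1, \varphi(n)^{h\log n'/\log n}\right),$$
or
$$\varphi(n') \le \left(n\left(1 + h\frac{\log n'}{\log n}\right)\right)^{1/h}\max\left(1, \varphi(n)^{\log n'/\log n}\right).$$
Letting $h \to \infty$ we have
$$\varphi(n') \le \max\left(1, \varphi(n)^{\log n'/\log n}\right). \qquad (1)$$
This holds for all $n, n' > 1$. By the Archimedian property there exists $n_0 > 1$ such that $\varphi(n_0) > 1$. Therefore, taking $n' = n_0$ in (1), we get $1 < \max(1, \varphi(n)^{\log n_0/\log n})$, whence $1 < \varphi(n)^{\log n_0/\log n}$. Therefore $\varphi(n) > 1$ whenever $n > 1$, and (1) may be rewritten as
$$\varphi(n') \le \varphi(n)^{\log n'/\log n}, \quad\text{or}\quad \frac{\log\varphi(n')}{\log n'} \le \frac{\log\varphi(n)}{\log n}.$$
By the symmetry of $n$ and $n'$ we deduce that
$$\frac{\log\varphi(n')}{\log n'} = \frac{\log\varphi(n)}{\log n},$$
and this implies the existence of a positive constant $s$, depending only on $\varphi$, such that
$$\frac{\log\varphi(n)}{\log n} = s > 0, \qquad n > 1.$$
Therefore $\varphi(n) = n^s$. Also, from $\varphi(n) \le n$ we know that $s \le 1$. Next, from $\varphi(-n) = \varphi(n)$ we see that $\varphi(n) = |n|^s$ for all integers $n$ such that $|n| > 1$. Finally, from 2) we see that, for all rational numbers $a$,
$$\varphi(a) = |a|^s, \qquad 0 < s \le 1. \qquad □$$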
15.5 Non-Archimedian Valuations

We saw in §2 that for the p-adic valuation $\varphi(a) = |a|_p$ we have $|a + b|_p \le \max(|a|_p, |b|_p)$, with equality when $|a|_p \neq |b|_p$. We now prove that all non-Archimedian valuations share this property.

Theorem 5.1. Let $\varphi$ be a non-Archimedian valuation. Then
3') $\varphi(a + b) \le \max(\varphi(a), \varphi(b))$.
Also, if $\varphi(a) \neq \varphi(b)$, then
3'') $\varphi(a + b) = \max(\varphi(a), \varphi(b))$.
Conversely, if a valuation $\varphi$ satisfies 3'), then $\varphi$ is non-Archimedian.
Proof. From the Binomial theorem
$$(a + b)^n = a^n + \binom{n}{1}a^{n-1}b + \cdots + \binom{n}{n-1}ab^{n-1} + b^n,$$
and the inequality $\varphi(n) \le 1$, which holds for a non-Archimedian valuation $\varphi$ and positive integers $n$, we see that
$$\varphi((a + b)^n) \le \varphi(a)^n + \varphi(a)^{n-1}\varphi(b) + \cdots + \varphi(a)\varphi(b)^{n-1} + \varphi(b)^n \le (n + 1)\max(\varphi(a)^n, \varphi(b)^n),$$
or
$$\varphi(a + b) \le (n + 1)^{1/n}\max(\varphi(a), \varphi(b)),$$
and 3') follows from this by letting $n \to \infty$. If $\varphi(a) \neq \varphi(b)$, we may assume that $\varphi(b) < \varphi(a)$. … $\varphi^s$ $(s > 0)$ is always a valuation regardless of whether $s \le 1$. This is because, from 3'),
$$\varphi^s(a + b) \le \max(\varphi^s(a), \varphi^s(b)) \le \varphi^s(a) + \varphi^s(b),$$
which gives 3). Given any non-Archimedian valuation $\varphi$ we put
$$w(a) = -\log\varphi(a),$$
where the base of the logarithm is any real number greater than 1. The choice of the base has little relevance because $\varphi^s$ $(s > 0)$ is also a non-Archimedian valuation. From the properties of $\varphi$ we see that $w$ has the following properties.
i) If $a \neq 0$, then $w(a)$ is a real number, and $w(0) = \infty$;
ii) $w(ab) = w(a) + w(b)$;
iii) $w(a + b) \ge \min(w(a), w(b))$;
iv) $w(a + b) = \min(w(a), w(b))$ if $w(a) \neq w(b)$.
If $\varphi$ is not the identical valuation, then there must be a rational $a_0$ such that $0 < w(a_0) < \infty$. We also note that $w(1) = 0$, $w(-a) = w(a)$ and $w(n) \ge 0$ for integers $n$.

Theorem 5.2. The following is a necessary and sufficient condition for the equivalence of two non-identical non-Archimedian valuations $\varphi$ and $\varphi'$: there exists $s > 0$ such that, for every rational $a \neq 0$,
$$w'(a) = sw(a),$$
where $w'(a) = -\log\varphi'(a)$ and $w(a) = -\log\varphi(a)$. □
Theorem 5.3. Every non-identical non-Archimedian valuation $\varphi$ is equivalent to some p-adic valuation $|a|_p$.

Proof. First $w(n) \ge 0$ for integers $n$, and from $\varphi \neq \varphi_0$ there exists an integer $m \neq 1$ such that $w(m) > 0$. We next show that the set of integers satisfying this inequality forms a modulus. This is easy since if $w(n) > 0$ and $w(n') > 0$, then $w(n \pm n') \ge \min(w(n), w(n')) > 0$ by iii). From Theorem 1.4.3 we know that there exists a least positive integer $g$ in the modulus such that $g$ divides every member of the modulus. Obviously $g > 1$, and we now prove that $g$ is a prime number. Suppose, if possible, that $g = g'g''$ $(g' > 1, g'' > 1)$. Then
$$w(g) = w(g'g'') = w(g') + w(g'').$$
Since $w(g)$ is positive and $w(g')$, $w(g'')$ are non-negative, it follows that at least one of $w(g')$ and $w(g'')$ is positive. But $g'$ and $g''$ are less than $g$, and this contradicts the fact that $g$ is the least positive integer in the modulus. Therefore $g$ is a prime number, which we shall now denote by $p$. We have now proved that $w(n) = 0$ if $p \nmid n$, and $w(n) > 0$ if $p \mid n$. Corresponding to any rational number $a \neq 0$ we have the unique representation $a = (r/s)p^l$ $(s > 0)$, where $r$, $s$ are coprime integers, $p \nmid rs$ and $l$ is an integer. Therefore
$$w(a) = w\left(\frac{r}{s}\right) + lw(p) = w(r) - w(s) + lw(p) = lw(p).$$
Now $w'(a) = -\log|a|_p = l\log p$, so that
$$\frac{w(p)}{\log p}\,w'(a) = w(a).$$
Let $s = w(p)/\log p$, and the result follows from Theorem 5.2. □
15.6 The $\varphi$-Extension of the Rationals

Readers who are familiar with Cantor's method for the construction of real numbers in mathematical analysis should have no difficulty with this and the next section.

Let $\varphi$ be a valuation, and we shall write $\{a_n\}$ for a sequence $a_1, a_2, \dots, a_n, \dots$ of rational numbers.

Definition 6.1. By a fundamental sequence, or a $\varphi$-convergent sequence, we mean a sequence $\{a_n\}$ which satisfies the following condition: Given any rational number $\varepsilon > 0$, there exists a positive integer $N$ $(= N(\varepsilon))$ such that $\varphi(a_m - a_n) < \varepsilon$ whenever $m, n > N$. For example, the constant sequence, where $a_1 = a_2 = \cdots = a_n = \cdots = a$, is a fundamental sequence which we shall denote by $\{a\}$. If $\{a_n\}$ is a fundamental sequence, then there exists $A$ such that $\varphi(a_n) \le A$ for all $n$. We define the sum, the difference and the product of two sequences by
$$\{a_n\} \pm \{b_n\} = \{a_n \pm b_n\}, \qquad \{a_n\}\{b_n\} = \{a_nb_n\}.$$
From
$$\varphi((a_m \pm b_m) - (a_n \pm b_n)) \le \varphi(a_m - a_n) + \varphi(b_m - b_n)$$
and
$$\varphi(a_mb_m - a_nb_n) \le \varphi(a_m)\varphi(b_m - b_n) + \varphi(b_n)\varphi(a_m - a_n)$$
we see at once that the sum, the difference and the product of two fundamental sequences are fundamental.

Definition 6.2. Let $\{a_n\}$ be a sequence such that there exists a rational number $a$ with the following property: Given any rational number $\varepsilon > 0$, there exists a positive integer $N$ $(= N(\varepsilon))$ such that $\varphi(a_n - a) < \varepsilon$ whenever $n > N$. Then we say that $\{a_n\}$ has the $\varphi$-limit $a$, and we write $\varphi\text{-}\lim_{n\to\infty} a_n = a$. Obviously the $\varphi$-limit of $\{a\}$ is $a$. From $\varphi(a_m - a_n) \le \varphi(a_m - a) + \varphi(a_n - a)$ we see that the existence of a $\varphi$-limit implies that the sequence is fundamental. Note, however, that the converse does not follow; that is, not every fundamental sequence possesses a $\varphi$-limit. Let $\{a_n\}$ and $\{b_n\}$ have the $\varphi$-limits $a$ and $b$. Then the sum, the difference and the product also have $\varphi$-limits, namely $a + b$, $a - b$ and $ab$ respectively. Also, if $\varphi\text{-}\lim_{n\to\infty} a_n = a$, then $\lim_{n\to\infty}\varphi(a_n) = \varphi(a)$.

Definition 6.3. By a null sequence we mean a sequence having the $\varphi$-limit 0. We denote by $\{0\}$ the class of all null sequences.
Example 1. If $\varphi(a) = |a|$, then $\{a_n = 1/n\}$ is a null sequence.

Example 2. If $\varphi(a) = |a|_p$, then $\{a_n = p^n\}$ is a null sequence.

It is easy to prove that the sum of two null sequences is a null sequence; so is the product of a null sequence and a fundamental sequence. We now define the quotient of two sequences. Let $\{b_n\}$ be a non-null sequence. Then the quotient $\{a_n\}/\{b_n\}$ is defined to be the sequence $\{a_nb_n^{-1}\}$. Observe that since $\{b_n\}$ is not a null sequence we may discard those terms which are zero without affecting the discussion. If $\{a_n\}$ is a fundamental sequence but not a null sequence, then there exist a positive rational number $c$ and a positive integer $N$ such that $\varphi(a_n) > c > 0$ whenever $n > N$. It is not difficult to deduce from this that the quotient $\{a_n\}/\{b_n\}$ ($\{b_n\}$ not null) of fundamental sequences is a fundamental sequence.

Definition 6.4. Let $\{a_n\}$ and $\{b_n\}$ be two fundamental sequences whose difference
$\{a_n - b_n\}$ is a null sequence. Then we say that $\{a_n\}$ and $\{b_n\}$ are congruent, and we write $\{a_n\} \equiv \{b_n\} \pmod{\{0\}}$.

Being congruent is an equivalence relation, and the set of fundamental sequences is now partitioned into equivalence classes. From each class we may select a fundamental sequence $\{a_n\}$ to represent the class $\overline{\{a_n\}}$. We can now define the sum, the difference, the product and the quotient of two classes $\overline{\{a_n\}}$ and $\overline{\{b_n\}}$. We let $\{a_n\}$ and $\{b_n\}$ be the representatives respectively, and we define
$$\overline{\{a_n\}} \pm \overline{\{b_n\}} = \overline{\{a_n \pm b_n\}}, \qquad \overline{\{a_n\}} \cdot \overline{\{b_n\}} = \overline{\{a_nb_n\}},$$
and, when $\overline{\{b_n\}} \neq \overline{\{0\}}$, we define $\overline{\{a_n\}} \cdot \overline{\{b_n\}}^{-1} = \overline{\{a_nb_n^{-1}\}}$. It is easy to verify that the definitions are independent of the choices of the representatives. The aggregate of these classes is called the $\varphi$-extension of the rationals, and each class is called a number in the $\varphi$-extension. When $\varphi(a) = |a|$, the $\varphi$-extension coincides with the set of real numbers. When $\varphi(a) = |a|_p$ we call the $\varphi$-extension the set of p-adic numbers. This then gives a rigorous definition of a p-adic number. Our next task is to give a concrete representation of a p-adic number. The aggregate of classes contains the class $\overline{\{a\}}$ ($a$ rational), and each fundamental sequence in the class is $\varphi$-convergent to the same rational number $a$, that is, $a$ is their $\varphi$-limit. We shall write $\overline{\{a\}} = a$, since now there is a one-to-one correspondence between these classes and the set of rational numbers. Since there are fundamental sequences which are not $\varphi$-convergent to any rational number, we see that the $\varphi$-extension of the rationals is an aggregate which is larger than the set of all rational numbers. In general we let $\overline{\{a_n\}}$ be the number to which the fundamental sequences in it $\varphi$-converge. That is, we define
$$\varphi\text{-}\lim_{n\to\infty} a_n = \overline{\{a_n\}}.$$
We should add that, when $\{a_n\}$ and $\{a_n'\}$ belong to the same class, $\varphi\text{-}\lim_{n\to\infty} a_n = \varphi\text{-}\lim_{n\to\infty} a_n'$.
In the above discussion the valuation is defined only in the field of the rationals. We shall now extend its definition to the $\varphi$-extension of the rationals by
$$\varphi(\overline{\{a_n\}}) = \lim_{n\to\infty}\varphi(a_n).$$
We should point out that in this definition $\varphi(\overline{\{a_n\}})$ is independent of the choice of $\{a_n\}$. That is, if $\{a_n\} \equiv \{a_n'\} \pmod{\{0\}}$, then $\lim_{n\to\infty}\varphi(a_n) = \lim_{n\to\infty}\varphi(a_n')$. This can easily be proved from $|\varphi(a_n) - \varphi(a_n')| \le \varphi(a_n - a_n')$. It is convenient to use Greek letters $\alpha, \beta, \gamma, \dots$ to denote the classes. We have the following three properties for $\varphi(\alpha)$:
1) $\varphi(\alpha) \ge 0$, with equality if and only if $\alpha = \overline{\{0\}}$;
2) $\varphi(\alpha\beta) = \varphi(\alpha)\varphi(\beta)$;
3) $\varphi(\alpha + \beta) \le \varphi(\alpha) + \varphi(\beta)$.

Exercise 1. Show that equivalent valuations give the same extension of the rationals.

Exercise 2. Let $\varphi$ be a non-Archimedian valuation. Prove that $\{a_n\}$ is convergent if and only if $\lim_{n\to\infty}\varphi(a_{n+1} - a_n) = 0$.
15.7 The Completeness of the Extension

In the previous section we constructed the $\varphi$-extension of the rationals from the fundamental sequences of rational numbers, and we saw that the $\varphi$-extension is larger than the set of rationals. We then extended the domain of definition of $\varphi$ from the rationals to the $\varphi$-extension, giving a definition of $\varphi(\alpha)$ where $\alpha$ is a class of fundamental sequences. We now ask the following: If we repeat the process to obtain another $\varphi$-extension from the $\varphi$-extension already obtained, do we have a still larger aggregate than the first $\varphi$-extension? If the answer is no, then we say that the extension is complete. In order to discuss this we have to consider sequences $\{\alpha_l\}$ of classes, and define the terms fundamental sequence, class, $\varphi$-limit, null sequence, etc. all over again. It turns out that the $\varphi$-extension is complete, but we shall omit the proof.

Theorem 7.1. The $\varphi$-extension of the rationals is complete in the sense that every fundamental (or $\varphi$-convergent) sequence $\{\alpha_l\}$ has a $\varphi$-limit. □
15.8 The Representation of p-adic Numbers

In this section we let $\varphi(a) = |a|_p$, and we examine the representation of p-adic numbers.

1) We first consider the p-adic representation of a rational number
$$\frac{a}{b}, \qquad (a, b) = 1, \quad p \nmid b.$$
For this we examine the solution $x_l$ of the congruence
$$bx \equiv a \pmod{p^l}, \qquad 0 \le x < p^l.$$
… $l > L(\varepsilon)$), we know that $\{x_l\}$ is $\varphi$-convergent. This means that the limit
$$\varphi\text{-}\lim_{l\to\infty} x_l$$
in the $\varphi$-extension is the p-adic representation of the rational number $a/b$ $(p \nmid b)$.

2) We next deal with the p-adic representation for the rational number
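$$\frac{a}{p^mb}, \qquad (a, b) = 1, \quad p \nmid b, \quad m \ge 0.$$
The general p-adic representation of a rational number is the power series
$$p^{-m}(a_0 + a_1p + a_2p^2 + \cdots), \qquad 0 \le a_i < p, \quad m \ge 0. \qquad (1)$$
If, for this power series (1), we have
$$a_{l+v} = a_{l+v+t} = a_{l+v+2t} = \cdots = a_{l+v+nt} = \cdots \qquad (v = 1, 2, \dots, t),$$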
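where $l$ and $t$ are fixed integers, $t \ge 1$, then we say that (1) is periodic, and in this case we may rewrite it as
$$p^{-m}\big((a_0 + a_1p + \cdots + a_lp^l) + p^{l+1}(a_{l+1} + a_{l+2}p + \cdots + a_{l+t}p^{t-1}) + p^{l+t+1}(a_{l+1} + a_{l+2}p + \cdots + a_{l+t}p^{t-1}) + \cdots\big),$$
or simply
$$p^{-m}(A + p^{l+1}B + p^{l+t+1}B + p^{l+2t+1}B + \cdots),$$
where
$$A = a_0 + a_1p + \cdots + a_lp^l, \qquad B = a_{l+1} + a_{l+2}p + \cdots + a_{l+t}p^{t-1}.$$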
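Theorem 8.1. The p-adic representation of a rational number is a periodic power series in $p$; conversely, a periodic power series in $p$ is a rational number.

Proof. 1) If
$$\alpha = p^{-m}(A + p^{l+1}B + p^{l+t+1}B + \cdots),$$
where $A$ and $B$ are as above, then
$$\alpha p^m - A = p^{l+1}B + p^{l+t+1}B + p^{l+2t+1}B + \cdots = p^{l+1}B(1 + p^t + p^{2t} + \cdots).$$
Now
$$1 + p^t + p^{2t} + \cdots + p^{kt} = \frac{1 - p^{(k+1)t}}{1 - p^t},$$
and
$$\left|\frac{1}{1 - p^t} - \frac{1 - p^{(k+1)t}}{1 - p^t}\right|_p = \left|\frac{p^{(k+1)t}}{1 - p^t}\right|_p = p^{-(k+1)t} < \varepsilon$$
for all sufficiently large $k$, so that
$$1 + p^t + p^{2t} + \cdots + p^{kt} + \cdots = \frac{1}{1 - p^t}.$$
Therefore
$$\alpha p^m - A = p^{l+1}B\,\frac{1}{1 - p^t}, \quad\text{or}\quad \alpha = p^{-m}A + p^{l+1-m}B\,\frac{1}{1 - p^t},$$
so that $\alpha$ is a rational number.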
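2) We first consider the rational number
$$\alpha = \frac{r}{s}, \qquad |\alpha| < 1, \quad (r, s) = 1, \quad s > 0, \quad r < 0, \quad p \nmid s. \qquad (2)$$
Let the index of $p$ (mod $s$) be $t$, that is, $t$ is the least positive integer satisfying $p^t \equiv 1 \pmod{s}$. Let $1 - p^t = ms$, $m < 0$, so that
$$\alpha = \frac{r}{s} = \frac{mr}{1 - p^t}.$$
Since $|\alpha| < 1$, we have $0 < mr = |\alpha|(p^t - 1) < p^t$, so the number $mr$ has the representation
$$mr = b_0 + b_1p + \cdots + b_{t-1}p^{t-1}, \qquad 0 \le b_i < p.$$
Therefore
$$\alpha = (b_0 + b_1p + \cdots + b_{t-1}p^{t-1})(1 + p^t + p^{2t} + \cdots) = (b_0 + b_1p + \cdots + b_{t-1}p^{t-1}) + p^t(b_0 + b_1p + \cdots + b_{t-1}p^{t-1}) + \cdots,$$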
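which shows that $\alpha$ has a periodic power series in $p$ as representation. Next, let $\alpha$ be any positive rational, say $\alpha = a/b$, $(a, b) = 1$, $p^m \| b$. Then $\alpha$ has the representation
$$p^m\alpha = a_0 + a_1p + \cdots + a_vp^v + \frac{r}{s}, \qquad 0 \le a_i < p, \ \ldots$$
… convergent, it follows that its $\varphi$-limit
$$p^{-m}(a_0 + a_1p + a_2p^2 + \cdots), \qquad 0 \le a_i < p, \quad m \ge 0, \qquad (3)$$
does represent a p-adic number.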
Therefore a power series in $p$ of the form (3) represents a p-adic number. We now ask: How is a given p-adic number represented? From the previous section we know that any p-adic number is a limit of a $\varphi$-convergent sequence $\{a_l\}$ in the $\varphi$-extension of the rationals. But any rational number $a_l$ can be represented as a power series of the form (3). Our problem is thus solved if we can show that the limit of $\{a_l\}$, in the $\varphi$-extension of the rationals, also has this representation. Corresponding to each positive integer $t$, there exists a positive integer $L$ $(= L(t))$ such that
$$|a_l - a_{l'}|_p < \frac{1}{p^t} \qquad (l, l' > L).$$
This shows that, when $l > L$, the first $t + k$ $(k \ge 0)$ terms of the power series in $p$ representing $a_l, a_{l+1}, a_{l+2}, \dots$ must be equal. Since $t$ can be arbitrarily large, the required result follows. We have proved that all the power series in $p$ of the form (3), finite or infinite, together give the whole set of p-adic numbers.
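The constructions of this section are easy to carry out for a rational number. The Python sketch below (helper names ours) computes the first digits of $\alpha = p^{-m}(a_0 + a_1p + \cdots)$ by solving $bx \equiv a \pmod{p^l}$ as in part 1), and evaluates a periodic series by the closed form $p^{-m}A + p^{l+1-m}B/(1 - p^t)$ found in part 1) of the proof of Theorem 8.1. It assumes Python 3.8+ for `pow(b, -1, m)`.

```python
from fractions import Fraction


def padic_digits(alpha, p, count):
    """Return (m, digits) with alpha = p**(-m) * (a_0 + a_1 p + ...), m >= 0."""
    alpha = Fraction(alpha)
    num, den, m = alpha.numerator, alpha.denominator, 0
    while den % p == 0:  # pull the factor p**m out of the denominator
        den //= p
        m += 1
    # The solution x of den * x = num (mod p**count), 0 <= x < p**count,
    # gives the first `count` digits of the expansion of num/den.
    x = num * pow(den, -1, p**count) % p**count
    digits = []
    for _ in range(count):
        x, d = divmod(x, p)
        digits.append(d)
    return m, digits


def periodic_value(p, m, head, period):
    """Value of p**(-m) * (digits a_0..a_l in `head`, then `period` repeated forever)."""
    A = sum(d * p**i for i, d in enumerate(head))
    B = sum(d * p**i for i, d in enumerate(period))
    l1, t = len(head), len(period)  # l1 = l + 1
    return Fraction(1, p**m) * (A + Fraction(p**l1 * B, 1 - p**t))
```

For example, `padic_digits(Fraction(1, 3), 2, 6)` gives the digits $1, 1, 0, 1, 0, 1$, i.e. $1/3 = 1 + 2(1 + 2^2 + 2^4 + \cdots)$, and `periodic_value(2, 0, [1], [1, 0])` recovers $1/3$; the digits of $-1$ in base 2 are all equal to 1, illustrating Theorem 8.1 in both directions.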
15.9 Application
Although the notion of a p-adic number is introduced as such only in this chapter, it has already appeared several times in this book. One example was pointed out at the beginning of this chapter. The generalization of this example is known as Hensel's Lemma.
Theorem 9.1 (Hensel). Let f(x) be a polynomial with integer coefficients, and let f(x) ≡ g_0(x)h_0(x) (mod p), where g_0(x) and h_0(x) are polynomials which are coprime (mod p). Then, among the p-adic numbers, there are two polynomials g(x), h(x) such that g(x) ≡ g_0(x), h(x) ≡ h_0(x) (mod p), and f(x) = g(x)h(x).
Proof. Let g_l(x), h_l(x) be two polynomials satisfying (for l = 1 we may take g_1 = g_0, h_1 = h_0)
$$g_l(x) \equiv g_0(x), \qquad h_l(x) \equiv h_0(x) \pmod p$$
and
$$f(x) \equiv g_l(x)h_l(x) \pmod{p^l}.$$
Clearly g_l and h_l are coprime (mod p). Let
$$\frac{f(x) - g_l(x)h_l(x)}{p^l} \equiv t(x) \pmod p.$$
Since g_l(x) and h_l(x) are coprime (mod p), there are two polynomials φ(x) and ψ(x) such that t(x) ≡ φ(x)h_l(x) + ψ(x)g_l(x) (mod p). Let
$$g_{l+1}(x) = g_l(x) + p^l\varphi(x), \qquad h_{l+1}(x) = h_l(x) + p^l\psi(x),$$
so that g_{l+1}(x) ≡ g_0(x) and h_{l+1}(x) ≡ h_0(x) (mod p). Therefore
$$f(x) - g_{l+1}(x)h_{l+1}(x) \equiv f(x) - g_l(x)h_l(x) - p^l\bigl(\varphi(x)h_l(x) + \psi(x)g_l(x)\bigr) \equiv p^l\bigl(t(x) - \varphi(x)h_l(x) - \psi(x)g_l(x)\bigr) \equiv 0 \pmod{p^{l+1}}.$$
Since the degree of t(x) does not exceed the degree of g_l(x)h_l(x), we may assume that the degrees of φ(x) and ψ(x) do not exceed the degrees of g_l(x) and h_l(x) respectively. The coefficients of g_l(x) and h_l(x) are then φ-convergent, and so they converge to g(x) and h(x) respectively. The theorem is proved. □
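The proof is constructive. Take the special case where g_0(x) = x − r is linear. Coprimality (mod p) then means f′(r) ≢ 0 (mod p), and the step g_{l+1} = g_l + p^lφ turns into a Newton-type correction of the root r. Below is a minimal Python sketch of this special case only. The helper names are ours, and the general version with polynomial factors would also need polynomial arithmetic mod p.

```python
def poly_eval(coeffs, x, m):
    """Evaluate a polynomial (coefficients highest degree first) at x modulo m."""
    acc = 0
    for c in coeffs:
        acc = (acc * x + c) % m
    return acc


def poly_derivative(coeffs):
    n = len(coeffs) - 1
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]


def hensel_lift_root(coeffs, r, p, k):
    """Lift a root r of f (mod p) with f'(r) not ≡ 0 (mod p) to a root mod p**k."""
    if poly_eval(coeffs, r, p) != 0:
        raise ValueError("r is not a root of f modulo p")
    # pow raises ValueError when f'(r) ≡ 0 (mod p), i.e. the factors are not coprime.
    inv = pow(poly_eval(poly_derivative(coeffs), r, p), -1, p)
    modulus = p
    for _ in range(k - 1):
        modulus *= p
        # r is a root mod modulus // p; one correction makes it a root mod modulus
        r = (r - poly_eval(coeffs, r, modulus) * inv) % modulus
    return r


r = hensel_lift_root([1, 0, -2], 3, 7, 4)   # a 7-adic square root of 2, to 4 digits
print(r, (r * r - 2) % 7 ** 4)
```

The second printed number is 0, so the lifted r is a root of x² − 2 modulo 7⁴.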
Note: Lemma 7.10.1 can be given an interpretation in terms of p-adic numbers.
Chapter 16. Introduction to Algebraic Number Theory
16.1 Algebraic Numbers
Definition 1.1. By an algebraic number we mean a number θ which is a root of the algebraic equation
$$a_n\theta^n + a_{n-1}\theta^{n-1} + \cdots + a_1\theta + a_0 = 0, \tag{1}$$
where the coefficients a_r are rational numbers.
Examples of algebraic numbers are √2, i = √−1, and the rational numbers themselves. By clearing the denominators of all the fractions a_r in equation (1) we obtain an algebraic equation with integer coefficients. From now on we shall call an ordinary integer a rational integer, to distinguish it from an algebraic integer, which we shall define later. Algebraic numbers may therefore also be defined as the roots of algebraic equations with rational integer coefficients. If the equation (1) is irreducible and a_n ≠ 0, then we call n = ∂⁰f, the degree of the polynomial f on the left of (1), the degree of the algebraic number θ. For example, rational numbers have degree 1, and the number i has degree 2. Let the equation (1) be irreducible and let θ^{(1)}, θ^{(2)}, ..., θ^{(n)} be all its roots. From Theorem 4.2.2 we know that the θ^{(j)} are distinct, and that if θ^{(j)} satisfies a rational coefficient equation g(x) = 0, then so do the remaining n − 1 numbers. The degree of an algebraic number is therefore uniquely determined.
Theorem 1.1. The sum, the difference, the product and the quotient (not dividing by zero) of two algebraic numbers are algebraic. □
With the aid of the symmetric polynomial theorem, the proof of this theorem is only a simple exercise, and we shall omit many of the proofs in this chapter.
Definition 1.2. If the irreducible algebraic equation defining θ has rational integer coefficients and leading coefficient 1, then we call θ an algebraic integer. Examples of algebraic integers are √2, i, (1 + √5)/2, and the rational integers themselves.
Theorem 1.2. Any algebraic integer which is rational must be a rational integer. □
Theorem 1.3. The sum, the difference and the product of two algebraic integers are algebraic integers. □
Theorem 1.4. Let θ be an algebraic number. Then there exists a natural number q such that qθ is an algebraic integer. □
Definition 1.3. If θ and θ^{−1} are both algebraic integers, then we call θ a unit. Examples of units are i and 3 − 2√2.
Theorem 1.5. A necessary and sufficient condition for θ to be a unit is that θ satisfies an algebraic equation with rational integer coefficients, with leading coefficient 1 and last coefficient ±1. □
16.2 Algebraic Number Fields
Definition 2.1. Let F be a set of complex numbers with at least two distinct members. Suppose that, given any two members of F, their sum, difference, product and quotient (not dividing by 0) are also members of F. Then we call F a number field, or simply a field. An example of a field is the set of rational numbers, which from now on we shall denote by R. It is clear that every number field must contain the rational field R.
Theorem 2.1. Let θ be an algebraic number of degree n. Then the set of numbers of the form
$$a_0 + a_1\theta + \cdots + a_{n-1}\theta^{n-1}, \tag{1}$$
where the a_k are rational numbers, forms a field. Moreover, the numbers represented by (1) are all distinct. □
Definition 2.2. The field in Theorem 2.1 is called the single extension of R by θ, and we shall denote it by R(θ). For example, R(i) is the field of numbers of the form a + ib, where a and b are rational numbers.
Theorem 2.2. If θ ≠ 0, then R(θ) is the largest set of numbers obtained from θ by means of addition, subtraction, multiplication and division (except by 0). □
Definition 2.3. Let θ_1, ..., θ_t be algebraic numbers. The field obtained from these numbers by addition, subtraction, multiplication and division (except by 0) is called a finite extension of R and is denoted by R(θ_1, ..., θ_t).
Theorem 2.3. Every finite extension of R is a single extension. That is, given any finite extension R(θ_1, ..., θ_t), there exists an algebraic number θ such that R(θ) = R(θ_1, ..., θ_t). □
By this theorem we need only consider single extensions R(θ), which we now call algebraic number fields. We also call the degree of θ the degree of the field R(θ). For example, R(i) is a quadratic field, and R is the only field of degree 1.
Theorem 2.4. Let D run over all the rational integers not equal to 1 which have no square divisor other than 1. Then R(√D) runs over all the quadratic fields. □
16.3 Basis
In this section R(θ) denotes an algebraic number field of degree n. We set θ = θ^{(1)}, and let θ^{(2)}, ..., θ^{(n)} denote the remaining n − 1 roots of the irreducible equation defining θ. From the previous section we see that each number α ∈ R(θ) is representable as
$$\alpha = a(\theta) = a_0 + a_1\theta + \cdots + a_{n-1}\theta^{n-1},$$
where the a_j are rational numbers.
Definition 3.1. Let α^{(1)} = α. We put α^{(k)} = a(θ^{(k)}), k = 2, 3, ..., n, and we call these the conjugates of α. We also call the numbers
$$S(\alpha) = \alpha^{(1)} + \cdots + \alpha^{(n)} = a(\theta^{(1)}) + \cdots + a(\theta^{(n)}),$$
$$N(\alpha) = \alpha^{(1)}\cdots\alpha^{(n)} = a(\theta^{(1)})\cdots a(\theta^{(n)})$$
the trace and the norm of α respectively.
It is easy to see that S(α + β) = S(α) + S(β) and N(αβ) = N(α)N(β). Also, by the symmetric polynomial theorem, S(α) and N(α) are rational numbers, and if α is rational then S(α) = nα and N(α) = α^n. Next, if α is an algebraic integer, then so are the α^{(i)}, and hence so are S(α) and N(α). Since S(α) and N(α) are also rational, they must be rational integers. If α is a unit, then N(α)N(α^{−1}) = N(αα^{−1}) = N(1) = 1, and N(α), N(α^{−1}) are rational integers, so N(α) = ±1. Conversely, if α is an algebraic integer and N(α) = ±1, then α^{−1} = ±α^{(2)}⋯α^{(n)} is also an algebraic integer, and so α must be a unit. Therefore a necessary and sufficient condition for an algebraic integer α to be a unit is that N(α) = ±1.
Theorem 3.1. Let α ∈ R(θ) and let the irreducible equation satisfied by α be h(x) = 0, ∂⁰h = l. Also, let g(x) = ∏_{v=1}^{n}(x − α^{(v)}). Then g(x) is a polynomial with rational coefficients, and g(x) = c(h(x))^{n/l}, where l | n and c is a rational number.
Proof. That g(x) is a polynomial with rational coefficients follows at once from the symmetric polynomial theorem. Let α = a(θ). From h(α) = 0, the polynomial h(a(x)) vanishes at θ and hence at all its conjugates, so h(α^{(v)}) = h(a(θ^{(v)})) = 0. Thus every root of g(x) = 0 is also a root of h(x) = 0. Since h(x) is an irreducible polynomial, we must have h(x) | g(x). Let g(x) = h(x)g_1(x). If g_1(x) is a constant, then the required result is proved. Otherwise g_1(x) has zeros, these must also be zeros of h(x), and so h(x) | g_1(x). Let g_1(x) = h(x)g_2(x). We can repeat the argument, and since the degree of g(x) is finite we finally obtain g(x) = c(h(x))^{n/l}. □
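In a quadratic field the objects of Definition 3.1 and Theorem 3.1 are explicit. The conjugate of α = a + b√D is a − b√D, so S(α) = 2a, N(α) = a² − Db², and g(x) = x² − S(α)x + N(α). The Python sketch below (helper names are ours; exact arithmetic comes from `fractions`) computes these quantities. It checks the unit criterion N(α) = ±1 on the example 3 − 2√2. For rational α (b = 0) it shows g(x) = (x − a)², which is the case l = 1 of Theorem 3.1.

```python
from fractions import Fraction


def trace_norm_poly(a, b, D):
    """For alpha = a + b*sqrt(D) (a, b rational), return S(alpha), N(alpha) and
    the coefficients [1, -S, N] of g(x) = (x - alpha)(x - alpha')."""
    a, b = Fraction(a), Fraction(b)
    trace = 2 * a                  # alpha + alpha'
    norm = a * a - D * b * b       # alpha * alpha'
    return trace, norm, [Fraction(1), -trace, norm]


def multiply(x, y, D):
    """Product of a + b*sqrt(D) and c + d*sqrt(D), as a pair (rational part, sqrt(D) part)."""
    (a, b), (c, d) = x, y
    return (a * c + D * b * d, a * d + b * c)


print(trace_norm_poly(3, -2, 2))   # 3 - 2*sqrt(2): trace 6, norm 1, so it is a unit
print(trace_norm_poly(3, 0, 2))    # rational alpha = 3: g(x) = x^2 - 6x + 9 = (x - 3)^2
```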
It follows from this theorem that if α is an algebraic number of degree l, then there are l distinct numbers among α^{(1)}, ..., α^{(n)}, and each of them occurs n/l times.
Definition 3.2. Suppose that there exists a set of numbers α_1, ..., α_m in R(θ) such that any number in R(θ) is uniquely representable as a_1α_1 + ⋯ + a_mα_m, where the a_j (1 ≤ j ≤ m) are rational numbers. Then we call α_1, ..., α_m a basis for R(θ). It is easy to see that no one of α_1, ..., α_m is expressible as a linear combination of the other m − 1 numbers with rational coefficients. By Theorem 2.1, 1, θ, ..., θ^{n−1} form a basis for R(θ), so a basis certainly exists. Following the proof of Theorem 14.9.2, the reader can easily prove
Theorem 3.2. Every basis for R(θ) has precisely n elements. □
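Let α_1, ..., α_n and β_1, ..., β_n be two bases for R(θ). Then, from the definition of a basis, there are rational numbers a_{jk} (1 ≤ j, k ≤ n) such that
$$\alpha_j = \sum_{k=1}^{n} a_{jk}\beta_k \quad (1 \le j \le n), \qquad |a_{jk}| = \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix} \neq 0.$$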
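Definition 3.3. Let α_1, ..., α_n ∈ R(θ). By the discriminant of α_1, ..., α_n we mean the number
$$\Delta(\alpha_1, \dots, \alpha_n) = \begin{vmatrix} \alpha_1^{(1)} & \cdots & \alpha_1^{(n)} \\ \vdots & & \vdots \\ \alpha_n^{(1)} & \cdots & \alpha_n^{(n)} \end{vmatrix}^2.$$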
Theorem 3.3. The discriminant Δ(α_1, ..., α_n) possesses the following properties:
1) Δ(α_1, ..., α_n) is a rational number; and if α_1, ..., α_n are algebraic integers, then Δ(α_1, ..., α_n) is a rational integer.
2) Let α_1, ..., α_n and β_1, ..., β_n be two bases for R(θ), and α_j = Σ_{k=1}^{n} a_{jk}β_k (1 ≤ j ≤ n). Then
$$\Delta(\alpha_1, \dots, \alpha_n) = |a_{jk}|^2\,\Delta(\beta_1, \dots, \beta_n).$$
3) A necessary and sufficient condition for α_1, ..., α_n to be a basis for R(θ) is that Δ(α_1, ..., α_n) ≠ 0. □
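Theorem 3.4. Suppose that, among the numbers θ^{(1)}, ..., θ^{(n)}, r_1 are real and r_2 pairs are complex conjugates (r_1 + 2r_2 = n). Then, for any basis α_1, ..., α_n of R(θ), we have
$$(-1)^{r_2}\Delta(\alpha_1, \dots, \alpha_n) > 0.$$
Proof. By Theorem 3.3, a change of basis multiplies Δ by the positive factor |a_{jk}|². We therefore need only examine the case α_1 = 1, α_2 = θ, ..., α_n = θ^{n−1}. Now
$$\Delta(1, \theta, \dots, \theta^{n-1}) = \prod_{1 \le j < k \le n}\bigl(\theta^{(j)} - \theta^{(k)}\bigr)^2.$$
Let us denote by θ̄ the complex conjugate of θ. When θ^{(k)} ≠ θ̄^{(j)} we have ((θ^{(j)} − θ^{(k)})(θ̄^{(j)} − θ̄^{(k)}))² > 0. When θ^{(j)} is not real we have (θ^{(j)} − θ̄^{(j)})² < 0, and there are exactly r_2 factors of this latter kind. Therefore
$$(-1)^{r_2}\Delta(1, \theta, \dots, \theta^{n-1}) > 0. \qquad □$$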
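Theorem 3.4 can be checked numerically. The sketch below approximates all roots of a monic polynomial with the Durand–Kerner iteration, a standard numerical root finder that the text itself does not use. It counts r_1 and r_2 and evaluates Δ(1, θ, ..., θ^{n−1}) as the product of squared root differences from the proof. The iteration count and the tolerance for calling a root real are ad hoc assumptions.

```python
from itertools import combinations


def roots_durand_kerner(coeffs, iterations=1000):
    """Approximate all complex roots of a monic polynomial (highest degree first)."""
    n = len(coeffs) - 1

    def f(z):
        acc = 0j
        for c in coeffs:
            acc = acc * z + c
        return acc

    zs = [(0.4 + 0.9j) ** k for k in range(n)]   # customary non-symmetric starting points
    for _ in range(iterations):
        new = []
        for i, z in enumerate(zs):
            denom = 1 + 0j
            for j, w in enumerate(zs):
                if j != i:
                    denom *= z - w
            new.append(z - f(z) / denom)
        zs = new
    return zs


def power_basis_discriminant(coeffs, tol=1e-9):
    """Return (Delta(1, theta, ..., theta^(n-1)), r1, r2) for a root theta of the polynomial."""
    zs = roots_durand_kerner(coeffs)
    disc = 1 + 0j
    for u, w in combinations(zs, 2):
        disc *= (u - w) ** 2
    r1 = sum(1 for z in zs if abs(z.imag) < tol)
    r2 = (len(zs) - r1) // 2
    return disc.real, r1, r2


d, r1, r2 = power_basis_discriminant([1, -1, -2, -8])   # x^3 - x^2 - 2x - 8 from Section 16.4
print(round(d, 6), r1, r2, (-1) ** r2 * d > 0)
```

For the cubic of the example in Section 16.4 this gives Δ ≈ −2012 = −4 × 503 with r_1 = r_2 = 1, so (−1)^{r_2}Δ > 0 as the theorem predicts.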
16.4 Integral Basis
In the remaining part of this chapter we shall use the word integer to mean an algebraic integer.
Definition 4.1. Let ω_1, ..., ω_m be m integers in R(θ). If every integer in R(θ) can be expressed uniquely as a_1ω_1 + ⋯ + a_mω_m, where a_1, ..., a_m are rational integers, then we call ω_1, ..., ω_m an integral basis for R(θ).
Theorem 4.1. An integral basis exists. More specifically, let ω_1, ..., ω_n be a basis consisting of integers ω_j (1 ≤ j ≤ n) such that |Δ(ω_1, ..., ω_n)| is least. Then ω_1, ..., ω_n is an integral basis.
Proof. We can choose a natural number q so that qθ is an integer. Then 1, qθ, (qθ)², ..., (qθ)^{n−1} are integers which form a basis for R(θ), so a basis α_1, ..., α_n consisting of integers certainly exists. By Theorem 3.3, |Δ(α_1, ..., α_n)| is then a natural number, so among all such bases there is one, ω_1, ..., ω_n, for which it is least. We shall now prove that this basis is an integral basis. Suppose the contrary. Then there exists an integer ω = a_1ω_1 + ⋯ + a_nω_n in which some a_i is not a rational integer. We may assume without loss that a_1 is not a rational integer, say a_1 = g + t, where g is a rational integer and 0 < t < 1. Then ω'_1 = ω − gω_1 = tω_1 + a_2ω_2 + ⋯ + a_nω_n is also an integer, and ω'_1, ω_2, ..., ω_n still form a basis for R(θ). But, by Theorem 3.3,
$$\Delta(\omega_1', \omega_2, \dots, \omega_n) = t^2\,\Delta(\omega_1, \dots, \omega_n), \qquad 0 < t^2 < 1,$$
contradicting the minimal property of |Δ(ω_1, ..., ω_n)|. The theorem is proved. □
It follows from this theorem that an integral basis is a basis, so each integral basis consists of n elements.
Theorem 4.2. All integral bases have the same discriminant. That is, if ω_1, ..., ω_n and ω'_1, ..., ω'_n are two integral bases, then Δ(ω_1, ..., ω_n) = Δ(ω'_1, ..., ω'_n). □
Definition 4.2. By the discriminant of the field R(θ) we mean the discriminant of its integral basis. We shall denote the discriminant of R(θ) by Δ(R(θ)), or simply Δ.
Theorem 4.3 (Stickelberger). The discriminant of a field satisfies Δ ≡ 0 or 1 (mod 4).
Proof. Let i_1, ..., i_n be a permutation of 1, 2, ..., n, and let δ_{i_1⋯i_n} be 1 or −1 according as i_1, ..., i_n is an even or an odd permutation. Expanding the determinant whose square is Δ, we have
$$\bigl|\omega_j^{(i)}\bigr| = \sum_{(i_1, \dots, i_n)} \delta_{i_1\cdots i_n}\,\omega_1^{(i_1)}\cdots\omega_n^{(i_n)} = \sum_{(i_1, \dots, i_n)} \omega_1^{(i_1)}\cdots\omega_n^{(i_n)} + 2\eta = a + 2\eta.$$
Here η (minus the sum of the terms belonging to odd permutations) is an algebraic integer. Also a = Σ ω_1^{(i_1)}⋯ω_n^{(i_n)} is a symmetric function of θ^{(1)}, ..., θ^{(n)}, so a is rational and hence a rational integer. Therefore
$$\Delta = (a + 2\eta)^2 = a^2 + 4\eta(\eta + a).$$
Since the integer η(η + a) = (Δ − a²)/4 is rational, it is a rational integer. Therefore Δ ≡ a² ≡ 0 or 1 (mod 4). □
We shall now examine the quadratic field R(√D), where D is a squarefree rational integer. Each number in R(√D) is representable as α = (a + b√D)/2, where a, b are rational numbers. The trace and the norm of α are given by
$$S(\alpha) = a, \qquad N(\alpha) = \frac{a^2 - b^2D}{4}.$$
Theorem 4.4. In the quadratic field R(√D), a necessary and sufficient condition for α to be an integer is that a, b are both rational integers satisfying
$$\begin{cases} a \equiv b \pmod 2, & \text{when } D \equiv 1 \pmod 4;\\ a \equiv b \equiv 0 \pmod 2, & \text{when } D \equiv 2, 3 \pmod 4. \end{cases} \tag{1}$$
Proof. In a quadratic field, α is an integer if and only if S(α) and N(α) are rational integers, so the sufficiency of condition (1) follows at once. Conversely, if α is an integer, then a and (a² − b²D)/4 are rational integers, so that
$$b^2D = a^2 - 4N(\alpha)$$
is also a rational integer. Since D is squarefree, b must be a rational integer. The necessity of condition (1) now follows from a² − b²D ≡ 0 (mod 4). □
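The proof rests on the criterion that α is an integer exactly when S(α) and N(α) are rational integers. The following Python sketch (function names are ours) implements that criterion with exact rationals. It checks that the criterion agrees with condition (1) on a small grid of rational a, b. This is a sanity check, not a proof.

```python
from fractions import Fraction


def is_integer_by_trace_norm(a, b, D):
    """alpha = (a + b*sqrt(D))/2 is an integer iff S(alpha) = a and N(alpha) are rational integers."""
    a, b = Fraction(a), Fraction(b)
    norm = (a * a - b * b * D) / 4
    return a.denominator == 1 and norm.denominator == 1


def satisfies_condition_1(a, b, D):
    """Condition (1) of Theorem 4.4, for a squarefree D."""
    a, b = Fraction(a), Fraction(b)
    if a.denominator != 1 or b.denominator != 1:
        return False
    a, b = int(a), int(b)
    if D % 4 == 1:                 # Python's % is nonnegative, so this works for D < 0 too
        return (a - b) % 2 == 0
    return a % 2 == 0 and b % 2 == 0


for D in (-5, -3, -1, 2, 3, 5, 13):
    for k in (1, 2, 3):
        for i in range(-6, 7):
            for j in range(-6, 7):
                x, y = Fraction(i, k), Fraction(j, k)
                assert is_integer_by_trace_norm(x, y, D) == satisfies_condition_1(x, y, D)

print(is_integer_by_trace_norm(1, 1, 5), is_integer_by_trace_norm(1, 1, -5))   # True False
```

The last line shows that (1 + √5)/2 is an integer while (1 + √−5)/2 is not.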
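When D ≡ 1 (mod 4), (1 + √D)/2 is an integer in R(√D). From
$$\frac{a + b\sqrt{D}}{2} = \frac{a - b}{2} + b\,\frac{1 + \sqrt{D}}{2}$$
and
$$\begin{vmatrix} 1 & \sqrt{D} \\ 1 & -\sqrt{D} \end{vmatrix}^2 = 4D, \qquad \begin{vmatrix} 1 & \frac{1 + \sqrt{D}}{2} \\ 1 & \frac{1 - \sqrt{D}}{2} \end{vmatrix}^2 = D,$$
we have the following: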
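Theorem 4.5. Let D be a squarefree rational integer, and let
$$\Delta = \begin{cases} D, & \text{when } D \equiv 1 \pmod 4,\\ 4D, & \text{when } D \equiv 2, 3 \pmod 4, \end{cases} \qquad \omega = \begin{cases} \dfrac{1 + \sqrt{D}}{2}, & \text{when } D \equiv 1 \pmod 4,\\ \sqrt{D}, & \text{when } D \equiv 2, 3 \pmod 4. \end{cases}$$
Then Δ is the discriminant of R(√D), and 1, ω is an integral basis. The numbers 1, (Δ + √Δ)/2 also form an integral basis. □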
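Theorem 4.5 translates directly into a lookup. The small Python sketch below (names are ours) returns Δ and ω for a squarefree D. Its final assertion checks Stickelberger's congruence Δ ≡ 0 or 1 (mod 4) from Theorem 4.3 over a range of D. Python's `%` returns a nonnegative residue, so `D % 4 == 1` is the correct test for negative D as well.

```python
def is_squarefree(n):
    n = abs(n)
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        d += 1
    return True


def quadratic_field_data(D):
    """Discriminant and integral-basis element omega of R(sqrt(D)), following Theorem 4.5."""
    if D in (0, 1) or not is_squarefree(D):
        raise ValueError("D must be a squarefree rational integer other than 0 and 1")
    if D % 4 == 1:
        return D, "(1 + sqrt(%d))/2" % D
    return 4 * D, "sqrt(%d)" % D


for D in (-5, -3, -1, 2, 5):
    print(D, *quadratic_field_data(D))

# Stickelberger (Theorem 4.3): every quadratic field discriminant is 0 or 1 mod 4.
assert all(quadratic_field_data(D)[0] % 4 in (0, 1)
           for D in range(-50, 51) if D not in (0, 1) and is_squarefree(D))
```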
This theorem shows that, in a quadratic field, we may choose an integer ω such that 1, ω form an integral basis. This is not true in general: if R(θ) is a field of degree n ≥ 3, we may not always find an integer ω such that 1, ω, ..., ω^{n−1} is an integral basis for R(θ).
Example. Let α be a zero of f(x) = x³ − x² − 2x − 8. We shall prove that there is no integer ω such that 1, ω, ω² is an integral basis for R(α). Since ±1, ±2, ±4, ±8 are not zeros of f(x), f(x) is irreducible (a reducible monic cubic with integer coefficients would have an integer zero dividing 8). Hence R(α) is a cubic field. It is easy to show that Δ(1, α, α²) = −4 × 503. Since β = 4/α is a zero of g(y) = y³ + y² + 2y − 8, β is an integer in R(α). Let us denote by α′ and α″ the two remaining zeros of f(x). Then, since N(α) = αα′α″ = 8,
$$\Delta(1, \alpha, \beta) = \begin{vmatrix} 1 & 1 & 1 \\ \alpha & \alpha' & \alpha'' \\ 4/\alpha & 4/\alpha' & 4/\alpha'' \end{vmatrix}^2 = \frac{4^2}{(N(\alpha))^2}\begin{vmatrix} 1 & 1 & 1 \\ \alpha & \alpha' & \alpha'' \\ \alpha^2 & \alpha'^2 & \alpha''^2 \end{vmatrix}^2 = \frac{16}{64}\,\Delta(1, \alpha, \alpha^2) = -503.$$
Since Δ(1, α, β) ≠ 0, the numbers 1, α, β form a basis. In fact 1, α, β must be an integral basis for R(α). Otherwise the discriminant Δ of the field would satisfy |Δ| < 503, and by Theorem 3.3 there would be a natural number a ≠ 1 with −503 = a²Δ, which is impossible because 503 is a prime number. Now let ω be any integer in R(α). Then there are rational integers a, b, c such that ω = a + bα + cβ. Now
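$$\alpha^2 = \alpha + 2 + \frac{8}{\alpha} = 2 + \alpha + 2\beta, \qquad \beta^2 = -\beta - 2 + \frac{8}{\beta} = -2 + 2\alpha - \beta,$$
so that, using αβ = 4,
$$\omega^2 = a^2 + b^2(2 + \alpha + 2\beta) + c^2(-2 + 2\alpha - \beta) + 2ab\alpha + 8bc + 2ac\beta = (a^2 + 2b^2 - 2c^2 + 8bc) + (b^2 + 2c^2 + 2ab)\alpha + (2b^2 - c^2 + 2ac)\beta,$$
and hence
$$\Delta(1, \omega, \omega^2) = \begin{vmatrix} 1 & 0 & 0 \\ a & b & c \\ a^2 + 2b^2 - 2c^2 + 8bc & b^2 + 2c^2 + 2ab & 2b^2 - c^2 + 2ac \end{vmatrix}^2 \Delta(1, \alpha, \beta) \equiv 0 \pmod{4 \cdot 503},$$
because the determinant equals 2b³ − b²c − bc² − 2c³ = 2(b³ − c³) − bc(b + c), which is always even.
Therefore 1, ω, ω² cannot be an integral basis for R(α).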
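The whole computation can be checked with integer arithmetic alone. The Python sketch below encodes the multiplication rules derived above. It forms the transition matrix from 1, α, β to 1, ω, ω² and applies part 2) of Theorem 3.3 with Δ(1, α, β) = −503. The function names and the search range for a, b, c are arbitrary choices of ours.

```python
def omega_square_coords(a, b, c):
    """Coordinates of omega**2 in the basis 1, alpha, beta, where omega = a + b*alpha + c*beta.

    Uses alpha**2 = 2 + alpha + 2*beta, beta**2 = -2 + 2*alpha - beta and alpha*beta = 4.
    """
    return (a * a + 2 * b * b - 2 * c * c + 8 * b * c,
            b * b + 2 * c * c + 2 * a * b,
            2 * b * b - c * c + 2 * a * c)


def disc_1_omega_omega2(a, b, c, disc_basis=-503):
    """Delta(1, omega, omega^2) = det(M)^2 * Delta(1, alpha, beta), Theorem 3.3 part 2)."""
    A, B, C = omega_square_coords(a, b, c)
    det = b * C - c * B      # M has rows (1, 0, 0), (a, b, c), (A, B, C)
    return det * det * disc_basis


# f(x) = x^3 - x^2 - 2x - 8 has no integer zero, hence is irreducible.
assert all(x ** 3 - x ** 2 - 2 * x - 8 != 0 for x in (1, -1, 2, -2, 4, -4, 8, -8))
# omega = alpha recovers Delta(1, alpha, alpha^2) = -4 * 503.
assert disc_1_omega_omega2(0, 1, 0) == -4 * 503
rng = range(-6, 7)
assert all(disc_1_omega_omega2(a, b, c) % (4 * 503) == 0 for a in rng for b in rng for c in rng)
```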
16.5 Divisibility
Definition 5.1. Let α and β be two integers. Suppose that there exists an integer γ such that α = βγ. Then we say that β divides α, and we write β | α. We also say that β is a divisor of α, or that α is a multiple of β.
Theorem 5.1. Let
$$g(x) = \alpha_lx^l + \cdots + \alpha_0, \quad \alpha_l \ne 0, \qquad h(x) = \beta_mx^m + \cdots + \beta_0, \quad \beta_m \ne 0,$$
where the numbers α, β are integers, and let
$$g(x)h(x) = \gamma_{l+m}x^{l+m} + \cdots + \gamma_0.$$
If there exists an integer δ satisfying δ | γ_u (0 ≤ u ≤ l + m), then δ | α_vβ_w (0 ≤ v ≤ l, 0 ≤ w ≤ m). □
The consideration of divisibility leads naturally to the problem of factorizing algebraic integers and to the uniqueness of factorization. However, factorization of integers in the field of all algebraic numbers has little meaning, since an integer may be a product of infinitely many integers, for example 2 = 2^{1/2} × 2^{1/4} × 2^{1/8} × ⋯. We must therefore restrict the domain of the divisors, and so we discuss the factorization problem only within a fixed algebraic field R(θ). Next, there may be infinitely many units in an algebraic field. If ε is a unit, then every integer may be written as α = ε · ε^{−1}α, and therefore α has infinitely many
factorizations whenever R(θ) has infinitely many units. For example, the numbers (1 + √2)^n (n = ±1, ±2, ...) are all units in R(√2), so integers in R(√2) have infinitely many factorizations. To avoid this difficulty we introduce the notion of association.
Definition 5.2. Two integers α, β which differ only by a unit factor are called associates of each other.
Being associates is an equivalence relation.
Definition 5.3. Let α be an integer in R(θ). If there exist nonunit integers β, γ such that α = βγ, then we say that α is nonprime; otherwise we call α a prime in R(θ).
Theorem 5.2. Every algebraic integer in R(θ) can be factorized into a product of primes in R(θ).
Proof. If α is a prime, then there is nothing to prove. If α = βγ, where β, γ are not units, then |N(α)| = |N(β)| · |N(γ)|. Since β, γ are not units, the natural numbers |N(β)|, |N(γ)| are proper divisors of |N(α)|, so |N(α)| > |N(β)| > 1 and |N(α)| > |N(γ)| > 1. The proof can now be completed by induction on |N(α)|. □
It remains to consider the uniqueness of the factorization, which is an important problem in algebraic number theory. We shall now examine the quadratic field R(√−5) and show that factorization in it is not unique. Since −5 ≡ 3 (mod 4), every integer in the field has the form α = a + b√−5, where a, b are rational integers. We shall show that 2, 3 and 1 ± √−5 are primes in the field, and that 2, 3 are not associates of 1 ± √−5. Then 6 = 2 · 3 = (1 + √−5)(1 − √−5) shows that factorization in R(√−5) is not unique. First, 2 and 3 cannot be associates of 1 ± √−5, because |N(2)| = 4, |N(3)| = 9 and |N(1 ± √−5)| = 6. Next, if 2 is nonprime in R(√−5), we let
$$2 = \alpha\beta, \qquad |N(\alpha)| > 1, \quad |N(\beta)| > 1.$$
Write α = a + b√−5. Then, from |N(2)| = 4, we have |N(α)| = a² + 5b² = 2, which is impossible. Therefore 2 is a prime in R(√−5). Similarly 3 and 1 ± √−5 are also primes in R(√−5). To overcome this problem Kummer invented the notion of ideals.
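The norm argument is a finite search. The Python sketch below carries it out for R(√−5) (helper names are ours). It confirms that no integer a + b√−5 has norm 2 or 3, which is exactly what makes 2, 3 and 1 ± √−5 primes in the sense of Definition 5.3. It also checks the two factorizations of 6. Note that `has_no_proper_factor` is only a sufficient test for primality.

```python
from math import isqrt

D = -5   # the field R(sqrt(-5)); its integers are a + b*sqrt(-5) with a, b rational integers


def norm(a, b):
    return a * a - D * b * b                  # = a^2 + 5*b^2


def mul(x, y):
    (a, b), (c, d) = x, y
    return (a * c + D * b * d, a * d + b * c)


def has_element_of_norm(n):
    """Brute force: is there an integer a + b*sqrt(-5) of norm exactly n?"""
    for b in range(isqrt(n // 5) + 1):
        for a in range(isqrt(n) + 1):
            if norm(a, b) == n:
                return True
    return False


def has_no_proper_factor(a, b):
    """Sufficient test for Definition 5.3: no integer has a norm d with d | N, 1 < d < N."""
    n = norm(a, b)
    return not any(n % d == 0 and has_element_of_norm(d) for d in range(2, n))


print([norm(*z) for z in [(2, 0), (3, 0), (1, 1), (1, -1)]])        # [4, 9, 6, 6]
print(mul((2, 0), (3, 0)), mul((1, 1), (1, -1)))                    # (6, 0) (6, 0)
print(all(has_no_proper_factor(*z) for z in [(2, 0), (3, 0), (1, 1), (1, -1)]))   # True
```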
16.6 Ideals
We shall now consider a fixed algebraic number field R(θ) of degree n.
Definition 6.1. Let α_1, ..., α_q be any q integers in R(θ). The set of integers of the form
$$\eta_1\alpha_1 + \cdots + \eta_q\alpha_q,$$
where η_1, ..., η_q are integers in R(θ), is called the ideal generated by α_1, ..., α_q, and is denoted by [α_1, ..., α_q]. We shall use the capital Gothic letters 𝔄, 𝔅, ℭ, 𝔇, ... to denote ideals.
Definition 6.2. An ideal [α] generated by a single integer α is called a principal ideal. The set [0] containing only the integer 0 is an ideal, but we shall assume that our ideals are distinct from [0]. The ideal [1] contains all the integers in R(θ). It is called the unit ideal, and we shall denote it by 𝔒.
Theorem 6.1. Ideals possess the following properties:
1) if α, β are in the ideal, then so are α ± β;
2) if α is in the ideal and η is an integer in R(θ), then ηα is in the ideal. □
It follows from this theorem that if 1 ∈ 𝔄, then 𝔄 = [1].
Definition 6.3. Let 𝔄 = [α_1, ..., α_q] and 𝔅 = [β_1, ..., β_r] be two ideals. If 𝔄 and 𝔅 contain exactly the same integers in R(θ), then we say that they are equal, and we write 𝔄 = 𝔅.
Theorem 6.2. A necessary and sufficient condition for two ideals [α_1, ..., α_q] and [β_1, ..., β_r] to be equal is that there are integers ξ_{ij}, η_{ji} (1 ≤ i ≤ q, 1 ≤ j ≤ r) such that
$$\alpha_i = \sum_{j=1}^{r} \xi_{ij}\beta_j, \qquad \beta_j = \sum_{i=1}^{q} \eta_{ji}\alpha_i.$$
In particular, if [α] = [β], then α and β are associates. □
Let α_1, ..., α_q be any q rational integers with greatest common divisor d. Then there are rational integers x_1, ..., x_q such that d = x_1α_1 + ⋯ + x_qα_q, and hence, in the rational number field, [α_1, ..., α_q] = [d]. In other words, the rational number field has only principal ideals. On the other hand, the discussion in the last section shows that in R(√−5) the ideal [2, 1 + √−5] cannot be reduced to a principal ideal, so nonprincipal ideals exist.
Definition 6.4. Let 𝔄 = [α_1, ..., α_q] and 𝔅 = [β_1, ..., β_r] be two ideals. We call the ideal [α_1β_1, ..., α_1β_r, α_2β_1, ..., α_2β_r, ..., α_qβ_r] the product of 𝔄 and 𝔅, and we denote it by 𝔄 · 𝔅.
Theorem 6.3. The product of 𝔄 and 𝔅 is independent of the choice of the α_i, β_j. That is, if
$$[\alpha_1, \dots, \alpha_q] = [\alpha_1', \dots, \alpha_s'], \qquad [\beta_1, \dots, \beta_r] = [\beta_1', \dots, \beta_t'],$$
then
$$[\alpha_1\beta_1, \dots, \alpha_1\beta_r, \dots, \alpha_q\beta_r] = [\alpha_1'\beta_1', \dots, \alpha_1'\beta_t', \dots, \alpha_s'\beta_t']. \qquad □$$
This can easily be proved from the definition of equality for ideals. We also have 𝔒 · 𝔄 = 𝔄 for any ideal 𝔄, and multiplication of ideals is commutative and associative. We can then use induction to define 𝔄_1⋯𝔄_m and 𝔄^m, where m is a natural number, and show that the usual rules of indices hold.
Definition 6.5. Let 𝔄, 𝔅 be two ideals. Suppose that there exists an ideal … , 𝔄_{m−1}, 𝔄_m) = ((𝔄_1, …, 𝔄_{m−1}), 𝔄_m). If (𝔄, 𝔅) = 𝔒, then we say that 𝔄, 𝔅 are coprime. It is easy to see that if (𝔄, 𝔅) = 𝔇, then (𝔄ℭ, 𝔅ℭ) = 𝔇ℭ for any ideal ℭ.
16.7 Unique Factorization Theorem for Ideals
Theorem 7.5. Let 𝔓 be a prime ideal. Suppose that 𝔓 | 𝔄𝔅 and 𝔓 ∤ 𝔄. Then 𝔓 | 𝔅.
Proof. Since 𝔓 ∤ 𝔄, we have (𝔓, 𝔄) = 𝔒, and so (𝔓𝔅, 𝔄𝔅) = 𝔅. Since 𝔓 | 𝔓𝔅 and 𝔓 | 𝔄𝔅, we now have 𝔓 | 𝔅. □
Theorem 7.6. Every ideal has finitely many distinct divisors.
Proof. Given the ideal 𝔄, we choose 𝔅 and a natural number a such that 𝔄 · 𝔅 = [a]. Then 𝔄 contains a, and any divisor of 𝔄 also contains a. It therefore suffices to show that there are at most finitely many ideals containing a fixed natural number. Let 𝔐 = [α_1, ..., α_m] be an ideal which contains a, and let ω_1, ..., ω_n be an integral basis for R(θ). Each α_j can then be written as α_j = g_{j1}ω_1 + ⋯ + g_{jn}ω_n (1 ≤ j ≤ m), where the g_{jk} are rational integers. Now set
$$g_{jk} = aq_{jk} + r_{jk} \quad (0 \le r_{jk} < a), \qquad \beta_j = \sum_{k=1}^{n} q_{jk}\omega_k, \qquad \gamma_j = \sum_{k=1}^{n} r_{jk}\omega_k,$$
so that α_j = aβ_j + γ_j. Since a lies in 𝔐, we have
$$\mathfrak{M} = [a\beta_1 + \gamma_1, \dots, a\beta_m + \gamma_m, a] = [\gamma_1, \dots, \gamma_m, a].$$
Each γ_j lies in the finite set of numbers r_1ω_1 + ⋯ + r_nω_n with 0 ≤ r_k < a. Hence there are at most finitely many sets γ_1, ..., γ_m, and the required result follows. □
Theorem 7.7 (Fundamental theorem for ideals). Any ideal 𝔄 distinct from 𝔒 can be factorized into a product of prime ideals. Furthermore, apart from the ordering of the factors, this factorization is unique.
Proof. Since each ideal has at most finitely many divisors, we can use induction on the number of divisors of 𝔄. We first establish the existence of a factorization. If 𝔄 is a prime ideal, then there is nothing more to prove. Otherwise let 𝔄 = 𝔅ℭ (𝔅 ≠ 𝔒, ℭ ≠ 𝔒). Since 𝔅 and ℭ each have fewer divisors than 𝔄, the required result follows by induction. We now prove the uniqueness of the factorization. Suppose that
$$\mathfrak{A} = \mathfrak{P}_1\mathfrak{P}_2\cdots\mathfrak{P}_l = \mathfrak{P}_1'\mathfrak{P}_2'\cdots\mathfrak{P}_m', \qquad m \ge 1, \quad l \ge 1.$$
If 𝔄 is a prime ideal, then l = m = 1 and there is nothing to prove. If 𝔄 is not a prime ideal, then l > 1, m > 1. Since 𝔓_1 | 𝔓'_1⋯𝔓'_m, Theorem 7.5 shows that there must be a 𝔓'_j (1 ≤ j ≤ m) such that 𝔓_1 = 𝔓'_j. We may assume without loss that j = 1. Then 𝔓_2⋯𝔓_l = 𝔓'_2⋯𝔓'_m, and the required result follows from the induction hypothesis. □
We have proved that every ideal distinct from 𝔒 can be written as 𝔓_1^{a_1}𝔓_2^{a_2}⋯𝔓_r^{a_r}, where the 𝔓_j are distinct prime ideals and the a_j are natural numbers. The representation is unique apart from the ordering of the 𝔓_j.
16.8 Basis for Ideals
Let ω_1, ..., ω_n be an integral basis for R(θ), and let 𝔄 be any ideal of R(θ). Each member of 𝔄 is representable as a linear combination of ω_1, ..., ω_n with rational integer coefficients, so by Theorem 6.1 𝔄 can be viewed as a linear module. Also, corresponding to the ideal 𝔄, there are an ideal 𝔅 and a natural number a such that 𝔄𝔅 = [a]. Thus aω_1, ..., aω_n all lie in 𝔄, and since these n numbers are linearly independent, 𝔄 is actually a linear module of dimension n. By our discussion in Chapter 14, section 9, this module 𝔄 must have a basis, and every basis must consist of exactly n integers. In particular, we have:
Theorem 8.1. Let 𝔄 be an ideal of R(θ). Then we can find n integers α_1, ..., α_n in 𝔄 such that
$$\alpha_i = a_{i1}\omega_1 + a_{i2}\omega_2 + \cdots + a_{ii}\omega_i \qquad (1 \le i \le n),$$
where the a_{ij} are rational integers, a_{ii} > 0 (1 ≤ i ≤ n), 0 ≤ a_{ji} < a_{ii} (1 ≤ i < j ≤ n), and α_1, ..., α_n form a basis for 𝔄. We call such α_1, ..., α_n a standard basis for 𝔄. □
Let α_1, ..., α_n and β_1, ..., β_n be two bases for 𝔄, and let
$$\alpha_i = \sum_{j=1}^{n} u_{ij}\beta_j \qquad (i = 1, \dots, n).$$
Then the coefficient matrix (u_{ij}) must be a modular matrix (a matrix of rational integers with determinant ±1), so that Δ(α_1, ..., α_n) = Δ(β_1, ..., β_n). Thus the discriminant of a basis of an ideal is independent of the choice of the basis, and we may write it as Δ(𝔄).
We shall now examine the standard bases for ideals of the quadratic field R(√D). Let 1, ω be an integral basis for R(√D), with ω as defined in Theorem 4.5. By Theorem 8.1 we can find two integers a, b + cω forming a standard basis. Here a, b, c are rational integers, and we may suppose that a > 0, c > 0, 0 ≤ b < a. Note, however, that not every pair of integers of this form is the standard basis of an ideal; a, b, c must satisfy further conditions. It is easy to see that a, b + cω form a standard basis for some ideal only when aω and ω(b + cω) are representable as xa + y(b + cω), where x, y are rational integers. From aω = xa + y(b + cω) we have a = yc and ax + by = 0, so that c | a and c | b. Let a = cm, b = cn. Then from
$$c(n + \omega)\omega = c(n + \omega)(n + \omega + \omega') - c(n + \omega)(n + \omega') = -cN(n + \omega) + c(n + \omega)(n + S(\omega)),$$
where S(ω) is the trace of ω and N(n + ω) is the norm of n + ω, we see that cm, c(n + ω) is a standard basis for some ideal if and only if
$$N(n + \omega) \equiv 0 \pmod m. \tag{1}$$
By Theorem 4.5, (1) is equivalent to
$$\Delta \equiv \begin{cases} (2n + 1)^2 \pmod{4m}, & \text{if } D \equiv 1 \pmod 4;\\ (2n)^2 \pmod{4m}, & \text{if } D \equiv 2, 3 \pmod 4. \end{cases} \tag{2}$$
Therefore we have:
Theorem 8.2. A necessary and sufficient condition for a pair of integers cm, c(n + ω) (c > 0, m > 0, 0 ≤ n < m) to be a standard basis for some ideal of R(√D) is that either (1) or (2) holds. □
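Condition (1) is easy to test by machine. The minimal Python sketch below (function names are ours) lists, for c = 1, the values of n for which m, n + ω is a standard basis. It also checks that conditions (1) and (2) agree. For R(√−5) and m = 2 it recovers the basis 2, 1 + √−5 of the ideal [2, 1 + √−5] from Section 16.6.

```python
def norm_n_plus_omega(n, D):
    """N(n + omega), with omega as in Theorem 4.5 (D squarefree)."""
    if D % 4 == 1:
        return n * n + n + (1 - D) // 4     # omega + omega' = 1, omega * omega' = (1 - D)/4
    return n * n - D


def condition_1(D, m, n):
    return norm_n_plus_omega(n, D) % m == 0


def condition_2(D, m, n):
    delta = D if D % 4 == 1 else 4 * D
    square = (2 * n + 1) ** 2 if D % 4 == 1 else (2 * n) ** 2
    return (delta - square) % (4 * m) == 0


def standard_basis_offsets(D, m):
    """All n, 0 <= n < m, such that m, n + omega is a standard basis (the case c = 1)."""
    return [n for n in range(m) if condition_1(D, m, n)]


print(standard_basis_offsets(-5, 2), standard_basis_offsets(-5, 3), standard_basis_offsets(-5, 7))
# [1] [1, 2] [3, 4]
```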
16.9 Congruent Relations
Definition 9.1. If 𝔄 | [α], then we say that 𝔄 divides α, and we write 𝔄 | α. It is easy to see that 𝔄 | α means that α is in 𝔄.
Following the discussion in Chapter 14, section 9, we can define a congruence relation on the integers of the field R(θ) with respect to an ideal.
Definition 9.2. If 𝔄 | α − β, where α, β are integers in R(θ), then we say that α and β are congruent modulo 𝔄, and we write α ≡ β (mod 𝔄). The integers of R(θ) are thereby partitioned into equivalence classes, called the residue classes modulo 𝔄. We denote the number of these residue classes by N(𝔄) and call it the norm of 𝔄. From Theorem 14.9.3 we have:
Theorem 9.1. Let ω_1, ..., ω_n be an integral basis for R(θ), and let α_1, ..., α_n be any basis for the ideal 𝔄. If α_i = Σ_{j=1}^{n} a_{ij}ω_j, then N(𝔄) is equal to the absolute value of the determinant of the coefficients, that is, N(𝔄) = |det(a_{ij})|. □
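In a quadratic field, Theorem 9.1 reduces the computation of N(𝔄) to finding a standard basis a, b + cω and taking N(𝔄) = ac. The Python sketch below illustrates this; the names are ours and elements x + yω are stored as pairs (x, y). It spans 𝔄 = [α_1, ..., α_q] over the rational integers by the α_i and α_iω. It then reduces this lattice to the triangular form of Theorem 8.1 with the extended Euclidean algorithm, and it multiplies ideals as in Definition 6.4.

```python
from math import gcd


def ext_gcd(m, n):
    """Return (g, s, t) with s*m + t*n == g == gcd(m, n) >= 0."""
    if n == 0:
        return abs(m), (1 if m >= 0 else -1), 0
    g, s, t = ext_gcd(n, m % n)
    return g, t, s - (m // n) * t


def times_omega(x, y, D):
    """(x + y*omega) * omega, using omega^2 = omega + (D - 1)/4 or omega^2 = D."""
    if D % 4 == 1:
        return y * (D - 1) // 4, x + y
    return y * D, x


def mul(u, v, D):
    """Product of x1 + y1*omega and x2 + y2*omega."""
    (x1, y1), (x2, y2) = u, v
    wx, wy = times_omega(0, 1, D)            # omega^2 = wx + wy*omega
    return x1 * x2 + y1 * y2 * wx, x1 * y2 + x2 * y1 + y1 * y2 * wy


def standard_basis(gens, D):
    """(a, b, c) such that a, b + c*omega is the standard basis of the ideal [gens]."""
    # Over the rational integers the ideal is spanned by each generator and its multiple by omega.
    vectors = list(gens) + [times_omega(x, y, D) for x, y in gens]
    a, b, c = 0, 0, 0        # invariant: span so far = Z*(a, 0) + Z*(b, c)
    for x, y in vectors:
        if y == 0:
            a = gcd(a, x)
        elif c == 0:
            b, c = x, y
        else:
            g, s, t = ext_gcd(c, y)
            # unimodular row change: (b, c), (x, y) -> (s*b + t*x, g), (leftover, 0)
            a = gcd(a, (y // g) * b - (c // g) * x)
            b, c = s * b + t * x, g
    if c < 0:
        b, c = -b, -c
    return a, b % a, c


def ideal_norm(gens, D):
    a, _, c = standard_basis(gens, D)
    return a * c                              # |det [[a, 0], [b, c]]|, Theorem 9.1


def ideal_product(gens1, gens2, D):
    return [mul(u, v, D) for u in gens1 for v in gens2]   # Definition 6.4


A = [(2, 0), (1, 1)]                          # the ideal [2, 1 + sqrt(-5)]
print(standard_basis(A, -5), ideal_norm(A, -5))                   # (2, 1, 1) 2
print(standard_basis(ideal_product(A, A, -5), -5))                # (2, 0, 2), the ideal [2]
```

The last line shows [2, 1 + √−5]² = [2]. This agrees with the theorems that follow: N([2]) = |N(2)| = 4 = N([2, 1 + √−5])².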
From this theorem we deduce at once:
Theorem 9.2. Let Δ be the discriminant of R(θ), and let Δ(𝔄) be the discriminant of a basis for 𝔄. Then Δ(𝔄) = (N(𝔄))²Δ. □
Theorem 9.3. The norm of a principal ideal [α] satisfies N([α]) = |N(α)|. □
Theorem 9.4. N(𝔄𝔅) = N(𝔄)N(𝔅).
Proof. Since 𝔄 contains 𝔄𝔅, Theorem 14.9.4 shows that the members of 𝔄 are partitioned into residue classes modulo 𝔄𝔅, and that the number of these classes is N(𝔄𝔅)/N(𝔄). It remains to prove that the number of classes is also equal to N(𝔅).
Let μ_1, ..., μ_{N(𝔅)} denote representatives of the residue classes mod 𝔅. There exists an integer α ∈ 𝔄 such that ([α], 𝔄𝔅) = 𝔄. Now αμ_1, ..., αμ_{N(𝔅)} all lie in 𝔄, and if j ≠ k (1 ≤ j, k ≤ N(𝔅)), then αμ_j ≢ αμ_k (mod 𝔄𝔅). Since ([α], 𝔄𝔅) = 𝔄, for any γ in 𝔄 there are integers η, δ such that γ = ηα + δ with δ ∈ 𝔄𝔅. Also, for the integer η there are an integer μ ∈ 𝔅 and a natural number j (1 ≤ j ≤ N(𝔅)) such that η = μ_j + μ. Hence γ = αμ_j + αμ + δ ≡ αμ_j (mod 𝔄𝔅). So every member of 𝔄 is congruent to exactly one of αμ_1, ..., αμ_{N(𝔅)} modulo 𝔄𝔅, and the number of classes concerned is N(𝔅), as required. □
Theorem 9.5. Let 𝔓 be a prime ideal, and let α be any integer not divisible by 𝔓. Then α^{N(𝔓)−1} ≡ 1 (mod 𝔓).
Proof. Let 0, π_1, π_2, ..., π_{N(𝔓)−1} represent the residue classes mod 𝔓. Since 𝔓 ∤ α, the numbers 0, απ_1, απ_2, ..., απ_{N(𝔓)−1} also represent the residue classes mod 𝔓. Therefore
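$$\alpha^{N(\mathfrak{P})-1}\pi_1\pi_2\cdots\pi_{N(\mathfrak{P})-1} \equiv \pi_1\pi_2\cdots\pi_{N(\mathfrak{P})-1} \pmod{\mathfrak{P}},$$
and since the prime ideal 𝔓 divides none of the π_j, the theorem follows. □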
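Theorem 9.5 is the analogue of Fermat's theorem. Here is a quick Python check in the Gaussian field R(i), where 1, i is an integral basis (the helpers are ours). For a rational prime q ≡ 3 (mod 4), the ideal [q] is prime; this is case 1) of Theorem 10.3 below, since (−4/q) = −1. Here N([q]) = q², and the residue classes mod [q] are a + bi with 0 ≤ a, b < q. For q = 5 the ideal [5] is not prime, and the congruence then fails, as it must.

```python
def gauss_mul(u, v, m):
    """(a + bi)(c + di) with both coordinates reduced modulo the rational integer m."""
    (a, b), (c, d) = u, v
    return ((a * c - b * d) % m, (a * d + b * c) % m)


def gauss_pow(u, e, m):
    result = (1, 0)
    for _ in range(e):
        result = gauss_mul(result, u, m)
    return result


def fermat_holds_mod(m):
    """Check alpha^(m*m - 1) ≡ 1 (mod [m]) for every alpha = a + bi, 0 <= a, b < m, alpha != 0."""
    return all(gauss_pow((a, b), m * m - 1, m) == (1, 0)
               for a in range(m) for b in range(m) if (a, b) != (0, 0))


print(fermat_holds_mod(3), fermat_holds_mod(7), fermat_holds_mod(5))   # True True False
```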
16.10 Prime Ideals
Theorem 10.1. Every prime ideal 𝔓 must divide a rational prime p. Moreover, p is the least positive rational integer in 𝔓, so it is unique.
Proof. By Theorem 7.1 there must exist a rational integer a such that 𝔓 | [a]. Let a = ∏p be its factorization into rational primes. Then there must be a prime p such that 𝔓 | [p], that is, 𝔓 | p. Suppose, if possible, that there exists a positive rational integer b such that b < p and 𝔓 | b. Then b ∈ 𝔓, so (p, b) = 1 also lies in 𝔓, giving 𝔓 = [1], which is impossible. Therefore p is the least positive rational integer in 𝔓. □
Let the prime ideal factorization of [p] be 𝔓_1𝔓_2⋯𝔓_t. Taking norms, we have p^n = N([p]) = N(𝔓_1)N(𝔓_2)⋯N(𝔓_t). It follows that the norm of a prime ideal must be a prime power. If N(𝔓) = p^f, then we call f the degree of 𝔓. For the factorization of [p] there is the following important theorem, which we shall not prove.
Theorem 10.2 (Dedekind's discriminant theorem). A necessary and sufficient condition for 𝔓² | p for some prime ideal 𝔓 is that p | Δ. □
Let us examine the factorization of [p] in the quadratic field R(√D). Clearly there are only the following three possibilities:
1) [p] = 𝔓;
2) [p] = 𝔓𝔔, 𝔓 ≠ 𝔔, N(𝔓) = N(𝔔) = p;
3) [p] = 𝔓², N(𝔓) = p.
For the factorization of [p] in a quadratic field, we have:
Theorem 10.3. Let Δ be the discriminant of R(√D). Then 1), 2) or 3) above holds according as (Δ/p) = −1, +1 or 0. Here (Δ/p) is the Kronecker symbol.
Proof. If 𝔓 is a prime divisor of [p] with N(𝔓) = p, then either [p] = 𝔓𝔔 or [p] = 𝔓². Let cm, c(n + ω) be a standard basis for the ideal 𝔓. Then N(𝔓) = c²m = p, so c = 1 and m = p. By (2) in section 8, (Δ/p) is either +1 or 0.
Suppose, conversely, that (Δ/p) = +1 or 0. We first consider the case p ≠ 2.
1) If (Δ/p) = 1, then there exists a such that p ∤ a and Δ ≡ a² (mod p). Since p ≠ 2, we have (p, 2a) = 1, so that
$$[p, a + \sqrt{\Delta}][p, a - \sqrt{\Delta}] = [p]\Bigl[p, a + \sqrt{\Delta}, a - \sqrt{\Delta}, \frac{a^2 - \Delta}{p}\Bigr] = [p]\Bigl[p, a + \sqrt{\Delta}, 2a, \frac{a^2 - \Delta}{p}\Bigr] = [p].$$
Also [p, a + √Δ] ≠ [p, a − √Δ], since otherwise we would have [p, a + √Δ] = [p, a − √Δ] = [p, a + √Δ, 2a] = [1], which is impossible. Moreover, neither [p, a + √Δ] nor [p, a − √Δ] is 𝔒. Therefore, when p ≠ 2 and (Δ/p) = 1, [p] is the product of two distinct prime ideals.
2) If (Δ/p) = 0, then p | Δ, so that
$$[p, \sqrt{\Delta}]^2 = [p, \sqrt{\Delta}][p, \sqrt{\Delta}] = [p]\Bigl[p, \sqrt{\Delta}, \frac{\Delta}{p}\Bigr].$$
But Δ = D or 4D, p ≠ 2, and D is squarefree, so (p, Δ/p) = 1 and hence [p] = [p, √Δ]². That is, if p ≠ 2 and (Δ/p) = 0, then [p] is the square of a prime ideal.
Let us now consider the case p = 2. Since (Δ/2) ≠ −1, we must have D ≡ 2, 3 (mod 4) or D ≡ 1 (mod 8). As before we can prove:
3) When D ≡ 2 (mod 4), we have (Δ/2) = 0 and [2] = [2, √D]²;
4) When D ≡ 3 (mod 4), we have (Δ/2) = 0 and [2] = [2, 1 + √D]²;
5) When D ≡ 1 (mod 8), we have (Δ/2) = 1 and
$$[2] = \Bigl[2, \frac{1 + \sqrt{D}}{2}\Bigr]\Bigl[2, \frac{1 - \sqrt{D}}{2}\Bigr].$$
The two factors here are distinct, so [2] is the product of two distinct prime ideals. □
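Theorem 10.3 is easy to test numerically. The Python sketch below (helper names are ours) computes the Kronecker symbol (Δ/p), by Euler's criterion for odd p and from Δ mod 8 for p = 2. It compares the symbol with an independent count based on Theorem 8.2: the number of n, 0 ≤ n < p, for which p, n + ω is a standard basis, that is, the number of ideals of norm p. Theorem 10.3 predicts that this count is 1 + (Δ/p).

```python
def kronecker_at_prime(delta, p):
    """Kronecker symbol (delta/p) for a prime p and a quadratic field discriminant delta."""
    if p == 2:
        if delta % 2 == 0:
            return 0
        return 1 if delta % 8 in (1, 7) else -1
    r = pow(delta % p, (p - 1) // 2, p)       # Euler's criterion: r is 0, 1 or p - 1
    return -1 if r == p - 1 else r


def norm_n_plus_omega(n, D):
    if D % 4 == 1:
        return n * n + n + (1 - D) // 4
    return n * n - D


def ideals_of_norm_p(D, p):
    """Number of n (0 <= n < p) with p | N(n + omega); each gives the ideal [p, n + omega] of norm p."""
    return sum(1 for n in range(p) if norm_n_plus_omega(n, D) % p == 0)


def splitting_type(D, p):
    delta = D if D % 4 == 1 else 4 * D
    return {-1: "inert", 1: "split", 0: "ramified"}[kronecker_at_prime(delta, p)]


for D in (-5, -1, 5):
    print(D, [(p, splitting_type(D, p)) for p in (2, 3, 5, 7)])
```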
Theorem 10.3 establishes Dedekind's discriminant theorem for quadratic fields. We shall now examine a specific example of a cubic field. Let α be a zero of f(x) = x³ − x² − 2x − 8. We saw in §4 that R(α) is a cubic field with discriminant −503, that 1, α, β = 4/α form an integral basis, and that β is a zero of g(y) = y³ + y² + 2y − 8. We now consider the factorization of [503] in R(α). Let 𝔓, 𝔔, ℜ denote prime ideals of R(α). Then the factorization of [503] must take one of the following five forms:
1) [503] = 𝔓𝔔ℜ; 𝔓, 𝔔, ℜ distinct and N(𝔓) = N(𝔔) = N(ℜ) = 503;
2) [503] = 𝔓²𝔔; 𝔓 ≠ 𝔔 and N(𝔓) = N(𝔔) = 503;
3) [503] = 𝔓³; N(𝔓) = 503;
4) [503] = 𝔓𝔔; N(𝔓) = 503, N(𝔔) = 503²;
5) [503] = 𝔓; N(𝔓) = 503³.
In each of the first four situations, [503] has a prime divisor 𝔓 with norm 503. Let us first examine these four situations. Let a_0, b_0 + b_1α, c_0 + c_1α + c_2β be a standard basis for 𝔓, so that b_0 < a_0, c_0 < a_0, c_1 < b_1. Since a_0α and a_0β lie in 𝔓, we also have b_1 ≤ a_0 and c_2 ≤ a_0. From N(𝔓) = a_0b_1c_2 = 503 we then obtain a_0 = 503, b_1 = 1, c_2 = 1, c_1 = 0. Therefore 𝔓 must take the form [503, a + α, b + β], and 503, a + α, b + β form a standard basis for 𝔓. Since a + α, b + β ∈ 𝔓 and N(𝔓) = 503, we have N(a + α) ≡ N(b + β) ≡ 0 (mod 503). But a + α and b + β are roots of f(x − a) = 0 and g(y − b) = 0 respectively, so |N(a + α)| = |f(−a)| and |N(b + β)| = |g(−b)|. Therefore a and b satisfy the cubic congruences
$$a^3 + a^2 - 2a + 8 \equiv 0 \pmod{503}, \qquad b^3 - b^2 + 2b + 8 \equiv 0 \pmod{503},$$
whose solutions are a ≡ 149, 149, 204 and b ≡ 395, 395, 217 (mod 503). Therefore 𝔓 must be one of the following four ideals:
$$[503, 149 + \alpha, 395 + \beta], \quad [503, 204 + \alpha, 217 + \beta], \quad [503, 149 + \alpha, 217 + \beta], \quad [503, 204 + \alpha, 395 + \beta].$$
The third ideal is not 1X(217
+ fJ) 
~,
since otherwise
217(149
+ IX) + 65(503) = 4 
217·149
+ 65·503 = 366
would be in ~, and from (366, 503) = 1 we would have ~ = .0. Similarly the fourth ideal is not ~. Next, from (149
+ IX)IX = 
46(503)
+ 150(149 + IX) + 2(395 + fJ),
(149
+ lX)fJ = 
117(503)
+ 149(395 + fJ),
(395
+ fJ)1X = 
117(503)
+ 395(149 + IX),
(395
+ fJ)fJ = 
310(503)
+ 2(149 + IX) + 394(395 + fJ),
we see that 503, 149 + IX, 395 + fJ do form a standard basis for the prime ideal [503, 149 + 1X,395 + fJ]. Similarly 503, 204 + IX, 217 + fJ do form a standard basis for the prime ideal [503, 204 + IX, 217 + fJ]. Finally the two ideals [503,149 + IX, 395 + fJ] and [503, 204 + IX, 217 + fJ] are distinct divisors of the ideal [503] and we therefore conclude that our situation 2) is the only possibility, and computation shows that actually [503] = [503,149
+ 1X,395 + fJ]2
. [503,204
+ IX, 217 + fJ].
441
16.12 Ideal Classes
16.11 Units We have the following result on units: Among all the units in R(8) we can choose = r1 + r2  1 of them, say e1> . .. , e" such that every unit is representable as pe~l ... e!r (I = 0, ± 1, ± 2, ... ); here p is a certain root of unity in R(8). Here we shall only concern ourselves with quadratic fields R(jD). Let a unit be x + yw so that N(x + yw) = ± 1. We need therefore to solve these equations in rational integers for the units in R(jD). Now r
N(x
+ yw) = (x + yw)(x + yw') if D == 1 (mod 4), if D == 2,3
(mod 4).
When D < 0, the equations (2x + y)2  y2 D = 4 and x 2  y2 D = 1 have only finitely many solutions, so that R(jD) can have only finitely many units. In fact if we denote by w the number of units in R(jD), it is not difficult to show that w = 6, 4 or 2 according to whether LI =  3,  4 or LI ~  7. Consider next D > O. Now the equations (2x + y)2  y2 D = ± 4 and x 2  y2 D = ± 1 are the Pell equations we considered in Chapter 10. Therefore there exists a unit 1'/ in R(jD) such that any unit in R(jD) is representable as ± 1'/n, n = 0, ± 1, ± 2, .... This number 1'/ is called the fundamental unit of R(jD).
16.12 Ideal Classes Definition 12.1. Let m: and mbe two ideals. Suppose that there exist two principal ideals [ae] and [p] such that [ae]m: = [p]m. Then we say that the two ideals m: and m belong to the same ideal class, and we write m: '" m. It is easy to see that being in the same ideal class is an equivalence relation, and moreover we have 1) m: '" .0 if and only if m: is a principal ideal; 2) if m: '" mand (£; '" :n, then m:(£; '" m:n; 3) if m:(£; '" m(£; then m: '" m. The ideals of R(8) are now partitioned into classes called ideal classes. Theorem 12.1. The number of ideal classes of R(8) is finite. Proof It suffices to show that there exists a positive number M, depending only on R(8), such that every class contains an ideal satisfying N(m) ~ M. This is because
m
there can only be finitely many ideals having a given norm. Let (£; be any ideal of R(8). We already know that there exists an ideal m: such that m:(£; '" .0, and if we can choose an ideal msuch that m:m '" .0 and N(m) ~ M, then our theorem is proved. This is because m:m '" m:(£; so that m'" (£;.
442
16. Introduction to Algebraic Number Theory
Let Oh, ... ,Wn be an integral basis for R( 8) and let n
s= 1
We define the natural number k by k n ~ N(21) < (k + l)n. Among the (k + l)n integers Xl W1 + ... + XnWn (Xm = 0,1, ... ,k) there are at least two which are congruent modulo 21, say
here 0 ~ Ym
~
Zm
in 21. Since IYm  zml
~
IN((X)I
Since
(X
~
=
k, 0
~
k, and we now have the nonzero integer
k it follows that
IS~l mt1 (Ym  zm)w~) I ~ S~l mt1 klw~)1 = knM ~ M· N(21).
is in 21 we see that 211 [(X], and we may write [(X] = = IN((X)I ~ M· N(21) or N(~) ~ M as required. D
21~
which gives
N(21)N(~)
Theorem 12.2. Let h be the number of ideal classes of R(8). Then,for any ideal 21, we have 21h  .0. Proof Let 21b ... , 21h be ideals that belong to different classes. Then so are 2121b ... ,2121h and hence 211 ... 21h  (2121 1) ... (2121h), or 21h  .0. D
16.13 Quadratic Fields and Quadratic Forms Let Ll be the discriminant of the quadratic field R(jD). We shall now establish the relationship between the ideal classes of R(jD) and the classes of quadratic forms having discriminant Ll. Let 21 be an ideal of R(jD) and let (Xl> (X2 be a basis for 21 satisfying (1)
where (X'1' (X~ are the conjugates of (Xl> (X2. Corresponding to 21 we construct the quadratic form F(x,y) =
N((X1 X + (X2Y) ((X1 X + (X2Y)((X~ X + (X~) 2 = = ax N(21) N(21)
+
b xy
+ cy
2
.
Since a = N((X1)/N(21), b = (N((X1 + (X2)  N((X1)  N((X2»/N(21), c = N((X2)/N(21), and (Xl> (X2, (Xl + (X2 are in 21 we see that a, b, c are rational integers. Also, the
443
16.13 Quadratic Fields and Quadratic Forms
discriminant of F(x, y) is b2  4ac = (OC1OC~  OC'lO(2)2/N(91)2 = A. We say that F(x,y) is a quadratic form belonging to 91. When A < 0 the quadratic field R(.ji5) is imaginary so that a > 0 and F(x, y) is positive definite. Also, it is not difficult to see that as OCl, OC2 run through the basis for 91 satisfying (I) we obtain all the quadratic forms equivalent to F. Theorem 13.1. Every indefinite or positive definite quadratic form F(x,y) = ax 2 + bxy + cy2 with rational integer coefficients and discriminant A belongs to an ideal 91 with basis OC1> OC2'
Proof We first show that a, (b  fl)/2 form a basis for the ideal IDl = [a, (b  fl)/2]. Observe that (b  fl)/2 satisfies the equation x(b  x) = ac so that it is an integer. Also we have w = (s(w) + fl)/2, where sew) = 0 or I, and sew) aw=
+b
(b  fl)
sew) a=
2
+b
2
b  fl aa
b  fl b  f l sew)  b + b + f l b2  A 2 w = 2' 2 = ~a
+
2
'
sew)  b b  f l 2
.
2
'
where (s(w) ± b)/2 and (b 2  A)/4a are rational integers, so that a, (b  fl)/2 do indeed form a basis for IDl. If a > 0 we take 91 = IDl, OCl = a, OC2 = (b  fl)/2, and from N(IDl) = a we have the quadratic form
(ax
+ t(b 
fl)y)(ax a
+ tcb + fl)y) = ax 2 + bxy + cy 2,
='='
so that IDl is the required ideal. If a < 0, then, since the quadratic form is not negative definite, A > 0 and we now take 91 = flIDl, OCl = afl and a2 = (b  fl)fl/2. It is easy to see that OC1> OC2 form a basis for 91 satisfying (I). Also N(91) =  aA and we can now construct the quadratic form
 A(ax
+ tcb
fl)y)(ax  aA
+ tcb + fl)y) = ax 2 + bxy + cy 2.
'='=='
The theorem is proved.
0
From the above we see that if F belongs to 91, then every quadratic form equivalent to Falso belongs to 91. However, given a quadratic form F, there may be two different ideals 91 and mto which Fbelongs. This then establishes a relationship between 91 and m. Definition 13.1. Let 91 and mbe two ideals. Suppose that there are integers oc and fJ such that [oc]91 = [fJ]m and N(ocfJ) > O. Then we say that 91 and mare equivalent in the narrower sense, and we write 91 ~ m.
444
16. Introduction to Algebraic Number Theory
It is clear that being equivalent in the narrower sense is a special case of being equivalent.
Theorem 13.2. Equivalent quadratic forms belong to ideals which are equivalent in the narrower sense. Conversely, quadratic forms belonging to ideals which are equivalent in the narrower sense are equivalent forms. 0 Let ho denote the number of ideal classes (not in the narrower sense), and let h denote the number of classes under the narrower sense of equivalence. Assume that the discriminant of the field concerned is Ll. Then h is the class number of quadratic forms with discriminant Ll. If ~ '" m then either ~ ~ m or ~ ~ [flJm, and we deduce that h ~ 2h o. In fact, if ~ '" m, then there are integers oc, f3 such that [ocJ~ = [f3Jm. (i) If Ll < 0, then N(ocf3) > 0 so that ~ ~ m, and whence ho = h. (ii) If Ll > 0 and the fundamental unit 1] satisfies N(1]) =  1, then [ocJ~ = [f3Jm = [1]f3Jm and one of N(ocf3), N(ocf31]) must be positive, so that we still and ho = h. have ~ ~ (iii) If Ll > 0 and the fundamental unit 1] satisfies N(1]) = I then ~ cannot be equivalent in the narrower sense to both m and m[flJ, so that ho = h/2. Therefore we have
m
h' { ho = ~ 2'
if Ll < 0
or
Ll > 0,
if Ll > 0,
N(1])
N(1])
=
{1]2, 1],
1;
= + 1.
Also if we replace d by D in Theorem 11.4.4 and define B
=
if Ll > 0, if Ll > 0,
B
accordingly, then
N(1]) =  I; N(1]) = + I.
Again, from our results on the class number in Chapter 12 we have: Theorem 13.3. Let ho denote the number of ideal classes. Then
W
h,
[tILlIl(Ll)
~ 2(2 _(~)) ,~,
1] h o =
if Ll < 0,
; ,
£t O.
s,
Ll
s= 1
Example 1. In R(i) we have Ll =  4, W = 4 so that ho
=
Example 2. In R(J="3) we have Ll ho
=
±(~) =
4 2(2  0)S=1
s
I.
=  3, W = 6 so that
6 2(2  ( 1»
±(=2) =
s= 1
S
I.
0
445
16.14 Genus
Example 3. In R(J=5) we have LI
ho =
I (
2 2(2  0).=1
Example 4. In R(.J"=T9) we have LI
ho =
20, W
= 
=
20) s
19, W
f. (
2 2(2(I))s=1
2 so that
=
= 2.
= 2 so that
19) = 1. s
Example 5. In R(J2) we have LI = 8, e = 3 + 2J2. Since  1 and 1]2 = e, 1] is a fundamental unit. Also (1
+ J2)h = O
n
(sin ns)_ (~) = sin 3n
8
s= 1
so that ho
8
I
1]
= 1 + J2 has norm
sin ~ = (l
8
+ J2),
= 1.
16.14 Genus Let R(.ji5) be a fixed quadratic field with discriminant LI, and we shall assume in this section that the ideal classes are derived from the equivalence relation on ideals being equivalent in the narrower sense. Definition 14.1. If a quadratic form F(x, y) belongs to an ideal m: then we call the character system for F(x, y) (see Definition 12.6.1) the character system for m:. That is, if Pi>' .. ,Ps are the odd prime divisors of LI, we take an integer IX in m: so that (N(IX)jN(m:), 2L1) = 1 and we call
(N(IX)~~(m:))
(i= 1, ... ,s)
and 1[N(a)
]
"(IX)
= (
1)2 N(~)1
e(lX)
= (
1)8 N(~) 1,
"(IX) e(IX),
1[(N(a»),
if
,
]
LI D==.3 4
(mod 4);
LI 4
if
=.2 (mod 8);
if
=.6
LI 4
(mod 8)
the character system for m:. Since ideals belonging to the same class have the same character system we may speak of the character system for an ideal class. Definition 14.2. Two ideal classes with the same character system are said to belong to the same genus. There is now a onetoone correspondence between ideal classes in the quadratic field R(.ji5) and classes of primitive forms having discriminant LI.
446
16. Introduction to Algebraic Number Theory
Theorem 14.1. The values of the character systemfor ~m correspond to the products of the values of the character systems for ~, m.
Proof If a, 13 belong to
~,
m respectively, then af3 belongs to N(a)
N(f3)


N(~)
N(m)
~m.
Also
and
and if ( N(a)
N(~)'
2.1) = 1,
The theorem is proved.
N(f3) ( N(m) ,
2.1) = 1,
then
( N(af3)
N(~m)
,
2.1)
=
1.
D
From this theorem we deduce at once: 1) The character system for the product of two classes is the product of the two character systems. 2) If {~} and {m} belong to a genus, and {~d{md belong to a genus, then {~~d and {mmd also belong to a genus. Definition 14.3. We call the class to which the unit ideal .0 belongs the principal class, and the genus to which the principal class belongs the principal genus. Also, if ~m = [a] where a is a natural number, then we call {m} the inverse of the class {~}.
From Theorem 7.1 we see that the inverse of any ideal class always exists. Also = {~}. Since the values of the character system for the principal class, as well as for all the classes in the principal genus, are all 1, it follows that the product of any two classes in the principal genus, and the inverse of any class in the principal genus, are classes in the principal genus. (The family of all ideal classes forms a group with respect to class multiplication, and the subfamily of ideal classes in the principal genus forms.a subgroup.) {.o}{~}
Theorem 14.2. Every genus has the same number of classes.
Proof We let ~ be the principal genus, and we let ~{~} denote the family of classes obtained from the product of classes in ~ with {~}. We put all the ideal classes into various families (1)
where {~i} is any class not belonging to~, ~{~2}' ... '~{~i d. It is easy to see that there is no ideal class which belongs to two of the families in (1).
447
16.15 Euclidean Fields and Simple Fields
From Theorem 14.1 we know that in each family in (1) all the classes belong to the same genus, and distinct families belong to different genera, so that each family in (1) forms a genus. Since any two classes in 3{~i} are distinct the theorem is proved. 0
16.15 Euclidean Fields and Simple Fields Definition 15.1. If ho = 1, then we call the field a simple field. It is clear that, in a simple field, every ideal is a principal ideal. Therefore we have:
Theorem 15.1. The unique factorization theorem holds for integers in a simple field. 0
There is a type of simple fields, called Euclidean fields, having properties which are very similar to those of the rational field. Definition 15.2. If, corresponding to any two integers ~, 1] (1] # 0) in R(jD), there exist two integers K, A. such that IN(A.) I < IN(1])I,
(1)
then we call R(jD) an Euclideanfield. An alternative definition is: Defmition 15.3. If, corresponding to any b in R(jD), there exists an integer K such that IN(b  K)I < 1,
(2)
then we call R(jD) an Euclidean field. Theorem 15.2. Every Euclidean field is a simple field.
Proof Let R(jD) be Euclidean. In order to prove that R(jD) is simple it suffices to show that every ideal is a principal ideal. Let ~ be any ideal in R( jD) and let 1X1' 1X2 be a basis for ~, and we may assume without loss that 0 < IN(1X1)1 ~ IN(1X2)1. Since R(jD) is Euclidean there are integers IX~ and /32 such that 1X2 = 1X~1X1 + /32, IN(/32)1 < IN(1X1)1· If /32 # 0, then there are lX'l and /31 such that 1X1 = 1X'1/32 + /31> IN(/31)1 < IN(/32)1. Continuing with the argument, which must terminate after a finite number of steps because IN(1X1)1 is a natural number, we arrive at an integer IX such that ~ = [1X1> 1X2] = [IX]. The theorem is proved. 0
448
16. Introduction to Algebraic Number Theory
Theorem 15.3. There are only five quadratic imaginary Euclidean fields, namely
R(~), R(j=2), R(j=3), R(~) and R(FU). Proof 1) Let D == 2, 3 (mod 4). Put" = r + sJD, K = x + yJD. Then the condition (2) becomes: corresponding to any pair of rational numbers r, s there are rational integers x, y such that (3) Settingr = s = tthecondition (3) gives± + IDI± < 1, or IDI < 3. Therefore R(JD) cannot be Euclidean if D ~  3. On the other hand, if r, s are given rational numbers we can always find rational integers x, y such that Ir  xl ~ t, Is  yl ~ t so that corresponding to D =  1,  2, the inequalities I(r  X)2  D(s  y)21 ~ ± + IDI± < I hold so that R(~) and R(j=2) are Euclidean. 2) Let D == 1 (mod4). Put" = r + sJD, K = x + y(1 + JD)/2 so that
Setting r = s = ± we have /6 + /61DI < 1 or IDI < 15. Therefore there can only be the three Euclidean fields R(j=3), R(~) and R(FU), and these fields are indeed Euclidean because, given rational numbers r, s we may choose rational integers x, y such that 12s  yl ~ t, Ir  x  (y/2) I ~ t, and therefore when D =  3,  7,  11,
I(r 
x 
~)2 2
_
D (s _
~)21 ~ ~ + IDI ~ ~ < 2
""" 4
16""" 16
l.
D
In §13 we calculated the class number for R(Fl9) to be l. We see therefore that there are simple fields which are not Euclidean. From Theorem 12.15.4 we know that there are only finitely many imaginary fields which are simple. The question then is exactly how many? It is not difficult to prove that R(JD) is simple when D =  1,  2,  3,  7,  11,  19,  43,  67,  163.
It has also been proved thatthere is at most one more value of D, and that ifit exists, then D <  5 . 109 • (In fact no extra D exists; see Notes.) Concerning real Euclidean fields we have:
Theorem 15.4. The field R(JD) is a real Euclidean field only when D = 2,3,5,6,7,11,13,17,19,21,29,33,37,41,57,73.
D
Various Chinese mathematicians, including the author, made contributions to this problem, which in principle was eventually settled by Davenport. The proof of the theorem is beyond the scope of this book.
449
16.16 Lucas's Criterion for the Determination of Mersenne Primes
16.16 Lucas's Criterion for the Determination of Mersenne Primes We first sharpen Theorem 9.S for the quadratic field R(JiJ), D > O. From Theorem 10.3 we know that all the prime ideals can be separated into three classes according to whether (~) = 0, + I or  l. We shall write q for a prime number satisfying (~) = I so that q = .0,0; we write r for a prime number satisfying (1) =  1 so that r itself is a prime ideal in R(JD). From Theorem 9.S we have, if Q'(O(, then
=1
(mod .0),
(1)
O(r'l=1
(modr).
(2)
O(q1
and if r,(O( then
Theorem 16.1. Suppose that q, r are not 2. O(q1
=1
If q,(O(,
then
(modq),
(3)
and if r,(0(, then O(r+ 1
=
(mod r).
N(O()
(4)
Observe that (1) and (3) are equivalent, and that (2) follows (4).
Proof Let 0( = a + b(.1 + fl)/2 where a, b are rational integers. Let p be an odd prime so that, from Fermat's theorem, O(P
=a
P
+ bP
.1P
+ (fl)p 2P
b p1 =a + (.1 + .1 2 fl) 2
=a + ~(.1 + (;)fl) Therefore if p = q, then O(q = (modr) which gives (4). 0
0(
(modp).
(modq) which gives (3), and if p = r, then O(r = IX
Now let p be an odd prime and we shall examine the nature of the Mersenne number M = Mp = 2P  l. If there exists .1 > 0 such that
(~) = and there exists a unit
1
e in R(fl) satisfying N(e) =
where e' is the conjugate of e.

1, then we let
450
16. Introduction to Algebraic Number Theory
Theorem 16.2. A necessary and sufficient condition for M to be prime is that rp  l
== 0 (modM).
(6)
Proof 1) Assume that M is a prime. From (5) we know that M is of the type r, and so from Theorem 16.1 we have eM + 1 ==  1 (mod M) and therefore
2) Assume that M is composite, say M = ql ... q s r 1 ... r t • From (5) we know that at least one of the prime divisors of M is of type r. If (6) holds, then Mlr p  l or
and hence e2P
==  1 (mod M),
(7)
== 1 (modM).
(8)
and on squaring e 2P + 1
Let 'P be a prime ideal divisor of M and let I be the least positive integer satisfying el == 1 (mod 'P). Then, by (8), 1I2P + 1, and so by (7), 1= 2P + 1. If 'P is a divisor of a certain q, then eq  1 == 1 (mod 'P) by Theorem 16.1, and hence 2P + 11 q  1, which is impossible because q cannot exceed M. If 'P is a certain r, then er + 1 ==  1 (mod r) by Theorem 16.1. This then gives 2P + 112(r + 1) andsor = 2Pm  1. Butr ~ Mso thatm = 1, r = M. That is Mmust be prime after all. 0 Example. Take L1
= 5, e = (1 + .j5)/2 so that
Ifwe take p = 7, Mp = 127, then the residues mod 127 for r m (m = 1,2,3,4,5,6) are 3,7,47,48,16, O. Therefore 127 is a prime. Of course the full power of the theorem is not revealed in this specific example. However, with the aid of electronic computers, the same method can be used to show, for example, that the 687 digit number M 2281 = 2 2281  1 is prime. Indeed all the large known Mersenne primes are found by essentially the same type of method.
16.17 Indeterminate Equations The invention of the theory of ideals to tackle Fermat's problem is an important development in algebraic number theory. From the standpoint of mathematics this theory is far more important than that of settling a difficult problem. Let p be an
451
16.17 Indeterminate Equations
odd prime and p
= e21ti/ p •
If we can prove that
has no integer solutions in the field R(p), then obviously Fermat's Last Theorem is established. The expression ep + 1]P can be factorized into linear terms in R(p) so that the problem is easier to start with. Indeed this is Kummer's starting point in his research on Fermat's problem, but the principal difficulty lies with the absence of a unique factorization theorem. It is for this reason that Kummer invented his theory of ideals which has now become an indispensable part of mathematics. It is not easy to understand Kummer's method. That is, even if we assume that there is unique factorization in R(p), we still need a deep theorem of Kummer's before we can settle Fermat's pr.oblem. The theorem concerned is as follows: A necessary and sufficient condition for a unit B in R(p) to be a pth power of another unit is that B is congruent to a rational number mod (1  PY. We can only consider two simple examples in this book. Theorem 17.1. The equation (1)
has no solution in integers in R(J=1). Proof The unique factorization theorem holds in the field R(J=1), that is every ideal is a principal ideal. We may therefore assume without loss that (e,1]) = 1. 1) Let A = 1  i. Then A is irreducible, and A2 =  2i and 2 = i(l  i)2 are associates. Also N(2) = 4 so that every integer in R(J=1) must be congruent to one of the four numbers 0, 1, i, 1  i (mod 2). Since 0, 1  i are divisible by A, any integer DC not divisible by Amust satisfy DC == 1 or i (modA2) so that DC = 1 + PA 2 or DC = i + PA2, and hence (2)
Now let e, 1],. satisfy (I). Suppose, if possible, that e, 1] are not divisible by A. From (2) and (1) we have 2 == .2 (modA 6 ). Since 2 = A2 i we see that AI •. Write • = AY so that A,j'y, and iA 2 == A2/ (mod A6) or / == i (mod A4 ). On squaring this we deduce from (2) that 1 == y4 ==  1 (mod A4), which is impossible. Therefore one of e, 1] is divisible by A. By symmetry we may assume that Ale, and we now write e = Anb, n ;;:: 1, A,j'b, so that we have
2) We now prove a more general result, namely that there are no integers b, ., 1]
in R(
J=1) such that
B
unit,
A,j'b1],
(b,1]) = I,
n;;:: 1.
(3)
452
16. Introduction to Algebraic Number Theory
The proof is divided into two steps. In the first step we show that if (3) is soluble then
n must be at least 2; in the second step we show that if (3) is soluble for a certain n, then it is soluble for n  1 also. The theorem therefore follows from this contradiction. If(3)holdsforintegersD,r,I]thenAt!"r.SinceN(A) Let r = 1 + JJ.A so that on squaring we have'
= 2weseethatr == 1 (mod A).
Also, by (2), (4) so that, by (3),
Thus AIJJ. and we may write r
= 1 + VA 2 ,
r2 = 1 + 2VA2
+ V2A4 =
1 + A4V(i
+ v).
(5)
Since v, i + v form a complete residue system mod A we have v(i + v) == 0 (mod A) giving r2 == 1 (modA 5 ). From (3) and (4) we deduce that GA 4n D4 == r2  1]4 == 0 (mod A5), and we conclude that n ~ 2. Now assume that D, r, I] satisfy (3) with n ~ 2. Then GA 4n D4 = (r  I]2)(r + 1]2). From (5) we have r == 1 (mod A2 ), and on the other hand, since At!"I] we have (6)
it follows from (7) that A4 (nl) must divide one of these two divisors. We may assume that A4 (n1) actually divides the latter divisor, since otherwise we may replace I] by il]. From (7) we have r
+ .,,.2
_
14(n1)
~G211.
rp
4
(At!"rpO", (0", rp)
where G1, G2 are two units. Thus
21]2
il]2
= Y = G2 A4(nl)rp4  G10"\
= 1),
453
16.17 Indeterminate Equations
or
where 83 =  8di, 84 = 82/i are also units. Since n ~ 2, A/,a we see from (2) that 1]2 == 83 (mod A4) and hence, by (6), 1 == (mod A2). Therefore 83 is either + 1 or  1 and not ± i, that is
A/,q>a,
83
(q>, a) = 1.
Ifwe take the negative sign here then our second step follows at once, and if we take the positive sign then the same result is obtained by replacing 1] by i1]. 0 Theorem 17.2. The equation (8)
has no solution in integers in R(p), p = ( 1 + ~)/2. Proof Since R(p) is a simple field we may assume that (~, 1]) = 1. ·1) Let A = 1  p, so that 1  p2 =  p2(l  p) =  p2A and N(A) = _ p2A2 = 3. Therefore Ais irreducible and all the integers are partitioned into three classes represented by 0, 1,  1. Therefore, if A/,~, then ~ == ± 1 (mod A). We shall now show that (9)
Let
We need only consider the ~ = 1 + f3A so that ~3
_
+ sign case, since otherwise we may replace ~ by 
~.
= f3A(f3A + 1  p)(f3A + 1 _ p2) = f3A(f3A + A)(f3A  p2 A) = A3 f3(f3 + 1)(f3 _ p2).
1 = (~

1)(~
 p)(~  p2)
Since f3, f3 + 1, f3  p2 are incongruent mod A, and N(A) = 3 there must be one of them which is divisible by A. We deduce that if A/'1], then (10)
Now if A/, ~1]', then 0 == ~3 + 1]3 + ,3 == ± 1 ± 1 ± 1 (mod A3). The possible choices are ± 1, ± 3 and none of them is divisible by A3, so that one of ~,1]" must be divisible by .Ie. Let it be , = Any, n ~ 1, A/,y so that (~,1])=
1,
A/,y,
n~
1.
2) We shall now prove a more general result, namely that (~, 1]) =
1,
A/,y, n
~
1,
(11)
454
16. Introduction to Algebraic Number Theory
where 8 is a unit, has no integer solutions in R(p). As in the proof of Theorem 17.1 we separate into two steps where, in the first step, we show thatif(ll) has a solution, then n ~ 2, and in the second step we show that if (11) has a solution, then n may be replaced by n  1 and there is still a solution. The theorem then follows by this contradiction. If (11) has a solution, then by (10)
Since + 1 + 1 and  1  1 are not divisible by A we see that  8A 3n y 3 == 0 (mod A4 ), and so n ~ 2. Suppose that e, 1], yare solutions to (11). From 1 == p == p2 (mod A) we deduce that e + 1] == e + P1] == e + p21] (mod A) and hence  8A 3ny 3 = e 3 + 1]3 = (e + 1])(e + p1])(e + p21]) where the three divisors are all multiples of A. It is not difficult to show that (e + 1])/A, (e + P1])/A, (e + p21])/A are pairwise coprime. In fact, for example, from (e + 1])  (e + P1]) = A1] and p(e + 1])  (e + P1]) =  Aeweseethat(e + 1])/Aand(e + P1])/A are coprime. Thus one of the three divisors. in the factorization
_ 8A 3 (nl)y3
=
e
+ 1] e + P1] e + p21] A
A
A
must be a multiple of A3 (nl), and we may assume that it is (e we can replace 1] by P1] or p21]. Hence
+ 1])/A since otherwise (12)
where 81> 82, 83 are units and jl, v, u are pairwise coprime integers not divisible by A. From (12) we have
giving
(V,u)
=
1,
A,(jl
(13)
where 84, 85 are also units. From (13) we have v3 + 84U3 == 0 (mod A2) and here, by (10), ± 1 ± 84 == 0 (modA 2). Among the units ± 1, ± p, ± p2 only 84 = ± 1 can satisfy this congruence. Hence 84 = ± 1 and we see that (13) is the same as (11) with n replaced by n  1. The theorem is proved. 0
16.18 Tables We conclude this chapter with two tables displaying all the quadratic fields R(.ji5) with  100 < D ~ 100. We list their integral basis, discriminants, ideal classes and the quadratic forms associated with the ideal classes together with their
16.18 Tables
455
character systems. We also display the continued fraction representations for OJ and the fundamental units in the second table. More precisely: In Table I, the first column is the value for D. The second column is OJ (see the definition in Theorem 4.5). The third column is the discriminant d. The fourth column displays the ideal classes of R(~ji). The fifth column indicates the relationship between the ideal classes. The sixth column displays the quadratic forms representing the classes of forms corresponding to the ideal classes. The seventh column is the character systems associated with these classes of forms. In Table II, the first two columns are as before. The third column displays the continued fractions expansion representing OJ when D is squarefree and representing JD when D is not squarefree. The fourth column is the discriminant d. The fifth column displays x + yJD when D is squarefree and it is the fundamental unit 1] of R(JD); when D is not squarefree it displays the least positive integer solutions to x 2  y2 D = ± I (if x 2  y2 D =  I is soluble, then x + yJD satisfies x 2  y2 D =  I, otherwise x, y satisfy x 2  y2 D = + I). The sixth column is N(x + yJD). The last four columns are the same as the last four columns in Table I.
456
16. Introduction to Algebraic Number Theory
Table I D
00
LI
Ideal classes
I
)=l
_22
2
)2
3 5
6
Quadratic forms
Character systems
(I)
X2 +y2
+1
_23
(I)
2X2+y2
+1
3
(I)
X2+xy+y2
+1
)5
2 2 ·S
(I)
A2
SX2+y2
+1, +1
(2, I +) S)
A
3X2 +2xy+2y2
I, I
)6
2 3 .3
(I)
A2
6X2+y2
+ I, +1
A
3x2 +2y2
I, I
2X2+xy+y2
+1
1+)3 2
(2,) 6) 7 10
\I
13
14
1+)7 2 )10
1+)11
7
(I)
5.2 3
(I)
A2
IOx2 + y2
+1, +1
(2,) 10)
A
5x 2 +2y2
I, I.
3x2 +xy+y2
+1
A2
13x2 +y2
+1, +1
(2, I +) 13)
A
7X2+2xy+2y2
I, I
(I)
[4
14x2 + y2
+1, +1
(3,2+) 14)
[3
6x 2 + 4xy + 3y2
I, I
(2,) 14)
[2
7x 2+2y2
+ I, + I
Sx2+2xy+3y2
I, I
A2
4X2+xy+y2
+1, +1
(2, I +(0)
A
(I)
3X2+3xy+2y2 17x2+y2
I, I
[4
+ I, +1
(3,2+) 17)
[3
7X2+4xy+3y2
I, I
(2,1 +) 17)
[2
9x2 + 2xy + 2y2
+ I, + I
(3, I +) 17)
6x 2 +2xy+3y 2
I, I
19
(I)
Sx 2 +xy+y2
+1
3.2 2 .7
(I)
A2A~
+2Ix2+y2
+1,+1,+1
AAI
6x 2 +6xy+Sy2
1,1,+1
AI
7x 2 +3y2
+1,1,1
(2, 1+) 21)
A
IIx2+2xy+2y2
1,+1,1
(I)
A2
22x2+y2
+1, +1
(2,) 22)
A
IIx 2 +2y2
I, I
II
(I)
)13
2 2 .13
(I)
)14
_7.2 3
2
Relations
(3,1+)14) IS
17
19 21
1+)15 2 )17
1+)19 2 )21
3·S
2 2 .17
(I)
(5,3+) 21) (3,)21)
22
)22
23 .11
457
16.18 Tables Table I (continued) D
23
w 1 +) 23 2
29
)29
Quadratic forms
Character systems
(1)
[3
6x 2+xy+y2
+1
[2
4x 2 + 3xy + 2y2
+1
3x2+2xy+2y2
+1
2,1+ (2,
)26
Relations
23 (
26
Ideal classes
2 3 .13
2 2 .29
1+)23) 2
1+~23) (1)
[6
26x2+y2
+1, +1
(5,3+)26)
[5
7x 2+6xy+5y 2
1, 1
(3,1 +) 26)
[4
9X2+2xy+3y2
+1, +1
(2,) 26)
[3
13x2+2y2
1, 1
(3,2+)26)
[2
10x2+4xy+3y 2
+1, +1
(5,2+)26) (1)
[6
6x 2+4xy+5y 2 29x 2+y2
1, 1 +1, +1
(3,2+) 29)
[5
l1x2 +4xy+3y 2
1, 1
(5,4+) 29)
[4
9x 2+8xy+5y 2
+ 1, + 1
(2,1 +) 29)
[3
15x2+2xy+2y 2
1, 1
(5,1 +) 29)
[2
6x 2+2xy+5y2
+ 1, +1
(1)
A2A:
10x2+2xy+3y 2 30X2+y2
+1,+1,+1
(2,) 30)
AAI
15x2+2y2
1, 1, +1
(3,) 30)
Al
10x2+3y2
+1,1,1
(5,) 30)
A
1, +1,1
(1)
[3
6x 2+5y2 8x 2+xy+y2
(2,w)
[2
4X2+xy+2y2 5x 2+3xy+2y 2
+1
(1)
A2A:
33x2+y2
+1,+1,+1
(2,1 +) 33)
AAI
17x2+2xy+2y2
1,1,+1
Al
11x2+3y2
1,+1,1
(6,3+)33)
A [4
7x 2+6xy+6y 2 34x2+y2
+1,1,1
(I)
(5,4+) 34)
[3
IOx 2+8xy+5y 2
I, I
(2,)34)
[2
17x2+2y2
+1, +1
(3,1 +) 29) 30
31
)30
W+)31)
2 3 .3.5
31
(2,1 +w) 33
)33
_22 ·3·11
(3,) 33) 34
)34
2 3 .17
(5,1 +) 34) 35
W+)35)
5·7
1, 1
+1 +1
+1, +1
7x 2+2xy+5y 2
I, I
(I)
A2
9X2+xy+y2
+1, +1
(5, 5+~ 35)
A
3x 2+ 5xy + 5y2
I, I
458
16. Introduction to Algebraic Number Theory
Table I (continued) D
w
Ll
Ideal classes
Relations
Quadratic forms
37
j37
2 2 '37
(1)
A2
37x 2+y2
+1, +1
(2,1 +j 37)
A
1, 1
(1)
[6
19x2+2xy+2y 2 38x 2+y2
(3,2+j 38)
[5
14x2 + 4xy+ 3y2
1, 1
(7,2+j38)
[4
6x 2+4xy+7y2
+1, +1
(2,j 38)
[3
19x2+2y2
1, 1
(7,5+j38)
[2
9x 2+ 10xy + 7y2
+1, +1
[4
13x2+2xy+ 3y2 lOx 2+xy+y2
1, 1
(1) (2,1 +w)
[3
(3,1 +w)
[2
38
j38
2 3 .19
(3,1 +j 38) 39
to+j39)
3,13
j41
2 2 '41
j42
43
!O+j43)
46
j46
3.2 3 .7
43 2 3 .23
to+j47)
47
41x2 + y2
+1, +1
(3,2+j41)
[1
15x2+4xy+3y 2
I, 1
(5,3+j41)
[6
lOx2 +6xy+5y 2
+1, +1
(7,6+'j41)
[5
Ilx2
1, 1
(2,1 +j 41)
[4
21x2+2xy+2y2
+1, +1
(7,1 +j 41)
[3
6x 2+2xy+7y 2
1, 1
(5,2+j 41)
[2
9x 2+4xy+5y 2
+1, +1
(1 )
A2Af
14x2+2xy+3y 2 42x2+y2
+1,+1,+1
(7,j 42)
AAI
6x 2+7y2
+1, 1,1
(3,j 42)
Al
14x2+3y2
1,1,+1
(2,j42)
A
21x2+2y2
1, +1,1
(1)
I
(1)
[4
Ilx2+xy+y2 46x 2+y2
+1, +1
(5,3+j46)
[3
Ilx 2 + 6xy + 5y2
1, 1
(2,) 46)
[2
23x2+2y2
+ I, +1
lOx 2+4xy+5y 2
1, 1
!O+j51)
3·17
+ 12xy+7y 2
1, 1
+1
(1)
[5
12x2+xy+y2
+1
(2,w)
[4
+1
(3,2+w)
[3
6X2+xy+2y2 6x 2 + 5xy+3y2
(3,w)
[2
4X2+xy+3y2 7x 2 + 3xy + 2y2
+1
(1)
A2
(3,1 +w)
A
13x2+xy+y2 5x 2+3xy+3y2
+ I, +1 1, 1
(2,1 +w) 51
+1, +1
[8
( 1)
(5,2+j46) 47
+1, +1 1, 1 1, 1
(3,1 +j 41) 42
+1, +1
5X2+xy+2y2
(2,w) 41
6x 2+3xy+2y 2 4x 2 + 3xy+3y 2
Character systems
+1 +1
459
16.18 Tables Table I (continued) D
w
Ll
Ideal classes
Rela,tions
Quadratic forms
53
)53
2 2 .53
(I)
[6
53x 2+y2
+ I, + I
(3,2+)53)
[5
19x2+4xy+3y 2
I, I
(9,8+)53)
[4
13x2 + 16xy+9y 2
+1, +1
(2, 1+) 53)
[3
27x2+2xy+2y2
I, I
(9, 1+) 53)
[2
6x 2+ 2xy + 9y2
+1, +1
18x2+2xy+3y 2
I, I +1, +1 I, I
(3, 1+) 53) 55
W+)55)
5·11
(I)
[4
14x2+xy+y2
(2, I +w)
[3
(5,2+w)
[2
8x 2+3xy+2y 2 4x 2 + 5xy+ 5y2
)57
+1,+1,+1
(I)
A2Ai
(2, 1+) 57)
AA.
29x2+2xy+2y2
1,1,+1
(3,) 57)
A.
19x2+3y2
+1,1,1
A A2
IIx 2 + 6xy+6y 2
I, +1,1
58x2 +y2
+ I, + I
(2,) 58)
A
29x 2+2y2
I, I
(I)
[3
15x2+xy+y2
+1
(3, 5+~ 59)
[2
7x 2 +5xy+3y 2
+1
5X2+xy+3y2
+1
3.2 2 .19
(6,3+) 57) 58 59
)58 W+)59)
2 3 .29
(I)
59
(3, 61
)61
2 2 .61
1+~59)
)62
2 3 .31
I, I
(I)
[3
61x2 + y2
+ I, +1
(5,3+)61)
[2
14x2 + 6xy+ 5y2
+ I, + I
(5,2+) 61)
62
+1, +1
7X2+xy+2y2 57x 2 +y2
(2,w)
57
Character systems
13x2 +4xy+5y 2
+1, +1
(7,4+) 61)
A[2
IIx 2 + 8xy+ 7y2
I, I
(7,3+) 61)
A[
IOx 2 +6xy+ 7y2
I, I
(2, 1+) 61)
A
31x 2 + 2xy + 2y2
I, I
(I)
[8
62x2+y2
+ I, +1
(3,2+)62)
[7
22x2+4xy+3y2
I, I
(7, 1+) 62)
[6
9x 2+2xy+7y 2
+ I, + I
(11,2+) 62)
[5
6x 2 +4xy+ lIy2
I, I
(2,) 62)
[4
31x 2 +2y2
+1, + I
(11,9+) 62)
[3
13x2+ 18xy+ lIy2
I, I
(7,6+) 62)
[2
14x2 + 12xy+ 7y2
+1, +1
2Ix2+2xy+3y2
I, I
(3, I +) 62)
460
16. Introduction to Algebraic Number Theory
Table I (continued) D
w
Ll
Ideal classes
Relations
Quadratic forms
Character systems
6S
j6S
2 2 'S'13
(I)
14
6Sx 2+y2
+1,+1,+1
(3,2+j6S)
13
23x 2+4xy+3y 2
1,+1,1
(9,4+j 6S)
12
9x 2 + 8xy + 9y2
+1,+1,+1
(3, I +j 6S)
22x2+2xy+3y2
1,+1,1
(II, 10+j 6S)
AI3
ISx 2+ 20xy + IIy2
+1,1,1
(2, I +j 6S)
AI2
33x2+2xy+2y2
1,1,+1
(II, I +j 6S)
Al
6x2+2xy+lly2
+1,1,1
(S, j 6S)
A 14
13x2+Sy2 66x 2+y2
I, I, +1
(I) (S,3+j66)
[3
ISx 2+6xy+Sy2
1,+1,1
(3,j 66)
12
22x 2 + 3y2
+1,+1,+1
14x2+4xy+Sy2
1,+1,1
AI2
IOx 2 + 4xy + 7y2
+1, 1,1
AI2
6x 2+ IIy2
I, I, +1
(7,S+j 66)
Al
13x2+ 10xy+ 7y2
+1, 1,1
(2,j 66)
A
33x 2+2y2
1,1,+1
(I)
14
17x2+xy+y2 69x 2+y2
+1,+1,+1
(7,6+j 69)
13
ISx 2+ 12xy+ 7y2
+1,1,1
(6, 3+j 69)
12
13x2+6xy+6y 2
+1, +1, +1
(7, I +j 69)
IOx 2+2xy+7y 2
+1,1,1
(S, I +j 69)
A[3
14x2+2xy+Sy2
1,1,+1
(3,j 69)
A[2
23x2+3y2
1,+1,1
(S,4+j69)
Al
17x2+8xy+Sy2
1,1,+1
(2, I +j 69)
A
(I)
A2A~
3Sx2+2xy+2y 2 70X 2+y2
+1,+1,+1
(7,j 70)
AAI
IOx 2 + 7y2
I, I, +1
(S,) 70)
AI
14x2+Sy2
+1, 1,1
(2,j70)
3Sx 2+2y2
1,+1,1
(I)
A 17
71x2+y2
+1
(2, 3+~ 71)
16
IOx 2+3xy+2y 2
+1
(s, 7+~
IS
6x 2+7xy+Sy2
+1
66
j66
2 3 '3'11
(S,2+j66) (7,2+j66) (11,j 66)
67 69
70
71
1l ::r
+1, 1,1
Z c:
8
"... 0
'
C4Y,
(3)
or
~I
max(m, Igol) such that
IhtO ghQ(h) 1< 1, then the required contradiction will be obtained from equation (l). By Theorem 6.2 it suffices to prove that, for any fixed x, we have n
as k=O but this is easy because, as p
p + 00,
+ 00, m
IT
IxIP 1 (h + Ixl)P L lakllxl k ~ _ _"h==l_ _ _ + O. (p  I)! k=O n
D
486
17. Algebraic Numbers and Transcendental Numbers
17.7 The Transcendence of 1C Theorem 7.1. The number n is irrational. Proof (Niven). Suppose the contrary, so that n = alb where a, b are positive integers. Let xn(a  bx)n f(x)=
n!
and F(x)
= f(x)  jO.
Let m
P(y) =
n (y h=l
alXh)'
Since 1 + e i1t = 0, it suffices to show that m
n (eO 
R=
e"h) =F O.
h=l Now R can be written as
R = c + Le"
+ Led'" + ... = c + efJl + efJ2 + ... + efJ
r,
where c is the number of the 2m terms in which the power of the exponential is zero, and 131,132,'" , f3r are nonzero numbers. Let p be a prime greater than max(c, a, n~= 1 alf3hl) and define I(x) by (ax)pl I(x) =
n (ax 
af3h)P
h=l
(p  I)!
Similarly to the proof of Theorem 6.3 we have YP,h(X  f3h)P
+ Yp+ l,h(X 
f3h)P+ 1
+ ...
(p  I)!
where A" being symmetric functions of af31, ... , af3r and hence symmetric functions 'of alXb" . , alXh, are rational integers and A p 1 =1= 0 (modp). On the construction of the corresponding F(x) and Q(x) we have F(O)R = F(O)
(c + tl h
efJh)
tl
t
= cF(O) + h F(f3h) + h Q(f3h),
so that cF(O) = C(Apl
+ pAp + ... )
is a rational integer which is not a mUltiple of p. Also
L F(f3h) = L (PYP,h + pcp + 1)yp+ l,h + ... )
h=l
h=l
L
L
Yp,h + pcp + 1) Yp+ l,h h=l h=l = pCp + pcp + I)Cp+l + ... ,
=p
+ ...
488
17. Algebraic Numbers and Transcendental Numbers
where cP' cp + 1>" ., being symmetric functions of apt> . .. , apr> are integers. It follows that L~ = 1 F(Ph) is a mUltiple of p and whence
IcF(O) + htl F(Ph) I~ 1. It only remains to show that, for sufficiently large p,
But, as p
+ 00,
n
L lakllxlk ~ k=l
L
(alxl)Pl (alxl + alPhl)P _ _ _:.:....h==l'_ _ _ _ + 0 (p  I)!
so that the result follows from Theorem 6.2.
D
Remark. This theorem settles the problem of "squaring the circle"  it is impossible to construct a square equal in area to a given circle, using only straight edge and compass. Exercise 1. Prove that sinh
eis transcendental whenever eis rational.
Exercise 2. Prove that sin 1 is transcendental by proving that ei is transcendental.
17.8 Hilbert's Seventh Problem In the year 1900 Hilbert gave a list of 23 unsolved problems which he believed to be worthy of the attention of mathematicians in the twentieth century. We already mentioned the first part of his seventh problem, and the remaining part is the following: Let (J( and Pbe algebraic numbers with (J( =F 0, 1 and Pirrational. Does it follow that (J(P is transcendental? As specific examples he asked for the proofs of the transcendence of 2J2 and e" = (  1)  i. In 1929 the Russian mathematician A. O. Gelfond made an important contribution to the solution of this problem. He proved the transcendence of e" and pointed out that his method can be used to settle Hilbert's problem when plies in an imaginary quadratic field. In 1930 Kusmin used Gelfond's method to settle the case when P lies in a real quadratic field and proved in particular that 2J2 is transcendental. Then in 1934 the complete solutions to Hilbert's problem were given independently by Gelfond and Schneider. It may be of some interest to recall that, when discussing this problem, Hilbert was of the opinion that the solution would not be available before the solutions to the Riemann's hypothesis and Fermat's last theorem. It seems therefore that it is very difficult to judge the difficulty of an unsolved problem before a solution is available.
489
17.8 Hilbert's Seventh Problem
Let K be an algebraic number field of degree h, and let /31, ... , /3h be an integer basis, so that every integer in K has the unique representation a1/31 + ... + ah/3h where al> . .. , ah are rational integers. We shall denote by loci the maximum of the modulus of the conjugates oc(i) (1 ~ i ~ h) of oc, that is
loci = max loc(i)I. 1 ~i~h
In the following we let c, Cl> C2 be natural numbers depending on K and its basis /3l> ... , /3h. It is easy to show that if oc is an algebraic integer with oc = a1/31 + ... + ah/3h, then lad ~ clocl· Lemma 8.1. Let 0 < M < N, and ajk be rational integers satisfying lajkl ~ A (A ~ 1, 1 ~j ~ M, 1 ~ k ~ N). Then there exists a set ofrationalintegers Xl> ... , XN, not all zero, satisfying 1
~j~
M,
(1)
and M
IXkl
~
[(NA)NM],
1 ~ k ~ N.
(2)
Proof Let 1 ~j~M,
so that this defines a mapping from rational integers (x 1, ••• , XN) to rational integers (Yl>· . . ,YM). We write NM M H = [(NA)NM] so that NA «H+ 1)~, and hence
NAH + 1 ~ NA(H
N
+ 1) < (H + I)M.
(3)
For any set of integers (Xl, . .. , XN) satisfying (4)
we have
where  B j and Cj represent respectively the sum of the negative and positive coefficients of Yj, so that the number of values assumed by Yj cannot exceed NAH + 1. The number of sets (Xl> ... , XN) satisfying (4) is (H + I)N and the corresponding number of sets (Yl> .. . ,YM) is at most (NAH + I)M. It follows from (3) that there must be two sets (x~, ... , x~) and (x~, ... , x~) which correspond to the same set (Yl> .. . ,YM). Let Xk = x~  x~ (1 ~ k ~ N) so that (Xl> .. . , XN) is now the required set satisfying (1) and (2). D
490
17. Algebraic Numbers and Transcendental Numbers
Lemma 8.2. Let 0 < p < q, and let a.kl (1 ~ k ~ p, 1 ~ I ~ q) be integers in K satisfying Ia.kll ~ A. Then there exists a set ofalgebraic integers e1,' .. , eq in K, not all ' zero, satisfying· l~k~p
(5)
and 1~ I
Proof Let el=XllfJ1 integers. Let
+ ...
+XlhfJh (1
~/~q)
~
(6)
q.
where Xll, ... ,X,h are rational (7)
where aklrl> ... ,aklrh are also rational integers. For 1 ~ k q
q
~
p we have, from (5), that
h
o = L a.kle, = L a.kl L x'rfJr 1=1
1=1
r=l
= rt J1 X'r ut aklrufJu = ut Ct1 J1 aklrUXlr)fJU' Since fJl>' .. , fJh are linearly independent we have the hp number of equations h
q
L L ak'rux'r = 0,
1~ u
~
h,
(8)
r=l'=l
with hq number of unknowns. From (7) and our remark preceeding Lemma 8.1 we see that laklrul ~ cmax 1';;i';;h IfJd A ~ c2A. It now follows from Lemma 8.1 that the system (8) has a nontrivial set of solutions in rational integers satisfying 1~ I
~
q,
1~ r
~
h.
Therefore
le,l Taking
C1 =
~
IX llllfJ11 + ... + IX'hllfJhl
~
c2h(l
+ (hqC2A y/(qP».
c2h the lemma is proved.
D
17.9 Gelfond's Proof Let a. and fJ be algebraic numbers with a. i= 0, 1 and fJ irrational, and we have to prove that a. Pis transcendental. Suppose the contrary, so that y = a. P = ePlog~ (where log a. may be any fixed value of the logarithm of a.) is also algebraic. We shall derive a contradiction.
491
17.9 Gelfond's Proof
Suppose that
0(,
f3, y lie in an algebraic field with degree h. Let m
where q2
= t
=
2h
q2 n=2m
+ 2,
is a square of a natural number and is a multiple of 2m. Also, let
P1, P2, ••. , PI represent the t numbers
(a
+ bf3) log 0(,
1 ~ a ~ q,
1 ~ b ~ q.
We introduce the integral function (1)
where the coefficients 1'/1>' . . ,1'/1 are determined by the following conditions. We solve the system of mn homogeneous linear equations
o~ k ~ n 
1~ I
1,
~
m,
(2)
in the t = 2mn unknowns 1'/1", .,1'/1' The coefficients of this system are numbers in K and
~
1
I
~
m,
1 ~ a,
b ~ q,
O~k~nl.
Let C1> C2, ••• denote natural numbers which are independent of n. There exists C1 such that C10(, Clf3 and C1Y are all integers in K, so that on multiplying each of the coefficients of the system by c~lc~qc~q = C~l + 2mq (~ c~), the resulting coefficients become integers in K. Moreover the absolute value of the conjugates of the various coefficients is at most
It follows from Lemma 8.2 that there is a nontrivial set of integers solutions 1'/1>' . ·,1'/1 in K such that
1~ k
~
t.
Since the numbers Pl, ... , PI are distinct, the function R(x) is not identically zero. For suppose otherwise; then on expanding the right hand side of(1) we have 1'/lP~
which implies 1'/1 R(x)
+ 1'/2P~ + ... + 1'/IP~ =
= 1'/2 = ... = 1'/1 =
=
anix _l)n
0,
k = 0,1,2, ...
0, a contradiction. Thus we see from (2) that
+ an+l,/(X _
/)n+1
+ ... ,
1 ~ I ~ m,
(3)
where an,I> an + l,I> ••• are not all zero. Hence there must be a natural number r such
492
17. Algebraic Numbyrs and Transcendental Numbers
that R(k)(/) = 0,0 ~ k ~ r  I, I ~ I ~ m. But for I so that we see from (3) that r ~ n. Let us now examine the number
~
10
~
m we have R(r)(lo)"# 0
(4) c~ + 2mq p
This number lies in K, and
is an integer in K so that
IN(p) I > c 1h(r+2m q )
s
> c r.
(5)
On the other hand (6)
We now determine a suitable upper bound for Ipl. We apply Cauchy's integral formula to the function S(z)
R(z) m r TI (z/o) k=l k *'0
= r!
(/0k)r . zk
'We then have
f
I S(z) p = (logoc)rS(/o) = (logoc)r. dz, (7) 2m z  10 c where Cis the circle Izl = m(l + r/q), so that 10 (~ m) lies inside C. Asz varies on the circle we have .
IR(z)1 ~ t max l'1kle(Q+qIPIl}oglal.m(l +~) ~k~t
1 ~
...."
tcnnt(n+l)Cr+Q 4
~
9
""
Cr rt (r+3) 10
,
mr ( + qr)  m = q'
Iz  101 ~ Izl  1/01 ~ m I Iz  kl
~
mr
, q
m
1
(z/o)rJI
I ~ k ~ m,
(I 
k)rl :k
~C~1 (q)mr ; ,
k*lo
IS(x)1
~ r!c~ort(r+3)S1 (~rr ~ c;2rtr(3m)+~.
From (7) we now have Ipi
~
I I(logoc)rl 2n .
II I
S(z)  Idzl z  10
c
493
Notes
From (6) and (8) we have
Replacing m by 2h
+ 2 we now have
and from (5) we deduce that rt'~h
< c'14 c'5  c'15'
Since r ~ n, this cannot hold for sufficiently large n, and the required contradiction is obtained. D
Notes 17.1. The proof of Roth's theorem has been omitted in this English edition (seeJ. W. S. Cassels [15]). W. M. Schmidt [51], [52] has given the following important generalization of this famous theorem: Let 0(10"', O(n be real algebraic numbers such that 1,0(10"', O(n are linearly independent over that rational field R. Then, given any B > 0, the inequality
has at most a finite number of sets of integer solutions
q10 ... ,qn'
17.2. A. Baker [2] has made the following important improvement on Thue's theorem: Let g(x,y) be a homogeneous irreducible polynomial of degree n (~3) with rational integer coefficients, and let m be a positive integer. Then all the integer solutions to the equation g(x,y) = m can be effectively determined. More specifically, if H exceeds the absolute values of all the coefficients of g(x, y), then all the integer solutions to g(x,y) = m must satisfy max(lxl, Iyl) < exp{(nH) 0, the equation
n = ~ + ... + x!,
(1)
is always soluble in integers xv. We now denote by g(k) the least of all integers s with this property. Then Waring's statement becomes: "g(2)
= 4,
g(3) = 9,
g(4) = 19,
and so on."
We also denote by G(k) the least number swith the property that (1) is soluble for all sufficiently large n. Then clearly we have G(k)
~
g(k),
but in actual fact there is a great difference between the two numbers. In this chapter we only prove some very special results. The proof of the WaringHilbert theorem (that is g(k) < (0) is given in the next chapter. The proof, which Khintchin described as one of the three pearls in number theory, is due to Linnik and is much simpler than the original proof by Hilbert.
18.2 Lower Bounds for g(k) and G(k) Theorem 2.1. g(k) Proof Let q
~
2k
+ [@k]
 2.
= [(W] and consider n = 2kq  1 < 3k.
495
\8.2 Lower Bounds for g(k) and G(k)
This number ncan only be the sum of the powers lk and 2k, and in fact the least sfor the decomposition is given by n
= (q  1)2k
+ (2k
 1)1\
that is, n requires (q  1) lots of 2k and 2k  1 lots of 1\ giving g(k) ~ 2k
+ q  2. D
From this theorem we see at once that g(2) ~ 4,
Theorem 2.2.
If k
~
~
9,
2, then G(k)
~
g(3)
g(4) ~ 19,
k
g(5)
~
37, ....
+ 1.
Proof Denote by A(N) the number of positive integers not exceeding N which are expressible in the form X~
We may suppose that
+ ... + x~,
Xl.' •. , Xk
are arranged so that
Hence A(N) cannot exceed the number of solutions to this set of inequalities, that is [N!lk]
A(N) ~
Xk
XkI
X2
L ... L
L L
1.
We claim that the sum on the right hand side is B(N) = 
1
k!
([N 1 / k]
+ 1)([N 1 /k] + 2)' .. ([N 1 /k] + k).
We can use induction to prove this. The claim clearly holds when k = 1, and so it remains to prove that
±(X +: 1) (Y +k), =
x=o
k
1
k
and this is easy to establish. When N ? 00, N
2
B(N) k! 1. In this section we discuss the condition for the solubility of the congruence x~+···+x~==n
(modq).
From the Chinese remainder theorem we see that we can restrict our discussion to the congruence x~
+ ... + x~ == 0
(modp'),
(1)
497
18.3 Cauchy's Theorem
where p is a prime number. Since n = n  1 + 1k we may also assume in what follows that pfn. We first prove the following: Theorem 3.1 (Cauchy). Let Xl>'" ,Xm be m incongruent numbers (modq) and Yl>'" ,Yn be n incongruent numbers (modq). Suppose that there exists Yi such that (Yi  Yi,q) = 1 whenever j i= i. Then the number of incongruent numbers (modq) represented by Xu + Yv (1 ~ u ~ m, 1 ~ v ~ n) is at least min(m
+n 
1, q).
Proof The theorem is trivial if n = I. Suppose therefore that n ~ 2 and we may also assume that i = 1. We use an inductive argument. Let Zl>'" ,Zt be incongruent numbers (modq) of the form Xi + Yi If t = q the required result is established. We suppose therefore that t < q and we denote by X, Y, Z the sets Xl>'" ,Xm ; Yl>'" ,Yn; Zl>'" ,Zt respectively. Consider Xl + Yl + A(Yn  Yl)' When A = 0, 1 all such numbers belong to Z. Since (q,YnYl)=1 there must exist a Ao such that xl+Yl+(AoI)(YnYt) E Z and Xl + Yl + AO(Yn  Yl) if; z. Let (j = Xl + Yl + AO(Yn  Yl) + Yl' Then (j  Yl if;Z and (j  YnEz. We can arrange Yl>'" ,Yn so that {
Clearly r
~ n 
~
(j 
Ys if;Z
(1
(j 
Ys'EZ
(r
1~'
we have min(d + (d  1)(s  1),p) = p so that the theorem follows. 24) p > 2, (p  1),tk, k = ptko, p,tko. From
and (p  1),tko, we see that Xk runs over at least (p  1)/(p  l,ko) (> 1) incongruent numbers (modp). Therefore X~
+ ... + x~,
gives min (
pl (p  1, k o)
+ (Pl
(p  1, k o)

1) (s  1) pY) ,
incongruent numbers modpY. From s  1 ~ 3k
~
2pk pl
pY
~ ~
1 pl 2 (ko,p  1)
pY  1 _p__l__ l (ko,p  1)
we see that x~ + . . . + x~(p,t Xl, ... ,X.) gives pY incongruent numbers. The proof of the theorem is complete. 0
18.4 Elementary Methods In the study of Waring's problem an elementary method usually gives rather poor results. We now introduce several examples which prove the existence of upper bounds for G(k) and g(k) for some special k. Sometimes we can even determine explicitly such an upper bound, but such a result will not be sharp. From Theorem 8.7.8 we already have that g(2) = 4.
500
18. Waring's Problem and the Problem of Prouhet and Tarry
Theorem 4.1. g(4)
~
50.
Proof We start with the identity 6(a 2
+ b 2 + c2 + d 2 )2 = (a + b)4 + (a  b)4 + (c + d)4 + (c  d)4 + (a + C)4 + (a  C)4 + (b + d)4 + (b  d)4 + (a + d)4 + (a  d)4 + (b + ct + (b  C)4.
Since a 2 + b 2 + c2 + d 2 can represent any positive integer, it follows that the left hand side of the identity represents 6x 2 where x is any integer. Now any integer n can be written as n
= 6N + r,
r
= 0, 1,2,3,4,5
so that n = 6(xi
+ x~ + x~ + x~) + r.
By the identity 6xi is representable as a sum of 12 biquadrates. Therefore n is the sum of at most 4 x 12 + 5 = 53 biquadrates. We take one further step. Any n ~ 81 is expressible as n
= 6N + t
where N~O, and t=0,1,2,8i,16 and 17 corresponding to n=:0,1,2,3,4,5 (mod 6). But 17 = 24
+ 1.
Therefore, following the method above, if n ~ 81, then it is the sum of 4 x 12 + 2 = 50 biquadrates. We can deal with n ~ 80 easily: If n ~ 50, then trivially n = n ·14. If 50 < n ~ 80, then n = 3 . 24 + (n  48) . 14 and this is the sum of 3 + n  48 < 50 biquadrates. D The same method together with the identity 5040(a 2
+ b 2 + c2 + d 2 )4 =
6L(2a)8
+ 60L(a±b)8 + L(2a ± b ± C)8
+6L(a±b±c±d)8,
(2)
can be used to prove that g(8) < 00. In this identity there are 840 8th powers on the right hand side, and since every n ~ 5039 is expressible as a sum of at most 273 numbers 18 and 28 , we see that g(8)
Theorem 4.2. G(3)
~
13.
~
840g(4)
+ 273 ~ 42273.
501
18.4 Elementary Methods
Proof We start with the identity 4
L «Z3 + Xi)3 + (Z3 
X;)3)
= 8z 9 + 6Z 3(Xi + X~ + X~ + x~).
(1)
i= 1
If a number is expressible as (2)
then from (1) this number must be a sum of 8 cubes; this is because m is expressible as xi + x~ + x~ + x~, and Xi ~ Z3. Let z be a positive integer congruent to 1 (mod 6). We denote by /z the interval (3)
Clearly, for sufficiently large z, we have q>(z
+ 6)
' .. , x., with the property that Xl
+ ... + Xs = Yl + ... + y., (I)
506
18. Waring's Problem and the Problem of Prouhet and Tarry
We also let M(k) denote the least s so that (1) holds, and furthermore, (2) Theorem 6.1. M(k) ;;:: N(k) ;;:: k
+ 1.
Proof From
X~
+ . . . + x~ = y~ + . . . +
Y:,
we have
so that Yl>' .. ,Yk is only a permutation of Xl>' .. , Xk'
D
Theorem 6.2. N(k) ~ M(k) ~ 2k.
Proof Let Xl>"" Xs;Yl>'" ,Ys be solutions to (1) and (2). Then (3) i= 1
i= 1 s
L ((Xi + d)k+2 + y~+2)"# L (X~+2 + (Yi + d)k+2).
(4)
i= 1
i= 1
The proofs of these two formulae follow from the expansions of (3), (4) and applying (I), (2). Thus, if M(k) exists, then taking s = M(k) we have M(k + 1) ~ 2M(k). But M(1) = N(I) = 2, so that the theorem follows by mathematical induction. D Theorem 6.3. N(k) ~ tk(k
+ 1) + 1.
Proof Suppose that n > s! Sk. Let ai (i = 1,2, ... ,s) run over 1,2, ... ,n. Then there are nSsets al> a2, ... ,as. Each fixed set al> a2, ... ,as has s! permutations. It follows that among the nSsets al> a2,' .. , as there are at least nS/s! sets in which every set is a permutation of a certain other set. Let
h = 1,2, ... ,k.
Then Therefore there are at most k
IT (sn h h= 1
s
+ I) < skntk(k+ 1)
507
IS.7 The Problem of Prouhet and Tarry
sets of different Sl (a),
(5)
s2(a), ... , sk(a).
Take s = tk(k + 1) + 1. Then, from n > s!s\ we have
Therefore there are at least two different sets a1> a2, ... , as such that (5) takes the same values. Since these two sets are not permutations of each other, it follows that N(k) ~ s, and the theorem is proved. 0 We now write to represent (l) and (2). From Theorem 6.1 and the following examples, we have: Theorem 6.4. If k
~
9, then M(k)
=
N(k)
=
k
+ 1.
[0,3]1 = [1,2]1>
[1,2,6]2 = [0,4,5]2' [0,4,7,11]3 = [1,2,9,10]3, [1,2,10,14,18]4 = [0,4,8,16,17]4, [0,4,9,17,22,26]5 = [1,2,12,14,24,25]5, [0,18,27,58,64,89,101]6 = [1,13,38,44,75,84,102]6, [0,4,9,23,27,41,46,50]7 = [1,2,11,20,30,39,48,49]7, [0,24,30,83,86,133,157,181,197]8 = [1,17,41,65,112,115,168,174,198]8, [0,3083,3301,11893,23314,24186,35607,44199,44417,47500]9
= [12,2865,3519,11869,23738,23762,35631,43981,44635,47488]9' []
IS.7 The Problem of Prouhet and Tarry In this and the next sections we shall prove that
M(k) O. For this set at. ... ,ajl we can clearly set aj so that C{Jj> O. But C{Jl(al) = 1, so that the theorem is proved. D Theorem 7.3. Let at. ... , ak be a set ofpositive integers satisfying Theorem 7.2. Let Q ~ 1 and Xl," ., X k be positive integers belonging io the intervals (i= 1,2, ... ,k).
Denote by N the number of sets (Xt. ... , X k) such that X k1
+ ... + X k'k
X k1  l
+ ... + X kk l , ... , X 1 + ... + X k
509
18.7 The Problem of Prouhet and Tarry
lie in intervals with lengths O(Qk  1), O(Qk  2), ... , O(Q), O( 1)
respectively. Then N
= 0(1).
Proof Let (Xl>' .. , X k) and (X'l' ... , X~) be two sets which satisfy the conditions of the theorem. Then
Let Y i = Xi  X;. Then AllYl
+ ...
+AlkYk = O(Qkl),
so that A .. IJ
=
X~i J
+ X~i1X'. + ... + X,.ki J J J
(1 ~ i,j ~ k).
Thus
The ratio of the product of the terms of the main diagonal of the determinant IA k i + d to that of Dk in the previous theorem is clearly greater than klQkl+k2+ ... +2+l
= klQtk(kl).
Also the ratio of the absolute value of each remaining term in the expansion of IA k i+ 1,A to the corresponding absolute value term for Dk is smaller than 2tk(kl)k lQtk(kl).
We now take H
= 2tk(kl) in Theorem 7.2, so that we have
It is then easy to see that O(Qkl)
A 12 "'A lk
. . . . . . . . . . . . . . . . . .. = O(Qtk(kl». 0(1)
Ak2 ... Akk
Therefore Y l = 0(1).
510
18. Waring's Problem and the Problem of Prouhet and Tarry
Similarly we have Y2 = 0(1),
The theorem is proved.
... , Yk = 0(1).
D
Theorem 7.4. Suppose that the conditions in Theorem 7.3 are satisfied. Let A1 ;;:: 0, A2 ;;:: 0, ... ,Ak ;;:: 0. Then the number of sets (Xl> ... , X k) such that X~
+ ...
+X~, X~1
+ ...
+X~1,,,,,X1
+ ...
+Xk
lie in intervals with lengths
respectively is
Proof Since an interval with length 0(Qki+A k i+1 ) can be divided into O(Q"ki+l) intervals with lengths O(Qki), the required result follows at once from Theorem 7.3. D Now let fJ = kj(k + 1) and a1, ... , ak + 1 be a set of positive integers satisfying the conditions of Theorem 7.2 (where we have replaced k by k + 1). We suppose that (1 ~ u ~ k
+ i,
1 ~ v ~ I).
Denote by r(n1' ... ,nk) the number of solutions to the system k+ 1 I (l ~ h ~ k). y~v = nh
L L
u= 1 v= 1
We now prove the following theorems: Theorem 7.5. There exists a set of integers N 1, ... ,Nk such that
Proof The numbers of different sets (Yuv) must be 1 k+ 1 I au QPVl;;:: C2Q(k+1)(1+P+·oo+pll) 2 u=1 v=1 = C2 Q(k + 1)2(1 PI).
;:  n n
Since Inhl ~ C3Q\ the number of different sets (nh) is
511
18.8 Continuation
Therefore there must be a set of integers r(N1>"', N k ) >c:;..
C2
N1>' .• , Nk
such that
Q(k+ 1)2(l_pl)_tk(k+ 1).
D
C4
Theorem 7.6. The number of solutions to the system k+ 1
I
L L y~v = Nh
(l ,,;;, h ,,;;, k
+ I)
u= 1 v= 1
is at most
Proof From k+1
k+1
I
L y~l = Nh  L L y~v
(l,,;;,h,,;;,k+l)
and (l ,,;;, u";;' k
+ I,
1 ,,;;, v ,,;;, I),
we see that 1 + ... + yk+ 1 Yk+ 11 k+1,l' yk11 + ... + ykk+1,l''''' Y 11 + ... + y k+1,l
lie in intervals with lengths
respectively. We take A. u = ufJ  (u  I)
~
0 in Theorem 7.4. Then, from
k+1
L:
{ufJ  (u  I)} = tfJ(k + I)(k + 2)  tk(k + I) = t k,
we see that the number of sets (Y11,'" ,Yk+ l,d is O(QkI2). Corresponding to each fixed set (Y11,' .. ,Yk+ 1,1) the sums k+ 1 y 12 + " ' + yk+1,2 Yk12+ 1 + " ' + yk+1,2'"''
clearly lie in intervals with lengths O(Q(k+ 1)P2),
O(Qk P2 ),
... ,
O(QP 2)
respectively. Replacing Q by QP in Theorem 7.4, we see that the number of different sets Y12"" ,Yk+1,2 is O(QkPI 2). Continuing this way the theorem is proved. D
18.8 Continuation Theorem 8.1. Denote by W(k,j) the least integer s such that the system (l ,,;;, h ,,;;, k),
512
18. Waring's Problem and the Problem of Prouhet and Tarry s
L ~/1"# L X~q+1, i= 1
(p "# q, 1 ~ p, q ~j)
i= 1
is soluble in integers. Then
([ IOg~(k++D + 2)J
+ I)
W(k,j)'; (k
IOg(1
) I .
Proof This theorem is an immediate consequence of the following theorem.
Theorem 8.2. Let
([ IOg~(k++D + 2)J
, ;;. (k
+ I)
)
log (I
I
.
Then, given any j, there are integers
such that the system
(l
~
h
~
k),
is soluble. Proof Let r(Nb' .. , N h ) be as defined in the previous section. By Theorem 7.5 there are N 1 , ••• , Nh such that
Corresponding to a set of solutions (Yuv) to the system k+ 1
I
L L Y:v =
Nh
u= 1 v= 1
there is clearly a number M such that k+ 1
I
L L y~:l =
M.
u=l v=l
If such an M has only e ( ~ j  1) different values, say M b M 2, .•. ,Me, then, by Theorem 7.6, the number of solutions to the esystem
513
Notes
is at most cseQtk(k+ l)(lP'). From the definition of Mi the number of solutions to this esystem is at least r(Nb' .. , N k ). On the other hand, if we take
~
I > {lOg (k
+ 2) flOg (1 + ~)} ,
then, for large Q, we have
giving a contradiction. Our theorem is proved.
D
Notes 18.1. Concerning the value of g(k) in Waring's problem there is the following result: When k > 6 and
we have
(See Hua [30].) Moreover K. Mahler [41J proved that there exists a constant ko such that the above inequality holds whenever k > k o. Unfortunately the method which is based on Roth's theorem is ineffective in the sense that it does not allow us to make a computation for the value of k o. J. R. Chen [18J proved that g(5) = 37. R. Balasubramanian proved that 19 ~ g(4) ~ 21 (see [5J). 18.2. I. M. Vinogradov [61J has improved on his own result on G(k) in Waring's problem: For sufficiently large k we have G(k) < k(210gk
+ 410glogk + 2 log log logk + 13).
Chapter 19. Schnirelmann Density
19.1 The Definition of Density and its History The purpose of this chapter is to prove the following two important results: "There exists a positive integer c such that every positive integer is the sum of at most c primes." "Let k be any positive integer. Then there exists a positive integer Ck (depending only on k) such that every positive integer is the sum of at most Ck kth powers." These two rf':::;Jlts are obviously related to the Goldbach problem and the Waring problem. Indeed we can even say that these two results are the most fundamental first steps towards these two famous problems. We shall call them the GoldbachSchnirelmann theorem and the WaringHilbert theorem respectively. In this chapter we introduce the notion of density created by Schnirelmann. This notion is extremely elementary, and yet it allows us to establish the two historic results. Our proof of the GoldbachSchnirelmann theorem differs slightly from Schnirelmann's original proof in that we replace the application of Brun's sieve method by Selberg's sieve method. Again our proof of the WaringHilbert theorem is not the original proof due to Hilbert, nor that due to Schnirelmann. We shall give instead the proof by Linnik, given in 1943, with some simplifications and modifications. In both these proofs the notion of Schnirelmann density occupies an important place. The definition of density is as follows: Definition 1. Let ~ denote a set of (distinct) nonnegative integers a. Denote by A(n) the number of positive integers in ~ which do not exceed n; that is A(n)
L
= 1
1.
~a~n
Suppose that there exists a positive number IX such that A(n) ;;:: IXn for every positive integer n. Then we say that the set ~ has positive'density, or that ~ is a positive density set. The greatest IX with this property is then called the density of ~. Obviously we have the following simple properties: (i) Since A(n) ~ n, it follows that IX ~ 1. (ii) If IX = 1, then A(n) = n for all n and so ~ must include all the positive integers. Exercise. Let't ;;:: 1. Determine the density of the set 1 + ['ten  I)J, n = 1,2, ....
515
19.2 The Sum of Sets and its Density
19.2 The Sum of Sets and its Density We now introduce the symbols m, b, B(n), {3 and (t, c, C(n), y. The definitions for them are analogous to those for~, a, A(n), oc: thatisbEm, B(n) = L1':;b':;n 1, and{3 is the density of the positive density set m. Definition. The set of integers of the form a + b (aE~, b Em) is called the sum of the sets ~, m, and is denoted by (t. We also write ~ + m = (t. Theorem 2.1. Let 0 E ~ and (t
= ~
+ m.
Then y ;;:: oc
+ {3 
oc{3.
Proof Since {3 > 0 we see that 1 Em. The following three types of numbers are positive integers in (t; they are all different and are at most n. (i) In m we write b 1 = 1, bz , ... ,bB(n), the numbers being arranged in increasing order. Since 0 E ~ we see that b1, bz, ... ,bB(n) are members of (t, and that there are B(n) such members. (ii) Corresponding to any v where 1 ~ v ~ B(n)  1, the various numbers a + bv , with a E ~ and 1 ~ a ~ bv + 1  bv  1, are distinct positive integers not exceeding n in the set (t. This is because
and
so that
It is clear that the two types of numbers in (i) and (ii) are mutually distinct. For . each fixed v (l ~ v ~ B(n)  1), there are A(b v + 1  bv  1) such numbers a + bv in (t. (iii) For a E~, 1 ~ a ~ n  bB(n), the numbers a + bB(n) are distinct positive integers not exceeding n in the set (t. Since a + bB(n) ;;:: 1 + bB(n) we see that these numbers of type (iii) are different from those in types (i) and (ii), and there are A(n  bB(n» such numbers a + bB(n). From the results of (i), (ii) and (iii) we have B(n)l C(n) ;;:: B(n) + L A(b v + 1  bv  1) + A(n  bB(n» v=l
B(n)l ;;:: B(n) + L oc(b v + 1  bv

1)
+ oc(n 
bB(n»
v= 1
= B(n) + oc{bB(n)  b1  (B(n)  1) + n  bB(n)}
= B(n) + oc{n
 B(n)} ;;:: (l  oc){3n
= n(oc + (3  OC(3),
+ ocn
516
19. Schnire1mann Density
and hence C(n) 
n
~
0(
+ 13  0(13,
y
~
0(
+ 13  0(13· D
Note: This theorem is not the best concerning the density of the sum of sets. The sharpest result should be y ~ min (1, 0( + 13), a theorem proved by Mann in 1942. The proof of Mann's theorem is more complicated, and since there is no fundamental improvement concerned with the applications to the principal results in this chapter, we do not include it in this book. Let us now take ~ and ~ both to be sets of positive integers congruent to 1 mod q, and assume also that 0 E~. Then ~ + ~ include all the positive integers congruent to 1, 2 mod q. Obviously the densities of ~ and ~ are l/q while the density of ~ + ~ is 2/q. Therefore the result of Mann cannot be improved.
Theorem 2.2. Let 0 E ~ and 0( + 13 contains all the positive integers.
~
1. Then y = 1; that is the set
log2 log(l 
, rx)
so that (l 
rx)'o
12 :::;:;
(1 
log2 rx)Iog(l~)
=e
log2 log(l~)
·Iog(l~)
1 = 2'
and hence
rxso /2 ;;:: 1  (1  rx)'o/2 ;;:: 1 
t = t.
Since 0 E ~so/2 the set ~so = ~so/2 + ~so/2 must, QY Theorem 2.2, include all the positive integers and therefore every positive integer is expressible as the sum of So members of ~. 0 Theorem 2.4. Let ~* be a collection of nonnegative integers, with multiplicity of membership being allowed. Let ~ be the largest set from ~* without multiplicity of membership. Let rea) denote the multiplicity of a in ~. Suppose that
1 n holds for all n ;;:: 1. Then
_C="'=~L"':":'"
~
n,r.,....(a_))_'
r2(a)
~ a'
has a positive density
(> 0),
rx ;;::
rx/.
Proof From the BunyakovskySchwarz inequality (Theorem 18.7.1) we have
C"'~"'n Y: ;: "'~"'n rea)
1
r2(a) 1
"'~"'n 12 = A(n) "'~"'n r2(a), 1
518
19. Schnirelmann Density
so that A(n) 1( n ~ ~
L
r(a)
)2/,
1 ~a~n
The theorem is proved.
L
r2(a) ~
(X'.
1 ~a~n
D
19.3 The GoldbachSchnirelmann Theorem In §§3  5, the letters Ch §§3  5 is to prove
C2, ...
denote absolute positive constants. The purpose of
Theorem 3.1. There exists a positive integer c such that every integer greater than 1 is the sum of at most c prime numbers. We define m:* to be the collection of numbers 1 together with Pl + P2 where Ph P2 run through all the prime numbers. Note that members of m:* may have multiplicity. We also define m: to be the largest set from m:* without multiplicity of membership. In order to prove Theorem 3.1 it suffices to prove Theorem 3.2. m: has positive density
Cl.
By Theorem 2.3 any positive integer m is expressible as a sum of at most s_0 members of 𝔄 (that is, a sum of terms equal to 1 or of the form p_1 + p_2). This implies that m is the sum of at most 2s_0 numbers, each of which is a prime or 1. Therefore, for any n > 2, we have
$$n = 2 + (n-2) = 2 + b\cdot 1 + \sum p,$$
where the number of primes p being summed is at most 2s_0 − b. Since 2 + b is expressible as a sum of at most b + 1 primes, it follows that n is expressible as a sum of at most 2s_0 + 1 primes. Therefore Theorem 3.1 follows from Theorem 3.2.

We now let r(1) = 1 and let r(a) be the multiplicity of a in the collection 𝔄*, that is
$$r(a) = \begin{cases} 1, & \text{if } a = 1,\\ \sum_{p_1+p_2=a} 1, & \text{if } a \ge 2.\end{cases}$$
Following Theorem 2.4, our aim is to find a lower bound for $\sum_{1\le a\le n} r(a)$ and an upper bound for $\sum_{1\le a\le n} r^2(a)$.
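The function r(a) just defined is easy to tabulate, which makes the two sums appearing in Theorem 2.4 concrete. The sketch below assumes, as the definition indicates, that p_1 + p_2 ranges over ordered pairs of primes; `goldbach_r` and `theorem_2_4_ratio` are hypothetical helper names, and the computation illustrates the quantities rather than proving any bound.

```python
from fractions import Fraction


def primes_upto(n):
    """Sieve of Eratosthenes: all primes p <= n, in increasing order."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if sieve[i]]


def goldbach_r(N):
    """r[1] = 1 and, for 2 <= a <= N, r[a] = number of ordered prime pairs
    (p1, p2) with p1 + p2 = a."""
    r = [0] * (N + 1)
    if N >= 1:
        r[1] = 1
    ps = primes_upto(N)
    for p1 in ps:
        for p2 in ps:
            if p1 + p2 > N:
                break  # primes are increasing, so later p2 are too large
            r[p1 + p2] += 1
    return r


def theorem_2_4_ratio(r, n):
    """(sum_{a<=n} r(a))^2 / (n * sum_{a<=n} r(a)^2), the quantity in Theorem 2.4."""
    s1 = sum(r[1 : n + 1])
    s2 = sum(v * v for v in r[1 : n + 1])
    return Fraction(s1 * s1, n * s2)
```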
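$$\lambda_k g(k) = \frac{\mu(k)}{s f(k)}\sum_{\substack{1\le m\le\xi/k\\ (m,k)=1}}\frac{\mu^2(m)}{f(m)} = \sum_{\substack{1\le m\le\xi/k\\ (m,k)=1}}\frac{\mu(mk)\mu(m)}{s f(mk)} = \sum_{1\le m\le\xi/k}\frac{\mu(mk)\mu(m)}{s f(mk)},$$
and hence, by Theorem 6.3.2,
$$\frac{\mu(m)}{s f(m)} = \sum_{1\le k\le\xi/m}\lambda_{km}g(km) = \sum_{\substack{1\le r\le\xi\\ m\mid r}}\lambda_r g(r).$$
Therefore, by (7) and (8), we have
$$Q = \sum_{1\le d\le\xi} f(d)\Big\{\frac{\mu(d)}{s f(d)}\Big\}^2 = \frac{1}{s^2}\sum_{1\le d\le\xi}\frac{\mu^2(d)}{f(d)} = \frac{s}{s^2} = \frac{1}{s}.$$
The required result follows from (6) and (8). □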
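The identity Q = 1/s can be confirmed in exact arithmetic for a concrete choice of g. The sketch below assumes g(d) = 1/d, so that f = μ * (1/g) is Euler's function φ and g([d_1, d_2]) = (d_1, d_2)/(d_1 d_2); it builds λ_k from the first formula above and evaluates the quadratic form Q = Σ λ_{d_1} λ_{d_2} g([d_1, d_2]) over d_1, d_2 ≤ ξ. The helper names are hypothetical.

```python
from fractions import Fraction
from math import gcd


def mobius_and_phi(N):
    """Lists mu[0..N] and phi[0..N] (Moebius function, Euler's function) by sieving."""
    mu = [1] * (N + 1)
    phi = list(range(N + 1))
    is_prime = [True] * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            for m in range(2 * p, N + 1, p):
                is_prime[m] = False
            for m in range(p, N + 1, p):
                mu[m] = -mu[m]
                phi[m] -= phi[m] // p
            for m in range(p * p, N + 1, p * p):
                mu[m] = 0
    return mu, phi


def selberg_check(xi):
    """With g(d) = 1/d (so f = phi), set
    lambda_k g(k) = mu(k)/(s f(k)) * sum_{m <= xi/k, (m,k)=1} mu^2(m)/f(m)
    and return (s, lambda, Q) with Q = sum_{d1,d2 <= xi} lambda_d1 lambda_d2 g([d1,d2])."""
    mu, f = mobius_and_phi(xi)
    s = sum(Fraction(mu[d] ** 2, f[d]) for d in range(1, xi + 1))
    lam = [Fraction(0)] * (xi + 1)
    for k in range(1, xi + 1):
        if mu[k] == 0:
            continue
        inner = sum(Fraction(mu[m] ** 2, f[m])
                    for m in range(1, xi // k + 1) if gcd(m, k) == 1)
        lam[k] = Fraction(mu[k]) * inner * k / (s * f[k])  # divide by g(k) = 1/k
    Q = sum(lam[d1] * lam[d2] * Fraction(gcd(d1, d2), d1 * d2)  # g([d1, d2])
            for d1 in range(1, xi + 1) for d2 in range(1, xi + 1))
    return s, lam, Q
```

The normalisation λ_1 = 1 comes out automatically, since for k = 1 the inner sum is s itself.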
Theorem 4.3. Let the conditions in Theorem 4.2 hold. If g_1(n) is a completely multiplicative function, and g_1(p) = g(p), then
$$N_\xi \le \frac{M}{\sum_{1\le k\le\xi} g_1(k)}.$$
We first establish the following:
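Theorem 4.4. Let f(n) be a completely multiplicative function satisfying 0 ≤ f(p) < 1. If β_n ≥ 0, then
$$\sum_{1\le n\le\xi}\beta_n f(n)\prod_{p\mid kn}\{1-f(p)\}^{-1} \ge \sum_{1\le n\le\xi} f(n)\sum_{\substack{m\mid n\\ p\mid n/m\,\Rightarrow\, p\mid km}}\beta_m,$$
where p | n/m ⇒ p | km means that n/m has only the prime divisors of km.

Proof.
$$\begin{aligned}
\sum_{1\le n\le\xi}\beta_n f(n)\prod_{p\mid kn}\{1-f(p)\}^{-1}
&= \sum_{1\le n\le\xi}\beta_n f(n)\prod_{p\mid kn}\sum_{m=0}^{\infty}\{f(p)\}^m
 = \sum_{1\le n\le\xi}\beta_n f(n)\sum_{\substack{r=1\\ p\mid r\,\Rightarrow\, p\mid kn}}^{\infty} f(r)\\
&= \sum_{1\le n\le\xi}\beta_n\sum_{\substack{r=1\\ p\mid r\,\Rightarrow\, p\mid kn}}^{\infty} f(nr)
 = \sum_{s=1}^{\infty} f(s)\sum_{\substack{1\le n\le\xi,\ n\mid s\\ p\mid s/n\,\Rightarrow\, p\mid kn}}\beta_n\\
&\ge \sum_{1\le s\le\xi} f(s)\sum_{\substack{n\mid s\\ p\mid s/n\,\Rightarrow\, p\mid kn}}\beta_n. \qquad\square
\end{aligned}$$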
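Proof of Theorem 4.3. We have, by (4),
$$f(p) = \frac{\mu(1)}{g(p)} + \frac{\mu(p)}{g(1)} = \frac{1}{g(p)} - 1 = \frac{1-g(p)}{g(p)}.$$
If k is squarefree, then, by Theorem 6.2.2,
$$\frac{\mu^2(k)}{f(k)} = \mu^2(k)\prod_{p\mid k}\frac{g_1(p)}{1-g_1(p)} = \mu^2(k)\,\frac{\prod_{p\mid k} g_1(p)}{\prod_{p\mid k}\{1-g_1(p)\}} = \mu^2(k)\,g_1(k)\prod_{p\mid k}\{1-g_1(p)\}^{-1}. \tag{9}$$
The above still holds when k = 1 and when k has square divisors. Therefore, by Theorem 4.4 (with f replaced by g_1, β_n = μ²(n) and k = 1),
$$\sum_{1\le k\le\xi}\frac{\mu^2(k)}{f(k)} = \sum_{1\le k\le\xi}\mu^2(k)\,g_1(k)\prod_{p\mid k}\{1-g_1(p)\}^{-1} \ge \sum_{1\le k\le\xi} g_1(k)\sum_{\substack{m\mid k\\ p\mid k/m\,\Rightarrow\, p\mid m}}\mu^2(m).$$
Let d_k be the greatest squarefree divisor of k, so that d_k | k. If p | k/d_k, then p | k and so p | d_k. Therefore d_k is a number satisfying the condition on m, and μ²(d_k) = 1, so that
$$\sum_{1\le k\le\xi}\frac{\mu^2(k)}{f(k)} \ge \sum_{1\le k\le\xi} g_1(k). \tag{10}$$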
From (9) we see that μ²(k)/f(k) ≥ 0, and so, by (5) and (9), we have
$$\frac{\mu^2(k)}{f(k)} = \mu^2(k)\,g(k)\prod_{p\mid k}\{1-g(p)\}^{-1} = \mu^2(k)\,g_1(k)\prod_{p\mid k}\{1-g_1(p)\}^{-1}.$$
When k = 1 or k is squarefree, g(k) = g_1(k), and if k is not squarefree, μ(k) = 0; therefore the above holds for all k. The theorem now follows from (10) and Theorem 4.2. □
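Inequality (10) can be watched at work in the simplest case. The sketch below assumes g(p) = g_1(p) = 1/p with g_1(k) = 1/k completely multiplicative; then the left side of (10), computed through formula (9), equals Σ μ²(k)/φ(k), and the right side is the harmonic sum. The helper names are hypothetical.

```python
from fractions import Fraction


def prime_factors(n):
    """Distinct prime factors of n >= 1 by trial division."""
    ps, d = [], 2
    while d * d <= n:
        if n % d == 0:
            ps.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        ps.append(n)
    return ps


def is_squarefree(n):
    d = 2
    while d * d <= n:
        if n % (d * d) == 0:
            return False
        d += 1
    return True


def s_and_harmonic(xi):
    """Return (s, H) for g1(k) = 1/k:
    s = sum_{k <= xi} mu^2(k) g1(k) prod_{p | k} (1 - g1(p))^{-1}   (formula (9)),
    H = sum_{k <= xi} g1(k)                                      (right side of (10))."""
    s = Fraction(0)
    for k in range(1, xi + 1):
        if is_squarefree(k):
            term = Fraction(1, k)              # g1(k) = 1/k
            for p in prime_factors(k):
                term *= Fraction(p, p - 1)     # (1 - 1/p)^{-1}
            s += term
    h = sum(Fraction(1, k) for k in range(1, xi + 1))
    return s, h
```

For instance, with ξ = 6 the squarefree k are 1, 2, 3, 5, 6, and the terms are 1, 1, 1/2, 1/4, 1/2, which are exactly 1/φ(k).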
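Theorem 4.5. Let A ≥ 0, M ≥ 3, and denote by π(A; M) the number of primes between A and A + M. Then
$$\pi(A;M) \le \frac{2M}{\log M}\Big\{1 + O\Big(\frac{\log\log M}{\log M}\Big)\Big\}.$$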
The implied constant here is independent of A and M.

Proof. Let
L
=
S(A;M)
1,
A+JMk2)
6k 1 . 6k 2
1 a 112(k) ~'21'""+ 36~6. C7 log ~ kla k
L
We take ~
= a1o,
D
and the theorem follows from (1).
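Proof of Theorem 3.4. When n ≥ 2, we have
$$\sum_{1\le a\le n} r^2(a) \le 1 + \sum_{4\le a\le n}\frac{c_8\,a^2}{\log^4 a}\sum_{k_1\mid a}\frac{\mu^2(k_1)}{k_1}\sum_{k_2\mid a}\frac{\mu^2(k_2)}{k_2} \le 1 + \frac{c_9\,n^2}{\log^4 n}\sum_{1\le k_1,k_2\le n}\frac{\mu^2(k_1)\mu^2(k_2)}{k_1k_2}\sum_{\substack{1\le a\le n\\ [k_1,k_2]\mid a}} 1.$$
The number of a ≤ n divisible by [k_1, k_2] = k_1k_2/(k_1, k_2) is at most n(k_1, k_2)/(k_1k_2). Since (k_1, k_2) ≤ min{k_1, k_2} ≤ √(k_1k_2), it follows that
$$\sum_{1\le a\le n} r^2(a) \le 1 + \frac{c_9\,n^3}{\log^4 n}\sum_{1\le k_1,k_2\le n}\frac{(k_1,k_2)}{(k_1k_2)^2} \le 1 + \frac{c_9\,n^3}{\log^4 n}\Big(\sum_{k=1}^{\infty}\frac{1}{k^{3/2}}\Big)^2 \le c_{10}\,\frac{n^3}{\log^4 n}.$$
The theorem is proved. □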
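The last display uses two elementary facts about k_1, k_2: the count of multiples of [k_1, k_2] up to n is at most n(k_1, k_2)/(k_1k_2), and (k_1, k_2)² ≤ k_1k_2, which together bound each term by n(k_1k_2)^{-3/2}. The sketch below checks both facts exhaustively on a small range; `check_lcm_count` and `zeta_three_halves_partial` are hypothetical names, and the partial sums only illustrate the convergence of Σ k^{-3/2} that the proof needs.

```python
from math import gcd


def check_lcm_count(n, K):
    """For 1 <= k1, k2 <= K check:
    #{1 <= a <= n : [k1, k2] | a} <= n (k1, k2) / (k1 k2), and (k1, k2)^2 <= k1 k2."""
    for k1 in range(1, K + 1):
        for k2 in range(1, K + 1):
            g = gcd(k1, k2)
            lcm = k1 * k2 // g
            if (n // lcm) * k1 * k2 > n * g:
                return False
            if g * g > k1 * k2:
                return False
    return True


def zeta_three_halves_partial(M):
    """Partial sum of sum_{k >= 1} k^(-3/2)."""
    return sum(k ** -1.5 for k in range(1, M + 1))
```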
Exercise 1. Let x, k, l be positive integers, and (l, k) = 1. Denote by π(x; k, l) the number of primes in the arithmetic progression kn + l (n = 1, 2, ...) not exceeding x, and let 0 < δ < 1. Prove that, for k < x^δ,
$$\pi(x;k,l) \le \frac{2x}{\varphi(k)\log(x/k)}\Big(1 + O\Big(\frac{\log\log x}{\log x}\Big)\Big),$$
where the implied constant depends at most on δ.
Exercise 2. When p, p + 2 are both primes, we call them a pair of "prime twins". Denote by Z_2(N) the number of pairs of "prime twins" not exceeding N. Prove that
$$Z_2(N) \le c\,\frac{N}{\log^2 N},$$
and that the series
$$\sum_{p^*}\frac{1}{p^*},$$
where the summation is over all "prime twins" p*, is convergent.
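Exercise 2 can be explored numerically, although a finite computation proves neither the bound nor the convergence. The sketch below assumes that a pair (p, p + 2) is counted in Z_2(N) when p + 2 ≤ N, and sums 1/p over the distinct primes belonging to some twin pair; `twin_prime_data` is a hypothetical helper.

```python
def twin_prime_data(N):
    """Return (Z2(N), sum of 1/p over distinct twin-prime members p <= N), N >= 2.

    Assumption: (p, p + 2) is counted when p + 2 <= N."""
    sieve = bytearray([1]) * (N + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(N ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, N + 1, i)))
    count = 0
    members = set()
    for p in range(2, N - 1):          # p <= N - 2, so p + 2 <= N
        if sieve[p] and sieve[p + 2]:
            count += 1
            members.update((p, p + 2))
    return count, sum(1.0 / p for p in members)
```

For N = 100 the pairs counted are (3, 5), (5, 7), (11, 13), (17, 19), (29, 31), (41, 43), (59, 61), (71, 73).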
19.6 The Waring-Hilbert Theorem

In §§6-7 the letters c, c_1, c_2, ... denote positive constants depending only on k. The constants implied by the O-symbol also depend at most on k. The purpose of §§6-7 is to prove

Theorem 6.1 (Hilbert). Corresponding to each positive integer k there exists a positive integer c such that every positive integer is the sum of at most c kth powers.

We now define 𝔄_q* to be the collection of integers x_1^k + ⋯ + x_q^k, where each x_m runs over all the nonnegative integers. We define 𝔄_q to be the largest set of distinct elements from 𝔄_q*. Let

The proof of Theorem 6.1 is divided into a chain of steps:

Theorem 6.2. If k ≥ 2, then 𝔄_q has positive density.

We see that Theorem 6.1 can be deduced at once from Theorem 2.3 and Theorem 6.2. We define r(a) to be the number of solutions to x_1^k + ⋯ + x_q^k = a in nonnegative integers x_1, ..., x_q.
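We first prove:

Theorem 6.3. If n ≥ 1, then
$$\sum_{1\le a\le n} r(a) \ge c_2(k)\,n^{q/k}.$$

Proof. Clearly we can assume that n > c_1. Then
$$\sum_{1\le a\le n} r(a) = -1 + \sum_{0\le a\le n} r(a)$$

We also note that, for any integer h,
$$\int_0^1 e^{2\pi i h\alpha}\,d\alpha = \begin{cases} 1, & \text{if } h = 0,\\ 0, & \text{if } h \ne 0.\end{cases}$$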
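From Theorem 6.5 we have
$$\sum_{1\le a\le n} r^2(a) \le \sum_{0\le a\le qP^k}\Big(\sum_{\substack{x_1^k+\cdots+x_q^k=a\\ 0\le x_i\le P,\ 1\le i\le q}} 1\Big)^2 = \int_0^1\Big|\sum_{x_1=0}^{P}\cdots\sum_{x_q=0}^{P} e^{2\pi i(x_1^k+\cdots+x_q^k)\alpha}\Big|^2 d\alpha = \int_0^1\Big|\sum_{x=0}^{P} e^{2\pi i x^k\alpha}\Big|^{2q} d\alpha \le c_3(k)\,P^{2q-k} \le c_4(k)\,n^{2q/k-1},$$
giving Theorem 6.4. Our aim therefore is to prove Theorem 6.5.

Exercise. Deduce Theorem 6.5 from Theorem 6.4.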
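The equality between the sum of r²(a) and the 2q-th moment integral comes from the orthogonality relation stated above. The sketch below checks it for small parameters, replacing the integral by an average over M equally spaced points, which is exact once M exceeds every difference of two values of x_1^k + ⋯ + x_q^k; the function names and the choice M = qP^k + 1 are assumptions of this illustration.

```python
import cmath
from collections import Counter
from itertools import product


def moment_by_counting(k, q, P):
    """sum_a r(a)^2 with r(a) = #{(x_1..x_q) in [0, P]^q : x_1^k + ... + x_q^k = a}."""
    r = Counter(sum(x ** k for x in xs) for xs in product(range(P + 1), repeat=q))
    return sum(c * c for c in r.values())


def moment_by_fourier(k, q, P):
    """Discrete analogue of int_0^1 |sum_{x=0}^P e^{2 pi i x^k alpha}|^(2q) d alpha.
    Averaging over M = q P^k + 1 points is exact: (1/M) sum_t e^{2 pi i m t / M}
    is 1 for m = 0 and 0 for 0 < |m| < M."""
    M = q * P ** k + 1
    total = 0.0
    for t in range(M):
        S = sum(cmath.exp(2j * cmath.pi * (x ** k) * t / M) for x in range(P + 1))
        total += abs(S) ** (2 * q)
    return round(total / M)
```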
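19.7 The Proof of the Waring-Hilbert Theorem

Theorem 7.1. Let X, Y ≥ 1, let n be an integer, and let q(n) denote the number of integer solutions to
$$x_1y_1 + x_2y_2 = n \qquad (|x_m| \le X,\ |y_m| \le Y,\ m = 1, 2). \tag{1}$$
Then
$$q(n) \le \begin{cases} 27X^{3/2}Y^{3/2}, & \text{if } n = 0;\\[2pt] 60XY\displaystyle\sum_{d\mid n}\frac{1}{d}, & \text{if } n \ne 0.\end{cases} \tag{2}$$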
Proof. 1) n = 0. Here the numbers of values taken by x_1, x_2 and y_1 cannot exceed 2X + 1, 2X + 1 and 2Y + 1. When x_1, x_2, y_1 are specified, y_2 can only take one value. Therefore
$$q(0) \le (2X+1)^2(2Y+1) \le (3X)^2(3Y) = 27X^2Y,$$
and similarly q(0) ≤ 27XY², and hence
$$q(0) \le \min(27X^2Y,\ 27XY^2) \le \sqrt{27X^2Y\cdot 27XY^2} = 27X^{3/2}Y^{3/2}.$$
2) n ≠ 0. We can assume without loss that X ≤ Y. Let q_1(n) be the number of integer solutions to
$$x_1y_1 + x_2y_2 = n \qquad ((x_1, x_2) = 1,\ |x_2| \le |x_1| \le X,\ |y_m| \le Y,\ m = 1, 2). \tag{3}$$
Clearly x_1 ≠ 0, since otherwise x_2 = 0, giving n = 0 and contradicting our present hypothesis. Next, for a fixed pair x_1, x_2 with (x_1, x_2) = 1, |x_2| ≤ |x_1| ≤ X, we denote by q_2(n; x_1, x_2) the number of integer solutions in y_1, y_2 of (3). From Theorem 1.8.2 we see that (3) is soluble, and if y_1', y_2' is a solution, then all the solutions are given by
$$y_1 = y_1' + x_2t, \qquad y_2 = y_2' - x_1t, \qquad t \text{ an integer}.$$
Therefore
$$|t| = \frac{|y_2' - y_2|}{|x_1|} \le \frac{Y + Y}{|x_1|} = \frac{2Y}{|x_1|},$$
and hence the number of values taken by t does not exceed
$$2\cdot\frac{2Y}{|x_1|} + 1 = \frac{4Y + |x_1|}{|x_1|} \le \frac{4Y + X}{|x_1|} \le \frac{5Y}{|x_1|},$$
that is, q_2(n; x_1, x_2) ≤ 5Y/|x_1|. Therefore
$$q_1(n) \le \sum_{1\le|x_1|\le X}\ \sum_{|x_2|\le|x_1|}\frac{5Y}{|x_1|} \le \sum_{1\le|x_1|\le X}\frac{5Y(2|x_1|+1)}{|x_1|} \le 5Y\cdot 3\cdot 2X = 30XY.$$
It follows that, with the condition (x_1, x_2) = 1, the number of solutions to (1) does not exceed 2 · 30XY = 60XY (the factor 2 accounts for the solutions with |x_1| ≤ |x_2|). Next, if (x_1, x_2) = d ≠ 1, then d | n, and we let x_1' = x_1/d, x_2' = x_2/d, so that we now seek the number of integer solutions to
$$x_1'y_1 + x_2'y_2 = \frac{n}{d} \qquad ((x_1', x_2') = 1,\ |x_m'| \le X/d,\ |y_m| \le Y,\ m = 1, 2),$$
and we see from the above that this number does not exceed 60(X/d)Y. Therefore, when n ≠ 0,
$$q(n) \le 60XY\sum_{d\mid n}\frac{1}{d}.$$
The proof of the theorem is complete. □
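For n ≠ 0 the bound (2) can be compared with a brute-force count on a small box. The sketch below only checks the case n ≠ 0, for a few small values of X and Y chosen for illustration; `q_count` and `bound_nonzero` are hypothetical helpers, and divisors of a negative n are taken to be those of |n|.

```python
from fractions import Fraction


def q_count(n, X, Y):
    """Number of integer (x1, x2, y1, y2) with x1*y1 + x2*y2 == n, |x_m| <= X, |y_m| <= Y."""
    xs, ys = range(-X, X + 1), range(-Y, Y + 1)
    return sum(1 for x1 in xs for x2 in xs for y1 in ys for y2 in ys
               if x1 * y1 + x2 * y2 == n)


def bound_nonzero(n, X, Y):
    """Right-hand side of (2) for n != 0: 60 X Y sum_{d | n} 1/d."""
    m = abs(n)
    return 60 * X * Y * sum(Fraction(1, d) for d in range(1, m + 1) if m % d == 0)
```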
Theorem 6.5 is obviously a consequence of the following

Theorem 7.2. Let k ≥ 2, and let f(x) be a polynomial of degree k with integer-valued coefficients:

Then
(4)
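Proof. When k = 2, the left hand side of (4) is the number of integer solutions to
$$\sum_{m=1}^{4} f(x_m) = \sum_{m=1}^{4} f(y_m) \qquad (x_m = O(P),\ y_m = O(P),\ 1 \le m \le 4), \tag{5}$$
where f(x) = a_2x² + a_1x + a_0 with a_2 = O(1), a_1 = O(P). Let x_i − y_i = z_i, a_2(x_i + y_i) + a_1 = w_i (1 ≤ i ≤ 4), so that f(x_i) − f(y_i) = z_iw_i. We see that the number of solutions to (5) does not exceed the number of integer solutions to
$$z_1w_1 + z_2w_2 + z_3w_3 + z_4w_4 = 0 \qquad (z_i = O(P),\ w_i = O(P),\ 1 \le i \le 4). \tag{6}$$
If we denote by q(n) the number of integer solutions to
$$z_1w_1 + z_2w_2 = n \qquad (z_m = O(P),\ w_m = O(P),\ m = 1, 2),$$
where the constants implied by the O-symbol are the same as those of (6), then, since q(−n) = q(n), the number of solutions to (6) is $\sum_{|n|\le c_6P^2} q^2(n)$. From Theorem 7.1, we have
$$\sum_{|n|\le c_6P^2} q^2(n) = O(P^6) + O\Big(P^4\sum_{1\le n\le c_6P^2}\ \sum_{d_1\mid n}\ \sum_{d_2\mid n}\frac{1}{d_1d_2}\Big) = O(P^6) + O\Big(P^4\sum_{1\le d_1,d_2\le c_6P^2}\frac{1}{d_1d_2}\sum_{\substack{1\le n\le c_6P^2\\ [d_1,d_2]\mid n}} 1\Big) = O(P^6),$$
where the last step uses $\sum_{n\le c_6P^2,\ [d_1,d_2]\mid n} 1 \le c_6P^2(d_1,d_2)/(d_1d_2) \le c_6P^2(d_1d_2)^{-1/2}$ and the convergence of $\sum_k k^{-3/2}$,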
and the required result (4) follows. Suppose now that k ≥ 3. We proceed by mathematical induction, and assume as induction hypothesis that the theorem holds when k is replaced by k − 1. From
$$\Big|\sum_{x=0}^{P} e^{2\pi i f(x)\alpha}\Big|^2 = \sum_{x=0}^{P} e^{2\pi i f(x)\alpha}\sum_{-x\le h\le P-x} e^{-2\pi i f(x+h)\alpha}$$

"> 2^n" by "≥ 2^n", provided that we also replace "R must contain a nonzero lattice point" by "there must be a nonzero lattice point which lies in R or on its boundary". □
We can make the result sharper in the following sense.

Theorem 2.3. Denote by Q the midpoint of the line joining the origin O to the point P on the convex body R. As P runs over the points of R, the point Q describes a convex body which we denote by R_{1/2}. Under the hypothesis of Theorem 2.2 we may strengthen the conclusion by asserting that the lattice point concerned lies outside R_{1/2}.

Proof. Denote by δ the greatest distance between O and a boundary point of R. Take the integer N satisfying 2^{N−1} ≤ δ < 2^N, so that the distance between O and any boundary point of R_{2^{−N}} is less than 1 (here R_λ denotes the image of R under the dilation x ↦ λx). Since R_{2^{−N}} has no nonzero lattice point, the lattice point in Theorem 2.2 must lie outside R_{2^{−N}}. Therefore there exists an integer m (0 ≤ m < N) with the property that inside or on the boundary of R_{2^{−m}}, but outside R_{2^{−m−1}}, there is a lattice point (x_1, ..., x_n). Now the lattice point
$$(2^m x_1, \ldots, 2^m x_n)$$
lies inside or on the boundary of R but outside R_{1/2}. □
20. The Geometry of Numbers
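20.3 Linear Forms

Let a_{rs} be real numbers, with the determinant
$$\Delta = |a_{rs}| \ne 0,$$
and let
$$\xi_r = a_{r1}x_1 + a_{r2}x_2 + \cdots + a_{rn}x_n, \qquad r = 1, 2, \ldots, n. \tag{1}$$
Take R to be the region
$$|\xi_1| < \lambda_1,\ \ldots,\ |\xi_n| < \lambda_n.$$
This is a convex body symmetrical about the origin, and its volume is given by
$$\int\!\!\cdots\!\!\int_{|\xi_1|<\lambda_1,\ldots,|\xi_n|<\lambda_n} dx_1\,dx_2\cdots dx_n = \int\!\!\cdots\!\!\int_{|\xi_1|<\lambda_1,\ldots,|\xi_n|<\lambda_n}\Big|\frac{\partial(x_1,x_2,\ldots,x_n)}{\partial(\xi_1,\xi_2,\ldots,\xi_n)}\Big|\,d\xi_1\,d\xi_2\cdots d\xi_n = \frac{2^n\lambda_1\lambda_2\cdots\lambda_n}{|\Delta|}.$$
Therefore, by Minkowski's theorem, if λ_1λ_2⋯λ_n > |Δ|, then R contains a nonzero lattice point, and if λ_1λ_2⋯λ_n ≥ |Δ|, then there is a nonzero lattice point in R or on its boundary. Therefore:

Theorem 3.1. Let ξ_1, ..., ξ_n be n real linear forms in x_1, ..., x_n with determinant Δ. Let λ_1, ..., λ_n be positive numbers satisfying λ_1λ_2⋯λ_n ≥ |Δ|. Then there exist integers x_1, x_2, ..., x_n, not all zero, such that
$$|\xi_1| \le \lambda_1,\ \ldots,\ |\xi_n| \le \lambda_n.$$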
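Theorem 3.2. The conclusion of Theorem 3.1 can be strengthened to the following: there exist integers x_1, x_2, ..., x_n, not all zero, such that
$$|\xi_1| \le \lambda_1, \quad |\xi_2| < \lambda_2,\ \ldots,\ |\xi_n| < \lambda_n.$$

Proof. Let ε > 0. By Theorem 3.1 there are integers x_1, ..., x_n, not all zero, such that
$$|\xi_1| \le (1+\varepsilon)^{n-1}\lambda_1, \quad |\xi_2| \le \frac{\lambda_2}{1+\varepsilon} < \lambda_2,\ \ldots,\ |\xi_n| \le \frac{\lambda_n}{1+\varepsilon} < \lambda_n.$$
Now let ε → 0. From the discrete nature of integral points the theorem is proved. □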
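Theorem 3.1 can be illustrated by a direct search. The sketch below assumes the two forms ξ_1 = x_1 + √2 x_2, ξ_2 = x_1 − √2 x_2 (so |Δ| = 2√2) and takes λ_1 = λ_2 = (2√2)^{1/2}, so that λ_1λ_2 = |Δ|; the theorem then guarantees a nonzero integer point with |ξ_1| ≤ λ_1, |ξ_2| ≤ λ_2. `small_lattice_point` is a hypothetical helper, and the search box is an assumption that happens to be large enough here.

```python
import itertools
import math


def small_lattice_point(rows, lambdas, box):
    """Search a nonzero integer vector x with |xi_r(x)| <= lambda_r for all r, where
    xi_r(x) = sum_s rows[r][s] * x[s]; only |x_s| <= box is searched."""
    n = len(rows)
    for x in itertools.product(range(-box, box + 1), repeat=n):
        if any(x) and all(
            abs(sum(a * c for a, c in zip(row, x))) <= lam + 1e-12
            for row, lam in zip(rows, lambdas)
        ):
            return x
    return None


r2 = math.sqrt(2)
rows = [[1.0, r2], [1.0, -r2]]          # determinant -2*sqrt(2)
lam = math.sqrt(2 * r2)                 # lambda_1 = lambda_2, product = |Delta|
point = small_lattice_point(rows, [lam, lam], 3)
```

The small tolerance 1e-12 allows for points lying exactly on the boundary, which Theorem 3.1 permits.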
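If we replace n by n + 1, and take
$$\xi_\nu = x_\nu,\quad \lambda_\nu = t^{1/n}\ (1 \le \nu \le n), \qquad \xi_{n+1} = \alpha_1x_1 + \cdots + \alpha_nx_n - y,\quad \lambda_{n+1} = \frac{1}{t},$$
then, from Theorem 3.2, we have:

Theorem 3.3. Let α_1, ..., α_n be real numbers. There are always integers x_1, ..., x_n and y, not all 0, such that
$$|\alpha_1x_1 + \cdots + \alpha_nx_n - y| \le \frac{1}{t}$$
and |x_ν| ≤ t^{1/n} (1 ≤ ν ≤ n), where t is any positive number. □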
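Theorem 3.3 can be checked by exhaustive search in the box |x_ν| ≤ t^{1/n}, taking y to be the nearest integer to α_1x_1 + ⋯ + α_nx_n. When t > 1 a solution with all x_ν = 0 would force |y| ≤ 1/t < 1, hence y = 0, so the search may skip x = 0. The values α = (√2, √3) and t = 1000 are assumptions of this sketch, and `dirichlet_linear_form` is a hypothetical name.

```python
import itertools
import math


def dirichlet_linear_form(alphas, t):
    """Return (x, y, err) with x != 0, |x_nu| <= t^(1/n), y = round(alpha . x) and
    err = |alpha . x - y| <= 1/t, choosing the smallest err found; None if none."""
    n = len(alphas)
    box = math.floor(t ** (1.0 / n) + 1e-12)
    best = None
    for x in itertools.product(range(-box, box + 1), repeat=n):
        if not any(x):
            continue
        v = sum(a * xi for a, xi in zip(alphas, x))
        y = round(v)
        err = abs(v - y)
        if err <= 1.0 / t + 1e-12 and (best is None or err < best[2]):
            best = (x, y, err)
    return best
```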
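Again, if we take
$$\xi_\nu = \alpha_\nu x - y_\nu,\quad \lambda_\nu = t^{-1/n}\ (1 \le \nu \le n), \qquad \xi_{n+1} = x,\quad \lambda_{n+1} = t,$$
then we have:

Theorem 3.4. Let α_1, ..., α_n be real numbers and t > 0. Then there is a lattice point (x, y_1, y_2, ..., y_n), not all 0, such that
$$|\alpha_\nu x - y_\nu| \le t^{-1/n}\ (1 \le \nu \le n), \qquad |x| \le t.$$

..., x_n, not all 0, such that
$$\xi_1^2 + \cdots + \xi_n^2 \le 4\Big(\frac{|\Delta|}{J_n}\Big)^{2/n},$$
where
$$J_n = \frac{\pi^{n/2}}{\Gamma\big(\frac{n}{2}+1\big)}.$$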
We can rewrite Theorem 4.1 differently. A positive definite quadratic form
$$Q(x_1,\ldots,x_n) = \sum_{r=1}^{n}\sum_{s=1}^{n} a_{rs}x_rx_s, \qquad a_{rs} = a_{sr},$$
can be represented by ξ_1² + ⋯ + ξ_n². The determinant Δ of ξ_1, ..., ξ_n is, in absolute value, equal to the square root of D = |a_{rs}|. This is because A = (a_{rs}) is a positive definite matrix, so that there exists a matrix B such that A = BB′, and |Δ| = |det B| = D^{1/2}. Therefore Theorem 4.1 can be stated as follows:

Theorem 4.2. Let Q(x_1, ..., x_n) be a positive definite quadratic form with determinant D. Then there exists a nonzero lattice point (x_1, ..., x_n) such that
$$Q(x_1,\ldots,x_n) \le \frac{4}{J_n^{2/n}}\,D^{1/n}. \tag{3}$$

Let γ_n be the least constant with the following property: for every positive definite quadratic form Q with determinant D, there exists a nonzero lattice point such that
$$Q(x_1,\ldots,x_n) \le \gamma_n D^{1/n}.$$
In §1 we already remarked that γ_2 = 2/√3. Up to the present mathematicians have only determined the values of γ_n for 2 ≤ n ≤ 8:
$$\gamma_3 = 2^{1/3},\quad \gamma_4 = \sqrt2,\quad \gamma_5 = 8^{1/5},\quad \gamma_6 = (64/3)^{1/6},\quad \gamma_7 = 64^{1/7},\quad \gamma_8 = 2.$$
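The value γ_2 = 2/√3 can be compared with the minima of a few binary forms Q = ax² + 2bxy + cy², whose determinant is D = ac − b². The sketch below takes the minimum over a small box of nonzero integer points, which is enough for the reduced-looking forms chosen here as examples; `min_binary_form` is a hypothetical helper, and x² + xy + y² is the form for which equality holds.

```python
import math


def min_binary_form(a, b, c, box=10):
    """Least value of Q(x, y) = a x^2 + 2 b x y + c y^2 over nonzero integer (x, y)
    with |x|, |y| <= box."""
    return min(a * x * x + 2 * b * x * y + c * y * y
               for x in range(-box, box + 1) for y in range(-box, box + 1)
               if (x, y) != (0, 0))


GAMMA_2 = 2 / math.sqrt(3)
```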
In general, we know that γ_n
3 is unsolved.

We now discuss the product of linear forms. We shall use the following result, known as the arithmetic-geometric means inequality.

Theorem 5.2. If a_1 ≥ 0, ..., a_n ≥ 0, then
$$(a_1a_2\cdots a_n)^{1/n} \le \frac{a_1 + a_2 + \cdots + a_n}{n}.$$
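Proof. 1) n = 2^k. We use induction on k. Since
$$(a_1a_2)^{1/2} \le \frac{a_1 + a_2}{2} \qquad \big(\text{equivalently } (\sqrt{a_1} - \sqrt{a_2})^2 \ge 0\big),$$
we see that the result holds when k = 1. Assume now that the result holds when n = 2^{k−1}. Then when n = 2^k we have
$$\begin{aligned}
(a_1\cdots a_{2^k})^{1/2^k} &= \big\{(a_1\cdots a_{2^{k-1}})^{1/2^{k-1}}(a_{2^{k-1}+1}\cdots a_{2^k})^{1/2^{k-1}}\big\}^{1/2}\\
&\le \Big\{\frac{a_1 + \cdots + a_{2^{k-1}}}{2^{k-1}}\cdot\frac{a_{2^{k-1}+1} + \cdots + a_{2^k}}{2^{k-1}}\Big\}^{1/2} \le \frac{a_1 + \cdots + a_{2^k}}{2^k}.
\end{aligned}$$
2) (Backward induction.) We now show that if the result holds for n + 1, then it holds for n. Take
$$a_{n+1} = \frac{a_1 + \cdots + a_n}{n}.$$
Then, from our induction hypothesis, we have
$$\Big\{a_1\cdots a_n\,\frac{a_1 + \cdots + a_n}{n}\Big\}^{1/(n+1)} = (a_1\cdots a_{n+1})^{1/(n+1)} \le \frac{a_1 + \cdots + a_{n+1}}{n+1} = \frac{1}{n+1}\Big\{a_1 + \cdots + a_n + \frac{1}{n}(a_1 + \cdots + a_n)\Big\} = \frac{a_1 + \cdots + a_n}{n},$$
which gives
$$(a_1\cdots a_n)^{1/n} \le \frac{a_1 + \cdots + a_n}{n}.$$
The theorem is proved. □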
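The two steps of the proof can be mirrored in code: for lengths that are powers of 2 the geometric mean is built by repeated halving, and any list is brought to such a length by appending copies of its arithmetic mean, which does not change that mean. This is a sketch of the proof's structure with hypothetical helper names, not a new proof.

```python
import math


def gm_by_halving(a):
    """Geometric mean of a list whose length is a power of 2, computed as in step 1):
    GM(a) = sqrt(GM(left half) * GM(right half))."""
    if len(a) == 1:
        return a[0]
    h = len(a) // 2
    return math.sqrt(gm_by_halving(a[:h]) * gm_by_halving(a[h:]))


def pad_with_mean(a):
    """Append copies of the arithmetic mean of a until the length is a power of 2;
    the arithmetic mean is unchanged, which is the idea behind step 2)."""
    m = sum(a) / len(a)
    size = 1
    while size < len(a):
        size *= 2
    return list(a) + [m] * (size - len(a))
```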
From Theorem 5.1 and Theorem 5.2 we have at once:

Theorem 5.3. There exists a nonzero lattice point such that
$$|\xi_1\cdots\xi_n| \le \frac{n!}{n^n}\,|\Delta|.$$

Note. We can also deduce from Theorem 3.1 that there is a nonzero lattice point such that
$$|\xi_1\cdots\xi_n| \le |\Delta|.$$
Since n! < n^n whenever n > 1, our Theorem 5.3 here gives a better result. Denote by γ_n′ the least positive constant such that, whenever γ ≥ γ_n′, there is a nonzero lattice point satisfying
$$|\xi_1\cdots\xi_n| \le \gamma\,|\Delta|.$$
Up to the present we only know that γ_2′ = 1/√5 and γ_3′ = 1/7 (Davenport).
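20.6 Method of Simultaneous Approximations

Theorem 6.1. Let α_1, ..., α_n be real numbers. Then there exist integers x_1, ..., x_n and an integer y ≥ 1 such that
$$\Big|\alpha_i - \frac{x_i}{y}\Big| \le \frac{n}{(n+1)\,y^{1+1/n}}, \qquad i = 1, 2, \ldots, n.$$

Proof. Let t > n + 1. We first consider the region
$$|x_i - \alpha_iy| + \Big|\frac{y}{t}\Big| \le \tau, \qquad 1 \le i \le n.$$
This is a convex body symmetrical about the origin, and, writing ξ_i = x_i − α_iy (1 ≤ i ≤ n) and ξ_{n+1} = y/t, its volume is given by
$$\int\!\!\cdots\!\!\int dx_1\cdots dx_n\,dy = |t|\int\!\!\cdots\!\!\int_{|\xi_i|+|\xi_{n+1}|\le\tau,\ i=1,\ldots,n} d\xi_1\cdots d\xi_n\,d\xi_{n+1} = 2^{n+1}|t|\int\!\!\cdots\!\!\int_{\substack{\xi_i+\xi_{n+1}\le\tau,\ i=1,\ldots,n\\ \xi_i\ge0,\ \xi_{n+1}\ge0}} d\xi_1\cdots d\xi_{n+1} = \frac{2^{n+1}|t|}{n+1}\,\tau^{n+1}.$$
Taking τ = ((n+1)/t)^{1/(n+1)} makes this volume equal to 2^{n+1}. Therefore there is a nonzero lattice point (x_1, ..., x_n, y) such that
$$|x_i - \alpha_iy| + \Big|\frac{y}{t}\Big| \le \Big(\frac{n+1}{t}\Big)^{1/(n+1)}, \qquad i = 1, \ldots, n.$$
Since t > n + 1, the right-hand side is less than 1; if y = 0 this would force every x_i = 0, so y ≠ 0, and replacing (x_1, ..., x_n, y) by its negative if necessary we may take y ≥ 1. From Theorem 5.2, applied to n copies of |x_i − α_iy|/n and to |y/t|, we have
$$\Big\{\Big(\frac{|x_i - \alpha_iy|}{n}\Big)^n\frac{|y|}{t}\Big\}^{1/(n+1)} \le \frac{1}{n+1}\Big(|x_i - \alpha_iy| + \Big|\frac{y}{t}\Big|\Big) \le \frac{1}{n+1}\Big(\frac{n+1}{t}\Big)^{1/(n+1)}, \qquad i = 1, \ldots, n,$$
that is, |x_i − α_iy|^n |y| ≤ (n/(n+1))^n. Hence
$$\Big|\alpha_i - \frac{x_i}{y}\Big| \le \frac{n}{(n+1)\,y^{1+1/n}}, \qquad i = 1, 2, \ldots, n. \qquad\square$$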
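The proof gives more than the statement: for each t > n + 1 there is y with 1 ≤ y ≤ t·τ lying in the convex body, and such a y is not trivially small. The sketch below searches for a y ≥ 1 with |x_i − α_iy| + y/t ≤ τ, taking x_i as the nearest integer to α_iy (which minimises |x_i − α_iy|), and then checks the final inequality of Theorem 6.1. The values α = (√2, √3), t = 10 000 and the name `minkowski_point` are assumptions of this illustration.

```python
import math


def minkowski_point(alphas, t):
    """Find y >= 1 and integers x_i with |x_i - alpha_i y| + y/t <= tau for all i,
    where tau = ((n + 1)/t)^(1/(n + 1)); any such y satisfies y <= t * tau."""
    n = len(alphas)
    tau = ((n + 1) / t) ** (1 / (n + 1))
    for y in range(1, math.floor(t * tau + 1e-9) + 1):
        xs = [round(a * y) for a in alphas]
        if all(abs(x - a * y) + y / t <= tau + 1e-12 for a, x in zip(alphas, xs)):
            return y, xs
    return None
```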
This theorem is a slight improvement on Theorem 3.4. The best results at the present are: (Minkowski),
n+ 1 { 1 + (nl)n+3}1/n
cn~
n+1
n
(Blichfeldt).
Exercise. Let α_ν = β_ν + iγ_ν (ν = 1, ..., n) be complex numbers. Then there are complex integers z_1, ..., z_n, w such that
20.7 Minkowski's Inequality

For a_i ≥ 0 (i = 1, ..., n), r > 0, we define
$$M_r(a) = \Big\{\frac{1}{n}(a_1^r + \cdots + a_n^r)\Big\}^{1/r}. \tag{1}$$
When r < 0 and some a_i = 0, the equation (1) has no meaning. In this case, we define (a_1^r + ⋯ + a_n^r)^{1/r} = 0. Therefore, when a_i ≥ 0, r ≠ 0, we can always define
$$M_r(a) = \Big\{\frac{1}{n}(a_1^r + \cdots + a_n^r)\Big\}^{1/r}.$$
From now on we denote a_i ≥ 0 (i = 1, ..., n) by (a). We write (a) > 0 to mean a_i > 0 (i = 1, ..., n), and (a) ≠ 0 to mean that not all the a_i are zero. We also denote by max a and min a the largest and the smallest of the numbers a_i respectively. If there are nonzero real numbers λ, μ such that λa_i = μb_i (i = 1, ..., n), then we say that (a) and (b) are proportional.

Theorem 7.1. $\lim_{r\to+\infty} M_r(a) = \max a.$

Proof. We can suppose that r > 0, so that
$$\Big\{\frac{1}{n}(\max a)^r\Big\}^{1/r} \le M_r(a) \le \{(\max a)^r\}^{1/r},$$
or
$$\Big(\frac{1}{n}\Big)^{1/r}\max a \le M_r(a) \le \max a.$$
Since
$$\lim_{r\to+\infty}\Big(\frac{1}{n}\Big)^{1/r} = \Big(\frac{1}{n}\Big)^0 = 1,$$
we have lim_{r→+∞} M_r(a) = max a. □
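Theorem 7.2. $\lim_{r\to-\infty} M_r(a) = \min a.$

Proof. We can suppose that r < 0. We first consider the case (a) > 0. Writing 1/a for (1/a_1, ..., 1/a_n), we have
$$M_r(a) = \Big\{\frac{1}{n}\sum_{i=1}^{n}\Big(\frac{1}{a_i}\Big)^{-r}\Big\}^{1/r} = \frac{1}{M_{-r}(1/a)},$$
so that, by Theorem 7.1,
$$\lim_{r\to-\infty} M_r(a) = \frac{1}{\lim_{r\to+\infty} M_r(1/a)} = \frac{1}{\max(1/a)} = \min a.$$
Finally, when some a_i = 0 and r < 0, we see that both M_r(a) and min a are zero. The theorem is proved. □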
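The limits in Theorems 7.1 and 7.2, and the geometric-mean limit of Theorem 7.3 stated next, are easy to observe numerically. The sketch below assumes all a_i > 0 and rescales by max a (for r > 0) or min a (for r < 0) so that no power overflows; `power_mean` is a hypothetical helper, and the sample list [1, 2, 4] is an arbitrary choice.

```python
import math


def power_mean(a, r):
    """M_r(a) = {(1/n)(a_1^r + ... + a_n^r)}^(1/r) for positive a_i and r != 0.
    The values are divided by max(a) (r > 0) or min(a) (r < 0) before raising to
    the power r, and the scale factor is restored at the end."""
    if r == 0:
        raise ValueError("r must be nonzero")
    c = max(a) if r > 0 else min(a)
    s = sum((x / c) ** r for x in a) / len(a)
    return c * s ** (1.0 / r)
```

For a = [1, 2, 4], M_r(a) approaches 4 as r → +∞, 1 as r → −∞, and the geometric mean 2 as r → 0.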
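We write G(a) = (a_1a_2⋯a_n)^{1/n} for the geometric mean of the a_i.

Theorem 7.3. $\lim_{r\to0} M_r(a) = G(a).$

Proof. 1) r < 0, and some a_i = 0. This case is trivial, since then M_r(a) = 0 = G(a).
2) r ≠ 0, (a) > 0. From (1) we have
$$M_r(a) = \Big\{\frac{1}{n}(a_1^r + \cdots + a_n^r)\Big\}^{1/r} = \exp\Big\{\frac{1}{r}\log\frac{a_1^r + \cdots + a_n^r}{n}\Big\}.$$
We now let r → 0 and apply L'Hospital's rule, giving
$$\lim_{r\to0}\log\Big\{\frac{1}{n}(a_1^r + \cdots + a_n^r)\Big\}^{1/r} = \lim_{r\to0}\frac{\frac{1}{n}\sum_{i=1}^{n} a_i^r\log a_i}{\frac{1}{n}(a_1^r + \cdots + a_n^r)} = \frac{1}{n}\sum_{i=1}^{n}\log a_i.$$
Therefore
$$\lim_{r\to0} M_r(a) = \lim_{r\to0}\exp\Big\{\frac{1}{r}\log\frac{a_1^r + \cdots + a_n^r}{n}\Big\} = \exp\Big\{\frac{1}{n}\sum_{i=1}^{n}\log a_i\Big\} = G(a).$$
3) r > 0, and some a_i = 0. We can assume that a_1 > 0, ..., a_s > 0, a_{s+1} = a_{s+2} = ⋯ = a_n = 0, s < n. Then we have
$$M_r(a) = \Big\{\frac{1}{n}(a_1^r + \cdots + a_s^r)\Big\}^{1/r} = \Big(\frac{s}{n}\Big)^{1/r}\Big\{\frac{1}{s}(a_1^r + \cdots + a_s^r)\Big\}^{1/r}.$$
From our earlier result we have
$$\lim_{r\to+0}\Big\{\frac{1}{s}(a_1^r + \cdots + a_s^r)\Big\}^{1/r} = (a_1\cdots a_s)^{1/s},$$
and, since s < n,
$$\lim_{r\to+0}\Big(\frac{s}{n}\Big)^{1/r} = 0.$$
Therefore
$$\lim_{r\to+0} M_r(a) = \lim_{r\to+0}\Big\{\Big(\frac{s}{n}\Big)^{1/r}\Big\{\frac{1}{s}(a_1^r + \cdots + a_s^r)\Big\}^{1/r}\Big\} = 0 = G(a). \qquad\square$$

Lemma 1. Let α + β = 1, α > 0, β > 0. Then for s ≥ 0, t ≥ 0, we have
$$s^\alpha t^\beta \le s\alpha + t\beta,$$
with equality only when s = t.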
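Proof. The lemma is trivial if s = t or if one of s, t is 0. We assume therefore that s, t are distinct positive numbers. If s > t, then s/t > 1. Also, 0 < α < 1 and 1 − α = β, so that
$$\Big(\frac{s}{t}\Big)^\alpha - 1 = \alpha\int_1^{s/t} y^{\alpha-1}\,dy \le \alpha\int_1^{s/t} dy = \alpha\Big(\frac{s}{t} - 1\Big).$$
Multiplying by t, we have
$$s^\alpha t^\beta \le s\alpha + t\beta.$$
(If s < t, the same argument with the roles of (s, α) and (t, β) interchanged applies.) Finally, if s^αt^β = sα + tβ, then
$$\alpha\int_1^{s/t} y^{\alpha-1}\,dy = \alpha\int_1^{s/t} dy, \qquad\text{or}\qquad \alpha\int_1^{s/t}(y^{\alpha-1} - 1)\,dy = 0,$$
which is impossible unless s = t, since y^{α−1} < 1 for y > 1. □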
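Lemma 2 (Hölder's inequality). Let α + β = 1, α > 0, β > 0. When (a) and (b) are not proportional, we have
$$\sum_{i=1}^{n} a_i^\alpha b_i^\beta < \Big(\sum_{i=1}^{n} a_i\Big)^\alpha\Big(\sum_{i=1}^{n} b_i\Big)^\beta.$$

Proof. Since (a) and (b) are not proportional, there exists i (1 ≤ i ≤ n) such that
$$\frac{a_i}{\sum_{j=1}^{n} a_j} \ne \frac{b_i}{\sum_{j=1}^{n} b_j}.$$
Therefore, by Lemma 1,
$$\frac{\sum_{i=1}^{n} a_i^\alpha b_i^\beta}{\big(\sum_{j=1}^{n} a_j\big)^\alpha\big(\sum_{j=1}^{n} b_j\big)^\beta} = \sum_{i=1}^{n}\Big(\frac{a_i}{\sum_j a_j}\Big)^\alpha\Big(\frac{b_i}{\sum_j b_j}\Big)^\beta < \sum_{i=1}^{n}\Big(\alpha\,\frac{a_i}{\sum_j a_j} + \beta\,\frac{b_i}{\sum_j b_j}\Big) = \alpha + \beta = 1. \qquad\square$$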
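Lemmas 1 and 2 can be spot-checked numerically. The sketch below evaluates the gap sα + tβ − s^αt^β of Lemma 1 and both sides of Hölder's inequality for a proportional and a non-proportional pair of vectors; the helper names and the sample numbers are assumptions of this illustration.

```python
def weighted_amgm_gap(s, t, alpha):
    """Lemma 1: s*alpha + t*beta - s^alpha * t^beta with beta = 1 - alpha; it is >= 0
    for s, t >= 0 and 0 < alpha < 1, and vanishes only when s == t."""
    beta = 1.0 - alpha
    return s * alpha + t * beta - s ** alpha * t ** beta


def holder_sides(a, b, alpha):
    """Lemma 2: return (sum a_i^alpha b_i^beta, (sum a_i)^alpha (sum b_i)^beta)."""
    beta = 1.0 - alpha
    lhs = sum(x ** alpha * y ** beta for x, y in zip(a, b))
    rhs = sum(a) ** alpha * sum(b) ** beta
    return lhs, rhs
```

For proportional vectors the two sides agree up to rounding; for non-proportional ones the inequality is strict, as Lemma 2 asserts.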
(k < 1).    (3)

Proof. 1) k > 1. Here k′ = k/(k − 1) > 1, 0 < 1/k < 1, 0 < 1/k′ < 1, 1/k + 1/k′ = 1. By Lemma 2 we have
$$\sum_{i=1}^{n} a_ib_i = \sum_{i=1}^{n}(a_i^k)^{1/k}(b_i^{k'})^{1/k'}$$

0. The theorem is
Exercise 1. Prove that we can always select an integer α from the ideal 𝔞 such that
$$|N(\alpha)| \le M\,N(\mathfrak a).$$

Exercise 2. Prove that, given any ideal class, there is an ideal 𝔞 in it satisfying
$$N(\mathfrak a) \le M.$$

20.11 The Least Value for |Δ|
We saw in the previous section that the discriminant Δ of an algebraic number field of degree n satisfies
$$|\Delta| \ge \Big(\frac{\pi}{4}\Big)^{2r_2}\Big(\frac{n^n}{n!}\Big)^2.$$
Moreover, from Δ ≡ 0 or 1 (mod 4), and (−1)^{r_2}Δ > 0, we can construct the following table:

Table (I)
           r_2 = 0      r_2 = 1      r_2 = 2
  n = 2    Δ ≥ 4        Δ ≤ −3
  n = 3    Δ ≥ 21       Δ ≤ −15
  n = 4    Δ ≥ 116      Δ ≤ −71      Δ ≥ 44
  n = 5    Δ ≥ 680      Δ ≤ −419     Δ ≥ 260

But actually the least value for |Δ| can be calculated to give

Table (II)
           r_2 = 0      r_2 = 1      r_2 = 2
  n = 2    Δ = 5        Δ = −3
  n = 3    Δ = 49       Δ = −23
  n = 4    Δ = 725      Δ = −275     Δ = 117
The case n = 2 in Table (II) follows at once from considering the quadratic fields R(√5) and R(√−3). When n = 3, if θ satisfies x³ + x² − 2x − 1 = 0, then the discriminant of R(θ) is 49, and if θ satisfies x³ − x − 1 = 0, then the discriminant of R(θ) is −23. When n = 4, we let θ be a root of

The following can then be proved: 1) when a = 7, p = 29, we have r_2 = 0, Δ = 725; 2) when a = 3, p = 11, we have r_2 = 1, Δ = −275; 3) when a = −1, p = 13, we have r_2 = 2, Δ = 117.
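Table (I) is a mechanical consequence of the Minkowski bound together with the congruence and sign conditions, so it can be regenerated. The sketch below takes the bound |Δ| ≥ (π/4)^{2r_2}(n^n/n!)², rounds up, and then moves outward until the signed value Δ = (−1)^{r_2}|Δ| is congruent to 0 or 1 mod 4; `table_I_bound` is a hypothetical helper name.

```python
import math


def table_I_bound(n, r2):
    """Extreme admissible discriminant allowed by |Delta| >= (pi/4)^(2 r2) (n^n/n!)^2,
    Delta = 0 or 1 (mod 4) and sign(Delta) = (-1)^r2."""
    bound = (math.pi / 4) ** (2 * r2) * (n ** n / math.factorial(n)) ** 2
    m = math.ceil(bound)
    sign = -1 if r2 % 2 else 1
    while (sign * m) % 4 not in (0, 1):   # Python's % gives a result in 0..3
        m += 1
    return sign * m


EXPECTED = {(2, 0): 4, (3, 0): 21, (4, 0): 116, (5, 0): 680,
            (2, 1): -3, (3, 1): -15, (4, 1): -71, (5, 1): -419,
            (4, 2): 44, (5, 2): 260}
```

Comparing these bounds with Table (II) shows how far the Minkowski estimate is from the true minima, for example 116 against 725 when n = 4, r_2 = 0.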
The actual construction of Table (II) presents a problem. The case n = 2 in the table is very easily settled. When n ≥ 3, the proof of Theorem 10.3 gives us a method whereby after a "finite number" of calculations we can arrive at the results given in Table (II). However, in actual practice, this method requires the calculation of the roots of about one thousand polynomial equations and the determination of the discriminants of the corresponding algebraic number fields. In order to solve this concrete problem we need a practical method. We now examine the situation when n = 3. Suppose that the cubic field R(θ) in our discussion has discriminant Δ which satisfies 0 < Δ ≤ 49 (r_2 = 0), or −23 ≤ Δ < 0 (r_2 = 1). From §10 we see that there is a nonzero integer α in this field such that
(1)
and

The degree of α is either 3 or 1. Suppose that the degree of α is definitely 3, so that α cannot be a rational integer and hence R(θ) = R(α). From the inequality (1) we can determine a bound for the coefficients of the equations satisfied by α, and the eventual result can be obtained after a finite number of calculations. Unfortunately we have no way of ensuring that α is not a rational integer. On the contrary, from r > 3, we see that α = ±1 do satisfy (1), and ±1 belong to R(θ); therefore this method is not applicable. Let ρ > 3 and consider the convex body B:
$$|\xi_1| + |\xi_2| + |\xi_3| \le \rho, \qquad |\xi_1 + \xi_2 + \xi_3| < 3\ (<\rho),$$
where

and ω_1, ω_2, ω_3 is an integral basis for R(θ). It is easy to see that B is a convex body symmetrical about the origin. Denote by F(t) the area of the intersection between the convex body A:
$$|\xi_1| + |\xi_2| + |\xi_3| \le \rho$$
and the plane ξ_1 + ξ_2 + ξ_3 = t. Then F(t) = F(−t), and when t ≥ 0, F(t) is decreasing. Therefore
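$$\text{Volume of } B = 2\int_0^3 F(t)\,dt = 2\cdot\frac{3}{\rho}\int_0^\rho F\Big(\frac{3u}{\rho}\Big)du \ge 2\cdot\frac{3}{\rho}\int_0^\rho F(u)\,du = \frac{3}{\rho}\times\text{Volume of } A.$$
But
$$\text{Volume of } A = \frac{2^3\rho^3}{3!\,\sqrt{|\Delta|}}\Big(\frac{\pi}{4}\Big)^{r_2} = \begin{cases}\dfrac{4\rho^3}{3\sqrt{\Delta}}, & \text{when } r_2 = 0;\\[6pt] \dfrac{\pi\rho^3}{3\sqrt{|\Delta|}}, & \text{when } r_2 = 1.\end{cases}$$
Therefore, by Minkowski's theorem, for a suitable choice of ρ (one for which (3/ρ) × Volume of A > 2³), there is a nonzero integer α in R(θ) satisfying
$$|\alpha^{(1)}| + |\alpha^{(2)}| + |\alpha^{(3)}| \le \rho \tag{2}$$
and
$$|\alpha^{(1)} + \alpha^{(2)} + \alpha^{(3)}| < 3. \tag{3}$$
Now we see from (3) that α certainly cannot be a rational integer: if α = m were a rational integer, then |3m| < 3 would force m = 0. Therefore α has degree 3 and R(θ) = R(α). Let the irreducible equation satisfied by α be
$$x^3 - g_1x^2 + g_2x - g_3 = 0. \tag{4}$$
Then g_3 ≠ 0, and we can assume that g_3 > 0. For, if otherwise, since −α satisfies the equation
$$x^3 + g_1x^2 + g_2x + g_3 = 0,$$
R(θ) = R(α) = R(−α), and −α also satisfies (2) and (3), we can replace g_3 by −g_3. From the relationship between the roots and the coefficients we have
$$g_1 = \alpha^{(1)} + \alpha^{(2)} + \alpha^{(3)}, \qquad g_3 = \alpha^{(1)}\alpha^{(2)}\alpha^{(3)},$$
so that |g_1| ≤ 2 and g_3 = 1. Finally we find a bound for g_2 by
$$|g_2| = |\alpha^{(1)}\alpha^{(2)} + \alpha^{(1)}\alpha^{(3)} + \alpha^{(2)}\alpha^{(3)}| \le |\alpha^{(1)}\alpha^{(2)}| + |\alpha^{(1)}\alpha^{(3)}| + |\alpha^{(2)}\alpha^{(3)}| \le \frac{(|\alpha^{(1)}| + |\alpha^{(2)}| + |\alpha^{(3)}|)^2}{3} \le \frac{\rho^2}{3}.$$

≥ 0 and α^{(1)}α^{(3)} < 0, so that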
that is, |g_2| ≤ 3. Summarizing the above, in any cubic field R(θ) with discriminant Δ satisfying 0 < Δ ≤ 49 (r_2 = 0) or −23 ≤ Δ < 0 (r_2 = 1) there is an integer α such that R(θ) = R(α), and α satisfies an irreducible equation
$$x^3 - g_1x^2 + g_2x - 1 = 0$$
with |g_1| ≤ 2, |g_2| ≤ 4 (when r_2 = 0, |g_2| ≤ 3). Therefore, in order to determine cubic fields R(θ) with discriminant Δ satisfying 0 < Δ ≤ 49 (r_2 = 0) or −23 ≤ Δ < 0 (r_2 = 1), we need only examine these irreducible equations. But the number of such equations is at most 45 (at most 35 when r_2 = 0). Moreover, when g_1 = g_2 the equation has the root 1, and when g_1 + g_2 + 2 = 0 the equation has the root −1, so that we have no need to examine these reducible equations. Finally, since the roots of x³ − g_2x² + g_1x − 1 = 0 are the reciprocals of the roots of x³ − g_1x² + g_2x − 1 = 0, and R(θ) = R(1/θ), the reciprocal equation to (4) need not be examined either. We are then left with 27 (18 when r_2 = 0) equations to be considered. We then calculate the roots θ of these 27 (or 18) equations and determine the discriminants of R(θ) to arrive at the results for n = 3 in Table (II).
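The counting argument above (45 or 35 candidate equations, reduced to 27 or 18) can be replayed mechanically. The sketch below enumerates pairs (g_1, g_2) for x³ − g_1x² + g_2x − 1, discards those with the root 1 or −1, keeps one member of each reciprocal pair (g_1, g_2) ↔ (g_2, g_1), and also evaluates the standard cubic discriminant b²c² − 4c³ − 4b³d − 27d² + 18bcd of x³ + bx² + cx + d for the two example fields. The helper names are hypothetical.

```python
def cubic_candidates(g2_bound):
    """Return (initial count, surviving (g1, g2)) for x^3 - g1 x^2 + g2 x - 1 with
    |g1| <= 2 and |g2| <= g2_bound, after removing equations with root 1
    (g1 == g2) or root -1 (g1 + g2 + 2 == 0), and keeping one equation from
    each reciprocal pair (g1, g2) <-> (g2, g1)."""
    pairs = [(g1, g2) for g1 in range(-2, 3)
             for g2 in range(-g2_bound, g2_bound + 1)]
    kept = [p for p in pairs if p[0] != p[1] and p[0] + p[1] + 2 != 0]
    kept_set = set(kept)
    survivors = [p for p in kept
                 if not ((p[1], p[0]) in kept_set and (p[1], p[0]) < p)]
    return len(pairs), survivors


def cubic_disc(b, c, d):
    """Discriminant of the polynomial x^3 + b x^2 + c x + d."""
    return b * b * c * c - 4 * c ** 3 - 4 * b ** 3 * d - 27 * d * d + 18 * b * c * d
```

Note that cubic_disc gives the discriminant of the polynomial; the field discriminant equals it divided by the square of an index, which coincide for the two examples checked below.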
Bibliography

1. Baker, A.: Linear forms in the logarithms of algebraic numbers. Mathematika 13 (1966) 204-216; (II) Mathematika 14 (1967) 102-107; (III) Mathematika 14 (1967) 220-228; (IV) Mathematika 15 (1968) 204-216
2. Baker, A.: Contribution to the theory of Diophantine equations I: On the representation of integers by binary forms. Phil. Trans. Roy. Soc. London, A 263 (1967) 273-291
3. Baker, A.: On the class number of quadratic fields. Bull. London Math. Soc. 1 (1969) 98-102
4. Baker, A.: Transcendental number theory. Cambridge University Press (1975)
5. Balasubramanian, R.: On Waring's problem: g(4) ≤ 21. Hardy-Ramanujan Journal 2 (1979) 1-32
6. Barban, M. B.: Arithmetic functions on thin sets. [Russian]. Dokl. UzSSR 8 (1961) 9-11
7. Barban, M. B.: The density of the zeros of Dirichlet L-series and the problem of the sum of primes and almost primes. [Russian]. Mat. Sbornik (N. S.) 61 (103) (1963) 418-425
8. Bombieri, E.: Sulle formule di A. Selberg generalizzate per classi di funzioni aritmetiche e le applicazioni al problema del resto nel "Primzahlsatz". Riv. Mat. Univ. Parma 2; 3 (1962) 393-440
9. Bombieri, E.: On the large sieve. Mathematika 12 (1965) 201-225
10. Bombieri, E.: Le grand crible dans la théorie analytique des nombres. Société Mathématique de France 18 (1974)
11. Bombieri, E., and Davenport, H.: Small differences between prime numbers. Proc. Roy. Soc. Ser. A, 293 (1966) 1-18
12. Burgess, D. A.: The distribution of quadratic residues and non-residues. Mathematika 4 (1957) 106-112
13. Burgess, D. A.: On character sums and primitive roots. Proc. London Math. Soc. 12 (1962) 179-192
14. Buchstab, A. A.: New results in the investigation of the Goldbach-Euler problem and the problem of prime pairs. [Russian]. Dokl. Akad. Nauk SSSR 162 (1965) 735-738 = Soviet Math. Dokl. 6 (1965) 729-732
15. Cassels, J. W. S.: An introduction to Diophantine approximation. Cambridge Tracts in Mathematics 45 (1957)
16. Chao, K.: On the diophantine equation x² = y^n + 1, xy ≠ 0. Sci. Sin. 14, 3 (1965) 457-460
17. Chen, J. R.: On the circle problem. [Chinese]. Acta Math. Sinica 13 (1963) 299-313
18. Chen, J. R.: On Waring's problem: g(5) = 37. [Chinese]. Acta Math. Sinica 14 (1964) 715-734
19. Chen, J. R.: On the representation of a large even integer as the sum of a prime and the product of at most two primes. [Chinese]. Kexue Tongbao 17 (1966) 385-386
20. Chen, J. R.: On the representation of a large even integer as the sum of a prime and the product of at most two primes. Sci. Sinica 16 (1973) 157-176
21. Diamond, H., and Steinig, G. J.: An elementary proof of the prime number theorem with a remainder term. Inventiones Math. 11 (1970) 199-258
22. Dickson, L. E.: History of the theory of numbers. (Three volumes). Carnegie Institute, Washington (1919, 1920, 1923)
23. Elliot, P. D. T. A., and Halberstam, H.: Some applications of Bombieri's theorem. Mathematika 13 (1966) 196-203
24. Estermann, T.: Introduction to modern prime number theory. Cambridge Tracts in Mathematics 41 (1952)
25. Gauss, C. F.: Disquisitiones arithmeticae. Leipzig, Fleischer (1801). English translation: A. A. Clarke, Yale University Press (1966)
26. Hagis, Jr., P.: A lower bound for the set of odd perfect numbers. Math. Comp. 27; 124 (1973) 951-953
27. Hagis, Jr., P., and McDaniel, W. L.: On the largest prime divisor of an odd perfect number, II. Math. Comp. 29 (1975) 922-924
28. Halberstam, H., and Richert, H.-E.: Sieve methods. Academic Press, London (1974)
29. Hardy, G. H., and Wright, E. M.: An introduction to the theory of numbers. 4th ed. Oxford (1960)
30. Hua, L. K.: Die Abschätzungen von Exponentialsummen und ihre Anwendung in der Zahlentheorie. Enzykl. Math. Wiss., I, 2, Heft 13, Teil I. Leipzig (1959)
31. Huxley, M. N.: On the difference between consecutive primes. Inventiones Math. 16 (1972) 191-201
32. Huxley, M. N.: Small differences between consecutive primes. Mathematika 20; 2 (1973) 229-232
33. Ingham, A. E.: The distribution of prime numbers. Cambridge Tracts in Mathematics 30 (1932)
34. Kolesnik, G. A.: The refined error term of the divisor problem. [Russian]. Mat. Zametki 6 (1969) 545-554
35. Korobov, N. M.: On the estimation of trigonometric sums and its applications. [Russian]. Uspekhi Mat. Nauk SSSR 13 (1958) 185-192
36. Landau, E.: Handbuch der Lehre von der Verteilung der Primzahlen. (2 Bände). Leipzig, Teubner (1909)
37. Landau, E.: Vorlesungen über Zahlentheorie. (3 Bände). Leipzig, Hirzel (1927)
38. Landau, E.: Über einige neuere Fortschritte der additiven Zahlentheorie. Cambridge Tracts in Mathematics 35 (1937)
39. Lavrik, A. V., and Soberov, A. S.: On the error term of the elementary proof of the prime number theorem. [Russian]. Dokl. Akad. Nauk SSSR 211 (1973) 534-536
40. Linnik, Yu. V.: The dispersion method in binary additive problems. Leningrad (1961) = Providence, R.I. (1963)
41. Mahler, K.: On the fractional parts of the powers of a rational number, II. Mathematika 4 (1957) 122-124
42. Minkowski, H.: Geometrie der Zahlen. Leipzig, Teubner (1910)
43. Minkowski, H.: Diophantine Approximation. Leipzig, Teubner (1927)
44. Montgomery, H. L.: Topics in Multiplicative Number Theory. Springer Lecture Notes 227 (1971)
45. Pan, C. T.: On the least prime in an arithmetic progression. [Chinese]. Sci. Rec., New Ser. 1 (1957) 283-286
46. Pan, C. T.: On the representation of an even integer as a sum of a prime and an almost prime. [Chinese]. Acta Math. Sinica 12 (1962) 95-106 = Chinese Math. Acta 3 (1963) 101-112
47. Pan, C. T.: On the representation of even numbers as the sum of a prime and a product of at most 4 primes. [Chinese]. Acta Sci. Natur. Univ. Shangtung 2 (1962) 40-62 = Sci. Sinica 12 (1963) 455-474. [Russian]
48. Pan, C. T., Ding, X. X., and Wang, Y.: On the representation of a large even integer as a sum of a prime and an almost prime. Kexue Tongbao 8 (1975) 358-360
49. Richert, H.-E.: Zur multiplikativen Zahlentheorie. J. reine angew. Math. 206 (1961) 31-38
50. Roth, K. F.: On the large sieves of Linnik and Rényi. Mathematika 12 (1965) 1-9
51. Schmidt, W. M.: Simultaneous approximations to algebraic numbers by rationals. Acta Math. 125 (1970) 189-201
52. Schmidt, W. M.: Diophantine Approximations. Springer Lecture Notes 785 (1980)
53. Sierpiński, W.: Elementary theory of numbers. Warszawa (1964)
54. Slowinski, D.: Searching for the 27th Mersenne prime. J. Recreational Mathematics 11 (1979) 258-261
55. Stark, H. M.: A complete determination of the complex quadratic fields of class number 1. Michigan Math. J. 14 (1967) 1-27
56. Stepanov, S. A.: On the estimation of Weyl's sums with prime denominators. [Russian]. Izv. Akad. Nauk SSSR, Ser. Mat. (1970) 1015-1037
57. Titchmarsh, E. C.: The theory of the Riemann zeta-function. Oxford (1951)
58. Vaughan, R. C.: A note on Snirel'man's approach to Goldbach's problem. Bull. London Math. Soc. 8 (1976) 245-250
59. Vinogradov, A. I.: The density hypothesis for Dirichlet L-series. [Russian]. Izv. Akad. Nauk SSSR Ser. Mat. 29 (1965) 903-934. Corrigendum: ibid. 30 (1966) 719-720
60. Vinogradov, I. M.: On a new estimation of a function ζ(1 + it). [Russian]. Izv. Akad. Nauk SSSR, Ser. Mat. 22 (1958) 161-164
61. Vinogradov, I. M.: On the problem of the upper estimation for G(m). [Russian]. Izv. Akad. Nauk SSSR, Ser. Mat. 23 (1959) 637-642
62. Wang, Y.: On the least primitive root. [Chinese]. Acta Math. Sinica 9 (1959) 432-441
63. Wang, Y.: On the estimation of character sums and its applications. [Chinese]. Sci. Record (N. S.) 7 (1964) 78-83
64. Wirsing, E.: Elementare Beweise des Primzahlsatzes mit Restglied, II. J. Reine Angew. Math. 214/215 (1964) 1-18
65. Yin, W. L.: On Dirichlet's divisor problem. Sci. Rec., New Ser. 3 (1959) 131-134
Index
Abel's lemma 120 Aequartro identica ratis abstrura 208 Algebraic number fields 425 Argument 338 Artin, E. 39 Association 431 , left 368, 382 , right 368  modulo p 62 Baker, A. 493, 565 Balasubramanian, R. 513, 565 Barban, M. B. 565 Basis 399,426 , integral 427 , standard 402 Base interchange formula 49 Bertrand's postulate 75, 82 Blichfeldt, H. F. 547 Bombieri, E. 100, 249, 565 Brun, V. 74, 514 Buchstab, A. A. 565 Burgess, D. A. 185, 337, 565
Cassels, J. W. S. 478, 493, 565 Chao JungTze 255 Chao, K. 299, 504, 565 Character 152  system 314, 445 , improper 157 , primitive 156 , principal 152 , standard factorization of 156 Chebyshev, P. L. 82 Chen, J. R. 99, 100, 147,513,565 Chowla, S. 100 Cofactor 372 , algebraic 372 Commute 372 Congruent 22, 416  modulo m 437  modulo 9Jl 402 Conjugates 425
Continued fraction 250 , complete quotient of 252 , nth convergent of 250 , periodic 260 , simple 251 Convex body 538  region 535 Coprime 5, 434 Countable 474 Cross ratio 342
Davenport, H. 100, 448, 534, 545, 565 Degree 423, 425  of III 438 Density, asymptotic 113 ,p
210
, real 210 , Schnirelmann 514 Diamond, H. 249, 565 Dickson, L. E. 276, 565 Dimension 399 Ding, X. X. 566 Diophantine equations 276 Diophantus 276 Dirichlet series 143 Dirichlet's divisor problem 147 Discriminant 300, 426  of R(8) 428 , fundamental 322 Divisor 2, 57, 430 , elementary 387 , greatest common 5, 58, 394, 434 , ideal 433 , proper 2 , right 389 Dyson, F. J. 478
Eisenstein, F. G. 39 Elliot, P. D. T. A. 101, 565 Elliptic 341 Enumerable 474 Equipotent 474 Equivalent 257, 350, 369
570
Index
  form 301
  form mod q 309
  in the narrower sense 443
Erdős, P. 217
Euclidean algorithm 5
Euclidean distance 347
Euler, L. 76
  Euler-Binet formula 290
  Euler's constant 88, 112, 483
  Euler's criterion 36
  Euler's identity 191
Extended complex plane 339
Extension, algebraic 69
  finite 424
  single 424
  φ 416
φ-convergent sequence 415
φ-limit 415
Factor, invariant 387
  repeated 63
Farey sequence 125
Fermat, P. de 288
  solution 25
  last theorem 151, 451, 488
Fibonacci sequence 252
Field 68, 424
  Euclidean 447
  simple 447
Finite order 342
Fixed point 341
Form, binary quadratic 300
  (in)definite 301
  primitive 307
  reduced 304
Franklin, F. 197
Function, arithmetic 102
  Chebyshev 217
  (completely) multiplicative 13, 102
  divisor 103, 111
  Euler 103
  generating 143
  Möbius 103
  Riemann zeta 144, 219
  slowly decreasing 226
  von Mangoldt 103
Fundamental circle 358
Fundamental region 351
Fundamental sequence 415
Furtwängler 39
Gauss, C. F. 37, 39, 47, 329, 565
Gelfond, A. O. 488
Genus 314, 445
  principal 446
Geodesic 345
Goldbach's problem 74, 99, 151, 514
Graph 195
  (self-)conjugate 196
Group 340
  abelian 68
  adjoint 391
Hagis, Jr., P. 566
Hajós 535, 542
Halberstam, H. 100, 101, 534, 566
Hardy, G. H. 101, 566
Heath-Brown, D. R. 100
Hensel, K. 405
Hensel's lemma 421
Hilbert, D. 39, 483, 488, 494, 514
Heilbronn, H. 329
Hua, L. K. 513, 566
Huxley, M. N. 100, 566
Hyperbolic 341
Ideal 58, 68, 432
  class 441
  divisor 433
  prime 434
  principal 432
  product of 432
  unit 432
Index 48
Inequality, arithmetic-geometric means 544
  Bunyakovsky-Schwarz 508
  Cauchy 330
  Hölder 550
  Minkowski 547, 553
Ingham, A. E. 566
Integer 1
  algebraic 423
  rational 423
Inverse transformation 339
Involution 342
Iwaniec, H. 100
Jarník, M. V. 123
Jacobi's symbol 44, 159
Khintchin, A. 494
Kolesnik, G. A. 147, 566
Korobov, N. M. 248, 566
Kronecker's symbol 185, 304
Kummer, E. 39, 431, 451
Kusmin 488
Lagrange interpolation formula 61
Lambert series 146
Landau, E. 566
Large sieve 100
Lattice point 40, 112
Lavrik, A. V. 566
Law of Quadratic Reciprocity 39
Lehmer, D. H. 26
Lehmer, D. N. 4
Legendre, A.-M. 39
Legendre's symbol 35, 152
Lobachevskian geometry 348, 354
Loxodromic 341
Linnik, Yu. 100, 101, 494, 514, 566
Littlewood, J. E. 73, 101
Mahler, K. 513, 566
Mann, H. B. 516
Markoff, A. A. 288
Matrix, adjoint (modular) 373, 390
  composite 389
  irreducible 389
  (positive) modular 365, 372
  (standard) prime 390
Mediant 127
Mersenne number 38, 449
Mersenne prime 450
Miller, J. C. P. 51
Minkowski, H. 544, 547, 556, 566
Möbius (inverse) transform 108
Modular transformation 257
Modulus 4
  double 64
  integral 68
Montgomery, H. L. 100, 566
Mordell, L. J. 538
Multiple 2
  (left) least common 394, 8, 59
Niven, I. 486
Norm 425
  of 𝔐 402
Normal form of Hermite 369, 384
Normal form of Smith 370, 386
Null sequence 415
Number, algebraic 423
  cardinal 474
  composite 3
  Markoff 260, 288
  Mersenne 38, 449
  perfect 13
  prime 3
  square-free 113
  triangular 191
  transcendental 476
Order 48, 68
Otto 255
Pan, C. T. 100, 566
Parabolic 342
Partition 187
  (self-)conjugate 195, 196
Period 342
Point at infinity 339
Pólya, G. 185
Polynomial, associated 57
  integral-valued 17
  (ir)reducible 20, 63
Primary solution 282
Prime in R(θ) 431
Prime modulo p 63
Prime twins 74
Primitive root 48, 49, 68
Principal class 446
Proper solutions 279
Quadratic algebraic numbers 349
Quadratic (non-)residue 35
Reduced points 351
Reduced quadratic form 358
Rényi, A. 100
Residue class 22
  modulo p, φ(x) 67
  (non-)kth power 49
  quadratic (non-) 35
Residue system, complete 22, 64
  reduced 24, 64
Richert, H. E. 100, 147, 566
Riemann hypothesis 185, 488
Riemann sphere 339
Roth, K. F. 100, 478, 566
Schmidt, W. M. 493, 566
Schneider, T. 488
Schnirelmann, L. 514
Selberg, A. 217, 514
Siegel, C. L. 329, 478
Sierpiński, W. 566
Sieve of Eratosthenes 3
Slowinski, D. 566
Soon Go 276
Squaring the circle 488
Standard factorization 3
Stark, H. M. 566
Steinhaus, H. 123
Steinig, G. J. 249, 565
Stepanov, S. A. 566
Symbol, Jacobi 44, 159
  Kronecker 185, 304
  Legendre 35, 152
Takagi, T. 39
Theorem, Bombieri 101
  Cauchy 497
  Chebyshev 73, 79, 89, 266
  Chinese remainder 22, 29
  Dedekind's discriminant 438
  Dirichlet 73, 97, 243
  Eisenstein 20
  Erdős-Fuchs 138
  Euler 24, 36, 76
  Fermat 18, 24
  Fundamental theorem of arithmetic 1, 3, 6
  Fundamental theorem for ideals 435
  Gauss 20, 37
  Hardy-Ramanujan 95
  Heilbronn-Siegel 329
  Hermite 485
  Hilbert 61, 529
  Hurwitz 256
  Ikehara 228
  Jacobi 208
  Jacobsthal 176
  Khintchin 266
  Lagrange 208
  Landau-Ostrowski-Thue 480
  Legendre 261
  Liouville 476
  Lindemann 486
  Mayer 288
  Miller 96
  Minkowski's fundamental 535, 538
  Pólya 172
  prime number 73
  Roth 478
  Selberg 233, 520
  Schur 329
  Siegel 331, 335
  Sierpiński 134
  Soon Go 286
  Stickelberger 428
  Tchebotaref 556
  Thue 479
  unique factorization 58
  Voronoi 137
  Weyl 270, 272
  Wilson 33
  Wolstenholme 33
Thue, A. 478
Titchmarsh, E. C. 566
Trace 425
Transformation 373
  (uni)modular 348, 373
Triangle 347
Turán, P. 95
Unit 424
  fundamental 441
Unit circle 338
Valuation 408, 409
  (non-)Archimedean 411
  equivalent 410
  identical 409
  p-adic 409
Vaughan, R. C. 534, 566
Vinogradov, A. I. 100, 566
Vinogradov, I. M. 74, 173, 248, 513, 567
Wang, Y. 185, 337, 566, 567
Weil, A. 185
Wheeler, D. J. 51
Wiener, N. 217
Wirsing, E. 249, 567
Wright, E. M. 566
Yin, W. L. 137, 147, 567