Essays in Constructive Mathematics

Author: Harold M. Edwards


Springer

Harold M. Edwards Courant Institute of Mathematical Sciences New York University 251 Mercer Street New York, NY 10012 USA

MSC 2000; 00B15, 03Fxx Library of Congress Cataloging-in-Publication Data Edwards, Harold M. Essays in constructive mathematics / Harold M. Edwards. p. cm. ISBN 0-387-21978-1 (alk. paper) 1. Constructive mathematics. I. Title. QA9.56.E39 2004 511.3—dc22 2004049156 ISBN 0-387-21978-1

Printed on acid-free paper.

© 2005 Harold M. Edwards All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, Inc., 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. Printed in the United States of America. 987654321

(EB)

SPIN 10985564

Springer is a part of Springer Science+Business Media springeronline.com

For Betty with love

Contents

Preface
Synopsis

1 A Fundamental Theorem
  1.1 General Arithmetic
  1.2 A Fundamental Theorem
  1.3 Root Fields (Simple Algebraic Extensions)
  1.4 Factorization of Polynomials with Integer Coefficients
  1.5 A Factorization Algorithm
  1.6 Validation of the Factorization Algorithm
  1.7 About the Factorization Algorithm
  1.8 Proof of the Fundamental Theorem
  1.9 Minimal Splitting Polynomials

2 Topics in Algebra
  2.1 Galois's Fundamental Theorem
  2.2 Algebraic Quantities
  2.3 Adjunctions and the Factorization of Polynomials
  2.4 The Splitting Field of $x^n + c_1x^{n-1} + c_2x^{n-2} + \cdots + c_n$
  2.5 A Fundamental Theorem of Divisor Theory

3 Some Quadratic Problems
  3.1 The Problem $A\square + B = \square$ and "Hypernumbers"
  3.2 Modules
  3.3 The Class Semigroup. Solution of $A\square + B = \square$
  3.4 Multiplication of Modules and Module Classes
  3.5 Is A a Square Mod p?
  3.6 Gauss's Composition of Forms
  3.7 The Construction of Compositions

4 The Genus of an Algebraic Curve
  4.1 Abel's Memoir
  4.2 Euler's Addition Formula
  4.3 An Algebraic Definition of the Genus
  4.4 Newton's Polygon
  4.5 Determination of the Genus
  4.6 Holomorphic Differentials
  4.7 The Riemann-Roch Theorem
  4.8 The Genus Is a Birational Invariant

5 Miscellany
  5.1 On the So-Called Fundamental Theorem of Algebra
  5.2 Proof by Contradiction and the Sylow Theorems
  5.3 Overview of 'Linear Algebra'
  5.4 The Spectral Theorem
  5.5 Kronecker as One of E. T. Bell's "Men of Mathematics"

References
Index

Preface

He [Kronecker] was, in fact, attempting to describe and to initiate a new branch of mathematics, which would contain both number theory and algebraic geometry as special cases.—Andre Weil [62] This book is about mathematics, not the history or philosophy of mathematics. Still, history and philosophy were prominent among my motives for writing it, and historical and philosophical issues will be major factors in determining whether it wins acceptance. Most mathematicians prefer constructive methods. Given two proofs of the same statement, one constructive and the other not, most will prefer the constructive proof. The real philosophical disagreement over the role of constructions in mathematics is between those—the majority—who believe that to exclude from mathematics all statements that cannot be proved constructively would omit far too much, and those of us who believe, on the contrary, that the most interesting parts of mathematics can be dealt with constructively, and that the greater rigor and precision of mathematics done in that way adds immensely to its value. Mathematics came to a fork in the road around 1880. On one side, Dedekind, Cantor, and Weierstrass advocated accepting transfinite "constructions" like those needed to prove the Bolzano-Weierstrass "theorem." On the other, Kronecker argued that no such departure from the standards of proof adhered to by Dirichlet and Gauss was necessary and that the Aristotelian exclusion of completed infinites could be maintained. As we all know, the first group carried the day, and the Dedekind-Cantor-Weierstrass road was the one taken. The new orthodoxy was consolidated by Hilbert a century ago, and has reigned ever since, despite occasional challenges, notably from Brouwer and Bishop. During this century, the phrase "foundations of mathematics" has come to mean for most working mathematicians the complex of ideas surrounding the axioms of set theory and the axiom of choice, matters that for

Kronecker had no mathematical meaning at all, much less foundational meaning.

Why, a hundred years after this choice was made, and made so decisively, do I believe that the road Kronecker proposed might win new consideration? The advent of computers has had a profound impact on mathematics and mathematicians that has already altered views about the nature and meaning of mathematics in a way favorable to Kronecker. The new technology causes mathematics to be taught and experienced in a much more computational way and directs attention to algorithms. In other words, it fosters constructive attitudes. My own preference for constructive formulations was shaped by my experience with computer programming in the 1950s, and computer programming at that time was trivial by today's standards.

No evidence supports the image that is so often presented of Kronecker as a vicious and personal critic of Cantor and Weierstrass; it is another instance of history being written by the victors. As far as I have been able to discover, Kronecker vigorously opposed the views of Cantor and Weierstrass, as well as those of Dedekind, with whom he was on far better terms, but he was not hostile to the men themselves. Moreover, his opposition to their views (which was of course reciprocated) was rarely expressed in his publications. In the rare instances in which he mentioned such issues, he merely stated his belief that the new ways of dealing with infinity that were coming to be accepted were unnecessary. Instead of excoriating nonconstructive methods, as legend would have us believe, he concentrated his efforts on backing up his beliefs with concrete mathematical results proved constructively.

No one doubts that Kronecker was one of the giants of nineteenth-century mathematics, but it is often said that he succeeded in his works because he ignored the strictures that he advocated in his philosophy. This view of the relation of Kronecker's mathematics to his philosophy is often ascribed to Poincaré, but as I have written elsewhere [21], this ascription is based on a misinterpretation of a passage [53] in which Poincaré writes about issues unrelated to the treatment of infinity in mathematics. Indeed, no one who has studied Kronecker's works could believe that he accepted completed infinites or made use of nonconstructive arguments. Like many other mathematicians since, he was impatient with the philosophy of mathematics and wanted only to get on with his mathematics itself, but for him "mathematics" was always constructive.

That attitude inspires these essays. My goal has been not to argue against the prevailing orthodoxy, but to show that substantial mathematics can be done constructively, and that such mathematics is interesting, illuminating, and concordant with the new algorithmic spirit of our times.

I have given examples of what I mean by constructive mathematics, without trying to define it. The underlying idea is well expressed in the essay of Poincaré mentioned above, in which he says that the guiding principle for both Kronecker and Weierstrass was to "derive everything from the natural numbers" so that the result would "partake of the certainty of arithmetic." I regard the natural

numbers not as a completed infinite set but as a means of describing the activity of counting. (See Essay 1.1.) The essence of constructive mathematics for me lies in the insistence upon treating infinity, in Gauss's phrase, as a façon de parler, a shorthand way of describing ideas that need to be restated in terms of finite calculations when it comes to writing a formal proof.

It will surely be remarked that almost all of the topics treated in the essays come from algebra and number theory. They not only partake of the certainty of arithmetic, as Poincaré says, they are arithmetic, what Kronecker called "general arithmetic." (Again, see Essay 1.1.) But there are three exceptions. In Essay 4.4, Newton's polygon is treated as a method of constructing an infinite series, which means, constructively, as an algorithm for generating arbitrarily many terms of the series. Convergence is not an issue because the theory treats the series themselves, not their limits in any sense. In Essay 5.1, a complex root of a given polynomial (a convergent sequence of rational complex numbers whose limit is a root of the polynomial) is found by an explicit construction. Finally, Essay 5.4, which sketches a proof of the spectral theorem for symmetric matrices of integers, necessarily deals with real numbers, that is, with convergent sequences of rationals.

An essay is "a short literary composition on a single subject, usually presenting the personal views of the author." There is nothing literary about these essays, but they do treat their mathematical subjects from a personal point of view. For example, Essay 5.1 explains why the "fundamental theorem of algebra" is misnamed (in a very real sense it isn't even true), and Essay 1.2 explains why Euclid's statement of Proposition 1 of Book 1 of the Elements, "On a given finite straight line to construct an equilateral triangle," is better than "Given a straight line segment, there exists an equilateral triangle of which it is one of the sides," the form in which most of Euclid's present-day successors would state it.

These are my opinions. To my dismay, it is incessantly borne in on me how few of my colleagues share them and how completely mathematicians today misunderstand and reject them. These compositions try (they essay) to present them in a way that will permit the reader to see past the preconceptions that stand between what I regard as a commonsense attitude toward the study of mathematics and the attitudes most commonly accepted today. They essay to reopen the Kroneckerian road not taken.

Acknowledgments

I am profoundly grateful to Professor David Cox, who provided encouragement when it was sorely needed, and backed it up with sound advice. I also thank Professors Bruce Chandler, Ricky Pollack, and Gabriel Stolzenberg for friendship and for many years of stimulating conversation about the history and philosophy of mathematics. Most of all, I thank my wife, Betty Rollin, to whom this book is dedicated, for more than I could ever enumerate.

Synopsis

The essays are divided into five parts:

  A Fundamental Theorem
  Topics in Algebra
  Some Quadratic Problems
  The Genus of an Algebraic Curve
  Miscellany

The fundamental theorem of Part 1 constructs a splitting field for a given polynomial. As is shown in Part 2, the case in which the given polynomial has coefficients in a ring of the form $\mathbf{Z}[c_1, c_2, \ldots, c_\nu]$ (a ring of polynomials in some set of indeterminates $c_1, c_2, \ldots, c_\nu$ with integer coefficients) suffices for the apparently more general case of a polynomial $f(x)$ whose coefficients are "algebraic quantities" in a very general sense. For this reason, only polynomials with coefficients in $\mathbf{Z}[c_1, c_2, \ldots, c_\nu]$ are considered in Part 1.

Another way to state the problem "Construct a splitting field for a given polynomial" is "Extend the notion of computation with polynomials with integer coefficients in such a way that the given polynomial can be written as a product of linear factors." Computation in $\mathbf{Z}[c_1, c_2, \ldots, c_\nu]$ involves just addition, subtraction, and multiplication, but it extends to computations involving division in the field of quotients of the integral domain $\mathbf{Z}[c_1, c_2, \ldots, c_\nu]$ in the same way that computation in the ring of integers extends to computation in the field of rational numbers. As Gauss's lemma shows (Essay 1.4), this extension does not affect the factorization of polynomials.

A simple further extension of $\mathbf{Z}[c_1, c_2, \ldots, c_\nu]$ is effected by "adjoining" one root of a monic, irreducible polynomial with coefficients in $\mathbf{Z}[c_1, c_2, \ldots, c_\nu]$ to the field of rational functions. This simple construction, which Galois used with amazing success, although with some lack of rigor, is generally known as a "simple algebraic extension" of the field of quotients of $\mathbf{Z}[c_1, c_2, \ldots, c_\nu]$. For the sake of brevity, I have called a field constructed in this way the "root field" of the monic, irreducible polynomial used in its construction (Essay 1.3).

With this specific description of the way in which computations in $\mathbf{Z}[c_1, c_2, \ldots, c_\nu]$ are to be extended, the construction problem to be solved becomes, "Given a polynomial $f$ with coefficients in $\mathbf{Z}[c_1, c_2, \ldots, c_\nu]$, find an auxiliary polynomial $g$ with coefficients in the same ring such that $g$ is monic and irreducible and such that its root field splits $f$" in the sense that $f$ can be written as a product of linear factors with coefficients in the root field of $g$. The problem, then, is, "Given $f$, construct $g$."

The solution in Part 1 is iterative. Suppose that $g$ is a failed attempt at a solution. Thus, the factorization of $f$ over the root field of $g$ contains at least one irreducible factor of degree greater than 1. The iteration needs to construct a better attempt at a solution. Specifically, it needs to construct a new auxiliary polynomial, call it $g_1$, with the property that the factorization of $f$ over the root field of $g_1$ contains more linear factors than does the factorization of $f$ over the root field of $g$. If $g_1$ fails to split $f$, the same procedure can be applied again to find a $g_2$ that gives $f$ more linear factors than $g_1$ did. Since the number of linear factors of $f$ increases with each new $g$, and since the number of such factors is bounded above by the degree of $f$, such an iteration must eventually reach a solution of the problem, an attempted $g$ that does not fail.

To make this sketch into an actual iterative construction of a splitting field for $f$ requires two main steps. First, given $f$ and an attempt at $g$, one needs to be able to factor $f$ when it is regarded as a polynomial with coefficients not in $\mathbf{Z}[c_1, c_2, \ldots, c_\nu]$ but in its extension, the root field of $g$. The difficult step in the construction of a splitting field for $f$ is the algorithmic solution of this factorization problem. The algorithm is set forth in Essay 1.5, with examples, and the proof that it achieves its objective is in Essay 1.6. The relation of the algorithm to Kronecker's solution of the same factorization problem is among the subjects discussed in Essay 1.7.

Second, one needs to describe explicitly how to pass from a $g$ that fails to split $f$ to a new $g_1$ that comes closer to splitting $f$. The underlying idea of the construction is simple: Because $g$ does not split $f$, there is an irreducible factor, call it $\phi$, of $f$ over the root field of $g$ whose degree is greater than 1. Adjoin to the root field of $g$ a root of $\phi$. This double adjunction, first of a root of $g$ and then of a root of $\phi$, gives a field over which $f$ has more linear factors (a field in which $f$ has more roots) because it contains a root of $\phi$, and the root field of $g$ did not. The problem is to write this double adjunction as a simple one, specifically as the field obtained by adjoining a root of a new $g_1$ with coefficients in $\mathbf{Z}[c_1, c_2, \ldots, c_\nu]$. The construction of such a $g_1$ is given in Essay 1.8.

Finally, although there are infinitely many polynomials $g$ that split $f$, there is only one splitting field of $f$ in the sense that if $g$ is a minimal splitting polynomial of $f$ (one that is itself split by any polynomial that splits $f$), the root field of $g$ is isomorphic to the root field of any other minimal splitting polynomial of $f$.

The end result is a theorem that in my opinion deserves the name "Fundamental Theorem of Algebra" much more than the theorem that is and probably always will be known by that name: Given a polynomial $f$ (in one

variable) with coefficients in $\mathbf{Z}[c_1, c_2, \ldots, c_\nu]$ there is an explicit way to extend rational computations in $\mathbf{Z}[c_1, c_2, \ldots, c_\nu]$ so that $f$ factors into linear factors; moreover, any two minimal ways of doing this are isomorphic. For the relation of this theorem to the "Fundamental Theorem of Algebra" see Essay 5.1.

The theorem of Part 1 that has just been described is implicitly contained (with no hint of a proof) in Lemma III of Galois's treatise [27] on the algebraic solution of equations (see [22]). In this sense, it is the foundation of Galois theory. The connection is explained in Essays 2.1 and 2.3.

Essay 2.2 is devoted to justifying Kronecker's assertion that every field of algebraic quantities is isomorphic to the root field of a polynomial with coefficients in some $\mathbf{Z}[c_1, c_2, \ldots, c_\nu]$ as that concept was defined in Part 1. This fact is the basis of Kronecker's later view (despite the fact that he had previously given the title Foundations of an Arithmetical Theory of Algebraic Quantities to his major publication) that "algebraic quantities" were unnecessary in mathematics and that algebraic questions should be studied using "general arithmetic" instead (see Essay 1.1).

The algorithmic description of fields of algebraic quantities in terms of "adjunction relations" in Essay 2.3 gives a construction of the splitting field of a polynomial that is very close to Chebotarev's in his excellent but little-known book on Galois theory [8].

The construction of the splitting field of a general monic polynomial of degree $n$ in Essay 2.4 proves another basic theorem of Galois (another to which Galois gave no hint of a proof): that the Galois group of an $n$th-degree polynomial $f(x)$ whose coefficients are 'letters' is the full symmetric group. The splitting field is explicitly given by adjunction relations $f_i(\alpha_i) = 0$, where

$$f_i(x) = \frac{f(x)}{(x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_{i-1})} \qquad (1)$$

is the irreducible polynomial satisfied by a root $\alpha_i$ of $f(x)$ whose coefficients are polynomials in the roots $\alpha_1, \alpha_2, \ldots, \alpha_{i-1}$ already adjoined. (The right side of (1), as it stands, is of course not a polynomial; it becomes one once $i - 1$ roots $\alpha_1, \alpha_2, \ldots,$ [...]

[...] $\phi_i(r_j) = 0$ for $j \neq i$ and $\phi_i(r_i) = 1$. If $g(x)$ is a factor of $f(x)$, then its degree is at most $n$ and its value agrees with the value of $g(r_i)$ [...]

[...] 0 distinct roots $\alpha_1, \alpha_2, \ldots, \alpha_m$ in that domain. Then $n > 0$ and $g(y) = f(y + \alpha_1)$ defines a polynomial of degree $n$ that has $m$ distinct roots $0, \alpha_2 - \alpha_1, \alpha_3 - \alpha_1, \ldots, \alpha_m - \alpha_1$. Because $g(0) = 0$, $g(y) = y\,h(y)$, where $h(y)$ has degree $n - 1$. The $m - 1$ roots $\alpha_2 - \alpha_1, \alpha_3 - \alpha_1, \ldots, \alpha_m - \alpha_1$ of $g(y)$ other than zero are roots of $h(y)$, because $(\alpha_i - \alpha_1)\,h(\alpha_i - \alpha_1) = g(\alpha_i - \alpha_1) = 0$ and the first factor on the left is not zero. If a polynomial of degree $n$ had $n + 1$ distinct roots, this construction would prove that $n > 0$ and would yield a polynomial of degree $n - 1$ with $n$ distinct roots, which is impossible because it would imply an infinite sequence of polynomials of decreasing degrees.

Corollary 1: Given a polynomial of degree $n$ with integer coefficients, any set of $2n + 1$ distinct integers contains at least $n + 1$ that are not roots of it.

Corollary 2: Let $f(x)$ be a polynomial with coefficients in an integral domain that has degree $n$ and $n$ distinct roots $\alpha_1, \alpha_2, \ldots, \alpha_n$ in the integral domain. Then $f(x) = a(x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_n)$, where $a$ is the leading coefficient of $f(x)$.

Deduction: The difference between $f(x)$ and $a(x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_n)$ is a polynomial of degree less than $n$ that has $n$ distinct roots, so it must be zero.

As was shown in the previous note, $\phi_i(x)$ must be a rational number times $\prod_{j \neq i}(x - r_j)$. The rational multiplier is determined by the condition $\phi_i(r_i) = 1$ to be the reciprocal of the value of $\prod_{j \neq i}(x - r_j)$ when $x = r_i$.
To test whether $g(x) = c_0x^m + c_1x^{m-1} + \cdots + c_m$ divides a nonzero $f(x) = a_0x^n + a_1x^{n-1} + \cdots + a_n$, determine whether $n \geq m$ and $c_0$ divides $a_0$. If not, $g(x)$ does not divide $f(x)$. If so, replace $f(x)$ with $f(x) - \frac{a_0}{c_0}x^{n-m}g(x)$, a polynomial that has degree less than $\deg f$ and that is divisible by $g(x)$ if and only if the original $f(x)$ was. If the new $f(x)$ is zero, it is divisible by $g(x)$. Otherwise, apply the same test to the new $f(x)$.
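The divisibility test in this note translates almost directly into code. The following is a minimal sketch, assuming polynomials in one variable with integer coefficients stored as Python lists, highest degree first; the function name `divides` and this representation are illustrative choices, not part of the text.

```python
def divides(g, f):
    """Return True if g(x) divides f(x) in Z[x].

    Both polynomials are lists of integer coefficients, highest degree
    first, and g is assumed to have a nonzero leading coefficient.
    """
    f = list(f)
    # Remove leading zeros so that f[0] is the leading coefficient a_0.
    while f and f[0] == 0:
        f.pop(0)
    while f:  # the zero polynomial is divisible by g
        n, m = len(f) - 1, len(g) - 1
        # The test fails unless n >= m and c_0 divides a_0.
        if n < m or f[0] % g[0] != 0:
            return False
        q = f[0] // g[0]
        # Replace f(x) with f(x) - (a_0 / c_0) * x^(n - m) * g(x).
        for k, c in enumerate(g):
            f[k] -= q * c
        while f and f[0] == 0:
            f.pop(0)
    return True
```

For instance, `divides([1, -1], [1, 0, -1])` checks whether $x - 1$ divides $x^2 - 1$. Each pass lowers the degree of `f`, so the loop stops after at most $n - m + 1$ passes.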

Essay 1.4 Factorization of Polynomials with Integer Coefficients

polynomials in $k$ indeterminates, with $a_0 \neq 0$ and $n > 0$. Again choose $n + 1$ integers $r_i$ for which $f(r_i) \neq 0$ and again construct the polynomials $\phi_i(x)$ with rational coefficients that are one at $r_i$ and zero at all other $r_j$. Construct the finite list of expressions of the form $\sum_i b_i\phi_i(x)$, with one term for each chosen $r_i$, in which $b_i$ is a divisor of $f(r_i)$. (Here $f(r_i)$ is a polynomial in $k$ indeterminates with integer coefficients. By the inductive hypothesis, its divisors can be listed.) The required list of divisors is obtained by striking from this finite list the entries that are not polynomials with integer coefficients and those that do not divide $f(x)$.

A polynomial with integer coefficients is a unit if it is $1$ or $-1$. The trivial divisors of a nonzero polynomial are the units and the polynomial itself times a unit. A nonzero polynomial is reducible if its list of divisors given by the theorem contains more than the four trivial divisors, or, what is the same, if it can be written as a product of two polynomials, neither of which is a unit. A nonzero polynomial is irreducible if it is not a unit and not reducible.

Corollary. Given a nonzero polynomial with integer coefficients that is not a unit, write it as a product of irreducible polynomials.

Deduction. Take the given polynomial as the initial input to the following algorithm:

Input: A product of nonzero polynomials (with integer coefficients), none of them units.

Algorithm: If each input factor is irreducible (which is to say that for each of them the list of divisors given by the theorem contains only the trivial divisors), the algorithm terminates. Otherwise, at least one factor is reducible. Replace one reducible factor in the input product with a representation of that factor as a product of two factors, neither of which is a unit.
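The divisor list of the theorem and the splitting loop of the corollary can be tried out on small examples. The sketch below is an illustration only, and it assumes the simplest case: polynomials in one variable $x$ whose coefficients are integers (no further indeterminates), stored as lists with the highest degree first. All function names are hypothetical, and the search is exponential in the number of divisors of the values $f(r_i)$, so it is practical only for tiny inputs.

```python
from fractions import Fraction
from itertools import product

def strip(p):
    """Drop leading zero coefficients."""
    i = 0
    while i < len(p) and p[i] == 0:
        i += 1
    return p[i:]

def evaluate(p, x):
    """Horner evaluation of p at x."""
    value = 0
    for c in p:
        value = value * x + c
    return value

def exact_quotient(f, g):
    """Return f/g if g divides f in Z[x], otherwise None (trial division)."""
    f = strip(list(f))
    q = [0] * max(len(f) - len(g) + 1, 0)
    while f:
        if len(f) < len(g) or f[0] % g[0] != 0:
            return None
        c = f[0] // g[0]
        q[len(q) - 1 - (len(f) - len(g))] = c
        for k, gc in enumerate(g):
            f[k] -= c * gc
        f = strip(f)
    return q

def int_divisors(v):
    """All positive and negative divisors of a nonzero integer."""
    pos = [d for d in range(1, abs(v) + 1) if v % d == 0]
    return pos + [-d for d in pos]

def lagrange_basis(rs):
    """Polynomials phi_i with phi_i(r_i) = 1 and phi_i(r_j) = 0 for j != i."""
    basis = []
    for i, ri in enumerate(rs):
        num = [Fraction(1)]
        for j, rj in enumerate(rs):
            if j != i:  # multiply num by (x - rj)
                num = [a - rj * b for a, b in zip(num + [0], [0] + num)]
        scale = evaluate(num, ri)
        basis.append([c / scale for c in num])
    return basis

def list_divisors(f):
    """List every divisor of a nonzero f in Z[x]."""
    n = len(f) - 1
    rs, r = [], 0
    while len(rs) < n + 1:  # n + 1 integers at which f does not vanish
        if evaluate(f, r) != 0:
            rs.append(r)
        r += 1
    basis = lagrange_basis(rs)
    found = []
    for bs in product(*(int_divisors(evaluate(f, r)) for r in rs)):
        cand = [sum(b * phi[k] for b, phi in zip(bs, basis)) for k in range(n + 1)]
        # Strike entries without integer coefficients or not dividing f.
        if all(c.denominator == 1 for c in cand):
            g = strip([int(c) for c in cand])
            if exact_quotient(f, g) is not None:
                found.append(g)
    return found

def factor(f):
    """Write a nonzero non-unit f as a list of irreducible factors."""
    factors = [list(f)]
    i = 0
    while i < len(factors):
        p = factors[i]
        trivial = [[1], [-1], p, [-c for c in p]]
        proper = [g for g in list_divisors(p) if g not in trivial]
        if proper:  # p is reducible: replace it by two non-unit factors
            g = proper[0]
            factors[i:i + 1] = [g, exact_quotient(p, g)]
        else:       # p is irreducible: move on to the next factor
            i += 1
    return factors
```

Calling `list_divisors([1, 0, -1])` lists the divisors of $x^2 - 1$, and `factor([1, 0, -1])` splits it into two linear factors (determined only up to sign and order). Handling coefficients that are themselves polynomials in $k$ indeterminates would require the inductive step of the theorem, which this sketch does not attempt.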

If the algorithm terminates, it terminates with a representation of the original polynomial as a product of irreducible polynomials, as required. But it must terminate because the list of all divisors of the original polynomial is finite.

A nonzero polynomial with integer coefficients is prime if it is not a unit and if it divides a product of polynomials with integer coefficients only when it divides one of the factors.

Theorem 2. Irreducible polynomials with integer coefficients are prime.

When there are no indeterminates in the polynomials, this is the proposition that irreducible integers are prime. Euclid proved it using the following lemma:

Lemma (The Euclidean* algorithm). Given positive integers $f$ and $g$, find integers $\phi$ and $\psi$ for which $\phi f + \psi g$ is positive and divides both $f$ and $g$.

Proof. Let a sequence of pairs of numbers $(f_n, g_n)$ be defined by taking $(f_1, g_1)$ to be $(f, g)$ and taking $(f_{n+1}, g_{n+1})$ to be either $(f_n, g_n - f_n)$ or $(f_n - g_n, g_n)$, depending on whether $f_n \leq g_n$ or $f_n > g_n > 0$. When $g_n = 0$, the sequence terminates. A step that changes $f_n$ changes it to a positive value, so each $f_n$ is positive. Similarly, each $g_n$ is nonnegative. A step reduces $f_n + g_n$, either by $f_n$ or by $g_n$. By the principle of infinite descent, therefore, the sequence must terminate. Let $(d, 0)$ be the terminal pair. Then $d$ is positive. Since $f_{n+1}$ and $g_{n+1}$ are sums of multiples of $f_n$ and $g_n$, both entries of the terminal pair $(d, 0)$ are sums of multiples of the initial pair $(f, g)$. Thus, there are integers $\phi$ and $\psi$ such that $d = \phi f + \psi g$. A common divisor of $f_{n+1}$ and $g_{n+1}$ is a common divisor of $f_n$ and $g_n$, so, since $d$ divides both entries of $(d, 0)$, it divides both $f$ and $g$, which proves the lemma.†

The proof of Theorem 2 for polynomials with indeterminates will use a lemma analogous to the one above:

Lemma (The Euclidean‡ algorithm for polynomials). Given nonzero polynomials $f$ and $g$ in one indeterminate with coefficients in a field, construct polynomials $\phi$ and $\psi$ with coefficients in the same field for which $\phi f + \psi g$ divides both $f$ and $g$.

Proof. Let a sequence of pairs of polynomials $(f_n, g_n)$ be defined by taking $(f_1, g_1)$ to be $(f, g)$ and taking $(f_{n+1}, g_{n+1})$ to be the pair derived from $(f_n, g_n)$ in the following way: If $0 \leq \deg f_n \leq \deg g_n$, set $i = \deg g_n - \deg f_n$ and $(f_{n+1}, g_{n+1}) = (f_n, g_n - \nu x^i f_n/\mu)$, where $\mu$ is the leading coefficient of $f_n$ and $\nu$ is the leading coefficient of $g_n$. If $\deg f_n > \deg g_n \geq 0$, set $(f_{n+1}, g_{n+1}) = (f_n - \mu x^i g_n/\nu, g_n)$, where $i = \deg f_n - \deg g_n$ and $\mu$ and $\nu$ are the leading coefficients of $f_n$ and $g_n$. The sequence terminates if $f_n$ or $g_n$ is zero (that is, has degree $-\infty$), which must occur eventually, because $\deg f_n + \deg g_n$ decreases with each step. Let $d$ be the nonzero member of the terminal pair. Clearly $d$, like all members of all pairs of the sequence, has the form $\phi f + \psi g$, where $\phi$ and $\psi$ are polynomials with coefficients in the field. The predecessor of $(f_{n+1}, g_{n+1})$ in the sequence just constructed is either $(f_{n+1}, g_{n+1} + \nu x^i f_{n+1}/\mu)$ or $(f_{n+1} + \mu x^i g_{n+1}/\nu, g_{n+1})$. In both cases, every common divisor of $f_{n+1}$ and $g_{n+1}$ is a common divisor of $f_n$ and $g_n$. Since $d$ is a common divisor of the polynomials in the terminal pair, it is a common divisor of the polynomials in the initial pair $(f, g)$, and the proof of the lemma is complete.

* In essence, this is the algorithm Euclid used [25, Book 7, Propositions 1 and 2], although the formulation is entirely different.
† For another version of this proof, see Essay 3.2.
‡ It is "Euclidean" by virtue of the analogy with the previous case, not because Euclid considered anything of the kind.
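Both proofs describe procedures that can be run as written. The sketch below follows the two sequences of pairs step by step while carrying along the multipliers $\phi$ and $\psi$. It makes two assumptions that are not in the text: the field is taken to be the rationals (via Python's `Fraction`), and polynomials are stored as coefficient lists with the lowest degree first, the zero polynomial being the empty list. The function names are illustrative.

```python
from fractions import Fraction

def euclid_int(f, g):
    """Subtractive algorithm for positive integers f and g.

    Returns (d, phi, psi) with d = phi*f + psi*g, d > 0, d dividing f and g.
    Each entry of the pair is tracked as (value, multiplier of f, multiplier of g).
    """
    a = (f, 1, 0)
    b = (g, 0, 1)
    while b[0] != 0:
        if a[0] <= b[0]:
            b = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
        else:
            a = (a[0] - b[0], a[1] - b[1], a[2] - b[2])
    return a

def add_scaled(p, q, c, i):
    """Return p + c * x^i * q with trailing (high-degree) zeros removed."""
    out = list(p) + [0] * max(0, len(q) + i - len(p))
    for k, qc in enumerate(q):
        out[k + i] += c * qc
    while out and out[-1] == 0:
        out.pop()
    return out

def euclid_poly(f, g):
    """Same procedure for nonzero polynomials over the rationals.

    Returns (d, phi, psi) with d = phi*f + psi*g dividing both f and g.
    """
    a = ([Fraction(c) for c in f], [Fraction(1)], [])
    b = ([Fraction(c) for c in g], [], [Fraction(1)])
    while a[0] and b[0]:
        mu, nu = a[0][-1], b[0][-1]  # leading coefficients
        if len(a[0]) <= len(b[0]):
            i = len(b[0]) - len(a[0])
            # g_n - nu * x^i * f_n / mu, applied to value and multipliers alike
            b = tuple(add_scaled(x, y, -nu / mu, i) for x, y in zip(b, a))
        else:
            i = len(a[0]) - len(b[0])
            # f_n - mu * x^i * g_n / nu
            a = tuple(add_scaled(x, y, -mu / nu, i) for x, y in zip(a, b))
    return a if a[0] else b
```

For example, `euclid_int(12, 18)` returns `(6, 2, -1)`, reflecting $6 = 2 \cdot 12 - 18$. Applying the same update to the multipliers as to the values is what makes "every member of every pair has the form $\phi f + \psi g$" an invariant of the loop.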


Proof of Theorem 2. Let $f$, $g$, and $h$ be in the ring of polynomials $\mathbf{Z}[x_1, x_2, \ldots, x_m]$ in $m$ indeterminates with integer coefficients and let them satisfy (1) $f$ is irreducible, (2) $f$ divides $gh$, and (3) $f$ does not divide $g$. It is to be shown that $f$ must then divide $h$.

When $m = 0$ the proof is the one that Euclid, in essence, gave [25, Book 7, Prop. 24]. In this case, $f$, $g$, and $h$ are integers. Since changing the sign of $f$ or $g$ or $h$ does not change (1), (2), (3), or the desired conclusion, $f$, $g$, and $h$ can be assumed to be nonnegative. Then $f > 0$ and $g > 0$, because $f$ is not zero by (1) and $g$ is not zero by (3). The Euclidean algorithm provides a positive $d = \phi f + \psi g$ that divides both $f$ and $g$. Since $f$ is irreducible and $d$ is positive, $d = 1$ or $f$. By (3), $d \neq f$. Therefore $\phi f + \psi g = 1$. Thus $\phi f h + \psi g h = h$ is divisible by $f$, as was to be shown, because $f$ divides both terms on the left.

Next consider the case of Theorem 2 in which $f$ contains fewer than $m$ indeterminates:

A Special Case. An irreducible element of $\mathbf{Z}[x_1, x_2, \ldots, x_{m-1}]$ is prime as an element of $\mathbf{Z}[x_1, x_2, \ldots, x_m]$.

Let $f$ be an irreducible polynomial in $x_1, x_2, \ldots, x_{m-1}$ with integer coefficients, and let $g(x)$ and $h(x)$ be polynomials in $x, x_1, x_2, \ldots, x_{m-1}$ with integer coefficients for which (2) and (3) hold. Then all coefficients of $g(x)h(x)$ are divisible by $f$, but at least one of the coefficients of $g(x)$ is not divisible by $f$. Let $g(x) = a_0x^s + a_1x^{s-1} + \cdots + a_s$, where the $a_i$ are polynomials in $x_1, x_2, \ldots, x_{m-1}$, and let $I$ be the least index for which $a_I$ is not divisible by $f$. If $h(x)$ were not divisible by $f$, then, in the same way, when $h(x)$ was written in the form $h(x) = b_0x^t + b_1x^{t-1} + \cdots + b_t$ there would be a least index $J$ for which $b_J$ was not divisible by $f$. Then the coefficient of $x^{s+t-I-J}$ in $g(x)h(x)$ would be $a_Ib_J$ plus terms divisible by $f$ (this coefficient is $a_Ib_J$ plus terms that are products in which one factor is divisible by $f$). But $a_Ib_J$ would not be divisible by $f$ by the inductive hypothesis, so $g(x)h(x)$ would not then be divisible by $f$, contrary to hypothesis. Therefore $h(x)$ must be divisible by $f$, as was to be shown.

The general case, in which $f$ may contain $x$, can now be deduced from the case of $m - 1$ indeterminates as follows: The Euclidean algorithm gives equations $d(x) = \phi(x)f(x) + \psi(x)g(x)$, $f(x) = q_1(x)d(x)$, $g(x) = q_2(x)d(x)$, where $d$, $\phi$, $\psi$, $q_1$, $q_2$ are polynomials in $x$ with coefficients in the field of rational functions in $x_1, x_2, \ldots, x_{m-1}$. Let $\delta$ be a common denominator of all five of these polynomials. (For example, $\delta$ could be taken to be the product of all denominators of all coefficients of the five.) Then $D(x) = \delta \cdot d(x)$, $\Phi(x) = \delta \cdot \phi(x)$, $\Psi(x) = \delta \cdot \psi(x)$, $Q_1(x) = \delta \cdot q_1(x)$, and $Q_2(x) = \delta \cdot q_2(x)$ all are in $\mathbf{Z}[x, x_1, x_2, \ldots, x_{m-1}]$ and they satisfy $D(x) = \Phi(x)f(x) + \Psi(x)g(x)$, $\delta^2 \cdot f(x) = Q_1(x)D(x)$, and $\delta^2 \cdot g(x) = Q_2(x)D(x)$. By the special case already proved, each irreducible factor $\varepsilon$ of $\delta^2$ divides either $Q_1(x)$ or $D(x)$. Therefore, $\delta^2 \cdot f(x) = Q_1(x)D(x)$ can be divided by each of the irreducible factors of $\delta^2$ in succession to find $f(x) = (Q_1(x)/\varepsilon_1) \cdot (D(x)/\varepsilon_2)$, where $\delta^2 = \varepsilon_1\varepsilon_2$. By (1), $D(x)/\varepsilon_2$ must be $\pm 1$ or $\pm f(x)$. By (3), $D(x)/\varepsilon_2 \neq \pm f(x)$. Therefore,

$D(x) = \pm\varepsilon_2$, so $\pm\varepsilon_2 h(x) = \Phi(x) \cdot f(x) \cdot h(x) + \Psi(x) \cdot g(x) \cdot h(x)$, which shows that $f(x)$ divides $\varepsilon_2 h(x)$, say $\varepsilon_2 h(x) = Q_3(x) \cdot f(x)$. By the special case, each irreducible factor of $\varepsilon_2$ divides $Q_3(x)$, so $f(x)$ divides $h(x)$, as was to be shown. Thus Theorem 2 follows by induction.

Corollary 1 (Unique factorization of polynomials with integer coefficients). If $\phi_1\phi_2\cdots\phi_\mu = \psi_1\psi_2\cdots\psi_\nu$ where the factors on both sides are irreducible polynomials with integer coefficients, then $\mu = \nu$, and the factors can be so ordered that $\phi_i = -\psi_i$ for an even number of values of $i$, and $\phi_i = \psi_i$ for all others.

Deduction. Let such an equation $\phi_1\phi_2\cdots\phi_\mu = \psi_1\psi_2\cdots\psi_\nu$ be given in which $\mu \ge 1$. Since Theorem 2 implies that $\phi_1$ divides $\psi_j$ for some $j$, the $\psi$'s can be rearranged to make $\phi_1$ divide $\psi_1$, say $\psi_1 = q_1\phi_1$. Then $\phi_2\phi_3\cdots\phi_\mu = q_1\psi_2\psi_3\cdots\psi_\nu$ is a product of $\nu$ factors, at least $\nu - 1$ of which are irreducible. If $\mu$ were less than $\nu$, $\mu$ iterations of this step would express 1 as a product of $\nu$ factors, at least $\nu - \mu$ of which were irreducible, contrary to the fact that the only factors of 1, the units 1 and $-1$, are not irreducible. Therefore, $\mu \ge \nu$. For the same reason, $\nu \ge \mu$, so $\mu$ and $\nu$ must be equal in any such equation. In the first equation $\phi_2\phi_3\cdots\phi_\mu = q_1\psi_2\psi_3\cdots\psi_\nu$ found by the process above, $q_1$ must therefore have no irreducible factors and therefore must be a unit. Thus, $\mu$ steps rearrange the $\psi$'s in such a way that $\psi_i = q_i\phi_i$ for each $i$, where $q_i$ is a unit and $1 = q_1q_2\cdots q_\mu$. The last equation shows that the number of $q$'s that are $-1$ is even, and the corollary follows.

Corollary 2 (Gauss's* lemma). If an element of $\mathbf{Z}[x, x_1, x_2, \ldots, x_{m-1}]$ is reducible over the field of rational functions of $x_1, x_2, \ldots, x_{m-1}$ in the sense that it can be written as a product of two polynomials of positive degree in $x$ with coefficients in this field, then it is reducible as an element of $\mathbf{Z}[x, x_1, x_2, \ldots, x_{m-1}]$.

Deduction. Let $f(x)$ be reducible over the field of rational functions, say $f(x) = g(x)h(x)$, and let $d_1$ and $d_2$ be elements of $\mathbf{Z}[x_1, x_2, \ldots, x_{m-1}]$ that clear the denominators of $g(x)$ and $h(x)$ respectively, in other words, elements such that $G(x) = d_1g(x)$ and $H(x) = d_2h(x)$ are in $\mathbf{Z}[x, x_1, x_2, \ldots, x_{m-1}]$. Then $d_1d_2f(x) = G(x)H(x)$; this equation can be divided successively by the irreducible factors of $d_1d_2$ to produce an equation $f(x) = \frac{G(x)}{\varepsilon_1} \cdot \frac{H(x)}{\varepsilon_2}$ (where $\varepsilon_1$ and $\varepsilon_2$ are integers for which $\varepsilon_1\varepsilon_2 = d_1d_2$, and both factors have integer coefficients), which shows that $f(x)$ is reducible in $\mathbf{Z}[x, x_1, x_2, \ldots, x_{m-1}]$.

* Gauss's original statement was that a product of monic polynomials with rational coefficients can have integer coefficients only if the factors do. The same is true for $m > 1$: A product of monic polynomials whose coefficients are rational functions in $x_1, x_2, \ldots, x_{m-1}$ can have coefficients in $\mathbf{Z}[x_1, x_2, \ldots, x_{m-1}]$ only if the factors do. This statement can be proved in the same way as the statement above: When $g(x)$ and $h(x)$ are monic, $f(x)$ is monic, so its factors $\frac{G(x)}{\varepsilon_1}$ and $\frac{H(x)}{\varepsilon_2}$ are monic, which implies that $\frac{d_1}{\varepsilon_1} = 1$ and $\frac{d_2}{\varepsilon_2} = 1$ and therefore that $g(x) = \frac{G(x)}{\varepsilon_1}$ and $h(x) = \frac{H(x)}{\varepsilon_2}$. For more on Gauss's lemma, see Essay 2.5.

The methods used to prove Corollary 1 prove another proposition:

Proposition. If $\phi_1\phi_2\cdots$

[...]

can also be represented by a polynomial whose degree in $x$ is less than $m + j$ (replace the leading term $\phi(y)x^{m+j}$ in $x$ with $\phi(y)x^j(x^m - f(x))$ while leaving the other terms unchanged). Thus, every ring element can be represented by a polynomial whose degree in $x$ is less than $m$. In fact, every element of $R[x, y]$ is congruent mod $f(x)$ to one and only one polynomial of degree less than $m$ in $x$, because an element of $R[x, y]$ whose degree in $x$ is less than $m$ can be a multiple of $f(x)$ only if it is zero. In the same way, any element of $R[x, y]$ is congruent mod $g(y)$ to just one element whose degree in $y$ is less than $n$. Moreover, since the reduction method can be applied to each coefficient $\phi_i(y)$ of a polynomial $\phi_1(y)x^{m-1} + \phi_2(y)x^{m-2} + \cdots + \phi_m(y)$ that has already been reduced mod $f(x)$, every element of $R[x, y] \bmod (f(x), g(y))$ is represented by one and only one element of $R[x, y]$ whose degree in $x$ is less than $m$ and whose degree in $y$ is less than $n$.

Each element of this ring $R[x, y] \bmod (f(x), g(y))$ is a root of a monic polynomial with coefficients in $R$. Specifically, if $\phi(x, y)$ is an element of $R[x, y]$, a monic polynomial $\mathcal{F}(z)$ of degree $mn$ with coefficients in $R$ for which $\mathcal{F}(\phi(x, y)) \equiv 0 \bmod (f(x), g(y))$ can be constructed in the following way: For each of the $mn$ monomials $x^iy^j$ in which $0 \le i < m$ and $0 \le j < n$, the polynomial $\phi(x, y)x^iy^j$ is congruent mod $f(x)$ and $g(y)$, as was just seen, to a sum of multiples of $x^\alpha y^\beta$, where $0 \le \alpha < m$ and $0 \le \beta < n$, in which the multipliers are in $R$. Thus, the congruence

$$\phi(x, y)x^iy^j \equiv \sum_{\alpha=0}^{m-1}\sum_{\beta=0}^{n-1} M_{ij,\alpha\beta}\, x^\alpha y^\beta \bmod (f(x), g(y))$$

determines an $mn \times mn$ matrix $M$ of elements $M_{ij,\alpha\beta}$ of $R$ once an ordering of the $mn$ monomials $x^iy^j$ is decided upon. Otherwise stated, $M$ is the matrix that represents multiplication by $\phi(x, y)$ relative to the basis $x^iy^j$ of $R[x, y] \bmod (f(x), g(y))$ over $R$. The characteristic polynomial of this matrix, which is to say the polynomial $\mathcal{F}(z) = \det(zI - M)$, is monic of degree $mn$ in $z$; by the Cayley-Hamilton theorem, it satisfies $\mathcal{F}(\phi(x, y)) \equiv 0 \bmod (f(x), g(y))$. (A proof will be given in the next essay.)

Let this construction be applied not to a single polynomial $\phi(x, y)$ but to $tx + uy$, regarded as a polynomial in new indeterminates $t$ and $u$ whose coefficients are in $R[x, y]$. The result is a polynomial $\mathcal{F}(z, t, u)$ in $z$, $t$, and $u$ with coefficients in $R$. Specifically, $\mathcal{F}$ is the characteristic polynomial $\det(zI - M)$ of the $mn \times mn$ matrix $M$ determined by $C \cdot (tx + uy) \equiv MC \bmod (f(x), g(y))$, where $C$ is the column matrix of length $mn$ whose entries are the monomials $x^iy^j$ in which $0 \le i < m$ and $0 \le j < n$ arranged in some order and $tx + uy$ is a $1 \times 1$ matrix. The entries of $M$ are homogeneous polynomials of degree 1 in $t$ and $u$ with coefficients in $R$, so the entries of $zI - M$ are



homogeneous of degree 1 in $z$, $t$, and $u$. Thus, $\mathcal{F}(z, t, u)$ is homogeneous of degree $mn$ in these indeterminates and has coefficients in $R$; moreover, it is monic in $z$.

As was seen in the last essay, the irreducible factors of $\mathcal{F}(z, t, u)$ as a polynomial with integer coefficients (in $3 + \nu$ indeterminates) can be found, say $\mathcal{F}(z, t, u) = \prod \mathcal{F}_i(z, t, u)$, where the $\mathcal{F}_i(z, t, u)$ are irreducible. Because $\mathcal{F}$ is homogeneous, its irreducible factors $\mathcal{F}_i$ are homogeneous. Because $\mathcal{F}$ is monic in $z$, the leading coefficient of each of its irreducible factors $\mathcal{F}_i$ as a polynomial in $z$ is $\pm 1$, so one can stipulate that each $\mathcal{F}_i$ is monic in $z$, and this condition determines the $\mathcal{F}_i$ completely. The required factorization

(1) $$f(x) \equiv \phi_1(x, y)\phi_2(x, y)\cdots\phi_k(x, y) \bmod g(y)$$

contains one factor $\phi_i(x, y)$ for each $\mathcal{F}_i(z, t, u)$. It is constructed as follows: As will be shown, the degree of $\mathcal{F}_i$ (it is homogeneous in $z$, $t$, and $u$) is a multiple of $n$, say it is $\mu_in$. (By symmetry, this degree is also a multiple of $m$, a fact that is not of interest here.) Substitute $tx + uy$ for $z$ and 1 for $t$ in $\mathcal{F}_i$ and write the result in the form

(2) $$\mathcal{F}_i(x + uy, 1, u) = B_{i,0}u^{\mu_in} + B_{i,1}u^{\mu_in-1} + B_{i,2}u^{\mu_in-2} + \cdots + B_{i,\mu_in}.$$

Each coefficient $B_{i,j}$ is a polynomial in $x$, $y$, $c_1, c_2, \ldots, c_{\nu-1}$, and $c_\nu$ with integer coefficients. The first $\mu_i$ of these coefficients are all zero mod $g(y)$, which is to say that reduction mod $g(y)$ gives $\mathcal{F}_i(x + uy, 1, u) \equiv \psi_iu^{\mu_in-\mu_i} + \cdots \bmod g(y)$, where the omitted terms are of lower degree in $u$ and $\psi_i \equiv B_{i,\mu_i} \bmod g(y)$. The factor $\phi_i(x, y)$ of $f(x) \bmod g(y)$ corresponding to this factor $\mathcal{F}_i$ of $\mathcal{F}$ is

(3) $$\phi_i(x, y) \equiv \frac{\psi_i(x, y)}{g'(y)^{\mu_i}} \bmod g(y),$$

where $g'(y)$ is the derivative of $g(y)$. (Implicit in this statement, since $\phi_i(x, y)$ is monic in $x$, is the statement that $\psi_i \equiv g'(y)^{\mu_i}x^{\mu_i} + \cdots \bmod g(y)$ where the omitted terms have lower degree in $x$.)

Example 1. $f(x) = x^2 - 2$ and $g(y) = y^2 - 3$. The first step is to find $\mathcal{F}(z, t, u)$ for this $f$ and $g$. When the monomials $x^\alpha y^\beta$ for $0 \le \alpha < 2$ and $0 \le \beta < 2$ are put in the order $1$, $x$, $y$, $xy$, the matrix that represents multiplication by $tx + uy$ becomes

$$\begin{pmatrix} 0 & t & u & 0 \\ 2t & 0 & 0 & u \\ 3u & 0 & 0 & t \\ 0 & 3u & 2t & 0 \end{pmatrix}.$$

Therefore $\mathcal{F}$ is the determinant of

$$\begin{pmatrix} z & -t & -u & 0 \\ -2t & z & 0 & -u \\ -3u & 0 & z & -t \\ 0 & -3u & -2t & z \end{pmatrix}$$

which can be found without too much paper-and-pencil calculation to be $z^4 - (4t^2 + 6u^2)z^2 + 4t^4 - 12t^2u^2 + 9u^4$. This polynomial $\mathcal{F}(z, t, u)$ is irreducible because $\mathcal{F}(z, 1, 1) = z^4 - 10z^2 + 1$ obviously has no root mod 5, so it can only have a factorization of the form $(z^2 + az + b)(z^2 + cz + d) = z^4 - 10z^2 + 1$, and this would imply $a = -c$, $d + ac + b = -10$, and $b = d = \pm 1$, so $a^2 = -ac = b + d + 10 = \pm 2 + 10$, which is impossible. Therefore, $x^2 - 2$ is irreducible mod $y^2 - 3$ (the factorization algorithm produces only the one factor corresponding to $\mathcal{F}$ itself). To determine this factor (which must, of course, be $x^2 - 2$ itself) one computes the coefficient $\psi$ of $u^2$ in $\mathcal{F}(x + uy, 1, u) = (x + uy)^4 - (4 + 6u^2)(x + uy)^2 + 4 - 12u^2 + 9u^4$ because $\deg \mathcal{F}/\deg g = 2$. (As expected, the coefficient of $u^4$ is $y^4 - 6y^2 + 9 \equiv 0 \bmod (y^2 - 3)$, and the coefficient of $u^3$ is $4xy^3 - 12xy \equiv 0 \bmod (y^2 - 3)$.) Because $\psi = 6x^2y^2 - 4y^2 - 6x^2 - 12$, formula (3) gives the factor

$$\frac{6x^2y^2 - 4y^2 - 6x^2 - 12}{(2y)^2} \equiv \frac{18x^2 - 12 - 6x^2 - 12}{12} = \frac{12x^2 - 24}{12} = x^2 - 2 \bmod g(y)$$

as expected.

Example 2. $f(x) = x^2 - 2$ and $g(y) = y^2 - 18$. In this case, $\mathcal{F}(z, t, u)$ is the determinant of

$$\begin{pmatrix} z & -t & -u & 0 \\ -2t & z & 0 & -u \\ -18u & 0 & z & -t \\ 0 & -18u & -2t & z \end{pmatrix}$$

which can be found (the calculation is a variation of the one in Example 1) to be $z^4 - (4t^2 + 36u^2)z^2 + 4t^4 - 72t^2u^2 + 324u^4$. The factorization $\mathcal{F}(z, t, u) = \mathcal{F}_1(z, t, u)\mathcal{F}_2(z, t, u)$ can be found by completing the square to put $\mathcal{F}$ in the form $\mathcal{F}(z, t, u) = (z^2 - 2t^2 - 18u^2)^2 - 144t^2u^2 = (z^2 - 2t^2 - 18u^2 - 12tu)(z^2 - 2t^2 - 18u^2 + 12tu)$. The factor of $f(x) \bmod g(y)$ corresponding to $\mathcal{F}_1(z, t, u)$ is, because in this case $\mu_1 = \deg \mathcal{F}_1/\deg g = 1$, the coefficient of $u$ in $(x + uy)^2 - 2 - 18u^2 - 12u$ divided by $g'(y)$, which is

$$\frac{2xy - 12}{2y} = \frac{2xy^2 - 12y}{2y^2} \equiv \frac{36x - 12y}{36} = x - \frac{y}{3} \bmod (y^2 - 18).$$

(As expected, the coefficient of $u^2$, which is $y^2 - 18$, is zero mod $g(y)$.) In the same way, the factor of $f(x) \bmod g(y)$ corresponding to $\mathcal{F}_2$ is $x + \frac{1}{3}y$. Indeed, $(x - \frac{1}{3}y)(x + \frac{1}{3}y) = x^2 - \frac{1}{9}y^2 \equiv x^2 - 2 \bmod (y^2 - 18)$, so $f(x) = x^2 - 2$ splits mod $g(y) = y^2 - 18$ into linear factors. (If $y = \sqrt{18}$, then $\frac{y}{3} = \sqrt{2}$.)
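The characteristic polynomials quoted in Examples 1 and 2 can be spot-checked numerically. The following is only a minimal sketch, not part of the algorithm as the text states it: it assumes $f(x) = x^2 - a$ and $g(y) = y^2 - b$, builds the matrix of multiplication by $tx + uy$ on the basis $1, x, y, xy$, and compares $\det(zI - M)$ with the stated formulas at integer points. The helper names (`det`, `mult_matrix`, `char_poly_value`) are hypothetical and chosen for illustration.

```python
from itertools import product


def det(m):
    """Determinant of a square integer matrix by cofactor expansion along row 0."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for col, entry in enumerate(m[0]):
        if entry == 0:
            continue
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * entry * det(minor)
    return total


def mult_matrix(t, u, a, b):
    """Matrix of multiplication by t*x + u*y on the basis 1, x, y, xy,
    using the relations x^2 = a and y^2 = b (assumed form of f and g)."""
    return [
        [0,     t,     u,     0],
        [a * t, 0,     0,     u],
        [b * u, 0,     0,     t],
        [0,     b * u, a * t, 0],
    ]


def char_poly_value(z, t, u, a, b):
    """Value of det(zI - M) at the integers z, t, u."""
    m = mult_matrix(t, u, a, b)
    shifted = [[(z if r == c else 0) - m[r][c] for c in range(4)] for r in range(4)]
    return det(shifted)


def example1_formula(z, t, u):
    """Polynomial stated in Example 1 (a = 2, b = 3)."""
    return z**4 - (4 * t**2 + 6 * u**2) * z**2 + 4 * t**4 - 12 * t**2 * u**2 + 9 * u**4


def example2_factored(z, t, u):
    """Product of the two quadratic factors stated in Example 2 (a = 2, b = 18)."""
    q = z**2 - 2 * t**2 - 18 * u**2
    return (q - 12 * t * u) * (q + 12 * t * u)


points = list(product(range(-3, 4), repeat=3))
example1_ok = all(char_poly_value(z, t, u, 2, 3) == example1_formula(z, t, u)
                  for z, t, u in points)
example2_ok = all(char_poly_value(z, t, u, 2, 18) == example2_factored(z, t, u)
                  for z, t, u in points)
print(example1_ok, example2_ok)
```

Agreement on a grid of integer points is a consistency check rather than the factorization algorithm itself; the irreducibility argument of Example 1 and the completion of the square in Example 2 still have to be carried out as described in the text.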



Example 3. $f(x) = x^2 + c_1x + c_2$, $g(y) = y^2 - c_1^2 + 4c_2$. The factorization depends on factoring the characteristic polynomial of

$$\begin{pmatrix} 0 & t & u & 0 \\ -c_2t & -c_1t & 0 & u \\ du & 0 & 0 & t \\ 0 & du & -c_2t & -c_1t \end{pmatrix}$$

where $d = c_1^2 - 4c_2$. The computation of this characteristic polynomial is not too onerous. (One method is to use the formula $z^4 - A_1z^3 + A_2z^2 - A_3z + A_4$ for the characteristic polynomial, where $A_i$ is the sum of the $i \times i$ principal minors of the matrix.) The result is

$$\mathcal{F}(z, t, u) = z^4 + 2c_1tz^3 + (2c_2t^2 + c_1^2t^2 - 2du^2)z^2 + c_1t(2c_2t^2 - 2du^2)z + c_2^2t^4 + (2c_2d - c_1^2d)t^2u^2 + d^2u^4,$$

a homogeneous polynomial in $z$, $t$, and $u$ with coefficients in $\mathbf{Z}[c_1, c_2]$ when $c_1^2 - 4c_2$ is substituted for $d$. The difficult step is the factorization of $\mathcal{F}(z, t, u)$. When $t = 0$ it is $z^4 - 2du^2z^2 + d^2u^4 = (z^2 - du^2)^2$, and when $u = 0$ it is $z^4 + 2c_1tz^3 + (2c_2 + c_1^2)t^2z^2 + 2c_1c_2t^3z + c_2^2t^4 = (z^2 + c_1tz + c_2t^2)^2$. Therefore, the factorization of $\mathcal{F}$, if there is one, must be of the form $(z^2 + c_1tz + c_2t^2 + ptu - du^2)(z^2 + c_1tz + c_2t^2 + qtu - du^2)$, where $p$ and $q$ are in $\mathbf{Z}[c_1, c_2]$. The coefficient of $t^3u$ is 0 on the one hand and $c_2(p + q)$ on the other, so $p = -q$. Then the coefficient of $t^2u^2$ is $2c_2$

[...]

$\phi_k(x, y) \bmod g(y)$ and $g(y) \equiv \psi_1(x, y)\psi_2(x, y)\cdots\psi_l(x, y) \bmod f(x)$ be the factorizations of each modulo the other. Then $k = l$, and the factors can be so ordered that $\deg_x \phi_i/\deg_y \psi_i = \deg f/\deg g$ for each $i$.

Proof. To factor $g(y) \bmod f(x)$, one constructs the characteristic polynomial of the matrix $tM_y + uM_x$ for which $(tM_y + uM_x)C \equiv C(ty + ux) \bmod (g(y), f(x))$, where $C$ is the column matrix of length $mn$ that contains the monomials $y^\alpha x^\beta$ in which $0 \le \alpha < n = \deg g$ and $0 \le \beta < m = \deg f$. Because this characteristic polynomial is independent of the order chosen for the entries of $C$, it is clear that it is equal to $\mathcal{F}(z, u, t)$. The factorization algorithm proves that $k = l$ is the number of irreducible factors of $\mathcal{F}(z, t, u)$, and the integers $\deg f \deg_y \psi_i = \deg g \deg_x \phi_i$ are the degrees of those factors.

In addition to its aesthetic appeal, this theorem is a powerful tool. See Essay 2.1, where it is used in the proof of Galois's fundamental theorem.

Inevitably, some readers will object that the algorithm is impractical. The construction of $\mathcal{F}(z, t, u)$ is already a formidable task, and the factorization of this polynomial in three indeterminates with integer coefficients is even more daunting. But the practicality of the algorithm is irrelevant, because its purpose is to prove the existence of the factorization, not to effect it. Once the factorization is known to exist, methods for constructing it can be addressed. A similar situation occurs in the case of the fundamental theorem of algebra (see Essay 5.1); Newton's method is in most cases the best way to construct the roots of a polynomial, but other methods are needed to prove that there are roots to be constructed.

Kronecker emphasized the importance of the problem of factoring $f(x) \bmod g(y)$ in a footnote to his 1887 paper "Über den Zahlbegriff" [43, p. 262 of vol. IIIa of the republication in Mathematische Werke]. In 1881, he had already described an algorithm for such factorizations in the following way:

"It can be assumed that $f(x)$ has no repeated factors, because otherwise one could free it of repeated factors by dividing it by its greatest common divisor with its derivative. One sets $z + uy$ in place of $x$ in $f(x)$, where $u$ is an indeterminate; at the same time, one treats $f$ itself as a function of $x$ and the algebraic quantity $y$, which may figure in its coefficients. Therefore, denote $f$ by $f(x, y)$ and form the product of all the conjugate expressions $f(z + uy, y)$, that is, all of them that arise when $y$ is replaced by its conjugate values. This product is a polynomial in $z$ whose coefficients are rational functions in $c_1, c_2, \ldots, c_\nu$ [the presence of $u$ in the coefficients is ignored] and therefore, as has been shown, can be decomposed into irreducible factors. If these factors are $F_1(z), F_2(z), \ldots$, then, as is easy to see, the greatest common divisors of $f(z + uy, y)$ and $F_i(z)$ for $i = 1, 2, \ldots$ give the irreducible factors of $f(z + uy, y)$, from which the irreducible factors of $f(x)$ itself can be found when $x - uy$ is substituted for $z$. It remains to remark that substitution of $z + uy$ for $x$ ensures that $y$ actually occurs in the coefficients of $f$."†

* I called this theorem "Dedekind's reciprocity theorem" in Galois Theory [18, p. 66], but I have since learned that it already had the name "Kronecker-Kneser theorem" (see [8] and [50]). Richard Dedekind discovered it in 1855, but the discovery was not published until Scharlau's paper [59] appeared in 1982. Kronecker included the theorem in his university lectures ([32, p. 309]). He might have known of Dedekind's work, but since he does not seem to have cited Dedekind, he probably discovered the theorem independently. The first publication of the theorem was by A. Kneser [36].

† Dabei kann angenommen werden, dass die Function $f(x)$ keine gleichen Factoren enthält; denn anderenfalls würde man dieselbe von gleichen Factoren dadurch befreien können, dass man sie durch den grössten Theiler, den die Function $f(x)$ mit ihrer Ableitung gemein hat, dividirt. Man setze nun zuvörderst $z + uy$ an Stelle von $x$ in $f(x)$, wo $u$ eine unbestimmte Grösse bedeutet; man betrachte ferner $f$ selbst als Function von $x$ und der zum Rationalitäts-Bereich gehörigen algebraischen Grösse $y$, welche also auch in den Coefficienten vorkommen kann, bezeichne demnach die Function $f$ durch $f(x, y)$ und bilde das Product aller mit einander conjugirten Ausdrücke $f(z + uy, y)$, d. h. aller derjenigen, welche entstehen, wenn man die mit $y$ conjugierten algebraischen Grössen an Stelle von $y$ setzt. Dieses Product ist eine ganze Function von $z$, deren Coefficienten rationale Functionen der Variabeln $c_1, c_2, \ldots, c_\nu$ sind, kann also nach dem Vorhergehenden in irreductible Factoren zerlegt werden. Sind diese Factoren: $F_1(z), F_2(z), \ldots$, so bilden, wie leicht zu sehen, die grössten gemeinschaftlichen Theiler von $f(z + uy, y)$ und $F_h(z)$ für $h = 1, 2, \ldots$ die irreductibeln Factoren von $f(z + uy, y)$, aus denen die Factoren von $f(x)$ selbst unmittelbar hervorgehen, wenn wieder $x - uy$ an Stelle von $z$ gesetzt wird. Es ist noch zu bemerken, dass die Einführung von $z + uy$ an Stelle von $x$ zu dem Zwecke erfolgt ist, das Vorkommen von $y$ in den Coefficienten zu sichern. (From §4 of [39]. The translation above is somewhat free, and Kronecker's notation $F$, $\mathfrak{R}$, $\mathfrak{R}'$, $\mathfrak{R}''$, $\mathfrak{R}'''$, ..., has been changed to $f$, $y$, $c_1$, $c_2$, ..., $c_\nu$ to agree with the notation of these essays.)

Essay 1.7 About the Factorization Algorithm


My discussion of this subject in [18, §§60-61] shows that I found the exact algorithm Kronecker had in mind, not to mention its validity, far from "easy to see." In retrospect, however, I do see that it is essentially the algorithm of Essay 1.5. Instead of factoring a polynomial in $x$ alone as in Essay 1.5, Kronecker changes $f(x)$ to $f(z + uy)$, where $u$ is an indeterminate, in order to be sure that the polynomial to be factored does involve $y$. He then forms the "product of its conjugates," by which he surely means (see his §2) the norm of $f(z + uy)$ as a polynomial with coefficients in the root field of $g(y)$, which is to say that it is plus or minus the constant term of the polynomial of which $f(z + uy)$ is a root. The polynomial of which $f(z + uy)$ is a root is the characteristic polynomial of the matrix $M$ of elements of $R[z, u]$ defined by $C \cdot f(z + uy) \equiv MC \bmod g(y)$, where $C$ is the column matrix with entries $1, y, y^2, \ldots, y^{n-1}$ ($n$ being the degree of $g$). Thus, Kronecker's $F_1(z), F_2(z), \ldots$ are the irreducible factors of the constant term of the characteristic polynomial of this $M$. But this is $\pm\det M$, and $M$ is the $n \times n$ matrix $f(zI + uG)$, where $G$ is the matrix determined by $g(y)$ as in Essay 1.6. Thus, the $F_i(z)$ are the irreducible factors of $\det f(zI + uG)$. Since, as was shown in Essay 1.6, $\det f(zI + uG) = \mathcal{F}(z, 1, -u)$, he is saying that the desired irreducible factors $\phi_i(x, y)$ are the greatest common divisors of $f(z + uy)$ with $\mathcal{F}_1(z, 1, -u), \mathcal{F}_2(z, 1, -u), \ldots$, or, better, the greatest common divisors of $f(x)$ with $\mathcal{F}_1(x - uy, 1, -u), \mathcal{F}_2(x - uy, 1, -u), \ldots$. When one changes the sign of $u$ and notes that a common divisor of $f(x)$ and $\mathcal{F}_i(x + uy, 1, u)$ must be independent of $u$ and must therefore divide all coefficients of $\mathcal{F}_i(x + uy, 1, u)$, Kronecker's claim becomes the statement that $\phi_i(x, y)$ is the greatest common divisor of $f(x)$ and the coefficients of $\mathcal{F}_i(x + uy, 1, u)$ when it is expanded in powers of $u$. Now, $\phi_i(x, y)$ is the greatest common divisor of $f(x)$ and the leading coefficient $\psi_i(x, y) \bmod g(y)$ in this expansion, so his claim comes down to the statement that $\phi_i(x, y)$ divides all the other coefficients of $\mathcal{F}_i(x + uy, 1, u)$ when they are regarded as polynomials in $x$ with coefficients in the root field of $g(y)$.

Proposition. As a polynomial in $x$ and $u$ with coefficients in the root field of $g(y)$, $\mathcal{F}_i(x + uy, 1, u)$ is divisible by $\phi_i(x, y)$.

Proof. Let $\mathcal{K}$ be the field $K[x, y] \bmod (\phi_i(x, y), g(y))$, which is the ring of polynomials in $x$ with coefficients in the root field of $g(y)$ modulo the irreducible polynomial $\phi_i(x, y)$ with coefficients in this root field. (As before, $K$ denotes the field of rational functions in $c_1, c_2, \ldots, c_\nu$.) Since $f(x)$ is 0 as a polynomial with coefficients in $\mathcal{K}$ (because $f(x)$ is divisible by $\phi_i(x, y) \bmod g(y)$), and since $\mathcal{F}(z, t, u) \equiv 0 \bmod (f(x), g(y))$ as was shown in Essay 1.6, $\mathcal{F}(z, t, u)$ is zero as a polynomial with coefficients in $\mathcal{K}$. Therefore at least one of its factors $\mathcal{F}_j(z, t, u)$ must be zero as a polynomial with coefficients in $\mathcal{K}$. For any such value of $j$, $\mathcal{F}_j(x + uy, 1, u)$ must be zero as a polynomial in $u$ with coefficients in $\mathcal{K}$. In particular, $\psi_j(x, y)$ must be zero as an element of $\mathcal{K}$, which is to say that $\psi_j(x, y)$ is divisible by $\phi_i(x, y) \bmod g(y)$. But $\psi_j(x, y)$



is a unit times $\phi_j(x, y)$ and the irreducible factors $\phi_j(x, y)$ of $f(x) \bmod g(y)$ are distinct because $f(x)$ is irreducible. Therefore, $\psi_j(x, y)$ is not divisible by $\phi_i(x, y) \bmod g(y)$ unless $j = i$, and the proposition follows.

Essay 1.8 Proof of the Fundamental Theorem


As before, $R$ will denote the ring $\mathbf{Z}[c_1, c_2, \ldots, c_\nu]$ of polynomials in $c_1, c_2, \ldots, c_\nu$ with integer coefficients and $K$ will denote its field of quotients, the field of rational functions of $c_1, c_2, \ldots, c_\nu$. When $\nu = 0$, $R$ is the ring of integers and $K$ is the field of rational numbers. The theorem to be proved was stated in Essay 1.2:

Fundamental Theorem. Given a polynomial $f(x) = a_0x^n + a_1x^{n-1} + \cdots + a_n$ of positive degree $n$ with coefficients in $R$, construct a monic, irreducible polynomial $g(y)$ with coefficients in $R$ with the property that $f(x)$ is a product of linear factors with coefficients in the root field of $g(y)$.

In other words, when the factors of $f(x) \bmod g(y)$ are taken to be monic in $x$, the factorization is to have the form $f(x) \equiv a_0(x - \rho_1(y))(x - \rho_2(y))\cdots(x - \rho_n(y)) \bmod g(y)$, where $a_0$ is the leading coefficient of $f(x)$ and the $\rho_i(y)$ are elements of the root field of $g(y)$. Such a polynomial $g(y)$ will be said to split $f(x)$. As the proposition at the end of Essay 1.4 implies, the roots $\rho_i(y)$ are determined, as elements of the root field of $g(y)$, by $f(x)$. Loosely speaking, the root field of $g(y)$ extends computations in $K$ in such a way that the given $f(x)$ with coefficients in $R$ splits into linear factors.

The factorization algorithm of the preceding essays, which assumes that $f(x)$ is monic and irreducible, can be used to factor an arbitrary $f$ by taking the change of variable $x_1 = a_0x$ and writing $a_0^{n-1}f(x) = x_1^n + a_1x_1^{n-1} + a_0a_2x_1^{n-2} + \cdots + a_0^{j-1}a_jx_1^{n-j} + \cdots + a_0^{n-1}a_n$. A factorization of $a_0^{n-1}f(x)$ as a polynomial in $x_1$ becomes a factorization of $f(x)$ as a polynomial in $x$ when it is divided by the nonzero element $a_0^{n-1}$ of $K$ and the substitution $x_1 = a_0x$ is made. In this way, the theorem is reduced to the case in which $f(x)$ is monic.
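The reduction to the monic case can be illustrated with integer coefficients. This is a minimal sketch under the assumption that the coefficients are ordinary integers (the case $\nu = 0$); the function names `monic_transform` and `evaluate` are hypothetical. It computes the coefficients of $a_0^{n-1}f(x)$ as a monic polynomial in $x_1 = a_0x$ and checks the identity at a few integer values of $x$.

```python
def monic_transform(coeffs):
    """Given [a0, a1, ..., an] with a0 != 0 for f(x) = a0*x^n + ... + an,
    return the coefficients [1, a1, a0*a2, ..., a0^(n-1)*an] of
    a0^(n-1) * f(x) written as a monic polynomial in x1 = a0 * x."""
    a0 = coeffs[0]
    if a0 == 0:
        raise ValueError("leading coefficient must be nonzero")
    return [1] + [a0 ** (j - 1) * aj for j, aj in enumerate(coeffs) if j >= 1]


def evaluate(coeffs, x):
    """Evaluate a polynomial given by its coefficients (highest degree first)
    using Horner's rule."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value


f = [3, 0, -1, 4]               # f(x) = 3x^3 - x + 4
g = monic_transform(f)          # monic polynomial in x1 = 3x
n = len(f) - 1

# a0^(n-1) * f(x) should equal g(a0 * x) for every x.
identity_holds = all(
    f[0] ** (n - 1) * evaluate(f, x) == evaluate(g, f[0] * x)
    for x in range(-5, 6)
)
print(g, identity_holds)
```

For $f(x) = 3x^3 - x + 4$ the transformed polynomial is $x_1^3 - 3x_1 + 36$, in agreement with $9f(x) = (3x)^3 - 3(3x) + 36$.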
The iteration theorem below proves this case of the theorem using the factorization algorithm for monic, irreducible polynomials, which obviously implies a factorization algorithm for arbitrary monic polynomials. This theorem differs from Kronecker's theorem in Ein Fundamentalsatz der allgemeinen Arithmetik [42] in that it specifies that the splitting field is to be described as the root field of $g(y)$, whereas Kronecker left the form of the description open and in fact preferred a "prime module system" of an altogether different type. Nor is the proof below similar to Kronecker's, which constructed specific relations satisfied by the roots in a splitting field. Instead, it constructs a splitting polynomial $g(y)$ for $f(x)$ in an iterative way that follows the naive proof sketched at the beginning of Essay 1.4.

Iteration Theorem. Given a monic polynomial $f(x)$ with coefficients in $R$, and given a monic, irreducible polynomial $g(y)$ with coefficients in $R$ that does not split $f(x)$, construct a monic, irreducible polynomial $h(z)$ with coefficients in $R$ for which the factorization of $f(x) \bmod h(z)$ contains more linear factors than does the factorization of $f(x) \bmod g(y)$.



Proof. The factorization of $f(x) \bmod g(y)$ is accomplished by applying the factorization algorithm to each of the monic, irreducible factors of $f(x)$ and taking the product of the results. By assumption, at least one of the irreducible factors of $f(x) \bmod g(y)$ obtained in this way has degree greater than 1. With the notation as before, at least one of the polynomials $\mathcal{F}(z, t, u)$ used in the factorization of $f(x) \bmod g(y)$ (there is an $\mathcal{F}$ for each irreducible factor of $f(x)$) must, by assumption, have at least one factor $\mathcal{F}_i(z, t, u)$ that gives rise to a monic factor $\phi_i(x, y)$ of $f(x) \bmod g(y)$ of degree greater than 1. Let $\mathcal{F}_i(z, t, u)$ be such a factor, and let

[...]

$\psi_j(x)$ is defined by (3)

X 0 , , , ( x ) = X ; a r a ? •••

Proposition 3. Each element of $R$ is congruent mod $A$ to one and only one element in canonical form.

Proof. Let $T_i = \phi_{i,0}(a_i) + \phi_{i,1}(a_i)c_1 + \phi_{i,2}($ [...] $i$ or $c_j$ for $j > n - i + 1$. Moreover, $T_i \equiv 0 \bmod A$ (because $f_i(a_i) \equiv 0$).

Division of a given element $\phi$ of $R$ by $T_n = a_n + \cdots$ regarded as a monic polynomial in $a_n$ leaves a remainder that is congruent to $\phi \bmod A$, call it $\phi_1$, from which $a_n$ has been eliminated. Division of $\phi_1$ by $T_{n-1} = a_{n-1}^2 + \cdots$ regarded as a monic polynomial in $a_{n-1}$ leaves a remainder that is congruent to $\phi_1 \equiv \phi \bmod A$, call it $\phi_2$, in which the degree of $a_{n-1}$ is at most 1 and $a_n$ has not been reintroduced. Continuing in this way (on the $i$th step dividing $\phi_{i-1}$ by $T_{n+1-i}$ regarded as a monic polynomial in $a_{n+1-i}$ and calling the remainder $\phi_i$) produces a sequence $\phi = \phi_0, \phi_1, \phi_2, \ldots, \phi_n$ of polynomials congruent to $\phi \bmod A$. Since the degree in $a_i$ is reduced to at most $n - i$ by the $(n + 1 - i)$th step and is not increased by any subsequent step, $\phi_n$ is in canonical form. Thus, every element $\phi$ of $R$ is congruent mod $A$ to an element $\phi_n$ in canonical form.

Any element $\psi$ of $R$ is congruent mod $A$ to an element from which the $c$'s have been eliminated, because division of $\psi$ by $T_1$ regarded as a monic polynomial in $c_n$ leaves a remainder that does not contain $c_n$, then division of this remainder by $T_2$ regarded as a monic polynomial in $c_{n-1}$ leaves a remainder that contains neither $c_{n-1}$ nor $c_n$, and so forth. (In other words, at the $i$th step, one substitutes $c_{n+1-i} - T_i$ in place of $c_{n+1-i}$ to obtain an element that is unchanged mod $A$ in which $c_{n+1-i}$ is no longer present and no $c$ with a larger index is reintroduced.)

When the input to the first algorithm is a polynomial in $a_1, a_2, \ldots, a_n$ alone and the input to the second algorithm is a polynomial in canonical form, these two algorithms are inverse to one another and establish a one-to-one correspondence between polynomials in $a_1, a_2, \ldots, a_n$ and polynomials in canonical form in which corresponding polynomials are congruent mod $A$, because the first algorithm produces a sequence of equations

$$\phi = Q_1T_n + \phi_1,$$
$$\phi_1 = Q_2T_{n-1} + \phi_2,$$
$$\cdots$$
$$\phi_{n-1} = Q_nT_1 + \phi_n,$$

in which, when $\phi$ contains none of $c_1, c_2, \ldots, c_n$, $\phi_i$ contains none of $c_{i+1}, c_{i+2}, \ldots, c_n$ for $i = 1, 2, \ldots, n - 1$, and $\phi_n$ is in canonical form. Thus, $\phi_i$ is the remainder when $\phi_{i+1}$ is divided by $T_{n-i}$ regarded as a monic polynomial in $c_{i+1}$, as was to be shown.

To say that a polynomial $\phi$ in $a_1, a_2, \ldots, a_n$ alone is congruent to zero mod $A$ means that it has the form $\phi = \sum D_jA_j$; this means that $\phi = 0$ because


2 Topics in Algebra

substitution of $(-1)^i\sigma_i$ for $c_i$ in this equation leaves the left side unchanged and makes the right side zero. Thus, two elements of $R$ in canonical form are congruent mod $A$ if and only if the two polynomials in $a_1, a_2, \ldots, a_n$ alone to which they correspond are congruent mod $A$, which is true if and only if these two polynomials are equal. This shows that two elements of $R$ in canonical form are congruent mod $A$ only if they are equal and completes the proof of the proposition.

Proof of the Theorem. The formula $x^n + c_1x^{n-1} + c_2x^{n-2} + \cdots + c_n \equiv (x - a_1)(x - a_2)\cdots(x - a_n) \bmod A$ gives an explicit factorization of this polynomial with indeterminate coefficients into linear factors over an integral domain that contains $\mathbf{Z}[c_1, c_2, \ldots, c_n]$ as a subring. (Elements of $R$ that do not contain $a_1, a_2, \ldots, a_n$ are in canonical form, so they are congruent mod $A$ only if they are equal.) Thus, the field of quotients of this integral domain is a splitting field of the polynomial. Since it is generated over $\mathbf{Z}[c_1, c_2, \ldots, c_n]$ by the roots of the polynomial in the ring, it is a minimal splitting field and is therefore the splitting field.

Corollary 1. The Galois group of $x^n + c_1x^{n-1} + c_2x^{n-2} + \cdots + c_n$, where the $c$'s are indeterminates, permutes the roots in the splitting field in all $n!$ possible ways.

Deduction. Each element of the integral domain $R \bmod A$ has a unique representation in the form $\sum B_{e_1,e_2,\ldots,e_n}(c_1, c_2, \ldots, c_n)a_1^{e_1}a_2^{e_2}\cdots a_n^{e_n}$, where the coefficients $B_{e_1,e_2,\ldots,e_n}(c_1, c_2, \ldots, c_n)$ are in $\mathbf{Z}[c_1, c_2, \ldots, c_n]$ and the monomials $a_1^{e_1}a_2^{e_2}\cdots a_n^{e_n}$ range over all $n!$ such monomials in which $e_i \le n - i$ for each $i$, which shows that the degree of the splitting field as an extension of the field of rational functions in $c_1, c_2, \ldots, c_n$ is $n!$ and therefore that $n!$ is the order of the Galois group.

Corollary 2. A polynomial in $a_1, a_2, \ldots, a_n$ that is unchanged by all $n!$ permutations of $a_1, a_2, \ldots, a_n$ has one and only one representation as a polynomial with integer coefficients in the elementary symmetric polynomials $\sigma_1, \sigma_2, \ldots, \sigma_n$ in $a_1, a_2, \ldots, a_n$.

Deduction. Let $\phi$ be a given polynomial in $a_1, a_2, \ldots, a_n$ that is unchanged by permutations of the $a$'s. When $\phi$ is regarded as an element of $R = \mathbf{Z}[a_1, a_2, \ldots, a_n, c_1, c_2, \ldots, c_n]$ it is congruent mod $A$ to one and only one element in canonical form; call it $\phi_n$. Since $\phi$ and $A_1, A_2, \ldots, A_n$ are unchanged by permutations of the $a$'s, $\phi_n$ is unchanged by permutations of the $a$'s. Since $\phi_n$ does not contain $a_n$, it cannot contain any $a$. Therefore $\phi_n$ is a polynomial $\Phi(c_1, c_2, \ldots, c_n)$ in the $c$'s alone. Since it is congruent mod $A$ to $\phi$ and to no other element of $\mathbf{Z}[a_1, a_2, \ldots, a_n]$, it follows that $\phi$ is equal to $\Phi(-\sigma_1, \sigma_2, \ldots, (-1)^n\sigma_n)$ and to no other polynomial in the $\sigma$'s with integer coefficients.
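A small numerical instance of Corollary 2 may help fix the sign convention $c_i = (-1)^i\sigma_i$. The sketch below is an illustration under stated assumptions, not the division algorithm of the proof: the symmetric polynomial $\phi = a_1^2 + a_2^2 + a_3^2$ and the candidate $\Phi(c_1, c_2, c_3) = c_1^2 - 2c_2$ are chosen by hand, and the code only checks $\phi = \Phi(-\sigma_1, \sigma_2, -\sigma_3)$ at integer points. All function names are hypothetical.

```python
from itertools import combinations, product
from math import prod


def elementary_symmetric(values, k):
    """k-th elementary symmetric polynomial of the given values
    (sum of all products of k distinct entries)."""
    return sum(prod(combo) for combo in combinations(values, k))


def phi(values):
    """The symmetric polynomial a1^2 + a2^2 + a3^2 (hand-picked example)."""
    return sum(v * v for v in values)


def big_phi(c1, c2, c3):
    """Hand-picked candidate Phi(c1, c2, c3) = c1^2 - 2*c2 in the c's alone."""
    return c1 * c1 - 2 * c2


def substituted(values):
    """Phi(-sigma_1, sigma_2, -sigma_3), i.e. Phi with c_i = (-1)^i sigma_i."""
    s1, s2, s3 = (elementary_symmetric(values, k) for k in (1, 2, 3))
    return big_phi(-s1, s2, -s3)


# Compare both sides on a grid of integer triples (a1, a2, a3).
matches = all(phi(a) == substituted(a) for a in product(range(-4, 5), repeat=3))
print(matches)
```

The check confirms the familiar identity $a_1^2 + a_2^2 + a_3^2 = \sigma_1^2 - 2\sigma_2$; the uniqueness asserted by the corollary is not something a numerical test of this kind can establish.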

Essay 2.4 The Splitting Field of $x^n + c_1x^{n-1} + c_2x^{n-2} + \cdots + c_n$


Example. The adjunction relations that describe the splitting field of $x^5 + c_1x^4 + c_2x^3 + c_3x^2 + c_4x + c_5$, where $c_1, c_2, \ldots, c_5$ are indeterminates, are $a_1^5 + c_1a_1^4 + c_2a_1^3 + c_3a_1^2 + c_4a_1 + c_5 = 0$, $(a_2^4 + a_2^3a_1 + a_2^2a_1^2 + a_2a_1^3 +$

[...]

can be expressed as a polynomial in $c_0, c_1, \ldots, c_{m+n}$ with integer coefficients, as was to be shown. See Part 0 of [19] and [23, Nr. 20] for fuller accounts and other references.

Some Quadratic Problems

Essay 3.1 The Problem $A\square + B = \square$ and "Hypernumbers"

The problem that motivates the study of "hypernumbers" in the next few essays comes from the prehistory of mathematics. In "The Measurement of the Circle," Archimedes states that $\frac{265}{153} < \sqrt{3}$ and $\frac{1351}{780} > \sqrt{3}$ without giving any derivation. The closeness of these approximations becomes clear when one compares $265^2 = 70225$ to $3 \cdot 153^2 = 70227$ in the case of the first and $1351^2 = 1825201$ to $3 \cdot 780^2 = 1825200$ in the case of the second to find that $265^2 + 2 = 3 \cdot 153^2$ and $1351^2 = 3 \cdot 780^2 + 1$. There have been many attempts to guess how Archimedes might have derived these estimates. One can be certain that they were not found by trial and error; very probably, they involve some analogue of what is today called the continued fractions algorithm, but there is no documentary evidence on which to base such speculations.

A similar problem is treated in earlier Greek mathematics. As early as the time of Pythagoras, Greek mathematicians are said to have derived* an entire sequence of approximations to $\sqrt{2}$ in the form of "side and diagonal" numbers. If $d$ is the length of the diagonal of a square and $s$ is the length of its side, then $d^2 = 2s^2$ by the Pythagorean theorem. The followers of Pythagoras are thought to have discovered that there are no whole-number solutions $(d, s)$ of this equation, and to have been very dismayed to learn that numbers, in the simplest sense, are not sufficient for the description of this simple geometrical construction. But their study of the problem probably went well beyond the impossibility of $d^2 = 2s^2$ in whole numbers to the following sequence of approximate solutions $d^2 = 2s^2 \pm 1$. A solution $(d_n, s_n)$ of $d^2 = 2s^2 \pm 1$ implies a solution $(d_{n+1}, s_{n+1})$ of

[...]

$2s^2$ implies $(2s + d)^2 < 2(s + d)^2$ and $2(s + d)^2 - (2s + d)^2 = d^2 - 2s^2$; in the same way, $d^2 < 2s^2$ implies $(2s + d)^2 - 2(s + d)^2 = 2s^2 - d^2$.


this essay, and Bhascara Acharya, in the 12th century, mentioned the spectacular fact that the smallest number $x$ for which $61x^2 + 1$ is a square is $x = 226153980$. Problems similar to this one are connected with the famous "cattle problem" of Archimedes [49], causing scholars to believe that Archimedes knew far more about such number-theoretic problems than our usual view of Greek mathematics as being primarily geometrical would lead us to expect.

These problems will be studied in this group of essays in the form of the problem that will be indicated by the symbolic equation $A\square + B = \square$, which is to say the problem "Given numbers $A$ and $B$, find numbers $x$ for which $Ax^2 + B$ is a square," say $Ax^2 + B = y^2$. (At first glance, Archimedes' solution $265^2 + 2 = 3 \cdot 153^2$ does not appear to be an instance of this problem, because $A = 3$ and $B = 2$ are on opposite sides of the equation, but if the equation is multiplied by 3, it becomes $3 \cdot 265^2 + 6 = (3 \cdot 153)^2$, which is a solution of $3\square + 6 = \square$. Conversely, in any solution $(x, y)$ of $3x^2 + 6 = y^2$, $y$ must be divisible by 3, say $z = y/3$, and division of $3x^2 + 6 = 9z^2$ by 3 gives a solution of $x^2 + 2 = 3z^2$.) Because the solution of this problem is easy when $A$ is a square,* the case in which $A$ is a square will be ignored.

Brahmagupta stated (but in words, not as an algebraic formula) the crucial tool that is used in the solution of $A\square + B = \square$. It is the observation† that a solution of $A\square + B = \square$ can be combined with a solution of $A\square + C = \square$ to find a solution of $A\square + BC = \square$. Specifically, if $Ax^2 + B = y^2$ and $Au^2 + C = v^2$, then $A(xv + yu)^2 + BC = (Axu + yv)^2$. It seems likely that some version of this remarkable fact was known in Greek times and that it was involved in the calculation of approximations to square roots.
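Brahmagupta's composition rule is easy to try out on small numbers. The following is a minimal sketch, assuming integer inputs; the representation of a solution as a triple `(x, y, B)` with $Ax^2 + B = y^2$ and the names `compose` and `is_solution` are illustrative choices, not notation from the text. It starts from $3 \cdot 1^2 + 1 = 2^2$ and composes repeatedly.

```python
def compose(A, sol1, sol2):
    """Combine (x, y, B) with A*x^2 + B = y^2 and (u, v, C) with A*u^2 + C = v^2
    into a solution (X, Y, B*C) of A*X^2 + B*C = Y^2, using
    X = x*v + y*u and Y = A*x*u + y*v."""
    x, y, B = sol1
    u, v, C = sol2
    return (x * v + y * u, A * x * u + y * v, B * C)


def is_solution(A, sol):
    """Check that A*x^2 + B == y^2 for the triple (x, y, B)."""
    x, y, B = sol
    return A * x * x + B == y * y


A = 3
base = (1, 2, 1)                 # 3*1^2 + 1 = 2^2

# Compose the base solution with itself repeatedly.
chain = [base]
for _ in range(5):
    chain.append(compose(A, chain[-1], base))

# Combine one member of the chain with 3*1^2 + 6 = 3^2.
with_six = compose(A, chain[3], (1, 3, 6))

print(chain)
print(with_six, all(is_solution(A, s) for s in chain + [with_six]))
```

Under these assumptions the chain reaches $(780, 1351, 1)$ after five compositions, and combining $(56, 97, 1)$ with $(1, 3, 6)$ gives $(265, 459, 6)$, that is, $3 \cdot 265^2 + 6 = (3 \cdot 153)^2$, which matches the two estimates attributed to Archimedes.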
* When $A$ is a square, the problem is to write the given $B$ in the form $s^2 - t^2 = (s - t)(s + t)$, where $t$ is a multiple of the square root of $A$. Thus $s - t = B_1$ and $s + t = B_2$, where $B = B_1B_2$ is one of the finite set of factorizations of $B$ in which $B_1 < B_2$. Since $B_1 + B_2 = 2s$ is even, the problem is thus to find all factorizations $B_1B_2 = B$, if any, in which $B_1 < B_2$, $B_1 \equiv B_2 \bmod 2$, and $(B_2 - B_1)/2$ is a multiple of the square root of $A$. For each of them, $\left(\frac{B_2 + B_1}{2}\right)^2 = \left(\frac{B_2 - B_1}{2}\right)^2 + B$ is a solution, and there are no others.

† See [10, p. 363]. The proof, using modern algebraic notation, is a simple calculation. How Brahmagupta might have proved it without algebraic notation (or how he might have known it is true) is a mystery. Certainly Euclid's Proposition 10 of Book 2 (see note above) indicates a Greek awareness of a similar phenomenon many centuries earlier, but there is no reason to suppose that the Greeks were the first.

Archimedes' approximations to $\sqrt{3}$ can be derived using Brahmagupta's formula in the following way: Combine the simple equation $3 \cdot 1^2 + 1 = 2^2$ with itself to obtain $3 \cdot 4^2 + 1 = 7^2$, then combine this new equation with $3 \cdot 1^2 + 1 = 2^2$ to obtain $3 \cdot 15^2 + 1 = 26^2$, and so forth, to obtain the infinite sequence

$$\begin{aligned}
3 \cdot 1^2 + 1 &= 2^2 \\
3 \cdot 4^2 + 1 &= 7^2 \\
3 \cdot 15^2 + 1 &= 26^2 \\
3 \cdot 56^2 + 1 &= 97^2 \\
3 \cdot 209^2 + 1 &= 362^2 \\
3 \cdot 780^2 + 1 &= 1351^2
\end{aligned}$$

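of solutions of $3\Box + 1 = \Box$ that includes the one Archimedes used. Each of these equations can be combined with $3 \cdot 1^2 + 6 = 3^2$ to obtain

$$\begin{aligned}
3 \cdot 5^2 + 6 &= (3 \cdot 3)^2 \\
3 \cdot 19^2 + 6 &= (3 \cdot 11)^2 \\
3 \cdot 71^2 + 6 &= (3 \cdot 41)^2 \\
3 \cdot 265^2 + 6 &= (3 \cdot 153)^2 \\
3 \cdot 989^2 + 6 &= (3 \cdot 571)^2 \\
3 \cdot 3691^2 + 6 &= (3 \cdot 2131)^2
\end{aligned}$$

and each of these can be divided by 3 to obtain an infinite sequence

$$\begin{aligned}
5^2 + 2 &= 3 \cdot 3^2, \\
19^2 + 2 &= 3 \cdot 11^2, \\
71^2 + 2 &= 3 \cdot 41^2, \\
265^2 + 2 &= 3 \cdot 153^2, \\
989^2 + 2 &= 3 \cdot 571^2, \\
3691^2 + 2 &= 3 \cdot 2131^2,
\end{aligned}$$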

of solutions of $\Box + 2 = 3\Box$ that includes, of course, the other solution Archimedes used. (One naturally wonders whether $\Box + 1 = 3\Box$ is possible; it is not, because $-1$ is not a square mod 3.)

In modern terminology, Brahmagupta's formula has become the statement that for expressions of the form $y + x\sqrt{A}$, the product of the norms is the norm of the product. Here the norm of $y + x\sqrt{A}$ is by definition its product with its conjugate $y - x\sqrt{A}$, which is to say that it is $(y + x\sqrt{A})(y - x\sqrt{A}) = y^2 - Ax^2$. When one computes with these expressions using the normal rules of algebra (the Buchstabenrechnung of Essay 1.1) one finds that the conjugate of a product is the product of the conjugates, because

$$(y + x\sqrt{A})(v + u\sqrt{A}) = (yv + Axu) + (yu + xv)\sqrt{A},$$

whereas

$$(y - x\sqrt{A})(v - u\sqrt{A}) = (yv + Axu) - (yu + xv)\sqrt{A},$$


so the norm of a product can be computed in either of two ways: the product can be expanded as $(y + x\sqrt{A})(v + u\sqrt{A}) = (yv + Axu) + (yu + xv)\sqrt{A}$ to find that the norm of the product is $(yv + Axu)^2 - A(yu + xv)^2$, or the norm of the product can be computed by multiplying the product $(y + x\sqrt{A})(v + u\sqrt{A})$ by its conjugate $(y - x\sqrt{A})(v - u\sqrt{A})$. Thus

$$(yv + Axu)^2 - A(yu + xv)^2 = (y^2 - Ax^2)(v^2 - Au^2).$$

With $B = y^2 - Ax^2$ and $C = v^2 - Au^2$, this is Brahmagupta's formula rewritten as $(yv + Axu)^2 - A(yu + xv)^2 = BC$.

A solution of $A\Box + B = \Box$ is an expression $y + x\sqrt{A}$, in which $x$ and $y$ are numbers, whose norm is $B$. For want of a better term, I will call such an expression $y + x\sqrt{A}$ a hypernumber for $A$, so that the problem "find all solutions of $A\Box + B = \Box$" for given numbers $A$ and $B$, with $A$ not a square, becomes "find all hypernumbers for $A$ whose norms are $B$."

More precisely, the hypernumbers for a given number $A$ not a square can be described in the following way: As in Essay 1.1, a number is a term in the sequence 0, 1, 2, .... For a given number $A$ not a square, a hypernumber is an expression $y + x\sqrt{A}$ in which $x$ and $y$ are numbers and $\sqrt{A}$ is a mere symbol. Hypernumbers for the same $A$ are added in the obvious way,

$$(y_1 + x_1\sqrt{A}) + (y_2 + x_2\sqrt{A}) = (y_1 + y_2) + (x_1 + x_2)\sqrt{A},$$

and they are multiplied using the rule $(\sqrt{A})^2 = A$ to obtain

$$(y_1 + x_1\sqrt{A})(y_2 + x_2\sqrt{A}) = (y_1y_2 + Ax_1x_2) + (y_1x_2 + y_2x_1)\sqrt{A}.$$

Otherwise stated, the hypernumbers for $A$ are $\mathbf{N}[X] \bmod (X^2 - A)$, the set of all polynomials in $X$ whose coefficients are numbers, when two such polynomials are considered to be equal if they are congruent mod $(X^2 - A)$. Every polynomial in $X$ whose coefficients are numbers is congruent mod $(X^2 - A)$ to one and only one polynomial of degree less than 2 (replace $X^2$ with $A$, $X^3$ with $AX$, $X^4$ with $A^2$, and so forth), and the defining relation $X^2 \equiv A$ justifies writing $\sqrt{A}$ in place of $X$. This definition of course implies the rules of addition and multiplication of hypernumbers just stated. (The assumption that $A$ is not a square guarantees that nonzero factors can be canceled in the arithmetic of hypernumbers $y + x\sqrt{A}$, because it guarantees that $X^2 - A$ is irreducible, from which it follows that $\mathbf{Z}[X] \bmod (X^2 - A)$ is an integral domain; then for integers $r$, $s$, $x$, $y$, $u$, $v$ the congruence $(s + rX)(y + xX) \equiv (s + rX)(v + uX) \bmod (X^2 - A)$ implies $s + rX \equiv 0$ or $y + xX \equiv v + uX \bmod (X^2 - A)$ and therefore implies $s + rX = 0$ or $y + xX = v + uX$. On the other hand, if $A = r^2$ then $(r + X)r \equiv (r + X)X \bmod (X^2 - A)$ even though $r + X \not\equiv 0$ and $r \not\equiv X \bmod (X^2 - A)$.)
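The arithmetic of hypernumbers described above can be tried out directly. The following is only an illustrative sketch, not part of the original text: a hypernumber $y + x\sqrt{A}$ is represented by the pair `(y, x)`, and the function names `add`, `mul`, `norm`, and `powers` are assumptions chosen here for illustration. The norm is computed as an ordinary Python integer, so it may come out negative, which steps outside the narrow sense of "number" used in the essay.

```python
def add(a, b):
    """Add hypernumbers given as pairs (y, x) meaning y + x*sqrt(A)."""
    return (a[0] + b[0], a[1] + b[1])


def mul(A, a, b):
    """Multiply hypernumbers using the rule sqrt(A)^2 = A."""
    (y1, x1), (y2, x2) = a, b
    return (y1 * y2 + A * x1 * x2, y1 * x2 + y2 * x1)


def norm(A, a):
    """Norm y^2 - A*x^2, as a Python int (possibly negative)."""
    y, x = a
    return y * y - A * x * x


def powers(A, base, count):
    """Return base, base^2, ..., base^count."""
    out, cur = [], base
    for _ in range(count):
        out.append(cur)
        cur = mul(A, cur, base)
    return out


# 3*1^2 + 1 = 2^2 corresponds to the hypernumber 2 + sqrt(3) of norm 1.
norm_one = powers(3, (2, 1), 6)
# 3*1^2 + 6 = 3^2 corresponds to 3 + sqrt(3) of norm 6.
norm_six = [mul(3, h, (3, 1)) for h in norm_one]
```

Under these assumptions, `norm_one` lists the pairs $(y, x)$ of the solutions of $3\Box + 1 = \Box$ given earlier in the essay, ending with `(1351, 780)`, and `norm_six` lists the solutions of $3\Box + 6 = \Box$, ending with `(6393, 3691)`, that is, $3 \cdot 3691^2 + 6 = (3 \cdot 2131)^2$.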


The exclusion of negative numbers is a bit inconvenient (the norm $y^2 - Ax^2$ of a hypernumber $y + x\sqrt{A}$ may not be a number in this strict sense because $Ax^2$ may be larger than $y^2$, and the conjugate $y - x\sqrt{A}$ of $y + x\sqrt{A}$ will be a hypernumber only when $x = 0$), but insistence on the narrow definition of "number" can be maintained with very little real difficulty and gives the theory a pleasing economy of structure. All of the results in the first five essays of this section, including the law of quadratic reciprocity in Essay 3.5, are deduced using only the arithmetic of numbers 0, 1, 2, ... in the narrowest sense.


Essay 3.2 Modules

The notion of a module of hypernumbers that is introduced in this essay is used in the next essay to solve $A\Box + B = \Box$ and in the following essays to deal with other questions in number theory. Very simply put, a module of hypernumbers for a given $A$ is a list of hypernumbers for that $A$, written between square brackets to indicate that the list is to be used to define a congruence relation. The concept is motivated by the following reexamination of the Euclidean algorithm.

Gauss's notion of what it means to say that $a \equiv b \bmod m$ (that is, two numbers $a$ and $b$ are congruent modulo a third number $m$) was generalized by Kronecker* as follows: Given a list of numbers $m_1, m_2, \ldots, m_\mu$, two numbers $a$ and $b$ are congruent modulo $[m_1, m_2, \ldots, m_\mu]$, written $a \equiv b \bmod [m_1, m_2, \ldots, m_\mu]$, if there are numbers $i_1, i_2, \ldots, i_\mu$ and $j_1, j_2, \ldots, j_\mu$ such that $a + \sum_{\alpha=1}^{\mu} i_\alpha m_\alpha = b + \sum_{\alpha=1}^{\mu} j_\alpha m_\alpha$. A module is a (nonempty, finite) list of numbers $[m_1, m_2, \ldots, m_\mu]$ written between square brackets to indicate that they are to be used to define a congruence relation in this way. Two modules are equal if they define the same congruence relation.

* See, for example, [44, p. 144]. Kronecker did not go to the extreme that I have of insisting that the multipliers all be natural numbers, so he did not need to put sums of multiples of $m$'s on both sides of the equation.

Clearly, two modules are equal if the lists of numbers they contain can be obtained from one another by a sequence of steps in which (1) terms are rearranged, or (2) a zero is omitted from the list or annexed to it, or (3) a term is added to or subtracted from another term. (A subtraction assumes, of course, that the term being subtracted is less than or equal to the term from which it is being subtracted.) In the case of operations of types (1) or (2) the assertion is obvious. In the case of an operation of type (3), it follows from the observations that

$$a + i_1(m_1 + m_2) + i_2m_2 + i_3m_3 + \cdots = b + j_1(m_1 + m_2) + j_2m_2 + j_3m_3 + \cdots$$

implies

$$a + i_1m_1 + (i_1 + i_2)m_2 + i_3m_3 + \cdots = b + j_1m_1 + (j_1 + j_2)m_2 + j_3m_3 + \cdots$$

and, conversely,

$$a + i_1m_1 + i_2m_2 + i_3m_3 + \cdots = b + j_1m_1 + j_2m_2 + j_3m_3 + \cdots$$

implies

$$a + i_1(m_1 + m_2) + (j_1 + i_2)m_2 + i_3m_3 + \cdots = b + j_1(m_1 + m_2) + (i_1 + j_2)m_2 + j_3m_3 + \cdots$$

when $i_1m_2 + j_1m_2$ is added to both sides. These simple observations lead to a version of the Euclidean algorithm:

The Euclidean Algorithm.
Input: A list of numbers describing a module.
Algorithm:
While the list contains more than one number
    If the first entry is zero, drop it from the list.
    Otherwise, if the first entry is greater than the second entry, interchange the first two entries.
    Otherwise, subtract the first entry from the second.
End

Output: The list with one entry that remains.

For example, [21, 15, 6] = [15, 21, 6] = [15, 6, 6] = [6, 15, 6] = [6, 9, 6] = [6, 3, 6] = [3, 6, 6] = [3, 3, 6] = [3, 0, 6] = [0, 3, 6] = [3, 6] = [3, 3] = [3, 0] = [0, 3] = [3]. Each step results in a new module equal to the preceding one; it either reduces the length of the list (the first alternative holds), or it reduces the sum of the entries (the third alternative), or it is followed by a step in which one of these two types of reduction occurs (the second alternative). Therefore, the algorithm eventually terminates, and reduces the module to a very simple form:

Theorem 1. Given any module $[m_1, m_2, \ldots, m_\mu]$, there is a number $n$ for which $[m_1, m_2, \ldots, m_\mu] = [n]$.

This theorem gives a canonical form for modules, because $[n_1] = [n_2]$ only if $n_1 = n_2$. (If $[n_1] = [n_2]$, then each of the numbers $n_1$ and $n_2$ is a multiple of the other, which implies $n_1 = n_2$, because it implies that if one is zero, then both are, and otherwise, each is less than or equal to the other.) One can determine whether two given modules are equal by putting them both in canonical form; they are equal if and only if the canonical forms are identical. The number $n$ is obviously the greatest common divisor of $m_1, m_2, \ldots, m_\mu$, except when the numbers $m_i$ are all zero, in which case there is no greatest common divisor because all numbers are common divisors.

Corollary. If two lists determine the same module, they can be transformed into one another by a sequence of steps of types (1), (2), and (3) described above.

Deduction. One can pass from either of them to their common canonical form and back by a sequence of such steps, so one can pass from either of them to the other.

The "Euclidean algorithm" of Essay 1.4 shows that if $[m_1, m_2] = [n]$, there are integers $\phi$ and $\psi$ for which $\phi m_1 + \psi m_2 = n$. Without using integers, this fact can be stated and generalized as follows:

Proposition 1. If $[m_1, m_2, \ldots, m_\mu] = [n]$ and $m_\mu \neq 0$, then there are numbers $k_1, k_2, \ldots, k_\mu$ for which $k_1m_1 + k_2m_2 + \cdots + k_{\mu-1}m_{\mu-1} + n = k_\mu m_\mu$.

Proof. Because $n \equiv 0 \bmod [n]$ and therefore $n \equiv 0 \bmod [m_1, m_2, \ldots, m_\mu]$, there are numbers $i_1, i_2, \ldots, i_\mu$ and $j_1, j_2, \ldots, j_\mu$ such that $n + \sum i_\alpha m_\alpha = \sum j_\alpha m_\alpha$. What is to be shown is that there is an equation of this form in which $i_\alpha \geq j_\alpha$ for $\alpha = 1, 2, \ldots, \mu - 1$. If $i_\alpha < j_\alpha$ for some $\alpha$, one can add $m_\alpha m_\mu$ to both sides by adding $m_\mu$ to $i_\alpha$ and $m_\alpha$ to $j_\mu$, increasing $i_\alpha$ without changing any other $i$ and without changing any $j$ other than $j_\mu$. Repetition of this step enough times makes $i_\alpha \geq j_\alpha$ without changing the relation between $i_\beta$ and $j_\beta$ for any $\beta < \mu$ other than $\alpha$. Since this can be done for each $\alpha < \mu$, the desired conclusion follows.
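The Euclidean algorithm on lists and the conclusion of Proposition 1 can both be checked by machine. The sketch below is illustrative only: the function names `euclid_module` and `proposition1_witness` are hypothetical, and the second function finds the multipliers by a direct search rather than by the argument of the proof. The search may restrict each of $k_1, \ldots, k_{\mu-1}$ to values below $m_\mu$, because removing a multiple of $m_\mu$ from one of them lowers the left side by a multiple of $m_\mu$.

```python
from itertools import product


def euclid_module(entries):
    """Apply the three alternatives of the Euclidean algorithm to a list.

    Returns the single remaining number and the list of intermediate lists.
    """
    lst = list(entries)
    trace = [list(lst)]
    while len(lst) > 1:
        if lst[0] == 0:
            lst.pop(0)                        # drop a leading zero
        elif lst[0] > lst[1]:
            lst[0], lst[1] = lst[1], lst[0]   # interchange the first two entries
        else:
            lst[1] -= lst[0]                  # subtract the first from the second
        trace.append(list(lst))
    return lst[0], trace


def proposition1_witness(ms, n):
    """Search for numbers k_1, ..., k_mu with
    k_1*m_1 + ... + k_{mu-1}*m_{mu-1} + n = k_mu*m_mu, assuming [ms] = [n]
    and that the last entry of ms is nonzero."""
    *head, last = ms
    for ks in product(range(last), repeat=len(head)):
        total = sum(k * m for k, m in zip(ks, head)) + n
        if total % last == 0:
            return list(ks) + [total // last]
    return None


n, trace = euclid_module([21, 15, 6])
ks = proposition1_witness([21, 15, 6], n)
```

Here `trace` reproduces the chain of fifteen lists from [21, 15, 6] to [3] given in the example, and `ks` is a list of three natural numbers satisfying $k_1 \cdot 21 + k_2 \cdot 15 + 3 = k_3 \cdot 6$.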


Two numbers $m_1$ and $m_2$ are relatively prime if $[m_1, m_2] = [1]$. Proposition 1 implies that if $m_1$ and $m_2$ are relatively prime and nonzero then each is invertible mod the other. It also implies an important theorem of elementary number theory:

The Chinese remainder theorem. If $f > 0$ and $F > 0$ are relatively prime and $g$ and $G$ are given numbers, the congruences $x \equiv g \bmod f$ and $x \equiv G \bmod F$ determine a number $x$ mod $fF$ in the sense that there is a solution $x$ of the congruences and any two solutions are congruent mod $fF$.

Proof. By the proposition, there are numbers $k_1$, $k_2$, $l_1$, and $l_2$ for which $k_1f + 1 = k_2F$ and $l_1F + 1 = l_2f$. Then $x = g \cdot k_2F + G \cdot l_2f$ satisfies $x \equiv g \cdot k_2F = g \cdot (k_1f + 1) \equiv g \bmod f$ and $x \equiv G \cdot l_2f = G \cdot (l_1F + 1) \equiv G \bmod F$, as required. The uniqueness of $x$ mod $fF$ follows from simple counting: Since one of the $fF$ numbers $x$ less than $fF$ solves each of the $fF$ possible problems $x \equiv g \bmod f$ and $x \equiv G \bmod F$ in which $g < f$ and $G < F$, no two of them solve the same problem.

There is a natural way to multiply modules: The product of $[m_1, m_2, \ldots, m_\mu]$ and $[n_1, n_2, \ldots, n_\nu]$ is by definition the module described by a list $[\ldots, m_\alpha n_\beta, \ldots]$ made up of all products of one $m$ and one $n$, arranged in some order. Multiplication is well defined for modules in the sense that if one list is replaced by another list describing the same module, then the product list may change, but the module it describes will not. (This statement is clear if the passage from one factor to an equal factor involves rearranging the list or omitting or annexing zeros. If the passage involves adding a term to or subtracting a term from another, it is only slightly less obvious.) Multiplication of modules is obviously commutative and associative.

All of the same ideas apply without change to modules of hypernumbers, except that there is no Euclidean algorithm in the case of hypernumbers, and the problem of establishing a canonical form for modules of hypernumbers is more challenging.
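The construction in the proof of the Chinese remainder theorem uses only natural numbers and is easy to carry out. The sketch below is an illustration under stated assumptions: the helper names `inverse_multiple` and `crt` are hypothetical, and the multipliers $k_2$ and $l_2$ are found by a direct search instead of being extracted from Proposition 1.

```python
def inverse_multiple(F, f):
    """Smallest number k >= 1 for which k*F is one more than a multiple of f."""
    for k in range(1, f + 1):
        if (k * F - 1) % f == 0:
            return k
    raise ValueError("the two numbers are not relatively prime")


def crt(g, f, G, F):
    """Solve x = g mod f and x = G mod F for relatively prime f, F > 0."""
    k2 = inverse_multiple(F, f)      # k2*F = k1*f + 1 for some number k1
    l2 = inverse_multiple(f, F)      # l2*f = l1*F + 1 for some number l1
    x = g * k2 * F + G * l2 * f      # the solution written down in the proof
    return x % (f * F)               # representative less than f*F


x = crt(2, 3, 3, 5)
```

With $f = 3$, $F = 5$, $g = 2$, $G = 3$ the search gives $k_2 = 2$ (since $2 \cdot 5 = 3 \cdot 3 + 1$) and $l_2 = 2$ (since $2 \cdot 3 = 1 \cdot 5 + 1$), so the proof's formula yields $2 \cdot 10 + 3 \cdot 6 = 38$, which is 8 mod 15.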
Let a number $A$, not a square, be fixed throughout the discussion. A module of hypernumbers is simply a list $[m_1, m_2, \ldots, m_\mu]$ of hypernumbers (for the given $A$) enclosed in square brackets. Two hypernumbers $a$ and $b$ are congruent modulo a given module, written $a \equiv b \bmod [m_1, m_2, \ldots, m_\mu]$, if there are hypernumbers $i_1, i_2, \ldots, i_\mu$ and $j_1, j_2, \ldots, j_\mu$ such that $a + \sum i_\alpha m_\alpha = b + \sum j_\alpha m_\alpha$. Two modules are equal if they determine the same congruence relation. Again it is easy to see that two modules are equal if the lists of hypernumbers they contain can be obtained from one another by a sequence of steps in which (1) terms are rearranged, or (2) a zero is omitted from the list or annexed to it, or (3) a term is added to or subtracted from another term. (A subtraction is of course possible only when the coefficients of the term being subtracted are no larger than the corresponding coefficients of the term from which it is being subtracted.) In the hypernumber case, there is another elementary operation that does not change the module, namely, (4) $\sqrt{A}$ times a term is added to or subtracted from another term. This set of four types of transformations that change a module into an equal module is sufficient to establish a method for determining whether two given modules are equal:

Theorem 2. Let $A$ be a fixed number, not a square. Every module of hypernumbers for $A$ that is not* equal to [0] is equal to a module of the form $[ef, eg + e\sqrt{A}]$, where $e$, $f$, and $g$ are numbers for which $ef \neq 0$, $g < f$, and $g^2 \equiv A \bmod f$. Two modules of this form are equal only if they are identical.

A module $[ef, eg + e\sqrt{A}]$ in which $ef \neq 0$, $g < f$, and $g^2 \equiv A \bmod f$ will be said to be in canonical form.

Proof. The following elaboration of the Euclidean algorithm puts any module that is not equal to [0] in canonical form after a finite number of steps. By assumption, the list that presents the given module contains at least one nonzero entry, call it $y + x\sqrt{A}$. Because the number $|y^2 - Ax^2|$ can be annexed to the list, one can assume without loss of generality that the list that presents the given module contains a nonzero number. (The annexed number is $y(y + x\sqrt{A}) - x\sqrt{A}(y + x\sqrt{A})$ if $y^2 > Ax^2$ and $x\sqrt{A}(y + x\sqrt{A}) - y(y + x\sqrt{A})$ if $y^2 < Ax^2$. Therefore, annexing it to the list does not change the module. It cannot be zero because $A$ is not a square, so $Ax^2$ cannot be a square† unless $x = 0$.) Therefore, provided the given module is not [0], one can assume without loss of generality that the list representing the given module has a nonzero number as its first entry. Moreover, because the first entry times $\sqrt{A}$ can be annexed to the list if necessary, one can also assume without loss of generality that the list contains at least one hypernumber that is not a number.

* The module [0] is equal only to modules of the form [0, 0, ..., 0]. It is a very trivial sort of module (congruence mod it is simply equality) which for the most part will be ignored. A module in canonical form is not [0].

† If $y^2 = Ax^2$ and $x \neq 0$, then $A$ must be a square, as can be seen as follows: Let $[x, y] = [d] \neq [0]$. If $y = 0$, then $A = 0$ is a square. Otherwise, by Proposition 1, there are numbers $\alpha$ and $\beta$ for which $\alpha x + d = \beta y$, from which it follows that $[xd] = [xd, Ax^2] = [xd, y^2] = [xd, y^2, \beta^2y^2] = [xd, y^2, (\alpha x + d)^2] = [xd, y^2, d^2] = [d^2]$. Thus, $xd = d^2$, $x = d$, $y^2 = Ad^2$, and $(y/d)^2 = A$. Or if one is willing to take the unique factorization of numbers as known, one can simply observe that some prime factor of $A$ divides $A$ an odd number of times, and therefore divides $Ax^2$ an odd number of times, so $Ax^2$ is not a square.

Reduction to Canonical Form
Input: A presentation of a module in which the first entry is a nonzero number, and at least one entry is a hypernumber that is not a number.
Algorithm:
While the module is not in canonical form
    If any number in the list is preceded by an entry that is not a number, interchange the two.
    Otherwise, if the second entry is a number, use the Euclidean algorithm to replace the first two entries with their greatest common divisor.
    Otherwise, if there is a third term (in which case the first term is a number and the second and third terms are not numbers), make use of the first term to perform the Euclidean algorithm on the coefficients of $\sqrt{A}$ in the second and third terms. Specifically, if the coefficient of $\sqrt{A}$ in the second term is less than or equal to the coefficient of $\sqrt{A}$ in the third term, add the first term to the third term as many times as necessary and then subtract the second term from the third; otherwise, interchange the second and third terms.
    Otherwise (in which case there are just two entries, the first a number and the second not), if the coefficient of $\sqrt{A}$ in the second entry does not divide the other coefficient of the second entry, annex $\sqrt{A}$ times the second entry to the list as a third entry.
    Otherwise, if the coefficient of $\sqrt{A}$ in the second entry does not divide the first entry, annex $\sqrt{A}$ times the first entry to the list as a third entry.
    Otherwise (in which case the module has the form $[ef, eg + e\sqrt{A}]$ but is not in canonical form), subtract the first term from the second if possible.
    Otherwise, annex the difference of the numbers $eA$ and $eg^2$, which is the difference of the hypernumbers $\sqrt{A}(eg + e\sqrt{A})$ and $g(eg + e\sqrt{A})$, to the list as a third entry.
End
Output: The module in canonical form with which the algorithm terminates.

Example. To apply the algorithm to $[7 + 5\sqrt{3}]$ one must first annex $|7^2 - 3 \cdot 5^2| = 26$. The succeeding steps are

$$\begin{aligned}
[26, 7 + 5\sqrt{3}] &= [26, 7 + 5\sqrt{3}, 15 + 7\sqrt{3}] = [26, 7 + 5\sqrt{3}, 8 + 2\sqrt{3}] \\
&= [26, 8 + 2\sqrt{3}, 7 + 5\sqrt{3}] = [26, 8 + 2\sqrt{3}, 25 + 3\sqrt{3}] \\
&= [26, 8 + 2\sqrt{3}, 17 + \sqrt{3}] = [26, 17 + \sqrt{3}, 8 + 2\sqrt{3}] \\
&= [26, 17 + \sqrt{3}, 17 + \sqrt{3}] = [26, 17 + \sqrt{3}, 0] \\
&= [26, 0, 17 + \sqrt{3}] = [26, 17 + \sqrt{3}].
\end{aligned}$$
The final module is in canonical form because 1 divides both 17 and 26, 17 is less than 26, and $17^2 \equiv 3 \bmod 26$.

This algorithm terminates for the following reasons: By assumption, the input list contains both a number and a hypernumber that is not a number. A step that changes a hypernumber into a number leaves a hypernumber in the list unchanged, and the only step that reduces the number of numbers in the list replaces two numbers with a single one (the greatest common divisor of the first two entries). Therefore, at each step there is at least one number and at least one hypernumber not a number. They are arranged by the first step to put all numbers first, and the second step eventually reduces the number of numbers in the list to one. Steps of the first three types do not change the greatest common divisor of the coefficients of $\sqrt{A}$ that occur in entries of the list. Since they reduce the total of the numbers in the list or the total of the coefficients of $\sqrt{A}$ (except for the finitely many steps that rearrange terms), eventually a step beyond the first three must be reached. Each such step reduces either the greatest common divisor of the coefficients of $\sqrt{A}$ or the greatest common divisor of the numbers in the list (except for finitely many steps that reduce the first coefficient of the second term), so only a finite number of them can occur before canonical form is achieved.

Lemma. Let $[ef, eg + e\sqrt{A}]$ be a module in canonical form. The congruence $y + x\sqrt{A} \equiv 0 \bmod [ef, eg + e\sqrt{A}]$ is equivalent to the pair of congruences $y \equiv gx \bmod ef$ and $x \equiv 0 \bmod e$.

Proof.

Because

$$ef\sqrt{A} + efg = f(eg + e\sqrt{A}) \quad\text{and}\quad \sqrt{A}(eg + e\sqrt{A}) \equiv g(eg + e\sqrt{A}) \bmod ef,$$

an equation of the form

$$y + x\sqrt{A} + i_1ef + i_2(eg + e\sqrt{A}) = y' + x'\sqrt{A} + j_1ef + j_2(eg + e\sqrt{A})$$

in which $i_1$, $i_2$, $j_1$, $j_2$ are hypernumbers implies another equation of the same form in which $i_1$, $i_2$, $j_1$, $j_2$ are numbers. (For example, if $i_1 = \alpha + \beta\sqrt{A}$, one can add $\beta efg$ to both sides and replace $\beta\sqrt{A}ef + \beta efg$ with $\beta f(eg + e\sqrt{A})$ to obtain another equation of the same form in which $i_1$ is $\alpha$, $i_2$ is increased by $\beta f$, and $j_1$ is increased by $\beta g$. If $i_2 = \alpha + \beta\sqrt{A}$, one can use the fact that $\sqrt{A}(eg + e\sqrt{A}) + \gamma ef = g(eg + e\sqrt{A}) + \delta ef$ for suitable $\gamma$ and $\delta$ to add $\beta\gamma ef$ to both sides and replace $\beta\sqrt{A}(eg + e\sqrt{A}) + \beta\gamma ef$ with $\beta g(eg + e\sqrt{A}) + \beta\delta ef$ to obtain another equation of the same form in which $i_2$ is $\alpha + \beta g$, while $\beta\delta$ is added to $i_1$ and $\beta\gamma$ is added to $j_1$, and so forth.) Therefore, $y + x\sqrt{A} \equiv 0 \bmod [ef, eg + e\sqrt{A}]$ if and only if $y + x\sqrt{A} + i_1ef + i_2(eg + e\sqrt{A}) = j_1ef + j_2(eg + e\sqrt{A})$ for some numbers $i_1$, $i_2$, $j_1$, $j_2$. Comparison of the coefficients of $\sqrt{A}$ shows not only that $x \equiv 0 \bmod e$ but also that $x + i_2e = j_2e$; then comparison of the other terms shows that $y + i_1ef + i_2eg = j_1ef + j_2eg = j_1ef + (x + i_2e)g$ and therefore that $y \equiv gx \bmod ef$. Conversely, if $x + ie = je$ and $y + kef = gx + lef$ for numbers $i$, $j$, $k$, $l$, then $y + x\sqrt{A} + kef + i(eg + e\sqrt{A}) = gx + lef + je\sqrt{A} + ieg = gje + lef + je\sqrt{A} = j(eg + e\sqrt{A}) + lef$, so $y + x\sqrt{A} \equiv 0 \bmod [ef, eg + e\sqrt{A}]$.


Completion of the Proof of Theorem 2. Thus, if $[ef, eg + e\sqrt{A}] = [e'f', e'g' + e'\sqrt{A}]$, then $e'g' + e'\sqrt{A} \equiv 0 \bmod [ef, eg + e\sqrt{A}]$, which implies in particular that $e' \equiv 0 \bmod e$. By symmetry, $e \equiv 0 \bmod e'$, so $e = e'$. Then $e'f' \equiv g \cdot 0 \bmod ef$ implies $f' \equiv 0 \bmod f$, and $f = f'$ follows as before by symmetry. Finally, $eg' + e\sqrt{A} \equiv 0 \bmod [ef, eg + e\sqrt{A}]$ implies $eg' \equiv ge \bmod ef$, which is to say $g' \equiv g \bmod f$. Since both $g$ and $g'$ are less than $f = f'$, $g = g'$ follows.

Corollary. If two modules are equal, each can be transformed into the other by a sequence of steps of types (1)-(4).

Deduction. Such steps suffice to transform a module into its canonical form and vice versa.

The product of two modules of hypernumbers can be defined, exactly as in the case of modules of numbers, to be the module described by the list containing all products in which one factor is from a list describing the first module and the other factor is from a list describing the second. Products are easily shown to be well defined for modules; that is, if a factor is replaced by an equal module, the new product is equal to the old one. The product operation defined in this way is commutative and associative, which is to say that it makes the set of modules of hypernumbers for a given $A$ into a commutative semigroup. In this semigroup, [1] is an identity.

The "modules" described here are closely related to Dedekind's "ideals" in the ring $\mathbf{Z}[\sqrt{A}]$, but the underlying attitude is opposite to Dedekind's. His goal was to divorce the theory as much as possible from algorithmic techniques, and he felt that he had achieved his goal by considering the infinite set of all ring elements that are zero mod $[ef, eg + e\sqrt{A}]$ to be a mathematical entity. To me, it borders on the absurd to believe that a mathematical idea is made "concrete" [23, Remark 21, p. 60] by describing it as an infinite set whose elements are themselves abstractions.
Modules of hypernumbers for a given $A$ are made concrete by specifying how they are to be described (as finite, nonempty lists of hypernumbers between square brackets) and how to compute with them (they are multiplied by the familiar rule, and one determines whether two given modules are equal by reducing them both to canonical form).

Examples. When $A = 3$ some modules in canonical form are $[2, 1 + \sqrt{3}]$, $[3, \sqrt{3}]$, $[11, 5 + \sqrt{3}]$, $[11, 6 + \sqrt{3}]$. Some products of such modules are

$$\begin{aligned}
[2, 1 + \sqrt{3}][2, 1 + \sqrt{3}] &= [4, 2(1 + \sqrt{3}), 4 + 2\sqrt{3}] = [2][2, 1 + \sqrt{3}, 2 + \sqrt{3}] \\
&= [2][2, 1 + \sqrt{3}, 1] = [2],
\end{aligned}$$

$$\begin{aligned}
[11, 5 + \sqrt{3}][11, 5 + \sqrt{3}] &= [11^2, 11(5 + \sqrt{3}), 28 + 10\sqrt{3}] \\
&= [121, 55 - 28 + (11 - 10)\sqrt{3}, 28 + 10\sqrt{3}] \\
&= [121, 27 + \sqrt{3}, 28 + 10\sqrt{3} + 2 \cdot 121 - 10(27 + \sqrt{3})] \\
&= [121, 27 + \sqrt{3}],
\end{aligned}$$

$$\begin{aligned}
[11, 5 + \sqrt{3}][11, 6 + \sqrt{3}] &= [121, 11(5 + \sqrt{3}), 11(6 + \sqrt{3}), 33 + 11\sqrt{3}] \\
&= [11][11, 5 + \sqrt{3}, 6 + \sqrt{3}, 3 + \sqrt{3}] \\
&= [11][11, 2, 3, 3 + \sqrt{3}] = [11][11, 2, 1, 3 + \sqrt{3}] = [11].
\end{aligned}$$
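The canonical forms and products in these examples can be checked by machine. The sketch below is not the reduction algorithm of this essay: as a labeled substitute, it treats the module as the lattice of integer combinations of its entries and of $\sqrt{A}$ times its entries, and reduces that lattice by ordinary integer (possibly negative) arithmetic, which departs from the essay's restriction to natural numbers. The names `canonical_form` and `module_product` are hypothetical; hypernumbers are pairs `(y, x)` meaning $y + x\sqrt{A}$, and the result `(e, f, g)` stands for $[ef, eg + e\sqrt{A}]$.

```python
from math import gcd


def canonical_form(gens, A):
    """Return (e, f, g) with [gens] = [e*f, e*g + e*sqrt(A)], A not a square."""
    n = 0          # gcd of the entries reduced to plain numbers so far
    c, e = 0, 0    # pivot c + e*sqrt(A) with the smallest e > 0 seen so far
    for y0, x0 in gens:
        # each entry contributes itself and sqrt(A) times itself
        for y, x in ((y0, x0), (A * x0, y0)):
            while x != 0:
                if e == 0:
                    c, e, y, x = y, x, 0, 0
                else:
                    q = x // e
                    y, x = y - q * c, x - q * e    # now 0 <= x < e
                    if x != 0:
                        c, e, y, x = y, x, c, e    # smaller coefficient becomes pivot
            n = gcd(n, abs(y))
    if n == 0 or e == 0:
        raise ValueError("the module is [0]")
    return e, n // e, (c % n) // e


def module_product(M, N, A):
    """List of all products of one entry of M with one entry of N."""
    return [(y1 * y2 + A * x1 * x2, y1 * x2 + y2 * x1)
            for (y1, x1) in M for (y2, x2) in N]


P = [(11, 0), (5, 1)]    # [11, 5 + sqrt(3)]
Q = [(11, 0), (6, 1)]    # [11, 6 + sqrt(3)]
T = [(2, 0), (1, 1)]     # [2, 1 + sqrt(3)]
```

In this representation the module [2] has canonical form $[2, 2\sqrt{3}]$, that is `(2, 1, 0)`, and [11] has canonical form `(11, 1, 0)`, so the three products above come out as `(2, 1, 0)`, `(1, 121, 27)`, and `(11, 1, 0)`.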


Essay 3.3 The Class Semigroup. Solution of $A\Box + B = \Box$.

... die schwierigste Frage ... nämlich die, ob zwei reducirte Formen derselben Determinante, welche verschiedenen Perioden angehören, äquivalent sein können oder nicht. (... the most difficult question ..., namely, whether two reduced forms with the same determinant that belong to different periods can be equivalent.) P. G. Lejeune Dirichlet [16, §80]

Again let $A$ be a fixed number, not a square. As was seen in the previous essay, the modules of hypernumbers for $A$ form a commutative semigroup under multiplication, or, to put it more simply, the operation of multiplication of modules is commutative and associative. Computations in the semigroup of modules will be used in this essay to solve $A\Box + B = \Box$. A key role will be played by the following notion of equivalence of modules.

A module will be called principal* if it can be expressed in the form $[y + x\sqrt{A}]$ for some hypernumber $y + x\sqrt{A}$ that satisfies $y^2 > Ax^2$. The principal modules form a subsemigroup (in other words, a product of principal modules is principal) by virtue of Brahmagupta's formula $(y^2 - Ax^2)(v^2 - Au^2) = (yv + Axu)^2 - A(yu + xv)^2$, because this formula shows that the product $[(y + x\sqrt{A})(v + u\sqrt{A})] = [(yv + Axu) + (yu + xv)\sqrt{A}]$ of $[y + x\sqrt{A}]$ and $[v + u\sqrt{A}]$ satisfies $(yv + Axu)^2 > A(yu + xv)^2$ when $y^2 > Ax^2$ and $v^2 > Au^2$. Two modules $M_1$ and $M_2$ will be called equivalent, written $M_1 \sim M_2$, if there are principal modules $P_1$ and $P_2$ for which $M_1P_1 = M_2P_2$. This is an equivalence relation (transitivity follows from the fact that a product of principal modules is principal, because $M_1P_1 = M_2P_2$ and $M_2P_1' = M_3P_2'$ imply $M_1P_1P_1' = M_2P_2P_1' = M_3P_2P_2'$) that is consistent with multiplication of modules ($M_1 \sim M_2$ implies $M_1M_3 \sim M_2M_3$ for any module $M_3$). The class semigroup is simply the set of equivalence classes, multiplied by multiplying representatives.
Otherwise stated, the class semigroup is the quotient semigroup of the semigroup of modules relative to the subsemigroup of principal modules. Computations in the class semigroup depend on solving the problem of determining whether two given modules are equivalent; this problem, which I will call the equivalence problem, is solved by the theorem of this essay. It is the main step in the solution of $A\Box + B = \Box$.

The equivalence problem cannot be solved by giving a canonical form that picks one representative out of each equivalence class, because there is no natural canonical form for this particular equivalence relation. Instead, the solution of the equivalence problem will follow a procedure like the one Gauss used in Section 5 of Disquisitiones Arithmeticae to determine whether two given binary quadratic forms are equivalent; it consists of two parts, the first establishing that every module is equivalent to one in a certain finite set of stable† modules, and the second giving a method of determining whether two stable modules are equivalent. Specifically, an algorithm (the "comparison algorithm") will be given for generating a sequence of modules equivalent to a given one. A sequence of equivalent modules generated by the comparison algorithm eventually begins to cycle, as will be shown; a module will be called stable if the sequence of modules obtained by applying the comparison algorithm to it cycles back to this module itself. The equivalence problem will be solved by showing that the obvious sufficient condition for the equivalence of two modules (namely, that application of the comparison algorithm to them leads to the same cycle of stable modules) is also necessary. Thus, the answer to Dirichlet's "most difficult question" is no: Reduced forms in different periods are not equivalent, or, in the present formulation, stable modules in different cycles are not equivalent.

Fig. 3.2. Gauss.

By the definition of equivalence, a module $[e][f, g + \sqrt{A}]$ in canonical form is equivalent to $[f, g + \sqrt{A}]$. Therefore, in solving the equivalence problem one can assume without loss of generality that the given modules in canonical form have $e = 1$.

* This term derives from the fact that the module is in the principal class of the class group. It has always seemed to me peculiar to apply the adjective "principal" to the module itself, but the usage is universal among mathematicians.

† I have avoided Gauss's term "reduced" because it conflicts with my term "reduction algorithm," an algorithm for reducing a coefficient of $\sqrt{A}$, not for reducing the module.
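The cycling behaviour just described is easy to observe numerically. The sketch below is illustrative only and anticipates the comparison algorithm of this essay in the following form: for a module $[f, g + \sqrt{A}]$ in canonical form with $e = 1$, take the smallest $r$ with $r + g \equiv 0 \bmod f$ and $r^2 > A$, put $f_1 = (r^2 - A)/f$, and let $g_1$ be the least number congruent to $r$ mod $f_1$. The function names `successor`, `cycle_of`, and `equivalent` are hypothetical, and `equivalent` simply applies the criterion stated above (same cycle of stable modules).

```python
def successor(A, f, g):
    """One step of the comparison algorithm on [f, g + sqrt(A)].

    Assumes A is not a square, f > 0, g < f, and g*g = A mod f.
    Returns (r, f1, g1).
    """
    r = (-g) % f
    while r * r <= A:          # smallest r = -g mod f with r^2 > A
        r += f
    f1 = (r * r - A) // f      # exact, because r^2 = g^2 = A mod f
    g1 = r % f1
    return r, f1, g1


def cycle_of(A, f, g):
    """Stable modules (as pairs (f, g)) reached from [f, g + sqrt(A)]."""
    seen = []
    cur = (f, g)
    while cur not in seen:
        seen.append(cur)
        _, f1, g1 = successor(A, *cur)
        cur = (f1, g1)
    return seen[seen.index(cur):]


def equivalent(A, m1, m2):
    """Equivalence test: do the two modules lead to the same cycle?"""
    return set(cycle_of(A, *m1)) == set(cycle_of(A, *m2))


cycle = cycle_of(3, 11, 5)     # start from [11, 5 + sqrt(3)]
```

For $A = 3$ the module $[11, 5 + \sqrt{3}]$ leads to $[3, \sqrt{3}]$ and then alternates between $[3, \sqrt{3}]$ and $[2, 1 + \sqrt{3}]$, whereas $[1, \sqrt{3}]$ is its own successor. Under the criterion of the essay, $[11, 5 + \sqrt{3}]$ is therefore not equivalent to [1].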


Comparison Algorithm.
Input: A module $[f, g + \sqrt{A}]$ in canonical form with $e = 1$.
Algorithm:
    Let $r$ be the smallest solution of $r + g \equiv 0 \bmod f$ for which $r^2 > A$.
    Let $f_1 = (r^2 - A)/f$.
    Let $g_1$ be the smallest solution of $g_1 \equiv r \bmod f_1$.
Output: A module $[f_1, g_1 + \sqrt{A}]$ in canonical form with $e = 1$ and an equation $[r + \sqrt{A}][f, g + \sqrt{A}] = [f][f_1, g_1 + \sqrt{A}]$ showing that it is equivalent to the input module.

That the definition of $f_1$ makes sense (that is, that $r^2 \equiv A \bmod f$) follows from $r \equiv -g \bmod f$ and the fact that $g$ is a square root of $A$ mod $f$. That $g_1$ is a square root of $A$ mod $f_1$ follows from $r \equiv g_1 \bmod f_1$ and $r^2 - A = ff_1$. Of course $g_1 < f_1$, because $g_1$ is the smallest number in its class mod $f_1$. Finally, when $q$ is defined by $qf = r + g$ one obtains the output equation

$$\begin{aligned}
[r + \sqrt{A}][f, g + \sqrt{A}] &= [f(r + \sqrt{A}), rg + A + (r + g)\sqrt{A}] \\
&= [f(r + \sqrt{A}), fq(r + \sqrt{A}), rg + r^2 - ff_1 + (r + g)\sqrt{A}] \\
&= [f(r + \sqrt{A}), fq(r + \sqrt{A}), rfq - ff_1 + fq\sqrt{A}] \\
&= [f(r + \sqrt{A}), ff_1, rfq - ff_1 + fq\sqrt{A}] \\
&= [f(r + \sqrt{A}), ff_1] = [f][f_1, r + \sqrt{A}] = [f][f_1, g_1 + \sqrt{A}].
\end{aligned}$$

Let the output $[f_1, g_1 + \sqrt{A}]$ of the comparison algorithm be called the immediate successor of the input $[f, g + \sqrt{A}]$, and let the successors of $[f, g + \sqrt{A}]$ be the modules in the sequence generated by repeated application of the comparison algorithm. Not only is each successor of $[f, g + \sqrt{A}]$ equivalent to $[f, g + \sqrt{A}]$, but the algorithm gives an explicit equivalence

$$\left[\prod_{i=1}^{k} (r_i + \sqrt{A})\right][f, g + \sqrt{A}] = \left[\prod_{i=0}^{k-1} f_i\right][f_k, g_k + \sqrt{A}] \tag{1}$$

where $[f_i, g_i + \sqrt{A}]$ is the $i$th successor of $[f, g + \sqrt{A}] = [f_0, g_0 + \sqrt{A}]$ and where $r_i$ is the value of $r$ used by the comparison algorithm to go from $[f_{i-1}, g_{i-1} + \sqrt{A}]$ to $[f_i, g_i + \sqrt{A}]$.

Theorem. Let $[f, g + \sqrt{A}]$ and $[F, G + \sqrt{A}]$ be modules in canonical form with $e = 1$, and let $[F, G + \sqrt{A}]$ be stable. Formula (1) describes all equivalences between $[f, g + \sqrt{A}]$ and $[F, G + \sqrt{A}]$ in the sense that any equivalence $[Y + X\sqrt{A}][f, g + \sqrt{A}] = [V + U\sqrt{A}][F, G + \sqrt{A}]$ in which $Y^2 > AX^2$ and $V^2 > AU^2$ must satisfy

(v + UVA) J] {n + VA)

k-1

= (Y + XVA) H

fi,



where $k$ is a number for which $[F, G + \sqrt{A}]$ is the $k$th successor of $[f, g + \sqrt{A}]$. In particular, there are no equivalences when $[F, G + \sqrt{A}]$ is not a successor of $[f, g + \sqrt{A}]$.

Equation (1) implies that both coefficients of the hypernumber $f \prod_{i=1}^{k} (r_i + \sqrt{A})$ are divisible by $\prod_{i=0}^{k-1} f_i$. Therefore both coefficients of $(r_1 + \sqrt{A})(r_2 + \sqrt{A}) \cdots (r_k + \sqrt{A})$ are divisible by $f_1 f_2 \cdots f_{k-1}$. Thus, (1) can be divided by $f_1 f_2 \cdots f_{k-1}$, which will normally be a very large number, to put it in the form

\[
[y + x\sqrt{A}]\,[f, g + \sqrt{A}] = [f]\,[f_k, g_k + \sqrt{A}], \tag{2}
\]

in which $y + x\sqrt{A}$ denotes $\prod_{i=1}^{k} (r_i + \sqrt{A}) / (f_1 f_2 \cdots f_{k-1})$.

[...]

[...] $A$, then $|r_i - f_{i-1}|^2 > |r_{i+1} - f_i|^2$.

Note first that $|r_i - f_{i-1}|^2 < A$ if and only if $|r_i - f_i|^2 < A$, because both are equivalent to $f_{i-1} + f_i < 2r_i$, as one sees when one writes them as $r_i^2 + f_{i-1}^2 < 2r_i f_{i-1} + A$ and $r_i^2 + f_i^2 < 2r_i f_i + A$, respectively, subtracts $A$ from both sides, and uses $r_i^2 - A = f_{i-1} f_i$ to obtain $f_{i-1} f_i + f_{i-1}^2 < 2r_i f_{i-1}$ and $f_{i-1} f_i + f_i^2 < 2r_i f_i$, respectively. In the same way, the three inequalities $|r_i - f_{i-1}|^2 > A$, $|r_i - f_i|^2 > A$, and $f_{i-1} + f_i > 2r_i$ all imply one another.

Also, on successive steps, the inequality $r_i + r_{i+1} \ge 2f_i$ holds, as can be seen as follows: Because $r_i + r_{i+1} \equiv g_i + r_{i+1} \equiv 0 \bmod f_i$, it will suffice to prove that $r_i + r_{i+1} > f_i$. This is true if $r_i \ge f_i$. It is also true if $r_i < f_{i-1}$, because then $r_i^2 > r_i^2 - A = f_i f_{i-1} > f_i r_i$, so $r_i > f_i$. Otherwise, $f_{i-1} \le r_i < f_i$, in which case $(r_i - f_{i-1})^2 < A$ by the definition of $r_i$ ($A$ is not a square, so $(r_i - f_{i-1})^2 \ne A$), which implies $|r_i - f_i|^2 < A$, as was just seen. Thus, $(f_i - r_i)^2 < r_{i+1}^2$ and $f_i - r_i < r_{i+1}$ in this case as well.

Suppose now that $|r_i - f_{i-1}|^2 > A$. If $f_i \le r_{i+1}$, the definition of $r_{i+1}$ implies $|r_{i+1} - f_i|^2 < A$, so of course $|r_i - f_{i-1}|^2 > |r_{i+1} - f_i|^2$ in that case. Otherwise, $r_{i+1} < f_i$, in which case the inequality of the last paragraph implies that $r_i - f_i \ge f_i - r_{i+1} > 0$. On the other hand, the assumption $|r_i - f_{i-1}|^2 > A$


implies $f_{i-1} + f_i > 2r_i$, as was seen above. Therefore $f_{i-1} - r_i > r_i - f_i$, which combines with the previous inequality to give $f_{i-1} - r_i > f_i - r_{i+1} > 0$, from which the desired inequality $|r_i - f_{i-1}|^2 > |r_{i+1} - f_i|^2$ follows.

Therefore $|r_i - f_{i-1}|^2$ decreases as long as it is greater than $A$, so a step must be reached at which $|r_i - f_{i-1}|^2 < A$. That the same inequality holds on all subsequent steps, which is to say that $|r_i - f_{i-1}|^2 < A$ implies $|r_{i+1} - f_i|^2 < A$, can be proved as follows: If $|r_i - f_{i-1}|^2 < A$, then $|r_i - f_i|^2 < A$, as was seen. If $|r_{i+1} - f_i|^2$ were greater than $A$, then $f_i$ would be greater than $r_{i+1}$ ($r_{i+1}$ is the least number in its class mod $f_i$ whose square is greater than $A$), in which case the above inequality $r_i + r_{i+1} \ge 2f_i$ would imply $r_i - f_i \ge f_i - r_{i+1} > 0$, from which $|r_i - f_i|^2 \ge |f_i - r_{i+1}|^2 > A$ would follow, contradicting $|r_i - f_i|^2 < A$. Therefore, $|f_i - r_{i+1}|^2$ must be less than $A$.

Thus, the sequence of successors of any module eventually reaches a module $[f, g + \sqrt{A}]$ in canonical form for which $|r - f|^2 < A$, where $r$ is the least solution of $r + g \equiv 0 \bmod f$ for which $r^2 > A$. Let $\mathcal{M}$ denote the set of such modules. The set $\mathcal{M}$ is finite, as one sees when one sets $\phi = |r - f|$ and notes that then $\phi^2 < A$ and $\phi \equiv \pm r \equiv \pm g \bmod f$, so $f$ divides $A - \phi^2$. In particular, $f \le A$. Since canonical form requires that $g$ be less than $f$, $\mathcal{M}$ is therefore finite.

The comparison algorithm defines a function from $\mathcal{M}$ to itself, as was shown above. Since it carries $[f, g + \sqrt{A}]$ in $\mathcal{M}$ to a module $[f_1, g_1 + \sqrt{A}]$ for which $|r - f_1|^2 < A$, $f_1$ and $g_1$ determine $r$ as the least number in the class of $g_1$ mod $f_1$ whose square is greater than $A$. (If $g_1^2 > A$, then $r = g_1$; otherwise, $r = g_1 + \mu f_1$ for some $\mu > 0$.) Therefore, $[f_1, g_1 + \sqrt{A}]$ determines $r$ and determines $[f, g + \sqrt{A}]$ by the rules $f = (r^2 - A)/f_1$ and $g \equiv -r \bmod f$. In short, the function from $\mathcal{M}$ to itself defined by the comparison algorithm is one-to-one.
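The finiteness and one-to-one claims just made can be spot-checked numerically. The following is a minimal sketch, not taken from the text: the function names `comparison_step` and `stable_candidates` are hypothetical, a module $[f, g + \sqrt{A}]$ is represented by the pair `(f, g)`, and it is assumed that every pair with $0 \le g < f$ and $g^2 \equiv A \bmod f$ is a module in canonical form with $e = 1$.

```python
def comparison_step(A, f, g):
    """One step of the comparison algorithm on [f, g + sqrt(A)].

    Returns (r, f1, g1): r is the least solution of r + g = 0 (mod f)
    with r*r > A, f1 = (r*r - A) / f, and g1 is r reduced mod f1.
    """
    r = (-g) % f
    while r * r <= A:
        r += f
    f1 = (r * r - A) // f
    return r, f1, r % f1


def stable_candidates(A):
    """List the pairs (f, g) with f <= A, g*g = A (mod f) and |r - f|^2 < A."""
    found = []
    for f in range(1, A + 1):
        for g in range(f):
            if (g * g - A) % f == 0:
                r, _, _ = comparison_step(A, f, g)
                if (r - f) ** 2 < A:
                    found.append((f, g))
    return found


if __name__ == "__main__":
    for A in (7, 8, 13):
        M = stable_candidates(A)
        images = [comparison_step(A, f, g)[1:] for f, g in M]
        # The comparison algorithm should merely rearrange the set.
        print(A, len(M), sorted(images) == sorted(M))
```

For $A = 7$, $8$, and $13$ the sketch finds 7, 7, and 13 modules respectively, matching the counts in the table at the end of the essay, and in each case the images under `comparison_step` are a rearrangement of the same list.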
Therefore, the comparison algorithm permutes the finite set $\mathcal{M}$, which implies that every module in $\mathcal{M}$ is stable (application of the comparison algorithm to it cycles back to this module itself), and the proof of the proposition is complete. Moreover, it has been shown that the stable modules are precisely those in $\mathcal{M}$. (It is not difficult to show that these are the modules $[f, g + \sqrt{A}]$ in canonical form in which $f$ divides a number of the form $A - \phi^2$ and the square root $g$ of $A$ mod $f$ satisfies either $g^2 < A$ or $(f - g)^2 < A$.) See the table at the end of the essay for a list of the stable modules for a few values of $A$ and the cycles into which they are partitioned by the comparison algorithm.

The first step in finding all equivalences between $[f, g + \sqrt{A}]$ and $[F, G + \sqrt{A}]$, where $[F, G + \sqrt{A}]$ is stable, will be to find all equivalences of the special form $[y + x\sqrt{A}][f, g + \sqrt{A}] = [n][F, G + \sqrt{A}]$ in which $y + x\sqrt{A} \equiv 0 \bmod [F, G + \sqrt{A}]$. The solution of this problem will use the following algorithm:

Reduction Algorithm.
Input: An equation $[y + x\sqrt{A}][f, g + \sqrt{A}] = [n][F, G + \sqrt{A}]$ in which $x > 0$, $y^2 > Ax^2$, the modules $[f, g + \sqrt{A}]$ and $[F, G + \sqrt{A}]$ are in canonical form, and $y + x\sqrt{A} \equiv 0 \bmod [F, G + \sqrt{A}]$. ($[F, G + \sqrt{A}]$ need not be stable.)



Algorithm: Determine $p$ as the least number congruent to $G$ mod $F$ for which $y \le px$. Define $y_1 + x_1\sqrt{A}$ to be $(p - \sqrt{A})(y + x\sqrt{A})/F$. Define $F_1$ to be $(p^2 - A)/F$ and $G_1$ to be the least solution of $p + G_1 \equiv 0 \bmod F_1$.
Output: A new equation $[y_1 + x_1\sqrt{A}][f, g + \sqrt{A}] = [n][F_1, G_1 + \sqrt{A}]$, with $x_1 < x$, which can be used as a new input equation (that is, $[F_1, G_1 + \sqrt{A}]$ is in canonical form, $y_1^2 > Ax_1^2$, and $y_1 + x_1\sqrt{A} \equiv 0 \bmod [F_1, G_1 + \sqrt{A}]$) unless $x_1 = 0$.

Justification. By the choice of $p$, $px \ge y$ and $p \equiv G \bmod F$, so the definition of $x_1$ as $(px - y)/F$ is valid by virtue of $y \equiv Gx \equiv px \bmod F$ (because $y + x\sqrt{A} \equiv 0 \bmod [F, G + \sqrt{A}]$). Moreover, $Fx_1 = px - y < Fx$ because $px - y \ge Fx$ would imply $p > F$ and $(p - F)x \ge y$, contrary to the definition of $p$. Thus, $x_1 < x$. Since $p^2x^2 \ge y^2 > Ax^2$ implies $p^2 > A$ (because $x > 0$), it follows that $(py)^2 = p^2 \cdot y^2 > A \cdot Ax^2 = (Ax)^2$ and $py > Ax$; at the same time, $py \equiv Gy \equiv G^2x \equiv Ax \bmod F$, which shows that the definition of $y_1$ as $(py - Ax)/F$ is valid. That $y_1^2 > Ax_1^2$ follows from $(p^2 - A)(y^2 - Ax^2) > 0$ when one rewrites this inequality first as $p^2y^2 + A^2x^2 > Ax^2p^2 + Ay^2$, then as $(py - Ax)^2 > A(xp - y)^2$, and divides by $F^2$. Since $p^2 \equiv G^2 \equiv A \bmod F$, the definition of $F_1$ as $(p^2 - A)/F$ is valid and $F_1 > 0$. Also, $G_1^2 \equiv (-p)^2 = p^2 \equiv A \bmod F_1$ by virtue of $p^2 - A = FF_1$, so $[F_1, G_1 + \sqrt{A}]$ is in canonical form. When $q$ is defined by $p + G_1 = qF_1$, one deduces

\[
\begin{aligned}
[F_1][F, G + \sqrt{A}] &= [F_1][F, p + \sqrt{A}] \\
&= [(p - \sqrt{A})(p + \sqrt{A}),\ F_1(p + \sqrt{A})] \\
&= [(qF_1 - G_1 - \sqrt{A})(p + \sqrt{A}),\ F_1(p + \sqrt{A})] \\
&= [(qF_1 - G_1 - \sqrt{A})(p + \sqrt{A}),\ F_1(p + \sqrt{A}),\ qF_1(p + \sqrt{A}) - (qF_1 - G_1 - \sqrt{A})(p + \sqrt{A})] \\
&= [(qF_1 - G_1 - \sqrt{A})(p + \sqrt{A}),\ F_1(p + \sqrt{A}),\ (G_1 + \sqrt{A})(p + \sqrt{A})] \\
&= [F_1(p + \sqrt{A}),\ (G_1 + \sqrt{A})(p + \sqrt{A})] \\
&= [p + \sqrt{A}][F_1, G_1 + \sqrt{A}]
\end{aligned}
\]

(in the fourth line, the third entry is $q$ times the second minus the first). Since $(p + \sqrt{A})(y_1 + x_1\sqrt{A}) = (p^2 - A)(y + x\sqrt{A})/F = F_1(y + x\sqrt{A}) \equiv 0 \bmod [F_1][F, G + \sqrt{A}]$, the equation $[F_1][F, G + \sqrt{A}] = [p + \sqrt{A}][F_1, G_1 + \sqrt{A}]$ implies $(p + \sqrt{A})(y_1 + x_1\sqrt{A}) \equiv 0 \bmod [p + \sqrt{A}][F_1, G_1 + \sqrt{A}]$ and therefore implies* $y_1 + x_1\sqrt{A} \equiv 0 \bmod [F_1, G_1 + \sqrt{A}]$. Finally, multiplication of the input equation by $[F_1]$ gives $[F_1][y + x\sqrt{A}][f, g + \sqrt{A}] = [n][p + \sqrt{A}][F_1, G_1 + \sqrt{A}]$; multiply by $[p - \sqrt{A}]$ (which is valid even though $p - \sqrt{A}$ is not a hypernumber, because the hypernumbers $y + x\sqrt{A}$ and $p + \sqrt{A}$ can both be multiplied by $p - \sqrt{A}$) to put this equation in the form $[F_1][F(y_1 + x_1\sqrt{A})][f, g + \sqrt{A}] = [n][FF_1][F_1, G_1 + \sqrt{A}]$ and divide by $[FF_1]$ to conclude that the output equation holds.

* The definitions imply (when use is made of the fact that if $a$, $b$ and $c$ are hypernumbers with $c \ne 0$ then $ac = bc$ implies $a = b$) that, for any nonzero hypernumber $c$ and any module $M$, a congruence $ac \equiv bc \bmod [c]M$ implies $a \equiv b \bmod M$.

The theorem will be proved by proving that if $[F, G + \sqrt{A}]$ is stable and if the reduction algorithm is applied iteratively until $x$ is reduced to zero, then (1) the terminal equation is obvious from the original equation and (2) the steps of the algorithm can be retraced using the comparison algorithm


to go from the terminal equation back to the original, thereby determining the possible original equations and showing that $[F, G + \sqrt{A}]$ is a successor of $[f, g + \sqrt{A}]$.

For example, the input equation $[236 + 89\sqrt{7}][83, 16 + \sqrt{7}] = [83][3, 1 + \sqrt{7}]$ leads to

\[
\begin{aligned}
[236 + 89\sqrt{7}]\,[83, 16 + \sqrt{7}] &= [83]\,[3, 1 + \sqrt{7}] && (p_1 = 4), \\
[107 + 40\sqrt{7}]\,[83, 16 + \sqrt{7}] &= [83]\,[3, 2 + \sqrt{7}] && (p_2 = 5), \\
[85 + 31\sqrt{7}]\,[83, 16 + \sqrt{7}] &= [83]\,[6, 1 + \sqrt{7}] && (p_3 = 7), \\
[63 + 22\sqrt{7}]\,[83, 16 + \sqrt{7}] &= [83]\,[7, \sqrt{7}] && (p_4 = 7), \\
[41 + 13\sqrt{7}]\,[83, 16 + \sqrt{7}] &= [83]\,[6, 5 + \sqrt{7}] && (p_5 = 5), \\
[19 + 4\sqrt{7}]\,[83, 16 + \sqrt{7}] &= [83]\,[3, 1 + \sqrt{7}] && (p_6 = 7), \\
[35 + 3\sqrt{7}]\,[83, 16 + \sqrt{7}] &= [83]\,[14, 7 + \sqrt{7}] && (p_7 = 21), \\
[51 + 2\sqrt{7}]\,[83, 16 + \sqrt{7}] &= [83]\,[31, 10 + \sqrt{7}] && (p_8 = 41), \\
[67 + \sqrt{7}]\,[83, 16 + \sqrt{7}] &= [83]\,[54, 13 + \sqrt{7}] && (p_9 = 67), \\
[83]\,[83, 16 + \sqrt{7}] &= [83]\,[83, 16 + \sqrt{7}].
\end{aligned}
\]
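The chain of equations in this example can be reproduced mechanically. The sketch below is only an illustration under stated assumptions: the function names `reduction_step` and `reduce_fully` are hypothetical, an equation $[y + x\sqrt{A}][f, g + \sqrt{A}] = [n][F, G + \sqrt{A}]$ is represented by the tuple `(y, x, F, G)` (the unchanged factors $[f, g + \sqrt{A}]$ and $[n]$ are left out), and the input is assumed to satisfy the hypotheses of the reduction algorithm.

```python
def reduction_step(A, y, x, F, G):
    """One step of the reduction algorithm.

    p is the least number congruent to G mod F with y <= p*x;
    y1 + x1*sqrt(A) = (p - sqrt(A)) * (y + x*sqrt(A)) / F;
    F1 = (p*p - A) / F and G1 is the least solution of p + G1 = 0 (mod F1).
    """
    p = G % F
    while p * x < y:
        p += F
    y1 = (p * y - A * x) // F
    x1 = (p * x - y) // F
    F1 = (p * p - A) // F
    G1 = (-p) % F1
    return p, y1, x1, F1, G1


def reduce_fully(A, y, x, F, G):
    """Apply reduction steps until x = 0; return the p's and all equations."""
    ps, equations = [], [(y, x, F, G)]
    while x > 0:
        p, y, x, F, G = reduction_step(A, y, x, F, G)
        ps.append(p)
        equations.append((y, x, F, G))
    return ps, equations


if __name__ == "__main__":
    ps, equations = reduce_fully(7, 236, 89, 3, 1)
    print(ps)             # [4, 5, 7, 7, 5, 7, 21, 41, 67]
    print(equations[-1])  # (83, 0, 83, 16)
```

Each tuple in `equations` corresponds to one line of the displayed list; the last one, `(83, 0, 83, 16)`, is the terminal equation $[83][83, 16 + \sqrt{7}] = [83][83, 16 + \sqrt{7}]$.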

Each step leaves the second factor on the left and the first factor on the right unchanged. At the last step, the uniqueness of canonical form implies that the two sides are identical. Therefore, the terminal equation $[83][83, 16 + \sqrt{7}] = [83][83, 16 + \sqrt{7}]$ is determined without computation by the original one. Moreover, at each step the module on the right will be seen to be the immediate successor of the module below it. In fact, the number $p_i$ used to go from equation $i - 1$ to equation $i$ is the number $r$ used by the comparison algorithm to go from the module in equation $i$ to the one in equation $i - 1$, which implies that the input equation at the top of the list can be obtained by starting with the identity at the bottom and successively multiplying by $67 + \sqrt{7}$, dividing by 83, multiplying by $41 + \sqrt{7}$, dividing by 54, and so forth, applying the operations to the hypernumbers in the first factors on the left and to the modules in the second factors on the right.

As this example indicates, the key fact used to determine the possible input equations is that application of the reduction algorithm to an equation $[y + x\sqrt{A}][f, g + \sqrt{A}] = [n][F, G + \sqrt{A}]$ in which $[F, G + \sqrt{A}]$ is stable produces a sequence of equations $[y_i + x_i\sqrt{A}][f, g + \sqrt{A}] = [n][F_i, G_i + \sqrt{A}]$ in which the immediate successor of $[F_i, G_i + \sqrt{A}]$ is $[F_{i-1}, G_{i-1} + \sqrt{A}]$ and the number $p_i$ used by the reduction algorithm to go from equation $i - 1$ to equation $i$ is the number used by the comparison algorithm to go from $[F_i, G_i + \sqrt{A}]$ to its immediate successor. Let a step of the reduction algorithm be called traceable if the number $p$ used to perform it is equal to the number $r$ used by the comparison algorithm to determine the immediate successor of $[F_1, G_1 + \sqrt{A}]$.

Lemma. A step of the reduction algorithm is traceable if $[F, G + \sqrt{A}]$ is stable or if it follows a traceable step.

Proof. Let $[y + x\sqrt{A}][f, g + \sqrt{A}] = [n][F, G + \sqrt{A}]$ be an input to the reduction algorithm. To say that the resulting step of the reduction algorithm is traceable is to say that $p = r$ where $p$ is the number used by the reduction algorithm and $r$ is the number used by the comparison algorithm to determine the immediate successor of $[F_1, G_1 + \sqrt{A}]$. Let $s$ be the number used by the comparison algorithm to determine the immediate successor of $[F, G + \sqrt{A}]$.

The first step of the proof will be to show that if the step is not traceable, then $p + s = F$. If the step is not traceable, then because $p^2 > A$ and $p + G_1 \equiv 0 \bmod F_1$, and because $r$ is by definition the smallest number for which $r^2 > A$ and $r + G_1 \equiv 0 \bmod F_1$, $p$ must be at least as great as $F_1$, and the square of $p - F_1$ must be greater than $A$. Then $(p - F_1)^2 > A$ or $p^2 + F_1^2 > 2pF_1 + A$, from which it follows (subtract $A$ and divide by $F_1$) that $F + F_1 > 2p$ and $F > p + (p - F_1) > p$, so $F - p > p - F_1 > 0$. If $p + s$ were greater than or equal to $2F$, it would follow that $s \ge F + (F - p) > F$ and $(s - F)^2 \ge (F - p)^2 > (p - F_1)^2 > A$, contrary to the definition of $s$. Thus, $p + s < 2F$. Since $p + s \equiv G + s \equiv 0 \bmod F$, the desired conclusion $p + s = F$ follows from the assumption that the step is not traceable.

Since $p + s = F$ implies $|s - F|^2 = p^2 > A$, which in turn implies that $[F, G + \sqrt{A}]$ is not stable (see the proof of Proposition 1), the first statement of the lemma, that if $[F, G + \sqrt{A}]$ is stable then the step is traceable, follows.

Suppose, finally, that the step follows a traceable step of the reduction algorithm. Then the step that it follows is retraced by multiplying by $s + \sqrt{A}$ and dividing by $F$. Thus, the previous $x$ is $(y + sx)/F$. Since the reduction algorithm reduces $x$, it follows that $x < (y + sx)/F$, which is to say $Fx < y + sx$. If the step were not traceable, $F$ would be $s + p$, so $px + sx$ would be less than $y + sx$, and $px$ would be less than $y$, contrary to the definition of $p$. Therefore, the proof of the lemma is complete.

Proposition 2 (Solution of Pell's equation).
The only solutions of Pell's equation $Ax^2 + 1 = y^2$ are those given by the reduction algorithm, namely, the pairs $(x, y)$ given by formula (2) when $[f, g + \sqrt{A}] = [f_k, g_k + \sqrt{A}] = [1]$.

Proof. Putting $[y + x\sqrt{A}]$ in canonical form when $y$ and $x$ are relatively prime and $y^2 > Ax^2$ easily gives $[y + x\sqrt{A}] = [y^2 - Ax^2, g + \sqrt{A}]$, where $g$ is determined by $y \equiv gx \bmod (y^2 - Ax^2)$. Therefore, $Ax^2 + 1 = y^2$ implies $[y + x\sqrt{A}] = [1]$. Conversely, if $y^2 > Ax^2$ and $[y + x\sqrt{A}] = [1]$, then $x$ and $y$ are relatively prime and $y^2 - Ax^2 = 1$. In short, solutions of Pell's equation correspond one-to-one to hypernumbers $y + x\sqrt{A}$ for which $y^2 > Ax^2$ and $[y + x\sqrt{A}] = [1]$. Since $[1]$ is stable, infinitely many of its successors are $[1]$. Each such successor implies a solution $(x_k, y_k)$ of Pell's equation given by the formula

\[
y_k + x_k\sqrt{A} = \frac{\prod_{i=1}^{k} (r_i + \sqrt{A})}{\prod_{i=1}^{k-1} f_i}.
\]

What is to be shown is that there are no others.


But a solution of Pell's equation is, as was just shown, a hypernumber $y + x\sqrt{A}$ that satisfies $[y + x\sqrt{A}][1] = [1]$. This equation is an input to the reduction algorithm (write the right side as $[1][1, \sqrt{A}]$), and repeated application of the reduction algorithm reduces it to $[1][1, \sqrt{A}] = [1][1, \sqrt{A}]$. The input $y + x\sqrt{A} = 1 + 0 \cdot \sqrt{A}$ of course is already reduced and corresponds to the trivial solution $A \cdot 0^2 + 1 = 1^2$ of Pell's equation. Otherwise, the reduction requires $k \ge 1$ steps, all steps are traceable, and $y + x\sqrt{A}$ is obtained when 1 is multiplied by $p_k + \sqrt{A}$ and divided by 1, multiplied by $p_{k-1} + \sqrt{A}$ and divided by $f_{k-1}$, and so forth. Since the sequence of $p$'s is the sequence, in reverse order, of $r$'s obtained by applying the comparison algorithm to $[1]$, Proposition 2 follows.

For example, when $A = 13$, the cycle of $[1]$ is $[1]$, $[3, 1 + \sqrt{13}]$, $[4, 1 + \sqrt{13}]$, $[9, 7 + \sqrt{13}]$, $[12, 11 + \sqrt{13}]$, $[13, \sqrt{13}]$, $[12, 1 + \sqrt{13}]$, $[9, 2 + \sqrt{13}]$, $[4, 3 + \sqrt{13}]$, $[3, 2 + \sqrt{13}]$, after which the sequence returns to $[1]$ and repeats. The $r$'s used at the successive steps are 4, 5, 7, 11, 13, 13, 11, 7, 5, 4, after which they repeat. Thus the smallest solution of $13x^2 + 1 = y^2$ other than the trivial one is given by the coefficients of

\[
\frac{(4 + \sqrt{13})^2 (5 + \sqrt{13})^2 (7 + \sqrt{13})^2 (11 + \sqrt{13})^2 (13 + \sqrt{13})^2}{3^2 \cdot 4^2 \cdot 9^2 \cdot 12^2 \cdot 13},
\]

which is easily found to be $649 + 180\sqrt{13}$. That is, the smallest solution of Pell's equation when $A = 13$ is $13 \cdot 180^2 + 1 = 649^2$. Since $(649 + 180\sqrt{13})^2 = 842401 + 233640\sqrt{13}$, the next smallest solution is $13 \cdot 233640^2 + 1 = 842401^2$, and so forth. (For any $A$, as for $A = 13$, the sequence of $r$'s is in fact a palindrome, so the sequence of $p$'s is identical to the sequence of $r$'s.)

Proposition 3. If $[f, g + \sqrt{A}]$ is principal, then $[f, f - g + \sqrt{A}]$ is also principal, and the product of these modules is $[f]$.

Proof. Suppose $[f, g + \sqrt{A}] = [y + x\sqrt{A}]$, where $y^2 > Ax^2$. Since any common divisor of $x$ and $y$ divides the coefficient of $\sqrt{A}$ in $g + \sqrt{A}$, $x$ and $y$ are relatively prime. Therefore, the number $y^2 - Ax^2$, call it $N$, is relatively prime to $x$, and $x$ has a reciprocal, call it $r$, mod $N$. Then $[y + x\sqrt{A}] = [N, y + x\sqrt{A}, r(y + x\sqrt{A})] = [N, G + \sqrt{A}]$, where $G \equiv ry \bmod N$, so $f = N$, $g = G$, and $y \equiv xry \equiv gx \bmod N$. The solutions $(X, Y)$ of Pell's equation obviously grow without bound, so there is a solution $Y^2 = AX^2 + 1$ of Pell's equation in which $X > x$. Since $y^2 - Ax^2 = f > 0$, it follows that $X^2y^2 = AX^2x^2 + fX^2 > AX^2x^2 + x^2 = Y^2x^2$, which implies $Xy > Yx$. Also, $Y^2y^2 = A^2x^2X^2 + Ax^2 + fAX^2 + f > A^2X^2x^2$, so $Yy > AXx$. Therefore, the formula $z + w\sqrt{A} = (Y + X\sqrt{A})(y - x\sqrt{A})$ defines a hypernumber $z + w\sqrt{A}$ (even though, by the strict definition being used here, $y - x\sqrt{A}$ is not a hypernumber). This hypernumber satisfies $(z + w\sqrt{A})(y + x\sqrt{A}) = (Y + X\sqrt{A})(y - x\sqrt{A})(y + x\sqrt{A}) = (Y + X\sqrt{A})f$. Thus,



$[z + w\sqrt{A}][y + x\sqrt{A}] = [Y + X\sqrt{A}][f] = [f]$, and what is to be shown is that $[z + w\sqrt{A}] = [f, f - g + \sqrt{A}]$.

Now,

\[
\begin{aligned}
z^2 = (Yy - AxX)^2 &= Y^2y^2 - 2AxyXY + A^2x^2X^2 \\
&= Y^2y^2 - AY^2x^2 + AY^2x^2 - 2AxyXY + AX^2y^2 - AX^2y^2 + A^2x^2X^2 \\
&= (Y^2 - AX^2)(y^2 - Ax^2) + A(Xy - Yx)^2 = Aw^2 + f.
\end{aligned}
\]

Thus $z^2 - Aw^2 = f$, and it remains only to show that $z \equiv (f - g)w \bmod f$. But equating coefficients of $\sqrt{A}$ in $(z + w\sqrt{A})(y + x\sqrt{A}) = (Y + X\sqrt{A})f$ gives $wy + zx = fX$, so $0 \equiv wy + zx \equiv wgx + zx \bmod f$, which implies, because $x$ is relatively prime to $f$, that $wg + z \equiv 0 \bmod f$, or $z \equiv (f - g)w \bmod f$, as was to be shown.

Corollary. A module that is equivalent to $[1]$ is principal.

Deduction. To say that $M$ is equivalent to $[1]$ means that there are principal modules $P_1$ and $P_2$ for which $MP_1 = P_2$. By Proposition 3, there is a principal module $P_3$ such that $P_1P_3 = [n]$ for some number $n$. Thus $M[n] = P_2P_3$, which implies $M[n] = [z + w\sqrt{A}]$, where $z^2 > Aw^2$. This equation implies that $n$ divides both $z$ and $w$, so $M = [\tfrac{z}{n} + \tfrac{w}{n}\sqrt{A}]$ is principal.

Proof of the Theorem. Suppose that $[f, g + \sqrt{A}]$ and $[F, G + \sqrt{A}]$ are equivalent, say $[y + x\sqrt{A}][f, g + \sqrt{A}] = [v + u\sqrt{A}][F, G + \sqrt{A}]$ where $y^2 > Ax^2$ and $v^2 > Au^2$, and that $[F, G + \sqrt{A}]$ is stable. By Proposition 3, there is a hypernumber $z + w\sqrt{A}$ for which $z^2 > Aw^2$ and $[z + w\sqrt{A}][v + u\sqrt{A}] = [n]$ for some number $n$. Let the given equivalence between $[f, g + \sqrt{A}]$ and $[F, G + \sqrt{A}]$ be multiplied by $[F(z + w\sqrt{A})]$ to yield an equation of the form $[Y + X\sqrt{A}][f, g + \sqrt{A}] = [N][F, G + \sqrt{A}]$, where $N = Fn$, that is an input to the reduction algorithm. Application of the reduction algorithm reduces this equation to $[N][f, g + \sqrt{A}] = [N][f, g + \sqrt{A}]$. Since $[F, G + \sqrt{A}]$ is stable, the steps of the algorithm can be retraced by applying the comparison algorithm to $[f, g + \sqrt{A}]$, from which it follows that $Y + X\sqrt{A}$ can be obtained by multiplying $N$ by $r_1 + \sqrt{A}$, dividing by $f$, multiplying by $r_2 + \sqrt{A}$, dividing by $f_1$, and so forth, stopping with the $k$th step, where $[F, G + \sqrt{A}]$ is the $k$th successor of $[f, g + \sqrt{A}]$. In short,

\[
Y + X\sqrt{A} = N\,\frac{\prod_{i=1}^{k} (r_i + \sqrt{A})}{\prod_{i=0}^{k-1} f_i}.
\]

Since $Y + X\sqrt{A} = F(z + w\sqrt{A})(y + x\sqrt{A})$ and $N = Fn$, the equation $F(z + w\sqrt{A})(y + x\sqrt{A}) \prod_{i=0}^{k-1} f_i = Fn \prod_{i=1}^{k} (r_i + \sqrt{A})$ follows. The equation

\[
(y + x\sqrt{A}) \prod_{i=0}^{k-1} f_i = (v + u\sqrt{A}) \prod_{i=1}^{k} (r_i + \sqrt{A})
\]

of the theorem follows when one multiplies by $v + u\sqrt{A}$ and divides by $Fn$.

Corollary (Solution of the equivalence problem). Stable modules in different cycles are not equivalent.


Deduction. If two stable modules are equivalent, the theorem implies that each is a successor of the other, so they are in the same cycle.

Solution of $A\square + B = \square$. A solution of $Ax^2 + B = y^2$ is called primitive if $x$ and $y$ are relatively prime. The primitive solutions are found in the following way: For each square root $p$ of $A$ mod $B$, use formula (2) to find all solutions

\[
y + x\sqrt{A} = \frac{\prod_{i=1}^{k} (r_i + \sqrt{A})}{\prod_{i=1}^{k-1} f_i} \qquad \text{($k$ such that the $k$th successor of $[B, p + \sqrt{A}]$ is $[1]$)} \tag{3}
\]

of the problem $[y + x\sqrt{A}][B, p + \sqrt{A}] = [B]$. Each pair $(x, y)$ found in this way is a primitive solution of $Ax^2 + B = y^2$, and there are no others. The solutions $(x, y)$ that are not primitive are of the form $(ud, vd)$, where $d^2$ is a square factor of $B$ and $(u, v)$ is a primitive solution of $Au^2 + B/d^2 = v^2$. Therefore, they can be found by finding all square factors $d^2$ of $B$ that are greater than 1 and, for each of them, using the method just described to find all primitive solutions $(u, v)$ of $Au^2 + B/d^2 = v^2$.

Proof. The hypernumbers (3) are the solutions of $[y + x\sqrt{A}][B, p + \sqrt{A}] = [B][1, \sqrt{A}]$ found by the construction of the theorem. Multiplication by $[B, B - p + \sqrt{A}]$ then gives $[y + x\sqrt{A}][B] = [B][B, B - p + \sqrt{A}]$, which implies $[y + x\sqrt{A}] = [B, B - p + \sqrt{A}]$, so $x$ and $y$ are relatively prime and satisfy $y^2 - Ax^2 = B$. Conversely, if $x$ and $y$ satisfy these conditions, then reduction of $[y + x\sqrt{A}]$ to canonical form gives $[B, g + \sqrt{A}]$ for some square root $g$ of $A$ mod $B$, so $[y + x\sqrt{A}][B, B - g + \sqrt{A}] = [B]$ for another square root $B - g$ of $A$ mod $B$, and $y + x\sqrt{A}$ is among the solutions given by (3).

For example, the solution of $79\square + 21 = \square$ requires finding the square roots of 79 mod 21. (Note that 21 has no square factors, so all solutions are primitive solutions.) These are easily found by finding the square roots $\pm 1$ of 79 mod 3 and the square roots $\pm 3$ of 79 mod 7 and putting them together using the Chinese remainder theorem to find the four square roots 4, 10, 11, and 17 of 79 mod 21. The module $[21, 4 + \sqrt{79}]$ is stable, and its cycle under the comparison algorithm contains 8 stable modules; since $[1]$ is not among them, the square root 4 of 79 mod 21 gives rise to no solutions of $79\square + 21 = \square$. Similarly, the module $[21, 17 + \sqrt{79}]$ is stable. Its cycle, which contains the conjugates of the modules in the cycle of $[21, 4 + \sqrt{79}]$, as is shown in the table below, also has length 8 and does not contain $[1]$, so this square root does not give rise to any solutions of $79\square + 21 = \square$ either.

Application of the comparison algorithm to $[21, 10 + \sqrt{79}]$, on the other hand, reaches $[1]$ in two steps, from $[21, 10 + \sqrt{79}]$ to $[2, 1 + \sqrt{79}]$ to $[1]$; the values of $r$ are first 11 and then 9, so $y + x\sqrt{79} = (11 + \sqrt{79})(9 + \sqrt{79})/2 = 89 + 10\sqrt{79}$, and the smallest solution of $79\square + 21 = \square$ in the sequence of solutions corresponding to $[21, 10 + \sqrt{79}]$ is $79 \cdot 10^2 + 21 = 89^2$. The next solution is found by taking



the comparison algorithm two steps further, which multiplies $89 + 10\sqrt{79}$ by $(9 + \sqrt{79})(9 + \sqrt{79})/2 = 80 + 9\sqrt{79}$, which of course describes the smallest solution $79 \cdot 9^2 + 1 = 80^2$ of Pell's equation in the case $A = 79$. Since $(89 + 10\sqrt{79})(80 + 9\sqrt{79}) = 14230 + 1601\sqrt{79}$, the next solution of $79\square + 21 = \square$ in this sequence is $79 \cdot 1601^2 + 21 = 14230^2$. More generally, the $n$th solution in the sequence is contained in the coefficients of $(89 + 10\sqrt{79})(80 + 9\sqrt{79})^{n-1}$. In the same way, the fact that $[21, 11 + \sqrt{79}] \sim [1]$ leads to an infinite sequence of solutions of $79x^2 + 21 = y^2$, namely, the solutions in which $x$ is the coefficient of $\sqrt{79}$ in $(10 + \sqrt{79})(80 + 9\sqrt{79})^{n-1}$. All solutions are contained in these two infinite sequences.

Orbits of Stable Modules for Various Values of A

A = 2. (2 modules, 1 cycle)
$[1] \sim [2, \sqrt{2}]$;

A = 3. (3 modules, 2 cycles)
$[1]$ (Cycle contains just one module.)
$[2, 1 + \sqrt{3}] \sim [3, \sqrt{3}]$;

A = 5. (5 modules, 2 cycles)
$[1] \sim [4, 3 + \sqrt{5}] \sim [5, \sqrt{5}] \sim [4, 1 + \sqrt{5}]$,
$[2, 1 + \sqrt{5}]$;

A = 6. (6 modules, 2 cycles)
$[1] \sim [3, \sqrt{6}]$,
$[2, \sqrt{6}] \sim [5, 4 + \sqrt{6}] \sim [6, \sqrt{6}] \sim [5, 1 + \sqrt{6}]$;

A = 7. (7 modules, 2 cycles)
$[1] \sim [2, 1 + \sqrt{7}]$,
$[3, 1 + \sqrt{7}] \sim [6, 5 + \sqrt{7}] \sim [7, \sqrt{7}] \sim [6, 1 + \sqrt{7}] \sim [3, 2 + \sqrt{7}]$;

A = 8. (7 modules, 3 cycles)
$[1]$,
$[2, \sqrt{8}] \sim [4, \sqrt{8}]$,
$[7, 1 + \sqrt{8}] \sim [4, 2 + \sqrt{8}] \sim [7, 6 + \sqrt{8}] \sim [8, \sqrt{8}]$;

A = 10. (10 modules, 2 cycles)
$[1] \sim [6, 4 + \sqrt{10}] \sim [9, 8 + \sqrt{10}] \sim [10, \sqrt{10}] \sim [9, 1 + \sqrt{10}] \sim [6, 2 + \sqrt{10}]$,
$[2, \sqrt{10}] \sim [3, 1 + \sqrt{10}] \sim [5, \sqrt{10}] \sim [3, 2 + \sqrt{10}]$;

A = 11. (9 modules, 2 cycles)
$[1] \sim [5, 4 + \sqrt{11}] \sim [5, 1 + \sqrt{11}]$,
$[2, 1 + \sqrt{11}] \sim [7, 5 + \sqrt{11}] \sim [10, 9 + \sqrt{11}] \sim [11, \sqrt{11}] \sim [10, 1 + \sqrt{11}] \sim [7, 2 + \sqrt{11}]$;

A = 12. (11 modules, 4 cycles)
$[1] \sim [4, \sqrt{12}]$,
$[2, \sqrt{12}]$,
$[3, \sqrt{12}] \sim [8, 6 + \sqrt{12}] \sim [11, 10 + \sqrt{12}] \sim [12, \sqrt{12}] \sim [11, 1 + \sqrt{12}] \sim [8, 2 + \sqrt{12}]$,
$[6, \sqrt{12}] \sim [4, 2 + \sqrt{12}]$;

A = 13. (13 modules, 2 cycles)
$[1] \sim [3, 1 + \sqrt{13}] \sim [4, 1 + \sqrt{13}] \sim [9, 7 + \sqrt{13}] \sim [12, 11 + \sqrt{13}] \sim [13, \sqrt{13}] \sim [12, 1 + \sqrt{13}] \sim [9, 2 + \sqrt{13}] \sim [4, 3 + \sqrt{13}] \sim [3, 2 + \sqrt{13}]$,
$[2, 1 + \sqrt{13}] \sim [6, 5 + \sqrt{13}] \sim [6, 1 + \sqrt{13}]$;

A = 14. (10 modules, 2 cycles)
$[1] \sim [2, \sqrt{14}]$,
$[13, 1 + \sqrt{14}] \sim [10, 2 + \sqrt{14}] \sim [5, 3 + \sqrt{14}] \sim [7, \sqrt{14}] \sim [5, 2 + \sqrt{14}] \sim [10, 8 + \sqrt{14}] \sim [13, 12 + \sqrt{14}] \sim [14, \sqrt{14}]$;

A = 15. (12 modules, 4 cycles)
$[1]$,
$[5, \sqrt{15}] \sim [2, 1 + \sqrt{15}]$,
$[3, \sqrt{15}] \sim [7, 6 + \sqrt{15}] \sim [7, 1 + \sqrt{15}]$,
$[15, \sqrt{15}] \sim [14, 1 + \sqrt{15}] \sim [11, 2 + \sqrt{15}] \sim [6, 3 + \sqrt{15}] \sim [11, 9 + \sqrt{15}] \sim [14, 13 + \sqrt{15}]$;

A = 17. (13 modules, 2 cycles)
$[1] \sim [8, 5 + \sqrt{17}] \sim [13, 11 + \sqrt{17}] \sim [16, 15 + \sqrt{17}] \sim [17, \sqrt{17}] \sim [16, 1 + \sqrt{17}] \sim [13, 2 + \sqrt{17}] \sim [8, 3 + \sqrt{17}]$,
$[2, 1 + \sqrt{17}] \sim [4, 1 + \sqrt{17}] \sim [8, 7 + \sqrt{17}] \sim [8, 1 + \sqrt{17}] \sim [4, 3 + \sqrt{17}]$;

A = 18. (12 modules, 2 cycles)
$[1] \sim [7, 5 + \sqrt{18}] \sim [9, \sqrt{18}] \sim [7, 2 + \sqrt{18}]$,
$[2, \sqrt{18}] \sim [9, 6 + \sqrt{18}] \sim [14, 12 + \sqrt{18}] \sim [17, 16 + \sqrt{18}] \sim [18, \sqrt{18}] \sim [17, 1 + \sqrt{18}] \sim [14, 2 + \sqrt{18}] \sim [9, 3 + \sqrt{18}]$;

A = 19. (17 modules, 2 cycles)
$[1] \sim [6, 5 + \sqrt{19}] \sim [5, 2 + \sqrt{19}] \sim [9, 8 + \sqrt{19}] \sim [9, 1 + \sqrt{19}] \sim [5, 3 + \sqrt{19}] \sim [6, 1 + \sqrt{19}]$,
$[2, 1 + \sqrt{19}] \sim [3, 2 + \sqrt{19}] \sim [10, 7 + \sqrt{19}] \sim [15, 13 + \sqrt{19}] \sim [18, 17 + \sqrt{19}] \sim [19, \sqrt{19}] \sim [18, 1 + \sqrt{19}] \sim [15, 2 + \sqrt{19}] \sim [10, 3 + \sqrt{19}] \sim [3, 1 + \sqrt{19}]$;

A = 20. (14 modules, 3 cycles)
$[1] \sim [5, \sqrt{20}]$,
$[2, \sqrt{20}] \sim [8, 6 + \sqrt{20}] \sim [10, \sqrt{20}] \sim [8, 2 + \sqrt{20}]$,
$[20, \sqrt{20}] \sim [19, 1 + \sqrt{20}] \sim [16, 2 + \sqrt{20}] \sim [11, 3 + \sqrt{20}] \sim [4, \sqrt{20}] \sim [11, 8 + \sqrt{20}] \sim [16, 14 + \sqrt{20}] \sim [19, 18 + \sqrt{20}]$;

A = 21. (18 modules, 4 cycles)
$[1] \sim [4, 1 + \sqrt{21}] \sim [7, \sqrt{21}] \sim [4, 3 + \sqrt{21}]$,
$[3, \sqrt{21}] \sim [5, 1 + \sqrt{21}] \sim [12, 9 + \sqrt{21}] \sim [17, 15 + \sqrt{21}] \sim [20, 19 + \sqrt{21}] \sim [21, \sqrt{21}] \sim [20, 1 + \sqrt{21}] \sim [17, 2 + \sqrt{21}] \sim [12, 3 + \sqrt{21}] \sim [5, 4 + \sqrt{21}]$,
$[2, 1 + \sqrt{21}]$,
$[6, 3 + \sqrt{21}] \sim [10, 9 + \sqrt{21}] \sim [10, 1 + \sqrt{21}]$.

Finally, an example that is frequently cited by Gauss.*

A = 79. (51 modules, 6 cycles)
$[1] \sim [2, 1 + \sqrt{79}]$,
$[3, 1 + \sqrt{79}] \sim [14, 11 + \sqrt{79}] \sim [15, 2 + \sqrt{79}] \sim [6, 1 + \sqrt{79}] \sim [7, 4 + \sqrt{79}]$,
$[9, 4 + \sqrt{79}] \sim [13, 1 + \sqrt{79}] \sim [5, 2 + \sqrt{79}] \sim [18, 13 + \sqrt{79}] \sim [25, 23 + \sqrt{79}] \sim [26, 1 + \sqrt{79}] \sim [21, 4 + \sqrt{79}] \sim [10, 7 + \sqrt{79}]$,
$[27, 22 + \sqrt{79}] \sim [35, 32 + \sqrt{79}] \sim [39, 38 + \sqrt{79}] \sim [39, 1 + \sqrt{79}] \sim [35, 3 + \sqrt{79}] \sim [27, 5 + \sqrt{79}] \sim [15, 7 + \sqrt{79}] \sim [30, 23 + \sqrt{79}] \sim [43, 37 + \sqrt{79}] \sim [54, 49 + \sqrt{79}] \sim [63, 59 + \sqrt{79}] \sim [70, 67 + \sqrt{79}] \sim [75, 73 + \sqrt{79}] \sim [78, 77 + \sqrt{79}] \sim [79, \sqrt{79}] \sim [78, 1 + \sqrt{79}] \sim [75, 2 + \sqrt{79}] \sim [70, 3 + \sqrt{79}] \sim [63, 4 + \sqrt{79}] \sim [54, 5 + \sqrt{79}] \sim [43, 6 + \sqrt{79}] \sim [30, 7 + \sqrt{79}] \sim [15, 8 + \sqrt{79}]$,
$[9, 5 + \sqrt{79}] \sim [10, 3 + \sqrt{79}] \sim [21, 17 + \sqrt{79}] \sim [26, 25 + \sqrt{79}] \sim [25, 2 + \sqrt{79}] \sim [18, 5 + \sqrt{79}] \sim [5, 3 + \sqrt{79}] \sim [13, 12 + \sqrt{79}]$,
$[3, 2 + \sqrt{79}] \sim [7, 3 + \sqrt{79}] \sim [6, 5 + \sqrt{79}] \sim [15, 13 + \sqrt{79}] \sim [14, 3 + \sqrt{79}]$.
* Disquisitiones Arithmeticae, §§185, 186, 187, 195, 196, 198, 223. The reason 79 is of interest is that it is the smallest value of $A$ for which the class group contains a square that is not the identity; for example, $[3, 1 + \sqrt{79}]^2 \sim [9, 4 + \sqrt{79}] \not\sim [1]$. Perhaps Gauss's attention was drawn to this case by the fact that it occurs in a counterexample Lagrange gave to a conjecture of Euler [48, Article 84]. Lagrange notes that the problem $79 \cdot \square + 733 = \square$ has a solution (the comparison algorithm gives $[733, 476 + \sqrt{79}] \sim [90, 77 + \sqrt{79}] \sim [1]$) but the problem $79 \cdot \square + 101 = \square$ does not ($[101, 33 + \sqrt{79}] \sim [45, 23 + \sqrt{79}] \sim [9, 4 + \sqrt{79}] \not\sim [1]$), contrary to a conjecture of Euler that would have implied that the answer to "Does $A\square + B = \square$ have a solution?" might, for prime $B$, depend only on the class of $B$ mod $4A$.
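The orbits tabulated above, and the solutions of Pell's equation and of $A\square + B = \square$ derived from them, can be recomputed with a short program. The following is a sketch under stated assumptions, not the author's code: the names `comparison_step`, `cycle_of`, and `solve_to_unit` are hypothetical, a module $[f, g + \sqrt{A}]$ is the pair `(f, g)`, a hypernumber $y + x\sqrt{A}$ is the pair `(y, x)`, and the cutoff `max_steps` is an arbitrary bound chosen for illustration.

```python
def comparison_step(A, f, g):
    """Return (r, f1, g1) for one step of the comparison algorithm."""
    r = (-g) % f
    while r * r <= A:
        r += f
    f1 = (r * r - A) // f
    return r, f1, r % f1


def cycle_of(A, f, g):
    """Follow immediate successors until a module repeats.

    For a stable starting module this is exactly its cycle.
    """
    orbit, cur = [], (f, g)
    while cur not in orbit:
        orbit.append(cur)
        _, f1, g1 = comparison_step(A, *cur)
        cur = (f1, g1)
    return orbit


def solve_to_unit(A, f, g, max_steps=200):
    """Apply formula (2) until the successor [1] is first reached.

    Returns (y, x) with [y + x sqrt(A)][f, g + sqrt(A)] = [f], or None
    if [1] is not reached within max_steps steps.
    """
    y, x, den, cur = 1, 0, 1, (f, g)
    for _ in range(max_steps):
        r, f1, g1 = comparison_step(A, *cur)
        # multiply the running product by r + sqrt(A)
        y, x = r * y + A * x, y + r * x
        if (f1, g1) == (1, 0):
            assert y % den == 0 and x % den == 0
            return y // den, x // den
        den *= f1  # accumulates f_1 f_2 ... f_{k-1}
        cur = (f1, g1)
    return None


if __name__ == "__main__":
    print(cycle_of(13, 1, 0))
    print(solve_to_unit(13, 1, 0))    # (649, 180)
    print(solve_to_unit(79, 1, 0))    # (80, 9)
    print(solve_to_unit(79, 21, 10))  # (89, 10)
    print(solve_to_unit(79, 21, 11))  # (10, 1)
    print(solve_to_unit(79, 21, 4))   # None
```

Starting from $[1]$, `solve_to_unit` returns the smallest nontrivial solution of Pell's equation; starting from $[B, p + \sqrt{A}]$ it returns the first solution attached to the square root $p$, or `None` when the cutoff is reached without meeting $[1]$, as happens for the square root 4 of 79 mod 21.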

Essay 3.4 Multiplication of Modules and Module Classes


In this essay, the semigroup of modules and the class semigroup are examined more closely. In the semigroup of modules, modules can be decomposed as products of their "p-parts" where $p$ ranges over the primes, and in this way the semigroup can be described quite fully, except for the crucial problem of determining the primes $p$ mod which $A$ is a square, which are the primes for which $[p]$ is a product in which neither factor is $[1]$. This question is the subject of Essay 3.5.

The structure of the class semigroup depends on more subtle considerations, and general statements are harder to come by. The theorem that is proved in this essay and that is used in the next to prove the law of quadratic reciprocity merely describes the subgroups of index 2 of the class group in a few simple cases (namely, the cases in which $A$ is an odd prime or twice an odd prime or a product of two primes that are congruent mod 4).

The computation of products of modules comes down to the computation of products $[f, g + \sqrt{A}][F, G + \sqrt{A}]$ (where $g^2 \equiv A \bmod f$ and $G^2 \equiv A \bmod F$), because multiplication of a module by $[e]$ is easy. The following theorem determines these products when $f$ and $F$ are relatively prime:

Theorem 1. If $f$ and $F$ are relatively prime, $[f, g + \sqrt{A}][F, G + \sqrt{A}] = [fF, z + \sqrt{A}]$, where $z$ is determined by $z \equiv g \bmod f$ and $z \equiv G \bmod F$.

Proof. By the Chinese remainder theorem,* the congruences $z \equiv g \bmod f$ and $z \equiv G \bmod F$ determine a unique $z$ mod $fF$, so the formula $[fF, z + \sqrt{A}]$ determines a module. The desired product is $[fF, f(G + \sqrt{A}), F(g + \sqrt{A}), (g + \sqrt{A})(G + \sqrt{A})]$, which is $[fF, f(z + \sqrt{A}), F(z + \sqrt{A}), gG + A + (g + G)\sqrt{A}]$, because $fG \equiv fz \bmod fF$ and $Fg \equiv Fz \bmod fF$. If $\beta f = \alpha F + 1$, then $z + \sqrt{A}$ is the difference of $\beta f(z + \sqrt{A})$ and $\alpha F(z + \sqrt{A})$, so the desired product is $[fF, z + \sqrt{A}, gG + A + (g + G)\sqrt{A}]$. Since $A \equiv z^2 \bmod f$ and $A \equiv z^2 \bmod F$ and since $f$ and $F$ are relatively prime, $A \equiv z^2 \bmod fF$; moreover, $(z - g)(z - G) \equiv 0 \bmod fF$, so $A + gG \equiv z(g + G) \bmod fF$, which implies that $gG + A$ in the last term can be replaced by $z(g + G)$, and the desired product becomes $[fF, z + \sqrt{A}, (g + G)(z + \sqrt{A})] = [fF, z + \sqrt{A}]$, as was to be shown.

Given a prime $p$ and a module $[f, g + \sqrt{A}]$, let the p-part of $[f, g + \sqrt{A}]$ be the module $[p^n, g + \sqrt{A}]$, where $n$ is the number of times $p$ divides $f$. (If $p$ does not divide $f$, the p-part of $[f, g + \sqrt{A}]$ is $[1]$ by this definition.) By Theorem 1, every module is the product of its p-parts, and one can find the product of $[f, g + \sqrt{A}]$ and $[F, G + \sqrt{A}]$ by finding the product of their p-parts for each prime divisor $p$ of $fF$, which reduces to finding the product of the p-parts for the primes $p$ that divide both $f$ and $F$. In short, the computation of $[f, g + \sqrt{A}][F, G + \sqrt{A}]$ can be done prime by prime. For all but a very few primes, the needed products are given by the three propositions that follow.

* See Essay 3.2.
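Theorem 1 reduces the product of two modules with relatively prime first entries to a Chinese remainder computation. The sketch below illustrates this; the function name `product_coprime` is hypothetical, modules are again pairs `(f, g)`, and the modular inverse is taken with Python's built-in three-argument `pow` (negative exponents require Python 3.8 or later).

```python
from math import gcd


def product_coprime(A, f, g, F, G):
    """Return (f*F, z) with z = g (mod f) and z = G (mod F).

    By Theorem 1 this pair represents [f, g + sqrt(A)][F, G + sqrt(A)]
    when f and F are relatively prime.
    """
    if gcd(f, F) != 1:
        raise ValueError("Theorem 1 requires f and F to be relatively prime")
    # Solve g + f*t = G (mod F) for t, using the inverse of f mod F.
    t = (G - g) * pow(f, -1, F) % F
    z = (g + f * t) % (f * F)
    assert (z * z - A) % (f * F) == 0  # z is a square root of A mod f*F
    return f * F, z


if __name__ == "__main__":
    # The four square roots of 79 mod 21 from the roots mod 3 and mod 7.
    for g in (1, 2):
        for G in (4, 3):
            print((3, g), (7, G), "->", product_coprime(79, 3, g, 7, G))
```

With $A = 79$ the four combinations of the square roots $\pm 1$ mod 3 and $\pm 3$ mod 7 give the second entries 4, 10, 11, and 17, the four square roots of 79 mod 21 used in Essay 3.3.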



Proposition 1. Let $p$ be an odd prime that does not divide $A$. If $A$ is not a square mod $p$, there are no modules $[f, g + \sqrt{A}]$ in canonical form other than $[1]$ in which $f$ is a power of $p$. If $A$ is a square mod $p$, there are exactly two modules $[p^n, g + \sqrt{A}]$ in canonical form for each $n > 0$, and they are the $n$th powers of those in which $n = 1$. The product of the two in which $n = 1$ is $[p]$.

Thus, a product of the form $[p^m, g + \sqrt{A}][p^n, G + \sqrt{A}]$ can be found by writing it as $[p, g + \sqrt{A}]^m [p, G + \sqrt{A}]^n$ and observing that the theorem implies that their product is $[p, g + \sqrt{A}]^{m+n}$ if $[p, g + \sqrt{A}] = [p, G + \sqrt{A}]$, and is $[p]^m [p, G + \sqrt{A}]^{n-m}$ or $[p]^n [p, g + \sqrt{A}]^{m-n}$ in the obvious way if $[p, g + \sqrt{A}] \ne [p, G + \sqrt{A}]$.

Proof. Because a polynomial of degree $n$ with integer coefficients that is not zero mod $p$ has at most $n$ roots mod $p$, $A$ has at most two square roots mod $p$. Because $-g$ is a square root of $A$ mod $p$ whenever $g$ is, and because $g$ and $-g$ are different mod $p$ when this is the case (if they were the same, then $2g$ would be zero mod $p$ so $4A \equiv (2g)^2 \equiv 0 \bmod p$, contrary to hypothesis), $A$ has either no square roots mod $p$ or exactly two. When it has two, and when $g < p$ is one of them, the product $[p, g + \sqrt{A}][p, p - g + \sqrt{A}]$ is $[p^2, p(p - g + \sqrt{A}), p(g + \sqrt{A}), pg - g^2 + A + p\sqrt{A}]$. The third term minus the second is $2pg$ mod $p^2$ and $[p^2, 2pg] = [p]$ (again, because the square of $2g$ is not divisible by $p$ and $p$ is prime), so the first term $p^2$ can be replaced by $p$, and the module is equal to $[p]$.

If, for some $n > 1$, $g_{n-1}$ is a square root of $A$ mod $p^{n-1}$, then there is a unique square root of $A$ mod $p^n$ that is congruent to $g_{n-1}$ mod $p^{n-1}$, because the formula $(g_{n-1} + \beta p^{n-1})^2 \equiv A \bmod p^n$ implies $2\beta g_{n-1} \equiv (A - g_{n-1}^2)/p^{n-1} \bmod p$, which determines $\beta$ mod $p$ because $2g_{n-1}$ is relatively prime to $p$. Therefore, for each value of $n$ and for each square root $g$ of $A$ mod $p$ there is exactly one square root of $A$ mod $p^n$ that is $g$ mod $p$; call it $g_n$.

If $[p, g + \sqrt{A}]^{n-1} = [p^{n-1}, g_{n-1} + \sqrt{A}]$, then the same formula holds with $n$ in place of $n - 1$, because $[p, g + \sqrt{A}]^n = [p^{n-1}, g_{n-1} + \sqrt{A}][p, g + \sqrt{A}] = [p^{n-1}, g_n + \sqrt{A}][p, g_n + \sqrt{A}] = [p^n, p(g_n + \sqrt{A}), g_n^2 + A + 2g_n\sqrt{A}] = [p^n, p(g_n + \sqrt{A}), 2g_n(g_n + \sqrt{A})] = [p^n, g_n + \sqrt{A}]$ (because $[p, 2g_n] = [1]$). Thus $[p, g + \sqrt{A}]^n$ is the unique module $[p^n, G + \sqrt{A}]$ in canonical form in which $G \equiv g \bmod p$, as was to be shown.

Proposition 2. If $p$ is a prime that divides $A$ once but not twice, the only module $[p^n, g + \sqrt{A}]$ in canonical form in which $n > 0$ is $[p, \sqrt{A}]$; the square of $[p, \sqrt{A}]$ is $[p]$.

Proof. If $g$ were a square root of $A$ mod $p^n$ for $n > 1$, then $p$ would divide $g^2$ and therefore would divide $g$ itself, so $p^2$ would divide $g^2 \equiv A \bmod p^2$, contrary to the assumption that $A \not\equiv 0 \bmod p^2$. Therefore, the only module $[p^n, g + \sqrt{A}]$ in canonical form in which $n > 0$ is $[p, \sqrt{A}]$ because $g^2 \equiv A \equiv 0 \bmod p$ implies $g \equiv 0 \bmod p$. The square of this module is $[p^2, p\sqrt{A}, A] = [p]$, because $[p^2, A] = [p]$ by assumption.
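The lifting step in the proof of Proposition 1 (solve $2\beta g_{n-1} \equiv (A - g_{n-1}^2)/p^{n-1} \bmod p$ for $\beta$) is effectively an algorithm for the numbers $g_n$. A minimal Python sketch follows; the function name `lift_square_root` is an assumption made for illustration, and the code only computes $g_n$, not the module products themselves.

```python
def lift_square_root(A, p, g, n):
    """Return g_n < p**n with g_n**2 = A mod p**n and g_n = g mod p.

    Assumes p is an odd prime that does not divide A and g*g = A mod p.
    """
    if p == 2 or A % p == 0 or (g * g - A) % p != 0:
        raise ValueError("need an odd prime p not dividing A and g*g = A mod p")
    g_k = g % p
    for k in range(2, n + 1):
        q = p ** (k - 1)
        # (A - g_k^2) is divisible by q, so the floor division is exact.
        # Solve 2*beta*g_k = (A - g_k^2)/q mod p for beta.
        beta = ((A - g_k * g_k) // q) * pow(2 * g_k, -1, p) % p
        g_k = g_k + beta * q
    return g_k
```

With $A = 2$, $p = 7$, $g = 3$ the first lift gives $g_2 = 10$, and indeed $10^2 = 100 \equiv 2 \bmod 49$.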



Proposition 3. If $A$ is odd, the modules in canonical form $[2^n, g + \sqrt{A}]$ with $n > 0$ are as follows:

(i) When $A \equiv 3 \bmod 4$, $[2, 1 + \sqrt{A}]$ is the only one; its square is $[2]$.

(ii) When $A \equiv 5 \bmod 8$, there are three: $\alpha = [2, 1 + \sqrt{A}]$, $\beta = [4, 1 + \sqrt{A}]$, and $\bar\beta = [4, 3 + \sqrt{A}]$. The product of $\alpha$ with any of the three is $[2]\alpha$, whereas $\beta^2 = [2]\bar\beta$, $\bar\beta^2 = [2]\beta$ and $\beta\bar\beta = [4]$.

(iii) When $A \equiv 1 \bmod 16$, let $\alpha = [2, 1 + \sqrt{A}]$, $\beta = [8, 5 + \sqrt{A}]$, and $\bar\beta = [8, 3 + \sqrt{A}]$. The modules in question are $\alpha$ and four infinite sequences of modules $\beta^n/[2^{n-1}]$, $\bar\beta^n/[2^{n-1}]$, $\alpha\beta^n/[2^n]$, and $\alpha\bar\beta^n/[2^n]$ for $n > 0$, these modules being distinct from one another. They can be multiplied using $\alpha^2 = [2]\alpha$ and $\beta\bar\beta = [8]$.

(iv) When $A \equiv 9 \bmod 16$, the answer is given by (iii) when $\beta$ is changed to $[8, 1 + \sqrt{A}]$ and $\bar\beta$ to $[8, 7 + \sqrt{A}]$.

Proof. (i) The square of any odd number is 1 mod 4, so $A \equiv 3 \bmod 4$ implies that no module $[2^n, g + \sqrt{A}]$ is in canonical form when $n > 1$. The square of $[2, 1 + \sqrt{A}]$ is $[4, 2 + 2\sqrt{A}, A + 1 + 2\sqrt{A}] = [4, 2 + 2\sqrt{A}, A - 1]$, which is $[2]$ because $[4, A - 1] = [2]$.

(ii) Similarly, the square of any odd number is 1 mod 8, so $A \equiv 5 \bmod 8$ implies that there are no modules $[2^n, g + \sqrt{A}]$ in canonical form in which $n > 2$. The multiplication formulas that are given are easy to verify. For example, $[4, 1 + \sqrt{A}]^2 = [16, 4(1 + \sqrt{A}), 1 + A + 2\sqrt{A}]$; twice the third term minus the second is $2 + 2A - 4 = 2(A - 1) \equiv 8 \bmod 16$, so the 16 in the first term can be replaced by 8, after which the second term can be dropped because it can be expressed in terms of the first and third, resulting in $[8, 1 + A + 2\sqrt{A}] = [2][4, 3 + \sqrt{A}]$.

(iii) and (iv) Let $A$ be 1 mod 8. For each $n \ge 1$, there is a unique solution $g_n < 2^{n+2}$ of $g_n^2 \equiv A + 2^{n+2} \bmod 2^{n+3}$ for which $g_n \equiv 1 \bmod 4$. In fact, $g_1 = 5$ is determined by these conditions when $A \equiv 1 \bmod 16$, and $g_1 = 1$ is determined by them when $A \equiv 9 \bmod 16$. For $n > 1$, knowledge of $g_{n-1}$ enables one to find $g_n$ in the following way. The congruence $h^2 \equiv A + 2^{n+2} \bmod 2^{n+3}$ of which $g_n$ is a root implies, because $g_{n-1}^2 \equiv A + 2^{n+1} \bmod 2^{n+2}$, that $h^2 - g_{n-1}^2 \equiv 2^{n+1} \bmod 2^{n+2}$. If $h \equiv 1 \equiv g_{n-1} \bmod 4$, then $h + g_{n-1}$ is divisible by 2 but not 4, so the congruence $(h + g_{n-1})(h - g_{n-1}) \equiv 2^{n+1} \bmod 2^{n+2}$ implies that $h - g_{n-1}$ is divisible by $2^n$ but not $2^{n+1}$. Thus $h \equiv g_{n-1} + 2^n \bmod 2^{n+1}$. In short, if the conditions on $g_n$ can be met, then $g_n \equiv g_{n-1} + 2^n \bmod 2^{n+1}$. The two possible values $g_{n-1} \pm 2^n$ of $g_n$ mod $2^{n+2}$ determined in this way have squares that differ by $2^{n+2}$ mod $2^{n+3}$, so the condition $g_n^2 \equiv A + 2^{n+2} \bmod 2^{n+3}$ is satisfied by exactly one of them.

The sequence $g_1, g_2, g_3, \ldots$ determined in this way describes the modules $\beta^n/[2^{n-1}] = [2^{n+2}, g_n + \sqrt{A}]$, as can be seen as follows: By the very definition of $g_1$, $\beta = [8, g_1 + \sqrt{A}]$ in both (iii) and (iv). What is to be shown, then, is that $[8, g_1 + \sqrt{A}][2^{n+1}, g_{n-1} + \sqrt{A}] = [2][2^{n+2}, g_n + \sqrt{A}]$. The product on the left side is $[2^{n+4}, 8(g_{n-1} + \sqrt{A}), 2^{n+1}(g_1 + \sqrt{A}), g_1 g_{n-1} + A + (g_1 + g_{n-1})\sqrt{A}]$. Because $g_{n-1} \equiv g_n + 2^n \bmod 2^{n+1}$, one has $8g_{n-1} \equiv 8g_n + 2^{n+3} \bmod 2^{n+4}$,



so the second term can be changed to $8(g_n + \sqrt{A}) + 2^{n+3}$. Similarly, the third term can be changed to $2^{n+1}(g_n + \sqrt{A}) + 2^{n+3}$. (Use $g_1 \equiv g_2 + 4 \bmod 8$ and $g_2 \equiv g_n \bmod 8$.) Finally, the congruences $g_n \equiv g_1 + 4 \bmod 8$ and $g_n \equiv g_{n-1} + 2^n \bmod 2^{n+1}$ imply $(g_n - g_1)(g_n - g_{n-1}) \equiv 2^{n+2} \bmod 2^{n+3}$, which combines with $A \equiv g_n^2 + 2^{n+2} \bmod 2^{n+3}$ to give $A + g_1 g_{n-1} \equiv g_n(g_{n-1} + g_1) \bmod 2^{n+3}$, and the product $[8, g_1 + \sqrt{A}][2^{n+1}, g_{n-1} + \sqrt{A}]$ can be written as

$[2^{n+4},\ 8(g_n + \sqrt{A}) + 2^{n+3},\ 2^{n+1}(g_n + \sqrt{A}) + 2^{n+3},\ (g_1 + g_{n-1})(g_n + \sqrt{A}) + 2^{n+3}a]$,

where $a = 0$ or 1. The difference between $\frac{g_1 + g_{n-1}}{2}$ times the second term and 4 times the last term is $2^{n+3}$ mod $2^{n+4}$, because $\frac{g_1 + g_{n-1}}{2}$ is odd. Thus the first term can be changed to $2^{n+3}$, and the desired product can be expressed as $[2^{n+3}, 8(g_n + \sqrt{A}), 2^{n+1}(g_n + \sqrt{A}), (g_1 + g_{n-1})(g_n + \sqrt{A})]$. The third term is a multiple of the second and can be dropped. Because $[8, g_1 + g_{n-1}] = [2]$, this brings $[8, g_1 + \sqrt{A}][2^{n+1}, g_{n-1} + \sqrt{A}]$ to the desired form $[2 \cdot 2^{n+2}, 2(g_n + \sqrt{A})]$.

The sequence $\bar g_1, \bar g_2, \bar g_3, \ldots$ defined by $\bar g_n \equiv -g_n \bmod 2^{n+2}$ satisfies the same conditions as the sequence $g_1, g_2, g_3, \ldots$, except that the condition $g_n \equiv 1 \bmod 4$ is replaced by $\bar g_n \equiv 3 \bmod 4$. Therefore, the same argument gives $\bar\beta^n/[2^{n-1}] = [2^{n+2}, -g_n + \sqrt{A}] = [2^{n+2}, \bar g_n + \sqrt{A}]$.

For $n \ge 1$ there are four modules of the form $[2^{n+2}, g + \sqrt{A}]$, one for each of the four square roots of $A$ mod $2^{n+2}$, which are $\pm g_n$ and $\pm g_n + 2^{n+1}$. The first two have been accounted for. The remaining two are $\alpha\beta^{n+1}/[2^{n+1}]$ and $\alpha\bar\beta^{n+1}/[2^{n+1}]$, as follows from the observation that the last term in

$[2, 1 + \sqrt{A}][2^{n+3}, \pm g_{n+1} + \sqrt{A}] = [2^{n+4},\ 2(\pm g_{n+1} + \sqrt{A}),\ 2^{n+3}(1 + \sqrt{A}),\ \pm g_{n+1} + A + (1 \pm g_{n+1})\sqrt{A}]$

can be changed to $(1 \pm g_{n+1})(\pm g_{n+1}) + 2^{n+3} + (1 \pm g_{n+1})\sqrt{A}$, which is $2^{n+3}$ plus a multiple of the second term; therefore, the first term can be replaced with $2^{n+3}$, so that the module becomes $[2^{n+3}, 2(\pm g_{n+1} + \sqrt{A})] = [2][2^{n+2}, \pm g_n + 2^{n+1} + \sqrt{A}]$, as was to be shown. When $n = 0$ the result is $[2][4, \pm 1 + \sqrt{A}]$, which accounts for the two modules $\alpha\beta/[2] = [4, 1 + \sqrt{A}]$ and $\alpha\bar\beta/[2] = [4, 3 + \sqrt{A}]$ and completes the proof.

These three propositions cover all products $[f, g + \sqrt{A}][F, G + \sqrt{A}]$ except those in which some prime $p$ divides both $f$ and $F$ and divides $A$ twice. In particular, if $A$ is square-free, they describe all products.

The description of products of equivalence classes of modules is in principle much easier, because the number of equivalence classes is finite, so one can simply compile a multiplication table showing all products of all pairs of classes. However, multiplication tables are rarely very enlightening. More insight into the class semigroup is obtained by considering specific features. In particular, as Gauss's work in Section 5 of Disquisitiones Arithmeticae showed, enough information about the class semigroup on which to base a proof of quadratic reciprocity is provided by analyzing the classes that are ambiguous in the sense defined below.

Lemma. If $[f, g + \sqrt{A}]$ is stable and if $[f_1, g_1 + \sqrt{A}]$ is its successor in the comparison algorithm, then $[f, f - g + \sqrt{A}]$ is the successor of $[f_1, f_1 - g_1 + \sqrt{A}]$ in the comparison algorithm.



Proof. By definition, $[f_1, g_1 + \sqrt{A}]$ is determined by $f f_1 = r^2 - A$ and $g_1 \equiv r \bmod f_1$ where $r$ is the least solution of $r + g \equiv 0 \bmod f$ whose square is greater than $A$. Similarly, the successor of $[f_1, f_1 - g_1 + \sqrt{A}]$ is $[f', g' + \sqrt{A}]$ where $f' f_1 = r_1^2 - A$, $g' \equiv r_1 \bmod f'$, and $r_1$ is the least solution of $r_1 \equiv g_1 \bmod f_1$ whose square is greater than $A$. It was shown in Essay 3.3, in the proof that the comparison algorithm permutes stable modules, that $r$ is the least number in its class mod $f_1$ whose square is greater than $A$. Thus, since $r_1$ and $r$ are both $g_1$ mod $f_1$, $r_1 = r$, which implies $f' = f$ and $g' \equiv r_1 \equiv -g \bmod f$, as was to be shown.

Thus, if $[f, g + \sqrt{A}]$ is stable, the cycle of $[f, f - g + \sqrt{A}]$ contains the modules obtained by changing $g$ to $f - g$ in the modules in the cycle of $[f, g + \sqrt{A}]$, but they are traversed in the opposite direction. In particular, if $[f, f - g + \sqrt{A}] \sim [f, g + \sqrt{A}]$ holds for one module in a cycle it holds for all. When this is the case, the cycle is called ambiguous.

Let $M_1, M_2, \ldots, M_\nu$ be the modules in an ambiguous cycle in the order given them by the comparison algorithm. Let the definition of $M_i$ be extended to all integers $i$ by setting $M_i = M_j$ whenever $i \equiv j \bmod \nu$, and, for $M_i = [f_i, g_i + \sqrt{A}]$, let $\bar M_i$ denote $[f_i, f_i - g_i + \sqrt{A}]$. Call a stable module $M_i$ pivotal if (1) $\bar M_i = M_i$ or (2) $\bar M_i = M_{i-1}$.

An ambiguous cycle contains exactly two pivotal modules (unless it contains just one module), as can be seen as follows: Let the given cycle be ambiguous and let $\mu \ge 0$ satisfy $\bar M_0 = M_\mu$. By the lemma, $\bar M_1 = M_{\mu-1}$, $\bar M_2 = M_{\mu-2}$, and so forth. If $\mu > 1$ renumber the modules in the cycle by setting $M'_i = M_{i+1}$. Then $\bar M'_i = \bar M_{i+1} = M_{\mu-i-1} = M'_{\mu-i-2}$, so the renumbering of the modules has the effect of reducing $\mu$ by 2. In this way, $\mu$ can be successively reduced until it is 0 or 1.

If $\mu = 0$, then $\bar M_i = M_{-i}$ for each $i$, which is to say $\bar M_i = M_{\nu-i}$, so $M_i$ is pivotal of type (1) if and only if $i \equiv -i \bmod \nu$ and pivotal of type (2) if and only if $i \equiv 1 - i \bmod \nu$. Thus, when $\nu = 2\sigma$ for $\sigma > 0$ the only pivotal modules are $M_0$ and $M_\sigma$ (because 0 and $\sigma$ are the only numbers $i$ less than $\nu$ for which $2i \equiv 0 \bmod 2\sigma$, and $2i \equiv 1 \bmod 2\sigma$ has no solutions at all) and when $\nu = 2\tau + 1$ the only pivotal modules are $M_0$ and $M_{\tau+1}$. Similarly, if $\mu = 1$, then $\bar M_i = M_{\nu+1-i}$, so $M_i$ is pivotal of type (1) if and only if $i \equiv 1 - i \bmod \nu$ and pivotal of type (2) if and only if $i \equiv 2 - i \bmod \nu$, from which it follows that there are two pivotal modules when $\nu = 2\sigma$ for $\sigma > 0$, namely, $M_1$ and $M_{\sigma+1}$, and two pivotal modules when $\nu = 2\tau + 1$, namely, $M_1$ and $M_{\tau+1}$. That there are two pivotal modules unless $\nu = 1$ follows from the observation that the modules given by these formulas are distinct unless $\nu = 1$.

Theorem 2. (a) If $A$ is an odd prime, there are four pivotal modules, $[1, \sqrt{A}]$, $[2, 1 + \sqrt{A}]$, $[A, \sqrt{A}]$, and $[\frac{A-1}{2}, 1 + \sqrt{A}]$ (except that in the cases $A = 3$ and $A = 5$ the last of these coincides with one of the first two to form a cycle of length 1).

(b) If $A$ is twice an odd prime, say $A = 2p$, there are four pivotal modules, $[1, \sqrt{A}]$, $[2, \sqrt{A}]$, $[p, \sqrt{A}]$, and $[A, \sqrt{A}]$.



(c) If $A$ is a product of two odd primes, say $A = pq$, where $p < q$, there are eight pivotal modules, $[1, \sqrt{A}]$, $[p, \sqrt{A}]$, $[q, \sqrt{A}]$, $[A, \sqrt{A}]$, $[2, 1 + \sqrt{A}]$, $[2p, p + \sqrt{A}]$, $[\frac{A-1}{2}, 1 + \sqrt{A}]$ and $[\frac{q-p}{2}, p + \sqrt{A}]$ (except that the last of these coincides with one of the others to form a cycle of length 1 if $q - p = 2$ or 4).

Thus, in cases (a) and (b) there are two ambiguous cycles, and in case (c) there are four.

Proof. (a) A module $[f, g + \sqrt{A}]$ in canonical form is pivotal of type (1) if and only if $g \equiv -g \bmod f$. When this is the case, $2g \equiv 0 \bmod f$ and $4A \equiv (2g)^2 \equiv 0 \bmod f$; that is, $f$ is a factor of $4A$. Since $A$ is an odd prime, $f = 1$, 2, 4, or $A$, because $f \le A$ in a stable module. Since $f = 4$ would imply $2g \equiv 0 \bmod 4$ and $g \equiv 0 \bmod 2$, it is impossible in view of $g^2 \equiv A \bmod f$. Therefore, the only possible pivotal modules of type (1) in this case are $[1, \sqrt{A}]$, $[2, 1 + \sqrt{A}]$, and $[A, \sqrt{A}]$, all of which are clearly pivotal of type (1). If $[f, g + \sqrt{A}]$ is pivotal of type (2), then its predecessor in the comparison algorithm is $[f, f - g + \sqrt{A}]$, and the $r$ that determines the step from $[f, f - g + \sqrt{A}]$ to $[f, g + \sqrt{A}]$ satisfies $f^2 = r^2 - A$. Thus, $A = (r - f)(r + f)$, which implies that $r - f = 1$ and $r + f = A$ and therefore that $f = \frac{A-1}{2}$, $r = \frac{A+1}{2}$. Since the new $g$ is congruent to $r$ mod $f$, and since $r \equiv 1 \bmod f$, the only possible pivotal module of type (2) is $[\frac{A-1}{2}, 1 + \sqrt{A}]$, which is in fact pivotal of type (2).

(b) As in the proof of (a), if $[f, g + \sqrt{A}]$ is pivotal of type (1), then $4A \equiv 0 \bmod f$ and $f \le A$, so $f = 1$, 2, 4, $p$, or $2p$. Again, $f = 4$ is impossible because it would imply $A \equiv 0 \bmod 4$, so the only pivotal modules of type (1) are the ones listed in (b). There are no pivotal modules of type (2) because $A \equiv 2 \bmod 4$, so $r^2 - f^2 = A$ is impossible.

(c) If $f$ is a factor of $4A$ less than or equal to $A$, then $f = 1$, 2, 4, $p$, $q$, $2p$, $2q$, or $A = pq$. Since $[2q, q + \sqrt{A}]$ is not stable (both $q^2$ and $(2q - q)^2$ are $q^2 > pq = A$), the only possible pivotal modules of type (1) are $[1]$, $[2, 1 + \sqrt{A}]$, $[p, \sqrt{A}]$, $[q, \sqrt{A}]$, $[2p, p + \sqrt{A}]$, and $[A, \sqrt{A}]$, all of which are indeed pivotal of type (1). Those that are pivotal of type (2) satisfy $A = (r - f)(r + f)$ as in the proof of (a), where $r$ is the number used by the comparison algorithm to go to the pivotal module from its predecessor. Thus, either $r - f = 1$ and $r + f = A$ or $r - f = p$ and $r + f = q$. In the first case, $r \equiv 1 \bmod f$, so the module must be $[\frac{A-1}{2}, 1 + \sqrt{A}]$, and in the second case, $r \equiv p \bmod f$, so the module must be $[\frac{q-p}{2}, p + \sqrt{A}]$, which completes the list.
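The successor rule quoted in the proof of the Lemma ($f f_1 = r^2 - A$, $g_1 \equiv r \bmod f_1$, with $r$ the least solution of $r + g \equiv 0 \bmod f$ whose square exceeds $A$) is easy to experiment with. The following Python sketch assumes that rule is the whole step of the comparison algorithm, stores $[f, g + \sqrt{A}]$ as the pair `(f, g)`, and uses hypothetical helper names; the full definition of the algorithm and of stability is in Essay 3.3 and is not reproduced here.

```python
def successor(A, f, g):
    """One step of the comparison algorithm applied to [f, g + sqrt(A)].

    r is the least number with r + g = 0 mod f and r*r > A; the successor
    is [f1, g1 + sqrt(A)] with f*f1 = r*r - A and g1 = r mod f1.
    Assumes A is not a square and g*g = A mod f.
    """
    r = (-g) % f
    while r * r <= A:
        r += f
    f1 = (r * r - A) // f
    return (f1, r % f1)


def cycle(A, f, g):
    """Follow successors from (f, g) until a module repeats; return the cycle."""
    seen = []
    m = (f, g)
    while m not in seen:
        seen.append(m)
        m = successor(A, *m)
    return seen[seen.index(m):]


def conjugate(m):
    """[f, g + sqrt(A)] -> [f, f - g + sqrt(A)], second entry reduced mod f."""
    f, g = m
    return (f, (f - g) % f)


def is_ambiguous(c):
    """A cycle is ambiguous when it contains the conjugate of each of its modules."""
    return all(conjugate(m) in c for m in c)
```

For $A = 7$, `cycle(7, 7, 0)` returns the five modules `(7, 0), (6, 1), (3, 2), (3, 1), (6, 5)`; the conjugate of `(3, 1)` is its predecessor `(3, 2)`, so $[3, 1 + \sqrt{7}] = [\frac{A-1}{2}, 1 + \sqrt{A}]$ is pivotal of type (2), in agreement with Theorem 2(a).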

The complications that arise in the cases $A \equiv 1 \bmod 4$ of Proposition 3 stem from the fact that in these cases the class semigroup is not a group, which is to say that there are classes without inverses. For example, the square of $[2, 1 + \sqrt{5}]$ is $[2][2, 1 + \sqrt{5}]$, but $[2, 1 + \sqrt{5}]$ constitutes a cycle of length 1 and is therefore not equivalent to $[1]$. Therefore, the class of $\alpha = [2, 1 + \sqrt{5}]$ has no inverse because $\alpha\gamma \sim [1]$ would imply $\alpha = \alpha[1] \sim \alpha^2\gamma = [2]\alpha\gamma \sim \alpha\gamma \sim [1]$. A module whose class is invertible in the class semigroup is called primitive.



The class group for a number $A$, not a square, consists of the invertible elements of the class semigroup, those classes whose elements are primitive. For any given $A$, the class semigroup is found by finding the cycles of stable modules. One can then find the class group by using the following theorem to determine which cycles contain primitive modules:

Theorem 3. Let $[e][f, g + \sqrt{A}]$ be a module in canonical form, and let $d$ be the greatest common divisor of $f$, $2g$, and $\frac{g^2 - A}{f}$. If $d = 1$, then $[f, g + \sqrt{A}][f, f - g + \sqrt{A}] = [f]$; in particular, $[e][f, g + \sqrt{A}]$ is primitive, because the class of $[f, f - g + \sqrt{A}]$ is inverse to its class. If $d > 1$, $[e][f, g + \sqrt{A}]$ is not primitive. In particular, if $A \equiv 1 \bmod 4$, the module $[2, 1 + \sqrt{A}]$ is not primitive, because in this case $d = 2$.

Proof. By direct computation

$[f, g + \sqrt{A}][f, f - g + \sqrt{A}]$
$= [f^2,\ -fg + f\sqrt{A},\ fg + f\sqrt{A},\ fg - g^2 + A + f\sqrt{A}]$
$= [f^2,\ -2fg,\ fg + f\sqrt{A},\ g^2 - A]$
$= [f]\,[f,\ 2g,\ \frac{g^2 - A}{f},\ g + \sqrt{A}]$
$= [f][d, g + \sqrt{A}]$,

where use is made of the first term $f^2$ to compute with the coefficients of the other terms as numbers mod $f^2$. If $d = 1$, this product is $[f]$, which proves the first statement.

Now, $[d, g + \sqrt{A}][f, g + \sqrt{A}] = [df, d(g + \sqrt{A}), g^2 + A + 2g\sqrt{A}] = [df, d(g + \sqrt{A}), A - g^2]$ (subtract $2g/d$ times the second term from the third, computing mod $df$) $= [d][f, g + \sqrt{A}]$ (because $g^2 \equiv A \bmod df$). Therefore $[d, g + \sqrt{A}][f, g + \sqrt{A}] = [d][f, g + \sqrt{A}] \sim [f, g + \sqrt{A}]$, which shows that if $[f, g + \sqrt{A}]$ is primitive, then $[d, g + \sqrt{A}] \sim [1]$, which is to say that repeated application of the comparison algorithm to $[d, g + \sqrt{A}]$ must eventually reach $[1]$.

But if $d$ divides $f$, $2g$, and $\frac{g^2 - A}{f}$ for any module $[f, g + \sqrt{A}]$ in canonical form (as is the case with the module $[d, g_1 + \sqrt{A}]$ when $g_1$ is the smallest number congruent to $g$ mod $d$, because $g_1 + g$ and $g_1 - g$ are both zero mod $d$, so $g_1^2 - A = g^2 - A + (g_1 + g)(g_1 - g) \equiv 0 \bmod d^2$), then $d$ divides $F$, $2G$, and $\frac{G^2 - A}{F}$, where $[F, G + \sqrt{A}]$ is the successor of $[f, g + \sqrt{A}]$ in the comparison algorithm, as can be seen as follows: Let $r = uf - g$ be the number used to find $[F, G + \sqrt{A}]$. Then $fF = r^2 - A = u^2f^2 - 2guf + g^2 - A \equiv 0 \bmod df$, which implies $F \equiv 0 \bmod d$. Moreover, $G = \mu F + r$ for some $\mu$, which gives $2G = 2\mu F + 2uf - 2g \equiv 0 \bmod d$. Finally, $G^2 - A = r^2 - A + 2\mu Fr + \mu^2F^2 = Ff + 2\mu Fr + \mu^2F^2 \equiv Ff + 2\mu FG \equiv 0 \bmod dF$, so $d$ divides $\frac{G^2 - A}{F}$. (Note that $\mu$ may be negative in these congruences, so that the argument is still valid when $G < r$.) Therefore, if $[f, g + \sqrt{A}]$ is primitive, $d$ must divide 1, as was to be shown.
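Theorem 3 turns primitivity into a greatest-common-divisor computation. A small Python sketch of that test is given below, again storing $[f, g + \sqrt{A}]$ as `(f, g)`; the function names are illustrative assumptions, and the absolute value is taken only so that the quotient is a nonnegative integer when $g^2 < A$.

```python
from math import gcd


def primitivity_divisor(A, f, g):
    """Return d = gcd(f, 2g, (g^2 - A)/f) for a module [f, g + sqrt(A)]
    in canonical form (so that g*g = A mod f)."""
    if (g * g - A) % f != 0:
        raise ValueError("g*g must be congruent to A mod f")
    return gcd(gcd(f, 2 * g), abs(g * g - A) // f)


def is_primitive(A, f, g):
    """Theorem 3: the module is primitive exactly when d = 1."""
    return primitivity_divisor(A, f, g) == 1
```

For example, `primitivity_divisor(5, 2, 1)` is 2, matching the remark that $[2, 1 + \sqrt{A}]$ is not primitive when $A \equiv 1 \bmod 4$, whereas `is_primitive(3, 2, 1)` holds.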



Corollary. Elements of the class group whose squares are the identity correspond one-to-one to ambiguous cycles whose modules are primitive.

Deduction. An equivalence class whose square is the identity contains modules whose squares are equivalent to $[1]$; in particular, the modules it contains are all primitive. By the theorem just proved, if $[f, g + \sqrt{A}]$ is a stable module in canonical form that is in such a class, then both $[f, g + \sqrt{A}]$ and $[f, f - g + \sqrt{A}]$ are in the class inverse to the class of $[f, g + \sqrt{A}]$, which shows that the cycle of $[f, g + \sqrt{A}]$ is ambiguous. Conversely, if $[f, g + \sqrt{A}]$ is both primitive and ambiguous, then $[f, g + \sqrt{A}]^2 \sim [f, g + \sqrt{A}][f, f - g + \sqrt{A}] = [f] \sim [1]$.

When this corollary is combined with Theorem 2, it determines the elements of the class group of order 2 in a few cases:

Theorem 4. If $A$ is prime and congruent to 1 mod 4, the class group has no element of order 2. If $A$ is prime and congruent to 3 mod 4, or if $A$ is a product of two primes $A = pq$ for which $p + q \not\equiv 0 \bmod 4$, the class group has a unique element of order 2.

Proof. When $A$ is prime and 1 mod 4, $[2, 1 + \sqrt{A}]$ and $[\frac{A-1}{2}, 1 + \sqrt{A}]$ are not primitive, because the greatest common divisor of $f$, $2g$, and $\frac{g^2 - A}{f}$ is 2 for each of them, so at most two pivotal modules are primitive. The cycle of $[1]$ therefore is the only one that represents an element of the class group whose square is the identity. (The few cases in which there are cycles of length 1 are enumerated in Theorem 2.) Therefore, the class group contains no elements of order 2.

On the other hand, when $A$ is prime and 3 mod 4, or when $A = 2p$ where $p$ is an odd prime, all four of the pivotal modules identified in Theorem 2 are primitive. Therefore, there are two primitive, ambiguous classes. The case $A = pq$ in which $p$ and $q$ are odd primes that satisfy $p \equiv q \bmod 4$ is similar, because the four pivotal modules other than $[1]$, $[p, \sqrt{A}]$, $[q, \sqrt{A}]$, and $[A, \sqrt{A}]$ are not primitive (for all of them, $f$, $2g$, and $\frac{g^2 - A}{f}$ are all even). Thus, in these cases, the class group has just one element of order 2.

Essay 3.4 Multiplication of Modules and Module Classes

The Class Semigroup for Various Values of A (Compare to the table of Essay 3.3)

Value of A   Class group          Representatives c…
2            Trivial group        none
3            Group of order 2     none
5            Trivial group        $[2, 1 + \sqrt{5}]$
6            Group of order 2     none
7            Group of order 2     none
8            Group of order 2     $[2, \sqrt{8}]$
10           Group of order 2     none
11           Group of order 2     none
12           Group of order 2     $[2, \sqrt{12}]$, $[6, \sqrt{12}]$
13           Trivial group        $[2, 1 + \sqrt{13}]$
14           Group of order 2     none
15           Four-group           none
17           Trivial group        $[2, 1 + \sqrt{17}]$
18           Group of order 2     none
19           Group of order 2     none
20           Group of order 2     $[2, \sqrt{20}]$
21           Group of order 2     $[2, 1 + \sqrt{21}]$, $[6, 3 + \sqrt{21}]$
79           Cyclic of order 6    none



Essay 3.5 Is A a Square Mod p?

... eine noch höhere Bedeutung haben sie [die Reciprocitätsgesetze] in der geschichtlichen Entwickelung dieser mathematischen Disciplin [Zahlentheorie] dadurch erlangt, daß die Beweise derselben, so weit sie überhaupt gefunden sind, fast durchgängig aus neuen, bis dahin noch unerforschten Gebieten haben geschöpft werden müssen, welche so der Wissenschaft aufgeschlossen worden sind. (... the reciprocity laws attained an even greater significance in the historical development of number theory by the fact that their proofs, insofar as proofs have been found, had to be sought in areas that were hitherto almost completely unexplored and that in this way were opened to science.)
E. E. Kummer [47, Introduction]

Essay 3.4 gives a description of the semigroup of modules for a given $A$ ($A$ not a square) that is virtually complete except that it leaves untouched the obvious question raised by its Proposition 1: For which odd primes $p$ not dividing $A$ are there modules $[p, g + \sqrt{A}]$ in canonical form? This is the question "What is the value of $\chi_p(A)$?" where, for a given odd prime $p$, $\chi_p$ is defined to be the quadratic character of numbers mod $p$, which is to say that $\chi_p$ is the function that assigns to a number $A$ the value* $-1$ if the congruence $A \equiv x^2 \bmod p$ has no solution $x$, the value 0 if $A \equiv 0 \bmod p$, and the value 1 otherwise. In this way, the evaluation of $\chi_p(A)$ is essential to computation in the semigroup of modules of hypernumbers for a given $A$.

The problem of evaluating $\chi_p(A)$ engaged Euler's interest rather early in his career (see [17]), when he discovered empirically the amazing fact that if $p$ and $q$ are primes that satisfy $p \equiv q \bmod 4A$, then $\chi_p(A) = \chi_q(A)$. In other words, the answer to the question "Is $A$ a square mod $p$?" depends only on the class of the prime $p$ mod $4A$.
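The definition of $\chi_p$ and Euler's empirical observation can both be checked by brute force for small cases. The sketch below is purely illustrative: `chi`, `odd_primes_below`, and `euler_observation_holds` are names chosen here, and the search over all residues is far slower than the methods the essay goes on to develop.

```python
def chi(p, A):
    """Quadratic character of A mod the odd prime p, straight from the definition:
    0 if p divides A, 1 if A = x*x mod p has a solution, -1 otherwise."""
    A %= p
    if A == 0:
        return 0
    return 1 if any((x * x - A) % p == 0 for x in range(1, p)) else -1


def odd_primes_below(n):
    """Odd primes less than n, by trial division."""
    return [m for m in range(3, n, 2)
            if all(m % d for d in range(3, int(m ** 0.5) + 1, 2))]


def euler_observation_holds(A, bound):
    """Check chi_p(A) == chi_q(A) for all odd primes p, q < bound that do not
    divide A and satisfy p = q mod 4A."""
    ps = [p for p in odd_primes_below(bound) if A % p != 0]
    return all(chi(p, A) == chi(q, A)
               for p in ps for q in ps if (p - q) % (4 * A) == 0)
```

For example, 13 and 37 agree mod 12, and `chi(13, 3)` and `chi(37, 3)` are both 1; likewise 5 and 17 agree mod 12 and both give $-1$.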
Euler made many attempts to prove what he had found empirically; in the process, he found refinements and generalizations of the phenomenon, thereby setting much of the agenda for number theory for the next hundred years, but the hoped-for proof eluded him. What Euler knew but couldn't prove about the values of $\chi_p(A)$ implies the law of quadratic reciprocity fairly easily. This law, which is stated below, was put in its usual form by Legendre and was first proved by the young Gauss, who gave two proofs in Disquisitiones Arithmeticae, published in 1801 when he was 24 years old. The second of these uses his theory of composition of binary quadratic forms. The proof that will be given in this essay is inspired by Gauss's second proof, but it will use modules and their multiplication instead of quadratic forms and their composition.

* Use of the "negative number" $-1$ can be avoided by treating the values of $\chi_p$ as numbers mod 4, so that $-1 = 3$, $(-1)^2 = 1$.

The law of quadratic reciprocity is one case of a general formula for $\chi_p(A)$ of the form

(1)    $\chi_p(A) = \sigma_A(p) \prod_{A_i} \chi_{A_i}(p)$

in which $p$ is an odd prime, $A$ is a square-free number that is not divisible by $p$, the product on the right is a product over all odd* prime factors $A_i$ of $A$, and $\sigma_A(p)$ depends only on the classes of $p$ and $A$ mod 8; in fact $\sigma_A(p)$ depends only on the classes of $p$ and $A$ mod 4 when $A$ is odd. The formula for $\sigma_A(p)$ is given at the end of the essay.

Euler's observation that $\chi_p(A) = \chi_q(A)$ when $p \equiv q \bmod 4A$ follows immediately from (1), because $p \equiv q \bmod 4A$ implies $p \equiv q \bmod A_i$ for all odd prime factors $A_i$ of $A$ and implies $p \equiv q \bmod 4$ when $A$ is odd, $p \equiv q \bmod 8$ when $A$ is even.

The derivation of formula (1) is the subject of the present essay. The law of quadratic reciprocity is simply the case in which $A$ is an odd prime. However, the derivation will begin with this special case. It will use two lemmas:

Lemma 1. If $p$ is an odd prime, $\chi_p(p - 1)$ is 1 when $p \equiv 1 \bmod 4$ and $-1$ when $p \equiv 3 \bmod 4$.

Proof. I will use without proof the fundamental fact that for any prime $p$ there is a primitive root $\gamma$ mod $p$, which is to say a number $\gamma$ with the property that every number not divisible by $p$ is congruent to a power of $\gamma$ mod $p$. (See Section 3 of Disquisitiones Arithmeticae, which contains two proofs, the first in Articles 39 and 54, the second in Article 55.) If $\gamma$ is a primitive root mod $p$, then each of the $p - 1$ numbers between 0 and $p$ is congruent mod $p$ to a unique power of $\gamma$ in which the exponent is less than $p$. Since $\gamma^{2\mu} \equiv \gamma^{\lambda} \bmod p$ if and only if $2\mu \equiv \lambda \bmod (p - 1)$, a power $\gamma^{\lambda}$ of $\gamma$ is a square mod $p$ for $\lambda < p$ if and only if $\lambda$ is even. Since the roots $\gamma^{p-1}$ and $\gamma^{(p-1)/2}$ of $x^2 - 1$ mod $p$ coincide with $\pm 1$ mod $p$, $p - 1 \equiv \gamma^{(p-1)/2} \bmod p$, which is a square mod $p$ if and only if $(p - 1)/2$ is even, or in other words, $\chi_p(p - 1) = 1$ if and only if $p \equiv 1 \bmod 4$, as was to be shown.

Let $\chi_4$ denote the function that assigns the value 0 to even numbers, the value 1 to numbers congruent to 1 mod 4, and the value $-1$ to numbers congruent to 3 mod 4. Then Lemma 1 can be stated as $\chi_p(p - 1) = \chi_4(p)$ for any odd prime $p$.
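The steps of this proof can be checked numerically for small primes. The Python sketch below finds a primitive root by exhaustive search (the text takes its existence for granted), then confirms that the squares mod $p$ are exactly the even powers of $\gamma$ and that $\gamma^{(p-1)/2} \equiv p - 1$; the helper name `primitive_root` is an assumption made for illustration.

```python
def primitive_root(p):
    """Smallest gamma whose powers gamma^1, ..., gamma^(p-1) give every
    nonzero class mod the prime p (exhaustive search, small p only)."""
    for gamma in range(1, p):
        if len({pow(gamma, k, p) for k in range(1, p)}) == p - 1:
            return gamma
    raise ValueError("no primitive root found; is p prime?")


def squares_mod(p):
    """The nonzero squares mod p."""
    return {x * x % p for x in range(1, p)}


def even_powers(gamma, p):
    """The powers gamma^k mod p with k even and 0 < k < p."""
    return {pow(gamma, k, p) for k in range(2, p, 2)}
```

For $p = 7$ the search returns $\gamma = 3$, whose even powers 2, 4, 1 are the squares mod 7, and $3^3 \equiv 6 = p - 1$ is not among them, as Lemma 1 predicts for $p \equiv 3 \bmod 4$.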

As is easily proved, $\chi_p(mn) = \chi_p(m)\chi_p(n)$ for all numbers $m$ and $n$ whenever $p$ is prime and also when $p = 4$.

Lemma 2. Given a primitive module and given a number $N$, construct a module $[f, g + \sqrt{A}]$ in canonical form for which (1) $f$ is relatively prime to $N$ and (2) the product of $[f, g + \sqrt{A}]$ and the given module is equivalent to $[1]$.

* Since $\chi_2(p) = 1$ for all odd primes $p$, it makes no difference whether $A_i = 2$ is included in the product in formula (1) when $A$ is even.



Proof. Let $[E][F, G + \sqrt{A}]$ be the given primitive module. Since it is equivalent to $[F, G + \sqrt{A}]$, one may as well assume $E = 1$. Choose $\mu$ large enough that $(\mu F + G)^2 > A$. From $G^2 \equiv A \bmod F$ it follows that $(\mu F + G)^2 - A \equiv 0 \bmod F$, say $HF = (\mu F + G)^2 - A$. For all numbers $\nu$, $(\nu F + \mu F + G)^2 - A$ can be written in the form $F \cdot q(\nu)$, where $q(\nu)$ is a polynomial of degree 2 with coefficients that are numbers, namely, $q(\nu) = F\nu^2 + 2(\mu F + G)\nu + H$. A common divisor $d$ of the coefficients of $q(\nu)$ divides $F$, $2G$, and $H$, so $(\mu F + G)^2 \equiv A \bmod dF$ and $G^2 \equiv A \bmod dF$; thus, $d$ divides $F$, $2G$, and $\frac{G^2 - A}{F}$, which implies $d = 1$ because $[F, G + \sqrt{A}]$ is primitive by assumption. Thus, for any prime $p$, $q(\nu)$ is a nonzero polynomial mod $p$ whose degree is 2 at most. Therefore, $q(\nu)$ has at most 2 roots mod $p$ that are less than $p$. Moreover, since $F$ and $H$ cannot both be even (because at least one of $F$, $2G$, and $H$ is odd), $q(\nu)$ is either odd when $\nu$ is even (when $H$ is odd) or odd when $\nu$ is odd (when $H$ is even).

Let $p_1, p_2, \ldots, p_\sigma$ list the distinct prime divisors of $FN$. For each $p_i$, choose a number $\nu_i < p_i$ for which $q(\nu_i) \not\equiv 0 \bmod p_i$. Use the Chinese remainder theorem to construct a number $\nu < \prod p_i$ such that $\nu \equiv \nu_i \bmod p_i$ for each $i$. Then $q(\nu) \equiv q(\nu_i) \not\equiv 0 \bmod p_i$ for each $i$. In other words, $q(\nu)$ is relatively prime to $FN$. Let $\rho = (\nu + \mu)F + G$ for the $\nu$ chosen in this way. Then $[\rho + \sqrt{A}] = [\rho^2 - A, \rho + \sqrt{A}] = [F \cdot q(\nu), \rho + \sqrt{A}] = [F, \rho + \sqrt{A}][q(\nu), \rho + \sqrt{A}]$ (because $q(\nu)$ and $F$ are relatively prime) $= [F, G + \sqrt{A}][q(\nu), \rho + \sqrt{A}]$ (because $\rho \equiv G \bmod F$). Thus, because $[\rho + \sqrt{A}] \sim [1]$, the module $[q(\nu), \rho + \sqrt{A}]$ has the required properties.

Proposition. If $p$ is an odd prime divisor of $A$, the value of $\chi_p(f)$ [...] $Ax^2$. Since this equation implies that $x \equiv 0 \bmod v$ and $y \equiv zx \bmod vfF$ (see Essay 3.2), the equation can be divided by $v$ to give one of the form $[fF, z + \sqrt{A}] = [y + x\sqrt{A}]$. This equation implies that $x$ is relatively prime to $y$ (any common divisor divides the coefficient of $\sqrt{A}$ in $z + \sqrt{A}$) and therefore that $x$ is relatively prime to $y^2 - Ax^2$, from which it follows that $[y + x\sqrt{A}] = [y^2 - Ax^2, y + x\sqrt{A}] = [y^2 - Ax^2, \rho + \sqrt{A}]$ for some $\rho$ ($x$ is invertible mod $y^2 - Ax^2$). Therefore $fF = y^2 - Ax^2$. Since $A \equiv 0 \bmod p$ and $fF \not\equiv 0 \bmod p$, it follows that $fF \equiv y^2 \not\equiv 0 \bmod p$. Thus $\chi_p(f) = \chi_p(fF^2) = \chi_p(y^2F) = \chi_p(F)$, as was to be shown.

The proof of the analogous theorem for $\chi_4$ in the case $A \equiv 3 \bmod 4$ follows the same steps, except that one needs to prove that if $fF = y^2 - Ax^2$ and $fF$ is odd, then $\chi_4(f) = \chi_4(F)$. This follows easily from the observation that $y$ and $x$ must have opposite parity (because $fF$ is odd), so one of the terms $y^2$ and $-Ax^2$ is 0 mod 4 and the other is 1 mod 4, resulting in $\chi_4(fF) = 1$, from which $\chi_4(f) = \chi_4(F)$ follows.

Theorem. Let $p$ and $q$ be distinct odd primes. If $p \equiv 1 \bmod 4$, then $\chi_q(p) = 1$ implies $\chi_p(q) = 1$. If $p \equiv 3 \bmod 4$, then $\chi_q(p) = 1$ implies $\chi_p(q) = \chi_4(q)$. If $p \equiv q \equiv 3 \bmod 4$, then $\chi_p(q) = -\chi_q(p)$.

Proof. If $\chi_q(p) = 1$, there is a module $[q, g + \sqrt{p}] \ne [1]$. By the proposition, the value of $\chi_p(q)$ depends only on the class of $[q, g + \sqrt{p}]$. The kernel of the homomorphism defined by $\chi_p$ from the class group to the group with two elements is either a subgroup of index two or it is the whole group. When $p \equiv 1 \bmod 4$, Theorem 4 of the preceding essay states that the class group has no element of order two. Therefore, the group has odd order (see Essay 5.2), so it can have no subgroup of index two, which implies that the kernel is the whole group, so $\chi_q(p) = 1$ implies $\chi_p(q) = 1$, as was to be shown.

When $p \equiv 3 \bmod 4$, on the other hand, the class group contains a single element of order two, so the operation of squaring is a two-to-one homomorphism from the class group to itself whose image is a subgroup of index two. This subgroup is necessarily the kernel of the homomorphism determined by $\chi_p$, because this kernel contains all squares but does not contain the class of $[p - 1, 1 + \sqrt{p}]$, because $\chi_p(p - 1) = -1$. The homomorphism determined by $\chi_4$ also has the subgroup of squares as its kernel, because its value for the class of $[p, \sqrt{p}]$ is $\chi_4(p) = -1$. Therefore, these two homomorphisms are identical, and the statement to be proved, that $\chi_q(p) = 1$ implies $\chi_p(q) = \chi_4(q)$, follows.

When $p \equiv q \equiv 3 \bmod 4$ one finds in a similar way that if either $\chi_p(q)$ or $\chi_q(p)$ is 1, then the other is $-1$, but the possibility that both might be $-1$ remains. Consider the class group in the case $A = pq$. By Theorem 4 of the last essay, the class group in this case has a single element of order 2, so the squares form a subgroup of index two, as before, that is obviously



contained in the kernel of the homomorphisms from the class group to the group with two elements that are determined by either $\chi_p$ or $\chi_q$. In fact, since the class of $[pq - 1, 1 + \sqrt{pq}]$ is not in either kernel, these homomorphisms have the same kernel (the subgroup of squares) and are therefore identical. The stable modules $[1]$, $[p, \sqrt{pq}]$, $[q, \sqrt{pq}]$, and $[pq, \sqrt{pq}]$ are partitioned between the two ambiguous, primitive cycles. Since $[1]$ is in the principal cycle and $[pq, \sqrt{pq}]$ is not (one step of the comparison algorithm shows that $[pq, \sqrt{pq}] \sim [pq - 1, 1 + \sqrt{pq}]$), exactly one of $[p, \sqrt{pq}]$ and $[q, \sqrt{pq}]$ is in the kernel of the homomorphism in question, which is to say that exactly one of $\chi_q(p)$ and $\chi_p(q)$ is 1, as was to be shown.

The Law of Quadratic Reciprocity. If $p$ and $q$ are distinct odd primes, then $\chi_p(q) = \chi_q(p)$ unless $p \equiv q \equiv 3 \bmod 4$, in which case $\chi_p(q) = -\chi_q(p)$.

Proof. The last statement is of course part of the previous theorem. Since $\chi_p(q) = -1$ is the negation of $\chi_p(q) = 1$, the statement that $\chi_p(q) = \chi_q(p)$ is the statement that $\chi_p(q) = 1$ if and only if $\chi_q(p) = 1$. When $p \equiv q \equiv 1 \bmod 4$, the theorem proves that $\chi_q(p) = 1$ implies $\chi_p(q) = 1$, and the desired conclusion follows by symmetry. When $p \equiv 1 \bmod 4$ and $q \equiv 3 \bmod 4$, the theorem proves that $\chi_q(p) = 1$ implies $\chi_p(q) = 1$ and that $\chi_p(q) = 1$ implies $\chi_q(p) = \chi_4(p) = 1$, as was to be shown.

Evaluation of $\chi_p(2)$. If $p$ is an odd prime, $\chi_p(2) = 1$ if and only if $p \equiv \pm 1 \bmod 8$.

Proof. Consider the class group in the case $A = 8$. It has two elements, the class of $[7, 1 + \sqrt{8}]$ and the principal class. An odd prime $p$ satisfies $\chi_p(2) = 1$ if and only if it satisfies $\chi_p(8) = 1$, which is true if and only if there is a module $[p, g + \sqrt{8}] \ne [1]$ for some $g$. When this is the case, either $[p, g + \sqrt{8}]$ or $[p, g + \sqrt{8}][7, 1 + \sqrt{8}]$ is principal, which implies (unless $p = 7$) that either $p$ or $7p$ is of the form $y^2 - 8x^2$ and therefore that $p \equiv \pm 1 \bmod 8$.

If $p \equiv 1 \bmod 8$, then either $[8, 1 + \sqrt{p}]$ or $[8, 5 + \sqrt{p}]$ is primitive (because if $(p - 1)/8$ is even, then $|p - 25|/8$ is odd). The homomorphism from the class group to $\pm 1$ determined by $\chi_p$ is trivial (since $p \equiv 1 \bmod 4$, the class group has no element of order 2), so $\chi_p(8) = 1$ and $\chi_p(2) = 1$.

Finally, if $p \equiv 7 \bmod 8$, then the last two of the four pivotal modules $[1, \sqrt{p}]$, $[2, 1 + \sqrt{p}]$, $[p, \sqrt{p}]$, and $[\frac{p-1}{2}, 1 + \sqrt{p}]$ are not principal: $[p, \sqrt{p}]$ because it is equivalent to $[p - 1, 1 + \sqrt{p}]$, for which $\chi_p(p - 1) = -1$, and $[\frac{p-1}{2}, 1 + \sqrt{p}]$ because $\chi_4(\frac{p-1}{2}) = -1$. Therefore $[2, 1 + \sqrt{p}]$ must be principal, which implies that $[2, 1 + \sqrt{p}] = [y + x\sqrt{p}]$, where $2 = y^2 - px^2$. Thus, $\chi_p(2) = \chi_p(y^2) = 1$, as was to be shown.

Let $\chi_8$ denote the function that assigns the value 0 to even numbers, the value 1 to numbers congruent to $\pm 1$ mod 8, and the value $-1$ to numbers congruent to $\pm 3$ mod 8, so that $\chi_p(2)$ is $\chi_8(p)$ for all odd primes $p$.
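Both the law of quadratic reciprocity and the evaluation $\chi_p(2) = \chi_8(p)$ lend themselves to a brute-force sanity check on small primes. The sketch below recomputes $\chi_p$ directly from its definition; the function names are choices made for this illustration, and the check is of course no substitute for the proofs above.

```python
def chi(p, a):
    """Quadratic character of a mod the odd prime p, by exhaustive search."""
    a %= p
    if a == 0:
        return 0
    return 1 if any((x * x - a) % p == 0 for x in range(1, p)) else -1


def chi4(n):
    """0 for even n, 1 for n = 1 mod 4, -1 for n = 3 mod 4."""
    return 0 if n % 2 == 0 else (1 if n % 4 == 1 else -1)


def chi8(n):
    """0 for even n, 1 for n = +-1 mod 8, -1 for n = +-3 mod 8."""
    return 0 if n % 2 == 0 else (1 if n % 8 in (1, 7) else -1)


def reciprocity_holds(p, q):
    """chi_p(q) = chi_q(p), except chi_p(q) = -chi_q(p) when p = q = 3 mod 4."""
    expected = -chi(q, p) if (p % 4 == 3 and q % 4 == 3) else chi(q, p)
    return chi(p, q) == expected
```

For instance, 3 and 7 are both 3 mod 4, and `chi(3, 7)` is 1 while `chi(7, 3)` is $-1$; and `chi(7, 2)` is 1 because $3^2 \equiv 2 \bmod 7$, in agreement with `chi8(7)`.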

Essay 3.5 Is A a Square Mod p?


Evaluation of $\chi_p(A)$. Let $A$ be a square-free number and let $p$ be an odd prime that does not divide $A$. The coefficient $\sigma_A(p)$ in formula (1) at the beginning of the essay is given by

$$\sigma_A(p) = \begin{cases} 1 & \text{if } A \equiv 1 \bmod 4, \\ \chi_4(p) & \text{if } A \equiv 3 \bmod 4, \\ \chi_8(p) & \text{if } A \equiv 2 \bmod 8, \\ \chi_4(p)\chi_8(p) & \text{if } A \equiv 6 \bmod 8. \end{cases}$$

In particular, $\sigma_A(p)$ depends only on the classes of $p$ and $A$ mod 4 when $A$ is odd and only on their classes mod 8 when $A$ is even.

Proof. Since $\chi_p(A) = \prod \chi_p(A_i)$, where the product is over the prime factors $A_i$ of $A$, the evaluations of $\chi_p(q)$ for prime $q$ given above imply that

$$\sigma_A(p) = \chi_4(p)^{\nu}$$

if $A$ is odd or

$$\sigma_A(p) = \chi_8(p)\chi_4(p)^{\nu}$$

if $A$ is even, where $\nu$ is the number of prime factors $A_i$ of $A$ that are 3 mod 4. Because $\chi_4(p)^{\nu}$ is 1 when $\nu$ is even and $\chi_4(p)$ when $\nu$ is odd, the given formulas follow when one observes that an odd $A$ is 3 mod 4 if and only if $\nu$ is odd, and an even $A$ is 6 mod 8 if and only if $\nu$ is odd.
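The table for $\sigma_A(p)$ can be tested against small cases. The sketch below rests on an assumption that is not stated in this passage: that formula (1) has the shape $\chi_p(A) = \sigma_A(p) \prod \chi_q(p)$, the product running over the odd prime factors $q$ of $A$. This is the shape the proof above suggests. All function names are hypothetical.

```python
def chi(p, a):
    """Quadratic character of a mod an odd prime p (Euler's criterion)."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def chi4(n):
    return 0 if n % 2 == 0 else (1 if n % 4 == 1 else -1)

def chi8(n):
    return 0 if n % 2 == 0 else (1 if n % 8 in (1, 7) else -1)

def prime_factors(n):
    """Distinct prime factors of n by trial division."""
    out, f = [], 2
    while f * f <= n:
        if n % f == 0:
            out.append(f)
            while n % f == 0:
                n //= f
        f += 1
    if n > 1:
        out.append(n)
    return out

def is_squarefree(n):
    return all(n % (f * f) for f in range(2, int(n ** 0.5) + 1))

def sigma(A, p):
    """The four-case table for sigma_A(p); A is assumed square-free."""
    if A % 4 == 1:
        return 1
    if A % 4 == 3:
        return chi4(p)
    if A % 8 == 2:
        return chi8(p)
    return chi4(p) * chi8(p)          # remaining case: A = 6 mod 8

def check_sigma(limit=100):
    """Check chi_p(A) = sigma_A(p) * prod chi_q(p) (assumed form of formula (1))."""
    primes = [p for p in range(3, limit) if prime_factors(p) == [p]]
    for A in range(1, limit):
        if not is_squarefree(A):
            continue
        for p in primes:
            if A % p == 0:
                continue
            prod = 1
            for q in prime_factors(A):
                if q != 2:
                    prod *= chi(q, p)
            if chi(p, A) != sigma(A, p) * prod:
                return False
    return True
```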



Essay 3.6 Gauss's Composition of Forms

The structure of Gauss's Disquisitiones Arithmeticae suggests that the original purpose of his theory of composition of forms was to put the law of quadratic reciprocity in a setting that would make it seem clearer and more natural. In the first three sections of the book he introduces the elementary theory of congruences and proves the important theorem that there is a primitive root mod p for every prime p. In Section 4 he goes on to the statement and proof of what he calls the "fundamental theorem," essentially the law of quadratic reciprocity. His proof in Section 4 was described by H. J. S. Smith [60, Part 1, Art. 18] as "repulsive to all but the most laborious students." Perhaps Gauss felt the same way about it, because he gave a second and altogether different proof in Section 5; in later years he gave other proofs, indicating that even then he was not satisfied that he had grasped the true basis of the phenomenon.

It is misleading to think of Section 5 as just one of seven sections of the book, because in number of pages it is more than half of the book. A large part of it is devoted to the theory of "composition" of binary quadratic forms, which is used (Article 262) to prove the "fundamental theorem," but which is also used in the study of ternary forms and is studied for its own sake. Surely "composition" represents an early step in Gauss's quest for the deeper secrets of number theory.

Section 5 had a profound effect on the development of number theory in the 19th century. Kummer's proof of his generalized reciprocity law in midcentury, a proof that was found only after years of intense effort, was directly inspired [47, p. 20 (700)] by Gauss's proof of quadratic reciprocity in Section 5.
But beyond that, Section 5 was fundamental to the development of Dedekind's theory of "ideals" in the second half of the century, and in that way directly influenced the core ideas of modern abstract algebra. Moreover, the use of the structure of the group (without the name) of equivalence classes of binary quadratic forms in Section 5 (together with another implicit use of groups in Section 7) contributed to the development of the theory of groups. But perhaps the profoundest way in which Section 5 affected the development of mathematics lay in the challenge that it presented. Starting with Dirichlet, and continuing with Kummer, Dedekind, Kronecker, Hermite, and countless others, the unwieldy but fruitful theory of composition of forms called forth great efforts of study and theory-building that shaped modern mathematics.

The "forms" in Gauss's theory are binary quadratic forms, which is to say that they are homogeneous polynomials (forms) of degree 2 (quadratic) in 2 variables (binary) with integer coefficients. He used the notation $ax^2 + 2bxy + cy^2$ for such forms and, as this notation indicates, he only considered forms with even middle coefficients. Because I prefer not to impose this restriction, I have chosen different letters altogether and will denote binary quadratic forms by $rx^2 + sxy + ty^2$, where $x$ and $y$ are the variables of the form and $r$, $s$, and $t$ are its integer coefficients. (In this essay and the next, I will do as Gauss


did and use integers instead of the numbers in the narrowest sense that were used in the other essays of Part 3.) I will also write the form as $(r, s, t)$ when the variables can remain unnamed.

The notion of "composition" generalizes Brahmagupta's formula

$$(x^2 - Dy^2)(u^2 - Dv^2) = X^2 - DY^2 \quad \text{when } X = xu + Dyv \text{ and } Y = xv + yu$$

(where $D$ is a fixed integer). Other examples are

$$(x^2 + xy + y^2)(u^2 + uv + v^2) = X^2 + XY + Y^2$$

when $X = xu - yv$ and $Y = xv + yu + yv$, and

$$(16x^2 + 4xy - y^2)(4u^2 + 2uv - v^2) = 4X^2 + 6XY + Y^2$$

when $X = 4xu - 2xv - yu$ and $Y = 4xv + 2yu + yv$. These formulas can be verified by the lengthy but simple process of performing the prescribed substitutions for $X$ and $Y$ on the right and expanding to find that the resulting polynomial in $x$, $y$, $u$, and $v$ is indeed the product of the two polynomials on the left.

In general, given two binary quadratic forms $rx^2 + sxy + ty^2$ and $\rho u^2 + \sigma uv + \tau v^2$ with integer coefficients, a third form $RX^2 + SXY + TY^2$ is transformable into their product (see Gauss's §235) if one can define $X$ and $Y$ to be sums of integer multiples of the monomials $xu$, $xv$, $yu$, and $yv$ in such a way that

$$(rx^2 + sxy + ty^2)(\rho u^2 + \sigma uv + \tau v^2) = RX^2 + SXY + TY^2.$$

A form that is transformable into the product of two forms composes those forms if the six $2 \times 2$ minors of the $2 \times 4$ matrix of coefficients of the expressions of $X$ and $Y$ in terms of $xu$, $xv$, $yu$, and $yv$ that effect the transformation have no common divisor greater than 1. (It is easy to check that the three formulas above are compositions. For example, the last formula shows that $4X^2 + 6XY + Y^2$ composes $16x^2 + 4xy - y^2$ and $4u^2 + 2uv - v^2$ because the minors of the matrix of coefficients

$$\begin{bmatrix} 4 & -2 & -1 & 0 \\ 0 & 4 & 2 & 1 \end{bmatrix}$$

that effects the transformation are 16, 8, 4, 0, $-2$, and $-1$.)

To modern ears, the phrase "composition of forms" suggests a binary operation, assigning a composed form to each pair of given forms, but Gauss's compositions do not conform to this expectation.
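The definition of "composes" can be tested mechanically on the three examples. The following sketch is an illustration only. Forms are triples, and the substitution is given by two hypothetical coefficient rows `P` and `Q` for $X$ and $Y$ relative to the monomials $xu$, $xv$, $yu$, $yv$. Because both sides of the identity have degree at most 2 in each of the four variables, agreement at every point of a grid with five values per variable is enough to establish the polynomial identity.

```python
from functools import reduce
from itertools import combinations, product
from math import gcd

def is_transformable(f1, f2, f3, P, Q, box=range(-2, 3)):
    """Check (r x^2 + s xy + t y^2)(rho u^2 + sigma uv + tau v^2) = R X^2 + S XY + T Y^2
    where X and Y are the integer combinations of xu, xv, yu, yv given by rows P and Q."""
    r, s, t = f1
    rho, sig, tau = f2
    R, S, T = f3
    for x, y, u, v in product(box, repeat=4):
        mono = (x * u, x * v, y * u, y * v)
        X = sum(c * m for c, m in zip(P, mono))
        Y = sum(c * m for c, m in zip(Q, mono))
        lhs = (r * x * x + s * x * y + t * y * y) * (rho * u * u + sig * u * v + tau * v * v)
        if lhs != R * X * X + S * X * Y + T * Y * Y:
            return False
    return True

def minors(P, Q):
    """The six 2 x 2 minors of the 2 x 4 coefficient matrix with rows P and Q."""
    return [P[i] * Q[j] - P[j] * Q[i] for i, j in combinations(range(4), 2)]

def composes(f1, f2, f3, P, Q):
    """True if f3 is transformable into f1 * f2 and the minors have no common divisor > 1."""
    return is_transformable(f1, f2, f3, P, Q) and reduce(gcd, minors(P, Q)) == 1

# Third example of the text: X = 4xu - 2xv - yu, Y = 4xv + 2yu + yv.
P3, Q3 = (4, -2, -1, 0), (0, 4, 2, 1)
```

With these rows, `minors(P3, Q3)` reproduces the list 16, 8, 4, 0, $-2$, $-1$ given in the text.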
On the one hand, there may be no form that composes two given forms, while on the other hand, if some form does compose them, then infinitely many others also do,* because infinitely many others can be obtained by a unimodular change of variables in the composed form. Specifically, a matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ with determinant 1 can be used to define $U$ and $V$ as sums of multiples of $xu$, $xv$, $yu$, and $yv$ by $U = aX + bY$ and $V = cX + dY$; then substitution of $X = dU - bV$ and $Y = -cU + aV$ in the known composition using $X$ and $Y$ produces a composition using $U$ and $V$.

* Both the English and the German translations of the Disquisitiones wrongly translate the theorems of §236 and §249, among others, when they use definite articles rather than indefinite ones; the original Latin of course has no articles.

It seems fair to say that Gauss's theory in its full generality is largely forgotten. When André Weil writes [63, p. 334] that Dirichlet "restored its original simplicity" he is overlooking the fact that Dirichlet composes only certain pairs of forms (pairs that are concordant or einig) and that Dirichlet justified this limitation by shifting all emphasis from the composition of forms to the composition of equivalence classes of forms, thus disregarding Gauss's success in developing the theory in the greatest possible generality. One could certainly argue that in this case Gauss's insistence on generality was excessive (that the purposes to which the theory is put are served just as well by the mere composition of equivalence classes, and that the classification of forms is so natural that the binary operation of composition of equivalence classes is a legitimate subject of study), but Gauss evidently disagreed.

The technical demands of developing the theory in the way Gauss does are indeed formidable. This becomes clear from Gauss's very statement of what amounts to the associative law, not to mention the difficulty of proving the statement that if $F$ composes $f$ and $f'$, if $\mathfrak{F}$ composes $F$ and $f''$, if $F'$ composes $f$ and $f''$, and if $\mathfrak{F}'$ composes $F'$ and $f'$, then $\mathfrak{F}$ and $\mathfrak{F}'$ are properly equivalent (§240, where it is assumed that all forms enter directly into all compositions in the sense defined in Essay 3.7).
The theory of multiplication of modules of hypernumbers resolves this conflict between the wish to preserve the full* generality of Gauss's theory and the wish to avoid its technical difficulties. Kummer's first paper [46, p. 324 (208)] on "ideal prime factors" mentions the possibility of applying his new theory to Gauss's (at least to justify Gauss's belief that the forms $ax^2 + 2bxy + cy^2$ and $ax^2 - 2bxy + cy^2$ should be considered to be inequivalent), but he never laid out the exact relation between the two theories. Similarly, Dedekind was well aware that Gauss's composition of forms was, in essence, the multiplication of modules ("ideals" in his terminology), but he did not develop the correspondence in detail. Nor, as far as I know, has anyone since Dedekind. In all probability, the reason is that it was felt that Gauss's approach was a false start that could be disregarded and replaced with Dedekind's. But such an attitude has the great disadvantage that it destroys the access of modern readers to Gauss's classic.

Essay 3.7 is meant to bridge the gap between the modern theory and Gauss's. On the one hand, modern readers will certainly see that the multiplication of modules of hypernumbers is the multiplication of ideals in quadratic number fields with the added technicality of dealing not with all integers in the field but with what Dedekind called orders (Ordnungen) of integers in the field. On the other hand, as Essay 3.7 shows, multiplication of modules makes possible the complete description of Gauss's compositions in the sense that it solves the problem: Given two binary quadratic forms, determine whether they can be composed, and if so, find all possible compositions. In brief, Gauss himself showed that two forms can be composed if and only if they pertain to the same square-free integer (see Essay 3.7) and showed that knowledge of one composition implies knowledge of all, because any two differ by a unimodular change of variables. Thus, the problem is solved by Proposition 3 of Essay 3.7, which describes how to use multiplication of modules to find an explicit composition of two given binary quadratic forms, provided they pertain to the same square-free integer.

* In truth, Essay 3.7 does not preserve the full generality of Gauss's theory because it ignores forms $(r, s, t)$ for which $s^2 - 4rt$ is a square.



Essay 3.7 The Construction of Compositions

Binary quadratic forms with integer coefficients will be called "forms" in this essay. For simplicity, all forms will be assumed to be irreducible; in other words, forms $(r, s, t)$ whose discriminants $s^2 - 4rt$ are squares are excluded.*

Theorem. Given two forms, construct all forms that compose them.

If $(R, S, T)$ composes $(r, s, t)$ and $(\rho, \sigma, \tau)$, then $(-R, -S, -T)$ composes $(-r, -s, -t)$ and $(\rho, \sigma, \tau)$. Therefore, one can assume without loss of generality that $r$ is nonnegative. Since $s^2 - 4rt$ is not a square, $r$ is positive in this case. Similarly, $\rho$ can be assumed to be positive. The required construction is accomplished by the four propositions that follow.

A form $(r, s, t)$ will be said to pertain to a given square-free integer if its discriminant $s^2 - 4rt$ is a square times that integer. In this way, each form pertains to one and only one square-free integer. (An integer is square-free if it is not divisible by any square greater than 1.) The reducible forms that have been excluded from consideration are simply the forms that pertain to the square-free number 1 together with forms whose discriminant is zero.

Proposition 1. If a form composes two others, then all three pertain to the same square-free number.

Corollary. If two given forms pertain to different square-free integers, then no form composes them.
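The square-free integer to which a form pertains is easy to compute, which makes the corollary a practical first test. The sketch below is only an illustration: the helper names `squarefree_part` and `pertains_to` are hypothetical, and the three forms used in the usage note are the ones from the third example of Essay 3.6.

```python
def squarefree_part(d):
    """Return the square-free integer k such that d is a square times k (d nonzero)."""
    if d == 0:
        raise ValueError("discriminant zero: the form is excluded")
    sign = -1 if d < 0 else 1
    d = abs(d)
    result, f = 1, 2
    while f * f <= d:
        exponent = 0
        while d % f == 0:
            d //= f
            exponent += 1
        if exponent % 2:          # keep primes that occur to an odd power
            result *= f
        f += 1
    return sign * result * d      # whatever remains of d is 1 or a single prime

def pertains_to(form):
    """Square-free integer to which the form (r, s, t) pertains."""
    r, s, t = form
    return squarefree_part(s * s - 4 * r * t)
```

For instance, $(16, 4, -1)$, $(4, 2, -1)$, and $(4, 6, 1)$ have discriminants 80, 20, and 20, so all three pertain to 5, in agreement with Proposition 1. A return value of 1 signals one of the reducible forms excluded above.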

Proof. Given are three forms $(r, s, t)$, $(\rho, \sigma, \tau)$, and $(R, S, T)$ and a substitution

$$X = p_0xu + p_1xv + p_2yu + p_3yv, \qquad Y = q_0xu + q_1xv + q_2yu + q_3yv,$$

in which the $2 \times 2$ minors of the matrix of coefficients have no common divisor greater than 1, whose substitution in $RX^2 + SXY + TY^2$ results in $(rx^2 + sxy + ty^2)(\rho u^2 + \sigma uv + \tau v^2)$. This last statement, when the coefficients of the various monomials $x^2u^2$, $x^2v^2$, ..., $xyuv$ are compared, amounts to nine equations:

(1) $Rp_0^2 + Sp_0q_0 + Tq_0^2 = r\rho,$

(2) $Rp_1^2 + Sp_1q_1 + Tq_1^2 = r\tau,$

(3) $Rp_2^2 + Sp_2q_2 + Tq_2^2 = t\rho,$

(4) $Rp_3^2 + Sp_3q_3 + Tq_3^2 = t\tau,$

(5) $2Rp_0p_1 + S(p_0q_1 + p_1q_0) + 2Tq_0q_1 = r\sigma,$

* Characteristically, Gauss does not exclude reducible forms, as is shown by the point he makes in §235 of avoiding the assumption that the first coefficients of his forms are nonzero.


(6)

2Rpop2 + S{poq2 + P2 0 and £k-\-i > Sk- What is sought, then, are infinite sequences 6o^ î, O2, . • .and 0 < SQ < Si < - - - ioi which all terms of the terminating sequence x{^^ ô(^ — ^-^ + St) = {sy - s^{±s + St) -f {±s + st)^ = s^ {s^ - (±1 +1) + (±1 +1)3) -


4 The Genus of an Algebraic Curve

s^{s^ + 2t lb 3t^ + t^). The term 2t shows that this truncated solution is unambiguous. The continuation of the truncated solution y = :^^/x + • • • can be found using the equation s^ + 2t± 3t^ -h t^ = 0 to express t as a power series in s = y/x and substituting the result in y — ±s -\- st. Consider first the case in which the sign is plus. The relation s^ -\-2t -\- 3t^ -\-1^ — 0 can be written - 1 ^ 3 + {\s^ + IsH' + .. • ) ( - § -l) = -^s^ - Is' - IsH^ - IsH +••• = - | 5 ^ - | 5 ^ - | - | - 5 ^ + | - ^ 5 ^ + --- = - ^ 5 ^ - | s ^ - i s ^ + ---, where theomitted terms all contain 5^^, from which y = s-{-st = s—^s"^— ^s^^ — ^s^^-\ . When 5^ -h 2t + 3t^ + t^ = 0 is changed to s^ -\-2t - St"^ -\-t^ = 0, the corresponding solution is found by changing s to — 5 and t to —t. In summary, the second segment i + 2j = 3 corresponds to two infinite series solutions of y^ — xy-\-x^ = 0; they begin y = =bv^ - -x^ T :^x^V^ - - x ^ + • • •. The infinite series solution y — x'^ -\ corresponding to the first segment 2i + j = 3 calls for computing the polynomial x(s, s^ -\- s^t) = 5^ — 5^(1 + t) + s^(l + tf = s^{-t + s^(l + t)^). The term -t shows that the truncated solution y = x^ is unambiguous. The expansion of y in powers of x can be found by using the relation —t + s^{l-\-t)^ = Oto expand t in powers of s = x and substituting the result in y = s'^ + s^t. Now, ^ = 5^(1 + ^)*^ implies t = = = =

5^(1 + s^{l + tff = 5^(1 + 3s^(l + tf + 3s^(l + tf + s\l + tf) 5^ + 35^(1 + tf + 35^(1 + tf + 5^2(1 + tf s^ + 35^ + 9s^t + ^s't^ + • •. + 35^ -f- l^s^t + ... + s^2 + • • • 5^ + 35^ + 12s^ -h 28s^^ + • • •,

so 7/ = x^ + x^ + 3x^ + 12x^^ + 28x^^ + • • • is the beginning of this infinite series solution of y^ — xy -{- x^ = 0. (Note that the sum of the three series is zero, at least up to the terms in x^, in accord with the fact that the coefficient of y'^ in y^ — xy -{- x^ is zero.) Proof of Theorem 1. A truncated solution oixiîV) at x = a in which m = 1 and /i = 0 is an algebraic number (3Q for which the terms of x(c^ + -5, /?o + ^) of lowest degree in 5 all contain t] since x{^^s^Po+t) contains the term f^ with no s at all, y = /3o is a truncated solution if and only if X(Q; + -^^ /ô) does not contain a term without 5 or, to put it more simply, if and only if x( e(i/ — i) hold. Therefore, for this segment of the polygon, 7 > e. All infinite series solutions p = j{x — OLY^'^ + • • • t h a t correspond to this segment of the polygon are therefore divisible by [x — a)^. As is easily shown, t h e ratio - is smallest for this rightmost segment,* so all solutions p = ^{x — a)^!'^ + • • • are divisible by (x — a ) ^ , as was to be shown. Conversely, if q{xY fails to divide Ci{x) for some z, then {x — o;)^* fails to divide Q ( X ) for some root OL of multiplicity e of q[x) and some index i. Moreover, x(^5 V) was assumed in Essay 4.4 to be irreducible. The series expansions of a reducible polynomial can be found by finding the expansions of its irreducible factors. * What is to be shown is that the ratio cr/r for any segment of the polygon is larger than the ratio cr/r for the segment to its right. Since cr/r is minus the slope of the segment, this is the statement that the slopes of the segments increase as one moves from left to right, which is evident. In actual inequalities, the three endpoints of two successive segments of Newton's polygon, call them (r, jV), (s, js), (t, j t ) , satisfy ar + Tjr

= crs + Tjs (T'(s - r)

Essay 4.5 Determination of the Genus


For such an a the points (i, ji) of the polygon arising from ^{a -\- s,p) = Cy{a + s) + Cy-i{a + s)p-\-... 4-p^ include at least one for which e{u — i)> jili ji — 0 for some i < z/, then ^{a^p) contains a term of degree less than u in p, so this polynomial in p has a nonzero root, call it /3o, and there is a solution p = /ô + • • • of ^{a^p) = 0 that is not divisible by 5 = a: — a, and therefore not divisible by {x — aY. Otherwise, as before, the rightmost segment of the polygon, call it cri-^rj = A:, passes through (i/, 0) and at least one other point of the form {i,ji). At least one point (i^ji) lies below the line j = e(z/ — i) of slope —e passing through (z^, 0); since all points (i^ji) lie on or above any segment of the polygon, the rightmost segment j = ^{u — i) must he under the line j = e{v — i) for i < u. Thus, ^ < e, so no solution p — 7(x — a)^!'^ -h • • • arising from this segment of the polygon is divisible by (x — a)^, and the proof is complete. Thus, in a proper fraction r{x^y)/q{x) that is integral over x^ the coefficients of r{x^y) satisfy a homogeneous system of linear equations, so the most general such fraction can be written as a linear combination of a finite number of them, say of î, ^2, • • •, Cfc, with rational coefficients. When these elements ^1, ^2, • • •, Â: together with 1, ^, 2/^, • • •, y^~^ are taken as input to the following algorithm of Kronecker ([39, §7]), the algorithm produces an integral basis of the root field of xi^^ v) ^s described in the statement of the theorem. Construction of an Integral Basis Input: Elements ^1, ?/2, . . . , ? / / of the root field of x(^, y) integral over x that span the elements integral over x in the sense that each element integral over X can be expressed in the form X]i=i 4î{^)yi where the coefficients (j)i{x) are polynomials in x with rational coefficients. (At the outset, I — n-\-k^ and the coefficients of the î can be taken to be rational numbers.) 
Algorithm: As long as the number $l$ of elements in the spanning set is greater than $n$, carry out the following operations. Consider the $l \times l$ symmetric matrix $[\operatorname{tr}_x(y_iy_j)]$ and consider its symmetric $n \times n$ minor determinants, that is, those $n \times n$ minor determinants in which the indices of the $n$ columns selected coincide with those of the $n$ rows selected. Each such minor determinant is a polynomial in $x$ with rational coefficients because all of its entries are. Rearrange $y_1$, $y_2$, ..., $y_l$, if necessary, to make the first such minor (the one formed by selecting the first $n$ rows and columns) nonzero and of degree no greater than that of any other nonzero symmetric $n \times n$ minor. Then the first $n$ entries of $y_1$, $y_2$, ..., $y_l$ are linearly independent over $\mathbf{Q}(x)$, which means that each remaining entry $y_{n+1}$, $y_{n+2}$, ..., $y_l$ can be expressed as a sum of multiples of the first $n$ in which the multipliers are rational functions of $x$. Each multiplier in each of these expressions can be written as a polynomial in $x$ plus a proper rational function of $x$, one in which the degree of the numerator is less than the degree of the denominator. Let polynomial multiples of the first $n$ of the $y$'s be subtracted from the later $y$'s in order to make the multipliers in the representations of the later $y$'s in terms of the first $n$ all proper rational functions. Delete any $y$'s that have become zero as a result of these subtractions, rearrange the list again, and repeat.

Output: A list $y_1$, $y_2$, ..., $y_n$ of just $n$ elements integral over $x$ that span, over $\mathbf{Q}[x]$, the set of all elements integral over $x$.

The operations of the algorithm (rearrange the $y$'s, delete zeros, and subtract one $y$ times a polynomial in $x$ with rational coefficients from another $y$) do not change the condition satisfied by the original set of $y$'s that they span the elements integral over $x$ when coefficients that are polynomials in $x$ with rational coefficients are used. An argument like the one above that proves that $D(x)$ is a common denominator of the elements integral over $x$ proves that each iteration of the algorithm reduces the degree of the determinant of the first $n \times n$ symmetric minor. Specifically, if, after the multipliers in the representations of $y_{n+1}$, $y_{n+2}$, ..., $y_l$ as sums of multiples of $y_1$, $y_2$, ..., $y_n$ have been reduced so that they are proper rational functions, and after zeros have been deleted, there are more than $n$ items in the list, then one of the coefficients (say the coefficient of $y_1$) in the representation of $y_{n+1}$ is a nonzero proper fraction, call it $\frac{p(x)}{q(x)}$, where $\deg p < \deg q$. The symmetric $n \times n$ minor for any selection of $n$ indices is a polynomial. As before, $M_1 = \left(\frac{p(x)}{q(x)}\right)^2 M_0$ when $M_1$ is the minor in which the selected indices are 2, 3, ..., $n + 1$ and $M_0$ is the one in which they are 1, 2, ..., $n$. Thus, $q(x)^2M_1 = p(x)^2M_0$, which shows that $\deg M_1 < \deg M_0$. Thus, the minor of least degree has degree less than $\deg M_0$, and $\deg M_0$ decreases with each step, as was to be shown. In this way, the algorithm continues to reduce the degree of the first $n \times n$ minor. By the principle of infinite descent, the algorithm must terminate.
In other words, a stage must be reached at which the list contains only n elements. Clearly, they are an integral basis of the root field. T h e proof of t h e theorem will be completed by a second algorithm, which starts with an integral basis and produces a normal basis. It requires t h a t one also construct an integral basis relative to the parameter u = ^] in other words, it uses a set 21, 2:2, . . . , 2;^ of elements of the root field of x ( ^ , y) with the property t h a t every element of the root field has a unique representation in the form Y2î{^)î^ where the coefficients ipi{x) are rational functions of x, and t h a t t h e element is integral over u = - ii and only if each ipi(x) is a polynomial in ^. T h e algorithm just given can be used to construct such a set 2^1, 2^2, • • • 7 Zn'i simply describe t h e root field as t h e root field of xi{^^^) = x{^^y)/^^î where u = ^^ v — ^ , and A is large enough to make x i a polynomial in u and V. Such an integral basis zi^ Z2, ..., Zn relative to - will be used to determine, given an integral basis yi, 2/2 5 • • •, Vn^ whether the basis yi

y2

Un_



is an integral basis relative to ^, where A^, for each i, is the order of yi at X — oc; that is, A^ is the least integer for which yi/x^^ is integral over ^. Construction of a Normal Basis Input: An integral basis y\^ 1/2, . . . , yn of the root field of x(x, y) relative to X.

Algorithm: Find the orders \i, \2, ..., \n of yi, y2, - - -, yn at x = 00. As long as -^, -^, - - -, - ^ (which is a basis consisting of elements integral over ^) is not an integral basis relative to ^, construct a new integral basis in which one yk is replaced by a new y'j^ whose order A^ at x = 00 is less than Xk in the following way. Write each Zi of an integral basis relative to ^ in the form ^jiîj{x)^, where the ilîj{x) are rational functions of x. By assumption, at least one ipij (x) is not a polynomial in ^. (If all were polynomials in ^ , then each Zi and therefore each element integral over - would be a sum of multiples of the -^ with coefficients that were polynomials in -.) Choose a value of i for which at least one ipij (x) is not a polynomial in ^. Since x^Zi = J2iîj{^)^^~'^^yj ^^ integral over x for sufficiently large u, arid yi, y2, ..., yn is an integral basis, the denominator of ipij (x) is a power of x for each j = 1, 2, ..., n, say ilîj{x) = x^j{x) -h Oj{^), where ^j{x) is a polynomial in X, and Oj{-) is a polynomial in - . By the choice of i, £,j{x) 7^ 0 for at least one j . Let a > 0 be the maximum of the degrees of î{x), ^2(^); • • •; in{x). Among those indices j for which deg^j = a, let k be one for which Xk is as large as possible and set y'j^ = ^CjX^^~^^yj, where Cj is the coefficient of x^ in ^j{x) (which is zero if deg^j ^ a). Output: An integral basis 2/1, 1/2, • • •, ?/n with the property that yi y2 yn QÂi

T' 2

X ^

is an integral basis relative to - . Justification. Replacement of yk with ?/^ gives an integral basis, as is shown by the two formulas y^ = V . CjX^^~^^yj (note that A^ > Xj for all j by the choice of k) and yk = -^y'k — IZj^k ^yj (^^^^ that Ck ^ 0 hy the choice of k). All that is to be shown, then, is that A'^ < A^. To this end, note that ^ 1 ^ = Yli^j~^ ) * ^ 5 where the omitted terms contain ^, ^ , ^ , Multiply by x and use the definition of yj^ to obtain -^ = x- ^ ^ + X ] î (x)* " ^ ' where ^j(^) for each j is x • ^^ l^+i^—, which is a polynomial in ^. Thus, X • -^ is a difference of elements integral over ^, which implies that the order of yj^ at X = oc is at most Xk — I, as was to be shown. Since the algorithm reduces the sum of the A^ at each step, it must terminate by the principle of infinite descent. When it terminates, the integral basis ^ 1 , 2/25 • • • 5 yn is a normal basis, because w = Ylî{^)yi ^^^ order at most u if and only if all coefficients of -^ =J2 u-\i ' ~%; ^^^ polynomials in



$\frac{1}{x}$, which is true if and only if $\deg \phi_i \le \nu - \lambda_i$, and the proof of the theorem is complete.

If $y_1$, $y_2$, ..., $y_n$ is a normal basis of the root field of $\chi(x, y)$, the elements of $\Theta(x^{\nu})$ are those whose representations in the form $\sum \phi_i(x)y_i$ have coefficients $\phi_i(x)$ that are polynomials in $x$, with rational coefficients, of degree at most $\nu - \lambda_i$ for each $i$. When $\nu < \lambda_i$ this condition of course means that $\phi_i(x) = 0$. Therefore, the dimension of $\Theta(x^{\nu})$ as a vector space over $\mathbf{Q}$ is the sum of the numbers $\nu - \lambda_i + 1$ over all indices $i$ for which $\lambda_i \le \nu$. For large $\nu$, then, the dimension of $\Theta(x^{\nu})$ as a vector space over $\mathbf{Q}$ is exactly $(\nu + 1)n - \sum \lambda_i$. At the other extreme, when $\nu = 0$ this dimension (which is the degree of the field of constants $\Theta(x^0)$ as an extension of $\mathbf{Q}$, denoted by $c$ in Essay 4.3) is simply the number of indices $i$ for which $\lambda_i = 0$. In the notation of Essay 4.3, the genus of the root field of $\chi(x, y)$ is $g = n_0\nu - \dim \Theta(x^{\nu}) + 1$ for all sufficiently large $\nu$, where $n_0 = n/c$ and the dimension is the dimension as a vector space over the field of constants, which is the dimension as a vector space over $\mathbf{Q}$ divided by $c$; thus,

$$g = n_0\nu - \frac{1}{c}\left((\nu + 1)n - \sum \lambda_i\right) + 1 = \frac{1}{c}\sum \lambda_i - (n_0 - 1).$$

In particular, when $\mathbf{Q}$ is the field of constants of the root field of $\chi(x, y)$, the genus of the root field is simply

$$\left(\sum \lambda_i\right) - (n - 1),$$

where $n = \deg_y \chi$ and $\lambda_1$, $\lambda_2$, ..., $\lambda_n$ are the orders of the elements $y_1$, $y_2$, ..., $y_n$ of a normal basis of the field. As the discussion of Essay 4.3 already shows, the natural description of the genus uses the field of constants of the root field under consideration instead of the field of rational numbers:

Determination of the Genus. As was just explained, the construction of the theorem gives a basis over $\mathbf{Q}$ of the field of constants of the root field of $\chi(x, y)$, namely, the elements $y_i$ of order zero in a normal basis. When the field $\mathbf{Q}$ is replaced by the (possibly) larger field of constants in the theorem, the construction gives a subset $y_1$, $y_2$, ..., $y_{n_0}$ of the root field of $\chi(x, y)$ and nonnegative integers $\mu_1$, $\mu_2$, ..., $\mu_{n_0}$ with the property that the elements of $\Theta(x^{\nu})$ for any given $\nu$ are precisely those of the form

$$\phi_1(x)y_1 + \phi_2(x)y_2 + \cdots + \phi_{n_0}(x)y_{n_0}$$

where $\phi_i(x)$ is a polynomial of degree at most $\nu - \mu_i$ in $x$ whose coefficients are in the field of constants of the root field of $\chi(x, y)$. Thus, for large $\nu$, the dimension of $\Theta(x^{\nu})$ as a vector space over the field of constants is $\sum_{i=1}^{n_0}(\nu - \mu_i + 1) = n_0\nu - \sum \mu_i + n_0$. By the definition of the genus, this dimension is $n_0\nu - g + 1$, from which it follows that

$$g = \left(\sum \mu_i\right) - (n_0 - 1).$$



Fig. 4.6. The folium of Descartes, $x^3 + y^3 = xy$.

In particular, $\sum \mu_i \ge n_0 - 1$.
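The bookkeeping in this determination of the genus is simple enough to put in a few lines. The sketch below is only an illustration. It takes the orders $\mu_1$, ..., $\mu_{n_0}$ of a normal basis as given (finding them is the real work), and the function names `genus` and `dim_theta` are hypothetical.

```python
def genus(orders):
    """Genus as (sum of the orders mu_i) - (n0 - 1), where n0 = len(orders)."""
    return sum(orders) - (len(orders) - 1)

def dim_theta(orders, nu):
    """Dimension over the field of constants of the elements whose coefficient of y_i
    is a polynomial of degree at most nu - mu_i (zero when nu < mu_i)."""
    return sum(max(0, nu - mu + 1) for mu in orders)

# Orders 0, 1, 1 are the ones found for the folium of Descartes in Example 1.
folium_orders = [0, 1, 1]
```

For large $\nu$ the two functions are related by `dim_theta(orders, nu) == len(orders) * nu - genus(orders) + 1`, which is the defining property of the genus used above.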

Example 1: $\chi(x, y) = y^3 - xy + x^3$ (the folium of Descartes). Multiplication by $y$ is represented by

$$\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -x^3 & x & 0 \end{bmatrix}$$

relative to the basis 1, $y$, $y^2$ of the root field over $\mathbf{Q}(x)$. Therefore, the trace of $y$ is 0. The trace of $y^2$ is the trace of

$$\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -x^3 & x & 0 \end{bmatrix}^2 = \begin{bmatrix} 0 & 0 & 1 \\ -x^3 & x & 0 \\ 0 & -x^3 & x \end{bmatrix},$$

which is $2x$. The trace of $y^3 = xy - x^3$ is $x$ times the trace of $y$ plus $-x^3$ times the trace of 1, which is $x \cdot 0 - x^3 \cdot 3$. Similarly, the trace of $y^4 = xy^2 - x^3y$ is $2x^2$, from which it follows that

$$S = \begin{bmatrix} 3 & 0 & 2x \\ 0 & 2x & -3x^3 \\ 2x & -3x^3 & 2x^2 \end{bmatrix}$$

and

$$D(x) = 12x^3 - 8x^3 - 27x^6 = x^3(4 - 27x^3).$$
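The trace computation can be repeated mechanically. The sketch below is an illustration under stated assumptions: it handles only cubics of the special shape $y^3 = c_1y + c_0$, it represents polynomials in $x$ as integer coefficient lists with the constant term first (the empty list is zero), and every function name is hypothetical. It builds the matrix of traces of $y^{i+j}$ from powers of the multiplication matrix and takes its determinant.

```python
def trim(p):
    """Drop trailing zero coefficients; [] represents the zero polynomial."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def padd(a, b):
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
                 for i in range(n)])

def pmul(a, b):
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return trim(out)

def psub(a, b):
    return padd(a, pmul([-1], b))

def matmul(A, B):
    """Product of two 3 x 3 matrices with polynomial entries."""
    C = [[[] for _ in range(3)] for _ in range(3)]
    for i in range(3):
        for j in range(3):
            for k in range(3):
                C[i][j] = padd(C[i][j], pmul(A[i][k], B[k][j]))
    return C

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    t1 = pmul(a, psub(pmul(e, i), pmul(f, h)))
    t2 = pmul(b, psub(pmul(d, i), pmul(f, g)))
    t3 = pmul(c, psub(pmul(d, h), pmul(e, g)))
    return padd(psub(t1, t2), t3)

def trace_discriminant(c0, c1):
    """Determinant of [tr(y^(i+j))], i, j = 0, 1, 2, for the cubic y^3 = c1*y + c0."""
    M = [[[], [1], []], [[], [], [1]], [c0, c1, []]]   # multiplication by y on 1, y, y^2
    power = [[[1] if i == j else [] for j in range(3)] for i in range(3)]
    traces = []
    for _ in range(5):                                  # traces of y^0, ..., y^4
        traces.append(padd(padd(power[0][0], power[1][1]), power[2][2]))
        power = matmul(power, M)
    S = [[traces[i + j] for j in range(3)] for i in range(3)]
    return det3(S)

# Folium of Descartes: y^3 = x*y - x^3, so c0 = -x^3 and c1 = x.
folium_D = trace_discriminant([0, 0, 0, -1], [0, 1])
```

Here `folium_D` is the coefficient list of $4x^3 - 27x^6$, in agreement with $D(x) = x^3(4 - 27x^3)$.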

The square of the denominator $q(x)$ of an element of the root field integral over $x$ must divide $x^3(4 - 27x^3)$, so $x$ is a common denominator of these integral elements. A proper fraction integral over $x$ must therefore be of the form $\frac{a + by + cy^2}{x}$, where $a$, $b$, and $c$ are rational numbers. By the proposition, and by the fact that $y = \pm\sqrt{x} - \cdots$ and $y = x^2 + \cdots$ are the series expansions of $y$ in fractional powers of $x$, such an expression is integral over $x$ if and only if $a + b(\pm s) + c(\pm s)^2 \equiv 0 \bmod s^2$ and $a + bs^2 + cs^4 \equiv 0 \bmod s$. These conditions hold if and only if $a = b = 0$, so the proper fractions integral over $x$ are the rational multiples of $\frac{y^2}{x}$. Thus, 1, $y$, $\frac{y^2}{x}$ are an integral basis. For this basis, $\lambda_1 = 0$ and $\lambda_2 = 1$. To find the order $\lambda_3$ of $y_3 = \frac{y^2}{x}$ at $x = \infty$, one needs to find the equation of which it is a root, which is the characteristic polynomial of

$$\frac{1}{x}\begin{bmatrix} 0 & 0 & 1 \\ -x^3 & x & 0 \\ 0 & -x^3 & x \end{bmatrix}.$$

This characteristic polynomial is $X^3 - 2X^2 + X - x^3$, so $y_3^3 - 2y_3^2 + y_3 - x^3 = 0$, and

$$\left(\frac{y_3}{x}\right)^3 - \frac{2}{x}\left(\frac{y_3}{x}\right)^2 + \frac{1}{x^2}\left(\frac{y_3}{x}\right) - 1 = 0,$$

which makes it clear that $\lambda_3 = 1$. With $u = \frac{1}{x}$ and $v = \frac{y}{x}$ the equation $v^3 - uv + 1 = 0$ holds. That 1, $v$, $v^2$ is an integral basis of the root field of $v^3 - uv + 1$ follows from the fact that in this case

$$S = \begin{bmatrix} 3 & 0 & 2u \\ 0 & 2u & -3 \\ 2u & -3 & 2u^2 \end{bmatrix},$$

from which

$$D(u) = 4u^3 - 27.$$

Since D{u) is square-free, 1, i;, -z;^ is an integral basis over u. Thus, 1, y, y^ jx is a normal basis, because 1, - , ^ - ^ is the integral basis 1, v^ v^ over u. In this case, then, Q is the field of constants, and the genus is (0 + 1 + l ) - ( 3 - l ) = 0. Example 2: x(x, y^ — y^ ^ x^y -f x (the Klein curve). In this case, D{x) — —4x^ — 27x^, whose only square factor is x^, so again the proper fractions integral over x have the form ^^^V^^V ^ where a, 6, and c are rational numbers. Application of Newton's polygon in the case a = 0 leads easily to three unambiguous truncated solutions oi y^ •\- x^y ^ x — 0, namely, y — ^\fx^ where 7 is a cube root of —1. Substitution oiy — —s^ for y and of s^ for x in a + % + cy^ gives a series divisible by x = 5^ only if a = 6 = c = 0, so \, y^ y^ is an integral basis over x. The orders of the first two are 0 and 2, respectively. The third, call it i22/2 H h 6>no2/no are the representations of two elements h and 9 of K relative to this basis, then tr x{h9) = [h^Si [9] where [h] represents the row matrix whose entries are /ii, ^2 5 • • • 5 hno, and [9] represents the column matrix whose entries are ^1, 6>2, . . . ,

UriQ.

With this notation, to say that h dx is holomorphic is to say that [h] Si [9] is a polynomial of degree at most u — 2 whenever the ith entry 9i of the column matrix [9] is a polynomial whose degree is at most u — fii^ because trx{h9) must be a polynomial in x, while tixi—x^h • ^ ) = — ^^û-2 must be a polynomial in - . (Note that the hi need not be polynomials.) In other words, the row matrix [h]Si has the property that its product with a column matrix [9] is a polynomial of degree at most u — 2 when the ith entry of 9 is a polynomial of degree at most u — fii. If one takes all entries but one of [9] to be zero and that one to be a polynomial of degree v — iii for some large z/, one sees that the ith entry of [h] Si must be a rational function whose product

Essay 4.6 Holomorphic Differentials


with any polynomial of degree z/ — /i^ is a polynomial of degree at most u — 2. Thus, the zth entry of [h]Si must be a polynomial of degree at most /i^ — 2 when iii > 2 and must be zero if fii is 0 or 1. In other words, [h] must have t h e form [c]5';|~''^, where c is a row m a t r i x whose i t h entry is a polynomial in x of degree at most /i^ — 2 with coefficients in K^. (In particular, the i t h entry is zero when /i^ is 0 or 1.) This formula [h] — [c\S^^ completely describes the holomorphic differentials hdx. T h e number of constants in t h e coefficients of t h e entries of [c] is the sum of the numbers fJî — 1 over all values of i for which fii > 0. Since exactly one /i^ is zero (because KQ = 0{x^) consists of all elements 0 i ^ i + 22/2 + ' ' • + ^PnoUno i^ which 0^ = 0 when //^ > 0 and (pi is constant when /i^ = 0), it follows t h a t the number of arbitrary constants in this formula for h is ( ^ Aî) — (ô ~ l)^ which is t h e genus, as was to be shown. T h e proof t h a t the sum of the residues of any differential f{x,y) dx is zero reduces, by virtue of the definition of the sum of t h e residues as t h e sum of t h e residues of the rational differential tr x{f{X'>y))dx^ to the same statement for rational differentials 4 ^ d x , where p{x) and q{x) are polynomials with coefficients in some algebraic number field KQ and q{x) ^ 0. To define the sum of the residues of such a differential, it will be convenient to assume t h a t the denominator q{x) splits into linear factors over KQ^ although, as will be seen, t h e sum of the residues can be expressed rationally in terms of the coefficients of p{x) and q{x) even when this condition is not fulfilled. By t h e m e t h o d of partial fractions, one can see t h a t if q{x) — Y\{^ — oîY^ -> where the ai are distinct constants, then

$$\frac{p(x)}{q(x)} = P(x) + \sum_i \sum_{\sigma=1}^{e_i} \frac{\rho_{i\sigma}}{(x - a_i)^{\sigma}}$$

for a suitable polynomial $P(x)$ and suitable constants $\rho_{i\sigma}$. (One can assume without loss of generality that $\deg p < \deg q$, so that $P(x) = 0$. Multiplication of both sides of the required equation by $q(x) = \prod (x - a_i)^{e_i}$ gives an equation of the form $p(x) = \sum \rho_{i\sigma} A_{i\sigma}(x)$ in which the polynomials $A_{i\sigma}(x)$ have degree less than $k = \deg q$ and depend only on $q(x)$. This gives an inhomogeneous $k \times k$ system of linear equations satisfied by the $k$ required coefficients $\rho_{i\sigma}$. When $p(x) = 0$, these equations have only the trivial solution [Footnote: Multiplication of $\sum_{i} \frac{\phi_i(x)}{(x - a_i)^{\nu_i}} = 0$, where the $a_i$ are distinct algebraic numbers and $\deg \phi_i < \nu_i$ for each $i$, by $\prod_{i} (x - a_i)^{\nu_i}$ gives an equation $\sum_{i} \psi_i(x) = 0$ in which the $\psi_i(x)$ are polynomials. All but one of these polynomials is divisible by $(x - a_1)^{\nu_1}$, so the remaining one must also be divisible by $(x - a_1)^{\nu_1}$, from which it follows that $\phi_1(x)$ must be zero. In the same way, $\phi_i(x) = 0$ for each $i$.], so for any $p(x)$ of degree less than $k$ they have



a unique solution.) The residue at $x = a$ of $\frac{p(x)}{q(x)}\,dx$ is defined to be $\rho_{i1}$, the coefficient of $\frac{1}{x - a_i}$ in the partial fractions expansion of $\frac{p(x)}{q(x)}$, when $a$ is one of the roots $a_i$ of $q(x)$; otherwise, the residue at $x = a$ of $\frac{p(x)}{q(x)}\,dx$ is zero. Note that the residue at $x = a$ of $\left(\frac{p(x)}{q(x)} + \frac{p_1(x)}{q_1(x)}\right)dx$ is the sum of the residues at $x = a$ of $\frac{p(x)}{q(x)}\,dx$ and $\frac{p_1(x)}{q_1(x)}\,dx$. (The partial fractions decomposition of a sum is the sum of the partial fractions decompositions when terms with the same denominators are combined.)

The conventional statement that the sum of the residues of a rational differential is zero assumes that the "residue at $x = \infty$" is included in the sum. In this way, the conventional statement can be seen, as the corollary below shows, as a method of evaluating the sum of the residues of $\frac{p(x)}{q(x)}\,dx$ over all finite values of $a$. This evaluation is in fact quite easy:

Proposition. The sum of the residues at $a$ of a rational differential $\frac{p(x)}{q(x)}\,dx$ over all finite values of $a$ is

$$\lim_{x \to \infty} \frac{x \cdot r(x)}{q(x)} = \left.\frac{u^{e-1}\, r(1/u)}{u^{e}\, q(1/u)}\right|_{u=0} \qquad (e = \deg q),$$

where $r(x)$ is the remainder when $p(x)$ is divided by $q(x)$, where the limit on the left is merely a mnemonic standing for the expression on the right, and the expression on the right denotes the quotient of constants in which the denominator is the leading coefficient of $q$ and the numerator is the coefficient of $x^{e-1}$ in $r(x)$.

Proof. Since the residues of $\left(P(x) + \frac{r(x)}{q(x)}\right)dx$ are the same as those of $\frac{r(x)}{q(x)}\,dx$, one can assume without loss of generality that the quotient $\frac{p(x)}{q(x)}$ in the given differential is a proper fraction; i.e., one can assume $r(x) = p(x)$. The residue of $\frac{\rho\,dx}{(x-a)^e}$ is $\rho$ if $e = 1$ and 0 if $e > 1$, so for fractions of the particular form $\frac{r(x)}{q(x)} = \frac{\rho}{(x-a)^e}$ the residue is given by the formula $\lim_{x\to\infty} \frac{x \cdot r(x)}{q(x)}$. The theorem therefore follows from the observation that if $\frac{r(x)}{q(x)}$ and $\frac{r_1(x)}{q_1(x)}$ are proper fractions, then

$$\frac{x \cdot r(x)}{q(x)} + \frac{x \cdot r_1(x)}{q_1(x)} = \frac{x \cdot r(x)\,q_1(x) + x \cdot r_1(x)\,q(x)}{q(x)\,q_1(x)},$$

so the same is true of their limits as $x \to \infty$, interpreted as in the statement of the theorem. (Note also that $\lim_{x\to\infty} \frac{x \cdot r(x)}{q(x)}$ is unchanged if a common factor is canceled from numerator and denominator.)

Corollary 1. The sum of the residues of $\frac{p(x)}{q(x)}\,dx$ over all finite values of $x$ is minus the residue at $x = \infty$, which residue is by definition the residue at $u = 0$ of

$$\frac{p(1/u)}{q(1/u)}\, d\!\left(\frac{1}{u}\right) = -\frac{1}{u^2} \cdot \frac{p(1/u)}{q(1/u)}\, du.$$

(The expression on the left is a mere mnemonic that takes advantage of the formula d{^) = — ^ of elementary calculus.) Deduction. What is to be shown is that the value of

at 2z = 0 is the residue at ?i = 0 of ^^^^

du=

^-^du.

Since this differential has the form uQ{u)

-du, '

where

Q{u)

is a proper fraction in which $Q(0) \neq 0$, this conclusion follows immediately from the definition.

Corollary 2. When the residue of $\frac{p(x)}{q(x)}\,dx$ at $x = \infty$ is defined as in Corollary 1, the sum of the residues of a rational differential is zero.

These algebraic facts make possible a plausible implicit differentiation of $\chi(x, y) = 0$ and $\theta(x, y, a_1, a_2, \ldots, a_{N-g}) = 0$ that leads to

$$(2) \qquad \sum_{i=1}^{N} h_j(x_i, y_i)\, dx_i = 0 \qquad (j = 1, 2, \ldots, g)$$

when the $dy$'s and $da$'s are eliminated. As before, there is no loss of generality in assuming that $N = n\nu$ for some large $\nu$ and that the $(x_i, y_i)$ are the intersection points $\chi(x_i, y_i) = 0$, $\theta(x_i, y_i) = 0$ for some fixed $\theta = a_1\theta_1 + a_2\theta_2 + \cdots + a_{N-g+1}\theta_{N-g+1}$, where the $\theta_i$ are a basis of $\Theta(x^\nu)$ over $K_0$ and $a_1, a_2, \ldots, a_{N-g+1}$ are fixed constants. In addition, it will be assumed that the chosen $\theta$ is in "general position" in the sense that $x$ is a local parameter at each of the $N$ intersection points $(x_i, y_i)$ and $\theta$ has poles of order $\nu$ at each of the $n$ points where $x = \infty$. Each of the $N$ intersection points $(x_i, y_i)$ implies a pair of differential equations

$$(3) \qquad \begin{aligned} \chi_x\,dx_i + \chi_y\,dy_i &= 0,\\ \theta_x\,dx_i + \theta_y\,dy_i + \theta_1\,da_1 + \theta_2\,da_2 + \cdots + \theta_{N-g+1}\,da_{N-g+1} &= 0, \end{aligned}$$

where subscripts $x$ and $y$ denote partial derivatives, and these partial derivatives are to be evaluated at the point $(x_i, y_i, a_1, a_2, \ldots, a_{N-g+1})$ at which $a_1, a_2, \ldots, a_{N-g+1}$ have the given values that determine the $N$ points $(x_i, y_i)$, and $x_i$ and $y_i$ are the coordinates of one of these points. Elimination of $dy_i$ from the pair of equations (3) gives the single equation $dx_i + Q(\theta_1\,da_1 + \theta_2\,da_2 + \cdots + \theta_{N-g+1}\,da_{N-g+1}) = 0$ in which $Q$ denotes the quotient $\frac{\chi_y}{\theta_x\chi_y - \theta_y\chi_x}$. This quotient is in fact the reciprocal of the derivative of $\theta$ with respect to $x$ (eliminate $dy$ from the equations $\chi_x\,dx + \chi_y\,dy = 0$ and $\theta_x\,dx + \theta_y\,dy = d\theta$, a computation that assumes $x$ is a parameter on the curve at the point in question). Otherwise stated, it is the residue of the differential $\frac{dx}{\theta}$ at this zero of the denominator $\theta$, because it is the value of the quotient $\frac{x - x_i}{\theta}$ at the point $(x_i, y_i)$ where numerator and denominator, taken separately, are both zero. (It is natural to think of this number as a limit, but of course it can be described algebraically as the value of the rational function of $x$ and $y$ when it is put in canonical form, that is, with a numerator in which $y$ has degree less than $n = \deg_y \chi$ and a denominator that is a polynomial in $x$ alone that is relatively prime to the numerator.)

Therefore, if each equation $dx_i + Q(\theta_1\,da_1 + \theta_2\,da_2 + \cdots + \theta_{N-g+1}\,da_{N-g+1}) = 0$ is multiplied by the value $h(x_i, y_i)$ of $h$ at the corresponding point $(x_i, y_i)$ and all $N$ of these equations are added, the result is

$$\sum_i h(x_i, y_i)\,dx_i + C_1\,da_1 + C_2\,da_2 + \cdots + C_{N-g+1}\,da_{N-g+1} = 0,$$

where the coefficient $C_j$ of $da_j$ is the sum over all $N$ zeros of $\theta$ on the curve $\chi = 0$ of $h\theta_j$ times the residue of $\frac{dx}{\theta}$ at that point. It is to be shown that each such coefficient $C_j$ is zero. Since neither $\theta_j$ nor $h\,dx$ has poles for finite $x$, the differential $\theta_j h\,dx/\theta$ has residues for finite $x$ only at the zeros $(x_i, y_i)$ of $\theta$, and these residues are the values at $(x_i, y_i)$ of $\theta_j h$ times the residue of $dx/\theta$ at $(x_i, y_i)$. In short, $C_j$ is the sum of all residues of the differential $\theta_j h\,dx/\theta$ at points where $x$ is finite. Therefore, it is minus the sum of the residues at $x = \infty$ of the differential $\theta_j h\,dx/\theta$.
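The step just taken, replacing a sum of residues at finite points by minus the residue at infinity, rests on the Proposition above for rational differentials. The following is a minimal sketch of that Proposition in exact rational arithmetic, not a construction taken from the text: the function names are hypothetical, polynomials are assumed to be given as coefficient lists with the lowest degree first, and the cross-check `residues_at_simple_roots` assumes that all roots of $q$ are simple and rational, so that each residue is $p(a)/q'(a)$.

```python
from fractions import Fraction

def strip(poly):
    """Drop trailing zero coefficients (lists are lowest degree first)."""
    poly = [Fraction(c) for c in poly]
    while poly and poly[-1] == 0:
        poly.pop()
    return poly

def poly_divmod(p, q):
    """Return (quotient, remainder) of p divided by q over the rationals."""
    p, q = strip(p), strip(q)
    if not q:
        raise ZeroDivisionError("q(x) must be nonzero")
    rem = p[:]
    quot = [Fraction(0)] * max(len(p) - len(q) + 1, 0)
    for k in range(len(p) - len(q), -1, -1):
        factor = rem[k + len(q) - 1] / q[-1]
        quot[k] = factor
        for j, c in enumerate(q):
            rem[k + j] -= factor * c
    return quot, strip(rem)

def sum_of_finite_residues(p, q):
    """Coefficient of x^(e-1) in r(x) divided by the leading coefficient of q."""
    q = strip(q)
    e = len(q) - 1                      # e = deg q
    _, r = poly_divmod(p, q)            # r = remainder of p modulo q
    coeff = r[e - 1] if 1 <= e <= len(r) else Fraction(0)
    return coeff / q[-1]

def evaluate(poly, x):
    return sum(Fraction(c) * Fraction(x) ** k for k, c in enumerate(poly))

def derivative(poly):
    return [k * Fraction(c) for k, c in enumerate(poly)][1:]

def residues_at_simple_roots(p, q, roots):
    """Residue p(a)/q'(a) at each simple root a of q (assumption: simple roots)."""
    dq = derivative(q)
    return [evaluate(p, a) / evaluate(dq, a) for a in roots]

# p(x) = x^4 + 3x^2 + 1, q(x) = (x - 1)(x + 1)(x - 2) = x^3 - 2x^2 - x + 2
p = [1, 0, 3, 0, 1]
q = [2, -1, -2, 1]
print(sum_of_finite_residues(p, q))
print(sum(residues_at_simple_roots(p, q, [1, -1, 2])))
```

The two printed values agree, and only the first computation is "rational in the coefficients": it never needs the roots of $q$, which is the point of the Proposition.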
Since $\theta_j$ has order at most $\nu$ at $x = \infty$ (it is in $\Theta(x^\nu)$) and $\theta$ has order $\nu$ at $x = \infty$ (by assumption), $\theta_j/\theta$ is finite at $x = \infty$, so $\theta_j h\,dx/\theta$ has no pole at $x = \infty$, which implies $C_j = 0$ and $\sum h(x_i, y_i)\,dx_i = 0$, as was to be shown.

Example 1: $\chi(x, y) = y^3 + x^3y + x$ (the Klein curve). As was seen in Essay 4.5, $1$, $y$, $y^2$ are a normal basis over $\mathbf{Q}(x)$ for which the $\mu$'s are 0, 2, 3. Therefore $(h_1 + h_2 y + h_3 y^2)\,dx$ is a holomorphic differential if and only if $[h_1\ h_2\ h_3]\,S = [0\ \ a\ \ bx + c]$, where $a$, $b$, and $c$ are rational numbers and the matrix $S$, which has $\operatorname{tr}(y^{i+j-2})$ in the $i$th row of the $j$th column, is easily found to be

$$S = \begin{bmatrix} 3 & 0 & -2x^3 \\ 0 & -2x^3 & -3x \\ -2x^3 & -3x & 2x^6 \end{bmatrix}.$$

When $c = 1$ and $a = b = 0$, this gives a $3 \times 3$ homogeneous linear system whose solution is $[h_1\ h_2\ h_3] = \frac{1}{4x^9 + 27x^2}\,[4x^6\ \ {-9x}\ \ 6x^3]$. Thus, $h = \frac{4x^6 - 9xy + 6x^3y^2}{4x^9 + 27x^2}$, which can be written more simply as $h = \frac{1}{3y^2 + x^3}$. It is easy to see that the solution in which $b = 1$ and $c = a = 0$ is $x$ times this one, and the solution in which $a = 1$ and $b = c = 0$ is $y$ times this one, which leads to the formula

$$\frac{(c + bx + ay)\,dx}{3y^2 + x^3}$$

for the most general holomorphic differential on this curve. The formula has three parameters $a$, $b$, $c$ because the genus is 3. (For an easier derivation of this formula, see the examples of Essay 4.8.)

Example 2: $\chi(x, y) = y^2 - f(x)$, where $f(x)$ is a polynomial of degree $2n - 1$ or $2n$ with distinct roots (a general hyperelliptic curve). As was seen in Essay 4.5, $1$ and $y$ are a normal basis for which the orders at $x = \infty$ are 0 and $n$. (The matrix $S(x)$ is $\begin{bmatrix} 2 & 0 \\ 0 & 2f(x) \end{bmatrix}$, whose determinant $D(x) = 4f(x)$ has distinct roots, so $1$, $y$ is an integral basis over $x$. Division of $y^2 - f(x) = 0$ by $x^{2n}$ gives $\left(\frac{y}{x^n}\right)^2 - \frac{f(x)}{x^{2n}} = 0$, which, when $v = \frac{y}{x^n}$ and $u = \frac{1}{x}$, is a curve of the same form $v^2 - F(u) = 0$ of which $1$, $v$ is an integral basis over $u$. It follows that $1$, $y$ is a normal basis relative to $x$ in which the order of $y$ at $x = \infty$ is $n$.) Therefore, $(h_1 + h_2 y)\,dx$ is a holomorphic differential if and only if

$$[h_1\ \ h_2] \begin{bmatrix} 2 & 0 \\ 0 & 2f(x) \end{bmatrix} = [0\ \ q(x)],$$

where $q(x)$ is a polynomial of degree at most $n - 2$. Thus,

$$h\,dx = \frac{q(x)\,y\,dx}{2f(x)} = \frac{q(x)\,dx}{2y}$$

is the most general holomorphic differential, where $q(x)$ is a polynomial of degree at most $n - 2$. The genus is $n - 1$.
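The simplification claimed in Example 1 above (the Klein curve) can be checked mechanically. The sketch below assumes that the solution of the linear system is $h = (4x^6 - 9xy + 6x^3y^2)/(4x^9 + 27x^2)$ and verifies that $(3y^2 + x^3)(4x^6 - 9xy + 6x^3y^2) = 4x^9 + 27x^2$ modulo $y^3 + x^3y + x$, which is what $h = 1/(3y^2 + x^3)$ asserts. The representation (a dictionary mapping exponent pairs $(i, j)$ to the coefficient of $x^i y^j$) and the function names are choices made for this illustration only.

```python
from collections import defaultdict

def reduce_mod_klein(poly):
    """Reduce modulo y^3 + x^3*y + x by replacing y^3 with -x^3*y - x."""
    result = defaultdict(int)
    work = dict(poly)
    while work:
        (i, j), c = work.popitem()
        if c == 0:
            continue
        if j < 3:
            result[(i, j)] += c
        else:
            # x^i y^j = x^i y^(j-3) * y^3 = -x^(i+3) y^(j-2) - x^(i+1) y^(j-3)
            work[(i + 3, j - 2)] = work.get((i + 3, j - 2), 0) - c
            work[(i + 1, j - 3)] = work.get((i + 1, j - 3), 0) - c
    return {k: v for k, v in result.items() if v != 0}

def multiply(f, g):
    """Product of two elements of Q[x, y], reduced modulo the Klein curve."""
    prod = defaultdict(int)
    for (i1, j1), c1 in f.items():
        for (i2, j2), c2 in g.items():
            prod[(i1 + i2, j1 + j2)] += c1 * c2
    return reduce_mod_klein(prod)

chi_y = {(0, 2): 3, (3, 0): 1}                    # 3y^2 + x^3
numerator = {(6, 0): 4, (1, 1): -9, (3, 2): 6}    # 4x^6 - 9xy + 6x^3 y^2
discriminant = {(9, 0): 4, (2, 0): 27}            # 4x^9 + 27x^2

print(multiply(chi_y, numerator) == discriminant)  # True
```

The same two functions can be used to confirm that multiplying the numerator by $x$ or by $y$ gives the numerators of the other two basic solutions, up to reduction modulo the curve.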



Essay 4.7 The Riemann-Roch Theorem

Dedekind and Weber say in their classic treatise that the Riemann-Roch theorem, in its usual formulation, determines the number of arbitrary constants in a function with given poles [14, §28]. Indeed, that is exactly the way Roch himself formulated the theorem [58], as his title "On the Number of Arbitrary Constants in Algebraic Functions" indicates. The answer, a formula for the dimension of the vector space of rational functions with (at most) given poles, is a corollary of the theorem of this essay, which describes the principal parts of rational functions on an algebraic curve.

Let $f(x, y)$ be a rational function on a curve $\chi(x, y) = 0$, say $f(x, y) = p(x, y)/q(x)$, where $p$ and $q$ are polynomials with integer coefficients and $f$ is regarded as an element of the root field of $\chi(x, y)$. The principal parts of $f$ at finite values of $x$ are, by definition, the terms with negative exponents in the expansions of $f$ in powers of $x - \alpha$ for algebraic numbers $\alpha$. Such expansions are obtained by applying Newton's polygon to expand $y$ in $n = \deg_y \chi$ ways in (possibly fractional) powers of $x - \alpha$, substituting these expansions in $p(x, y)$, and multiplying the result by the expansion of $1/q(x)$ in powers of $x - \alpha$; they can contain negative powers of $x - \alpha$ only if the expansion of $1/q(x)$ does, which is to say, only if $\alpha$ is a root of $q(x)$. The principal parts of $f(x, y)$ thus amount simply to a list of the roots $\alpha$ of $q(x)$ and, for each of them, a list of the terms, if any, with negative exponents in the $n$ series found by substituting expansions of $y$ in powers of $x - \alpha$ in $f(\alpha + (x - \alpha), y)$. One can define the principal parts of $f$ at $x = \infty$ as the principal parts at $u = 0$ when $u = \frac{1}{x}$, but for the sake of simplicity this essay will deal only with rational functions that are finite at $x = \infty$, so that there are no principal parts at $x = \infty$.
Specifically, the only functions considered will be those of the form f{x^y) — p{x^y)/q{x)^ where p{x^y) is in 0{x^) for v — degg. Expansion of numerator and denominator of f{x,y) = (^\/Jl in powers of 1/x then gives a quotient of power series in 1/x in which neither numerator nor denominator contains terms with negative exponents (the numerator is integral over 1/x) and the denominator is not zero when 1/x = 0, so t h e expansions of /(a;, y) in powers of - contain no terms with negative exponents. For values a of x at which xi^^v) — 0 ramifies—which is to say t h a t at least one of the expansions of y in powers oix — a involves fractional powers— the principal parts of a function satisfy an obvious consistency requirement, namely, since one solution ?/ = /ô + A-^ + /^2 1. In particular, this sum is not a polynomial in x. The bilinear form "the trace of the product" from K x K to Q{x) is described, relative to the integral basis zi, 2:2, • • •, z^hy diU n x n symmetric matrix S of polynomials in x with integer coefficients, namely, the matrix whose entry in the zth row of the j t h column is tr x{ziZj). To say that f dx is holomorphic for finite x means simply that all entries of [f]S are polynomials in X when [/] denotes the row matrix whose entries are the coefficients that represent / in the integral basis 2:1, 2:2, . . . , Zn- If this were the case, the sum of the expansions of / ^ , which is [/]/S'[c] where [c] contains the coefficients of 0 = J2îî ^s above, would also be a polynomial in x. Since it is not, f dx must not be holomorphic for finite x, which completes the proof of Lemma 1. Theorem. Let z be a parameter in the root field of x(^, y) ctnd let ^ be the element of the root field defined using implicit differentiation as above. A differential f dx is holomorphic if and only if f^dz is holomorphic.

Essay 4.8 The Genus Is a Birational Invariant


Proof. The reciprocal oi ~ is ^ , so it will suffice to prove that '^fdx is holomorphic" implies '^f^dz is holomorphic." Let ^ be a given algebraic number and let an embedding of K in A (a) be given that carries z to 6 -{- a^ for some / / > 0. It is to be shown that if h dx is holomorphic, then the image oi h - ^ under this embedding contains no terms in which the exponent on z — b \s less than or equal to —1, or, what is the same, that all exponents in the expansion oi [z — 6) • h • ^ are positive. Assume first that the image of x under the given embedding has no terms in which the exponent on a is negative; say it is a + a'a + a"o-'^ + • • •. In this case, let m be the exponent of the first nonzero term in the expansion of X — a. (There is such a term because x is not a constant.) When an mth root ei of the reciprocal of a^^^ is adjoined to A, if necessary, the following lemma constructs a substitution a = eis + 62^^ + 635"^ + • • • that carries X = a + a(^)cr"^ + •. • to a + 5"^. Lemma 2. Given a nonzero power series AmX'^-\-Ajn+ix'^'^^Âm-{-2X^^'^-^ ''' in which the coefficients are algebraic numbers and the first nonzero term contains x to the power m > 0, and given an mth root Ci of I/Am, construct an infinite series x — C\S-\-C2S^-\rCzs'^-\-• • - with algebraic number coefficients whose substitution in the series results in s ^ . Proof Substitution of x = Cis + C2S^ + Css^ + • • • in A ^ x ^ + A^+ix^+^ + A m + 2 X ^ + 2 + • • • gives BmS"^ + ^ m + l ^ ^ ^ ^ ' + 5 m + 2 5 " ^ + ' + ' ' ' w h c r C Bm

=

AmCr = 1, 5m+i = mAmCr~^C2 + Am+iC^^\ . . . . The formula for Bm+i when z > 0 contains the terms mAmC]^~^Ci-î and Am+iC'^^'^] the remaining terms in the formula constitute a polynomial in Ci, C2, . . . , C^ and A ^ , A ^ + i , . . . , Ajnjî-i with integer coefficients. Thus, the requirement Bm+i — 0 for i > 0 is the statement that C^+l is a polynomial in Ci, (72, . . . , Ci and Am, ^m+1, . •., Am+i divided by mAmC^'^ = m/Ci. Since Am = l/Cf", it follows that each successive C^+i can be expressed rationally in terms of Ci, Am+i, Am+2, • • •, Am+i' The series Cis -\- €28^ + Cs^^ + • • • constructed in this way has the required property. Because the given embedding K —^ A (a) followed by the substitution cr = eis + 625^ H carries x io a-\- s^^ and because h dx is holomorphic, the resulting embedding K -^ A{s) carries /i to a series in s in which no term has an exponent less than or equal to — m on s. Otherwise stated, all exponents in the expansion of (x — a) • /i in powers of s are positive. Since this expansion is found by substituting the expansion of a in powers of s into the expansion oi {x — a) ' h in powers of cr, it follows that all exponents in the expansion of {x — a) • h in powers of a are positive. Let this expansion be multiplied by the expansion of ^ • ^ 5 ^ in powers of a. On the one hand, the result is {z — 6)'h-^. On the other hand, if (/)(x, z) = 0 is the equation satisfied by x and z, then (f) [a-{- a^^'â^ + • • •, (5 -h cr^) is identically zero, so differentiation with respect to a gives (l)x{x, z){ma^'^'>a'^~^ +

176


• • •) + (pzix, z)ii(j^^~^ — 0, where x and z stand for their expansions as power series in a and the omitted terms are divisible by cr'^. Multiplication by a then gives (l)x (x, z) (ma^^^ (x — a) -h • • •) + 0;^(x, z)iji{z — S)=0, where the omitted terms are divisible by cr^"^^. Division by (l)x{x, z) (which is not zero, because z is not a constant) times x — a gives ma^'^^ -\ M ^ ' f r f — 0? where the omitted terms are all divisible by a. This equation shows that the expansion in powers of a of j^Ê^ is the constant —a^^^ plus terms in a. Therefore, {z - 6) ' h ' j ^ = {{x - a) • h) (—a'^^^ -\ j is a product of two series in a, one with positive exponents and one with no negative exponents, which shows that all terms in the expansion of {z — 6) • h - ^ in powers of a have positive exponents, a conclusion that holds for any embedding K -^ A((7) that carries z to 6 -\- a^ and carries x to a series with no negative exponents. If an embedding that carries z to 6 -\- a^ carries x to a series with some negative exponents, it carries TX = ^ to a series in which all exponents are positive. Since ^ - du \s holomorphic for finite u by virtue of the assumption that hdx is holomorphic, it follows that all exponents in the expansion of {z - 6) ' ^ ' ^ m powers of a are positive. By the chain rule, ^ = ^ ^ = ^ . ^ when X = :^, so it follows that all exponents in the expansion of {z — S) • h- j ^ — {^ ~ ^)' h' ^ • j ^ in powers of a are positive in this case too. Thus, Lemma 1 implies that h • j ^ - dz is holomorphic for finite z. By the same token, h- ^ -dv is holomorphic for finite v for any parameter V and in particular when v — ^. Therefore h • ^ • ^ • dv is holomorphic for finite V = ^, which completes the proof that h • ^ • dz is holomorphic. Corollary. The genus is a birational invariant The determination of the genus can be accomplished by finding holomorphic differentials, for which the following proposition is useful. 
An algebraic curve x(^, 2/) = 0 is nonsingular for finite x if no pair (a, /?) of algebraic numbers satisfies all three conditions xi^^P) — 0^ Xx{(^^ (^) = 0? and X^(Q;,/?) = 0.

Proposition. If x{x,y) = 0 is nonsingular for finite x, then hdx is holomorphic for finite x if and only if h • Xy ^-^ integral over x. In other words, when x(x, ?/) = 0 is nonsingular for finite x, the differentials holomorphic for finite x are those of the form /^ \ , where (/)(x, y) is integral over X. Proof. First assume that hdx is holomorphic. By the proposition of Essay 4.5, it will suffice to prove that the image oi h- Xy in each embedding of K in A(5) that carries x to a + s^, and carries rational numbers to themselves, is without negative exponents. When Xy{^^(^) 7^ 0, /? is a simple root of xi^^v)-! which implies, as was shown in Essay 4.4, that x = a ^ s^ y — (3 is d^n unambiguous truncated solution oi xi^^y) ~ 0- Such a truncated solution implies an infinite series


solution y = l3 -^ P'{x - a) + f3'\x — a)"^ H . T h e corresponding embedding K -^ A(5) does not involve fractional powers o^ x — a. T h e assumption t h a t h dx is holomorphic implies t h a t the image of h in A(5) contains no exponents less t h a n or equal to —1, so all exponents are greater t h a n or equal to zero. The same is true of the image of Xy—it is a polynomial in x and y and is therefore integral over x—so the image oi h • Xy under the embedding has no negative exponents, as was to be shown. Otherwise, XX{OLÎ3) 7^ 0, because the curve is nonsingular for finite x. In this case, t h e polynomial ô(^) in x ( a + 5, /? + t) = ^Q{S) + î{s)t -\ \-t^ is divisible by s b u t not 5^, so the Newton polygon algorithm leads t o a "polygon" with one segment from (0,1) t o a point where ji = 0; call it (r, 0). T h e ambiguity of the t r u n c a t e d solution x = a-\-s^y = [3\^ then r , and t h e o u t p u t of Newton's polygon is r unambiguous t r u n c a t e d solutions x = a + 5[, y = P '^ A/CO • 'î (one solution for each of t h e r possible values of \/Co)- By Lemma 2, t h e infinite series expansion y — (3 — ^Co • si^- f3" si-\ implies an infinite series expansion si = 6i(?/ — /?) + 62(2/ — /5)^ + • • •, whose substitution in ^/^ • si-\- (3"si^ gives y — (3 and whose substitution in the embedding K -^ A(51) therefore gives an embedding K —^ A{y — (3) t h a t carries y to (3 -{- {y — (3). Because h - j ^ - dy is holomorphic, it follows t h a t the image of h ' ^ — —/i • — under this embedding has no exponents less t h a n or equal to —1. It has no fractional exponents, so all exponents in t h e expansion of h • — in powers oi y — (3 are at least zero. Therefore, t h e same is true of its expansion in powers of 5i. Since Xx is a polynomial in x = a -\- s\ and 2/ = /^ + A/CO ' SI-\ 5 t h e expansion of /i • Xy = ^ • ^ • Xx in powers of si has no terms with negative exponents, as was to be shown. 
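The appeal to Lemma 2 in the argument above is constructive: the coefficients $C_2, C_3, \ldots$ are found one at a time, each by solving a linear equation whose leading coefficient is $mA_mC_1^{m-1}$. A minimal sketch of that recursion follows, under the assumptions that the given series is a polynomial with rational coefficients and that the chosen $m$th root $C_1$ of $1/A_m$ is itself rational (the lemma allows algebraic numbers). The function names are hypothetical.

```python
from fractions import Fraction

def mul_trunc(a, b, n):
    """Product of two power series (coefficient lists), truncated below s^n."""
    out = [Fraction(0)] * n
    for i, ai in enumerate(a[:n]):
        if ai:
            for j, bj in enumerate(b[:n - i]):
                out[i + j] += ai * bj
    return out

def compose(A, m, x_series, n):
    """Coefficients of s^0..s^(n-1) in sum_k A[k] * x^(m+k), x a series in s."""
    power = [Fraction(1)] + [Fraction(0)] * (n - 1)
    for _ in range(m):
        power = mul_trunc(power, x_series, n)       # power = x^m
    total = [Fraction(0)] * n
    for a in A:
        total = [t + Fraction(a) * p for t, p in zip(total, power)]
        power = mul_trunc(power, x_series, n)
    return total

def lemma2_series(A, m, C1, terms):
    """Return [C_1, ..., C_terms] with sum_k A[k] x^(m+k) = s^m + higher-order error."""
    A = [Fraction(a) for a in A]
    C1 = Fraction(C1)
    assert A[0] * C1 ** m == 1, "C1 must be an m-th root of 1/A_m"
    n = m + terms                                   # work modulo s^n
    x = [Fraction(0), C1] + [Fraction(0)] * (n - 2)
    pivot = m * A[0] * C1 ** (m - 1)                # coefficient of C_(i+1) in B_(m+i)
    for i in range(1, terms):
        b = compose(A, m, x, n)[m + i]              # B_(m+i) with C_(i+1) still zero
        x[i + 1] = -b / pivot                       # choose C_(i+1) so that B_(m+i) = 0
    return x[1:terms + 1]

# m = 1: invert s = x + x^2
print(lemma2_series([1, 1], 1, 1, 5))
```

For $m = 1$ the construction is ordinary reversion of series, which is the case used in the proof above; for $x + x^2$ the coefficients are the signed Catalan numbers $1, -1, 2, -5, 14$.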
Thus, the proof t h a t "/i dx is holomorphic" implies "/i • Xy is integral over x" is complete. To prove, conversely, t h a t all differentials of t h e form - ^ dx in which 0 is integral over x are holomorphic for finite x it will suffice to prove t h a t — dx is holomorphic for finite x. Certainly for any embedding x = a -\- s'^ ^ Xy

y = P -\- f3's + f3"s^ -h • • • for which Xy( e at the midpoint of a subsquare, then f{x) is nonzero throughout the subsquare, which means that log/(x) is defined as a function of x throughout the subsquare; since the derivative of \ogf{x) is "TCT, it follows that the integral of 4 ^ dx around the boundary of each such subsquare is zero. If |/(x)| > e were true for the midpoints of all the subsquares, then J^^ 4 ^ dx^ which is the sum of the integrals around the boundaries of all the subsquares, would be zero, contrary to the choice of SQ. Thus, the finite set of midpoints of the subsquares contains at least one complex rational number XQ for which |/(:zô)| < ^^ as required. (Note that the coordinates of the midpoints are rational, so | / ( x ) p at the midpoints can be computed exactly and compared to the rational number e^. Obviously, the amount of computation required to find an XQ by this construction could be huge even for a rather simple f{x) and a moderate e. The method described here is not a practical way of finding an XQ—which in most cases would be easily accomplished by simple bisection methods to get a rough estimate, followed by Newton's method—but a way that can be succinctly described and is a finite calculation.) Proof of the Main Theorem. Let f{x) be the given monic, irreducible polynomial with integer coefficients. Since | / ( x ) | ^ CXD as \x\ -^ CXD, a positive integer N can be chosen for which \x\ > N implies \f{x)\ > 1.


5 Miscellany

By the Euclidean algorithm, there are polynomials $\alpha(x)$ and $\beta(x)$ with rational coefficients for which $\alpha(x)f(x) + \beta(x)f'(x)$ divides both $f(x)$ and $f'(x)$. Because $f(x)$ is irreducible and the degree of $f'(x)$ is less than that of $f(x)$, $\alpha(x)f(x) + \beta(x)f'(x)$ must be a nonzero rational number, so it can be assumed without loss of generality to be 1. Let $A$ and $B$ be positive integers for which $|2\alpha(x)| < A$ and $|2\beta(x)| < B$ throughout the disk $|x| \le N$ in the complex plane. Finally, let $C$ be a positive integer that is an upper bound for the modulus of the polynomial $\frac{f'(x) - f'(y)}{x - y}$ when $x$ and $y$ are complex numbers whose moduli are less than $N + 1$. Use the lemma to find a rational complex number $x_0$ for which $|f(x_0)|$ is less than both $\frac{1}{A}$ and $\frac{1}{4B^2C}$, and define a sequence $x_1, x_2, \ldots, x_n, \ldots$ by the formula

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_0)}.$$

This sequence converges, as is proved by the estimate

$$(1) \qquad |x_{n+1} - x_n| < \tfrac{1}{2}\,|x_n - x_{n-1}| \qquad (\text{for } n = 1, 2, \ldots),$$

which will be proved inductively using the estimate

$$(2) \qquad |f'(x) - f'(x_0)| < \frac{1}{2B} \qquad (\text{for } x \text{ on the line segment from } x_{n-1} \text{ to } x_n).$$
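The behavior described by estimate (1) is easy to observe numerically. The sketch below assumes that the sequence is the fixed-slope Newton iteration $x_{n+1} = x_n - f(x_n)/f'(x_0)$, which is consistent with the relation $x_1 - x_0 = -f(x_0)/f'(x_0)$ used in the proof; it works in floating-point complex arithmetic rather than with exact rational complex numbers, and its starting value is simply chosen close to a root instead of being produced by the lemma with the bounds $A$, $B$, $C$. The function name is hypothetical.

```python
def fixed_slope_iteration(f, df, x0, steps):
    """Iterate x -> x - f(x)/f'(x0), keeping the slope f'(x0) fixed throughout."""
    slope = df(x0)
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - f(xs[-1]) / slope)
    return xs

# f(x) = x^2 + 1 is monic and irreducible with integer coefficients.
f = lambda x: x * x + 1
df = lambda x: 2 * x

xs = fixed_slope_iteration(f, df, 0.9j, 30)   # x0 = 9i/10 is a rational complex number
for n in range(1, 6):
    step, previous = abs(xs[n + 1] - xs[n]), abs(xs[n] - xs[n - 1])
    print(n, step < 0.5 * previous)           # the halving estimate (1)
print(abs(xs[-1] - 1j))                       # distance from the root i
```

Because the slope is never recomputed, the convergence is only geometric, not quadratic as in Newton's method proper; that is all the proof needs, and it keeps every term of the sequence a rational function of $x_0$.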

Because $2 = 2\alpha(x_0)f(x_0) + 2\beta(x_0)f'(x_0)$ and because $|f(x_0)| < \frac{1}{A} \le 1$, the estimate $|x_0| \le N$ holds, so $2 = |2\alpha(x_0)f(x_0) + 2\beta(x_0)f'(x_0)| < A \cdot \frac{1}{A} + B \cdot |f'(x_0)|$, which implies $\frac{1}{|f'(x_0)|} < B$. Thus, because $x_1 - x_0 = -\frac{f(x_0)}{f'(x_0)}$, the estimate $|x_1 - x_0| < B \cdot \frac{1}{4B^2C} = \frac{1}{4BC}$ holds. In particular, because $|x_0| \le N$, the line segment from $x_0$ to $x_1$ lies inside the disk for which the estimate $|f'(x) - f'(x_0)| < C \cdot |x - x_0|$ applies, and $|f'(x) - f'(x_0)|$ on this segment is at most $C \cdot \frac{1}{4BC} = \frac{1}{4B}$, which implies (2) in the case $n = 1$. When (2) is used to estimate the modulus of

$$x_2 - x_1 = x_1 - x_0 - \frac{f(x_1) - f(x_0)}{f'(x_0)} = \frac{1}{f'(x_0)} \int_{x_0}^{x_1} \bigl(f'(x_0) - f'(x)\bigr)\,dx,$$

one obtains $|x_2 - x_1| < B\,|x_1 - x_0| \cdot \frac{1}{2B} = \frac{1}{2}|x_1 - x_0|$, which proves (1) in the case $n = 1$. Now if (1) holds for all numbers less than $n$, then

$$|x_n - x_0| \le |x_n - x_{n-1}| + |x_{n-1} - x_{n-2}| + \cdots + |x_1 - x_0| < \left(\frac{1}{2^{n-1}} + \frac{1}{2^{n-2}} + \cdots + 1\right)|x_1 - x_0|$$
0. The sum over all orbits of these numbers p^ is the number of left cosets of H in G, which, because i7 is a Sylow p-subgroup, is not divisible by p. Therefore, p* = 1 for at least one orbit. In other words, for at least one g in G, the coset represented by kg is the same as the coset represented by g for all /c in X. In short, g~^kg is in H for all A: in i^. Since g~^Kg is a subgroup conjugate to K^ the theorem follows. Proof of Theorem 3. Let G act on the s Sylow p-subgroups by conjugation. By Theorem 2, this action is transitive: There is just one orbit of size s. Therefore, s divides \G\. Let iJ be a Sylow p-subgroup of G, and let H act on the s Sylow psubgroups by conjugation. This action partitions these s subgroups into orbits. Suppose the number of elements in the orbit of H' is 1. Then hH'h~^ = H' for all h in H. In other words, the normalizer of H' in G, call it N^ contains H. Since N also contains H' and both H and H' are Sylow p-subgroups of N {p can divide |A^| no more times than it divides |G|), Theorem 2 implies that H and H' are conjugate in N, which is to say that nH'n~^ = H for some n in N. But nH'n~^ = H' by the definition of N. Therefore, H' — i7, which shows that only one orbit—the orbit of H—consists of a single element. The number of elements in any orbit divides the order of H and is therefore a power of p. Therefore, all orbits other than the one with 1 element have p* elements, where z > 1, from which s = 1 mod p follows.
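The two conclusions of Theorem 3, that the number $s$ of Sylow $p$-subgroups divides $|G|$ and satisfies $s \equiv 1 \bmod p$, can be confirmed by brute force in a small case. The sketch below does this for the symmetric group on four letters; it is an illustration and not part of the proof. It finds subgroups by closing up pairs of elements, which assumes that every Sylow subgroup of the group in question can be generated by two elements (true for this group, but not in general). All names are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """Composition of permutations given as tuples: (p o q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def generated_subgroup(generators, identity):
    """Closure of the generators under multiplication (a subgroup, since G is finite)."""
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return frozenset(group)

def sylow_subgroups(elements, p):
    """Return (p^k, set of subgroups of order p^k), p^k the largest power of p dividing |G|."""
    identity = tuple(range(len(elements[0])))
    pk = 1
    while len(elements) % (pk * p) == 0:
        pk *= p
    found = set()
    for a in elements:
        for b in elements:
            H = generated_subgroup((a, b), identity)   # assumption: two generators suffice
            if len(H) == pk:
                found.add(H)
    return pk, found

S4 = list(permutations(range(4)))
for p in (2, 3):
    order, subgroups = sylow_subgroups(S4, p)
    s = len(subgroups)
    print(p, order, s, s % p == 1, len(S4) % s == 0)
```

For this group the search finds three subgroups of order 8 and four of order 3, in agreement with both congruence and divisibility conditions.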


Essay 5.3 Overview of 'Linear Algebra'

So ist es nicht erstaunlich, daß ein großer Teil der modernen algebraischen Lehrbücher sich der abstrakten Richtung angeschlossen hat, welche im Bereich der Forschung so große Erfolge zu verzeichnen hatte. Jedoch mehr als einmal hatte ich Gelegenheit zu beobachten, daß dies im Bereich der Lehre nicht durchweg der Fall ist. (It is therefore not surprising that a great many modern textbooks have followed the abstract direction that has registered such great successes in the realm of research. However, I have had more than one opportunity to observe that in the realm of teaching this is not invariably the case.) N. Chebotarev [8, Author's preface to the translation]

Some years ago, Sheldon Axler published a book with the audacious title Linear Algebra Done Right [4]. I was probably more struck by the audacity of his title than most readers, because only a few years earlier I had published my own book called Linear Algebra, in which the subject had in fact been done right, but I had never thought to say so in the title. Of course, Axler's idea of doing it right turned out to have nothing to do with mine.

Without doubt the most attractive quality of mathematics is its apparent lack of subjectivity. "It must be easy for you mathematicians to grade papers," my friends often tell me, "because in mathematics there's only one right answer." In mathematics it can even happen that the student is right and the teacher wrong, and the teacher can be forced to admit it (usually, we hope, cheerfully). The other side of this pleasant coin is that mathematics attracts people who have a great need for certainty and encourages them to develop into rigidly dogmatic thinkers. The charge is made against advocates of constructive mathematics (it was made against Kronecker, against Brouwer, against Bishop) that they are dogmatists who implacably advocate unreasonably extreme views.

But what distinguishes them from their accusers is neither the extremity of their views nor the tenacity with which they hold them, but the mere fact that their views differ from those of their accusers. The feeling on both sides too often is, "I am not convinced by your arguments because your arguments are unconvincing; you are not convinced by my arguments because you are dogmatic." Of course mathematicians feel that mathematics is pure reason and therefore immune to such controversy. But there are plenty of controversies in mathematics. How else can Axler's and my difference regarding linear algebra be described? His choice of title is, I assume, intended as a joke, just as I am joking when I say that my linear algebra book had already done it right. But, in both cases, not really. And I expect that if you ask the first mathematician who comes along which of us is right, the reply will be that both are wrong, and the right way to do linear algebra is . . . .

So, having established that it is a mere matter of opinion, let me explain, if not why I am right, at least why my opinion differs from Axler's. His main


goal is to avoid determinants, for the reason that the formula for determinants is difficult to motivate, contrary to the modern style of mathematics and, as Axler shows, avoidable. I agree with him that the formula for determinants is daunting, but I believe that determinants, like a boulder in the path, need to be dealt with, not avoided. They are central to linear algebra, specifically to the solution of systems of linear equations, and the sooner students can be brought to use them and be comfortable with them, the better. My main goal in Linear Algebra, by contrast, was to deal with the subject in an algorithmic way that I have found through teaching makes sense to students and gives them the tools they need to solve problems in linear algebra. (Also, the book defines determinants without the formula in a natural way that is explained below.)

The early chapters are largely devoted to the following theorem. Let two $m \times n$ matrices of integers be called equivalent if one can be transformed into the other by a sequence of steps in which a row is added to or subtracted from an adjacent row or a column is added to or subtracted from an adjacent column.

Theorem. Given two $m \times n$ matrices of integers, determine whether they are equivalent.

Let an $m \times n$ matrix of integers be called strongly diagonal if it is diagonal (that is, the entry in the $i$th row of the $j$th column is zero whenever $i \neq j$), if each diagonal entry is a multiple of its predecessor on the diagonal, and if the diagonal entries are nonnegative, except that the entry in the lower right corner may be negative when the matrix is square. The theorem is proved by giving an algorithm that transforms a given $m \times n$ matrix of integers into a strongly diagonal one and proving that two strongly diagonal matrices are equivalent only if they are equal. In short, the theorem is proved by showing that strongly diagonal form is a canonical form for matrices with respect to this equivalence relation.
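To make the first stage of such an algorithm concrete, here is a minimal sketch that reduces an integer matrix to diagonal form. It is not the algorithm of the book: instead of adjacent-row and adjacent-column steps it uses the more general operation "add an integer multiple of one row (or column) to another," on the assumption that each such operation can be built up from the adjacent steps described above, and it stops at diagonal form rather than going on to strongly diagonal form. Since none of these operations changes the determinant of a square matrix, the product of the diagonal entries of the result equals the determinant of the input. All function names are hypothetical.

```python
def add_row(M, src, dst, k):
    """Add k times row src to row dst."""
    M[dst] = [d + k * s for d, s in zip(M[dst], M[src])]

def add_col(M, src, dst, k):
    """Add k times column src to column dst."""
    for row in M:
        row[dst] += k * row[src]

def signed_row_swap(M, a, b):
    """(row a, row b) -> (row b, -row a), using additions only."""
    add_row(M, b, a, 1)
    add_row(M, a, b, -1)
    add_row(M, b, a, 1)

def signed_col_swap(M, a, b):
    """(col a, col b) -> (col b, -col a), using additions only."""
    add_col(M, b, a, 1)
    add_col(M, a, b, -1)
    add_col(M, b, a, 1)

def diagonalize(matrix):
    """Return a diagonal matrix reachable from `matrix` by the operations above."""
    M = [row[:] for row in matrix]
    m, n = len(M), len(M[0])
    for t in range(min(m, n)):
        while True:
            entries = [(abs(M[i][j]), i, j)
                       for i in range(t, m) for j in range(t, n) if M[i][j] != 0]
            if not entries:
                return M                      # remaining block is zero
            _, i, j = min(entries)            # smallest nonzero entry becomes the pivot
            if i != t:
                signed_row_swap(M, t, i)
            if j != t:
                signed_col_swap(M, t, j)
            pivot = M[t][t]
            for r in range(t + 1, m):         # leave remainders smaller than |pivot|
                add_row(M, t, r, -(M[r][t] // pivot))
            for c in range(t + 1, n):
                add_col(M, t, c, -(M[t][c] // pivot))
            if (all(M[r][t] == 0 for r in range(t + 1, m))
                    and all(M[t][c] == 0 for c in range(t + 1, n))):
                break                         # row t and column t are cleared
    return M

D = diagonalize([[2, 4, 4], [-6, 6, 12], [10, -4, -16]])
print(D)
```

Each pass either clears row and column $t$ or leaves a nonzero remainder smaller in absolute value than the pivot, so the loop terminates for the same reason the Euclidean algorithm does.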
(Strongly diagonal form is very close to what is often called Smith normal form in honor of H. J. S. Smith.) The algorithm is simple. (In the book it is given in two stages: the rules for reducing to diagonal form in Chapter 2 and the additional rules for reducing to strongly diagonal form in Chapter 5.) The hard part of the proof is the proof that if two square diagonal matrices are equivalent, then the products of their diagonal entries are equal. That the absolute values of these products are equal is comparatively easy to prove, so the proof comes down to showing that the signs are the same. The investigation of the sign of the product of the diagonal entries in equivalent diagonal square matrices motivates the definition of the determinant of a square matrix as the product of the diagonal entries of an equivalent diagonal matrix; the main thing to be proved then becomes the theorem needed to make this definition valid, namely, the statement that if two square diagonal matrices are equivalent, then the products of their diagonal entries are the same. In fact, the difficult point of the proof can be put even more starkly: Let $J$ be the strongly diagonal matrix that is the $n \times n$ identity


matrix $I_n$ with the last diagonal entry changed to $-1$. Prove that $J$ is not equivalent to $I_n$.

At first glance, this theorem seems to have little to do with linear algebra as it is generally thought of (vector spaces, linear maps, bases, etc.), but it provides an algorithmic solution to the core problem of linear algebra: Given an $m \times n$ matrix $A$ and a column matrix $Y$ of length $m$, find all column matrices $X$ of length $n$ for which $AX = Y$. In linear algebra courses, the matrices are usually assumed to have real number entries, but the limit process inherent in the notion of a real number has nothing to do with linear algebra per se, and a more reasonable assumption is that the entries are rational numbers. The denominators can be cleared in order to translate the problem into one in which $A$ and $Y$ have integer entries, and the problem becomes that of finding all solutions $X$ of $AX = Y$ with rational entries (with, naturally, a preference for solutions whose entries are integers).

If matrices $A$ and $B$ are equivalent, the solution of $AX = Y$ is equivalent to the solution of $BX' = Y'$, because the column operations used to transform $A$ into $B$ can be regarded as invertible transformations of $X$ into $X'$, while the row operations are invertible transformations of $Y$ into $Y'$. Therefore, it suffices to solve $AX = Y$ for diagonal matrices, which can be done by inspection. For example, $DX = Y$ for a diagonal matrix $D$ (not necessarily square) has a solution $X$ for every $Y$ if and only if $D$ has a nonzero entry in each row, which implies, in particular, that the number $m$ of rows of $D$ is no greater than the number $n$ of columns. Similarly, $DX$ determines $X$ if and only if $D$ has a nonzero entry in each column, which implies, in particular, that $m \geq n$. Thus, the equation $AX = Y$ can be inverted to express $X$ as a function of $Y$ only when $m = n$, whether or not $A$ is diagonal.
Moreover, a square matrix is invertible if and only if the product of the diagonal entries of an equivalent diagonal matrix (which, at this point of the development, has not yet been shown to be independent of the choice of the equivalent diagonal matrix) is nonzero.

If students found it helpful to think of mathematics in terms of sets and functions, this could all be told to them in the usual way: An $m \times n$ matrix describes a particular kind of function, a linear function, from $\mathbf{Q}^n$ to $\mathbf{Q}^m$. It can be onto only if $n \geq m$ and can be one-to-one only if $n \leq m$. If a linear function is both one-to-one and onto, then its inverse function is a linear function, which is to say that the square matrix of coefficients of the given function has an inverse matrix.

Time after time in teaching the course I have decided that I surely could explain these facts of linear algebra in terms of linear functions in a way that the students would find helpful, and time after time the effort has failed. The statement that a function is one-to-one seems indistinguishable from the definition of a function for most students. Confusion about the difference between the statement that $f$ is a function from $\mathbf{Q}^n$ to $\mathbf{Q}^m$ and the statement that $f$ is onto $\mathbf{Q}^m$ is compounded by the fact that different mathematicians mean different things when they talk about the "range" of a function. Class



discussions bog down in terminological and conceptual issues that have nothing to do with linear algebra. These experiences have convinced me that the set-function conceptualization is not helpful for students of linear algebra. Perhaps it will work the other way around (a knowledge of linear algebra may help teach notions of sets and functions), but in my experience it does not work the way it is currently supposed to.

The above theorem and related topics form the substance of the first six chapters. Chapter 7 is on Moore-Penrose generalized inverses. For every $m \times n$ matrix $A$ of rational numbers, there is a unique $n \times m$ matrix of rational numbers $B$ for which $AB$ and $BA$ are both symmetric and the equations $ABA = A$ and $BAB = B$ both hold. This matrix $B$ is the Moore-Penrose generalized inverse of $A$, or the "mate" of $A$, as I call it for short. Clearly, if $B$ is the mate of $A$, then $A$ is the mate of $B$. The main property of mates is that $BY$ is the best solution, in the least squares sense, of the equation $AX = Y$ for any column matrix $Y$ of length $m$. (More precisely, when $\|M\|^2$ denotes the sum of the squares of the entries of a matrix $M$, $\|Y - AX\|^2$ attains its minimum value when $X = BY$ and, among all column matrices $X$ of length $n$ for which this minimum is attained, $X = BY$ is the one for which $\|X\|^2$ is smallest.)

Chapter 8 generalizes the theorem stated above from the case of matrices of integers to the case of matrices of polynomials in one indeterminate $x$ with rational coefficients. In this case, rather than restricting to addition or subtraction of an adjacent row or column, one allows subtraction of any multiple of a row or column from an adjacent row or column, where the multipliers are polynomials in $x$ with rational coefficients. (In the case of integer matrices, the two definitions are the same because subtraction of an arbitrary integer multiple can be achieved by repeated additions or subtractions.)
The condition that the nonzero diagonal entries of a strongly diagonal matrix must be positive is replaced by the condition that they must be monic unless, again, they occur in the lower right corner of a square matrix. Once again, every matrix is equivalent to a strongly diagonal matrix, and strongly diagonal matrices are equivalent only if they are equal. In this way, the problem of determining whether two given matrices are equivalent is solved.

This solution leads to the proof of the following important theorem of intermediate linear algebra: Two $n \times n$ matrices of rational numbers $A$ and $B$ are similar if there is an invertible $n \times n$ matrix of rational numbers $P$ for which $A = P^{-1}BP$.

Theorem. Given two $n \times n$ matrices of rational numbers, determine whether they are similar.


Proof. It is not difficult to prove* that $A$ is similar to $B$ if and only if $xI - A$ is equivalent to $xI - B$ when both are regarded as matrices whose entries are polynomials in $x$ with rational coefficients. (Here $I$ denotes the $n \times n$ identity matrix.) Therefore, the algorithm of Chapter 8 for solving this latter problem solves the problem of the theorem.

The characteristic polynomial of a square matrix $A$ is the determinant of $xI - A$. As follows from what has just been said, similar matrices have the same characteristic polynomial. The converse of this statement is false, as is shown by the fact that the matrix $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ is not similar to the $2 \times 2$ identity matrix. In fact, the strongly diagonal matrix equivalent to $xI - A$ in this case is $\begin{bmatrix} 1 & 0 \\ 0 & (x-1)^2 \end{bmatrix}$, not $\begin{bmatrix} x-1 & 0 \\ 0 & x-1 \end{bmatrix}$.

The minimum polynomial of a square matrix $A$ is the last diagonal entry of the strongly diagonal matrix equivalent to $xI - A$. The minimum polynomial of a matrix is easily shown to be the greatest common divisor of the polynomials of which it is a root (when the constant term of the polynomial is interpreted as a multiple of $A^0$, and $A^0$ is interpreted as $I$). For example, in the case of the matrix $A$ just considered, $A$ is a root of $f(x)$ if and only if $f(x)$ is a multiple of $x^2 - 2x + 1$. By the above, similar matrices have the same minimum polynomial, but this necessary condition for two matrices to be similar is still not sufficient.

Such considerations lead to the study of the elementary divisors of a matrix $A$, which are certain powers of irreducible polynomials (the elementary divisors of $I$ are $x - 1$ and $x - 1$, while $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ has just one elementary divisor $(x - 1)^2$) whose product is the characteristic polynomial; they are easily described in terms of the strongly diagonal matrix equivalent to $xI - A$, and they do determine the similarity class of $A$.
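The claim about the minimum polynomial in the example can be checked by direct computation. The following Python sketch assumes that the example matrix is the one with rows (1, 1) and (0, 1), which is a reading of a damaged formula rather than a certainty; the helper names `matmul` and `evaluate` are hypothetical. It evaluates a polynomial at a matrix by Horner's rule, with the constant term multiplying the identity matrix.

```python
def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def evaluate(coeffs, a):
    """Evaluate a polynomial at the square matrix a by Horner's rule.

    coeffs lists the coefficients from the highest power down; the
    constant term is interpreted as a multiple of the identity matrix.
    """
    n = len(a)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    result = [[0] * n for _ in range(n)]
    for c in coeffs:
        result = matmul(result, a)
        result = [[result[i][j] + c * identity[i][j] for j in range(n)]
                  for i in range(n)]
    return result


A = [[1, 1], [0, 1]]          # assumed form of the example matrix
zero = [[0, 0], [0, 0]]

print(evaluate([1, -1], A) == zero)      # x - 1: False, A is not a root
print(evaluate([1, -2, 1], A) == zero)   # x^2 - 2x + 1: True, A is a root
```

The identity matrix, by contrast, is already a root of $x - 1$, which is one way to see that the two matrices have different minimum polynomials although they share the characteristic polynomial $(x - 1)^2$.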
The elementary divisors are closely related to the rational canonical form of a matrix. The Jordan canonical form of a matrix, a subject that is much taught, and in my opinion overemphasized, in intermediate linear algebra courses, is the rational canonical form if one works over the complex numbers rather than the rational numbers, or, better, if one works over an algebraic extension of $\mathbf{Q}$ that splits the characteristic polynomial of the matrix.

A matrix is diagonalizable if it is similar to a diagonal matrix, or, what is the same, if its elementary divisors all have degree 1. The methods of Chapter 9 not only make it possible to determine whether a given matrix is diagonalizable, they make it possible, when it is diagonalizable, to construct a similar diagonal matrix. However, this solution of the problem "determine whether a given matrix is diagonalizable" does not solve the problem of "diagonalizing" symmetric matrices in the sense of the spectral theorem because it tells only whether a symmetric matrix of rational numbers is similar to a diagonal matrix of rational numbers.

* This proof is marred by a misstatement in the first (and so far only) printing of the book. On p. 92, $E$ should be assumed to have polynomial entries and $D$ to have rational number entries.



Chapter 10 is devoted to the (finite-dimensional) spectral theorem, which states that symmetric matrices are similar to diagonal matrices of real numbers. In the strict sense, this is not a theorem of linear algebra because it involves limits in an essential way: The equivalent diagonal matrix normally contains irrational numbers. This topic warrants an essay of its own.
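A small example shows why irrational numbers, and hence limits, enter. The following Python sketch is an illustration under stated assumptions, not anything taken from the book: the matrix with rows (2, 1) and (1, 1) is chosen here for the example, its characteristic polynomial is $x^2 - 3x + 1$, and the function name `bisect_sign_change` is hypothetical. Exact rational bisection brackets an eigenvalue ever more closely without reaching it, because the roots $(3 \pm \sqrt{5})/2$ are irrational.

```python
from fractions import Fraction


def bisect_sign_change(f, lo, hi, steps):
    """Halve a rational interval on which f changes sign, `steps` times."""
    lo, hi = Fraction(lo), Fraction(hi)
    assert f(lo) * f(hi) < 0, "f must have opposite signs at the endpoints"
    for _ in range(steps):
        mid = (lo + hi) / 2
        value = f(mid)
        if value == 0:            # the midpoint happens to be a rational root
            return mid, mid
        if f(lo) * value < 0:     # the sign change is in the left half
            hi = mid
        else:                     # the sign change is in the right half
            lo = mid
    return lo, hi


def f(x):
    # Characteristic polynomial of the example matrix [[2, 1], [1, 1]].
    return x * x - 3 * x + 1


lo, hi = bisect_sign_change(f, 0, 1, 20)
print(hi - lo)                # 1/1048576
print(float(lo), float(hi))   # both close to 0.381966...
```

Every interval produced in this way has rational endpoints at which the polynomial takes opposite signs; the eigenvalue itself appears only as the real number determined by the nested sequence of intervals.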


Essay 5.4 The Spectral Theorem

I remember trying unsuccessfully to concoct a constructive proof of the spectral theorem for symmetric matrices as long ago as 1964. It was only while writing my linear algebra book in the early 1990s that I realized that the eigenvectors would follow easily once the eigenvalues had been constructed, and that the eigenvalues could be described constructively, in most cases, as places where the characteristic polynomial changed sign. This does not cover the case of multiple eigenvalues, but while I was developing the ideas in Chapter 9 of Linear Algebra it became clear to me that the important polynomial is not the characteristic polynomial of the symmetric matrix but its minimum polynomial, which in the case of a symmetric matrix has no multiple roots. In this way, the geometrically fascinating principal axes theorem for symmetric matrices reappeared as the rather modest assertion that the minimum polynomial of a symmetric matrix changes sign a number of times equal to its degree. Once this is known to be true, simple bisection determines all of the eigenvalues as real numbers, and simple linear algebra over a splitting field of the minimum polynomial suffices to determine the corresponding eigenvectors.

Once the problem was reduced in this way, Kronecker's work on Sturm's theorem helped me solve it finally to my satisfaction. The solution was included in the last chapter of Linear Algebra, where, as far as I know, no one has ever read it. Here it is once again, with a few simplifications and improvements.

Theorem. Given a symmetric matrix $S$ whose entries are integers, let $f(x)$ be its minimum polynomial, and let $m$ be the degree of $f(x)$. Construct $m + 1$ rational numbers $x_0 < x_1 < x_2 < \cdots < x_m$ with the property that $f(x_{i-1})$ and $f(x_i)$ have opposite signs for $i = 1, 2, \ldots, m$.

Proof. If $m = 1$, then $f(x) = x + c$, and one can simply set $x_0 = -N$ and $x_1 = N$ for a sufficiently large number $N$.

Assume, therefore, that $m > 1$. The function* $\operatorname{tr}(g(S)h(S))$, defined for pairs of polynomials $(g(x), h(x))$ with rational coefficients, can be regarded as a symmetric bilinear function from $\mathbf{Q}[x] \bmod f(x)$ to $\mathbf{Q}$. If $h(x)$ has the property that $\operatorname{tr}(g(S)h(S)) = 0$ for all polynomials $g(x)$, then $h(x) \equiv 0 \bmod f(x)$, because $\operatorname{tr}(h(S)^2)$ is the sum of the squares of the entries of the matrix $h(S)$ (because $S$ and therefore $h(S)$ are symmetric), so the case $g = h$ of $\operatorname{tr}(g(S)h(S)) = 0$ implies $h(S) = 0$. Therefore, the $m \times m$ matrix of rational numbers that represents this symmetric bilinear form with respect to the basis $1, x, x^2, \ldots, x^{m-1}$ of $\mathbf{Q}[x] \bmod f(x)$ is invertible, from which it follows that every linear form $\mathbf{Q}[x] \bmod f(x) \to \mathbf{Q}$ can be expressed as $g(x) \mapsto \operatorname{tr}(g(S)h(S))$ for some polynomial $h(x)$ with rational coefficients.

* Here the trace $\operatorname{tr}(M)$ of a square matrix $M$ is of course the sum of its diagonal entries.

In this way, a polynomial $h(x)$ with rational coefficients can be constructed for which $\operatorname{tr}(g(S)h(S)) = g_1$ for any


polynomial $g(x) = g_1x^{m-1} + g_2x^{m-2} + \cdots + g_m$ in which the $g_i$ are rational numbers. If it is stipulated that $\deg h < m$, this property determines $h$ when $f(x)$ is given. When $h(x)$ is defined to be the polynomial determined in this way, $h(x)$ is relatively prime to $f(x)$, because a common divisor $d(x)$ of $f(x)$ and $h(x)$ of positive degree, say $f(x) = q_1(x)d(x)$ and $h(x) = q_2(x)d(x)$, where $\deg d > 0$, would imply $\deg q_1 < m$, say $\deg q_1 + t = m$, where $t > 0$, so $x^{t-1}q_1(x)$ would be a polynomial of degree $m - 1$ and $\operatorname{tr}(S^{t-1}q_1(S)h(S))$ would be nonzero, contrary to $S^{t-1}q_1(S)h(S) = S^{t-1}q_1(S)q_2(S)d(S) = S^{t-1}q_2(S)f(S) = 0$. Therefore, $h(x)$ is invertible mod $f(x)$.

Let $s_m(x) = f(x)$, let $s_{m-1}(x)$ be the unique inverse of $h(x)$ mod $f(x)$ whose degree is less than $m$, and let later terms of the sequence $s_m(x), s_{m-1}(x), s_{m-2}(x), \ldots, s_k(x)$ be defined by defining $s_i(x)$ to be the negative of the remainder when $s_{i+2}(x)$ is divided by $s_{i+1}(x)$. The sequence terminates with the last nonzero term $s_k(x)$ generated in this way.

It will be shown that each $s_i$ has the form $s_i(x) = s_i^{(i)}x^i + s_{i-1}^{(i)}x^{i-1} + \cdots + s_0^{(i)}$, where the first coefficient $s_i^{(i)}$, call it $c_i$, is positive. In particular, $s_1(x)$ has degree 1 with a positive leading coefficient, and the final nonzero term $s_0$ is a positive constant.

For any $g(x) = g_1x^{m-1} + g_2x^{m-2} + \cdots + g_m$, the identity

$$\operatorname{tr}\bigl(g(S)s_{m-1}(S)h(S)^2\bigr) = g_1$$

follows from the definitions of $h(x)$ and $s_{m-1}(x)$, because $s_{m-1}(S)h(S) = I$. Use of $s_{m-2}(x) = -s_m(x) + q_{m-1}(x)s_{m-1}(x)$, where $q_{m-1}(x)$ is the quotient in the division that defines $s_{m-2}(x)$, in $\operatorname{tr}(g(S)s_{m-2}(S)h(S)^2)$ gives $-0 + \operatorname{tr}(g(S)q_{m-1}(S)s_{m-1}(S)h(S)^2)$, which is the coefficient of $x^{m-1}$ in $g(x)q_{m-1}(x)$, provided $g(x)q_{m-1}(x)$ has degree less than $m$. Since

$$s_m(x) = x^m + \cdots \quad\text{and}\quad s_{m-1}(x) = mx^{m-1} + \cdots$$

(the latter because $\operatorname{tr}(s_{m-1}(S)s_{m-1}(S)h(S)^2) = \operatorname{tr}(I) = m$), it is clear that $q_{m-1}(x)$ has degree 1 and leading coefficient $1/m$, so

$$\operatorname{tr}\bigl(g(S)s_{m-2}(S)h(S)^2\bigr) = g_1/m$$

whenever $g(x) = g_1x^{m-2} + \cdots$. The general case of this formula, which states that

$$(1)\qquad \operatorname{tr}\bigl(g(S)s_i(S)h(S)^2\bigr) = g_1/c_{i+1}$$

($c_{i+1}$ is the leading coefficient of $s_{i+1}(x)$), where $g(x) = g_1x^i + \cdots$ is a polynomial whose degree is at most $i$, will now be proved. Note first that the case $i = m - 2$ already proved implies, since $s_{m-2}(x)$ has degree at most $m - 2$, that the coefficient $c_{m-2}$ of $x^{m-2}$ in $s_{m-2}(x)$ satisfies $c_{m-2}/m = \operatorname{tr}((s_{m-2}(S)h(S))^2)$, which is the sum of the squares of the entries of $s_{m-2}(S)h(S)$ and is therefore positive unless $s_{m-2}(S)h(S) = 0$; thus $c_{m-2}$ is positive, because $h(S)$ is invertible and $s_{m-2}(S) = 0$ would imply


$\operatorname{tr}(g(S)s_{m-2}(S)h(S)^2) = 0$ for all $g(x)$, but for $g(x) = x^{m-2}$ this trace is $1/m$. Therefore, the assertion that $s_i(x)$ has degree $i$ and positive leading coefficient is proved for $i = m$, $m - 1$, $m - 2$. Similarly, if this assertion and (1) are proved for both $i + 2$ and $i + 1$, say $s_{i+2}(x) = c_{i+2}x^{i+2} + \cdots$ and $s_{i+1}(x) = c_{i+1}x^{i+1} + \cdots$, the same method proves them for $i$, provided $i \geq 0$, because if $g(x) = g_1x^i + \cdots$, then

$$\operatorname{tr}\bigl(g(S)s_i(S)h(S)^2\bigr) = -\operatorname{tr}\bigl(g(S)s_{i+2}(S)h(S)^2\bigr) + \operatorname{tr}\bigl(g(S)q_{i+1}(S)s_{i+1}(S)h(S)^2\bigr) = \frac{g_1}{c_{i+1}}$$

because $g(x)q_{i+1}(x) = \dfrac{g_1c_{i+2}}{c_{i+1}} \cdot x^{i+1} + \cdots$. (The first trace on the right is zero by (1) for $i + 2$, because the coefficient of $x^{i+2}$ in $g(x)$ is zero, and the second is evaluated by (1) for $i + 1$.) In particular, $s_i(S) \neq 0$. The case $g(x) = s_i(x)$ of this identity implies that $c_i/c_{i+1}$ is the positive rational number $\operatorname{tr}((s_i(S)h(S))^2)$, so $s_i(x)$ has degree $i$ and a positive leading coefficient, as was to be shown.

A polynomial of degree $i$ cannot change sign more than $i$ times, as can be seen as follows: If $F(x)$ is a polynomial, if $a$ and $b$ are rational numbers for which $F(a) < F(b)$, and if $c$ is the midpoint of the interval $[a, b]$, then the rate of increase of $F$ on the interval is the average of the rate of increase on the two halves of the interval $[a, c]$ and $[c, b]$, which is to say that

$$\frac{F(b) - F(a)}{b - a} = \frac{1}{2}\left(\frac{F(c) - F(a)}{c - a} + \frac{F(b) - F(c)}{b - c}\right),$$

as follows from $c - a = b - c = \frac{1}{2}(b - a)$. Select the half interval on which the rate of increase is larger, or, if the two rates are the same, select the half interval on the right. Iteration of this bisection and selection rule determines a nested sequence of subintervals of $[a, b]$, each half as long as its predecessor, and therefore determines a real number (their "intersection"). Since the derivative* $F'(x)$ of $F(x)$ at this real number is the limit of a nondecreasing sequence of positive rational numbers (namely, of the values of $(F(b) - F(a))/(b - a)$, where $a$ and $b$ are the endpoints of the successive intervals), the real number determined by the nested intervals is one at which $F'(x)$ is positive. In this way, any interval on which $F(x)$ increases contains a real number at which $F'(x)$ is positive, and therefore contains a rational number at which $F'(x)$ is positive. Similarly, any interval on which $F$ decreases contains a rational number at which $F'(x)$ is negative. Therefore, if a polynomial with rational coefficients of degree $i$ changes sign at least $\sigma > 0$ times, then $i > 0$, and its derivative is a polynomial of degree $i - 1$ that changes sign at least $\sigma - 1$ times. Repetition of this argument $\sigma$ times gives a polynomial of degree $i - \sigma$, so $i \geq \sigma$.

* The derivative of $F(x)$ is of course the coefficient of $h$ in the polynomial $F(x + h) - F(x) = F'(x)h + \cdots$, where the omitted terms all contain $h^2$.

Moreover, if $i$ nonoverlapping intervals on which a polynomial of degree $i$ changes sign are given, bisection of each of them (but moving the midpoint


slightly if it happens to be a root of the polynomial) gives $2i$ nonoverlapping intervals; the polynomial changes sign on at least $i$ of them and, as was just shown, on no more than $i$ of them. Therefore, repetition of the bisection process constructs $i$ real roots of the polynomial and shows that the values of such a polynomial for two rational numbers $a$ and $b$ have opposite signs if and only if the interval $[a, b]$ contains an odd number of its $i$ real roots, provided neither $a$ nor $b$ is a root.

For each $i = 1, 2, \ldots, m$, $s_i(x)$ has $i$ real roots, and the number of real roots of $s_i(x)$ that are greater than a given one of them is equal to the number of real roots of $s_{i-1}(x)$ that are greater than it, as can be proved inductively as follows: This statement is obviously true in the case $i = 1$, because a polynomial of degree 1 has just one root, and no roots of $s_0$ are greater than it. Suppose now that it is true for a given $i$, and let $\rho_1, \rho_2, \ldots, \rho_i$ be the real roots of $s_i(x)$, in ascending order. By the inductive hypothesis, the number of real roots of $s_{i-1}(x)$ greater than $\rho_j$ is $i - j$, so, since $s_{i-1}(x)$ is positive for all sufficiently large values of $x$, $s_{i-1}(\rho_j)$ has the sign $(-1)^{i-j}$. (A polynomial of degree $i - 1$ has at most $i - 1$ roots, counted with multiplicities, so the roots of $s_{i-1}(x)$ are simple and $s_{i-1}(x)$ changes sign at each root.) Since the formula $s_{i+1}(x) + s_{i-1}(x) = q_i(x)s_i(x)$ implies* that $s_{i+1}(x)$ and $s_{i-1}(x)$ have opposite signs at a root of $s_i(x)$, it follows that the sign of $s_{i+1}(x)$ at $\rho_j$ is $(-1)^{i-j+1}$. Because the leading coefficient of $s_{i+1}(x)$ is positive, when $\rho_{i+1}$ is chosen to be a large enough number and when $-\rho_0$ is chosen to be a large enough number, the same rule describes the sign of $s_{i+1}(x)$ at all of the real numbers $\rho_0, \rho_1, \ldots, \rho_{i+1}$. Since these signs alternate, it follows that $s_{i+1}(x)$ changes sign $i + 1$ times, and therefore has $i + 1$ real roots.
Moreover, the $j$th one of these roots lies in the $j$th interval where $s_{i+1}(x)$ changes sign, which places it between $\rho_j$ and $\rho_{j+1}$ and shows that the number of roots of $s_{i+1}(x)$ greater than a given root of $s_{i+1}(x)$ is the number of $\rho$'s greater than it, as was to be shown. Thus, sufficiently close rational approximations to the roots of $s_{m-1}(x)$, together with a pair of values $\pm N$ for a large number $N$, demonstrate $m$ changes of sign in $s_m(x) = f(x)$, as required.

* A common divisor of $s_i(x)$ and $s_{i-1}(x)$ divides all of the $s_j(x)$, and therefore divides the nonzero constant $s_0$, and must therefore be a nonzero constant. Therefore, there are polynomials $\alpha(x)$ and $\beta(x)$ with rational coefficients for which $\alpha(x)s_i(x) + \beta(x)s_{i-1}(x) = 1$. If $\rho$ is a real root of $s_i(x)$, then $1 = \beta(\rho)s_{i-1}(\rho)$, which implies that the real number $s_{i-1}(\rho)$ is nonzero. Similarly, a root of $s_{i-1}(x)$ is not a zero of $s_i(x)$. Thus, when $\rho$ is a root of $s_i(x)$, the equation $s_{i+1}(\rho) + s_{i-1}(\rho) = 0$ implies that $s_{i+1}(\rho)$ and $s_{i-1}(\rho)$ have opposite signs.

Corollary (The spectral theorem). Given a symmetric matrix $S$ whose entries are integers, find real numbers $\rho_1, \rho_2, \ldots, \rho_m$ and symmetric matrices of real numbers $P_1, P_2, \ldots, P_m$ that satisfy

$$S = \rho_1P_1 + \rho_2P_2 + \cdots + \rho_mP_m,$$

$$I = P_1 + P_2 + \cdots + P_m,$$

and

$$P_iP_j = \begin{cases} P_i & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases}$$

Deduction. All that is needed is to construct the matrix that is orthogonal projection on the eigenspace corresponding to each eigenvalue $\rho_i$, which is to say orthogonal projection on the kernel of $S - \rho_iI$ for all roots $\rho_i$ of the minimum polynomial of $S$. (To say that a matrix is an orthogonal projection means that it is symmetric and idempotent.) Each of these orthogonal projections is easy to find, because the orthogonal projection on the kernel of a symmetric matrix $M$ is $I - Q$, where $Q$ is orthogonal projection on the image of $M$, which is to say that $Q$ is $M$ multiplied on the right by the Moore-Penrose generalized inverse of $M$. Therefore, the spectral decomposition of $S$ can be given once the Moore-Penrose generalized inverses of the matrices $S - \rho_iI$ are found. Note that the computation of the Moore-Penrose generalized inverse of a matrix requires exact computations with the entries, so it becomes possible only after a splitting field for the minimum polynomial is constructed; the interpretation of the $\rho$'s and $P$'s as real numbers requires an identification of the splitting field with a subfield of the field of real numbers.
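The three equations of the corollary can be checked on a small example. The Python sketch below is only an illustration under stated assumptions. It does not use Moore-Penrose generalized inverses as the deduction above does; instead it builds each projection by the Lagrange interpolation formula, taking $P_i$ to be the product of the matrices $(S - \rho_jI)/(\rho_i - \rho_j)$ over $j \neq i$, which applies because the minimum polynomial has distinct roots. The example matrix, with rows (2, 1) and (1, 2), is chosen so that its eigenvalues 1 and 3 are rational and no splitting field needs to be constructed; the function names are hypothetical.

```python
from fractions import Fraction


def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def spectral_projections(s, roots):
    """Projections P_i for a symmetric matrix s.

    `roots` must list all roots of the minimum polynomial of s, and they
    must be distinct. Lagrange interpolation is used here in place of the
    Moore-Penrose construction described in the text.
    """
    n = len(s)
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    projections = []
    for i, rho in enumerate(roots):
        p = identity
        for j, other in enumerate(roots):
            if j != i:
                factor = [[(s[a][b] - other * identity[a][b])
                           / Fraction(rho - other)
                           for b in range(n)] for a in range(n)]
                p = matmul(p, factor)
        projections.append(p)
    return projections


S = [[2, 1], [1, 2]]               # minimum polynomial (x - 1)(x - 3)
P1, P2 = spectral_projections(S, [1, 3])

combo = [[1 * P1[a][b] + 3 * P2[a][b] for b in range(2)] for a in range(2)]
total = [[P1[a][b] + P2[a][b] for b in range(2)] for a in range(2)]
print(combo == S)                          # True: S = 1*P1 + 3*P2
print(total == [[1, 0], [0, 1]])           # True: I = P1 + P2
print(matmul(P1, P2) == [[0, 0], [0, 0]])  # True: P1*P2 = 0
```

When the eigenvalues are irrational, the same formula has to be evaluated with exact arithmetic in a splitting field of the minimum polynomial, which is the point of the final remark in the deduction.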

Essay 5.5 Kronecker as One of E. T. Bell's "Men of Mathematics"


Kronecker laid himself out in 1891 to criticize Cantor's work to his students in Berlin, and it became clear that there was no room for them both under one roof. As Kronecker was already in possession, Cantor resigned himself to staying out in the cold.
E. T. Bell, Men of Mathematics, p. 570.

Discussing Lindemann's proof that $\pi$ is transcendental, Kronecker asked, "Of what use is your beautiful investigation regarding $\pi$? Why study such problems, since irrational [and hence transcendental] numbers do not exist?"
Ibid., p. 568.

It is a mistake to take Eric Temple Bell's book Men of Mathematics too seriously. Bell set out to write a popular book about the history of mathematics, and he succeeded admirably. From its publication in 1937 until today, the book has amused and inspired several generations of amateur and professional mathematicians, including mine. His outrageousness is part of his winning style. But in spicing up his stories he did create distortions, extrapolations, and outright falsehoods that have since become common "knowledge."

On the whole, the picture Bell paints of Kronecker is not negative. "His skepticism was his greatest contribution," the table of contents says of Kronecker, and at many points Kronecker is made the respectable spokesman for those who objected to the growing use of the transfinite. Kronecker would probably have preferred to have less about his philosophy and more about his mathematical achievements in the book, but his importance and his contributions are certainly not slighted. Unfortunately, the word "vicious" is used more than once to describe his criticism of others, principally Weierstrass and Cantor, and, as I have said in the Preface, I do not feel this word is justified.
I have recently come to understand what lies behind the two statements of Bell that are quoted above, and in explaining them in this essay I hope to shed some light on Kronecker and his ideas, as well as on Bell, his methods, and his lack of credibility. In the first quotation Bell implies that it was one thing to criticize professional colleagues and quite another to do it in front of students. That was surely Cantor's view of it when he bitterly complained to W. Thome in a letter dated 21 September 1891 that Kronecker in public lectures had told his "immature audience" that Cantor's work was "mathematical sophistry" [56, document 38]. Bell was undoubtedly referring to this letter of Cantor's when he wrote of Kronecker's 1891 criticism of Cantor, but I doubt that his evidence of the alleged criticism was as reliable as that contained in the transcript of Kronecker's 1891 lectures that was recently published [45]. Despite Cantor's claim that Kronecker had denounced a specific work of his, the published version of Kronecker's lectures does not even mention Cantor, much less the


specific work Cantor cites. The word "sophistry" is indeed used [45, p. 247] in connection with a transformation of $\nu$-dimensional space into a space of some other dimension, which Cantor might reasonably imagine to be a reference to his work, but, as the editors of the lectures point out, the lectures were given just one year after Peano published his famous curve that fills an area of the plane. It is of course possible that Cantor knew more than we do about Kronecker's actual words; he says in the letter that he had obtained a copy of the lectures by chance, but nothing of the sort is to be found among his surviving papers. However, it is equally possible that he was overreacting to a simple statement of opinion that may not even have been directed at him and that had, in the version that has survived, no tinge of personal animosity. For his part, Cantor says Kronecker's "entire course of lectures is a muddled and superficial mix of undigested ideas, boasts, unmotivated name-calling, and rotten jokes."* If Bell's impression of Kronecker's remarks came from this characterization of them it is easy to see why he used the word "vicious," but the surviving version of the lectures in no way deserves Cantor's description.

Finally, before Kronecker's hostility can be taken, as Bell does take it, to be the cause of Cantor's spending his entire career at Halle instead of being called to Berlin, one must show that somewhere there was someone who felt Cantor was a qualified and desirable candidate for appointment at Berlin. The fact is that Kronecker died in the very year 1891 that these lectures were given, so there was very soon no question of their needing to live "under one roof." Weierstrass survived Kronecker, and when Weierstrass died he was succeeded by H. A. Schwarz, no friend of Kronecker's views, but I am unaware of any effort to bring Cantor to Berlin.

* Die ganze Vorlesung ist ein wirres oberflächliches Gemisch von unverdauten Ideen, Prahlereien, unmotivierten Schimpfereien und faulen Witzen.

When, as a graduate student, I first read the passage in Bell's book about Kronecker's attitude toward $\pi$, I think I was as indignant as Bell wanted me to be with the claim that $\pi$ might not "exist." Years later, when I encountered the same anecdote in Constance Reid's book Hilbert, I had come to take a great interest in Kronecker and his ideas, so this time rather than being indignant, I was puzzled and unsure that the anecdote was authentic. Neither Bell nor Reid cites a source. Kronecker's works show no inhibition about the use of $\pi$. His papers on analytic number theory and those on elliptic functions are full of $\pi$'s. In the first lecture in his course of lectures on number theory [44], he refers without apology to "the transcendental number $\pi$ from geometry" and notes that it can be defined by $\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots$. I am not aware that he ever expressed any reservations about any particular transcendental number. What he had reservations about was the notion that the totality of real numbers could be treated as a mathematical entity. Why would he have had any reservations about the "existence" of $\pi$, or even about the meaningfulness of Lindemann's


theorem that $\pi$ was transcendental?

On the other hand, Bell's quotation of Kronecker could not be a mere invention. A few years ago, I found what I took to be Bell's source in Florian Cajori's History of Mathematics [7], where Cajori writes, "[Kronecker] once paradoxically remarked to Lindemann: 'Of what use is your beautiful research on the number $\pi$? Why cogitate over such problems, when really there are no irrational numbers whatever?' " But Cajori gives no source either. His book was originally published in 1894, which would put it only three years away from Kronecker and make plausible the hypothesis that Cajori learned the story through word of mouth. Only much later did I read carefully the copyright page of the Chelsea reprint edition I had. There was a "second, revised and enlarged edition" in 1919, of which the Chelsea edition was a reprint. When I finally tracked down a copy of the 1894 edition in microfiche, I learned that the story of Kronecker and $\pi$ had been added in 1919, twenty-eight years after Kronecker's death.

Then, in June of 2003, while writing to Professor David Rowe of Mainz about a different question, I had the happy thought to ask him whether he knew where Cajori might have heard the story. By return e-mail he was able to give me what seems certain to be the correct source.* In 1904, Teubner published a German translation of Poincaré's Science et Hypothèse annotated by Lindemann [54]. One of Lindemann's notes, note (4) to page 20, cites Kronecker's advocacy of restating the theory of algebraic quantities entirely in terms of the theory of polynomials with integer coefficients (see Essay 1.1) and goes on to say:

Später ging Kronecker noch weiter, indem er die Existenz irrationaler Zahlen leugnete; so sagte er mir in seiner lebhaften und zu Paradoxen geneigten Art einmal: "Was nützt uns Ihre schöne Untersuchung über die Zahl $\pi$? Wozu das Nachdenken über solche Probleme, wenn es doch gar keine irrationalen Zahlen gibt?"

(Later, Kronecker went even further and denied the existence of irrational numbers; thus, he once said to me in his lively and paradoxical way, "Of what use to us are your beautiful researches about the number $\pi$? Why consider such problems when in fact there are no irrational numbers?")

Since Lindemann published this more than ten years after Kronecker's death, we are entitled to take his quotation marks with a grain of salt. Clearly, Lindemann felt that Kronecker was teasing him (not unkindly, it would seem, in view of the appearance of the word "beautiful"), but Kronecker's exact words, which are essential to an understanding of the underlying criticism, would probably have been difficult for Lindemann to recall after ten minutes, not to mention ten years.

* Their different English versions of the alleged quotation suggest that Bell may have based his telling of the story directly on Lindemann, not on Cajori's retelling.


That Kronecker would prefer to state Lindemann's result in a way that made no reference to the totality of transcendental numbers comes as no surprise. If his meaning was simply that he would prefer to state the result in a form something like, "For any polynomial $f(x)$ in one variable with integer coefficients the sequence of rational numbers $f\left(1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots \pm \frac{1}{2n+1}\right)$ can be bounded away from zero," no one would be scandalized, and the statement could not be used to ridicule Kronecker's views. I certainly do not claim to know what the point of Kronecker's criticism might have been (it could have been the form in which Lindemann stated his result or the methods he used or many things in between), but it seems certain to me that he would have regarded the result as having meaning and even as having considerable interest. For the fun of it, we can indulge Bell in his extravagant caricatures of our mathematical forebears, but we should be careful not to let them affect our understanding of the history of our subject. In particular, we should not let Kronecker's role as Bell's gadfly obscure his true-life role as a great mathematician whose works are classics.
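The reformulated statement above involves only rational arithmetic, so it can be explored directly on a computer. The following is a minimal sketch in Python, offered as an illustration rather than as anything found in Kronecker or Lindemann. The function names and the sample polynomial $f(x) = 4x - 3$ are assumptions chosen for the example, and the polynomial is tacitly taken to be nonzero. A finite computation of this kind proves nothing; the guarantee that a positive lower bound exists for every such polynomial is exactly the content of Lindemann's theorem.

```python
from fractions import Fraction
from itertools import islice


def leibniz_partial_sums():
    """Yield 1, 1 - 1/3, 1 - 1/3 + 1/5, ... as exact rational numbers."""
    total = Fraction(0)
    k = 0
    while True:
        # k-th term of the series for pi/4: (-1)^k / (2k + 1)
        total += Fraction((-1) ** k, 2 * k + 1)
        yield total
        k += 1


def evaluate(coeffs, x):
    """Evaluate an integer polynomial at x by Horner's rule.

    coeffs lists the coefficients from the highest degree down.
    """
    value = Fraction(0)
    for c in coeffs:
        value = value * x + c
    return value


# Hypothetical sample polynomial, chosen for illustration: f(x) = 4x - 3.
coeffs = [4, -3]
values = [evaluate(coeffs, s) for s in islice(leibniz_partial_sums(), 200)]

# Smallest |f(s_n)| among the first 200 partial sums, still an exact rational.
smallest = min(abs(v) for v in values)
print(smallest, float(smallest))
```

Every quantity in the sketch is a ratio of integers, so no real number, transcendental or otherwise, is ever invoked; only the observation that the values stay away from zero is at issue.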

References

[1] N. H. Abel, Mémoire sur une propriété générale d'une classe très-étendue de fonctions transcendantes, Mémoires présentés par divers savants à l'Académie des sciences, Paris, 1841, Oeuvres Complètes, vol. 1, 145-211, (4.1).
[2] N. H. Abel, Sur la résolution algébrique des équations, Oeuvres Complètes, vol. 2, pp. 217-243, (4.5).
[3] Aristotle, The Physics; P. H. Wicksteed and F. M. Cornford, translators, Harvard Univ. Press, Cambridge, Mass., 1957, (5.1).
[4] S. Axler, Linear Algebra Done Right, Springer-Verlag, New York, 1996, (5.3).
[5] E. T. Bell, Men of Mathematics, Simon and Schuster, New York, 1937, 1962, (5.5).
[6] N. Bourbaki, Éléments d'Histoire des Mathématiques, 2nd edition, Hermann, Paris, 1969, (4.3).
[7] F. Cajori, History of Mathematics, Macmillan, New York, 1894, 1919, Chelsea Reprint 1980, 1985, Landmarks of Science Microform, (5.5).
[8] N. G. Chebotarev (Tschebotarow), Grundzüge der Galois'schen Theorie, Noordhoff, Groningen-Djakarta, 1950, Translation of Osnovie Teorii Galua, Gosudarstvennoe Techniko-Teoreticheskoe-Isdatelstvo, Moscow-Leningrad, 1934-1937, (Synopsis, 1.7, 5.3).
[9] N. G. Chebotarev, Newton's Polygon and its Role in the Present Development of Mathematics (Russian), Isaac Newton, 1643-1727 (S. I. Vavilova [transliterated Wawilow on the English version of the title page], ed.), Izdatelstvo Akademii Nauk, Moscow-Leningrad, 1943, pp. 99-126, (4.4).
[10] H. T. Colebrooke, Algebra, with Arithmetic and Mensuration, from the Sanscrit of Brahmegupta and Bhascara, J. Murray, London, 1817, (3.1).
[11] Gabriel Cramer, Introduction à l'analyse des lignes courbes algébriques, Frères Cramer, Geneva, 1750, Landmarks of Science Microform, (4.4).
[12] R. Dedekind, Über einen arithmetischen Satz von Gauss, Mitt. Deut. Math. Ges. Prag, (1892), 1-11, Werke, vol. 2, 28-38, (2.5).
[13] R. Dedekind, Über die Begründung der Idealtheorie, Nachr. Kön. Ges. Wiss. Göttingen, (1895), 106-113, Werke, vol. 2, 50-58, (2.5).
[14] R. Dedekind and H. Weber, Theorie der algebraischen Funktionen einer Veränderlichen, Jour. für Math., 92 (1882), 181-290, Dedekinds Werke, vol. 1, 238-349, (4.5, 4.7).


[15] L. E. Dickson, History of the Theory of Numbers, Carnegie Institute, Washington, 1920, Chelsea reprint, 1971, (3.1).
[16] P. G. L. Dirichlet, Vorlesungen über Zahlentheorie (R. Dedekind, ed.), Vieweg, Braunschweig, 1863, 1871, 1879, 1894, Chelsea reprint, 1968, (3.3).
[17] H. M. Edwards, Euler and Quadratic Reciprocity, Mathematics Magazine, 56 (1983), 285-291, (3.5).
[18] H. M. Edwards, Galois Theory, Springer-Verlag, New York, 1984, (1.7, 1.9, 2.1, 2.3, 2.4).
[19] H. M. Edwards, Divisor Theory, Birkhäuser, Boston, 1990, (2.5).
[20] H. M. Edwards, Linear Algebra, Birkhäuser, Boston, 1995, (5.3, 5.4).
[21] H. M. Edwards, Kronecker on the Foundations of Mathematics, From Dedekind to Gödel (Jaakko Hintikka, ed.), Kluwer, 1995, pp. 45-52, (Preface, 1.1).
[22] H. M. Edwards, Kronecker's Fundamental Theorem of General Arithmetic, Proceedings of a conference held at MSRI, Berkeley, in April 2003, (to appear), (Synopsis).
[23] H. M. Edwards, O. Neumann and W. Purkert, Dedekinds "Bunte Bemerkungen" zu Kroneckers "Grundzüge", Arch. Hist. Exact Sci., 27 (1982), 49-85, (2.5).
[24] F. Engel, Eduard Study, Jahres. der DMV 40 (1931), (4.2).
[25] Euclid, The Thirteen Books of Euclid's Elements, T. L. Heath, translator and editor, 2nd Edition, Cambridge Univ. Press, 1925, Dover reprint, 1956, (1.2, 1.4, 3.1).
[26] L. Euler, Observationes de Comparatione Arcuum Curvarum Irrectificabilium, Novi Comm. acad. sci. Petropolitanae, 6 (1761), 58-84, Opera, ser. 1, vol. 21, pp. 80-107, Eneström listing 252, (4.2).
[27] E. Galois, Mémoire sur les conditions de résolubilité des équations par radicaux, J. Math. Pures et Appl., 11, 1846, 381-444, see [18] for other citations and for English translation, (Synopsis, 1.2, 1.9, 2.1, 2.3, 2.4).
[28] C. F. Gauss, Disquisitiones Arithmeticae, Braunschweig, 1801, (Synopsis, 2.5, 3.3, 3.4, 3.5, 3.6, 3.7).
[29] C. F. Gauss, Demonstratio Nova Theorematis Omnem Functionem Rationalem Integram Unius Variabilis in Factores Reales Primi vel Secundi Gradus Resolvi Posse (1799), Helmstadt, Werke, vol. 3, pp. 1-30, (5.1).
[30] C. F. Gauss, Demonstratio Nova Altera Theorematis Omnem Functionem Rationalem Integram Unius Variabilis in Factores Reales Primi vel Secundi Gradus Resolvi Posse, Comm. soc. reg. sci. Gottingensis (1815), Werke, vol. 3, 31-56, (5.1).
[31] K. Hensel and G. Landsberg, Theorie der algebraischen Funktionen einer Variabeln, Leipzig, 1902, Chelsea Reprint, 1965, (4.4).
[32] O. Hölder, Über den Casus Irreducibilis, Math. Annalen, 38 (1891), 307-312, (1.7).
[33] A. Hurwitz, Über die Theorie der Ideale, Nachr. kön. Ges. Wiss. Göttingen (1894), 291-298, Werke, vol. 2, 191-197, (2.5).
[34] A. Hurwitz, Über einen Fundamentalsatz der arithmetischen Theorie der algebraischen Größen, Nachr. kön. Ges. Wiss. Göttingen (1895), 230-240, Werke, vol. 2, 198-207, (2.5).
[35] A. N. Kolmogorov and A. P. Yuskevich (ed.), Mathematics in the 19th Century (Russian), Nauk, Moscow, 1978, English translation by A. Shenitzer, Birkhäuser, 1992, (1.2).


[36] A. Kneser, Über die Gattung niedrigster Ordnung . . . , Math. Annalen, 30 (1887), 179-202, (1.7).
[37] L. Kronecker, Über die verschiedenen Sturm'schen Reihen und ihre gegenseitigen Beziehungen, Monatsber. Akad. Wiss. Berlin (1873), 117-154, Werke, I, 303-348, (2.4).
[38] L. Kronecker, Über die Discriminante algebraischer Functionen einer Variablen, Jour. für Math., 91 (1881), 301-334, Werke, II, 193-236, (4.3).
[39] L. Kronecker, Grundzüge einer arithmetischen Theorie der algebraischen Größen, Jour. für Math. 92 (1882), 1-122, Werke, II, 237-388, (1.1, 1.4, 1.5, 1.7, 2.2, 2.4, 4.5).
[40] L. Kronecker, Die Zerlegung der ganzen Grössen eines natürlichen Rationalitäts-Bereichs in ihre irreductibeln Factoren, Jour. für Math. 94 (1883), 344-348, Werke, II, 409-416, (1.4).
[41] L. Kronecker, Zur Theorie der Formen höherer Stufen, Monatsber. Akad. Wiss. Berlin (1883), 957-960, Werke, II, 419-424, (2.5).
[42] L. Kronecker, Ein Fundamentalsatz der allgemeinen Arithmetik, Jour. für Math. 100 (1887), 490-510, Werke, IIIa, 209-240, (1.1, 1.2, 1.8).
[43] L. Kronecker, Über den Zahlbegriff, Jour. für Math. 101 (1887), 260-272, Werke, IIIa, 249-274, (1.1, 1.7).
[44] L. Kronecker, Vorlesungen über Zahlentheorie, Teubner, Leipzig, 1901, Reprint, Springer, New York, 1978, (3.2, 5.5).
[45] L. Kronecker, Über den Begriff der Zahl in der Mathematik (Sur le concept de nombre en mathématiques), Retranscribed and annotated by J. Boniface and N. Schappacher, Revue d'histoire des mathématiques, 7 (2001), 207-275, (1.1, 5.5).
[46] E. E. Kummer, Zur Theorie der complexen Zahlen, Jour. für Math. 35 (1847), 319-326, Collected Papers, 1, 203-210, (3.6).
[47] E. E. Kummer, Über die allgemeinen Reciprocitätsgesetze unter den Resten und den Nichtresten, Math. Abh. Kön. Akad. Wiss. Berlin, 1859, Collected Papers, 1, 699-839, (3.5, 3.6).
[48] J. L. Lagrange, Additions to Euler's Algebra, republished in vol. 7 of Lagrange's Oeuvres and vol. 1 (1) of Euler's Opera, (3.3).
[49] H. W. Lenstra, Jr., Solving the Pell Equation, AMS Notices, 49 (2002), pp. 182-192, (3.1).
[50] A. Loewy, Algebraische Gleichungen mit reelen Wurzeln, Math. Zeitschrift 11 (1921), 108-114, (1.7).
[51] I. Newton, The Mathematical Papers of Isaac Newton (D. T. Whiteside, ed.), Cambridge Univ. Press, 1969, (4.4).
[52] O. Ore, Niels Henrik Abel, Mathematician Extraordinary, Univ. of Minnesota, Minneapolis, 1957, Chelsea reprint, 1974, (4.1).
[53] H. Poincaré, L'Oeuvre Mathématique de Weierstrass, Acta Mathematica 22 (1899), 1-18, (Preface).
[54] H. Poincaré, Wissenschaft und Hypothese (F. Lindemann, ed.), Teubner, Leipzig, 1904, (5.5).
[55] Princeton University Bicentennial Conferences, Series 2, Conference 2, Problems of Mathematics, Reprinted in A Century of Mathematics in America, P. Duren et al., eds., AMS, Providence, 1989, (1.4).
[56] W. Purkert and H. J. Ilgauds, Georg Cantor, Birkhäuser, Basel, 1987, (5.5).


[57] B. Riemann, Grundlagen für eine allgemeine Theorie der Functionen einer veränderlichen complexen Grösse, Riemann's gesammelte mathematische Werke, 1892, Dover reprint, 1953, (4.3).
[58] G. Roch, Ueber die Anzahl der willkürlichen Constanten in algebraischen Functionen, Jour. f. Math. (1864), 372-376, (4.7).
[59] W. Scharlau, Unveröffentlichte algebraische Arbeiten Richard Dedekinds aus seiner Göttinger Zeit 1855-1858, Arch. Hist. Exact Sci. 27 (1982), 335-367, (1.7).
[60] H. J. S. Smith, Report on the Theory of Numbers, Reports of the British Association for the Advancement of Science, 1859-1865, (3.6).
[61] R. J. Walker, Algebraic Curves, Princeton Univ. Press, Princeton, 1950, Springer-Verlag reprint, 1978, (4.4).
[62] A. Weil, Number-theory and algebraic geometry, Proceedings of the International Mathematics Congress, vol. II, 1950, pp. 90-100, Collected Papers, vol. 2, 442-452, (Preface, 2.2).
[63] A. Weil, Number Theory, Birkhäuser, Boston, 1984, (3.6).

Index

Abel, Niels Henrik (1802-1829), xvii, xviii, 119, 120, 122-124, 127, 128, 142
addition formula, see Euler's addition formula
adjunction relations, xv, 51
adjunctions, xiii, 42
algebraic field, 47
algebraic integer, 63, 129
algebraic number, 63
algebraic number field, 48
algebraic quantities, xv, 1, 46, 47
algebraic variation, xvii, 121-124, 126, 127, 155
ambiguity of truncated solutions, 136, 138
ambiguous module classes, 96-98, 100
Archimedes, 65, 67, 68
Aristotle, ix, 179
Axler, Sheldon, 190
Bell, Eric Temple (1883-1960), 201-204
Bhascara Acharya, 67
binary quadratic forms, 108, 112
Bishop, Errett (1928-1983), ix, 190
Brahmagupta, 66-68, 109
Brouwer, L.E.J. (1881-1966), ix, 190
Buchstabenrechnung, 2, 4, 68
Cajori, Florian (1859-1930), 203
canonical form, 74, 191
Cantor, Georg (1845-1918), ix, x, 201, 202
Chebotarev, Nikolai G. (1894-1947), xv, 132
Chinese remainder theorem, 73, 93, 104
class group, 99
class semigroup, xvi, 79, 98
Comparison Algorithm, 80
completed infinites, ix, x
complex numbers, 122, 128, 130, 179, 194
composition of forms, 102, 108, 109, 112
congruence relations, xvi, 71, 73
consistency requirement, 164-166
constructive mathematics, x, xi, 6, 133, 179-180, 186
content of a form, 113
continued fractions algorithm, 65
Dedekind, Richard (1831-1916), ix, x, 31, 62, 108, 110, 111, 129, 142, 164
differential, 121, 157
Dirichlet, G. Lejeune (1805-1859), ix, xvii, 80, 108, 110
Disquisitiones Arithmeticae, xv, 79, 96, 102, 103, 108, 109, 127
double adjunction, xiv, 36
elementary divisors, 194
elementary symmetric polynomials, 57
elimination, 10
elliptic curves, xviii
equivalence problem, 79, 88
equivalent modules, xvi, 79
Euclid's Elements, xi, 6, 16, 66
Euclidean algorithm, 16, 71, 73


Euler, Leonhard (1707-1783), xviii, 91, 102, 124, 127
Euler's addition formula, 124-127
field of constants, 131, 150, 152, 157
field of quotients, 4
folium of Descartes, 137, 151
full representations of modules, 115
Fundamental Theorem of Algebra, xi, xiv, xix, 179, 185
Galois, Évariste (1811-1832), xiii, xv, 6, 7, 39, 42, 56, 120
Galois field, 41, 49-51
Galois group, 42
Galois polynomial, 39, 41
Gauss, Carl Friedrich (1777-1855), ix, xv-xvii, 3, 6, 7, 79, 91, 102, 108-110, 126, 183
Gauss's lemma, xiii, 18, 62
general arithmetic, xi, xv, xvii, 4, 63
genus, xvii, xviii, 122, 123, 126-128, 131, 150, 152, 153, 157-159, 171
genus of an algebraic curve, xvii
Gröbner bases, 5
Hensel, Kurt (1861-1941), 132
Hermite, Charles (1822-1901), 108
Hilbert, David (1862-1943), ix
holomorphic differentials, xviii, xix, 124, 127, 155-158, 171-174, 176-178
Hurwitz, Adolph (1859-1919), 62
hyperelliptic curve, 154, 163
hypernumber, xv-xvi, 69
implicit differentiation, 155, 157, 161
infinite descent, 148, 149, 187, 188
infinity, x, xi
integers, 3
integral basis, 142, 147
integral domain, 4
integral over x, 128
irreducible polynomial, 15
Jacobi, Carl G.J. (1804-1851), 119
/ô, 157
Klein curve, 162, 178
Kronecker, Leopold (1823-1891), ix-xi, xiv, xv, xix, 1-8, 13, 31-33, 35, 46-47, 56, 62-63, 71, 108, 129, 142, 147, 190, 196, 201-204
Kronecker-Kneser Theorem, 31, 44
Kummer, Ernst (1810-1893), 102, 108, 110
Lagrange, Joseph-Louis (1736-1813), 92
Landsberg, Georg (1865-1912), 132
Legendre, Adrien-Marie (1752-1833), 102
Lindemann, Ferdinand (1852-1939), 203
linear algebra, xix, 190-193
minimal splitting polynomial, xiv, 39-40
module systems, 4-5
modules, xv-xvi, 71-78, 102
modules of hypernumbers, 73, 110
Moore-Penrose generalized inverse, 193, 200
multiplication of modules, xvi-xviii, 73, 77
Newton's polygon, xi, xvii, 132, 135, 139, 142, 152
Newton, Isaac (1642-1727), xvii, xviii, 132
normal basis, 142, 149
number, 2
order of a function at ∞, 129
Ore, Oystein (1899-1968), 120, 122
p-parts, 93
Pascal, Blaise (1623-1662), 1
Pell's equation, 86
pivotal modules, 97, 98
Plato, 66
Poincaré, Henri (1854-1912), x
primitive modules, 98-100
principal modules, xvi, 79
principal parts of a function, 164, 172
proof by contradiction, 186
Pythagoras, 65
quadratic character mod p, 102
quadratic forms, xv, 108


quadratic reciprocity, xv, xvi, 70, 93, 102, 106, 108
Reduction Algorithm, 83
Reid, Constance, 202
residues, 156, 159, 160, 162
Riemann surfaces, xvii, 121, 128
Riemann-Roch theorem, xviii, 164, 169
root fields, xiii, 5, 10
Rowe, David, 203
similar matrices, 193
simple algebraic extension, 10
Smith, H.J.S. (1826-1883), 108, 191
spectral theorem, xi, 195, 196, 199
splitting field, xiii, 6, 39, 40
stable modules, 80
successors, 81
Sylow theorems, xix, 186
Θ(x^ν), 129
the Euclidean algorithm, 16, 71
theorem of the primitive element, 44, 48, 52, 171
trace, 143, 157
transcendence degree, 47
truncated solution, 134-139
Walker, Robert J. (1909-1992), 132, 140
Weber, Heinrich (1842-1913), 142, 164
Weierstrass, Karl (1815-1897), ix, x, 201, 202
Weierstrass normal form, 124, 127
Weil, André (1906-1998), ix, 46, 110
