The American Mathematical Monthly, Number 3, March 2011

THE AMERICAN MATHEMATICAL MONTHLY VOLUME 118, NO. 3 MARCH 2011 Yueh-Gin Gung and Dr. Charles Y. Hu Award for 2011 to ...

Author: Mathematical Association of America

156 downloads 1294 Views 3MB Size Report

This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
Report copyright / DMCA form

DOWNLOAD PDF

THE AMERICAN MATHEMATICAL

MONTHLY VOLUME 118, NO. 3

MARCH 2011

Yueh-Gin Gung and Dr. Charles Y. Hu Award for 2011 to Joseph A. Gallian for Distinguished Service to Mathematics

195

Barbara Faires

Was Cantor Surprised?

198

Fernando Q. Gouvˆ ea

On Legendre’s Work on the Law of Quadratic Reciprocity

210

Steven H. Weintraub

Equimodular Polynomials and the Tritangency Theorems of Euler, Feuerbach, and Guinand

217

Alexander Ryba and Joseph Stern

The Sharkovsky Theorem: A Natural Direct Proof

229

Keith Burns and Boris Hasselblatt

A First Look at Differential Algebra

245

John H. Hubbard and Benjamin E. Lundell

NOTES Lines of Best Fit for the Zeros and for the Critical Points of a Polynomial

262

Grant Keady

Regular Matchstick Graphs

264

Sascha Kurz and Rom Pinchasi

A Recursive Scheme for Improving the Original Rate of Convergence to the Euler–Mascheroni Constant

268

Edward Chlebus

PROBLEMS AND SOLUTIONS

275

REVIEWS Logical Labyrinths. By Raymond M. Smullyan Christopher C. Leary

An Official Publication of the Mathematical Association of America

283

Invitation to Complex Analysis Ralph P. Boas Second Edition revised by Harold P. Boas An ideal choice for a first course in complex analysis, this book can be used either as a classroom text or for independent study. Written in an informal style by a master expositor, the book distills more than half a century of experience with the subject into a lucid, engaging, yet rigorous account. The book reveals both the power of complex analysis as a tool for applications and the intrinsic beauty of the subject as a fundamental part of pure mathematics. Written at the level of courses commonly taught in American universities to seniors and beginning graduate students, the book is suitable for readers acquainted with advanced calculus or introductory real analysis. The treatment goes beyond the standard material of power series, Cauchy’s theorem, residues, conformal mapping, and harmonic functions by including accessible discussions of many intriguing topics that are uncommon in a book at this level. Readers will encounter notions ranging from Landau’s notation to overconvergent series to the Phragmén-Lindelöf theorem. The flexibility afforded by the supplementary topics and applications makes the book adaptable either to a short, one-term course or to a comprehensive, full-year course. Each topic is discussed in a typical, commonly encountered situation rather than in the most general, abstract setting. There are no numbered equations. Numerous exercises interspersed in the text encourage readers to test their understanding of new concepts and techniques as they are presented. Detailed solutions of the exercises, included at the back of the book, both serve as models for students and facilitate independent study. Supplementary exercises at the ends of sections, not solved in the book, provide an additional teaching tool. This second edition of Invitation to Complex Analysis has been painstakingly revised by the author’s son, himself an award-winning mathematical expositor. Catalog Code: ICA Hardbound, 2010

Series: Textbooks List: $63.95

ISBN: 978-0-88385-764-9 Member Price: $50.95

THE AMERICAN MATHEMATICAL

MONTHLY VOLUME 118, NO. 3

MARCH 2011

EDITOR Daniel J. Velleman Amherst College ASSOCIATE EDITORS William Adkins Jeffrey Nunemacher Louisiana State University Ohio Wesleyan University David Aldous Bruce P. Palka University of California, Berkeley National Science Foundation Roger Alperin Joel W. Robbin San Jose State University University of Wisconsin, Madison Anne Brown Rachel Roberts Indiana University South Bend Washington University, St. Louis Edward B. Burger Judith Roitman Williams College University of Kansas, Lawrence Scott Chapman Edward Scheinerman Sam Houston State University Johns Hopkins University Ricardo Cortez Abe Shenitzer Tulane University York University Joseph W. Dauben Karen E. Smith City University of New York University of Michigan, Ann Arbor Beverly Diamond Susan G. Staples College of Charleston Texas Christian University Gerald A. Edgar John Stillwell The Ohio State University University of San Francisco Gerald B. Folland Dennis Stowe University of Washington, Seattle Idaho State University, Pocatello Sidney Graham Francis Edward Su Central Michigan University Harvey Mudd College Doug Hensley Serge Tabachnikov Texas A&M University Pennsylvania State University Roger A. Horn Daniel Ullman University of Utah George Washington University Steven Krantz Gerard Venema Washington University, St. Louis Calvin College C. Dwight Lahr Douglas B. West Dartmouth College University of Illinois, Urbana-Champaign Bo Li Purdue University EDITORIAL ASSISTANT Nancy R. Board

NOTICE TO AUTHORS The MONTHLY publishes articles, as well as notes and other features, about mathematics and the profession. Its readers span a broad spectrum of mathematical interests, and include professional mathematicians as well as students of mathematics at all collegiate levels. Authors are invited to submit articles and notes that bring interesting mathematical ideas to a wide audience of MONTHLY readers. The MONTHLY’s readers expect a high standard of exposition; they expect articles to inform, stimulate, challenge, enlighten, and even entertain. MONTHLY articles are meant to be read, enjoyed, and discussed, rather than just archived. Articles may be expositions of old or new results, historical or biographical essays, speculations or definitive treatments, broad developments, or explorations of a single application. Novelty and generality are far less important than clarity of exposition and broad appeal. Appropriate figures, diagrams, and photographs are encouraged. Notes are short, sharply focused, and possibly informal. They are often gems that provide a new proof of an old theorem, a novel presentation of a familiar theme, or a lively discussion of a single issue. Beginning January 1, 2011, submission of articles and notes is required via the MONTHLY’s Editorial Manager System. Initial submissions in pdf or LATEX form can be sent to the Editor-Elect Scott Chapman at http://www.editorialmanager.com/monthly The Editorial Manager System will cue the author for all required information concerning the paper. Questions concerning submission of papers can be addressed to the Editor-Elect at [email protected]. Authors who use LATEX are urged to use article.sty, or a similar generic style, and its standard environments with no custom formatting. The style of citations for journal articles and books should match that used on MathSciNet (see http://www.ams. org/mathscinet). Follow the link to Electronic Publications Information for authors at http://www.maa. org/pubs/monthly.html for information about figures and files, as well as general editorial guidelines. Letters to the Editor on any topic are invited. Comments, criticisms, and suggestions for making the MONTHLY more lively, entertaining, and informative can be forwarded to the Editor-Elect at [email protected]. The online MONTHLY archive at www.jstor.org is a valuable resource for both authors and readers; it may be searched online in a variety of ways for any specified keyword(s). MAA members whose institutions do not provide JSTOR access may obtain individual access for a modest annual fee; call 800-3311622. See the MONTHLY section of MAA Online for current information such as contents of issues and descriptive summaries of forthcoming articles: http://www.maa.org/

Proposed problems or solutions should be sent to: DOUG HENSLEY, MONTHLY Problems Department of Mathematics Texas A&M University 3368 TAMU College Station, TX 77843-3368 In lieu of duplicate hardcopy, authors may submit pdfs to [email protected]. Advertising Correspondence: MAA Advertising 1529 Eighteenth St. NW Washington DC 20036 Phone: (866) 821-1221 Fax: (866) 387-1208 E-mail: [email protected] Further advertising information can be found online at www.maa.org Change of address, missing issue inquiries, and other subscription correspondence: MAA Service Center, [email protected] All at the address: The Mathematical Association of America 1529 Eighteenth Street, N.W. Washington, DC 20036 Recent copies of the MONTHLY are available for purchase through the MAA Service Center. [email protected], 1-800-331-1622 Microfilm Editions: University Microfilms International, Serial Bid coordinator, 300 North Zeeb Road, Ann Arbor, MI 48106. The AMERICAN MATHEMATICAL MONTHLY (ISSN 0002-9890) is published monthly except bimonthly June-July and August-September by the Mathematical Association of America at 1529 Eighteenth Street, N.W., Washington, DC 20036 and Lancaster, PA, and copyrighted by the Mathematical Association of America (Incorporated), 2011, including rights to this journal issue as a whole and, except where otherwise noted, rights to each individual contribution. Permission to make copies of individual articles, in paper or electronic form, including posting on personal and class web pages, for educational and scientific use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear the following copyright notice: [Copyright the Mathematical Association of America 2011. All rights reserved.] Abstracting, with credit, is permitted. To copy otherwise, or to republish, requires specific permission of the MAA’s Director of Publications and possibly a fee. Periodicals postage paid at Washington, DC, and additional mailing offices. Postmaster: Send address changes to the American Mathematical Monthly, Membership/Subscription Department, MAA, 1529 Eighteenth Street, N.W., Washington, DC, 20036-1385.

Yueh-Gin Gung and Dr. Charles Y. Hu Award for 2011 to Joseph A. Gallian for Distinguished Service to Mathematics Barbara Faires

Joe Gallian.

The two themes that run through Joseph A. Gallian’s service to mathematics are (a) encouraging young mathematicians and helping them to develop successful careers and (b) communicating mathematics to the widest possible audience. He was one of the early proponents of undergraduates conducting mathematical research, and his REU at Duluth, which began in 1977, is widely regarded as the premier REU. The quality of the work at Joe’s REU is evidenced by the 160 papers by participants that grew out of their REU work. These papers have appeared in such journals as Crelle’s Journal, Journal of Algebra, Journal of Combinatorial Theory, Discrete Mathematics, Applied Discrete Mathematics, Annals of Discrete Mathematics, and Journal of Graph Theory. The REU, along with Joe’s continuing contact with its participants, makes an important contribution to developing the next generations of mathematicians. Participants have included some prominent mathematicians, whose careers this REU helped to shape. Joe is also an inspiration to a generation of mathematicians who involve students in high-quality undergraduate research in mathematics. Not only is Joe successful with his own REU, but he is generous with his time and advice to help others to set up REUs. In 2002, Joe was recognized by the Council on Undergraduate Research with their Fellow Award, given to members who have demonstrated sustained excellence in research with undergraduates. doi:10.4169/amer.math.monthly.118.03.195

March 2011]

YUEH-GIN GUNG AND DR. CHARLES Y. HU AWARD FOR 2011

195

Project NExT is the MAA’s widely acclaimed professional development opportunity for new or recent Ph.D.’s. Joe has been involved with Project NExT since its first summer in 1994 when he gave the closing address. The address was so extraordinarily successful that he has given each subsequent closing address. Later, when Joe became co-director of Project NExT in 1998, he assumed primary responsibility for many parts of the program, participated in developing the workshop program, and often drafted articles for Focus and reports to the Board of Governors. His boundless energy, enthusiasm, mathematical sophistication, and academic savvy have made him the perfect person to work with the hundreds of new mathematics faculty who have become Project NExT Fellows. Joe’s Project NExT work illustrates that his service to mathematics ranges across all the levels of work that needs to be done. Not only does Joe participate in longrange planning and vision discussions, but he also does the small tasks that keep a program functioning successfully. As with his REU, Joe does not treat Project NExT as a job for which he has specified, narrowly defined duties, but as a program to which he generously gives his time and to whose success he is committed. In his talks, Joe combines thorough preparation, imaginative presentation, and a showman’s flair with solid mathematical content. An indication of Joe’s success at communicating mathematics was a standing ovation from the audience at his Pi Mu Epsilon Frame Lecture. This audience included high school students as well as professors, and all understood and were excited with Joe’s talk. Joe has given 24 addresses at national meetings, 65 at MAA Section meetings, and over 200 at colleges and universities. Joe also communicates mathematics beyond the mathematical community. Articles about his work have appeared in twenty-five news outlets in the United States as well as in Europe and India. Four of these were in Science News and one in the New York Times. In addition to this he has more than a 100 articles in mathematical journals and other publications including Math Horizons, the Macmillan Encyclopedia of Chemistry, and the Mathematical Intelligencer. Joe Gallian was named by a Duluth newspaper as one of the “100 Great Duluthians of the 20th Century.” Joe has served professional organizations and the mathematical community at large. Joe has been national coordinator for Mathematics Awareness Month (2003 and 2010); he has served on more than 50 national committees, chairing at least 10 of them; he was a Council on Undergraduate Research Councilor for 11 years, serving as chair of the mathematics and computer science division for part of that time; he has served as associate editor for Mathematics Magazine and the American Mathematical Monthly; and he has been director or co-director of five conferences. Joe has refereed for 40 journals and is a reviewer for NSF, the Research Council of Canada, and the Australian Research Council. Those who work with Joe know that he is always an active contributor to a project in which he is involved—he is efficient and he moves the project along. Joe Gallian’s many awards and honors attest to his passion to serve undergraduates, professional organizations, and the mathematical community. He has been honored with teaching awards from the University of Minnesota Duluth, the Carnegie Foundation for the Advancement of Teaching, and the Mathematical Association of America (Haimo Award). Joe has received the MAA’s Trevor Evans and Carl B. Allendoerfer Awards and has been an MAA Polya Lecturer. Joe served as second vice president and then president of the MAA. Joe completed his undergraduate degree at Slippery Rock University, M.A. at the University of Kansas, and Ph.D. at Notre Dame. He is a professor of mathematics and statistics at the University of Minnesota Duluth, where he was recognized in December 2009 with the Chancellor’s Award for Distinguished Research. 196

c THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 118

About the Gung-Hu Award. The Yueh-Gin Gung and Dr. Charles Y. Hu Award for Distinguished Service to Mathematics is the successor to the Association’s Award for Distinguished Service to Mathematics, first presented in 1962. It is intended to be the most prestigious award for service offered by the Association. The initial endowment was contributed by husband and wife Dr. Charles Y. Hu and Yueh-Gin Gung. Dr. Hu and Yueh-Gin Gung were not mathematicians, but rather a professor of geography at the University of Maryland and a librarian at the University of Chicago, respectively. They contributed generously to our discipline because, as they wrote, “We always have high regard and great respect for the intellectual agility and high quality of mind of mathematicians and consider mathematics as the most vital field of study in the technological age we are living in.”

The Essence of Mathematics . . . “The essence of mathematics is accuracy.” Walter Bagehot, The Works of Walter Bagehot, Vol. V, The Travelers Insurance Company, Hartford, CT, 1891, p. 457. “The essence of mathematics is exact truthfulness.” Ellery W. Davis, The Condition of Secondary Mathematical Instruction With Some Hints to Remedies, Mathematical Supplement of School Science, Vol. 1, No. 1, 1903, p. 8. “The essence of mathematics lies precisely in its freedom.” Georg Cantor, Ueber unendliche, lineare Punktmannichfaltigkeiten, Mathematische Annalen 21 (1883) 564.

March 2011]

YUEH-GIN GUNG AND DR. CHARLES Y. HU AWARD FOR 2011

197

Was Cantor Surprised? Fernando Q. Gouvêa Abstract. We look at the circumstances and context of Cantor’s famous remark, “I see it, but I don’t believe it.” We argue that, rather than denoting astonishment at his result, the remark pointed to Cantor’s worry about the correctness of his proof.

Mathematicians love to tell each other stories. We tell them to our students too, and they eventually pass them on. One of our favorites, and one that I heard as an undergraduate, is the story that Cantor was so surprised when he discovered one of his theorems that he said “I see it, but I don’t believe it!” The suggestion was that sometimes we might have a proof, and therefore know that something is true, but nevertheless still find it hard to believe. That sentence can be found in Cantor’s extended correspondence with Dedekind about ideas that he was just beginning to explore. This article argues that what Cantor meant to convey was not really surprise, or at least not the kind of surprise that is usually suggested. Rather, he was expressing a very different, if equally familiar, emotion. In order to make this clear, we will look at Cantor’s sentence in the context of the correspondence as a whole. Exercises in myth-busting are often unsuccessful. As Joel Brouwer says in his poem “A Library in Alexandria,” . . . And so history gets written to prove the legend is ridiculous. But soon the legend replaces the history because the legend is more interesting. Our only hope, then, lies in arguing not only that the standard story is false, but also that the real story is more interesting. 1. THE SURPRISE. The result that supposedly surprised Cantor was the fact that sets of different dimension could have the same cardinality. Specifically, Cantor showed (of course, not yet using this language) that there was a bijection between the interval I = [0, 1] and the n-fold product I n = I × I × · · · × I . There is no doubt, of course, that this result is “surprising,” i.e., that it is counterintuitive. In fact Cantor said so explicitly, pointing out that he had expected something different. But the story has grown in the telling, and in particular Cantor’s phrase about seeing but not believing has been read as expressing what we usually mean when we see something happen and exclaim “Unbelievable!” What we mean is not that we actually do not believe, but that we find what we know has happened to be hard to believe because it is so unusual, unexpected, surprising. In other words, the idea is that Cantor felt that the result was hard to believe even though he had a proof. His phrase has been read as suggesting that mathematical proof may engender rational certainty while still not creating intuitive certainty. The story was then co-opted to demonstrate that mathematicians often discover things that they did not expect or prove things that they did not actually want to prove. For example, here is William Byers in How Mathematicians Think: doi:10.4169/amer.math.monthly.118.03.198

198


Cantor himself initially believed that a higher-dimensional figure would have a larger cardinality than a lower-dimensional one. Even after he had found the argument that demonstrated that cardinality did not respect dimensions: that one-, two-, three-, even n-dimensional sets all had the same cardinality, he said, “I see it, but I don’t believe it.” [2, p. 179] Did Cantor’s comment suggest that he found it hard to believe his own theorem even after he had proved it? Byers was by no means the first to say so. Many mathematicians thinking about the experience of doing mathematics have found Cantor’s phrase useful. In his preface to the original (1937) publication of the Cantor-Dedekind correspondence, J. Cavaillès already called attention to the phrase: . . . these astonishing discoveries—astonishing first of all to the author himself: “I see it but I don’t believe it at all,”1 he writes in 1877 to Dedekind about one of them—, these radically new notions . . . [14, p. 3, my translation] Notice, however, that Cavaillès is still focused on the description of the result as “surprising” rather than on the issue of Cantor’s psychology. It was probably Jacques Hadamard who first connected the phrase to the question of how mathematicians think, and so in particular to what Cantor was thinking. In his famous Essay on the Psychology of Invention in the Mathematical Field, first published in 1945 (only eight years after [14]), Hadamard is arguing about Newton’s ideas: . . . if, strictly speaking, there could remain a doubt as to Newton’s example, others are completely beyond doubt. For instance, it is certain that Georg Cantor could not have foreseen a result of which he himself says “I see it, but I do not believe it.” [10, pp. 61–62]. Alas, when it comes to history, few things are “certain.” 2. THE MAIN CHARACTERS. Our story plays out in the correspondence between Richard Dedekind and Georg Cantor during the 1870s. It will be important to know something about each of them. Richard Dedekind was born in Brunswick on October 6, 1831, and died in the same town, now part of Germany, on February 12, 1916. He studied at the University of Göttingen, where he was a contemporary and friend of Bernhard Riemann and where he heard Gauss lecture shortly before the old man’s death. After Gauss died, Lejeune Dirichlet came to Göttingen and became Dedekind’s friend and mentor. Dedekind was a very creative mathematician, but he was not particularly ambitious. He taught in Göttingen and in Zurich for a while, but in 1862 he returned to his home town. There he taught at the local Polytechnikum, a provincial technical university. He lived with his brother and sister and seemed uninterested in offers to move to more prestigious institutions. See [1] for more on Dedekind’s life and work. Our story will begin in 1872. The first version of Dedekind’s ideal theory had appeared as Supplement X to Dirichlet’s Lectures in Number Theory (based on actual lectures by Dirichlet but entirely written by Dedekind). Also just published was one of his best known works, “Stetigkeit und Irrationalzahlen” (“Continuity and Irrational Numbers”; see [7]; an English translation is included in [5]). This was his account of how to construct the real numbers as “cuts.” He had worked out the idea in 1858, but published it only 14 years later. 1 Cavaill` es

misquotes Cantor’s phrase as “je le vois mais je ne le crois point.”

March 2011]

WAS CANTOR SURPRISED?

199

Georg Cantor was born in St. Petersburg, Russia, on March 3, 1845. He died in Halle, Germany, on January 6, 1918. He studied at the University of Berlin, where the mathematics department, led by Karl Weierstrass, Ernst Eduard Kummer, and Leopold Kronecker, might well have been the best in the world. His doctoral thesis was on the number theory of quadratic forms. In 1869, Cantor moved to the University of Halle and shifted his interests to the study of the convergence of trigonometric series. Very much under Weierstrass’s influence, he too introduced a way to construct the real numbers, using what he called “fundamental series.” (We call them “Cauchy sequences.”) His paper on this construction also appeared in 1872. Cantor’s lifelong dream seems to have been to return to Berlin as a professor, but it never happened. He rose through the ranks in Halle, becoming a full professor in 1879 and staying there until his death. See [13] for a short account of Cantor’s life. The standard account of Cantor’s mathematical work is [4]. Cantor is best known, of course, for the creation of set theory, and in particular for his theory of transfinite cardinals and ordinals. When our story begins, this was mostly still in the future. In fact, the birth of several of these ideas can be observed in the correspondence with Dedekind. This correspondence was first published in [14]; we quote it from the English translation by William Ewald in [8, pp. 843–878]. 3. “ALLOW ME TO PUT A QUESTION TO YOU.” Dedekind and Cantor met in Switzerland when they were both on vacation there. Cantor had sent Dedekind a copy of the paper containing his construction of the real numbers. Dedekind responded, of course, by sending Cantor a copy of his booklet. And so begins the story. Cantor was 27 years old and very much a beginner, while Dedekind was 41 and at the height of his powers; this accounts for the tone of deference in Cantor’s side of the correspondence. Cantor’s first letter acknowledged receipt of [7] and says that “my conception [of the real numbers] agrees entirely with yours,” the only difference being in the actual construction. But on November 29, 1873, Cantor moves on to new ideas: Allow me to put a question to you. It has a certain theoretical interest for me, but I cannot answer it myself; perhaps you can, and would be so good as to write me about it. It is as follows. Take the totality of all positive whole-numbered individuals n and denote it by (n). And imagine, say, the totality of all positive real numerical quantities x and designate it by (x). The question is simply, Can (n) be correlated to (x) in such a way that to each individual of the one totality there corresponds one and only one of the other? At first glance one says to oneself no, it is not possible, for (n) consists of discrete parts while (x) forms a continuum. But nothing is gained by this objection, and although I incline to the view that (n) and (x) permit no one-to-one correlation, I cannot find the explanation which I seek; perhaps it is very easy. In the next few lines, Cantor points out that the question is not as dumb as it looks, since “the totality qp of all positive rational numbers” can be put in one-to-one correspondence with the integers. We do not have Dedekind’s side of the correspondence, but his notes indicate that he responded indicating that (1) he could not answer the question either, (2) he could show that the set of all algebraic numbers is countable, and (3) that he didn’t think the question was all that interesting. Cantor responded on December 2: 200


I was exceptionally pleased to receive your answer to my last letter. I put my question to you because I had wondered about it already several years ago, and was never certain whether the difficulty I found was subjective or whether it was inherent in the subject. Since you write that you too are unable to answer it, I may assume the latter.—In addition, I should like to add that I have never seriously occupied myself with it, because it has no special practical interest for me. And I entirely agree with you when you say that for this reason it does not deserve much effort. But it would be good if it could be answered; e.g., if it could be answered with no, then one would have a new proof of Liouville’s theorem that there are transcendental numbers. Cantor first concedes that perhaps it is not that interesting, then immediately points out an application that was sure to interest Dedekind! In fact, Dedekind’s notes indicate that it worked: “But the opinion I expressed that the first question did not deserve too much effort was conclusively refuted by Cantor’s proof of the existence of transcendental numbers.” [8, p. 848] These two letters are fairly typical of the epistolary relationship between the two men: Cantor is deferential but is continually coming up with new ideas, new questions, new proofs; Dedekind’s role is to judge the value of the ideas and the correctness of the proofs. The very next letter, from December 7, 1873, contains Cantor’s first proof of the uncountability of the real numbers. (It was not the “diagonal” argument; see [4] or [9] for the details.) 4. “THE SAME TRAIN OF THOUGHT . . . ” Cantor seemed to have a good sense for what question should come next. On January 5, 1874, he posed the problem of higher-dimensional sets: As for the question with which I have recently occupied myself, it occurs to me that the same train of thought also leads to the following question: Can a surface (say a square including its boundary) be one-to-one correlated to a line (say a straight line including its endpoints) so that to every point of the surface there corresponds a point of the line, and conversely to every point of the line there corresponds a point of the surface? It still seems to me at the moment that the answer to this question is very difficult—although here too one is so impelled to say no that one would like to hold the proof to be almost superfluous. Cantor’s letters indicate that he had been asking others about this as well, and that most considered the question just plain weird, because it was “obvious” that sets of different dimensions could not be correlated in this way. Dedekind, however, seems to have ignored this question, and the correspondence went on to other issues. On May 18, 1874, Cantor reminded Dedekind of the question, and seems to have received no answer. The next letter in the correspondence is from May, 1877. The correspondence seems to have been reignited by a misunderstanding of what Dedekind meant by “the essence of continuity” in [7]. On June 20, 1877, however, Cantor returns to the question of bijections between sets of different dimensions, and now proposes an answer: . . . I should like to know whether you consider an inference-procedure that I use to be arithmetically rigorous. The problem is to show that surfaces, bodies, indeed even continuous structures of ρ dimensions can be correlated one-to-one with continuous lines, i.e., March 2011]


201

with structures of only one dimension—so that surfaces, bodies, indeed even continuous structures of ρ dimension have the same power as curves. This idea seems to conflict with the one that is especially prevalent among the representatives of modern geometry, who speak of simply infinite, doubly, triply, . . . , ρ-fold infinite structures. (Sometimes you even find the idea that the infinity of points of a surface or a body is obtained as it were by squaring or cubing the infinity of points of a line.) Significantly, Cantor’s formulation of the question had changed. Rather than asking whether there is a bijection, he posed the question of finding a bijection. This is, of course, because he believed he had found one. By this point, then, Cantor knows the right answer. It remains to give a proof that will convince others. He goes on to explain his idea for that proof, working with the ρ-fold product of the unit interval with itself, but for our purposes we can consider only the case ρ = 2. The proof Cantor proposed is essentially this: take a point (x, y) in [0, 1] × [0, 1], and write out the decimal expansions of x and y: (x, y) = (0.abcde . . . , 0.αβγ δ . . . ). Some real numbers have more than one decimal expansion. In that case, we always choose the expansion that ends in an infinite string of 9s. Cantor’s idea is to map (x, y) to the point z ∈ [0, 1] given by z = 0.aαbβcγ dδe . . . Since we can clearly recover x and y from the decimal expansion of z, this gives the desired correspondence. Dedekind immediately noticed that there was a problem. On June 22, 1877 (one cannot fail to be impressed with the speed of the German postal service!), he wrote back pointing out a slight problem “which you will perhaps solve without difficulty.” He had noticed that the function Cantor had defined, while clearly one-to-one, was not onto. (Of course, he did not use those words.) Specifically, he pointed out that such numbers as z = 0.120101010101 . . . did not correspond to any pair (x, y), because the only possible value for x is 0.100000 . . . , which is disallowed by Cantor’s choice of decimal expansion. He was not sure if this was a big problem, adding “I do not know if my objection goes to the essence of your idea, but I did not want to hold it back.” Of course, the problem Dedekind noticed is real. In fact, there are a great many real numbers not in the image, since we can replace the ones that separate the zeros with any sequence of digits. The image of Cantor’s map is considerably smaller than the whole interval. Cantor’s first response was a postcard sent the following day. (Can one envision him reading the letter at the post office and immediately dispatching a postcard back?) He acknowledged the error and suggested a solution: Alas, you are entirely correct in your objection; but happily it concerns only the proof, not the content. For I proved somewhat more than I had realized, in that I bring a system x1 , x2 , . . . , xρ of unrestricted real variables (that are ≥ 0 and ≤ 1) into one-to-one relationship with a variable y that does not assume all values of 202


that interval, but rather all with the exception of certain y 00 . However, it assumes each of the corresponding values y 0 only once, and that seems to me to be the essential point. For now I can bring y 0 into a one-to-one relation with another quantity t that assumes all the values ≥ 0 and ≤ 1. I am delighted that you have found no other objections. I shall shortly write to you at greater length about this matter. This is a remarkable response. It suggests that Cantor was very confident that his result was true. This confidence was due to the fact that Cantor was already thinking in terms of what later became known as “cardinality.” Specifically, he expects that the existence of a one-to-one mapping from one set A to another set B implies that the size of A is in some sense “less than or equal to” that of B. Cantor’s proof shows that the points of the square can be put into bijection with a subset of the interval. Since the interval can clearly be put into bijection with a subset of the square, this strongly suggests that both sets of points “are the same size,” or, as Cantor would have said it, “have the same power.” All we need is a proof that the “powers” are linearly ordered in a way that is compatible with inclusions. That the cardinals are indeed ordered in this way is known today as the SchroederBernstein theorem. The postcard shows that Cantor already “knew” that the SchroederBernstein theorem should be true. In fact, he seems to implicitly promise a proof of that very theorem. He was not able to find such a proof, however, then or (as far as I know) ever. His fuller response, sent two days later on June 25, contained instead a completely different, and much more complicated, proof of the original theorem. I sent you a postcard the day before yesterday, in which I acknowledged the gap you discovered in my proof, and at the same time remarked that I am able to fill it. But I cannot repress a certain regret that the subject demands more complicated treatment. However, this probably lies in the nature of the subject, and I must console myself; perhaps it will later turn out that the missing portion of that proof can be settled more simply than is at present in my power. But since I am at the moment concerned above all to persuade you of the correctness of my theorem . . . I allow myself to present another proof of it, which I found even earlier than the other. Notice that what Cantor is trying to do here is to convince Dedekind that his theorem is true by presenting him a correct proof.2 There is no indication that Cantor had any doubts about the correctness of the result itself. In fact, as we will see, he says so himself. Let’s give a brief account of Cantor’s proof; to avoid circumlocutions, we will express most of it in modern terms. Cantor began by noting that every real number x between 0 and 1 can be expressed as a continued fraction 1

x=

1

a+ b+

1 c + ···

2 Cantor claimed he had found this proof before the other. I find this hard to believe. In fact, the proof looks very much like the result of trying to fix the problem in the first proof by replacing (nonunique) decimal expansions with (unique) continued fraction expansions.

March 2011]


203

where the partial quotients a, b, c, . . . , etc. are all positive integers. This representation is infinite if and only if x is irrational, and in that case the representation is unique. So one can argue just as before, “interleaving” the two continued fractions for x and y, to establish a bijection between the set of pairs (x, y) such that both x and y are irrational and the set of irrational points in [0, 1]. The result is a bijection because the inverse mapping, splitting out two continued fraction expansions from a given one, will certainly produce two infinite expansions. That being done, it remains to be shown that the set of irrational numbers between 0 and 1 can be put into bijection with the interval [0, 1]. This is the hard part of the proof. Cantor proceeded as follows. First he chose an enumeration of the rationals {rk } and an increasing sequence of irrationals {ηk } in [0, 1] converging to 1. He then looked at the bijection from [0, 1] to [0, 1] that is the identity on [0, 1] except for mapping rk 7 → ηk , ηk 7 → rk . This gives a bijection between irrationals in [0, 1] and [0, 1] minus the sequence {ηk } and reduces the problem to proving that [0, 1] can be put into bijection with [0, 1] − {ηk }. At this point Cantor claims that it is now enough to “successively apply” the following theorem: A number y that can assume all the values of the interval (0 . . . 1) with the solitary exception of the value 0 can be correlated one-to-one with a number x that takes on all values of the interval (0 . . . 1) without exception. In other words, he claimed that there was a bijection between the half-open interval (0, 1] and the closed interval [0, 1], and that “successive application” of this fact would finish the proof. In the actual application he would need the intervals to be open on the right, so, as we will see, he chose a bijection that mapped 1 to itself. Cantor did not say exactly what kind of “successive application” he had in mind, but what he says in a later letter suggests it was this: we have the interval [0, 1] minus the sequence of the ηk . We want to “put back in” the ηk , one at a time. So we leave the interval [0, η1 ) alone, and look at (η1 , η2 ). Applying the lemma, we construct a bijection between that and [η1 , η2 ). Then we do the same for (η2 , η3 ) and so on. Putting together these bijections produces the bijection we want. Finally, it remained to prove the lemma, that is, to construct the bijection from [0, 1] to (0, 1]. Modern mathematicians would probably do this by choosing a sequence xn in (0, 1), mapping 0 to x1 and then every xn to xn+1 . This “Hilbert hotel” idea was still some time in the future, however, even for Cantor. Instead, Cantor chose a bijection that could be represented visually, and simply drew its graph. He asked Dedekind to consider “the following peculiar curve,” which we have redrawn in Figure 1 based on the photograph reproduced in [4, p. 63]. Such a picture requires some explanation, and Cantor provided it. The domain has been divided by a geometric progression, so b = 1/2, b1 = 3/4, and so on; a = (0, 1/2), a 0 = (1/2, 3/4), etc. The point C is (1, 1). The points d 0 = (1/2, 1/2), d 00 = (3/4, 3/4), etc. give the corresponding subdivision of the main diagonal. The curve consists of infinitely many parallel line segments ab, a 0 b0 , a 00 b00 and of the point c. The endpoints b, b0 , b00 , . . . are not regarded as belonging to the curve. The stipulation that the segments are open at their lower endpoints means that 0 is not in the image. This proves the lemma, and therefore the proof is finished. 204


a

a

d

a d b b

C

a b

d

a

d

b

P O

b

b1

b2 b3 b4

Figure 1. Cantor’s function from [0, 1] to (0, 1].

Cantor did not even add that last comment. As soon as he had explained his curve, he moved on to make extensive comments on the theorem and its implications. He turns on its head the objection that various mathematicians made to his question, namely that it was “obvious” from geometric considerations that the number of variables is invariant: For several years I have followed with interest the efforts that have been made, building on Gauss, Riemann, Helmholtz, and others, towards the clarification of all questions concerning the ultimate foundations of geometry. It struck me that all the important investigations in this field proceed from an unproven presupposition which does not appear to me self-evident, but rather to need a justification. I mean the presupposition that a ρ-fold extended continuous manifold needs ρ independent real coordinates for the determination of its elements, and that for a given manifold this number of coordinates can neither be increased nor decreased. This presupposition became my view as well, and I was almost convinced of its correctness. The only difference between my standpoint and all the others was that I regarded that presupposition as a theorem which stood in great need of a proof; and I refined my standpoint into a question that I presented to several colleagues, in particular at the Gauss Jubilee in Göttingen. The question was the following: “Can a continuous structure of ρ dimensions, where ρ > 1, be related oneto-one with a continuous structure of one dimension so that to each point of the former there corresponds one and only one point of the latter?” Most of those to whom I presented this question were extremely puzzled that I should ask it, for it is quite self-evident that the determination of a point in an extension of ρ dimensions always needs ρ independent coordinates. But whoever penetrated the sense of the question had to acknowledge that a proof was needed to show why the question should be answered with the “self-evident” no. As I say, I myself was one of those who held it for the most likely that the March 2011]


205

question should be answered with a no—until quite recently I arrived by rather intricate trains of thought at the conviction that the answer to that question is an unqualified yes.3 Soon thereafter I found the proof which you see before you today. So one sees what wonderful power lies in the ordinary real and irrational numbers, that one is able to use them to determine uniquely the elements of a ρ-fold extended continuous manifold with a single coordinate. I will only add at once that their power goes yet further, in that, as will not escape you, my proof can be extended without any great increase in difficulty to manifolds with an infinitely great dimension-number, provided that their infinitely-many dimensions have the form of a simple infinite sequence. Now it seems to me that all philosophical or mathematical deductions that use that erroneous presupposition are inadmissible. Rather the difference that obtains between structures of different dimension-number must be sought in quite other terms than in the number of independent coordinates—the number that was hitherto held to be characteristic. 5. “JE LE VOIS . . . ” So now Dedekind had a lot to digest. The interleaving argument is not problematic in this case, and the existence of a bijection between the rationals and the increasing sequence ηk had been established in 1872. But there were at least two sticky points in Cantor’s letter. First, there is the matter of what kind of “successive application” of the lemma Cantor had in mind. Whatever it was, it would seem to involve constructing a bijection by “putting together” an infinite number of functions. One can easily get in trouble. For example, here is an alternative reading of what Cantor had in mind. Instead of applying the lemma to the interval (η1 , η2 ), we could apply it to (0, η1 ) to put it into bijection with (0, η1 ]. So now we have “put η1 back in” and we have a bijection between [0, 1] − {η1 , η2 , η3 , . . . } and [0, 1] − {η2 , η3 , . . . }. Now repeat: use the lemma on (0, η2 ) to make a bijection to (0, η2 ]. So we have “put η2 back in.” If we keep doing that, we presumably get a bijection from (0, 1) minus the ηk to all of (0, 1). But do we? What is the image of, say, 13 η1 ? It is not fixed under any of our functions. To determine its image in [0, 1], we would need to compose infinitely many functions, and it’s not clear how to do that. If we manage to do it with some kind of limiting process, then it is no longer clear that the overall function is a bijection. The interpretation Cantor probably intended (and later stated explicitly) yields a workable argument because the domains of the functions are disjoint, so it is clear where to map any given point. But since Cantor did not indicate his argument in this letter, one can imagine Dedekind hesitating. In any case, at this point in history the idea of constructing a function out of infinitely many pieces would have been both new and worrying. The second sticky point was Cantor’s “application” of his theorem to undermine the foundations of geometry. This is, of course, the sort of thing one has to be careful about. And it is clear, from Dedekind’s eventual response to Cantor, that it concerned him. Dedekind took longer than usual to respond. Having already given one wrong proof, Cantor was anxious to hear a “yes” from Dedekind, and so he wrote again on June 29: 3 The original reads “. . . bis ich vor ganz kurzer Zeit durch ziemlich verwickelte Gedankereihen zu der Ue¨ berzeugung gelangte, dass jene Frage ohne all Einschränkgung zu bejahen ist.” Note Cantor’s Uberzeugung— conviction, belief, certainty.

206


Please excuse my zeal for the subject if I make so many demands upon your kindness and patience; the communications which I lately sent you are even for me so unexpected, so new, that I can have no peace of mind until I obtain from you, honoured friend, a decision about their correctness. So long as you have not agreed with me, I can only say: je le vois, mais je ne le crois pas. And so I ask you to send me a postcard and let me know when you expect to have examined the matter, and whether I can count on an answer to my quite demanding request. So here is the phrase. The letter is, of course, in German, but the famous “I see it, but I don’t believe it” is in French.4 Seen in its context, the issue is clearly not that Cantor was finding it hard to believe his result. He was confident enough about that to think he had rocked the foundations of the geometry of manifolds. Rather, he felt a need for confirmation that his proof was correct. It was his argument that he saw but had trouble believing. This is confirmed by the rest of the letter, in which Cantor spelled out in detail the most troublesome step, namely, how to “successively apply” his lemma to construct the final bijection. So the famous phrase does not really provide an example of a mathematician having trouble believing a theorem even though he had proved it. Cantor, in fact, seems to have been confident [überzeugt!] that his theorem was true, as he himself says. He had in hand at least two arguments for it: the first argument, using the decimal expansion, required supplementation by a proof of the Schroeder-Bernstein theorem, but Cantor was quite sure that this would eventually be proved. The second argument was correct, he thought, but its complicated structure might have allowed something to slip by him. He knew that his theorem was a radically new and surprising result—it would certainly surprise others!—and thus it was necessary that the proof be as solid as possible. The earlier error had given Cantor reason to worry about the correctness of his argument, leaving Cantor in need of his friend’s confirmation before he would trust the proof. Cantor was, in fact, in a position much like that of a student who has proposed an argument, but who knows that a proof is an argument that convinces his teacher. Though no longer a student, he knows that a proof is an argument that will convince others, and that in Dedekind he had the perfect person to find an error if one were there. So he saw, but until his friend’s confirmation he did not believe. 6. WHAT CAME NEXT. So why did Dedekind take so long to reply? From the evidence of his next letter, dated July 2, it was not because he had difficulty with the proof. His concern, rather, was Cantor’s challenge to the foundations of geometry. The letter opens with a sentence clearly intended to allay Cantor’s fears: “I have examined your proof once more, and I have discovered no gap in it; I am quite certain that your interesting theorem is correct, and I congratulate you on it.” But Dedekind did not accept the consequences Cantor seemed to find: However, as I already indicated in the postcard, I should like to make a remark that counts against the conclusions concerning the concept of a manifold of ρ dimensions that you append in your letter of 25 June to the communication and the proof of the theorem. Your words make it appear—my interpretation may be incorrect—as though on the basis of your theorem you wish to cast doubt on the meaning or the importance of this concept . . . 4 I don’t know whether this is because of the rhyme vois/crois, or because of the well-known phrase “voir, c’est croire,” or for some other reason. I do not believe the phrase was already proverbial.

March 2011]


207

Against this, I declare (despite your theorem, or rather in consequence of reflections that it stimulated) my conviction or my faith (I have not yet had time even to make an attempt at a proof) that the dimension-number of a continuous manifold remains its first and most important invariant, and I must defend all previous writers on the subject . . . For all authors have clearly made the tacit, completely natural presupposition that in a new determination of the points of a continuous manifold by new coordinates, these coordinates should also (in general) be continuous functions of the old coordinates . . . Dedekind pointed out that, in order to establish his correspondence, Cantor had been “compelled to admit a frightful, dizzying discontinuity in the correspondence, which dissolves everything to atoms, so that every continuously connected part of one domain appears in its image as thoroughly decomposed and discontinuous.” He then set out a new conjecture that spawned a whole research program: . . . for the time being I believe the following theorem: “If it is possible to establish a reciprocal, one-to-one, and complete correspondence between the points of a continuous manifold A of a dimensions and the points of a continuous manifold B of b dimensions, then this correspondence itself, if a and b are unequal, is necessarily utterly discontinuous.” In his next letter, Cantor claimed that this was indeed his point: where Riemann and others had casually spoken of a space that requires n coordinates as if that number was known to be invariant, he felt that this invariance required proof. “Far from wishing to turn my result against the article of faith of the theory of manifolds, I rather wish to use it to secure its theorems,” he wrote. The required theorem turned out to be true, indeed, but proving it took much longer than either Cantor or Dedekind could have guessed: it was finally proved by Brouwer in 1910. The long and convoluted story of that proof can be found in [3], [11], and [12]. Finally, one should point out that it was only some three months later that Cantor found what most modern mathematicians consider the “obvious” way to prove that there is a bijection between the interval minus a countable set and the whole interval. In a letter √ dated October 23, 1877, he took an enumeration φν of the rationals and let ην = 2/2ν . Then he constructed a map from [0, 1] sending ην to η2ν−1 , φν to η2ν , and every other point h to itself, thus getting a bijection between [0, 1] and the irrational numbers between 0 and 1. 7. MATHEMATICS AS CONVERSATION. Is the real story more interesting than the story of Cantor’s surprise? Perhaps it is, since it highlights the social dynamic that underlies mathematical work. It does not render the theorem any less surprising, but shifts the focus from the result itself to its proof. The record of the extended mathematical conversation between Cantor and Dedekind reminds us of the importance of such interaction in the development of mathematics. A mathematical proof is, after all, a kind of challenge thrown at an idealized opponent, a skeptical adversary that is reluctant to be convinced. Often, this adversary is actually a colleague or collaborator, the first reader and first critic. A proof is not a proof until some reader, preferably a competent one, says it is. Until then we may see, but we should not believe. REFERENCES 1. K.-R. Biermann, Dedekind, in Dictionary of Scientific Biography, C. C. Gillispie, ed., Scribners, New York, 1970–1981.

208


2. W. Byers, How Mathematicians Think: Using Ambiguity, Contradiction, and Paradox to Create Mathematics, Princeton University Press, Princeton, 2007. 3. J. W. Dauben, The invariance of dimension: Problems in the early development of set theory and topology, Historia Math. 2 (1975) 273–288. doi:10.1016/0315-0860(75)90066-X 4. , Georg Cantor: His Mathematics and Philosophy of the Infinite, Princeton University Press, Princeton, 1990. 5. R. Dedekind, Essays in the Theory of Numbers (trans. W. W. Beman), Dover, Mineola, NY, 1963. 6. , Gesammelte Mathematische Werke, R. Fricke, E. Noether, and O. Ore, eds., Chelsea, New York, 1969. 7. , Stetigkeit und Irrationalzahlen, 1872, in Gesammelte Mathematische Werke, vol. 3, item L, R. Fricke, E. Noether, and O. Ore, eds., Chelsea, New York, 1969. 8. W. Ewald, From Kant to Hilbert: A Source Book in the Foundations of Mathematics, Oxford University Press, Oxford, 1996. 9. R. Gray, Georg Cantor and transcendental numbers, Amer. Math. Monthly 101 (1994) 819–832. doi: 10.2307/2975129 10. J. Hadamard, An Essay on the Psychology of Invention in the Mathematical Field, Princeton University Press, Princeton, 1945. 11. D. M. Johnson, The problem of the invariance of dimension in the growth of modern topology I, Arch. Hist. Exact Sci. 20 (1979) 97–188. doi:10.1007/BF00327627 12. , The problem of the invariance of dimension in the growth of modern topology II, Arch. Hist. Exact Sci. 25 (1981) 85–267. doi:10.1007/BF02116242 13. H. Meschkowski. Cantor, in Dictionary of Scientific Biography, C. C. Gillispie, ed., Scribners, New York, 1970–1981. 14. E. Noether und J. Cavaillès, Briefwechsel Cantor–Dedekind, Hermann, Paris, 1937. ˆ is Carter Professor of Mathematics at Colby College in Waterville, ME. He is FERNANDO Q. GOUVEA the author, with William P. Berlinghoff, of Math through the Ages: A Gentle History for Teachers and Others. This article was born when he was writing the chapter in that book called “Beyond Counting.” So it’s Bill’s fault. Department of Mathematics and Statistics, Colby College, Waterville, ME 04901 [email protected]

March 2011]


209

On Legendre’s Work on the Law of Quadratic Reciprocity Steven H. Weintraub

Abstract. Legendre was the first to state the law of quadratic reciprocity in the form in which we know it and he was able to prove it in some but not all cases, with the first complete proof being given by Gauss. In this paper we trace the evolution of Legendre’s work on quadratic reciprocity in his four great works on number theory.

As is well known, Adrien-Marie Legendre (1752–1833) was the first to state the law of quadratic reciprocity in the form in which we know it (though an equivalent result had earlier been conjectured by Euler), and he was able to prove it in some but not all cases, with the first complete proof being given by Gauss [3]. In this paper we trace the evolution of Legendre’s work on quadratic reciprocity in his four great works [10, 11, 12, 13] on number theory. These works span a 45 year period in Legendre’s life, dating from 1785, 1797, 1808, and 1830 respectively. Before beginning with our analysis here, we call the reader’s attention to several other relevant works. [15] overlaps with our work here, though it has a somewhat different focus. [2] is a brief survey, and [14] is an extended treatment of the early history of reciprocity laws. A highly readable account of the development of number theory around this era can be found in [9], which has excerpts from original works of Euler, Legendre, Gauss, and others, translated into English. In this paper, we will use Legendre’s language to the extent possible. In particular, we will not use the terms quadratic residue/nonresidue or the notion of congruence in the body of this article, as these were never used by Legendre. We begin with Legendre’s 1785 paper [10]. In Article I of that paper he proves a result due originally to Euler: Theorem A. Let c be an odd prime and let d be any integer not divisible by c. Then d c−1 − 1 is divisible by c. Furthermore, c divides the formula x 2 + dy 2 (i.e., there are integers x and y not divisible by c with x 2 + dy 2 divisible by c) if and only if (−d)(c−1)/2 leaves a remainder of 1 when divided by c; otherwise (−d)(c−1)/2 leaves a remainder of −1 when divided by c. If −c/2 < d < c/2, each possibility occurs for (c − 1)/2 values of d.1 We follow Legendre’s notation throughout this paper and let a and A be distinct primes of the form 4x + 1 and b and B be distinct primes of the form 4x − 1 (or 4x + 3; Legendre used both but preferred 4x − 1). In Article IV he rewrites the above conclusions as (−d)(c−1)/2 = 1 or (−d)(c−1)/2 = −1, with the convention here, as he explicitly states elsewhere, that this is true “omitting multiples of c.” With this convention he then states the following 8 theorems: doi:10.4169/amer.math.monthly.118.03.210 that “c divides the formula x 2 + dy 2 ” if and only if −d is a quadratic residue (mod c). To see this, first suppose that −d is a quadratic residue (mod c), and let x be an integer with x 2 ≡ −d (mod c). Then x 2 + d12 ≡ 0 (mod c). Conversely, if x 2 + dy 2 ≡ 0 (mod c), let y be an integer with y y ≡ 1 (mod c). Then x 2 y 2 + d ≡ 0 (mod c), so −d ≡ (x y)2 (mod c). 1 Note

210


Theorem I. If b(a−1)/2 = 1, then a (b−1)/2 = 1. Theorem II. If a (b−1)/2 = −1, then b(a−1)/2 = −1. Theorem III. If a (A−1)/2 = 1, then A(a−1)/2 = 1. Theorem IV. If a (A−1)/2 = −1, then A(a−1)/2 = −1. Theorem V. If a (b−1)/2 = 1, then b(a−1)/2 = 1. Theorem VI. If b(a−1)/2 = −1, then a (b−1)/2 = −1. Theorem VII. If b(B−1)/2 = 1, then B (b−1)/2 = −1. Theorem VIII. If b(B−1)/2 = −1, then B (b−1)/2 = −1. Note that Theorems I and II are equivalent (being contrapositives of each other) as are Theorems III and IV, and Theorems V and VI, leaving five essentially different cases. Legendre makes this observation in the course of his proofs. Legendre then proceeds to (attempt to) prove these theorems. As he observes, he is successful in unconditionally proving Theorems I, II, and VII, but for the remaining cases his proof is conditional on an auxiliary hypothesis that he cannot prove. We state this as Hypothesis A below. His key tool in these (partial) proofs is the following result of his from Article III: Theorem B. Let r , s, and t be squarefree, pairwise relatively prime positive integers. Then the equation r x 2 + sy 2 = t z 2 has a nonzero solution in integers if and only if there are integers λ, µ, and ν such that r λ2 + s , t

tµ2 − s , r

tν 2 − r s

are all integers. Note in particular that in this theorem r , s, and t are not required to be prime. However, in his (partial) proofs of Theorems I–VIII he uses only the cases where r , s, and t are primes or are equal to 1. Now we come to Legendre’s 1797 book [11]. In this work we see two major changes from his previous work. The first change is the introduction of what we now call the Legendre symbol mn . Here n is a prime and m is an arbitrary integer not divisible by n. From Theorem A he knows that m (n−1)/2 leaves a remainder of ±1 when divided by n, and then he sets m = 1 or −1 as this remainder is 1 or −1. Legendre occasionally uses this symbol n when n = 1 as well, in which case he sets mn = 1 for every nonzero integer m. In the opinion of the author, this is more than a notational convenience. In introducing this notion, Legendre reifies this concept, and makes it into an object of independent study. This line of thought later led to the Jacobi symbol and the Hilbert symbol. The second change is the introduction of the term “reciprocity.” We have in this work [11, par. (164)], a paragraph entitled: March 2011]

LEGENDRE’S WORK ON QUADRATIC RECIPROCITY

211

Théorême contenant une loi de réciprocité qui existe entre deux nombres premiers quelconques (Theorem containing a law of reciprocity that exists between two arbitrary prime numbers). Legendre begins this paragraph by letting m and n be odd primes, and with this implicit assumption he states the theorem: For arbitrary prime numbers m and n, if they are not both of the form 4x − 1, then mn = mn , and if they are both of the form 4x − 1, then mn = − mn . These two general cases are contained in the formula n m m−1 n−1 = (−1) 2 · 2 . m n Again, in the opinion of the author, this general heading reflects not only an appreciation of the importance of this overall result, but a conception of it as a single result (rather than a collection of eight results), and a conception of it as reflecting a relationship between distinct odd primes. Legendre begins his proof by observing that, in case r , s, and t are primes or are equal to 1, the conditions in Theorem B can be restated in terms of the Legendre symbol: −r s st rt = 1, = 1, = 1. t r s He then proceeds to develop theproperties of the Legendre symbol. From his defiN −N N nition he immediately derives −N = and = − , as (a − 1)/2 is even, a a b b so (−1)(a−1)/2 = 1, and (b − 1)/2 is odd, so (−1)(b−1)/2 = −1, and also the multiplicativity of the Legendre symbol in general: McN = Mc Nc .2 In order to prove reciprocity he must divide the proof into the same eight cases as he did in 1785. (We warn the reader who wishes to consult the works of Legendre that he changes the numbering of these eight cases from 1785 to 1797, leaves it alone from 1797 to 1808, but changes it again from 1808 to 1830. In this paper we use the 1785 numbering throughout.) His proof is essentially (with one exception, noted below) the same as that in 1785, with some simplification due to the use of the Legendre symbol. He uses the same auxiliary hypothesis, Hypothesis A, as in 1785, which we now state (we have delayed stating it until now as it is much easier to state using the Legendre symbol). Hypothesis A. (a) For any a and A, there exists b with (b) For any a and b, there exists A with (c) For any b and B, there exists a with

b = 1 and Ab a A = −1 and a a = −1 and b

= −1. = −1.

A b a B

= −1.

The only essential difference between the proofs in 1785 and 1797 is that in 1797 Legendre gives two proofs of cases I and II, the first of which is the same as his 1785 proof. 2 Nowadays it is common to define the Legendre symbol m by m = 1 if m is a quadratic residue (mod n) n n and mn = −1 if m is a quadratic nonresidue (mod n), but this was not Legendre’s definition. Of course, the

modern definition and Legendre’s definition are equivalent, by Euler’s theorem. Note that the multiplicativity of the Legendre symbol is immediate from Legendre’s definition, but takes some work to obtain from the modern definition.

212


Now we come to Legendre’s 1808 book [12]. First we see that Legendre has isolated the multiplicativity of the Legendre symbol as an important property and instead of proving it inter alia as part of the proof of reciprocity, he proves this property in [12, par. (135)], the paragraph in which he first introduces the Legendre symbol. The major difference, however, is that Legendre has changed his proof of reciprocity. His proof here of cases I and II is the same as his second proof of these cases in 1797. As for the remaining cases, he has a new approach, based on his earlier results on the possibility of solving the equations M x 2 − N y 2 = ±1 for M and N primes. Using these results, he treats cases VII and VIII together, establishing them unconditionally. As for the remaining cases, his proof depends on a different auxiliary hypothesis, Hypothesis B, than he had previously used. His auxiliary hypothesis here is: Hypothesis B. For any a, there exists b with ab = −1. Of course, by this time Gauss had proved the law of quadratic reciprocity in general. Gauss gave two proofs in [3] and a third in [4]. (The author admits to not being able to read Latin, and he consulted [3, 4, 5, 6] in the German translation [7].) In [12, par. (381)], Legendre gives Gauss’s third proof as well. In Legendre’s two-volume 1830 book [13] we again find his attempt to prove quadratic reciprocity. Here his proof is the same as in 1808, with the same dependence on Hypothesis B in some of the cases. He again gives Gauss’s third proof, but he also gives a proof in [13, par. (679)] that is a modification of a proof due to Jacobi [8]. Jacobi’s proof is in part a simplification of Gauss’s sixth proof [6]. Jacobi begins with P d what we now call the Gauss sum P = c−1 that d=1 c exp(2πid/c). It is easy to show √ √ P 2 = (−1)(c−1)/2 c, so P = ± c if c is a prime of the form 4x + 1 and P = ±i c if c is a prime of the form 4x − 1. It is a celebrated theorem of Gauss [5] that the sign is always positive, and in that work Gauss used this fact to provide his fourth proof of quadratic reciprocity. Jacobi used the value of P. Legendre modified Jacobi’s proof to use only the value of P 2 so that the difficult sign question could be bypassed. In fact, the observation that quadratic reciprocity can be proved using only the value of P 2 and not the value of P goes back to Gauss in his sixth proof [6]. Legendre describes the proof he gives as the simplest of all known proofs of quadratic reciprocity. Legendre realized full well that his own two proofs of quadratic reciprocity were incomplete. The change from his 1785/1797 proof to his 1808/1830 proof enabled him to prove an additional case of reciprocity unconditionally. But this new proof also involved a change of the auxiliary hypothesis on which the proof of the remaining cases depended. Evidently, since he replaced Hypothesis A by Hypothesis B in his later works, he regarded that as progress. In 1808 in [12, par. (169)], the first time Hypothesis B appears, he observes that a, being of the form 4x + 1, is necessarily of the form 8x + 1 or 8x + 5, and he can verify this hypothesis in the 8x + 5 case, leaving the 8x + 1 case open. He further observes that that case splits into the two cases 24x + 1 and 24x + 17, and he can verify this hypothesis in the 24x + 17 case, choosing b = 3, leaving the 24x + 1 case open. In 1830 in [13, par. (171)] he observes that in addition he has verified this hypothesis for each of the fifteen primes a of the form 24x + 1 with a ≤ 1009. He verifies this by considering the remainders when a is divided by 168 or 264, where he chooses b = 7 or b = 11 respectively. In fact, Legendre’s earlier Hypothesis A is a consequence of Dirichlet’s theorem on primes in an arithmetic progression, although, in the author’s opinion, this is certainly overkill. Ironically, there is no known proof of his Hypothesis B that does not use quadratic reciprocity. Thus, what seemed to him to be an advance seems to us March 2011]


213

nowadays to be a step backwards. (This has been observed earlier; see for example [14, 15].) We now turn to the “supplements” to the law of quadratic reciprocity. The first of −1 = 1 if c is a prime of the form 4x + 1 and = −1 if c is a prime these is that −1 c c of the form 4x + 3. This was known to Fermat and has already been remarked upon above. The second of these is that 2c = 1 if c is a prime of the form 8x + 1 or 8x + 7 and 2c = −1 if c is a prime of the form 8x + 3 or 8x + 5. Legendre derives this in an elementary way in 1797 in [11, par. (148)] from the following theorem: Theorem C. (a) An odd prime c is of the form y 2 + z 2 if and only if c is of the form 8x + 1 or 8x + 5. (b) An odd prime c is of the form y 2 + 2z 2 if and only if c is of the form 8x + 1 or 8x + 3. (c) An odd prime c is of the form y 2 − 2z 2 if and only if c is of the form 8x + 1 or 8x + 7. Legendre credits the discovery of all parts of this theorem to Fermat, with the first proofs of parts (a) and (b) due to Euler and the first proof of (c) due to Lagrange. This derivation of the value of 2c is unchanged in 1808/1830 in [12, 13], although in those works Legendre investigates the equations M x 2 − N y 2 = ±2, and some but 2 not all cases of c follow more simply from those investigations, as he notes. (The derivation of the value of 2c was much simplified by Gauss. In [3] he proves this in an elementary way and in [4] he gives a second proof, using “Gauss’s lemma,” as part of his third proof of reciprocity. Both of these proofs were well known in the 19th century, appearing in the famous textbook [1], written by Dirichlet but with revisions and supplements due to Dedekind, which gave (a slight reformulation of) Gauss’s third proof of reciprocity as well. But only the last of these proofs is well known today, when it has become the standard proof of the value of 2c .) We have concentrated here on Legendre’s work, but we would like to make a few more historical remarks. We have mentioned that Euler stated a conjecture equivalent to the law of quadratic reciprocity (though it takes a bit of work to see that), but Euler’s statement seems not to have had any influence on either Legendre or Gauss. Euler coined the terms quadratic residue and nonresidue, which were not used by Legendre but were used by Gauss. Legendre coined the term quadratic reciprocity, but this was never used by Gauss, who always referred to this result as the “fundamental theorem (in the theory of quadratic residues),” nor did Gauss ever use the Legendre symbol in any of his works on the subject. In the introduction to [3] Gauss writes that his work there had been done without knowledge of prior results in the subject. He also writes there that in the meanwhile, the “excellent” work [11] of the “highly deserving” Legendre appeared, but that he did not rewrite [3] to take it into account, only adding a few additional remarks in the Appendix. He makes some historical remarks in [3, par. (151)] immediately after he proves the fundamental theorem, in which he comments on the efforts of Euler and Legendre. In particular, Gauss credits Legendre for having arrived at that theorem in [10], without having been able to completely prove it (as Legendre himself admitted there), and then claims (quite fairly, in the opinion of the author) that his own proof is the first proof. In the first paragraph of [4] Gauss writes that in number theory it is often easy to inductively arrive at results whose proofs lie very deep, or even which defy proof. 214


In the second paragraph of that work Gauss credits Legendre with being the first to discover the fundamental theorem, although he could not prove it, but that he himself arrived at it independently in 1795 and was able to prove it only after a year’s effort. Legendre evidently felt that Gauss never gave him due credit for the law of quadratic reciprocity, nor for the discovery of the method of least squares, which they both arrived at as well. See [16] for a bitter commentary that Legendre wrote about Gauss. To quote excerpts from [16], in the translation given there, “. . . the author [Legendre] spoke of his method of least squares: it is enough to recall that he published it for the first time, in 1805 . . . However, as a very celebrated geometer [Gauss] has not hesitated to appropriate this method to himself in a work printed in 1809 . . . In addition, we would have willingly spoken of the episode of 1809 as of a totally new and different kind, if we had not found that in 1801 the same geometer made another attempt of this type, in a way this earlier behavior was even more imperfect . . . ” While we may feel that there is enough glory in the development of number theory to go around, that feeling was apparently not shared by all the protagonists. Finally, one may ask why Legendre continued to include his own incomplete proof of the law of quadratic reciprocity in [12] and [13]. While this question cannot be definitively answered without reading Legendre’s mind, the author speculates that there were two reasons. First, Legendre hoped that a (simple) proof of his auxiliary hypothesis would be found, thus vindicating his approach, and second, to have omitted his proof would have been to weaken his claim of priority for this theorem, and that is something he was certainly most unwilling to do. ACKNOWLEDGMENTS. This work was done while the author was on sabbatical leave at the Mathematics Institute of the University of Göttingen, which he would like to thank for its hospitality. He notes with particular pleasure that the copy of [11] he consulted in Göttingen was the copy from Gauss’s personal library. Preparation of this manuscript for publication was supported by a Faculty Research Grant from Lehigh University.

REFERENCES 1. P. G. L. Dirichlet, Vorlesungen u¨ ber Zahlentheorie, Chelsea, New York, 1968; reprint of the 4th edition, Braunschweig, 1893. 2. G. Frei, The reciprocity law from Euler to Eisenstein, in The Intersection of History and Mathematics, Science Networks Historical Studies, vol. 15, S. Chikara, S. Mitsuo, and J. W. Dauben, eds., BirkhäuserVerlag, Basel, 1994, 67–90. 3. C.-F. Gauss, Disquisitiones Arithmeticae, privately printed, Leipzig, 1801. 4. , Theorematis arithmetici demonstratio nova, Commentationes soc. reg. sc. Gottingensis XVI, 1808. 5. , Summatio quarumdum serium singularium, Commentationes soc. reg. sc. Gottingensis recentiores I, 1811. 6. , Theorematis fundamentalis in doctrina de residuis quadraticis demonstrationes et ampliationes novae, Commentationes soc. reg. sc. Gottingensis recentiores IV, 1818. 7. , Untersuchungen u¨ ber höhere Arithmetik (trans. H. Maser), American Mathematical Society/ Chelsea, Providence 2006. 8. C. G. J. Jacobi, Letter to Legendre of 5 August 1827, in Collected Works, vol. I, C. W. Borchardt, ed., Chelsea, New York, 1969, 390–396; reprint of the original edition, G. Reimer, Berlin, 1881. 9. A. Knoebel, R. Laubenbacher, J. Lodder, and D. Pengelley, Mathematical Masterpieces, Further Chronicles by the Explorers, Springer, New York, 2007. 10. A. M. Le Gendre, Recherches d’analyse indéterminée, Histoire de l’Académie royale des sciences avec les Mémoires de Mathématique et de Physique pour la même Année, 1785, 465–559; also available at http://gallica.bnf.fr. 11. A. M. Legendre, Essai sur la Théorie des Nombres, Paris, 1797. 12. , Essai sur la Théorie des Nombres, Paris, 1808.

March 2011]


215

13.

, Théorie des Nombres, Tomes I et II, Paris 1830. Also available in German translation in Zahlentheorie von Adrien-Marie Legendre (trans. H. Maser), Michigan Historical Reprint Series, University of Michigan Library. 14. F. Lemmermeyer, Reciprocity Laws: From Euler to Eisenstein, Springer, Berlin, 2000. ¨ 15. H. Pieper, Uber Legendres Versuche, das quadratische Reziprozitätsgesetz zu beweisen, Acta Hist. Leopold. 27 (1997) 223–237. 16. S. M. Stigler, An attack on Gauss, published by Legendre in 1820, Historia Math. 4 (1977) 31–35. doi:10.1016/0315-0860(77)90032-5

STEVEN H. WEINTRAUB is Professor of Mathematics at Lehigh University. His research spans a range of areas in algebra, geometry, and topology, although lately he has become interested in the history of mathematics as well. When he is not thinking about mathematics he can often be found reading mystery novels or flying his airplane (but not doing both simultaneously). Department of Mathematics, Lehigh University, Bethlehem, PA 18015 [email protected]

A Rational Function Without a Rational Antiderivative We give a simple proof, based on the mean value theorem of calculus, that the antiderivative of 1/(1 + x 2 ) is not rational. Assume on the contrary that R(x) =

p0 x n + · · · + pn P(x) = Q(x) q0 x m + · · · + qm

is such that R 0 (x) =

1 , 1 + x2

with p0 , q0 6= 0. Observe that R must be strictly increasing. Next, using the mean value theorem, we see that, if x > 0, there exists y between x and 2x such that R(2x) − R(x) = R 0 (y)x ≤

x . 1 + x2

Hence R(2x) − R(x) =

p0 q0 (2n − 2m )x n+m + · · · →0 q02 2m x 2m + · · ·

as x → +∞, and so n ≤ m. Therefore R(x) approaches the same finite limit as x → −∞ and x → +∞ ( p0 /q0 if n = m, and 0 if n < m), contradicting the fact that it is strictly increasing. —Submitted by André Giroux, Université de Montréal

216


Equimodular Polynomials and the Tritangency Theorems of Euler, Feuerbach, and Guinand Alexander Ryba and Joseph Stern Abstract. We strengthen a result of Lehmer, obtaining a new necessary condition for the roots of a complex polynomial to have equal modulus. From this we derive the famous theorem of Feuerbach, as well as the less well-known theorems of Euler and Guinand on the tritangent centers of a triangle. The latter theorems constrain the possible locations of the incenter and excenters subject to fixed locations for the circumcenter and orthocenter.

1. INTRODUCTION. As early as 1765, Euler knew that if a triangle is varied while its circumcenter and orthocenter remain fixed, then its incenter will stay within a bounded region. Euler [3] characterized the region by a pair of simple inequalities, but did not comment on its shape. Remarkably, the shape of the region was not known until 1984, when Guinand [5] proved it to be a punctured disc. Guinand’s method extended nicely to the excenters, and he was able to show that each excenter ranges over the common exterior of Euler’s punctured disc and a certain simple closed curve. We refer to the boundary curves of these regions as the Euler shield and Guinand shield.1 There are now several proofs of Euler’s theorem (see, e.g., [8, 9, 10, 12, 13]). Our approach is motivated by [10], which derives the theorem from the fact that a certain cubic polynomial has complex roots of equal modulus. Following Lehmer, we call a polynomial equimodular if all its roots have the same nonzero modulus. In [6], Lehmer gives a necessary condition for equimodularity. We strengthen this condition and apply the result to a special family of cubics, obtaining new proofs of the theorems of Euler and Guinand. Just as Guinand explained the geometric meaning of the Euler shield, our proof introduces a synthetic construction of the Guinand shield, shedding light on its geometric meaning. We are also able to extract a short proof of Feuerbach’s theorem. The three theorems—Euler, Feuerbach, Guinand—we call “E-F-G”. 2. THE THEOREMS E-F-G. Euler’s 1765 article [3] marks an important milestone in triangle geometry. In it, he introduces the line now referred to as the Euler line, as well as his formula for the distance between the circumcenter and the incenter of a triangle. These points are two of the four classical triangle centers, illustrated in Figure 1. The circumcenter O is the center of the circumscribed circle, as well as the intersection point of the perpendicular bisectors of the sides.2 The incenter I is the center of the inscribed circle or incircle, and also the intersection point of the angle bisectors. The remaining classical centers are the orthocenter H , where the altitudes meet, and the centroid G, where the medians meet. In a nonequilateral triangle, the points O, G, and H lie on the Euler line in the order O-G-H , with GH = 2 · OG. A fifth triangle center, apparently unknown to Euler, is the nine-point center N . This is the doi:10.4169/amer.math.monthly.118.03.217 a suggestion of John Conway. 2 Our notation and terminology are consistent with those of [2]. 1 After

March 2011]

EQUIMODULAR POLYNOMIALS AND TRITANGENCY THEOREMS

217

A

O

G

I N H B

D

A

C

Figure 1. The classical triangle centers and the Euler segment OH.

center of the nine-point circle, which passes through the midpoints of the three sides and the feet of the three altitudes. It turns out that N is also the midpoint of OH, so that OG : GN : NH = 2 : 1 : 3. The nine-point circle is often attributed to Euler, but no evidence has been found to support this attribution. According to [7], there are precedents of the nine-point circle theorem dating as far back as the 1804 work of Benjamin Bevan, but it is first explicitly described in the 1821 article [1] of Brianchon and Poncelet. The term “nine-point circle” was coined in 1842 by O. Terquem [11]. Although the 1765 paper is historically important for its introduction of the Euler line, Euler states that his primary aim is to compute the sides of a triangle in terms of its central distances, OH, OI, and IH. He does this by showing that for a nonequilateral triangle, the central distances determine the coefficients of a real cubic whose roots are exactly the sides a, b, c. This leads to a pair of necessary conditions on OH, OI, IH, which come from the fact that the cubic must have three positive real roots. Guinand [5] shows that Euler’s necessary conditions are also sufficient to guarantee the existence of a triangle with pre-specified central distances. This is now called Euler’s theorem. In order to formulate Euler’s theorem, we begin by defining two functions N (O, H ) = 21 (O + H )

and

G(O, H ) = 13 (2O + H ),

where O and H are any two points. In the archetypal case when O and H are the circumcenter and orthocenter of a triangle, N (O, H ) and G(O, H ) coincide with the nine-point center N and the centroid G. For brevity, we always write N = N (O, H ) and G = G(O, H ). Definition 1. The Euler shield determined by two points O and H , denoted by = (O, H ), is the circle with GH as diameter (Figure 3). The Euler shield can also be described as the locus of a point X such that OX = 2 · NX. This equation defines a circle of Apollonius whose center lies on line ON. Since OG : GN : NH = 2 : 1 : 3, the values X = G and X = H satisfy the equation. As G and H lie on line ON, the circle of Apollonius has GH as a diameter. We may now state Euler’s theorem. Theorem E (Euler, 1765). Three points O, H , and I are the circumcenter, orthocenter, and incenter of a triangle if and only if I is inside the Euler shield and differs from N . 218


Another celebrated theorem of the 1765 paper is the formula3 OI 2 = R(R − 2r ).

(1)

Here R is the circumradius, or radius of the circumscribed circle, while r is the inradius, or radius of the incircle. Equation (1) often appears in conjunction with Feuerbach’s relation NI = 21 R − r,

(2)

which expresses the famous result that the incircle and the nine-point circle are internally tangent. The term 21 R represents the radius of the nine-point circle.4 Both (1) and (2) have analogues in which the incenter is replaced by one of the three excenters. These are the centers of the excircles, which touch one side of 4ABC internally and the other two sides externally (see Figure 2). Each is the intersection of one internal angle bisector with two external angle bisectors. We denote the excenters opposite A, B, C by E a , E b , E c . An arbitrary excenter will be denoted by E, with ρ being the radius of the corresponding excircle. The incircle and excircles are collectively called tritangent circles, and their centers tritangent centers. Theorems about them are called “tritangency theorems”. A

B

C

I N

B

C

A

Ea Figure 2. The excenter E a and A-excircle; Feuerbach’s theorem.

The analogues of formulas (1) and (2) for an excenter are as follows: OE2 = R(R + 2ρ) NE = 12 R + ρ.

(3) (4)

Together, (2) and (4) comprise Feuerbach’s theorem (illustrated in Figure 2): 3 Euler expresses the squared distances between the classical centers in terms of the area 1 and the elementary symmetric functions of the sides a, b, c. His expression for OI 2 is converted to the form (1) by substitution of the well-known formulas 21 = r (a + b + c) and 4R1 = abc. 4 Two circles are internally (externally) tangent if and only if the distance between their centers is equal to the difference (sum) of their radii. The nine-point circle has radius 21 R because it is the circumscribed circle of the midpoint triangle.

March 2011]


219

Theorem F (Feuerbach, 1822). In a nonequilateral triangle, the nine-point circle is internally tangent to the incircle, and externally tangent to the excircles. Guinand sought constraints on the location of an excenter, just as Euler had for the incenter. The region of possible excenters turns out to be the common exterior of and a rational algebraic curve 0, which we call the Guinand shield (see Figure 3). Guinand’s definition of this curve is stated in algebraic rather than geometric language. As we will see, 0 turns out to have a simple geometric characterization. Like that of the Euler shield, it depends only on the points O and H . Definition 2. The Guinand shield determined by two points O and H , denoted by 0 = 0(O, H ), is defined as follows. If X is a point on the Euler shield , let ` X be the line HX when X 6= H , and the tangent line of at H when X = H . For each X ∈ , there are two points L ∈ ` X such that L X = 2 · O X . Each such L has a reflection P across O. Then 0 is the joint locus of the two points P as X varies over . The construction involved in this definition is fully illustrated in Figure 5; Figure 3 shows only the resulting locus. We now state Guinand’s theorem. Theorem G (Guinand, 1984). Three points O, H , and E are the circumcenter, the orthocenter, and one excenter of a triangle if and only if E lies outside both and 0.

OG N

H

Figure 3. The Euler shield and Guinand shield 0.

3. INVERSIVE STABILITY. Euler and Guinand proved their theorems by generating the desired triangle from the roots of real cubic equations. The roots of Euler’s cubic are the sides a, b, c, while those of Guinand’s cubic are cos A, cos B, cos C. Continuing the approach of [10], we generate the vertices A, B, C—regarded as points of C—from a complex cubic, by squaring its three roots. The circumcenter of 4ABC lies at 0 precisely when the associated cubic is equimodular. This observation leads us to study the notion of equimodularity in its own right, and in this section we will develop a useful necessary condition for it. A polynomial f will be called inversively stable if for some r > 0, the set of complex roots of f is invariant under inversion in the circle |z| = r . Every equimodular polynomial is inversively stable with respect to the circle on which its roots lie. The following lemma completely characterizes inversive stability, and therefore provides a necessary condition for equimodularity. 220


Lemma 1. The monic polynomial f (z) = z n + cn−1 z n−1 + · · · + c1 z + c0 is inversively stable with respect to a circle of radius r > 0 if and only if r 2k ck = c0 c¯n−k ,

k = 0, 1, . . . , n.

(5)

The only possible radius for which this may occur is r = |c0 |1/n . In formula (5), the bar denotes complex conjugation. Of course, cn = 1. Proof. Inversion in the circle of radius r is the map z 7 → r 2 /¯z . This transforms f (z) into f (r 2 /¯z ), which has the same set of roots as 2 r c¯1r 2 n−1 c¯2r 4 n−2 zn r 2n = zn + g(z) = · f z + z + ··· + . c¯0 z¯ c¯0 c¯0 c¯0 Since f and g are monic and have the same degree, the inversive stability of f is equivalent to the condition f = g. Equating coefficients and taking conjugates produces (5). When k = n, (5) becomes r 2n = c0 c¯0 = |c0 |2 , or r = |c0 |1/n . Taking moduli in (5) gives the following weaker condition of Lehmer [6]: r k |ck | = r n−k |cn−k |,

k = 0, 1, . . . , n.

Now let each monic polynomial be identified with its vector of coefficients. This puts the natural topology of Cn on the set of monic polynomials of degree n. Whenever we apply topological terms to polynomial spaces, it is this topology that is to be understood. We call a polynomial simple if it has no repeated roots. The following lemma is a direct consequence of the implicit function theorem, and presents a key topological property of simple polynomials. Lemma 2 (Root Continuity Lemma). Let f 0 be a simple polynomial of degree n. There is a neighborhood U of f 0 , and a set of n continuous functions ρ1 , . . . , ρn from U into C, such that the roots of any f ∈ U are exactly ρ1 ( f ), . . . , ρn ( f ). Proof. For each root z 0 of f 0 , the implicit function theorem gives a neighborhood W of f 0 and a continuous function ρ : W → C such that ρ( f 0 ) = z 0 and f (ρ( f )) = 0 for all f ∈ W . The n distinct roots of f 0 give rise to n such “root functions” ρ1 , . . . , ρn , each continuous near f 0 . Let V be a neighborhood of f 0 in which all the ρ j are continuous. Write δ( f ) = min j6=k |ρ j ( f ) − ρk ( f )|. Note that δ is continuous on V , and δ( f 0 ) > 0. Thus δ > 0 in some neighborhood U of f 0 . For each f ∈ U , the numbers ρ1 ( f ), . . . , ρn ( f ) furnish n distinct solutions of f (z) = 0. As deg f = n, these solutions exhaust the roots of f . We prove next that the “equimodular region” in a space of inversively stable simple polynomials is both open and closed, and is therefore a union of connected components. This will later allow us to identify the Euler and Guinand shields with the boundaries of certain equimodular regions. Theorem 1. In any space of simple, inversively stable polynomials, the set of equimodular elements and the set of non-equimodular elements are each open. March 2011]


221

Proof. Let S be a set of simple, inversively stable polynomials with the relative topology inherited from polynomial space. Let M be its equimodular subset, and take f 0 ∈ M. The preceding lemma gives a neighborhood U of f 0 in which the roots ρk ( f ) vary continuously with f . The roots of f 0 are distinct and equimodular, so their arguments must be distinct. Since these arguments are locally continuous functions of the roots themselves, there is a possibly smaller neighborhood V of f 0 in which the arguments of the roots remain distinct. An inversively stable polynomial whose roots have distinct arguments is obviously equimodular. Hence V ⊂ M, and M is open in S. Similarly, each f 0 ∈ S − M has at least two roots of unequal modulus, and by the preceding lemma, so does any f sufficiently close to f 0 . Thus S − M is open in S. 4. FEUERBACH’S THEOREM. In our approach to the E-F-G theorems, points of the plane are represented by complex coordinates. This is suggested by the following theorem of [10], which relates the classical triangle centers to the algebra of C. It assumes a coordinate system whose origin lies at the circumcenter of the given triangle. This puts O = 0, and therefore H = 2N = 3G. Theorem 2. Suppose 4ABC has circumcenter 0. Exactly two of the eight sets of square roots α, β, γ of A, B, C make 4αβγ acute-angled. For either set, H = α2 + β 2 + γ 2

(6)

I = −(βγ + γ α + αβ)

(7)

E a = −βγ + γ α + αβ,

E b = βγ − γ α + αβ,

E c = βγ + γ α − αβ.

(8)

Sketch of proof. Since G = 31 (A + B + C), we have H = 3G = A + B + C, i.e., (6). Let X, Y, Z be the midpoints of arcs BC, CA, AB of the circumscribed circle, as in Figure 4. Simple arc-measure computations show that I is the orthocenter of 4XYZ, so that (6) gives I = X + Y + Z . In addition, it can be shown that X is the midpoint of IEa , so E a = X − Y − Z . Writing A = α 2 , B = β 2 , and C = γ 2 , clearly X = ±βγ , Y = ±γ α, and Z = ±αβ. In [10] it is shown that when 4αβγ is acute-angled, all the ± signs become −, yielding (7) and (8). Y

A = α2 Z

0 I

B = β2

C = γ2 X

Ea Figure 4. Constructions used in the proof of Theorem 2.

222


Equations (6)–(8) yield (α + β + γ )2 = H − 2I and (−α + β + γ )2 = H − 2E, where E = E a is the excenter opposite A. Thus we may select values of the relevant square roots for which √ α + β + γ = − H − 2I ,

√ −α + β + γ = − H − 2E.

Euler’s relations (1) and (3) have several relatively straightforward proofs (see, e.g., [2]). Combining them with our condition (5) yields a short new proof of Feuerbach’s theorem. Proof of Theorem F. Since O = 0, the vertices A, B, C are equimodular. By Theorem 2, the points A, B, C have a set of equimodular square roots α, β, γ such that √ α + β + γ = − H − 2I

and βγ + γ α + αβ = −I.

Let Q = −αβγ , so |Q|2 = |A · B · C| = R 3 . Note that α, β, γ are the roots of f I (z) = z 3 +

√

H − 2I z 2 − I z + Q.

(9)

Since f I is equimodular, the necessary condition (5) applies in the form √ R 2 H − 2I = −Q I¯. Taking moduli and squaring both sides gives R 4 |H − 2I | = R 3 |I |2 . Substituting H = 2N and |I |2 = R(R − 2r ) (formula (1)), we find that |N − I | = 12 R − r . This says that the distance between the centers of the incircle and the nine-point circle equals the difference between their radii. Thus the two circles are internally tangent. As for E = E a , observe that −α, β, γ are the equimodular roots of f E (z) = z 3 +

√

H − 2E z 2 − E z − Q.

(10)

In this case, condition (5) and formula (3) yield |N − E| = 12 R + ρ, implying the external tangency of the nine-point circle and the A-excircle. 5. THE THEOREMS OF EULER AND GUINAND. As in Section 4, we use complex coordinates with O = 0. Fix a point H 6 = 0. Recall that the points O and H determine N and G, as well as the Euler and Guinand shields and 0. We now investigate conditions under which a point P is eligible to be a tritangent center of some triangle with circumcenter O and orthocenter H . Such triangles will be called admissible in what follows. As we will see, the desired conditions turn out to involve and 0. Neither O nor N is eligible to be a tritangent center. An admissible triangle is clearly nonequilateral, so Feuerbach’s theorem implies that N differs from each tritangent center. This means that the right-hand side of (2) is positive, and likewise for the righthand side of (1). Thus O differs from I , and (3) shows immediately that O also differs from E a , E b , and E c . The following lemma constructs a pair of inversively stable cubics indexed by a point P. Either of the cubics associated with a given point will later determine whether the point is a tritangent center for some admissible triangle.

March 2011]


223

Lemma 3. Given a point P ∈ / {0, N }, each of the two square roots of H − 2P determines a unique inversively stable polynomial of the form z3 +

√

H − 2P z 2 − Pz + Q

(Q ∈ C).

(11)

The two polynomials so determined have mutually negative roots. Proof. By Lemma 1, a polynomial of the form (11) is inversively stable if and only if √ |Q|2/3 (−P) = Q H − 2P. Taking moduli allows us to solve for |Q|, and substituting the result back into the original equation gives Q = −

√ P2 P H − 2P. (H − 2P)2

(12)

By hypothesis, H − 2P 6 = 0. Associate the signs ± arbitrarily with the two square roots of H − 2P. It follows from (12) that the two corresponding polynomials f P+ and f P− are uniquely determined by P, and also that f P− (−z) = − f P+ (z). Thus the roots of f P+ and f P− are mutual negatives. It turns out that there is no need to distinguish between f P+ and f P− , and we write f P for either of them. The reason is that f P+ and f P− have the same equimodularity status, and throughout this paper their roots only appear in homogeneous combinations of degree 2, which are invariant under z 7 → −z. We call a polynomial square-simple if the squares of its roots are distinct. Theorem 3. The point P is a tritangent center of some admissible triangle if and only if f P is equimodular and square-simple. Proof. Suppose P is a tritangent center for an admissible triangle 4ABC, and consider α, β, γ , the equimodular square roots of A, B, C supplied by Theorem 2. Formulas (9) and (10) show that the roots of f P are ±α, ±β, ±γ (e.g., −α, β, γ when P = E a ). Thus f P is equimodular, and since A, B, C are clearly distinct, f P is also square-simple. Conversely, suppose f P has equimodular roots a, b, c with distinct squares A, B, C. Since Now a + b + c = √ A, B, C are distinct points of a circle, they form2 a triangle. − H − 2P and bc + ca + ab = −P, so that H = a + b2 + c2 = A + B + C. By (6), 4ABC is admissible. Theorem 2 supplies a possibly different set of square roots α, β, γ of A, B, C satisfying (6)–(8). However, since a = ±α, b = ±β, and c = ±γ , the point P = −(bc + ca + ab) can only be one of −(βγ + γ α + αβ),

−βγ + γ α + αβ,

βγ − γ α + αβ,

βγ + γ α − αβ.

These are precisely the tritangent centers of 4ABC by (7) and (8). Naturally we may ask whether the triangle of Theorem 3 is unique. Suppose P is a tritangent center for each of 4A1 B1 C1 and 4A2 B2 C2 . The preceding proof shows that A1 , B1 , C1 are the squares of the roots of f P , and similarly for A2 , B2 , C2 . Hence the two triangles coincide. It follows that no point can be an incenter for one admissible triangle and an excenter for another. The two triangles would coincide, and no point can be both inside and outside the same triangle. We have thus proved: 224


Corollary 4. The regions of possible incenter locations and possible excenter locations are disjoint. There is also a third region, of points ineligible to be a tritangent center. This follows from Theorem 3, once it is noted that there are nonequimodular cubics f P . Consider for instance P = −2G (conveniently chosen inside 0). Choosing λ ∈ C such that 7λ2 = G, formulas (11) and (12) give f −2G (z) = z 3 + 7λz 2 + 14λ2 z + 8λ3 = (z + λ)(z + 2λ)(z + 4λ). Because f −2G is nonequimodular, −2G is not eligible to be a tritangent center. Theorem 3 suggests that we try to determine those points P for which f P is equimodular but not square-simple. There are two ways for square-simplicity to fail: a pair of equal roots, or a pair of mutually negative roots. The two cases will give rise to two curves, which turn out to be the Euler and Guinand shields. Theorem 5. The polynomial f P is equimodular and has a pair of mutually negative roots if and only if P lies on the Euler shield . Proof. Note that f P is equimodular and has a pair of mutually negative roots if and only if f P (z) = (z − z 0 )(z + z 0 )(z − ωz 0 ), where z 0 , ω ∈ C satisfy z 0 6 = 0 and |ω| = 1. Equating coefficients and writing H = 2N , this is equivalent to 2N − 2P = ω2 z 02 ,

P = z 02 .

(13)

Since z 0 and ω are arbitrary subject to z 0 6 = 0 and |ω| = 1, our two equations are equivalent to the single condition 2|N − P| = |P|. This is exactly the Apollonius condition 2 · NP = OP, which characterizes . Theorem 6. The polynomial f P is equimodular and has two equal roots if and only if P lies on the Guinand shield 0. Proof. Observe that f P is equimodular and has two equal roots if and only if f P (z) = (z − z 0 )2 (z − ωz 0 ), where z 0 , ω ∈ C have z 0 6 = 0 and |ω| = 1. Equating coefficients and putting L = −P, this amounts to H + 2L = z 02 (2 + ω)2 ,

L = z 02 (1 + 2ω).

(14)

Suppose (14) is true, and define X ∈ C by X = z 02 . It follows at once that H = (2 + ω2 )X.

(15)

Thus 2(N − X ) = H − 2X = ω2 X , and therefore 2 · NX = OX, that is, X ∈ . Similarly, from L − X = 2ωX we infer that LX = 2 · OX. Next, notice that H−X (1 + ω2 )X ω + ω¯ = = = Re(ω). L−X 2ωX 2 If X 6= H , then H − X and L − X are related by a real nonzero scale factor, so L must lie on line HX, which is denoted by ` X . If X = H , we have ω = ±i. Thus L − H = L − X = 2ωX = ±2i H , and LH ⊥ OH. This means that L lies on the tangent line of at H , denoted by ` H . Therefore P ∈ 0. March 2011]


225

Conversely, suppose P ∈ 0. Then some X ∈ has L ∈ ` X and LX = 2 · OX. Since |X | = 2|N − X |, it makes sense to define z 0 , ω ∈ C by setting z 02 = X and ω2 = 2(N − X )/ X , as these imply z 0 6 = 0 and |ω| = 1. Equations (14) may be recovered by reorganizing the computations of the preceding paragraph. Incidentally, if we eliminate X between (15) and the second equation of (14), and if we put ω = eit , the following simple parametrization of 0 results: 1 + 2eit P=− H, 0 ≤ t ≤ 2π. 2 + e2it The parametrization confirms that 0 is a continuous closed curve.5 We now establish a minor technical point, that and 0 divide the plane into three connected regions. Of course, Figure 3 makes this completely apparent. Lemma 4. C − 0 − has exactly three connected components. Proof. Consider the reflection of 0 through O, which we call 3 (Figure 5). For each X ∈ , there are two points of ` X at a distance of 2 · O X from X . These are L 1 , L 2 ∈ 3, labeled so that L 1 and H are on the same side of X whenever X 6 = H . (When X = H , the L i are interchangeable.) L = L2 P = P1 X O G N H

L = L1 P = P2 Figure 5. Construction of 3 and 0.

As X varies over , the distances HX and OX are respectively maximized and minimized when X = G. Since HG = 2 · OG, we have 2 · OX ≥ HX, with equality for X = G only. Thus H is strictly between L 1 and L 2 for X 6 = G, and L 1 = H for X = G. This means that every ray from H meets 3 in at most one other point. In particular, H is the only candidate for a self-intersection point of 3. However, as X makes a circuit of , L 2 never passes through H , while L 1 does so only for X = G. Thus in fact 3 is a Jordan curve, and so is 0. Observe next that 0 ∩ = {G}. For suppose P ∈ 0 ∩ . By Theorems 5 and 6, f P must have the form (z − z 0 )2 (z + z 0 ). Equating coefficients gives H − 2P = z 02 = P, so that 3P = H , and P = H/3 = G. 5 It

226

also shows that 0 is a rational curve, since eit is a rational function of u = tan(t/2).


Since − {G} is connected and disjoint from 0, it is either entirely inside 0 or entirely outside. In fact it must be outside 0, since H is. Thus Ext(0) − has two components, while C − 0 − has three. Finally, we will use a continuity argument to show that the three connected regions formed by the Euler and Guinand shields coincide with the regions of incenters, excenters, and non-tritangent points.6 The argument requires a continuous choice of √ H − 2P, and to achieve this we will cut the plane. For each point P, there are two possibilities for f P , one for each square root of H − 2P. The branch point of these square roots is N . If we cut the plane √ along a continuous curve from N to ∞, then in the cut plane there is a choice of H − 2P which varies continuously with P. Formulas (11) and (12) supply a corresponding continuous choice of f P . It will be convenient to cut along the ray κ = NH, because this ray avoids 0. Notice that we can modify κ slightly to obtain a cut κ ∗ that avoids both 0 and any given point P0 ∈ κ − {N }. We simply take a disc about P0 that avoids 0, and deform κ inside this disc so as to avoid P0 . Theorem 7. f P is equimodular and simple if and only if P ∈ Ext(0) − {N }. Proof. The set Cκ = (C − κ) − 0 − {0} has two components, one inside 0 and one outside. Choose the map f : P 7 → f P to be continuous on Cκ . Let M be the equimodular subset of f (Cκ ). By Theorem 1, both M and f (Cκ ) − M are open. The continuity of f implies that both U = f −1 (M) and its complement V = Cκ − U are open. Thus U and V must be the components of Cκ .7 Since −2G lies between −H and G, inside 0, in fact V = Int(0) − {0} and U = Ext(0) − κ. We have now accounted for all points P ∈ / κ. To account for a given point P0 ∈ κ − {N }, the preceding argument may be repeated in the cut plane C − κ ∗ , where κ ∗ is a suitably deformed cut avoiding 0 and P0 . We now know that the region inside 0 contains the non-tritangent points. To complete our proof of the theorems of Euler and Guinand, we need only identify the two regions outside 0. Proof of Theorems E and G. The set Dκ = Ext(0) − − κ has two components, one inside and one outside. (Cutting along the ray NH does not disconnect the interior of , as may be seen in Figure 3.) Again, choose f : P 7 → f P so as to be continuous on Dκ . With each P ∈ Dκ we may associate a triangle 4ABC having one tritangent center at P (Theorems 3, 5, 6, 7). Recall that A, B, C are the squares of the roots of f P . Since P 7→ f P is continuous, the root continuity lemma implies that P 7 → (A, B, C) is continuous. Moreover, P is the incenter of 4ABC if and only if P is inside 4ABC. Consider the partition Dκ = I ∪ E, where I and E are the subsets of possible incenters and possible excenters (which are disjoint by Corollary 4). We claim that I and E are open. Take a point P0 ∈ I. Being an incenter, P0 is inside its triangle 4A0 B0 C0 . By the continuity of P 7 → (A, B, C), there is a neighborhood U of P0 such that each P ∈ U lies inside its triangle 4ABC. Hence each P ∈ U is a possible incenter, and U ⊂ I. This shows that I is open. A similar argument shows that E is open. 6 In

[5], Guinand refers to the region of non-tritangent points as the acentric lacuna. fixed admissible triangle will have some excenter E ∈ / κ, showing that U is nonempty. When κ is later replaced by the deformed cut κ ∗ , the deformation may be taken sufficiently small that E ∈ / κ ∗ , so that U is still nonempty. In either of these cases, −2G ∈ V . 7A

March 2011]


227

Exactly as in the proof of Theorem 7, I and E must be the components of Dκ . Euler’s formulas (1) and (3) imply that the incenter and excenters of any admissible triangle satisfy |E|2 > R 2 > |I |2 . Thus if I contains points of arbitrarily large modulus, so must E. However, Dκ has only one unbounded component. So I must be bounded, and we have I = Int() − κ. Accounting for points of κ − {N } with deformed cuts κ ∗ , we conclude that Int() − {N } is the incenter region and Ext(0) ∩ Ext() is the excenter region. REFERENCES 1. C. J. Brianchon and J.-V. Poncelet, Géométrie des courbes. Recherches sur la détermination d’une hyperbole e´ quilatère, au moyen de quatres conditions données, Annales de Gergonne 11 (1820–1821) 205–220; also available at http://www.numdam.org/item?id=AMPA_1820-1821__11__205_0. 2. H. S. M. Coxeter and S. L. Greitzer, Geometry Revisited, Mathematical Association of America, Washington, DC, 1967. 3. L. Euler, Solutio facilis problematum quorundam geometricorum difficillimorum, Comm. Acad. Sci. Petropol. 11 (1765) 103–123; also in Opera Omnia, A. Speiser, ed., ser. I, vol. 26, no. 325, 139–157; available at Euler Archive, http://math.dartmouth.edu/~euler/. 4. K. W. Feuerbach, Eigenschaften einiger merkwürdigen Punkte des geradlinigen Dreiecks und mehrerer durch sie bestimmten Linien und Figuren: Eine analytisch-trigonometrische Abhandlung, Riegel und Wiesner, Nürnberg, 1822. 5. A. P. Guinand, Euler lines, tritangent centers, and their triangles, Amer. Math. Monthly 91 (1984) 290– 300. doi:10.2307/2322671 6. D. H. Lehmer, The complete root-squaring method, J. Soc. Indust. Appl. Math. 11 (1963) 705–717. doi: 10.1137/0111053 7. J. S. MacKay, History of the nine point circle, Proc. Edinb. Math. Soc. 11 (1892) 19–61. doi:10.1017/ S0013091500031163 8. B. Scimemi, Paper-folding and Euler’s theorem revisited, Forum Geom. 2 (2002) 93–104. 9. G. C. Smith, Statics and the moduli space of triangles, Forum Geom. 5 (2005) 181–190. 10. J. Stern, Euler’s triangle determination problem, Forum Geom. 7 (2007) 1–9; also available at http: //forumgeom.fau.edu/FG2007volume7. 11. O. Terquem, Considérations sur le triangle rectiligne, Nouv. Ann. de Math. 1 (1842) 196–200; also available at http://www.numdam.org/item?id=NAM_1842_1_1__196_1. 12. A. Várilly, Location of incenters and Fermat points in variable triangles, Math. Mag. 74 (2001) 123–129. doi:10.2307/2690626 13. P. Yiu, Conic solution of Euler’s triangle determination problem, J. Geom. Graph. 12 (2008) 75–80. ALEX RYBA received his B.A. and Ph.D. from the University of Cambridge. His main area of interest is finite group theory. After teaching at the University of Illinois at Chicago, the University of Michigan, and Marquette University, he joined the faculty of Queens College CUNY in 1998. One of his first students at Queens College was Joe Stern. Department of Computer Science, Queens College, Flushing NY 11367 [email protected] JOE STERN is a Ph.D. student at Columbia University. After receiving the 1999 British Marshall Scholarship, he joined the faculty of Stuyvesant High School, where he taught until 2009. He is a two-time recipient of the MAA’s Edyth May Sliffe Award for Distinguished High School Teaching. Two of his last students at Stuyvesant were Alex Ryba’s sons, Andrew and Nicholas. Department of Mathematics, Columbia University, New York NY 10027 [email protected]

228


The Sharkovsky Theorem: A Natural Direct Proof Keith Burns and Boris Hasselblatt Abstract. We give a natural and direct proof of a famous result by Sharkovsky that gives a complete description of possible sets of periods for interval maps. The new ingredient is the ˇ use of Stefan sequences.

1. INTRODUCTION. In this note f is a continuous function from an interval into itself. The interval need not be closed or bounded, although this is usually assumed in the literature. The point of view of dynamical systems is to study iterations of f : if f n denotes the n-fold composition of f with itself, then for a given point x one investigates the sequence x, f (x), f 2 (x), f 3 (x), and so on. This sequence is called the f-orbit of x, or just the orbit of x for short. It is particularly interesting when this sequence repeats. In this case we say that x is a periodic point, and we refer to the number of distinct points in the orbit or cycle O := { f n (x) | n = 0, 1, . . . } as the period of x.1 Equivalently, the period of x is the smallest positive integer m such that f m (x) = x. A fixed point is a periodic point of period 1, that is, a point x such that f (x) = x. A periodic point with period m is a fixed point of f m (and of f 2m , f 3m , . . . ). Thus, if f n (x) = x, then the period of x is a factor of n. If f has a periodic point of period m, then m is called a period for (or of ) f . Given a continuous map of an interval one may ask what periods it can have. The genius of Alexander Sharkovsky lay in realizing that there is a structure to the set of periods. 1.1. The Sharkovsky Theorem. The Sharkovsky Theorem involves the following ordering of the set N of positive integers, which is now known as the Sharkovsky ordering: 3 F 5 F 7 F · · · F 2 · 3 F 2 · 5 F 2 · 7 F · · · F 22 · 3 F 22 · 5 F 22 · 7 F · · ·

· · · F 23 F 22 F 2 F 1.

This is a total ordering; we write l F r or r G l whenever l is to the left of r . It is crucial that the Sharkovsky ordering has the following doubling property: l F r if and only if 2l F 2r.

(1)

This is because the odd numbers greater than 1 appear at the left end of the list, the number 1 appears at the right end, and the rest of N is included by successively doubling these end pieces, and inserting these doubled strings inward: odds, 2 · odds, 22 · odds, 23 · odds, . . . , 23 · 1, 22 · 1, 2 · 1, 1. Sharkovsky showed that this ordering describes which numbers can be periods for a continuous map of an interval. doi:10.4169/amer.math.monthly.118.03.229 usually refer to m as the least period.

1 Dynamicists

March 2011]

THE SHARKOVSKY THEOREM: A NATURAL DIRECT PROOF

229

Theorem 1.1 (Sharkovsky Forcing Theorem [14, 16]). If m is a period for f and m F l, then l is also a period for f . This shows that the set of periods of a continuous interval map is a tail of the Sharkovsky order. A tail is a set T ⊂ N such that s F t for all s ∈ / T and all t ∈ T . There are three types of tails: {m} ∪ {l ∈ N | l G m} for some m ∈ N, the set {. . . , 16, 8, 4, 2, 1} of all powers of 2, and ∅. The following complementary result is sometimes called the converse to the Sharkovsky Theorem, but is proved in Sharkovsky’s original papers. Theorem 1.2 (Sharkovsky Realization Theorem [14, 16]). Every tail of the Sharkovsky order is the set of periods for some continuous map of an interval into itself. The Sharkovsky Theorem is the union of Theorem 1.1 and Theorem 1.2: a subset of N is the set of periods for a continuous map of an interval to itself if and only if the set is a tail of the Sharkovsky order. All proofs of the Sharkovsky Theorem that we know are elementary, no matter how ingenious; the Intermediate-Value Theorem is the deepest ingredient. There is variation in the clarity of the proof strategy and its implementation. Our aim is to present, with all details, a direct proof of the Forcing Theorem that is conceptually simple and involves no artificial case distinctions. Indeed, its directness provides additional information (Section 8). We also reproduce a proof of the Realization Theorem in Section 7 at the end of this note. The standard proof of the Sharkovsky Forcing Theorem studies orbits of odd period with the property that their period comes earlier in the Sharkovsky sequence than any other period for that map. It shows that such an orbit is of a special type, known as ˇ a Stefan cycle,2 and then that such a cycle forces the presence of periodic orbits with Sharkovsky-lesser periods. The second stage of the proof considers various cases in which the period that comes earliest in the Sharkovsky order is even. Finally, this approach requires special treatment of the case in which the set of periods consists of all powers of 2. We extract the essence of the first stage of the standard proof to produce an argument ˇ that does not need Stefan cycles, and we replace the second stage of the standard proof by a simple and natural induction. Our main idea is to select a salient sequence of orbit points and to prove that this sequence “spirals out” in essentially the same way as the ˇ Stefan cycles considered in the standard proof. 1.2. History. A capsule history of the Sharkovsky Theorem is in [11], and [1] provides much context. The first result in this direction was obtained by Coppel [5] in the 1950s: every point converges to a fixed point under iteration of a continuous map of a closed interval if the map has no periodic points of period 2; it is an easy corollary that a continuous map must have 2 as a period if it has any periodic points that are not fixed. This amounts to 2 being the penultimate number in the Sharkovsky ordering. Sharkovsky obtained the results described above and reproved Coppel’s theorem in a series of papers published in the 1960s [14, 16]. He also worked on other aspects of one-dimensional dynamics (see, for instance, [13, 15, 17]). Sharkovsky appears to have been unaware of Coppel’s paper. His work did not become known outside eastern Europe until the second half of the 1970s. In 1975 this M ONTHLY published a famous paper, “Period three implies chaos” [10] by Li and Yorke, which included the result that the presence of a periodic point of period 3 implies the presence of periodic points of 2 “S” ˇ

230

is pronounced “Sh.”


all other periods.3 This amounts to 3 being the initial number in the Sharkovsky order. Some time later Yorke attended a conference in East Berlin, and during a river cruise a Ukrainian participant approached him. Although they had no language in common, Sharkovsky (for it was he) managed to convey, with translation by Lasota and Mira, that unbeknownst to Li and Yorke (and perhaps all of western mathematics) he had proved his results about periodic points of interval mappings well before [10], even though he did not at the time say what that result was. Besides introducing the idea of chaos to a wide audience, Li and Yorke’s paper was to lead to global recognition of Sharkovsky’s work. Within a few years of [10] ˇ new proofs of the Sharkovsky Forcing Theorem appeared, one due to Stefan [18], and a later one, which is now viewed as the “standard” proof, due to Block, Guckenheimer, Misiurewicz, and Young [3], Burkart [4], Ho and Morris [9], and Straffin [19]. Nitecki’s paper [12] provides a lovely survey from that time. Alsedà, Llibre, and Misiurewicz improved this standard proof [1] and also gave a beautiful proof of the Realization Theorem, which we reproduce in Section 7. The result has also been popular with contributors to the M ONTHLY. We mention here a short proof of one step in the standard proof [2] and several papers by Du [6, 8, 7]. Reading the papers by Du inspired the work that resulted in this article. 1.3. Related Work. There is a wealth of literature related to periodic points for onedimensional dynamical systems. [1] is a good source of pertinent information. There is a characterization of the exact structure of a periodic orbit whose period comes earliest in the Sharkovsky order for a specific map. There is also work on generalizations to other permutation patterns (how particular types of periodic points force the presence of others, and how intertwined periodic orbits do so), to different one-dimensional spaces (that look like the letter “Y,” the letter “X,” or a star “∗”), and to multivalued maps. 2. INTERVALS, COVERING RELATIONS, AND CYCLES. f

Definition 2.1. We say that an interval I covers an interval J and write I − → J if J ⊂ f (I ). We usually omit f and simply write I → J instead. 2.1. Coverings Produce Cycles. The Intermediate-Value Theorem allows us to translate knowledge of how intervals are moved around into information about the presence of periodic points. This is the content of the next three lemmas. f

Lemma 2.2. If [a1 , a2 ] − → [a1 , a2 ], then f has a fixed point in [a1 , a2 ]. Proof. If b1 , b2 ∈ [a1 , a2 ] with f (bi ) = ai , then f (b1 ) − b1 ≤ 0 ≤ f (b2 ) − b2 . By the Intermediate-Value Theorem, f (x) − x = 0 for some x between b1 and b2 . Lemma 2.3 (Itinerary Lemma). If J0 , . . . , Jn−1 are closed bounded intervals and f

f

f

J0 − → ··· − → Jn−1 − → J0 (this is called a loop or n-loop of intervals) then there is a point x that follows the loop, that is, f i (x) ∈ Ji for 0 ≤ i < n and f n (x) = x. Proof. We write I J if f (I ) = J . If I → J , there is an interval K ⊂ I such that K J because the intersection of the graph of f with the rectangle I × J contains a

3 It should not be forgotten that Li and Yorke’s work contains more than a special case of Sharkovsky’s: “chaos” is not just “points of all periods.”

March 2011]


231

minimal arc that joins the top and bottom sides of the rectangle. We can choose K to be the projection to I of such an arc.

J

K I Figure 1. Finding K J .

Thus there is a closed bounded interval K n−1 ⊂ Jn−1 such that K n−1 J0 . Then Jn−2 → K n−1 , and so there is K n−2 ⊂ Jn−2 such that K n−2 K n−1 . Inductively, there are closed bounded intervals K i ⊂ Ji , 0 ≤ i < n, such that K 0 K 1 · · · K n−1 J0 . Any x ∈ K 0 satisfies f i (x) ∈ K i ⊂ Ji for 0 ≤ i < n and f n (x) ∈ J0 . Since K 0 ⊂ J0 = f n (K 0 ), Lemma 2.2 implies that f n has a fixed point in K 0 . We wish to ensure that the period of the point x found in Lemma 2.3 is n and not a proper divisor of n, such as for the 2-loop [−1, 0] [0, 1] of f (x) = −2x, which is followed only by the fixed point 0. Definition 2.4. We say that a loop J0 → · · · → Jn−1 → J0 of intervals is elementary if every point that follows it has period n.4 With this notion, the conclusion of Lemma 2.3 gives us: Proposition 2.5. For an elementary loop J0 → · · · → Jn−1 → J0 there is a periodic point with period n that follows the loop. This makes it interesting to give convenient criteria for being elementary. The simplest is that any loop of length 1 is elementary (since the period of a point that follows such a loop must be a factor of 1). A criterion with wider utility is: Lemma 2.6. A loop J0 → · · · → Jn−1 → J0 of intervals is elementary if it is not followed by either endpoint S of J0 and the interior Int(J0 ) of J0 is disjoint from each of n−1 J1 , . . . , Jn−1 , i.e., Int(J0 ) ∩ i=1 Ji = ∅. Proof. If x follows the loop, then x ∈ Int(J0 ) because x ∈ J0 and it is not an endpoint. If 0 < i < n then f i (x) ∈ / Int(J0 ) because it is in Ji , and so x 6 = f i (x). Thus x has period n. 2.2. Cycles Produce Coverings. A closed bounded interval whose endpoints belong to a cycle O of f is called an O-interval. 4 This

232

is a different use of the word “elementary” from the one in [1].


In the rest of the paper the above ideas will be applied to O-intervals. We will use only information that can be obtained from the action of f on O and therefore applies to all continuous maps f for which O is a cycle. In particular, all of the covering relations I → J of O-intervals considered in the rest of the paper are O-forced. By this we mean that J lies in the O-interval whose endpoints are the leftmost and rightmost points of f (I ∩ O). By our standing assumption that f is continuous and the Intermediate-Value Theorem, this implies I → J . We say that a loop of O-intervals is O-forced if every arrow in it arises from an O-forced covering relation. Because in the remainder of the paper these are the only covering relations we will f

use, the symbols “− →” and “→” will henceforth denote O-forced covering relations. 3. EXAMPLES. The first example is the most celebrated special case of the Sharkovsky Theorem: that period 3 implies all periods. The second and third examples apply the same method to longer cycles and illustrate how our choice of O-intervals differs from that made in the standard proof. The last example illustrates our induction argument, which is built on the doubling structure of the Sharkovsky order. 3.1. Period 3 Implies All Periods. A 3-cycle comes in two versions that are mirror images of one another. In Figure 2, the dashed arrows indicate that x1 = f (x0 ), x2 = f (x1 ), and x0 = f (x2 ). In both pictures, I1 is the O-interval with endpoints x0 and x1 , and I0 is the O-interval with endpoints x0 and x2 . The endpoints of I1 are mapped to the very left and right points of the cycle, so we have the O-forced covering relations I1 → I1 and I1 → I0 . The endpoints of I0 are mapped to those of I1 , and so I0 → I1 is O-forced. We summarize these covering relations by writing I1 I0 . I0

I1 x2

x0

I0

I1

x1

x1

x0

x2

Figure 2. 3-cycles.

Since I1 → I1 , Lemma 2.2 implies that I1 contains a fixed point of f . The endpoints of I1 cannot follow the cycle I1 → I0 → I1 because they are periodic points with period 3, whereas a point that follows this cycle must have period 1 or 2. By Lemma 2.6, f has a point with period 2. No point of O, and hence no endpoint of I0 , has three consecutive iterates in the interval I1 . Hence by Lemma 2.6 the loop l − 1 copies of I1

}| { z I0 → I1 → I1 → · · · → I1 → I0 is elementary if l > 3. Thus, f has a periodic point of period l for each l > 3. This shows a special case of the Sharkovsky Theorem: the presence of a period-3 point causes every positive integer to be a period. 3.2. A 7-cycle. Consider a 7-cycle O and O-intervals as in Figure 3. Again, we write xi = f i (x0 ) and I1 = [x0 , x1 ] and so on, as indicated. With this choice of intervals we get the following O-forced covering relations: March 2011]


233

I0

I5 I4 I3 I2 I1

x6

x4

x2

x0

x1

x3

x5

Figure 3. A 7-cycle.

(1) I1 → I1 and I0 → I1 , (2) I1 → I2 → I3 → I4 → I5 → I0 , and (3) I0 → I5 , I3 , I1 .

This information can be summarized in a graph as follows: I1

I2

I0

I3 I5

(2)

I4

From this graph we read off the following loops. (4) (5) (6) (7) (8)

I1 → I1 , I0 → I5 → I0 , I0 → I3 → I4 → I5 → I0 , I0 → I1 → I2 → I3 → I4 → I5 → I0 , I0 → I1 → I1 → · · · → I1 → I2 → I3 → I4 → I5 → I0 with 3 or more copies of I1 .

I1 → I1 is elementary because it has length 1, and the remaining loops are elementary by Lemma 2.6 because Int(I0 ) ∩ I j = ∅ if 1 ≤ j ≤ 5 and the loops cannot be followed by an endpoint of I0 for reasons familiar from the previous example. The lengths of these loops are 1, 2, 4, 6, and anything larger than 7, which proves that this cycle forces every period l G 7. The standard proof uses a different choice of O-intervals to study this example: the interval Ii for each i with 2 ≤ i ≤ 5 is replaced by the interval between xi and xi−2 . With this alternative choice one still obtains the covering relations (1)–(3), but our choice of O-intervals adapts better to other situations such as that in the next example. 3.3. A 9-cycle. Figure 4 shows a 9-cycle O for which we chose O-intervals I0 , . . . , I5 such that Int(I0 ) ∩ I j = ∅ if 1 ≤ j ≤ 5 and the covering relations in the graph (2) above are satisfied. The arguments in Subsection 3.2 apply word-for-word to show that there are elementary loops, and hence periodic orbits, of length 1, 2, 4, 6, and anything larger than 7. The endpoints x0 , . . . , x6 of the intervals in Figure 4 spiral outwards from the “center” c := (x0 + x1 )/2 like the corresponding points in Figure 3, but now they do not constitute the entire cycle O and we do not have f (xi ) = xi+1 for every i. 234


I0

I5

I4

I3

I2 I1 x6

x4

x2

x0

x1

x3

x5

Figure 4. A 9-cycle.

The sequence x0 , . . . , x6 is chosen using the algorithm explained in Section 5. The main idea in this algorithm is that one does not always choose xi+1 = f (xi ), but moves inwards towards the center c if this will make f (xi+1 ) lie further from c. Figure 5 illustrates this with the graph of a simple function f that exhibits the cycle O. Starting from a point (xi , f (xi )) on the graph of O one moves horizontally to the diagonal, then vertically to the point ( f (xi ), f 2 (xi )) on the graph. Then, if possible, one skips to a point on the graph of O that is closer to c in the horizontal direction and further from c in the vertical direction; this point will be (xi+1 , f (xi+1 )). Such skips happen in steps 2 and 3 of this example. The process terminates when the sequence has spiralled out past a point (x6 here) whose image under f is on the same side of c as the point itself.

x5 + +

x3 + x1 + c + x0 + x2 + +

x4 + x6 + + x6

+ x4

+

+ x2

+ + + x0 c x1

+ x3

+

+ x5

Figure 5. The spiral of the points xi .

In the next section we abstract the properties of the endpoints of the intervals I0 , I1 , . . . that are essential to the above argument. 3.4. A 6-cycle. Consider the 6-cycle in Figure 6. The salient feature here is that the 3 points in the left half are mapped to the 3 points in the right half and vice versa. Therefore, the 3 points in the right half form a cycle • • • for the second iterate March 2011]


235

I1

I1

I0

I0 x0

x1

Figure 6. A 6-cycle. f2

f2

f 2 . As in Subsection 3.1 we have the covering relations I1 −→ I1 , I1 −→ I0 , and f2

I0 −→ I1 for the intervals I0 and I1 shown in Figure 6. We can conclude as before that f 2 has elementary loops of all lengths. For f itself we choose two additional intervals I00 and I10 by taking I j0 to be the shortest O-interval that contains f (I j ∩ O). We now illustrate a recursive method we will use later: we show how to associate with an elementary k-loop for f 2 an elementary 2k-loop for f itself. In the present example this then tells us that every even number is a period. f2

Consider an elementary k-loop for f 2 made using the covering relations I1 −→ I1 , f2

f2

f2

f

f

I1 −→ I0 , and I0 −→ I1 . Replace each occurrence of “I1 −→” by “I1 − →” and → I10 − f2

f

f

each occurrence of “I0 −→” by “I0 − →” and note that this produces a 2k-loop → I00 − for f that is not a k-loop traversed twice (which would cause difficulty with being elementary). We show that it is elementary using the definition of elementary. Suppose a point p follows the 2k-loop under f . We need to show that it has period 2k for f . Observe that p follows the original elementary k-loop under f 2 and hence has period k for f 2 . On the other hand, the iterates of p under f are alternately to the left and the right of the middle interval (x0 , x1 ) since the 2k-loop for f alternates between primed and unprimed intervals. Therefore, the orbit of p consists of 2k distinct points; there are k even iterates on the right and k odd iterates on the left. This means that the period of p for f is 2k. Since k was arbitrary, we infer that this 6-cycle forces all even periods (as well as period 1 due to the interval [x0 , x1 ] in the center, which covers itself under f ). In the next 3 sections we prove the Sharkovsky Forcing Theorem 1.1. We first show that the existence of a special sequence in an m-cycle O produces all desired cycles. Next we construct such a sequence under a mild assumption on O. Finally we reduce the general case to this latter one. ˇ 4. STEFAN SEQUENCES PRODUCE CYCLES. Let m ≥ 2 and O an m-cycle of a continuous map f on an interval. Definition 4.1. Let p be the rightmost of those points in O for which f ( p) > p, and q the point of O to the immediate right of p. We define the center of O by c := ( p + q)/2. For x ∈ O we denote by Ox ⊂ O the set of points of O in the closed interval bounded by x and c. That is, Ox = O ∩ [x, p] when x ≤ p, and Ox = O ∩ [q, x] when x ≥ q. We say that a point x ∈ O switches sides if c is between x and f (x). From the examples in Section 3 we extract the following desirable properties of a sequence of points of O.

236


ˇ Definition 4.2. A sequence x0 , . . . , xn of points in O is called a Stefan sequence if ˇ {x0 , x1 } = { p, q}. (S1) ˇ x0 , . . . , xn are on alternating sides of the center c and the sequences (x2 j ) and (S2) (x2 j+1 ) are both strictly monotone (necessarily moving away from c). ˇ If 1 ≤ j ≤ n − 1, then x j switches sides and x j+1 ∈ O f (x ) . (S3) j

ˇ xn does not switch sides. (S4)

ˇ means that c < x j+1 ≤ f (x j ) if Remark 4.3. The condition x j+1 ∈ O f (x j ) in (S3) x j < c and c > x j+1 ≥ f (x j ) if x j > c. ˇ implies that x0 , . . . , xn are pairwise distinct. Hence n + 1 ≤ m and so n < m. (S2) ˇ Figure 2 and Figure 3 show Stefan sequences that happen to consist of the entire cycle; ˇ we have n + 1 = m in these cases. Figure 4 provides an illustration in which a Stefan sequence is a proper subset of the cycle and n + 1 < m. ˇ and (S4) ˇ together imply that n ≥ 2 and hence m ≥ 3. Note that for m = 1 the (S1) Sharkovsky Forcing Theorem is vacuously true and for m = 2 it is an application of Lemma 2.2. ˇ Proposition 4.4. Suppose that the m-cycle O has a Stefan sequence. If l G m, then f has an O-forced elementary l-loop of O-intervals and hence a periodic point with least period l. ˇ Given a Stefan sequence x0 , . . . , xn we define the desired O-intervals I0 , . . . , In−1 as follows. For 1 ≤ j < n, we take I j to be the shortest interval that contains x0 , x1 , and x j , while I0 is defined to be the O-interval with endpoints xn and xn−2 . It follows ˇ that Int(I0 ) ∩ I j = ∅ if 1 ≤ j < n. from (S2) Proposition 4.5. With I j chosen as above, we have the following O-forced covering relations. (1) I1 → I1 and I0 → I1 . (2) I1 → I2 → · · · → In−1 → I0 . (3) I0 → In−1 , In−3 , . . . .

They can be summarized in a graph as follows: I1 I0

In−5

In−1

In−4 In−2

In−3

Proof. (1) We will, in fact, prove that I j → I1 for j = 0, . . . , n − 1. This amounts to showing that f (I j ) contains x0 and x1 . ˇ and (S3) ˇ (or by (S1) ˇ if j = 1) the endpoints of I j for j = 1, . . . , n − 1 By (S2) are on opposite sides of c and both switch sides. The endpoints of I0 are on the same ˇ In either case f (I j ) side of c, but one switches sides and the other does not, by (S4). March 2011]


237

ˇ contains points of O on both sides of c and must therefore contain x0 and x1 by (S1) and the definition of c. (2) It suffices to show for 1 ≤ j ≤ n − 1 that f (I j ) contains x0 , x1 , and x j+1 . We have already seen that x0 and x1 are in f (I j ). Since f (x j ) is also in the interval f (I j ), ˇ that x j+1 ∈ f (I j ) as this implies that O f (x j ) ⊂ f (I j ). It follows from this and (S3) well. (3) It suffices to show that f (I0 ) contains x0 , x1 , and all of the points xn−1 , xn−3 , . . . of O that are on the opposite side of c from xn . We have already seen that f (I0 ) ˇ that f (xn−2 ) is at least as contains x0 and x1 . But xn−2 is in I0 and it follows from (S3) ˇ Consequently far from c as xn−1 , which is further from c than xn−3 , xn−5 , . . . , by (S2). the points xn−1 , xn−3 , . . . lie in f (I0 ). From the graph in Proposition 4.5 we read off the following loops: (L1) I1 → I1 ; (L2) I0 → In−(l−1) → In−(l−2) → · · · → In−2 → In−1 → I0 for even l ≤ n; (L3) I0 → I1 → I1 → · · · → I1 → I2 → · · · → In−1 → I0 with r ≥ 1 repetitions of I1 (and hence of length l = n − 1 + r ). Proof of Proposition 4.4. If l G m then there are 3 cases. If l = 1 we use that the loop (L1) has length 1 and is hence elementary. If l ≤ n is even, (L2) provides a loop of length l. If n ≤ l 6= m, then (L3) with r = l − n + 1 provides a loop of this length. The fact that Int(I0 ) ∩ I j = ∅ if 1 ≤ j < n combined with Lemma 2.6 will tell us that these loops are elementary once we show that they cannot be followed by a point of O. This holds for the loops in (L2) because they have length l ≤ n < m (Remark 4.3) and for the loops in (L3) because either they have length l < m, or else we have l > m and hence r = l − n + 1 ≥ m + 1 − n + 1 ≥ 3 repetitions of I1 . ˇ 5. CONSTRUCTING A STEFAN SEQUENCE. The Sharkovsky Forcing Theoˇ rem would be immediate from Proposition 4.4 if every cycle had a Stefan sequence. ˇ However, the cycle in Figure 6 has no Stefan sequence because every point switches ˇ sides. We now show that this is the only obstacle to finding a Stefan sequence. ˇ Proposition 5.1. A cycle with more than one point contains a Stefan sequence unless every point switches sides. Proof. Let O be a cycle with m ≥ 2 points. First we identify a set S ⊂ O, which contains the points of O that are candidates to ˇ be nonfinal terms in a Stefan sequence. Let M be the maximal O-interval containing [ p, q] such that all points of O that are in M switch sides; O ∩ M is thus the set of all x ∈ O such that every point of Ox switches sides. The set S consists of those x ∈ O ∩ M such that f maps x further from c than any other point in Ox . Equivalently, x ∈ O ∩ M is in S if O f (w) ⊂ O f (x) for all w ∈ Ox . Note that p, q ∈ S.

x

...

...

|

c

...

... f (x)

Figure 7. x ∈ S.

238


ˇ We now define a map σ : S → O, which will take an element of a Stefan sequence to its successor in the sequence. We always choose σ (x) ∈ O f (x) ; since x ∈ M this ensures that x and σ (x) are on opposite sides of c. (i) If f (x) ∈ / M, we can take σ (x) to be any point of O f (x) that does not switch sides. In this case σ (x) ∈ / S. (ii) If f (x) ∈ M, then σ (x) is the point of O f (x) that maps furthest from c, i.e., f (O f (x) ) ⊂ O f (σ (x)) . By construction (see Figure 8, for example) we have σ (x) ∈ S in this case.

... f (x)

... σ (x)

|

c

...

... f (σ ( x))

Figure 8. The successor map σ in case (ii).

We noted that x and σ (x) are on opposite sides of c, so σ 2 (x), if defined, is again on ˇ that σ 2 (x) the same side as x. It is crucial for obtaining the outward spiraling in (S2) 2 is further from c than x, i.e., that σ (x) ∈ / Ox . Lemma 5.2. If there is an x ∈ S such that σ 2 (x) ∈ Ox , then all points of O switch sides. Proof. In order to have σ 2 (x) defined and in Ox , we must have x, y := σ (x), and z := σ (y) = σ 2 (x) all in S. Moreover σ (x) and σ (y) are both obtained using case (ii) in the definition of σ . Hence f (O f (x) ) ⊂ O f (σ (x)) = O f (y) and f (O f (y) ) ⊂ O f (σ (y)) = O f (z) . Since z = σ 2 (x) ∈ Ox and x ∈ S, we have O f (z) ⊂ O f (x) . Combining the above inclusions shows that O f (x) ∪ O f (y) is mapped into itself by f . Since f is a cyclic permutation of O, the only nonempty f -invariant subset of O is O itself. Thus O = O f (x) ∪ O f (y) . But all points of this set switch sides because x and y are in S. To conclude the proof of Proposition 5.1 we now suppose that there is a point of O ˇ that does not switch sides and show that this implies the existence of a Stefan sequence. The contrapositive of Lemma 5.2 implies that we cannot have both σ ( p) = q and σ (q) = p. Therefore we can choose {x0 , x1 } = { p, q} in such a way that x2 := σ (x1 ) 6 = x0 and then continue to choose xi+1 = σ (xi ) while xi ∈ S. We now verify that this ˇ produces a Stefan sequence. ˇ Our choice of {x0 , x1 } = { p, q} gives (S1). March 2011]


239

ˇ note that successive terms lie on alternating sides of c because x and To check (S2) σ (x) are on opposite sides of c. To check that the sequence spirals outward note first that our choice of x0 and x1 ensures that x2 ∈ / Ox0 . Thereafter, Lemma 5.2 shows that xi+2 = σ 2 (xi ) ∈ / Oxi , i.e., xi+2 lies further from c than xi . This implies in particular that the terms of the sequence are pairwise distinct. Since they lie in the finite set O, the sequence terminates. We label the last term xn and note that it necessarily arises from (i) in the definition of σ . Hence xn does not switch sides, ˇ which implies (S4). ˇ To check (S3) we note first that for j < n we have x j ∈ S ⊂ M, and x j therefore switches sides. Finally, x j+1 = σ (x j ) ∈ O f (x j ) by the definition of σ . Proposition 5.1 and Proposition 4.4 give the following main case of the Sharkovsky Theorem. Proposition 5.3. If an m-cycle O with m ≥ 2 contains a point that does not switch sides, then for each l G m there is an elementary, O-forced l-loop of O-intervals, and hence an l-cycle. 6. PROOF OF THE SHARKOVSKY FORCING THEOREM. To prove the Sharkovsky Forcing Theorem it remains to reduce the case of a cycle in which all points switch sides to the main case of Proposition 5.3. We use the fact that the left and right halves of such a cycle are cycles for f 2 of half the length. Proposition 6.1. An m-cycle O has an O-forced elementary l-loop of O-intervals for each l G m. By Proposition 2.5, this implies the Sharkovsky Forcing Theorem 1.1. Proof. We proceed by induction on m. Proposition 6.1 is vacuously true for m = 1 since there is no l G 1. Suppose now that Proposition 6.1 is known for all cycles of length less than m. Let O be an m-cycle. If there is a point that does not switch sides, then the conclusion of Proposition 6.1 follows by Proposition 5.3. Otherwise, all points switch sides. Write L := min O and R := max O. Then O L (see Definition 4.1) contains the points in O to the left of c, O R contains those to the right of c, and f swaps these sets: f O L is a bijection from O L to O R and f O R is a bijection from O R to O L , so O L and O R have the same number of points, and m is even. Since m is even, it follows from the doubling property (1) that l G m if and only if l = 1 or l = 2k with k G m/2. Therefore we need to show that f has an elementary 1-loop as well as an elementary O-forced 2k-loop of O-intervals for each k G m/2. As the elementary 1-loop we can take the middle O-interval [ p, q], since p = max O L and q = min O R . To find the required 2k-loops, we use the inductive assumption and the fact that O L and O R are cycles of length m/2 for the second iterate f 2 . Proposition 6.1 can be applied to either of these cycles. Using O R , we find that f 2 has an elementary O R forced k-loop of O R -intervals for each k G m/2. The induction will be complete once we show that these give rise to elementary 2k-loops for f itself. To that end, consider an elementary k-loop f2

f2

f2

f2

f2

I0 −→ I1 −→ I2 −→ · · · −→ Ik−1 −→ I0 240

(3)


of O R -intervals for f 2 . For later convenience we set Ik := I0 . Let Ii0 be the shortest closed interval that contains f (Ii ∩ O) ⊂ O L . These are O-intervals and by construcf

→ Ii0 for each i, 0 ≤ i < k. The retion we have the O-forced covering relation Ii − mainder of the proof consists of showing that this produces an O-forced 2k-loop f

f

f

f

f

f

f

0 − → I0 → Ik−1 → ··· − → Ik−1 − → I10 − → I1 − → I00 − I0 −

(4)

for f that is elementary. To see that this is an O-forced loop we show that we also have the covering relaf2

f

→ Ii+1 and that they are O-forced. Since Ii −→ Ii+1 and this covering is O R tions Ii0 − forced, there are points ai , bi ∈ Ii ∩ O R such that the closed interval between f 2 (ai ) and f 2 (bi ) contains Ii+1 . But then ai0 := f (ai ) and bi0 := f (bi ) are in Ii0 ∩ O and the closed interval between f (ai0 ) = f 2 (ai ) and f (bi0 ) = f 2 (bi ) contains Ii+1 , as required. It remains to show that the loop in (4) is elementary. Consider a periodic point x for f that follows the loop (4). It is a periodic point for f 2 that follows the elementary loop (3) and hence has period k with respect to f 2 . Therefore k points of its f -orbit (the even iterates) lie in O R . Since the intervals in the loop (4) are alternately to the right and the left of the center, so are the iterates of x under f . Therefore another k points (the odd iterates) lie in O L , and the orbit has length 2k. Hence x has period 2k with respect to f , and the loop (4) is elementary. 7. THE SHARKOVSKY REALIZATION THEOREM. An elegant proof of the Sharkovsky Realization Theorem 1.2 is given in [1]. It reveals one Sharkovsky tail at a time as one increases h in the family of truncated tent maps Th : [0, 1] → [0, 1],

x 7 → min(h, 1 − 2|x − 1/2|)

for

0 ≤ h ≤ 1.

This family has several key properties.

h Th

Figure 9. Truncated tent maps.

(a) T1 has only one periodic point (the fixed point 0) while the tent map T1 has a 3-cycle {2/7, 4/7, 6/7} and hence has all natural numbers as periods by the Sharkovsky Forcing Theorem 1.1. (b) Any cycle O ⊂ [0, h) of Th is a cycle for T1 , and any cycle O ⊂ [0, h] of T1 is a cycle for Th . What makes the proof so elegant is that h plays three roles: as a parameter, as the maximum value of Th , and as a point of an orbit. The key idea is to let h(m) := min{max O | O is an m-cycle of T1 } for m ∈ N. (We can write “min” instead of “inf” because T1 has a finite number of periodic points for each period.5 ) From this and (b) we obtain: 5 Inspection

of the graph of T1n shows that it has exactly 2n fixed points.

March 2011]


241

(c) Th has an l-cycle O ⊂ [0, h) if and only if h(l) < h. (d) The orbit of h(m) is an m-cycle for Th(m) , and all other cycles for Th(m) lie in [0, h(m)). From (d) and the Sharkovsky Forcing Theorem 1.1 we see that if l G m, then Th(m) has an l-cycle that lies in [0, h(m)); it follows from (c) that h(l) < h(m). By symmetry, (e) h(l) < h(m) if and only if l G m. We see from (c), (d), and (e) that for any m ∈ N the set of periods of Th(m) is the tail of the Sharkovsky order consisting of m and all l G m. The set of all powers of 2 is the only other tail of the Sharkovsky order (besides ∅, which is the set of periods of the translation x 7 → x + 1 on R). We have h(2∞ ) := supk h(2k ) > h(2k ) by (e) for all k ∈ N, so Th(2∞ ) has 2k -cycles for all k by (c). Suppose Th(2∞ ) has an m-cycle with m not a power of 2. By Theorem 1.1 Th(2∞ ) also has a 2mcycle. Since the m-cycle and the 2m-cycle are disjoint, at least one of them is contained in [0, h(2∞ )) and hence in [0, h(2k )) for some k ∈ N, contrary to (c) and (e). 8. CONCLUSION. It may be of interest to note that the proof given here provides more information than the statement of the Sharkovsky Forcing Theorem 1.1. When in the proof of Proposition 4.4 we treated the loops in (L3) on page 238 we only needed to know that n ≤ l 6 = m. Therefore Proposition 4.4 can be amplified to the following: ˇ Proposition 8.1. If an m-cycle O contains a Stefan sequence x0 , . . . , xn , then O forces periods l = 1 (from (L1)), l ≥ n (from (L3)), and even l ≤ n (from (L2)). This includes periods that precede m in the Sharkovsky order. An extreme instance is given by a cycle in which the point q chosen at the beginning of Proposition 5.1 is max O and f (q) = min O, i.e., a cycle of the form • · · · • •. The ˇ 3 points shown here constitute a Stefan sequence with n = 2, which forces period 3 and hence all periods. Another way in which additional information can be extracted by keeping track of patterns arises in connection with cycles whose length is 2k for some k. If such a cycle ˇ O contains a point that does not switch sides, then there is a Stefan sequence with n < m − 1, and Proposition 8.1 shows that O forces all periods l ≥ n, in particular for some odd such l, and hence there are periods that are not powers of 2. Morover, if all points of O do switch sides, the reduction in the proof of Proposition 6.1 yields a cycle of length 2k−1 for f 2 to which one can apply the previous reasoning: it either forces a period that is not a power of 2 or all its points switch sides. In the latter case one can again reduce a step. If this keeps happening until one has reduced to period 2 for k−1 f 2 , then we say that O is simple, and we have observed that if a continuous map has only powers of 2 as periods, then all cycles must be simple. In other words, if there is a cycle of length 2k for any k > 1 that is not simple, then it forces a period that is not a power of 2. These observations illustrate that our method can make use of more information than just the period of the cycle from which one starts; this differs from the standard proof, which begins by discarding the initial orbit. Like our proof, refinements of Sharkovsky’s Theorem systematically take into account “patterns” instead of just periods. ˇ The definition of a Stefan sequence implies that if n = m − 1, there will be only one point of O, namely xm−1 , that does not switch sides. The point xm−1 must be either the leftmost or rightmost point of O and the sequence x0 , x1 , . . . must spiral outwards 242


clockwise or counterclockwise as shown: x m−1

...

x4

x2

x0

x1

x3

...

x m−2

or

x m−2

...

x3

x1

x0

x2

x4

...

x m−1 .

Furthermore we must have f (xi ) = xi+1 for 0 ≤ i < m − 1. These orbits are called ˇ Stefan cycles. They are central to the standard proof of the Sharkovsky Theorem. Our proof is more direct because we do not need these cycles, but they inspired our definiˇ tion of Stefan sequences. ACKNOWLEDGMENTS. We thank the Instituto Superior Técnico of the Universidade Técnica de Lisboa and the Mittag-Leffler Institute of the Royal Swedish Academy of Sciences for their hospitality and support. We are also grateful to Wah Kwan Ku for pointing out an error in a draft of this paper, Aaron Brown and Stephen Sherman for expository suggestions, and the referee and Dan Velleman for extensive corrections and improvements. Keith Burns was partially supported by NSF grant DMS-0701140.

REFERENCES 1. L. Alsedà, J. Llibre, and M. Misiurewicz, Combinatorial Dynamics and Entropy in Dimension One, 2nd ed., Advanced Series in Nonlinear Dynamics, vol. 5, World Scientific, River Edge, NJ, 2000. 2. R. Barton and K. Burns, A simple special case of Sharkovskii’s theorem, Amer. Math. Monthly 107 (2000) 932–933. doi:10.2307/2695586 3. L. Block, J. Guckenheimer, M. Misiurewicz, and L.-S. Young, Periodic points and topological entropy of one-dimensional maps, in Global Theory of Dynamical Systems, Z. Nitecki and C. Robinson, eds., Lecture Notes in Mathematics, vol. 819, Springer Verlag, Berlin, 1980, 18–34. 4. U. Burkart, Interval mapping graphs and periodic points of continuous functions, J. Combin. Theory Ser. B 32 (1982) 57–68. doi:10.1016/0095-8956(82)90076-4 5. W. A. Coppel, The solution of equations by iteration, Proc. Cambridge Philos. Soc. 51 (1955) 41–43. doi:10.1017/S030500410002990X 6. B.-S. Du, A simple proof of Sharkovsky’s theorem, Amer. Math. Monthly 111 (2004) 595–599. doi: 10.2307/4145161 , A simple proof of Sharkovsky’s theorem revisited, Amer. Math. Monthly 114 (2007) 152–155. 7. , A collection of simple proofs of Sharkovsky’s theorem (2007), available at http://arxiv. 8. org/abs/math/0703592. 9. C. W. Ho and C. Morris, A graph-theoretic proof of Sharkovsky’s theorem on the periodic points of continuous functions, Pacific J. Math. 96 (1981) 361–370. 10. T.-Y. Li and J. A. Yorke, Period three implies chaos, Amer. Math. Monthly 82 (1975) 985–992. doi: 10.2307/2318254 11. M. Misiurewicz, Remarks on Sharkovsky’s Theorem, Amer. Math. Monthly 104 (1997) 846–847. doi: 10.2307/2975290 12. Z. Nitecki, Topological dynamics on the interval, in Ergodic Theory and Dynamical Systems II–College Park, MD 1979–1980, Progr. Math., vol. 21, Birkhäuser, Boston, 1982, 1–73. 13. A. N. Sharkovski˘ı, The reducibility of a continuous function of a real variable and the structure of the stationary points of the corresponding iteration process, Dokl. Akad. Nauk RSR 139 (1961) 1067–1070. 14. , Coexistence of cycles of a continuous map of the line into itself, Ukrain. Mat. Zh. 16 (1964) 61–71; trans. J. Tolosa, Proceedings of “Thiry Years after Sharkovski˘ı’s Theorem: New Perspectives” (Murcia, Spain 1994), Internat. J. Bifur. Chaos Appl. Sci. Engrg. 5 (1995) 1263–1273. 15. , Fixed points and the center of a continuous mapping of the line into itself, Dopovidi Akad. Nauk Ukr. RSR 1964 (1964) 865–868. 16. , On cycles and structure of a continuous mapping, Ukrain. Mat. Zh. 17 (1965) 104–111. doi: 10.1007/BF02527365 17. , The set of convergence of one-dimensional iterations, Dopovidi Akad. Nauk Ukr. RSR 1966 (1966) 866–870. ˇ ˇ 18. P. Stefan, A theorem of Sarkovskii on the existence of periodic orbits of continuous endomorphisms of the real line, Comm. Math. Phys. 54 (1977) 237–248. doi:10.1007/BF01614086 19. P. D. Straffin Jr., Periodic points of continuous functions, Math. Mag. 51 (1978) 99–105. doi:10.2307/ 2690145

March 2011]


243

KEITH BURNS is a Professor of Mathematics at Northwestern University. He obtained his Ph.D. from Warwick University (UK) and works on geometrical and dynamical problems connected with geodesics in manifolds with nonpositive curvature. When not doing mathematics he likes to run marathons. Department of Mathematics, Northwestern University, Evanston, IL 60208 [email protected] BORIS HASSELBLATT is Professor and Chair of Mathematics at Tufts University. He obtained a Vordiplom in Physics from the Technische Universität Berlin in 1981, an M.A. in mathematics from the University of Maryland in College Park in 1984, and a Ph.D. in mathematics from the California Institute of Technology in 1989. He studies mainly hyperbolic dynamical systems, often of types that are geometrically motivated. When he is not doing mathematics he enjoys singing. He dedicates this article to all those who do not dedicate their articles to themselves. Department of Mathematics, Tufts University, Medford, MA 02155 [email protected]

Polanyi on Mathematics “All these difficulties are but consequences of our refusal to see that mathematics cannot be defined without acknowledging its most obvious feature: namely, that it is interesting. Nowhere is intellectual beauty so deeply felt and fastidiously appreciated in its various grades and qualities as in mathematics, and only the informal appreciation of mathematical value can distinguish what is mathematics from a welter of formally similar, yet altogether trivial statements and operations.” Michael Polanyi, Personal Knowledge: Towards a Post-Critical Philosophy, University of Chicago Press, Chicago, 1958, p. 200

244


A First Look at Differential Algebra John H. Hubbard and Benjamin E. Lundell

Abstract. This article is an introduction to the common algebraic methods used to study both solutions to polynomial equations and solutions to differential equations: Galois theory and differential Galois theory. We develop both theories simultaneously by studying the solutions to the polynomial equation x 5 − 4x 2 − 2 = 0 and the solutions to the differential equation u0 = t − u2.

1. INTRODUCTION. The object of this paper is to prove that the differential equation u0 = t − u2

(1)

has no solutions which can be written using elementary functions, or antiderivatives of elementary functions, or exponentials of such antiderivatives, or antiderivatives of those, etc. We should note that equation (1) can be solved using power series, integrals which depend on a parameter, or Bessel functions of order 1/3. However, as we will see, none of these methods of solution are “algebraic” in nature. We aim to give a precise definition of “algebraic” by developing the theory of differential algebra, which is largely the work of Ritt. Other contributors include Liouville, Picard, Vessiot, Kolchin, and Rosenlicht. The part of differential Galois theory which we will require is remarkably analogous to the part of Galois theory which leads to a proof of Abel’s celebrated result that a general polynomial equation of degree five or higher cannot be solved by radicals. In an effort to derive these two areas in parallel, we will also explain why the polynomial equation x 5 − 4x 2 − 2 = 0

(2)

has no solutions which can be written as radicals of solutions to lower degree polynomial equations. The paper is written with a reader in mind who at some point studied Galois theory: either very recently and is therefore not an expert, or long ago and has since forgotten many of the finer points. The examples are chosen to illuminate the theorems: it is scarcely possible to give all the details of every proof in this article. For further reading, we recommend [4], [5], [7], [8], [9], or [10]. For the basics of Galois theory, differential equations, or algebraic groups, see [1], [2], or [3], respectively. 2. SPLITTING FIELDS. Our first step will be to determine where solutions to equations (1) and (2) lie. Recall that a field is a set in which an addition, subtraction, multiplication, and division are defined, and that these operations satisfy the rules which one expects from elementary arithmetic. Three standard examples are the rational numbers Q, the real numbers R, and the complex numbers C. All fields in this paper will have characteristic 0. For the case of polynomials, we now have all of the background we need. doi:10.4169/amer.math.monthly.118.03.245

March 2011]

A FIRST LOOK AT DIFFERENTIAL ALGEBRA

245

Definition 2.1. Given a polynomial f with coefficients in a field F ⊂ C, the splitting field of f over F, denoted E f , is the smallest subfield of C which contains F and all of the roots of f . The reason we can be sure that all solutions to a polynomial f lie in some subfield of C is the fundamental theorem of algebra, which says that any degree-n polynomial with coefficients in C has n (not necessarily distinct) roots in C. Example 2.2. Consider the polynomial f (x) = x 2 − 2. Then the field √ √ E f = Q( 2) = {a + b 2 : a, b ∈ Q} is the splitting field of f over Q. Why? First, this is a field. One can obviously add, subtract, and multiply numbers of this form and obtain another number of this form. Division is possible since √ a−b 2 1 , √ = 2 a − 2b2 a+b 2 √ and a 2 − 2b2 = 0 implies that a = b = 0 (since √ 2 is irrational). √ Second, it does contain both roots of f : ± 2 ∈ Q( 2). Third, it is obviously the smallest field which contains these roots. Remark. The splitting field need not be thought of as a subfield of C. We need only to fix an algebraic closure of Q and we could work in there (as any algebraic closure contains the roots any polynomial, by definition). We find it easier to think of subfields of C, but that is just a crutch. In fact, the results on the Galois theory of polynomials that we collect in this section and the next are true in a much more general setting: if F is any characteristic-zero field, we will have the same results for a polynomial f with coefficients in F, provided that we fix an algebraic closure of F to begin with. This level of generality is not appropriate in these introductory sections, but will be necessary in Proposition 4.6. To deal with differential equations rather than polynomial equations, one must consider fields with a bit more structure. Definition 2.3. A differential field, here called a D-field, is a field F, together with a derivation δ : F → F which satisfies the rules δ(u + v) = δ(u) + δ(v) and δ(uv) = uδ(v) + vδ(u). A linear differential operator of degree k on (F, δ) is a map L : F → F given by a formula L(u) = δ k (u) + αk−1 δ k−1 (u) + · · · + α0 u where δ k (u) = δ(δ(. . . (u) . . . )) and αi ∈ F. The first example of a D-field is C(t), the set of all rational functions in one variable with complex coefficients, with the usual addition and multiplication, and with the derivation given by the ordinary derivative. The standard rules for differentiating say 246


that the derivative of a rational function is again a rational function. Another example is the field M(U ) of meromorphic functions on an open subset U ⊂ C, that is, quotients u/v of analytic functions with v not identically zero. For our purposes, the field C(t) will be the smallest field of interest (analogous to Q for polynomials), and M(U ) the largest (like C). The reason we can use M(U ) as our “big field” is the existence and uniqueness theorem for differential equations: if U is a simply connected subset of C and α1 , . . . , αk are analytic on U , then the differential equation L(u) = u (k) + α1 u (k−1) + · · · + αk u = 0 has a unique solution in M(U ) for any t0 in U with any given initial conditions u(t0 ), u 0 (t0 ), . . . , u (k−1) (t0 ). Definition 2.4. If (F, δ) is a differential field, then the constants of F are the elements u ∈ F such that δ(u) = 0. In this paper, the field of constants will always be C. We now have the background needed to define where the solutions to a differential equation lie. Definition 2.5. Let L(u) = δ n (u) + αn−1 δ n−1 (u) + · · · + α0 u be a differential operator where the αi ∈ F ⊂ M(U ) are all analytic in some simply connected open set U ⊂ C. Then the differential splitting field of L over F in M(U ), denoted E LU , is the smallest D-subfield of M(U ) containing F and all solutions of L(u) = 0 on U . We will suppress U from the notation and write E L for E LU when no confusion can arise. Example 2.6. One of the simplest differential operators is L(u) = u 0 − u. In this case the only coefficients are the numbers ±1, which are certainly analytic on all of C, so we may consider the splitting field over C(t) to be the smallest D-subfield of M(C) containing the rational functions and the solutions of the differential equation u 0 = u, that is, all functions of the form Cet . It should be clear that this subfield, E L , is precisely the space of functions of the form p0 (t) + p1 (t)et + · · · + pm (t)emt q0 (t) + q1 (t)et + · · · + qn (t)ent where the pi and q j are polynomials with coefficients in C, with the denominator not identically zero. Indeed, this is a differential field (clearly closed under addition, multiplication, division, and differentiation), and more or less obviously the smallest such field containing the constants and et . One√should think of this splitting field as a close analog of the numbers of the form a + b 2. 3. GALOIS GROUPS. We now investigate the structure of splitting fields. In particular, we try to understand what happens to solutions of a polynomial or differential equation under a field automorphism. We begin with the case of polynomials. Definition 3.1. Let F ⊂ C be a field and suppose K /F is any field extension. The Galois group, denoted Gal(K /F), is the group of all field automorphisms of K which leave F fixed, where the group law is given by composition of automorphisms. If f is a polynomial with coefficients in F, then we call Gal(E f /F) the Galois group of f . March 2011]


247

Fix a polynomial f with coefficients in F, and let σ ∈ Gal(E f /F). Since σ fixes F and respects field operations (both by definition), we see that f (σ (a)) = σ ( f (a)) for any a ∈ E f . In particular, if a is a root of f , i.e., f (a) = 0, we see that 0 = f (a) = σ ( f (a)) = f (σ (a)). We conclude that elements of the Galois group of f permute the roots of f . Consequently, if we denote the set of roots of f by R f , then there is a group homomorphism Gal(E f /F) → Perm(R f ) which can easily be seen to be an injection, so Gal(E f /F) is naturally isomorphic to a subgroup of the finite group of permutations of the roots. From now on, we will always identify the Galois group of a polynomial with a subgroup of this permutation group. In fact, we can say more. Write f = g · h, where g is an irreducible, nonconstant polynomial with coefficients in F, and suppose g(a) = 0 for a ∈ C. First, we note that we necessarily have f (a) = g(a)h(a) = 0, so that a ∈ E f . Let σ ∈ Gal(E f /F). By the same reasoning as before 0 = g(a) = σ (g(a)) = g(σ (a)), so that not only does Gal(E f /F) permute the roots of f , it permutes the roots of the irreducible factors of f . Because of this, we can focus solely on the case where f itself is irreducible for the remainder of the paper. 2 Example 3.2. If f (x) = x√ −2√ as in Example 2.2, then the group Gal(E f /Q) is the group of permutations of { 2, − 2}.

Example 3.3. Let f (x) = x 5 − 1. Then the set of roots is the set with the five elements ωk = e2kπi/5 , with k = 0, . . . , 4. Clearly the Galois group is not the whole group of permutations; no automorphism can map 1 to anything else. This is a particular case of the following general statement: the Galois group acts transitively on the roots of a polynomial f (i.e., given any two roots a1 , a2 of f there exists σ ∈ Gal(E f /Q) such that σ (a1 ) = a2 ) if and only if f is irreducible. In our case, x 5 − 1 = (x 4 + x 3 + x 2 + x + 1)(x − 1), and the roots of the two factors cannot get mixed up. How about the other roots? It is not quite obvious,1 but ω1 can be mapped to any other root ωk , k = 1, 2, 3, 4 by an automorphism σ ∈ Gal(E f /Q). Knowing σ (ω1 ) completely determines σ , since ωk = ω1k , so σ (ωk ) = σ (ω1 )k . Once you see that, it is not hard to see that the Galois group is the multiplicative group of Z/5Z, which is a cyclic group of order 4. The same argument shows that the Galois group of the polynomial x p − 1 is the multiplicative group of the field Z/ pZ for any prime p. Definition 3.4. An extension K /F of a field F is called a Galois extension if the set of elements of K which are fixed by all the elements of Gal(K /F) is precisely F; only then is the Galois group really useful.2 1 This

is the content of the theorem by Gauss that cyclotomic polynomials are irreducible. L/K is not a Galois extension, the right thing to consider is the set of embeddings K ⊂ C which are the identity on F. We will not pursue this here. 2 When

248


The following theorem gives us all finite Galois extensions of a field F of characteristic zero. Theorem 3.5. If f is an irreducible polynomial with coefficients in a field F ⊂ C, then the field extension E f /F is a Galois extension. In fact, every finite Galois extension of F arises as the splitting field of some irreducible polynomial with coefficients in F. Finally, in this setting, |Gal(E f /F)| = [E f : F]. Proof. See, for instance, Section 14.1 of [1]. In Examples 3.2 and 3.3 we saw examples of Galois extensions. We now look at an extension which is not a Galois extension. Example 3.6. Consider the field F of real numbers of the form a + b21/3 + c41/3 for any rational numbers a, b, c. In this case, Gal(F/Q) = {1} since any element of the group must send 21/3 to a cubic root of 2, and there are no other such roots in F. Thus, F/Q is not a Galois extension. Theorem 3.7 (The Fundamental Theorem of Galois Theory). Let K /F be a finite Galois extension. If M is a field and F ⊂ M ⊂ K , then Gal(K /M) is a subgroup of Gal(K /F). Furthermore, this defines an inclusion-reversing bijection between the subfields of K containing F and the subgroups of Gal(K /F). Under this bijection, the normal subgroups correspond to the subfields M such that M/F is also a Galois extension, and, in this case, Gal(M/F) ' Gal(K /F)/ Gal(K /M). Proof. See Section 14.2 of [1]. Example 3.8. Consider the polynomial f (x) = x 3 − 2. This has three roots, and the Galois group Gal(E f /Q) is the full group of permutations of these roots3 and so has order six. The splitting field, E f , contains the ratios of the roots, which are cubic roots of unity. If we set E g to be the splitting field of g(x) = x 2 + x + 1, then Gal(E f /E g ) is cyclic of order three. Therefore, Gal(E f /E g ) E Gal(E f /Q) since index-two subgroups are always normal. According to Theorem 3.7, then, E g /Q is a Galois extension with Gal(E g /Q) ' Gal(E f /Q)/ Gal(E f /E g ). More specifically, set ω = e2πi/3 . Then Gal(E f /Q) is generated by complex conjugation and the unique field automorphism σ which maps 21/3 7 → ω · 21/3 , whereas Gal(E f /E g ) is generated by just σ , since complex conjugation is not the identity on E g . This last statement shows that Gal(E g /Q) is isomorphic to the quotient group, which is generated by the coset containing complex conjugation, and so is of order two. Finally, we consider the order-two subgroup of Gal(E f /Q) generated by complex conjugation. The field corresponding to this subgroup under the bijection in Theorem 3.7 is precisely the field considered in Example 3.6. There, we saw that this field was not a Galois extension of Q. This corresponds to the fact that no subgroup of two elements is normal in Gal(E f /Q). 3 In

particular, the real cubic root of 2 cannot be algebraically distinguished from the other roots.

March 2011]


249

4. DIFFERENTIAL GALOIS GROUPS. In this section, we will develop an analogous Galois theory for differential field extensions. Throughout this section, we will assume that all differential fields contain C(t) and lie inside some field M(U ) for U ⊂ C a simply connect open set. Definition 4.1. Let (K , ) be a differential extension of the differential field (F, δ). The differential Galois Group, DGal(K /F), is the group of field automorphisms σ : K → K which restrict to the identity on F and satisfy σ ((u)) = (σ (u)) for all u ∈ K , and the group law is given by composition of automorphisms. If L is a linear differential operator with coefficients in F, then we call DGal(E L /F) the differential Galois group of L. Just as in the case of polynomials, if we fix a linear differential operator L of degree k with coefficients in F, then the elements of DGal(E L /F) will map one solution of L(u) = 0 to another. The proof is identical to that in the polynomial case. The space VL of solutions of L(u) = 0 in M(U ) has dimension k as a vector space over C (since we must specify k initial conditions to be guaranteed a unique solution). As we just mentioned, elements of the D-Galois group permute the solutions of L and are F-linear (and hence necessarily C-linear). We conclude that the D-Galois group DGal(E L /K ) is naturally isomorphic to a subgroup of GL(VL ). Thus, if you choose a basis of solutions of L(u) = 0, you can think of DGal(E L /F) as a group of invertible complex k × k matrices. As with polynomials, we will always make this identification. Example 4.2. Let L be the operator given by L(u) = u 0 − u; in Example 2.6, the splitting field of L was determined. Any automorphism of E L must send one solution of u 0 = u to another, so in particular, it must send et to Cet for some nonzero complex number C. Moreover, C completely determines the D-Galois automorphism. Consequently, the D-Galois group is C∗ = GL1 (C), the multiplicative group of the complex numbers. Theorem 4.3. The differential Galois group DGal(E L /K ) of a linear differential operator L is an algebraic subgroup of GL(VL ); that is, it is a subset defined by finitely many polynomial equations. Proof. See Section 17 of [4]. Let us see what this says for a few examples. The additive group C has lots of subgroups, isomorphic to Z, Z ⊕ Z, R, etc., but none of them are algebraic. For instance, Z is defined by the equation sin π z = 0, but f (z) = sin(π z) is not a polynomial (nor is the function f (z) = z − z¯ , which vanishes exactly on R). The group C∗ also has lots of subgroups, but only those consisting of the nth roots of unity for some n are algebraic (obviously defined by the single equation z n − 1 = 0). The next theorem is a strong result on algebraic groups, and will be necessary for the proof of Theorem 8.1. Theorem 4.4. Let V be a finite-dimensional vector space over C, and G ⊂ GL(V ) be an algebraic subgroup. Then the connected component G 0 ⊂ G of G containing the identity is a normal subgroup, which is also algebraic, and the quotient group G/G 0 is finite. Proof. See Chapter II, Section 7.3 of [3]. 250


Our next example shows that a D-Galois group can perfectly well be finite. Example 4.5. Let U ⊂ C be the open unit disk and consider the linear differential operator L(u) = u 0 +

t u, 1 − t2

√ whose coefficients are analytic on U . Let w be an analytic branch of 1 − t 2 on U , for instance the one which is positive on (−1, 1). Then L(Cw) = 0 for any C ∈ C, and the √ splitting field E L ⊂ M(U ) of L over C(t) is the set of functions of the form u + v 1 − t 2 with u, v ∈ C(t). Since L is a first-order operator, we see that DGal(E L /C(t)) is a subgroup of GL1 (C) = C∗ . Let σ ∈ DGal(E L /C(t)), so that σ (w) = Cw for some C ∈ C. Then, σ (w)2 = C 2 w 2 = C 2 (1 − t 2 ). But also, σ (w)2 = σ (w 2 ) = σ (1 − t 2 ) = 1 − t 2 , since σ fixes 1 − t 2 ∈ C(t). Therefore, C 2 = 1. Thus, DGal(E L /C(t)) is the group√of two 2 elements: √ the identity automorphism and the automorphism which exchanges 1 − t 2 and − 1 − t . This illustrates the following fact: even if a linear differential operator is irreducible, in the sense that it is not the composition of two linear differential operators of lower degree, the differential Galois group of the splitting field may well not act transitively on the nonzero solutions, which may have an “individuality” of their own. The situation in this example is completely general. Proposition 4.6. Let L be a linear differential operator with coefficients in some Dfield F ⊂ M(U ), where U ⊂ C is a simply connected open set. If E L /F is a Galois extension and DGal(E L /F) is finite, then all the elements of E L are algebraic over F. Before we prove this proposition, we should explain what we mean by “E L /F is a Galois extension.” By this, we mean that E L is the splitting field of some irreducible polynomial with coefficients in the D-field F (contained in some algebraic closure of F). In the setting of Example 4.5, we had E L = E f where f (x) = x 2 − (1 − t 2 ) ∈ C(t)[x]. As we mentioned in the remark following Example 2.2, the results on the Galois theory of polynomials we developed in Sections 2 and 3 carry over to this more general setting. Proof of Proposition 4.6 Choose v ∈ E L , and consider the polynomial f :=

Y

(x − σ (v)).

σ ∈DGal(E L /F)

The coefficients of this polynomial are fixed under DGal(E L /F), hence in F by Theorem 4.8 below. Thus, f is a polynomial with coefficients in F which has v as a root, so that v is algebraic over F, as desired. The previous proposition connects splitting fields of linear differential operators to splitting fields of polynomials. One might ask, then, are Galois D-extensions the right extensions to study in the D-field setting? The answer is no. However, we would still like an analogue of Galois extensions for differential fields. March 2011]


251

Definition 4.7. An extension K /F of differential fields is called a Picard-Vessiot extension if the set of elements of K which are fixed by all the elements of DGal(K /F) is precisely F. Picard-Vessiot extensions play the role in differential Galois theory that Galois extensions play in polynomial Galois theory. In support of this, we offer the following analogues of Theorems 3.5 and 3.7 for differential fields. Theorem 4.8. If L is a linear differential operator with coefficients in a differential field F ⊂ M(U ), then the field extension E L /F is a Picard-Vessiot extension. Moreover, every Picard-Vessiot extension of F arises as the splitting field of some linear differential operator with coefficients in F. Proof. See the section in [8] on Picard-Vessiot extensions. Theorem 4.9 (The Fundamental Theorem of Differential Galois Theory). Let K /F be a Picard-Vessoit extension. If M is a D-field and F ⊂ M ⊂ K , then DGal(K /M) is an algebraic subgroup of DGal(K /F). Furthermore, this defines an inclusion-reversing bijection between the D-subfields of K containing F and the algebraic subgroups of DGal(K /F). Under this bijection, the normal subgroups correspond to the D-subfields M such that M/F is also a Picard-Vessoit extension, and, in this case, DGal(M/F) ' DGal(K /F)/ DGal(K /M). Proof. See Sections 15–19 of [4]. 5. THE DISCRIMINANT AND THE WRONSKIAN. The resemblance between Galois theory and D-Galois theory is quite striking, but the correspondence between the discriminant of a polynomial and the Wronskian of a linear differential operator is positively uncanny. Definition 5.1. If f is an irreducible polynomial with coefficients in some field F, and E f is its splitting field, containing roots x1 , . . . , xd , then the discriminant of f is4 Y 1( f ) = ± (xi − x j ), i6 = j

where the sign is + if and only if the number of factors is divisible by four. A priori, this looks like an element of E f , but it is clearly fixed by Gal(E f /F), and is therefore Q an element of F by Theorem 3.5. In E f , the discriminant 1( f ) is the square of i< j (xi − x j ) (that is what the sign was for), but it is not necessarily a square in√F. If it is not a square, there is an intermediate field between F and E f , namely F( 1( f )). It is fairly easy to understand the relation between the various Galois groups. √ Proposition 5.2. We have Gal(E f /F( 1( f ))) = Gal(E f /F) ∩ Alt(R f ), where Alt(R f ) ⊂ Perm(R f ) is the subgroup of even permutations. In particular, the discriminant is a square in F precisely when Gal(E f /F) consists entirely of even permutations of the roots of f . Proof. An even permutation σ can be written as a product of an evenQ number of transpositions, and hence it will not change the sign of (and hence fixes) i< j (xi − x j ). 4 For the sake of comparison with the Wronskian, it might be better to define the discriminant in terms of the resultant of f , but we avoid that here because it is longer and more technical.

252


Example 5.3. Consider the polynomial f (x) = x 3 − 2; one can compute that 1(x 3 − 2) = −108. In Example √ 3.8, we saw that Gal(E f /Q) = S3 . Proposition 5.2 then shows that Gal(E f /Q( 1( f ))) = A3 . Of course, we had √ seen this already √ in Example 3.8: −108 = −1 · 22 · 33 so that Q( 1( f )) = Q( −3) = E g , where g(x) = x 2 + x + 1 is the irreducible polynomial giving the cubic root of unity. Let U be a simply connected open subset of C and F a differential subfield of M(U ). The Wronskian of a differential operator L with coefficients in F is best understood by making the differential equation L(u) = 0 into a system of first-order equations as follows. If L(u) = u (n) + αn−1 u (n−1) + · · · + α0 u, then we can represent the equation L(u) = 0 as the matrix equation A L u = u0 , where A L , u, and u0 are the n × n, n × 1, and n × 1 matrices     0   0 u u 0 u 00    u    .. . In−1  , u :=  .  , and u0 :=  .  . A L :=    ..   ..   0 −α0 −α1 . . . −αn−1 u (n−1) u (n) If u 1 , . . . , u n are n linearly independent solutions to L(u) = 0, we define a new n × n-matrix W L whose ith column is ui for i = 1, . . . , n. Then W L satisfies the n × nmatrix differential equation W 0 = A L W . Definition 5.4. The Wronskian of the differential operator L is the function Wr L = det(W L ) of the complex variable t. As the matrix W L depends on the choice of the basis of solutions to L(u) = 0, the Wronskian Wr L is only defined up to a nonzero complex constant. Unfortunately, it would seem from the definition that one would have to know a basis for the space of solutions to L(u) = 0 to compute the Wronskian. That, in general, could be very difficult to come by. However, the following proposition gives us a method of computing the Wronskian just by considering the matrix A L . Proposition 5.5. The Wronskian Wr L satisfies the differential equation u 0 = Tr(A L )u. Hence, we can write Wr L (t) = Wr L (t0 ) exp

Z

t

t0

Tr(A L (s))ds ,

where t0 is any point in U and we integrate along any path from t0 to t in U . In particular, the Wronskian is always contained in a differential extension obtained by adjoining the exponential of an antiderivative. Proof. To see the first fact, we use the Jacobi identity for invertible n × n matrices: (det W )0 = Tr(W 0 · Adj W ), where Adj W is the unique n × n invertible matrix such that W · Adj W = (Adj W ) · W = det(W )In . March 2011]


253

The first part of the proposition now follows immediately since Wr0L = Tr(W L0 · Adj W L ) = Tr(A L W L Adj W L ), = Tr(A L · det(W L )In ),

since W L0 = A L W L , since W L · Adj W L = det(W L )In ,

= Tr(A L · Wr L In ) = Tr(A L ) Wr L . Suppose now that the Wronskian is 0 at t0 ∈ U . Then there is some linear combination w of u 1 , . . . , u n such that w(t0 ) = w 0 (t0 ) = · · · = w (n−1) (t0 ) = 0. However, L(w) = 0, so by the uniqueness theorem for solutions to L(u) = 0, we must have the w is the constant function 0. This violates the linear independence of the solutions u 1 , . . . , u n , so we may conclude that Wr L is nonvanishing on U . Thus, we may divide to get Wr0L = Tr A L Wr L as functions on the simply connected set U . Thus, if we pick any t0 ∈ U , the integral Z t Z t Wr0L (s) ds Tr A L (s) ds = t0 t0 Wr L (s) = log Wr L (t) − log Wr L (t0 ) is independent of the path chosen from t0 to t and the second part of the proposition follows. Again, if the Wronskian is not in the original D-field, this gives an intermediate Dfield extension F ⊂ F(Wr L ) ⊂ E L , and it is not too difficult to understand the effect on the D-Galois groups. Proposition 5.6. After identifying DGal(E L /F) and GL(VL ), we have DGal(E L /F(Wr L )) = DGal(E L /F) ∩ SL(VL ), where SL(VL ) ⊂ GL(VL ) is the subgroup of automorphisms of determinant one. In particular, if the Wronskian is in F, then the D-Galois group is contained in SL(VL ). Proof. Let σ ∈ DGal(E L /F). Then, as discussed above, σ permutes the solutions of L(u) = 0. As W L is determined by these solutions, we see that σ acts on W L by acting on its entries. We will denote this action by σ (W L ). Since σ is a field homomorphism, and as det W L is a sum of products of elements of F, we can conclude σ (Wr L ) = σ (det W L ) = det(σ (W L )). By identifying DGal(E L /F) and GL(VL ), we can find a matrix S ∈ GL(VL ) such that σ (W L ) = W L S.5 Therefore, σ (Wr L ) = det(σ W L ) = det(W L S) = det(W L ) det(S) = det(S) Wr L . In particular, σ fixes Wr L if and only if det S = 1, as claimed. 5 This is not obvious, but it can be checked by writing down matrices with respect to the basis {u , . . . , u }. n 1 Also, notice that this means that the matrix corresponding to σ acts by right multiplication.

254


We illustrate these ideas with an example which is crucial to the proof of Theorem 8.1. Example 5.7. Consider the Airy differential operator L A (u) = u 00 − tu, and let v be a solution to L(u) = 0. Then 0 1 , AL A = t 0 and, if L A (v) = 0, then 0 1 v AL A v = t 0 v0 0 v = tv 0 v , since L A (v) = 0, = v 00

= v0 . We can now use Proposition R 5.5 to compute the Wronskian Wr L A . Since Tr A L A = 0, we have that Wr L A = C exp 0 = C, for some nonzero complex number C. Since C ∈ C(t), we see that DGal(E L A ) is a subgroup of SL2 (C) by Proposition 5.6; determining precisely which subgroup is the content of Theorem 8.1. 6. RADICAL EXTENSIONS AND SOLVABLE GALOIS GROUPS. Recall a major result in your high school algebra class: the quadratic formula. Presuming that you worked mainly in characteristic zero back then, this simple formula allowed you to find the roots of any quadratic polynomial you were given. Not surprisingly, a square root appeared in the solution. This basic setting provides all the intuition needed to continue. Definition 6.1. Suppose that you can find all the roots of an irreducible polynomial f with coefficients in a field F by the following procedure: (i) Choose some element a1 ∈ F and consider the splitting field F1 of x d1 − a1 . (ii) Choose an an element a2 ∈ F1 , and set F2 to be the splitting field of x d2 − a2 . (iii) Continue in this way until you have a field Fi which has all the roots of f . Then, we say that f is solvable by radicals. More complicated (and probably not covered in high school) is the formula for cubics: first, to solve the equation x 3 + ax 2 + bx + c = 0, substitute y = x − a3 to convert the problem to y 3 + py + q = 0, where p = b −

a2 2a 3 ab and q = − + c. 3 27 3

Then we find  y=

−q ±

q q2 + 2

4 p3 27

1/3 

p

−  3

March 2011]

r

p3 −q± q 2 + 427

2


1/3 . 

255

p Note that this is a typical radical extension: first we adjoin a square root, q 2 + 4 p 3 /27, then a cube root of an element of the field generated by the first extension. Definition 6.2. A group G is said to be solvable if there is a chain of subgroups, {1} = G n ⊂ G n−1 ⊂ · · · ⊂ G 0 ⊂ G −1 = G, such that each G j is a normal subgroup of G j−1 and the quotient groups G j−1 /G j are all abelian. Standard examples of solvable finite groups are the symmetric groups S3 and S4 , the latter via the chain {1} ⊂ V ⊂ A4 ⊂ S4 , where V is the Klein four-group and A4 is the alternating group. The groups Sn and An are not solvable, however, for n ≥ 5. The next proposition shows that the similarity in naming is no coincidence. Proposition 6.3. An irreducible polynomial f with coefficients in F is solvable by radicals if and only if Gal(E f /F) is a solvable group. 7. LIOUVILLIAN EXTENSIONS AND SOLVABLE DIFFERENTIAL GALOIS GROUPS. Of course, we can consider radical extensions of a differential field, but they are not the right analog of radical extensions in the context of differential fields. There, the “simple” extensions of a D-field F are those obtained by considering the antiderivative V of an element v ∈ F, and considering the smallest D-field containing F and either V or e V . We will also consider all finite algebraic extensions of F as elementary. Definition 7.1. A Liouvillian extension K of a D-field F is one such that there is a sequence F = F0 ⊂ F1 ⊂ F2 ⊂ · · · ⊂ Fn = K such that each field Fi+1 is either finite algebraic over Fi , or generated by an antiderivative or exponential of an antiderivative of an element of Fi . Notice that if you are thinking of all these fields as subfields of M(U ) for an appropriate U , then it may be necessary to restrict to some U1 ⊂ U : if v has a pole in U with nonzero residue, then there will not be an antiderivative of v defined on all of U , but there will be one on any simply connected subset of U which avoids the poles of v. Consider the extension Fi /Fi−1 ; there are three possibilities for DGal(Fi /Fi−1 ) depending on how the extension is generated: (i) If Fi /Fi−1 is generated by a finite algebraic element, then DGal(Fi /Fi−1 ) is finite. (ii) If Fi /Fi−1 is generated by an antiderivative of an element α ∈ Fi−1 , then we can think of Fi as the splitting field of the linear operator L(u) = u 0 − α. In particular, the solution we use to generate Fi is defined only up to addition of a constant in C. Since DGal(Fi /Fi−1 ) must permute the solutions of L(u) = 0, we see that DGal(Fi /Fi−1 ) ' C (since C has no algebraic subgroups). 256


(iii) If Fi /Fi−1 is generated by the exponential of an antiderivative of an element α ∈ Fi−1 , then we can think of Fi as the splitting field of the linear operator L(u) = u 0 − αu. In particular, the solution we use to generate Fi is defined only up to multiplication by a constant in C∗ , and so DGal(Fi /Fi−1 ) ' C∗ or DGal(Fi /Fi−1 ) is cyclic of order n (corresponding to the algebraic subgroup of the nth roots of unity) and Fi /Fi−1 is an algebraic extension by Proposition 4.6. Example 7.2. Let F ⊂ M(U ), where U ⊂ C is some simply connected open set. Parts (ii) and (iii) above show that if L is a first-order linear operator defined over F, then E L /F is a Liouvillian extension. What can we say about second-order linear operators? The answer requires us to think more about first-order operators. Suppose L(u) = u 0 + αu where α ∈ F. Let β ∈ E L and let v be a solution to the nonhomogeneous equation L(u) = β. Solving for v first requires us to both R multiply sides of the equation v 0 + αv = β by the integrating factor w := exp α . Note that w is contained in a Liouvillian extension of F. This multiplication transforms our equation into (vw)0 = wβ. Integrating and dividing gives Z 1 wβ; v= w in particular, v is contained in a Liouvillian extension of F since w and β are. We now have all we need to consider a second-order linear operator L(u) = u 00 + α1 u 0 + α0 u with α0 , α1 ∈ F. Suppose v satisfies L(v) = 0 and let K ⊂ E L be the smallest D-subfield which contains v and F. It need not be the case the K /F is a Liouvillian extension. However, we can show that E L /K is a Liouvillian extension. Let w be another, linearly independent, solution to L(w) = 0; then, up to a scalar multiple, we have v w Wr L = det = vw0 − wv 0 . v 0 w0 This shows that w satisfies the first-order nonhomogeneous differential equation w0 −

W rL v0 w= , v v

where W L /v is contained in a Liouvillian extension of K . From what we saw above, this shows that E L /K is a Liouvillian extension. It is worth noting that if K /F is a Liouvillian extension, then E L /F will be a Liouvillian extension as well. Proposition 7.3. Any elementary function is contained in some Liouvillian extension of C(t). Proof. The difficulty √ is that the definition of a D-field never mentioned compositions, like esin t or log 1 − t 2 + 1 . But such compositions are contained in Liouvillian extensions. Indeed, any composition will be of the form eu , log u, or sin u. Trigonometric functions are dealt with using Euler’s formulas cos t = March 2011]

eit + e−it 2

and

sin t =

eit − e−it . 2i


257

The exponentials were explicitly included in the definition of Liouvillian extensions, and the logarithm is the antiderivative of u 0 /u. The following proposition says that Liouvillian extensions are the analogs of radical extensions. Proposition 7.4. Let L be a linear differential operator with coefficients in a D-field F, and let G = DGal(E L /F). The D-splitting field E L is (contained in) a Liouvillian extension if and only if G 0 , the connected component of the identity in G, is solvable. Proof. See Sections 25, 26, and 27 of [4]. 8. SOLUTIONS OF EQUATIONS (1) AND (2). We now restate our goals in the new language we have developed over the previous sections: (i) The polynomial f (x) = x 5 − 4x − 2 is not solvable by radicals. (ii) No solution of the differential equation u 0 = t − u 2 is contained in a Liouvillian extension. We begin by showing (i). Recall that a polynomial is solvable by radicals only if the Galois group of its splitting field is a solvable group; that is, it suffices to show that Gal(E f /Q) is not solvable. Since f has degree five, there are five roots defined over C. The Galois group Gal(E f /Q) permutes these roots and is thus (naturally isomorphic to) a subgroup of S5 . If a ∈ C is such that f (a) = 0, then, since f is irreducible, the field Q(a) is an extension of Q of degree five. Since Q(a) ⊂ E f , the degree of E f /Q must be divisible by five. By Theorem 3.5, the order of Gal(E f /Q) is divisible by five as well. By Cauchy’s theorem, there is therefore an element of order five (or 5-cycle) in Gal(E f /Q). Note that f (−2) < 0 < f (−1) and f (0) < 0 < f (2), so that f has real roots a, b, c satisfying −2 < a < −1 < b < 0 < c < 2. By considering derivatives of f , one sees that these are the only real roots. Thus, there are two complex conjugate roots of f . Now consider the action of complex conjugation on the field E f . It is certainly an automorphism, and it fixes Q ⊂ R. It is therefore an element of Gal(E f /Q) ⊂ S5 . We just concluded that three of the roots of f are real, and thus fixed by complex conjugation, and that the two remaining roots are swapped. We can therefore conclude that Gal(E f /Q) contains a 2-cycle as well. That is all we need, however. It is a fact from elementary group theory that a 2cycle and a 5-cycle generate all of S5 . We conclude that Gal(E f /Q) = S5 , and f is not solvable by radicals. We now proceed to our second goal. Up to this point, we have considered only linear differential operators and their solutions. In light of this, our results so far would seem to have little bearing on the solutions to the nonlinear equation u 0 = t − u 2 . However, there is still hope: recall the Airy differential operator L A (u) = u 00 − tu from Example 5.7. It is certainly linear, and, since the function t has no poles, the splitting field E L A is a subfield of the meromorphic functions on C, M(C). 258


Let v ∈ E L A be any function such that L A (v) = 0, and let w = v 0 /v be the logarithmic derivative of v (this is necessarily in the field E L A ). Then differentiation shows that v · v 00 − [v 0 ]2 v2 t · v 2 − [v 0 ]2 , = v2

w0 =

since v 00 − tv = 0 by assumption,

= t − w2 ; that is, w is a solution to u 0 = t − u 2 ! Moreover, we can reverse R this process to see that if w is any function such that w 0 = t − w2 , then v = exp w satisfies L A (v) = 0. Thus, we have a way of relating solutions of u 0 = t − u 2 to solutions of the linear differential equation L A (u) = 0. Theorem 8.1. We have that G = DGal(E L A /C(t)) = SL2 (C). Proof. In Example 5.7, we found that G ⊂ SL2 (C). To prove equality, let G 0 be the connected component of the identity. Then G/G 0 is finite by Theorem 4.4. Since SL2 (C) is 3-dimensional, there are very few proper connected subgroups. In particular, they are all conjugate to one of the following four: 1 0 (i) , 0 1 a 0 ∗ (ii) :a∈C , 0 1/a 1 b (iiii) : b ∈ C , or 0 1 a b ∗ (iv) : a ∈ C ,b ∈ C . 0 1/a Note that for each of these groups, (1 0)t is a common eigenvector. Thus, if G 0 were a proper subgroup of SL2 (C), then all elements of G 0 would have a common eigenvector v ∈ E L A which satisfies L A (v) = 0. Since differentiation commutes with the action of G on E L A , we then see that w = v 0 /v is left fixed by G 0 . Thus, the Picard-Vessiot extension generated by w is a Dsubfield, M say, of E L A and G 0 ⊂ DGal(E L /M) (because G 0 fixes w). In particular, we can apply the Fundamental Theorem of Differential Galois Theory (Theorem 4.9) so that DGal(M/C(t)) ' G/ DGal(E L /M) is a quotient of G by a group containing G 0 ; hence, DGal(M/C(t)) is finite, so that w is an algebraic function by Proposition 4.6 (and so has finitely many poles). However, as we saw above, w0 = t − w 2 since w is the logarithmic derivative of the solution v of L A (u) = 0. For any number t0 < −1 − π/2, the solution w with w(t0 ) = 0 has domain of definition (a, b) with t0 − π/2 < a < b < t0 + π/2, since the solution is above tan(t + t0 ) for t > t0 and beneath tan(t + t0 ) for t < t0 (see Figure 1). Thus, w has at least as many poles as tan(t), which has infinitely many poles. Hence w is not algebraic and our guess that G 0 6 = SL2 (C) is false. March 2011]


259

y

7

–7

7

x

–7 Figure 1. Solutions to w0 (t) = t − w 2 .

Corollary 8.2. No nonzero solution of the Airy equation belongs to a Liouvillian extension of C(t). Proof. By Proposition 7.4, if one nonzero solution (and hence all solutions by Example 7.2) of the Airy equation belonged to a Liouvillian extension, then G 0 = SL2 (C) would be solvable. Since solvability is preserved by quotients (this is an exercise using the definition and the isomorphism theorems for groups), we would also have that PSL2 (C) = SL2 (C)/{± Id} would be solvable. By Theorem 8.4 of [6], PSL2 (C) is simple. Thus, PSL2 (C) is solvable only if it is abelian. One can easily check that this is not the case. We have (finally!) reached our second goal: Corollary 8.3. The differential equation u 0 = t − u 2 has no solutions which belong to a Liouvillian extension of C(t). R Proof. Suppose v is such a solution. Then exp v(t) dt is contained in a Liouvillian extension of C(t) and satisfies the Airy equation. This contradicts the previous corollary. REFERENCES 1. D. Dummit and R. Foote, Abstract Algebra, 3rd ed., John Wiley, Hoboken, NJ, 2004. 2. J. Hubbard and B. West, Differential Equations: A Dynamical Systems Approach, Texts in Applied Mathematics, vol. 5, Springer-Verlag, New York, 1995; corrected reprint of the 1991 edition. 3. J. Humphreys, Linear Algebraic Groups, Graduate Texts in Mathematics, vol. 21, Springer-Verlag, New York, 1975. 4. E. Kolchin, Algebraic matric groups and the Picard-Vessiot theory of homogeneous linear ordinary differential equations, Ann. of Math. (2) 49 (1948) 1–42. doi:10.2307/1969111 5. , Differential Algebra and Algebraic Groups, Pure and Applied Mathematics, vol. 54, Academic Press, New York, 1973. 6. S. Lang, Algebra, 3rd ed., Graduate Texts in Mathematics, vol. 211, Springer-Verlag, New York, 2002. 7. A. Magid, Lectures on Differential Galois Theory, University Lecture Series, vol. 7, American Mathematical Society, Providence, RI, 1994. 8. , Differential Galois theory, Notices Amer. Math. Soc. 46 (1999) 1041–1049.

260


9. J. Ritt, Differential Algebra, American Mathematical Society Colloquium Publications, vol. XXXIII, American Mathematical Society, New York, 1950. 10. M. van der Put and M. Singer, Galois Theory of Linear Differential Equations, Grundlehren der Mathematischen Wissenschaften (Fundamental Principles of Mathematical Sciences), vol. 328, SpringerVerlag, Berlin, 2003. JOHN H. HUBBARD received his undergraduate degree from Harvard University and his doctorate from the Université de Paris-Sud. He is currently a professor at Cornell University and the Université de Provence. His research interests lie in differential equations and complex dynamics. Department of Mathematics, Cornell University, Ithaca, NY 14850 [email protected] BENJAMIN E. LUNDELL received his undergraduate degree from the University of Illinois at UrbanaChampaign and a Certificate of Advanced Study in Mathematics from Cambridge University. He is currently a doctoral candidate at Cornell University studying number theory and arithmetic geometry. Department of Mathematics, Cornell University, Ithaca, NY, 14850 [email protected]

March 2011]


261

NOTES Edited by Ed Scheinerman

Lines of Best Fit for the Zeros and for the Critical Points of a Polynomial Grant Keady Abstract. Combining results presented in two papers in this M ONTHLY yields the following elementary result. Any line of best fit for the zeros of a polynomial is a line of best fit for its critical points.

This note gives a generalization of results on cubic polynomials presented in [1]. Our notation will follow that paper. A line of best fit for a set of points in the plane is defined, as in [1, p. 682], to be a line that minimizes the sum of squares of the perpendicular distances from the points to the line. (Sometimes, elsewhere, such a line is called a “least-squares perpendicular-offsets” line.) Let {z j | 1 ≤ j ≤ n} be a set of n ≥ 2 complex numbers. Define zA =

n 1X zj n j=1

and

Z=

n X (z j − z A )2 .

(1)

j=1

The generalization of [1, Theorem 2.4] is as follows. Theorem 1. Let {z j | 1 ≤ j ≤ n} be a set of n ≥ 3 complex numbers. Let z A and Z be defined as in equation (1). Let p(z) be the monic polynomial of degree n p(z) = z n +

n−1 X j=0

ajz j =

n Y j=1

(z − z j ),

Q 0 and denote the roots of its derivative by z k0 , so that p 0 (z) = n n−1 k=1 (z − z k ). The lines of best fit for the zeros z j of p coincide with the lines of best fit for the zeros z k0 of p 0 , and these lines pass through z A . The zeros z k0 of p 0 are also called the critical points of p. When n = 3 and Z = 0 the points z j form the vertices of an equilateral triangle. When n = 3 and Z 6 = 0, Theorem 1 becomes [1, Theorem 2.4], attributed there to Coolidge. (Closely related theorems are due to Grace, Bôcher, and Siebeck.) Theorem 2 (Coolidge, n = 3). Suppose the complex numbers z 1 , z 2 , z 3 form the verQ tices of a triangle which is nonequilateral. If p(z) = 3j=1 (z − z j ), then the unique line of best fit for the three numbers is the line through the roots of p 0 (z). The first ingredient in the proof of Theorem 1 is the following. doi:10.4169/amer.math.monthly.118.03.262

262


Theorem 3 ([1, Theorem 2.3]). Suppose z j , 1 ≤ j ≤ n, are complex numbers, and z A and Z are as in equation (1) above. (a) If Z = 0, then every line through z A is a line of best fit for the points z 1 , . . . , z n , and these are the only lines of best fit. √ (b) If Z 6= 0, then the line through z A that is parallel to the vector from 0 to Z is the unique line of best fit for z 1 , . . . , z n . The next ingredient inPthe proof is taken from [2]. (See also Newton’s identities.) First note that an−1 = − nj=1 z j = −nz A . Now suppose, as in [2], that z A = 0: there is no loss of generality in this translation of the points of the complex plane. Then an−1 = 0. (It also follows that the coefficient of z n−2 in p 0 (z) is also zero, so (n − 1)z A0 = Pn Pn−1 0 j=1 z j = 0, we find that k=1 z k = 0.) Squaring the equation Z=

n X j=1

z 2j = −2

X 1≤ j π3 we take a charge of min 12 , 2π3 α − 12 from T and move it to x (see Figure 2). We now show that after the redistribution of charges every vertex and every face has a nonnegative charge. Consider a vertex x. Let ` denote the number of internal angles at x that are greater than π3 . As the degree of x is equal to 5 we must have ` > 0. If due to one of these ` angles we transfered a charge of 21 to x, then the charge at x is March 2011]

NOTES

265

min T

1 , 3 2 2π

α−

1 2

α

x

Figure 2. The distribution of a charge from a face T to its vertex x.

nonnegative. Otherwise, denote by α1 , . . . , α` the measures of these ` internal angles at x that are greater than π3 . For i = 1, . . . , ` we transfer a charge of 2π3 αi − 12 to x due P` to αi . Hence the total charge transfered to x is 2π3 αi − 2` . The angle count at x P` P` i=1 π gives 2π ≤ i=1 αi + (5 − `) 3 . This implies i=1 αi ≥ 2π − (5 − `) π3 = π3 (` + 1). We conclude that the total charge transfered to x is at least 2π3 π3 (` + 1) − 2` = 21 . This leaves x with a nonnegative charge. Consider now a face T in M with k ≥ 3 edges. Assume first that T is a bounded face. The initial charge of T is k − 3 ≥ 0. Therefore, if the charge at T becomes negative this implies that one of the internal angles of T is greater than π3 . If k = 3, that is, T is a triangle, then it must be an equilateral triangle, hence having all internal angles equal to π3 . Consequently, the charge of T is unaffected and remains k − 3 = 0. If k = 4, then T is a rhombus. If only two internal angles of T are greater than π3 , then at most a total charge of 1 was deducted from the initial charge of T , leaving its charge nonnegative. If all internal angles of T are greater than π3 , then the total charge deducted from T is at most 2π3 · 2π − 42 = 1, leaving the charge at T nonnegative. If k = 5, then we split into two subcases. If at most four internal angles of T are greater than π3 , then there is a deduction of at most 2 from the initial charge of T , leaving the charge at T nonnegative. If all five internal angles of T are greater than π3 , then as the sum of the internal angles of T is equal to 3π, the total charge deducted from T amounts to 2π3 · 3π − 52 = 2, again leaving the charge at T nonnegative. Finally, if k ≥ 6, then the charge deducted from T is at most k2 leaving T with a charge of at least k − 3 − k2 ≥ 0. It is left to consider the unbounded face S of M. If the number of edges of S is at least 6, we are done as in the case of a bounded face. The cases where the unbounded face consists of at most 5 edges can easily be excluded. Another way to settle this issue is to observe that if S consists of at most 5 edges, then the total charge deducted from S is at most 52 , leaving the charge of S at least − 52 (and in fact at least − 23 ). We still obtain a contradiction as the sum of all charges should be equal to −6, while only the unbounded face may remain with a negative charge, and its charge is not smaller than − 25 . 266


3. CONCLUDING REMARKS. It is interesting to note that Theorem 1 is not true if we consider it on the sphere. A matchstick graph drawn on a sphere is a drawing of the vertices as points on the sphere and edges as great arcs connecting corresponding points, with the property that the lengths of all connecting arcs are equal and no two arcs cross. The example of the icosahedron shows that 5-regular matchstick graphs may exist on a sphere (see Figure 3). In this example the charges of all triangles are negative after the redistribution, since the angle sum of a spherical triangle is strictly larger than π .

Figure 3. Icosahedron on a sphere.

REFERENCES 1. A. Blokhuis, Regular finite planar maps with equal edges (2009), available at http://www.wm. uni-bayreuth.de/fileadmin/Sascha/aartb.pdf. 2. H. Harborth, Match sticks in the plane, in The Lighter Side of Mathematics, R. K. Guy and R. E. Woodrow, eds., MAA Spectrum, Washington, DC, 1994, 281–288. 3. N. Hartsfield and G. Ringel, Pearls in Graph Theory. A Comprehensive Introduction, Academic Press, Boston, 1990. 4. S. Kurz, Fast recognition of planar non-unit distance graphs—searching the minimum 4-regular planar unit distance graph (preprint). Department of Mathematics, University of Bayreuth, Universitätsstr. 30, 95447 Bayreuth, Germany [email protected] Department of Mathematics, Technion–Israel Institute of Technology, Haifa 32000, Israel [email protected]

March 2011]

NOTES

267

A Recursive Scheme for Improving the Original Rate of Convergence to the Euler–Mascheroni Constant Edward Chlebus

Abstract. We have used Euler–Maclaurin summation to develop a recursive scheme for modifying the original approximation for the Euler–Mascheroni constant γ . Convergence to γ resulting from successively employing the proposed scheme has been significantly accelerated while the form of the approximation originally introduced by Euler is still preserved.

1. INTRODUCTION. For n ∈ N let’s denote by γn(E) the difference between a partial sum of the harmonic series sn =

n X 1 k=1

k

=1+

1 1 1 + + ··· + 2 3 n

(1)

and the natural logarithm of n + 1, i.e., γn(E) = sn − ln(n + 1).

(2)

The Euler constant (also known as the Euler–Mascheroni constant) γ was introduced by Euler [6] in 1734 (the paper [6], originally presented to the St. Petersburg Academy of Sciences in 1734, was first published in 1740) as the limit of this difference (see also [7]): γ = lim γn(E) = lim [sn − ln(n + 1)] = 0.5772156649 . . . . n→∞

n→∞

(3)

The Euler–Mascheroni constant is usually defined not by (3) but by a slightly modified formula (see, e.g., [1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 13, 14, 17]) 0

γ = lim γn(E ) = lim (sn − ln n) n→∞

n→∞

(4)

obtained from (3) by replacing γn(E) with the approximation 0

γn(E ) = sn − ln n.

(5)

As one can easily observe, (4) is equivalent to (3) because ln(n + 1) = ln n + ln 1 + n1 and limn→∞ ln 1 + n1 = 0. 0 The lower and upper bounds of γn(E ) − γ given by Young [17] 1 1 0 < γn(E ) − γ < 2(n + 1) 2n

(6)

doi:10.4169/amer.math.monthly.118.03.268

268


0

imply very slow convergence of (4). Replacing γn(E ) in equation (4) with DeTemple’s approximation [5] 1 γn(D) = sn − ln n + (7) 2 significantly speeds up the rate of convergence to γ since 1 1 < γn(D) − γ < . 2 24(n + 1) 24n 2

(8)

0

Accuracy estimates of the approximations γn(E ) and γn(D) are also given in [1, 4, 8, 11, 13, 16]. Although some of them slightly differ from the aforementioned bounds determined by Young and DeTemple, all the authors agree that the errors contributed by (5) and (7) are equal to 2n1 + O n12 and 24n1 2 + O n13 as indicated by (6) and (8), respectively. Negoi [10] has adopted DeTemple’s approach and further accelerated convergence to γ by introducing 1 1 (N ) γn = sn − ln n + + (9) 2 24n 0

in place of γn(E ) in (4) and proving that −

1 1 < γn(N ) − γ < − . 3 48n 48(n + 1)3

(10)

The following key questions naturally arise at this point: • •

What is the origin of the approximations γn(D) and γn(N ) that have such a profound effect on the rate of convergence despite seemingly negligible modifications of (5)? How can we further speed up the rate of convergence to γ ?

Our objective is to address these problems. 2. A RECURSIVE SCHEME FOR ACCELERATING CONVERGENCE OF (4). Using Euler–Maclaurin summation we get the formula for the error of the approxima0 tion γn(E ) [2, 7, 9] ∞

X B2k 1 sn − ln n − γ = − . 2n k=1 2kn 2k

(11)

The Bernoulli numbers B2k determine the terms of this expansion: sn − ln n − γ =

1 1 1 1 1 1 − + − + − + ··· . 2 4 6 8 2n 12n 120n 252n 240n 132n 10 (12)

By employing the Newton–Mercator series for

March 2011]

1 2n

X ∞ 1 (−1)k+1 1 1 1 ln 1 + = = − 2+ − ··· k 2n k(2n) 2n 8n 24n 3 k=1

(13)

NOTES

269

and subtracting (13) from (12), we can eliminate the term 2n1 from the right-hand side of (12): 1 1 1 23 1 sn − ln n 1 + −γ = − + − 2 3 4 2n 24n 24n 960n 160n 5 1 143 1 11 − + +O . (14) − 6 7 8 8064n 896n 30720n n9 The first two terms on the left-hand side of (14) constitute a new approximation of the Euler–Mascheroni constant: 1 (1) γn = sn − ln n 1 + . (15) 2n Its accuracy γn(1) − γ = 24n1 2 + O n13 , explicitly determined by (14), is higher than 0 that of γn(E ) given by (6) and can be easily further increased by applying the Newton– Mercator series again and eliminating the term 24n1 2 from (14) in the same way as 2n1 has been eliminated from (12). This process can be continued until the desired kth approximation is obtained. The kth approximation contributes an error (k) −2k (k) (k) γn(k) − γ = a2k n + a2k+1 n −(2k+1) + a2k+2 n −(2k+2) + · · ·

(16)

(k) (k) (k) depending on n and constant coefficients a2k , a2k+1 , a2k+2 , . . . . Note that −2k, not −(k + 1), is the highest exponent of n for k ≥ 2 in (16). This choice seems to be natural as there are no odd exponents of n (except for the first term) in the original infinite expansion (11–12). It means in practice that for k ≥ 2 each subsequent approximation γn(k) eliminates not one but two succesive terms of the series still left on the right-hand side of the formula for γn(k−1) − γ . Here are the three subsequent approximations directly following γn(1) (15): 1 1 1 (2) γn = sn − ln n 1 + 1− (17) 1+ 2n 24n 2 24n 3 1 1 (3) γn = sn − ln n 1 + 1+ 2n 24n 2 1 143 1 × 1− 1 + 1 − (18) 24n 3 5760n 4 160n 5 1 1 1 143 γn(4) = sn − ln n 1 + 1+ 1 − 1 + 2n 24n 2 24n 3 5760n 4 1 151 1 × 1− 1 − 1 − (19) 160n 5 290304n 6 896n 7

and the respective errors they contribute: γn(2) − γ =

270

143 1 151 1 − − − 4 5 6 5760n 160n 290304n 896n 7 30893 1 + +O 8 6635520n n9

(20)


1 109793 1 151 − + + O 6 7 8 290304n 896n 22118400n n9 109793 1 γn(4) − γ = + O . 22118400n 8 n9 γn(3) − γ = −

(21) (22)

The proposed recursive scheme can be generalized as follows. Theorem. For k ≥ 1 and l ≥ 2k let al(k) be the rational numbers defined as al(1) =

(−1)l − 2l Bl l2l

(23)

for l ≥ 2 and l

al(k+1)

=

al(k)

−

X m∈{2k,2k+1} l≡0 (mod m)

(−1) m +1 am(k)

ml

l m

(24)

for k ≥ 1 and l ≥ 2(k + 1), where Bl are the Bernoulli numbers with B3 = B5 = B7 = · · · = 0. Then we have ∞ X 1 al(1) γn(1) − γ = sn − ln n 1 + −γ = 2n nl l=2

(25)

and " ! !# Y k−1 (l) (l) a 1 a 2l+1 2l γn(k) − γ = sn − ln n 1 + 1 + 2l 1 + 2l+1 2n l=1 n n −γ =

∞ X a (k) l

l=2k

(26)

nl

for k ≥ 2. Corollary. For fixed k ≥ 2 and n → ∞, " ! !# Y k−1 (l) (l) a a 1 2l+1 1 + 2l2l γn(k) − γ = sn − ln n 1 + 1 + 2l+1 −γ 2n l=1 n n (k) a2k 1 = 2k + O . n n 2k+1

(27)

(k) The numbers a2(1) , a3(1) , a4(2) , a5(2) , a6(3) , a7(3) , . . . , a2k can be determined by recursively

March 2011]

NOTES

271

computing successive rows of the following triangular matrix a2(1) a3(1)

(1) a2k

···

a4(2) a5(2)

(2) a2k

···

a6(3) a7(3) · · ·

(3) a2k .. . (k) a2k

consisting of k 2 entries. 3. RESULTS AND CONCLUSIONS. The scheme we have developed for improving the convergence rate to the Euler–Mascheroni constant works well. It is clearly seen from (14), (20–22), and (27) that the order of the proposed approximation γn(k) is O n12k , i.e., for increasing n the accuracy of every subsequent approximation is higher than that of the preceding one. We have empirically validated these theoretical results. Figure 1 shows absolute values of the relative approximation error δn =

γn(k) − γ × 100% γ

(28)

for all the derived approximations (15) and (17–19). Performance of the original approximations (2) and (5), and Negoi’s (9), is also illustrated for comparison.

100 1 0.01

| δ n | [%]

0.0001 1e–006 1e–008 γn(E ) (5) γn(E) (2) γn(1) (15) 1e–012 γ (N) (9) n γn(2) (17) 1e–014 γ (3) (18) n γn(4) (19) 1e–016 5 10 1e–010

15

20 n

25

30

35

Figure 1. Comparison of the rates of convergence to γ for various approximations.

272


The convergence improvement provided by our algorithm is clear. All the resulting approximations (15) and (17–19) work better than the original ones (2) and (5) whose absolute errors are nearly identical. The Negoi approximation γn(N ) successfully com0 petes with γn(E) , γn(E ) , and γn(1) but is outperformed by our remaining approximations (2) (3) (4) γn , γn , and γn . The proposed scheme not only works well but also nicely explains the origin of DeTemple’s and Negoi’s improvements of the original formula for γ . Note that DeTemple’s approximation γn(D) is equivalent to γn(1) whereas Negoi’s approximation γn(N ) is a simplified version of γn(2) which can be expressed as 1 1 1 1 1 1 (2) − − − − . (29) γn = sn − ln n + + 2 24n 48n 2 48n 3 576n 4 1152n 5 Negoi has considered in his formula (9) only the first three terms of the logarithm given in (29). The orders of the original approximation (6) and DeTemple’s approximation (8) are also consistent with those given by Euler–Maclaurin summation, (12) and (14), respectively. In this paper we have analyzed only the first four approximations resulting from employing the proposed algorithm. The rate of convergence to γ can be improved even further if one keeps applying our scheme (27) until the desired approximation accuracy is obtained. 29,844,489,545 decimal digits of the Euler–Mascheroni constant γ are currently known. They were computed for n = 233 in March 2009 by Yee and Chan [15] using the algorithm of Brent and McMillan [3] which is based on the modified Bessel functions and avoids the need for computation of the Bernoulli numbers required for Euler–Maclaurin summation. The error of this algorithm is O(e−8n ) [15]. Our approach, which still employs Euler–Maclaurin summation, is less computationally efficient than the Sweeney [12] or Brent–McMillan [3] methods. The nice feature of the approximations γn(k) , however, is that they preserve the form γn(k) = sn − ln [ f (n)] introduced by Euler (2) with only the original function f (n) = n + 1 being modified in order to improve the rate of convergence to γ . ACKNOWLEDGMENTS. The author wishes to thank the anonymous reviewer for helpful comments.

REFERENCES 1. H. Alzer, Inequalities for the gamma and polygamma functions, Abh. Math. Sem. Univ. Hamburg 68 (1998) 363–372. doi:10.1007/BF02942573 2. T. M. Apostol, An elementary view of Euler’s summation formula, Amer. Math. Monthly 106 (1999) 409–418. doi:10.2307/2589145 3. R. P. Brent and E. M. McMillan, Some new algorithms for high-precision computation of Euler’s constant, Math. Comp. 34 (1980) 305–312. 4. C. -P. Chen, Inequalities for the Euler–Mascheroni constant, Appl. Math. Lett. 23 (2010) 161–164. doi: 10.1016/j.aml.2009.09.005 5. D. W. DeTemple, A quicker convergence to Euler’s constant, Amer. Math. Monthly 100 (1993) 468–470. doi:10.2307/2324300 6. L. Euler, De progressionibus harmonicis observationes, Commentarii academiae scientiarum imperialis Petropolitanae 7 (1740) 150–161; also available at http://www.math.dartmouth.edu/~euler. 7. J. Havil, Gamma: Exploring Euler’s Constant, Princeton University Press, Princeton, 2003. 8. E. A. Karatsuba, On the computation of the Euler constant γ , Numer. Algorithms 24 (2000) 83–97. doi: 10.1023/A:1019137125281 9. D. E. Knuth, Euler’s constant to 1271 places, Math. Comp. 16 (1962) 275–281. 10. T. Negoi, A faster convergence to Euler’s constant, Math. Gaz. 83 (1999) 487–489. doi:10.2307/ 3620963 11. A. Sintamarian, A generalization of Euler’s constant, Numer. Algorithms 46 (2007) 141–151. doi:10. 1007/s11075-007-9132-0

March 2011]

NOTES

273

12. D. W. Sweeney, On the computation of Euler’s constant, Math. Comp. 17 (1963) 170–178. 13. S. R. Tims and J. A. Tyrrell, Approximate evaluation of Euler’s constant, Math. Gaz. 55 (1971) 65–67. doi:10.2307/3613323 14. E. W. Weisstein, Euler–Mascheroni Constant—From MathWorld, A Wolfram Web Resource, http: //mathworld.wolfram.com/Euler-MascheroniConstant.html. 15. A. J. Yee, Large computations (2010), available at http://www.numberworld.org/nagisa_runs/ computations.html. 16. L. Yingying, On Euler’s constant—calculating sums by integrals, Amer. Math. Monthly 109 (2002) 845– 850. doi:10.2307/3072374 17. R. M. Young, Euler’s constant, Math. Gaz. 75 (1991) 187–190. doi:10.2307/3620251 Department of Computer Science, Illinois Institute of Technology, 10 West 31st St., Chicago, IL 60616 [email protected]

A New Construction of a Hyperbola We give a method of locating points on the hyperbola H with equation x 2 /a 2 − y 2 /b2 = 1 using ruler and compass, given the lengths a and b. To construct the right branch of H (the left one is analogous), first draw the vertex V = (a, 0) and the points P = (a, b) and Q = (a, −b). Draw the circle centered at the origin O passing through P and Q, √ which crosses the x-axis at the foci F = (e, 0) and F 0 = (−e, 0), where e = a 2 + b2 . Choose any point C on the segment PQ, except for P and Q. Draw the circle k with center C that passes through V . The x-axis is then one of the two tangents from F to k. Construct the second tangent t from F to k, the tangential point being T (T can be found as the intersection of k with the circle through V with center F). Similarly, draw the tangent u (different from the x-axis) from the left focus F 0 to k, with tangential point U . Finally, consider the intersection X of u and t. The trace of X when C moves on the line segment PQ is the right branch of H .

X

U

u

P C

k F0

O

t T

V

F Q

To check the correctness of the construction, we use the fact that the points X of the right branch of H are characterized by the property d(F 0 , X ) − d(F, X ) = 2a, where d denotes distance. According to our construction we have d(F 0 , U ) = d(F 0 , V ), d(X, U ) = d(X, T ), and d(F, V ) = d(F, T ). Thus we get d(F 0 , X ) − d(F, X ) = d(F 0 , U ) + d(X, U ) − (d(F, T ) + d(X, T ))

= d(F 0 , V ) − d(F, V ) = (e + a) − (e − a) = 2a. —Submitted by Joachim Jäger, Saarbrücken, Germany

274


PROBLEMS AND SOLUTIONS Edited by Gerald A. Edgar, Doug Hensley, Douglas B. West with the collaboration of Mike Bennett, Itshak Borosh, Paul Bracken, Ezra A. Brown, Randall Dougherty, Tamás Erdélyi, Zachary Franco, Christian Friesen, Ira M. Gessel, László Lipták, Frederick W. Luttmann, Vania Mascioni, Frank B. Miles, Bogdan Petrenko, Richard Pfiefer, Cecil C. Rousseau, Leonard Smiley, Kenneth Stolarsky, Richard Stong, Walter Stromquist, Daniel Ullman, Charles Vanden Eynden, Sam Vandervelde, and Fuzhen Zhang.

Proposed problems and solutions should be sent in duplicate to the MONTHLY problems address on the back of the title page. Submitted solutions should arrive at that address before July 31, 2011. Additional information, such as generalizations and references, is welcome. The problem number and the solver’s name and address should appear on each solution. An asterisk (*) after the number of a problem or a part of a problem indicates that no solution is currently available.

PROBLEMS 11558. Proposed by Andrew McFarland, Płock, Poland. Given four concentric circles, find a necessary and sufficient condition that there be a rectangle with one corner on each circle. 11559. Proposed by Michel Bataille, Rouen, France. For positive p and x ∈ (0, 1), define the sequence hxn i by x0 = 1, x1 = x, and, for n ≥ 1, xn+1 =

pxn−1 xn + (1 − p)xn2 . (1 + p)xn−1 − pxn

Find positive real numbers α, β such that limn→∞ n α xn = β. 11560. Proposed by Gregory Galperin, Eastern Illinois University, Charleston, IL, and Yury Ionin, Central Michigan University, Mount Pleasant, MI. (a) The diagonals of a convex pentagon P0 P1 P2 P3 P4 divide it into 11 regions, of which 10 are triangular. Of these 10, five have two vertices on the diagonal P0 P2 . Prove that if each of these has rational area, then the other five triangles, and the original pentagon, all have rational areas. (b) Let P0 , P1 , . . . , Pn−1 , n ≥ 5 be points in the plane. Suppose no three are collinear, and, interpreting indices on Pk as periodic modulo n, suppose that for all k, Pk−1 Pk+1 is not parallel to Pk Pk+2 . Let Q k be the intersection of Pk−1 Pk+1 with Pk Pk+2 . Let αk be the area of triangle Pk Q k Pk+1 , and let βk be the area of triangle Pk+1 Q k Q k+1 . For 0 ≤ j ≤ 2n − 1, let ( α j/2 , if j is even; γj = β( j−1)/2 , if j is odd. Interpreting indices on γ j as periodic modulo 2n, find the least m such that if m consecutive γ j are rational, then all are rational. doi:10.4169/amer.math.monthly.118.03.275

March 2011]


275

11561. Proposed by Cezar Lupu (student), University of Bucharest, Bucharest, Romania. Let f 1 , . . . , f n be continuous real valued functions on [0, 1], none identically zero, R1 such that 0 f i (x) f j (x) d x = 0 if i 6 = j. Prove that !2 n Z 1 n Z 1 Y Y 2 n f k (x) d x ≥ n f k (x) d x , k=1

0

n Z X k=1 n X k=1

0

k=1 1

f k2 (x) d x ≥

0

n Z X k=1

!2

1

f k (x) d x

, and

0

R1

f k2 (x) d x 2 R 2 ≥ n . 1 f (x) d x 0 k 0

11562. Proposed by Pál Péter Dályay, Szeged, Hungary. For positive a, b, c, and z, let 9a,b,c (z) = 0((za + b + c)/(z + 2)), where 0 denotes the gamma function. Show that 9a,b,c (z)9b,c,a (z)9c,a,b (z) is increasing in z for z ≥ 1. 11563. Proposed by Vlad Matei (student), University of Bucharest, Bucharest, Romania. For each integer k ≥ 2, find all nonconstant f in Z[x] such that for every prime p, f ( p) has no nontrivial kth-power divisor.

SOLUTIONS Explaining a Polynomial 11403 [2008, 949]. Proposed by Yaming Yu, University of California–Irvine, Irvine, CA. Let n be an integer greater than 1, and let f n be the polynomial given by n i−1 X Y n f n (x) = (−x)n−i (x + j). i i=0 j=0 Find the degree of f n . Solution by Nicolás Caro, IMPA, Rio de Janeiro, Brazil, and independently by Cosmin Pohoata, Tudor Vianu National College, Bucharest, Romania. The degree of f n is bn/2c. This follows immediately from the stronger statement that the coefficient of x r in f n (x) is the number of derangements of [n] with r cycles, since each cycle must have at least two elements. Here [n] = {1, . . . , n}, and a derangement is a permutation with no fixed points. Let c(n, k) be the number of permutations of [n] with k cycles (the unsigned Stirling number of the first kind). The well-known generating function for these numbers is P Q given by nk=1 c(n, k)x k = n−1 j=0 (x + j) (provable in many ways, including induction on n). Thus n n−` n i X X X X n n n−i k f n (x) = (−x)` c(n − `, k)x k (−x) c(i, k)x = ` i i=0 k=1 `=0 k=0 n n n X r X X X n n ` r = (−1) c(n −`, r −`)x = (−1)` c(n −`, r −`)x r . ` ` r =` `=0 r =0 `=0

276


The coefficient of x r in this expression is precisely the inclusion-exclusion formula to count permutations with r orbits that have no fixed points. The universe is the set of permutations with r orbits, and the ith of the n sets to be avoided is the set of permutations in which element i is a fixed point. Editorial comment. Let d(n, r ) be the number of derangements of [n] with r cycles. The fact that the coefficient of x r in f n (x) is d(n, r ) can also be proved by induction on n using Pascal’s formula and the recurrence d(n, r ) = (n − 1)[d(n − 2, r − 1) + d(n − 1, r )]. O. P. Lossers (and others) gave a short proof of the degree statement by observing that f n (x) is the nth derivative (with respect to t) of the product e−t x (1 − t)−x , evaluated at t = 0. This polynomial appears explicitly in An Introduction to Combinatorial Analysis, chapter 4 section 4, pp. 72–74 by Riordan (Wiley, 1958). It is also mentioned in Advanced Combinatorics, chapter VI, section 6.7, p. 256, by Comtet (Reidel, 1974) with some references to previous non-combinatorial appearances in articles by Tricomi and Carlitz. Also solved by T. Amdeberhan & S. B. Ekhad, R. Bagby, D. Beckwith, R. Chapman (U.K.), P. Corn, P. P. Dályay (Hungary), O. Geupel (Germany), D. Grinberg, J. Grivaux (France), S. J. Herschkorn, E. Hysnelaj (Australia) & E. Bojaxhiu (Albania), D. E. Knuth, O. Kouba (Syria), J. H. Lindsey II, O. P. Lossers ´ Plaza & S. Falcón (Spain), M. A. Prasad (India), R. Pratt, O. G. Ruehr, B. Schmuland (Netherlands), A. (Canada), A. Stadler (Switzerland), J. H. Steelman, R. Stong, R. Tauraso (Italy), M. Tetiva (Romania), BSI Problems Group (Germany), GCHQ Problem Solving Group (U.K.), Microsoft Research Problems Group, NSA Problems Group, and the proposer.

Mean Inequalities 11434 [2009, 463]. Proposed by Slavko Simic, Mathematical Institute SANU, Belgrade, Serbia. Fix n ∈ N with n ≥ 2. Let x1 , . . . , xn be distinct real numbers, and let p1 , . . . , pn be positive numbers summing to 1. Let 3 Pn Pn 3 k=1 pk x k − k=1 pk x k S = P 2 . Pn n 2 3 p x − p x k=1 k k k=1 k k Show that min{x1 , . . . , xn } ≤ S ≤ max{x1 , . . . , xn }. Solution by Jim Simons, Cheltenham, U.K. Consider a probability distribution on the real line that takes value x j with probability p j for 1 ≤ j ≤ n. Write µi0 for the ith moment about 0 and µi for the ith moment about the mean µ01 . Now S=

µ03 − µ01 3 3(µ02 − µ01 2 )

=

µ3 µ3 + 3µ01 µ2 = µ01 + . 3µ2 3µ2

From this we obtain inequalities stronger than those proposed: 1 2 1 2 min{x1 , . . . , xn } + µ01 ≤ S ≤ max{x1 , . . . , xn } + µ01 . 3 3 3 3 Also solved by R. Bagby, R. Chapman (U.K.), M. P. Cohen, W. J. Cowieson, P. P. Dályay (Hungary), H. Dehghan (Iran), P. J. Fitzsimmons, D. Grinberg, E. A. Herman, T. Konstantopoulos (U.K.), O. Kouba (Syria), J. H. Lindsey II, O. P. Lossers (Netherlands), J. Posch, K. Schilling, B. Schmuland (Canada), R. Stong, M. Tetiva (Romania), B. Tomper, BSI Problems Group (Germany), GCHQ Problem Solving Group (U.K.), Microsoft Research Problems Group, and the proposer.

March 2011]


277

A Circumradius Equation 11443 [2009, 548]. Proposed by Eugen Ionascu, Columbus State University, Columbus, GA. Consider a triangle ABC with circumcenter O and circumradius R. Denote the distances from O to the sides AB, BC, CA, respectively, by x, y, z. Show that if ABC is acute then R 3 − (x 2 + y 2 + z 2 )R = 2x yz, and (x 2 + y 2 + z 2 )R − R 3 = 2x yz otherwise. Solution by Philip Benjamin, Berkeley College,Woodland Park, NJ. We first prove the identity 1 − cos2 A + cos2 B + cos2 C = 2 cos A cos B cos C. (∗) Indeed, C = π − (A + B), so cos C = − cos(A + B) = sin A sin B − cos A cos B. Isolating sin A sin B and squaring yields cos2 A cos2 B + 2 cos A cos B cos C + cos2 C = sin2 A sin2 B = 1 − cos2 A − cos2 B + cos2 A cos2 B. This simplifies to (∗). Let the side lengths a, b, and c be opposite angles A, B, and C, respectively, so a/ sin A = b/ sin B = c/ sin C = 2R. The perpendicular from O to side AB bisects AB, so we have a right triangle with side lengths x, c/2, and R. Since c = 2R sin C, we conclude that x = R|cos C|. Similarly y = R|cos A| and z = R|cos B|. If 4ABC is acute, then the three cosines are positive, so multiplying (∗) by R 3 produces the desired result. Otherwise, say angle C is right or obtuse. Now x = −R cos C and the other two cosines are positive. Again, multiplying (∗) by R 3 produces the desired result. Editorial comment. A similar problem was proposed in Crux Mathematicorum with Mathematical Mayhem, December, 2008, Problem 3395. Also solved by A. Alt, H. Bailey, M. Bataille (France), D. Beckwith, R. Chapman (U.K.), L. Csete (Hungary), C. Curtis, P. P. Dályay (Hungary), P. De (India), D. Fleischman, V. V. Garc´ıa (Spain), M. Garner & J. Zacharias, O. Geupel (Germany), M. Goldenberg & M. Kaplan, M. R. Gopal, D. Gove, J.-P. Grivaux (France), L. Herot, J. G. Heuver (Canada), E. Hysnelaj (Australia) & E. Bojaxhiu (Germany), Y. K. Jeon (Korea), G. A. Kandall, Y. H. Kim (Korea), L. R. King, B. Klotzsche, T. Konstantopoulos (U.K.), O. Kouba (Syria), K.-W. Lau (China), J. C. Linders (Netherlands), J. H. Lindsey II, O. P. Lossers (Netherlands), J. McHugh, J. Minkus, J. H. Nieto (Venezuela), P. Nüesch (Switzerland), J. Oelschlager, M. Omarjee (France), J. Posch, C. R. & S. Selvaraj, R. A. Simon (Chile), J. Simons (U.K.), R. Stong, M. Tetiva (Romania), B. Tomper, Z. Vörös (Hungary), M. Vowe (Switzerland), H. Widmer (Switzerland), S. Xiao (Canada), Con Amore Problem Group (Denmark), GCHQ Problem Solving Group (U.K.), NSA Problems Group, and the proposer.

Extrema 11449 [2009, 647]. Proposed by Michel Bataille, Rouen, France. (corrected) Find the maximum and minimum values of (a 3 + b3 + c3 )2 (b2 + c2 )(c2 + a 2 )(a 2 + b2 ) given that a + b ≥ c > 0, b + c ≥ a > 0, and c + a ≥ b > 0. Solution by Jim Simons, Cheltenham, U.K. Call this big expression X . Since X is homogeneous, we may assume a 2 + b2 + c2 = 1. The feasible region then consists of a triangular patch on the positive octant of the unit sphere, excluding the vertices (where one of a, b, c is zero), but including the interiors of the sides (where two of a, b, c are equal). Using spherical polar coordinates, we may set (a, b, c) = 278


(cos α, sin α cos θ, sin α sin θ), where, since a, b, c are positive, θ is uniquely determined and 0 < θ < π/2. Now 2 cos3 α + sin3 α(cos3 θ + sin3 θ ) X= sin2 α cos2 α + sin2 α sin2 θ cos2 α + sin2 α cos2 θ 2 cos3 α + sin3 α(cos3 θ + sin3 θ ) . = sin2 α cos4 α + cos2 α sin2 α + sin4 α sin2 θ cos2 θ If f (θ ) = cos3 θ + sin3 θ , then f 0 (θ) = 3 cos θ sin θ (sin θ − cos θ ). In the feasible region for θ , this is positive for θ > π/4 and negative for θ < π/4. Thus f , and with it, the numerator of X for fixed α, is less at θ = π/4 than at any other feasible θ . Similarly, if g(θ) = sin2 θ cos2 θ , g, and with it, the denominator of X for fixed α, is increasing in θ for θ < π/4 and decreasing in θ for θ > π/4. Thus X is, for fixed α, smallest at θ = π/4, and greatest at an edge of the feasible region. By symmetry, the minimum value of X is 9/8, attained when a = b = c. From the foregoing, the maximum value of X on the closure of the feasible region occurs at a point where, with respect to any translation into spherical coordinates, θ is extremal. The only such points are the corners of the region. At (a, b, c) = (2−1/2 , 2−1/2 , 0), X = 2. However, this maximum is not attained because these corners are not in the feasible region. Also solved by R. Agnew, A. Alt, R. Bagby, D. Beckwith, H. Caerols & R. Pellicer (Chile), R. Chapman (U.K.), H. Chen, C. Curtis, Y. Dumont (France), D. Fleischman, J.-P. Grivaux (France), E. A. Herman, F. Holland (Ireland), T. Konstantopoulos (U.K.), O. Kouba (Syria), A. Lenskold, J. H. Lindsey II, P. Perfetti (Italy), N. C. Singer, R. Stong, T. Tam, R. Tauraso (Italy), M. Tetiva (Romania), E. I. Verriest, Z. Vörös (Hungary), S. Wagon, G. D. White, GCHQ Problem Solving Group (U.K.), Microsoft Research Problems Group, and the proposer.

Max Min Coordinate Difference 11450 [2009, 647]. Proposed by Cosmin Pohoata (student), National College “Tudor Vianu,” Bucharest, Romania. Let A be the unit ball in Rn . Find max a∈A

min ai − a j .

1≤i< j≤n

Solution by Omran Kouba, Higher Institute for Applied Sciences and Technology, Damascus, Syria. Let Mn denote the desired maximum. It is implicit in the statement p of the problem that n ≥ 2. We show that Mn = 12/(n(n 2 − 1)). Let (a1 , . . . , an ) be an element of A at which the maximum is achieved, and let Mn = min{|ai − a j | : 1 ≤ i < j ≤ n}. There is a permutation σ of the set {1, 2, . . . , n} such that aσ (1) ≤ aσ (2) ≤ · · · ≤ aσ (n) . Write for simplicity b j = aσ ( j) . For j > i, we then have

b j − bi =

j X

(bk − bk−1 ) ≥ ( j − i)Mn .

k=i+1

March 2011]


279

From this we conclude that |b j − bi | ≥ | j − i| Mn for 1 ≤ i, j ≤ n. Therefore X X X Mn2 ( j − i)2 ≤ (b j − bi )2 = (a j − ai )2 1≤i, j≤n

1≤i, j≤n

=

X

1≤i, j≤n

(a 2j + ai2 − 2ai a j )

1≤i, j≤n

≤ 2n

n X

n X

ak2 − 2

k=1

since

Pn

k=1

!2 ≤ 2n,

ak

k=1

ak2 ≤ 1 when (a1 , . . . , an ) ∈ A. On the other hand, X

( j − i) = 2n 2

1≤i, j≤n

n X

2

k −2

k=1

n X

!2 k

=

k=1

n 2 (n 2 − 1) . 6

p It follows that Mn2 ≤ 12/(n(n 2 − 1)), so Mn ≤ 12/(n(n 2 − 1)). Conversely, if we consider (a1(0) , a2(0) , . . . , an(0 ) defined by s 12 n+1 (0) ak = k− , k = 1, 2, . . . , n, n(n 2 − 1) 2 then (a1(0) , . . . , an(0) ) ∈ A and s = min ai(0) − a (0) j

1≤i< j≤n

Thus Mn ≥

12 . − 1)

n(n 2

p 12/(n(n 2 − 1)).

Editorial comment. Marian Tetiva (Romania) notes that a stronger form of this problem appeared as Problem E2032, this M ONTHLY 76 (1969) 691–692, proposed by D. S. Mitrinović. See also Problem 3.9.9 in Mitrinović, Analytic Inequalities (SpringerVerlag, 1970). Also solved by A. Alt, R. F. de Andrade, M. R. Avidon, R. Bagby, D. Beckwith, J. Cade, R. Chapman (U.K.), L. Comerford, W. J. Cowieson, P. P. Dályay (Hungary), A. Fielbaum (Chile), D. Fleischman, O. Geupel (Germany), J.-P. Grivaux (France), E. A. Herman, A. Ilić (Serbia), T. Konstantopoulos (U.K.), J. Kuplinsky, J. H. Lindsey II, O. P. Lossers (Netherlands), M. D. Meyerson, D. Ray, K. Schilling, B. Schmuland (Canada), J. Simons (U.K.), R. Stong, M. Tetiva (Romania), E. I. Verriest, GCHQ Problem Solving Group (U.K.), NSA Problems Group, and the proposer.

A Cauchy–Schwarz Puzzle 11458 [2009, 747]. Proposed by Cezar Lupu (student), University of Bucharest, Bucharest, Romania, and Vicent¸iu R˘adulescu, Institute of Mathematics “Simon Stoilow” of the Romanian Academy, Bucharest, Romania. Let a1 , . . . , an be nonnegative and let r be a positive integer. Show that  2 n X i r j r ai a j X X i r j r k r ai a j ak   ≤ m r −1 am . i + j −1 i + j +k−2 1≤i, j≤n m=1 1≤i, j,k≤n 280


Solution by Francisco Vial, Pontificia Universidad Católica de Chile, SantiPn student, i r ai x i−1 , so ago, Chile. Let f (x) := i=1

Z

n X

1

Z

f (x) d x = 0

m=1

1

Z

f 2 (x) d x =



 X

i r j r ai a j x i+ j−2  d x =



0

Z

1

m r −1 am ,

0

1

1

Z

f 3 (x) d x =

1≤i, j≤n



 X 

0

0

X i r j r ai a j , i + j −1 1≤i, j≤n

X i r j r k r ai a j ak x i+ j+k−3  d x = 1≤i, j,k≤n

1≤i, j,k≤n

and

i r j r k r ai a j ak . i + j +k−2

The stated inequality is equivalent to Z

1

f 2 (x) d x

2

1

Z

f (x) d x

≤

1

f 3 (x) d x ,

0

0

0

Z

which follows by applying the Cauchy–Schwarz inequality to f (x)1/2 and f (x)3/2 . Remarks. Because a1 , . . . , an are nonnegative, f (x) is nonnegative and continuous on [0, 1], so f (x)1/2 and f (x)3/2 are real and well defined. The parameter r need not be an integer. Also solved by M. R. Avidon, R. Chapman (U.K.), P. P. Dályay (Hungary), D. Grinberg, O. Kouba (Syria), O. P. Lossers (Netherlands), J. Simons (U.K.), R. Stong, GCHQ Problem Solving Group (U.K.), and the proposers.

An Orthocenter Inequality 11461 [2009, 844]. Proposed by Panagiote Ligouras, Leonardo da Vinci High School, Noci, Italy. Let a, b, and c be the lengths of the sides opposite vertices A, B, and C of an acute triangle. Let H be the orthocenter. Let da be the distance from H to side BC, and similarly for db and dc . Show that 1 2 ≥ da + db + dc 3

3 abc

1 1 1 √ +√ +√ ca bc ab

1/4

.

Solution by Michael Vowe, Fachhochschule Nordwestschweiz, Muttenz, Switzerland. Let R be the circumradius, r the inradius, F the area, and s the semiperimeter. From da = 2R cos B cos C, db = 2R cos C cos A, dc = 2R cos A cos B, we obtain r da + db + dc = 2R cos A cos B + cos B cos C + cos C cos A ≤ 2r 1 + R (see 6.10, p. 181, in D. Mitrinovic et al., Recent Advances in Geometric Inequalities, Dordrecht, 1989). From Jensen’s inequality for concave functions (here, the square root), we have 1 1 1 +√ +√ ≤3· √ ca ab bc March 2011]

s r 1 1 1 1 6s + + = . 3 ab bc ca abc


281

From abc = 4R F = 4Rr s and s 2 ≥ 27r 2 (6.1, p. 180, ibid.) we get !1/4 √ 1/4 2 1 2 3 1 1 3 6s ≤ √ +√ +√ 3 abc 3 (abc)3/2 ca bc ab !1/4 √ 2 3 6 1 1 1 23/8 ≤ · · 3/8 · 5/8 . = √ 3/2 3 (4Rr ) 3 R r 3 3r Thus it suffices to prove 1 23/8 1 1 ≥ · 3/8 · 5/8 . r 3 R r 2r 1 + R Writing x = r/R, this means we must prove x 3/8 (1 + x) ≤ 3/(2 · 23/8 ) for 0 < x ≤ 1/2. The function f (x) = x 3/8 (1 + x) is increasing on [0, 1/2], though, and we are done. Equality holds only if x = 1/2, or equivalently, R = 2r , which makes the triangle equilateral. Also solved by P. P. Dályay (Hungary), O. Faynshteyn (Germany), K.-W. Lau (China), C. R. Pranesachar (India), R. Stong, GCHQ Problem Solving Group (U.K.), Microsoft Research Problems Group, and the proposer.

An Erroneous Claim 11465 [2009, 845]. Proposed by Pantelimon George Popescu, Polytechnic University of Bucharest, Bucharest, Romania, and José Luis D´ıaz-Barrero, Polytechnic University of Catalonia, Barcelona, Spain. Consider three simple closed curves in the plane, of lengths p1 , p2 , and p3 , enclosing areas A1 , A2 , and A3 , respectively. Show that if p3 = p1 + p2 and A3 = A1 + A2 , then 8π A3 ≤ p32 . Solution by the Texas State University Problem Solvers Group, San Marcos, TX. The problem as stated is false. Consider the following counterexample. Let the first curve be a square of side 1, so p1 = 4 and A1 = 1. Let the second curve be a square √ with p2 = 40 and√A2 = 100. Let the third curve be a rectangle with sides 11 + 2 5 and 101/(11 + 2 5 ) so that p3 = 44 and A3 = 101. These three curves fulfill the requirements of the problem, and yet 8π A3 > p32 . Let us incorporate the additional requirement that p12 + p22 = 2 p1 p2 . Then the required inequality can be proved as follows. The isoperimetric inequality applied to any of the curves is p 2 i Ai ≤ π , 2π and thus 4π Ai ≤ pi2 . Therefore 4π A3 = 4π A1 + 4π A2 ≤ p12 + p22 . With the newly-added condition we get 8π A3 = 8π A1 + 8π A2 ≤ 2 p12 + 2 p22 = p12 + p22 + 2 p1 p2 = ( p1 + p2 )2 = p32 . Also solved by G. Apostolopoulos (Greece), R. Bagby, B. Burdick, R. Chapman (U.K.), W. J. Cowieson, P. P. Dályay (Hungary), J.-P. Grivaux (France), K. Hanes, J. H. Lindsey II, M. D. Meyerson, J. Minkus, J. Simons (U.K.), R. Stong, and the Microsoft Research Problems Group.

282


REVIEWS Edited by Jeffrey Nunemacher Mathematics and Computer Science, Ohio Wesleyan University, Delaware, OH 43015

Logical Labyrinths. By Raymond M. Smullyan. A K Peters, Ltd., Wellesley, MA, 2009, vii + 327 pp., ISBN 978-1-56881-443-8, $49.00.

Reviewed by Christopher C. Leary It is the pervading law of all things organic and inorganic, of all things physical and metaphysical, of all things human and all things superhuman, of all true manifestations of the head, of the heart, of the soul, that the life is recognizable in its expression, that form ever follows function. This is the law. [1] —Louis Sullivan

Since I entered this world late in the Eisenhower administration, some of my earliest introductions to the field of logic were through Raymond Smullyan’s work. I truly admire the authors among us who have the talent to open a field to others in an interesting and non-threatening manner, and Smullyan has that talent to spare. His logical puzzle books, from What is the Name of this Book? to Forever Undecided and beyond have introduced thousands to the delights of formal logic, and his variations on knights and knaves puzzles have been canonical material in logic classes for the past three decades, and will be for decades to come. Logical Labyrinths thus begins in familiar territory with the author guiding us into propositional logic through informal reasoning and logic conundrums. This volume, however, has a more ambitious goal than the author’s more popular works. Smullyan says in the preface that it is “. . . a bridge from all my previous recreational puzzle books to all my technical writing in the fascinating field of symbolic logic.” So this book aims to be more of a textbook, helping the reader move from an intuitive understanding of logical arguments to a mastery of specialized results that is on a par with that of firstyear graduate students. This is a daunting goal for a book of 320 pages, no matter who the author. Although there is much to praise about this work, and indeed parts of it are brilliant, I am left at the end with a feeling of disappointment. The overly ambitious reach of the book seems to have left it without a core audience to whom the author can pitch his prose. Smullyan treats the advanced topics at the conclusion of his journey in much the same way that he treats the introductory material. Unfortunately, Smullyan’s characteristic style, which works so well at the beginning of the book, does not seem to fit as well at the end. In my opinion this makes Logical Labyrinths somewhat unsatisfactory. Think about your ideal mathematics class. Maybe for you it is a first proofs course, or an analysis course, or the high school geometry class that your chair has not let you teach for the last fifteen years. Whatever the subject is, now think about the students in that class. Whether they are advanced students or raw beginners, most likely you have thought of these students as bright, inquisitive, somewhat self-motivated, and eager to join you as you explore the subject matter at hand. You are looking forward to sharing your love of your field with them, exploring the main ideas as well as showing them doi:10.4169/amer.math.monthly.118.03.283

March 2011]

REVIEWS

283

(or having them discover) some of your favorite little gems that are, perhaps, off the beaten path. You smile with anticipation as you look forward to the end of the class, knowing that you will have inspired them, that they will take with them at least the major points of the class, and most of all that they will remember their enthusiasm as, together, you explored this wonderful area of mathematics. This ideal student in this ideal class is the perfect audience for any author, but perhaps more so for Smullyan and his distinctive method of introducing logic. Much is demanded from the reader of this text. For without a doubt, Smullyan is not so much interested in laying out the foundations of logic as in having the reader discover the foundations of logic. His initial puzzles about liars and truth-tellers, insane liars and spies are set out with a dual purpose: certainly they set the stage for a somewhat more formal development of propositional logic in later chapters, but also they are designed to engage the reader, to entice her into the subject, and to get her to think, and think hard, about the questions posed by the puzzle. And the ideal student of the last paragraph will do just that, taking each question as a challenge and putting thought and time into each puzzle in order to build her skills and insight as she reads. Of course, that’s not what I did. The kids needed to get to school, there were groceries to buy, I had to get to work, and the car needed to go to the shop. And let’s not fool ourselves—I have better time management skills and fewer demands on my time than most undergraduates, many of whom have to do all of that as well as find time to write papers and do the reading for three or four other classes. So what did I do? After I read one of Dr. Smullyan’s puzzles, I thought about it for maybe five seconds and then I skipped ahead a couple of pages to the solution, all nicely laid out for me in black and white, thereby undermining the developmental plan that the author has crafted in his text. Dana Scott, in his endorsement on the back cover of Logical Labyrinths, says it nicely: “Suitable for self-study, the book will be most valuable if the readers agree not to peek at the solutions before learning to think for themselves!” Unfortunately, if I were considering this book as a text, it would be hard for me to believe that the students, even the students in my ideal class, would be so restrained as to not peek. We all know that reading mathematics is an active process. All authors whose goals include engaging the reader in active reading have the problem of how to encourage their readers to work through the material while, at the same time, providing the information in the text that they wish to convey. After posing a problem that the reader should ponder, the author faces this challenge: find the best way to provide a solution for the reader who either thinks that she has an answer and would like to know if it is correct, or has tried and is stymied, or is just too lazy to put in the effort but has to get this chapter finished before 9 a.m. tomorrow, and the author has to do all of this without undermining the active reading that is desired. George Exner, in his invaluable little volume An Accompaniment to Higher Mathematics [1], uses the symbol

284


to encourage the readers of his book to stop, pick up a pencil and some paper, and write something down, explicitly encouraging the active engagement of the reader. At the end of the text he provides hints, but no solutions, to his problems, exercises, and explorations. Jim Henle’s An Outline of Set Theory [2] follows a similar plan, laying out what is essentially an inquiry-based learning script in the first third of the book, hints for the exercises in the middle third, and solutions in the final third of the book. When I taught set theory using this text, I distributed large paper clips and told the students to paper clip the last part of the book shut until the end of the term. This was a reasonably successful experiment, at least to the extent that several of the students assured me that they had not, in fact, looked at the solutions during the course of the semester. In some sense, the author would like there to be some small cost involved in getting the answer, a cost that will encourage the reader to put forth the effort needed to receive maximal benefit from the problem that is posed. Perhaps today’s technology can help us with the problem of encouraging active reading while still providing the information that is needed. For example, authors and publishers could provide a web-based answer key. I can envision a system where users would register and references to answers would cost some nominal amount of real money that would be donated to a charity of the user’s choice, like the Red Cross or the MAA. Charge people 20c per answer with a limit of $10.00 per book, and the bar would be set high enough that readers would be encouraged to think about questions before running to get the answers, and we could do some good in the world, as well. Returning to the book at hand, my personal feeling is that Smullyan’s cost for finding the solutions to his puzzles is too low. At least it is too low to prevent me from cheating myself. I don’t know why flipping forward three pages makes it more likely that I will peek than looking at the back of the book, but I have the strong feeling that it does, and this makes me feel that the puzzle-book style that Professor Smullyan has used so effectively in his popular books is not as effective in a potential textbook. The list of topics covered in Logical Labyrinths is impressively, almost dauntingly, long. Starting from scratch, Smullyan brings us through propositional and first-order logic, proof tableaux, enough set theory to get to König’s infinity lemma, completeness, compactness, interpolation, and Beth’s definability theorem. Certainly a student who grasps all of these arguments and understands the import of these theorems will have gained a solid understanding of first-order logic, although arguably logic examined more from a philosopher’s point of view than from a mathematician’s point of view. I must say that I found the selection of advanced topics to be somewhat idiosyncratic. Incompleteness is only lightly covered in the last chapter (more on this later), and very little is said about the interplay between logic and computer science. These seem to be opportunities that the author chose to ignore, and I feel that the book ends up being slightly unbalanced because of these choices. There are parts of the book that just sing. I was very taken with Professor Smullyan’s treatment of cardinality and Cantor’s theorem, and the technical explanation of the tableaux method and the subsequent proofs of its completeness and correctness are compelling. Unfortunately, the end of the book was, for me, problematic. Being forewarned that the book was about generalization, I was not surprised to have the final sections be about a grand unification theorem for the major points of the book, but at the end of the day I wasn’t convinced that I cared. Every time I read a book I expect that the author had a reason for taking the trouble to write it. One of the things that I like an author to do is to have a point of view, to explain that point of view, and to argue that his or her point of view is correct. Smullyan states that his goal in this book is to bring beginners to an understanding of research in symbolic logic. That is a fine goal, and he does an admirable job of achieving it. But March 2011]

REVIEWS

285

I do not feel that he ever told me why the reader should be interested in the advanced results that he presents. In many logic texts (including the one that I wrote) there seems to be a natural stopping point after the incompleteness theorems of Gödel, where the implications of the theorems are easy to explain and defend. Smullyan detours around this point, however, and at the end of Logical Labyrinths I find myself wondering why I should care about the last few chapters. Where were the examples and the motivation? Chapter 23 proves Craig’s interpolation lemma, promising us important applications in the next two chapters. Chapter 24 uses the interpolation lemma to prove Robinson’s consistency theorem and Chapter 25 proves Beth’s definability theorem, but never are we given a clear idea about the reason that these two results are important. Perhaps their utility is supposed to be obvious, but I wanted more explanation from the text. At some point the author needs to inspire us, not just challenge us and quiz us. The back cover of Logical Labyrinths contains the usual mix of a publisher’s synopsis and ringing endorsements from respected reviewers. The late Martin Gardner correctly tells us, “It’s a volume that only Ray could have written . . . .” The puzzles, the jokes, and the tone of the book are vintage Smullyan, both in the introductory chapters and in the more advanced material. Unfortunately, this style does not seem to work as well late in the book as it does early in the book. The immediate attraction of the logic puzzles and paradoxes combined with the engaging style and wit of the author make the early chapters of Logical Labyrinths compelling reading, but that charm is lost by the end of the book. Although the form is consistent throughout, I believe that for most readers the function of the later chapters of this text needs to be different than the function of the earlier chapters. By losing sight of Sullivan’s dictum, Professor Smullyan lets us down. REFERENCES 1. G. R. Exner, An Accompaniment to Higher Mathematics, Springer-Verlag, New York, 1996. 2. J. M. Henle, An Outline of Set Theory, Springer-Verlag, New York, 1986. 3. L. H. Sullivan, The tall office building artistically considered, Lippincott’s Magazine series 57 (1896) 403–409. SUNY Geneseo, Geneseo NY 14454 [email protected]

286


Methods For Euclidean Geometry Owen Byer, Felix Lazebnik, and Deirdre L. Smeltzer Series: Classroom Resource Materials Methods for Euclidean Geometry explores one of the oldest and most beautiful of mathematical subjects. The book begins with a thorough presentation of classical solution methods for plane geometry problems, but its distinguishing feature is the subsequent collection of methods which have appeared since 1600. For example, the coordinate method, which is a central part of the book, has been part of mathematics for four centuries. However, it has rarely served as a tool that students consider using when faced with geometry problems. The same holds true regarding the use of trigonometry, vectors, complex numbers, and transformations. The book presents each of these as self-contained topics, providing examples of their applications to geometry problems. Both strengths and weaknesses of various methods, as well as the ranges of their effective applications, are discussed. Importance is placed on the problems and their solutions. The book contains numerous problems of varying difficulty; over a third of its contents are devoted to problem statements, hints, and complete solutions. The book can be used as a textbook for geometry courses; as a source book for geometry and other mathematics courses; for capstone, problem-solving, and enrichment courses; and for independent study courses. 480 pp., 2010, ISBN 978-0-88385-763-2, Hardbound List: $69.95 MAA Member: $55.95 Catalog Code: MEG

To order visit us online at www.maa.org or call us at 1-800-331-1622.

New from the MAA A Historian Looks Back The Calculus as Algebra and Selected Writings Judith V. Grabiner Judith Grabiner, the author of A Historian Looks Back, has long been interested in investigating what mathematicians actually do, and how mathematics actually has developed. She addresses the results of her investigations not principally to other historians, but to mathematicians and teachers of mathematics. This book brings together much of what she has had to say to this audience. The centerpiece of the book is The Calculus as Algebra: J.-L. Lagrange, 17361813. The book describes the achievements, setbacks, and influence of Lagrange’s pioneering attempt to reduce the calculus to algebra. Nine additional articles round out the book describing the history of the derivative; the origin of delta-epsilon proofs; Descartes and problem solving; the contrast between the calculus of Newton and Maclaurin, and that of Lagrange; Maclaurin’s way of doing mathematics and science and his surprisingly important influence; some widely held “myths” about the history of mathematics; Lagrange’s attempt to prove Euclid’s parallel postulate; and the central role that mathematics has played throughout the history of western civilization. The development of mathematics cannot be programmed or predicted. Still, seeing how ideas have been formed over time and what the difficulties were can help teachers find new ways to explain mathematics. Appreciating its cultural background can humanize mathematics for students. And famous mathematicians’ struggles and successes should interest—and perhaps inspire—researchers. Readers will see not only what the mathematical past was like, but also how important parts of the mathematical present came to be.

To order visit us online at www.maa.org or call us at 1-800-331-1622. 282 pp., Hardbound, 2010 List: $62.95

Catalog Code: CAGH MAA Member: $49.95

ISBN 978-0-88385-572-0

New from the MAA Functions, Data and Models An Applied Approach to College Algebra Sheldon P. Gordon & Florence S. Gordon Series: MAA Textbooks This is a college algebra-level textbook written to provide the kind of mathemati- 504 pp., Hardbound 2010 ISBN 978-0-88385-767-0 cal knowledge and experiences that stuList: $69.95 dents will need for courses in other fields, MAA Member: $55.95 such as biology, chemistry, business, fiCatalog Code: COS nance, economics, and other areas that are heavily dependent on data either from laboratory experiments or from other studies. The focus is on the fundamental mathematical concepts and the realistic problem-solving via mathematical modeling rather than the development of algebraic skills that might be needed in calculus. Functions, Data, and Models presents college algebra in a way that differs from almost all college algebra books available today. Rather than going over material covered in high school courses the Gordons teach something new. Students are given an introduction to data analysis and mathematical modeling presented at a level that students with limited algebraic skills can understand. The book contains a rich set of exercises, many of which use real data. Also included are thought experiments or what if questions that are meant to stretch the student’s mathematical thinking.

To order visit us online at www.maa.org or call us at 1-800-331-1622.

MATHEMATICAL ASSOCIATION OF AMERICA

1529 Eighteenth St., NW • Washington, DC 20036

New title by the MAA Counterexamples in Calculus Sergiy Klymchuk As a robust repertoire of examples is essential for students to learn the practice of mathematics, so a mental library of counterexamples is critical for students to grasp the logic of mathematics. Counterexamples are tools that reveal incorrect beliefs. Without such tools, learners’ natural misconceptions gradually harden into convictions that seriously impede further learning. This slim volume brings the power of counterexamples to bear on one of the largest and most important courses in the mathematics curriculum. —Professor Lynn Arthur Steen, St. Olaf College, Minnesota, USA, Co-author of Counterexamples in Topology

Counterexamples in Calculus serves as a supplementary resource to enhance the learning experience in single variable calculus courses. This book features carefully constructed incorrect mathematical statements that require students to create counterexamples to disprove them. Methods of producing these incorrect statements vary. At times the converse of a well-known theorem is presented. In other instances crucial conditions are RPLWWHGRUDOWHUHGRULQFRUUHFWGH¿QLWLRQVDUHHPSOR\HG,QFRUUHFWVWDWHments are grouped topically with sections devoted to: Functions, Limits, Continuity, Differential Calculus and Integral Calculus. 7KLV ERRN DLPV WR ¿OO D JDS LQ WKH OLWHUDWXUHDQG SURYLGH D UHVRXUFH IRU using counterexamples as a pedagogical tool in the study of introductory calculus. In that light it may well be useful for KLJKVFKRROWHDFKHUVDQGXQLYHUVLW\IDFXOW\DVDWHDFKLQJUHVRXUFH KLJKVFKRRODQGFROOHJHVWXGHQWVDVDOHDUQLQJUHVRXUFH DSURIHVVLRQDOGHYHORSPHQWUHVRXUFHIRUFDOFXOXVLQVWUXFWRUV Catalog Code: CXC 101pp., Paperbound, 2010 ISBN: 978-0-88385-756-6 List: $45.95 MAA Member: $35.95

Order your copy today! 1.800.331.1622 www.maa.org

The American Mathematical Monthly, Number 3, March 2011

American Mathematical Monthly, volume 117, number 3, March 2010

The American Mathematical Monthly, Number 1, January 2011

The American Mathematical Monthly, Volume 119, Number 2, February 2012

American Mathematical Monthly, volume 117, number 2, February 2010

American Mathematical Monthly, volume 116, number 1, january 2009

Scientific American March 2011

American Mathematical Monthly, volume 107, number 7, August-September 2000

American Mathematical Monthly, volume 105, number 2, February 1998

American Mathematical Monthly, volume 117, number 1, january 2010

American Mathematical Monthly, volume 117, number 4, April 2010

American Mathematical Monthly, volume 117, May 2010

EDN Magazine March 3, 2011

Yachting Monthly - May 2011

Combat Aircraft Monthly - June 2011

Historical Materialism, Volume 19, Number 3, 2011

NZ4WD - March 2011

Food Engineering March 2011

Cycle World (March 2011)

Today's Parent - March 2011

Beverage Industry March 2011

Elektor Magazine March 2011

Cosmopolitan UK March 2011

Coatings World March 2011

Appliance Design March 2011

Dairy Foods March 2011

Flexible Packaging March 2011

Elle Canada March 2011

The American Mathematical Monthly, Number 3, March 2011

American Mathematical Monthly, volume 117, number 3, March 2010

The American Mathematical Monthly, Number 1, January 2011

The American Mathematical Monthly, Volume 119, Number 2, February 2012

American Mathematical Monthly, volume 117, number 2, February 2010

American Mathematical Monthly, volume 116, number 1, january 2009

Scientific American March 2011

American Mathematical Monthly, volume 107, number 7, August-September 2000

American Mathematical Monthly, volume 105, number 2, February 1998

American Mathematical Monthly, volume 117, number 1, january 2010

American Mathematical Monthly, volume 117, number 4, April 2010

American Mathematical Monthly, volume 117, May 2010

EDN Magazine March 3, 2011

Yachting Monthly - May 2011

Combat Aircraft Monthly - June 2011

Historical Materialism, Volume 19, Number 3, 2011

NZ4WD - March 2011

Food Engineering March 2011

Cycle World (March 2011)

Today's Parent - March 2011

Beverage Industry March 2011

Elektor Magazine March 2011

Cosmopolitan UK March 2011

Coatings World March 2011

Appliance Design March 2011

Dairy Foods March 2011

Flexible Packaging March 2011

Elle Canada March 2011

Recommend Documents