Mathematical Analysis: A Concise Introduction
Bernd S. W. Schroder
Louisiana Tech University
Program of Mathematics and Statistics
Ruston, LA

WILEY-INTERSCIENCE
A John Wiley & Sons, Inc., Publication
Copyright © 2008 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic format. For information about Wiley products, visit our web site at www.wiley.com.

Wiley Bicentennial Logo: Richard J. Pacifico

Library of Congress Cataloging-in-Publication Data:

Schroder, Bernd S. W. (Bernd Siegfried Walter), 1966-
Mathematical analysis : a concise introduction / Bernd S.W. Schroder
p. cm.
ISBN 978-0-470-10796-6 (cloth)
1. Mathematical analysis. I. Title.
QA300.S376 2007
515-dc22 2007024690

Printed in the United States of America.
Contents

Table of Contents
Preface

Part I: Analysis of Functions of a Single Real Variable

1 The Real Numbers
1.1 Field Axioms
1.2 Order Axioms
1.3 Lowest Upper and Greatest Lower Bounds
1.4 Natural Numbers, Integers, and Rational Numbers
1.5 Recursion, Induction, Summations, and Products

2 Sequences of Real Numbers
2.1 Limits
2.2 Limit Laws
2.3 Cauchy Sequences
2.4 Bounded Sequences
2.5 Infinite Limits

3 Continuous Functions
3.1 Limits of Functions
3.2 Limit Laws
3.3 One-sided Limits and Infinite Limits
3.4 Continuity
3.5 Properties of Continuous Functions
3.6 Limits at Infinity

4 Differentiable Functions
4.1 Differentiability
4.2 Differentiation Rules
4.3 Rolle’s Theorem and the Mean Value Theorem

5 The Riemann Integral I
5.1 Riemann Sums and the Integral
5.2 Uniform Continuity and Integrability of Continuous Functions
5.3 The Fundamental Theorem of Calculus
5.4 The Darboux Integral

6 Series of Real Numbers I
6.1 Series as a Vehicle To Define Infinite Sums
6.2 Absolute Convergence and Unconditional Convergence

7 Some Set Theory
7.1 The Algebra of Sets
7.2 Countable Sets
7.3 Uncountable Sets

8 The Riemann Integral II
8.1 Outer Lebesgue Measure
8.2 Lebesgue’s Criterion for Riemann Integrability
8.3 More Integral Theorems
8.4 Improper Riemann Integrals

9 The Lebesgue Integral
9.1 Lebesgue Measurable Sets
9.2 Lebesgue Measurable Functions
9.3 Lebesgue Integration
9.4 Lebesgue Integrals versus Riemann Integrals

10 Series of Real Numbers II
10.1 Limits Superior and Inferior
10.2 The Root Test and the Ratio Test
10.3 Power Series

11 Sequences of Functions
11.1 Notions of Convergence
11.2 Uniform Convergence

12 Transcendental Functions
12.1 The Exponential Function
12.2 Sine and Cosine
12.3 L’Hôpital’s Rule

13 Numerical Methods
13.1 Approximation with Taylor Polynomials
13.2 Newton’s Method
13.3 Numerical Integration

Part II: Analysis in Abstract Spaces

14 Integration on Measure Spaces
14.1 Measure Spaces
14.2 Outer Measures
14.3 Measurable Functions
14.4 Integration of Measurable Functions
14.5 Monotone and Dominated Convergence
14.6 Convergence in Mean, in Measure, and Almost Everywhere
14.7 Product $\sigma$-Algebras
14.8 Product Measures and Fubini’s Theorem

15 The Abstract Venues for Analysis
15.1 Abstraction I: Vector Spaces
15.2 Representation of Elements: Bases and Dimension
15.3 Identification of Spaces: Isomorphism
15.4 Abstraction II: Inner Product Spaces
15.5 Nicer Representations: Orthonormal Sets
15.6 Abstraction III: Normed Spaces
15.7 Abstraction IV: Metric Spaces
15.8 $L^p$ Spaces
15.9 Another Number Field: Complex Numbers

16 The Topology of Metric Spaces
16.1 Convergence of Sequences
16.2 Completeness
16.3 Continuous Functions
16.4 Open and Closed Sets
16.5 Compactness
16.6 The Normed Topology of $\mathbb{R}^d$
16.7 Dense Subspaces
16.8 Connectedness
16.9 Locally Compact Spaces

17 Differentiation in Normed Spaces
17.1 Continuous Linear Functions
17.2 Matrix Representation of Linear Functions
17.3 Differentiability
17.4 The Mean Value Theorem
17.5 How Partial Derivatives Fit In
17.6 Multilinear Functions (Tensors)
17.7 Higher Derivatives
17.8 The Implicit Function Theorem

18 Measure, Topology, and Differentiation
18.1 Lebesgue Measurable Sets in $\mathbb{R}^d$
18.2 $C^\infty$ and Approximation of Integrable Functions
18.3 Tensor Algebra and Determinants
18.4 Multidimensional Substitution

19 Introduction to Differential Geometry
19.1 Manifolds
19.2 Tangent Spaces and Differentiable Functions
19.3 Differential Forms, Integrals Over the Unit Cube
19.4 k-Forms and Integrals Over k-Chains
19.5 Integration on Manifolds
19.6 Stokes’ Theorem

20 Hilbert Spaces
20.1 Orthonormal Bases
20.2 Fourier Series
20.3 The Riesz Representation Theorem

Part III: Applied Analysis

21 Physics Background
21.1 Harmonic Oscillators
21.2 Heat and Diffusion
21.3 Separation of Variables, Fourier Series, and Ordinary Differential Equations
21.4 Maxwell’s Equations
21.5 The Navier Stokes Equation for the Conservation of Mass

22 Ordinary Differential Equations
22.1 Banach Space Valued Differential Equations
22.2 An Existence and Uniqueness Theorem
22.3 Linear Differential Equations

23 The Finite Element Method
23.1 Ritz-Galerkin Approximation
23.2 Weakly Differentiable Functions
23.3 Sobolev Spaces
23.4 Elliptic Differential Operators
23.5 Finite Elements

Conclusion and Outlook

Appendices

A Logic
A.1 Statements
A.2 Negations

B Set Theory
B.1 The Zermelo-Fraenkel Axioms
B.2 Relations and Functions

C Natural Numbers, Integers, and Rational Numbers
C.1 The Natural Numbers
C.2 The Integers
C.3 The Rational Numbers

Bibliography

Index
[Chart not reproduced. Its panels are labeled Background (briefly in the appendices), Part I: Analysis on $\mathbb{R}$, Part II: Abstract Analysis, and Part III: Applications.]
Figure 1: Content dependency chart with minimum prerequisites indicated by arrows. Some remarks, examples, and exercises in the later chapters might still depend on other earlier chapters, but this problem typically can be resolved by quoting a single result. Details about where and how the reader can "branch out" are given in the text.
Preface

This text is a self-contained introduction to the fundamentals of analysis. The only prerequisite is some experience with mathematical language and proofs. That is, it helps to be familiar with the structure of mathematical statements and with proof methods, such as direct proofs, proofs by contradiction, or induction. With some support in the right places, mostly in the early chapters, this text can also be used without prerequisites in a first proof class.

Mastering proofs in analysis is one of the key steps toward becoming a mathematician. To develop sound proof writing techniques, standard proof techniques are discussed early in the text and for a while they are pointed out explicitly. Throughout, proofs are presented with as much detail and as little hand waving as possible. This makes some proofs (for example, the density of $C[a,b]$ in $L^p[a,b]$ in Part II) notationally a bit complicated. With computers now being a regular tool in mathematics, the author considers this appropriate. When code is written for a problem, all details must be implemented, even those that are omitted in proofs. Seeing a few highly detailed proofs is reasonable preparation for such tasks.

Moreover, to facilitate the transition to more abstract settings, such as measure, inner product, normed, and metric spaces, the results for single variable functions are proved using methods that translate to these abstract settings. For example, early proofs rely extensively on sequences and we also use the completeness of the real numbers rather than their order properties.

Analysis is important for applications, because it provides the abstract background that allows us to apply the full power of mathematics to scientific problems. This text shows that all abstractions are well motivated by the desire to build a strong theory that connects to specific applications.
Readers who complete this text will be ready for all analysis-based and analysis-related subjects in mathematics, including complex analysis, differential equations, differential geometry, functional analysis, harmonic analysis, mathematical physics, measure theory, numerical analysis, partial differential equations, probability theory, and topology. Readers interested in motivation from physics are advised to browse Chapter 21, even if they have not read any of the earlier chapters. Aside from the topics covered, readers interested in applications should note that the axiomatic approach of mathematics is similar to problem solving in other fields. In mathematics, theories are built on axioms. Similarly, in applications, models are subject to constraints. Neither the axioms, nor the constraints can be violated by the theory or model. Building a theory based on axioms fosters the reader's discipline to not make unwarranted assumptions.
Organization of the content. The text consists of three large parts. Part I, comprised of Chapters 1-13, presents the analysis of functions of one real variable, including a motivated introduction to the Lebesgue integral. Chapters 1-6 and 10-13 could be called “single variable calculus with proofs.” For a smooth transition from calculus and a gradual increase in abstraction, Chapters 1-6 require very little set theory. Chapter 1 presents the properties of the real line and limits of sequences are introduced in Chapter 2. Chapters 3-5 present the fundamentals on continuity, differentiation, and (Riemann) integration in this order and Chapter 6 gives a first introduction to series. Chapters 6-8 are motivated by the desire to further explore the Riemann integral while avoiding the excessive use of Riemann sums. This exploration is done with the Lebesgue criterion for Riemann integrability. Although this criterion requires the Lebesgue measure, the payoff is that many proofs become simpler. To quickly reach this criterion, the first presentation of series in Chapter 6 is deliberately kept short. It presents enough about series to allow the definition of Lebesgue measure. Chapter 7 presents fundamental notions of set theory. Most of these ideas are needed for Lebesgue measure, but, overall, Chapter 7 contains all the set theory needed in the remainder of the text. Chapter 8 finishes the presentation of the Riemann integral. With Lebesgue measure available, it is natural to investigate the Lebesgue integral in Chapter 9. This chapter could also be delayed to the end of Part I, but the author believes that early exposure to the crucial ideas will ease the later transition to measure spaces. The analysis of single variable functions is finished with the rigorous introduction of the transcendental functions. The necessary background on power series is explored in Chapter 10. 
Chapter 11 presents some fundamentals on the convergence of sequences of functions and Chapter 12 is devoted to the transcendental functions themselves. Chapter 13 discusses general numerical methods, but transcendental functions provide a rich test bed for the methods presented.

Part I of the text can be read or presented in many orders. Figure 1 shows the prerequisite structure of the text. Prerequisites for each chapter have deliberately been kept minimal. In this fashion, the order of topics in the reader’s first contact with proofs in analysis can be adapted to many readers’ preferences. Most notably, the intentionally early presentation of Lebesgue integration can be postponed to the end of Part I if so desired.

Throughout, the author intends to keep the reader engaged by providing motivation for all abstractions. Consequently, as Figure 1 and the table of contents indicate, some concepts and results are presented in a “just-in-time” fashion rather than in what may be considered their traditional place. If a concept is needed in an exercise before the concept is “officially” defined in the text, the concept will be defined in the exercise and in the text.

Part II, comprised of Chapters 14-20, explores how the appropriate abstractions lead to a powerful and widely applicable theoretical foundation for all branches of applied mathematics. The desire to define an integral in d-dimensional space provides a natural motivation to introduce measure spaces in Chapter 14. This chapter facilitates the transition to more abstract mathematics by frequently referring back to corresponding results for the one-dimensional Lebesgue integral. The proofs of these results usually are verbatim the same as in the one-dimensional setting. Moreover, this early introduction makes $L^p$ spaces available as examples for the rest of the text. The abstract venues of analysis are then presented in Chapter 15, which provides all examples
for the rest of Part II. The fundamentals on metric spaces and continuity are presented in Chapter 16. As with measure spaces, for several results on metric spaces the reader is referred back to the corresponding proof for single variable functions. Proofs are no longer verbatim the same and abstraction is facilitated by translating proofs from a familiar setting to the new setting while analyzing similarities and differences. In a class, the author suggests that the teacher fill in some of these proofs to demonstrate the process. Chapter 17 presents the fundamentals on normed spaces and differentiation. Again, ideas are similar to those for functions of a single variable, but this time the abstraction goes beyond translation. With all three fundamental concepts (integration, continuity, and differentiation) available in the abstract setting, Chapter 18 shows the interrelationship between concepts presented separately before, culminating in the Multivariable Substitution Formula. The second part is completed by a presentation of the fundamentals of analysis on manifolds, together with a physical interpretation of key concepts in Chapter 19 and by an introduction to Hilbert spaces in Chapter 20.

The remaining chapters give a brief outlook to applied subjects in which analysis is used, specifically, physics in Chapter 21, ordinary differential equations in Chapter 22, and partial differential equations and the finite element method in Chapter 23. Each of these chapters can only give a taste of its subject and I encourage the reader to go deeper into the utterly fascinating applications that lie behind Part III. The mathematical preparation through this text should facilitate the transition.

It should be possible to cover the bulk of the text in a two-course sequence. Although Chapters 14-16 should be read in order, depending on the available time, the pace and the choice of topics, any of Chapters 17-23 can serve as a capstone experience.
How to read this text. Mathematics in general, and analysis in particular, is not a spectator sport. It is learned by doing. To allow the reader to “do” mathematics, each section has exercises of varying degrees of difficulty. Some exercises require the adaptation of an argument in the text. These exercises are also intended to make the reader critically analyze the argument before adapting it. This is the first step toward being able to write proofs. Of course the need for very critical (and slow) reading of mathematics is nicely summed up in the old quote that “To read without a pencil is daydreaming.” The reader should ask him/herself after every sentence “What does this mean? Why is this justified?” Making notes in the margin to explain the harder steps will allow the reader to answer these questions more easily in the second and third readings of a proof. So it is important to read thoroughly and slowly, to make notes and to reread as often as needed. The extensive index should help with unknown or forgotten terminology as necessary. Other exercises have hints on how to create a proof that the reader has not seen before. These exercises require the use of proof techniques in a new setting. Finally, there are also exercises without hints. Being able to create the proof with nothing but the result given is the deepest task in a mathematics course. This is not to say that exercises without hints are always the hardest and adaptations are always the easiest, but in many cases this is true. Finally, some exercises give a sequence of hints and intermediate results leading up to a famous theorem or a specific example. These exercises could also be used as mini-projects. In a class, some of them
could be the basis for separate lectures that spotlight a particular theorem or example. To get the most out of this text, the reader is encouraged to not look for hints and solutions in other background materials. In fact, even for proofs that are adaptations of proofs in this text, it is advantageous to try to create the proof without looking up the proof that is to be adapted. There is evidence that the struggle to solve a problem, which can take days for a single proof, is exactly what ultimately contributes to the development of strong skills. “Shortcuts,” while pleasant, can actually diminish this development. Readers interested in quantitative evidence that shows how the struggle to acquire a skill actually can lead to deeper learning may find the article [4] quite enlightening. A better survival mechanism than shortcuts is the development of connections between newly learned content and existing knowledge. The reader will need to find these connections to his/her existing knowledge, but the structure of the text is intended to help by motivating all abstractions. Readers interested in how knowledge is activated more easily when it was learned in a known context may be interested in the article [5].
Acknowledgments. Strange as it may sound, I started writing this text in the spring of 1987, as I prepared for my oral final examination in the traditional Analysis I-III sequence in Germany. Basically, I took all topics in the sequence and arranged them in what was the most logical fashion to me at the time. Of course, these notes are, in retrospect, immature. But they did a lot to shape my abilities and they were a good source of ideas and exercises. In this respect, I am indebted to my teachers for this sequence: Professor Wegener and teaching assistant Ms. Lange for Analysis I, Professor Kutzler and teaching assistant Herr Bottger for Analysis II-III as well as Professor Herz in whose Differential Equations class I first saw analysis “at work.” With all due respect to the other individuals, to me and many of my fellow students, the force that drove us in analysis (and beyond) was Herr Bottger. This gentleman was uncompromising in his pursuit of mathematical excellence and we feared as well as looked forward to his demanding exercise sets. He was highly respected because he was ready to spend hours with anyone who wanted to talk mathematics. Those who kept up with him were extremely well prepared for their mathematical careers. Incidentally, Dr. Ansgar Jüngel, whose notes I used for the chapter on the finite element method, took the above mentioned classes with me. The thorough preparation through these classes is the main reason why most of this text was comparatively easy to write. If this text does half as good a job as Herr Bottger did with us, it has more than achieved its purpose. It was thrilling to test my limitations, it was humbling to find them and ultimately I was left awed once more by the beauty of mathematics. When my abilities were insufficient to proceed, I used the texts listed in the bibliography for proofs, hints or to structure the presentation.
To make the reader fully concentrate on matters at hand, and to force myself to make the exposition self-contained, outside references are limited to places where results were beyond the scope of this exposition. A solid foundation will allow readers to judiciously pick their own resources for further study. Nonetheless, it is appropriate to recognize the influence of the works of a number of outstanding individuals. I used Adams [2], Renardy and Rogers [23], Yosida [33] and Zeidler [34] for Sobolev spaces, Aris [3], Cramer’s http://www.navier-stokes.net/, and
Welty, Wicks and Wilson [31] for fluid dynamics, Chapman [6] for heat transfer, Cohn [7] for measure theory, Dieudonné [8] for differentiation in Banach spaces, Dodge [9] and Halmos [13] for set theory, Ferguson [10], Sandefur [24] and Stoer and Bulirsch [28] for numerical analysis, Halliday, Resnick and Walker [12] for elementary physics, Hewitt and Stromberg [14], Heuser [15], [16], Johnsonbaugh and Pfaffenberger [20], Lehn [22] and Stromberg [29] for general background on analysis, Heuser [17] for functional analysis, Hurd and Loeb [18] for the use of quantifiers in logic, Jüngel [21] and Solin [25] for the finite element method, Spivak [26], [27] for manifolds, Torchinsky [30] for Fourier series, Willard [32] for topology, and the Online Encyclopaedia of Mathematics http://eom.springer.de/ for quick checks of notation and definitions. Readers interested in further study of these subjects may wish to start with the above references.

The first draft of the manuscript was used in my analysis classes in the Winter and Spring quarters of 2007. The first class covered Chapters 1-9, the second covered Chapters 11 and 14-18 (with some strategic “fast forwards”). This setup assured that graduating students would have full exposure to the essentials of analysis on the real line and to as much abstract analysis as possible without “handwaving arguments.” I am grateful to the students in these classes for keeping up with the pace, solving large numbers of homework problems, being patient with the typos we found and also for suggesting at least one order in which to present the material that I had not considered. The students’ evaluations (my best ever) also reaffirmed for me that people will enjoy, or at least accept and honor, a challenge, and that an ambitious, motivated course should be the way to go. Devery Rowland once more did an excellent job printing drafts of the text for the classes.
Aside from the referees, several colleagues also commented on this text and I owe them my thanks for making it a better product. In particular, I would like to thank Natalia Zotov for some comments on an early version that significantly improved the presentation, and Ansgar Jüngel for pointing out some key references on Sobolev spaces. Although I hope that we have found all remaining errors and typos, any that remain are my responsibility and mine alone. I request readers to report errors and typos to me so I can post errata. My contacts at Wiley, Susanne Steitz, Jacqueline Palmieri, and Melissa Yanuzzi bore with me when the stress level rose and their patience made the publishing process very smooth. As always, this work would not have been possible without the love of my family. It is truly wonderful to be supported by individuals who accept your decision to spend large amounts of time reliving your formative years. Finally, I was sad to learn that Herr Bottger died unexpectedly a few years after I had my last class with him. Sir, this one’s for you.
Ruston, LA, August 30, 2007
Bernd Schroder
Part I Analysis of Functions of a Single Real Variable
Chapter 1
The Real Numbers

This investigation of analysis starts with minimal prerequisites. Regarding set theory, the terms “set” and “element” will remain undefined, as is customary in mathematics to avoid paradoxes. The empty set $\emptyset$ is the set that has no elements. The statement “$e \in S$” says that $e$ is an element of the set $S$. The statement “$A \subseteq B$” says that every element of $A$ is an element of $B$. Sets $A$ and $B$ are equal if and only if $A \subseteq B$ and $B \subseteq A$. The statement “$A \subset B$” says that $A \subseteq B$ and $A \neq B$. Subsets will be defined as “$A = \{x \in S : (\text{property})\}$,” that is, with a statement from which set $S$ the elements of $A$ are taken and a property describing them. The union of two sets $A$ and $B$ is $A \cup B = \{x : x \in A \text{ or } x \in B\}$, the intersection is $A \cap B = \{x : x \in A \text{ and } x \in B\}$. Union and intersection of finitely many sets are denoted $\bigcup_{j=1}^{n} A_j$ and $\bigcap_{j=1}^{n} A_j$, respectively, and the relative complement of $B$ in $A$ is $A \setminus B = \{x \in A : x \notin B\}$.

Further details on set theory are purposely delayed until Section 7.1. Until then, we focus on analytical techniques. Any required notions of set theory will be clarified on the spot. To define properties, sometimes the universal quantifier “$\forall$” (read “for all”) or the existential quantifier “$\exists$” (read “there exists”) are used. Formal logic is described in more detail in Appendix A. Finally, the reader needs an intuitive idea what a function, a relation and a binary operation are. Details are relegated to Appendices B.2 and C.2.

The real numbers $\mathbb{R}$ are the “staging ground” for analysis. They can be characterized as the unique (up to isomorphism) mathematical entity that satisfies Axioms 1.1, 1.6, and 1.19. That is, they are the unique linearly ordered, complete field (see Exercise 1-30). In this chapter, we introduce the axioms for the real numbers and some fundamental consequences. These results assure that the real numbers indeed have the properties that we are familiar with from algebra and calculus.
1.1 Field Axioms

The description of the real numbers starts with their algebraic properties.

Axiom 1.1 The real numbers $\mathbb{R}$ are a field. That is, $\mathbb{R}$ has at least two elements and there are two binary operations, addition $+ : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ and multiplication $\cdot : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$, so that

1. Addition is associative, that is, for all $x, y, z \in \mathbb{R}$ we have $(x + y) + z = x + (y + z)$.

2. Addition is commutative, that is, for all $x, y \in \mathbb{R}$ we have $x + y = y + x$.

3. There is a neutral element 0 for addition, that is, there is an element $0 \in \mathbb{R}$ so that for all $x \in \mathbb{R}$ we have $x + 0 = x$.

4. For every element $x \in \mathbb{R}$ there is an additive inverse element $(-x) \in \mathbb{R}$ so that $x + (-x) = 0$.

5. Multiplication is associative, that is, for all $x, y, z \in \mathbb{R}$ we have $(x \cdot y) \cdot z = x \cdot (y \cdot z)$.

6. Multiplication is commutative, that is, for all $x, y \in \mathbb{R}$ we have $x \cdot y = y \cdot x$.

7. There is a neutral element 1 for multiplication, that is, there is an element $1 \in \mathbb{R}$ so that for all $x \in \mathbb{R}$ we have $1 \cdot x = x$.

8. For every element $x \in \mathbb{R} \setminus \{0\}$ there is a multiplicative inverse element $x^{-1}$ so that $x \cdot x^{-1} = 1$.

9. Multiplication is (left) distributive over addition, that is, for all $a, x, y \in \mathbb{R}$ we have $a \cdot (x + y) = a \cdot x + a \cdot y$.
As is customary for multiplication, the dot between factors is usually omitted. Fields are investigated in detail in abstract algebra. For analysis, it is most effective to remember that the field axioms guarantee the properties needed so that we can perform algebra and arithmetic “as usual.” Some of these properties are exhibited in this section and in the exercises. The exercises also include examples that show that not every field needs to be infinite (see Exercises 1-7-1-9).
Theorem 1.2 The following are true in $\mathbb{R}$:

1. For all $x \in \mathbb{R}$, we have $0x = 0$.

2. $0 \neq 1$.

3. Additive inverses are unique. That is, if $x \in \mathbb{R}$ and $x'$ and $\bar{x}$ both have the property in part 4 of Axiom 1.1, then $x' = \bar{x}$.

4. For all $x \in \mathbb{R}$, we have $(-1)x = -x$.
Proof. Early in the text, proofs will sometimes be interrupted by comments in italics to point out standard formulations and proof techniques. To prove part 1, let $x \in \mathbb{R}$. Then the axioms allow us to obtain the following equation.

$0x \overset{\text{Ax.3}}{=} (0 + 0)x \overset{\text{Ax.6}}{=} x(0 + 0) \overset{\text{Ax.9}}{=} x0 + x0 \overset{\text{Ax.6}}{=} 0x + 0x.$

This implies

$0 = 0x + (-(0x)) = (0x + 0x) + (-(0x)) = 0x + (0x + (-(0x))) = 0x + 0 = 0x,$

as was claimed. The proof of part 1 shows how every step in a proof needs to be justified. Usually we will not explicitly justify each step in a computation with an axiom or a previous result. However, the reader should always mentally fill in the justification. The practice of filling in these justifications should be started in the computations in the remainder of this proof.

To prove part 2, first note that, because $\mathbb{R}$ has at least two elements, there is an $x \in \mathbb{R} \setminus \{0\}$. Now suppose for a contradiction (see Standard Proof Technique 1.4 below) that $0 = 1$. Then $x = 1 \cdot x = 0 \cdot x = 0$ is a contradiction to $x \in \mathbb{R} \setminus \{0\}$.

For part 3, note that if $x'$ and $\bar{x}$ both have the property in part 4 of Axiom 1.1, then $x' = x' + 0 = x' + (x + \bar{x}) = (x' + x) + \bar{x} = (x + x') + \bar{x} = 0 + \bar{x} = \bar{x} + 0 = \bar{x}$. Note that the statement of part 3 already encodes the typical approach to a uniqueness proof (see Standard Proof Technique 1.5 below).

Finally, for part 4 note that $x + (-1)x = 1x + (-1)x = (1 + (-1))x = 0x = 0$. Because by part 3 additive inverses are unique, $(-1)x$ must be the additive inverse $-x$ of $x$. The last step is a typical application of modus ponens, see Standard Proof Technique 1.3 below. ■
To familiarize the reader with standard proof techniques, these techniques will be pointed out explicitly in the early part of the text. The techniques presented in Chapter 1 are general proof techniques applicable throughout mathematics. Techniques presented in later chapters are mostly specific to analysis.
Standard Proof Technique 1.3 The simplest mathematical proof technique is a direct proof in which a result that says “$A$ implies $B$” is applied after we have proved that $A$ is true. Truth of $A$ and of “$A$ implies $B$” guarantees truth of $B$. This technique is also called modus ponens. An example is in the proof of part 4 of Theorem 1.2. □

Standard Proof Technique 1.4 In a proof by contradiction, we suppose the contrary (the negation, also see Appendix A.2) of what is claimed is true and then we derive a contradiction. Typically, we derive a statement and its negation, which is a contradiction, because they cannot both be true. For an example, see the proof of part 2 of Theorem 1.2 above. Given that the reasoning that led to the contradiction is correct, the contradiction must be caused by the assumption that the contrary of the claim is true. Hence, the contrary of the claim must be false, because true statements cannot imply false statements like contradictions (see part 3 of Definition A.2 in Appendix A). But this means the claim must be true. We will usually indicate proofs by contradiction with a starting statement like “suppose for a contradiction.” □
Standard Proof Technique 1.5 For many mathematical objects it is important to assure that they are the only object that has certain properties. That is, we want to assure that the object is unique. In a typical uniqueness proof, we assume that there is more than one object with the properties under investigation and we prove that any two of these objects must be equal. Part 3 of Theorem 1.2 shows this approach.
Exercises

1-1. Prove that $(-1) \cdot (-1) = 1$.

1-2. $\cdot$ is right distributive over $+$. Prove that for all $x, y, z \in \mathbb{R}$ we have $(x + y)z = xz + yz$.

1-3. Multiplicative inverses are unique. Prove that if $x \in \mathbb{R}$ and $x'$ and $\bar{x}$ both have the property in part 8 of Axiom 1.1, then $x' = \bar{x}$.

1-4. Prove that 0 does not have a multiplicative inverse.

1-5. Prove that if $x, y \neq 0$, then $(xy)^{-1} = y^{-1}x^{-1}$. Conclude in particular that $xy \neq 0$.

1-6. Prove each of the binomial formulas below. Justify each step with the appropriate axiom.
(a) $(a + b)^2 = a^2 + 2ab + b^2$
(b) $(a - b)^2 = a^2 - 2ab + b^2$
(c) $(a + b)(a - b) = a^2 - b^2$

1-7. Prove that the set $\{0, 1\}$ with the usual multiplication and the usual addition, except that $1 + 1 := 0$, is a field. That is, prove that the set and addition and multiplication as stated have the properties listed in Axiom 1.1.

1-8. Prove that the set $\{0, 1, 2\}$ with the sum and product of two elements being the remainder obtained when dividing the regular sum and product by 3 is a field.

1-9. A property and some finite fields.
(a) Let $F$ be a field and let $x, y \in F$. Prove that $xy = 0$ if and only if $x = 0$ or $y = 0$.
(b) Prove that the set $\{0, 1, 2, 3\}$ with the sum and product of two elements being the remainder obtained when dividing the regular sum and product by 4 is not a field.
(c) Prove that the set $\{0, 1, \ldots, p - 1\}$ with the sum and product of two elements being the remainder obtained when dividing the regular sum and product by $p$ is a field if and only if $p$ is a prime number.
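The finite examples in Exercises 1-7 through 1-9 can be explored numerically before attempting the proofs. The following is a minimal sketch, not part of the original text: the function names `is_field_mod` and `is_prime` are hypothetical, and the code simply tests every property of Axiom 1.1 by brute force for the set $\{0, \ldots, n-1\}$ with remainder arithmetic. A passing check for small $n$ is evidence, not a proof.

```python
from itertools import product


def is_field_mod(n):
    """Brute-force check of Axiom 1.1 for {0, ..., n-1} with arithmetic mod n."""
    if n < 2:
        return False  # a field needs at least two elements
    elems = range(n)

    def add(x, y):
        return (x + y) % n

    def mul(x, y):
        return (x * y) % n

    for x, y, z in product(elems, repeat=3):
        if add(add(x, y), z) != add(x, add(y, z)):  # part 1
            return False
        if mul(mul(x, y), z) != mul(x, mul(y, z)):  # part 5
            return False
        if mul(x, add(y, z)) != add(mul(x, y), mul(x, z)):  # part 9
            return False
    for x, y in product(elems, repeat=2):
        if add(x, y) != add(y, x) or mul(x, y) != mul(y, x):  # parts 2 and 6
            return False
    for x in elems:
        if add(x, 0) != x or mul(1, x) != x:  # parts 3 and 7
            return False
        if not any(add(x, y) == 0 for y in elems):  # part 4
            return False
        if x != 0 and not any(mul(x, y) == 1 for y in elems):  # part 8
            return False
    return True


def is_prime(p):
    """Trial division, sufficient for the small values used here."""
    return p >= 2 and all(p % d for d in range(2, int(p ** 0.5) + 1))


# Compare the brute-force verdict with primality for a few small moduli.
results = {n: is_field_mod(n) for n in range(2, 12)}
```

In this sketch only part 8 (multiplicative inverses) can fail, which matches the role that Exercise 1-9a plays in Exercise 1-9b.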
1.2 Order Axioms

Exercises 1-7 through 1-9c show that the field axioms alone are not enough to describe the real numbers. In fact, fields need not even be infinite. However, aside from executing the familiar algebraic operations, we can also compare real numbers. This section presents the order relation on the real numbers and its properties.

Axiom 1.6 The real numbers $\mathbb{R}$ contain a subset $\mathbb{R}^+$, called the positive real numbers, such that

1. For all $x, y \in \mathbb{R}^+$, we have $x + y \in \mathbb{R}^+$ and $xy \in \mathbb{R}^+$,

2. For all $x \in \mathbb{R}$, exactly one of the following three properties holds. Either $x \in \mathbb{R}^+$ or $-x \in \mathbb{R}^+$ or $x = 0$.
A real number $x$ is called negative if and only if $-x \in \mathbb{R}^+$. Once positive numbers are defined, we can define an order relation. As usual, instead of writing $y + (-x)$ we write $y - x$ and call it the difference of $x$ and $y$. The binary operation “$-$” is called subtraction. The phrase “if and only if,” which is used in definitions and biconditionals, is normally abbreviated with the artificial word “iff.”
Definition 1.7 For $x, y \in \mathbb{R}$, we say $x$ is less than $y$, in symbols $x < y$, iff $y - x \in \mathbb{R}^+$. We say $x$ is less than or equal to $y$, denoted $x \le y$, iff $x < y$ or $x = y$. Finally, we say $x$ is greater than $y$, denoted $x > y$, iff $y < x$, and we say $x$ is greater than or equal to $y$, denoted $x \ge y$, iff $y \le x$.

The relation $\le$ satisfies the properties that define an order relation.
Proposition 1.8 The relation $\le$ is an order relation on $\mathbb{R}$. That is,

1. $\le$ is reflexive. For all $x \in \mathbb{R}$ we have $x \le x$,

2. $\le$ is antisymmetric. For all $x, y \in \mathbb{R}$ we have that $x \le y$ and $y \le x$ implies $x = y$,

3. $\le$ is transitive. For all $x, y, z \in \mathbb{R}$, we have that $x \le y$ and $y \le z$ implies $x \le z$.

Moreover, the relation $\le$ is a total order relation, that is, for any two $x, y \in \mathbb{R}$ we have that $x \le y$ or $y \le x$.
Proof. The relation $\le$ is reflexive, because it includes equality. For antisymmetry, let $x \le y$ and $y \le x$ and suppose for a contradiction that $x \neq y$. Then $x - y \in \mathbb{R}^+$ and $-(x - y) = y - x \in \mathbb{R}^+$, which cannot be by Axiom 1.6. Thus $\le$ must be antisymmetric. For transitivity, let $x \le y$ and $y \le z$. There is nothing to prove if one of the inequalities is an equality. Thus we can assume that $x < y$ and $y < z$, which means $y - x \in \mathbb{R}^+$ and $z - y \in \mathbb{R}^+$. But then $\mathbb{R}^+$ contains $(z - y) + (y - x) = z - x$, and hence $x < z$. We have shown that for all $x, y, z \in \mathbb{R}$ the inequalities $x \le y$ and $y \le z$ imply $x \le z$, which means that $\le$ is transitive. For the “moreover” part note that if $x, y \in \mathbb{R}$, then $y - x \in \mathbb{R}$ and we have either $y - x \in \mathbb{R}^+$, which means $x < y$, or $y - x = 0$, which means $y = x$, or $x - y = -(y - x) \in \mathbb{R}^+$, which means $y < x$. Therefore for all $x, y \in \mathbb{R}$ one of $x \le y$ or $y \le x$ holds, and hence $\le$ is a total order. ■
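The way Definition 1.7 builds the order relation from the set of positive numbers can be mirrored in a few lines of code. The sketch below is an illustration under stated assumptions: it works with rational numbers (Python's `Fraction`) as a stand-in for $\mathbb{R}$, and the names `in_positive_cone`, `less` and `less_eq` are hypothetical. It then spot-checks the four properties of Proposition 1.8 on a small sample, which of course does not replace the proof.

```python
from fractions import Fraction
from itertools import product


def in_positive_cone(q):
    """Membership test for the positive rationals (stand-in for R+)."""
    return q.numerator > 0  # Fraction always stores a positive denominator


def less(x, y):
    """x < y iff y - x is positive, as in Definition 1.7."""
    return in_positive_cone(y - x)


def less_eq(x, y):
    """x <= y iff x < y or x = y."""
    return less(x, y) or x == y


sample = [Fraction(a, b) for a in range(-3, 4) for b in (1, 2, 3)]

reflexive = all(less_eq(x, x) for x in sample)
antisymmetric = all(
    x == y
    for x, y in product(sample, repeat=2)
    if less_eq(x, y) and less_eq(y, x)
)
transitive = all(
    less_eq(x, z)
    for x, y, z in product(sample, repeat=3)
    if less_eq(x, y) and less_eq(y, z)
)
total = all(less_eq(x, y) or less_eq(y, x) for x, y in product(sample, repeat=2))
```

The only order information the sketch uses is the membership test for the positive set; everything else is derived from subtraction, just as in the text.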
Once an order relation is established, we can define intervals.
Definition 1.9 An interval is a set $I \subseteq \mathbb{R}$ so that for all $c, d \in I$ and $x \in \mathbb{R}$ the inequalities $c < x < d$ imply $x \in I$. In particular for $a, b \in \mathbb{R}$ with $a < b$ we define

1. $[a, b] := \{x \in \mathbb{R} : a \le x \le b\}$,

2. $(a, b) := \{x \in \mathbb{R} : a < x < b\}$, $(a, \infty) := \{x \in \mathbb{R} : a < x\}$, $(-\infty, b) := \{x \in \mathbb{R} : x < b\}$, $(-\infty, \infty) := \mathbb{R}$,

3. $[a, b) := \{x \in \mathbb{R} : a \le x < b\}$, $[a, \infty) := \{x \in \mathbb{R} : a \le x\}$,

4. $(a, b] := \{x \in \mathbb{R} : a < x \le b\}$, $(-\infty, b] := \{x \in \mathbb{R} : x \le b\}$.
The points $a$ and $b$ are also called the endpoints of the interval. An interval that does not contain either of its endpoints (where $\pm\infty$ are also considered to be “endpoints”) is called open. An interval that contains exactly one of its endpoints is called half-open and an interval that contains both its endpoints is called closed. For the first part of this text, the domains of functions will almost exclusively be intervals.

Because analysis requires extensive work with inequalities, we need to investigate how the order relation relates to the algebraic operations.
Theorem 1.10 Properties of the order relation. Let $x, y, z \in \mathbb{R}$.

1. The number $x$ is positive iff $x > 0$ and $x$ is negative iff $x < 0$.

2. If $x \le y$, then $x + z \le y + z$.

3. If $x \le y$ and $z > 0$, then $xz \le yz$.

4. If $x \le y$ and $z < 0$, then $xz \ge yz$.

5. If $0 < x \le y$, then $y^{-1} \le x^{-1}$.

Similar results can be proved for other combinations of strict and nonstrict inequalities. We will not state these here, but instead trust that the reader can make the requisite translation from the statements in this theorem.
Proof. Parts 1 and 2 are left to the reader as Exercises 1-10a and 1-10b. Throughout this text, parts of proofs will be delegated to the reader to facilitate a better connection to the material presented.

For part 3, let $x \le y$ and let $z > 0$. Then $y - x \in \mathbb{R}^+$ or $y = x$. In case $y = x$, we obtain $yz = xz$ and thus, in particular, $xz \le yz$. In case $y - x \in \mathbb{R}^+$, note that $z > 0$ means $z \in \mathbb{R}^+$, and hence $yz - xz = (y - x)z \in \mathbb{R}^+$. By definition, this implies $xz < yz$, and in particular $xz \le yz$. Because we have shown $xz \le yz$ in each case, the result is established. All proofs in this section are done with the above kind of case distinction (see Standard Proof Technique 1.11).

For part 4, let $x \le y$ and let $z < 0$. Then $y - x \in \mathbb{R}^+$ or $y = x$. In case $y = x$, we obtain $yz = xz$, and hence $xz \ge yz$. In case $y - x \in \mathbb{R}^+$, note that $z < 0$ means $-z \in \mathbb{R}^+$, and hence $xz - yz = (x - y)z = (y - x)(-z) \in \mathbb{R}^+$. By definition, this implies $yz < xz$, and hence $yz \le xz$, which establishes the result.

For part 5, first note that there is nothing to prove if $x = y$. Hence, we can assume that $x < y$. Suppose for a contradiction that $x^{-1} < y^{-1}$. Then by part 3 we have that $1 = x^{-1}x < y^{-1}x$, and hence $x < y \cdot 1 < y y^{-1} x = x$, contradiction. ■

Standard Proof Technique 1.11 When several possibilities must be considered in a proof, the proof usually continues with separate arguments for each possibility. The proof is complete when each separate argument has led to the desired conclusion. This type of proof is also called a proof by case distinction. □
We conclude this section by introducing the absolute value function and some of its properties.
Definition 1.12 For $x \in \mathbb{R}$, we set
$|x| := \begin{cases} x; & \text{if } x \ge 0, \\ -x; & \text{if } x < 0, \end{cases}$
and we call it the absolute value of $x$.

Theorem 1.13 summarizes the properties of the absolute value. The numbering is adjusted so that properties 1, 2, and 3 correspond to the analogous properties for norms (see Definition 15.38). We will formulate many results in the first part of the text to be analogous or easily generalizable to more abstract settings, but we will usually do so without explicit forward references. In this fashion many abstract situations will be more familiar because of similarities to situations investigated in the first part.
Theorem 1.13 Properties of the absolute value.

0. For all $x \in \mathbb{R}$, we have $|x| \ge 0$,

1. For all $x \in \mathbb{R}$, we have $|x| = 0$ iff $x = 0$,

2. For all $x, y \in \mathbb{R}$, we have $|xy| = |x||y|$,

3. Triangular inequality. For all $x, y \in \mathbb{R}$, we have $|x + y| \le |x| + |y|$.

4. Reverse triangular inequality. For all $x, y \in \mathbb{R}$, we have $\bigl| |x| - |y| \bigr| \le |x - y|$.
Proof. For part 0, let $x \in \mathbb{R}$. In case $x \ge 0$, by Definition 1.12 we have $|x| = x \ge 0$. In case $x < 0$, we have $x \notin \mathbb{R}^+$ and by part 2 of Axiom 1.6 we conclude $-x > 0$. Because in this case $|x| = -x > 0$, part 0 follows.

Throughout the text, the two implications of a biconditional “$A$ iff $B$” will be referred to as “$\Rightarrow$,” denoting “if $A$, then $B$” and “$\Leftarrow$,” denoting “if $B$, then $A$.”

For part 1, note that the direction “$\Leftarrow$” is trivial, because $|0| = 0$. For the direction “$\Rightarrow$,” let $x \in \mathbb{R}$ be so that $|x| = 0$ and suppose for a contradiction that $x \neq 0$. If $x > 0$, then $0 < x = |x| = 0$, a contradiction. (Note that the previous sentence is a short proof by contradiction that is part of a longer proof by contradiction.) Therefore $x < 0$. But then $0 < -x = |x| = 0$, a contradiction. Hence, $x$ must be equal to 0.

For part 2, let $x, y \in \mathbb{R}$. If $x \ge 0$ and $y \ge 0$, then by part 3 of Theorem 1.10 $xy \ge 0$, and hence $|xy| = xy = |x||y|$. If $x \ge 0$ and $y < 0$, then by part 4 of Theorem 1.10 we infer $xy \le 0$. Hence, $|xy| = -xy = x(-y) = |x||y|$. The case $x < 0$ and $y \ge 0$ is similar and the reader will produce it in Exercise 1-11a. Finally, if $x < 0$ and $y < 0$, then by part 4 of Theorem 1.10 we obtain $xy > 0$. Hence, $|xy| = xy = (-1)(-1)xy = (-x)(-y) = |x||y|$.

To prove the triangular inequality, first note that for all $x \in \mathbb{R}$ we have that $x \le |x|$. This is clear for $x \ge 0$ and for $x < 0$ we simply note $x < 0 < -x = |x|$. Moreover, (see Exercise 1-11b) for all $x \in \mathbb{R}$ we have $-x \le |x|$. Now let $x, y \in \mathbb{R}$. If the inequality $x + y \ge 0$ holds, then by part 2 of Theorem 1.10 at least one of $x, y$ is greater than or equal to 0. (Otherwise $x < 0$ and $y < 0$ would imply $x + y < 0$.) Hence, by part 2 of Theorem 1.10, $|x + y| = x + y \le |x| + y \le |x| + |y|$. If $x + y < 0$, then at least one of $x$ and $y$ is less than 0. Hence, by part 2 of Theorem 1.10 we obtain $|x + y| = -(x + y) = -x + (-y) \le |-x| + (-y) \le |-x| + |-y| = |x| + |y|$.

Finally, for the reverse triangular inequality, let $x, y \in \mathbb{R}$. Without loss of generality (see Standard Proof Technique 1.14) assume that $|x| \ge |y|$. (The proof for the case $|x| < |y|$ is left as Exercise 1-11c.) Then $|x| = |x - y + y| \le |x - y| + |y|$, which implies $\bigl| |x| - |y| \bigr| = |x| - |y| \le |x - y|$. ■
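The properties in Theorem 1.13 lend themselves to a quick numerical sanity check. The following sketch is an illustration only: it assumes rational sample values (Python's `Fraction`, so that all arithmetic is exact), and the function name `absolute` is hypothetical. It implements the case distinction of Definition 1.12 directly and tests parts 0 through 4 on a small grid.

```python
from fractions import Fraction
from itertools import product


def absolute(x):
    """Absolute value via the case distinction of Definition 1.12."""
    return x if x >= 0 else -x


sample = [Fraction(a, b) for a in range(-4, 5) for b in (1, 2, 3)]
pairs = list(product(sample, repeat=2))

# Part 0: nonnegativity.
nonnegative = all(absolute(x) >= 0 for x in sample)
# Part 1: |x| = 0 iff x = 0.
definite = all((absolute(x) == 0) == (x == 0) for x in sample)
# Part 2: |xy| = |x||y|.
multiplicative = all(absolute(x * y) == absolute(x) * absolute(y) for x, y in pairs)
# Part 3: triangular inequality.
triangle = all(absolute(x + y) <= absolute(x) + absolute(y) for x, y in pairs)
# Part 4: reverse triangular inequality.
reverse_triangle = all(
    absolute(absolute(x) - absolute(y)) <= absolute(x - y) for x, y in pairs
)
```

Exact rational arithmetic is used here so that equalities such as the one in part 2 are not disturbed by floating-point rounding.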
Standard Proof Technique 1.14 If the proofs for the cases in a case distinction are very similar, it is customary to assume without loss of generality that one of these similar cases is true. This is not a loss of generality, because it is assumed that what is presented enables the reader to fill in the proof(s) for the other case(s). In this text, the omitted part is sometimes included as an explicit exercise for the reader. □
Exercises

1-10. Finishing the proof of Theorem 1.10.
(a) Prove part 1 of Theorem 1.10.
(b) Prove part 2 of Theorem 1.10.

1-11. Finishing the proof of Theorem 1.13.
(a) Let $x, y \in \mathbb{R}$. Prove that if $x < 0$ and $y \ge 0$, then $|xy| = |x||y|$.
(b) Prove that for all $x \in \mathbb{R}$ we have $-x \le |x|$.
(c) Prove that if $|x| < |y|$, then $\bigl| |x| - |y| \bigr| \le |x - y|$.

1-12. Let $I, J \subseteq \mathbb{R}$ be intervals. Prove that $I \cap J = \{x \in \mathbb{R} : x \in I \text{ and } x \in J\}$ is again an interval.

1-13. Let $a < b$ and let $x, y \in [a, b]$. Prove that $|x - y| \le b - a$.

1-14. Prove that none of the fields from Exercise 1-9c can satisfy Axiom 1.6 by showing that for these fields part 2 of Axiom 1.6 fails for $x = 1$.
Note. This result shows that Axiom 1.6 distinguishes $\mathbb{R}$ from the finite fields of Exercise 1-9c.
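The claim of Exercise 1-14 can be checked by exhaustive search for small primes before proving it in general. The sketch below is a hedged illustration, not the intended solution: the names `satisfies_order_axiom` and `has_positive_cone` are hypothetical, and the code simply tries every subset of $\{0, \ldots, p-1\}$ as a candidate set of positive elements and tests both parts of Axiom 1.6 with remainder arithmetic.

```python
from itertools import combinations


def satisfies_order_axiom(p, positives):
    """Check both parts of Axiom 1.6 for a candidate positive set in Z_p."""
    P = set(positives)
    # Part 1: closed under addition and multiplication.
    closed = all(
        (x + y) % p in P and (x * y) % p in P for x in P for y in P
    )
    # Part 2: exactly one of "x in P", "-x in P", "x = 0" holds for every x.
    trichotomy = all(
        [x in P, (-x) % p in P, x == 0].count(True) == 1 for x in range(p)
    )
    return closed and trichotomy


def has_positive_cone(p):
    """Return True iff some subset of Z_p satisfies Axiom 1.6."""
    return any(
        satisfies_order_axiom(p, subset)
        for size in range(p + 1)
        for subset in combinations(range(p), size)
    )


# Exhaustive search over all subsets for a few small primes.
orderable = {p: has_positive_cone(p) for p in (2, 3, 5, 7)}
```

The search space has $2^p$ candidate sets, so this approach is only feasible for very small $p$; the exercise asks for an argument that works for every prime.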
1.3 Lowest Upper and Greatest Lower Bounds

A structure that has the properties outlined in Axioms 1.1 and 1.6 is also called a linearly ordered field. The rational numbers satisfy these properties just as well as the real numbers. Thus we are not done with our characterization of $\mathbb{R}$. The final axiom for the real numbers addresses upper and lower bounds of sets.
Definition 1.15 Let $A$ be a subset of $\mathbb{R}$.

1. The number $u \in \mathbb{R}$ is called an upper bound of $A$ iff $u \ge a$ for all $a \in A$. If $A$ has an upper bound, it is also called bounded above.

2. The number $l \in \mathbb{R}$ is called a lower bound of $A$ iff $l \le a$ for all $a \in A$. If $A$ has a lower bound, it is also called bounded below.

A subset $A \subseteq \mathbb{R}$ that is bounded above and bounded below is also called bounded.
Among all upper bounds of a set, the smallest one (if it exists) plays a special role. Similarly, the greatest lower bound plays a special role if it exists.
Definition 1.16 Let $A \subseteq \mathbb{R}$.

1. The number $s \in \mathbb{R}$ is called lowest upper bound of $A$ or supremum of $A$, denoted $\sup(A)$, iff $s$ is an upper bound of $A$ and for all upper bounds $u$ of $A$ we have that $s \le u$.

2. The number $i \in \mathbb{R}$ is called greatest lower bound of $A$ or infimum of $A$, denoted $\inf(A)$, iff $i$ is a lower bound of $A$ and for all lower bounds $l$ of $A$ we have that $l \le i$.

Formally, it is not guaranteed that suprema and infima are unique, but the next result shows that this is indeed the case. Note that the statement of Proposition 1.17 follows the standard pattern for a uniqueness statement.
Proposition 1.17 Suprema are unique. That is, if the set $A \subseteq \mathbb{R}$ is bounded above and $s, t \in \mathbb{R}$ both are suprema of $A$, then $s = t$.

Proof. Let $A \subseteq \mathbb{R}$ and $s, t \in \mathbb{R}$ be as indicated. Then $s$ is an upper bound of $A$ and, because $t$ is a supremum of $A$, we infer $s \ge t$. Similarly, $t$ is an upper bound of $A$ and, because $s$ is a supremum of $A$, we infer $t \ge s$. This implies $s = t$. ■

Standard Proof Technique 1.18 (Also compare with Standard Proof Technique 1.14.) When, as in the proof of Proposition 1.17, two parts of a proof are very similar, it is common to only prove one part and state that the other part is similar. Throughout the text, the reader will become familiar with this idea through exercises that require the construction of proofs that are similar to proofs given in the narrative. □

The proof that infima are unique is similar (see Exercise 1-15). Because suprema and infima are unique if they exist, we speak of the supremum and the infimum. The final axiom for the real numbers now states that suprema and infima exist under mild hypotheses.
Axiom 1.19 Completeness Axiom. Every nonempty subset $S$ of $\mathbb{R}$ that has an upper bound has a lowest upper bound.

Although the Completeness Axiom formally only guarantees that nonempty subsets of $\mathbb{R}$ that are bounded above have suprema, existence of infima is a consequence.
Proposition 1.20 Let $S \subseteq \mathbb{R}$ be nonempty and bounded below. Then $S$ has a greatest lower bound.

Proof. Let $L := \{x \in \mathbb{R} : x \text{ is a lower bound of } S\}$. Then $L \neq \emptyset$. Let $s \in S$. Then for all $l \in L$ we have that $l \le s$. Because $S \neq \emptyset$ this means that $L$ is bounded above. Because $L \neq \emptyset$, by the Completeness Axiom, $L$ has a supremum $\sup(L)$. Every $s \in S$ is an upper bound of $L$, which means that $s \ge \sup(L)$ and so $\sup(L)$ is a lower bound of $S$. By definition of suprema, $\sup(L)$ is greater than or equal to all elements of $L$, that is, it is greater than or equal to all lower bounds of $S$. By definition of infima, this means that $\sup(L) = \inf(S)$. ■

We will see that suprema and infima are valuable tools in analysis on the real line. The next result shows that in any set with a supremum we can find numbers that are arbitrarily close to the supremum. This fact is important, because analysis ultimately is about objects “getting close to each other.”
Proposition 1.21 Let $S \subseteq \mathbb{R}$ be a nonempty subset of $\mathbb{R}$ that is bounded above and let $s := \sup(S)$. Then for every $\varepsilon > 0$ there is an element $x \in S$ so that $s - x < \varepsilon$.

Proof. Suppose for a contradiction that there is an $\varepsilon > 0$ so that for all $x \in S$ we have that $s - x \ge \varepsilon$. Then for all $x \in S$ we would obtain $s - \varepsilon \ge x$, that is, $s - \varepsilon$ would be an upper bound of $S$. But $s - \varepsilon < s$ contradicts the fact that $s$ is the lowest upper bound of $S$. ■

Although the supremum and infimum of a set need not be elements of the set, we have different names for them in case they are in the set.
Definition 1.22 Let $A$ be a subset of $\mathbb{R}$.

1. If $A$ is bounded above and $\sup(A) \in A$, then the supremum of $A$ is also called the maximum of $A$, denoted $\max(A)$.

2. If $A$ is bounded below and $\inf(A) \in A$, then the infimum of $A$ is also called the minimum of $A$, denoted $\min(A)$.
Although the distinctions between suprema and maxima and between infima and minima are small, the notions are distinct. For example, the open interval (0, 1) has a supremum (1) and an infimum (0), but it has neither a maximum, nor a minimum.
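The example of the open interval $(0, 1)$ can be made concrete with a small computation. The sketch below is an illustration under stated assumptions: it uses exact rationals (Python's `Fraction`) in place of real numbers, and the function names `witness_near_sup` and `larger_element` are hypothetical. The first function produces, for a given $\varepsilon > 0$, an element of $(0, 1)$ within $\varepsilon$ of the supremum 1, as promised by Proposition 1.21. The second shows why no element of $(0, 1)$ can be a maximum.

```python
from fractions import Fraction


def witness_near_sup(eps):
    """Return x in the open interval (0, 1) with 1 - x < eps, for eps > 0."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    # Halving min(eps, 1) keeps the step in (0, 1/2], so x stays in [1/2, 1).
    step = min(Fraction(eps), Fraction(1)) / 2
    return 1 - step


def larger_element(x):
    """For x in (0, 1), return a strictly larger element of (0, 1)."""
    return (x + 1) / 2


# A few tolerances, from generous to tight.
tolerances = [Fraction(5), Fraction(1, 10), Fraction(1, 10**6)]
witnesses = [witness_near_sup(eps) for eps in tolerances]
```

Every witness lies strictly below 1, and applying `larger_element` to it yields another element of $(0, 1)$ that is closer to 1, which mirrors the statement that the supremum is approached but not attained.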
Exercises

1-15. Let $A \subseteq \mathbb{R}$ be bounded below and let $s, t \in \mathbb{R}$ both be infima of $A$. Prove that $s = t$.

1-16. Approaching infima. State and prove a version of Proposition 1.21 that applies to infima. Is the proof significantly different from that of Proposition 1.21?

1-17. Let $S \subseteq \mathbb{R}$ be bounded above. Prove that $s \in \mathbb{R}$ is the supremum of $S$ iff $s$ is an upper bound of $S$ and for all $\varepsilon > 0$ there is an $x \in S$ so that $|s - x| < \varepsilon$.

1-18. Suprema and infima vs. containment of sets.
(a) Let $A, B \subseteq \mathbb{R}$ be bounded above. Prove that $A \subseteq B$ implies $\sup(A) \le \sup(B)$.
(b) Let $A, B \subseteq \mathbb{R}$ be bounded below. Prove that $A \subseteq B$ implies $\inf(A) \ge \inf(B)$.

1-19. Let $A \subseteq \mathbb{R}$ be bounded above. Prove that $\inf\{x \in \mathbb{R} : -x \in A\} = -\sup(A)$.
1.4 Natural Numbers, Integers, and Rational Numbers

Although Axioms 1.1, 1.6 and 1.19 uniquely describe the real numbers, they do not mention familiar subsets, such as natural numbers, integers, and rational numbers. This is because these sets can be constructed from the axioms as subsets of the real numbers. We start with the natural numbers, which are the unique subset with properties as stated in Theorem 1.23. While their existence is easy to establish, the uniqueness of the natural numbers can only be proved in Theorem 1.28 after some more machinery has been developed.
Theorem 1.23 There is a subset $\mathbb{N} \subseteq \mathbb{R}$, called the natural numbers, so that

1. $1 \in \mathbb{N}$.

2. For each $n \in \mathbb{N}$ the number $n + 1$ is also in $\mathbb{N}$.

3. Principle of Induction. If $S \subseteq \mathbb{N}$ is such that $1 \in S$ and for each $n \in S$ we also have $n + 1 \in S$, then $S = \mathbb{N}$.
Proof. Call a subset $A \subseteq \mathbb{R}$ a successor set iff $1 \in A$ and for all $a \in A$ we also have $a + 1 \in A$. Successor sets exist, because, for example, $\mathbb{R}$ itself is a successor set. Let $\mathbb{N}$ be the set of all elements of $\mathbb{R}$ that are in all successor sets. Because 1 is an element of every successor set, we infer $1 \in \mathbb{N}$. Moreover, if $n \in \mathbb{N}$, then $n$ is in every successor set, which means $n + 1$ is in every successor set, and hence $n + 1 \in \mathbb{N}$. Finally, any subset $S \subseteq \mathbb{N}$ as given in the Principle of Induction is a successor set. Because the elements of $\mathbb{N}$ are contained in all successor sets, we conclude that $\mathbb{N} \subseteq S$, and hence $\mathbb{N} = S$. ■
Of course, we will denote the natural numbers by their usual names $1, 2, 3, \ldots$ As algebraic objects, natural numbers are suited for addition and multiplication (see Proposition 1.24), but they are not so well suited for subtraction (see Proposition 1.25). Although all results until Theorem 1.28 are stated for $\mathbb{N}$, they hold “for every subset of $\mathbb{R}$ that satisfies the properties in Theorem 1.23.” The reader should keep this in mind and double check, because we will need it in the proof of Theorem 1.28. To avoid awkward formulations, the results up to Theorem 1.28 are formulated for $\mathbb{N}$, however.
Proposition 1.24 The natural numbers are closed under addition and multiplication. That is, if $m, n \in \mathbb{N}$, then $m + n$ and $mn$ are in $\mathbb{N}$ also.

Proof. The key to this result is the Principle of Induction. Let $m \in \mathbb{N}$ be arbitrary and let $S_m := \{n \in \mathbb{N} : m + n \in \mathbb{N}\}$. Then $m \in \mathbb{N}$ implies $m + 1 \in \mathbb{N}$, and hence $1 \in S_m$. Moreover, if $n \in S_m$, then $m + n \in \mathbb{N}$, and hence $m + (n + 1) = (m + n) + 1 \in \mathbb{N}$, which means that $n + 1 \in S_m$. By the Principle of Induction we conclude that $S_m = \mathbb{N}$. Because $m \in \mathbb{N}$ was arbitrary, this means that for any $m, n \in \mathbb{N}$ we have $m + n \in \mathbb{N}$. The proof for products is similar and left to the reader as Exercise 1-20. ■
Readers familiar with induction recognize the part “$1 \in S_m$” of the preceding proof as the base step of an induction and the part “$n \in S_m \Rightarrow n + 1 \in S_m$” as the induction step. In this section, we use the “induction on sets” as done in the preceding proof. The more commonly known Principle of Induction is introduced in Theorem 1.39.
Proposition 1.25 Let $m, n \in \mathbb{N}$ be such that $m > n$. Then $m - n \in \mathbb{N}$.

Proof. We first show that if $m \in \mathbb{N}$, then $m - 1 \in \mathbb{N}$ or $m - 1 = 0$. To do this, let $A := \{m \in \mathbb{N} : m - 1 \in \mathbb{N} \text{ or } m - 1 = 0\}$. Then $1 \in A$ and if $m \in A$, then $(m + 1) - 1 = m \in A \subseteq \mathbb{N}$, which means $m + 1 \in A$. Hence, $A = \mathbb{N}$ by the Principle of Induction.

Now let $S := \{n \in \mathbb{N} : (\forall m \in \mathbb{N} : m > n \text{ implies } m - n \in \mathbb{N})\}$. If $n = 1$ and $m \in \mathbb{N}$ satisfies $m > 1$, then $m - 1 > 0$ and so by the above $m - 1 \in \mathbb{N}$, which means $1 \in S$. Let $n \in S$. If $m > n + 1$, then $m - 1 > n$, and hence $m - (n + 1) = (m - 1) - n \in \mathbb{N}$, which means $n + 1 \in S$. By the Principle of Induction we conclude that $S = \mathbb{N}$, and hence for all $m, n \in \mathbb{N}$ we have proved that $m > n$ implies $m - n \in \mathbb{N}$. ■
Proposition 1.26 shows that the natural numbers are positive and the smallest difference between any two of them is 1.
Proposition 1.26 For all $n \in \mathbb{N}$, the inequality $n \ge 1$ holds and there is no $m \in \mathbb{N}$ so that the inequalities $n < m < n + 1$ hold.

Proof. The proof that all natural numbers are greater than or equal to 1 is left to Exercise 1-21. Now suppose for a contradiction that there is an $n \in \mathbb{N}$ and an $m \in \mathbb{N}$ so that $n < m < n + 1$. Then $m - n \in \mathbb{N}$ and $m - n < 1$, a contradiction. ■
The Well-ordering Theorem turns out to be equivalent to the Principle of Induction (see Exercise 1-22).
Theorem 1.27 Well-ordering Theorem. Every nonempty subset of $\mathbb{N}$ has a smallest element.

Proof. Suppose for a contradiction that $B \subseteq \mathbb{N}$ is not empty and does not have a smallest element. Let $S := \{n \in \mathbb{N} : (\forall m \in \mathbb{N} : m \le n \text{ implies } m \notin B)\}$. By Proposition 1.26, 1 is less than or equal to all elements of $\mathbb{N}$, so $1 \notin B$, and hence $1 \in S$. Now let $n \in S$. Then all $m \in \mathbb{N}$ with $m \le n$ are not in $B$. But then $n + 1 \in B$ would by Proposition 1.26 imply that $n + 1$ is the smallest element of $B$. Hence, $n + 1 \notin B$ and we conclude $n + 1 \in S$. By the Principle of Induction, $S = \mathbb{N}$ and consequently $B = \emptyset$, a contradiction. ■
Now we are finally ready to show that the natural numbers are unique.
Theorem 1.28 The natural numbers $\mathbb{N}$ are the unique subset of $\mathbb{R}$ that satisfies the properties in Theorem 1.23.

Proof. Examination of the proofs of all results since Theorem 1.23 reveals that any set $S \subseteq \mathbb{R}$ that satisfies the properties in Theorem 1.23 must also have the properties given in these results. It may feel tedious to go back and verify the above statement. However, mathematical presentations more often than not will ask a reader to use a modification of a known proof to prove a result (also see Standard Proof Technique 1.14). When this occurs, the reader is expected to verify that the result(s) can indeed be proved with similar methods as were used for earlier results.

Now suppose for a contradiction that there is a set $S \neq \mathbb{N}$ with properties as in Theorem 1.23. Then $S$ is a successor set, so $\mathbb{N} \subseteq S$. Let $B := S \setminus \mathbb{N} = \{s \in S : s \notin \mathbb{N}\}$. Then $B \neq \emptyset$, and hence by the Well-ordering Theorem, which is valid for $S$, $B$ has a smallest element $b$. Because $1 \in \mathbb{N}$ we infer $b > 1$, and hence by Proposition 1.25, which is valid for $S$, we have $b - 1 \in S$. But then $b - 1 \notin \mathbb{N}$, because $b - 1 \in \mathbb{N}$ would imply $b = (b - 1) + 1 \in \mathbb{N}$. Hence, $b - 1 \in B$, which is a contradiction to the fact that $b$ is the smallest element of $B$. ■
Once we have constructed the natural numbers, the next number system to consider is the integers.

Definition 1.29 The set $\mathbb{Z} := \{ m \in \mathbb{R} : m \in \mathbb{N} \text{ or } m = 0 \text{ or } -m \in \mathbb{N} \}$ is called the set of integers.
We leave several proofs of natural properties of the integers to the reader.
Proposition 1.30 The integers are closed under addition, subtraction and multiplication. Moreover, for any two integers $k, l$ with $k > l$ we have that $k - l \ge 1$, every nonempty set $A \subseteq \mathbb{Z}$ that is bounded below has a minimum, and every nonempty set $A \subseteq \mathbb{Z}$ that is bounded above has a maximum.

Proof. To prove that $\mathbb{Z}$ is closed under addition, let $m, n \in \mathbb{Z}$. In case both are natural numbers or in case one of them is zero, there is nothing to prove. Moreover, in case $-m, -n \in \mathbb{N}$ we have $m + n = -((-m) + (-n))$, which is in $\mathbb{Z}$, because $(-m) + (-n) \in \mathbb{N}$. Now consider the case $m \in \mathbb{N}$ and $-n \in \mathbb{N}$. If $m = -n$, we obtain $m + n = 0 \in \mathbb{Z}$. If $m > -n$, then by Proposition 1.25 we conclude that $m + n = m - (-n) \in \mathbb{N} \subseteq \mathbb{Z}$. Finally, if $m < -n$, again by Proposition 1.25 we conclude that $-(m + n) = (-n) - m \in \mathbb{N}$, which means by definition of $\mathbb{Z}$ that $m + n \in \mathbb{Z}$. The case $-m \in \mathbb{N}$ and $n \in \mathbb{N}$ is treated similarly (see Exercise 1-23a). Closedness under subtraction and multiplication as well as the claim about differences are left to Exercises 1-23b-1-23d.

Now let $A \subseteq \mathbb{Z}$ be nonempty and bounded below. Then, because $A \subseteq \mathbb{R}$, it has an infimum $a$. By the version of Proposition 1.21 for infima, there is an integer $m \in A$ with $m - a < 1$. Because the absolute value of the difference between any two distinct integers is at least 1, $m$ is the only integer in $[a, a + 1)$. Hence, $m$ is below all elements of $A$ that are not in $[a, a + 1)$. Because $m$ is the only element of $A$ in $[a, a + 1)$, $m$ must be the minimum of $A$. The proof of the corresponding result for nonempty subsets $A \subseteq \mathbb{Z}$ that are bounded above is left to Exercise 1-23e.
A key property of the natural numbers is that any real number is exceeded by a natural number. To prove this, we need the usual fractions, which are easily introduced.

Definition 1.31 For all $a \in \mathbb{R} \setminus \{0\}$ we set $\frac{1}{a} := a^{-1}$ and call it the reciprocal of $a$. For $b \in \mathbb{R}$ and $a \in \mathbb{R} \setminus \{0\}$ we set $\frac{b}{a} := b \cdot \frac{1}{a} = b a^{-1}$ and call it a fraction.
1. The Real Numbers
Because $\frac{1}{2} + \frac{1}{2} = 2^{-1} + 2^{-1} = (1 + 1) \cdot 2^{-1} = 2 \cdot 2^{-1} = 1$ we can now prove the following.

Theorem 1.32 For every $x \in \mathbb{R}$, there is an $n \in \mathbb{N}$ so that $n \ge x$.
Proof. For a contradiction, suppose that $x$ is such that for all $n \in \mathbb{N}$ we have that $n < x$. Then $B := \{ y \in \mathbb{R} : (\forall n \in \mathbb{N} : n < y) \}$ is not empty. Moreover, $B$ is bounded below by all $n \in \mathbb{N}$. By the Completeness Axiom, $B$ has an infimum, call it $b$. Then $b - \frac{1}{2} \notin B$, which means there is an $n \in \mathbb{N}$ with $n \ge b - \frac{1}{2}$. But then $n + 1 \ge b + \frac{1}{2}$ is a lower bound of $B$, a contradiction to $b = \inf(B)$.
Because N C Z and because subsets of Z that are bounded below have a minimum, we infer that for every real number x there is a unique smallest integer that is greater than or equal to x. Similarly there is a unique largest integer that is less than or equal to x. These numbers are useful when we need integers instead of real numbers, so we define the following.
Definition 1.33 For every $x \in \mathbb{R}$, let $\lceil x \rceil$ be the smallest integer greater than or equal to $x$. Moreover, let $\lfloor x \rfloor$ be the largest integer less than or equal to $x$. As functions from $\mathbb{R}$ to $\mathbb{Z}$, $\lceil \cdot \rceil$ is called the ceiling function and $\lfloor \cdot \rfloor$ is called the floor function.

The last subset of $\mathbb{R}$ that we introduce is the set of rational numbers. Rational numbers are naturally defined as fractions.
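Definition 1.34 The set $\mathbb{Q} := \left\{ \frac{n}{d} : n \in \mathbb{Z}, d \in \mathbb{N} \right\}$ is called the set of rational numbers. The set $\mathbb{R} \setminus \mathbb{Q} := \{ x \in \mathbb{R} : x \notin \mathbb{Q} \}$ is called the set of irrational numbers.

Proposition 1.35 The rational numbers are closed under addition, subtraction and multiplication. Moreover, if $q, r \in \mathbb{Q}$ and $r \ne 0$, then $\frac{q}{r} \in \mathbb{Q}$.

Proof. Let $m, n \in \mathbb{Z}$, let $c, d \in \mathbb{N}$ and consider the rational numbers $\frac{m}{c}$ and $\frac{n}{d}$. Then $\mathbb{Q}$ is closed under addition because
\[ \frac{m}{c} + \frac{n}{d} = m c^{-1} + n d^{-1} = m d d^{-1} c^{-1} + n c c^{-1} d^{-1} = (md + nc) c^{-1} d^{-1} = \frac{md + nc}{cd}. \]
For multiplication, note that
\[ \frac{m}{c} \cdot \frac{n}{d} = m c^{-1} n d^{-1} = m n c^{-1} d^{-1} = \frac{mn}{cd}. \]
The remainder is left to Exercise 1-24.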
Rational numbers can be found between any two real numbers and Exercise 1-45 will establish a similar result for irrational numbers.
Theorem 1.36 Let $a, b \in \mathbb{R}$ with $a < b$. Then there is a rational number $q \in \mathbb{Q}$ such that $a < q < b$.
Proof. By Theorem 1.32, there is an $n \in \mathbb{N}$ so that $0 < \frac{1}{b - a} < n$. By part 5 of Theorem 1.10, we obtain $\frac{1}{n} < b - a$. Now let $u := \min$ [...]

Theorem 1.37 [...] for all $\varepsilon > 0$ we have $x \le \varepsilon$, then $x = 0$.

Proof. Let $x$ be as indicated and suppose for a contradiction that $x > 0$. Then $\varepsilon := \frac{x}{2}$ is positive and $x \le \varepsilon = \frac{x}{2}$ implies $1 \le \frac{1}{2}$, a contradiction.
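The density statement in Theorem 1.36 can be made concrete. The following is a minimal sketch, not the construction from the proof above: it uses Theorem 1.32 to pick $n$ with $\frac{1}{n} < b - a$ and then takes the smallest integer $u$ with $\frac{u}{n} > a$. The function name `rational_between` is hypothetical, and exact `Fraction` inputs are assumed so that no rounding occurs.

```python
from fractions import Fraction
import math


def rational_between(a, b):
    """Return a Fraction q with a < q < b (hypothetical helper, assumes a < b)."""
    if not a < b:
        raise ValueError("need a < b")
    # Theorem 1.32 supplies a natural number n > 1/(b - a), so that 1/n < b - a.
    n = math.floor(1 / (b - a)) + 1
    # u is the smallest integer with u/n > a; then u/n <= a + 1/n < b.
    u = math.floor(n * a) + 1
    return Fraction(u, n)


q = rational_between(Fraction(1, 3), Fraction(1, 2))
print(q)  # 3/7
```

Here $n = 7$ because $\frac{1}{b - a} = 6$, and $u = 3$ because $\lfloor 7/3 \rfloor = 2$.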
Exercises

1-20. Prove that if $m, n \in \mathbb{N}$, then $mn \in \mathbb{N}$.
Hint. Same idea as the first part of the proof of Proposition 1.24 with sets $S_m := \{ n \in \mathbb{N} : mn \in \mathbb{N} \}$.

1-21. Prove that if $n \in \mathbb{N}$, then $n \ge 1$.
Hint. Use $S := \{ n \in \mathbb{N} : n \ge 1 \}$.

1-22. Use the Well-ordering Theorem to prove the Principle of Induction.

1-23. Finish the proof of Proposition 1.30 by proving the following.
(a) Finish the proof that $\mathbb{Z}$ is closed under addition. That is, prove that if $-m \in \mathbb{N}$ and $n \in \mathbb{N}$, then $m + n \in \mathbb{Z}$.
(b) Prove that $\mathbb{Z}$ is closed under subtraction. That is, prove that $m - n \in \mathbb{Z}$ for all $m, n \in \mathbb{Z}$.
(c) Prove that $\mathbb{Z}$ is closed under multiplication. That is, prove that $mn \in \mathbb{Z}$ for all $m, n \in \mathbb{Z}$.
(d) Prove that for any two integers $m, n$ with $m > n$ we have $m - n \ge 1$.
Hint. Find a contradiction to Proposition 1.26.
(e) Prove that every nonempty set $A \subseteq \mathbb{Z}$ that is bounded above has a maximum.

1-24. Finish the proof of Proposition 1.35. That is,
(a) Prove that $\mathbb{Q}$ is closed under subtraction.
(b) Prove that if $q, r \in \mathbb{Q}$ and $r \ne 0$, then $\frac{q}{r} \in \mathbb{Q}$.
Hint. First show that for $n \in \mathbb{Z} \setminus \{0\}$ and $d \in \mathbb{N}$ we have that $\left( \frac{n}{d} \right)^{-1} = \frac{d}{n}$.

1-25. Prove that if $a, b \in \mathbb{R}$ are such that for all $\varepsilon > 0$ we have $a \le b + \varepsilon$, then $a \le b$.

1-26. Prove that for every real number $x$ there is an integer $n$ so that $n \le x$.

1-27. Prove that for any real numbers $x, \varepsilon > 0$ there is an $n \in \mathbb{N}$ so that $\frac{x}{n} < \varepsilon$.
Hint. Theorem 1.32.

1-28. Prove that $\frac{1}{3} + \frac{1}{3} + \frac{1}{3} = 1$.
1-29. A rational number $r$ is called a dyadic rational number iff there are $p \in \mathbb{Z}$ and $n \in \mathbb{N}$ so that $r = \frac{p}{2^n}$. Dyadic rational numbers are useful in analysis because they can provide a sequence of “grids” such that each new grid contains the old one (see part 1-29a below), the whole set is the union of the “grids” (see part 1-29b) and between any two real numbers there is a dyadic rational number (see part 1-29c). Let $D$ be the set of dyadic rational numbers and for each $n \in \mathbb{N}$ let $D_n := \left\{ \frac{p}{2^n} : p \in \mathbb{Z} \right\}$.
(a) Prove that for all $n \in \mathbb{N}$ we have $D_n \subseteq D_{n+1}$.
(b) Let $\bigcup_{n=1}^{\infty} D_n := \{ x \in \mathbb{R} : (\exists n \in \mathbb{N} : x \in D_n) \}$ and prove that $D = \bigcup_{n=1}^{\infty} D_n$.
(c) Prove that for any $x, y \in \mathbb{R}$ with $x < y$ there is a dyadic rational number $d$ so that $x < d < y$.
1-30. In this exercise, we will prove that the real numbers are the (up to isomorphism) unique linearly ordered complete field. That is, we will prove that every mathematical object that satisfies Axioms 1.1, 1.6, and 1.19 is in a certain sense (defined below) “the same as $\mathbb{R}$.” First notice that, similar to the proof of Theorem 1.28, all results proved so far hold for any object that satisfies Axioms 1.1, 1.6, and 1.19 (because the results are derived from these axioms). That is, every set $\mathbb{R}$ that satisfies Axioms 1.1, 1.6, and 1.19 contains subsets $\mathbb{N}$, $\mathbb{Z}$, and $\mathbb{Q}$ that have the properties that we have proved up to now for the natural numbers, the integers and the rational numbers.
(a) Prove that for all $x \in \mathbb{R}$ we have that $x = \sup \{ r \in \mathbb{Q} : r \le x \}$.
(b) Now let $\widetilde{\mathbb{R}}$ be a set that satisfies Axioms 1.1, 1.6, and 1.19 and let $\widetilde{\mathbb{N}}$, $\widetilde{\mathbb{Z}}$, and $\widetilde{\mathbb{Q}}$ be subsets of $\widetilde{\mathbb{R}}$ that have the properties that we have proved up to now for the natural numbers, the integers and the rational numbers, including Exercise 1-30a.
i. Define a function $f : \mathbb{Q} \to \widetilde{\mathbb{Q}}$ as follows. For $n \in \mathbb{N}$, let $f(1) := \tilde{1}$ and once $f(n)$ is defined let $f(n + 1) := f(n) + \tilde{1}$. Also let $f(-n) := -f(n)$. For $n \in \mathbb{Z}$ and $d \in \mathbb{N}$ let $f\left( \frac{n}{d} \right) := \frac{f(n)}{f(d)}$. Prove that for all $x \in \mathbb{Q}$ the above definition is not self-contradictory by proving that it assigns exactly one value to each $x \in \mathbb{Q}$. Then prove that $f(x) \in \widetilde{\mathbb{Q}}$ for each $x \in \mathbb{Q}$ and that $f$ preserves the order, that is, if $x < z$, then $f(x) < f(z)$.
ii. For $x \in \mathbb{R}$ let $f(x) := \sup \{ f(r) : r \in \mathbb{Q} \text{ and } r \le x \}$. Prove that for all $x \in \mathbb{R}$ the above definition is not self-contradictory by proving it assigns exactly one value to each $x \in \mathbb{R}$. (Formally this says that $f$ is well-defined.)
iii. Prove that the above function does not map any two points to the same image by proving that for all $x, y \in \mathbb{R}$ the inequality $x \ne y$ implies that $f(x) \ne f(y)$. (Formally, this says that the function $f$ is one-to-one or injective.)
iv. Prove that the above function “reaches” every element of $\widetilde{\mathbb{R}}$ by proving that for all $\tilde{x} \in \widetilde{\mathbb{R}}$ there is an $x \in \mathbb{R}$ so that $f(x) = \tilde{x}$. (Formally, this says that the function $f$ is onto or surjective.)
v. Prove that the above function is consistent with the algebraic operations by proving that for all $x, y \in \mathbb{R}$ we have that $f(x + y) = f(x) + f(y)$ and $f(xy) = f(x) \cdot f(y)$. (Formally, this says that $f$ is a field isomorphism.)
vi. Prove that the above function is consistent with the order relation by proving that for all $x, y \in \mathbb{R}$ we have that $x \le y$ implies that $f(x) \le f(y)$. (Formally, this says that $f$ is an order isomorphism.)
The above steps show that the points and operations in $\mathbb{R}$ and in $\widetilde{\mathbb{R}}$ can be identified with each other in such a way that it does not matter if we are working in $\mathbb{R}$ or in $\widetilde{\mathbb{R}}$. Thus for all intents and purposes, $\mathbb{R}$ and $\widetilde{\mathbb{R}}$ are “the same.” This is the essence of saying that the real numbers are up to isomorphism the unique linearly ordered, complete field.
1.5 Recursion, Induction, Summations, and Products

A recursive definition defines an entity $X_n$ that depends on a natural number $n$ first for $n = 1$ and then it defines $X_{n+1}$ in terms of $X_n$. By the Principle of Induction the set $S = \{ n \in \mathbb{N} : X_n \text{ is defined} \}$ is equal to $\mathbb{N}$, which means that a recursive definition defines the entity $X_n$ for all natural numbers $n$. In this fashion, the sum of finitely many numbers can be defined.

Definition 1.38 For each $j \in \mathbb{N}$ let $a_j \in \mathbb{R}$. Define the sum $\sum_{j=1}^{1} a_j := a_1$ and for $n \in \mathbb{N}$ define the sum $\sum_{j=1}^{n+1} a_j := a_{n+1} + \sum_{j=1}^{n} a_j$. For $m \in \mathbb{N} \cup \{0\}$, set $\sum_{j=1}^{-m} a_j := 0$. The parameter $j$ is also called the summation index.

In particular, note that a sum whose index starts at 1 and ends at a number smaller than 1 is always zero. It is also called an empty sum. Summations that start at numbers other than 1 are defined similarly (Exercise 1-31).

By their nature, recursive definitions are closely linked to induction. Unlike what is stated in Theorem 1.23, induction normally is used to prove statements about natural numbers. This is possible, because a proof that a statement is true for all natural numbers is the same as a proof that a certain set is equal to $\mathbb{N}$.
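The recursive definition of sums in Definition 1.38 translates almost literally into a recursive function. The sketch below is only an illustration: the name `recursive_sum` is hypothetical, the terms $a_j$ are assumed to be given as a Python function of the index, and the recursion is only practical for moderate $n$.

```python
def recursive_sum(a, n):
    """Sum a(1) + ... + a(n), following the recursive definition of sums.

    a is a function from indices to numbers; n is an integer upper index.
    """
    if n < 1:
        return 0          # empty sum: upper index smaller than 1
    if n == 1:
        return a(1)       # first case of the definition
    # recursive case: split off the last term
    return a(n) + recursive_sum(a, n - 1)


print(recursive_sum(lambda j: j, 4))      # 10
print(recursive_sum(lambda j: j * j, 0))  # 0
```

Each call either hits one of the two base cases or reduces the upper index by one, which mirrors how the Principle of Induction guarantees that the sum is defined for every $n$.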
Theorem 1.39 Principle of Induction. Let $P(n)$ be a statement about the natural number $n$. If $P(1)$ is true and if for all $n \in \mathbb{N}$ truth of $P(n)$ implies truth of $P(n + 1)$, then $P(n)$ holds for all natural numbers.

Proof. Let $P$ be as indicated and consider the set $S := \{ n \in \mathbb{N} : P(n) \text{ is true} \}$. Then $1 \in S$. For every $n \in S$ the statement $P(n)$ is true, hence $P(n + 1)$ is true, which means $n + 1 \in S$. By Theorem 1.23 we conclude $S = \mathbb{N}$ and thus $P(n)$ is true for all $n \in \mathbb{N}$.
Standard Proof Technique 1.40 In the form of Theorem 1.39, induction is a standard proof technique. It involves a two-step process. In the first step, called the base step, $P(1)$ is proved. Then, in the induction step, $P(n)$ is used to prove $P(n + 1)$. In this context, $P(n)$ is also called the induction hypothesis. All proofs in this section rely on induction. Moreover, Exercise 1-32 exhibits another way to carry out an induction (sometimes called strong induction).
Example 1.41 For all $n \in \mathbb{N}$, the summation formula $\sum_{j=1}^{n} j = \frac{1}{2} n (n + 1)$ holds.

Proof. The statement is $P(n) =$ “$\sum_{j=1}^{n} j = \frac{1}{2} n (n + 1)$.”

Base step. We prove $P(n)$ for $n = 1$. $\sum_{j=1}^{1} j = 1 = \frac{1}{2} \cdot 1 \cdot (1 + 1)$, so $P(1)$ holds.

Induction step. Under the induction hypothesis $\sum_{j=1}^{n} j = \frac{1}{2} n (n + 1)$ we must prove $\sum_{j=1}^{n+1} j = \frac{1}{2} (n + 1)((n + 1) + 1)$. A standard step in induction for recursively defined quantities is to split off the last term. This is done in the first step here.
\[ \sum_{j=1}^{n+1} j = (n + 1) + \sum_{j=1}^{n} j = (n + 1) + \frac{1}{2} n (n + 1) = \frac{1}{2} \cdot 2 (n + 1) + \frac{1}{2} n (n + 1) = \frac{1}{2} (n + 2)(n + 1). \]

Further examples of similar inductions can be found in Exercise 1-33.
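A finite computation cannot replace the induction proof, but it is a useful sanity check on a conjectured formula before attempting the proof. The following sketch compares the sum computed term by term with the closed form from Example 1.41, and also checks the algebraic identity used in the induction step. The function names are hypothetical.

```python
def sum_term_by_term(n):
    """Compute 1 + 2 + ... + n by adding one term at a time."""
    total = 0
    for j in range(1, n + 1):
        total += j
    return total


def closed_form(n):
    """The closed form n(n+1)/2; n(n+1) is always even, so // is exact."""
    return n * (n + 1) // 2


def induction_step_identity(n):
    """Check (n+1) + n(n+1)/2 == (n+1)(n+2)/2, written without fractions."""
    return 2 * (n + 1) + n * (n + 1) == (n + 2) * (n + 1)


# The check covers only finitely many n; the proof covers all of them.
print(all(sum_term_by_term(n) == closed_form(n) for n in range(1, 201)))
print(all(induction_step_identity(n) for n in range(1, 201)))
```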
Similar to sums we can define products. Although products occur less frequently than sums, they are useful to define powers.
Definition 1.42 For each $j \in \mathbb{N}$, let $a_j \in \mathbb{R}$. Define the product $\prod_{j=1}^{1} a_j := a_1$ and for all $n \in \mathbb{N}$ define the product $\prod_{j=1}^{n+1} a_j := a_{n+1} \cdot \prod_{j=1}^{n} a_j$. For all $m \in \mathbb{N} \cup \{0\}$, set $\prod_{j=1}^{-m} a_j := 1$. The parameter $j$ is also called the product index.
Products that start at numbers other than 1 are defined similarly (Exercise 1-31). Products that end at an index that is smaller than the starting index are set to 1 and are also called empty products.
Definition 1.43 For all $a \in \mathbb{R}$, and all $n \in \mathbb{N} \cup \{0\}$, we define the $n$th power $a^n := \prod_{j=1}^{n} a$.
Aside from integer powers of numbers, we want to work with rational powers. To define rational powers, we need nth roots of nonnegative real numbers. To formally prove their existence, we need the Binomial Theorem. As a start we need binomial coefficients and one of their key properties.
Definition 1.44 For all $n \in \mathbb{N} \cup \{0\}$, we define $n! := \prod_{j=1}^{n} j$ and call it the factorial of $n$. For all $n, k \in \mathbb{N} \cup \{0\}$ with $k \le n$, we define the binomial coefficient as $\binom{n}{k} := \frac{n!}{k! (n - k)!}$.

Theorem 1.45 The equation $\binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k}$ holds for all $n, k \in \mathbb{N}$ with $k \le n$.

Proof. This result can be proved by direct computation.
\[ \binom{n}{k-1} + \binom{n}{k} = \frac{n!}{(k-1)! (n - (k-1))!} + \frac{n!}{k! (n-k)!} = \frac{n! \, k}{k! (n+1-k)!} + \frac{n! (n+1-k)}{k! (n+1-k)!} = \frac{n! (k + n + 1 - k)}{k! (n+1-k)!} = \frac{(n+1)!}{k! (n+1-k)!}. \]
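The identity in Theorem 1.45 is easy to spot-check directly from Definition 1.44. The sketch below builds the binomial coefficient from factorials exactly as in the definition. The names `binom` and `pascal_rule_holds` are hypothetical, and the check is a finite test, not a proof.

```python
from math import factorial


def binom(n, k):
    """Binomial coefficient n! / (k! (n-k)!) for integers 0 <= k <= n."""
    return factorial(n) // (factorial(k) * factorial(n - k))


def pascal_rule_holds(n, k):
    """Check binom(n, k-1) + binom(n, k) == binom(n+1, k) for 1 <= k <= n."""
    return binom(n, k - 1) + binom(n, k) == binom(n + 1, k)


print(all(pascal_rule_holds(n, k)
          for n in range(1, 31)
          for k in range(1, n + 1)))
```

Integer division is exact here because $k!(n-k)!$ always divides $n!$.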
Now we are ready to prove the Binomial Theorem.
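Theorem 1.46 The Binomial Theorem. For all real numbers $a, b \in \mathbb{R}$, and all $n \in \mathbb{N}$, we have
\[ (a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}. \]

Proof. Throughout the proof we will freely use the properties of sums proved in Exercise 1-34. The proof is by induction on $n$, with $P(n)$ being the statement about $(a + b)^n$.

Base step. For $n = 1$, note that
\[ (a + b)^1 = a + b = \binom{1}{0} a^0 b^1 + \binom{1}{1} a^1 b^0 = \sum_{k=0}^{1} \binom{1}{k} a^k b^{1-k}, \]
which proves the base step.

Induction step. Assuming that the result holds for $n$, we must prove it for $n + 1$. First note that it follows easily from Definition 1.43 that for all $x \in \mathbb{R}$ and all $m \in \mathbb{N}$ we have $x \cdot x^m = x^{m+1}$.
\begin{align*}
(a + b)^{n+1} &= (a + b)(a + b)^n \\
&= (a + b) \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} \\
&= \sum_{k=0}^{n} \binom{n}{k} a^{k+1} b^{n-k} + \sum_{k=0}^{n} \binom{n}{k} a^k b^{n+1-k} \\
&= \binom{n}{n} a^{n+1} b^{n+1-(n+1)} + \binom{n}{0} a^0 b^{n+1-0} + \sum_{j=1}^{n} \left[ \binom{n}{j-1} + \binom{n}{j} \right] a^j b^{n+1-j} \\
&= \binom{n+1}{n+1} a^{n+1} b^{n+1-(n+1)} + \binom{n+1}{0} a^0 b^{n+1-0} + \sum_{j=1}^{n} \binom{n+1}{j} a^j b^{n+1-j} \\
&= \sum_{j=0}^{n+1} \binom{n+1}{j} a^j b^{n+1-j}.
\end{align*}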
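The Binomial Theorem can be checked with exact rational arithmetic, which avoids any floating-point rounding. The following is a small sketch under that assumption. The function name `binomial_expansion` is hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb


def binomial_expansion(a, b, n):
    """Right-hand side of the Binomial Theorem: sum of C(n,k) a^k b^(n-k)."""
    return sum(comb(n, k) * a**k * b**(n - k) for k in range(n + 1))


a, b = Fraction(2, 3), Fraction(-5, 7)
print(all(binomial_expansion(a, b, n) == (a + b)**n for n in range(1, 13)))
```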
With the Binomial Theorem, we can prove that $n$th roots exist. The proof of Theorem 1.47 is the first proof in this text in which we have to choose a number to make another number smaller than a given bound. That is, this is our first proof with a distinct analytical flavor.

Theorem 1.47 Let $n \in \mathbb{N}$. For every nonnegative real number $a$, there exists a unique nonnegative real number $r$ such that $r^n = a$.
Proof. We first prove the existence of $r$. Let $R := \{ x \in \mathbb{R} : x \ge 0 \text{ and } x^n \le a \}$. Then $0 \in R$ and $R$ is bounded above by $\max\{1, a\}$. Let $r := \sup(R)$. To show that $r^n = a$, we will show that $r^n \not< a$ and $r^n \not> a$. First, suppose for a contradiction that $r^n < a$. Then there is a $\delta > 0$ so that $r^n + \delta < a$. By Theorem 1.32 (or Exercise 1-27), for each $k \in \{1, \ldots, n\}$ we can find an $m_k \in \mathbb{N}$ so that $\binom{n}{k} r^{n-k} \frac{1}{m_k} < \frac{\delta}{n}$. Let $m := \max\{m_1, \ldots, m_n\}$. Then by the Binomial Theorem we conclude
\[ \left( r + \frac{1}{m} \right)^n = r^n + \sum_{k=1}^{n} \binom{n}{k} r^{n-k} \frac{1}{m^k} \le r^n + \sum_{k=1}^{n} \binom{n}{k} r^{n-k} \frac{1}{m_k} < r^n + \sum_{k=1}^{n} \frac{\delta}{n} = r^n + \delta < a. \]
The above shows that $r + \frac{1}{m} \in R$, contradicting the fact that $r = \sup(R)$. Hence, $r^n \not< a$. The proof that $r^n \not> a$ is similar and left to the reader as Exercise 1-36.

For uniqueness, suppose for a contradiction that there is another $b \ge 0$ with $b^n = a$. Then $b < r$ or $b > r$. But if $b > r$, then with $\delta := b - r$ we obtain $a = b^n = (r + \delta)^n = r^n + \sum_{k=1}^{n} \binom{n}{k} r^{n-k} \delta^k > a$, a contradiction. Hence, $b < r$. But then with $\delta := r - b$ we have $a = r^n = (b + \delta)^n = b^n + \sum_{k=1}^{n} \binom{n}{k} b^{n-k} \delta^k > a$, a contradiction. Therefore $r$ is unique.

We conclude by defining rational powers of nonnegative numbers and by proving some of their properties.
Definition 1.48 Let $n \in \mathbb{N}$ and let $a \in \mathbb{R}$ be nonnegative. The unique nonnegative real number $r$ such that $r^n = a$ is called the $n$th root of $a$, denoted $\sqrt[n]{a}$. For $n = 2$ the root is called the square root, denoted $\sqrt{a}$.

Existence of $n$th roots is another property that distinguishes $\mathbb{R}$ from $\mathbb{Q}$. Although Theorem 1.36 indicates that there are “many” rational numbers, the rational number system has some shortcomings when it comes to powers.
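The proof of Theorem 1.47 obtains the $n$th root as the supremum of the set $R = \{ x \ge 0 : x^n \le a \}$. That supremum can be approximated using rational arithmetic only. The sketch below uses bisection, which is a different technique from the proof and is swapped in purely for illustration. The name `nth_root_bounds` and the parameter `steps` are hypothetical.

```python
from fractions import Fraction


def nth_root_bounds(a, n, steps=40):
    """Return rationals (lo, hi) with lo**n <= a <= hi**n.

    The gap hi - lo equals max(1, a) / 2**steps, so both values
    approximate the supremum of {x >= 0 : x**n <= a}.
    """
    a = Fraction(a)
    if a < 0 or n < 1:
        raise ValueError("need a >= 0 and n >= 1")
    lo = Fraction(0)               # 0 is in the set R
    hi = max(Fraction(1), a)       # max{1, a} is an upper bound of R
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid**n <= a:
            lo = mid               # mid is an element of R
        else:
            hi = mid               # mid is an upper bound of R
    return lo, hi


lo, hi = nth_root_bounds(2, 2)
print(float(lo), float(hi))
```

For $a = 2$ and $n = 2$ every `lo` and `hi` is rational, so neither squares to exactly 2, while the bounds close in on the real number $\sqrt{2}$.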
Proposition 1.49 There is no rational number $r$ such that $r^2 = 2$.

Proof. We first prove by induction as stated in Exercise 1-32 (strong induction) that if $n^2 = 2z$ for some $z \in \mathbb{N}$, then $n = 2z'$ for some $z' \in \mathbb{N}$. The base step for $n = 1$ is vacuously true. That is, because the hypothesis $1^2 = 2z$ leads to the contradiction $1 = 1^2 = 2z = z + z > 1$, the hypothesis is never true, which means that the implication is automatically true (see Definition A.2 in Appendix A). For the induction step, first note that the result is trivial for $n = 2$, because $2 = 2 \cdot 1$. Now assume that $n > 2$ and the statement has been proved for all natural numbers less than $n$. Then $2z = n^2 = (n - 2 + 2)^2 = (n - 2)^2 + 4(n - 2) + 4$ implies that $(n - 2)^2 = 2\tilde{z}$ for some $\tilde{z} \in \mathbb{N}$. By induction hypothesis, we conclude that $n - 2 = 2\hat{z}$ for some $\hat{z} \in \mathbb{N}$, and hence $n = 2\hat{z} + 2 = 2z'$ for some $z' \in \mathbb{N}$. This proves that if $n^2 = 2z$ for some $z \in \mathbb{N}$, then $n = 2z'$ for $z' = \hat{z} + 1 \in \mathbb{N}$.

Now suppose for a contradiction that there are $n \in \mathbb{Z}$ and $d \in \mathbb{N}$ so that $\left( \frac{n}{d} \right)^2 = 2$ and such that there is no $k \in \mathbb{N} \setminus \{1\}$ such that $n = n_k \cdot k$ and $d = d_k \cdot k$. But by the above $n^2 = 2d^2$ implies $n = n_2 \cdot 2$. Consequently, $2d^2 = (n_2 \cdot 2)^2$, that is, $d^2 = n_2^2 \cdot 2$, which implies $d = d_2 \cdot 2$, a contradiction.

We conclude from Theorem 1.47 and Proposition 1.49 that $\sqrt{2}$ is irrational. For odd natural numbers (that is, natural numbers of the form $n = 2k + 1$), it is possible to define the $n$th root of a negative number $a < 0$ as $\sqrt[n]{a} := -\sqrt[n]{-a}$. For the most part, powers are considered for nonnegative numbers, though.
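Definition 1.50 For all real numbers $a \ge 0$, all $m \in \mathbb{N} \cup \{0\}$, $n \in \mathbb{N}$ and all $q \in \mathbb{Q}$ with $q > 0$ we define
1. $a^{\frac{1}{n}} := \sqrt[n]{a}$. That is, the $\left( \frac{1}{n} \right)$th power of $a$ is the $n$th root of $a$.
2. $a^{\frac{m}{n}} := (a^m)^{\frac{1}{n}}$.
3. $a^{-q} := (a^q)^{-1} = \frac{1}{a^q}$ for $a \ne 0$.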
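Theorem 1.51 For all positive numbers $a$ and $b$ and all rational numbers $x$ and $y$, the following power laws hold:
\[ a^x a^y = a^{x+y}, \qquad \frac{a^x}{a^y} = a^{x-y}, \qquad (ab)^x = a^x b^x, \qquad \left( \frac{a}{b} \right)^x = \frac{a^x}{b^x}, \qquad (a^x)^y = a^{xy}. \]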
Proof. We first prove $(ab)^x = a^x b^x$. For exponents $n \in \mathbb{N}$, this is an easy induction. The base step $n = 1$ is trivial and the induction step from $n$ to $n + 1$ is $(ab)^{n+1} = ab (ab)^n = ab \, a^n b^n = a a^n b b^n = a^{n+1} b^{n+1}$. For rational exponents $\frac{n}{d}$ with $n, d \in \mathbb{N}$, we have that $\left( (ab)^{\frac{n}{d}} \right)^d = (ab)^n$ and $\left( a^{\frac{n}{d}} b^{\frac{n}{d}} \right)^d = \left( a^{\frac{n}{d}} \right)^d \left( b^{\frac{n}{d}} \right)^d = a^n b^n = (ab)^n$. Note that in both equalities we used the definition of fractional powers, not the power law that we are currently proving. Because all numbers involved are positive and $d$th roots are unique, we conclude that $(ab)^{\frac{n}{d}} = a^{\frac{n}{d}} b^{\frac{n}{d}}$. For $x = 0$, the equality $(ab)^x = a^x b^x$ is trivial. Finally, for all positive $x \in \mathbb{Q}$ we note $(ab)^{-x} a^x b^x = (ab)^{-x} (ab)^x = 1$. Therefore $(ab)^{-x}$ is the multiplicative inverse of $a^x b^x$, that is, $(ab)^{-x} = a^{-x} b^{-x}$. Thus $(ab)^x = a^x b^x$ for all $a, b > 0$ and all $x \in \mathbb{Q}$.

To prove that $a^{x+y} = a^x a^y$ we proceed similarly. For exponents $m, n \in \mathbb{N}$, the proof for arbitrary $m$ is an induction on $n$. The base step $a^m a^1 = a^m a = a^{m+1}$ follows straight from the definition of powers with natural exponents. For the induction step from $n$ to $n + 1$, note that $a^m a^{n+1} = a^m a^n a = a^{m+n} a = a^{m+(n+1)}$, which proves the result for exponents $m, n \in \mathbb{N}$. For positive rational exponents $x$ and $y$, note that there are $m, n, d \in \mathbb{N}$ so that $x = \frac{m}{d}$ and $y = \frac{n}{d}$. Then, using the equality we already proved, we obtain
\[ a^x a^y = a^{\frac{m}{d}} a^{\frac{n}{d}} = (a^m)^{\frac{1}{d}} (a^n)^{\frac{1}{d}} = (a^m a^n)^{\frac{1}{d}} = (a^{m+n})^{\frac{1}{d}} = a^{\frac{m+n}{d}} = a^{x+y}. \]
The equality is trivial if one of $x$ and $y$ is zero. In case both exponents are negative, note that for all positive $x, y \in \mathbb{Q}$ we have $a^{x+y} a^{-x} a^{-y} = a^x a^y a^{-x} a^{-y} = 1$, which means $a^{-x} a^{-y} = a^{-x+(-y)}$ as was to be proved. This leaves the case in which one exponent is positive and the other is negative. Let $x, y \in \mathbb{Q}$ be positive and consider $a^{x-y}$. If the inequality $|x| > |y|$ holds we have $a^{x-y} a^y a^{-x} = a^x a^{-x} = 1$, which means that $a^{x-y} = a^x a^{-y}$. If $|x| < |y|$ we have $a^{y-x} a^x a^{-y} = a^y a^{-y} = 1$, which means that $a^{y-x} = a^y a^{-x}$. If $|x| = |y|$ the claim is trivial. Thus $a^{x+y} = a^x a^y$ for all $a > 0$ and all $x, y \in \mathbb{Q}$. We leave the remaining three equalities as Exercise 1-37. Power laws for $a \le 0$ and $b \le 0$ (as applicable) can be proved similarly.

To conclude, note that the results presented in this chapter guarantee that the real numbers have the properties we expect them to have. We will therefore use the usual notation (fractions, etc.) and laws of algebra throughout this text without further qualms about the need to justify that we are indeed allowed to do so.
Exercises

1-31. Let $k, m \in \mathbb{Z}$ and for each $j \in \mathbb{Z}$ let $a_j \in \mathbb{R}$. Define the sum $\sum_{j=k}^{m} a_j$ and the product $\prod_{j=k}^{m} a_j$.

1-32. Let $P(n)$ be a statement about the natural number $n$. Prove that if $P(1)$ is true and if for all $n \in \mathbb{N} \setminus \{1\}$ truth of $P(1), \ldots, P(n - 1)$ implies truth of $P(n)$, then $P(n)$ holds for all natural numbers. This type of induction is sometimes called strong induction.
Hint. Consider $S := \{ n \in \mathbb{N} : (\forall k < n : P(k) \text{ holds}) \}$.

1-33. Prove each of the following by induction.
(a) [...]
(b) $\sum_{j=1}^{n} j^3 = \frac{1}{4} n^2 (n + 1)^2$
(c) [...]
(d) Bernoulli's inequality. Prove that for all real numbers $x > -1$, $x \ne 0$ and $n \ge 2$ we have that $(1 + x)^n > 1 + nx$.

1-34. Properties of sums and products. Let $c \in \mathbb{R}$ and for all $j \in \mathbb{N}$ let $a_j$ and $b_j$ be real numbers.
(a) Prove that for all $n \in \mathbb{N}$ we have $\sum_{j=1}^{n} (a_j + b_j) = \sum_{j=1}^{n} a_j + \sum_{j=1}^{n} b_j$.
(b) Prove that for all $n \in \mathbb{N}$ we have $\sum_{j=1}^{n} (c a_j) = c \sum_{j=1}^{n} a_j$.
(c) Prove that for all $n \in \mathbb{N}$ we have $\sum_{j=1}^{n} 1 = n$.
(d) Prove that for all $n \in \mathbb{N}$ we have $\prod_{j=1}^{n} (a_j \cdot b_j) = \left( \prod_{j=1}^{n} a_j \right) \cdot \left( \prod_{j=1}^{n} b_j \right)$.

1-35. Reindexing sums. Let $s \in \mathbb{Z}$, $n \in \mathbb{N}$ and for $j \in \mathbb{Z}$ let $a_j \in \mathbb{R}$. Prove that $\sum_{i=s}^{s+n} a_i = \sum_{k=1}^{n+1} a_{k+s-1}$.
1-36. Finish the proof of Theorem 1.47 by showing that $r^n \not> a$.
Hint. Suppose $r^n > a$ and prove that then for some $\varepsilon > 0$ and all $\delta \in (0, \varepsilon)$ we have $r - \delta \notin R$.

1-37. Finish the proof of Theorem 1.51. That is, let $a$ and $b$ be positive real numbers, let $x, y \in \mathbb{Q}$ and prove each of the following.
(a) [...]
(b) [...]
(c) $(a^x)^y = a^{xy}$

1-38. Let $0 \le a < b$ and let $q > 0$ be rational. Prove that $a^q < b^q$.

1-39. Let $a, x \in (0, \infty)$ and let $x$ be a rational number.
(a) Prove that if $a > 1$ and $x > 1$, then $a^x > a$.
Hint. Let $p, q \in \mathbb{N}$ be so that $x = \frac{p}{q}$ and compare $a^p$ and $a^q$.
(b) Prove that if $a < 1$ and $x < 1$, then $a^x > a$.
(c) Prove that if $a > 1$ and $x < 1$, then $a^x < a$.
(d) Prove that if $a < 1$ and $x > 1$, then $a^x < a$.

1-40. Let $n \in \mathbb{N}$. Prove that $\binom{n}{0} = 1$ and that $\binom{n}{n} = 1$.

1-41. Prove that there is no rational number $r$ such that $r^2 = 3$.

1-42. Prove that for any $n$ real numbers $x_1, \ldots, x_n$ the inequality [...]

1-43. Prove that for all $a, b \ge 0$ the inequality $\sqrt{ab} \le \frac{a + b}{2}$ holds.

1-44.
(a) Prove that for all $a, b \in \mathbb{R}$ we have $\sqrt{a^2 + b^2} \le |a| + |b|$.
(b) Prove that for any $a_1, \ldots, a_n \in \mathbb{R}$ we have [...]

1-45. Let $a, b \in \mathbb{R}$ with $a < b$. Prove that there is an irrational number $x \in \mathbb{R} \setminus \mathbb{Q}$ such that $a < x < b$.
Hint. Use that $\sqrt{2}$ is irrational and Exercise 1-27 and mimic the proof of Theorem 1.36.
Chapter 2
Sequences of Real Numbers

Convergence is the fundamental concept of analysis. It explores what happens when two quantities get close to each other, or when a quantity grows beyond all bounds. This chapter exhibits these ideas for sequences, with special emphasis on standard proof techniques.

2.1 Limits

We start by defining sequences.
Definition 2.1 A sequence of real numbers is a function $f$ from the natural numbers to the real numbers. To emphasize their discrete nature, we denote sequences as $\{a_n\}_{n=1}^{\infty}$ with the understanding that $a_n = f(n)$ for all $n \in \mathbb{N}$.

Similar to sums and products, a sequence can actually start at any integer $k$ (Exercise 2-1). The limit of a sequence should be the place where the sequence “stabilizes” for large $n$. Definition 2.2 encodes this property by demanding that for every given tolerance $\varepsilon$, there is a threshold $N$ so that once the running index $n$ has gone past the threshold $N$, the sequence can only deviate from the limit by less than the tolerance $\varepsilon$.
Definition 2.2 Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of real numbers. Then $L \in \mathbb{R}$ is called limit of $\{a_n\}_{n=1}^{\infty}$ iff for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have that $|a_n - L| < \varepsilon$ (see Figure 2). A sequence that has a limit will be called convergent, a sequence that does not have a limit will be called divergent.

[Figure 2 shows a number line marked at $L - \varepsilon$, $L$, and $L + \varepsilon$: finitely many $n$ are mapped below $L - \varepsilon$, all $n \ge N$ (a “tail” of $\mathbb{N}$) are mapped to $(L - \varepsilon, L + \varepsilon)$, and finitely many $n$ are mapped above $L + \varepsilon$.]

Figure 2: Visualization of convergence to $L$. For every $\varepsilon > 0$ a “tail” of the sequence is in $(L - \varepsilon, L + \varepsilon)$.
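The interplay of tolerance and threshold in the definition of a limit can be tried out on a simple sequence. The sketch below uses the sequence $a_n = \frac{1}{n}$ with limit $L = 0$, which is an illustrative choice and not an example from the text. The function name `threshold_for_reciprocals` is hypothetical. By Theorem 1.32 there is a natural number $N > \frac{1}{\varepsilon}$, and any such $N$ works as the threshold.

```python
from fractions import Fraction
import math


def threshold_for_reciprocals(eps):
    """Return N such that |1/n - 0| < eps for all n >= N (requires eps > 0)."""
    eps = Fraction(eps)
    if eps <= 0:
        raise ValueError("eps must be positive")
    # Smallest natural number strictly greater than 1/eps.
    return math.floor(1 / eps) + 1


eps = Fraction(1, 100)
N = threshold_for_reciprocals(eps)
print(N)  # 101
# Spot-check part of the tail; the inequality 1/n <= 1/N < eps covers the rest.
print(all(abs(Fraction(1, n) - 0) < eps for n in range(N, N + 1000)))
```

Smaller tolerances force larger thresholds, and the threshold depends on $\varepsilon$ alone, which is the order of quantifiers the definition prescribes.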
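Remark 2.3 It can be helpful to restate the definition using quantifiers (see Definition A.3 in Appendix A). $L \in \mathbb{R}$ is a limit of $\{a_n\}_{n=1}^{\infty}$ iff
\[ \forall \varepsilon > 0 : \exists N \in \mathbb{N} : \forall n \ge N : |a_n - L| < \varepsilon. \]

[...] for all $\varepsilon > 0$ the inequality $|L - M| < \varepsilon$ holds. Let $\varepsilon > 0$ be arbitrary but fixed. Then there is an $N_1 \in \mathbb{N}$ such that for all $n \ge N_1$ we have that $|a_n - L| < \frac{\varepsilon}{2}$. There also is an $N_2 \in \mathbb{N}$ such that for all $n \ge N_2$ we have $|a_n - M| < \frac{\varepsilon}{2}$. Let $N := \max\{N_1, N_2\}$. Because $N \ge N_1$, for all $n \ge N$ we have $|a_n - L| < \frac{\varepsilon}{2}$. Because $N \ge N_2$, for all $n \ge N$ we have $|a_n - M| < \frac{\varepsilon}{2}$. Then by adding and subtracting $a_N$ and applying the triangular inequality we obtain (with $n = N$)
\[ |L - M| = |L - a_N + a_N - M| \le |L - a_N| + |a_N - M| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]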
Because for arbitrary $\varepsilon > 0$ we have the inequality $|L - M| < \varepsilon$, by Theorem 1.37 we conclude that $|L - M| = 0$, and hence $L = M$.

We will ultimately read and produce proofs that are much more complicated than the proof of Proposition 2.4. Therefore it is only appropriate to analyze how such proofs can be conceived. The standard proof techniques discussed later in this section reveal that certain details are indeed standard techniques which simply need to be internalized and used at the right time. Other than that, the novice usually is impressed by the sometimes “strange” choices for $\varepsilon$. The reason why $\frac{\varepsilon}{2}$ is chosen in the proof of Proposition 2.4 is that the proof is actually created backwards. Consider the following. To show that $L = M$ we first note that because the $a_n$ are eventually close to $L$ and close to $M$, we can put an $a_n$ with a sufficiently large index between $L$ and $M$. After applying the triangular inequality
the resulting differences should be small. By Theorem 1.37, if for all $\varepsilon > 0$ we can make the difference $|L - M|$ less than or equal to $\varepsilon$ then $|L - M| = 0$, that is, $L = M$. So we want to make the sum of the differences $|L - a_n|$ and $|a_n - M|$ smaller than $\varepsilon$. It is most natural to make each of the two terms smaller than $\frac{\varepsilon}{2}$ to obtain
\[ |L - M| = |L - a_N + a_N - M| \le |L - a_N| + |a_N - M| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
This argument provides the final few lines of the proof. Note that up to here we have not chosen any $N_1$, $N_2$, or $N$. However, now that we have the “meat” of the argument, it is easy to create the “header.” To make $|a_n - L|$ and $|a_n - M|$ smaller than $\frac{\varepsilon}{2}$, we use that by the definition of convergence there are $N_1$ and $N_2$ as mentioned in the proof and choose $N$ to be their maximum so that both required inequalities hold for indices beyond $N$. Note that even though the “header” is the first thing we encounter, it is often the last thing that materializes as a proof is created. So, to set up an analysis proof it is standard practice to start by working with inequalities. Once the inequalities work, we create a “header” with the appropriate choices for $\varepsilon$, $n$, and so on.
Standard proof techniques in analysis. Certain steps occur so frequently in analysis proofs that they should become second nature. In this fashion, communication becomes more effective because memory is less strained to recall details of proofs. This is a cognitive technique commonly known as “chunking” of data. By internalizing certain standard “chunks,” larger amounts of data can be recalled, because we only need to recall which chunks are involved rather than all details. Unlike the standard proof techniques listed so far, from here on most standard proof techniques will be specific to analysis. The standard techniques used in the proof of Proposition 2.4 are listed below.

Standard Proof Technique 2.5 It is common practice to rearrange terms and to add and subtract the same term to obtain more manageable expressions. When working with absolute values, this is often done in conjunction with the triangular inequality. In Proposition 2.4 this is the step
\[ |L - M| = |L - a_N + a_N - M| \le |L - a_N| + |a_N - M|. \]
Such a step is usually abbreviated as $|L - M| \le |L - a_N| + |a_N - M|$. In other computations, we will see that it can also be useful to multiply and divide by the same nonzero term.

Standard Proof Technique 2.6 If finitely many numbers $N_1, \ldots, N_k \in \mathbb{N}$ are such that for all $n \ge N_i$ a certain inequality holds, we can choose $N := \max\{N_1, \ldots, N_k\}$. Then for all $n \ge N$ all these inequalities hold. We usually claim directly that such an $N$ exists, skipping the intermediate $N_i$.

Standard Proof Technique 2.7 To prove that two quantities are equal, we can prove that for any $\varepsilon > 0$ the absolute value of the difference is less than $\varepsilon$. This is usually done without explicit reference to Theorem 1.37. To prove an inequality $a \le b$ we often prove that for all $\varepsilon > 0$ the inequality $a \le b + \varepsilon$ holds (see Exercise 1-25).
Standard Proof Technique 2.8 In many analysis proofs we prove results about a universally quantified variable, often denoted $\varepsilon$. To prove such results, we pick one such $\varepsilon$ that is “arbitrary, but fixed,” throughout the proof. It must be fixed throughout the proof so we can uniquely define quantities that depend on it, and it must be arbitrary so that we really prove something about all variables in the scope of the universal quantification. Once the result is proved, we can conclude that “Because $\varepsilon$ was arbitrary we have proved . . . for all such $\varepsilon$.” This final statement reiterates that, even though we made specific choices for the $\varepsilon$ in the proof, we can indeed make these choices for all $\varepsilon$, which proves the universally quantified statement. Because this approach is so common, the bracketing statements about the variable $\varepsilon$ are usually left out or abbreviated.

Standard Proof Technique 2.9 Finally, statements like “We need to prove $L = M$,” that are put at the start to remind the reader what we will prove are often left out. Similarly, statements put at the end to reiterate what we have proved are often left out, too.

To phase in these techniques, we will first carry them out explicitly and give a reference to the above list. Then we will omit the explicit step, but still refer to the appropriate entry in the above list. Eventually a proof for something like Proposition 2.4 will condense to the following.
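“Expert Proof” of Proposition 2.4. Let $\varepsilon > 0$. Then there is an $N \in \mathbb{N}$ such that for all $n \ge N$ the inequalities $|a_n - L| < \frac{\varepsilon}{2}$ and $|a_n - M| < \frac{\varepsilon}{2}$ hold. Therefore
\[ |L - M| \le |L - a_N| + |a_N - M| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]

[...]

[...] Let $\varepsilon > 0$ be arbitrary but fixed, and let $N \in \mathbb{N}$ be such that $N > \frac{\frac{17}{\varepsilon} - 10}{4}$. Then for all $n \ge N$ the following holds.
\[ \left| \frac{3n - 1}{2n + 5} - \frac{3}{2} \right| = \frac{17}{4n + 10} \le \frac{17}{4N + 10} < \varepsilon. \]
Because $\varepsilon > 0$ was arbitrary, this proves that $\lim_{n \to \infty} \frac{3n - 1}{2n + 5} = \frac{3}{2}$.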
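Exercises

2-1. Let $k \in \mathbb{Z}$ and for each $n \in \mathbb{Z}$ with $n \ge k$ let $a_n \in \mathbb{R}$. Define the sequence $\{a_n\}_{n=k}^{\infty}$ and define what it means for $L \in \mathbb{R}$ to be its limit.
2-2. Prove Theorem 2.11.
2-3. Write out the argument that produced the choice for $N$ in Example 2.12.
2-4. Let $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ be sequences such that for all $n \in \mathbb{N}$ we have $|a_n - b_n| \ldots$ Prove that if $\{a_n\}_{n=1}^{\infty}$ converges then so does $\{b_n\}_{n=1}^{\infty}$, and $\lim_{n\to\infty} a_n = \lim_{n\to\infty} b_n$.
2-5. Let $\{a_n\}_{n=1}^{\infty}$ ...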
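... for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have the inequality $|(a_n + b_n) - (L + M)| < \varepsilon$. Let $\varepsilon > 0$ be arbitrary but fixed (see Standard Proof Technique 2.8). Because $\lim_{n\to\infty} a_n = L$ there is an $N_L \in \mathbb{N}$ so that for all $n \ge N_L$ we have $|a_n - L| < \frac{\varepsilon}{2}$. Similarly, because $\lim_{n\to\infty} b_n = M$ there is an $N_M \in \mathbb{N}$ so that for all $n \ge N_M$ we have $|b_n - M| < \frac{\varepsilon}{2}$. Let $N := \max\{N_L, N_M\}$. Then (compare with Standard Proof Technique 2.6) for all $n \ge N$ the inequalities $|a_n - L| < \frac{\varepsilon}{2}$ and $|b_n - M| < \frac{\varepsilon}{2}$ hold. By the triangle inequality, for all $n \ge N$ we obtain
$$|(a_n + b_n) - (L + M)| \le |a_n - L| + |b_n - M| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$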
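... 0. Prove that $\lim_{n\to\infty} \frac{\ldots}{n!} = 0$.
2-15. Conjecture the value of $\lim_{n\to\infty} \frac{n!}{n^n}$ and then prove your conjecture.
2-16. Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of real numbers and let $\{p_n\}_{n=1}^{\infty}$ be a sequence of positive real numbers so that $\lim_{n\to\infty} \frac{1}{\sum_{k=1}^{n} p_k} = 0$.
(a) Prove that if $\{a_n\}_{n=1}^{\infty}$ converges to $a \in \mathbb{R}$, then $\lim_{n\to\infty} \frac{\sum_{k=1}^{n} p_k a_k}{\sum_{k=1}^{n} p_k} = a$.
(b) Give an example to show that the convergence of $\left\{ \frac{\sum_{k=1}^{n} p_k a_k}{\sum_{k=1}^{n} p_k} \right\}_{n=1}^{\infty}$ need not imply the convergence of $\{a_n\}_{n=1}^{\infty}$.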
Figure 3: An injective function maps all elements of the domain to distinct images, but some elements of the range may not have a preimage (a). For a surjective function, every element of the range has a preimage, but some elements of the domain may be mapped to the same image (b). A bijective function maps all elements of the domain to distinct images and each element of the range has a preimage (c). This is why the existence of a bijection between two sets indicates that the two sets are “of the same size” (see Definitions 2.25 and 7.11).
2.3 Cauchy Sequences

A sequence so that for all $\varepsilon > 0$ all elements with a sufficiently large index are within $\varepsilon$ of each other should converge. Indeed, this condition guarantees that the elements with large indices cluster ever more tightly. However, the number system may have a hole just where these elements cluster. For example, the sequence 1.4, 1.41, 1.414, 1.4142, . . . of successively better decimal approximations of $\sqrt{2}$ does not converge in $\mathbb{Q}$ because the value that the sequence approaches is not in $\mathbb{Q}$ (see Exercise 2-17). This section shows that this problem does not arise in the real numbers. Sequences for which elements with large indices cluster ever more tightly play an important role in analysis. They are called Cauchy sequences.
Definition 2.23 Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of real numbers. Then $\{a_n\}_{n=1}^{\infty}$ is called a Cauchy sequence iff for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $m, n \ge N$ we have that $|a_m - a_n| < \varepsilon$.

In the real numbers, convergence and being a Cauchy sequence are equivalent. Before we can prove this result, we need to define finite and infinite sets.
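To make the quantifiers in Definition 2.23 concrete, the following is a minimal numerical sketch, not part of the text's development. It assumes the illustrative sequence $a_n = 1/n$; the names `a`, `cauchy_threshold`, and `check_window` are hypothetical. A finite spot check can only support the claim that a given $N$ works; it does not prove it.

```python
from fractions import Fraction

def a(n):
    """Illustrative sequence a_n = 1/n (an assumption), kept exact with Fraction."""
    return Fraction(1, n)

def cauchy_threshold(eps):
    """Return an N with |a_m - a_n| < eps for all m, n >= N, valid for a_n = 1/n.

    For m, n >= N both terms lie in (0, 1/N], so |a_m - a_n| < 1/N.
    Hence any N with 1/N <= eps works; we return the smallest such N.
    """
    N = 1
    while Fraction(1, N) > eps:
        N += 1
    return N

def check_window(N, eps, width=200):
    """Spot-check the Cauchy inequality on the finite index window N..N+width."""
    idx = range(N, N + width + 1)
    return all(abs(a(m) - a(n)) < eps for m in idx for n in idx)

eps = Fraction(1, 100)
N = cauchy_threshold(eps)
print(N, check_window(N, eps))
```

The threshold is derived from a bound that holds for all indices, which is what the definition requires; the window check is only a sanity test of that derivation.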
Definition 2.24 Let $A, B$ be sets and let $f : A \to B$ be a function. Then $f$ is called injective or one-to-one iff for all $x, y \in A$ the inequality $x \ne y$ implies $f(x) \ne f(y)$. $f$ is called surjective or onto iff for all $b \in B$ there is an $a \in A$ with $f(a) = b$. Finally, $f$ is called bijective iff $f$ is both injective and surjective.

Figure 3 gives a visualization of injective, surjective, and bijective functions, and some properties of injective and surjective functions are investigated in Exercises 2-18 and 2-19. Once we have bijective functions, we can define finite sets as “sets of size $n$,” where $n \in \mathbb{N} \cup \{0\}$.
Definition 2.25 A set $F$ is called finite iff $F$ is empty or there is an $n \in \mathbb{N}$ and a bijective function $f : \{1, \ldots, n\} \to F$. Sets that are not finite are called infinite. For finite sets $F \ne \emptyset$ we set $|F| := n$ with $n$ as above and we set $|\emptyset| := 0$. For infinite sets $I$ we set $|I| := \infty$.

Lemma 2.26 Let $F$ be a finite set and let $I$ be an infinite set. Then $I \setminus F$ is infinite.
Proof. In case $F$ is empty, there is nothing to prove. In case $F$ is not empty let $n \in \mathbb{N}$ be so that there is a bijective function $f : \{1, \ldots, n\} \to F$. Suppose for a contradiction that $I \setminus F$ is finite. Then there are a natural number $m \in \mathbb{N}$ and a bijective function $g : \{1, \ldots, m\} \to I \setminus F$. Define the function $h : \{1, \ldots, m + n\} \to I$ by
$$h(j) := \begin{cases} f(j); & \text{if } j \le n, \\ g(j - n); & \text{if } j > n. \end{cases}$$
Then it is easy to show that $h$ is bijective (Exercise 2-20). But this means that $I$ is finite, a contradiction.
Theorem 2.27 A sequence $\{a_n\}_{n=1}^{\infty}$ of real numbers converges iff it is a Cauchy sequence.
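Proof. For the direction “$\Rightarrow$,” let $L := \lim_{n\to\infty} a_n$. We need to prove that for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $m, n \ge N$ we have $|a_m - a_n| < \varepsilon$. Let $\varepsilon > 0$. Then there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|a_n - L| < \frac{\varepsilon}{2}$. Therefore for all $m, n \ge N$ we obtain
$$|a_m - a_n| \le |a_m - L| + |L - a_n| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$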
We have proved that for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $m, n \ge N$ we have $|a_m - a_n| < \varepsilon$. Hence, $\{a_n\}_{n=1}^{\infty}$ is a Cauchy sequence. Once more we have used Standard Proof Technique 2.5. It is so common, and we have used it often enough, that we will no longer explicitly refer back to it.

“$\Leftarrow$:” Let $\{a_n\}_{n=1}^{\infty}$ be a Cauchy sequence. We need to prove that there is an $L \in \mathbb{R}$ so that for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|a_n - L| < \varepsilon$.

We first need to find a suitable number $L$. Because $\{a_n\}_{n=1}^{\infty}$ is a Cauchy sequence, for $\varepsilon := 1$ there is an $N \in \mathbb{N}$ such that for all $m, n \ge N$ we have $|a_m - a_n| < 1$. In particular, for all $n \ge N$ we obtain $|a_n - a_N| < 1$, and hence $a_n < a_N + 1$. Therefore the set $\{n \in \mathbb{N} : a_n \le a_N + 1\}$ is infinite and thus $\{x \in \mathbb{R} : \{n \in \mathbb{N} : a_n \le x\} \text{ is infinite}\} \ne \emptyset$. For all $n \ge N$ we also have that $a_n > a_N - 1$, so that for all $x \le a_N - 1$ the set $\{n \in \mathbb{N} : a_n \le x\}$ is finite. Therefore $\{x \in \mathbb{R} : \{n \in \mathbb{N} : a_n \le x\} \text{ is infinite}\}$ is bounded below. This means $L := \inf\{x \in \mathbb{R} : \{n \in \mathbb{N} : a_n \le x\} \text{ is infinite}\}$ exists by Proposition 1.20. The idea how to obtain $L$ is visualized in Figure 4.

To prove that $L$ is the limit of $\{a_n\}_{n=1}^{\infty}$ we need to prove that for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|a_n - L| < \varepsilon$. Let $\varepsilon > 0$. By definition of $L$, the set $H_- := \left\{n \in \mathbb{N} : a_n \le L - \frac{\varepsilon}{2}\right\}$ is finite and the set $H_+ := \left\{n \in \mathbb{N} : a_n \le L + \frac{\varepsilon}{2}\right\}$ is infinite. Therefore, by Lemma 2.26 the relative complement $H_+ \setminus H_- = \left\{n \in \mathbb{N} : a_n \in \left(L - \frac{\varepsilon}{2}, L + \frac{\varepsilon}{2}\right]\right\}$ is infinite. Because
Figure 4: Visualization of the construction of $L$ in the proof of Theorem 2.27. (The figure marks the cutoff point $L$ on the real line: infinitely many $n$ are mapped into any $(-\infty, x]$ with $x > L$, finitely many $n$ are mapped into any $(-\infty, x]$ with $x < L$, and only finitely many $n$ are mapped below $a_N - 1$ or above $a_N + 1$.)
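$\{a_n\}_{n=1}^{\infty}$ is a Cauchy sequence, there is an $N \in \mathbb{N}$ such that for all $m, n \ge N$ we have $|a_m - a_n| < \frac{\varepsilon}{2}$. Moreover, because $\left\{n \in \mathbb{N} : a_n \in \left(L - \frac{\varepsilon}{2}, L + \frac{\varepsilon}{2}\right]\right\}$ is infinite, there is a $k \ge N$ so that $|a_k - L| \le \frac{\varepsilon}{2}$. Therefore for all $n \ge N$ we obtain
$$|a_n - L| \le |a_n - a_k| + |a_k - L| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
Because $\varepsilon$ was arbitrary we have proved that for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|a_n - L| < \varepsilon$. Hence, $\lim_{n\to\infty} a_n = L$. ■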
Standard Proof Technique 2.28 Application of the Completeness Axiom to get the infimum or supremum of the “right” set is a standard technique on the real line. We will also see this approach in the proofs of Theorems 2.37, 2.41, 3.34, and 8.4. In abstract spaces, this technique is not available and we usually substitute compactness (see Definition 16.57), a property which, for closed and bounded intervals on the real line, is a consequence of the Completeness Axiom (see Theorems 2.41 and 8.4). □

Convergence of Cauchy sequences is a fundamental analytical property called “completeness,” which is introduced in Section 16.2. Another way of formulating Theorem 2.27 is to say that the real numbers are complete. Although Axiom 1.19 is already called the Completeness Axiom of the real numbers, this terminology makes sense, because Exercise 2-25 shows that Theorem 2.27 and Axiom 1.19 are equivalent. This means that either one of them could rightly be called the Completeness Axiom. There are many other equivalent formulations of the Completeness Axiom. We will encounter some more of them in Theorems 2.37, 2.41, and 8.4 as well as in Exercise 2-50c. Whenever we encounter one of these formulations, there will be an exercise similar to Exercise 2-25 to show that the new result is equivalent to one of the equivalent formulations of the Completeness Axiom that we already know.

Aside from the fundamental importance of completeness, Theorem 2.27 has immediate value for showing that a sequence is divergent. By Definition 2.2 a sequence is divergent iff (using negations as stated in Appendix A.2) for every real number $L$ there is an $\varepsilon > 0$ such that for all $N \in \mathbb{N}$ there is an $n \ge N$ so that $|a_n - L| \ge \varepsilon$. This is a four times nested quantification that would require us to show for every real number that it is not the limit of the sequence. Theorem 2.27 reduces proofs of divergence to showing the sequence is not a Cauchy sequence.
Example 2.29 The sequence $\{(-1)^n\}_{n=1}^{\infty}$ diverges.

To prove that the sequence diverges, we need to prove that it is not a Cauchy sequence. This means (see Appendix A.2) we must find an $\varepsilon > 0$ so that for all $N \in \mathbb{N}$ there are $m, n \ge N$ so that $|a_m - a_n| \ge \varepsilon$. But with $\varepsilon := 1$ for every $N \in \mathbb{N}$ we have that $\left|(-1)^N - (-1)^{N+1}\right| = 2 > 1$. Hence, $\{(-1)^n\}_{n=1}^{\infty}$ is not a Cauchy sequence and therefore it diverges. □
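The witness pair used in Example 2.29 can be written down explicitly. The sketch below is only an illustration: the helper name `witness` is hypothetical, and the loop checks finitely many thresholds $N$, whereas the argument in the example covers all of them.

```python
def a(n):
    """The sequence from Example 2.29: a_n = (-1)^n."""
    return (-1) ** n

def witness(N):
    """Return indices m, n >= N whose terms are at least eps = 1 apart.

    Consecutive terms of (-1)^n always have opposite signs, so the
    pair (N, N + 1) works for every threshold N.
    """
    return N, N + 1

eps = 1
for N in range(1, 1001):
    m, n = witness(N)
    assert m >= N and n >= N
    assert abs(a(m) - a(n)) >= eps  # the gap is exactly 2 each time
```

Note that the witness depends on $N$, matching the order of the quantifiers in the negated Cauchy condition.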
Standard Proof Technique 2.30 In Example 2.29, we had to negate the statement “for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ such that for all $m, n \ge N$ we have $|a_m - a_n| < \varepsilon$.” When negating such a complicated statement, it is helpful to write the statement in quantifiers and then negate it. In this fashion, the definition of a Cauchy sequence is
$$\forall \varepsilon > 0 : \exists N \in \mathbb{N} : \forall m, n \ge N : |a_m - a_n| < \varepsilon,$$
and the negation becomes (see Appendix A.2)
$$\exists \varepsilon > 0 : \forall N \in \mathbb{N} : \exists m, n \ge N : |a_m - a_n| \ge \varepsilon,$$
which is what was needed in Example 2.29. The schematic way in which quantified statements can be negated is very helpful, especially the first few times one negates a complicated statement. However, the quantifiers must not become a crutch. It is advisable to first try the negation verbally and then double check with quantifiers.
Standard Proof Technique 2.31 To prove that a sequence of real numbers converges, we often simply prove that it is a Cauchy sequence. To prove that a sequence of real numbers diverges, we prove that it is not a Cauchy sequence. □
Exercises

2-17. Prove that $\{\ldots\}_{n=1}^{\infty}$ is a Cauchy sequence of rational numbers whose limit is not a rational number.
2-18. Let $A, B$ be sets and let $f : A \to B$ be a function. Prove that $f$ is injective iff for all $a_1, a_2 \in A$ we have that $f(a_1) = f(a_2)$ implies $a_1 = a_2$.
2-19. Compositions of injective and surjective functions. Let $A, B, C$ be sets and let $f : B \to C$ and $g : A \to B$ be functions. The composition of $f$ and $g$ is defined by $f \circ g(a) := f(g(a))$ for all $a \in A$.
(a) Prove that if $f$ and $g$ are injective, then so is $f \circ g$.
(b) Prove that if $f$ and $g$ are surjective, then so is $f \circ g$.
2-20. Prove that the function $h$ in the proof of Lemma 2.26 is bijective.
2-21. State the definition of a convergent sequence using quantifiers.
2-22. For each of the following sequences, prove that it converges or prove that it diverges.
2-23. Existence of the limit on the left side of a limit law as in Theorem 2.14 does not imply the existence of the limits on the right side.
(a) Use $\{a_n\}_{n=1}^{\infty} := \{(-1)^n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty} := \{(-1)^{n+1}\}_{n=1}^{\infty}$ to show that $\lim_{n\to\infty} a_n + b_n$ can exist without either sequence being convergent.
(b) Show that $\lim_{n\to\infty} a_n - b_n$ can exist without either sequence being convergent.
(c) Show that $\lim_{n\to\infty} a_n \cdot b_n$ can exist without either sequence being convergent.
(d) Show that $\lim_{n\to\infty} \frac{a_n}{b_n}$ can exist without either sequence being convergent.
2-24. Can a Cauchy sequence have two limits? Explain your answer.
2-25. Use the fact that Cauchy sequences converge in the real numbers and the axioms for $\mathbb{R}$ except for Axiom 1.19 to prove Axiom 1.19.
Hint. Let $S \subseteq \mathbb{R}$ be bounded above and not empty. Construct a Cauchy sequence $\{a_n\}_{n=1}^{\infty}$ so that for all $x \in S$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequality $a_n \ge x$ holds.
2.4 Bounded Sequences If the elements of a sequence cannot become arbitrarily large, we speak of a bounded sequence. Unlike being a Cauchy sequence, boundedness is not equivalent to convergence, but it still has some important consequences.
Definition 2.32 A sequence $\{a_n\}_{n=1}^{\infty}$ is called bounded above iff there is a number $A \in \mathbb{R}$ such that for all $n \in \mathbb{N}$ the inequality $a_n \le A$ holds. In this case, $A$ is also called an upper bound of the sequence. A sequence $\{a_n\}_{n=1}^{\infty}$ is called bounded below iff there is a $B \in \mathbb{R}$ such that for all $n \in \mathbb{N}$ the inequality $a_n \ge B$ holds. In this case, $B$ is also called a lower bound of the sequence. Finally, a sequence is called bounded iff it is bounded above and bounded below and we call it unbounded if not.

Example 2.33 The sequence $\{\ldots\}_{n=1}^{\infty}$ is bounded, while the sequence $\{n\}_{n=1}^{\infty}$ is not bounded. □
Proposition 2.34 Any convergent sequence of real numbers is bounded.

Proof. Let $\{a_n\}_{n=1}^{\infty}$ be a convergent sequence and let $L := \lim_{n\to\infty} a_n$. We need to prove that there is a number $M \in \mathbb{R}$ so that for all $n \in \mathbb{N}$ the inequality $|a_n| \le M$ holds. Let $\varepsilon > 0$. Then there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|a_n - L| < \varepsilon$. Let $M := \max\{|L| + \varepsilon, |a_1|, \ldots, |a_{N-1}|\}$. Then for all $n < N$ we trivially have $|a_n| \le M$ and for $n \ge N$ we obtain $|a_n| \le |a_n - L| + |L| < \varepsilon + |L| \le M$. We have proved that $\{a_n\}_{n=1}^{\infty}$ is bounded below by $-M$ and above by $M$. ■
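The bound $M$ constructed in the proof of Proposition 2.34 can be computed for a concrete sequence. The sketch below assumes the illustrative sequence $a_n = 3 + (-1)^n \cdot 5/n$ with $L = 3$, $\varepsilon = 1$, and $N = 6$ (so that $5/n < 1$ for all $n \ge N$); these choices and the name `bound_from_proof` are assumptions made for the example only.

```python
from fractions import Fraction

def a(n):
    """Illustrative convergent sequence a_n = 3 + (-1)^n * 5/n with limit 3."""
    return 3 + Fraction((-1) ** n * 5, n)

# Assumed data: the limit, a fixed epsilon, and a threshold N with
# |a_n - L| = 5/n < eps for all n >= N.
L, eps, N = 3, 1, 6

def bound_from_proof(a, L, eps, N):
    """M := max{|L| + eps, |a_1|, ..., |a_{N-1}|} as in the proof."""
    return max([abs(L) + eps] + [abs(a(n)) for n in range(1, N)])

M = bound_from_proof(a, L, eps, N)
print(M)  # the finitely many early terms may dominate |L| + eps
```

In this example the early term $a_2$ is larger than $|L| + \varepsilon$, which shows why the finitely many terms before the threshold must be included in the maximum.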
In general, the converse of Proposition 2.34 is not true as the next example shows.
Example 2.35 The sequence $\{(-1)^n\}_{n=1}^{\infty}$ is bounded, but it does not converge (see Example 2.29). □
For monotone sequences, however, boundedness does imply convergence.
Definition 2.36 Let $\{a_n\}_{n=1}^{\infty}$ be a sequence. Then $\{a_n\}_{n=1}^{\infty}$ is called nondecreasing iff for all $n \in \mathbb{N}$ we have $a_n \le a_{n+1}$. It is called nonincreasing iff for all $n \in \mathbb{N}$ we have $a_n \ge a_{n+1}$. If $\{a_n\}_{n=1}^{\infty}$ is either nonincreasing or nondecreasing, then it is called monotone. Moreover, $\{a_n\}_{n=1}^{\infty}$ is called (strictly) increasing iff for all $n \in \mathbb{N}$ we have $a_n < a_{n+1}$ and it is called (strictly) decreasing iff for all $n \in \mathbb{N}$ we have $a_n > a_{n+1}$.

The sequence $\{n\}_{n=1}^{\infty}$ shows that nondecreasing sequences can grow beyond all bounds. But if this is not the case, a monotone sequence converges. The key to the proof is Standard Proof Technique 2.28.
Theorem 2.37 Monotone Sequence Theorem. If $\{a_n\}_{n=1}^{\infty}$ is bounded and monotone, then $\{a_n\}_{n=1}^{\infty}$ converges. More precisely,

1. If $\{a_n\}_{n=1}^{\infty}$ is bounded above and nondecreasing, then it converges.
2. If $\{a_n\}_{n=1}^{\infty}$ is bounded below and nonincreasing, then it converges.
Proof. We only prove part 1. The proof of part 2 is similar (Exercise 2-30). To prove part 1, let $\{a_n\}_{n=1}^{\infty}$ be bounded above and nondecreasing. Then the set $\{a_n : n \in \mathbb{N}\}$ is bounded above, and hence by Axiom 1.19 it has a supremum $L$. To prove that $L$ is the limit of the sequence, we must prove that for every $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequality $|a_n - L| < \varepsilon$ holds. Let $\varepsilon > 0$. By Proposition 1.21 there is an $N \in \mathbb{N}$ with $a_N > L - \varepsilon$. But then for all $n \ge N$ we have $L - \varepsilon < a_N \le a_n \le L$, and hence $|a_n - L| < \varepsilon$. We have proved that for every $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|a_n - L| < \varepsilon$. Hence, $\lim_{n\to\infty} a_n = L$. ■
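The key step of the proof, finding one index $N$ with $a_N > L - \varepsilon$ and then using monotonicity, can be traced on an example. The sketch below assumes the illustrative nondecreasing sequence $a_n = 1 - 1/n$, whose supremum $L = 1$ is taken as known; the function name `first_index_above` is hypothetical.

```python
from fractions import Fraction

def a(n):
    """Illustrative nondecreasing sequence a_n = 1 - 1/n, bounded above by 1."""
    return 1 - Fraction(1, n)

L = 1  # supremum of {a_n : n in N}, assumed known for this example

def first_index_above(eps):
    """Smallest N with a_N > L - eps; it exists because L is the supremum."""
    N = 1
    while not a(N) > L - eps:
        N += 1
    return N

eps = Fraction(1, 1000)
N = first_index_above(eps)
# Monotonicity then traps every later term in (L - eps, L].
print(N, all(L - eps < a(n) <= L for n in range(N, N + 500)))
```

Only one index has to be located; monotonicity together with the upper bound $L$ handles all later terms, which is exactly how the proof avoids checking infinitely many indices.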
Although boundedness does not imply convergence, it forces the sequence to “cluster” in some places. To make this idea more precise, we need the notion of a subsequence.
Definition 2.38 Let $A, B, C$ be sets and let $f : B \to C$ and $g : A \to B$ be functions. The composition of $f$ and $g$ is defined by $f \circ g(a) := f(g(a))$ for all $a \in A$.

Definition 2.39 Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of real numbers and let $\{n_k\}_{k=1}^{\infty}$ be a strictly increasing sequence of natural numbers. Then $\{a_{n_k}\}_{k=1}^{\infty}$ is called a subsequence of $\{a_n\}_{n=1}^{\infty}$. Formally, a subsequence is the composition of the function that maps $k$ to $n_k$ and the function that maps $n$ to $a_n$.

Convergence is what happens when the indices get large. To obtain a notion that is useful to analyze convergence behavior, in the definition of a subsequence we had to specifically demand that the $n_k$ are strictly increasing. If we had allowed $n_k = n_{k+1}$, then by choosing $\{n_k\}_{k=1}^{\infty}$ to be a constant sequence, any sequence would have infinitely many convergent “subsequences.” This would be counterintuitive, because a sequence such as $\left\{\frac{1}{n}\right\}_{n=1}^{\infty}$ would have a “subsequence” that converges to 1 even though the sequence itself converges to 0.
With the definition as it is, subsequences behave sensibly when the sequence converges.
Proposition 2.40 Let $\{a_n\}_{n=1}^{\infty}$ be a convergent sequence of real numbers with limit $L$. Then every subsequence $\{a_{n_k}\}_{k=1}^{\infty}$ also converges to $L$.

Proof. Let $\{a_n\}_{n=1}^{\infty}$ be a convergent sequence of real numbers with limit $L$ and let $\{a_{n_k}\}_{k=1}^{\infty}$ be a subsequence. We must prove that for all $\varepsilon > 0$ there is a $K \in \mathbb{N}$ so that for all $k \ge K$ we have $|a_{n_k} - L| < \varepsilon$. Let $\varepsilon > 0$. Because $n_k < n_{k+1}$ for all $k \in \mathbb{N}$, an easy induction shows that $n_k \ge k$ for all $k \in \mathbb{N}$ (Exercise 2-31). Because $\{a_n\}_{n=1}^{\infty}$ converges, there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|a_n - L| < \varepsilon$. Therefore for all $k \ge N$ we obtain $n_k \ge k \ge N$, and hence $|a_{n_k} - L| < \varepsilon$. We have proved that for all $\varepsilon > 0$ there is a $K \in \mathbb{N}$ so that for all $k \ge K$ we have $|a_{n_k} - L| < \varepsilon$. Hence, $\{a_{n_k}\}_{k=1}^{\infty}$ converges to $L$, too. ■
The precise statement of the idea that a bounded sequence of real numbers must “cluster” somewhere is that a bounded sequence of real numbers must have a convergent subsequence. This is an important property of the real numbers, which is ultimately encoded in the notion of compactness (see Section 16.5). The proof of the Bolzano-Weierstrass Theorem utilizes Standard Proof Technique 2.28.
Theorem 2.41 Bolzano-Weierstrass Theorem. Any bounded sequence of real numbers has a convergent subsequence.
Proof. Let $\{a_n\}_{n=1}^{\infty}$ be a bounded sequence and let $b > 0$ be such that for all $n \in \mathbb{N}$ we have $|a_n| < b$. Then $b$ is a real number so that $\{n \in \mathbb{N} : a_n \le b\} = \mathbb{N}$ and $\{n \in \mathbb{N} : a_n \le -b\} = \emptyset$. Therefore $b$ is contained in the set of real numbers $\{x \in \mathbb{R} : \{n \in \mathbb{N} : a_n \le x\} \text{ is infinite}\}$. Moreover, this set is bounded below by $-b$. Hence, the infimum $L := \inf\{x \in \mathbb{R} : \{n \in \mathbb{N} : a_n \le x\} \text{ is infinite}\}$ exists by Proposition 1.20.

We will prove that $L$ is the limit of a subsequence of $\{a_n\}_{n=1}^{\infty}$. To do this, we will employ Standard Proof Technique 2.22 and find a subsequence $\{a_{n_k}\}_{k=1}^{\infty}$ so that $|a_{n_k} - L|$ is bounded by a sequence that goes to zero. For each $k \in \mathbb{N}$ the set $H_k^+ := \left\{n \in \mathbb{N} : a_n \le L + \frac{1}{k}\right\}$ is infinite and the set $H_k^- := \left\{n \in \mathbb{N} : a_n \le L - \frac{1}{k}\right\}$ is finite. Therefore the set $T_k := \left\{n \in \mathbb{N} : a_n \in \left(L - \frac{1}{k}, L + \frac{1}{k}\right]\right\} = H_k^+ \setminus H_k^-$ is infinite for each $k \in \mathbb{N}$ (see Lemma 2.26).

Construct $\{n_k\}_{k=1}^{\infty}$ inductively as follows. Because $T_1$ is infinite, it is not empty and we let $n_1 \in T_1$. Once $n_k$ was chosen, let $n_{k+1}$ be any natural number in $T_{k+1}$ that is greater than $n_k$. Such a natural number exists, because $T_{k+1}$ is infinite. Then $\{a_{n_k}\}_{k=1}^{\infty}$ is a subsequence of $\{a_n\}_{n=1}^{\infty}$ and for all $k \in \mathbb{N}$ we have $|a_{n_k} - L| \le \frac{1}{k}$, because $n_k \in T_k$. By the Squeeze Theorem this means $\lim_{k\to\infty} |a_{n_k} - L| = 0$, and hence $\lim_{k\to\infty} a_{n_k} = L$. ■
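The inductive choice of the indices $n_k$ in the proof can be imitated for a sequence where the infimum $L$ is known in advance. The sketch below assumes the illustrative sequence $a_n = (-1)^n (1 + 1/n)$, for which $L = -1$ (only the odd-indexed terms fall below any $x \ge -1$ infinitely often). In general $L$ is not computable by a finite search, so this is an illustration of the construction and not an algorithm for arbitrary bounded sequences; the names `in_T` and `build_indices` are hypothetical.

```python
from fractions import Fraction

def a(n):
    """Illustrative bounded divergent sequence a_n = (-1)^n * (1 + 1/n)."""
    return (-1) ** n * (1 + Fraction(1, n))

L = -1  # inf{x : a_n <= x for infinitely many n}, assumed known here

def in_T(n, k):
    """Membership test for T_k = {n : a_n in (L - 1/k, L + 1/k]}."""
    return L - Fraction(1, k) < a(n) <= L + Fraction(1, k)

def build_indices(K):
    """Pick n_1 < n_2 < ... < n_K with n_k in T_k, always taking the least candidate.

    The inner search terminates because each T_k is infinite.
    """
    indices, prev = [], 0
    for k in range(1, K + 1):
        n = prev + 1
        while not in_T(n, k):
            n += 1
        indices.append(n)
        prev = n
    return indices

idx = build_indices(5)
print(idx, [abs(a(n) - L) for n in idx])
```

Taking the least admissible index at each step is one possible choice; the proof allows any element of $T_{k+1}$ beyond $n_k$.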
Figure 5: Implications between the various notions related to convergence that are introduced in this chapter. Implications are indicated with arrows and the examples near the arrows indicate that the opposite implication does not hold.

The Bolzano-Weierstrass Theorem is a useful tool in single variable analysis. We will see examples of its use in the proofs of Theorem 3.44 and Lemma 5.19. For many sets of properties, it is instructive to explore which property implies which other properties and which properties are equivalent. Figure 5 summarizes the properties introduced in this chapter (including those from the next section) and how they are related to each other.
Exercises 2-26. For each of the sequences below determine if it is bounded, and then prove your claim.
2-27. For the given sequence $\{a_n\}_{n=1}^{\infty}$, find an expression for the terms of the subsequence $\{a_{n_k}\}_{k=1}^{\infty}$ where $n_k$ is as indicated.
2-28. Use Proposition 2.40 to prove that each of the sequences below diverges.
2-29. Explain why we could have chosen E = 1 in the proof of Proposition 2.34. Then explain why we cannot choose E = 1 in a general convergence proof. 2-30. Prove part 2 of the Monotone Sequence Theorem. 2-31. Perform the induction mentioned in the proof of Proposition 2.40. First state exactly what it is that you prove, then execute the proof.
2-32. Sketch a visualization for the construction of $L$ in the proof of the Bolzano-Weierstrass Theorem that is similar to Figure 4.
2-33. Let $x \in \mathbb{R}$ and let $\{x_n\}_{n=1}^{\infty}$ be a sequence of real numbers that does not converge to $x$. Prove that there is an $\varepsilon > 0$ and a subsequence $\{x_{n_k}\}_{k=1}^{\infty}$ so that for all $k \in \mathbb{N}$ we have $|x_{n_k} - x| \ge \varepsilon$.
2-34. Let $\{x_n\}_{n=1}^{\infty}$ be a sequence of real numbers that has no convergent subsequence. Prove that for each $x \in \mathbb{R}$ there is an $\varepsilon_x > 0$ so that $\{n \in \mathbb{N} : |x_n - x| < \varepsilon_x\}$ is finite.
2-35. Let $L \in \mathbb{R}$ and let $\{x_n\}_{n=1}^{\infty}$ be a sequence of real numbers such that every subsequence has a subsequence that converges to $L$. Prove that $\{x_n\}_{n=1}^{\infty}$ converges.
2-36. A well-known convergent sequence.
(a) Let $a, b \in \mathbb{R}$ and $n \in \mathbb{N}$. Prove that $b^{n+1} - a^{n+1} = (b - a) \sum_{j=0}^{n} a^j b^{n-j}$.
(b) Let $a, b \in \mathbb{R}$ with $0 \le a < b$ and let $n \in \mathbb{N}$. Prove that $\frac{b^{n+1} - a^{n+1}}{b - a} < (n + 1) b^n$.
(c) Prove that $\left\{\left(1 + \frac{1}{n}\right)^n\right\}_{n=1}^{\infty}$ is increasing and that it converges.
Hint. To prove it is increasing, bring $a^{n+1}$ to the right in part 2-36b and use $a = 1 + \frac{1}{n+1}$ and $b = 1 + \frac{1}{n}$. To prove the sequence is bounded above, use $a = 1$ and $b = 1 + \frac{1}{2n}$.
2-37. Use the Monotone Sequence Theorem and the axioms for $\mathbb{R}$ except Axiom 1.19 to prove Axiom 1.19.
2-38. Use the Bolzano-Weierstrass Theorem and the axioms for $\mathbb{R}$ except Axiom 1.19 to prove Axiom 1.19.
2-39. Use the Bolzano-Weierstrass Theorem and the axioms for $\mathbb{R}$ except for Axiom 1.19 to prove directly (that is, without using Theorem 2.27 or its proof) that every Cauchy sequence of real numbers converges.
2-40. Use the Bolzano-Weierstrass Theorem and the axioms for $\mathbb{R}$ except for Axiom 1.19 to prove the Monotone Sequence Theorem.
2-41. Let $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ be bounded sequences.
(a) Prove that $\{a_n + b_n\}_{n=1}^{\infty}$ is bounded.
Hint. Unlike for convergence, the bound need not be one number $M$. An upper bound of the form $M_a + M_b$ would work just fine.
(b) Prove that $\{a_n b_n\}_{n=1}^{\infty}$ is bounded.
(c) Prove that if there is a $\delta > 0$ so that $b_n > \delta$ for all $n \in \mathbb{N}$ then $\left\{\frac{a_n}{b_n}\right\}_{n=1}^{\infty}$ is bounded.
2-42. Let $x \in [0, 1]$. Prove that the sequence defined recursively by $a_0 := 0$ and the recurrence relation $a_{n+1} := a_n + \frac{1}{2}\left(x - a_n^2\right)$ converges to $\sqrt{x}$.
Hint. Prove by induction that the sequence is bounded above by $\sqrt{x}$. Then prove that it is nondecreasing.
2.5 Infinite Limits
Although unbounded sequences diverge, they can display some types of regular behavior, which are explored in this section.
Definition 2.42 Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of real numbers. Then we say that the limit of $\{a_n\}_{n=1}^{\infty}$ is infinity iff for every $M \in \mathbb{R}$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequality $a_n \ge M$ holds. In this case, we write $\lim_{n\to\infty} a_n = \infty$. A limit of negative infinity is defined similarly (Exercise 2-43) and denoted $\lim_{n\to\infty} a_n = -\infty$.

Intuitively an infinite limit should mean that eventually the sequence gets close to infinity. Similar to Definition 2.2, this idea is encoded in Definition 2.42 by saying that for any given bound $M$, there is a threshold $N$ so that once the running index $n$ goes past the threshold $N$, the sequence will not drop below the bound $M$ any more. It is also said that a sequence with $\lim_{n\to\infty} a_n = \infty$ grows beyond all bounds.
Example 2.43 Let $x > 1$. Then $\lim_{n\to\infty} x^n = \infty$.

Clearly, for all $n \in \mathbb{N}$ the inequality $x^{n+1} > x^n$ holds. Suppose for a contradiction that $\{x^n\}_{n=1}^{\infty}$ does not go to infinity. Then (refer to Standard Proof Technique 2.30 as necessary) there is a $B > 0$ so that for all $N \in \mathbb{N}$ there is an $m \ge N$ with $x^m \le B$. Let $n \in \mathbb{N}$. By the above, there is an $m \ge n$ so that $x^n \le x^m \le B$. Hence, the sequence $\{x^n\}_{n=1}^{\infty}$ is bounded above. Let $M := \sup\{x^n : n \in \mathbb{N}\}$. Now by Proposition 1.21, because $\frac{M}{x} < M$, there is an $n \in \mathbb{N}$ so that $x^n > \frac{M}{x}$. But then $x^{n+1} = x \cdot x^n > x \frac{M}{x} = M$, a contradiction. Thus $\lim_{n\to\infty} x^n = \infty$. □
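Definition 2.42 asks for a threshold $N$ for every bound $M$. For the sequence of Example 2.43 such a threshold can be found by direct search, as in the following sketch. The function name `threshold` is hypothetical, and the search terminates only because the example has already shown that $x^n$ exceeds every bound.

```python
from fractions import Fraction

def threshold(x, M):
    """Smallest N >= 1 with x**N >= M, for a fixed x > 1.

    Because x**(n+1) > x**n, the inequality x**n >= M then holds
    for every n >= N, as Definition 2.42 requires.
    """
    N, power = 1, x
    while power < M:
        power *= x
        N += 1
    return N

print(threshold(2, 1000))                 # 2**10 = 1024 is the first power >= 1000
print(threshold(Fraction(11, 10), 1000))  # a base close to 1 needs a larger N
```

Exact integer or `Fraction` arithmetic is used so that the comparison with $M$ is not affected by rounding.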
With some exceptions, discussed at the end of this section, the limit laws for infinite limits are similar to those for convergent sequences.

Theorem 2.44 Limit laws involving $\infty$. Let $\{a_n\}_{n=1}^{\infty}$ be such that $\lim_{n\to\infty} a_n = \infty$.

1. If $\{b_n\}_{n=1}^{\infty}$ is a bounded sequence, then $\lim_{n\to\infty} a_n + b_n = \infty$ and if all $a_n$ are nonzero, then $\lim_{n\to\infty} \frac{b_n}{a_n} = 0$.
2. If $\lim_{n\to\infty} b_n = \infty$, then $\lim_{n\to\infty} a_n + b_n = \infty$ and $\lim_{n\to\infty} a_n b_n = \infty$.
3. If $c > 0$ is a real number, then $\lim_{n\to\infty} c a_n = \infty$.
4. If $c < 0$ is a real number, then $\lim_{n\to\infty} c a_n = -\infty$.
Proof. To prove part 1 let $\lim_{n\to\infty} a_n = \infty$ and let $\{b_n\}_{n=1}^{\infty}$ be bounded. Let $B \in \mathbb{R}$ be such that for all $n \in \mathbb{N}$ the inequality $|b_n| < B$ holds.

First consider the sum. We need to prove that for all $M \in \mathbb{R}$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $a_n + b_n \ge M$. Let $M \in \mathbb{R}$. There is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $a_n \ge M + B$. But then for all $n \ge N$ we obtain $a_n + b_n > M + B - B = M$, and hence $\lim_{n\to\infty} a_n + b_n = \infty$.

Now consider the quotient. We need to prove that for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequality $\left|\frac{b_n}{a_n}\right| < \varepsilon$ holds. Let $\varepsilon > 0$. By Theorem 1.32 there is an $M \in \mathbb{N}$ so that $\frac{|B|}{\varepsilon} < M$, which means $\frac{|B|}{M} < \varepsilon$. Now there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $a_n \ge M$. Thus for all $n \ge N$ we obtain $\left|\frac{b_n}{a_n}\right| \le \frac{|B|}{M} < \varepsilon$, and hence $\lim_{n\to\infty} \frac{b_n}{a_n} = 0$.

To prove part 2 let $\lim_{n\to\infty} a_n = \lim_{n\to\infty} b_n = \infty$.

For the sum, we need to prove that for all $M \in \mathbb{R}$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequality $a_n + b_n \ge M$ holds. Let $M \in \mathbb{R}$ and note that there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have the inequalities $a_n \ge \frac{M}{2}$ and $b_n \ge \frac{M}{2}$ (see Standard Proof Technique 2.6). But then for all $n \ge N$ we have $a_n + b_n \ge \frac{M}{2} + \frac{M}{2} = M$. Hence, $\lim_{n\to\infty} a_n + b_n = \infty$.

For the product, we need to prove that for all $M \in \mathbb{R}$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequality $a_n b_n \ge M$ holds. Let $M \in \mathbb{R}$ and note that there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have the inequalities $a_n \ge \sqrt{|M|}$ and $b_n \ge \sqrt{|M|}$. But then for all $n \ge N$ the product is at least $M$ because $a_n b_n \ge \sqrt{|M|} \sqrt{|M|} = |M| \ge M$. Hence, $\lim_{n\to\infty} a_n b_n = \infty$.

To prove part 3, let $\lim_{n\to\infty} a_n = \infty$ and let $c > 0$. We need to prove that for all $M \in \mathbb{R}$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequality $c a_n \ge M$ holds. Let $M \in \mathbb{R}$. There is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $a_n \ge \frac{M}{c}$. But then for all $n \ge N$ we obtain $c a_n \ge c \frac{M}{c} = M$. Hence, $\lim_{n\to\infty} c a_n = \infty$.

Finally, to prove part 4 let $\lim_{n\to\infty} a_n = \infty$ and let $c < 0$. We need to prove that for all $M \in \mathbb{R}$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequality $c a_n \le M$ holds. Let $M \in \mathbb{R}$. There is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $a_n \ge -\frac{M}{|c|}$. But then for all $n \ge N$ we obtain $c a_n \le c \left(-\frac{M}{|c|}\right) = M$. Hence, $\lim_{n\to\infty} c a_n = -\infty$. ■
Infinite limits can also help indirectly to establish the existence of finite limits.
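Example 2.45 Let $x > 0$. Then $\lim_{n\to\infty} x^{\frac{1}{n}} = 1$.

The result is trivial for $x = 1$. We first consider $x > 1$. Suppose for a contradiction that $\lim_{n\to\infty} x^{\frac{1}{n}} \ne 1$. For every $n \in \mathbb{N}$ we have $1 < x^{\frac{1}{n+1}} = \left(x^{\frac{1}{n}}\right)^{\frac{n}{n+1}} < x^{\frac{1}{n}}$. Thus if $\left\{x^{\frac{1}{n}}\right\}_{n=1}^{\infty}$ does not converge to 1, then (refer to Standard Proof Technique 2.30 as necessary) because $x^{\frac{1}{n}} > 1$ for all $n \in \mathbb{N}$, there is an $\varepsilon > 0$ so that for every $N \in \mathbb{N}$ there is an $n \ge N$ with $x^{\frac{1}{n}} > 1 + \varepsilon$. Because $\left\{x^{\frac{1}{n}}\right\}_{n=1}^{\infty}$ is decreasing this would mean that $x^{\frac{1}{n}} > 1 + \varepsilon$ for all $n \in \mathbb{N}$. But then for all $n \in \mathbb{N}$ we would have $(1 + \varepsilon)^n < \left(x^{\frac{1}{n}}\right)^n = x$, contradicting the fact that $\lim_{n\to\infty} (1 + \varepsilon)^n = \infty$ (see Example 2.43). Thus for all $x > 1$ we have $\lim_{n\to\infty} x^{\frac{1}{n}} = 1$.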
The proof for $0 < x < 1$ is deferred to Exercise 2-46.

Just as for infinite limits, there are limit laws for limits that equal negative infinity.

Theorem 2.46 Limit laws involving $-\infty$. Let $\{a_n\}_{n=1}^{\infty}$ be such that $\lim_{n\to\infty} a_n = -\infty$.

1. If $\{b_n\}_{n=1}^{\infty}$ is a bounded sequence, then $\lim_{n\to\infty} a_n + b_n = -\infty$ and if all $a_n$ are nonzero, then $\lim_{n\to\infty} \frac{b_n}{a_n} = 0$.
2. If $\lim_{n\to\infty} b_n = -\infty$, then $\lim_{n\to\infty} a_n + b_n = -\infty$ and $\lim_{n\to\infty} a_n b_n = \infty$.
3. If $\lim_{n\to\infty} b_n = \infty$, then $\lim_{n\to\infty} a_n b_n = -\infty$.
4. If $c > 0$ is a real number, then $\lim_{n\to\infty} c a_n = -\infty$.
5. If $c < 0$ is a real number, then $\lim_{n\to\infty} c a_n = \infty$.
Proof. The proof of Theorem 2.46 is similar to that of Theorem 2.44. It is thus left to the reader as Exercise 2-44.

The addition of two sequences such that one has limit $\infty$ and the other has limit $-\infty$ is absent from Theorems 2.44 and 2.46. This is because by Exercise 2-48b the sum need not converge. Exercise 2-48c shows that even if there is a limit, it is not the same number in all cases. The situation is similar for the product of a sequence with infinite limit and a sequence with limit zero (see Exercises 2-48d and 2-48e), as well as for the quotient of two sequences with infinite limits (see Exercises 2-48f and 2-48g). These types of limits are called indeterminate forms and they are discussed in more detail in Section 12.3.
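A few evaluated terms make the behavior of sums of the form “$\infty + (-\infty)$” visible. The sketch below uses simple illustrative pairs chosen for this purpose; the helper `sums` and the constant `c = 7` are assumptions. Printing finitely many terms suggests, but does not prove, the behavior of the limits.

```python
def sums(a, b, ns):
    """Evaluate a_n + b_n at the indices in ns."""
    return [a(n) + b(n) for n in ns]

ns = [10, 100, 1000]
c = 7  # any fixed real number would do; 7 is an arbitrary choice

# a_n = n, b_n = -n: both limits are infinite, the sum is constantly 0.
print(sums(lambda n: n, lambda n: -n, ns))
# a_n = n, b_n = -n**2: the sum n - n**2 decreases without bound.
print(sums(lambda n: n, lambda n: -n * n, ns))
# a_n = n + c, b_n = -n: the sum is constantly c.
print(sums(lambda n: n + c, lambda n: -n, ns))
```

The three pairs share the same individual limits but produce different sums, which is why no limit law can cover this case.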
Exercises

2-43. State the definition of a sequence whose limit is negative infinity.
2-44. Prove Theorem 2.46. That is, prove each of the following.
(a) If $\lim_{n\to\infty} a_n = -\infty$ and $\{b_n\}_{n=1}^{\infty}$ is a bounded sequence, then $\lim_{n\to\infty} a_n + b_n = -\infty$ and if all $a_n$ are nonzero, then $\lim_{n\to\infty} \frac{b_n}{a_n} = 0$.
(b) If $\lim_{n\to\infty} a_n = \lim_{n\to\infty} b_n = -\infty$, then $\lim_{n\to\infty} a_n + b_n = -\infty$ and $\lim_{n\to\infty} a_n b_n = \infty$.
(c) If $\lim_{n\to\infty} a_n = -\infty$ and $\lim_{n\to\infty} b_n = \infty$, then $\lim_{n\to\infty} a_n b_n = -\infty$.
(d) If $c > 0$ is a real number and $\lim_{n\to\infty} a_n = -\infty$, then $\lim_{n\to\infty} c a_n = -\infty$.
(e) If $c < 0$ is a real number and $\lim_{n\to\infty} a_n = -\infty$, then $\lim_{n\to\infty} c a_n = \infty$.
2-45. Let $0 < x < 1$.
(a) Prove that $\lim_{n\to\infty} x^n = 0$ by using the limit laws.
(b) Prove that $\lim_{n\to\infty} x^n = 0$ by mimicking the proof in Example 2.43.
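2-46. Let $0 < x < 1$. Prove that $\lim_{n\to\infty} x^{\frac{1}{n}} = 1$.
2-47. Prove that $\left\{n^{\frac{1}{n}}\right\}_{n=3}^{\infty}$ is decreasing and that the limit is 1.
Hint. Use Exercise 2-36c to show that $\left(1 + \frac{1}{n}\right)^n \le n$ for all large enough $n$ and derive the inequality $(n + 1)^{\frac{1}{n+1}} \le n^{\frac{1}{n}}$ from this. For the limit $L$, use the result from Example 2.45 to show that $L = L^2$.
2-48. A first encounter with indeterminate forms.
(a) For $\{a_n\}_{n=1}^{\infty} := \{n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty} := \{-n^2\}_{n=1}^{\infty}$, prove that $\lim_{n\to\infty} a_n = \infty$, $\lim_{n\to\infty} b_n = -\infty$ and that the sequence $\{a_n + b_n\}_{n=1}^{\infty}$ diverges.
(b) For $\{a_n\}_{n=1}^{\infty} := \{n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty} := \{-n\}_{n=1}^{\infty}$, prove that $\lim_{n\to\infty} a_n = \infty$, $\lim_{n\to\infty} b_n = -\infty$ and that the sequence $\{a_n + b_n\}_{n=1}^{\infty}$ converges.
(c) Let $c \in \mathbb{R}$. Find sequences $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ so that $\lim_{n\to\infty} a_n = \infty$, $\lim_{n\to\infty} b_n = -\infty$ and $\lim_{n\to\infty} a_n + b_n = c$.
(d) Find sequences $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ so that $\lim_{n\to\infty} a_n = \infty$, $\lim_{n\to\infty} b_n = 0$ and $\{a_n b_n\}_{n=1}^{\infty}$ diverges.
(e) Let $c \in \mathbb{R}$. Find sequences $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ so that $\lim_{n\to\infty} a_n = \infty$, $\lim_{n\to\infty} b_n = 0$ and $\lim_{n\to\infty} a_n b_n = c$.
(f) Find sequences $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ so that $\lim_{n\to\infty} a_n = \lim_{n\to\infty} b_n = \infty$ and $\left\{\frac{a_n}{b_n}\right\}_{n=1}^{\infty}$ diverges.
(g) Let $c \in \mathbb{R}$. Find sequences $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ with infinite limits so that $\lim_{n\to\infty} \frac{a_n}{b_n} = c$.
2-49. Prove that if $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ are sequences so that $\lim_{n\to\infty} a_n = \infty$ and there are an $\varepsilon > 0$ and an $N \in \mathbb{N}$ so that $b_n > \varepsilon$ for all $n \ge N$, then $\lim_{n\to\infty} a_n b_n = \infty$.
2-50. A characterization of divergent sequences.
(a) Let $\{a_n\}_{n=1}^{\infty}$ be an unbounded sequence. Prove that there is a subsequence $\{a_{n_k}\}_{k=1}^{\infty}$ so that $\lim_{k\to\infty} a_{n_k} = \infty$ or $\lim_{k\to\infty} a_{n_k} = -\infty$.
(b) Let $\{a_n\}_{n=1}^{\infty}$ be a bounded divergent sequence. Prove that there are two convergent subsequences $\{a_{l_m}\}_{m=1}^{\infty}$ and $\{a_{n_k}\}_{k=1}^{\infty}$ such that $\lim_{m\to\infty} a_{l_m}$ and $\lim_{k\to\infty} a_{n_k}$ exist, but are not equal.
(c) Let $\{a_n\}_{n=1}^{\infty}$ be a sequence. Prove that $\{a_n\}_{n=1}^{\infty}$ diverges if and only if there is a subsequence $\{a_{n_k}\}_{k=1}^{\infty}$ such that $\lim_{k\to\infty} a_{n_k} = \infty$ or $\lim_{k\to\infty} a_{n_k} = -\infty$, or there are two subsequences $\{a_{l_m}\}_{m=1}^{\infty}$ and $\{a_{n_k}\}_{k=1}^{\infty}$ such that $\lim_{m\to\infty} a_{l_m}$ and $\lim_{k\to\infty} a_{n_k}$ exist, but are not equal.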
(d) Use the characterization of divergent sequences in part 2-5Oc and the axioms for R except for Axiom 1.19 to prove the Bolzano-Weierstrass Theorem. 2-51. Prove Cauchy's Limit Theorem. That is, let (b,]?=, be a strictly increasing sequence of positive numbers that goes to infinity and let [a,]?=, be a sequence. Prove that if the sequence an converges to c, then lim - = c. n - x b, Hint Exercise 2-16 with p , := b,, - b,-l and another appropriate sequence
Chapter 3
Continuous Functions

Functions are the central objects of analysis. This chapter defines limits and continuity for functions of a real variable and it presents some consequences of continuity. To avoid problems with complicated domains (see Exercise 3-32 and the end of Section 16.3.1 for some details), in this chapter functions are usually considered on intervals or on intervals from which at most finitely many points were removed. These domains are sufficient to build the traditional calculus of functions of one variable. More complicated domains are handled in metric spaces in the second part of the text.

3.1 Limits of Functions

The limit of a function at a point $x$ is supposed to express what happens near $x$, but not necessarily at $x$. This is similar to the running index $n$ of a sequence never actually becoming $\infty$. While $\infty$ is not in the domain of a sequence, a real number $x$ can be in the domain of a function. Hence, we must explicitly remove $x$ from consideration. In this section, functions are defined on an open interval from which a point $x$ has been removed. In this fashion, we assure that each function is defined "close to the left of $x$" and "close to the right of $x$," which is what we need to investigate for (two-sided) limits. Because convergence of sequences is already defined, we can use sequences to define convergence of functions.
Definition 3.1 Sequence formulation of the limit of a function. Let $I \subseteq \mathbb{R}$ be an open interval and let $x \in I$. The number $L \in \mathbb{R}$ is called the limit of the function $f : I \setminus \{x\} \to \mathbb{R}$ at $x$ iff for all sequences $\{z_n\}_{n=1}^{\infty}$ with $z_n \in I \setminus \{x\}$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} z_n = x$ we have $\lim_{n\to\infty} f(z_n) = L$. In this case, we denote $\lim_{z\to x} f(z) := L$ and we also say that $f$ converges (to $L$) at $x$.

Similar to Theorem 2.11, the limit of a function at $x$ is only affected by the values of the function near $x$.
3. Continuous Functions
Theorem 3.2 Let $I \subseteq \mathbb{R}$ be an open interval, $x \in I$, and let $f, g : I \setminus \{x\} \to \mathbb{R}$ be functions. If there is a number $\delta > 0$ so that $f$ and $g$ are equal on the subset $\{ z \in I \setminus \{x\} : |z - x| < \delta \}$ of $I \setminus \{x\}$, then $f$ converges at $x$ iff $g$ converges at $x$ and in this case the equality $\lim_{z\to x} f(z) = \lim_{z\to x} g(z)$ holds.

Proof. Exercise 3-2. ∎

By Theorem 3.2, Definition 3.1 also defines the limit at $x$ for any function that is defined on a set $D$ that contains a set $I \setminus \{x\}$, where $I$ is an open interval that contains $x$. Formally, we define the following.

Definition 3.3 Let $D, R, S$ be sets with $R \subseteq D$ and let $f : D \to S$ be a function. The restriction of the function $f$ to $R$, denoted $f|_R$, is defined by $f|_R(x) := f(x)$ for all $x \in R$.

Definition 3.4 Let $f : D \to \mathbb{R}$ be a function and let $x \in \mathbb{R}$ be so that there is an open interval $I$ with $x \in I$ and $I \setminus \{x\} \subseteq D$. We define the limit of $f$ at $x$ as the limit of the restriction $f|_{I \setminus \{x\}}$ at $x$ and denote it $\lim_{z\to x} f(z) := \lim_{z\to x} f|_{I \setminus \{x\}}(z)$.

All the following results on functions $f : I \setminus \{x\} \to \mathbb{R}$ also apply to functions with larger domains. Strictly speaking we would need to apply all definitions and results to the restriction of the function to a set $I \setminus \{x\}$, where $I$ is an open interval and $x \in I$. We will usually avoid this simple formality. Ultimately, Definition 16.33 will encompass this situation as well as some situations in which, for single variable functions, we use one-sided limits (see Definition 3.15 below).
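Example 3.5
1. For all $x \in \mathbb{R}$, we have $\lim_{z\to x} z = x$.
2. For all $x \in \mathbb{R}$, we have $\lim_{z\to x} |z| = |x|$.
3. The function
$f(x) := \begin{cases} 1; & \text{for } x > 0, \\ 0; & \text{for } x = 0, \\ -1; & \text{for } x < 0, \end{cases}$
does not have a limit at $x = 0$.

Part 1 is trivial and part 2 follows from Exercise 2-12. To see that the function in part 3 does not have a limit at $x = 0$, it suffices to produce two sequences $\{y_n\}_{n=1}^{\infty}$ and $\{z_n\}_{n=1}^{\infty}$ that converge to zero so that $y_n \ne 0$ and $z_n \ne 0$ for all $n \in \mathbb{N}$ and $\{f(y_n)\}_{n=1}^{\infty}$ and $\{f(z_n)\}_{n=1}^{\infty}$ do not have the same limit. With $y_n := \frac{1}{n}$ and $z_n := -\frac{1}{n}$ for all $n \in \mathbb{N}$ we obtain $\lim_{n\to\infty} f\left(\frac{1}{n}\right) = 1 \ne -1 = \lim_{n\to\infty} f\left(-\frac{1}{n}\right)$, which completes the argument. □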
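The two-sequence argument for part 3 of Example 3.5 can be illustrated numerically. The following is only a sketch: the name `sign_step` is a hypothetical label for the function of part 3, and finitely many evaluations illustrate the argument without replacing the proof.

```python
def sign_step(x: float) -> int:
    """The function of part 3 of Example 3.5: 1 for x > 0, 0 at x = 0, -1 for x < 0."""
    if x > 0:
        return 1
    if x < 0:
        return -1
    return 0

# Two sequences that converge to 0 and never take the value 0.
from_right = [1 / n for n in range(1, 8)]   # y_n = 1/n
from_left = [-1 / n for n in range(1, 8)]   # z_n = -1/n

values_right = [sign_step(y) for y in from_right]
values_left = [sign_step(z) for z in from_left]

print(values_right)  # constant along 1/n
print(values_left)   # constant along -1/n, with a different value
```

Because the image sequences are constant with different values, they cannot share a limit, which is exactly what Definition 3.1 would require.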
Standard Proof Technique 3.6 If a sequence that converges to a number $x$ from the right is needed, $\left\{ x + \frac{1}{n} \right\}_{n=1}^{\infty}$ is usually a good choice. For a sequence that converges to $x$ from the left, $\left\{ x - \frac{1}{n} \right\}_{n=1}^{\infty}$ is usually a good choice. This idea is extended in Standard Proof Technique 3.8. □
There are at least two ways to define the limit of a function. We have already seen the formulation with sequences and we give the formulation with $\varepsilon$ and $\delta$ below. The two formulations are equivalent, and hence either one could serve as the definition. With both formulations available, we can choose which one to use. Depending on the situation, one formulation may be preferable over the other to produce a simpler statement or proof. The proof of Theorem 3.19 is a good example of how each formulation is better suited for certain settings than the other.

Theorem 3.7 $\varepsilon$-$\delta$ formulation of the limit of a function. Let $I \subseteq \mathbb{R}$ be an open interval and let $x \in I$. Then $L \in \mathbb{R}$ is the limit of the function $f : I \setminus \{x\} \to \mathbb{R}$ at $x$ iff for all $\varepsilon > 0$ there is a $\delta > 0$ such that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we have that $|f(z) - L| < \varepsilon$.
Proof. For "$\Rightarrow$," let $\lim_{z\to x} f(z) = L$. Suppose, for a contradiction, the statement on the right is false. Then there is an $\varepsilon > 0$ so that for each $\delta > 0$ there is a $z \in I \setminus \{x\}$ with $|z - x| < \delta$ and $|f(z) - L| \ge \varepsilon$ (if necessary, use Standard Proof Technique 2.30 for the negation). Then for $\delta := \frac{1}{n}$ there is a $z_n \in I \setminus \{x\}$ with $|z_n - x| < \frac{1}{n}$ and $|f(z_n) - L| \ge \varepsilon$. But then $\lim_{n\to\infty} z_n = x$, while $\lim_{n\to\infty} f(z_n)$ either does not exist, or if it exists, then $\lim_{n\to\infty} f(z_n) \ne L$ (see Exercise 2-5). Either way we have arrived at a contradiction.

For "$\Leftarrow$," let $f : I \setminus \{x\} \to \mathbb{R}$ be such that for all $\varepsilon > 0$ there is a $\delta > 0$ such that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we have that $|f(z) - L| < \varepsilon$. We need to prove that for each sequence $\{z_n\}_{n=1}^{\infty}$ with $z_n \in I \setminus \{x\}$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} z_n = x$ we have that $\lim_{n\to\infty} f(z_n) = L$.

Let $\{z_n\}_{n=1}^{\infty}$ be a sequence with $z_n \in I \setminus \{x\}$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} z_n = x$, and let $\varepsilon > 0$. Then there is a $\delta > 0$ such that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we have $|f(z) - L| < \varepsilon$. Moreover, for $\delta$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|z_n - x| < \delta$. But then for all $n \ge N$ we infer $|f(z_n) - L| < \varepsilon$, and hence $\lim_{n\to\infty} f(z_n) = L$. This proves that $\lim_{z\to x} f(z) = L$. ∎
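The $\varepsilon$-$\delta$ condition of Theorem 3.7 can be probed numerically for a concrete function. The sketch below assumes the linear function $f(z) = mz + b$ of Exercise 3-3 with sample values chosen here; the helper names `delta_for_linear` and `check_eps_delta` are hypothetical. Sampling finitely many points can expose a bad choice of $\delta$, but it cannot prove that a choice is good.

```python
def delta_for_linear(m: float, eps: float) -> float:
    """A delta that works for f(z) = m*z + b, since |f(z) - f(x)| = |m| * |z - x|."""
    return eps / abs(m) if m != 0 else 1.0

def check_eps_delta(f, x, limit, eps, delta, samples=1000):
    """Sample points z != x with |z - x| < delta and test |f(z) - limit| < eps."""
    for k in range(1, samples + 1):
        offset = delta * k / (samples + 1)  # 0 < offset < delta
        for z in (x - offset, x + offset):
            if not abs(f(z) - limit) < eps:
                return False
    return True

# Assumed sample data: slope, intercept, and the point at which the limit is taken.
m, b, x = 3.0, -2.0, 1.5
f = lambda z: m * z + b
eps = 1e-3

delta = delta_for_linear(m, eps)
ok = check_eps_delta(f, x, m * x + b, eps, delta)      # delta = eps/|m| passes the sampling test
too_big = check_eps_delta(f, x, m * x + b, eps, 1.0)   # delta = 1 is too generous for this eps
print(ok, too_big)
```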
Standard Proof Technique 3.8 In the "$\Rightarrow$" part of the proof of Theorem 3.7, for all $\delta > 0$ there is a $z$ with $|z - x| < \delta$ and other properties. To obtain a sequence $\{z_n\}_{n=1}^{\infty}$ that converges to $x$ so that each $z_n$ has the other desired properties, it is standard practice to use $\delta := \frac{1}{n}$ and then pick an appropriate element $z_n$ with $|z_n - x| < \frac{1}{n}$. □

Exercises
3-1. Explain why after Definition 3.1 it is not necessary to prove that the limit of a function is unique.
3-2. Prove Theorem 3.2.
3-3. Let $m, b \in \mathbb{R}$. Prove that $\lim_{z\to x} (mz + b) = mx + b$.
3-4. Let $I$ be an open interval and let $x \in I$. Prove that if $f : I \setminus \{x\} \to \mathbb{R}$ does not converge to $L$ at $x$, then there are an $\varepsilon > 0$ and a sequence $\{z_n\}_{n=1}^{\infty}$ so that $\lim_{n\to\infty} z_n = x$, $z_n \in I \setminus \{x\}$ for all $n \in \mathbb{N}$ and $|f(z_n) - L| > \varepsilon$ for all $n \in \mathbb{N}$.
3-5. Prove that if $m \in \mathbb{Z}$ and $\lfloor \cdot \rfloor$ is the floor function, then $\lim_{z\to m} \lfloor z \rfloor$ does not exist.
3-6. Prove that $\lim_{z\to 3} \frac{z - 3}{z^2 - 9} = \frac{1}{6}$.
3-7. Prove that the Dirichlet function $f(x) = \begin{cases} 1; & \text{for } x \in \mathbb{Q}, \\ 0; & \text{for } x \notin \mathbb{Q}, \end{cases}$ does not converge at any $x \in \mathbb{R}$.
3-8. Explain why, with the present definition, $\lim_{z\to 0} \sqrt{z}$ is not defined. Then state what the limit should be and how we could circumvent our purely formal problem.
Note. This problem will be resolved in Exercise 16-28.
3.2 Limit Laws

Just as for limits of sequences, we are interested in how limits of functions relate to the algebraic operations, because this should simplify the computation of limits. First note that all algebraic operations on functions are defined pointwise.

Definition 3.9 Let $D \subseteq \mathbb{R}$ be a set and let $f, g : D \to \mathbb{R}$ be functions. For all $x \in D$ we define $(f + g)(x) := f(x) + g(x)$, $(f - g)(x) := f(x) - g(x)$, and $(f \cdot g)(x) := f(x) \cdot g(x)$. For all $x \in D$ with $g(x) \ne 0$ we define $\left( \frac{f}{g} \right)(x) := \frac{f(x)}{g(x)}$.
Theorem 3.10 Limit laws for functions. Let $I \subseteq \mathbb{R}$ be an open interval, let $x \in I$ and let $f, g : I \setminus \{x\} \to \mathbb{R}$ be functions such that $\lim_{z\to x} f(z)$ and $\lim_{z\to x} g(z)$ exist. Then the following hold.
1. $\lim_{z\to x} (f + g)(z) = \lim_{z\to x} f(z) + \lim_{z\to x} g(z)$.
2. $\lim_{z\to x} (f - g)(z) = \lim_{z\to x} f(z) - \lim_{z\to x} g(z)$.
3. $\lim_{z\to x} (f \cdot g)(z) = \lim_{z\to x} f(z) \cdot \lim_{z\to x} g(z)$.
4. If $\lim_{z\to x} g(z) \ne 0$, then $\lim_{z\to x} \left( \frac{f}{g} \right)(z) = \frac{\lim_{z\to x} f(z)}{\lim_{z\to x} g(z)}$.
Each equation implicitly asserts that the limit on the left side exists (see box on p. 33). Moreover, formally, in part 4 we would need to demand also that $g(z) \ne 0$ for all $z \in I \setminus \{x\}$. But $\lim_{z\to x} g(z) \ne 0$ implies that $g(z) \ne 0$ for all $z \in I \setminus \{x\}$ that are near $x$. Hence, if $g$ has zeros, we implicitly assume that $g$ has been restricted appropriately rather than worry about zeros that do not affect the convergence behavior.
Proof. Throughout this proof let $L_f := \lim_{z\to x} f(z)$ and let $L_g := \lim_{z\to x} g(z)$.

In this proof we will use Definition 3.1 as well as Theorem 3.7. Although it will turn out that Definition 3.1 in conjunction with the limit laws for sequences is more effective for this proof, it will also be instructive to see how $\varepsilon$-$\delta$ proofs are constructed. The reader will compare the two approaches by proving each part with the respective other approach in Exercises 3-9 and 3-10.

To prove part 1, we use Theorem 3.7. We need to prove that for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we have $|(f + g)(z) - (L_f + L_g)| < \varepsilon$.

Let $\varepsilon > 0$. Then there are $\delta_f > 0$ and $\delta_g > 0$ so that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta_f$ we have $|f(z) - L_f| < \frac{\varepsilon}{2}$ and for all $z \in I \setminus \{x\}$ with $|z - x| < \delta_g$ we have $|g(z) - L_g| < \frac{\varepsilon}{2}$. Let $\delta := \min\{\delta_f, \delta_g\}$ (compare with Standard Proof Technique 2.6). Then for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we obtain via the triangular inequality that

$|(f + g)(z) - (L_f + L_g)| \le |f(z) - L_f| + |g(z) - L_g| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$

This means we have proved that for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we have $|(f + g)(z) - (L_f + L_g)| < \varepsilon$. Consequently, $\lim_{z\to x} (f + g)(z) = L_f + L_g$.

To prove part 2 using Definition 3.1, we need to prove that for all sequences $\{z_n\}_{n=1}^{\infty}$ with $z_n \in I \setminus \{x\}$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} z_n = x$ we have $\lim_{n\to\infty} (f - g)(z_n) = L_f - L_g$.

Let $\{z_n\}_{n=1}^{\infty}$ be a sequence in $I \setminus \{x\}$ with $\lim_{n\to\infty} z_n = x$. By Theorem 2.14 we infer

$\lim_{n\to\infty} (f - g)(z_n) = \lim_{n\to\infty} f(z_n) - \lim_{n\to\infty} g(z_n) = L_f - L_g.$

Because $\{z_n\}_{n=1}^{\infty}$ was arbitrary this implies $\lim_{z\to x} (f - g)(z) = L_f - L_g$.
The proofs of parts 1 and 2 show that to prove the limit laws, Definition 3.1 is more effective than Theorem 3.7. Nonetheless, both ways are actually equally complex overall. If we compare the proof of part 1 with the proof of part 1 of Theorem 2.14 we see striking similarities in the arguments. This means that the complexity of a proof using Definition 3.1 is simply delegated to the proof of an earlier result (Theorem 2.14). The reader will have the chance to analyze the similarities and the differences in the proofs for part 3 in Exercise 3-9. The similarity between the proofs here and the proofs for Theorem 2.14 can be used to translate the proof for part 4 into a complete proof of part 4 of Theorem 2.14. The rather complicated choices in the header were of course made after the final inequalities had been analyzed carefully.

The proof of part 3 is left to the reader as Exercise 3-9. For part 4, we use Theorem 3.7. So we need to prove that for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we have $\left| \frac{f}{g}(z) - \frac{L_f}{L_g} \right| < \varepsilon$.
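Let $\varepsilon > 0$. Then there is a $\delta_f > 0$ such that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta_f$ we have $|f(z) - L_f| <$ [...], and there is a $\delta_g > 0$ such that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta_g$ we have $|g(z) - L_g| <$ [...]. [...]

We have proved that for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we have $\left| \frac{f}{g}(z) - \frac{L_f}{L_g} \right| < \varepsilon$. Therefore the limit of the quotient exists and $\lim_{z\to x} \left( \frac{f}{g} \right)(z) = \frac{L_f}{L_g}$. ∎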
Just like limits of sequences, limits of functions preserve inequalities and there also is a Squeeze Theorem.
Definition 3.11 Let $D \subseteq \mathbb{R}$ be a set and let $f, g : D \to \mathbb{R}$ be functions. We say that $f$ is pointwise less than or equal to $g$, denoted $f \le g$, iff for all $x \in D$ the inequality $f(x) \le g(x)$ holds.

Theorem 3.12 Let $I \subseteq \mathbb{R}$ be an open interval, let $x \in I$ and let $f, g : I \setminus \{x\} \to \mathbb{R}$ be functions. If $f \le g$ on $I \setminus \{x\}$ and $f$ and $g$ converge at $x$, then $\lim_{z\to x} f(z) \le \lim_{z\to x} g(z)$.

Proof. Exercise 3-11. ∎

Theorem 3.13 The Squeeze Theorem for functions. Let $I \subseteq \mathbb{R}$ be an open interval, let $x \in I$ and let $f, g, h : I \setminus \{x\} \to \mathbb{R}$ be functions. If $f \le g \le h$ on $I \setminus \{x\}$ and $f$ and $h$ converge at $x$ with $\lim_{z\to x} f(z) = \lim_{z\to x} h(z)$, then $g$ converges at $x$ and $\lim_{z\to x} g(z) = \lim_{z\to x} f(z) = \lim_{z\to x} h(z)$.

Proof. Exercise 3-12. ∎

Finally, convergence is also preserved by the composition of functions.
Theorem 3.14 Let $I, J \subseteq \mathbb{R}$ be open intervals, let $x \in I$, let $g : I \setminus \{x\} \to \mathbb{R}$ and $f : J \to \mathbb{R}$ be functions with $g[I \setminus \{x\}] \subseteq J$, and let $\lim_{z\to x} g(z) = L \in J$. Assume that $\lim_{y\to L} f(y)$ exists and that $g[I \setminus \{x\}] \subseteq J \setminus \{g(x)\}$, or, in case $g(x) \in g[I \setminus \{x\}]$, that $\lim_{y\to L} f(y) = f(L)$. Then $f \circ g$ converges at $x$ and $\lim_{z\to x} f \circ g(z) = \lim_{y\to L} f(y)$.

Proof. Let $M := \lim_{y\to L} f(y)$ and let $\{z_n\}_{n=1}^{\infty}$ be a sequence in the set $I \setminus \{x\}$ so that $\lim_{n\to\infty} z_n = x$. Then $\lim_{n\to\infty} g(z_n) = L$. If no $z_n$ satisfies $g(z_n) = g(x)$, we obtain $\lim_{n\to\infty} f(g(z_n)) = M$, while if some $g(z_n)$ are equal to $g(x)$, then $M = f(L)$ and we can infer $\lim_{n\to\infty} f(g(z_n)) = f(L) = M$ in this case also. Because the sequence $\{z_n\}_{n=1}^{\infty}$ was arbitrary the result is established. ∎
Exercises
3-9. Completing the proof of Theorem 3.10. Let $x \in I$ and let $f, g : I \setminus \{x\} \to \mathbb{R}$ be functions such that $\lim_{z\to x} f(z)$ and $\lim_{z\to x} g(z)$ exist.
(a) Prove part 3 using Definition 3.1.
(b) Prove part 3 using Theorem 3.7.
3-10. Alternative proofs for the proved parts of Theorem 3.10. Let $x \in I$ and let $f, g : I \setminus \{x\} \to \mathbb{R}$ be functions such that $\lim_{z\to x} f(z)$ and $\lim_{z\to x} g(z)$ exist.
(a) Prove that $\lim_{z\to x} (f + g)(z) = \lim_{z\to x} f(z) + \lim_{z\to x} g(z)$ (part 1) using Definition 3.1.
(b) Prove that $\lim_{z\to x} (f - g)(z) = \lim_{z\to x} f(z) - \lim_{z\to x} g(z)$ (part 2) using Theorem 3.7.
(c) Prove that if $\lim_{z\to x} g(z) \ne 0$, then $\lim_{z\to x} \left( \frac{f}{g} \right)(z) = \frac{\lim_{z\to x} f(z)}{\lim_{z\to x} g(z)}$ (part 4) using Definition 3.1.
3-11. Prove Theorem 3.12.
(a) Using Definition 3.1.
(b) Using Theorem 3.7.
3-12. Prove Theorem 3.13.
(a) Using Definition 3.1.
(b) Using Theorem 3.7.
3-13. Explain the similarities between the proof of part 1 of Theorem 3.10 as presented and the proof of part 1 of Theorem 2.14.
3-14. Computation of limits.
(a) Let $I$ be an open interval, let $x \in I$ and let $f : I \setminus \{x\} \to \mathbb{R}$ be a function. Prove that $\lim_{z\to x} f(z) = \lim_{h\to 0} f(x + h)$.
(b) Compute each of the following limits.
i. $\lim_{x\to 3} \frac{x^2 - 5x + 6}{x^2 - x - 6}$
ii. $\lim_{x\to 2} \frac{3x^3 - x^2 - 12x + 4}{x^2 - 4}$
iii. $\lim_{x\to 9}$ of a quotient with numerator $x - 9$ [...]
3.3 One-sided Limits and Infinite Limits

Section 3.5 will show that it is advantageous to consider continuous functions (see Definition 3.23) on closed intervals. But that means we also need a notion of convergence at the endpoints of a closed interval. One-sided limits provide just that.

Definition 3.15 Sequence formulation of the left limit of a function. Let $a < b$. The number $L \in \mathbb{R}$ is called the left limit of the function $f : [a, b) \to \mathbb{R}$ at $b$ iff for all sequences $\{z_n\}_{n=1}^{\infty}$ in $[a, b)$ with $\lim_{n\to\infty} z_n = b$ we have $\lim_{n\to\infty} f(z_n) = L$. In this case, we denote $\lim_{z\to b^-} f(z) := L$ and we say $f$ converges (to $L$) at $b$ from the left.

The right limit at $a$ for a function $f : (a, b] \to \mathbb{R}$ is defined similarly. It is denoted $\lim_{z\to a^+} f(z)$, and we say $f$ converges at $a$ from the right.

If $f : D \to \mathbb{R}$ is a function and the domain $D$ contains an interval $[a, b)$ with $a < b$, we define $\lim_{z\to b^-} f(z) := \lim_{z\to b^-} f|_{[a,b)}(z)$ if it exists. (Exercise 3-15 shows that these left limits are well defined.) Right limits are defined similarly.

Similar to limits, we prove most results for functions defined on half-open intervals. These results are also valid for functions with larger domains. We simply apply them to the appropriate restrictions.
Example 3.16 Let $m \in \mathbb{Z}$. For the floor and ceiling functions of Definition 1.33, we have $\lim_{z\to m^-} \lfloor z \rfloor = m - 1$, $\lim_{z\to m^+} \lfloor z \rfloor = m$, $\lim_{z\to m^-} \lceil z \rceil = m$, and $\lim_{z\to m^+} \lceil z \rceil = m + 1$. □
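The one-sided limits in Example 3.16 can be observed along concrete one-sided sequences. The following sketch assumes the sample integer $m = 4$ and the sequences $m \pm \frac{1}{n}$ (both choices made here for illustration); it uses Python's `math.floor` and `math.ceil` as stand-ins for the floor and ceiling functions.

```python
import math

m = 4  # assumed sample integer

# Sequences that stay strictly on one side of m and converge to m.
left = [m - 1 / n for n in range(2, 7)]    # approaches m from the left
right = [m + 1 / n for n in range(2, 7)]   # approaches m from the right

floor_left = [math.floor(z) for z in left]     # expected to stay at m - 1
floor_right = [math.floor(z) for z in right]   # expected to stay at m
ceil_left = [math.ceil(z) for z in left]       # expected to stay at m
ceil_right = [math.ceil(z) for z in right]     # expected to stay at m + 1

print(floor_left, floor_right, ceil_left, ceil_right)
```

Since the left and right values of the floor function differ, this also illustrates why the two-sided limit in Exercise 3-5 cannot exist.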
Definitions 3.15 and 3.1 differ in only one way. For a one-sided limit, the sequences must all stay on one side of the number, while in Definition 3.1 the sequences can have values on either side. To emphasize the ability to approach from either side, limits of functions are sometimes called two-sided limits. With such strong similarity in the definitions, it is only natural that the theorems that govern one-sided limits are similar to the theorems that govern (two-sided) limits.
Theorem 3.17 Limit laws for one-sided limits. Let $f, g : [a, b) \to \mathbb{R}$ be functions such that $\lim_{z\to b^-} f(z)$ and $\lim_{z\to b^-} g(z)$ exist. Then the following hold.
1. $\lim_{z\to b^-} (f + g)(z) = \lim_{z\to b^-} f(z) + \lim_{z\to b^-} g(z)$.
2. $\lim_{z\to b^-} (f - g)(z) = \lim_{z\to b^-} f(z) - \lim_{z\to b^-} g(z)$.
3. $\lim_{z\to b^-} (f \cdot g)(z) = \lim_{z\to b^-} f(z) \cdot \lim_{z\to b^-} g(z)$.
4. If $\lim_{z\to b^-} g(z) \ne 0$, then $\lim_{z\to b^-} \left( \frac{f}{g} \right)(z) = \frac{\lim_{z\to b^-} f(z)}{\lim_{z\to b^-} g(z)}$.
Each equation implicitly asserts that the limit on the left side exists. (See box on page 33.) Moreover, because $\lim_{z\to b^-} g(z) \ne 0$ in part 4 implies that $g(z) \ne 0$ for $z$ near $b$ (where it matters), we did not demand $g(z) \ne 0$ for all $z \in [a, b)$. Similar limit laws hold for right limits.

Proof. Exercise 3-16. ∎

Theorem 3.18 $\varepsilon$-$\delta$ formulation of the left limit of a function. The number $L \in \mathbb{R}$ is the left limit of the function $f : [a, b) \to \mathbb{R}$ at $b$ iff for all $\varepsilon > 0$ there is a $\delta > 0$ such that for all $z \in [a, b)$ with $|z - b| < \delta$ we have $|f(z) - L| < \varepsilon$.

Proof. Exercise 3-17. ∎

Theorem 3.19 connects one-sided limits to (two-sided) limits. Note that to make the proof efficient, we use the sequence formulation of the limit for one direction and the $\varepsilon$-$\delta$ formulation for the other direction.
Theorem 3.19 Let $I \subseteq \mathbb{R}$ be an open interval, let $x \in I$ and let $f : I \setminus \{x\} \to \mathbb{R}$ be a function. Then $\lim_{z\to x} f(z)$ exists iff $\lim_{z\to x^-} f(z)$ and $\lim_{z\to x^+} f(z)$ both exist and are equal. In this case the limit is equal to the left and the right limit.

Proof. For "$\Rightarrow$," let $L := \lim_{z\to x} f(z)$. Using Definition 3.15 we must prove that for all sequences $\{z_n\}_{n=1}^{\infty}$ in $I$ with $z_n < x$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} z_n = x$ we have $\lim_{n\to\infty} f(z_n) = L$ and we must prove the same result for all sequences $\{z_n\}_{n=1}^{\infty}$ in $I$ with $z_n > x$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} z_n = x$.

Let $\{z_n\}_{n=1}^{\infty}$ be a sequence in $I$ with $z_n < x$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} z_n = x$. By Definition 3.1 we have $\lim_{n\to\infty} f(z_n) = L$, which was to be proved. Sequences with $z_n > x$ for all $n \in \mathbb{N}$ are treated similarly. Hence, $\lim_{z\to x^-} f(z) = \lim_{z\to x^+} f(z) = L$.

For "$\Leftarrow$," let $\lim_{z\to x^-} f(z) = \lim_{z\to x^+} f(z) =: L$. By Theorem 3.7, we must prove that for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we have $|f(z) - L| < \varepsilon$.

Let $\varepsilon > 0$. By Theorem 3.18 there is a $\delta_l > 0$ so that for all $z \in I \setminus \{x\}$ with $z < x$ and $|z - x| < \delta_l$ we have $|f(z) - L| < \varepsilon$. By the corresponding result for right limits, there is a $\delta_r > 0$ so that for all $z \in I \setminus \{x\}$ with $z > x$ and $|z - x| < \delta_r$ we have $|f(z) - L| < \varepsilon$. Let $\delta := \min\{\delta_l, \delta_r\}$. Then for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we infer $|f(z) - L| < \varepsilon$. By Theorem 3.7, this proves $\lim_{z\to x} f(z) = L$. ∎
Standard Proof Technique 3.20 To prove that a function has a limit at $x \in \mathbb{R}$ one often proves that the left and the right limits exist and that they are equal. □
Infinity and negative infinity are not numbers, so formally they do not qualify as limits. But knowing that a function grows beyond all bounds near a point gives more information than a statement that the limit does not exist. Hence, we extend the language to allow infinite limits.
Definition 3.21 Let $f : [a, b) \to \mathbb{R}$ be a function. Then the left limit of $f$ at $b$ is said to be infinity iff for all sequences $\{z_n\}_{n=1}^{\infty}$ in $[a, b)$ with $\lim_{n\to\infty} z_n = b$ we have $\lim_{n\to\infty} f(z_n) = \infty$. We denote $\lim_{z\to b^-} f(z) := \infty$.

Infinite right limits at $a$ for a function $f : (a, b] \to \mathbb{R}$ and infinite (two-sided) limits of a function $f : I \setminus \{x\} \to \mathbb{R}$ at $x \in I$, where $I$ is an open interval, are defined similarly. Limits equal to negative infinity are also defined similarly and they are denoted by $-\infty$. Finally, as before, infinite one-sided and two-sided limits of functions with larger domains are defined via the limits of appropriate restrictions.
We chose to put one-sided infinite limits in the spotlight in Definition 3.21, because functions often have different behavior to the left and to the right of a point. For example, $\lim_{z\to 0^+} \frac{1}{z} = \infty$ and $\lim_{z\to 0^-} \frac{1}{z} = -\infty$.
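The different one-sided behavior of $\frac{1}{z}$ at $0$ can be seen by evaluating along sequences that approach $0$ from either side. This is a small sketch with sequences $\pm 10^{-k}$ chosen here for illustration; the helper `delta_for_bound` is a hypothetical name that records the observation that, for $M > 0$, every $z$ with $0 < z < \frac{1}{M}$ satisfies $\frac{1}{z} > M$.

```python
def reciprocal(z: float) -> float:
    """The function z -> 1/z, defined for z != 0."""
    return 1.0 / z

# Values along z_k = 10**(-k) (from the right) and z_k = -10**(-k) (from the left).
right_values = [reciprocal(10.0 ** -k) for k in range(1, 6)]
left_values = [reciprocal(-(10.0 ** -k)) for k in range(1, 6)]

def delta_for_bound(M: float) -> float:
    """For M > 0, a delta such that 0 < z < delta implies 1/z > M."""
    return 1.0 / M

M = 1000.0
z = delta_for_bound(M) / 2      # a point with 0 < z < delta
exceeds_bound = reciprocal(z) > M

print(right_values)   # grows without bound
print(left_values)    # decreases without bound
print(exceeds_bound)
```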
It is not surprising that there is a formulation of infinite limits that is similar to the $\varepsilon$-$\delta$ formulation of finite limits.

Theorem 3.22 $M$-$\delta$ formulation of infinite left limits. The left limit of the function $f : [a, b) \to \mathbb{R}$ at $b$ is infinite iff for all $M \in \mathbb{R}$ there is a $\delta > 0$ such that for all $z \in [a, b)$ with $|z - b| < \delta$ we have $f(z) > M$.

Proof. Exercise 3-18. ∎

Of course, similar results also hold for right-sided and two-sided limits. Limit laws for infinite limits and a version of Theorem 3.19 are given in Exercises 3-22 and 3-23.
Exercises
3-15. Let $f, g : [a, b) \to \mathbb{R}$ be functions. Prove that if there is a number $\delta > 0$ so that $f$ and $g$ are equal on the subset $\{ z \in [a, b) : |z - b| < \delta \}$ of $[a, b)$, then the left limit of $f$ at $b$ exists iff the left limit of $g$ at $b$ exists and in this case we have $\lim_{z\to b^-} f(z) = \lim_{z\to b^-} g(z)$.
3-16. Prove Theorem 3.17. That is, let $f, g : [a, b) \to \mathbb{R}$ be functions such that $\lim_{z\to b^-} f(z)$ and $\lim_{z\to b^-} g(z)$ exist and prove each of the following.
(a) $\lim_{z\to b^-} (f + g)(z) = \lim_{z\to b^-} f(z) + \lim_{z\to b^-} g(z)$
(b) $\lim_{z\to b^-} (f - g)(z) = \lim_{z\to b^-} f(z) - \lim_{z\to b^-} g(z)$
(c) $\lim_{z\to b^-} (f \cdot g)(z) = \lim_{z\to b^-} f(z) \cdot \lim_{z\to b^-} g(z)$
(d) If $\lim_{z\to b^-} g(z) \ne 0$, then $\lim_{z\to b^-} \left( \frac{f}{g} \right)(z) = \frac{\lim_{z\to b^-} f(z)}{\lim_{z\to b^-} g(z)}$
3-17. Prove Theorem 3.18.
3-18. Prove Theorem 3.22.
3-19. Prove that $f(x) =$ [...] has a limit at $x = 1$ and state its value.
3-20. Prove that $\lim_{z\to 0^+} \sqrt{z} = 0$. Explain why this is a satisfactory resolution of the formal problem in Exercise 3-8 or why it is not.
3-21. A function $f$ is called nondecreasing on $I \subseteq \mathbb{R}$ iff for all $x_1 < x_2$ in $I$ we have $f(x_1) \le f(x_2)$. Let $f : [a, b] \to \mathbb{R}$ be a nondecreasing function.
(a) Prove that for every $x \in (a, b]$ we have $\lim_{z\to x^-} f(z) = \sup \{ f(z) : z < x \}$.
Hint. Mimic the proof of the Monotone Sequence Theorem.
(b) Prove that for every $x \in [a, b)$ the right limit $\lim_{z\to x^+} f(z)$ exists and state its value.
3-22. Limit laws involving infinite limits of functions. Let $f, g : [a, b) \to \mathbb{R}$ be functions, and let $\lim_{z\to b^-} f(z) = \infty$. Prove the following.
(a) If $g$ is bounded (that is, there is an $M > 0$ so that for all $z \in [a, b)$ we have $|g(z)| < M$), then $\lim_{z\to b^-} \left( f(z) + g(z) \right) = \infty$ and if all $f(z)$ are nonzero, then $\lim_{z\to b^-} \frac{g(z)}{f(z)} = 0$.
(b) If $\lim_{z\to b^-} g(z) = \infty$, then $\lim_{z\to b^-} \left( f(z) + g(z) \right) = \infty$ and $\lim_{z\to b^-} f(z) g(z) = \infty$.
(c) If $c > 0$ is a real number, then $\lim_{z\to b^-} c f(z) = \infty$.
(d) If $c < 0$ is a real number, then $\lim_{z\to b^-} c f(z) = -\infty$.
3-23. Let $I \subseteq \mathbb{R}$ be an open interval, let $x \in I$ and let $f : I \setminus \{x\} \to \mathbb{R}$ be a function. Prove that $\lim_{z\to x} f(z) = \infty$ iff $\lim_{z\to x^-} f(z) = \lim_{z\to x^+} f(z) = \infty$.
3.4 Continuity

Continuous functions are usually defined as functions for which the limit is computed by substituting the value. Because we may need to take care of endpoints, the formalization of the elementary definition from calculus requires an extra item (see number 4 below).

Definition 3.23 Let $D \subseteq \mathbb{R}$ be an interval of nonzero length from which at most finitely many points have been removed and let $f : D \to \mathbb{R}$ be a function. Then $f$ is called continuous at $x$ iff
1. $f(x)$ is defined, that is, $x \in D$, and
2. $\lim_{z\to x} f(z)$ exists, and
3. $\lim_{z\to x} f(z) = f(x)$, and
4. If $x$ is an endpoint of $D$, use left or right limits in 2 and 3, as appropriate.
$f$ is called continuous (on $D$) iff $f$ is continuous at every $x \in D$.
We could also define continuity for functions defined on sets for which every point of the domain is contained in an interval of nonzero length. Exercise 3-32 shows that with the present definition this idea is a bit too simple to produce a sensible result. This is not a problem, because in the early part of the text the only functions whose domains are not intervals are rational functions. For these functions the pathology of Exercise 3-32 is not an issue. Therefore we relegate all concerns regarding more complicated domains to Section 16.3.
For theorems, we will usually work with functions that are defined on intervals, because if $D$ is an interval from which at most finitely many points were removed, then $f : D \to \mathbb{R}$ is continuous at $x \in D$ iff $f|_I$ is continuous at $x$, where $I$ is a maximum-sized (with respect to containment) interval contained in $D$ that contains $x$.
Example 3.24
1. Constant functions are continuous at every $x \in \mathbb{R}$.
2. The function $f(x) = x$ is continuous at every $x \in \mathbb{R}$.
3. The function $f(x) = |x|$ is continuous at every $x \in \mathbb{R}$.
Parts 1 and 2 are trivial and part 3 follows from part 2 of Example 3.5. □
It is useful to incorporate the definition of limits directly into a characterization of continuity. With such a characterization it is not necessary to resolve the multiple parts of the original definition whenever we work with continuity. Figure 6 shows graphical interpretations of the conditions.
Theorem 3.25 Let $I \subseteq \mathbb{R}$ be an interval, let $x \in I$ and let $f : I \to \mathbb{R}$ be a function. Then the following are equivalent.
1. $f$ is continuous at $x$.
2. For every sequence $\{z_n\}_{n=1}^{\infty}$ with $z_n \in I$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} z_n = x$, we have $\lim_{n\to\infty} f(z_n) = f(x)$.
3. For every $\varepsilon > 0$, there is a $\delta > 0$ such that for all $z \in I$ with $|z - x| < \delta$ we have $|f(z) - f(x)| < \varepsilon$.
Proof. We will prove "1$\Rightarrow$2," "2$\Rightarrow$3" and "3$\Rightarrow$1." The remaining implications follow because logical implications are transitive. That is, the implications "1$\Rightarrow$2" and "2$\Rightarrow$3" imply "1$\Rightarrow$3," the implications "2$\Rightarrow$3" and "3$\Rightarrow$1" imply "2$\Rightarrow$1," and so on. We will assume throughout that $x$ is not an endpoint of the domain. The arguments are easily modified (by using appropriate one-sided limits and theorems) for the case that $x$ is an endpoint.

For "1$\Rightarrow$2," we need to prove that for every sequence $\{z_n\}_{n=1}^{\infty}$ with $z_n \in I$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} z_n = x$ we have $\lim_{n\to\infty} f(z_n) = f(x)$.

Because $f$ is continuous at $x$, we have $\lim_{z\to x} f(z) = f(x)$. By Definition 3.1, this means that for all sequences $\{z_n\}_{n=1}^{\infty}$ with $z_n \in I \setminus \{x\}$ and $\lim_{n\to\infty} z_n = x$ we have that $\lim_{n\to\infty} f(z_n) = f(x)$. Now let $\{z_n\}_{n=1}^{\infty}$ be a sequence with $z_n \in I$ and $\lim_{n\to\infty} z_n = x$. If there is an $N \in \mathbb{N}$ so that $z_n = x$ for all $n \ge N$, then there is nothing to prove. Otherwise let $\{z_{n_k}\}_{k=1}^{\infty}$ be the subsequence of all elements with $z_{n_k} \in I \setminus \{x\}$. Then $\lim_{k\to\infty} z_{n_k} = x$, and hence $\lim_{k\to\infty} f(z_{n_k}) = f(x)$. Now let $\varepsilon > 0$. Then there is a $K \in \mathbb{N}$ so that for all $k \ge K$ we have $|f(z_{n_k}) - f(x)| < \varepsilon$. Then for all $n \ge n_K$ either $z_n = x$ and $|f(z_n) - f(x)| = 0 < \varepsilon$ or there is a $k \ge K$ with $n = n_k$ and $|f(z_{n_k}) - f(x)| < \varepsilon$. This means $\lim_{n\to\infty} f(z_n) = f(x)$ for every sequence $\{z_n\}_{n=1}^{\infty}$ with $z_n \in I$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} z_n = x$, which establishes this part of the proof.

The proof of "2$\Rightarrow$3" is similar to the proof of "$\Rightarrow$" in Theorem 3.7 (Exercise 3-24).

For "3$\Rightarrow$1," note that, because $f$ is defined on $I$ and $x \in I$, $f$ is defined at $x$. The condition in part 3 implies by Theorem 3.7 that $\lim_{z\to x} f(z) = f(x)$, which completes the proof. ∎

Because of the formal problems with limits of functions outlined at the end of Section 16.3.1, in general settings one of the conditions in Theorem 3.25 is normally used to define continuity.
Standard Proof Technique 3.26 To prove the equivalence of several conditions, it is standard practice to prove that the first implies the second, the second implies the third, and so on, and finally that the last condition implies the first. All other implications follow from the transitivity of logical implications. □

The proofs of parts 5 and 6 of Theorem 3.27 serve as good examples of how the conditions in Theorem 3.25 can be used in conjunction.
Theorem 3.27 Let $I \subseteq \mathbb{R}$ be an interval, let $x \in I$ and let $f, g : I \to \mathbb{R}$ be continuous at $x \in I$. Then the following hold.
1. $f + g$ is continuous at $x$.
2. $f - g$ is continuous at $x$.
3. $f \cdot g$ is continuous at $x$.
4. If $g(x) \ne 0$ for all $x \in I$, then $\frac{f}{g}$ is continuous at $x$.
5. $\max\{f, g\}$ is continuous at $x$. (The maximum is defined pointwise.)
6. $\min\{f, g\}$ is continuous at $x$. (The minimum is defined pointwise.)

Proof. The first four parts are direct consequences of the corresponding limit laws. For part 5, let $x \in I$. We will establish continuity of $\max\{f, g\}$ at $x$ by proving that for all sequences $\{z_n\}_{n=1}^{\infty}$ with $z_n \in I$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} z_n = x$ we have $\lim_{n\to\infty} \max\{f(z_n), g(z_n)\} = \max\{f(x), g(x)\}$.

Let $\{z_n\}_{n=1}^{\infty}$ be a sequence in $I$ with $\lim_{n\to\infty} z_n = x$. In case $f(x) \ne g(x)$, assume without loss of generality that $f(x) > g(x)$. To prove that the limit of the image sequence is $\lim_{n\to\infty} \max\{f, g\}(z_n) = \max\{f, g\}(x) = f(x)$, let $\varepsilon > 0$. Because $f$ and $g$ are continuous at $x$, there is a $\delta > 0$ so that for all $z \in I$ with $|z - x| < \delta$ we have $|f(z) - f(x)| < \min\left\{ \varepsilon, \frac{f(x) - g(x)}{2} \right\}$ and $|g(z) - g(x)| < \frac{f(x) - g(x)}{2}$. In particular, for all $z \in I$ with $|z - x| < \delta$ we obtain

$f(z) > f(x) - \frac{f(x) - g(x)}{2} = g(x) + \frac{f(x) - g(x)}{2} > g(z),$

and hence $\max\{f, g\}(z) = f(z)$. For $\delta$, find an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|z_n - x| < \delta$. Then for all $n \ge N$ we infer

$|\max\{f, g\}(z_n) - \max\{f, g\}(x)| = |f(z_n) - f(x)| < \varepsilon,$

which proves continuity of $\max\{f, g\}$ at $x$ when $f(x) \ne g(x)$.

This leaves the case $f(x) = g(x)$. Let $\varepsilon > 0$. Because $f$ and $g$ are continuous at $x$, there is a $\delta > 0$ so that for all $z \in I$ with $|z - x| < \delta$ we have that $|f(z) - f(x)| < \varepsilon$ and $|g(z) - g(x)| < \varepsilon$. For $\delta$, find an $N \in \mathbb{N}$ so that for all $n \ge N$ we have that $|z_n - x| < \delta$. For all $n \ge N$, the maximum $\max\{f, g\}(z_n)$ is equal to $f(z_n)$ or $g(z_n)$. Because $|z_n - x| < \delta$, if $\max\{f, g\}(z_n) = f(z_n)$ we infer $|\max\{f, g\}(z_n) - \max\{f, g\}(x)| = |f(z_n) - f(x)| < \varepsilon$ and if $\max\{f, g\}(z_n) = g(z_n)$ we infer $|\max\{f, g\}(z_n) - \max\{f, g\}(x)| = |g(z_n) - g(x)| < \varepsilon$. Thus the maximum $\max\{f, g\}$ is continuous at $x$ when $f(x) = g(x)$.

The proof of part 6 is left as Exercise 3-25b. ∎
There is a faster proof of part 5 that relies on Theorem 3.30 below and on an algebraic representation of the maximum (see Exercise 3-26a). Because our main focus is on standard techniques in analysis, the longer proof was presented here. Theorem 3.27 gives access to two standard examples of continuous functions.
Example 3.28 A polynomial is a function $p : \mathbb{R} \to \mathbb{R}$ for which there are an $n \in \mathbb{N} \cup \{0\}$ and $a_0, \ldots, a_n \in \mathbb{R}$ so that $a_n \ne 0$ and for all $x \in \mathbb{R}$ we have $p(x) = \sum_{i=0}^{n} a_i x^i$. The number $n$ is called the degree of the polynomial. The constant function $p(x) = 0$ is also considered to be a polynomial. Its degree is defined to be $-\infty$. Every polynomial is continuous on $\mathbb{R}$. (See Exercise 3-28.) □

Example 3.29 A rational function is a function $r$ for which there are two polynomials $p, q : \mathbb{R} \to \mathbb{R}$ so that for all $x \in \mathbb{R}$ for which $q(x) \ne 0$ we have $r(x) = \frac{p(x)}{q(x)}$. By Theorem 3.27 and Example 3.28, every rational function is continuous on its domain $\{ x \in \mathbb{R} : q(x) \ne 0 \}$. (We implicitly use here that every polynomial has at most finitely many zeroes.) □
Continuity is also preserved by compositions.
Theorem 3.30 Let $I, J \subseteq \mathbb{R}$ be intervals, let $g : I \to \mathbb{R}$ be continuous at $x \in I$, let $g[I] \subseteq J$ and let $f : J \to \mathbb{R}$ be continuous at $g(x)$. Then $f \circ g : I \to \mathbb{R}$ is continuous at $x$.
Proof. We will prove that for every $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in I$ with $|z - x| < \delta$ we have $|f(g(z)) - f(g(x))| < \varepsilon$. Let $\varepsilon > 0$. Because $f$ is continuous at $g(x)$, there is a $\delta_f > 0$ so that for all $y \in J$ with $|y - g(x)| < \delta_f$ we have $|f(y) - f(g(x))| < \varepsilon$. Because $g$ is continuous at $x$, for $\delta_f$ there is a $\delta > 0$ so that for all $z \in I$ with $|z - x| < \delta$ we have $|g(z) - g(x)| < \delta_f$. But then because $|g(z) - g(x)| < \delta_f$ we infer $|f(g(z)) - f(g(x))| < \varepsilon$. Therefore we have proved that for every $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in I$ with $|z - x| < \delta$ we have $|f(g(z)) - f(g(x))| < \varepsilon$, which means that $f \circ g$ is continuous at $x$. ■
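The proof of Theorem 3.30 chains two tolerances: first a tolerance for $f$ at $g(x)$, then a tolerance for $g$ at $x$ that feeds it. The sketch below plays this through for two linear functions chosen purely for illustration, $f(y) = 3y$ and $g(z) = 2z + 1$; the names `delta_f`, `delta_g` and `delta_composite` are hypothetical helpers, and the finite spot check is no substitute for the proof.

```python
def f(y):
    return 3 * y

def g(z):
    return 2 * z + 1

def delta_f(eps):
    """For f(y) = 3y: |y - y0| < eps/3 implies |f(y) - f(y0)| < eps."""
    return eps / 3

def delta_g(eta):
    """For g(z) = 2z + 1: |z - x| < eta/2 implies |g(z) - g(x)| < eta."""
    return eta / 2

def delta_composite(eps):
    """Chain the tolerances as in the proof: first for f, then for g."""
    return delta_g(delta_f(eps))

def spot_check(x, eps, samples=50):
    """Test a finite number of points z with |z - x| < delta."""
    d = delta_composite(eps)
    for k in range(1, samples + 1):
        offset = 0.999 * d * k / samples
        for z in (x - offset, x + offset):
            if not abs(f(g(z)) - f(g(x))) < eps:
                return False
    return True

print(delta_composite(0.6), spot_check(1.0, 0.6))
```

For these particular functions the chained tolerance is $\varepsilon/6$, which matches the direct estimate $|f(g(z)) - f(g(x))| = 6|z - x|$.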
We conclude this section by characterizing discontinuities.
Definition 3.31 Let $D$ be an interval of nonzero length from which at most finitely many points $x_1, \ldots, x_n$ have been removed. If the function $f : D \to \mathbb{R}$ is not continuous at $x \in D \cup \{x_1, \ldots, x_n\}$, we speak of a discontinuity at $x$. There are several types of discontinuities. (For visualizations, see Figure 7; for examples, see Exercise 3-31.)
1. If $\lim_{z\to x} f(z)$ exists, or if $x$ is an endpoint of $D \cup \{x_1, \ldots, x_n\}$ and the appropriate one-sided limit exists, but the limit is not equal to $f(x)$; or if $x$ is not an endpoint of $D \cup \{x_1, \ldots, x_n\}$ and $f$ is not defined at $x$, we speak of a removable discontinuity.
2. If $\lim_{z\to x^-} f(z)$ and $\lim_{z\to x^+} f(z)$ exist, but they are not equal, we speak of a jump discontinuity.
3. If (at least) one of $\lim_{z\to x^-} f(z)$ and $\lim_{z\to x^+} f(z)$ does not exist and if there is a sequence $\{z_n\}_{n=1}^{\infty}$ in $D$ that converges to $x$ and $\lim_{n\to\infty} |f(z_n)| = \infty$, we speak of an infinite discontinuity.
4. If (at least) one of $\lim_{z\to x^-} f(z)$ and $\lim_{z\to x^+} f(z)$ does not exist, if there is a $\delta > 0$ so that $f$ is bounded in $\{z \in D : |z - x| < \delta\}$ and if there are two sequences $\{z_n\}_{n=1}^{\infty}$ and $\{w_n\}_{n=1}^{\infty}$ that converge to $x$ such that for all $n \in \mathbb{N}$ we have $z_n, w_n < x$ (or $z_n, w_n > x$) and such that both $\lim_{n\to\infty} f(z_n)$ and $\lim_{n\to\infty} f(w_n)$ exist, but they are not equal, we speak of a discontinuity by oscillation.
Figure 7: Visualization of the possible discontinuities of a function.
Theorem 3.32 Let $D \subseteq \mathbb{R}$ be an interval of nonzero length from which at most finitely many points $x_1, \ldots, x_n$ have been removed and let $f : D \to \mathbb{R}$ be a function. Then every discontinuity $x \in D \cup \{x_1, \ldots, x_n\}$ of $f$ is of one of the four types listed in Definition 3.31.
Proof. Let $x \in D$ or let $x$ be one of the finitely many elements that were removed and assume that $f$ is not continuous at $x$. Let the discontinuity at $x$ be neither a removable discontinuity, nor a jump discontinuity, nor an infinite discontinuity. We will prove that it must be a discontinuity by oscillation.
If $\lim_{z\to x^-} f(z)$ and $\lim_{z\to x^+} f(z)$ both existed, then they would either be equal and the discontinuity would be removable, or not, in which case the discontinuity would be a jump discontinuity. Hence, one of the one-sided limits does not exist at $x$. If $x$ is the supremum or the infimum of $D$, then $f$ is defined at $x$ and one of the two one-sided limits does not exist by default. In this case, the respective other one-sided limit also must not exist, because otherwise there would be a removable discontinuity at $x$. By symmetry, we can assume without loss of generality that $\lim_{z\to x^-} f(z)$ does not exist and $f$ is defined on some interval $[x - \delta, x)$.
Because the discontinuity at $x$ is not an infinite discontinuity, there is a $\nu > 0$ so that $f$ is bounded on $[x - \nu, x)$. Let $z_n := x - \frac{\nu}{n}$. Then $\lim_{n\to\infty} z_n = x$. First consider the case that $\lim_{n\to\infty} f(z_n) =: L$ exists. Because $\lim_{z\to x^-} f(z)$ does not exist, there must be an $\varepsilon > 0$ and a sequence $\{v_k\}_{k=1}^{\infty}$ so that $\lim_{k\to\infty} v_k = x$, and for all $k \in \mathbb{N}$ we have $x - \nu \le v_k < x$ and $|f(v_k) - L| \ge \varepsilon$. Because $f$ is bounded on $[x - \nu, x)$, the sequence $\{f(v_k)\}_{k=1}^{\infty}$ is bounded. By the Bolzano-Weierstrass Theorem, it has a convergent subsequence $\{f(v_{k_n})\}_{n=1}^{\infty}$. Then $\lim_{n\to\infty} f(v_{k_n}) \neq L$, and hence $\{w_n\}_{n=1}^{\infty} := \{v_{k_n}\}_{n=1}^{\infty}$ and $\{z_n\}_{n=1}^{\infty}$ are sequences as required in the definition of a discontinuity by oscillation.
If $\{f(z_n)\}_{n=1}^{\infty}$ does not converge, note that it is bounded, and hence it has a convergent subsequence $\{f(z_{n_k})\}_{k=1}^{\infty}$. Now $f$ has a discontinuity by oscillation at $x$, because we can replace $\{z_n\}_{n=1}^{\infty}$ with $\{z_{n_k}\}_{k=1}^{\infty}$ in the above argument and then repeat it. ■
Standard Proof Technique 3.33 When an argument requires a convergent sequence and all that is guaranteed for a given sequence $\{z_n\}_{n=1}^{\infty}$ is a convergent subsequence, then one often assumes without loss of generality that $\{z_n\}_{n=1}^{\infty}$ converges. This is because, just like at the end of the proof of Theorem 3.32, the given sequence can be replaced with a convergent subsequence that (usually) has all the properties of $\{z_n\}_{n=1}^{\infty}$, plus it converges. □
Although Theorem 3.32 characterizes all possible discontinuities for functions from $\mathbb{R}$ to $\mathbb{R}$, other types of discontinuities exist for functions with infinite dimensional range (see Exercise 16-36).
Exercises
3-24. Prove part "2 ⇒ 3" of Theorem 3.25.
3-25. Completing the proof of Theorem 3.27.
(a) Give all details of the proofs of parts 1-4.
(b) Prove part 6.
3-26. Alternative proofs of parts 5 and 6 of Theorem 3.27. Let $I$ be an interval and let $f, g : I \to \mathbb{R}$ be functions.
(a) Use parts 1 and 2 and Theorem 3.30 to prove that if $f$ and $g$ are continuous at $x \in I$, then $\max\{f, g\}$ is continuous at $x$.
Hint. First prove that for all $a, b \in \mathbb{R}$ the equality $\max\{a, b\} = \frac{1}{2}(a + b + |a - b|)$ holds.
(b) Give a similar proof of part 6 of Theorem 3.27.
3-27. Prove that for all $n \in \mathbb{N}$ the function $f(x) = x^n$ is continuous on $\mathbb{R}$. Then prove that for all $m \in \mathbb{N}$ the function $f(x) = x^{-m}$ is continuous on $\mathbb{R} \setminus \{0\}$.
3-28. Prove that all polynomials are continuous on $\mathbb{R}$. Hint. Induction on the degree.
3-29. Alternative proofs of Theorem 3.30.
(a) Prove Theorem 3.30 using Definition 3.23 and Theorem 3.14.
(b) Prove Theorem 3.30 using part 2 of Theorem 3.25.
3-30. Explain why we demand that the interval in Definition 3.23 must have nonzero length.
3-31. Examples of discontinuities.
(a) Prove that $f(x) = \frac{x^2}{x}$ has a removable discontinuity at 0.
(b) Prove that $f(x) = \lceil x \rceil$ has a jump discontinuity at 0.
(c) Prove that $f(x) = \frac{1}{x}$ has an infinite discontinuity at 0.
(d) Let g(x) := +2;
.
1 f o r O S x i -, 2 1 for - < x 5 1. 2 -
,
has a discontinuity by oscillation at 0
u [k - L. ‘1 m
3-32. Let
u :=
10n ’ n
,,=I
f : [-l,O] (a) (b) (c) (d)
UU
--f
:= ( x E
B :(3n
I
E N :x E
0;
[
-
A,:])I
and let the function
forx E [-l,O], E U.
W be defined by f ( x ) := 1: forx
Prove that $f$ is continuous on every interval $I$ that is contained in its domain $D := [-1, 0] \cup U$. Prove that every point in $D$ is contained in an interval of nonzero length. Explain why the function still should not be considered to be continuous on $D$. Suggest a generalization of the definition of continuity that would allow domains such as $D$ and that would make $f$ discontinuous at 0.
This function will be revisited in Exercise 16-29.
Figure 8: Visualizing the Intermediate Value Theorem. Intuitively (a) it is clear that an unbroken graph that goes from a point below the x-axis to a point above the x-axis must cross the x-axis at least once (solid graph) and that it could even cross the x-axis multiple times (dotted continuation past the first intercept). Part (b) gives a visualization of the proof: the $x_n'$ approach $c$ from the left and the $x_n$ approach $c$ from the right. Because the sequences $\{x_n\}_{n=1}^{\infty}$ and $\{x_n'\}_{n=1}^{\infty}$ meet in the middle at $(c, f(c))$, we conclude $f(c) = 0$.
3.5 Properties of Continuous Functions
Aside from their obvious connection to limits, continuous functions are interesting because they have several useful properties.
Theorem 3.34 Intermediate Value Theorem. Let $a < b$ and let $f : [a, b] \to \mathbb{R}$ be a continuous function. If $f(a) < 0$ and $f(b) > 0$ (or vice versa) then there is a $c \in (a, b)$ such that $f(c) = 0$. (Also see Figure 8(a).)
Proof. Assume without loss of generality that $f(a) < 0$ and $f(b) > 0$. The proof for the other case is similar. The set $G := \{x \in [a, b] : f(x) > 0\}$ contains $b$ and it is bounded below by $a$. Let $c := \inf(G)$. We will show that $c$ is as claimed in the theorem.
First, we show that $c \notin \{a, b\}$. For a contradiction suppose that $c = a$. Then by the version of Proposition 1.21 for infima, for each $n \in \mathbb{N}$ there would be an $x_n \in \left(a, a + \frac{1}{n}\right)$ with $f(x_n) > 0$. (Note that we are using Standard Proof Technique 3.8 here.) But then $\lim_{n\to\infty} x_n = a$, and by continuity of $f$ we could infer that $0 > f(a) = \lim_{n\to\infty} f(x_n) \ge 0$, a contradiction. The inequality $c \neq b$ is proved similarly (see Exercise 3-33).
Because $c \neq b$, again by the version of Proposition 1.21 for infima, for each $n \in \mathbb{N}$ there is an $x_n \in \left[c, c + \frac{1}{n}\right)$ with $f(x_n) > 0$. In particular, $\lim_{n\to\infty} x_n = c$. Because $c \neq a$, there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the number $x_n' := c - \frac{1}{n}$ is greater than or equal to $a$. Hence, the sequence $\{x_n'\}_{n=N}^{\infty}$ satisfies $\lim_{n\to\infty} x_n' = c$ and $f(x_n') \le 0$ for all $n \ge N$. For a visualization of the sequences, consider Figure 8.
Because $f$ is continuous, we infer $\lim_{n\to\infty} f(x_n) = \lim_{n\to\infty} f(x_n') = f(c)$. But the inequalities for $f(x_n)$ and $f(x_n')$ show that $\lim_{n\to\infty} f(x_n) \ge 0$ and $\lim_{n\to\infty} f(x_n') \le 0$. Thus $f(c)$ must be greater than or equal to zero and less than or equal to zero, which implies $f(c) = 0$. ■
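The sign information used in the proof also drives the bisection method. The following is a minimal sketch under the hypotheses of the theorem ($f$ continuous with $f(a) < 0 < f(b)$). The function name `bisect_zero`, the tolerance and the example $f(x) = x^3 - 2$ are our own assumptions, and the method locates some zero of $f$, not necessarily the point $c = \inf(G)$ from the proof.

```python
def bisect_zero(f, a, b, tol=1e-10):
    """Approximate a zero of f on [a, b].

    Assumes f is continuous with f(a) < 0 < f(b). The loop keeps
    f(lo) <= 0 < f(hi), so a zero always lies in [lo, hi].
    """
    if not (f(a) < 0 < f(b)):
        raise ValueError("need f(a) < 0 < f(b)")
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid      # mid lies in the set where f is positive
        else:
            lo = mid      # f(mid) <= 0, so a zero lies in [mid, hi]
    return (lo + hi) / 2

# Example: the zero of x^3 - 2 on [0, 2] is the cube root of 2.
root = bisect_zero(lambda x: x ** 3 - 2, 0.0, 2.0)
print(root)
```

The interval halves in every step, so the two endpoint sequences play the same role as the sequences that meet at $c$ in Figure 8(b).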
The Intermediate Value Theorem immediately implies that continuous images of intervals are intervals, too.
Definition 3.35 Let $A, B$ be sets, let $f : A \to B$ be a function and let $C \subseteq A$. Then the image of $C$ under $f$ is defined to be $f[C] := \{f(c) : c \in C\}$.
Theorem 3.36 Let $I \subseteq \mathbb{R}$ be an interval and let $f : I \to \mathbb{R}$ be continuous. Then $f[I]$ is an interval.
Proof. Let $l, u \in f[I]$ with $l < u$ and let $m \in (l, u)$. Then there are $a, b \in I$ with $f(a) = l$ and $f(b) = u$. Without loss of generality assume $a < b$. The function $g(x) := f(x) - m$ is continuous on $[a, b]$. By the Intermediate Value Theorem there is a $c \in (a, b)$ so that $g(c) = 0$. But then $f(c) = g(c) + m = m$. Because $m, l, u$ were arbitrary we have proved that for any two elements $l < u$ of $f[I]$ and any element $m$ between them we have that $m \in f[I]$. Therefore $f[I]$ is an interval. ■
We now turn to inverse functions.
Definition 3.37 Let $A, B$ be sets and let $f : A \to B$ be a bijective function. Then the inverse function of $f$ is the unique (see Exercise 3-34) function $f^{-1} : B \to A$ that maps each $b \in B$ to the unique $a \in A$ so that $f(a) = b$.
Theorem 3.38 Let $I \subseteq \mathbb{R}$ be an interval and let $f : I \to \mathbb{R}$ be a continuous injective function. Then the inverse function $f^{-1} : f[I] \to I$ is also continuous.
Proof. Clearly, $f$ maps $I$ bijectively onto $f[I]$, and hence $f$ has an inverse function $f^{-1}$ that is defined on $f[I]$. By Theorem 3.36, $f[I]$ is an interval. All that is left is to show that if a sequence $\{y_n\}_{n=1}^{\infty}$ in $f[I]$ converges to $y \in f[I]$, then $\{f^{-1}(y_n)\}_{n=1}^{\infty}$ converges to $f^{-1}(y)$.
Let $\{y_n\}_{n=1}^{\infty}$ be a sequence in $f[I]$ with $\lim_{n\to\infty} y_n = y$. For $n \in \mathbb{N}$, let $x_n := f^{-1}(y_n)$ and let $x := f^{-1}(y)$. For $\varepsilon > 0$, let $J := \{z \in I : |z - x| < \varepsilon\}$, $J_l := \{z \in J : z \le x\}$ and $J_u := \{z \in J : z \ge x\}$. Then $f[J]$, $f[J_l]$ and $f[J_u]$ are all intervals that contain $y = f(x)$. Because $f$ is injective, the only point $f[J_l]$ and $f[J_u]$ have in common is $y$. Therefore $y$ is the maximum of one of the intervals $f[J_l]$ and $f[J_u]$ and it is the minimum of the other.
If both $f[J_l]$ and $f[J_u]$ have more than one element, then there is a $\delta > 0$ so that $(y - \delta, y + \delta) \subseteq f[J]$. In this case, because $\{y_n\}_{n=1}^{\infty}$ converges to $y$, there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $y_n \in (y - \delta, y + \delta) \subseteq f[J]$. Consequently, for all $n \ge N$ the point $x_n$ is in $J$, which means $|x_n - x| < \varepsilon$.
If $f[J_u]$ has exactly one element, then $J_u$ has exactly one element and $x$ is the largest point of $I$. Therefore $f[J_l]$ has more than one element and there is a $\delta > 0$ so that $(y - \delta, y] \subseteq f[J_l]$ or $[y, y + \delta) \subseteq f[J_l]$. Without loss of generality assume $(y - \delta, y] \subseteq f[J_l]$. We claim that then $y$ is the largest point of $f[I]$. Suppose for a contradiction that there was a $y' > y$ in $f[I]$. Then $a := f^{-1}(y') < x$ and there would also be a $b < x$ with $f(b) < y$. By Theorem 3.36 there would be a $c \neq x$ between $a$ and $b$ so that $f(c) = y = f(x)$, contradicting the injectivity of $f$. Thus $y = \sup f[I]$. Because $\{y_n\}_{n=1}^{\infty}$ converges to $y$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $y_n \in (y - \delta, y] \subseteq f[J_l]$. Thus for all $n \ge N$ we infer $x_n \in J_l$, which means $|x_n - x| < \varepsilon$. The case in which $f[J_l]$ has exactly one element is handled similarly.
We have shown that in each of the above cases $f^{-1}\left(\lim_{n\to\infty} y_n\right) = \lim_{n\to\infty} f^{-1}(y_n)$, which implies that $f^{-1}$ is continuous. ■
Now that we know that inverses of continuous functions are continuous, we can establish limit laws for powers with rational exponents.
Definition 3.39 A number $n \in \mathbb{Z}$ is called even iff there is a $k \in \mathbb{Z}$ so that $n = 2k$ and it is called odd iff there is a $k \in \mathbb{Z}$ so that $n = 2k + 1$.
Corollary 3.40 Let $d \in \mathbb{N}$. Then $f(x) = x^{\frac{1}{d}}$ is continuous on $[0, \infty)$ if $d$ is even and it is continuous on $\mathbb{R}$ if $d$ is odd.
Proof. Use Theorem 3.38 (Exercise 3-35).
Corollary 3.41 Let $r \in \mathbb{Q}$ be positive. Then $f(x) = x^r$ is continuous on $[0, \infty)$, and if $r$ can be represented as a fraction with odd denominator, $f$ is continuous on $\mathbb{R}$. For negative $r \in \mathbb{Q}$, $f(x) = x^r$ is continuous on $(0, \infty)$, and if $r$ can be represented as a fraction with odd denominator, $f$ is continuous on $\mathbb{R} \setminus \{0\}$.
Proof. Use Exercise 3-27 and Corollary 3.40 (Exercise 3-36).
Theorem 3.42 Let $\{a_n\}_{n=1}^{\infty}$ be a convergent sequence of nonnegative numbers. Then for all positive $r \in \mathbb{Q}$ we have $\lim_{n\to\infty} a_n^r = \left(\lim_{n\to\infty} a_n\right)^r$.
Proof. Exercise 3-37. We conclude this section by showing that on closed and bounded intervals continuous functions assume an absolute minimum and an absolute maximum.
Definition 3.43 Let $I \subseteq \mathbb{R}$ be an interval and let $f : I \to \mathbb{R}$ be a real valued function. The number $y_m$ is called the absolute minimum value of $f$ (in $I$) if and only if there is an $x_m \in I$ with $f(x_m) = y_m$ and for all $x \in I$ we have $f(x) \ge f(x_m)$. The number $y_M$ is called the absolute maximum value of $f$ (in $I$) if and only if there is an $x_M \in I$ with $f(x_M) = y_M$ and for all $x \in I$ we have $f(x) \le f(x_M)$. A value that is the absolute maximum or the absolute minimum is also called an absolute extremum.
Theorem 3.44 Let $f : [a, b] \to \mathbb{R}$ be continuous. Then there is an $x \in [a, b]$ such that for all $z \in [a, b]$ we have $f(x) \ge f(z)$.
Proof. First we show that $f$ is bounded above on $[a, b]$. For a contradiction, suppose that for every $n \in \mathbb{N}$ there is an $x_n \in [a, b]$ such that $f(x_n) \ge n$. Then by the Bolzano-Weierstrass Theorem there is a convergent subsequence $\{x_{n_k}\}_{k=1}^{\infty}$ with limit $x \in [a, b]$. But then for all $k$ we have $f(x_{n_k}) \ge n_k$ while at the same time $\lim_{k\to\infty} f(x_{n_k}) = f(x) < \infty$, which is not possible.
Thus $f$ is bounded above. Let $M := \sup\{f(z) : z \in [a, b]\}$. For each $n \in \mathbb{N}$ find $x_n \in [a, b]$ with $f(x_n) \ge M - \frac{1}{n}$. Again by the Bolzano-Weierstrass Theorem there is a convergent subsequence $\{x_{n_k}\}_{k=1}^{\infty}$ with limit $x \in [a, b]$. For all $k$ we have $M - \frac{1}{n_k} \le f(x_{n_k}) \le M$, which means by the Squeeze Theorem that
$$f(x) = \lim_{k\to\infty} f(x_{n_k}) = M = \sup\{f(z) : z \in [a, b]\}.$$ ■
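Theorem 3.44 guarantees that the supremum is attained, but the proof gives no procedure for locating the point. Purely as an illustration, the sketch below samples a continuous function on a finite grid of $[a, b]$ and reports the best grid point. The helper `approx_max`, the grid size and the example $f(x) = x(1 - x)$ on $[0, 1]$ are assumptions made for this sketch, and a grid search can miss narrow peaks.

```python
def approx_max(f, a, b, n=10_000):
    """Return (x, f(x)) for the grid point of [a, b] where f is largest.

    The grid a, a + (b-a)/n, ..., b is finite, so this only approximates
    the absolute maximum whose existence Theorem 3.44 guarantees.
    """
    grid = [a + (b - a) * k / n for k in range(n + 1)]
    x_best = max(grid, key=f)
    return x_best, f(x_best)

# Example: x(1 - x) on [0, 1] has its absolute maximum 1/4 at x = 1/2.
x_best, y_best = approx_max(lambda x: x * (1 - x), 0.0, 1.0)
print(x_best, y_best)
```

On an interval that is not closed and bounded the analogous search need not settle on any point of the domain, which is the phenomenon behind Exercise 3-39.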
Exercises
3-33. Prove that $c \neq b$ in the proof of the Intermediate Value Theorem.
3-34. Prove that if $f : A \to B$ is bijective and $g, h : B \to A$ are inverses of $f$, then $g(b) = h(b)$ for all $b \in B$.
3-35. Prove Corollary 3.40.
3-36. Prove Corollary 3.41.
3-37. Prove Theorem 3.42.
3-38. Let $f : [a, b] \to \mathbb{R}$ be continuous. Prove that there is an $x \in [a, b]$ such that for all $z \in [a, b]$ the inequality $f(x) \le f(z)$ holds.
3-39. Let $I \subseteq \mathbb{R}$ be an interval. Give an example that shows that even if $f : I \to \mathbb{R}$ is continuous, $f[I]$ need not be bounded. Then explain why this example does not contradict Theorem 3.44.
3-40. Let $I$ be an interval and let $f : I \to \mathbb{R}$ be continuous and injective. Prove that $f$ is either increasing (that is, for all $x_1 < x_2$ we have $f(x_1) < f(x_2)$) or decreasing (that is, for all $x_1 < x_2$ we have $f(x_1) > f(x_2)$).
3-41. Although the contrapositive is often used to clarify a mathematical statement, some contrapositives can be quite confusing. Determine if the statement "If $f(c) \neq 0$ for all $c \in [a, b]$, then $f(a) > 0$ or $f(b) \le 0$ or $f$ is not continuous on $[a, b]$." is true or false by analyzing its (much simpler) contrapositive.
3.6 Limits at Infinity
For functions defined on an interval $(t, \infty)$ it is sensible to investigate the behavior as the argument $x$ gets large. The resulting notion of a limit at infinity is set up in exactly the same way as the other limits discussed in this chapter. Thus it is not surprising that similar laws hold.
Definition 3.45 Let $L$ be a real number and let $f : (t, \infty) \to \mathbb{R}$ be a function. We say $f$ converges to the limit $L$ at $\infty$ and write $\lim_{z\to\infty} f(z) = L$ iff for every sequence $\{z_n\}_{n=1}^{\infty}$ in $(t, \infty)$ such that $\lim_{n\to\infty} z_n = \infty$ we have $\lim_{n\to\infty} f(z_n) = L$. Limits at $-\infty$ are defined similarly. For functions with domains that contain intervals $(t, \infty)$ or $(-\infty, t)$ the limits at $\pm\infty$ are defined as the limits of the appropriate restriction.
Theorem 3.46 Limit laws for limits of functions at $\infty$. Let $f, g : (t, \infty) \to \mathbb{R}$ be functions so that $\lim_{z\to\infty} f(z)$ and $\lim_{z\to\infty} g(z)$ exist. Then the following hold.
1. $\lim_{z\to\infty} (f + g)(z) = \lim_{z\to\infty} f(z) + \lim_{z\to\infty} g(z)$.
2. $\lim_{z\to\infty} (f - g)(z) = \lim_{z\to\infty} f(z) - \lim_{z\to\infty} g(z)$.
3. $\lim_{z\to\infty} (f \cdot g)(z) = \lim_{z\to\infty} f(z) \cdot \lim_{z\to\infty} g(z)$.
Each equation implicitly asserts that if the right side exists, so does the left side and in this case the equality holds.
Proof. Exercise 3-42. ■
Theorem 3.47 $\varepsilon$-$M$ formulation for the limit of a function at infinity. The function $f : (t, \infty) \to \mathbb{R}$ converges to the limit $L$ at infinity if and only if for every $\varepsilon > 0$ there is an $M$ such that for all $z \ge M$ we have $|f(z) - L| < \varepsilon$.
Proof. Exercise 3-43. ■
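To make the $\varepsilon$-$M$ formulation of Theorem 3.47 concrete, consider the example $f(z) = \frac{1}{z}$ on $(0, \infty)$ with $L = 0$, which is our own choice and not taken from the text. For this function $M = \frac{2}{\varepsilon}$ works, because $z \ge M$ gives $\left|\frac{1}{z}\right| \le \frac{\varepsilon}{2} < \varepsilon$. The sketch below spot-checks that choice; the names `threshold` and `spot_check` are hypothetical, and checking finitely many points is not a proof.

```python
def threshold(eps):
    """A value M with |1/z - 0| < eps for all z >= M (here M = 2/eps)."""
    return 2.0 / eps

def spot_check(eps, samples=1000):
    """Check |1/z - 0| < eps at z = M, 2M, 3M, ... (finitely many points)."""
    m = threshold(eps)
    return all(abs(1.0 / (m * k) - 0.0) < eps for k in range(1, samples + 1))

print(spot_check(0.1), spot_check(1e-6))
```

Note that $M = \frac{1}{\varepsilon}$ would not satisfy the strict inequality at $z = M$ itself, which is why the sketch uses a slightly larger threshold.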
Similar results hold for limits at $-\infty$. Moreover, infinite limits at $\pm\infty$ can be defined similar to infinite limits at a finite number and similar limit laws hold. By now, the reader is sufficiently familiar with the underlying ideas to formulate these definitions independently (see Exercise 3-44).
Exercises
3-42. Prove Theorem 3.46.
(a) Prove part 1 of Theorem 3.46.
(b) Prove part 2 of Theorem 3.46.
(c) Prove part 3 of Theorem 3.46.
(d) Prove part 4 of Theorem 3.46.
3-43. Prove Theorem 3.47.
3-44. Infinite limits at infinity.
(a) State the definition of $\lim_{z\to\infty} f(z) = \infty$.
(b) State limit laws for infinite limits at $\infty$.
(c) State and prove a result similar to Theorem 3.47 for infinite limits at infinity.
3-45. Explain why we do not define left and right limits at infinity.
Chapter 4
Differentiable Functions
Geometrically speaking, differentiable functions have unbroken graphs without corners. This “smoothness” of differentiable functions is useful in applications. In the present chapter differentiability is introduced in Section 4.1, the relation between differentiability and the common operations on functions is considered in Section 4.2, and some geometric consequences of differentiability are provided in Section 4.3.
4.1 Differentiability
The derivative provides the slope of the tangent line, if the function has a tangent line at the point. Definition 4.1 encodes this idea by demanding that as we fix one point and move the other one closer to this "base point," the slopes of the secant lines through these points approach a limit. The left part of Figure 9 shows that this convergence of the slopes means that the secant lines tilt towards a "limiting line," the tangent line. It should be noted that differentiability typically is considered on open intervals so that secant lines "in both directions" can be used to obtain the limit.
Definition 4.1 A function $f : (a, b) \to \mathbb{R}$ is differentiable at $x \in (a, b)$ iff the limit $\lim_{z\to x} \frac{f(z) - f(x)}{z - x}$ exists. In this case we set $f'(x) := \lim_{z\to x} \frac{f(z) - f(x)}{z - x}$ and call it the derivative of $f$ at $x$. Other notations for the derivative at $x$ are $\frac{df}{dx}(x)$ and $Df(x)$. Similar to limits of functions, if $D \subseteq \mathbb{R}$, a function $f : D \to \mathbb{R}$ is called differentiable at $x \in D$ iff there is an open interval $(a, b) \subseteq D$ so that $x \in (a, b)$ and so that $f|_{(a,b)}$ is differentiable at $x$. Moreover, similar to what we did so far, we will mostly work with functions defined on open intervals, trusting that the reader can make the jump to larger domains.
Example 4.2 The following are verified with routine computations (Exercise 4-1).
1. At every $x \in \mathbb{R}$ the function $f(x) = x$ is differentiable with $f'(x) = 1$.
2. The function $f(x) = |x|$ is not differentiable at $x = 0$. □
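The second part of Example 4.2 can be seen numerically: the difference quotients of $|x|$ at $0$ are constantly $-1$ from the left and $+1$ from the right, so they cannot approach a single limit. The sketch below only evaluates the quotients from Definition 4.1 at a few sample points; the helper name `difference_quotient` and the sample points $\pm 10^{-k}$ are our own choices.

```python
def difference_quotient(f, x, z):
    """The slope (f(z) - f(x)) / (z - x) of the secant line, for z != x."""
    return (f(z) - f(x)) / (z - x)

# Sample points approaching 0 from the left and from the right.
left = [difference_quotient(abs, 0.0, -(10.0 ** -k)) for k in range(1, 6)]
right = [difference_quotient(abs, 0.0, 10.0 ** -k) for k in range(1, 6)]

# For comparison: the identity function has all difference quotients equal to 1.
identity = [difference_quotient(lambda t: t, 2.0, 2.0 + 10.0 ** -k) for k in range(1, 6)]

print(left, right)
```

The routine computation asked for in Exercise 4-1 turns this observation into a proof by exhibiting two sequences along which the quotients have different limits.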
As with continuity, the local property of differentiability can be demanded at every point of the domain to obtain a global definition.
Definition 4.3 Let $D \subseteq \mathbb{R}$. A function $f : D \to \mathbb{R}$ is differentiable, or differentiable on $D$, iff it is differentiable at every $x \in D$.
Continuity at $x$ means that $\lim_{z\to x} f(z) - f(x) = 0$. Theorem 4.4 affirms that convergence of the quotient $\frac{f(z) - f(x)}{z - x}$ to a number is a stronger condition.
Theorem 4.4 Let $f : (a, b) \to \mathbb{R}$ be differentiable at $x$. Then $f$ is continuous at $x$. Hence, every differentiable function is continuous. However, not every continuous function is differentiable.
Proof. Let $f$ be differentiable at $x$. Then $\lim_{z\to x} \frac{f(z) - f(x)}{z - x}$ exists. This implies
$$0 = \lim_{z\to x}(z - x) \cdot \lim_{z\to x} \frac{f(z) - f(x)}{z - x} = \lim_{z\to x} (z - x)\frac{f(z) - f(x)}{z - x} = \lim_{z\to x} f(z) - f(x),$$
that is, $\lim_{z\to x} f(z) = \lim_{z\to x} \big(f(z) - f(x)\big) + \lim_{z\to x} f(x) = 0 + f(x) = f(x)$, and $f$ is continuous at $x$.
To see that not every continuous function is differentiable, consider $f(x) = |x|$, which is continuous, but it is not differentiable at $x = 0$. ■
Ultimately, we will generalize differentiability to higher dimensions. In higher dimensions, division is not possible, but it is possible to define entities that are similar to tangent lines. Theorem 4.5 shows the difference between differentiability at $x$ and continuity at $x$ without using division. A differentiable function $f$ can be approximated by a straight line $g(z) = f(x) + f'(x)(z - x)$ through $(x, f(x))$ in such a way that the difference between $f$ and $g$ goes to zero faster than $|z - x|$. Geometrically, this means (see Exercise 4-2a and Figure 9(b)) that, no matter how small the width, near $x$ the differentiable function $f$ will enter all "cones" which are centered at $(x, f(x))$ and symmetric about the line $g$. This idea ultimately leads to the definition of differentiability in higher dimensional spaces (see Definition 17.24). For a continuous function $f$, the difference between $f$ and any straight line through $(x, f(x))$ goes to zero (see Exercise 4-2c), but the function need not enter arbitrarily narrow cones about the line.
Figure 9: Two ways to view differentiability. In (a), as in Definition 4.1, the slopes of the secant lines through $(x, f(x))$ and $(z, f(z))$ approach a number, which is the slope of the tangent line. In (b), as in Theorem 4.5 and Exercise 4-2a, the function and a certain straight line, the tangent line, are such that for any width $\varepsilon > 0$, near $x$ the function will ultimately enter the "cone of width $\varepsilon$" about the tangent line.
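The "cone" picture can be probed numerically by looking at the ratio $\frac{|f(z) - f(x) - L(z - x)|}{|z - x|}$, which must become smaller than every width $\varepsilon$ near $x$ if the line with slope $L$ is the tangent line. The sketch below is an illustration under our own choice of examples: $f(t) = t^2$ at $x = 1$ with $L = 2$, and $f(t) = |t|$ at $x = 0$ with the arbitrary candidate slope $L = \frac{1}{2}$. The helper `cone_ratio` is a hypothetical name.

```python
def cone_ratio(f, x, slope, z):
    """|f(z) - f(x) - slope * (z - x)| / |z - x| for z != x."""
    return abs(f(z) - f(x) - slope * (z - x)) / abs(z - x)

def worst_ratio(f, x, slope, h):
    """Larger of the two ratios at distance h to the left and right of x."""
    return max(cone_ratio(f, x, slope, x + h), cone_ratio(f, x, slope, x - h))

square = lambda t: t * t
distances = [10.0 ** -k for k in range(1, 7)]

# For t^2 at x = 1 with slope 2 the ratio equals |z - 1| and shrinks to zero.
ratios_square = [worst_ratio(square, 1.0, 2.0, h) for h in distances]

# For |t| at x = 0 with slope 1/2 the ratio stays at 3/2 on the left side.
ratios_abs = [worst_ratio(abs, 0.0, 0.5, h) for h in distances]

print(ratios_square[-1], ratios_abs[-1])
```

For $|t|$ the same happens with every candidate slope $L$: one of the two one-sided ratios is at least $1$, so the graph never enters cones of width less than $1$.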
Theorem 4.5 Let $f : (a, b) \to \mathbb{R}$ be a function and let $x \in (a, b)$. Then $f$ is differentiable at $x$ iff there is an $L \in \mathbb{R}$ so that for every $\varepsilon > 0$ there is a $\delta > 0$ such that for all $z \neq x$ with $|z - x| < \delta$ we have $|f(z) - f(x) - L(z - x)| < \varepsilon |z - x|$. Moreover, in this case $f'(x) = L$.
[...] $L \in \mathbb{R}$ so that for every $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in \mathbb{R}$ with $|z - x| < \delta$ we have
$$\big(f(x) + L(z - x)\big) - \varepsilon |z - x| \le f(z) \le \big(f(x) + L(z - x)\big) + \varepsilon |z - x|.$$
Also prove that in this case $f'(x) = L$. Use this result to explain the right part of Figure 9.
(b) Prove that for any $m \in \mathbb{R}$ we have $\lim_{z\to 0} |z| - mz = 0$, even though $f(x) = |x|$ is not differentiable at $x = 0$.
(c) Let $f : (a, b) \to \mathbb{R}$ be a function that is continuous at $x \in (a, b)$ and let $m \in \mathbb{R}$. Prove that $\lim_{z\to x} \big| f(z) - [f(x) + m(z - x)] \big| = 0$.
4-3. For f : ( u , b ) + B and x E ( a , b ) we define the left-sided derivative of f at x via the left limit ~ ’ f ( x:=) lim z’x-
- f ( x ) if the limit exists and we define the right-sided derivative o f f at z-x
x via the right limit D ” f ( x ) := lim f ( z ) (-x) = lim f (2) + g ( z ) - f ( X I - g ( x ) z - X
z+x
z--x
4.2. Differentiation Rules
The remainder of the proof is left to the reader as Exercises 4-4 and 4-5. ■
Derivatives are also well-behaved when products, quotients, powers and compositions are involved. We first prove the quotient rule, which is a bit harder than the product rule and we leave the proof of the product rule as an exercise.
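Theorem 4.7 Quotient Rule. Let the functions $f, g : (a, b) \to \mathbb{R}$ be differentiable at $x \in (a, b)$ and let $g(x) \neq 0$. Then the quotient $\frac{f}{g}$ is differentiable at $x$ with
$$\left(\frac{f}{g}\right)'(x) = \frac{f'(x)g(x) - g'(x)f(x)}{\big(g(x)\big)^2}.$$
Proof. Similar to the proof of Theorem 4.6 we compute the limit of the difference quotients. The computations are a bit more involved than before. Note that $g$ is continuous at $x$ by Theorem 4.4, so $\lim_{z\to x} g(z) = g(x) \neq 0$ and $g(z) \neq 0$ for all $z$ sufficiently close to $x$.
$$\lim_{z\to x} \frac{\frac{f}{g}(z) - \frac{f}{g}(x)}{z - x} = \lim_{z\to x} \frac{f(z)g(x) - f(x)g(z)}{g(z)g(x)(z - x)} = \lim_{z\to x} \frac{1}{g(z)g(x)} \left( \frac{f(z) - f(x)}{z - x}\, g(x) - f(x)\, \frac{g(z) - g(x)}{z - x} \right) = \frac{f'(x)g(x) - g'(x)f(x)}{\big(g(x)\big)^2}.$$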
Figure 10: Visualization of the Product Rule ($\Delta A = \Delta f \cdot g + \Delta g \cdot f + \Delta f \Delta g$). The growth rate of a rectangle with side lengths $f$ and $g$ can be obtained from the picture above by dividing the formula for $\Delta A$ by $\Delta t$ and letting $\Delta t$ go to zero. It is $A' = f'g + g'f$. The term $\frac{\Delta f \Delta g}{\Delta t}$ does not contribute to the rate because in its numerator two quantities are going to zero. The proof of the product rule (Exercise 4-6) makes this idea more precise.
■
Theorem 4.8 Product Rule. Let $f, g : (a, b) \to \mathbb{R}$ be differentiable at $x \in (a, b)$. Then $fg$ is differentiable at $x$ with $(fg)'(x) = f'(x)g(x) + g'(x)f(x)$. (For a visualization, consider Figure 10.)
Proof. Exercise 4-6. ■
Now that products and quotients are taken care of, we can consider powers. The Power Rule could have been proved earlier in a more direct fashion, but the present proof is faster.
Theorem 4.9 Power Rule. For every integer $n \neq 0$, the function $f(x) = x^n$ is differentiable with $\frac{d}{dx} x^n = n x^{n-1}$ at every $x \in \mathbb{R}$ for which the right side is defined.
Proof. For $n > 0$, we use induction on the exponent $n$. For the base step with $n = 1$, note that $\frac{d}{dx} x^1 = \lim_{z\to x} \frac{z - x}{z - x} = 1$ for all $x \in \mathbb{R}$.
For the induction step $n \to (n + 1)$, we need to prove $\frac{d}{dx} x^{n+1} = (n + 1)x^n$ for all $x \in \mathbb{R}$ and we can use the induction hypothesis that $\frac{d}{dx} x^n = n x^{n-1}$. Via the Product Rule we obtain $\frac{d}{dx} x^{n+1} = \frac{d}{dx}(x \cdot x^n) = 1 \cdot x^n + n x^{n-1} \cdot x = (n + 1)x^n$.
This establishes the power rule for positive integer exponents $n$. For any $m \in \mathbb{N}$, we have $-m < 0$ and we can differentiate $x^{-m}$ as follows.
$$\frac{d}{dx} x^{-m} = \frac{d}{dx} \frac{1}{x^m} = \frac{0 \cdot x^m - m x^{m-1} \cdot 1}{x^{2m}} = -m x^{-m-1}.$$
Figure 11: Visualization of the chain rule. The derivative can also be understood as a magnification factor for speeds. If a particle at $x$ moves at unit speed along the horizontal axis, then its image particle under $g$ moves at speed $g'(x)$ along the vertical axis. Now, if a particle at $g(x)$ moves at speed $g'(x)$ along the horizontal axis then its image particle moves at speed $f'(g(x))g'(x)$ along the vertical axis.
The above shows that the power rule holds for all nonzero integers. ■
Now that we have the power rule, the last computation can only be viewed as a clumsy way to compute the derivative of powers with negative exponents. However, in this proof we had no choice, because we were proving the very rule that abbreviates this computation.
We will revisit the Power Rule in Theorems 4.22 and 12.10. With algebraic operations taken care of, we turn our attention to composition. Figure 11 shows a kinematic way to explain the Chain Rule.
Theorem 4.10 Chain Rule. Let $g : (a, b) \to \mathbb{R}$ and $f : (c, d) \to \mathbb{R}$ be functions with $g[(a, b)] \subseteq (c, d)$ and let $x \in (a, b)$ be such that $g$ is differentiable at $x$ and $f$ is differentiable at $g(x)$. Then $f \circ g : (a, b) \to \mathbb{R}$ is differentiable at $x$ and the derivative is $(f \circ g)'(x) = f'(g(x))g'(x)$.
Proof. First, consider the case that there is no sequence $\{z_n\}_{n=1}^{\infty}$ with $\lim_{n\to\infty} z_n = x$ and $z_n \neq x$ and $g(z_n) = g(x)$ for all $n \in \mathbb{N}$. In this case, we proceed as follows.
$$\lim_{z\to x} \frac{f \circ g(z) - f \circ g(x)}{z - x} = \lim_{z\to x} \frac{f(g(z)) - f(g(x))}{g(z) - g(x)} \cdot \frac{g(z) - g(x)}{z - x} = \lim_{u\to g(x)} \frac{f(u) - f(g(x))}{u - g(x)} \cdot \lim_{z\to x} \frac{g(z) - g(x)}{z - x} = f'(g(x))g'(x).$$
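This leaves the case in which there is a sequence $\{z_n\}_{n=1}^{\infty}$ with $\lim_{n\to\infty} z_n = x$ and $z_n \neq x$ and $g(z_n) = g(x)$ for all $n \in \mathbb{N}$. In this case, $g'(x) = \lim_{n\to\infty} \frac{g(z_n) - g(x)}{z_n - x} = 0$.
Because $f$ is differentiable at $g(x)$, there is a $\nu > 0$ so that for all $u \neq g(x)$ with $|u - g(x)| < \nu$ we have $\left| \frac{f(u) - f(g(x))}{u - g(x)} - f'(g(x)) \right| < 1$, and hence (by the reverse triangular inequality) $\left| \frac{f(u) - f(g(x))}{u - g(x)} \right| < |f'(g(x))| + 1$. Moreover, for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \neq x$ with $|z - x| < \delta$ we have that $\left| \frac{g(z) - g(x)}{z - x} \right| < \frac{\varepsilon}{|f'(g(x))| + 1}$ (because $g'(x) = 0$) and $|g(z) - g(x)| < \nu$ (by Theorem 4.4). Formally, we would need to find a $\delta$ so that the first inequality holds and another so that the second one holds and then use the minimum of the two. We used a simple modification of Standard Proof Technique 2.6 to abbreviate this step.
Therefore for all $z \neq x$ with $|z - x| < \delta$ we obtain the following. In case $g(z) = g(x)$, we have $\frac{f(g(z)) - f(g(x))}{z - x} = 0$. In case $g(z) \neq g(x)$, we have $|g(z) - g(x)| < \nu$, and hence
$$\left| \frac{f(g(z)) - f(g(x))}{z - x} \right| = \left| \frac{f(g(z)) - f(g(x))}{g(z) - g(x)} \right| \cdot \left| \frac{g(z) - g(x)}{z - x} \right| < \big(|f'(g(x))| + 1\big) \frac{\varepsilon}{|f'(g(x))| + 1} = \varepsilon.$$
Because $\varepsilon$ was arbitrary we conclude
$$\lim_{z\to x} \frac{f \circ g(z) - f \circ g(x)}{z - x} = 0 = f'(g(x))g'(x),$$
and the proof is complete. ■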
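The Chain Rule can be sanity-checked numerically by comparing a difference quotient of $f \circ g$ with the product $f'(g(x))g'(x)$. The sketch below does this for the polynomials $g(x) = x^2 + 1$ and $f(u) = u^3$, which are our own example; `numeric_derivative` is a hypothetical helper based on a symmetric difference quotient, so it only approximates the derivative.

```python
def numeric_derivative(func, x, step=1e-6):
    """Symmetric difference quotient, an approximation of func'(x)."""
    return (func(x + step) - func(x - step)) / (2 * step)

def g(x):
    return x * x + 1          # g'(x) = 2x

def f(u):
    return u ** 3             # f'(u) = 3u^2

def composite(x):
    return f(g(x))

def chain_rule_value(x):
    """f'(g(x)) * g'(x), using the derivatives from the Power Rule."""
    return 3 * g(x) ** 2 * (2 * x)

x0 = 0.5
numeric_value = numeric_derivative(composite, x0)
print(numeric_value, chain_rule_value(x0))

# At x = 0 we have g'(0) = 0, and both sides are zero.
numeric_at_zero = numeric_derivative(composite, 0.0)
```

At $x_0 = \frac{1}{2}$ the formula gives $3 \cdot \left(\frac{5}{4}\right)^2 \cdot 1 = 4.6875$, and the numerical quotient agrees up to the approximation error of the difference quotient.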
With compositions taken care of, it would be natural to also consider inverse functions. Because we need Rolle's Theorem to dispose of a technicality, consideration of inverse functions is postponed to Theorem 4.21.
Example 4.11 Maxima and minima do not preserve differentiability. Even though $f(x) = x$ and $g(x) = -x$ are both differentiable, the functions $\max\{f, g\}(x) = |x|$ and $\min\{f, g\}(x) = -|x|$ are not differentiable at 0. Thus, while differentiability is compatible with the natural algebraic operations for functions, it is not compatible with the natural order-theoretical operations for functions. □
If a function $f : (a, b) \to \mathbb{R}$ is differentiable on $(a, b)$, then the derivative $f'$ is a function in its own right. Hence, we can consider continuity and differentiability for $f'$.
Definition 4.12 Let $D \subseteq \mathbb{R}$ and let $f : D \to \mathbb{R}$ be a function. Then $f$ is called continuously differentiable iff it is differentiable on $D$ and the derivative $f' : D \to \mathbb{R}$ is continuous. The function $f$ is also often considered to be its own "zeroth derivative" $f^{(0)} := f$. The function $f$ is called $n$ times differentiable iff it is $n - 1$ times differentiable and its $(n-1)$st derivative $f^{(n-1)} : D \to \mathbb{R}$ is differentiable. The $n$th derivative of $f$ is $f^{(n)}(x) := \frac{d}{dx} f^{(n-1)}(x)$, also denoted $\frac{d^n}{dx^n} f$. The function $f$ is called $n$ times continuously differentiable iff it is $n$ times differentiable and its $n$th derivative $f^{(n)} : D \to \mathbb{R}$ is continuous. Finally, $f$ is called infinitely differentiable iff for all $n \in \mathbb{N}$ it is $n$ times differentiable.
The differentiation rules we have derived here immediately show the following.
Example 4.13 Polynomials and rational functions are infinitely differentiable on their domains.
Example 4.14 For every $n \in \mathbb{N}$, the function $f(x) =$ … is $n$ times continuously differentiable, but it is not $(n+1)$-times differentiable. (Exercise 4-7.)
Exercises
4-4. Prove that if $f, g : (a,b) \to \mathbb{R}$ are both differentiable at $x \in (a,b)$, then $f - g$ is differentiable at $x$ and $(f - g)'(x) = f'(x) - g'(x)$.
4-5. Prove that if $f : (a,b) \to \mathbb{R}$ is differentiable at $x \in (a,b)$ and $c \in \mathbb{R}$, then $cf$ is differentiable at $x$ and $(cf)'(x) = c f'(x)$.
4-6. Prove the product rule. Hint. It's similar to the proof of the quotient rule, but simpler.
4-7. Prove the claim in Example 4.14. Hint. Use induction and at $x = 0$ use Exercise 4-3.
4-8. Prove that $f : (a,b) \to \mathbb{R}$ is differentiable at $x \in (a,b)$ iff the limit $\lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$ exists and that in this case $f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$.
4-9. Let $n \in \mathbb{N}$. Use the Binomial Theorem and Exercise 4-8 to prove the Power Rule for $f(x) = x^n$ without using induction.
4-10. Prove that the derivative of $f(x) = \sqrt{x}$ is $f'(x) = \frac{1}{2\sqrt{x}}$.
4-11. Compute the derivative of each of the following functions.
(b) $f(x) = (x^2 + 5)\sqrt{x}$ (You may use Exercise 4-10.)
(c) $f(x) =$
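4-12. Use induction to prove that the $n$th derivative of $f(x) = \frac{1}{x}$ is $f^{(n)}(x) = (-1)^n n!\, x^{-n-1}$ for all $x \ne 0$.
4-13. Prove that if $f, g : (a,b) \to \mathbb{R}$ are both $n$ times differentiable, then the product $fg$ is $n$ times differentiable and $(fg)^{(n)} = \sum_{k=0}^{n} \binom{n}{k} f^{(k)} g^{(n-k)}$. Hint. Mimic the proof of the Binomial Theorem.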
4.3 Rolle's Theorem and the Mean Value Theorem
One of the main applications of derivatives is to use the sign of the derivative to compute relative extrema of a function and intervals where a function is increasing or decreasing. The formal justification follows from Rolle's Theorem and the Mean Value Theorem.
Definition 4.15 Let $f : (a,b) \to \mathbb{R}$ be a function. Then $f$ is said to have a relative (or local) minimum at $x_m$ iff there is a $\delta > 0$ such that $f(x_m) \le f(x)$ for all $x \in (x_m - \delta, x_m + \delta)$. The function $f$ is said to have a relative (or local) maximum at $x_M$ if and only if there is a $\delta > 0$ such that $f(x_M) \ge f(x)$ for all $x \in (x_M - \delta, x_M + \delta)$. If $f$ has a local maximum or a local minimum at the point $c$ we also say that $f$ has a relative (or local) extremum at $c$.
Intuitively, relative extrema are the locally highest or lowest points of the graph. (Note, however, that stagnation also is possible, see Exercise 4-14.) At the location of a relative maximum there cannot be an incline in any direction. Hence, the derivative should be zero at a relative maximum.
Theorem 4.16 Let $f : (a,b) \to \mathbb{R}$ be a function and let $m \in (a,b)$. If $f$ is differentiable at $m$ and $f$ has a relative maximum at $m$, then $f'(m) = 0$.
Proof. Because $f$ has a relative maximum at $m$ there is a positive number $\delta$ so that $f(z) - f(m) \le 0$ for all $z$ with $|z - m| < \delta$. We infer that $\lim_{z \to m^+} \frac{f(z) - f(m)}{z - m} \le 0$ and $\lim_{z \to m^-} \frac{f(z) - f(m)}{z - m} \ge 0$. Because $f$ is differentiable at $m$, these two limits must be equal to $f'(m)$. Therefore $f'(m)$ is greater than or equal to 0 and smaller than or equal to 0. This implies $f'(m) = 0$.
Figure 12: Rolle's Theorem states that if a differentiable function starts and ends at the same height, then it must have a flat tangent in between (a). The Mean Value Theorem states that some tangent must be parallel to the secant through the starting point and the ending point (b).
Rolle's Theorem states that if a function's values are equal at the endpoints of an interval, then the function must have a horizontal tangent line in the interval (see Figure 12). Note that the proof is a collection of direct proofs (modus ponens), which is possible because we can rely on strong results that we proved earlier.
Theorem 4.17 Rolle's Theorem. Let $f : [a,b] \to \mathbb{R}$ be differentiable on the open interval $(a,b)$ and continuous on the closed interval $[a,b]$. If $f(a) = f(b)$, then there is an $m \in (a,b)$ with $f'(m) = 0$.
Proof. Because the result is trivial if $f$ is constant, we can assume $f$ is not constant. By Theorem 3.44, $f$ assumes an absolute maximum and an absolute minimum on $[a,b]$. Because $f$ is not constant, one of these values is not equal to $f(a)$. Assume without loss of generality that the absolute maximum value is greater than $f(a)$. Let $m \in [a,b]$ be so that $f(m) \ge f(x)$ for all $x \in [a,b]$. Because $f(m) > f(a) = f(b)$ we infer $m \in (a,b)$. Hence, $f$ is differentiable at $m$. Because $f(m) \ge f(x)$ for all $x \in (a,b)$, $f$ has a relative maximum at $m$. Now by Theorem 4.16 we conclude $f'(m) = 0$.
The Mean Value Theorem generalizes Rolle's Theorem by no longer demanding that the values at the endpoints are equal. It guarantees that there is a point in the interval so that the tangent line at that point is parallel to the secant line through the endpoints. In the proof, we subtract the line $l(x) = (x - a)\frac{f(b) - f(a)}{b - a}$ from the function $f$. This reduces the proof to an application of Rolle's Theorem.
Theorem 4.18 Mean Value Theorem. Let $f : [a,b] \to \mathbb{R}$ be differentiable on the open interval $(a,b)$ and continuous on the closed interval $[a,b]$. Then there is a number $c \in (a,b)$ so that $\frac{f(b) - f(a)}{b - a} = f'(c)$.
Proof. The function $g(x) := f(x) - (x - a)\frac{f(b) - f(a)}{b - a}$ is continuous on $[a,b]$, differentiable on $(a,b)$ and $g(b) = f(a) = g(a)$. By Rolle's Theorem there is a $c$ in the interval $(a,b)$ such that $g'(c) = 0$. But that means $f'(c) - \frac{f(b) - f(a)}{b - a} = g'(c) = 0$, that is, $f'(c) = \frac{f(b) - f(a)}{b - a}$.
Figure 13: Visualization of Theorem 4.21. Reflection across the diagonal produces the graph of the inverse function and the slope of the reflected line is the multiplicative inverse of the slope of the original line.
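The Mean Value Theorem only asserts that a suitable point $c$ exists. For a concrete function one can locate such a point numerically. The following is a minimal sketch, not part of the text: the helper name `mean_value_point` and the sample function $f(x) = x^3$ on $[0, 2]$ are illustrative choices, and the bisection step additionally assumes that $f'$ minus the secant slope changes sign between the endpoints, which the theorem itself does not require.

```python
def mean_value_point(f, df, a, b, tol=1e-12):
    """Approximate c in (a, b) with df(c) == (f(b) - f(a)) / (b - a).

    Uses bisection on g(x) = df(x) - slope, so it assumes (beyond what the
    theorem needs) that g changes sign between a and b.
    """
    slope = (f(b) - f(a)) / (b - a)      # slope of the secant line

    def g(x):
        return df(x) - slope

    lo, hi = a, b
    if g(lo) * g(hi) > 0:
        raise ValueError("no sign change of f' - slope at the endpoints")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid                     # a zero of g lies in [lo, mid]
        else:
            lo = mid                     # otherwise it lies in [mid, hi]
    return (lo + hi) / 2


def cube(x):
    return x ** 3


def cube_prime(x):
    return 3 * x ** 2


c = mean_value_point(cube, cube_prime, 0.0, 2.0)
print(c, cube_prime(c))   # c is close to 2/sqrt(3); f'(c) is close to the secant slope 4
```

For this example the secant slope is $(8 - 0)/(2 - 0) = 4$, so the point found solves $3c^2 = 4$.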
With the Mean Value Theorem available we can now prove a sufficient criterion for when a function is increasing or decreasing.
Definition 4.19 Let $I \subseteq \mathbb{R}$ be an interval and let $f : I \to \mathbb{R}$ be a function. Then $f$ is called (strictly) increasing on $I$ if and only if for all $x_1 < x_2$ in $I$ we have $f(x_1) < f(x_2)$. The function $f$ is called (strictly) decreasing on $I$ if and only if for all $x_1 < x_2$ in $I$ we have $f(x_1) > f(x_2)$. Moreover, $f$ is called nondecreasing on $I$ if and only if for all $x_1 < x_2$ in $I$ we have $f(x_1) \le f(x_2)$ and $f$ is called nonincreasing on $I$ if and only if for all $x_1 < x_2$ in $I$ we have $f(x_1) \ge f(x_2)$.
Theorem 4.20 Let $f : [a,b] \to \mathbb{R}$ be differentiable on the open interval $(a,b)$ and continuous on the closed interval $[a,b]$. If $f'(x) > 0$ for all $x \in (a,b)$, then $f$ is increasing on $[a,b]$.
Proof. Let $f'(x) > 0$ for all $x$ in $(a,b)$. Suppose, for a contradiction, that there are points $x_1, x_2$ in $[a,b]$ such that $x_1 < x_2$ and $f(x_1) \ge f(x_2)$. Then by the Mean Value Theorem there is a $c$ in the interval $(x_1, x_2)$ (and therefore in $(a,b)$) such that $f'(c) = \frac{f(x_2) - f(x_1)}{x_2 - x_1} \le 0$, which contradicts $f'(c) > 0$.
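… there is a $\delta > 0$ so that the containment $(x - \delta, x + \delta) \subseteq f[(a,b)]$ holds. Hence, we can talk about differentiability of the function $f^{-1}$ at $x$. Let $\{z_n\}_{n=1}^{\infty}$ be a sequence in $f[(a,b)] \setminus \{x\}$ so that $\lim_{n \to \infty} z_n = x$. By Definition 3.1, we are done if we can show that $\lim_{n \to \infty} \frac{f^{-1}(z_n) - f^{-1}(x)}{z_n - x} = \frac{1}{f'(f^{-1}(x))}$. For each $n \in \mathbb{N}$ let $y_n := f^{-1}(z_n)$ and let $y = f^{-1}(x)$. Then $y_n = f^{-1}(z_n) \ne f^{-1}(x) = y$ for all $n \in \mathbb{N}$. By assumption $f$ is continuous, and hence by Theorem 3.38 so is $f^{-1}$. This means that $\lim_{n \to \infty} y_n = \lim_{n \to \infty} f^{-1}(z_n) = f^{-1}\left(\lim_{n \to \infty} z_n\right) = f^{-1}(x) = y$, and therefore
$\lim_{n \to \infty} \frac{f^{-1}(z_n) - f^{-1}(x)}{z_n - x} = \lim_{n \to \infty} \frac{y_n - y}{f(y_n) - f(y)} = \lim_{n \to \infty} \frac{1}{\frac{f(y_n) - f(y)}{y_n - y}} = \frac{1}{f'(y)} = \frac{1}{f'(f^{-1}(x))}$.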
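The limit computed in this proof can be observed numerically. The sketch below is an illustration only: the function $f(y) = y^3 + y$, the bisection-based inverse `f_inverse`, and the sequence $z_n = x + 10^{-n}$ are all assumptions chosen for the example, not taken from the text. It compares difference quotients of $f^{-1}$ at $x = 2$ with the predicted value $1 / f'(f^{-1}(x))$.

```python
def f(y):
    return y ** 3 + y          # strictly increasing and continuous on R


def f_prime(y):
    return 3 * y ** 2 + 1      # never zero


def f_inverse(x, lo=-10.0, hi=10.0, tol=1e-13):
    """Approximate f^{-1}(x) by bisection; valid while f(lo) <= x <= f(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


x = 2.0                        # f(1) = 2, so f^{-1}(2) = 1
y = f_inverse(x)
predicted = 1 / f_prime(y)     # the value 1 / f'(f^{-1}(x))

# Difference quotients of f^{-1} along a sequence z_n -> x with z_n != x.
quotients = []
for n in range(1, 5):
    z_n = x + 10.0 ** (-n)
    quotients.append((f_inverse(z_n) - y) / (z_n - x))

print(predicted, quotients)
```

Since $f'(1) = 4$, the predicted derivative of the inverse at $x = 2$ is $1/4$, and the printed quotients should approach that value as $n$ grows.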
Theorem 4.21 allows us to extend the Power Rule to rational exponents.
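Theorem 4.22 Power Rule. Let $r \in \mathbb{Q} \setminus \{0\}$. Then $f(x) = x^r$ is differentiable with $\frac{d}{dx} x^r = r x^{r-1}$ at every $x$ for which the right side is defined.
Proof. Let $m \in \mathbb{N}$ and consider $f(x) = x^{\frac{1}{m}}$. Then $f$ is the inverse function of $g(x) = x^m$. By Theorem 4.21 for all nonzero $x \in \mathbb{R}$ for which $x^{\frac{1}{m}}$ is defined, we obtain $\frac{d}{dx} x^{\frac{1}{m}} = \frac{1}{g'\left(x^{\frac{1}{m}}\right)} = \frac{1}{m \left(x^{\frac{1}{m}}\right)^{m-1}} = \frac{1}{m} x^{\frac{1}{m} - 1}$.
Now let $n \in \mathbb{Z}$ and $m \in \mathbb{N}$. By the Chain Rule, for all nonzero $x$ for which $x^{\frac{n}{m}}$ is defined we obtain $\frac{d}{dx} x^{\frac{n}{m}} = \frac{d}{dx} \left(x^{\frac{1}{m}}\right)^n = n \left(x^{\frac{1}{m}}\right)^{n-1} \frac{1}{m} x^{\frac{1}{m} - 1} = \frac{n}{m} x^{\frac{n}{m} - 1}$.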
Exercises
4-14. Geometry versus utility in the definition of relative extrema.
(a) Explain why by Definition 4.15 every point in the interval $(0, 1)$ is a relative maximum and a relative minimum of the function $f$ given by $f(x) =$ … for $x \le 0$, $f(x) =$ … for $x \in [0, 1]$, and $f(x) = (x - 1)^3$ for $x > 1$.
(b) Sketch the graph of the function and comment whether intuition agrees that $f$ has a relative maximum and a relative minimum at each $x \in (0, 1)$.
(c) Use the proof of Rolle's Theorem to explain why the definition of relative maxima as in Definition 4.15 is preferable to a definition of relative maxima that requires the value at a relative maximum to be strictly larger than the values of the function near the relative maximum.
4-15. Prove that if $f : (a,b) \to \mathbb{R}$ is differentiable and has a relative minimum at $m \in (a,b)$, then $f'(m) = 0$.
4-16. Let $f : [a,b] \to \mathbb{R}$ be differentiable on the open interval $(a,b)$ and continuous on the closed interval $[a,b]$. Prove that if $f'(x) < 0$ for all $x \in (a,b)$, then $f$ is decreasing on $[a,b]$.
4-17. Let $f : [a,b] \to \mathbb{R}$ be differentiable on the open interval $(a,b)$ and continuous on the closed interval $[a,b]$. Prove that if $f'(x) = 0$ for all $x \in (a,b)$, then $f$ is constant on $[a,b]$.
4-18. Give a direct proof of Theorem 4.20. That is, give a proof that does not argue via contradiction.
4-19. Let $f(x) := ax^2 + bx + c$ be a quadratic function defined on $\mathbb{R}$. Prove that if $a > 0$ then $f$ is decreasing on $\left(-\infty, -\frac{b}{2a}\right]$ and increasing on $\left[-\frac{b}{2a}, \infty\right)$. State and prove a similar result for $a < 0$.
4-20. Let $f(x) := ax^3 + bx^2 + cx + d$ be a cubic function defined on $\mathbb{R}$ with $a \ne 0$. Prove that if $4b^2 - 12ac > 0$, then $f$ has two relative extrema. For $a > 0$ and for $a < 0$ separately describe where $f$ is increasing and where it is decreasing.
4-21. Let $f : [a,b] \to \mathbb{R}$ be differentiable on $(a,b)$ and continuous on $[a,b]$.
(a) Prove that $f$ is nondecreasing iff for all $x \in (a,b)$ we have $f'(x) \ge 0$. (Note that this is an equivalence.)
(b) Prove that $f$ is nonincreasing iff for all $x \in (a,b)$ we have $f'(x) \le 0$. (Note that this is an equivalence.)
(c) Give an example that shows that the condition in Theorem 4.20 is not equivalent to $f$ being strictly increasing.
4-22. Use Theorem 4.21 to prove that the derivative of $f(x) = \sqrt{x}$ is $f'(x) = \frac{1}{2\sqrt{x}}$.
4-23. Explain why the following is not a proof for Theorem 4.21.
“Proof.” We know that $x = f(f^{-1}(x))$. Differentiating both sides with respect to $x$ gives $1 = \frac{d}{dx} x = \frac{d}{dx} f(f^{-1}(x)) = f'(f^{-1}(x)) (f^{-1})'(x)$, so $(f^{-1})'(x) = \frac{1}{f'(f^{-1}(x))}$.
4-24. Let $f : (a,b) \to \mathbb{R}$ be continuous and let $x \in (a,b)$. Prove that if $f'(z)$ exists for all $z \in (a,b) \setminus \{x\}$ and $\lim_{z \to x} f'(z)$ exists, then $f$ is differentiable at $x$ with $f'(x) = \lim_{z \to x} f'(z)$.
4-25. Let $f : (a,b) \to \mathbb{R}$ be differentiable. Prove that $f'$ (which need not be continuous) has the intermediate value property. That is, prove that for all $c < d$ in $(a,b)$ and all $u$ between $f'(c)$ and $f'(d)$ there is an $m \in (c,d)$ so that $f'(m) = u$.
Hint. Show that $g(x) := \frac{f(x) - f(c)}{x - c}$ for $x \ne c$, $g(c) := f'(c)$, and $h(x) := \frac{f(d) - f(x)}{d - x}$ for $x \ne d$, $h(d) := f'(d)$, are continuous on $[c,d]$, that $g(d) = h(c)$, apply the Intermediate Value Theorem to one of them, then apply the Mean Value Theorem to $f$.
4-26. Prove that for $n \ge 2$ the $n$th derivative of $f(x) = \sqrt{x}$ is $f^{(n)}(x) = (-1)^{n+1} \frac{(2n-2)!}{2^{n-1}(n-1)!\,2^n} x^{-\frac{2n-1}{2}} = (-1)^{n+1} \frac{1 \cdot 3 \cdot 5 \cdots (2n-3)}{2^n} x^{-\frac{2n-1}{2}}$.
4-27. For each function, find an expression for $f^{(n)}(x)$ and prove that it is the $n$th derivative of $f$.
(a) $f(x) = \frac{1}{\sqrt{x}}$
(b) f ( x ) =
%
Chapter 5
The Riemann Integral I
The geometric goal of integration is to compute the area under a graph. In Riemann integration this is done by approximating the area with rectangles. This chapter presents the idea behind Riemann integration and some integration criteria, examples and theorems. Although intuitively the Riemann integral seems to be the right idea, by the end of the chapter we will need more machinery to fully characterize Riemann integrability and we will also have exposed some key weaknesses. An equivalent criterion for Riemann integrability will be presented in Theorem 8.12. The observed weaknesses of the Riemann integral will be addressed in Chapter 9.
5.1 Riemann Sums and the Integral
To define the area of rectangles under the graph of a function, we first need to determine the base for each rectangle. This is done with a partition of the interval.
Definition 5.1 Let $[a,b]$ be a closed interval. Then any finite set $P \subseteq [a,b]$ such that $a, b \in P$ will be called a partition of $[a,b]$. Because the order of the points will be important, we also write $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ when working with a partition $P$.
With the partition giving the bases of the rectangles, we still need to determine the heights. Each height will be a value that the function assumes within the respective interval of the partition (see Figure 14(a)).
Definition 5.2 Let $[a,b]$ be a closed interval and let $f : [a,b] \to \mathbb{R}$ be bounded. For any partition $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ a set $T = \{t_1, \ldots, t_n\}$ such that for all $i \in \{1, \ldots, n\}$ we have that $t_i \in [x_{i-1}, x_i]$ will be called an evaluation set. We define the Riemann sum of $f$ with respect to the partition $P$ and evaluation set $T$ to be $R(f, P, T) := \sum_{i=1}^{n} f(t_i)(x_i - x_{i-1})$. We will also use the notation $\Delta x_i := x_i - x_{i-1}$.
Figure 14: The Riemann integral approximates the area under the graph of a function with the areas of rectangles. Definition 5.2 demands that the height of each rectangle is the value of the function at some point in the base interval (one point marked in (a)). Of particular importance are the lower sum (see Definition 5.13) of the areas of the largest rectangles that can be fit under the graph of the function (b) and the upper sum (see Definition 5.13) of the areas of the smallest rectangles that contain the graph of the function (c).
Clearly, a Riemann sum can only accidentally be equal to the area under the graph. However, the narrower we make the rectangles, the closer the Riemann sums should be to the actual area. The norm of a partition gives a uniform measure of how narrow the rectangles in a partition are.
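The effect of narrowing the rectangles can be tried out directly. The following sketch is only an illustration under stated assumptions: the names `riemann_sum`, `uniform_partition` and `midpoints` are made up for this example, the partitions are taken to be uniform, the evaluation points are midpoints, and the sample function is $f(x) = x^2$ on $[0, 1]$, whose area under the graph is known from calculus to be $1/3$.

```python
def riemann_sum(f, partition, tags):
    """Return R(f, P, T) for P = [x_0, ..., x_n] and T = [t_1, ..., t_n]."""
    if len(tags) != len(partition) - 1:
        raise ValueError("need exactly one evaluation point per subinterval")
    total = 0.0
    for i in range(1, len(partition)):
        left, right = partition[i - 1], partition[i]
        t = tags[i - 1]
        if not left <= t <= right:
            raise ValueError("evaluation point outside its subinterval")
        total += f(t) * (right - left)   # height f(t_i) times base x_i - x_{i-1}
    return total


def uniform_partition(a, b, n):
    """Partition of [a, b] into n subintervals of equal length."""
    return [a + (b - a) * i / n for i in range(n + 1)]


def midpoints(partition):
    """One admissible evaluation set: the midpoint of every subinterval."""
    return [(partition[i - 1] + partition[i]) / 2 for i in range(1, len(partition))]


def square(x):
    return x * x


for n in (1, 10, 100, 1000):
    P = uniform_partition(0.0, 1.0, n)
    print(n, riemann_sum(square, P, midpoints(P)))
```

Any other choice of evaluation points inside the subintervals is equally admissible; midpoints are used here only because they are easy to compute.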
Definition 5.3 Let $[a,b]$ be a closed interval in the real numbers. For a partition $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$, we define the norm of the partition $P$ to be $\|P\| := \max\{(x_i - x_{i-1}) : i = 1, \ldots, n\}$.
We now say that a function is Riemann integrable iff all Riemann sums get close to one value, the integral, as the norm of the partitions is made small.
Definition 5.4 Let $D$ be a set. A function $f : D \to \mathbb{R}$ is called bounded iff there is an $M \in \mathbb{R}$ so that $|f(x)| \le M$ for all $x \in D$.
Definition 5.5 The function $f : [a,b] \to \mathbb{R}$ is called Riemann integrable (on $[a,b]$) iff $f$ is bounded and there is a number $I$ such that for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all partitions $P$ with $\|P\| < \delta$ and all evaluation sets $T$ the inequality $\left|\sum_{i=1}^{n} f(t_i)\Delta x_i - I\right| = |R(f, P, T) - I| < \varepsilon$ holds.
… $\varepsilon > 0$. There is a number $\delta > 0$ so that for all partitions $P$ of $[a,b]$ with $\|P\| < \delta$ and any associated evaluation set $T$ we have $\left|R(f, P, T) - \int_a^b f(x)\,dx\right| < \varepsilon$. Because $\lim_{k \to \infty} \|P_k\| = 0$ there is a $K \in \mathbb{N}$ so that for all $k \ge K$ we have $\|P_k\| < \delta$. Hence, for all $k \ge K$ we infer that $\left|R(f, P_k, T_k) - \int_a^b f(x)\,dx\right| < \varepsilon$.
The idea in Lemma 5.6 is very useful for numerical integration (see Exercise 13-25) and for the proof of the Fundamental Theorem of Calculus (see Theorem 5.23). To demonstrate how Lemma 5.6 is applied, consider the following example.
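Example 5.7 If $f(x) = x$ is Riemann integrable on $[0, 1]$, then $\int_0^1 x\,dx = \frac{1}{2}$.
If $f$ is Riemann integrable, then by Lemma 5.6 for any sequence $\{P_n\}_{n=1}^{\infty}$ of partitions with $\lim_{n \to \infty} \|P_n\| = 0$ and evaluation sets $T_n$ we have $\lim_{n \to \infty} R(f, P_n, T_n) = \int_0^1 f(x)\,dx$. We choose $P_n := \left\{\frac{j}{n} : j = 0, \ldots, n\right\}$ and $T_n := \left\{\frac{j}{n} : j = 1, \ldots, n\right\}$ and obtain
$\int_0^1 x\,dx = \lim_{n \to \infty} \sum_{j=1}^{n} \frac{j}{n} \cdot \frac{1}{n} = \lim_{n \to \infty} \frac{1}{n^2} \sum_{j=1}^{n} j = \lim_{n \to \infty} \frac{1}{n^2} \cdot \frac{n}{2}(n + 1) = \lim_{n \to \infty} \frac{n^2 + n}{2n^2} = \frac{1}{2}$.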
Example 5.7 exhibits a key problem that we will ultimately resolve in Theorem 8.12. Although we may be able to compute a value that should be the Riemann integral, it may not be clear if the function actually is Riemann integrable.
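The computation in Example 5.7 can be replayed with exact rational arithmetic. The sketch below assumes the partitions $P_n = \{j/n : j = 0, \ldots, n\}$ with right endpoints as evaluation points; the function name `right_endpoint_sum` is made up for this illustration. It checks the closed form $\frac{n^2 + n}{2n^2}$ for a few values of $n$, which of course does not settle whether $f$ is Riemann integrable.

```python
from fractions import Fraction


def right_endpoint_sum(n):
    """R(f, P_n, T_n) for f(x) = x, P_n = {j/n : j = 0..n}, T_n = {j/n : j = 1..n}."""
    width = Fraction(1, n)                       # every subinterval has length 1/n
    return sum(Fraction(j, n) * width for j in range(1, n + 1))


for n in (1, 10, 100, 1000):
    s = right_endpoint_sum(n)
    closed_form = Fraction(n * n + n, 2 * n * n)
    print(n, s, float(s), s == closed_form)
```

Using `Fraction` avoids rounding, so the comparison with the closed form is an exact equality rather than an approximate one.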
The following results show that it is reasonably simple to prove Riemann integrability if the value of the integral is known. Note that the strong similarity between Theorem 5.8 and the appropriate parts of Theorems 2.14 and 3.10 is not accidental. The definition of the Riemann integral is similar to the definitions of the limits of sequences and functions. Hence, similar “limit laws” hold and they are proved with similar methods. Unfortunately the similarity does not apply to integrals of products and quotients. (Products are addressed in Exercises 5-21 and 5-22 and quotients are typically treated as products in integration.)
Theorem 5.8 Let $f, g : [a,b] \to \mathbb{R}$ be Riemann integrable and let $c \in \mathbb{R}$. Then $f + g$ and $cf$ are Riemann integrable and the following equations hold.
1. $\int_a^b (f + g)(x)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$.
2. $\int_a^b cf(x)\,dx = c \int_a^b f(x)\,dx$.
Proof. We will only prove that $f + g$ is Riemann integrable and that equation 1 holds, leaving the rest to Exercise 5-3. Let $\varepsilon > 0$. Then there is a $\delta > 0$ so that for all partitions $P$ of $[a,b]$ with $\|P\| < \delta$ and all associated evaluation sets $T$ we have $\left|\sum_{i=1}^{n} f(t_i)\Delta x_i - \int_a^b f(x)\,dx\right|$ …
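… $\frac{F\left(x_i^{(k)}\right) - F\left(x_{i-1}^{(k)}\right)}{x_i^{(k)} - x_{i-1}^{(k)}} = F'\left(t_i^{(k)}\right) = f\left(t_i^{(k)}\right)$ holds. Therefore we obtain $f\left(t_i^{(k)}\right) \Delta x_i^{(k)} = F\left(x_i^{(k)}\right) - F\left(x_{i-1}^{(k)}\right)$. Let $T_k := \left\{t_i^{(k)} : i = 1, \ldots, n_k\right\}$. Then $T_k$ is an evaluation set for $P_k$ and by Lemma 5.6 we conclude via a telescoping sum
$\int_a^b f(x)\,dx = \lim_{k \to \infty} \sum_{i=1}^{n_k} f\left(t_i^{(k)}\right) \Delta x_i^{(k)} = \lim_{k \to \infty} \sum_{i=1}^{n_k} \left(F\left(x_i^{(k)}\right) - F\left(x_{i-1}^{(k)}\right)\right) = F(b) - F(a)$,
which proves the result.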
Definition 5.24 Because of the Fundamental Theorem of Calculus, if $F$ and $f$ are functions with $F' = f$, then $F$ will also be called an indefinite integral of $f$, denoted $F = \int f(x)\,dx$. One way to compute Riemann integrals is to evaluate an indefinite
integral at the upper and lower bound and to compute the difference. The hypothesis that the derivative of F is Riemann integrable feels quite artificial. Nonetheless, this hypothesis is best possible. Exercise 12-24 will exhibit a differentiable function whose derivative is bounded, but not Riemann integrable. Although this example is a bit pathological, it points out a weakness of the Riemann integral that motivates the development of the Lebesgue integral. The Antiderivative Form of the Fundamental Theorem of Calculus for the Lebesgue integral, which has no artificial looking hypotheses, will be proved in Exercise 23-8. The Derivative Form of the Fundamental Theorem of Calculus is proved in Theorem 8.17 for the Riemann integral and in Exercise 18-6 for the Lebesgue integral.
Exercises
5-20. Power Rule for integration. Let $r \in \mathbb{Q} \setminus \{-1\}$ and let $a < b$. In case $r < 0$, let $a$ and $b$ either both be positive or both be negative. Prove that $\int_a^b x^r\,dx = \frac{1}{r+1} b^{r+1} - \frac{1}{r+1} a^{r+1}$. Then explain why we needed to require $a$ and $b$ to be both positive or both negative for $r < 0$.
5-21. Integration by Parts. Let $[a,b] \subset (c,d)$ and let $F, g : (c,d) \to \mathbb{R}$ be continuously differentiable with derivatives $f$ and $g'$. Prove that $\int_a^b f(x)g(x)\,dx = F(b)g(b) - F(a)g(a) - \int_a^b F(x)g'(x)\,dx$.
5-22. Integration by Substitution. Let $[a,b] \subset (c,d)$, let $g : (c,d) \to \mathbb{R}$ be continuously differentiable with derivative $g'$ and let $F$ be continuously differentiable with derivative $f$ such that the domain of $F$ contains $g[[a,b]]$. Prove that $\int_a^b f(g(x)) g'(x)\,dx = F(g(b)) - F(g(a))$.
5-23. What would we need to prove so that the hypothesis that the derivatives are continuous in Exercises 5-21 and 5-22 can be replaced with the hypothesis that the derivatives are Riemann integrable?
5.4 The Darboux Integral
Lemma 5.6 is an efficient tool to establish properties of the Riemann integral, provided all functions involved are Riemann integrable. It is now time to look for a similarly efficient criterion to prove that a function is Riemann integrable. Riemann's Condition below is inspired by the idea of trapping the Riemann sums between lower and upper sums, which was already used in the proof that continuous functions are Riemann integrable. Riemann's Condition is simpler to verify than Definition 5.5 of Riemann integrability, because, for $\varepsilon > 0$, instead of working with all partitions of sufficiently small norm and all evaluation sets, we only need to find one partition so that the upper and lower sums are closer together than $\varepsilon$. The price is paid in the proof, where for the “$\Leftarrow$” direction, our only tool is one partition and we must prove something for all partitions of sufficiently small norm.
Theorem 5.25 Riemann's Condition. Let $f : [a,b] \to \mathbb{R}$ be a bounded function. Then $f$ is Riemann integrable on $[a,b]$ iff for all $\varepsilon > 0$ there is a partition $P$ of $[a,b]$ such that $U(f, P) - L(f, P) < \varepsilon$.
Proof. For “$\Leftarrow$,” let $f : [a,b] \to \mathbb{R}$ be bounded and such that for all $\varepsilon > 0$ there is a partition $P$ of $[a,b]$ such that $U(f, P) - L(f, P) < \varepsilon$. By Lemma 5.17 the set $B := \{L(f, P) : P \text{ is a partition of } [a,b]\}$ is bounded above. Let $C := \sup B$ and let $\varepsilon > 0$. Let $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ be a partition of $[a,b]$ such that $U(f, P) - L(f, P) < \frac{\varepsilon}{2}$. Then for all refinements $Q$ of $P$ and all evaluation sets $T$ for $Q$, Lemmas 5.14 and 5.16 imply
$|R(f, Q, T) - C| \le U(f, Q) - L(f, Q) \le U(f, P) - L(f, P) < \frac{\varepsilon}{2}$.
To show that $f$ is Riemann integrable, let $M := \sup\{|f(x)| : x \in [a,b]\}$ and let $\delta := \min\left\{\frac{\varepsilon}{4n(M+1)}, \frac{\Delta x_1}{3}, \ldots, \frac{\Delta x_n}{3}\right\}$. We will now show that for all partitions $S$ with $\|S\| < \delta$ and all associated evaluation sets $T_S$ we have $|R(f, S, T_S) - C| < \varepsilon$. Let $S = \left\{a = x_0^S < x_1^S < \cdots < x_{\tilde{n}}^S = b\right\}$ be any partition of $[a,b]$ with $\|S\| < \delta$. Then $Q := S \cup P$ is a refinement of $P$. Therefore, for any evaluation set $T$ for $Q$ we have $|R(f, Q, T) - C| < \frac{\varepsilon}{2}$. Moreover, by choice of $\delta$, every interval $\left[x_{i-1}^S, x_i^S\right]$ contains at most one point of $P$ that is not in $S$ and any two intervals $\left[x_{i-1}^S, x_i^S\right]$ that contain such a point do not intersect. Let $T_S$ be any evaluation set for $S$ and let $T$ be an evaluation set for $Q$ that contains $T_S$. Then $T \setminus T_S = \{t_{i_1}, \ldots, t_{i_k}\}$ with $k < n$, because the addition of at most $n - 1$ points to $S$ that are all in distinct intervals creates at most $n - 1$ intervals that do not already have an evaluation point assigned. Therefore $|R(f, Q, T) - R(f, S, T_S)| < \frac{\varepsilon}{2}$, and hence
$|R(f, S, T_S) - C| \le |R(f, S, T_S) - R(f, Q, T)| + |R(f, Q, T) - C| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$.
… $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge m \ge N$ we have $\left|\sum_{j=m}^{n} a_j\right| < \varepsilon$.
Proof. The series $\sum_{j=1}^{\infty} a_j$ converges iff the sequence $\{s_n\}_{n=1}^{\infty}$ of its partial sums converges, which by Theorem 2.27 is the case iff it is a Cauchy sequence. This is the case iff for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge m \ge N$ the inequality $|s_n - s_{m-1}| < \varepsilon$ holds. Since $\left|\sum_{j=m}^{n} a_j\right| = |s_n - s_{m-1}|$, we have proved the result.
The Alternating Series Test shows that there are indeed many convergent series.
Theorem 6.11 Alternating Series Test. Let $\{b_j\}_{j=1}^{\infty}$ be a nonincreasing nonnegative sequence such that $\lim_{j \to \infty} b_j = 0$. Then $\sum_{j=1}^{\infty} (-1)^{j+1} b_j$ converges.
Proof. Let $\varepsilon > 0$ be arbitrary. There is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $b_n < \frac{\varepsilon}{2}$. Then for all $m > n \ge N$ we obtain, using $b_{n+2j+1} \ge b_{n+2(j+1)}$ and a telescoping sum, $\left|\sum_{j=n}^{m} (-1)^{j+1} b_j\right| < \varepsilon$, and by the Cauchy Criterion we conclude that the series converges.
6.2 Absolute Convergence and Unconditional Convergence
In many situations we will work with series of nonnegative numbers. If negative summands occur, it is natural to take absolute values and hope the sum still converges. This is not always the case, but the idea of absolute convergence is fundamental.
Definition 6.12 A series $\sum_{j=1}^{\infty} a_j$ converges absolutely iff the series $\sum_{j=1}^{\infty} |a_j|$ converges.
Absolute convergence is a strictly stronger condition than convergence. That is, absolutely convergent series converge, but the converse is not true.
Proposition 6.13 If the series $\sum_{j=1}^{\infty} a_j$ converges absolutely, then it converges. Moreover, the triangular inequality $\left|\sum_{j=1}^{\infty} a_j\right| \le \sum_{j=1}^{\infty} |a_j|$ holds.
Proof. Let $\varepsilon > 0$. Because $\sum_{j=1}^{\infty} a_j$ converges absolutely, $\sum_{j=1}^{\infty} |a_j|$ converges. Thus there is an $N \in \mathbb{N}$ so that for all $n \ge m \ge N$ we have that $\sum_{j=m}^{n} |a_j| < \varepsilon$. But then for all $n \ge m \ge N$ we obtain $\left|\sum_{j=m}^{n} a_j\right| \le \sum_{j=m}^{n} |a_j| < \varepsilon$ …
Proof. We have $\{r \in \mathbb{Q} : r > 0\}$ … $\cup\, \{r \in \mathbb{Q} : r < 0\} \cup \{0\}$ is countable by Theorem 7.16.
Exercises
7-10. Prove that the function $f : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ defined by $f(m, n) := \frac{1}{2}(m + n - 1)(m + n - 2) + n$ is bijective. (For a visualization, consider the middle of Figure 18.)
7-11. Prove that the set of integers $\mathbb{Z}$ is countable.
7-12. Prove that the set of integers $\mathbb{Z}$ is countable by constructing a bijective function $f : \mathbb{N} \to \mathbb{Z}$. Hint. Figure 18(a).
7-13. Prove that the set of dyadic rational numbers is countable.
7-14. Use Theorem 7.16 to prove Lemma 7.14.
7-15. Give a direct proof that the union of two countable sets is countable.
7-16. Prove that if $C_1, \ldots, C_n$ are countable, then $C_1 \times C_2 \times \cdots \times C_n$ is countable.
Figure 18: Some standard visualizations for countability arguments. Part (a) shows the construction for a direct proof that the union of two countable sets is countable. Part (b) shows an explicit bijective function between $\mathbb{N}$ and the product of two countable sets. Part (c) shows the idea behind the proof that $(0, 1)$ is not countable.
7-17. Let $\{a_{(i,j)}\}_{i,j=1}^{\infty}$ be a family of nonnegative numbers. Then the double series $\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{(i,j)}$ converges iff for all bijections $\sigma : \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ the sum $\sum_{i=1}^{\infty} a_{\sigma(i)}$ converges. Furthermore, in this case the values are equal. Hint. This is similar to the proof of Proposition 6.22.
7.3 Uncountable Sets
Not all sets are countable. In fact, (see Exercise 7-18) there is an infinite hierarchy of sizes for infinite sets, because any time we form a power set, we obtain a set that is strictly larger than the set we started with.
Theorem 7.18 If $X$ is a set, then $X$ is not equivalent to its power set $\mathcal{P}(X)$.
Proof. Suppose for a contradiction that $f : X \to \mathcal{P}(X)$ is a bijection. Define $B := \{x \in X : x \notin f(x)\}$. Because $f$ is surjective, there is a $b \in X$ with $B = f(b)$. Now $b \in B$ would imply $b \in B = f(b)$ and by definition of $B$ this would mean $b \notin f(b) = B$. Thus we infer $b \notin B$. But then $b \notin B = f(b)$, which by definition of $B$ forces $b \in B$, a contradiction.
For analysis, we typically do not need the full hierarchy. Instead we only need to distinguish sets that are countable from those that are not.
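The diagonal set $B$ from the proof of Theorem 7.18 can be examined exhaustively on a small finite set. The following sketch is only an illustration: the helper names are made up, maps $f : X \to \mathcal{P}(X)$ are represented as dictionaries, and a brute-force check over a three-element set is of course no substitute for the proof.

```python
from itertools import combinations, product


def power_set(X):
    """All subsets of the finite collection X, as frozensets."""
    items = list(X)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def diagonal_set(X, f):
    """B = {x in X : x not in f(x)} for a map f (a dict) from X to subsets of X."""
    return frozenset(x for x in X if x not in f[x])


def no_map_hits_its_diagonal_set(X):
    """Check every map f : X -> P(X) and confirm that B is never a value of f."""
    items = list(X)
    subsets = power_set(items)
    for images in product(subsets, repeat=len(items)):
        f = dict(zip(items, images))
        if diagonal_set(items, f) in f.values():
            return False
    return True


print(no_map_hits_its_diagonal_set({0, 1, 2}))
```

For a three-element set there are $8^3 = 512$ maps to check, so the search finishes immediately; the number of maps grows very quickly with the size of $X$.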
Definition 7.19 A set $U$ is called uncountable iff it is not countable.
The real numbers, which are fundamental for analysis, are not countable.
Theorem 7.20 The interval $(0, 1)$ is uncountable.
Figure 19: Cantor sets are the intersection of a sequence of unions of closed intervals where at each step only the left and right segments of each interval are kept.
Proof. Suppose for a contradiction that $(0, 1)$ was countable. Then there is a sequence $\{x_n\}_{n=1}^{\infty}$ such that for every $x \in (0, 1)$ there is an $n \in \mathbb{N}$ so that $x = x_n$. For each $n \in \mathbb{N}$, let $\left\{x_n^{(k)}\right\}_{k=1}^{\infty}$ be the decimal expansion of $x_n$ as in Proposition 6.6. For each $n \in \mathbb{N}$, let $y_n$ be a number in the set $\{1, 2, 3, 4, 5, 6, 7, 8\} \setminus \left\{x_n^{(n)}\right\}$. Then $\{y_n\}_{n=1}^{\infty}$ is a decimal expansion of a number $y \in (0, 1)$. However, for all $n \in \mathbb{N}$ we have that $y_n \ne x_n^{(n)}$, and hence $y \ne x_n$, contradiction.
The remainder of the proof that $\mathbb{R}$ is uncountable is left to Exercise 7-19b. The uncountability of the real numbers shows that, even though in real life we work mostly with rational numbers, there are many more irrational numbers than there are rational numbers.
Theorem 7.21 The set $\mathbb{R} \setminus \mathbb{Q}$ of irrational numbers is uncountable.
Proof. By Exercise 7-19b, the real numbers are uncountable. Now for a contradiction suppose that $\mathbb{R} \setminus \mathbb{Q}$ was countable. Then $\mathbb{R} = (\mathbb{R} \setminus \mathbb{Q}) \cup \mathbb{Q}$ would be countable by Theorem 7.16, contradiction.
We conclude this section by defining Cantor sets. These sets are very useful to construct counterexamples which show that certain hypotheses in analytical theorems cannot be dispensed with. Because these counterexamples can be considered a bit pathological, we defer their construction to the exercises and we will only refer to Cantor sets when necessary. Cantor sets are constructed from a sequence of unions of closed intervals so that in each step we remove the middle of each interval. Figure 19 shows the first six stages in the construction of the ternary Cantor set, which is constructed by successively removing the middle third of the intervals at each stage.
Definition 7.22 Let $[a,b]$ be an interval and let $0 < q < \frac{1}{2}$. Then we define the left part of $[a,b]$ to be the interval $L_q[a,b] := [a, a + q(b - a)]$ and the right part to be the interval $R_q[a,b] := [b - q(b - a), b]$.
For a sequence $Q := \{q_n\}_{n=1}^{\infty}$ with $0 < q_n < \frac{1}{2}$ for all $n \in \mathbb{N}$ define $C_n^Q$ recursively as follows. Let $C_0^Q := I_{1,0}^Q := [0, 1]$ and once the set $C_n^Q$ is defined as a union of pairwise disjoint closed intervals $\bigcup_{i=1}^{2^n} I_{i,n}^Q$, for $i = 1, \ldots, 2^n$ let $I_{2i-1,n+1}^Q := L_{q_{n+1}}\left[I_{i,n}^Q\right]$, let $I_{2i,n+1}^Q := R_{q_{n+1}}\left[I_{i,n}^Q\right]$ and let $C_{n+1}^Q := \bigcup_{i=1}^{2^{n+1}} I_{i,n+1}^Q$. Then $C^Q := \bigcap_{n=1}^{\infty} C_n^Q$ is called the Cantor set associated with the sequence $Q$.
Even though the construction looks like it should only leave the boundary points of the intervals, Cantor sets are in fact uncountable. The details can be explored in Exercise 7-25.
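The recursive construction of the sets $C_n^Q$ is easy to carry out for the first few stages. The sketch below is an illustration under stated assumptions: intervals are represented as pairs of exact fractions, the function names are made up for this example, and the ternary case $q_n = \frac{1}{3}$ is used as the sample sequence.

```python
from fractions import Fraction


def left_part(interval, q):
    """L_q[a, b] = [a, a + q(b - a)]."""
    a, b = interval
    return (a, a + q * (b - a))


def right_part(interval, q):
    """R_q[a, b] = [b - q(b - a), b]."""
    a, b = interval
    return (b - q * (b - a), b)


def cantor_stage(qs):
    """Intervals of the n-th stage, listed left to right, for qs = [q_1, ..., q_n]."""
    intervals = [(Fraction(0), Fraction(1))]     # stage 0 is the interval [0, 1]
    for q in qs:
        if not 0 < q < Fraction(1, 2):
            raise ValueError("each q_n must lie strictly between 0 and 1/2")
        next_intervals = []
        for interval in intervals:
            next_intervals.append(left_part(interval, q))    # odd index 2i - 1
            next_intervals.append(right_part(interval, q))   # even index 2i
        intervals = next_intervals
    return intervals


third = Fraction(1, 3)
for n in range(4):
    stage = cantor_stage([third] * n)
    print(n, len(stage), sum(b - a for a, b in stage))
```

In the ternary case each stage keeps two thirds of the previous total length, so the printed lengths shrink geometrically even though the number of intervals doubles.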
Exercises
7-18. Construct a sequence $\{P_n\}_{n=1}^{\infty}$ of infinite sets so that no two sets $P_n$ and $P_{n+1}$ are equivalent, but for each $n \in \mathbb{N}$ there is an injective function $f_n : P_n \to P_{n+1}$.
7-19. Containment and uncountable sets.
(a) Let $U, V$ be sets. Prove that if $U$ is uncountable and $U \subseteq V$, then $V$ is uncountable.
(b) Prove that $\mathbb{R}$ is not a countable set.
7-20. Prove that every uncountable set contains a countably infinite subset.
7-21. Prove that the set of all functions from $\mathbb{N}$ to $\{0, 1\}$ is uncountable.
7-22. Let $\{a_i\}_{i \in I}$ be an uncountable family of positive numbers. Prove that there are an $\varepsilon > 0$ and a countable subfamily $\{a_{i_n}\}_{n=1}^{\infty}$ so that $a_{i_n} \ge \varepsilon$ for all $n \in \mathbb{N}$.
7-23. Let $F : [a,b] \to \mathbb{R}$ be a nondecreasing bounded function. Prove that $F$ can have at most countably many discontinuities. Hint. Suppose the set of discontinuities is uncountable and use Exercise 3-21 to conclude that there must be an $\varepsilon$ so that the set of all $x$ with $\lim_{z \to x^+} F(z) - \lim_{z \to x^-} F(z) > \varepsilon$ is uncountable.
7-24. Prove that for every countable subset $A \subseteq \mathbb{R}$ there is a nondecreasing function $f : \mathbb{R} \to [0, 1]$ that is continuous on $\mathbb{R} \setminus A$ and discontinuous at every $a \in A$. Hint. With $A = \{a_n : n \in \mathbb{N}\}$ set $f(x) := \sum_{n : a_n < x} \frac{1}{2^n}$.
7-25. Cantor sets.
(a) Prove that for any sequence $Q = \{q_n\}_{n=1}^{\infty}$ of numbers $q_n \in \left(0, \frac{1}{2}\right)$ there is a bijective function from $C^Q$ to the set of all sequences of zeroes and ones. Hint. For $x \in C^Q$ and $n \ge 1$, set $a_n(x) := 0$ iff the interval $I_{j,n}^Q$ that contains $x$ is a left part $L_{q_n}[\,\cdot\,]$, that is, iff we have to turn “left” at the $n$th stage of the construction to keep $x$ in the interval.
(b) Prove that $C^Q$ is uncountable. Hint. Exercise 7-21.
(c) Prove that the set of endpoints of the intervals $I_{j,n}^Q$ that make up the $C_n^Q$ is countable.
(d) Prove that every $x \in C^Q$ is the limit of a sequence of endpoints of intervals $I_{j,n}^Q$.
Chapter 8
The Riemann Integral II
The Dirichlet function in Example 5.26 shows that functions with too many discontinuities may not be Riemann integrable. This is because for a discontinuity $d$ there is an $\varepsilon > 0$ so that, independent of $\delta$, the numbers $\inf\{f(x) : x \in (d - \delta, d + \delta)\}$ and $\sup\{f(x) : x \in (d - \delta, d + \delta)\}$ will always be at least $\varepsilon$ apart. By Theorem 5.25, too many such discontinuities will cause the function to not be Riemann integrable. At the same time, Proposition 5.12 shows that a function can have some discontinuities and still be Riemann integrable. To determine when a function is Riemann integrable, we need to determine “how many” discontinuities are acceptable. For a graphical motivation, consider Figure 20 on page 133. Section 8.1 introduces outer Lebesgue measure, which is the tool to measure “how many” discontinuities a function has. Section 8.2 introduces Lebesgue's integrability criterion and Section 8.3 shows how this criterion allows us to easily obtain new results about the Riemann integral. We conclude in Section 8.4 with improper integrals.
8.1 Outer Lebesgue Measure
Outer Lebesgue measure covers a set with open intervals and assigns the infimum of the sums of the lengths of these intervals as the measure (“size”) of the set. With respect to Riemann integrability, we should note that if all discontinuities of our function are trapped in a union of intervals, then outside of this union of intervals the function is continuous and Riemann integrability should not be a problem there.
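Definition 8.1 For an open interval $I = (a,b)$ in $\mathbb{R}$, we define $|I| := b - a$. For any set $S \subseteq \mathbb{R}$, we define the outer Lebesgue measure of $S$ to be
\[ \lambda(S) := \inf \left\{ \sum_{j=1}^{\infty} |I_j| : S \subseteq \bigcup_{j=1}^{\infty} I_j, \text{ each } I_j \text{ is an open interval} \right\}, \]
where we set $\lambda(S) = \infty$ if none of the series in the set on the right converge.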
The proof of Proposition 8.5 will show why we use open intervals in the definition of outer Lebesgue measure. We first turn our attention to sets with outer Lebesgue measure zero. These sets, and properties that hold on the complement of such a set, will be of particular importance for the Riemann integral.
Definition 8.2 A set of outer Lebesgue measure zero is called a set of measure zero or a null set. A property $P(x)$ such that $\lambda(\{x \in D : P(x) \text{ is not true}\}) = 0$ is said to hold almost everywhere in $D$. Almost everywhere is also abbreviated as a.e.
Countable sets are considered “small” in set theory and they are also “small” with respect to outer Lebesgue measure.
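Proposition 8.3 Countable subsets of $\mathbb{R}$ have outer Lebesgue measure 0.
Proof. Let $C = \{c_1, c_2, \ldots\}$ be a countable subset of $\mathbb{R}$ and let $\varepsilon > 0$. For $j \in \mathbb{N}$ let $I_j := \left(c_j - \frac{\varepsilon}{2}\frac{1}{2^j}, c_j + \frac{\varepsilon}{2}\frac{1}{2^j}\right)$. Then $C \subseteq \bigcup_{j=1}^{\infty} I_j$ and $\sum_{j=1}^{\infty} |I_j| = \sum_{j=1}^{\infty} \varepsilon \frac{1}{2^j} = \varepsilon$. Thus $\lambda(C) = 0$.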
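As a quick check of the computation in the proof of Proposition 8.3, the interval lengths and their sum can be written out as follows. This is only a sketch of the geometric series step, assuming the covering intervals are $I_j = \left(c_j - \frac{\varepsilon}{2}\frac{1}{2^j}, c_j + \frac{\varepsilon}{2}\frac{1}{2^j}\right)$.

```latex
\[
|I_j| = 2 \cdot \frac{\varepsilon}{2} \cdot \frac{1}{2^j} = \frac{\varepsilon}{2^j},
\qquad
\sum_{j=1}^{\infty} |I_j|
  = \varepsilon \sum_{j=1}^{\infty} \frac{1}{2^j}
  = \varepsilon \cdot 1
  = \varepsilon .
\]
```

Because the infimum in Definition 8.1 is at most $\varepsilon$ for every $\varepsilon > 0$, it must be zero.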
Although the proof of Proposition 8.3 is quite simple, it takes a little time to get used to the result. Recall that $\mathbb{Q}$ is countable, which means it is a null set! Exercise 8-3d will show that null sets can be uncountable, too. Of course, not all subsets of $\mathbb{R}$ are null sets. To prove that outer Lebesgue measure provides the “right” measure for intervals, we need the Heine-Borel Theorem. The conclusion of the Heine-Borel Theorem is the inspiration for the topological definition of compactness (see Theorem 16.72). Until we formally introduce compactness in Section 16.5 we will rely on Standard Proof Technique 2.28 as used in the proof of the Heine-Borel Theorem.
Theorem 8.4 Heine-Borel Theorem. Let $[a,b] \subset \mathbb{R}$ and let $\mathcal{I}$ be a family of open intervals with $[a,b] \subseteq \bigcup_{I \in \mathcal{I}} I$. Then there are finitely many intervals $I_1, I_2, \ldots, I_n \in \mathcal{I}$ so that $[a,b] \subseteq \bigcup_{j=1}^{n} I_j$.
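Proof. Suppose for a contradiction that
\[ C := \left\{ x \in [a,b] : \forall I_1, \ldots, I_n \in \mathcal{I} : [a,x] \not\subseteq \bigcup_{j=1}^{n} I_j \right\} \neq \emptyset. \]
Let $c := \inf C$. Because $a \in I$ for some $I \in \mathcal{I}$ we infer $c > a$. Let $J \in \mathcal{I}$ be an open interval with $c \in J$. If $c < b$ there is a $\delta > 0$ with $(c-\delta, c+\delta) \subseteq J \cap [a,b]$. If $c = b$ there is a $\delta > 0$ with $(c-\delta, b] \subseteq J \cap [a,b]$. By definition of $c$, there are finitely many $I_1, I_2, \ldots, I_n \in \mathcal{I}$ so that $[a, c-\delta] \subseteq \bigcup_{j=1}^{n} I_j$. But then if $c < b$ we obtain $[a, c+\delta) \subseteq J \cup \bigcup_{j=1}^{n} I_j$, contradicting the definition of $c$. Hence, $c = b$ and we obtain $[a,b] \subseteq J \cup \bigcup_{j=1}^{n} I_j$, implying $C = \emptyset$, a contradiction.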
Proposition 8.5 Let $a, b \in \mathbb{R}$ with $a < b$. Then $\lambda([a,b]) = b - a$.
Proof. Let $\varepsilon > 0$. Because
\[ [a,b] \subseteq \left(a - \frac{\varepsilon}{4}, b + \frac{\varepsilon}{4}\right) \cup \bigcup_{n=2}^{\infty} \left(-\frac{\varepsilon}{2 \cdot 2^n}, \frac{\varepsilon}{2 \cdot 2^n}\right) \]
we obtain $\lambda([a,b]) \le (b-a) + \varepsilon$. Because $\varepsilon > 0$ was arbitrary, $\lambda([a,b]) \le b - a$.
To show the reverse inequality, let $\{I_j\}_{j=1}^{\infty}$ be a countable family of open intervals so that $[a,b] \subseteq \bigcup_{j=1}^{\infty} I_j$. By the Heine-Borel Theorem, there is a finite number of intervals $I_{j_1}, \ldots, I_{j_n}$ so that $[a,b] \subseteq \bigcup_{k=1}^{n} I_{j_k}$. For $k = 1, \ldots, n$ let $I_{j_k} = (a_k, b_k)$. Without loss of generality assume that no interval $I_{j_k}$ is contained in another. Reorder the intervals so that for $k = 2, \ldots, n$ we have $b_{k-1} \le b_k$. Then for all $k = 2, \ldots, n$ we have $a_{k-1} \le a_k$. Moreover, $b_n > b$, $a_1 < a$ and for all $k = 2, \ldots, n$ we infer $a_k < b_{k-1}$. Hence,
\[ \sum_{k=1}^{n} (b_k - a_k) \ge b_1 - a_1 + \sum_{k=2}^{n} (b_k - b_{k-1}) = b_1 - a_1 + b_n - b_1 > b - a, \]
which implies $\sum_{j=1}^{\infty} |I_j| > b - a$, and hence $\lambda([a,b]) \ge b - a$.
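The upper estimate in the proof of Proposition 8.5 rests on adding up the lengths of one large interval and infinitely many small ones. The following sketch writes that sum out, assuming the cover consists of $\left(a - \frac{\varepsilon}{4}, b + \frac{\varepsilon}{4}\right)$ together with the intervals $\left(-\frac{\varepsilon}{2 \cdot 2^n}, \frac{\varepsilon}{2 \cdot 2^n}\right)$ for $n \ge 2$. The small intervals are only there because Definition 8.1 asks for a countably infinite family.

```latex
\[
\left(b + \frac{\varepsilon}{4}\right) - \left(a - \frac{\varepsilon}{4}\right)
  + \sum_{n=2}^{\infty} 2 \cdot \frac{\varepsilon}{2 \cdot 2^n}
= (b - a) + \frac{\varepsilon}{2} + \sum_{n=2}^{\infty} \frac{\varepsilon}{2^n}
= (b - a) + \frac{\varepsilon}{2} + \frac{\varepsilon}{2}
= (b - a) + \varepsilon .
\]
```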
Aside from its use related to Riemann integration, outer Lebesgue measure also is the foundation for Lebesgue integration. We conclude this section with some of the properties of outer Lebesgue measure, which will also be helpful for some exercises.
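Theorem 8.6 The properties of outer Lebesgue measure $\lambda$. With $\infty$ defined to be greater than all real numbers and the sum of a divergent series of nonnegative numbers being $\infty$ we have the following.
1. $\lambda(\emptyset) = 0$.
2. If $A \subseteq B$, then $\lambda(A) \le \lambda(B)$.
3. Outer Lebesgue measure is countably subadditive. That is, for all sequences $\{A_n\}_{n=1}^{\infty}$ of subsets $A_n \subseteq \mathbb{R}$ the inequality $\lambda\left(\bigcup_{n=1}^{\infty} A_n\right) \le \sum_{n=1}^{\infty} \lambda(A_n)$ holds.
Proof. For part 2, let $A \subseteq B$. Then we obtain the following.
\[ \lambda(A) = \inf \left\{ \sum_{j=1}^{\infty} |I_j| : A \subseteq \bigcup_{j=1}^{\infty} I_j, \text{ each } I_j \text{ is an open interval} \right\} \le \inf \left\{ \sum_{j=1}^{\infty} |I_j| : B \subseteq \bigcup_{j=1}^{\infty} I_j, \text{ each } I_j \text{ is an open interval} \right\} = \lambda(B). \]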
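For part 1, note that for all $n \in \mathbb{N}$ we have $\emptyset \subseteq \left[-\frac{1}{n}, \frac{1}{n}\right]$ and thus, via part 2 and Proposition 8.5, $\lambda(\emptyset) \le \frac{2}{n}$, which implies $\lambda(\emptyset) = 0$.
For part 3, first note that there is nothing to prove if the right side is infinite. So assume the right side is finite and let $\varepsilon > 0$. For each $n \in \mathbb{N}$, find a countable family $\{I_j^n\}_{j=1}^{\infty}$ of open intervals so that $A_n \subseteq \bigcup_{j=1}^{\infty} I_j^n$ and $\sum_{j=1}^{\infty} |I_j^n| < \lambda(A_n) + \frac{\varepsilon}{2^n}$. By Lemma 7.14 the family $\{I_j^n\}_{j,n=1}^{\infty}$ is a countable family of open intervals. The union of this family is such that
\[ \bigcup_{n=1}^{\infty} A_n \subseteq \bigcup_{n=1}^{\infty} \bigcup_{j=1}^{\infty} I_j^n = \bigcup_{j,n=1}^{\infty} I_j^n. \]
By Proposition 6.22, the convergence behavior and value of a doubly infinite sum of nonnegative numbers do not depend on the order of summation and by Exercise 7-17 it does not matter if we represent the sum as a double sum or a single sum. Thus we can conclude
\[ \lambda\left(\bigcup_{n=1}^{\infty} A_n\right) \le \sum_{j,n=1}^{\infty} |I_j^n| = \sum_{n=1}^{\infty} \sum_{j=1}^{\infty} |I_j^n| \le \sum_{n=1}^{\infty} \left(\lambda(A_n) + \frac{\varepsilon}{2^n}\right) = \sum_{n=1}^{\infty} \lambda(A_n) + \varepsilon. \]
Because $\varepsilon$ was arbitrary, this proves part 3.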
Exercises
8-1. Let $\{A_n\}_{n=1}^{\infty}$ be a countable family of null sets. Prove that $\lambda\left(\bigcup_{n=1}^{\infty} A_n\right) = 0$.
8-2. Let $a, b \in \mathbb{R}$ with $a < b$. Prove that $\lambda((a,b)) = b - a$. Hint. For “$\ge$” approximate the open interval “from the inside” with closed intervals.
8-3. Let $Q = \{q_n\}_{n=1}^{\infty}$ be a sequence of numbers $q_n \in \left(0, \frac{1}{2}\right)$ and let $C^Q$ be the associated Cantor set as in Definition 7.22. We will use the notation of Definition 7.22 throughout this exercise.
(a) Prove that $\lambda(C_n^Q) = \prod_{j=1}^{n} 2q_j$. Hint. Prove that for a finite union of pairwise disjoint intervals the outer Lebesgue measure is the sum of the lengths of the intervals. This requires repeated use of the argument in the proof of Proposition 8.5.
(b) Prove that $\left\{ \prod_{j=1}^{n} 2q_j \right\}_{n=1}^{\infty}$ converges.
(c) Prove that $\lambda(C^Q) = \lim_{n \to \infty} \prod_{j=1}^{n} 2q_j$. Hint. For “$\ge$,” first consider a family $\mathcal{I}$ of open intervals so that $C^Q \subseteq \bigcup \mathcal{I}$. Prove that $C := \left\{ x \in [0,1] : \forall I_1, I_2, \ldots, I_n \in \mathcal{I} : C^Q \cap [0,x] \not\subseteq \bigcup_{j=1}^{n} I_j \right\} = \emptyset$, by assuming it is not empty and showing that $\inf C \in C^Q$, which leads to a contradiction similar to the proof of the Heine-Borel Theorem. (It also helps to look ahead to the proof of Lemma 8.11 for a similar argument.) Then show that for any countable family $\{I_j\}_{j=1}^{\infty}$ with $C^Q \subseteq \bigcup_{j=1}^{\infty} I_j$ there are intervals $I_{j_1}, \ldots, I_{j_m}$ with $C^Q \subseteq \bigcup_{k=1}^{m} I_{j_k}$. Conclude that there must be an $n \in \mathbb{N}$ so that $C_n^Q \subseteq \bigcup_{k=1}^{m} I_{j_k}$.
(d) Prove that for any $q \in \left(0, \frac{1}{2}\right)$ the constant sequence $Q = \{q\}_{n=1}^{\infty}$ yields a Cantor set $C^Q$ of measure zero. Note. By Exercise 7-25b in Section 7.3 Cantor sets are uncountable. This means there are uncountable sets of measure zero.
(e) Use $Q = \left\{ \frac{2^{n+1} - 1}{2^{n+2}} \right\}_{n=1}^{\infty}$ to prove that there are Cantor sets that are not of measure zero. Hint. Prove that for all $n \in \mathbb{N}$ we have $\prod_{j=1}^{n} 2q_j \ge 1 - \sum_{j=2}^{n+1} \frac{1}{2^j}$.
(f) Prove that there are Cantor sets whose Lebesgue measure is arbitrarily close to 1. Hint. For $k \in \mathbb{N}$ fixed, use $n + k$ instead of $n$ in Exercise 8-3e.
8-4. Use the Heine-Borel Theorem and the axioms for $\mathbb{R}$ except for Axiom 1.19 to prove the Bolzano-Weierstrass Theorem. Hint. We will ultimately do this in a more abstract setting in Theorem 16.72.
8-5. Prove that if $f : [a,b] \to \mathbb{R}$ is continuous and $\lambda(\{x \in [a,b] : f(x) \neq 0\}) = 0$, then $f(x) = 0$ for all $x \in [a,b]$.
8-6. Prove that if $f, g : [a,b] \to \mathbb{R}$ are continuous almost everywhere, then $f + g$ is continuous almost everywhere.
8.2 Lebesgue’s Criterion for Riemann Integrability
The oscillation of a function is a quantitative measure of “how discontinuous” the function is. It is the last tool we need to characterize Riemann integrable functions.
Definition 8.7 Let $f : [a,b] \to \mathbb{R}$ be a bounded function and let $x \in [a,b]$. For any interval $I$ let $\omega_f(I) := \sup\{|f(y) - f(z)| : y, z \in I \cap [a,b]\}$ be the oscillation of $f$ over the interval $I$. Define the oscillation of $f$ at the point $x \in [a,b]$ as the infimum $\omega_f(x) := \inf\{\omega_f(J) : x \in J, J \text{ is an open interval}\}$.
Exercise 8-8 shows that the oscillation measures the height of “jumps” in the function and Exercise 12-23 shows that the oscillation is also a measure of the size of oscillations. Regarding the details of the definition, Exercise 8-10 shows that it is important that we use open intervals in the definition of the oscillation $\omega_f(x)$ at a point.
Theorem 8.8 The bounded function $f : [a,b] \to \mathbb{R}$ is continuous at $x \in [a,b]$ iff $\omega_f(x) = 0$.
Proof. For “$\Rightarrow$,” let $f$ be continuous at $x \in [a,b]$ and let $\varepsilon > 0$. Then there is a $\delta > 0$ such that for all $y \in [a,b]$ with $|y - x| < \delta$ we have $|f(y) - f(x)| < \frac{\varepsilon}{2}$. Then for all $y, z \in (x-\delta, x+\delta) \cap [a,b]$ we obtain
\[ |f(y) - f(z)| \le |f(y) - f(x)| + |f(x) - f(z)| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
Hence, $\omega_f(x) \le \omega_f((x-\delta, x+\delta)) \le \varepsilon$ and because $\varepsilon$ was arbitrary we conclude that $\omega_f(x) = 0$.
Conversely, for “$\Leftarrow$,” let $x \in [a,b]$ with $\omega_f(x) = 0$ and let $\varepsilon > 0$. Then there is an interval $(c,d)$ with $x \in (c,d)$ such that $\omega_f((c,d)) < \varepsilon$. Let $\delta := \min\{x - c, d - x\}$ unless $x = a$, in which case we set $\delta := d - x$, or $x = b$, in which case we set $\delta := x - c$. Then for all $y \in [a,b]$ with $|x - y| < \delta$ we have $y \in (c,d)$, and hence $|f(x) - f(y)| < \varepsilon$. Thus $f$ is continuous at $x$.
The next two lemmas show that, when it comes to Riemann integrability, small oscillation means that lower and upper sums can get close to each other, while nonzero oscillation on a set of positive outer Lebesgue measure prevents Riemann integrability (also see Figure 20).
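To see Definition 8.7 and Theorem 8.8 at work, consider a single jump. The function below is a hypothetical example chosen for illustration, not one taken from the text; the sketch computes its oscillation at the jump and away from it.

```latex
% Hypothetical example: a unit jump at 0 on the interval [-1,1].
\[
f : [-1,1] \to \mathbb{R}, \qquad
f(x) := \begin{cases} 0 & \text{for } x < 0, \\ 1 & \text{for } x \ge 0. \end{cases}
\]
% Every open interval J containing 0 contains points on both sides of 0,
% so f takes both values 0 and 1 on J \cap [-1,1]:
\[
\omega_f(J) = 1 \ \text{ for every open interval } J \ni 0,
\qquad \text{hence } \omega_f(0) = 1 \neq 0 .
\]
% For x > 0 the function is constant on J = (0, 2x), which contains x:
\[
\omega_f\bigl((0, 2x)\bigr) = 0,
\qquad \text{hence } \omega_f(x) = 0 \ \text{ for all } x \in (0,1] .
\]
```

The same reasoning with $J = (2x, 0)$ gives $\omega_f(x) = 0$ for $x < 0$, so by Theorem 8.8 this $f$ is discontinuous exactly at $0$, and the oscillation there equals the height of the jump.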
Lemma 8.9 Let $f : [a,b] \to \mathbb{R}$ be bounded. If $\omega_f(x) < \varepsilon$ for all $x \in [a,b]$, then there is a partition $P$ of $[a,b]$ so that $U(f,P) - L(f,P) < \varepsilon |b - a|$.
Proof. For every $z \in [a,b]$, there is an open interval $I_z = (z - \delta_z, z + \delta_z)$ so that $\omega_f(I_z) < \varepsilon$. By the Heine-Borel Theorem, there are finitely many $z_1, \ldots, z_m \in [a,b]$ so that $[a,b] \subseteq \bigcup_{j=1}^{m} \left(z_j - \frac{\delta_{z_j}}{2}, z_j + \frac{\delta_{z_j}}{2}\right)$. Let $P := \{a = x_0 < x_1 < \cdots < x_n = b\}$ be the set comprised of $a$, $b$ and the endpoints $z_j \pm \frac{\delta_{z_j}}{2}$ that are in $[a,b]$. Then each interval $[x_{i-1}, x_i]$ is contained in an interval $I_{z_j}$ and consequently, with notation as in Definition 5.13, we infer $M_i - m_i < \varepsilon$. But then
\[ U(f,P) - L(f,P) = \sum_{i=1}^{n} M_i \Delta x_i - \sum_{i=1}^{n} m_i \Delta x_i = \sum_{i=1}^{n} (M_i - m_i) \Delta x_i < \varepsilon \sum_{i=1}^{n} \Delta x_i = \varepsilon (b - a), \]
which finishes the proof.
Figure 20: The area of the boxes indicates the difference between the lower and the upper sum of the function for the given partition. Boxes are tall where the function has large slopes or discontinuities. Comparison of (a) and (b) shows that as the norm of the partition goes to zero, the height of the boxes goes to zero where the function is continuous. Where the function is discontinuous the height remains bounded away from zero (consider the two discontinuities). Thus $f$ can only be Riemann integrable if the discontinuities (unavoidable tall boxes) can be trapped inside a set of intervals whose total length is small. This is the idea for the Lebesgue criterion.
Lemma 8.10 Let $f : [a,b] \to \mathbb{R}$ be bounded. If $\lambda(\{x \in [a,b] : \omega_f(x) > 0\}) > 0$, then $f$ is not Riemann integrable.
Proof. Because $\{x \in [a,b] : \omega_f(x) > 0\} = \bigcup_{j=1}^{\infty} \left\{x \in [a,b] : \omega_f(x) > \frac{1}{j}\right\}$, there is an $\varepsilon > 0$ so that $L := \lambda(\{x \in [a,b] : \omega_f(x) > \varepsilon\}) > 0$. Let $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ be any partition of $[a,b]$. Define the set $D := \{x \in [a,b] \setminus \{x_0, \ldots, x_n\} : \omega_f(x) > \varepsilon\}$. By Theorem 8.6, we obtain
\[ L \ge \lambda(D) = \lambda(D) + \sum_{j=0}^{n} \lambda(\{x_j\}) \ge \lambda\left(D \cup \bigcup_{j=0}^{n} \{x_j\}\right) \ge \lambda(\{x \in [a,b] : \omega_f(x) > \varepsilon\}) = L, \]
which means $\lambda(D) = L$.
With $i_1, \ldots, i_k \in \{1, \ldots, n\}$ being the indices of the intervals $(x_{i-1}, x_i)$ that intersect $D$ we obtain $D \subseteq \bigcup_{j=1}^{k} (x_{i_j - 1}, x_{i_j})$ and $\sum_{j=1}^{k} \Delta x_{i_j} \ge \lambda(D) = L$. But then in each interval $(x_{i_j - 1}, x_{i_j})$ there is an $s_j$ with $\omega_f(s_j) > \varepsilon$, and hence $\omega_f((x_{i_j - 1}, x_{i_j})) > \varepsilon$. For each $j = 1, \ldots, k$ we infer $M_{i_j} - m_{i_j} > \varepsilon$ and thus
\[ U(f,P) - L(f,P) = \sum_{i=1}^{n} (M_i - m_i) \Delta x_i \ge \sum_{j=1}^{k} (M_{i_j} - m_{i_j}) \Delta x_{i_j} > \varepsilon L > 0. \]
Because $P$ was arbitrary we have shown that for any partition $P$ of $[a,b]$ we have $U(f,P) - L(f,P) > \varepsilon L > 0$, where $\varepsilon$ and $L$ are fixed. By Theorem 5.25 this means that $f$ is not Riemann integrable.
Lemma 8.10 shows that only functions that are continuous almost everywhere have a chance to be Riemann integrable. Lemma 8.9 shows that if we can “trap” the discontinuities in a small enough set, a function should be Riemann integrable. The only obstacle left is that outer Lebesgue measure works with countably many open intervals. To obtain a complement that is made up of closed intervals, it would be nice if we could use finitely many open intervals to cover the set of discontinuities. The next lemma shows that this is possible.
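Lemma 8.10 can be checked against the Dirichlet function mentioned at the start of the chapter. The sketch below assumes the usual form of that function on $[0,1]$ (value 1 at rational points, 0 elsewhere); the exact formulation in Example 5.26 may differ.

```latex
% Assumed form of the Dirichlet function on [0,1].
\[
f(x) := \begin{cases} 1 & \text{for } x \in \mathbb{Q} \cap [0,1], \\ 0 & \text{for } x \in [0,1] \setminus \mathbb{Q}. \end{cases}
\]
% Every open interval J that contains a point x of [0,1] meets [0,1] in a
% nondegenerate interval, which contains rational and irrational numbers:
\[
\omega_f(J) = 1 \ \text{ for all such } J,
\qquad \text{hence } \omega_f(x) = 1 \ \text{ for all } x \in [0,1] .
\]
% The set of points with positive oscillation is all of [0,1]:
\[
\lambda\bigl(\{ x \in [0,1] : \omega_f(x) > 0 \}\bigr) = \lambda([0,1]) = 1 - 0 = 1 > 0 .
\]
```

The last equality uses Proposition 8.5, so Lemma 8.10 applies and $f$ is not Riemann integrable, in agreement with the remark that opens the chapter.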
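Lemma 8.11 Let $f : [a,b] \to \mathbb{R}$ be a bounded function and for each $p > 0$ define $B_p := \{x \in [a,b] : \omega_f(x) \ge p\}$. If $\mathcal{I}$ is a family of open intervals such that $B_p \subseteq \bigcup_{I \in \mathcal{I}} I$, then there are finitely many $I_1, I_2, \ldots, I_n \in \mathcal{I}$ so that $B_p \subseteq \bigcup_{j=1}^{n} I_j$.
Proof. Suppose for a contradiction that
\[ C := \left\{ x \in [a,b] : \forall I_1, \ldots, I_n \in \mathcal{I} : B_p \cap [a,x] \not\subseteq \bigcup_{j=1}^{n} I_j \right\} \neq \emptyset. \]
Then $c := \inf C \le b$. We first claim $c \in B_p$.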
There is a sequence { a , } z l of elements of B, so that
a, < c and lim a, = c. Let
E
n+o3
> 0. Then there is an n
E
N so that la,
- CI
0 with ( c - c + E ) g (1, r ) we conclude
Moreover, because a,
E
E
B, we have that w f ( ( a , - ?, a,
W ~ ( ( C E,
E
i
E,
that $\omega_f(c) \ge p$. Because $c \in B_p$, there is an open interval $I \in \mathcal{I}$ that contains $c$. Clearly, $c > a$. But then $x := \max\{a, \inf(I)\} \in [a,b]$ and because $x \notin C$, there are intervals $I_1, \ldots, I_n \in \mathcal{I}$ so that $B_p \cap [a,x] \subseteq \bigcup_{j=1}^{n} I_j$. But then for all $y \in I$ with $y \ge c$ we obtain $B_p \cap [a,y] \subseteq I \cup \bigcup_{j=1}^{n} I_j$. If $c < b$ this means $\inf C \ge \sup(I) > c$ and if $c = b$ this means $C = \emptyset$, a contradiction either way. This proves the result.
Now we can characterize Riemann integrability as follows.
Theorem 8.12 Lebesgue’s criterion for Riemann integrability. The bounded function $f : [a,b] \to \mathbb{R}$ is Riemann integrable on $[a,b]$ iff $f$ is continuous a.e. on $[a,b]$.
Proof. The part “$\Rightarrow$” is the contrapositive of Lemma 8.10.
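For “$\Leftarrow$,” let $f$ be continuous a.e. on $[a,b]$ and let $\varepsilon > 0$. Choose $m \in \mathbb{N}$ so that $\frac{b-a}{m} < \frac{\varepsilon}{2}$. Let $X_m := \left\{x \in [a,b] : \omega_f(x) \ge \frac{1}{m}\right\}$. Then $\lambda(X_m) = 0$, and there is a sequence $\{I_j\}_{j=1}^{\infty}$ of open intervals with $X_m \subseteq \bigcup_{j=1}^{\infty} I_j$ and $\sum_{j=1}^{\infty} |I_j|$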
0 so that $|g(x)| \ge \varepsilon$ for all $x \in [a,b]$, then the quotient $\frac{f}{g}$ is Riemann integrable on $[a,b]$.
3. The absolute value $|f|$ is Riemann integrable and the triangular inequality $\left|\int_a^b f(x)\,dx\right| \le \int_a^b |f(x)|\,dx$ holds.
Proof. For part 1, note that if $f$ and $g$ are continuous a.e., then the set of discontinuities $\{x : \omega_{fg}(x) \neq 0\}$ is contained in the union $\{x : \omega_f(x) \neq 0\} \cup \{x : \omega_g(x) \neq 0\}$, which has measure zero. Thus $fg$ is continuous a.e., and hence Riemann integrable. The remaining parts of this proof are left as Exercise 8-16.
It is now easy to see that once f : [a,b] + R is Riemann integrable, then it is also Riemann integrable over any closed subinterval. Formally, we define the following.
Definition 8.15 Let $f : [a,b] \to \mathbb{R}$ and let $c, d \in [a,b]$ with $c < d$. Then $f$ is called Riemann integrable over $[c,d]$ iff the restriction $f|_{[c,d]}$ is Riemann integrable. In this case we set
\[ \int_c^d f(x)\,dx := \int_c^d f|_{[c,d]}(x)\,dx. \]
Theorem 8.16 Let $f : [a,b] \to \mathbb{R}$ and let $m \in (a,b)$. Then $f$ is Riemann integrable over $[a,b]$ iff $f$ is Riemann integrable over $[a,m]$ and over $[m,b]$. In this case, the integrals satisfy the equation
\[ \int_a^b f(x)\,dx = \int_a^m f(x)\,dx + \int_m^b f(x)\,dx. \]
Proof. Exercise 8-17.
We can now consider integrals in which the upper bound is an independent variable. This idea provides another connection between derivatives and integrals. For any continuous function $f$, definite integrals produce a function $G$ with $G' = f$.
Theorem 8.17 Fundamental Theorem of Calculus, Derivative Form. Let $f$ be a Riemann integrable function on $[a,b]$. Then the function $G(x) := \int_a^x f(t)\,dt$ is uniformly continuous on $[a,b]$ and if $f$ is continuous at $x \in (a,b)$, then $G$ is differentiable at $x$ with $G'(x) = f(x)$.
Proof. Let $B$ be an upper bound so that $|f(x)| < B$ for all $x \in [a,b]$. To see that $G$ is uniformly continuous on $[a,b]$, let $\varepsilon > 0$. Set $\delta := \frac{\varepsilon}{B}$. Then for all $x, z \in [a,b]$ with $|x - z| < \delta$ we obtain (assuming without loss of generality that $x < z$)
\[ |G(z) - G(x)| = \left| \int_x^z f(t)\,dt \right| \le \int_x^z |f(t)|\,dt \le B (z - x) < B \delta = \varepsilon. \]
If we went through a partition from right to left rather than left to right, then all the bases of the rectangles in a Riemann sum would have negative length. Thus it makes sense to define the following.
Definition 8.18 Let $f : [a,b] \to \mathbb{R}$ be Riemann integrable. Then we define the integral with the reversed bounds to be
\[ \int_b^a f(x)\,dx := -\int_a^b f(x)\,dx. \]
Corollary 8.19 Let $f : [a,b] \to \mathbb{R}$ be Riemann integrable and let $x_0 \in [a,b]$. Then the function $G(x) := \int_{x_0}^{x} f(t)\,dt$ is uniformly continuous on $[a,b]$ and if $f$ is continuous at $x \in (a,b)$, then $\frac{d}{dx}\left(\int_{x_0}^{x} f(t)\,dt\right) = f(x)$.
Proof. Exercise 8-18.
Exercises
8-16. Proving Theorem 8.14
(a) Prove part 2 of Theorem 8.14.
(b) Prove part 3 of Theorem 8.14. Hint. Use Lemma 5.6 to prove the inequality.
(c) To see how valuable the Lebesgue criterion is, use Riemann’s Condition (Theorem 5.25) to prove that $|f|$ is Riemann integrable over $[a,b]$. Then compare this proof with the proof using the Lebesgue criterion and state which proof is simpler. Hint. Use a partition $P$ with $U(f,P) - L(f,P) < \varepsilon$.
8-17. Proving Theorem 8.16
(a) Prove Theorem 8.16. Hint. In each direction of the proof use the Lebesgue criterion to establish Riemann integrability. Use Lemma 5.6 for the equations, making sure that the point $m$ is an element of each partition $P_k$.
(b) To see how valuable the Lebesgue criterion is, use Riemann’s Condition (Theorem 5.25) to prove that if $f$ is Riemann integrable over $[a,b]$, then $f$ is Riemann integrable over $[a,m]$. Then compare this proof with the proof using the Lebesgue criterion and state which proof is simpler. Hint. Use a partition $P$ with $U(f,P) - L(f,P) < \varepsilon$.
8-18. Prove Corollary 8.19.
8-19. Use Riemann’s Condition (Theorem 5.25) to prove that if $f$ and $g$ are Riemann integrable over $[a,b]$, then $fg$ is Riemann integrable over $[a,b]$. Then compare this proof with the proof using the Lebesgue criterion and state which proof is simpler.
8-20. Determine which result from this section that was used in the proof of Theorem 8.17 was not available in Section 5.3. (This prevented us from placing Theorem 8.17 right after the Antiderivative Form of the Fundamental Theorem of Calculus.)
8-21. Mean Value Theorem for the Integral. Let the function $f : [a,b] \to \mathbb{R}$ be continuous and let the function $g : [a,b] \to [0,\infty)$ be nonnegative and Riemann integrable. Prove that there is a $c \in [a,b]$ so that the integral satisfies $\int_a^b f(x) g(x)\,dx = f(c) \int_a^b g(x)\,dx$.
8-22. Let $g : [a,b] \to [0,\infty)$ be Riemann integrable and let $f : [a,b] \to \mathbb{R}$ be nondecreasing. Prove that there is a $c \in [a,b]$ so that $\int_a^b f(x) g(x)\,dx = f(a) \int_a^c g(x)\,dx + f(b) \int_c^b g(x)\,dx$. You may use that nondecreasing functions are Riemann integrable (see Exercise 8-11).
8-23. Integration by Parts. Let $[a,b] \subset (c,d)$ and let $F, g : (c,d) \to \mathbb{R}$ be differentiable functions with derivatives $f$ and $g'$ that are Riemann integrable on $[a,b]$. Prove that the integral satisfies
\[ \int_a^b f(x) g(x)\,dx = F(b) g(b) - F(a) g(a) - \int_a^b F(x) g'(x)\,dx. \]
8-24. Integration by Substitution. Let $[a,b] \subset (c,d)$, let $g : (c,d) \to \mathbb{R}$ be differentiable, let its derivative $g'$ be Riemann integrable on $[a,b]$, let $(u,v) \supseteq g[[a,b]]$ and let $F : (u,v) \to \mathbb{R}$ be differentiable with continuous derivative $f$. Prove that
\[ \int_a^b f(g(x)) g'(x)\,dx = F(g(b)) - F(g(a)). \]
8-25. Let $a > 0$. A function $f : [-a,a] \to \mathbb{R}$ is called even iff $f(x) = f(-x)$ for all $x \in [-a,a]$. A function $f : [-a,a] \to \mathbb{R}$ is called odd iff $f(x) = -f(-x)$ for all $x \in [-a,a]$.
(a) Prove that if $f$ is even and Riemann integrable, then $\int_{-a}^{a} f(x)\,dx = 2 \int_0^a f(x)\,dx$.
(b) Prove that if $f$ is odd and Riemann integrable, then $\int_{-a}^{a} f(x)\,dx = 0$.
(c) Prove that any function $f : [-a,a] \to \mathbb{R}$ is the sum of an even and an odd function. Hint. $\frac{f(x) + f(-x)}{2}$ is even.
8-26. Compute the derivative. (Should the integrands be unknown, simply note that they are continuous.) (a)
& 1’
et2 dt
8-27. Let $f : [a,b] \to \mathbb{R}$ be continuous and let $l, u : (c,d) \to [a,b]$ be differentiable. Prove that
\[ \frac{d}{dx} \left( \int_{l(x)}^{u(x)} f(t)\,dt \right) = f(u(x))\,u'(x) - f(l(x))\,l'(x). \]
8-28. Construct a function $f : [0,1] \to \mathbb{R}$ so that $|f|$ is Riemann integrable and $f$ is not Riemann integrable.
8-29. A function $f : [a,b] \to \mathbb{R}$ is called absolutely continuous iff for every $\varepsilon > 0$ there is a $\delta > 0$ so that for all sequences $(a_1, b_1), \ldots, (a_n, b_n)$ of pairwise disjoint open intervals the inequality $\sum_{i=1}^{n} (b_i - a_i) < \delta$ implies $\sum_{i=1}^{n} |f(b_i) - f(a_i)| < \varepsilon$.
(a) Let $f : [a,b] \to \mathbb{R}$ be a Riemann integrable function. Prove that the function $G : [a,b] \to \mathbb{R}$ defined by $G(x) := \int_a^x f(t)\,dt$ is absolutely continuous on $[a,b]$.
(b) Prove that every absolutely continuous function is uniformly continuous.
(c) Prove that $f(x) = \frac{1}{x}$ is continuous, but not absolutely continuous on $(0,1]$.
8-30. Results for Riemann-Stieltjes integrals. Let $g : [a,b] \to \mathbb{R}$ be nondecreasing, and let the functions $f, h : [a,b] \to \mathbb{R}$ be bounded and Riemann-Stieltjes integrable on $[a,b]$ with respect to $g$.
(a) Prove that $|f|$ is Riemann-Stieltjes integrable on $[a,b]$ with respect to $g$ and that the triangular inequality $\left| \int_a^b f\,dg \right| \le \int_a^b |f|\,dg$ holds.
(b) Prove that for all $m \in [a,b]$ the function $f$ is Riemann-Stieltjes integrable with respect to $g$ over $[a,m]$ and over $[m,b]$ and that $\int_a^b f\,dg = \int_a^m f\,dg + \int_m^b f\,dg$.
(c) Prove that the product $fh$ is Riemann-Stieltjes integrable on $[a,b]$ with respect to $g$.
Hint (for all). Use the Riemann Condition for Riemann-Stieltjes integrals (see Exercise 5-27).
8.4 Improper Riemann Integrals
The Riemann integral allows us to compute the “area” under bounded functions defined on closed and bounded intervals. Sometimes we are interested in the area under functions that are defined on infinite intervals or that are unbounded. These areas can be approximated with Riemann integrals.
Definition 8.20 Let $a \in \mathbb{R}$ and let $f : [a,\infty) \to \mathbb{R}$ be such that for all $c > a$ the restriction $f|_{[a,c]}$ is Riemann integrable. If the limit $\lim_{t \to \infty} \int_a^t f(x)\,dx$ exists, it is called the improper Riemann integral of $f$ over $[a,\infty)$ and it is denoted $\int_a^{\infty} f(x)\,dx$. Improper Riemann integrals for functions $f : (-\infty, b] \to \mathbb{R}$ are defined similarly (Exercise 8-31).
Example 8.21 The p-integral test for integrals over infinite intervals. Let $p > 0$ be rational. Then $f(x) = \frac{1}{x^p}$ is improperly Riemann integrable over $[1,\infty)$ iff $p > 1$.
1
, while for
Finally, we need to show that the improper integral does not exist for $p = 1$. To do this, note that $f(x) = \frac{1}{x}$ is greater than $\frac{1}{m+1}$ on each interval $[m, m+1)$. Therefore, for all $n \in \mathbb{N}$ we infer $f \ge \sum_{k=1}^{n} \frac{1}{k+1} \mathbf{1}_{[k,k+1)}$, and hence
\[ \int_1^{n+1} \frac{1}{x}\,dx \ge \sum_{k=1}^{n} \frac{1}{k+1}. \]
The latter sum is unbounded, so $\int_1^{\infty} \frac{1}{x}\,dx$ does not exist.
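For the cases $p \neq 1$ of Example 8.21, the computation can be sketched as follows. The sketch assumes that the power rule for rational exponents and the Antiderivative Form of the Fundamental Theorem of Calculus are available at this point; it is meant as an illustration, not as the text's own argument.

```latex
% Assumption: x^{1-p}/(1-p) is an antiderivative of x^{-p} for rational p \neq 1.
\[
\int_1^t \frac{1}{x^p}\,dx = \frac{t^{1-p} - 1}{1 - p}
\qquad (t > 1,\ p \neq 1).
\]
% Case p > 1: the exponent 1-p is negative, so t^{1-p} tends to 0.
\[
\lim_{t \to \infty} \frac{t^{1-p} - 1}{1 - p} = \frac{0 - 1}{1 - p} = \frac{1}{p - 1}
\qquad (p > 1).
\]
% Case 0 < p < 1: the exponent 1-p is positive, so t^{1-p} is unbounded
% and the limit does not exist as a real number.
\[
\frac{t^{1-p} - 1}{1 - p} \to \infty \quad \text{as } t \to \infty
\qquad (0 < p < 1).
\]
```

Together with the case $p = 1$ treated above, this gives convergence exactly for $p > 1$.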
Note that because we have not yet defined logarithms, the last argument in Example 8.21 is unavoidable. Similarly, we had to restrict ourselves to rational powers, because powers with arbitrary real exponents have not been defined yet. Of course, the p-integral test ultimately holds for real exponents $p$ and we will not restate it once real powers are defined.
Improper integrals can also be defined for (potentially) unbounded functions.
Definition 8.22 Let $a, b \in \mathbb{R}$ with $a < b$ and let $f : [a,b) \to \mathbb{R}$ be such that for all $c \in (a,b)$ the restriction $f|_{[a,c]}$ is Riemann integrable. If the limit $\lim_{t \to b^-} \int_a^t f(x)\,dx$ exists, it is called the improper Riemann integral of $f$ over $[a,b)$ and it is denoted $\int_a^b f(x)\,dx$. Improper Riemann integrals for functions $f : (a,b] \to \mathbb{R}$ are defined similarly (Exercise 8-32).
For Riemann integrable functions on $[a,b]$, the improper Riemann integral over $[a,b)$ agrees with the Riemann integral over $[a,b]$.
Proposition 8.23 Let $a, b \in \mathbb{R}$ with $a < b$ and let $f : [a,b] \to \mathbb{R}$ be Riemann integrable. Then the improper Riemann integral of $f$ over $[a,b)$ exists and it is equal to the Riemann integral of $f$ over $[a,b]$.
Proof. Let $B > 0$ be an upper bound of $|f|$ and let $\varepsilon > 0$. Then for $\delta := \frac{\varepsilon}{B}$ and all $t \in [a,b)$ with $|t - b| < \delta$ we obtain
\[ \left| \int_a^b f(x)\,dx - \int_a^t f(x)\,dx \right| = \left| \int_t^b f(x)\,dx \right| \le \int_t^b |f(x)|\,dx \le B (b - t) < \varepsilon. \]
Riemann integrable functions are not the only functions that are improperly Riemann integrable. Example 8.24 shows that there are unbounded functions for which the improper Riemann integral exists.
Example 8.24 The p-integral test for improper integrals over $(0,1]$. Let $p > 0$ be rational. Then $f(x) = \frac{1}{x^p}$ is improperly Riemann integrable over $(0,1]$ iff $p < 1$.
Mimic the proof for Example 8.21. (Exercise 8-33.)
Note that the improper integrals $\int_0^1 \frac{1}{x^p}\,dx$ converge for $p < 1$, while the integrals $\int_1^{\infty} \frac{1}{x^p}\,dx$ converge for $p > 1$. Irrespective of this important difference, for both types of improper integrals similar laws hold and there also is a Comparison Test that is similar to the Comparison Test for series.
Theorem 8.25 Let $f, g : [a,b) \to \mathbb{R}$ (where $b$ could be $\infty$) be improperly Riemann integrable over $[a,b)$ and let $c \in \mathbb{R}$. Then
1. $f + g$ is improperly Riemann integrable over $[a,b)$ and $\int_a^b (f+g)(x)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$.
2. $cf$ is improperly Riemann integrable over $[a,b)$ and $\int_a^b c f(x)\,dx = c \int_a^b f(x)\,dx$.
Proof. Similar to the proof of Theorem 6.4. (Exercise 8-34.)
Theorem 8.26 Let $f : [a,b) \to \mathbb{R}$ (where $b$ could be $\infty$) be Riemann integrable over all intervals $[a,c] \subseteq [a,b)$. Then $f$ is improperly Riemann integrable over $[a,b)$ if and only if for all $c \in [a,b)$ the function $f$ is improperly Riemann integrable over $[c,b)$ and in this case $\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx$.
Proof. Exercise 8-35.
Theorem 8.27 Comparison Test for improper integrals. Let $f, g : [a,b) \to \mathbb{R}$ (where $b$ could be $\infty$) be such that $0 \le f \le g$, $f$ is Riemann integrable over every closed interval in $[a,b)$ and $g$ is improperly Riemann integrable over $[a,b)$. Then $f$ is improperly Riemann integrable over $[a,b)$ and $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$.
Proof. The function $F(t) := \int_a^t f(x)\,dx$ is continuous and nondecreasing on $[a,b)$ and it is bounded by $\int_a^b g(x)\,dx$. The reader will show in Exercise 8-36 that $\lim_{t \to b^-} F(t) = \sup\{F(t) : t \in [a,b)\} \le \int_a^b g(x)\,dx$ to complete the proof.
Theorem 8.28 Let $f : [a,b) \to \mathbb{R}$ (where $b$ could be $\infty$) be Riemann integrable over every closed interval in $[a,b)$. If $|f|$ is improperly Riemann integrable over $[a,b)$, then $f$ is improperly Riemann integrable over $[a,b)$ and the triangular inequality $\left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx$ holds.
Proof. Exercise 8-37.
Figure 21: In the integral test, the improper integral of a nonincreasing function is related to the series that give the Riemann sums with left and right endpoint evaluations for the partition with step length 1. The integral test says that for an improperly integrable function the series obtained by right endpoint evaluation cannot be infinite (a), while for a function that is not improperly integrable the series obtained by left endpoint evaluation cannot be finite (b).
Finally, we should note that the occurrence of series in Example 8.21 is not an accident. The Integral Test connects the convergence of certain series to the convergence of improper integrals over infinite intervals.
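Before turning to the Integral Test, here is a sketch of how the Comparison Test (Theorem 8.27) is typically applied. The function $\frac{1}{1+x^2}$ is a hypothetical example chosen for illustration, and the value of the comparison integral assumes the antiderivative $-\frac{1}{x}$ of $\frac{1}{x^2}$.

```latex
% Pointwise comparison on [1, \infty):
\[
0 \le \frac{1}{1 + x^2} \le \frac{1}{x^2}
\qquad \text{for all } x \ge 1 .
\]
% The larger function is improperly Riemann integrable (p-integral with p = 2):
\[
\int_1^{\infty} \frac{1}{x^2}\,dx
  = \lim_{t \to \infty} \left( 1 - \frac{1}{t} \right)
  = 1 .
\]
% The smaller function is continuous, hence Riemann integrable over every
% closed interval in [1, \infty), so Theorem 8.27 yields
\[
\int_1^{\infty} \frac{1}{1 + x^2}\,dx \ \text{ exists and } \
\int_1^{\infty} \frac{1}{1 + x^2}\,dx \le 1 .
\]
```

Note that the test only gives existence and an upper bound, not the value of the integral.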
Theorem 8.29 Integral Test. Let $f : [1,\infty) \to [0,\infty)$ be a bounded nonincreasing nonnegative function. Then the series $\sum_{j=1}^{\infty} f(j)$ converges iff the improper integral $\int_1^{\infty} f(x)\,dx$ converges (also see Figure 21).
Proof. Throughout the proof let $g(x) = \sum_{j=1}^{\infty} f(j) \mathbf{1}_{[j,j+1)}(x)$. (For every $x \ge 1$, this sum has at most one nonzero term.)
For “$\Rightarrow$,” let $\sum_{j=1}^{\infty} f(j)$ be convergent. Then $g$ as defined above is improperly Riemann integrable over $[1,\infty)$ with $\int_1^{\infty} g(x)\,dx = \sum_{j=1}^{\infty} f(j) < \infty$. Because $0 \le f \le g$, the Comparison Test for improper integrals implies that $\int_1^{\infty} f(x)\,dx$ converges.
Conversely, for “$\Leftarrow$,” let $\int_1^{\infty} f(x)\,dx$ be convergent. Then the improper integral $\int_2^{\infty} f(x-1)\,dx = \int_1^{\infty} f(x)\,dx$ converges. Because $g(x) \le f(x-1)$ for all $x \ge 2$, by the Comparison Test for improper integrals the integral $\int_2^{\infty} g(x)\,dx$ converges, which means the series $\sum_{j=2}^{\infty} f(j)$ converges.
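The two comparisons in the proof of the Integral Test also give explicit bounds. The sketch below records them and applies them to $f(x) = \frac{1}{x^2}$, a hypothetical example chosen for illustration; the value of the integral assumes the antiderivative $-\frac{1}{x}$.

```latex
% From 0 \le f \le g on [1,\infty) and g(x) \le f(x-1) on [2,\infty):
\[
\sum_{j=2}^{\infty} f(j)
  \le \int_1^{\infty} f(x)\,dx
  \le \sum_{j=1}^{\infty} f(j) .
\]
% Example: f(x) = 1/x^2 is bounded, nonincreasing and nonnegative on [1,\infty).
\[
\int_1^{\infty} \frac{1}{x^2}\,dx = 1
\quad \Longrightarrow \quad
\sum_{j=2}^{\infty} \frac{1}{j^2} \le 1
\quad \Longrightarrow \quad
\sum_{j=1}^{\infty} \frac{1}{j^2} \le 1 + 1 = 2 .
\]
```

So the series $\sum_{j=1}^{\infty} \frac{1}{j^2}$ converges, and the test even bounds its sum, although it does not determine the exact value.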
The connections and similarities between integrals over infinite intervals and series need to be considered with caution. For example, Exercise 8-38 shows that there is no Limit Test for improper integrals over infinite intervals. Finally, improper integrals can also be defined for functions on the whole real line and for functions with multiple singularities.
Definition 8.30 Let $a = a_0 < a_1 < a_2 < \cdots < a_{n-1} < a_n = b$ (where $a$ could be $-\infty$ and $b$ could be $\infty$) and let $f : (a,b) \setminus \{a_1, \ldots, a_{n-1}\} \to \mathbb{R}$ be Riemann integrable over all closed subintervals of its domain. We define the improper Riemann integral of $f$ as follows. Let $r_1 < r_2 < \cdots < r_n$ be so that $a_{j-1} < r_j < a_j$ and define
\[ \int_a^b f(x)\,dx := \sum_{j=1}^{n} \left[ \int_{a_{j-1}}^{r_j} f(x)\,dx + \int_{r_j}^{a_j} f(x)\,dx \right] \]
if all the summands exist.
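A small worked instance may help with the bookkeeping in Definition 8.30. The function, the points $a_j$ and the points $r_j$ below are hypothetical choices for illustration, and the computation assumes the antiderivative $2\sqrt{x}$ of $x^{-1/2}$ on $(0,\infty)$ together with the symmetry of $|x|^{-1/2}$.

```latex
% Hypothetical example: one interior singularity at 0.
\[
f(x) := \frac{1}{\sqrt{|x|}} \ \text{ on } (-1,1) \setminus \{0\},
\qquad a_0 = -1,\ a_1 = 0,\ a_2 = 1,
\qquad r_1 = -\tfrac{1}{2},\ r_2 = \tfrac{1}{2} .
\]
% Right half: one improper piece at 0 and one ordinary piece.
\[
\int_0^{1/2} \frac{dx}{\sqrt{x}}
  = \lim_{s \to 0^+} \left( 2\sqrt{\tfrac{1}{2}} - 2\sqrt{s} \right) = \sqrt{2},
\qquad
\int_{1/2}^{1} \frac{dx}{\sqrt{x}} = 2 - 2\sqrt{\tfrac{1}{2}} = 2 - \sqrt{2} .
\]
% Left half: the same values by symmetry. Summing all four pieces:
\[
\int_{-1}^{1} \frac{dx}{\sqrt{|x|}}
  = \left[ (2 - \sqrt{2}) + \sqrt{2} \right] + \left[ \sqrt{2} + (2 - \sqrt{2}) \right]
  = 4 .
\]
```

By Theorem 8.26 the total does not depend on the choice of $r_1$ and $r_2$, which is why the definition can leave them unspecified.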
Exercises
8-31. Let $b \in \mathbb{R}$ and let $f : (-\infty, b] \to \mathbb{R}$ be such that for all $a < b$ the restriction $f|_{[a,b]}$ is Riemann integrable. Define the improper Riemann integral $\int_{-\infty}^{b} f(x)\,dx$ of $f$ over $(-\infty, b]$.
8-32. Let $a, b \in \mathbb{R}$ with $a < b$ and let $f : (a,b] \to \mathbb{R}$ be such that for all $c \in (a,b)$ the restriction $f|_{[c,b]}$ is Riemann integrable. Define the improper Riemann integral $\int_a^b f(x)\,dx$ of $f$ over $(a,b]$.
8-33. Prove the claim in Example 8.24.
8-34. Prove Theorem 8.25.
8-35. Prove Theorem 8.26.
8-36. Finish the proof of Theorem 8.27 by proving that $\lim_{t \to b^-} F(t) = \sup\{F(t) : t \in [a,b)\}$.
8-37. Prove Theorem 8.28. Hint. $0 \le f + |f| \le 2|f|$.
8-38. Construct a function $f : [1,\infty) \to [0,1]$ that is improperly Riemann integrable, but does not converge to zero as $x \to \infty$. Hint. Triangles of height 1 and area $\frac{1}{2^n}$.
8-39. Cauchy Criterion for improper Riemann integrability. Let $f : [a,b) \to \mathbb{R}$ (where $b$ could be $\infty$) be Riemann integrable over all intervals $[a,c] \subseteq [a,b)$. Prove that $f$ is improperly Riemann integrable over $[a,b)$ iff for all $\varepsilon > 0$ there is an $M \in [a,b)$, so that for all $c, d \in (M,b)$ we have $\left| \int_c^d f(x)\,dx \right| < \varepsilon$.
8-40. Limit Comparison Test for improper integrals. Let $f, g : [a,b) \to [0,\infty)$ (where $b$ could be $\infty$) be Riemann integrable over all intervals $[a,c] \subseteq [a,b)$. Prove that if $\lim_{x \to b^-} \frac{f(x)}{g(x)} = K > 0$, then $\int_a^b f(x)\,dx$ converges iff $\int_a^b g(x)\,dx$ converges. Hint. Close to $b$ we have $g(x)(K - \varepsilon) \le f(x) \le g(x)(K + \varepsilon)$.
8-4 1 Prove that the function f : ( a , b] + W is improperly Riemann integrable over ( a , b] iff the function
[
improperly Riemann integrable over a , m ) and that in this case the integrals are equal.
Chapter 9
The Lebesgue Integral
The geometric idea behind integration is to approximate the area under the graph of a function with areas that are easier to compute. In the Riemann integral, we partition the x-axis and erect a rectangle over each partition interval to approximate the area under the graph. However, Lebesgue’s criterion for Riemann integrability shows that geometric rectangles will not approximate the area well if the function “oscillates” too much. That is, if the differences between the possible choices for the heights of the rectangles do not shrink to zero, then we cannot uniquely identify the area. Equivalently, in the Darboux formulation (see Section 5.4) excessive oscillations force the upper and lower approximations of the area with rectangles to stay a finite distance apart. If we change our point of view and partition the y-axis instead of the x-axis, the problem with different choices for the height goes away (see Figure 22). For a set $S = \{x \in [a,b] : y_{i-1} \le f(x) < y_i\}$, all sensible values for the height of a generalized rectangle with base $S$ are between $y_{i-1}$ and $y_i$. Because the difference between these values can be made small, oscillations are not an issue. However, this approach requires that the bases of our generalized rectangles are no longer intervals. The area of such a generalized rectangle will be the Lebesgue measure of the base set times the height of the rectangle. In this fashion, we retain all benefits of the geometric motivation for integration, while being able to integrate many more functions.¹
This chapter introduces the fundamentals of Lebesgue integration. These fundamentals will be revisited in Chapter 14 when we generalize our work to arbitrary measure spaces. Our presentation is designed to readily translate to the more abstract setting of Chapter 14.
Before we start, it is time to extend our arithmetic from the real numbers to the real numbers with $\infty$ and $-\infty$ included. This is sometimes called the extended real number system. In the extended real number system, every set has a supremum. This will make our definitions of measures and of the Lebesgue integral simpler, because we will not need to explicitly distinguish between bounded and unbounded sets.
¹ The Lebesgue integral also remedies a more abstract problem with the Riemann integral. Spaces of Riemann integrable functions are usually not complete (see Exercise 16-15d), while spaces of Lebesgue integrable functions are (see Theorem 16.19). Completeness is such a fundamental abstract property that this may be the main reason why the Lebesgue integral is preferred.
Figure 22: The idea behind the Lebesgue integral is to partition the y-axis instead of the x-axis and to approximate the area with simple functions. This figure shows that this can lead to “scattered” base sets on the x-axis, which we will treat first in this chapter. The proof of Theorem 9.19 uses the partition of the y-axis into intervals with dyadic rational endpoints.
Proposition 9.1 In the extended real number system $[-\infty, \infty] := \mathbb{R} \cup \{\infty, -\infty\}$, $\infty$ is the greatest element and $-\infty$ is the smallest element. Consequently, every set has an infimum and a supremum.

Proof. Let $A \subseteq [-\infty, \infty]$. The supremum of $\{-\infty\}$ is $-\infty$, and the supremum of $\emptyset$ also is $-\infty$, so we can assume that $A \ne \emptyset, \{-\infty\}$. If $A$ is bounded above by a real number, then $A \cap \mathbb{R} \ne \emptyset$ has a supremum in the real numbers, which is also the supremum of $A$ in $[-\infty, \infty]$. If $A$ is not bounded above by a real number, then $\sup(A) = \infty$. Infima are treated similarly. ∎

Proposition 9.1 assures that from here on, we can take suprema and infima of sets of numbers fairly indiscriminately, as long as we know how to handle infinite values algebraically. Arithmetic on $[-\infty, \infty]$ is the same as on $\mathbb{R}$ with the additional conventions of Definition 9.2. These conventions are inspired by the corresponding limit laws in Theorems 2.44 and 2.46.
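Definition 9.2 Arithmetic involving $\infty$. Let $c \in \mathbb{R}$. Then
\[ c + \infty = \infty, \qquad c - \infty = -\infty, \]
\[ c \cdot \infty = \begin{cases} \infty; & \text{if } c > 0, \\ -\infty; & \text{if } c < 0, \\ \text{undefined}; & \text{if } c = 0, \end{cases} \qquad c \cdot (-\infty) = \begin{cases} -\infty; & \text{if } c > 0, \\ \infty; & \text{if } c < 0, \\ \text{undefined}; & \text{if } c = 0, \end{cases} \]
\[ \frac{c}{\infty} = 0, \qquad \frac{c}{-\infty} = 0, \]
\[ \infty \cdot \infty = \infty, \qquad \infty \cdot (-\infty) = -\infty, \qquad (-\infty) \cdot (-\infty) = \infty. \]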
All other attempts at “arithmetic with infinity” lead to what is called indeterminate forms. For these, the result can be any number, and hence rules of arithmetic cannot be stated. We will consider indeterminate forms in Section 12.3.
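As a rough illustration of the conventions in Definition 9.2, the sketch below uses Python's built-in floating-point infinity. This is only an analogy: IEEE floating-point arithmetic happens to follow the same rules and returns the special value nan where the text speaks of undefined or indeterminate forms. The helper name `is_defined` is hypothetical and was chosen for this sketch only.

```python
import math

INF = math.inf

def is_defined(value):
    """Return True unless the float result is nan (an indeterminate form)."""
    return not math.isnan(value)

c = 3.0
print(c + INF, c - INF)          # inf -inf
print(c * INF, -c * INF)         # inf -inf
print(c / INF == 0.0)            # True
print(INF * -INF, -INF * -INF)   # -inf inf
print(is_defined(0.0 * INF))     # False: 0 times infinity is left undefined
print(is_defined(INF - INF))     # False: an indeterminate form
```

Floating-point numbers also distinguish a signed zero, so `c / -INF` prints as `-0.0`; as a number it equals 0, in agreement with the definition.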
9.1 Lebesgue Measurable Sets

We will work with generalized rectangles whose bases are no longer intervals. Therefore we need to measure the size of sets that are more complicated than intervals. Outer Lebesgue measure is a reliable upper bound for the (one-dimensional) “volume” of a set. For all examples we have seen, it gives the right “volume.” Hence, we can (and do) consider it the right way to measure the “outer volume” of a set. Unfortunately, there are complications. It is possible to split a set $T$ into two sets so that the outer Lebesgue measures of the two sets add up to more than the outer Lebesgue measure of $T$. If we were to involve such pathological sets in a definition of integration, the integral would not even be a reliable measure of the area of generalized rectangles. It would make no sense if the total “length” of the base depended on how we split the base. The definition of Lebesgue measurable sets is designed to safeguard exactly against this problem (see Theorem 9.11). Because all our sets are subsets of $\mathbb{R}$ (and later of a measure space $M$), we introduce abbreviated notation for the complement.
Notation 9.3 When there is one underlying set $X$ that contains all sets that we currently investigate (as is the case in measure theory) and $S \subseteq X$, then we denote the complement of $S$ in $X$ also as $S' := X \setminus S$.

Definition 9.4 A subset $S \subseteq \mathbb{R}$ is called Lebesgue measurable iff for all $T \subseteq \mathbb{R}$ the equality $\lambda(T) = \lambda(S \cap T) + \lambda(S' \cap T)$ holds. We will also call the set $T$ a test set. We denote the set of Lebesgue measurable subsets of $\mathbb{R}$ by $\Sigma_\lambda$.

The existence of non-Lebesgue measurable sets, which would cause the abovementioned problems, depends on the Axiom of Choice. That is, whether or not non-Lebesgue measurable sets exist depends on what axiomatic system of set theory is used.
In practical terms it means that non-Lebesgue measurable sets do not occur in physical phenomena. Hence, from an applied point-of-view we need not be too concerned with nonmeasurable sets and we will not consider them any further in this text. Exercise 9-7 illustrates the problems mentioned above with a simpler measure of ‘‘size,’’ the Jordan content, for which even simple sets can behave badly. The construction is more complicated for Lebesgue measure and the interested reader can find such constructions in [14] and [29]. For the remainder of this section, we explore the properties of the set of Lebesgue measurable subsets of R.
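To see the measurability condition of Definition 9.4 at work without any set-theoretic machinery, the following sketch checks it by brute force for a toy outer measure on a three-point set. The set `X`, the function `outer` and its values are assumptions made for this illustration only; this is not Lebesgue outer measure, but it is monotone and subadditive, and it shows how splitting a set can produce pieces whose outer measures add up to more than the outer measure of the whole.

```python
from itertools import combinations

X = frozenset({0, 1, 2})

def subsets(base):
    """All subsets of a finite set, ordered by size."""
    items = sorted(base)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def outer(E):
    """Toy outer measure: 0 for the empty set, 2 for X, 1 otherwise."""
    if not E:
        return 0
    return 2 if E == X else 1

def is_measurable(S):
    """Check outer(T) == outer(S & T) + outer(S' & T) for every test set T."""
    complement = X - S
    return all(outer(T) == outer(S & T) + outer(complement & T)
               for T in subsets(X))

measurable = [set(S) for S in subsets(X) if is_measurable(S)]
print(measurable)  # [set(), {0, 1, 2}]
```

For instance, the test set {0, 1} has toy outer measure 1, but its pieces {0} and {1} have outer measures adding up to 2, so {0} fails the condition.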
Definition 9.5 Let $S \subseteq \mathbb{R}$ be a Lebesgue measurable set. Then the outer Lebesgue measure $\lambda(S)$ of $S$ is also called the Lebesgue measure of $S$.

We first note that “half” of the definition of measurability is always satisfied.

Corollary 9.6 For all subsets $S, T \subseteq \mathbb{R}$, we have $\lambda(T) \le \lambda(S \cap T) + \lambda(S' \cap T)$.

Proof. This follows from part 3 of Theorem 8.6 with $A_1 := S \cap T$, $A_2 := S' \cap T$ and $A_n := \emptyset$ for $n \ge 3$. ∎

It is now easy to see that null sets are Lebesgue measurable.

Proposition 9.7 If $\lambda(S) = 0$, then $S$ is Lebesgue measurable.

Proof. Let $S$ be a null set. By part 2 of Theorem 8.6 for all sets $T \subseteq \mathbb{R}$ we obtain $0 \le \lambda(S \cap T) \le \lambda(S) = 0$. Hence, for all subsets $T \subseteq \mathbb{R}$ we have the inequality $\lambda(S \cap T) + \lambda(S' \cap T) \le \lambda(S' \cap T) \le \lambda(T)$, which by Corollary 9.6 is all we need to establish Lebesgue measurability of $S$. ∎

In the following, we establish that certain set theoretical operations preserve measurability. Theorem 9.10 summarizes the most important facts. Although there are more results, the properties listed in Theorem 9.10 suffice for our purposes. Set systems that satisfy these properties are called $\sigma$-algebras (see Definition 14.1). These set systems are fundamental for measure theory.
Lemma 9.8 If $A$ and $B$ are Lebesgue measurable sets, then the intersection $A \cap B$ is Lebesgue measurable.

Proof. Proofs of Lebesgue measurability typically involve the appropriate rewriting of terms and the use of the right test sets. To show that $A \cap B$ is Lebesgue measurable, let $T \subseteq \mathbb{R}$ be any subset of $\mathbb{R}$. Then
\begin{align*}
\lambda((A \cap B) \cap T) + \lambda((A \cap B)' \cap T)
&= \lambda(A \cap B \cap T) + \lambda((A' \cup B') \cap T) \\
&= \lambda(A \cap B \cap T) + \lambda(A \cap (A' \cup B') \cap T) + \lambda(A' \cap (A' \cup B') \cap T) \\
&= \lambda(B \cap A \cap T) + \lambda(B' \cap A \cap T) + \lambda(A' \cap T) \\
&= \lambda(A \cap T) + \lambda(A' \cap T) \\
&= \lambda(T).
\end{align*}
Because $T$ was an arbitrary subset of $\mathbb{R}$ we have proved that $A \cap B$ is Lebesgue measurable. ∎
The next lemma will be useful in two ways. It is a step toward proving that countable unions of Lebesgue measurable sets are Lebesgue measurable. It also is a step toward proving that if a Lebesgue measurable set consists of countably many pairwise disjoint Lebesgue measurable pieces, then the Lebesgue measure of the set is the sum of the Lebesgue measures of the pieces.
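Lemma 9.9 Let $\{A_n\}_{n=1}^{\infty}$ be a sequence of pairwise disjoint Lebesgue measurable sets. Then the union $\bigcup_{n=1}^{\infty} A_n$ is Lebesgue measurable and for all $T \subseteq \mathbb{R}$ we have
\[ \lambda(T) = \sum_{n=1}^{\infty} \lambda(A_n \cap T) + \lambda\left( \left( \bigcup_{n=1}^{\infty} A_n \right)' \cap T \right). \]

Proof. Let $T \subseteq \mathbb{R}$. We first prove by induction that for all $k \in \mathbb{N}$ we have that
\[ \lambda(T) = \sum_{n=1}^{k} \lambda(A_n \cap T) + \lambda\left( \left( \bigcup_{n=1}^{k} A_n \right)' \cap T \right). \]
The base step with $k = 1$ follows from the Lebesgue measurability of $A_1$. For the induction step $k \to (k+1)$, we can assume that the induction hypothesis (the displayed equation for $k$) is true, and we need to prove that
\[ \lambda(T) = \sum_{n=1}^{k+1} \lambda(A_n \cap T) + \lambda\left( \left( \bigcup_{n=1}^{k+1} A_n \right)' \cap T \right). \]
We start the induction step with the induction hypothesis. Because $A_{k+1}$ is Lebesgue measurable and disjoint from $A_1, \ldots, A_k$, we obtain
\begin{align*}
\lambda(T)
&= \sum_{n=1}^{k} \lambda(A_n \cap T) + \lambda\left( \left( \bigcup_{n=1}^{k} A_n \right)' \cap T \right) \\
&= \sum_{n=1}^{k} \lambda(A_n \cap T) + \lambda\left( A_{k+1} \cap \left( \bigcup_{n=1}^{k} A_n \right)' \cap T \right) + \lambda\left( A_{k+1}' \cap \left( \bigcup_{n=1}^{k} A_n \right)' \cap T \right) \\
&= \sum_{n=1}^{k} \lambda(A_n \cap T) + \lambda(A_{k+1} \cap T) + \lambda\left( \left( \bigcup_{n=1}^{k+1} A_n \right)' \cap T \right) \\
&= \sum_{n=1}^{k+1} \lambda(A_n \cap T) + \lambda\left( \left( \bigcup_{n=1}^{k+1} A_n \right)' \cap T \right),
\end{align*}
which finishes the induction step.

Thus for all $k \in \mathbb{N}$ we have
\[ \lambda(T) \ge \sum_{n=1}^{k} \lambda(A_n \cap T) + \lambda\left( \left( \bigcup_{n=1}^{\infty} A_n \right)' \cap T \right), \]
and letting $k$ go to infinity we obtain the following.
\begin{align*}
\lambda(T)
&\ge \sum_{n=1}^{\infty} \lambda(A_n \cap T) + \lambda\left( \left( \bigcup_{n=1}^{\infty} A_n \right)' \cap T \right) \\
&\ge \lambda\left( \left( \bigcup_{n=1}^{\infty} A_n \right) \cap T \right) + \lambda\left( \left( \bigcup_{n=1}^{\infty} A_n \right)' \cap T \right) \\
&\ge \lambda(T).
\end{align*}
The above establishes measurability of the union as well as the desired equality. ∎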
Theorem 9.10 The set $\Sigma_\lambda$ of Lebesgue measurable subsets of $\mathbb{R}$ has the following properties.

1. $\emptyset \in \Sigma_\lambda$.

2. If $S \in \Sigma_\lambda$, then $S' \in \Sigma_\lambda$.

3. If $A_n \in \Sigma_\lambda$ for all $n \in \mathbb{N}$, then $\bigcup_{n=1}^{\infty} A_n \in \Sigma_\lambda$.

Proof. Parts 1 and 2 are left to the reader as Exercises 9-1a and 9-1b. For part 3, let $A_n \in \Sigma_\lambda$ for all $n \in \mathbb{N}$ and let $T \subseteq \mathbb{R}$. Define $B_1 := A_1$ and then inductively for $n \in \mathbb{N}$ set
\[ B_{n+1} := A_{n+1} \cap \left( \bigcup_{i=1}^{n} B_i \right)' = A_{n+1} \cap \bigcap_{i=1}^{n} B_i'. \]
An easy induction shows that for all $n \in \mathbb{N}$ the set $B_n$ is Lebesgue measurable (use part 2 and Lemma 9.8), that $B_1, \ldots, B_n$ are pairwise disjoint and that $\bigcup_{i=1}^{n} A_i = \bigcup_{i=1}^{n} B_i$ (see Exercise 9-1c). But then $\bigcup_{i=1}^{\infty} A_i = \bigcup_{i=1}^{\infty} B_i$ (see Exercise 9-1d) and the latter set is Lebesgue measurable by Lemma 9.9. ∎

Now that we have established that countable unions preserve Lebesgue measurability, it is reassuring to note that Lebesgue measure is additive for countable unions of pairwise disjoint Lebesgue measurable sets. This is exactly what we expect from a sensible measure. The size of a whole set is the sum of the sizes of its pairwise disjoint parts.
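The construction in the proof of Theorem 9.10, which turns an arbitrary sequence of sets into a pairwise disjoint sequence with the same unions, can be tried out on finite sets. The sketch below is only an illustration with Python sets; the function name `disjointify` and the sample sets are made up for this purpose.

```python
def disjointify(sets):
    """Return B_1, B_2, ... with B_1 = A_1 and
    B_{n+1} = A_{n+1} minus (B_1 | ... | B_n)."""
    pieces, seen = [], set()
    for A in sets:
        pieces.append(set(A) - seen)  # keep only the points not covered so far
        seen |= set(A)                # union of A_1..A_n equals union of B_1..B_n
    return pieces

A = [{1, 2, 3}, {3, 4}, {1, 4, 5}]
B = disjointify(A)
print(B)  # [{1, 2, 3}, {4}, {5}]
```

Each partial union of the pieces agrees with the corresponding partial union of the original sets, which is the property used in Exercises 9-1c and 9-1d.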
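Theorem 9.11 Let $\{A_n\}_{n=1}^{\infty}$ be a sequence of pairwise disjoint Lebesgue measurable sets. Then $\bigcup_{n=1}^{\infty} A_n$ is Lebesgue measurable and
\[ \lambda\left( \bigcup_{n=1}^{\infty} A_n \right) = \sum_{n=1}^{\infty} \lambda(A_n). \]

Proof. The union is Lebesgue measurable by Lemma 9.9 and if we apply Lemma 9.9 to $T := \bigcup_{n=1}^{\infty} A_n$, we obtain
\[ \lambda\left( \bigcup_{n=1}^{\infty} A_n \right) = \sum_{n=1}^{\infty} \lambda\left( A_n \cap \bigcup_{m=1}^{\infty} A_m \right) + \lambda\left( \left( \bigcup_{n=1}^{\infty} A_n \right)' \cap \bigcup_{n=1}^{\infty} A_n \right) = \sum_{n=1}^{\infty} \lambda(A_n) + \lambda(\emptyset) = \sum_{n=1}^{\infty} \lambda(A_n), \]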
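which proves the result. ∎

The next result is a nice application of the countable additivity we just proved and it will also be needed when we show that sums of Lebesgue integrable functions are Lebesgue integrable.

Theorem 9.12 Let $\{A_n\}_{n=1}^{\infty}$ be a sequence of Lebesgue measurable subsets of $\mathbb{R}$ so that $A_n \subseteq A_{n+1}$ for all $n \in \mathbb{N}$. Then
\[ \lambda\left( \bigcup_{n=1}^{\infty} A_n \right) = \lim_{n \to \infty} \lambda(A_n). \]

Proof. Set $B_1 := A_1$ and for all $n \in \mathbb{N}$ define $B_{n+1} := A_{n+1} \setminus A_n$. The sets $B_n$ are Lebesgue measurable and pairwise disjoint. Then for all $N \in \mathbb{N}$ the equality $\bigcup_{n=1}^{N} B_n = \bigcup_{n=1}^{N} A_n = A_N$ holds, which means $\bigcup_{n=1}^{\infty} B_n = \bigcup_{n=1}^{\infty} A_n$, so
\[ \lambda\left( \bigcup_{n=1}^{\infty} A_n \right) = \lambda\left( \bigcup_{n=1}^{\infty} B_n \right) = \sum_{n=1}^{\infty} \lambda(B_n) = \lim_{N \to \infty} \sum_{n=1}^{N} \lambda(B_n) = \lim_{N \to \infty} \lambda(A_N). \]
∎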
The whole idea of Lebesgue measurability is only useful if it indeed allows us to extend the idea of Riemann integrability. For that to happen, intervals must be Lebesgue measurable. We have delayed this result to the end of the section, because now we can use some of the machinery built so far.
Proposition 9.13 Intervals are Lebesgue measurable and the Lebesgue measure of an interval is its length.

Proof. Because singletons are null sets and thus Lebesgue measurable, and because countable unions of Lebesgue measurable sets are Lebesgue measurable, intervals are shown to be Lebesgue measurable if we can prove that open intervals of finite length are Lebesgue measurable. So let $A = (a, b)$ be an open interval of finite length and let $T \subseteq \mathbb{R}$. By Corollary 9.6, we only need to prove $\lambda(T) \ge \lambda(A \cap T) + \lambda(A' \cap T)$. The inequality is trivial if $\lambda(T) = \infty$, so we only need to prove the inequality if $\lambda(T) < \infty$. Let $T \subseteq \mathbb{R}$ be a set of finite outer Lebesgue measure, let $\varepsilon > 0$ and let $\{I_j\}_{j=1}^{\infty}$ be a family of open intervals with
\[ T \subseteq \bigcup_{j=1}^{\infty} I_j \quad \text{and} \quad \sum_{j=1}^{\infty} |I_j| \le \lambda(T) + \frac{\varepsilon}{2}. \]
Then $\mathcal{I} := \{ I_j \cap A : j \in \mathbb{N} \}$ is a countable family of open intervals whose union contains $A \cap T$. Moreover,
\[ \mathcal{O} := \{ I_j \setminus [a, \infty) : j \in \mathbb{N} \} \cup \{ I_j \setminus (-\infty, b] : j \in \mathbb{N} \} \cup \left\{ \left( a - \frac{\varepsilon}{8}, a + \frac{\varepsilon}{8} \right), \left( b - \frac{\varepsilon}{8}, b + \frac{\varepsilon}{8} \right) \right\} \]
is a countable family of open intervals whose union contains $A' \cap T$. Thus
\begin{align*}
\lambda(A \cap T) + \lambda(A' \cap T)
&\le \sum_{j=1}^{\infty} |I_j \cap A| + \sum_{j=1}^{\infty} |I_j \setminus [a, \infty)| + \sum_{j=1}^{\infty} |I_j \setminus (-\infty, b]| + \frac{\varepsilon}{4} + \frac{\varepsilon}{4} \\
&\overset{(*)}{\le} \sum_{j=1}^{\infty} |I_j| + \frac{\varepsilon}{2} \\
&\le \lambda(T) + \varepsilon.
\end{align*}
Because $\varepsilon$ was arbitrary this proves that $\lambda(T) \ge \lambda(A \cap T) + \lambda(A' \cap T)$ and we have proved that $A$ is Lebesgue measurable.

Regarding the Lebesgue measure of intervals, by Proposition 8.5 for closed intervals in $\mathbb{R}$ we have $\lambda([a, b]) = b - a$. For closed, unbounded intervals of the form $[a, \infty)$, we infer for all $b > a$ that $\lambda([a, \infty)) \ge \lambda([a, b]) = b - a$, and hence $\lambda([a, \infty)) = \infty$. Intervals of the form $(-\infty, b]$ are handled similarly. For open and half-open intervals, note that the singleton sets consisting of the endpoints have measure zero. This means (see Exercise 9-2) that adding or removing these points does not affect the Lebesgue measurability of the set or its Lebesgue measure. ∎
Standard Proof Technique 9.14 Note that the inequality marked with (*) in the proof of Proposition 9.13 can actually be shown to be an equality. This is not necessary because we only need the inequality. In complicated estimates, it can happen that an inequality sign is put between quantities that are actually equal. Usually, this happens when the equality would not have helped in the proof (as in the example just mentioned) and when the writer did not want (the reader) to spend extra effort to think about why the quantities may be equal.
Exercises

9-1. Finish the proof of Theorem 9.10. That is,
(a) Prove that $\emptyset \in \Sigma_\lambda$.
(b) Prove that if $S \in \Sigma_\lambda$, then $S' \in \Sigma_\lambda$.
(c) Perform the induction mentioned in the proof of part 3.
(d) Prove that if $\{A_i\}_{i=1}^{\infty}$ and $\{B_i\}_{i=1}^{\infty}$ are countable families of sets so that for all $n \in \mathbb{N}$ we have $\bigcup_{i=1}^{n} A_i = \bigcup_{i=1}^{n} B_i$, then $\bigcup_{i=1}^{\infty} A_i = \bigcup_{i=1}^{\infty} B_i$.

9-2. Let $A$ be a Lebesgue measurable set and let $N$ be a null set. Prove that $A \setminus N$ is Lebesgue measurable and that $\lambda(A \setminus N) = \lambda(A)$. Hint. Use Theorem 9.10.

9-3. Let $A, B \subseteq \mathbb{R}$ be Lebesgue measurable sets. Prove that $A \setminus B$ is Lebesgue measurable.

9-4. Let $\{A_n\}_{n=1}^{N}$ be a finite sequence of pairwise disjoint Lebesgue measurable sets. Prove that the union $\bigcup_{n=1}^{N} A_n$ is Lebesgue measurable with $\lambda\left( \bigcup_{n=1}^{N} A_n \right) = \sum_{n=1}^{N} \lambda(A_n)$.

9-5. Let $\{A_n\}_{n=1}^{\infty}$ be a sequence of Lebesgue measurable sets. Prove that $\bigcap_{n=1}^{\infty} A_n$ is Lebesgue measurable and that for all $k \in \mathbb{N}$ we have $\lambda\left( \bigcap_{n=1}^{\infty} A_n \right) \le \lambda(A_k)$.

9-6. Let $C^Q$ be a Cantor set. Prove that $C^Q$ is Lebesgue measurable. Hint. Use Exercise 9-5.

9-7. For all $S \subseteq [0, 1]$, let $J(S) := \inf \left\{ \sum_{j=1}^{n} |I_j| : S \subseteq \bigcup_{j=1}^{n} I_j, \text{ each } I_j \text{ an open interval} \right\}$ be the Jordan content of $S$. Prove that $J([0, 1] \cap \mathbb{Q}) = 1$ and $J([0, 1] \setminus \mathbb{Q}) = 1$.

9-8. Let $A \subseteq B \subseteq \mathbb{R}$ and let $A$ (but not necessarily $B$) be Lebesgue measurable. Prove that $\lambda(B) = \lambda(A) + \lambda(B \setminus A)$.
9.2 Lebesgue Measurable Functions

We now introduce the functions for which the Lebesgue integral can be defined. By first defining what (potentially) integrable functions should look like, we avoid the Riemann integral’s conceptual complications that are characterized in Lebesgue’s criterion (see Theorem 8.12). Existence or nonexistence of the Lebesgue integral, defined in Section 9.3, will then merely be a question of whether there is too much area under the graph of the function. We will revisit the original motivation of partitioning the y-axis after the proof of Theorem 9.19.

To approximate areas, in Lebesgue integration indicator functions take the place of rectangles. Recall that by Definition 5.9 the indicator function of a set $S \subseteq \mathbb{R}$ is
\[ 1_S(x) := \begin{cases} 1; & \text{for } x \in S, \\ 0; & \text{for } x \notin S. \end{cases} \]
Just as Riemann integrable functions can be approximated a.e. with step functions (see Exercise 5-26), Lebesgue integrable functions will be approximated with functions that are constant on measurable sets and which only assume finitely many values.
Definition 9.15 A function $s : \mathbb{R} \to \mathbb{R}$ is called a simple Lebesgue measurable function, or, a simple function, iff there are numbers $a_1, \ldots, a_n \in \mathbb{R}$ and pairwise disjoint Lebesgue measurable sets $A_1, \ldots, A_n \subseteq \mathbb{R}$ so that $s = \sum_{k=1}^{n} a_k 1_{A_k}$.

For functions $f$ that assume more than finitely many values, we consider the positive and negative parts of $f$ separately.

Definition 9.16 For $f : \mathbb{R} \to [-\infty, \infty]$, we define $f^+(x) := \max\{ f(x), 0 \}$ and $f^-(x) := -\min\{ f(x), 0 \}$ for all $x \in \mathbb{R}$.

Because we will successively approximate measurable functions from below we want to speak of sequences of functions.

Definition 9.17 A family $\{f_n\}_{n \in \mathbb{N}}$ of functions will also be called a sequence of functions, denoted $\{f_n\}_{n=1}^{\infty}$.

Definition 9.18 A function $f : \mathbb{R} \to [0, \infty]$ is called Lebesgue measurable iff there is a sequence $\{s_n\}_{n=1}^{\infty}$ of simple functions $s_n : \mathbb{R} \to [0, \infty)$ such that for all $x \in \mathbb{R}$ the sequence $\{s_n(x)\}_{n=1}^{\infty}$ is nondecreasing and $\lim_{n \to \infty} s_n(x) = f(x)$. A function $f : \mathbb{R} \to [-\infty, \infty]$ is called Lebesgue measurable iff $f^+$ and $f^-$ are both Lebesgue measurable.
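Definition 9.18 asks for a nondecreasing sequence of simple functions that converges to $f$ pointwise. One standard way to produce such a sequence for a nonnegative function, sketched below, rounds $f(x)$ down to the nearest multiple of $2^{-n}$ and caps the result at $n$. This is an assumption-laden illustration rather than the text's own construction: the name `dyadic_approx` is hypothetical, and the code only evaluates the approximations pointwise without checking measurability of the level sets.

```python
import math

def dyadic_approx(f, n):
    """Return s_n with s_n(x) = n if f(x) >= n, and otherwise
    f(x) rounded down to a multiple of 2**-n (f is assumed nonnegative)."""
    def s_n(x):
        y = f(x)
        if y >= n:
            return float(n)
        return math.floor(y * 2**n) / 2**n
    return s_n

f = lambda x: x * x
for n in (1, 2, 3):
    print(n, dyadic_approx(f, n)(1.3))
# 1 1.0
# 2 1.5
# 3 1.625
```

Each $s_n$ takes only finitely many values, the values at a fixed point never decrease as $n$ grows, and once $n$ exceeds $f(x)$ the error is smaller than $2^{-n}$.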
The key problem in Riemann integration is that for some functions the approximations from above and below will not “meet.” Definition 9.18 does not simply circumvent this problem by only focusing on approximations from below. Exercise 9-9 shows that a bounded function (only bounded functions are considered in Riemann integration) is Lebesgue measurable iff it can be approximated from above with simple functions. That is, Definition 9.18 may look biased, but for bounded functions the concept of Lebesgue measurability could also be defined with approximations from above. Because we are also interested in unbounded functions, we choose to work with approximations from below throughout. Because Lebesgue measurability is a key concept, it is useful to have several equivalent formulations available.
Theorem 9.19 Let $f : \mathbb{R} \to [-\infty, \infty]$ be a function. Then the following are equivalent.

1. The function $f$ is Lebesgue measurable.

2. For all $a \in \mathbb{R}$, the set $\{ x \in \mathbb{R} : f(x) > a \}$ is Lebesgue measurable.

3. For all $a \in \mathbb{R}$, the set $\{ x \in \mathbb{R} : f(x) \le a \}$ is Lebesgue measurable.

4. For all $a \in \mathbb{R}$, the set $\{ x \in \mathbb{R} : f(x) < a \}$ is Lebesgue measurable.

5. For all $a \in \mathbb{R}$, the set $\{ x \in \mathbb{R} : f(x) \ge a \}$ is Lebesgue measurable.
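Proof. We first prove the result for a function $f : \mathbb{R} \to [0, \infty]$. For the implication “1⇒2,” let $a \in \mathbb{R}$ and let $\{s_n\}_{n=1}^{\infty}$ be a sequence of simple functions such that for all $x \in \mathbb{R}$ the sequence $\{s_n(x)\}_{n=1}^{\infty}$ is nondecreasing and converges to $f(x)$. For all $x \in \mathbb{R}$, if $f(x) > a$, then for some $n \in \mathbb{N}$ the inequality $s_n(x) > a$ holds. Conversely, if for some $n \in \mathbb{N}$ we have that $s_n(x) > a$, then because $\{s_n(x)\}_{n=1}^{\infty}$ is nondecreasing we must have $f(x) > a$. This means that
\[ \{ x \in \mathbb{R} : f(x) > a \} = \bigcup_{n=1}^{\infty} \{ x \in \mathbb{R} : s_n(x) > a \}. \]
But each set $\{ x \in \mathbb{R} : s_n(x) > a \}$ is a union of finitely many Lebesgue measurable sets, which means it is Lebesgue measurable. Therefore, as a countable union of Lebesgue measurable sets, the set $\{ x : f(x) > a \}$ is Lebesgue measurable.

For “2⇒3,” let $a \in \mathbb{R}$. Then $\{ x \in \mathbb{R} : f(x) \le a \} = \mathbb{R} \setminus \{ x \in \mathbb{R} : f(x) > a \}$, which is the complement of a Lebesgue measurable set. For “3⇒4,” note that for all real numbers $a$ the set $\{ x \in \mathbb{R} : f(x) < a \}$ is equal to the union $\bigcup_{n=1}^{\infty} \left\{ x \in \mathbb{R} : f(x) \le a - \frac{1}{n} \right\}$. For “4⇒5,” note that $\{ x \in \mathbb{R} : f(x) \ge a \} = \mathbb{R} \setminus \{ x \in \mathbb{R} : f(x) < a \}$.

For “5⇒1,” for $n \in \mathbb{N}$ and $k \in \{1, \ldots, n2^n\}$ let $A_k^n := \left\{ x \in \mathbb{R} : \frac{k-1}{2^n} \le f(x) < \frac{k}{2^n} \right\}$ and let $A_\infty^n := \{ x \in \mathbb{R} : f(x) \ge n \}$. By part 5, Theorem 9.10 and Lemma 9.8, these sets are Lebesgue measurable, so $s_n := \sum_{k=1}^{n2^n} \frac{k-1}{2^n} 1_{A_k^n} + n 1_{A_\infty^n}$ is a simple function. For every $x \in \mathbb{R}$ the sequence $\{s_n(x)\}_{n=1}^{\infty}$ is nondecreasing and it converges to $f(x)$: if $f(x) < n$, then $0 \le f(x) - s_n(x) < \frac{1}{2^n}$, and if $f(x) = \infty$, then $s_n(x) = n$ for all $n \in \mathbb{N}$.

Now let $f : \mathbb{R} \to [-\infty, \infty]$ be arbitrary. For “1⇒2,” note that for $a \ge 0$ we have $\{ x \in \mathbb{R} : f(x) > a \} = \{ x \in \mathbb{R} : f^+(x) > a \}$, which is Lebesgue measurable, and for $a < 0$ we have $\{ x \in \mathbb{R} : f(x) > a \} = \{ x \in \mathbb{R} : f^-(x) < -a \}$, which is also Lebesgue measurable. Parts “2⇒3,” “3⇒4,” and “4⇒5” are similar to what was done for nonnegative functions. To prove part “5⇒1,” first note that for all $a > 0$ we have $\{ x \in \mathbb{R} : f^+(x) \ge a \} = \{ x \in \mathbb{R} : f(x) \ge a \}$, which is Lebesgue measurable and for all $a \le 0$ we have $\{ x \in \mathbb{R} : f^+(x) \ge a \} = \mathbb{R}$, which is also Lebesgue measurable. Hence, $f^+$ is Lebesgue measurable. Considering the negative part $f^-$, for all $a \ge 0$ we have that $\{ x \in \mathbb{R} : f^-(x) \le a \} = \{ x \in \mathbb{R} : f(x) \ge -a \}$, which is Lebesgue measurable and for $a < 0$ we have $\{ x \in \mathbb{R} : f^-(x) \le a \} = \emptyset$, which is also Lebesgue measurable. Hence, $f^-$ is Lebesgue measurable and because we already proved that $f^+$ is Lebesgue measurable we have proved that $f$ is Lebesgue measurable. ∎

The underlying idea of the proof of part “5⇒1” for nonnegative functions $f$ is to partition the interval $[0, n)$ on the y-axis into intervals of length $\frac{1}{2^n}$. The proof shows that the area under the functions that are used to approximate $f$ should approximate the area under $f$, which means the idea of partitioning the y-axis can lead to a sensible notion of integration.

Once measurable functions are characterized, it is helpful to determine how measurability relates to common algebraic operations.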
Theorem 9.20 Let $f, g : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue measurable functions. If $f + g$ is defined everywhere, then $f + g$ is Lebesgue measurable. Similarly, if $f - g$ or $f \cdot g$ is defined everywhere, then it is Lebesgue measurable. Moreover, $f^+$, $f^-$ and $|f|$ are Lebesgue measurable.

Proof. To see that $f + g$ is Lebesgue measurable, let $a \in \mathbb{R}$. We will use Theorem 1.36 to show that the set $\{ x \in \mathbb{R} : (f + g)(x) < a \}$ is a countable union of Lebesgue measurable sets. If $(f + g)(x) < a$, then there is an $\varepsilon > 0$ so that $(f + g)(x) + 2\varepsilon < a$. By Theorem 1.36, there are rational numbers $r$ and $s$ so that $f(x) < r < f(x) + \varepsilon$ and $g(x) < s < g(x) + \varepsilon$. This means that $f(x) + g(x) < r + s < (f + g)(x) + 2\varepsilon < a$, which proves the containment “⊆” in the equation below. The containment “⊇” is trivial.
\[ \{ x \in \mathbb{R} : (f + g)(x) < a \} = \bigcup_{\substack{r, s \in \mathbb{Q} \\ r + s < a}} \left( \{ x \in \mathbb{R} : f(x) < r \} \cap \{ x \in \mathbb{R} : g(x) < s \} \right) \]
Because the latter set is a countable union of Lebesgue measurable sets, the set $\{ x \in \mathbb{R} : (f + g)(x) < a \}$ is Lebesgue measurable. Because $a \in \mathbb{R}$ was arbitrary this means that $f + g$ is a Lebesgue measurable function. The proofs that $f - g$ and $f \cdot g$ are Lebesgue measurable functions are similar (see Exercise 9-11). The functions $f^+$ and $f^-$ are Lebesgue measurable by Definition 9.18 and the Lebesgue measurability of $|f| = f^+ + f^-$ follows from the Lebesgue measurability of sums of Lebesgue measurable functions. ∎
Exercises

9-9. Prove that a bounded function $f : \mathbb{R} \to [0, \infty)$ is Lebesgue measurable iff there is a sequence $\{s_n\}_{n=1}^{\infty}$ of simple functions $s_n : \mathbb{R} \to \mathbb{R}$ such that for all $x \in \mathbb{R}$ the sequence $\{s_n(x)\}_{n=1}^{\infty}$ is nonincreasing and $\lim_{n \to \infty} s_n(x) = f(x)$. Hint. Mimic part “5⇒1” of the proof of Theorem 9.19 for nonnegative functions.

9-10. Let $f : \mathbb{R} \to [-\infty, \infty]$ be a Lebesgue measurable function. Use part 4 of Theorem 9.19 to prove that $\{ x \in \mathbb{R} : f(x) = \infty \}$ is Lebesgue measurable.

9-11. Finish the proof of Theorem 9.20. That is,
(a) Prove that if $f, g : \mathbb{R} \to [-\infty, \infty]$ are Lebesgue measurable and $f - g$ is defined everywhere, then $f - g$ is Lebesgue measurable.
(b) Prove that if $f, g : \mathbb{R} \to [-\infty, \infty]$ are Lebesgue measurable and $f \cdot g$ is defined everywhere, then $f \cdot g$ is Lebesgue measurable. Hint. This one is complicated because of negative signs. Prove the result first for $f, g \ge 0$, then use $f = f^+ - f^-$ and $g = g^+ - g^-$.

9-12. Prove that the sum of two simple functions is again a simple function.

9-13. Prove that $f : \mathbb{R} \to [-\infty, \infty]$ is Lebesgue measurable iff for any two numbers $a < b$ in $\mathbb{R}$ the set $\{ x \in \mathbb{R} : f(x) \in [a, b) \}$ is Lebesgue measurable.

9-14. Use Definition 9.18 to prove that if $f, g : \mathbb{R} \to [0, \infty]$ are Lebesgue measurable functions, then $f + g$ is Lebesgue measurable.

9-15. Let $f, h : \mathbb{R} \to [-\infty, \infty]$ and let $f$ be Lebesgue measurable. Prove that if $f = h$ a.e., then $h$ is Lebesgue measurable.

9-16. Let $f, g : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue measurable functions.
(a) Prove that if $f(x) + g(x)$ is defined almost everywhere, then $(f + g)(x) := \begin{cases} f(x) + g(x); & \text{if } f(x) + g(x) \text{ is defined}, \\ 0; & \text{otherwise}, \end{cases}$ is Lebesgue measurable. Hint. Apply Exercise 9-15 to the right auxiliary functions and then use Theorem 9.20.
(b) Prove that if $f(x) - g(x)$ is defined almost everywhere, then $(f - g)(x) := \begin{cases} f(x) - g(x); & \text{if } f(x) - g(x) \text{ is defined}, \\ 0; & \text{otherwise}, \end{cases}$ is Lebesgue measurable.
(c) Prove that if $f(x) g(x)$ is defined almost everywhere, then $(fg)(x) := \begin{cases} f(x) g(x); & \text{if } f(x) g(x) \text{ is defined}, \\ 0; & \text{otherwise}, \end{cases}$ is Lebesgue measurable.

9-17. Let $f, g : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue measurable functions.
(a) Prove that $\{ x \in \mathbb{R} : f(x) = g(x) \}$ is Lebesgue measurable.
(b) Prove that $\{ x \in \mathbb{R} : f(x) \le g(x) \}$ is Lebesgue measurable.
(c) Prove that $\{ x \in \mathbb{R} : f(x) < g(x) \}$ is Lebesgue measurable.

9-18. Let $f, g : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue measurable functions.
(a) Prove that $\max(f, g)$ (defined pointwise) is Lebesgue measurable.
(b) Prove that $\min(f, g)$ (defined pointwise) is Lebesgue measurable.

9-19. Let $f : \mathbb{R} \to \mathbb{R}$ be a nondecreasing function. Prove that $f$ is Lebesgue measurable.
9.3 Lebesgue Integration

Independent of whether the base is an interval or a potentially more scattered Lebesgue measurable set, the area of a “rectangle” should be the measure of the base times the height. This idea is behind the Lebesgue integral of simple functions.
Definition 9.21 Let $A_1, \ldots, A_n \subseteq \mathbb{R}$ be pairwise disjoint Lebesgue measurable sets, let $a_1, \ldots, a_n \in [0, \infty)$ be nonnegative numbers and let $s = \sum_{k=1}^{n} a_k 1_{A_k}$ be a simple function. We define the Lebesgue integral of $s$ by
\[ \int_{\mathbb{R}} s \, d\lambda = \int_{\mathbb{R}} \sum_{k=1}^{n} a_k 1_{A_k} \, d\lambda := \sum_{k=1}^{n} a_k \lambda(A_k). \]

By Exercise 9-20a for any given simple function $s$ the value $\sum_{k=1}^{n} a_k \lambda(A_k)$ does not depend on the representation $s = \sum_{k=1}^{n} a_k 1_{A_k}$ that was chosen for $s$. Hence, Definition 9.21 is sensible and we can proceed to more general functions. For a more general function, the Lebesgue integral is defined by approximating the area under the function from below with the area under simple functions.
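The formula of Definition 9.21 is easy to evaluate when every base set is a finite union of disjoint bounded intervals, because the Lebesgue measure of such a set is the sum of the interval lengths (see Proposition 9.13). The sketch below makes exactly that simplifying assumption; the representation of sets as lists of `(lo, hi)` pairs and the function names are hypothetical choices for this illustration.

```python
def measure(intervals):
    """Lebesgue measure of a finite union of pairwise disjoint
    bounded intervals, each given as a pair (lo, hi)."""
    return sum(hi - lo for lo, hi in intervals)

def integrate_simple(terms):
    """Integral of s = sum of a_k * 1_{A_k}, given as pairs (a_k, A_k)
    with pairwise disjoint base sets A_k."""
    return sum(a * measure(A) for a, A in terms)

# s = 2 on [0, 1) and s = 5 on the scattered set [1, 3) united with [4, 5)
s = [(2.0, [(0.0, 1.0)]), (5.0, [(1.0, 3.0), (4.0, 5.0)])]
print(integrate_simple(s))  # 17.0
```

The second base set is not an interval, yet the "height times measure of the base" rule applies to it without change.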
Definition 9.22 Let $f : \mathbb{R} \to [0, \infty]$ be a Lebesgue measurable function. We define the Lebesgue integral of $f$ to be
\[ \int_{\mathbb{R}} f \, d\lambda := \sup \left\{ \int_{\mathbb{R}} s \, d\lambda : s \text{ is a simple function with } 0 \le s \le f \right\} \]
and we will call $f$ Lebesgue integrable iff the supremum is finite. A function $g : \mathbb{R} \to [-\infty, \infty]$ will be called Lebesgue integrable iff $g^+$ and $g^-$ are both Lebesgue integrable. We set
\[ \int_{\mathbb{R}} g \, d\lambda := \int_{\mathbb{R}} g^+ \, d\lambda - \int_{\mathbb{R}} g^- \, d\lambda \]
and call it the Lebesgue integral of $g$.

Continuing our comparison with the Riemann integral, Exercise 9-21 guarantees that for bounded functions that differ from zero only on a set of finite Lebesgue measure (the Riemann integral is defined for bounded functions on bounded intervals) there also is an approximation from above that will give the value from Definition 9.22. That is, unlike the Riemann/Darboux integral (see Definition 5.27 and Theorem 5.29), for bounded functions that differ from zero on closed and bounded intervals the Lebesgue integral does not have any problems with an upper and a lower approximation not being equal.

As noted after the definition of Lebesgue measurable sets (see Definition 9.4), nonmeasurable sets are quite hard to come by. Similarly, although we will always need to prove measurability for functions that we want to integrate, nonmeasurable functions are not expected to arise in practical applications. This means that the only possible problem in the definition of the Lebesgue integral is the potential for infinite area under the graph. This is not a problem, because functions whose graphs enclose an infinite area cannot have a finite integral, independent of what notion of integration is used.

Now that we have a sensible notion of integration that (as it ultimately turns out) does not have the weaknesses of the Riemann integral, we can establish some theorems about Lebesgue integrals.
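The supremum in Definition 9.22 can be approached numerically by partitioning the y-axis. The sketch below does this for one concrete function chosen for this illustration, $f(x) = x^2$ on $[0, 1]$ and $f = 0$ elsewhere. Because this $f$ is increasing on $[0, 1]$, each level set $\{ x : \frac{k-1}{2^n} \le f(x) < \frac{k}{2^n} \}$ is an interval whose length can be written down with square roots; that shortcut, like the function name, is an assumption of the example and not a general method.

```python
import math

def lower_integral_of_square(n):
    """Integral of the dyadic simple function s_n <= f for f(x) = x*x
    on [0, 1], using level sets of height step 2**-n on the y-axis."""
    step = 2.0 ** -n
    total = 0.0
    for k in range(1, 2 ** n + 1):
        low, high = (k - 1) * step, k * step
        # {x in [0, 1] : low <= x*x < high} = [sqrt(low), sqrt(high))
        measure = math.sqrt(high) - math.sqrt(low)
        total += low * measure  # height times measure of the base
    return total

for n in (2, 6, 10):
    print(n, lower_integral_of_square(n))
```

The printed values increase with $n$ and stay below $\frac{1}{3}$, the value of the integral, with an error of at most $2^{-n}$.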
Theorem 9.23 Let $f, g : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue measurable functions.

1. If $0 \le f \le g$ a.e. and $g$ is Lebesgue integrable, then $f$ is also Lebesgue integrable and $\int_{\mathbb{R}} f \, d\lambda \le \int_{\mathbb{R}} g \, d\lambda$.

2. $f$ is Lebesgue integrable iff $|f|$ is Lebesgue integrable and in this case the triangular inequality $\left| \int_{\mathbb{R}} f \, d\lambda \right| \le \int_{\mathbb{R}} |f| \, d\lambda$ holds.

3. If $f \ge 0$, then $\int_{\mathbb{R}} f \, d\lambda = 0$ iff $f = 0$ a.e.
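Proof. For part 1, let $N := \{ x \in \mathbb{R} : f(x) > g(x) \}$. By hypothesis, $N$ is a null set. Let $s : \mathbb{R} \to [0, \infty)$ be a simple function with $0 \le s \le f$. Then $s 1_{\mathbb{R} \setminus N} \le g$ is a simple function also and $\int_{\mathbb{R}} s \, d\lambda = \int_{\mathbb{R}} s 1_{\mathbb{R} \setminus N} \, d\lambda$. Hence,
\begin{align*}
\sup \left\{ \int_{\mathbb{R}} s \, d\lambda : s \text{ is a simple function with } 0 \le s \le f \right\}
&= \sup \left\{ \int_{\mathbb{R}} s 1_{\mathbb{R} \setminus N} \, d\lambda : s \text{ is a simple function with } 0 \le s \le f \right\} \\
&\le \sup \left\{ \int_{\mathbb{R}} s \, d\lambda : s \text{ is a simple function with } 0 \le s \le g \right\}.
\end{align*}
Because $g$ is Lebesgue integrable, the latter supremum is finite, and hence $f$ is Lebesgue integrable. Because the suprema are equal to the respective Lebesgue integrals, we conclude that $\int_{\mathbb{R}} f \, d\lambda \le \int_{\mathbb{R}} g \, d\lambda$.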
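For part 2, first note that by Theorem 9.20 the function $|f|$ is Lebesgue measurable. The direction “⇐” of the claim follows straight from part 1, because if $|f|$ is Lebesgue integrable, then $0 \le f^+ \le |f|$ and $0 \le f^- \le |f|$ imply that $f^+$ and $f^-$ are both Lebesgue integrable, which means by Definition 9.22 that $f$ is Lebesgue integrable. For the direction “⇒,” let $f : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue integrable. Then $f^+$ and $f^-$ are Lebesgue integrable. To prove that $|f| = f^+ + f^-$ is Lebesgue integrable let $s = \sum_{k=1}^{n} a_k 1_{A_k}$ be a simple function with $0 \le s \le |f|$. Let $P := \{ x \in \mathbb{R} : f(x) \ge 0 \}$ and $N := \{ x \in \mathbb{R} : f(x) < 0 \}$. Then $P \cup N = \mathbb{R}$ and $P$ and $N$ are disjoint. Therefore $s^+ := \sum_{k=1}^{n} a_k 1_{A_k \cap P}$ and $s^- := \sum_{k=1}^{n} a_k 1_{A_k \cap N}$ are simple functions with $s = s^+ + s^-$, $0 \le s^+ \le f^+$ and $0 \le s^- \le f^-$. But this means that
\[ \int_{\mathbb{R}} s \, d\lambda = \int_{\mathbb{R}} s^+ \, d\lambda + \int_{\mathbb{R}} s^- \, d\lambda \le \int_{\mathbb{R}} f^+ \, d\lambda + \int_{\mathbb{R}} f^- \, d\lambda < \infty. \]
Therefore we conclude that $|f|$ is Lebesgue integrable. Finally, because $f^-, f^+ \le |f|$ we conclude by part 1 that
\[ -\int_{\mathbb{R}} |f| \, d\lambda \le -\int_{\mathbb{R}} f^- \, d\lambda \le \int_{\mathbb{R}} f^+ \, d\lambda - \int_{\mathbb{R}} f^- \, d\lambda \le \int_{\mathbb{R}} f^+ \, d\lambda \le \int_{\mathbb{R}} |f| \, d\lambda, \]
which is the triangular inequality.

For part 3, let $f \ge 0$. First, consider the direction “⇐.” Because $f = 0$ a.e., we obtain $0 \le f \le 0$ a.e. and by part 1 we conclude that $\int_{\mathbb{R}} f \, d\lambda \le \int_{\mathbb{R}} 0 \, d\lambda = 0$, which means $\int_{\mathbb{R}} f \, d\lambda = 0$.

Conversely, for the direction “⇒” let $\int_{\mathbb{R}} f \, d\lambda = 0$ and suppose for a contradiction that $\lambda(\{ x \in \mathbb{R} : f(x) > 0 \}) > 0$. Then, because the countable union of null sets is again a null set and $\{ x \in \mathbb{R} : f(x) > 0 \} = \bigcup_{n=1}^{\infty} \left\{ x \in \mathbb{R} : f(x) > \frac{1}{n} \right\}$, there must be an $n \in \mathbb{N}$ so that $A := \left\{ x \in \mathbb{R} : f(x) > \frac{1}{n} \right\}$ is not a null set. With $s := \frac{1}{n} 1_A$ we obtain
\[ 0 < \frac{1}{n} \lambda(A) = \int_{\mathbb{R}} s \, d\lambda \le \int_{\mathbb{R}} f \, d\lambda = 0 \]
by part 1, a contradiction. Therefore we conclude that $f = 0$ a.e. ∎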
Exercise 9-15 shows that if a function is equal a.e. to a Lebesgue measurable function, then it must be Lebesgue measurable, too. Part 1 of Theorem 9.23 can be used to show that if two Lebesgue measurable functions are equal a.e., then their Lebesgue integrals must be equal, too (see Exercise 9-26). Basically this means that for integration null sets are insignificant. The following definition is therefore sensible because independent of how we extend the function $f$ to all of $\mathbb{R}$, either all extensions are Lebesgue measurable or none of them are (because for all $a > 0$ the set $\{ x \in \mathbb{R} : g(x) < a \}$ with $g$ as in Definition 9.24 below differs from $\{ x \in \mathbb{R} : f(x) \text{ exists and } f(x) < a \}$ at most by a null set) and either all extensions are Lebesgue integrable with the same Lebesgue integral or none of them are Lebesgue integrable (by part 1 of Theorem 9.23).
Definition 9.24 If the function $f : \mathbb{R} \to [-\infty, \infty]$ is defined a.e. and the function
\[ g(x) := \begin{cases} f(x); & \text{if } f(x) \text{ is defined}, \\ 0; & \text{if } f(x) \text{ is not defined}, \end{cases} \]
is Lebesgue measurable, then we will call $f$ Lebesgue integrable iff $g$ is Lebesgue integrable and we define the Lebesgue integral of $f$ to be $\int_{\mathbb{R}} f \, d\lambda := \int_{\mathbb{R}} g \, d\lambda$.
Theorem 9.25 shows that the Lebesgue integral is well-behaved with respect to the linear operations of multiplying with a real number and addition. Because of Exercise 9-16a and Definition 9.24 and because the set where the sum $f + g$ is undefined is a null set (see Exercise 9-27), we do not need to place any additional hypotheses on the functions in part 2 of Theorem 9.25.
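Theorem 9.25 Let $f, g : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue integrable functions and let $a \in \mathbb{R}$. Then the following are true:

1. $af$ is Lebesgue integrable and $\int_{\mathbb{R}} af \, d\lambda = a \int_{\mathbb{R}} f \, d\lambda$.

2. $f + g$ is Lebesgue integrable and $\int_{\mathbb{R}} f + g \, d\lambda = \int_{\mathbb{R}} f \, d\lambda + \int_{\mathbb{R}} g \, d\lambda$.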
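Proof. For part 1, note that by Theorem 9.20 with $g(x) = a$ the function $af$ is Lebesgue measurable. If $f \ge 0$ and $a > 0$, then $af$ is Lebesgue integrable because
\begin{align*}
\int_{\mathbb{R}} af \, d\lambda
&= \sup \left\{ \int_{\mathbb{R}} s \, d\lambda : s \text{ is a simple function with } 0 \le s \le af \right\} \\
&= \sup \left\{ \int_{\mathbb{R}} as \, d\lambda : s \text{ is a simple function with } 0 \le as \le af \right\} \\
&= a \sup \left\{ \int_{\mathbb{R}} s \, d\lambda : s \text{ is a simple function with } 0 \le s \le f \right\},
\end{align*}
and the case $a = 0$ is trivial. But this means that for any Lebesgue integrable function $f$ and any $a \ge 0$ the functions $(af)^+ = af^+$ and $(af)^- = af^-$ are Lebesgue integrable and
\[ \int_{\mathbb{R}} af \, d\lambda = \int_{\mathbb{R}} (af)^+ \, d\lambda - \int_{\mathbb{R}} (af)^- \, d\lambda = a \int_{\mathbb{R}} f^+ \, d\lambda - a \int_{\mathbb{R}} f^- \, d\lambda = a \int_{\mathbb{R}} f \, d\lambda. \]
Finally, for any Lebesgue integrable function $f$ and any $a < 0$ the functions $(af)^+ = -af^-$ and $(af)^- = -af^+$ are Lebesgue integrable and
\[ \int_{\mathbb{R}} af \, d\lambda = \int_{\mathbb{R}} (af)^+ \, d\lambda - \int_{\mathbb{R}} (af)^- \, d\lambda = -a \int_{\mathbb{R}} f^- \, d\lambda + a \int_{\mathbb{R}} f^+ \, d\lambda = a \int_{\mathbb{R}} f \, d\lambda. \]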
For part 2 first note that by Exercise 9-16a, Definition 9.24 and the preceding discussion, we can assume that $f + g$ is defined and finite everywhere and that $f + g$ is Lebesgue measurable. (Simply set $f$ and $g$ equal to zero where the sum is not defined.) Also note that part 2 is easily proved for simple functions (Exercise 9-20b), so we can use the additivity of the Lebesgue integral for simple functions in the following.

We will first prove the result for nonnegative $f$ and $g$. To see that $f + g$ is Lebesgue integrable, suppose for a contradiction that $f + g$ is not Lebesgue integrable. Then for each $n \in \mathbb{N}$ there is a simple function $s_n$ with $0 \le s_n \le f + g$ and $\int_{\mathbb{R}} s_n \, d\lambda > n$. Let
\[ F := \left\{ x \in \mathbb{R} : f(x) \ge \frac{f(x) + g(x)}{2} \right\} \quad \text{and} \quad G := \left\{ x \in \mathbb{R} : g(x) \ge \frac{f(x) + g(x)}{2} \right\}. \]
For each $n \in \mathbb{N}$ one of the inequalities $\int_{\mathbb{R}} \frac{1}{2} s_n 1_F \, d\lambda > \frac{n}{4}$ or $\int_{\mathbb{R}} \frac{1}{2} s_n 1_G \, d\lambda > \frac{n}{4}$ holds. Without loss of generality, assume that $\int_{\mathbb{R}} \frac{1}{2} s_n 1_F \, d\lambda > \frac{n}{4}$ holds for infinitely many $n \in \mathbb{N}$. Because $0 \le \frac{1}{2} s_n 1_F \le \frac{f + g}{2} 1_F \le f$ this implies that $f$ is not Lebesgue integrable, which is a contradiction. Hence, $f + g$ must be Lebesgue integrable.

Now if $s_1$ is a simple function with $0 \le s_1 \le f$ and $s_2$ is a simple function with $0 \le s_2 \le g$, then $s_1 + s_2$ is a simple function with $0 \le s_1 + s_2 \le f + g$. This implies
\[ \int_{\mathbb{R}} s_1 \, d\lambda + \int_{\mathbb{R}} s_2 \, d\lambda = \int_{\mathbb{R}} s_1 + s_2 \, d\lambda \le \int_{\mathbb{R}} f + g \, d\lambda \]
and because $s_1, s_2$ were arbitrary, we obtain
\[ \int_{\mathbb{R}} f \, d\lambda + \int_{\mathbb{R}} g \, d\lambda \le \int_{\mathbb{R}} f + g \, d\lambda. \]
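For the reversed inequality, let $C_n := \left\{ x \in \mathbb{R} : \frac{1}{n} \le f(x) + g(x) \le n \right\}$ for each $n \in \mathbb{N}$. We first prove that $\lim_{n\to\infty} \int_{\mathbb{R}} (f + g) \mathbf{1}_{C_n} \, d\lambda = \int_{\mathbb{R}} f + g \, d\lambda$. Let $\varepsilon > 0$ and let $s$ be a simple function so that $0 \le s \le f + g$ and $\int_{\mathbb{R}} s \, d\lambda > \int_{\mathbb{R}} f + g \, d\lambda - \frac{\varepsilon}{2}$. Note that $\bigcup_{n=1}^{\infty} C_n = \{ x \in \mathbb{R} : f(x) + g(x) > 0 \}$. Hence, by Theorem 9.12 for all measurable subsets $A \subseteq \{ x \in \mathbb{R} : f(x) + g(x) > 0 \}$ we obtain $\lim_{n\to\infty} \lambda(A \cap C_n) = \lambda(A)$. With $s = \sum_{j=1}^{m} a_j \mathbf{1}_{A_j}$ and all $a_j > 0$ this implies

$$\lim_{n\to\infty} \int_{\mathbb{R}} s \mathbf{1}_{C_n} \, d\lambda = \lim_{n\to\infty} \sum_{j=1}^{m} a_j \lambda(A_j \cap C_n) = \sum_{j=1}^{m} a_j \lambda(A_j) = \int_{\mathbb{R}} s \, d\lambda.$$

Therefore there is an $n \in \mathbb{N}$ so that the inequality $\int_{\mathbb{R}} s \mathbf{1}_{C_n} \, d\lambda > \int_{\mathbb{R}} s \, d\lambda - \frac{\varepsilon}{2}$ holds. But then $0 \le s \mathbf{1}_{C_n} \le (f + g) \mathbf{1}_{C_n}$, and hence

$$\int_{\mathbb{R}} (f + g) \mathbf{1}_{C_n} \, d\lambda \ge \int_{\mathbb{R}} s \mathbf{1}_{C_n} \, d\lambda > \int_{\mathbb{R}} s \, d\lambda - \frac{\varepsilon}{2} > \int_{\mathbb{R}} f + g \, d\lambda - \varepsilon.$$

Because $f$ and $g$ are both nonnegative, the sequence $\left\{ \int_{\mathbb{R}} (f + g) \mathbf{1}_{C_n} \, d\lambda \right\}_{n=1}^{\infty}$ is nondecreasing and we conclude that $\lim_{n\to\infty} \int_{\mathbb{R}} (f + g) \mathbf{1}_{C_n} \, d\lambda = \int_{\mathbb{R}} f + g \, d\lambda$.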
Now let
E
> 0 and let n E
2
I
6
. Because f and g are bounded by n on Cn, the 4 ( W d + 1) proof of part “5+1” of Theorem 9.19 shows that there are simple functions sf and sg so that 0 I sf If l c , , 0 Isg F glc, and for all x E R the inequalities f ( x ) l c , (x) - s f ( x ) < u and g(x)lc, ( x ) - sg(x) < u hold. Therefore, Let u := min
-,
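Because $\varepsilon > 0$ was arbitrary, this proves the additivity of the Lebesgue integral for nonnegative functions.

For not necessarily nonnegative functions, note that because $|f + g| \le |f| + |g|$, the above and part 2 of Theorem 9.23 show that $f + g$ is Lebesgue integrable when $f$ and $g$ are Lebesgue integrable. For the equality of the integrals, first notice that if $f_1, f_2, g_1, g_2$ are nonnegative, integrable and satisfy $f_1 - f_2 = g_1 - g_2$, then via $\int_{\mathbb{R}} f_1 + g_2 \, d\lambda = \int_{\mathbb{R}} g_1 + f_2 \, d\lambda$ we can conclude that $\int_{\mathbb{R}} f_1 \, d\lambda - \int_{\mathbb{R}} f_2 \, d\lambda = \int_{\mathbb{R}} g_1 \, d\lambda - \int_{\mathbb{R}} g_2 \, d\lambda$. Therefore with

$$(f + g)^+ - (f + g)^- = f + g = (f^+ + g^+) - (f^- + g^-)$$

we obtain

$$\int_{\mathbb{R}} f + g \, d\lambda = \int_{\mathbb{R}} (f + g)^+ \, d\lambda - \int_{\mathbb{R}} (f + g)^- \, d\lambda = \int_{\mathbb{R}} f^+ + g^+ \, d\lambda - \int_{\mathbb{R}} f^- + g^- \, d\lambda = \int_{\mathbb{R}} f \, d\lambda + \int_{\mathbb{R}} g \, d\lambda,$$

which completes the proof. ∎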
The approximation of the integral of $f + g$ with integrals of functions $(f + g) \mathbf{1}_{C_n}$ in part 2 of the proof of Theorem 9.25 shows that sequences of functions should be powerful tools. Convergence of sequences of functions is discussed in Chapter 11 and the fundamental limit theorems for (Lebesgue) integration are introduced in Section 14.5. Moreover, Exercise 14-33 gives a more efficient proof of part 2 using limit theorems for integrals.
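The additivity for simple functions that the proof relies on (Exercise 9-20b) can be checked on a small example. The following is only an illustrative sketch, not part of the text's argument: it assumes simple functions built from half-open intervals $[l, r)$, represented as hypothetical lists of `(coefficient, (l, r))` pairs, and it computes the integral of $s_1 + s_2$ on a common refinement on which both functions are constant (the idea in the hint to Exercise 9-20b).

```python
from fractions import Fraction as F

# A simple function is modeled (for illustration only) as a list of
# (coefficient, (left, right)) pairs, meaning sum of coefficient * 1_[left, right).

def value(terms, x):
    """Value of the simple function at the point x."""
    return sum(a for a, (l, r) in terms if l <= x < r)

def integral_by_refinement(terms):
    """Integrate by cutting the line at all endpoints, so that the
    function is constant on each piece [p, q) of the refinement."""
    pts = sorted({p for _, (l, r) in terms for p in (l, r)})
    return sum(value(terms, p) * (q - p) for p, q in zip(pts, pts[1:]))

def integral_by_formula(terms):
    """Integrate term by term: sum of coefficient * length of interval."""
    return sum(a * (r - l) for a, (l, r) in terms)

s1 = [(F(2), (F(0), F(1))), (F(1, 2), (F(1), F(3)))]
s2 = [(F(3), (F(1, 2), F(2)))]

# The list s1 + s2 represents the pointwise sum of the two functions.
total = integral_by_refinement(s1 + s2)
print(total, integral_by_formula(s1) + integral_by_formula(s2))
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than up to rounding.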
Exercises

9-20. Integration of simple functions.

(a) Let $s$ be a simple nonnegative function, let $y_1, \ldots, y_m \in (0, \infty)$ be the nonzero values that $s$ assumes, let $a_1, \ldots, a_n \in [0, \infty)$, and let $A_1, \ldots, A_n \subseteq \mathbb{R}$ be pairwise disjoint Lebesgue measurable sets so that $s = \sum_{k=1}^{n} a_k \mathbf{1}_{A_k}$. Prove that $\sum_{k=1}^{n} a_k \lambda(A_k) = \sum_{j=1}^{m} y_j \lambda\left( s^{-1}(y_j) \right)$. (This proves that the integral in Definition 9.21 does not depend on the representation of $s$.) Hint. Each of the $a_k$ must be a $y_j$ or zero. Group the first sum so that equal values $a_k$ are contiguous and then prove that the union of the corresponding sets $A_k$ is the inverse image of the appropriate $y_j$.

(b) Prove that if $s_1$ and $s_2$ are simple functions, then $\int_{\mathbb{R}} s_1 + s_2 \, d\lambda = \int_{\mathbb{R}} s_1 \, d\lambda + \int_{\mathbb{R}} s_2 \, d\lambda$. Hint. Find Lebesgue measurable sets $A_1, \ldots, A_n$ so that $s_1$ and $s_2$ are constant on each $A_j$.

9-21. Let $f : \mathbb{R} \to [0, \infty)$ be bounded and Lebesgue measurable so that $\lambda\left( \{ x \in \mathbb{R} : f(x) > 0 \} \right)$ is finite. Prove that $f$ is Lebesgue integrable and

$$\int_{\mathbb{R}} f \, d\lambda = \inf\left\{ \int_{\mathbb{R}} s \, d\lambda : s \text{ is a simple function with } f \le s \right\}.$$

9-22. Let $f : \mathbb{R} \to [0, \infty]$ be a Lebesgue measurable function. Prove that $f$ is Lebesgue integrable iff the supremum $S := \sup\left\{ \int_{\mathbb{R}} \min(f, n) \mathbf{1}_{[-n,n]} \, d\lambda : n \in \mathbb{N} \right\}$ is finite and that in this case $S$ is the Lebesgue integral of $f$.
9-23. Let $f : \mathbb{R} \to [0, \infty]$ be Lebesgue integrable, let $a > 0$ and let $A \subseteq \mathbb{R}$ be Lebesgue measurable and so that $f - a \mathbf{1}_A > 0$. Prove that $\int_{\mathbb{R}} f - a \mathbf{1}_A \, d\lambda = \int_{\mathbb{R}} f \, d\lambda - a \lambda(A)$.

9-24. Construct Lebesgue integrable functions $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}$ so that $(f + g)^+ \ne f^+ + g^+$ and $(f + g)^- \ne f^- + g^-$.

9-25. Let $a_1, \ldots, a_n \in \mathbb{R}$, let $A_1, \ldots, A_n \subseteq \mathbb{R}$ be (not necessarily pairwise disjoint) Lebesgue measurable sets and let $f = \sum_{k=1}^{n} a_k \mathbf{1}_{A_k}$ be a simple function. Prove that $\int_{\mathbb{R}} f \, d\lambda = \sum_{k=1}^{n} a_k \lambda(A_k)$. Then explain how this result differs from the result in Exercise 9-20a and why we could not have used it to prove Exercise 9-20a. Hint. For the proof of the equation, you may use Theorem 9.25.
9-26. Let $f, g : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue measurable functions so that $f = g$ a.e. Use only Theorem 9.23 to prove that $f$ is Lebesgue integrable iff $g$ is Lebesgue integrable and that in this case we have $\int_{\mathbb{R}} f \, d\lambda = \int_{\mathbb{R}} g \, d\lambda$.

9-27. Let $f : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue integrable. Prove that $\{ x \in \mathbb{R} : f(x) = \infty \}$ is a null set.

9-28. Let $f, g : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue integrable functions.

(a) Prove that $\max(f, g)$ (defined pointwise) is Lebesgue integrable.

(b) Prove that $\min(f, g)$ (defined pointwise) is Lebesgue integrable.

(c) Prove that $f - g$ is Lebesgue integrable and $\int_{\mathbb{R}} f - g \, d\lambda = \int_{\mathbb{R}} f \, d\lambda - \int_{\mathbb{R}} g \, d\lambda$.

9-29. Let $C^Q$ be a Cantor set. Prove that $\mathbf{1}_{C^Q}$ is Lebesgue integrable. Hint. Exercise 9-6.

9-30. Prove that the Dirichlet function $f(x) = \mathbf{1}_{\mathbb{Q} \cap [0,1]}$ is Lebesgue integrable.
9.4 Lebesgue Integrals versus Riemann Integrals

We conclude this chapter by establishing the relationship between the Lebesgue integral and regular as well as improper Riemann integrals. The Lebesgue integral truly is an extension of the Riemann integral of bounded functions on closed and bounded intervals. To see this, we first show that Riemann integrable functions are Lebesgue integrable and that for these functions the Lebesgue integral equals the Riemann integral. Theorem 9.26 also shows how to overcome the small nuisance of Riemann integrals being defined on sets $[a, b]$, while the Lebesgue integral is defined on $\mathbb{R}$.
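Theorem 9.26 If $f$ is Riemann integrable over $[a, b]$, then $f_{\mathbb{R}} : \mathbb{R} \to \mathbb{R}$ defined by

$$f_{\mathbb{R}}(x) := \begin{cases} f(x); & \text{for } x \in [a, b], \\ 0; & \text{otherwise,} \end{cases}$$

is Lebesgue integrable and the Riemann integral of $f$ is equal to the Lebesgue integral of $f_{\mathbb{R}}$. That is, $\int_{\mathbb{R}} f_{\mathbb{R}} \, d\lambda = \int_a^b f \, dx$.

Proof. First, let $f \ge 0$. Let $P = \{ a = x_0 < \cdots < x_n = b \}$ be a partition of $[a, b]$. With $m_i$ and $M_i$ as in Definition 5.13, let

$$s_l^P := \sum_{k=1}^{n} m_k \mathbf{1}_{[x_{k-1}, x_k)} + m_n \mathbf{1}_{\{x_n\}} \quad\text{and}\quad s_u^P := \sum_{k=1}^{n} M_k \mathbf{1}_{[x_{k-1}, x_k)} + M_n \mathbf{1}_{\{x_n\}}.$$

Then $0 \le s_l^P \le f_{\mathbb{R}} \le s_u^P$. For all $n \in \mathbb{N}$, consider the partition $P_n := \left\{ a + k \frac{b - a}{2^n} : k = 0, \ldots, 2^n \right\}$. Then $s_l^{P_n} \le s_l^{P_{n+1}}$ for all $n \in \mathbb{N}$ and by Lebesgue's criterion for Riemann integrability (Theorem 8.12) we infer that $\lim_{n\to\infty} s_l^{P_n}(x) = f(x)$ a.e. Because $f_{\mathbb{R}}$ and the $s_l^{P_n}$ are zero outside $[a, b]$, by Exercise 9-15 this means that $f_{\mathbb{R}}$ is Lebesgue measurable. Because $f_{\mathbb{R}} \le M \mathbf{1}_{[a,b]}$ for some $M > 0$, we conclude that $f_{\mathbb{R}}$ is Lebesgue integrable. Moreover, for all $n \in \mathbb{N}$ we have

$$L(f, P_n) = \int_{\mathbb{R}} s_l^{P_n} \, d\lambda \le \int_{\mathbb{R}} f_{\mathbb{R}} \, d\lambda \le \int_{\mathbb{R}} s_u^{P_n} \, d\lambda = U(f, P_n).$$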
For each n E
N,there is an evaluation set T,
[
= t?),
integers k = 1, . . . ,2" the inequality f (t;)) - mk
0, we have $e^z > e^0 = 1$. But this means $e^y = e^{y-x+x} = e^{y-x} e^x > e^x$, so the natural exponential function is strictly increasing and, in particular, injective. Moreover, its values are greater than or equal to 1 on $[0, \infty)$. For $z < 0$ note that $e^z e^{-z} = e^{z-z} = 1$, which means that $e^z = \frac{1}{e^{-z}}$ must be greater than 0. Therefore, $e^x$ maps $\mathbb{R}$ into $(0, \infty)$.

Now note that for $x > 0$ we have $e^x > 1 + x$, and hence $\lim_{x\to\infty} e^x = \infty$. Consequently, $\lim_{x\to-\infty} e^x = \lim_{x\to-\infty} \frac{1}{e^{-x}} = 0$. By Theorem 12.2, the natural exponential function is differentiable, and hence continuous. The above limits and the Intermediate Value Theorem show that the natural exponential function is surjective onto $(0, \infty)$. (Mentally fill in the details.) ∎
Definition 12.7 We define the natural logarithm function $\ln : (0, \infty) \to \mathbb{R}$ to be the inverse function of $\exp : \mathbb{R} \to (0, \infty)$.

The natural logarithm allows us to define powers of positive bases with arbitrary real exponents. With an argument similar to the proof of Theorem 12.4 we can show that this definition agrees with Definition 1.50 for rational exponents. (Exercise 12-2.)
Definition 12.8 Let $a > 0$ be a positive real number and let $r \in \mathbb{R}$. Then we define the $r$th power of $a$ to be $a^r := \exp\left( r \ln(a) \right) = e^{r \ln(a)}$.

Theorem 12.9 For all positive numbers $a$ and $b$, and all real numbers $x$ and $y$ the following hold.

Proof. All properties follow directly from corresponding properties of the natural exponential function. The first property is proved as follows.

$$a^x a^y = e^{x \ln(a)} e^{y \ln(a)} = e^{x \ln(a) + y \ln(a)} = e^{(x + y) \ln(a)} = a^{x+y}.$$

Working rows left to right we obtain the following for the second property.

$$(ab)^x = \left( e^{\ln(a)} e^{\ln(b)} \right)^x = \left( e^{\ln(a) + \ln(b)} \right)^x = e^{\ln\left( e^{\ln(a) + \ln(b)} \right) x}$$
$$= e^{(\ln(a) + \ln(b)) x} = e^{\ln(a) x + \ln(b) x} = e^{\ln(a) x} e^{\ln(b) x} = a^x b^x.$$

The third property follows from $a^{x-y} a^{y-x} = 1$ and $a^{x-y} a^y = a^x$. The remaining properties are left as Exercise 12-3. ∎

Of course, the properties of the natural logarithm function are also of interest.
Theorem 12.10 The natural logarithm function is differentiable on $(0, \infty)$ with derivative $\frac{d}{dx} \ln(x) = \frac{1}{x}$. Consequently, all power functions $f(x) := x^r$ with $r \in \mathbb{R} \setminus \mathbb{Q}$ are differentiable on $(0, \infty)$ and the Power Rule $\frac{d}{dx} x^r = r x^{r-1}$ holds.

Proof. By Theorem 4.21, the natural logarithm function is differentiable at every $x \in (0, \infty)$ and $\frac{d}{dx} \ln(x) = \frac{1}{e^{\ln(x)}} = \frac{1}{x}$. For the derivative of powers, the Chain Rule implies

$$\frac{d}{dx} x^r = \frac{d}{dx} e^{r \ln(x)} = e^{r \ln(x)} \, r \, \frac{1}{x} = r x^{r-1}.$$ ∎

Because the natural logarithm function is differentiable, it is continuous, which allows us to extend the limit law for powers to arbitrary exponents.

Corollary 12.11 Let $\{a_n\}_{n=1}^{\infty}$ be a convergent sequence of nonnegative numbers and let $r \in \mathbb{R}$. Then $\lim_{n\to\infty} a_n^r = \left( \lim_{n\to\infty} a_n \right)^r$ unless $r < 0$ and $\lim_{n\to\infty} a_n = 0$.

Proof. Exercise 12-4b. ∎
Further properties of the natural logarithm function are exhibited in Exercises 12-5 and 12-6.
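Definition 12.8 translates directly into a short computation, which makes the first three properties from the proof of Theorem 12.9 easy to spot-check numerically. The sketch below is an illustration only: the helper name `power` and the sample values are assumptions, and floating-point comparisons are made up to a relative tolerance, so this is a sanity check and not a proof.

```python
import math

def power(a, r):
    """a**r for a > 0 and real r, following Definition 12.8: exp(r * ln(a))."""
    if a <= 0:
        raise ValueError("the base must be positive")
    return math.exp(r * math.log(a))

# Arbitrary sample values (assumptions chosen for the illustration).
a, b, x, y = 2.5, 0.7, 1.3, -0.4

checks = {
    "a^x a^y = a^(x+y)": (power(a, x) * power(a, y), power(a, x + y)),
    "(ab)^x = a^x b^x": (power(a * b, x), power(a, x) * power(b, x)),
    "a^(x-y) = a^x / a^y": (power(a, x - y), power(a, x) / power(a, y)),
}
for name, (left, right) in checks.items():
    print(name, math.isclose(left, right, rel_tol=1e-12))
```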
Exercises

12-1. Prove by induction that for all $n \in \mathbb{N}$ we have $\exp(n) = \left( \exp(1) \right)^n$ as claimed in the proof of Theorem 12.4.

12-2. Prove that for all $x > 0$ and all $r \in \mathbb{Q}$ the definitions of the power $x^r$ in Definition 1.50 and Definition 12.8 agree.

12-3. Finish the proof of Theorem 12.9. That is, prove the following for all $a, b > 0$ and $x, y \in \mathbb{R}$.

(a) $\left( \frac{a}{b} \right)^x = \frac{a^x}{b^x}$

(b) $(a^x)^y = a^{xy}$

(c) $a^x > 0$

12-4. The limit law for powers.

(a) Prove that $\lim_{x\to-\infty} e^x = 0$.

(b) Prove Corollary 12.11. Be careful with sequences that go to zero.

12-5. Prove that the natural logarithm function is a strictly increasing bijective function from $(0, \infty)$ to $\mathbb{R}$ with $\lim_{x\to\infty} \ln(x) = \infty$ and $\lim_{x\to 0^+} \ln(x) = -\infty$.

12-6. Let $u, v > 0$. Prove each of the following.

(a) $\ln(1) = 0$

(b) $\ln(e) = 1$

(c) $\ln(uv) = \ln(u) + \ln(v)$

(d) $\ln\left( \frac{u}{v} \right) = \ln(u) - \ln(v)$

(e) $\ln\left( u^v \right) = v \ln(u)$

12-7. Limits of $n$th roots.

(a) Prove that if $\lim_{n\to\infty} a_n = a > 0$, then $\lim_{n\to\infty} \sqrt[n]{a_1 \cdots a_n} = a$. Hint. Exercises 2-16 and 12-6.

(b) Prove that $\lim_{n\to\infty} \sqrt[n]{n} = 1$.

12-8. Compute the integral $\int x^3 e^{x^2} \, dx$.
12-9. Gronwall's Inequality. Let $u, v : [a, b] \to [0, \infty)$ be continuous functions and let $c \ge 0$ be so that for all $x \in [a, b]$ we have $u(x) \le c + \int_a^x u(t) v(t) \, dt$. Prove that for all $x \in [a, b]$ we have $u(x) \le c \, e^{\int_a^x v(t) \, dt}$. Hint. Divide by the right side, multiply by $v(x)$ and integrate.

12-10. The function $\Gamma(\alpha) := \int_0^{\infty} x^{\alpha - 1} e^{-x} \, dx$ defined for $\alpha > 0$ is called the Gamma function.

(a) Prove that the improper integral $\int_0^{\infty} x^{\alpha - 1} e^{-x} \, dx$ converges for $\alpha \ge 1$.

(b) Prove that the improper integral also converges for $0 < \alpha < 1$.

(c) Prove that the improper integral diverges for $\alpha \le 0$.

(d) Prove that $\Gamma(1) = 1$.

(e) Prove that for all $\alpha \ge 1$ we have $\Gamma(\alpha + 1) = \alpha \Gamma(\alpha)$.

(f) Prove by induction that for every natural number $n$ we have $\Gamma(n) = (n - 1)!$. For this reason, the Gamma function is also referred to as the generalized factorial function.

12-11. Compute the following parameter dependent indefinite integrals. These integrals are useful for the integration of rational functions after a partial fraction decomposition.
12.2 Sine and Cosine

Similar to the natural exponential function, the sine and cosine functions are defined via power series that have the right derivatives at the origin. Of course, this is a bit of "reverse engineering," because to obtain these derivatives we would need to quote arguments that rely on the geometric definition of these functions.

Definition 12.12 For $x \in \mathbb{R}$, we define $\sin(x) := \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{(2k+1)!}$, which is called the sine function, and $\cos(x) := \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{(2k)!}$, which is called the cosine function.

Theorem 12.13 The power series that define $\sin(\cdot)$ and $\cos(\cdot)$ have infinite radius of convergence. Therefore, both $\sin(\cdot)$ and $\cos(\cdot)$ are differentiable and moreover $\frac{d}{dx} \sin(x) = \cos(x)$ and $\frac{d}{dx} \cos(x) = -\sin(x)$ for all $x \in \mathbb{R}$.
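Proof. We prove the result for $\sin(\cdot)$, leaving $\cos(\cdot)$ to the reader in Exercise 12-12. For the sine function, note that for every $x \in \mathbb{R}$ we have

$$\lim_{k\to\infty} \left| \frac{ \frac{(-1)^{k+1} x^{2(k+1)+1}}{(2(k+1)+1)!} }{ \frac{(-1)^k x^{2k+1}}{(2k+1)!} } \right| = \lim_{k\to\infty} \left| \frac{x^{2k+3}}{(2k+3)!} \cdot \frac{(2k+1)!}{x^{2k+1}} \right| = \lim_{k\to\infty} \frac{x^2}{(2k+2)(2k+3)} = 0.$$

Hence, by the Ratio Test, the power series converges for all $x \in \mathbb{R}$, so its radius of convergence is infinite. By Corollary 11.12, the sine function is differentiable on $\mathbb{R}$ and

$$\frac{d}{dx} \sin(x) = \frac{d}{dx} \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{(2k+1)!} = \sum_{k=0}^{\infty} \frac{(-1)^k (2k+1) x^{2k}}{(2k+1)!} = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{(2k)!} = \cos(x).$$ ∎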
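Because consecutive summands of the two series differ by a simple factor (the same ratio that appears in the Ratio Test computation), partial sums are cheap to evaluate. The following sketch is an illustration only: the function names and the default number of terms are assumptions, and the comparison with the standard library is a numerical sanity check, not a statement about how the library computes its values.

```python
import math

def sin_series(x, terms=30):
    """Partial sum of sum_{k>=0} (-1)^k x^(2k+1)/(2k+1)! with `terms` summands."""
    total = 0.0
    term = x  # summand for k = 0
    for k in range(terms):
        total += term
        # next summand = current summand * (-x^2) / ((2k+2)(2k+3))
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

def cos_series(x, terms=30):
    """Partial sum of sum_{k>=0} (-1)^k x^(2k)/(2k)! with `terms` summands."""
    total = 0.0
    term = 1.0  # summand for k = 0
    for k in range(terms):
        total += term
        # next summand = current summand * (-x^2) / ((2k+1)(2k+2))
        term *= -x * x / ((2 * k + 1) * (2 * k + 2))
    return total

for x in (0.5, 1.0, 3.0):
    print(x, sin_series(x) - math.sin(x), cos_series(x) - math.cos(x))
```

For large $|x|$ the summands first grow before they shrink, so more terms are needed and rounding errors from cancellation become visible; the series is convergent everywhere, but it is not an efficient way to evaluate sine far from the origin.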
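The following identities are useful when working with sine and cosine.

Theorem 12.14 For all $x, y \in \mathbb{R}$ the following identities hold.

1. $\sin^2(x) + \cos^2(x) = 1$ (trigonometric law of Pythagoras)

2. $\sin(x + y) = \sin(x) \cos(y) + \cos(x) \sin(y)$

3. $\cos(x + y) = \cos(x) \cos(y) - \sin(x) \sin(y)$

Proof. To prove the first identity, we multiply the absolutely convergent series and collect equal powers of $x$.

$$\sin^2(x) + \cos^2(x) = \left( \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{(2k+1)!} \right)^2 + \left( \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{(2k)!} \right)^2$$
$$= \sum_{n=1}^{\infty} \left( \sum_{2j+1+2k+1=2n} \frac{(-1)^j}{(2j+1)!} \frac{(-1)^k}{(2k+1)!} \right) x^{2n} + 1 + \sum_{n=1}^{\infty} \left( \sum_{2j+2k=2n} \frac{(-1)^j}{(2j)!} \frac{(-1)^k}{(2k)!} \right) x^{2n}$$
$$= 1 + \sum_{n=1}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!} \sum_{i=0}^{2n} (-1)^i \binom{2n}{i} = 1 + \sum_{n=1}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!} (1 - 1)^{2n} = 1.$$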
The remaining two identities are left for Exercise 12-13. The smallest positive zero of the sine function also has a special place in mathematics. Of course, we first must show that the sine function has a positive zero.
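Proposition 12.15 The function $\sin(x)$ is positive on $\left( 0, \sqrt{6} \right)$ and it is negative at 4.

Proof. For all $x \in \left( 0, \sqrt{6} \right)$, we have

$$\sin(x) = \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k+1}}{(2k+1)!} = \sum_{k=0}^{\infty} \frac{x^{4k+1}}{(4k+1)!} \left( 1 - \frac{x^2}{(4k+2)(4k+3)} \right) > 0.$$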
On the other hand, for x = 4 we obtain
42k+ 1
x
sin(4)
=
k=O C ( - l l k (2k + l)!
-
where the first term is negative because
42k+ I
4
D- ilk (2k + l ) !
k=O
= 4--
43
45
-
4 '
49
1.2.3 1.2.3.4.5 1.2.3.4.5.6.7 1.2.3.4.5.6.7.8.9 5.43 4.43 9.46 2.46 - 4-1.2.3.5 1 . 2 . 3 4 1.2.3.5.6.7.9 1.2.3.5.6.7-9 7 .43 .43 6.9.43 = 41.2.3.5.6.9 1'2.3.5.6'7.9 +
+
+--
+
12. Transcendental Functions
196 = 4-
472 118.43 =4-4.- 0 be a real number: A function f : R -+ R is called periodic with period p zrfor all x E R we have f ( x p ) = f(x).
+
Theorem 12.18 Both the sine and the cosine function have period $2\pi$.

Proof. Note that $\sin(2\pi) = \sin(\pi + \pi) = \sin(\pi) \cos(\pi) + \cos(\pi) \sin(\pi) = 0$. This implies $\cos^2(2\pi) = 1 - \sin^2(2\pi) = 1$ and because

$$\cos(2\pi) = \cos(\pi + \pi) = \cos(\pi) \cos(\pi) - \sin(\pi) \sin(\pi) = \cos^2(\pi) > 0,$$

we infer that $\cos(2\pi) = 1$. Hence, for all $x \in \mathbb{R}$ we obtain

$$\sin(x + 2\pi) = \sin(x) \cos(2\pi) + \cos(x) \sin(2\pi) = \sin(x),$$
$$\cos(x + 2\pi) = \cos(x) \cos(2\pi) - \sin(x) \sin(2\pi) = \cos(x).$$ ∎
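Proposition 12.15 says that the sine function is positive on $(0, \sqrt{6})$ and negative at 4, so a sign change is bracketed between $\sqrt{6}$ and 4, and that zero can be located numerically. The sketch below is an illustration under stated assumptions: it evaluates sine through a partial sum of its power series (helper name, number of terms and tolerance are hypothetical choices) and uses plain bisection, which is not a method taken from the text.

```python
import math

def sin_series(x, terms=40):
    """Partial sum of the power series that defines the sine function."""
    total = 0.0
    term = x
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

def bisect_zero(f, lo, hi, tol=1e-12):
    """Bisection for a zero of f, assuming f(lo) > 0 > f(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid   # the sign change lies to the right of mid
        else:
            hi = mid   # the sign change lies at or to the left of mid
    return (lo + hi) / 2

zero = bisect_zero(sin_series, math.sqrt(6), 4.0)
print(zero, math.pi)
```

Numerically the computed zero agrees with `math.pi`, the constant usually denoted $\pi$. Bisection halves the bracket in each step, so about 40 steps suffice for the tolerance used here.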
Exercises

12-12. Prove that the power series that defines the cosine function has infinite radius of convergence and that $\frac{d}{dx} \cos(x) = -\sin(x)$.

12-13. Finish the proof of Theorem 12.14.

(a) Use power series to prove $\sin(x + y) = \sin(x) \cos(y) + \cos(x) \sin(y)$ for all $x, y \in \mathbb{R}$.

(b) Use the above and the law of Pythagoras to prove $\cos\left( \frac{\pi}{2} \right) = 0$ and $\sin\left( \frac{\pi}{2} \right) = 1$.

(c) Use the power series to prove $\sin(-x) = -\sin(x)$ and $\cos(-x) = \cos(x)$ for all $x \in \mathbb{R}$.

(d) Prove that $\sin\left( \frac{\pi}{2} - x \right) = \cos(x)$ for all $x \in \mathbb{R}$. (Use parts 12-13a, 12-13b and 12-13c.)

(e) Prove that $\cos\left( \frac{\pi}{2} - x \right) = \sin(x)$ for all $x \in \mathbb{R}$.

(f) Prove that $\cos(x + y) = \cos(x) \cos(y) - \sin(x) \sin(y)$ for all $x, y \in \mathbb{R}$.
12-14. Prove that $\pi > 3$.

12-15. A function $f : [0, 2\pi] \to \mathbb{R}$ of the form $f(x) = a_0 + \sum_{j=1}^{n} \left( a_j \cos(jx) + b_j \sin(jx) \right)$ is called a trigonometric polynomial.

(a) Prove the following product-to-sum formulas.

i. $\cos(x) \cos(y) = \frac{1}{2} \left[ \cos(x + y) + \cos(x - y) \right]$

ii. $\sin(x) \sin(y) = \frac{1}{2} \left[ \cos(x - y) - \cos(x + y) \right]$

iii. $\sin(x) \cos(y) = \frac{1}{2} \left[ \sin(x + y) + \sin(x - y) \right]$

(b) Prove that the product of two trigonometric polynomials is a trigonometric polynomial.
12-16. Inverse trigonometric functions.

(a) Prove that the sine function is injective on $\left[ -\frac{\pi}{2}, \frac{\pi}{2} \right]$.

(b) The inverse of the sine function restricted to $\left[ -\frac{\pi}{2}, \frac{\pi}{2} \right]$ is called the arcsine $\arcsin(\cdot)$. Prove that $\frac{d}{dx} \arcsin(x) = \frac{1}{\sqrt{1 - x^2}}$ for all $x \in (-1, 1)$.

(c) Prove that the cosine function is injective on $[0, \pi]$.

(d) The inverse of the cosine function restricted to $[0, \pi]$ is called the arccosine $\arccos(\cdot)$. Prove that $\frac{d}{dx} \arccos(x) = -\frac{1}{\sqrt{1 - x^2}}$ for all $x \in (-1, 1)$.

12-17. The tangent function is defined to be $\tan(x) := \frac{\sin(x)}{\cos(x)}$.

(a) Prove that the tangent function is differentiable on its domain and $\frac{d}{dx} \tan(x) = \frac{1}{\cos^2(x)}$.

(b) Prove that the tangent function is injective on $\left( -\frac{\pi}{2}, \frac{\pi}{2} \right)$.

(c) The inverse of the tangent function restricted to $\left( -\frac{\pi}{2}, \frac{\pi}{2} \right)$ is called the arctangent $\arctan(\cdot)$. Prove that $\frac{d}{dx} \arctan(x) = \frac{1}{1 + x^2}$.

(d) (Another integral for integration with partial fraction decompositions.) Compute the integral $\int \frac{1}{x^2 + b^2} \, dx$.
12-18. (The last integral needed for integration with partial fraction decompositions.)Proceed as follows to prove that for all natural numbers n > 1 we have
(a) Use integration by parts on $\int 1 \cdot \frac{1}{(x^2 + b^2)^n} \, dx$.

(b) The resulting equation contains an integral $\int \frac{x^2}{(x^2 + b^2)^{n+1}} \, dx$. Expand the numerator with $\pm b^2$ and cancel what can be canceled.

(c) Solve the equation for $\int \frac{1}{(x^2 + b^2)^{n+1}} \, dx$.
12-19. Compute each of the integrals below (a)
/
Cxsin(x) dx.
.rex2 sin (x2) dx.
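12-20. A representation of $\pi$.

(a) Prove that $\int_a^b \sin^n(x) \, dx = -\frac{1}{n} \sin^{n-1}(x) \cos(x) \Big|_a^b + \frac{n-1}{n} \int_a^b \sin^{n-2}(x) \, dx$ for all natural numbers $n \in \mathbb{N}$ and all real numbers $a < b$. Hint. Use $\sin^n(x) = \sin^{n-2}(x) \left( 1 - \cos^2(x) \right)$ and use integration by parts for the summand $\left( \sin^{n-2}(x) \cos(x) \right) \cos(x)$.

(b) Prove by induction that $\int_0^{\pi/2} \sin^{2n}(x) \, dx = \frac{2n-1}{2n} \cdot \frac{2n-3}{2n-2} \cdots \frac{1}{2} \cdot \frac{\pi}{2}$ for all $n \in \mathbb{N}$ and $\int_0^{\pi/2} \sin^{2n+1}(x) \, dx = \frac{2n}{2n+1} \cdot \frac{2n-2}{2n-1} \cdots \frac{2}{3}$ for all $n \in \mathbb{N}$.

(c) Prove that $\frac{\pi}{2} = \lim_{n\to\infty} \frac{1}{2n+1} \prod_{k=1}^{n} \frac{(2k)^2}{(2k-1)^2}$. This is called Wallis' Product Formula. Hint. Use $\int_0^{\pi/2} \sin^{2n-1}(x) \, dx \ge \int_0^{\pi/2} \sin^{2n}(x) \, dx \ge \int_0^{\pi/2} \sin^{2n+1}(x) \, dx$ for all $n \in \mathbb{N}$, substitute the above expressions and divide by the expression in front of the $\frac{\pi}{2}$.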
R is Riemann integrable and has a Riemann integrable derivative, then
that if f : [0, n ] +
12-21.
I
-1"
(x -
+ bzn
f ( k ) = L n f(x) dxf
ler's Summation Formula. Hint. Note that J';l (x
i)
-
f'(x) =
1
dx.This formula is called Eu-
f'(x)
2
. _
[ f (0) + f (1) ] -
/
1
f ( x ) dx and that the last integral
0
on the right side of Euler's Summation Formula can be turned into a sum of integrals like these by integrating from one integer to the next and applying the appropriate shift. n
12-22 An asymptotic expression for n ! . To see the idea for the first step, note that ln(n!) =
ln(k). k=1
(a) Apply Euler's Summation Formula to f ( x ) = ln(x
{
(b) Prove that / n (x Hint.
1
0
1' i) (x
X
1
- 1x1) __ dx]
converges.
't 1; + I' k)
x+l
dx =
-
+ 1) to prove that for all n E W we have
n=l
(x -
1) 2 x
(x
i k
n!e" (c) Let bn := __ Prove that (bn}r=lconverges to a limit b nnJTi'
1 b
(d) Use Wallis' Product Formula to show that - = lim
E
-
2
1
dx.
R
1 5 = ~.&
n+m b:
n!e"
(e) Prove that lim -n+m nn&
-
This result is called Stirling's Formula. It is often written as n ! read "is asymptotically equal to." 12-23
($1'
where "->, -.en fin&=
is
for # O3 Prove that w ~ ( 0=) 2. for x = 0.
12-24 A differentiable function with bounded, but not Riemann integrable, derivative (see Figure 25) (a) Prove that for all 8 > 0 there is an xg
(b) Prove that f g ( x ) :=
'OS
(4)
E
;
part 12-24a, is differentiable on R, but
( 0 , 8 )so that 2x cos
(:)
+ sin (:)
=0
for x 5 0, for (O' x8), where xg is a positive number as in
'
fi is not continuous at x = 0.
Figure 25: The differentiable function h in Exercise 12-24 oscillates so that the derivative h’ is discontinuous on a Cantor set (marked by blocks).
(c) Prove that g a ( x ) :=
L
f4 (XI;
6 forx 5 -
’ is differentiable on W,g; is discontinuous at forx z -, 2 5 (0.8).
fs (8 - x ) :
0 and at 6 and
(x
: ga(x)
#0]
(d) For any open interval ( a , b ) , let g ( , , b ) ( x ) := gIb-,l ( x - a ) , Prove that g(,,b) is differentiable, g;u,b) is discontinuous at a and at b and (x E W : g(,,b)(X) # 0) g ( a . b ) . cx
(e) Let C Q =
n
Cf be a Cantor set of nonzero Lebesgue measure, which exists by Exercise
n=l
8-3e. For each n
E
N let D i , i
tervals so that [O. 13 \ CR =
= 1, . . . , 2n - 1 be a sequence of pairwise disjoint open in-
-1
u
2”
,=I. .
2“-1
0;;.Let hn :=
gn:, , Prove that h := lim hn (taken n+m
, = l. .
pointwise) is differentiable on W and h‘ is discontinuous on C Q (0 Prove that h‘ is bounded, but not Riemann integrable on [0, I ]
(g) Prove that h‘ is Lebesgue measurable, and hence Lebesgue integrable on [0, I]. Hint. Represent h‘ as a sum of measurable functions and obtain (h’)-’ or intersection of preimages of ( u , 00) under these functions.
[ ( a , x ) ] as a union
12.3 L'Hôpital's Rule

Limits of functions involving transcendental functions can be hard to compute. The algebra either involves power series, or it might even be so complicated that it is virtually impossible. L'Hôpital's Rule is a way to replace the limit of a quotient with the limit of the quotient of the derivatives, which may be easier to compute. The idea behind L'Hôpital's Rule is easily explained with power series. If $f(x) = \sum_{k=0}^{\infty} f_k (x - a)^k$ and $g(x) = \sum_{k=0}^{\infty} g_k (x - a)^k$ are power series with $f(a) = g(a) = 0$ and if $g'(a) \ne 0$, then

$$\lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{\sum_{k=1}^{\infty} f_k (x - a)^k}{\sum_{k=1}^{\infty} g_k (x - a)^k} = \lim_{x\to a} \frac{\sum_{k=1}^{\infty} f_k (x - a)^{k-1}}{\sum_{k=1}^{\infty} g_k (x - a)^{k-1}} = \frac{f_1}{g_1} = \frac{f'(a)}{g'(a)}.$$
That is, the limit of the quotient of the functions is the quotient of the derivatives. Because not every function is a power series and because derivatives cannot be defined at infinity, in general we expect the limit of the quotient of the derivatives to be on the right side. To prove $\lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{f'(x)}{g'(x)}$, it is tempting to apply the Mean Value Theorem to the quotient, using that $f(a) = g(a) = 0$. But arguing

$$\lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{ \frac{f(x) - f(a)}{x - a} }{ \frac{g(x) - g(a)}{x - a} } = \lim_{x\to a} \frac{f'(c_f)}{g'(c_g)}$$

is problematic. If $c_f$ and $c_g$ approach $a$ at different rates, then the limit need not be $\lim_{x\to a} \frac{f'(x)}{g'(x)}$. To get the right limit, we need $c_f = c_g$, which can be achieved with a stronger form of the Mean Value Theorem.
Theorem 12.19 Generalized Mean Value Theorem. Let $f, g$ be functions that are continuous on $[a, b]$ and differentiable on $(a, b)$ and let $g(a) \ne g(b)$. Then there is a number $c \in (a, b)$ such that $\frac{f(b) - f(a)}{g(b) - g(a)} = \frac{f'(c)}{g'(c)}$.

Proof. Let $h(x) := [f(b) - f(a)] g(x) - [g(b) - g(a)] f(x)$ and apply Rolle's Theorem. (See Exercise 12-25.) ∎

Now we are ready to prove L'Hôpital's Rule.
Theorem 12.20 L'Hôpital's Rule. Let $a \in [-\infty, \infty]$ and let $f, g$ be differentiable functions defined on an interval $(z, \infty)$ (if $a = \infty$), or $(-\infty, z)$ (if $a = -\infty$), or $(a - \delta, a + \delta) \setminus \{a\}$ (if $a \in \mathbb{R}$). If the limits of $f$ and $g$ satisfy $\lim_{x\to a} f(x) = \lim_{x\to a} g(x) = 0$ or both limits are in $\{\pm\infty\}$ and $\lim_{x\to a} \frac{f'(x)}{g'(x)}$ exists as a number or is infinite, then $\lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{f'(x)}{g'(x)}$. This rule also applies to one-sided limits.

Proof. Let $L := \lim_{x\to a} \frac{f'(x)}{g'(x)}$. First consider $a \in (-\infty, \infty]$ and $L \ne -\infty$. We claim that for all $y_0 < L$ there is an $x_0 < a$ so that for all $x \in (x_0, a)$ we have $\frac{f(x)}{g(x)} > y_0$.

Let $y_0 < L$. Then there is an $x_1 < a$ such that for all $x \in (x_1, a)$ we have $\frac{f'(x)}{g'(x)} > y_0$. In case $\lim_{x\to a} f(x) = \lim_{x\to a} g(x) = 0$, by the Generalized Mean Value Theorem for all $x \in (x_1, a)$ there is a $c \in (x, a)$ so that

$$\frac{f(x)}{g(x)} = \frac{f(x) - f(a)}{g(x) - g(a)} = \frac{f'(c)}{g'(c)} > y_0,$$

which means in this case the claim holds with $x_0 := x_1$.

In case $\lim_{x\to a} f(x) = \lim_{x\to a} g(x) = \infty$, we can assume without loss of generality that $f$ and $g$ are positive on $[x_1, a)$. Let $\varepsilon > 0$ be such that $\frac{y_0}{(1 - \varepsilon)^2} < L$. Find an $x_2 \in (x_1, a)$ so that for all $c \in (x_2, a)$ we have that $\frac{f'(c)}{g'(c)} > \frac{y_0}{(1 - \varepsilon)^2}$. Then find an $x_0 \in (x_2, a)$ so that for all $x \in (x_0, a)$ we have that $\frac{f(x)}{f(x) - f(x_2)} > 1 - \varepsilon$ and $\frac{g(x) - g(x_2)}{g(x)} > 1 - \varepsilon$. Then for all $x \in (x_0, a)$ by the Generalized Mean Value Theorem there is a $c \in (x_2, x)$ so that $\frac{f(x) - f(x_2)}{g(x) - g(x_2)} = \frac{f'(c)}{g'(c)} > \frac{y_0}{(1 - \varepsilon)^2}$, and hence for all $x \in (x_0, a)$ we obtain

$$\frac{f(x)}{g(x)} = \frac{f(x) - f(x_2)}{g(x) - g(x_2)} \cdot \frac{f(x)}{f(x) - f(x_2)} \cdot \frac{g(x) - g(x_2)}{g(x)} > \frac{f'(c)}{g'(c)} (1 - \varepsilon)^2 > y_0,$$

which proves the claim in this case. The other cases for the limits of $f$ and $g$ being infinite are proved similarly, so the claim is proved.

The claim proves the result for $L = \infty$ for left limits at $a \in \mathbb{R}$ and for limits at infinity. Similar to the above we can prove that for $L \ne \infty$ for all $y_0 > L$ there is an $x_0 < a$ so that for all $x \in (x_0, a)$ we have $\frac{f(x)}{g(x)} < y_0$. This proves the result for $L = -\infty$ for left limits at $a \in \mathbb{R}$ and for limits at infinity. Putting the two results above together, for $L \in \mathbb{R}$ and every $\varepsilon > 0$ there is an $x_0 < a$ so that for all $x \in (x_0, a)$ we have $L - \varepsilon < \frac{f(x)}{g(x)} < L + \varepsilon$, which proves the result for $L \in \mathbb{R}$ for left limits at $a \in \mathbb{R}$ and for limits at $\infty$.

Repeating the above process for $a \in [-\infty, \infty)$ to the right of $a$ establishes the result for $a = -\infty$, for right limits at $a \in \mathbb{R}$ and for two-sided limits at $a \in \mathbb{R}$. ∎

Exercise 12-28 shows that L'Hôpital's Rule is not a one-for-one swap. The limit $\lim_{x\to a} \frac{f(x)}{g(x)}$ can exist even if $\lim_{x\to a} \frac{f'(x)}{g'(x)}$ fails to exist.

Aside from the obvious applications to quotients, L'Hôpital's Rule also allows us to derive a well-known representation for the exponential function.
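Theorem 12.21 For all $x \in \mathbb{R}$ we have $e^x = \lim_{n\to\infty} \left( 1 + \frac{x}{n} \right)^n$.

Proof. For all $x \in \mathbb{R}$, with $t = \frac{1}{n}$ and L'Hôpital's Rule we obtain

$$\lim_{n\to\infty} \ln\left( \left( 1 + \frac{x}{n} \right)^n \right) = \lim_{n\to\infty} n \ln\left( 1 + \frac{x}{n} \right) = \lim_{t\to 0^+} \frac{\ln(1 + xt)}{t} = \lim_{t\to 0^+} \frac{x}{1 + xt} = x.$$

Because the natural exponential function is continuous, the result follows. ∎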
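The convergence in Theorem 12.21 can be observed numerically, and doing so also shows that it is slow. The following sketch is an illustration only; the helper name `compound` and the sample values of $x$ and $n$ are assumptions.

```python
import math

def compound(x, n):
    """The n-th term (1 + x/n)^n of the sequence in Theorem 12.21."""
    return (1 + x / n) ** n

x = 1.5
for n in (10, 1_000, 100_000):
    approx = compound(x, n)
    print(n, approx, abs(approx - math.exp(x)))
```

In this experiment the error shrinks roughly in proportion to $\frac{1}{n}$, so the representation is of theoretical interest rather than a practical way to compute $e^x$.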
Exercises 12-25 Prove Theorem 12.19. Remember to prove that Rolle’s Theorem can be applied 12-26 Compute each of the limits below. (a) lirn
2x4
(g)
+
-t x 2 x x 2 - 16
- 4x3
x+4
-
276
(b)
lim xln(x)
1 1 lim -- ___ sin(x) cos(x) - 1
12-27. Prove by induction that for all $n \in \mathbb{N}$ we have $\lim_{x\to\infty} x^n e^{-x} = 0$.

12-28. Prove that for $f(x) = 2x + \sin(x)$, $g(x) = 2x - \sin(x)$ and $a = \infty$ we have $\lim_{x\to\infty} \frac{f(x)}{g(x)} = 1$, but $\lim_{x\to\infty} \frac{f'(x)}{g'(x)}$ does not exist.

12-29. Let $f : (0, \infty) \to \mathbb{R}$ be differentiable. Does $\lim_{x\to\infty} f'(x) = 0$ imply that $\lim_{x\to\infty} f(x)$ exists? Justify your answer.

12-30. Is there a differentiable function $f : (0, \infty) \to \mathbb{R}$ with $\lim_{x\to\infty} f'(x) = 0$ so that for every real number $L$ there is a sequence $\{x_n\}_{n=1}^{\infty}$ that goes to infinity and so that $\lim_{n\to\infty} f(x_n) = L$? Justify your answer.
12-31. Creating summation formulas. This exercise shows one way in which summation formulas for powers of integers (see Exercise 1-33) can be discovered. Let $f(x) := \sum_{k=1}^{n} e^{kx}$.

(a) Prove that for all $p \in \mathbb{N}$ we have $\sum_{k=1}^{n} k^p = f^{(p)}(0)$.

(b) Prove that for all $x \in \mathbb{R} \setminus \{0\}$ we have $\sum_{k=1}^{n} e^{kx} = \frac{e^x - e^{(n+1)x}}{1 - e^x}$.

(c) Prove that for all $p \in \mathbb{N}$ we have $f^{(p)}(0) = \lim_{x\to 0} \frac{d^p}{dx^p} \frac{e^x - e^{(n+1)x}}{1 - e^x}$.

(d) Use L'Hôpital's Rule to verify that $\sum_{k=1}^{n} k = \frac{n}{2} (n + 1)$ for all $n \in \mathbb{N}$.

(e) Use a computer algebra system to generate a closed formula for $\sum_{k=1}^{n} k^5$.

12-32. Use Cauchy's Limit Theorem (see Exercise 2-51) to give an alternative proof of L'Hôpital's Rule.
Chapter 13
Numerical Methods
Many problems cannot be solved exactly. Therefore it is natural to consider computational approaches to mathematics. Numerical analysis is a wide field. For any problem that can be solved exactly (under good circumstances), there is at least one numerical method to provide an approximate solution in case exact methods fail. Usually, a numerical method contains a parameter, call it n , that indicates the computational effort required to obtain the approximation. With enough computational effort, a numerical method should provide approximations close to the exact solution. More formally, this means that as n goes to infinity the limit of our approximations should be the exact solution. But just having approximations that converge to the exact answers usually is not enough. We want to obtain good approximations with as little computational effort as possible. This means we not only need to assure that, given enough computational effort, the approximations converge to the correct result. We also must analyze how fast the approximations converge.
In the language that we have developed, this means that just showing that for every $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the $n$th approximation is within $\varepsilon$ of the exact solution is not enough. We also want our estimates to be sharp enough so that when, say, $N = 10$ guarantees a desired accuracy, we do not use a larger $N$ and waste computational effort. So, where in proofs so far we were satisfied with the fact that $N$ exists, in numerical analysis we want to know what $N$ is. Where in proofs so far we were satisfied that estimates ultimately showed that a certain difference is smaller than $\varepsilon$, in numerical analysis we want to perform the estimate with an $N$ that is as small as possible.
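The difference between "some $N$ works" and "a small $N$ works" can be made concrete. The sketch below is an illustration that is not taken from the text: it approximates $e$ by the partial sums $\sum_{k=0}^{n} \frac{1}{k!}$ and compares the $N$ guaranteed by a deliberately crude error bound, $\frac{1}{n}$, with the $N$ guaranteed by the sharper tail bound $\frac{2}{(n+1)!}$ and with the smallest $n$ that actually achieves the accuracy. Both bounds, the tolerance and all names are assumptions chosen for the example.

```python
import math

def partial_sum(n):
    """Approximation of e by the partial sum 1/0! + 1/1! + ... + 1/n!."""
    return sum(1 / math.factorial(k) for k in range(n + 1))

def smallest_n(error, eps):
    """Smallest n >= 1 with error(n) < eps, found by direct search."""
    n = 1
    while error(n) >= eps:
        n += 1
    return n

eps = 1e-3
crude = smallest_n(lambda n: 1 / n, eps)                       # bound e - s_n <= 1/n
sharp = smallest_n(lambda n: 2 / math.factorial(n + 1), eps)   # bound e - s_n <= 2/(n+1)!
actual = smallest_n(lambda n: math.e - partial_sum(n), eps)    # true error
print(crude, sharp, actual)
```

Both bounds prove convergence, but only the sharper one avoids wasted work: in this example the crude estimate asks for more than a thousand terms, while the sharp estimate and the true error agree on a single-digit number of terms.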
For this reason, this chapter will emphasize error analysis. We present numerical approaches for three typical tasks: The representation of functions in Section 13.1, the solution of equations in Section 13.2 and the computation of integrals in Section 13.3.
13.1 Approximation with Taylor Polynomials

It seems mundane, but the most fundamental numerical task is the computation of the values of functions such as exponential and trigonometric functions. The exact values of these functions can only be computed for certain special input values $x$. For all other input values, we need to use approximation techniques. The most fundamental of these techniques is the approximation with Taylor polynomials.

There are several ways to motivate the use of polynomials. Most importantly, polynomials are easy to compute, which is a paramount concern in numerical analysis. Moreover, for each of the many functions defined as power series there is a sequence of polynomials that converges to it. Geometrically, we can argue that the tangent line of a differentiable function $f$ at a point $a$ has the same value and the same first derivative as $f$ at $a$ and, locally, it approximates $f$ rather well. We have reason to hope that, by increasing the number of derivatives that agree with the derivatives of the function at $a$, we can enlarge the interval on which we have a good approximation for $f$. To increase the number of derivatives that agree at $a$, we need to use polynomials of degree greater than 1.
Theorem 13.1 Let the function $f$ be $n$ times differentiable at $a$. Then the polynomial $T_n(x) := \sum_{j=0}^{n} \frac{f^{(j)}(a)}{j!} (x - a)^j$ is such that the first $n$ derivatives of $T_n$ at $a$ are equal to the first $n$ derivatives of $f$ at $a$. That is, $T_n^{(k)}(a) = f^{(k)}(a)$ for $k = 0, \ldots, n$.

Proof. Prove by induction that for $1 \le k \le n$ the $k$th derivative of $T_n(x)$ is $T_n^{(k)}(x) = \sum_{j=k}^{n} \frac{f^{(j)}(a)}{j!} \, j (j - 1) \cdots (j - k + 1) (x - a)^{j-k}$. (Exercise 13-1.) ∎

Theorem 13.1 motivates the definition of Taylor polynomials and Taylor series.
Definition 13.2 Let the function $f$ be $n$ times differentiable at $a$. Then the polynomial
\[ T_n(x) := \sum_{j=0}^{n} \frac{f^{(j)}(a)}{j!} (x-a)^j \]
is called the $n$th Taylor polynomial of $f$ at $a$. If $f$ is infinitely differentiable at $a$, the series
\[ T(x) := \sum_{j=0}^{\infty} \frac{f^{(j)}(a)}{j!} (x-a)^j \]
is called the Taylor series of $f$ at $a$.

The definitions of the exponential and the trigonometric functions guarantee that the Taylor polynomials at $a = 0$ ultimately provide good approximations for these functions (see Exercise 13-2). However, as mentioned in the introduction, for numerical purposes it is not sufficient to just know that for some degree $n$ the $n$th Taylor polynomial of $f$ is close to $f$. We need to know how close $T_n$ is to $f$.
Theorem 13.3 Taylor's Formula. If $f$ is $(n+1)$ times continuously differentiable on $(a-R, a+R)$, then for all $x \in \mathbb{R}$ with $|x-a| < R$ we have
\[ f(x) = T_n(x) + (x-a)^{n+1} \int_0^1 f^{(n+1)}\big(a + u(x-a)\big)\, \frac{1}{n!}(1-u)^n \, du. \]
In particular, if $|x-a| < R$ and $M$ is such that for all $x \in \mathbb{R}$ with $|x-a| < R$ we have $\left|f^{(n+1)}(x)\right| \le M$, then
\[ \left| f(x) - T_n(x) \right| \le \frac{M}{(n+1)!}\, |x-a|^{n+1}. \]
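As an illustration of how the error bound in Taylor's Formula is used, the following minimal Python sketch evaluates the $n$th Taylor polynomial of the exponential function at $a = 0$ and the bound $\frac{M}{(n+1)!}|x|^{n+1}$. The function names are hypothetical, and the choice $M = e^{|x|}$ (a bound for all derivatives of $e^x$ on $[-|x|, |x|]$) is an assumption made for the example; in practice one would use a cruder bound for $M$ that does not require evaluating the function being approximated.

```python
import math

def taylor_exp(x, n):
    """Evaluate the nth Taylor polynomial of exp at a = 0 (hypothetical helper)."""
    term = 1.0   # current summand x^j / j!, starting with j = 0
    total = 1.0
    for j in range(1, n + 1):
        term *= x / j          # x^j / j! from x^(j-1) / (j-1)!
        total += term
    return total

def taylor_exp_error_bound(x, n):
    """Bound M |x|^(n+1) / (n+1)! with the assumed M = e^{|x|}."""
    M = math.exp(abs(x))       # every derivative of exp is exp itself
    return M * abs(x) ** (n + 1) / math.factorial(n + 1)

if __name__ == "__main__":
    for x in (0.5, -1.0, 2.0):
        actual = abs(math.exp(x) - taylor_exp(x, 10))
        print(x, actual, taylor_exp_error_bound(x, 10))
```

The printed actual errors stay below the corresponding bounds, which is what the "in particular" part of the theorem asserts.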
Remark 13.5 Not every function can be approximated well with Taylor polynomials. For example, Lemma 18.8 exhibits a function that is infinitely differentiable, not identical to zero, and yet all its Taylor polynomials at $a = 0$ are identical to zero.

Remark 13.6 Early operating systems, such as the one on the Commodore 64 in the 1980s, used Taylor polynomials of sufficiently high degree to compute many functions. Nowadays, computational schemes that are faster than Taylor polynomials, but also more memory intensive, are used to compute functions. The reason is that memory is not as much of an issue as it was in the early days of computing, while speed remains a crucial concern.

Remark 13.7 Taylor polynomials are also used in physics to obtain low-order approximations of complicated functions $f$. Typically, if $f$ is to be evaluated at $x + \Delta x$, where $\Delta x$ is small, the exact expression $f(x + \Delta x)$ is replaced with the approximately equal expression $f(x) + f'(x)\Delta x$. This is feasible because, as Taylor's Formula shows, the difference is often bounded by $C(\Delta x)^2$, where $C$ is a constant. If $\Delta x$ is small, terms of the order $(\Delta x)^2$ are usually negligible. The determination of what is small and what is negligible is made based on practical, nonmathematical considerations. A posteriori, if the approximate formula correctly predicts an experiment, then the approximation must have been permissible. A priori, one could say that if other effects influence the quantity given by $f$ by, say, 0.1% (of the underlying base unit), and $\Delta x$ is at most 1%, then $(\Delta x)^2$ is at most 0.01%, so it can be ignored because other effects will have greater influence. If a first order approximation as indicated does not work, higher-order Taylor polynomials can be used in more sophisticated models.
Functions can also be approximated with trigonometric polynomials. This idea is motivated by problems as described in Section 21.3. We will present the corresponding series, called Fourier series, in Section 20.2. The powerful tools available by then allow for a more efficient presentation than what would be possible now.
Exercises

13-1 Prove Theorem 13.1.

13-2 Prove that if $f(x) = \sum_{k=0}^{\infty} c_k x^k$ is a power series with nonzero radius of convergence, then the Taylor series of $f$ about $a = 0$ is $\sum_{k=0}^{\infty} c_k x^k$.

13-3 Prove that if $f$ is $(n+1)$ times continuously differentiable on $(a-R, a+R)$, then for all $x \in \mathbb{R}$ with $|x-a| < R$ we have that $\left|f(x) - T_n(x)\right| \le \frac{M}{(n+1)!}\,|x-a|^{n+1}$, where $M$ is such that for all $|x-a| < R$ we have $\left|f^{(n+1)}(x)\right| \le M$.

13-4 Find an upper bound for the error incurred when approximating $f$ on $[l, r]$ with its $n$th degree Taylor polynomial at $a$.
(a) $f(x) = e^x$, $a = 0$, $n = 10$, $[l, r] = [-5, 5]$
(b) $f(x) = \sin(x)$, $a = 0$, $n = 7$, $[l, r] = \left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$
13-5 Use induction to prove that the given expression is the $n$th derivative of the given function.
(a) $f(x) = \ln|x|$, $f^{(n)}(x) = \dfrac{(-1)^{n+1}(n-1)!}{x^n}$ for $n \ge 1$
(b) $f(x) = 2^x$, $f^{(n)}(x) = 2^x \left(\ln(2)\right)^n$
(c) $f(x) = x e^x$, $f^{(n)}(x) = x e^x + n e^x$
(d) $f(x) = x e^{ax}$, $f^{(n)}(x) = a^n x e^{ax} + n a^{n-1} e^{ax}$
13-6 Determine the smallest $n$ so that $|T_n(x) - f(x)|$ … polynomial of $f$ about $a$.

… $0$, the Taylor polynomials of $f(x) = \ln|x|$ at $a$ converge for $x \in (0, 2a)$ and they diverge for $|x - a| > a$.

13-8 Second Derivative Test. Let $f : (a,b) \to \mathbb{R}$ be twice continuously differentiable and let $x \in (a,b)$ be so that $f'(x) = 0$.
(a) Prove that if $f''(x) > 0$, then there is an $\varepsilon > 0$ so that for all $z \ne x$ with $|z - x| < \varepsilon$ we have that $f(x) < f(z)$.
(b) Prove that if $f''(x) < 0$, then there is an $\varepsilon > 0$ so that for all $z \ne x$ with $|z - x| < \varepsilon$ we have that $f(x) > f(z)$.
(c) State and prove a similar result for $f : (a,b) \to \mathbb{R}$ being $n$ times continuously differentiable with $f'(x) = f''(x) = \cdots = f^{(n-1)}(x) = 0$. (Distinguish even and odd $n$.)

13-9. Efficient evaluation of polynomials. Let $p(x) = \sum_{j=0}^{n} a_j x^j$ be a polynomial.
(a) Prove that $p(x) = a_0 + x\big(a_1 + x\big(a_2 + \cdots + x\big(a_{n-1} + x(a_n)\big)\cdots\big)\big)$.
(b) Count the number of operations in the evaluation of the sum $\sum_{j=0}^{n} a_j x^j$ and in the evaluation in part 13-9a to prove that evaluation as in part 13-9a takes fewer operations (and is thus more efficient) than evaluation of the original sum. Hint. Evaluating $a_j x^j$ takes $j$ floating point multiplications and floating point multiplications take much more time than floating point additions.
(c) State an $n$ step recursive procedure that evaluates polynomials as in part 13-9a. Hint. Start with $H_n := a_n$ and define $H_{n-1}, \dots, H_1$ in such a way that $H_1 = p(x)$.
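The nested evaluation in Exercise 13-9 is commonly known as Horner's scheme. A minimal Python sketch of the idea follows; the function names and the coefficient-list convention (`coeffs[j]` holds $a_j$) are assumptions made for this illustration, and the indexing of the intermediate values need not match the $H_k$ of the hint.

```python
def horner(coeffs, x):
    """Evaluate sum(coeffs[j] * x**j) with nested multiplication.

    coeffs[j] is assumed to hold a_j; uses len(coeffs) - 1 multiplications.
    """
    h = coeffs[-1]                      # start with the leading coefficient a_n
    for a in reversed(coeffs[:-1]):     # a_{n-1}, ..., a_0
        h = a + x * h                   # one multiplication, one addition per step
    return h

def naive(coeffs, x):
    """Term-by-term evaluation, recomputing each power of x from scratch."""
    total = 0
    for j, a in enumerate(coeffs):
        power = 1
        for _ in range(j):              # j multiplications for x**j
            power *= x
        total += a * power
    return total

if __name__ == "__main__":
    print(horner([1, 2, 3], 2), naive([1, 2, 3], 2))   # both evaluate 1 + 2x + 3x^2 at x = 2
```

For a polynomial of degree $n$ the nested form needs $n$ multiplications, while the term-by-term sum above needs on the order of $n^2/2$.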
13.2 Newton's Method

Solving equations is a common numerical task. The Intermediate Value Theorem guarantees that for equations $f(x) = 0$ the issue usually is not if we can find solutions, but rather how fast we can compute them.
Example 13.8 The bisection method. Let $f : [a,b] \to \mathbb{R}$ be continuous with $f(a)f(b) < 0$. Then by the Intermediate Value Theorem $f$ has a zero in $[a,b]$. To simplify the presentation, assume without loss of generality that $f$ has a unique zero $z$ in $[a,b]$. We will recursively construct a sequence $\{x_n\}_{n=0}^{\infty}$ that converges to $z$. Let $x_0 := a$, $x_1 := b$, and $j(1) := 0$. For the recursive construction, let $x_0, \dots, x_n$ and $j(n) \in \{0, \dots, n-1\}$ be so that $f(x_n) f(x_{j(n)})$ …

… $> 0$ so that for every starting point $x_0 \in (z - \delta_z, z + \delta_z)$ the sequence generated by Newton's method converges to $z$.
Proof. We apply Lemma 13.11 to $F(x) := x - \dfrac{f(x)}{f'(x)}$. First note that
\[ F'(x) = 1 - \frac{\big(f'(x)\big)^2 - f(x) f''(x)}{\big(f'(x)\big)^2} = \frac{f(x) f''(x)}{\big(f'(x)\big)^2}. \]
In particular this means that $F'(z) = 0$, and hence there is a $\delta_z > 0$ so that $|F'(x)| < \frac{1}{2} < 1$ for all $x \in (z - \delta_z, z + \delta_z)$. But then by Lemma 13.11 every sequence generated by Newton's method started at any $x_0 \in (z - \delta_z, z + \delta_z)$ converges to $z$.
From Theorem 13.12 and its proof we infer that the closer $x_0$ is to a zero of $f$, the faster Newton's method will converge. But when Newton's method converges, the numbers $x_n$ will get ever closer to a zero of $f$. Hence, as Newton's method is executed, the speed of convergence should accelerate. Theorem 13.14 below makes this statement more precise.
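Lemma 13.13 Let $f : (a,b) \to \mathbb{R}$ be continuously differentiable and let $\gamma > 0$ be so that for all $x, z \in (a,b)$ we have that $|f'(z) - f'(x)| \le \gamma |z - x|$. Then for all $x, z \in (a,b)$ the inequality $\left| f(z) - f(x) - f'(x)(z - x) \right| \le \dfrac{\gamma}{2} |z - x|^2$ holds.

Proof. Without loss of generality assume that $x < z$. Then
\[ \left| f(z) - f(x) - f'(x)(z-x) \right| = \left| \int_x^z f'(t) - f'(x)\, dt \right| \le \int_x^z \left| f'(t) - f'(x) \right| dt \le \int_x^z \gamma (t - x)\, dt = \frac{\gamma}{2}(t-x)^2 \Big|_x^z = \frac{\gamma}{2}(z-x)^2. \]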
Theorem 13.14 Let $f : (a,b) \to \mathbb{R}$ be a continuously differentiable function so that $f'(x) \ne 0$ for all $x \in (a,b)$. Assume there are $x_0 \in (a,b)$ and $\alpha, \beta, \gamma > 0$ so that $\left| \dfrac{f(x_0)}{f'(x_0)} \right| \le \alpha$, so that for all $x \in (a,b)$ we have $\left| \dfrac{1}{f'(x)} \right| \le \beta$, so that for all $x, z \in (a,b)$ we have $|f'(z) - f'(x)| \le \gamma |z - x|$, so that $h := \dfrac{\alpha\beta\gamma}{2} < 1$ and so that with $r := \dfrac{\alpha}{1-h}$ we have $[x_0 - r, x_0 + r] \subseteq (a,b)$. Then

1. Each recursively defined point $x_{n+1} := x_n - \dfrac{f(x_n)}{f'(x_n)}$ is in $(x_0 - r, x_0 + r)$.
2. The sequence $\{x_n\}_{n=0}^{\infty}$ converges to a point $u \in [x_0 - r, x_0 + r]$ with $f(u) = 0$.
3. For all $n \ge 0$ we have $|u - x_n| \le \alpha \dfrac{h^{2^n - 1}}{1 - h^{2^n}}$.
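Proof. We first prove by induction that for all $n \in \mathbb{N}$ the point $x_n$ is well defined, $|x_n - x_{n-1}| \le \alpha h^{2^{n-1}-1}$, and $|x_n - x_0| < r$. For $n = 1$ the above is trivial. For the induction step $n \to n+1$ first note that because $|x_n - x_0| < r$ the number $x_{n+1}$ is well defined. The definition of $x_n$ implies that $f(x_{n-1}) + f'(x_{n-1})(x_n - x_{n-1}) = 0$, which, together with Lemma 13.13, implies
\[ |x_{n+1} - x_n| = \left| \frac{f(x_n)}{f'(x_n)} \right| \le \beta \left| f(x_n) - f(x_{n-1}) - f'(x_{n-1})(x_n - x_{n-1}) \right| \le \frac{\beta\gamma}{2} |x_n - x_{n-1}|^2 \le \frac{\beta\gamma}{2} \alpha^2 h^{2^n - 2} = \alpha h^{2^n - 1}. \]
But then, using a telescoping sum, we obtain the following.
\[ |x_{n+1} - x_0| \le \sum_{k=1}^{n+1} |x_k - x_{k-1}| \le \alpha \sum_{k=1}^{n+1} h^{2^{k-1}-1} \le \alpha \sum_{j=0}^{n} h^j < \frac{\alpha}{1-h} = r, \]
which finishes the induction.

To prove that $\{x_n\}_{n=0}^{\infty}$ converges, let $m \ge n$. Then
\[ |x_m - x_n| \le \sum_{k=n+1}^{m} |x_k - x_{k-1}| \le \alpha \sum_{k=n+1}^{m} h^{2^{k-1}-1} = \alpha \sum_{j=0}^{m-n-1} h^{2^n - 1} h^{2^n (2^j - 1)} \le \alpha h^{2^n - 1} \sum_{j=0}^{m-n-1} \left( h^{2^n} \right)^j \le \alpha \frac{h^{2^n - 1}}{1 - h^{2^n}}. \]
Because $0 < h < 1$ the bound $\alpha \dfrac{h^{2^n-1}}{1 - h^{2^n}}$ goes to zero as $n$ goes to infinity, so $\{x_n\}_{n=0}^{\infty}$ is a Cauchy sequence. (Mentally fill in the argument.) In particular, $\{x_n\}_{n=0}^{\infty}$ converges to a number $u$, which by Theorem 13.10 must satisfy $f(u) = 0$. Letting $m$ go to infinity in the above estimate also shows that $|u - x_n| \le \alpha \dfrac{h^{2^n-1}}{1 - h^{2^n}}$, which finishes the proof.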
Theorem 13.14 shows that once Newton's method is "close enough" to a zero of a function, it converges quite rapidly. Indeed, near a zero $u$ of a continuously differentiable function $f$, the hypotheses of Theorem 13.14 can be satisfied if $f'(u) \ne 0$, because we can make $\alpha$ small by starting near $u$. For comparison with the bisection method, suppose, for argument's sake, $\alpha = \beta = \gamma = 1$. Then $|u - x_n| \le \dfrac{1}{2^{2^n - 1}} \cdot \dfrac{1}{1 - 2^{-2^n}}$, so it suffices to execute Newton's method six times to obtain an approximation that gives the first 15 digits behind the decimal point of $u$. Another nice feature of Theorem 13.14 is that it can be generalized to several variables (see Exercise 17-44). Finally, note that even though Newton's method is only applicable to differentiable functions, Exercise 13-13 shows that it can be modified to provide a method that is applicable to all functions.
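The recursion $x_{n+1} := x_n - f(x_n)/f'(x_n)$ and the error bound from part 3 of Theorem 13.14 can be sketched in a few lines of Python. This is only an illustration: the function names, the stopping rule based on the step size, and the default tolerance are assumptions, not part of the theorem.

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Run Newton's method; return (approximate zero, number of steps).

    Stops when two consecutive iterates differ by at most tol
    (an assumed, commonly used stopping rule).
    """
    x = x0
    for n in range(max_iter):
        d = fprime(x)
        if d == 0:
            raise ZeroDivisionError("derivative vanished at an iterate")
        x_new = x - f(x) / d
        if abs(x_new - x) <= tol:
            return x_new, n + 1
        x = x_new
    raise RuntimeError("no convergence within max_iter steps")

def theorem_bound(alpha, beta, gamma, n):
    """Bound alpha * h^(2^n - 1) / (1 - h^(2^n)) with h = alpha*beta*gamma/2."""
    h = alpha * beta * gamma / 2
    return alpha * h ** (2 ** n - 1) / (1 - h ** (2 ** n))

if __name__ == "__main__":
    root, steps = newton(lambda x: x * x - 2, lambda x: 2 * x, 1.5)
    print(root, steps)
    # With alpha = beta = gamma = 1 the bound drops below 1e-15 at n = 6.
    print(theorem_bound(1, 1, 1, 5), theorem_bound(1, 1, 1, 6))
```

With $\alpha = \beta = \gamma = 1$ the bound is still of the order $10^{-10}$ for $n = 5$ and of the order $10^{-19}$ for $n = 6$, which matches the count of six steps mentioned above.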
Exercises

13-10. Let $f$ be a continuously differentiable function. Prove that if the sequence generated by Newton's method converges to a limit $L$ in the domain of $f$ and $f'(L) = 0$, then $f(L) = 0$.

13-11. Let $q > 1$, let $F : (a,b) \to \mathbb{R}$ be a differentiable function so that $F'(x) > q$ for all $x \in (a,b)$ and let $p \in (a,b)$ be so that $F(p) = p$. Prove that $p$ is the only fixed point of $F$ in $(a,b)$ and that for all $x_0 \in (a,b) \setminus \{p\}$ the sequence generated by the recursive equation $x_{n+1} := F(x_n)$ terminates after finitely many steps with a value that is not in $(a,b)$.
13-12. Let $f : (a,b) \to \mathbb{R}$ be twice differentiable.
(a) Let $z \in (a,b)$ be so that $f(z) = 0$, $f'(x) > 0$ for all $x \in (z,b)$ (that is, $f$ is increasing on $(z,b)$), and so that $f''(x) > 0$ for all $x \in (z,b)$ (that is, $f$ is concave up on $(z,b)$). Prove that the sequence generated by Newton's method started at any $x_0 \in (z,b)$ converges to $z$. Hint. Prove that $z \le x_{n+1} \le x_n$ for all $n \in \mathbb{N}$. Note. Figure 27 provides a geometric visualization of the claim in this exercise. Second note. In particular, it is allowed that $f'(z) = 0$ in this exercise.
(b) Prove that if $f''$ is continuous, then for each $z \in (a,b)$ with $f(z) = f'(z) = 0$ and $f''(z) \ne 0$ there is a $\delta_z > 0$ so that for every starting point $x_0 \in (z - \delta_z, z + \delta_z)$ the sequence generated by Newton's method converges to $z$.
13-13. Let $f$ be a function, let $x_0, x_1 \in \mathbb{R}$ with $f(x_0) \ne f(x_1)$ and consider the recursively defined sequence $x_{n+1} := x_n - f(x_n)\dfrac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}$.
(a) Show that the recursive formula is obtained by taking the equation of the secant line of $f$ through $\big(x_{n-1}, f(x_{n-1})\big)$ and $\big(x_n, f(x_n)\big)$ and computing its unique zero.
(b) Prove that if the sequence generated with this method converges to $L$ and $\dfrac{f(z) - f(x)}{z - x}$ is bounded for $x, z$ near $L$, then $f(L) = 0$.
(c) Prove that if $z$ is a zero of $f$, $z < x_1 < x_0$, $f$ is twice differentiable on $(z - \delta, x_0 + \delta)$ for some $\delta > 0$, and $f$ is increasing and concave up on $(z - \delta, x_0 + \delta)$ (that is, $f''(x) > 0$ for all $x \in (z - \delta, x_0 + \delta)$), then the sequence generated by this method converges to $z$. Hint. Prove that $z \le x_{n+1} \le x_n$ for all $n \in \mathbb{N}$.
(d) Prove that in a situation as in part 13-13c the sequence $\{x_n\}_{n=0}^{\infty}$ converges at least as fast to $z$ as the sequence $\{x_n^N\}_{n=0}^{\infty}$ generated with Newton's method and $x_0^N = x_0$. Hint. Prove that … for all $n$.
13-14. Explain why $x_0 = 1$ is not a useful starting point for finding the zeroes of $f(x) = x^3 - 3x + 4$ with Newton's method.

13-15. Explain why $x_0 = 1$ is not a useful starting point for using Newton's method to find the zeroes of the function $f(x) = x^6 - 4x^2 - x + 1$. Sketch a rough graph of $f$ and of the tangent lines used to compute $x_1$ and $x_2$ to illustrate your point.

13-16. For the function $f(x) = x^3 - 5x - 5$, execute Newton's method started with $x_0 = -2$. Use a calculator or a computer.
(a) Find $x_1, x_2, x_3, x_4$.
(b) Find the first $n$ such that your computer shows $x_n = x_{n+1}$. Explain why $n$ is so large.
13-17. Apply Newton's method to $f(x) = 6x^4 - 18x^2 - 6x + 1$ with $x_0 = -1$. Explain why the limit is not the zero of $f$ that is closest to the starting point.

13-18. The limit $L$ computed with Newton's method does not always give $f(L) = 0$.
(a) Let $f(x) = \left(10^{100} + 10^{50}\right) x$ … Apply Newton's method with $x_0 = 1$. Use a computer and call the apparent limit $L$.
(b) Compute $f(L)$. Is $f(L) \approx 0$?
(c) Compute the zero of $f$ exactly and compare it to $L$. Is $L$ close to the zero?
(d) Explain why Newton's method cannot produce a better approximation to the actual zero of $f$.
13-19. Square roots.
(a) Use Newton's method and the fact that $\sqrt{a}$ is the solution of the equation $x^2 - a = 0$ to devise a recursive method to approximate square roots. Note. This is one algorithm that is used in computers to approximate square roots.
(b) Prove that for any $x_0 > 0$ the sequence generated in part 13-19a converges to $\sqrt{a}$ and for any $x_0 < 0$ the sequence generated in part 13-19a converges to $-\sqrt{a}$.
13.3 Numerical Integration

We conclude our introduction to numerics with numerical integration. Although the Darboux and Lebesgue integrals are useful, even essential, for the development of integration theory, the Riemann integral and the Fundamental Theorem of Calculus are the best tools for numerical considerations. The key to numerical integration is to replace the function we want to integrate with a function that is close to it and which can easily be integrated. The simplest geometric idea is to choose points on the function and let a polynomial go through these points. This section will focus solely on this idea. For other approaches to numerical integration, consider [28].
Definition 13.15 Let $x_0, \dots, x_n \in \mathbb{R}$ with $x_i \ne x_k$ for $i \ne k$ and define the polynomial
\[ L_i(x) := \prod_{k \ne i} \frac{x - x_k}{x_i - x_k}. \]
Then $L_i$ is called a Lagrange polynomial.
Theorem 13.16 Lagrange's Interpolation Formula. Let $n \in \mathbb{N}$, $x_0, \dots, x_n \in \mathbb{R}$ with $x_i \ne x_k$ for $i \ne k$, and let $f_0, \dots, f_n \in \mathbb{R}$. Then the polynomial defined by $P_n(x) := \sum_{i=0}^{n} f_i L_i(x)$ is the unique polynomial of degree $\le n$ such that for all $i = 0, \dots, n$ we have $P_n(x_i) = f_i$.

Proof. The equation follows easily from the fact that for all $j \in \{0, \dots, n\}$ we have $L_i(x_j) = \begin{cases} 0, & \text{if } j \ne i, \\ 1, & \text{if } j = i. \end{cases}$ Regarding uniqueness, suppose that $Q$ was another polynomial of degree $\le n$ with $Q(x_j) = f_j$ for all $j = 0, \dots, n$. Then $P - Q$ is a polynomial of degree $\le n$ with $n + 1$ zeroes. By the Fundamental Theorem of Algebra (see Exercise 16-74 for a proof), this means that $P - Q = 0$, that is, $P = Q$.
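Lagrange's Interpolation Formula translates directly into code. The following minimal Python sketch (the function names are hypothetical) evaluates $L_i(x)$ as a product and $P_n(x)$ as the sum $\sum_i f_i L_i(x)$; it is meant to illustrate the formula, not to be a numerically optimized interpolation routine.

```python
def lagrange_basis(xs, i, x):
    """Evaluate the Lagrange polynomial L_i at x for distinct nodes xs."""
    prod = 1.0
    for k, xk in enumerate(xs):
        if k != i:
            prod *= (x - xk) / (xs[i] - xk)
    return prod

def lagrange_interpolate(xs, fs, x):
    """Evaluate P_n(x) = sum_i f_i * L_i(x)."""
    return sum(fi * lagrange_basis(xs, i, x) for i, fi in enumerate(fs))

if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0]
    fs = [1.0, 3.0, 7.0]      # values of x^2 + x + 1 at the nodes
    # The interpolant reproduces the data at the nodes ...
    print([lagrange_interpolate(xs, fs, x) for x in xs])
    # ... and, the data coming from a polynomial of degree 2, it agrees
    # with x^2 + x + 1 elsewhere as well.
    print(lagrange_interpolate(xs, fs, 3.0))
```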
With the polynomial $P_n$ going through the points $(x_i, f_i)$ the most natural way to approximate a function $f$ is to set $f_i = f(x_i)$. We first note that if we start with a polynomial $f$ of sufficiently low degree, then we simply get $f$ back.

Corollary 13.17 Let $f$ be a polynomial of degree $\le n$ and let $P_n$ be a polynomial computed as in Theorem 13.16 with numbers $f_i = f(x_i)$ for distinct numbers $x_i \in \mathbb{R}$, $i = 0, \dots, n$. Then $P_n(x) = f(x)$ for all $x \in \mathbb{R}$.

Proof. Both $P_n$ and $f$ are polynomials of degree $\le n$ that go through the points $\big(x_i, f(x_i)\big)$. The equality follows from the uniqueness of $P_n$.

To obtain a convenient integration formula, we need to express the integral of $P_n$ in terms of the $f_i = f(x_i)$. Interestingly enough, as long as the $x_i$ are equidistant, the coefficients in the formula do not depend on the interval over which we integrate.
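Proposition 13.18 Newton-Cotes formulas. Let $n \in \mathbb{N}$ and let $[c,d]$ be an interval. Define $h := \dfrac{d-c}{n}$, for $i = 0, \dots, n$ let $x_i := c + ih$, let $f_i \in \mathbb{R}$ and let $a_i := \displaystyle\int_0^n \prod_{k \ne i} \frac{t - k}{i - k}\, dt$. Then the polynomial $P_n$ from Theorem 13.16 satisfies
\[ \int_c^d P_n(x)\, dx = h \sum_{i=0}^{n} f_i a_i. \]

Figure 28: Visualization of the approximation of an integral over two subintervals with Riemann sums with evaluation at left endpoints (a), trapezoidal sums (b) and Simpson's Rule (c).

Proof. The key to the proof is the substitution $t := \dfrac{x - c}{h}$. Because $x - x_k = h(t - k)$ and $x_i - x_k = h(i - k)$, we obtain
\[ \int_c^d P_n(x)\, dx = \sum_{i=0}^{n} f_i \int_c^d \prod_{k \ne i} \frac{x - x_k}{x_i - x_k}\, dx = \sum_{i=0}^{n} f_i\, h \int_0^n \prod_{k \ne i} \frac{t - k}{i - k}\, dt = h \sum_{i=0}^{n} f_i a_i. \]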
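The coefficients in the Newton-Cotes formulas can be computed exactly with rational arithmetic. The sketch below assumes that $a_i$ is the integral over $[0, n]$ of the $i$th Lagrange polynomial for the nodes $0, 1, \dots, n$, which is what the substitution $t = (x - c)/h$ produces and which reproduces the values $a_0 = a_1 = \frac{1}{2}$ for $n = 1$; the function name is hypothetical.

```python
from fractions import Fraction

def newton_cotes_coefficients(n):
    """Return [a_0, ..., a_n] as exact fractions.

    a_i is taken to be the integral from 0 to n of
    prod_{k != i} (t - k) / (i - k) dt  (assumed definition).
    """
    coeffs = []
    for i in range(n + 1):
        poly = [Fraction(1)]                 # polynomial coefficients, lowest degree first
        for k in range(n + 1):
            if k == i:
                continue
            denom = Fraction(i - k)
            new = [Fraction(0)] * (len(poly) + 1)
            for deg, c in enumerate(poly):   # multiply poly by (t - k) / (i - k)
                new[deg] += c * Fraction(-k) / denom
                new[deg + 1] += c / denom
            poly = new
        # integrate term by term from 0 to n
        a_i = sum(c * Fraction(n) ** (deg + 1) / (deg + 1)
                  for deg, c in enumerate(poly))
        coeffs.append(a_i)
    return coeffs

if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, [str(a) for a in newton_cotes_coefficients(n)])
```

Because the formula integrates constants exactly, the coefficients for a given $n$ always add up to $n$, which gives a quick sanity check for larger $n$.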
The most obvious way to approximate an integral is to approximate a function $f : [a,b] \to \mathbb{R}$ with a polynomial of sufficiently large degree and then use the appropriate Newton-Cotes formula. However, for $n \ge 7$ some of the $a_i$ become negative (see Exercise 13-20), which leads to problems with cancellations of digits. Moreover, the $a_i$ are hard to compute for large $n$. Therefore an interval $[a,b]$ is usually first partitioned into shorter subintervals, the formula from Proposition 13.18 is applied on each subinterval and then the results are added. This type of integration formula is called a composite integration formula.

For example, for $n = 1$, we obtain $a_0 = \frac{1}{2}$ and $a_1 = \frac{1}{2}$. If we now partition the interval $[a,b]$ into $N$ subintervals of length $\Delta x := \dfrac{b-a}{N}$ and apply Proposition 13.18 on each subinterval we obtain the approximation
\[ \int_a^b f(x)\, dx \approx \frac{\Delta x}{2} \left[ f(a) + 2 \sum_{k=1}^{N-1} f(a + k \Delta x) + f(b) \right], \]
which is called the trapezoidal rule, also shown in Figure 28(b). In the trapezoidal rule, we need to evaluate the function $N + 1$ times. The effort in numerical integration is proportional to the number of times the function $f$ is evaluated. To compare the performance of two numerical integration formulas, it is thus important that both formulas evaluate the function equally many times. For this reason, we will construct composite Newton-Cotes formulas so that $f$ is evaluated $N + 1$ times.

For $n = 2$, note that each additional subinterval of length $\Delta x$ requires two additional evaluations of the function. Thus for $n = 2$ we must demand that $N$ is even. With $a_0 = \frac{1}{3}$, $a_1 = \frac{4}{3}$ and $a_2 = \frac{1}{3}$ we obtain the approximation formula
\[ \int_a^b f(x)\, dx \approx \frac{\Delta x}{3} \left[ f(a) + \sum_{k=1}^{N-1} \left( 3 - (-1)^k \right) f(a + k \Delta x) + f(b) \right], \]
which is called Simpson's Rule, also shown in Figure 28(c). In Exercise 13-21, the reader will state composite integration formulas based on Newton-Cotes formulas for $n = 3$ to $6$.

We now turn to the error analysis once more. Peano's error representation (Theorem 13.19) shows that a more "abstract" point-of-view can have benefits for concrete tasks, which makes it a perfect conclusion for Part I of this text and a lead-in for the more abstract Part II. Consider that the integral as well as the Newton-Cotes formulas are functions themselves. They map functions on an interval to real numbers. Moreover, the integral as well as the Newton-Cotes formulas are linear. That is, sums and constant factors can be moved through the integral (see Theorems 5.8 and 9.25) and also through the Newton-Cotes formulas (easy computation). Linear functions on vector spaces (like vector spaces of integrable functions) will be important in abstract analysis (see Chapter 17).

Because we adopt this more abstract point-of-view, the error representation actually is valid for any numerical approximation of integrals that is linear, that gives exact results for low-degree polynomials and that can be "moved into the integral" as described in the hypothesis of Theorem 13.19 (see [28] for examples beyond the Newton-Cotes formulas). By Corollary 13.17 the $n$th Newton-Cotes formula is exact for polynomials up to degree $n$. Because it only evaluates the function at select points $x_k$, multiplies the values with numbers, and adds the results, it is linear and it can be moved into the integral. Hence, Peano's error representation applies to the $n$th Newton-Cotes formula. Although the hypotheses for Peano's error representation look complicated, the proof shows that they are just what is needed to get an estimate. Finally, Peano's error representation also shows that we need to establish more abstract results to fully justify more concrete results, like error estimates.
In the proof of Peano’s error representation, we work with double integrals and we reverse their order. Formally, we have not proved yet that this is possible. Fubini’s Theorem (see Theorem 14.66) will show that this reversal is indeed allowed. For the specific case of double Lebesgue integrals, Fubini’s Theorem is stated in Exercise 16-80. Because Theorem 13.19 is not used to prove these results, we will reverse the order of integration in the proof of Theorem 13.19, anticipating that this can be justified.
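Before turning to the error representation, the composite trapezoidal rule and the composite Simpson's Rule discussed in this section can be sketched in Python as follows. The function names are hypothetical; both routines evaluate $f$ exactly $N + 1$ times, and the Simpson routine rejects odd $N$, in line with the requirement that $N$ be even.

```python
import math

def trapezoidal(f, a, b, N):
    """Composite trapezoidal rule with N subintervals (N + 1 evaluations)."""
    dx = (b - a) / N
    total = 0.5 * (f(a) + f(b))
    for k in range(1, N):
        total += f(a + k * dx)
    return dx * total

def simpson(f, a, b, N):
    """Composite Simpson's Rule with N subintervals, N even (N + 1 evaluations)."""
    if N % 2 != 0:
        raise ValueError("Simpson's Rule needs an even number of subintervals")
    dx = (b - a) / N
    total = f(a) + f(b)
    for k in range(1, N):
        weight = 4 if k % 2 == 1 else 2   # weights 4, 2, 4, ..., 4 on interior points
        total += weight * f(a + k * dx)
    return dx * total / 3

if __name__ == "__main__":
    # Both rules applied to sin on [0, pi], whose integral is 2.
    print(trapezoidal(math.sin, 0.0, math.pi, 100))
    print(simpson(math.sin, 0.0, math.pi, 100))
```

With the same number of evaluations, the Simpson approximation of this integral is several orders of magnitude closer to 2 than the trapezoidal one, which is the behavior the error bounds later in this section predict for smooth integrands.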
Theorem 13.19 Peano's error representation. Let $n \in \mathbb{N}$, $c < d$ and let $F(\cdot)$ be an integration formula which is linear, that is, $F(\alpha f + \beta g) = \alpha F(f) + \beta F(g)$ for all $\alpha, \beta \in \mathbb{R}$ and $f, g : [c,d] \to \mathbb{R}$, which for polynomials $p$ of degree at most $n$ gives $F(p) = \displaystyle\int_c^d p(x)\, dx$, and which for all continuous functions $g : [c,d] \to \mathbb{R}$ satisfies
\[ F\left( \int_c^d g(t) (x - t)^n \mathbf{1}_{x - t \ge 0}\, dt \right) = \int_c^d g(t)\, F\left( (x - t)^n \mathbf{1}_{x - t \ge 0} \right) dt. \]
For every Riemann integrable function $f : [c,d] \to \mathbb{R}$, let $R(f) := F(f) - \displaystyle\int_c^d f(x)\, dx$ be the error of the numerical integration and let $K(t) := \dfrac{1}{n!} R\left( (\cdot - t)^n \mathbf{1}_{\cdot - t \ge 0} \right)$. Then for every function $f$ that for some $\delta > 0$ is $n + 1$ times continuously differentiable on the interval $(c - \delta, d + \delta)$ we have
\[ R(f) = \int_c^d f^{(n+1)}(t) K(t)\, dt. \]
The function $K$ is also called the Peano kernel.
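Proof. By Taylor's Formula, we obtain the following for all $x \in [c,d]$.
\[ f(x) = T_n(x) + (x - c)^{n+1} \int_0^1 f^{(n+1)}\big(c + u(x - c)\big) \frac{1}{n!} (1 - u)^n\, du = T_n(x) + \int_c^d f^{(n+1)}(t) \frac{1}{n!} (x - t)^n \mathbf{1}_{x - t \ge 0}\, dt, \]
where $T_n(x)$ is the $n$th Taylor polynomial of $f$ at $c$. Because the integration formula is linear, we infer for all functions $g, h$ that
\[ R(g + h) = F(g + h) - \int_c^d g + h\, dx = F(g) - \int_c^d g\, dx + F(h) - \int_c^d h\, dx = R(g) + R(h). \]
Now because the integration formula gives exact results for polynomials of degree $\le n$ we know that $R(T_n) = 0$. Because the integral in Taylor's formula above is a function of $x$ we obtain the following.
\begin{align*}
R(f) &= R(T_n) + R\left( \int_c^d f^{(n+1)}(t) \frac{1}{n!} (x - t)^n \mathbf{1}_{x - t \ge 0}\, dt \right) \\
&= F\left( \int_c^d f^{(n+1)}(t) \frac{1}{n!} (x - t)^n \mathbf{1}_{x - t \ge 0}\, dt \right) - \int_c^d \int_c^d f^{(n+1)}(t) \frac{1}{n!} (x - t)^n \mathbf{1}_{x - t \ge 0}\, dt\, dx \\
&= \int_c^d f^{(n+1)}(t) \frac{1}{n!} \left[ F\left( (x - t)^n \mathbf{1}_{x - t \ge 0} \right) - \int_c^d (x - t)^n \mathbf{1}_{x - t \ge 0}\, dx \right] dt \\
&= \int_c^d f^{(n+1)}(t) K(t)\, dt.
\end{align*}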
Although Peano's error representation is very versatile, we need something more concrete for applications. One key weakness of Peano's error representation is that it depends in a rather complicated way on the interval of integration. The next result makes this dependency more manageable for the Newton-Cotes formulas by reducing the error to a power of half the length of the interval, times a constant, times a value of an appropriate derivative.
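Theorem 13.20 Let $n \in \mathbb{N}$ and let $f : [c,d] \to \mathbb{R}$ be $n + 1$ times continuously differentiable if $n$ is odd and $n + 2$ times continuously differentiable if $n$ is even. Let $R_n^{[c,d]}(f)$ be the error when approximating the integral $\displaystyle\int_c^d f(t)\, dt$ with the $n$th Newton-Cotes formula $F_n(f)$, that is, $R_n^{[c,d]}(f) = F_n(f) - \displaystyle\int_c^d f(t)\, dt$. For odd $n$, there is a $\xi \in (c,d)$ so that
\[ R_n^{[c,d]}(f) = \left( \frac{d - c}{2} \right)^{n+2} f^{(n+1)}(\xi)\, \frac{R_n^{[-1,1]}\left( x^{n+1} \right)}{(n+1)!}, \]
and for even $n$ there is a $\xi \in (c,d)$ so that
\[ R_n^{[c,d]}(f) = \left( \frac{d - c}{2} \right)^{n+3} f^{(n+2)}(\xi)\, \frac{R_n^{[-1,1]}\left( x^{n+2} \right)}{(n+2)!}. \]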
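Proof. For odd $n$, by its definition the $n$th Newton-Cotes formula gives exact results for polynomials of degree $n$. For even $n$, the $n$th Newton-Cotes formula even gives exact results for polynomials of degree $n + 1$ (see Exercise 13-22). Because the proofs for even and odd $n$ are very similar, we will assume throughout that $n$ is fixed and odd. The idea for the proof is a substitution that changes the domain of the integral from $[c,d]$ to $[-1,1]$.

For all real numbers $c < d$, let $K_n^{[c,d]}$ denote the Peano kernel for the $n$th Newton-Cotes formula for the interval $[c,d]$. This Peano kernel can be reduced to the Peano kernel $K_n^{[-1,1]}$ on $[-1,1]$ as follows. A simple substitution (similar to the computation below, see Exercise 13-23) shows that $K_n^{[c,d]}\left( t + \frac{c+d}{2} \right) = K_n^{\left[ -\frac{d-c}{2}, \frac{d-c}{2} \right]}(t)$ for all $t \in \left[ -\frac{d-c}{2}, \frac{d-c}{2} \right]$. Moreover, for all $x \in [-1,1]$ we have
\begin{align*}
n!\, K_n^{\left[ -\frac{d-c}{2}, \frac{d-c}{2} \right]}\left( \frac{d-c}{2} x \right)
&= F_n^{\left[ -\frac{d-c}{2}, \frac{d-c}{2} \right]}\left( \left( \cdot - \frac{d-c}{2} x \right)^n \mathbf{1}_{\cdot - \frac{d-c}{2} x \ge 0} \right) - \int_{-\frac{d-c}{2}}^{\frac{d-c}{2}} \left( u - \frac{d-c}{2} x \right)^n \mathbf{1}_{u - \frac{d-c}{2} x \ge 0}\, du \\
&= \frac{d-c}{2} F_n^{[-1,1]}\left( \left( \frac{d-c}{2} \right)^n (\cdot - x)^n \mathbf{1}_{\cdot - x \ge 0} \right) - \frac{d-c}{2} \int_{-1}^{1} \left( \frac{d-c}{2} \right)^n (u - x)^n \mathbf{1}_{u - x \ge 0}\, du \\
&= \left( \frac{d-c}{2} \right)^{n+1} \left[ F_n^{[-1,1]}\left( (\cdot - x)^n \mathbf{1}_{\cdot - x \ge 0} \right) - \int_{-1}^{1} (u - x)^n \mathbf{1}_{u - x \ge 0}\, du \right] \\
&= \left( \frac{d-c}{2} \right)^{n+1} n!\, K_n^{[-1,1]}(x).
\end{align*}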
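To complete the error estimate, we need to use that the Peano kernels $K_n^{[c,d]}(t)$ for the Newton-Cotes formulas on $[c,d]$ are of constant sign. The proof of this fact is geometrically obvious for $n = 1$ (see Exercise 13-24b) and still manageable for $n = 2$ (see Exercise 13-24c), but the algebra becomes increasingly tedious. Because we will focus only on $n = 1$ and $n = 2$ (trapezoidal and Simpson's Rule) in the examples, the presentation remains (somewhat) self-contained. For a general argument that also holds for other integration formulas, consider [10], which uses matrix methods to prove that the Peano kernel does not change its sign.

Because the Peano kernel for the Newton-Cotes formulas does not change sign, by the Mean Value Theorem for the Integral as in Exercise 8-21 there is a $\xi \in (c,d)$ so that
\begin{align*}
R_n^{[c,d]}(f) &= \int_c^d f^{(n+1)}(t) K_n^{[c,d]}(t)\, dt = f^{(n+1)}(\xi) \int_c^d K_n^{[c,d]}(t)\, dt \\
&= f^{(n+1)}(\xi) \int_{-\frac{d-c}{2}}^{\frac{d-c}{2}} K_n^{[c,d]}\left( u + \frac{c+d}{2} \right) du = f^{(n+1)}(\xi) \int_{-\frac{d-c}{2}}^{\frac{d-c}{2}} K_n^{\left[ -\frac{d-c}{2}, \frac{d-c}{2} \right]}(u)\, du \\
&= f^{(n+1)}(\xi)\, \frac{d-c}{2} \int_{-1}^{1} K_n^{\left[ -\frac{d-c}{2}, \frac{d-c}{2} \right]}\left( \frac{d-c}{2} x \right) dx = f^{(n+1)}(\xi) \left( \frac{d-c}{2} \right)^{n+2} \int_{-1}^{1} K_n^{[-1,1]}(x)\, dx \\
&= \left( \frac{d-c}{2} \right)^{n+2} f^{(n+1)}(\xi)\, \frac{R_n^{[-1,1]}\left( x^{n+1} \right)}{(n+1)!},
\end{align*}
where the last step uses Peano's error representation for the function $x^{n+1}$, whose $(n+1)$st derivative is the constant $(n+1)!$.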
Note that the proof of Theorem 13.20 (and hence its conclusion) will work for any linear integration formula which can be moved into the integral, which gives exact results for polynomials up to degree n and for which the Peano kernel does not change sign. The reader will verify this for the Midpoint Rule in Exercise 13-25. With Theorem 13.20 providing a bound for the error for individual Newton-Cotes formulas, bounds for the error for composite Newton-Cotes formulas are now an easy consequence.
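Corollary 13.21 Let $n \in \mathbb{N}$, $a < b$, $\delta > 0$ and let $f : (a - \delta, b + \delta) \to \mathbb{R}$ be a function. If $n$ is odd let $f$ be $n + 1$ times continuously differentiable and let $M := \max\left\{ \left| f^{(n+1)}(x) \right| : x \in [a,b] \right\}$. If $n$ is even let $f$ be $n + 2$ times continuously differentiable and let $M := \max\left\{ \left| f^{(n+2)}(x) \right| : x \in [a,b] \right\}$. Let $C_n^{[a,b]}(f)$ be the absolute value of the error when approximating the integral $\displaystyle\int_a^b f(t)\, dt$ with the $n$th composite Newton-Cotes formula with $N = nj$ intervals, that is, $N + 1 = nj + 1$ evaluations, where $j \in \mathbb{N}$. Then if $n$ is odd we have
\[ C_n^{[a,b]}(f) \le M\, \frac{(b-a)^{n+2}}{N^{n+1}}\, \frac{n^{n+1} \left| R_n^{[-1,1]}\left( x^{n+1} \right) \right|}{(n+1)!\, 2^{n+2}}, \]
and if $n$ is even we have
\[ C_n^{[a,b]}(f) \le M\, \frac{(b-a)^{n+3}}{N^{n+2}}\, \frac{n^{n+2} \left| R_n^{[-1,1]}\left( x^{n+2} \right) \right|}{(n+2)!\, 2^{n+3}}. \]

Proof. If $n$ is odd, then each of the $j = \frac{N}{n}$ applications of the $n$th Newton-Cotes formula is on an interval of length $\frac{n(b-a)}{N}$, so by Theorem 13.20 we obtain
\[ C_n^{[a,b]}(f) \le \frac{N}{n} \left( \frac{n(b-a)}{2N} \right)^{n+2} M\, \frac{\left| R_n^{[-1,1]}\left( x^{n+1} \right) \right|}{(n+1)!} = M\, \frac{(b-a)^{n+2}}{N^{n+1}}\, \frac{n^{n+1} \left| R_n^{[-1,1]}\left( x^{n+1} \right) \right|}{(n+1)!\, 2^{n+2}} \]
and a similar computation gives the result for even $n$.

Note that in the error bounds in Corollary 13.21 only $M$ depends on the function $f$, only $(b - a)$ depends on the interval and only $N$ depends on the number of evaluations. The remainder, although it looks complicated, is merely a constant. Therefore, as $N \to \infty$ the numerical approximation of an integral of an $n + 1$ times (if $n$ is odd) or $n + 2$ times (if $n$ is even) continuously differentiable function with the $n$th composite Newton-Cotes formula will converge to the actual integral. Of course, more concrete formulas will be better for given fixed $n$. For the trapezoidal rule and Simpson's Rule, we obtain the error bounds indicated below.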
Corollary 13.22 Let $f : (a - \delta, b + \delta) \to \mathbb{R}$ be a function.

1. If $f$ is twice continuously differentiable, we have $C_1^{[a,b]}(f) \le K \dfrac{(b-a)^3}{12 N^2}$, where $K = \max\left\{ |f''(x)| : x \in [a,b] \right\}$. This is the error formula for the trapezoidal rule with $N$ intervals.
2. If $f$ is four times continuously differentiable, we have $C_2^{[a,b]}(f) \le C \dfrac{(b-a)^5}{180 N^4}$, where $C = \max\left\{ \left| f^{(4)}(x) \right| : x \in [a,b] \right\}$. This is the error formula for Simpson's Rule with $N$ intervals.

Proof. For $n = 1$, we obtain
\[ R_1^{[-1,1]}\left( x^2 \right) = \left( (-1)^2 + 1^2 \right) - \int_{-1}^{1} x^2\, dx = 2 - \frac{2}{3} = \frac{4}{3}, \qquad \text{so} \qquad \frac{1^{2} \cdot \frac{4}{3}}{2!\, 2^{3}} = \frac{1}{12}, \]
which proves $C_1^{[a,b]}(f) \le K \dfrac{(b-a)^3}{12 N^2}$. The computation for Simpson's Rule is similar and left to the reader as Exercise 13-26.

If the function, the interval and $N$ are given, the a posteriori error estimate is simply a substitution into the formula (see Exercise 13-27). If the function, the interval and a desired accuracy are given, the a priori estimate is obtained by demanding the error bound is less than the desired accuracy and solving for $N$.
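Example 13.23 Find the number of intervals needed to estimate $\displaystyle\int_{-1}^{1} e^{-\frac{x^2}{2}}\, dx$ with the trapezoidal rule so that the error is less than $10^{-4}$.

First, note that with $f(x) = e^{-\frac{x^2}{2}}$ we have the derivatives $f'(x) = -x e^{-\frac{x^2}{2}}$ and $f''(x) = \left( x^2 - 1 \right) e^{-\frac{x^2}{2}}$. The maximum of $|f''|$ on the interval is $K = 1$, attained at $x = 0$. Hence $K \dfrac{(b-a)^3}{12 N^2} \le 10^{-4}$ means $\dfrac{2^3}{12 N^2} \le 10^{-4}$, that is, $N^2 \ge \dfrac{2}{3} \cdot 10^4$. Hence, for $N$ we obtain $N \ge 100 \sqrt{\dfrac{2}{3}} \approx 81.65$ and so $N = 82$ would work. Note that in the last step, we always need to round up.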
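The a priori estimate in Example 13.23 amounts to solving the error bounds of Corollary 13.22 for $N$ and rounding up. A minimal Python sketch follows; the function names are hypothetical, and the bounds $K$ and $C$ for the second and fourth derivative are assumed to be supplied by the user.

```python
import math

def trapezoid_intervals(K, a, b, tol):
    """Smallest N with K (b - a)^3 / (12 N^2) <= tol."""
    return math.ceil(math.sqrt(K * (b - a) ** 3 / (12 * tol)))

def simpson_intervals(C, a, b, tol):
    """Smallest even N with C (b - a)^5 / (180 N^4) <= tol."""
    N = math.ceil((C * (b - a) ** 5 / (180 * tol)) ** 0.25)
    return N if N % 2 == 0 else N + 1   # Simpson's Rule needs an even N

if __name__ == "__main__":
    # K = 1 and an interval of length 2, as in Example 13.23.
    print(trapezoid_intervals(1.0, 0.0, 2.0, 1e-4))
```

For $K = 1$, an interval of length 2 and the tolerance $10^{-4}$ the first function returns 82, in agreement with the example. Because floating point rounding can matter when the exact quotient is an integer, a careful implementation would double check the returned $N$ against the bound.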
Exercises

13-20. Compute the coefficients $a_i$ in the Newton-Cotes formulas for $n = 7$. Use a computer.

13-21. For each given $n$ compute $a_0, \dots, a_n$ for the Newton-Cotes formula. Then state the corresponding composite integration formula for $N$ divisible by $n$. Finally, compute the error for the Newton-Cotes formula and for the composite integration formula. For the computation of the $a_i$, a computer is recommended.
(a) $n = 3$. This is called the $\frac{3}{8}$-rule.
(b) $n = 4$. This is called Milne's Rule.
(c) $n = 5$. This rule does not have a specific name.
(d) $n = 6$. This is called Weddle's Rule.

13-22. Prove as follows that for even $n$ the $n$th Newton-Cotes formula gives accurate results for polynomials of degree $n + 1$.
(a) Use the linearity property to prove that if the $n$th Newton-Cotes formula gives the exact integral for one polynomial of degree $n + 1$, then it gives the exact integral for all of them.
(b) Prove that the coefficients $a_i$ of the $n$th Newton-Cotes formula satisfy $a_i = a_{n-i}$ for all $i \in \{0, \dots, n\}$.
(c) Prove that if $n$ is even, the $n$th Newton-Cotes formula gives the exact integral for the polynomial $p(x) = \left( x - \frac{c+d}{2} \right)^{n+1}$ and conclude that the $n$th Newton-Cotes formula gives the exact integral for all polynomials of degree $n + 1$.
(d) To illustrate that the result does not work for odd $n$, prove that the first Newton-Cotes formula does not give the integral of $f(x) = x^2$ on $[-1, 1]$.

13-23. Prove that for all $t \in \left[ -\frac{d-c}{2}, \frac{d-c}{2} \right]$ we have $K_n^{[c,d]}\left( t + \frac{c+d}{2} \right) = K_n^{\left[ -\frac{d-c}{2}, \frac{d-c}{2} \right]}(t)$.

13-24. The sign of the Peano kernel for the Newton-Cotes formulas
(a) Explain why proving that the Peano kernel $K_n^{[-1,1]}$ for the $n$th Newton-Cotes formula on $[-1, 1]$ does not change its sign on $[-1, 1]$ proves that all Peano kernels $K_n^{[c,d]}$ for the $n$th Newton-Cotes formula on $[c,d]$ do not change signs on $[c,d]$.
(b) Prove that for $n = 1$ the Peano kernel $K_1^{[-1,1]}$ is nonnegative on $[-1, 1]$. Hint. Direct computation of the integrals, using that the approximating polynomial is a straight line.
(c) Prove that for $n = 2$ the Peano kernel $K_2^{[-1,1]}$ is nonnegative on $[-1, 1]$. Hint. This computation is more tedious. Make sure you use $(x - t)^3 \mathbf{1}_{x - t \ge 0}$ and do separate computations for $t > 0$ and $t \le 0$.
(d) In a computer algebra system implement a short program that graphs the Peano kernel $K_n^{[-1,1]}$ on $[-1, 1]$ for arbitrary $n$. Note. While these graphs do not formally prove that the Peano kernel does not change its sign, the graphs and the implementation are instructive.

13-25. Let $f : [a,b] \to \mathbb{R}$ be twice continuously differentiable. Prove that when approximating $\displaystyle\int_a^b f(x)\, dx$ with the midpoint rule, which uses Riemann sums with evaluation in the middle of the interval to approximate the integral, the error is bounded by $K \dfrac{(b-a)^3}{24 n^2}$, where $K = \max\left\{ |f''(x)| : a \le x \le b \right\}$. Hint. Use Peano's representation of the error and use the fact that the midpoint rule is accurate for polynomials of degree $\le 1$. Then emulate the rest of the proof for the Newton-Cotes formulas, including the proof that the Peano kernel does not change its sign.
13-26. Prove part 2 of Corollary 13.22.
13-27. In each part, give an upper bound for the error of the approximation of the given integral with the given rule and the given number of intervals.
e - g dx,trapezoidal rule, N = 20
(a) (b) (c) (dj
/-:
e - g dx,Simpson’s Rule, N = 20
l5
sin ( x2 ) d x , trapezoidal rule, N = 50
s,’
sin ( x2 ) d x , Simpson’s Rule, N = 50
13-28. Compute the number $N$ of intervals needed to approximate the integral with the indicated rule so that the error is at most the given $\nu$.
(a) (b) (c)
l:
e - g dx,Simpson’s Rule, u 5
l2 i4
dx,trapezoidal rule, u i. lo-’
In(xj& d x , Simpson’s Rule, v i. lo-*
13-29. Approximate the integral with the indicated rule so that the error is at most the given $\nu$. Hint. Use the error bounds in Corollary 13.22 to compute the number of intervals. Then compute the requisite sum with a computer.
(a) (b) (c)
ll Ll‘ L2 1
1 2/5;;e
-2 2 d x , trapezoidal rule. u 5 lo-*
1 2 Z e - T dx,Simpson’s Rule, v 5
2
z
sin
(e)
1
e
(:)
-2 2 d x , Simpson’s Rule, u 5 lops
dx,trapezoidal rule, u 5
10
(0
1‘
f i e X d x , Simpson’s Rule, v
5
I
13-30. Let the function $f : [a,b] \to \mathbb{R}$ be continuously differentiable. Use Euler's Summation Formula (see Exercise 12-21) and an appropriate substitution to prove that the approximations of the integral $\int_a^b f(x)\,dx$ with the trapezoidal rule with $N$ trapezoids converge to $\int_a^b f(x)\,dx$ as $N \to \infty$.
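As a numerical companion to Exercise 13-25, the following minimal Python sketch implements the midpoint rule and compares the actual error with the bound $K(b-a)^3/(24n^2)$ stated there. The function names `midpoint_rule` and `midpoint_error_bound` and the test integrand $f(x) = x^2$ on $[0,1]$ are illustrative choices, not taken from the text.

```python
def midpoint_rule(f, a, b, n):
    """Riemann sum with n equal subintervals, evaluating f at each midpoint."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def midpoint_error_bound(K, a, b, n):
    """Bound K (b - a)^3 / (24 n^2) from Exercise 13-25, where K >= max |f''|."""
    return K * (b - a) ** 3 / (24 * n ** 2)


if __name__ == "__main__":
    f = lambda x: x * x      # f'' = 2 everywhere, so K = 2 (illustrative choice)
    exact = 1.0 / 3.0        # integral of x^2 over [0, 1]
    for n in (1, 10, 100):
        approx = midpoint_rule(f, 0.0, 1.0, n)
        print(n, abs(exact - approx), midpoint_error_bound(2.0, 0.0, 1.0, n))
```

Because the second derivative of $x^2$ is constant, the printed error and bound should agree up to rounding, which suggests that the constant 24 in the bound cannot be improved in general.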
Part II Analysis in Abstract Spaces
Chapter 14
Integration on Measure Spaces

Throughout this second part of the text, we need to integrate multivariable functions. Therefore we start our investigation of the more abstract realms of analysis with integration. Recall that the fundamental idea behind Lebesgue integration was the partition of the range (see Figure 22). Moreover, when we worked with Lebesgue measure, we were concerned with properties stated in terms of sets. We rarely used the fact that these sets were subsets of the real numbers. Therefore integration of functions from a more general space to $[-\infty, \infty]$ should be similar to the theory developed in Chapter 9. Indeed, Sections 14.1-14.4 basically recast and sort the results of Chapter 9 to show how these ideas generalize to arbitrary measure spaces. In particular, this generalization makes it possible to talk about Lebesgue integration in $\mathbb{R}^d$. Sections 14.5 and 14.6 provide fundamental results on sequences of integrable functions. These results are the cornerstone for the proofs of many important facts about integrable functions. Finally, Sections 14.7 and 14.8 show how measures on products (such as $\mathbb{R}^2 = \mathbb{R} \times \mathbb{R}$) are related to measures on the factors.

In this chapter, we experience the full power of abstraction for the first time. Once the abstract core of concrete results for the real numbers is identified, the familiar results from the real numbers can be established in much more general contexts, sometimes even with the same proof. It is important to realize, however, that this generalization comes at a price. We must very carefully check that we did not use any specific properties of the real line in the proof of the concrete result. The most important property of the real line that is lost in the general setting is the linear ordering. Proofs and results that depend on this ordering do not generalize easily. This is why the linear ordering of the real line was used sparingly in the first part of the text.
14.1 Measure Spaces

To be most widely applicable, abstract integration is defined on sets equipped with a structure that makes them a “measure space.” In this fashion, we do not need to worry about details regarding the shape and dimension of the domain. The fundamental idea for integration in arbitrary spaces is the same as for Lebesgue integration. Partition the range and approximate the “area” or “volume” under the function with “generalized rectangles” or “generalized boxes” whose bases are sets for which we can determine the “measure” (see Figure 29). It was noted after Definition 9.4 that even on the real line there are sets for which we cannot determine a sensible “measure” using outer Lebesgue measure. Thus, it is not surprising that on a general set $M$ we need to consider the subset of the power set $\mathcal{P}(M)$ that contains all subsets for which we can determine the “measure.” The properties that define these subsets are directly inspired by properties of Lebesgue measurable sets (see Theorem 9.10).

Figure 29: For integration in more complicated spaces than the real line, we retain the main idea from Lebesgue integration. We partition the range (here the $z$-axis) and measure the size of the inverse images $A_k$ of intervals in the range (the $z$-axis). The sum of these sizes times the corresponding heights should approximate the “volume” under the graph of the function.
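Definition 14.1 Let $M$ be a set. A subset $\Sigma \subseteq \mathcal{P}(M)$ is called a sigma algebra or $\sigma$-algebra iff
1. $\emptyset \in \Sigma$,
2. If $S \in \Sigma$, then $S' \in \Sigma$,
3. If $A_n \in \Sigma$ for all $n \in \mathbb{N}$, then $\bigcup_{n=1}^{\infty} A_n \in \Sigma$.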
Our most important examples of $\sigma$-algebras so far are the power set itself and the set of Lebesgue measurable subsets of $\mathbb{R}$.

Example 14.2
1. Let $M$ be a set. Then the power set $\mathcal{P}(M)$ of $M$ is a $\sigma$-algebra.
2. The set $\Sigma_\lambda$ of Lebesgue measurable subsets of $\mathbb{R}$ is a $\sigma$-algebra.
Part 1 is trivial and part 2 is Theorem 9.10. □

More examples of $\sigma$-algebras are given in Exercise 14-1 and Theorem 14.25. Because $\sigma$-algebras are newly introduced entities, we need to prove that the properties we know from Lebesgue measurable sets also hold in this general context.
Proposition 14.3 Let $M$ be a set and let $\Sigma \subseteq \mathcal{P}(M)$ be a $\sigma$-algebra. If for every $n \in \mathbb{N}$ we have $A_n \in \Sigma$, then $\bigcap_{n=1}^{\infty} A_n \in \Sigma$.
Proof. For each $n \in \mathbb{N}$ by part 2 of Definition 14.1, we infer $M \setminus A_n \in \Sigma$. By part 3 of Definition 14.1, we obtain $\bigcup_{n=1}^{\infty} M \setminus A_n \in \Sigma$. By DeMorgan's Laws, this means that $M \setminus \bigcap_{n=1}^{\infty} A_n \in \Sigma$. Therefore by part 2 of Definition 14.1 $\bigcap_{n=1}^{\infty} A_n \in \Sigma$, which concludes the proof. ■
Proposition 14.4 Let $M$ be a set, let $\Sigma \subseteq \mathcal{P}(M)$ be a $\sigma$-algebra, let $N \in \mathbb{N}$ and let $A_1, \ldots, A_N \in \Sigma$. Then $\bigcup_{n=1}^{N} A_n \in \Sigma$.
Proof. For $n \ge N+1$, let $A_n := \emptyset$. Then $\bigcup_{n=1}^{N} A_n = \bigcup_{n=1}^{\infty} A_n \in \Sigma$. ■
Further properties of $\sigma$-algebras are presented in Exercise 14-2. Recall that for the definition of Lebesgue measurable functions we never referred to the measure itself. Thus for some purposes it will be sufficient to work with a set and its measurable subsets. This is the idea behind a measurable space.

Definition 14.5 A measurable space is a pair $(M, \Sigma)$ consisting of a set $M$ and a $\sigma$-algebra $\Sigma \subseteq \mathcal{P}(M)$. The sets in $\Sigma$ will also be called $\Sigma$-measurable sets.

Finally, a measure is a function that assigns each measurable set its “measure.” The only conditions for a sensible “measure” function are that the empty set has no volume and that the volumes of pairwise disjoint sets are added to obtain the volume of their union.
Definition 14.6 Let $(M, \Sigma)$ be a measurable space. Then $\mu : \Sigma \to [0, \infty]$ is called a measure iff
1. $\mu(\emptyset) = 0$, and
2. $\mu$ is countably additive, that is, if $\{A_n\}_{n=1}^{\infty}$ is a sequence of pairwise disjoint sets $A_n \in \Sigma$, then $\mu\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} \mu(A_n)$.
The definition of a measure space connects the “measurable sets” with a function that assigns the measure.
Definition 14.7 A measure space is a triple $(M, \Sigma, \mu)$ consisting of a set $M$, a sigma algebra $\Sigma \subseteq \mathcal{P}(M)$ and a measure $\mu : \Sigma \to [0, \infty]$.

Example 14.8 With $\lambda$ denoting outer Lebesgue measure and $\Sigma_\lambda$ denoting the sigma algebra of Lebesgue measurable sets, $(\mathbb{R}, \Sigma_\lambda, \lambda)$ is a measure space. We have already noted that $\Sigma_\lambda$ is a $\sigma$-algebra. Trivially $\lambda(\emptyset) = 0$, and the countable additivity of Lebesgue measure is given by Theorem 9.11. □

Example 14.9 Let $M$ be a set. For $A \subseteq M$, we define the counting measure $\gamma_M(A)$ to be the number of elements of $A$ if $A$ is finite and $\infty$ if $A$ is infinite. Then $(M, \mathcal{P}(M), \gamma_M)$ is a measure space. The reader will prove this in Exercise 14-3. Note that counting measure will allow us to model absolutely convergent series as integrals in Examples 14.36 and 14.37 below. □

In Chapter 9, we defined Lebesgue integration over $\mathbb{R}$, but we never formally defined Lebesgue integration over subsets of $\mathbb{R}$ such as intervals. The formalism of measure spaces allows us to easily fill this gap. Every measurable subset of a measure space can be equipped with the structure of a measure space.
Example 14.10 Let $(M, \Sigma, \mu)$ be a measure space and let $\Omega \in \Sigma$ be measurable. Let $\Sigma^\Omega := \{ S \in \Sigma : S \subseteq \Omega \}$ and let $\mu_\Omega := \mu|_{\Sigma^\Omega}$. Then $(\Omega, \Sigma^\Omega, \mu_\Omega)$ is a measure space. The reader will prove this in Exercise 14-4. □

Because measure spaces are newly introduced, we must prove their properties “from scratch.” Specifically, we cannot use familiar properties of Lebesgue measure without first proving them for measure spaces. Nonetheless, the next three propositions should be familiar from Lebesgue measure.
Proposition 14.11 Let $(M, \Sigma, \mu)$ be a measure space and let $A_1, \ldots, A_N$ be pairwise disjoint sets in $\Sigma$. Then $\mu\left(\bigcup_{n=1}^{N} A_n\right) = \sum_{n=1}^{N} \mu(A_n)$.
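Proof. By Proposition 14.4, we have that $\bigcup_{n=1}^{N} A_n \in \Sigma$. For $n \ge N+1$, let $A_n := \emptyset$. Then the sets $A_n$ are pairwise disjoint and, because $\mu(\emptyset) = 0$, countable additivity gives $\mu\left(\bigcup_{n=1}^{N} A_n\right) = \mu\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} \mu(A_n) = \sum_{n=1}^{N} \mu(A_n)$. ■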
Proposition 14.12 Let $(M, \Sigma, \mu)$ be a measure space and let $A, B \in \Sigma$ with $A \subseteq B$. Then $\mu(A) \le \mu(B)$.

Proof. By Exercise 14-2b we have $B \setminus A \in \Sigma$. Therefore $\mu(A) \le \mu(A) + \mu(B \setminus A) = \mu\bigl(A \cup (B \setminus A)\bigr) = \mu(B)$. ■
Definition 14.13 Let $(M, \Sigma, \mu)$ be a measure space. Then $A \in \Sigma$ is called a set of measure zero or a null set iff $\mu(A) = 0$. A property is said to hold almost everywhere in $M$ iff the subset of $M$ for which the property does not hold is of measure zero. Almost everywhere is also abbreviated as a.e.

Proposition 14.14 Let $(M, \Sigma, \mu)$ be a measure space and for all $n \in \mathbb{N}$ let the set $A_n \in \Sigma$ be a null set. Then $\bigcup_{n=1}^{\infty} A_n$ also is a null set.
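Proof. By Proposition 14.12, subsets of null sets are null sets also. For $n \in \mathbb{N}$, define $B_n := A_n \setminus \bigcup_{j=1}^{n-1} A_j$. Then each $B_n$ is a null set, the sets $B_n$ are pairwise disjoint, and $\bigcup_{n=1}^{\infty} B_n = \bigcup_{n=1}^{\infty} A_n$. But $\mu\left(\bigcup_{n=1}^{\infty} B_n\right) = \sum_{n=1}^{\infty} \mu(B_n) = 0$, which establishes the result. ■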
We conclude this section with a result that shows that the measure of the union of a nested sequence of sets is equal to the limit of the nondecreasing sequence of the measures of the sets.

Theorem 14.15 Let $(M, \Sigma, \mu)$ be a measure space and let $\{A_n\}_{n=1}^{\infty}$ be a sequence of sets in $\Sigma$ so that $A_n \subseteq A_{n+1}$ for all $n \in \mathbb{N}$. Then $\mu\left(\bigcup_{n=1}^{\infty} A_n\right) = \lim_{n \to \infty} \mu(A_n)$.

Proof. Mimic the proof of Theorem 9.12. (Exercise 14-5.) ■
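On a finite set, the definitions of this section can be checked by brute force. The following minimal Python sketch generates the smallest family of subsets that contains some given sets and is closed under complements and unions, and it evaluates counting measure on that family. The names `generate_sigma_algebra` and `counting_measure` are hypothetical helpers introduced here for illustration. On a finite set a family is finite, so closure under finite unions already gives closure under countable unions.

```python
from itertools import combinations


def generate_sigma_algebra(M, generators):
    """Smallest family containing the generators that is closed under
    complement and (finite) union; on a finite set M this is a sigma-algebra."""
    M = frozenset(M)
    family = {frozenset(), M} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for S in current:                      # close under complements
            if M - S not in family:
                family.add(M - S)
                changed = True
        for S, T in combinations(current, 2):  # close under unions
            if S | T not in family:
                family.add(S | T)
                changed = True
    return family


def counting_measure(A):
    """Counting measure of a finite set: its number of elements."""
    return len(A)


if __name__ == "__main__":
    sigma = generate_sigma_algebra({1, 2, 3, 4}, [{1}, {2}])
    for S in sorted(sigma, key=lambda S: (len(S), sorted(S))):
        print(sorted(S), counting_measure(S))
```

In this example the generated family has the three “atoms” $\{1\}$, $\{2\}$ and $\{3,4\}$, so it should consist of the $2^3 = 8$ unions of atoms. The set $\{3\}$ is not in the family.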
Exercises
14-1. Let $M$ be a set. Prove that the set of all subsets $S \subseteq M$ so that $S$ is countable or $M \setminus S$ is countable is a $\sigma$-algebra.
14-2. Let $M$ be a set and let $\Sigma \subseteq \mathcal{P}(M)$ be a $\sigma$-algebra.
(a) Prove that if $A_1, \ldots, A_N \in \Sigma$, then $\bigcap_{n=1}^{N} A_n \in \Sigma$.
(b) Prove that if $A, B \in \Sigma$, then $A \setminus B \in \Sigma$.
14-3. Counting measure. Let $M$ be a set and let $\gamma_M : \mathcal{P}(M) \to [0, \infty]$ be counting measure on $M$.
(a) Prove that $\gamma_M(A) = 0$ iff $A = \emptyset$.
(b) Prove that $\gamma_M$ is a measure.
14-4. Measures on subsets. Let $(M, \Sigma, \mu)$ be a measure space, let $\Omega \in \Sigma$, let $\Sigma^\Omega := \{ S \in \Sigma : S \subseteq \Omega \}$ and let $\mu_\Omega := \mu|_{\Sigma^\Omega}$.
(a) Prove that $\Sigma^\Omega$ is a $\sigma$-algebra.
(b) Prove that $\mu_\Omega$ is a measure.
14-5. Prove Theorem 14.15.
14-6. Let $(M, \Sigma, \mu)$ be a measure space and let $\{A_n\}_{n=1}^{\infty}$ be a sequence of sets in $\Sigma$. Prove the inequality
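14-7. Let $M$ be a set. Prove that a subset $\Sigma \subseteq \mathcal{P}(M)$ is a $\sigma$-algebra iff $\emptyset \in \Sigma$; if $S \in \Sigma$, then $S' \in \Sigma$; and if $A_n \in \Sigma$ for all $n \in \mathbb{N}$, then $\bigcap_{n=1}^{\infty} A_n \in \Sigma$.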
14-8. Let $(M, \Sigma, \mu)$ be a measure space and let $A, B \in \Sigma$. Prove $\mu(A) - \mu(B) = \mu(A \setminus B) - \mu(B \setminus A)$.
14-9. The measure of nested intersections.
(a) Let $(M, \Sigma, \mu)$ be a measure space and let $\{A_n\}_{n=1}^{\infty}$ be a sequence of sets in $\Sigma$ such that for all $n \in \mathbb{N}$ we have $A_n \supseteq A_{n+1}$ and such that for some $m \in \mathbb{N}$ we have $\mu(A_m) < \infty$. Prove $\mu\left(\bigcap_{n=1}^{\infty} A_n\right) = \lim_{n \to \infty} \mu(A_n)$.
(b) Show that the condition $\mu(A_m) < \infty$ for some $m \in \mathbb{N}$ cannot be dropped. Hint. Let $\gamma_{\mathbb{N}}$ be counting measure on $\mathbb{N}$ and let $A_n := \{ i \in \mathbb{N} : i \ge n \}$.
14-10. Let $(M, \Sigma, \mu)$ be a measure space.
(a) Prove that $\Sigma_\mu := \{ A \subseteq M : (\exists E, F \in \Sigma : E \subseteq A \subseteq F, \mu(F \setminus E) = 0) \}$ is a $\sigma$-algebra that contains $\Sigma$.
(b) Prove that for all $A \in \Sigma_\mu$ and all $E, F \in \Sigma$ with $E \subseteq A \subseteq F$ and $\mu(F \setminus E) = 0$ we have $\mu(E) = \mu(F)$.
(c) For all $A \in \Sigma_\mu$, let $E, F \in \Sigma$ with $E \subseteq A \subseteq F$ and $\mu(F \setminus E) = 0$ and define $\bar{\mu}(A) := \mu(F)$. Prove that $\bar{\mu} : \Sigma_\mu \to [0, \infty]$ is a measure. (You also need to show that $\bar{\mu}$ is well-defined.)
(d) Prove that for all $B \in \Sigma$ we have $\bar{\mu}(B) = \mu(B)$.
(e) Prove that the measure space $(M, \Sigma_\mu, \bar{\mu})$ is complete. That is, prove that if $N \in \Sigma_\mu$ is so that $\bar{\mu}(N) = 0$ and $S \subseteq N$, then $S \in \Sigma_\mu$.
The $\sigma$-algebra $\Sigma_\mu$ is also called the completion of $\Sigma$ and $\bar{\mu}$ is also called the completion of $\mu$.
14.2 Outer Measures

Although not all subsets of the real numbers are Lebesgue measurable, outer Lebesgue measure is defined for all subsets of $\mathbb{R}$. The idea of an outer measure can be transplanted to an abstract space. This section shows that outer measures produce a measure space similar to how outer Lebesgue measure produced a measure space. In particular, we will obtain a Lebesgue measure on $d$-dimensional space.
Definition 14.16 Let $M$ be a set. Then $\mu : \mathcal{P}(M) \to [0, \infty]$ is called an outer measure iff
1. $\mu(\emptyset) = 0$,
2. If $A \subseteq B$, then $\mu(A) \le \mu(B)$,
3. The function $\mu$ is countably subadditive, that is, for all sequences $\{A_n\}_{n=1}^{\infty}$ of sets in $M$ we have $\mu\left(\bigcup_{n=1}^{\infty} A_n\right) \le \sum_{n=1}^{\infty} \mu(A_n)$.
By Theorem 8.6, outer Lebesgue measure on $\mathbb{R}$ is an outer measure. To integrate in $\mathbb{R}^d$ (and via Example 14.10 on subsets $\Omega \subseteq \mathbb{R}^d$) we need to define an outer measure on $\mathbb{R}^d$ that is similar to outer Lebesgue measure on $\mathbb{R}$. The definition is very much the same as for the real line, except that instead of open intervals we use $d$-dimensional boxes. The outer measure defined in this way is also called outer Lebesgue measure and it is also denoted by $\lambda$.
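Definition 14.17 Let $d \ge 1$ and for $i = 1, \ldots, d$ let $a_i < b_i$. Then a set of the form $B := \prod_{i=1}^{d} (a_i, b_i)$ is called an open box in $\mathbb{R}^d$. We define $|B| := \prod_{i=1}^{d} (b_i - a_i)$. For a set $S \subseteq \mathbb{R}^d$ we define the outer Lebesgue measure of $S$ to be
$$\lambda(S) := \inf \left\{ \sum_{j=1}^{\infty} |B_j| : S \subseteq \bigcup_{j=1}^{\infty} B_j, \text{ each } B_j \text{ is an open box in } \mathbb{R}^d \right\},$$
where we set $\lambda(S) = \infty$ if none of the series in the set on the right converge.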
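To see the infimum over coverings at work, consider the standard argument that a countable set has outer Lebesgue measure zero: the $j$th point is covered by an open box of volume $\varepsilon / 2^j$, so the total volume stays below $\varepsilon$. The Python sketch below carries this out for finitely many points. The function names `covering_boxes` and `box_volume` and the sample points are illustrative assumptions, not part of the text.

```python
def covering_boxes(points, eps):
    """For each point p_j (j = 1, 2, ...) return an open box centered at p_j
    with volume eps / 2**j, given as a list of (a_i, b_i) intervals."""
    boxes = []
    for j, p in enumerate(points, start=1):
        d = len(p)
        side = (eps / 2 ** j) ** (1.0 / d)   # side**d equals eps / 2**j
        boxes.append([(x - side / 2, x + side / 2) for x in p])
    return boxes


def box_volume(box):
    """|B| = product of (b_i - a_i), as in Definition 14.17."""
    vol = 1.0
    for a, b in box:
        vol *= b - a
    return vol


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 2.0), (3.0, -1.0)]   # sample points in R^2
    boxes = covering_boxes(pts, eps=0.01)
    print(sum(box_volume(B) for B in boxes))      # stays below eps
```

Since $\sum_{j=1}^{\infty} \varepsilon / 2^j = \varepsilon$, the same choice of boxes works for a countably infinite set, and letting $\varepsilon$ shrink shows that the infimum in Definition 14.17 is zero.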
It would be nice to show here, similar to Proposition 8.5, that the outer Lebesgue measure of a box is exactly the volume of the box. We postpone this result to Theorem 16.81, where we can use compactness for an efficient proof. The argument that outer Lebesgue measure on $\mathbb{R}^d$ defines a measure on its measurable sets is exactly the same as for $\mathbb{R}$. Thus the following results and the definition of measurable sets can be seen as a recap of the appropriate parts of Section 9.1.

Theorem 14.18 Outer Lebesgue measure on $\mathbb{R}^d$ is an outer measure.

Proof. Mimic the proof of Theorem 8.6. (Exercise 14-11.) ■

The motivation and definition of measurable sets in the abstract setting are the same as for Lebesgue measure (see Definition 9.4). We must rule out the possibility that the sum of the measures of the pairwise disjoint measurable parts of a set differs from the measure of the whole measurable set.
Definition 14.19 Let $M$ be a set and let $\mu : \mathcal{P}(M) \to [0, \infty]$ be an outer measure. A subset $S \subseteq M$ is called $\mu$-measurable iff for all $T \subseteq M$ we have that $\mu(T) = \mu(S \cap T) + \mu(S' \cap T)$. The set of $\mu$-measurable subsets of $M$ is denoted $\Sigma_\mu$.
As in Definition 9.5 the Lebesgue measure is obtained by restricting outer Lebesgue measure to the set $\Sigma_\lambda$ of Lebesgue measurable sets.

Definition 14.20 Let $S \subseteq \mathbb{R}^d$ be a Lebesgue measurable set. Then the outer Lebesgue measure $\lambda(S)$ of $S$ is also called the Lebesgue measure of $S$.

As before, “half” of the equality for measurability is always satisfied. Moreover, the proof that outer measures induce measure spaces runs along the same lines as the proofs of the corresponding results in Section 9.1.
Proposition 14.21 Let $M$ be a set and let $\mu : \mathcal{P}(M) \to [0, \infty]$ be an outer measure. For all subsets $S, T \subseteq M$, we have $\mu(T) \le \mu(S \cap T) + \mu(S' \cap T)$.

Proof. Mimic the proof of Corollary 9.6. (Exercise 14-12.) ■

Proposition 14.22 Let $M$ be a set and let $\mu : \mathcal{P}(M) \to [0, \infty]$ be an outer measure. If $\mu(S) = 0$, then $S$ is $\mu$-measurable.

Proof. Mimic the proof of Proposition 9.7. (Exercise 14-13.) ■
Lemma 14.23 Let $M$ be a set and let $\mu : \mathcal{P}(M) \to [0, \infty]$ be an outer measure. If $A$ and $B$ are $\mu$-measurable, then the intersection $A \cap B$ is $\mu$-measurable.

Proof. Mimic the proof of Lemma 9.8. (Exercise 14-14.) ■

Lemma 14.24 Let $M$ be a set, let $\mu : \mathcal{P}(M) \to [0, \infty]$ be an outer measure and let $\{A_n\}_{n=1}^{\infty}$ be a sequence of pairwise disjoint $\mu$-measurable sets. Then $\bigcup_{n=1}^{\infty} A_n$ is $\mu$-measurable.

Proof. Mimic the proof of Lemma 9.9. (Exercise 14-15.) ■

Theorem 14.25 Let $M$ be a set, let $\mu : \mathcal{P}(M) \to [0, \infty]$ be an outer measure and let $\Sigma_\mu$ be the set of $\mu$-measurable sets. Then $(M, \Sigma_\mu, \mu)$ is a measure space.

Proof. Mimic the proof of Theorem 9.10 to prove that $\Sigma_\mu$ is a $\sigma$-algebra and then mimic the proof of Theorem 9.11 to prove that $\mu$ is countably additive on $\Sigma_\mu$. (Exercise 14-16.) ■

In particular, Theorem 14.25 shows that the triple $(\mathbb{R}^d, \Sigma_\lambda, \lambda)$ is a measure space. Lebesgue measure is the standard measure for integration in $d$-dimensional space. Thus, as we proceed to define measurable and integrable functions, we are constructing a theory that allows us to integrate on $\mathbb{R}^d$ and its subsets.
Exercises
14-11. Prove Theorem 14.18.
14-12. Prove Proposition 14.21.
14-13. Prove Proposition 14.22.
14-14. Prove Lemma 14.23.
14-15. Prove Lemma 14.24.
14-16. Prove Theorem 14.25.
14-17. Let $d \ge 1$ and for $i = 1, \ldots, d$ let the numbers $a_i < b_i$ be dyadic rational numbers. Then a set of the form $D := \prod_{i=1}^{d} (a_i, b_i)$ is called a dyadic open box in $\mathbb{R}^d$. Prove that for any set $S \subseteq \mathbb{R}^d$ we have
$$\lambda(S) = \inf \left\{ \sum_{j=1}^{\infty} |D_j| : S \subseteq \bigcup_{j=1}^{\infty} D_j, \text{ each } D_j \text{ is a dyadic open box in } \mathbb{R}^d \right\}.$$
14-18. Prove that if $A, B \subseteq \mathbb{R}$ and $\lambda(A) = 0$ (where $\lambda$ is Lebesgue measure on $\mathbb{R}$), then $\lambda(A \times B) = 0$ (where $\lambda$ is Lebesgue measure on $\mathbb{R}^2$).
14-19. Looking at Riemann integrals from a measure theoretic point-of-view. Let $[a,b]$ be an interval. For a set $S \subseteq [a,b]$ we define
$$J(S) := \inf \left\{ \sum_{j=1}^{n} |I_j| : S \subseteq \bigcup_{j=1}^{n} I_j, \text{ each } I_j \text{ is an open interval} \right\}$$
and call it the Jordan content of $S$. Note that the only difference between the Jordan content and outer Lebesgue measure is that the sums over which the infimum is taken are finite.
(a) Prove that for any closed interval $[c,d] \subseteq [a,b]$ we have $J([c,d]) = d - c$.
(b) Prove that the Jordan content is not an outer measure. Hint: $\mathbb{Q}$.
(c) Let $J_i(S) := \sup \left\{ \sum_{j=1}^{n} |b_j - a_j| : S \supseteq \bigcup_{j=1}^{n} [a_j, b_j], \ a_1 \le b_1 \le a_2 \le b_2 \le \cdots \le a_n \le b_n \right\}$, and call it the inner Jordan content of $S \subseteq [a,b]$. Prove that for any closed interval $[c,d] \subseteq [a,b]$ we have $J_i([c,d]) = d - c$.
(d) Prove that the inner Jordan content is not an outer measure.
(e) Call a set $S \subseteq [a,b]$ Jordan measurable iff $J(S) = J_i(S)$. An algebra of sets satisfies the first two properties of a $\sigma$-algebra, but it is only closed under finite unions rather than infinite unions. Prove that the set $\Sigma_{[a,b]}$ of Jordan measurable subsets of $[a,b]$ forms an algebra.
(f) Prove that the Jordan content $J : \Sigma_{[a,b]} \to [0, \infty)$ is a finitely additive measure. That is, $J(\emptyset) = 0$ and if $\{A_n\}_{n=1}^{N}$ is a finite sequence of pairwise disjoint sets $A_n \in \Sigma_{[a,b]}$, then $J\left(\bigcup_{n=1}^{N} A_n\right) = \sum_{n=1}^{N} J(A_n)$.
(g) Prove that a set $S \subseteq [a,b]$ is Jordan measurable iff its indicator function $\mathbf{1}_S$ is Riemann integrable.
Note. This exercise shows it is fair to say that the problem with the Riemann integral is that its associated notion of a content is only finitely additive.
14-20. Let $a < b$ be real numbers and let $g : [a,b] \to \mathbb{R}$ be nondecreasing. For numbers $c, d \in (a,b)$ so that $c < d$ define $|(c,d)|_g := \lim_{x \to d^-} g(x) - \lim_{x \to c^+} g(x)$, define $|[a,d)|_g := \lim_{x \to d^-} g(x) - g(a)$, and define $|(c,b]|_g := g(b) - \lim_{x \to c^+} g(x)$. Set $|[a,b]|_g := g(b) - g(a)$. Let $\mathcal{I}$ be the set of all subintervals of $[a,b]$ that are either open, closed at $a$ and open on the right, closed at $b$ and open on the left or equal to $[a,b]$. For any set $S \subseteq [a,b]$, we define the outer Lebesgue-Stieltjes measure of $S$ to be
$$\lambda_g(S) := \inf \left\{ \sum_{j=1}^{\infty} |I_j|_g : S \subseteq \bigcup_{j=1}^{\infty} I_j, \text{ each } I_j \text{ is in } \mathcal{I} \right\}.$$
(a) Prove that the outer Lebesgue-Stieltjes measure really is an outer measure.
(b) Prove that open intervals $(c,d) \subseteq [a,b]$ are $\lambda_g$-measurable. Hint. Compare with Proposition 9.13.
(c) Prove that for $c < d$ both in $[a,b]$ we have $\lambda_g([c,d]) = \lim_{x \to d^+} g(x) - \lim_{x \to c^-} g(x)$, where the one-sided limit is understood to be the value of $g$ if $c = a$ or $d = b$.
(d) Let $c \in (a,b)$ and let $g(x) := 0$ for $x \le c$ and $g(x) := 1$ for $x > c$. Prove that $\lambda_g(S) = 1$ if $c \in S$ and $\lambda_g(S) = 0$ if $c \notin S$ for any set $S \subseteq [a,b]$.
(e) Construct a nondecreasing function $g : [a,b] \to \mathbb{R}$ and an interval $[c,d] \subseteq [a,b]$ so that $\lambda_g([c,d]) > g(d) - g(c)$.
(f) Prove that the function $f : [a,b] \to \mathbb{R}$ is Riemann-Stieltjes integrable with respect to $g$ iff $\lambda_g(\{ x : f \text{ is discontinuous at } x \}) = 0$.
14.3 Measurable Functions

As we did on the real line, we first define the functions for which there is a chance that the integral exists and then we define the integral. Indicator functions (see Definition 5.9) are once more our “rectangles” and simple functions are composed of finitely many disjoint “rectangles.” From here on, algebraic operations on functions will always be understood to be pointwise, that is, for example, the sum $f + g$ of two functions $f$ and $g$ with the same domain is defined as $(f + g)(x) := f(x) + g(x)$ for all elements $x$ of the domain.
Definition 14.26 Let $(M, \Sigma)$ be a measurable space. A simple measurable function is a function $s : M \to \mathbb{R}$ such that there are $n \in \mathbb{N}$, $a_1, \ldots, a_n \in \mathbb{R}$ and pairwise disjoint sets $A_1, \ldots, A_n \in \Sigma$ so that $s = \sum_{k=1}^{n} a_k \mathbf{1}_{A_k}$. We will also call these functions simple functions.

For measurable functions, we consider the positive and negative parts separately.
Definition 14.27 Let $M$ be a set. For $f : M \to [-\infty, \infty]$ we define the positive part $f^+(x) := \max\{f(x), 0\}$ and the negative part $f^-(x) := -\min\{f(x), 0\}$.

Definition 14.28 Let $(M, \Sigma)$ be a measurable space. The nonnegative function $f : M \to [0, \infty]$ is called measurable iff there is a sequence $\{s_n\}_{n=1}^{\infty}$ of simple functions $s_n : M \to [0, \infty)$ such that for all $x \in M$ the sequence $\{s_n(x)\}_{n=1}^{\infty}$ is nondecreasing and $\lim_{n \to \infty} s_n(x) = f(x)$. A function $f : M \to [-\infty, \infty]$ is called measurable iff $f^+$ and $f^-$ are both measurable. If it is necessary to distinguish between several $\sigma$-algebras, we will also call these functions $\Sigma$-measurable.

Once again, measurable functions have many characterizations.
Theorem 14.29 Let $(M, \Sigma)$ be a measurable space and let $f : M \to [-\infty, \infty]$ be a function. Then the following are equivalent.
1. $f$ is measurable,
2. For all $a \in \mathbb{R}$, we have $\{ x \in M : f(x) > a \} \in \Sigma$,
3. For all $a \in \mathbb{R}$, we have $\{ x \in M : f(x) \le a \} \in \Sigma$,
4. For all $a \in \mathbb{R}$, we have $\{ x \in M : f(x) < a \} \in \Sigma$,
5. For all $a \in \mathbb{R}$, we have $\{ x \in M : f(x) \ge a \} \in \Sigma$.

Proof. Mimic the proof of Theorem 9.19. (Exercise 14-21.) ■

The characterizations of measurable functions can be used to prove that certain operations preserve measurability.

Theorem 14.30 Let $(M, \Sigma)$ be a measurable space and let $f, g : M \to [-\infty, \infty]$ be measurable functions. If $f + g$ is defined everywhere, then $f + g$ is measurable. If $f - g$ or $f \cdot g$ is defined everywhere, then it is measurable. Finally, $f^+$, $f^-$ and $|f|$ are measurable.

Proof. Mimic the proof of Theorem 9.20. (Exercise 14-22.) ■
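The nondecreasing sequence of simple functions required in Definition 14.28 can be written down explicitly. A standard choice, sketched below in Python, is $s_n(x) = \min\{n, \lfloor 2^n f(x) \rfloor / 2^n\}$, which takes only finitely many values. Whether the proof the text has in mind uses exactly this sequence is an assumption. The helper name `dyadic_simple_approximation` is hypothetical.

```python
import math


def dyadic_simple_approximation(f, n):
    """Return s_n with s_n(x) = min(n, floor(2**n * f(x)) / 2**n).
    s_n takes finitely many values in [0, n]; f is assumed nonnegative."""
    def s_n(x):
        value = f(x)
        if value == math.inf:          # f may take the value infinity
            return float(n)
        return min(float(n), math.floor(2 ** n * value) / 2 ** n)
    return s_n


if __name__ == "__main__":
    f = lambda x: x * x                # illustrative nonnegative function
    for n in range(1, 6):
        print(n, dyadic_simple_approximation(f, n)(1.3))
```

Each $s_n$ is constant on the sets $\{ x : k/2^n \le f(x) < (k+1)/2^n \}$ and on $\{ x : f(x) \ge n \}$. These sets are in $\Sigma$ whenever the conditions of Theorem 14.29 hold for $f$, which is what makes $s_n$ a simple measurable function.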
Exercises
14-21. Prove Theorem 14.29.
14-22. Prove Theorem 14.30.
14-23. Let $(M, \Sigma)$ be a measurable space. Prove that a bounded function $f : M \to [0, \infty)$ is measurable iff there is a sequence of simple functions $s_n : M \to [0, \infty)$ such that for all $x \in M$ the sequence $\{s_n(x)\}_{n=1}^{\infty}$ is nonincreasing and $\lim_{n \to \infty} s_n(x) = f(x)$. Hint. Mimic part “5⇒1” of the proof of Theorem 14.29 for nonnegative functions.
14-24. Let $(M, \Sigma)$ be a measurable space and let $f, g : M \to [-\infty, \infty]$ be measurable functions.
(a) Prove that $\{ x \in M : f(x) = g(x) \} \in \Sigma$.
(b) Prove that $\{ x \in M : f(x) \le g(x) \} \in \Sigma$.
(c) Prove that $\{ x \in M : f(x) < g(x) \} \in \Sigma$.
14.4 Integration of Measurable Functions

Once measurable functions are defined, the definition of the integral also is similar to that of the Lebesgue integral.

Definition 14.31 Let $(M, \Sigma, \mu)$ be a measure space, let $n \in \mathbb{N}$, let $A_1, \ldots, A_n \in \Sigma$ be pairwise disjoint, let $a_1, \ldots, a_n \in [0, \infty)$ and let $s = \sum_{k=1}^{n} a_k \mathbf{1}_{A_k}$ be a simple function. We define the integral of $s$ to be $\int_M s \, d\mu := \sum_{k=1}^{n} a_k \mu(A_k)$.
The fact that the integral in Definition 14.31 does not depend on the representation of s can be proved just as for the Lebesgue integral in Exercise 9-20a.
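The independence from the representation can be checked by hand on a finite measure space. The Python sketch below assumes that the integral of a simple function $s = \sum_k a_k \mathbf{1}_{A_k}$ is $\sum_k a_k \mu(A_k)$, as for the Lebesgue integral in Chapter 9, and that the measure is given by nonnegative point weights. The names `measure` and `integrate_simple` and the sample weights are hypothetical.

```python
def measure(A, weights):
    """mu(A) for the measure on a finite set given by point weights (assumed setup)."""
    return sum(weights[x] for x in A)


def integrate_simple(representation, weights):
    """Integral of s = sum a_k 1_{A_k} over pairwise disjoint A_k:
    the sum of a_k * mu(A_k)."""
    return sum(a * measure(A, weights) for a, A in representation)


if __name__ == "__main__":
    weights = {1: 0.5, 2: 1.5, 3: 2.0, 4: 3.0}
    # Two representations of the same function s on M = {1, 2, 3, 4}:
    rep_coarse = [(2.0, {1, 2}), (5.0, {3})]
    rep_fine = [(2.0, {1}), (2.0, {2}), (5.0, {3}), (0.0, {4})]
    print(integrate_simple(rep_coarse, weights), integrate_simple(rep_fine, weights))
```

Both representations describe the function with values 2, 2, 5, 0 at the points 1, 2, 3, 4, so the two printed integrals should agree. Splitting a set $A_k$ into disjoint pieces changes nothing because $\mu$ is additive.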
Definition 14.32 Let $(M, \Sigma, \mu)$ be a measure space. Let $f : M \to [0, \infty]$ be a measurable function. We define the integral of $f$ to be
$$\int_M f \, d\mu := \sup \left\{ \int_M s \, d\mu : s \text{ is a simple function with } 0 \le s \le f \right\}$$
and we call $f$ integrable iff the supremum is finite. A function $g : M \to [-\infty, \infty]$ is called integrable iff $g^+$ and $g^-$ are both integrable. Its integral is defined to be
$$\int_M g \, d\mu := \int_M g^+ \, d\mu - \int_M g^- \, d\mu.$$
The integral on measure spaces is very versatile. Examples 14.33 and 14.34 show that integration on the real line as well as on d-dimensional spaces are special cases. Examples 14.36 and 14.37 show that absolutely convergent series and absolutely convergent double series are in one-to-one correspondence with integrable functions on the right measure spaces.
Example 14.33 With $M = \mathbb{R}$, $\Sigma = \Sigma_\lambda$, and $\mu = \lambda$, Definition 14.32 gives the Lebesgue integral on the real line. □

Example 14.34 With $M = \mathbb{R}^d$, $\Sigma = \Sigma_\lambda$, and $\mu = \lambda$, Definition 14.32 gives the Lebesgue integral in $d$-dimensional space. We will investigate in Section 14.8 how the Lebesgue integrals in various dimensions are related to each other. □

For the mentioned examples on series, we first need to establish that a function is integrable iff its absolute value is integrable. As was (by now) to be expected, the properties of the abstract integral are proved in exactly the same way as for the Lebesgue integral.
Theorem 14.35 Let $(M, \Sigma, \mu)$ be a measure space and let $f, g : M \to [-\infty, \infty]$ be measurable functions.
1. If $0 \le f \le g$ a.e. and $g$ is integrable, then so is $f$ and $\int_M f \, d\mu \le \int_M g \, d\mu$.
2. $f$ is integrable iff $|f|$ is integrable and in that case the triangular inequality $\left| \int_M f \, d\mu \right| \le \int_M |f| \, d\mu$ holds.
3. If $f \ge 0$, then $\int_M f \, d\mu = 0$ iff $f = 0$ a.e.

Proof. Mimic the proof of Theorem 9.23. (Exercise 14-25.) ■
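Example 14.36 With $M = \mathbb{N}$, $\Sigma = \mathcal{P}(\mathbb{N})$ and $\mu = \gamma_{\mathbb{N}}$ being counting measure on $\mathbb{N}$, a series $\sum_{j=1}^{\infty} a_j$ converges absolutely iff the function $f : \mathbb{N} \to [-\infty, \infty]$ defined by $f(j) = a_j$ is integrable over the measure space $(\mathbb{N}, \mathcal{P}(\mathbb{N}), \gamma_{\mathbb{N}})$. Moreover, for every integrable function $f$ on this measure space we have $\int_{\mathbb{N}} f \, d\gamma_{\mathbb{N}} = \sum_{j=1}^{\infty} f(j)$. □

Example 14.37 Let $\{a_{(i,j)}\}_{i,j=1}^{\infty}$ be a doubly indexed countable family of numbers. We say the double series $\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{(i,j)}$ converges absolutely iff the double series $\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} |a_{(i,j)}|$ converges. With $M = \mathbb{N} \times \mathbb{N}$, $\Sigma = \mathcal{P}(\mathbb{N} \times \mathbb{N})$ and $\mu = \gamma_{\mathbb{N} \times \mathbb{N}}$ being counting measure on $\mathbb{N} \times \mathbb{N}$, a double series $\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{(i,j)}$ converges absolutely iff the function $f(i,j) := a_{(i,j)}$ is integrable on the measure space $(\mathbb{N} \times \mathbb{N}, \mathcal{P}(\mathbb{N} \times \mathbb{N}), \gamma_{\mathbb{N} \times \mathbb{N}})$. For every integrable function $f$ on this measure space, we have $\int_{\mathbb{N} \times \mathbb{N}} f \, d\gamma_{\mathbb{N} \times \mathbb{N}} = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} f(i,j)$. □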
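As a small worked instance of the correspondence between absolutely convergent series and integrals with respect to counting measure, take the illustrative series with $a_j = (-1/2)^j$, which is not an example from the text. Splitting $f(j) = a_j$ into positive and negative parts, as in the definition of the integral, and summing the two geometric series gives the following.

```latex
\[
f^+(j) = \begin{cases} 2^{-j}, & j \text{ even}, \\ 0, & j \text{ odd}, \end{cases}
\qquad
f^-(j) = \begin{cases} 0, & j \text{ even}, \\ 2^{-j}, & j \text{ odd}, \end{cases}
\]
\[
\int_{\mathbb{N}} f^+ \, d\gamma_{\mathbb{N}} = \sum_{k=1}^{\infty} 4^{-k} = \frac{1}{3},
\qquad
\int_{\mathbb{N}} f^- \, d\gamma_{\mathbb{N}} = \sum_{k=0}^{\infty} 2^{-(2k+1)} = \frac{2}{3},
\]
\[
\int_{\mathbb{N}} f \, d\gamma_{\mathbb{N}} = \frac{1}{3} - \frac{2}{3} = -\frac{1}{3}
= \frac{-1/2}{1 + 1/2} = \sum_{j=1}^{\infty} \left( -\frac{1}{2} \right)^{j} .
\]
```

Both parts have finite integrals, so $f$ is integrable, which matches the absolute convergence of the series.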
As for Lebesgue measure, we consider null sets to be insignificant. Therefore with the same motivation as given for Definition 9.24 we define the following.
Definition 14.38 Let $(M, \Sigma, \mu)$ be a measure space. If $f : M \to [-\infty, \infty]$ is defined a.e., we call it integrable iff
$$g(x) := \begin{cases} f(x); & \text{if } f(x) \text{ is defined}, \\ 0; & \text{if } f(x) \text{ is not defined}, \end{cases}$$
is integrable.
As long as a function is constructed from other integrable functions, like the sum in Theorem 14.39, measurability of the set where the function is not defined is not an issue. However, even if this was a problem and the function is undefined on a subset of a null set, we could simply switch to the completion of the measure (see Exercise 14-10) and note that subsets of null sets are also insignificant. Overall, with the same conventions as for the Lebesgue integral we can prove that the integral over a measure space has the right linearity properties.
Theorem 14.39 Let $(M, \Sigma, \mu)$ be a measure space and let $f, g : M \to [-\infty, \infty]$ be integrable functions. Then the following hold.
1. For all $\alpha \in \mathbb{R}$, the scalar multiple $\alpha f$ is integrable and $\int_M \alpha f \, d\mu = \alpha \int_M f \, d\mu$.
2. The sum $f + g$ is integrable and $\int_M f + g \, d\mu = \int_M f \, d\mu + \int_M g \, d\mu$.

Proof. Mimic the proof of Theorem 9.25. (Exercises 14-26a and 14-26b.) ■
Exercises
14-25. Prove Theorem 14.35. That is, let $(M, \Sigma, \mu)$ be a measure space and let $f, g$ be measurable functions.
(a) Prove that if $0 \le f \le g$ a.e. and $g$ is integrable, then so is $f$ and $\int_M f \, d\mu \le \int_M g \, d\mu$.
(b) Prove that $f$ is integrable iff $|f|$ is integrable and that in this case $\left| \int_M f \, d\mu \right| \le \int_M |f| \, d\mu$.
(c) Prove that if $f \ge 0$, then $\int_M f \, d\mu = 0$ iff $f = 0$ a.e.
14-26. Let $(M, \Sigma, \mu)$ be a measure space and let $f, g : M \to [-\infty, \infty]$ be integrable functions.
(a) Prove that if $\alpha \in \mathbb{R}$, then the scalar multiple $\alpha f$ is integrable and $\int_M \alpha f \, d\mu = \alpha \int_M f \, d\mu$.
(b) Prove that the sum $f + g$ is integrable and $\int_M f + g \, d\mu = \int_M f \, d\mu + \int_M g \, d\mu$. Note. Exercise 14-33 gives a more effective proof than mimicking the proof of Theorem 9.25.
(c) Prove that $f - g$ is integrable and $\int_M f - g \, d\mu = \int_M f \, d\mu - \int_M g \, d\mu$.
(d) Prove that $\max\{f, g\}$ (defined pointwise) is integrable.
(e) Prove that $\min\{f, g\}$ (defined pointwise) is integrable.
(f) Give an example that shows that the product of $f$ and $g$ need not be integrable.
14-27. Markov's inequality. Let $(M, \Sigma, \mu)$ be a measure space and let $f : M \to [0, \infty)$ be integrable. Prove that for all $c > 0$ we have $\mu(\{ x \in M : f(x) > c \}) \le \frac{1}{c} \int_M f \, d\mu$.
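Markov's inequality from Exercise 14-27 can be sanity-checked numerically on a finite measure space before attempting the proof. The sketch below assumes a measure on $\mathcal{P}(M)$ given by nonnegative point weights, so that the integral of a nonnegative $f$ is the weighted sum of its values. The function name `markov_sides` and the sample data are hypothetical.

```python
def markov_sides(f, weights, c):
    """Return (mu({f > c}), (1/c) * integral of f) on a finite measure space
    whose measure is given by nonnegative point weights (assumed setup)."""
    left = sum(w for x, w in weights.items() if f(x) > c)
    right = sum(f(x) * w for x, w in weights.items()) / c
    return left, right


if __name__ == "__main__":
    weights = {0: 1.0, 1: 2.0, 2: 0.5, 3: 4.0}
    f = lambda x: x * x                 # illustrative nonnegative function
    for c in (0.5, 1.0, 4.0, 10.0):
        print(c, markov_sides(f, weights, c))
```

Such a check proves nothing, but it makes the mechanism visible: every point with $f(x) > c$ contributes more than $c$ times its weight to the integral.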
14.5 Monotone and Dominated Convergence

Sequences have been a standard tool throughout our investigation of single variable functions. Sequences play an equally important role in more abstract settings. In Exercise 11-13, we have seen that pointwise convergence of a bounded, monotone sequence of Riemann integrable functions need not produce a Riemann integrable limit function. The two fundamental results about pointwise convergence in measure theory are the Monotone Convergence Theorem, which says that the pointwise limit of a nondecreasing sequence of nonnegative integrable functions is integrable if the limit of the integrals is finite (see Theorem 14.41), and the Dominated Convergence Theorem, which says that the pointwise limit of a sequence of integrable functions is integrable provided all functions are below one integrable function that dominates them all (see Theorem 14.43). Hence, when it comes to pointwise convergence of functions, the Lebesgue-type integral has more favorable properties than the Riemann integral.

Because every subset $A \in \Sigma$ of a measure space $(M, \Sigma, \mu)$ can be turned into a measure space, and because by Theorem 14.35 for any integrable function $g$, the function $g \mathbf{1}_A$ is also integrable, we can define integrals over subsets.
Definition 14.40 Let $(M, \Sigma, \mu)$ be a measure space, let $A \in \Sigma$ and let the function $f : M \to [-\infty, \infty]$ be integrable. Then we define the integral of $f$ over the subset $A$ as $\int_A f \, d\mu := \int_M f \mathbf{1}_A \, d\mu$. For Lebesgue integrals over intervals, we will also write
$$\int_a^b f(x) \, d\lambda(x) := \int_a^b f \, d\lambda := \int_{[a,b]} f \, d\lambda.$$
Because the integral is defined as a supremum of integrals of simple functions, it should not be too surprising that the integral is well-behaved with respect to monotone sequences of functions.
Theorem 14.41 Monotone Convergence Theorem. Let $(M, \Sigma, \mu)$ be a measure space and let $\{f_n\}_{n=1}^{\infty}$ be a sequence of nonnegative measurable functions so that $\{f_n(x)\}_{n=1}^{\infty}$ is nondecreasing for almost all $x \in M$ and $\lim_{n \to \infty} f_n(x)$ exists for almost all $x \in M$. Let $f : M \to [0, \infty]$ be a measurable function so that $f(x) = \lim_{n \to \infty} f_n(x)$ a.e. Then
$$\int_M f \, d\mu = \lim_{n \to \infty} \int_M f_n \, d\mu.$$
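Proof. First assume that for all $x \in M$ we have $0 \le f_1(x) \le f_2(x) \le \cdots$ and the limit $\lim_{n\to\infty} f_n(x)$ exists. Because $f_n \le f$ for all $n \in \mathbb{N}$, we have $\lim_{n\to\infty} \int_M f_n \, d\mu \le \int_M f \, d\mu$. For the reversed inequality, let $s = \sum_{k=1}^{m} a_k \mathbf{1}_{A_k}$ be a simple function such that $0 \le s \le f$ and let $t \in (0, 1)$. For $n \in \mathbb{N}$, define $E_n := \{x \in M : f_n(x) \ge t s(x)\}$. Then $M = \bigcup_{n=1}^{\infty} E_n$, for all $n \in \mathbb{N}$ we have the containment $E_n \subseteq E_{n+1}$, and
$$\lim_{n\to\infty} \int_{E_n} t s \, d\mu = \lim_{n\to\infty} \sum_{k=1}^{m} t a_k \mu(A_k \cap E_n) = \sum_{k=1}^{m} t a_k \mu(A_k) = \int_M t s \, d\mu.$$
Let $\varepsilon > 0$. Then there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have
$$\int_M t s \, d\mu - \varepsilon \le \int_{E_n} t s \, d\mu \le \int_{E_n} f_n \, d\mu \le \int_M f_n \, d\mu \le \lim_{n\to\infty} \int_M f_n \, d\mu.$$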
14. Integration on Measure Spaces
Because $\varepsilon$ was arbitrary we obtain $\int_M t s \, d\mu \le \lim_{n\to\infty} \int_M f_n \, d\mu$, and because the number $t \in (0, 1)$ was arbitrary, we can let $t$ approach 1 and obtain
$$\int_M s \, d\mu = \lim_{t\to 1} t \int_M s \, d\mu \le \lim_{n\to\infty} \int_M f_n \, d\mu.$$
Because $s \le f$ was an arbitrary simple function, we conclude that
$$\int_M f \, d\mu = \sup\left\{ \int_M s \, d\mu : s \text{ simple},\ 0 \le s \le f \right\} \le \lim_{n\to\infty} \int_M f_n \, d\mu.$$
Now assume that $0 \le f_1(x) \le f_2(x) \le \cdots$ and $\lim_{n\to\infty} f_n(x) = f(x)$ for almost all $x \in M$. Let $N \in \Sigma$ be a null set so that $0 \le f_1(x) \le f_2(x) \le \cdots$ and $\lim_{n\to\infty} f_n(x) = f(x)$ for all $x \in M \setminus N$. Changing a function on a null set does not change its integral. Therefore we can replace $f$ with $f\mathbf{1}_{M\setminus N}$ and each $f_n$ with $f_n\mathbf{1}_{M\setminus N}$ and apply what we have proved above to obtain the equation for the integrals. ∎
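The Monotone Convergence Theorem can be illustrated numerically. The sketch below is not part of the text: it assumes the example sequence $f_n(x) = \min(n, x^{-1/2})$ on $(0, 1)$ with Lebesgue measure, which is nondecreasing in $n$ and converges pointwise to $x^{-1/2}$. A direct computation gives $\int_0^1 f_n \, d\lambda = 2 - 1/n$, which increases to $\int_0^1 x^{-1/2} \, d\lambda = 2$. The function names and the midpoint rule are illustrative choices.

```python
def f_n(n, x):
    """Truncated function f_n(x) = min(n, x^(-1/2)) for x in (0, 1)."""
    return min(float(n), x ** -0.5)


def integral_f_n(n, steps=100_000):
    """Midpoint-rule approximation of the integral of f_n over (0, 1)."""
    h = 1.0 / steps
    return sum(f_n(n, (k + 0.5) * h) for k in range(steps)) * h


# The approximations should increase with n and approach 2.
for n in (1, 2, 5, 10):
    print(n, integral_f_n(n), 2 - 1 / n)
```

The midpoint rule is used only because it avoids evaluating the integrand at $x = 0$; any quadrature that stays inside $(0, 1)$ would serve.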
We need the next lemma for the proof of the Dominated Convergence Theorem.
Lemma 14.42 Fatou’s Lemma. Let $(M, \Sigma, \mu)$ be a measure space and let $\{f_n\}_{n=1}^{\infty}$ be a sequence of nonnegative measurable functions. Then $\liminf_{n\to\infty} f_n$ (defined pointwise) is measurable and
$$\int_M \liminf_{n\to\infty} f_n \, d\mu \le \liminf_{n\to\infty} \int_M f_n \, d\mu.$$
Proof. Because the limit inferior is $\liminf_{n\to\infty} f_n = \lim_{n\to\infty} \inf\{f_j : j \ge n\}$, where the infimum and the limits are taken pointwise, we first consider the sequence of functions $p_n := \inf\{f_j : j \ge n\}$. Let $n \in \mathbb{N}$ and $a \in \mathbb{R}$. Then
$$\{x \in M : p_n(x) < a\} = \bigcup_{j=n}^{\infty} \{x \in M : f_j(x) < a\}.$$
Because the union on the right side is measurable and $a$ was arbitrary we conclude that $p_n$ is measurable. Moreover, clearly we have $p_n \le p_{n+1}$. By Exercise 14-28a, this means that $\liminf_{n\to\infty} f_n = \lim_{n\to\infty} p_n$ is measurable and by the Monotone Convergence Theorem $\int_M \liminf_{n\to\infty} f_n \, d\mu = \lim_{n\to\infty} \int_M p_n \, d\mu$. Now for all numbers $n \in \mathbb{N}$ we have $p_n \le f_n$, which by Lemma 10.5 means
$$\int_M \liminf_{n\to\infty} f_n \, d\mu = \lim_{n\to\infty} \int_M p_n \, d\mu \le \liminf_{n\to\infty} \int_M f_n \, d\mu.$$
∎
The inequality in Fatou’s Lemma can be strict (Exercise 14-29). Finally, as long as all functions’ absolute values are bounded (“dominated”) by one integrable function, pointwise limits preserve integrability and the limit can be moved out of the integral.
Theorem 14.43 Dominated Convergence Theorem. Let $(M, \Sigma, \mu)$ be a measure space, let $\{f_n\}_{n=1}^{\infty}$ be a sequence of measurable functions and let $f$ be a measurable function such that $f(x) = \lim_{n\to\infty} f_n(x)$ for almost all $x$. Moreover, let $g$ be an integrable function such that for all $n \in \mathbb{N}$ and almost all $x \in M$ we have $|f_n(x)| \le g(x)$. Then $f$ is integrable and
$$\int_M f \, d\mu = \lim_{n\to\infty} \int_M f_n \, d\mu.$$
Proof. Because $|f(x)| = \lim_{n\to\infty} |f_n(x)| \le g(x)$ for almost all $x \in M$, by part 1 of Theorem 14.35 $|f|$ is integrable and then by part 2 of Theorem 14.35 $f$ is integrable. Because changing a function on a null set does not change the integral, we can assume that $|f_n(x)| \le g(x)$ and $|f(x)| \le g(x)$ for all $x \in M$. Now for all $n \in \mathbb{N}$ we have $f_n(x) + g(x) \ge 0$ and thus by Fatou’s Lemma we obtain
$$\int_M f \, d\mu + \int_M g \, d\mu = \int_M f + g \, d\mu = \int_M \liminf_{n\to\infty} (f_n + g) \, d\mu \le \liminf_{n\to\infty} \int_M f_n + g \, d\mu = \liminf_{n\to\infty} \int_M f_n \, d\mu + \int_M g \, d\mu,$$
which means that $\int_M f \, d\mu \le \liminf_{n\to\infty} \int_M f_n \, d\mu$. The same argument applied to the functions $(-f_n) + g \ge 0$ gives $\int_M -f \, d\mu \le \liminf_{n\to\infty} \int_M -f_n \, d\mu$, that is (recall Proposition 10.4) $\int_M f \, d\mu \ge \limsup_{n\to\infty} \int_M f_n \, d\mu$. By Proposition 10.3 and Theorem 10.6, we obtain $\int_M f \, d\mu = \lim_{n\to\infty} \int_M f_n \, d\mu$. ∎
The hypothesis that all functions $f_n$ in the Dominated Convergence Theorem are below an integrable function $g$ cannot be dropped. For example, on $(0, 1)$ equipped with Lebesgue measure, the sequence of functions $\left\{ \frac{1}{x} \mathbf{1}_{\left[\frac{1}{2n}, \frac{1}{n}\right)} \right\}_{n=1}^{\infty}$ converges pointwise to zero, but all integrals are equal to $\ln(2)$. Moreover, Exercise 14-28 shows that the demand that the pointwise a.e. limit is measurable cannot be dropped in general, but that it can be dropped for Lebesgue measure.
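The failure of the conclusion without a dominating function can be checked numerically. The sketch below assumes the sequence $f_n(x) = \frac{1}{x}$ on $\left[\frac{1}{2n}, \frac{1}{n}\right)$ and $0$ elsewhere, which is one sequence matching the description above (the exact interval is an assumption here). Each integral equals $\ln(1/n) - \ln(1/(2n)) = \ln 2$, while for every fixed $x$ the values $f_n(x)$ are eventually zero. The function names are illustrative.

```python
import math


def f(n, x):
    """f_n(x) = 1/x on [1/(2n), 1/n) and 0 elsewhere on (0, 1)."""
    return 1.0 / x if 1.0 / (2 * n) <= x < 1.0 / n else 0.0


def integral(n, steps=100_000):
    """Midpoint-rule approximation of the integral of f_n over its support."""
    a, b = 1.0 / (2 * n), 1.0 / n
    h = (b - a) / steps
    return sum(1.0 / (a + (k + 0.5) * h) for k in range(steps)) * h


# The integrals stay near ln(2), although f_n(0.3) is zero once 1/n <= 0.3.
for n in (1, 10, 100):
    print(n, integral(n), math.log(2), f(n, 0.3))
```

Any integrable function that dominates all $f_n$ would have to be at least $\frac{1}{x}$ on $(0, 1)$, which is not integrable there.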
Exercises
14-28. Pointwise almost everywhere limits of measurable functions. Let $(M, \Sigma, \mu)$ be a measure space and let $\{f_n\}_{n=1}^{\infty}$ be a sequence of measurable functions.
(a) Prove that if $f(x) = \lim_{n\to\infty} f_n(x)$ for all $x \in M$, then $f$ is measurable.
(b) Give an example that shows that the pointwise a.e. limit of a sequence of measurable functions need not be measurable. Hint. There is an example with $M = \{0, 1\}$, $\Sigma = \{\emptyset, M\}$ and $\mu = 0$.
(c) Prove that if $(M, \Sigma, \mu)$ is a complete measure space, then $f(x) = \lim_{n\to\infty} f_n(x)$ a.e. implies that $f$ is measurable.
(d) Prove that $(\mathbb{R}^d, \Sigma_\lambda, \lambda)$ is a complete measure space.
14-29. Give an example that the inequality in Fatou’s Lemma can be strict. Then explain why this is not a counterexample to the Dominated Convergence Theorem.
14-30. Let $(M, \Sigma, \mu)$ be a measure space and let $g : M \to [0, \infty]$ be integrable. Prove that the function $\mu_g : \Sigma \to [0, \infty)$ defined by $\mu_g(A) := \int_A g \, d\mu$ is a measure.
14-31. Let $(M, \Sigma, \mu)$ be a measure space and let $\{f_n\}_{n=1}^{\infty}$ be a sequence of integrable functions that converges uniformly to a measurable function $f$.
(a) Prove that if $\mu(M) < \infty$, then $f$ is integrable and $\int_M f \, d\mu = \lim_{n\to\infty} \int_M f_n \, d\mu$.
(b) Give an example that shows this result need not hold for infinite measure spaces.
14-32. Let $(M, \Sigma, \mu)$ be a measure space and let $\{f_n\}_{n=1}^{\infty}$ be a sequence of integrable functions that converges pointwise to a measurable function $f$ and for which there is a $B \in \mathbb{R}$ so that for all $x \in M$ we have $|f_n(x)| \le B$.
(a) Prove that if $\mu(M) < \infty$, then $f$ is integrable and $\int_M f \, d\mu = \lim_{n\to\infty} \int_M f_n \, d\mu$.
(b) Give an example that shows this result need not hold for infinite measure spaces.
14-33. Let $f, g : M \to [-\infty, \infty]$ be integrable functions. Give a more effective proof that $f + g$ is integrable and $\int_M f + g \, d\mu = \int_M f \, d\mu + \int_M g \, d\mu$ than was given in Theorem 9.23.
Hint. Use the Monotone Convergence Theorem and two nondecreasing sequences of simple functions that converge to $|f|$ and $|g|$ (respectively) to prove that $|f| + |g|$ is integrable. Use a similar idea to prove the equation for the integrals. Do not use the Dominated Convergence Theorem (why?).
14-34. Explain how the hypotheses of the Dominated Convergence Theorem prevent the occurrence of examples as in Exercise 11-17 or as mentioned at the end of this section.
14-35. Prove that for every Lebesgue integrable function $f : \mathbb{R} \to \mathbb{R}$ there is a sequence of Riemann integrable functions $r_n : \mathbb{R} \to \mathbb{R}$ such that $\lim_{n\to\infty} \int_{\mathbb{R}} |f - r_n| \, d\lambda = 0$.
14.6 Convergence in Mean, in Measure, and Almost Everywhere Theorem 15.50 will show that the “distance” between two functions can be measured with the integral of the absolute value of the difference. Because convergence with respect to this distance is fundamental in analysis, we investigate this notion of convergence and some of its consequences here.
Definition 14.44 Let $(M, \Sigma, \mu)$ be a measure space, let $\{f_n\}_{n=1}^{\infty}$ be a sequence of measurable functions and let $f : M \to [-\infty, \infty]$ be a measurable function. If $\lim_{n\to\infty} \int_M |f_n - f| \, d\mu = 0$ then we say that $\{f_n\}_{n=1}^{\infty}$ converges in mean to $f$.
Convergence in mean is near the heart of the definition of integration.
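Theorem 14.45 Let $(M, \Sigma, \mu)$ be a measure space and let $f : M \to [-\infty, \infty]$ be an integrable function. Then there is a sequence $\{s_n\}_{n=1}^{\infty}$ of simple functions that converges in mean to $f$.
Proof. By definition of the integral, for every $n \in \mathbb{N}$ there are simple functions $0 \le s_n^- \le f^-$ and $0 \le s_n^+ \le f^+$ so that $\int_M f^- - s_n^- \, d\mu < \frac{1}{n}$ and $\int_M f^+ - s_n^+ \, d\mu < \frac{1}{n}$.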
-1 + -1= - 2,
a } =nx[{z E
xxY
: f(z)>a,nr(z)=y}]
Similarly, we prove that for every $x \in X$ the function $f_x$ is $\Gamma$-measurable. We conclude this section by showing that the constructions presented here give reasonable results for Lebesgue measure.
Proposition 14.60 Let $m, n, d \in \mathbb{N}$ be such that $m + n = d$ and let $\lambda_m$, $\lambda_n$ and $\lambda_d$ denote Lebesgue measure on $\mathbb{R}^m$, $\mathbb{R}^n$ and $\mathbb{R}^d$, respectively. Then $\Sigma_{\lambda_d} \supseteq \Sigma_{\lambda_m} \times \Sigma_{\lambda_n}$ and for all $A \in \Sigma_{\lambda_m}$ and $B \in \Sigma_{\lambda_n}$ we have $\lambda_d(A \times B) \le \lambda_m(A)\lambda_n(B)$.
Proof. By Exercise 14-41b, $\Sigma_{\lambda_m} \times \Sigma_{\lambda_n}$ is generated by the sets of the form $A \times B$, where each $A$ and $B$ is either an open box or a null set. The proof that products of open boxes $A \times B$ (which are of course open boxes themselves) are $\lambda_d$-measurable is similar to the proof of Proposition 9.13 (see Exercise 14-40). Now let $A \in \Sigma_{\lambda_m}$, let $B \in \Sigma_{\lambda_n}$ and let $\varepsilon > 0$. Find sequences of open boxes $\{I_j\}_{j=1}^{\infty}$ and $\{K_l\}_{l=1}^{\infty}$ so that the containments $A \subseteq \bigcup_{j=1}^{\infty} I_j$, $B \subseteq \bigcup_{l=1}^{\infty} K_l$ and the inequality $\sum_{j=1}^{\infty} |I_j| \sum_{l=1}^{\infty} |K_l| < \lambda_m(A)\lambda_n(B) + \varepsilon$ hold. Then
$$\lambda_d(A \times B) \le \sum_{j,l=1}^{\infty} |I_j \times K_l| = \sum_{j=1}^{\infty} |I_j| \sum_{l=1}^{\infty} |K_l| < \lambda_m(A)\lambda_n(B) + \varepsilon,$$
and for all $A \in \Sigma_{\lambda_m}$ and $B \in \Sigma_{\lambda_n}$ we obtain $\lambda_d(A \times B) \le \lambda_m(A)\lambda_n(B)$. In particular, this means that for $A \in \Sigma_{\lambda_m}$ and $B \in \Sigma_{\lambda_n}$ such that one of $A$, $B$ is a null set we have that $\lambda_d(A \times B) = 0$, which means $A \times B$ is $\lambda_d$-measurable. Together with the $\lambda_d$-measurability of products of open boxes, this means $\Sigma_{\lambda_d} \supseteq \Sigma_{\lambda_m} \times \Sigma_{\lambda_n}$, because $\Sigma_{\lambda_m} \times \Sigma_{\lambda_n}$ is generated by products of open boxes or null sets in $\mathbb{R}^m$ with open boxes or null sets in $\mathbb{R}^n$. ∎
Proposition 14.60 has two apparent shortcomings. First, the $\sigma$-algebras are not shown to be equal to each other. Exercise 14-47 shows that the containment is indeed strict, so this part of the result cannot be improved. Second, we did not show that $\lambda_d(A \times B)$ and $\lambda_m(A)\lambda_n(B)$ are equal, even though they should be equal. Recall that in Proposition 8.5 we needed the Heine-Borel Theorem to prove that the Lebesgue measure of an interval is its length. To prove that $\lambda_d(A \times B)$ and $\lambda_m(A)\lambda_n(B)$ are indeed equal, we need a higher dimensional version of the Heine-Borel Theorem. This version will be provided in Theorems 16.72 and 16.80. Thus we delay the proof that $\lambda_d(A \times B) = \lambda_m(A)\lambda_n(B)$ to Theorem 16.81. To keep notation simple, unless the distinction of different dimensions is necessary, we will denote Lebesgue measure in all dimensions by $\lambda$.
Exercises
14-39. Prove Proposition 14.51.
14-40. Prove that every $d$-dimensional open box is $\lambda_d$-measurable.
14-41. Generating the Lebesgue measurable sets. For $d \in \mathbb{N}$ let $G_d$ be the set of all $d$-dimensional open boxes and all $d$-dimensional null sets.
(a) Prove that $\Sigma_{\lambda_d}$, the $\sigma$-algebra of Lebesgue measurable sets in $\mathbb{R}^d$, is generated by $G_d$. Hint. $G_d \subseteq \Sigma_{\lambda_d}$ by Exercise 14-40. Prove that every $S \in \Sigma_{\lambda_d}$ with $\lambda(S) < \infty$ differs by a null set from a countable intersection of countable unions of open boxes. Then use $\sigma$-finiteness.
(b) Prove that for $m, n \in \mathbb{N}$ the $\sigma$-algebra generated by $\{A \times B : A \in G_m, B \in G_n\}$ is $\Sigma_{\lambda_m} \times \Sigma_{\lambda_n}$.
(c) Prove that if $m + n = d$ and $A \in \Sigma_{\lambda_d}$, then there is a $B \in \Sigma_{\lambda_m} \times \Sigma_{\lambda_n}$ with $\lambda_d\big((A \setminus B) \cup (B \setminus A)\big) = 0$.
(d) Prove that if $m + n = d$ and the function $f : \mathbb{R}^d \to [-\infty, \infty]$ is $\Sigma_{\lambda_d}$-measurable, then there is a $\Sigma_{\lambda_m} \times \Sigma_{\lambda_n}$-measurable function $g$ with $f = g$ a.e.
14-42. Let $M$ be a set and let $U$ be a set of subsets of $M$. Prove that the $\sigma$-algebra generated by $U$ is contained in all $\sigma$-algebras that contain $U$.
14-43. On the equality of measures.
(a) Prove Proposition 14.54.
(b) Prove that the $\sigma$-algebra generated by all the singleton subsets of $\mathbb{R}$ is the set $\Sigma := \{C \subseteq \mathbb{R} : C \text{ or } \mathbb{R} \setminus C \text{ is countable}\}$.
(c) Construct two finite measures on $\Sigma$ that agree on all singletons, but which are not equal. Hint. Set $\mu(\{k\}) := \frac{1}{2^k}$ for $k \in \mathbb{N}$ and set it equal to zero for all other singletons.
(d) Construct a measurable space $(X, \Sigma)$ and two finite measures $\mu$ and $\nu$ on $\Sigma$ so that the equality $\mu(X) = \nu(X)$ holds, but $\{S \in \Sigma : \mu(S) = \nu(S)\}$ is not a $\sigma$-algebra. Hint. There is a finite example.
14-44. Prove that if $\{\mathcal{D}_i\}_{i \in I}$ is a family of Dynkin systems on the set $M$, then $\bigcap_{i \in I} \mathcal{D}_i$ is a Dynkin system.
14-45. Prove Theorem 14.57.
14-46. Prove that if $\mu$ is a measure defined on the $\sigma$-algebra $\mathcal{B}$ generated by the finite open intervals such that $\mu((a, b)) = b - a$ for all finite open intervals, then $\mu$ must be equal to the restriction of Lebesgue measure to $\mathcal{B}$.
14-47. The containment of the $\sigma$-algebras in Proposition 14.60 is strict. To see this, prove that for $m < d$ every subset of $\mathbb{R}^m$ (considered as the subset $\mathbb{R}^m \times \{0\}^{d-m}$ of $\mathbb{R}^d$) can be a section of a Lebesgue measurable set in $\mathbb{R}^d$.
14.8 Product Measures and Fubini's Theorem We are now ready to define a measure on the product $\sigma$-algebra. Because Theorems 14.56 and 14.57 are needed to construct an unambiguous product measure in Theorem 14.62, we introduce the notion of $\sigma$-finiteness.
Definition 14.61 A measure space $(M, \Sigma, \mu)$ is called $\sigma$-finite iff there is a sequence $\{A_j\}_{j=1}^{\infty}$ of subsets $A_j \subseteq M$ of finite measure so that $M = \bigcup_{j=1}^{\infty} A_j$. We will also sometimes call the measure $\sigma$-finite.
Clearly, Lebesgue measure is $\sigma$-finite, so we will be able to use the results from this section for $d$-dimensional integration.
Theorem 14.62 Let $(X, \Sigma, \mu)$ and $(Y, \Gamma, \nu)$ be $\sigma$-finite measure spaces. Then there is a unique measure $\mu \times \nu$ on $\Sigma \times \Gamma$ so that with the convention $0 \cdot \infty = 0$ for all $A \in \Sigma$ and $B \in \Gamma$ we have $(\mu \times \nu)(A \times B) = \mu(A)\nu(B)$.
Proof. The natural idea for the product measure of a set $S \in \Sigma \times \Gamma$ is to integrate the $\nu$-measures of the sections $S_x$ over $X$. To do this we first need to prove that for all $S \in \Sigma \times \Gamma$ the function $x \mapsto \nu(S_x)$ is $\Sigma$-measurable. Let $\mathcal{H}$ be the set of sets $S \in \Sigma \times \Gamma$ so that the function $x \mapsto \nu(S_x)$ is $\Sigma$-measurable. We need to prove that $\mathcal{H} = \Sigma \times \Gamma$. For all $A \in \Sigma$ and $B \in \Gamma$ we have that $\nu((A \times B)_x)$ equals $\nu(B)$ when $x \in A$ and $0$ when $x \notin A$. Therefore, the function $x \mapsto \nu((A \times B)_x) = \nu(B)\mathbf{1}_A(x)$ is $\Sigma$-measurable for all $A \times B \in \Sigma \times \Gamma$. To prove $\mathcal{H}$ is a $\sigma$-algebra, first note that the rectangles with measurable sides form a $\pi$-system (see Exercise 7-7) and $X \times Y \in \mathcal{H}$. To be able to apply Theorem 14.56, we will now prove $\mathcal{H}$ is a Dynkin system when $\nu$ is finite. If $S, F \in \mathcal{H}$ and $S \subseteq F$, then for $F \setminus S$ we have $\nu((F \setminus S)_x) = \nu(F_x \setminus S_x) = \nu(F_x) - \nu(S_x)$, which implies that $F \setminus S \in \mathcal{H}$. Now let $\{A_n\}_{n=1}^{\infty}$ be a sequence in $\mathcal{H}$ with $A_n \subseteq A_{n+1}$ for all $n \in \mathbb{N}$. Then
$$\nu\left(\Big(\bigcup_{n=1}^{\infty} A_n\Big)_x\right) = \nu\left(\bigcup_{n=1}^{\infty} (A_n)_x\right) = \lim_{n\to\infty} \nu((A_n)_x)$$
by Theorem 14.15, so by Exercise 14-28a $\mathcal{H}$ is closed under the formation of increasing unions and thus $\mathcal{H}$ is a Dynkin system. Therefore, by Theorem 14.56 $\mathcal{H} = \Sigma \times \Gamma$. Hence, if $\nu$ is finite, for all $S \in \Sigma \times \Gamma$ the function $x \mapsto \nu(S_x)$ is $\Sigma$-measurable. To prove the result for $\sigma$-finite measure spaces $(Y, \Gamma, \nu)$, let $\{F_n\}_{n=1}^{\infty}$ be a sequence of pairwise disjoint sets in $\Gamma$ with $\nu(F_n) < \infty$ and $Y = \bigcup_{n=1}^{\infty} F_n$, and set $\nu_n(S) := \nu(F_n \cap S)$.
$|f(n)| + |g(n)| \le B_f + B_g < \infty$, so $f + g \in \ell^\infty$. Similarly, if $\alpha \in \mathbb{R}$, then for all $n \in \mathbb{N}$ we have $|(\alpha f)(n)| = |\alpha||f(n)| \le |\alpha| B_f < \infty$. Thus $\ell^\infty$ is a subspace of $F(\mathbb{N}, \mathbb{R})$.
Example 15.8 The set $C^0([a, b], \mathbb{R})$ of continuous functions $f : [a, b] \to \mathbb{R}$ with pointwise addition and scalar multiplication is a vector space. Clearly, $C^0([a, b], \mathbb{R})$ is a subset of $F([a, b], \mathbb{R})$. Moreover, part 1 of Theorem 3.27 implies that sums of continuous functions are continuous. Part 3 of Theorem 3.27, applied with one function being constant, implies that scalar multiples of continuous functions are continuous.
15. The Abstract Venues for Analysis
Notation 15.9 To simplify notation, we write $C^0[a, b]$ instead of $C^0([a, b], \mathbb{R})$.
Example 15.8 shows how switching to more abstract situations is often just a change in point-of-view. We no longer consider individual objects, but instead we consider classes of objects. Results about individual objects are then often summarized in statements about the class. For example, the fact that sums and constant multiples of continuous functions are continuous is summarized in the statement that the continuous functions form a vector space.
Example 15.10 For $a < b$ and $k \in \mathbb{N}$, let $C^k(a, b)$ be the set of $k$ times continuously differentiable functions on $(a, b)$. Then $C^k(a, b)$ with pointwise addition and scalar multiplication is a vector space. Moreover, $C^\infty(a, b) := \bigcap_{k=1}^{\infty} C^k(a, b)$, the space of infinitely differentiable functions on $(a, b)$, with pointwise addition and scalar multiplication is a vector space. This is an induction using Theorem 4.6.
Spaces of “p-integrable functions” are of central importance in analysis.
Definition 15.11 Let $(M, \Sigma, \mu)$ be a measure space and let $1 \le p$
0 there is a $\delta > 0$ so that for all $p \ge 1$ we have $(\|f\|_\infty - \varepsilon)\,\delta^{1/p} \le \left( \int_a^b |f(x)|^p \, dx \right)^{1/p} \le \|f\|_\infty (b - a)^{1/p}$.
(d) Prove that for all $f \in C^0[a, b]$ we have $\lim_{p\to\infty} \|f\|_p = \|f\|_\infty$.
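The limit $\lim_{p\to\infty} \|f\|_p = \|f\|_\infty$ can be observed numerically before it is proved. The sketch below is an illustration only: it assumes the test function $f(x) = x$ on $[0, 1]$, for which $\|f\|_p = (p + 1)^{-1/p}$ and $\|f\|_\infty = 1$. The names `p_norm`, `sup_norm` and `identity`, and the use of a midpoint rule, are illustrative choices.

```python
def p_norm(f, a, b, p, steps=20_000):
    """Midpoint-rule approximation of (integral_a^b |f(x)|^p dx)^(1/p)."""
    h = (b - a) / steps
    total = sum(abs(f(a + (k + 0.5) * h)) ** p for k in range(steps)) * h
    return total ** (1.0 / p)


def sup_norm(f, a, b, steps=20_000):
    """Maximum of |f| over an evenly spaced grid on [a, b]."""
    h = (b - a) / steps
    return max(abs(f(a + k * h)) for k in range(steps + 1))


def identity(x):
    return x


# The p-norms of f(x) = x on [0, 1] increase toward the uniform norm 1.
for p in (1, 10, 100, 1000):
    print(p, p_norm(identity, 0.0, 1.0, p), (p + 1) ** (-1.0 / p))
print("uniform norm:", sup_norm(identity, 0.0, 1.0))
```

The grid maximum only approximates the uniform norm in general; for the monotone test function chosen here it is attained at the right endpoint.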
15.7 Abstraction IV: Metric Spaces Norms are used to measure distances in a vector space. Natural phenomena are often modeled in bounded subsets of d-dimensional space, which means sums and constant multiples do not necessarily stay in the subset. When there is no linear structure, distances are measured with metrics. The properties of a metric are inspired by the real life properties of distances. Distances are nonnegative, distinct objects have a nonzero distance from each other, the distance is independent of whether we go from point A to point B or vice versa, and detours through a third point cannot provide a shortcut.
Definition 15.53 A metric space is a pair $(X, d)$ of a set $X$ (without additional properties; in particular, $X$ need not be a vector space) and a function $d : X \times X \to [0, \infty)$, called the metric on $X$, with the following properties.
1. For all $x, y \in X$, we have that $d(x, y) = 0$ iff $x = y$.
2. For all $x, y \in X$, we have $d(x, y) = d(y, x)$.
3. For all $x, y, z \in X$, we have $d(x, z) \le d(x, y) + d(y, z)$.
We will usually call $X$ itself a metric space. When we do so, we implicitly assume that there is a metric on $X$, which will usually be denoted $d$. Normed spaces are metric spaces, too. So once more we have generalized a known concept.
Proposition 15.54 Let $X$ be a normed space. Then $d(x, y) := \|x - y\|$ defines a metric on $X$.
Proof. Clearly, $d(x, y) = \|x - y\| \ge 0$ for all $x, y \in X$ and $d(x, y) = 0$ iff $\|x - y\| = 0$ iff $x - y = 0$ iff $x = y$. Also $d(x, y) = \|x - y\| = \|y - x\| = d(y, x)$. Finally, $d(x, z) = \|x - z\| \le \|x - y\| + \|y - z\| = d(x, y) + d(y, z)$. ∎
With metric spaces we are no longer tied to the linear structure of a vector space. In fact, any subset of a metric space is again a metric space.
Proposition 15.55 Let $(X, d)$ be a metric space and let $S \subseteq X$ be any subset of $X$. Let $d_S := d|_{S \times S}$ be the restriction of the metric $d$ to the subset $S$. Then $(S, d_S)$ is a metric space.
Proof. Clearly, any property of $d$ that holds for all elements of $X$ will also hold for all elements of $S$. ∎
Proposition 15.55 provides a wide range of examples of metric spaces. For example, we can now consider intervals on the real line and subsets of $\mathbb{R}^d$ as metric spaces.
Definition 15.56 Let $X$ be a metric space. Then we will automatically consider any subset $S \subseteq X$ to be a metric space, also called a metric subspace, carrying the metric $d_S$ of Proposition 15.55. Said metric will usually also be denoted $d$. It is sometimes called the induced metric or the relative metric.
Example 15.57 Not every metric space is a metric subspace of a normed space. On $\mathbb{R}^2$, for $x \ne y$ let $d(x, y) := \sqrt{x_1^2 + x_2^2} + \sqrt{y_1^2 + y_2^2}$, which is the sum of the lengths of the straight line segments from $x$ to $0$ and from $0$ to $y$, and for $x = y$ let $d(x, y) := 0$. Then $d$ is a metric on $\mathbb{R}^2$ that is not induced by a norm. This metric models distances in a situation in which all travel must go through a central hub.
For more examples of metric spaces that are not subspaces of normed spaces, consider Exercise 15-34. We conclude this short section by proving the reverse triangular inequality for metric spaces. We could have proved it in normed spaces first, but with this approach we obtain it for normed spaces as a corollary.
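The hub metric of Example 15.57 is easy to experiment with. The following sketch is not from the text: it checks the three metric properties on a random sample of points (a sanity check, not a proof) and shows that the metric is not translation invariant, which every metric induced by a norm must be because $\|(x + a) - (y + a)\| = \|x - y\|$. All function names and the sample are illustrative assumptions.

```python
import math
import random


def hub_metric(x, y):
    """Distance in R^2 when travel between distinct points passes through the origin."""
    if x == y:
        return 0.0
    return math.hypot(*x) + math.hypot(*y)


def satisfies_metric_axioms(points, d, tol=1e-12):
    """Check definiteness, symmetry and the triangular inequality on a finite sample."""
    for x in points:
        for y in points:
            if (d(x, y) == 0) != (x == y):
                return False
            if d(x, y) != d(y, x):
                return False
            for z in points:
                if d(x, z) > d(x, y) + d(y, z) + tol:
                    return False
    return True


def shift(p, a):
    """Translate the point p by the vector a."""
    return (p[0] + a[0], p[1] + a[1])


random.seed(0)
points = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(30)]
points.append((0.0, 0.0))
print(satisfies_metric_axioms(points, hub_metric))

# Translating both points by (-1, 0) changes their distance.
x, y, a = (1.0, 0.0), (2.0, 0.0), (-1.0, 0.0)
print(hub_metric(x, y), hub_metric(shift(x, a), shift(y, a)))
```

The small tolerance in the triangular inequality only guards against floating point rounding.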
Figure 32: A hierarchy of structures for analysis. The $L^p$ spaces will be introduced in the next section.
Theorem 15.58 The reverse triangular inequality for metric spaces. Let $X$ be a metric space. Then for all $x, y, z \in X$ we have $|d(x, y) - d(y, z)| \le d(x, z)$.
Proof. Let $x, y, z \in X$ and without loss of generality assume $d(y, z) \le d(x, y)$. Then the inequality $d(x, y) \le d(x, z) + d(z, y)$ implies the reverse triangular inequality $|d(x, y) - d(y, z)| = d(x, y) - d(y, z) \le d(x, z)$. ∎
Corollary 15.59 The reverse triangular inequality for normed spaces. Let $X$ be a normed space. Then for all $x, y \in X$ we have $\big| \|x\| - \|y\| \big| \le \|x - y\|$.
The only spaces more abstract than metric spaces that occur frequently in mathematics are topological spaces. In analysis, spaces usually carry a metric. Therefore we will not present topological spaces in this text. Metric spaces will be investigated in more detail in Chapter 16. Figure 32 shows the hierarchy of structures that arise in analysis. To avoid notational confusion when working in normed spaces, we will use norm notation for the metrics on subsets of normed spaces. This is sensible, because all metrics that we will consider on subsets of normed spaces are induced by a norm.
Exercises
15-34. Examples of metric spaces.
(a) Let $X$ be a set and for $x, y \in X$ let $d(x, y) := \begin{cases} 0; & \text{if } x = y, \\ 1; & \text{if } x \ne y. \end{cases}$ Prove that $d$ is a metric on $X$. Note. $d$ is called the discrete metric on $X$.
(b) Prove that $d(x, y) := \dfrac{|x - y|}{1 + |x - y|}$ defines a metric on $\mathbb{R}$. Hint. For the triangular inequality, expand with $\dfrac{1}{|x - z|}$.
(c) Consider the surface of a sphere in $\mathbb{R}^d$ with the usual distance function. Let the distance between two points be the length of the shortest path (on the sphere) between these two points. Explain why this distance function defines a metric between the points on the sphere and why this metric is not the metric induced by the usual distance function.
15-35. Explain why the metric induced on $\mathbb{R}^2$ by $\|\cdot\|_1$ is called the taxicab metric.
15-36. Let $1 \le p \le \infty$. In $\mathbb{R}^d$ equipped with the metric induced by $\|\cdot\|_p$ compute the distance from the origin to the point $(1, \ldots, 1)$. In which metric does the cube $[0, 1]^d$ have the longest diagonal? In which metric is the diagonal shortest?
15-37. The following, purportedly true, story illustrates the importance of having examples for an abstract notion. Warning. The following notion is absolutely useless. We simply debunk it here. The story itself may well be a “mathematical urban legend.” A mathematician once spent a lot of time proving abstract properties about so-called “anti-metric” spaces. An “anti-metric” $d : X \times X \to [0, \infty)$ is a function so that for all $x, y \in X$ we have $d(x, y) = 0$ iff $x = y$, for all $x, y \in X$ we have that $d(x, y) = d(y, x)$ and for all $x, y, z \in X$ we have that $d(x, z) \ge d(x, y) + d(y, z)$. So, all that has changed from metric spaces is that the triangular inequality is reversed. Prove that an anti-metric space can have at most one point.
The mathematician could have simplified all his proofs by using that these spaces have at most one point. Spaces with at most one point have lots of properties, but they are not very interesting. So if you try to “be wise, generalize,” make sure that your generalization/modification still has models (examples) that are useful.
15.8 $L^p$ Spaces Part 1 of Theorem 15.50 shows that $\mathcal{L}^p$ spaces fail to be metric spaces because it is possible for distinct objects to have distance zero from each other. When all other properties of a metric are given, we speak of a semimetric space.
Definition 15.60 A semimetric space is a pair $(X^s, d^s)$ of a set $X^s$ and a function $d^s : X^s \times X^s \to [0, \infty)$ with the following properties.
1. For all $x \in X^s$, we have $d^s(x, x) = 0$.
2. For all $x, y \in X^s$, we have $d^s(x, y) = d^s(y, x)$.
3. For all $x, y, z \in X^s$, we have $d^s(x, z) \le d^s(x, y) + d^s(y, z)$.
It would be cumbersome to develop a theory of semimetric spaces parallel to that of metric spaces. It is also more appropriate to work with metric spaces, because practical observation tells us that two distinct objects cannot occupy the same space at the same time. To overcome the minor deficiency of a semimetric, objects that have distance zero from each other are combined to become single points. This process produces a metric space in natural fashion.
Theorem 15.61 Let $(X^s, d^s)$ be a semimetric space. Then $\sim\ \subseteq X^s \times X^s$ defined by $x \sim y$ iff $d^s(x, y) = 0$ is an equivalence relation (see Definition C.5 in Appendix C.2). If we denote the equivalence classes of $\sim$ with $[x]$, then the set $X := \{[x] : x \in X^s\}$ equipped with $d([x], [y]) := d^s(x, y)$ is a metric space.
Proof. It is clear that the relation $\sim$ is reflexive, symmetric and transitive. The function $d$ is defined for equivalence classes, but it is defined in terms of representatives of each class. Therefore we must prove that $d$ is well defined. That is, we must show that the value of $d$ does not depend on which representatives are used. Let $[x], [y] \in X$, let $x_1, x_2 \in [x]$ and let $y_1, y_2 \in [y]$. Then
$$d^s(x_1, y_1) \le d^s(x_1, x_2) + d^s(x_2, y_2) + d^s(y_2, y_1) = d^s(x_2, y_2)$$
and we prove the reversed inequality similarly. Hence, $d^s(x_1, y_1) = d^s(x_2, y_2)$ and the definition of $d$ is independent of the representatives chosen from each equivalence class. Now $d([x], [y]) = 0$ implies $d^s(x, y) = 0$, that is, $x \sim y$ and thus $[x] = [y]$ (Exercise 15-38a). Conversely, if $[x] = [y]$, then $d([x], [y]) = d^s(x, x) = 0$. Hence, $d([x], [y]) = 0$ is equivalent to $[x] = [y]$ and the first condition for being a metric is satisfied. Symmetry and the triangular inequality for $d$ are easily verified (Exercises 15-38b and 15-38c), so $d$ is a metric. ∎
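The passage from a semimetric space to a metric space in Theorem 15.61 can be carried out by hand on a small finite example. The sketch below is an illustration under stated assumptions: the points are integer pairs and the semimetric $d^s(p, q) := |p_1 - q_1|$ ignores the second coordinate, so distinct points can have distance zero. The names `d_s`, `equivalence_classes` and `d` are hypothetical helpers, not notation from the text.

```python
from itertools import product


def d_s(p, q):
    """Semimetric on pairs that compares first coordinates only."""
    return abs(p[0] - q[0])


def equivalence_classes(points):
    """Group points that have semimetric distance zero from each other."""
    classes = []
    for p in points:
        for cls in classes:
            if d_s(p, cls[0]) == 0:
                cls.append(p)
                break
        else:
            classes.append([p])
    return [frozenset(cls) for cls in classes]


def d(class_x, class_y):
    """Distance between classes; every choice of representatives must agree."""
    values = {d_s(x, y) for x, y in product(class_x, class_y)}
    assert len(values) == 1, "distance depends on the representatives"
    return values.pop()


points = [(0, 0), (0, 5), (1, -2), (1, 7), (3, 3)]
quotient = equivalence_classes(points)
for cx in quotient:
    for cy in quotient:
        print(sorted(cx), sorted(cy), d(cx, cy))
```

The assertion inside `d` is the computational counterpart of the well-definedness argument in the proof: it would fail if two pairs of representatives gave different values.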
Theorem 15.61 allows us to define metric spaces of “p-integrable functions.” Formally these spaces consist of classes of p-integrable functions, but the distinction blurs at times. Metric considerations are taken care of in $L^p(M, \Sigma, \mu)$, while integral equations etc. are proved in $\mathcal{L}^p(M, \Sigma, \mu)$.
Definition 15.62 Let $(M, \Sigma, \mu)$ be a measure space. For $1 \le p < \infty$, we denote by $L^p(M, \Sigma, \mu)$ the metric space obtained from $\mathcal{L}^p(M, \Sigma, \mu)$ via Theorem 15.61.
Exercises 15-39 and 15-40 show that the $L^p$ spaces actually are normed spaces and that the $L^2$ spaces are inner product spaces. We conclude this section by defining a space similar to $\ell^\infty$ on measure spaces.
Definition 15.63 Let $(M, \Sigma, \mu)$ be a measure space. We define
$$\mathcal{L}^\infty(M, \Sigma, \mu) := \{ f \in F(M, \mathbb{R}) : (\exists B \in \mathbb{R} : |f(x)| \le B \ \mu\text{-a.e.}) \}.$$
For $f \in \mathcal{L}^\infty(M, \Sigma, \mu)$, we define $\|f\|_\infty := \inf\{ B \in \mathbb{R} : |f(x)| \le B \ \mu\text{-a.e.} \}$.
Note that sometimes the $\mu$-a.e. in the definition of $\mathcal{L}^\infty$ is replaced with “$\mu$-locally a.e.” Exercise 15-45 shows that for $\sigma$-finite measure spaces null sets and locally null sets are the same. We consider $\sigma$-finite measure spaces in this text, so the author chose the simpler definition.
Proposition 15.64 Let $(M, \Sigma, \mu)$ be a measure space. Then $\mathcal{L}^\infty(M, \Sigma, \mu)$ equipped with $d_\infty(f, g) := \|f - g\|_\infty$ is a semimetric space.
Proof. The reader will prove a little more in Exercise 15-41. ∎
Definition 15.65 Let $(M, \Sigma, \mu)$ be a measure space. We denote by $L^\infty(M, \Sigma, \mu)$ the metric space obtained from $\mathcal{L}^\infty(M, \Sigma, \mu)$ via Theorem 15.61. Note that by Exercises 15-39 and 15-41 the space $L^\infty(M, \Sigma, \mu)$ is actually a normed space.
Notation 15.66 If $M = D$ is a Lebesgue measurable subset of $\mathbb{R}^d$, $\Sigma$ is the $\sigma$-algebra of Lebesgue measurable subsets of $D$ and $\mu$ is Lebesgue measure restricted to $\Sigma$ we will also write $L^p(D)$ for $L^p(M, \Sigma, \mu)$ and if $D$ is an interval we also write $L^p[a, b]$ for $L^p([a, b])$, and so on.
Notation 15.67 Certain spaces are usually assumed to carry a certain norm or metric. The space $C^0[a, b]$ is usually assumed to be normed by $\|\cdot\|_\infty$. The spaces $L^p$ are usually assumed to be normed by $\|\cdot\|_p$. The space $L^\infty$ is usually assumed to be normed by $\|\cdot\|_\infty$. Unless otherwise stated, we will assume that each of the above spaces carries the mentioned norm and that any subset of these spaces carries the metric induced by the mentioned norm. For finite dimensional spaces, Theorem 16.76 will show that although many norms are available, for most purposes any one of them can be used.
Exercises
15-38. Fill in the remaining details in the proof of Theorem 15.61.
(a) Prove that if $\sim$ is an equivalence relation with equivalence classes denoted by $[\cdot]$ and $x \sim y$, then $[x] = [y]$.
(b) Prove that $d : X \times X \to [0, \infty)$ as in Theorem 15.61 satisfies $d([x], [y]) = d([y], [x])$ for all $[x], [y] \in X$.
(c) Prove that $d : X \times X \to [0, \infty)$ as in Theorem 15.61 satisfies the triangular inequality.
15-39. A seminormed space is a pair $(X^s, \|\cdot\|^s)$ of a vector space $X^s$ and a function $\|\cdot\|^s : X^s \to [0, \infty)$, called the seminorm, such that the following hold.
• $\|0\|^s = 0$.
• For all real numbers $\alpha$ and all $x \in X^s$ we have $\|\alpha x\|^s = |\alpha| \|x\|^s$.
• For all $x, y \in X^s$ we have $\|x + y\|^s \le \|x\|^s + \|y\|^s$.
(a) Prove that $\sim\ \subseteq X^s \times X^s$ defined by $x \sim y$ iff $\|x - y\|^s = 0$ is an equivalence relation.
(b) Prove that if $[x]$ denotes the equivalence class of $x$ under $\sim$, then $X := \{[x] : x \in X^s\}$ equipped with $\|[x]\| := \|x\|^s$ is a normed space. Be careful. You must also prove that $X$ is a vector space.
15-40. A semi-inner product space is a pair $(X^s, \langle\cdot,\cdot\rangle^s)$ consisting of a vector space $X^s$ and a function $\langle\cdot,\cdot\rangle^s : X^s \times X^s \to \mathbb{R}$, called the semi-inner product, such that the following hold.
• $\langle 0, 0 \rangle^s = 0$.
• For all $x \in X^s$, we have $\langle x, x \rangle^s \ge 0$.
• For all $x, y \in X^s$, we have $\langle x, y \rangle^s = \langle y, x \rangle^s$.
• For all real numbers $\alpha$ and all $x, y \in X^s$, we have $\langle \alpha x, y \rangle^s = \alpha \langle x, y \rangle^s$.
• For all $x, y, z \in X^s$, we have $\langle x + y, z \rangle^s = \langle x, z \rangle^s + \langle y, z \rangle^s$.
(a) Prove that with $\|x\|^s := \sqrt{\langle x, x \rangle^s}$ the pair $(X^s, \|\cdot\|^s)$ is a seminormed space.
(b) Prove that $\sim\ \subseteq X^s \times X^s$ defined by $x \sim y$ iff $\|x - y\|^s = 0$ is an equivalence relation.
(c) Prove that if $[x]$ denotes the equivalence class of $x$ under $\sim$, then the set $X := \{[x] : x \in X^s\}$ equipped with $\langle [x], [y] \rangle := \langle x, y \rangle^s$ is an inner product space.
15-41. Prove that $(\mathcal{L}^\infty(M, \Sigma, \mu), \|\cdot\|_\infty)$ in Proposition 15.64 is a seminormed space.
15-42. Let $(M, \Sigma, \mu)$ be a finite measure space. For any two measurable functions $f, g : M \to \mathbb{R}$, define $d(f, g) := \displaystyle\int_M \frac{|f - g|}{1 + |f - g|} \, d\mu$. Prove that $d$ is a semimetric on the space of measurable functions.
15-43. Hölder’s inequality for $p = 1$, $q = \infty$. Let $(M, \Sigma, \mu)$ be a measure space. Prove that for all functions $f \in L^1(M, \Sigma, \mu)$ and $g \in L^\infty(M, \Sigma, \mu)$ we have $fg \in L^1(M, \Sigma, \mu)$ and the inequality $\|fg\|_1 \le \|f\|_1 \|g\|_\infty$ holds. Hint. Use that $g$ is bounded a.e. by $\|g\|_\infty$.
15-44. Let $(M, \Sigma, \mu)$ be a measure space with $\mu(M) < \infty$.
(a) Prove that $L^\infty(M, \Sigma, \mu) \subseteq \bigcap_{p \in [1, \infty)} L^p(M, \Sigma, \mu)$.
(b) Give an example that shows that the containment is proper.
(c) Prove that for all $f \in L^\infty(M, \Sigma, \mu)$ we have $\lim_{p\to\infty} \|f\|_p = \|f\|_\infty$.
15-45. Let $(M, \Sigma, \mu)$ be a measure space. A set $L \in \Sigma$ is called locally $\mu$-null iff for all $S \in \Sigma$ with $\mu(S) < \infty$ we have $\mu(S \cap L) = 0$.
(a) Prove that every null set is locally $\mu$-null.
(b) Prove that if $(M, \Sigma, \mu)$ is $\sigma$-finite, then every locally $\mu$-null set is a null set.
(c) Consider the function $\mu(A) := \begin{cases} \infty; & \text{if } 0 \in A, \\ 0; & \text{if } 0 \notin A, \end{cases}$ defined on the Lebesgue measurable subsets of $\mathbb{R}$.
i. Prove that $\mu$ is a measure.
ii. Prove that $\mathbb{Q}$ is locally $\mu$-null, but not $\mu$-null.
15.9 Another Number Field: Complex Numbers Complex numbers are often used in analysis, typically when “square roots of negative numbers” are needed. Exercise 15-52 shows that there is a price to be paid, because we lose the order relation. That in itself is not a problem. This section shows that the field axioms and an absolute value function remain available. Moreover, Theorem 15.75 will show that the convergence of Cauchy sequences, which is fundamental to analysis, is also preserved. Consequently, in abstract analysis R and @ are often used interchangeably.
Definition 15.68 The complex numbers $\mathbb{C}$ are the set $\mathbb{R} \times \mathbb{R}$ equipped with addition and multiplication defined as follows. For all complex numbers $(a,b), (c,d) \in \mathbb{C}$, we set $(a,b) + (c,d) := (a+c, b+d)$ and $(a,b) \cdot (c,d) := (ac - bd, ad + bc)$. We define $i := (0,1)$ and $1 := (1,0)$ and write complex numbers also in the form $(a,b) = a \cdot 1 + b \cdot i = a + ib$. For $z = a + ib \in \mathbb{C}$, the number $a$ is also called the real part of $z$, denoted $\Re(z)$, and the number $b$ is also called the imaginary part of $z$, denoted $\Im(z)$.
The algebraic properties of $\mathbb{C}$ are summarized in Theorems 15.69 and 15.70.
Theorem 15.69 The complex numbers $\mathbb{C}$ are a field.
Proof. The field axioms are easily verified with $0 = 0 + 0i$ being neutral for addition, $1 = 1 + 0i$ being neutral for multiplication, $-(a + bi) = (-a) + (-b)i$ being the additive inverse and $(a + ib)^{-1} = \frac{a}{a^2+b^2} - \frac{b}{a^2+b^2}\, i$ being the multiplicative inverse. (Exercise 15-46.)
The special element $i$ serves as the closest we can get to a “square root of $(-1)$.”
Theorem 15.70 $i^2 = -1$.
Proof. $i^2 = (0 + 1i) \cdot (0 + 1i) = (0 \cdot 0 - 1 \cdot 1) + (0 \cdot 1 + 1 \cdot 0)\, i = -1$. ∎
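The arithmetic of Definition 15.68 can be checked mechanically. The following is a minimal sketch, assuming that pairs of exact rationals stand in for real numbers; the helper names `c_add`, `c_mul` and `c_inv` are hypothetical and chosen only for this illustration. It verifies $i^2 = -1$ and the inverse formula from the proof of Theorem 15.69 for one sample value.

```python
from fractions import Fraction

# A complex number is modeled as a pair (a, b), as in Definition 15.68.
# Exact rationals are an assumption made so that equality tests are exact.

def c_add(z, w):
    """(a, b) + (c, d) := (a + c, b + d)."""
    (a, b), (c, d) = z, w
    return (a + c, b + d)

def c_mul(z, w):
    """(a, b) . (c, d) := (ac - bd, ad + bc)."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

def c_inv(z):
    """Inverse a/(a^2+b^2) - (b/(a^2+b^2)) i of a nonzero pair z."""
    a, b = z
    n = a * a + b * b
    if n == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return (Fraction(a) / n, -Fraction(b) / n)

ONE = (Fraction(1), Fraction(0))
I = (Fraction(0), Fraction(1))

i_squared = c_mul(I, I)               # the pair representing i^2
z = (Fraction(3), Fraction(4))        # z = 3 + 4i
z_times_inverse = c_mul(z, c_inv(z))  # should be the neutral element 1
```

Checking a single value proves nothing about the field axioms in general; the sketch only makes the formulas concrete.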
Aside from the above algebraic properties, we need to know how to measure distances in the complex numbers. This is done via the absolute value function.
Definition 15.71 For $z = a + ib \in \mathbb{C}$, the absolute value of $z$ is $|z| := \sqrt{a^2 + b^2}$.
Theorem 15.72 Properties of the absolute value.
0. For all $z \in \mathbb{C}$, we have $|z| \ge 0$.
3. The triangular inequality holds. That is, for all $z_1, z_2 \in \mathbb{C}$ we have $|z_1 + z_2| \le |z_1| + |z_2|$.
Proof. Exercise 15-47. ∎
If we switch the sign of the imaginary part of a complex number we obtain the complex conjugate.
Definition 15.73 For $z = a + ib \in \mathbb{C}$, the complex conjugate of $z$ is $\overline{z} := a - ib$.
Absolute value and complex conjugate are related via a simple equation. This equation can be used to express multiplicative inverses.
Proposition 15.74 For all $z \in \mathbb{C}$, the equalities $z + \overline{z} = 2\Re(z)$ and $|z|^2 = z\overline{z}$ hold. Moreover, for all $z \in \mathbb{C} \setminus \{0\}$ the multiplicative inverse is $\frac{1}{z} = \frac{\overline{z}}{|z|^2}$.
Proof. Exercise 15-48. The definitions of complex valued sequences, convergence in $\mathbb{C}$, and Cauchy sequences in $\mathbb{C}$ are similar to the corresponding definitions in $\mathbb{R}$ (Exercise 15-49).
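As a quick numerical illustration of Proposition 15.74 (a worked instance with an arbitrarily chosen value, not a proof), take $z = 3 + 4i$:

```latex
\begin{align*}
\overline{z} &= 3 - 4i, \\
z + \overline{z} &= 6 = 2\,\Re(z), \\
z\,\overline{z} &= (3 + 4i)(3 - 4i) = 9 - 12i + 12i - 16i^2 = 9 + 16 = 25 = |z|^2, \\
\frac{1}{z} &= \frac{\overline{z}}{|z|^2} = \frac{3}{25} - \frac{4}{25}\,i .
\end{align*}
```

The last line agrees with the inverse formula $\frac{a}{a^2+b^2} - \frac{b}{a^2+b^2}\, i$ from the proof of Theorem 15.69.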
Theorem 15.75 Every complex valued Cauchy sequence converges in $\mathbb{C}$.
Proof. Let $\{z_n\}_{n=1}^{\infty}$ be a Cauchy sequence of complex numbers. For each $n \in \mathbb{N}$, let $a_n := \Re(z_n)$ and $b_n := \Im(z_n)$ so that $z_n = a_n + i b_n$. Then for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $m, n \ge N$ we have $|z_n - z_m| < \varepsilon$. Thus for all $m, n \ge N$ we obtain
$|a_n - a_m| \le \sqrt{(a_n - a_m)^2 + (b_n - b_m)^2} = |z_n - z_m| < \varepsilon$,
so $\{a_n\}_{n=1}^{\infty}$ is a Cauchy sequence in $\mathbb{R}$. Similarly, $\{b_n\}_{n=1}^{\infty}$ is shown to be a Cauchy sequence in $\mathbb{R}$. Let $a := \lim_{n\to\infty} a_n$ and $b := \lim_{n\to\infty} b_n$, where the limits are taken in $\mathbb{R}$. Let $z := a + ib$ and let $\varepsilon > 0$. Then there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|a - a_n| < \frac{\varepsilon}{\sqrt{2}}$ and $|b - b_n| < \frac{\varepsilon}{\sqrt{2}}$. Therefore, for all $n \ge N$ we obtain
$|z - z_n| = \sqrt{(a - a_n)^2 + (b - b_n)^2} < \sqrt{\frac{\varepsilon^2}{2} + \frac{\varepsilon^2}{2}} = \varepsilon$,
which means that $z = \lim_{n\to\infty} z_n$.

… $\varepsilon > 0$ so that $\{z \in X : \|z - x\| < \varepsilon\} \subseteq C$ and let $Y$ be a metric space. Prove that $f : C \to Y$ is continuous at $x$ iff for all continuous functions $p : [0,1] \to X$ so that $p(0) = x$, the composition $f \circ p : [0,1] \to Y$ is continuous at $0$.
(b) Prove that $f : \mathbb{R}^2 \to \mathbb{R}$ defined by $f(x,y) :=$
{ 09.
for (x, Y ) f (0,O ) , for x = y = 0,
is not continuous at the origin.
16-34. A polynomial $p : \mathbb{R}^d \to \mathbb{R}$ is a finite sum of finite products of the variables. That is, there are an $n \in \mathbb{N}$, $c_1, \ldots, c_n \in \mathbb{R}$ and $(a_1^i, \ldots, a_d^i) \in (\mathbb{N} \cup \{0\})^d$ so that for all $(x_1, \ldots, x_d) \in \mathbb{R}^d$ we have $p(x_1, \ldots, x_d) = \sum_{i=1}^{n} c_i \, x_1^{a_1^i} \cdots x_d^{a_d^i}$. Prove that every polynomial $p : \mathbb{R}^d \to \mathbb{R}$ is continuous.
Hint. Induction. First, for $n = 1$ on the sum of the $a$s to take care of the products, then on $n$.
16-35. Let $X$ and $Y$ be metric spaces. A sequence of functions $\{f_n\}_{n=1}^{\infty}$ from $X$ to $Y$ is called uniformly convergent to the function $f : X \to Y$ iff for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ and all $x \in X$ we have $d(f_n(x), f(x)) < \varepsilon$. Let $\{f_n\}_{n=1}^{\infty}$ be a sequence of functions $f_n : X \to Y$ that converges uniformly to $f : X \to Y$ and let all $f_n$ be continuous at $c \in X$. Prove that $f$ is continuous at $c$. Hint. Mimic the proof of Theorem 11.7.
16-36. For natural numbers $j \ge 2$, let
$g_j(x) := \begin{cases} j(j+1)\left(x - \frac{1}{j+1}\right); & \text{for } \frac{1}{j+1} \le x \le \frac{1}{j}, \\ j(j-1)\left(\frac{1}{j-1} - x\right); & \text{for } \frac{1}{j} \le x \le \frac{1}{j-1}, \\ 0; & \text{otherwise.} \end{cases}$
Prove that the function $f : \mathbb{R} \to l^{\infty}$ defined by $f := \sum_{j=2}^{\infty} g_j e_j$ is continuous on $\mathbb{R} \setminus \{0\}$, bounded, and for any sequence $x_n \to 0$ with $x_n > 0$ the limit $\lim_{n\to\infty} f(x_n)$ does not exist.
Note. This function shows that in infinite dimensional spaces functions can have discontinuities that are neither removable, nor jumps, nor infinite, nor by oscillation.
16.4 Open and Closed Sets Results, such as the Intermediate Value Theorem (Theorem 3.34) or the fact that continuous functions assume absolute maxima and minima (Theorem 3.44), refer specifically to the ordering of the real numbers. This ordering is not available in metric spaces. To obtain more general versions of these theorems, we must first establish properties that will take the place of the ordering of the real numbers. Connectedness (see Definition 16.90), which is needed to prove a version of the Intermediate Value Theorem, is defined purely in topological terms. Compactness, which allows us to prove a version of Theorem 3.44 for metric spaces, can be defined in terms of sequences, but it is more commonly defined in topological terms (see Theorem 16.72). Thus, before we continue, we introduce the requisite topological ideas. In a nutshell, topology revolves around open and closed sets. Figure 34 on page 304 summarizes the main properties of these sets. With a metric envisioned as measuring distances, a ball must be the set of points within a certain distance of a center point.
Definition 16.35 Let $X$ be a metric space, let $x \in X$ and let $\varepsilon > 0$. Then the open ball of radius $\varepsilon$ about $x$ is defined to be $B_{\varepsilon}(x) := \{p \in X : d(p,x) < \varepsilon\}$.
An open set contains for each of its points a small open ball around that point. So, just like in an open interval, at every point in an open set we have a certain amount of room to go in any direction (also see Figure 34(a)).
Figure 33: Visualization of the proof of Proposition 16.37.
Definition 16.36 Let $X$ be a metric space. A subset $O \subseteq X$ is called open iff for all $x \in O$ there is an $\varepsilon_x > 0$ such that $B_{\varepsilon_x}(x) \subseteq O$.
Open intervals $(a,b)$ on the real line are open sets as in Definition 16.36, because for each point $x \in (a,b)$ with $\varepsilon := \min\{b - x, x - a\}$ we obtain the containment $B_{\varepsilon}(x) = (x - \varepsilon, x + \varepsilon) \subseteq (a,b)$. Thus the nomenclature is consistent. In general, open balls are the prototypical examples of open sets. Although it is convenient to envision these balls as round entities, we should bear in mind that they need not be round at all. Balls with respect to the uniform norm $\|\cdot\|_{\infty}$ on $\mathbb{R}^d$ are cubes (also see Figure 37 on page 317).
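The remark that balls need not be round can be made concrete. The sketch below assumes the plane $\mathbb{R}^2$ with the Euclidean and the uniform metric; the function names are hypothetical. It tests membership in the open unit ball about the origin for a point near a corner of the square $[-1,1]^2$.

```python
import math

def d_euclid(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def d_sup(p, q):
    """Distance induced by the uniform norm: largest coordinate difference."""
    return max(abs(a - b) for a, b in zip(p, q))

def in_open_ball(p, center, eps, metric):
    """True iff p lies in the open ball of radius eps about center."""
    return metric(p, center) < eps

origin = (0.0, 0.0)
corner = (0.9, 0.9)  # close to a corner of the square [-1, 1]^2

in_sup_ball = in_open_ball(corner, origin, 1.0, d_sup)        # uniform distance is 0.9
in_euclid_ball = in_open_ball(corner, origin, 1.0, d_euclid)  # Euclidean distance is about 1.27
on_boundary = in_open_ball((1.0, 0.0), origin, 1.0, d_sup)    # distance exactly 1 is excluded
```

The point near the corner belongs to the uniform-norm ball but not to the Euclidean ball, which is consistent with the former being an open square.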
Proposition 16.37 Let $X$ be a metric space, let $z \in X$ and let $\varepsilon > 0$. Then the open ball $B_{\varepsilon}(z)$ is open.
Proof. To prove that a set is open, we prove that it contains a small ball around each of its points. Let $x \in B_{\varepsilon}(z)$ be arbitrary. Set $\varepsilon_x := \varepsilon - d(x,z) > 0$. Then for all $y \in B_{\varepsilon_x}(x)$ we have that $d(y,z) \le d(y,x) + d(x,z) < \varepsilon - d(x,z) + d(x,z) = \varepsilon$. Hence, $B_{\varepsilon_x}(x) \subseteq B_{\varepsilon}(z)$ (see Figure 33) and since $x$ was arbitrary, $B_{\varepsilon}(z)$ is open. ∎
We can summarize the most important properties of open sets as follows.
Theorem 16.38 Let $X$ be a metric space. Then the following results hold:
1. Both $\emptyset$ and $X$ are open subsets of $X$.
2. If $O_1, \ldots, O_n$ are open subsets of $X$, then $\bigcap_{k=1}^{n} O_k$ is open.
3. If $\mathcal{O}$ is a family of open subsets of $X$ then $\bigcup \mathcal{O}$ is open.
Proof. Part 1 is trivial.
For part 2, let $O_1, \ldots, O_n$ be open subsets of $X$, and let $x \in \bigcap_{k=1}^{n} O_k$ be arbitrary. Then for each integer $k \in \{1, \ldots, n\}$ there is an $\varepsilon_k > 0$ such that $B_{\varepsilon_k}(x) \subseteq O_k$. Let $\varepsilon := \min\{\varepsilon_k : k = 1, \ldots, n\}$. Then for all integers $k \in \{1, \ldots, n\}$ the containments $B_{\varepsilon}(x) \subseteq B_{\varepsilon_k}(x) \subseteq O_k$ hold. Hence, $B_{\varepsilon}(x) \subseteq \bigcap_{k=1}^{n} O_k$. Since $x \in \bigcap_{k=1}^{n} O_k$ was arbitrary, the intersection $\bigcap_{k=1}^{n} O_k$ is open.
For part 3, let $\mathcal{O}$ be a family of open subsets of $X$ and let $x \in \bigcup \mathcal{O}$ be arbitrary. Then there is an $O \in \mathcal{O}$ such that $x \in O$ and there is an $\varepsilon > 0$ so that $B_{\varepsilon}(x) \subseteq O$. But then $B_{\varepsilon}(x) \subseteq O \subseteq \bigcup \mathcal{O}$. Since $x \in \bigcup \mathcal{O}$ was arbitrary, $\bigcup \mathcal{O}$ is open. ∎
Example 16.39 Arbitrary intersections of open sets need not be open. For any real numbers $a < b$, we have that $[a,b] = \bigcap_{n=1}^{\infty} \left(a - \frac{1}{n}, b + \frac{1}{n}\right)$ is not an open subset of $\mathbb{R}$.
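To see why an intersection of infinitely many open intervals can collapse to a closed interval, one standard family to consider is $\left(a - \frac{1}{n}, b + \frac{1}{n}\right)$, $n \in \mathbb{N}$. The verification below is a sketch for this particular family, which is an assumption about the intended example.

```latex
\begin{align*}
&\text{If } x \in [a,b], \text{ then } a - \tfrac{1}{n} < a \le x \le b < b + \tfrac{1}{n}
  \text{ for all } n \in \mathbb{N},
  \text{ so } x \in \bigcap_{n=1}^{\infty} \left(a - \tfrac{1}{n},\, b + \tfrac{1}{n}\right). \\
&\text{If } x < a, \text{ choose } n \in \mathbb{N} \text{ with } \tfrac{1}{n} < a - x.
  \text{ Then } x < a - \tfrac{1}{n}, \text{ so } x \notin \left(a - \tfrac{1}{n},\, b + \tfrac{1}{n}\right). \\
&\text{If } x > b, \text{ choose } n \in \mathbb{N} \text{ with } \tfrac{1}{n} < x - b.
  \text{ Then } x > b + \tfrac{1}{n}, \text{ so } x \notin \left(a - \tfrac{1}{n},\, b + \tfrac{1}{n}\right). \\
&\text{Hence } \bigcap_{n=1}^{\infty} \left(a - \tfrac{1}{n},\, b + \tfrac{1}{n}\right) = [a,b].
\end{align*}
```

The interval $[a,b]$ is not open, because every ball $(a - \varepsilon, a + \varepsilon)$ around the endpoint $a$ contains points smaller than $a$.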
Once open sets are defined we can briefly summarize the fundamental concepts of topology. A topology is a family $\mathcal{T}$ of subsets of a set $X$ such that the three conditions in Theorem 16.38 hold. The pair $(X, \mathcal{T})$ is called a topological space and the sets $O \in \mathcal{T}$ are called open sets. Any idea that can be stated purely in terms of open sets is actually a topological idea. In this text, we will focus on metric spaces and only touch upon the topological notions and vocabulary as necessary. As an example consider how continuous functions would be defined in terms of open sets.
Theorem 16.40 Topological formulation of continuity. Let $X$ and $Y$ be metric spaces and let $f : X \to Y$ be a function. Then $f$ is continuous on $X$ iff for all open subsets $V \subseteq Y$ the inverse image $f^{-1}[V] \subseteq X$ is open.
Proof. For “$\Rightarrow$,” let $f$ be continuous and let $V \subseteq Y$ be open. Let $x \in f^{-1}[V]$. Then there is an $\varepsilon > 0$ so that $B_{\varepsilon}(f(x)) \subseteq V$. Because $f$ is continuous, there is a $\delta > 0$ such that for all $z \in X$ with $d(z,x) < \delta$ we have $d(f(z), f(x)) < \varepsilon$. Therefore, for all $z \in B_{\delta}(x)$ we obtain $f(z) \in B_{\varepsilon}(f(x)) \subseteq V$, and hence $B_{\delta}(x) \subseteq f^{-1}[V]$. Because $x \in f^{-1}[V]$ was arbitrary, $f^{-1}[V]$ is open.
For “$\Leftarrow$,” let the function $f$ be such that for all open subsets $V \subseteq Y$ the inverse image $f^{-1}[V] \subseteq X$ is open, let $x \in X$ and let $\varepsilon > 0$. Then $f^{-1}[B_{\varepsilon}(f(x))]$ is open. Hence, there is a $\delta > 0$ with $B_{\delta}(x) \subseteq f^{-1}[B_{\varepsilon}(f(x))]$. But this means for any $z \in X$ with $d(z,x) < \delta$ the inequality $d(f(z), f(x)) < \varepsilon$ holds, and hence $f$ is continuous at $x$. Because $x \in X$ was arbitrary, $f$ is continuous. ∎
We will often be concerned with subsets of metric spaces that contain an open set around a given point $x$. To abbreviate the wording and because open sets contain a little “padding” around each of their points (see Figure 34(a)), we call such sets neighborhoods.
Figure 34: Open sets contain a certain amount of “padding” around each of their points ( a ) ,while closed sets contain all their limit points (b).
Definition 16.41 Let $X$ be a metric space and let $x \in X$. Then $N \subseteq X$ is called a neighborhood of $x$ iff $x \in N$ and there is an open set $O \subseteq X$ so that $x \in O \subseteq N$. If $S \subseteq X$ and $N$ is a neighborhood of each $s \in S$, then $N$ is also called a neighborhood of $S$.
The complements of open sets are called closed sets.
Definition 16.42 Let $X$ be a metric space. A subset $C \subseteq X$ is called closed iff its complement $X \setminus C$ is open.
Closed intervals on the real line are also closed sets as in Definition 16.42. Once again, the nomenclature is consistent.
Example 16.43 Although it is tempting to hope that open and closed are mutually exclusive and exhaustive properties, sets can actually be open, closed, both, or neither.
1. The interval $[0,1]$ is a closed subset of the real line.
2. In any metric space, the sets $\emptyset$ and $X$ are both open and closed.
3. The interval $[0,1)$ as a subset of the real line is neither open nor closed. □
Although properties of closed sets can in theory be obtained through complementation, closed sets are interesting in their own right, because closed sets “keep their limits” (also see Figure 34(b)).
Definition 16.44 Let $X$ be a metric space and let $A \subseteq X$. Then $x \in X$ is called a limit point of $A$ iff there is a sequence $\{a_n\}_{n=1}^{\infty}$ with $a_n \in A$ for all $n \in \mathbb{N}$ so that $x = \lim_{n\to\infty} a_n$.
Limit points are different from accumulation points in that the sequence that converges to a limit point x can assume x as a value (and even be constant with value x ) , which is forbidden for accumulation points.
Theorem 16.45 Let $X$ be a metric space. Then $C \subseteq X$ is closed iff for all convergent sequences $\{z_n\}_{n=1}^{\infty}$ with $z_n \in C$ for all $n \in \mathbb{N}$ we have $\lim_{n\to\infty} z_n \in C$.
Proof. For “$\Rightarrow$,” let $C \subseteq X$ be closed, let $\{z_n\}_{n=1}^{\infty}$ be a convergent sequence with $z_n \in C$ for all $n \in \mathbb{N}$ and let $x := \lim_{n\to\infty} z_n$. For a contradiction, suppose that $x \notin C$. Because $X \setminus C$ is open, there is an $\varepsilon > 0$ so that $B_{\varepsilon}(x) \subseteq X \setminus C$. But then there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the point $z_n$ is in $B_{\varepsilon}(x) \subseteq X \setminus C$ and in $C$, a contradiction.
For “$\Leftarrow$,” suppose for a contradiction that $X \setminus C$ is not open. Then there is an $x \in X \setminus C$ so that for every $n \in \mathbb{N}$ there is a point $z_n \in B_{1/n}(x) \cap C$. But then $\{z_n\}_{n=1}^{\infty}$ is a sequence of points in $C$ that converges to a point $x \notin C$, contradiction.
Remark 16.46 Definition 16.36 and Theorem 16.45 provide descriptive characterizations of open and closed sets, respectively. Note (also see Figure 34) that sequences in open sets need not take their limit in the open set (consider the sequence $\left\{\frac{1}{n+1}\right\}_{n=1}^{\infty}$ in the interval $(0,1)$) and that closed sets need not contain a small ball around each of their points (consider the point $0$ in $[0,1]$). □
Because completeness is important, we should note that closed subsets of complete spaces will again be complete.
Corollary 16.47 Let $X$ be a complete metric space and let $C \subseteq X$ be closed. Then $C$ with the induced metric is a complete metric space.
Proof. Exercise 16-37. ∎
16.4.1 The Interior and The Closure For each subset A of a metric space, there are a largest open set contained in A and a smallest closed set that contains A . These sets are called the interior and the closure, respectively.
Definition 16.48 Let $X$ be a metric space and let $A \subseteq X$. The interior $A^{\circ}$ of $A$ is the set of all points $x \in A$ so that a small ball around the point is also in $A$, that is, $A^{\circ} := \{x \in A : (\exists \varepsilon > 0 : B_{\varepsilon}(x) \subseteq A)\}$. The points in $A^{\circ}$ are also called the interior points of $A$.
Proposition 16.49 Let $X$ be a metric space and let $A \subseteq X$. Then $A^{\circ}$ is an open set that contains all open subsets of $A$. Moreover, $A^{\circ} = \bigcup \{O \subseteq A : O \text{ is open}\}$.
Proof. To see that $A^{\circ}$ is open, let $x \in A^{\circ}$. Then $x \in A$ and there is an $\varepsilon > 0$ so that $B_{\varepsilon}(x) \subseteq A$. By Proposition 16.37, for every $z \in B_{\varepsilon}(x)$ there is an $\varepsilon_z > 0$ so that $B_{\varepsilon_z}(z) \subseteq B_{\varepsilon}(x) \subseteq A$. But then $B_{\varepsilon}(x) \subseteq A^{\circ}$, which proves that $A^{\circ}$ is open.
Now let $U \subseteq A$ be an open subset of $A$. Then for all $x \in U$ there is an $\varepsilon > 0$ so that $B_{\varepsilon}(x) \subseteq U \subseteq A$, which means $x \in A^{\circ}$. Hence, $U \subseteq A^{\circ}$ for all open subsets $U$ of $A$.
Finally, to prove the equation, note that the set on the right is an open subset of $A$, which means (by what we just proved) that the set on the right is contained in $A^{\circ}$. Conversely, $A^{\circ}$ is an open set contained in $A$, so $A^{\circ}$ is contained in the union on the right, thus establishing the equation.
Example 16.50
1. The interior of an open ball $B_r(x)$ obviously is the ball itself, $(B_r(x))^{\circ} = B_r(x)$.
2. The interior of the rational numbers as a subset of the real numbers is $\mathbb{Q}^{\circ} = \emptyset$, because every ball around a rational number contains an irrational number. □
Because every singleton set $\{x\}$ is contained in all open balls of radius $r > 0$ about $x$, it is easy to see that there is no smallest open set that contains a given set.
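Definition 16.51 Let $X$ be a metric space and let $A \subseteq X$. The closure $A^{-}$ (or $\overline{A}$) of $A$ is defined to be the set of all limit points of $A$, that is,
$\overline{A} := A^{-} := \left\{ x \in X : \exists \{a_n\}_{n=1}^{\infty} : \lim_{n\to\infty} a_n = x \text{ and } \forall n \in \mathbb{N} : a_n \in A \right\}$.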
Proposition 16.52 Let $X$ be a metric space and let $A \subseteq X$. Then $A^{-}$ is closed and it is contained in all closed supersets of $A$. Moreover, $A^{-} = \bigcap \{C \supseteq A : C \text{ is closed}\}$.
Proof. To prove that $A^{-}$ is closed, let $x \in X$ and let $\{x_n\}_{n=1}^{\infty}$ be a sequence with $x_n \in A^{-}$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} x_n = x$. Then for each $n \in \mathbb{N}$ there is an $a_n \in A$ with $d(x_n, a_n) < \frac{1}{n}$. But then $\lim_{n\to\infty} a_n = \lim_{n\to\infty} x_n = x$ and $x \in A^{-}$. Hence, $A^{-}$ is closed.
Now let $C \supseteq A$ be a closed superset of $A$ and let $x \in A^{-}$. Then there is a sequence $\{a_n\}_{n=1}^{\infty}$ so that $a_n \in A$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} a_n = x$. But then, because $C$ is closed and contains $A$ we conclude $x = \lim_{n\to\infty} a_n \in C$, and hence $A^{-} \subseteq C$.
Finally, to prove the equation, note that the set on the right is a closed superset of $A$ (use Exercise 16-41c), so $A^{-}$ must be contained in the set on the right. Conversely, because $A^{-}$ is a closed superset of $A$ it must contain the intersection on the right, which establishes the equality.
Example 16.53
1. The closure of an open ball is $\overline{B_r(x)} = \{p \in X : d(p,x) \le r\}$ (Exercise 16-48a).
2. The closure of the rational numbers in the real numbers is $\overline{\mathbb{Q}} = \mathbb{R}$, because every real number is the limit of a sequence of rational numbers. □
Because every open interval is the union of all its closed subintervals, there is no largest closed set that is contained in a given set. The set that is geometrically between the interior of a set and the interior of the set's complement is called the boundary.
Definition 16.54 Let $X$ be a metric space and let $A \subseteq X$. The boundary $\delta A$ of $A$ is defined to be $\delta A := A^{-} \setminus A^{\circ}$.
Figure 35: Visualization of Proposition 16.56. Relatively open sets U are intersections of the subset with open sets V in the space ( a ) . In particular, as subsets of the space itself, they need not be open ( b ) .
Example 16.55
1. The boundary of an open ball is $\delta B_r(x) = \{p \in X : d(p,x) = r\}$ (Exercise 16-48b).
2. The boundary of the rational numbers in the real numbers is $\delta \mathbb{Q} = \mathbb{R}$.
Further properties of the interior, the closure, and the boundary of a set will be investigated in Exercises 16-43 through 16-47.
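A small worked instance may help to keep the three notions apart. Writing $\delta A$ for the boundary, consider the half-open interval $A = [0,1)$ as a subset of the real line (a choice made only for illustration):

```latex
\begin{align*}
A^{\circ} &= (0,1), && \text{because no ball } (-\varepsilon, \varepsilon) \text{ around } 0 \text{ is contained in } A, \\
A^{-} &= [0,1], && \text{because } 1 = \lim_{n\to\infty} \left(1 - \tfrac{1}{n+1}\right) \text{ is a limit of points of } A, \\
\delta A &= A^{-} \setminus A^{\circ} = \{0, 1\}.
\end{align*}
```

Here the interior, the set itself, and the closure are three different sets, and the boundary consists of exactly the two endpoints.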
16.4.2 Relatively Open Sets When subspaces $S$ of a metric space $X$ are investigated, subsets of $S$ can be open in the subspace $S$, as well as in the space $X$ itself. However, it is important to realize that a set $U \subseteq S$ can be open in $S$ and still it may not be open in $X$. For example, the interval $[0,1)$ is an open subset of the metric space $[0,2]$. Indeed, for every $x \in [0,1)$ there is a small $\varepsilon > 0$ so that the ball in $[0,2]$ around $x$ of radius $\varepsilon$ is contained in $[0,1)$. Proposition 16.56 describes the relation between open sets in the space $X$ and open sets in a subset $S$. Open sets in a subset are also called relatively open.
Proposition 16.56 Let $X$ be a metric space and let $S \subseteq X$ be a subset. Then $U \subseteq S$ is open with respect to the induced metric on $S$ iff there is a set $V \subseteq X$ that is open with respect to the metric on $X$ and such that $U = V \cap S$. (Also see Figure 35.)
Proof. To prove “$\Rightarrow$,” for each element $u \in U$ find $\varepsilon_u > 0$ so that the containment $B^{S}_{\varepsilon_u}(u) := \{x \in S : d(x,u) < \varepsilon_u\} \subseteq U$ holds. Let
$V := \bigcup_{u \in U} B^{X}_{\varepsilon_u}(u) = \bigcup_{u \in U} \{x \in X : d(x,u) < \varepsilon_u\}$.
Then $V$ is open and we claim that $U = S \cap V$. Clearly, $U \subseteq S \cap V$. Now let $v \in S \cap V$. Then there is a $u \in U$ so that $v \in B^{X}_{\varepsilon_u}(u)$. Because $v \in S$ we infer $v \in B^{S}_{\varepsilon_u}(u) \subseteq U$, and hence $S \cap V \subseteq U$.
The part “$\Leftarrow$” is left as Exercise 16-38. ∎
A similar result is proved for closed sets in Exercise 16-49.
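The example of $[0,1)$ inside $[0,2]$ mentioned at the start of this subsection fits the description in Proposition 16.56. One possible choice of the open set $V$ (among many) is shown below.

```latex
\begin{align*}
X &= \mathbb{R}, \qquad S = [0,2], \qquad U = [0,1), \\
V &= (-1,1) \text{ is open in } \mathbb{R}, \\
V \cap S &= (-1,1) \cap [0,2] = [0,1) = U .
\end{align*}
```

So $U$ is relatively open in $S$, although $U$ is not open in $\mathbb{R}$, because no interval $(-\varepsilon, \varepsilon)$ around $0$ is contained in $[0,1)$.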
Exercises
16-37. Prove Corollary 16.47.
16-38. Finish the proof of Proposition 16.56 by proving the “$\Leftarrow$” part.
16-39. Let $X$ be a metric space. Prove that for all $x \in X$ the set $X \setminus \{x\}$ is open.
16-40. Let $X$ be a metric space. Prove that $U \subseteq X$ is open iff $U$ is a union of open balls $B_{\varepsilon}(x)$.
16-41. Closed sets. Let $X$ be a metric space.
(a) Prove that both $\emptyset$ and $X$ are closed subsets of $X$.
(b) Prove that if $C_1, \ldots, C_n$ are closed subsets of $X$, then $\bigcup_{k=1}^{n} C_k$ is closed.
(c) Prove that if $\mathcal{C}$ is a family of closed subsets of $X$ then $\bigcap \mathcal{C}$ is closed.
(d) Give an example of an infinite union of closed sets that is not closed.
16-42. Let $C \subseteq \mathbb{R}^d$ be so that for all $n \in \mathbb{N}$ the set $C \cap \prod_{i=1}^{d} [-n,n]$ is closed. Prove that $C$ is closed.
16-43. The interior of a set. Let $X$ be a metric space.
(a) Prove that $A^{\circ\circ} = A^{\circ}$ for all $A \subseteq X$.
(b) Prove that if $A_1, \ldots, A_n \subseteq X$, then $\left(\bigcap_{j=1}^{n} A_j\right)^{\circ} = \bigcap_{j=1}^{n} A_j^{\circ}$.
(c) Prove that if $A_j \subseteq X$ for all $j \in \mathbb{N}$, then $\left(\bigcap_{j=1}^{\infty} A_j\right)^{\circ} \subseteq \bigcap_{j=1}^{\infty} A_j^{\circ}$ and give an example that shows that the containment can be proper.
(d) Prove that $U \subseteq X$ is open iff $U^{\circ} = U$.
16-44. The closure of a set. Let $X$ be a metric space.
(a) Prove that $A^{--} = A^{-}$ for all $A \subseteq X$.
(b) Prove that if $A_1, \ldots, A_n \subseteq X$, then $\overline{\bigcup_{j=1}^{n} A_j} = \bigcup_{j=1}^{n} \overline{A_j}$.
(c) Prove that if $A_j \subseteq X$ for all $j \in \mathbb{N}$, then $\overline{\bigcup_{j=1}^{\infty} A_j} \supseteq \bigcup_{j=1}^{\infty} \overline{A_j}$ and give an example that shows that the containment can be proper.
(d) Prove that $C \subseteq X$ is closed iff $C^{-} = C$.
16-45. The boundary of a set. Let $X$ be a metric space and let $A \subseteq X$.
(a) Prove that the boundary $\delta A$ of $A$ is closed.
(b) Find a set $B$ in a metric space $X$ so that $B$, $\delta B$ and $\delta(\delta B)$ are three distinct sets.
(c) Prove that if $A$ is closed, then $\overline{(X \setminus A) \cup A^{\circ}} = X$.
(d) Prove that for any set $A$ we have $\delta(\delta A) = \delta(\delta(\delta A))$.
16-46. Closure, interior and the boundary. Let $X$ be a metric space and let $A \subseteq X$.
(a) Prove that $X \setminus (A^{\circ}) = (X \setminus A)^{-}$.
(b) Prove that $X \setminus (A^{-}) = (X \setminus A)^{\circ}$.
(c) Prove that $\delta A = A^{-} \cap (X \setminus A)^{-}$.
(d) Prove that $A^{-\circ-} \subseteq A^{-}$ and show that the containment can be proper.
(e) Prove that $A^{\circ-\circ} \supseteq A^{\circ}$ and show that the containment can be proper.
16-47. Let $X$ be a normed space and let $Y \subseteq X$ be a normed subspace. Prove that $Y^{-}$ also is a normed subspace of $X$.
16-48. Let $X$ be a metric space, let $x \in X$ and let $r > 0$.
(a) Prove that $\overline{B_r(x)} = \{p \in X : d(p,x) \le r\}$.
(b) Prove that $\delta B_r(x) = \{p \in X : d(p,x) = r\}$.
16-49. Let $X$ be a metric space and let $S \subseteq X$ be a subset. Prove that $C \subseteq S$ is closed with respect to the induced metric on $S$ iff there is a set $D \subseteq X$ that is closed with respect to the metric on $X$ and such that $C = D \cap S$. Hint. Use Theorem 16.45. For “$\Rightarrow$,” let $D$ be the set of all limits of sequences in $C$.
16-50. Let $X, Y$ be metric spaces and let $f : X \to Y$. Prove that $f$ is continuous at $x \in X$ iff for all open subsets $V \subseteq Y$ that contain $f(x)$ the inverse image $f^{-1}[V] \subseteq X$ contains an open ball around $x$.
16-51. Let $x$ be a point in a metric space $X$. Can a neighborhood of $x$ be closed? Explain.
16-52. Let $X$ be a normed space and let $\Omega \subseteq X$ be an open subset. Prove that every $x \in \Omega$ is an accumulation point of $\Omega$.
16-53. Let $X, Y$ be metric spaces and let $f : X \to Y$ be a function. Define the oscillation of $f$ over the open set $U \subseteq X$ as $\omega_f(U) := \sup\{d(f(y), f(z)) : y, z \in U\}$. Define the oscillation of $f$ at $x \in X$ as $\omega_f(x) := \inf\{\omega_f(U) : x \in U, U \text{ open}\}$.
(a) Prove that $f$ is continuous at $x$ iff $\omega_f(x) = 0$.
(b) Prove that for all $\rho \ge 0$, the set $\{x \in X : \omega_f(x) \ge \rho\}$ is closed.
(c) Prove that for all $\rho \ge 0$, the set $\{x \in X : \omega_f(x) < \rho\}$ is open.
16-54. Prove that Cantor sets are closed.
16.5 Compactness The Bolzano-Weierstrass Theorem plays a key role in the proofs of several important results for functions of a single variable. For example, it is used to prove that a continuous function $f : [a,b] \to \mathbb{R}$ always assumes an absolute maximum value (see Theorem 3.44), as well as to show that continuous functions $f : [a,b] \to \mathbb{R}$ are uniformly continuous (see Lemma 5.19). It therefore is natural to investigate spaces that satisfy the conclusion of the Bolzano-Weierstrass Theorem.
Definition 16.57 Bolzano-Weierstrass formulation of compactness. A metric space $X$ is called compact iff every sequence $\{x_n\}_{n=1}^{\infty}$ of elements in $X$ has a convergent subsequence.
Compactness is usually formulated in topological terms. We will investigate this formulation, which is reminiscent of the Heine-Borel Theorem (see Theorem 8.4), in Theorem 16.72. In metric spaces, the Bolzano-Weierstrass formulation of compactness is equivalent to the topological formulation, but we need to be careful. In general topological spaces, the Bolzano-Weierstrass formulation of compactness is called sequential compactness. It is a consequence of (topological) compactness, but it is not equivalent to it. Closed and bounded subsets of finite dimensional spaces are the prototypical examples of compact sets (also see Theorem 16.80).
Example 16.58 Let $r > 0$ and let $C_r(0) := \{x \in \mathbb{R}^d : \|x\|_{\infty} \le r\}$. Then $C_r(0)$ is compact when equipped with the metric induced by the $\|\cdot\|_{\infty}$-norm.
A typical compactness proof with the Bolzano-Weierstrass formulation will take a sequence in the space and produce a convergent subsequence. Let $\{x_n\}_{n=1}^{\infty}$ be a sequence in $C_r(0)$. Then each component sequence $\left\{x_n^{(i)}\right\}_{n=1}^{\infty}$ is bounded. In particular, $\left\{x_n^{(1)}\right\}_{n=1}^{\infty}$ has a convergent subsequence $\left\{x_{n_k^1}^{(1)}\right\}_{k=1}^{\infty}$. Now let $1 \le j < d$ and assume $\left\{n_m^j\right\}_{m=1}^{\infty}$ is a strictly increasing sequence of integers such that for $1 \le i \le j$ the subsequences $\left\{x_{n_m^j}^{(i)}\right\}_{m=1}^{\infty}$ converge. Then $\left\{x_{n_m^j}^{(j+1)}\right\}_{m=1}^{\infty}$ has a convergent subsequence. That is, there is a strictly increasing sequence of integers $\left\{n_l^{j+1}\right\}_{l=1}^{\infty}$ such that for all $1 \le i \le j+1$ the subsequences $\left\{x_{n_l^{j+1}}^{(i)}\right\}_{l=1}^{\infty}$ converge.
Inductively we conclude that there is a subsequence $\{x_{n_k}\}_{k=1}^{\infty}$ such that for $1 \le i \le d$ the subsequences $\left\{x_{n_k}^{(i)}\right\}_{k=1}^{\infty}$ converge. Call each limit $x^{(i)}$ and let $x := \sum_{i=1}^{d} x^{(i)} e_i$. Then by Theorem 16.4 $\{x_{n_k}\}_{k=1}^{\infty}$ is a convergent subsequence of $\{x_n\}_{n=1}^{\infty}$ with limit $x$. Moreover, $x \in C_r(0)$, because $C_r(0)$ is closed. Since $\{x_n\}_{n=1}^{\infty}$ was arbitrary this means that $C_r(0)$ is compact. □
Compact subspaces of $\mathbb{R}^d$ will be investigated in detail in Section 16.6. As Example 16.58 indicates, compact spaces are usually subsets of larger metric spaces.
Definition 16.59 Let $X$ be a metric space. A subset $C \subseteq X$ is called compact iff $C$ with the induced metric is a compact metric space. Equivalently, $C$ is compact iff every sequence in $C$ has a convergent subsequence whose limit is in $C$.
Compact subsets of metric spaces are closed and bounded.
Proposition 16.60 Let X be a metric space and let C be a compact subset of X . Then C is closed and bounded.
Proof. Let $C \subseteq X$ be compact. To prove that $C$ is closed, suppose for a contradiction that $C$ is not closed. Then there is a sequence $\{x_n\}_{n=1}^{\infty}$ of elements of $C$ that converges in $X$ to a limit $x \notin C$. But then by Proposition 16.11 all subsequences of $\{x_n\}_{n=1}^{\infty}$ converge (in $X$) to $x$, and hence no subsequence of $\{x_n\}_{n=1}^{\infty}$ has a limit in $C$, a contradiction.
To prove that $C$ is bounded, suppose for a contradiction that $C$ is not bounded. Let $x_1 \in C$. Once $x_1, \ldots, x_n \in C$ have been chosen, we can find an $x_{n+1} \in C$ so that $d(x_{n+1}, x_n) \ge (n+1) + \max\{d(x_k, x_n) : k = 1, \ldots, n-1\}$. But then the inequality $d(x_{n+1}, x_k) \ge d(x_{n+1}, x_n) - d(x_n, x_k) \ge n+1$ holds for all $k = 1, \ldots, n$, and hence all subsequences of the inductively constructed sequence $\{x_n\}_{n=1}^{\infty}$ are unbounded. Now by Proposition 16.9 no subsequence of $\{x_n\}_{n=1}^{\infty}$ has a limit in $C$ (or even in $X$), a contradiction. ∎
Exercise 16-55 shows that the converse of Proposition 16.60 does not hold in general. Next, we note that the closed subsets of a compact space inherit the compactness.
Proposition 16.61 Let $X$ be a compact metric space and let $C \subseteq X$ be closed. Then $C$ is compact.
Proof. Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of elements of $C$. Because $C \subseteq X$ there is a subsequence $\{a_{n_k}\}_{k=1}^{\infty}$ that converges in $X$ to a limit $L$. But then by Theorem 16.45 $L \in C$, and hence $C$ is compact. ∎
The general version of Theorem 3.44 is now the following.
Theorem 16.62 Let $X$ be a compact metric space, let $Y$ be a metric space and let $f : X \to Y$ be continuous and surjective. Then $Y$ is compact.
Proof. Let $\{y_n\}_{n=1}^{\infty}$ be a sequence in $Y$. For each $n \in \mathbb{N}$ let $x_n \in X$ be such that $f(x_n) = y_n$. Because $X$ is compact, there is a subsequence $\{x_{n_k}\}_{k=1}^{\infty}$ of $\{x_n\}_{n=1}^{\infty}$ and an $x \in X$ such that $\lim_{k\to\infty} x_{n_k} = x$. But then $y := f(x) \in Y$, $\{y_{n_k}\}_{k=1}^{\infty}$ is a subsequence of $\{y_n\}_{n=1}^{\infty}$ and $\lim_{k\to\infty} y_{n_k} = \lim_{k\to\infty} f(x_{n_k}) = f(x) = y$. Therefore $Y$ is compact. ∎
Corollary 16.63 (Compare with Theorem 3.44.) Let $X$ be a compact metric space and let $f : X \to \mathbb{R}$ be continuous. Then $f$ assumes its absolute maximum on $X$. That is, there is an $x \in X$ so that $f(x) \ge f(z)$ for all $z \in X$.
Proof. The function $f$ is surjective onto $f[X] \subseteq \mathbb{R}$. By Theorem 16.62 this means $f[X]$ is compact and by Proposition 16.60 this means that $f[X]$ is closed and bounded. Then $M := \sup(f[X])$ is an element of $f[X]$. Therefore there is an $x \in X$ so that $f(x) = M$, and $M$ is greater than or equal to all other values $f$ assumes. ∎
It is also easy to see that the inverses of continuous functions on compact metric spaces are continuous.
Theorem 16.64 (Compare with Theorem 3.38.) Let $X$ be a compact metric space, let $Y$ be a metric space and let the function $f : X \to Y$ be continuous and injective. Then the inverse function $f^{-1} : f[X] \to X$ is continuous, too.
Proof (sketch). Let $\{y_n\}_{n=1}^{\infty}$ be a sequence in $f[X]$ that converges to $y \in f[X]$. Prove that every subsequence of $\left\{f^{-1}(y_n)\right\}_{n=1}^{\infty}$ has a subsequence that converges to $f^{-1}(y)$ and use Exercise 16-8. The full proof is left to the reader as Exercise 16-56. ∎
An important metric property of compact spaces is that they are complete.
Theorem 16.65 Let X be a compact metric space. Then X is complete. Proof. Exercise 16-57. Another important consequence of compactness is that continuous functions are uniformly continuous on compact metric spaces.
Definition 16.66 Let $X, Y$ be metric spaces. Then the function $f : X \to Y$ is called uniformly continuous iff for every $\varepsilon > 0$ there is a $\delta > 0$ such that for all $u, v \in X$ with $d(u,v) < \delta$ we have that $d(f(u), f(v)) < \varepsilon$.
Lemma 16.67 Let $X$ be a compact metric space, let $Y$ be a metric space and let the function $f : X \to Y$ be continuous. Then $f$ is uniformly continuous.
Proof. Mimic the proof of Lemma 5.19. (Exercise 16-58.)
Compactness typically is not formulated in terms of the Bolzano-Weierstrass Theorem, but in terms similar to the Heine-Borel Theorem (see Theorem 8.4). For metric spaces, both formulations are equivalent. Because the Heine-Borel formulation is exclusively in terms of open sets it is the topological (and thus more general) description of compactness. Recall that the Heine-Borel Theorem said that each open cover of a closed and bounded interval has a finite subcover.
Definition 16.68 A cover of a metric space $X$ is a family $\mathcal{C}$ of sets such that $X \subseteq \bigcup \mathcal{C}$. An open cover $\mathcal{C}$ is a cover such that all sets in $\mathcal{C}$ are open. For a subset $S$ of a metric space $X$, it is usually more natural to cover $S$ with sets that are open in $X$, even if this means that the sets are not contained in $S$. Hence, if $S \subseteq X$, we will also call a family $\mathcal{C}$ an open cover of $S$ iff all sets in $\mathcal{C}$ are open (in $X$) and $S \subseteq \bigcup \mathcal{C}$.
For a visualization of open covers, consider Figure 36.
Example 16.69 Open covers.
1. $\{B_n(0) : n \in \mathbb{N}\}$ is an open cover of $\mathbb{R}^d$.
2. $\{B_{\|x\|}(0) : x \in B_1(0)\}$ is an open cover of $B_1(0)$.
3. For any $a, b \in (0,1)$, the set $\left\{\left(\frac{1}{n}, 1 - \frac{1}{n}\right) : n \in \mathbb{N}\right\} \cup \{[0,a), (b,1]\}$ is an open cover of the interval $[0,1]$. Moreover, the set $\left\{\left(\frac{1}{n}, 1 - \frac{1}{n}\right) : n \in \mathbb{N}\right\} \cup \{(-1,a), (b,2)\}$ is an open cover of the interval $[0,1]$ if we consider $[0,1]$ as a subspace of $\mathbb{R}$.

Figure 36: Because compact sets are usually subsets of other spaces, open covers are typically visualized with sets that are open in the surrounding space (a) rather than with relatively open sets (b).
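For the cover of $[0,1]$ by the intervals $\left(\frac{1}{n}, 1 - \frac{1}{n}\right)$ together with $[0,a)$ and $(b,1]$, a finite subcover can be written down explicitly. The following is a sketch under the assumption $a, b \in (0,1)$; the choice of $n$ is one of many that work.

```latex
\begin{align*}
&\text{Choose } n \in \mathbb{N} \text{ with } n > \max\left\{ \tfrac{1}{a},\ \tfrac{1}{1-b} \right\},
  \text{ so that } \tfrac{1}{n} < a \text{ and } b < 1 - \tfrac{1}{n}. \\
&\text{Let } x \in [0,1]. \text{ If } x < a, \text{ then } x \in [0,a).
  \text{ If } x > b, \text{ then } x \in (b,1]. \\
&\text{Otherwise } a \le x \le b, \text{ hence } \tfrac{1}{n} < a \le x \le b < 1 - \tfrac{1}{n},
  \text{ that is, } x \in \left(\tfrac{1}{n},\ 1 - \tfrac{1}{n}\right). \\
&\text{Therefore } [0,1] = [0,a) \cup \left(\tfrac{1}{n},\ 1 - \tfrac{1}{n}\right) \cup (b,1].
\end{align*}
```

Three sets of the cover suffice, which is the behavior that the Heine-Borel formulation of compactness demands of every open cover.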
We have encountered the proof technique of finding finite subcovers of open covers in Proposition 8.5, Exercise 8-3c and Lemma 8.11. Definition 14.17 of the $d$-dimensional outer Lebesgue measure also relies on open covers and the remarks after the proof of Proposition 14.60 suggest that we need a version of the Heine-Borel Theorem to prove that $d$-dimensional Lebesgue measure is a product of lower dimensional Lebesgue measures. Indeed, in this text, the main use of finite subcovers of open covers is in connecting topology with measure theory. Theorem 16.72 below shows that compactness provides an abstract version of the Heine-Borel Theorem.
Lemma 16.70 Let $X$ be a compact metric space. Then for every $\varepsilon > 0$ there are $x_1, \ldots, x_n \in X$ so that $X \subseteq \bigcup_{j=1}^{n} B_\varepsilon(x_j)$.

Proof. Let $\varepsilon > 0$. Let $x_1 \in X$ be arbitrary. If $X \subseteq B_\varepsilon(x_1)$, stop. Otherwise continue as follows. If $x_1, \ldots, x_{n-1} \in X$ are so that $X \not\subseteq \bigcup_{j=1}^{n-1} B_\varepsilon(x_j)$, choose $x_n \notin \bigcup_{j=1}^{n-1} B_\varepsilon(x_j)$. If $X \subseteq \bigcup_{j=1}^{n} B_\varepsilon(x_j)$, stop, otherwise continue. This process cannot continue indefinitely, because if it did, $\{x_n\}_{n=1}^{\infty}$ would be a sequence such that for any distinct $m, n \in \mathbb{N}$ we have $d(x_m, x_n) \ge \varepsilon$. This would mean that $\{x_n\}_{n=1}^{\infty}$ has no convergent subsequence, a contradiction to the compactness of $X$. The $x_1, \ldots, x_n$ for which the construction stops are as desired.
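The selection process in the proof of Lemma 16.70 can be tried out numerically. The following Python sketch is an illustration only: it assumes a finite set of points in the unit square (a finite metric space is trivially compact), uses the Euclidean distance, and the names `greedy_epsilon_net`, `points` and `centers` are ours, not the text's.

```python
import math
import random


def greedy_epsilon_net(points, eps):
    """Greedy selection as in the proof of Lemma 16.70.

    A point becomes a new center only if it lies outside all open
    eps-balls around the centers chosen so far.
    """
    centers = []
    for p in points:
        if all(math.dist(p, c) >= eps for c in centers):
            centers.append(p)
    return centers


random.seed(0)
# Hypothetical sample: 2000 random points in the unit square.
points = [(random.random(), random.random()) for _ in range(2000)]
eps = 0.25
centers = greedy_epsilon_net(points, eps)

# Every point lies in the open eps-ball around some center.
covered = all(any(math.dist(p, c) < eps for c in centers) for p in points)
print(len(centers), covered)
```

The chosen centers are pairwise at least $\varepsilon$ apart, which is exactly the property that the proof uses to rule out an infinite selection.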
Definition 16.71 Let $X$ be a metric space and let $\mathcal{O}$ be an open cover. A finite subset $\{O_1, \ldots, O_n\} \subseteq \mathcal{O}$ so that $X \subseteq \bigcup_{j=1}^{n} O_j$ is also called a finite subcover.
Theorem 16.72 Heine-Borel formulation of compactness. A metric space $X$ is compact iff every open cover $\mathcal{O}$ of $X$ has a finite subcover.

Proof. For "$\Leftarrow$," we will prove the contrapositive. So assume that $X$ is not compact. Let $\{x_n\}_{n=1}^{\infty}$ be a sequence that does not have a convergent subsequence. Then (Exercise 16-59) for every $x \in X$ there is an $\varepsilon_x > 0$ so that $\{ n \in \mathbb{N} : x_n \in B_{\varepsilon_x}(x) \}$ is finite. But then $\mathcal{C} = \{ B_{\varepsilon_x}(x) : x \in X \}$ is an open cover of $X$ that cannot have a finite subcover.

For "$\Rightarrow$," let $X$ be compact and let $\mathcal{O}$ be an open cover. We first prove that there is an $\varepsilon > 0$ so that for every $x \in X$ there is an $O \in \mathcal{O}$ so that $B_\varepsilon(x) \subseteq O$. For a contradiction, suppose that this is not the case. Then for each $n \in \mathbb{N}$ there is an $x_n \in X$ such that $B_{\frac{1}{n}}(x_n)$ is not contained in any set $O \in \mathcal{O}$. Because $X$ is compact, $\{x_n\}_{n=1}^{\infty}$ has a convergent subsequence $\{x_{n_k}\}_{k=1}^{\infty}$. Let $x := \lim_{k \to \infty} x_{n_k}$. Then there is an $O \in \mathcal{O}$ so that $x \in O$. Moreover, there is an $\varepsilon > 0$ so that $B_\varepsilon(x) \subseteq O$. Now let $k \in \mathbb{N}$ be such that $\frac{1}{n_k} < \frac{\varepsilon}{2}$ and such that $d(x_{n_k}, x) < \frac{\varepsilon}{2}$. Then for all $y \in B_{\frac{1}{n_k}}(x_{n_k})$ we have that
$$d(y, x) \le d(y, x_{n_k}) + d(x_{n_k}, x) < \frac{1}{n_k} + \frac{\varepsilon}{2} < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
Consequently, the containments $B_{\frac{1}{n_k}}(x_{n_k}) \subseteq B_\varepsilon(x) \subseteq O$ provide the desired contradiction.

Now let $\varepsilon > 0$ be so that for every $x \in X$ there is an $O \in \mathcal{O}$ so that $B_\varepsilon(x) \subseteq O$. By Lemma 16.70, there are finitely many $x_1, \ldots, x_n \in X$ so that $X \subseteq \bigcup_{j=1}^{n} B_\varepsilon(x_j)$. For each $j = 1, \ldots, n$, let $O_j \in \mathcal{O}$ be such that $B_\varepsilon(x_j) \subseteq O_j$. Then $X \subseteq \bigcup_{j=1}^{n} B_\varepsilon(x_j) \subseteq \bigcup_{j=1}^{n} O_j$ and $\{O_j\}_{j=1}^{n}$ is the desired finite subcover of $\mathcal{O}$.
Typically, when we invoke compactness we obtain a finite subcover of a cover with sets that are open in a surrounding space. It is not necessary to explicitly prove that a subset $S$ of a metric space is compact iff every open cover $\mathcal{O}$ (with sets that are open in the surrounding space) has a finite subcover. The translation is easily made by finding a finite subcover $\{ O_1 \cap S, \ldots, O_n \cap S \}$ of the corresponding cover $\mathcal{O}_S := \{ O \cap S : O \in \mathcal{O} \}$ with relatively open sets and then going back to the sets $\{ O_1, \ldots, O_n \}$ that are open in $X$ (also see Figure 36).
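To see how a finite subcover can fail to exist, consider the open interval $(0, 1)$ with the open cover $\{ (\frac{1}{n}, 1) : n \ge 2 \}$. This example is ours, not the text's. Any finite subfamily misses the point $\frac{1}{N+1}$, where $N$ is the largest index used. The Python sketch below checks this for one hypothetical finite subfamily, using exact rational arithmetic; the function names are illustrative.

```python
from fractions import Fraction


def covered(x, indices):
    """True iff x lies in one of the open intervals (1/n, 1), n in indices."""
    return any(Fraction(1, n) < x < 1 for n in indices)


def uncovered_point(indices):
    """A point of (0, 1) that lies in none of the intervals (1/n, 1)."""
    return Fraction(1, max(indices) + 1)


finite_family = [2, 5, 17, 1000]   # hypothetical finite subfamily
x = uncovered_point(finite_family)

print(x, covered(x, finite_family))    # x is missed by the finite family
print(covered(x, range(2, 2000)))      # but a later interval contains x
```

By Theorem 16.72 this shows again that $(0, 1)$ is not compact.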
Exercises

16-55. Prove that $B_1 := \{ x \in l^\infty : \|x\|_\infty \le 1 \}$ is a closed and bounded subset of $l^\infty$ that is not compact.

16-56. Prove Theorem 16.64.

16-57. Prove Theorem 16.65. Hint. Exercise 16-16.

16-58. Prove Lemma 16.67.
16-59. Prove that if $X$ is a metric space, $\{x_n\}_{n=1}^{\infty}$ is a sequence and $x \in X$ is such that for all $\varepsilon > 0$ the set $\{ n \in \mathbb{N} : x_n \in B_\varepsilon(x) \}$ is infinite, then $\{x_n\}_{n=1}^{\infty}$ has a convergent subsequence.
16-60. Let $X$ be a metric space. Prove that if $C \subseteq X$ is not closed, then there exists an open cover without a finite subcover.

16-61. Let $X$ be a metric space. Prove that if $C \subseteq X$ is not bounded, then there exists an open cover without a finite subcover.
16-62. Let $X$ be a metric space, let $K \subseteq X$ be compact, and let $O \subseteq X$ be open such that $K \subseteq O$. Prove that there is an $\varepsilon > 0$ such that for all $x \in K$ we have $B_\varepsilon(x) \subseteq O$.

16-63. Let $X$ be a metric space such that all closed and bounded subsets are compact and let $\{a_n\}_{n=1}^{\infty}$ be a sequence. Prove that $\{a_n\}_{n=1}^{\infty}$ diverges if and only if there is a subsequence $\{a_{n_k}\}_{k=1}^{\infty}$ that is unbounded, or there are two subsequences $\{a_{n_k}\}_{k=1}^{\infty}$ and $\{a_{l_m}\}_{m=1}^{\infty}$ such that $\lim_{m \to \infty} a_{l_m}$ and $\lim_{k \to \infty} a_{n_k}$ both exist, but are not equal.
16-64. More on Lemma 16.70.
(a) Prove that the conclusion of Lemma 16.70 is not equivalent to compactness by showing that the open interval $(0, 1)$ satisfies the conclusion of Lemma 16.70.
(b) Prove that a metric space $X$ is compact iff
i. $X$ is complete and
ii. For every $\varepsilon > 0$, there are $x_1, \ldots, x_n \in X$ so that $X \subseteq \bigcup_{j=1}^{n} B_\varepsilon(x_j)$.
Hint. We only need to prove "$\Leftarrow$." For this direction, let $\{y_j\}_{j=1}^{\infty}$ be a sequence in $X$. Construct a Cauchy sequence of points $\{z_k\}_{k=1}^{\infty}$ in $X$ so that $\{ j \in \mathbb{N} : y_j \in B_{\frac{1}{k}}(z_k) \}$ is infinite. You will need to take a subsequence of a subsequence when constructing $z_{k+1}$ after obtaining $z_1, \ldots, z_k$.
16-65. Continuous functions need not be bounded on noncompact closed and bounded sets. For each $n \in \mathbb{N}$ define the function $f : l^\infty \to \mathbb{R}$ to be $f(x) := n \left( 1 - [\ldots] \right)$ on the ball $B_{\frac{1}{10}}(e_n)$ around the $n$th unit vector and zero on $l^\infty \setminus \bigcup_{n=1}^{\infty} B_{\frac{1}{10}}(e_n)$. Prove that $f$ is continuous on $l^\infty$ and unbounded on $B_1(0) \subseteq l^\infty$.

16-66. The tangent function can be inverted on the interval $\left( -\frac{\pi}{2}, \frac{\pi}{2} \right)$. Its inverse is the arctangent function $\arctan : \mathbb{R} \to \left( -\frac{\pi}{2}, \frac{\pi}{2} \right)$. Extend $\arctan(\cdot)$ to the interval $[-\infty, \infty]$ by defining $\arctan(\infty) := \frac{\pi}{2}$ and $\arctan(-\infty) := -\frac{\pi}{2}$. For $x, y \in [-\infty, \infty]$ define $d_c(x, y) := |\arctan(x) - \arctan(y)|$.
(a) Prove that $([-\infty, \infty], d_c)$ is a compact metric space.
(b) Prove that if $\{x_n\}_{n=1}^{\infty}$ is a sequence of real numbers and $x \in \mathbb{R}$, then $\lim_{n \to \infty} d_c(x_n, x) = 0$ iff $\lim_{n \to \infty} |x_n - x| = 0$.
(c) Prove that if $\{x_n\}_{n=1}^{\infty}$ is a sequence of real numbers, then we have $\lim_{n \to \infty} d_c(x_n, \infty) = 0$ iff $\lim_{n \to \infty} x_n = \infty$ in the sense of Definition 2.42.
16-67. Prove that if $f : [c, d] \times [a, b] \to \mathbb{R}$ is continuous, then the function $g : [a, b] \to \mathbb{R}$ defined by $g(t) := \int_c^d f(x, t) \, dx$ is continuous.
Hint. Use the uniform continuity of $f$.
16-68. Lemmas for the Stone-Weierstrass Theorem.
(a) Dini's Theorem. Let $\{f_n\}_{n=1}^{\infty}$ be a nondecreasing sequence of continuous real-valued functions on $[a, b]$ that converges pointwise to the continuous function $f : [a, b] \to \mathbb{R}$. Prove that $\{f_n\}_{n=1}^{\infty}$ converges uniformly to $f$.
Hint. Cover the interval with open intervals $(x - \delta_x, x + \delta_x)$ on which $f$ is close to a function $f_{n_x}$. Then take a finite subcover.
(b) Prove that the sequence $\{P_n\}_{n=0}^{\infty}$ of polynomials defined recursively for all $x \in [0, 1]$ by $P_0(x) := 0$ and $P_{n+1}(x) := P_n(x) + \frac{1}{2} \left( x - P_n^2(x) \right)$ converges uniformly on $[0, 1]$ to the function $f(x) = \sqrt{x}$.
Hint. Same approach as in Exercise 2-42.
(c) Prove that for any $M > 0$ there is a sequence of polynomials that converges uniformly to $f(x) = \sqrt{x}$ on $[0, M]$.
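The recursion in Exercise 16-68b can be explored numerically before attempting the proof. The sketch below assumes the recursion reads $P_0(x) := 0$, $P_{n+1}(x) := P_n(x) + \frac{1}{2}(x - P_n^2(x))$, evaluates $P_n$ pointwise rather than symbolically, and approximates the uniform distance to $\sqrt{x}$ on a grid of 1001 points. The function names and the grid are our choices.

```python
import math


def p(n, x):
    """Evaluate P_n(x) via P_0 = 0, P_{k+1} = P_k + (x - P_k**2) / 2."""
    value = 0.0
    for _ in range(n):
        value = value + 0.5 * (x - value * value)
    return value


# Grid on [0, 1]; the true supremum is only approximated from below.
grid = [k / 1000 for k in range(1001)]


def sup_error(n):
    """Largest observed gap between sqrt(x) and P_n(x) on the grid."""
    return max(abs(math.sqrt(x) - p(n, x)) for x in grid)


errors = [sup_error(n) for n in (1, 10, 100)]
print(errors)
```

The observed errors shrink as $n$ grows, and the slowest convergence occurs near $x = 0$. This is consistent with the classical estimate $0 \le \sqrt{x} - P_n(x) \le \sqrt{x} \left( 1 - \frac{\sqrt{x}}{2} \right)^n < \frac{2}{n+1}$, which is one possible route to the uniform convergence claimed in the exercise.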
16.6 The Normed Topology of $\mathbb{R}^d$

So far, on $d$-dimensional space we have only worked with the uniform norm $\|\cdot\|_\infty$. This norm is easy to work with, but it does not measure the usual Euclidean distance. Therefore we will now investigate how norms in $d$-dimensional space relate to each other. It turns out that all norms on finite dimensional spaces are equivalent and that they induce the same notion of convergence. This means there is no loss of generality in working with the uniform norm on a finite dimensional space.
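Lemma 16.73 Let $\|\cdot\|$ be a norm on $\mathbb{R}^d$. Then for all $x = (x_1, \ldots, x_d) \in \mathbb{R}^d$ the inequality $\|x\| \le \|x\|_\infty \sum_{i=1}^{d} \|e_i\|$ holds, where $e_i$ denotes the $i$th unit vector in $\mathbb{R}^d$.

Proof. For all $x = (x_1, \ldots, x_d) \in \mathbb{R}^d$ we obtain
$$\|x\| = \left\| \sum_{i=1}^{d} x_i e_i \right\| \le \sum_{i=1}^{d} |x_i| \, \|e_i\| \le \|x\|_\infty \sum_{i=1}^{d} \|e_i\|.$$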
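Theorem 16.74 If both $\|\cdot\|_1$ and $\|\cdot\|_2$ are norms on $\mathbb{R}^d$, then there are real numbers $c, C > 0$ such that for all $x \in \mathbb{R}^d$ the inequalities $c \|x\|_1 \le \|x\|_2 \le C \|x\|_1$ hold.

Proof. Let $\|\cdot\|$ be an arbitrary norm on $\mathbb{R}^d$. By Lemma 16.73, for all points $x, y \in \mathbb{R}^d$ we infer $\big| \|x\| - \|y\| \big| \le \|x - y\| \le \|x - y\|_\infty \sum_{i=1}^{d} \|e_i\|$. Thus $\|\cdot\|$ is continuous with respect to $\|\cdot\|_\infty$. By Example 16.58, Proposition 16.61, and Corollary 16.63 we conclude that the norm $\|\cdot\|$ assumes an absolute minimum and an absolute maximum on the compact set $B := \{ y \in \mathbb{R}^d : \|y\|_\infty = 1 \}$. Moreover, the absolute minimum cannot be zero, because $\|x\| = 0$ implies $x = 0$ and $0 \notin B$.

The result will be proved if we can show that for any norms $\|\cdot\|_1$ and $\|\cdot\|_2$ on $\mathbb{R}^d$ there is a $C > 0$ such that for all $x \in \mathbb{R}^d \setminus \{0\}$ we have that $\|x\|_2 \le C \|x\|_1$. Let $M := \max\{ \|y\|_2 : y \in B \}$ and $m := \min\{ \|y\|_1 : y \in B \}$, and let $x \in \mathbb{R}^d \setminus \{0\}$. Then $\frac{x}{\|x\|_\infty} \in B$, and hence
$$\|x\|_2 = \|x\|_\infty \left\| \frac{x}{\|x\|_\infty} \right\|_2 \le M \|x\|_\infty = \frac{M}{m} \, m \, \|x\|_\infty \le \frac{M}{m} \|x\|_\infty \left\| \frac{x}{\|x\|_\infty} \right\|_1 = \frac{M}{m} \|x\|_1,$$
so $C := \frac{M}{m}$ is as desired.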
Figure 37: Geometrically, Theorem 16.76 says that on a finite dimensional space inside any ball with respect to one norm we can find a ball with respect to any other norm and with the same center. Moreover, this smaller ball contains a ball with respect to the first norm with the same center. The figure shows this nesting for balls with respect to the three most common norms on $\mathbb{R}^2$, the uniform norm $\|\cdot\|_\infty$ (dashed), the Euclidean norm $\|\cdot\|_2$ (solid), and the taxicab norm $\|\cdot\|_1$ (dotted).
Norms that satisfy the conclusion of Theorem 16.74 are also called equivalent. Theorem 16.76 below shows that any two norms on a finite dimensional vector space are equivalent. Figure 37 provides a visualization of equivalence for norms.
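For the three norms shown in Figure 37, explicit equivalence constants are classical: for all $x \in \mathbb{R}^d$ we have $\|x\|_\infty \le \|x\|_2 \le \|x\|_1 \le d \, \|x\|_\infty$, where here $\|\cdot\|_2$ and $\|\cdot\|_1$ denote the Euclidean and taxicab norms. The Python sketch below spot-checks this chain on random vectors. It is a numerical illustration, not a proof; the helper names, the dimension $d = 5$ and the small floating point tolerance are our assumptions.

```python
import math
import random


def norm_inf(x):
    """Uniform norm: largest absolute component."""
    return max(abs(t) for t in x)


def norm_2(x):
    """Euclidean norm."""
    return math.sqrt(sum(t * t for t in x))


def norm_1(x):
    """Taxicab norm: sum of absolute components."""
    return sum(abs(t) for t in x)


random.seed(1)
d = 5
samples = [[random.uniform(-10, 10) for _ in range(d)] for _ in range(1000)]
tol = 1e-9  # slack for floating point rounding

chain_holds = all(
    norm_inf(x) <= norm_2(x) + tol
    and norm_2(x) <= norm_1(x) + tol
    and norm_1(x) <= d * norm_inf(x) + tol
    for x in samples
)
print(chain_holds)
```

The constants cannot be improved: a unit vector $e_i$ gives equality in the first two inequalities, and the vector $(1, \ldots, 1)$ gives equality in the last one.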
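Definition 16.75 Let $X$ be a vector space and let $\|\cdot\|_1$ and $\|\cdot\|_2$ be norms on $X$. Then $\|\cdot\|_1$ and $\|\cdot\|_2$ are called equivalent iff there are real numbers $c, C > 0$ such that for all $x \in X$ we have $c \|x\|_1 \le \|x\|_2 \le C \|x\|_1$.

Theorem 16.76 Let $X$ be a finite dimensional vector space. Then all norms on $X$ are equivalent.

Proof. Let $\|\cdot\|_1$ and $\|\cdot\|_2$ be two norms on $X$. Let $\{b_1, \ldots, b_d\}$ be a base of $X$ and let $\Phi : X \to \mathbb{R}^d$ be the isomorphism that maps each $b_i$ to $e_i$. For $k = 1, 2$, define
$$\left\| \sum_{i=1}^{d} x^{(i)} e_i \right\|_{\mathbb{R}^d, k} := \left\| \sum_{i=1}^{d} x^{(i)} b_i \right\|_k.$$
Then $\|\cdot\|_{\mathbb{R}^d, 1}$ and $\|\cdot\|_{\mathbb{R}^d, 2}$ are norms on $\mathbb{R}^d$, so by Theorem 16.74 there are $c, C > 0$ such that $c \|y\|_{\mathbb{R}^d, 1} \le \|y\|_{\mathbb{R}^d, 2} \le C \|y\|_{\mathbb{R}^d, 1}$ for all $y \in \mathbb{R}^d$. But then for all $x := \sum_{i=1}^{d} x^{(i)} b_i \in X$ we infer
$$c \|x\|_1 = c \left\| \sum_{i=1}^{d} x^{(i)} e_i \right\|_{\mathbb{R}^d, 1} \le \left\| \sum_{i=1}^{d} x^{(i)} e_i \right\|_{\mathbb{R}^d, 2} = \|x\|_2 \le C \|x\|_1.$$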
Equivalent norms induce the same notion of convergence.
Proposition 16.77 Let $X$ be a vector space, let $\|\cdot\|_1$ and $\|\cdot\|_2$ be two equivalent norms on $X$, let $\{x_n\}_{n=1}^{\infty}$ be a sequence in $X$ and let $x \in X$. Then $\lim_{n \to \infty} x_n = x$ in $(X, \|\cdot\|_1)$ iff $\lim_{n \to \infty} x_n = x$ in $(X, \|\cdot\|_2)$.

Proof. Exercise 16-69.
Because equivalent norms induce the same notion of convergence, sequences in d-dimensional space converge iff their component sequences converge.
Theorem 16.78 Let $X$ be a finite dimensional normed space and let $\{b_1, \ldots, b_d\}$ be a base of $X$. For each element $x$ in $X$, let $x^{(1)}, \ldots, x^{(d)}$ be the components such that $x = \sum_{i=1}^{d} x^{(i)} b_i$. Then a sequence $\{x_n\}_{n=1}^{\infty}$ converges to $L$ in $X$ iff all component sequences $\{x_n^{(i)}\}_{n=1}^{\infty}$ converge to $L^{(i)}$.

Proof. Use Theorems 16.4, 16.76, and Proposition 16.77. (Exercise 16-70.)
In particular, we obtain that all finite dimensional normed spaces are complete.
Theorem 16.79 Let $X$ be a finite dimensional normed space. Then $X$ is complete.

Proof. Exercise 16-71.
Moreover, in finite dimensional spaces compactness is equivalent to being closed and bounded.
Theorem 16.80 A subset $C$ of a finite dimensional normed space $X$ is compact iff it is closed and bounded.

Proof. The direction "$\Rightarrow$" follows from Proposition 16.60. For "$\Leftarrow$," let $\{b_1, \ldots, b_d\}$ be a base of $X$ and for each $x \in X$ let $x^{(1)}, \ldots, x^{(d)}$ be the components so that $x = \sum_{i=1}^{d} x^{(i)} b_i$. Let $C \subseteq X$ be closed and bounded and let $\{x_n\}_{n=1}^{\infty}$ be a sequence in $C$. Because the component sequence $\{x_n^{(1)}\}_{n=1}^{\infty}$ is bounded in $\mathbb{R}$, there is a convergent subsequence $\{x_{n_j^1}^{(1)}\}_{j=1}^{\infty}$ with limit $x^{(1)}$. Now suppose $\{n_j^i\}_{j=1}^{\infty}$ has been chosen so that for $m = 1, \ldots, i$ the sequence $\{x_{n_j^i}^{(m)}\}_{j=1}^{\infty}$ has a limit $x^{(m)}$. Then we can choose a subsequence $\{n_j^{i+1}\}_{j=1}^{\infty}$ of $\{n_j^i\}_{j=1}^{\infty}$ so that $\{x_{n_j^{i+1}}^{(i+1)}\}_{j=1}^{\infty}$ has a limit $x^{(i+1)}$. But then for $m = 1, \ldots, i+1$ the sequence $\{x_{n_j^{i+1}}^{(m)}\}_{j=1}^{\infty}$ has a limit $x^{(m)}$. Continue this selection process up to $i = d$. By Theorem 16.78 the subsequence $\{x_{n_j^d}\}_{j=1}^{\infty}$ converges to $x := \sum_{i=1}^{d} x^{(i)} b_i$, and because $C$ is closed, $x \in C$.
Theorem 16.80 and the Heine-Borel formulation of compactness allow us to prove that if $m, n, d \in \mathbb{N}$ with $m + n = d$, then for all sets $S \in \Sigma_{\lambda_m} \times \Sigma_{\lambda_n}$ the $d$-dimensional Lebesgue measure $\lambda_d(S)$ is equal to the product measure $\lambda_m \times \lambda_n(S)$ of $m$-dimensional and $n$-dimensional Lebesgue measure. In particular, this completes the investigation started in Proposition 14.60.
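Theorem 16.81 Let $d \in \mathbb{N}$ and for $i = 1, \ldots, d$ let $J_i$ be an interval of finite length. Then $\lambda_d \left( \prod_{i=1}^{d} J_i \right) = \prod_{i=1}^{d} |J_i|$. Moreover, if the numbers $m, n \in \mathbb{N}$ satisfy $m + n = d$, then $\lambda_d |_{\Sigma_{\lambda_m} \times \Sigma_{\lambda_n}} = \lambda_m \times \lambda_n$, that is, the restriction of the Lebesgue measure on $\mathbb{R}^d$ to $\Sigma_{\lambda_m} \times \Sigma_{\lambda_n}$ is equal to the product of the Lebesgue measures on $\mathbb{R}^m$ and $\mathbb{R}^n$.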
Proof. The inequality $\lambda_d \left( \prod_{i=1}^{d} J_i \right) \le \prod_{i=1}^{d} |J_i|$ follows directly from the definition of outer Lebesgue measure. To prove the reversed inequality, we proceed as follows. For $i = 1, \ldots, d$, let $a_i$ be the left endpoint of $J_i$ and let $b_i$ be the right endpoint. Let $K := \prod_{i=1}^{d} [a_i, b_i]$. It is easy to prove (see Exercise 16-72) the equality $\lambda_d \left( \prod_{i=1}^{d} J_i \right) = \lambda_d(K)$. Now let $\varepsilon > 0$ and let $\{D_j\}_{j=1}^{\infty}$ be a sequence of dyadic open boxes so that $K \subseteq \bigcup_{j=1}^{\infty} D_j$ and $\lambda_d(K) + \varepsilon > \sum_{j=1}^{\infty} |D_j|$. (By Exercise 14-17, such a sequence exists.) Because $K$ is compact, we can assume without loss of generality that there is an $N \in \mathbb{N}$ so that $K \subseteq \bigcup_{j=1}^{N} D_j$. For each $j \in \{1, \ldots, N\}$, let $D_j = \prod_{i=1}^{d} (a_i^j, b_i^j)$. Let $M$ be the largest integer so that $2^M$ is the denominator of any of the completely simplified dyadic rational numbers $a_i^j$ and $b_i^j$. Let $C_M$ be the set of all cubes of the form $\prod_{i=1}^{d} \left( \frac{c_i}{2^M}, \frac{c_i + 1}{2^M} \right)$, where the $c_i$ are integers. Then for each $D_j$ the equality $|D_j| = \sum_{E \in C_M, \, E \cap D_j \ne \emptyset} |E|$ holds (see Exercise 16-73). For $i = 1, \ldots, d$, let $l_i$ be the largest integer so that $\frac{l_i}{2^M} < a_i$ and let $r_i$ be the smallest integer so that $\frac{r_i}{2^M} > b_i$. Then for every $E \in C_M$ that is contained in $Q := \prod_{i=1}^{d} \left( \frac{l_i}{2^M}, \frac{r_i}{2^M} \right)$ there is a $j \in \{1, \ldots, N\}$ so that $E \subseteq D_j$. Therefore
$$\prod_{i=1}^{d} |J_i| \le |Q| = \sum_{E \in C_M, \, E \subseteq Q} |E| \le \sum_{j=1}^{N} \sum_{E \in C_M, \, E \cap D_j \ne \emptyset} |E| = \sum_{j=1}^{N} |D_j| < \lambda_d(K) + \varepsilon = \lambda_d \left( \prod_{i=1}^{d} J_i \right) + \varepsilon.$$
Because $\varepsilon$ was arbitrary we conclude $\lambda_d \left( \prod_{i=1}^{d} J_i \right) \ge \prod_{i=1}^{d} |J_i|$, and hence the two sides are equal.

In particular, this means that if $A$ is an $m$-dimensional open box and $B$ is an $n$-dimensional open box, then $\lambda_d(A \times B) = \lambda_m(A) \lambda_n(B)$. By Proposition 14.60, this equation also holds when one of $A$ or $B$ is a null set and the other is an arbitrary Lebesgue measurable set. But then, because the ($m$- and $n$-dimensional) Lebesgue measurable sets are the $\sigma$-algebra generated by the open boxes and the null sets, the above and Theorem 14.57 prove that $\lambda_d |_{\Sigma_{\lambda_m} \times \Sigma_{\lambda_n}} = \lambda_m \times \lambda_n$.
Exercises

16-69. Prove Proposition 16.77.

16-70. Prove Theorem 16.78.

16-71. Prove Theorem 16.79.

16-72. Let $d \in \mathbb{N}$ and for $i = 1, \ldots, d$ let $J_i$ be a, not necessarily closed, interval of finite length with left endpoint $a_i$ and right endpoint $b_i$. Prove that $\lambda_d \left( \prod_{i=1}^{d} J_i \right) = \lambda_d \left( \prod_{i=1}^{d} [a_i, b_i] \right)$.

16-73. Let $M \in \mathbb{N}$ and let $C_M$ be the set of all cubes of the form $\prod_{i=1}^{d} \left( \frac{c_i}{2^M}, \frac{c_i + 1}{2^M} \right)$, where the $c_i$ are integers. Prove that for each dyadic open box of the form $D = \prod_{i=1}^{d} \left( \frac{a_i}{2^M}, \frac{b_i}{2^M} \right)$ the equality $|D| = \sum_{E \in C_M, \, E \cap D \ne \emptyset} |E|$ holds.
16-74. The Fundamental Theorem of Algebra. Let $P : \mathbb{C} \to \mathbb{C}$ defined by $P(z) := \sum_{k=0}^{n} a_k z^k$ be a nonconstant complex polynomial.
(a) Prove that $P$ is continuous.
(b) Prove that $|P| : \mathbb{C} \to [0, \infty)$ assumes an absolute minimum in $\mathbb{C}$.
Hint. Recall that $\mathbb{C}$ is (as a metric space) isomorphic to $\mathbb{R}^2$. Prove that for any $M > 0$ there is an $r > 0$ so that for all $|z| > r$ the inequality $|P(z)| > M$ holds.
(c) Now prove the Fundamental Theorem of Algebra. That is, prove that there must be a $z \in \mathbb{C}$ so that $P(z) = 0$.
Hint. Suppose there is no such $z$, let the absolute minimum of $|P|$ be assumed at $z_0$ and consider $Q(z) := \frac{1}{P(z_0)} P(z + z_0)$. Then $Q(z) = 1 + b_m z^m + \sum_{j=m+1}^{n} b_j z^j$ for some $m \in \mathbb{N}$. Apply the triangular inequality and find a $z$ with $|Q(z)| < 1$.
(d) Prove that there are, not necessarily distinct, $z_1, \ldots, z_n \in \mathbb{C}$ so that $P(z) = a_n \prod_{j=1}^{n} (z - z_j)$ for all $z \in \mathbb{C}$.
16-75. Partial Fraction Decompositions.
(a) Let $P$ be a polynomial with real coefficients. Prove that if $z \in \mathbb{C}$ is so that $P(z) = 0$, then $P(\bar{z}) = 0$.
(b) Use the Fundamental Theorem of Algebra to prove that each polynomial with real coefficients can be written as a product of the leading coefficient, linear factors $(x - c)$ and irreducible quadratic factors $\left( (x - a)^2 + b^2 \right)$, where all constants $a$, $b$, and $c$ are real.
(c) Prove that every rational function with real coefficients can be written as the sum of a polynomial and a linear combination of horizontally shifted rational functions as in Exercises 12-11, 12-17d, and 12-18.
Hint. Induction on the degree of the denominator.
(d) Explain why (at least in principle) it is possible to find a symbolic antiderivative for every rational function with real coefficients.

16-76. Prove that on $l^2$ the norms $\|\cdot\|_2$ and $\|\cdot\|_\infty$ are not equivalent.

16-77. Prove that in a finite dimensional normed space a series converges unconditionally iff it converges absolutely.

16-78. Let $X$ be an infinite dimensional inner product space. Prove that $\{ x \in X : \|x\| \le 1 \}$ is closed and bounded, but not compact.
Hint. Apply the Gram-Schmidt Orthonormalization Procedure to a sequence $\{b_n\}_{n=1}^{\infty}$ of linearly independent vectors in $X$ to obtain a countable orthonormal system in $\{ x \in X : \|x\| \le 1 \}$.

16-79. Proceed as follows to prove that a normed space $X$ is finite dimensional iff compactness is equivalent to being closed and bounded.
(a) Briefly explain why we only need to prove "$\Leftarrow$."
For "$\Leftarrow$" we prove the contrapositive. So for the remainder, let $X$ be an infinite dimensional normed space and let $\{b_n\}_{n=1}^{\infty}$ be a sequence in $X$ so that any finite subset is linearly independent. Even though we do not necessarily have an inner product in $X$, we can adapt the idea from Exercise 16-78.
Let $A \subseteq X$ be a nonempty subset. For all $x \in X$, define the distance from $x$ to $A$ as $\operatorname{dist}(x, A) := \inf\{ d(x, a) : a \in A \}$. (We will investigate this function in Section 16.9.) For $v_1, \ldots, v_n \in X$ and $r \ge 0$ let $B_r^{\operatorname{span}(v_1, \ldots, v_n)} := \{ x \in \operatorname{span}(v_1, \ldots, v_n) : \|x\| \le r \}$.
Prove that for any element w
f!
span(u1,. . . , u n ) there is an a ( w ) E B~pan(vl""'L'n) so that
1 w - a ( w ) /I = dist ( w , B1span(u1, ....u,) Prove that
/ a(w) j/
}.
) z 0.
< 2l/wl/ 1 4
1
Prove that if llwll < -,then dist ( w - a ( w ) , B~pan(u13""u") ) = w - a(w)
Construct a sequence $\{u_n\}_{n=1}^{\infty}$ in $X$ so that $\|u_n\| = 1$ for all $n \in \mathbb{N}$ and so that for all distinct $i, j \in \mathbb{N}$ we have $\|u_i - u_j\| \ge 1$.
Finish the proof of the contrapositive of "$\Leftarrow$."
16-80. Fubini's Theorem revisited. Let h2 be Lebesgue measure on R2 and let hx and h , denote Lebesgue measure on the x - and y-axes, respectively. (a) Prove that if the function f : R2 + [-m, m] is Lebesgue integrable and hx xh,,-measurable, then for almost all elements x E W the function f x ( y ) := f (x,y ) is Lebesgue integrable, for almost all elements y E R the function f ' ( x ) := f ( x , y ) is Lebesgue integrable, the function x H f x dh?; if fx is Lebesgue is Lebesgue integrable and the function 0: otherwise,
{h
(c) State and prove a result similar to the result in part 16-80a for the Lebesgue integral on R3, representing it as an iteration of three single variable Lebesgue integrals. (d) Compute the following integrals:
16.7 Dense Subspaces

Recall that the integral of a nonnegative measurable function is defined as a supremum of integrals of simple functions. This means that for every integrable function there should be simple functions arbitrarily "close" to it. The concept of a dense subset expresses this idea in precise terms.
Definition 16.82 Let $X$ be a metric space. A set $S \subseteq X$ is called dense in $X$ iff for every $\varepsilon > 0$ and every $x \in X$ there is an $s \in S$ so that $d(x, s) < \varepsilon$.

So a subset $S$ of a metric space $X$ is dense iff every neighborhood of every point of $X$ contains a point in $S$. In terms of approximating elements, we can say the following.

Proposition 16.83 Let $X$ be a metric space. Then $S \subseteq X$ is dense in $X$ iff for all $x \in X$ there is a sequence $\{s_n\}_{n=1}^{\infty}$ of elements in $S$ so that $\lim_{n \to \infty} s_n = x$.

Proof. Use Standard Proof Technique 3.8. (Exercise 16-81.)

The simplest example of a dense subset is the set of rational numbers as a subset of the real numbers.

Theorem 16.84 $\mathbb{Q}$ is dense in $\mathbb{R}$.

Proof. Use Theorem 1.36. (Exercise 16-82.)

Once we take care of the usual problem of equality almost everywhere, the above mentioned simple functions can be considered "dense in $L^p$."

Theorem 16.85 Let $(M, \Sigma, \mu)$ be a measure space and let $1 \le p < \infty$. Then the set $S := \{ [s] : s \in F(M, \mathbb{R}) \text{ is simple} \}$ is dense in $L^p(M, \Sigma, \mu)$.
Proof. First, consider a nonnegative function $g \in \mathcal{L}^p(M, \Sigma, \mu)$. From the proof of Theorem 14.29 (see proof of Theorem 9.19), we infer that there is a sequence $\{s_n\}_{n=1}^{\infty}$ of nonnegative simple functions that converges pointwise to $g$ with $0 \le s_n \le g$ for all $n \in \mathbb{N}$. Hence, the sequence $\{ |s_n - g|^p \}_{n=1}^{\infty}$ converges pointwise to zero and it is bounded by $g^p \in \mathcal{L}^1(M, \Sigma, \mu)$. Thus by the Dominated Convergence Theorem we obtain $\lim_{n \to \infty} \int_M |s_n - g|^p \, d\mu = 0$, that is, $\lim_{n \to \infty} \big\| [s_n] - [g] \big\|_p = 0$.

Now let $f \in \mathcal{L}^p(M, \Sigma, \mu)$. Let $\{s_n\}_{n=1}^{\infty}$ be a sequence of simple functions so that $\lim_{n \to \infty} \|s_n - f^+\|_p = 0$ and let $\{t_n\}_{n=1}^{\infty}$ be a sequence of simple functions so that $\lim_{n \to \infty} \|t_n - f^-\|_p = 0$. Then $\{s_n - t_n\}_{n=1}^{\infty}$ is a sequence of simple functions and we conclude
$$0 \le \lim_{n \to \infty} \big\| [s_n - t_n] - [f] \big\|_p \le \lim_{n \to \infty} \left( \|s_n - f^+\|_p + \|t_n - f^-\|_p \right) = 0.$$
By Proposition 16.83, $S$ is dense in $L^p(M, \Sigma, \mu)$.

Although simple functions can be defined for arbitrary measure spaces, when additional structure is available it would be desirable to have dense subsets of functions with properties related to that structure. The next result shows that the continuous functions are "dense in $L^p[a, b]$." For some $L^p$ spaces, we will find an even nicer dense subspace in Theorem 18.12.
Theorem 16.86 Let $a < b$, let $C[a, b] := \{ [f] : f \in C^0[a, b] \}$ and let $1 \le p < \infty$. Then $C[a, b]$ is dense in $L^p[a, b]$.

Proof. We show that for every $\varepsilon > 0$ and $f \in \mathcal{L}^p(\mathbb{R})$ there is a continuous $g \in \mathcal{L}^p(\mathbb{R})$ so that $\|f - g\|_p < \varepsilon$. The result for $L^p[a, b]$ will follow because $\mathcal{L}^p[a, b]$ is embedded in $\mathcal{L}^p(\mathbb{R})$ by setting each function equal to zero outside $[a, b]$ and because the restriction of a continuous function on $\mathbb{R}$ to $[a, b]$ is continuous on $[a, b]$. First, let $(l, r)$ be an open interval on the real line and let $\varepsilon > 0$. Define

Then each $h_{(l, r), \varepsilon}$ is continuous on $\mathbb{R}$ and the following inequalities hold.
We now prove that for every measurable set A and every E > 0 there is a continuous function g A , E so that lI1A - g A , c l l p < E . For the idea, consider Figure 38. Let {Ij]y=l
u 00
be a sequence of open intervals so that A G
n
E
N be such that
[
j=n+l
j=l 1
I I j 1)
’
0 there is a continuous n
function gs,c so that llgs,e - slip
0. Then gs,e := J=1
a J g A JE , is continuous on R and n(IaJ~+l)
Now finally let f E CP(R) and let E > 0. By Theorem 16.85 there is a simple & & function s so that jl f - s I I p < -. 2 Moreover, g , ' 42 is continuous,
Hence, for every $[f] \in L^p(\mathbb{R})$ and every $\varepsilon > 0$ there is a $g \in C(\mathbb{R})$ so that $\big\| [f] - [g] \big\|_p < \varepsilon$, which proves that $C[a, b]$ is dense in $L^p[a, b]$.

It is worth noting that $S$ as well as $C[a, b]$ are actually linear subspaces of the normed spaces $L^p(M, \Sigma, \mu)$ and $L^p[a, b]$, respectively. In finite dimensional spaces, proper linear subspaces cannot be dense in the whole space. In infinite dimensional spaces there can be many dense linear subspaces comprised of "nice" elements. For integrable functions, it is standard practice to prove results for a dense subset of functions with nice properties and then use a limit argument to get the result for all functions. The proof of Theorem 18.37 is a prime example of this approach. Theorem 16.87 gives a first impression of how an equality on a dense subset translates to an equality on the whole space.
Theorem 16.87 Let $X, Y$ be metric spaces, let $D \subseteq X$ be dense and let the functions $f, g : X \to Y$ be continuous with $f|_D = g|_D$. Then $f = g$.

Proof. Let $x \in X \setminus D$ and let $\{d_n\}_{n=1}^{\infty}$ be a sequence in $D$ so that $\lim_{n \to \infty} d_n = x$. Then
$$f(x) = f\left( \lim_{n \to \infty} d_n \right) = \lim_{n \to \infty} f(d_n) = \lim_{n \to \infty} g(d_n) = g\left( \lim_{n \to \infty} d_n \right) = g(x).$$
Because $f(x) = g(x)$ for all $x \in D$ this proves $f = g$.
Because completeness is such a useful analytical property, we conclude this section by proving that every metric space can be viewed as a dense subspace of a complete metric space.
Definition 16.88 Let $X, Y$ be metric spaces. A function $f : X \to Y$ is called an isometry iff for all $x, x' \in X$ we have $d(f(x), f(x')) = d(x, x')$. If there is an isometry $f : X \to Y$, we will also say that $X$ can be isometrically embedded into $Y$.

Theorem 16.89 Every metric space $X$ can be isometrically embedded as a dense subspace into a complete metric space $C(X)$.

Proof. For this proof, let $d_X$ denote the metric on $X$. Define
$$\mathcal{C}(X) := \left\{ \{x^{(i)}\}_{i=1}^{\infty} : \{x^{(i)}\}_{i=1}^{\infty} \text{ is a Cauchy sequence in } X \right\}.$$
Let { x ( ~ ) ) : , [ y ( i ) ) m E C ( X ) . We will first show [ d x ( x ( ~ 4"i))]30 ), is a r=l i=l i=l Cauchy sequence. To do this let i , j E N. Assume without loss of generality that
Thus for any ( x ( ~ ) } : , { y ( i ) ] 3 0 r=1
i=l
E
C(X)the sequence { d x ( x l i ) , y ( i ) ) ) im= l is a
[ )N
Cauchy sequence, and hence it has a limit. For x ( i )
r=l
, [ y ( i ) ) m E C ( X ) , define i=l
We claim that dS is a semimetric on the set C(X). Clearly, for all x,y E c(X) we have dS(x,y ) 2 0,d S ( x , x ) = 0 and dS(x,y ) = d S ( y ,x). Now consider three elements [ x ( ~ )r =)1 : , { y ( i ) } z l [, Z ( ~ ) ] : ~ E C(X).Then
-
Let be the equivalence relation on C(X)as in Theorem 15.61. Let C ( X ) be the set and let d be the metric on C ( X ) obtained from ( C ( X ) ,d S )via Theorem 15.61. AS in Theorem 15.61 we will denote the elements of C ( X ) by [x],where x E C(X). We claim for every { x ( ~ ) } : E C ( X ) and every n E N there is an equivalent 1=1 1 i = l so that for all i, j E N we have dx ( Y ( ~ )y, ( j ) ) < -. n Let n E N.
[
]
{ ~ ( ~ ) } f f ir=l
There is an rn E
N so that for i, j
3 rn we have that dx ( x ( ~ )x ,( j ) )
0. Then x)
{
IiT1.
1
&
&
there is an N E N so that - < - and for all rn, n 2 N we have d([x,], [x,]) 4 -. 3 N 3 & Let n , rn >_ N be fixed. Because ,lim dx ( x:), x?)) = d([x,], [x,]) < -, there is a 1 3 0 0 3 & k E N so that d x (.$I, xi')) < i.Hence, for all m ,n 2 N we obtain
dx ( x j d ) , x i 1 ) )
5
dx (x:!',
+ dx (xi."),x i k ) )+ dx ( x i k ) ,xi'))
x$')
I s 1 r n 3 n Because rn, n 3 N were arbitrary, x is a Cauchy sequence.
0. With N E
XI(^)) < 5,and hence
obtain that for all n , i 2 N there is a k 2 N with dx ( x i k ) ,
(
dx xn( i ) > xi'))
1
I,:
x(~)
16-84. Extensions of continuous functions.
i=l
converges to [XI in C ( X ) .
(a) Let $X$ be a metric space, $Y$ a complete metric space, $D \subseteq X$ dense and let $f : D \to Y$ be uniformly continuous. Prove that $f$ can be extended to a unique continuous function $F : X \to Y$ so that $F|_D = f$.
(b) Prove that there are continuous functions $f : (0, 1] \to \mathbb{R}$ that do not have a continuous extension $F : [0, 1] \to \mathbb{R}$ as in Exercise 16-84a.
(c) Give an example that shows that the space $Y$ in Exercise 16-84a must be complete.
(d) Prove that if $X, Y$ are metric spaces, $D_X \subseteq X$ is dense in $X$, $D_Y \subseteq Y$ is dense in $Y$ and $f : D_X \to D_Y$ is an isometry, then there is an isometry between $X$ and $Y$.

16-85. Examples of dense subspaces.
(a) Let $(M, \Sigma, \mu)$ be a measure space. Prove that the space of "simple functions" $S := \{ [s] : s \in F(M, \mathbb{R}) \text{ is simple} \}$ is dense in $L^\infty(M, \Sigma, \mu)$.
(b) Prove that for $1 \le p < \infty$, $C^1 := \{ [f] : f \in C^1(a, b) \}$ is dense in $L^p(a, b)$.
Hint. You can defer most of the proof to corresponding parts of the proof of Theorem 16.86. Only the start needs to be modified.
16-86. Prove that $C[a, b]$ is not dense in $L^\infty[a, b]$.

16-87. Dense subspaces of $(C^0(X, \mathbb{R}), \|\cdot\|_\infty)$. Let $X$ be a compact metric space and let $C^0(X, \mathbb{R})$ be the space of continuous functions from $X$ to $\mathbb{R}$ with the uniform norm $\|\cdot\|_\infty$. A vector subspace $L$ of $C^0(X, \mathbb{R})$ is called a sublattice of $C^0(X, \mathbb{R})$ iff for all $f, g \in L$ we have that $\max\{f, g\} \in L$ and $\min\{f, g\} \in L$. A subset $S$ of $C^0(X, \mathbb{R})$ is called point-separating iff for all distinct $x, y \in X$ there is a $g \in S$ with $g(x) \ne g(y)$. A vector subspace $A$ of $C^0(X, \mathbb{R})$ is called a subalgebra of $C^0(X, \mathbb{R})$ iff for all $f, g \in A$ we have that $fg \in A$.
(a) Prove that if $L$ is a point-separating sublattice of $C^0(X, \mathbb{R})$ that contains the constant functions, then for all $\varepsilon > 0$, all functions $f \in C^0(X, \mathbb{R})$ and all $x \in X$ there is a $g \in L$ so that $g(x) = f(x)$ and for all $y \in X$ we have $g(y) \le f(y) + \varepsilon$.
Hint. Set $h_x(z) := f(x)$ and for $y \in X \setminus \{x\}$ find $g \in L$ with $g(y) \ne g(x)$ and set $h_y(z) := f(x) + (f(y) - f(x)) \frac{g(z) - g(x)}{g(y) - g(x)}$. Cover $X$ with open sets $I_y$ on which the inequality $h_y(z) \le f(z) + \varepsilon$ holds for all $z \in I_y$. Then use a finite open subcover and minima.
(b) Prove that if $L$ is a point-separating sublattice of $C^0(X, \mathbb{R})$ that contains the constant functions, then $L$ is dense in $C^0(X, \mathbb{R})$.
Hint. Let $f \in C^0(X, \mathbb{R})$. For each $x \in X$ find an open subset $I_x \ni x$ and a function $g_x \in L$ so that $g_x(y) \le f(y) + \varepsilon$ for all $y \in X$ and so that $|g_x(z) - f(z)| < \varepsilon$ for all $z \in I_x$. Then use a finite open subcover and maxima.
(c) Prove that if $L$ is a subspace of $C^0(X, \mathbb{R})$ so that for all $f \in L$ we have that $|f| \in L$, then $L$ is a sublattice of $C^0(X, \mathbb{R})$.
(d) Let $A$ be a subalgebra of $C^0(X, \mathbb{R})$ and let $P : \mathbb{R} \to \mathbb{R}$ be a polynomial. Prove that for all $f \in A$ the function $P \circ f$ is also in $A$.
(e) Let $A$ be a subalgebra of the space $C^0(X, \mathbb{R})$. Prove that the subspace $\overline{A}$ of $C^0(X, \mathbb{R})$ is a sublattice.
Hint. First prove that $\overline{A}$ is a subalgebra. Then prove that for each $f \in \overline{A}$ the function $|f| = \sqrt{f^2}$ is also in $\overline{A}$. To prove that the square root can be taken, use Exercise 16-68c.
(f) Stone-Weierstrass Theorem. Let $X$ be compact. Prove that if $A$ is a point-separating subalgebra of $C^0(X, \mathbb{R})$ that contains the constant functions, then $A$ is dense in $C^0(X, \mathbb{R})$.
(g) Prove that the subspace of all polynomials on $[a, b]$ is dense in $C^0[a, b]$.
(h) Let $\varepsilon > 0$. Prove that the subspace of all trigonometric polynomials on $[-\pi, \pi - \varepsilon]$ is dense in $C^0[-\pi, \pi - \varepsilon]$.
Hint. Exercise 12-15b.
(i) Why are the trigonometric polynomials not dense in $C^0[-\pi, \pi]$?
(j) Let $C := [-\pi, \pi)$ with the metric $d(x, y) := \big\| (\cos(x), \sin(x)) - (\cos(y), \sin(y)) \big\|_2$.
i. Prove that $C$ is isometrically isomorphic to $\{ z \in \mathbb{R}^2 : \|z\|_2 = 1 \}$.
ii. Prove that $C$ is compact.
iii. Prove that the trigonometric polynomials are dense in $(C^0(C, \mathbb{R}), \|\cdot\|_\infty)$.

16-88. Density of polynomials does not imply that all Taylor series converge.
(a) Let $X \subseteq Y \subseteq Z$ be metric spaces so that $X$ is dense in $Z$. Prove that $X$ is dense in $Y$.
(b) Prove that the set of polynomials is dense in $(C^\infty[a, b], \|\cdot\|_\infty)$. Derivatives on the boundary are understood to be one-sided. You may use Exercise 16-87g.
(c) Explain why part 16-88b does not imply that every function in the space $(C^\infty[-1, 1], \|\cdot\|_\infty)$ is the limit of its Taylor series about zero. (Lemma 18.8 will ultimately provide an example of a function whose Taylor series does not converge to the function.)
16-89. Use Egoroff's Theorem (see Exercise 14-37c) and Theorem 16.86 to prove that for every function $f \in \mathcal{L}^p[a, b]$ and every $\varepsilon > 0$ there is a $B \subseteq [a, b]$ with $\lambda(B) > (b - a) - \varepsilon$ and a continuous function $g : B \to \mathbb{R}$ so that $g|_B = f|_B$.

16-90. A metric space with a countable dense subset is called separable.
(a) Prove that every open subset in a separable metric space is a countable union of open balls.
(b) Prove that a compact metric space is separable.
Hint. Use Lemma 16.70 for $\varepsilon_k := \frac{1}{k}$ and all $k \in \mathbb{N}$.
(c) Let $O \subseteq \mathbb{R}^d$ be an open set. Prove that $O$ is separable.
Hint. $\mathbb{Q}^d$ is countable.
(d) Give an example of a complete, separable metric space that is not compact.
(e) Prove that if $X, Y$ are metric spaces, $X$ is separable and $f : X \to Y$ is continuous and surjective, then $Y$ is separable.

16-91. Let $X$ be a metric space and let $D \subseteq X$. Prove that $D$ is dense in $X$ iff $\overline{D} = X$.

16-92. Let $X$ be a metric space. List all closed dense subspaces of $X$.

16-93. Constructing the real numbers from the rational numbers. Prove that the metric space obtained when applying the construction in the proof of Theorem 16.89 to the rational numbers is an ordered field.
Hint. Define the operations $+$ and $\cdot$ for sequences termwise and prove that they induce well-defined operations on the equivalence classes. Then prove that these operations satisfy the field axioms. Define $\mathbb{R}^+$ as the set of equivalence classes of sequences for which there is a positive rational number $r$ so that the terms are eventually greater than $r$. Then prove that Axiom 1.6 is satisfied. The above, together with Theorem 16.89, shows that the space is a complete, ordered field. The Completeness Axiom as stated in this text can be proved using convergence of Cauchy sequences as in Exercise 2-25.

16-94. Prove that the completion of a normed space is a Banach space.

16-95. Prove that the completion of an inner product space is a Hilbert space.
16.8 Connectedness Intuitively, a metric space should be connected iff it is possible to get from any place to any other place by going along an unbroken path. Unfortunately, this would define connectedness in terms of itself, because the only interpretation of “unbroken” is “connected.” Using continuous paths to connect points also leads to some problems, which will be explained in Example 16.96. Therefore, we define connectedness by what it means to be disconnected. Sensibly, being disconnected should mean that the space can be split into two nonempty pieces that are separate from each other. Two disjoint open sets can be considered to be separate entities, because each element in each set has nonzero distance to the respective other set.
Definition 16.90 A metric space $X$ is called disconnected iff there are two disjoint nonempty open sets $U, V \subseteq X$ such that $U \cup V = X$. A metric space $X$ is called connected iff no such disjoint nonempty open sets exist. Note that if $X$ is disconnected, then the sets $U$ and $V$ are closed as well as open (Exercise 16-96). Such sets are sometimes called clopen.
Figure 39: A disconnected subset $D$ of a metric space can be separated by two open sets $U$ and $V$ into at least two nonempty components ($D_1$ and $D_2 \cup D_3$ in this figure).
As with any topological notion, subsets of metric spaces are disconnected iff they are disconnected as metric spaces. Figure 39 gives the idea and Exercise 16-97 investigates the properties of sets that disconnect a subset of a metric space. Most importantly, subspaces can only be disconnected by open subsets of the surrounding space. The most natural example of a connected set is an interval. It turns out that intervals are the only connected subsets of the real numbers.
Theorem 16.91 A subset of the real line is connected iff it is an interval.
Proof. For "$\Rightarrow$," let $S$ be a connected subset of $\mathbb{R}$ and suppose for a contradiction that it is not an interval. Then there are $l, r \in S$ with $l < r$ such that there is an $m \in \mathbb{R} \setminus S$ with $l < m < r$. But then $U := S \cap (-\infty, m)$ and $V := S \cap (m, \infty)$ are both open in $S$, nonempty and disjoint, and $U \cup V = S$. This implies that $S$ is disconnected, contradiction.
For "$\Leftarrow$," let $I$ be an interval. Suppose for a contradiction that $I$ is not connected and let $U, V \subseteq I$ be disjoint nonempty open subsets of $I$ so that $U \cup V = I$. Let $c \in I$. Then $c \in U$ or $c \in V$. Without loss of generality assume that $c \in U$. Then there is an element $v \in V$ so that $v > c$ or $v < c$. Without loss of generality assume $v > c$. Consider the set $W := \{x \in V : x > c\}$, which is nonempty and bounded below by $c$. Let $b := \inf W$. Then because $c \notin V$ and $U$ is open we infer $c < b$. We claim that $b \notin W$. Indeed, otherwise $b \in V$ and then there is an $\varepsilon > 0$ so that $I \cap (b - \varepsilon, b + \varepsilon) \subseteq V$. Because $c < b$ and $c \notin V$ we obtain $c \le b - \varepsilon$. Because $I$ is an interval, we infer $(b - \varepsilon, b] \subseteq I$, and then $(b - \varepsilon, b] \subseteq W$, contradicting the choice of $b$. Thus $b \notin W$. Because $v > b > c$ and $I$ is an interval we obtain $b \in I$, and hence $b \in U$. But then there is an $\varepsilon > 0$ so that $I \cap (b - \varepsilon, b + \varepsilon) \subseteq U$. In particular, this means that $\inf(W) \ge b + \varepsilon$, contradicting the choice of $b$. ■
Continuous functions satisfy an abstract version of the Intermediate Value Theorem, with connected spaces taking the place of the intervals on the real line.
Theorem 16.92 Let $X, Y$ be metric spaces and let $f : X \to Y$ be continuous. If $X$ is connected, then the image $f[X]$ is a connected subspace of $Y$.
Proof. Suppose for a contradiction that $f[X]$ is not connected. Then there are disjoint nonempty sets $U, V \subseteq f[X]$ that are open in $f[X]$ and satisfy $U \cup V = f[X]$. But then $f^{-1}[U]$ and $f^{-1}[V]$ are disjoint nonempty open subsets of $X$ so that the equality $f^{-1}[U] \cup f^{-1}[V] = X$ holds, contradicting the assumption that $X$ is connected. Thus $f[X]$ must be connected. ■
The Intermediate Value Theorem is more readily recognized in the following result.
Corollary 16.93 Intermediate Value Theorem. Let $X$ be a connected metric space and let $f : X \to \mathbb{R}$ be continuous. Then for all $a, b \in X$ with $f(a) \ne f(b)$ and all $t$ between $f(a)$ and $f(b)$ there is a $c \in X$ with $f(c) = t$.
Proof. Exercise 16-98. By using images of intervals in Theorem 16.92 we obtain the idea of connectedness that was mentioned in the introduction.
Definition 16.94 A metric space $X$ is called pathwise connected iff for any two points $x, y \in X$ there is a continuous function $f : [0,1] \to X$ such that $f(0) = x$ and $f(1) = y$.
Some examples of pathwise connected sets are given in Exercise 16-99. The next result shows that these are also examples of connected sets.
Theorem 16.95 Let $X$ be a pathwise connected metric space. Then $X$ is connected.
Proof. Suppose for a contradiction that $X$ is not connected. Then there are two nonempty disjoint open subsets $U$ and $V$ of $X$ so that $U \cup V = X$. Let $u \in U$ and $v \in V$ and let $f : [0,1] \to X$ be continuous with $f(0) = u$ and $f(1) = v$. Then $f^{-1}[U]$ and $f^{-1}[V]$ are disjoint nonempty open subsets of $[0,1]$ whose union is $[0,1]$, which is impossible because $[0,1]$ is connected. ■
Exercise 16-100 will show that connectedness and pathwise connectedness are equivalent for open subsets of normed spaces. However, in general pathwise connectedness is strictly stronger than connectedness.
Example 16.96 The space $X := \{(x, 0) : x \le 0\} \cup \{(x, \sin(\frac{1}{x})) : x > 0\}$ is connected, but it is not pathwise connected.
The space $X$ is not pathwise connected, because there is no continuous function $f : [0,1] \to X$ such that $f(0) = (\frac{1}{\pi}, 0)$ and $f(1) = (0,0)$. On the other hand, both $\{(x, 0) : x \le 0\}$ and $\{(x, \sin(\frac{1}{x})) : x > 0\}$ are pathwise connected, and hence connected. Thus, if $X$ was disconnected, there would be disjoint open sets $U, V \subseteq \mathbb{R}^2$ so that $\{(x, 0) : x \le 0\} \subseteq U$ and $\{(x, \sin(\frac{1}{x})) : x > 0\} \subseteq V$. But $(0,0) \in U$ implies that $U$ intersects $\{(x, \sin(\frac{1}{x})) : x > 0\}$, so this is not possible. □
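The obstruction to a path in Example 16.96 can be made concrete numerically. The following Python sketch is only an illustration, not part of the text's argument; the function name `peaks` is hypothetical. It lists points of the curve at which $\sin(1/x) = 1$: their first coordinates approach zero while their heights stay at one, so the curve does not settle down near $(0,0)$.

```python
import math

def peaks(count):
    """Return the first `count` curve points (x, sin(1/x)) with 1/x = pi/2 + 2*pi*k."""
    points = []
    for k in range(count):
        x = 1.0 / (math.pi / 2 + 2 * math.pi * k)
        points.append((x, math.sin(1.0 / x)))
    return points

# The x-coordinates shrink toward 0, but the heights stay (numerically) at 1.
for x, y in peaks(5):
    print(f"x = {x:.5f}, sin(1/x) = {y:.5f}")
```

A continuous path ending at $(0,0)$ would have to pass arbitrarily close to the origin in the first coordinate while its second coordinate tends to zero, which these points rule out along the curve.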
Exercises
16-96. Prove that a metric space $X$ is disconnected iff there are disjoint nonempty closed sets $C, D \subseteq X$ so that $C \cup D = X$.
16-97. Disconnected subspaces. Let $X$ be a metric space and let $S \subseteq X$.
(a) Prove that $S$ is disconnected iff there are disjoint nonempty subsets $U, V \subseteq X$ that are open in $X$ such that $S \subseteq U \cup V$.
(b) Give an example that shows that a subspace $S$ can be disconnected and there are no disjoint nonempty subsets $C, D \subseteq X$ that are closed in $X$ such that $S \subseteq C \cup D$.
16-98. Prove Corollary 16.93.
16-99. Let $X$ be a normed space.
(a) Prove that for any two points $a, b \in X$ the function $f(t) := a + t(b - a)$ is continuous.
(b) Prove that $X$ is pathwise connected.
(c) Prove that for all $x \in X$ and all $\varepsilon > 0$ the ball $B_\varepsilon(x)$ is pathwise connected.
16-100. Let $X$ be a normed space and let $\Omega \subseteq X$ be open. Prove that $\Omega$ is connected iff it is pathwise connected.
16-101. Let $X, Y$ be metric spaces and let $f : X \to Y$ be continuous. Prove that if $X$ is pathwise connected, then $f[X]$ is a pathwise connected subspace of $Y$.
16-102. Unit "circles."
(a) Prove that $\{(x,y) \in \mathbb{R}^2 : x^2 + y^2 = 1\}$ is connected. Hint. $c(t) = \cos(t) e_1 + \sin(t) e_2$.
(b) Let $1 \le p < \infty$. Prove that $\{(x,y) \in \mathbb{R}^2 : \|(x,y)\|_p = 1\}$ is connected.
(c) Prove that $\{(x,y) \in \mathbb{R}^2 : \|(x,y)\|_\infty = 1\}$ is connected.
16-103. Connected components. Let $X$ be a metric space. A subset $C \subseteq X$ is called a component of $X$ iff $C$ is connected and there is no proper superset $D \supset C$ that is connected.
(a) Prove that if $A, B \subseteq X$ are connected and $A \cap B \ne \emptyset$, then $A \cup B$ is connected.
(b) Prove that if $C_1, C_2$ are components of $X$ then either $C_1 = C_2$ or $C_1 \cap C_2 = \emptyset$.
(c) Prove that every $x \in X$ is contained in a component of $X$.
(d) Prove that every open subset of $\mathbb{R}$ is a countable union of pairwise disjoint open intervals.
16.9 Locally Compact Spaces
The goal of this section is to construct (families of) continuous functions that are equal to one on a specified set and equal to zero on another set. To construct such functions, we introduce the distance function.
Definition 16.97 Let $X$ be a metric space and let $A \subseteq X$ be nonempty. For all $x \in X$, we define $\operatorname{dist}(x, A) := \inf\{d(x, a) : a \in A\}$ and call it the distance from $x$ to $A$.
Lemma 16.98 Let $X$ be a metric space and let $A \subseteq X$ be nonempty. Then the function $\operatorname{dist}(\cdot, A)$ is Lipschitz continuous.
Proof. Let $x, y \in X$, let $\varepsilon > 0$ and let $a \in A$ be so that $d(y, a) \le \operatorname{dist}(y, A) + \varepsilon$. Then $\operatorname{dist}(x, A) \le d(x, a) \le d(x, y) + d(y, a) \le d(x, y) + \operatorname{dist}(y, A) + \varepsilon$, and hence $\operatorname{dist}(x, A) - \operatorname{dist}(y, A) \le d(x, y) + \varepsilon$. Because $\varepsilon > 0$ was arbitrary this means that $\operatorname{dist}(x, A) - \operatorname{dist}(y, A) \le d(x, y)$. We can prove $\operatorname{dist}(y, A) - \operatorname{dist}(x, A) \le d(x, y)$ in similar fashion. Hence, $|\operatorname{dist}(x, A) - \operatorname{dist}(y, A)| \le d(x, y)$ and $\operatorname{dist}(\cdot, A)$ is Lipschitz continuous with Lipschitz constant 1. ■
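The Lipschitz estimate of Lemma 16.98 can be checked numerically. The sketch below is an illustration under stated assumptions: the metric space is the Euclidean plane, $A$ is a randomly chosen finite set, and the names `dist`, `A` and `worst` are hypothetical. It samples pairs of points and records the largest value of $|\operatorname{dist}(x,A) - \operatorname{dist}(y,A)| - d(x,y)$, which the lemma predicts to be at most zero.

```python
import math
import random

def dist(x, A):
    """Distance from the point x to the finite nonempty set A in the plane."""
    return min(math.dist(x, a) for a in A)

random.seed(0)
A = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(20)]

worst = float("-inf")
for _ in range(1000):
    x = (random.uniform(-3, 3), random.uniform(-3, 3))
    y = (random.uniform(-3, 3), random.uniform(-3, 3))
    # Lemma 16.98 predicts that this gap is at most 0 (up to rounding error).
    gap = abs(dist(x, A) - dist(y, A)) - math.dist(x, y)
    worst = max(worst, gap)

print(worst <= 1e-12)
```

For a finite set the infimum is a minimum, which is why `min` suffices here; for infinite sets the infimum in Definition 16.97 need not be attained.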
Lemma 16.99 below says that for any closed set C that is contained in an open set
U the distance function allows us to slip another open set V between C and U so that the closure of V is also between C and U . For an illustration of Lemma 16.99, see Figure 40 on page 336.
Lemma 16.99 Let $X$ be a metric space, let $C \subseteq X$ be closed and let $U \subseteq X$ be open so that $C \subseteq U$. Then there is a continuous function $f : X \to [0,1]$ so that $f|_C = 1$ and $f|_{X \setminus U} = 0$. Moreover, there is an open set $V$ so that $C \subseteq V \subseteq \overline{V} \subseteq U$.
Proof. For each $x \in X$, let $f(x) := \frac{\operatorname{dist}(x, X \setminus U)}{\operatorname{dist}(x, C) + \operatorname{dist}(x, X \setminus U)}$. Because $C$ and $X \setminus U$ are disjoint closed sets, the denominator is greater than zero for all $x \in X$ (see Exercise 16-104b). Thus $f$ is continuous on $X$. Moreover, (see Exercise 16-105) $f|_C = 1$ and $f|_{X \setminus U} = 0$.
To prove the claim about the sets, let $V := f^{-1}[(\frac{1}{2}, 1]]$. Because $(\frac{1}{2}, 1]$ is open in $[0,1]$ and $f$ is continuous, $V$ is an open set in $X$ and because $f|_C = 1$ it contains $C$. Moreover, because $f$ is continuous, for all $x \in \overline{V}$ we infer $f(x) \ge \frac{1}{2}$, which means that $\overline{V} \subseteq U$. ■
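The function from the proof of Lemma 16.99 can be written down explicitly in a simple case. The sketch below is an illustration under assumptions not made in the text: the space is the real line, $C = [-1,1]$ and $U = (-2,2)$, and the helper names are hypothetical. In this case $V = f^{-1}[(\frac{1}{2}, 1]]$ works out to $(-1.5, 1.5)$, whose closure lies in $U$.

```python
def dist_to_interval(x, lo, hi):
    """Distance from x to the closed interval [lo, hi]."""
    return max(lo - x, 0.0, x - hi)

def dist_to_complement(x, lo, hi):
    """Distance from x to the complement of the open interval (lo, hi)."""
    return max(0.0, min(x - lo, hi - x))

def f(x):
    """The quotient from the proof, for C = [-1, 1] and U = (-2, 2)."""
    outer = dist_to_complement(x, -2.0, 2.0)  # dist(x, X minus U)
    inner = dist_to_interval(x, -1.0, 1.0)    # dist(x, C)
    # The denominator is positive: inner == 0 forces x in C, where outer >= 1.
    return outer / (inner + outer)

for x in (-3.0, -1.5, 0.0, 1.0, 1.5, 2.0):
    print(x, f(x))
```

The values equal one on $C$, zero outside $U$, and cross $\frac{1}{2}$ exactly at $\pm 1.5$, which matches the description of $V$ above.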
For compact sets $C$, we would like to separate $C$ from its neighborhood $U$ with an open set $V$ whose closure is compact. While this is not possible in general (see Exercise 16-108), it is possible in spaces with sufficiently many compact subsets. Local compactness guarantees that locally there are enough compact subsets by demanding that every point has a compact neighborhood.
Definition 16.100 A metric space $X$ is called locally compact iff every $x \in X$ has a compact neighborhood.
Proposition 16.101 A metric space $X$ is locally compact iff for every $x \in X$ there is an $\varepsilon > 0$ so that $\overline{B_\varepsilon(x)}$ is compact.
Proof. For "$\Rightarrow$," let $X$ be locally compact and let $x \in X$. Then $x$ has a compact neighborhood $N$. Let $\varepsilon > 0$ be so that $B_\varepsilon(x) \subseteq N$. Then by Proposition 16.61 the set $\overline{B_\varepsilon(x)}$ is compact. Conversely, let $X$ be so that for every $x \in X$ there is an $\varepsilon > 0$ so that $\overline{B_\varepsilon(x)}$ is compact. Then for every $x \in X$ the neighborhood $N := \overline{B_\varepsilon(x)}$ is compact. ■
It is easy to infer from Proposition 16.101 that all open subsets of $\mathbb{R}^d$ and all closed subsets of $\mathbb{R}^d$ are locally compact. In particular, we obtain that surfaces like the unit sphere $\{x \in \mathbb{R}^d : \|x\|_2 = 1\}$ are locally compact. More generally, we can say the following.
Definition 16.102 Let $X, Y$ be metric spaces. Then $f : X \to Y$ is called a homeomorphism iff $f$ is continuous and bijective and its inverse is continuous, too.
Example 16.103 Any metric space for which each point has a neighborhood that is homeomorphic to an open set in $\mathbb{R}^d$ is locally compact. If the dimension $d$ does not depend on the point, we call the space a manifold. Manifolds are discussed in detail in Chapter 19. Surfaces such as the unit sphere are manifolds. For much of the following, solids and surfaces in $\mathbb{R}^d$ are a good visualization and motivation. □
In locally compact spaces, between any compact set $C$ and any open neighborhood $U$ of $C$ we can slip a compact neighborhood of $C$ (also see Figure 40(a)).
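Lemma 16.104 Let $X$ be a locally compact metric space, let $C \subseteq X$ be compact and let $U \subseteq X$ be open with $C \subseteq U$. Then $C$ has a neighborhood $V$ so that $\overline{V}$ is compact and contained in $U$.
Proof. For each $c \in C$, there is an $\varepsilon_c > 0$ so that $\overline{B_{\varepsilon_c}(c)}$ is compact and contained in $U$. Because $C \subseteq \bigcup_{c \in C} B_{\varepsilon_c}(c)$ and $C$ is compact, there are $c_1, \ldots, c_n \in C$ so that $C \subseteq \bigcup_{j=1}^{n} B_{\varepsilon_{c_j}}(c_j)$. But then $V := \bigcup_{j=1}^{n} B_{\varepsilon_{c_j}}(c_j)$ is a neighborhood of $C$ so that by Exercises 16-44b and 16-110 the closure $\overline{V} = \overline{\bigcup_{j=1}^{n} B_{\varepsilon_{c_j}}(c_j)} = \bigcup_{j=1}^{n} \overline{B_{\varepsilon_{c_j}}(c_j)}$ is compact. Moreover, because all $\overline{B_{\varepsilon_{c_j}}(c_j)}$ are contained in $U$, the union is contained in $U$, too. ■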
Standard Proof Technique 16.105 The argument in the proof of Lemma 16.104 is a typical application of the Heine-Borel formulation of compactness. We start with an open cover and because the notion we want to preserve is only preserved by finite unions, we use a finite subcover. □
Local compactness only applies near individual points. As it turns out, for connected spaces we can turn this local idea into a property that allows us to use compactness in a more global fashion.
Definition 16.106 A metric space $X$ is called $\sigma$-compact iff $X$ is the union of countably many compact sets.
Clearly, closed subsets of $\mathbb{R}^d$ are $\sigma$-compact, because their intersections with the closed balls $\overline{B_n(0)}$ are compact. More specifically we can say the following.
Theorem 16.107 Let $X$ be a connected, locally compact metric space. Then $X$ is $\sigma$-compact.
Figure 40: Part (a) illustrates Lemmas 16.99 and 16.104, which say that between any closed (compact) set $C$ and any open set $U$ surrounding $C$ there is an open set $V$ so that $V$ and $\overline{V}$ are "between" $C$ and $U$. Part (b) shows a $\sigma$-compact space and the idea for the proof of Theorem 16.112. The concentric circles depict a compact exhaustion (the sets $K_n$ in the proof of Theorem 16.112). The shells $S_n$ and their neighborhoods $U_n$ are set up so that only finitely many of the $U_n$ can intersect.
Proof. For each $x \in X$, let $r_x := \sup\{r > 0 : \overline{B_r(x)} \text{ is compact}\}$. Because $X$ is locally compact, each $r_x$ is greater than zero. If $x \in X$ is so that $r_x$ is infinity, then for all $n \in \mathbb{N}$ the set $\overline{B_n(x)}$ is compact, and hence $X = \bigcup_{n=1}^{\infty} \overline{B_n(x)}$ is $\sigma$-compact. This leaves the case in which each $r_x$ is finite. In this case, we first prove that the function $x \mapsto r_x$ is continuous. Let $x, z \in X$ be so that $d(x, z) < r_x$. Then for all $r \in (d(x,z), r_x)$ we infer $\overline{B_{r - d(x,z)}(z)} \subseteq \overline{B_r(x)}$ and the ball on the right is compact. Hence, $r_z \ge r_x - d(x,z)$, that is, $r_x - r_z \le d(x,z)$. Reversing the roles of $x$ and $z$ we can also prove $r_z - r_x \le d(x,z)$, which means $|r_x - r_z| \le d(x,z)$ for all $z$ with $d(x,z) < r_x$. Hence, $x \mapsto r_x$ is continuous at each $x \in X$.
Now for each compact subset $C$ of $X$ we define $N(C) := \bigcup_{c \in C} \overline{B_{r_c/2}(c)}$. We claim that the set $N(C)$ is compact. Let $\{x_n\}_{n=1}^{\infty}$ be a sequence in $N(C)$. For each $x_n$ there is a $c_n \in C$ so that $x_n \in \overline{B_{r_{c_n}/2}(c_n)}$. Because $C$ is compact, $\{c_n\}_{n=1}^{\infty}$ has a convergent subsequence with limit $c \in C$. Without loss of generality we can assume that $\{c_n\}_{n=1}^{\infty}$ itself converges to $c$. Then there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequalities $d(c_n, c) < \frac{r_c}{4}$ and $|r_{c_n} - r_c| < \frac{r_c}{4}$ hold. But then for all $n \ge N$ we have $\frac{r_{c_n}}{2} < \frac{5 r_c}{8}$, and hence $d(x_n, c) \le d(x_n, c_n) + d(c_n, c) < \frac{5 r_c}{8} + \frac{r_c}{4} = \frac{7 r_c}{8}$, so $x_n \in \overline{B_{7 r_c/8}(c)}$. Because $\overline{B_{7 r_c/8}(c)}$ is compact, $\{x_n\}_{n=1}^{\infty}$ has a convergent subsequence. This proves that $N(C)$ is compact.
Now let $x \in X$ be arbitrary and let $C_1 := \{x\}$. Recursively define $C_{n+1} := N(C_n)$ for $n \in \mathbb{N}$. Let $H := \bigcup_{n=1}^{\infty} C_n$. Then because each $C_n$ is compact, $H$ is $\sigma$-compact. Moreover, because each $C_{n+1}$ is a neighborhood of every element of $C_n$, $H$ is open. To see that $H$ is closed, let $x \in \overline{H}$. Then there is an $h \in H$ so that $d(h, x) < \frac{r_x}{4}$ and $|r_h - r_x| < \frac{r_x}{4}$. But then $\frac{r_h}{2} > \frac{3 r_x}{8} > \frac{r_x}{4}$, which means that if $n \in \mathbb{N}$ is so that $h \in C_n$, then $x \in N(C_n)$. Hence, $H = \overline{H}$ is closed and so $X \setminus H$ is open. Because $X$ is connected, this means that $X = H$ and since $H$ is $\sigma$-compact the result is proved. ■
We conclude this section by proving that locally compact spaces have a partition of unity (see Definition 16.110 below) with certain properties.
Definition 16.108 The cover $\mathcal{O}$ of the metric space $X$ is called locally finite iff each $p \in X$ has a neighborhood that intersects only finitely many elements of $\mathcal{O}$.
Definition 16.109 Let $X$ be a metric space and let $f : X \to \mathbb{R}$. Then the support of $f$ is defined to be $\operatorname{supp}(f) := \overline{\{x \in X : f(x) \ne 0\}}$.
Definition 16.110 A family $\{\varphi_j\}_{j \in J}$ of continuous functions on a metric space $X$ is called a partition of unity iff
1. The collection $\{\{x \in X : \varphi_j(x) \ne 0\}\}_{j \in J}$ is a locally finite cover of $X$.
2. For all $x \in X$ we have that $\sum_{j \in J} \varphi_j(x) = 1$. (By part 1 for each $x \in X$ this sum has only finitely many nonzero terms.)
If $\mathcal{O}$ is an open cover of $X$ and for each $j \in J$ the containment $\operatorname{supp}(\varphi_j) \subseteq U$ holds for some $U \in \mathcal{O}$, then the partition of unity is called subordinate to $\mathcal{O}$.
The importance of partitions of unity will become clear in Section 19.5. Until then, consider the following. Many surfaces in $\mathbb{R}^d$ cannot be parametrized with just one function that has an open domain. (Open domains are needed, because differentiable functions typically have open domains, see Chapter 17.) For example, a parametrization of the unit sphere, say, with spherical coordinates, will always either hit a few points twice or it will miss at least a "seam." This is because a parametrization must be a homeomorphism and the unit sphere is compact, which means it cannot be homeomorphic to an open subset of $\mathbb{R}^2$. However, roughly speaking, a function is integrated over the sphere by integrating its composition with the right parametrization. Thus, it is problematic to double count points or miss points. Either case would distort the integral. This problem does not arise for functions that are zero except on some small open set. For such functions, we could simply use a parametrization for which the missed seam does not intersect the support of the function. A partition of unity $\{\varphi_j\}_{j \in J}$ allows us to represent arbitrary functions $f$ as sums $\sum_{j \in J} \varphi_j f$ of functions $\varphi_j f$ whose supports are contained in "small" open sets. We can then integrate these functions separately and the overall sum will be the integral of $f$. There is still a tremendous amount of detail left to be considered (think about independence of the parametrization), and this is why we will later need partitions of unity subordinate to open covers that have further nice properties. Because not every cover is locally finite, we need the notion of a refinement.
Definition 16.111 Let $\mathcal{O}$ be a cover of the metric space $X$. A cover $\tilde{\mathcal{O}}$ is called a refinement of $\mathcal{O}$ iff for all $U \in \tilde{\mathcal{O}}$ there is a $V \in \mathcal{O}$ so that $U \subseteq V$.
Theorem 16.112 Let $\mathcal{O}$ be an open cover of the locally compact, $\sigma$-compact metric space $X$. Then $\mathcal{O}$ has a countable locally finite open refinement. Moreover, the closures of the sets in the refinement are compact.
Proof. There is nothing to prove if $X$ is compact. If $X$ is not compact, let $\{C_j\}_{j=1}^{\infty}$ be a sequence of compact sets with $X = \bigcup_{j=1}^{\infty} C_j$. Let $K_1 := C_1$ and once $K_1, \ldots, K_n$ are defined, let $K_{n+1}$ be a compact neighborhood of $K_n \cup C_n$ so that $K_n \cup C_n \subseteq K_{n+1}^{\circ}$. Then $X = \bigcup_{n=1}^{\infty} K_n$ and for all $n \in \mathbb{N}$ we have the containment $K_n \subseteq K_{n+1}^{\circ}$.
Let $K_{-1} := K_0 := \emptyset$. For all $n \in \mathbb{N}$ let $S_n := K_n \setminus K_{n-1}^{\circ}$ and $U_n := K_{n+1}^{\circ} \setminus K_{n-2}$. Then $X = \bigcup_{j=1}^{\infty} S_j$ and for all $n \in \mathbb{N}$ the set $S_n$ is compact, $U_n$ is open and $S_n \subseteq U_n$. For $|n - m| > 1$, we have $S_n \cap S_m = \emptyset$ and for all $|n - m| > 2$ we have $U_n \cap U_m = \emptyset$ (also see the right part of Figure 40).
For each $x \in S_n$, let $N_x$ be an open neighborhood of $x$ that is contained in $U_n$ and in some $O \in \mathcal{O}$. Then $S_n \subseteq \bigcup_{x \in S_n} N_x$ and because $S_n$ is compact, there are $x_1^{(n)}, \ldots, x_{k_n}^{(n)} \in S_n$ so that $S_n \subseteq \bigcup_{j=1}^{k_n} N_{x_j^{(n)}}$. We define $\tilde{\mathcal{O}} := \{N_{x_j^{(n)}} : n \in \mathbb{N}, j = 1, \ldots, k_n\}$. Clearly, $\tilde{\mathcal{O}}$ is countable. Because $X = \bigcup_{j=1}^{\infty} S_j$ and $S_n \subseteq \bigcup_{j=1}^{k_n} N_{x_j^{(n)}}$ for all $n \in \mathbb{N}$ we conclude that $\tilde{\mathcal{O}}$ is a cover of $X$. All $N_{x_j^{(n)}}$ are open and contained in an $O \in \mathcal{O}$, so $\tilde{\mathcal{O}}$ is an open refinement of $\mathcal{O}$. Finally, to see that $\tilde{\mathcal{O}}$ is locally finite, let $x \in X$. Then there is a $k \in \mathbb{N}$ so that $x \notin U_m$ unless $m \in \{k, k+1, k+2\}$. Any $N_{x_j^{(n)}}$ that intersects $U_k \cup U_{k+1} \cup U_{k+2}$ must be contained in one of $U_{k-2}, \ldots, U_{k+4}$. These sets contain finitely many $N_{x_j^{(n)}}$ each. Therefore $x$ can be in at most finitely many $N_{x_j^{(n)}}$, and hence $\tilde{\mathcal{O}}$ is locally finite. Finally, because each $N_x$ is contained in a $U_n \subseteq K_{n+1}$, the closure of each $N_x$ is compact. ■
As it turns out, any locally finite open cover of a $\sigma$-compact space is countable.
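The shell construction in the proof of Theorem 16.112 can be illustrated in the simplest case. The sketch below assumes (this choice is not made in the text) that $X$ is the real line with the compact exhaustion $K_n = [-n, n]$ and $K_0 = K_{-1} = \emptyset$, so that $S_n = K_n \setminus K_{n-1}^{\circ}$ and $U_n = K_{n+1}^{\circ} \setminus K_{n-2}$ become conditions on $|x|$. The function names are hypothetical. It checks that every sampled point lies in at most three of the sets $U_n$.

```python
def in_S(x, n):
    """x lies in S_n = K_n minus the interior of K_{n-1}, i.e. n - 1 <= |x| <= n."""
    r = abs(x)
    return n - 1 <= r <= n

def in_U(x, n):
    """x lies in U_n = interior of K_{n+1} minus K_{n-2}."""
    r = abs(x)
    # For n <= 2 the set K_{n-2} is empty, so there is no lower bound.
    lower_ok = True if n <= 2 else r > n - 2
    return lower_ok and r < n + 1

def indices_of_U_containing(x, max_n=50):
    """All n in 1..max_n with x in U_n."""
    return [n for n in range(1, max_n + 1) if in_U(x, n)]

print(indices_of_U_containing(0.0))   # indices of the sets U_n around 0
print(indices_of_U_containing(5.5))   # indices of the sets U_n around 5.5
```

Because each point meets only a bounded number of the $U_n$, finitely many neighborhoods chosen inside each $U_n$ yield a locally finite family, which is the mechanism used in the proof.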
Proposition 16.113 Let $\mathcal{O}$ be a locally finite open cover of the $\sigma$-compact metric space $X$. Then $\mathcal{O}$ is countable.
Proof. Exercise 16-109. ■
To construct a partition of unity subordinate to a locally finite open cover we need to find a way to define functions that are supported inside the open sets so that the sets where the functions are not zero cover the whole metric space. To do that, we need to be able to shrink every set in the open cover a little bit while making sure that we still cover the whole space.
Theorem 16.114 Shrinking Lemma. Let $\mathcal{O}$ be a locally finite open cover of the locally compact, $\sigma$-compact metric space $X$. Then for each $U \in \mathcal{O}$ there is an open set $V_U$ so that $\overline{V_U} \subseteq U$ and so that $\tilde{\mathcal{O}} := \{V_U : U \in \mathcal{O}\}$ is a locally finite open cover of $X$.
Proof. By Proposition 16.113, $\mathcal{O}$ is at most countable, so let $\{U_n : n \in \mathbb{N}\} := \mathcal{O}$. (For finite covers $\mathcal{O}$, the construction below terminates in finitely many steps with the desired cover $\tilde{\mathcal{O}}$.) The set $C_1 := U_1 \setminus \bigcup_{n=2}^{\infty} U_n = X \setminus \bigcup_{n=2}^{\infty} U_n$ is closed and contained in $U_1$. By Lemma 16.99, there is an open set $V_1$ so that $C_1 \subseteq V_1 \subseteq \overline{V_1} \subseteq U_1$. Then $\mathcal{O}_1 := \{V_1\} \cup \{U_n : n > 1\}$ is an open cover of $X$ and $\overline{V_1} \subseteq U_1$. Once an open cover $\mathcal{O}_k = \{V_1, \ldots, V_k\} \cup \{U_n : n > k\}$ has been constructed so that $\overline{V_j} \subseteq U_j$ holds for $j = 1, \ldots, k$, let $C_{k+1} := U_{k+1} \setminus \left( \bigcup_{j=1}^{k} V_j \cup \bigcup_{n=k+2}^{\infty} U_n \right)$. Then $C_{k+1}$ is closed and contained in $U_{k+1}$. By Lemma 16.99, there is an open set $V_{k+1}$ so that $C_{k+1} \subseteq V_{k+1} \subseteq \overline{V_{k+1}} \subseteq U_{k+1}$. Let $\mathcal{O}_{k+1} := \{V_1, \ldots, V_k, V_{k+1}\} \cup \{U_n : n > k+1\}$. Then $\mathcal{O}_{k+1}$ is an open cover of $X$ and $\overline{V_j} \subseteq U_j$ for all $j = 1, \ldots, k+1$.
Now consider $\tilde{\mathcal{O}} := \{V_j : j \in \mathbb{N}\}$. Because $\mathcal{O}$ is locally finite, for each $x \in X$ there is an $N \in \mathbb{N}$ so that $x \notin U_n$ for all $n \ge N$. But then, because $\mathcal{O}_N$ was an open cover of $X$, there must be a $j \le N$ so that $x \in V_j$. Hence, $\tilde{\mathcal{O}}$ is a cover of $X$. By construction, for all $j \in \mathbb{N}$ the set $V_j$ is open and satisfies $\overline{V_j} \subseteq U_j$. Because $\mathcal{O}$ is locally finite, $\tilde{\mathcal{O}}$ must also be locally finite and thus the result is proved. ■
Theorem 16.115 Let $\mathcal{O}$ be an open cover of the locally compact, $\sigma$-compact metric space $X$. Then there is a partition of unity subordinate to $\mathcal{O}$.
Proof. Let $\mathcal{U}$ be a locally finite open refinement of $\mathcal{O}$ as guaranteed by Theorem 16.112. Let $\tilde{\mathcal{U}}$ be a locally finite open cover so that for every $U \in \mathcal{U}$ there is a $V_U \in \tilde{\mathcal{U}}$ so that $\overline{V_U} \subseteq U$, as guaranteed by the Shrinking Lemma. For each $U \in \mathcal{U}$, let $W_U$ be an open set so that $\overline{V_U} \subseteq W_U \subseteq \overline{W_U} \subseteq U$ as provided by Lemma 16.99 and let $\psi_U : X \to [0,1]$ be a continuous function so that $\psi_U|_{\overline{V_U}} = 1$ and $\psi_U|_{X \setminus W_U} = 0$ as provided by Lemma 16.99. Because $\mathcal{U}$ is locally finite, each $x \in X$ has a neighborhood $V$ so that for all $v \in V$ the equality $\psi_U(v) = 0$ holds for all but finitely many $U \in \mathcal{U}$. Because $\tilde{\mathcal{U}}$ is a cover of $X$, for at least one $U \in \mathcal{U}$ we have $\psi_U(x) \ne 0$. Hence, for all $x \in X$ the sum $\psi(x) := \sum_{U \in \mathcal{U}} \psi_U(x)$ is a positive real number. Moreover, for each $x \in X$ on a neighborhood $V$ of $x$ the function $\psi$ is the sum of finitely many continuous functions. Hence, $\psi$ is continuous on this neighborhood of $x$ and so $\psi$ is continuous at $x$. Because $x$ was arbitrary, $\psi$ is continuous on $X$.
For all $U \in \mathcal{U}$ define $\varphi_U := \frac{\psi_U}{\psi}$. For each $U \in \mathcal{U}$, we have $\{x \in X : \varphi_U(x) \ne 0\} = \{x \in X : \psi_U(x) \ne 0\}$. Then $\{\{x \in X : \varphi_U(x) \ne 0\}\}_{U \in \mathcal{U}}$ is locally finite. Moreover, for all $x \in X$ we have $\sum_{U \in \mathcal{U}} \varphi_U(x) = \sum_{U \in \mathcal{U}} \frac{\psi_U(x)}{\psi(x)} = \frac{\psi(x)}{\psi(x)} = 1$. Hence, $\{\varphi_U\}_{U \in \mathcal{U}}$ is a partition of unity. For each $U$, we have $\operatorname{supp}(\varphi_U) \subseteq U \subseteq O$ for some $O \in \mathcal{O}$, so the partition of unity $\{\varphi_U\}_{U \in \mathcal{U}}$ is subordinate to $\mathcal{O}$. ■
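The normalization step in the proof of Theorem 16.115 can be tried out on the real line. The sketch below is an illustration under assumptions that are not in the text: the cover consists of the intervals $U_k = (k-1, k+1)$ for integers $k$, and squared tent functions stand in for the functions $\psi_U$ (they are positive exactly on $U_k$, but they are not the functions produced by Lemma 16.99). The names `psi`, `phi` and `centers` are hypothetical. Dividing by the locally finite sum produces functions that add up to one.

```python
def psi(k, x):
    """Squared tent function: positive exactly on U_k = (k - 1, k + 1)."""
    return max(0.0, 1.0 - abs(x - k)) ** 2

def phi(k, x, centers):
    """Normalized function psi_k / (sum of all psi_j), as in the proof."""
    # The total is positive whenever x lies in some U_j with j in centers.
    total = sum(psi(j, x) for j in centers)
    return psi(k, x) / total

centers = list(range(-5, 6))  # covers the open interval (-6, 6)
for x in (-2.3, 0.0, 0.5, 3.99):
    values = [phi(k, x, centers) for k in centers]
    nonzero = sum(1 for v in values if v > 0)
    print(x, sum(values), nonzero)
```

At each point at most two of the functions are nonzero, which reflects the local finiteness of the cover, and the normalized values sum to one up to rounding.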
Exercises
16-104. Let $X$ be a metric space and let $A \subseteq X$ be a nonempty subset.
(a) Prove that $\operatorname{dist}(x, A) = 0$ iff there is a sequence $\{a_n\}_{n=1}^{\infty}$ in $A$ with $\lim_{n \to \infty} a_n = x$.
(b) Let $C \subseteq X$ be closed and nonempty. Prove that $\operatorname{dist}(x, C) = 0$ iff $x \in C$.
16-105. Let $a, b$ be distinct nonnegative numbers that are not both zero. Prove that we have $\frac{a}{a+b} \in [0,1]$, that $\frac{a}{a+b} = 0$ iff $a = 0$, and that $\frac{a}{a+b} = 1$ iff $b = 0$.
16-106. Let $X$ be a metric space and let $C \subseteq X$ be a nonempty compact subset. Prove that for all $x \in X$ there is a $c_x \in C$ so that $d(x, c_x) = \operatorname{dist}(x, C)$.
16-107. Let $X$ be a metric space. For any two nonempty subsets $A, B \subseteq X$, define the distance from $A$ to $B$ as $\operatorname{dist}(A, B) := \inf\{\operatorname{dist}(a, B) : a \in A\}$.
(a) Prove that for all nonempty subsets $A, B \subseteq X$ we have $\operatorname{dist}(A, B) = \operatorname{dist}(B, A)$.
(b) Give an example of two closed, disjoint, nonempty sets $A, B$ such that $\operatorname{dist}(A, B) = 0$.
(c) Prove that the function in Lemma 16.99 need not be uniformly continuous, and hence in particular it need not be Lipschitz continuous.
(d) Prove that if $B$ and $C$ are not empty and $C$ is compact, then there is a $c \in C$ so that $\operatorname{dist}(C, B) = \operatorname{dist}(c, B)$.
16-108. Give an example of a metric space in which Lemma 16.104 fails. That is, give an example of a metric space in which no neighborhood of a compact set is compact.
16-109. Prove Proposition 16.113.
16-110. Let $X$ be a metric space and let $C_1, \ldots, C_n$ be compact subsets of $X$. Prove that $\bigcup_{j=1}^{n} C_j$ is compact.
16-111. Give an example of a bijective continuous linear function whose inverse is not continuous.
Chapter 17
Differentiation in Normed Spaces
To discuss differentiation, we need algebraic operations and a metric. Normed spaces have the algebraic structure of a vector space and a metric induced by the norm (see Proposition 15.54). The vector space structure is typically discussed in linear algebra. To keep the text self-contained, we introduce the requisite concepts and ideas in this chapter. Moreover, we freely use metric concepts discussed in Chapter 16. The presentation will be coordinate-free and valid for arbitrary (including infinite) dimensions. Although derivatives are mainly used in finite dimensional spaces and computed through partial derivatives along coordinate axes, this abstraction does not make the proofs more complicated. Instead, the omission of coordinates allows us to focus on the conceptual core of differentiation. The relevant results for finite dimensional spaces are given as consequences of the general theory.
To understand differentiation in multidimensional spaces, we must first adjust our expectation of what a derivative should be. The derivative cannot be a number or a slope, because it is not clear in which direction this slope would go. Instead, differentiation is defined similar to Theorem 4.5, which says that the derivative at $x$ determines the unique straight line through $(x, f(x))$ for which the difference (at $z$) between the function and the line goes to zero faster than $|z - x|$. Geometrically, (hyper)planes will take the place of lines. Linear functions are the analytical tool used to define (hyper)planes. We start our investigation with linear functions in Sections 17.1 and 17.2. Derivatives and partial derivatives are introduced in Sections 17.3 and 17.5. In between, Section 17.4 introduces the Mean Value Theorem, which is crucial for using derivatives to estimate differences. Section 17.6 introduces tensors, which are needed to represent higher derivatives in Section 17.7.
We conclude in Section 17.8 with the Implicit Function Theorem, which provides important examples of manifolds.
17.1 Continuous Linear Functions The definition of a linear function is entirely algebraic. It simply states that the function is compatible with the vector space operations.
Definition 17.1 Let $X, Y$ be vector spaces. A function $L : X \to Y$ is called linear iff for all $x_1, x_2 \in X$ we have $L[x_1 + x_2] = L[x_1] + L[x_2]$ and for all $\alpha \in \mathbb{R}$ and $x \in X$ we have $L[\alpha x] = \alpha L[x]$. A linear function is also sometimes called a linear operator.
Notation 17.2 Derivatives will be functions that map points to linear functions. To distinguish the various evaluations, throughout the text we will enclose the argument of a linear function in square brackets rather than round parentheses. To avoid confusion with the square brackets which indicate that the elements of $L^p$ spaces are equivalence classes, we will henceforth omit the brackets around the elements of $L^p$ spaces, as is customary in analysis. □
Differentiation and integration both lead to examples of linear functions.
Example 17.3
1. Let $(M, \Sigma, \mu)$ be a measure space, let $1 \le p, q \le \infty$ with $\frac{1}{p} + \frac{1}{q} = 1$ and let $g \in L^q(M, \Sigma, \mu)$. By Example 16.28, $I_g[f] := \int_M f g \, d\mu$ defines a continuous function $I_g : L^p(M, \Sigma, \mu) \to \mathbb{R}$ and it is easy to see via Theorem 14.39 that $I_g$ is linear.
2. Let $1 \le p < \infty$ and let $X$ be the subspace of functions $f \in C^1(a,b) \cap L^p(a,b)$ so that $f' \in L^p(a,b)$, too. When it is not mentioned explicitly that elements of $L^p(a,b)$ are equivalence classes, it is customary to use intersections like $C^1(a,b) \cap L^p(a,b)$ rather than the more complicated notation from Theorem 16.86. We claim that the function $D : X \to L^p(a,b)$ defined by $D[f] := f'$ is linear, but not continuous. If two functions in $X$ are equal almost everywhere, then they must be equal (see Exercise 8-5). That is, if $f, g \in X \subseteq L^p(a,b)$ and $f = g$ a.e., then $f = g$, which implies that $D$ is well-defined. (Just because we do not mention that the elements of $L^p$ are equivalence classes does not mean we do not need to pay attention to this fact when we define a function.) Linearity of $D$ follows easily from the corresponding linearity properties of the derivative (see Theorem 4.6). To see that $D$ is not continuous, we will only consider the case $a = 0$ and $b = 1$ here. The general case is left to the reader in Exercise 17-1. Because $\lim_{n \to \infty} \|x^n\|_p = \lim_{n \to \infty} \left(\frac{1}{np+1}\right)^{1/p} = 0$ and
$\lim_{n \to \infty} \|D[x^n]\|_p = \lim_{n \to \infty} \frac{n}{(np - p + 1)^{1/p}} = \begin{cases} \infty; & \text{if } p > 1, \\ 1; & \text{if } p = 1, \end{cases}$
the function $D$ is not continuous at 0. □
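The failure of continuity of $D$ at 0 can be seen from closed-form norms. The sketch below assumes, for illustration, the test functions $f_n(x) = x^n$ on $(0,1)$, for which $\|x^n\|_p^p = \frac{1}{np+1}$ and $\|n x^{n-1}\|_p^p = \frac{n^p}{(n-1)p+1}$ by direct integration. The function names are hypothetical.

```python
def norm_monomial(n, p):
    """L^p(0,1) norm of x^n, from the integral of x^(n*p) over (0, 1)."""
    return (1.0 / (n * p + 1)) ** (1.0 / p)

def norm_derivative(n, p):
    """L^p(0,1) norm of D[x^n] = n * x^(n-1)."""
    return n / ((n - 1) * p + 1) ** (1.0 / p)

# For p = 2 the functions tend to 0 in norm while their derivatives blow up.
for n in (1, 10, 100, 1000):
    print(n, norm_monomial(n, 2), norm_derivative(n, 2))
```

For $p = 1$ the derivative norms stay equal to one instead of growing, but they still do not tend to $\|D[0]\|_1 = 0$, so continuity at 0 fails in that case as well.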
It is a good rule of thumb that integration usually defines continuous linear functions and differentiation usually defines discontinuous linear functions. We have to be careful, though. Exercise 17-10 shows that integration need not always define a continuous linear function and Exercise 17-13 shows that differentiation can be continuous. The multiplication of a matrix with column vectors is another fundamental example of a linear function. We will investigate the connection between the "abstract" concept of continuous linear functions and the more "concrete" idea of matrix multiplication in Section 17.2. Before then, we need to investigate continuity for linear functions. In Example 17.3, we have only proved that $D$ is not continuous at the origin. Loosely speaking, for linear functions the behavior at the origin is duplicated at every point. In particular, Exercise 17-2 shows that $D$ is discontinuous at every point of $L^p(a,b)$. As a positive result, Theorem 17.4 below shows that for linear functions continuity at the origin is equivalent to continuity everywhere. Note how the proof uses linearity to reduce every situation to a configuration near the origin.
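Theorem 17.4 Let $X, Y$ be normed spaces and let $L : X \to Y$ be a linear function. The following are equivalent:
1. $L$ is continuous on $X$.
2. $L$ is continuous at 0.
3. $L$ is bounded on $\overline{B_1(0)} \subseteq X$.
4. There is a $c \in \mathbb{R}$ so that for all $x \in X$ we have $\|L[x]\| \le c\|x\|$.
Proof. The implication "1$\Rightarrow$2" is trivial. For "2$\Rightarrow$3," let $\varepsilon > 0$. Because $L$ is continuous at 0, there is a $\delta > 0$ so that for all $x \in X$ with $\|x\| = \|x - 0\| \le \delta$ we have $\|L[x]\| < \varepsilon$. Therefore for all points $x \in X$ with $\|x\| = \|x - 0\| \le 1$ we obtain $\|L[x]\| = \left\| L\left[\frac{1}{\delta} \delta x\right] \right\| = \frac{1}{\delta} \|L[\delta x]\| \le \frac{\varepsilon}{\delta}$. Hence, $L$ is bounded on $\overline{B_1(0)}$.
For "3$\Rightarrow$4," let $c \in \mathbb{R}$ be such that for all $x \in \overline{B_1(0)}$ we have $\|L[x]\| \le c$. Then for all $x \in X \setminus \{0\}$ we infer $\|L[x]\| = \left\| \|x\| L\left[\frac{x}{\|x\|}\right] \right\| = \|x\| \left\| L\left[\frac{x}{\|x\|}\right] \right\| \le c\|x\|$, and the inequality is trivial for $x = 0$.
For "4$\Rightarrow$1," let $c > 0$ be such that for all $x \in X$ we have $\|L[x]\| \le c\|x\|$. Let $\varepsilon > 0$ and set $\delta := \frac{\varepsilon}{c}$. Then for all $x, y \in X$ with $\|x - y\| < \delta$ we obtain $\|L[x] - L[y]\| = \|L[x - y]\| \le c\|x - y\| < c\delta = \varepsilon$.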
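… let $r > 0$ be such that for all $x \in X$ we have $\|x\|_\infty \le r\|x\|$ and let $c := r \sum_{i=1}^d \|L[u_i]\|$. Then for all $x = \sum_{i=1}^d c_i u_i \in X$ we obtain
\[
\|L[x]\| = \left\| \sum_{i=1}^d c_i L[u_i] \right\| \le \sum_{i=1}^d |c_i| \, \|L[u_i]\| \le \|x\|_\infty \sum_{i=1}^d \|L[u_i]\| \le c\|x\|,
\]
which means that $L$ is continuous. ∎

Linear functions can be added pointwise and they can be multiplied with real numbers. Hence, the linear functions from one normed space to another form a vector space. Part 4 of Theorem 17.4 enables us to define a norm on this space.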
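Definition 17.7 Let $X, Y$ be normed spaces. We define $\mathcal{L}(X, Y)$ to be the set of all continuous linear functions from $X$ to $Y$.

Theorem 17.8 Let $X, Y$ be normed spaces. Then, with pointwise addition and scalar multiplication, $\mathcal{L}(X, Y)$ is a vector space. Moreover, the function
\[
\|L\| := \min\left\{ c \ge 0 : \left( \forall x \in X : \|L[x]\| \le c\|x\| \right) \right\}
\]
defines a norm on $\mathcal{L}(X, Y)$.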
Proof. The proof that $\mathcal{L}(X, Y)$ is a vector space is left to Exercise 17-3a. To prove that $\mathcal{L}(X, Y)$ is a normed space, we start by defining for all $L \in \mathcal{L}(X, Y)$ the quantity $\|L\| := \inf\{ c \ge 0 : (\forall x \in X : \|L[x]\| \le c\|x\|) \}$. This is necessary, because we do not know a priori that the infimum is assumed, as is implicitly claimed in the definition of the norm on $\mathcal{L}(X, Y)$ in the statement of the theorem.

To prove that the $\|L\|$ defined above is a norm, first note that if $L : X \to Y$ is linear and bounded, then $\|L\| = 0$ iff $\inf\{ c \ge 0 : (\forall x \in X : \|L[x]\| \le c\|x\|) \} = 0$ iff for all $x \in X$ we have $L[x] = 0$ iff $L = 0$. Moreover, if $L \in \mathcal{L}(X, Y)$ and $\alpha \in \mathbb{R}$, then
\[
\|\alpha L\| = \inf\{ c \ge 0 : (\forall x \in X : \|\alpha L[x]\| \le c\|x\|) \}
= |\alpha| \inf\{ c \ge 0 : (\forall x \in X : \|L[x]\| \le c\|x\|) \}
= |\alpha| \, \|L\|.
\]
For the triangular inequality, let $L, M \in \mathcal{L}(X, Y)$. Then
\begin{align*}
\|L + M\| &= \inf\{ c \ge 0 : (\forall x \in X : \|L[x] + M[x]\| \le c\|x\|) \} \\
&\le \inf\{ c \ge 0 : (\forall x \in X : \|L[x]\| + \|M[x]\| \le c\|x\|) \} \\
&\le \inf\{ a \ge 0 : (\forall x \in X : \|L[x]\| \le a\|x\|) \} + \inf\{ b \ge 0 : (\forall x \in X : \|M[x]\| \le b\|x\|) \} \\
&= \|L\| + \|M\|.
\end{align*}
By Exercise 17-3b, we have $\|L[x]\| \le \|L\| \, \|x\|$ for all $x \in X$. In particular, the infimum that is used to define the norm is actually a minimum, as claimed. ∎

Unless otherwise indicated, throughout this text the norm on any space of continuous linear functions will be assumed to be the norm from Theorem 17.8.
Definition 17.9 The norm from Theorem 17.8 is also called the operator norm of the continuous linear function L. Another way to represent the norm of a continuous linear function is the following.
Proposition 17.10 Let $X, Y$ be normed spaces and let $L : X \to Y$ be a continuous linear function. Then $\|L\| = \sup\left\{ \|L[x]\| : x \in \overline{B_1(0)} \right\}$.

Proof. Exercise 17-4. ∎
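For linear functions between finite dimensional spaces, the supremum in Proposition 17.10 can be approximated by sampling. The following is a rough sketch, assuming a linear map from $\mathbb{R}^2$ (with the Euclidean norm) given by an illustrative matrix; the helper names are hypothetical, and sampling the unit circle suffices because $\|L[x]\|$ scales with $\|x\|$.

```python
import math


def apply(A, x):
    """Matrix-vector product for A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def euclid(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(t * t for t in v))


def operator_norm_estimate(A, samples=3600):
    """Approximate sup{||A x|| : ||x|| <= 1} for a map R^2 -> R^m by
    sampling unit vectors (cos t, sin t); the result is a lower bound."""
    best = 0.0
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        best = max(best, euclid(apply(A, [math.cos(t), math.sin(t)])))
    return best


A = [[3.0, 0.0], [0.0, 1.0]]   # illustrative example matrix
c = operator_norm_estimate(A)
x = [0.3, -2.0]
# Part 4 of Theorem 17.4 with the estimated constant: ||A x|| <= c ||x||.
print(c, euclid(apply(A, x)) <= c * euclid(x))
```

For this diagonal example the estimate agrees with the largest absolute diagonal entry; for a general matrix the sampled value only approximates the operator norm from below.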
The completeness of the spaces $\mathcal{L}(X, Y)$ solely depends on the completeness of the image space $Y$.

Theorem 17.11 Let $X$ be a normed space and let $Y$ be a Banach space. Then $\mathcal{L}(X, Y)$ is a Banach space.
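Proof. Let $\{L_n\}_{n=1}^\infty$ be a Cauchy sequence in $\mathcal{L}(X, Y)$. We first claim that for all $x \in X$ the sequence $\{L_n[x]\}_{n=1}^\infty$ is a Cauchy sequence. To prove this claim, let $x \in X$ and let $\varepsilon > 0$. Find an $N \in \mathbb{N}$ so that for all $m, n \ge N$ we have the inequality $\|L_m - L_n\| < \frac{\varepsilon}{\|x\| + 1}$. Then for all $m, n \ge N$ we infer
\[
\|L_m[x] - L_n[x]\| = \|(L_m - L_n)[x]\| \le \|L_m - L_n\| \, \|x\| < \frac{\varepsilon}{\|x\| + 1} \|x\| < \varepsilon,
\]
and $\{L_n[x]\}_{n=1}^\infty$ is a Cauchy sequence.

Because $Y$ is complete, for all $x \in X$ we can define $L[x] := \lim_{n\to\infty} L_n[x]$. It is easy to prove that $L$ is linear (see Exercise 17-5). Because the reverse triangular inequality $\big| \|L_m\| - \|L_n\| \big| \le \|L_m - L_n\|$ holds for all $m, n \in \mathbb{N}$ we obtain that $\{\|L_n\|\}_{n=1}^\infty$ is a Cauchy sequence, and hence $\lim_{n\to\infty} \|L_n\|$ exists. Now for all $x \in X$ the inequality $\|L[x]\| = \lim_{n\to\infty} \|L_n[x]\| \le \left( \lim_{n\to\infty} \|L_n\| \right) \|x\|$ holds, and hence $L \in \mathcal{L}(X, Y)$.

To prove that $L$ is the limit of $\{L_n\}_{n=1}^\infty$ in $\mathcal{L}(X, Y)$ let $\varepsilon > 0$. Let $N \in \mathbb{N}$ be such that for all $m, n \ge N$ we have $\|L_m - L_n\| < \varepsilon$. Then for all $n \ge N$ we obtain
\begin{align*}
\|L - L_n\| &= \sup\left\{ \|L[x] - L_n[x]\| : x \in \overline{B_1(0)} \right\} \\
&= \sup\left\{ \lim_{m\to\infty} \|L_m[x] - L_n[x]\| : x \in \overline{B_1(0)} \right\} \\
&\le \sup\left\{ \limsup_{m\to\infty} \|L_m - L_n\| \, \|x\| : x \in \overline{B_1(0)} \right\} \le \varepsilon.
\end{align*}
Thus for all $n \ge N$ we have $\|L - L_n\| \le \varepsilon$ and $\{L_n\}_{n=1}^\infty$ converges to $L$. ∎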
For our investigation of derivatives, it is important to realize that there is a simple bijective correspondence between the elements of a normed space Y and the linear functions from R to Y .
Theorem 17.12 Let $Y$ be a normed space. For $f \in Y$ define $L_f : \mathbb{R} \to Y$ by $L_f[x] := xf$. Then the function $I[f] := L_f$ defines an isometric isomorphism from $Y$ to $\mathcal{L}(\mathbb{R}, Y)$.

Proof. Clearly, for all $f \in Y$ the function $L_f$ is linear and $\|L_f\| = \|f\|$. Linearity of $I$ is trivial. Because $\|L_f\| = \|f\|$, $I$ is an isometry, and hence it is injective. To prove that $I$ is surjective, let $L \in \mathcal{L}(\mathbb{R}, Y)$. Set $f := L[1]$. Then for all $x \in \mathbb{R}$ we have $L[x] = L[x \cdot 1] = xL[1] = xf$, which means that $L = L_f$ and $I$ is surjective. ∎

We conclude this section by noting that for continuous linear functions we could always assume that the domain is a Banach space. Recall that by Theorem 16.89 and Exercise 16-94 every normed space can be densely embedded into a Banach space. Theorem 17.13 shows that any continuous linear function $L : X \to Y$ can be extended to a unique continuous linear function from the completion of $X$ to the completion of $Y$. This means that as long as we work with continuous linear functions, which we do exclusively in this chapter, it would be no loss of generality to assume (as is often done) that domain and range are Banach spaces.
Theorem 17.13 Let $X$ be a normed space, let $Y$ be a Banach space, let $D \subseteq X$ be a dense linear subspace of $X$ and let $L : D \to Y$ be a continuous linear function. Then there is a unique continuous linear function $M : X \to Y$ such that $M|_D = L$.

Proof. Exercise 17-6. ∎
Exercises

17-1. Let $1 \le p < \infty$, let $a < b$ and let $X := \{ f \in C^1(a,b) \cap L^p(a,b) : f' \in L^p(a,b) \}$. Prove that $D : X \to L^p(a,b)$ defined by $D[f] := [f']$ is not continuous.

17-2. Let $X, Y$ be normed spaces and let $L : X \to Y$ be linear. Prove that if $L$ is discontinuous at the origin, then it is discontinuous at every $x \in X$.

17-3. Finish the proof of Theorem 17.8. Let $X, Y$ be normed spaces.
(a) Prove that $\mathcal{L}(X, Y)$ with pointwise addition and scalar multiplication is a vector space.
(b) Let $L : X \to Y$ be a continuous linear function. Prove that $\|L[x]\| \le \|L\| \, \|x\|$ for all $x \in X$.

17-4. Prove Proposition 17.10.

17-5. Finish the proof of Theorem 17.11 by proving that the function $L[x] := \lim_{n\to\infty} L_n[x]$ defined in the proof is linear.

17-6. Prove Theorem 17.13.

17-7. Let $X, Y$ be vector spaces and let $\mathcal{B}$ be a base of $X$.
(a) Let $L, M : X \to Y$ be linear. Prove that if $L(b) = M(b)$ for all $b \in \mathcal{B}$, then $L = M$.
(b) Prove that if $f : \mathcal{B} \to Y$ is a function defined on the base $\mathcal{B}$, then there is a unique linear function $L_f : X \to Y$ so that $L_f|_{\mathcal{B}} = f$.

17-8. Let $1 \le p \le \infty$, let $q$ be so that $\frac{1}{p} + \frac{1}{q} = 1$ and let $\{a_j\}_{j=1}^\infty \in l^q$. Prove that the function $S\left( \{x_j\}_{j=1}^\infty \right) := \sum_{j=1}^\infty a_j x_j$ is a continuous linear operator from $l^p$ to $\mathbb{R}$.

17-9. Let $X, Y$ be normed spaces and let $L : X \to Y$ be linear. Prove that if $L$ is continuous at some $x \in X$, then $L$ is continuous on $X$.

17-10. Integration need not always define a continuous linear function. Let $X$ be the space of all continuous functions $f : \mathbb{R} \to \mathbb{R}$ so that $\{ x : f(x) \ne 0 \}$ is bounded.
(a) Prove that $X$ is a normed subspace of $\left( C(\mathbb{R}, \mathbb{R}), \|\cdot\|_\infty \right)$.
(b) Prove that $L : X \to \mathbb{R}$ defined by $L[f] := \int_{-\infty}^{\infty} f(x)\, dx$ is a linear function on $X$.
(c) Prove that $L$ is not bounded on $\overline{B_1(0)}$.
(d) Let $f \in X$. Find a sequence $\{f_n\}_{n=1}^\infty$ of elements of $X$ that converges to $f$ and such that $\{L[f_n]\}_{n=1}^\infty$ does not converge to $L[f]$.
(e) Prove that $X$ is not a Banach space.
(f) Prove that $X$ is not dense in $\left( C(\mathbb{R}, \mathbb{R}), \|\cdot\|_\infty \right)$.

17-11. Let $X$ be the space of polynomials of order at most 3 on the interval $(0, 1)$. Prove that $D[p] := p'$ is a continuous linear function from $X$ to $X$.

17-12. Let $X$, $Y$ and $Z$ be normed spaces and let $K : X \to Y$ and $L : Y \to Z$ be continuous linear functions.
(a) Prove that $\|L \circ K\| \le \|L\| \, \|K\|$.
(b) Prove that the inequality in part 17-12a can be strict.

17-13. Prove that if $C^1(a,b)$ is equipped with the norm $\|f\| := \|f\|_\infty + \|f'\|_\infty$ (see Exercise 16-17), then $f \mapsto f'$ is a continuous mapping from $C^1(a,b)$ to $C^0(a,b)$.
17.2 Matrix Representation of Linear Functions

The coordinate-free introduction in Section 17.1 provides a concise description of linear functions. Without specifics about a coordinate system, an abstract notion can usually be investigated more easily, because there are fewer details to keep track of. On the other hand, coordinates bridge the gap between abstract notions and concrete applications. Therefore, the connection between a concept and its coordinatized version should be investigated very carefully. This section shows that coordinatization of linear functions is done by carefully reinterpreting some natural coefficients.

A coordinate system in a vector space ultimately is nothing but a base, because any vector can be expressed as a unique linear combination of the base vectors (see Proposition 15.18). The coefficients in the base representation of the vector can be viewed as the coordinates. In this section, we investigate how a linear function between finite dimensional spaces maps the coordinates of its input vectors to the coordinates of its output vectors. Because all finite dimensional spaces are isomorphic to some $\mathbb{R}^d$ (see Proposition 15.25) we will work with spaces $\mathbb{R}^m$, $\mathbb{R}^n$ etc. throughout this section. The choice of a base in domain and range leads to the connection between linear functions and matrices.
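Proposition 17.14 Let $m, n \in \mathbb{N}$, let $L : \mathbb{R}^n \to \mathbb{R}^m$ be linear, let $\{u_1, \dots, u_n\}$ be a base of $\mathbb{R}^n$ and let $\{w_1, \dots, w_m\}$ be a base of $\mathbb{R}^m$. For all $j = 1, \dots, n$, let $a_{ij}$ with $i = 1, \dots, m$ be such that $L[u_j] = \sum_{i=1}^m a_{ij} w_i$. Because $\{u_1, \dots, u_n\}$ is a base of $\mathbb{R}^n$, for all $x \in \mathbb{R}^n$, there are unique coefficients $c_1, \dots, c_n$ so that $x = \sum_{j=1}^n c_j u_j$. The image of $x$ under $L$ is
\[
L[x] = \sum_{i=1}^m \left( \sum_{j=1}^n a_{ij} c_j \right) w_i.
\]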
Proof. Exercise 17-14. ∎

Proposition 17.14 shows that, once we fix bases in $\mathbb{R}^n$ and $\mathbb{R}^m$, for each linear function $L : \mathbb{R}^n \to \mathbb{R}^m$ there is a unique rectangular array of real numbers $a_{ij}$, with indices $i = 1, \dots, m$, $j = 1, \dots, n$ that can be used to represent $L$. Such rectangular arrays of numbers are called matrices. The set of matrices with the natural addition and scalar multiplication is a vector space.
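With the standard bases, the array of coefficients can be read off column by column: column $j$ holds the coordinates of the image of the $j$-th base vector. The following is a small sketch under that assumption; `matrix_of` and `example_map` are hypothetical names, and the example map is chosen only for illustration.

```python
def matrix_of(L, n):
    """Return the matrix (a_ij) of a linear map L: R^n -> R^m with respect
    to the standard bases: column j holds the coordinates of L[e_j]."""
    columns = []
    for j in range(n):
        e_j = [1.0 if k == j else 0.0 for k in range(n)]
        columns.append(L(e_j))
    m = len(columns[0])
    # Transpose the list of columns into a list of rows.
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def example_map(x):
    """Hypothetical linear map R^3 -> R^2 used only for illustration."""
    return [x[0] + 2.0 * x[1], 3.0 * x[2] - x[0]]


A = matrix_of(example_map, 3)
print(A)  # two rows (m = 2) and three columns (n = 3)
```

For bases other than the standard ones, the images $L[u_j]$ would additionally have to be expanded in the base $\{w_1, \dots, w_m\}$, which this sketch does not do.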
Definition 17.15 Let $m, n \in \mathbb{N}$. A real $m \times n$-matrix with $m$ rows and $n$ columns is a function $A : \{1, \dots, m\} \times \{1, \dots, n\} \to \mathbb{R}$, denoted $A = (a_{ij})_{\substack{i = 1, \dots, m \\ j = 1, \dots, n}}$. The index $i$ is called the row index and the index $j$ is called the column index. We define $M(m \times n, \mathbb{R})$ to be the set of all real $m \times n$-matrices.
Proposition 17.16 Let $m, n \in \mathbb{N}$, let $(a_{ij})_{\substack{i = 1, \dots, m \\ j = 1, \dots, n}}$, $(b_{ij})_{\substack{i = 1, \dots, m \\ j = 1, \dots, n}}$ be real $m \times n$ matrices and let $c \in \mathbb{R}$. With addition defined by
\[
(a_{ij})_{\substack{i = 1, \dots, m \\ j = 1, \dots, n}} + (b_{ij})_{\substack{i = 1, \dots, m \\ j = 1, \dots, n}} := (a_{ij} + b_{ij})_{\substack{i = 1, \dots, m \\ j = 1, \dots, n}}
\]
and with scalar multiplication defined by
\[
c \, (a_{ij})_{\substack{i = 1, \dots, m \\ j = 1, \dots, n}} := (c a_{ij})_{\substack{i = 1, \dots, m \\ j = 1, \dots, n}}
\]
the set of matrices $M(m \times n, \mathbb{R})$ is a vector space.
Proof. Exercise 17-15. ∎

The coefficients from Proposition 17.14 immediately lead to an isomorphism between $\mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$ and $M(m \times n, \mathbb{R})$, where both are considered as vector spaces. Note that the specific isomorphism will depend on which bases we choose in $\mathbb{R}^n$ and $\mathbb{R}^m$.
Theorem 17.17 Let $m, n \in \mathbb{N}$, and let $\{u_1, \dots, u_n\}$ and $\{w_1, \dots, w_m\}$ be bases of $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively. For each $L \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$, let $A(L) = (a_{ij})_{\substack{i = 1, \dots, m \\ j = 1, \dots, n}}$ be the matrix with coefficients $a_{ij}$ provided by Proposition 17.14. Then the function $A : \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m) \to M(m \times n, \mathbb{R})$ thus defined is a vector space isomorphism.
Proof. Exercise 17-16. ∎

Similar to Theorem 17.12, for $m = n = 1$ Theorem 17.17 shows that linear functions from $\mathbb{R}$ to $\mathbb{R}$ are in bijective correspondence with numbers (considered as “$1 \times 1$ matrices”). Composition of functions can be used to define a multiplication on $\mathcal{L}(\mathbb{R}^n, \mathbb{R}^n)$. Exercise 17-17 shows that this multiplication is compatible with addition and scalar multiplication. For $M(m \times n, \mathbb{R})$, we can define multiplication of matrices as follows.
Definition 17.18 Matrix multiplication. Let $m, n, p \in \mathbb{N}$. For the real matrices $A = (a_{jk})_{\substack{j = 1, \dots, m \\ k = 1, \dots, n}} \in M(m \times n, \mathbb{R})$ and $B = (b_{ij})_{\substack{i = 1, \dots, p \\ j = 1, \dots, m}} \in M(p \times m, \mathbb{R})$, we define the product
\[
BA := \left( \sum_{j=1}^m b_{ij} a_{jk} \right)_{\substack{i = 1, \dots, p \\ k = 1, \dots, n}} \in M(p \times n, \mathbb{R}).
\]
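The index pattern of the definition translates directly into nested sums. The following is a minimal sketch, with matrices stored as lists of rows; the function name `mat_mul` and the example matrices are illustrative assumptions.

```python
def mat_mul(B, A):
    """Product BA of B in M(p x m) and A in M(m x n), both lists of rows:
    entry (i, k) is the sum over j of b_ij * a_jk."""
    p, m, n = len(B), len(A), len(A[0])
    if any(len(row) != m for row in B):
        raise ValueError("the number of columns of B must equal the number of rows of A")
    return [[sum(B[i][j] * A[j][k] for j in range(m)) for k in range(n)]
            for i in range(p)]


A = [[1, 2, 0], [-1, 0, 3]]    # element of M(2 x 3, R)
B = [[1, 1], [0, 2], [4, 0]]   # element of M(3 x 2, R)
print(mat_mul(B, A))           # element of M(3 x 3, R)
```

Note that the product $BA$ is only defined when the number of columns of $B$ equals the number of rows of $A$, which the sketch checks explicitly.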
Theorems 17.20 and 17.21 below show the connection between matrix multiplication and the evaluation and composition of linear functions. For the remainder of this chapter, the base used in any space $\mathbb{R}^d$ is the standard base $\{e_1, \dots, e_d\}$.

Proposition 17.19 Let $m \in \mathbb{N}$. The function $V_m\left( (a_{ij})_{\substack{i = 1, \dots, m \\ j = 1}} \right) := \sum_{i=1}^m a_{i1} e_i$ is an isomorphism from $M(m \times 1, \mathbb{R})$ to $\mathbb{R}^m$.
Proof. Exercise 17-18. ∎

[Diagram: $\mathbb{R}^n \xrightarrow{\;L\;} \mathbb{R}^m \xrightarrow{\;H\;} \mathbb{R}^p$ in the top row, $M(n \times 1, \mathbb{R}) \xrightarrow{\;A = A(L)\;} M(m \times 1, \mathbb{R}) \xrightarrow{\;B = A(H)\;} M(p \times 1, \mathbb{R})$ in the bottom row, connected by the isomorphisms $V_n$, $V_m$, $V_p$.]

Figure 41: The connection between linear functions and their matrix representations. Note that instead of representing linear functions between spaces $\mathbb{R}^d$ with the standard basis, we could have also represented linear functions between arbitrary finite dimensional vector spaces using any base in each space.
The isomorphisms $V_m$ are the key to representing evaluations and compositions of linear functions as matrix multiplications and vice versa. It is customary to drop the second index for elements of $M(m \times 1, \mathbb{R})$ and we will do so in the following. The representation of elements of $\mathbb{R}^m$ as in Proposition 17.19 is also called the representation with column vectors. The corresponding representation with $1 \times m$ matrices is called the representation with row vectors.
Theorem 17.20 Matrix multiplication and evaluation of linear functions. Let $m, n \in \mathbb{N}$, let $L : \mathbb{R}^n \to \mathbb{R}^m$ be a linear function and let $A := A(L) \in M(m \times n, \mathbb{R})$ be the matrix obtained from Theorem 17.17, using the standard bases in $\mathbb{R}^n$ and $\mathbb{R}^m$. Then for all $v \in \mathbb{R}^n$ we have $L[v] = V_m\left[ A V_n^{-1}[v] \right]$, where $A$ and $V_n^{-1}[v]$ are multiplied as matrices.

Proof. Exercise 17-19. ∎
Exercise 17-20 shows that matrix multiplication is associative, which means we can write a product of three or more matrices without parentheses indicating which pairs of matrices are multiplied first.
Theorem 17.21 Matrix multiplication and composition of linear functions. Let $m, n, p \in \mathbb{N}$, let $L : \mathbb{R}^n \to \mathbb{R}^m$ and $H : \mathbb{R}^m \to \mathbb{R}^p$ be linear functions and let $A := A(L) \in M(m \times n, \mathbb{R})$ and $B := A(H) \in M(p \times m, \mathbb{R})$ be the matrices obtained from Theorem 17.17, using the standard bases in $\mathbb{R}^n$, $\mathbb{R}^m$, and $\mathbb{R}^p$. Then for all $x \in \mathbb{R}^n$ we have $H \circ L[x] = V_p\left[ BA V_n^{-1}[x] \right]$, where $B$, $A$ and $V_n^{-1}[x]$ are multiplied as matrices.

Proof. Exercise 17-21. ∎
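The statement of Theorem 17.21 can be checked on a concrete example. In the following sketch vectors are stored as plain lists, so the isomorphisms $V_n$ are implicit; the matrices, the vector and the helper names are illustrative assumptions. Evaluating $H$ after $L$ gives the same result as multiplying with the single matrix $BA$.

```python
def mat_vec(A, x):
    """Evaluate the linear function represented by A at the vector x."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]


def mat_mul(B, A):
    """Matrix product BA with entries sum_j b_ij * a_jk."""
    return [[sum(B[i][j] * A[j][k] for j in range(len(A)))
             for k in range(len(A[0]))] for i in range(len(B))]


A = [[1, 2], [0, 1], [3, 0]]   # represents L: R^2 -> R^3
B = [[1, 0, 1], [2, 1, 0]]     # represents H: R^3 -> R^2
x = [5, -7]

composed = mat_vec(B, mat_vec(A, x))      # H[L[x]]
via_product = mat_vec(mat_mul(B, A), x)   # (BA) x
print(composed == via_product)
```

A single example is of course no proof; the sketch only illustrates which objects are being compared.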
The connection between linear functions and the matrices that represent them is also expressed in Figure 41. Diagrams as in this figure are often called commutative diagrams, because the order in which the arrows are followed can be interchanged and the arrows representing isomorphisms can be reversed. To further familiarize ourselves with the correspondence between matrix multiplication and composition of linear functions, let us cast the well-known Gauss-Jordan
algorithm from linear algebra in the language of linear functions. We will focus the result on bijective linear functions because we need this reinterpretation for Lemma 18.35 in the proof of the Multidimensional Substitution Formula.
Definition 17.22 Elementary row operation functions. Let $d \in \mathbb{N}$ and let every vector $x \in \mathbb{R}^d$ be represented as $x = \sum_{i=1}^d x_i e_i =: (x_1, \dots, x_d)$.
1. $D : \mathbb{R}^d \to \mathbb{R}^d$ is called a diagonal operator iff there are $c_1, \dots, c_d \in \mathbb{R}$ so that $D(x_1, \dots, x_d) = (c_1 x_1, \dots, c_d x_d)$.
2. $A : \mathbb{R}^d \to \mathbb{R}^d$ is called a row addition operator iff there are a number $a \in \mathbb{R}$ and distinct indices $i, j \in \{1, \dots, d\}$ so that for all $x = (x_1, \dots, x_d) \in \mathbb{R}^d$ we have $A(x_1, \dots, x_d) = (x_1, \dots, x_{j-1}, x_j + a x_i, x_{j+1}, \dots, x_d)$.
3. $T : \mathbb{R}^d \to \mathbb{R}^d$ is called a row transposition operator iff there are indices $i, j \in \{1, \dots, d\}$ with $i < j$ such that for all $x = (x_1, \dots, x_d) \in \mathbb{R}^d$ we have $T(x_1, \dots, x_d) = (x_1, \dots, x_{i-1}, x_j, x_{i+1}, \dots, x_{j-1}, x_i, x_{j+1}, \dots, x_d)$.
Theorem 17.23 Gauss-Jordan Algorithm. Let $d \in \mathbb{N}$. Every bijective linear function $L : \mathbb{R}^d \to \mathbb{R}^d$ is a composition of one diagonal operator so that all $c_j \ne 0$ with row addition and row transposition operators.

Proof. We will provide an outline here and leave the full proof to Exercise 17-22. The proof is an induction on the dimension $d$. The base step $d = 1$ is trivial. For the induction step, let $A^0 := A(L) \in M(d \times d, \mathbb{R})$ be the matrix obtained from Theorem 17.17 using the standard base in $\mathbb{R}^d$. Because $L$ is bijective, there is a coefficient $a^0_{i1}$ that is not equal to zero (explain). This means the transposition of rows $1$ and $i$ produces a matrix $A^1$ with $a^1_{11} \ne 0$. Execute $d - 1$ row additions to produce a matrix $A^2$ with $a^2_{11} \ne 0$ and $a^2_{i1} = 0$ for $i = 2, \dots, d$. Now consider the matrix $B$ obtained from $A^2$ by erasing the first row and the first column. This matrix is the image $B = A(L')$ of a bijective linear function $L' : \mathbb{R}^{d-1} \to \mathbb{R}^{d-1}$ (explain). Therefore, by induction hypothesis (explain) there is a sequence of row additions and transpositions that turns $B$ into a diagonal matrix $C$. The corresponding row additions and transpositions turn $A^2$ into a matrix $A^3$ whose only nonzero entries are in the first row and on the diagonal (explain). Moreover, all of the entries on the diagonal are not zero (explain). Perform the appropriate row additions to obtain a matrix $A^4$ whose only nonzero entries are on the diagonal (explain).

For the above constructed sequence of row transpositions and additions, let the operators $M_1, \dots, M_n$ be the corresponding row transposition and row addition operators in the order in which the operations were performed. Then $N := M_n \circ \cdots \circ M_1 \circ L$ is a linear function so that $A(N) = A^4$ (explain; a commutative diagram may help). That is, $N$ is a diagonal operator and $L = M_1^{-1} \circ \cdots \circ M_n^{-1} \circ N$. Row transposition operators are their own inverses and the inverse of a row addition operator is another row addition operator (explain by stating the inverse). Thus we have proved the theorem. ∎

Note how the whole proof of Theorem 17.23 depends on the fluent translation between matrix operations and their interpretations as compositions of linear operators.
Exercise 17-24 shows another application of this translation by assuring that the solution x of a uniquely solvable system of equations Ax = b depends continuously on the coefficients of A and b.
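The decomposition in Theorem 17.23 can be made concrete by recording the row operations while a matrix is reduced. The following is a sketch only: it clears one column at a time (the usual iterative Gauss-Jordan order, not the inductive order of the proof outline), it uses exact rational arithmetic, and the names `gauss_jordan_ops`, `apply_ops` and the example matrix are assumptions. Applying the recorded operators $M_1, \dots, M_n$ to $L[x]$ yields the value of a diagonal operator at $x$.

```python
from fractions import Fraction


def gauss_jordan_ops(rows):
    """Reduce an invertible square matrix to diagonal form using only row
    transpositions and row additions; return (ops, diagonal).

    ops is a list of ("swap", i, j) or ("add", src, dst, factor), where
    "add" means: row dst += factor * row src.
    """
    M = [[Fraction(v) for v in row] for row in rows]
    d = len(M)
    ops = []
    for col in range(d):
        # A nonzero pivot at or below the diagonal exists if M is invertible.
        pivot = next((r for r in range(col, d) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            ops.append(("swap", col, pivot))
        # Clear every other entry of this column with row additions.
        for r in range(d):
            if r != col and M[r][col] != 0:
                factor = -M[r][col] / M[col][col]
                M[r] = [a + factor * b for a, b in zip(M[r], M[col])]
                ops.append(("add", col, r, factor))
    return ops, [M[i][i] for i in range(d)]


def apply_ops(ops, x):
    """Apply the recorded row operators, in order, to a vector x."""
    x = [Fraction(v) for v in x]
    for op in ops:
        if op[0] == "swap":
            _, i, j = op
            x[i], x[j] = x[j], x[i]
        else:
            _, src, dst, factor = op
            x[dst] += factor * x[src]
    return x


def mat_vec(A, x):
    """Evaluate the linear function represented by A at x."""
    return [sum(Fraction(a) * Fraction(t) for a, t in zip(row, x)) for row in A]


A = [[0, 2, 1], [1, 1, 0], [2, 0, 1]]   # illustrative invertible matrix
ops, diagonal = gauss_jordan_ops(A)
x = [1, -2, 3]
# N = M_n o ... o M_1 o L acts as the diagonal operator with entries `diagonal`.
lhs = apply_ops(ops, mat_vec(A, x))
rhs = [c * Fraction(t) for c, t in zip(diagonal, x)]
print(lhs == rhs, diagonal)
```

Inverting the recorded operators in reverse order, as in the last step of the proof, would express $L$ itself as the composition claimed in the theorem.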
Exercises

17-14. Prove Proposition 17.14.

17-15. Prove Proposition 17.16.

17-16. Prove Theorem 17.17.

17-17. Let $V, X, Y, Z$ be vector spaces, let $L : Y \to Z$, $M, N : X \to Y$ and $K : V \to X$ be linear and let $c \in \mathbb{R}$. Prove each of the following.
(a) $L \circ (M + N) = L \circ M + L \circ N$
(b) $(M + N) \circ K = M \circ K + N \circ K$
(c) $(cL) \circ M = c(L \circ M) = L \circ (cM)$

17-18. Prove Proposition 17.19.

17-19. Prove Theorem 17.20.

17-20. Prove that matrix multiplication is associative. That is, prove that if $A \in M(m \times n, \mathbb{R})$, $B \in M(p \times m, \mathbb{R})$, and $C \in M(q \times p, \mathbb{R})$, then $C(BA) = (CB)A$.

17-21. Prove Theorem 17.21.

17-22. Prove Theorem 17.23.

17-23. Interpret the $m \times n$ matrix $A$ with entries $a_{ij}$ as a linear function from $(\mathbb{R}^n, \|\cdot\|_2)$ to $(\mathbb{R}^m, \|\cdot\|_2)$. Prove that the operator norm of $A$ satisfies $\|A\| \le \sqrt{\sum_{i=1}^m \sum_{j=1}^n a_{ij}^2}$.

17-24. The continuous dependence of solutions $x$ of uniquely solvable systems of linear equations $Ax = b$ on the entries of the coefficient matrix $A$ and the right side $b$.
(a) Let $X, Y$ be normed spaces and let $\mathcal{H}(X, Y)$ be the set of all invertible continuous linear functions from $X$ to $Y$ with continuous inverse (“linear homeomorphisms”). Prove that the function $J : \mathcal{H}(X, Y) \to \mathcal{H}(Y, X)$, which maps each $A \in \mathcal{H}(X, Y)$ to $A^{-1} \in \mathcal{H}(Y, X)$ is continuous on $\mathcal{H}(X, Y)$. Hint. Prove that for all $A, B \in \mathcal{H}(X, Y)$ that are close enough together, the inequality
\[
\left\| A^{-1} - B^{-1} \right\| \le \frac{\left\| A^{-1} \right\|^2}{1 - \left\| A^{-1} \right\| \, \|B - A\|} \, \|B - A\|
\]
holds and prove that this implies that $J$ is continuous at $A$.
(b) Prove that if $A$ is an invertible $d \times d$ matrix and $b \in \mathbb{R}^d$, then $x = J(A)b$ is the unique solution of the system of equations $Ax = b$.
(c) Prove that the map $S : \mathcal{H}(\mathbb{R}^d, \mathbb{R}^d) \times \mathbb{R}^d \to \mathbb{R}^d$ defined by $S(A, b) := J(A)b$ is continuous.
(d) Suppose an industrial process allows the measurement of the coefficient matrix $A$ and of the right hand side $b$ of a linear system of equations $Ax = b$. Moreover, suppose that the process requires the computation of the solution $x$. Explain why (unavoidable) sufficiently small measurement errors are not likely to have a large effect on the computed solution $x$.
Figure 42 (panels: “3d view” and “side view”): Differentiation is approximation with linear functions. The graph of a linear function from $\mathbb{R}^2$ to $\mathbb{R}$ is a plane. The figure shows that the difference between the graph of a differentiable function and the graph of an appropriately shifted linear function fits into arbitrarily small “cones.” In the side view, the plane is seen “edge-on” and we only indicate the sides of the cone with dotted lines.
17.3 Differentiability

Although it may be geometrically intuitive, defining the derivative of a multivariable function in terms of partial derivatives can become a notational nightmare. Just consider the indices and notations in the matrix representation of a linear function in Section 17.2 and imagine them as part of a more complex definition. To circumvent this level of detail, which might obscure the forest for the trees, we introduce derivatives with a coordinate-free definition. Aside from simpler notation, we avoid the pathology presented in Exercises 17-60 and 17-61 and we gain conceptual insights into a theory that is not bound to finite dimensional spaces. Indeed, all results in this section hold for infinite dimensional spaces, too. Interestingly enough, restriction to finite dimensional spaces would not simplify the proofs. This is similar to Sections 14.2-14.4, where proofs originally designed for functions of one real variable translated verbatim to the setting of measure spaces.

In this chapter, we state the proofs in the abstract setting and provide results for $d$-dimensional space as corollaries. By staying with this level of generality, we will ultimately produce some rather elegant proofs of important results. For examples, consider the proof of Leibniz’ Rule in Exercise 17-58, as well as the ends of Sections 17.7 and 17.8. In each case, important results for the familiar $d$-dimensional setting are obtained as corollaries of coordinate-free results on differentiation.

The key to differentiation in higher dimensional spaces lies in Figure 9 and its analytical formulation in Theorem 4.5. A function should be differentiable iff it can
be approximated very closely by a shifted linear function. As in Figure 9, the idea in Definition 17.24 below is that near the point where the derivative is taken, for any multidimensional analogue of a cone, both the function and the approximating linear entity should ultimately be in the same “cone” (see Figure 42). Similar to the single variable setting, the natural domains for differentiable functions are open sets.
Definition 17.24 Let $X, Y$ be normed spaces, let $\Omega \subseteq X$ be open, let $f : \Omega \to Y$ be a function and let $x \in \Omega$. Then $f$ is called differentiable at $x$ iff there is a continuous linear function $L : X \to Y$ so that for all $\varepsilon > 0$ there is a $\delta > 0$ such that for all $z \in \Omega$ with $\|z - x\| < \delta$ we have
\[
\| f(z) - f(x) - L[z - x] \| \le \varepsilon \|z - x\|.
\]
In this case, we set $Df(x) := L$ and call it the derivative of $f$ at $x$. By Exercise 17-25b, the derivative is unique, so we can speak of the derivative.
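The inequality in Definition 17.24 says that the quotient $\|f(z) - f(x) - L[z - x]\| / \|z - x\|$ becomes arbitrarily small as $z$ approaches $x$. The following sketch observes this numerically for an illustrative map $f : \mathbb{R}^2 \to \mathbb{R}^2$ with a candidate derivative computed by hand; the map, the point and all function names are assumptions made for the example.

```python
import math


def f(v):
    """Illustrative map R^2 -> R^2 (an assumption of this sketch)."""
    x, y = v
    return [x * y, x + y * y]


def candidate_derivative(v):
    """Matrix of the candidate linear map L = Df(v), computed by hand."""
    x, y = v
    return [[y, x], [1.0, 2.0 * y]]


def norm(v):
    """Euclidean norm."""
    return math.sqrt(sum(t * t for t in v))


def remainder_ratio(point, h):
    """||f(z) - f(x) - L[z - x]|| / ||z - x|| for z = x + h."""
    L = candidate_derivative(point)
    z = [p + t for p, t in zip(point, h)]
    Lh = [sum(a * t for a, t in zip(row, h)) for row in L]
    rem = [fz - fx - l for fz, fx, l in zip(f(z), f(point), Lh)]
    return norm(rem) / norm(h)


x0 = [1.0, 2.0]
for k in range(1, 6):
    h = [10.0 ** (-k), -(10.0 ** (-k))]
    print(k, remainder_ratio(x0, h))   # the ratios shrink as h shrinks
```

Only one direction of approach is sampled here, whereas the definition requires the estimate for all $z$ near $x$; the sketch is an illustration, not a verification.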
Notation 17.25 The argument in round parentheses behind a derivative $Df$ will denote the point at which the derivative is taken and the argument in square brackets will denote the place where the derivative (remember that it is a linear function) is evaluated. That is, for $f : \Omega \to Y$, $Df(x)[a]$ will denote the derivative of $f$ taken at $x \in \Omega$ and evaluated at $a \in X$.

The function $A[z] := f(x) + Df(x)[z - x]$ is also often called the linear approximation of $f$ at $x$. The name should be clear. The function $A$ is “linear” (affine linear to be precise, but that distinction is not always made) and it approximates the function $f$ (see Figure 42). Exercise 17-25 investigates the quality of the approximation and it also gives a geometric interpretation for functions into the real numbers.

Theorem 17.12 expresses the connection between linear functions $L : \mathbb{R} \to Y$ and vectors and Theorem 17.17 expresses (among other things) the connection between linear functions $L : \mathbb{R} \to \mathbb{R}$ and numbers. These connections are the reason why derivatives of functions defined on intervals $(a, b)$ are usually considered to be numbers or (tangent) vectors. The formal justification is the following.
Proposition 17.26 Let $Y$ be a normed space and let $a < b$ be real numbers. The function $f : (a, b) \to Y$ is differentiable at $x$ in the sense of Definition 17.24 iff the limit $f'(x) := \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$ exists. In this case, we call $f'(x)$ the velocity vector¹ and we have $Df(x)[a] = f'(x)a$ for all $a \in \mathbb{R}$.
Proof. Mimic the proof of Theorem 4.5. (Exercise 17-26.) ∎

Continuous linear functions are the simplest example of differentiable functions that do not follow the pattern of Proposition 17.26.

Proposition 17.27 Let $X, Y$ be normed spaces and let $L : X \to Y$ be continuous and linear. Then $L$ is differentiable at every $x \in X$ with $DL(x)[a] = L[a]$.

¹The name comes from the fact that if $f : (a, b) \to \mathbb{R}^3$ gives the position of a particle at time $t$, then $f'(t)$ is the velocity of the particle.

Proof. Exercise 17-27. ∎

Note that the derivative of a continuous linear function actually is a constant function (whose value at every point is $L[\cdot]$)! This is similar to the derivative of $f(x) = cx$ being $f'(x) = c$. More examples of differentiable functions will be encountered throughout this chapter. For the rest of this section, we will work with derivatives in their full generality. First note that differentiability still implies continuity.
Theorem 17.28 Let $X, Y$ be normed spaces, let $\Omega \subseteq X$ be an open set, and let the function $f : \Omega \to Y$ be differentiable at $x \in \Omega$. Then $f$ is continuous at $x$.

Proof. Exercise 17-28. ∎

We conclude this section with differentiation rules.

Theorem 17.29 Let $X, Y$ be normed spaces, let $\Omega \subseteq X$ be open, let the functions $f, g : \Omega \to Y$ be differentiable at $x \in \Omega$ and let $a \in \mathbb{R}$. Then the sum $f + g$ is differentiable at $x$ with $D(f + g)(x) = Df(x) + Dg(x)$ and the scalar multiple $af$ is differentiable at $x$ with $D(af)(x) = a\,Df(x)$.
Proof. When the derivative is given, differentiability is proved directly with the definition. Consider $af$. For given $\varepsilon > 0$, find $\delta > 0$ so that for all $z \in \Omega$ with $\|z - x\| < \delta$ we have $\| f(z) - f(x) - Df(x)[z - x] \| \le \frac{\varepsilon}{|a| + 1} \|z - x\|$. Then for all $z \in \Omega$ with $\|z - x\| < \delta$ we infer
\[
\| af(z) - af(x) - a\,Df(x)[z - x] \| = |a| \, \| f(z) - f(x) - Df(x)[z - x] \| \le |a| \frac{\varepsilon}{|a| + 1} \|z - x\| \le \varepsilon \|z - x\|,
\]
which means that $af$ is differentiable at $x$ and the derivative is $a\,Df(x)$. The claim about the sum is proved similarly (Exercise 17-29). ∎

The Chain Rule retains its familiar form from calculus, except that the multiplication is replaced by composition. This is natural, because the composition of two linear functions from $\mathbb{R}$ to $\mathbb{R}$ corresponds to the multiplication of the numbers that represent the functions (see Theorem 17.21).
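For maps between finite dimensional spaces, the composition of derivatives becomes a product of the matrices that represent them. The following sketch compares the product $Df(g(p))\,Dg(p)$ with a central-difference approximation of the derivative of $f \circ g$; the maps $f$, $g$, the point $p$ and the helper names are illustrative assumptions, and the derivative matrices are computed by hand.

```python
def g(p):
    """Illustrative inner map R^2 -> R^2."""
    x, y = p
    return [x * y, x + y]


def Dg(p):
    """Matrix of the derivative of g at p, computed by hand."""
    x, y = p
    return [[y, x], [1.0, 1.0]]


def f(q):
    """Illustrative outer map R^2 -> R^1."""
    u, v = q
    return [u * u + 3.0 * v]


def Df(q):
    """Matrix of the derivative of f at q, computed by hand."""
    u, v = q
    return [[2.0 * u, 3.0]]


def mat_mul(B, A):
    """Matrix product BA with entries sum_j b_ij * a_jk."""
    return [[sum(B[i][j] * A[j][k] for j in range(len(A)))
             for k in range(len(A[0]))] for i in range(len(B))]


def numeric_jacobian(F, p, step=1e-6):
    """Central-difference approximation of the derivative matrix of F at p."""
    cols = []
    for j in range(len(p)):
        plus, minus = list(p), list(p)
        plus[j] += step
        minus[j] -= step
        cols.append([(a - b) / (2 * step) for a, b in zip(F(plus), F(minus))])
    return [[cols[j][i] for j in range(len(p))] for i in range(len(cols[0]))]


p = [1.0, 2.0]
chain = mat_mul(Df(g(p)), Dg(p))                   # Df(g(p)) composed with Dg(p)
numeric = numeric_jacobian(lambda t: f(g(t)), p)   # derivative of f o g
print(chain, numeric)
```

Note that the outer derivative is evaluated at $g(p)$, not at $p$, which is the point most easily overlooked when the rule is applied.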
Theorem 17.30 Chain Rule. Let $X, Y, Z$ be normed spaces, let $\Omega_1 \subseteq X$ and $\Omega_2 \subseteq Y$ be open, let $g : \Omega_1 \to \Omega_2$ be differentiable at $x$ and let $f : \Omega_2 \to Z$ be differentiable at $g(x)$. Then $f \circ g$ is differentiable at $x$ with derivative $D(f \circ g)(x) = Df(g(x)) \circ Dg(x)$.

Proof. Let $\varepsilon > 0$. Find $\delta_1 > 0$ so that for all $y \in Y$ with $\|y - g(x)\| < \delta_1$ we have

…

\[
\cdots = \varepsilon \|z - x\|,
\]
which proves the Chain Rule. Similar to the Chain Rule, the rule for the differentiation of inverse functions retains its overall form. The reciprocal in Theorem 4.21 is replaced with the inversion of the derivative. This is once again natural, because if two linear functions from R to R are inverses of each other, then the numbers that represent the functions are reciprocals of each other. Unlike in Theorem 4.21 we must demand that the image of the domain of f is open and that the inverse is continuous. This is because the argument at the beginning of the proof of Theorem 4.21 is not easily translated to the setting of normed spaces. Corollary 17.66 will show that the translation is possible, but it requires the Implicit Function Theorem.
Theorem 17.31 Let $X, Y$ be normed spaces, let $\Omega \subseteq X$, $\widetilde{\Omega} \subseteq Y$ be open and let $f : \Omega \to \widetilde{\Omega}$ be a continuous bijective function with continuous inverse. If $f$ is differentiable at $x_0$ and $Df(x_0)$ is continuous, bijective and linear with continuous inverse, then $f^{-1}$ is differentiable at $y_0 := f(x_0)$ and
\[
D\left( f^{-1} \right)(y_0) = \left( Df\left( f^{-1}(y_0) \right) \right)^{-1}.
\]
Proof. First note that there is a $\delta_1 > 0$ so that for all $x \in X$ with $\|x - x_0\| < \delta_1$ we have
\[
\| f(x) - f(x_0) - Df(x_0)[x - x_0] \| \le \frac{1}{2 \left\| (Df(x_0))^{-1} \right\|} \|x - x_0\|.
\]
(Why is the denominator not zero?) Hence, for all $x \in X$ with $\|x - x_0\|$ …

… $> 0$. Find $\delta_2 \in (0, \delta_1)$ so that for all $x \in X$ with $\|x - x_0\| < \delta_2$ we have

…

\[
\cdots = \varepsilon \|y - y_0\|.
\]
∎

The derivative of the inverse of a function must not be confused with the derivative of the function that maps an invertible linear function to its inverse, which is considered in the following. (Recall that in Exercise 17-24a we have already proved that inversion is a continuous operation. Theorem 17.32 provides another proof of this fact.) Note that for the theorem to make sense, we first must also prove that the linear homeomorphisms (invertible continuous linear functions with continuous inverse) form an open subset of the space $\mathcal{L}(X, Y)$.
Theorem 17.32 Let $X$ be a Banach space, let $Y$ be a normed space and let $\mathcal{H}(X, Y)$ be the set of linear homeomorphisms from $X$ to $Y$. Then $\mathcal{H}(X, Y)$ is an open subset of $\mathcal{L}(X, Y)$ and $J(A) := A^{-1}$ is a differentiable function from $\mathcal{H}(X, Y)$ to $\mathcal{H}(Y, X)$ whose derivative at $A \in \mathcal{H}(X, Y)$ is $DJ(A)[F] = -A^{-1} \circ F \circ A^{-1}$.
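In the finite dimensional case the formula for $DJ(A)[F]$ can be compared with a difference quotient of matrix inverses. The following is a sketch for $2 \times 2$ matrices; the matrices $A$ and $F$, the step size and the helper names are illustrative assumptions.

```python
def mat_mul(B, A):
    """Matrix product BA with entries sum_j b_ij * a_jk."""
    return [[sum(B[i][j] * A[j][k] for j in range(len(A)))
             for k in range(len(A[0]))] for i in range(len(B))]


def inv2(A):
    """Inverse of an invertible 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def shifted(A, F, t):
    """The matrix A + t F."""
    return [[a + t * f for a, f in zip(ra, rf)] for ra, rf in zip(A, F)]


A = [[2.0, 1.0], [1.0, 3.0]]    # illustrative invertible matrix
F = [[0.5, -1.0], [2.0, 0.0]]   # illustrative direction
A_inv = inv2(A)

# Claimed derivative: DJ(A)[F] = -A^{-1} F A^{-1}.
formula = [[-v for v in row] for row in mat_mul(mat_mul(A_inv, F), A_inv)]

# Central difference quotient (J(A + tF) - J(A - tF)) / (2t).
t = 1e-6
plus, minus = inv2(shifted(A, F, t)), inv2(shifted(A, F, -t))
quotient = [[(p - m) / (2 * t) for p, m in zip(rp, rm)]
            for rp, rm in zip(plus, minus)]

error = max(abs(q - v) for rq, rv in zip(quotient, formula)
            for q, v in zip(rq, rv))
print(error)   # small compared with the entries of the formula
```

For $1 \times 1$ matrices the formula reduces to the familiar derivative of the reciprocal, which is the content of Exercise 17-33.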
Proof. For $K \in \mathcal{L}(X, X)$ with $\|K\| < 1$, the series $\sum_{j=0}^\infty (-1)^j K^j$ converges absolutely. Because $X$ is a Banach space, the series converges. Let $I : X \to X$ denote the identity. It can be verified directly that $(I + K)^{-1} = \sum_{j=0}^\infty (-1)^j K^j$. In particular, this means that if $\|K\| < 1$, then $I + K$ is invertible with continuous inverse and its norm is bounded by $\left\| (I + K)^{-1} \right\| \le \frac{1}{1 - \|K\|}$.

Now let $A \in \mathcal{H}(X, Y)$ and consider $F \in \mathcal{L}(X, Y)$ with $\|F\|$ …

… $> 0$ there is a $\delta > 0$ so that for all $z \in \Omega$ with $\|z - x\| < \delta$ we have $\| f(z) - f(x) - A(z - x) \| < \varepsilon \|z - x\|$, where $A(z - x)$ is the matrix product of $A$ with the representation of $z - x$ as a column vector with respect to the standard base.
17-32. Let ( M . C. p ) be a measure space. Define I : L 2 ( M , C, p ) that I is differentiablewith D I ( f ) [ u ]= 17-33. Use Theorem 17.32 toprovethat
d (A) d x .x
lM
-+
R by I(f) :=
2f u d g
=
1 -?.
17-34. Compare the proofs of Theorems 4.10 and 17.30. Decide which is simpler and explain your decision.

17-35. Let $X, Y$ be normed spaces, let $\Omega \subseteq X$ be open, let $f : \Omega \to \mathbb{R}$ be differentiable at $a \in \Omega$, let $y \in Y$ and let $(fy)(x) := f(x)y$ for all $x \in \Omega$. Prove that $D(fy)(a)[\cdot] = Df(a)[\cdot]\,y$.
17-36. Let $X, Y$ be normed spaces, let $\Omega \subseteq X$ be open and let $f : \Omega \to Y$ be continuous at the point $x \in \Omega$. Prove that if there is a linear function $L : X \to Y$ (we do not assume $L$ is continuous) so that for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in \Omega$ with $\|z - x\| < \delta$ we have $\|f(z) - f(x) - L[z - x]\| \le \varepsilon \|z - x\|$, then $f$ is differentiable at $x$.
17-37. Let $(X, \|\cdot\|_X)$, $(Y, \|\cdot\|_Y)$ be normed spaces, let $\Omega \subseteq X$ be open, let $f : \Omega \to Y$, let $x \in \Omega$, let $\|\cdot\|'_X$ be a norm on $X$ that is equivalent to $\|\cdot\|_X$ and let $\|\cdot\|'_Y$ be a norm on $Y$ that is equivalent to $\|\cdot\|_Y$. Prove that $f$ considered as a function from a subset of $(X, \|\cdot\|_X)$ to $(Y, \|\cdot\|_Y)$ is differentiable at $x$ iff $f$ considered as a function from a subset of $(X, \|\cdot\|'_X)$ to $(Y, \|\cdot\|'_Y)$ is differentiable at $x$.

17-38. Let $X$ be a normed space, let $\Omega \subseteq X$ be open and let $f : \Omega \to \mathbb{R}$ be differentiable at $x \in \Omega$. Prove that if there is a $\delta > 0$ so that for all $z \in \Omega$ with $\|z - x\| < \delta$ we have that $f(z) \le f(x)$ (that is, there is a local maximum at $x$), then $Df(x) = 0$.
Hint. Suppose the contrary and use an $a \in X$ so that $Df(x)[a] > 0$.

17-39. Let $X, Y$ be normed spaces, let $\Omega \subseteq X$ be open and for all $n \in \mathbb{N}$ let $f_n : \Omega \to Y$ be differentiable on $\Omega$ with continuous derivative $Df_n : \Omega \to \mathcal{L}(X,Y)$. Prove that if $\{f_n\}_{n=1}^{\infty}$ converges pointwise to a function $f : \Omega \to Y$ and $\{Df_n\}_{n=1}^{\infty}$ converges uniformly to a function $g : \Omega \to \mathcal{L}(X,Y)$, then $f$ is differentiable and $Df = g$.
Hint. First define pointwise and uniform convergence.
17.4 The Mean Value Theorem

The Mean Value Theorem (Theorem 4.18) cannot be translated directly to higher dimensional settings. Exercise 17-40 shows that for functions $f : [a,b] \to X$, where $X$ is a normed space, the derivative (velocity vector) need not be parallel to $f(b) - f(a)$ at any $t \in (a,b)$. However, the Mean Value Theorem is mainly used to bound the difference $f(b) - f(a)$ by the product of $b - a$ with the supremum of the derivative. Such a result can be proved in a more general setting and we will call the general result "Mean Value Theorem," too. A natural idea for the proof is to first use the Fundamental Theorem of Calculus for an appropriately defined integral and to then use the triangular inequality (Exercises 17-41 and 17-42). This approach ultimately requires the continuity of the derivative or a technical integrability condition. Conditions of this kind can be avoided by working with compactness instead.
Theorem 17.33 Mean Value Theorem. Let $X, Y$ be normed spaces, let $\Omega \subseteq X$ be open, let $f : \Omega \to Y$ be differentiable and let $a, b \in \Omega$ be distinct points so that for all $t \in [0,1]$ we have $a + t(b - a) \in \Omega$. Then
$$\|f(b) - f(a)\| \le \sup\{ \|Df(a + t(b - a))\| : t \in [0,1] \} \, \|b - a\|.$$
Proof. Consider $g(t) := f(a + t(b - a))$, defined on an interval $(-\nu, 1 + \nu)$ for some $\nu > 0$. By the Chain Rule, we obtain $g'(t) = \frac{d}{dt} g(t) = Df(a + t(b - a))[b - a]$. Let $\varepsilon > 0$. For every $t \in [0,1]$, there is a $\delta_t > 0$ so that for all $z \in X$ with $\|z\| < \delta_t \|b - a\|$ we have
$$\| f(a + t(b - a) + z) - f(a + t(b - a)) - Df(a + t(b - a))[z] \| \le \frac{\varepsilon}{\|b - a\|} \|z\|.$$
Therefore, with $c := \sup\{ \|Df(a + t(b - a))\| : t \in [0,1] \}$, for all $t, x \in [0,1]$ with $|x - t| < \delta_t$ we infer
$$\begin{aligned}
\|g(x) - g(t)\| &\le \|g(x) - g(t) - g'(t)(x - t)\| + \|g'(t)(x - t)\| \\
&= \| f(a + x(b - a)) - f(a + t(b - a)) - Df(a + t(b - a))[(x - t)(b - a)] \| \\
&\qquad + \| Df(a + t(b - a))[(x - t)(b - a)] \| \\
&\le \frac{\varepsilon}{\|b - a\|} \|(x - t)(b - a)\| + \|Df(a + t(b - a))\| \, \|b - a\| \, |x - t| \\
&\le (c\|b - a\| + \varepsilon)|x - t|.
\end{aligned}$$
Now $\{ (t - \delta_t, t + \delta_t) : t \in [0,1] \}$ is an open cover of the compact set $[0,1]$, which means there is a finite subcover $\{ (t_i - \delta_{t_i}, t_i + \delta_{t_i}) : i = 1, \dots, n \}$. Without loss of generality, we can assume that $t_1 < t_2 < \cdots < t_n$, $t_1 - \delta_{t_1} < 0$, $1 < t_n + \delta_{t_n}$, and $t_{i+1} - \delta_{t_{i+1}} < t_i + \delta_{t_i}$ for $i = 1, \dots, n-1$. Set $x_0 := 0$, $x_n := 1$ and for $i = 1, \dots, n-1$ let $x_i \in (t_{i+1} - \delta_{t_{i+1}}, t_i + \delta_{t_i}) \cap (t_i, t_{i+1})$. Then, using a telescoping sum,
$$\begin{aligned}
\|f(b) - f(a)\| &= \|g(1) - g(0)\| = \left\| \sum_{i=1}^{n} \big( g(x_i) - g(x_{i-1}) \big) \right\| \\
&\le \sum_{i=1}^{n} \|g(x_i) - g(t_i)\| + \|g(t_i) - g(x_{i-1})\| \\
&\le \sum_{i=1}^{n} (c\|b - a\| + \varepsilon)(x_i - t_i) + (c\|b - a\| + \varepsilon)(t_i - x_{i-1}) \\
&= c\|b - a\| + \varepsilon.
\end{aligned}$$
Because $\varepsilon > 0$ was arbitrary, the theorem is proved.

The requirement that there is a straight line connection between the points $a$ and $b$ feels limiting, but it cannot be dropped. In particular, mere connectedness of the domain is not enough. (See Exercise 17-62.)
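To see concretely why only the inequality survives for vector valued functions, the following sketch looks at the unit circle $f(t) := (\cos t, \sin t)$ on $[0, 2\pi]$. This curve is an example chosen here for illustration (it is not taken from the text), the supremum of $\|f'\|$ is only approximated on a sample grid, and the names `f`, `df`, `norm`, `increment`, `sup_speed` and `min_speed` are hypothetical.

```python
import math

# Illustration for the Mean Value Theorem in normed spaces, using the
# unit circle as a hypothetical example curve f : [0, 2*pi] -> R^2.

def f(t):
    return (math.cos(t), math.sin(t))

def df(t):
    """Velocity vector f'(t)."""
    return (-math.sin(t), math.cos(t))

def norm(v):
    """Euclidean norm on R^2."""
    return math.hypot(v[0], v[1])

a, b = 0.0, 2.0 * math.pi
grid = [a + (b - a) * i / 1000 for i in range(1001)]

increment = norm(tuple(p - q for p, q in zip(f(b), f(a))))
sup_speed = max(norm(df(t)) for t in grid)   # sampled approximation of the supremum
min_speed = min(norm(df(t)) for t in grid)

# The inequality ||f(b) - f(a)|| <= sup ||f'|| * (b - a) holds ...
print(increment <= sup_speed * (b - a))
# ... but f(b) - f(a) is (numerically) zero while ||f'(t)|| stays near 1,
# so no c can satisfy f(b) - f(a) = f'(c) * (b - a).
print(increment, min_speed)
```

The curve returns to its starting point, so the left side of the estimate is essentially zero, while the velocity never vanishes. An equality of the form $f(b) - f(a) = f'(c)(b - a)$ is therefore impossible here, yet the norm estimate of Theorem 17.33 is satisfied.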
Exercises

17-40. There is no direct translation of the Mean Value Theorem to vector valued functions. Let the function $f : [0,\pi] \to \mathbb{R}^3$ be defined by $f(t) := \left( \cos(t), \sin(t), t^2(\pi - t) \right)$. Prove that there is no $c \in (0,\pi)$ so that $f(\pi) - f(0) = Df(c)[\pi - 0]$.

17-41. The Riemann integral for Banach space valued functions. Let $X$ be a Banach space and let $f : [a,b] \to X$ be a function.
(a) Define what it means for $f$ to be Riemann integrable.
(b) Prove that if $f$ is continuous, then $f$ is Riemann integrable.
Hint. Prove that for a sufficiently fine partition $P$ any two Riemann sums $R(f, P, T_1)$ and $R(f, P, T_2)$ are close to each other. Then prove that for any refinement $Q$ the Riemann sums for $P$ and for $Q$ are close to each other. Prove that for $P_n$ being the equidistant partition with $n$ intervals, the Riemann sums converge. Then prove that all Riemann sums converge.
(c) Prove the Antiderivative Form of the Fundamental Theorem of Calculus. That is, prove that if $f : [a,b] \to X$ is differentiable and $f'$ is integrable, then $\int_a^b f'(x) \, dx = f(b) - f(a)$.
Hint. Cover $[a,b]$ with intervals $(x - \delta_x, x + \delta_x)$ so that for all $z \in [a,b]$ with $|z - x| < \delta_x$ we have $\|f(z) - f(x) - f'(x)(z - x)\| \le \frac{\varepsilon}{b - a} |z - x|$. Take a finite subcover and construct a partition.
(d) Triangular inequality. Prove that $\left\| \int_a^b f(x) \, dx \right\| \le \int_a^b \|f(x)\| \, dx$ if both integrals exist.
(e) Prove the Derivative Form of the Fundamental Theorem of Calculus. That is, prove that if $f : [a,b] \to X$ is continuous, then $\frac{d}{dt} \int_a^t f(x) \, dx = f(t)$ for all $t \in [a,b]$.
17-42. Use Exercises 17-41c and 17-41d to prove that if $X, Y$ are normed spaces, $\Omega \subseteq X$ is an open subset, $f : \Omega \to Y$ is differentiable, $Df$ is continuous, and $a, b \in \Omega$ are so that for all $t \in [0,1]$ we have $a + t(b - a) \in \Omega$, then $\|f(b) - f(a)\| \le \sup\{ \|Df(a + t(b - a))\| : t \in [0,1] \} \, \|b - a\|$. Then explain why we cannot avoid the continuity hypothesis in this approach.
17-43. Use Exercises 17-41c and 17-41d to prove that if $X, Y$ are normed spaces, $\Omega \subseteq X$ is an open subset, $f : \Omega \to Y$ is differentiable, $Df$ is continuous, and $a, b \in \Omega$ and $\gamma > 0$ are so that for all $t \in [0,1]$ we have $a + t(b - a) \in \Omega$ and $\|Df(a + t(b - a)) - Df(a)\| \le \gamma t \|b - a\|$, then the inequality $\|f(b) - f(a) - Df(a)[b - a]\| \le \frac{\gamma}{2} \|b - a\|^2$ holds.
Hint. Mimic the proof of Lemma 13.13 and use Exercise 17-41c.

17-44. Newton's method in several variables. Let $X$ be a Banach space, let $\Omega \subseteq X$ be open and let $f : \Omega \to X$ be a continuously differentiable function so that $Df(x) \in \mathcal{H}(X,X)$ for all $x \in \Omega$. Prove that if there are $x_0 \in \Omega$ and $\alpha, \beta, \gamma > 0$ so that $\left\| Df(x_0)^{-1}[f(x_0)] \right\| \le \alpha$, so that for all $x \in \Omega$ we have $\left\| Df(x)^{-1} \right\| \le \beta$, so that for all $x, z \in \Omega$ we have $\|Df(z) - Df(x)\| \le \gamma \|z - x\|$, so that $h := \frac{\alpha\beta\gamma}{2} < 1$ and so that with $r := \frac{\alpha}{1 - h}$ we have $B_r(x_0) \subseteq \Omega$, then
(a) Each recursively defined point $x_{n+1} := x_n - Df(x_n)^{-1}[f(x_n)]$ is in $B_r(x_0)$.
(b) The sequence $\{x_n\}_{n=0}^{\infty}$ converges to a point $u \in B_r(x_0)$ with $f(u) = 0$.
(c) For all $n \ge 0$ we have $\|u - x_n\| \le \alpha \frac{h^{2^n - 1}}{1 - h^{2^n}}$.
Hint. Mimic the proof of Theorem 13.14.
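The recursion $x_{n+1} := x_n - Df(x_n)^{-1}[f(x_n)]$ from Exercise 17-44 can be tried out in $\mathbb{R}^2$. The following is a minimal sketch under assumptions: the test function $f(x,y) := (x^2 + y^2 - 4,\ x - y)$ and the starting point $(1,2)$ are chosen here for illustration only, the constants $\alpha, \beta, \gamma$ of the exercise are not verified, and the names `f`, `df`, `solve2` and `newton` are hypothetical.

```python
import math

# Newton's method in R^2 for a hypothetical test function whose zero
# in the first quadrant is (sqrt(2), sqrt(2)).

def f(p):
    x, y = p
    return (x * x + y * y - 4.0, x - y)

def df(p):
    """Jacobian matrix of f at p, stored as a pair of rows."""
    x, y = p
    return ((2.0 * x, 2.0 * y), (1.0, -1.0))

def solve2(m, rhs):
    """Solve the 2x2 linear system m v = rhs by Cramer's rule."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((rhs[0] * d - b * rhs[1]) / det, (a * rhs[1] - rhs[0] * c) / det)

def newton(p, steps):
    """Return the list of iterates x_0, ..., x_steps."""
    history = [p]
    for _ in range(steps):
        step = solve2(df(p), f(p))          # step = Df(p)^{-1}[f(p)]
        p = (p[0] - step[0], p[1] - step[1])
        history.append(p)
    return history

root = (math.sqrt(2.0), math.sqrt(2.0))
for p in newton((1.0, 2.0), 6):
    print(p, math.hypot(p[0] - root[0], p[1] - root[1]))
```

The linear system is solved instead of forming $Df(x_n)^{-1}$ explicitly, which is the usual design choice. The printed distances to the zero should roughly square from one step to the next, in line with the estimate in part (c).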
17.5 How Partial Derivatives Fit In

Partial derivatives are usually defined in the direction of a coordinate axis. Rather than tying our presentation to $\mathbb{R}^d$ and its coordinate axes, we consider a product $\prod_{i=1}^{d} X_i$ of normed spaces.
Proposition 17.34 For $i = 1, \dots, d$, let $(X_i, \|\cdot\|_i)$ be a normed space and let $\|\cdot\|_{\mathbb{R}^d}$ be a norm on $\mathbb{R}^d$. Then $\prod_{i=1}^{d} X_i$ with componentwise addition and scalar multiplication is a vector space and $\|(x_1, \dots, x_d)\| := \big\| \left( \|x_1\|_1, \dots, \|x_d\|_d \right) \big\|_{\mathbb{R}^d}$ defines a norm on it.
Proof. Exercise 17-45.

Definition 17.35 For $i = 1, \dots, d$, let $(X_i, \|\cdot\|_i)$ be a normed space and let $\|\cdot\|_{\mathbb{R}^d}$ be a norm on $\mathbb{R}^d$. The norm $\|(x_1, \dots, x_d)\| := \big\| \left( \|x_1\|_1, \dots, \|x_d\|_d \right) \big\|_{\mathbb{R}^d}$ is called a product norm. From now on, we will assume that any product $\prod_{i=1}^{d} X_i$ of normed spaces is equipped with a product norm and we will call it a product space.

Because all norms on $\mathbb{R}^d$ are equivalent, all product norms are equivalent. Hence, unless otherwise stated, we will use $\max\{ \|x_1\|_1, \dots, \|x_d\|_d \}$ as the product norm. Exercise 17-46 shows that all product norms are equivalent, so we are free to interchange specific norms in $\mathbb{R}^d$ in the definition of the product norm we use. The definition of product norms implies that the product of Banach spaces is again a Banach space and that the natural projections are continuous.
Proposition 17.36 For $i = 1, \dots, d$, let $(X_i, \|\cdot\|_i)$ be a normed space. The product $\prod_{i=1}^{d} X_i$ is a Banach space iff all factor spaces $(X_i, \|\cdot\|_i)$ are Banach spaces.

Proof. Exercise 17-47.
Figure 43: For partial derivatives, we restrict our attention to an appropriately translated subspace and take the derivative in that subspace.
Proposition 17.37 For $i = 1, \dots, d$ let $(X_i, \|\cdot\|_i)$ be a normed space and let $\prod_{i=1}^{d} X_i$ be the product space. Then the natural projections $\pi_j : \prod_{i=1}^{d} X_i \to X_j$ defined by $\pi_j(x_1, \dots, x_d) := x_j$ are continuous.
Proof. Exercise 17-48. Exercise 17-49 now shows that not every norm on a product of vector spaces is a product norm. Proposition 17.38 assures that as we consider partial derivatives with respect to a factor space, the domains of the requisite functions are open.
Proposition 17.38 Let $X = \prod_{i=1}^{d} X_i$ be a product space, let $\Omega \subseteq X$ be open, and let $a = (a_1, \dots, a_d) \in \Omega$. Then for all $j \in \{1, \dots, d\}$ the set $\Omega_j^a := \{ x_j \in X_j : (a_1, \dots, a_{j-1}, x_j, a_{j+1}, \dots, a_d) \in \Omega \}$ is open in $(X_j, \|\cdot\|_j)$.

Proof. Exercise 17-50.
Definition 17.39 Let $X = \prod_{i=1}^{d} X_i$ be a product space, let $Y$ be a normed space, let $\Omega \subseteq X$ be open, let $a = (a_1, \dots, a_d) \in \Omega$, and let $j \in \{1, \dots, d\}$. For $f : \Omega \to Y$, define $f_j^a : \Omega_j^a \to Y$ by $f_j^a(x_j) := f(a_1, \dots, a_{j-1}, x_j, a_{j+1}, \dots, a_d)$, with $\Omega_j^a$ as in Proposition 17.38. If $f_j^a$ is differentiable at $a_j$, then the derivative of $f_j^a$ at $a_j$ is denoted $D_j f(a)$ and it is called the partial derivative of $f$ with respect to $X_j$ at $a$ or the partial derivative of $f$ with respect to the $j$th variable at $a$. For a visualization, consider Figure 43.
It is easy to see that if $f$ is differentiable at $a \in \prod_{i=1}^{d} X_i$, then the restriction of its derivative to $X_j$ is the partial derivative with respect to $X_j$.
Proposition 17.40 Let $X = \prod_{i=1}^{d} X_i$ be a product space, let $Y$ be a normed space, let $\Omega \subseteq X$ be open and let $f : \Omega \to Y$. If $f$ is differentiable at $a \in \Omega$, then for all integers $j \in \{1, \dots, d\}$ the partial derivative at $a$ with respect to $X_j$ exists and is equal to $D_j f(a) = Df(a)\big|_{\{0\} \times \cdots \times \{0\} \times X_j \times \{0\} \times \cdots \times \{0\}}$. Moreover, for all $u \in X$ we have
$$Df(a)[u_1, \dots, u_d] = \sum_{j=1}^{d} D_j f(a)[u_j].$$
Proof. Exercise 17-51. Unfortunately, the existence of partial derivatives is not sufficient for differentiability (in fact, not even for continuity) as Exercises 17-60 and 17-61 show. The pathology exhibited in these exercises is why we build the theory around derivatives as in Definition 17.24 instead of partial derivatives. Nonetheless, it is possible to construct the derivative from partial derivatives, as long as the partial derivatives are continuous.
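The pathology just mentioned is easy to observe numerically. The sketch below uses the function $f(x,y) := \frac{xy}{x^2 + y^2}$ for $(x,y) \ne (0,0)$ with $f(0,0) := 0$, a standard textbook example chosen here for illustration (whether it coincides with the function intended in Exercise 17-60 is an assumption).

```python
# Partial derivatives at the origin exist, yet the function is not
# continuous there. The example function is a standard one and is an
# assumption of this sketch, not a quotation from the text.

def f(x, y):
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

for h in (1e-1, 1e-3, 1e-6):
    dfdx = (f(h, 0.0) - f(0.0, 0.0)) / h   # difference quotient along the x-axis
    dfdy = (f(0.0, h) - f(0.0, 0.0)) / h   # difference quotient along the y-axis
    diagonal_value = f(h, h)               # value on the diagonal x = y
    print(h, dfdx, dfdy, diagonal_value)
```

Both difference quotients vanish for every $h$, so both partial derivatives at the origin are $0$. On the diagonal, however, the function equals $\frac{1}{2}$ no matter how close to the origin it is evaluated, so it cannot be continuous (let alone differentiable) at $(0,0)$.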
Theorem 17.41 Let $X = \prod_{i=1}^{d} X_i$ be a product space, let $Y$ be a normed space, let $\Omega \subseteq X$ be open and let $f : \Omega \to Y$. If for every $j \in \{1, \dots, d\}$ the function $f$ is differentiable with respect to the $j$th variable at every $x \in \Omega$ and the function $x \mapsto D_j f(x)$ is continuous at $a \in \Omega$, then $f$ is differentiable at $a$ and the derivative of $f$ at $a$ is
$$Df(a)[u_1, \dots, u_d] = \sum_{j=1}^{d} D_j f(a)[u_j].$$
Proof. Recall that we said we would use $\|x\| := \max\{ \|x_i\|_i : i = 1, \dots, d \}$ as the product norm. Let $\varepsilon > 0$. Find $\delta > 0$ so that for all $j = 1, \dots, d$ and all $x \in B_\delta(a)$ we have $\|D_j f(x) - D_j f(a)\| < \frac{\varepsilon}{d}$. Then for all $(z_1, \dots, z_d) \in B_\delta(a_1, \dots, a_d)$ we obtain the following.

The above proves that $Df(a)[u_1, \dots, u_d] = \sum_{j=1}^{d} D_j f(a)[u_j]$.
With the general results established, we can now turn to the familiar partial derivatives on $\mathbb{R}^n$. Because $\mathbb{R}^n = \prod_{i=1}^{n} \mathbb{R}$ is the prototypical product space, all results proved so far apply to $\mathbb{R}^n$. Therefore we can concentrate on translating the abstract notions to the more concrete setting of $n$-dimensional space. Partial derivatives in $\mathbb{R}^n$ are typically defined in the direction of a coordinate vector.
Definition 17.42 Let the set $\Omega \subseteq \mathbb{R}^n$ be open, let $f : \Omega \to \mathbb{R}^m$ be a function and let $a \in \Omega$. Then the partial derivative of $f$ with respect to $x_j$ at $a$ is defined to be
$$\frac{\partial f}{\partial x_j}(a) := \lim_{h \to 0} \frac{f(a + h e_j) - f(a)}{h}$$
if this limit exists.

The connection between these partial derivatives and the derivative is an exercise in reinterpreting abstract concepts as matrices.
Theorem 17.43 Let $\Omega \subseteq \mathbb{R}^n$ be an open set, let $f : \Omega \to \mathbb{R}^m$ be differentiable at $a \in \Omega$, for $i = 1, \dots, m$ let $f_i := \pi_i \circ f$ and let $u \in \mathbb{R}^n$ be so that $u = \sum_{j=1}^{n} u_j e_j$. Then
$$Df(a)[u] = \sum_{i=1}^{m} e_i \sum_{j=1}^{n} \frac{\partial f_i}{\partial x_j}(a) \, u_j.$$
That is, the matrix $\left( \frac{\partial f_i}{\partial x_j}(a) \right)_{i = 1, \dots, m;\ j = 1, \dots, n}$ represents $Df(a)$ with respect to the standard bases in $\mathbb{R}^n$ and $\mathbb{R}^m$. This matrix is also called the Jacobian matrix of $f$ at $a$. Conversely, if $f : \Omega \to \mathbb{R}^m$ is so that for all $i = 1, \dots, m$ and $j = 1, \dots, n$ and all $x \in \Omega$ the partial derivative $\frac{\partial f_i}{\partial x_j}(x)$ exists and if all these partial derivatives are continuous at $a$, then $f$ is differentiable at $a$.
Proof. For $j = 1, \dots, n$, let $\mathbb{R}_j := \mathbb{R}$ so that $\mathbb{R}^n = \prod_{j=1}^{n} \mathbb{R}_j$. (In this fashion, we can distinguish the partial derivatives in the coordinate directions.) For $g : \Omega \to \mathbb{R}$ differentiable at $a$, we infer by Proposition 17.40 that $Dg(a)[u] = \sum_{j=1}^{n} D_{\mathbb{R}_j} g(a)[u_j]$. By Proposition 17.26 for all $j = 1, \dots, n$ we obtain $D_{\mathbb{R}_j} g(a)[u_j] = \frac{\partial g}{\partial x_j}(a) \, u_j$, hence $Dg(a)[u] = \sum_{j=1}^{n} \frac{\partial g}{\partial x_j}(a) \, u_j$.
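Now for $f : \Omega \to \mathbb{R}^m$ by Exercise 17-35 we conclude
$$Df(a)[u] = \sum_{i=1}^{m} D(f_i e_i)(a)[u] = \sum_{i=1}^{m} e_i \, Df_i(a)[u] = \sum_{i=1}^{m} e_i \sum_{j=1}^{n} \frac{\partial f_i}{\partial x_j}(a) \, u_j.$$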
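The last statement follows from Theorem 17.41.

For $m = 1$, the Jacobian matrix of a differentiable function $f : \mathbb{R}^n \to \mathbb{R}$ is given special attention because of its physical meaning.

Definition 17.44 Let $\Omega \subseteq \mathbb{R}^n$ be open and let $f : \Omega \to \mathbb{R}$ be so that all partial derivatives of $f$ exist at $a \in \Omega$. Then we define the gradient of $f$ at $a$ to be
$$\operatorname{grad}(f)(a) := \nabla f(a) := \begin{pmatrix} \frac{\partial f}{\partial x_1}(a) \\ \vdots \\ \frac{\partial f}{\partial x_n}(a) \end{pmatrix}.$$
With $\nabla := \begin{pmatrix} \frac{\partial}{\partial x_1} \\ \vdots \\ \frac{\partial}{\partial x_n} \end{pmatrix}$ this looks like a formal multiplication of $f$ with the "vector" $\nabla$. The (purely formal) "vector" $\nabla$ is also called the nabla operator.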
The physical meaning of the gradient is easily explained with what we have derived so far. Let $f : \Omega \to \mathbb{R}$ be differentiable at $a$. Then by Theorem 17.43 for all $u \in \mathbb{R}^n$ we obtain $Df(a)[u] = \langle \nabla f(a), u \rangle$. Moreover, the derivative describes the local behavior of $f$ near $a$. In particular, for any vector $u$ with $\|u\| = 1$ we infer $\lim_{t \to 0} \frac{f(a + tu) - f(a)}{t} = \langle \nabla f(a), u \rangle$ (Exercise 17-52). That is, the inner product $\langle \nabla f(a), u \rangle$ gives the derivative of $f$ in the direction $u$. This inner product is largest when $u = \frac{\nabla f(a)}{\|\nabla f(a)\|}$, so the gradient vector gives the direction of steepest ascent with its norm being the slope in that direction. Similarly, $-\nabla f(a)$ gives the direction of steepest descent with the negative of its norm being the slope in that direction. Physical systems usually strive for equilibrium. Whenever there is an imbalance in a physical quantity described by $f$ in a homogeneous medium, there will be a flow in the direction of $-\nabla f$ that tries to restore equilibrium. For more on the physical ideas, consider Section 21.2.
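The two claims about the gradient (it computes directional derivatives, and it points in the direction of steepest ascent) can be checked numerically. The sketch below assumes the illustrative function $f(x,y) := x^2 + 3xy$ and the point $a = (1,2)$, neither of which comes from the text. Difference quotients with a small step stand in for the limit, and unit directions are sampled in steps of one degree.

```python
import math

# Directional derivatives versus the gradient for a hypothetical
# example function f(x, y) = x^2 + 3xy at the point a = (1, 2).

def f(x, y):
    return x * x + 3.0 * x * y

def grad_f(x, y):
    """Gradient of f, computed by hand: (2x + 3y, 3x)."""
    return (2.0 * x + 3.0 * y, 3.0 * x)

def directional_quotient(a, u, t):
    """Difference quotient (f(a + t u) - f(a)) / t."""
    return (f(a[0] + t * u[0], a[1] + t * u[1]) - f(a[0], a[1])) / t

a = (1.0, 2.0)
g = grad_f(a[0], a[1])
g_norm = math.hypot(g[0], g[1])

# Claim 1: the quotient approximates the inner product <grad f(a), u>.
u = (0.6, 0.8)
print(directional_quotient(a, u, 1e-6), g[0] * u[0] + g[1] * u[1])

# Claim 2: among sampled unit vectors, the largest slope is close to
# ||grad f(a)|| and is attained near the direction grad f(a) / ||grad f(a)||.
angles = [2.0 * math.pi * k / 360 for k in range(360)]
best_slope, best_angle = max(
    (directional_quotient(a, (math.cos(s), math.sin(s)), 1e-6), s) for s in angles
)
print(best_slope, g_norm)
print((math.cos(best_angle), math.sin(best_angle)), (g[0] / g_norm, g[1] / g_norm))
```

Because only finitely many directions are sampled, the maximal slope found this way is slightly below $\|\nabla f(a)\|$. Refining the angle grid would close the gap.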
Exercises

17-45. Prove Proposition 17.34 as follows.
(a) For $i = 1, \dots, d$, let $X_i$ be a vector space. Prove that $\prod_{i=1}^{d} X_i$ with componentwise addition and scalar multiplication is a vector space.
(b) For $i = 1, \dots, d$, let $(X_i, \|\cdot\|_i)$ be a normed space and let $\|\cdot\|_{\mathbb{R}^d}$ be a norm on $\mathbb{R}^d$. Prove that $\|(x_1, \dots, x_d)\| := \big\| \left( \|x_1\|_1, \dots, \|x_d\|_d \right) \big\|_{\mathbb{R}^d}$ defines a norm on $\prod_{i=1}^{d} X_i$.
17-46. Prove that any two product norms are equivalent.
Note. This and Exercise 17-37 justify the free interchange of one product norm for another.

17-47. Prove Proposition 17.36.

17-48. Prove Proposition 17.37.

17-49. Not every norm on a product is a product norm.
(a) Let $X_1, \dots, X_d$ be vector spaces and let $\|\cdot\|_X$ be a norm on $\prod_{j=1}^{d} X_j$.
i. Prove that for all $j = 1, \dots, d$ the function $\|x_j\|_j := \|(0, \dots, 0, x_j, 0, \dots, 0)\|_X$ is a norm on $X_j$.
ii. Prove that if for each $i \in \{1, \dots, d\}$ we pick a fixed $x_i \in X_i \setminus \{0\}$, then the function $\|(\alpha_1, \dots, \alpha_d)\| := \|(\alpha_1 x_1, \dots, \alpha_d x_d)\|_X$ defines a norm on $\mathbb{R}^d$.
(b) Let $(X, \|\cdot\|)$ be a normed space, let $S \subseteq X$ be a dense subspace (like the simple functions in $L^p$, see Theorem 16.85) and let $F$ be a subspace so that $S \cap F = \{0\}$ (with $S$ being the set of simple functions in $L^p$, $F$ could be the set of scalar multiples of a function $f \notin S$). Define $Y := S \times F$ and let $\|(s, f)\|_{np} := \|s + f\|$.
i. Prove that $\|\cdot\|_{np}$ is a norm on $S \times F$.
ii. Prove that the natural projection $\pi_S : S \times F \to S$ is not continuous.
Hint. Let $s_n \to f$ and consider $s_n - f$.
iii. Conclude that $\|\cdot\|_{np}$ is not a product norm on $S \times F$.

17-50. Prove Proposition 17.38.

17-51. Prove Proposition 17.40.
17-52. Let $\Omega \subseteq \mathbb{R}^d$ be open and let $f : \Omega \to \mathbb{R}$ be differentiable at $a \in \Omega$. Prove that for any vector $u$ with $\|u\| = 1$ we have $\lim_{t \to 0} \frac{f(a + tu) - f(a)}{t} = \langle \nabla f(a), u \rangle$.
17-53. Let $X = \prod_{i=1}^{d} X_i$ be a product space, let $Y$ be a normed space, let $\Omega \subseteq X$ be open and let $f : \Omega \to Y$ be a function. Prove that if $f$ is differentiable on $\Omega$ and $Df$ is continuous, then all partial derivatives $D_j f$ exist for all $a \in \Omega$ and the functions $D_j f$ are continuous on $\Omega$.

17-54. Let $X$ be a vector space and let $E, F \subseteq X$ be vector subspaces of $X$. Then $X$ is called the direct sum of $E$ and $F$, denoted $E \oplus F$, iff $E \cap F = \{0\}$ and for all $x \in X$ there are $e \in E$ and $f \in F$ so that $x = e + f$. Prove that if $X$ is the direct sum of $E$ and $F$, then $X = E \oplus F$ is isomorphic to $E \times F$.

17-55. Prove the Multivariable Chain Rule. That is, prove that if $f(x_1, \dots, x_n) : \mathbb{R}^n \to \mathbb{R}$ is a differentiable function of $n$ variables and the components are differentiable functions $x_j(t_1, \dots, t_m)$ of $m$ variables, then
$$\frac{\partial f}{\partial t_i} = \sum_{j=1}^{n} \frac{\partial f}{\partial x_j} \frac{\partial x_j}{\partial t_i} = \frac{\partial f}{\partial x_1} \frac{\partial x_1}{\partial t_i} + \frac{\partial f}{\partial x_2} \frac{\partial x_2}{\partial t_i} + \cdots + \frac{\partial f}{\partial x_n} \frac{\partial x_n}{\partial t_i}.$$
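17-56. Coordinate transformations for differential operators. Let $(x, y)$ be rectangular coordinates on $\mathbb{R}^2$, let $r = \sqrt{x^2 + y^2}$ and let $\theta := \begin{cases} \arctan\left(\frac{y}{x}\right); & \text{if } x > 0, \\ \arctan\left(\frac{y}{x}\right) + \pi; & \text{if } x < 0, \end{cases}$ be polar coordinates on $\mathbb{R}^2$.
(a) Prove that if $f : \mathbb{R}^2 \to \mathbb{R}$ is differentiable, then $\frac{\partial f}{\partial x} = \frac{\partial f}{\partial r} \cos(\theta) - \frac{\partial f}{\partial \theta} \frac{\sin(\theta)}{r}$.
Hint. Use the Chain Rule as stated in Exercise 17-55.
(b) Prove that if $f : \mathbb{R}^2 \to \mathbb{R}$ is differentiable and $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial r}$ and $\frac{\partial f}{\partial \theta}$ are differentiable, then
$$\frac{\partial^2 f}{\partial x^2} = \frac{\partial^2 f}{\partial r^2} \cos^2(\theta) - 2 \frac{\partial^2 f}{\partial r \partial \theta} \frac{\sin(\theta)\cos(\theta)}{r} + 2 \frac{\partial f}{\partial \theta} \frac{\sin(\theta)\cos(\theta)}{r^2} + \frac{\partial f}{\partial r} \frac{\sin^2(\theta)}{r} + \frac{\partial^2 f}{\partial \theta^2} \frac{\sin^2(\theta)}{r^2}.$$
The second partial derivatives simply denote partial derivatives of partial derivatives.
(c) Let $f$ be a function whose partial derivatives $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$, $\frac{\partial f}{\partial r}$ and $\frac{\partial f}{\partial \theta}$ are differentiable. Derive an expression for $\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}$ in polar coordinates.
Hint. Derive a formula similar to that in part 17-56b for $\frac{\partial^2 f}{\partial y^2}$ and add the two.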
17-57. Let $W$ be a normed space, let $\prod_{i=1}^{d} X_i$ be a product space, let $\Omega \subseteq W$ be open and for $i = 1, \dots, d$ let $f_i : \Omega \to X_i$ be differentiable at $x \in \Omega$. Prove that $f := (f_1, \dots, f_d)$ is differentiable at $x$ with $Df(x)[\cdot] = \left( Df_1(x)[\cdot], \dots, Df_d(x)[\cdot] \right)$.
17-58. Leibniz' Rule. Let $a, b : (c,d) \to (l,u)$ be differentiable and let $g : (l,u) \times (c,d) \to \mathbb{R}$ be differentiable with respect to the second variable. Let $F \in L^1(l,u)$ be so that all $g(\cdot, t)$, $\frac{g(\cdot, t+h) - g(\cdot, t)}{h}$ and $\frac{\partial}{\partial t} g(\cdot, t)$ are bounded by $F(\cdot)$ and let all $g(\cdot, t)$ be continuous. Use the steps outlined below to prove that
$$\frac{d}{dt} \int_{a(t)}^{b(t)} g(x,t) \, dx = g(b(t), t) \, b'(t) - g(a(t), t) \, a'(t) + \int_{a(t)}^{b(t)} \frac{\partial}{\partial t} g(x,t) \, dx.$$
(a) Prove that $u : (c,d) \to (l,u) \times (l,u) \times L^1(l,u)$ defined by $u(t) := \left( a(t), b(t), g(\cdot, t) \right)$ is differentiable.
Hint. Use the result of Exercise 17-57. Use Proposition 17.26 and the Dominated Convergence Theorem for the differentiability of the third component.
(b) Prove that $s : (l,u) \times (l,u) \times C(l,u) \to \mathbb{R}$ defined by $s(a, b, h) := \int_a^b h(x) \, dx$ is differentiable, where $C(l,u)$ is a normed subspace of $L^1(l,u)$.
Hint. Theorem 17.41. Use the linearity of the integral operator $h \mapsto \int_a^b h(x) \, dx$ on $L^1(l,u)$ for the partial derivative with respect to the third component.
(c) Prove Leibniz' Rule using the Chain Rule.

17-59. Let $X$ be a normed space, let $D \subseteq X$ be a dense subspace, let $Y$ be a Banach space, let $\Omega \subseteq D$ be open in $D$ and let $f : \Omega \to Y$ be continuously differentiable with bounded uniformly continuous derivative. Prove that there is an open set $\tilde{\Omega} \subseteq X$ and a unique continuously differentiable function $e : \tilde{\Omega} \to Y$ so that $\tilde{\Omega} \cap D = \Omega$ and so that $e|_{\Omega} = f$.
Hint. Use the Mean Value Theorem (Theorem 17.33).
17-60. Consider the function $f(x,y) = \begin{cases} \frac{xy}{x^2 + y^2}; & \text{for } (x,y) \ne (0,0), \\ 0; & \text{for } (x,y) = (0,0). \end{cases}$
(a) Prove that $\frac{\partial f}{\partial x}(0,0) = \frac{\partial f}{\partial y}(0,0) = 0$.
(b) Prove that $f$ is not continuous at $(0,0)$.

17-61. Consider the function $f(x,y) = \begin{cases} \frac{x^2 y}{x^2 + y^2}; & \text{for } (x,y) \ne (0,0), \\ 0; & \text{for } (x,y) = (0,0). \end{cases}$
(a) Prove that $f$ is continuous at $(0,0)$.
(b) Let $X = \operatorname{span}(u) \subset \mathbb{R}^2$ be an arbitrary one dimensional linear subspace of $\mathbb{R}^2$ with $\|u\| = 1$. Prove that $D_X f(0,0) := \lim_{t \to 0} \frac{f(tu) - f(0,0)}{t}$ exists.
(c) Prove that f is not differentiable at (0,0) Hinf. If f was differentiable at (0, 0), what would the derivatives in part 17-61b be equal to? Use Theorem 17.43. 17-62. An unbounded function with bounded derivative and bounded, connected two-dimensional TI
( 8 , A r ) E R2 : 8 > -, A r
f ( Q ,A r ) :=
((
4
+ A r ) cos(8),
(f+
and define f : A + R2 by
E
A r ) sin(R)).
(a) Prove that S := f [ A ]is open, connected and contained in B2(0. 0). (b) Sketch S and state the geometric meaning of 8 and A r . (c) Prove that f is injective. (d) For (x, y ) E S, define 8 ( x , y ) to be the first component of f-'(x, y ) . Prove that the function (x,y ) F+- B(x, y) is differentiable at every point of S by showing that on every open disk contained in S it differs from arctan
(e) For (x, y ) E S , define r ( x , y ) := J x 2
(0For (x.y)
+ y 2 and prove that r is differentiable on S .
E S define B ( x . y ) := In ( 8 ( x ,y )
). Prove that B
is differentiable with bounded
derivative. Hint. Prove that the absolute values of the partial derivatives of 8 are equal to one of the 1x1 I cos(Q(x, Y))I or I ~ I - I s i w x , Y))I and use that the radius expressions r ( x ,Y ) r(x,Y ) 1 (g) Prove that
lim
( x . 41+ (0.0)
B ( x , y ) = 00.
17.6 Multilinear Functions (Tensors)

If $f : \Omega \to Y$ is differentiable at every $x \in \Omega$, then we can also try to differentiate the derivative $Df : \Omega \to \mathcal{L}(X,Y)$. If the thus computed second derivative exists, it would be a function that maps points $x \in \Omega$ to linear functions $D^2 f(x)[\cdot] \in \mathcal{L}(X, \mathcal{L}(X,Y))$ and these linear functions map points $u \in X$ to linear functions $D^2 f(x)[u][\cdot]$. It turns out that such functions are linear in both square bracketed arguments (see Proposition 17.53 below). To simplify notation, higher derivatives are usually identified with functions that have several arguments and are linear in each one of them.
Definition 17.45 Let $X = \prod_{i=1}^{k} X_i$ be a product space and let $Y$ be a normed space. Then $T : \prod_{i=1}^{k} X_i \to Y$ is called multilinear or $k$-linear or a $k$-tensor iff for all indices $j \in \{1, \dots, k\}$, all $x, y \in X_j$ and all $\alpha, \beta \in \mathbb{R}$ we have
$$T[x_1, \dots, x_{j-1}, \alpha x + \beta y, x_{j+1}, \dots, x_k] = \alpha T[x_1, \dots, x_{j-1}, x, x_{j+1}, \dots, x_k] + \beta T[x_1, \dots, x_{j-1}, y, x_{j+1}, \dots, x_k].$$
2-linear functions are also called bilinear.

As for linear functions, we enclose the argument of multilinear functions in square brackets instead of round parentheses. This is because as multilinear functions are identified with higher derivatives we will evaluate higher derivatives at a point $x$ (enclosed in round parentheses) for an argument $[t_1, \dots, t_m]$, which will be distinguished by being enclosed in square brackets.
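Example 17.46 Examples of bilinear functions.

1. The function $m : \mathbb{R}^2 \to \mathbb{R}$ defined by $m[x,y] := xy$ is bilinear.
2. Let $X$ be a real vector space and let $\langle \cdot, \cdot \rangle$ be an inner product on $X$. Then $\langle \cdot, \cdot \rangle$ is bilinear. An inner product on a complex vector space $X$ is not bilinear, because for $x, y \in X$ and $\alpha \in \mathbb{C}$ we have $\langle x, \alpha y \rangle = \overline{\alpha} \langle x, y \rangle$. Complex inner products are also called sesquilinear.
3. The cross product $\begin{pmatrix} x_1 \\ y_1 \\ z_1 \end{pmatrix} \times \begin{pmatrix} x_2 \\ y_2 \\ z_2 \end{pmatrix} := \begin{pmatrix} y_1 z_2 - z_1 y_2 \\ z_1 x_2 - x_1 z_2 \\ x_1 y_2 - y_1 x_2 \end{pmatrix}$ is a bilinear function from $\mathbb{R}^3 \times \mathbb{R}^3$ to $\mathbb{R}^3$.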
Continuity of multilinear functions is characterized similarly to continuity of linear functions.

Theorem 17.47 Let $X = \prod_{i=1}^{k} X_i$ be a product space, let $Y$ be a normed space and let $T : \prod_{i=1}^{k} X_i \to Y$ be $k$-linear. Then the following are equivalent:

1. $T$ is continuous on $X$.
2. $T$ is continuous at $0 \in X$.
3. $T$ is bounded on $\overline{B_1(0)} \subset X$.
4. There is a $c \in \mathbb{R}$ so that for all elements $(x_1, \dots, x_k) \in X = \prod_{i=1}^{k} X_i$ we have that $\|T[x_1, \dots, x_k]\| \le c \|x_1\| \cdots \|x_k\|$.
Proof. Mimic the proof of Theorem 17.4 (Exercise 17-63).
Corollary 17.48 Let $X = \prod_{i=1}^{k} X_i$ be a finite dimensional product space, let $Y$ be a normed space and let $T : \prod_{i=1}^{k} X_i \to Y$ be $k$-linear. Then $T$ is continuous.
Proof. Mimic the proof of Corollary 17.6 (Exercise 17-64). Similar to continuous linear functions, continuous multilinear functions form a normed space.
Definition 17.49 Let $X_1, \dots, X_k, Y$ be normed spaces. Define $\mathcal{T}^k(X_1, \dots, X_k, Y)$ to be the set of all continuous $k$-linear functions from the product space $\prod_{i=1}^{k} X_i$ to $Y$. If $X_1 = \cdots = X_k = X$, we also write $\mathcal{T}^k(X, Y)$ instead of $\mathcal{T}^k(X_1, \dots, X_k, Y)$.
Theorem 17.50 Let $X_1, \dots, X_k, Y$ be normed spaces. Then, with pointwise addition and scalar multiplication, $\mathcal{T}^k(X_1, \dots, X_k, Y)$ is a vector space and the function
$$\|T\| := \sup\{ \|T[x_1, \dots, x_k]\| : \|x_i\| \le 1 \text{ for } i = 1, \dots, k \}$$
is a norm on $\mathcal{T}^k(X_1, \dots, X_k, Y)$ so that $\|T(x_1, \dots, x_k)\| \le \|T\| \, \|x_1\| \cdots \|x_k\|$ for all $(x_1, \dots, x_k) \in \prod_{i=1}^{k} X_i$. Moreover, if $Y$ is a Banach space, then so is $\mathcal{T}^k(X_1, \dots, X_k, Y)$.
Proof. Mimic the proofs of Theorems 17.8 and 17.11 (Exercise 17-65).

Definition 17.51 Similar to the operator norm of a continuous linear function, we will call the norm from Theorem 17.50 the tensor norm of the continuous $k$-tensor $T$.

Continuing with similarities to continuous linear functions, multilinear functions are differentiable iff they are continuous.
Theorem 17.52 Let $X_1, \dots, X_k$ and $Y$ be normed spaces and consider the function $T \in \mathcal{T}^k(X_1, \dots, X_k, Y)$. Then $T$ is differentiable and for each $(x_1, \dots, x_k) \in \prod_{i=1}^{k} X_i$ the derivative is
$$DT(x_1, \dots, x_k)[u_1, \dots, u_k] = \sum_{j=1}^{k} T[x_1, \dots, x_{j-1}, u_j, x_{j+1}, \dots, x_k].$$
Proof. The case $k = 1$ is Proposition 17.27. Hence, we will assume $k \ge 2$ throughout. We use a telescoping sum that is similar to the one in the proof of Theorem 17.41. Let $(x_1, \dots, x_k) \in \prod_{i=1}^{k} X_i$. Then for all elements $(z_1, \dots, z_k) \in \prod_{i=1}^{k} X_i$ so that $\|(z_1, \dots, z_k)\| \le 2\|(x_1, \dots, x_k)\| + 1 =: M$ we obtain the following.
i=l
j=l
$=: C$

Now for any $\varepsilon > 0$ we can choose $\delta := \frac{\varepsilon}{C + 1}$ to make the difference smaller than $\varepsilon \|z - x\|$. Hence, $T$ is differentiable with the indicated derivative.
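For a bilinear map the content of Theorem 17.52 can be checked by direct computation: the remainder $T[x+u, y+v] - T[x,y] - DT(x,y)[u,v]$ is exactly $T[u,v]$, which is quadratic in the increment. The following sketch does this for the cross product on $\mathbb{R}^3$ (Example 17.46). The specific vectors and the helper names `cross`, `add`, `sub` and `derivative_of_cross` are illustrative assumptions.

```python
# Derivative of the bilinear cross product T[x, y] = x × y on R^3:
# DT(x, y)[u, v] = T[u, y] + T[x, v]. Helper names are hypothetical.

def cross(x, y):
    return (x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0])

def add(x, y):
    return tuple(p + q for p, q in zip(x, y))

def sub(x, y):
    return tuple(p - q for p, q in zip(x, y))

def derivative_of_cross(x, y, u, v):
    """The sum from Theorem 17.52 for k = 2: T[u, y] + T[x, v]."""
    return add(cross(u, y), cross(x, v))

# All entries are dyadic rationals, so the floating point arithmetic is exact.
x, y = (1.0, 2.0, 3.0), (-1.0, 0.5, 2.0)
u, v = (0.25, -0.5, 1.0), (2.0, 0.0, -0.75)

remainder = sub(sub(cross(add(x, u), add(y, v)), cross(x, y)),
                derivative_of_cross(x, y, u, v))
print(remainder)
print(cross(u, v))
```

Both printed vectors agree. Since $\|u \times v\| \le \|u\| \, \|v\|$, the remainder is bounded by the square of the norm of the increment $(u,v)$, which is the kind of estimate the proof above needs.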
In particular, Theorem 17.52 says that the derivative of a k-tensor is a sum of ( k - 1)-tensors. We conclude this section with the result that allows us to identify higher derivatives with continuous k-linear functions.
Proposition 17.53 The spaces $\mathcal{L}(X, \mathcal{T}^k(X,Y))$ and $\mathcal{T}^{k+1}(X,Y)$ are isomorphic via the map that sends the function $D : X \to \mathcal{T}^k(X,Y)$ in $\mathcal{L}(X, \mathcal{T}^k(X,Y))$ to the multilinear function in $\mathcal{T}^{k+1}(X,Y)$ that maps $(u_0, u_1, \dots, u_k)$ to $D[u_0][u_1, \dots, u_k]$.
Proof. Exercise 17-66. Starting with Exercise 17-69 below the exercises emphasize an important idea that was already used in Exercise 17-58. If we can write a complicated function as the appropriate composition of simpler functions, taking the derivative becomes a comparatively easy task of combining the Chain Rule and Exercise 17-57. This is one of the advantages of the coordinate free approach to differentiation.
Exercises

17-63. Prove Theorem 17.47.

17-64. Prove Corollary 17.48.

17-65. Prove Theorem 17.50.

17-66. Prove Proposition 17.53.

17-67. More examples of $k$-linear maps.
(a) Prove that $m : \mathbb{R}^k \to \mathbb{R}$ defined by $m[x_1, \dots, x_k] := x_1 \cdots x_k$ is continuous and $k$-linear.
(b) Let $(M, \Sigma, \mu)$ be a measure space and let $1 \le p, q \le \infty$ with $\frac{1}{p} + \frac{1}{q} = 1$. Prove that $I : L^p(M, \Sigma, \mu) \times L^q(M, \Sigma, \mu) \to \mathbb{R}$ defined by $I[f,g] := \int_M fg \, d\mu$ is a continuous bilinear map.
(c) Let $X, Y, Z$ be normed spaces and let $\circ : \mathcal{L}(Y,Z) \times \mathcal{L}(X,Y) \to \mathcal{L}(X,Z)$ be defined by $\circ[L, M] := L \circ M$. Prove that $\circ$ is a continuous bilinear map.

17-68. Prove that for every 2-tensor $T : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ there is a $d \times d$-matrix $A$ so that $T[v,w] = v^T A w$, where $v, w$ are column vectors with respect to the standard base and $v^T$ is the transpose of $v$, that is, a row vector.

17-69. Prove each of the following as a consequence of Theorem 17.52 and Exercise 17-57.
(a) Let $f, g : (a,b) \to \mathbb{R}$ be differentiable. Prove that $(f \cdot g)' = f'g + fg'$.
(b) Let $Y$ be an inner product space and let $f, g : (a,b) \to Y$ be differentiable. Prove that $\langle f, g \rangle' = \langle f', g \rangle + \langle f, g' \rangle$.
(c) Let $\Omega \subseteq \mathbb{R}^3$ be open, let $f, g : \Omega \times \Omega \to \mathbb{R}^3$ be differentiable and let $\times$ denote the cross product on $\mathbb{R}^3$. Prove that $(f \times g)' = f' \times g + f \times g'$.
(d) Let $W, X, Y, Z$ be normed spaces, let $\Omega \subseteq W$ be open and let $L : \Omega \to \mathcal{L}(Y,Z)$ and $M : \Omega \to \mathcal{L}(X,Y)$ be differentiable. Prove that $D(L \circ M) = DL \circ M + L \circ DM$.
Hint. Exercise 17-67c.
(e) Explain why all the above product rules "look the same."

17-70. Derive a product rule for products of $k$ functions $f_1, \dots, f_k : (a,b) \to \mathbb{R}$.

17-71. Let $GL(n \times n, \mathbb{R})$ be the set of invertible $n \times n$ matrices and let $S : GL(n \times n, \mathbb{R}) \times \mathbb{R}^n \to \mathbb{R}^n$ be the function that maps the pair $(A, b)$ of an invertible matrix $A$ and a "right hand side vector" $b$ to the solution $x$ of the system of equations $Ax = b$.
(a) Prove that $S$ is differentiable.
Hint. Theorem 17.32.
(b) Compute the derivative of $S$ at an arbitrary $(A, b)$.
17.7 Higher Derivatives

Now we are ready to investigate higher order derivatives. The underlying definition is the obvious one.
Definition 17.54 Let X , Y be normed spaces and let R g X be open. The function f : R + Y is called k times differentiableat x iff f is (k - 1) times differentiable on R and its ( k - l)Sfderivative Dk-’ f is differentiable at x. The kth derivative off at x is denoted Dkf (x). Thefunction f is called k times differentiable on R iy f is k times dixerentiable at every x E R.It is called k times continuously differentiable on 52 i f f it is k times differentiable on C2 and Dk f is continuous on R.Finally, the function f is called infinitely differentiable on irfor all k E N f is k times differentiable on R.
17. Differentiation in Normed Spaces
Appropriate application of Proposition 17.53 allows us to identify kth derivatives with k-tensors. This identification is common in analysis and we will use it throughout this text. That is, Dk f (x) will denote the k-tensor that is associated with the kth derivative of f at x. The next result shows how higher derivatives of higher derivatives are related.
Proposition 17.55 Let $X, Y$ be normed spaces, let $m, n \in \mathbb{N}$, let $\Omega \subseteq X$ be open, let $f : \Omega \to Y$, and let $x \in \Omega$. Then $f$ is $m+n$ times differentiable at $x$ iff $f$ is $m$ times differentiable on $\Omega$ and $D^m f$ is $n$ times differentiable at $x \in \Omega$. Moreover, $D^{m+n} f(x) = D^n(D^m f)(x)$ and for $(t_1, \dots, t_{m+n}) \in X^{m+n}$ we have the identity
$$D^{m+n} f(x)[t_1, \dots, t_{m+n}] = D^n(D^m f)(x)[t_1, \dots, t_n][t_{n+1}, \dots, t_{m+n}].$$

Proof. Let $m \in \mathbb{N}$ be arbitrary. The proof is an induction on $n$, with the definition being the base case $n = 1$. For the representations of the tensors, note that for $(t_1, \dots, t_{m+1}) \in X^{m+1}$ the derivative $D(D^m f)(x)[t_1][t_2, \dots, t_{m+1}]$ is by Proposition 17.53 identified with the value $D^{m+1} f(x)[t_1, \dots, t_{m+1}]$, where $D^{m+1} f(x)$ is the corresponding $(m+1)$-linear map.

For the induction step, note that $f$ is $m+n+1$ times differentiable at $x$ iff $f$ is $m+1$ times differentiable on $\Omega$ and $D^{m+1} f$ is $n$ times differentiable at $x \in \Omega$. This is the case iff $f$ is $m$ times differentiable on $\Omega$, $D^m f$ is differentiable on $\Omega$ and $D^{m+1} f$ is $n$ times differentiable at $x \in \Omega$, which by induction hypothesis is the case iff $f$ is $m$ times differentiable on $\Omega$ and $D^m f$ is $n+1$ times differentiable at $x \in \Omega$. For the representations of the tensors note that for all $(t_1, \dots, t_{m+n+1}) \in X^{m+n+1}$ the following hold.
$$\begin{aligned}
D^{n+1}(D^m f)(x)[t_1, \dots, t_{n+1}][t_{n+2}, \dots, t_{m+n+1}]
&= D\bigl(D^n(D^m f)\bigr)(x)[t_1][t_2, \dots, t_{n+1}][t_{n+2}, \dots, t_{m+n+1}] \\
&= D\bigl(D^{n+m} f\bigr)(x)[t_1][t_2, \dots, t_{m+n+1}] \\
&= D^{n+m+1} f(x)[t_1, \dots, t_{m+n+1}].
\end{aligned}$$
∎
For $k$th derivatives (or, more accurately, for the tensors associated with them) the order of the arguments does not matter. The key to this insight is to prove the result for second derivatives.
Theorem 17.56 Hermann Amandus Schwarz' Theorem. Let $X$ and $Y$ be normed spaces, let $\Omega \subseteq X$ be open, and let $f : \Omega \to Y$ be twice differentiable at $x \in \Omega$. Then for all $(s,t) \in X^2$ we have $D^2 f(x)[s,t] = D^2 f(x)[t,s]$.

Proof. The main idea is that the sum $f(x+s+t) - f(x+t) - f(x+s) + f(x)$ should be close to $D^2 f(x)[s,t]$ and $D^2 f(x)[t,s]$. To understand where this expression comes from, recall that for single variable functions the difference quotient $\frac{f(x+s) - f(x)}{s}$ is close to $f'(x)$ for small enough $s$. Hence, the difference quotient
$$\frac{\dfrac{f(x+s+t) - f(x+t)}{s} - \dfrac{f(x+s) - f(x)}{s}}{t}$$
should be close to f ” ( x > , andso f(x+s+t)-f(x+t)-f(x+s)+f(x) should be close to theproductf”(x)st, whichfor general f would be D 2 f ( x ) [ s t, ] or D 2 f ( x ) [ t s, ] . Let E > 0 and let SSt > 0 be so that for all s , t E X with llsll < SSt and ljtll < Ssr & we have x s t E L 2 and Df (x s t ) - D f ( x >- D 2 f ( x > [ s tll! I - 11s t 11,
+ +
!I
+
+ +
where D’J is interpreted as a linear map into C ( X , Y ) . Then for all s, t 11s /I < 6,, and lltll < 6,,we obtain
‘i f
Gt
5
4 E
+
X with
+ + r ) - J (-u + t ) - f ( x + $1 + f ( x ) - D ’ f ( X ) [ S , fill 5
/lf(x+s+t) - f ( x + t ) -
(Df(X+S)[fI
- Df(X)[fI)
- (f(x+s) - f(x))/l
But this implies the following for all s , t E X
\ {O}.
lpf(x)[s> tl - D 2 f ( x ) [ t SIII ,
Therefore, for $A[s,t] := D^2 f(x)[s,t]$ and $B[s,t] := D^2 f(x)[t,s]$ the tensor norm of the difference is $\|A - B\| \le 4\varepsilon$. Because $\varepsilon > 0$ was arbitrary this implies $\|A - B\| = 0$, and hence $D^2 f(x)[s,t] = D^2 f(x)[t,s]$ for all $s, t \in X$. ∎
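The heuristic behind this proof can be checked numerically. The following is only a sketch under assumptions that are not part of the text: we take $X = \mathbb{R}^2$, $Y = \mathbb{R}$, the sample function $f(x,y) = x^3 y + \sin(xy)$, and a hand-computed matrix of second partial derivatives. The sketch compares the second difference $f(x+s+t) - f(x+t) - f(x+s) + f(x)$ with the value $D^2 f(x)[s,t]$ for small $s$ and $t$.

```python
import math


def f(p):
    """Sample smooth function f(x, y) = x^3 * y + sin(x * y) (illustrative choice)."""
    x, y = p
    return x**3 * y + math.sin(x * y)


def hessian(p):
    """Hand-computed second partial derivatives of the sample f at p."""
    x, y = p
    fxx = 6 * x * y - y**2 * math.sin(x * y)
    fxy = 3 * x**2 + math.cos(x * y) - x * y * math.sin(x * y)
    fyy = -(x**2) * math.sin(x * y)
    return [[fxx, fxy], [fxy, fyy]]


def bilinear(A, s, t):
    """Evaluate the 2-tensor s^T A t."""
    return sum(s[i] * A[i][j] * t[j] for i in range(2) for j in range(2))


def add(p, q):
    """Componentwise sum of two points of R^2."""
    return (p[0] + q[0], p[1] + q[1])


def second_difference(func, x, s, t):
    """The sum f(x+s+t) - f(x+t) - f(x+s) + f(x) from the proof."""
    return func(add(add(x, s), t)) - func(add(x, t)) - func(add(x, s)) + func(x)


if __name__ == "__main__":
    x = (0.7, -0.4)
    for h in (1e-2, 1e-3, 1e-4):
        s, t = (h, 2 * h), (-3 * h, h)
        print(h, second_difference(f, x, s, t), bilinear(hessian(x), s, t))
```

Note that the second difference is unchanged when $s$ and $t$ are swapped, which is exactly why it can be close to both $D^2 f(x)[s,t]$ and $D^2 f(x)[t,s]$.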
Definition 17.57 A bijective function $\sigma : \{1, \dots, k\} \to \{1, \dots, k\}$ is also called a permutation. A $k$-tensor $T : \prod_{i=1}^{k} X_i \to Y$ is called symmetric iff for all $k$-tuples $(x_1, \dots, x_k) \in \prod_{i=1}^{k} X_i$ and all permutations $\sigma : \{1, \dots, k\} \to \{1, \dots, k\}$ the equality $T[x_1, \dots, x_k] = T[x_{\sigma(1)}, \dots, x_{\sigma(k)}]$ holds.
Corollary 17.58 Let $X, Y$ be normed spaces, let $\Omega \subseteq X$ be open, let $x \in \Omega$, and let $f : \Omega \to Y$ be $k$ times differentiable at $x$. Then $D^k f(x)$ is symmetric.
Proof. The proof is an induction on $k$ with $k = 1$ being trivial and $k = 2$ proved in Theorem 17.56. For the induction step, assume that $k > 2$ and that the result is proved for all $j < k$. Let $(t_1, \dots, t_k) \in X^k$ and let $\sigma : \{1, \dots, k\} \to \{1, \dots, k\}$ be a permutation. If $\sigma(1) \ne 1$ let $\tau : \{2, \dots, k\} \to \{2, \dots, k\}$ be a permutation with $\tau(2) = \sigma(1)$. If $\sigma(1) = 1$ let $\tau(i) = i$ for all $i \in \{2, \dots, k\}$ and skip the middle three lines in the computation below.
$$\begin{aligned}
D^k f(x)[t_1, \dots, t_k]
&= D\bigl(D^{k-1} f\bigr)(x)[t_1][t_2, \dots, t_k] \\
&= D\bigl(D^{k-1} f\bigr)(x)[t_1][t_{\tau(2)}, \dots, t_{\tau(k)}] \\
&= D^2\bigl(D^{k-2} f\bigr)(x)[t_1, t_{\tau(2)}][t_{\tau(3)}, \dots, t_{\tau(k)}] \\
&= D^2\bigl(D^{k-2} f\bigr)(x)[t_{\tau(2)}, t_1][t_{\tau(3)}, \dots, t_{\tau(k)}] \\
&= D\bigl(D^{k-1} f\bigr)(x)[t_{\sigma(1)}][t_1, t_{\tau(3)}, \dots, t_{\tau(k)}] \\
&= D\bigl(D^{k-1} f\bigr)(x)[t_{\sigma(1)}][t_{\sigma(2)}, \dots, t_{\sigma(k)}] \\
&= D^k f(x)[t_{\sigma(1)}, \dots, t_{\sigma(k)}],
\end{aligned}$$
where in the second to last step we use $\{1, \tau(3), \dots, \tau(k)\} = \{\sigma(2), \dots, \sigma(k)\}$ and apply an appropriate permutation. ∎
Clairaut's Theorem says that the order in which partial derivatives are taken is not important as long as the function is twice differentiable. It is usually stated for second partial derivatives of functions from Rd to R. The corollary below is easily seen to imply Clairaut's Theorem (see Corollary 17.60). Note, however (Exercise 17-73), that mixed partial derivatives can exist and not be equal.
Corollary 17.59 Let $\prod_{i=1}^{d} X_i$ be a product space, let $Y$ be a normed space, let $\Omega \subseteq \prod_{i=1}^{d} X_i$ be open, let $x \in \Omega$, and let $f : \Omega \to Y$ be $k$ times differentiable at $x$. Then for all indices $i_1, \dots, i_k \in \{1, \dots, d\}$ the partial derivative $D_{i_1} \cdots D_{i_k} f$ exists and for all permutations $\sigma : \{1, \dots, k\} \to \{1, \dots, k\}$ and all $(x_{i_1}, \dots, x_{i_k})$ we have
$$D_{i_1} \cdots D_{i_k} f(x_{i_1}, \dots, x_{i_k}) = D_{\sigma(i_1)} \cdots D_{\sigma(i_k)} f(x_{\sigma(i_1)}, \dots, x_{\sigma(i_k)}).$$
Proof. For $j \in \{1, \dots, d\}$, define $e[X_j] := \{(0, \dots, 0, x_j, 0, \dots, 0) : x_j \in X_j\}$, where the $x_j$ is in the $j$th component of each vector. First we prove by induction on $k$ that $D_{i_1} \cdots D_{i_k} f = D^k f|_{\prod_{j=1}^{k} e[X_{i_j}]}$. The case $k = 1$ follows from $D_i f = Df|_{e[X_i]}$ (see Proposition 17.40) wherever $f$ is differentiable. For the induction step, assume the result has been proved for all $j < k$. Then

Now Corollary 17.58 implies
$$\begin{aligned}
D_{i_1} \cdots D_{i_k} f(x_{i_1}, \dots, x_{i_k})
&= D^k f|_{\prod_{j=1}^{k} e[X_{i_j}]}(x_{i_1}, \dots, x_{i_k}) \\
&= D^k f|_{\prod_{j=1}^{k} e[X_{\sigma(i_j)}]}(x_{\sigma(i_1)}, \dots, x_{\sigma(i_k)}) \\
&= D_{\sigma(i_1)} \cdots D_{\sigma(i_k)} f(x_{\sigma(i_1)}, \dots, x_{\sigma(i_k)}),
\end{aligned}$$
which finishes the proof. ∎
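Corollary 17.60 Clairaut's Theorem. Let $\Omega \subseteq \mathbb{R}^d$ be an open set and let the function $f : \Omega \to \mathbb{R}$ be twice differentiable at $x \in \Omega$. Then for all $i_1, i_2 \in \{1, \dots, d\}$ we have
$$\frac{\partial^2 f}{\partial x_{i_1} \partial x_{i_2}}(x) = \frac{\partial^2 f}{\partial x_{i_2} \partial x_{i_1}}(x).$$
If $f$ is $k$ times differentiable at $x$, then for all $i_1, \dots, i_k \in \{1, \dots, d\}$ and all permutations $\sigma$ of $\{1, \dots, k\}$ we have
$$\frac{\partial^k f}{\partial x_{i_1} \cdots \partial x_{i_k}}(x) = \frac{\partial^k f}{\partial x_{\sigma(i_1)} \cdots \partial x_{\sigma(i_k)}}(x).$$

Proof. Easy consequence of $\frac{\partial f}{\partial x_i}(x) \cdot t = D_i f(x)[t]$ and Corollary 17.59. ∎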
Now that we have made the connection to partial derivatives, we can also give an explicit formula for the kth derivative in terms of partial derivatives that is similar to Theorem 17.43.
Corollary 17.61 Let $\Omega \subseteq \mathbb{R}^d$ be open, let $x \in \Omega$ and let $f : \Omega \to \mathbb{R}$ be $k$ times differentiable at $x$. Then

Moreover, for $h_1 = h_2 = \dots = h_k = c = (c_1, \dots, c_d)$ we have

Proof. Use that $h_j = \sum_{i_j=1}^{d} \pi_{i_j}[h_j]\, e_{i_j}$ for all $j = 1, \dots, k$. For the second part, note that because the $k$th partial derivatives do not depend on the order in which the partial derivatives are taken, we can sort the first sum by how often each $k$th partial derivative occurs. ∎

We conclude this section with a proof that $k$-fold differentiability is preserved by compositions.
Proposition 17.62 Chain Rule. Let $X, Y, Z$ be normed spaces, let $\Omega_1 \subseteq X$, $\Omega_2 \subseteq Y$ be open, let $g : \Omega_1 \to \Omega_2$ be $k$ times differentiable at $x \in \Omega_1$ and let $f : \Omega_2 \to Z$ be $k$ times differentiable at $g(x) \in \Omega_2$. Then $f \circ g$ is $k$ times differentiable at $x$.

Proof. This proof is an induction on $k$, with the base step $k = 1$ being the Chain Rule (Theorem 17.30). For the induction step $(k-1) \to k$, let $f$ and $g$ be $k$ times differentiable. Recall that by Theorem 17.30 we have $D(f \circ g)(x) = Df(g(x)) \circ Dg(x)$. By Proposition 17.55, $Dg(\cdot)$ is $k-1$ times differentiable at $x$ and by induction hypothesis $Df(g(\cdot))$ is $k-1$ times differentiable at $x$. Therefore by an easy generalization of Exercise 17-57 the function $x \mapsto (Df(g(x)), Dg(x))$ is $k-1$ times differentiable. Moreover, the function $(L, M) \mapsto L \circ M$ from $\mathcal{L}(Y,Z) \times \mathcal{L}(X,Y)$ to $\mathcal{L}(X,Z)$ is continuous and bilinear, and hence $k-1$ times differentiable by Exercise 17-74. But then by induction hypothesis, the composition $x \mapsto (Df(g(x)), Dg(x)) \mapsto Df(g(x)) \circ Dg(x)$ is $k-1$ times differentiable, which completes the proof. ∎
Exercises

17-72. Finish the proof of Theorem 17.56 by proving that for all $\varepsilon > 0$ we can find a $\delta_{st} > 0$ so that for all $s, t \in X$ with $\|s\| < \delta_{st}$ and $\|t\| < \delta_{st}$ we have

17-73. Even when mixed partial derivatives exist, they need not be equal. Consider the function

(a) Prove that both $\frac{\partial^2 f}{\partial x \partial y}(0,0)$ and $\frac{\partial^2 f}{\partial y \partial x}(0,0)$ exist and are not equal.
(b) Prove that neither $\frac{\partial f}{\partial x}$ nor $\frac{\partial f}{\partial y}$ is differentiable at $(0,0)$.
(c) Prove that $f$ is differentiable at every $(x,y) \in \mathbb{R}^2$.
(d) Prove that $Df$ is not differentiable at $(0,0)$.
17-74. Prove that every continuous $k$-tensor $T$ is infinitely differentiable with $D^j T = 0$ for $j > k$.
17-75. Let $\Omega \subseteq \mathbb{R}^d$ be open, let $f : \Omega \to \mathbb{R}$ be a twice differentiable function and let $x \in \Omega$.
(a) Prove that $A := \bigl(D^2 f(x)[e_i, e_j]\bigr)_{i=1,\dots,d;\ j=1,\dots,d}$ is such that $D^2 f(x)[v, w] = v^T A w$, where $v, w$ are column vectors with respect to the standard base and $v^T$ is the transpose of $v$, that is, a row vector.
(b) Prove that $A$ is a symmetric matrix, that is, for all $i, j \in \{1, \dots, d\}$ we have $a_{ij} = a_{ji}$.
17-76. For each f : R2 + 22, compute the second derivative. Use the representation of Exercise 17-75 (a) f ( x , y) = ye"
(b) f ( x . J) = x 3
+ 3xy + y 2 +
17-77. Taylor's Formula. Let $X, Y$ be Banach spaces, let $\Omega \subseteq X$ be open, let $f : \Omega \to Y$ be $k+1$ times continuously differentiable on $\Omega$, and let $x \in \Omega$, $z \in X$ be so that for all $t \in [0,1]$ we have $x + tz \in \Omega$.

Hint. Consider the function $t \mapsto f(x + tz)$. The Riemann integral for continuous Banach space valued functions is defined in Exercise 17-41. Use Theorem 13.3 as guidance.

(b) For $\Omega \subseteq \mathbb{R}^d$ and $Y = \mathbb{R}$ state Taylor's formula from part 17-77a in terms of partial derivatives. Then decide which of the two formulas is easier to use computationally and which formula is easier to read. Hint. Corollary 17.61. (c) Let
o ( i ):=
s,'
+
Dk+'f(x
k
The function T k ( z ) :=
i=o
+ u z ) d u [ i, .
,,
. i ] . Prove that lim O ( z ) = o ~
llzIl+O
llzllk
1
-t D ' f ( x ) [ z ,. . . , i ] is called the kth Taylor polynomial o f f at x
-'
17-78. For each function f : R2 R below, compute the second Taylor polynomial at x = 0. Use the representation of Exercise 17-75 for the second derivative. (b) f ( x , y ) = x 2
(a) f ( x , y ) = ex?.*
+ y2 (Is the result a surprise?)
17-79. Let $X$ be a normed space. A bilinear function $T : X \times X \to \mathbb{R}$ is called positive definite iff for all $x \in X \setminus \{0\}$ we have $T[x,x] > 0$.
(a) Second Derivative Test. Let $\Omega \subseteq X$ be open, let $f : \Omega \to \mathbb{R}$ be twice continuously differentiable and let $u \in \Omega$ be so that $Df(u) = 0$.
i. Prove that if $D^2 f(u)$ is positive definite, then there is an $\varepsilon > 0$ so that for all points $v \in X \setminus \{u\}$ with $\|v - u\| < \varepsilon$ we have $f(u) < f(v)$.
ii. Prove that if $-D^2 f(u)$ is positive definite, then there is an $\varepsilon > 0$ so that for all points $v \in X \setminus \{u\}$ with $\|v - u\| < \varepsilon$ we have $f(u) > f(v)$.
iii. Prove that if there are $x_1, x_2 \in X$ so that $D^2 f(u)[x_1, x_1] > 0$ and $D^2 f(u)[x_2, x_2] < 0$, then for every $\varepsilon > 0$ there are elements $v, w \in X \setminus \{u\}$ with $\|u - v\| < \varepsilon$, $\|u - w\| < \varepsilon$, $f(u) < f(v)$, and $f(u) > f(w)$.
Hint. Use Exercise 17-77.
(b) Prove that if $X = \mathbb{R}^2$, then the second derivative $D^2 f(u)$ is positive definite iff $\frac{\partial^2 f}{\partial x^2}(u) > 0$ and $\frac{\partial^2 f}{\partial x^2}(u)\,\frac{\partial^2 f}{\partial y^2}(u) - \left(\frac{\partial^2 f}{\partial x \partial y}(u)\right)^2 > 0$. Hint. Use Exercise 17-75.
(c) Prove that if $X = \mathbb{R}^2$ and $\frac{\partial^2 f}{\partial x^2}(u)\,\frac{\partial^2 f}{\partial y^2}(u) - \left(\frac{\partial^2 f}{\partial x \partial y}(u)\right)^2 < 0$, then there are $x_1, x_2 \in X$ so that $D^2 f(u)[x_1, x_1] > 0$ and $D^2 f(u)[x_2, x_2] < 0$.
(d) State and prove a result similar to part 17-79a for a $k$ times continuously differentiable function $f : \Omega \to \mathbb{R}$ so that $Df(x) = 0, \dots, D^{k-1} f(x) = 0$. (Distinguish even and odd $k$.)
17-80. A characterization of symmetry.
(a) Prove that a continuous $k$-linear function $T : \prod_{i=1}^{k} X_i \to Y$ is symmetric iff for every $\varepsilon > 0$ there is a $\delta > 0$ so that for all $k$-tuples $(x_1, \dots, x_k) \in \prod_{i=1}^{k} X_i$ that satisfy $\|(x_1, \dots, x_k)\| < \delta$ and all permutations $\sigma : \{1, \dots, k\} \to \{1, \dots, k\}$ we have
$$\bigl\|T(x_1, \dots, x_k) - T(x_{\sigma(1)}, \dots, x_{\sigma(k)})\bigr\| \le \varepsilon \left(\sum_{i=1}^{k} \|x_i\|\right)^{k}.$$
(b) Consider the continuous bilinear map $T : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ defined by $T(x,y) := \pi_1(x)\pi_2(y)$. Prove that for every $\varepsilon > 0$ there is a $\delta > 0$ so that for all $(x,y) \in \mathbb{R}^2 \times \mathbb{R}^2$ with $\|(x,y)\| < \delta$ we have $|T(x,y) - T(y,x)| \le \varepsilon\,(\|x\| + \|y\|)$, but $T$ is not symmetric.
17-81. Let $X$ be a Banach space, let $Y$ be a normed space and let $\mathcal{H}(X,Y)$ be the set of linear homeomorphisms from $X$ to $Y$. Prove that the map $J(A) := A^{-1}$ is an infinitely differentiable function from $\mathcal{H}(X,Y)$ to $\mathcal{H}(Y,X)$.
17.8 The Implicit Function Theorem

To investigate the solution sets of equations $f(x,y) = 0$ in more detail, it is often helpful to represent $y$ as a function $g(x)$. The Implicit Function Theorem says that under mild hypotheses this is possible and $g$ has the same differentiability properties as $f$. The first step toward the Implicit Function Theorem is a result about fixed points.
Definition 17.63 Let $S$ be a set and let $f : S \to S$ be a function. Then $p \in S$ is called a fixed point of $f$ iff $f(p) = p$.

Fixed points are important throughout applied mathematics because many equations can be rewritten as fixed point equations. (Recall Newton's Method from Section 13.2.) Under certain conditions, fixed points must exist. That is, if a fixed point equation from a concrete application satisfies the right abstract condition, then it must have a solution. Banach's Fixed Point Theorem provides one such condition.
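Theorem 17.64 Banach's Fixed Point Theorem. Let $X$ be a complete metric space, let $0 \le q < 1$ and let $f : X \to X$ be a function so that for all points $x, y \in X$ we have $d(f(x), f(y)) \le q\, d(x,y)$. Then $f$ has a unique fixed point $p \in X$.

Proof. Let $x \in X$ and consider the sequence $\{f^n(x)\}_{n=1}^{\infty}$. An easy induction proves that for all $n \in \mathbb{N}$ we have $d\bigl(f^n(x), f^{n+1}(x)\bigr) \le q^n d(x, f(x))$. Hence, for all $n, m \in \mathbb{N}$ with $n < m$ we infer $d\bigl(f^n(x), f^m(x)\bigr) \le \sum_{k=n}^{m-1} q^k d(x, f(x))$. Because the geometric series $\sum_{k=0}^{\infty} q^k$ converges, $\{f^n(x)\}_{n=1}^{\infty}$ is a Cauchy sequence. Let $p := \lim_{n \to \infty} f^n(x)$ and let $\varepsilon > 0$. Then there is an $n \in \mathbb{N}$ so that $d(p, f^n(x)) < \frac{\varepsilon}{2}$ and $d(p, f^{n+1}(x)) < \frac{\varepsilon}{2}$. Hence $d(p, f(p)) \le d\bigl(p, f^{n+1}(x)\bigr) + d\bigl(f(f^n(x)), f(p)\bigr) \le d\bigl(p, f^{n+1}(x)\bigr) + q\, d(f^n(x), p) < \varepsilon$. Because $\varepsilon$ was arbitrary, $f(p) = p$. For uniqueness, let $\tilde{p}$ be another fixed point of $f$. Then $d(p, \tilde{p}) = d(f(p), f(\tilde{p})) \le q\, d(p, \tilde{p})$, which implies $(1-q)\, d(p, \tilde{p}) \le 0$, that is, $d(p, \tilde{p}) = 0$, and hence we conclude $p = \tilde{p}$. ∎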
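The proof is constructive: iterating $f$ from any starting point converges to the fixed point. The following minimal sketch illustrates this for real-valued contractions. The function name `fixed_point`, the stopping rule, and the sample contraction $\cos$ on $[0,1]$ are our own illustrative assumptions, not part of the text. The stopping rule uses the estimate $d(f^{n+1}(x), p) \le \frac{q}{1-q}\, d(f^n(x), f^{n+1}(x))$, which follows from the same geometric series argument as in the proof.

```python
import math


def fixed_point(f, x0, q, tol=1e-12, max_iter=10_000):
    """Iterate x_{n+1} = f(x_n) for a contraction f with constant q in [0, 1).

    Returns the approximate fixed point and the number of iterations used.
    Stops once the a posteriori bound q / (1 - q) * |x_{n+1} - x_n| on the
    distance from x_{n+1} to the fixed point is below tol.
    """
    if not 0 <= q < 1:
        raise ValueError("q must satisfy 0 <= q < 1")
    x = x0
    for n in range(1, max_iter + 1):
        x_next = f(x)
        if q / (1 - q) * abs(x_next - x) < tol:
            return x_next, n
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")


if __name__ == "__main__":
    # cos maps [0, 1] into itself and |cos'| = |sin| <= sin(1) < 1 there,
    # so cos is a contraction on the complete metric space [0, 1].
    p, steps = fixed_point(math.cos, 0.5, q=math.sin(1.0))
    print(p, steps, abs(math.cos(p) - p))
```

The contraction constant must be supplied by the caller. The iteration itself does not verify the hypothesis of the theorem.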
Now we are ready to state and prove the Implicit Function Theorem. Note that the function h in the proof is similar to applying Newton's Method in the Y-coordinate.
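To see the Newton-like map in action before the general statement, consider the following sketch. It rests on assumptions chosen purely for illustration: $X = Y = Z = \mathbb{R}$, $f(x,y) = x^2 + y^2 - 1$ and $(x_0, y_0) = (0, 1)$, so that $L_0 := D_y f(x_0, y_0) = 2$. For fixed $x$ near $x_0$, iterating $y \mapsto h(x,y) := y - L_0^{-1}[f(x,y)]$ produces the value $g(x)$ of the implicitly defined function, and the derivative formula of the theorem below can be compared with a difference quotient.

```python
import math

X0, Y0 = 0.0, 1.0  # base point with f(X0, Y0) = 0 (illustrative choice)


def f(x, y):
    """Sample equation f(x, y) = 0 describing the unit circle."""
    return x * x + y * y - 1.0


L0 = 2.0 * Y0  # D_y f(X0, Y0), a 1x1 "linear homeomorphism"


def h(x, y):
    """Newton-like map in the y-coordinate with the frozen derivative L0."""
    return y - f(x, y) / L0


def g(x, tol=1e-14, max_iter=200):
    """Fixed point of y -> h(x, y), started at Y0; valid for x near X0."""
    y = Y0
    for _ in range(max_iter):
        y_next = h(x, y)
        if abs(y_next - y) < tol:
            return y_next
        y = y_next
    raise RuntimeError("iteration did not converge; x may be too far from X0")


def dg_formula(x):
    """Dg(x) = -(D_y f(x, g(x)))^{-1} o D_x f(x, g(x)) for the sample f."""
    y = g(x)
    return -(1.0 / (2.0 * y)) * (2.0 * x)


if __name__ == "__main__":
    x = 0.3
    print(g(x), math.sqrt(1.0 - x * x))
    step = 1e-6
    print(dg_formula(x), (g(x + step) - g(x - step)) / (2 * step))
```

For this sample equation the implicit function is known in closed form, $g(x) = \sqrt{1 - x^2}$, which makes the output easy to compare.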
Theorem 17.65 Implicit Function Theorem. Let $X$, $Y$ and $Z$ be Banach spaces, let $\Omega \subseteq X \times Y$ be an open set, let $(x_0, y_0) \in \Omega$, and let $f : \Omega \to Z$ be continuously differentiable so that $f(x_0, y_0) = 0$ and so that $D_y f(x_0, y_0) : Y \to Z$ is a linear homeomorphism. Then there is an open neighborhood $N \subseteq X$ of $x_0$ such that there is a unique continuously differentiable function $g : N \to Y$ so that $g(x_0) = y_0$ and for all $x \in N$ we have $(x, g(x)) \in \Omega$ and $f(x, g(x)) = 0$. The derivative of $g$ is
$$Dg(x) = -\bigl(D_y f(x, g(x))\bigr)^{-1} \circ D_x f(x, g(x)).$$
Moreover, if $f$ is $k$ times continuously differentiable, then so is $g$.

Proof. Let $L_0 := D_y f(x_0, y_0)$, let $h(x,y) := y - L_0^{-1}[f(x,y)]$ and let $\delta > 0$ be so that for all $(x,y) \in X \times Y$ with $\|(x,y) - (x_0, y_0)\| \le \delta$ we have that $(x,y) \in \Omega$, $D_y f(x,y)$ is invertible and $\|D_y f(x,y) - D_y f(x_0, y_0)\|$
0 there is an $N \in \mathbb{N}$ so that for all $m, n \ge N$ and all $x \in \Omega$ we have $\|f_n(x) - f_m(x)\| < \varepsilon$. Prove that then there is a continuous function $f : \Omega \to Y$ so that $\lim_{n \to \infty} f_n(x) = f(x)$ for all $x \in \Omega$.
17-83. Prove Corollary 17.67.
17-84. Prove Corollary 17.69.
17-85. Explain why in the proof of the Implicit Function Theorem we must prove that $g$ is continuous before we can prove that it is differentiable.
17-86. Lagrange multipliers.
(a) Let $X, Y$ be finite dimensional normed spaces with $\dim(X) = n \ge m = \dim(Y)$, let $\Omega \subseteq X$ be open, let $f : \Omega \to \mathbb{R}$ be continuously differentiable, let $g : \Omega \to Y$ be continuously differentiable, and let $x \in \Omega$ be so that $g(x) = 0$, the rank of $Dg(x)$ is $m$, and there is an $\varepsilon > 0$ so that for all $z \in \Omega$ with $g(z) = 0$ and $\|z - x\| < \varepsilon$ we have that $f(x) \ge f(z)$. Prove that there is a continuous linear function $\varphi \in \mathcal{L}(Y, \mathbb{R})$ so that $Df(x) + \varphi \circ Dg(x) = 0$.
Hint. Let $X_1 := \{a \in X : Dg(x)[a] = 0\}$ and represent $X$ as a product $X = X_1 \times X_2$ with $x = (x_1, x_2)$. Find a neighborhood $U \times V$ of $x$ so that there is a continuously differentiable $\alpha : U \to V$ so that $g(x_1, \alpha(x_1)) = 0$ for all $x_1 \in U$. Then consider $H(x_1) := f(x_1, \alpha(x_1))$.
(b) Prove that if $X = \mathbb{R}^n$ and $Y = \mathbb{R}^m$, then there are $\lambda_1, \dots, \lambda_m \in \mathbb{R}$ so that we have
$$\operatorname{grad}(f)(x) + \sum_{i=1}^{m} \lambda_i \operatorname{grad}(\pi_i \circ g)(x) = 0.$$
17-87. Let $X$ be a complete metric space and let $f : X \to X$ be a function so that there is a sequence $\{b_n\}_{n=1}^{\infty}$ of nonnegative numbers such that $\sum_{n=1}^{\infty} b_n$ converges and for all $n \in \mathbb{N}$ and all $x, y \in X$ we have $d(f^n(x), f^n(y)) \le b_n\, d(x,y)$. Prove that $f$ has a unique fixed point $p$ in $X$.
17-88. Let $\|\cdot\|_2$ be the Euclidean norm on $\mathbb{R}^3$. Find a map $f : \mathbb{R}^3 \to \mathbb{R}^3$ that has no fixed points and so that for all $u, v \in \mathbb{R}^3$ we have $\|f(u) - f(v)\|_2 = \|u - v\|_2$.
Chapter 18

Measure, Topology, and Differentiation

Continuity, differentiation, and integration are the three main topics in analysis. Part I of the text has shown that the combination of these concepts can lead to powerful new insights, such as the Fundamental Theorem of Calculus or the Lebesgue criterion for Riemann integrability. For abstract spaces, we have seen in Chapter 14 how integration leads to measure theory, in Chapter 16 how the investigation of limits and continuity leads to topological concepts, and in Chapter 17 how differentiation is approximation with continuous linear functions. In this chapter, we investigate the connections between measure, topology, and differentiation in $d$-dimensional space. Section 18.1 characterizes Lebesgue measurable sets topologically by approximating them from the inside with closed sets and from the outside with open sets. Similarly, Section 18.2 shows that $p$-integrable functions can be approximated with infinitely differentiable functions. After a brief excursion into tensor algebra in Section 18.3 (placed there to keep the presentation self-contained), the chapter concludes with the proof of the Multidimensional Substitution Formula in Section 18.4.
18.1 Lebesgue Measurable Sets in $\mathbb{R}^d$

This section shows how Lebesgue measurable sets in $\mathbb{R}^d$ can be characterized almost exclusively with the topological ideas of openness and closedness. We start by proving that the most fundamental subsets of $\mathbb{R}^d$ are Lebesgue measurable.

Theorem 18.1 Open and closed subsets of $\mathbb{R}^d$ are Lebesgue measurable.
Proof. By Proposition 14.60, for any $x = (x_1, \dots, x_d) \in \mathbb{R}^d$ and any $\varepsilon > 0$ the open cube $C_\varepsilon(x) := \prod_{i=1}^{d} (x_i - \varepsilon, x_i + \varepsilon)$ is Lebesgue measurable. Moreover, $C_\varepsilon(x)$ is the open ball of radius $\varepsilon$ around $x$ in the uniform norm $\|\cdot\|_\infty$.

Let $O \subseteq \mathbb{R}^d$ be an open set. Then $O_{\mathbb{Q}} := O \cap \mathbb{Q}^d$ is a countable dense subset of $O$. For all $x \in O_{\mathbb{Q}}$, let $\varepsilon_x := \sup\{\varepsilon < 1 : C_\varepsilon(x) \subseteq O\}$. Then clearly $\bigcup_{x \in O_{\mathbb{Q}}} C_{\varepsilon_x}(x) \subseteq O$. To prove the reversed inclusion, let $y \in O$. There are an $\varepsilon \in (0,1)$ so that $C_\varepsilon(y) \subseteq O$ and an $x \in O_{\mathbb{Q}}$ so that $\|x - y\|_\infty < \frac{\varepsilon}{2}$. But then $y \in C_{\frac{\varepsilon}{2}}(x) \subseteq C_\varepsilon(y) \subseteq O$, which means $y \in \bigcup_{x \in O_{\mathbb{Q}}} C_{\varepsilon_x}(x)$. Because $y$ was arbitrary this proves that $\bigcup_{x \in O_{\mathbb{Q}}} C_{\varepsilon_x}(x) \supseteq O$, and hence $\bigcup_{x \in O_{\mathbb{Q}}} C_{\varepsilon_x}(x) = O$. Because $O_{\mathbb{Q}}$ is countable and every $C_{\varepsilon_x}(x)$ is Lebesgue measurable we conclude that $O$ is Lebesgue measurable because it is a countable union of Lebesgue measurable sets. Therefore, the open subsets of $\mathbb{R}^d$ are Lebesgue measurable.

Now let $C \subseteq \mathbb{R}^d$ be closed. Then $O = \mathbb{R}^d \setminus C$ is open and thus Lebesgue measurable. But then $C$ is the complement of a Lebesgue measurable set, and hence it is Lebesgue measurable, too. ∎
Corollary 18.2 Let $S \subseteq \mathbb{R}^d$ be Lebesgue measurable and let $f : S \to \mathbb{R}$ be continuous. Then $f$ is Lebesgue measurable.

Proof. Let $a \in \mathbb{R}$. Then the set $(a, \infty)$ is open in $\mathbb{R}$. Because $f$ is continuous, $f^{-1}[(a, \infty)]$ is relatively open. That is, there is an open set $O \subseteq \mathbb{R}^d$ so that $f^{-1}[(a, \infty)] = S \cap O$, which by Theorem 18.1 is Lebesgue measurable. Hence, $f$ is Lebesgue measurable. ∎

Lebesgue measurable sets can now be characterized as the subsets of $\mathbb{R}^d$ that can be approximated by open sets from the outside and by closed sets from the inside so that the measure of the difference can be made arbitrarily small. We will first prove this result for sets of finite measure and then for sets of arbitrary measure.
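Theorem 18.3 A subset $S \subseteq \mathbb{R}^d$ with $\lambda(S) < \infty$ is Lebesgue measurable iff for every $\varepsilon > 0$ there are a compact set $C$ and an open set $O$ such that $C \subseteq S \subseteq O$ and $\lambda(O \setminus C) < \varepsilon$.

Proof. For "$\Rightarrow$," let $S$ be Lebesgue measurable with $\lambda(S) < \infty$ and let $\varepsilon > 0$. Then there is a family of open boxes $\{B_j\}_{j=1}^{\infty}$ with $S \subseteq \bigcup_{j=1}^{\infty} B_j$ and $\sum_{j=1}^{\infty} |B_j| < \lambda(S) + \frac{\varepsilon}{2}$. Let $O := \bigcup_{j=1}^{\infty} B_j$. Then $O$ is open, $S \subseteq O$, and $\lambda(O) \le \sum_{j=1}^{\infty} |B_j| < \lambda(S) + \frac{\varepsilon}{2}$.

For any $n \in \mathbb{N}$ let $K_n := \prod_{i=1}^{d} [-n, n]$. By Theorem 14.15, we obtain $\lambda(S) = \lim_{n \to \infty} \lambda(S \cap K_n)$. Hence, there is an $N \in \mathbb{N}$ so that $K := K_N$ satisfies $\lambda(S \cap K) \ge \lambda(S) - \frac{\varepsilon}{4}$. The set $K \setminus S$ is Lebesgue measurable and there is a family of open boxes $\{A_j\}_{j=1}^{\infty}$ so that $K \setminus S \subseteq \bigcup_{j=1}^{\infty} A_j$ and $\sum_{j=1}^{\infty} |A_j| < \lambda(K \setminus S) + \frac{\varepsilon}{4}$. But then $C := K \setminus \bigcup_{j=1}^{\infty} A_j$ is closed and bounded, and hence by Theorem 16.80 it is compact. Moreover, the containment $C = K \setminus \bigcup_{j=1}^{\infty} A_j \subseteq K \setminus (K \setminus S) \subseteq S$ holds. Now
$$\lambda(C) \ge \lambda(K) - \sum_{j=1}^{\infty} |A_j| > \lambda(K) - \lambda(K \setminus S) - \frac{\varepsilon}{4} = \lambda(S \cap K) - \frac{\varepsilon}{4} \ge \left(\lambda(S) - \frac{\varepsilon}{4}\right) - \frac{\varepsilon}{4} = \lambda(S) - \frac{\varepsilon}{2}.$$
Now $\lambda(O \setminus C) = \lambda(O) - \lambda(C) < \lambda(S) + \frac{\varepsilon}{2} - \left(\lambda(S) - \frac{\varepsilon}{2}\right) = \varepsilon$, which proves the direction "$\Rightarrow$."

Conversely, for "$\Leftarrow$" let $S \subseteq \mathbb{R}^d$ be such that $\lambda(S) < \infty$ and for every $\varepsilon > 0$ there are a compact set $C$ and an open set $O$ such that $C \subseteq S \subseteq O$ and $\lambda(O \setminus C) < \varepsilon$. Let $T \subseteq \mathbb{R}^d$ and let $\varepsilon > 0$. Then with $O$ and $C$ as described we infer
$$\begin{aligned}
\lambda(T) &= \lambda(O \cap T) + \lambda(O' \cap T) \\
&> \lambda(O \cap T) + \lambda(O' \cap T) + \lambda(O \setminus C) - \varepsilon \\
&\ge \lambda(O \cap T) + \lambda(O' \cap T) + \lambda\bigl((O \setminus C) \cap T\bigr) - \varepsilon \\
&\ge \lambda(O \cap T) + \lambda(O' \cap C' \cap T) + \lambda\bigl((O \cap C') \cap T\bigr) - \varepsilon \\
&\ge \lambda(O \cap T) + \lambda(C' \cap T) - \varepsilon \\
&\ge \lambda(S \cap T) + \lambda(S' \cap T) - \varepsilon.
\end{aligned}$$
Because $\varepsilon$ was arbitrary we obtain $\lambda(T) \ge \lambda(S \cap T) + \lambda(S' \cap T)$, and because $T$ was arbitrary, this means that $S$ is Lebesgue measurable. ∎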
Theorem 18.3 indicates that an “inner volume” or “inner Lebesgue measure” for a set S could be defined as the supremum of the (outer) Lebesgue measures of all compact sets contained in S (also see Exercise 18-3). A set would then be measurable iff the inner and outer volume are equal. Measurability in this sense turns out to be the same as Lebesgue measurability. However, this idea is quite complicated and it requires some topological structure, while Definition 14.19 of measurability can be used on arbitrary sets. This is why, even though it intuitively feels like a good idea, we do not work with “inner measures.” We can characterize Lebesgue measurable sets in Rd as follows.
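Theorem 18.4 Let $S \subseteq \mathbb{R}^d$. Then the following are equivalent.

1. $S$ is Lebesgue measurable,
2. For every $\varepsilon > 0$, there are a closed set $C$ and an open set $O$ so that $C \subseteq S \subseteq O$ and $\lambda(O \setminus C) < \varepsilon$.
3. There is a sequence $\{O_n\}_{n=1}^{\infty}$ of open sets and a sequence $\{C_n\}_{n=1}^{\infty}$ of closed sets with $O_n \supseteq O_{n+1}$ and $C_n \subseteq C_{n+1}$ for all $n \in \mathbb{N}$ so that $\bigcup_{n=1}^{\infty} C_n \subseteq S \subseteq \bigcap_{n=1}^{\infty} O_n$ and so that $\lambda\left(\bigcap_{n=1}^{\infty} O_n \setminus \bigcup_{n=1}^{\infty} C_n\right) = 0$.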
nr-12. d
Proof. To prove “1+2,” for any natural number n
E
N let K ,
:=
n ] and let
i=l
K-1 := KO := 0. For each n E N the set S n ( K , \ K,-1) is Lebesgue measurable with finite Lebesgue measure, so by Theorem 18.3 there are a closed set C, and an openc set On withC, C Sn(K,\K,-1) E 0, K,O+,\K,-1 andsothath(0, \C,l) < A 2.2n’
u ~
n=l
u 00
30
Then the set 0 :=
0, is open and by Exercise 16-42, the set C :=
n=l
C, is closed.
Finally, with 00:= 0 we obtain the following. m
~ (\ C 0) =
C ~ ( (\ C0 ) n (Kn \
~n-1))
n=l
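For "$2 \Rightarrow 3$," for each natural number $k \in \mathbb{N}$ let $\tilde{O}_k$ be open and let $\tilde{C}_k$ be closed with $\tilde{C}_k \subseteq S \subseteq \tilde{O}_k$ and $\lambda(\tilde{O}_k \setminus \tilde{C}_k) < \frac{1}{k}$. Then for each $n \in \mathbb{N}$, the set $O_n := \bigcap_{k=1}^{n} \tilde{O}_k$ is open, the set $C_n := \bigcup_{k=1}^{n} \tilde{C}_k$ is closed, $O_n \supseteq O_{n+1}$, $C_n \subseteq C_{n+1}$ and $C_n \subseteq S \subseteq O_n$. Therefore, $\bigcup_{n=1}^{\infty} C_n \subseteq S \subseteq \bigcap_{n=1}^{\infty} O_n$. Moreover, for all $k \in \mathbb{N}$ we infer
$$\lambda\left(\bigcap_{n=1}^{\infty} O_n \setminus \bigcup_{n=1}^{\infty} C_n\right) \le \lambda\bigl(\tilde{O}_k \setminus \tilde{C}_k\bigr) < \frac{1}{k},$$
and because $k \in \mathbb{N}$ is arbitrary we conclude $\lambda\left(\bigcap_{n=1}^{\infty} O_n \setminus \bigcup_{n=1}^{\infty} C_n\right) = 0$.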
The part "$3 \Rightarrow 1$" follows from the fact that open sets, closed sets and null sets are Lebesgue measurable. ∎

Lebesgue measure is also often considered on Lebesgue measurable subsets of $\mathbb{R}^d$ (see Example 14.10). Let $A \subseteq \mathbb{R}^d$ be Lebesgue measurable. Because the intersection of Lebesgue measurable sets is again Lebesgue measurable, Theorem 18.4 also holds for Lebesgue measurable subsets of $A$. If we want all sets involved to be subsets of $A$, we need to replace the demand that the $O_n$ are open with the demand that the $O_n$ are open in $A$ and we need to replace the demand that the $C_n$ are closed with the demand that the $C_n$ are closed in $A$. If $A$ itself is open, this is not necessary. Theorem 18.4 also shows that the $\sigma$-algebra generated by the open subsets of $\mathbb{R}^d$ is interesting in itself. It is investigated further in Exercise 18-7.
Exercises 18-1. Explain why the hypothesis k ( S )
h ( S ) - E . 18-5. Lebesgue measurable functions. Let
$\Omega \subseteq \mathbb{R}^d$ be open and let $f : \Omega \to \mathbb{R}$.
(a) Prove that $f$ is Lebesgue measurable iff for all open sets $O \subseteq \mathbb{R}$ the set $f^{-1}[O]$ is Lebesgue measurable.
(b) Prove that $f$ is Lebesgue measurable iff for all closed sets $C \subseteq \mathbb{R}$ the set $f^{-1}[C]$ is Lebesgue measurable.
(c) Prove that $f$ is Lebesgue measurable iff for all compact sets $K \subseteq \mathbb{R}$ the set $f^{-1}[K]$ is Lebesgue measurable.
18-6. The Derivative Form of the Fundamental Theorem of Calculus for the Lebesgue integral states the following. If h : [ a ,b] +
W is Lebesgue integrable, then the function H ( x ) :=
iXIX
I^
h ( t )d h ( t )
h ( t ) d h ( t ) = h(x) for almost all x E [ a ,b ] . In this
is differentiable a t . with derivative -
exercise, we will prove the result with the steps given below. (Recall that by Exercise 10-7, H is differentiable a.e. if h is nonnegative.)
u n
(a) Let a1
ibl
< a2 < b2
0 (see Exercise 18-9). For all j E N by dxn ~, _ _1 e .r e - i = lim uje-' = 0, which Exercise 12-27, we obtain lim - = lim ( : ) J proved that c E c"(R) with c'"'(x) :=
(:)
x+o+
xJ
x+O' k
a j x j we can conclude
means that for every polynomial p ( x ) = j =O
But then for all n
E
u+x
N U {0}we obtain
18.2. $C^\infty$ and Approximation of Integrable Functions
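$$\lim_{z \to 0^+} \frac{\frac{d^n}{dz^n} e^{-\frac{1}{z}} - 0}{z - 0} = \lim_{z \to 0^+} \frac{p_n\left(\frac{1}{z}\right) e^{-\frac{1}{z}} - 0}{z - 0} = \lim_{z \to 0^+} \frac{1}{z}\, p_n\left(\frac{1}{z}\right) e^{-\frac{1}{z}} = 0.$$
∎

Figure 44: Constructing a $C^\infty$ "indicator function" for intervals.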
Note that the function in Lemma 18.8 also shows that not every $C^\infty$ function can be represented as a power series. This is because $c^{(n)}(0) = 0$ for all $n \in \mathbb{N}$, but $c \ne 0$.
Lemma 18.9 The $C^\infty$ jump function. Let $\delta > 0$. Then $j_\delta(x) := \dfrac{c(x)}{c(\delta - x) + c(x)}$ is infinitely differentiable on $\mathbb{R}$. Moreover, it takes values in $[0,1]$, it is identical to zero on $(-\infty, 0]$, it is identical to 1 on $[\delta, \infty)$ and $j_\delta[(0,\delta)] \subseteq (0,1)$.

Proof. Because $c \ge 0$ and for each $x \in \mathbb{R}$ at least one of $c(x)$ and $c(\delta - x)$ is not zero, Proposition 17.62 and Theorem 18.6 imply that $j_\delta \in C^\infty(\mathbb{R})$. Because $c \ge 0$, for all $x \in \mathbb{R}$ we obtain $c(x) \le c(\delta - x) + c(x)$, which implies $0 \le \dfrac{c(x)}{c(\delta - x) + c(x)} \le 1$ for all $x \in \mathbb{R}$ and the inequalities are strict for $x \in (0, \delta)$. Finally, because $c(x) = 0$ for $x \le 0$ we conclude $j_\delta(x) = 0$ for $x \le 0$ and because $c(\delta - x) = 0$ for $x \ge \delta$ we conclude $j_\delta(x) = 1$ for $x \ge \delta$. ∎
Lemma 18.10 The $C^\infty$ interval indicator. Let $a < b$ and $\delta > 0$ be real numbers. The function $\mathbf{1}_{(a,b),\delta}(x) := j_\delta(x - a)\, j_\delta(b - x)$ is infinitely differentiable on $\mathbb{R}$. Moreover, it takes values in $[0,1]$, it is identical to zero on $\mathbb{R} \setminus (a,b)$ and it is identical to 1 on $[a + \delta, b - \delta]$.

Proof. Exercise 18-10.
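The three functions are easy to experiment with. The sketch below assumes that the function $c$ of Lemma 18.8 is $c(x) = e^{-1/x}$ for $x > 0$ and $c(x) = 0$ for $x \le 0$, and the Python names `c`, `jump` and `indicator` are our own. It is a floating point illustration only: for very small $\delta$ both terms in the denominator of $j_\delta$ underflow to zero, so the sketch should only be used with moderately sized $\delta$.

```python
import math


def c(x):
    """Assumed form of the function in Lemma 18.8: exp(-1/x) for x > 0, else 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0


def jump(x, delta):
    """The jump function j_delta(x) = c(x) / (c(delta - x) + c(x)).

    Numerically safe only while delta is not tiny (roughly delta > 0.003),
    because exp(-1/x) underflows to 0.0 for very small positive x.
    """
    return c(x) / (c(delta - x) + c(x))


def indicator(x, a, b, delta):
    """The smooth interval indicator j_delta(x - a) * j_delta(b - x)."""
    return jump(x - a, delta) * jump(b - x, delta)


if __name__ == "__main__":
    a, b, delta = 0.0, 2.0, 0.25
    for x in (-1.0, 0.0, 0.1, 0.125, 0.25, 1.0, 1.75, 1.9, 2.0, 3.0):
        print(x, indicator(x, a, b, delta))
```

The printed values are 0 outside $(a,b)$, equal to 1 on $[a+\delta, b-\delta]$, and strictly between 0 and 1 in the two transition zones, in line with the statements of Lemmas 18.9 and 18.10.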
$C^\infty$ interval indicator functions are the key ingredient to constructing similar "indicator functions" for closed sets. These functions are then used to show that the compactly supported infinitely differentiable functions are dense in $L^p(\Omega)$.
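Lemma 18.11 Let $C \subseteq \mathbb{R}^d$ be closed and let $U \subseteq \mathbb{R}^d$ be open so that $C \subseteq U$. Then there is a function $\mathbf{1}_{C,U} \in C^\infty(\mathbb{R}^d)$ that takes values in $[0,1]$ and satisfies $\mathbf{1}_{C,U}|_C = 1$ and $\operatorname{supp}(\mathbf{1}_{C,U}) \subseteq U$.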
Proof. First consider the case that C is compact. Throughout, we will use the uniform norm 11 . Iloo. So, “balls” around points will actually be cubes. For each x E C, let E, 0 be so that BEx( x ) c U . Then BQ (x)],,~ is an open cover of C. Let
uf
{
n
XI,
. . . , xn be so that C
BsX
j=1
(xj).For
each x j , let x ; ) denote the ith component
of the representation of x j with respect to the standard base. For j = 1, . . . , n let
11
becau\e it is a product of infinitely differentiable functions. Hence, for h := infer supp(h) x E
B61, -
c U,h
E Cm
(ad) and for all x
( x - ~ ) which . means h(x) 2
12,
hi ue J=i
E
C there is a j E [ 1. . . . . n ) so that j l being a Cx jump function
(x) = 1. With
as in Lkmma 18.9, define l c , := ~ j1 o h . By Proposition 17.62, this function is in Cx By the above, we have lc,ulc = 1 and supp ( l c , ~c) U , so the result is established for compact C. Now consider the case that C is closed, but not necessarily compact. For each n E W, let C, := C f l (B,(O) \ B,-l(O)). Then each C, is compact. For each x E C,,
(md).
let E~
E
(0, 1) be so that BEx(x) c U and let U,, :=
u
BE,(x). Then for each n
E
N,
X€C,
the set U, is open and contains the compact set C,. Let the functions l c , , ~be , as w
l c , , ~,, For , each x E Rd, there is a neighborhood
constructed above and let g := n='i
V of x so that at most four terms in this series are not equal to zero on V . Thus g is in COc Moreover, g(x) 2 1 for all x E C , and supp(g) c U . Now l c , := ~ j1 o g is as desired.
(Rd).
Note that Lemma 18.11 shows in particular that $C_0^\infty(\Omega)$ is not empty, which is not necessarily trivial. The next result shows that even more is true.
Theorem 18.12 Let 1 5 p
0. By Theorem 18.3, there is a compact set C & B so that h ( B \ C) < E * . Let 1 c . be ~ as in Lemma 18.11. Then supp ( l c , ~c) B 2 S2, l c , E~ Cco(S2) and
E
We conclude that in any $L^p$-neighborhood of an indicator function of a box in $\Omega$ we can find a function in $C_0^\infty(\Omega)$.

Now that we can approximate indicator functions of boxes with $C_0^\infty(\Omega)$-functions, the proof proceeds just like the proof of Theorem 16.86. To approximate indicator functions of sets, cover the set $A$ with open boxes $B_j \subseteq \Omega$ such that the sum of the volumes of the boxes $B_j$ is close to the measure of the set. Truncate the sum to obtain a finite number of boxes $B_j$ so that the sum of the volumes of the finitely many boxes is close to the total sum of volumes. Approximate the indicator functions of these finitely many boxes with $C_0^\infty(\Omega)$ functions as indicated above to prove that indicator functions of sets can be approximated with $C_0^\infty(\Omega)$ functions. The details are to be given in Exercise 18-11a. Next, just like the proof of Theorem 16.86, approximate simple functions with functions in $C_0^\infty(\Omega)$ (Exercise 18-11b). Finally, just like the proof of Theorem 16.86, apply Theorem 16.85 (Exercise 18-11c).

Figure 45: Illustration of how the indicator function of an interval can be approximated in $L^p$ with $C^\infty$ functions. The area bounded by the difference shrinks to zero, which allows the approximation (in the $L^p$ sense) of the discontinuities with infinitely smooth functions.

Geometrically, Theorem 18.12 says that even highly discontinuous functions can be approximated arbitrarily well with infinitely smooth functions in $L^p$. Figure 45 shows such an approximation for the indicator function of an interval. This visualization illustrates the crucial part of the proof of Theorem 18.12 and it also shows that the result is not counterintuitive. The infinitely smooth approximations do not in any way "fix" the jumps at the discontinuities.
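The shrinking area in Figure 45 can also be observed numerically. The sketch below is an illustration in the spirit of the figure, not the construction used in the proof: it approximates the indicator function of $(a,b)$ with the smooth interval indicator of Lemma 18.10 and estimates the $L^p$ distance with a midpoint rule. It assumes $c(x) = e^{-1/x}$ for $x > 0$ and $c(x) = 0$ otherwise, and the function names and the number of sample points are our own choices. Because the two functions differ only on $(a, a+\delta) \cup (b-\delta, b)$ and the difference is bounded by 1, the distance is at most $(2\delta)^{1/p}$.

```python
import math


def c(x):
    """Assumed form of the function in Lemma 18.8: exp(-1/x) for x > 0, else 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0


def jump(x, delta):
    """The jump function of Lemma 18.9 (use moderately sized delta only)."""
    return c(x) / (c(delta - x) + c(x))


def indicator(x, a, b, delta):
    """The smooth interval indicator of Lemma 18.10."""
    return jump(x - a, delta) * jump(b - x, delta)


def lp_distance(a, b, delta, p, n=20_000):
    """Midpoint-rule estimate of the L^p norm of 1_(a,b) minus its smooth version.

    Outside (a, b) both functions vanish, so it suffices to integrate over (a, b),
    where the ordinary indicator function equals 1.
    """
    width = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * width
        total += abs(1.0 - indicator(x, a, b, delta)) ** p
    return (total * width) ** (1.0 / p)


if __name__ == "__main__":
    for delta in (0.2, 0.1, 0.05):
        print(delta, lp_distance(0.0, 1.0, delta, p=2), (2 * delta) ** 0.5)
```

The estimates decrease as $\delta$ decreases, although every approximation is still continuous and therefore never reproduces the jumps themselves.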
Exercises

18-9. Prove by induction that for every $n \in \mathbb{N}$ there is a polynomial $p_n$ so that for all $x > 0$ we have $\frac{d^n}{dx^n} e^{-\frac{1}{x}} = p_n\left(\frac{1}{x}\right) e^{-\frac{1}{x}}$.
18-10. Prove Lemma 18.10.

18-11. Finish the proof of Theorem 18.12.
(a) Prove that for every measurable subset $A$ of $\Omega$ and every $\varepsilon > 0$ there is a test function $g_{A,\varepsilon} \in C_0^\infty(\Omega)$ so that $\|1_A - g_{A,\varepsilon}\|_p < \varepsilon$.
(b) Prove that for every simple function $s \in L^p(\Omega)$ and every $\varepsilon > 0$ there is a test function $g_{s,\varepsilon} \in C_0^\infty(\Omega)$ so that $\|s - g_{s,\varepsilon}\|_p < \varepsilon$.
(c) Apply Theorem 16.85 to complete the proof.

18-12. Let $\Omega \subseteq \mathbb{R}^d$ be an open set. Prove that for every function $f \in L^1(\Omega)$ there is a sequence of functions in $C_0^\infty(\Omega)$ that converges a.e. to $f$.

18-13. Let $\Omega \subseteq \mathbb{R}^d$ be an open set. Prove that if $f \in L^1(\Omega)$ is so that $\int_\Omega f\varphi \, d\lambda = 0$ for all $\varphi \in C_0^\infty(\Omega)$, then $f = 0$ a.e.
18. Measure, Topology, and Differentiation
18-14. A half-open box in $\mathbb{R}^d$ is a set $H$ for which there are real numbers $a_i < b_i$, $i = 1, \ldots, d$ so that $H = \prod_{i=1}^d [a_i, b_i)$. Let $O \subseteq \mathbb{R}^d$ be open and let $K \subseteq O$ be compact. Prove that for every $\varepsilon > 0$ there is a finite family $\{H_j\}_{j=1}^n$ of pairwise disjoint half-open cubes of side length less than $\varepsilon$ so that $K \subseteq \bigcup_{j=1}^n H_j \subseteq O$.
Hint. Work with the $\|\cdot\|_\infty$ norm, let $\delta := \frac{1}{2}\min\left\{\varepsilon, \operatorname{dist}\left(K, \mathbb{R}^d \setminus O\right)\right\}$ (need to prove $\delta > 0$) and use cubes of the form $\prod_{i=1}^d [l_i\delta, (l_i + 1)\delta)$.
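18-15. Prove that the function $g : \mathbb{R}^d \to \mathbb{R}$ defined by
$$g(x) := \begin{cases} e^{-\frac{1}{1 - \|x\|_2^2}}; & \text{for } \|x\|_2 < 1, \\ 0; & \text{for } \|x\|_2 \ge 1, \end{cases}$$
is a $C^\infty$ function. Hint. Chain Rule and Lemma 18.8.

18-16. Let $\Omega \subseteq \mathbb{R}^d$ be open and let $f \in L^p(\Omega)$. With $f$ extended to all of $\mathbb{R}^d$ by setting it equal to zero outside of $\Omega$, prove that $\lim_{z \to 0} \int_{\mathbb{R}^d} |f(x) - f(x + z)|^p \, d\lambda(x) = 0$.
Hints. First prove the result for $C_0^\infty$ functions. Then prove the result in general by approximating $f$ with $C_0^\infty$ functions.

18-17. Let $\Omega \subseteq \mathbb{R}^d$ be open, let $f \in L^p(\Omega)$, let $\varphi \in C_0^\infty(\Omega)$ and for all $x \in \mathbb{R}^d$ define the function $f_\varphi$ by $f_\varphi(x) := \int_\Omega f(z)\varphi(x - z) \, d\lambda(z)$. (The integrals exist because $L^p(\operatorname{supp}(\varphi)) \subseteq L^1(\operatorname{supp}(\varphi))$.)
(a) Prove that $f_\varphi$ is continuous on $\mathbb{R}^d$. Hint. Dominated Convergence Theorem.
(b) Prove that at every $x \in \mathbb{R}^d$ all first partial derivatives $\frac{\partial f_\varphi}{\partial x_i}$ of $f_\varphi$ exist and are equal to $\frac{\partial f_\varphi}{\partial x_i}(x) = \int_\Omega f(z) \frac{\partial \varphi}{\partial x_i}(x - z) \, d\lambda(z)$. Hint. Dominated Convergence Theorem, Mean Value Theorem for functions of one variable and boundedness of the first partial derivatives of $\varphi$.
(c) Prove that all first partial derivatives $\frac{\partial f_\varphi}{\partial x_i}$ of $f_\varphi$ are continuous at every $x \in \mathbb{R}^d$. Hint. Apply parts 18-17a and 18-17b.
(d) Prove that $f_\varphi \in C^\infty(\mathbb{R}^d)$. Hint. Apply part 18-17c and use induction.
Note. The operation that produces $f_\varphi$ from $f$ and $\varphi$ is also called the convolution of $f$ and $\varphi$.

18-18. Let $\Omega \subseteq \mathbb{R}^d$ be open and let $p \in [1, \infty)$.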
(a) Prove that $L^p(\Omega)$ has a countable dense subset consisting of simple functions. Hint. Boxes with dyadic rational bounds.
(b) Prove that $L^p(\Omega)$ has a countable dense subset consisting of $C_0^\infty$ functions.

18-19. Prove that for every Riemann integrable function $f : [a, b] \to \mathbb{R}$ on $[a, b]$ there is a sequence $\{g_n\}_{n=1}^\infty$ of infinitely differentiable functions on $[a, b]$ such that $g_n(a) = g_n(b) = 0$ for all $n \in \mathbb{N}$, $\lim_{n \to \infty} \int_a^b |f - g_n| \, dx = 0$ and for all $n \in \mathbb{N}$ we have $|g_n| \le |f|$.
Hint. First find $a = x_0 < x_1 < \cdots < x_n = b$ and a step function $s(x) = \sum_{k=1}^n a_k 1_{[x_{k-1}, x_k)}$ so that $|s| \le |f|$ and $\|s - f\|_1$ is small. Then use $C^\infty$ interval indicator functions.
Figure 46: Geometric arguments that give the area of a parallelogram and the volume of a parallelepiped. (Formally, the cross product on the left is obtained by making $a$ and $b$ vectors in $\mathbb{R}^3$ whose third component is zero.)
18.3 Tensor Algebra and Determinants

To prove the Multidimensional Substitution Formula we need a volume function for $n$-dimensional parallelepipeds spanned by vectors $a_1, \ldots, a_n$. Moreover, for integration on manifolds we need functions that allow us to compute the lower dimensional volume of lower dimensional parallelepipeds in higher dimensional spaces. For example, we will be interested in the area (two dimensional volume) of a parallelogram (two dimensional parallelepiped) in three dimensional space.

We start by analyzing some formulas from geometry (also see Figure 46). It can be proved that the area of the parallelogram in $\mathbb{R}^2$ spanned by the vectors $a = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}$ and $b = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$ is $A(a, b) = |a_1 b_2 - b_1 a_2|$. Similarly, the volume of the three dimensional parallelepiped spanned by the vectors $a, b, c$ is $V(a, b, c) = |\langle a, b \times c \rangle|$. Note that if we drop the absolute values, the (oriented) area is bilinear and the (oriented) volume is 3-linear (see Definition 17.45) in the input vectors. Throughout, we will consider oriented volumes, that is, volumes that can be positive or negative. This is not much of a loss, because if we really want a nonnegative number, we simply take absolute values.

Because the two dimensional volume formula is incorporated in the three dimensional volume formula, the above suggests that we should try to construct a general tensor formalism that produces the volume function for parallelepipeds in arbitrary dimensions.
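As a small illustration of the two geometric formulas without the absolute values, the following sketch evaluates the oriented area and the oriented volume for integer vectors. The function names `oriented_area`, `cross` and `oriented_volume` are chosen here for illustration only.

```python
def oriented_area(a, b):
    """Oriented area a1*b2 - b1*a2 of the parallelogram spanned by a and b in R^2."""
    return a[0] * b[1] - b[0] * a[1]

def cross(b, c):
    """Cross product of two vectors in R^3."""
    return (b[1] * c[2] - b[2] * c[1],
            b[2] * c[0] - b[0] * c[2],
            b[0] * c[1] - b[1] * c[0])

def oriented_volume(a, b, c):
    """Oriented volume <a, b x c> of the parallelepiped spanned by a, b and c in R^3."""
    return sum(x * y for x, y in zip(a, cross(b, c)))

a2, b2 = (2, 0), (1, 3)
area = oriented_area(a2, b2)             # positive for this order of the vectors
area_swapped = oriented_area(b2, a2)     # switching the two vectors changes the sign

e1, e2, e3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)
unit_volume = oriented_volume(e1, e2, e3)
unit_volume_swapped = oriented_volume(e2, e1, e3)
```

Switching two input vectors changes the sign of the result in both cases.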
We start by representing $k$-tensors on finite dimensional vector spaces. The representation in Theorem 18.16 below is already implicit in Corollary 17.61.

Definition 18.13 Let $V$ be a vector space. The set of real valued $k$-tensors on $V$ is denoted by $\mathcal{T}^k(V)$. For a finite dimensional vector space $V$, the dual space is defined to be $V^* := \mathcal{T}^1(V) = \mathcal{L}(V, \mathbb{R})$.

With pointwise addition and scalar multiplication $\mathcal{T}^k(V)$ is a vector space (see Exercise 18-20). The (higher dimensional) volume of a box is the product of the lower dimensional volumes of the projections: for a rectangle, area is the product of the lengths of the sides (length is one dimensional volume), and for a three dimensional box, the volume is the area of the base times the height. It is thus not surprising that tensors can be multiplied in the same simplistic fashion.

Definition 18.14 Let $V$ be a vector space and $S \in \mathcal{T}^k(V)$, $T \in \mathcal{T}^l(V)$. We define $S \otimes T[v_1, \ldots, v_k, v_{k+1}, \ldots, v_{k+l}] := S[v_1, \ldots, v_k] \, T[v_{k+1}, \ldots, v_{k+l}]$ and call it the tensor product of $S$ and $T$.

Clearly, $S \otimes T \in \mathcal{T}^{k+l}(V)$. The tensor product is not commutative, but it is associative (see Exercise 18-21). This means that while we need to be careful with the order of the factors, tensor products with more than two factors can be written without parentheses. Tensor products allow us to represent tensors in terms of a very natural base.

Theorem 18.15 Let $V$ be a finite dimensional vector space with base $\{v_1, \ldots, v_d\}$. Then the maps $\varphi_i : V \to \mathbb{R}$ defined by $\varphi_i\left[\sum_{j=1}^d a_j v_j\right] := a_i$ form a base of the dual space of $V$, called the dual base of $\{v_1, \ldots, v_d\}$.
Proof. To see that $\{\varphi_1, \ldots, \varphi_d\}$ is linearly independent, let $a_1, \ldots, a_d \in \mathbb{R}$ be so that $\sum_{i=1}^d a_i \varphi_i = 0$. Then for all $j \in \{1, \ldots, d\}$ we infer $0 = \sum_{i=1}^d a_i \varphi_i[v_j] = a_j$, and hence $\{\varphi_1, \ldots, \varphi_d\}$ is linearly independent.

To see that $\{\varphi_1, \ldots, \varphi_d\}$ is a base of $V^*$, let $\psi \in V^*$. Then for all $v = \sum_{i=1}^d a_i v_i$ in $V$ we have $\psi[v] = \sum_{i=1}^d a_i \psi[v_i] = \sum_{i=1}^d \psi[v_i] \varphi_i[v]$, which means that $\psi$ is a linear combination of the $\varphi_i$. Hence, $\{\varphi_1, \ldots, \varphi_d\}$ is a base of $V^*$. ■
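The dual base can be computed explicitly for a concrete base. The sketch below assumes $V = \mathbb{Q}^3$ with exact rational arithmetic; the helper names `coordinates` and `dual_base` and the sample base are illustrative choices, not notation from the text. Each $\varphi_i$ returns the $i$th coordinate of a vector with respect to the base, and the second step of the proof, $\psi[v] = \sum_i \psi[v_i]\varphi_i[v]$, can then be tested on a sample functional.

```python
from fractions import Fraction

def coordinates(base, x):
    """Solve sum_j a_j * base[j] = x by Gauss-Jordan elimination; returns [a_1, ..., a_d]."""
    d = len(base)
    # Augmented matrix: the columns are the base vectors, followed by the column x.
    rows = [[Fraction(base[j][i]) for j in range(d)] + [Fraction(x[i])] for i in range(d)]
    for col in range(d):
        # Assumes that base really is a base, so a nonzero pivot exists.
        pivot = next(r for r in range(col, d) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [entry / rows[col][col] for entry in rows[col]]
        for r in range(d):
            if r != col:
                factor = rows[r][col]
                rows[r] = [e - factor * p for e, p in zip(rows[r], rows[col])]
    return [rows[i][d] for i in range(d)]

def dual_base(base):
    """List of the maps phi_i, where phi_i(x) is the i-th coordinate of x in the given base."""
    return [lambda x, i=i: coordinates(base, x)[i] for i in range(len(base))]

base = [(1, 1, 0), (0, 1, 1), (1, 0, 1)]
phi = dual_base(base)

def psi(x):
    """A sample linear functional on Q^3."""
    return 2 * x[0] - x[1] + 5 * x[2]

v = (3, -4, 7)
expansion = sum(psi(base[i]) * phi[i](v) for i in range(3))
```

Here `expansion` reproduces `psi(v)`, and `phi[i](base[j])` is 1 for `i == j` and 0 otherwise.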
Theorem 18.16 Let $V$ be a finite dimensional vector space with base $\{v_1, \ldots, v_d\}$ and let $\{\varphi_1, \ldots, \varphi_d\}$ be the associated dual base as in Theorem 18.15. Then the set $\mathcal{B} := \{\varphi_{i_1} \otimes \cdots \otimes \varphi_{i_k} : i_1, \ldots, i_k \in \{1, \ldots, d\}\}$ is a base for $\mathcal{T}^k(V)$.
Proof. The proof is similar to the proof of Theorem 18.15 (Exercise 18-22). ■
Both the area of a parallelogram as well as the volume of a three dimensional parallelepiped have multiple summands. Hence, the volume function for parallelepipeds cannot just be a simple tensor product. A key property of both geometric formulas is that if we switch any two vectors, the sign of the result changes. This observation leads us to alternating tensors.
Definition 18.17 Let $V$ be a vector space. A $k$-tensor $\omega \in \mathcal{T}^k(V)$ is called alternating iff for all $1 \le i < j \le k$ and all $v_1, \ldots, v_k \in V$ we have
$$\omega[v_1, \ldots, v_{i-1}, v_i, v_{i+1}, \ldots, v_{j-1}, v_j, v_{j+1}, \ldots, v_k] = -\omega[v_1, \ldots, v_{i-1}, v_j, v_{i+1}, \ldots, v_{j-1}, v_i, v_{j+1}, \ldots, v_k].$$
The set of alternating $k$-tensors on $V$ is denoted $\Lambda^k(V)$. Exercise 18-23 shows that the set of alternating $k$-tensors forms a vector space. Because they have only one argument, linear functions $\varphi : V \to \mathbb{R}$ are alternating 1-tensors. (The universal quantification in the definition of an “alternating 1-tensor” is over the empty set, so it is vacuously true.) For any tensor, we can define a corresponding alternating tensor. The idea is to sum all terms that can be obtained by permuting the entries in such a way that a transposition of two vectors will switch a positive summand with a negative summand.
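Definition 18.18 For $k \in \mathbb{N}$ we let $S_k$ be the set of all permutations on $\{1, \ldots, k\}$. A transposition is a permutation $\tau$ that fixes all but two elements of the set. The sign $\operatorname{sgn}(\sigma)$ of a permutation $\sigma \in S_k$ is $1$ if $\sigma$ is a composition of an even number of transpositions and it is $-1$ if $\sigma$ is a composition of an odd number of transpositions. (Exercise 18-24 shows that the sign is well-defined.) If $T \in \mathcal{T}^k(V)$ we define
$$\operatorname{Alt}(T)[v_1, \ldots, v_k] := \frac{1}{k!} \sum_{\sigma \in S_k} \operatorname{sgn}(\sigma) \, T[v_{\sigma(1)}, \ldots, v_{\sigma(k)}].$$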
Theorem 18.19 shows that $\operatorname{Alt}(T)$ is an alternating tensor, thus explaining the notation. The proof also shows why we need the coefficient $\frac{1}{k!}$. Without it, $\operatorname{Alt}(\cdot)$ applied to an alternating tensor would multiply the alternating tensor with a factor that is not equal to 1.
Theorem 18.19 Let $V$ be a vector space and let $k \in \mathbb{N}$. For all $T \in \mathcal{T}^k(V)$ we have $\operatorname{Alt}(T) \in \Lambda^k(V)$ and for all $\omega \in \Lambda^k(V)$ we have $\operatorname{Alt}(\omega) = \omega$.
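Proof. First note that a tensor $\omega \in \mathcal{T}^k(V)$ is alternating iff for all transpositions $\tau \in S_k$ and all $v_1, \ldots, v_k \in V$ we have $\omega[v_1, \ldots, v_k] = -\omega[v_{\tau(1)}, \ldots, v_{\tau(k)}]$. Moreover, note that for all transpositions $\tau \in S_k$ we have $\tau \circ \tau = \mathrm{id}_{\{1, \ldots, k\}}$. Now let $T \in \mathcal{T}^k(V)$ and let $v_1, \ldots, v_k \in V$. Then for all transpositions $\tau \in S_k$ we obtain the following.
$$\begin{aligned}
\operatorname{Alt}(T)[v_{\tau(1)}, \ldots, v_{\tau(k)}]
&= \frac{1}{k!} \sum_{\sigma \in S_k} \operatorname{sgn}(\sigma) \, T[v_{\tau(\sigma(1))}, \ldots, v_{\tau(\sigma(k))}] \\
&= -\frac{1}{k!} \sum_{\sigma \in S_k} \operatorname{sgn}(\tau \circ \sigma) \, T[v_{\tau \circ \sigma(1)}, \ldots, v_{\tau \circ \sigma(k)}] \\
&= -\frac{1}{k!} \sum_{\sigma' \in S_k} \operatorname{sgn}(\sigma') \, T[v_{\sigma'(1)}, \ldots, v_{\sigma'(k)}] \\
&= -\operatorname{Alt}(T)[v_1, \ldots, v_k].
\end{aligned}$$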
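Therefore $\operatorname{Alt}(T)$ is alternating. For the second part, note that if $\omega \in \Lambda^k(V)$ and $\sigma \in S_k$, then for all $v_1, \ldots, v_k \in V$ we have $\omega[v_1, \ldots, v_k] = \operatorname{sgn}(\sigma)\,\omega[v_{\sigma(1)}, \ldots, v_{\sigma(k)}]$ (Exercise 18-25). Now
$$\operatorname{Alt}(\omega)[v_1, \ldots, v_k] = \frac{1}{k!} \sum_{\sigma \in S_k} \operatorname{sgn}(\sigma)\,\omega[v_{\sigma(1)}, \ldots, v_{\sigma(k)}] = \frac{1}{k!} \sum_{\sigma \in S_k} \operatorname{sgn}(\sigma)^2\,\omega[v_1, \ldots, v_k] = \omega[v_1, \ldots, v_k].$$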
It is also easy to see that $\operatorname{Alt} : \mathcal{T}^k(V) \to \Lambda^k(V)$ is linear (see Exercise 18-26). With $\operatorname{Alt}(\cdot)$ we can now multiply alternating tensors in a way that produces a new alternating tensor.
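The two claims of Theorem 18.19 can be tested on a small example. The sketch below assumes the usual alternation formula, that is, the average over all permutations weighted by their signs; tensors are modeled as plain Python functions of $k$ tuples, and the names `sign`, `alt` and `T` are illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def sign(perm):
    """Sign of a permutation of (0, ..., k-1), computed from the parity of its inversions."""
    k = len(perm)
    inversions = sum(1 for i in range(k) for j in range(i + 1, k) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def alt(T, k):
    """Alt(T) for a k-tensor T that is given as a function of k vectors."""
    def alt_T(*v):
        total = sum(sign(s) * T(*(v[i] for i in s)) for s in permutations(range(k)))
        return Fraction(total) / factorial(k)
    return alt_T

def T(a, b):
    """The 2-tensor that multiplies the first component of a with the second component of b."""
    return a[0] * b[1]

A = alt(T, 2)
a, b = (1, 2, 3), (4, 5, 6)
value = A(a, b)              # (a1*b2 - b1*a2) / 2
value_swapped = A(b, a)      # opposite sign
value_twice = alt(A, 2)(a, b)  # Alt applied to the alternating tensor A
```

Because of the coefficient $\frac{1}{k!}$, applying `alt` a second time leaves the alternating tensor `A` unchanged.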
Definition 18.20 Let $V$ be a vector space. For $\omega \in \Lambda^k(V)$ and $\eta \in \Lambda^l(V)$ define the wedge product $\omega \wedge \eta$ to be $\omega \wedge \eta := \frac{(k+l)!}{k!\,l!} \operatorname{Alt}(\omega \otimes \eta)$.
Example 18.21 The wedge product is the key to obtaining volume functions. Let $\{e_1, \ldots, e_d\}$ be the standard base in $\mathbb{R}^d$ and let $\{\pi_1, \ldots, \pi_d\}$ be the corresponding dual base.

1. For all $a, b \in \mathbb{R}^d$, we have $\pi_1 \wedge \pi_2(a, b) = \pi_1(a)\pi_2(b) - \pi_1(b)\pi_2(a)$. That is, the wedge product of $\pi_1$ and $\pi_2$ maps any two vectors $a, b \in \mathbb{R}^d$ to the oriented area of the parallelogram spanned by the vectors made up of the first two components of $a$ and $b$. More generally, the wedge product of $\pi_i$ and $\pi_j$ maps $a, b \in \mathbb{R}^d$ to the area of the parallelogram spanned by the vectors made up of the $i$th and $j$th components of $a$ and $b$.

2. For all $a, b, c \in \mathbb{R}^d$, we have
$$\left(\pi_1 \wedge (\pi_2 \wedge \pi_3)\right)(a, b, c) = \det \begin{pmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{pmatrix}.$$
(The lengthy componentwise proof can be produced in Exercise 18-27.) That is, the wedge product of $\pi_1$, $\pi_2$, and $\pi_3$ maps three vectors $a, b, c \in \mathbb{R}^d$ to the oriented volume of the parallelepiped spanned by the vectors made up of the first three components of $a$, $b$ and $c$. Similarly, the wedge product of $\pi_i$, $\pi_j$, and $\pi_k$ maps any three vectors $a, b, c \in \mathbb{R}^d$ to the volume of the parallelepiped spanned by the vectors made up of the $i$th, $j$th and $k$th components of $a$, $b$ and $c$.

The above indicates that the wedge product of $k$ vectors of the dual base of the standard base $e_1, \ldots, e_d$ should map any $k$-tuple of vectors to the lower dimensional volume of the projection of the parallelepiped spanned by the vectors into the right lower dimensional subspace (see Exercise 18-28). Moreover, part 2 indicates that the wedge product of “projection volume functions” gives a higher dimensional volume function. Therefore the wedge product formalism should be the version of the familiar “base times height” formula for boxes that holds for parallelepipeds. □

With the wedge product accepted as the right formalism to compute volumes of parallelepipeds we need to investigate its properties. Bilinearity of the wedge product is established in Exercise 18-29. Moreover, the wedge product allows a base representation of alternating tensors on finite dimensional spaces similar to Theorem 18.16. For this base representation, we first need to establish associativity and the behavior when two factors are transposed. The first two parts of the following lemma are motivated by the proof of the wedge product’s associativity.
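The two parts of Example 18.21 can be checked on concrete vectors in $\mathbb{R}^4$. The sketch below assumes the standard alternation formula and implements the wedge product exactly as in Definition 18.20; the names `sign`, `alt`, `tensor`, `wedge`, `pi` and `det3` are illustrative, and `pi(i)` is indexed from 0, so `pi(0)` plays the role of $\pi_1$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def sign(perm):
    """Sign of a permutation of (0, ..., k-1), computed from the parity of its inversions."""
    k = len(perm)
    inversions = sum(1 for i in range(k) for j in range(i + 1, k) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def alt(T, k):
    """Alt(T) for a k-tensor T that is given as a function of k vectors."""
    def alt_T(*v):
        total = sum(sign(s) * T(*(v[i] for i in s)) for s in permutations(range(k)))
        return Fraction(total) / factorial(k)
    return alt_T

def tensor(S, k, T, l):
    """Tensor product of a k-tensor S and an l-tensor T."""
    return lambda *v: S(*v[:k]) * T(*v[k:k + l])

def wedge(w, k, eta, l):
    """Wedge product (k+l)!/(k! l!) * Alt(w tensor eta) of a k-tensor and an l-tensor."""
    scale = Fraction(factorial(k + l), factorial(k) * factorial(l))
    a = alt(tensor(w, k, eta, l), k + l)
    return lambda *v: scale * a(*v)

def pi(i):
    """Dual base element of the standard base; pi(0) corresponds to pi_1 in the text."""
    return lambda x: x[i]

def det3(a, b, c):
    """Determinant of the 3x3 matrix made up of the first three components of a, b and c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

a, b, c = (1, 2, 3, 9), (0, 1, 4, 9), (5, 6, 0, 9)
area_12 = wedge(pi(0), 1, pi(1), 1)(a, b)
right = wedge(pi(0), 1, wedge(pi(1), 1, pi(2), 1), 2)(a, b, c)  # pi_1 ^ (pi_2 ^ pi_3)
left = wedge(wedge(pi(0), 1, pi(1), 1), 2, pi(2), 1)(a, b, c)   # (pi_1 ^ pi_2) ^ pi_3
```

Both groupings of the triple product give the same value for these vectors, which is consistent with the associativity that is established in the lemma below.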
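Lemma 18.22 Let $V$ be a vector space.

1. If $S \in \mathcal{T}^k(V)$, $T \in \mathcal{T}^l(V)$ and $\operatorname{Alt}(S) = 0$, then $\operatorname{Alt}(S \otimes T) = \operatorname{Alt}(T \otimes S) = 0$.

2. If $S \in \mathcal{T}^k(V)$, $T \in \mathcal{T}^l(V)$ and $U \in \mathcal{T}^m(V)$, then $\operatorname{Alt}(\operatorname{Alt}(S \otimes T) \otimes U) = \operatorname{Alt}(S \otimes T \otimes U) = \operatorname{Alt}(S \otimes \operatorname{Alt}(T \otimes U))$.

3. If $\omega \in \Lambda^k(V)$, $\eta \in \Lambda^l(V)$ and $\theta \in \Lambda^m(V)$, then $(\omega \wedge \eta) \wedge \theta = \frac{(k+l+m)!}{k!\,l!\,m!} \operatorname{Alt}(\omega \otimes \eta \otimes \theta) = \omega \wedge (\eta \wedge \theta)$. In particular this means that the wedge product is associative.

4. The wedge product is “anti-commutative.” If $\omega \in \Lambda^k(V)$ and $\eta \in \Lambda^l(V)$, then $\omega \wedge \eta = (-1)^{kl}\,\eta \wedge \omega$.

5. If $k$ is odd and $\varphi \in \Lambda^k(V)$, then $\varphi \wedge \varphi = 0$.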
Proof. For part 1, we simply compute $\operatorname{Alt}(S \otimes T)$ and represent the permutations $\sigma \in S_{k+l}$ as the composition of three permutations. The first sorts the elements of $\{1, \ldots, k+l\}$ into a set of size $k$ and a set of size $l$, so that in each set the elements are listed in increasing order and the sets are listed as two consecutive blocks. The remaining two permutations then act on these sets. In the resulting representation of $\operatorname{Alt}(S \otimes T)[v_1, \ldots, v_{k+l}]$, the sum over the permutations that act on the block of size $k$ is a multiple of $\operatorname{Alt}(S)$ evaluated at $k$ of the vectors, which is $0$. Hence
$$\operatorname{Alt}(S \otimes T)[v_1, \ldots, v_{k+l}] = 0.$$
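Part 2 is a straightforward application of the linearity of $\operatorname{Alt}(\cdot)$, the multilinearity of tensor products and part 1. Because $\operatorname{Alt}\left(\operatorname{Alt}(S \otimes T) - S \otimes T\right) = 0$ by Theorem 18.19, part 1 yields
$$\begin{aligned}
\operatorname{Alt}(\operatorname{Alt}(S \otimes T) \otimes U) - \operatorname{Alt}(S \otimes T \otimes U)
&= \operatorname{Alt}\left(\operatorname{Alt}(S \otimes T) \otimes U - (S \otimes T) \otimes U\right) \\
&= \operatorname{Alt}\left([\operatorname{Alt}(S \otimes T) - (S \otimes T)] \otimes U\right) \\
&= 0,
\end{aligned}$$
and the other equality is proved similarly. Part 3 now follows from part 2.
$$\begin{aligned}
(\omega \wedge \eta) \wedge \theta
&= \left[\frac{(k+l)!}{k!\,l!} \operatorname{Alt}(\omega \otimes \eta)\right] \wedge \theta \\
&= \frac{(k+l+m)!}{(k+l)!\,m!} \operatorname{Alt}\left(\frac{(k+l)!}{k!\,l!} \operatorname{Alt}(\omega \otimes \eta) \otimes \theta\right) \\
&= \frac{(k+l+m)!}{k!\,l!\,m!} \operatorname{Alt}(\omega \otimes \eta \otimes \theta),
\end{aligned}$$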
and the other equality is proved similarly. For part 4, represent $\omega \wedge \eta$ and $\eta \wedge \omega$ similar to the representation for part 1. Then note that the permutation that transposes the “blocks” $\{1, \ldots, k\}$ and $\{k+1, \ldots, k+l\}$ can be represented as a composition of $kl$ transpositions by going left to right in $\{k+1, \ldots, k+l\}$, transposing each element $k$ times with its current predecessor. Hence the sign of this permutation is $(-1)^{kl}$, which establishes the claimed equality. Part 5 immediately follows from part 4, because for odd $k$ we have $\varphi \wedge \varphi = (-1)^{k^2} \varphi \wedge \varphi = -\varphi \wedge \varphi$.
With associativity of the wedge product established, we no longer need to include parentheses in the wedge multiplication of three or more alternating tensors.
Theorem 18.23 Let $V$ be a finite dimensional vector space with base $\{v_1, \ldots, v_d\}$, and let $\{\varphi_1, \ldots, \varphi_d\}$ be the associated dual base as in Theorem 18.15. Then the set $\mathcal{A} := \{\varphi_{i_1} \wedge \cdots \wedge \varphi_{i_k} : 1 \le i_1 < \cdots < i_k \le d\}$ is a base for $\Lambda^k(V)$.
Proof. By Theorem 18.16, every $\varphi \in \mathcal{T}^k(V)$ is a linear combination of tensor products $\varphi_{i_1} \otimes \cdots \otimes \varphi_{i_k}$. The function $\operatorname{Alt} : \mathcal{T}^k(V) \to \Lambda^k(V)$ is surjective by Theorem 18.19 and by Exercise 18-26 it is linear. Therefore every $\omega \in \Lambda^k(V)$ is a linear combination of tensors $\operatorname{Alt}(\varphi_{i_1} \otimes \cdots \otimes \varphi_{i_k})$. By associativity of the wedge product, $\operatorname{Alt}(\varphi_{i_1} \otimes \cdots \otimes \varphi_{i_k})$ is a multiple of $\varphi_{i_1} \wedge \cdots \wedge \varphi_{i_k}$. If any two indices $i_j$ are equal, then by part 5 of Lemma 18.22 $\varphi_{i_j} \wedge \varphi_{i_j} = 0$. Moreover, for any permutation $\sigma \in S_k$ by part 4 of Lemma 18.22 we infer that $\varphi_{i_1} \wedge \cdots \wedge \varphi_{i_k} \in \{\pm \varphi_{\sigma(i_1)} \wedge \cdots \wedge \varphi_{\sigma(i_k)}\}$. This means that every $\omega \in \Lambda^k(V)$ is a linear combination of tensors $\varphi_{i_1} \wedge \cdots \wedge \varphi_{i_k}$ with $i_1 < i_2 < \cdots < i_k$. To see that these tensors are linearly independent, let the numbers $a_{i_1 \ldots i_k}$ be so that $\sum a_{i_1 \ldots i_k}\, \varphi_{i_1} \wedge \cdots \wedge \varphi_{i_k} = 0$. For fixed $1 \le j_1 < \cdots < j_k \le d$ note that

[...] $\varepsilon > 0$ was arbitrary we infer $\lambda(A[S]) \le \lambda(S)$. We can obtain the same inequality for $A^{-1}(x_1, \ldots, x_d) = (x_1, \ldots, x_{i-1}, x_i - a x_j, x_{i+1}, \ldots, x_d)$. But this means $\lambda(S) = \lambda\left(A^{-1}[A[S]]\right) \le \lambda(A[S])$, and hence $\lambda(S) = \lambda(A[S])$. Finally, it was shown in Exercise 18-35b that $\det(A) = 1$.
Lemma 18.34 The effect of row transposition on Lebesgue measure. Let $S \subseteq \mathbb{R}^d$ be Lebesgue measurable, let $i, j \in \{1, \ldots, d\}$ with $i < j$ and let $T : \mathbb{R}^d \to \mathbb{R}^d$ be $T(x_1, \ldots, x_d) := (x_1, \ldots, x_{i-1}, x_j, x_{i+1}, \ldots, x_{j-1}, x_i, x_{j+1}, \ldots, x_d)$. Then $T[S]$ is Lebesgue measurable with $\lambda(T[S]) = \lambda(S) = |\det(T)|\,\lambda(S)$.

Proof. Exercise 18-39.

Lemma 18.35 The effect of a bijective linear operator on Lebesgue measure. Let the function $L : \mathbb{R}^d \to \mathbb{R}^d$ be linear and bijective and let $S \subseteq \mathbb{R}^d$ be Lebesgue measurable. Then $L[S]$ is Lebesgue measurable and $\lambda(L[S]) = |\det(L)|\,\lambda(S)$.
Proof. The Gauss-Jordan algorithm from Theorem 17.23 shows that there is a diagonal operator $D$ with nonzero diagonal entries and a sequence of row transposition and row addition operators $A_1, \ldots, A_n$ so that $L = A_n A_{n-1} \cdots A_1 D$. Because the determinant of a composition is the product of the determinants of the factors (see Corollary 18.27) we infer that $|\det(L)| = \left(\prod_{i=1}^n |\det(A_i)|\right) |\det(D)|$, and hence
$$\begin{aligned}
\lambda(L[S]) &= \lambda(A_n A_{n-1} \cdots A_1 D[S]) \\
&= |\det(A_n)|\,\lambda(A_{n-1} \cdots A_1 D[S]) \\
&= \cdots = \left(\prod_{i=1}^n |\det(A_i)|\right) |\det(D)|\,\lambda(S) = |\det(L)|\,\lambda(S).
\end{aligned}$$
Lemma 18.35 confirms our initial geometric idea for arbitrary dimensions. If a parallelepiped is spanned by vectors $v_1, \ldots, v_d$, then it is the image of the unit cube under the linear map whose matrix representation is the matrix $A$ with columns $v_1, \ldots, v_d$. Lemma 18.35 shows that the volume of this parallelepiped is the absolute value of the determinant of $A$. (Details are left to Exercise 18-48.)

For the remaining results, we will work with half-open cubes, which are cubes of the form $\prod_{i=1}^d [a_i, b_i)$ with all $b_i - a_i$ being equal. The radius of a half-open cube is half its side length.
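The factorization idea from the proof of Lemma 18.35 can be traced in a short computation. The sketch below is an illustration under stated assumptions, not the algorithm of Theorem 17.23: it reduces an invertible matrix to diagonal form using only row transpositions and row additions, both of which leave the absolute value of the determinant unchanged, and compares the product of the diagonal entries with the area of the parallelogram spanned by the columns. The names `abs_det_by_elimination` and `shoelace_area` are illustrative.

```python
from fractions import Fraction

def abs_det_by_elimination(matrix):
    """|det| of an invertible matrix, using only row transpositions and row additions."""
    d = len(matrix)
    rows = [[Fraction(e) for e in row] for row in matrix]
    for col in range(d):
        # Row transposition (|det| = 1): bring a nonzero pivot into position.
        pivot = next(r for r in range(col, d) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Row additions (det = 1): clear the column above and below the pivot.
        for r in range(d):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [e - factor * p for e, p in zip(rows[r], rows[col])]
    # The remaining matrix is diagonal; its determinant is the product of the diagonal.
    result = Fraction(1)
    for i in range(d):
        result *= abs(rows[i][i])
    return result

def shoelace_area(vertices):
    """Area of a polygon in the plane whose vertices are listed in order."""
    total = 0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x0 * y1 - x1 * y0
    return Fraction(abs(total), 2)

v1, v2 = (3, 1), (1, 2)
A = [[v1[0], v2[0]], [v1[1], v2[1]]]   # matrix with columns v1 and v2
parallelogram = [(0, 0), v1, (v1[0] + v2[0], v1[1] + v2[1]), v2]  # image of the unit square
area_from_det = abs_det_by_elimination(A)
area_from_polygon = shoelace_area(parallelogram)
```

For this example both computations give the same area.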
Lemma 18.36 The effect of a diffeomorphism on Lebesgue measure. Let $\Omega_1$ and $\Omega_2$ be open subsets of $\mathbb{R}^d$, let $K \subseteq \Omega_1$ be compact and let $g : \Omega_1 \to \Omega_2$ be a continuously differentiable bijective function with $\det(Dg(x)) \ne 0$ for all $x \in \Omega_1$. Then for every $\varepsilon > 0$ there is a $\delta > 0$ so that for every half-open cube $B \subseteq K$ with center point $x$ and with radius less than $\delta$ we have $\lambda(g[B]) - |\det(Dg(x))|\,\lambda(B) < \varepsilon\lambda(B)$.
Proof. Because $K$ is compact, the derivative $Dg$ is uniformly continuous on $K$ and $\|Dg(x)^{-1}\|$ is uniformly bounded on $K$ by an $M > 0$. Moreover, we can find a $\nu > 0$ such that $(1 + \nu)^d$ [...] so that for all $x, y \in K$ with $\|y - x\|$ [...]

18.4. Multidimensional Substitution

[...] 3 with $S_{d-1} : (0, \infty) \times (0, 2\pi) \times (0, \pi)^{d-3} \to \mathbb{R}^{d-1}$ denoting $d - 1$ dimensional spherical coordinates, let
$$S_d(\rho, \theta, \varphi_1, \ldots, \varphi_{d-2}) := \Big(\pi_1\big(S_{d-1}(\rho, \theta, \varphi_1, \ldots, \varphi_{d-3})\big)\sin(\varphi_{d-2}), \ldots, \pi_{d-1}\big(S_{d-1}(\rho, \theta, \varphi_1, \ldots, \varphi_{d-3})\big)\sin(\varphi_{d-2}), \rho\cos(\varphi_{d-2})\Big).$$
Let $J_d$ be the Jacobian of $S_d$. Prove that $|J_d| = \rho\,|\sin(\varphi_{d-2})|^{d-2}\,|J_{d-1}|$ for $d \ge 3$. Hint. The last row of the matrix (which contains the derivatives of the last coordinate) has $\cos(\varphi_{d-2})$ as its first entry and $-\rho\sin(\varphi_{d-2})$ as its last entry. Expand the determinant with respect to the last row (see Exercise 18-33).

Prove that the volume of the $d$-dimensional Euclidean ball of radius $r$ about the origin is [...]

Prove the area formula for circles in $\mathbb{R}^2$. Prove the volume formula for balls in $\mathbb{R}^3$.

Hint. It is not necessary to compute the integrals.
18-46. Let $(M, \Sigma, \mu)$ be a $\sigma$-finite measure space, let $f : M \to [0, \infty]$ be $\Sigma$-measurable and let $(\mathbb{R}, \Sigma_\lambda, \lambda)$ be the real numbers with Lebesgue measure.
(a) Prove that $\int_M f^p \, d\mu = \int_0^\infty p\,t^{p-1} \mu\left(\{x \in M : f(x) > t\}\right) dt$ for all $1 \le p < \infty$. (Measurability is not an issue because of Exercise 14-51a.)
(b) Prove that if $g : (0, \infty) \to (0, \infty)$ is differentiable and $g'(x) > 0$ for all $x$, then $\int_M g \circ f \, d\mu = \int_0^\infty g'(t)\, \mu\left(\{x \in M : f(x) > t\}\right) dt$.

18-47. Let $f, g \in L^1(\mathbb{R}^d)$.
(a) Prove that $C(x, t) := f(x - t)g(t)$ is Lebesgue measurable.
(b) Prove that for almost all $x \in \mathbb{R}^d$ the function $c_x(t) := f(x - t)g(t)$ is Lebesgue integrable and that the function
$$f * g(x) := \begin{cases} \int_{\mathbb{R}^d} f(x - t)g(t) \, dt; & \text{if } c_x \text{ is Lebesgue integrable}, \\ 0; & \text{otherwise}, \end{cases}$$
is Lebesgue integrable and $\|f * g\|_1 \le \|f\|_1 \|g\|_1$. The function $f * g$ is also called the convolution of $f$ and $g$.
(c) Prove that if $h \in L^1(\mathbb{R}^d)$, then $k(x) := h(-x)$ also is Lebesgue integrable.
(d) Prove that $f * g = g * f$.
(e) Prove that if $g$ is bounded, then $f * g$ is continuous. Hint. First prove the result for continuous $f$, then use that the continuous functions are dense in $L^1(\mathbb{R}^d)$.

18-48. Prove that the determinant $\det(v_1, \ldots, v_d)$ is the oriented $d$-dimensional volume of the parallelepiped spanned by the column vectors $v_1, \ldots, v_d$. Hint. Map a box to the parallelepiped and use the Multivariable Substitution Formula.

18-49. Prove that the hypothesis that $L$ is bijective can be dropped from Lemma 18.35. Hint. Prove that if the linear function $L$ is not bijective, then the Lebesgue measure of every image of a Lebesgue measurable set is 0.

18-50. Multidimensional Substitution with weaker hypotheses. Let $\Omega_1, \Omega_2 \subseteq \mathbb{R}^d$ be open subsets of $\mathbb{R}^d$ and let $g : \Omega_1 \to \Omega_2$ be a continuously differentiable function so that for almost all $x \in \Omega_1$ we have $\det(Dg(x)) \ne 0$ and so that $\{x \in \Omega_1 : (\exists z \in \Omega_1 : f(x) = f(z))\}$ is a null set. Then for all [...]
Hint. The set $\{x \in \Omega_1 : \det(Dg(x)) = 0\}$ is closed in $\Omega_1$. Prove that if $K$ is compact, then the set $\{x \in K : (\exists z \in K : f(x) = f(z))\}$ is closed, too. Apply Theorem 18.37 to the appropriate subset of $\Omega_1$ to first prove the result for compactly supported functions.

18-51. On the injectivity hypothesis of Theorem 18.37. Consider the function $g : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $g(x, y) = (x^2 - y^2, 2xy)$ (this function interprets $(x, y)$ as a complex number and squares it).
(a) Prove that for all $(x, y) \in \mathbb{R}^2$ the function $g$ is differentiable at $(x, y)$ and that $\det(Dg(x, y)) = 4x^2 + 4y^2$.
(b) Prove that for all $(x, y) \in \mathbb{R}^2$ we have $\|g(x, y)\|_2 = x^2 + y^2$.
(c) Prove that for all $(a, b) \ne (0, 0)$ we have $g\left([\ldots]\right) = (a, b)$.
(d) Prove that $g[B_1(0, 0)] = B_1(0, 0)$.
(e) Prove that for the function $f(x, y) = 1$ we have $\int_{B_1(0,0)} f \, d\lambda = \pi$, but the transformed integral is $\int_{B_1(0,0)} f \circ g\, |\det(Dg)| \, d\lambda = 2\pi$. Then explain why this result does not contradict
Theorem 18.37. Hint. You may use the result of Exercise 18-42.

18-52. The effect of a diffeomorphism on Lebesgue measure. Let $\Omega_1$ and $\Omega_2$ be open subsets of $\mathbb{R}^d$, let $K \subseteq \Omega_1$ be compact and let $g : \Omega_1 \to \Omega_2$ be a continuously differentiable bijective function with $\det(Dg(x)) \ne 0$ for all $x \in \Omega_1$. Then for every $\varepsilon > 0$ there is a $\delta > 0$ so that for every box $B \subseteq K$ with center point $x$ and with diameter less than $\delta$ we have $\big|\lambda(g[B]) - |\det(Dg(x))|\,\lambda(B)\big| < \varepsilon\lambda(B)$. Hint. The differences between this result and Lemma 18.36 are the absolute value signs and that we can work with boxes rather than cubes. Use Theorem 18.37 with $f = 1$.

18-53. Line integrals and surface integrals are independent of the parametrization.
(a) Let $m \le d$, let $\Omega_1, \Omega_2 \subseteq \mathbb{R}^m$ be open and let $r_1 : \Omega_1 \to \mathbb{R}^d$ and $r_2 : \Omega_2 \to \mathbb{R}^d$ be injective and continuously differentiable so that $r_1[\Omega_1] = r_2[\Omega_2]$ and so that all derivatives $Dr_i(x)$ are injective. Prove that $r_2^{-1} \circ r_1$ is a differentiable function from $\Omega_1$ to $\Omega_2$.
Hint. Let $x \in \Omega_1$. Find $v_{m+1}, \ldots, v_d$ so that $\frac{\partial}{\partial x_1} r_1(x), \ldots, \frac{\partial}{\partial x_m} r_1(x), v_{m+1}, \ldots, v_d$ is a base of $\mathbb{R}^d$ and on $\Omega_1 \times \mathbb{R}^{d-m}$ set $R_1(z_1, \ldots, z_d) := r_1(z_1, \ldots, z_m) + \sum_{j=m+1}^d z_j v_j$. Use Corollary 17.66 to prove that $R_1$ is differentiable with differentiable inverse on a neighborhood of $(x_1, \ldots, x_m, 0, \ldots, 0)$. Define a similar function $R_2$ for $r_2$.
(b) Let $\delta > 0$ and let $r_1 : (a_1 - \delta, b_1 + \delta) \to \mathbb{R}^d$ and $r_2 : (a_2 - \delta, b_2 + \delta) \to \mathbb{R}^d$ be continuously differentiable functions with $r_1[[a_1, b_1]] = r_2[[a_2, b_2]]$, $r_1(a_1) = r_2(a_2)$, $r_1(b_1) = r_2(b_2)$ and so that all derivatives of the $r_j$ are not zero. Let $\Omega \subseteq \mathbb{R}^d$ be an open set that contains $r_1[[a_1, b_1]]$ and let $F : \Omega \to \mathbb{R}^d$ be continuously differentiable. Prove that [...]
Hint. $r_1 = r_2 \circ r_2^{-1} \circ r_1$.
(c) Let $\Omega_1$ and $\Omega_2$ be open subsets of $\mathbb{R}^2$ and let $r_1 : \Omega_1 \to \mathbb{R}^3$ and $r_2 : \Omega_2 \to \mathbb{R}^3$ be continuously differentiable with $r_1[\Omega_1] = r_2[\Omega_2]$ so that for all $(x, y) \in \Omega_1$ we have $\frac{\partial}{\partial x} r_1(x, y) \times \frac{\partial}{\partial y} r_1(x, y) \ne 0$, for all $(u, v) \in \Omega_2$ we have $\frac{\partial}{\partial u} r_2(u, v) \times \frac{\partial}{\partial v} r_2(u, v) \ne 0$, and so that there are points $(x, y) \in \Omega_1$ and $(u, v) \in \Omega_2$ so that $r_1(x, y) = r_2(u, v)$ and $\frac{\partial}{\partial x} r_1(x, y) \times \frac{\partial}{\partial y} r_1(x, y) = h\, \frac{\partial}{\partial u} r_2(u, v) \times \frac{\partial}{\partial v} r_2(u, v)$ for some $h > 0$. Let $\Omega \subseteq \mathbb{R}^3$ be an open set that contains $r_1[\Omega_1]$ and let $F : \Omega \to \mathbb{R}^3$ be continuously differentiable. Prove that [...]
Hint. $r_1 = r_2 \circ r_2^{-1} \circ r_1$. Then work out componentwise that the effect on the cross product is a multiplication with the determinant of the appropriate $2 \times 2$ matrix.

18-54. Continuous images of null sets can have nonzero measure. For this exercise, let the function $\psi : [0, 1] \to [0, 1]$ be Lebesgue's singular function from Exercise 11-22.
(a) Prove that if $g : [a, b] \to \mathbb{R}$ is a nondecreasing function, $h(x) = g(x) + x$ and $A$ is a set with $\lambda(g[A]) > 0$, then $\lambda(h[A]) > 0$.
(b) Prove that $F(x) := \psi(x) + x$ (where $\psi$ is Lebesgue's singular function from Exercise 11-22) defines a continuous bijective function from $[0, 1]$ to $[0, 2]$ so that $F^{-1}$ also is continuous and $\lambda\left(F[C^Q]\right) > 0$.
(c) Prove that for all $d \in \mathbb{N}$ there is a null set $N$ in $\mathbb{R}^d$ and a continuous bijective function $G$ on an open subset of $\mathbb{R}^d$ that contains $N$ so that $\lambda(G[N]) > 0$.

18-55. Prove that if $f : [a, b] \to \mathbb{R}$ is absolutely continuous and nondecreasing and $N \subseteq [a, b]$ is a null set, then $f[N]$ also is a null set.

18-56. Uniform limits of continuous functions with additional properties need not have these additional properties.
(a) Use Exercises 18-54 and 18-55 to prove that if $\{f_n\}_{n=1}^\infty$ is a uniformly convergent sequence of absolutely continuous functions on $[a, b]$, then the limit need not be absolutely continuous.
(b) Prove that if $\{f_n\}_{n=1}^\infty$ is a uniformly convergent sequence of Lipschitz continuous functions on $[a, b]$, then the limit need not be Lipschitz continuous.
Chapter 19
Introduction to Differential Geometry
In applications, it is often necessary to describe surfaces, like the spheres $\{p \in \mathbb{R}^d : \|p\|_2 = r\}$ of radius $r$ in $d$-dimensional space, or, more generally, lower dimensional subsets $S$ of a higher dimensional space. To describe such subsets or surfaces, we use parametrizations, that is, bijective, continuously differentiable functions $g : \Omega \to S$ with injective derivatives $Dg(x)$, where $\Omega$ is an open subset of $\mathbb{R}^m$ for an appropriate $m$. Unfortunately, for many surfaces such an overall parametrization cannot be defined. For example, for the spheres of radius 1 it can be shown (Exercise 19-12) that a parametrization $g$ as mentioned must have a continuous inverse function $g^{-1} : S \to \Omega$. But $S$ is compact and $\Omega$ is not compact, which is impossible by Theorem 16.62. On the other hand, the Implicit Function Theorem guarantees that for many surfaces that are solution sets of equations $f(x, y) = 0$, where both $x$ and $y$ can be in Banach spaces, we can locally find parametrizations. This is the idea behind the definition of a manifold.

This chapter introduces manifolds in Section 19.1 and their tangent spaces and differentiable functions in Section 19.2. Sections 19.3, 19.4, and 19.5 build the integration theory on manifolds and the chapter culminates in Section 19.6 with Stokes’ Theorem.

19.1 Manifolds

The fundamental idea behind a manifold is that it “locally looks like $m$-dimensional space.” This reflects our intuition that differentiable surfaces, despite being curved, locally look like two-dimensional space.
Figure 48: Manifolds (a) are spaces that locally look like $\mathbb{R}^m$. The atlas of a $C^\infty$-manifold is a collection of homeomorphisms $\{x_i\}_{i \in I}$ into the appropriate $\mathbb{R}^m$ so that the compositions $x_i \circ x_j^{-1}$ are diffeomorphisms wherever they are defined. Embedded manifolds (b) are subspaces of $\mathbb{R}^d$ so that each point of the manifold has a neighborhood in $\mathbb{R}^d$ that can be transformed so that the image of the intersection of the manifold with the neighborhood lies in the hyperplane in which the $(m+1)$st through $d$th coordinates are zero.
Definition 19.1 Let $m \in \mathbb{N}$. A metric space $M$ is called an $m$-dimensional (topological) manifold iff for each $p \in M$ there is an open set $O \subseteq M$ so that $p \in O$ and $O$ is homeomorphic to $\mathbb{R}^m$. (See Figure 48.)

Equivalently, in the definition of a manifold we could demand that each $O$ is homeomorphic to an open subset of $\mathbb{R}^m$ (see Exercise 19-1). For further investigations, we also need differentiability properties. Because $M$ is just a metric space, these properties are introduced through differentiability properties of compositions of the homeomorphisms from the definition.

Definition 19.2 Let $U, V \subseteq \mathbb{R}^d$ be open. A bijective, infinitely differentiable function $h : U \to V$ with infinitely differentiable inverse is called a diffeomorphism.

Definition 19.3 Let $m \in \mathbb{N}$ and let $M$ be an $m$-dimensional manifold. A family $\{x_i\}_{i \in I}$ is called an atlas for $M$ iff each $x_i$ is a homeomorphism from an open subset $O_i$ of $M$ to an open subset of $\mathbb{R}^m$, for each $p \in M$ there is an $i \in I$ so that $p \in O_i$ and for all $i, j \in I$ the composition $x_i \circ x_j^{-1} : x_j[O_i \cap O_j] \to x_i[O_i \cap O_j]$ is a diffeomorphism. The functions $x_i$ are also called charts or coordinate systems. (See Figure 48.)

The inverse $x^{-1}$ of a coordinate system $x : U \to \mathbb{R}^m$ can be interpreted as a parametrization of the subset $U$ of $M$.

Definition 19.4 A pair $(M, \{x_i\}_{i \in I})$ of an $m$-dimensional manifold $M$ and an atlas $\{x_i\}_{i \in I}$ is also called an $m$-dimensional $C^\infty$-manifold. As for spaces, we typically refer to $C^\infty$-manifolds through the set $M$, implicitly assuming that an atlas is given.

Of course, every $C^\infty$-manifold is a manifold. A $C^k$-diffeomorphism is a bijective, $k$ times continuously differentiable function with $k$ times continuously differentiable inverse. By using $C^k$-diffeomorphisms instead of diffeomorphisms, it is possible to define $C^k$-manifolds. Working with $C^k$-manifolds would require us to keep track of detailed differentiability conditions. Hence, throughout this chapter we will work with $C^\infty$-manifolds and we will simply refer to them as manifolds. Results similar to the ones derived in this chapter also hold for $C^k$-manifolds.

Trivially $\left(\mathbb{R}^d, \{\mathrm{id}_{\mathbb{R}^d}\}\right)$ is a manifold. Whenever we work with $\mathbb{R}^d$ as a manifold we will assume that the atlas is $\{\mathrm{id}_{\mathbb{R}^d}\}$. Our first nontrivial examples of manifolds are subsets of $d$-dimensional space. These "embedded manifolds" will be of particular interest throughout. They arise frequently in applications and we will use them as examples and for motivation of abstract definitions.
Definition 19.5 Let $m, d \in \mathbb{N}$ be so that $m \le d$. A set $M \subseteq \mathbb{R}^d$ is called an $m$-dimensional embedded manifold iff for every $p \in M$ there is an open neighborhood $U \subseteq \mathbb{R}^d$ of $p$, an open set $V \subseteq \mathbb{R}^d$ and a diffeomorphism $h : U \to V$ for which $h[U \cap M] = \{v \in V : v_{m+1} = \cdots = v_d = 0\}$. (See Figure 48.)

Proposition 19.6 Every $m$-dimensional embedded manifold $M$ is an $m$-dimensional manifold.
Proof. For each $p \in M$, let $U_p$ be an open subset of $\mathbb{R}^d$ that contains $p$ and for which there is a diffeomorphism $h_p : U_p \to V_p$ with $V_p \subseteq \mathbb{R}^d$ open and so that $h_p[U_p \cap M] = \{v \in V_p : v_{m+1} = \cdots = v_d = 0\}$. Let $\pi_{\mathbb{R}^m} : \mathbb{R}^d \to \mathbb{R}^m$ be the projection onto the first $m$ coordinates. For all $p \in M$ let $x_p := \pi_{\mathbb{R}^m} \circ h_p|_{U_p \cap M}$. Clearly, each $x_p$ is a homeomorphism from the set $O_p := U_p \cap M$ to the subset $\pi_{\mathbb{R}^m}\left[\{v \in V_p : v_{m+1} = \cdots = v_d = 0\}\right]$ of $\mathbb{R}^m$. For all $p, q \in M$, the composition $h_q \circ h_p^{-1} : h_p[U_p \cap U_q] \to h_q[U_p \cap U_q]$ is a diffeomorphism. Moreover, the sets $x_p[O_p \cap O_q] = \pi_{\mathbb{R}^m} \circ h_p[U_p \cap U_q \cap M]$ and $x_q[O_p \cap O_q] = \pi_{\mathbb{R}^m} \circ h_q[U_p \cap U_q \cap M]$ are projections of the intersections of open subsets with the subspace $\mathbb{R}^m \times \{0\}^{d-m}$ of $\mathbb{R}^d$, which means they are open subsets of $\mathbb{R}^m$. With $e_{\mathbb{R}^m} : \mathbb{R}^m \to \mathbb{R}^d$ being the natural embedding that maps $\mathbb{R}^m$ to $\mathbb{R}^m \times \{0\}^{d-m}$, the composition $x_q \circ x_p^{-1} = \pi_{\mathbb{R}^m} \circ h_q \circ h_p^{-1} \circ e_{\mathbb{R}^m}\big|_{\pi_{\mathbb{R}^m}[h_p[U_p \cap U_q \cap M]]}$ is a diffeomorphism. Therefore $\{x_p\}_{p \in M}$ is an atlas, and hence $M$ is an $m$-dimensional manifold. ■
Whenever we work with an embedded manifold, we will assume that its atlas was generated as in the proof of Proposition 19.6. Embedded manifolds arise naturally when solving equations.
Theorem 19.7 Let $\Omega \subseteq \mathbb{R}^d$ be open and let $f : \Omega \to \mathbb{R}^{d-m}$ be an infinitely differentiable function so that for all $p \in \Omega$ with $f(p) = 0$ the matrix $Df(p)$ has rank $d - m$. Then $f^{-1}(0)$ is an $m$-dimensional embedded manifold in $\mathbb{R}^d$.
Proof. Let $p \in \Omega$ be so that $f(p) = 0$. Apply Corollary 17.69 to obtain an open set $G \subseteq \mathbb{R}^d$ and a diffeomorphism $g : G \to g[G]$ so that $p \in g[G] \subseteq \Omega$ and $f \circ g(v_1, \dots, v_d) = (v_{m+1}, \dots, v_d)$ for all $(v_1, \dots, v_d) \in G$. Then for all $(v_1, \dots, v_d) \in g^{-1}\bigl[f^{-1}(0) \cap g[G]\bigr]$ we obtain $f \circ g(v_1, \dots, v_d) = 0$, which means $v_{m+1} = \cdots = v_d = 0$. With $U := g[G]$, $V = G$ and $h = g^{-1}$ we see that, because $p$ was arbitrary, $f^{-1}(0)$ is an $m$-dimensional embedded manifold. ∎
Proposition 19.6 and Theorem 19.7 provide a multitude of concrete examples.
Example 19.8 Examples of (embedded) manifolds.
1. Trivially, every open subset $\Omega$ of $\mathbb{R}^d$ is a $d$-dimensional manifold with atlas $\{i_\Omega\}$, where $i_\Omega(p) = p$ for all $p \in \Omega$. This observation is trivial, but it will help with integration over embedded manifolds.
2. Let $r > 0$. Because $f : \mathbb{R}^d \to \mathbb{R}$ defined by $f(x_1, \dots, x_d) := r^2 - \sum_{j=1}^d x_j^2$ is infinitely differentiable, every sphere $\{ p \in \mathbb{R}^d : f(p) = 0 \}$ centered around the origin is a $(d-1)$-dimensional manifold.
3. A function $a : \mathbb{R}^d \to \mathbb{R}$ is called affine linear iff there is a nonzero linear function $L : \mathbb{R}^d \to \mathbb{R}$ and an $r \in \mathbb{R}$ so that $a(x) = r + L[x]$ for all $x \in \mathbb{R}^d$. Because affine linear functions are infinitely differentiable, the intersection of any hyperplane $\{ p \in \mathbb{R}^d : a(p) = 0 \}$ with an open set is a $(d-1)$-dimensional manifold.
4. More generally, the level surfaces $f(x) = k$ of any infinitely differentiable function $f : \mathbb{R}^d \to \mathbb{R}$ with nonzero derivatives $Df(x)$ are $(d-1)$-dimensional manifolds. Under mild hypotheses on how the surfaces intersect, for $n < d$ the intersection of $n$ level surfaces $f_1(x) = k_1, \dots, f_n(x) = k_n$ is a $(d-n)$-dimensional manifold, because we can use $F(x) := (f_1(x) - k_1, \dots, f_n(x) - k_n)$ and apply Proposition 19.6 and Theorem 19.7. □
Exercise 19-2 provides further examples. We can also obtain examples of manifolds by considering subspaces.
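For the sphere in part 2 of Example 19.8, the rank hypothesis of Theorem 19.7 can be checked directly. The short computation below is a sketch of that check, with $m = d - 1$ so that the required rank is $d - m = 1$.

```latex
\[
  f(x_1, \dots, x_d) = r^2 - \sum_{j=1}^d x_j^2,
  \qquad
  Df(p) = \bigl( -2p_1, \; \dots, \; -2p_d \bigr).
\]
% If f(p) = 0, then \sum_{j=1}^d p_j^2 = r^2 > 0, so p \neq 0 and hence Df(p) \neq 0.
% A nonzero 1 \times d matrix has rank 1 = d - m for m = d - 1, so Theorem 19.7
% shows that f^{-1}(0) is a (d-1)-dimensional embedded manifold in \mathbb{R}^d.
```

The same check applies to part 3: for $a(x) = r + L[x]$ the derivative is $L$ at every point, and $L$ is nonzero by assumption.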
Definition 19.9 Let $M$ be a manifold and let $U \subseteq M$ be an open subset of $M$. Then $U$ is called an open submanifold of $M$.
Exercise 19-3 shows that open submanifolds are indeed manifolds themselves. Some interesting sets cannot be described as manifolds. For example, the closed ball $\overline{B_1(0)} \subseteq \mathbb{R}^d$ (with respect to the Euclidean norm) is not a manifold, because none of the boundary points has a neighborhood that is isomorphic to $\mathbb{R}^d$. The observation that these points have neighborhoods that are isomorphic to a half-space leads to the idea of a manifold with boundary.
Definition 19.10 Let $m \in \mathbb{N}$. A metric space $M$ is called an $m$-dimensional (topological) manifold with boundary iff for each point $p \in M$ there is an open neighborhood $O \subseteq M$ of $p$ that is homeomorphic to Euclidean space $\mathbb{R}^m$ or to the upper half space $\mathbb{H}^m := \{ (x_1, \dots, x_m) \in \mathbb{R}^m : x_m \ge 0 \}$. Points $p \in M$ that do not have neighborhoods isomorphic to $\mathbb{R}^m$ are also called boundary points and the set of all these points is called the boundary $\partial M$ of $M$.
To define atlases for manifolds with boundary, we define the following.
Definition 19.11 Let $\Omega \subseteq \mathbb{R}^d$ be an open subset of $\mathbb{R}^d$ and let $B \subseteq \mathbb{R}^d$ be so that $\Omega \subseteq B \subseteq \overline{\Omega}$. Then $C^k(B)$ denotes the set of restrictions to $B$ of functions that are $k$ times differentiable on a neighborhood of $B$. Similarly, $C^\infty(B)$ denotes the set of restrictions to $B$ of functions that are infinitely differentiable on a neighborhood of $B$.
Because the ranges of the coordinate systems of manifolds with boundary are sets as in Definition 19.11, atlases for manifolds with boundary, $C^k$-manifolds with boundary and $C^\infty$-manifolds with boundary are defined similar to atlases for manifolds, $C^k$-manifolds, and $C^\infty$-manifolds, respectively. We will also refer to $C^\infty$-manifolds with boundary simply as manifolds with boundary. Exercise 19-4 shows that the boundary of a manifold with boundary is a manifold. As for manifolds, subsets of $\mathbb{R}^d$ are of particular interest.
Definition 19.12 Let $m, d \in \mathbb{N}$ be so that $m \le d$. A set $M \subseteq \mathbb{R}^d$ is called an $m$-dimensional embedded manifold with boundary iff for every $p \in M$ there is an open set $U \subseteq \mathbb{R}^d$ containing $p$ and an open set $V \subseteq \mathbb{R}^d$ such that either
1. There is a diffeomorphism $h : U \to V$ so that $h(U \cap M) = \{ v \in V : v_{m+1} = \cdots = v_d = 0 \}$, or
2. There is a diffeomorphism $h : U \to V$ so that the $m$th component of $h(p)$ is zero and $h(U \cap M) = \{ v \in V : v_m \ge 0, \ v_{m+1} = \cdots = v_d = 0 \}$.
The reader will prove in Exercise 19-5 that every embedded manifold with boundary is a manifold with boundary. Corners cannot be described with differentiable functions. Therefore, some interesting sets, such as the cube $[0, 1]^d$, do not have a satisfactory description as embedded manifolds with boundary (see Exercise 19-13b). To include these sets, we define manifolds with corners similar to manifolds with boundary.
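A minimal worked instance of Definition 19.12 is the closed upper half space of $\mathbb{R}^d$ itself. The set $M$ below and the choice $m = d$ are made here purely for illustration; they are not an example from the text.

```latex
% Illustrative example (assumed, not from the text): closed upper half space, m = d.
\[
  M := \{ v \in \mathbb{R}^d : v_d \ge 0 \}.
\]
% Case p_d > 0: take U = V = \{ v \in \mathbb{R}^d : v_d > 0 \} and h = \mathrm{id}_U.
% Since m = d, the condition v_{m+1} = \cdots = v_d = 0 is empty, and
\[
  h(U \cap M) = U = V, \qquad \text{so condition 1 holds at } p.
\]
% Case p_d = 0: take U = V = \mathbb{R}^d and h = \mathrm{id}_{\mathbb{R}^d}.
% The m-th (= d-th) component of h(p) is p_d = 0, and
\[
  h(U \cap M) = M = \{ v \in V : v_m \ge 0 \}, \qquad \text{so condition 2 holds at } p.
\]
```

The points with $p_d = 0$ are exactly the ones handled by condition 2, which matches the expectation that they form the boundary.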
Definition 19.13 Let $m \in \mathbb{N}$. A metric space $M$ is called an $m$-dimensional (topological) manifold with corners iff each $p \in M$ has an open neighborhood $O \subseteq M$ that is homeomorphic to a subspace $C_k := \{ (x_1, \dots, x_m) \in \mathbb{R}^m : x_k \ge 0, \dots, x_m \ge 0 \}$ with $k \in \{1, \dots, m\}$ or to $\mathbb{R}^m$. Points $p \in M$ that do not have neighborhoods isomorphic to $\mathbb{R}^m$ are also called boundary points and the set of all these points is called the boundary $\partial M$ of $M$.
Atlases for manifolds with corners, $C^k$-manifolds with corners and $C^\infty$-manifolds with corners are defined similar to atlases for manifolds with boundary, $C^k$-manifolds with boundary, and $C^\infty$-manifolds with boundary, respectively. We will also refer to $C^\infty$-manifolds with corners simply as manifolds with corners. For a $C^\infty$-manifold with corners, we will say that a point $p \in M$ is contained in a corner iff there is a homeomorphism $x$ from a neighborhood of $p$ to a space $C_k$ with $k < m$ and $x(p) = 0$. Embedded manifolds with corners are touched upon in Exercise 19-6. We should also note that, formally, every manifold with boundary is also a manifold with corners. We will typically not unify the two concepts, even when proving results valid for manifolds with corners, because "manifold with corners" should explicitly indicate the presence of corners, while a manifold with boundary is assumed to be smooth.
Exercises
19-1. Let $M$ be a metric space, let $p \in M$, let $O \subseteq M$ be an open neighborhood of $p$, let $V \subseteq \mathbb{R}^m$ be open and let $x : O \to V$ be a homeomorphism. Prove that there is an open neighborhood $U$ of $p$ and a homeomorphism $y : U \to \mathbb{R}^m$. Hint. Consider the restriction of $x$ to the inverse image of a small ball around $x(p)$ and then map that ball diffeomorphically to $\mathbb{R}^m$.
19-2. More examples of manifolds.
(a) Let $a, b, c \in (0, \infty)$. Prove that the ellipsoid $\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$ in $\mathbb{R}^3$ is a 2-dimensional manifold. Also construct an atlas.
(b) Let $\Omega \subseteq \mathbb{R}^2$ and let $f : \Omega \to \mathbb{R}$ be infinitely differentiable with $Df(x) \ne 0$ for all $x \in \Omega$. Prove that every level curve $f(x, y) = k$ is a one-dimensional manifold.
(c) Prove that $SL(n, \mathbb{R}) := \{ A \in M(n \times n, \mathbb{R}) : \det(A) = 1 \}$ is an $n^2 - 1$ dimensional manifold.
19-3. Prove that if $M$ is a manifold and $U$ is an open submanifold, then $U$ is also a manifold.
19-4. Prove that if $M$ is a manifold with boundary, then $\partial M$ is a manifold.
19-5. Let $M \subseteq \mathbb{R}^d$ be an embedded manifold with boundary.
(a) Prove that $M$ is a manifold with boundary.
(b) Prove that the two conditions in the definition of an embedded manifold with boundary are mutually exclusive. That is, every point of $M$ satisfies either 1 or 2, but not both. Hint. Assume a point satisfies both and use Corollary 17.66.
(c) Prove that the points of $M$ that satisfy condition 2 are the boundary points of $M$.
(d) Prove that if $M$ is the closure of an open subset of $\mathbb{R}^d$, then $\partial M = \delta M$, that is, the boundary of $M$ as a manifold with boundary equals its topological boundary.
(e) Prove that $\partial M$ is an embedded manifold.
19-6. Define embedded manifolds with corners and prove results similar to those in Exercise 19-5.
19-7. Prove that if $M$ is a manifold with corners, then $\partial M$ is a union of manifolds with corners.
19-8. Prove that every connected manifold is pathwise connected.
19-9. Prove that every manifold is locally compact and that every connected manifold is $\sigma$-compact.
19-10. Let $\Omega_1, \Omega_2 \subseteq \mathbb{R}^m$ be open sets. Prove that if $g : \Omega_1 \to \Omega_2$ is a $C^1$-diffeomorphism, then $\det(Dg(x)) \ne 0$ for all $x \in \Omega_1$.
19-11. Prove that a set $M \subseteq \mathbb{R}^d$ is an $m$-dimensional embedded manifold in $\mathbb{R}^d$ iff for each $p \in M$ there are an open neighborhood $G \subseteq \mathbb{R}^d$, an open set $N \subseteq \mathbb{R}^m$ and an injective infinitely differentiable function $f : N \to G$ so that $f[N] = M \cap G$, $Df(z)$ has rank $m$ for all $z \in N$ and $f^{-1} : f[N] \to N$ is continuous. Hint. For "$\Leftarrow$," let $x \in N$ be so that $f(x) = p$ and let the vectors $v_{m+1}, \dots, v_d \in \mathbb{R}^d$ be so that the set $\left\{ \frac{\partial f}{\partial x_1}(x), \dots, \frac{\partial f}{\partial x_m}(x), v_{m+1}, \dots, v_d \right\}$ is a base of $\mathbb{R}^d$. Then apply Corollary 17.66 to the function $\tilde{f}(z_1, \dots, z_d) = f(z_1, \dots, z_m) + \sum_{j=m+1}^d z_j v_j$.
19-12. Let $\Omega \subseteq \mathbb{R}^2$ be open. Prove that no function $f : \Omega \to \{ p \in \mathbb{R}^2 : \|p\|_2 = 1 \}$ can be bijective, continuously differentiable and so that all $Df(x)$ are injective. Hint. Use an idea similar to that for Exercise 19-11 to prove that $f^{-1}$ is continuous.
19-13. Corners cannot be described with embedded manifolds. Let $d \ge 2$.
(a) Prove that the graph $\{ (t, |t|, 0, \dots, 0) : t \in (-1, 1) \}$ of the absolute value function is not an embedded manifold in $\mathbb{R}^d$. Hint. Use Exercise 19-11.
(b) Prove that the cube $[0, 1]^d$ is not an embedded manifold with boundary in $\mathbb{R}^d$. Hint. Use Exercise 19-5e and arguments similar to part 19-13a.
19-14. Let $M$ be a manifold with corners.
(a) Prove that if $p \in M$ is in a corner isomorphic to the origin in $C_k$, then there are continuous functions $c_k, \dots, c_m : [0, 1] \to M$ so that each $c_i\bigl[[0, 1]\bigr]$ is contained in a corner and the intersection of any two distinct sets $c_i\bigl[[0, 1]\bigr]$ is $\{p\}$.
(b) Prove that the set $C_M := \{ p \in M : p \text{ is in a corner} \}$ is closed.
(c) Prove that if $M$ is compact, then there are finitely many continuous functions $c_1, \dots, c_n$ so that $C_M = \bigcup_{i=1}^n c_i\bigl[[0, 1]\bigr]$.
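19-15. Prove that if $\mathcal{A}$ is an atlas of the manifold $M$, then the set $\bigcup \{ \tilde{\mathcal{A}} : \mathcal{A} \subseteq \tilde{\mathcal{A}} \text{ and } \tilde{\mathcal{A}} \text{ is an atlas of } M \}$ is also an atlas of $M$. Note. This atlas is called the maximal atlas of $M$.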
19.2 Tangent Spaces and Differentiable Functions
The definition of differentiable functions on manifolds (see Definition 19.14 below) is motivated by the fact that compositions of differentiable functions are again differentiable. However, rather than using differentiability of the factors to obtain differentiability of the composition, differentiability of the composition is used to define differentiability of the middle factor. Exercise 19-16a shows that this idea is consistent with the idea of differentiability according to Definition 17.24.
Definition 19.14 Let $M, N$ be manifolds and let $f : M \to N$ be a function. Then $f$ is called differentiable iff for all $x$ in the atlas of $M$ and all $y$ in the atlas of $N$ the composition $y \circ f \circ x^{-1}$ is differentiable.
Note that Definition 19.14 does not depend on the atlases used for $M$ and $N$, as long as there is a containment relation between the atlases for each manifold. This is because if $x_1$ and $x_2$ are coordinate systems of $M$ with overlapping domains, then the composition $x_1 \circ x_2^{-1}$ is differentiable and similar for coordinate systems $y_1, y_2$ for $N$. Therefore differentiability of $y_1 \circ f \circ x_1^{-1}$ implies differentiability of $y_2 \circ f \circ x_2^{-1}$ on a subset of its domain and the domain of $y_2 \circ f \circ x_2^{-1}$ can be pieced together from overlaps with domains of similar compositions using functions $x_1$ and $y_1$ from smaller atlases of $M$ and $N$, respectively. The details are left to Exercise 19-16b. Proposition 19.15 and Exercise 19-17 show that differentiable functions on manifolds behave as we would expect differentiable functions to behave when it comes to domain restrictions and compositions.
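The independence of Definition 19.14 from the atlas rests on one factorization. A sketch, using coordinate systems $x_1, x_2$ of $M$ and $y_1, y_2$ of $N$ as in the remark following the definition:

```latex
\[
  y_2 \circ f \circ x_2^{-1}
  = \bigl( y_2 \circ y_1^{-1} \bigr)
    \circ \bigl( y_1 \circ f \circ x_1^{-1} \bigr)
    \circ \bigl( x_1 \circ x_2^{-1} \bigr).
\]
% The identity holds at every point z in the domain of x_2^{-1} for which
% x_2^{-1}(z) lies in the domain of x_1 and f(x_2^{-1}(z)) lies in the
% domains of both y_1 and y_2. The outer two factors are differentiable
% transition maps, so on this set differentiability of the middle factor
% carries over to the left side by the Chain Rule.
```

Covering the domain of $y_2 \circ f \circ x_2^{-1}$ by such sets is the piecing-together step that Exercise 19-16b asks for.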
Proposition 19.15 Let $M, N$ be manifolds, let $U \subseteq M$ be an open submanifold and let $f : M \to N$ be a differentiable function. Then $f|_U : U \to N$ also is differentiable.
Proof. Exercise 19-18.
Before we show that there are indeed many examples of differentiable functions, we introduce higher orders of differentiability. Throughout, unless otherwise stated, the dimension of a manifold $M$ will be $m$ and the dimension of a manifold $N$ will be $n$.
Definition 19.16 Let $M, N$ be manifolds, let $k \in \mathbb{N}$ and let $U \subseteq M$ be open. Then $f : M \to N$ is a $C^k$ function on $U$ iff for all coordinate systems $x : V \to \mathbb{R}^m$ and $y : W \to \mathbb{R}^n$ the composition $y \circ f \circ x^{-1}|_{x[U \cap V \cap f^{-1}[W]]}$ is a $C^k$ function. A function that is $C^k$ for all $k \in \mathbb{N}$ is called $C^\infty$. A bijective $C^k$ function whose inverse also is $C^k$ is called a $C^k$-diffeomorphism.
Proposition 19.17 Let $M$ be a manifold. A function $f : M \to \mathbb{R}$ is $C^\infty$ iff for each $p \in M$ there is a neighborhood $U_p$ of $p$ so that $f$ is $C^\infty$ on $U_p$.
Proof. Exercise 19-20. ∎
The next result is a translation of Lemma 18.11 to manifolds.
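Theorem 19.18 Let $M$ be a manifold, let $C \subseteq M$ be compact and let $U \subseteq M$ be open so that $C \subseteq U$. Then there is a $C^\infty$ function $f : M \to [0, 1]$ so that $f|_C = 1$ and $\mathrm{supp}(f) \subseteq U$.
Proof. Let $p \in C$ and let $x : V \to \mathbb{R}^m$ be a coordinate system around $p$. Then there is an $\varepsilon_p > 0$ so that $B_{\varepsilon_p}(p) \subseteq U \cap V$. Because $x^{-1}$ is continuous, the image $U_{\mathbb{R}^d} := x\bigl[B_{\varepsilon_p/3}(p)\bigr]$ is open and $C_{\mathbb{R}^d} := x\bigl[\overline{B_{\varepsilon_p/4}(p)} \cap C\bigr]$ is closed. With $1_{C_{\mathbb{R}^d}, U_{\mathbb{R}^d}}$ as in Lemma 18.11 let $f_p(q) := 1_{C_{\mathbb{R}^d}, U_{\mathbb{R}^d}} \circ x(q)$ for all $q \in U \cap V$ and let $f_p(q) := 0$ for all other $q \in M$. It is easy to see that $\mathrm{supp}(f_p) \subseteq \overline{B_{\varepsilon_p/3}(p)} \subseteq U$.
To prove that $f_p$ is $C^\infty$, let $q \in M$. If $f_p$ is identical to zero in a neighborhood of $q$, then clearly $f_p$ is $C^\infty$ in that neighborhood. If $f_p$ is not identical to zero in any neighborhood of $q$, then $q \in \mathrm{supp}(f_p) \subseteq \overline{B_{\varepsilon_p/3}(p)}$. Let $y$ be any coordinate system around $q$ and let $\varepsilon \in \bigl(0, \frac{\varepsilon_p}{2}\bigr)$ be so that $B_\varepsilon(q)$ is contained in the domain of $y$. Because $\varepsilon < \frac{\varepsilon_p}{2}$ and $q \in \mathrm{supp}(f_p)$, $B_\varepsilon(q)$ is also contained in the domain of $x$. Thus $f_p \circ y^{-1}|_{y[B_\varepsilon(q)]} = 1_{C_{\mathbb{R}^d}, U_{\mathbb{R}^d}} \circ x \circ y^{-1}|_{y[B_\varepsilon(q)]}$ is $C^\infty$. Because $q \in M$ was arbitrary, $f_p$ is $C^\infty$.
Because $C \subseteq \bigcup_{p \in C} B_{\varepsilon_p/4}(p)$ and $C$ is compact, there are points $p_1, \dots, p_n \in C$ so that $C \subseteq \bigcup_{j=1}^n B_{\varepsilon_{p_j}/4}(p_j)$. Now $g := \sum_{j=1}^n f_{p_j}$ is $C^\infty$, $g|_C \ge 1$ and $\mathrm{supp}(g) \subseteq U$. Hence, $f := j_1 \circ g$ (see Lemma 18.9 for the definition of $j_1$) is as desired. ∎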
Figure 49: If $M$ is an embedded manifold and $h$ is as in Definition 19.5, then the derivative of $h^{-1}$ maps the horizontal space into which $h[U \cap M]$ is embedded to a space that is tangential to $M$.
Although Definition 19.14 defines differentiability on a manifold, it is not satisfactory by itself. After all, differentiable functions have a derivative and Definition 19.14 does not tell us what the derivative is or what it should be. The problem is that a manifold does not have any linear structure. Without linear structure there is no way to shift a linear function $L[\,\cdot - x]$ so that $f(x) + L[\,\cdot - x]$ is locally a good approximation of $f$, as we did in Definition 17.24 (also see Exercise 17-25). In fact, we cannot even define linear functions, which means on a manifold we must start from scratch. For embedded manifolds, there is a notion of differentiability in the surrounding space. Hence, we will use embedded manifolds as guidance. Throughout, we will make sure that our newly defined notions are consistent with what we should expect for embedded manifolds.
So consider an embedded manifold $M \subseteq \mathbb{R}^d$ and let $h : U \to V$ be as in Definition 19.5. Then for all $p \in U \cap M$ the function $h^{-1}$ is differentiable at $h(p)$ and near $p$ the affine function $h^{-1}(h(p)) + Dh^{-1}(h(p))[\,\cdot - h(p)]$ is tangential to $h^{-1}$ (see Figure 49). With $\mathbb{R}^m \times \{0\}^{d-m} := \{ v \in \mathbb{R}^d : v_{m+1} = \cdots = v_d = 0 \}$ there is an open set $V$ in $\mathbb{R}^d$ so that $h$ maps $U \cap M$ to the set $V \cap (\mathbb{R}^m \times \{0\}^{d-m})$. It follows directly from the definition of differentiability that for every $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in V \cap (\mathbb{R}^m \times \{0\}^{d-m})$ with $\|z - h(p)\| < \delta$ we have $\bigl\| h^{-1}(z) - h^{-1}(h(p)) - Dh^{-1}(h(p))[z - h(p)] \bigr\| \le \varepsilon \|z - h(p)\|$. Therefore, the function $p + Dh^{-1}(h(p))[\,\cdot - h(p)]\big|_{\mathbb{R}^m \times \{0\}^{d-m}}$ is tangential to $h^{-1}\big|_{V \cap (\mathbb{R}^m \times \{0\}^{d-m})}$. Geometrically, this means that the set of points $p + Dh^{-1}(h(p))\bigl[\mathbb{R}^m \times \{0\}^{d-m}\bigr]$ is the tangent space of $M$ at $p$ (see Figure 49). It can be shown via the Chain Rule that this tangent space does not depend on the choice of $h$ (see Exercise 19-19). Moreover, if $M$ and $N$ are embedded manifolds and $f$ is a differentiable function on a neighborhood of $M$ so that $f[M] \subseteq N$, then the derivative of $f$ at $p$ maps the above defined tangent space of $M$ at $p$ to the similarly defined tangent space of $N$ at $f(p)$.
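To make the description of the tangent space of an embedded manifold concrete, the sketch below computes it for the graph of a smooth function. The function $\varphi$, the splitting of points of $\mathbb{R}^d$ as $(u, w)$ with $u \in \mathbb{R}^m$, $w \in \mathbb{R}^{d-m}$, and the parabola at the end are illustrative assumptions, not examples from the text.

```latex
% Illustration only: \varphi : \mathbb{R}^m \to \mathbb{R}^{d-m} is infinitely differentiable,
% M = \{ (u, \varphi(u)) : u \in \mathbb{R}^m \}, and h(u, w) = (u, w - \varphi(u)),
% which satisfies Definition 19.5 with U = V = \mathbb{R}^d.
\[
  h^{-1}(u, w) = \bigl(u,\; w + \varphi(u)\bigr), \qquad
  Dh^{-1}(u, w)[(a, b)] = \bigl(a,\; b + D\varphi(u)[a]\bigr).
\]
% For p = (u_0, \varphi(u_0)) we have h(p) = (u_0, 0), so on \mathbb{R}^m \times \{0\}^{d-m}
\[
  Dh^{-1}(h(p))[(a, 0)] = \bigl(a,\; D\varphi(u_0)[a]\bigr),
\]
% and the tangent space of M at p is
\[
  p + Dh^{-1}(h(p))\bigl[\mathbb{R}^m \times \{0\}^{d-m}\bigr]
  = \bigl\{ \bigl(u_0 + a,\; \varphi(u_0) + D\varphi(u_0)[a]\bigr) : a \in \mathbb{R}^m \bigr\}.
\]
% Example with d = 2, m = 1, \varphi(u) = u^2 and p = (1, 1):
% the tangent space is the line \{ (1 + a, 1 + 2a) : a \in \mathbb{R} \}.
```

This is the familiar tangent plane of a graph, which is the consistency with the classical picture that the passage above aims for.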
This means we can reinterpret derivatives as functions that map the right tangent spaces to each other. But for an abstract manifold there is no surrounding space in which to define tangent spaces on which the derivative of a function $f : M \to N$ could operate. Thus we first need to create tangent spaces. There are numerous ways in which tangent spaces can be defined in differential geometry and they are all equivalent in a certain sense. We choose a simple definition here and reinterpret it when this becomes conducive to the investigation in Section 19.4.
If $x$ is a coordinate system for the manifold $M$, then for every element $p$ in the domain of $x$ the space $\mathbb{R}^m$ is the tangent space (of $\mathbb{R}^m$ itself) at $x(p)$. This is because the only space that could be tangential to an open set is the surrounding space itself. Proposition 19.19 shows that if $x$ and $y$ are coordinate systems of $M$ and $p$ is in the domains of $x$ and $y$, then the tangent vectors in the tangent spaces at points $x(p)$ and $y(p)$ can be identified in such a way that they are usable as tangent vectors for $M$ at $p$. Proposition 19.20 shows that this idea is consistent with the idea of a tangent space for embedded manifolds as described above.
The remaining results in this section will involve a lot of work with equivalence relations. Because every point of a manifold can be in the domain of several coordinate systems, at each step we must assure that our definitions do not depend on the specific coordinate system we use. This level of detail would be hard to work with on a regular basis. It would be similar to recalling that formally every real number is an equivalence class of Cauchy sequences of rational numbers (see remarks after the proof of Theorem 16.89) whenever we work with real numbers. Clearly, this would be overkill.
Therefore, we will establish that tangent spaces and the functions that take the place of derivatives behave as they should and we will subsequently use these properties, with the details of the definitions only to be used when this is unavoidable.
Proposition 19.19 Let $M$ be an $m$-dimensional manifold and let $p \in M$. For all $v, w \in \mathbb{R}^m$ and for all coordinate systems $x, y$ so that $p$ is in the domains of $x$ and $y$, define the relation $\sim_p$ by $(x, v) \sim_p (y, w)$ iff $w = D\bigl(y \circ x^{-1}\bigr)(x(p))[v]$. Then the relation $\sim_p$ is an equivalence relation. Moreover, if we denote the equivalence classes by $[x, v]_p$ and the set of equivalence classes by $M_p$, then the binary operations $[x, v]_p + [x, w]_p := [x, v + w]_p$ and $\alpha [x, v]_p := [x, \alpha v]_p$, where $\alpha \in \mathbb{R}$, are well-defined and with these operations $M_p$ is a vector space that is isomorphic to $\mathbb{R}^m$.
Proof. Left to the reader as Exercise 19-21. The proof that $\sim_p$ is well-defined relies on the formula for the derivative of the inverse function for symmetry and on the Chain Rule for transitivity. ∎
Proposition 19.20 Let $M \subseteq \mathbb{R}^d$ be an $m$-dimensional embedded manifold, let $p \in M$ and let $x : U \to \mathbb{R}^m$ be a coordinate system around $p$ as constructed in Proposition 19.6. Then $x^{-1}$ is differentiable in the sense of Definition 17.24 and the "embedded tangent space" $M_p^{\mathrm{emb}} := \{p\} \times Dx^{-1}(x(p))[\mathbb{R}^m]$ is isomorphic to $M_p$ via the isomorphism $F((p, v)) := \bigl[x, \bigl(Dx^{-1}(x(p))\bigr)^{-1}[v]\bigr]_p$.
Proof. Exercise 19-22. ∎
Proposition 19.19 introduces an object $M_p$ that could serve as a tangent space for $M$ at $p$ and Proposition 19.20 shows that for embedded manifolds there is a natural isomorphism between $M_p$ and the space that we would expect to be the tangent space. Thus we call $M_p$ the tangent space of $M$ at $p$.
Definition 19.21 The space $M_p$ of Proposition 19.19 will be called the tangent space of $M$ at $p$. The set $TM := \bigcup_{p \in M} M_p$ is called the tangent bundle of $M$.
Simplistically speaking, at every $p \in M$ Definition 19.21 merely tacks a tangent space on to the manifold. However, even if this was all, this approach is consistent with an important physical motivation. Vectors in physics have magnitude and direction, just like vectors in mathematics. But vectors in physics also have a point of action. For example, consider a car that moves in a straight line at constant speed. The magnitude and direction of the force that the car exerts on the particles in front of it stay the same, independent of whether these particles are air or whether they constitute another car. Obviously, the effect is different in either situation. If the force acts on a set of air molecules, the car travels regularly. If it acts on another car, the car crashes. To give vectors in mathematics a point of action, we need to incorporate the point of action into the definition of the vector. This is done in the definition of $TM$. For $(\mathbb{R}^d, \{\mathrm{id}_{\mathbb{R}^d}\})$, note that $\mathbb{R}^d_p$ is just $\{ [\mathrm{id}_{\mathbb{R}^d}, v]_p : v \in \mathbb{R}^d \}$, which is consistent with the idea that our vectors now have a point of action $p$. Similarly, for an open set $\Omega \subseteq \mathbb{R}^d$ considered as a manifold $(\Omega, \{i_\Omega\})$ we obtain $\Omega_p = \{ [i_\Omega, v]_p : v \in \mathbb{R}^d \}$ for the tangent space. Although these realizations are trivial, they will be very useful when we consider integration over embedded manifolds.
Now that tangent spaces are defined, we can tackle the definition of a "derivative" on $M$. Consider embedded manifolds $M$ and $N$ and a differentiable function $f$ defined on a neighborhood of $M$ so that $f[M] \subseteq N$. Then the Chain Rule implies $Df(p)\bigl[ Dh^{-1}(h(p))\bigl[\mathbb{R}^m \times \{0\}^{d-m}\bigr] \bigr] = D\bigl(f \circ h^{-1}\bigr)(h(p))\bigl[\mathbb{R}^m \times \{0\}^{d-m}\bigr]$, which is contained in the tangent space of $N$ at $f(p)$, and it is equal to the tangent space if we assume that $f$ is a diffeomorphism. Therefore derivatives on manifolds should map the tangent space at a point into the tangent space at the image point (see Figure 50).
Proposition 19.22 shows that it is possible to define such a mapping on the tangent vectors and Proposition 19.24 shows that this map is consistent with what we expect it to do for embedded manifolds.
Proposition 19.22 Let $M, N$ be manifolds, let the function $f : M \to N$ be differentiable, let $p \in M$, let $x$ be a coordinate system around $p$ and let $y$ be a coordinate system around $f(p)$. Then $f_{*p}\bigl([x, v]_p\bigr) := \bigl[y, D\bigl(y \circ f \circ x^{-1}\bigr)(x(p))[v]\bigr]_{f(p)}$ defines a linear function $f_{*p} : M_p \to N_{f(p)}$.
Proof. We first need to prove that $f_{*p}$ is well-defined. Let $(x_1, v) \sim_p (x_2, w)$ and let $y_1$ and $y_2$ be coordinate systems about $f(p)$. Then
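The computation that this step calls for can be sketched with the Chain Rule as follows. This is a reconstruction in the notation of Propositions 19.19 and 19.22, offered as one way to carry out the argument, not necessarily the author's own display.

```latex
% Hypothesis: (x_1, v) \sim_p (x_2, w), that is, w = D(x_2 \circ x_1^{-1})(x_1(p))[v].
\begin{align*}
  D\bigl(y_2 \circ f \circ x_2^{-1}\bigr)(x_2(p))[w]
  &= D\bigl(y_2 \circ f \circ x_2^{-1}\bigr)(x_2(p))
     \Bigl[ D\bigl(x_2 \circ x_1^{-1}\bigr)(x_1(p))[v] \Bigr] \\
  &= D\bigl(y_2 \circ f \circ x_1^{-1}\bigr)(x_1(p))[v] \\
  &= D\bigl(y_2 \circ y_1^{-1}\bigr)\bigl(y_1(f(p))\bigr)
     \Bigl[ D\bigl(y_1 \circ f \circ x_1^{-1}\bigr)(x_1(p))[v] \Bigr].
\end{align*}
% By the definition of \sim_{f(p)} this says
\[
  \Bigl( y_1,\; D\bigl(y_1 \circ f \circ x_1^{-1}\bigr)(x_1(p))[v] \Bigr)
  \sim_{f(p)}
  \Bigl( y_2,\; D\bigl(y_2 \circ f \circ x_2^{-1}\bigr)(x_2(p))[w] \Bigr),
\]
% so both choices of coordinate systems give the same element of N_{f(p)}.
```

Linearity then follows because, for fixed $x$ and $y$, the map $v \mapsto D\bigl(y \circ f \circ x^{-1}\bigr)(x(p))[v]$ is linear and the operations on $M_p$ and $N_{f(p)}$ are computed on representatives with a common coordinate system.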
Figure 50: The tangent space $M_p$ of a manifold $M$ at a point $p$ (see Definition 19.21) can be viewed as a tangential plane attached at $p$. For embedded manifolds, $M_p$ can be considered to be the image of $\mathbb{R}^m$ under the derivative of the right parametrization (see Proposition 19.20). For a differentiable function $f : M \to N$ the map $f_*$ (see Proposition 19.22) plays the role of the derivative. If $M$ and $N$ are both embedded manifolds and $f$ is a differentiable function from a neighborhood of $M$ to a neighborhood of $N$, then $f_*$ combines the function and the derivative (see Proposition 19.24).
Definition 19.23 We denote the function from $TM$ to $TN$ whose restriction to each $M_p$ is $f_{*p}$ by $f_*$. Moreover, unless necessary, we will not explicitly mention $p$ and thus denote $f_{*p}$ by $f_*$, too.
Proposition 19.24 Let $M, N$ be embedded manifolds, let $p \in M$ and let $f$ be a differentiable function from a neighborhood of $M$ to a neighborhood of $N$. Then for all $(p, v) \in M_p^{\mathrm{emb}}$ the function $f_*^{\mathrm{emb}}((p, v)) := (f(p), Df(p)[v])$ satisfies the equation $f_* = F_N \circ f_*^{\mathrm{emb}} \circ F_M^{-1}$, where $F_M$ and $F_N$ are the isomorphisms from Proposition 19.20 for $M$ and $N$, respectively.
Proof. Let $x$ be the coordinate system around $p$ that is used to construct $F_M$ and let $y$ be the coordinate system around $f(p)$ that is used to construct $F_N$. Then
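The chain of equalities needed at this point can be sketched as follows. This is a reconstruction under the assumption that $f[M] \subseteq N$ (so that $y \circ f$ is defined near $p$), and it reads $\bigl(Dy^{-1}(y(f(p)))\bigr)^{-1}$ as the inverse of $Dy^{-1}(y(f(p)))$ on its image, as in Proposition 19.20.

```latex
% Let u \in \mathbb{R}^m. By the definition of F_M in Proposition 19.20,
\[
  F_M^{-1}\bigl([x, u]_p\bigr) = \bigl(p,\; Dx^{-1}(x(p))[u]\bigr).
\]
% Applying f_*^{emb} and the Chain Rule:
\[
  f_*^{\mathrm{emb}}\bigl(p,\; Dx^{-1}(x(p))[u]\bigr)
  = \bigl(f(p),\; Df(p)\bigl[Dx^{-1}(x(p))[u]\bigr]\bigr)
  = \bigl(f(p),\; D\bigl(f \circ x^{-1}\bigr)(x(p))[u]\bigr).
\]
% Because f \circ x^{-1} = y^{-1} \circ (y \circ f \circ x^{-1}) near x(p),
\[
  D\bigl(f \circ x^{-1}\bigr)(x(p))[u]
  = Dy^{-1}\bigl(y(f(p))\bigr)\Bigl[ D\bigl(y \circ f \circ x^{-1}\bigr)(x(p))[u] \Bigr],
\]
% so applying F_N gives
\[
  F_N\bigl(f(p),\; D\bigl(f \circ x^{-1}\bigr)(x(p))[u]\bigr)
  = \Bigl[ y,\; D\bigl(y \circ f \circ x^{-1}\bigr)(x(p))[u] \Bigr]_{f(p)}
  = f_*\bigl([x, u]_p\bigr).
\]
```

Since every element of $M_p$ has the form $[x, u]_p$ for this particular $x$, the identity $f_* = F_N \circ f_*^{\mathrm{emb}} \circ F_M^{-1}$ follows on all of $M_p$.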
As noted in the beginning, the details of chasing these compositions through the equivalence classes are too cumbersome to do on a regular basis. To avoid this level of detail, we accept that $TM$ really defines the tangent spaces and that $f_*$ is the "derivative" of a function $f : M \to N$, and we subsequently use the fundamental properties of these entities rather than their definitions. For manifolds with boundary and manifolds with corners, we can also define tangent spaces. Because differentiable functions on sets that are not closed are restrictions of differentiable functions on larger sets, the derivatives $D\bigl(y \circ x^{-1}\bigr)$ are also defined for boundary points. This means we can say the following.
Definition 19.25 If $M$ is a manifold with boundary or a manifold with corners, the tangent bundle $TM$ is defined in the same way as for manifolds.
For points on the boundary of the manifold, there are two kinds of tangent vectors.
Definition 19.26 Let $M$ be a manifold with boundary and let $p \in \partial M$. A tangent vector $[x, v]_p$ will be called outward pointing iff $v_m < 0$ and it will be called inward pointing iff $v_m > 0$. For $p$ being in the boundary of a manifold with corners the vector $[x, v]_p$ will be called outward pointing iff $x(p) + v \notin C_k$. The vector $[x, v]_p$ will be called inward pointing iff $x(p) + v \in (C_k)^\circ$.
With tangent spaces defined, it is now easy to define (tangential) vector fields.
Definition 19.27 Let $M$ be a manifold. A function $F : M \to TM$ so that $F(p) \in M_p$ for all $p \in M$ is called a vector field on $M$.
Finally, note that even though we assumed throughout that our manifolds were $C^\infty$ manifolds, everything in this section can be defined and proved for $C^1$ manifolds.
Exercises
19-16. Consistency of Definition 19.14 with the original definition of differentiability and with itself.
(a) Let $\Omega_1, \Omega_1' \subseteq \mathbb{R}^m$ and $\Omega_2, \Omega_2' \subseteq \mathbb{R}^n$ be open sets, let $f : \Omega_1 \to \Omega_2$ be a function and let $\varphi : \Omega_1 \to \Omega_1'$ and $\psi : \Omega_2 \to \Omega_2'$ be differentiable bijective functions with differentiable inverse. Prove that $f$ is differentiable iff $\psi \circ f \circ \varphi^{-1}$ is differentiable.
(b) Let $M$ be a manifold with atlas $\mathcal{A}_M$, let $N$ be a manifold with atlas $\mathcal{A}_N$ and let $f : M \to N$ be so that for all $x_A \in \mathcal{A}_M$ and all $y_A \in \mathcal{A}_N$ the composition $y_A \circ f \circ x_A^{-1}$ is differentiable. Let $\tilde{\mathcal{A}}_M \supseteq \mathcal{A}_M$ and $\tilde{\mathcal{A}}_N \supseteq \mathcal{A}_N$ be atlases of $M$ and $N$, let $x : U \to \mathbb{R}^m$ be in $\tilde{\mathcal{A}}_M$ and let $y : V \to \mathbb{R}^n$ be in $\tilde{\mathcal{A}}_N$ so that the composition $y \circ f \circ x^{-1}$ has nonempty domain. Prove that $y \circ f \circ x^{-1}$ is differentiable.
19-17. Let $M, N, O$ be manifolds and let $f : M \to N$ and $g : N \to O$ be differentiable. Prove that $g \circ f$ is differentiable and that $(g \circ f)_* = g_* \circ f_*$.
19-18. Prove Proposition 19.15.
19-19. Prove that if $M$ is an embedded manifold and $h : U \to V$ and $k : \tilde{U} \to \tilde{V}$ are as in Definition 19.5, then $Dh^{-1}(h(p))\bigl[\mathbb{R}^m \times \{0\}^{d-m}\bigr] = Dk^{-1}(k(p))\bigl[\mathbb{R}^m \times \{0\}^{d-m}\bigr]$.
19-20. Prove Proposition 19.17.
19-21. Prove Proposition 19.19. That is, prove that the relation $\sim_p$ is an equivalence relation and that the vector addition and scalar multiplication are well-defined.
19-22. Prove Proposition 19.20.
19-23. Let $M$ be a manifold. Prove that if $\mathrm{id} : M \to M$ is the identity, then $\mathrm{id}_* : TM \to TM$ also is the identity.
19-24. Let $M, N$ be manifolds and let $U \subseteq M$ be an open submanifold of $M$.
(a) Prove that the tangent bundle $TU = \bigcup_{p \in U} U_p$ of $U$ is equal to $\bigcup_{p \in U} M_p$.
(b) Prove that $(f|_U)_* = f_*|_{TU}$ for all differentiable functions $f : M \to N$.
19-25. Let $M$ be an $m$-dimensional manifold. Prove that $TM$ is a $2m$-dimensional manifold.
19-26. Prove that if $M \subseteq \mathbb{R}^d$ is an embedded $C^\infty$ manifold and $\Omega$ is an open neighborhood of $M$, then for every $f \in C^k(\Omega)$ the restriction $f|_M$ is $C^k$ on $M$.
19-27. Let $M$ be a connected manifold, let $C \subseteq M$ be closed (but not necessarily compact) and let $U \subseteq M$ be open so that $C \subseteq U$. Prove that there is a $C^\infty$ function $f : M \to [0, 1]$ so that $f|_C = 1$ and $\mathrm{supp}(f) \subseteq U$.
19.3 Differential Forms, Integrals Over the Unit Cube
As we start our investigation of integration on manifolds, we first note the following three shortcomings of the tools we currently have available.
First, although vector fields as in Definition 19.27 are important, they have a mortal flaw for applications in fluid mechanics and field theory. Any vector field that is defined as a map from the manifold into the tangent bundle is necessarily tangential to the manifold. In fluid mechanics and field theory, manifolds are test surfaces and vector fields usually go through these test surfaces or at least they have a component that causes some "transfer" through the surface. Obviously such vector fields cannot be modeled with the tangent bundle.
Second, a typical application of the interplay between fields and surfaces is the computation of how much "matter" a vector field transfers through a test surface. To compute this quantity, we need to integrate the field over the manifold. In $\mathbb{R}^d$, integrals usually are computed with Fubini's Theorem and we do not give much thought to the fact that the coordinate directions are provided by the standard base. On a manifold there is no standard base and each neighborhood of a point has infinitely many parametrizations via coordinate systems. Thus we cannot just pick one coordinate system and use the standard base in its image space.
Third, lower dimensional objects, like two dimensional surfaces in three dimensional space, typically have measure zero in the higher dimensional space. Therefore, we need a notion of integration that gives nonzero integrals, even if our manifold resides in a higher dimensional space.
The above only reemphasizes that the requisite definitions will require a lot of attention to detail. Differential forms will allow us to model vector fields that are not necessarily parallel to the manifold. The integral of such a form over a manifold will be pieced together from simpler integrals. The simplest such integral is the integral of a differential form over a cube, which is presented in this section. In Section 19.4, cubes in $\mathbb{R}^m$ are lifted to the manifold as $k$-cubes and the differential forms will be $k$-forms on the manifold. Finally, Section 19.5 defines the integral of a form over the whole manifold. Some definitions will be repeated in this presentation, but the insight gained by first working out the details in a simple setting will be well worth it. (Compare with the double coverage of Lebesgue integration in Chapters 9 and 14.) For starters, recall that $\Lambda^k(V)$ denotes the space of alternating $k$-tensors on $V$ (Definition 18.17).
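Definition 19.28 Let $\Omega$ be a subset of $\mathbb{R}^d$. A function $\omega : \Omega \to \bigcup_{p \in \Omega} \Lambda^k\bigl(\mathbb{R}^d_p\bigr)$ so that $\omega(p) \in \Lambda^k\bigl(\mathbb{R}^d_p\bigr)$ for all $p \in \Omega$ is called a $k$-form on $\Omega$ or simply a differential form. We formally set $\Lambda^0\bigl(\mathbb{R}^d_p\bigr) := \mathbb{R}$, so that a 0-form simply is a function.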
Because it is natural to identify each $[\mathrm{id}_{\mathbb{R}^d}, v]_p \in \mathbb{R}^d_p$ with $v$, we will denote tangent vectors to $\mathbb{R}^d$ by single letters in this section. Because alternating $k$-tensors are associated with $k$-dimensional volumes (see Example 18.21) we introduce the following notation for the dual base.
Definition 19.29 Let $\{e_1, \dots, e_d\}$ be the standard base for $\mathbb{R}^d_p$. We will denote the dual base $\{\pi_1, \dots, \pi_d\}$ as $\{dx_1, \dots, dx_d\}$ where $dx_i(e_j) = \pi_i(e_j)$.
The notation is similar to that for integrals because forms are connected to integrals of vector fields as follows. A differentiable function $r : [a,b] \to \mathbb{R}^3$ with $r'(t) \neq 0$ for all $t \in [a,b]$ can be interpreted as describing the position $r(t)$ of a traveling particle at time $t$. If $\Omega$ contains $r[[a,b]]$ and the vector field $F : \Omega \to T\Omega$ is so that $F(x,y,z)$ describes a force that is acting on a particle at the point $(x,y,z)$, then the work that is done as the particle travels from $r(a)$ to $r(b)$ is the line integral $W = \int_a^b \langle F(r(t)), r'(t) \rangle \, dt$, where we assume that each $F(r(t))$ was projected back from $\mathbb{R}^3_{r(t)}$ to $\mathbb{R}^3$ via $[\mathrm{id}, v]_{r(t)} \mapsto v$. By Exercise 18-53b, the value of this integral depends only on the geometric shape of $r[[a,b]]$ and the direction of travel,
19. Introduction to Differential Geometry
but not on the speed at which the particle travels from $r(a)$ to $r(b)$ along this path. With $F = (P, Q, R)$ and $r = (r_1, r_2, r_3)$ we can write the integral as
$$W = \int_a^b P(r(t))\, r_1'(t)\, dt + \int_a^b Q(r(t))\, r_2'(t)\, dt + \int_a^b R(r(t))\, r_3'(t)\, dt.$$
Each $r_i'(t)$ can be obtained from $r'(t)$ by applying the form $dx_i$. The integral is thus also abbreviated as $W = \int_a^b P\, dx_1 + \int_a^b Q\, dx_2 + \int_a^b R\, dx_3$, where $dx_i$ stands for $dx_i(r'(t))\, dt$. The form $\omega = P\, dx_1 + Q\, dx_2 + R\, dx_3$ can be interpreted as the differential amount of work that is done as the particle moves a differential step from its current position $r(t)$ to $r(t) + r'(t)\, dt = r(t) + (dx_1, dx_2, dx_3)^T$. The integration formalism that we will define in Section 19.5 will indeed locally reduce the integral of the form $\omega$ to the above line integral. Similarly, an injective differentiable function $r : [a,b] \times [c,d] \to \mathbb{R}^3$ so that
$\left( \frac{\partial r}{\partial x} \times \frac{\partial r}{\partial y} \right)(x,y) \neq 0$ for all $(x,y) \in [a,b] \times [c,d]$ can be interpreted as a surface in $\mathbb{R}^3$. If we interpret $F$ as a flow field, then (with the right hypotheses) the throughput of $F$ through the surface defined by $r$ is the surface integral
$$\int_c^d \int_a^b \left\langle F(r(x,y)), \left( \frac{\partial r}{\partial x} \times \frac{\partial r}{\partial y} \right)(x,y) \right\rangle dx\, dy.$$
Exercise 18-53c shows that this integral also only depends on the geometric shape of the surface as long as the parametrizations respect the orientation of the surface. (Details on the rather subtle notion of orientation will be addressed later.) With the wedge product from Definition 18.20 the factors behind the components of $F$ can be obtained by applying the forms $dx_2 \wedge dx_3$, $dx_3 \wedge dx_1$ and $dx_1 \wedge dx_2$ to the vectors $\frac{\partial r}{\partial x}$ and $\frac{\partial r}{\partial y}$. Thus $\omega = P\, dx_2 \wedge dx_3 + Q\, dx_3 \wedge dx_1 + R\, dx_1 \wedge dx_2$ can be interpreted as the differential throughput of the vector field $F$ through a parallelogram spanned by the differential vectors $\frac{\partial r}{\partial x}\, dx$ and $\frac{\partial r}{\partial y}\, dy$ attached at $r(x,y)$ and located on the surface defined by $r$ (also see Figure 51 on page 442).
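The statement that the forms $dx_2 \wedge dx_3$, $dx_3 \wedge dx_1$ and $dx_1 \wedge dx_2$ produce the factors behind the components of $F$ can be checked on concrete vectors. The following minimal sketch assumes the normalization $(dx_i \wedge dx_j)(u, v) = u_i v_j - u_j v_i$ for the wedge product of two dual base forms; the helper names `wedge` and `cross` and the sample vectors `r_x`, `r_y` (standing in for the two partial derivatives of a parametrization) are hypothetical and chosen only for illustration.

```python
def wedge(i, j):
    """Return the 2-form dx_i ^ dx_j on R^3 (1-based indices) as a function
    of two vectors, assuming (dx_i ^ dx_j)(u, v) = u_i v_j - u_j v_i."""
    def form(u, v):
        return u[i - 1] * v[j - 1] - u[j - 1] * v[i - 1]
    return form


def cross(u, v):
    """Cross product of two vectors in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


# Hypothetical tangent vectors playing the roles of dr/dx and dr/dy.
r_x = (1.0, 0.0, 2.0)
r_y = (0.0, 1.0, -3.0)

# Factors behind P, Q and R in the throughput integrand.
factors = (wedge(2, 3)(r_x, r_y),
           wedge(3, 1)(r_x, r_y),
           wedge(1, 2)(r_x, r_y))

# They agree with the components of the cross product of the two vectors.
print(factors, cross(r_x, r_y))
```

Under the stated normalization, the three values are exactly the components of the cross product, which is why the form $P\, dx_2 \wedge dx_3 + Q\, dx_3 \wedge dx_1 + R\, dx_1 \wedge dx_2$ encodes the flux integrand.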
19.3. Differential Forms, Integrals Over the Unit Cube
Finally, for a scalar function $f : [a,b] \times [c,d] \times [l,h] \to \mathbb{R}$ the integral is $\int_l^h \int_c^d \int_a^b f(x,y,z)\, dx\, dy\, dz$. This time there is no extra factor involved. The form $f(p)\, dx \wedge dy \wedge dz$ can be interpreted as the differential contribution of $f(p)$ to the overall integral. To stay consistent with the above, we could say that the box is parametrized by the identity. The scaling factor, which is 1, is obtained by applying $dx \wedge dy \wedge dz$ to the partial derivatives of the identity with respect to $x$, $y$, and $z$.
The above indicates that differential forms should allow us to describe line integrals of vector fields, surface integrals of vector fields over surfaces with rectangular parameter domain, and integrals of scalar functions over boxes with one formalism. Moreover, in this formalism the forms carry almost all the information needed for the integral. The components of the field as well as the directions in which we integrate are part of the form. Only the parameter domain is not given and it can be supplied by a coordinate system. Therefore, this approach should be the right idea and, after taking care of the details, we will indeed see that forms on manifolds can be used to model vector fields on surfaces.
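The claim made earlier for the line integral, namely that the work depends on the shape of the path and the direction of travel but not on the speed of the particle, can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the force field `F`, the helix `r1`, the reparametrization $t = s^2 + s$ and the midpoint-rule helper `work` are all hypothetical choices that do not come from the text.

```python
import math


def work(F, r, dr, a, b, n=20000):
    """Midpoint-rule approximation of W = int_a^b <F(r(t)), r'(t)> dt."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        total += sum(f * v for f, v in zip(F(r(t)), dr(t))) * h
    return total


def F(p):
    """Hypothetical force field F(x, y, z) = (-y, x, z)."""
    x, y, z = p
    return (-y, x, z)


# A helix, traversed with parameter t in [0, pi].
def r1(t):
    return (math.cos(t), math.sin(t), t)


def dr1(t):
    return (-math.sin(t), math.cos(t), 1.0)


# The same path traversed at a different speed: t = s^2 + s.
# The derivative 2s + 1 is positive, so the direction of travel is kept.
def r2(s):
    return r1(s * s + s)


def dr2(s):
    return tuple((2 * s + 1) * c for c in dr1(s * s + s))


s_end = (-1 + math.sqrt(1 + 4 * math.pi)) / 2   # solves s^2 + s = pi

W1 = work(F, r1, dr1, 0.0, math.pi)
W2 = work(F, r2, dr2, 0.0, s_end)
print(W1, W2)
```

For this particular field and path the integrand of the first parametrization is $1 + t$, so both values should be close to $\pi + \pi^2/2$.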
Definition 19.30 Let $\Omega \subseteq \mathbb{R}^d$ be open and let $\omega : \Omega \to \bigcup_{p \in \Omega} \Lambda^k(\mathbb{R}^d_p)$ be a k-form on $\Omega$. By Theorem 18.23, at every $p \in \Omega$ there is a base representation of the form $\omega(p) = \sum_{1 \le i_1 < \cdots < i_k \le d} \omega_{i_1, \dots, i_k}(p)\, dx_{i_1} \wedge \cdots \wedge dx_{i_k}$. We will call $\omega$ differentiable at $p$ iff each of the $\omega_{i_1, \dots, i_k}$ is differentiable at $p$. If $\omega$ is differentiable at each $p \in \Omega$ we will call $\omega$ differentiable on $\Omega$.
Recall that by Theorem 17.43 for differentiable functions $f : \mathbb{R}^d \to \mathbb{R}$ the derivative $Df$ is $Df = \frac{\partial f}{\partial x_1}\, dx_1 + \cdots + \frac{\partial f}{\partial x_d}\, dx_d$ and it is only a small abuse of notation to set $df := Df$. We define the differential of a k-form similar to the $(k+1)^{\rm st}$ derivative of regular functions (see Corollary 17.61), except that we work with wedge products instead of tensor products. In this fashion, we will retain the connection to differential contributions to integrals.
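Definition 19.31 Let $\Omega \subseteq \mathbb{R}^d$ be open and let $f : \Omega \to \mathbb{R}$ be differentiable. We define $df := \frac{\partial f}{\partial x_1}\, dx_1 + \cdots + \frac{\partial f}{\partial x_d}\, dx_d$. If $\omega : \Omega \to \bigcup_{p \in \Omega} \Lambda^k(\mathbb{R}^d_p)$ is differentiable and $\omega(p) = \sum_{1 \le i_1 < \cdots < i_k \le d} \omega_{i_1, \dots, i_k}(p)\, dx_{i_1} \wedge \cdots \wedge dx_{i_k}$, then we define the $(k+1)$-form $d\omega$ by
$$d\omega(p) := \sum_{1 \le i_1 < \cdots < i_k \le d} d\omega_{i_1, \dots, i_k}(p) \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k}.$$
… $> 0\}$ and $N := \{ B : B = \{v_1, \dots, v_d\}$ is a base of $V$, $\omega(v_1, \dots, v_d) < 0 \}$ are called orientations of $V$ and the orientation to which a base $(v_1, \dots, v_d)$ belongs is denoted $[v_1, \dots, v_d]$.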
For example, two orthonormal bases in $\mathbb{R}^2$ (with the order of the vectors assumed to be fixed) are in the same orientation iff each can be obtained from the other by rotating both base vectors by the same amount. This simple visualization shows the problem with the Möbius strip. If we choose an orthonormal base at our starting point (see Figure 52) and carry it with us in a way that tries to maintain the orientation (one such way is indicated), then, no matter what we do, after one full traversal of the strip, our base will be in the other orientation of $\mathbb{R}^d$. In Figure 52, this is easy to see because if we rotate the base labeled "problem" back onto our original base, we notice that the base vectors have changed roles. The originally horizontal vector is now vertical and vice versa. Such an interchange cannot be achieved with rotations alone. In terms of the form $\omega$, the value of $\omega$ on the pair of vectors has changed sign, which means the bases are in opposite orientations. For a general manifold, we cannot refer to surrounding space, but the tangent spaces allow us to model the idea described above. Simply speaking, we demand that in each domain of a chart $x$ the orientation of the tangent spaces is given by the images of a fixed orientation of $\mathbb{R}^m$ under the parametrization $x^{-1}$ of the manifold.
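The rotation criterion for $\mathbb{R}^2$ can be made concrete by using the determinant as the nonzero alternating 2-tensor $\omega$ that separates the two orientations. The following sketch is an illustration only; the helper names `det2` and `rotate` and the rotation angle are hypothetical.

```python
import math


def det2(u, v):
    """Determinant of the 2x2 matrix with columns u and v; it plays the role
    of the alternating tensor whose sign decides the orientation."""
    return u[0] * v[1] - u[1] * v[0]


def rotate(v, angle):
    """Rotate the vector v counterclockwise by the given angle."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


e1, e2 = (1.0, 0.0), (0.0, 1.0)

# Rotating both base vectors by the same amount keeps the orientation.
rotated = (rotate(e1, 0.7), rotate(e2, 0.7))
same_orientation = det2(*rotated) > 0

# Interchanging the base vectors changes the sign, hence the orientation.
swapped = (e2, e1)
swapped_orientation = det2(*swapped) > 0

print(same_orientation, swapped_orientation)
```

The interchanged base has determinant $-1$, which matches the observation that the base labeled "problem" in Figure 52 cannot be rotated back onto the original base.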
19.5.Integration on Manifolds
Figure 52: It is impossible to orient a Möbius strip (b). Because of the half-twist in the strip, any orientation that is carried around the strip in a continuous fashion will not arrive as the same orientation at the starting point. On the other hand, spheres are orientable as indicated by the consistent orientation in (a). (We need to be careful interpreting this figure. There is a difficult theorem in topology that says that on a sphere there is no continuous vector field without zeroes. Thus the bases indicated cannot be interpolated by vector fields. However, any two bases can be mapped to each other by translating tangentially along the sphere and then rotating and this process also works if we carry a base in any fashion around the sphere until we are back at the starting point.)
Definition 19.58 A choice of an orientation $\mu_p$ for every tangent space $M_p$ of the m-dimensional manifold $M$ is called consistent iff for every chart $x : U \to \mathbb{R}^m$ and all $a, b \in U$ we have $[x_*^{-1}[[\mathrm{id}_{x[U]}, e_1]_{x(a)}], \dots, x_*^{-1}[[\mathrm{id}_{x[U]}, e_m]_{x(a)}]] = \mu_a$ if and only if $[x_*^{-1}[[\mathrm{id}_{x[U]}, e_1]_{x(b)}], \dots, x_*^{-1}[[\mathrm{id}_{x[U]}, e_m]_{x(b)}]] = \mu_b$. Manifolds with a consistent orientation are called orientable and the function $\mu$ is called an orientation. A manifold together with an orientation is called an oriented manifold. Charts $x : U \to \mathbb{R}^m$ such that for all $a \in U$ we have $[x_*^{-1}[[\mathrm{id}_{x[U]}, e_1]_{x(a)}], \dots, x_*^{-1}[[\mathrm{id}_{x[U]}, e_m]_{x(a)}]] = \mu_a$ are called orientation preserving.
Note that because k-cubes are defined in terms of coordinate systems, Definition 19.58 also defines orientation preserving k-cubes. For orientation preserving m-cubes $c_1$ and $c_2$, the determinant $\det(D(c_1^{-1} \circ c_2))$ is always positive. Therefore Theorem 19.56 shows that the integral of an m-form over an orientation preserving m-cube only depends on the range $c[[0,1]^m]$ of the m-cube, not on the specific m-cube itself. That is, the integral over $c[[0,1]^m]$ does not depend on the parametrization, as long as we only use orientation preserving parametrizations (m-cubes).
Definition 19.59 Let $c : [0,1]^m \to M$ be an orientation-preserving m-cube in the m-dimensional oriented manifold $M$ and let $\omega$ be an m-form that vanishes outside $c[[0,1]^m]$. Then we define $\int_M \omega := \int_c \omega$.
Orientations and forms are defined similarly for manifolds with boundary and for manifolds with corners. Now that we have taken care of the orientability issue, the definition of the integral of a form over a manifold is straightforward. To avoid formal problems, we stay with connected manifolds. This is not a problem, because we will mostly be interested in manifolds that are the disjoint union of at most finitely many connected manifolds. To break up the integral, we need infinitely differentiable partitions of unity that are supported inside singular m-cubes. The following results show that such partitions exist.
Definition 19.60 A partition of unity of the manifold $M$ is called a $C^\infty$ partition of unity iff all functions in the partition are $C^\infty$ functions.
Theorem 19.61 Let $\mathcal{O}$ be an open cover of the connected manifold $M$. Then there is a $C^\infty$ partition of unity subordinate to $\mathcal{O}$.
Proof. This proof is the same as for Theorem 16.115, except that we choose the functions to be $C^\infty$ instead of just continuous. By Theorem 19.18, this is possible. ∎
Theorem 19.62 Let $M$ be an m-dimensional oriented manifold. Then there is an open cover $\mathcal{O}$ of $M$ so that for each $U \in \mathcal{O}$ there is an orientation-preserving singular m-cube $c_U$ with $U \subseteq c_U[[0,1]^m]$.
Proof. We provide a slightly more elaborate proof than necessary so that it is easy to see how to generalize the result to manifolds with boundary or corners. For each $p \in M$, let $x : V \to \mathbb{R}^m$ be an orientation-preserving coordinate system around $p$ that maps $p$ to the origin. Let $C_p \subseteq x[V]$ be a compact cube in $\mathbb{R}^m$ so that $x(p)$ is in the relative interior (with respect to $x[V]$) of $C_p$. Then with $i_{C_p} : [0,1]^m \to C_p$ being the natural bijection between the cubes, the function $c_p := x^{-1} \circ i_{C_p}$ is an orientation preserving singular m-cube so that the M-interior of $c_p[[0,1]^m]$ contains $p$. Now let $W_p \subseteq C_p$ be a relatively open subset of $x[V]$ that contains $x(p)$. Then $U_p := x^{-1}[W_p]$ is open, $p \in U_p \subseteq c_p[[0,1]^m]$ and $\mathcal{O} := \{U_p\}_{p \in M}$ covers $M$. ∎
Versions of Theorems 19.61 and 19.62 can also be proved as follows for manifolds with boundary and manifolds with corners. For the $C^\infty$ partition of unity, we just need to modify Theorem 19.18 appropriately for the boundary points. For the covers, the relatively open sets in the proof of Theorem 19.62 will not be open in $\mathbb{R}^m$, because for boundary points of manifolds with boundary one face of the cube $C_p$ will be contained in the boundary of the space $\mathbb{H}^m$. For boundary points of manifolds with corners, several faces could be contained in the boundary of the range of the coordinate system. With all machinery in place it is now easy to see (Exercise 19-41) that the definition below really defines only one number that does not depend on the parametrizations or the choice of partition of unity. The same definition also gives the integral over manifolds with boundary and manifolds with corners.
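Definition 19.63 Let $M$ be an m-dimensional oriented manifold and let $\omega$ be a compactly supported m-form on $M$. Let $\Phi$ be a partition of unity subordinate to an open cover $\mathcal{O}$ of $M$ so that for each $U \in \mathcal{O}$ there is an orientation-preserving singular m-cube $c_U$ with $U \subseteq c_U[[0,1]^m]$. Then we define $\int_M \omega := \sum_{\varphi \in \Phi} \int_M \varphi \omega$.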
It is particularly noteworthy that because $\omega$ is compactly supported, the sum in Definition 19.63 actually is finite (see Exercise 19-40). To visualize these ideas, consider integration over embedded manifolds in $\mathbb{R}^d$. For an embedded manifold $M$, we can consider every tangent space $M_p$ to be a subspace of the tangent space $\mathbb{R}^d_p$ of $\mathbb{R}^d$ at $p$. The identification is done as follows. Let $p \in M$, let $U \subseteq \mathbb{R}^d$ be a neighborhood of $p$ in $\mathbb{R}^d$, let $V \subseteq \mathbb{R}^d$ be open and let $h : U \to V$ be a diffeomorphism as in Definition 19.5. As in the proof of Proposition 19.6 let $x := \pi_{\mathbb{R}^m} \circ h|_{U \cap M}$ be the coordinate system around $p$ obtained from $h$. The function $e : M_p \to \mathbb{R}^d_p$ defined by $e[x, v]_p := [h, v]_p$ is the desired isomorphism (see Exercise 19-42). The definition $\langle [\mathrm{id}_{\mathbb{R}^d}, v]_p, [\mathrm{id}_{\mathbb{R}^d}, w]_p \rangle_p := \langle v, w \rangle$, where $\langle \cdot, \cdot \rangle$ is the usual inner product in $\mathbb{R}^d$, produces a natural inner product on $\mathbb{R}^d_p$. (This inner product on $\mathbb{R}^d_p$ is well-defined, because the definition states that we will use the second component of one unique representative of the tangent vector.) Hence, via the above embedding, the tangent spaces $M_p$ of embedded manifolds also carry an inner product. Recalling Proposition 18.29 we can define the following.
Definition 19.64 Let $M$ be an m-dimensional embedded oriented manifold in $\mathbb{R}^d$. Then for each $p \in M$ we define the volume element $\omega_V(p)$ to be the unique m-form so that for all orthonormal bases $\{v_1, \dots, v_m\}$ in $\mu_p$, we have $\omega_V(p)[v_1, \dots, v_m] = 1$.
Example 19.65 Volume elements encode the integral over open subsets of $\mathbb{R}^d$, as well as the integrals over curves and surfaces.
1. Let $\Omega \subseteq \mathbb{R}^d$ be an open set considered as an oriented embedded manifold with $\mu_p = [e_1, \dots, e_d]$ for all $p \in \Omega$. Then with $x : \Omega \to \mathbb{R}^d$ denoting the natural embedding, the volume element $\omega_V$ can be represented as the wedge product $\omega_V = dx_1 \wedge \cdots \wedge dx_d$. In particular, this means that for all $f \in C_0^\infty(\Omega)$ the integral of $f \omega_V$ from Definition 19.63 coincides with the Lebesgue integral of $f$ over $\Omega$. We will also denote this integral by $\int_\Omega f\, dV$.
2. Let $M$ be a $(d-1)$-dimensional embedded oriented manifold and let $x$ be an orientation-preserving coordinate system around $p$, constructed from a diffeomorphism $h : U \to V$ as in Definition 19.5. We define the unit normal vector to be the unique unit vector $n(p)$ in $\mathbb{R}^d_p$ so that $(n(p), h_*^{-1}(e_1), \dots, h_*^{-1}(e_{d-1}))$ is a positively oriented base of $\mathbb{R}^d_p$. Because the hyperplane determined by $Dh^{-1}(h(p))[e_1], \dots, Dh^{-1}(h(p))[e_{d-1}]$ (see Exercise 19-19) does not depend on the diffeomorphism $h$, the vector $n(p)$ does not depend on the coordinate system. Moreover, the functions $n_j : M \to \mathbb{R}$ that map each $p \in M$ to the $j$th coordinate of the unit normal vector are differentiable. To see this, in a neighborhood
of $p$, let $\tilde{n}$ be the solution vector of the system of equations $\langle \tilde{n}, Dh^{-1}[e_i] \rangle = 0$, $i = 1, \dots, d-1$ and $\tilde{n}_j = 1$ for some $j$ so that $\frac{\partial h_j^{-1}}{\partial x_d} \neq 0$ in the $h$-image of the neighborhood of $p$. Then $\tilde{n}$ is parallel or antiparallel to $n$. By Exercise 17-71 and the differentiability of the coefficients of the system, the components $\tilde{n}_j$ are differentiable. The components of $n$ are either the components of $\frac{\tilde{n}}{\|\tilde{n}\|}$ or of $-\frac{\tilde{n}}{\|\tilde{n}\|}$, so the $n_j$ are differentiable, too.
The volume element of $M$ is $\omega_V(p)[v_1, \dots, v_{d-1}] = \det(n(p), v_1, \dots, v_{d-1})$, where in the determinant we use the coordinate representation of the vectors with respect to the base $\{[\mathrm{id}, e_1]_p, \dots, [\mathrm{id}, e_d]_p\}$. Specifically for the case $d = 3$, if $r$ is an orientation preserving singular 2-cube and $f : M \to \mathbb{R}$ is differentiable, then
$$\int_r f\, \omega_V = \int_0^1 \int_0^1 f(r(s,t)) \left\| \frac{\partial r}{\partial s} \times \frac{\partial r}{\partial t} \right\| ds\, dt,$$
which is the parametric formula (from multivariable calculus) for the scalar surface integral of a function $f$ over a parametric surface parametrized by $r$. For the integral of a vector field $F : \Omega \to T\mathbb{R}^d$ defined on a neighborhood $\Omega$ of $M$, we define the surface integral to be $\int_M F \cdot dS := \int_M \langle F, n \rangle\, \omega_V$. With the above, we obtain the following in case $d = 3$.
$$\int_r \langle F, n \rangle\, \omega_V = \int_0^1 \int_0^1 \left\langle F(r(s,t)), \frac{\partial r}{\partial s} \times \frac{\partial r}{\partial t} \right\rangle ds\, dt,$$
which is the parametric formula (from multivariable calculus) for the surface integral of a vector field $F$ over a parametric surface parametrized by $r$ (also see Exercise 18-53c).
3. Let $M$ be a one dimensional embedded oriented manifold in $\mathbb{R}^d$, let $x : U \to \mathbb{R}$ be an orientation preserving coordinate system and let $a := x^{-1}$. Then (Exercise 19-43a) the volume element can be written as $\omega_V(t)[v] = \left\langle v, \frac{a'(t)}{\|a'(t)\|} \right\rangle$. The line integral for scalar functions is defined in the obvious way. The line integral for vector fields is defined by $\int_M F \cdot d\vec{r} := \int_M \left\langle F, \frac{a'}{\|a'\|} \right\rangle \omega_V$. As for the surface integral, these formulas reduce to formulas familiar from multivariable calculus when specific parametrizations are used (see Exercise 19-43).
The above examples show that the integral over manifolds encodes the integrals of scalar functions, as well as of vector fields over solids, hypersurfaces, and curves with one formalism. Moreover, this formalism shows that the integrals are independent of the parametrizations chosen for the objects in question, which is a big advantage for theoretical considerations. The computation of numerical values of these integrals still uses parametrizations and proceeds along the same lines as in calculus.
Finally, note that the definitions presented here actually work in more general settings. The $\sigma$-algebra of Borel sets on $M$ is the $\sigma$-algebra generated by the open subsets of $M$. The integral can actually be defined for m-forms $g\omega_V$ for which the function $g$ is Borel measurable. Hence, if we use the measure that assigns to each Borel subset the integral of its indicator function, we can define the integral like we did on measure spaces. In particular, this means that we can talk about $L^p$ spaces for which the domain is a manifold. Regarding the domain of the integral we note that the manifold need not be $C^\infty$: $C^1$ is sufficient, and boundaries and corners are permissible.
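The remark that numerical values are still computed with parametrizations can be illustrated for the unit sphere in $\mathbb{R}^3$. The sketch below evaluates the volume element as $\det(n, v_1, v_2)$ on the partial derivatives of a parametrization and sums it with a midpoint rule. Everything in it is an assumption made for illustration: the spherical parametrization `r` (which degenerates at the poles, a set of measure zero, and therefore is not literally a single singular 2-cube), the sample field `F(x, y, z) = (x, y, 2z)`, and the helper names.

```python
import math


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def det3(a, b, c):
    """Determinant of the 3x3 matrix with columns a, b, c."""
    return dot(a, cross(b, c))


# Hypothetical parametrization of the unit sphere and its partial derivatives.
def r(u, v):
    return (math.sin(u) * math.cos(v), math.sin(u) * math.sin(v), math.cos(u))


def r_u(u, v):
    return (math.cos(u) * math.cos(v), math.cos(u) * math.sin(v), -math.sin(u))


def r_v(u, v):
    return (-math.sin(u) * math.sin(v), math.sin(u) * math.cos(v), 0.0)


def surface_integral(g, n=200):
    """Midpoint rule for the integral of g(p, normal) * omega_V over the
    sphere, with omega_V evaluated as det(normal, r_u, r_v)."""
    hu, hv = math.pi / n, 2 * math.pi / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * hu
        for j in range(n):
            v = (j + 0.5) * hv
            a, b = r_u(u, v), r_v(u, v)
            c = cross(a, b)
            length = math.sqrt(dot(c, c))
            normal = tuple(x / length for x in c)   # outward unit normal
            total += g(r(u, v), normal) * det3(normal, a, b) * hu * hv
    return total


def F(p):
    """Hypothetical vector field F(x, y, z) = (x, y, 2z)."""
    return (p[0], p[1], 2 * p[2])


area = surface_integral(lambda p, normal: 1.0)
flux = surface_integral(lambda p, normal: dot(F(p), normal))
print(area, flux)
```

The first value approximates the surface area $4\pi$, and the second approximates $16\pi/3$, the flux of the sample field through the sphere.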
Exercises
19-40. Prove that in any sum as in Definition 19.63 only finitely many summands are not zero. Hint. Use that the sets $\{ p : \varphi(p) \neq 0 \}$ form a locally finite open cover of $M$.
19-41. Let $M$ be an m-dimensional oriented manifold, let $\omega$ be a compactly supported m-form on $M$, let $\Phi$ be a partition of unity subordinate to an open cover $\mathcal{O}$ of $M$ so that for each $U \in \mathcal{O}$ there is an orientation-preserving singular m-cube $c_U$ with $U \subseteq c_U[[0,1]^m]$ and let $\Psi$ be a partition of unity subordinate to an open cover $\mathcal{P}$ of $M$ so that for each $V \in \mathcal{P}$ there is an orientation-preserving singular m-cube $c_V$ with $V \subseteq c_V[[0,1]^m]$. Prove that $\sum_{\varphi \in \Phi} \int_M \varphi \omega = \sum_{\psi \in \Psi} \int_M \psi \omega$.
Hints. Use Theorem 19.56 to prove that the integrals of the $\varphi \psi \omega$ do not depend on whether we choose the cube $c_U$ or $c_V$. Then prove that $\int_M \varphi \omega = \sum_{\psi \in \Psi} \int_M \varphi \psi \omega$. To switch the order of summations use that by Exercise 19-40 only finitely many summands are not zero.
19-42. Let $M$ be an m-dimensional embedded manifold, let $p \in M$, let $U \subseteq \mathbb{R}^d$ be a neighborhood of $p$ in $\mathbb{R}^d$, let $V \subseteq \mathbb{R}^d$ be open and let $h : U \to V$ be a diffeomorphism as in Definition 19.5. As in the proof of Proposition 19.6 let $x := \pi_{\mathbb{R}^m} \circ h|_{U \cap M}$ be the coordinate system around $p$ obtained from $h$. Prove that $e[x, v]_p := [h, v]_p$ defines an isomorphism from $M_p$ to $\mathbb{R}^d_p$. You must also prove that $e$ is well-defined.
19-43. Let $M$ be a one dimensional embedded oriented manifold and let $r : [0,1] \to M$ be an orientation preserving singular 1-cube.
(a) Let $x : U \to \mathbb{R}$ be an orientation preserving coordinate system and let $a := x^{-1}$. Prove that the volume element can be written as $\omega_V(t)[v] = \left\langle v, \frac{a'(t)}{\|a'(t)\|} \right\rangle$.
(b) Prove that $\int_r f \omega_V = \int_0^1 f(r(t)) \|r'(t)\|\, d\lambda(t)$ for all differentiable $f : M \to \mathbb{R}$.
Figure 53: Visualization of the proof of Stokes' Theorem for embedded manifolds. The parametrization $x^{-1}$ maps the outward normal direction of the cube in its domain to the outward normal direction of the k-cube that touches the boundary in the manifold.
(c) Prove that $\int_r \left\langle F, \frac{r'}{\|r'\|} \right\rangle \omega_V = \int_0^1 \langle F(r(t)), r'(t) \rangle\, d\lambda(t)$ for all vector fields $F : \Omega \to \mathbb{R}^3$ defined on a neighborhood of $M$.
19.6 Stokes' Theorem
Because the tangent space $T\partial M$ of an oriented manifold with boundary or with corners is contained in $TM$, we can obtain an orientation for $\partial M$ from the orientation of $M$.
Definition 19.66 Let $M$ be an oriented m-dimensional manifold with boundary or corners and let $\mu$ be its orientation. For every $p \in \partial M$ that is not contained in a corner, we let $[v_1, \dots, v_{m-1}] \in \partial\mu_p$ iff $[w, v_1, \dots, v_{m-1}] \in \mu_p$ for all outward pointing vectors $w$. The orientation $\partial\mu$ is called the induced orientation or the positive orientation on the boundary. For d-dimensional embedded manifolds in d-dimensional space, it is also called the outward orientation, because the associated normal vector literally points outward.
We will always assume that the boundary carries the induced orientation. Formally, for a manifold with corners, the integral over the boundary is a sum of integrals over the pieces of the boundary that are manifolds with corners themselves. To ease notation, it will be understood that the integral over the boundary of a manifold with corners is such a sum.
Theorem 19.67 Stokes' Theorem. Let $M$ be an oriented m-dimensional manifold with boundary or corners, let $\omega$ be a compactly supported $(m-1)$-form on $M$ and let $\partial M$ carry the induced orientation. Then $\int_M d\omega = \int_{\partial M} \omega$.
Proof. We first consider forms that are supported in the M-interior of an orientation-preserving m-cube. The result is trivial if $\omega$ is supported in the interior of an m-cube $c$ so that $c[[0,1]^m] \subseteq M \setminus \partial M$ because then $\int_M d\omega = \int_c d\omega = \int_{\partial c} \omega = 0 = \int_{\partial M} \omega$.
For a form $\omega$ that is supported in the M-interior of an orientation preserving m-cube $c$ in $M$ that intersects the boundary but not the corners, note that $\partial M \cap c[[0,1]^m] \subseteq \partial c$ and that $\omega$ is zero on the parts of the boundary of $c$ that do not intersect $\partial M$. Let $p \in \partial M \cap c[[0,1]^m]$ and let $x$ be an orientation preserving coordinate system around $p$. (For the next statement, let a negative sign indicate the opposite orientation.) Then $x(p) \in \partial \mathbb{H}^m$ and $\mu_{x(p)} = [e_1, \dots, e_m] = (-1)^m [-e_m, e_1, \dots, e_{m-1}]$, where $-e_m$ is the outward unit normal vector. This means that the induced orientation on $\partial M \cap c[[0,1]^m]$ is $(-1)^m$ times the usual orientation of $M_p$. We can choose the orientation-preserving singular m-cube $c$ so that $\partial M \cap c[[0,1]^m] = c_{(m,0)}[[0,1]^{m-1}]$. Note that by the above $c_{(m,0)} : [0,1]^{m-1} \to \partial M$ is orientation preserving for even $m$ and orientation reversing for odd $m$. Hence, $\int_{c_{(m,0)}} \omega = (-1)^m \int_{\partial M} \omega$. But in $\partial c$ the coefficient of $c_{(m,0)}$ is $(-1)^m$. (The left side of Figure 53 gives a visual idea of what we do here.) Hence, we obtain
$$\int_M d\omega = \int_c d\omega = \int_{\partial c} \omega = (-1)^m \int_{c_{(m,0)}} \omega = (-1)^m (-1)^m \int_{\partial M} \omega = \int_{\partial M} \omega.$$
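Finally, when $c$ intersects the corners, a similar argument for the faces of $c$ that are contained in the boundary gives the same result. (Exercise 19-44. The right side of Figure 53 gives a visual idea of what to do.)
When we integrate a general form, each form $\varphi\omega$ will be supported in the interior of an m-cube as indicated above. Thus we should be able to prove the general result by summation. To move the functions $\varphi$ from the partition of unity into the differential $d$, we will use the equality $0 = 0 \wedge \omega = d(1) \wedge \omega = d\left( \sum_{\varphi \in \Phi} \varphi \right) \wedge \omega = \sum_{\varphi \in \Phi} (d\varphi) \wedge \omega$. By part 2 of Theorem 19.44 with $\varphi$ being a 0-form, we obtain $d(\varphi\omega) = (d\varphi) \wedge \omega + \varphi\, d\omega$. Hence, because the restrictions of the $\varphi$ to the boundary are a partition of unity with the requisite properties for defining the integral on the boundary, we conclude
$$\int_M d\omega = \sum_{\varphi \in \Phi} \int_M \varphi\, d\omega = \sum_{\varphi \in \Phi} \int_M \left( (d\varphi) \wedge \omega + \varphi\, d\omega \right) = \sum_{\varphi \in \Phi} \int_M d(\varphi\omega) = \sum_{\varphi \in \Phi} \int_{\partial M} \varphi\omega = \int_{\partial M} \omega.$$
∎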
With the general result established, we can now present Stokes' Theorem in the forms that are familiar from calculus (also see Figure 54). To prove a Divergence Theorem in Rd, we first need to consider the volume element of (d - 1)-dimensional embedded manifolds.
Theorem 19.68 Let $M$ be an embedded oriented $(d-1)$-dimensional manifold in $\mathbb{R}^d$ and let $n$ be its unit normal vector. Then the volume element can be represented as $\omega_V(p) = \sum_{j=1}^d (-1)^{1+j} n_j(p)\, dx_1 \wedge \cdots \wedge \widehat{dx_j} \wedge \cdots \wedge dx_d$. Moreover, the equality $n_j \omega_V = (-1)^{1+j} dx_1 \wedge \cdots \wedge \widehat{dx_j} \wedge \cdots \wedge dx_d$ holds.
Proof. Let $v_1, \dots, v_{d-1} \in M_p$. For the representation of $\omega_V$, by Exercises 18-33 and 18-34 via expansion with respect to the first column, we obtain the following, where $\tilde{\pi}_j$ denotes the projection that erases the $j$th component of each vector.
$$\omega_V(p)[v_1, \dots, v_{d-1}] = \det(n(p), v_1, \dots, v_{d-1}) = \sum_{j=1}^d (-1)^{1+j} n_j(p) \det(\tilde{\pi}_j(v_1), \dots, \tilde{\pi}_j(v_{d-1})) = \sum_{j=1}^d (-1)^{1+j} n_j(p)\, dx_1 \wedge \cdots \wedge \widehat{dx_j} \wedge \cdots \wedge dx_d [v_1, \dots, v_{d-1}].$$
For $n_j \omega_V$, we obtain with $n$ being the unit normal vector,
$$n_j(p)\, \omega_V(p)[v_1, \dots, v_{d-1}] = n_j(p) \det(n(p), v_1, \dots, v_{d-1}) = \det((n_j(p) n(p) - e_j) + e_j, v_1, \dots, v_{d-1}) = \det(e_j, v_1, \dots, v_{d-1}) = (-1)^{1+j} \det(\tilde{\pi}_j(v_1), \dots, \tilde{\pi}_j(v_{d-1})) = (-1)^{1+j} dx_1 \wedge \cdots \wedge \widehat{dx_j} \wedge \cdots \wedge dx_d [v_1, \dots, v_{d-1}].$$
∎
Now we can prove the Divergence Theorem. For the remainder of this section, we adopt notation that is seen in physics and the sciences with vectors indicated by arrows on top of the letter and with integrals over closed surfaces and curves denoted with a circle in the integral sign(s).
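Theorem 19.69 Gauss' Theorem, also known as the Divergence Theorem. If the set $E \subseteq \mathbb{R}^d$ is an embedded oriented connected compact d-dimensional manifold with boundary or corners, if $S = \partial E$ with positive (outward) orientation and if $\vec{F}$ is a vector field with continuous partial derivatives on an open region that contains $E$, then
$$\oint_S \vec{F} \cdot d\vec{S} = \oint_S \langle \vec{F}, \vec{n} \rangle\, \omega_V^S = \int_E \mathrm{div}(\vec{F})\, \omega_V^E = \int_E \mathrm{div}(\vec{F})\, dV$$
(also see Figure 54(a)).
Proof. This is a consequence of Stokes' Theorem once we note the following:
$$d\left( \langle \vec{F}, \vec{n} \rangle\, \omega_V^S \right) = d\left( \sum_{j=1}^d F_j n_j\, \omega_V^S \right) = d\left( \sum_{j=1}^d (-1)^{1+j} F_j\, dx_1 \wedge \cdots \wedge \widehat{dx_j} \wedge \cdots \wedge dx_d \right) = \sum_{j=1}^d (-1)^{1+j} \frac{\partial F_j}{\partial x_j}\, dx_j \wedge dx_1 \wedge \cdots \wedge \widehat{dx_j} \wedge \cdots \wedge dx_d = \mathrm{div}(\vec{F})\, dx_1 \wedge \cdots \wedge dx_d = \mathrm{div}(\vec{F})\, \omega_V^E.$$
∎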
Figure 54: The Divergence Theorem (a) says that the integral of a vector field $\vec{F}$ over a closed surface $S$ equals the integral of the field's divergence $\mathrm{div}\, \vec{F}$ over the enclosed solid $E$. Stokes' Theorem (b) says that the line integral of a vector field $\vec{F}$ along a closed curve $C$ equals the surface integral of $\mathrm{curl}\, \vec{F}$ over any surface $S$ bounded by the curve.
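The Divergence Theorem can be checked numerically on the unit cube $[0,1]^3$, which is a compact manifold with corners whose boundary consists of six faces with outward normals $\pm e_1$, $\pm e_2$, $\pm e_3$. The following sketch is an illustration under stated assumptions: the field `F(x, y, z) = (x^2 y, y z, x z^2)`, its hand-computed divergence `div_F`, and the midpoint-rule helpers are hypothetical choices.

```python
def F(x, y, z):
    """Hypothetical vector field on R^3."""
    return (x * x * y, y * z, x * z * z)


def div_F(x, y, z):
    """Divergence of F, computed by hand: 2xy + z + 2xz."""
    return 2 * x * y + z + 2 * x * z


def volume_integral(g, n=30):
    """Midpoint rule for the integral of g over the unit cube."""
    h = 1.0 / n
    pts = [(k + 0.5) * h for k in range(n)]
    return sum(g(x, y, z) for x in pts for y in pts for z in pts) * h ** 3


def outward_flux(field, n=30):
    """Midpoint rule for the flux of the field through the six faces of the
    unit cube, each face taken with its outward unit normal."""
    h = 1.0 / n
    pts = [(k + 0.5) * h for k in range(n)]
    total = 0.0
    for s in pts:
        for t in pts:
            total += field(1.0, s, t)[0] - field(0.0, s, t)[0]   # faces x = 1, x = 0
            total += field(s, 1.0, t)[1] - field(s, 0.0, t)[1]   # faces y = 1, y = 0
            total += field(s, t, 1.0)[2] - field(s, t, 0.0)[2]   # faces z = 1, z = 0
    return total * h * h


flux = outward_flux(F)
divergence_integral = volume_integral(div_F)
print(flux, divergence_integral)
```

For this field both sides equal $3/2$, and because all integrands are linear in each variable separately, the midpoint rule reproduces the values up to rounding.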
The result that is typically called “Stokes’ Theorem” is also a special case of Theorem 19.67. Note that integrals over two and three dimensional objects are often denoted with two and three integral signs, respectively.
Theorem 19.70 Stokes' Theorem for compact surfaces in $\mathbb{R}^3$. If $S$ is an embedded oriented connected compact two dimensional manifold with boundary or corners, if $C = \partial S$ with positive orientation and if $\vec{F}$ is a vector field with continuous partial derivatives in an open region of $\mathbb{R}^3$ that contains $S$, then (also see Figure 54(b))
$$\oint_C \vec{F} \cdot d\vec{r} = \iint_S \mathrm{curl}(\vec{F}) \cdot d\vec{S}.$$
Proof. Exercise 19-45. ∎
We conclude with an important result for line integrals.
Theorem 19.71 Fundamental Theorem for Line Integrals. Let the curve $C$ be parametrized by the continuously differentiable function $\vec{r}(t)$, $a \le t \le b$. If $f$ is differentiable and $\nabla f$ is continuous on $C$, then $\int_C \nabla f \cdot d\vec{r} = f(\vec{r}(b)) - f(\vec{r}(a))$.
Proof. This can be proved with manifolds or directly. (Exercise 19-46.)
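As a plausibility check of Theorem 19.71, the line integral of a gradient field can be compared numerically with the difference of the potential at the end points. The sketch below is only an illustration; the potential `f(x, y, z) = xy + z^2`, the helix `r` on $[0, 2]$, and the midpoint-rule helper are hypothetical choices.

```python
import math


def f(p):
    """Hypothetical potential f(x, y, z) = x*y + z^2."""
    x, y, z = p
    return x * y + z * z


def grad_f(p):
    """Gradient of f, computed by hand."""
    x, y, z = p
    return (y, x, 2 * z)


def r(t):
    """A continuously differentiable curve (a helix)."""
    return (math.cos(t), math.sin(t), t)


def dr(t):
    return (-math.sin(t), math.cos(t), 1.0)


def line_integral(field, curve, dcurve, a, b, n=20000):
    """Midpoint rule for int_a^b <field(curve(t)), curve'(t)> dt."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * h
        total += sum(u * v for u, v in zip(field(curve(t)), dcurve(t))) * h
    return total


a, b = 0.0, 2.0
lhs = line_integral(grad_f, r, dr, a, b)
rhs = f(r(b)) - f(r(a))
print(lhs, rhs)
```

The two printed values should agree up to the discretization error of the midpoint rule.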
Exercises
19-44. Let $M$ be an m-dimensional manifold with corners, let $x : M \to C_k$ be a coordinate system and let $c = x^{-1} : [0,1]^m \to M$ be an orientation-preserving m-cube. Prove that for all $(m-1)$-forms $\omega$ that are supported in the M-interior of $c[[0,1]^m]$ the equality $\int_M d\omega = \int_{\partial M} \omega$ holds.
Hint. $\mu_{x(p)} = [e_1, \dots, e_m] = (-1)^j [-e_j, e_1, \dots, \widehat{e_j}, \dots, e_m]$.
19-45. Prove Theorem 19.70.
Hint. Prove that $\left\langle \vec{F}, \frac{\vec{r}\,'}{\|\vec{r}\,'\|} \right\rangle \omega_V^C = P\, dx_1 + Q\, dx_2 + R\, dx_3$. Then use Theorem 19.68 for the double integral.
19-46. Prove Theorem 19.71.
19-47. Some integrals.
(a) Compute the integral of the vector field $\vec{F}(x, y, z) =$ … $\{ (x, y, 0) \in \mathbb{R}^3 : x^2 + y^2 = 1 \}$.
(b) Compute the integral of the vector field $\vec{F}(x, y, z) =$ … over the line segment from $(0, 0, 1)$ to $(4, -3, 2)$.
(c) Compute the integral of the vector field $\vec{F}(x, y, z) = (x^2 z^2 +$ … over the surface of the cylinder $\{ (x, y, z) \in \mathbb{R}^3 : x^2 + y^2 \le 1, -1 \le z \le 1 \}$.
19-48. A subset $\Omega \subseteq \mathbb{R}^m$ is called convex iff for all $x, y \in \Omega$ the line segment $\{ x + t(y - x) : t \in [0, 1] \}$ is contained in $\Omega$.
(a) Prove that if $\Omega \subseteq \mathbb{R}^3$ is open and convex and $\vec{F} : \Omega \to \mathbb{R}^3$ satisfies $\mathrm{curl}\, \vec{F} = 0$ on $\Omega$, then there is a differentiable function $\varphi : \Omega \to \mathbb{R}$ so that $\vec{F} = \nabla \varphi$.
Hint. Let $x \in \Omega$ be fixed and define $\varphi(z)$ to be the line integral $\varphi(z) := \int_{[x,z]} \vec{F} \cdot d\vec{r}$, where the integral is over the line segment from $x$ to $z$.
(b) A connected open subset $\Omega \subseteq \mathbb{R}^3$ is called simply connected iff every closed curve that is parametrized by a continuous function $\vec{r} : [a, b] \to \mathbb{R}^3$ for which there is a $c \in [a, b]$ so that $\vec{r}|_{[a,c]}$ and $\vec{r}|_{[c,b]}$ are continuously differentiable, is the boundary of a compact embedded two dimensional manifold with corners that is contained in $\Omega$. Explain why the result from part 19-48a also holds for simply connected sets $\Omega$.
19-49. Green's Theorem. Let $D$ be a two dimensional embedded connected compact oriented manifold with boundary or corners, let $C = \partial D$ be the boundary curve with positive orientation. Let $D$ be the region in the plane bounded by $C$. Prove that if $P$ and $Q$ have continuous partial derivatives on an open set that contains $D$, then
$$\oint_C \begin{pmatrix} P \\ Q \end{pmatrix} \cdot d\vec{r} = \int_D \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) d\lambda.$$
Chapter 20
Hilbert Spaces
In addition to the topological structure of a metric space and the linear structure of a normed space, in an inner product space we can measure angles and in particular we can define orthogonality. This additional structure allows us to derive results that are not easily accessible otherwise. The properties of orthonormal bases investigated in Section 20.1 will allow us to establish the $L^2$-convergence of Fourier series in Section 20.2 and we conclude in Section 20.3 with Riesz' Representation Theorem for linear functionals on Hilbert spaces. As noted in Section 15.9, the inner products of real and complex inner product spaces have slightly different properties. To avoid stating all results for real and for complex spaces, in Sections 20.1 and 20.3 we will assume that our inner product spaces are complex. The proofs will also work for real inner product spaces, because for a real number, the real part and the complex conjugate are equal to the number itself.
20.1 Orthonormal Bases
Because an inner product allows us to define orthogonality, we are interested in representing the elements of an inner product space as a sum of orthonormal vectors, similar to the base representation of vectors in d-dimensional space. This section presents the general results and Section 20.2 shows the consequences for the representation of functions with trigonometric polynomials. We first need to make sure that in a representation with an orthonormal system there are not too many nonzero coefficients.
Proposition 20.1 Bessel's inequality. Let $S$ be an orthonormal system in the inner product space $H$ and let $x \in H$. Then $\{ s \in S : \langle x, s \rangle \neq 0 \}$ is countable and $\sum_{s \in S} |\langle x, s \rangle|^2 \le \|x\|^2$.
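Proof. First let $C \subseteq S$ be finite. Then
$$0 \le \left\| x - \sum_{c \in C} \langle x, c \rangle c \right\|^2 = \|x\|^2 - 2 \Re \left( \sum_{c \in C} \overline{\langle x, c \rangle} \langle x, c \rangle \right) + \sum_{c \in C} |\langle x, c \rangle|^2 = \|x\|^2 - \sum_{c \in C} |\langle x, c \rangle|^2,$$
which means $\sum_{c \in C} |\langle x, c \rangle|^2 \le \|x\|^2$. Now suppose for a contradiction that the set $\{ s \in S : \langle x, s \rangle \neq 0 \}$ is not countable. Then there are an $\varepsilon > 0$ and a set $B \subseteq S$ so that $B$ is at least countably infinite and for all $b \in B$ the inequality $|\langle x, b \rangle| > \varepsilon$ holds. Let $N \in \mathbb{N}$ be greater than $\frac{\|x\|^2}{\varepsilon^2}$ and let $B_N \subseteq B$ be an N-element subset of $B$. Then $\sum_{b \in B_N} |\langle x, b \rangle|^2 > N \varepsilon^2 > \|x\|^2$, a contradiction. Therefore the set $\{ s \in S : \langle x, s \rangle \neq 0 \}$ must be countable. The inequality follows from the inequality for finite subsets of $S$ proved above. ∎
Bessel's inequality shows that the sum $\sum_{s \in S} \langle x, s \rangle s$, which, under the right circumstances, should represent the element $x$, must converge in a Hilbert space (see Exercise 20-1). Thus we can define orthonormal bases.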
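Bessel's inequality can be observed on a small finite dimensional example. The sketch below works in $\mathbb{C}^4$ with the standard inner product, taken to be linear in the first argument; the orthonormal system `system`, its completion `base`, and the vector `x` are hypothetical choices made for illustration.

```python
import math


def inner(x, y):
    """Standard inner product on C^n, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


s = 1 / math.sqrt(2)

# An orthonormal system with two elements and a completion to a base of C^4.
system = [(1, 0, 0, 0), (0, s, s, 0)]
base = system + [(0, s, -s, 0), (0, 0, 0, 1)]

x = (1 + 2j, -1, 3j, 2)

norm_sq = inner(x, x).real
bessel_sum = sum(abs(inner(x, v)) ** 2 for v in system)
full_sum = sum(abs(inner(x, v)) ** 2 for v in base)

# Bessel: the partial sum of squared coefficients does not exceed ||x||^2.
# For a full orthonormal base the two quantities coincide.
print(bessel_sum, full_sum, norm_sq)
```

Here the sum over the two-element system is 10, while the sum over the full base recovers $\|x\|^2 = 19$.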
Definition 20.2 An orthonormal system $S$ in an inner product space $H$ is called an orthonormal base iff for all $x \in H$ the series $\sum_{s \in S} \langle x, s \rangle s$ converges to $x$. The numbers $\langle x, s \rangle$ are also called the Fourier coefficients of $x$ with respect to $S$.

The term "Fourier coefficients" is usually associated with the expansion of functions in terms of trigonometric functions. Section 20.2 will show that the results here generalize the original Fourier expansions. Theorems 20.3 and 20.4 give several criteria for an orthonormal system to be an orthonormal base. Note that we will freely use the continuity of the inner product in both factors, which is guaranteed by the Cauchy-Schwarz inequality.
Theorem 20.3 Parseval's identity. An orthonormal system $S$ in an inner product space $H$ is an orthonormal base iff for all $x \in H$ we have $\|x\|^2 = \sum_{s \in S} |\langle x, s \rangle|^2$.
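Proof. For "$\Rightarrow$," note that if $S$ is an orthonormal base, then, by the continuity of the inner product,
$$\|x\|^2 = \langle x, x \rangle = \Big\langle \sum_{s \in S} \langle x, s \rangle s , x \Big\rangle = \sum_{s \in S} \langle x, s \rangle \langle s, x \rangle = \sum_{s \in S} |\langle x, s \rangle|^2 .$$
For "$\Leftarrow$," let $\|x\|^2 = \sum_{s \in S} |\langle x, s \rangle|^2$ for all $x \in H$. By Proposition 20.1 the set $S_x := \{ s \in S : \langle x, s \rangle \neq 0 \}$ is countable. Let $(s_j)_{j=1}^{\infty}$ be an enumeration of $S_x$, let $\varepsilon > 0$ and let $N \in \mathbb{N}$ be so that for all $n \ge N$ the inequality $\sum_{j=n+1}^{\infty} |\langle x, s_j \rangle|^2 < \varepsilon^2$ holds. Then for all $n \ge N$ we obtain the following:
$$\Big\| x - \sum_{j=1}^{n} \langle x, s_j \rangle s_j \Big\|^2 = \|x\|^2 - \sum_{j=1}^{n} |\langle x, s_j \rangle|^2 = \sum_{j=n+1}^{\infty} |\langle x, s_j \rangle|^2 < \varepsilon^2 .$$
Hence, $x = \sum_{j=1}^{\infty} \langle x, s_j \rangle s_j$ […]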
[…] function $D_n(t) := \dfrac{\sin\left(\left(n + \frac{1}{2}\right) t\right)}{2 \sin\left(\frac{t}{2}\right)}$ is called the Dirichlet kernel.
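Proof. First note that the $n$th Fourier polynomial can be represented as follows.
$$\begin{aligned}
F_n(x) &= \frac{a_0}{2} + \sum_{j=1}^{n} a_j \cos(jx) + \sum_{j=1}^{n} b_j \sin(jx) \\
&= \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \frac{1}{2} \, d\lambda(t) + \sum_{j=1}^{n} \left[ \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \cos(jt) \, d\lambda(t) \cos(jx) + \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \sin(jt) \, d\lambda(t) \sin(jx) \right] \\
&= \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \left[ \frac{1}{2} + \sum_{j=1}^{n} \big( \cos(jt) \cos(jx) + \sin(jt) \sin(jx) \big) \right] d\lambda(t) \\
&= \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \left[ \frac{1}{2} + \sum_{j=1}^{n} \cos\big( j(x - t) \big) \right] d\lambda(t) .
\end{aligned}$$
Now, using the Euler identities, we obtain the following for all $z \in \mathbb{R}$ (actually for all $z \in \mathbb{C}$) with $\sin\left(\frac{z}{2}\right) \neq 0$.
$$\frac{1}{2} + \sum_{j=1}^{n} \cos(jz) = \frac{1}{2} \sum_{j=-n}^{n} e^{ijz} = \frac{1}{2} \, \frac{e^{i(n+1)z} - e^{-inz}}{e^{iz} - 1} = \frac{1}{2} \, \frac{e^{i\left(n+\frac{1}{2}\right)z} - e^{-i\left(n+\frac{1}{2}\right)z}}{e^{i\frac{z}{2}} - e^{-i\frac{z}{2}}} = \frac{\sin\left(\left(n+\frac{1}{2}\right)z\right)}{2 \sin\left(\frac{z}{2}\right)}$$
Note that $\lim_{z \to 0} \dfrac{\sin\left(\left(n+\frac{1}{2}\right)z\right)}{2 \sin\left(\frac{z}{2}\right)} = n + \dfrac{1}{2}$, so $D_n \in L^{\infty}[-\pi, \pi)$. Moreover, $D_n$ is $2\pi$-periodic. Therefore, with the substitution $u := x - t$ and because the integrand is $2\pi$-periodic in $u$,
$$F_n(x) = \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) D_n(x - t) \, d\lambda(t) = \frac{1}{\pi} \int_{x-\pi}^{x+\pi} f_p(x - u) D_n(u) \, d\lambda(u) = \frac{1}{\pi} \int_{-\pi}^{\pi} f_p(x - u) D_n(u) \, d\lambda(u) .$$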
The representation of Fourier polynomials with the Dirichlet kernel now allows us to prove that Fourier series converge pointwise for a large class of functions.
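The closed form of the Dirichlet kernel can be checked numerically against the defining cosine sum. The sketch below is only an illustration: the function names, the sample grid on $[-3, 3]$, and the range of $n$ are assumptions made for the example.

```python
import math

def dirichlet_closed(n, t):
    """Closed form sin((n + 1/2) t) / (2 sin(t / 2)).

    Where sin(t / 2) vanishes (t a multiple of 2*pi), the limit n + 1/2 is used.
    """
    s = math.sin(t / 2)
    if abs(s) < 1e-12:
        return n + 0.5
    return math.sin((n + 0.5) * t) / (2 * s)

def dirichlet_sum(n, t):
    """Cosine sum 1/2 + sum_{j=1}^n cos(j t)."""
    return 0.5 + sum(math.cos(j * t) for j in range(1, n + 1))

# Largest deviation between the two expressions on a sample grid.
max_gap = max(
    abs(dirichlet_closed(n, t) - dirichlet_sum(n, t))
    for n in range(0, 8)
    for t in (k * 0.1 - 3.0 for k in range(0, 61))
)
print(max_gap)
```

The deviation stays at the level of floating point rounding, which is consistent with the identity obtained from the Euler identities.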
Definition 20.10 A function $f : [-\pi, \pi) \to \mathbb{R}$ is called piecewise smooth iff there is a partition $P = \{ -\pi = x_0 < x_1 < \cdots < x_n = \pi \}$ of $[-\pi, \pi]$ so that for all $j = 1, \ldots, n$ the restriction $f|_{(x_{j-1}, x_j)}$ of $f$ to the interval $(x_{j-1}, x_j)$ is differentiable and its derivative is bounded.

Note that in the definition of piecewise smooth functions, continuity of $f$ at the points $x_j$ is not demanded. The definition does imply, however, that the periodic extension of a piecewise smooth function has left and right limits at every $x_j$ (see Exercise 20-7). Therefore we can say the following.
Theorem 20.11 If $f$ is a piecewise smooth $2\pi$-periodic function, then at each point $x \in [-\pi, \pi)$ the Fourier series $F$ of $f$ converges to $\dfrac{1}{2} \left[ \lim_{u \to x^-} f(u) + \lim_{u \to x^+} f(u) \right]$ (use $f_p$ at $x = -\pi$). In particular, $F(x) = f(x)$ for all $x$ at which $f$ is continuous. Convergence is uniform to $f_p$ on every closed subinterval of $[-\pi, \pi]$ so that $f_p$ is continuous on a neighborhood of the interval (see Figure 55 for examples). In particular, if $f_p$ is continuous, convergence is uniform to $f$ on $[-\pi, \pi)$.

Figure 55: For a piecewise smooth function, Fourier series converge uniformly where the function is continuous (left) and they converge to the average of the left and right limits where the function is discontinuous (right).

Proof. For any constant function $g(x) = c$ all Fourier coefficients except $a_0$ are zero and $a_0 = 2c$. Hence, by Theorem 20.9 for all $c \in \mathbb{R}$ we infer, because $g$ is constant and $D_n$ is even,
$$c = \frac{1}{\pi} \int_{-\pi}^{\pi} c \, D_n(t) \, d\lambda(t) = \frac{1}{\pi} \int_{0}^{\pi} 2c \, \frac{\sin\left(\left(n + \frac{1}{2}\right) t\right)}{2 \sin\left(\frac{t}{2}\right)} \, d\lambda(t) .$$
With $c := \dfrac{1}{2} \left[ \lim_{u \to x^-} f(u) + \lim_{u \to x^+} f(u) \right]$ we obtain
$$F_n(x) - c = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x - t) \, \frac{\sin\left(\left(n + \frac{1}{2}\right) t\right)}{2 \sin\left(\frac{t}{2}\right)} \, d\lambda(t) - c = \frac{1}{\pi} \int_{0}^{\pi} \frac{f(x - t) + f(x + t) - 2c}{2 \sin\left(\frac{t}{2}\right)} \, \sin\left(\left(n + \frac{1}{2}\right) t\right) d\lambda(t) .$$
Let $K$ be an upper bound for all existing values of $|f'|$ and let $x \in [-\pi, \pi)$. There is a $\delta > 0$ so that for all $t \in (0, \delta)$ we have $x + t, x - t \notin \{ x_0, \ldots, x_n \}$ and $\dfrac{t}{2 \sin\left(\frac{t}{2}\right)} < 2$. By definition of piecewise smooth functions and the Mean Value Theorem, for all $t \in (0, \delta)$, independent of whether $x$ is in $\{ x_0, \ldots, x_n \}$ or not, the inequalities $\left| f(x + t) - \lim_{u \to x^+} f(u) \right| \le Kt$ and $\left| f(x - t) - \lim_{u \to x^-} f(u) \right| \le Kt$ hold. (Use $f_p$ and appropriate additional hypotheses on $\delta$ if $x = -\pi$.) Hence, for all $t \in (0, \delta)$ we infer
$$\left| \frac{f(x - t) + f(x + t) - 2c}{2 \sin\left(\frac{t}{2}\right)} \right| \le \frac{2Kt}{2 \sin\left(\frac{t}{2}\right)} < 4K$$
[…]
[…] $> 0$ there is a $p \in \operatorname{span}(T)$ so that $\|p - f\|_{\infty} <$ […], which means that […]
By Theorem 18.12, the set $C_0^{\infty}(-\pi, \pi)$ is dense in $L^2(-\pi, \pi)$, which means that it is also dense in $L^2[-\pi, \pi)$. The above proves that for every $g \in C_0^{\infty}(-\pi, \pi)$ and every $\varepsilon > 0$ there is a $p \in \operatorname{span}(T)$ with $\|p - g\|_2 < \varepsilon$. Because $C_0^{\infty}(-\pi, \pi)$ is dense in $L^2[-\pi, \pi)$, $\operatorname{span}(T)$ is dense in $L^2[-\pi, \pi)$. By Theorem 20.4, this means that $T$ is an orthonormal base in $L^2[-\pi, \pi)$. Hence, the Fourier series of any function $f \in L^2[-\pi, \pi)$ converges to $f$ in $L^2$. By Proposition 14.47, the Fourier series converges in measure and by Proposition 14.49 there is a pointwise a.e. convergent subsequence. ∎

Because the density proof works in arbitrary $L^p$-spaces we obtain the following.

Corollary 20.13 The subspace $\operatorname{span}(T)$ is dense in $L^p[-\pi, \pi)$ ($1 \le p < \infty$). ∎
Note that the density of the trigonometric polynomials need not imply the convergence of Fourier series. We have encountered this situation with Taylor polynomials. By the Stone-Weierstrass Theorem (see Exercise 16-87), the polynomials are dense in $C[-1, 1]$. Yet there are functions (see Lemma 18.8) for which the Taylor series do not converge to the function. Similarly, the Stone-Weierstrass Theorem (use Exercise 16-87j) can be used to prove that the trigonometric polynomials are dense in the continuous periodic functions, but there are examples of continuous periodic functions whose Fourier series do not converge in $L^{\infty}[-\pi, \pi)$. Moreover, there are functions in $L^1[-\pi, \pi)$ whose Fourier series do not converge to the function in $L^1[-\pi, \pi)$. On the positive side, for $p \in (1, \infty)$ the Fourier series of functions in $L^p[-\pi, \pi)$ do converge to the function in $L^p[-\pi, \pi)$. The proofs of these results, which can be found in [30], are beyond the scope of this text. However, we can at least show that for $L^1$ functions the Fourier coefficients must converge to zero.
Corollary 20.14 Riemann-Lebesgue Theorem. Let $f \in L^1[-\pi, \pi)$ and let $a_j$ and $b_j$ be its Fourier cosine and sine coefficients. Then $\lim_{j \to \infty} a_j = 0$ and $\lim_{j \to \infty} b_j = 0$.

Proof. Let $f \in L^1[-\pi, \pi)$ and let $\varepsilon > 0$. Then there is a function $g \in C_0^{\infty}(-\pi, \pi)$ with $\|f - g\|_1$ […]
[…] both converge absolutely. Hint. Cauchy-Schwarz inequality.
20.3 The Riesz Representation Theorem

20-13. Some bounds for Fourier coefficients.
(a) Let $f : [-\pi, \pi] \to \mathbb{R}$ be continuous and twice continuously differentiable on $(-\pi, \pi)$ with […]
(b) Let $f : [-\pi, \pi] \to \mathbb{R}$ be continuous and continuously differentiable on $(-\pi, \pi)$ so that $\int_{-\pi}^{\pi} |f'(x)| \, dx < \infty$. Prove that for all $n \in \mathbb{N}$ the Fourier sine coefficients satisfy the inequality $|b_n| \le \dfrac{1}{n} \dfrac{1}{\pi} \left( |f(\pi)| + |f(-\pi)| + \int_{-\pi}^{\pi} |f'(x)| \, d\lambda \right)$.
20-14. Prove that if $f : [-\pi, \pi] \to \mathbb{R}$ is even, then its Fourier sine coefficients are zero and that if $f$ is odd, its Fourier cosine coefficients are zero.
20-15. Convergence of Fourier series in other norms.
(a) Prove that if $f \in L^2[-\pi, \pi)$ and $p \in [1, 2)$, then the Fourier series of $f$ also converges to $f$ in $L^p[-\pi, \pi)$. Hint. Exercise 15-31.
(b) Explain why part 20-15a does not prove that Fourier series of all functions in $L^1[-\pi, \pi)$ converge in $L^1[-\pi, \pi)$.
20-16. For $f \in L^1\big([-\pi, \pi), \mathbb{C}\big)$ and $k \in \mathbb{Z}$ define $c_k := \dfrac{1}{2\pi} \int_{-\pi}^{\pi} f(t) e^{-ikt} \, d\lambda(t)$. The $c_k$ are also called Fourier coefficients of $f$ and $\sum_{k=-\infty}^{\infty} c_k e^{ikx} := \sum_{k=0}^{\infty} c_k e^{ikx} + \sum_{k=1}^{\infty} c_{-k} e^{-ikx}$ is also called the Fourier series of $f$. Prove that for $f \in L^2\big([-\pi, \pi), \mathbb{C}\big)$ the series $\sum_{k=-\infty}^{\infty} c_k e^{ikx}$ converges to $f$ in $L^2$. Hint. Prove that the series is equal to the Fourier series from Definition 20.6. Use the Euler identities.
20-17. Explain why the Stone-Weierstrass Theorem, and in particular Exercise 16-87(j)iii, does not prove that Fourier series of continuous functions converge uniformly.
20-18. Dense subspaces of $C^0\big([0, 2\pi], \mathbb{C}\big)$
(a) Prove the complex version of the Stone-Weierstrass Theorem. That is, prove that if $A$ is a point-separating subalgebra of $C^0\big([0, 2\pi], \mathbb{C}\big)$ that contains the constant functions and so that for each $f = u + iv \in A$ the conjugate $u - iv$ also is in $A$, then $A$ is dense in $C^0\big([0, 2\pi], \mathbb{C}\big)$. Hint. Prove that the regular Stone-Weierstrass Theorem can be applied to the sets of real and imaginary parts of functions in $A$. That is, prove that these sets satisfy the hypotheses of the Stone-Weierstrass Theorem.
(b) […] $\Big\{ \sum_{j=-n}^{n} a_j e^{ijx} : a_j \in \mathbb{C}, n \in \mathbb{N} \Big\}$ is dense in $C^0\big([0, 2\pi], \mathbb{C}\big)$.
(c) Prove that the space from part 20-18b is dense in $L^p\big([0, 2\pi], \mathbb{C}\big)$, $1 \le p$ […]
[…] $+\, 2\|x - y_n\|^2 - \|x - y_m + x - y_n\|^2$ […]

Hence, $\{ y_n \}_{n=1}^{\infty}$ is a Cauchy sequence. Because the subset $K$ is complete, the limit $c := \lim_{n \to \infty} y_n$ exists in $K$ and $p = \|x - c\|$.
Definition 20.21 If the $c \in K$ as in Theorem 20.20 is unique, it is also called the best approximation of $x$ in $K$ (also see Figure 56).
Figure 56: The best approximation of x in a linear subspace K is obtained via orthogonal projections (also see Corollary 20.23 and Exercise 20-23). When K is a complete linear subspace, the best approximation is unique.
Theorem 20.22 Let $K$ be a complete linear subspace of the inner product space $H$ and let $x \in H$. Then the $c$ from Theorem 20.20 is unique. Moreover, $x$ has a unique decomposition $x = k + o$, where $k \in K$ and $o$ is orthogonal to $K$. The vector $k$ in this decomposition is the unique $c$ from Theorem 20.20.
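The decomposition of $x$ into a component in $K$ and a component orthogonal to $K$ can be illustrated in $\mathbb{R}^3$. The following sketch is only an example under stated assumptions: the real inner product, the spanning set `K`, the vector `x`, and the use of Gram-Schmidt orthonormalization are choices made here for illustration and are not part of the text.

```python
def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def orthonormalize(vectors):
    """Gram-Schmidt; assumes the input vectors are linearly independent."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = dot(w, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = dot(w, w) ** 0.5
        basis.append([wi / norm for wi in w])
    return basis

def project(x, vectors):
    """Orthogonal projection of x onto span(vectors)."""
    k = [0.0] * len(x)
    for e in orthonormalize(vectors):
        c = dot(x, e)
        k = [ki + c * ei for ki, ei in zip(k, e)]
    return k

K = [(1.0, 1.0, 0.0), (0.0, 1.0, 1.0)]   # hypothetical spanning set of the subspace
x = (1.0, 2.0, 4.0)                      # hypothetical vector to approximate

k = project(x, K)                        # component in K (best approximation)
o = [xi - ki for xi, ki in zip(x, k)]    # component orthogonal to K
residuals = [dot(o, v) for v in K]       # should vanish up to rounding
print(k, o, residuals)
```

For this data the normal direction of the plane is $(1, -1, 1)$, so the orthogonal component is a multiple of that vector and the projection lies in the plane.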
Proof. Let $K$ be a complete linear subspace and let $c \in K$ be as in Theorem 20.20. We first prove that $x - c$ must be orthogonal to $K$. For a contradiction, suppose that $\langle x - c, y \rangle \neq 0$ for some $y \in K$. Then we can assume without loss of generality that $\|y\| = 1$ and, because $\left| e^{i\theta} \right| = 1$ for all real numbers $\theta$, that $\Re\big( \langle x - c, y \rangle \big) \neq 0$. With $\delta := -\Re\big( \langle x - c, y \rangle \big)$ we obtain $c - \delta y \in K$ and
$$\begin{aligned}
\|x - (c - \delta y)\|^2 &= \|x - c + \delta y\|^2 \\
&= \|x - c\|^2 + 2\delta\, \Re\big( \langle x - c, y \rangle \big) + \delta^2 \|y\|^2 \\
&= \|x - c\|^2 - \Big( \Re\big( \langle x - c, y \rangle \big) \Big)^2
\end{aligned}$$
[…]
[…] $\varepsilon$ for some $\varepsilon > 0$. Then the equivalent linear inhomogeneous $\mathbb{R}^n$ valued differential equation […] satisfies the hypotheses of the Picard-Lindelöf Theorem with the Lipschitz condition on $y$ being valid on all of $\mathbb{R}^n$. (This is most easily seen by using $\| \cdot \|_1$ on $\mathbb{R}^n$.) Therefore every initial value problem for the $\mathbb{R}^n$ valued differential equation has a unique solution on some interval $[a, a + \delta)$. Now consider an initial value $y_{\mathrm{init}} \in \mathbb{R}^n$ and two functions $f_1 : [a, c_1) \to \mathbb{R}^n$ and $f_2 : [a, c_2) \to \mathbb{R}^n$ that satisfy the $\mathbb{R}^n$ valued differential equation with initial value $f_1(a) = y_{\mathrm{init}} = f_2(a)$. Without loss of generality assume that $c_1 \le c_2$. Let $d := \sup \big\{ x \in [a, b) : f_1|_{[a,x)} = f_2|_{[a,x)} \big\}$ and suppose for a contradiction that $d < c_1$. Then $f_1|_{[d,c_1)}$ and $f_2|_{[d,c_2)}$ both solve the $\mathbb{R}^n$ valued differential equation and continuity implies $f_1(d) = f_2(d)$, but the two functions are not equal on any interval $[d, d + \delta)$, contradicting the Picard-Lindelöf Theorem. Hence, $d = c_1$, which means that $f_2$ is an extension of $f_1$. Consequently, each initial value problem for the $\mathbb{R}^n$ valued differential equation has a (globally) unique solution, and hence the same holds for the original $n$th order differential equation. ∎
With existence and uniqueness of the solutions of initial value problems established, it is possible to state the form of all solutions of linear homogeneous and linear inhomogeneous differential equations.
Theorem 22.11 Let $a < b$ and let $a_0, \ldots, a_n : [a, b) \to \mathbb{R}$ be continuous functions with $a_n(t) \neq 0$ for all $t \in [a, b)$. Then the linear homogeneous differential equation $a_n(t) y^{(n)} + \cdots + a_1(t) y' + a_0(t) y = 0$ has $n$ linearly independent solutions $y_1, \ldots, y_n$ and every solution of the differential equation is of the form $y = \sum_{j=1}^{n} c_j y_j$, where $c_1, \ldots, c_n \in \mathbb{R}$.
Proof. By Theorem 22.10, the function $F$ that maps each solution $y$ of the differential equation to the vector $\big( y(a), y'(a), \ldots, y^{(n-2)}(a), y^{(n-1)}(a) \big)$ is bijective and it is easy to prove that it is linear, too. Hence, there is a linear isomorphism between the vector space of solutions of the differential equation and $\mathbb{R}^n$. The result is now proved by choosing a base in $\mathbb{R}^n$ and using the inverse images of the base vectors as the solutions $y_1, \ldots, y_n$. ∎
Theorem 22.12 Let $a < b$ and let $g, a_0, \ldots, a_n : [a, b) \to \mathbb{R}$ be continuous functions with $a_n(t) \neq 0$ for all $t \in [a, b)$. Let $y_p$ be a particular solution of the linear inhomogeneous differential equation $a_n(t) y^{(n)} + \cdots + a_1(t) y' + a_0(t) y = g(t)$ and let $y_1, \ldots, y_n$ be linearly independent solutions of the linear homogeneous differential equation $a_n(t) y^{(n)} + \cdots + a_1(t) y' + a_0(t) y = 0$. Then every solution of the linear inhomogeneous differential equation $a_n(t) y^{(n)} + \cdots + a_1(t) y' + a_0(t) y = g(t)$ is of the form $y = y_p + \sum_{j=1}^{n} c_j y_j$, where $c_1, \ldots, c_n \in \mathbb{R}$.
Proof. Exercise 22-11.

Knowing what a general solution of a differential equation looks like allows us to write an expression that encodes all solutions of the differential equation. This expression is also called the general solution.
Example 22.13 The general solution of the differential equation $X'' + \lambda^2 X = 0$ is $X(x) = a \cos(\lambda x) + b \sin(\lambda x)$.

By Theorem 22.11, every solution of the differential equation is a linear combination of two linearly independent solutions. Thus we are done if we can find two linearly independent solutions. It is easily checked that $\cos(\lambda x)$ and $\sin(\lambda x)$ are linearly independent solutions of $X'' + \lambda^2 X = 0$, which proves the claim.
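The two checks used in this example, that $a \cos(\lambda x) + b \sin(\lambda x)$ solves the equation and that $\cos(\lambda x)$ and $\sin(\lambda x)$ are linearly independent, can be illustrated numerically. In the sketch below the values of `lam`, `a`, and `b`, the sample points, and the use of a central difference for the second derivative and of the Wronskian as an independence test are assumptions made for illustration.

```python
import math

lam = 1.7            # hypothetical value of the parameter lambda (nonzero)
a, b = 0.8, -1.3     # hypothetical coefficients of the linear combination

def X(x):
    """Candidate solution a*cos(lam*x) + b*sin(lam*x)."""
    return a * math.cos(lam * x) + b * math.sin(lam * x)

def second_derivative(f, x, h=1e-4):
    """Central difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

# Residual of X'' + lam^2 X = 0 at a few sample points.
residual = max(
    abs(second_derivative(X, x) + lam ** 2 * X(x))
    for x in (0.0, 0.5, 1.0, 2.0, 3.0)
)

def wronskian(x):
    """Wronskian of cos(lam*x) and sin(lam*x); it equals lam for every x."""
    c, s = math.cos(lam * x), math.sin(lam * x)
    return c * (lam * c) - s * (-lam * s)

print(residual, wronskian(0.3))
```

A nonzero Wronskian is one standard way to confirm linear independence of two solutions; the residual is small but not zero because the second derivative is only approximated.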
Finally, note that because all results proved in this section are valid for linear differential equations with nonconstant coefficients, the results in this section apply to a wide variety of differential equations that arise in mathematical physics. For an example, consider Exercise 22-13.
Exercises
22-9. Prove Proposition 22.8.
22-10. Prove Proposition 22.9.
22-11. Prove Theorem 22.12. Hints. First prove that each $y$ as stated is a solution. Then use that if $y$ is any solution, then $y - y_p$ solves the homogeneous equation.
22-12. Prove that the general solution of $T' = -k\lambda^2 T$ is $T(t) = c e^{-k\lambda^2 t}$.
22-13. Prove that for fixed $\lambda$ and $\nu$ the Bessel equation (see Exercise 21-14) has two linearly independent solutions $R_1$ and $R_2$ defined on $(0, \infty)$ and that every solution $R$ of the Bessel equation is a linear combination of the form $R = c_1 R_1 + c_2 R_2$. Hint. Argue around an initial point $a \neq 0$ and use a substitution $u := -r$ to go right to left in the independent variable $r$.
22-14. Consider the differential equation $m y'' + c y' + k y = 0$.
(a) Prove that if $4km - c^2 > 0$, then the general solution of the differential equation is […]
(b) Prove that if $c^2 - 4km > 0$, then the general solution of the differential equation is […]
(c) Prove that if $c^2 - 4km = 0$, then the general solution of the differential equation is $y(t) := A e^{-\frac{c}{2m} t} + B t e^{-\frac{c}{2m} t}$.
(d) Physically interpret the results of parts 22-14a ("underdamped oscillator"), 22-14b ("overdamped oscillator"), and 22-14c ("critically damped oscillator").
Chapter 23
The Finite Element Method

The finite element method uses a deep and beautiful combination of analysis, geometry, linear algebra and computation to provide approximations to solutions of partial differential equations. As with each chapter in Part III, we will be able to highlight the main ideas, leaving deeper study to the reader as desired. In particular, this chapter will address the theoretical background by introducing the requisite spaces and the method. At the end, a brief outline indicates how the method is used in practice. Simplistically speaking, the finite element method provides something like a best approximation within a finite dimensional subspace of a Hilbert space of functions. This approximation is pieced together from functions whose domains are small subsets of the overall domain under consideration. Though it is technically not quite correct, these functions can be considered to be the "finite elements." The key to obtaining the mentioned best approximation is to interpret the left side of the partial differential equation as an operator. For operators with sufficiently nice properties, we can then derive theoretical existence and convergence theorems (see Section 23.1). Unfortunately, differential operators need not be well behaved. Spaces on which the results of Section 23.1 can be applied to differential operators are defined in Sections 23.2 and 23.3 and some operators within the scope of the theory are introduced in Section 23.4. The main practical challenges lie in the choice of the subspace (see Section 23.5), in the computation of the best approximation and in the estimation of the error. These practical challenges are addressed in the extensive literature on the finite element method.
23.1 Ritz-Galerkin Approximation

A partial differential equation can be written in the form $Du = f$, where $Du$ is some combination of partial derivatives of $u$ and $f$ is a function. Any function $u$ that satisfies $Du = f$ is a solution of the partial differential equation. The examples in Chapter 21 show that in applications the left side of a partial differential equation is often linear, that is, $D(\alpha u + \beta v) = \alpha Du + \beta Dv$ for all numbers $\alpha, \beta$ and all sufficiently often differentiable functions $u, v$. Thus it is sensible to consider linear differential operators and we will do so throughout this chapter. If our functions are elements of a Hilbert space $H$ (all Hilbert spaces in this chapter are assumed to be real), then $Du = f$ iff $u$ satisfies the system of equations $\langle Du, h \rangle = \langle f, h \rangle$ for all $h \in H$ (see Exercise 20-3). Ultimately, the systems of equations for the finite element method are defined slightly differently, but the above will serve as a good motivation until we are ready to discuss details. The right side $F := \langle f, \cdot \rangle$ of the equation is a functional in the dual space of $H$ and the left side $B(\cdot, \cdot) := \langle D(\cdot), \cdot \rangle$ is a bilinear function. Therefore in this section we will focus on systems of equations of the form
$$B(\cdot, h) = F(h) \quad \text{for all } h \in H \qquad \text{(H-PDE)},$$
where $B : H \times H \to \mathbb{R}$ is bilinear and $F \in H^*$. These are the appropriate systems of equations to consider, because in the eventual formulation, we will also have a bilinear form and a linear functional. An element $u \in H$ will be called a solution of (H-PDE) iff $B(u, h) = F(h)$ for all $h \in H$. Partial differential equations typically come with boundary conditions attached. Section 23.3 will show how boundary conditions can be absorbed into (H-PDE) and into the definition of the underlying space. Thus we will not be concerned with boundary conditions for now. Aside from $B$ being continuous, we need the following property to build the theory.
Definition 23.1 Let $H$ be a real Hilbert space and let $B : H \times H \to \mathbb{R}$ be a bilinear function. Then $B$ is called elliptic or coercive iff there is a $\lambda > 0$ so that for all $u \in H$ we have $B(u, u) \ge \lambda \|u\|^2$. The parameter $\lambda$ is also called the coercivity coefficient.

In general, a bilinear function $B(\cdot, \cdot) = \langle D(\cdot), \cdot \rangle$ need not be continuous or elliptic. However, we will see that both properties can be obtained if the spaces and bilinear functions are chosen appropriately. Formally, we note that continuous, elliptic, and symmetric bilinear functions introduce an inner product that could well replace the original inner product on the space.
Proposition 23.2 Let $H$ be a real Hilbert space and let $B : H \times H \to \mathbb{R}$ be a symmetric, continuous, elliptic bilinear function. Then $B(\cdot, \cdot)$ defines an inner product on $H$ so that the norm $\| \cdot \|_B$ induced by $B$ is equivalent to the norm of $H$.

Proof. Exercise 23-1.

Proposition 23.2 shows that if the bilinear function $B$ is symmetric, continuous, and elliptic, then replacing the original inner product $\langle \cdot, \cdot \rangle$ with the inner product $B(\cdot, \cdot)$ would not affect the notion of convergence. The only change would be that angles are measured differently. Most importantly, if $B(\cdot, \cdot)$ is an inner product, then by Riesz' Representation Theorem there must be a unique $u \in H$ so that $B(u, h) = F(h)$ for all $h \in H$. Therefore, in this case the existence and uniqueness of a solution of the system (H-PDE) would already be established. Unfortunately, the bilinear functions $B(\cdot, \cdot) = \langle D(\cdot), \cdot \rangle$ need not be symmetric, even for very simple operators.
Example 23.3 Consider the space $C_0^{\infty}(0, 1)$ as an inner product subspace of $L^2(0, 1)$. The derivative $\dfrac{d}{dx} : C_0^{\infty}(0, 1) \to C_0^{\infty}(0, 1)$ is a linear function on $C_0^{\infty}(0, 1)$. Hence, $B(f, g) := \left\langle \dfrac{d}{dx} f, g \right\rangle$ is a bilinear function on $C_0^{\infty}(0, 1) \times C_0^{\infty}(0, 1)$. But for all functions $f, g \in C_0^{\infty}(0, 1)$ integration by parts leads to the following.
$$B(f, g) = \int_0^1 f' g \, d\lambda = \lim_{a \to 1^-} f(a) g(a) - \lim_{b \to 0^+} f(b) g(b) - \int_0^1 f g' \, d\lambda = -\int_0^1 f g' \, d\lambda = -B(g, f)$$
Therefore $B$ is not symmetric.
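The identity $B(f, g) = -B(g, f)$ can be observed numerically. The sketch below is an illustration under stated assumptions: the two polynomials merely vanish at $0$ and $1$ (they are stand-ins, not elements of $C_0^{\infty}(0, 1)$), which is all that the boundary terms in the integration by parts require, and the composite Simpson rule is a quadrature choice made for this example.

```python
def simpson(func, a, b, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

# Hypothetical stand-ins that vanish at both endpoints of (0, 1).
def f(x):
    return x ** 2 * (1 - x) ** 2

def df(x):
    return 2 * x * (1 - x) ** 2 - 2 * x ** 2 * (1 - x)

def g(x):
    return x ** 3 * (1 - x) ** 2

def dg(x):
    return 3 * x ** 2 * (1 - x) ** 2 - 2 * x ** 3 * (1 - x)

def B(u_prime, v):
    """B(u, v) = integral over (0, 1) of u' v, given u' and v."""
    return simpson(lambda x: u_prime(x) * v(x), 0.0, 1.0)

B_fg = B(df, g)
B_gf = B(dg, f)
print(B_fg, B_gf)
```

The two values are nonzero and differ in sign, so for these functions $B(f, g) \neq B(g, f)$.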
Typically the bilinear functions associated with partial differential equations cannot be made symmetric. However, on the positive side, under the right circumstances our bilinear functions will have all other properties of an inner product. Hence, we should be able to obtain results similar to those for inner products. First we establish that for continuous, elliptic bilinear functions $B$ the system (H-PDE) has a unique solution.
Lemma 23.4 Lax-Milgram Lemma. Let $H$ be a real Hilbert space, let $F \in H^*$ and let $B : H \times H \to \mathbb{R}$ be a continuous elliptic bilinear function. Then there is exactly one $u \in H$ so that $B(u, h) = F(h)$ for all $h \in H$. Moreover, for this $u$ the inequality $\|u\| \le \dfrac{\|F\|}{\lambda}$ holds, where $\lambda$ is the coercivity coefficient.
Proof. Because $B$ is bilinear and continuous, for every $w \in H$ the function $B(w, \cdot)$ is linear and continuous. Thus by Riesz' Representation Theorem there is a $T(w) \in H$ so that for all $h \in H$ the equality $B(w, h) = \langle T(w), h \rangle$ holds. Note that if we can prove that $T : H \to H$ is surjective, then we have proved that a function $u$ as claimed exists. Indeed, by Riesz' Representation Theorem for every $F \in H^*$ there is an $f \in H$ so that $F(\cdot) = \langle f, \cdot \rangle$. If $T$ is surjective, then $B\big( T^{-1}(f), \cdot \big) = \langle f, \cdot \rangle = F(\cdot)$ for all $F \in H^*$, as claimed. In the remainder of this proof, we show that $T$ is surjective.

First we show that $T$ is linear and continuous. For all $x, y \in H$, $\alpha, \beta \in \mathbb{R}$ and $h \in H$ we obtain
$$\langle T(\alpha x + \beta y), h \rangle = B(\alpha x + \beta y, h) = \alpha B(x, h) + \beta B(y, h) = \alpha \langle T(x), h \rangle + \beta \langle T(y), h \rangle = \langle \alpha T(x) + \beta T(y), h \rangle ,$$
and hence $T(\alpha x + \beta y) = \alpha T(x) + \beta T(y)$, because $h \in H$ was arbitrary (see Exercise 20-3). Therefore $T$ is linear. Moreover, $T$ is continuous, because for all $w \in H$ the inequality $\|T(w)\|^2 = \langle T(w), T(w) \rangle = B(w, T(w)) \le \|B\| \|w\| \|T(w)\|$ holds, which means that $\|T(w)\| \le \|B\| \|w\|$ for all $w \in H$.

Second, we prove that $T$ is injective and the inverse of $T$ is continuous. Suppose for a contradiction that $T$ was not injective. Then there is an element $w \neq 0$ with $T(w) = 0$. But then $0 = \langle T(w), w \rangle = B(w, w) \ge \lambda \|w\|^2 > 0$, a contradiction. Now let
$T^{-1} : T[H] \to H$ be the inverse of $T$. Then for all $y \in T[H]$ we infer the inequalities $\|y\| \left\| T^{-1}(y) \right\| \ge \left\langle y, T^{-1}(y) \right\rangle = B\big( T^{-1}(y), T^{-1}(y) \big) \ge \lambda \left\| T^{-1}(y) \right\|^2$, which means that $\left\| T^{-1}(y) \right\| \le \dfrac{\|y\|}{\lambda}$, and hence $T^{-1}$ is continuous. Note that once we have proved that $T$ is surjective, this inequality also establishes the inequality claimed at the end of the Lax-Milgram Lemma.

Third, we prove that $T[H]$ is a closed subspace of $H$. Let $\{ y_n \}_{n=1}^{\infty}$ be a sequence in $T[H]$ that converges to $y \in H$. Then, because $T^{-1}$ is continuous, the sequence $\big\{ T^{-1}(y_n) \big\}_{n=1}^{\infty}$ is a Cauchy sequence. Because $H$ is a Hilbert space, there is an $x \in H$ so that $\lim_{n \to \infty} T^{-1}(y_n) = x$. But then because $T$ is continuous, we infer that
$$T(x) = T\Big( \lim_{n \to \infty} T^{-1}(y_n) \Big) = \lim_{n \to \infty} T\big( T^{-1}(y_n) \big) = \lim_{n \to \infty} y_n = y .$$
Hence, $y \in T[H]$ and because $\{ y_n \}_{n=1}^{\infty}$ was arbitrary we conclude that $T[H]$ is closed.

Now we can finally prove that $T[H] = H$. Suppose for a contradiction that $T[H] \neq H$. Then there is a $b \in H \setminus T[H]$. By Theorem 20.22, we can assume without loss of generality that $b$ is orthogonal to all $y \in T[H]$. But then we obtain the inequalities $0 = \langle T(b), b \rangle = B(b, b) \ge \lambda \|b\|^2 > 0$, a contradiction.

With unique solvability of the system (H-PDE) established, we can turn to the task of approximating the solutions. By Exercise 20-22, if $V$ is a complete subspace of $H$ then $u \in V$ is the best approximation of $f$ in $V$ iff $\langle u, v \rangle = \langle f, v \rangle$ for all $v \in V$. Because our bilinear functions are not inner products, we cannot formally talk about best approximations, but the system of equations
$$B(\cdot, v) = F(v) \quad \text{for all } v \in V \subseteq H \qquad \text{(V-PDE)}$$
is well-defined. Ultimately, the right choice of a finite dimensional subspace $V$ of $H$ will produce a solution of a system of equations (V-PDE) that is close to the solution of (H-PDE). Before we can address such details, we need to assure that the system (V-PDE) actually has a unique solution with the right properties.
Definition 23.5 Let $H$ be a real Hilbert space, let $V$ be a subspace, let $F \in H^*$ and let $B : H \times H \to \mathbb{R}$ be a continuous elliptic bilinear function. A Ritz-Galerkin approximation of the solution $u$ of $B(u, h) = F(h)$ for all $h \in H$ (that is, of the solution of (H-PDE)) is defined to be a $u_V \in V$ so that $B(u_V, v) = F(v)$ for all $v \in V$ (that is, $u_V$ is a solution of (V-PDE)) if such an element exists.

Because continuous elliptic bilinear maps $B$ share so many properties with the inner product of $H$ it is not surprising that unique Ritz-Galerkin approximations exist and that if a sequence of spaces "fills" $H$ in the right way, then the Ritz-Galerkin approximations will converge to the solution of (H-PDE).
Lemma 23.6 Let $H$ be a real Hilbert space, let $V$ be a closed subspace, let $F \in H^*$ and let $B : H \times H \to \mathbb{R}$ be a continuous elliptic bilinear function. Then $V$ contains a unique Ritz-Galerkin approximation $u_V$ for the solution of (H-PDE).
Proof. Let $C := B|_{V \times V}$ and let $G := F|_V$. Then $C$ is bilinear, continuous, and elliptic and $G \in V^*$. By the Lax-Milgram Lemma there is a unique $u_V \in V$ so that for all $v \in V$ we have $C(u_V, v) = G(v)$, that is, $B(u_V, v) = F(v)$, which was to be proved.

Lemma 23.7 Céa's Lemma. Let $H$ be a real Hilbert space, let $V$ be a closed subspace, let $F \in H^*$, let $B : H \times H \to \mathbb{R}$ be a continuous elliptic bilinear function, let $u$ be the solution of the equations $B(u, h) = F(h)$ for all $h \in H$ and let $u_V$ be the Ritz-Galerkin approximation of $u$ in $V$. Then $\|u - u_V\| \le \dfrac{\|B\|}{\lambda} \inf_{v \in V} \|u - v\|$, where $\|B\|$ is the tensor norm of $B$ and $\lambda$ is its coercivity coefficient. In particular, if $\{ V_n \}_{n=1}^{\infty}$ is a sequence of closed spaces so that $\lim_{n \to \infty} \operatorname{dist}(u, V_n) = 0$, then $\lim_{n \to \infty} \|u - u_{V_n}\| = 0$.
Proof. Note that $B(u - u_V, w) = B(u, w) - B(u_V, w) = F(w) - F(w) = 0$ for all $w \in V$. Hence, for arbitrary $v \in V$ we infer
$$\begin{aligned}
\lambda \|u - u_V\|^2 &= \lambda \langle u - u_V, u - u_V \rangle \le B(u - u_V, u - u_V) \\
&= B(u - u_V, u) - B(u - u_V, u_V) = B(u - u_V, u) \\
&= B(u - u_V, u) - B(u - u_V, v) = B(u - u_V, u - v) \\
&\le \|B\| \|u - u_V\| \|u - v\| ,
\end{aligned}$$
so $\|u - u_V\| \le \dfrac{\|B\|}{\lambda} \|u - v\|$ for all $v \in V$, which implies the desired inequality.
The above shows that if we can define Hilbert spaces of functions and continuous and elliptic bilinear functions so that the solution of a partial differential equation $Du = f$ is also the solution of a system of equations (H-PDE), then the solution can be approximated with functions $u_V$ taken from appropriately defined subspaces $V$ of $H$. To make the problem accessible to computation, it is sensible to find Ritz-Galerkin approximations for the solution of (H-PDE) in appropriately chosen finite dimensional subspaces $V$. Using finite dimensional subspaces reduces the Ritz-Galerkin approximation to solving a system of linear equations.
Theorem 23.8 Let $H$ be a real Hilbert space, let $S$ be a finite dimensional subspace with base $\{ s_1, \ldots, s_d \}$, let $B : H \times H \to \mathbb{R}$ be a continuous elliptic bilinear function and let $F \in H^*$. Then $u_1, \ldots, u_d$ are the coefficients of the unique Ritz-Galerkin approximation $u_S = \sum_{j=1}^{d} u_j s_j$ in $S$ for the solution of (H-PDE) iff $u_1, \ldots, u_d$ solve the system of equations given by
$$\sum_{j=1}^{d} B(s_j, s_i) u_j = F(s_i) \quad \text{for all } i = 1, \ldots, d \qquad \text{(S-PDE)}.$$

Proof. Exercise 23-2.
In summary, because finite dimensional subspaces are closed our abstract considerations have, under the right circumstances, reduced the task of approximating the solution of a partial differential equation to the task of solving a system (S-PDE) of linear equations. We will now show how to translate many partial differential equations into the framework presented in this section.
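To make the reduction to a linear system concrete, here is a minimal sketch for a one dimensional model problem. Everything specific in it is an assumption made for illustration and is not taken from this section: the bilinear function $B(u, v) = \int_0^1 u' v' \, d\lambda$ (which happens to be symmetric), the functional $F(v) = \int_0^1 v \, d\lambda$, zero boundary values, and a base of piecewise linear "hat" functions $s_1, \ldots, s_d$ on a uniform grid. The function name `ritz_galerkin_hat` is hypothetical.

```python
def ritz_galerkin_hat(d):
    """Solve sum_j B(s_j, s_i) u_j = F(s_i), i = 1..d, for hat functions s_i
    on a uniform grid of (0, 1), with the assumed model data
    B(u, v) = integral of u'v' and F(v) = integral of v."""
    h = 1.0 / (d + 1)
    diag = [2.0 / h] * d          # B(s_i, s_i)
    off = [-1.0 / h] * (d - 1)    # B(s_i, s_{i+1}) = B(s_{i+1}, s_i)
    rhs = [h] * d                 # F(s_i) = integral of the hat function s_i

    # Thomas algorithm: Gaussian elimination for a tridiagonal system.
    for i in range(1, d):
        m = off[i - 1] / diag[i - 1]
        diag[i] -= m * off[i - 1]
        rhs[i] -= m * rhs[i - 1]
    u = [0.0] * d
    u[-1] = rhs[-1] / diag[-1]
    for i in range(d - 2, -1, -1):
        u[i] = (rhs[i] - off[i] * u[i + 1]) / diag[i]
    return h, u

h, coeffs = ritz_galerkin_hat(9)
# For this model problem the function x(1 - x)/2 satisfies -u'' = 1 with
# zero boundary values; compare the coefficients with its grid values.
exact = [(i + 1) * h * (1 - (i + 1) * h) / 2 for i in range(9)]
max_err = max(abs(c - e) for c, e in zip(coeffs, exact))
print(coeffs, max_err)
```

Because each hat function equals one at its own grid point and zero at the others, the coefficients $u_j$ are the values of $u_S$ at the grid points. Only neighboring hat functions overlap, so the matrix $\big( B(s_j, s_i) \big)$ is tridiagonal, which is why a direct elimination suffices here.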
Exercises
23-1. Prove Proposition 23.2.
23-2. Prove Theorem 23.8.
23.2 Weakly Differentiable Functions

The first step toward using the results of Section 23.1 (specifically Theorem 23.8) is to encode derivatives so that we can define operators that represent the left side of a partial differential equation on a Hilbert space of functions. We start by introducing notation to simplify the details.
Definition 23.9 A multiindex is a $d$-tuple $\alpha := (\alpha_1, \ldots, \alpha_d)$ of nonnegative integers. We define $|\alpha| := \sum_{i=1}^{d} |\alpha_i|$. The set of nonnegative integers will be denoted $\mathbb{N}_0$.
Definition 23.10 Let $\Omega \subseteq \mathbb{R}^d$ be open, let $f : \Omega \to \mathbb{R}$ be $k$ times differentiable and let $\alpha \in \mathbb{N}_0^d$ be a multiindex with $|\alpha| \le k$. We define $D^{\alpha} f := \dfrac{\partial^{|\alpha|}}{\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}} f$. $D^{\alpha} f$ will also be called a partial derivative of $|\alpha|$th order.

Consider an open set $\Omega \subseteq \mathbb{R}^d$. The elements of $L^2(\Omega)$ can have discontinuities, so they are not all differentiable. But Theorem 18.12 shows that the space $C_0^{\infty}(\Omega)$ is dense in $L^2(\Omega)$. By Exercise 20-3, we know that $u = f$ iff $\langle u, x \rangle = \langle f, x \rangle$ for all $x$ in a dense subspace of $H$. Definition 23.11 below defines a weak derivative via this property. The inspiration comes from integration by parts. Consider a continuously differentiable function $f : \Omega \to \mathbb{R}$ and let $g \in C_0^{\infty}(\Omega)$. Then for all $j \in \{ 1, \ldots, d \}$ Fubini's Theorem implies the following.
$$\int_{\Omega} \frac{\partial f}{\partial x_j} \, g \, d\lambda = -\int_{\Omega} f \, \frac{\partial g}{\partial x_j} \, d\lambda$$
Therefore, an easy induction shows that if $D^{\alpha} f$ is continuous and $g \in C_0^{\infty}(\Omega)$, then $\int_{\Omega} (D^{\alpha} f) g \, d\lambda = (-1)^{|\alpha|} \int_{\Omega} f (D^{\alpha} g) \, d\lambda$. This is the motivation for defining weak derivatives in Definition 23.11 below. Proposition 23.12 then shows that the weak derivative is unique if it exists and Proposition 23.13 shows that regular derivatives are also weak derivatives.
Definition 23.11 Let $\Omega \subseteq \mathbb{R}^d$ be open, let $1 \le p \le \infty$, let $f \in L^p(\Omega)$, and let $\alpha \in \mathbb{N}_0^d$ be a multiindex. The function $w \in L^p(\Omega)$ is called a weak derivative of $f$ of order $|\alpha|$ iff for all test functions $g \in C_0^{\infty}(\Omega)$ we have $\langle w, g \rangle_{L^2(\Omega)} = (-1)^{|\alpha|} \langle f, D^{\alpha} g \rangle_{L^2(\Omega)}$. (Hölder's inequality guarantees that $\langle w, g \rangle_{L^2(\Omega)}$ exists for all $w \in L^p(\Omega)$ and all $g \in C_0^{\infty}(\Omega)$.) The function $f$ is called $|\alpha|$ times weakly differentiable iff all weak derivatives up to order $|\alpha|$ exist. For $|\alpha| = 1$, we say $f$ is weakly differentiable.

Proposition 23.12 Let $\Omega \subseteq \mathbb{R}^d$ be an open subset of $\mathbb{R}^d$, let $1 \le p \le \infty$, let $f \in L^p(\Omega)$, and let $\alpha \in \mathbb{N}_0^d$ be a multiindex. If the functions $v, w \in L^p(\Omega)$ are both weak derivatives of $f$, that is, if $\langle v, g \rangle_{L^2(\Omega)} = (-1)^{|\alpha|} \langle f, D^{\alpha} g \rangle_{L^2(\Omega)} = \langle w, g \rangle_{L^2(\Omega)}$ for all test functions $g \in C_0^{\infty}(\Omega)$, then $v = w$.

Proof. Use the fact that $C_0^{\infty}(\Omega)$ is dense in $L^2(\Omega)$. (Exercise 23-3.) ∎
Proposition 23.13 Let $\Omega \subseteq \mathbb{R}^d$ be an open subset of $\mathbb{R}^d$, let $\alpha \in \mathbb{N}_0^d$ be a multiindex, let $1 \le p \le \infty$, and let $f \in L^p(\Omega) \cap C^{|\alpha|}(\Omega)$ be so that the regular $\alpha$th partial derivative $D^{\alpha} f$ is in $L^p(\Omega)$. Then for all test functions $g \in C_0^{\infty}(\Omega)$ the equality $\langle D^{\alpha} f, g \rangle_{L^2(\Omega)} = (-1)^{|\alpha|} \langle f, D^{\alpha} g \rangle_{L^2(\Omega)}$ holds, which means that $D^{\alpha} f$ is a weak $\alpha$th derivative of $f$.

Proof. Use integration by parts in the coordinates affected by $D^{\alpha}$ (similar to what was done to motivate the weak derivative), and then apply Proposition 23.12. (Exercise 23-4.) ∎

With weak derivatives coinciding with regular derivatives when regular derivatives exist, it is sensible to extend the notation to weak derivatives.
Definition 23.14 Let $\Omega \subseteq \mathbb{R}^d$ be an open subset of $\mathbb{R}^d$, let $1 \le p \le \infty$, let $f \in L^p(\Omega)$, and let $\alpha \in \mathbb{N}_0^d$ be a multiindex. Then a weak derivative $w \in L^p(\Omega)$ of $f$ so that for all test functions $g \in C_0^\infty(\Omega)$ we have $\langle w, g \rangle_{L^2(\Omega)} = (-1)^{|\alpha|} \langle f, D^\alpha g \rangle_{L^2(\Omega)}$ is denoted $D^\alpha f := w$. For a weak derivative of order 1 in the $i$th coordinate direction, we also write $D^{(i)} f$ instead of using a less easy to read $d$-tuple.
23. The Finite Element Method
We should note that there are functions that are not differentiable, but which have a weak derivative. Recall that the absolute value function is not differentiable at zero.
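Example 23.15 Let $\Omega := (-1, 1) \subset \mathbb{R}$ and consider $f(x) = |x|$. The weak derivative of $f$ is
$$D^{(1)} f(t) = \begin{cases} 1; & \text{for } t > 0, \\ -1; & \text{for } t \le 0. \end{cases}$$
Let $g \in C_0^\infty(-1, 1)$ and let $a \in (0, 1)$ be so that $\mathrm{supp}(g) \subseteq (-a, a)$. Then, splitting the integral at zero and integrating by parts on each half (the boundary terms vanish because $g(\pm a) = 0$),
$$(-1)^1 \langle f, g' \rangle = -\int_{-a}^{a} f g' \, d\lambda = -\int_{-a}^{0} f g' \, d\lambda - \int_{0}^{a} f g' \, d\lambda = \int_{-a}^{0} x g' \, d\lambda - \int_{0}^{a} x g' \, d\lambda = \int_{-a}^{0} (-1)\, g \, d\lambda + \int_{0}^{a} 1 \cdot g \, d\lambda = \langle D^{(1)} f, g \rangle.$$
Because $g \in C_0^\infty(-1, 1)$ was arbitrary this proves the claim. □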
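The identity in Example 23.15 can also be checked numerically. The following is a minimal sketch, not part of the text's argument: it assumes a shifted smooth bump function as the test function $g$ (the names `bump`, `bump_prime`, `integrate` and `weak_derivative_gap` are illustrative only) and compares $\langle D^{(1)} f, g \rangle$ with $-\langle f, g' \rangle$ using the midpoint rule.

```python
import math

def bump(x, a=0.5):
    """Smooth function supported in [-a, a]: exp(-1/(1-(x/a)^2)) inside, 0 outside."""
    s = (x / a) ** 2
    return math.exp(-1.0 / (1.0 - s)) if s < 1.0 else 0.0

def bump_prime(x, a=0.5):
    """Derivative of bump, obtained from the chain rule."""
    s = (x / a) ** 2
    if s >= 1.0:
        return 0.0
    # d/dx exp(-1/(1-s)) = exp(-1/(1-s)) * (-(1-s)^(-2)) * ds/dx with ds/dx = 2x/a^2
    return bump(x, a) * (-1.0 / (1.0 - s) ** 2) * (2.0 * x / a ** 2)

def integrate(h, lo=-1.0, hi=1.0, n=20000):
    """Composite midpoint rule; n is even, so x = 0 is a cell boundary."""
    dx = (hi - lo) / n
    return sum(h(lo + (k + 0.5) * dx) for k in range(n)) * dx

def weak_derivative_gap(shift=0.2, a=0.5):
    """Return (<w, g>, -<f, g'>) for f = |x|, w = sign-type derivative, g a shifted bump."""
    g = lambda x: bump(x - shift, a)         # supp(g) = [-0.3, 0.7], inside (-1, 1)
    dg = lambda x: bump_prime(x - shift, a)
    w = lambda x: 1.0 if x > 0 else -1.0     # candidate weak derivative of |x|
    lhs = integrate(lambda x: w(x) * g(x))
    rhs = -integrate(lambda x: abs(x) * dg(x))
    return lhs, rhs

lhs, rhs = weak_derivative_gap()
print(lhs, rhs)  # the two numbers should agree up to quadrature error
```

The test function is shifted off center on purpose: for an even $g$ both sides would vanish by symmetry and the check would be uninformative.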
Unfortunately, not all functions in $L^2(\Omega)$ have weak derivatives (see Exercise 23-5 for a specific example). The following results give an exact characterization of the weakly differentiable functions on open subintervals of $\mathbb{R}$. Theorem 23.17 is also a key element of the proof of the beautiful Antiderivative Form of the Fundamental Theorem of Calculus for the Lebesgue integral (see Exercise 23-8). We first need to prove that if two functions have equal weak first partial derivatives, then the functions must differ (a.e.) by a constant.
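Theorem 23.16 Let $\Omega \subseteq \mathbb{R}^d$ be open and connected and let $f, g \in L^p(\Omega)$ be so that all weak derivatives of order 1 of $f$ and $g$ exist. If $D^{(i)} f = D^{(i)} g$ a.e. for all $i \in \{1, \ldots, d\}$, then there is a $c \in \mathbb{R}$ so that $f = g + c$ a.e.

Proof. Let $\varphi \in C_0^\infty(\Omega)$. Then for all $i \in \{1, \ldots, d\}$ we infer
$$\int_\Omega (f - g) \frac{\partial \varphi}{\partial x_i} \, d\lambda = -\int_\Omega \left( D^{(i)} f - D^{(i)} g \right) \varphi \, d\lambda = 0.$$
This condition implies that $f - g$ is equal to a constant almost everywhere. The details are a bit technical and the reader can produce them in Exercise 23-6. ∎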
Theorem 23.17 A function $f \in L^p(a, b)$ is weakly differentiable iff $f$ is equal a.e. to a function $g : (a, b) \to \mathbb{R}$ that is absolutely continuous on every closed subinterval of $(a, b)$ and that satisfies $g' \in L^p(a, b)$. In this case, the weak derivative is a.e. equal to the pointwise derivative.

Proof. For the "$\Leftarrow$" part, we can let $f : (a, b) \to \mathbb{R}$ be absolutely continuous on every closed subinterval of $(a, b)$. By Exercise 23-7b on every closed subinterval of $(a, b)$, $f$ is the difference of two absolutely continuous nondecreasing functions. Therefore, it is enough to prove the result for absolutely continuous nondecreasing $f$. By Exercise 10-7, $f$ is differentiable a.e. on every closed subinterval of $(a, b)$, and
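hence $f$ is differentiable a.e. on $(a, b)$ itself. We first prove the result for an absolutely continuous nondecreasing $f$ so that the sequence of difference quotients $\left\{ \frac{f\left(x + \frac{1}{n}\right) - f(x)}{\frac{1}{n}} \right\}_{n=1}^\infty$ is bounded on every closed subinterval $[c, d] \subseteq (a, b)$. (We assume that we discarded any early terms from the sequence for which $d + \frac{1}{n} \notin (a, b)$ or $c - \frac{1}{n} \notin (a, b)$.) Let $\varphi \in C_0^\infty(a, b)$ and let $[c, d] \subseteq (a, b)$ be a closed interval so that $\mathrm{supp}(\varphi) \subseteq (c, d)$. Because $\varphi \in C_0^\infty(a, b)$, all difference quotients for $\varphi$ are bounded on $[c, d]$ by the maximum of $|\varphi'|$ on $[a, b]$. Thus, because $[c, d]$ is bounded, we can apply the Dominated Convergence Theorem to the integrals below.
$$\int_c^d f' \varphi \, d\lambda = \lim_{n \to \infty} \int_c^d \frac{f\left(x + \frac{1}{n}\right) - f(x)}{\frac{1}{n}} \, \varphi(x) \, d\lambda(x) = \lim_{n \to \infty} \int_c^d f(x) \, \frac{\varphi\left(x - \frac{1}{n}\right) - \varphi(x)}{\frac{1}{n}} \, d\lambda(x) = -\int_c^d f \varphi' \, d\lambda,$$
which proves $D^{(1)} f = f'$ a.e.

In case the sequence of difference quotients is unbounded on $[c, d] \subseteq (a, b)$, for each $m \in \mathbb{N}$ let $B_m$ [...] $> m$ [...]. Because $f$ is differentiable a.e., the set $\bigcap_{m=1}^\infty B_m$ is a null set. For each $m \in \mathbb{N}$, let $\{ I_j^m \}_{j=1}^\infty$ be a countable family of pairwise disjoint open intervals that contains $B_m$ and satisfies $\sum_{j=1}^\infty |I_j^m| \le \lambda(B_m) + \frac{1}{m}$ and $\bigcup_{j=1}^\infty I_j^m \subseteq \bigcup_{j=1}^\infty I_j^{m-1}$ for $m > 1$. For each [...] $f_m$ is nondecreasing, absolutely continuous and $\left\{ \frac{f_m\left(x + \frac{1}{n}\right) - f_m(x)}{\frac{1}{n}} \right\}_{n=1}^\infty$ is bounded. Moreover, because $f$ is absolutely continuous, $\{ f_m \}_{m=1}^\infty$ converges uniformly from below to $f$ and $\{ f_m' \}_{m=1}^\infty$ converges a.e. from below to $f'$. Let $\varphi \in C_0^\infty(a, b)$. By the Dominated Convergence Theorem we conclude
$$-\int_a^b f \varphi' \, d\lambda = -\lim_{m \to \infty} \int_a^b f_m \varphi' \, d\lambda = \lim_{m \to \infty} \int_a^b f_m' \varphi \, d\lambda = \int_a^b f' \varphi \, d\lambda,$$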
which establishes the "$\Leftarrow$" part as well as the claim about the derivatives.

For the "$\Rightarrow$" part, let the function $f : (a, b) \to \mathbb{R}$ be weakly differentiable and let $[c, d] \subset (a, b)$. Consider the function $\tilde{f}(x) := \int_c^x D^{(1)} f(t) \, d\lambda(t)$. Then by Exercise 14-36c, $\tilde{f}$ is absolutely continuous and by the Derivative Form of the Fundamental Theorem of Calculus (see Exercise 18-6) the equality $\frac{d}{dx} \tilde{f}(x) = D^{(1)} f(x)$ holds a.e. on $(c, d)$. But with what we already proved, this means that $\tilde{f}$ is weakly differentiable on $(c, d)$ and $D^{(1)} \tilde{f} = D^{(1)} f$ a.e. on $(c, d)$. By Theorem 23.16, we conclude that $f$ and $\tilde{f}$ differ at most by a constant on $(c, d)$, and hence $f$ is a.e. equal to an absolutely continuous function on $[c, d]$. ∎
Exercises

23-3. Prove Proposition 23.12.

23-4. Prove Proposition 23.13.

23-5. Let $\Omega := (-1, 1) \subseteq \mathbb{R}$ and consider the function $g \in L^2(\Omega)$ defined piecewise by $g(t) = [\ldots]$ for $t > 0$ and $g(t) = [\ldots]$ for $t \le 0$. Prove directly (that is, without using Theorem 23.17) that $g$ does not have a weak derivative.
Hint. Prove that if $w \in L^2(-1, 1)$ was a weak derivative of $g$ it must satisfy $w(x) = 0$ a.e. Then show that for some even functions $e \in C_0^\infty(-1, 1)$ the inner product $\langle e, w \rangle$ is not zero.

23-6. Let $\Omega \subseteq \mathbb{R}^d$ be a connected open set. Use the steps below to prove that if $f \in L^1(\Omega)$ is so that $\int_\Omega f \frac{\partial \varphi}{\partial x_i} \, d\lambda = 0$ for all $\varphi \in C_0^\infty(\Omega)$ and all $i \in \{1, \ldots, d\}$, then there is a $c \in \mathbb{R}$ so that $f = c$ a.e.
We will argue by contradiction, so in the following we assume that $f$ is not constant a.e. Also, we will first establish the result in case $\Omega = \prod_{j=1}^d (a_j, b_j)$ is a bounded open box.

(a) Prove that if $f : \Omega \to \mathbb{R}$ is so that there is no $c \in \mathbb{R}$ with $f = c$ a.e., then there is an $a \in \mathbb{R}$ so that $\lambda(\{ x \in \mathbb{R}^d : f(x) > a \}) > 0$ and $\lambda(\{ x \in \mathbb{R}^d : f(x) < a \}) > 0$.

(b) Let $a$ be as in part 23-6a, let $\nu > 0$ be so that $A := \{ x \in \Omega : f(x) < a - \nu \}$ is not a null set. Prove that there must be a $\delta \in \mathbb{R}$ and an $i \in \{1, \ldots, d\}$ so that $\lambda\big( (\Omega \cap (A + \delta e_i)) \setminus A \big) > 0$.
Hint. Suppose for a contradiction that all these measures are zero. First prove that for all translations $x + S \subseteq \Omega$ of subsets $S$ of $A$ we have $\lambda((x + S) \cap A) = \lambda(x + S)$. Then let $\varepsilon > 0$ and let $B$ be an open box in $\Omega$ so that $\lambda(A \cap B) > (1 - \varepsilon) \lambda(B)$. Prove that then for all translations $x + B$ of $B$ that are contained in $\Omega$ we have $\lambda(A \cap (x + B)) > (1 - \varepsilon) \lambda(x + B)$. Use that $B$, $\varepsilon$ were arbitrary to conclude that $\lambda(A) > (1 - \varepsilon) \lambda(\Omega)$ and then $\lambda(A) = \lambda(\Omega)$, a contradiction.
(c) Prove that there must be a $\delta \in \mathbb{R}$, an $i \in \{1, \ldots, d\}$ and a Lebesgue measurable bounded subset $C$ of $A$ so that $\delta e_i + C \subseteq \Omega$ and $\lambda\big( (\delta e_i + C) \setminus A \big) = \lambda(\delta e_i + C) > 0$.

(d)–(e) [...] defines a function in $C_0^\infty(\Omega)$ with $\frac{\partial \psi}{\partial x_i}(x) = \psi_n(x) - \psi_n(x - \delta e_i)$ [...] $\int_\Omega f \, [\ldots] \, d\lambda \ne 0$. Conclude that there is a function [...] the contradiction for bounded open boxes.

(f) Prove that the result holds for all connected open sets $\Omega$.

23-7. Let $f : [a, b] \to \mathbb{R}$ be an absolutely continuous function.

(a) Prove that $f$ is of bounded variation.

(b) Prove that the two nondecreasing functions in Exercise 8-12a are both absolutely continuous.

(c) Use the above and Exercises 10-7 and 14-38b to conclude that $f$ is differentiable almost everywhere and that the derivative is integrable over $[a, b]$.

23-8. Fundamental Theorem of Calculus, Antiderivative Form. Prove that $f : [a, b] \to \mathbb{R}$ is absolutely continuous iff $f$ is differentiable a.e., $f'$ is Lebesgue integrable and for all $x \in [a, b]$ the equality $f(x) = f(a) + \int_a^x f' \, d\lambda$ holds.
Hint. Use Exercise 14-36c, Exercise 23-7c, Exercise 18-6, Theorem 23.16, and Theorem 23.17.
Note. Unlike for the Riemann integral, the Antiderivative Form of the Fundamental Theorem of Calculus for the Lebesgue integral is a biconditional and it has no artificial-looking hypotheses. In particular, the integrability of the derivative does not need to be demanded explicitly, because it follows from absolute continuity.
23-9. Use the Antiderivative Form of the Fundamental Theorem of Calculus to prove that Lebesgue's singular function is uniformly continuous, but not absolutely continuous.

23-10. Integration by Parts. Let $f, g : [a, b] \to \mathbb{R}$ be absolutely continuous. Prove that the product $fg$ is absolutely continuous and that if $f'g$ and $g'f$ are integrable, then
$$\int_a^b f' g \, d\lambda = fg \Big|_a^b - \int_a^b g' f \, d\lambda.$$
Hint. Use Exercise 23-8.
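23-11. Let $\Omega \subseteq \mathbb{R}^d$ be open. A function $f : \Omega \to \mathbb{R}$ is called absolutely continuous iff for every $\varepsilon > 0$ there is a $\delta > 0$ so that for all $x_1, \ldots, x_n, z_1, \ldots, z_n \in \Omega$ with $\sum_{j=1}^n \| z_j - x_j \| < \delta$ we have $\sum_{j=1}^n | f(z_j) - f(x_j) | < \varepsilon$. [...]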
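1. $D$ is called elliptic iff $A$ is positive definite at every $x \in \Omega$, that is, for all $x \in \Omega$ and all $z \in \mathbb{R}^d \setminus \{0\}$ the inequality $\langle A(x) z, z \rangle > 0$ holds.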
2. $D$ is called parabolic iff $A$ is positive semidefinite at every $x \in \Omega$, that is, for all $x \in \Omega$ and all $z \in \mathbb{R}^d \setminus \{0\}$ the inequality $\langle A(x) z, z \rangle \ge 0$ holds.

3. $D$ is called hyperbolic iff $A$ is indefinite at every $x \in \Omega$, that is, for all $x \in \Omega$ there are $z, z' \in \mathbb{R}^d \setminus \{0\}$ so that $\langle A(x) z, z \rangle > 0$ and $\langle A(x) z', z' \rangle < 0$.

The naming convention is inspired by geometry. Positive definite $2 \times 2$ matrices $A$ define ellipses via $x^T A x = 1$. In the same fashion, indefinite matrices define hyperbolas and in similar fashion positive semidefinite matrices can be used to define parabolas.
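For a constant symmetric $2 \times 2$ coefficient matrix, the three conditions can be tested through the eigenvalues of $A$. The sketch below is only an illustration under that assumption (constant coefficients, $d = 2$); the function names and the tolerance `tol` are hypothetical choices, not notation from the text. Because every positive definite matrix is also positive semidefinite, the sketch reports "elliptic" first and uses "parabolic" only for the semidefinite, non-definite case.

```python
import math

def eigenvalues_sym2(a11, a12, a22):
    """Eigenvalues (smaller, larger) of the symmetric matrix [[a11, a12], [a12, a22]]."""
    mean = 0.5 * (a11 + a22)
    radius = math.hypot(0.5 * (a11 - a22), a12)
    return mean - radius, mean + radius

def classify(a11, a12, a22, tol=1e-12):
    """Classify the second order part by the definiteness of A."""
    lo, hi = eigenvalues_sym2(a11, a12, a22)
    if lo > tol:
        return "elliptic"       # <Az, z> > 0 for all z != 0
    if lo >= -tol:
        return "parabolic"      # <Az, z> >= 0, with equality for some z != 0
    if hi > tol:
        return "hyperbolic"     # <Az, z> takes both signs
    return "none of the three"  # e.g. negative definite A

print(classify(1.0, 0.0, 1.0))   # identity matrix
print(classify(1.0, 0.0, 0.0))   # one zero eigenvalue
print(classify(1.0, 0.0, -1.0))  # eigenvalues of both signs
```

The geometric remark above corresponds to the level sets $x^T A x = 1$ of these three sample matrices: a circle, a pair of parallel lines (a degenerate case), and a hyperbola.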
Example 23.29 Some examples of elliptic and nonelliptic differential operators.

1. The negative Laplace operator $-\Delta = -\sum_{j=1}^d \frac{\partial^2}{\partial x_j^2}$ is elliptic.
23.4. Elliptic Differential Operators
2. The operator $-k\Delta + \frac{\partial}{\partial t} = -k \sum_{j=1}^d \frac{\partial^2}{\partial x_j^2} + \frac{\partial}{\partial t}$ is parabolic. (The matrix of the $a_{ij}$ has zeroes in the row and the column corresponding to $t$.)
It would be nice if elliptic differential operators $D$ would induce elliptic bilinear forms $\langle D(\cdot), \cdot \rangle$. Unfortunately this is not the case (also see Exercise 23-26). To make some elliptic differential equations accessible to the ideas of Section 23.1, we need to formulate our bilinear form a bit differently, we need to use a stronger property than ellipticity for the operator, and we need to explicitly use that the boundary values are zero. First we rewrite the differential operator $D$.

Proposition 23.30 Let $Du(x) = -\sum_{i,j=1}^d a_{ij}(x) \frac{\partial^2 u}{\partial x_i \partial x_j}(x) + \sum_{k=1}^d b_k(x) \frac{\partial u}{\partial x_k}(x) + c(x) u(x)$ be a differential operator as in Definition 23.28. Then, with the scalar product of $a, b \in \mathbb{R}^d$ denoted by $a \cdot b$, $D$ can be rewritten as
$$Du = -\sum_{i,j=1}^d \frac{\partial}{\partial x_i} \left( a_{ij} \frac{\partial u}{\partial x_j} \right) + \sum_{j=1}^d \left( b_j + \sum_{i=1}^d \frac{\partial a_{ij}}{\partial x_i} \right) \frac{\partial u}{\partial x_j} + cu = -\nabla(A \nabla u) + \tilde{b} \cdot \nabla u + cu,$$
where $\tilde{b}_j := b_j + \sum_{i=1}^d \frac{\partial a_{ij}}{\partial x_i}$.

Proof. Exercise 23-27. ∎

For operators as in Proposition 23.30, which satisfy $\tilde{b} = 0$, we can now rewrite the bilinear form $\langle D(\cdot), \cdot \rangle$ as a more symmetric entity. This will allow us to associate with the partial differential equation $Du = f$ a system of equations (H-PDE) for which the bilinear form is elliptic. The rewriting in Proposition 23.31 explains why we wanted a negative sign in front of the second derivatives. The results we obtain will be practically useful, because if $D = -\Delta$, then clearly $\tilde{b} = 0$.
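Proposition 23.31 Let $\Omega \subseteq \mathbb{R}^d$ be a Trace Theorem Domain. Then for all functions $u, v \in C_0^\infty(\Omega)$ we have
$$\int_\Omega \big( -\nabla(A \nabla u) + cu \big) v \, dV = \int_\Omega (A \nabla u) \cdot \nabla v \, dV + \int_\Omega cuv \, dV.$$

Proof. Because the second term is unaffected, we can concentrate on the first term. By the Divergence Theorem and Exercise 21-6e, we obtain
$$\int_\Omega (A \nabla u) \cdot \nabla v \, dV + \int_\Omega \nabla(A \nabla u)\, v \, dV = \int_\Omega \nabla\big( (A \nabla u) v \big) \, dV = \int_{\delta\Omega} (A \nabla u) v \cdot dS$$
and the latter term is zero because $u$ and $v$ are zero on the boundary of $\Omega$. Hence,
$$\int_\Omega -\nabla(A \nabla u)\, v \, dV = \int_\Omega (A \nabla u) \cdot \nabla v \, dV.$$
∎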
We can now summarize the results of the first two sections of this chapter in more concrete terms for partial differential equations.
Definition 23.32 Let $\Omega \subseteq \mathbb{R}^d$ be a Trace Theorem Domain, let $Du := -\nabla(A \nabla u) + cu$ and let $f \in H_0^1(\Omega)$. Then $u \in H_0^1(\Omega)$ is called a weak solution of the equation $Du = f$ iff $u$ solves the system of equations (H-PDE) with the bilinear form $B$ defined by $B(u, v) := \int_\Omega (A \nabla u) \cdot \nabla v + cuv \, dV$ and with $F(v) = \int_\Omega f v \, dV$. The equation
$$\int_\Omega (A \nabla u) \cdot \nabla v + cuv \, dV = \int_\Omega f v \, dV \quad \text{for all } v \in H_0^1(\Omega)$$
is also called the weak variational formulation of the equation $-\nabla(A \nabla u) + cu = f$.
Because $C_0^\infty(\Omega)$ is dense in $H_0^1(\Omega)$ (Exercise 23-25), Proposition 23.31 shows that if $u \in H_0^1(\Omega)$ solves the equation in the regular sense, then it will also be a weak solution. By Theorem 23.22, the bilinear function in Definition 23.32 is continuous on the space $H_0^1(\Omega)$. To prove that the bilinear function $B$ is elliptic, we need the differential operator to satisfy the property below.

Definition 23.33 An elliptic differential operator is called uniformly elliptic iff there is a constant $C_A > 0$ so that for all $x \in \Omega$ and all $z \in \mathbb{R}^d \setminus \{0\}$ the inequality $\langle A(x) z, z \rangle \ge C_A \|z\|^2$ holds.

To prove that uniformly elliptic operators induce elliptic bilinear functions, we proceed as follows.

Theorem 23.34 Poincaré-Friedrichs inequality. Let $\Omega \subseteq \mathbb{R}^d$ be a Trace Theorem Domain that is contained in a cube of side length $C > 0$. Then for all $u \in H_0^1(\Omega)$ the [...]

Proof. Because $C_0^\infty(\Omega)$ is dense in $H_0^1(\Omega)$ (see Exercise 23-25) it is enough to prove the inequality for functions in $C_0^\infty(\Omega)$. Moreover, without loss of generality we can assume that $\Omega \subseteq [0, C]^d$. Let $u \in C_0^\infty(\Omega)$ and set $u(x) := 0$ for all $x \in [0, C]^d \setminus \Omega$. Then for all $(x_1, \ldots, x_d) \in \Omega$ we infer [...]
Theorem 23.35 Let $\Omega \subseteq \mathbb{R}^d$ be a bounded Trace Theorem Domain, let the differential operator $Du := -\nabla(A \nabla u) + cu$ be uniformly elliptic and let $f \in H_0^1(\Omega)$. Then the equation $Du = f$ has a unique weak solution $u$. Moreover, if $\{ V_n \}_{n=1}^\infty$ is a sequence of subspaces so that $\lim_{n \to \infty} \mathrm{dist}(w, V_n) = 0$ for all $w \in H_0^1(\Omega)$, then with $u_{V_n}$ being the solution of the system of equations ($V_n$-PDE) we have $\lim_{n \to \infty} \| u - u_{V_n} \| = 0$.

Proof. By Theorem 23.22, the bilinear form $B(u, v) := \int_\Omega (A \nabla u) \cdot \nabla v + cuv \, dV$ from Definition 23.32 is continuous. Moreover, with $C$ being the side length of a cube that contains $\Omega$, by the Poincaré-Friedrichs inequality we obtain the following for all $u \in H_0^1(\Omega)$.

[...]

Therefore $B$ is elliptic and the result follows from the Lax-Milgram Lemma and Céa's Lemma. ∎

Theorem 23.35 and part 1 of Example 23.29 tell us that the Poisson equation can be solved with the finite element method. That is, the potentials of static electrical fields and the temperature/density distributions of the steady state of heat/diffusion phenomena can be approximated by solving large systems of linear equations. Unfortunately, parts 2 and 3 of Example 23.29 show that we cannot directly apply the results
developed so far to the heat and wave equations. There are ways to apply the finite element method to parabolic and hyperbolic equations. For our introduction, we shall be satisfied having proved that the method can be applied to certain elliptic equations, including the Laplace and Poisson equations.
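To make the reduction to a linear system concrete, here is a minimal one-dimensional sketch. It assumes the model problem $-u'' = f$ on $(0, 1)$ with zero boundary values (so $A = 1$, $c = 0$ in the bilinear form $B(u, v) = \int u' v' \, dV$), piecewise linear "hat" functions on a uniform mesh as the subspace, and a simple one-point quadrature $h f(x_i)$ for the right side. The names `solve_poisson_1d` and `max_nodal_error` are illustrative, not taken from the text.

```python
import math

def solve_poisson_1d(f, n):
    """Ritz-Galerkin approximation of -u'' = f on (0, 1), u(0) = u(1) = 0,
    with n >= 3 linear elements on a uniform mesh. Returns the nodal values."""
    h = 1.0 / n
    m = n - 1                                      # interior nodes = dimension of the subspace
    diag = [2.0 / h] * m                           # B(phi_i, phi_i)     =  2/h
    off = [-1.0 / h] * (m - 1)                     # B(phi_i, phi_{i+1}) = -1/h
    rhs = [h * f((i + 1) * h) for i in range(m)]   # approximates F(phi_i)
    # Thomas algorithm for the symmetric tridiagonal system.
    for i in range(1, m):
        factor = off[i - 1] / diag[i - 1]
        diag[i] -= factor * off[i - 1]
        rhs[i] -= factor * rhs[i - 1]
    u = [0.0] * m
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        u[i] = (rhs[i] - off[i] * u[i + 1]) / diag[i]
    return [0.0] + u + [0.0]

def max_nodal_error(n):
    """Largest nodal error for f = pi^2 sin(pi x), whose exact solution is sin(pi x)."""
    f = lambda x: math.pi ** 2 * math.sin(math.pi * x)
    u = solve_poisson_1d(f, n)
    return max(abs(u[i] - math.sin(math.pi * i / n)) for i in range(n + 1))

for n in (8, 16, 32):
    print(n, max_nodal_error(n))  # the error should shrink as the mesh is refined
```

The system matrix has only three nonzero diagonals because each hat function overlaps only its two neighbors; this sparsity is what keeps the large linear systems mentioned above tractable.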
Exercises

23-26. Consider the differential operator $D := -\frac{d^2}{dt^2} + \frac{d}{dt}$ on $H^2(-1, 1)$. Prove that $D$ is uniformly elliptic, but that the bilinear form $\langle D(\cdot), \cdot \rangle$ is not elliptic. Then explain why this is not a contradiction to what was done in the proof of Theorem 23.35.

23-27. Prove Proposition 23.30.

23-28. Prove that there is no reversal of the Poincaré-Friedrichs inequality, that is, prove that there is no $c > 0$ so that for all $u \in H_0^1(\Omega)$ we have $\| u \|_{H_0^1(\Omega)} \le c \, \| u \|_{L^2(\Omega)}$.
Hint. Consider nonnegative functions in $C_0^\infty(-1, 1)$ whose maximum value is 1 and whose $L^2$ norms go to zero.
23.5 Finite Elements

So far, we have established that an elliptic partial differential equation $Du = f$ with uniformly elliptic left side of the form $Du := -\nabla(A \nabla u) + cu$ has a unique weak solution. Moreover, with the right sequence $\{ V_n \}_{n=1}^\infty$ of finite dimensional subspaces, the weak solution can be approximated with the solutions of the corresponding systems of equations ($V_n$-PDE). This simplification is significant, because the infinite dimensional problem of solving the partial differential equation is now reduced to solving the finite dimensional systems of linear equations given by ($V_n$-PDE). From a theoretical point of view, all we need are the right spaces $V_n$ and we will get an approximation of any given quality. From a practical point of view, we therefore need to address how to construct such spaces. We will build the approximation of the solution on small subsets of $\Omega$. These subsets and the functions on them are called finite elements.
Definition 23.36 A finite element in $\mathbb{R}^d$ is a triple $(\tau, P_\tau, \Sigma_\tau)$ such that the following hold.

1. $\tau \subseteq \mathbb{R}^d$ is compact and $\tau^\circ \ne \emptyset$ is a connected Trace Theorem Domain.

2. $P_\tau$ is a finite dimensional subspace of $C^\infty(\tau)$.

3. With $n := \dim(P_\tau)$ the set $\Sigma_\tau$ consists of linearly independent continuous linear functionals $B_1, \ldots, B_n : C^\infty(\tau) \to \mathbb{R}$ so that for all $a_1, \ldots, a_n \in \mathbb{R}$ there is a $p \in P_\tau$ so that $B_i(p) = a_i$.

The functionals $B_i$ are called the degrees of freedom of the finite element. Functions $p_1, \ldots, p_n$ so that $B_i(p_j) = 1$ if $i = j$ and $B_i(p_j) = 0$ if $i \ne j$ are called the base functions of the finite element. Finite elements are also often denoted by $\tau$ only.
Figure 62: Left to right. A 2-simplex with the evaluation points and a base function for linear Lagrangian finite elements (dotted), a 2-simplex with the evaluation points and a base function for quadratic Lagrangian finite elements (dotted), a 3-simplex with the evaluation points for linear Lagrangian finite elements, and a 3-simplex with the evaluation points for quadratic Lagrangian finite elements.

Basically, the definition of a finite element provides a set $\tau$ on which we can build an approximation to the solution, and a space of functions $P_\tau$ with which to build the approximation. The demand that $P_\tau$ is finite dimensional assures that our space is not too large. The degrees of freedom in $\Sigma_\tau$ assure that the space is large enough to reach a certain set of functions. We will see below that the degrees of freedom also are used to merge the pieces into a function on all of $\overline{\Omega}$.
Example 23.37 Some simple finite elements. Let $P^\nu(\Omega)$ be the space of polynomials $p : \Omega \to \mathbb{R}$ of degree at most $\nu$. For points $a_1, \ldots, a_{d+1} \in \mathbb{R}^d$, so that the set $\{ a_1 - a_{d+1}, \ldots, a_d - a_{d+1} \}$ is linearly independent, define the $d$-simplex spanned by $a_1, \ldots, a_{d+1}$ to be
$$S := \left\{ \sum_{j=1}^{d+1} \lambda_j a_j : \lambda_1, \ldots, \lambda_{d+1} \in [0, 1], \ \sum_{j=1}^{d+1} \lambda_j = 1 \right\}.$$
Geometrically, the linear independence of $\{ a_1 - a_{d+1}, \ldots, a_d - a_{d+1} \}$ assures that the points $a_1, \ldots, a_{d+1}$, also called the vertices of $S$, are not all in the same hyperplane of $\mathbb{R}^d$. Figure 62 shows some simplices. Some properties of simplices are highlighted in Exercise 23-29.
1. Linear Lagrangian finite elements in $\mathbb{R}^d$. For a simplex $\tau$ in $\mathbb{R}^d$ with vertices $a_1, \ldots, a_{d+1}$, let $P_\tau := P^1(\tau)$. Then $\dim(P_\tau) = d + 1$. For the degrees of freedom, we choose $B_j(p) := p(a_j)$ for $j = 1, \ldots, d + 1$. For the base functions, recall that $\{ a_1 - a_{d+1}, \ldots, a_d - a_{d+1} \}$ was a base. Therefore, for each $y \in \mathbb{R}^d$ there are unique $y_1, \ldots, y_d$ so that $y = \sum_{j=1}^d y_j (a_j - a_{d+1})$. For $j = 1, \ldots, d$ we define $p_j(x) := (x - a_{d+1})_j$ (the $j$th coordinate of $x - a_{d+1}$ with respect to the aforementioned base) and for $j = d + 1$ we define $p_{d+1}(x) := 1 - \sum_{j=1}^d p_j(x)$. Note that a simplex has exactly the right number of vertices to define polynomials of degree 1 by specifying the values of the polynomial at the vertices.
2. Quadratic Lagrangian finite elements in $\mathbb{R}^d$. For a simplex $\tau$ in $\mathbb{R}^d$ with vertices $a_1, \ldots, a_{d+1}$, let $P_\tau := P^2(\tau)$. Then, counting second order terms first and taking symmetry into account, $\dim(P_\tau) = \frac{d}{2}(d + 1) + d + 1 = \frac{(d+1)(d+2)}{2}$. The simplex $\tau$ has $d + 1$ vertices. For $k = 1, \ldots, d$ and each vertex $a_{k+1}$, there are $k$ line segments connecting $a_{k+1}$ to $a_1, \ldots, a_k$. The vertices of $\tau$ and the centers of these segments give us $(d + 1) + \sum_{k=1}^d k = \frac{(d+1)(d+2)}{2}$ points $a_1, \ldots, a_{\frac{(d+1)(d+2)}{2}}$. We choose $B_j(p) := p(a_j)$ as the degrees of freedom. The coefficients of the base functions $p_1, \ldots, p_{\frac{(d+1)(d+2)}{2}}$ are obtained by solving the system of equations that is implicitly given in the definition of finite elements for the coefficients of each $p_j$. Exercise 23-30 gives an impression of the computations.
3. Cubic Hermitian finite elements in $\mathbb{R}^d$. For a simplex $\tau$ in $\mathbb{R}^d$ with vertices $a_1, \ldots, a_{d+1}$, let $P_\tau := P^3(\tau)$. Then, counting first the third powers of the coordinates, then cubic terms with all factors distinct and then the remaining terms, taking symmetry into account for the cubic summands,
$$\dim(P_\tau) = d + \binom{d}{3} + d(d - 1) + \frac{(d+1)(d+2)}{2} = \frac{(d+1)(d+2)(d+3)}{6}.$$
Regarding finite elements, we only consider two dimensions. For $d = 2$, we obtain $\dim(P_\tau) = 10$. Regarding the degrees of freedom note that evaluation of the polynomial at the vertices and the center plus evaluation of the partial derivatives $\frac{\partial p}{\partial x}$ and $\frac{\partial p}{\partial y}$ at the vertices yields 10 equations for the coefficients of the polynomial. These equations can be used to determine the base functions, so the above mentioned evaluations can be chosen as the degrees of freedom. □
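The construction of the linear Lagrangian base functions in part 1 can be carried out explicitly for $d = 2$. The following is a small sketch under that assumption: the coordinates of $x - a_3$ with respect to the base $\{a_1 - a_3, a_2 - a_3\}$ are computed with Cramer's rule, and the defining property $B_i(p_j) = p_j(a_i) = \delta_{ij}$ is checked at the vertices. The function name `linear_base_functions` and the sample triangle are illustrative choices.

```python
def linear_base_functions(a1, a2, a3):
    """Base functions p1, p2, p3 of the linear Lagrangian element on the
    triangle with vertices a1, a2, a3 (each a pair of floats)."""
    e1 = (a1[0] - a3[0], a1[1] - a3[1])   # a1 - a3
    e2 = (a2[0] - a3[0], a2[1] - a3[1])   # a2 - a3
    det = e1[0] * e2[1] - e2[0] * e1[1]
    if det == 0:
        raise ValueError("vertices are collinear, so they do not span a 2-simplex")

    def coords(x):
        # Solve x - a3 = y1 * e1 + y2 * e2 by Cramer's rule.
        y = (x[0] - a3[0], x[1] - a3[1])
        y1 = (y[0] * e2[1] - e2[0] * y[1]) / det
        y2 = (e1[0] * y[1] - y[0] * e1[1]) / det
        return y1, y2

    p1 = lambda x: coords(x)[0]
    p2 = lambda x: coords(x)[1]
    p3 = lambda x: 1.0 - coords(x)[0] - coords(x)[1]
    return [p1, p2, p3]

vertices = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
base = linear_base_functions(*vertices)
for j, p in enumerate(base):
    # Row j lists p_j at the three vertices; it should be the j-th unit vector.
    print([round(p(a), 12) for a in vertices])
```

In this form the three functions are the barycentric coordinates of $x$ in the triangle, which is why they sum to 1 at every point.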
Definition 23.38 A finite element is called Lagrangian iff the degrees of freedom consist of evaluation operators that evaluate the function at points. A finite element is called Hermitian iff the degrees of freedom consist of evaluation operators that evaluate the function and its directional derivatives.

With finite elements available to locally approximate the solution, we need to determine how to approximate the solution overall. This is done by partitioning $\Omega$ into subsets on which we have finite elements.

Definition 23.39 Let $\Omega \subseteq \mathbb{R}^d$ be a bounded set. A set $T$ of subsets of $\overline{\Omega}$ is called a triangulation (also see Figure 63) of $\Omega$ iff the following hold.

1. All sets $\tau \in T$ are closed and each $\tau^\circ$ is a nonempty Trace Theorem Domain.

2. $\overline{\Omega} = \bigcup_{\tau \in T} \tau$.

3. For all distinct $\tau_1, \tau_2 \in T$ we have $\tau_1^\circ \cap \tau_2^\circ = \emptyset$.

Moreover, similar to partitions of intervals, we define $\| T \| := \sup \{ \mathrm{diam}(\tau) : \tau \in T \}$ and we say that the triangulation $R$ is a refinement of the triangulation $T$ iff each element of $R$ is contained in an element of $T$ and each element of $T$ is the union of finitely many elements of $R$.

Figure 63: An admissible triangulation with nodes for linear Lagrangian finite elements marked, a refinement of the admissible triangulation with new nodes marked with unfilled circles and an inadmissible triangulation with the "hanging node" marked.
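The quantity $\| T \|$ and the notion of refinement are easy to experiment with for triangles in the plane. The sketch below assumes that each element is given by its three vertices and uses the fact that the diameter of a simplex is the largest distance between two of its vertices; the refinement rule shown (joining the edge midpoints) is one common choice and is not prescribed by the text. All function names are illustrative.

```python
import math
from itertools import combinations

def diam(simplex):
    """Diameter of a simplex given by its vertices: the largest vertex distance."""
    return max(math.dist(p, q) for p, q in combinations(simplex, 2))

def mesh_size(triangulation):
    """||T|| = largest element diameter of a finite triangulation."""
    return max(diam(s) for s in triangulation)

def refine_triangle(tri):
    """Split a triangle into four triangles by joining its edge midpoints."""
    a, b, c = tri
    mid = lambda p, q: tuple(0.5 * (s + t) for s, t in zip(p, q))
    ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]

# The unit square split along a diagonal, then refined once.
T = [((0.0, 0.0), (1.0, 0.0), (1.0, 1.0)), ((0.0, 0.0), (1.0, 1.0), (0.0, 1.0))]
R = [small for tri in T for small in refine_triangle(tri)]
print(len(T), mesh_size(T))
print(len(R), mesh_size(R))
```

Each midpoint refinement halves every element diameter, so repeating it produces triangulations $T_n$ with $\| T_n \| \to 0$ in which every $T_{n+1}$ refines $T_n$.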
To construct functions on $\Omega$ from finite elements, we must merge the functions on different finite elements so that the resulting function is at least in $H^1(\Omega)$. For the remainder of this chapter, we will focus on Lagrangian finite elements.

Definition 23.40 Let $\Omega \subseteq \mathbb{R}^d$ be an open, connected polyhedron and let $T$ be a triangulation into Lagrangian finite elements using simplices. (Formally, $\Omega$ is triangulated by sets that are themselves parts of Lagrangian finite elements, but this is quite cumbersome to state. So we assume that "triangulate into finite elements" says just that.) Then $T$ is called an admissible triangulation of $\Omega$ iff every face of a $\tau_1 \in T$ is also a face of exactly one $\tau_2 \in T$ or it is a part of $\delta\Omega$. Two elements of $T$ that share a face are also called neighbors of each other. The set of points where the degrees of freedom of each $\tau \in T$ are evaluated is called the set of nodes of the elements of $T$.

The finite element space is now the space $\prod_{\tau \in T} P_\tau$ of $|T|$-tuples of $P_\tau$ functions on the finite elements so that any two functions that share a node agree at their common nodes. These $|T|$-tuples need not turn into functions, because two functions in $P_{\tau_1}$ and $P_{\tau_2}$ may be equal at their common nodes and still different somewhere else on the shared part $\delta\tau_1 \cap \delta\tau_2$ of their boundary.
Definition 23.41 Let $\Omega \subseteq \mathbb{R}^d$ be an open, connected polyhedron and let $T$ be a triangulation into Lagrangian finite elements. Let $N$ be the set of nodes of the elements of $T$. For each $b \in N$, let $T(b)$ be the set of all finite elements $\tau \in T$ so that $b$ is a node of $\tau$. For any node $b$ of $\tau$, let $B_{b,\tau}$ be the degree of freedom that evaluates each $P_\tau$-function at $b$. The finite element space $X$ is defined to be
$$X := \left\{ (u_\tau)_{\tau \in T} \in \prod_{\tau \in T} P_\tau : \big( \forall b \in N : \forall \tau_1, \tau_2 \in T(b) : B_{b,\tau_1}(u_{\tau_1}) = B_{b,\tau_2}(u_{\tau_2}) \big) \right\}.$$
If for all $u \in X$ and for all neighboring $\tau_1, \tau_2$ we have $u_{\tau_1}|_{\delta\tau_1 \cap \delta\tau_2} = u_{\tau_2}|_{\delta\tau_1 \cap \delta\tau_2}$, then $u$ can be considered to be a function on $\overline{\Omega}$ and we also write
$$X = \left\{ u : \overline{\Omega} \to \mathbb{R} : \big( \forall \tau \in T : u|_\tau \in P_\tau \text{ and } \forall b \in N : \forall \tau_1, \tau_2 \in T(b) : B_{b,\tau_1}(u|_{\tau_1}) = B_{b,\tau_2}(u|_{\tau_2}) \big) \right\}.$$
Clearly, every finite element space is finite dimensional.
Example 23.42 For the linear and quadratic Lagrangian finite elements introduced in Example 23.37, equality of the elements at the nodes implies equality of the elements on the boundaries of the simplices. Therefore the finite element spaces associated with linear and quadratic Lagrangian finite elements are subspaces of $C^0(\Omega)$.

Finally, Proposition 23.44 below shows that if the triangulation and the finite elements are chosen appropriately, the associated finite element space is a finite dimensional subspace of a Sobolev space. Moreover, Theorem 23.45 shows that if we choose an appropriate sequence of such spaces, then the Ritz-Galerkin approximations of the solution of $Du = f$ converge to the actual (weak) solution.
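Lemma 23.43 Green's Theorem for $H^1$ functions. Let $\Omega \subseteq \mathbb{R}^d$ be a Trace Theorem Domain. Then
$$\int_\Omega \frac{\partial u}{\partial x_j} v \, d\lambda = -\int_\Omega u \frac{\partial v}{\partial x_j} \, d\lambda + \int_{\delta\Omega} u v e_j \cdot dS \quad \text{for all } u, v \in H^1(\Omega),$$
where formally the values of $u$ and $v$ on the boundary are given by $\gamma(u)$ and $\gamma(v)$ with $\gamma$ as in Theorem 23.25 and the partial derivatives are weak partial derivatives.

Proof. For all $u, v \in C^1(\overline{\Omega})$ we obtain via the Divergence Theorem
$$\int_\Omega \frac{\partial u}{\partial x_j} v \, d\lambda + \int_\Omega u \frac{\partial v}{\partial x_j} \, d\lambda = \int_\Omega \nabla(u v e_j) \, d\lambda = \int_{\delta\Omega} u v e_j \cdot dS.$$
Now let $u, v \in H^1(\Omega)$. Because $C^1(\overline{\Omega})$ is dense in $H^1(\Omega)$, we can choose sequences $\{ u_n \}_{n=1}^\infty$ and $\{ v_n \}_{n=1}^\infty$ in $C^1(\overline{\Omega})$ with $\| u - u_n \|_{H^1} \to 0$ and $\| v - v_n \|_{H^1} \to 0$ as $n \to \infty$. Because [...] $d\lambda$. Similar limiting statements hold for [...] surface integral). Therefore the claimed equality holds for all $u, v \in H^1(\Omega)$. ∎

For the remainder of this section, we will work with compact polyhedra. Note that by Exercises 23-21 and 23-29d (the interiors of) compact polyhedra are Trace Theorem Domains.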
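Proposition 23.44 Let $\Omega \subseteq \mathbb{R}^d$ be an open, connected polyhedron, let $T$ be a triangulation into Lagrangian finite elements and let $X$ be the associated finite element space. If $P_\tau \subseteq H^1(\tau^\circ)$ for all $\tau \in T$ and $X \subseteq C^0(\overline{\Omega})$, then $X \subseteq H^1(\Omega)$.

Proof. Let $f \in X$, let $i \in \{1, \ldots, d\}$ and for $\tau \in T$, let $D_\tau^{(i)} f$ be the weak partial derivative of $f|_{\tau^\circ}$ in the direction of $e_i$. We claim that $D^{(i)} f := \sum_{\tau \in T} D_\tau^{(i)} f$ (with the $D_\tau^{(i)} f$ being zero outside $\tau^\circ$) is the weak partial derivative of $f$ in the direction of $e_i$. To prove this claim, let $g \in C_0^\infty(\Omega)$. By Lemma 23.43, we obtain the following.
$$\int_\Omega \big( D^{(i)} f \big) g \, d\lambda = \sum_{\tau \in T} \int_\tau \big( D_\tau^{(i)} f \big) g \, d\lambda = \sum_{\tau \in T} \left( -\int_\tau f \frac{\partial g}{\partial x_i} \, d\lambda + \int_{\delta\tau} f g e_i \cdot dS \right) = -\int_\Omega f \frac{\partial g}{\partial x_i} \, d\lambda,$$
where the sum of the boundary terms vanishes because $g$ is zero on $\delta\Omega$ and all interior boundary terms occur exactly twice and with opposite signs. Because the function $g \in C_0^\infty(\Omega)$ was arbitrary, $D^{(i)} f$ is the weak $i$th partial derivative of $f$. Because $f|_{\tau^\circ} \in H^1(\tau^\circ)$, for each $\tau \in T$ we have that $D_\tau^{(i)} f \in L^2(\Omega)$. Thus $D^{(i)} f \in L^2(\Omega)$. Because $i \in \{1, \ldots, d\}$ was arbitrary, all weak first partial derivatives of $f$ exist and are in $L^2(\Omega)$. Because $f \in L^2(\Omega)$, too, we infer that $f \in H^1(\Omega)$. Because $f \in X$ was arbitrary this establishes the claim. ∎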
Theorem 23.45 Let $\Omega \subseteq \mathbb{R}^d$ be the interior of a compact connected polyhedron, let $Du := -\nabla(A \nabla u) + cu$ be a uniformly elliptic differential operator, let $f \in H^1(\Omega)$ and let $\{ S_n \}_{n=1}^\infty$ be a sequence of finite element subspaces of $H^1(\Omega)$ associated with triangulations $T_n$ of $\Omega$ into linear or quadratic Lagrangian finite elements on simplices so that all elements of $S_n$ are in $H_0^1(\Omega)$, so that $\lim_{n \to \infty} \| T_n \| = 0$ and so that $T_{n+1}$ refines $T_n$. Then the unique weak solution of the equation $Du = f$ in $H_0^1(\Omega)$ is the $H_0^1$-limit of the solutions of the problems ($S_n$-PDE).

Proof. Because each $T_{n+1}$ refines $T_n$, the containment $S_n \subseteq S_{n+1}$ holds. Hence, it is enough to prove that $\bigcup_{n=1}^\infty S_n$ is dense in $H_0^1(\Omega)$. To prove this claim, it is enough to prove that for every function $f \in C_0^\infty(\Omega)$ there is a sequence $\{ f_n \}_{n=1}^\infty$ with $f_n \in S_n$ so that $\lim_{n \to \infty} \| f - f_n \|_{H_0^1} = 0$. So let $f \in C_0^\infty(\Omega)$. For each $n \in \mathbb{N}$, let $f_n \in S_n$ be the unique function so that for all nodes $b$ of $T_n$ we have $f_n(b) = f(b)$. Then, because $f$ is infinitely differentiable, the sequence $\{ f_n \}_{n=1}^\infty$ converges uniformly to $f$ and all partial derivatives converge uniformly where they are defined (which is everywhere outside a null set). Because the domain of $f$ is bounded, this means that $\{ f_n \}_{n=1}^\infty$ converges to $f$ in $H_0^1(\Omega)$. ∎
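The interpolation step in the proof can be observed numerically in one dimension. The sketch below is an illustration under simplifying assumptions, not the general argument: the domain is $(0, 1)$, the meshes are uniform, the elements are linear, and the sample function is $\sin(\pi x)$, which vanishes at the end points but is not compactly supported. It approximates the $L^2$ norm of $f' - f_n'$, where $f_n$ is the nodal interpolant, with a midpoint quadrature on each element. The function name and the parameter `quad_points` are hypothetical.

```python
import math

def h1_seminorm_interp_error(f, df, n, quad_points=8):
    """Approximate || f' - f_n' ||_{L^2(0,1)}, where f_n is the piecewise linear
    function that agrees with f at the nodes i/n, i = 0, ..., n."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        left = i * h
        slope = (f(left + h) - f(left)) / h   # f_n' is constant on each element
        for k in range(quad_points):
            x = left + (k + 0.5) * h / quad_points
            total += (df(x) - slope) ** 2 * h / quad_points
    return math.sqrt(total)

f = lambda x: math.sin(math.pi * x)
df = lambda x: math.pi * math.cos(math.pi * x)
for n in (8, 16, 32):
    print(n, h1_seminorm_interp_error(f, df, n))  # roughly halves with each refinement
```

The observed halving of the error when the mesh size is halved is the kind of quantitative information that the convergence proof above does not provide.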
Theorem 23.45 establishes that the (weak) solutions of certain elliptic partial differential equations, including the Laplace and Poisson equations, can be approximated with the finite element method. While this is theoretically satisfying, it is still not enough for practitioners. Just as it was stated at the beginning of Chapter 13, in numerical analysis it is important to know how fast convergence happens. In this regard, Theorem 23.45 falls short, because it does not say anything about how close we can get to the solution of a given problem in the subspaces we mention. Moreover, the finite element method is computationally intensive, because large systems of linear equations need to be solved. The size of the systems is proportional to the number of elements and the constant of proportionality involves the degree of the elements. Therefore, the practical application of the finite element method involves many steps, some of which are outlined below.

• To start a finite element approach to a problem, we must obtain a variational formulation (compare with Definition 23.32).

• If the domain is not a polyhedron, then a polyhedron (or another domain that is accessible with the finite element method) must be found so that the solution on the approximate domain is close to the solution on the actual domain.

• The finite elements need to be chosen so that the resulting (large) systems of equations are well-behaved numerically and so that good error estimates are available for the approximation of the solution of the variational problem.

• If the solution is to be approximated successively, the degree of the elements as well as the size and shape (use squares, hexagons, etc., instead of triangles) of the mesh elements can be adjusted.

• To reduce the computational effort, one can refine the mesh more where the solution is expected to fluctuate greatly and less where it is expected to be nearly constant.

• The triangulations themselves can be modified. A finite element method based on admissible triangulations is also called conforming, while a method involving nonadmissible triangulations with "hanging nodes" is called nonconforming.

• It is also possible to combine methods and approach a problem using a mixed finite element-finite difference scheme.

• Error bounds need to be established. Generally speaking, convergence in the $L^p$ norm with larger $p$ is better and $L^\infty$ convergence would be ideal. On the other hand, if $L^2$ estimates are hard, one can try to establish $L^p$ estimates with $p < 2$.

• The approximation is not solely judged by how close it is to the solution with respect to an $L^p$ norm, but also by how its properties relate to the modeled phenomenon. If an approximation has nonphysical properties (like oscillations when we solve the heat equation), then the approximation must be discarded as physically meaningless, no matter how "close" it is in the $L^2$ sense.
The considerable amount of detail needed here is beyond the aim of this text, which was to provide the theoretical foundation for such investigations. The text [25] could be picked up at this point to expose the reader to more details. Also, for those who read German, the freely available notes [21] are recommended.
Exercises

23-29. Let $S \subseteq \mathbb{R}^d$ be a simplex with vertices $a_1, \ldots, a_{d+1}$.

(a) Prove that for any $x \in S$ the numbers $\lambda_1, \ldots, \lambda_{d+1}$ with $\sum_{j=1}^{d+1} \lambda_j = 1$ so that $x = \sum_{j=1}^{d+1} \lambda_j a_j$ are unique.
Hint. Write $x - a_{d+1}$ as a linear combination of $a_1 - a_{d+1}, \ldots, a_d - a_{d+1}$.

(b) Prove that $S$ is closed.

(c) Prove that $S$ is convex.

(d) Prove that $S$ is a $d$-dimensional manifold with corners (and hence $S^\circ$ is a Trace Theorem Domain).
Hint. There is a linear function that maps the standard base to $\{ a_1 - a_{d+1}, \ldots, a_d - a_{d+1} \}$.

23-30. Let $S$ be the triangle in $\mathbb{R}^2$ with vertices $(0, 0)$, $(0, 1)$, and $(1, 0)$.

(a) For quadratic Lagrangian finite elements, use a computer to compute the base functions $p_1, \ldots, p_6$.
Hint. For each $p_j(x, y) = ax^2 + bxy + cy^2 + dx + ey + f$, set up a $6 \times 6$ system of linear equations.

(b) Let $\tau$ be an arbitrary triangle in $\mathbb{R}^2$ with vertices $(a_x, a_y)$, $(b_x, b_y)$ and $(c_x, c_y)$. Find a bijective, affine linear function $f : \mathbb{R}^2 \to \mathbb{R}^2$ (that is, a sum of a constant and a linear function) that maps $S$ to $\tau$.

(c) Explain why the base functions in part 23-30a are sufficient to construct base functions for quadratic Lagrangian finite elements on arbitrary triangles in $\mathbb{R}^2$.

23-31. Prove $\int_\Omega v \Delta u \, d\lambda = -\int_\Omega \nabla u \cdot \nabla v \, d\lambda + \int_{\delta\Omega} v \nabla u \cdot dS$ for all $v \in H^1(\Omega)$ and $u \in H^2(\Omega)$, where formally the values of $v$ and $\nabla u$ on the boundary are given by $\gamma(v)$ and $\gamma(\nabla u)$ with $\gamma$ as in Theorem 23.25.

23-32. Prove $\int_\Omega \mathrm{div}(u)\, v \, d\lambda = -\int_\Omega u \cdot \nabla v \, d\lambda + \int_{\delta\Omega} v u \cdot dS$ for all $u \in \left( H^1(\Omega) \right)^d$ and $v \in H^1(\Omega)$, where formally the values of $u$ and $v$ on the boundary are given by $\gamma(u_1), \ldots, \gamma(u_d)$ and $\gamma(v)$ with $\gamma$ as in Theorem 23.25.
Conclusion and Outlook
It was mentioned in the preface that the text is meant to lay a foundation for a number of topics in mathematics. We can now take a quick look at these topics.
Complex analysis investigates the analytical properties of functions from $\mathbb{C}$ to $\mathbb{C}$. It turns out that if such a function is differentiable, it is locally equal to a power series.

(Ordinary) differential equations. Theoretical approaches focus on results similar to the Picard-Lindelöf Theorem. Applied approaches focus, for example, on special functions of mathematical physics or stability theory (continuous dependence of solutions on input parameters). Numerical approaches focus on numerical schemes to approximate solutions.

Differential geometry investigates the geometric properties of manifolds. An important application here is the general theory of relativity.

Functional analysis investigates the properties of Banach and Hilbert spaces as well as the properties of linear and nonlinear operators on these spaces. These ideas can then be applied, for example, to solve ordinary and partial differential equations, to approximate solutions, and also to model quantum mechanical phenomena.

Harmonic analysis investigates the properties of harmonic functions (solutions of the steady state heat equation or real parts of differentiable complex functions), Fourier series and integral operators.

Mathematical physics draws on all branches of mathematics to model phenomena in all branches of physics.

Measure theory investigates properties of measures and integrable functions.

Numerical analysis provides numerical approximation schemes for solutions of equations and systems of equations. Often the focus is on the application of the method; say, for the finite element method the focus would be error estimates and the choice of mesh, step sizes, and degrees of the elements.

Partial differential equations. Topics in this area can reach from theoretical investigations about existence and stability of (weak) solutions to solution schemes with possible overlaps into numerical analysis.

Probability theory investigates phenomena governed by chance. It ultimately draws on measure theory, because probability spaces are special measure spaces.

Topology investigates properties defined in terms of open sets (point-set topology). Low-dimensional topology focuses on the properties of three-dimensional space.

You are ready for the topics above. Choose wisely and enjoy.
Appendix A
Logic

Sets and logic are the foundation of mathematics. All mathematical results are ultimately derived from the axioms of set theory using the rules of logic. A start into mathematics from set theory, constructing the real numbers, is almost a course in itself. This being a text on analysis, Appendices A, B, and C are used to outline the necessary background in and connections to the foundations. Appendix A establishes the notation for logic and some fundamental techniques. Appendix B does the same for set theory. Appendix C presents a construction of the rational numbers from the axioms of set theory. In particular, together with the remarks after Theorem 16.89, Appendix C shows that the real numbers can indeed be constructed from the axioms of set theory.

Specifics of set theory and logic are only rarely used in analysis. Yet when they are needed, they are essential. In the preface of [13], Paul Halmos stated the fundamental importance of set theory by saying one should “read it, absorb it and forget it.” The author wholeheartedly agrees. Fundamental ideas that are frequently used will become second nature. The remaining details often fade from conscious memory without any loss of mathematical ability.

Logic provides the language of mathematics and set theory provides the objects. Of course, the two are intertwined. Without language it is not possible to communicate anything about the objects. On the other hand, without objects, what would there be to talk about? We choose to start the fundamentals with logic.
A.1 Statements
In mathematics, there are absolute notions of “true” and “false.” These notions are used to full effect by mostly working with statements.
Definition A.1 A statement is a sentence that is either true or false.

Once statements are given, more statements can be formed. Definition A.2 applies to arbitrary statements, Definition A.3 applies to statements with variables.
Definition A.2 Let $p$ and $q$ be statements.

1. The statement $p \wedge q$ (“$p$ and $q$”) is true iff $p$ is true and $q$ is true.
2. The statement $p \vee q$ (“$p$ or $q$”) is true iff $p$ is true or $q$ is true, where the “or” also allows for both statements to be true.
3. The statement $p \Rightarrow q$ (“$p$ implies $q$”) is false iff $p$ is true and $q$ is false.
4. The statement $p \Leftrightarrow q$ (“$p$ if and only if $q$” or “$p$ iff $q$”) is true iff $p$ and $q$ are both true or both false.
5. The statement $\neg p$ (“not $p$”) is true iff $p$ is false.
Definition A.3 Let $P(x)$ be a statement that depends on the variable $x$ and let $S$ be a set.

1. The statement $\forall x \in S : P(x)$ (“for all $x$ in $S$ we have $P(x)$”) is true iff $P(x)$ holds for all elements $x$ in the set $S$.
2. The statement $\exists x \in S : P(x)$ (“there is an $x$ in $S$ so that $P(x)$”) is true iff $P(x)$ holds for at least one element $x$ in the set $S$.

The symbols $\forall$ and $\exists$ are called quantifiers. $\forall$ is the universal quantifier and $\exists$ is the existential quantifier.
Proposition A.4 Let $p$ and $q$ be statements. The contrapositive of the statement $p \Rightarrow q$ is $(\neg q) \Rightarrow (\neg p)$. An implication and its contrapositive are either both true or both false. That is, the contrapositive says the same as the original implication.
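As an illustrative aside that is not part of the original text, the connectives of Definition A.2 can be tabulated by brute force over all truth values, and the same enumeration confirms Proposition A.4. The following is a minimal Python sketch; the helper names `implies`, `iff`, `truth_table` and `contrapositive_agrees` are hypothetical choices made for this illustration.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """p => q: false exactly when p is true and q is false (Definition A.2, part 3)."""
    return (not p) or q


def iff(p: bool, q: bool) -> bool:
    """p <=> q: true exactly when p and q are both true or both false."""
    return p == q


def truth_table():
    """Rows (p, q, p and q, p or q, p => q, p <=> q, not p) for all truth values."""
    return [
        (p, q, p and q, p or q, implies(p, q), iff(p, q), not p)
        for p, q in product([True, False], repeat=2)
    ]


def contrapositive_agrees() -> bool:
    """Proposition A.4: p => q and (not q) => (not p) have the same truth value."""
    return all(
        implies(p, q) == implies(not q, not p)
        for p, q in product([True, False], repeat=2)
    )


if __name__ == "__main__":
    for row in truth_table():
        print(row)
    print("contrapositive agrees:", contrapositive_agrees())
```

Because there are only four combinations of truth values, the exhaustive check is a complete verification of the proposition, not just a spot check.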
A.2 Negations
To learn more about what it means that a statement is true, it is often helpful to investigate what it means that the statement is false. That is, it is helpful to investigate the negation of the statement. Negations are also used in the contrapositive.
Theorem A.5 Let $p$, $q$ be statements.

1. The negation of the statement $p \wedge q$ is $\neg(p \wedge q) = (\neg p) \vee (\neg q)$.
2. The negation of the statement $p \vee q$ is $\neg(p \vee q) = (\neg p) \wedge (\neg q)$.
3. The negation of the statement $p \Rightarrow q$ is $\neg(p \Rightarrow q) = p \wedge (\neg q)$.
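The three rules of Theorem A.5 can be checked mechanically, because each side of each rule depends on only two truth values. The sketch below is an illustration added here, not the text's proof; the function names are hypothetical.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """p => q is false exactly when p is true and q is false."""
    return (not p) or q


def check_negation_rules() -> dict:
    """Compare both sides of each rule in Theorem A.5 for all truth values."""
    values = list(product([True, False], repeat=2))
    return {
        "not (p and q)": all(
            (not (p and q)) == ((not p) or (not q)) for p, q in values
        ),
        "not (p or q)": all(
            (not (p or q)) == ((not p) and (not q)) for p, q in values
        ),
        "not (p => q)": all(
            (not implies(p, q)) == (p and (not q)) for p, q in values
        ),
    }


if __name__ == "__main__":
    for rule, holds in check_negation_rules().items():
        print(rule, "rule holds:", holds)
```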
Theorem A.6 Let $P(x)$ be a statement that depends on the variable $x$ and let $S$ be a set.

1. The negation of $\forall x \in S : P(x)$ is $\neg(\forall x \in S : P(x)) = \exists x \in S : (\neg P(x))$.
2. The negation of $\exists x \in S : P(x)$ is $\neg(\exists x \in S : P(x)) = \forall x \in S : (\neg P(x))$.
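For a finite set $S$, the quantifiers of Definition A.3 correspond to Python's built-in `all` and `any`, so Theorem A.6 can be observed on small examples. This is only a sketch under the assumption that $S$ is finite and that $P$ is given as a Python function; `forall`, `exists` and `negation_rules_hold` are hypothetical names, and a finite check of this kind illustrates the theorem but does not prove it.

```python
from typing import Callable, Iterable


def forall(S: Iterable, P: Callable[[object], bool]) -> bool:
    """True iff P(x) holds for all x in S (vacuously true if S is empty)."""
    return all(P(x) for x in S)


def exists(S: Iterable, P: Callable[[object], bool]) -> bool:
    """True iff P(x) holds for at least one x in S."""
    return any(P(x) for x in S)


def negation_rules_hold(S: Iterable, P: Callable[[object], bool]) -> bool:
    """Check both parts of Theorem A.6 for the finite set S and the predicate P."""
    S = list(S)  # allow S to be traversed several times

    def not_P(x):
        return not P(x)

    part1 = (not forall(S, P)) == exists(S, not_P)
    part2 = (not exists(S, P)) == forall(S, not_P)
    return part1 and part2


if __name__ == "__main__":
    S = range(1, 11)
    print(forall(S, lambda x: x > 0))        # every element of S is positive
    print(exists(S, lambda x: x % 7 == 0))   # 7 is an element of S
    print(negation_rules_hold(S, lambda x: x % 2 == 0))
```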
Appendix B
Set Theory

This appendix presents the Zermelo-Fraenkel axioms of set theory and it defines relations and functions. Note that products are defined in Definition 7.8.
B.1 The Zermelo-Fraenkel Axioms

Axiom B.1 The Zermelo-Fraenkel Axioms for Set Theory

1. For every object $x$ and every set $S$, we can determine if $x \in S$ or $x \notin S$.
2. Axiom of Specification. If $S$ is a set and $P(\cdot)$ is a meaningful statement for each element of $S$, then the set of all elements $x \in S$ that satisfy $P(x)$ is also a set. It is denoted as $\{x \in S : P(x)\}$ or also as $\{x \in S \mid P(x)\}$.
3. There is a set, or equivalently, there is a set $\emptyset$ that has no elements. (For every set $S$, the set $\{x \in S : x \neq x\}$ is empty.)
4. Axiom of Extension. Two sets are equal if and only if they have the same elements.
5. Axiom of Pairing. For any two sets, there exists a set to which they both belong. That is, if $A$, $B$ are sets, then $\{A, B\}$ also is a set.
6. Axiom of Unions. For every collection $\mathcal{C}$ of sets, there exists a set whose elements are all the elements that belong to at least one element of the collection. This set is denoted $\bigcup \mathcal{C}$ and it is called the union of $\mathcal{C}$.
7. Axiom of Powers. For each set $S$, there exists a set $\mathcal{P}(S)$, called the power set of $S$, whose elements are all the subsets of $S$.
8. Axiom of Infinity. There is a set $I$ that contains $\emptyset$ and for each $a \in I$ the set $\{a, \{a\}\}$ is also in $I$.
9. Axiom of Substitution. If $S(a, b)$ is a sentence such that for each $a \in A$ the set $\{b : S(a, b)\}$ can be formed, then there exists a function $F$ with domain $A$ such that $F(a) = \{b : S(a, b)\}$ for all $a \in A$.

Two more important axioms are independent of the Zermelo-Fraenkel axioms.
Axiom B.2 The Axiom of Choice. Let $\{A_i\}_{i \in I}$ be an indexed family of nonempty sets. Then there is a function $f : I \to \bigcup_{i \in I} A_i$ so that $f(i) \in A_i$ for all $i \in I$.

Axiom B.3 The Continuum Hypothesis. With $\aleph_0$ and $\aleph_1$ being the first two infinite cardinal numbers, $\aleph_1$ is equivalent to the power set of $\aleph_0$.
B.2 Relations and Functions

Relations and functions are fundamental to analysis. In set theory, they are defined as special subsets of the product of two sets.
Definition B.4 Let $A$ and $B$ be sets. Then a relation $\rho$ from $A$ to $B$ is a set $\rho \subseteq A \times B$. For $a \in A$ and $b \in B$ it is customary to write $a \rho b$ instead of $(a, b) \in \rho$.

Definition B.5 Let $A$ and $B$ be sets.

1. A relation $\rho \subseteq A \times B$ is called totally defined iff for all $a \in A$ there is a $b \in B$ with $a \rho b$.
2. A relation $\rho \subseteq A \times B$ is called well-defined iff for all $a \in A$ there is at most one $b \in B$ with $a \rho b$.
Definition B.6 Let $A$ and $B$ be sets. A function $f : A \to B$ is a relation $f \subseteq A \times B$ that is totally defined and well-defined. For $a \in A$ and $b \in B$, it is customary to write $b = f(a)$ instead of $(a, b) \in f$. Functions are also called maps or mappings.

Definition B.7 Let $A$ and $B$ be sets and let $f : A \to B$ be a function.

1. The function $f$ is called injective or one-to-one iff $x \neq y$ implies $f(x) \neq f(y)$, for all $x, y \in A$.
2. The function $f$ is called surjective or onto iff for all $b \in B$ there is an $a \in A$ such that $f(a) = b$.
3. The function $f$ is called bijective iff it is injective and surjective.
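For finite sets, Definitions B.4 through B.7 translate almost literally into code when a relation is stored as a set of ordered pairs. The following Python sketch is an added illustration; all function names are hypothetical, and the injectivity test assumes that its argument has already been confirmed to be a function.

```python
def is_relation(rho: set, A: set, B: set) -> bool:
    """A relation from A to B is a subset of the product A x B."""
    return all(a in A and b in B for (a, b) in rho)


def is_totally_defined(rho: set, A: set, B: set) -> bool:
    """For all a in A there is a b in B with (a, b) in rho."""
    return all(any((a, b) in rho for b in B) for a in A)


def is_well_defined(rho: set, A: set, B: set) -> bool:
    """For all a in A there is at most one b in B with (a, b) in rho."""
    return all(sum((a, b) in rho for b in B) <= 1 for a in A)


def is_function(rho: set, A: set, B: set) -> bool:
    """A function is a relation that is totally defined and well-defined."""
    return (is_relation(rho, A, B)
            and is_totally_defined(rho, A, B)
            and is_well_defined(rho, A, B))


def is_injective(f: set, A: set, B: set) -> bool:
    """Distinct arguments have distinct values (f is assumed to be a function)."""
    values = [b for (_, b) in f]
    return len(values) == len(set(values))


def is_surjective(f: set, A: set, B: set) -> bool:
    """For all b in B there is an a in A with (a, b) in f."""
    return all(any((a, b) in f for a in A) for b in B)


def is_bijective(f: set, A: set, B: set) -> bool:
    return is_injective(f, A, B) and is_surjective(f, A, B)


if __name__ == "__main__":
    A, B = {1, 2, 3}, {"x", "y", "z"}
    f = {(1, "x"), (2, "y"), (3, "z")}
    print(is_function(f, A, B), is_bijective(f, A, B))
```

Storing $f$ as a set of pairs mirrors the set-theoretic definition; the more familiar notation $b = f(a)$ is only a convention on top of it.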
Appendix C
Natural Numbers, Integers, and Rational Numbers

A lot of mathematics seems as if it is not founded on sets, but actually on the number systems that we are familiar with. This appendix briefly indicates how the familiar number systems are all part of set theory.
C.1 The Natural Numbers
Axiom C.1 The Peano Axioms for $\mathbb{N}$.

1. There is a natural number $1 \in \mathbb{N}$.
2. Each $x \in \mathbb{N}$ has a (unique) successor $x'$.
3. For all $x, y \in \mathbb{N}$, if $x' = y'$, then $x = y$.
4. The element 1 is not the successor of any natural number.
5. The only natural numbers are those given by parts 1 and 2.
Proposition C.2 We can construct a model of $\mathbb{N}$ in set theory by setting $1 := \{\emptyset\}$, and by setting $x' := \{x, \{x\}\}$ for every $x$ that is already defined.

Arithmetic on the natural numbers can also be defined.
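The model of Proposition C.2 can be built explicitly for the first few natural numbers with immutable sets. The sketch below is an illustration under the assumption that Python's `frozenset` stands in for the sets of the axioms; the names `EMPTY`, `ONE`, `successor` and `numeral` are hypothetical.

```python
EMPTY = frozenset()          # the empty set
ONE = frozenset({EMPTY})     # 1 := {emptyset}


def successor(x: frozenset) -> frozenset:
    """x' := {x, {x}} as in Proposition C.2."""
    return frozenset({x, frozenset({x})})


def numeral(n: int) -> frozenset:
    """The set that models the natural number n >= 1 (n is an ordinary Python int)."""
    if n < 1:
        raise ValueError("the natural numbers of this text start at 1")
    x = ONE
    for _ in range(n - 1):
        x = successor(x)
    return x


if __name__ == "__main__":
    models = [numeral(n) for n in range(1, 7)]
    # Distinct numbers are modeled by distinct sets.
    print(len(set(models)) == len(models))
    # 1 has one element and every successor has two, so 1 is not a successor.
    print(len(ONE), [len(m) for m in models[1:]])
```

The element counts give a quick reason why part 4 of Axiom C.1 holds in this model: a successor $\{x, \{x\}\}$ always has two elements, while $1 = \{\emptyset\}$ has one.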
Definition C.3 A (binary) operation on a set $S$ is a function $\circ : S \times S \to S$. For elements $a, b \in S$ we set $a \circ b := \circ(a, b)$.

Definition C.4 We define the operation $+ : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ by $n + 1 := n'$ for all $n \in \mathbb{N}$ and $n + m' := (n + m)'$ for all $m, n \in \mathbb{N}$. The operation $\cdot : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ is defined by $n \cdot 1 := n$ and $n \cdot m' := n \cdot m + n$.
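The recursive definitions in Definition C.4 can be run directly. In the sketch below, natural numbers are represented by Python integers $n \geq 1$ and only the successor is taken as given; the names `succ`, `pred`, `add` and `mul` are hypothetical, and `pred` is an auxiliary helper that recovers $m$ from $m'$.

```python
def succ(n: int) -> int:
    """Successor n' of a natural number n >= 1."""
    return n + 1


def pred(m: int) -> int:
    """The number whose successor is m; only defined for m > 1."""
    if m <= 1:
        raise ValueError("1 is not the successor of any natural number")
    return m - 1


def add(n: int, m: int) -> int:
    """n + 1 := n' and n + m' := (n + m)'."""
    if m == 1:
        return succ(n)
    return succ(add(n, pred(m)))


def mul(n: int, m: int) -> int:
    """n * 1 := n and n * m' := n * m + n."""
    if m == 1:
        return n
    return add(mul(n, pred(m)), n)


if __name__ == "__main__":
    print(add(3, 4), mul(3, 4))
```

The recursion depth grows with the second argument, so this is meant for small inputs only.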
C.2 The Integers

Definition C.5 Let $X$ be a set. A relation $\sim \; \subseteq X \times X$ is called an equivalence relation iff

1. $\sim$ is reflexive. That is, for all $x \in X$ we have $x \sim x$.
2. $\sim$ is symmetric. That is, for all $x, y \in X$ we have $x \sim y$ iff $y \sim x$.
3. $\sim$ is transitive. That is, for all $x, y, z \in X$ we have that $x \sim y$ and $y \sim z$ implies $x \sim z$.

For each $x \in X$, the set $[x] := \{y \in X : y \sim x\}$ is called the equivalence class of $x$.

Proposition C.6 The relation $(a, b) \sim (c, d)$ defined by $a + d = b + c$ is an equivalence relation on the set $\mathbb{N} \times \mathbb{N}$.
Definition C.7 The integers $\mathbb{Z}$ are defined to be the set of equivalence classes $[(a, b)]$ of elements of $\mathbb{N} \times \mathbb{N}$ under the equivalence relation $\sim$ of Proposition C.6. Addition of integers is defined by $[(a, b)] + [(c, d)] := [(a + c, b + d)]$ and multiplication is defined by $[(a, b)] \cdot [(c, d)] := [(ac + bd, bc + ad)]$. Both operations are well-defined and $\mathbb{N}$ is isomorphic to the subset $\{[(n, 1)] : n \in \mathbb{N} \setminus \{1\}\}$. This set will also be called $\mathbb{N}$.
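To see why the operations in Definition C.7 are well-defined, it helps to compute with representatives. In the Python sketch below, a pair `(a, b)` of natural numbers represents the class $[(a, b)]$, which plays the role of the difference $a - b$; the function names and the checking helper `as_int` are hypothetical and only serve this illustration.

```python
def equivalent(x: tuple, y: tuple) -> bool:
    """(a, b) ~ (c, d) iff a + d = b + c (Proposition C.6)."""
    (a, b), (c, d) = x, y
    return a + d == b + c


def add(x: tuple, y: tuple) -> tuple:
    """[(a, b)] + [(c, d)] := [(a + c, b + d)], computed on representatives."""
    (a, b), (c, d) = x, y
    return (a + c, b + d)


def mul(x: tuple, y: tuple) -> tuple:
    """[(a, b)] * [(c, d)] := [(ac + bd, bc + ad)], computed on representatives."""
    (a, b), (c, d) = x, y
    return (a * c + b * d, b * c + a * d)


def as_int(x: tuple) -> int:
    """Interpretation used only for checking: [(a, b)] corresponds to a - b."""
    a, b = x
    return a - b


if __name__ == "__main__":
    x, x2 = (5, 2), (4, 1)   # two representatives of the integer 3
    y, y2 = (1, 3), (3, 5)   # two representatives of the integer -2
    # Equivalent inputs give equivalent outputs, so the class of the result
    # does not depend on the chosen representatives.
    print(equivalent(add(x, y), add(x2, y2)))
    print(equivalent(mul(x, y), mul(x2, y2)))
    print(as_int(add(x, y)), as_int(mul(x, y)))
```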
C.3 The Rational Numbers

Proposition C.8 The relation $(a, b) \sim (c, d)$ defined by $a \cdot d = b \cdot c$ is an equivalence relation on the set $\mathbb{Z} \times (\mathbb{Z} \setminus \{0\})$.

Definition C.9 The rational numbers $\mathbb{Q}$ are defined to be the set of equivalence classes $[(a, b)]$ of elements of $\mathbb{Z} \times (\mathbb{Z} \setminus \{0\})$ under the equivalence relation $\sim$ of Proposition C.8. Addition is defined by $[(a, b)] + [(c, d)] := [(ad + bc, bd)]$ and multiplication is defined by $[(a, b)] \cdot [(c, d)] := [(ac, bd)]$. Both operations are well-defined.
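The construction in Definition C.9 is the familiar arithmetic of fractions written with pairs. As a hedged illustration, the sketch below computes with representatives `(a, b)`, where `b` is a nonzero integer, and compares the results with Python's standard `fractions.Fraction`; the function names are hypothetical.

```python
from fractions import Fraction


def equivalent(x: tuple, y: tuple) -> bool:
    """(a, b) ~ (c, d) iff a * d = b * c (Proposition C.8)."""
    (a, b), (c, d) = x, y
    return a * d == b * c


def add(x: tuple, y: tuple) -> tuple:
    """[(a, b)] + [(c, d)] := [(ad + bc, bd)], computed on representatives."""
    (a, b), (c, d) = x, y
    return (a * d + b * c, b * d)


def mul(x: tuple, y: tuple) -> tuple:
    """[(a, b)] * [(c, d)] := [(ac, bd)], computed on representatives."""
    (a, b), (c, d) = x, y
    return (a * c, b * d)


def as_fraction(x: tuple) -> Fraction:
    """Interpretation used only for checking: [(a, b)] corresponds to a / b."""
    a, b = x
    return Fraction(a, b)


if __name__ == "__main__":
    half, half2, third = (1, 2), (2, 4), (1, 3)
    print(add(half, third), add(half2, third))
    # Different representatives of 1/2 lead to equivalent sums.
    print(equivalent(add(half, third), add(half2, third)))
    print(as_fraction(mul(half, (2, 3))))
```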
Theorem C.10 With operations as defined above, the rational numbers are an ordered field. That is, $\mathbb{Q}$ satisfies all the properties outlined in Axioms 1.1 and 1.6 for the real numbers at the beginning of the text. The set $\mathbb{Q}^+$ is $\{[(a, b)] : a, b \in \mathbb{N} \subseteq \mathbb{Z}\}$.
Bibliography

[1] M. Abramowitz and I. Stegun (1965), Handbook of mathematical functions: with formulas, graphs, and mathematical tables, Dover, New York.
[2] R. Adams (1978), Sobolev Spaces, Academic Press, Boston.
[3] R. Aris (1962), Vectors, Tensors, and the Basic Equations of Fluid Mechanics, Prentice-Hall, Englewood Cliffs, NJ.
[4] R. Bjork (1994), Memory and Metamemory Considerations in the Training of Human Beings, in J. Metcalfe and A. Shimamura (eds.), Metacognition: Knowing about knowing, MIT Press, Cambridge, MA, 185-205.
[5] J. Bransford, R. Sherwood, N. Vye, and J. Rieser (1986), Teaching Thinking and Problem Solving, American Psychologist, October issue.
[6] A. C. Chapman (1987), Fundamentals of Heat Transfer, MacMillan, New York.
[7] D. Cohn (1980), Measure Theory, Birkhäuser, Boston, MA.
[8] J. Dieudonné (1960), Foundations of Modern Analysis, Academic Press, New York, London.
[9] C. Dodge (1969), Sets, Logic and Numbers, Prindle, Weber & Smith, Incorporated, Boston, London, Sydney.
[10] D. Ferguson (1973), Sufficient conditions for Peano’s kernel to be of one sign, SIAM J. Numer. Anal. 10, 1047-1054.
[11] H. Goldstein (1950), Classical Mechanics, Addison-Wesley, Cambridge, MA.
[12] D. Halliday, R. Resnick and J. Walker (2001), Fundamentals of Physics, J. Wiley & Sons, Hoboken, NJ.
[13] P. R. Halmos (1974), Naive set theory, Undergraduate Texts in Mathematics, Springer Verlag, New York.
[14] E. Hewitt and K. Stromberg (1965), Real and Abstract Analysis, Graduate Texts in Mathematics, Springer Verlag, New York, Heidelberg, Berlin.
[15] H. Heuser (1986), Lehrbuch der Analysis, Teil 1 (4. Auflage), B. G. Teubner, Stuttgart.
[16] H. Heuser (1983), Lehrbuch der Analysis, Teil 2 (2. Auflage), B. G. Teubner, Stuttgart.
[17] H. Heuser (1986), Funktionalanalysis (2. Auflage), B. G. Teubner, Stuttgart.
[18] A. Hurd and P. Loeb (1985), An Introduction to Nonstandard Real Analysis, Academic Press, Orlando, FL.
[19] J. D. Jackson (1999), Classical Electrodynamics (Third Edition), John Wiley & Sons, Inc., New York.
[20] R. Johnsonbaugh and W. Pfaffenberger (2002), Foundations of Mathematical Analysis, Dover, Mineola, NY.
[21] A. Jüngel (2004), Das kleine Finite-Elemente-Skript, Vorlesungsskript, Johannes Gutenberg Universität Mainz.
[22] M. Lehn (2003), Analysis III, Vorlesungsskript, Johannes Gutenberg Universität Mainz.
[23] M. Renardy and R. Rogers (1993), An Introduction to Partial Differential Equations, Springer, New York.
[24] J. T. Sandefur (1990), Discrete Dynamical Systems, Clarendon Press, Oxford.
[25] P. Solin (2006), Partial Differential Equations and the Finite Element Method, J. Wiley and Sons, Inc., Hoboken, NJ.
[26] M. Spivak (1965), Calculus on Manifolds, W. A. Benjamin Inc., New York.
[27] M. Spivak (1979), A Comprehensive Introduction to Differential Geometry, vol. I, second ed., Publish or Perish, Houston, TX.
[28] J. Stoer and R. Bulirsch (1980), Introduction to Numerical Analysis, Springer Verlag, New York, Heidelberg, Berlin.
[29] K. Stromberg (1981), An Introduction to Classical Real Analysis, Wadsworth International, Belmont, CA.
[30] A. Torchinsky (1986), Real-Variable Methods in Harmonic Analysis, Academic Press, San Diego, CA.
[31] J. Welty, C. Wicks, and R. Wilson (1969), Fundamentals of Momentum, Heat and Mass Transfer, John Wiley & Sons, Inc., New York, London, Sydney, Tokyo.
[32] S. Willard (1970), General Topology, Addison-Wesley, Reading, MA.
[33] K. Yosida (1968), Functional Analysis (Second Edition), Springer Verlag, New York, Heidelberg, Berlin.
[34] E. Zeidler (1990), Nonlinear Functional Analysis and its Applications II/A, Springer Verlag, New York, Berlin, Heidelberg.
Index ’ (set complement), 147
11 . l i p
( L p norm), 271,273 (oscillation o f f ) , 132,309 G3 (direct sum), 367 11 . I I w m . P ( Q 1 (Sobolev norm), 525 \ (set difference), 1 (nth root), 21,48, 178, 193 x (cross product), 370 x (product a-algebra), 248 x (product measure), 252 x (set product), 119 v (or), 546 A (and), 546 A (wedge product), 400 (hat indicating absence), 440 1,s (indicator function), 88 ! (factorial), 18, 193, 198 direction of a proof, 7 “=+” direction of a proof, 7
(subtraction), 5 2k test, 107 < (less than), 5 > (greater than), 5 (if and only if), 546 (implies), 546 1 . I (absolute value), 7 1 . 1 (size of a set), 37 3 --rule, 222 -
Wf
+ +
7
(integral) . d 3 (line integral), 457 F . dS (surface integral), 456 f d V (over a volume), 455 closed curveshrfaces, 460 improper Riemann, 140, 141, 144 indefinite, 96 Lebesgue, 158 on a measure space, 236 Riemann, 86 (intersection), 1, 1I 8
“+”
Alt(.), 399 262 a posteriori, 205 a priori, 205 a.e., 128, 229 absolute maximum/minimum/extremum, 68, 311 absolute value, 7, 282 absolutely continuous, 139,244,420,523 absolutely convergent, 109, 166,237, 293 accumulation point, 299 addition, 2 additive inverse, 2, 256 admissible triangulation, 539 affine linear, 424 after the fact, 205 algebra, 121, 233 almost everywhere, 128, 229 alternating, 399 Alternating Series Test, 108 Ampkre-Maxwell Law, 494 a;),
n
U(union), I . 118,547 b
-
(fraction), 13
U
c? (empty set), 1 3 (existential quantifier), 1, 546 ’v’ (universal quantifier), 1, 546 2 (greater than or equal), 5 00 (infinity in arithmetic), 147 (., .) (inner product), 264 < (less than or equal), 5 r.1 (ceiling function), 14 1.1 (floor function), 14 V (nabla operator), 366, 438 (negation), 546 11 . 112 (Euclidean norm), 270 I1 . 11 oo (uniform norm), 270, 27 1, 279 as limit of the // . ((p-noms,275, 28 1
-
553
554 and, 546 antisymmetric, 5 arccosine, 197 arcsine, 197 arctangent, 197 arithmetic involving 00, 147 associative, 2, 256, 401 atlas, 422, 425 Axiom of Choice, 548
BVLa, b ] , 259, 274,481 ball. 301 Banach space, 292 Banach’s Fixed Point Theorem, 381 base, 260 Base Exchange Theorem, 262 base functions of a finite element, 536 base step, 1 I , 17 before the fact, 205 Beltrami, 504 Bernoulli’s inequality, 23 Bessel equation, 493, 5 12 Bessel’s inequality, 463 best approximation, 476 bijective, 36, 548 bilinear, 370 binomial coefficient, 18 binomial formulas, 4 Binomial Theorem, 19 bisection method, 208 Bolzano-Weierstrass, 42, 309 Borel measure, 39 1, 48 1 Borel sets, 390, 391,457 boundary, 306,424,425,44 I , 447 boundary condition, 489 bounded, 8,40,59, 86,290, 344 bounded above, 8 , 4 0 bounded below, 8 , 4 0 bounded variation, 136, 259 bounds, 86 C o [ a ,b ] ,258, 271 Coo,258,425,428,444 Coo-manifold, 422 C k ,258,425,428,444 Ck-diffeomorphism, 422 Ck-manifold, 423 Cr(S2), 392 C (complex numbers), 28 1 CCa’s Lemma, 5 17
Index Cantor set, 126, 130, 136, 153, 165. 199. 295 case distinction, 6 Cauchy Criterion, 90. 108, 1 IS, I44 Cauchy Product, 176 Cauchy sequence, 36,39,282,29 1 Cauchy’s Limit Theorem, 48 Cauchy-Schwarz inequality, 269. 286 ceiling function, 14 chain, 446 Chain Rule, 77, 355, 367, 378 chart, 422 Clairaut’s Theorem, 377 clopen, 330 closed, 6, 304 closure, 306 coercive, 5 14 coercivity coefficient, 514 column index, 348 column vector, 350 commutative, 2, 256, 401 commutative diagrams, 350 compact, 309, 310, 314 Comparison Test, 109, I 15, I42 complement, I , 147 complete, 230, 292 Completeness Axiom, 9, 38 completion, 230, 328 complex conjugate, 282 complex lamellar, 504 complex numbers, 28 1 component, 333 composite integration formula, 215 composition, 39,41 conditional convergence, 1 1 1 conforming, 542 connected, 330 conservation of mass, 498,499 consistent, 453 containment relations CP-spaces, 259 Sobolev spaces, 532 continuous, 59,296,297,303 does not imply differentiable, 72 functions assume absolute extrema, 68 implies Lebesgue measurable, 386 inverse function, 67, 3 1 1 nowhere differentiable function, 188 topological formulation, 303 continuously differentiable, 79, 258
Index Continuum Hypothesis i n continuum mechanics, 497 in set theory, 548 contradiction, 3 contrapositive. 546 control volume approach, 497 convergent absolutely, 109, 237, 293 at m, 69 at x, 49 conditionally, 1 1 1 double series, 114 from the left, 56 from the right, 56 in @, 282 in mean, 242 in measure, 243, 291 power series, 175 sequence, 25,287 series, I0 I , 293 unconditionally, 1 1 I , 296, 32 1 convex, 462,476 convolution, 396,4 18, 525 coordinate system, 348, 422 coordinate transformation, 4 16 for differential operators, 367 cosine function, 193, 283 Coulomb's Law, 496 countable, 122 countably additive, 228, 23 1 countably infinite, 122 countably subadditive, 129 counting measure, 228 cover, 3 12 cross product, 370 cube, 446 cubic Hermitian finite elements, 538 curl, 438 cylindrical coordinates, 4 17
D f ff (partial derivative, order la I), 5 18 Dk f (kth derivative), 373, 374 D j f (partial derivative, direction x j ) , 364 D f (derivative), 354 A (Laplace operator), 488 Agi, 90 A X , , 86 6 (boundary, topological), 306 d" -f (nth derivative), 79 dxn '
555
ijj
- (partial
derivative), 365
i)Xj
dist(.r. A ) (distance from a point to a set), 322,333 2 (boundary operator), 44 I , 447 i)M (boundary o f a manifold), 424,425 d (differential of a form), 437, 445 d (metric), 276 Darboux integral, 99 decimal expansion, 104 decreasing, 69, 82 degree. 62 degrees of freedom, 536 DeMorgan's Laws, 1 18 dense, 323 derivative, 7 I , 354 arguments, 354 of a constant multiple, 74, 355 of a difference, 74 of a sum, 74,355 of an increasing function, 82 of an inverse function, 82, 356 of the inversion operator, 357 zero at relative extremum, 80, 359 determinant, 404,407 of a linear function, 405 row expansion, 407 summation formula, 407 diagonal operator, 35 I , 407 diffeomorphism, 422,428 difference, 5 differentiable, 71, 72, 354, 427, 437, 443, 505 implies continuous, 72, 355 differentiable function with bounded, but not Riemann integrable, derivative, 198 differential, 437, 445 differential equation, 505 differential form, 435, 443 differential operator, 532 diffusion equation, 488 dimension, 261 Dini derivatives, 17 1 Dini's Theorem, 31 6 direct proof, 3 direct sum, 367 direction of steepest ascent, 366
556
Index
Dirichlet function, 52, 99 Dirichlet kernel, 468 disconnected, 330 discontinuity, 63 discrete metric, 277 d’,’ ISJolnt, 123 distance, 322, 333, 340 distributive, 2, 120, 256 divergence, 438 Divergence Theorem, 443,460 divergent, 25, 175, 287 series, 101,293 Dominated Convergence Theorem, 240 double series, 114,237, 254 doubly indexed family, 114 dual base, 398 dual space, 398,478 dyadic open box, 233 dyadic rational number, 16 Dynkin system, 246 generated by (I,246 Dynkin’s Lemma, 246
exponents (rules), 22, 19 I extended real number system, 146 extremum absolute, 68 relative (or local), 80
’
efficient evaluation of polynomials, 208 Egoroff’s Theorem, 244 elliptic bilinear form, 5 I4 differential operator, 532 uniformly, 534 embedded manifold, 423 with boundary, 425 with corners, 426 empty product, 18 empty set, I empty sum, 17 endpoints, 6 equicontinuous, 187 equivalence class/relation, 550 equivalent, 122, 3 17 Euclidean norm, 270 Euler identities, 284 Euler’s number, I90 Euler’s Summation Formula, 198 Eulerian approach, 497 evaluation set, 85 even function, 139 even number, 68 existential quantifier, I , 546 explicit differential equation, 507 exponential function, 189, 190, 201, 283
F ( D , R), 256 F ( D , C), 284 . f [ - (image ] of a set), 67 f l (restriction ~ o f f to R), 50 f + (positive part o f f ) . 154, 234 f - (negative part o f f ) , 154, 234 f x , f’ (sections), 248 factorial, 18, 193, 198 family, 117 Faraday’s Law, 494 Fatou’s Lemma, 240 field, 2 field isomorphism, 16 finite, 37, 246 finite dimensional, 261 finite element, 536 Hermitian, 538 Lagrangian, 538 space, 540 finite subcover, 314, 335 finitely additive measure, 233 fixed point, 210, 380 floor function, 14 fluid flow, 496 forced harmonic oscillator, 485 form, 435,443 Fourier coefficients, 464,475 Fourier equation, 488 Fourier polynomial, 467 Fourier series, 467,475 convergence, 469 fraction, 13 Fubini’s Differentiation Theorem, 187 Fubini’s Theorem, 253, 322 function, 548 functional, 298,478 Fundamental Theorem for Line Integrals, 462 Fundamental Theorem of Algebra, 32 1 Fundamental Theorem of Calculus Antiderivative Form, 9.5, 361, 523 Derivative Form, 137, 361, 390
Gamma function, 193
Index Gauss’ Law, 494 Gauss’ Theorem, 460 Gauss-Jordan Algorithm, 35 1 general solution of an ordinary linear differential equation, 5 12 generalized boundary value, 53 1 generalized factorial function, I93 geometric sumskeries, I0 I gradient, 366, 438 Gram-Schmidt Procedure, 268 greater than (or equal to), 5 greatest lower bound, 9 Green’s Identities, 490 Green’s Theorem, 462,540 Gronwall’s Inequality, 193 grows beyond all bounds, 45 H m ( Q ) , H $ ( Q ) (Sobolev space), 525,531 half-open, 6, 396,4 12 harmonic series, 105 heat equation, 488 Heine-Borel, 128, 314 Hermann A. Schwarz’ Theorem, 374 Hermitian finite element, 538 higher derivatives, 79, 373 Hilbert space, 292 Holder’s inequality, 27 I , 274, 28 1 homeomorphism, 335 homogeneous, 5 10 hyperbolic differential operator, 532
I k , 441,446 5 ( . )(imaginary part), 281 inf(.) (infimum), 9 iff (if and only if), 5, 546 image, 67 imaginary part, 28 I Implicit Function Theorem, 38 1 implies, 546 improper Lebesgue integral, 167 improper Riemann integral, 140, 141, 144 in a comer, 425 increasing, 69, 82 indefinite integral, 96 indeterminate forms, 47 indexed family, 1 I7 indicator function, 88, 154 induced metric, 276 induced orientation, 458 induction, 1 1, 17, 34
557 Induction Law, 494 infimum, 9 infinite, 37, 58, 106 infinite discontinuity, 63 infinite sum, 106 infinitely differentiable, 79, 373 infinity, 45, 57 inhomogeneous, 5 10 initial condition, 489 initial value problem, 505 injective, 16, 36, 548 inner product space, 264,284 integers, 13, 550 integrable, 236, 237, 284 integral, 236, 284 over subsets, 238 Integral Test, 143 integrand, 86,90 Integration by Parts, 96, 139,523 Integration by Substitution, 97, 139 integrator, 90 interior (point), 305 intermediate value property, 84 Intermediate Value Theorem, 66, 332 intersection, I , 1 18 interval, 5 inverse function, 67 continuity, 67,3 1 1 derivative, 82, 356 inverse trigonometric functions, 197 inward pointing tangent vector, 433 irrational numbers, 14 irrotational, 504 isolated point, 299 isometry, 326 isomorphism, 262, 265 Jacobian, 416 derivative, 499 matrix, 365 Jensen’s inequality, 275 Jordan content, 153,233 jump discontinuity, 63
L(f,P ) (lower sum), 91 L p (also see P ) ,279,280
brackets around elements, 342 on a manifold, 457 L g ( f , P ) (lower Stieltjes sum), 95 lim 48, 178, 193
n+cc
G,
558 lim (limit notation), 28, 49, 56. 69. 299 E. (Lebesgue measure), 127. 148. 23 I . 232 I’, 264 Ix, 257,270 I f ’ , 259 as an CI’ space, 273 containment relations, 259 C ( X , Y ) , 344 C2,266 C”, 279 LI3,258, 27 I , 273 containment relations, 259, 275, 281 A‘ (space of alternating k-tensors), 399 L‘H8pital’s Rule, 200 Lagrange multipliers, 384 Lagrange polynomial, 2 14 Lagrange’s Interpolation Formula, 2 14 Lagrangian approach. 497,498 Lagrangian derivative, 499 Lagrangian finite element, 538 lamellar, 504 Laplace equation, 488 Laplace operator, 488 in cylindrical and spherical coord.. 490 Laplacian flow, 504 Lax-Milgram Lemma, 5 15 Lebesgue integral, 158, 161 Lebesgue measurable, 147, 154 Lebesgue measure, 148,232 outer, 127, 23 1 Lebesgue’s criterion, 134 Lebesgue’s Differentiation Theorem, 17 1 Lebesgue’s singular function, 188 Lebesgue-Stieltjes measure, 233, 391 left limit, 56, 57 left-sided derivative, 74 Leibniz’ Rule, 368, 502 less than (or equal to), 5 limit, 25, 49-51, 56, 57, 69, 70, 287, 299, 300 nonexistence for sequences, 48 Limit Comparison Test, 1 16, 144 limit inferior, I69 Limit laws, 30,45, 47,52, 56, 59, 70 limit point, 304 limit superior, 169 Limit Test, I05 line integral, 419,435,457 linear, 264,285,342, 370 approximation, 354
lridex differential equation. 5 10 Lagrangian finite elements, 537 operator, 342 linear combination, 260 linearly independent, 259 linearly ordered tield, 8 Lipschitz constant, 299 Lipschitz continuous, 94, 299 local extremurn/maximum/minimum, 80,359 locally jL-null, 28 I locally compact, 334 locally finite, 337 logarithm, I9 I lower bound, 8,40, 86 lower integral, 99 lower sum, 9 1,95 lowest upper bound, 9 max(.) (maximum of a set), I0 min(.) (minimum of a set), 10 max( .f; g) (maximurn of functions), 62 min(,f, g ] (minimum of functions), 62 manifold, 422, 423 manifold with boundary, 424, 425 manifold with corners, 425 map, 548 Markov’s inequality, 238 matrix, 348 matrix multiplication, 349 maximal atlas, 427 maximal orthonormal system, 267 maximum, 10,62 absolute, 68 relative (or local), 80, 359 Maxwell’s equations, 494 Mean Value Theorem, 8 I , 360 for Riemann integrals, 94, 139 for Riemann-Stieltjes integrals, 95 generalized, 200 no direct translation to vector valued functions, 36 1 measurable, 23 I , 234, 284 measurable space, 227 measure space, 228 measure zero, 128, 229 mechanics of continua, 497 metric space, 276 properties defined for subsets, 290 metric subspace, 276 midpoint rule, 222
Index Milne’s Rule, 222 minimum, 10,62 absolute, 68 relative (or local), 80 Minkowski’s inequality, 272, 274 modus ponens, 3 monotone, 4 1 Monotone Class Theorem, 246 Monotone Convergence Theorem, 239 Monotone Sequence Theorem, 4 1 Multidimensional Substitution Formula, 414 multiindex, 5 18 multilinear, 370 multiplication, 2 multiplicative inverse, 2 Multivariable Chain Rule, 367 mutual containment. 119
c/ (nth root), 2 I , 48, 178, 193 n times differentiable, 79, 373 nth derivative, 79, 373 nth order differential equation, 505 No (nonnegative integers), 5 18 W (natural numbers), 11, 549 nabla operator, 366,438 natural exponential function, 190 natural logarithm function, 191 natural numbers, 11 natural projection, I 19, 297, 363 Navier Stokes equations, 496 conservation of mass, 498 negation, 546 negative, 5 negative infinity, 45, 58 neighborhood, 304 neighbors, 539 neutral element, 2, 256 Newton’s method, 209, 361 Newton-Cotes formulas, 2 I4 node of a finite element, 539 nonconforming, 542 nondecreasing, 41, 58, 82, 126, 136 nonincreasing, 4 1, 82 nonnegative integers, 5 18 norm, 86,269,345,371 normed space, 269,284 not, 546 null set, 128, 229 continuous image, 420
odd function, 139 odd number, 68 one-to-one, 16, 36, 548 onto, 16, 36, 548 open, 6, 302, 303 open ball, 301 open box, 231 open cover, 312 operation, 549 operator, 298, 342 operator norm, 345 or, 546 order, 5 order isomorphism, 16 ordered n-tuple, 119 ordered pair, 114, 119 ordinary differential equation, 505 orientable, 453 orientation, 452 orientation preserving, 453 orthogonal, 267 orthogonal projections, 480 orthonormal base, 268, 464 orthonormal system, 267 oscillation, 132, 309 outer Lebesgue measure, 127, 231 properties, 129 outer Lebesgue-Stieltjes measure, 233 outer measure, 231 outward orientation, 458 outward pointing tangent vector, 433 P^k(Ω), 537 ∏_{j=1}^{n} (product of numbers or sets), 18, 119 π (pi), 196 π-system, 246 π_j (natural projection), 363 π_{A_j} (natural projection), 119 p-integral test, 140, 141 p-series test, 174 pairwise disjoint, 123 parabolic differential operator, 532 parallelogram law, 274, 476 parametrization, 422 Parseval’s identity, 464, 474 partial derivative, 364, 365 of |α|th order, 518 partial fraction decomposition, 193, 197, 321
partial sums, 101, 293 partition, 85 partition of unity, 337, 454 pathwise connected, 332 Peano kernel, 217 Peano’s error representation, 217 periodic, 196 periodic extension, 468 permeability constant, 494 permittivity constant, 494 permutation, 376 Picard-Lindelöf Theorem, 508 piecewise continuous, 136 piecewise smooth, 469 Poincaré-Friedrichs inequality, 534 point-separating, 329 pointwise Cauchy, 181 pointwise convergent, 179 Poisson Equation, 494, 495 polar coordinates, 417 polarization identity, 274 polynomial, 62, 301, 537 positive definite, 264, 284, 379 positive functional, 480 positive orientation, 458 positive real numbers, 4 power, 18, 22, 191 Power Rule, 76, 83, 96, 192 power series, 175, 283 power set, 117, 547 Principle of Induction, 11, 17 product, 18, 23, 119, 349 product σ-algebra, 248 product index, 18 product measure, 252 product norm, 362, 367 Product Rule, 76 product space, 362 product-to-sum formulas, 196 projection, 119 Pythagoras, 194
ℚ (rational numbers), 14, 550 quadratic Lagrangian finite elements, 538 quantifiers, 26, 546 Quotient Rule, 75 R(f, P, T) (Riemann sum), 86 ℜ(.) (real part), 281 ℝ (real numbers), 2
ℝ^d, 265 radius, 301, 412 radius of convergence, 176 rank, 384 Ratio Test, 172 rational function, 63 rational numbers, 14, 550 real numbers, 2 uniqueness, 16 real part, 281 reciprocal, 13 rectangle with measurable sides, 248 refinement, 92, 338, 539 reflexive, 5, 550 regularization, 525 reindexing sums, 23 relation, 548 relative complement, 1 relative metric, 276 relative/local maximum/minimum, 80, 359 relatively open, 307 removable discontinuity, 63 restriction, 50 reverse triangular inequality, 7, 277 Reynolds’ Transport Theorem, 502 Riemann integrable, 86, 137 Riemann integral, 86 Banach space valued functions, 361 not for unbounded functions, 90 Riemann sum, 86 Riemann’s Condition, 97, 100 Riemann-Lebesgue Theorem, 473 Riemann-Stieltjes integral, 90, 95, 100, 140, 234 Riemann-Stieltjes sum, 90 Riesz’ Representation Theorem, 479 right limit, 56, 57 right-continuous, 480 right-sided derivative, 74, 505 Ritz-Galerkin approximation, 516 Rolle’s Theorem, 81 root, 21 Root Test, 173 row addition operator, 351, 407 row index, 348 row transposition operator, 351, 407 row vector, 350 rules for exponents, 22, 191 S_g(f, P, T) (Riemann-Stieltjes sum), 90
S_n (permutations), 399 S_x, S^y (sections), 248 Σ-measurable, 227, 234 Σ_λ, 147 ∑_{j=1}^{n}, ∑_{j=1}^{∞} (finite sums, infinite series), 17, 101
sup(.) (supremum), 9 sgn(.) (sign of a permutation), 399 σ-algebra, 226 generated by ZA, 245 σ-compact, 335 σ-finite, 251 scalar, 256 scalar multiplication, 256 scalar product, 264 Second Derivative Test, 207, 379 segment property, 528 semi-inner product space, 280 semimetric space, 278 seminormed space, 280 separable, 330, 466 separation of variables, 490 sequence, 25, 154, 282, 287 monotonic, 41 nondecreasing, 41 nonincreasing, 41 sequential compactness, 310 series, 101, 283, 293, 296, 321 p-series test, 174 comparison test, 109 ratio test, 172 sesquilinear, 370 Shrinking Lemma, 339 sigma algebra, 226 sign, 399 simple function, 154, 234 simplex, 537 simply connected, 462 Simpson’s Rule, 216, 221 sine function, 193, 283 singular k-cube, 446 Sobolev spaces, 524 containment relations, 532 solenoidal, 503 solution, 505, 514, 534 span, 262, 268 spherical coordinates, 417, 418 spring constant, 484 square root, 21
Squeeze Theorem, 34, 54 standard k-cube, 446 standard proof techniques add and subtract the same term, 271 adding/subtracting -, 50 I1
avoiding division by zero, 32 case distinction, 6 Cauchy sequences, 39 choose convergent subsequence, 64 Completeness Axiom, 38 continuous statement provides discrete entities, 51 contradiction, 3 direct proof, 3 equality, 27 equality of left and right limits, 57 equivalence, 61 existence of limits, 33 finite subcovers of open covers, 335 induction, 17 introductory/closing statements, 28 limits being zero, 35 modus ponens, 3 mutual containment of sets, 119 negation with quantifiers, 39 satisfying multiple inequalities, 27 standard induction argument, 34 strict vs. nonstrict inequality, 113 telescoping sum, 89 triangular inequality, 27 uniqueness, 3 universal quantification, 28 “without loss of generality”, 8 standard unit vector, 262 statement, 545 steady state, 488 Stieltjes, 90, 95, 100, 140, 233, 391, 480 Stirling’s Formula, 198 Stokes’ Theorem, 440, 442, 449, 458, 461 Stone-Weierstrass Theorem, 329, 475 strictly increasing/decreasing, 41, 82 strong induction, 23 subalgebra, 329 subcover, 314 sublattice, 329 submanifold, 424 subordinate, 337 subsequence, 41, 290 subspace, 257
subtraction, 5 successor set, 11 sum, 17, 23 summation formula determinant, 407 first n integers, 17 powers of the first n integers, 23, 202 support, 337 supremum, 9 surface integral, 419, 436, 456 surjective, 16, 36, 548 symmetric, 264, 376, 550 systems approach, 497, 498 TM (tangent bundle), 431 T^k (space of k-tensors), 371, 398 tangent (hyper)plane, 359 tangent bundle, 431 tangent function, 197 tangent space, 431 tangential, 358 taxicab metric, 278 Taylor polynomial, 204, 379 Taylor series, 204 Taylor's Formula, 204, 379 telescoping sum, 89 tensor, 370, 398 tensor norm, 371 tensor product, 398 ternary Cantor set, 125 test function, 392 test set, 147 thermal flux vector, 487 topology, 303 total order, 5 totally defined, 548 Trace Theorem, 529 Trace Theorem Domain, 531 transitive, 5, 61, 550 transpose, 407 transposition, 399 trapezoidal rule, 216, 221 traveling particle, 435 triangular inequality, 7, 27, 109, 137, 140, 142, 159, 236, 269, 282, 361 triangulation, 538 trigonometric polynomial, 196 two-sided limits, 56
U(f, P) (upper sum), 91
U_g(f, P) (upper Stieltjes sum), 95 unbounded, 40, 369 unconditional convergence, 111, 296, 321 uncountable, 124 uniform norm, 270, 289 uniformly Cauchy, 182 uniformly continuous, 92, 312 uniformly convergent, 180, 301 uniformly elliptic, 534 union, 1, 118, 547 uniqueness proof, 4 unit normal vector, 455 universal quantifier, 1, 546 upper bound, 8, 40, 86 upper integral, 99 upper sum, 91, 95 V_a^b f (variation over [a, b]), 136 vacuously true, 21 variational formulation, 534 vector, 256 vector addition, 256 vector field, 433 vector space, 256, 284 vector subspace, 257 velocity vector, 354 Venn diagram, 118 vertices of a simplex, 537 volume element, 406, 455 volume of d-dimensional balls, 418 W^{k,p}(Ω), W_0^{k,p}(Ω) (Sobolev space), 524, 531 Wallis' Product Formula, 198 wave equation, 495 weak derivative, 519 weak solution, 534 weakly differentiable, 519 Weddle's Rule, 222 wedge product, 400 well-defined, 16, 548 Well-ordering Theorem, 12 without loss of generality, 8
Young's inequality, 271, 274
ℤ (integers), 13, 550 zeroth derivative, 79