REVIEWS OF
Algebra and Analysis for Engineers and Scientists
"This book is a useful compendium of the mathematics of (mostly) finite-dimensional linear vector spaces (plus two final chapters on infinite-dimensional spaces), which do find increasing application in many branches of engineering and science .... The treatment is thorough; the book will certainly serve as a valuable reference." - American Scientist

"The authors present topics in algebra and analysis for students in engineering and science .... Each chapter is organized to include a brief overview, detailed topical discussions and references for further study. Notes about the references guide the student to collateral reading. Theorems, definitions, and corollaries are illustrated with examples. The student is encouraged to prove some theorems and corollaries as models for proving others in exercises. In most chapters, the authors discuss constructs used to illustrate examples of applications. Discussions are tied together by frequent, well written notes. The tables and index are good. The type faces are nicely chosen. The text should prepare a student well in mathematical matters." - Science Books and Films

"This is an intermediate level text, with exercises, whose avowed purpose is to provide the science and engineering graduate student with an appropriate modern mathematical (analysis and algebra) background in a succinct, but nontrivial, manner. After some fundamentals, algebraic structures are introduced followed by linear spaces, matrices, metric spaces, normed and inner product spaces and linear operators.... While one can quarrel with the choice of specific topics and the omission of others, the book is quite thorough and can serve as a text, for self-study or as a reference." - Mathematical Reviews

"The authors designed a typical work from graduate mathematical lectures: formal definitions, theorems, corollaries, proofs, examples, and exercises. It is to be noted that problems to challenge students' comprehension are interspersed throughout each chapter rather than at the end." - CHOICE
Printed in the USA
Anthony N. Michel
Charles J. Herget
Algebra and Analysis for Engineers and Scientists
Birkhäuser
Boston • Basel • Berlin

Anthony N. Michel
Department of Electrical Engineering
University of Notre Dame
Notre Dame, IN 46556
U.S.A.

Charles J. Herget
Herget Associates
P.O. Box 1425
Alameda, CA 94501
U.S.A.
Cover design by Dutton and Sherman, Hamden, CT.

Mathematics Subject Classification (2000): 03Exx, 03E20, 08-XX, 08-01, 15-XX, 15-01, 15A03, 15A04, 15A06, 15A09, 15A15, 15A18, 15A21, 15A57, 15A60, 15A63, 20-XX, 20-01, 26-XX, 26-01, 26Axx, 26A03, 26A15, 26Bxx, 34-XX, 34-01, 34Axx, 34A12, 34A30, 34H05, 45B05, 46-XX, 46-01, 46Axx, 46A22, 46A50, 46A55, 46Bxx, 46B20, 46B25, 46Cxx, 46C05, 46Exx, 46N10, 46N20, 47-XX, 47-01, 47Axx, 47A05, 47A07, 47A10, 47A25, 47A30, 47A67, 47B15, 47H10, 47N20, 47N70, 54-XX, 54-01, 54A20, 54Cxx, 54C05, 54C30, 54Dxx, 54D05, 54D30, 54D35, 54D45, 54E35, 54E45, 54E50, 93E10
Library of Congress Control Number: 2007931687
ISBN-13: 978-0-8176-4706-3
e-ISBN-13: 978-0-8176-4707-0
Printed on acid-free paper. ©2007
Birkhäuser Boston
Originally published as Mathematical Foundations in Engineering and Science by Prentice-Hall, Englewood Cliffs, NJ, 1981. A subsequent paperback edition under the title Applied Algebra and Functional Analysis was published by Dover, New York, 1993. For the Birkhäuser Boston printing, the authors have revised the original preface.
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Birkhäuser Boston, c/o Springer Science+Business Media LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
9 8 7 6 5 4 3 2 1
www.birkhauser.com
(IBT)
CONTENTS

PREFACE

CHAPTER 1: FUNDAMENTAL CONCEPTS
1.1 Sets
1.2 Functions
1.3 Relations and Equivalence Relations
1.4 Operations on Sets
1.5 Mathematical Systems Considered in This Book
1.6 References and Notes
References

CHAPTER 2: FUNDAMENTAL ALGEBRAIC STRUCTURES
2.1 Some Basic Structures of Algebra
    A. Semigroups and Groups
    B. Rings and Fields
    C. Modules, Vector Spaces, and Algebras
    D. Overview
2.2 Homomorphisms
2.3 Application to Polynomials
2.4 References and Notes
References

CHAPTER 3: VECTOR SPACES AND LINEAR TRANSFORMATIONS
3.1 Linear Spaces
3.2 Linear Subspaces and Direct Sums
3.3 Linear Independence, Bases, and Dimension
3.4 Linear Transformations
3.5 Linear Functionals
3.6 Bilinear Functionals
3.7 Projections
3.8 Notes and References
References

CHAPTER 4: FINITE-DIMENSIONAL VECTOR SPACES AND MATRICES
4.1 Coordinate Representation of Vectors
4.2 Matrices
    A. Representation of Linear Transformations by Matrices
    B. Rank of a Matrix
    C. Properties of Matrices
4.3 Equivalence and Similarity
4.4 Determinants of Matrices
4.5 Eigenvalues and Eigenvectors
4.6 Some Canonical Forms of Matrices
4.7 Minimal Polynomials, Nilpotent Operators and the Jordan Canonical Form
    A. Minimal Polynomials
    B. Nilpotent Operators
    C. The Jordan Canonical Form
4.8 Bilinear Functionals and Congruence
4.9 Euclidean Vector Spaces
    A. Euclidean Spaces: Definition and Properties
    B. Orthogonal Bases
4.10 Linear Transformations on Euclidean Vector Spaces
    A. Orthogonal Transformations
    B. Adjoint Transformations
    C. Self-Adjoint Transformations
    D. Some Examples
    E. Further Properties of Orthogonal Transformations
4.11 Applications to Ordinary Differential Equations
    A. Initial-Value Problem: Definition
    B. Initial-Value Problem: Linear Systems
4.12 Notes and References
References

CHAPTER 5: METRIC SPACES
5.1 Definition of Metric Spaces
5.2 Some Inequalities
5.3 Examples of Important Metric Spaces
5.4 Open and Closed Sets
5.5 Complete Metric Spaces
5.6 Compactness
5.7 Continuous Functions
5.8 Some Important Results in Applications
5.9 Equivalent and Homeomorphic Metric Spaces. Topological Spaces
5.10 Applications
    A. Applications of the Contraction Mapping Principle
    B. Further Applications to Ordinary Differential Equations
5.11 References and Notes
References

CHAPTER 6: NORMED SPACES AND INNER PRODUCT SPACES
6.1 Normed Linear Spaces
6.2 Linear Subspaces
6.3 Infinite Series
6.4 Convex Sets
6.5 Linear Functionals
6.6 Finite-Dimensional Spaces
6.7 Geometric Aspects of Linear Functionals
6.8 Extension of Linear Functionals
6.9 Dual Space and Second Dual Space
6.10 Weak Convergence
6.11 Inner Product Spaces
6.12 Orthogonal Complements
6.13 Fourier Series
6.14 The Riesz Representation Theorem
6.15 Some Applications
    A. Approximation of Elements in Hilbert Space (Normal Equations)
    B. Random Variables
    C. Estimation of Random Variables
6.16 Notes and References
References

CHAPTER 7: LINEAR OPERATORS
7.1 Bounded Linear Transformations
7.2 Inverses
7.3 Conjugate and Adjoint Operators
7.4 Hermitian Operators
7.5 Other Linear Operators: Normal Operators, Projections, Unitary Operators, and Isometric Operators
7.6 The Spectrum of an Operator
7.7 Completely Continuous Operators
7.8 The Spectral Theorem for Completely Continuous Normal Operators
7.9 Differentiation of Operators
7.10 Some Applications
    A. Applications to Integral Equations
    B. An Example from Optimal Control
    C. Minimization of Functionals: Method of Steepest Descent
7.11 References and Notes
References

Index
PREFACE
This book evolved from a one-year sequence of courses offered by the authors at Iowa State University. The audience for this book typically included theoretically oriented first- or second-year graduate students in various engineering or science disciplines. Subsequently, while serving as Chair of the Department of Electrical Engineering, and later, as Dean of the College of Engineering at the University of Notre Dame, the first author continued using this book in courses aimed primarily at graduate students in control systems. Since administrative demands precluded the possibility of regularly scheduled classes, the Socratic method was used in guiding students in self study. This method of course delivery turned out to be very effective and satisfying to student and teacher alike. Feedback from colleagues and students suggests that this book has been used in a similar manner elsewhere.

The original objectives in writing this book were to provide the reader with appropriate mathematical background for graduate study in engineering or science; to provide the reader with appropriate prerequisites for more advanced subjects in mathematics; to allow the student in engineering or science to become familiar with a great deal of pertinent mathematics in a rapid and efficient manner without sacrificing rigor; to give the reader a unified overview of applicable mathematics, thus enabling him or her to choose additional courses in mathematics more intelligently; and to make it possible for the student to understand at an early stage of his or her graduate studies the mathematics used in the current literature (e.g., journal articles, monographs, and the like).

Whereas the objectives enumerated above for writing this book were certainly pertinent over twenty years ago, they are even more compelling today. The reasons for this are twofold. First, today's graduate students in engineering or science are expected to be more knowledgeable and sophisticated in mathematics than students in the past. Second, today's graduate students in engineering or science are expected to be familiar with a great deal of ancillary material (primarily in the computer science area), acquired in courses that did not even exist a couple of decades ago. In view of these added demands on the students' time, to become familiar with a great deal of mathematics in an efficient manner, without sacrificing rigor, seems essential.

Since the original publication of this book, progress in technology, and consequently, in applications of mathematics in engineering and science, has been phenomenal. However, it must be emphasized that the type of mathematics itself that is being utilized in these applications did not experience corresponding substantial changes. This is particularly the case for algebra and analysis at the intermediate level, as addressed in the present book. Accordingly, the material of the present book is as current today as it was at the time when this book first appeared. (Plus ça change, plus c'est la même chose. - Alphonse Karr, 1849.)

This book may be viewed as consisting essentially of three parts: set theory (Chapter 1), algebra (Chapters 2-4), and analysis (Chapters 5-7). Chapter 1 is a prerequisite for all subsequent chapters. Chapter 2 emphasizes abstract algebra (semigroups, groups, rings, etc.) and may essentially be skipped by those who are not interested in this topic. Chapter 3, which addresses linear spaces and linear transformations, is a prerequisite for Chapters 4, 6, and 7.
Chapter 4, which treats finite-dimensional vector spaces and linear transformations on such spaces (matrices) is required for Chapters 6 and 7. In Chapter 5, metric spaces are treated. This chapter is a prerequisite for the subsequent chapters. Finally, Chapters 6 and 7 consider Banach and Hilbert spaces and linear operators on such spaces, respectively. The choice of applications in a book of this kind is subjective and will always be susceptible to criticisms. We have attempted to include applications of algebra and analysis that have broad appeal. These applications, which may be omitted without loss of continuity, are presented at the ends of Chapters 2, 4, 5, 6, and 7 and include topics dealing with ordinary differential equations, integral equations, applications of the contraction mapping principle, minimization of functionals, an example from optimal control, and estimation of random variables. All exercises are an integral part of the text and are given when they arise, rather than at the end of a chapter. Their intent is to further the reader's understanding of the subject matter on hand.
The prerequisites for this book include the usual background in undergraduate mathematics offered to students in engineering or in the sciences at universities in the United States. Thus, in addition to graduate students, this book is suitable for advanced senior undergraduate students as well, and for self study by practitioners.

Concerning the labeling of items in the book, some comments are in order. Sections are assigned numerals that reflect the chapter and the section numbers. For example, Section 2.3 signifies the third section in the second chapter. Extensive sections are usually divided into subsections identified by upper-case common letters A, B, C, etc. Equations, definitions, theorems, corollaries, lemmas, examples, exercises, figures, and special remarks are assigned monotonically increasing numerals which identify the chapter, section, and item number. For example, Theorem 4.4.7 denotes the seventh identified item in the fourth section of Chapter 4. This theorem is followed by Eq. (4.4.8), the eighth identified item in the same section. Within a given chapter, figures are identified by upper-case letters A, B, C, etc., while outside of the chapter, the same figure is identified by the above numbering scheme. Finally, the end of a proof or of an example is signified by the symbol ■.
Suggested Course Outlines

Because of the flexibility described above, this book can be used either in a one-semester course, or a two-semester course. In either case, mastery of the material presented will give the student an appreciation of the power and the beauty of the axiomatic method; will increase the student's ability to construct proofs; will enable the student to distinguish between purely algebraic and topological structures and combinations of such structures in mathematical systems; and of course, it will broaden the student's background in algebra and analysis.

A one-semester course

Chapters 1, 3, 4, 5, and Sections 6.1 and 6.11 in Chapter 6 can serve as the basis for a one-semester course, emphasizing basic aspects of Linear Algebra and Analysis in a metric space setting. The coverage of Chapter 1 should concentrate primarily on functions (Section 1.2) and relations and equivalence relations (Section 1.3), while the material concerning sets (Section 1.1) and operations on sets (Section 1.4) may be covered as reading assignments. On the other hand, Section 1.5 (on mathematical systems) merits formal coverage, since it gives the student a good overview of the book's aims and contents.
The material in this book has been organized so that Chapter 2, which addresses the important algebraic structures encountered in Abstract Algebra, may be omitted without any loss of continuity. In a one-semester course emphasizing Linear Algebra, this chapter may be omitted in its entirety. In Chapter 3, which addresses general vector spaces and linear transformations, the material concerning linear spaces (Section 3.1), linear subspaces and direct sums (Section 3.2), linear independence and bases (Section 3.3), and linear transformations (Section 3.4) should be covered in its entirety, while selected topics on linear functionals (Section 3.5), bilinear functionals (Section 3.6), and projections (Section 3.7) should be deferred until they are required in Chapter 4. Chapter 4 addresses finite-dimensional vector spaces and linear transformations (matrices) defined on such spaces. The material on determinants (Section 4.4) and some of the material concerning linear transformations on Euclidean vector spaces (Subsections 4.10D and 4.10E), as well as applications to ordinary differential equations (Section 4.11) may be omitted without any loss of continuity. The emphasis in this chapter should be on coordinate representations of vectors (Section 4.1), the representation of linear transformations by matrices and the properties of matrices (Section 4.2), equivalence and similarity of matrices (Section 4.3), eigenvalues and eigenvectors (Section 4.5), some canonical forms of matrices (Section 4.6), minimal polynomials, nilpotent operators and the Jordan canonical form (Section 4.7), bilinear functionals and congruence (Section 4.8), Euclidean vector spaces (Section 4.9), and linear transformations on Euclidean vector spaces (Subsections 4.10A, 4.10B, and 4.10C). Chapter 5 addresses metric spaces, which constitute some of the most important topological spaces.
In a one-semester course, the emphasis in this chapter should be on the definition of metric space and the presentation of important classes of metric spaces (Sections 5.1 and 5.3), open and closed sets (Section 5.4), complete metric spaces (Section 5.5), compactness (Section 5.6), and continuous functions (Section 5.7). The development of many classes of metric spaces requires important inequalities, including the Hölder and the Minkowski inequalities for finite and infinite sums and for integrals. These are presented in Section 5.2 and need to be included in the course. Sections 5.8 and 5.10 address specific applications and may be omitted without any loss of continuity. However, time permitting, the material in Section 5.9, concerning equivalent and homeomorphic metric spaces and topological spaces, should be considered for inclusion in the course, since it provides the student a glimpse into other areas of mathematics. To demonstrate mathematical systems endowed with both algebraic and topological structures, the one-semester course should include the material of Sections 6.1 and 6.11 in Chapter 6, concerning normed linear spaces (resp., Banach spaces) and inner product spaces (resp., Hilbert spaces), respectively.
A two-semester course

In addition to the material outlined above for a one-semester course, a two-semester course should include most of the material in Chapters 2, 6, and 7. Chapter 2 addresses algebraic structures. The coverage of semigroups and groups, rings and fields, and modules, vector spaces and algebras (Section 2.1) should be in sufficient detail to give the student an appreciation of the various algebraic structures summarized in Figure B on page 61. Important mappings defined on these algebraic structures (homomorphisms) should also be emphasized (Section 2.2) in a two-semester course, as should the brief treatment of polynomials in Section 2.3. The first ten sections of Chapter 6 address normed linear spaces (resp., Banach spaces) while the next four sections address inner product spaces (resp., Hilbert spaces). The last section of this chapter, which includes applications (to random variables and estimates of random variables), may be omitted without any loss of continuity. The material concerning normed linear spaces (Section 6.1), linear subspaces (Section 6.2), infinite series (Section 6.3), convex sets (Section 6.4), linear functionals (Section 6.5), finite-dimensional spaces (Section 6.6), inner product spaces (Section 6.11), orthogonal complements (Section 6.12), and Fourier series (Section 6.13) should be covered in its entirety. Coverage of the material on geometric aspects of linear functionals (Section 6.7), extensions of linear functionals (Section 6.8), dual space and second dual space (Section 6.9), weak convergence (Section 6.10), and the Riesz representation theorem (Section 6.14) should be selective and tailored to the availability of time and the students' areas of interest. (For example, students interested in optimization and estimation problems may want a detailed coverage of the Hahn-Banach theorem included in Section 6.8.) Chapter 7 addresses (bounded) linear operators defined on Banach and Hilbert spaces. The first nine sections of this chapter should be covered in their entirety in a two-semester course.
The material of this chapter includes bounded linear transformations (Section 7.1), inverses (Section 7.2), conjugate and adjoint operators (Section 7.3), Hermitian operators (Section 7.4), normal, projection, unitary and isometric operators (Section 7.5), the spectrum of an operator (Section 7.6), completely continuous operators (Section 7.7), the spectral theorem for completely continuous normal operators (Section 7.8), and differentiation of (not necessarily linear and bounded) operators (Section 7.9). The last section, which includes applications to integral equations, an example from optimal control, and minimization of functionals by the method of steepest descent, may be omitted without loss of continuity. Both one-semester and two-semester courses offered by the present authors, based on this book, usually included a project conducted by each course participant to demonstrate the applicability of the course material. Each project
involved a formal presentation to the entire class at the end of the semester. The courses described above were also offered using the Socratic method, following the outlines given above. These courses typically involved half a dozen participants. While most of the material was self taught by the students themselves, the classroom meetings served as a forum for guidance, clarifications, and challenges by the teacher, usually resulting in lively discussions of the subject on hand not only among teacher and students, but also among students themselves.

For the current printing of this book, we have created a supplementary website of additional resources for students and instructors: http://Michel.Herget.net. Available at this website are additional current references concerning the subject matter of the book and a list of several areas of applications (including references). Since the latter reflects mostly the authors' interests, it is by definition rather subjective. Among several additional items, the website also includes some reviews of the present book. In this regard, the authors would like to invite readers to submit reviews of their own for inclusion into the website.

The present publication of Algebra and Analysis for Engineers and Scientists was made possible primarily because of Tom Grasso, Birkhäuser's Computational Sciences and Engineering Editor, whom we would like to thank for his considerations and professionalism.

Anthony N. Michel
Charles J. Herget
Summer 2007
1
FUNDAMENTAL CONCEPTS
In this chapter we present fundamental concepts required throughout the remainder of this book. We begin by considering sets in Section 1.1. In Section 1.2 we discuss functions; in Section 1.3 we introduce relations and equivalence relations; and in Section 1.4 we concern ourselves with operations on sets. In Section 1.5 we give a brief indication of the types of mathematical systems which we will consider in this book. The chapter concludes with a brief discussion of references.
1.1. SETS

Virtually every area of modern mathematics is developed by starting from an undefined object called a set. There are several reasons for doing this. One of these is to develop a mathematical discipline in a completely axiomatic and totally abstract manner. Another reason is to present a unified approach to what may seem to be highly diverse topics in mathematics. Our reason is the latter, for our interest is not in abstract mathematics for its own sake. However, by using abstraction, many of the underlying principles of modern mathematics are more clearly understood.

Thus, we begin by assuming that a set is a well defined collection of elements or objects. We denote sets by common capital letters $A$, $B$, $C$, etc., and elements or objects of sets by lower case letters $a$, $b$, $c$, etc. For example, we write $A = \{a, b, c\}$ to indicate that $A$ is the collection of elements $a$, $b$, $c$. If an element $x$ belongs to a set $A$, we write $x \in A$. In this case we say that "$x$ belongs to $A$," or "$x$ is contained in $A$," or "$x$ is a member of $A$," etc. If $x$ is any element and if $A$ is a set, then we assume that one knows whether $x$ belongs to $A$ or whether $x$ does not belong to $A$. If $x$ does not belong to $A$ we write $x \notin A$.
To illustrate some of the concepts, we assume that the reader is familiar with the set of real numbers. Thus, if we say $R$ is the set of all real numbers, then this is a well defined collection of objects. We point out that it is possible to characterize the set of real numbers in a purely abstract manner based on an axiomatic approach. We shall not do so here. To illustrate a non-well defined collection of objects, consider the statement "the set of all tall people in Ames, Iowa." This is clearly not precise enough to be considered here.

We will agree that any set $A$ may not contain any given element $x$ more than once unless we explicitly say so. Moreover, we assume that the concept of "order" will play no role when representing elements of a set, unless we say so. Thus, the sets $A = \{a, b, c\}$ and $B = \{c, b, a\}$ are to be viewed as being exactly the same set.

We usually do not describe a set by listing every element between the curly brackets $\{\ \}$ as we did for set $A$ above. A convenient method of characterizing sets is as follows. Suppose that for each element $x$ of a set $A$ there is a statement $P(x)$ which is either true or false. We may then define a set $B$ which consists of all elements $x \in A$ such that $P(x)$ is true, and we may write

$$B = \{x \in A : P(x) \text{ is true}\}.$$
For example, let $A$ denote the set of all people who live in Ames, Iowa, and let $B$ denote the set of all males who live in Ames. We can write, then,

$$B = \{x \in A : x \text{ is a male}\}.$$
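The set-builder construction above maps closely onto a set comprehension. The following is a minimal sketch, assuming a made-up four-person stand-in for the set of Ames residents; the names, the `(name, sex)` pair encoding, and the function `P` are illustrative assumptions, not part of the text.

```python
# Hypothetical toy data: each element of A is a (name, sex) pair.
A = {("Alice", "F"), ("Bob", "M"), ("Carol", "F"), ("Dave", "M")}


def P(x):
    """The statement P(x): 'x is a male'. It is true or false for every x in A."""
    return x[1] == "M"


# B = {x in A : P(x) is true}
B = {x for x in A if P(x)}

print(sorted(name for name, _ in B))
```

Because `B` is built by filtering `A`, it is automatically a subset of `A`, which mirrors the way the definition draws its elements from a given set.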
When it is clear which set $x$ belongs to, we sometimes write $\{x : P(x) \text{ is true}\}$ (instead of, say, $\{x \in A : P(x) \text{ is true}\}$).

It is also necessary to consider a set which has no members. Since a set is determined by its elements, there is only one such set which is called the empty set, or the vacuous set, or the null set, or the void set and which is denoted by $\emptyset$. Any set, $A$, consisting of one or more elements is said to be non-empty or non-void. If $A$ is non-void we write $A \neq \emptyset$.

If $A$ and $B$ are sets and if every element of $B$ also belongs to $A$, then we say that $B$ is a subset of $A$ or $A$ includes $B$, and we write $B \subset A$ or $A \supset B$. Furthermore, if $B \subset A$ and if there is an $x \in A$ such that $x \notin B$, then we say that $B$ is a proper subset of $A$. Some texts make a distinction between proper subset and any subset by using the notation $\subset$ and $\subseteq$, respectively. We shall not use the symbol $\subseteq$ in this book. We note that if $A$ is any set, then $\emptyset \subset A$. Also, $\emptyset \subset \emptyset$. If $B$ is not a subset of $A$, we write $B \not\subset A$ or $A \not\supset B$.

1.1.1. Example. Let $R$ denote the set of all real numbers, let $Z$ denote the set of all integers, let $J$ denote the set of all positive integers, and let $Q$ denote the set of all rational numbers. We could alternately describe the set $Z$ as $Z = \{x \in R : x \text{ is an integer}\}$. Thus, for every $x \in R$, the statement "$x$ is an integer" is either true or false. We frequently also specify sets such as $J$ in the following obvious manner,

$$J = \{x \in Z : x = 1, 2, \ldots\}.$$

We can specify the set $Q$ as

$$Q = \left\{x \in R : x = \frac{p}{q},\ p, q \in Z,\ q \neq 0\right\}.$$

It is clear that $\emptyset \subset J \subset Z \subset Q \subset R$, and that each of these subsets is a proper subset. We note that $0 \notin J$. ■
We now wish to state what is meant by equality of sets.

1.1.2. Definition. Two sets, $A$ and $B$, are said to be equal if $A \subset B$ and $B \subset A$. In this case we write $A = B$. If two sets, $A$ and $B$, are not equal, we write $A \neq B$. If $x$ and $y$ denote the same element of a set, we say that they are equal and we write $x = y$. If $x$ and $y$ denote distinct elements of a set, we write $x \neq y$.
We emphasize that all definitions are "if and only if" statements. Thus, in the above definition we should actually have said: $A$ and $B$ are equal if and only if $A \subset B$ and $B \subset A$. Since this is always understood, hereafter all definitions will imply the "only if" portion. Thus, we simply say: two sets $A$ and $B$ are said to be equal if $A \subset B$ and $B \subset A$. In Definition 1.1.2 we introduced two concepts of equality, one of equality of sets and one of equality of elements. We shall encounter many forms of equality throughout this book.
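Definition 1.1.2 can be read as a recipe: to test equality of two sets, test containment in both directions. The sketch below follows that recipe literally for finite sets; the helper names `is_subset` and `sets_equal` are our own and are not taken from the text.

```python
def is_subset(B, A):
    """B is a subset of A: every element of B also belongs to A."""
    return all(x in A for x in B)


def sets_equal(A, B):
    """Definition 1.1.2: A = B if and only if A is a subset of B and B is a subset of A."""
    return is_subset(A, B) and is_subset(B, A)


A = {"a", "b", "c"}
B = {"c", "b", "a"}  # the same elements, listed in a different order
C = {"a", "b"}       # a proper subset of A

print(sets_equal(A, B))
print(sets_equal(A, C))
```

Note that `is_subset(set(), A)` holds for every `A`, since `all` over an empty collection is true; this matches the remark that the empty set is a subset of any set.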
Now let $X$ be a set and let $A \subset X$. The complement of subset $A$ with respect to $X$ is the set of elements of $X$ which do not belong to $A$. We denote the complement of $A$ with respect to $X$ by $C_X A$. When it is clear that the complement is with respect to $X$, we simply say the complement of $A$ (instead of the complement of $A$ with respect to $X$), and simply write $A^-$. Thus, we have

$$A^- = \{x \in X : x \notin A\}. \tag{1.1.3}$$
In every discussion involving sets, we will always have a given fixed set in mind from which we take elements and subsets. We will call this set the universal set, and we will usually denote this set by $X$. Throughout the remainder of the present section, $X$ denotes always an arbitrary non-void fixed set. We now establish some properties of sets.

1.1.4. Theorem. Let $A$, $B$, and $C$ be subsets of $X$. Then
(i) if $A \subset B$ and $B \subset C$, then $A \subset C$;
(ii) $X^- = \emptyset$;
(iii) $\emptyset^- = X$;
(iv) $(A^-)^- = A$;
(v) $A \subset B$ if and only if $A^- \supset B^-$; and
(vi) $A = B$ if and only if $A^- = B^-$.
Proof. To prove (i), first assume that $A$ is non-void and let $x \in A$. Since $A \subset B$, $x \in B$, and since $B \subset C$, $x \in C$. Since $x$ is arbitrary, every element of $A$ is also an element of $C$ and so $A \subset C$. Finally, if $A = \emptyset$, then $A \subset C$ follows trivially.

The proofs of parts (ii) and (iii) follow immediately from (1.1.3).

To prove (iv), we must show that $A \subset (A^-)^-$ and $(A^-)^- \subset A$. If $A = \emptyset$, then clearly $A \subset (A^-)^-$. Now suppose that $A$ is non-void. We note from (1.1.3) that

$$(A^-)^- = \{x \in X : x \notin A^-\}. \tag{1.1.5}$$

If $x \in A$, it follows from (1.1.3) that $x \notin A^-$, and hence we have from (1.1.5) that $x \in (A^-)^-$. This proves that $A \subset (A^-)^-$. If $(A^-)^- = \emptyset$, then $A = \emptyset$; otherwise we would have a contradiction by what we have already shown; i.e., $A \subset (A^-)^-$. So let us assume that $(A^-)^- \neq \emptyset$. If $x \in (A^-)^-$ it follows from (1.1.5) that $x \notin A^-$, and thus we have $x \in A$ in view of (1.1.3). Hence, $(A^-)^- \subset A$.

We leave the proofs of parts (v) and (vi) as an exercise. ■
1.1.6. Exercise. Prove parts (v) and (vi) of Theorem 1.1.4.
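A finite check is not a proof, but it can be a useful sanity test before attempting the exercise. The sketch below verifies all six parts of Theorem 1.1.4 by brute force over every subset of a small universal set; the choice of a three-element universe and the helper names are assumptions made for illustration only.

```python
from itertools import combinations

X = frozenset({1, 2, 3})  # a small universal set, chosen only for illustration


def subsets(S):
    """Return all subsets of S as frozensets."""
    items = list(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def complement(A):
    """Complement of A with respect to X, following Eq. (1.1.3)."""
    return frozenset(x for x in X if x not in A)


def check_theorem_1_1_4():
    """Check parts (i)-(vi) for every choice of subsets A, B, C of X."""
    P = subsets(X)
    empty = frozenset()
    ok = complement(X) == empty and complement(empty) == X               # (ii), (iii)
    for A in P:
        ok = ok and complement(complement(A)) == A                       # (iv)
        for B in P:
            ok = ok and ((A <= B) == (complement(A) >= complement(B)))   # (v)
            ok = ok and ((A == B) == (complement(A) == complement(B)))   # (vi)
            for C in P:
                if A <= B and B <= C:
                    ok = ok and A <= C                                   # (i)
    return ok


print(check_theorem_1_1_4())
```

Here `<=` and `>=` on frozensets are Python's subset and superset tests, which allow equality and therefore match the book's use of the inclusion symbols.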
The proofs given in parts (i) and (iv) of Theorem 1.1.4 are intentionally quite detailed in order to demonstrate the exact procedure required to prove containment and equality of sets. Frequently, the manipulations required to prove some seemingly obvious statements are quite long. It is suggested that the reader carry out all the details in the manipulations of the above exercise and the exercises that follow.

Next, let $A$ and $B$ be subsets of $X$. We define the union of sets $A$ and $B$, denoted by $A \cup B$, as the set of all elements that are in $A$ or $B$; i.e.,

$$A \cup B = \{x \in X : x \in A \text{ or } x \in B\}.$$

When we say $x \in A$ or $x \in B$, we mean $x$ is in either $A$ or in $B$ or in both $A$ and $B$. This inclusive use of "or" is standard in mathematics and logic. If $A$ and $B$ are subsets of $X$, we define their intersection to be the set of all elements which belong to both $A$ and $B$ and denote the intersection by $A \cap B$. Specifically,

$$A \cap B = \{x \in X : x \in A \text{ and } x \in B\}.$$
If the intersection of two sets $A$ and $B$ is empty, i.e., if $A \cap B = \emptyset$, we say that $A$ and $B$ are disjoint. For example, let $X = \{1, 2, 3, 4, 5\}$, let $A = \{1, 2\}$, let $B = \{3, 4, 5\}$, let $C = \{2, 3\}$, and let $D = \{4, 5\}$. Then $A^- = B$, $B^- = A$, $D \subset B$, $A \cup B = X$, $A \cap B = \emptyset$, $A \cup C = \{1, 2, 3\}$, $B \cap D = D$, $A \cap C = \{2\}$, etc. In the next result we summarize some of the important properties of union and intersection of sets.

1.1.7. Theorem. Let $A$, $B$, and $C$ be subsets of $X$. Then
(i) $A \cap B = B \cap A$;
(ii) $A \cup B = B \cup A$;
(iii) $A \cap \emptyset = \emptyset$;
(iv) $A \cup \emptyset = A$;
(v) $A \cap X = A$;
(vi) $A \cup X = X$;
(vii) $A \cap A = A$;
(viii) $A \cup A = A$;
(ix) $A \cup A^- = X$;
(x) $A \cap A^- = \emptyset$;
(xi) $A \cap B \subset A$;
(xii) $A \cap B = A$ if and only if $A \subset B$;
(xiii) $A \subset A \cup B$;
(xiv) $A = A \cup B$ if and only if $B \subset A$;
(xv) $(A \cap B) \cap C = A \cap (B \cap C)$;
(xvi) $(A \cup B) \cup C = A \cup (B \cup C)$;
(xvii) $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$;
(xviii) $(A \cap B) \cup C = (A \cup C) \cap (B \cup C)$;
(xix) $(A \cup B)^- = A^- \cap B^-$; and
(xx) $(A \cap B)^- = A^- \cup B^-$.
Proof. We only prove part (xviii) of this theorem, again as an illustration of the manipulations involved. We will first show that $(A \cap B) \cup C \subset (A \cup C) \cap (B \cup C)$, and then we show that $(A \cap B) \cup C \supset (A \cup C) \cap (B \cup C)$. Clearly, if $(A \cap B) \cup C = \emptyset$, the assertion is true. So let us assume that $(A \cap B) \cup C \neq \emptyset$, and let $x$ be any element of $(A \cap B) \cup C$. Then $x \in A \cap B$ or $x \in C$. Suppose $x \in A \cap B$. Then $x$ belongs to both $A$ and $B$, and hence $x \in A \cup C$ and $x \in B \cup C$. From this it follows that $x \in (A \cup C) \cap (B \cup C)$. On the other hand, let $x \in C$. Then $x \in A \cup C$ and $x \in B \cup C$, and hence $x \in (A \cup C) \cap (B \cup C)$. Thus, if $x \in (A \cap B) \cup C$, then $x \in (A \cup C) \cap (B \cup C)$, and we have

$$(A \cap B) \cup C \subset (A \cup C) \cap (B \cup C). \tag{1.1.8}$$

To show that $(A \cap B) \cup C \supset (A \cup C) \cap (B \cup C)$ we need to prove the assertion only when $(A \cup C) \cap (B \cup C) \neq \emptyset$. So let $x$ be any element of $(A \cup C) \cap (B \cup C)$. Then $x \in A \cup C$ and $x \in B \cup C$. Since $x \in A \cup C$, then $x \in A$ or $x \in C$. Furthermore, $x \in B \cup C$ implies that $x \in B$ or $x \in C$. We know that either $x \in C$ or $x \notin C$. If $x \in C$, then $x \in (A \cap B) \cup C$. If $x \notin C$, then it follows from the above comments that $x \in A$ and also $x \in B$. Then $x \in A \cap B$, and hence $x \in (A \cap B) \cup C$. Thus, if $x \notin C$, then $x \in (A \cap B) \cup C$. Since this exhausts all the possibilities, we conclude that

$$(A \cup C) \cap (B \cup C) \subset (A \cap B) \cup C. \tag{1.1.9}$$

From (1.1.8) and (1.1.9) it follows that $(A \cup C) \cap (B \cup C) = (A \cap B) \cup C$. ■

1.1.10. Exercise. Prove parts (i) through (xvii) and parts (xix) and (xx) of Theorem 1.1.7.
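The worked example with $X = \{1, 2, 3, 4, 5\}$ and the last three parts of Theorem 1.1.7 can be checked mechanically. The following is only a sketch using Python's built-in `frozenset`; the helper names `complement` and `check_1_1_7` are introduced here for illustration and are not part of the text. A finite check of this kind illustrates the identities but does not replace the proofs asked for in the exercise.

```python
# Universal set and subsets taken from the example in the text.
X = frozenset({1, 2, 3, 4, 5})
A = frozenset({1, 2})
B = frozenset({3, 4, 5})
C = frozenset({2, 3})
D = frozenset({4, 5})


def complement(S, universe=X):
    """Complement of S relative to the universal set."""
    return universe - S


# The facts listed in the example, one boolean per statement.
example_facts = [
    complement(A) == B,
    complement(B) == A,
    D <= B,                         # D is a subset of B
    A | B == X,
    A & B == frozenset(),           # A and B are disjoint
    A | C == frozenset({1, 2, 3}),
    B & D == D,
    A & C == frozenset({2}),
]


def check_1_1_7(P, Q, R):
    """Parts (xviii), (xix), and (xx) of Theorem 1.1.7 for one triple of sets."""
    return (
        (P & Q) | R == (P | R) & (Q | R)
        and complement(P | Q) == complement(P) & complement(Q)
        and complement(P & Q) == complement(P) | complement(Q)
    )


subsets = [A, B, C, D, X, frozenset()]
all_hold = all(
    check_1_1_7(P, Q, R) for P in subsets for Q in subsets for R in subsets
)
print(all(example_facts), all_hold)
```

Here `|`, `&`, and `-` play the roles of union, intersection, and difference, and `<=` tests containment.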
In view of part (xvi) of Theorem 1.1.7, there is no ambiguity in writing $A \cup B \cup C$. Extending this concept, let $n$ be any positive integer and let $A_1, A_2, \ldots, A_n$ denote subsets of $X$. The set $A_1 \cup A_2 \cup \cdots \cup A_n$ is defined to be the set of all $x \in X$ which belong to at least one of the subsets $A_i$, and we write

$$\bigcup_{i=1}^{n} A_i = A_1 \cup A_2 \cup \cdots \cup A_n = \{x \in X : x \in A_i \text{ for some } i = 1, \ldots, n\}.$$

Similarly, by part (xv) of Theorem 1.1.7, there is no ambiguity in writing $A \cap B \cap C$. We define

$$\bigcap_{i=1}^{n} A_i = A_1 \cap A_2 \cap \cdots \cap A_n = \{x \in X : x \in A_i \text{ for all } i = 1, \ldots, n\}.$$

That is, $\bigcap_{i=1}^{n} A_i$ consists of those members of $X$ which belong to all the subsets $A_1, A_2, \ldots, A_n$.
We will consider the union and the intersection of an infinite number of subsets $A_i$ at a later point in the present section. The following is a generalization of parts (xix) and (xx) of Theorem 1.1.7.

1.1.11. Theorem. Let $A_1, \ldots, A_n$ be subsets of $X$. Then

(i) $$\Bigl[\bigcup_{i=1}^{n} A_i\Bigr]^- = \bigcap_{i=1}^{n} A_i^-, \tag{1.1.12}$$

and

(ii) $$\Bigl[\bigcap_{i=1}^{n} A_i\Bigr]^- = \bigcup_{i=1}^{n} A_i^-. \tag{1.1.13}$$

1.1.14. Exercise. Prove Theorem 1.1.11.
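Equations (1.1.12) and (1.1.13) can be tried out on a finite family of sets. The sketch below folds a list of sets with `functools.reduce`; the function names and the sample family are assumptions made for illustration only, and the convention that an empty family has the whole universe as its intersection is likewise a choice made here.

```python
from functools import reduce


def union_all(family):
    """Union of a finite family of sets (empty family gives the empty set)."""
    return reduce(lambda s, t: s | t, family, frozenset())


def intersection_all(family, universe):
    """Intersection of a finite family; an empty family yields the universe."""
    return reduce(lambda s, t: s & t, family, universe)


def de_morgan_holds(family, universe):
    """Check (1.1.12) and (1.1.13) for one family of subsets of `universe`."""
    complements = [universe - S for S in family]
    law_1 = universe - union_all(family) == intersection_all(complements, universe)
    law_2 = universe - intersection_all(family, universe) == union_all(complements)
    return law_1 and law_2


# Hypothetical universal set and family A_1, A_2, A_3.
X = frozenset(range(10))
family = [frozenset({0, 1, 2}), frozenset({2, 3, 4, 5}), frozenset({2, 5, 8})]
print(de_morgan_holds(family, X))
```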
The results expressed in Eqs. (1.1.12) and (1.1.13) are usually referred to as De Morgan's laws. We will see later in this section that these laws hold under more general conditions.

Next, let $A$ and $B$ be two subsets of $X$. We define the difference of $B$ and $A$, denoted $(B - A)$, as the set of elements in $B$ which are not in $A$, i.e.,

$$B - A = \{x \in X : x \in B \text{ and } x \notin A\}.$$

We note here that $A$ is not required to be a subset of $B$. It is clear that

$$B - A = B \cap A^-.$$
Now let $A$ and $B$ again be subsets of the set $X$. The symmetric difference of $A$ and $B$ is denoted by $A \,\Delta\, B$ and is defined as

$$A \,\Delta\, B = (A - B) \cup (B - A).$$

The following properties follow immediately.

1.1.15. Theorem. Let $A$, $B$, and $C$ denote subsets of $X$. Then
(i) $A \,\Delta\, B = B \,\Delta\, A$;
(ii) $A \,\Delta\, B = (A \cup B) - (A \cap B)$;
(iii) $A \,\Delta\, A = \emptyset$;
(iv) $A \,\Delta\, \emptyset = A$;
(v) $A \,\Delta\, (B \,\Delta\, C) = (A \,\Delta\, B) \,\Delta\, C$;
(vi) $A \cap (B \,\Delta\, C) = (A \cap B) \,\Delta\, (A \cap C)$; and
(vii) $A \,\Delta\, B \subset (A \,\Delta\, C) \cup (C \,\Delta\, B)$.

1.1.16. Exercise. Prove Theorem 1.1.15.
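The seven properties of the symmetric difference listed in Theorem 1.1.15 lend themselves to an exhaustive check over all subsets of a small universal set. The sketch below assumes a three-element universe chosen for illustration; `powerset`, `sym_diff`, and `theorem_1_1_15` are names introduced here, not notation from the text.

```python
from itertools import combinations, product

X = frozenset({1, 2, 3})  # hypothetical small universal set


def powerset(S):
    """All subsets of S, as frozensets."""
    items = sorted(S)
    return [
        frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)
    ]


def sym_diff(A, B):
    """Symmetric difference, written out from its definition."""
    return (A - B) | (B - A)


def theorem_1_1_15(A, B, C):
    """Parts (i) through (vii) of Theorem 1.1.15 for one triple of subsets."""
    empty = frozenset()
    return all([
        sym_diff(A, B) == sym_diff(B, A),
        sym_diff(A, B) == (A | B) - (A & B),
        sym_diff(A, A) == empty,
        sym_diff(A, empty) == A,
        sym_diff(A, sym_diff(B, C)) == sym_diff(sym_diff(A, B), C),
        A & sym_diff(B, C) == sym_diff(A & B, A & C),
        sym_diff(A, B) <= sym_diff(A, C) | sym_diff(C, B),
    ])


subsets = powerset(X)
holds_everywhere = all(
    theorem_1_1_15(A, B, C) for A, B, C in product(subsets, repeat=3)
)
print(len(subsets), holds_everywhere)
```

Python's own `^` operator on sets computes the same symmetric difference as `sym_diff`.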
In passing, we point out that the use of Venn diagrams is highly useful in visualizing properties of sets; however, under no circumstances should such diagrams take the place of a proof. In Figure A we illustrate the concepts of union, intersection, difference, and symmetric difference of two sets, and the complement of a set, by making use of Venn diagrams. Here, the shaded regions represent the indicated sets.
1.1.17. Figure A. Venn diagrams.
1.1.18. Definition. A non-void set $A$ is said to be finite if $A$ contains $n$ distinct elements, where $n$ is some positive integer; such a set $A$ is said to be of order $n$. The null set is defined to be finite with order zero. A set consisting of exactly one element, say $A = \{a\}$, is called a singleton or the singleton of $a$. If a set $A$ is not finite, then we say that $A$ is infinite. In Section 1.2 we will further categorize infinite sets as being countable or uncountable.

Next, we need to consider sets whose elements are sets themselves. For example, if $A$, $B$, and $C$ are subsets of $X$, then the collection $\mathcal{A} = \{A, B, C\}$ is a set whose elements are $A$, $B$, and $C$. We usually call a set whose elements are subsets of $X$ a family of subsets of $X$ or a collection of subsets of $X$. We will usually employ a hierarchical system of notation where lower case letters, e.g., $a$, $b$, $c$, are elements of $X$, upper case letters, e.g., $A$, $B$, $C$, are subsets of $X$, and script letters, e.g., $\mathcal{A}$, $\mathcal{B}$, $\mathcal{C}$, are families of subsets of $X$. We could, of course, continue this process and consider a set whose elements are families of subsets, e.g., …

… $\{4, 5, \ldots\} \neq Y$. Therefore, $f$ has an inverse, $f^{-1}$, which is defined only on $\mathcal{R}(f)$ and not on all of $Y$. In this case we have $f^{-1}(y) = y - 3$ for all $y \in \mathcal{R}(f)$. ■
1.2.14. Example. Let $X = Y = R$, the set of all real numbers. Let $f: X \to Y$ be given by

$$f(x) = \frac{x}{1 + |x|}$$

for all $x \in R$. Then $f$ is an injective mapping and $\mathcal{R}(f) = \{y \in Y : -1 < y < +1\}$. Also, $f^{-1}$ is a mapping from $\mathcal{R}(f)$ into $R$ given by

$$f^{-1}(y) = \frac{y}{1 - |y|}$$

for all $y \in \mathcal{R}(f)$. ■
Next, let $X$, $Y$, and $Z$ be non-void sets. Suppose that $f: X \to Y$ and $g: Y \to Z$. For each $x \in X$, we have $f(x) \in Y$ and $g(f(x)) \in Z$. Since $f$ and $g$ are mappings from $X$ into $Y$ and from $Y$ into $Z$, respectively, it follows that for each $x \in X$ there is one and only one element $g(f(x)) \in Z$. Hence, the set

$$\{(x, z) \in X \times Z : z = g(f(x)),\ x \in X\} \tag{1.2.15}$$

is a function from $X$ into $Z$. We call this function the composite function of $g$ and $f$ and denote it by $g \circ f$. The value of $g \circ f$ at $x$ is given by

$$(g \circ f)(x) = g \circ f(x) \triangleq g(f(x)).$$

In Figure C, a pictorial interpretation of a composite function is given.

1.2.16. Figure C. Illustration of a composite function.

1.2.17. Theorem. If $f$ is a mapping of a set $X$ onto a set $Y$ and $g$ is a mapping of the set $Y$ onto a set $Z$, then $g \circ f$ is a mapping of $X$ onto $Z$.

Proof. In order to show that $g \circ f$ is an onto mapping we must show that for any $z \in Z$ there exists an $x \in X$ such that $g(f(x)) = z$. If $z \in Z$, then since $g$ is a mapping of $Y$ onto $Z$, there is an element $y \in Y$ such that $g(y) = z$. Furthermore, since $f$ is a mapping of $X$ onto $Y$, there is an $x \in X$ such that $f(x) = y$. Since $g \circ f(x) = g(f(x)) = g(y) = z$, it readily follows that $g \circ f$ is a mapping of $X$ onto $Z$, which proves the theorem. ■
We also have

1.2.18. Theorem. If $f$ is a (1-1) mapping of a set $X$ onto a set $Y$, and if $g$ is a (1-1) mapping of the set $Y$ onto a set $Z$, then $g \circ f$ is a (1-1) mapping of $X$ onto $Z$.

1.2.19. Exercise. Prove Theorem 1.2.18.

Next we prove:

1.2.20. Theorem. If $f$ is a (1-1) mapping of a set $X$ onto a set $Y$, and if $g$ is a (1-1) mapping of $Y$ onto a set $Z$, then $(g \circ f)^{-1} = (f^{-1}) \circ (g^{-1})$.

Proof. Let $z \in Z$. Then there exists an $x \in X$ such that $g \circ f(x) = z$, and hence $(g \circ f)^{-1}(z) = x$. Also, since $g \circ f(x) = g(f(x)) = z$, it follows that $g^{-1}(z) = f(x)$, from which we have $f^{-1}(g^{-1}(z)) = x$. But $f^{-1}(g^{-1}(z)) = f^{-1} \circ g^{-1}(z)$, and since this is equal to $x$, we have $f^{-1} \circ g^{-1}(z) = (g \circ f)^{-1}(z)$. Since $z$ is arbitrary, the theorem is proved. ■
Note carefully that in Theorem 1.2.20 $f$ is a mapping of $X$ onto $Y$. If it had simply been an injective mapping, the composite function $(f^{-1}) \circ (g^{-1})$ may not be defined. That is, the range of $g^{-1}$ is $Y$; however, the domain of $f^{-1}$ is $\mathcal{R}(f)$. Clearly, the domain of $f^{-1}$ must include the range of $g^{-1}$ in order that the composition $(f^{-1}) \circ (g^{-1})$ be defined.

1.2.21. Example. Let $A = \{r, s, t, u\}$, $B = \{u, v, w, x\}$, and $C = \{w, x, y, z\}$. Let the function $f: A \to B$ be defined as

$$f = \{(r, u), (s, w), (t, v), (u, x)\}.$$

We find it convenient to represent this function in the following way:

$$f = \begin{pmatrix} r & s & t & u \\ u & w & v & x \end{pmatrix}.$$

That is, the top row identifies the domain of $f$ and the bottom row contains each unique element in the range of $f$ directly below the appropriate element in the domain. Clearly, this representation can be used for any function defined on a finite set. In a similar fashion, let the function $g: B \to C$ be defined as

$$g = \begin{pmatrix} u & v & w & x \\ x & w & z & y \end{pmatrix}.$$

Clearly, both $f$ and $g$ are bijective. Also, $g \circ f$ is the (1-1) mapping of $A$ onto $C$ given by

$$g \circ f = \begin{pmatrix} r & s & t & u \\ x & z & w & y \end{pmatrix}.$$

Furthermore,

$$f^{-1} = \begin{pmatrix} u & w & v & x \\ r & s & t & u \end{pmatrix}, \quad g^{-1} = \begin{pmatrix} x & w & z & y \\ u & v & w & x \end{pmatrix}, \quad (g \circ f)^{-1} = \begin{pmatrix} x & z & w & y \\ r & s & t & u \end{pmatrix}.$$

Now

$$f^{-1} \circ g^{-1} = \begin{pmatrix} x & w & z & y \\ r & t & s & u \end{pmatrix},$$

i.e., $f^{-1} \circ g^{-1} = (g \circ f)^{-1}$. ■
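Finite mappings such as those of Example 1.2.21 can be stored as dictionaries, with each key mapped to its image, which makes Theorem 1.2.20 easy to try out. The sketch below takes $f$ from the example and reads $g$ as $u \mapsto x$, $v \mapsto w$, $w \mapsto z$, $x \mapsto y$, which is the reading consistent with the composite $g \circ f$ stated there; the helper names `compose` and `inverse` are assumptions introduced for illustration.

```python
# f : A -> B and g : B -> C from Example 1.2.21, one key per domain element.
f = {"r": "u", "s": "w", "t": "v", "u": "x"}
g = {"u": "x", "v": "w", "w": "z", "x": "y"}


def compose(outer, inner):
    """Return the composite x -> outer(inner(x))."""
    return {x: outer[inner[x]] for x in inner}


def inverse(h):
    """Inverse of an injective finite mapping, defined on the range of h."""
    inv = {y: x for x, y in h.items()}
    if len(inv) != len(h):
        raise ValueError("mapping is not injective, so it has no inverse")
    return inv


g_of_f = compose(g, f)
lhs = inverse(g_of_f)                      # (g o f)^(-1)
rhs = compose(inverse(f), inverse(g))      # f^(-1) o g^(-1)
print(g_of_f)
print(lhs == rhs)
```

Note the order of the arguments: `compose(inverse(f), inverse(g))` applies `inverse(g)` first, matching $(f^{-1}) \circ (g^{-1})$.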
The reader can prove the next result readily.

1.2.22. Theorem. Let $W$, $X$, $Y$, and $Z$ be non-void sets. If $f$ is a mapping of set $W$ into set $X$, if $g$ is a mapping of $X$ into set $Y$, and if $h$ is a mapping of $Y$ into set $Z$ (sets $W$, $X$, $Y$, $Z$ are not necessarily distinct), then $h \circ (g \circ f) = (h \circ g) \circ f$.

1.2.23. Exercise. Prove Theorem 1.2.22.
1.2.24. Example. Let $A = \{m, n, p, q\}$, $B = \{m, r, s\}$, $C = \{r, t, u, v\}$, $D = \{w, x, y, z\}$, and define $f: A \to B$, $g: B \to C$, and $h: C \to D$ as

$$f = (\,\cdots), \quad g = (\,\cdots), \quad h = (\,\cdots).$$

Then

$$g \circ f = (\,\cdots) \quad \text{and} \quad h \circ g = (\,\cdots).$$

Thus,

$$h \circ (g \circ f) = (\,\cdots) \quad \text{and} \quad (h \circ g) \circ f = (\,\cdots),$$

i.e., $h \circ (g \circ f) = (h \circ g) \circ f$. ■
There is a special mapping which is so important that we give it a special name. We have:
1.2.25. Definition. Let $X$ be a non-void set. Let $e: X \to X$ be defined by $e(x) = x$ for all $x \in X$. We call $e$ the identity function on $X$.

It is clear that the identity function is bijective.

1.2.26. Theorem. Let $X$ and $Y$ be non-void sets, and let $f: X \to Y$. Let $e_X$, $e_Y$, and $e_1$ be the identity functions on $X$, $Y$, and $\mathcal{R}(f)$, respectively. Then
(i) if $f$ is injective, then $f^{-1} \circ f = e_X$ and $f \circ f^{-1} = e_1$; and
(ii) $f$ is bijective if and only if there is a $g: Y \to X$ such that $g \circ f = e_X$ and $f \circ g = e_Y$.

Proof. Part (i) follows immediately from parts (iii) and (iv) of Theorem 1.2.10. The proof of part (ii) is left as an exercise. ■

1.2.27. Exercise. Prove part (ii) of Theorem 1.2.26.
Another special class of important functions are permutations.

1.2.28. Definition. A permutation on a set $X$ is a (1-1) mapping of $X$ onto $X$.

It is clear that the identity mapping on $X$ is a permutation on $X$. For this reason it is sometimes called the identity permutation on $X$. It is also clear that the inverse of a permutation is also a permutation.

1.2.29. Exercise. Let $X = \{a, b, c\}$, and define $f: X \to X$ and $g: X \to X$ as

$$f = \begin{pmatrix} a & b & c \\ c & b & a \end{pmatrix}, \quad g = \begin{pmatrix} a & b & c \\ b & c & a \end{pmatrix}.$$

Show that $f$, $g$, $f^{-1}$, and $g^{-1}$ are permutations on $X$.

1.2.30. Exercise. Let $Z$ denote the set of integers, and let $f: Z \to Z$ be defined by $f(n) = n + 3$ for all $n \in Z$. Show that $f$ and $f^{-1}$ are permutations on $Z$ and that $f^{-1} \circ f = f \circ f^{-1}$.
The reader can readily prove the following results.

1.2.31. Theorem. If $f$ is a (1-1) mapping of a set $A$ onto a set $B$ and if $g$ is a (1-1) mapping of the set $B$ onto the set $A$, then $g \circ f$ is a permutation on $A$.

1.2.32. Corollary. If $f$ and $g$ are both permutations on a set $A$, then $g \circ f$ is a permutation on $A$.

1.2.33. Exercise. Prove Theorem 1.2.31 and Corollary 1.2.32.
1.2.34. Exercise. Show that if a set A consists of n elements, then there are exactly n! (n factorial) distinct permutations on A.
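For a small set the count in Exercise 1.2.34 can be observed directly by listing every permutation. The following sketch uses `itertools.permutations` from the standard library; the set `A` and the helper names are chosen here for illustration, and the enumeration is of course evidence for the count, not a proof of it.

```python
from itertools import permutations
from math import factorial


def all_permutations(A):
    """Every (1-1) mapping of the finite set A onto itself, as a dict."""
    elems = sorted(A)
    return [dict(zip(elems, image)) for image in permutations(elems)]


def is_permutation(h, A):
    """h maps A into A, and every element of A is hit (hence h is 1-1 and onto)."""
    return set(h) == set(A) and set(h.values()) == set(A)


A = {"a", "b", "c"}
perms = all_permutations(A)
print(len(perms), factorial(len(A)))
```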
Now let $f$ be a mapping of a set $X$ into a set $Y$. If $X_1$ is a subset of $X$, then for each element $x' \in X_1$ there is a unique element $f(x') \in Y$. Thus, $f$ may be used to define a mapping $f'$ of $X_1$ into $Y$ defined by

$$f'(x') = f(x') \tag{1.2.35}$$

for all $x' \in X_1$. This motivates the following definition.

1.2.36. Definition. The mapping $f'$ of subset $X_1 \subset X$ into $Y$ of Eq. (1.2.35) is called the mapping of $X_1$ into $Y$ induced by the mapping $f: X \to Y$. In this case $f'$ is called the restriction of $f$ to the set $X_1$.
We also have:

1.2.37. Definition. If $f$ is a mapping of $X_1$ into $Y$ and if $X_1 \subset X$, then any mapping $\tilde{f}$ of $X$ into $Y$ is said to be an extension of $f$ if

$$\tilde{f}(x) = f(x) \tag{1.2.38}$$

for every $x \in X_1$.

Thus, if $\tilde{f}$ is an extension of $f$, then $f$ is a mapping of a set $X_1 \subset X$ into $Y$ which is induced by the mapping $\tilde{f}$ of $X$ into $Y$.

1.2.39. Example. Let $X_1 = \{u, v, x\}$, $X = \{u, v, x, y, z\}$, and $Y = \{n, p, q, r, s, t\}$. Clearly $X_1 \subset X$. Define $f: X_1 \to Y$ as

$$f = \begin{pmatrix} u & v & x \\ n & p & q \end{pmatrix}.$$

Also, define $\tilde{f}, \hat{f}: X \to Y$ as

$$\tilde{f} = \begin{pmatrix} u & v & x & y & z \\ n & p & q & r & s \end{pmatrix}, \quad \hat{f} = \begin{pmatrix} u & v & x & y & z \\ n & p & q & n & t \end{pmatrix}.$$

Then $\tilde{f}$ and $\hat{f}$ are two different extensions of $f$. Moreover, $f$ is the mapping of $X_1$ into $Y$ induced either by $\tilde{f}$ or $\hat{f}$. In general, two distinct mappings may induce the same mapping on a subset. ■

Let us next consider the image and the inverse image of sets under mappings. Specifically, we have

1.2.40. Definition. Let $f$ be a function from a set $X$ into a set $Y$. Let $A \subset X$, and let $B \subset Y$. We define the image of $A$ under $f$, denoted by $f(A)$, to be the set

$$f(A) = \{y \in Y : y = f(x),\ x \in A\}.$$

We define the inverse image of $B$ under $f$, denoted by $f^{-1}(B)$, to be the set

$$f^{-1}(B) = \{x \in X : f(x) \in B\}.$$

Note that $f^{-1}(B)$ is always defined for any $f: X \to Y$. That is, there is no implication here that $f$ has an inverse. The notation is somewhat unfortunate in this respect. Note also that the range of $f$ is $f(X)$. In the next result, some of the important properties of images and inverse images of functions are summarized.

1.2.41. Theorem. Let $f$ be a function from $X$ into $Y$, let $A$, $A_1$, and $A_2$ be subsets of $X$, and let $B$, $B_1$, and $B_2$ be subsets of $Y$. Then
(i) if $A_1 \subset A$, then $f(A_1) \subset f(A)$;
(ii) $f(A_1 \cup A_2) = f(A_1) \cup f(A_2)$;
(iii) $f(A_1 \cap A_2) \subset f(A_1) \cap f(A_2)$;
(iv) $f^{-1}(B_1 \cup B_2) = f^{-1}(B_1) \cup f^{-1}(B_2)$;
(v) $f^{-1}(B_1 \cap B_2) = f^{-1}(B_1) \cap f^{-1}(B_2)$;
(vi) $f^{-1}(B^-) = [f^{-1}(B)]^-$;
(vii) $f^{-1}[f(A)] \supset A$; and
(viii) $f[f^{-1}(B)] \subset B$.

Proof. We prove parts (i) and (ii) to demonstrate the method of proof. The remaining parts are left as an exercise. To prove part (i), let $y \in f(A_1)$. Then there is an $x \in A_1$ such that $y = f(x)$. But $A_1 \subset A$ and so $x \in A$. Hence, $f(x) = y \in f(A)$. This proves that $f(A_1) \subset f(A)$. To prove part (ii), let $y \in f(A_1 \cup A_2)$. Then there is an $x \in A_1 \cup A_2$ such that $y = f(x)$. If $x \in A_1$, then $f(x) = y \in f(A_1)$. If $x \in A_2$, then $f(x) = y \in f(A_2)$. Since $x$ is in $A_1$ or in $A_2$, $f(x)$ must be in $f(A_1)$ or $f(A_2)$. Therefore, $f(A_1 \cup A_2) \subset f(A_1) \cup f(A_2)$. To prove that $f(A_1) \cup f(A_2) \subset f(A_1 \cup A_2)$, we note that $A_1 \subset A_1 \cup A_2$. So by part (i), $f(A_1) \subset f(A_1 \cup A_2)$. Similarly, $f(A_2) \subset f(A_1 \cup A_2)$. From this it follows that $f(A_1) \cup f(A_2) \subset f(A_1 \cup A_2)$. We conclude that $f(A_1 \cup A_2) = f(A_1) \cup f(A_2)$. ■

1.2.42. Exercise. Prove parts (iii) through (viii) of Theorem 1.2.41.
We note that, in general, equality is not attained in parts (iii), (vii), and (viii) of Theorem 1.2.41. However, by considering special types of mappings we can obtain the following results for these cases.
1.2.43. Theorem. Let $f$ be a function from $X$ into $Y$, let $A$, $A_1$, and $A_2$ be subsets of $X$, and let $B$ be a subset of $Y$. Then
(i) $f(A_1 \cap A_2) = f(A_1) \cap f(A_2)$ for all pairs of subsets $A_1$, $A_2$ of $X$ if and only if $f$ is injective;
(ii) $f^{-1}[f(A)] = A$ for all $A \subset X$ if and only if $f$ is injective; and
(iii) $f[f^{-1}(B)] = B$ for all $B \subset Y$ if and only if $f$ is surjective.

Proof. We will prove only part (i) and leave the proofs of parts (ii) and (iii) as an exercise. To prove sufficiency, let $f$ be injective and let $A_1$ and $A_2$ be subsets of $X$. In view of part (iii) of Theorem 1.2.41, we need only show that $f(A_1) \cap f(A_2) \subset f(A_1 \cap A_2)$. In doing so, let $y \in f(A_1) \cap f(A_2)$. Then $y \in f(A_1)$ and $y \in f(A_2)$. This means there is an $x_1 \in A_1$ and an $x_2 \in A_2$ such that $y = f(x_1) = f(x_2)$. Since $f$ is injective, $x_1 = x_2$. Hence, $x_1 \in A_1 \cap A_2$. This implies that $y \in f(A_1 \cap A_2)$; i.e., $f(A_1) \cap f(A_2) \subset f(A_1 \cap A_2)$.

To prove necessity, assume that $f(A_1 \cap A_2) = f(A_1) \cap f(A_2)$ for all subsets $A_1$ and $A_2$ of $X$. For purposes of contradiction, suppose there are $x_1, x_2 \in X$ such that $x_1 \neq x_2$ and $f(x_1) = f(x_2)$. Let $A_1 = \{x_1\}$ and $A_2 = \{x_2\}$; i.e., $A_1$ and $A_2$ are singletons of $x_1$ and $x_2$, respectively. Then $A_1 \cap A_2 = \emptyset$, and so $f(A_1 \cap A_2) = \emptyset$. However, $f(A_1) = \{y\}$ and $f(A_2) = \{y\}$, and thus $f(A_1) \cap f(A_2) = \{y\} \neq \emptyset$. This contradicts the fact that $f(A_1) \cap f(A_2) = f(A_1 \cap A_2)$ for all subsets $A_1$ and $A_2$ of $X$. Thus, $f$ is injective. ■

1.2.44. Exercise. Prove parts (ii) and (iii) of Theorem 1.2.43.
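The remark that equality can fail in parts (iii), (vii), and (viii) of Theorem 1.2.41 is easy to see on a small mapping that is neither injective nor surjective. The sketch below is an illustration under assumptions: the sets `X`, `Y`, the mapping `f`, and the helpers `image` and `preimage` are all made up here and do not come from the text.

```python
def image(f, A):
    """f(A): the set of values f(x) for x in A."""
    return {f[x] for x in A}


def preimage(f, B):
    """The inverse image of B: all x in the domain with f(x) in B."""
    return {x for x in f if f[x] in B}


# Hypothetical mapping f : X -> Y that is neither injective nor surjective.
X = {1, 2, 3, 4}
Y = {"p", "q", "r"}
f = {1: "p", 2: "p", 3: "q", 4: "q"}

A1, A2 = {1, 3}, {2, 3}
B = {"q", "r"}

print(image(f, A1 & A2), image(f, A1) & image(f, A2))   # part (iii)
print(preimage(f, image(f, A1)), A1)                    # part (vii)
print(image(f, preimage(f, B)), B)                      # part (viii)
```

In each printed pair the containment of Theorem 1.2.41 holds but is proper, which is what Theorem 1.2.43 predicts for a mapping lacking injectivity and surjectivity.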
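Some of the preceding results can be extended to families of sets. For example, we have:

1.2.45. Theorem. Let $f$ be a function from $X$ into $Y$, let $\{A_\alpha : \alpha \in I\}$ be an indexed family of sets in $X$, and let $\{B_\alpha : \alpha \in K\}$ be an indexed family of sets in $Y$. Then
(i) $f\bigl(\bigcup_{\alpha \in I} A_\alpha\bigr) = \bigcup_{\alpha \in I} f(A_\alpha)$;
(ii) $f\bigl(\bigcap_{\alpha \in I} A_\alpha\bigr) \subset \bigcap_{\alpha \in I} f(A_\alpha)$;
(iii) $f^{-1}\bigl(\bigcup_{\alpha \in K} B_\alpha\bigr) = \bigcup_{\alpha \in K} f^{-1}(B_\alpha)$;
(iv) $f^{-1}\bigl(\bigcap_{\alpha \in K} B_\alpha\bigr) = \bigcap_{\alpha \in K} f^{-1}(B_\alpha)$; and
(v) if $B \subset Y$, $f^{-1}(B^-) = [f^{-1}(B)]^-$.

Proof. We prove parts (i) and (iii) and leave the proofs of the remaining parts as an exercise. To prove part (i), let $y \in f\bigl(\bigcup_{\alpha \in I} A_\alpha\bigr)$. This means that there is an $x \in \bigcup_{\alpha \in I} A_\alpha$ such that $y = f(x)$. Thus, for some $\alpha \in I$, $x \in A_\alpha$. This implies that $f(x) \in f(A_\alpha)$ and so $y \in f(A_\alpha)$. Hence, $y \in \bigcup_{\alpha \in I} f(A_\alpha)$. This shows that $f\bigl(\bigcup_{\alpha \in I} A_\alpha\bigr) \subset \bigcup_{\alpha \in I} f(A_\alpha)$.

To prove the converse, let $y \in \bigcup_{\alpha \in I} f(A_\alpha)$. Then $y \in f(A_\alpha)$ for some $\alpha \in I$. This means there is an $x \in A_\alpha$ such that $f(x) = y$. Now $x \in \bigcup_{\alpha \in I} A_\alpha$, and so $f(x) = y \in f\bigl(\bigcup_{\alpha \in I} A_\alpha\bigr)$. Therefore, $\bigcup_{\alpha \in I} f(A_\alpha) \subset f\bigl(\bigcup_{\alpha \in I} A_\alpha\bigr)$. This completes the proof of part (i).

To prove part (iii), let $x \in f^{-1}\bigl(\bigcup_{\alpha \in K} B_\alpha\bigr)$. This means that $f(x) \in \bigcup_{\alpha \in K} B_\alpha$. Thus, $f(x) \in B_\alpha$ for some $\alpha \in K$. Hence, $x \in f^{-1}(B_\alpha)$, and so $x \in \bigcup_{\alpha \in K} f^{-1}(B_\alpha)$. Therefore, $f^{-1}\bigl(\bigcup_{\alpha \in K} B_\alpha\bigr) \subset \bigcup_{\alpha \in K} f^{-1}(B_\alpha)$.

Conversely, let $x \in \bigcup_{\alpha \in K} f^{-1}(B_\alpha)$. Then $x \in f^{-1}(B_\alpha)$ for some $\alpha \in K$. Thus, $f(x) \in B_\alpha$, and so $f(x) \in \bigcup_{\alpha \in K} B_\alpha$. Hence, $x \in f^{-1}\bigl(\bigcup_{\alpha \in K} B_\alpha\bigr)$. This means that $\bigcup_{\alpha \in K} f^{-1}(B_\alpha) \subset f^{-1}\bigl(\bigcup_{\alpha \in K} B_\alpha\bigr)$, which completes the proof of part (iii). ■

1.2.46. Exercise. Prove parts (ii), (iv), and (v) of Theorem 1.2.45.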
Having introduced the concept of mapping, we are in a position to consider an important classification of infinite sets. We first consider the following definition.
1.2.47. Definition. Let A and B be any two sets. The set A is said to be equivalent to set B if there exists a bijective mapping of A onto B. Clearly, if A is equivalent to B, then B is equivalent to A.
1.2.48. Definition. Let $J$ be the set of positive integers, and let $A$ be any set. Then $A$ is said to be countably infinite if $A$ is equivalent to $J$. A set is said to be countable or denumerable if it is either finite or countably infinite. If a set is not countable, it is said to be uncountable.

We have:

1.2.49. Theorem. Let $J$ be the set of positive integers, and let $I \subset J$. If $I$ is infinite, then $I$ is equivalent to $J$.

Proof. We shall construct a bijective mapping, $f$, from $J$ onto $I$. Let $\{J_n : n \in J\}$ be the family of sets given by $J_n = \{1, 2, \ldots, n\}$ for $n = 1, 2, \ldots$. Clearly, each $J_n$ is finite and of order $n$. Therefore, $J_n \cap I$ is finite. Since $I$ is infinite, $I - J_n \neq \emptyset$ for all $n$. Let us now define $f: J \to I$ as follows. Let $f(1)$ be the smallest integer in $I$. We now proceed inductively. Assume $f(n) \in I$ has been defined and let $f(n + 1)$ be the smallest integer in $I$ which is greater than $f(n)$. Now $f(n + 1) > f(n)$, and so $f(n_1) > f(n_2)$ for any $n_1 > n_2$. This implies that $f$ is injective.

Next, we want to show that $f$ is surjective. We do so by contradiction. Suppose that $f(J) \neq I$. Since $f(J) \subset I$, this implies that $I - f(J) \neq \emptyset$. Let $q$ be the smallest integer in $I - f(J)$. Then $q \neq f(1)$ because $f(1) \in f(J)$, and so $q > f(1)$. This implies that $I \cap J_{q-1} \neq \emptyset$. Since $I \cap J_{q-1}$ is non-void and finite, we may find the largest integer in this set, say $r$. It follows that $r \leq q - 1 < q$. Now $r$ is the largest integer in $I$ which is less than $q$. But $r < q$ implies that $r \in f(J)$. This means there is an $s \in J$ such that $r = f(s)$. By definition of $f$, $f(s + 1) = q$. Hence, $q \in f(J)$ and we have arrived at a contradiction. Thus, $f$ is surjective. This completes the proof. ■
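The mapping $f$ built in the proof of Theorem 1.2.49 simply lists the members of $I$ in increasing order: $f(1)$ is the smallest member, and $f(n + 1)$ is the next member above $f(n)$. A generator makes this construction concrete. The sketch below is only an illustration; the membership test `is_perfect_square` is an arbitrary choice of infinite subset $I$ made here, not one taken from the text.

```python
from itertools import count, islice
from math import isqrt


def enumerate_subset(is_member):
    """Yield f(1), f(2), ...: the members of I in increasing order.

    `is_member(k)` decides whether the positive integer k belongs to I.
    If I were finite, the loop would search forever after the last member,
    which mirrors the hypothesis in the theorem that I is infinite.
    """
    for k in count(1):              # k runs through J = {1, 2, 3, ...}
        if is_member(k):
            yield k


def is_perfect_square(k):
    """Hypothetical choice of I: the perfect squares."""
    return isqrt(k) ** 2 == k


first_values = list(islice(enumerate_subset(is_perfect_square), 6))
print(first_values)
```

Because the values are produced in strictly increasing order, no value repeats (injectivity), and every member of $I$ is eventually reached (surjectivity).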
We now have the following corollary.

1.2.50. Corollary. Let $A \subset B \subset X$. If $B$ is a countable set, then $A$ is countable.

Proof. If $A$ is finite, then there is nothing to prove. So let us assume that $A$ is infinite. This means that $B$ is countably infinite, and so there exists a bijective mapping $f: B \to J$. Let $g$ be the restriction of $f$ to $A$. Then for all $x_1, x_2 \in A$ such that $x_1 \neq x_2$, $g(x_1) = f(x_1) \neq f(x_2) = g(x_2)$. Thus, $g$ is an injective mapping of $A$ into $J$. By part (i) of Theorem 1.2.10, $g$ is a bijective mapping of $A$ onto $g(A)$. This means $A$ is equivalent to $g(A)$, and thus $g(A)$ is an infinite set. Since $g(A) \subset J$, $g(A)$ is equivalent to $J$. Hence, there is a bijective mapping of $g(A)$ onto $J$, which we call $h$. By Theorem 1.2.18, the composite mapping $h \circ g$ is a bijective mapping of $A$ onto $J$. This means that $J$ is equivalent to $A$. Therefore, $A$ is countable. ■
We conclude the present section by considering the cardinality of sets. Specifically, if a set is finite, we say the cardinal number of the set is equal to the number of elements of the set. If two sets are countably infinite, then we say they have the same cardinal number, which we can define to be the cardinal number of the positive integers. More generally, two arbitrary sets are said to have the same cardinal number if we can establish a bijective mapping between the two sets (i.e., the sets are equivalent).
1.3. RELATIONS AND EQUIVALENCE RELATIONS
Throughout the present section, X denotes a non-void set.
We begin by introducing the notion of relation, which is a generalization of the concept of function.

1.3.1. Definition. Let $X$ and $Y$ be non-void sets. Any subset of $X \times Y$ is called a relation from $X$ to $Y$. Any subset of $X \times X$ is called a relation in $X$.

1.3.2. Example. Let $A = \{u, v, x, y\}$ and $B = \{a, b, c, d\}$. Let $\alpha = \{(u, a), (v, b), (u, c), (x, a)\}$. Then $\alpha$ is a relation from $A$ into $B$. It is clearly not a function from $A$ into $B$ (why?). ■

1.3.3. Example. Let $X = Y = R$, the set of real numbers. The set $\{(x, y) \in R \times R : x \leq y\}$ is a relation in $R$. Also, the set $\{(x, y) \in R \times R : x = \sin y\}$ is a relation in $R$. This shows that so-called multivalued functions are actually relations rather than mappings. ■
As in the case of mappings, it makes sense to speak of the domain and the range of a relation. We have:

1.3.4. Definition. Let $\rho$ be a relation from $X$ to $Y$. The subset of $X$,

$$\{x \in X : (x, y) \in \rho,\ y \in Y\},$$

is called the domain of $\rho$. The subset of $Y$,

$$\{y \in Y : (x, y) \in \rho,\ x \in X\},$$

is called the range of $\rho$.

Now let $\rho$ be a relation from $X$ to $Y$. Then, clearly, the set $\rho^{-1} \subset Y \times X$, defined by

$$\rho^{-1} = \{(y, x) \in Y \times X : (x, y) \in \rho \subset X \times Y\},$$

is a relation from $Y$ to $X$. The relation $\rho^{-1}$ is called the inverse relation of $\rho$. Note that whereas the inverse of a function does not always exist, the inverse of a relation does always exist.

Next, we consider equivalence relations. Let $\rho$ denote a relation in $X$; i.e., $\rho \subset X \times X$. Then for any $x, y \in X$, either $(x, y) \in \rho$ or $(x, y) \notin \rho$, but not both. If $(x, y) \in \rho$, then we write $x \,\rho\, y$, and if $(x, y) \notin \rho$, we write $x \not\rho\; y$.
1.3.5. Definition. Let $\rho$ be a relation in $X$.
(i) If $x \,\rho\, x$ for all $x \in X$, then $\rho$ is said to be reflexive;
(ii) if $x \,\rho\, y$ implies $y \,\rho\, x$ for all $x, y \in X$, then $\rho$ is said to be symmetric; and
(iii) if for all $x, y, z \in X$, $x \,\rho\, y$ and $y \,\rho\, z$ implies $x \,\rho\, z$, then $\rho$ is said to be transitive.

1.3.6. Example. Let $R$ denote the set of real numbers. The relation in $R$ given by $\{(x, y) : x < y\}$ is transitive but not reflexive and not symmetric. The relation in $R$ given by $\{(x, y) : x \neq y\}$ is symmetric but not reflexive and not transitive. ■
1.3.7. Example. Let $\rho$ be the relation in $\mathcal{P}(X)$ defined by $\rho = \{(A, B) : A \subset B\}$. That is, $A \,\rho\, B$ if and only if $A \subset B$. Then $\rho$ is reflexive and transitive but not symmetric. ■

In the following, we use the symbol $\sim$ to denote a relation in $X$. If $(x, y) \in\ \sim$, then we write, as before, $x \sim y$.

1.3.8. Definition. Let $\sim$ be a relation in $X$. Then $\sim$ is said to be an equivalence relation in $X$ if $\sim$ is reflexive, symmetric, and transitive. If $\sim$ is an equivalence relation and if $x \sim y$, we say that $x$ is equivalent to $y$.

In particular, the equivalence relation in $X$ characterized by the statement "$x \sim y$ if and only if $x = y$" is called the equals relation in $X$ or the identity relation in $X$.

1.3.9. Example. Let $X$ be a finite set, and let $A, B, C \in \mathcal{P}(X)$. Let $\sim$ on $\mathcal{P}(X)$ be defined by saying that $A \sim B$ if and only if $A$ and $B$ have the same number of elements. Clearly $A \sim A$. Also, if $A \sim B$ then $B \sim A$. Furthermore, if $A \sim B$ and $B \sim C$, then $A \sim C$. Hence, $\sim$ is reflexive, symmetric, and transitive. Therefore, $\sim$ is an equivalence relation in $\mathcal{P}(X)$. ■

1.3.10. Example. Let $R^2 = R \times R$, the real plane. Let $X$ be the family of all triangles in $R^2$. Then each of the following statements can be used to define an equivalence relation in $X$: "is similar to," "is congruent to," "has the same area as," and "has the same perimeter as." ■
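Since a relation in a finite set is just a set of ordered pairs, the three properties of Definition 1.3.5 can be tested by direct enumeration. The sketch below mirrors Examples 1.3.6, 1.3.7, and 1.3.9 on a three-element set; restricting the real-number examples to $\{1, 2, 3\}$ is an assumption made here so that the relations are finite, and all function names are illustrative.

```python
from itertools import combinations


def powerset(S):
    """All subsets of S, as frozensets."""
    items = sorted(S)
    return [
        frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)
    ]


def is_reflexive(rel, domain):
    return all((x, x) in rel for x in domain)


def is_symmetric(rel):
    return all((y, x) in rel for (x, y) in rel)


def is_transitive(rel):
    return all(
        (x, w) in rel for (x, y) in rel for (z, w) in rel if y == z
    )


def is_equivalence(rel, domain):
    return is_reflexive(rel, domain) and is_symmetric(rel) and is_transitive(rel)


X = {1, 2, 3}
P = powerset(X)

same_size = {(A, B) for A in P for B in P if len(A) == len(B)}   # Example 1.3.9
inclusion = {(A, B) for A in P for B in P if A <= B}             # Example 1.3.7
less_than = {(x, y) for x in X for y in X if x < y}              # cf. Example 1.3.6
not_equal = {(x, y) for x in X for y in X if x != y}             # cf. Example 1.3.6

print(is_equivalence(same_size, P), is_equivalence(inclusion, P))
```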
1.4. OPERATIONS ON SETS
In the present section we introduce the concept of operation on a set, and we consider some of the properties of operations. Throughout this section, $X$ denotes a non-void set.

1.4.1. Definition. A binary operation on $X$ is a mapping of $X \times X$ into $X$. A ternary operation on $X$ is a mapping of $X \times X \times X$ into $X$.

We could proceed in an obvious manner and define an $n$-ary operation on $X$. Since our primary concern in this book will be with binary operations, we will henceforth simply say "an operation on $X$" when we actually mean a binary operation on $X$. If $\alpha: X \times X \to X$ is an operation, then we usually use the notation

$$\alpha(x, y) \triangleq x \,\alpha\, y.$$
1.4.2. Example. Let $R$ denote the real numbers. Let $f: R \times R \to R$ be given by $f(x, y) = x + y$ for all $x, y \in R$, where $x + y$ denotes the customary sum of $x$ plus $y$ (i.e., $+$ denotes the usual operation of addition of real numbers). Then $f$ is clearly an operation on $R$, in the sense of Definition 1.4.1. We could just as well have defined "$+$" as being the operation on $R$, i.e., $+: R \times R \to R$, where $+(x, y) \triangleq x + y$. Similarly, the ordinary rules of subtraction and multiplication on $R$, "$-$" and "$\cdot$", respectively, are also operations on $R$. Notice that division, $\div$, is not an operation on $R$, because $x \div y$ is not defined for all $y \in R$ (i.e., $x \div y$ is not defined for $y = 0$). However, if we let $R^{\#} = R - \{0\}$, then "$\div$" is an operation on $R^{\#}$. ■
1.4.3. Exercise. Show that if $A$ is a set consisting of $n$ distinct elements, then there exist exactly $n^{(n^2)}$ distinct operations on $A$.

1.4.4. Example. Let $A = \{a, b\}$. An example of an operation on $A$ is the mapping $\alpha: A \times A \to A$ defined by

$$\alpha(a, a) \triangleq a \,\alpha\, a = a, \quad \alpha(a, b) \triangleq a \,\alpha\, b = b, \quad \alpha(b, a) \triangleq b \,\alpha\, a = b, \quad \alpha(b, b) \triangleq b \,\alpha\, b = a.$$

It is convenient to utilize the following operation table to define $\alpha$:

$$\begin{array}{c|cc} \alpha & a & b \\ \hline a & a & b \\ b & b & a \end{array} \tag{1.4.5}$$

If, in general, $\alpha$ is an operation on an arbitrary finite set $A$, or sometimes even on a countably infinite set $A$, then we can construct an operation table as follows:

$$\begin{array}{c|c} \alpha & y \\ \hline x & x \,\alpha\, y \end{array}$$

If $A = \{a, b\}$, as at the beginning of this example, then in addition to $\alpha$ given in (1.4.5), we can define, for example, the operations $\beta$, $\gamma$, $\delta$, … on $A$ by means of operation tables.

(Operation tables for $\beta$, $\gamma$, $\delta$, … on $A$.) ■
We now consider operations with important special properties.

1.4.6. Definition. An operation $\alpha$ on $X$ is said to be commutative if $x \,\alpha\, y = y \,\alpha\, x$ for all $x, y \in X$.

1.4.7. Definition. An operation $\alpha$ on $X$ is said to be associative if $(x \,\alpha\, y) \,\alpha\, z = x \,\alpha\, (y \,\alpha\, z)$ for all $x, y, z \in X$.

In the case of the real numbers $R$, the operations of addition and multiplication are both associative and commutative. The operation of subtraction is neither associative nor commutative.

1.4.8. Definition. If $\alpha$ and $\beta$ are operations on $X$ (not necessarily distinct), then
(i) $\alpha$ is said to be left distributive over $\beta$ if
$$x \,\alpha\, (y \,\beta\, z) = (x \,\alpha\, y) \,\beta\, (x \,\alpha\, z)$$
for every $x, y, z \in X$;
(ii) $\alpha$ is said to be right distributive over $\beta$ if
$$(x \,\beta\, y) \,\alpha\, z = (x \,\alpha\, z) \,\beta\, (y \,\alpha\, z)$$
for every $x, y, z \in X$; and
(iii) $\alpha$ is said to be distributive over $\beta$ if $\alpha$ is both left and right distributive over $\beta$.

In Example 1.4.4, $\alpha$ is the only commutative operation. The operation $\beta$ of Example 1.4.4 is not associative. The operations $\alpha$, $\gamma$, and $\delta$ of this example are associative. In this example, $\gamma$ is distributive over $\delta$ and $\delta$ is distributive over $\gamma$.

In the case of the real numbers $R$, multiplication, "$\cdot$", is distributive over addition, "$+$". The converse is not true.
1.4.9. Definition. If $\alpha$ is an operation on $X$, and if $X_1$ is a subset of $X$, then $X_1$ is said to be closed relative to $\alpha$ if for every $x, y \in X_1$, $x \,\alpha\, y \in X_1$.

Clearly, every set is closed with respect to an operation on it. The set of all integers $Z$, which is a subset of the real numbers $R$, is closed with respect to the operations of addition and multiplication defined on $R$. The even integers are also closed with respect to both of these operations, whereas the odd integers are not a closed set relative to addition.
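An operation table on a finite set can be stored as a dictionary keyed by ordered pairs, which turns Definitions 1.4.6 through 1.4.9 into short exhaustive checks and lets the count of Exercise 1.4.3 be observed for $n = 2$. The sketch below uses the operation $\alpha$ of (1.4.5); the operation `mu` and the residues modulo 12 are hypothetical stand-ins introduced here (the latter as a finite substitute for the even and odd integers), and all function names are illustrative.

```python
from itertools import product

A = ("a", "b")

# The operation alpha of (1.4.5), stored as an operation table.
alpha = {("a", "a"): "a", ("a", "b"): "b",
         ("b", "a"): "b", ("b", "b"): "a"}

# Hypothetical second operation: b acts like 1 and a like 0 under multiplication.
mu = {("a", "a"): "a", ("a", "b"): "a",
      ("b", "a"): "a", ("b", "b"): "b"}


def is_commutative(op, S):
    return all(op[x, y] == op[y, x] for x in S for y in S)


def is_associative(op, S):
    return all(op[op[x, y], z] == op[x, op[y, z]]
               for x in S for y in S for z in S)


def is_left_distributive(op, over, S):
    """x op (y over z) == (x op y) over (x op z) for all x, y, z in S."""
    return all(op[x, over[y, z]] == over[op[x, y], op[x, z]]
               for x in S for y in S for z in S)


def is_closed(subset, op):
    """subset is closed relative to the two-argument function op."""
    return all(op(x, y) in subset for x in subset for y in subset)


# Every operation on A is one way of filling the n * n cells of the table.
cells = list(product(A, repeat=2))
all_operations = [dict(zip(cells, values))
                  for values in product(A, repeat=len(cells))]

# Finite stand-in for the integers: residues modulo 12.
evens = set(range(0, 12, 2))
odds = set(range(1, 12, 2))


def add12(x, y):
    return (x + y) % 12


print(len(all_operations), is_commutative(alpha, A), is_associative(alpha, A))
print(is_closed(evens, add12), is_closed(odds, add12))
```

With $n = 2$ the list `all_operations` has $2^{(2^2)} = 16$ entries, in agreement with Exercise 1.4.3.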
1.4.10. Definition. If a subset $X_1$ of $X$ is closed relative to an operation $\alpha$ on $X$, then the operation $\alpha'$ on $X_1$ defined by

$$\alpha'(x, y) = x \,\alpha'\, y = x \,\alpha\, y$$

for all $x, y \in X_1$ is called the operation on $X_1$ induced by $\alpha$.

If $X_1 = X$, then $\alpha' = \alpha$. If $X_1 \subset X$ but $X_1 \neq X$, then $\alpha' \neq \alpha$, since $\alpha'$ and $\alpha$ are operations on different sets, namely $X_1$ and $X$, respectively. In general, an induced operation $\alpha'$ differs from its predecessor $\alpha$; however, it does inherit the essential properties which $\alpha$ possesses, as shown in the following result.

1.4.11. Theorem. Let $\alpha$ be an operation on $X$, let $X_1 \subset X$, where $X_1$ is closed relative to $\alpha$, and let $\alpha'$ be the operation on $X_1$ induced by $\alpha$. Then
(i) if $\alpha$ is commutative, then $\alpha'$ is commutative;
(ii) if $\alpha$ is associative, then $\alpha'$ is associative; and
(iii) if $\beta$ is an operation on $X$ and $X_1$ is closed relative to $\beta$, and if $\alpha$ is left (right) distributive over $\beta$, then $\alpha'$ is left (right) distributive over $\beta'$, where $\beta'$ is the operation on $X_1$ induced by $\beta$.
1.4.12. Exercise. Prove Theorem 1.4.11.

The operation $\alpha'$ on a subset $X_1$ induced by an operation $\alpha$ on $X$ will frequently be denoted by $\alpha$, and we will refer to $\alpha$ as an operation on $X_1$. In such cases one must keep in mind that we are actually referring to the induced operation $\alpha'$ and not to $\alpha$.

1.4.13. Definition. Let $X_1$ be a subset of $X$. An operation $\bar{\alpha}$ on $X$ is called an extension of an operation $\alpha$ on $X_1$ if $X_1$ is closed relative to $\bar{\alpha}$ and if $\alpha$ is equal to the operation on $X_1$ induced by $\bar{\alpha}$.

A given operation $\alpha$ on a subset $X_1$ of a set $X$ may, in general, have many different extensions.
1.4.14.
Example. and a. and
on X l
« a b C
Let
a. on X
a b C a C b b b a C
a C
Xl
as
a.
= a{ ,
of a set X may, in general, have many
b, c}, and let X
a b C a a C b b C b a C b a C d C d a e d C a
= a{ ,
b
a b C d e a C b d e C b a e d
C
b
a
C
d
d
C
b a e
e
d
a
C
e
f1.
e d d e e d b e
a
d
b e
b, c, d, e}. Define «
d
e
b e
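Clearly, $\alpha$ is an operation on $X_1$, and $\bar{\alpha}$ and $\bar{\beta}$ are operations on $X$. Moreover, both $\bar{\alpha}$ and $\bar{\beta}$ ($\bar{\alpha} \neq \bar{\beta}$) are extensions of $\alpha$. Also, $\alpha$ may be viewed as being induced by $\bar{\alpha}$ and $\bar{\beta}$. ■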
1.5. MATHEMATICAL SYSTEMS CONSIDERED IN THIS BOOK
We will concern ourselves with several different types of mathematical systems in the subsequent chapters. Although it is possible to give an abstract definition of the term mathematical system, we will not do so. Instead, we will briefly indicate which types of mathematical systems we shall consider in this book.

1. In Chapter 2 we will begin by considering mathematical systems which are made up of an underlying set $X$ and an operation $\alpha$ defined on $X$. We will identify such systems by writing $\{X; \alpha\}$. We will be able to characterize a system $\{X; \alpha\}$ according to certain properties which $X$ and $\alpha$ possess. Two important cases of such systems that we will consider are semigroups and groups.

In Chapter 2 we will also consider mathematical systems consisting of a basic set $X$ and two operations, say $\alpha$ and $\beta$, defined on $X$, where a special relation exists between $\alpha$ and $\beta$. We will identify such systems by writing $\{X; \alpha, \beta\}$. Included among the mathematical systems of this kind which we will consider are rings and fields.

In Chapter 2 we will also consider composite mathematical systems. Such systems are endowed with two underlying sets, say $X$ and $F$, and possess a much more complex (algebraic) structure than semigroups, groups, rings, and fields. Composite systems which we will consider include modules, vector spaces over a field $F$ (which are also called linear spaces), and algebras.

In Chapter 2 we will also study various types of important mappings (e.g., homomorphisms and isomorphisms) defined on semigroups, groups, rings, etc. Mathematical systems of the type considered in Chapter 2 are sometimes called algebraic systems.

2. In Chapters 3 and 4 we will study in some detail vector spaces and special types of mappings on vector spaces, called linear transformations. An important class of linear transformations can be represented by matrices, which we will consider in Chapter 4. In this chapter we will also study in some detail important vector spaces, called Euclidean spaces.

3. Most of Chapter 5 is devoted to mathematical systems consisting of a basic set $X$ and a function $\rho: X \times X \to R$ ($R$ denotes the real numbers), where $\rho$ possesses certain properties (namely, the properties of distance between points or elements in $X$). The function $\rho$ is called a metric (or a distance function), and the pair $\{X; \rho\}$ is called a metric space.

In Chapter 5 we will also consider mathematical systems consisting of a basic set $X$ and a family of subsets of $X$ (called open sets) denoted by $\mathfrak{J}$. The pair $\{X; \mathfrak{J}\}$ is called a topological space. It turns out that all metric spaces are in a certain sense topological spaces. We will also study functions and their properties on metric (topological) spaces in Chapter 5.

4. In Chapters 6 and 7 we will consider normed linear spaces, inner product spaces, and an important class of functions (linear operators) defined on such spaces. A normed linear space is a mathematical system consisting of a vector space $X$ and a real-valued function denoted by $\|\cdot\|$, which takes elements of $X$ into $R$ and which possesses the properties which characterize the "length" of a vector. We will denote normed spaces by $\{X; \|\cdot\|\}$.

An inner product space consists of a vector space $X$ (over the field of real numbers $R$ or over the field of complex numbers $C$) and a function $(\cdot, \cdot)$, which takes elements from $X \times X$ into $R$ (or into $C$) and possesses certain properties which allow us to introduce, among other items, the concept of orthogonality. We will identify such mathematical systems by writing $\{X; (\cdot, \cdot)\}$.

It turns out that in a certain sense all inner product spaces are normed linear spaces, that all normed linear spaces are metric spaces, and, as indicated before, that all metric spaces are topological spaces. Since normed linear spaces and inner product spaces are also vector spaces, it should be clear that, in the case of such spaces, properties of algebraic systems (called algebraic structure) and properties of topological systems (called topological structure) are combined.

A class of normed linear spaces which are very important are Banach spaces, and among the more important inner product spaces are Hilbert spaces. Such spaces will be considered in some detail in Chapter 6. Also, in Chapter 7, linear transformations defined on Banach and Hilbert spaces will be considered.

5. Applications are considered at the ends of Chapters 4, 5, and 7.
1.6. REFERENCES AND NOTES
A classic reference on set theory is the book by Hausdorff [1.5]. The many excellent references on the present topics include the elegant text by Hanneken [1.4], the standard reference by Halmos [1.3], as well as the books by Gleason [1.1] and Goldstein and Rosenbaum [1.2].
REFERENCES

[1.1] A. M. GLEASON, Fundamentals of Abstract Analysis. Reading, Mass.: Addison-Wesley Publishing Co., Inc., 1966.
[1.2] M. E. GOLDSTEIN and B. M. ROSENBAUM, "Introduction to Abstract Analysis," National Aeronautics and Space Administration, Report No. SP-203, Washington, D.C., 1969.
[1.3] P. R. HALMOS, Naive Set Theory. Princeton, N.J.: D. Van Nostrand Company, Inc., 1960.
[1.4] C. B. HANNEKEN, Introduction to Abstract Algebra. Belmont, Calif.: Dickenson Publishing Co., Inc., 1968.
[1.5] F. HAUSDORFF, Mengenlehre. New York: Dover Publications, Inc., 1944.
2

ALGEBRAIC STRUCTURES
The subject matter of the previous chapter is concerned with set theoretic structure. We emphasized essential elements of set theory and introduced related concepts such as mappings, operations, and relations. In the present chapter we concern ourselves with algebraic structure. The material of this chapter usually falls under the heading of abstract algebra or modern algebra. In the next two chapters we will continue our investigation of algebraic structure. The topics of those chapters usually go under the heading of linear algebra.

This chapter is divided into three parts. The first section is concerned with some basic algebraic structures, including semigroups, groups, rings, fields, modules, vector spaces, and algebras. In the second section we study properties of special important mappings on the above structures, including homomorphisms, isomorphisms, endomorphisms, and automorphisms of semigroups, groups, and rings. Because of their importance in many areas of mathematics, as well as in applications, polynomials are considered in the third section. Some appropriate references for further reading are suggested at the end of the chapter.

The subject matter of the present chapter is widely used in pure as well as in applied mathematics, and it has found applications in diverse areas, such as modern physics, automata theory, systems engineering, information theory, graph theory, and the like.
Our presentation of modern algebra is by necessity very brief. However, mastery of the topics covered in the present chapter will provide the reader with the foundation required to make contact with the literature in applications, and it will enable the interested reader to pursue this subject further at a more advanced level.
2.1. SOME BASIC STRUCTURES OF ALGEBRA
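We begin by developing some of the more important properties of mathematical systems, $\{X; \alpha\}$, where $\alpha$ is an operation on a non-void set $X$.

2.1.1. Definition. Let $\alpha$ be an operation on $X$. If for all $x, y, z \in X$, $x \alpha y = x \alpha z$ implies that $y = z$, then we say that $\{X; \alpha\}$ possesses the left cancellation property. If $x \alpha y = z \alpha y$ implies that $x = z$, then $\{X; \alpha\}$ is said to possess the right cancellation property. If $\{X; \alpha\}$ possesses both the left and right cancellation properties, then we say that the cancellation laws hold in $\{X; \alpha\}$.

In the following exercise, some specific cases are given.

2.1.2. Exercise. Let $X = \{x, y\}$ and let $\alpha$, $\beta$, $\gamma$, and $\delta$ be defined as

$$
\begin{array}{c|cc} \alpha & x & y \\ \hline x & x & y \\ y & y & x \end{array}
\qquad
\begin{array}{c|cc} \beta & x & y \\ \hline x & x & x \\ y & y & x \end{array}
\qquad
\begin{array}{c|cc} \gamma & x & y \\ \hline x & x & y \\ y & x & y \end{array}
\qquad
\begin{array}{c|cc} \delta & x & y \\ \hline x & x & x \\ y & y & y \end{array}
$$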
Show that (i) $\{X; \beta\}$ possesses neither the right nor the left cancellation property; (ii) $\{X; \gamma\}$ possesses the left cancellation property but not the right cancellation property; (iii) $\{X; \delta\}$ possesses the right cancellation property but not the left cancellation property; and (iv) $\{X; \alpha\}$ possesses both the left and the right cancellation property.

In an arbitrary mathematical system $\{X; \alpha\}$ there are sometimes special elements in $X$ which possess important properties relative to the operation $\alpha$. We have:

2.1.3. Definition. Let $\alpha$ be an operation on a set $X$ and let $X$ contain an element $e_r$ such that

$$x \alpha e_r = x$$

for all $x \in X$. We call $e_r$ a right identity element of $X$ relative to $\alpha$, or simply a right identity of the system $\{X; \alpha\}$. If $X$ contains an element $e_l$ which satisfies the condition

$$e_l \alpha x = x$$

for all $x \in X$, then $e_l$ is called a left identity element of $X$ relative to $\alpha$, or simply a left identity of the system $\{X; \alpha\}$.

We note that a system $\{X; \alpha\}$ may contain more than one right identity element of $X$ (e.g., system $\{X; \delta\}$ of Exercise 2.1.2) or left identity element of $X$ (e.g., system $\{X; \gamma\}$ of Exercise 2.1.2).
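The cancellation properties and the one-sided identities of Exercise 2.1.2 can be checked mechanically. The sketch below assumes that the four operation tables read as transcribed in `TABLES` (with `TABLES[name][a][b]` standing for $a$ operated with $b$); the function names are hypothetical and are introduced only for this illustration.

```python
X = ("x", "y")

# TABLES[name][a][b] is the result of "a op b" (assumed transcription of Exercise 2.1.2).
TABLES = {
    "alpha": {"x": {"x": "x", "y": "y"}, "y": {"x": "y", "y": "x"}},
    "beta":  {"x": {"x": "x", "y": "x"}, "y": {"x": "y", "y": "x"}},
    "gamma": {"x": {"x": "x", "y": "y"}, "y": {"x": "x", "y": "y"}},
    "delta": {"x": {"x": "x", "y": "x"}, "y": {"x": "y", "y": "y"}},
}

def left_cancellation(t):
    """x*y == x*z implies y == z, i.e. no row repeats an entry."""
    return all(t[x][y] != t[x][z] for x in X for y in X for z in X if y != z)

def right_cancellation(t):
    """x*y == z*y implies x == z, i.e. no column repeats an entry."""
    return all(t[x][y] != t[z][y] for x in X for y in X for z in X if x != z)

def right_identities(t):
    """All e with x*e == x for every x."""
    return [e for e in X if all(t[x][e] == x for x in X)]

def left_identities(t):
    """All e with e*x == x for every x."""
    return [e for e in X if all(t[e][x] == x for x in X)]

for name, table in TABLES.items():
    print(name,
          "left-cancel:", left_cancellation(table),
          "right-cancel:", right_cancellation(table),
          "left ids:", left_identities(table),
          "right ids:", right_identities(table))
```

Under this reading, $\gamma$ has two left identities and no right identity, while $\delta$ has two right identities and no left identity, which is the situation described in the remark above.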
2.1.4. Definition. An element $e$ of a set $X$ is called an identity element of $X$ relative to an operation $\alpha$ on $X$ if

$$e \alpha x = x \alpha e = x$$

for every $x \in X$.
2.1.5. Exercise. Let $X = \{0, 1\}$ and define the operations "$+$" and "$\cdot$" by means of operation tables. (The two operation tables are not legible in this copy.) Does either $\{X; +\}$ or $\{X; \cdot\}$ have an identity element?
Identity elements have the following properties.
2.1.6. Theorem. Let $\alpha$ be an operation on $X$.
(i) If $\{X; \alpha\}$ has an identity element $e$, then $e$ is unique.
(ii) If $\{X; \alpha\}$ has a right identity $e_r$ and a left identity $e_l$, then $e_r = e_l$.
(iii) If $\alpha$ is a commutative operation and if $\{X; \alpha\}$ has a right identity element $e_r$, then $e_r$ is also a left identity.

Proof. To prove the first part, let $e'$ and $e''$ be identity elements of $\{X; \alpha\}$. Then $e' \alpha e'' = e'$ and $e' \alpha e'' = e''$. Hence, $e' = e''$.

To prove the second part, note that since $e_r$ is a right identity, $e_l \alpha e_r = e_l$. Also, since $e_l$ is a left identity, $e_l \alpha e_r = e_r$. Thus, $e_l = e_r$.

To prove the last part, note that for all $x \in X$ we have $x = x \alpha e_r = e_r \alpha x$. ■
In summary, if $\{X; \alpha\}$ has an identity element, then that element is unique. Furthermore, if $\{X; \alpha\}$ has both a right identity and a left identity element, then these elements are equal, and in fact they are equal to the unique identity element. Also, if $\{X; \alpha\}$ has a right (or left) identity element and $\alpha$ is a commutative operation, then $\{X; \alpha\}$ has an identity element.
2.1.7. Definition. Let $\alpha$ be an operation on $X$ and let $e$ be an identity of $X$ relative to $\alpha$. If $x \in X$, then $x' \in X$ is called a right inverse of $x$ relative to $\alpha$ provided that

$$x \alpha x' = e.$$

An element $x'' \in X$ is called a left inverse of $x$ relative to $\alpha$ if

$$x'' \alpha x = e.$$
The following exercise shows that some elements may not possess any right or left inverses. Some other elements may possess several inverses of one kind and none of the other, and other elements may possess a number of inverses of both kinds.

2.1.8. Exercise. Let $X = \{x, y, u, v\}$ and define $\alpha$ by means of an operation table. (The operation table is not legible in this copy.)
(i) Show that $\{X; \alpha\}$ contains an identity element.
(ii) Which elements possess neither left inverses nor right inverses?
(iii) Which element has a left and a right inverse?

A. Semigroups and Groups
Of crucial importance are mathematical systems called semigroups. Such mathematical systems serve as the natural setting for many important results in algebra and are used in several diverse areas of applications (e.g., qualitative analysis of dynamical systems, automata theory, etc.).
2.1.9. Definition. Let $\alpha$ be an operation on $X$. We call $\{X; \alpha\}$ a semigroup if $\alpha$ is an associative operation on $X$.
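Now let $x, y, z \in X$, and let $\alpha$ be an associative operation on $X$. Then $x \alpha (y \alpha z) = (x \alpha y) \alpha z = u \in X$. Henceforth, we will often simply write $u = x \alpha y \alpha z$. As a result of this convention we see that for $x, y, u, v \in X$,

$$
\begin{aligned}
x \alpha y \alpha u \alpha v &= x \alpha (y \alpha u) \alpha v = x \alpha y \alpha (u \alpha v) \\
&= (x \alpha y) \alpha (u \alpha v) = (x \alpha y) \alpha u \alpha v.
\end{aligned}
\tag{2.1.10}
$$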
As a generalization of the above we have the so-called generalized associative law, which asserts that if $x_1, x_2, \ldots, x_n$ are elements of a semigroup $\{X; \alpha\}$, then any two products, each involving these elements in a particular order, are equal. This allows us to simply write $x_1 \alpha x_2 \alpha \cdots \alpha x_n$.
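The generalized associative law can be illustrated by forming every possible bracketing of a product and collecting the results. The sketch below is only an illustration; the function `all_bracketings` and the two sample operations are hypothetical choices made here. String concatenation is associative (but not commutative), so all bracketings agree; subtraction of integers is not associative, so they do not.

```python
def all_bracketings(items, op):
    """Return the set of values obtained from every way of parenthesizing
    the product of `items`, keeping the elements in the given order."""
    if len(items) == 1:
        return {items[0]}
    results = set()
    for split in range(1, len(items)):
        for left in all_bracketings(items[:split], op):
            for right in all_bracketings(items[split:], op):
                results.add(op(left, right))
    return results

def concat(s, t):
    # Associative, non-commutative operation on strings.
    return s + t

def subtract(a, b):
    # Not associative: (8 - 4) - 2 differs from 8 - (4 - 2).
    return a - b

print(all_bracketings(("ab", "c", "d", "ef"), concat))
print(all_bracketings((8, 4, 2), subtract))
```

For the associative operation the set of results has exactly one element, which is what justifies writing the product without parentheses.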
In view of Theorem 2.1.6, part (i), if a semigroup has an identity element, then such an element is unique. We give a special name to such a semigroup.

2.1.11. Definition. A semigroup $\{X; \alpha\}$ is called a monoid if $X$ contains an identity element relative to $\alpha$.

Henceforth, the unique identity element of a monoid $\{X; \alpha\}$ will be denoted by $e$. Subsequently, we frequently single out elements of monoids which possess inverses.

2.1.12. Definition. Let $\{X; \alpha\}$ be a monoid. If $x \in X$ possesses a right inverse $x' \in X$, then $x$ is called a right invertible element in $X$. If $x \in X$ possesses a left inverse $x'' \in X$, then $x$ is called a left invertible element in $X$. If $x \in X$ is both right invertible and left invertible in $X$, then we say that $x$ is an invertible element or a unit of $X$.

Clearly, if $e \in X$, then $e$ is an invertible element.
2.1.13. Theorem. Let $\{X; \alpha\}$ be a monoid, and let $x \in X$. If there exists a left inverse of $x$, say $x'$, and a right inverse of $x$, say $x''$, then $x' = x''$ and $x'$ is unique.

Proof. Since $\alpha$ is associative, we have $(x' \alpha x) \alpha x'' = e \alpha x'' = x''$ and $x' \alpha (x \alpha x'') = x' \alpha e = x'$. Thus, $x' = x''$. Now suppose there is another left inverse of $x$, say $x'''$. Then, by the same argument, $x''' = x''$ and therefore $x''' = x'$. ■
Theorem 2.1.13 does, in general, not hold for arbitrary mathematical systems $\{X; \alpha\}$ with identity, as is evident from the following:

2.1.14. Exercise. Let $X = \{u, v, x, y\}$ and define $\alpha$ by means of an operation table. (The operation table is not legible in this copy.) Use this operation table to demonstrate that Theorem 2.1.13 does not, in general, hold if monoid $\{X; \alpha\}$ is replaced by system $\{X; \alpha\}$ with identity.

By Theorem 2.1.13, any invertible element of a monoid possesses a unique right inverse and a unique left inverse, and moreover these inverses are equal. This gives rise to the following.
2.1.15. Definition. Let $\{X; \alpha\}$ be a monoid. If $x \in X$ has a left inverse and a right inverse, $x'$ and $x''$, respectively, then this unique element $x' = x''$ is called the inverse of $x$ and is denoted by $x^{-1}$.

Concerning inverses we have:

2.1.16. Theorem. Let $\{X; \alpha\}$ be a monoid.
(i) If $x \in X$ has an inverse, $x^{-1}$, then $x^{-1}$ has an inverse $(x^{-1})^{-1} = x$.
(ii) If $x, y \in X$ have inverses $x^{-1}$, $y^{-1}$, respectively, then $x \alpha y$ has an inverse, and moreover $(x \alpha y)^{-1} = y^{-1} \alpha x^{-1}$.
(iii) The identity element $e \in X$ has an inverse $e^{-1}$ and $e^{-1} = e$.
Proof. To prove the first part, note that $x \alpha x^{-1} = e$ and $x^{-1} \alpha x = e$. Thus, $x$ is both a left and a right inverse of $x^{-1}$ and $(x^{-1})^{-1} = x$.

To prove the second part, note that

$$(x \alpha y) \alpha (y^{-1} \alpha x^{-1}) = x \alpha (y \alpha y^{-1}) \alpha x^{-1} = e$$

and

$$(y^{-1} \alpha x^{-1}) \alpha (x \alpha y) = y^{-1} \alpha (x^{-1} \alpha x) \alpha y = e.$$

The third part of the theorem follows trivially from $e \alpha e = e$. ■
In the remainder of the present chapter we will often use the symbols "$+$" and "$\cdot$" to denote operations in place of $\alpha$, $\beta$, etc. We will call these "addition" and "multiplication." However, we strongly emphasize here that "$+$" and "$\cdot$" will, in general, not denote addition and multiplication of real numbers but, instead, arbitrary operations. In cases where there exists an identity element relative to "$+$", we will denote this element by "0" and call it "zero." If there exists an identity element relative to "$\cdot$", we will denote this element either by "1" or by $e$. Our usual notation for representing an identity relative to an arbitrary operation $\alpha$ will still be $e$.

If in a system $\{X; +\}$ an element $x \in X$ possesses an inverse, we will denote this element by $-x$ and we will call it "minus $x$." For example, if $\{X; +\}$ is a semigroup, then we denote the inverse of an invertible element $x \in X$ by $-x$, and in this case we have $x + (-x) = (-x) + x = 0$, and also $-(-x) = x$. Furthermore, if $x, y \in X$ are invertible elements, then the "sum" $x + y$ is also invertible, and $-(x + y) = (-y) + (-x)$. Note, however, that unless "$+$" is commutative, in general $-(x + y) \neq (-x) + (-y)$. Finally, if $x, y \in X$ and if $y$ is an invertible element, then $-y \in X$. In this case we often will simply write $x + (-y) = x - y$.
2.1.17. Example. Let $X = \{0, 1, 2, 3\}$, and let the systems $\{X; +\}$ and $\{X; \cdot\}$ be defined by means of the operation tables

$$
\begin{array}{c|cccc}
+ & 0 & 1 & 2 & 3 \\ \hline
0 & 0 & 1 & 2 & 3 \\
1 & 1 & 2 & 3 & 0 \\
2 & 2 & 3 & 0 & 1 \\
3 & 3 & 0 & 1 & 2
\end{array}
\qquad
\begin{array}{c|cccc}
\cdot & 0 & 1 & 2 & 3 \\ \hline
0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 2 & 3 \\
2 & 0 & 2 & 0 & 2 \\
3 & 0 & 3 & 2 & 1
\end{array}
$$

The reader should readily show that the systems $\{X; +\}$ and $\{X; \cdot\}$ are monoids. In this case the operation "$+$" is called "addition mod 4" and "$\cdot$" is called "multiplication mod 4." ■
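The monoid properties claimed in Example 2.1.17 can be verified by exhaustive checking, since $X$ has only four elements. The following sketch uses hypothetical helper names (`is_associative`, `identity`, `units`) and Python's remainder operator to realize addition and multiplication mod 4.

```python
from itertools import product

X = range(4)

def add4(x, y):
    return (x + y) % 4

def mul4(x, y):
    return (x * y) % 4

def is_associative(op):
    """Check (x op y) op z == x op (y op z) for all triples in X."""
    return all(op(op(x, y), z) == op(x, op(y, z)) for x, y, z in product(X, repeat=3))

def identity(op):
    """Return the two-sided identity of op on X, or None if there is none."""
    for e in X:
        if all(op(e, x) == x and op(x, e) == x for x in X):
            return e
    return None

def units(op):
    """Return the elements of X that have a two-sided inverse."""
    e = identity(op)
    return [x for x in X if any(op(x, y) == e and op(y, x) == e for y in X)]

for name, op in (("+", add4), ("*", mul4)):
    print("table of", name, [[op(x, y) for y in X] for x in X])
    print("associative:", is_associative(op), "identity:", identity(op), "units:", units(op))
```

Both systems are monoids, but only in $\{X; +\}$ is every element invertible; under multiplication mod 4 the invertible elements are 1 and 3 only.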
The most important special type of semigroup that we will encounter in this chapter is the group.

2.1.18. Definition. A group is a monoid in which every element is invertible; i.e., a group is a semigroup, $\{X; \alpha\}$, with identity in which every element is invertible.

The set $R$ of real numbers with the operation of addition is an example of a group. The set of real numbers with the operation of multiplication does not form a group, since the number zero does not have an inverse relative to multiplication. However, the latter system is a monoid. If we let $R^{\#} = R - \{0\}$, then $\{R^{\#}; \cdot\}$ is a group.

Groups possess several important properties. Some of these are summarized in the next result.

2.1.19. Theorem. Let $\{X; \alpha\}$ be a group, and let $e$ denote the identity element of $X$ relative to $\alpha$. Let $x$ and $y$ be arbitrary elements in $X$. Then
(i) if $x \alpha x = x$, then $x = e$;
(ii) if $z \in X$ and $x \alpha y = x \alpha z$, then $y = z$;
(iii) if $z \in X$ and $x \alpha y = z \alpha y$, then $x = z$;
(iv) there exists a unique $w \in X$ such that

$$w \alpha x = y; \tag{2.1.20}$$

and
(v) there exists a unique $z \in X$ such that

$$x \alpha z = y. \tag{2.1.21}$$

Proof. To prove the first part, let $x \alpha x = x$. Then $x^{-1} \alpha (x \alpha x) = x^{-1} \alpha x$, and so $(x^{-1} \alpha x) \alpha x = e$. This implies that $x = e$.

To prove the second part, let $x \alpha y = x \alpha z$. Then $x^{-1} \alpha (x \alpha y) = x^{-1} \alpha (x \alpha z)$, and so $(x^{-1} \alpha x) \alpha y = (x^{-1} \alpha x) \alpha z$. This implies that $y = z$.

The proof of part (iii) is similar to that of part (ii).

To prove part (iv), let $w = y \alpha x^{-1}$. Then $w \alpha x = (y \alpha x^{-1}) \alpha x = y \alpha (x^{-1} \alpha x) = y$. To show that $w$ is unique, suppose there is a $v \in X$ such that $v \alpha x = y$. Then $w \alpha x = v \alpha x$. By part (iii), $w = v$.

The proof of the last part of the theorem is similar to the proof of part (iv). ■
In part (iv) of Theorem 2.1.19 the element $w$ is called the left solution of Eq. (2.1.20), and in part (v) of this theorem the element $z$ is called the right solution of Eq. (2.1.21).

We can classify groups in a variety of ways. Some of these classifications are as follows. Let $\{X; \alpha\}$ be a group. If the set $X$ possesses a finite number of elements, then we speak of a finite group. If the operation $\alpha$ is commutative, then we have a commutative group, also called an abelian group. If $\alpha$ is not commutative, then we speak of a non-commutative group or a non-abelian group. Also, by the order of a group we understand the order of the set $X$.

Now let $\{X; \alpha\}$ be a semigroup and let $X_1$ be a non-void subset of $X$ which is closed relative to $\alpha$. Then by Theorem 1.4.11, the operation $\alpha_1$ on $X_1$ induced by the associative operation $\alpha$ is also associative, and thus the mathematical system $\{X_1; \alpha_1\}$ is also a semigroup. The system $\{X_1; \alpha_1\}$ is called a subsystem of $\{X; \alpha\}$. This gives rise to the following concept.
2.1.22. Definition. Let $\{X; \alpha\}$ be a semigroup, let $X_1$ be a non-void subset of $X$ which is closed relative to $\alpha$, and let $\alpha_1$ be the operation on $X_1$ induced by $\alpha$. The semigroup $\{X_1; \alpha_1\}$ is called a subsemigroup of $\{X; \alpha\}$.

In order to simplify our notation, we will henceforth use the notation $\{X_1; \alpha\}$ to denote the subsemigroup $\{X_1; \alpha_1\}$ (i.e., we will suppress the subscript of $\alpha$). The following result allows us to generate subsemigroups in a variety of ways.
2.1.23. Theorem. Let $\{X; \alpha\}$ be a semigroup and let $X_i \subset X$ for all $i \in I$, where $I$ denotes some index set. Let $Y = \bigcap_{i \in I} X_i$. If $\{X_i; \alpha\}$ is a subsemigroup of $\{X; \alpha\}$ for every $i \in I$, and if $Y$ is not empty, then $\{Y; \alpha\}$ is a subsemigroup of $\{X; \alpha\}$.

Proof. Let $x, y \in Y$. Then $x, y \in X_i$ for all $i \in I$, and so $x \alpha y \in X_i$ for every $i$, and hence $x \alpha y \in Y$. This implies that $\{Y; \alpha\}$ is a subsemigroup. ■

Now let $W$ be any non-void subset of $X$, where $\{X; \alpha\}$ is a semigroup, and let

$$\mathcal{Y} = \{Y : W \subset Y \subset X \text{ and } \{Y; \alpha\} \text{ is a subsemigroup of } \{X; \alpha\}\}.$$

Then $\mathcal{Y}$ is non-empty, since $X \in \mathcal{Y}$. Also, let

$$G = \bigcap_{Y \in \mathcal{Y}} Y.$$

Then $W \subset G$, and by Theorem 2.1.23 $\{G; \alpha\}$ is a subsemigroup of $\{X; \alpha\}$. This subsemigroup is called the subsemigroup generated by $W$.
2.1.24. Theorem. Let $\{X; \alpha\}$ be a monoid with $e$ its identity element, and let $\{X_1; \alpha_1\}$ be a subsemigroup of $\{X; \alpha\}$. If $e \in X_1$, then $e$ is an identity element of $\{X_1; \alpha_1\}$ and $\{X_1; \alpha_1\}$ is a monoid.

2.1.25. Exercise. Prove Theorem 2.1.24.
Next we define subgroup.

2.1.26. Definition. Let $\{X; \alpha\}$ be a semigroup, and let $\{X_1; \alpha_1\}$ be a subsemigroup of $\{X; \alpha\}$. If $\{X_1; \alpha_1\}$ is a group, then $\{X_1; \alpha_1\}$ is called a subgroup of $\{X; \alpha\}$. We denote this subgroup by $\{X_1; \alpha\}$, and we say the set $X_1$ determines a subgroup of $\{X; \alpha\}$.

We consider a specific example in the following:

2.1.27. Exercise. Let $Z_6 = \{0, 1, 2, 3, 4, 5\}$ and define the operation "$+$" on $Z_6$ by means of the following operation table:

$$
\begin{array}{c|cccccc}
+ & 0 & 1 & 2 & 3 & 4 & 5 \\ \hline
0 & 0 & 1 & 2 & 3 & 4 & 5 \\
1 & 1 & 0 & 4 & 5 & 2 & 3 \\
2 & 2 & 5 & 0 & 4 & 3 & 1 \\
3 & 3 & 4 & 5 & 0 & 1 & 2 \\
4 & 4 & 3 & 1 & 2 & 5 & 0 \\
5 & 5 & 2 & 3 & 1 & 0 & 4
\end{array}
$$

(a) Show that $\{Z_6; +\}$ is a group.
(b) Let $K = \{0, 1\}$. Show that $\{K; +\}$ is a subgroup of $\{Z_6; +\}$.
(c) Are there any other subgroups of $\{Z_6; +\}$?

We have seen in Theorem 2.1.24 that if $e \in X_1 \subset X$, then it is also an identity of the subsemigroup $\{X_1; \alpha\}$. We can state something further.

2.1.28. Theorem. Let $\{X; \alpha\}$ be a group with identity element $e$, and let $\{X_1; \alpha\}$ be a subgroup of $\{X; \alpha\}$. Then $e_1$ is the identity element of $\{X_1; \alpha\}$ if and only if $e_1 = e$.
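For a finite system such as the one in Exercise 2.1.27, the group axioms can be checked by exhaustive enumeration. The sketch below assumes that the operation table reads as transcribed in `TABLE` (row index operated with column index); the helper `is_group` is a hypothetical name introduced here, and the brute-force search over subsets is feasible only because the set has six elements.

```python
from itertools import combinations, product

# TABLE[x][y] is "x + y" in the sense of Exercise 2.1.27 (assumed transcription).
TABLE = (
    (0, 1, 2, 3, 4, 5),
    (1, 0, 4, 5, 2, 3),
    (2, 5, 0, 4, 3, 1),
    (3, 4, 5, 0, 1, 2),
    (4, 3, 1, 2, 5, 0),
    (5, 2, 3, 1, 0, 4),
)
Z6 = tuple(range(6))

def op(x, y):
    return TABLE[x][y]

def is_group(elems):
    """Check closure, associativity, identity, and inverses on `elems`."""
    elems = tuple(elems)
    if not all(op(x, y) in elems for x, y in product(elems, repeat=2)):
        return False
    if not all(op(op(x, y), z) == op(x, op(y, z)) for x, y, z in product(elems, repeat=3)):
        return False
    ids = [e for e in elems if all(op(e, x) == x == op(x, e) for x in elems)]
    if not ids:
        return False
    e = ids[0]
    return all(any(op(x, y) == e == op(y, x) for y in elems) for x in elems)

print("Z6 is a group:", is_group(Z6))
print("K = {0, 1} is a subgroup:", is_group((0, 1)))
print("commutative:", all(op(x, y) == op(y, x) for x, y in product(Z6, repeat=2)))

# Every subset that forms a group under the same operation.
subgroups = [s for k in range(1, 7) for s in combinations(Z6, k) if is_group(s)]
print(subgroups)
```

Under this reading of the table the system is a non-abelian group (for example, $1 + 2 = 4$ while $2 + 1 = 5$), and every subgroup found has identity element 0, in agreement with Theorem 2.1.28.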
2.1.29. Exercise. Prove Theorem 2.1.28.
It should be noted that a semigroup $\{X; \alpha\}$ which has no identity element may contain a subgroup $\{X_1; \alpha\}$, since it is possible for a subsystem to possess an identity element while the original system may not possess an identity. If $\{X; \alpha\}$ is a semigroup with an identity element and if $\{X_1; \alpha\}$ is a subgroup, then the identity element of $X$ may or may not be the identity element of $X_1$. However, if $\{X; \alpha\}$ is a group, then the subgroup must satisfy the conditions given in the following:
2.1.30. Theorem. Let $\{X; \alpha\}$ be a group, and let $X_1$ be a non-empty subset of $X$. Then $\{X_1; \alpha\}$ is a subgroup if and only if
(i) $e \in X_1$;
(ii) for every $x \in X_1$, $x^{-1} \in X_1$; and
(iii) for every $x, y \in X_1$, $x \alpha y \in X_1$.

Proof. Assume that $\{X_1; \alpha\}$ is a subgroup. Then (i) follows from Theorem 2.1.28, and (ii) and (iii) follow from the definition of a group.

Conversely, assume that hypotheses (i), (ii), and (iii) hold. Condition (iii) implies that $X_1$ is closed relative to $\alpha$, and therefore $\{X_1; \alpha\}$ is a subsemigroup. Condition (i) along with Theorem 2.1.24 imply that $\{X_1; \alpha\}$ is a monoid, and condition (ii) implies that $\{X_1; \alpha\}$ is a group. ■

Analogous to Theorem 2.1.23 we have:

2.1.31. Theorem. Let $\{X; \alpha\}$ be a group, and let $X_i \subset X$ for all $i \in I$, where $I$ is some index set. Let $Y = \bigcap_{i \in I} X_i$. If $\{X_i; \alpha\}$ is a subgroup of $\{X; \alpha\}$ for every $i \in I$, then $\{Y; \alpha\}$ is a subgroup of $\{X; \alpha\}$.

Proof. Since $e \in X_i$ for every $i \in I$, it follows that $e \in Y$. Therefore, $Y$ is non-empty. Now let $y \in Y$. Then $y \in X_i$ for all $i \in I$, and thus $y^{-1} \in X_i$, so that $y^{-1} \in Y$. Since $y \in X$, it follows that $Y \subset X$. Also, for every $x, y \in Y$, $x, y \in X_i$ for every $i \in I$, and thus $x \alpha y \in X_i$ for every $i$, and hence $x \alpha y \in Y$. Therefore, we conclude from Theorem 2.1.30 that $\{Y; \alpha\}$ is a subgroup of $\{X; \alpha\}$. ■

A direct consequence of the above result is the following:

2.1.32. Corollary. Let $\{X; \alpha\}$ be a group, and let $\{X_1; \alpha\}$ and $\{X_2; \alpha\}$ be subgroups of $\{X; \alpha\}$. Let $X_3 = X_1 \cap X_2$. Then $\{X_3; \alpha\}$ is a subgroup of $\{X_1; \alpha\}$ and $\{X_2; \alpha\}$.

2.1.33. Exercise. Prove Corollary 2.1.32.
We can define a generated subgroup in a similar manner as was done in the case of semigroups. To this end let $W$ be any subset of $X$, where $\{X; \alpha\}$ is a group, and let

$$\mathcal{Y} = \{Y : W \subset Y \subset X \text{ and } \{Y; \alpha\} \text{ is a subgroup of } \{X; \alpha\}\}.$$

The set $\mathcal{Y}$ is clearly non-empty because $X \in \mathcal{Y}$. Now let

$$G = \bigcap_{Y \in \mathcal{Y}} Y.$$

Then $W \subset G$, and by Theorem 2.1.31 $\{G; \alpha\}$ is a subgroup of $\{X; \alpha\}$. This subgroup is called the subgroup generated by $W$.

2.1.34. Exercise. Let $W$ be defined as above. Show that if $\{W; \alpha\}$ is a subgroup of $\{X; \alpha\}$, then it is the subgroup generated by $W$.
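For a finite group the generated subgroup can be computed directly from the definition, as the intersection of all subgroups containing $W$, and compared with a simpler closure procedure. The sketch below assumes the operation table of Exercise 2.1.27 as transcribed in `TABLE`; the function names are hypothetical. The closure shortcut (repeatedly adjoining products, starting from $W$ together with the identity) is an assumption that is valid for finite groups, where every inverse is a power of the element; it is not the definition used in the text.

```python
from itertools import combinations

# Operation table of Exercise 2.1.27 (assumed transcription); identity element is 0.
TABLE = (
    (0, 1, 2, 3, 4, 5),
    (1, 0, 4, 5, 2, 3),
    (2, 5, 0, 4, 3, 1),
    (3, 4, 5, 0, 1, 2),
    (4, 3, 1, 2, 5, 0),
    (5, 2, 3, 1, 0, 4),
)
ELEMS = tuple(range(6))
E = 0

def op(x, y):
    return TABLE[x][y]

def is_subgroup(subset):
    """Criterion of Theorem 2.1.30: identity, inverses, and closure."""
    s = set(subset)
    has_identity = E in s
    has_inverses = all(any(op(x, y) == E and op(y, x) == E for y in s) for x in s)
    is_closed = all(op(x, y) in s for x in s for y in s)
    return has_identity and has_inverses and is_closed

def generated_by_intersection(w):
    """Intersect every subgroup Y with W contained in Y (the definition in the text)."""
    w = set(w)
    result = set(ELEMS)
    for size in range(1, len(ELEMS) + 1):
        for subset in combinations(ELEMS, size):
            if w <= set(subset) and is_subgroup(subset):
                result &= set(subset)
    return result

def generated_by_closure(w):
    """Adjoin products until nothing new appears (finite-group shortcut)."""
    current = set(w) | {E}
    while True:
        bigger = current | {op(x, y) for x in current for y in current}
        if bigger == current:
            return current
        current = bigger

for w in ((1,), (4,), (1, 2)):
    print(w, generated_by_intersection(w), generated_by_closure(w))
```

Both procedures agree on each sample $W$; for instance, the single element 4 generates the three-element subgroup $\{0, 4, 5\}$, while $\{1, 2\}$ generates the whole group.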
Let us now consider the following:

2.1.35. Example. Let $Z$ denote the set of integers, and let "$+$" denote the usual operation of addition of integers. Let $W = \{1\}$. If $Y$ is any subset of $Z$ such that $\{Y; +\}$ is a subgroup of $\{Z; +\}$ and $W \subset Y$, then $Y = Z$. To prove this statement, let $n$ be any positive integer. Since $Y$ is closed with respect to $+$, we must have $1 + 1 = 2 \in Y$. Similarly, we must have $1 + 1 + \cdots + 1 = n \in Y$. Also, $n^{-1} = -n$, and therefore all the negative integers are in $Y$. Also, $n - n = 0 \in Y$, i.e., $Y = Z$. Thus, $G = \bigcap_{Y \in \mathcal{Y}} Y = Z$, and so the group $\{Z; +\}$ is the subgroup generated by $\{1\}$. ■
The above is an example of a special class of generated subgroups, the so-called cyclic groups, which we will define after our next result.

2.1.36. Theorem. Let $Z$ denote the set of all integers, and let $\{X; \alpha\}$ be a group. Let $x \in X$ and define $x^k = x \alpha x \alpha \cdots \alpha x$ ($k$ times), for $k$ […]

[…] $a =$ […] $0$ such that $a_i = 0$ for all $i > n$. Now let $b$ be another element of $P$, where

$$b = \{b_0, b_1, \ldots, b_m, 0, 0, \ldots\}.$$

We say that $a = b$ if and only if $a_i = b_i$ for all $i$. We now define the operation "$+$" on $P$ by

$$a + b = \{a_0 + b_0, a_1 + b_1, \ldots\}.$$

Thus, if $n \geq m$, then $a_i + b_i = 0$ for all $i > n$, and $P$ is clearly closed with respect to "$+$". Next, we define the operation "$\cdot$" on $P$ by

$$a \cdot b = c = \{c_0, c_1, \ldots\},$$

where

$$c_k = \sum_{i=0}^{k} a_i b_{k-i}$$

for all $k$. In this case $c_k = 0$ for all $k > m + n$, and $P$ is also closed with respect to the operation "$\cdot$". Now let us define

$$0 = \{0, 0, \ldots\}.$$

Then $0 \in P$ and $\{P; +\}$ is clearly an abelian group with identity $0$. Next, define

$$e = \{1, 0, 0, \ldots\}.$$

Then $e \in P$ and $\{P; \cdot\}$ is obviously a monoid with $e$ as its identity element. We can now easily prove the following:

2.3.1. Theorem. The mathematical system $\{P; +, \cdot\}$ is a commutative ring with identity. It is called the ring of polynomials over the field $F$.

2.3.2. Exercise. Prove Theorem 2.3.1.
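The two operations on $P$ translate directly into operations on finite coefficient lists. The sketch below is an illustration under stated assumptions: the field $F$ is taken to be the rational numbers (represented with Python's `fractions.Fraction`), a polynomial is stored as the list of its coefficients up to the last non-zero one, and the names `trim`, `poly_add`, `poly_mul`, and `degree` are hypothetical helpers introduced here.

```python
from fractions import Fraction

def trim(coeffs):
    """Convert to Fractions and drop trailing zeros, so equal sequences give equal lists."""
    coeffs = [Fraction(c) for c in coeffs]
    while coeffs and coeffs[-1] == 0:
        coeffs.pop()
    return coeffs

def poly_add(a, b):
    """Termwise sum: (a + b)_i = a_i + b_i."""
    a, b = trim(a), trim(b)
    n = max(len(a), len(b))
    a = a + [Fraction(0)] * (n - len(a))
    b = b + [Fraction(0)] * (n - len(b))
    return trim(x + y for x, y in zip(a, b))

def poly_mul(a, b):
    """Convolution product: c_k is the sum of a_i * b_(k-i) for i = 0, ..., k."""
    a, b = trim(a), trim(b)
    if not a or not b:
        return []  # product with the zero polynomial
    c = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return trim(c)

def degree(a):
    """Degree of a non-zero polynomial; None for the zero polynomial."""
    a = trim(a)
    return len(a) - 1 if a else None

f = [1, 2]        # 1 + 2t
g = [0, 1, 3]     # t + 3t^2
print(poly_add(f, g))
print(poly_mul(f, g))
print(degree(poly_mul(f, g)))
```

Here the empty list plays the role of the zero element $\{0, 0, \ldots\}$ and `[1]` plays the role of $e = \{1, 0, 0, \ldots\}$; in the sample computation the degree of the product equals the sum of the degrees of the factors.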
Let us next complete the connection between our abstract characterization of polynomials and the function $f(t)$ we originally introduced. To this end we let

$$
\begin{aligned}
t^0 &= \{1, 0, 0, \ldots\}, \\
t^1 &= \{0, 1, 0, 0, \ldots\}, \\
t^2 &= \{0, 0, 1, 0, \ldots\}, \\
t^3 &= \{0, 0, 0, 1, 0, \ldots\}, \\
&\;\;\vdots
\end{aligned}
$$

At this point we still cannot give meaning to $a_i t^i$, because $a_i \in F$ and $t^i \in P$. However, if we make the obvious identification $\{a_i, 0, 0, \ldots\} \in P$, and if we denote this element simply by $a_i \in P$, then we have

$$f(t) = a_0 \cdot t^0 + a_1 \cdot t^1 + \cdots + a_n \cdot t^n.$$

Thus, we can represent $f(t)$ uniquely by the sequence $\{a_0, a_1, \ldots, a_n, 0, \ldots\}$. By convention, we henceforth omit the symbol "$\cdot$", and write, e.g.,

$$f(t) = a_0 + a_1 t + \cdots + a_n t^n.$$

We assign $t$ appearing in the argument of $f(t)$ a special name.
2.3.3. Definition. Let $\{P; +, \cdot\}$ be the polynomial ring over a field $F$. The element $t \in P$, $t = \{0, 1, 0, \ldots\}$, is called the indeterminate of $P$.

To simplify notation, we denote by $F[t]$ the ring of polynomials over a field $F$, and we identify elements of $F[t]$ (i.e., polynomials) by making use of the argument $t$, e.g., $f(t) \in F[t]$.

2.3.4. Definition. Let $f(t) \in F[t]$, and let $f(t) = \{f_0, f_1, \ldots, f_n, \ldots\}$, where $f_i \in F$ for all $i$. The polynomial $f(t)$ is said to be of order $n$ or of degree $n$ if $f_n \neq 0$ and if $f_i = 0$ for all $i > n$. In this case we write $\deg f(t) = n$ and we call $f_n$ the leading coefficient of $f$. If $f_n = 1$ and $f_i = 0$ for all $i > n$, then $f(t)$ is said to be monic.

If every coefficient of a polynomial $f$ is zero, then $f$ is called the zero polynomial. The order of the zero polynomial is not defined.
2.3.5. Theorem. Let $f(t)$ be a polynomial of order $n$ and let $g(t)$ be a polynomial of order $m$. Then $f(t)g(t)$ is a polynomial of order $m + n$.

Proof. Let $f(t) = f_0 + f_1 t + \cdots + f_n t^n$, let $g(t) = g_0 + g_1 t + \cdots + g_m t^m$, and let $h(t) = f(t)g(t)$. Then

$$h_k = \sum_{i=0}^{k} f_i g_{k-i}.$$

Since $f_i = 0$ for $i > n$ and $g_j = 0$ for $j > m$, the largest possible value of $k$ such that $h_k$ is non-zero occurs for $k = m + n$; i.e.,

$$h_{m+n} = f_n g_m.$$

Since $F$ is a field, $f_n$ and $g_m$ cannot be zero divisors, and thus $f_n g_m \neq 0$. Therefore, $h_{m+n} \neq 0$, and $h_k = 0$ for all $k > m + n$. ■
The reader can readily prove the next result.

2.3.6. Theorem. The ring $F[t]$ of polynomials over a field $F$ is an integral domain.

2.3.7. Exercise. Prove Theorem 2.3.6.

Our next result shows that, in general, we cannot go any further than integral domain for $F[t]$.

2.3.8. Theorem. Let $f(t) \in F[t]$. Then $f(t)$ has an inverse relative to "$\cdot$" if and only if $f(t)$ is of order zero.

Proof. Let $f(t) \in F[t]$ be of order $n$, and assume that $f(t)$ has an inverse relative to "$\cdot$", denoted by $f^{-1}(t)$, which is of order $m$. Then

$$f(t) f^{-1}(t) = e,$$

where $e = \{1, 0, 0, \ldots\}$ is of order zero. By Theorem 2.3.5 the degree of $f(t)f^{-1}(t)$ is $m + n$. Thus, $m + n = 0$, and since $m \geq 0$ and $n \geq 0$, we must have $m = n = 0$.

Conversely, let $f(t) = f_0 = \{f_0, 0, 0, \ldots\}$, where $f_0 \neq 0$. Then $f^{-1}(t) = f_0^{-1} = \{f_0^{-1}, 0, 0, \ldots\}$. ■

In the case of polynomials of order zero we omit the notation $t$, and we say $f(t)$ is a scalar. Thus, if $c(t)$ is a polynomial of order zero, we have $c(t) = c$, where $c \neq 0$. We see immediately that $cf(t) = cf_0 + cf_1 t + \cdots + cf_n t^n$ for all $f(t) \in F[t]$.

The following result, which we will require in Chapter 4, is sometimes called the division algorithm.
2.3.9. Theorem. Let $f(t), g(t) \in F[t]$ and assume that $g(t) \neq 0$. Then there exist unique elements $q(t)$ and $r(t)$ in $F[t]$ such that

$$f(t) = q(t)g(t) + r(t), \tag{2.3.10}$$

where either $r(t) = 0$ or $\deg r(t) < \deg g(t)$.

Proof. If $f(t) = 0$ or if $\deg f(t) < \deg g(t)$, then Eq. (2.3.10) is satisfied with $q(t) = 0$ and $r(t) = f(t)$. If $\deg g(t) = 0$, i.e., $g(t) = c$, then $f(t) = [c^{-1} \cdot f(t)] \cdot c$, and Eq. (2.3.10) holds with $q(t) = c^{-1} f(t)$ and $r(t) = 0$.

Assume now that $\deg f(t) \geq \deg g(t) \geq 1$. The proof is by induction on the degree of the polynomial $f(t)$. We first prove our assertion for $\deg f(t) = 1$ and then pass from degrees $1, \ldots, n$ to degree $n + 1$. Assume that $\deg f(t) = 1$, i.e., $f(t) = a_0 + a_1 t$, where $a_1 \neq 0$. We need only consider the case $g(t) = b_0 + b_1 t$, where $b_1 \neq 0$. We readily see that Eq. (2.3.10) is satisfied with $q(t) = a_1 b_1^{-1}$ and $r(t) = a_0 - a_1 b_1^{-1} b_0$. Now assume that Eq. (2.3.10) holds for $\deg f(t) = k$, where $k = 1, \ldots, n$. We want to show that this implies the validity of Eq. (2.3.10) for $\deg f(t) = n + 1$. Let

$$f(t) = a_0 + a_1 t + \cdots + a_{n+1} t^{n+1},$$

where $a_{n+1} \neq 0$. Let $\deg g(t) = m$. We may assume that $0 < m \leq n + 1$. Let $g(t) = b_0 + b_1 t + \cdots + b_m t^m$, where $b_m \neq 0$. It is now readily verified that

$$f(t) = b_m^{-1} a_{n+1} t^{n+1-m} g(t) + [f(t) - b_m^{-1} a_{n+1} t^{n+1-m} g(t)]. \tag{2.3.11}$$

Now let $h(t) = f(t) - b_m^{-1} a_{n+1} t^{n+1-m} g(t)$. It can readily be verified that the coefficient of $t^{n+1}$ in $h(t)$ is $0$. Hence, either $h(t) = 0$ or $\deg h(t) < n + 1$. By our induction hypothesis (or by the first case of the proof, when $\deg h(t) < \deg g(t)$), this implies there exist polynomials $s(t)$ and $r(t)$ such that $h(t) = s(t)g(t) + r(t)$, where $r(t) = 0$ or $\deg r(t) < \deg g(t)$. Substituting the expression for $h(t)$ into Eq. (2.3.11), we have

$$f(t) = [b_m^{-1} a_{n+1} t^{n+1-m} + s(t)] g(t) + r(t).$$
Thus, Eq. (2.3.10) is satisfied and the proof of the existence of $r(t)$ and $q(t)$ is complete. The proof of the uniqueness of $q(t)$ and $r(t)$ is left as an exercise. ∎

2.3.12. Exercise. Prove that $q(t)$ and $r(t)$ in Theorem 2.3.9 are unique.

The preceding result motivates the following definition.

2.3.13. Definition. Let $f(t)$ and $g(t)$ be any non-zero polynomials. Let $q(t)$ and $r(t)$ be the unique polynomials such that $f(t) = q(t)g(t) + r(t)$, where either $r(t) = 0$ or $\deg r(t) < \deg g(t)$. We call $q(t)$ the quotient and $r(t)$ the remainder in the division of $f(t)$ by $g(t)$. If $r(t) = 0$, we say that $g(t)$ divides $f(t)$ or is a factor of $f(t)$.
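The existence argument of Theorem 2.3.9 is constructive, and a minimal sketch of it may help the reader. The code below assumes that $F$ is the field of rational numbers and that a polynomial is stored as its list of coefficients $[a_0, a_1, \ldots, a_n]$, with the zero polynomial as the empty list; the name `poly_divmod` is chosen here for illustration only. Each pass of the loop cancels the leading term, just as $h(t)$ is formed in Eq. (2.3.11).

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Return (q, r) with f = q*g + r and r == [] or deg r < deg g.

    Polynomials are coefficient lists [a0, a1, ..., an] over the rationals;
    the zero polynomial is the empty list.
    """
    f = [Fraction(a) for a in f]
    g = [Fraction(b) for b in g]
    while g and g[-1] == 0:          # drop zero leading coefficients of g
        g.pop()
    if not g:
        raise ZeroDivisionError("g(t) must be non-zero")
    r = f[:]
    while r and r[-1] == 0:          # drop zero leading coefficients of f
        r.pop()
    q = [Fraction(0)] * max(len(r) - len(g) + 1, 0)
    while len(r) >= len(g):
        shift = len(r) - len(g)      # the exponent n + 1 - m of the proof
        coeff = r[-1] / g[-1]        # the scalar b_m^{-1} a_{n+1}
        q[shift] = coeff
        for i, b in enumerate(g):    # h(t) = f(t) - coeff * t^shift * g(t)
            r[i + shift] -= coeff * b
        while r and r[-1] == 0:      # the leading term has been cancelled
            r.pop()
    return q, r

# t^3 - 2t + 1 = (t^2 + t - 1)(t - 1) + 0
quotient, remainder = poly_divmod([1, -2, 0, 1], [-1, 1])
```

Exact rational arithmetic is used so that the test for a vanishing remainder is reliable; with floating-point coefficients the comparison with zero would need a tolerance.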
Next, we prove:

2.3.14. Theorem. Let $F[t]$ denote the ring of polynomials over a field $F$. Let $f(t)$ and $g(t)$ be non-zero polynomials in $F[t]$. Then there exists a unique monic polynomial, $d(t)$, such that (i) $d(t)$ divides $f(t)$ and $g(t)$, and (ii) if $d'(t)$ is any polynomial which divides $f(t)$ and $g(t)$, then $d'(t)$ divides $d(t)$.

Proof. Let

$$K[t] = \{x(t) \in F[t] : x(t) = m(t)f(t) + n(t)g(t), \text{ where } m(t), n(t) \in F[t]\}.$$

We note that $f(t), g(t) \in K[t]$. Furthermore, if $a(t), b(t) \in K[t]$, then $a(t) - b(t) \in K[t]$ and $a(t)b(t) \in K[t]$. Also, if $c$ is a scalar, then $ca(t) \in K[t]$ for all $a(t) \in K[t]$. Now let $d(t)$ be a non-zero polynomial of lowest degree in $K[t]$. Since all scalar multiples of $d(t)$ belong to $K[t]$, we may assume that $d(t)$ is monic. We now show that for any $h(t) \in K[t]$ there is a $q(t) \in F[t]$ such that $h(t) = d(t)q(t)$. To prove this, we know from Theorem 2.3.9 that there exist unique elements $q(t)$ and $r(t)$ in $F[t]$ such that $h(t) = q(t)d(t) + r(t)$, where either $r(t) = 0$ or $\deg r(t) < \deg d(t)$. Since $d(t) \in K[t]$ and $q(t) \in F[t]$, it follows that $q(t)d(t) \in K[t]$. Also, since $h(t) \in K[t]$, it follows that $r(t) = h(t) - q(t)d(t) \in K[t]$. Since $d(t)$ is a polynomial of smallest degree in $K[t]$, it follows that $r(t) = 0$. Hence, $d(t)$ divides every polynomial in $K[t]$.

To show that $d(t)$ is unique, suppose $d_1(t)$ is another monic polynomial in $K[t]$ which divides every polynomial in $K[t]$. Then $d(t) = a(t)d_1(t)$ and $d_1(t) = b(t)d(t)$ for some $a(t), b(t) \in F[t]$. It can readily be verified that this is true only when $a(t) = b(t) = 1$. Now, since $f(t), g(t) \in K[t]$, part (i) of the theorem has been proven.

To prove part (ii), let $a(t), b(t) \in F[t]$ be such that $f(t) = a(t)d'(t)$ and $g(t) = b(t)d'(t)$. Since $d(t) \in K[t]$, there exist polynomials $m(t), n(t)$ such that $d(t) = m(t)f(t) + n(t)g(t)$. Hence,

$$d(t) = m(t)a(t)d'(t) + n(t)b(t)d'(t) = [m(t)a(t) + n(t)b(t)]d'(t).$$

This implies that $d'(t)$ divides $d(t)$ and completes the proof of the theorem. ∎
The polynomial $d(t)$ in the preceding theorem is called the greatest common divisor of $f(t)$ and $g(t)$. If $d(t) = 1$, then $f(t)$ and $g(t)$ are said to be relatively prime.

2.3.15. Exercise. Show that if $d(t)$ is the greatest common divisor of $f(t)$ and $g(t)$, then there exist polynomials $m(t)$ and $n(t)$ such that

$$d(t) = m(t)f(t) + n(t)g(t).$$

If $f(t)$ and $g(t)$ are relatively prime, then

$$1 = m(t)f(t) + n(t)g(t).$$
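The polynomials $d(t)$, $m(t)$, and $n(t)$ of Theorem 2.3.14 and Exercise 2.3.15 can be computed by repeated division. The sketch below uses the extended Euclidean algorithm, which is a standard technique and not the existence argument given in the proof above. It assumes rational coefficients and the coefficient-list representation $[a_0, a_1, \ldots, a_n]$; all function names are illustrative.

```python
from fractions import Fraction

def trim(p):
    """Convert to Fractions and drop zero leading coefficients."""
    p = [Fraction(a) for a in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def divmod_poly(f, g):
    """Quotient and remainder of f by a non-zero g (Theorem 2.3.9)."""
    q, r, g = [], trim(f), trim(g)
    while len(r) >= len(g):
        term = [Fraction(0)] * (len(r) - len(g)) + [r[-1] / g[-1]]
        q = add(q, term)
        r = add(r, mul([-c for c in term], g))
    return q, r

def monic_gcd(f, g):
    """Return (d, m, n) with d monic and d = m*f + n*g; f, g non-zero."""
    r0, r1 = trim(f), trim(g)
    m0, m1 = [Fraction(1)], []       # invariant: r0 = m0*f + n0*g
    n0, n1 = [], [Fraction(1)]       # invariant: r1 = m1*f + n1*g
    while r1:
        q, r = divmod_poly(r0, r1)
        neg_q = [-c for c in q]
        r0, r1 = r1, r
        m0, m1 = m1, add(m0, mul(neg_q, m1))
        n0, n1 = n1, add(n0, mul(neg_q, n1))
    lead = [1 / r0[-1]]              # scale so that d(t) is monic
    return mul(lead, r0), mul(lead, m0), mul(lead, n0)

# f(t) = t^2 - 1 and g(t) = t^2 - 3t + 2 share the factor t - 1
d, m, n = monic_gcd([-1, 0, 1], [2, -3, 1])
```

For this pair the sketch yields $d(t) = t - 1$ with the constant polynomials $m(t) = 1/3$ and $n(t) = -1/3$, and indeed $\tfrac{1}{3}(t^2 - 1) - \tfrac{1}{3}(t^2 - 3t + 2) = t - 1$.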
Now let $f(t) \in F[t]$ be of positive degree. If $f(t) = g(t)h(t)$ implies that either $g(t)$ is a scalar or $h(t)$ is a scalar, then $f(t)$ is said to be irreducible. We close the present section with a statement of the fundamental theorem of algebra.

2.3.16. Theorem. Let $f(t) \in F[t]$ be a non-zero polynomial. Let $R$ denote the field of real numbers and let $C$ denote the field of complex numbers.

(i) If $F = C$, then $f(t)$ can be written uniquely, except for order, as a product

$$f(t) = c(t - c_1)(t - c_2) \cdots (t - c_n),$$

where $c, c_1, \ldots, c_n \in C$.

(ii) If $F = R$, then $f(t)$ can be written uniquely, except for order, as a product

$$f(t) = c f_1(t) f_2(t) \cdots f_m(t),$$

where $c \in R$ and the $f_1(t), \ldots, f_m(t)$ are monic irreducible polynomials of degree one or two.
2.4. REFERENCES AND NOTES

There are many excellent texts on abstract algebra. For an introductory exposition of this subject refer, e.g., to Birkhoff and MacLane [2.1], Hanneken [2.2], Hu [2.3], Jacobson [2.4], and McCoy [2.6]. The books by Birkhoff and MacLane and Jacobson are standard references. The texts by Hu and McCoy are very readable. The excellent presentation by Hanneken is concise, somewhat abstract, yet very readable. Polynomials over a field are treated extensively in these references. For a brief summary of the properties of polynomials over a field, refer also to Lipschutz [2.5].

REFERENCES

[2.1] G. BIRKHOFF and S. MACLANE, A Survey of Modern Algebra. New York: The Macmillan Company, 1965.
[2.2] C. B. HANNEKEN, Introduction to Abstract Algebra. Belmont, Calif.: Dickenson Publishing Co., Inc., 1968.
[2.3] S. T. HU, Elements of Modern Algebra. San Francisco, Calif.: Holden-Day, Inc., 1965.
[2.4] N. JACOBSON, Lectures in Abstract Algebra. New York: D. Van Nostrand Company, Inc., 1951.
[2.5] S. LIPSCHUTZ, Linear Algebra. New York: McGraw-Hill Book Company, 1968.
[2.6] N. H. MCCOY, Fundamentals of Abstract Algebra. Boston: Allyn & Bacon, Inc., 1972.
3

VECTOR SPACES AND LINEAR TRANSFORMATIONS

In Chapter 1 we considered the set-theoretic structure of mathematical systems, and in Chapter 2 we developed to various degrees of complexity the algebraic structure of mathematical systems. One of the mathematical systems introduced in Chapter 2 was the linear or vector space, a concept of great importance in mathematics and applications. In the present chapter we further examine properties of linear spaces. Then we consider special types of mappings defined on linear spaces, called linear transformations, and establish several important properties of linear transformations. In the next chapter we will concern ourselves with finite-dimensional vector spaces, and we will consider matrices, which are used to represent linear transformations on finite-dimensional vector spaces.
3.1. LINEAR SPACES
We begin by restating the definition of linear space.

3.1.1. Definition. Let $X$ be a non-empty set, let $F$ be a field, let "$+$" denote a mapping of $X \times X$ into $X$, and let "$\cdot$" denote a mapping of $F \times X$ into $X$. Let the members $x \in X$ be called vectors, let the elements $\alpha \in F$ be called scalars, let the operation "$+$" defined on $X$ be called vector addition, and let the mapping "$\cdot$" be called scalar multiplication or multiplication of vectors by scalars. Then for each $x, y \in X$ there is a unique element, $x + y \in X$, called the sum of $x$ and $y$, and for each $x \in X$ and $\alpha \in F$ there is a unique element, $\alpha \cdot x \triangleq \alpha x \in X$, called the multiple of $x$ by $\alpha$. We say that the non-empty set $X$ and the field $F$, along with the two mappings of vector addition and scalar multiplication, constitute a vector space or a linear space if the following axioms are satisfied:

(i) $x + y = y + x$ for every $x, y \in X$;
(ii) $x + (y + z) = (x + y) + z$ for every $x, y, z \in X$;
(iii) there is a unique vector in $X$, called the zero vector or the null vector or the origin, which is denoted by $0$ and which has the property that $0 + x = x$ for all $x \in X$;
(iv) $\alpha(x + y) = \alpha x + \alpha y$ for all $\alpha \in F$ and for all $x, y \in X$;
(v) $(\alpha + \beta)x = \alpha x + \beta x$ for all $\alpha, \beta \in F$ and for all $x \in X$;
(vi) $(\alpha\beta)x = \alpha(\beta x)$ for all $\alpha, \beta \in F$ and for all $x \in X$;
(vii) $0x = 0$ for all $x \in X$; and
(viii) $1x = x$ for all $x \in X$.
The reader may find it instructive to review the axioms of a field which are summarized in Definition 2.1.63. In (v) the "$+$" on the left-hand side denotes the operation of addition on $F$; the "$+$" on the right-hand side denotes vector addition. Also, in (vi) $\alpha\beta \triangleq \alpha \cdot \beta$, where "$\cdot$" denotes the operation of multiplication on $F$. In (vii) the symbol $0$ on the left-hand side is a scalar; the same symbol on the right-hand side denotes a vector. The $1$ on the left-hand side of (viii) is the identity element of $F$ relative to "$\cdot$".

To indicate the relationship between the set of vectors $X$ and the underlying field $F$, we sometimes refer to a vector space $X$ over field $F$. However, usually we speak of a vector space $X$ without making explicit reference to the field $F$ and to the operations of vector addition and scalar multiplication. If $F$ is the field of real numbers we call our vector space a real vector space. Similarly, if $F$ is the field of complex numbers, we speak of a complex vector space. Throughout this chapter we will usually use lower case Latin letters (e.g., $x, y, z$) to denote vectors (i.e., elements of $X$) and lower case Greek letters (e.g., $\alpha, \beta, \gamma$) to denote scalars (i.e., elements of $F$).

If we agree to denote the element $(-1)x \in X$ simply by $-x$, i.e., $(-1)x \triangleq -x$, then we have $x - x = 1x + (-1)x = (1 - 1)x = 0x = 0$. Thus, if $X$ is a vector space, then for every $x \in X$ there is a unique vector, denoted $-x$, such that $x - x = 0$. There are several other elementary properties of vector spaces which are a direct consequence of the above axioms. Some of these are summarized below. The reader will have no difficulties in verifying these.
3.1.2. Theorem. Let $X$ be a vector space. If $x, y, z$ are elements in $X$ and if $\alpha, \beta$ are any members of $F$, then the following hold:

(i) if $\alpha x = \alpha y$ and $\alpha \neq 0$, then $x = y$;
(ii) if $\alpha x = \beta x$ and $x \neq 0$, then $\alpha = \beta$;
(iii) if $x + y = x + z$, then $y = z$;
(iv) $\alpha 0 = 0$;
(v) $\alpha(x - y) = \alpha x - \alpha y$;
(vi) $(\alpha - \beta)x = \alpha x - \beta x$; and
(vii) $x + y = 0$ implies that $x = -y$.

3.1.3. Exercise. Prove Theorem 3.1.2.
We now consider several important examples of vector spaces.

3.1.4. Example. Let $X$ be the set of all "arrows" in the "plane" emanating from a reference point which we call the origin or the zero vector or the null vector, and which we denote by $0$. Let $F$ denote the set of real numbers, and let vector addition and scalar multiplication be defined in the usual way, as shown in Figure A.
3.1.5. Figure A. [Figure: the vectors $x$, $y$, and $x + y$, together with the scalar multiples $\alpha y$ ($0 < \alpha < 1$), $\beta y$ ($\beta > 1$), and $\gamma y$.]

[...] $\{x_1, x_2, \ldots, x_m\}$ is said to be linearly dependent. If a set is not linearly dependent, then it is said to be linearly independent. In this case the relation (3.3.11) implies that $\alpha_1 = \alpha_2 = \cdots = \alpha_m = 0$. An infinite set of vectors $Y$ in $X$ is said to be linearly independent if every finite subset of $Y$ is linearly independent.

Note that the null vector cannot be contained in a set which is linearly independent. Also, if a set of vectors contains a linearly dependent subset, then the whole set is linearly dependent.

If $X$ denotes the space of Example 3.1.4, the set of vectors $\{y, z\}$ in Figure H is linearly independent, while the set of vectors $\{u, v\}$ is linearly dependent.

3.3.12. Figure H. Linearly Independent and Linearly Dependent Vectors.

3.3.13. Exercise.
Let $X = C[a, b]$, the set of all real-valued continuous functions on $[a, b]$, where $b > a$. As we saw in Example 3.1.19, this set forms a vector space. Let $n$ be a fixed positive integer, and let us define $x_i \in X$ for $i = 0, 1, 2, \ldots, n$, as follows. For all $t \in [a, b]$, let

$$x_0(t) = 1$$

and

$$x_i(t) = t^i, \quad i = 1, \ldots, n.$$

Let $Y = \{x_0, x_1, \ldots, x_n\}$. Then $V(Y)$ is the set of all polynomials on $[a, b]$ of degree less than or equal to $n$.

(a) Show that $Y$ is a linearly independent set in $X$.
(b) Let $X_i = \{x_i\}$, $i = 0, 1, \ldots, n$; i.e., each $X_i$ is a singleton subset of $X$. Show that

$$V(Y) = V(X_0) \oplus V(X_1) \oplus \cdots \oplus V(X_n).$$

(c) Let $z_0(t) = 1$ for all $t \in [a, b]$ and let

$$z_k(t) = 1 + t + \cdots + t^k$$

for all $t \in [a, b]$ and $k = 1, \ldots, n$. Show that $Z = \{z_0, z_1, \ldots, z_n\}$ is a linearly independent set in $V(Y)$.

3.3.14. Theorem. Let $\{x_1, x_2, \ldots, x_m\}$ be a linearly independent set in a vector space $X$. If $\sum_{i=1}^{m} \alpha_i x_i = \sum_{i=1}^{m} \beta_i x_i$, then $\alpha_i = \beta_i$ for all $i = 1, 2, \ldots, m$.

Proof. If $\sum_{i=1}^{m} \alpha_i x_i = \sum_{i=1}^{m} \beta_i x_i$, then $\sum_{i=1}^{m} (\alpha_i - \beta_i) x_i = 0$. Since the set $\{x_1, \ldots, x_m\}$ is linearly independent, we have $(\alpha_i - \beta_i) = 0$ for all $i = 1, \ldots, m$. Therefore $\alpha_i = \beta_i$ for all $i$. ∎
The next result provides us with an alternate way of defining linear dependence.
3.3.15. Theorem. A set of vectors $\{x_1, x_2, \ldots, x_m\}$ in a linear space $X$ is linearly dependent if and only if for some index $i$, $1 \leq i \leq m$, we can find scalars $\alpha_1, \ldots, \alpha_{i-1}, \alpha_{i+1}, \ldots, \alpha_m$ such that

$$x_i = \alpha_1 x_1 + \cdots + \alpha_{i-1} x_{i-1} + \alpha_{i+1} x_{i+1} + \cdots + \alpha_m x_m. \tag{3.3.16}$$

Proof. Assume that Eq. (3.3.16) is satisfied. Then

$$\alpha_1 x_1 + \cdots + \alpha_{i-1} x_{i-1} + (-1)x_i + \alpha_{i+1} x_{i+1} + \cdots + \alpha_m x_m = 0.$$

Thus, $\alpha_i = -1 \neq 0$ is a non-trivial choice of coefficient for which Eq. (3.3.11) holds, and therefore the set $\{x_1, x_2, \ldots, x_m\}$ is linearly dependent.

Conversely, assume that the set $\{x_1, x_2, \ldots, x_m\}$ is linearly dependent. Then there exist coefficients $\alpha_1, \ldots, \alpha_m$ which are not all zero, such that

$$\alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_m x_m = 0. \tag{3.3.17}$$

Suppose that index $i$ is chosen such that $\alpha_i \neq 0$. Rearranging Eq. (3.3.17) to

$$-\alpha_i x_i = \alpha_1 x_1 + \cdots + \alpha_{i-1} x_{i-1} + \alpha_{i+1} x_{i+1} + \cdots + \alpha_m x_m, \tag{3.3.18}$$

and multiplying both sides of Eq. (3.3.18) by $-1/\alpha_i$, we obtain

$$x_i = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{i-1} x_{i-1} + \beta_{i+1} x_{i+1} + \cdots + \beta_m x_m,$$

where $\beta_k = -\alpha_k/\alpha_i$, $k = 1, \ldots, i - 1, i + 1, \ldots, m$. This concludes our proof. ∎
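For vectors given by coordinates, the dependence relation of Eq. (3.3.17) can be found by Gaussian elimination. The sketch below assumes that the vectors are $n$-tuples of rational numbers; the function name `dependence_relation` is illustrative, and elimination is a standard computational technique, not an argument taken from the text. The function returns scalars $\alpha_1, \ldots, \alpha_m$, not all zero, or `None` when the set is linearly independent.

```python
from fractions import Fraction

def dependence_relation(vectors):
    """Return scalars a_1..a_m, not all zero, with sum a_i * x_i = 0,
    or None when the vectors are linearly independent."""
    m = len(vectors)
    n = len(vectors[0])
    # Row k holds coordinate k of every vector, so column i is x_i.
    rows = [[Fraction(v[k]) for v in vectors] for k in range(n)]
    pivots = []                      # pivot column of each reduced row
    r = 0
    for col in range(m):
        p = next((i for i in range(r, n) if rows[i][col] != 0), None)
        if p is None:
            continue                 # x_col depends on earlier columns
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][col]
        rows[r] = [a / pivot for a in rows[r]]
        for i in range(n):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
    free = [c for c in range(m) if c not in pivots]
    if not free:
        return None
    f = free[0]
    alpha = [Fraction(0)] * m
    alpha[f] = Fraction(1)           # coefficient of the dependent vector
    for row_index, col in enumerate(pivots):
        alpha[col] = -rows[row_index][f]
    return alpha

# x_1 = (1, 0), x_2 = (0, 1), x_3 = (2, 3): -2 x_1 - 3 x_2 + x_3 = 0
alpha = dependence_relation([(1, 0), (0, 1), (2, 3)])
```

Solving the returned relation for the vector whose coefficient is $1$ gives $x_3 = 2x_1 + 3x_2$, which is an instance of Eq. (3.3.16).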
The proof of the next result is left as an exercise.

3.3.19. Theorem. A finite non-empty set $Y$ in a linear space $X$ is linearly independent if and only if for each $y \in V(Y)$, $y \neq 0$, there is a unique finite subset of $Y$, say $\{x_1, x_2, \ldots, x_m\}$, and a unique set of scalars $\{\alpha_1, \alpha_2, \ldots, \alpha_m\}$, such that

$$y = \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_m x_m.$$

3.3.20. Exercise. Prove Theorem 3.3.19.
3.3.21. Exercise. Let $Y$ be a finite set in a linear space $X$. Show that $Y$ is linearly independent if and only if there is no proper subset $Z$ of $Y$ such that $V(Z) = V(Y)$.

A concept which is of utmost importance in the study of vector spaces is that of basis of a linear space.

3.3.22. Definition. A set $Y$ in a linear space $X$ is called a Hamel basis, or simply a basis, for $X$ if

(i) $Y$ is linearly independent; and
(ii) the span of $Y$ is the linear space $X$ itself; i.e., $V(Y) = X$.

As an immediate consequence of this definition we have:

3.3.23. Theorem. Let $X$ be a linear space, and let $Y$ be a linearly independent set in $X$. Then $Y$ is a basis for $V(Y)$.

3.3.24. Exercise. Prove Theorem 3.3.23.
In order to introduce the notion of dimension of a vector space we show that if a linear space $X$ is generated by a finite number of linearly independent elements, then this number of elements must be unique. We first prove the following result.

3.3.25. Theorem. Let $\{x_1, x_2, \ldots, x_n\}$ be a basis for a linear space $X$. Then for each vector $x \in X$ there exist unique scalars $\alpha_1, \ldots, \alpha_n$ such that

$$x = \alpha_1 x_1 + \cdots + \alpha_n x_n.$$
Proof. Since $x_1, \ldots, x_n$ span $X$, every vector $x \in X$ can be expressed as a linear combination of them; i.e.,

$$x = \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n$$

for some choice of scalars $\alpha_1, \ldots, \alpha_n$. We now must show that these scalars are unique. To this end, suppose that

$$x = \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n$$

and

$$x = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n.$$

Then

$$x + (-x) = (\alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n) + (-\beta_1 x_1 - \beta_2 x_2 - \cdots - \beta_n x_n) = (\alpha_1 - \beta_1)x_1 + (\alpha_2 - \beta_2)x_2 + \cdots + (\alpha_n - \beta_n)x_n = 0.$$

Since the vectors $x_1, x_2, \ldots, x_n$ form a basis for $X$, it follows that they are linearly independent, and therefore we must have $(\alpha_i - \beta_i) = 0$ for $i = 1, \ldots, n$. From this it follows that $\alpha_1 = \beta_1, \alpha_2 = \beta_2, \ldots, \alpha_n = \beta_n$. ∎

We also have:

3.3.26. Theorem. Let $\{x_1, x_2, \ldots, x_n\}$ be a basis for vector space $X$, and let $\{y_1, \ldots, y_m\}$ be any linearly independent set of vectors. Then $m \leq n$.
Proof. We need to consider only the case $m \geq n$ and prove that then we actually have $m = n$. Consider the set of vectors $\{y_1, x_1, \ldots, x_n\}$. Since the vectors $x_1, \ldots, x_n$ span $X$, $y_1$ can be expressed as a linear combination of them. Thus, the set $\{y_1, x_1, \ldots, x_n\}$ is not linearly independent. Therefore, there exist scalars $\beta_1, \alpha_1, \ldots, \alpha_n$, not all zero, such that

$$\beta_1 y_1 + \alpha_1 x_1 + \cdots + \alpha_n x_n = 0. \tag{3.3.27}$$

If all the $\alpha_i$ are zero, then $\beta_1 \neq 0$ and $\beta_1 y_1 = 0$. Thus, we can write

$$\beta_1 y_1 + 0 \cdot y_2 + \cdots + 0 \cdot y_m = 0.$$

But this contradicts the hypothesis of the theorem and cannot happen because the $y_1, \ldots, y_m$ are linearly independent. Therefore, at least one of the $\alpha_i \neq 0$. Renumbering all the $x_i$, if necessary, we can assume that $\alpha_n \neq 0$. Solving for $x_n$ we now obtain

$$x_n = \left(\frac{-\beta_1}{\alpha_n}\right) y_1 + \left(\frac{-\alpha_1}{\alpha_n}\right) x_1 + \cdots + \left(\frac{-\alpha_{n-1}}{\alpha_n}\right) x_{n-1}. \tag{3.3.28}$$

Now we show that the set $\{y_1, x_1, \ldots, x_{n-1}\}$ is also a basis for $X$. Since $\{x_1, \ldots, x_n\}$ is a basis for $X$, for each $x \in X$ we have $\xi_1, \xi_2, \ldots, \xi_n \in F$ such that

$$x = \xi_1 x_1 + \cdots + \xi_n x_n.$$

Substituting (3.3.28) into the above expression we note that

$$x = \xi_1 x_1 + \cdots + \xi_{n-1} x_{n-1} + \xi_n \left[\left(\frac{-\beta_1}{\alpha_n}\right) y_1 + \left(\frac{-\alpha_1}{\alpha_n}\right) x_1 + \cdots + \left(\frac{-\alpha_{n-1}}{\alpha_n}\right) x_{n-1}\right] = \eta y_1 + \eta_1 x_1 + \cdots + \eta_{n-1} x_{n-1},$$

where $\eta$ and $\eta_i$ are defined in an obvious way. In any case, every $x \in X$ can be expressed as a linear combination of the set of vectors $\{y_1, x_1, \ldots, x_{n-1}\}$, and thus this set must span $X$. To show that this set is also linearly independent, let us assume that there are scalars $\lambda, \lambda_1, \ldots, \lambda_{n-1}$ such that

$$\lambda y_1 + \lambda_1 x_1 + \cdots + \lambda_{n-1} x_{n-1} = 0,$$

and assume that $\lambda \neq 0$. Then

$$y_1 = \left(\frac{-\lambda_1}{\lambda}\right) x_1 + \cdots + \left(\frac{-\lambda_{n-1}}{\lambda}\right) x_{n-1} + 0 \cdot x_n. \tag{3.3.29}$$

In view of Eq. (3.3.27) we have, since $\beta_1 \neq 0$, the relation

$$y_1 = \left(\frac{-\alpha_1}{\beta_1}\right) x_1 + \cdots + \left(\frac{-\alpha_{n-1}}{\beta_1}\right) x_{n-1} + \left(\frac{-\alpha_n}{\beta_1}\right) x_n. \tag{3.3.30}$$

Now the term $(-\alpha_n/\beta_1)x_n$ in Eq. (3.3.30) is not zero, because we solved for $x_n$ in Eq. (3.3.28); yet the coefficient multiplying $x_n$ in Eq. (3.3.29) is zero. Since $\{x_1, \ldots, x_n\}$ is a basis, we have arrived at a contradiction, in view of Theorem 3.3.25. Therefore, we must have $\lambda = 0$. Thus, we have

$$\lambda_1 x_1 + \cdots + \lambda_{n-1} x_{n-1} + 0 \cdot x_n = 0,$$

and since $\{x_1, \ldots, x_n\}$ is a linearly independent set it follows that $\lambda_1 = 0, \ldots, \lambda_{n-1} = 0$. Therefore, the set $\{y_1, x_1, \ldots, x_{n-1}\}$ is indeed a basis for $X$.

By a similar argument as the preceding one we can show that the set $\{y_2, y_1, x_1, \ldots, x_{n-2}\}$ is a basis for $X$, that the set $\{y_3, y_2, y_1, x_1, \ldots, x_{n-3}\}$ is a basis for $X$, etc. Now if $m > n$, then we would not utilize $y_{n+1}$ in our process. Since $\{y_n, \ldots, y_1\}$ is a basis by the preceding argument, there exist coefficients $\eta_n, \ldots, \eta_1$ such that

$$y_{n+1} = \eta_n y_n + \cdots + \eta_1 y_1.$$

But by Theorem 3.3.15 this means the $y_i$, $i = 1, \ldots, n + 1$, are linearly dependent, a contradiction to the hypothesis of our theorem. From this it now follows that if $m \geq n$, then we must have $m = n$. This concludes the proof of the theorem. ∎

As a direct consequence of Theorem 3.3.26 we have:
3.3.31. Theorem. If a linear space $X$ has a basis containing a finite number of vectors $n$, then any other basis for $X$ consists of exactly $n$ elements.

Proof. Let $\{x_1, \ldots, x_n\}$ be a basis for $X$, and let also $\{y_1, \ldots, y_m\}$ be a basis for $X$. Then in view of Theorem 3.3.26 we have $m \leq n$. Interchanging the role of the $x_i$ and $y_i$, we also have $n \leq m$. Hence, $m = n$. ∎
Our preceding result enables us to make the following definition.

3.3.32. Definition. If a linear space $X$ has a basis consisting of a finite number of vectors, say $\{x_1, \ldots, x_n\}$, then $X$ is said to be a finite-dimensional vector space and the dimension of $X$ is $n$, abbreviated $\dim X = n$. In this case we speak of an $n$-dimensional vector space. If $X$ is not a finite-dimensional vector space, it is said to be an infinite-dimensional vector space.

We will agree that the linear space consisting of the null vector is finite dimensional, and we will say that the dimension of this space is zero.

Our next result provides us with an alternate characterization of (finite) dimension of a linear space.

3.3.33. Theorem. Let $X$ be a vector space which contains $n$ linearly independent vectors. If every set of $n + 1$ vectors in $X$ is linearly dependent, then $X$ is finite dimensional and $\dim X = n$.

Proof. Let $\{x_1, \ldots, x_n\}$ be a linearly independent set in $X$, and let $x \in X$. Then there exists a set of scalars $\{\alpha_1, \ldots, \alpha_{n+1}\}$, not all zero, such that

$$\alpha_1 x_1 + \cdots + \alpha_n x_n + \alpha_{n+1} x = 0.$$

Now $\alpha_{n+1} \neq 0$, otherwise we would contradict the fact that $x_1, \ldots, x_n$ are linearly independent. Hence,

$$x = \left(-\frac{\alpha_1}{\alpha_{n+1}}\right) x_1 - \cdots - \left(\frac{\alpha_n}{\alpha_{n+1}}\right) x_n$$

and $x \in V(\{x_1, \ldots, x_n\})$; i.e., $\{x_1, \ldots, x_n\}$ is a basis for $X$. Therefore, $X$ is $n$-dimensional. ∎
From our preceding result follows:

3.3.34. Corollary. Let $X$ be a vector space. If for given $n$ every set of $n + 1$ vectors in $X$ is linearly dependent, then $X$ is finite dimensional and $\dim X \leq n$.

3.3.35. Exercise. Prove Corollary 3.3.34.
We are now in a position to speak of coordinates of a vector. We have:

3.3.36. Definition. Let $X$ be a finite-dimensional vector space, and let $\{x_1, \ldots, x_n\}$ be a basis for $X$. Let $x \in X$ be represented by

$$x = \xi_1 x_1 + \cdots + \xi_n x_n.$$

The unique scalars $\xi_1, \xi_2, \ldots, \xi_n$ are called the coordinates of $x$ with respect to the basis $\{x_1, x_2, \ldots, x_n\}$.
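To make Definition 3.3.36 concrete, the following sketch computes the coordinates of a vector with respect to a given basis. It assumes that $X$ is the space of $n$-tuples of rational numbers and that the basis vectors are supplied as tuples; the function name `coordinates` is illustrative. The scalars are found by Gauss-Jordan elimination on the system $x = \xi_1 x_1 + \cdots + \xi_n x_n$, and their uniqueness is guaranteed by Theorem 3.3.25.

```python
from fractions import Fraction

def coordinates(basis, x):
    """Solve x = xi_1 * b_1 + ... + xi_n * b_n for the scalars xi_i."""
    n = len(basis)
    # Augmented matrix [b_1 ... b_n | x], one row per coordinate.
    aug = [[Fraction(b[k]) for b in basis] + [Fraction(x[k])]
           for k in range(n)]
    for col in range(n):
        p = next((i for i in range(col, n) if aug[i][col] != 0), None)
        if p is None:
            raise ValueError("the given vectors do not form a basis")
        aug[col], aug[p] = aug[p], aug[col]
        pivot = aug[col][col]
        aug[col] = [a / pivot for a in aug[col]]
        for i in range(n):
            if i != col:
                factor = aug[i][col]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[col])]
    return [row[n] for row in aug]

# With respect to the basis {(1, 1), (1, -1)}: (3, 1) = 2(1, 1) + 1(1, -1)
xi = coordinates([(1, 1), (1, -1)], (3, 1))
```

The same vector has different coordinates with respect to different bases: relative to $\{(1, 0), (0, 1)\}$ the coordinates of $(3, 1)$ are $3$ and $1$, whereas relative to $\{(1, 1), (1, -1)\}$ they are $2$ and $1$.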
It is possible to prove results similar to Theorems 3.3.26 and 3.3.31 for infinite-dimensional linear spaces. Since we will not make further use of these results in this book, their proofs will be omitted. In the following theorems, $X$ is an arbitrary vector space (i.e., finite dimensional or infinite dimensional).

3.3.37. Theorem. If $Y$ is a linearly independent set in a linear space $X$, then there exists a Hamel basis $Z$ for $X$ such that $Y \subset Z$.

3.3.38. Theorem. If $Y$ and $Z$ are Hamel bases for a linear space $X$, then $Y$ and $Z$ have the same cardinal number.

The notion of Hamel basis is not the only concept of basis with which we will deal. Such other concepts (to be specified later) reduce to Hamel basis on finite-dimensional vector spaces but differ significantly on infinite-dimensional spaces. We will find that on infinite-dimensional spaces the concept of Hamel basis is not very useful. However, in the case of finite-dimensional spaces the concept of Hamel basis is most crucial.

In view of the results presented thus far, the reader can readily prove the following facts.

3.3.39. Theorem. Let $X$ be a finite-dimensional linear space with $\dim X = n$.

(i) No linearly independent set in $X$ contains more than $n$ vectors.
(ii) A linearly independent set in $X$ is a basis if and only if it contains exactly $n$ vectors.
(iii) Every spanning or generating set for $X$ contains a basis for $X$.
(iv) Every set of vectors which spans $X$ contains at least $n$ vectors.
(v) Every linearly independent set of vectors in $X$ is contained in a basis for $X$.
(vi) If $Y$ is a linear subspace of $X$, then $Y$ is finite dimensional and $\dim Y \leq n$.
(vii) If $Y$ is a linear subspace of $X$ and if $\dim X = \dim Y$, then $Y = X$.

3.3.40. Exercise. Prove Theorem 3.3.39.
From Theorem 3.3.39 follows directly our next result.

3.3.41. Theorem. Let $X$ be a finite-dimensional linear space of dimension $n$, and let $Y$ be a collection of vectors in $X$. Then any two of the three conditions listed below imply the third condition:

(i) the vectors in $Y$ are linearly independent;
(ii) the vectors in $Y$ span $X$; and
(iii) the number of vectors in $Y$ is $n$.

3.3.42. Exercise. Prove Theorem 3.3.41.
Another way of restating Theorem 3.3.41 is as follows: (a) the dimension of a finite-dimensional linear space $X$ is equal to the smallest number of vectors that can be used to span $X$; and (b) the dimension of a finite-dimensional linear space $X$ is the largest number of vectors that can be linearly independent in $X$.

For the direct sum of two linear subspaces we have the following result.

3.3.43. Theorem. Let $X$ be a finite-dimensional vector space. If there exist linear subspaces $Y$ and $Z$ of $X$ such that $X = Y \oplus Z$, then $\dim (X) = \dim (Y) + \dim (Z)$.

Proof. Since $X$ is finite dimensional it follows from part (vi) of Theorem 3.3.39 that $Y$ and $Z$ are finite-dimensional linear spaces. Thus, there exists a basis, say $\{y_1, \ldots, y_n\}$, for $Y$, and a basis, say $\{z_1, \ldots, z_m\}$, for $Z$. Let $W = \{y_1, \ldots, y_n, z_1, \ldots, z_m\}$. We must show that $W$ is a linearly independent set in $X$ and that $V(W) = X$. Now suppose that

$$\sum_{i=1}^{n} \alpha_i y_i + \sum_{i=1}^{m} \beta_i z_i = 0.$$

Since the representation for $0 \in X$ must be unique in terms of its components in $Y$ and $Z$, we must have

$$\sum_{i=1}^{n} \alpha_i y_i = 0$$

and

$$\sum_{i=1}^{m} \beta_i z_i = 0.$$

But this implies that $\alpha_1 = \alpha_2 = \cdots = \alpha_n = \beta_1 = \beta_2 = \cdots = \beta_m = 0$. Thus, $W$ is a linearly independent set in $X$. Since $X$ is the direct sum of $Y$ and $Z$, it is clear that $W$ generates $X$. Thus, $\dim X = m + n$. This completes the proof of the theorem. ∎

We conclude the present section with the following results.

3.3.44. Theorem. Let $X$ be an $n$-dimensional vector space, and let $\{y_1, \ldots, y_m\}$ be a linearly independent set of vectors in $X$, where $m < n$. Then it is possible to form a basis for $X$ consisting of $n$ vectors $x_1, \ldots, x_n$, where $x_i = y_i$ for $i = 1, \ldots, m$.
Proof. Let $\{e_1, \ldots, e_n\}$ be a basis for $X$. Let $S_1$ be the set of vectors $\{y_1, \ldots, y_m, e_1, \ldots, e_n\}$, where $\{y_1, \ldots, y_m\}$ is a linearly independent set of vectors in $X$ and where $m < n$. We note that $S_1$ spans $X$ and is linearly dependent, since it contains more than $n$ vectors. Now let

$$\sum_{i=1}^{m} \alpha_i y_i + \sum_{i=1}^{n} \beta_i e_i = 0,$$

where the coefficients are not all zero. Then there must be some $\beta_j \neq 0$, otherwise the linear independence of $\{y_1, \ldots, y_m\}$ would be contradicted. But this means that $e_j$ is a linear combination of the set of vectors $S_2 = \{y_1, \ldots, y_m, e_1, \ldots, e_{j-1}, e_{j+1}, \ldots, e_n\}$; i.e., $S_2$ is the set $S_1$ with $e_j$ eliminated. Clearly, $S_2$ still spans $X$. Now either $S_2$ contains $n$ vectors or else it is a linearly dependent set. If it contains $n$ vectors, then by Theorem 3.3.41 these vectors must be linearly independent, in which case $S_2$ is a basis for $X$. We then take the vectors of $S_2$ as $x_1, \ldots, x_n$, and the theorem is proved. On the other hand, if $S_2$ contains more than $n$ vectors, then we continue the above procedure to eliminate vectors from the remaining $e_i$'s until exactly $n - m$ of them are left. Letting $e_{j_1}, \ldots, e_{j_{n-m}}$ be the remaining vectors and letting $x_{m+1} = e_{j_1}, \ldots, x_n = e_{j_{n-m}}$, we have completed the proof of the theorem. ∎
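The construction in the proof of Theorem 3.3.44 can be imitated numerically. The sketch below is a variant of that construction, not the elimination procedure of the proof itself: instead of discarding dependent vectors from $S_1$, it starts from the $y_i$ and adjoins a standard basis vector $e_j$ only when doing so raises the rank. It assumes that $X$ is the space of $n$-tuples of rational numbers; the names `rank` and `extend_to_basis` are illustrative.

```python
from fractions import Fraction

def rank(vectors):
    """Number of linearly independent vectors, by forward elimination."""
    rows = [[Fraction(a) for a in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        p = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def extend_to_basis(ys, n):
    """Extend the linearly independent list ys to a basis of n-tuples."""
    basis = [list(y) for y in ys]
    for j in range(n):
        e_j = [1 if k == j else 0 for k in range(n)]
        # Keep e_j only if it is not a combination of the vectors so far.
        if rank(basis + [e_j]) > len(basis):
            basis.append(e_j)
    return basis

# y_1 = (1, 1, 0) is completed by e_1 and e_3; e_2 = y_1 - e_1 is skipped.
basis = extend_to_basis([(1, 1, 0)], 3)
```

The first $m$ vectors of the result are the given $y_i$, as the theorem requires, and by Theorem 3.3.41 a linearly independent list of $n$ vectors is a basis.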
3.3.45. Corollary. Let $X$ be an $n$-dimensional vector space, and let $Y$ be an $m$-dimensional subspace of $X$. Then there exists a subspace $Z$ of $X$ of dimension $(n - m)$ such that $X = Y \oplus Z$.

3.3.46. Exercise. Prove Corollary 3.3.45.

Referring to Figure 3.3.8, it is easy to see that the subspace $Z$ in Corollary 3.3.45 need not be unique.
3.4. LINEAR TRANSFORMATIONS
Among the most important notions which we will encounter are special types of mappings on vector spaces, called linear transformations.

3.4.1. Definition. A mapping $T$ of a linear space $X$ into a linear space $Y$, where $X$ and $Y$ are vector spaces over the same field $F$, is called a linear transformation or linear operator provided that

(i) $T(x + y) = T(x) + T(y)$ for all $x, y \in X$; and
(ii) $T(\alpha x) = \alpha T(x)$ for all $x \in X$ and for all $\alpha \in F$.
A transformation which is not linear is called a non-linear transformation. We will find it convenient to write $T \in L(X, Y)$ to indicate that $T$ is a linear transformation from a linear space $X$ into a linear space $Y$ (i.e., $L(X, Y)$ denotes the set of all linear transformations from linear space $X$ into linear space $Y$).

It follows immediately from the above definition that $T$ is a linear transformation from a linear space $X$ into a linear space $Y$ if and only if

$$T\left(\sum_{i=1}^{n} \alpha_i x_i\right) = \sum_{i=1}^{n} \alpha_i T(x_i)$$

for all $x_i \in X$ and for all $\alpha_i \in F$, $i = 1, \ldots, n$. In engineering and science this is called the principle of superposition and is among the most important concepts in those disciplines.

3.4.2. Example. Let $X = Y$ denote the space of real-valued continuous functions on the interval $[a, b]$ as described in Example 3.1.19. Let $T: X \to Y$ be defined by

$$[Tx](t) = \int_a^t x(s)\,ds,$$

[...]
3.4.24. Exercise. Prove Theorem 3.4.23.
Our next result, which as we will see is of utmost importance, is sometimes called the fundamental theorem of linear equations.

3.4.25. Theorem. Let $T \in L(X, Y)$. If $X$ is finite dimensional, then

$$\dim \mathfrak{N}(T) + \dim \mathfrak{R}(T) = \dim X. \tag{3.4.26}$$

Proof. Let $\dim X = n$, let $\dim \mathfrak{N}(T) = s$, and let $r = n - s$. We must show that $\dim \mathfrak{R}(T) = r$.

First, let us assume that $0 < s < n$, and let $\{e_1, e_2, \ldots, e_n\}$ be a basis for $X$ chosen in such a way that the last $s$ vectors, $e_{r+1}, e_{r+2}, \ldots, e_n$, form a basis for the linear subspace $\mathfrak{N}(T)$ (see Theorem 3.3.44). Then the vectors $Te_1, Te_2, \ldots, Te_r, Te_{r+1}, \ldots, Te_n$ generate the linear subspace $\mathfrak{R}(T)$. But $e_{r+1}, e_{r+2}, \ldots, e_n$ are vectors in $\mathfrak{N}(T)$, and thus $Te_{r+1} = 0, \ldots, Te_n = 0$. From this it now follows that the vectors $Te_1, Te_2, \ldots, Te_r$ must generate $\mathfrak{R}(T)$. Now let $f_1 = Te_1, f_2 = Te_2, \ldots, f_r = Te_r$. We must show that the vectors $\{f_1, f_2, \ldots, f_r\}$ are linearly independent and as such form a basis for $\mathfrak{R}(T)$.

Next, we observe that $\gamma_1 f_1 + \gamma_2 f_2 + \cdots + \gamma_r f_r \in \mathfrak{R}(T)$. If the $\gamma_1, \gamma_2, \ldots, \gamma_r$ are chosen in such a fashion that $\gamma_1 f_1 + \gamma_2 f_2 + \cdots + \gamma_r f_r = 0$, then

$$0 = \gamma_1 f_1 + \gamma_2 f_2 + \cdots + \gamma_r f_r = \gamma_1 Te_1 + \gamma_2 Te_2 + \cdots + \gamma_r Te_r = T(\gamma_1 e_1 + \gamma_2 e_2 + \cdots + \gamma_r e_r),$$

and from this it follows that $x = \gamma_1 e_1 + \gamma_2 e_2 + \cdots + \gamma_r e_r \in \mathfrak{N}(T)$. Now, by assumption, the set $\{e_{r+1}, \ldots, e_n\}$ is a basis for $\mathfrak{N}(T)$. Thus there must exist scalars $\gamma_{r+1}, \gamma_{r+2}, \ldots, \gamma_n$ such that

$$\gamma_1 e_1 + \gamma_2 e_2 + \cdots + \gamma_r e_r = \gamma_{r+1} e_{r+1} + \cdots + \gamma_n e_n.$$

This can be rewritten as

$$\gamma_1 e_1 + \gamma_2 e_2 + \cdots + \gamma_r e_r - \gamma_{r+1} e_{r+1} - \cdots - \gamma_n e_n = 0.$$

But $\{e_1, e_2, \ldots, e_n\}$ is a basis for $X$. From this it follows that $\gamma_1 = \gamma_2 = \cdots = \gamma_r = \gamma_{r+1} = \cdots = \gamma_n = 0$. Hence, $f_1, f_2, \ldots, f_r$ are linearly independent and therefore $\dim \mathfrak{R}(T) = r$. If $s = 0$, the preceding proof remains valid if we let $\{e_1, \ldots, e_n\}$ be any basis for $X$ and ignore the remarks about the vectors $\{e_{r+1}, \ldots, e_n\}$. If $s = n$, then $\mathfrak{N}(T) = X$. Hence, $\mathfrak{R}(T) = \{0\}$ and so $\dim \mathfrak{R}(T) = 0$. This concludes the proof of the theorem. ∎

Our preceding result gives rise to the next definition.

3.4.27. Definition. The rank $\rho(T)$ of a linear transformation $T$ of a finite-dimensional vector space $X$ into a vector space $Y$ is the dimension of the range space $\mathfrak{R}(T)$. The nullity $\nu(T)$ of the linear transformation $T$ is the dimension of the nullspace $\mathfrak{N}(T)$.

The reader is now in a position to prove the next result.

3.4.28. Theorem. Let $T \in L(X, Y)$. Let $X$ be finite dimensional, and let $s = \dim \mathfrak{N}(T)$. Let $\{x_1, \ldots, x_s\}$ be a basis for $\mathfrak{N}(T)$. Then

(i) a vector $x \in X$ satisfies the equation $Tx = 0$ if and only if $x = \alpha_1 x_1 + \cdots + \alpha_s x_s$ for some set of scalars $\{\alpha_1, \ldots, \alpha_s\}$. Furthermore, for each $x \in X$ such that $Tx = 0$ is satisfied, the set of scalars $\{\alpha_1, \ldots, \alpha_s\}$ is unique;
(ii) if $y_0$ is a fixed vector in $Y$, then $Tx = y_0$ holds for at least one $x \in X$ (called a solution of the equation $Tx = y_0$) if and only if $y_0 \in \mathfrak{R}(T)$; and
(iii) if $y_0$ is any fixed vector in $Y$ and if $x_0$ is some vector in $X$ such that $Tx_0 = y_0$ (i.e., $x_0$ is a solution of the equation $Tx_0 = y_0$), then a vector $x \in X$ satisfies $Tx = y_0$ if and only if $x = x_0 + \beta_1 x_1 + \cdots + \beta_s x_s$ for some set of scalars $\{\beta_1, \beta_2, \ldots, \beta_s\}$. Furthermore, for each $x \in X$ such that $Tx = y_0$, the set of scalars $\{\beta_1, \beta_2, \ldots, \beta_s\}$ is unique.

3.4.29. Exercise. Prove Theorem 3.4.28.
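Equation (3.4.26) and part (i) of Theorem 3.4.28 can be checked on a small example. The sketch below assumes that $T$ maps $n$-tuples of rational numbers to $m$-tuples and is given by a matrix $A$ through $Tx = Ax$; matrix representations are only taken up in the next chapter, so this is an illustration under that assumption, and the name `rank_and_nullspace` is hypothetical. The function row-reduces $A$, counts the pivots to obtain the rank, and reads off a basis for the nullspace from the free columns.

```python
from fractions import Fraction

def rank_and_nullspace(A):
    """Return (rank, basis of the nullspace) for T x = A x over the rationals."""
    m, n = len(A), len(A[0])
    rows = [[Fraction(a) for a in row] for row in A]
    pivots, r = [], 0
    for col in range(n):
        p = next((i for i in range(r, m) if rows[i][col] != 0), None)
        if p is None:
            continue                 # free column: contributes to nullity
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][col]
        rows[r] = [a / pivot for a in rows[r]]
        for i in range(m):
            if i != r:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)           # set one free variable to 1
        for row_index, col in enumerate(pivots):
            v[col] = -rows[row_index][f]
        basis.append(v)
    return len(pivots), basis

# T maps 3-tuples to 2-tuples; the second row is twice the first.
A = [[1, 2, 3], [2, 4, 6]]
rho, null_basis = rank_and_nullspace(A)
nu = len(null_basis)                 # nullity; rho + nu equals dim X = 3
```

Here the rank is $1$ and the nullity is $2$, in agreement with Eq. (3.4.26), and every solution of $Tx = 0$ is a unique linear combination of the two basis vectors returned.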
Since a linear transformation $T$ of a linear space $X$ into a linear space $Y$ is a mapping, we can distinguish, as in Chapter 1, between linear transformations that are surjective (i.e., onto), injective (i.e., one-to-one), and bijective (i.e., onto and one-to-one). We will often be particularly interested in knowing when a linear transformation $T$ has an inverse, which we denote by $T^{-1}$. In this connection, the following terms are used interchangeably: $T^{-1}$ exists, $T$ has an inverse, $T$ is invertible, and $T$ is non-singular. Also, a linear transformation which is not non-singular is said to be singular. We recall, if $T$ has an inverse, then

$$T^{-1}(Tx) = x \text{ for all } x \in X \tag{3.4.30}$$

and

$$T(T^{-1}y) = y \text{ for all } y \in \mathfrak{R}(T). \tag{3.4.31}$$
The following theorem is a fundamental result concerning inverses of linear transformations.

3.4.32. Theorem. Let $T \in L(X, Y)$.
(i) The inverse of $T$ exists if and only if $Tx = 0$ implies $x = 0$.
(ii) If $T^{-1}$ exists, then $T^{-1}$ is a linear transformation from $\mathcal{R}(T)$ onto $X$.

Proof. To prove part (i), assume first that $Tx = 0$ implies $x = 0$. Let $x_1, x_2 \in X$ with $Tx_1 = Tx_2$. Then $T(x_1 - x_2) = 0$ and therefore $x_1 - x_2 = 0$. Thus, $x_1 = x_2$ and $T$ has an inverse. Conversely, assume that $T$ has an inverse. Let $Tx = 0$. Since $T0 = 0$, we have $T0 = Tx$. Since $T$ has an inverse, $x = 0$.

To prove part (ii), assume that $T^{-1}$ exists. To establish the linearity of $T^{-1}$, let $y_1, y_2 \in \mathcal{R}(T)$ and let $x_1, x_2 \in X$ be such that $y_1 = Tx_1$ and $y_2 = Tx_2$. Then
$$T^{-1}(y_1 + y_2) = T^{-1}(Tx_1 + Tx_2) = T^{-1}T(x_1 + x_2) = x_1 + x_2 = T^{-1}(y_1) + T^{-1}(y_2).$$
Also, for $\alpha \in F$ we have
$$T^{-1}(\alpha y_1) = T^{-1}(\alpha Tx_1) = T^{-1}(T(\alpha x_1)) = \alpha x_1 = \alpha T^{-1}(y_1).$$
Thus, $T^{-1}$ is linear. It is also a mapping onto $X$, since every $y \in \mathcal{R}(T)$ is the image of some $x \in X$. For, if $x \in X$, then there is a $y \in \mathcal{R}(T)$ such that $Tx = y$. Hence, $x = T^{-1}y$ and $x \in \mathcal{R}(T^{-1})$. ∎

3.4.33. Example. Consider the linear transformation $T: R^2 \to R^\infty$ of Example 3.4.22. Since $Tx = 0$ implies $x = 0$, $T$ has an inverse. We see that $T$ is not a mapping of $R^2$ onto $R^\infty$; however, $T$ is clearly a one-to-one mapping of $R^2$ onto $\mathcal{R}(T)$. ∎

For finite-dimensional vector spaces we have:

3.4.34. Theorem. Let $T \in L(X, Y)$. If $X$ is finite dimensional, $T$ has an inverse if and only if $\mathcal{R}(T)$ has the same dimension as $X$; i.e., $\rho(T) = \dim X$.

Proof.
By Theorem 3.4.25 we have
$$\dim \mathcal{N}(T) + \dim \mathcal{R}(T) = \dim X.$$
Since $T$ has an inverse if and only if $\mathcal{N}(T) = \{0\}$, it follows that $\rho(T) = \dim X$ if and only if $T$ has an inverse. ∎
For finite-dimensional linear spaces we also have:

3.4.35. Theorem. Let $X$ and $Y$ be finite-dimensional vector spaces of the same dimension, say $\dim X = \dim Y = n$. Let $T \in L(X, Y)$. Then $\mathcal{R}(T) = Y$ if and only if $T$ has an inverse.

Proof. Assume that $T$ has an inverse. By Theorem 3.4.34 we know that $\dim \mathcal{R}(T) = n$. Thus, $\dim \mathcal{R}(T) = \dim Y$ and it follows from Theorem 3.3.39, part (vii), that $\mathcal{R}(T) = Y$.

Conversely, assume that $\mathcal{R}(T) = Y$. Let $\{y_1, y_2, \ldots, y_n\}$ be a basis for $\mathcal{R}(T)$. Let $x_i$ be such that $Tx_i = y_i$ for $i = 1, \ldots, n$. Then, by Theorem 3.4.23, the vectors $x_1, \ldots, x_n$ are linearly independent. Since the dimension of $X$ is $n$, it follows that the vectors $x_1, \ldots, x_n$ span $X$. Now let $Tx = 0$ for some $x \in X$. We can represent $x$ as $x = \alpha_1 x_1 + \cdots + \alpha_n x_n$. Hence, $0 = Tx = \alpha_1 y_1 + \cdots + \alpha_n y_n$. Since the vectors $y_1, \ldots, y_n$ are linearly independent, we must have $\alpha_1 = \cdots = \alpha_n = 0$, and thus $x = 0$. This implies that $T$ has an inverse. ∎
At this point we find it instructive to summarize the preceding results which characterize injective, surjective, and bijective linear transformations. In so doing, it is useful to keep Figure J in mind.
[3.4.36. Figure J. Linear transformation $T$ from vector space $X$ into vector space $Y$.]
3.4.37. Summary (Injective Linear Transformations). Let $X$ and $Y$ be vector spaces over the same field $F$, and let $T \in L(X, Y)$. The following are equivalent:
(i) $T$ is injective;
(ii) $T$ has an inverse;
(iii) $Tx = 0$ implies $x = 0$;
(iv) for each $y \in \mathcal{R}(T)$, there is a unique $x \in X$ such that $Tx = y$;
(v) if $Tx_1 = Tx_2$, then $x_1 = x_2$; and
(vi) if $x_1 \neq x_2$, then $Tx_1 \neq Tx_2$.
If $X$ is finite dimensional, then the following are equivalent:
(i) $T$ is injective; and
(ii) $\rho(T) = \dim X$.

3.4.38. Summary (Surjective Linear Transformations). Let $X$ and $Y$ be vector spaces over the same field $F$, and let $T \in L(X, Y)$. The following are equivalent:
(i) $T$ is surjective; and
(ii) for each $y \in Y$, there is an $x \in X$ such that $Tx = y$.
If $X$ and $Y$ are finite dimensional, then the following are equivalent:
(i) $T$ is surjective; and
(ii) $\dim Y = \rho(T)$.

3.4.39. Summary (Bijective Linear Transformations). Let $X$ and $Y$ be vector spaces over the same field $F$, and let $T \in L(X, Y)$. The following are equivalent:
(i) $T$ is bijective; and
(ii) for every $y \in Y$ there is a unique $x \in X$ such that $Tx = y$.
If $X$ and $Y$ are finite dimensional, then the following are equivalent:
(i) $T$ is bijective; and
(ii) $\dim X = \dim Y = \rho(T)$.

3.4.40. Summary (Injective, Surjective, and Bijective Linear Transformations). Let $X$ and $Y$ be finite-dimensional vector spaces over the same field $F$, let $\dim X = \dim Y$, and let $T \in L(X, Y)$. (Note: the dimensions agree if, e.g., $X = Y$.) The following are equivalent:
(i) $T$ is injective;
(ii) $T$ is surjective;
(iii) $T$ is bijective; and
(iv) $T$ has an inverse.

3.4.41. Exercise. Verify the assertions made in summaries (3.4.37)-(3.4.40).
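The finite-dimensional criteria in the summaries above can be tried out numerically. The following is a minimal sketch, not part of the text: it assumes that $T: F^n \to F^m$ is given by a matrix acting on coordinate tuples, with $F$ taken to be the rationals, and the helper names `rank` and `classify` are hypothetical. It computes $\rho(T)$, obtains $\nu(T)$ from Theorem 3.4.25, and tests $\rho(T) = \dim X$ and $\rho(T) = \dim Y$.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        # look for a row at or below r with a nonzero entry in this column
        pivot = next((i for i in range(r, n_rows) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def classify(matrix):
    """Return (rank, nullity, injective, surjective) for T: F^n -> F^m, Tx = Mx."""
    m, n = len(matrix), len(matrix[0])
    rho = rank(matrix)
    # nullity = dim X - rank; injective iff rank = dim X; surjective iff rank = dim Y
    return rho, n - rho, rho == n, rho == m


# (x1, x2) -> (x1, x2, x1 + x2): injective, not surjective
tall = [[1, 0], [0, 1], [1, 1]]
# (x1, x2, x3) -> (x1 + x3, x2): surjective, not injective
wide = [[1, 0, 1], [0, 1, 0]]
# dependent columns: neither injective nor surjective
singular = [[1, 2], [2, 4]]

print(classify(tall))      # (2, 0, True, False)
print(classify(wide))      # (2, 1, False, True)
print(classify(singular))  # (1, 1, False, False)
```

For the square matrix `singular`, injectivity and surjectivity fail together, as Summary 3.4.40 requires when $\dim X = \dim Y$.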
Let us next examine some of the properties of the set $L(X, Y)$, the set of all linear transformations from a vector space $X$ into a vector space $Y$. As before, we assume that $X$ and $Y$ are linear spaces over the same field $F$. Let $S, T \in L(X, Y)$, and define the sum of $S$ and $T$ by
$$(S + T)x \triangleq Sx + Tx \tag{3.4.42}$$
for all $x \in X$. Also, with $\alpha \in F$ and $T \in L(X, Y)$, define multiplication of $T$ by a scalar $\alpha$ as
$$(\alpha T)x \triangleq \alpha Tx \tag{3.4.43}$$
for all $x \in X$. It is an easy matter to show that $(S + T) \in L(X, Y)$ and also that $\alpha T \in L(X, Y)$. Let us further note that there exists a zero element in $L(X, Y)$, called the zero transformation and denoted by $0$, which is defined by
$$0x = 0 \tag{3.4.44}$$
for all $x \in X$. Moreover, to each $T \in L(X, Y)$ there corresponds a unique linear transformation $-T \in L(X, Y)$ defined by
$$(-T)x = -Tx \tag{3.4.45}$$
for all $x \in X$. In this case it follows trivially that $-T + T = 0$.

3.4.46. Exercise. Let $X$ be a finite-dimensional space, and let $T \in L(X, Y)$. Let $\{e_1, \ldots, e_n\}$ be a basis for $X$. Then $Te_i = 0$ for $i = 1, \ldots, n$ if and only if $T = 0$ (i.e., $T$ is the zero transformation).

With the above definitions it is now easy to establish the following result.

3.4.47. Theorem. Let $X$ and $Y$ be two linear spaces over the same field of scalars $F$, and let $L(X, Y)$ denote the set of all linear transformations from $X$ into $Y$. Then $L(X, Y)$ is itself a linear space over $F$, called the space of linear transformations (here, vector addition is defined by Eq. (3.4.42) and multiplication of vectors by scalars is defined by Eq. (3.4.43)).

3.4.48. Exercise. Prove Theorem 3.4.47.
Next, let us recall the definition of an algebra, considered in Chapter 2.

3.4.49. Definition. A set $X$ is called an algebra if it is a linear space and if in addition to each $x, y \in X$ there corresponds an element in $X$, denoted by $x \cdot y$ and called the product of $x$ times $y$, satisfying the following axioms:
(i) $x \cdot (y + z) = x \cdot y + x \cdot z$ for all $x, y, z \in X$;
(ii) $(x + y) \cdot z = x \cdot z + y \cdot z$ for all $x, y, z \in X$; and
(iii) $(\alpha x) \cdot (\beta y) = (\alpha\beta)(x \cdot y)$ for all $x, y \in X$ and for all $\alpha, \beta \in F$.
If in addition to the above,
(iv) $(x \cdot y) \cdot z = x \cdot (y \cdot z)$ for all $x, y, z \in X$,
then $X$ is called an associative algebra. If there exists an element $i \in X$ such that $i \cdot x = x \cdot i = x$ for every $x \in X$, then $i$ is called the identity of the algebra. It can be readily shown that if $i$ exists, then it is unique. Furthermore, if $x \cdot y = y \cdot x$ for all $x, y \in X$, then $X$ is said to be a commutative algebra. Finally, if $Y$ is a subset of $X$ ($X$ is an algebra) and (a) if $x + y \in Y$ whenever $x, y \in Y$, and (b) if $\alpha x \in Y$ whenever $\alpha \in F$ and $x \in Y$, and (c) if $x \cdot y \in Y$ whenever $x, y \in Y$, then $Y$ is called a subalgebra of $X$.

Now let us return to the subject on hand. Let $X$, $Y$, and $Z$ be linear spaces over $F$, and consider the vector spaces $L(X, Y)$ and $L(Y, Z)$. If $S \in L(Y, Z)$ and if $T \in L(X, Y)$, then we define the product $ST$ as the mapping of $X$ into $Z$ characterized by
$$(ST)x = S(Tx) \tag{3.4.50}$$
for all $x \in X$. The reader can readily verify that $ST \in L(X, Z)$. Next, let $X = Y = Z$. If $S, T, U \in L(X, X)$ and if $\alpha, \beta \in F$, then it is easily shown that
$$S(TU) = (ST)U, \tag{3.4.51}$$
$$S(T + U) = ST + SU, \tag{3.4.52}$$
$$(S + T)U = SU + TU, \tag{3.4.53}$$
and
$$(\alpha S)(\beta T) = (\alpha\beta)ST. \tag{3.4.54}$$
For example, to verify (3.4.52), we observe that
$$[S(T + U)]x = S[(T + U)x] = S[Tx + Ux] = (ST)x + (SU)x = (ST + SU)x$$
for all $x \in X$, and hence Eq. (3.4.52) follows. We emphasize at this point that, in general, commutativity of linear transformations does not hold; i.e., in general,
$$ST \neq TS. \tag{3.4.55}$$
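A small numerical illustration of relations (3.4.51), (3.4.52), and (3.4.55) may be helpful. The sketch below assumes that transformations on $R^2$ are represented by $2 \times 2$ integer matrices, so that the product $ST$ of (3.4.50) becomes a matrix product; the helpers `matmul` and `matadd` are illustrative names, not notation from the text.

```python
def matmul(A, B):
    """Matrix of the composite transformation x -> A(Bx)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def matadd(A, B):
    """Matrix of the sum of two transformations, entry by entry."""
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


S = [[0, 1], [0, 0]]
T = [[0, 0], [1, 0]]
U = [[1, 2], [3, 4]]

ST = matmul(S, T)
TS = matmul(T, S)
print(ST)  # [[1, 0], [0, 0]]
print(TS)  # [[0, 0], [0, 1]]

# (3.4.55): the two products differ
print(ST != TS)  # True
# (3.4.51): S(TU) = (ST)U
print(matmul(S, matmul(T, U)) == matmul(ST, U))  # True
# (3.4.52): S(T + U) = ST + SU
print(matmul(S, matadd(T, U)) == matadd(ST, matmul(S, U)))  # True
```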
There is a special mapping from a linear space $X$ into $X$, called the identity transformation, defined by
$$Ix = x \tag{3.4.56}$$
for all $x \in X$. We note that $I$ is linear, i.e., $I \in L(X, X)$, that $I \neq 0$ if and only if $X \neq \{0\}$, that $I$ is unique, and that
$$TI = IT = T \tag{3.4.57}$$
for all $T \in L(X, X)$. Also, we can readily verify that the transformation $\alpha I$, $\alpha \in F$, defined by
$$(\alpha I)x = \alpha Ix = \alpha x \tag{3.4.58}$$
is also a linear transformation.

The above discussion gives rise to the following result.

3.4.59. Theorem. The set of linear transformations of a linear space $X$ into $X$, denoted by $L(X, X)$, is an associative algebra with identity $I$. This algebra is, in general, not commutative.

We further have:

3.4.60. Theorem. Let $T \in L(X, X)$. If $T$ is bijective, then $T^{-1} \in L(X, X)$ and
$$T^{-1}T = TT^{-1} = I, \tag{3.4.61}$$
where $I$ denotes the identity transformation defined in Eq. (3.4.56).

3.4.62. Exercise. Prove Theorem 3.4.60.

For invertible linear transformations defined on finite-dimensional linear spaces we have the following result.

3.4.63. Theorem. Let $X$ be a finite-dimensional vector space, and let $T \in L(X, X)$. Then the following are equivalent:
(i) $T$ is invertible;
(ii) rank $T = \dim X$;
(iii) $T$ is one-to-one;
(iv) $T$ is onto; and
(v) $Tx = 0$ implies $x = 0$.

3.4.64. Exercise. Prove Theorem 3.4.63.
Bijective linear transformations are further characterized by our next result.

3.4.65. Theorem. Let $X$ be a linear space, and let $S, T, U \in L(X, X)$. Let $I \in L(X, X)$ denote the identity transformation.
(i) If $ST = US = I$, then $S$ is bijective and $S^{-1} = T = U$.
(ii) If $S$ and $T$ are bijective, then $ST$ is bijective, and $(ST)^{-1} = T^{-1}S^{-1}$.
(iii) If $S$ is bijective, then $(S^{-1})^{-1} = S$.
(iv) If $S$ is bijective, then $\alpha S$ is bijective and $(\alpha S)^{-1} = \frac{1}{\alpha} S^{-1}$ for all $\alpha \in F$ and $\alpha \neq 0$.

3.4.66. Exercise. Prove Theorem 3.4.65.
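Parts (ii) and (iv) of Theorem 3.4.65 can be spot-checked on a concrete case. The following sketch assumes invertible transformations on $R^2$ represented by $2 \times 2$ matrices with rational entries; `inv2` is a hypothetical helper using the adjugate formula, and the particular matrices are arbitrary choices.

```python
from fractions import Fraction


def matmul(A, B):
    """Matrix of the composite transformation x -> A(Bx)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def inv2(A):
    """Inverse of a 2x2 matrix via the adjugate formula, in exact arithmetic."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


S = [[1, 2], [3, 4]]
T = [[0, 1], [1, 1]]
ST = matmul(S, T)

# part (ii): (ST)^{-1} = T^{-1} S^{-1}; the order of the factors matters
print(inv2(ST) == matmul(inv2(T), inv2(S)))  # True
print(inv2(ST) == matmul(inv2(S), inv2(T)))  # False

# part (iv) with alpha = 3: (alpha S)^{-1} = (1/alpha) S^{-1}
alpha = 3
scaled = [[alpha * v for v in row] for row in S]
print(inv2(scaled) == [[v / alpha for v in row] for row in inv2(S)])  # True
```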
With the aid of the above concepts and results we can now construct certain classes of functions of linear transformations. Since relation (3.4.51) allows us to write the product of three or more linear transformations without the use of parentheses, we can define $T^n$, where $T \in L(X, X)$ and $n$ is a positive integer, as
$$T^n \triangleq \underbrace{T \cdot T \cdot \ldots \cdot T}_{n \text{ times}}. \tag{3.4.67}$$
Similarly, if $T^{-1}$ is the inverse of $T$, then we can define $T^{-m}$, where $m$ is a positive integer, as
$$T^{-m} \triangleq (T^{-1})^m = \underbrace{T^{-1} \cdot T^{-1} \cdot \ldots \cdot T^{-1}}_{m \text{ times}}. \tag{3.4.68}$$
It follows that
$$T^m \cdot T^n = \underbrace{(T \cdot T \cdot \ldots \cdot T)}_{m \text{ times}} \cdot \underbrace{(T \cdot T \cdot \ldots \cdot T)}_{n \text{ times}} = \underbrace{(T \cdot T \cdot \ldots \cdot T)}_{m+n \text{ times}} = T^{m+n} = \underbrace{(T \cdot T \cdot \ldots \cdot T)}_{n \text{ times}} \cdot \underbrace{(T \cdot T \cdot \ldots \cdot T)}_{m \text{ times}} = T^n \cdot T^m. \tag{3.4.69}$$
In a similar fashion we have
$$(T^m)^n = T^{mn} = T^{nm} = (T^n)^m \tag{3.4.70}$$
and
$$[\ldots] \tag{3.4.71}$$
where $m$ and $n$ are positive integers. Consistent with this notation we also have
$$T^1 = T \tag{3.4.72}$$
and
$$T^0 = I. \tag{3.4.73}$$
We are now in a position to consider polynomials of linear transformations. Thus, if $f(\lambda)$ is a polynomial, i.e.,
$$f(\lambda) = \alpha_0 + \alpha_1 \lambda + \cdots + \alpha_n \lambda^n, \tag{3.4.74}$$
where $\alpha_0, \ldots, \alpha_n \in F$, then by $f(T)$ we mean
$$f(T) = \alpha_0 I + \alpha_1 T + \cdots + \alpha_n T^n. \tag{3.4.75}$$
The reader is cautioned that the above concept can, in general, not be
extended to functions of two or more linear transformations, because linear transformations in general do not commute.

Next, we consider the important concept of isomorphic linear spaces. In Chapter 2 we encountered the notion of isomorphisms of groups and rings. We saw that such mappings, if they exist, preserve the algebraic properties of groups and rings. Thus, in many cases two algebraic systems (such as groups or rings) may differ only in the nature of the elements of the underlying set and may thus be considered as being the same in all other respects. We now extend this concept to linear spaces.

3.4.76. Definition. Let $X$ and $Y$ be vector spaces over the same field $F$. If there exists $T \in L(X, Y)$ such that $T$ is a one-to-one mapping of $X$ into $Y$, then $T$ is said to be an isomorphism of $X$ into $Y$. If, in addition, $T$ maps $X$ onto $Y$, then $X$ and $Y$ are said to be isomorphic.

Note that if $X$ and $Y$ are isomorphic, then clearly $Y$ and $X$ are isomorphic. Our next result shows that all $n$-dimensional linear spaces over the same field are isomorphic.

3.4.77. Theorem. Every $n$-dimensional vector space $X$ over a field $F$ is isomorphic to $F^n$.

Proof. Let $\{e_1, \ldots, e_n\}$ be a basis for $X$. Then every $x \in X$ has the unique representation $x = \xi_1 e_1 + \cdots + \xi_n e_n$, where $\{\xi_1, \xi_2, \ldots, \xi_n\}$ is a unique set of scalars (belonging to $F$). Now let us define a linear transformation $T$ from $X$ into $F^n$ by
$$Tx = (\xi_1, \xi_2, \ldots, \xi_n).$$
It is an easy matter to verify that $T$ is a linear transformation of $X$ onto $F^n$, and that it is one-to-one (the reader is invited to do so). Thus, $X$ is isomorphic to $F^n$. ∎

It is not difficult to establish the next result.

3.4.78. Theorem. Two finite-dimensional vector spaces $X$ and $Y$ over the same field $F$ are isomorphic if and only if $\dim X = \dim Y$.

3.4.79. Exercise. Prove Theorem 3.4.78.

Theorem 3.4.77 points out the importance of the spaces $R^n$ and $C^n$. Namely, every $n$-dimensional vector space over the field of real numbers is isomorphic to $R^n$ and every $n$-dimensional vector space over the field of complex numbers is isomorphic to $C^n$ (see Example 3.1.10).
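Returning briefly to the polynomials of a linear transformation defined in Eq. (3.4.75): once $X$ is identified with $F^n$, $f(T)$ can be evaluated mechanically. The sketch below assumes $T$ is represented by a square integer matrix and uses Horner's rule, which is only one convenient evaluation order and is not prescribed by the text; `poly_of` is a hypothetical name.

```python
def matmul(A, B):
    """Matrix of the composite transformation x -> A(Bx)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def poly_of(T, coeffs):
    """Evaluate f(T) = a0*I + a1*T + ... + an*T^n, with coeffs = [a0, ..., an]."""
    n = len(T)
    result = [[0] * n for _ in range(n)]
    for a in reversed(coeffs):
        # Horner step: result <- result * T + a * I
        result = matmul(result, T)
        for i in range(n):
            result[i][i] += a
    return result


T = [[2, 1], [0, 3]]

# g(lambda) = 1 + lambda, so g(T) = I + T
print(poly_of(T, [1, 1]))      # [[3, 1], [0, 4]]
# f(lambda) = 6 - 5*lambda + lambda^2 = (lambda - 2)(lambda - 3)
print(poly_of(T, [6, -5, 1]))  # [[0, 0], [0, 0]]
```

Only powers of the single transformation $T$ appear, so the order of the factors never matters here; this is exactly what fails for functions of two non-commuting transformations.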
3.5. LINEAR FUNCTIONALS

There is a special type of linear transformation which is so important that we give it a special name: linear functional. We showed in Example 3.1.7 that if $F$ is a field, then $F^n$ is a vector space over $F$. If, in particular, $n = 1$, then we may view $F$ as being a vector space over itself. This enables us to consider linear transformations of a vector space $X$ over $F$ into $F$.

3.5.1. Definition. Let $X$ be a vector space over a field $F$. A mapping $f$ of $X$ into $F$ is called a functional on $X$. If $f$ is a linear transformation of $X$ into $F$, then we call $f$ a linear functional on $X$.

We cite some specific examples of linear functionals.

3.5.2. Example. Consider the space $C[a, b]$. Then the mapping
$$f_1(x) = \int_a^b x(s)\,ds, \quad x \in C[a, b] \tag{3.5.3}$$
is a linear functional on $C[a, b]$. Also, the function defined by
$$f_2(x) = x(s_0), \quad x \in C[a, b], \quad s_0 \in [a, b] \tag{3.5.4}$$
is also a linear functional on $C[a, b]$. Furthermore, the mapping
$$f_3(x) = \int_a^b x(s)x_0(s)\,ds, \tag{3.5.5}$$
where $x_0$ is a fixed element of $C[a, b]$ and where $x$ is any element in $C[a, b]$, is also a linear functional on $C[a, b]$. ∎

3.5.6. Example. Let $X = F^n$, and denote $x \in X$ by $x = (\xi_1, \ldots, \xi_n)$. The mapping $f_4$ defined by
$$f_4(x) = \xi_1 \tag{3.5.7}$$
is a linear functional on $X$. A more general form of $f_4$ is as follows. Let $a = (\alpha_1, \ldots, \alpha_n) \in X$ be fixed and let $x = (\xi_1, \ldots, \xi_n)$ be an arbitrary element of $X$. It is readily shown that the function
$$f_5(x) = \sum_{i=1}^{n} \alpha_i \xi_i \tag{3.5.8}$$
is a linear functional on $X$. ∎
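The functionals (3.5.7) and (3.5.8) are easy to experiment with. The sketch below assumes $F$ is the field of rationals (integers are used for brevity) and represents elements of $F^n$ as Python lists; `make_functional` is a hypothetical helper. A check on sample vectors illustrates linearity but does not, of course, prove it.

```python
def make_functional(a):
    """Return f(x) = sum_i a[i] * x[i], the functional (3.5.8) for a fixed a."""
    def f(x):
        if len(x) != len(a):
            raise ValueError("x must have the same length as a")
        return sum(ai * xi for ai, xi in zip(a, x))
    return f


f4 = make_functional([1, 0, 0])   # picks out the first coordinate, as in (3.5.7)
f5 = make_functional([2, -1, 3])  # a general functional of the form (3.5.8)

x = [1, 4, -2]
y = [0, 5, 7]
alpha, beta = 3, -2
combo = [alpha * xi + beta * yi for xi, yi in zip(x, y)]

print(f4(x))  # 1
print(f5(x))  # -8
print(f5(y))  # 16
# linearity on this sample: f5(alpha*x + beta*y) = alpha*f5(x) + beta*f5(y)
print(f5(combo) == alpha * f5(x) + beta * f5(y))  # True
```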
3.5.9. Exercise. Show that the mappings (3.5.3), (3.5.4), (3.5.5), (3.5.7), and (3.5.8) are linear functionals.

Now let $X$ be a linear space and let $X^f$ denote the set of all linear functionals on $X$. If $f \in X^f$ is evaluated at a point $x \in X$, we write $f(x)$. Frequently we will also find the notation
$$f(x) \triangleq \langle x, f \rangle \tag{3.5.10}$$
useful. In addition to Eq. (3.5.10), the notation $x'(x)$ or $x'x$ is sometimes used. In this case Eq. (3.5.10) becomes
$$f(x) = \langle x, f \rangle = \langle x, x' \rangle, \tag{3.5.11}$$
where $x'$ is used in place of $f$. Now let $f_1 = x_1'$, $f_2 = x_2'$ belong to $X^f$, and let $\alpha \in F$. Let us define $f_1 + f_2 = x_1' + x_2'$ and $\alpha f = \alpha x'$ by

[...]
In the case of real linear spaces, the preceding characterization of inner product is identical, except, of course, that we omit conjugates in (i)-(iv). We are now in a position to introduce the concept of inner product space.

3.6.20. Definition. A complex (real) linear space $X$ on which a complex (real) inner product, $(\cdot, \cdot)$, is defined is called a complex (real) inner product space. In general, we denote this space by $\{X; (\cdot, \cdot)\}$. If the particular inner product is understood, we simply write $X$ to denote such a space (and we usually speak of an inner product space rather than a complex or real inner product space).

It should be noted that if two different inner products are defined on the same linear space $X$, say $(\cdot, \cdot)_1$ and $(\cdot, \cdot)_2$, then we have two different inner product spaces, namely, $\{X; (\cdot, \cdot)_1\}$ and $\{X; (\cdot, \cdot)_2\}$. Now let $\{X; (\cdot, \cdot)'\}$ be an inner product space, let $Y$ be a linear subspace of $X$, and let $(\cdot, \cdot)''$ denote the inner product on $Y$ induced by the inner product on $X$; i.e.,
$$(x, y)' = (x, y)'' \tag{3.6.21}$$
for all $x, y \in Y \subset X$. Then $\{Y; (\cdot, \cdot)''\}$ is an inner product space in its own right, and we say that $Y$ is an inner product subspace of $X$.

Using the concept of inner product, we are in a position to introduce the notion of orthogonality. We have:

3.6.22. Definition. Let $X$ be an inner product space. The vectors $x, y \in X$ are said to be orthogonal if $(x, y) = 0$. In this case we write $x \perp y$. If a vector $x \in X$ is orthogonal to every vector of a set $Y \subset X$, then $x$ is said to be orthogonal to set $Y$, and we write $x \perp Y$. If every vector of set $Y \subset X$ is orthogonal to every vector of set $Z \subset X$, then set $Y$ is said to be orthogonal to set $Z$, and we write $Y \perp Z$.

Clearly, if $x$ is orthogonal to $y$, then $y$ is orthogonal to $x$. Note that if $x \neq 0$, then it is not possible that $x \perp x$, because $(x, x) > 0$ for all $x \neq 0$. Also note that $0 \perp x$ for all $x \in X$.
Before closing the present section, let us consider a few specific examples.

3.6.23. Example. Let $X = R^n$. For $x = (\xi_1, \ldots, \xi_n) \in R^n$ and $y = (\eta_1, \ldots, \eta_n) \in R^n$, we can readily verify that
$$(x, y) = \sum_{i=1}^{n} \xi_i \eta_i$$
is an inner product, and $\{X; (\cdot, \cdot)\}$ is a real inner product space. ∎

3.6.24. Example. Let $X = C^n$. For $x = (\xi_1, \ldots, \xi_n) \in C^n$ and $y = (\eta_1, \ldots, \eta_n) \in C^n$, let
$$(x, y) = \sum_{i=1}^{n} \xi_i \bar{\eta}_i.$$
Then $(x, y)$ is an inner product and $\{X; (\cdot, \cdot)\}$ is a complex inner product space. ∎

3.6.25. Example. Let $X$ denote the space of continuous complex valued functions on the interval $[0, 1]$. The reader can readily show that for $f, g \in X$,
$$(f, g) = \int_0^1 f(t)\overline{g(t)}\,dt$$
is an inner product. Now consider the family of functions $\{f_n\}$ defined by
$$f_n(t) = e^{2\pi n i t}, \quad t \in [0, 1],$$
$n = 0, \pm 1, \pm 2, \ldots$. Clearly, $f_n \in X$ for all $n$. It is easily shown that $(f_m, f_n) = 0$ if $m \neq n$. Thus, $f_m \perp f_n$ if $m \neq n$. ∎

3.7. PROJECTIONS
In the present section we consider another special class of linear transformations, called projections. Such transformations, which utilize direct sums (introduced in Section 3.2) as their natural setting, will find wide applications in later parts of this book.

3.7.1. Definition. Let $X$ be the direct sum of linear spaces $X_1$ and $X_2$; i.e., let $X = X_1 \oplus X_2$. Let $x = x_1 + x_2$ be the unique representation of $x \in X$, where $x_1 \in X_1$ and $x_2 \in X_2$. We say that the projection on $X_1$ along $X_2$ is the transformation defined by
$$P(x) = x_1.$$

Referring to Figure L, we note that elements in the plane $X$ can uniquely be represented as $x = x_1 + x_2$, where $x_1 \in X_1$ and $x_2 \in X_2$ ($X_1$ and $X_2$ are one-dimensional linear spaces represented by the indicated lines intersecting at the origin 0). In this case, a projection $P$ can be defined as that transformation which maps every point $x$ in the plane $X$ onto the subspace $X_1$ along the subspace $X_2$.

[3.7.2. Figure L. Projection on $X_1$ along $X_2$.]

3.7.3. Theorem. Let $X$ be the direct sum of two linear subspaces $X_1$ and $X_2$, and let $P$ be the projection on $X_1$ along $X_2$. Then
(i) $P \in L(X, X)$;
(ii) $\mathcal{R}(P) = X_1$; and
(iii) $\mathcal{N}(P) = X_2$.
Proof. To prove the first part, note that if $x = x_1 + x_2$ and $y = y_1 + y_2$, where $x_1, y_1 \in X_1$ and $x_2, y_2 \in X_2$, then clearly
$$P(\alpha x + \beta y) = P(\alpha x_1 + \alpha x_2 + \beta y_1 + \beta y_2) = \alpha x_1 + \beta y_1 = \alpha P(x_1) + \beta P(y_1) = \alpha P(x_1 + x_2) + \beta P(y_1 + y_2) = \alpha P(x) + \beta P(y),$$
and therefore $P$ is a linear transformation.

To prove the second part of the theorem, we note that from the definition of $P$ it follows that $\mathcal{R}(P) \subset X_1$. Now assume that $x_1 \in X_1$. Then $Px_1 = x_1$, and thus $x_1 \in \mathcal{R}(P)$. This implies that $X_1 \subset \mathcal{R}(P)$ and proves that $\mathcal{R}(P) = X_1$.

To prove the last part of the theorem, let $x_2 \in X_2$. Then $Px_2 = 0$ so that $X_2 \subset \mathcal{N}(P)$. On the other hand, if $x \in \mathcal{N}(P)$, then $Px = 0$. Since $x = x_1 + x_2$, where $x_1 \in X_1$ and $x_2 \in X_2$, it follows that $x_1 = 0$ and $x \in X_2$. Thus, $X_2 \supset \mathcal{N}(P)$. Therefore, $X_2 = \mathcal{N}(P)$. ∎

Our next result enables us to characterize projections in an alternative way.

3.7.4. Theorem. Let $P \in L(X, X)$. Then $P$ is a projection on $\mathcal{R}(P)$ along $\mathcal{N}(P)$ if and only if $PP = P^2 = P$.
Proof. Assume that $P$ is the projection on the linear subspace $X_1$ of $X$ along the linear subspace $X_2$, where $X = X_1 \oplus X_2$. By the preceding theorem, $X_1 = \mathcal{R}(P)$ and $X_2 = \mathcal{N}(P)$. For $x \in X$, we have $x = x_1 + x_2$, where $x_1 \in X_1$ and $x_2 \in X_2$. Then
$$P^2 x = P(Px) = Px_1 = x_1 = Px,$$
and thus $P^2 = P$.

Conversely, let us assume that $P^2 = P$. Let $X_2 = \mathcal{N}(P)$ and let $X_1 = \mathcal{R}(P)$. Clearly, $\mathcal{N}(P)$ and $\mathcal{R}(P)$ are linear subspaces of $X$. We must show that $X = \mathcal{R}(P) \oplus \mathcal{N}(P) = X_1 \oplus X_2$. In particular, we must show that $\mathcal{R}(P) \cap \mathcal{N}(P) = \{0\}$ and that $\mathcal{R}(P)$ and $\mathcal{N}(P)$ span $X$. Now if $y \in \mathcal{R}(P)$ there exists an $x \in X$ such that $Px = y$. Thus, $P^2 x = Py = Px = y$. If $y \in \mathcal{N}(P)$ then $Py = 0$. Thus, if $y$ is in both $\mathcal{R}(P)$ and $\mathcal{N}(P)$, then we must have $y = 0$; i.e., $\mathcal{R}(P) \cap \mathcal{N}(P) = \{0\}$. Next, let $x$ be an arbitrary element in $X$. Then we have
$$x = Px + (I - P)x.$$
Letting $Px = x_1$ and $(I - P)x = x_2$, we have $Px_1 = P^2 x = Px = x_1$ and also $Px_2 = P(I - P)x = Px - P^2 x = Px - Px = 0$; i.e., $x_1 \in X_1$ and $x_2 \in X_2$. From this it follows that $X = X_1 \oplus X_2$ and that the projection on $X_1$ along $X_2$ is $P$. ∎

The preceding result gives rise to the following:

3.7.5. Definition. Let $P \in L(X, X)$. Then $P$ is said to be idempotent if $P^2 = P$.

Now let $P$ be the projection on a linear subspace $X_1$ along a linear subspace $X_2$. Then the projection on $X_2$ along $X_1$ is characterized in the following way.

3.7.6. Theorem. A linear transformation $P$ is a projection on a linear subspace if and only if $(I - P)$ is a projection. If $P$ is the projection on $X_1$ along $X_2$, then $(I - P)$ is the projection on $X_2$ along $X_1$.

3.7.7. Exercise. Prove Theorem 3.7.6.
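Definition 3.7.1 and Theorems 3.7.4 and 3.7.6 can be illustrated in the plane. The sketch below makes several assumptions that are not in the text: $X = R^2$ with rational coordinates, $X_1$ spanned by $u = (1, 0)$, $X_2$ spanned by $v = (1, 1)$, and hypothetical helpers `decompose` and `projection`, with the decomposition $x = x_1 + x_2$ found by Cramer's rule.

```python
from fractions import Fraction


def decompose(x, u, v):
    """Return (s, t) with x = s*u + t*v, for linearly independent u, v in the plane."""
    det = Fraction(u[0] * v[1] - u[1] * v[0])
    if det == 0:
        raise ValueError("u and v must be linearly independent")
    s = (x[0] * v[1] - x[1] * v[0]) / det
    t = (u[0] * x[1] - u[1] * x[0]) / det
    return s, t


def projection(u, v):
    """Return the projection on span{u} along span{v} as a Python function."""
    def P(x):
        s, _ = decompose(x, u, v)  # keep the component along u, drop the one along v
        return (s * u[0], s * u[1])
    return P


u, v = (1, 0), (1, 1)
P = projection(u, v)  # projection on X1 along X2
Q = projection(v, u)  # projection on X2 along X1

x = (3, 5)
print(P(x) == (-2, 0))    # True: x = -2*u + 5*v
print(Q(x) == (5, 5))     # True
print(P(P(x)) == P(x))    # True: P is idempotent on this x
# (I - P)x coincides with the projection on X2 along X1
print((x[0] - P(x)[0], x[1] - P(x)[1]) == Q(x))  # True
```

Note that $P$ here is not an orthogonal projection in the usual inner product on $R^2$, since $u$ and $v$ are not orthogonal.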
In view of the preceding results there is no ambiguity in simply saying a transformation $P$ is a projection (rather than $P$ is a projection on $X_1$ along $X_2$). We emphasize here that if $P$ is a projection, then
$$X = \mathcal{R}(P) \oplus \mathcal{N}(P). \tag{3.7.8}$$
This is not necessarily the case for arbitrary linear transformations $T \in L(X, X)$ for, in general, $\mathcal{R}(T)$ and $\mathcal{N}(T)$ need not be disjoint. For example, if there exists a vector $x \in X$ such that $Tx \neq 0$ and such that $T^2 x = 0$, then $Tx \in \mathcal{R}(T)$ and $Tx \in \mathcal{N}(T)$.

Let us now consider:

3.7.9. Definition. Let $T \in L(X, X)$. A linear subspace $Y$ of a vector space $X$ is said to be invariant under the linear transformation $T$ if $y \in Y$ implies that $Ty \in Y$.

Note that this definition does not imply that every element in $Y$ can be written in the form $z = Ty$, with $y \in Y$. It is not even assumed that $Ty \in Y$ implies $y \in Y$. For invariant subspaces under a transformation $T \in L(X, X)$ we can readily prove the following result.

3.7.10. Theorem. Let $T \in L(X, X)$. Then
(i) $X$ is an invariant subspace under $T$;
(ii) $\{0\}$ is an invariant subspace under $T$;
(iii) $\mathcal{R}(T)$ is an invariant subspace under $T$; and
(iv) $\mathcal{N}(T)$ is an invariant subspace under $T$.

3.7.11. Exercise. Prove Theorem 3.7.10.

Next we consider:

3.7.12. Definition. Let $X$ be a linear space which is the direct sum of two linear subspaces $Y$ and $Z$; i.e., $X = Y \oplus Z$. If $Y$ and $Z$ are both invariant under a linear transformation $T$, then $T$ is said to be reduced by $Y$ and $Z$.

We are now in a position to prove the following result.

3.7.13. Theorem. Let $Y$ and $Z$ be two linear subspaces of a vector space $X$ such that $X = Y \oplus Z$. Let $T \in L(X, X)$. Then $T$ is reduced by $Y$ and $Z$ if and only if $PT = TP$, where $P$ is the projection on $Y$ along $Z$.
Proof. Assume that $PT = TP$. If $y \in Y$, then $Ty = TPy = PTy$ so that $Ty \in Y$ and $Y$ is invariant under $T$. Now let $y \in Z$. Then $Py = 0$ and $PTy = TPy = T0 = 0$. Thus, $Ty \in Z$ and $Z$ is also invariant under $T$. Hence, $T$ is reduced by $Y$ and $Z$.

Conversely, let us assume that $T$ is reduced by $Y$ and $Z$. If $x \in X$, then $x = y + z$, where $y \in Y$ and $z \in Z$. Then $Px = y$ and $TPx = Ty \in Y$. Hence, $PTPx = Ty = TPx$; i.e.,
$$PTPx = TPx \tag{3.7.14}$$
for all $x \in X$. On the other hand, since $Y$ and $Z$ are invariant under $T$, we have $Tx = Ty + Tz$ with $Ty \in Y$ and $Tz \in Z$. Hence, $PTx = Ty = PTy = PTPx$; i.e.,
$$PTPx = PTx \tag{3.7.15}$$
for all $x \in X$. Equations (3.7.14) and (3.7.15) imply that $PT = TP$. ∎

We close the present section by considering the following special type of projection.

3.7.16. Definition. A projection $P$ on an inner product space $X$ is said to be an orthogonal projection if the range of $P$ and the null space of $P$ are orthogonal; i.e., if $\mathcal{R}(P) \perp \mathcal{N}(P)$.

We will consider examples and additional properties of projections in much greater detail in Chapters 4 and 7.
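The criterion $PT = TP$ of Theorem 3.7.13 is easy to test on small cases. The sketch below assumes $X = R^2$, $Y$ spanned by $(1, 0)$, $Z$ spanned by $(1, 1)$, and matrix representations with respect to the natural basis, so that the projection on $Y$ along $Z$ has the matrix `P` shown; the two sample transformations are arbitrary illustrative choices.

```python
def matmul(A, B):
    """Matrix of the composite transformation x -> A(Bx)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def apply(A, x):
    """Image of the coordinate tuple x under the matrix A."""
    return tuple(sum(a * xi for a, xi in zip(row, x)) for row in A)


# projection on Y = span{(1, 0)} along Z = span{(1, 1)}: (a, b) -> (a - b, 0)
P = [[1, -1], [0, 0]]

# T_reduced maps (1, 0) to 2*(1, 0) and (1, 1) to 3*(1, 1): Y and Z are both invariant
T_reduced = [[2, 1], [0, 3]]
# T_swap interchanges the coordinates: (1, 0) goes to (0, 1), which is not in Y
T_swap = [[0, 1], [1, 0]]

print(apply(T_reduced, (1, 0)))  # (2, 0)
print(apply(T_reduced, (1, 1)))  # (3, 3)
print(matmul(P, T_reduced) == matmul(T_reduced, P))  # True
print(matmul(P, T_swap) == matmul(T_swap, P))        # False
```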
3.8. NOTES AND REFERENCES

The material of the present chapter as well as that of the next chapter is usually referred to as linear algebra. Thus, these two chapters should be viewed as one package. For this reason, applications (dealing with ordinary differential equations) are presented at the end of the next chapter. There are many textbooks and reference works dealing with vector spaces and linear transformations. Some of these which we have found to be very useful are cited in the references for this chapter. The reader should consult these for further study.

REFERENCES
[3.1] P. R. HALMOS, Finite Dimensional Vector Spaces. Princeton, N.J.: D. Van Nostrand Company, Inc., 1958.
[3.2] K. HOFFMAN and R. KUNZE, Linear Algebra. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1971.
[3.3] A. W. NAYLOR and G. R. SELL, Linear Operator Theory in Engineering and Science. New York: Holt, Rinehart and Winston, 1971.
[3.4] A. E. TAYLOR, Introduction to Functional Analysis. New York: John Wiley & Sons, Inc., 1966.
4
FINITE-DIMENSIONAL VECTOR SPACES AND MATRICES

In the present chapter we examine some of the properties of finite-dimensional linear spaces. We will show how elements of such spaces are represented by coordinate vectors and how linear transformations on such spaces are represented by means of matrices. We then will study some of the important properties of matrices. Also, we will investigate in some detail a special type of vector space, called the Euclidean space. This space is one of the most important spaces encountered in applied mathematics. Throughout this chapter $\{\alpha_1, \ldots, \alpha_n\}$, $\alpha_i \in F$, and $\{x_1, \ldots, x_n\}$, $x_i \in X$, denote an indexed set of scalars and an indexed set of vectors, respectively.

4.1. COORDINATE REPRESENTATION OF VECTORS
Let $X$ be a finite-dimensional linear space over a field $F$, and let $\{x_1, \ldots, x_n\}$ be a basis for $X$. Now if $x \in X$, then according to Theorem 3.3.25 and Definition 3.3.36, there exist unique scalars $\xi_1, \ldots, \xi_n$, called coordinates of $x$ with respect to this basis, such that
$$x = \xi_1 x_1 + \cdots + \xi_n x_n. \tag{4.1.1}$$
This enables us to represent $x$ unambiguously in terms of its coordinates as
$$\mathbf{x} = \begin{bmatrix} \xi_1 \\ \xi_2 \\ \vdots \\ \xi_n \end{bmatrix} \tag{4.1.2}$$
or as
$$\mathbf{x}^T = (\xi_1, \xi_2, \ldots, \xi_n). \tag{4.1.3}$$
We call $\mathbf{x}$ (or $\mathbf{x}^T$) the coordinate representation of the underlying object (vector) $x$ with respect to the basis $\{x_1, \ldots, x_n\}$. We call $\mathbf{x}$ a column vector and $\mathbf{x}^T$ a row vector. Also, we say that $\mathbf{x}^T$ is the transpose vector, or simply the transpose of the vector $\mathbf{x}$. Furthermore, we define $(\mathbf{x}^T)^T$ to be $\mathbf{x}$.

It is important to note that in the coordinate representation (4.1.2) or (4.1.3) of the vector (4.1.1), an "ordering" of the basis $\{x_1, \ldots, x_n\}$ is employed (i.e., the coefficient of $x_i$ is the $i$th entry in Eqs. (4.1.2) and (4.1.3)). If the members of this basis were to be relabeled, thus specifying a different "ordering," then the corresponding coordinate representation of the vector $x$ would have to be altered to reflect this change. However, this does not pose any difficulties, because in a given discussion we will always agree on a particular "ordering" of the basis vectors.

Now let $\alpha \in F$. Then
$$\alpha x = \alpha(\xi_1 x_1 + \cdots + \xi_n x_n) = [\ldots]$$

[...] (4.1.18)
respectively. We call the coordinates in Eq. (4.1.17) the natural coordinates of $x \in R^n$. (The natural basis for $F^n$ and the natural coordinates of $x \in F^n$ are similarly defined.)

Next, consider the set of vectors $\{v_1, \ldots, v_n\}$, given by $v_1 = (1, 0, \ldots, 0)$, $v_2 = (1, 1, 0, \ldots, 0)$, ..., $v_n = (1, \ldots, 1)$. We see that the vectors $\{v_1, \ldots, v_n\}$ form a basis for $R^n$. We can express the vector $x$ given in Eq. (4.1.16) in terms of this basis by
$$x = \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n, \tag{4.1.19}$$
where $\alpha_n = \xi_n$ and $\alpha_i = \xi_i - \xi_{i+1}$ for $i = 1, 2, \ldots, n - 1$. Thus, the coordinate representation of $x$ relative to $\{v_1, \ldots, v_n\}$ is given by
$$\begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_{n-1} \\ \alpha_n \end{bmatrix} = \begin{bmatrix} \xi_1 - \xi_2 \\ \xi_2 - \xi_3 \\ \vdots \\ \xi_{n-1} - \xi_n \\ \xi_n \end{bmatrix}. \tag{4.1.20}$$
Hence, we have represented the same vector $x \in R^n$ by two different coordinate vectors with respect to two different bases for $R^n$. ∎

4.1.21. Example. Let $X = C[a, b]$, the set of all real-valued continuous functions on the interval $[a, b]$. Let $Y = \{x_0, x_1, \ldots, x_n\} \subset X$, where $x_0(t) = 1$ and $x_i(t) = t^i$ for all $t \in [a, b]$, $i = 1, \ldots, n$. As we saw in Exercise 3.3.13, $Y$ is a linearly independent set in $X$ and as such it is a basis for $V(Y)$.
Hence, for any $y \in V(Y)$ there exists a unique set of scalars $\{\eta_0, \eta_1, \ldots, \eta_n\}$ such that
$$y = \eta_0 x_0 + \cdots + \eta_n x_n. \tag{4.1.22}$$
Since $y$ is a polynomial in $t$ we can write, more explicitly,
$$y(t) = \eta_0 + \eta_1 t + \cdots + \eta_n t^n, \quad t \in [a, b]. \tag{4.1.23}$$
In the present example there is also a coordinate representation; i.e., we can represent $y \in V(Y)$ by
$$\mathbf{y} = \begin{bmatrix} \eta_0 \\ \eta_1 \\ \vdots \\ \eta_n \end{bmatrix}. \tag{4.1.24}$$
This representation is with respect to the basis $\{x_0, x_1, \ldots, x_n\}$ in $V(Y)$. We could, of course, also have used another basis for $V(Y)$. For example, let us choose the basis $\{z_0, z_1, \ldots, z_n\}$ for $V(Y)$ given in Exercise 3.3.13. Then we have
$$y = \alpha_0 z_0 + \alpha_1 z_1 + \cdots + \alpha_n z_n, \tag{4.1.25}$$
where $\alpha_n = \eta_n$ and $\alpha_i = \eta_i - \eta_{i+1}$, $i = 0, 1, \ldots, n - 1$. Thus, $y \in V(Y)$ may also be represented with respect to the basis $\{z_0, z_1, \ldots, z_n\}$ by
$$\begin{bmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_{n-1} \\ \alpha_n \end{bmatrix} = \begin{bmatrix} \eta_0 - \eta_1 \\ \eta_1 - \eta_2 \\ \vdots \\ \eta_{n-1} - \eta_n \\ \eta_n \end{bmatrix}. \tag{4.1.26}$$
Thus, two different coordinate vectors were used above in representing the same vector $y \in V(Y)$ with respect to two different bases for $V(Y)$. ∎

Summarizing, we observe:
1. Every vector $x$ belonging to an $n$-dimensional linear space $X$ over a field $F$ can be represented in terms of a coordinate vector $\mathbf{x}$, or its transpose $\mathbf{x}^T$, with respect to a given basis $\{e_1, \ldots, e_n\} \subset X$. We note that $\mathbf{x}^T \in F^n$ (the space $F^n$ is defined in Example 3.1.7). By convention we will henceforth also write $\mathbf{x} \in F^n$. To indicate the coordinate representation of $x \in X$ by $\mathbf{x} \in F^n$, we write $x \sim \mathbf{x}$.
2. In representing $x$ by $\mathbf{x}$, an "ordering" of the basis $\{e_1, \ldots, e_n\} \subset X$ is implied.
3. Usage of different bases for $X$ results in different coordinate representations of $x \in X$.

4.2. MATRICES
In this section we will first concern ourselves with the representation of linear transformations on finite-dimensional vector spaces. Such representations of linear transformations are called matrices. We will then examine the properties of matrices in great detail. Throughout the present section X will denote an n-dimensional vector space and Yan m-dimensional vector space over the same field .F
A. Representation of Linear Transformations by Matrices

We first prove the following result.

4.2.1. Theorem. Let $\{e_1, e_2, \ldots, e_n\}$ be a basis for a linear space $X$.

(i) Let $A$ be a linear transformation from $X$ into vector space $Y$ and set $e'_1 = Ae_1$, $e'_2 = Ae_2$, ..., $e'_n = Ae_n$. If $x$ is any vector in $X$ and if $(\xi_1, \xi_2, \ldots, \xi_n)$ are the coordinates of $x$ with respect to $\{e_1, e_2, \ldots, e_n\}$, then $Ax = \xi_1 e'_1 + \xi_2 e'_2 + \cdots + \xi_n e'_n$.
(ii) Let $\{e'_1, e'_2, \ldots, e'_n\}$ be any set of vectors in $Y$. Then there exists a unique linear transformation $A$ from $X$ into $Y$ such that $Ae_1 = e'_1$, $Ae_2 = e'_2$, ..., $Ae_n = e'_n$.

Proof. To prove (i) we note that
$$Ax = A(\xi_1 e_1 + \xi_2 e_2 + \cdots + \xi_n e_n) = \xi_1 Ae_1 + \xi_2 Ae_2 + \cdots + \xi_n Ae_n = \xi_1 e'_1 + \xi_2 e'_2 + \cdots + \xi_n e'_n.$$
To prove (ii), we first observe that for each $x \in X$ we have unique scalars $\xi_1, \xi_2, \ldots, \xi_n$ such that
$$x = \xi_1 e_1 + \xi_2 e_2 + \cdots + \xi_n e_n.$$
Now define a mapping $A$ from $X$ into $Y$ as
$$A(x) = \xi_1 e'_1 + \xi_2 e'_2 + \cdots + \xi_n e'_n.$$
Clearly, $A(e_i) = e'_i$ for $i = 1, \ldots, n$. We first must show that $A$ is linear. Given $x = \xi_1 e_1 + \xi_2 e_2 + \cdots + \xi_n e_n$ and $y = \eta_1 e_1 + \eta_2 e_2 + \cdots + \eta_n e_n$, we have
$$A(x + y) = A[(\xi_1 + \eta_1)e_1 + \cdots + (\xi_n + \eta_n)e_n] = (\xi_1 + \eta_1)e'_1 + \cdots + (\xi_n + \eta_n)e'_n.$$
On the other hand,
$$A(x) = \xi_1 e'_1 + \xi_2 e'_2 + \cdots + \xi_n e'_n \quad \text{and} \quad A(y) = \eta_1 e'_1 + \eta_2 e'_2 + \cdots + \eta_n e'_n.$$
Thus,
$$A(x) + A(y) = (\xi_1 + \eta_1)e'_1 + (\xi_2 + \eta_2)e'_2 + \cdots + (\xi_n + \eta_n)e'_n = A(x + y).$$
In an identical way we establish that
$$\alpha A(x) = A(\alpha x)$$
for all $x \in X$ and all $\alpha \in F$. It thus follows that $A \in L(X, Y)$. To show that $A$ is unique, suppose there exists a $B \in L(X, Y)$ such that $Be_i = e'_i$ for $i = 1, \ldots, n$. It follows that $(A - B)e_i = 0$ for all $i = 1, \ldots, n$, and thus it follows from Exercise 3.4.64 that $A = B$. ∎
We point out that part (i) of Theorem 4.2.1 implies that a linear transformation is completely determined by knowing how it transforms the basis vectors in its domain, and part (ii) of Theorem 4.2.1 states that this linear transformation is uniquely determined in this way. We will utilize these facts in the following.

Now let $X$ be an n-dimensional vector space, and let $\{e_1, e_2, \ldots, e_n\}$ be a basis for $X$. Let $Y$ be an m-dimensional vector space, and let $\{f_1, f_2, \ldots, f_m\}$ be a basis for $Y$. Let $A \in L(X, Y)$, and let $e'_i = Ae_i$ for $i = 1, \ldots, n$. Since $\{f_1, f_2, \ldots, f_m\}$ is a basis for $Y$, there are unique scalars $\{a_{ij}\}$, $i = 1, \ldots, m$, $j = 1, \ldots, n$, such that
$$\begin{aligned}
Ae_1 = e'_1 &= a_{11}f_1 + a_{21}f_2 + \cdots + a_{m1}f_m, \\
Ae_2 = e'_2 &= a_{12}f_1 + a_{22}f_2 + \cdots + a_{m2}f_m, \\
&\;\;\vdots \\
Ae_n = e'_n &= a_{1n}f_1 + a_{2n}f_2 + \cdots + a_{mn}f_m.
\end{aligned} \tag{4.2.2}$$
Now let $x \in X$. Then $x$ has the unique representation
$$x = \xi_1 e_1 + \xi_2 e_2 + \cdots + \xi_n e_n$$
with respect to the basis $\{e_1, \ldots, e_n\}$. In view of part (i) of Theorem 4.2.1 we have
$$Ax = \xi_1 e'_1 + \cdots + \xi_n e'_n. \tag{4.2.3}$$
Since $Ax \in Y$, $Ax$ has a unique representation with respect to the basis $\{f_1, f_2, \ldots, f_m\}$, say,
$$Ax = \eta_1 f_1 + \eta_2 f_2 + \cdots + \eta_m f_m. \tag{4.2.4}$$
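Combining Equations (4.2.2) and (4.2.3), we have
$$\begin{aligned}
Ax = {} & \xi_1(a_{11}f_1 + \cdots + a_{m1}f_m) \\
& + \xi_2(a_{12}f_1 + \cdots + a_{m2}f_m) \\
& + \cdots \\
& + \xi_n(a_{1n}f_1 + \cdots + a_{mn}f_m).
\end{aligned}$$
Rearranging the last expression we have
$$\begin{aligned}
Ax = {} & (a_{11}\xi_1 + a_{12}\xi_2 + \cdots + a_{1n}\xi_n)f_1 \\
& + (a_{21}\xi_1 + a_{22}\xi_2 + \cdots + a_{2n}\xi_n)f_2 \\
& + \cdots \\
& + (a_{m1}\xi_1 + a_{m2}\xi_2 + \cdots + a_{mn}\xi_n)f_m.
\end{aligned}$$
However, in view of the uniqueness of the representation in Eq. (4.2.4) we have
$$\begin{aligned}
\eta_1 &= a_{11}\xi_1 + a_{12}\xi_2 + \cdots + a_{1n}\xi_n, \\
\eta_2 &= a_{21}\xi_1 + a_{22}\xi_2 + \cdots + a_{2n}\xi_n, \\
&\;\;\vdots \\
\eta_m &= a_{m1}\xi_1 + a_{m2}\xi_2 + \cdots + a_{mn}\xi_n.
\end{aligned} \tag{4.2.5}$$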
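As an illustration of Eqs. (4.2.2) and (4.2.5), the following Python sketch builds the array of scalars $a_{ij}$ for one particular transformation. The choice of transformation (differentiation of polynomials of degree at most 3, with bases $\{1, t, t^2, t^3\}$ and $\{1, t, t^2\}$) and the helper names `differentiate`, `matrix_of`, and `apply` are assumptions made for this example only; they do not come from the text.

```python
def differentiate(coords):
    """Coordinates of p(t) in {1, t, t^2, t^3} -> coordinates of p'(t) in {1, t, t^2}."""
    return [k * coords[k] for k in range(1, len(coords))]


def matrix_of(transform, n):
    """Return [a_ij]; column j holds the coordinates of transform(e_j), as in Eq. (4.2.2)."""
    columns = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]  # j-th basis vector in coordinates
        columns.append(transform(e_j))
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def apply(matrix, xi):
    """Compute eta_i = sum_j a_ij * xi_j, as in Eq. (4.2.5)."""
    return [sum(a * x for a, x in zip(row, xi)) for row in matrix]


A = matrix_of(differentiate, 4)   # a (3 x 4) array
xi = [5, -1, 2, 4]                # p(t) = 5 - t + 2t^2 + 4t^3
eta = apply(A, xi)                # coordinates of p'(t) = -1 + 4t + 12t^2
```

Only the images of the four basis vectors are used to build `A`, yet `apply(A, xi)` agrees with `differentiate(xi)` for every coordinate vector, which is the content of part (i) of Theorem 4.2.1.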
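This set of equations enables us to represent the linear transformation $A$ from linear space $X$ into linear space $Y$ by the unique scalars $\{a_{ij}\}$, $i = 1, \ldots, m$, $j = 1, \ldots, n$. For convenience we let
$$\mathbf{A} = [a_{ij}] = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}. \tag{4.2.6}$$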
We see that once the bases $\{e_1, e_2, \ldots, e_n\}$, $\{f_1, f_2, \ldots, f_m\}$ are fixed, we can represent the linear transformation $A$ by the array of scalars in Eq. (4.2.6), which are uniquely determined by Eq. (4.2.2). In view of part (ii) of Theorem 4.2.1, the converse to the preceding also holds. Specifically, with the bases for $X$ and $Y$ still fixed, the array given in Eq. (4.2.6) is uniquely associated with the linear transformation $A$ of $X$ into $Y$. The above discussion justifies the following important definition.

4.2.7. Definition. The array given in Eq. (4.2.6) is called the matrix $\mathbf{A}$ of the linear transformation $A$ from linear space $X$ into linear space $Y$ with respect to the basis $\{e_1, \ldots, e_n\}$ of $X$ and the basis $\{f_1, \ldots, f_m\}$ of $Y$.

If, in Definition 4.2.7, $X = Y$, and if for both $X$ and $Y$ the same basis $\{e_1, \ldots, e_n\}$ is used, then we simply speak of the matrix $\mathbf{A}$ of the linear transformation $A$ with respect to the basis $\{e_1, \ldots, e_n\}$.

In Eq. (4.2.6), the scalars $(a_{i1}, a_{i2}, \ldots, a_{in})$ form the ith row of $\mathbf{A}$ and the scalars $(a_{1j}, a_{2j}, \ldots, a_{mj})$ form the jth column of $\mathbf{A}$. The scalar $a_{ij}$ refers to that element of matrix $\mathbf{A}$ which can be found in the ith row and jth column of $\mathbf{A}$. The array in Eq. (4.2.6) is said to be an $(m \times n)$ matrix. If $m = n$, we speak of a square matrix (i.e., an $(n \times n)$ matrix). In accordance with our discussion of Section 4.1, an $(n \times 1)$ matrix is called a column vector, column matrix, or n-vector, and a $(1 \times n)$ matrix is called a row vector. We say that two $(m \times n)$ matrices $\mathbf{A} = [a_{ij}]$ and $\mathbf{B} = [b_{ij}]$ are equal if and only if $a_{ij} = b_{ij}$ for all $i = 1, \ldots, m$ and for all $j = 1, \ldots, n$.

From the preceding discussion it should be clear that the same linear transformation $A$ from linear space $X$ into linear space $Y$ may be represented by different matrices, depending on the particular choice of bases in $X$ and $Y$. Since it is always clear from context which particular bases are being used, we usually don't refer to them explicitly, thus avoiding cumbersome notation.

Now let $A^T$ denote the transpose of $A \in L(X, Y)$ (refer to Definition 3.5.27). Our next result provides the matrix representation of $A^T$.

4.2.8. Theorem. Let $A \in L(X, Y)$ and let $\mathbf{A}$ denote the matrix of $A$ with respect to the bases $\{e_1, \ldots, e_n\}$ in $X$ and $\{f_1, \ldots, f_m\}$ in $Y$. Let $X^f$ and $Y^f$ be the algebraic conjugates of $X$ and $Y$, respectively. Let $A^T \in L(Y^f, X^f)$ be the transpose of $A$. Let $\{f'_1, \ldots, f'_m\}$ and $\{e'_1, \ldots, e'_n\}$ denote the dual bases of $\{f_1, \ldots, f_m\}$ and $\{e_1, \ldots, e_n\}$, respectively. If the matrix $\mathbf{A}$ is given by Eq. (4.2.6), then the matrix of $A^T$ with respect to $\{f'_1, \ldots, f'_m\}$ of $Y^f$ and $\{e'_1, \ldots, e'_n\}$ of $X^f$ is given by
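$$\mathbf{A}^T = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{bmatrix}. \tag{4.2.9}$$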
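Proof. Let $\mathbf{B} = [b_{ij}]$ denote the $(n \times m)$ matrix of the linear transformation $A^T$ with respect to the bases $\{f'_1, \ldots, f'_m\}$ and $\{e'_1, \ldots, e'_n\}$. We want to show that $\mathbf{B}$ is the matrix in Eq. (4.2.9). By Eq. (4.2.2) we have
$$Ae_i = \sum_{k=1}^{m} a_{ki} f_k$$
for $i = 1, \ldots, n$, and
$$A^T f'_j = \sum_{k=1}^{n} b_{kj} e'_k$$
for $j = 1, \ldots, m$. By Theorem 3.5.22, $\langle e_i, e'_j \rangle = \delta_{ij}$ and $\langle f_i, f'_j \rangle = \delta_{ij}$. Therefore,
$$\langle Ae_i, f'_j \rangle = \Big\langle \sum_{k=1}^{m} a_{ki} f_k, f'_j \Big\rangle = \sum_{k=1}^{m} a_{ki} \langle f_k, f'_j \rangle = a_{ji}.$$
Also,
$$\langle Ae_i, f'_j \rangle = \langle e_i, A^T f'_j \rangle = \Big\langle e_i, \sum_{k=1}^{n} b_{kj} e'_k \Big\rangle = \sum_{k=1}^{n} b_{kj} \langle e_i, e'_k \rangle = b_{ij}.$$
Therefore, $b_{ij} = a_{ji}$, which proves the theorem. ∎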
The preceding result gives rise to the following concept.

4.2.10. Definition. The matrix $\mathbf{A}^T$ in Eq. (4.2.9) is called the transpose of matrix $\mathbf{A}$.
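Our next result follows trivially from the discussion leading up to Definition 4.2.7.

4.2.11. Theorem. Let $A$ be a linear transformation of an n-dimensional vector space $X$ into an m-dimensional vector space $Y$, and let $y = Ax$. Let the coordinates of $x$ with respect to the basis $\{e_1, e_2, \ldots, e_n\}$ be $(\xi_1, \xi_2, \ldots, \xi_n)$, and let the coordinates of $y$ with respect to the basis $\{f_1, f_2, \ldots, f_m\}$ be $(\eta_1, \eta_2, \ldots, \eta_m)$. Let
$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \tag{4.2.12}$$
be the matrix of $A$ with respect to the bases $\{e_1, e_2, \ldots, e_n\}$ and $\{f_1, f_2, \ldots, f_m\}$. Then
$$\begin{aligned}
a_{11}\xi_1 + a_{12}\xi_2 + \cdots + a_{1n}\xi_n &= \eta_1, \\
a_{21}\xi_1 + a_{22}\xi_2 + \cdots + a_{2n}\xi_n &= \eta_2, \\
&\;\;\vdots \\
a_{m1}\xi_1 + a_{m2}\xi_2 + \cdots + a_{mn}\xi_n &= \eta_m,
\end{aligned} \tag{4.2.13}$$
or, equivalently,
$$\eta_i = \sum_{j=1}^{n} a_{ij}\xi_j, \quad i = 1, \ldots, m. \tag{4.2.14}$$

4.2.15. Exercise. Prove Theorem 4.2.11.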
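Using matrix and vector notation, let us agree to express the system of linear equations given by Eq. (4.2.13) equivalently as
$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} \xi_1 \\ \xi_2 \\ \vdots \\ \xi_n \end{bmatrix} = \begin{bmatrix} \eta_1 \\ \eta_2 \\ \vdots \\ \eta_m \end{bmatrix} \tag{4.2.16}$$
or, more succinctly, as
$$\mathbf{A}\mathbf{x} = \mathbf{y}, \tag{4.2.17}$$
where $\mathbf{x}^T = (\xi_1, \xi_2, \ldots, \xi_n)$ and $\mathbf{y}^T = (\eta_1, \eta_2, \ldots, \eta_m)$. In terms of $\mathbf{x}^T$, $\mathbf{y}^T$, and $\mathbf{A}^T$, let us agree to express Eq. (4.2.13) equivalently as
$$(\xi_1, \xi_2, \ldots, \xi_n) \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{bmatrix} = (\eta_1, \eta_2, \ldots, \eta_m) \tag{4.2.18}$$
or, in short, as
$$\mathbf{x}^T \mathbf{A}^T = \mathbf{y}^T. \tag{4.2.19}$$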
We note that in Eq. (4.2.17), $\mathbf{x} \in F^n$, $\mathbf{y} \in F^m$, and $\mathbf{A}$ is an $m \times n$ matrix. From our discussion thus far it should be clear that we can utilize matrices to study systems of linear equations which are of the form of Eq. (4.2.13). It should also be clear that an $m \times n$ matrix $\mathbf{A}$ is nothing more than a unique representation of a linear transformation $A$ of an n-dimensional vector space $X$ into an m-dimensional vector space $Y$ over the same field $F$. As such, $\mathbf{A}$ possesses all the properties of such transformations. We could, in fact, utilize matrices in place of general linear transformations to establish many facts concerning linear transformations defined on finite-dimensional linear spaces. However, since a given matrix is dependent upon the selection of two particular sets of bases (not necessarily distinct), such practice will, in general, be avoided whenever possible. We emphasize that a matrix and a linear transformation are not one and the same thing. In many texts no distinction in symbols is made between linear transformations and their matrix representation. We will not follow this custom.
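B. Rank of a Matrix

We begin by proving the following result.

4.2.20. Theorem. Let $A$ be a linear transformation from $X$ into $Y$. Then $A$ has rank $r$ if and only if it is possible to choose a basis $\{e_1, e_2, \ldots, e_n\}$ for $X$ and a basis $\{f_1, \ldots, f_m\}$ for $Y$ such that the matrix $\mathbf{A}$ of $A$ with respect to these bases is of the form
$$\mathbf{A} = \begin{bmatrix}
1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0
\end{bmatrix}, \tag{4.2.21}$$
where the first $r$ diagonal entries are equal to 1, all other entries are zero, the number of rows is $m = \dim Y$, and the number of columns is $n = \dim X$.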
Proof. We choose a basis for $X$ of the form $\{e_1, e_2, \ldots, e_r, e_{r+1}, \ldots, e_n\}$, where $\{e_{r+1}, \ldots, e_n\}$ is a basis for $\mathfrak{N}(A)$. If $f_1 = Ae_1$, $f_2 = Ae_2$, ..., $f_r = Ae_r$, then $\{f_1, f_2, \ldots, f_r\}$ is a basis for $\mathfrak{R}(A)$, as we saw in the proof of Theorem 3.4.25. Now choose vectors $f_{r+1}, \ldots, f_m$ in $Y$ such that the set of vectors $\{f_1, f_2, \ldots, f_m\}$ forms a basis for $Y$ (see Theorem 3.3.44). Then
$$\begin{aligned}
f_1 = Ae_1 &= (1)f_1 + (0)f_2 + \cdots + (0)f_r + (0)f_{r+1} + \cdots + (0)f_m, \\
f_2 = Ae_2 &= (0)f_1 + (1)f_2 + \cdots + (0)f_r + (0)f_{r+1} + \cdots + (0)f_m, \\
&\;\;\vdots \\
f_r = Ae_r &= (0)f_1 + (0)f_2 + \cdots + (1)f_r + (0)f_{r+1} + \cdots + (0)f_m, \\
0 = Ae_{r+1} &= (0)f_1 + (0)f_2 + \cdots + (0)f_r + (0)f_{r+1} + \cdots + (0)f_m, \\
&\;\;\vdots \\
0 = Ae_n &= (0)f_1 + (0)f_2 + \cdots + (0)f_r + (0)f_{r+1} + \cdots + (0)f_m.
\end{aligned} \tag{4.2.22}$$
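The necessity is proven by applying Definition 4.2.7 (and also Eq. (4.2.2)) to the set of equations (4.2.22); the desired result given by Eq. (4.2.21) follows. Sufficiency follows from the fact that the basis for $\mathfrak{R}(A)$ contains $r$ linearly independent vectors. ∎

A question of practical significance is the following: if $\mathbf{A}$ is the matrix of a linear transformation $A$ from linear space $X$ into linear space $Y$ with respect to arbitrary bases $\{e_1, \ldots, e_n\}$ for $X$ and $\{f_1, \ldots, f_m\}$ for $Y$, what is the rank of $A$ in terms of matrix $\mathbf{A}$? Let $\mathfrak{R}(A)$ be the subspace of $Y$ generated by $Ae_1, Ae_2, \ldots, Ae_n$. Then, in view of Eq. (4.2.2), the coordinate representation of $Ae_i$, $i = 1, \ldots, n$, in $Y$ with respect to $\{f_1, \ldots, f_m\}$ is given by
$$Ae_1 \sim \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix}, \quad Ae_2 \sim \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix}, \quad \ldots, \quad Ae_n \sim \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix}.$$
From this it follows that $\mathfrak{R}(A)$ consists of vectors $y$ whose coordinate representation is
$$\mathbf{y} = \gamma_1 \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} + \gamma_2 \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} + \cdots + \gamma_n \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix}, \tag{4.2.23}$$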
where $\gamma_1, \ldots, \gamma_n$ are scalars. Since every spanning or generating set of a linear space contains a basis, we are able to select from among the vectors $Ae_1, Ae_2, \ldots, Ae_n$ a basis for $\mathfrak{R}(A)$. Suppose that the set $\{Ae_1, Ae_2, \ldots, Ae_k\}$ is this basis. Then the vectors $Ae_1, Ae_2, \ldots, Ae_k$ are linearly independent, and the vectors $Ae_{k+1}, \ldots, Ae_n$ are linear combinations of the vectors $Ae_1, Ae_2, \ldots, Ae_k$. From this there now follows:

4.2.24. Theorem. Let $A \in L(X, Y)$, and let $\mathbf{A}$ be the matrix of $A$ with respect to the (arbitrary) basis $\{e_1, e_2, \ldots, e_n\}$ for $X$ and with respect to the (arbitrary) basis $\{f_1, f_2, \ldots, f_m\}$ for $Y$. Let the coordinate representation of $y = Ax$ be $\mathbf{y} = \mathbf{A}\mathbf{x}$. Then

(i) the rank of $A$ is the number of vectors in the largest possible linearly independent set of columns of $\mathbf{A}$; and
(ii) the rank of $A$ is the number of vectors in the smallest possible set of columns of $\mathbf{A}$ which has the property that all columns not in it can be expressed as linear combinations of the columns in it.

In view of this result we make the following definition.

4.2.25. Definition. The rank of an $m \times n$ matrix $\mathbf{A}$ is the largest number of linearly independent columns of $\mathbf{A}$.

C. Properties of Matrices
Now let $X$ be an n-dimensional linear space, let $Y$ be an m-dimensional linear space, let $F$ be the field for $X$ and $Y$, and let $A$ and $B$ be linear transformations of $X$ into $Y$. Let $\mathbf{A} = [a_{ij}]$ be the matrix of $A$, and let $\mathbf{B} = [b_{ij}]$ be the matrix of $B$ with respect to the bases $\{e_1, e_2, \ldots, e_n\}$ in $X$ and $\{f_1, f_2, \ldots, f_m\}$ in $Y$. Using Eq. (3.4.42) as well as Definition 4.2.7, the reader can readily verify that the matrix of $A + B$, denoted by $\mathbf{C} \triangleq \mathbf{A} + \mathbf{B}$, is given by
$$\mathbf{A} + \mathbf{B} = [a_{ij}] + [b_{ij}] = [a_{ij} + b_{ij}] = [c_{ij}] = \mathbf{C}. \tag{4.2.26}$$
Using Eq. (3.4.43) and Definition 4.2.7, the reader can also easily show that the matrix of $\alpha A$, denoted by $\mathbf{D} \triangleq \alpha\mathbf{A}$, is given by
$$\alpha\mathbf{A} = \alpha[a_{ij}] = [\alpha a_{ij}] = [d_{ij}] = \mathbf{D}. \tag{4.2.27}$$
From Eq. (4.2.26) we note that, in order to be able to add two matrices $\mathbf{A}$ and $\mathbf{B}$, they must have the same number of rows and columns. In this case we say that $\mathbf{A}$ and $\mathbf{B}$ are comparable matrices. Also, from Eq. (4.2.27) it is clear that if $\mathbf{A}$ is an $m \times n$ matrix, then so is $\alpha\mathbf{A}$.

Next, let $Z$ be an r-dimensional vector space, let $A \in L(X, Y)$, and let $B \in L(Y, Z)$. Let $\mathbf{A}$ be the matrix of $A$ with respect to the basis $\{e_1, e_2, \ldots, e_n\}$ in $X$ and with respect to the basis $\{f_1, f_2, \ldots, f_m\}$ in $Y$. Let $\mathbf{B}$ be the matrix of $B$ with respect to the basis $\{f_1, f_2, \ldots, f_m\}$ in $Y$ and with respect to the basis $\{g_1, g_2, \ldots, g_r\}$ in $Z$. The product mapping $BA$ as defined by Eq. (3.4.50) is a linear transformation of $X$ into $Z$. We now ask: what is the matrix $\mathbf{C}$ of $BA$ with respect to the bases $\{e_1, e_2, \ldots, e_n\}$ of $X$ and $\{g_1, g_2, \ldots, g_r\}$ of $Z$? By definition of matrices $\mathbf{A}$ and $\mathbf{B}$ (see Eq. (4.2.2)), we have
$$Ae_k = \sum_{j=1}^{m} a_{jk} f_j, \quad k = 1, \ldots, n,$$
and
$$Bf_j = \sum_{i=1}^{r} b_{ij} g_i, \quad j = 1, \ldots, m.$$
Now
$$BAe_k = B\Big(\sum_{j=1}^{m} a_{jk} f_j\Big) = \sum_{j=1}^{m} a_{jk} Bf_j = \sum_{i=1}^{r} \sum_{j=1}^{m} b_{ij} a_{jk} g_i,$$
for $k = 1, \ldots, n$. Thus, the matrix $\mathbf{C}$ of $BA$ with respect to basis $\{e_1, \ldots, e_n\}$ in $X$ and $\{g_1, \ldots, g_r\}$ in $Z$ is $[c_{ij}]$, where
$$c_{ij} = \sum_{k=1}^{m} b_{ik} a_{kj} \tag{4.2.28}$$
for $i = 1, \ldots, r$ and $j = 1, \ldots, n$. We write this as
$$\mathbf{C} = \mathbf{B}\mathbf{A}. \tag{4.2.29}$$

From the preceding discussion it is clear that two matrices $\mathbf{A}$ and $\mathbf{B}$ can be multiplied to form the product $\mathbf{B}\mathbf{A}$ if and only if the number of columns of $\mathbf{B}$ is equal to the number of rows of $\mathbf{A}$. In this case we say that the matrices $\mathbf{B}$ and $\mathbf{A}$ are conformal matrices.
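The product rule of Eq. (4.2.28) can be checked numerically with a minimal Python sketch. The matrices, the vector, and the helper names `matmul` and `apply` below are illustrative assumptions; the point is only that the product matrix represents the composite mapping, and that the product is rejected when the matrices are not conformal.

```python
def matmul(B, A):
    """Return C = BA with c_ij = sum_k b_ik * a_kj, as in Eq. (4.2.28)."""
    inner = len(A)
    if any(len(row) != inner for row in B):
        raise ValueError("number of columns of B must equal number of rows of A")
    return [[sum(B[i][k] * A[k][j] for k in range(inner))
             for j in range(len(A[0]))]
            for i in range(len(B))]


def apply(M, x):
    """Coordinate vector of the image of x under the transformation with matrix M."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


A = [[1, 2, 0],
     [0, 1, -1]]          # (2 x 3): matrix of a map from X (n = 3) into Y (m = 2)
B = [[2, 1],
     [0, 3],
     [1, 1],
     [4, 0]]              # (4 x 2): matrix of a map from Y (m = 2) into Z (r = 4)

C = matmul(B, A)          # (4 x 3): matrix of the product mapping from X into Z
x = [1, -2, 3]

# Applying C directly agrees with applying A first and then B.
same = apply(C, x) == apply(B, apply(A, x))
```

Here `matmul(A, B)` raises `ValueError`, because `A` has three columns while `B` has four rows.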
In arriving at Equations (4.2.28) and (4.2.29) we established the result given below.

4.2.30. Theorem. Let $\mathbf{A}$ be the matrix of $A \in L(X, Y)$ with respect to the basis $\{e_1, e_2, \ldots, e_n\}$ in $X$ and basis $\{f_1, f_2, \ldots, f_m\}$ in $Y$. Let $\mathbf{B}$ be the matrix of $B \in L(Y, Z)$ with respect to basis $\{f_1, f_2, \ldots, f_m\}$ in $Y$ and basis $\{g_1, g_2, \ldots, g_r\}$ in $Z$. Then $\mathbf{B}\mathbf{A}$ is the matrix of $BA$.

We now summarize the above discussion in the following definition.

4.2.31. Definition. Let $\mathbf{A} = [a_{ij}]$ and $\mathbf{B} = [b_{ij}]$ be two $m \times n$ matrices, let $\mathbf{C} = [c_{ij}]$ be an $n \times r$ matrix, and let $\alpha \in F$. Then

(i) the sum of $\mathbf{A}$ and $\mathbf{B}$ is the $m \times n$ matrix $\mathbf{D} = \mathbf{A} + \mathbf{B}$, where $d_{ij} = a_{ij} + b_{ij}$ for all $i = 1, \ldots, m$ and for all $j = 1, \ldots, n$;
(ii) the product of matrix $\mathbf{A}$ by scalar $\alpha$ is the $m \times n$ matrix $\mathbf{E} = \alpha\mathbf{A}$, where $e_{ij} = \alpha a_{ij}$ for all $i = 1, \ldots, m$ and for all $j = 1, \ldots, n$; and
(iii) the product of matrix $\mathbf{A}$ and matrix $\mathbf{C}$ is the $m \times r$ matrix $\mathbf{G} = \mathbf{A}\mathbf{C}$, where
$$g_{ij} = \sum_{k=1}^{n} a_{ik} c_{kj}$$
for each $i = 1, \ldots, m$ and for each $j = 1, \ldots, r$.
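The properties of general linear transformations established in Section 3.4 hold, of course, in the case of their matrix representation. We summarize some of these in the remainder of the present section.

4.2.32. Theorem.

(i) Let $\mathbf{A}$ and $\mathbf{B}$ be $(m \times n)$ matrices, and let $\mathbf{C}$ be an $(n \times r)$ matrix. Then
$$(\mathbf{A} + \mathbf{B})\mathbf{C} = \mathbf{A}\mathbf{C} + \mathbf{B}\mathbf{C}. \tag{4.2.33}$$
(ii) Let $\mathbf{A}$ be an $(m \times n)$ matrix, and let $\mathbf{B}$ and $\mathbf{C}$ be $(n \times r)$ matrices. Then
$$\mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{A}\mathbf{B} + \mathbf{A}\mathbf{C}. \tag{4.2.34}$$
(iii) Let $\mathbf{A}$ be an $(m \times n)$ matrix, let $\mathbf{B}$ be an $(n \times r)$ matrix, and let $\mathbf{C}$ be an $(r \times s)$ matrix. Then
$$\mathbf{A}(\mathbf{B}\mathbf{C}) = (\mathbf{A}\mathbf{B})\mathbf{C}. \tag{4.2.35}$$
(iv) Let $\alpha, \beta \in F$, and let $\mathbf{A}$ be an $(m \times n)$ matrix. Then
$$(\alpha + \beta)\mathbf{A} = \alpha\mathbf{A} + \beta\mathbf{A}. \tag{4.2.36}$$
(v) Let $\alpha \in F$, and let $\mathbf{A}$ and $\mathbf{B}$ be $(m \times n)$ matrices. Then
$$\alpha(\mathbf{A} + \mathbf{B}) = \alpha\mathbf{A} + \alpha\mathbf{B}. \tag{4.2.37}$$
(vi) Let $\alpha, \beta \in F$, let $\mathbf{A}$ be an $(m \times n)$ matrix, and let $\mathbf{B}$ be an $(n \times r)$ matrix. Then
$$(\alpha\mathbf{A})(\beta\mathbf{B}) = (\alpha\beta)(\mathbf{A}\mathbf{B}). \tag{4.2.38}$$
(vii) Let $\mathbf{A}$ and $\mathbf{B}$ be $(m \times n)$ matrices. Then
$$\mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A}. \tag{4.2.39}$$
(viii) Let $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ be $(m \times n)$ matrices. Then
$$(\mathbf{A} + \mathbf{B}) + \mathbf{C} = \mathbf{A} + (\mathbf{B} + \mathbf{C}). \tag{4.2.40}$$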
The proofs of the next two results are left as an exercise.

4.2.41. Theorem. Let $0 \in L(X, Y)$ be the zero transformation defined by Eq. (3.4.44). Then for any bases $\{e_1, \ldots, e_n\}$ and $\{f_1, \ldots, f_m\}$ for $X$ and $Y$, respectively, the linear transformation $0$ is represented by the $(m \times n)$ matrix
$$\mathbf{0} = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}. \tag{4.2.42}$$
The matrix $\mathbf{0}$ is called the null matrix.

4.2.43. Theorem. Let $I \in L(X, X)$ be the identity transformation defined by Eq. (3.4.56). Let $\{e_1, \ldots, e_n\}$ be an arbitrary basis for $X$. Then the matrix representation of the linear transformation $I$ from $X$ into $X$ with respect to the basis $\{e_1, \ldots, e_n\}$ is given by
$$\mathbf{I} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}. \tag{4.2.44}$$
$\mathbf{I}$ is called the $n \times n$ identity matrix.

4.2.45. Exercise. Prove Theorems 4.2.32, 4.2.41, and 4.2.43.
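For any $(m \times n)$ matrix $\mathbf{A}$ we have
$$\mathbf{A} + \mathbf{0} = \mathbf{0} + \mathbf{A} = \mathbf{A}, \tag{4.2.46}$$
and for any $(n \times n)$ matrix $\mathbf{B}$ we have
$$\mathbf{B}\mathbf{I} = \mathbf{I}\mathbf{B} = \mathbf{B}, \tag{4.2.47}$$
where $\mathbf{I}$ is the $(n \times n)$ identity matrix. If $\mathbf{A} = [a_{ij}]$ is a matrix of the linear transformation $A$, then correspondingly, $-\mathbf{A}$ is a matrix of the linear transformation $-A$, where
$$-\mathbf{A} = (-1)\mathbf{A} = (-1)\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} = \begin{bmatrix} -a_{11} & -a_{12} & \cdots & -a_{1n} \\ -a_{21} & -a_{22} & \cdots & -a_{2n} \\ \vdots & \vdots & & \vdots \\ -a_{m1} & -a_{m2} & \cdots & -a_{mn} \end{bmatrix}. \tag{4.2.48}$$
It follows immediately that $\mathbf{A} + (-\mathbf{A}) = \mathbf{0}$, where $\mathbf{0}$ denotes the null matrix. By convention we usually write $\mathbf{A} + (-\mathbf{A}) = \mathbf{A} - \mathbf{A}$. Let $\mathbf{A}$ and $\mathbf{B}$ be $(n \times n)$ matrices. Then we have, in general,
$$\mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A}, \tag{4.2.49}$$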
as was the case in Eq. (3.4.55).

Next, let $A \in L(X, X)$ and assume that $A$ is non-singular. Let $A^{-1}$ denote the inverse of $A$. Then, by Theorem 3.4.60, $AA^{-1} = A^{-1}A = I$. Now if $\mathbf{A}$ is the $(n \times n)$ matrix of $A$ with respect to the basis $\{e_1, \ldots, e_n\}$ in $X$, then there is an $(n \times n)$ matrix $\mathbf{B}$ of $A^{-1}$ with respect to the basis $\{e_1, \ldots, e_n\}$ in $X$, such that
$$\mathbf{B}\mathbf{A} = \mathbf{A}\mathbf{B} = \mathbf{I}. \tag{4.2.50}$$
We call $\mathbf{B}$ the inverse of $\mathbf{A}$ and we denote it by $\mathbf{A}^{-1}$. In this connection we use the following terms interchangeably: $\mathbf{A}^{-1}$ exists, $\mathbf{A}$ has an inverse, $\mathbf{A}$ is invertible, or $\mathbf{A}$ is non-singular. If $\mathbf{A}$ is not non-singular, we say $\mathbf{A}$ is singular.

With the aid of Theorem 3.4.63 the reader can readily establish the following result for matrices.

4.2.51. Theorem. Let $\mathbf{A}$ be an $(n \times n)$ matrix. The following are equivalent:

(i) rank $\mathbf{A} = n$;
(ii) $\mathbf{A}\mathbf{x} = \mathbf{0}$ implies $\mathbf{x} = \mathbf{0}$;
(iii) for every $\mathbf{y}_0 \in F^n$, there is a unique $\mathbf{x}_0 \in F^n$ such that $\mathbf{y}_0 = \mathbf{A}\mathbf{x}_0$;
(iv) the columns of $\mathbf{A}$ are linearly independent; and
(v) $\mathbf{A}^{-1}$ exists.

4.2.52. Exercise. Prove Theorem 4.2.51.
We have shown that we can represent n linear equations by the matrix equation (4.2.17). Now let $\mathbf{A}$ be a non-singular $(n \times n)$ matrix and consider the equation
$$\mathbf{y} = \mathbf{A}\mathbf{x}. \tag{4.2.53}$$
If we premultiply both sides of this equation by $\mathbf{A}^{-1}$ we obtain
$$\mathbf{x} = \mathbf{A}^{-1}\mathbf{y}, \tag{4.2.54}$$
the solution to Eq. (4.2.53). Thus, knowledge of the inverse of $\mathbf{A}$ enables us to solve the system of linear equations (4.2.53).

In our next result, which is readily verified, some of the important properties of non-singular matrices are given.

4.2.55. Theorem.

(i) An $(n \times n)$ non-singular matrix has one and only one inverse.
(ii) If $\mathbf{A}$ and $\mathbf{B}$ are non-singular $(n \times n)$ matrices, then $(\mathbf{A}\mathbf{B})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$.
(iii) If $\mathbf{A}$ and $\mathbf{B}$ are $(n \times n)$ matrices and if $\mathbf{A}\mathbf{B}$ is non-singular, then so are $\mathbf{A}$ and $\mathbf{B}$.

4.2.56. Exercise. Prove Theorem 4.2.55.
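Equation (4.2.54) and part (ii) of Theorem 4.2.55 can be checked on a small example. The sketch below inverts a matrix by Gauss-Jordan elimination over exact rationals; this particular method, the sample matrices, and the helper names `inverse`, `matmul`, and `apply` are assumptions chosen for illustration, since the text itself has not yet given a method for computing inverses.

```python
from fractions import Fraction


def matmul(A, B):
    """Product AB of conformal matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def apply(M, x):
    """Matrix-vector product Mx."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination on the augmented array [A | I]."""
    n = len(A)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]          # scale pivot row so the pivot is 1
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


A = [[2, 1], [1, 1]]
B = [[1, 2], [3, 5]]
y = [3, 2]

x = apply(inverse(A), y)                               # solution of y = Ax, Eq. (4.2.54)
lhs = inverse(matmul(A, B))                            # (AB)^{-1}
rhs = matmul(inverse(B), inverse(A))                   # B^{-1} A^{-1}
```

For a singular input such as `[[1, 2], [2, 4]]`, `inverse` raises `ValueError`, in line with the equivalence of conditions (i) and (v) in Theorem 4.2.51.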
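Our next theorem summarizes some of the important properties of the transpose of matrices. The proof of this theorem is a direct consequence of the definition of the transpose of a matrix (see Eq. (4.2.9)).

4.2.57. Theorem.

(i) For any matrix $\mathbf{A}$, $(\mathbf{A}^T)^T = \mathbf{A}$.
(ii) Let $\mathbf{A}$ and $\mathbf{B}$ be conformal matrices. Then $(\mathbf{A}\mathbf{B})^T = \mathbf{B}^T\mathbf{A}^T$.
(iii) Let $\mathbf{A}$ be a non-singular matrix. Then $(\mathbf{A}^T)^{-1} = (\mathbf{A}^{-1})^T$.
(iv) Let $\mathbf{A}$ be an $(n \times n)$ matrix. Then $\mathbf{A}^T$ is non-singular if and only if $\mathbf{A}$ is non-singular.
(v) Let $\mathbf{A}$ and $\mathbf{B}$ be comparable matrices. Then $(\mathbf{A} + \mathbf{B})^T = \mathbf{A}^T + \mathbf{B}^T$.
(vi) Let $\alpha \in F$ and let $\mathbf{A}$ be a matrix. Then ( ...

4.2.88. Theorem. ... $\{e_1, \ldots, e_n\}$ for $X$ such that the matrix $\mathbf{P}$ of $P$ with respect to this basis is of the form
$$\mathbf{P} = \begin{bmatrix} \mathbf{I}_r & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix}, \tag{4.2.89}$$
where $\mathbf{I}_r$ is the $(r \times r)$ identity matrix, the remaining $n - r$ diagonal entries are zero, and $r = \dim \mathfrak{R}(P)$.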
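Proof. Since $P$ is a projection we have, from Eq. (3.7.8),
$$X = \mathfrak{R}(P) \oplus \mathfrak{N}(P).$$
Now let $r = \dim \mathfrak{R}(P)$, and let $\{e_1, \ldots, e_n\}$ be a basis for $X$ such that $\{e_1, \ldots, e_r\}$ is a basis for $\mathfrak{R}(P)$ and $\{e_{r+1}, \ldots, e_n\}$ is a basis for $\mathfrak{N}(P)$. Let $\mathbf{P}$ be the matrix of $P$ with respect to this basis, and the theorem follows. ∎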
We leave the next result as an exercise.

4.2.90. Theorem. Let $X$ be a finite-dimensional vector space, and let $A \in L(X, X)$. If $W$ is a p-dimensional invariant subspace of $X$ and if $X = W \oplus Z$, then there exists a basis for $X$ such that the matrix $\mathbf{A}$ of $A$ with respect to this basis has the form
$$\mathbf{A} = \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{0} & \mathbf{A}_{22} \end{bmatrix},$$
where $\mathbf{A}_{11}$ is a $(p \times p)$ matrix and the remaining submatrices are of appropriate dimension.

4.2.91. Exercise. Prove Theorem 4.2.90.

4.3. EQUIVALENCE AND SIMILARITY
From the previous section it is clear that a linear transformation $A$ of a finite-dimensional vector space $X$ into a finite-dimensional vector space $Y$ can be represented by means of different matrices, depending on the particular choice of bases in $X$ and $Y$. The choice of bases may in different cases result in matrices that are "easy" or "hard" to utilize. Many of the resulting "standard" forms of matrices, called canonical forms, arise because of practical considerations. Such canonical forms often exhibit inherent characteristics of the underlying transformation $A$. Before we can consider some of the more important canonical forms of matrices, we need to introduce several new concepts which are of great importance in their own right.

Throughout the present section $X$ and $Y$ are finite-dimensional vector spaces over the same field $F$, $\dim X = n$ and $\dim Y = m$. We begin our discussion with the following result.

4.3.1. Theorem. Let $\{e_1, \ldots, e_n\}$ be a basis for a linear space $X$, and let $\{e'_1, \ldots, e'_n\}$ be a set of vectors in $X$ given by
$$e'_i = \sum_{j=1}^{n} p_{ji} e_j, \quad i = 1, \ldots, n, \tag{4.3.2}$$
where $p_{ij} \in F$ for all $i, j = 1, \ldots, n$. The set $\{e'_1, \ldots, e'_n\}$ forms a basis for $X$ if and only if $\mathbf{P} = [p_{ij}]$ is non-singular.
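Proof. Let $\{e'_1, \ldots, e'_n\}$ be linearly independent, and let $\mathbf{p}_j$ denote the jth column vector of $\mathbf{P}$. Let
$$\sum_{j=1}^{n} \alpha_j \mathbf{p}_j = \mathbf{0}$$
for some scalars $\alpha_1, \ldots, \alpha_n \in F$. This implies that
$$\sum_{j=1}^{n} \alpha_j p_{ij} = 0, \quad i = 1, \ldots, n.$$
It follows that
$$\sum_{i=1}^{n} \Big(\sum_{j=1}^{n} \alpha_j p_{ij}\Big) e_i = 0.$$
Rearranging, we have
$$\sum_{j=1}^{n} \alpha_j \Big(\sum_{i=1}^{n} p_{ij} e_i\Big) = 0, \quad \text{or} \quad \sum_{j=1}^{n} \alpha_j e'_j = 0.$$
Since $e'_1, \ldots, e'_n$ are linearly independent, it follows that $\alpha_1 = \cdots = \alpha_n = 0$. Thus, the columns of $\mathbf{P}$ are linearly independent. Therefore, $\mathbf{P}$ is non-singular.

Conversely, let $\mathbf{P}$ be non-singular, i.e., let $\{\mathbf{p}_1, \ldots, \mathbf{p}_n\}$ be a linearly independent set of vectors in $F^n$. Let
$$\sum_{i=1}^{n} \alpha_i e'_i = 0$$
for some scalars $\alpha_1, \ldots, \alpha_n \in F$. Then
$$\sum_{i=1}^{n} \alpha_i \Big(\sum_{j=1}^{n} p_{ji} e_j\Big) = \sum_{j=1}^{n} \Big(\sum_{i=1}^{n} \alpha_i p_{ji}\Big) e_j = 0.$$
Since $\{e_1, \ldots, e_n\}$ is a linearly independent set, it follows that $\sum_{i=1}^{n} \alpha_i p_{ji} = 0$ for $j = 1, \ldots, n$, and thus $\sum_{i=1}^{n} \alpha_i \mathbf{p}_i = \mathbf{0}$. Since $\{\mathbf{p}_1, \ldots, \mathbf{p}_n\}$ is a linearly independent set, it now follows that $\alpha_1 = \cdots = \alpha_n = 0$, and therefore $\{e'_1, \ldots, e'_n\}$ is a linearly independent set. ∎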
The preceding result gives rise to:
4.3.3. Definition. The matrix $\mathbf{P}$ of Theorem 4.3.1 is called the matrix of basis $\{e'_1, \ldots, e'_n\}$ with respect to basis $\{e_1, \ldots, e_n\}$.

We note that since $\mathbf{P}$ is non-singular, $\mathbf{P}^{-1}$ exists. Thus, we can readily prove the next result.

4.3.4. Theorem. Let $\{e_1, \ldots, e_n\}$ and $\{e'_1, \ldots, e'_n\}$ be two bases for $X$, and let $\mathbf{P}$ be the matrix of basis $\{e'_1, \ldots, e'_n\}$ with respect to basis $\{e_1, \ldots, e_n\}$. Then $\mathbf{P}^{-1}$ is the matrix of basis $\{e_1, \ldots, e_n\}$ with respect to the basis $\{e'_1, \ldots, e'_n\}$.

4.3.5. Exercise. Prove Theorem 4.3.4.
The next result is also easily verified.

4.3.6. Theorem. Let $X$ be a linear space, and let the sets of vectors $\{e_1, \ldots, e_n\}$, $\{e'_1, \ldots, e'_n\}$, and $\{e''_1, \ldots, e''_n\}$ be bases for $X$. If $\mathbf{P}$ is the matrix of basis $\{e'_1, \ldots, e'_n\}$ with respect to basis $\{e_1, \ldots, e_n\}$ and if $\mathbf{Q}$ is the matrix of basis $\{e''_1, \ldots, e''_n\}$ with respect to basis $\{e'_1, \ldots, e'_n\}$, then $\mathbf{P}\mathbf{Q}$ is the matrix of basis $\{e''_1, \ldots, e''_n\}$ with respect to basis $\{e_1, \ldots, e_n\}$.

4.3.7. Exercise. Prove Theorem 4.3.6.
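We now prove:

4.3.8. Theorem. Let $\{e_1, \ldots, e_n\}$ and $\{e'_1, \ldots, e'_n\}$ be two bases for a linear space $X$, and let $\mathbf{P}$ be the matrix of basis $\{e'_1, \ldots, e'_n\}$ with respect to basis $\{e_1, \ldots, e_n\}$. Let $x \in X$ and let $\mathbf{x}$ denote the coordinate representation of $x$ with respect to the basis $\{e_1, \ldots, e_n\}$. Let $\mathbf{x}'$ denote the coordinate representation of $x$ with respect to the basis $\{e'_1, \ldots, e'_n\}$. Then $\mathbf{P}\mathbf{x}' = \mathbf{x}$.

Proof. Let $\mathbf{x}^T = (\xi_1, \ldots, \xi_n)$ and let $(\mathbf{x}')^T = (\xi'_1, \ldots, \xi'_n)$. Then
$$x = \sum_{i=1}^{n} \xi_i e_i \quad \text{and} \quad x = \sum_{j=1}^{n} \xi'_j e'_j.$$
Thus,
$$\sum_{j=1}^{n} \xi'_j e'_j = \sum_{j=1}^{n} \xi'_j \Big(\sum_{i=1}^{n} p_{ij} e_i\Big) = \sum_{i=1}^{n} \Big(\sum_{j=1}^{n} p_{ij} \xi'_j\Big) e_i,$$
which implies that
$$\xi_i = \sum_{j=1}^{n} p_{ij} \xi'_j, \quad i = 1, \ldots, n.$$
Therefore, $\mathbf{x} = \mathbf{P}\mathbf{x}'$. ∎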
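Theorem 4.3.8 can be illustrated in $R^2$ with a short Python sketch. The basis vectors, the coordinates, and the helper names `matrix_of_basis`, `apply`, and `combine` are assumptions made for this example: the old basis is taken to be the natural basis, so each new basis vector is given directly by its coordinates.

```python
def matrix_of_basis(new_basis):
    """Matrix P of the new basis with respect to the old one.

    new_basis[i] holds the coordinates of e'_i in the old basis, so column i of P
    is that coordinate vector (p_ji is the j-th coordinate of e'_i, as in Eq. (4.3.2)).
    """
    n = len(new_basis)
    return [[new_basis[i][j] for i in range(n)] for j in range(n)]


def apply(M, x):
    """Matrix-vector product Mx."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def combine(vectors, coeffs):
    """Linear combination sum_j coeffs[j] * vectors[j], computed in old coordinates."""
    n = len(vectors[0])
    return [sum(c * v[k] for c, v in zip(coeffs, vectors)) for k in range(n)]


new_basis = [[1, 2], [3, 1]]        # e'_1 = e_1 + 2e_2,  e'_2 = 3e_1 + e_2
P = matrix_of_basis(new_basis)      # [[1, 3], [2, 1]]

x_prime = [2, -1]                   # coordinates of x with respect to {e'_1, e'_2}
x = apply(P, x_prime)               # coordinates of x with respect to {e_1, e_2}
```

Expanding $x = 2e'_1 - e'_2$ directly with `combine(new_basis, x_prime)` gives the same coordinates as `apply(P, x_prime)`, which is the statement $\mathbf{P}\mathbf{x}' = \mathbf{x}$.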
4.3.9. Exercise. Let $X = R^n$ and let $\{u_1, \ldots, u_n\}$ be the natural basis for $R^n$ (see Example 4.1.15). Let $\{e_1, \ldots, e_n\}$ be another basis for $R^n$, and let $\mathbf{e}_1, \ldots, \mathbf{e}_n$ be the coordinate representations of $e_1, \ldots, e_n$, respectively, with respect to the natural basis. Show that the matrix of basis $\{e_1, \ldots, e_n\}$ with respect to basis $\{u_1, \ldots, u_n\}$ is given by $\mathbf{P} = [\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n]$, i.e., the matrix whose columns are the column vectors $\mathbf{e}_1, \ldots, \mathbf{e}_n$.

4.3.10. Theorem. Let $A \in L(X, Y)$, and let $\{e_1, \ldots, e_n\}$ and $\{f_1, \ldots, f_m\}$ be bases for $X$ and $Y$, respectively. Let $\mathbf{A}$ be the matrix of $A$ with respect to the bases $\{e_1, \ldots, e_n\}$ in $X$ and $\{f_1, \ldots, f_m\}$ in $Y$. Let $\{e'_1, \ldots, e'_n\}$ be another basis for $X$, and let the matrix of $\{e'_1, \ldots, e'_n\}$ with respect to $\{e_1, \ldots, e_n\}$ be $\mathbf{P}$. Let $\{f'_1, \ldots, f'_m\}$ be another basis for $Y$, and let $\mathbf{Q}$ be the matrix of $\{f_1, \ldots, f_m\}$ with respect to $\{f'_1, \ldots, f'_m\}$. Let $\mathbf{A}'$ be the matrix of $A$ with respect to the bases $\{e'_1, \ldots, e'_n\}$ in $X$ and $\{f'_1, \ldots, f'_m\}$ in $Y$. Then
$$\mathbf{A}' = \mathbf{Q}\mathbf{A}\mathbf{P}.$$

Proof. We have
$$\begin{aligned}
Ae'_i &= A\Big(\sum_{k=1}^{n} p_{ki} e_k\Big) = \sum_{k=1}^{n} p_{ki} Ae_k = \sum_{k=1}^{n} p_{ki} \Big(\sum_{l=1}^{m} a_{lk} f_l\Big) \\
&= \sum_{k=1}^{n} p_{ki} \Big[\sum_{l=1}^{m} a_{lk} \Big(\sum_{j=1}^{m} q_{jl} f'_j\Big)\Big] = \sum_{j=1}^{m} \Big(\sum_{l=1}^{m} \sum_{k=1}^{n} q_{jl} a_{lk} p_{ki}\Big) f'_j.
\end{aligned}$$
Now, by definition, $Ae'_i = \sum_{j=1}^{m} a'_{ji} f'_j$. Since a matrix of a linear transformation is uniquely determined once the bases are specified, we conclude, after relabeling the indices, that
$$a'_{ij} = \sum_{l=1}^{m} \sum_{k=1}^{n} q_{il} a_{lk} p_{kj}$$
for $i = 1, \ldots, m$ and $j = 1, \ldots, n$. Therefore, $\mathbf{A}' = \mathbf{Q}\mathbf{A}\mathbf{P}$. ∎
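In Figure A, Theorem 4.3.10 is depicted schematically.

4.3.11. Figure A. Schematic diagram of Theorem 4.3.10. [Diagram: $A$ maps $X$ into $Y$. With respect to the bases $\{e_1, \ldots, e_n\}$ and $\{f_1, \ldots, f_m\}$ it is represented by $\mathbf{y} = \mathbf{A}\mathbf{x}$; with respect to the bases $\{e'_1, \ldots, e'_n\}$ and $\{f'_1, \ldots, f'_m\}$ it is represented by $\mathbf{y}' = \mathbf{A}'\mathbf{x}'$, where $\mathbf{x} = \mathbf{P}\mathbf{x}'$ and $\mathbf{y}' = \mathbf{Q}\mathbf{y}$.]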
The preceding result motivates the following definition.

4.3.12. Definition. An $(m \times n)$ matrix $\mathbf{A}'$ is said to be equivalent to an $(m \times n)$ matrix $\mathbf{A}$ if there exists an $(m \times m)$ non-singular matrix $\mathbf{Q}$ and an $(n \times n)$ non-singular matrix $\mathbf{P}$ such that
$$\mathbf{A}' = \mathbf{Q}\mathbf{A}\mathbf{P}. \tag{4.3.13}$$
If $\mathbf{A}'$ is equivalent to $\mathbf{A}$, we write $\mathbf{A}' \sim \mathbf{A}$. Thus, an $(m \times n)$ matrix $\mathbf{A}'$ is equivalent to an $(m \times n)$ matrix $\mathbf{A}$ if and only if $\mathbf{A}$ and $\mathbf{A}'$ can be interpreted as both being matrices of the same linear transformation $A$ of a linear space $X$ into a linear space $Y$, but with respect to possibly different choices of bases. Our next result shows that $\sim$ is reflexive, symmetric, and transitive, and as such is an equivalence relation.

4.3.14. Theorem. Let $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ be $(m \times n)$ matrices. Then

(i) $\mathbf{A}$ is always equivalent to $\mathbf{A}$;
(ii) if $\mathbf{A}$ is equivalent to $\mathbf{B}$, then $\mathbf{B}$ is equivalent to $\mathbf{A}$; and
(iii) if $\mathbf{A}$ is equivalent to $\mathbf{B}$ and $\mathbf{B}$ is equivalent to $\mathbf{C}$, then $\mathbf{A}$ is equivalent to $\mathbf{C}$.

4.3.15. Exercise. Prove Theorem 4.3.14.
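The reader can prove the next result readily.

4.3.16. Theorem. Let $\mathbf{A}$ and $\mathbf{B}$ be $m \times n$ matrices. Then

(i) every matrix $\mathbf{A}$ is equivalent to a matrix of the form
$$\begin{bmatrix}
1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0
\end{bmatrix}, \tag{4.3.17}$$
in which the first $r = \text{rank } \mathbf{A}$ diagonal entries are equal to 1 and all other entries are zero;
(ii) two $(m \times n)$ matrices $\mathbf{A}$ and $\mathbf{B}$ are equivalent if and only if they have the same rank; and
(iii) $\mathbf{A}$ and $\mathbf{A}^T$ have the same rank.

4.3.18. Exercise. Prove Theorem 4.3.16.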
Our definition of rank of a matrix given in the last section (Definition 4.2.25) is sometimes called the column rank of a matrix. Sometimes, an analogous definition for row rank of a matrix is also considered. The above theorem shows that the row rank of a matrix is equal to its column rank.
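The equality of row rank and column rank can be observed on a small example. The sketch below counts pivots found by row reduction over exact rationals; applying it to a matrix gives its row rank, and applying it to the transpose gives the column rank of the original matrix. The elimination procedure, the sample matrix, and the names `rank` and `transpose` are assumptions made for illustration and are not taken from the text.

```python
from fractions import Fraction


def rank(M):
    """Number of pivots found by row reduction, i.e. the number of independent rows of M."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0                                           # pivots found so far
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue                                # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def transpose(M):
    """Interchange rows and columns, as in Eq. (4.2.9)."""
    return [list(col) for col in zip(*M)]


A = [[1, 2, 3, 4],
     [2, 4, 6, 8],
     [0, 1, 1, 0]]        # row 2 = 2 * row 1; column 3 = column 1 + column 2; column 4 = 4 * column 1

row_rank = rank(A)
column_rank = rank(transpose(A))
```

Both values equal 2 for this matrix, as part (iii) of Theorem 4.3.16 predicts.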
Next, let us consider the special case when $X = Y$. We have:

4.3.19. Theorem. Let $A \in L(X, X)$, let $\{e_1, \ldots, e_n\}$ be a basis for $X$, and let $\mathbf{A}$ be the matrix of $A$ with respect to $\{e_1, \ldots, e_n\}$. Let $\{e'_1, \ldots, e'_n\}$ be another basis for $X$ whose matrix with respect to $\{e_1, \ldots, e_n\}$ is $\mathbf{P}$. Let $\mathbf{A}'$ be the matrix of $A$ with respect to $\{e'_1, \ldots, e'_n\}$. Then
$$\mathbf{A}' = \mathbf{P}^{-1}\mathbf{A}\mathbf{P}. \tag{4.3.20}$$
The meaning of the above theorem is depicted schematically in Figure B. The proof of this theorem is just a special application of Theorem 4.3.10.

4.3.21. Figure B. Schematic diagram of Theorem 4.3.19. [Diagram: $A$ maps $X$ into $X$. It is represented by $\mathbf{A}$ with respect to the basis $\{e_1, \ldots, e_n\}$ and by $\mathbf{A}'$ with respect to the basis $\{e'_1, \ldots, e'_n\}$; the two coordinate systems are related by $\mathbf{P}$.]
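Theorem 4.3.19 gives rise to the following concept.

4.3.22. Definition. An $(n \times n)$ matrix $\mathbf{A}'$ is said to be similar to an $(n \times n)$ matrix $\mathbf{A}$ if there exists an $(n \times n)$ non-singular matrix $\mathbf{P}$ such that
$$\mathbf{A}' = \mathbf{P}^{-1}\mathbf{A}\mathbf{P}. \tag{4.3.23}$$
If $\mathbf{A}'$ is similar to $\mathbf{A}$, we write $\mathbf{A}' \sim \mathbf{A}$. We call $\mathbf{P}$ a similarity transformation.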
It is a simple matter to prove the following:

4.3.24. Theorem. Let $\mathbf{A}'$ be similar to $\mathbf{A}$; i.e., $\mathbf{A}' = \mathbf{P}^{-1}\mathbf{A}\mathbf{P}$, where $\mathbf{P}$ is non-singular. Then $\mathbf{A}$ is similar to $\mathbf{A}'$ and $\mathbf{A} = \mathbf{P}\mathbf{A}'\mathbf{P}^{-1}$.

In view of this result, there is no ambiguity in saying two matrices are similar. To sum up, if two matrices $\mathbf{A}$ and $\mathbf{A}'$ represent the same linear transformation $A \in L(X, X)$, possibly with respect to two different bases for $X$, then $\mathbf{A}$ and $\mathbf{A}'$ are similar matrices.
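A small numerical check of Eq. (4.3.20) and Theorem 4.3.24 is sketched below. The $(2 \times 2)$ matrices, the coordinate vector, and the helper names `matmul`, `apply`, and `inverse_2x2` are assumptions chosen for this example; the closed-form $(2 \times 2)$ inverse is used only to keep the sketch short.

```python
from fractions import Fraction


def matmul(A, B):
    """Product AB of conformal matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def apply(M, x):
    """Matrix-vector product Mx."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def inverse_2x2(M):
    """Inverse of a non-singular (2 x 2) matrix via its determinant."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[2, 1], [0, 3]]                         # matrix of A with respect to {e_1, e_2}
P = [[1, 1], [0, 1]]                         # matrix of {e'_1, e'_2} with respect to {e_1, e_2}
P_inv = inverse_2x2(P)

A_prime = matmul(matmul(P_inv, A), P)        # A' = P^{-1} A P, Eq. (4.3.20)
A_back = matmul(matmul(P, A_prime), P_inv)   # A = P A' P^{-1}, Theorem 4.3.24

x_prime = [1, 2]                             # coordinates with respect to the primed basis
x = apply(P, x_prime)                        # x = P x', Theorem 4.3.8
```

In this example `A_prime` turns out to be diagonal, and transforming `x_prime` by `A_prime` and then converting with `P` gives the same coordinates as applying `A` to `x`, so both matrices describe the same transformation.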
Our next result shows that $\sim$ given in Definition 4.3.22 is an equivalence relation.

4.3.25. Theorem. Let $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ be $(n \times n)$ matrices. Then

(i) $\mathbf{A}$ is similar to $\mathbf{A}$;
(ii) if $\mathbf{A}$ is similar to $\mathbf{B}$, then $\mathbf{B}$ is similar to $\mathbf{A}$; and
(iii) if $\mathbf{A}$ is similar to $\mathbf{B}$ and if $\mathbf{B}$ is similar to $\mathbf{C}$, then $\mathbf{A}$ is similar to $\mathbf{C}$.

4.3.26. Exercise. Prove Theorem 4.3.25.
For similar matrices we also have the following result.

4.3.27. Theorem.
(i) If an $(n \times n)$ matrix $\mathbf{A}$ is similar to an $(n \times n)$ matrix $\mathbf{B}$, then $\mathbf{A}^k$ is similar to $\mathbf{B}^k$, where $k$ is a positive integer.
(ii) Let

$$f(\lambda) = \alpha_0 + \alpha_1\lambda + \cdots + \alpha_m\lambda^m, \tag{4.3.28}$$

where $\alpha_0, \ldots, \alpha_m \in F$. Then

$$f(\mathbf{P}^{-1}\mathbf{A}\mathbf{P}) = \mathbf{P}^{-1}f(\mathbf{A})\mathbf{P}. \tag{4.3.29}$$

This implies that if $\mathbf{B}$ is similar to $\mathbf{A}$, then $f(\mathbf{B})$ is similar to $f(\mathbf{A})$. In fact, the same matrix $\mathbf{P}$ is involved.
(iii) Let $\mathbf{A}'$ be similar to $\mathbf{A}$, and let $f(\lambda)$ denote the polynomial of Eq. (4.3.28). Then $f(\mathbf{A}) = \mathbf{0}$ if and only if $f(\mathbf{A}') = \mathbf{0}$.
(iv) Let $A \in L(X, X)$, and let $\mathbf{A}$ be the matrix of $A$ with respect to a basis $\{e_1, \ldots, e_n\}$ in $X$. Let $f(\lambda)$ denote the polynomial of Eq. (4.3.28). Then $f(\mathbf{A})$ is the matrix of $f(A)$ with respect to the basis $\{e_1, \ldots, e_n\}$.
(v) Let $A \in L(X, X)$, and let $f(\lambda)$ denote the polynomial of Eq. (4.3.28). Let $\mathbf{A}$ be any matrix of $A$. Then $f(A) = 0$ if and only if $f(\mathbf{A}) = \mathbf{0}$.

4.3.30. Exercise. Prove Theorem 4.3.27.
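Parts (ii) and (iii) of Theorem 4.3.27 can be illustrated on a small case. The sketch below evaluates a polynomial at a matrix by Horner's rule and compares both sides of Eq. (4.3.29). The matrices, the coefficient lists, and the helper names are assumptions chosen for illustration; coefficients are stored lowest degree first.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def poly_of_matrix(coeffs, M):
    """Evaluate c0*I + c1*M + ... + cm*M^m by Horner's rule (coeffs = [c0, ..., cm])."""
    n = len(M)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = matmul(result, M)   # multiply the running value by M
        for i in range(n):
            result[i][i] += c        # then add c * I
    return result

A = [[2, 1], [0, 3]]        # illustrative matrix (an assumption)
P = [[1, 1], [1, 2]]        # illustrative non-singular matrix, det(P) = 1
P_inv = [[2, -1], [-1, 1]]  # its inverse, written out by hand
B = matmul(matmul(P_inv, A), P)   # B is similar to A

f = [1, -2, 0, 3]           # f(x) = 1 - 2x + 3x^3, an arbitrary polynomial
lhs = poly_of_matrix(f, B)                            # f(P^{-1} A P)
rhs = matmul(matmul(P_inv, poly_of_matrix(f, A)), P)  # P^{-1} f(A) P

g = [6, -5, 1]              # g(x) = (x - 2)(x - 3), which annihilates A
```

With these choices `lhs` and `rhs` agree, and `g` sends both `A` and `B` to the zero matrix, in line with part (iii).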
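We can use results such as the preceding ones to good advantage. For example, let $\mathbf{A}'$ denote the matrix

$$\mathbf{A}' = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 & 0 \\ 0 & \lambda_2 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & \lambda_{n-1} & 0 \\ 0 & 0 & \cdots & 0 & \lambda_n \end{bmatrix}. \tag{4.3.31}$$

Then

$$(\mathbf{A}')^k = \begin{bmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{bmatrix}.$$

Now let $f(\lambda)$ be given by Eq. (4.3.28). Then

$$f(\mathbf{A}') = \alpha_0\mathbf{I} + \alpha_1\mathbf{A}' + \cdots + \alpha_m(\mathbf{A}')^m = \begin{bmatrix} f(\lambda_1) & 0 & \cdots & 0 \\ 0 & f(\lambda_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & f(\lambda_n) \end{bmatrix}.$$

We conclude the present section with the following definition.

4.3.32. Definition. We call a matrix of the form (4.3.31) a diagonal matrix. Specifically, a square $(n \times n)$ matrix $\mathbf{A} = [a_{ij}]$ is said to be a diagonal matrix if $a_{ij} = 0$ for all $i \neq j$. In this case we write $\mathbf{A} = \operatorname{diag}(a_{11}, a_{22}, \ldots, a_{nn})$.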
4.4. DETERMINANTS OF MATRICES
At this point of our development we need to consider the important topic of determinants. After stating the definition of the determinant of a matrix, we explore some of the commonly used properties of determinants. We then characterize singular and non-singular linear transformations on finite-dimensional vector spaces in terms of determinants. Finally, we give a method of determining the inverse of non-singular matrices.

Let $N = \{1, 2, \ldots, n\}$. We recall (see Definition 1.2.28) that a permutation on $N$ is a one-to-one mapping of $N$ onto itself. For example, if $\sigma$ denotes a permutation on $N$, then we can represent it as

$$\sigma = \begin{pmatrix} 1 & 2 & \cdots & n \\ j_1 & j_2 & \cdots & j_n \end{pmatrix},$$

where $j_i \in N$ for $i = 1, \ldots, n$ and $j_i \neq j_k$ for $i \neq k$. Henceforth, we represent $\sigma$ given above, more compactly, as

$$\sigma = j_1 j_2 \ldots j_n.$$

Clearly, there are $n!$ possible permutations on $N$. We let $P(N)$ denote the set of all permutations on $N$, and we distinguish between odd and even permutations. Specifically, if there is an even number of pairs $(i, k)$ such that $i > k$ but $i$ precedes $k$ in $\sigma$, then we say that $\sigma$ is even. Otherwise $\sigma$ is said to be odd. Finally, we define the function sgn from $P(N)$ into $F$ by

$$\operatorname{sgn}(\sigma) = \begin{cases} +1 & \text{if } \sigma \text{ is even} \\ -1 & \text{if } \sigma \text{ is odd} \end{cases}$$

for all $\sigma \in P(N)$. Before giving the definition of the determinant of a matrix, let us consider a specific example.
4.4.1. Example. As indicated in the accompanying table, there are six permutations on $N = \{1, 2, 3\}$. In this table the odd and even permutations are identified and the function sgn is given.

  σ     (j1, j2)   (j1, j3)   (j2, j3)   σ is odd or even   sgn σ
  123   (1, 2)     (1, 3)     (2, 3)     even               +1
  132   (1, 3)     (1, 2)     (3, 2)     odd                -1
  213   (2, 1)     (2, 3)     (1, 3)     odd                -1
  231   (2, 3)     (2, 1)     (3, 1)     even               +1
  312   (3, 1)     (3, 2)     (1, 2)     even               +1
  321   (3, 2)     (3, 1)     (2, 1)     odd                -1

Now let $\mathbf{A}$ denote the $(n \times n)$ matrix

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}.$$
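We form the product of $n$ elements from $\mathbf{A}$ by taking one and only one element from each row and one and only one element from each column. We represent this product as

$$a_{1j_1} a_{2j_2} \cdots a_{nj_n},$$

where $(j_1 j_2 \ldots j_n) \in P(N)$. It is possible to find $n!$ such products, one for each $\sigma \in P(N)$. We now define the determinant of $\mathbf{A}$, denoted by $\det(\mathbf{A})$, by the sum

$$\det(\mathbf{A}) = \sum_{\sigma \in P(N)} \operatorname{sgn}(\sigma)\, a_{1j_1} a_{2j_2} \cdots a_{nj_n}, \tag{4.4.2}$$

where $\sigma = j_1 \ldots j_n$. We also denote the determinant of $\mathbf{A}$ by writing

$$\det(\mathbf{A}) = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}. \tag{4.4.3}$$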
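Definition (4.4.2) translates directly into a (very inefficient, $n!$-term) computation. The following is a minimal sketch: `sgn` counts the pairs that are out of natural order, and `det` sums the signed products over all permutations. Indices are 0-based in the code, and the test matrices are illustrative assumptions.

```python
from itertools import permutations

def sgn(sigma):
    """+1 if sigma has an even number of out-of-order pairs, -1 otherwise."""
    n = len(sigma)
    inversions = sum(1 for a in range(n) for b in range(a + 1, n)
                     if sigma[a] > sigma[b])
    return 1 if inversions % 2 == 0 else -1

def det(A):
    """Determinant as the signed sum over all permutations, as in Eq. (4.4.2)."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        term = sgn(sigma)
        for i in range(n):
            term *= A[i][sigma[i]]   # one element from each row and each column
        total += term
    return total

A2 = [[1, 2], [3, 4]]
A3 = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
```

For the permutations of three symbols, `sgn` reproduces the last column of the table in Example 4.4.1 (with 0-based labels), for instance `sgn((1, 2, 0))` corresponds to the even permutation 231.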
We now present some of the fundamental properties of determinants.

4.4.4. Theorem. Let $\mathbf{A}$ and $\mathbf{B}$ be $(n \times n)$ matrices.
(i) $\det(\mathbf{A}^T) = \det(\mathbf{A})$.
(ii) If all elements of a column (or row) of $\mathbf{A}$ are zero, then $\det(\mathbf{A}) = 0$.
(iii) If $\mathbf{B}$ is the matrix obtained by multiplying every element in a column (or row) of $\mathbf{A}$ by a constant $\alpha$, while all other columns of $\mathbf{B}$ are the same as those in $\mathbf{A}$, then $\det(\mathbf{B}) = \alpha \det(\mathbf{A})$.
(iv) If $\mathbf{B}$ is the same as $\mathbf{A}$, except that two columns (or rows) are interchanged, then $\det(\mathbf{B}) = -\det(\mathbf{A})$.
(v) If two columns (or rows) of $\mathbf{A}$ are identical, then $\det(\mathbf{A}) = 0$.
(vi) If the columns (or rows) of $\mathbf{A}$ are linearly dependent, then $\det(\mathbf{A}) = 0$.

Proof. To prove the first part, we note first that each product in the sum given in Eq. (4.4.2) has as a factor one and only one element from each column and each row of $\mathbf{A}$. Thus, transposing matrix $\mathbf{A}$ will not affect the $n!$ products appearing in the summation. We now must check to see that the sign of each term is the same. For $\sigma \in P(N)$, the term in $\det(\mathbf{A})$ corresponding to $\sigma$ is $\operatorname{sgn}(\sigma)\, a_{1j_1} a_{2j_2} \cdots a_{nj_n}$. There is a product term in $\det(\mathbf{A}^T)$ of the form $a_{j'_1 1} a_{j'_2 2} \cdots a_{j'_n n}$ such that $a_{1j_1} a_{2j_2} \cdots a_{nj_n} = a_{j'_1 1} a_{j'_2 2} \cdots a_{j'_n n}$. The right-hand side of this equation is just a rearrangement of the left-hand side, with the factors ordered by column index instead of by row index. The number of pairs that are out of natural order in $j_1 j_2 \ldots j_n$ is the same as the number of such pairs in $j'_1 j'_2 \ldots j'_n$. Thus, if $\sigma' = j'_1 j'_2 \ldots j'_n$, then $\operatorname{sgn}(\sigma') = \operatorname{sgn}(\sigma)$, which means $\det(\mathbf{A}^T) = \det(\mathbf{A})$. Note that this result implies that any property below which is proved for columns holds equally well for rows.

To prove the second part, we note from Eq. (4.4.2) that if for some $i$, $a_{ik} = 0$ for all $k$, then $\det(\mathbf{A}) = 0$. This proves that if every element in a row of $\mathbf{A}$ is zero, then $\det(\mathbf{A}) = 0$. By part (i) it follows that this result holds also for columns. ∎
4.4.5. Exercise. Prove parts (iii)-(vi) of Theorem 4.4.4.
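We now introduce some additional concepts for determinants.

4.4.6. Definition. Let $\mathbf{A} = [a_{ij}]$ be an $n \times n$ matrix. If the $i$th row and $j$th column of $\mathbf{A}$ are deleted, the remaining $(n - 1)$ rows and $(n - 1)$ columns can be used to form another matrix $\mathbf{M}_{ij}$ whose determinant is $\det(\mathbf{M}_{ij})$. We call $\det(\mathbf{M}_{ij})$ the minor of $a_{ij}$. If the diagonal elements of $\mathbf{M}_{ij}$ are diagonal elements of $\mathbf{A}$, i.e., $i = j$, then we speak of a principal minor of $\mathbf{A}$. The cofactor of $a_{ij}$ is defined as $(-1)^{i+j}\det(\mathbf{M}_{ij})$.

For example, if $\mathbf{A}$ is a $(3 \times 3)$ matrix, then

$$\det(\mathbf{A}) = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix},$$

the minor of element $a_{23}$ is

$$\det(\mathbf{M}_{23}) = \begin{vmatrix} a_{11} & a_{12} \\ a_{31} & a_{32} \end{vmatrix},$$

and the cofactor of $a_{23}$ is

$$(-1)^{2+3}\det(\mathbf{M}_{23}) = -\begin{vmatrix} a_{11} & a_{12} \\ a_{31} & a_{32} \end{vmatrix}.$$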
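The next result provides us with a convenient method of evaluating determinants.

4.4.7. Theorem. Let $\mathbf{A}$ be an $n \times n$ matrix. Let $c_{ij}$ denote the cofactor of $a_{ij}$, $i, j = 1, \ldots, n$. Then the determinant of $\mathbf{A}$ is equal to the sum of the products of the elements of any column (or row) of $\mathbf{A}$, each by its own cofactor. Specifically,

$$\det(\mathbf{A}) = \sum_{i=1}^{n} a_{ij} c_{ij} \tag{4.4.8}$$

for $j = 1, \ldots, n$, and

$$\det(\mathbf{A}) = \sum_{j=1}^{n} a_{ij} c_{ij} \tag{4.4.9}$$

for $i = 1, \ldots, n$.

For example, if $\mathbf{A}$ is a $(2 \times 2)$ matrix, then we have

$$\det(\mathbf{A}) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}.$$

If $\mathbf{A}$ is a $(3 \times 3)$ matrix, then we have

$$\det(\mathbf{A}) = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}c_{11} + a_{12}c_{12} + a_{13}c_{13}.$$

In this case five other possibilities exist. For example, we also have

$$\det(\mathbf{A}) = a_{11}c_{11} + a_{21}c_{21} + a_{31}c_{31}.$$

4.4.10. Exercise. Prove Theorem 4.4.7.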
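The cofactor expansions of Theorem 4.4.7 can be tried out on a concrete matrix. The sketch below is a plain recursive implementation, written for clarity rather than speed; the function names and the sample matrix are assumptions made for illustration. Indices are 0-based, which leaves the sign $(-1)^{i+j}$ unchanged.

```python
def minor_matrix(A, i, j):
    """M_ij: the matrix A with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum(A[0][j] * cofactor(A, 0, j) for j in range(len(A)))

def cofactor(A, i, j):
    """Cofactor c_ij = (-1)^(i+j) det(M_ij)."""
    return (-1) ** (i + j) * det(minor_matrix(A, i, j))

def expand_row(A, i):
    """Right-hand side of Eq. (4.4.9) for row i."""
    return sum(A[i][j] * cofactor(A, i, j) for j in range(len(A)))

def expand_column(A, j):
    """Right-hand side of Eq. (4.4.8) for column j."""
    return sum(A[i][j] * cofactor(A, i, j) for i in range(len(A)))

A = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]   # illustrative matrix (an assumption)
expansions = [expand_row(A, i) for i in range(3)] + \
             [expand_column(A, j) for j in range(3)]
```

All six entries of `expansions` coincide, which is the "six possibilities" remark for a $(3 \times 3)$ matrix.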
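We also have:

4.4.11. Theorem. If the $i$th row of an $(n \times n)$ matrix $\mathbf{A}$ consists of elements of the form $a_{i1} + a'_{i1}, a_{i2} + a'_{i2}, \ldots, a_{in} + a'_{in}$; i.e., if

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{i1} + a'_{i1} & a_{i2} + a'_{i2} & \cdots & a_{in} + a'_{in} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix},$$

then

$$\det(\mathbf{A}) = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{in} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} + \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a'_{i1} & a'_{i2} & \cdots & a'_{in} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}.$$

4.4.12. Exercise. Prove Theorem 4.4.11.

Furthermore, we have:

4.4.13. Theorem. Let $\mathbf{A}$ and $\mathbf{B}$ be $(n \times n)$ matrices. If $\mathbf{B}$ is obtained from the matrix $\mathbf{A}$ by adding a constant $\alpha$ times any column (or row) to any other column (or row) of $\mathbf{A}$, then $\det(\mathbf{B}) = \det(\mathbf{A})$.

4.4.14. Exercise. Prove Theorem 4.4.13.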
In addition, we can prove:

4.4.15. Theorem. Let $\mathbf{A}$ be an $(n \times n)$ matrix, and let $c_{ij}$ denote the cofactor of $a_{ij}$, $i, j = 1, \ldots, n$. Then the sum of products of the elements of any column (or row) by the corresponding cofactors of the elements of any other column (or row) is zero. That is,

$$\sum_{i=1}^{n} a_{ij} c_{ik} = 0 \quad \text{for } j \neq k \tag{4.4.16a}$$

and

$$\sum_{j=1}^{n} a_{ij} c_{kj} = 0 \quad \text{for } i \neq k. \tag{4.4.16b}$$

4.4.17. Exercise. Prove Theorem 4.4.15.

We can combine Eqs. (4.4.8) and (4.4.16a) to obtain

$$\sum_{i=1}^{n} a_{ij} c_{ik} = \det(\mathbf{A})\,\delta_{jk}, \tag{4.4.18}$$

$j, k = 1, \ldots, n$, where $\delta_{jk}$ denotes the Kronecker delta. Similarly, we can combine Eqs. (4.4.9) and (4.4.16b) to obtain

$$\sum_{j=1}^{n} a_{ij} c_{kj} = \det(\mathbf{A})\,\delta_{ik}, \tag{4.4.19}$$

$i, k = 1, \ldots, n$.

We are now in a position to prove the following important result.
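4.4.20. Theorem. Let $\mathbf{A}$ and $\mathbf{B}$ be $(n \times n)$ matrices. Then

$$\det(\mathbf{A}\mathbf{B}) = \det(\mathbf{A})\det(\mathbf{B}). \tag{4.4.21}$$

Proof. We have

$$\det(\mathbf{A}\mathbf{B}) = \begin{vmatrix} \sum_{i_1=1}^{n} a_{1i_1}b_{i_1 1} & \cdots & \sum_{i_n=1}^{n} a_{1i_n}b_{i_n n} \\ \vdots & & \vdots \\ \sum_{i_1=1}^{n} a_{ni_1}b_{i_1 1} & \cdots & \sum_{i_n=1}^{n} a_{ni_n}b_{i_n n} \end{vmatrix}.$$

By Theorem 4.4.11 and Theorem 4.4.4, part (iii), we have

$$\det(\mathbf{A}\mathbf{B}) = \sum_{i_1=1}^{n} \cdots \sum_{i_n=1}^{n} b_{i_1 1} b_{i_2 2} \cdots b_{i_n n} \begin{vmatrix} a_{1i_1} & \cdots & a_{1i_n} \\ \vdots & & \vdots \\ a_{ni_1} & \cdots & a_{ni_n} \end{vmatrix}.$$

This determinant will vanish whenever two or more of the indices $i_j$, $j = 1, \ldots, n$, are identical. Thus, we need to sum only over $\sigma \in P(N)$. We have

$$\det(\mathbf{A}\mathbf{B}) = \sum_{\sigma \in P(N)} b_{i_1 1} b_{i_2 2} \cdots b_{i_n n} \begin{vmatrix} a_{1i_1} & \cdots & a_{1i_n} \\ \vdots & & \vdots \\ a_{ni_1} & \cdots & a_{ni_n} \end{vmatrix},$$

where $\sigma = i_1 i_2 \ldots i_n$ and $P(N)$ is the set of all permutations of $N = \{1, \ldots, n\}$. It is now straightforward to show that

$$\begin{vmatrix} a_{1i_1} & \cdots & a_{1i_n} \\ \vdots & & \vdots \\ a_{ni_1} & \cdots & a_{ni_n} \end{vmatrix} = \operatorname{sgn}(\sigma)\det(\mathbf{A}),$$

and hence it follows that $\det(\mathbf{A}\mathbf{B}) = \det(\mathbf{A})\det(\mathbf{B})$. ∎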
Our next result is readily verified.

4.4.22. Theorem. Let $\mathbf{I}$ be the $(n \times n)$ identity matrix, and let $\mathbf{0}$ be the $(n \times n)$ zero matrix. Then $\det(\mathbf{I}) = 1$ and $\det(\mathbf{0}) = 0$.

4.4.23. Exercise. Prove Theorem 4.4.22.
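The next theorem allows us to characterize non-singular matrices in terms of their determinants.

4.4.24. Theorem. An $(n \times n)$ matrix $\mathbf{A}$ is non-singular if and only if $\det(\mathbf{A}) \neq 0$.

Proof. Suppose that $\mathbf{A}$ is non-singular. Then $\mathbf{A}^{-1}$ exists and $\mathbf{A}^{-1}\mathbf{A} = \mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$. From this it follows that $\det(\mathbf{A}^{-1}\mathbf{A}) = 1 \neq 0$, and thus, in view of Eq. (4.4.21), $\det(\mathbf{A}^{-1}) \neq 0$ and $\det(\mathbf{A}) \neq 0$.

Next, assume that $\mathbf{A}$ is singular. By Theorem 4.3.16, there exist non-singular matrices $\mathbf{Q}$ and $\mathbf{P}$ such that

$$\mathbf{A}' = \mathbf{Q}\mathbf{A}\mathbf{P} = \begin{bmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & 1 & & & \\ & & & 0 & & \\ & & & & \ddots & \\ & & & & & 0 \end{bmatrix}.$$

This shows that rank $\mathbf{A} < n$ and $\det(\mathbf{A}') = 0$. But

$$\det(\mathbf{Q}\mathbf{A}\mathbf{P}) = [\det(\mathbf{Q})] \cdot [\det(\mathbf{A})] \cdot [\det(\mathbf{P})] = 0,$$

and $\det(\mathbf{P}) \neq 0$ and $\det(\mathbf{Q}) \neq 0$. Therefore, if $\mathbf{A}$ is singular, then $\det(\mathbf{A}) = 0$. ∎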
Let us now turn to the problem of finding the inverse $\mathbf{A}^{-1}$ of a non-singular matrix $\mathbf{A}$. In doing so, we need to introduce the classical adjoint of $\mathbf{A}$.

4.4.25. Definition. Let $\mathbf{A}$ be an $(n \times n)$ matrix, and let $c_{ij}$ be the cofactor of $a_{ij}$ for $i, j = 1, \ldots, n$. Let $\mathbf{C}$ be the matrix formed by the cofactors of $\mathbf{A}$; i.e., $\mathbf{C} = [c_{ij}]$. The matrix $\mathbf{C}^T$ is called the classical adjoint of $\mathbf{A}$. We write $\operatorname{adj}(\mathbf{A})$ to denote the classical adjoint of $\mathbf{A}$.

We now have:

4.4.26. Theorem. Let $\mathbf{A}$ be an $(n \times n)$ matrix. Then

$$\mathbf{A}[\operatorname{adj}(\mathbf{A})] = [\operatorname{adj}(\mathbf{A})]\mathbf{A} = [\det(\mathbf{A})] \cdot \mathbf{I}.$$

Proof. The proof follows by direct computation, using Eqs. (4.4.18) and (4.4.19). ∎
As an immediate consequence of Theorem 4.4.26 we now have the following practical result.

4.4.27. Corollary. Let $\mathbf{A}$ be a non-singular $(n \times n)$ matrix. Then

$$\mathbf{A}^{-1} = \frac{1}{\det(\mathbf{A})}\operatorname{adj}(\mathbf{A}). \tag{4.4.28}$$

4.4.29. Example. Consider the matrix

$$\mathbf{A} = [\,\ldots\,].$$

We have $\det(\mathbf{A}) = -1$,

$$\operatorname{adj}(\mathbf{A}) = [\,\ldots\,],$$

and

$$\mathbf{A}^{-1} = [\,\ldots\,].$$
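Corollary 4.4.27 can be turned into a small exact-arithmetic routine. The sketch below builds the classical adjoint as the transposed cofactor matrix and divides by the determinant. The matrix used is an illustrative assumption and is not the matrix of Example 4.4.29; the function names are hypothetical, and the method is practical only for small matrices.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def minor_matrix(A, i, j):
    """A with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by cofactor expansion along the first row."""
    if not A:
        return 1          # determinant of the empty (0 x 0) matrix
    return sum((-1) ** j * A[0][j] * det(minor_matrix(A, 0, j))
               for j in range(len(A)))

def adjugate(A):
    """Classical adjoint: transpose of the matrix of cofactors."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor_matrix(A, j, i)) for j in range(n)]
            for i in range(n)]

def inverse(A):
    """A^{-1} = adj(A) / det(A), as in Eq. (4.4.28)."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(x, d) for x in row] for row in adjugate(A)]

A = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]   # illustrative matrix, det(A) = 18
A_inv = inverse(A)
```

Multiplying `A` by `adjugate(A)` gives $18\,\mathbf{I}$, in agreement with Theorem 4.4.26, and `matmul(A, A_inv)` is the identity.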
The proofs of the next two theorems are left as an exercise.

4.4.30. Theorem. If $\mathbf{A}$ and $\mathbf{B}$ are similar matrices, then $\det(\mathbf{A}) = \det(\mathbf{B})$.

4.4.31. Theorem. Let $A \in L(X, X)$. Let $\mathbf{A}$ be the matrix of $A$ with respect to a basis $\{e_1, \ldots, e_n\}$ in $X$, and let $\mathbf{A}'$ be the matrix of $A$ with respect to another basis $\{e'_1, \ldots, e'_n\}$ in $X$. Then $\det(\mathbf{A}) = \det(\mathbf{A}')$.

4.4.32. Exercise. Prove Theorems 4.4.30 and 4.4.31.

In view of the preceding results, there is no ambiguity in the following definition.

4.4.33. Definition. The determinant of a linear transformation $A$ of a finite-dimensional vector space $X$ into $X$ is the determinant of any matrix $\mathbf{A}$ representing it; i.e., $\det(A) \triangleq \det(\mathbf{A})$.

The last result of the present section is a consequence of Theorems 4.4.20 and 4.4.24.

4.4.34. Theorem. Let $X$ be a finite-dimensional vector space, and let $A, B \in L(X, X)$. Then $A$ is non-singular if and only if $\det(A) \neq 0$. Also, $\det(AB) = [\det(A)] \cdot [\det(B)]$.
4.5. EIGENVALUES AND EIGENVECTORS

In the present section we consider eigenvalues and eigenvectors of linear transformations defined on finite-dimensional vector spaces. Later, in Chapter 7, we will reconsider these concepts in a more general setting. Eigenvalues and eigenvectors play, of course, a crucial role in the study of linear transformations.

Throughout the present section, $X$ denotes an $n$-dimensional vector space over a field $F$. Let $A \in L(X, X)$, and let us assume that there exist sets of vectors $\{e_1, \ldots, e_n\}$ and $\{e'_1, \ldots, e'_n\}$, which are bases for $X$ such that

$$\begin{aligned} e'_1 &= Ae_1 = \lambda_1 e_1, \\ &\;\;\vdots \\ e'_n &= Ae_n = \lambda_n e_n, \end{aligned} \tag{4.5.1}$$

where $\lambda_i \in F$, $i = 1, \ldots, n$. If this is the case, then the matrix $\mathbf{A}'$ of $A$ with respect to the given basis is

$$\mathbf{A}' = \begin{bmatrix} \lambda_1 & & & 0 \\ & \lambda_2 & & \\ & & \ddots & \\ 0 & & & \lambda_n \end{bmatrix}.$$

This motivates the following result.
4.5.2. Theorem. Let $A \in L(X, X)$, and let $\lambda \in F$. Then the set of all $x \in X$ such that

$$Ax = \lambda x \tag{4.5.3}$$

is a linear subspace of $X$. In fact, it is the null space of the linear transformation $(A - \lambda I)$, where $I$ is the identity element of $L(X, X)$.

Proof. Since the zero vector satisfies Eq. (4.5.3) for any $\lambda \in F$, the set is non-void. If the zero vector is the only such vector, then we are done, for $\{0\}$ is a linear subspace of $X$ (of dimension zero). In any case, Eq. (4.5.3) holds if and only if $(A - \lambda I)x = 0$. Thus, $x$ belongs to the null space of $A - \lambda I$, and it follows from Theorem 3.4.19 that the set of all $x \in X$ satisfying Eq. (4.5.3) is a linear subspace of $X$. ∎

Henceforth we let

$$\mathfrak{N}_\lambda = \{x \in X : (A - \lambda I)x = 0\}. \tag{4.5.4}$$

The preceding result gives rise to several important concepts which we introduce in the following definition.
4.5.5. Definition. Let $X$, $A \in L(X, X)$, and $\mathfrak{N}_\lambda$ be defined as in Theorem 4.5.2 and Eq. (4.5.4). A scalar $\lambda$ such that $\mathfrak{N}_\lambda$ contains more than just the zero vector is called an eigenvalue of $A$ (i.e., if there is an $x \neq 0$ such that $Ax = \lambda x$, then $\lambda$ is called an eigenvalue of $A$). When $\lambda$ is an eigenvalue of $A$, then each $x \neq 0$ in $\mathfrak{N}_\lambda$ is called an eigenvector of $A$ corresponding to the eigenvalue $\lambda$. The dimension of the linear subspace $\mathfrak{N}_\lambda$ is called the multiplicity of the eigenvalue $\lambda$. If $\mathfrak{N}_\lambda$ is of dimension one, then $\lambda$ is called a simple eigenvalue. The set of all eigenvalues of $A$ is called the spectrum of $A$.

Some authors call an eigenvalue a proper value or a characteristic value or a latent value or a secular value. Similarly, other names for eigenvector are proper vector or characteristic vector. The space $\mathfrak{N}_\lambda$ is called the $\lambda$th proper subspace of $X$. For matrices we give the following corresponding definition.

4.5.6. Definition. Let $\mathbf{A}$ be an $(n \times n)$ matrix whose elements belong to the field $F$. If there exists $\lambda \in F$ and a non-zero vector $\mathbf{x} \in F^n$ such that

$$\mathbf{A}\mathbf{x} = \lambda\mathbf{x}, \tag{4.5.7}$$

then $\lambda$ is called an eigenvalue of $\mathbf{A}$ and $\mathbf{x}$ is called an eigenvector of $\mathbf{A}$ corresponding to the eigenvalue $\lambda$.

Our next result provides the connection between Definitions 4.5.5 and 4.5.6.

4.5.8. Theorem. Let $A \in L(X, X)$, and let $\mathbf{A}$ be the matrix of $A$ with respect to the basis $\{e_1, \ldots, e_n\}$. Then $\lambda$ is an eigenvalue of $A$ if and only if $\lambda$ is an eigenvalue of $\mathbf{A}$. Also, $x \in X$ is an eigenvector of $A$ corresponding to $\lambda$ if and only if the coordinate representation of $x$ with respect to the basis $\{e_1, \ldots, e_n\}$, $\mathbf{x}$, is an eigenvector of $\mathbf{A}$ corresponding to $\lambda$.

4.5.9. Exercise. Prove Theorem 4.5.8.
Note that if $x$ (or $\mathbf{x}$) is an eigenvector of $A$ (of $\mathbf{A}$), then any non-zero multiple of $x$ (of $\mathbf{x}$) is also an eigenvector of $A$ (of $\mathbf{A}$). In the next result, the proof of which is left as an exercise, we use determinants to characterize eigenvalues. We have:

4.5.10. Theorem. Let $A \in L(X, X)$. Then $\lambda \in F$ is an eigenvalue of $A$ if and only if $\det(A - \lambda I) = 0$.

4.5.11. Exercise. Prove Theorem 4.5.10.
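Theorem 4.5.10 suggests a simple test for a candidate eigenvalue. The following sketch scans a finite range of integers for roots of $\det(\mathbf{A} - \lambda\mathbf{I})$ and then checks $\mathbf{A}\mathbf{x} = \lambda\mathbf{x}$ for hand-picked eigenvectors. The $(2 \times 2)$ matrix, the search range, and the vectors are assumptions chosen so that everything stays in integer arithmetic.

```python
def char_det(A, lam):
    """det(A - lam*I) for a 2 x 2 matrix A."""
    return (A[0][0] - lam) * (A[1][1] - lam) - A[0][1] * A[1][0]

def apply(A, x):
    """Matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[4, 1], [2, 3]]   # illustrative matrix (an assumption)

# Integer candidates only; this finds eigenvalues that happen to be integers.
eigenvalues = [lam for lam in range(-10, 11) if char_det(A, lam) == 0]

# Eigenvectors found by solving (A - lam*I) x = 0 by hand.
eigenvectors = {2: [1, -2], 5: [1, 1]}
```

For this matrix the scan returns the values 2 and 5, and each listed vector satisfies Eq. (4.5.7) for its eigenvalue.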
Let us next examine the equation

$$\det(A - \lambda I) = 0 \tag{4.5.12}$$

in terms of the parameter $\lambda$. We ask: Can we determine which values of $\lambda$, if any, satisfy Eq. (4.5.12)? Let $\{e_1, \ldots, e_n\}$ be an arbitrary basis for $X$ and let $\mathbf{A}$ be the matrix of $A$ with respect to this basis. We then have

$$\det(A - \lambda I) = \det(\mathbf{A} - \lambda\mathbf{I}). \tag{4.5.13}$$

The right-hand side of Eq. (4.5.13) may be rewritten as

$$\det(\mathbf{A} - \lambda\mathbf{I}) = \begin{vmatrix} (a_{11} - \lambda) & a_{12} & \cdots & a_{1n} \\ a_{21} & (a_{22} - \lambda) & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & (a_{nn} - \lambda) \end{vmatrix}. \tag{4.5.14}$$

It is clear from Eq. (4.4.2) that expansion of the determinant (4.5.14) yields a polynomial in $\lambda$ of degree $n$. In order for $\lambda$ to be an eigenvalue of $A$ it must (a) satisfy Eq. (4.5.12), and (b) belong to $F$. Requirement (b) warrants further comment: note that there is no guarantee that there exists $\lambda \in F$ such that Eq. (4.5.12) is satisfied, or equivalently we have no assurance that the $n$th-order polynomial equation

$$\det(\mathbf{A} - \lambda\mathbf{I}) = 0$$

has any roots in $F$. There is, however, a special class of fields for which requirement (b) is automatically satisfied. We have:
4.5.15. Definition. A field $F$ is said to be algebraically closed if for every polynomial $p(\lambda)$ of degree at least one there is at least one $\lambda \in F$ such that

$$p(\lambda) = 0. \tag{4.5.16}$$

Any $\lambda$ which satisfies Eq. (4.5.16) is said to be a root of the polynomial equation (4.5.16). In particular, the field of complex numbers is algebraically closed, whereas the field of real numbers is not (e.g., consider the equation $\lambda^2 + 1 = 0$). There are other fields besides the field of complex numbers which are algebraically closed. However, since we will not develop these, we will restrict ourselves to the field of complex numbers, $C$, whenever the algebraic closure property of Definition 4.5.15 is required. When considering results that are valid for a vector space over an arbitrary field, we will (as before) make use of the symbol $F$ or frequently (as before) make no reference to $F$ at all.

We summarize the above discussion in the following theorem.

4.5.17.
Theorem. Let $A \in L(X, X)$. Then
(i) $\det(A - \lambda I)$ is a polynomial of degree $n$ in the parameter $\lambda$; i.e., there exist scalars $\alpha_0, \alpha_1, \ldots, \alpha_n$, depending only on $A$, such that

$$\det(A - \lambda I) = \alpha_0 + \alpha_1\lambda + \alpha_2\lambda^2 + \cdots + \alpha_n\lambda^n \tag{4.5.18}$$

(note that $\alpha_0 = \det(A)$ and $\alpha_n = (-1)^n$);
(ii) the eigenvalues of $A$ are precisely the roots of the equation $\det(A - \lambda I) = 0$; i.e., they are the roots of

$$\alpha_0 + \alpha_1\lambda + \alpha_2\lambda^2 + \cdots + \alpha_n\lambda^n = 0; \tag{4.5.19}$$

and
(iii) $A$ has, at most, $n$ distinct eigenvalues.

The above result motivates the following definition.

4.5.20.
Definition. Let $A \in L(X, X)$, and let $\mathbf{A}$ be a matrix of $A$. We call

$$\det(A - \lambda I) = \det(\mathbf{A} - \lambda\mathbf{I}) = \alpha_0 + \alpha_1\lambda + \cdots + \alpha_n\lambda^n \tag{4.5.21}$$

the characteristic polynomial of $A$ (or of $\mathbf{A}$) and

$$\det(A - \lambda I) = \det(\mathbf{A} - \lambda\mathbf{I}) = 0 \tag{4.5.22}$$

the characteristic equation of $A$ (or of $\mathbf{A}$).

From the fundamental properties of polynomials over the field of complex numbers there now follows:

4.5.23. Theorem. If $X$ is an $n$-dimensional vector space over $C$ and if $A \in L(X, X)$, then it is possible to write the characteristic polynomial of $A$ in the form

$$\det(A - \lambda I) = (\lambda_1 - \lambda)^{m_1}(\lambda_2 - \lambda)^{m_2} \cdots (\lambda_p - \lambda)^{m_p}, \tag{4.5.24}$$

where $\lambda_i$, $i = 1, \ldots, p$, are the distinct roots of Eq. (4.5.19) (i.e., $\lambda_i \neq \lambda_j$ for $i \neq j$). In Eq. (4.5.24), $m_i$ is called the algebraic multiplicity of the root $\lambda_i$. The $m_i$ are positive integers, and

$$\sum_{i=1}^{p} m_i = n.$$
Note the distinction between the concept of algebraic multiplicity of $\lambda_i$ given in Theorem 4.5.23 and the multiplicity of $\lambda_i$ as given in Definition 4.5.5. In general, these need not be the same, as will be seen later.

We now state and prove one of the most important results of linear algebra, the Cayley-Hamilton theorem.

4.5.25. Theorem. Let $\mathbf{A}$ be an $n \times n$ matrix, and let $p(\lambda) = \det(\mathbf{A} - \lambda\mathbf{I})$ be the characteristic polynomial of $\mathbf{A}$. Then

$$p(\mathbf{A}) = \mathbf{0}.$$

Proof. Let the characteristic polynomial for $\mathbf{A}$ be

$$p(\lambda) = \alpha_0 + \alpha_1\lambda + \cdots + \alpha_n\lambda^n.$$

Now let $\mathbf{B}(\lambda)$ be the classical adjoint of $(\mathbf{A} - \lambda\mathbf{I})$. Since the elements $b_{ij}(\lambda)$ of $\mathbf{B}(\lambda)$ are cofactors of the matrix $\mathbf{A} - \lambda\mathbf{I}$, they are polynomials in $\lambda$ of degree not more than $n - 1$. Thus,

$$b_{ij}(\lambda) = \beta_{ij0} + \beta_{ij1}\lambda + \cdots$$

Letting $\mathbf{B}_k$ [...]

[...] by Eq. (4.6.6). ∎

In addition to the diagonal form and the block diagonal form, there are many other useful forms for matrices to represent linear transformations on finite-dimensional vector spaces. One of these canonical forms involves triangular matrices, which we consider in the last result of the present section. We say that an $(n \times n)$ matrix is a triangular matrix if it either has the form
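$$\begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22} & a_{23} & \cdots & a_{2n} \\ 0 & 0 & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn} \end{bmatrix} \tag{4.6.21}$$

or the form

$$\begin{bmatrix} a_{11} & 0 & 0 & \cdots & 0 \\ a_{21} & a_{22} & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix}. \tag{4.6.22}$$

In case of Eq. (4.6.21) we speak of an upper triangular matrix, whereas in case of Eq. (4.6.22) we say the matrix is in the lower triangular form.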
4.6.23. Theorem. Let $X$ be an $n$-dimensional vector space over $C$, and let $A \in L(X, X)$. Then there exists a basis for $X$ such that $A$ is represented by an upper triangular matrix.

Proof. We will show that if $\mathbf{A}$ is a matrix of $A$, then $\mathbf{A}$ is similar to an upper triangular matrix $\mathbf{A}'$. Our proof is by induction on $n$.

If $n = 1$, then the assertion is clearly true. Now assume that for $n = k$, and $\mathbf{C}$ any $k \times k$ matrix, there exists a non-singular matrix $\mathbf{Q}$ such that $\mathbf{C}' = \mathbf{Q}^{-1}\mathbf{C}\mathbf{Q}$ is an upper triangular matrix. We now must show the validity of the assertion for $n = k + 1$. Let $X$ be a $(k + 1)$-dimensional vector space over $C$. Let $\lambda_1$ be an eigenvalue of $A$, and let $f_1$ be a corresponding eigenvector. Let $\{f_2, \ldots, f_{k+1}\}$ be any set of vectors in $X$ such that $\{f_1, \ldots, f_{k+1}\}$ is a basis for $X$. Let $\mathbf{B}$ be the matrix of $A$ with respect to the basis $\{f_1, \ldots, f_{k+1}\}$. Since $Af_1 = \lambda_1 f_1$, $\mathbf{B}$ must be of the form

$$\mathbf{B} = \begin{bmatrix} \lambda_1 & b_{12} & \cdots & b_{1,k+1} \\ 0 & b_{22} & \cdots & b_{2,k+1} \\ \vdots & \vdots & & \vdots \\ 0 & b_{k+1,2} & \cdots & b_{k+1,k+1} \end{bmatrix}.$$

Now let $\mathbf{C}$ be the $k \times k$ matrix

$$\mathbf{C} = \begin{bmatrix} b_{22} & \cdots & b_{2,k+1} \\ \vdots & & \vdots \\ b_{k+1,2} & \cdots & b_{k+1,k+1} \end{bmatrix}.$$

By our induction hypothesis, there exists a non-singular matrix $\mathbf{Q}$ such that $\mathbf{C}' = \mathbf{Q}^{-1}\mathbf{C}\mathbf{Q}$, where $\mathbf{C}'$ is an upper triangular matrix. Now let

$$\mathbf{P} = \begin{bmatrix} 1 & \mathbf{0} \\ \mathbf{0} & \mathbf{Q} \end{bmatrix}.$$

By direct computation we have

$$\mathbf{P}^{-1} = \begin{bmatrix} 1 & \mathbf{0} \\ \mathbf{0} & \mathbf{Q}^{-1} \end{bmatrix}$$

and

$$\mathbf{P}^{-1}\mathbf{B}\mathbf{P} = \begin{bmatrix} \lambda_1 & * \;\cdots\; * \\ \mathbf{0} & \mathbf{C}' \end{bmatrix},$$

where the $*$'s denote elements which may be non-zero. Letting $\mathbf{A}' = \mathbf{P}^{-1}\mathbf{B}\mathbf{P}$, it follows that $\mathbf{A}'$ is upper triangular and is similar to $\mathbf{B}$. Hence, any $(k + 1) \times (k + 1)$ matrix which represents $A \in L(X, X)$ is similar to the upper triangular matrix $\mathbf{A}'$, by Theorem 4.3.19. This completes the proof of the theorem. ∎

Note that if $\mathbf{A}$ is in the triangular form of either Eq. (4.6.21) or (4.6.22), then

$$\det(\mathbf{A} - \lambda\mathbf{I}) = (a_{11} - \lambda)(a_{22} - \lambda) \cdots (a_{nn} - \lambda).$$

In this case the diagonal elements of $\mathbf{A}$ are the eigenvalues of $\mathbf{A}$.
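The closing remark on triangular matrices is easy to check on an example. The sketch below compares $\det(\mathbf{A} - \lambda\mathbf{I})$, computed by cofactor expansion, with the product of the shifted diagonal entries for a range of integer values of $\lambda$. The upper triangular matrix and the helper names are illustrative assumptions.

```python
def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    total = 0
    for j in range(len(A)):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

def shifted(A, lam):
    """The matrix A - lam*I."""
    return [[a - (lam if i == j else 0) for j, a in enumerate(row)]
            for i, row in enumerate(A)]

def diagonal_product(A, lam):
    """(a_11 - lam)(a_22 - lam)...(a_nn - lam)."""
    product = 1
    for i in range(len(A)):
        product *= A[i][i] - lam
    return product

A = [[1, 4, 5], [0, 2, 6], [0, 0, 3]]   # illustrative upper triangular matrix
roots = [lam for lam in range(-3, 7) if det(shifted(A, lam)) == 0]
```

The integer roots found are exactly the diagonal entries 1, 2, and 3.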
4.7. MINIMAL POLYNOMIALS, NILPOTENT OPERATORS, AND THE JORDAN CANONICAL FORM

In the present section we develop the Jordan canonical form of a matrix. To do so, we need to introduce the concepts of minimal polynomial and nilpotent operator and to study some of the properties of such polynomials and operators. Unless otherwise specified, $X$ denotes an $n$-dimensional vector space over a field $F$ throughout the present section.

A. Minimal Polynomials
For purposes of motivation, consider the matrix

$$\mathbf{A} = [\,\ldots\,].$$

The characteristic polynomial of $\mathbf{A}$ is

$$p(\lambda) = (1 - \lambda)^2(2 - \lambda),$$

and we know from the Cayley-Hamilton theorem that

$$p(\mathbf{A}) = \mathbf{0}. \tag{4.7.1}$$

Now let us consider the polynomial

$$m(\lambda) = (1 - \lambda)(2 - \lambda) = 2 - 3\lambda + \lambda^2.$$

Then

$$m(\mathbf{A}) = 2\mathbf{I} - 3\mathbf{A} + \mathbf{A}^2 = \mathbf{0}. \tag{4.7.2}$$

Thus, matrix $\mathbf{A}$ satisfies Eq. (4.7.2), which is of lower degree than Eq. (4.7.1), the characteristic equation of $\mathbf{A}$.

Before stating our first result, we recall that an $n$th-order polynomial in $\lambda$ is said to be monic if the coefficient of $\lambda^n$ is unity (see Definition 2.3.4).

4.7.3. Theorem. Let $\mathbf{A}$ be an $(n \times n)$ matrix. Then there exists a unique polynomial $m(\lambda)$ such that
(i) $m(\mathbf{A}) = \mathbf{0}$;
(ii) $m(\lambda)$ is monic; and,
(iii) if $m'(\lambda)$ is any other polynomial such that $m'(\mathbf{A}) = \mathbf{0}$, then the degree of $m(\lambda)$ is less than or equal to the degree of $m'(\lambda)$ (i.e., $m(\lambda)$ is of the lowest degree such that $m(\mathbf{A}) = \mathbf{0}$).

Proof. We know that a polynomial, $p(\lambda)$, exists such that $p(\mathbf{A}) = \mathbf{0}$, namely, the characteristic polynomial. Furthermore, the degree of $p(\lambda)$ is $n$. Thus, there exists a polynomial, say $f(\lambda)$, of degree $m \leq n$ such that $f(\mathbf{A}) = \mathbf{0}$. Let us choose $m$ to be the lowest degree for which $f(\mathbf{A}) = \mathbf{0}$. Since $f(\lambda)$ is of degree $m$, we may divide $f(\lambda)$ by the coefficient of $\lambda^m$, thus obtaining a monic polynomial, $m(\lambda)$, such that $m(\mathbf{A}) = \mathbf{0}$. To show that $m(\lambda)$ is unique, suppose there is another monic polynomial $m'(\lambda)$ of degree $m$ such that $m'(\mathbf{A}) = \mathbf{0}$. Then $m(\lambda) - m'(\lambda)$ is a polynomial of degree less than $m$. Furthermore, $m(\mathbf{A}) - m'(\mathbf{A}) = \mathbf{0}$, which contradicts our assumption that $m(\lambda)$ is the polynomial of lowest degree such that $m(\mathbf{A}) = \mathbf{0}$. This completes the proof. ∎
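The relation between Eqs. (4.7.1) and (4.7.2) can be checked on a concrete matrix. The matrix `A` below is an assumption: it was chosen so that its characteristic polynomial is $(1 - \lambda)^2(2 - \lambda)$ and it satisfies $2\mathbf{I} - 3\mathbf{A} + \mathbf{A}^2 = \mathbf{0}$, and it is not claimed to be the matrix printed in the text. Coefficient lists are stored lowest degree first.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def poly_of_matrix(coeffs, M):
    """Evaluate c0*I + c1*M + ... by Horner's rule (coeffs = [c0, c1, ...])."""
    n = len(M)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = matmul(result, M)
        for i in range(n):
            result[i][i] += c
    return result

def is_zero(M):
    return all(x == 0 for row in M for x in row)

A = [[1, 3, -2],
     [0, 4, -2],
     [0, 3, -1]]            # illustrative matrix (an assumption)

p = [2, -5, 4, -1]          # p(x) = (1 - x)^2 (2 - x) = 2 - 5x + 4x^2 - x^3
m = [2, -3, 1]              # m(x) = (1 - x)(2 - x) = 2 - 3x + x^2

p_of_A = poly_of_matrix(p, A)
m_of_A = poly_of_matrix(m, A)

# No monic polynomial x - c of degree one annihilates A, because A is not
# a scalar multiple of the identity; so degree two is the lowest possible.
is_scalar = all(A[i][j] == (A[0][0] if i == j else 0)
                for i in range(3) for j in range(3))
```

Both `p_of_A` and `m_of_A` are zero matrices, while `is_scalar` is false, so for this matrix the monic annihilating polynomial of lowest degree has degree two.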
The preceding result gives rise to the notion of minimal polynomial.

4.7.4. Definition. The polynomial $m(\lambda)$ defined in Theorem 4.7.3 is called the minimal polynomial of $\mathbf{A}$.

Other names for minimal polynomial are minimum polynomial and reduced characteristic function. In the following we will develop an explicit form for the minimal polynomial of $\mathbf{A}$, which makes it possible to determine it systematically, rather than by trial and error.

In the remainder of this section we let $\mathbf{A}$ denote an $(n \times n)$ matrix, we let $p(\lambda)$ denote the characteristic polynomial of $\mathbf{A}$, and we let $m(\lambda)$ denote the minimal polynomial of $\mathbf{A}$.

4.7.5. Theorem. Let $f(\lambda)$ be any polynomial such that $f(\mathbf{A}) = \mathbf{0}$. Then $m(\lambda)$ divides $f(\lambda)$.

Proof. Let $\nu$ denote the degree of $m(\lambda)$. Then there exist polynomials $q(\lambda)$ and $r(\lambda)$ such that (see Theorem 2.3.9) $f(\lambda)$ [...]
! ... , Ne"..} is in WI' We show that this set is linearly independent by contradiction. Assume there are scalars (XI" • ,(x , and PI' ... , PI> not all ez ro, such that
Proof
(Xle l
Since e{ l , • • be non-ez ro. eH nce,
+ ... +
(X,e,
+
PINe,,+1
+ ... +
p,Ne".,
= O.
,e,} is a linearly independent set, at least one of the PI must Rearranging the last equation we have
Thus,
+ ... +
fl,e,• ..> = 0, W,. If fl.e,,+! + ...
N' ( fl. e,,+.
and (fl.e,,+. + ... + fl,e".,) E + fl,e" • 1= = 0, it can be written as a linear combination of e., ... , e", which contradicts the fact that e{ ., . .. ,e".,} is a linearly independent set. If fl.e,,+. + ... + fl,e,•• , = 0, we contradict the fact that e { ., ... , e".,} is a linearly independent set. eH nce, weconcludethatlZ, = Ofori = I, ... , r andfl, = Ofori = I, ... , t. This completes the proof of the theorem. _ We are now in a position to consider the general representation of a nilpotent operator on a finite-dimensional vector space. .4 7.33. let N
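Theorem. Let $V$ be an $m$-dimensional vector space over $C$, and let $N \in L(V, V)$ be nilpotent of index $\nu$. Let $W_1 = \{x : Nx = 0\}, \ldots, W_\nu = \{x : N^\nu x = 0\}$, and let $l_i = \dim W_i$, $i = 1, \ldots, \nu$. Then there exists a basis for $V$ such that the matrix $\mathbf{N}$ of $N$ is of block diagonal form,

$$\mathbf{N} = \begin{bmatrix} \mathbf{N}_1 & & \mathbf{0} \\ & \ddots & \\ \mathbf{0} & & \mathbf{N}_r \end{bmatrix}, \tag{4.7.34}$$

where

$$\mathbf{N}_i = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & 0 & \cdots & 0 & 0 \end{bmatrix}, \tag{4.7.35}$$

$i = 1, \ldots, r$, where $r = l_1$, $\mathbf{N}_i$ is a $(k_i \times k_i)$ matrix, $1 \leq k_i \leq \nu$, and $k_i$ is determined in the following way: there are

$l_\nu - l_{\nu-1}$ $(\nu \times \nu)$ matrices,
$2l_i - l_{i+1} - l_{i-1}$ $(i \times i)$ matrices, $i = 2, \ldots, \nu - 1$, and
$2l_1 - l_2$ $(1 \times 1)$ matrices.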
The basis for $V$ consists of strings of vectors of the form

$$N^{k_1 - 1}x_1, \ldots, Nx_1, x_1,\; N^{k_2 - 1}x_2, \ldots, Nx_2, x_2,\; \ldots,\; N^{k_r - 1}x_r, \ldots, Nx_r, x_r.$$

Proof. By Lemma 4.7.32, $W_1 \subset W_2 \subset \cdots \subset W_\nu$. Let $\{e_1, \ldots, e_m\}$ be a basis for $V$ such that $\{e_1, \ldots, e_{l_i}\}$ is a basis for $W_i$. We see that $W_\nu = V$. Since $N$ is nilpotent of index $\nu$, $W_{\nu-1} \neq W_\nu$ and $l_{\nu-1} < l_\nu$. We now proceed to select a new basis for $V$ which yields the desired result. We find it convenient to use double subscripting of vectors. Let $f_{1,\nu} = e_{l_{\nu-1}+1}, \ldots, f_{(l_\nu - l_{\nu-1}),\nu} = e_{l_\nu}$, and let $f_{1,\nu-1} = Nf_{1,\nu}, \ldots, f_{(l_\nu - l_{\nu-1}),\nu-1} = Nf_{(l_\nu - l_{\nu-1}),\nu}$. By Lemma 4.7.32, it follows that $\{e_1, \ldots, e_{l_{\nu-2}}, f_{1,\nu-1}, \ldots, f_{(l_\nu - l_{\nu-1}),\nu-1}\}$ is a linearly independent set in $W_{\nu-1}$, which may or may not be a basis for $W_{\nu-1}$. If it is not, we adjoin additional elements from $W_{\nu-1}$, denoted by [...], so as to form a basis for $W_{\nu-1}$. Now let $f_{1,\nu-2} = Nf_{1,\nu-1}$, $f_{2,\nu-2} = Nf_{2,\nu-1}, \ldots$ [...]

[Figure C.]

[...] we see that the first column in Figure C reading [...]
We see that each column of Figure C determines a string consisting of $k_i$ entries, where $k_i = \nu$ for $i = 1, \ldots, (l_\nu - l_{\nu-1})$. Note that $(l_\nu - l_{\nu-1}) > 0$, so there is at least one string. In general, the number of strings with $j$ entries is $(l_j - l_{j-1}) - (l_{j+1} - l_j) = 2l_j - l_{j+1} - l_{j-1}$ for $j = 2, \ldots, \nu - 1$. Also, there are $l_1 - (l_2 - l_1) = 2l_1 - l_2$ vectors, or strings with one entry.

Finally, to show that the number of blocks $\mathbf{N}_i$ in $\mathbf{N}$ is $l_1$, we see that there are a total of

$$(l_\nu - l_{\nu-1}) + (2l_{\nu-1} - l_\nu - l_{\nu-2}) + \cdots + (2l_2 - l_3 - l_1) + (2l_1 - l_2) = l_1$$

columns in the table of Figure C. This completes the proof of the theorem. ∎
The reader should study Figure C to obtain an appreciation of the structure of the basis for the space V.
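The block counts in Theorem 4.7.33 depend only on the dimensions $l_1, \ldots, l_\nu$. The following sketch computes them with the conventions $l_0 = 0$ and $l_{\nu+1} = l_\nu$, which fold the three cases of the theorem into the single expression $2l_j - l_{j+1} - l_{j-1}$. The function name and the sample dimensions are assumptions.

```python
def nilpotent_block_counts(l):
    """Given l = [l_1, ..., l_nu] with l_i = dim W_i, return {block size: count}.

    Uses l_0 = 0 and l_{nu+1} = l_nu, so that the formula
    2*l_j - l_{j+1} - l_{j-1} also covers j = 1 and j = nu.
    """
    nu = len(l)
    padded = [0] + list(l) + [l[-1]]
    return {j: 2 * padded[j] - padded[j + 1] - padded[j - 1]
            for j in range(1, nu + 1)}

counts = nilpotent_block_counts([4, 6, 7])            # sample dimensions
total_blocks = sum(counts.values())                   # should equal l_1
total_size = sum(size * n for size, n in counts.items())  # should equal l_nu
```

For the sample dimensions $l_1 = 4$, $l_2 = 6$, $l_3 = 7$ this gives one $(3 \times 3)$ block, one $(2 \times 2)$ block, and two $(1 \times 1)$ blocks, so four blocks in all, filling a $(7 \times 7)$ matrix.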
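C. The Jordan Canonical Form

We are finally now in a position to state and prove the result which establishes the Jordan canonical form of matrices.

4.7.37. Theorem. Let $X$ be an $n$-dimensional vector space over $C$, and let $A \in L(X, X)$. Let the characteristic polynomial of $A$ be

$$p(\lambda) = (\lambda_1 - \lambda)^{m_1} \cdots (\lambda_p - \lambda)^{m_p},$$

and let the minimal polynomial of $A$ be

$$m(\lambda) = (\lambda - \lambda_1)^{\nu_1} \cdots (\lambda - \lambda_p)^{\nu_p},$$

where $\lambda_1, \ldots, \lambda_p$ are the distinct eigenvalues of $A$. Let

$$X_i = \{x \in X : (A - \lambda_i I)^{\nu_i}x = 0\}.$$

Then
(i) $X_1, \ldots, X_p$ are invariant subspaces of $X$ under $A$;
(ii) $X = X_1 \oplus \cdots \oplus X_p$;
(iii) $\dim X_i = m_i$, $i = 1, \ldots, p$; and
(iv) there exists a basis for $X$ such that the matrix $\mathbf{A}$ of $A$ with respect to this basis is of the form

$$\mathbf{A} = \begin{bmatrix} \mathbf{A}_1 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{A}_2 & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{A}_p \end{bmatrix}, \tag{4.7.38}$$

where $\mathbf{A}_i$ is an $(m_i \times m_i)$ matrix of the form

$$\mathbf{A}_i = \lambda_i\mathbf{I} + \mathbf{N}_i \tag{4.7.39}$$

and where $\mathbf{N}_i$ is the matrix of the nilpotent operator $(A_i - \lambda_i I)$ of index $\nu_i$ on $X_i$ given by Eq. (4.7.34) and Eq. (4.7.35).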
Proof. Parts (i)-(iii) are restatements of the primary decomposition theorem (Theorem 4.7.20). From this theorem we also know that $(\lambda - \lambda_i)^{\nu_i}$ is the minimal polynomial of $A_i$, the restriction of $A$ to $X_i$. Hence, if we let $N_i = A_i - \lambda_i I$, then $N_i$ is a nilpotent operator of index $\nu_i$ on $X_i$. We are thus able to represent $N_i$ as shown in Eqs. (4.7.34) and (4.7.35). This completes the proof of the theorem. ∎
A little extra work shows that the representation of $A \in L(X, X)$ by a matrix $\mathbf{A}$ of the form given in Eqs. (4.7.38) and (4.7.39) is unique, except for the order in which the diagonal blocks $\mathbf{A}_1, \ldots, \mathbf{A}_p$ appear in $\mathbf{A}$.

4.7.40. Definition. The matrix $\mathbf{A}$ of $A \in L(X, X)$ given by Eqs. (4.7.38) and (4.7.39) is called the Jordan canonical form of $A$.

We conclude the present section with an example.
4.7.41. Example. Let $X = R^7$, and let $\{u_1, \ldots, u_7\}$ be the natural basis for $X$ (see Example 4.1.15). Let $A \in L(X, X)$ be represented by the matrix

$$\mathbf{A} = \begin{bmatrix} -1 & 0 & -1 & 1 & 1 & 3 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 2 & 1 & 2 & -1 & -1 & -6 & 0 \\ -2 & 0 & -1 & 2 & 1 & 3 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ -1 & -1 & 0 & 1 & 2 & 4 & 1 \end{bmatrix}$$

with respect to $\{u_1, \ldots, u_7\}$. Let us find the matrix $\mathbf{A}'$ which represents $A$ in the Jordan canonical form.

We first find that the characteristic polynomial of $A$ is

$$p(\lambda) = (1 - \lambda)^7.$$

This implies that $\lambda_1 = 1$ is the only distinct eigenvalue of $A$. Its algebraic multiplicity is $m_1 = 7$. In order to find the minimal polynomial of $A$, let

$$N = A - \lambda_1 I,$$

where $I$ is the identity operator in $L(X, X)$. The representation for $N$ with respect to the natural basis in $X$ is

$$\mathbf{N} = \mathbf{A} - \mathbf{I} = \begin{bmatrix} -2 & 0 & -1 & 1 & 1 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 2 & 1 & 1 & -1 & -1 & -6 & 0 \\ -2 & 0 & -1 & 1 & 1 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ -1 & -1 & 0 & 1 & 2 & 4 & 0 \end{bmatrix}.$$
We assume the minimal polynomial is of the form $m(\lambda) = (\lambda - 1)^{\nu_1}$ and proceed to find the smallest $\nu_1$ such that $(\mathbf{A} - \mathbf{I})^{\nu_1} = \mathbf{N}^{\nu_1} = \mathbf{0}$. We first obtain

$$\mathbf{N}^2 = \begin{bmatrix} 0 & -1 & 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & -3 & 0 \\ 0 & -1 & 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$

Next, we get that

$$\mathbf{N}^3 = \mathbf{0},$$

and so $\nu_1 = 3$. Hence, $N$ is a nilpotent operator of index 3. We see that $X = X_1$. We will now apply Theorem 4.7.33 to obtain a representation for $N$ in this space.

Using the notation of Theorem 4.7.33, we let $W_1 = \{x : Nx = 0\}$, $W_2 = \{x : N^2x = 0\}$, and $W_3 = \{x : N^3x = 0\}$. We see that $\mathbf{N}$ has three linearly independent rows. This means that the rank of $\mathbf{N}$ is 3, and so $\dim(W_1) = l_1 = 4$. Similarly, the rank of $\mathbf{N}^2$ is 1, and so $\dim(W_2) = l_2 = 6$. Clearly, $\dim(W_3) = l_3 = 7$. We can conclude that $N$ will have a representation $\mathbf{N}'$ of the form in Eq. (4.7.34) with $r = 4$. Each of the $\mathbf{N}'_i$ will be of the form in Eq. (4.7.35). There will be $l_3 - l_2 = 1$ $(3 \times 3)$ matrix. There will be $2l_2 - l_3 - l_1 = 1$ $(2 \times 2)$ matrix, and $2l_1 - l_2 = 2$ $(1 \times 1)$ matrices. Hence, there is a basis for $X$ such that $N$ may be represented by the matrix

$$\mathbf{N}' = \left[\begin{array}{ccc|cc|c|c} 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right].$$
The corresponding basis will consist of strings of vectors of the form

$$N^2x_1,\ Nx_1,\ x_1,\ Nx_2,\ x_2,\ x_3,\ x_4.$$

We will represent the vectors $x_1$, $x_2$, $x_3$, and $x_4$ by $\mathbf{x}_1$, $\mathbf{x}_2$, $\mathbf{x}_3$, and $\mathbf{x}_4$, their coordinate representations, respectively, with respect to the natural basis $\{u_1, \dots, u_7\}$ in $X$. We begin by choosing $x_1 \in W_3$ such that $x_1 \notin W_2$; i.e., we find an $x_1$ such that $N^3x_1 = 0$ but $N^2x_1 \neq 0$. The vector $\mathbf{x}_1^T = (0, 1, 0, 0, 0, 0, 0)$ will do. We see that $(\mathbf{N}\mathbf{x}_1)^T = (0, 0, 1, 0, 0, 0, -1)$ and $(\mathbf{N}^2\mathbf{x}_1)^T = (-1, 0, 1, -1, 0, 0, 0)$. Hence, $Nx_1 \in W_2$ but $Nx_1 \notin W_1$, and $N^2x_1 \in W_1$. We see there will be only one string of length three, and so we next choose $x_2 \in W_2$ such that $x_2 \notin W_1$. Also, the pair $\{Nx_1, x_2\}$ must be linearly independent. The vector $\mathbf{x}_2^T = (1, 0, 0, 0, 0, 0, 0)$ will do. Now $(\mathbf{N}\mathbf{x}_2)^T = (-2, 0, 2, -2, 0, 0, -1)$, and $Nx_2 \in W_1$. We complete the basis for $X$ by selecting two more vectors, $x_3, x_4 \in W_1$, such that $\{N^2x_1, Nx_2, x_3, x_4\}$ are linearly independent. The vectors $\mathbf{x}_3^T = (0, 0, -1, -2, 1, 0, 0)$ and $\mathbf{x}_4^T = (1, 3, 1, 0, 0, 1, 0)$ will suffice. It follows that the matrix

$$\mathbf{P} = [\mathbf{N}^2\mathbf{x}_1,\ \mathbf{N}\mathbf{x}_1,\ \mathbf{x}_1,\ \mathbf{N}\mathbf{x}_2,\ \mathbf{x}_2,\ \mathbf{x}_3,\ \mathbf{x}_4]$$

is the matrix of the new basis with respect to the natural basis (see Exercise 4.3.9). The reader can readily show that $\mathbf{N}' = \mathbf{P}^{-1}\mathbf{N}\mathbf{P}$, where

$$
\mathbf{P} = \begin{bmatrix}
-1 & 0 & 0 & -2 & 1 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 & 0 & 3 \\
1 & 1 & 0 & 2 & 0 & -1 & 1 \\
-1 & 0 & 0 & -2 & 0 & -2 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & -1 & 0 & -1 & 0 & 0 & 0
\end{bmatrix}
$$

and

$$
\mathbf{P}^{-1} = \begin{bmatrix}
0 & 0 & 2 & 1 & 4 & -2 & 2 \\
0 & 0 & 1 & 1 & 3 & -1 & 0 \\
0 & 1 & 0 & 0 & 0 & -3 & 0 \\
0 & 0 & -1 & -1 & -3 & 1 & -1 \\
1 & 0 & 0 & -1 & -2 & -1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix}.
$$
Finally, the Jordan canonical form for $A$ is given by

$$\mathbf{A}' = \mathbf{N}' + \mathbf{I}.$$

(Recall that the matrix representation for $I$ is the same for any basis in $X$.) Thus,

$$
\mathbf{A}' = \left[\begin{array}{ccc|cc|c|c}
1 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 \\ \hline
0 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 \\ \hline
0 & 0 & 0 & 0 & 0 & 1 & 0 \\ \hline
0 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}\right].
$$

Again, the reader can show that $\mathbf{A}' = \mathbf{P}^{-1}\mathbf{A}\mathbf{P}$. In general, it is more convenient as a check to show that $\mathbf{P}\mathbf{A}' = \mathbf{A}\mathbf{P}$. ■
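The check suggested at the end of the example can be carried out mechanically. The following is a minimal sketch in plain Python, assuming the entries of $\mathbf{A}$ and $\mathbf{P}$ as read from Example 4.7.41 (they are typed in by hand here and should be compared with the text); the helper names `matmul` and `identity` are illustrative only.

```python
# Matrices typed in by hand from Example 4.7.41 (an assumption: compare
# the entries with the text before relying on them).
A = [
    [-1, 0, -1, 1, 1, 3, 0],
    [0, 1, 0, 0, 0, 0, 0],
    [2, 1, 2, -1, -1, -6, 0],
    [-2, 0, -1, 2, 1, 3, 0],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [-1, -1, 0, 1, 2, 4, 1],
]
# Columns of P: N^2 x1, N x1, x1, N x2, x2, x3, x4.
P = [
    [-1, 0, 0, -2, 1, 0, 1],
    [0, 0, 1, 0, 0, 0, 3],
    [1, 1, 0, 2, 0, -1, 1],
    [-1, 0, 0, -2, 0, -2, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1],
    [0, -1, 0, -1, 0, 0, 0],
]


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


n = 7
I7 = identity(n)
zero = [[0] * n for _ in range(n)]

# N = A - I and its powers: N should be nilpotent of index 3.
N = [[A[i][j] - I7[i][j] for j in range(n)] for i in range(n)]
N2 = matmul(N, N)
N3 = matmul(N2, N)

# Jordan form A' = N' + I with blocks of size 3, 2, 1, 1.
A_prime = identity(n)
A_prime[0][1] = 1
A_prime[1][2] = 1
A_prime[3][4] = 1

print("N^2 != 0:", N2 != zero)
print("N^3 == 0:", N3 == zero)
print("P A' == A P:", matmul(P, A_prime) == matmul(A, P))
```

Working with integers keeps the comparison exact, which is why the check uses $\mathbf{P}\mathbf{A}' = \mathbf{A}\mathbf{P}$ rather than forming $\mathbf{P}^{-1}$.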
.4 7.42. Exercise. eL t X = R' , and let u{ t , • • , u,} denote the natural X ) be represented by the matrix basis for .X Let A E L ( X ,
A=
05 -1 I 1 0 3 -I -1 1 0 0 4 0 0 0 1 1 -1 0 0 0 4 -1 0 0 0 0 1 3 0 0 0 0 1 3
Show that the Jordan canonical form of A is given by 4
1
04
A' =
0iO I
0
0
1:000 o 0 4:0 0 0 O-O - O - r- i- 4 l0
o
0_
I
I
I
1
1_ _
0 0:0 4 : 0 0 0 0 0 i2 ~
and find a basis for X for which A' represents A.
4.8. BILINEAR FUNCTIONALS AND CONGRUENCE
In the present section we consider the representation and some of the properties of bilinear functionals on real finite-dimensional vector spaces. (We will consider bilinear functionals defined on complex vector spaces in Chapter 6.)
Throughout this section $X$ is assumed to be an $n$-dimensional vector space over the field of real numbers. We recall that if $f$ is a bilinear functional on a real vector space $X$, then $f: X \times X \to R$ and f(<x
« X I ) = 0 for all « E R. Therefore, mz is a linear Furthermore, mz 1= = X because VI ¢ mz. Hence, dim mz subspace of .X := ;;; n - 1. Now let dim mz = q < n - 1. Since / is a bilinear functional on mz, it follows by the induction hypothesis that there is a basis for mz consisting of a set of q vectors v{ 2 , • • , vf+tl such that f(v1, vJ) = 0 for i 1= = j, 2 < i, j < q + 1. Also, f(v l , vJ) = 0 for j = 2, ... ,q + I, by definition of mz.
+
uF rthermore, f(v VI) = f(v l , vJ eH nce, f(v VI) = f(v., v,} = 0 for i = 2, " ... ,q + l . It follows that f(v"vJ } = 0 for" i:# j and I~i,j
y~e:.)
A
x
o'
+ ... +
pY e p)
(r~+I~+I'
... ,y~e~)]
0,
ep}. On the other hand,
+ ... +
f[-(~+,~+,
=
E
= f(y,e,
f(x o, x o)
by choice of{ e
"pep =
+ ... +
;Y l+ e;+1
(- 1 )Z[ -
(,,~+
1)2 -
(,,~+z)Z
-
y~e:.),
-
.• •
-
(y~)Z]
..., $u_2\}$ be the natural basis for $R^2$. Then the natural coordinate representation of $x$ and $y$ is

$$\mathbf{x} = \begin{bmatrix} \xi_1 \\ \xi_2 \end{bmatrix} \quad \text{and} \quad \mathbf{y} = \begin{bmatrix} \eta_1 \\ \eta_2 \end{bmatrix}, \tag{4.9.1}$$

respectively (see Example 4.1.15). The representation of these vectors in the plane is shown in Figure D.

4.9.2. Figure D. Length of vectors and angle between vectors.

In this figure, $|x|$, $|y|$, and $|x - y|$ denote the lengths of vectors $x$, $y$, and $(x - y)$, respectively, and $\theta$ represents the angle between $x$ and $y$. The length of vector $x$ is equal to $(\xi_1^2 + \xi_2^2)^{1/2}$, and the length of vector $(x - y)$ is equal to $\{(\xi_1 - \eta_1)^2 + (\xi_2 - \eta_2)^2\}^{1/2}$. By convention,
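we say in this case that "the distance from $x$ to $y$" is equal to $\{(\xi_1 - \eta_1)^2 + (\xi_2 - \eta_2)^2\}^{1/2}$, that "the distance from the origin $0$ (the null vector) to $x$" is equal to $(\xi_1^2 + \xi_2^2)^{1/2}$, and the like. Using the notation of the present chapter, we have

$$|x| = \sqrt{\mathbf{x}^T\mathbf{x}} \tag{4.9.3}$$

and

$$|x - y| = \sqrt{(\mathbf{x} - \mathbf{y})^T(\mathbf{x} - \mathbf{y})} = \sqrt{(\mathbf{y} - \mathbf{x})^T(\mathbf{y} - \mathbf{x})} = |y - x|. \tag{4.9.4}$$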
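The angle $\theta$ between vectors $x$ and $y$ can easily be characterized by its cosine, namely,

$$\cos\theta = \frac{\xi_1\eta_1 + \xi_2\eta_2}{\sqrt{\xi_1^2 + \xi_2^2}\,\sqrt{\eta_1^2 + \eta_2^2}}. \tag{4.9.5}$$

Utilizing the notation of the present chapter, we have

$$\cos\theta = \frac{\mathbf{x}^T\mathbf{y}}{\sqrt{\mathbf{x}^T\mathbf{x}}\,\sqrt{\mathbf{y}^T\mathbf{y}}}. \tag{4.9.6}$$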
It turns out that the real-valued function $\mathbf{x}^T\mathbf{y}$, which we used in both Eqs. (4.9.3) and (4.9.6) to characterize the length of any vector $x$ and the angle between any vectors $x$ and $y$, is of fundamental importance. For this reason we denote it by a special symbol; i.e., we write

$$(x, y) \triangleq \mathbf{x}^T\mathbf{y}. \tag{4.9.7}$$

Now if we let $x = y$ in Eq. (4.9.7), then in view of Eq. (4.9.3) we have

$$|x| = \sqrt{(x, x)}. \tag{4.9.8}$$

By inspection of Eq. (4.9.3) we note that

$$(x, x) > 0 \ \text{ for all } x \neq 0 \tag{4.9.9}$$

and

$$(x, x) = 0 \ \text{ for } x = 0. \tag{4.9.10}$$
Also, from Eq. (4.9.7) we have

$$(x, y) = (y, x) \tag{4.9.11}$$

for all $x$ and $y$. Moreover, for any vectors $x$, $y$, and $z$ and for any real scalars $\alpha$ and $\beta$ we have, in view of Eq. (4.9.7), the relations

$$(x + y, z) = (x, z) + (y, z), \tag{4.9.12}$$

$$(x, y + z) = (x, y) + (x, z), \tag{4.9.13}$$

$$(\alpha x, y) = \alpha(x, y), \tag{4.9.14}$$

and

$$(x, \alpha y) = \alpha(x, y). \tag{4.9.15}$$
(i) $(x, x) > 0$ for all $x \neq 0$ and $(x, x) = 0$ if $x = 0$;
(ii) $(x, y) = (y, x)$ for all $x, y \in X$;
(iii) $(\alpha x + \beta y, z) = \alpha(x, z) + \beta(y, z)$ for all $x, y, z \in X$ and all $\alpha, \beta \in R$; and
(iv) $(x, \alpha y + \beta z) = \alpha(x, y) + \beta(x, z)$ for all $x, y, z \in X$ and all $\alpha, \beta \in R$.

We note that Eqs. (4.9.9)-(4.9.15) are clearly in agreement with these axioms.

4.9.17. Theorem. The inner product $(x, y) = 0$ for all $x \in X$ if and only if $y = 0$.
Proof. If $y = 0$, then $y = 0 \cdot x$ and $(x, 0) = (x, 0 \cdot x) = 0 \cdot (x, x) = 0$ for all $x \in X$. On the other hand, let $(x, y) = 0$ for all $x \in X$. Then, in particular, it must be true that $(x, y) = 0$ if $x = y$. We thus have $(y, y) = 0$, which implies that $y = 0$. ■

The reader can prove the next results readily.

4.9.18. Corollary. Let $A \in L(X, X)$. Then $(x, Ay) = 0$ for all $x, y \in X$ if and only if $A = 0$.

4.9.19. Corollary. Let $A, B \in L(X, X)$. If $(x, Ay) = (x, By)$ for all $x, y \in X$, then $A = B$.

4.9.20. Corollary. Let $\mathbf{A}$ be a real $(n \times n)$ matrix. If $\mathbf{x}^T\mathbf{A}\mathbf{y} = 0$ for all $\mathbf{x}, \mathbf{y} \in R^n$, then $\mathbf{A} = \mathbf{0}$.

4.9.21. Exercise. Prove Corollaries 4.9.18-4.9.20.

Of crucial importance is the notion of norm. We have:

4.9.22. Definition. For each $x \in X$, let

$$|x| = (x, x)^{1/2}.$$

We call $|x|$ the norm of $x$.

Let us consider a specific case.

4.9.23. Example. Let $X = R^n$ and let $x, y \in X$, where $x = (\xi_1, \dots, \xi_n)$ and $y = (\eta_1, \dots, \eta_n)$. From Example 3.6.23 it follows that

$$(x, y) = \sum_{i=1}^{n} \xi_i\eta_i \tag{4.9.24}$$

is an inner product on $X$. The coordinate representation of $x$ and $y$ with respect to the natural basis in $R^n$ is given by

$$\mathbf{x} = \begin{bmatrix} \xi_1 \\ \vdots \\ \xi_n \end{bmatrix} \quad \text{and} \quad \mathbf{y} = \begin{bmatrix} \eta_1 \\ \vdots \\ \eta_n \end{bmatrix},$$

respectively (see Example 4.1.15). We thus have

$$(x, y) = \mathbf{x}^T\mathbf{y}, \tag{4.9.25}$$

and

$$|x| = \left(\sum_{i=1}^{n} \xi_i^2\right)^{1/2} = (\mathbf{x}^T\mathbf{x})^{1/2}. \tag{4.9.26}$$

■

The above example gives rise to:
4.9.27. Definition. The vector space $R^n$ with the inner product defined in Eq. (4.9.24) is denoted by $E^n$. The norm of $x$ given by Eq. (4.9.26) is called the Euclidean norm on $R^n$.

Relation (4.9.29) of the next result is called the Schwarz inequality.

4.9.28. Theorem. Let $x$ and $y$ be any elements of $X$. Then

$$|(x, y)| \le |x| \cdot |y|, \tag{4.9.29}$$

where in Eq. (4.9.29) $|(x, y)|$ denotes the absolute value of a real scalar and $|x|$ denotes the norm of $x$.

Proof. For any $x$ and $y$ in $X$ and for any real scalar $\alpha$ we have

$$(x + \alpha y, x + \alpha y) = (x, x) + \alpha(x, y) + \alpha(y, x) + \alpha^2(y, y) \ge 0.$$

Now assume first that $y \neq 0$, and let

$$\alpha = \frac{-(x, y)}{(y, y)}.$$

Then

$$(x + \alpha y, x + \alpha y) = (x, x) + 2\alpha(x, y) + \alpha^2(y, y) = (x, x) - \frac{2(x, y)(x, y)}{(y, y)} + \frac{(x, y)^2(y, y)}{(y, y)^2} = (x, x) - \frac{(x, y)^2}{(y, y)} \ge 0,$$

or

$$(x, x)(y, y) \ge (x, y)^2.$$

Taking the square root of both sides, we have the desired inequality

$$|(x, y)| \le |x| \cdot |y|.$$

To complete the proof, consider the case $y = 0$. Then $(x, y) = 0$, $|y| = 0$, and in this case the inequality follows trivially. ■
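A quick numerical illustration of the Schwarz inequality may help fix ideas. The sketch below assumes the Euclidean inner product of Eq. (4.9.24) on $R^5$ and $R^3$; the function names `inner` and `norm`, the random test vectors, and the tolerance are illustrative choices, not part of the text.

```python
import math
import random


def inner(x, y):
    """Euclidean inner product (x, y) = sum of xi_i * eta_i, cf. Eq. (4.9.24)."""
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    """Norm |x| = (x, x)^(1/2)."""
    return math.sqrt(inner(x, x))


random.seed(0)  # fixed seed so that the run is repeatable

# Check |(x, y)| <= |x| |y| on random vectors (small slack for rounding).
trials = []
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(5)]
    y = [random.uniform(-10, 10) for _ in range(5)]
    trials.append(abs(inner(x, y)) <= norm(x) * norm(y) + 1e-9)
schwarz_holds = all(trials)

# Equality is attained for linearly dependent vectors, here y = -2x.
x = [1.0, -2.0, 3.0]
y = [-2.0 * a for a in x]
equality_case = math.isclose(abs(inner(x, y)), norm(x) * norm(y))

print("inequality holds on all trials:", schwarz_holds)
print("equality for dependent vectors:", equality_case)
```

A finite random sample proves nothing, of course; it only makes the statement concrete and shows the equality case for dependent vectors.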
4.9.30. Exercise. For $x, y \in X$, show that

$$|(x, y)| = |x| \cdot |y|$$

if and only if $x$ and $y$ are linearly dependent.

In the next result we establish the axioms of a norm.
4.9.31. Theorem. For all $x$ and $y$ in $X$ and for all real scalars $\alpha$, the following hold:

(i) $|x| > 0$ unless $x = 0$, in which case $|x| = 0$;
(ii) $|\alpha x| = |\alpha| \cdot |x|$, where $|\alpha|$ denotes the absolute value of the scalar $\alpha$; and
(iii) $|x + y| \le |x| + |y|$.
Proof The proof of part (i) follows from the definition of an inner product. To prove part (ii), we note that
I«lx z =
..., $e_n\}$ is an orthonormal basis for $X$, then the matrix of $g$ with respect to this basis, denoted by $\mathbf{G}$, is the matrix of $G$ with respect to $\{e_1, \dots, e_n\}$. Conversely, given an arbitrary bilinear functional $g$ defined on $X$, there exists a unique linear transformation $G \in L(X, X)$ such that $(x, Gy) = g(x, y)$ for all $x, y \in X$.
Proof. Let $G \in L(X, X)$, and let $g(x, y) = (x, Gy)$. Then

$$g(x_1 + x_2, y) = (x_1 + x_2, Gy) = (x_1, Gy) + (x_2, Gy) = g(x_1, y) + g(x_2, y).$$

Also,

$$g(x, y_1 + y_2) = (x, G(y_1 + y_2)) = (x, Gy_1 + Gy_2) = (x, Gy_1) + (x, Gy_2) = g(x, y_1) + g(x, y_2).$$
Furthermore,

$$g(\alpha x, y) = (\alpha x, Gy) = \alpha(x, Gy) = \alpha g(x, y),$$

and

$$g(x, \alpha y) = (x, G(\alpha y)) = (x, \alpha G(y)) = \alpha(x, Gy) = \alpha g(x, y),$$

where $\alpha$ is a real scalar. Therefore, $g$ is a bilinear functional.

Next, let $\{e_1, \dots, e_n\}$ be an orthonormal basis for $X$. Then the matrix $\mathbf{G}$ of $g$ with respect to this basis is determined by the elements $g_{ij} = g(e_i, e_j)$. Now let $\mathbf{G}' = [g'_{ij}]$ be the matrix of $G$ with respect to $\{e_1, \dots, e_n\}$. Then

$$Ge_j = \sum_{k=1}^{n} g'_{kj}e_k \quad \text{for } j = 1, \dots, n.$$

Hence,

$$(e_i, Ge_j) = \left(e_i, \sum_{k=1}^{n} g'_{kj}e_k\right) = g'_{ij}.$$

Since $g_{ij} = g(e_i, e_j) = (e_i, Ge_j) = g'_{ij}$, it follows that $\mathbf{G}' = \mathbf{G}$; i.e., $\mathbf{G}$ is the matrix of $G$.

To prove the last part of the theorem, choose any orthonormal basis $\{e_1, \dots, e_n\}$ for $X$. Given a bilinear functional $g$ defined on $X$, let $\mathbf{G} = [g_{ij}]$ denote its matrix with respect to this basis, and let $G$ be the linear transformation corresponding to $\mathbf{G}$. Then $(x, Gy) = g(x, y)$ by the identical argument given above. Finally, since the matrix of the bilinear functional and the matrix of the linear transformation were determined independently, this correspondence is unique. ■

It should be noted that the correspondence between bilinear functionals and linear transformations determined by the relation $(x, Gy) = g(x, y)$ for all $x, y \in X$ does not depend on the particular basis chosen for $X$; however, it does depend on the way the inner product is chosen for $X$ at the outset.

Now let $G \in L(X, X)$, set $g(x, y) = (x, Gy)$, and let $h(x, y) = g(y, x) = (y, Gx) = (Gx, y)$. By Theorem 4.10.12, there exists a unique linear transformation, denote it by $G^*$, such that $h(x, y) = (x, G^*y)$ for all $x, y \in X$. We call the linear transformation $G^* \in L(X, X)$ the adjoint of $G$.
4.10.13. Theorem.

(i) For each $G \in L(X, X)$, there is a unique $G^* \in L(X, X)$ such that $(x, G^*y) = (Gx, y)$ for all $x, y \in X$.
(ii) Let $\{e_1, \dots, e_n\}$ be an orthonormal basis for $X$, and let $\mathbf{G}$ be the matrix of the linear transformation $G \in L(X, X)$ with respect to this basis. Let $\mathbf{G}^*$ be the matrix of $G^*$ with respect to $\{e_1, \dots, e_n\}$. Then $\mathbf{G}^* = \mathbf{G}^T$.

Proof. The proof of the first part follows from the discussion preceding the present theorem.

To prove the second part, let $\{e_1, \dots, e_n\}$ be an orthonormal basis for $X$, and let $\mathbf{G}^*$ denote the matrix of $G^*$ with respect to this basis. Let $\mathbf{x}$ and $\mathbf{y}$ be the coordinate representation of $x$ and $y$, respectively, with respect to this basis. Then

$$(x, G^*y) = \mathbf{x}^T\mathbf{G}^*\mathbf{y} \quad \text{and} \quad (Gx, y) = (\mathbf{G}\mathbf{x})^T\mathbf{y} = \mathbf{x}^T\mathbf{G}^T\mathbf{y}.$$

Since $(x, G^*y) = (Gx, y)$, for all $\mathbf{x}$ and $\mathbf{y}$ we have

$$\mathbf{x}^T(\mathbf{G}^* - \mathbf{G}^T)\mathbf{y} = 0.$$

Hence, $\mathbf{G}^* = \mathbf{G}^T$. ■
The above result allows the following equivalent definition of the adjoint linear transformation.

4.10.14. Definition. Let $G \in L(X, X)$. The adjoint transformation, $G^*$, is defined by the formula

$$(x, G^*y) = (Gx, y)$$

for all $x, y \in X$.
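The defining relation $(x, G^*y) = (Gx, y)$, together with the fact that the matrix of $G^*$ in an orthonormal basis is the transpose, can be checked on a small example. The sketch below assumes $X = R^3$ with the natural (orthonormal) basis and the Euclidean inner product; the matrix `G` and the test vectors are arbitrary illustrative choices.

```python
def inner(x, y):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))


def apply(M, x):
    """Matrix-vector product M x, with M stored as a list of rows."""
    return [inner(row, x) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


# Hypothetical matrix of G with respect to the natural basis of R^3.
G = [[1, 2, 0],
     [-1, 3, 4],
     [5, 0, -2]]
G_star = transpose(G)  # matrix of the adjoint, by Theorem 4.10.13(ii)

vectors = [[1, 0, 0], [0, 1, 0], [2, -1, 3], [-4, 5, 1]]

# (x, G* y) == (G x, y) for every pair of test vectors (exact, integers).
adjoint_identity_holds = all(
    inner(x, apply(G_star, y)) == inner(apply(G, x), y)
    for x in vectors for y in vectors
)
print("(x, G*y) == (Gx, y):", adjoint_identity_holds)
```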
Although there is obviously great similarity between the adjoint linear transformation and the transpose of a linear transformation, it should be noted that these two transformations constitute different concepts. The differences between them will become more apparent in our subsequent discussion of linear transformations defined on complex vector spaces in Chapter 7.

Our next result includes some of the elementary properties of the adjoint of linear transformations. The reader should compare these with the properties of the transpose of linear transformations.

4.10.15. Theorem. Let $A, B \in L(X, X)$, let $A^*$, $B^*$ denote their respective adjoints, and let $\alpha$ be a real scalar. Then

(i) $(A^*)^* = A$;
(ii) $(A + B)^* = A^* + B^*$;
(iii) $(\alpha A)^* = \alpha A^*$;
(iv) $(AB)^* = B^*A^*$;
(v) $I^* = I$, where $I$ denotes the identity transformation;
(vi) $0^* = 0$, where $0$ denotes the null transformation;
(vii) $A$ is non-singular if and only if $A^*$ is non-singular; and
(viii) if $A$ is non-singular, then $(A^*)^{-1} = (A^{-1})^*$.

4.10.16. Exercise. Prove Theorem 4.10.15.
Our next result enables us to characterize orthogonal transformations in terms of their adjoints.

4.10.17. Theorem. Let $A \in L(X, X)$. Then $A$ is orthogonal if and only if $A^* = A^{-1}$.

Proof. We have $(Ax, Ay) = (A^*Ax, y)$. But $A$ is orthogonal if and only if $(Ax, Ay) = (x, y)$ for all $x, y \in X$. Therefore,

$$(A^*Ax, y) = (x, y)$$

for all $x$ and $y$. From this it follows that $A^*A = I$, which implies that $A^* = A^{-1}$. ■

The proof of the next theorem is left as an exercise.

4.10.18. Theorem. Let $A \in L(X, X)$. Then $A$ is orthogonal if and only if $A^{-1}$ is orthogonal, and $A^{-1}$ is orthogonal if and only if $A^*$ is orthogonal.

4.10.19. Exercise. Prove Theorem 4.10.18.
C. Self-Adjoint Transformations

Using adjoints, we now introduce two additional important types of linear transformations.

4.10.20. Definition. Let $A \in L(X, X)$. Then $A$ is said to be self-adjoint if $A^* = A$, and it is said to be skew-adjoint if $A^* = -A$.

Some of the properties of such transformations are as follows.

4.10.21. Theorem. Let $A \in L(X, X)$. Let $\{e_1, \dots, e_n\}$ be an orthonormal basis for $X$, and let $\mathbf{A}$ be the matrix of $A$ with respect to this basis. The following are equivalent:

(i) $A$ is self-adjoint;
(ii) $\mathbf{A}$ is symmetric; and
(iii) $(Ax, y) = (x, Ay)$ for all $x, y \in X$.

4.10.22. Theorem. Let $A \in L(X, X)$, and let $\{e_1, \dots, e_n\}$ be an orthonormal basis for $X$. Let $\mathbf{A}$ be the matrix of $A$ with respect to this basis. The following are equivalent:

(i) $A$ is skew-adjoint;
(ii) $\mathbf{A}$ is skew-symmetric (see Definition 4.8.8); and
(iii) $(Ax, y) = -(x, Ay)$ for all $x, y \in X$.

4.10.23. Exercise. Prove Theorems 4.10.21 and 4.10.22.

The following corollary follows from part (iii) of Theorem 4.10.22.
4.10.24. Corollary. Let $\mathbf{A}$ be as defined in Theorem 4.10.22. Then the following are equivalent:

(i) $\mathbf{A}$ is skew-symmetric;
(ii) $(x, Ax) = 0$ for all $x \in X$; and
(iii) $Ax \perp x$ for all $x \in X$.

Our next result enables us to represent arbitrary linear transformations as the sum of self-adjoint and skew-adjoint transformations.

4.10.25. Corollary. Let $A \in L(X, X)$. Then there exist unique $A_1, A_2 \in L(X, X)$ such that $A = A_1 + A_2$, where $A_1$ is self-adjoint and $A_2$ is skew-adjoint.

4.10.26. Exercise. Prove Corollaries 4.10.24 and 4.10.25.
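In matrix terms, one natural candidate for the decomposition of Corollary 4.10.25 is $\mathbf{A}_1 = \tfrac{1}{2}(\mathbf{A} + \mathbf{A}^T)$ and $\mathbf{A}_2 = \tfrac{1}{2}(\mathbf{A} - \mathbf{A}^T)$; this formula is a standard construction assumed here, not quoted from the text. The sketch below checks it on an arbitrary $3 \times 3$ matrix using exact rational arithmetic, and also illustrates part (ii) of Corollary 4.10.24.

```python
from fractions import Fraction

# Arbitrary illustrative matrix (not taken from the text).
A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 10]]
n = len(A)

# Candidate symmetric and skew-symmetric parts.
A1 = [[Fraction(A[i][j] + A[j][i], 2) for j in range(n)] for i in range(n)]
A2 = [[Fraction(A[i][j] - A[j][i], 2) for j in range(n)] for i in range(n)]

is_symmetric = all(A1[i][j] == A1[j][i] for i in range(n) for j in range(n))
is_skew = all(A2[i][j] == -A2[j][i] for i in range(n) for j in range(n))
sums_to_A = all(A1[i][j] + A2[i][j] == A[i][j]
                for i in range(n) for j in range(n))

# Corollary 4.10.24(ii): x^T A2 x = 0 for a skew-symmetric matrix.
x = [1, -2, 3]
quadratic_form = sum(x[i] * A2[i][j] * x[j]
                     for i in range(n) for j in range(n))

print(is_symmetric, is_skew, sums_to_A, quadratic_form)
```

Uniqueness is not checked by the code; it follows because a matrix that is both symmetric and skew-symmetric must be zero.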
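4.10.27. Exercise. Show that every real $n \times n$ matrix can be written in one and only one way as the sum of a symmetric and a skew-symmetric matrix.

Our next result is applicable to real as well as complex vector spaces.

4.10.28. Theorem. Let $X$ be a complex vector space. Then the eigenvalues of a real symmetric matrix $\mathbf{A}$ are all real. (If all eigenvalues of $\mathbf{A}$ are positive (negative), then $\mathbf{A}$ is called positive (negative) definite.)

Proof. Let $\lambda = r + is$ denote an eigenvalue of $\mathbf{A}$, where $r$ and $s$ are real numbers and where $i = \sqrt{-1}$. We must show that $s = 0$.

Since $\lambda$ is an eigenvalue we know that the matrix $(\mathbf{A} - \lambda\mathbf{I})$ is singular. So is the matrix

$$
\begin{aligned}
\mathbf{B} &= [\mathbf{A} - (r + is)\mathbf{I}][\mathbf{A} - (r - is)\mathbf{I}] \\
&= \mathbf{A}^2 - (r + is)\mathbf{I}\mathbf{A} - (r - is)\mathbf{I}\mathbf{A} + (r + is)(r - is)\mathbf{I}^2 \\
&= \mathbf{A}^2 - 2r\mathbf{A} + (r^2 + s^2)\mathbf{I}^2 = (\mathbf{A} - r\mathbf{I})^2 + s^2\mathbf{I}.
\end{aligned}
$$

Since $\mathbf{B}$ is singular, there exists an $\mathbf{x} \neq \mathbf{0}$ such that $\mathbf{B}\mathbf{x} = \mathbf{0}$. Also,

$$0 = \mathbf{x}^T\mathbf{B}\mathbf{x} = \mathbf{x}^T[(\mathbf{A} - r\mathbf{I})^2 + s^2\mathbf{I}]\mathbf{x} = \mathbf{x}^T(\mathbf{A} - r\mathbf{I})^2\mathbf{x} + s^2\mathbf{x}^T\mathbf{x}.$$

Since $\mathbf{A}$ and $\mathbf{I}$ are symmetric,

$$(\mathbf{A} - r\mathbf{I})^T = \mathbf{A}^T - r\mathbf{I}^T = \mathbf{A} - r\mathbf{I}.$$

Therefore,

$$0 = \mathbf{x}^T(\mathbf{A} - r\mathbf{I})^T(\mathbf{A} - r\mathbf{I})\mathbf{x} + s^2\mathbf{x}^T\mathbf{x};$$

i.e.,

$$0 = \mathbf{y}^T\mathbf{y} + s^2\mathbf{x}^T\mathbf{x},$$

where $\mathbf{y} = (\mathbf{A} - r\mathbf{I})\mathbf{x}$. Now $\mathbf{y}^T\mathbf{y} = \sum_{i=1}^{n} \eta_i^2 \ge 0$ and $\mathbf{x}^T\mathbf{x} = \sum_{i=1}^{n} \xi_i^2 > 0$, because by assumption $\mathbf{x} \neq \mathbf{0}$. Thus, we have

$$0 = \mathbf{y}^T\mathbf{y} + s^2(\mathbf{x}^T\mathbf{x}) \ge 0 + s^2\mathbf{x}^T\mathbf{x}.$$

The only way that this last relation can hold is if $s = 0$. Therefore, $\lambda = r$, and $\lambda$ is real. ■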
Now let $\mathbf{A}$ be the matrix of the linear transformation $A \in L(X, X)$ with respect to some orthonormal basis. If $\mathbf{A}$ is symmetric, then all its eigenvalues are real. In this case $A$ is self-adjoint and all its eigenvalues are also real; in fact, the eigenvalues of $A$ and $\mathbf{A}$ are identical. Thus, there exist unique real scalars $\lambda_1, \dots, \lambda_p$, $p \le n$, such that

$$\det(A - \lambda I) = \det(\mathbf{A} - \lambda\mathbf{I}) = (\lambda_1 - \lambda)^{m_1}(\lambda_2 - \lambda)^{m_2} \cdots (\lambda_p - \lambda)^{m_p}. \tag{4.10.29}$$

We summarize these observations in the following:

4.10.30. Corollary. Let $A \in L(X, X)$. If $A$ is self-adjoint, then all eigenvalues of $A$ are real and there exist unique real numbers $\lambda_1, \dots, \lambda_p$, $p \le n$, such that Eq. (4.10.29) holds.

As in Section 4.5, we say that in Corollary 4.10.30 the eigenvalues $\lambda_i$, $i = 1, \dots, p \le n$, have algebraic multiplicities $m_i$, $i = 1, \dots, p$, respectively.

Another direct consequence of Theorem 4.10.28 is the following result.

4.10.31. Corollary. Let $A \in L(X, X)$. If $A$ is self-adjoint, then $A$ has at least one eigenvalue.

4.10.32. Exercise. Prove Corollary 4.10.31.
Let us now examine some of the properties of the eigenvalues and eigenvectors of self-adjoint linear transformations. First, we have:

4.10.33. Theorem. Let $A \in L(X, X)$ be a self-adjoint transformation, and let $\lambda_1, \dots, \lambda_p$, $p \le n$, denote the distinct eigenvalues of $A$. If $x_i$ is an eigenvector for $\lambda_i$, and if $x_j$ is an eigenvector for $\lambda_j$, then $x_i \perp x_j$ for all $i \neq j$.

Proof. Assume that $\lambda_i \neq \lambda_j$ and consider $Ax_i = \lambda_ix_i$ and $Ax_j = \lambda_jx_j$, where $x_i \neq 0$ and $x_j \neq 0$. We have

$$\lambda_i(x_i, x_j) = (\lambda_ix_i, x_j) = (Ax_i, x_j) = (x_i, Ax_j) = (x_i, \lambda_jx_j) = \lambda_j(x_i, x_j).$$

Thus,

$$(\lambda_i - \lambda_j)(x_i, x_j) = 0.$$

Since $\lambda_i \neq \lambda_j$, we have $(x_i, x_j) = 0$, which means $x_i \perp x_j$. ■

Now let $A \in L(X, X)$, and let $\lambda_i$ be an eigenvalue of $A$. Recall that $\mathfrak{N}_i$ denotes the null space of the linear transformation $A - \lambda_iI$, i.e.,

$$\mathfrak{N}_i = \{x \in X : (A - \lambda_iI)x = 0\}. \tag{4.10.34}$$

Recall also that $\mathfrak{N}_i$ is a linear subspace of $X$. From Theorem 4.10.33 we now have immediately:

4.10.35. Corollary. Let $A \in L(X, X)$ be a self-adjoint transformation, and let $\lambda_i$ and $\lambda_j$ be eigenvalues of $A$. If $\lambda_i \neq \lambda_j$, then $\mathfrak{N}_i \perp \mathfrak{N}_j$.

4.10.36. Exercise. Prove Corollary 4.10.35.
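A small concrete case may help to see Theorem 4.10.33 at work. The sketch below assumes the symmetric matrix $\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$ on $R^2$, with eigenvalues 3 and 1 and hand-computed eigenvectors; the matrix is an illustrative choice, not an example from the text.

```python
def inner(x, y):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))


def apply(M, x):
    """Matrix-vector product M x, with M stored as a list of rows."""
    return [inner(row, x) for row in M]


# Illustrative symmetric matrix (hence self-adjoint in the natural basis).
A = [[2, 1],
     [1, 2]]

# Hand-computed eigenpairs: A x1 = 3 x1 and A x2 = 1 x2.
lam1, x1 = 3, [1, 1]
lam2, x2 = 1, [1, -1]

is_eigenpair_1 = apply(A, x1) == [lam1 * c for c in x1]
is_eigenpair_2 = apply(A, x2) == [lam2 * c for c in x2]

# Distinct eigenvalues, so the eigenvectors should be orthogonal.
orthogonal = inner(x1, x2) == 0

print(is_eigenpair_1, is_eigenpair_2, orthogonal)
```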
Making use of Theorem 4.9.59, we now prove the following important result.

4.10.37. Theorem. Let $A \in L(X, X)$ be a self-adjoint transformation, and let $\lambda_1, \dots, \lambda_p$, $p \le n$, denote the distinct eigenvalues of $A$. Then

$$\dim X = n = \dim \mathfrak{N}_1 + \dim \mathfrak{N}_2 + \cdots + \dim \mathfrak{N}_p.$$

Proof. Let $\dim \mathfrak{N}_i = n_i$, and let $\{e_1, \dots, e_{n_1}\}$ be an orthonormal basis for $\mathfrak{N}_1$. Next, let $\{e_{n_1+1}, \dots, e_{n_1+n_2}\}$ be an orthonormal basis for $\mathfrak{N}_2$. We continue in this manner, finally letting $\{e_{n_1+\cdots+n_{p-1}+1}, \dots, e_{n_1+\cdots+n_p}\}$ be an orthonormal basis for $\mathfrak{N}_p$. Let $n_1 + \cdots + n_p = m$. Since $\mathfrak{N}_i \perp \mathfrak{N}_j$, $i \neq j$, it follows that the vectors $e_1, \dots, e_m$, relabeled in an obvious way, are orthonormal in $X$. We can conclude, by Corollary 4.9.52, that these vectors are a basis for $X$, if we can prove that $m = n$.

Let $Y$ be the linear subspace of $X$ generated by the orthonormal vectors $e_1, \dots, e_m$. Then $\{e_1, \dots, e_m\}$ is an orthonormal basis for $Y$ and $\dim Y = m$. Since $\dim Y + \dim Y^\perp = \dim X = n$ (see Theorem 4.9.59), we need only prove that $\dim Y^\perp = 0$. To this end let $x$ be an arbitrary vector in $Y^\perp$. Then $(x, e_1) = 0, \dots, (x, e_m) = 0$; i.e., $x \perp e_1, \dots, x \perp e_m$, by Theorem 4.9.59. So, in particular, again by Theorem 4.9.59, we have $x \perp \mathfrak{N}_i$, $i = 1, \dots, p$. Now let $y$ be in $\mathfrak{N}_i$. Then

$$(Ax, y) = (x, Ay) = (x, \lambda_iy) = \lambda_i(x, y) = 0,$$

since $A$ is self-adjoint, since $y$ is in $\mathfrak{N}_i$, and since $x \perp \mathfrak{N}_i$. Thus, $Ax \perp \mathfrak{N}_i$ for $i = 1, \dots, p$, and again by Theorem 4.9.59, $Ax \perp e_i$, $i = 1, \dots, m$. Thus, by Theorem 4.9.59, $Ax \perp Y$. Therefore, for each $x \in Y^\perp$ we also have $Ax \in Y^\perp$. Hence, $A$ induces a linear transformation, say $A'$, from $Y^\perp$ into $Y^\perp$, where $A'x = Ax$ for all $x \in Y^\perp$. Now $A'$ is a self-adjoint linear transformation from $Y^\perp$ into $Y^\perp$, because for all $x$ and $y$ in $Y^\perp$ we have

$$(A'x, y) = (Ax, y) = (x, Ay) = (x, A'y).$$

Assume now that $\dim Y^\perp > 0$. Then by Corollary 4.10.31, $A'$ has an eigenvalue, say $\lambda_0$, and a corresponding eigenvector $x_0 \neq 0$. Thus, $x_0 \neq 0$ is in $Y^\perp$ and $A'x_0 = Ax_0 = \lambda_0x_0$; i.e., $\lambda_0$ is also an eigenvalue of $A$, say $\lambda_0 = \lambda_i$. So now it follows that $x_0 \in \mathfrak{N}_i$. But from above, $x_0 \in Y^\perp$, which means $x_0 \perp \mathfrak{N}_i$. This implies that $x_0 \perp x_0$, or $(x_0, x_0) = 0$, which in turn implies that $x_0 = 0$. But this contradicts our earlier assumption that $x_0 \neq 0$. Hence, we have arrived at a contradiction, and it therefore follows that $\dim Y^\perp = 0$. This proves the theorem. ■
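Our next result is a direct consequence of Theorem 4.10.37.

4.10.38. Corollary. Let $A \in L(X, X)$. If $A$ is self-adjoint, then

(i) there exists an orthonormal basis in $X$ such that the matrix of $A$ with respect to this basis is diagonal; and
(ii) for each eigenvalue $\lambda_i$ of $A$ we have $\dim \mathfrak{N}_i$ = multiplicity of $\lambda_i$.

Proof. As in the proof of Theorem 4.10.37 we choose an orthonormal basis $\{e_1, \dots, e_m\}$, where $m = n$. We have $Ae_1 = \lambda_1e_1, \dots, Ae_{n_1} = \lambda_1e_{n_1}$, $Ae_{n_1+1} = \lambda_2e_{n_1+1}, \dots, Ae_{n_1+\cdots+n_p} = \lambda_pe_{n_1+\cdots+n_p}$. Thus, the matrix $\mathbf{A}$ of $A$ with respect to $\{e_1, \dots, e_n\}$ is

$$
\mathbf{A} = \operatorname{diag}(\underbrace{\lambda_1, \dots, \lambda_1}_{n_1}, \underbrace{\lambda_2, \dots, \lambda_2}_{n_2}, \dots, \underbrace{\lambda_p, \dots, \lambda_p}_{n_p}),
$$

with all off-diagonal entries equal to zero.

To prove the second part, we note that the characteristic polynomial of $A$ is

$$\det(A - \lambda I) = \det(\mathbf{A} - \lambda\mathbf{I}) = (\lambda_1 - \lambda)^{n_1}(\lambda_2 - \lambda)^{n_2} \cdots (\lambda_p - \lambda)^{n_p},$$

and, hence, $n_i = \dim \mathfrak{N}_i$ = multiplicity of $\lambda_i$, $i = 1, \dots, p$. ■

Another consequence of Theorem 4.10.37 is the following: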
4.10.39. Corollary. Let $\mathbf{A}$ be a real $(n \times n)$ symmetric matrix. Then there exists an orthogonal matrix $\mathbf{P}$ such that the matrix $\mathbf{A}'$ defined by

$$\mathbf{A}' = \mathbf{P}^{-1}\mathbf{A}\mathbf{P} = \mathbf{P}^T\mathbf{A}\mathbf{P}$$

is diagonal.

4.10.40. Exercise. Prove Corollary 4.10.39.
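Corollary 4.10.39 can be illustrated on a $2 \times 2$ case. The sketch below assumes the symmetric matrix $\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$ and takes as $\mathbf{P}$ the matrix whose columns are its normalized eigenvectors $(1, 1)/\sqrt{2}$ and $(1, -1)/\sqrt{2}$; both the matrix and the helper names are illustrative assumptions. Floating-point comparisons use a small tolerance.

```python
import math


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(M):
    return [list(col) for col in zip(*M)]


def close(X, Y, tol=1e-12):
    """Entrywise comparison of two matrices up to an absolute tolerance."""
    return all(math.isclose(a, b, abs_tol=tol)
               for row_x, row_y in zip(X, Y) for a, b in zip(row_x, row_y))


A = [[2.0, 1.0],
     [1.0, 2.0]]

s = 1.0 / math.sqrt(2.0)
P = [[s, s],
     [s, -s]]  # columns: normalized eigenvectors for eigenvalues 3 and 1

PT = transpose(P)
P_is_orthogonal = close(matmul(PT, P), [[1.0, 0.0], [0.0, 1.0]])

A_prime = matmul(PT, matmul(A, P))  # P^T A P
is_diagonalized = close(A_prime, [[3.0, 0.0], [0.0, 1.0]])

print(P_is_orthogonal, is_diagonalized)
```

Because $\mathbf{P}$ is orthogonal, no matrix inversion is needed: $\mathbf{P}^{-1}$ is simply $\mathbf{P}^T$.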
For symmetric bilinear functionals defined on Euclidean vector spaces we have the following result.

4.10.41. Corollary. Let $f(x, y)$ be a symmetric bilinear functional on $X$. Then there exists an orthonormal basis for $X$ such that the matrix of $f$ with respect to this basis is diagonal.

Proof. By Theorem 4.10.12 there exists an $F \in L(X, X)$ such that $f(x, y) = (x, Fy)$ for all $x, y \in X$. Since $f$ is symmetric, $f(y, x) = f(x, y) = (y, Fx) = (x, Fy) = (Fx, y)$ for all $x, y \in X$, and thus, by Theorem 4.10.21, $F$ is self-adjoint. Hence, by Corollary 4.10.38, there is an orthonormal basis for $X$ such that the matrix of $F$ is diagonal. By Theorem 4.10.12, this matrix is also the representation of $f$ with respect to the same basis. ■

The proof of the next result is left as an exercise.

4.10.42. Corollary. Let $\hat{f}(x)$ be a quadratic form defined on $X$. Then there exists an orthonormal basis for $X$ such that if $\mathbf{x}^T = (\xi_1, \dots, \xi_n)$ is the coordinate representation of $x$ with respect to this basis, then $\hat{f}(x) = \alpha_1\xi_1^2 + \cdots + \alpha_n\xi_n^2$ for some real scalars $\alpha_1, \dots, \alpha_n$.

4.10.43. Exercise. Prove Corollary 4.10.42.
Next, we state and prove the spectral theorem for self-adjoint linear transformations. First, we recall that a transformation $P \in L(X, X)$ is a projection on a linear subspace of $X$ if and only if $P^2 = P$ (see Theorem 3.7.4). Also, for any projection $P$, $X = \mathfrak{R}(P) \oplus \mathfrak{N}(P)$, where $\mathfrak{R}(P)$ is the range of $P$ and $\mathfrak{N}(P)$ is the null space of $P$ (see Eq. (3.7.8)). Furthermore, recall that a projection $P$ is called an orthogonal projection if $\mathfrak{R}(P) \perp \mathfrak{N}(P)$ (see Definition 3.7.16).

4.10.44. Theorem. Let $A \in L(X, X)$ be a self-adjoint transformation, let $\lambda_1, \dots, \lambda_p$ denote the distinct eigenvalues of $A$, and let $\mathfrak{N}_i$ be the null space of $A - \lambda_iI$ (see Eq. (4.10.34)). For each $i = 1, \dots, p$, let $P_i$ denote the projection on $\mathfrak{N}_i$ along $\mathfrak{N}_i^\perp$. Then

(i) $P_i$ is an orthogonal projection for each $i = 1, \dots, p$;
(ii) $P_iP_j = 0$ for $i \neq j$, $i, j = 1, \dots, p$;
(iii) $\sum_{j=1}^{p} P_j = I$, where $I \in L(X, X)$ denotes the identity transformation; and
(iv) $A = \sum_{j=1}^{p} \lambda_jP_j$.

Proof. To prove the first part, note that $X = \mathfrak{N}_i \oplus \mathfrak{N}_i^\perp$, $i = 1, \dots, p$, by Theorem 4.9.59. Thus, by Theorem 3.7.3, $\mathfrak{R}(P_i) = \mathfrak{N}_i$ and $\mathfrak{N}(P_i) = \mathfrak{N}_i^\perp$, and hence, $P_i$ is an orthogonal projection.

To prove the second part, let $i \neq j$ and let $x \in X$. Then $P_jx \triangleq x_j \in \mathfrak{N}_j$. Since $\mathfrak{R}(P_i) = \mathfrak{N}_i$ and since $\mathfrak{N}_i \perp \mathfrak{N}_j$, we must have $x_j \in \mathfrak{N}(P_i)$; i.e., $P_iP_jx = 0$ for all $x \in X$.

To prove the third part, let $P = \sum_{i=1}^{p} P_i$. We must show that $P = I$. To do so, we first show that $P$ is a projection. This follows immediately from the fact that for arbitrary $x \in X$, $P^2x = (P_1 + \cdots + P_p)(P_1x + \cdots + P_px) = P_1^2x + \cdots + P_p^2x$, because $P_iP_j = 0$ for $i \neq j$. Hence, $P^2x = (P_1 + \cdots + P_p)x = Px$, and thus $P$ is a projection. Next, we show that dim R ...

..., $f_n$ be $n$ real-valued functions which are defined and continuous on $D$ (i.e., $f_i(t, x_1, \dots, x_n)$, $i = 1, \dots, n$, are defined for all points in $D$ and are continuous with respect to all arguments $t, x_1, \dots, x_n$). We call

$$\dot{x}_i = f_i(t, x_1, \dots, x_n), \quad i = 1, \dots, n, \tag{4.11.7}$$

a system of $n$ ordinary differential equations of the first order. A set of $n$ real differentiable functions $\{\varphi_1, \dots, \varphi_n\}$ (if it exists) defined on a real $t$ interval $T = (t_1, t_2) \subset R$ such that the points $(t, \varphi_1(t), \dots, \varphi_n(t)) \in D$ for all $t \in T$ and such that

$$\dot{\varphi}_i(t) = f_i(t, \varphi_1(t), \dots, \varphi_n(t)), \quad i = 1, \dots, n, \tag{4.11.8}$$

for all $t \in T$, is called a solution of the system of ordinary differential equations (4.11.7).
4.11.9. Definition. Let $(\tau, \xi_1, \dots, \xi_n) \in D$. If the set $\{\varphi_1, \dots, \varphi_n\}$ is a solution of the system of equations (4.11.7) and if $(\varphi_1(\tau), \dots, \varphi_n(\tau)) = (\xi_1, \dots, \xi_n)$, then the set $\{\varphi_1, \dots, \varphi_n\}$ is called a solution of the initial-value problem

$$
\left.\begin{aligned}
\dot{x}_i &= f_i(t, x_1, \dots, x_n), & i &= 1, \dots, n \\
x_i(\tau) &= \xi_i, & i &= 1, \dots, n
\end{aligned}\right\}. \tag{4.11.10}
$$

It is convenient to use vector notation to represent Eq. (4.11.10). Let $\mathbf{x}^T = (x_1, \dots, x_n)$, $\boldsymbol{\xi}^T = (\xi_1, \dots, \xi_n)$,

$$
\mathbf{f}(t, \mathbf{x}) = \begin{bmatrix} f_1(t, x_1, \dots, x_n) \\ \vdots \\ f_n(t, x_1, \dots, x_n) \end{bmatrix} = \begin{bmatrix} f_1(t, \mathbf{x}) \\ \vdots \\ f_n(t, \mathbf{x}) \end{bmatrix},
$$

and define $\dot{\mathbf{x}} = d\mathbf{x}/dt$ componentwise; i.e., $\dot{\mathbf{x}}^T = (\dot{x}_1, \dots, \dot{x}_n)$.

We can express Eq. (4.11.10) equivalently as

$$
\left.\begin{aligned}
\dot{\mathbf{x}} &= \mathbf{f}(t, \mathbf{x}) \\
\mathbf{x}(\tau) &= \boldsymbol{\xi}
\end{aligned}\right\}. \tag{4.11.11}
$$

If in Eq. (4.11.11) $\mathbf{f}(t, \mathbf{x})$ does not depend on $t$ (i.e., $\mathbf{f}(t, \mathbf{x}) = \mathbf{f}(\mathbf{x})$ for all $(t, \mathbf{x}) \in D$), then we have

$$\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}). \tag{4.11.12}$$

In this case we speak of an autonomous system of first-order ordinary differential equations.

Of special importance are systems of first-order ordinary differential equations described by

$$\dot{\mathbf{x}} = \mathbf{A}(t)\mathbf{x} + \mathbf{v}(t), \tag{4.11.13}$$

$$\dot{\mathbf{x}} = \mathbf{A}(t)\mathbf{x}, \tag{4.11.14}$$

and

$$\dot{\mathbf{x}} = \mathbf{A}\mathbf{x}, \tag{4.11.15}$$

where $\mathbf{x}$ is a real $n$-vector, $\mathbf{A}(t) = [a_{ij}(t)]$ is a real $(n \times n)$ matrix with elements $a_{ij}(t)$ that are defined and continuous on a $t$ interval $T$, $\mathbf{A} = [a_{ij}]$ is an $(n \times n)$ matrix with real constant coefficients, and $\mathbf{v}(t)$ is a real $n$-vector with components $v_i(t)$, $i = 1, \dots, n$, which are defined and at least piecewise continuous on $T$. These equations are clearly a special case of Eq. (4.11.7). For example, if in Eq. (4.11.7) we let

$$f_i(t, x_1, \dots, x_n) = f_i(t, \mathbf{x}) = \sum_{j=1}^{n} a_{ij}(t)x_j, \quad i = 1, \dots, n,$$

then Eq. (4.11.14) results. In the case of Eqs. (4.11.14) and (4.11.15), we speak of a linear homogeneous system of ordinary differential equations, in the case of Eq. (4.11.13) we have a linear non-homogeneous system of ordinary differential equations, and in the case of Eq. (4.11.15) we speak of a linear system of ordinary differential equations with constant coefficients.

Next, we consider initial-value problems described by means of $n$th-order ordinary differential equations. Let $f$ be a real function which is defined and continuous in a domain $D$ of the real $(t, x_1, \dots, x_n)$ space, and let $x^{(k)} = d^kx/dt^k$. We call

$$x^{(n)} = f(t, x, x^{(1)}, \dots, x^{(n-1)}) \tag{4.11.16}$$

an $n$th-order ordinary differential equation. A real function $\varphi$ (if it exists) which is defined on a $t$ interval $T = (t_1, t_2) \subset R$ and which has $n$ derivatives on $T$ is called a solution of Eq. (4.11.16) if $(t, \varphi(t), \dots, \varphi^{(n-1)}(t)) \in D$ for all $t \in T$ and if

$$\varphi^{(n)}(t) = f(t, \varphi(t), \dots, \varphi^{(n-1)}(t)) \tag{4.11.17}$$

for all $t \in T$.

4.11.18. Definition. Let $(\tau, \xi_1, \dots, \xi_n) \in D$. If $\varphi$ is a solution of Eq. (4.11.16) and if $\varphi(\tau) = \xi_1, \dots, \varphi^{(n-1)}(\tau) = \xi_n$, then $\varphi$ is called a solution of the initial-value problem

$$
\left.\begin{aligned}
x^{(n)} &= f(t, x, x^{(1)}, \dots, x^{(n-1)}) \\
x(\tau) &= \xi_1, \ \dots, \ x^{(n-1)}(\tau) = \xi_n
\end{aligned}\right\}. \tag{4.11.19}
$$
Of particular interest are nth-order ordinary differential equations
+
a,,(/)x(~) a,,(t)x()~
and a,.x(~)
+
+
a._I(/)x(~-1l
+
a~_I(t)X(~-1l
+
+
+
a l (t)x(1l
+ ... +
a~_lx(~-1l
+
al(/)x ( l)
alx ( I)
+
+
ao(t)x
=
ao(t)x
=
=
0,
aox
V(/),
(4.11.20)
0,
(4.11.21) (4.11.22)
where a,,(t), .• . ,oo(t) are real continuous functions defined on the interval T, where a~(/) :;z:! 0 for all lET, where a~, • . , a o are real constants, where a" :;z:! 0, and where v(/) is a real function defined and piecewise continuous on T. We call Eq. (4.11.21) a linear homogeneous ordinary differential equation oforder n, Eq. (4.1 1.20) a linear non-homogeneous ordinary differential equation of order n, and Eq. (4.1 I .22) a linear ordinary differential equation of order n with constant coefficients. We now show that the theory of nth-order ordinary differential equations reduces to the theory of a system of n first-order ordinary differential equations. To this end, let in Eq. (4.11.19) X = X I ' and let
=
IX
x = x
2
I_~X
x~
=
=
X 2 3
=
X
=
x~
1(/,
X ( 2)
(4.1 1.23)
=
X(~-I)
XI'
••
, x~)
=
x(~)
This system of equations is clearly defined for all (I, X I ' ... ,x~) E D. Now assume that the vector p4 T = ('11' ... , rp~) is a solution of Eq. (4.11.23) on an
.4 11.
Applications to Ordinary Differential Equations
interval T. Since rp"
= ;"
rp3
f(t, rp,(t), . .. ,rpft(t»
= ;", ... ,rpft = rp\ft-I),
and since
f(t, rp,(t), . .. ,rp\ft-Il(t» =
=
rp\ft)(t),
it follows that the first component rp, of the vector, is a solution of Eq. (4.11.16) on the interval T. Conversely, assume that rp, is a solution of Eq. (4.11.16) on the interval T. Then the vector cpT = (rp, rp(l), ... , rp(ft-ll) is clearly a solution of the system of eq u ations (4.11.23). Note that if rp,(1') = ~" ... ,rp\ft-I)(1') = ~ft' then the vector, satisfies ,(f) = ; , where = (~t, ... , ~ft)' The converse is also true. Thus far we have concerned ourselves with initial-value problems characterized by real ordinary differential equations. It is possible to consider initialvalue problems involving complex ordinary differential equations. F o r example, let t be real and let ZT = (z " ... , Zft) be a complex vector (i.e., Zk is of the form U k + ivk , where U k and V k are real and i = ,J = } ) . Let D be a domain in the (t, z) space, and letf., ,f,. be n continuous complex-valued functions defined on D. Let fT = (fl' ,f,.), and let = dz/dl. We call
$$\dot{z} = f(t, z) \tag{4.11.24}$$
a system of n complex ordinary differential equations of the first order. A complex vector $\varphi^T = (\varphi_1, \ldots, \varphi_n)$ which is defined and differentiable on a real $t$ interval $T = (T_1, T_2) \subset R$ such that the points $(t, \varphi_1(t), \ldots, \varphi_n(t)) \in D$ for all $t \in T$ and such that
= C(t, .l' :' !&
= A(t)[ . II.z I· · 1· • .] =
A(t)Y .
.. · I A(t)' I ' . ] •
We also have:
4.11.37. Theorem. If $\Psi$ is a solution of the matrix equation (4.11.36) on $T$ and if $t, \tau \in T$, then
$$\det \Psi(t) = \det \Psi(\tau)\, e^{\int_\tau^t \operatorname{tr} A(s)\,ds}, \quad t \in T.$$
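Proof. Recall that if $C = [c_{ij}]$ is an $(n \times n)$ matrix, then $\operatorname{tr} C = \sum_{i=1}^{n} c_{ii}$. Let $\Psi = [\psi_{ij}]$ and $A(t) = [a_{ij}(t)]$. Then
$$\dot{\psi}_{ij} = \sum_{k=1}^{n} a_{ik}(t)\,\psi_{kj}. \tag{4.11.38}$$
Now
$$\frac{d}{dt}(\det \Psi) =
\begin{vmatrix} \dot{\psi}_{11} & \dot{\psi}_{12} & \cdots & \dot{\psi}_{1n} \\ \psi_{21} & \psi_{22} & \cdots & \psi_{2n} \\ \vdots & \vdots & & \vdots \\ \psi_{n1} & \psi_{n2} & \cdots & \psi_{nn} \end{vmatrix}
+
\begin{vmatrix} \psi_{11} & \psi_{12} & \cdots & \psi_{1n} \\ \dot{\psi}_{21} & \dot{\psi}_{22} & \cdots & \dot{\psi}_{2n} \\ \vdots & \vdots & & \vdots \\ \psi_{n1} & \psi_{n2} & \cdots & \psi_{nn} \end{vmatrix}
+ \cdots +
\begin{vmatrix} \psi_{11} & \psi_{12} & \cdots & \psi_{1n} \\ \psi_{21} & \psi_{22} & \cdots & \psi_{2n} \\ \vdots & \vdots & & \vdots \\ \dot{\psi}_{n1} & \dot{\psi}_{n2} & \cdots & \dot{\psi}_{nn} \end{vmatrix}. \tag{4.11.39}$$
Also,
$$\begin{vmatrix} \dot{\psi}_{11} & \cdots & \dot{\psi}_{1n} \\ \psi_{21} & \cdots & \psi_{2n} \\ \vdots & & \vdots \\ \psi_{n1} & \cdots & \psi_{nn} \end{vmatrix}
=
\begin{vmatrix} \sum_{k=1}^{n} a_{1k}(t)\psi_{k1} & \cdots & \sum_{k=1}^{n} a_{1k}(t)\psi_{kn} \\ \psi_{21} & \cdots & \psi_{2n} \\ \vdots & & \vdots \\ \psi_{n1} & \cdots & \psi_{nn} \end{vmatrix}.$$
The last determinant is unchanged if we subtract from the first row $a_{12}$ times the second row plus $a_{13}$ times the third row up to $a_{1n}$ times the nth row. This yields
$$\begin{vmatrix} a_{11}(t)\psi_{11} & a_{11}(t)\psi_{12} & \cdots & a_{11}(t)\psi_{1n} \\ \psi_{21} & \psi_{22} & \cdots & \psi_{2n} \\ \vdots & \vdots & & \vdots \\ \psi_{n1} & \psi_{n2} & \cdots & \psi_{nn} \end{vmatrix} = a_{11}(t) \det \Psi.$$
Repeating the above procedure for the remaining determinants we get
$$\frac{d}{dt}[\det \Psi(t)] = a_{11}(t)\det \Psi(t) + a_{22}(t)\det \Psi(t) + \cdots + a_{nn}(t)\det \Psi(t) = [\operatorname{tr} A(t)]\det \Psi(t)$$
for all $t \in T$. This now implies
$$\det \Psi(t) = \det \Psi(\tau)\, e^{\int_\tau^t \operatorname{tr} A(s)\,ds}$$
for all $t \in T$. ∎
4.11.40. Exercise. Verify Eq. (4.11.39).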
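The determinant formula of Theorem 4.11.37 can be checked numerically. The following is a minimal sketch, not part of the original text: the coefficient matrix `a_matrix`, the fixed-step Runge-Kutta integrator, and all function names are assumptions chosen for illustration. With $\operatorname{tr} A(t) = -1/t$, $\tau = 1$ and $\Psi(1) = I$, the theorem predicts $\det \Psi(2) = e^{-\ln 2} = 1/2$.

```python
import math

def a_matrix(t):
    # Hypothetical coefficient matrix A(t), continuous for t > 0; tr A(t) = -1/t.
    return [[0.0, 1.0], [1.0 / t**2, -1.0 / t]]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_axpy(x, alpha, y):
    # Return x + alpha * y, entry by entry.
    return [[xi + alpha * yi for xi, yi in zip(rx, ry)] for rx, ry in zip(x, y)]

def solve_matrix_ode(psi0, tau, t_end, steps=2000):
    # Classical fourth-order Runge-Kutta applied to Psi' = A(t) Psi.
    h = (t_end - tau) / steps
    psi, t = psi0, tau
    for _ in range(steps):
        k1 = mat_mul(a_matrix(t), psi)
        k2 = mat_mul(a_matrix(t + h / 2), mat_axpy(psi, h / 2, k1))
        k3 = mat_mul(a_matrix(t + h / 2), mat_axpy(psi, h / 2, k2))
        k4 = mat_mul(a_matrix(t + h), mat_axpy(psi, h, k3))
        incr = mat_axpy(mat_axpy(k1, 2.0, k2), 1.0, mat_axpy(k4, 2.0, k3))
        psi = mat_axpy(psi, h / 6, incr)
        t += h
    return psi

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

identity = [[1.0, 0.0], [0.0, 1.0]]
psi_end = solve_matrix_ode(identity, 1.0, 2.0)
predicted = det2(identity) * math.exp(-math.log(2.0))  # exp of integral of tr A over [1, 2]
print(det2(psi_end), predicted)
```

The two printed values should agree up to the integration error of the scheme.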
We now prove:
4.11.41. Theorem. A solution $\Psi$ of the matrix equation (4.11.36) is a fundamental matrix for Eq. (4.11.30) if and only if $\det \Psi(t) \neq 0$ for all $t \in T$.
Proof. Assume that $\Psi = [\psi_1 \,|\, \psi_2 \,|\, \cdots \,|\, \psi_n]$ is a fundamental matrix for Eq. (4.11.30), and let $\varphi$ be a nontrivial solution for (4.11.30). By Theorem 4.11.32 there exist unique scalars $\alpha_1, \ldots, \alpha_n \in F$, not all zero, such that
$$\varphi = \sum_{j=1}^{n} \alpha_j \psi_j,$$
or
$$\varphi = \Psi a, \tag{4.11.42}$$
where $a^T = (\alpha_1, \ldots, \alpha_n)$. Equation (4.11.42) constitutes a system of n linear equations with unknowns $\alpha_1, \ldots, \alpha_n$ at any $\tau \in T$ and has a unique solution for any choice of $\varphi(\tau)$. Hence, we have $\det \Psi(\tau) \neq 0$, and it now follows from Theorem 4.11.37 that $\det \Psi(t) \neq 0$ for any $t \in T$.
Conversely, let $\Psi$ be a solution of the matrix equation (4.11.36) and assume that $\det \Psi(t) \neq 0$ for all $t \in T$. Then the columns of $\Psi$ are linearly independent for all $t \in T$. ∎
The reader can readily prove the next result.
4.11.43. Theorem. Let $\Psi$ be a fundamental matrix for Eq. (4.11.30), and let $C$ be an arbitrary $(n \times n)$ non-singular constant matrix. Then $\Psi C$ is also a fundamental matrix for Eq. (4.11.30). Moreover, if $\Gamma$ is any other fundamental matrix for Eq. (4.11.30), then there exists a constant $(n \times n)$ non-singular matrix $P$ such that $\Gamma = \Psi P$.
4.11.44. Exercise. Prove Theorem 4.11.43.
Now let $R(t) = [r_{ij}(t)]$ be an arbitrary matrix such that the scalar valued functions $r_{ij}(t)$ are Riemann integrable on $T$. We define integration of $R(t)$ componentwise, i.e.,
$$\int R(t)\,dt = \int [r_{ij}(t)]\,dt = \left[\int r_{ij}(t)\,dt\right].$$
Integration of vectors is defined similarly. In the next result we establish some of the properties of the state transition matrix, $\Phi$. Hereafter, in order to indicate the dependence of $\Phi$ on $\tau$ as well as $t$, we will write $\Phi(t, \tau)$. By $\dot{\Phi}(t, \tau)$, we mean $\partial \Phi(t, \tau)/\partial t$.
4.11.45. Theorem. Let $D$ be defined by Eq. (4.11.28), let $\tau \in T$, let $\varphi(\tau) = \xi$, let $(\tau, \xi) \in D$, and let $\Phi(t, \tau)$ denote the state transition matrix for Eq. (4.11.30) for all $t \in T$. Then
(i) $\dot{\Phi}(t, \tau) = A(t)\Phi(t, \tau)$ with $\Phi(\tau, \tau) = I$, where $I$ denotes the $(n \times n)$ identity matrix;
(ii) the unique solution of Eq. (4.11.30) is given by
$$\varphi(t) = \Phi(t, \tau)\xi \tag{4.11.46}$$
for all $t \in T$;
(iii) $\Phi(t, \tau)$ is non-singular for all $t \in T$;
(iv) for any $t, \sigma \in T$ we have
$$\Phi(t, \tau) = \Phi(t, \sigma)\Phi(\sigma, \tau);$$
(v) $[\Phi(t, \tau)]^{-1} \triangleq \Phi^{-1}(t, \tau) = \Phi(\tau, t)$ for all $t \in T$; and
(vi) the unique solution of Eq. (4.11.29) is given by
$$\varphi(t) = \Phi(t, \tau)\xi + \int_\tau^t \Phi(t, \eta)\,v(\eta)\,d\eta. \tag{4.11.47}$$
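Parts (iv) to (vi) of Theorem 4.11.45 can be illustrated on a scalar instance. The sketch below is an illustration under stated assumptions, not the text's own example: it takes $n = 1$ and $A(t) = -1/t$ on $t > 0$, for which the state transition "matrix" is $\Phi(t, \tau) = \tau/t$; the forcing term `v` and the Simpson quadrature helper are hypothetical choices.

```python
def phi_transition(t, tau):
    # State transition "matrix" of the scalar equation x' = -(1/t) x, t > 0.
    return tau / t

def simpson(g, a, b, n=200):
    # Composite Simpson rule with an even number n of subintervals.
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def solution(t, tau, xi, v):
    # Eq. (4.11.47): Phi(t, tau) xi + integral over [tau, t] of Phi(t, eta) v(eta).
    forced = simpson(lambda eta: phi_transition(t, eta) * v(eta), tau, t)
    return phi_transition(t, tau) * xi + forced

# With v = 1, tau = 1, xi = 2 the closed form is 2/t + (t^2 - 1)/(2t).
value = solution(2.0, 1.0, 2.0, lambda s: 1.0)
closed_form = 2.0 / 2.0 + (2.0**2 - 1.0) / (2.0 * 2.0)
print(value, closed_form)
```

In this instance the composition rule (iv) reads $\tau/t = (\sigma/t)(\tau/\sigma)$ and the inverse rule (v) reads $(\tau/t)^{-1} = t/\tau$, both of which can be confirmed by calling `phi_transition` directly.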
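Proof. The first part of the theorem follows from the definition of the state transition matrix.
To prove the second part, assume that $\varphi(t) = \Phi(t, \tau)\xi$. Differentiating with respect to $t$ we have
$$\dot{\varphi}(t) = \dot{\Phi}(t, \tau)\xi = A(t)\Phi(t, \tau)\xi = A(t)\varphi(t).$$
Moreover, $\varphi(\tau) = \Phi(\tau, \tau)\xi = \xi$.
To prove the fourth part, note that for any $t, \sigma \in T$ the uniqueness of solutions gives
$$\varphi(t) = \Phi(t, \tau)\xi = \Phi(t, \sigma)\varphi(\sigma) = \Phi(t, \sigma)\Phi(\sigma, \tau)\xi.$$
Since this equation holds for arbitrary $\xi$ in the $x$ space, we have
$$\Phi(t, \tau) = \Phi(t, \sigma)\Phi(\sigma, \tau).$$
To prove the fifth part of the theorem we note that $\Phi^{-1}(t, \tau)$ exists by part (iii). From part (iv) it now follows that
$$I = \Phi(t, \tau)\Phi(\tau, t),$$
where $I$ denotes the $(n \times n)$ identity matrix. Thus,
$$\Phi^{-1}(t, \tau) = \Phi(\tau, t)$$
for all $t \in T$.
In the next chapter we will show that under the present assumptions, Eq. (4.11.29) possesses a unique solution for every $(\tau, \xi) \in D$, where $\varphi(\tau) = \xi$. Thus, to prove the last part of the theorem, we must show that the function (4.11.47) is this solution. Differentiating with respect to $t$ we have
$$\begin{aligned}
\dot{\varphi}(t) &= \dot{\Phi}(t, \tau)\xi + \Phi(t, t)v(t) + \int_\tau^t \dot{\Phi}(t, \eta)v(\eta)\,d\eta \\
&= A(t)\Phi(t, \tau)\xi + v(t) + \int_\tau^t A(t)\Phi(t, \eta)v(\eta)\,d\eta \\
&= A(t)\left[\Phi(t, \tau)\xi + \int_\tau^t \Phi(t, \eta)v(\eta)\,d\eta\right] + v(t) \\
&= A(t)\varphi(t) + v(t).
\end{aligned}$$
Also, $\varphi(\tau) = \xi$. Therefore, $\varphi$ is the unique solution of Eq. (4.11.29). ∎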
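$\lambda_1, \ldots, \lambda_k, \lambda_{k+1}, \ldots, \lambda_{k+p}$ denote the (not necessarily distinct) eigenvalues of $A$. Show that
$$e^{Jt} = \begin{bmatrix} e^{J_0 t} & & & 0 \\ & e^{J_1 t} & & \\ & & \ddots & \\ 0 & & & e^{J_p t} \end{bmatrix},$$
where
$$e^{J_0 t} = \begin{bmatrix} e^{\lambda_1 t} & & 0 \\ & \ddots & \\ 0 & & e^{\lambda_k t} \end{bmatrix}$$
and
$$e^{J_m t} = e^{\lambda_{k+m} t} \begin{bmatrix} 1 & t & \dfrac{t^2}{2!} & \cdots & \dfrac{t^{\nu_m - 1}}{(\nu_m - 1)!} \\ 0 & 1 & t & \cdots & \dfrac{t^{\nu_m - 2}}{(\nu_m - 2)!} \\ \vdots & & & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix},$$
where $J_m$ is a $\nu_m \times \nu_m$ matrix and $k + \nu_1 + \cdots + \nu_p = n$.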
Next, we consider initial-value problems characterized by linear nth-order ordinary differential equations given by
$$a_n(t)x^{(n)} + a_{n-1}(t)x^{(n-1)} + \cdots + a_1(t)x^{(1)} + a_0(t)x = v(t), \tag{4.11.59}$$
$$a_n(t)x^{(n)} + a_{n-1}(t)x^{(n-1)} + \cdots + a_1(t)x^{(1)} + a_0(t)x = 0, \tag{4.11.60}$$
and
$$a_n x^{(n)} + a_{n-1}x^{(n-1)} + \cdots + a_1 x^{(1)} + a_0 x = 0. \tag{4.11.61}$$
In Eqs. (4.11.59) and (4.11.60), $v(t)$ and $a_i(t)$, $i = 0, \ldots, n$, are functions which are defined and continuous on a real $t$ interval $T$, and in Eq. (4.11.61), the $a_i$, $i = 0, \ldots, n$, are constant coefficients. We assume that $a_n \neq 0$, that $a_n(t) \neq 0$ for any $t \in T$, and that $v(t)$ is not identically zero. Furthermore, the coefficients $a_i$, $a_i(t)$, $i = 0, \ldots, n$, may be either real or complex.
In accordance with Eq. (4.11.23), we can reduce the study of Eq. (4.11.60) to the study of the system of n first-order ordinary differential equations
$$\dot{x} = A(t)x, \tag{4.11.62}$$
where
$$A(t) = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -\dfrac{a_0(t)}{a_n(t)} & -\dfrac{a_1(t)}{a_n(t)} & -\dfrac{a_2(t)}{a_n(t)} & \cdots & -\dfrac{a_{n-1}(t)}{a_n(t)} \end{bmatrix}. \tag{4.11.63}$$
In this case the matrix $A(t)$ is said to be in companion form. Since $A(t)$ is continuous on $T$, there exists for all $t \in T$ a unique solution $\varphi$ to the initial-value problem
$$\dot{x} = A(t)x, \qquad x(\tau) = \xi = (\xi_1, \ldots, \xi_n)^T, \tag{4.11.64}$$
where $\tau \in T$ and $\xi \in R^n$ (or $C^n$) (this will be proved in the next chapter). Moreover, the first component of $\varphi$, $\varphi_1$, is the solution of Eq. (4.11.60) satisfying
$$\varphi_1(\tau) = \xi_1, \quad \varphi_1^{(1)}(\tau) = \xi_2, \quad \ldots, \quad \varphi_1^{(n-1)}(\tau) = \xi_n.$$
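The passage from the scalar equation (4.11.60) to the companion matrix (4.11.63) is mechanical, and can be sketched as a small routine. The function name `companion` and the convention of passing the coefficients already evaluated at a fixed $t$ are assumptions made for this illustration only.

```python
def companion(coeffs):
    # coeffs = [a_0, a_1, ..., a_n], the coefficients of Eq. (4.11.60)
    # evaluated at one fixed t; a_n must be nonzero.
    *lower, a_n = coeffs
    if a_n == 0:
        raise ValueError("leading coefficient a_n must be nonzero")
    n = len(lower)
    # First n - 1 rows: ones on the superdiagonal, zeros elsewhere.
    rows = [[1.0 if j == i + 1 else 0.0 for j in range(n)] for i in range(n - 1)]
    # Last row: -a_0/a_n, ..., -a_{n-1}/a_n.
    rows.append([-a / a_n for a in lower])
    return rows

# t^2 x'' + t x' - x = 0 at t = 2, i.e. a_0 = -1, a_1 = 2, a_2 = 4.
print(companion([-1.0, 2.0, 4.0]))
```

For a time-varying equation one would call the routine once per value of $t$ at which $A(t)$ is needed.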
Now let $\psi_1, \ldots, \psi_n$ be solutions of Eq. (4.11.60). Then we can readily verify that the matrix
$$\Psi = \begin{bmatrix} \psi_1 & \psi_2 & \cdots & \psi_n \\ \psi_1^{(1)} & \psi_2^{(1)} & \cdots & \psi_n^{(1)} \\ \vdots & \vdots & & \vdots \\ \psi_1^{(n-1)} & \psi_2^{(n-1)} & \cdots & \psi_n^{(n-1)} \end{bmatrix} \tag{4.11.65}$$
is a solution of the matrix equation
$$\dot{\Psi} = A(t)\Psi, \tag{4.11.66}$$
where $A(t)$ is defined by Eq. (4.11.63). We call the determinant of $\Psi$ the Wronskian of Eq. (4.11.60) with respect to the solutions $\psi_1, \ldots, \psi_n$, and we denote it by
$$\det \Psi = W(\psi_1, \ldots, \psi_n). \tag{4.11.67}$$
Note that for a fixed set of solutions $\psi_1, \ldots, \psi_n$ (and considering $\tau$ fixed), the Wronskian is a function of $t$. To indicate this, we write $W(\psi_1, \ldots, \psi_n)(t)$.
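In view of Theorem 4.11.37 we have for all $t \in T$,
$$W(\psi_1, \ldots, \psi_n)(t) = \det \Psi(t) = \det \Psi(\tau)\, e^{\int_\tau^t \operatorname{tr} A(s)\,ds} = W(\psi_1, \ldots, \psi_n)(\tau)\, e^{-\int_\tau^t [a_{n-1}(s)/a_n(s)]\,ds}. \tag{4.11.68}$$
4.11.69. Example. Consider the second-order ordinary differential equation
$$t^2 x^{(2)} + t x^{(1)} - x = 0, \quad t > 0. \tag{4.11.70}$$
The functions $\psi_1(t) = t$ and $\psi_2(t) = 1/t$ are clearly solutions of Eq. (4.11.70). Consider now the matrix
$$\Psi(t) = \begin{bmatrix} t & 1/t \\ 1 & -1/t^2 \end{bmatrix}.$$
Then
$$W(\psi_1, \psi_2)(t) = \det \Psi(t) = -\frac{2}{t}, \quad t > 0.$$
Using Eq. (4.11.68), we obtain
$$W(\psi_1, \psi_2)(t) = W(\psi_1, \psi_2)(\tau)\, e^{-\int_\tau^t [a_1(s)/a_2(s)]\,ds} = -\frac{2}{\tau}\, e^{\ln(\tau/t)} = -\frac{2}{t}, \quad t > 0,$$
which checks. ∎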
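The check carried out in Example 4.11.69 can be repeated numerically. This is a small sketch under the example's own data ($\psi_1(t) = t$, $\psi_2(t) = 1/t$, $a_1(s)/a_2(s) = 1/s$); the function names are hypothetical.

```python
import math

def wronskian(t):
    # det [[psi1, psi2], [psi1', psi2']] with psi1(t) = t and psi2(t) = 1/t.
    return t * (-1.0 / t**2) - (1.0 / t) * 1.0

def wronskian_from_formula(t, tau):
    # Eq. (4.11.68): W(tau) * exp(-integral of ds/s over [tau, t]) = W(tau) * tau / t.
    return wronskian(tau) * math.exp(-math.log(t / tau))

for t in (0.5, 1.0, 3.0):
    print(t, wronskian(t), wronskian_from_formula(t, 2.0))
```

Both columns should reproduce $-2/t$ regardless of the base point $\tau$.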
The reader will have no difficulty in proving the following:
4.11.71. Theorem. A set of n solutions of Eq. (4.11.60), $\psi_1, \ldots, \psi_n$, is linearly independent on a $t$ interval $T$ if and only if $W(\psi_1, \ldots, \psi_n)(t) \neq 0$ for all $t \in T$. Moreover, every solution of Eq. (4.11.60) is a linear combination of any set of n linearly independent solutions.
4.11.72. Exercise. Prove Theorem 4.11.71.
We call a set of n solutions of Eq. (4.11.60), $\psi_1, \ldots, \psi_n$, which is linearly independent on $T$ a fundamental set for Eq. (4.11.60).
Let us next turn our attention to the non-homogeneous linear nth-order ordinary differential equation (4.11.59). Without loss of generality, let us assume that $a_n(t) = 1$ for all $t \in T$; i.e., let us consider
$$x^{(n)} + a_{n-1}(t)x^{(n-1)} + \cdots + a_1(t)x^{(1)} + a_0(t)x = v(t). \tag{4.11.73}$$
The study of this equation reduces to the study of the system of n first-order ordinary differential equations
$$\dot{x} = A(t)x + b(t), \tag{4.11.74}$$
where
$$A(t) = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_0(t) & -a_1(t) & -a_2(t) & \cdots & -a_{n-1}(t) \end{bmatrix}, \qquad b(t) = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ v(t) \end{bmatrix}. \tag{4.11.75}$$
In the next chapter we will show that for all $t \in T$ there exists a unique solution $\varphi$ to the initial-value problem
$$\dot{x} = A(t)x + b(t), \qquad x(\tau) = \xi = (\xi_1, \ldots, \xi_n)^T, \tag{4.11.76}$$
where $\tau \in T$ and $\xi \in R^n$ (or $C^n$). The first component of $\varphi$, $\varphi_1$, is the solution of Eq. (4.11.59), with $a_n(t) = 1$ for all $t \in T$, satisfying
$$\varphi_1(\tau) = \xi_1, \quad \varphi_1^{(1)}(\tau) = \xi_2, \quad \ldots, \quad \varphi_1^{(n-1)}(\tau) = \xi_n.$$
We now have:
4.11.77. Theorem. Let $\{\psi_1, \ldots, \psi_n\}$ be a fundamental set for the equation
$$x^{(n)} + a_{n-1}(t)x^{(n-1)} + \cdots + a_1(t)x^{(1)} + a_0(t)x = 0. \tag{4.11.78}$$
Then the solution $\varphi$ of the equation
$$x^{(n)} + a_{n-1}(t)x^{(n-1)} + \cdots + a_1(t)x^{(1)} + a_0(t)x = v(t), \tag{4.11.79}$$
satisfying $(\varphi(\tau), \varphi^{(1)}(\tau), \ldots, \varphi^{(n-1)}(\tau))^T = \xi = (\xi_1, \ldots, \xi_n)^T$, $\tau \in T$, $\xi \in R^n$ (or $C^n$), is given by the expression
$$\varphi(t) = \varphi_h(t) + \sum_{i=1}^{n} \psi_i(t) \int_\tau^t \left\{ \frac{W_i(\psi_1, \ldots, \psi_n)(s)}{W(\psi_1, \ldots, \psi_n)(s)} \right\} v(s)\,ds, \tag{4.11.80}$$
where $\varphi_h$ is the solution of Eq. (4.11.78) satisfying the same initial conditions at $\tau$, and where $W_i(\psi_1, \ldots, \psi_n)(t)$ is obtained from $W(\psi_1, \ldots, \psi_n)(t)$ by replacing the ith column of $W(\psi_1, \ldots, \psi_n)(t)$ by $(0, 0, \ldots, 1)^T$.
4.11.81. Exercise. Prove Theorem 4.11.77.
Let us consider a specific case.
4.11.82. Example. Consider the second-order ordinary differential equation
$$t^2 x^{(2)} + t x^{(1)} - x = b(t), \quad t > 0, \tag{4.11.83}$$
where $b(t)$ is a real continuous function for all $t > 0$. This equation is equivalent to
$$x^{(2)} + \frac{1}{t}x^{(1)} - \frac{1}{t^2}x = v(t), \tag{4.11.84}$$
where $v(t) = b(t)/t^2$. From Example 4.11.69 we have $\psi_1(t) = t$, $\psi_2(t) = 1/t$, and $W(\psi_1, \psi_2)(t) = -2/t$. Also,
$$W_1(\psi_1, \psi_2)(t) = \begin{vmatrix} 0 & 1/t \\ 1 & -1/t^2 \end{vmatrix} = -\frac{1}{t}, \qquad W_2(\psi_1, \psi_2)(t) = \begin{vmatrix} t & 0 \\ 1 & 1 \end{vmatrix} = t.$$
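The integral term of Eq. (4.11.80) for this example can be evaluated numerically. The sketch below is an illustration, not the text's own computation: it assumes the forcing $v(s) = 1$ (that is, $b(t) = t^2$) and $\tau = 1$, uses a hypothetical Simpson quadrature helper, and compares against the closed form $t^2/3 - t/2 + 1/(6t)$, which one can verify by substitution into Eq. (4.11.83).

```python
def simpson(g, a, b, n=400):
    # Composite Simpson rule with an even number n of subintervals.
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def particular(t, tau, v):
    # psi1 = t, psi2 = 1/t, W = -2/s, W1 = -1/s, W2 = s, as in the example.
    i1 = simpson(lambda s: ((-1.0 / s) / (-2.0 / s)) * v(s), tau, t)
    i2 = simpson(lambda s: (s / (-2.0 / s)) * v(s), tau, t)
    return t * i1 + (1.0 / t) * i2

def closed_form(t):
    # Valid only for v = 1 and tau = 1 (assumed data for this check).
    return t**2 / 3.0 - t / 2.0 + 1.0 / (6.0 * t)

print(particular(2.0, 1.0, lambda s: 1.0), closed_form(2.0))
```

The full solution is then this term plus the homogeneous solution $\varphi_h$ that carries the initial data.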
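Let us next focus our attention on linear nth-order ordinary differential equations with constant coefficients. Without loss of generality, let us assume that, in Eq. (4.11.61), $a_n = 1$. We have
$$x^{(n)} + a_{n-1}x^{(n-1)} + \cdots + a_1 x^{(1)} + a_0 x = 0. \tag{4.11.85}$$
We call the algebraic equation
$$p(\lambda) = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0 = 0 \tag{4.11.86}$$
the characteristic equation of the differential equation (4.11.85). As was done before, we see that the study of Eq. (4.11.85) reduces to the study of the system of first-order ordinary differential equations given by
$$\dot{x} = Ax, \tag{4.11.87}$$
where
$$A = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{n-1} \end{bmatrix}. \tag{4.11.88}$$
We now show that the eigenvalues of matrix $A$ of Eq. (4.11.88) are precisely the roots of the characteristic equation (4.11.86). First we consider
$$\det(A - \lambda I) = \begin{vmatrix} -\lambda & 1 & 0 & \cdots & 0 & 0 \\ 0 & -\lambda & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & -\lambda & 1 \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{n-2} & -\lambda - a_{n-1} \end{vmatrix}$$
$$= -\lambda \begin{vmatrix} -\lambda & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \cdots & -\lambda & 1 \\ -a_1 & -a_2 & \cdots & -a_{n-2} & -\lambda - a_{n-1} \end{vmatrix} + (-1)^{n+1}(-a_0) \begin{vmatrix} 1 & 0 & \cdots & 0 \\ -\lambda & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{vmatrix}.$$
Using induction we arrive at the expression
$$\det(A - \lambda I) = (-1)^n\{\lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0\}. \tag{4.11.89}$$
It follows from Eq. (4.11.89) that $\lambda$ is an eigenvalue of $A$ if and only if $\lambda$ is a root of the characteristic equation (4.11.86).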
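Identity (4.11.89) can be spot-checked for a small case. The following sketch is illustrative only: the sample coefficients (chosen so that $p(\lambda) = (\lambda - 1)(\lambda - 2)(\lambda - 3)$), the cofactor-expansion determinant, and the function names are assumptions introduced here.

```python
def det(m):
    # Laplace expansion along the first row; adequate for small matrices.
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def companion_const(a):
    # a = [a_0, ..., a_{n-1}] for x^(n) + a_{n-1} x^(n-1) + ... + a_0 x = 0.
    n = len(a)
    rows = [[1.0 if j == i + 1 else 0.0 for j in range(n)] for i in range(n - 1)]
    rows.append([-c for c in a])
    return rows

def char_poly(a, lam):
    # p(lam) = lam^n + a_{n-1} lam^(n-1) + ... + a_1 lam + a_0.
    return lam ** len(a) + sum(c * lam ** k for k, c in enumerate(a))

def det_a_minus_lambda(a, lam):
    A = companion_const(a)
    n = len(a)
    return det([[A[i][j] - (lam if i == j else 0.0) for j in range(n)]
                for i in range(n)])

a = [-6.0, 11.0, -6.0]  # p(lam) = lam^3 - 6 lam^2 + 11 lam - 6
for lam in (0.5, 2.0, -1.0):
    print(lam, det_a_minus_lambda(a, lam), (-1) ** len(a) * char_poly(a, lam))
```

At the roots $\lambda = 1, 2, 3$ both sides vanish, which is exactly the statement that the eigenvalues of $A$ are the roots of the characteristic equation.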
4.11.90. Exercise. Assume that the eigenvalues of matrix $A$ given in Eq. (4.11.88) are all real and distinct. Let $\Lambda$ denote the diagonal matrix
$$\Lambda = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}, \tag{4.11.91}$$
where $\lambda_1, \ldots, \lambda_n$ denote the eigenvalues of matrix $A$. Let $V$ denote the Vandermonde matrix given by
$$V = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ \lambda_1 & \lambda_2 & \cdots & \lambda_n \\ \vdots & \vdots & & \vdots \\ \lambda_1^{n-1} & \lambda_2^{n-1} & \cdots & \lambda_n^{n-1} \end{bmatrix}.$$
(a) Show that $V$ is non-singular.
(b) Show that $\Lambda = V^{-1}AV$.
Before closing the present section, let us consider so-called "adjoint systems." To this end let us consider once more Eq. (4.11.30); i.e.,
$$\dot{x} = A(t)x. \tag{4.11.92}$$
Let $A^*(t)$ denote the conjugate transpose of $A(t)$. (That is, if $A(t) = [a_{ij}(t)]$, then $A^*(t) = [\bar{a}_{ij}(t)]^T = [\bar{a}_{ji}(t)]$, where $\bar{a}_{ij}(t)$ denotes the complex conjugate of $a_{ij}(t)$.) We call the system of linear first-order ordinary differential equations
$$\dot{y} = -A^*(t)y \tag{4.11.93}$$
the adjoint system to (4.11.92).
4.11.94. Exercise. Let $\Psi$ be a fundamental matrix of Eq. (4.11.92). Show that $\Gamma$ is a fundamental matrix for Eq. (4.11.93) if and only if
$$\Gamma^*\Psi = C,$$
where $C$ is a constant non-singular matrix, and where $\Gamma^*$ denotes the conjugate transpose of $\Gamma$.
It is also possible to consider adjoint equations for linear nth-order ordinary differential equations. Let us for example consider Eq. (4.11.85), the study of which can be reduced to that of Eq. (4.11.87), with $A$ specified by Eq. (4.11.88). Now consider the adjoint system to Eq. (4.11.87), given by
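$$\dot{y} = -A^*y, \tag{4.11.95}$$
where
$$-A^* = \begin{bmatrix} 0 & 0 & \cdots & 0 & \bar{a}_0 \\ -1 & 0 & \cdots & 0 & \bar{a}_1 \\ 0 & -1 & \cdots & 0 & \bar{a}_2 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \cdots & -1 & \bar{a}_{n-1} \end{bmatrix}, \tag{4.11.96}$$
where $\bar{a}_i$ denotes the complex conjugate of $a_i$, $i = 0, \ldots, n - 1$. Equation (4.11.95) represents the system of equations
$$\begin{aligned}
\dot{y}_1 &= \bar{a}_0 y_n, \\
\dot{y}_2 &= -y_1 + \bar{a}_1 y_n, \\
&\;\;\vdots \\
\dot{y}_n &= -y_{n-1} + \bar{a}_{n-1} y_n.
\end{aligned} \tag{4.11.97}$$
Differentiating the last expression in Eq. (4.11.97) $(n - 1)$ times, eliminating $y_1, \ldots, y_{n-1}$, and letting $y_n = y$, we obtain
$$(-1)^n y^{(n)} + (-1)^{n-1}\bar{a}_{n-1}y^{(n-1)} + \cdots + (-1)\bar{a}_1 y^{(1)} + \bar{a}_0 y = 0. \tag{4.11.98}$$
Equation (4.11.98) is called the adjoint of Eq. (4.11.85).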
4.12. NOTES AND REFERENCES
There are many excellent texts on finite-dimensional vector spaces and matrices that can be used to supplement this chapter (see e.g., [4.1], [4.2], [4.4], and [4.6]-[4.10]). References [4.1], [4.2], [4.6], and [4.10] include applications. (In particular, consult the references in [4.10] for a list of diversified areas of applications.) Excellent references on ordinary differential equations include [4.3], [4.5], and [4.11].
REFERENCES
[4.1] N. R. AMUNDSON, Mathematical Methods in Chemical Engineering: Matrices and Their Applications. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1966.
[4.2] R. E. BELLMAN, Introduction to Matrix Algebra. New York: McGraw-Hill Book Company, Inc., 1970.
[4.3] F. BRAUER and J. A. NOHEL, Qualitative Theory of Ordinary Differential Equations: An Introduction. New York: W. A. Benjamin, Inc., 1969.*
[4.4] E. T. BROWNE, Introduction to the Theory of Determinants and Matrices. Chapel Hill, N.C.: The University of North Carolina Press, 1958.
[4.5] E. A. CODDINGTON and N. LEVINSON, Theory of Ordinary Differential Equations. New York: McGraw-Hill Book Company, Inc., 1955.
[4.6] F. R. GANTMACHER, Theory of Matrices. Vols. I, II. New York: Chelsea Publishing Company, 1959.
[4.7] P. R. HALMOS, Finite Dimensional Vector Spaces. Princeton, N.J.: D. Van Nostrand Company, Inc., 1958.
[4.8] K. HOFFMAN and R. KUNZE, Linear Algebra. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1961.
[4.9] S. LIPSCHUTZ, Linear Algebra. New York: McGraw-Hill Book Company, 1968.
[4.10] B. NOBLE, Applied Linear Algebra. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1969.
[4.11] L. S. PONTRYAGIN, Ordinary Differential Equations. Reading, Mass.: Addison-Wesley Publishing Co., Inc., 1962.
*Reprinted by Dover Publications, Inc., New York, 1989.
METRIC SPACES
Up to this point in our development we have concerned ourselves primarily with algebraic structure of mathematical systems. In the present chapter we focus our attention on topological structure. In doing so, we introduce the concepts of "distance" and "closeness." In the final two chapters we will consider mathematical systems endowed with algebraic as well as topological structure. A generalization of the concept of "distance" is the notion of metric. Using the terminology from geometry, we will refer to elements of an arbitrary set $X$ as points and we will characterize metric as a real-valued, non-negative function on $X \times X$ satisfying the properties of "distance" between two points of $X$. We will refer to a mathematical system consisting of a basic set $X$ and a metric defined on it as a metric space. We emphasize that in the present chapter the underlying space $X$ need not be a linear space. In the first nine sections of the present chapter we establish several basic facts from the theory of metric spaces, while in the last section of the present chapter, which consists of two parts, we consider some applications to the material of the present chapter.
5.1. DEFINITION OF METRIC SPACE
We begin with the following definition of metric and metric space.
5.1.1. Definition. Let $X$ be an arbitrary non-empty set, and let $\rho$ be a real-valued function on $X \times X$, i.e., $\rho: X \times X \to R$, where $\rho$ has the following properties:
(i) $\rho(x, y) \geq 0$ for all $x, y \in X$ and $\rho(x, y) = 0$ if and only if $x = y$;
(ii) $\rho(x, y) = \rho(y, x)$ for all $x, y \in X$; and
(iii) $\rho(x, y) \leq \rho(x, z) + \rho(z, y)$ for all $x, y, z \in X$.
The function $\rho$ is called a metric on $X$, and the mathematical system consisting of $\rho$ and $X$, $\{X; \rho\}$, is called a metric space. The set $X$ is often called the underlying set of the metric space, the elements of $X$ are often called points, and $\rho(x, y)$ is frequently called the distance from a point $x \in X$ to a point $y \in X$. In view of axiom (i) the distance between two different points is a unique positive number, and the distance is equal to zero if and only if the two points coincide. Axiom (ii) indicates that the distance between points $x$ and $y$ is equal to the distance between points $y$ and $x$. Axiom (iii) represents the well-known triangle inequality encountered, for example, in plane geometry. Clearly, if $\rho$ is a metric for $X$ and if $\alpha$ is any real positive number, then the function $\alpha\rho(x, y)$ is also a metric for $X$. We are thus in a position to define infinitely many metrics on $X$. The above definition of metric was motivated by our notion of distance. Our next result enables us to define metric in an equivalent (and often convenient) way.
5.1.2. Theorem. Let $\rho$: $X$ (i) $\rho(x, y)$ = (ii) $\rho(y, z)$
a.p for
(til1' 11') III
(5.2.3) follows trivially. Therefore, we assume that
7= = 0 and
lell
:U +
s:
P'/.q
0, then inequality
Ig(tW dt
0 for all t in that subinterval. eH nce,
1[
6
• Ix ( t) -
y(t) I' dt
1] /, > o.
Therefore, p,(x , y) = 0 if and only if x ( t) = y(t) for all t E a[ , b]. To show that the triangle inequality holds, let u, tJ, W E era, b], and let x = U - tJ and y = v - w. Then we have, from inequality (5.2.8),
=
< =
I{ • lu(t) I{• Iu(t) b • I{ lu(t) b
p,(u, w) =
b
p,(u, v)
+
w(tWdt v(t) +
} 1/,
v(t) -
}
v(tWdt
1/,
w(t) I' dt
+ • I{ b
} 1/,
Iv(t) -
w(tWdt
}
1/,
p,(v, w),
the triangle inequality. It now follows that e{ ra, bJ; p,} is a metric space. It is easy to see that this space is an unbounded metric space. _
5.3. Examples of Important Metric Spaces
5.3.14. Example. Let $C[a, b]$ be defined as in the preceding example. For $x, y \in C[a, b]$, let
$$\rho_\infty(x, y) = \sup_{a \leq t \leq b} |x(t) - y(t)|. \tag{5.3.15}$$
To show that $\{C[a, b]; \rho_\infty\}$ is a metric space we first note that $\rho_\infty(x, y) = \rho_\infty(y, x)$, that $\rho_\infty(x, y) \geq 0$ for all $x, y$, and that $\rho_\infty(x, y) = 0$ if and only if $x(t) = y(t)$ for all $t \in [a, b]$. To show that $\rho_\infty$ satisfies the triangle inequality we note that
$$\begin{aligned}
\rho_\infty(x, y) &= \sup_{a \leq t \leq b} |x(t) - y(t)| = \sup_{a \leq t \leq b} |x(t) - z(t) + z(t) - y(t)| \\
&\leq \sup_{a \leq t \leq b} |x(t) - z(t)| + \sup_{a \leq t \leq b} |z(t) - y(t)| = \rho_\infty(x, z) + \rho_\infty(z, y).
\end{aligned}$$
5.3.16. Figure B. Illustration of various metrics: $X = R$, $\rho(x, y) = |x - y|$; $X = R^2$, $\rho_\infty(x, y) = \max(|x_1 - y_1|, |x_2 - y_2|)$; $X = C[a, b]$, $\rho_\infty(x_1, x_2) = \sup_{a \leq t \leq b} |x_1(t) - x_2(t)|$.
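The metric $\rho_\infty$ of Eq. (5.3.15) can be approximated by sampling on a grid. The sketch below is a numerical illustration only: the grid size, the sample functions, and the name `rho_sup` are assumptions, and a finite grid gives a lower bound on the true supremum rather than its exact value in general.

```python
import math

def rho_sup(x, y, a, b, samples=1001):
    # Grid approximation of sup over [a, b] of |x(t) - y(t)|.
    grid = [a + (b - a) * k / (samples - 1) for k in range(samples)]
    return max(abs(x(t) - y(t)) for t in grid)

def identity_fn(t):
    return t

d_xy = rho_sup(math.sin, math.cos, 0.0, math.pi)
d_xz = rho_sup(math.sin, identity_fn, 0.0, math.pi)
d_zy = rho_sup(identity_fn, math.cos, 0.0, math.pi)
print(d_xy, d_xz + d_zy)  # triangle inequality: first value does not exceed the second
```

For $x = \sin$ and $y = \cos$ on $[0, \pi]$ the supremum is attained at $t = 3\pi/4$, which lies on this particular grid, so the sampled value agrees with $\sqrt{2}$ to rounding error.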
' +
[ p iY I '
Y2)],}I/"
= max p{ (z x u 2X ), PY(IY ' are metric spaces.
Z2)
Then Z { ; PI} and Z { ; P-J
The spaces Z { ; P,J and Z { ; P-J
Let $r > 0$ and let $x \in \bar{Y}$. Then there is a $y \in Y$ such that $y \in S(x; r)$. Since $Y \subset Z$, $y \in Z$ and thus $x$ is an adherent point of $Z$.
To prove the fourth part, note that $Y \subset Y \cup Z$ and $Z \subset Y \cup Z$. From part (iii) it now follows that $\bar{Y} \subset \overline{Y \cup Z}$ and $\bar{Z} \subset \overline{Y \cup Z}$. Thus, $\bar{Y} \cup \bar{Z} \subset \overline{Y \cup Z}$. To show that $\overline{Y \cup Z} \subset \bar{Y} \cup \bar{Z}$, let $x \in \overline{Y \cup Z}$ and suppose that $x \notin \bar{Y} \cup \bar{Z}$. Then there exist spheres $S(x; r_1)$ and $S(x; r_2)$ such that $S(x; r_1) \cap Y = \emptyset$ and $S(x; r_2) \cap Z = \emptyset$. Let $r = \min\{r_1, r_2\}$. Then $S(x; r) \cap [Y \cup Z] = \emptyset$. But this is impossible since $x \in \overline{Y \cup Z}$. Hence, $x \in \bar{Y} \cup \bar{Z}$, and thus $\overline{Y \cup Z} \subset \bar{Y} \cup \bar{Z}$.
The proof of the remainder of the theorem is left as an exercise. ∎
5.4.11. Exercise. Prove parts (v) and (vi) of Theorem 5.4.10.
We can further classify points and subsets of metric spaces.
5.4.12. Definition. Let $Y$ be a subset of $X$ and let $Y^{\sim}$ denote the complement of $Y$. A point $x \in X$ is called an interior point of the set $Y$ if there exists a sphere $S(x; r)$ such that $S(x; r) \subset Y$. The set of all interior points of set $Y$ is called the interior of $Y$ and is denoted by $Y^0$. A point $x \in X$ is an exterior point of $Y$ if it is an interior point of the complement of $Y$. The exterior of $Y$ is the set of all exterior points of set $Y$. The set of all points $x \in X$ such that $x \in \bar{Y} \cap \overline{(Y^{\sim})}$ is called the frontier of set $Y$. The boundary of a set $Y$ is the set of all points in the frontier of $Y$ which belong to $Y$.
5.4.13. Example. Let $\{R; \rho\}$ be the real line with the usual metric, and let $Y = \{y \in R: 0 < y \leq 1\} = (0, 1]$. The interior of $Y$ is the set $(0, 1) = \{y \in R: 0 < y < 1\}$. The exterior of $Y$ is the set $(-\infty, 0) \cup (1, +\infty)$, $\bar{Y} = \{y \in R: 0 \leq y \leq 1\} = [0, 1]$ and $\overline{(Y^{\sim})} = (-\infty, 0] \cup [1, +\infty)$. Thus, the frontier of $Y$ is the set $\{0, 1\}$, and the boundary of $Y$ is the singleton $\{1\}$. ∎
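The classification in Example 5.4.13 can be written down directly for a half-open interval on the real line. This is a small sketch restricted to sets of the form $(lo, hi]$ with the usual metric; the function names and string labels are hypothetical.

```python
def classify(x, lo=0.0, hi=1.0):
    # Y = (lo, hi] on the real line with the usual metric.
    if lo < x < hi:
        return "interior"   # some sphere (x - r, x + r) lies inside Y
    if x < lo or x > hi:
        return "exterior"   # some sphere around x lies inside the complement
    return "frontier"       # x == lo or x == hi: every sphere meets Y and its complement

def in_boundary(x, lo=0.0, hi=1.0):
    # Boundary in the sense of Definition 5.4.12: frontier points that belong to Y.
    return classify(x, lo, hi) == "frontier" and lo < x <= hi

for point in (-1.0, 0.0, 0.5, 1.0, 2.0):
    print(point, classify(point), in_boundary(point))
```

For $Y = (0, 1]$ the frontier points are 0 and 1, and only 1 belongs to the boundary, in agreement with the example.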
We now introduce the following important concepts.
5.4.14. Definition. A subset $Y$ of $X$ is said to be an open subset of $X$ if every point of $Y$ is an interior point of $Y$; i.e., $Y = Y^0$. A subset $Z$ of $X$ is said to be a closed subset of $X$ if $Z = \bar{Z}$.
When there is no room for confusion, we usually call $Y$ an open set and $Z$ a closed set. On occasions when we want to be very explicit, we will say that $Y$ is open relative to $\{X; \rho\}$ or with respect to $\{X; \rho\}$. In our next result we establish some of the important properties of open sets.
5.4.15. Theorem.
(i) $X$ and $\emptyset$ are open sets.
(ii) If $\{Y_\alpha\}_{\alpha \in A}$ is an arbitrary family of open subsets of $X$, then $\bigcup_{\alpha \in A} Y_\alpha$ is an open set.
(iii) The intersection of a finite number of open sets of $X$ is open.
Proof. To prove the first part, note that for every $x \in X$, any sphere $S(x; r) \subset X$. Hence, every point in $X$ is an interior point. Thus, $X$ is open. Also, observe that $\emptyset$ has no points and therefore every point of $\emptyset$ is an interior point of $\emptyset$. Hence, $\emptyset$ is an open subset of $X$.
To prove the second part, let $\{Y_\alpha\}_{\alpha \in A}$ be a family of open sets in $X$, and let $Y = \bigcup_{\alpha \in A} Y_\alpha$. If $Y_\alpha$ is empty for every $\alpha \in A$, then $Y = \emptyset$ is an open subset of $X$. Now suppose that $Y \neq \emptyset$ and let $x \in Y$. Then $x \in Y_\alpha$ for some $\alpha \in A$. Since $Y_\alpha$ is an open set, there is a sphere $S(x; r)$ such that $S(x; r) \subset Y_\alpha$. Hence, $S(x; r) \subset Y$, and thus $x$ is an interior point of $Y$. Therefore, $Y$ is an open set.
To prove the third part, let $Y_1$ and $Y_2$ be open subsets of $X$. If $Y_1 \cap Y_2 = \emptyset$, then $Y_1 \cap Y_2$ is open. So let us assume that $Y_1 \cap Y_2 \neq \emptyset$, and let $x \in Y = Y_1 \cap Y_2$. Since $x \in Y_1$, there is an $r_1 > 0$ such that $x \in S(x; r_1) \subset Y_1$. Similarly, there is an $r_2 > 0$ such that $x \in S(x; r_2) \subset Y_2$. Let $r = \min\{r_1, r_2\}$. Then $x \in S(x; r)$, where $S(x; r) \subset S(x; r_1)$ and $S(x; r) \subset S(x; r_2)$. Thus, $S(x; r) \subset Y_1 \cap Y_2$, and $x$ is an interior point of $Y_1 \cap Y_2$. Hence, $Y_1 \cap Y_2$ is an open subset of $X$. By induction, we can show that the intersection of any finite number of open subsets of $X$ is open. ∎
We now make the following
5.4.16. Definition. Let $\{X; \rho\}$ be a metric space. The topology of $X$ determined by $\rho$ is defined to be the family of all open subsets of $X$.
In our next result we establish a connection between open and closed subsets of $X$.
5.4.17. Theorem.
(i) $X$ and $\emptyset$ are closed sets.
(ii) If $Y$ is an open subset of $X$, then $Y^{\sim}$ is closed.
(iii) If $Z$ is a closed subset of $X$, then $Z^{\sim}$ is open.
Proof. The first part of this theorem follows immediately from the definitions of $X$, $\emptyset$, and closed set.
To prove the second part, let $Y$ be any open subset of $X$. We may assume that $Y \neq \emptyset$ and $Y \neq X$. Let $x$ be any adherent point of $Y^{\sim}$. Then $x$ cannot belong to $Y$, for if it did, then there would exist a sphere $S(x; r) \subset Y$, which is impossible. Therefore, every adherent point of $Y^{\sim}$ belongs to $Y^{\sim}$, and thus $Y^{\sim}$ is closed if $Y$ is open.
To prove the third part, let $Z$ be any closed subset of $X$. Again, we may assume that $Z \neq \emptyset$ and $Z \neq X$. Let $x \in Z^{\sim}$. Then there exists a sphere $S(x; r)$ which contains no point of $Z$. This is so because if every such sphere would contain a point of $Z$, then $x$ would be an adherent point of $Z$ and consequently would belong to $Z$, since $Z$ is closed. Thus, there is a sphere $S(x; r) \subset Z^{\sim}$; i.e., $x$ is an interior point of $Z^{\sim}$. Since this holds for arbitrary $x \in Z^{\sim}$, $Z^{\sim}$ is an open set. ∎
In the next result we present additional important properties of open sets.
5.4.18. Theorem.
(i) Every open sphere in $X$ is an open set.
(ii) If $Y$ is an open subset of $X$, then there is a family of open spheres, $\{S_\alpha\}_{\alpha \in A}$, such that $Y = \bigcup_{\alpha \in A} S_\alpha$.
(iii) The interior of any subset $Y$ of $X$ is the largest open set contained in $Y$.
Proof. To prove the first part, let $S(x; r)$ be any open sphere in $X$. Let $x_1 \in S(x; r)$, and let $\rho(x, x_1) = r_1$. If we let $r_0 = r - r_1$, then according to the proof of part (ii) of Theorem 5.4.10 we have $S(x_1; r_0) \subset S(x; r)$. Hence, $x_1$ is an interior point of $S(x; r)$. Since this is true for any $x_1 \in S(x; r)$, it follows that $S(x; r)$ is an open subset of $X$.
To prove the second part of the theorem, we first note that if $Y = \emptyset$, then $Y$ is open and is the union of an empty family of spheres. So assume that $Y \neq \emptyset$ and that $Y$ is open. Then each point $x \in Y$ is the center of a sphere $S(x; r) \subset Y$, and moreover $Y$ is the union of the family of all such spheres.
The proof of the last part of the theorem is left as an exercise. ∎
5.4.19. Exercise. Prove part (iii) of Theorem 5.4.18.
Let $\{Y; \rho\}$ be a subspace of a metric space $\{X; \rho\}$, and suppose that $V$ is a subset of $Y$. It can happen that $V$ may be an open subset of $Y$ and at the same time not be an open subset of $X$. Thus, when a set is described as open, it is important to know in what space it is open. We have:
5.4.20. Theorem. Let $\{Y; \rho\}$ be a metric subspace of $\{X; \rho\}$.
(i) A subset $V \subset Y$ is open relative to $\{Y; \rho\}$ if and only if there is a subset $U \subset X$ such that $U$ is open relative to $\{X; \rho\}$ and $V = Y \cap U$.
(ii) A subset $G \subset Y$ is closed relative to $\{Y; \rho\}$ if and only if there is a subset $F$ of $X$ such that $F$ is closed relative to $\{X; \rho\}$ and $G = F \cap Y$.
Proof. Let $S(x_0; r) = \{x \in X: \rho(x, x_0) < r\}$ and $S'(x_0; r) = \{x \in Y: \rho(x, x_0) < r\}$. Then $S'(x_0; r) = Y \cap S(x_0; r)$.
To prove the necessity of part (i), let $V$ be an open set relative to $\{Y; \rho\}$, and let $x \in V$. Then there is a sphere $S'(x; r) \subset V$ ($r$ may depend on $x$). Now
$$V = \bigcup_{x \in V} S'(x; r) = \bigcup_{x \in V} [S(x; r) \cap Y] = Y \cap \Big[\bigcup_{x \in V} S(x; r)\Big].$$
By part (ii) of Theorem 5.4.15, $\bigcup_{x \in V} S(x; r) = U$ is an open set in $\{X; \rho\}$.
To prove the sufficiency of part (i), let $V = Y \cap U$, where $U$ is an open subset of $X$. Let $x \in V$. Then $x \in U$, and hence there is a sphere $S(x; r) \subset U$. Thus, $S'(x; r) = Y \cap S(x; r) \subset Y \cap U = V$. This proves that $x$ is an interior point of $V$ and that $V$ is an open subset of $Y$.
The proof of part (ii) of the theorem is left as an exercise. ∎
5.4.21. Exercise. Prove part (ii) of Theorem 5.4.20.
The first part of the preceding theorem may be stated in another equivalent way. Let $\mathcal{T}$ and $\mathcal{T}'$ be the topology of $\{X; \rho\}$ and $\{Y; \rho\}$, respectively, generated by $\rho$. Then $\mathcal{T}' = \{Y \cap U: U \in \mathcal{T}\}$.
Let us now consider some specific examples.
5.4.22. Example. Let $X = R$, and let $\rho$ be the usual metric on $R$; i.e., $\rho(x, y) = |x - y|$. Any set $Y = (a, b) = \{x: a < x < b\}$ is an open subset of $X$. We call $(a, b)$ an open interval on $R$. ∎
5.4.23. Example. We now show that the word "finite" is crucial in part (iii) of Theorem 5.4.15. Let $\{R; \rho\}$ denote again the real line with the usual metric, and let $a < b$. If $Y_n = \{x \in R: a < x < b + 1/n\}$, then for each positive integer $n$, $Y_n$ is an open subset of the real line. However, the set
$$\bigcap_{n=1}^{\infty} Y_n = \{x \in R: a < x \leq b\} = (a, b]$$
is not an open subset of $R$. (This can readily be verified, since every sphere $S(b; r)$ contains a point greater than $b$ and hence is not in $\bigcap_{n=1}^{\infty} Y_n$.) ∎
In the above example, let $Y = (a, b]$. We saw that $Y$ is not an open subset of $R$; i.e., $b$ is not an interior point of $Y$. However, if we were to consider $\{Y; \rho\}$ as a metric space by itself, then $Y$ is an open set.
5.4.24. Example. Let $\{C[a, b]; \rho_\infty\}$ denote the metric space of Example 5.3.14. Let $k$ be an arbitrary finite positive number. Then the set of continuous functions satisfying the condition $|x(t)| < k$ for all $a \leq t \leq b$ is an open subset of the metric space $\{C[a, b]; \rho_\infty\}$. ∎
Theorems 5.4.15 and 5.4.17 tell us that the sets $X$ and $\emptyset$ are both open and closed in any metric space. In some metric spaces there may be proper subsets of $X$ which are both open and closed, as illustrated in the following example.
5.4.25. Example. Let $X$ be the set of real numbers given by $X = (-2, -1) \cup (+1, +2)$, and let $\rho(x, y) = |x - y|$ for $x, y \in X$. Then $\{X; \rho\}$ is clearly a metric space. Let $Y = (-2, -1) \subset X$ and $Z = (+1, +2) \subset X$. Note that both $Y$ and $Z$ are open subsets of $X$. However, $Y^{\sim} = Z$, $Z^{\sim} = Y$, and thus $Y$ and $Z$ are also closed subsets of $X$. Therefore, $Y$ and $Z$ are proper subsets of the metric space $\{X; \rho\}$ which are both open and closed. (Note that in the preceding we are not viewing $X$ as a subset of $R$. As such $X$ would be open. Considering $\{X; \rho\}$ as our metric space, $X$ is both open and closed.) ∎
5.4.26. Exercise. Let $\{X; \rho\}$ be a metric space with $\rho$ the discrete metric defined in Example 5.1.7. Show that every subset of $X$ is both open and closed.
In our next result we summarize several important properties of closed sets.
5.4.27. Theorem.
(i) Every subset of $X$ consisting of a finite number of elements is closed.
(ii) Let $x_0 \in X$, let $r > 0$, and let $K(x_0; r) = \{x \in X: \rho(x, x_0) \leq r\}$. Then $K(x_0; r)$ is closed.
(iii) A subset $Y \subset X$ is closed if and only if $\bar{Y} \subset Y$.
(iv) A subset $Y \subset X$ is closed if and only if $Y' \subset Y$.
(v) Let $\{Y_\alpha\}_{\alpha \in A}$ be any family of closed sets in $X$. Then $\bigcap_{\alpha \in A} Y_\alpha$ is closed.
(vi) The union of a finite number of closed sets in $X$ is closed.
(vii) The closure of a subset $Y$ of $X$ is the intersection of all closed sets containing $Y$.
Proof. Only the proof of part (v) is given. Let $\{Y_\alpha\}_{\alpha \in A}$ be any family of closed subsets of $X$. Then $\{Y_\alpha^{\sim}\}_{\alpha \in A}$ is a family of open sets. Now
$$\Big(\bigcap_{\alpha \in A} Y_\alpha\Big)^{\sim} = \bigcup_{\alpha \in A} Y_\alpha^{\sim}$$
is an open set, and hence $\bigcap_{\alpha \in A} Y_\alpha$ is a closed subset of $X$. ∎
5.4.28. Exercise. Prove parts (i) to (iv), (vi), and (vii) of Theorem 5.4.27.
We now consider several specific examples of closed sets.
5.4.29. Example. Let $X = R$, and let $\rho$ be the usual metric, $\rho(x, y) = |x - y|$. Any set $Y = \{x \in R: a \leq x \leq b\}$, where $a < b$, is a closed subset of $R$. We call $Y$ a closed interval on $R$ and denote it by $[a, b]$. ∎
5.4.30. Example. We now show that the word "finite" is essential in part (vi) of Theorem 5.4.27. Let $\{R; \rho\}$ denote the real line with the usual metric, and let $a > 0$. If $Y_n = \{x \in R: 1/n \leq x \leq a\}$ for each positive integer $n$, then $Y_n$ is a closed subset of the real line. However, the set
$$\bigcup_{n=1}^{\infty} Y_n = \{x \in R: 0 < x \leq a\} = (0, a]$$
is not a closed subset of $R$.
0 there is an x . E S such that p(x, x . ) < f. 5.4.35.
Exercise.
Prove Theorem 5.4.34.
Let us now consider some specific cases.

5.4.36. Example. The real line with the usual metric is a separable space. As we saw in Example 5.4.9, if Q is the set of rational numbers, then Q̄ = R. ■
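The density of the rationals used in Example 5.4.36 can be made concrete with a small computation. The following Python sketch is only an illustration under stated assumptions: the helper name `rational_within` is hypothetical, and a floating-point number stands in for a real number x. It returns a rational q with |x − q| < ε by rounding x to the nearest multiple of 1/n for a sufficiently large n.

```python
import math
from fractions import Fraction


def rational_within(x: float, eps: float) -> Fraction:
    """Return a rational q with |x - q| < eps.

    Hypothetical helper: x is a float standing in for a real number.
    Rounding x to the nearest multiple of 1/n gives an error of at most
    1/(2n), which is below eps as soon as n > 1/eps.
    """
    n = math.floor(1 / eps) + 1  # guarantees n > 1/eps
    return Fraction(round(x * n), n)


q = rational_within(math.pi, 1e-6)
error = abs(math.pi - q)  # float minus Fraction yields a float
```

Since such a q exists for every x and every ε > 0, each sphere S(x; ε) on the real line contains a rational number, which is what Q̄ = R asserts.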
5.4.37. Example. Let {R^n; ρ_p} be the metric space defined in Example 5.3.1 (recall that 1 ≤ p < ∞). The set of vectors x = (ξ_1, ..., ξ_n) with rational coordinates (i.e., ξ_i is a rational real number, i = 1, ..., n) is a denumerable everywhere dense set in R^n and, therefore, {R^n; ρ_p} is a separable metric space. ■
5.4.38. Example. Let {l_p; ρ_p} be the metric space defined in Example 5.3.5 (recall that 1 ≤ p < ∞). We can show that this space is separable in the following manner. Let

\[ Y = \{y \in l_p : y = (\eta_1, \ldots, \eta_n, 0, 0, \ldots) \text{ for some } n, \text{ where } \eta_i \text{ is a rational real number}, \ i = 1, \ldots, n\}. \]
Then Y is a countable subset of l_p. To show that it is everywhere dense, let ε > 0 and let x ∈ l_p, where x = (ξ_1, ξ_2, ...). Choose n sufficiently large so that the tail sum Σ_{k=n+1}^∞ |ξ_k|^p is sufficiently small. We can now find a y …

… Thus, Y is an uncountable set. Notice now that for every y_1, y_2 ∈ Y, ρ_∞(y_1, y_2) = 0 or 1. That is, ρ_∞ restricted to Y is the discrete metric. It follows from Exercise 5.4.14 that Y cannot be separable and, consequently, {l_∞; ρ_∞} is not separable. ■
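The observation that ρ_∞ takes only the values 0 and 1 on Y can be checked on a finite truncation. The sketch below assumes, as an illustration only, that the points of Y are sequences of zeros and ones (the defining line is not legible in the text above), and it inspects only the first k coordinates; the names `rho_inf` and `binary_prefixes` are hypothetical.

```python
from itertools import product


def rho_inf(x, y):
    """Sup metric on finite tuples: the largest coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def binary_prefixes(k):
    """All 0-1 tuples of length k (finite stand-ins for 0-1 sequences)."""
    return list(product((0, 1), repeat=k))


points = binary_prefixes(4)
# Distances between distinct points: two different 0-1 tuples differ in
# some coordinate, and there the difference is exactly 1.
distinct_distances = {rho_inf(x, y) for x in points for y in points if x != y}
```

A finite truncation cannot show uncountability, of course; it only illustrates why the metric restricted to such a set behaves like the discrete metric.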
5.5. COMPLETE METRIC SPACES
The set of real numbers R with the usual metric ρ defined on it has many remarkable properties, several of which are attributable to the so-called "completeness property" of this space. For this reason we speak of {R; ρ} as being a complete metric space. In the present section we consider general complete metric spaces. Throughout this section {X; ρ} is our underlying metric space, and J denotes the set of positive integers. Before considering the completeness of metric spaces we need to consider a few facts about sequences on metric spaces (cf. Definition 1.1.25).

5.5.1. Definition. A sequence {x_n} in a set Y ⊂ X is a function f: J → Y. Thus, if {x_n} is a sequence in Y, then f(n) = x_n for each n ∈ J.
5.5.2. Definition. Let {x_n} be a sequence of points in X, and let x be a point of X. The sequence {x_n} is said to converge to x if for every ε > 0, there is an integer N such that for all n ≥ N, ρ(x, x_n) < ε (i.e., x_n ∈ S(x; ε) for all n ≥ N). In general, N depends on ε; i.e., N = N(ε). We call x the limit of {x_n}, and we usually write

\[ \lim_{n} x_n = x, \]

or x_n → x as n → ∞. If there is no x ∈ X to which the sequence converges, then we say that {x_n} diverges.
Thus, x_n → x if and only if the sequence of real numbers {ρ(x_n, x)} converges to zero. In view of the above definition we note that for every ε > 0 there is a finite number N such that all terms of {x_n} except the first (N − 1) terms must lie in the sphere with center x and radius ε. Hence, the convergence of a sequence depends on the infinite number of terms {x_{N+1}, x_{N+2}, ...}, and no amount of alteration of a finite number of terms of a divergent sequence can make it converge. Moreover, if a convergent sequence is changed by omitting or adding a finite number of terms, then the resulting sequence is still convergent to the same limit as the original sequence.

Note that in Definition 5.5.2 we called x the limit of the sequence {x_n}. We will show that if {x_n} has a limit in X, then that limit is unique.

5.5.3. Definition. Let {x_n} be a sequence of points in X, where f(n) = x_n for each n ∈ J. If the range of f is bounded, then {x_n} is said to be a bounded sequence.
The range of f in the above definition may consist of a finite number of points or of an infinite number of points. Specifically, if the range of f consists of one point, then we speak of a constant sequence. Clearly, all constant sequences are convergent.
5.5.4. Example. Let {R; ρ} denote the set of real numbers with the usual metric. If n ∈ J, then the sequence {n²} diverges and is unbounded, and the range of this sequence is an infinite set. The sequence {(−1)^n} diverges, is bounded, and its range is a finite set. The sequence {a + (−1)^n/n} converges to a, is bounded, and its range is an infinite set. ■
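The behavior described in Example 5.5.4 can be explored numerically. The following Python sketch is an illustration, not a proof: it checks the defining inequality of Definition 5.5.2 on a finite stretch of indices only. The helper name `tail_within` is hypothetical, and the convergent sequence is taken to be a + (−1)^n/n, which is an assumption about the form intended in the example.

```python
def tail_within(seq, limit, eps, n_start, n_end):
    """True if |seq(n) - limit| < eps for every n in n_start..n_end."""
    return all(abs(seq(n) - limit) < eps for n in range(n_start, n_end + 1))


a = 2.0


def convergent(n):
    """Assumed form of the convergent sequence: a + (-1)^n / n."""
    return a + (-1) ** n / n


def alternating(n):
    """The bounded divergent sequence (-1)^n."""
    return (-1) ** n


# For eps = 1e-3 the choice N = 1001 works, since 1/n < 1e-3 for n >= 1001.
stays_close = tail_within(convergent, a, 1e-3, 1001, 5000)

# The alternating sequence has a finite range, yet its terms never settle
# within 1/2 of either candidate limit.
finite_range = {alternating(n) for n in range(1, 100)}
settles_at_plus_one = tail_within(alternating, 1, 0.5, 10, 20)
settles_at_minus_one = tail_within(alternating, -1, 0.5, 10, 20)
```

Here N = 1001 plays the role of N(ε) for ε = 10⁻³; a smaller ε requires a larger N, in line with the remark that N depends on ε.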
5.5.5. Definition. Let {x_n} be a sequence in X. Let n_1, n_2, ..., n_k, ... be a sequence of positive integers which is strictly increasing; i.e., n_j > n_k for all j > k. Then the sequence {x_{n_k}} is called a subsequence of {x_n}. If the subsequence {x_{n_k}} converges, then its limit is called a subsequential limit of {x_n}.

It turns out that many of the important properties of convergence on R can be extended to the setting of arbitrary metric spaces. In the next result several of these properties are summarized.
5.5.6. Theorem. Let {x_n} be a sequence in X. Then
(i) there is at most one point x ∈ X such that lim_n x_n = x;
(ii) if {x_n} is convergent, then it is bounded;
(iii) {x_n} converges to a point x ∈ X if and only if every sphere about x contains all but a finite number of terms in {x_n};
(iv) {x_n} converges to a point x ∈ X if and only if every subsequence of {x_n} converges to x;
(v) if {x_n} converges to x ∈ X and if y ∈ X, then lim_n ρ(x_n, y) = ρ(x, y);
(vi) if {x_n} converges to x ∈ X and if the sequence {y_n} of X converges to y ∈ X, then lim_n ρ(x_n, y_n) = ρ(x, y); and
(vii) if {x_n} converges to x ∈ X, and if there is a y ∈ X and a γ > 0 such that ρ(x_n, y) ≤ γ for all n ∈ J, then ρ(x, y) ≤ γ.
Proof. To prove part (i), assume that x, y ∈ X and that lim_n x_n = x and lim_n x_n = y. Then for every ε > 0 there are positive integers N_x and N_y such that ρ(x_n, x) < ε/2 whenever n > N_x and ρ(x_n, y) < ε/2 whenever n > N_y. If we let N = max(N_x, N_y), then it follows that

\[ \rho(x, y) \le \rho(x, x_n) + \rho(x_n, y) < \varepsilon \quad \text{whenever } n > N. \]

Now ε is any positive number. Since the only non-negative number which
is less than every positive number is zero, it follows that ρ(x, y) = 0 and therefore x = y.

To prove part (iii), assume that lim_n x_n = x and let S(x; ε) be any sphere about x. Then there is a positive integer N such that the only terms of the sequence {x_n} which are possibly not in S(x; ε) are the terms x_1, x_2, ..., x_{N−1}. Conversely, assume that every sphere about x contains all but a finite number of terms from the sequence {x_n}. With ε > 0 specified, let M = max{n ∈ J: x_n ∉ S(x; ε)}. If we set N = M + 1, then x_n ∈ S(x; ε) for all n ≥ N, which was to be shown.

To prove part (v), we note from Theorem 5.1.13 that

\[ |\rho(y, x) - \rho(y, x_n)| \le \rho(x, x_n). \]

By hypothesis, lim_n x_n = x. Therefore, lim_n ρ(x, x_n) = 0 and so lim_n |ρ(y, x) − ρ(y, x_n)| = 0 …

… γ. Then δ = ρ(x, y) − γ > 0. Now γ − ρ(x_n, y) ≥ 0 for all n ∈ J, and thus 0 < δ … for all n …
… N, y_n ∈ S(x; ε). That is, there are infinitely many points of Y in S(x; ε).

To prove part (iii), assume that Y is closed and let {y_n} be a convergent sequence with y_n ∈ Y for all n and lim_n y_n = x. We want to show that x ∈ Y. By part (i), x must be an adherent point of Y. Since Y is closed, x ∈ Y. Next, we prove the converse. Let x be an adherent point of Y. Then by part (i), there is a sequence {y_n} in Y such that lim_n y_n = x. By hypothesis, we must have x ∈ Y. Since Y contains all of its adherent points, it must be closed. ■
Statement (iii) of Theorem 5.5.8 is often used as an alternate way of defining a closed set. The next theorem provides us with conditions under which a sequence is convergent in a product metric space.

5.5.9. Theorem. Let {X; ρ_x} and {Y; ρ_y} be two metric spaces, let Z = X × Y, let ρ be any of the metrics defined on Z in Theorem 5.3.19, and let {Z; ρ} denote the product metric space of {X; ρ_x} and {Y; ρ_y}. If z ∈ Z = X × Y, then z = (x, y), where x ∈ X and y ∈ Y. Let {x_n} be a sequence in X, and let {y_n} be a sequence in Y. Then,
(i) the sequence {(x_n, y_n)} converges in Z if and only if {x_n} converges in X and {y_n} converges in Y; and
(ii) lim_n (x_n, y_n) = (lim_n x_n, lim_n y_n) whenever this limit exists.

5.5.10. Exercise. Prove Theorem 5.5.9.
In many situations the limit to which a given sequence may converge is unknown. The following concept enables us to consider the convergence of a sequence without knowing the limit to which the sequence may converge.
5.5.11. Definition. A sequence {x_n} of points in a metric space {X; ρ} is said to be a Cauchy sequence or a fundamental sequence if for every ε > 0 there is an integer N such that ρ(x_n, x_m) < ε whenever m, n ≥ N.

The next result follows directly from the triangle inequality.

5.5.12. Theorem. Every convergent sequence in a metric space {X; ρ} is a Cauchy sequence.
Proof. Assume that lim_n x_n = x. Then for arbitrary ε > 0 we can find an integer N such that ρ(x_n, x) < ε/2 and ρ(x_m, x) < ε/2 whenever m, n > N. In view of the triangle inequality we now have

\[ \rho(x_m, x_n) \le \rho(x_m, x) + \rho(x, x_n) < \varepsilon \]

whenever m, n > N. …

Proof. … ρ(x_1, x_2), ρ(x_1, x_3), ..., ρ(x_1, x_N)). Then, by the triangle inequality,

\[ \rho(x_1, x_n) \le \rho(x_1, x_N) + \rho(x_N, x_n) < (\lambda + 1) \]

if n > N. Thus, for all n ∈ J, ρ(x_1, x_n) … inequality, ρ(x_m, x_n) ≤ ρ(x_m, x_1) + ρ(x_1, x_n) for all m, n ∈ J. Thus, ρ(x_m, x_n) …
… we know by Theorem 5.5.13 that there is a positive bound on

\[ \rho_p(0, x_k) = \Big[ \sum_{i=1}^{\infty} |\xi_{ik}|^p \Big]^{1/p} \]

… ε > 0. Then there is an integer N such that ρ_p(x_j, x_k) < ε for all k, j > N. Again, let n be any positive integer. Then the corresponding distance between the first n coordinates of x_j and x_k is less than ε for all j, k > N. For fixed n, we conclude from Theorem 5.5.6, part (vii), that the distance between the first n coordinates of x and x_k is at most ε for all k ≥ N. Hence,

\[ \Big[ \sum_{i=1}^{n} |\xi_i - \xi_{ik}|^p \Big]^{1/p} \le \varepsilon \]

for all k ≥ N, where N depends only on ε (and not on n). Since this must hold for all positive integers n, we conclude that ρ(x, x_k) ≤ ε for all k > N. This implies that lim_k x_k = x. ■
5.5.27. Exercise. Show that the discrete metric space of Example 5.1.7 is complete.
5.5.28. Example. Let {C[a, b]; ρ_∞} be the metric space defined in Example 5.3.14. Thus, C[a, b] is the set of all continuous functions on [a, b] and

\[ \rho_\infty(x, y) = \sup_{a \le t \le b} |x(t) - y(t)|. \]

We now show that {C[a, b]; ρ_∞} is a complete metric space. If {x_n} is a Cauchy sequence in C[a, b], then for each ε > 0 there is an N such that |x_n(t) − x_m(t)| < ε whenever m, n ≥ N for all t ∈ [a, b]. Thus, for fixed t, the sequence {x_n(t)} converges to, say, x_0(t). Since t is arbitrary, the sequence of functions {x_n(·)} converges pointwise to a function x_0(·). Also, since N = N(ε) is independent of t, the sequence {x_n(·)} converges uniformly to x_0(·). Now from the calculus we know that if a sequence of continuous functions {x_n(·)} converges uniformly to a function x_0(·), then x_0(·) is continuous. Therefore, every Cauchy sequence in {C[a, b]; ρ_∞} converges to an element in this space in the sense of the metric ρ_∞. Therefore, the metric space {C[a, b]; ρ_∞} is complete. ■
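To get a feeling for the metric ρ_∞ of Example 5.5.28, one can approximate the supremum on a finite grid. The Python sketch below is a numerical illustration under stated assumptions: `rho_sup` and `partial_exp` are hypothetical helper names, the grid maximum is only a lower bound for the true supremum, and the Taylor partial sums of the exponential function on [0, 1] are chosen merely as a convenient sequence of continuous functions.

```python
import math


def rho_sup(x, y, a, b, samples=1001):
    """Grid approximation of sup over [a, b] of |x(t) - y(t)|.

    This is a lower bound for the true supremum; it is adequate here
    because the functions compared are smooth.
    """
    step = (b - a) / (samples - 1)
    return max(abs(x(a + i * step) - y(a + i * step)) for i in range(samples))


def partial_exp(n):
    """n-th Taylor partial sum of exp, a continuous function on [0, 1]."""
    return lambda t: sum(t ** k / math.factorial(k) for k in range(n + 1))


# Two far-out terms of the sequence are uniformly close to each other ...
d_cauchy = rho_sup(partial_exp(8), partial_exp(12), 0.0, 1.0)
# ... and close to the uniform limit, which is again continuous.
d_limit = rho_sup(partial_exp(12), math.exp, 0.0, 1.0)
```

In this instance the limit function exp is continuous, in agreement with the statement that a uniformly convergent sequence of continuous functions has a continuous limit.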
5.5.29. Example. Let {C[a, b]; ρ_2} be the metric space defined in Example 5.3.12, with p = 2; i.e.,

\[ \rho_2(x, y) = \Big\{ \int_a^b [x(t) - y(t)]^2 \, dt \Big\}^{1/2}. \]

We now show that this metric space is not complete. Without loss of generality let the closed interval be [−1, 1]. In particular, consider the sequence {x_n} of continuous functions defined by

\[ x_n(t) = \begin{cases} 0, & -1 \le t \le 0 \\ nt, & 0 \le t \le 1/n \\ 1, & 1/n \le t \le 1 \end{cases} \]

n = 1, 2, .... This sequence is depicted pictorially in Figure F.

5.5.30. Figure F. Sequence {x_n} for {C[a, b]; ρ_2}. (Plot of x(t) against t for n = 1, 2, 3.)

Let m > n and note that

\[ \{\rho_2(x_m, x_n)\}^2 = (m - n)^2 \int_0^{1/m} t^2 \, dt + \int_{1/m}^{1/n} (1 - nt)^2 \, dt = \frac{(m - n)^2}{3 m^2 n} < \frac{1}{3n} < \varepsilon \]

whenever n > 1/(3ε). Therefore, {x_n} is a Cauchy sequence. For purposes of contradiction, let us now assume that {x_n} converges to a continuous function x, where convergence is taken with respect to the metric ρ_2. In other words, assume that
\[ \int_{-1}^{1} |x_n(t) - x(t)|^2 \, dt \to 0 \quad \text{as } n \to \infty. \]

This implies that the above integral with any limits between +1 and −1 also approaches zero as n → ∞. Since x_n(t) = 0 whenever t ∈ [−1, 0], we have

\[ \int_{-1}^{0} |x_n(t) - x(t)|^2 \, dt = \int_{-1}^{0} |x(t)|^2 \, dt \]

independent of n. From this it follows that the continuous function x is such that

\[ \int_{-1}^{0} |x(t)|^2 \, dt = 0, \]

and x(t) = 0 whenever t ∈ [−1, 0]. Now if 0 < a ≤ 1, then

\[ \int_a^1 |x_n(t) - x(t)|^2 \, dt \to 0 \quad \text{as } n \to \infty. \]

Choosing n > 1/a, we have

\[ \int_a^1 |1 - x(t)|^2 \, dt \to 0 \quad \text{as } n \to \infty. \]

Since this integral is independent of n it vanishes. Also, since x is continuous it follows that x(t) = 1 for t ≥ a. Since a can be chosen arbitrarily close to zero, we end up with a function x such that

\[ x(t) = \begin{cases} 0, & t \in [-1, 0] \\ 1, & t \in (0, 1]. \end{cases} \]
Therefore, the Cauchy sequence {x_n} does not converge to a point in C[a, b], and the metric space is not complete. ■

The completeness property of certain metric spaces is an essential and important property which we will use and encounter frequently in the remainder of this book. The preceding example demonstrates that not all metric spaces are complete. However, this space {C[a, b]; ρ_2} is a subspace of a larger metric space which is complete. To discuss this complete metric space (i.e., the completion of {C[a, b]; ρ_2}), it is necessary to make use of the Lebesgue theory of measure and integration. For a thorough treatment of this theory, we refer the reader to the texts by Royden [5.9] and Taylor [5.10]. Although knowledge of this theory is not an essential requirement in the development of the subsequent results in this book, we will want to make reference to certain examples of important metric spaces which are defined in terms of the Lebesgue integral. For this reason, we provide the following heuristic comments for those readers who are unfamiliar with this subject.

The Lebesgue measure space on the real numbers, R, consists of the triple {R, 𝔐, μ}, where 𝔐 is a certain family of subsets of R, called the Lebesgue measurable sets in R, and μ is a mapping, μ: 𝔐 → R*, called Lebesgue measure, which may be viewed as a generalization of the concept of length in R. While it is not possible to characterize 𝔐 without providing additional details concerning the Lebesgue theory, it is quite simple to enumerate several important examples of elements in 𝔐. For instance, 𝔐 contains all intervals of the form (a, b) = {x ∈ R: a < x < b}, [c, d) = {x ∈ R: c ≤ x < d}, (e, f] = {x ∈ R: e < x ≤ f}, [g, h] = {x ∈ R: g ≤ x ≤ h}, as well as all countable unions and intersections of such intervals. It is emphasized that 𝔐 does not include all subsets of R. Now if A ∈ 𝔐 is an interval, then the measure of A, μ(A), is the length of A. For example, if A = [a, b], then μ(A) = b − a. Also, if B is a countable union of disjoint intervals, then μ(B) is the sum of the lengths of the disjoint intervals (this sum may be infinite). Of particular interest are subsets of R having measure zero. Essentially, this means it is possible to "cover" the set with an arbitrarily small subset of R. Thus, every subset of R containing at most a countable number of points has Lebesgue measure equal to zero. For example, the set of rational numbers has Lebesgue measure zero. (There are also uncountable subsets of R having Lebesgue measure zero.)

In connection with the above discussion, we say that a proposition P(x) is true almost everywhere (abbreviated a.e.) if the set S = {x ∈ R: P(x) is
not true} has Lebesgue measure zero. For example, two functions f, g: R → R are said to be equal a.e. if the set S = {x ∈ R: f(x) ≠ g(x)} ∈ 𝔐 and if μ(S) = 0.

Let us now consider the integral of real-valued functions defined on the interval [a, b] ⊂ R. It can be shown that a bounded function f: [a, b] → R is Riemann integrable (where the Riemann integral is denoted, as usual, by ∫_a^b f(x) dx) if and only if f is continuous almost everywhere on [a, b]. The class of Riemann integrable functions with a metric defined in the same manner as in Example 5.5.29 (for continuous functions on [a, b]) is not a complete metric space. However, as pointed out before, it is possible to generalize the concept of integral and make it applicable to a class of functions significantly larger than the class of functions which are continuous a.e. In doing so, we must consider the class of measurable functions. Specifically, a function f: R → R is said to be a Lebesgue measurable function if f⁻¹(U) ∈ 𝔐 for every open set U ⊂ R. Now let f be a Lebesgue measurable function which is bounded on the interval [a, b], and let M = sup{f(x) = y: x ∈ [a, b]}, and let m = inf{f(x) = y: x ∈ [a, b]}. In the Lebesgue approach to integration, the range of f is partitioned into intervals. (This is in contrast with the Riemann approach, where the domain of f is partitioned in developing the integral.) Specifically, let us divide the range of f into the n parts specified by m = y_0 < y_1 < ... …

… R by

(5.5.32)

It can be shown that the value of ρ([f], [g]) defined by Eq. (5.5.32) is the same for any f and g in the equivalence classes [f] and [g], respectively. Furthermore, ρ satisfies all the axioms of a metric, and as such {L_p[a, b]; ρ_p} is a metric space. One of the important results of the Lebesgue theory is that this space is complete. It is important to note that the right-hand side of Eq. (5.5.32) cannot be used to define a metric on ℒ_p[a, b], since there are functions f ≠ g such that
\[ \int_{[a, b]} |f - g|^p \, d\mu = 0; \]

however, in the literature the distinction between L_p[a, b] and ℒ_p[a, b] is usually suppressed. That is, we usually write f ∈ L_p[a, b] instead of [f] ∈ L_p[a, b], where f ∈ ℒ_p[a, b]. Finally, in the particular case when p = 2, the space {C[a, b]; ρ_2} of Example 5.5.29 is a subspace of the space {L_2; ρ_2}. ■
Before closing the present section we consider some important general properties of complete metric spaces.

5.5.33. Theorem. Let {X; ρ} be a complete metric space, and let {Y; ρ} be a metric subspace of {X; ρ}. Then {Y; ρ} is complete if and only if Y is a closed subset of X.

Proof. Assume that {Y; ρ} is complete. To show that Y is a closed subset of X we must show that Y contains all of its adherent points. Let y be an adherent point of Y; i.e., let y ∈ Ȳ. Then each open sphere S(y; 1/n), n = 1, 2, ..., contains at least one point y_n in Y. Since ρ(y_n, y) < 1/n it follows that the sequence {y_n} converges to y. Since {y_n} is a Cauchy sequence in the complete space {Y; ρ} we have {y_n} converging to a point y′ ∈ Y. But the limit of a sequence of points in a metric space is unique by Theorem 5.5.6. Therefore, y′ = y; i.e., y ∈ Y and Y is closed.
Conversely, assume that Y is a closed subset of X. To show that the space {Y; ρ} is complete, let {y_n} be an arbitrary Cauchy sequence in {Y; ρ}. Then {y_n} is a Cauchy sequence in the complete metric space {X; ρ} and as such it has a limit y ∈ X. However, in view of Theorem 5.5.8, part (iii), the closed subset Y of X contains all its adherent points, so that y ∈ Y. Therefore, {Y; ρ} is complete. ■

We emphasize that completeness and closure are not necessarily equivalent in arbitrary metric spaces. For example, a metric space is always closed, yet it is not necessarily complete. Before characterizing a complete metric space in an alternate way, we need to introduce the following concept.

5.5.34. Definition. A sequence {S_n} of subsets of a metric space {X; ρ} is called a nested sequence of sets if

\[ S_1 \supset S_2 \supset S_3 \supset \cdots. \]
We leave the proof of the last result of the present section as an exercise.

5.5.35. Theorem. Let {X; ρ} be a metric space. Then,
(i) {X; ρ} is complete if and only if every sequence of closed nested spheres in {X; ρ} with radii tending to zero has non-void intersection; and
(ii) if {X; ρ} is complete, if {S_n} is a nested sequence of non-empty closed subsets of X, and if lim_n diam(S_n) = 0, then the intersection ∩_{n=1}^∞ S_n is not empty; in fact, it consists of a single point.

5.5.36. Exercise. Prove Theorem 5.5.35.

5.6. COMPACTNESS
We recall the Bolzano-Weierstrass theorem from the calculus: Every bounded, infinite subset of the real line (i.e., the set of real numbers with the usual metric) has at least one point of accumulation. Thus, if Y is an arbitrary bounded infinite subset of R, then in view of this theorem we know that any sequence formed from elements of Y has a convergent subsequence. For example, let Y = [0, 2], and let {x_n} be the sequence of real numbers given by

\[ x_n = \frac{1 - (-1)^n}{2} + \frac{1}{n}, \quad n = 1, 2, \ldots. \]

Then the range of this sequence lies in Y and is thus bounded. Hence, the range has at least one accumulation point. It, in fact, has two.
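The two accumulation points mentioned above can be located numerically. The Python sketch below assumes that the sequence in question is x_n = (1 − (−1)^n)/2 + 1/n, which is a reading of the formula consistent with the range lying in [0, 2]; the function name `x` and the tolerance 0.01 are illustrative choices.

```python
def x(n):
    """Assumed form of the sequence: (1 - (-1)^n)/2 + 1/n."""
    return (1 - (-1) ** n) / 2 + 1 / n


terms = [x(n) for n in range(1, 2001)]

# Every term lies in Y = [0, 2].
all_in_Y = all(0 <= t <= 2 for t in terms)

# Even-indexed terms equal 1/n and cluster at 0; odd-indexed terms equal
# 1 + 1/n and cluster at 1. Count the terms within 0.01 of each point.
near_zero = sum(1 for t in terms if abs(t - 0) < 0.01)
near_one = sum(1 for t in terms if abs(t - 1) < 0.01)
```

Under this assumption the subsequences {x_{2k}} and {x_{2k−1}} converge to 0 and 1, respectively, so each accumulation point of the range is also a subsequential limit.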
A theorem from the calculus which is closely related to the Bolzano-Weierstrass theorem is the Heine-Borel theorem. We need the following terminology.

5.6.1. Definition. Let Y be a set in a metric space {X; ρ}, and let A be an index set. A collection of sets {Y_α: α ∈ A} in {X; ρ} is called a covering of Y if Y ⊂ ∪_{α∈A} Y_α. A subcollection {Y_β: β ∈ B} of the covering {Y_α: α ∈ A}, i.e., B ⊂ A such that Y ⊂ ∪_{β∈B} Y_β, is called a subcovering of {Y_α: α ∈ A}. If all the members Y_α and Y_β are open sets, then we speak of an open covering and open subcovering. If A is a finite set, then we speak of a finite covering. In general, A may be an uncountable set.

We now recall the Heine-Borel theorem as it applies to subsets of the real line (i.e., of R): Let Y be a closed and bounded subset of R. If {Y_α: α ∈ A} is any family of open sets on the real line which covers Y, then it is possible to find a finite subcovering of sets from {Y_α: α ∈ A}.

Many important properties of the real line follow from the Bolzano-Weierstrass theorem and from the Heine-Borel theorem. In general, these properties cannot be carried over directly to arbitrary metric spaces. The concept of compactness, to be introduced in the present section, will enable us to isolate those metric spaces which possess the Heine-Borel and Bolzano-Weierstrass property. Because of its close relationship to compactness, we first introduce the concept of total boundedness.

5.6.2. Definition. Let Y be any set in a metric space {X; ρ}, and let ε be an arbitrary positive number. A set S_ε in X is said to be an ε-net for Y if for any point y ∈ Y there exists at least one point s ∈ S_ε such that ρ(s, y) < ε. The ε-net, S_ε, is said to be finite if S_ε contains a finite number of points. A subset Y of X is said to be totally bounded if X contains a finite ε-net for Y for every ε > 0.

Some authors use the terminology ε-dense set for ε-net and precompact for totally bounded sets. An obvious equivalent characterization of total boundedness is contained in the following result.

5.6.3. Theorem. A subset Y ⊂ X is totally bounded if and only if Y can be covered by a finite number of spheres of radius ε for any ε > 0.

5.6.4. Exercise. Prove Theorem 5.6.3.
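The condition in Definition 5.6.2 is easy to test mechanically on finite data. The following Python sketch is a hedged illustration: `is_eps_net` and `square_grid` are hypothetical helper names, the Euclidean metric on R² is assumed, and the set Y is replaced by a finite sample of the unit square, so the check says nothing about points outside the sample.

```python
import math


def is_eps_net(net, points, eps, dist=math.dist):
    """True if every point has some element of `net` at distance < eps."""
    return all(any(dist(s, y) < eps for s in net) for y in points)


def square_grid(n):
    """The (n+1)^2 points (i/n, j/n) of the unit square."""
    return [(i / n, j / n) for i in range(n + 1) for j in range(n + 1)]


sample_Y = square_grid(20)  # finite sample standing in for Y
net = square_grid(4)        # candidate net with 25 points

covers_at_02 = is_eps_net(net, sample_Y, 0.2)
covers_at_01 = is_eps_net(net, sample_Y, 0.1)
```

The same 25 points form an ε-net for the sample when ε = 0.2 but not when ε = 0.1; a finer grid is needed for a smaller ε, which reflects the remark that the number of elements in S_ε grows as ε decreases.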
In Figure G a pictorial demonstration of the preceding concepts is given. If in this figure the size of ε would be decreased, then correspondingly, the number of elements in S_ε would increase. If for arbitrarily small ε the number of elements in S_ε remains finite, then we have a totally bounded set Y.

5.6.5. Figure G. Total boundedness of a set Y. (S_ε is the finite set consisting of the dots within the set X.)

Total boundedness is a stronger property than boundedness. We leave the proof of the next result as an exercise.

5.6.6. Theorem. Let {X; ρ} be a metric space, and let Y be a subset of X. Then,
(i) if Y is totally bounded, then it is bounded;
(ii) if Y is totally bounded, then its closure Ȳ is totally bounded; and
(iii) if the metric space {X; ρ} is totally bounded, then it is separable.

5.6.7. Exercise. Prove Theorem 5.6.6.
We note, for example, that all finite sets (including the empty set) are totally bounded. Whereas all totally bounded sets are also bounded, the converse does, in general, not hold. We demonstrate this by means of the following example.

5.6.8. Example. Let {l_2; ρ_2} be the metric space defined in Example 5.3.5. Consider the subset Y ⊂ l_2 defined by

\[ Y = \Big\{ y \in l_2 : \sum_{i=1}^{\infty} |\eta_i|^2 \le 1 \Big\}. \]

We show that Y is bounded but not totally bounded. For any x, y ∈ Y, we have by the Minkowski inequality (5.2.7),

\[ \rho_2(x, y) = \Big[ \sum_{i=1}^{\infty} |\xi_i - \eta_i|^2 \Big]^{1/2} \le \Big[ \sum_{i=1}^{\infty} |\xi_i|^2 \Big]^{1/2} + \Big[ \sum_{i=1}^{\infty} |\eta_i|^2 \Big]^{1/2} \le 2. \]

Thus, Y is bounded. To show that Y is not totally bounded, consider the set of points E = {e_1, e_2, ...} ⊂ Y, where e_1 = (1, 0, 0, ...), e_2 = (0, 1, 0, ...), etc. Then ρ_2(e_i, e_j) = √2 for i ≠ j. Now suppose there is a finite ε-net for Y for say ε = 1/2. Let {s_1, ..., s_n} be the net S_ε. Now if e_j is such that ρ(e_j, s_i) < 1/2 for some i, then ρ(e_k, s_i) ≥ ρ(e_k, e_j) − ρ(e_j, s_i) > 1/2 for k ≠ j. Hence, there can be at most one element of the set E in each sphere S(s_i; 1/2) for i = 1, ..., n. Since there are infinitely many points in E and only a finite number of spheres S(s_i; 1/2), this contradicts the fact that S_ε is an ε-net. Hence, there is no finite ε-net for ε = 1/2, and Y is not totally bounded. ■

Let us now consider an example of a totally bounded set.

5.6.9.
Example. Let {R^n; ρ_2} be the metric space defined in Example 5.3.1, and let Y be the subset of R^n defined by

Y = {y ∈ R^n: Σ_{i=1}^n η_i² …

… > 0. To this end, let N be a positive integer such that Nε > √n, and let S_ε be the set of all n-tuples given by

S_ε = {s = (q_1, ..., q_n) … where −N ≤ m_i …

… Show that Y is totally bounded.

In studying compactness of metric spaces, we will find it convenient to introduce the following concept.

5.6.11. Definition. A metric space {X; ρ} is said to be sequentially compact if every sequence of elements in X contains a subsequence which converges to some element x ∈ X. A set Y in the metric space {X; ρ} is said to be sequentially compact if the subspace {Y; ρ} is sequentially compact; i.e., every sequence in Y contains a subsequence which converges to a point in Y.

5.6.12. Example. Let X = (0, 1], and let ρ be the usual metric on the real line R. Consider the sequence {x_n}, where x_n = 1/n, n = 1, 2, .... This
sequence has no subsequence which converges to a point in X, and thus {X; ρ} is not sequentially compact. ■
We now define compactness.

5.6.13. Definition. A metric space {X; ρ} is said to be compact, or to possess the Heine-Borel property, if every open covering of {X; ρ} contains a finite open subcovering. A set Y in a metric space {X; ρ} is said to be compact if the subspace {Y; ρ} is compact.

Some authors use the term bicompact for Heine-Borel compactness and the term compact for what we call sequentially compact. As we shall see shortly, in the case of metric spaces, compactness and sequential compactness are equivalent, so no confusion should arise. We will also show that compact metric spaces can equivalently be characterized by means of the Bolzano-Weierstrass property, given by the following.

5.6.14. Definition. A metric space {X; ρ} possesses the Bolzano-Weierstrass property if every infinite subset of X has at least one point of accumulation. A set Y in X possesses the Bolzano-Weierstrass property if the subspace {Y; ρ} possesses the Bolzano-Weierstrass property.

Before setting out on proving the assertions made above, i.e., the equivalence of compactness, sequential compactness, and the Bolzano-Weierstrass property, in metric spaces, a few comments concerning some of these concepts may be of benefit. Informally, we may view a sequentially compact metric space as having such an abundance of elements that no matter how we choose a sequence, there will always be a clustering of an infinite number of points around at least one point in the metric space. A similar interpretation can be made concerning metric spaces which possess the Bolzano-Weierstrass property. Utilizing the concepts of sequential compactness and total boundedness, we first state and prove the following result.

5.6.15. Theorem. Let {X; ρ} be a metric space, and let Y be a subset of X. The following properties hold:
(i) if Y is sequentially compact, then Y is bounded;
(ii) if Y is sequentially compact, then Y is closed;
(iii) if {X; ρ} is sequentially compact, then {X; ρ} is totally bounded;
(iv) if {X; ρ} is sequentially compact, then {X; ρ} is complete; and
(v) if {X; ρ} is totally bounded and complete, then it is sequentially compact.
Proof. To prove (i), assume that Y is a sequentially compact subset of X and assume, for purposes of contradiction, that Y is unbounded. Then we can construct a sequence {y_n} with elements arbitrarily far apart. Specifically, let y_1 ∈ Y and choose y_2 ∈ Y such that ρ(y_1, y_2) > 1. Next, choose y_3 ∈ Y such that ρ(y_1, y_3) > 1 + ρ(y_1, y_2). Continuing this process, choose y_n ∈ Y such that ρ(y_1, y_n) > 1 + ρ(y_1, y_{n−1}). If m > n, then ρ(y_1, y_m) > 1 + ρ(y_1, y_n) and ρ(y_m, y_n) ≥ |ρ(y_1, y_m) − ρ(y_1, y_n)| > 1. But this implies that {y_n} contains no convergent subsequence. However, we assumed that Y is sequentially compact; i.e., every sequence in Y contains a convergent subsequence. Therefore, we have arrived at a contradiction. Hence, Y must be bounded. In the above argument we assumed that Y is an infinite set. We note that if Y is a finite set then there is nothing to prove.

To prove part (ii), let Ȳ denote the closure of Y and assume that y ∈ Ȳ. Then there is a sequence of points {y_n} in Y which converges to y, and every subsequence of {y_n} converges to y, by Theorem 5.5.6, part (iv). But, by hypothesis, Y is sequentially compact. Thus, the sequence {y_n} in Y contains a subsequence which converges to some element in Y. Since limits are unique, this element is y, so that y ∈ Y. Therefore, Y = Ȳ and Y is closed.

We now prove part (iii). Let {X; ρ} be a sequentially compact metric space, and let x_1 ∈ X. With ε > 0 fixed we choose if possible x_2 ∈ X such that ρ(x_1, x_2) ≥ ε. Next, if possible choose x_3 ∈ X such that ρ(x_1, x_3) ≥ ε and ρ(x_2, x_3) ≥ ε. Continuing this process we have, for every n, ρ(x_n, x_1) ≥ ε, ρ(x_n, x_2) ≥ ε, ..., ρ(x_n, x_{n−1}) ≥ ε. We now show that this process must ultimately terminate. Clearly, if {X; ρ} is a bounded metric space then we can pick ε sufficiently large to terminate the process after the first step; i.e., there is no point x ∈ X such that ρ(x_1, x) ≥ ε. Now suppose that, in general, the process does not terminate. Then we have constructed a sequence {x_n} such that for any two members x_i, x_j of this sequence, we have ρ(x_i, x_j) ≥ ε. But, by hypothesis, {X; ρ} is sequentially compact, and thus {x_n} contains a subsequence which is convergent to an element in X. Hence, we have arrived at a contradiction and the process must terminate. Using this procedure we now have for arbitrary ε > 0 a finite set of points {x_1, x_2, ..., x_l} such that the spheres S(x_n; ε), n = 1, ..., l, cover X; i.e., for any ε > 0, X contains a finite ε-net. Therefore, the metric space {X; ρ} is totally bounded.

We now prove part (iv) of the theorem. Let {x_n} be a Cauchy sequence. Then for every ε > 0 there is an integer l such that ρ(x_m, x_n) < ε whenever m > n ≥ l. Since {X; ρ} is sequentially compact, the sequence {x_n} contains a subsequence {x_{l_n}} convergent to a point x ∈ X so that lim_{n→∞} ρ(x_{l_n}, x) = 0. The sequence {l_n} is an increasing sequence and l_m ≥ m. It now follows that

\[ \rho(x_n, x) \le \rho(x_n, x_{l_m}) + \rho(x_{l_m}, x) < \varepsilon + \rho(x_{l_m}, x) \]

whenever m > n ≥ l. Letting m → ∞, we have 0 ≤ ρ(x_n, x) ≤ ε, whenever n > l. Hence, the Cauchy sequence {x_n} converges to x ∈ X. Therefore, X is complete.

In connection with parts (iv) and (v) we note that a totally bounded metric space is not necessarily sequentially compact. We leave the proof of part (v) as an exercise. ■

5.6.16. Exercise. Prove part (v) of Theorem 5.6.15.
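The selection process used in the proof of part (iii) of Theorem 5.6.15 is essentially a greedy algorithm, and it may help to see it run on finite data. The Python sketch below is an illustration under stated assumptions: `greedy_eps_net` is a hypothetical name, the Euclidean metric on R² is used, and the metric space is replaced by a finite list of points, so termination is automatic and says nothing about sequential compactness.

```python
import math


def greedy_eps_net(points, eps, dist=math.dist):
    """Pick points one at a time, each at distance >= eps from all earlier picks.

    When no further point can be chosen, every point of `points` lies within
    distance < eps of some chosen point, so the chosen points form an eps-net.
    """
    net = []
    for p in points:
        if all(dist(p, s) >= eps for s in net):
            net.append(p)
    return net


points = [(i / 10, j / 10) for i in range(11) for j in range(11)]
eps = 0.35
net = greedy_eps_net(points, eps)

# The chosen points are pairwise at least eps apart ...
separated = all(
    math.dist(net[i], net[j]) >= eps
    for i in range(len(net))
    for j in range(i + 1, len(net))
)
# ... and the spheres of radius eps about them cover the whole sample.
covered = all(any(math.dist(p, s) < eps for s in net) for p in points)
```

In the proof the role of the finite list is played by the whole space X, and it is sequential compactness that forces the process to stop after finitely many steps.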
Parts (iii), (iv) and (v) of the above theorem allow us to define a sequentially compact metric space equivalently as a metric space which is complete and totally bounded. We now show that a metric space is sequentially compact if and only if it satisfies the Bolzano-Weierstrass property.

5.6.17. Theorem. A metric space {X; ρ} is sequentially compact if and only if every infinite subset of X has at least one point of accumulation.
Proof. Assume that $Y$ is an infinite subset of a sequentially compact metric space $\{X; \rho\}$. If $\{y_n\}$ is any sequence of distinct points in $Y$, then $\{y_n\}$ contains a convergent subsequence $\{y_{n_k}\}$, because $\{X; \rho\}$ is sequentially compact. The limit $y$ of the subsequence is a point of accumulation of $Y$.

Conversely, assume that $\{X; \rho\}$ is a metric space such that every infinite subset $Y$ of $X$ has a point of accumulation. Let $\{y_n\}$ be any sequence of points in $X$. If a point occurs an infinite number of times in $\{y_n\}$, then this sequence contains a convergent subsequence, a constant subsequence, and we are finished. If this is not the case, then we can assume that all elements of $\{y_n\}$ are distinct. Let $Z$ denote the set of all points $y_n$, $n = 1, 2, \ldots$. By hypothesis, the infinite set $Z$ has at least one point of accumulation. If $z \in X$ is such a point of accumulation, then we can choose a sequence of points of $Z$ which converges to $z$ (see Theorem 5.5.8, part (i)), and this sequence is a subsequence $\{y_{n_k}\}$ of $\{y_n\}$. Therefore, $\{X; \rho\}$ is sequentially compact. This concludes the proof. ■

Our next objective is to show that in metric spaces the concepts of compactness and sequential compactness are equivalent. In doing so we employ the following lemma, the proof of which is left as an exercise.

5.6.18. Lemma. Let $\{X; \rho\}$ be a sequentially compact metric space. If $\{Y_\alpha : \alpha \in A\}$ is an infinite open covering of $\{X; \rho\}$, then there exists a number $\varepsilon > 0$ such that every sphere in $X$ of radius $\varepsilon$ is contained in at least one of the open sets $Y_\alpha$.

5.6.19. Exercise. Prove Lemma 5.6.18.
5.6.20. Theorem. A metric space $\{X; \rho\}$ is compact if and only if it is sequentially compact.
Proof From Theorem 5.6.17, a metric space is sequentially compact if and only if it has the Bolzano-Weierstrass property. Therefore, we first show
that every infinite subset of a compact metric space has a point of accumulation. Let $\{X; \rho\}$ be a compact metric space, and let $Y$ be an infinite subset of $X$. For purposes of contradiction, assume that $Y$ has no point of accumulation. Then each $x \in X$ is the center of a sphere which contains no point of $Y$, except possibly $x$ itself. These spheres form an infinite open covering of $X$. But, by hypothesis, $\{X; \rho\}$ is compact, and therefore we can choose from this infinite covering a finite number of spheres which also cover $X$. Now each sphere from this finite subcovering contains at most one point of $Y$, and therefore $Y$ is finite. But this is contrary to our original assumption, and we have arrived at a contradiction. Therefore, $Y$ has at least one point of accumulation, and $\{X; \rho\}$ is sequentially compact.

Conversely, assume that $\{X; \rho\}$ is a sequentially compact metric space, and let $\{Y_\alpha; \alpha \in A\}$ be an arbitrary infinite open covering of $X$. From Lemma 5.6.18 there exists an $\varepsilon > 0$ such that every sphere in $X$ of radius $\varepsilon$ is contained in at least one of the open sets $Y_\alpha$. Now, by hypothesis, $\{X; \rho\}$ is sequentially compact and is therefore totally bounded by part (iii) of Theorem 5.6.15. Thus, with this $\varepsilon$ fixed we can find a finite $\varepsilon$-net, $\{x_1, x_2, \ldots, x_l\}$, such that

$$X \subset \bigcup_{i=1}^{l} S(x_i; \varepsilon).$$

Now in view of Lemma 5.6.18, $S(x_i; \varepsilon) \subset Y_{\alpha_i}$, $i = 1, \ldots, l$, where the sets $Y_{\alpha_i}$ are from the family $\{Y_\alpha; \alpha \in A\}$. Hence,

$$X \subset \bigcup_{i=1}^{l} Y_{\alpha_i},$$

and $X$ has a finite open subcovering chosen from the infinite open covering $\{Y_\alpha; \alpha \in A\}$. Therefore, the metric space $\{X; \rho\}$ is compact. This proves the theorem. ■
There is yet another way of characterizing a compact metric space. Before doing so, we give the following definition.

5.6.21. Definition. Let $\{F_\alpha : \alpha \in A\}$ be an infinite family of closed sets. The family $\{F_\alpha : \alpha \in A\}$ is said to have the finite intersection property if for every finite set $B \subset A$ the set $\bigcap_{\alpha \in B} F_\alpha$ is not empty.

5.6.22. Theorem. A metric space $\{X; \rho\}$ is compact if and only if every infinite family $\{F_\alpha : \alpha \in A\}$ of closed sets in $X$ with the finite intersection property has a nonvoid intersection; i.e., $\bigcap_{\alpha \in A} F_\alpha \ne \emptyset$.

5.6.23. Exercise. Prove Theorem 5.6.22.
We now summarize the above results as follows.
5.6.24. Theorem. In a metric space $\{X; \rho\}$ the following are equivalent:

(i) $\{X; \rho\}$ is compact;
(ii) $\{X; \rho\}$ is sequentially compact;
(iii) $\{X; \rho\}$ possesses the Bolzano-Weierstrass property;
(iv) $\{X; \rho\}$ is complete and totally bounded; and
(v) every infinite family of closed sets in $\{X; \rho\}$ with the finite intersection property has a nonvoid intersection.
Concerning product spaces we offer the following exercise.
5.6.25. Exercise. Let $\{X_1; \rho_1\}, \{X_2; \rho_2\}, \ldots, \{X_n; \rho_n\}$ be $n$ compact metric spaces. Let $X = X_1 \times X_2 \times \cdots \times X_n$, and let

$$\rho(x, y) = \rho_1(x_1, y_1) + \cdots + \rho_n(x_n, y_n), \qquad (5.6.26)$$

where $x_i, y_i \in X_i$, $i = 1, \ldots, n$, and where $x, y \in X$. Show that the product space $\{X; \rho\}$ is also a compact metric space.
The next result constitutes an important characterization of compact sets in the spaces $R^n$ and $C^n$.

5.6.27. Theorem. Let $\{R^n; \rho_2\}$ (let $\{C^n; \rho_2\}$) be the metric space defined in Example 5.3.1. A set $Y \subset R^n$ (a set $Y \subset C^n$) is compact if and only if it is closed and bounded.

5.6.28. Exercise. Prove Theorem 5.6.27.
Recall that every non-void compact set in the real line $R$ contains its infimum and its supremum. In general, it is not an easy task to apply the results of Theorem 5.6.24 to specific spaces in order to establish necessary and sufficient conditions for compactness. From the point of view of applications, criteria such as those established in Theorem 5.6.27 are much more desirable. We now give a condition which tells us when a subset of a metric space is compact. We have:

5.6.29. Theorem. Let $\{X; \rho\}$ be a compact metric space, and let $Y \subset X$. If $Y$ is closed, then $Y$ is compact.

Proof. Let $\{Y_\alpha; \alpha \in A\}$ be any open covering of $Y$; i.e., each $Y_\alpha$ is open relative to $\{Y; \rho\}$. Then, by Theorem 5.4.20, for each $Y_\alpha$ there is a $U_\alpha$ which is open relative to $\{X; \rho\}$ such that $Y_\alpha = Y \cap U_\alpha$. Since $Y$ is closed, its complement $Y^{\sim}$ is an open set in $\{X; \rho\}$. Also, since $X = Y \cup Y^{\sim}$, the set $Y^{\sim}$ together with the family $\{U_\alpha : \alpha \in A\}$ is an open covering of $X$. Since $X$ is compact, it is possible to find a finite subcovering from this family; i.e., there is a finite set $B \subset A$ such that

$$X = Y^{\sim} \cup \Big[\bigcup_{\alpha \in B} U_\alpha\Big].$$

Since $Y \subset \bigcup_{\alpha \in B} U_\alpha$, we have $Y = \bigcup_{\alpha \in B} (Y \cap U_\alpha)$; i.e., $\{Y_\alpha; \alpha \in B\}$ covers $Y$. This implies that $Y$ is compact. ■
We close the present section by introducing the concept of relative compactness.

5.6.30. Definition. Let $\{X; \rho\}$ be a metric space and let $Y \subset X$. The subset $Y$ is said to be relatively compact in $X$ if its closure $\bar{Y}$ is a compact subset of $X$.

One of the essential features of a relatively compact set is that every sequence has a convergent subsequence, just as in the case of compact subsets; however, the limit of the subsequence need not be in the subset. Thus, we have the following result.

5.6.31. Theorem. Let $\{X; \rho\}$ be a metric space and let $Y \subset X$. Then $Y$ is relatively compact in $X$ if and only if every sequence of elements in $Y$ contains a subsequence which converges to some $x \in X$.

Proof. Let $Y$ be relatively compact in $X$, and let $\{y_n\}$ be any sequence in $Y$. Then $\{y_n\}$ belongs to $\bar{Y}$ also and hence has a convergent subsequence in $\bar{Y}$, since $\bar{Y}$ is sequentially compact. Hence, $\{y_n\}$ contains a subsequence which converges to an element $x \in \bar{Y} \subset X$.

Conversely, let $\{y_n\}$ be a sequence in $\bar{Y}$. Then for each $n = 1, 2, \ldots$, there is an $x_n \in Y$ such that $\rho(x_n, y_n) < 1/n$. Since $\{x_n\}$ is a sequence in $Y$, it contains a convergent subsequence, say $\{x_{n_k}\}$, which converges to some $x \in X$. Since $\{x_{n_k}\}$ is also in $\bar{Y}$, it follows from part (iii) of Theorem 5.5.8 that $x \in \bar{Y}$. Moreover, since $\rho(x_{n_k}, y_{n_k}) < 1/n_k$, the subsequence $\{y_{n_k}\}$ also converges to $x$. Hence, $\bar{Y}$ is sequentially compact, and so $Y$ is relatively compact in $X$. ■
5.7. CONTINUOUS FUNCTIONS
Having introduced the concept of metric space, we are in a position to give a generalization of the concept of continuity of functions encountered in calculus.

5.7.1. Definition. Let $\{X; \rho_x\}$ and $\{Y; \rho_y\}$ be two metric spaces, and let $f: X \to Y$ be a mapping of $X$ into $Y$. The mapping $f$ is said to be continuous at the point $x_0 \in X$ if for every $\varepsilon > 0$ there is a $\delta > 0$ such that

$$\rho_y[f(x), f(x_0)] < \varepsilon$$

whenever $\rho_x(x, x_0) < \delta$. The mapping $f$ is said to be continuous on $X$ or simply continuous if it is continuous at each point $x \in X$.
We note that in the above definition the $\delta$ is dependent on the choice of $x_0$ and $\varepsilon$; i.e., $\delta = \delta(\varepsilon, x_0)$. Now if for each $\varepsilon > 0$ there exists a $\delta = \delta(\varepsilon) > 0$ such that for any $x_0$ we have $\rho_y[f(x), f(x_0)] < \varepsilon$ whenever $\rho_x(x, x_0) < \delta$, then we say that the function $f$ is uniformly continuous on $X$. Henceforth, if we simply say $f$ is continuous, we mean $f$ is continuous on $X$.

5.7.2. Example. Let $\{X; \rho_x\} = R_2^n$, and let $\{Y; \rho_y\} = R_2^m$ (see Example 5.3.1). Let $A$ denote the real matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}.$$

We denote $x \in R^n$ and $y \in R^m$ by $x = [\xi_1, \ldots, \xi_n]^T$ and $y = [\eta_1, \ldots, \eta_m]^T$. Let us define the function $f: R^n \to R^m$ by

$$f(x) = Ax$$

for each $x \in R^n$. We now show that $f$ is continuous on $R^n$. If $x, x_0 \in R^n$ and $y, y_0 \in R^m$ are such that $y = f(x)$ and $y_0 = f(x_0)$, then we have

$$[\rho_y(y, y_0)]^2 = \sum_{i=1}^{m} \Big[\sum_{j=1}^{n} a_{ij}(\xi_j - \xi_{0j})\Big]^2.$$

Using the Schwarz inequality, it follows that

$$[\rho_y(y, y_0)]^2 \le \Big[\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2\Big] \Big[\sum_{j=1}^{n} (\xi_j - \xi_{0j})^2\Big].$$

Now let $M = \big\{\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2\big\}^{1/2} \ne 0$ (if $M = 0$ then we are done). Given any $\varepsilon > 0$ and choosing $\delta = \varepsilon / M$, it follows that $\rho_y(y, y_0) < \varepsilon$ whenever $\rho_x(x, x_0) < \delta$, and any mapping $f: R^n \to R^m$ which is represented by a real, constant $(m \times n)$ matrix $A$ is continuous on $R^n$. ■
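The estimate in Example 5.7.2 can be checked numerically. The sketch below is illustrative only: the matrix `A`, the tolerance `eps`, the sampling ranges, and the helper names are hypothetical and not taken from the text. It computes $M$ as the square root of the sum of squared entries, sets $\delta = \varepsilon / M$, and confirms on random samples that points closer than $\delta$ are mapped to points closer than $\varepsilon$.

```python
import math
import random


def matvec(A, x):
    # Product Ax for a matrix stored as a list of rows.
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rho2(u, v):
    # Euclidean metric rho_2.
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


def bound_constant(A):
    # M = (sum of a_ij^2)^(1/2), as in the example.
    return math.sqrt(sum(a * a for row in A for a in row))


A = [[1.0, -2.0, 0.5], [0.0, 3.0, 1.0]]  # hypothetical 2 x 3 matrix
M = bound_constant(A)
eps = 1e-3
delta = eps / M

rng = random.Random(0)  # fixed seed so the run is repeatable
bound_holds = True
for _ in range(1000):
    x0 = [rng.uniform(-5.0, 5.0) for _ in range(3)]
    direction = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    length = math.sqrt(sum(t * t for t in direction)) or 1.0
    step = 0.999 * delta * rng.random() / length  # keeps rho2(x, x0) < delta
    x = [a + step * t for a, t in zip(x0, direction)]
    if rho2(matvec(A, x), matvec(A, x0)) >= eps:
        bound_holds = False
```

Because $\delta$ does not depend on $x_0$, the same computation also suggests that $f$ is uniformly continuous on $R^n$.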
5.7.3. Example. Let $\{X; \rho_x\} = \{Y; \rho_y\} = \{C[a, b]; \rho_2\}$, the metric space defined in Example 5.3.12, and let us define a function $f: X \to Y$ in the following way. For $x \in X$, $y = f(x)$ is given by

$$y(t) = \int_a^b k(t, s)\, x(s)\, ds, \qquad t \in [a, b],$$

where $k: R^2 \to R$ is continuous in the usual sense, i.e., with respect to the metric spaces $R_2^2$ and $R_1^1$. We now show that $f$ is continuous on $X$. Let $x, x_0 \in X$ and $y, y_0 \in Y$ be such that $y = f(x)$ and $y_0 = f(x_0)$. Then

$$[\rho_y(y, y_0)]^2 = \int_a^b \Big\{\int_a^b k(t, s)[x(s) - x_0(s)]\, ds\Big\}^2 dt.$$

It follows from Hölder's inequality for integrals (5.2.5) that $\rho_y(y, y_0) \le M \rho_x(x, x_0)$, where $M = \big\{\int_a^b \int_a^b k^2(t, s)\, ds\, dt\big\}^{1/2}$. Hence, for any $\varepsilon > 0$, $\rho_y(y, y_0) < \varepsilon$ whenever $\rho_x(x, x_0) < \delta$, where $\delta = \varepsilon / M$.
0 such that for each $\delta > 0$ there is an $x$ with the property that $\rho_x(x, x_0) < \delta$ and $\rho_y(f(x), f(x_0)) \ge \varepsilon$. This implies that for each positive integer $n$ there is an $x_n$ such that $\rho_x(x_n, x_0) < 1/n$ and $\rho_y(f(x_n), f(x_0)) \ge \varepsilon$ for all $n$; i.e., $x_n \to x_0$ but $\{f(x_n)\}$ does not converge to $f(x_0)$. But we assumed that $f(x_n) \to f(x_0)$ whenever $x_n \to x_0$. Hence, we have arrived at a contradiction, and $f$ must be continuous at $x_0$. This concludes the proof of our theorem. ■
Continuous mappings on metric spaces possess the following important properties.
5.7.9. Theorem. Let $\{X; \rho_x\}$ and $\{Y; \rho_y\}$ be two metric spaces, and let $f$ be a mapping of $X$ into $Y$. Then

(i) $f$ is continuous on $X$ if and only if the inverse image of each open subset of $\{Y; \rho_y\}$ is open in $\{X; \rho_x\}$; and
(ii) $f$ is continuous on $X$ if and only if the inverse image of each closed subset of $\{Y; \rho_y\}$ is closed in $\{X; \rho_x\}$.

Proof. Let $f$ be continuous on $X$, and let $V \ne \emptyset$ be an open subset of $\{Y; \rho_y\}$. Let $U = f^{-1}(V)$. If $U = \emptyset$, then $U$ is trivially open, so assume $U \ne \emptyset$. Now let $x \in U$. Then there exists a unique $y = f(x) \in V$. Since $V$ is open, there is a sphere $S(y; \varepsilon)$ which is entirely contained in $V$. Since $f$ is continuous at $x$, there is a sphere $S(x; \delta)$ such that its image $f(S(x; \delta))$ is entirely contained in $S(y; \varepsilon)$ and therefore in $V$. But from this it follows that $S(x; \delta) \subset U$. Hence, every $x \in U$ is the center of a sphere which is contained in $U$. Therefore, $U$ is open.

Conversely, assume that the inverse image of each non-empty open subset of $Y$ is open. For arbitrary $x \in X$ we have $y = f(x)$. Since $S(y; \varepsilon) \subset Y$ is open, the set $f^{-1}(S(y; \varepsilon))$ is open for every $\varepsilon > 0$, and $x \in f^{-1}(S(y; \varepsilon))$. Hence, there is a sphere $S(x; \delta)$ such that $S(x; \delta) \subset f^{-1}(S(y; \varepsilon))$. From this it follows that for every $\varepsilon > 0$ there is a $\delta > 0$ such that $f(S(x; \delta)) \subset S(y; \varepsilon)$. Therefore, $f$ is continuous at $x$. But $x \in X$ was arbitrarily chosen. Hence, $f$ is continuous on $X$. This concludes the proof of part (i).

To prove part (ii) we utilize part (i) and take complements of open sets. ■
The reader is cautioned that the image of an open subset of $X$ under a continuous mapping $f: X \to Y$ is not necessarily an open subset of $Y$. For example, let $f: R \to R$ be defined by $f(x) = x^2$ for every $x \in R$. Clearly, $f$ is continuous on $R$. Yet the image of the open interval $(-1, 1)$ is the interval $[0, 1)$. But the interval $[0, 1)$ is not open. We leave the proof of the next result as an exercise to the reader.
5.7.10. Theorem. Let $\{X; \rho_x\}$, $\{Y; \rho_y\}$, and $\{Z; \rho_z\}$ be metric spaces, let $f$ be a mapping of $X$ into $Y$, and let $g$ be a mapping of $Y$ into $Z$. If $f$ is continuous on $X$ and $g$ is continuous on $Y$, then the composite mapping $h = g \circ f$ of $X$ into $Z$ is continuous on $X$.

5.7.11. Exercise. Prove Theorem 5.7.10.
For continuous mappings on compact spaces we state and prove the following result.

5.7.12. Theorem. Let $\{X; \rho_x\}$ and $\{Y; \rho_y\}$ be two metric spaces, and let $f: X \to Y$ be continuous on $X$.

(i) If $\{X; \rho_x\}$ is compact, then $f(X)$ is a compact subset of $\{Y; \rho_y\}$.
(ii) If $U$ is a compact subset of the metric space $\{X; \rho_x\}$, then $f(U)$ is a compact subset of the metric space $\{Y; \rho_y\}$.
(iii) If $\{X; \rho_x\}$ is compact and if $U$ is a closed subset of $X$, then $f(U)$ is a closed subset of $\{Y; \rho_y\}$.
(iv) If $\{X; \rho_x\}$ is compact, then $f$ is uniformly continuous on $X$.
Proof. To prove part (i), let $\{y_n\}$ be a sequence in $f(X)$. Then there are points $\{x_n\}$ in $X$ such that $y_n = f(x_n)$. Since $\{X; \rho_x\}$ is compact we can find a subsequence $\{x_{n_k}\}$ of $\{x_n\}$ which converges to a point in $X$; i.e., $x_{n_k} \to x$. In view of Theorem 5.7.8 we have, since $f$ is continuous at $x$, $f(x_{n_k}) \to f(x) \in f(X)$. From this it follows that the sequence $\{y_n\}$ has a convergent subsequence and $f(X)$ is compact.

To prove part (ii), let $U$ be a compact subset of $X$. Then $\{U; \rho_x\}$ is a compact metric space. In view of part (i) it now follows that $f(U)$ is also a compact subset of the metric space $\{Y; \rho_y\}$.

To prove part (iii), we first observe that a closed subset $U$ of a compact metric space $\{X; \rho_x\}$ is itself compact and $\{U; \rho_x\}$ is itself a compact metric space. In view of part (ii), $f(U)$ is a compact subset of the metric space $\{Y; \rho_y\}$ and as such is bounded and closed.

To prove part (iv), let $\varepsilon > 0$. For every $x \in X$, there is some positive number, $\eta(x)$, such that $f(S(x; 2\eta(x))) \subset S(f(x); \varepsilon/2)$. Now the family $\{S(x; \eta(x)) : x \in X\}$ is an open covering of $X$. Since $X$ is compact, there is a finite set, say $F \subset X$, such that $\{S(x; \eta(x)) : x \in F\}$ is a covering of $X$. Now let $\delta = \min\{\eta(x) : x \in F\}$. Since $F$ is a finite set, $\delta$ is some positive number. Now let $x, y \in X$ be such that $\rho_x(x, y) < \delta$. Choose $z \in F$ such that $x \in S(z; \eta(z))$. Since $\delta \le \eta(z)$, $y \in S(z; 2\eta(z))$. Since $f(S(z; 2\eta(z))) \subset S(f(z); \varepsilon/2)$, it follows that $f(x)$ and $f(y)$ are in $S(f(z); \varepsilon/2)$. Hence, $\rho_y(f(x), f(y)) < \varepsilon$. Since $\delta$ does not depend on $x \in X$, $f$ is uniformly continuous on $X$. This completes the proof of the theorem. ■

Let us next consider some additional generalizations of concepts encountered in the calculus.

5.7.13. Definition. Let $\{X; \rho_x\}$ and $\{Y; \rho_y\}$ be metric spaces, and let $\{f_n\}$ be a sequence of functions from $X$ into $Y$. If $\{f_n(x)\}$ converges at each $x \in X$, then we say that $\{f_n\}$ is pointwise convergent.
In this case we write $\lim_n f_n = f$, where $f$ is defined for every $x \in X$. Equivalently, we say that the sequence $\{f_n\}$ is pointwise convergent to a function $f$ if for every $\varepsilon > 0$ and for every $x \in X$ there is an integer $N = N(\varepsilon, x)$ such that

$$\rho_y(f_n(x), f(x)) < \varepsilon$$

whenever $n > N(\varepsilon, x)$. In general, $N(\varepsilon, x)$ is not necessarily bounded. However, if $N(\varepsilon, x)$ is bounded for all $x \in X$, then we say that the sequence $\{f_n\}$ converges to $f$ uniformly on $X$. Let $M(\varepsilon) = \sup_{x \in X} N(\varepsilon, x) < \infty$. Equivalently, we say that the sequence $\{f_n\}$ converges uniformly to $f$ on $X$ if for every $\varepsilon > 0$ there is an $M(\varepsilon)$ such that

$$\rho_y(f_n(x), f(x)) < \varepsilon$$

whenever $n > M(\varepsilon)$ for all $x \in X$.

In the next result a connection between uniform convergence of functions and continuity is established. (We used a special case of this result in the proof of Example 5.5.28.)

5.7.14. Theorem. Let $\{X; \rho_x\}$ and $\{Y; \rho_y\}$ be two metric spaces, and let $\{f_n\}$ be a sequence of functions from $X$ into $Y$ such that $f_n$ is continuous on $X$ for each $n$. If the sequence $\{f_n\}$ converges uniformly to $f$ on $X$, then $f$ is continuous on $X$.

Proof. Assume that the sequence $\{f_n\}$ converges uniformly to $f$ on $X$. Then for every $\varepsilon > 0$ there is an $N$ such that $\rho_y(f_n(x), f(x)) < \varepsilon$ whenever $n > N$ for all $x \in X$. If $M > N$ is a fixed integer then $f_M$ is continuous on $X$. Letting $x_0 \in X$ be fixed, we can find a $\delta > 0$ such that $\rho_y(f_M(x), f_M(x_0)) < \varepsilon$ whenever $\rho_x(x, x_0) < \delta$. Therefore, we have

$$\rho_y(f(x), f(x_0)) \le \rho_y(f(x), f_M(x)) + \rho_y(f_M(x), f_M(x_0)) + \rho_y(f_M(x_0), f(x_0)) < 3\varepsilon,$$

whenever $\rho_x(x, x_0) < \delta$. From this it follows that $f$ is continuous at $x_0$. Since $x_0$ was arbitrarily chosen, $f$ is continuous at all $x \in X$. This proves the theorem. ■

The reader will recognize in the last result of the present section several generalizations from the calculus to real-valued functions defined on metric spaces.

5.7.15. Theorem. Let $\{X; \rho_x\}$ be a metric space, and let $\{R; \rho\}$ denote the real line $R$ with the usual metric. Let $f: X \to R$, and let $U \subset X$. If $f$ is continuous on $X$ and if $U$ is a compact subset of $\{X; \rho_x\}$, then

(i) $f$ is uniformly continuous on $U$;
(ii) $f$ is bounded on $U$; and
(iii) if $U \ne \emptyset$, $f$ attains its infimum and supremum on $U$; i.e., there exist $x_0, x_1 \in U$ such that $f(x_0) = \inf\{f(x) : x \in U\}$ and $f(x_1) = \sup\{f(x) : x \in U\}$.
Proof. Part (i) follows from part (iv) of Theorem 5.7.12. Since $U$ is a compact subset of $X$ it follows that $f(U)$ is a compact subset of $R$. Thus, $f(U)$ is bounded and closed. From this it follows that $f$ is bounded. To prove part (iii), note that if $U$ is a non-empty compact subset of $\{X; \rho_x\}$, then $f(U)$ is a non-empty compact subset of $R$. This implies that $f$ attains its infimum and supremum on $U$. ■
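The difference between pointwise and uniform convergence, and the role of Theorem 5.7.14, can be seen on a standard example that is not part of the text and is offered here only as an illustrative sketch. On $[0, 1]$ the continuous functions $f_n(x) = x^n$ converge pointwise to a limit that is $0$ for $x < 1$ and $1$ at $x = 1$; since this limit is discontinuous, Theorem 5.7.14 rules out uniform convergence. By contrast, $g_n(x) = x/n$ converges uniformly to $0$. The grid below only estimates the supremum distance from below, and all names are hypothetical.

```python
def sup_distance(f, g, grid):
    # Largest value of |f(x) - g(x)| over the sample points.
    return max(abs(f(x) - g(x)) for x in grid)


def power_limit(x):
    # Pointwise limit of x**n on [0, 1].
    return 1.0 if x == 1.0 else 0.0


grid = [k / 1000 for k in range(1001)]  # sample points in [0, 1]

power_gaps = []   # distances between x**n and its pointwise limit
scaled_gaps = []  # distances between x/n and the zero function
for n in (5, 50, 500):
    power_gaps.append(sup_distance(lambda x: x ** n, power_limit, grid))
    scaled_gaps.append(sup_distance(lambda x: x / n, lambda x: 0.0, grid))
```

The entries of `scaled_gaps` equal $1/n$ and shrink to zero, while the entries of `power_gaps` stay well away from zero on this grid, which is consistent with the convergence of $\{f_n\}$ being pointwise but not uniform.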
5.8. SOME IMPORTANT RESULTS IN APPLICATIONS
In this section we present two results which are used widely in applications. The first of these is called the fixed point principle while the second is known as the Ascoli-Arzela theorem. Both of these results are widely utilized, for example, in establishing existence and uniqueness of solutions of various types of equations (ordinary differential equations, integral equations, algebraic equations, functional differential equations, and the like). We begin by considering a special class of continuous mappings on metric spaces, so-called contraction mappings.

5.8.1. Definition. Let $\{X; \rho\}$ be a metric space and let $f: X \to X$. The function $f$ is said to be a contraction mapping if there exists a real number $c$ such that $0 < c < 1$ and for all $x, y \in X$,

$$\rho(f(x), f(y)) \le c\,\rho(x, y). \qquad (5.8.2)$$

The reader can readily verify the following result.

5.8.3. Theorem. Every contraction mapping is uniformly continuous on $X$.

5.8.4. Exercise. Prove Theorem 5.8.3.
The following result is known as the fixed point principle or the principle of contraction mappings.

5.8.5. Theorem. Let $\{X; \rho\}$ be a complete metric space, and let $f$ be a contraction mapping of $X$ into $X$. Then

(i) there exists a unique point $x_0 \in X$ such that

$$f(x_0) = x_0, \qquad (5.8.6)$$

and
(ii) for any $x_1 \in X$, the sequence $\{x_n\}$ in $X$ defined by

$$x_{n+1} = f(x_n), \qquad n = 1, 2, \ldots \qquad (5.8.7)$$

converges to the unique element $x_0$ given in (5.8.6).
The unique point $x_0$ satisfying Eq. (5.8.6) is called a fixed point of $f$. In this case we say that $x_0$ is obtained by the method of successive approximations.

Proof. We first show that if there is an $x_0 \in X$ satisfying (5.8.6), then it must be unique. Suppose that $x_0$ and $y_0$ satisfy (5.8.6). Then by inequality (5.8.2), we have $\rho(x_0, y_0) \le c\,\rho(x_0, y_0)$. Since $0 < c < 1$, it follows that $\rho(x_0, y_0) = 0$ and therefore $x_0 = y_0$.

Now let $x_1$ be any point in $X$. We want to show that the sequence $\{x_n\}$ generated by Eq. (5.8.7) is a Cauchy sequence. For any $n > 1$, we have $\rho(x_{n+1}, x_n) \le c\,\rho(x_n, x_{n-1})$. By induction we see that $\rho(x_{n+1}, x_n) \le c^{n-1} \rho(x_2, x_1)$ for $n = 1, 2, \ldots$. Thus, for any $m > n$ we have

$$\rho(x_m, x_n) \le \sum_{k=n}^{m-1} \rho(x_{k+1}, x_k) \le \frac{c^{n-1}}{1 - c}\, \rho(x_2, x_1).$$
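The method of successive approximations in Eq. (5.8.7), together with the estimate just displayed, can be sketched in a few lines. This is a hedged illustration and not the text's own algorithm: the function name `successive_approximations`, the mapping $f(x) = \cos x$ on $[0, 1]$, the starting point, and the tolerance are hypothetical choices. For this mapping, $c = \sin 1 < 1$ bounds $|f'(x)|$ on $[0, 1]$ and $f$ maps $[0, 1]$ into itself, so $f$ is a contraction there. Letting $m \to \infty$ in the estimate gives the stopping rule $c^{n-1} \rho(x_2, x_1)/(1 - c) < \text{tol}$.

```python
import math


def successive_approximations(f, x1, c, tol):
    """Iterate x_{n+1} = f(x_n), stopping once the a priori bound
    c**(n-1) / (1 - c) * |x_2 - x_1| on the distance from x_n to the
    fixed point drops below tol. Returns (x_n, n)."""
    first_step = abs(f(x1) - x1)  # rho(x_2, x_1) on the real line
    n, x = 1, x1
    while c ** (n - 1) / (1.0 - c) * first_step >= tol:
        x = f(x)
        n += 1
    return x, n


# Hypothetical example: f(x) = cos(x) on [0, 1], contraction constant sin(1).
c = math.sin(1.0)
fixed_point, steps = successive_approximations(math.cos, 0.5, c, 1e-8)
residual = abs(math.cos(fixed_point) - fixed_point)
```

The a priori bound is usually pessimistic, so the returned iterate tends to be closer to the fixed point than `tol` requires.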
I< Ifx t(t)
-
N, we have fX t(t,)
I + IfX t(t,)
-
"x ,(t,)
+
I IX",(t,) -
"x ,(t)1
N. Therefore, $\{x_k\}$ is a Cauchy sequence in $C[a, b]$. Since $\{C[a, b]; \rho_\infty\}$ is a complete metric space (see Example 5.5.28), $\{x_k\}$ converges to some point in $C[a, b]$. This implies that $\{y_k\}$ has a subsequence which converges to a point in $C[a, b]$ and so, by Theorem 5.6.31, $Y$ is relatively compact in $C[a, b]$. This completes the proof of the theorem. ■

Our next result follows directly from Theorem 5.8.12. It is sometimes referred to as Ascoli's lemma.

5.8.13. Corollary. Let $\{\varphi_n\}$ be a sequence of functions in $\{C[a, b]; \rho_\infty\}$. If $\{\varphi_n\}$ is equicontinuous on $[a, b]$ and uniformly bounded on $[a, b]$ (i.e., there exists an $M > 0$ such that $\sup_{a \le t \le b} |\varphi_n(t)| \le M$ for all $n$), then there exists a $\varphi \in C[a, b]$ and a subsequence $\{\varphi_{n_k}\}$ of $\{\varphi_n\}$ such that $\{\varphi_{n_k}\}$ converges to $\varphi$ uniformly on $[a, b]$.

5.8.14. Exercise. Prove Corollary 5.8.13.
We close the present section with the following converse to Theorem 5.8.12.

5.8.15. Theorem. Let $Y$ be a subset of $C[a, b]$ which is relatively compact in the metric space $\{C[a, b]; \rho_\infty\}$. Then $Y$ is a bounded set and is equicontinuous on $[a, b]$.

5.8.16. Exercise. Prove Theorem 5.8.15.

5.9. EQUIVALENT AND HOMEOMORPHIC METRIC SPACES. TOPOLOGICAL SPACES
It is possible that seemingly different metric spaces may exhibit properties which are very similar with regard to such concepts as open sets, limits of sequences, and continuity of functions. For example, for each $p$, $1 \le p < \infty$, the spaces $R_p^n$ (see Examples 5.3.1, 5.3.3) are different metric spaces. However, it turns out that the family of all open sets is the same in all of these metric spaces for $1 \le p < \infty$ (e.g., the family of open sets in $R_1^n$ is the same as the family of open sets in $R_2^n$, which is the same as the family of open sets in $R_3^n$, etc.). Furthermore, metric spaces which are not even defined on the same underlying set (e.g., the metric spaces $\{X; \rho_x\}$ and $\{Y; \rho_y\}$, where $X \ne Y$) may have many similar properties of the type mentioned above. We begin with equivalence of metric spaces defined on the same underlying set.
5.9.1. Definition. Let $\{X; \rho_1\}$ and $\{X; \rho_2\}$ be two metric spaces defined on the same underlying set $X$. Let $\mathfrak{I}_1$ and $\mathfrak{I}_2$ be the topology of $X$ determined by $\rho_1$ and $\rho_2$, respectively. Then the metrics $\rho_1$ and $\rho_2$ are said to be equivalent metrics if $\mathfrak{I}_1 = \mathfrak{I}_2$.

Throughout the present section we use the notation

$$f: \{X; \rho_1\} \to \{Y; \rho_2\}$$

to indicate a mapping from $X$ into $Y$, where the metric on $X$ is $\rho_1$ and the metric on $Y$ is $\rho_2$. This distinction becomes important in the case where $X = Y$, i.e., in the case $f: \{X; \rho_1\} \to \{X; \rho_2\}$. Let us denote by $i$ the identity mapping from $X$ onto $X$; i.e., $i(x) = x$ for all $x \in X$. Clearly, $i$ is a bijective mapping, and the inverse is simply $i$ itself. However, since the domain and range of $i$ may have different metrics associated with them, we shall write

$$i: \{X; \rho_1\} \to \{X; \rho_2\}$$

and

$$i^{-1}: \{X; \rho_2\} \to \{X; \rho_1\}.$$
With the foregoing statements in mind, we provide in the following theorem a number of equivalent statements to characterize equivalent metrics.

5.9.2. Theorem. Let $\{X; \rho_1\}$, $\{X; \rho_2\}$, and $\{Y; \rho_3\}$ be metric spaces. Then the following statements are equivalent:

(i) $\rho_1$ and $\rho_2$ are equivalent metrics;
(ii) for any mapping $f: X \to Y$, $f: \{X; \rho_1\} \to \{Y; \rho_3\}$ is continuous on $X$ if and only if $f: \{X; \rho_2\} \to \{Y; \rho_3\}$ is continuous on $X$;
(iii) the mapping $i: \{X; \rho_1\} \to \{X; \rho_2\}$ is continuous on $X$, and the mapping $i^{-1}: \{X; \rho_2\} \to \{X; \rho_1\}$ is continuous on $X$; and
(iv) for any sequence $\{x_n\}$ in $X$, $\{x_n\}$ converges to a point $x$ in $\{X; \rho_1\}$ if and only if $\{x_n\}$ converges to $x$ in $\{X; \rho_2\}$.

Proof. To prove this theorem we show that statement (i) implies statement (ii); that statement (ii) implies statement (iii); that statement (iii) implies statement (iv); and that statement (iv) implies statement (i).

To show that (i) implies (ii), assume that $\rho_1$ and $\rho_2$ are equivalent metrics, and let $f$ be any continuous mapping from $\{X; \rho_1\}$ into $\{Y; \rho_3\}$. Let $U$ be any open set in $\{Y; \rho_3\}$. Since $f$ is continuous, $f^{-1}(U)$ is an open set in $\{X; \rho_1\}$. Since $\rho_1$ and $\rho_2$ are equivalent metrics, $f^{-1}(U)$ is also an open set in $\{X; \rho_2\}$. Hence, the mapping $f: \{X; \rho_2\} \to \{Y; \rho_3\}$ is continuous. The proof of the converse in statement (ii) is identical.

We now show that (ii) implies (iii). Clearly, the mapping $i: \{X; \rho_2\} \to \{X; \rho_2\}$ is continuous. Now assume the validity of statement (ii), and let $\{Y; \rho_3\} = \{X; \rho_2\}$. Then $i: \{X; \rho_1\} \to \{X; \rho_2\}$ is continuous. Again, it is clear that $i^{-1}: \{X; \rho_1\} \to \{X; \rho_1\}$ is continuous. Letting $\{Y; \rho_3\} = \{X; \rho_1\}$ in statement (ii), it follows that $i^{-1}: \{X; \rho_2\} \to \{X; \rho_1\}$ is continuous.

Next, we show that (iii) implies (iv). Let $i: \{X; \rho_1\} \to \{X; \rho_2\}$ be continuous, and let the sequence $\{x_n\}$ in the metric space $\{X; \rho_1\}$ converge to $x$. By Theorem 5.7.8, $\lim_n i(x_n) = i(x)$; i.e., $\lim_n x_n = x$ in $\{X; \rho_2\}$. The converse is proven in the same manner.

Finally, we show that (iv) implies (i). Let $U$ be an open set in $\{X; \rho_1\}$. Then $U^{\sim}$ is closed in $\{X; \rho_1\}$. Now let $\{x_n\}$ be a sequence in $U^{\sim}$ which converges to $x$ in $\{X; \rho_1\}$. Then $x \in U^{\sim}$ by part (iii) of Theorem 5.5.8. By assumption, $\{x_n\}$ converges to $x$ in $\{X; \rho_2\}$ also. Furthermore, since $x \in U^{\sim}$, $U^{\sim}$ is closed in $\{X; \rho_2\}$, by part (iii) of Theorem 5.5.8. Hence, $U$ is open in $\{X; \rho_2\}$. Letting $U$ be an open set in $\{X; \rho_2\}$, by the same reasoning we conclude that $U$ is open in $\{X; \rho_1\}$. Thus, $\rho_1$ and $\rho_2$ are equivalent metrics. This concludes the proof of the theorem. ■

The next result establishes sufficient conditions for two metrics to be equivalent. These conditions are not necessary, however.
5.9.3. Theorem. Let ;X{ PI} and ;X { pz} be two metric spaces. If there exist two positive real numbers, Y' and A, such that lpz ( x ,
for all ,x y
5.9.4.
E
,X
Exercise.
y)
a[ , b] and where a[ , b] is a closed interval of R. Let L . and assume that f satisfies the condition
If(x z ) -
f(x l )!
0,
(5.10.3)
for all $x_1, x_2 \in [a, b]$. In this case $f$ is said to satisfy a Lipschitz condition, and $L$ is called a Lipschitz constant. Now consider the complete metric space $\{R; \rho\}$, where $\rho$ denotes the usual metric on the real line. Then $\{[a, b]; \rho\}$ is a complete metric subspace of $\{R; \rho\}$ (see Theorem 5.5.33). If in (5.10.3) we assume that $L < 1$, then $f$ is clearly a contraction mapping, and Theorem 5.8.5 applies. It follows that if $L < 1$, then Eq. (5.10.2) possesses a unique solution. Specifically, if $x_0 \in [a, b]$, then the sequence $\{x_n\}$, $n = 1, 2, \ldots$, determined by $x_n = f(x_{n-1})$ converges to the unique solution of Eq. (5.10.2). Note that if $|df(x)/dx| = |f'(x)| \le c < 1$ on the interval $[a, b]$ (in this case $f'(a)$ denotes the right-hand derivative of $f$ at $a$, and $f'(b)$ denotes the left-hand derivative of $f$ at $b$), then $f$ is clearly a contraction. In Figures J and K the applicability of the contraction mapping principle is demonstrated pictorially. As indicated, the sequence $\{x_n\}$ determined by successive approximations converges to the fixed point $x$.

5.10.4. Figure J. Successive approximations (convergent case).

5.10.5. Figure K. Successive approximations (convergent case).

In our next example we consider a system of linear equations.

5.10.6. Example.
Consider the system of $n$ linear equations

$$\xi_i = \sum_{j=1}^{n} a_{ij} \xi_j + \beta_i, \qquad i = 1, \ldots, n. \qquad (5.10.7)$$

Assume that $x = (\xi_1, \ldots, \xi_n) \in R^n$, $b = (\beta_1, \ldots, \beta_n) \in R^n$, and $a_{ij} \in R$. Here the constants $a_{ij}$, $\beta_i$ are known and the $\xi_i$ are unknown. In the following we use the contraction mapping principle to determine conditions for the existence and uniqueness of solutions of Eq. (5.10.7). In doing so we consider different metric spaces. In all cases we let

$$y = f(x)$$

denote the mapping determined by the system of linear equations

$$\eta_i = \sum_{j=1}^{n} a_{ij} \xi_j + \beta_i, \qquad i = 1, \ldots, n,$$

where $y = (\eta_1, \ldots, \eta_n) \in R^n$. First we consider the complete space $\{R^n; \rho_1\} = R_1^n$. Let $y' = f(x')$, $y'' = f(x'')$, $x' = (\xi_1', \ldots, \xi_n')$ and $x'' = (\xi_1'', \ldots, \xi_n'')$. We have
PI(y' , y")
=
2Y ) < 1.1.1 M(b Now let fl"l denote the composite mapping f 0 f = yl"l. A little bit of algebra yields p..(fI"l(X I ),fI"l(x
However,
z
»
= p..(yl">'yl"l) ~
~n . 1.tI"M"(b - a)" -
0 as n -
0
a)poo(x l , x 2)· ••• 0 f, and let fl"l(x )
~n. 1.1.I"M"(b - a)"p..(x 00.
l,
x
2 )·
(5.10.18)
Thus, for an arbitrary value of
.t, n can be chosen so large that k H e nce,
we have
A
~! 1.tI"M"(b -
a)"
: lED,
II(t, x) I, and let
~
solution tp of Eq. (5.10.25) ~.
= min (a, hIM). Note that
$\alpha = a$ if $a < b/M$ and $\alpha = b/M$ if $a > b/M$ (refer to Figure L). We will show that an $\varepsilon$-approximate solution exists on the interval $[\tau, \tau + \alpha]$. The proof is similar for the interval $[\tau - \alpha, \tau]$. In our proof we will construct an $\varepsilon$-approximate solution starting at $(\tau, \xi)$, consisting of a finite number of straight line segments joined end to end (see Figure L). Since $f$ is continuous on the compact set $D_0$, it is uniformly continuous on $D_0$ (see Theorem 5.7.12). Hence, given $\varepsilon > 0$, there exists $\delta = \delta(\varepsilon) > 0$ such that $|f(t, x) - f(t', x')| < \varepsilon$ whenever $(t, x), (t', x') \in D_0$, $|t - t'| < \delta$ and $|x - x'| < \delta$. Now let $\tau = t_0$ and $\tau + \alpha = t_n$. We divide the half-open interval $(t_0, t_n]$ into $n$ half-open subintervals $(t_0, t_1], (t_1, t_2], \ldots, (t_{n-1}, t_n]$ in such a fashion that
=
Let
solution.
(5.10.30)
Next, we construct a polygonal path consisting of n straight lines joined end and having slopes equal to to end, starting at the point (r, e> 6 (to, ml _ 1 = ! ( tI- 1 > e l- l ) over the intervals (tl_ l ,tl] , i= I, ... ,n, respectively, where el = el- I + m l _ I Itl - tl _ 1 I· A typical polygonal path is shown in Figure .L Note that the graph of this path is confined to the triangular region in Figure .L eL t us denote the polygonal path constructed in this way by 1' . Note that 1' is continuous on the interval 1[ ,' l' + ~], that 1' is a piecewise linear function, and that 1' is piecewise continuously differentiable. Indeed, we have 1' (1') = = and
eo)
eo
1' (t)
e
= 1' (t l- l ) + f(tl-I>
'1(ti-I»(t
-
ti-I)'
ti-
I
< t < ti' i =
1, ... , n. (5.10.31)
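The polygonal path described above, with slope $f(t_{i-1}, \xi_{i-1})$ on each subinterval, can be sketched as follows. This is a minimal illustration under stated assumptions, not the text's construction in full: the function name `polygonal_path` is hypothetical, the subintervals are taken to be of equal length (the text's own condition on the subdivision is not reproduced here), and the test equation $\dot{x} = x$, $x(0) = 1$ is a hypothetical choice whose exact solution $e^t$ is used only for comparison.

```python
import math


def polygonal_path(f, tau, xi, alpha, n):
    """Vertices (t_i, xi_i) of the polygonal path on [tau, tau + alpha],
    built on n equal subintervals with slope f(t_{i-1}, xi_{i-1})
    over (t_{i-1}, t_i]."""
    h = alpha / n  # assumed equal step length t_i - t_{i-1}
    ts, xs = [tau], [xi]
    for i in range(1, n + 1):
        slope = f(ts[-1], xs[-1])
        ts.append(tau + i * h)
        xs.append(xs[-1] + slope * h)
    return ts, xs


# Hypothetical test problem: dx/dt = x with x(0) = 1 on [0, 0.5].
ts, xs = polygonal_path(lambda t, x: x, 0.0, 1.0, 0.5, 1000)
endpoint_error = abs(xs[-1] - math.exp(0.5))
```

Refining the subdivision (larger `n`) makes the path follow the direction field more closely, which is the mechanism behind the $\varepsilon$-approximate solution.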
Also note that
1'1(t) -
+
I
O. Now let
~1)
=
1 E (a, bl 1 E b [ , b+
{ ; (1), ",(1),
Pl
}.
To show that ; is a solution of the differential eq u ation on the interval (a, b + Pl, with ;(-r) = ,{ we must show that ; is continuous at t = b. Since
,(b- )
=
and since
;(0 =
,(b- )
we have
;(0 = for all t
E
(a, b
r
{+
{+
+
f(x , ;(s» d s
s: f(s, ;(s»ds,
s: f(s, ;(s»ds
+ Pl. The continuity of ;
in the last equation implies the continuity of $f(s, \tilde{\varphi}(s))$. Differentiating the last equation, we have

$$\dot{\tilde{\varphi}}(t) = f(t, \tilde{\varphi}(t))$$

for all $t \in (a, b + \beta]$. We call $\tilde{\varphi}$ a continuation of the solution $\varphi$ to the interval $(a, b + \beta]$. If $f$ satisfies a Lipschitz condition on $D$ with respect to $x$, then $\tilde{\varphi}$ is unique, and we call $\tilde{\varphi}$ the continuation of $\varphi$ to the interval $(a, b + \beta]$. We can repeat the above procedure of continuing solutions until the boundary of $D$ is reached. Now let the domain $D$ be, in particular, a rectangle, as shown in Figure M. It is important to notice that, in general, we cannot continue solutions over the entire $t$ interval $T$ shown in this figure.

5.10.44. Figure M. Continuation of a solution to the boundary of domain $D$, where $D = \{(t, x) : T_1 < t < T_2,\ \xi_1 < x < \xi_2\}$ and $T = (T_1, T_2)$.
We summarize the above discussion in the following:

5.10.45. Theorem. In Eq. (5.10.25), let $f$ be continuous and bounded on a domain $D$ of the $(t, x)$ plane and let $(\tau, \xi) \in D$. Then all solutions of the initial-value problem (5.10.26) can be continued to the boundary of $D$.

We can readily extend Theorems 5.10.28, 5.10.33, Corollary 5.10.37, and Theorems 5.10.43 and 5.10.45 to initial-value problems characterized by systems of $n$ first-order ordinary differential equations, as given in Definition 4.11.9 and Eq. (4.11.11). In doing so we replace $D \subset R^2$ by $D \subset R^{n+1}$, $x \in R$ by $x \in R^n$, $f: D \to R$ by $f: D \to R^n$, the absolute value $|x|$ by the quantity

$$|x| = \sum_{i=1}^{n} |x_i|, \qquad (5.10.46)$$

and the metric $\rho(x, y) = |x - y|$ on $R$ by the metric $\rho(x, y) = \sum_{i=1}^{n} |x_i - y_i|$ on $R^n$. (The reader can readily verify that the function given in Eq. (5.10.46) satisfies the axioms of a norm (see Theorem 4.9.31).) The definition of $\varepsilon$-approximate solution for the differential equation $\dot{x} = f(t, x)$ is identical to that given in Definition 5.10.27, save that scalars are replaced by vectors (e.g., the scalar function $\varphi$ is replaced by the $n$-vector valued function $\boldsymbol{\varphi}$).
Also, the modifications involved in defining a Lipschitz condition for $f(t, x)$ on $D \subset R^{n+1}$ are obvious.
5.10.47. Exercise. For the ordinary differential equation
$$\dot{x} = f(t, x) \tag{5.10.48}$$
and for the initial-value problem
$$\dot{x} = f(t, x), \quad x(\tau) = \xi \tag{5.10.49}$$
characterized in Eq. (4.11.7) and Definition 4.11.9, respectively, state and prove results for existence, uniqueness, and continuation of solutions, which are analogous to Theorems 5.10.28, 5.10.33, Corollary 5.10.37, and Theorems 5.10.43 and 5.10.45.
In connection with Theorem 5.10.45 we noted that the solutions of initial-value problems described by non-linear ordinary differential equations cannot, in general, be extended to the entire $t$ interval $T$ depicted in Figure M. We now show that in the case of initial-value problems characterized by linear ordinary differential equations it is possible to extend solutions to the entire interval $T$. First, we need some preliminary results. Let
$$D = \{(t, x): a \le t \le b,\ x \in R^n,\ |x| < \infty\} \tag{5.10.50}$$
where the function $|\cdot|$ is defined in Eq. (5.10.46). Consider the set of linear equations
$$\dot{x}_i = \sum_{j=1}^{n} a_{ij}(t) x_j \triangleq f_i(t, x), \quad i = 1, \ldots, n \tag{5.10.51}$$
where the $a_{ij}(t)$, $i, j = 1, \ldots, n$, are assumed to be real and continuous functions defined on the interval $[a, b]$. We first show that $f(t, x) = [f_1(t, x), \ldots, f_n(t, x)]^T$ satisfies a Lipschitz condition on $D$,
$$|f(t, x') - f(t, x'')| \le k|x' - x''|$$
for all $(t, x'), (t, x'') \in D$, where $x' = (x_1', \ldots, x_n')^T$, $x'' = (x_1'', \ldots, x_n'')^T$, and
$$k = \max_{a \le t \le b}\ \max_{1 \le j \le n} \sum_{i=1}^{n} |a_{ij}(t)|.$$
Indeed, we have
Ir(t, x ' ) -
r(t, x " ) I = ,~
= L
II,(t, x ' )
- I - 1=1' ~
I- I
+
0, where PllYol1
of Example 6.1.5. A moment's reflection reveals that in case of $\|\cdot\|_2$, the unit sphere is a circle of radius 1; when the norm is $\|\cdot\|_\infty$, the unit sphere is a square with vertices $(1, 1)$, $(1, -1)$, $(-1, 1)$, $(-1, -1)$; if the norm is $\|\cdot\|_1$, the unit sphere is the square with vertices $(0, 1)$, $(1, 0)$, $(-1, 0)$, $(0, -1)$. If for the unit sphere corresponding to $\|\cdot\|_p$ we let $p$ increase from 1 to $\infty$, then this sphere will deform in a continuous manner from the square corresponding to $\|\cdot\|_1$ to the square corresponding to $\|\cdot\|_\infty$. This is depicted in Figure C. We note that in all cases the unit sphere results in a convex set. For the case of the real-valued function (6.4.16) the set determined by $\|x\| \le 1$ results in a set which is not convex. In particular, if $p = 2/3$, the set determined by $\|x\| \le 1$ yields the boundary and the interior of an astroid, as shown in Figure C. The reason for the non-convexity of this set can be found in the fact that the function (6.4.16) does not represent a norm. In particular, it can be shown that (6.4.16) does not satisfy the triangle inequality. •
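The failure of the triangle inequality for $p = 2/3$ can be checked numerically. The sketch below assumes that the function in question has the form $(|\xi_1|^p + |\xi_2|^p)^{1/p}$ with $0 < p < 1$; this form is an assumption made here, since Eq. (6.4.16) is not reproduced above, and the name `p_functional` is illustrative.

```python
def p_functional(x, p):
    """(sum |x_i|^p)^(1/p); a norm only when p >= 1."""
    return sum(abs(c) ** p for c in x) ** (1.0 / p)

p = 2.0 / 3.0
e1, e2 = (1.0, 0.0), (0.0, 1.0)

# Triangle inequality test on e1 + e2 = (1, 1).
lhs = p_functional((1.0, 1.0), p)                # equals 2**1.5
rhs = p_functional(e1, p) + p_functional(e2, p)  # equals 2

# Convexity test: e1 and e2 lie in the set {x : value <= 1},
# but their midpoint does not.
midpoint_value = p_functional((0.5, 0.5), p)
```

Since `lhs` exceeds `rhs`, the triangle inequality fails, and since `midpoint_value` exceeds 1, the set is not convex, which agrees with the astroid picture.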
6.4.17. Figure C. Unit spheres for Example 6.4.15.
6.4.18. Exercise. Verify the assertions made in Example 6.4.15.
We conclude this section by introducing the notion of cone.
6.4.19. Definition. A set $Y$ in $X$ is called a cone with vertex at the origin if $y \in Y$ implies that $\alpha y \in Y$ for all $\alpha > 0$. If $Y$ is a cone with vertex at the origin, then the set $x_0 + Y$, $x_0 \in X$, is called a cone with vertex $x_0$. A convex cone is a set which is both convex and a cone. In Figure D examples of cones are shown.
6.4.20. Figure D. (a) Cone. (b) Convex cone.
6.5. LINEAR FUNCTIONALS
Throughout this section $X$ is a normed linear space.
We recall that a mapping, $f$, from $X$ into $F$ is called a functional on $X$ (see Definition 3.5.1). If $f$ is also linear, i.e., $f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)$ for all $\alpha, \beta \in F$ and all $x, y \in X$, then $f$ is called a linear functional (refer to Definition 3.5.1). Recall further that $X^f$, the set of all linear functionals on $X$, is a linear space over $F$ (see Theorem 3.5.16). Let $f \in X^f$ and $x \in X$. In accordance with Eq. (3.5.10), we use the notation
$$f(x) = \langle x, f \rangle \tag{6.5.1}$$
to denote the value of $f$ at $x$. Alternatively, we sometimes find it convenient to let $x' \in X^f$ denote a linear functional defined on $X$ and write (see Eq. (3.5.11))
$$x'(x) = \langle x, x' \rangle.$$
A linear functional $f$ on $X$ is said to be bounded if there is a constant $M > 0$ such that for all $x \in X$,
$$|f(x)| \le M\|x\|.$$
If $f$ is not bounded, then it is said to be unbounded.
The following theorem shows that continuity and boundedness of linear functionals are equivalent.
6.5.5. Theorem. A linear functional $f$ on a normed linear space $X$ is bounded if and only if it is continuous.
Proof. Assume that $f$ is bounded, and let $M$ be such that $|f(x)| \le M\|x\|$ for all $x \in X$. If $x_n \to 0$, then $|f(x_n)| \le M\|x_n\| \to 0$. Hence, $f$ is continuous at $x = 0$. From Theorem 6.5.3 it follows that $f$ is continuous for all $x \in X$.
Conversely, assume that $f$ is continuous at $x = 0$ and hence at any $x \in X$. There is a $\delta > 0$ such that $|f(x)| < 1$ whenever $\|x\| \le \delta$. Now for any $x \ne 0$ we have $\|(\delta x)/\|x\|\| = \delta$, and thus
$$|f(x)| = \left| f\!\left( \frac{\|x\|}{\delta} \cdot \frac{\delta x}{\|x\|} \right) \right| = \frac{\|x\|}{\delta} \left| f\!\left( \frac{\delta x}{\|x\|} \right) \right| < \frac{\|x\|}{\delta}.$$
If we let $M = 1/\delta$, then $|f(x)| \le M\|x\|$, and $f$ is bounded.
For every $\epsilon > 0$ there is an $M$ such that $|x_m'(x) - x_n'(x)| \le \epsilon\|x\|$ for all $m, n > M$ and for all $x \in X$. But $x_n'(x) \to x'(x)$, and hence $|x'(x) - x_m'(x)| \le \epsilon\|x\|$ for all $m > M$. It now follows that
$$|x'(x)| = |x'(x) - x_m'(x) + x_m'(x)| \le |x'(x) - x_m'(x)| + |x_m'(x)| \le \epsilon\|x\| + \|x_m'\|\,\|x\|,$$
and thus $x'$ is a bounded linear functional. Finally, to show that $x_m' \to x' \in X^*$, we note that $|x'(x) - x_m'(x)| \le \epsilon\|x\|$ whenever $m > M$, from which we have $\|x' - x_m'\| \le \epsilon$ whenever $m > M$. This proves the theorem. ∎
6.5.8. Exercise. Prove part (i) of Theorem 6.5.6.
It is especially interesting to note that $X^*$ is a Banach space whether $X$ is or is not a Banach space. We are now in a position to make the following definition.
6.5.9. Definition. The set of all bounded linear functionals on a normed space $X$ is called the normed conjugate space of $X$, or the normed dual of $X$, or simply the dual of $X$, and is denoted by $X^*$. For $f \in X^*$ we call $\|f\|$ defined by Eq. (6.5.7) the norm of $f$.
The next result states that the norm of a functional can be represented in various equivalent ways.
6.5.10. Theorem. Let $f$ be a bounded linear functional on $X$, and let $\|f\|$ be the norm of $f$. Then
(i) $\|f\| = \inf\{M: |f(x)| \le M\|x\| \text{ for all } x \in X\}$;
(ii) $\|f\| = \sup_{\|x\| \le 1} \{|f(x)|\}$; and
(iii) $\|f\| = \sup_{\|x\| = 1} \{|f(x)|\}$.
6.5.11. Exercise. Prove Theorem 6.5.10.
Let us now consider the norms of some specific linear functionals.
6.5.12. Example. Consider the normed linear space $\{C[a, b]; \|\cdot\|_\infty\}$. The mapping
$$f(x) = \int_a^b x(s)\,ds, \quad x \in C[a, b]$$
is a linear functional on $C[a, b]$ (cf. Example 3.5.2). The norm of this functional equals $(b - a)$, because
$$|f(x)| = \left| \int_a^b x(s)\,ds \right| \le (b - a)\|x\|_\infty,$$
with equality when $x$ is a constant function.
O. It now follows that
*
for every a
Let $\{x_1, \ldots, x_n\}$ be a basis for $X$, let $\{y_k\}$ be a Cauchy sequence in $X$, and for each $k$ let the coordinates of $y_k$ with respect to $\{x_1, \ldots, x_n\}$ be given by $(\eta_{k1}, \ldots, \eta_{kn})$. It follows from Theorem 6.6.1 that there is a constant $M$ such that $|\eta_{kj} - \eta_{ij}| \le M\|y_k - y_i\|$ for $j = 1, \ldots, n$ and all $i, k = 1, 2, \ldots$. Hence, each sequence $\{\eta_{kj}\}$ is a Cauchy sequence in $F$, i.e., in $R$ or $C$, and is therefore convergent. Let $\eta_{0j} = \lim_{k} \eta_{kj}$ for $j = 1, \ldots, n$. If we let
$$y_0 = \eta_{01} x_1 + \cdots + \eta_{0n} x_n,$$
it follows that $\{y_k\}$ converges to $y_0$. This proves that $X$ is complete. ∎
The next result follows from Theorems 6.6.5 and 6.2.1.
6.6.6. Theorem. Let $X$ be a normed linear space, and let $Y$ be a finite-dimensional linear subspace of $X$. Then (i) $Y$ is complete, and (ii) $Y$ is closed.
6.6.7. Exercise. Prove Theorem 6.6.6.
Our next result is an immediate consequence of Theorem 6.6.1.
6.6.8. Theorem. Let $X$ be a finite-dimensional normed linear space, and let $f$ be a linear functional on $X$. Then $f$ is continuous.
6.6.9. Exercise. Prove Theorem 6.6.8.
We recall from Definition 5.6.30 and Theorem 5.6.31 that a subset $Y$ of a metric space $X$ is relatively compact if every sequence of elements in $Y$ contains a subsequence which converges to an element in $X$. This property can be useful in characterizing finite-dimensional subspaces in an arbitrary normed linear space, as we shall see in the next theorem. Note also that in view of Definition 5.1.19 a subset $Y$ in a normed linear space $X$ is bounded if and only if there is a $\lambda > 0$ such that $\|y\| \le \lambda$ for all $y \in Y$.
6.6.10. Theorem. Let $X$ be a normed linear space, and let $Y$ be a linear subspace of $X$. Then $Y$ is finite dimensional if and only if every bounded subset of $Y$ is relatively compact.
Proof. (Necessity) Assume that $Y$ is finite dimensional, and let $\{x_1, \ldots, x_n\}$ be a basis for $Y$. Then for any $y \in Y$ there is a unique set $\{\eta_1, \ldots, \eta_n\}$ such that $y = \eta_1 x_1 + \cdots + \eta_n x_n$. Let $A$ be a bounded subset of $Y$, and let $\{y_k\}$ be a sequence in $A$. Then we can write $y_k = \eta_{1k} x_1 + \cdots + \eta_{nk} x_n$ for $k = 1, 2, \ldots$. There exists a $\lambda > 0$ such that $\|y_k\| \le \lambda$ for all $k$. Consider $|\eta_{1k}| + \cdots + |\eta_{nk}|$. We wish to show that this sum is bounded. Suppose that it is not. Then for each positive integer $m$, we can find a $y_{k_m}$ such that $|\eta_{1k_m}| + \cdots + |\eta_{nk_m}| \triangleq \gamma_m > m$. Now let $y_{k_m}' = (1/\gamma_m) y_{k_m}$. It follows that
$$\|y_{k_m}'\| = \frac{1}{\gamma_m}\|y_{k_m}\| \le \frac{\lambda}{\gamma_m} < \frac{\lambda}{m}.$$
Thus, $y_{k_m}' \to 0$ as $m \to \infty$.
$x = (\xi_1, \xi_2, \ldots) \in l_2$, $y = (\eta_1, \eta_2, \ldots) \in l_2$, and define $(x, y): l_2 \times l_2 \to C$ as
$$(x, y) = \sum_{i=1}^{\infty} \xi_i \bar{\eta}_i.$$
It can readily be shown that $(\cdot, \cdot)$ is an inner product on $X$. Since $l_2$ is complete relative to the norm induced by this inner product (see Example 6.1.6), it follows that $l_2$ is a Hilbert space. ∎
6.11.10. Example. (a) Let $X = C[a, b]$ denote the linear space of complex-valued continuous functions defined on $[a, b]$ (see Example 6.1.9). For $x, y \in C[a, b]$ define
$$(x, y) = \int_a^b x(t)\overline{y(t)}\,dt.$$
It is readily verified that this space is a pre-Hilbert space. In view of Example 6.1.9 this space is not complete relative to the norm $\|x\| = (x, x)^{1/2}$, and hence it is not a Hilbert space.
(b) We extend the space of real-valued functions, $L_p[a, b]$, defined in Example 5.5.31 for the case $p = 2$, to complex-valued functions to be the set of all functions $f: [a, b] \to C$ such that $f = u + iv$ for $u, v \in L_2[a, b]$. Denoting this space also by $L_2[a, b]$, we define
$$(f, g) = \int_{[a, b]} f\bar{g}\,d\mu,$$
for $f, g \in L_2[a, b]$, where integration is in the Lebesgue sense. The space $\{L_2[a, b]; (\cdot, \cdot)\}$ is a Hilbert space. ∎
In the next example we consider the Cartesian product of Hilbert spaces.
6.11.11. Example. Let $\{X_i\}$, $i = 1, \ldots, n$, denote a finite collection of Hilbert spaces over $C$, and let $X = X_1 \times \cdots \times X_n$. If $x \in X$, then $x = (x_1, \ldots, x_n)$ with $x_i \in X_i$. Defining vector addition and multiplication of vectors by scalars in the usual manner (see Eqs. (3.2.14), (3.2.15), and the related discussion, and see Example 6.1.10) it follows that $X$ is a linear space. If $x, y \in X$ and if $(x_i, y_i)_i$ denotes the inner product of $x_i$ and $y_i$ on $X_i$, then it is easy to show that
$$(x, y) = \sum_{i=1}^{n} (x_i, y_i)_i$$
defines an inner product on $X$. The norm induced on $X$ by this inner product is
$$\|x\| = (x, x)^{1/2} = \left( \sum_{i=1}^{n} \|x_i\|_i^2 \right)^{1/2}$$
where $\|x_i\|_i = (x_i, x_i)_i^{1/2}$. It is readily verified that $X$ is complete, and thus $X$ is a Hilbert space. ∎
6.11.12. Exercise. Verify the assertions made in Example 6.11.11.
In Theorem 6.1.15 we saw that in a normed linear space $\{X; \|\cdot\|\}$, the norm $\|\cdot\|$ is a continuous mapping of $X$ into $R$. Our next result establishes the continuity of an inner product. In the following, $x_n \to x$ implies convergence with respect to the norm induced by the inner product $(\cdot, \cdot)$ on $X$.
6.11.13. Theorem. Let $\{x_n\}$ be a sequence in $X$ such that $x_n \to x$, where $x \in X$, and let $\{y_n\}$ be a sequence in $X$. Then
(i) $(z, x_n) \to (z, x)$ for all $z \in X$;
(ii) $(x_n, z) \to (x, z)$ for all $z \in X$;
(iii) $\|x_n\| \to \|x\|$; and
(iv) if $\sum_{n=1}^{\infty} y_n$ is convergent in $X$, then $\left( \sum_{n=1}^{\infty} y_n, z \right) = \sum_{n=1}^{\infty} (y_n, z)$ for all $z \in X$.
6.11.14. Exercise. Prove Theorem 6.11.13.
Next, let us recall that two vectors $x, y \in X$ are said to be orthogonal if $(x, y) = 0$ (see Definition 3.6.22). In this case we write $x \perp y$. If $Y \subset X$ and $x \in X$ is such that $x \perp y$ for all $y \in Y$, then we write $x \perp Y$. Also, if $Z \subset X$ and $Y \subset X$ and if $z \perp Y$ for all $z \in Z$, then we write $Y \perp Z$. Furthermore, observe that $x \perp x$ implies that $x = 0$. Finally, the notion of inner product allows us to consider the concepts of alignment and colinearity of vectors.
6.11.15. Definition. Let $X$ be an inner product space. The vectors $x, y \in X$ are said to be colinear if $(x, y) = \pm\|x\| \cdot \|y\|$ and aligned if $(x, y) = \|x\| \cdot \|y\|$.
Our next result is proved by straightforward computation.
6.11.16. Theorem. For all $x, y \in X$ we have
(i) $\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$; and
(ii) if $x \perp y$, then $\|x + y\|^2 = \|x\|^2 + \|y\|^2$.
6.11.17. Exercise. Prove Theorem 6.11.16.
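Both parts of Theorem 6.11.16 can be checked numerically in a small case. The sketch below assumes the space $C^2$ with the usual inner product (linear in the first argument, conjugate linear in the second); the vectors and helper names are illustrative choices, not taken from the text.

```python
def inner(x, y):
    """Usual inner product on C^n: sum of x_i * conj(y_i)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm_sq(x):
    """Squared norm induced by the inner product."""
    return inner(x, x).real

def add(x, y):
    return [a + b for a, b in zip(x, y)]

def sub(x, y):
    return [a - b for a, b in zip(x, y)]

# Part (i): parallelogram law for two arbitrary vectors.
x = [1 + 2j, 3 - 1j]
y = [2 - 1j, 0.5j]
parallelogram_lhs = norm_sq(add(x, y)) + norm_sq(sub(x, y))
parallelogram_rhs = 2 * norm_sq(x) + 2 * norm_sq(y)

# Part (ii): Pythagorean theorem for an orthogonal pair.
u = [1 + 0j, 1j]
v = [1j, 1 + 0j]
orthogonality = inner(u, v)  # zero for this pair
pythagoras_lhs = norm_sq(add(u, v))
pythagoras_rhs = norm_sq(u) + norm_sq(v)
```

A numerical check of this kind is not a proof, but it is a quick way to catch a sign or conjugation error before attempting Exercise 6.11.17.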
Parts (i) and (ii) of Theorem 6.11.16 are referred to as the parallelogram law and the Pythagorean theorem, respectively (refer to Theorems 4.9.33 and 4.9.38).
6.11.18. Definition. Let $\{x_\alpha: \alpha \in I\}$ be an indexed set of elements in $X$, where $I$ is an arbitrary index set (i.e., $I$ is not necessarily the integers). Then $\{x_\alpha: \alpha \in I\}$ is said to be an orthogonal set of vectors if $x_\alpha \perp x_\beta$ for all $\alpha, \beta \in I$ such that $\alpha \ne \beta$. A vector $x \in X$ is called a unit vector if $\|x\| = 1$. An orthogonal set of vectors is called an orthonormal set if every element of the set is a unit vector. Finally, if $\{x_i\}$ is a sequence of elements in $X$, we define an orthogonal sequence and an orthonormal sequence in an obvious manner.
Using an inductive process we can generalize part (ii) of Theorem 6.11.16 as follows.
6.11.19. Theorem. Let $\{x_1, \ldots, x_n\}$ be a finite orthogonal set in $X$. Then
$$\left\| \sum_{j=1}^{n} x_j \right\|^2 = \sum_{j=1}^{n} \|x_j\|^2.$$
We note that if $x \ne 0$ and if $y = x/\|x\|$, then $\|y\| = 1$. Hence, it is possible to convert every orthogonal set of vectors into an orthonormal set. Let us now consider a specific example.
6.11.20. Example. Let $X$ denote the space of continuous complex-valued functions on the interval $[0, 1]$. In accordance with Example 6.11.10, we define an inner product on $X$ by
$$(f, g) = \int_0^1 f(t)\overline{g(t)}\,dt. \tag{6.11.21}$$
We now show that the set of vectors defined by
$$f_n(t) = e^{2\pi n i t}, \quad n = 0, \pm 1, \pm 2, \ldots, \quad i = \sqrt{-1} \tag{6.11.22}$$
is an orthonormal set in $X$. Substituting Eq. (6.11.22) into Eq. (6.11.21), we obtain for $m \ne n$
$$(f_n, f_m) = \int_0^1 f_n(t)\overline{f_m(t)}\,dt = \int_0^1 e^{2\pi(n - m)it}\,dt = \frac{e^{2\pi(n - m)i} - 1}{2\pi(n - m)i}.$$
Since $e^{2\pi k i} = \cos 2\pi k + i \sin 2\pi k$, we have
$$(f_n, f_m) = 0, \quad m \ne n;$$
i.e., if $m \ne n$, then $f_n \perp f_m$. On the other hand,
$$(f_n, f_n) = \int_0^1 e^{2\pi(n - n)it}\,dt = 1;$$
i.e., if $n = m$, then $(f_n, f_n) = \|f_n\|^2 = 1$ and $\|f_n\| = 1$. ∎
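The orthonormality of the complex exponentials in Example 6.11.20 can also be observed numerically. The sketch below approximates the integral in Eq. (6.11.21) by a uniform Riemann sum with `N` sample points; the quadrature rule, the value of `N`, and the function names are assumptions made for illustration.

```python
import cmath

def f(n, t):
    """The function f_n(t) = exp(2*pi*n*i*t) of Eq. (6.11.22)."""
    return cmath.exp(2j * cmath.pi * n * t)

def inner(n, m, N=256):
    """Riemann-sum approximation of (f_n, f_m) on [0, 1].

    Uses N equally spaced sample points; for these periodic integrands
    the sum is accurate whenever |n - m| < N.
    """
    total = sum(f(n, k / N) * f(m, k / N).conjugate() for k in range(N))
    return total / N

same = inner(3, 3)        # approximately 1
different = inner(3, -2)  # approximately 0
```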
The next result arises often in applications.
6.11.23. Theorem. If $\{x_1, \ldots, x_n\}$ is a finite orthonormal set in $X$, then
(i) $$\sum_{i=1}^{n} |(x, x_i)|^2 \le \|x\|^2 \quad \text{for all } x \in X; \tag{6.11.24}$$ and
(ii) $\left( x - \sum_{i=1}^{n} (x, x_i) x_i \right) \perp x_j$ for any $j = 1, \ldots, n$.
6.11.25. Exercise. Prove Theorem 6.11.23 (see Theorem 4.9.58).
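A small numerical instance may help fix the content of Theorem 6.11.23. The sketch below works in $R^3$ with a two-element orthonormal set; the particular vectors and names are illustrative assumptions.

```python
import math

def inner(x, y):
    """Usual inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

s = 1.0 / math.sqrt(2.0)
basis = [(1.0, 0.0, 0.0), (0.0, s, s)]  # an orthonormal set in R^3
x = (1.0, 2.0, 3.0)

# Coefficients (x, x_i) with respect to the orthonormal set.
coeffs = [inner(x, e) for e in basis]

# Part (i): sum of squared coefficients versus ||x||^2.
bessel_lhs = sum(c * c for c in coeffs)  # 1 + 12.5
norm_sq = inner(x, x)                    # 14

# Part (ii): the residual x - sum (x, x_i) x_i is orthogonal to each x_j.
residual = tuple(
    xi - sum(c * e[i] for c, e in zip(coeffs, basis))
    for i, xi in enumerate(x)
)
residual_products = [inner(residual, e) for e in basis]
```

The gap `norm_sq - bessel_lhs` equals the squared norm of `residual`, which is why the inequality in part (i) is strict unless $x$ lies in the span of the orthonormal set.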
On passing to the limit as $n \to \infty$ in (6.11.24), we obtain the following result.
6.11.26. Theorem. If $\{x_i\}$ is any countable orthonormal set in $X$, then
$$\sum_{i=1}^{\infty} |(x, x_i)|^2 \le \|x\|^2 \tag{6.11.27}$$
for every $x \in X$.
The relationship (6.11.27) is known as the Bessel inequality. The scalars $\alpha_i = (x, x_i)$ are called the Fourier coefficients of $x$ with respect to the orthonormal set $\{x_i\}$. The next result is a generalization of Theorem 4.9.17.
6.11.28. Theorem. In an inner product space $X$ we have $(x, y) = 0$ for all $x \in X$ if and only if $y = 0$.
6.11.29. Exercise. Prove Theorem 6.11.28.
From our discussion thus far it should be clear that not every normed linear space can be made into an inner product space. The following theorem gives us sufficient conditions for which a normed linear space is also an inner product space.
6.11.30. Theorem. Let $X$ be a normed linear space. If for all $x, y \in X$,
$$\|x + y\|^2 + \|x - y\|^2 = 2(\|x\|^2 + \|y\|^2), \tag{6.11.31}$$
then it is possible to define an inner product on $X$ by
$$(x, y) = \tfrac{1}{4}\{\|x + y\|^2 - \|x - y\|^2 + i\|x + iy\|^2 - i\|x - iy\|^2\} \tag{6.11.32}$$
for all $x, y \in X$, where $i = \sqrt{-1}$.
6.11.33. Exercise. Prove Theorem 6.11.30.
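As a sanity check on Eq. (6.11.32), one can start from a space where the inner product is already known and confirm that the formula recovers it from norms alone. The sketch below assumes $C^2$ with the usual inner product (linear in the first argument); the vectors and helper names are illustrative.

```python
def inner(x, y):
    """Usual inner product on C^n, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm_sq(x):
    return inner(x, x).real

def combo(x, y, c):
    """Return the vector x + c*y."""
    return [a + c * b for a, b in zip(x, y)]

def polarization(x, y):
    """Right-hand side of Eq. (6.11.32), computed from norms only."""
    return 0.25 * (
        norm_sq(combo(x, y, 1))
        - norm_sq(combo(x, y, -1))
        + 1j * norm_sq(combo(x, y, 1j))
        - 1j * norm_sq(combo(x, y, -1j))
    )

x = [1 + 2j, 3 - 1j]
y = [2 - 1j, 0.5j]
direct = inner(x, y)
recovered = polarization(x, y)
```

The first two terms supply the real part of $(x, y)$ and the last two supply the imaginary part, so `recovered` should agree with `direct` up to rounding.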
6.11.34. Corollary. If $X$ is a real normed linear space whose norm satisfies Eq. (6.11.31) for all $x, y \in X$, then it is possible to define an inner product on $X$ by
$$(x, y) = \tfrac{1}{4}\{\|x + y\|^2 - \|x - y\|^2\}$$
for all $x, y \in X$.
6.11.35. Exercise. Prove Corollary 6.11.34.
In view of part (i) of Theorem 6.11.16 and in view of Theorem 6.11.30, condition (6.11.31) is both necessary and sufficient that a normed linear space be also an inner product space. Furthermore, it can also be shown that Eq. (6.11.32) uniquely defines the inner product on a normed linear space. We conclude this section with the following exercise.
6.11.36. Exercise. Let $l_p$, $1 \le p \le \infty$, be the normed linear space defined in Example 6.1.6. Show that $l_p$ is an inner product space if and only if $p = 2$.
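Condition (6.11.31) is easy to probe numerically for the $p$-norms. The sketch below is only an illustration for finite $p$ on two-component vectors, using the test pair $(1, 0)$ and $(0, 1)$ chosen here; it does not replace the argument asked for in the exercise.

```python
def lp_norm(x, p):
    """The p-norm (sum |x_i|^p)^(1/p) for finite p >= 1."""
    return sum(abs(c) ** p for c in x) ** (1.0 / p)

def parallelogram_gap(x, y, p):
    """Left side minus right side of Eq. (6.11.31) in the p-norm."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    lhs = lp_norm(s, p) ** 2 + lp_norm(d, p) ** 2
    rhs = 2.0 * (lp_norm(x, p) ** 2 + lp_norm(y, p) ** 2)
    return lhs - rhs

x, y = (1.0, 0.0), (0.0, 1.0)
gap_p1 = parallelogram_gap(x, y, 1.0)  # 8 - 4 = 4
gap_p2 = parallelogram_gap(x, y, 2.0)  # zero up to rounding
gap_p3 = parallelogram_gap(x, y, 3.0)  # 2 * 2**(2/3) - 4, negative
```

For this pair the gap equals $2 \cdot 2^{2/p} - 4$, which vanishes only at $p = 2$.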
6.12. ORTHOGONAL COMPLEMENTS
In this section we establish some interesting structural properties of Hilbert spaces. Specifically, we will show that any vector $x$ of a Hilbert space $X$ can uniquely be represented as the sum of two vectors $y$ and $z$, where $y$ is in a subspace $Y$ of $X$ and $z$ is orthogonal to $Y$. This is known as the projection theorem. In proving this theorem we employ the so-called "classical projection theorem," a result of great importance in its own right. This theorem extends the following familiar result to the case of (infinite-dimensional) Hilbert spaces: in the three-dimensional Euclidean space the shortest distance between a point and a plane is along a vector through the point and perpendicular to the plane. Both the classical projection theorem and the projection theorem are of great importance in applications.
Throughout this section, $\{X; (\cdot, \cdot)\}$ is a complex inner product space.
6.12.1. Definition. Let $Y$ be a non-void subset of $X$. The set of all vectors orthogonal to $Y$, denoted by $Y^\perp$, is called the orthogonal complement of $Y$. The orthogonal complement of $Y^\perp$ is denoted by $(Y^\perp)^\perp \triangleq Y^{\perp\perp}$, the orthogonal complement of $Y^{\perp\perp}$ is denoted by $(Y^{\perp\perp})^\perp \triangleq Y^{\perp\perp\perp}$, etc.
6.12.2. Example. Let $X$ be the space $E^3$ depicted in Figure G, and let $Y$ be the $x_1$-axis. Then $Y^\perp$ is the $x_2x_3$-plane, $Y^{\perp\perp}$ is the $x_1$-axis, $Y^{\perp\perp\perp}$ is again the $x_2x_3$-plane, etc. Thus, in the present case, $Y^{\perp\perp} = Y$, $Y^{\perp\perp\perp} = Y^\perp$, $Y^{\perp\perp\perp\perp} = Y^{\perp\perp} = Y$, etc. ∎
6.12.3. Figure G.
We now state and prove several properties of the orthogonal complement. The proof of the first result is left as an exercise.
6.12.4. Theorem. In an inner product space $X$, $\{0\}^\perp = X$ and $X^\perp = \{0\}$.
6.12.5. Exercise. Prove Theorem 6.12.4.
6.12.6. Theorem. Let $Y$ be a non-void subset of $X$. Then $Y^\perp$ is a closed linear subspace of $X$.
Proof. If $x, y \in Y^\perp$, then $(x, z) = 0$ and $(y, z) = 0$ for all $z \in Y$. Hence, $(\alpha x + \beta y, z) = \alpha(x, z) + \beta(y, z) = 0$, and thus $(\alpha x + \beta y) \perp z$ for all $z \in Y$, or $(\alpha x + \beta y) \in Y^\perp$. Therefore, $Y^\perp$ is a linear subspace of $X$.
To show that $Y^\perp$ is closed, assume that $x_0$ is a point of accumulation of $Y^\perp$. Then there is a sequence $\{x_n\}$ from $Y^\perp$ such that $\|x_n - x_0\| \to 0$ as $n \to \infty$. By Theorem 6.11.13 we have $0 = (x_n, z) \to (x_0, z)$ as $n \to \infty$ for all $z \in Y$. Therefore $x_0 \in Y^\perp$ and $Y^\perp$ is closed. ∎
Before considering the next result we require the following concept.
6.12.7. Definition. Let $Y$ be a non-void subset of $X$, and let $V(Y)$ be the linear subspace generated by $Y$ (see Definition 3.3.6). Let $\overline{V(Y)}$ denote the closure of $V(Y)$. We call $\overline{V(Y)}$ the closed linear subspace generated by $Y$.
Note that in view of Theorem 6.2.3, $\overline{V(Y)}$ is indeed a linear subspace of $X$.
6.12.8. Theorem. Let $Y$ and $Z$ be non-void subsets of $X$. Then
(i) either $Y \cap Y^\perp = \emptyset$ or $Y \cap Y^\perp = \{0\}$;
(ii) $Y \subset Y^{\perp\perp}$;
(iii) if $Y \subset Z$, then $Z^\perp \subset Y^\perp$;
(iv) $Y^\perp = Y^{\perp\perp\perp}$; and
(v) $Y^{\perp\perp}$ is the smallest closed linear subspace of $X$ which contains $Y$; i.e., $Y^{\perp\perp} = \overline{V(Y)}$.
Proof. To prove part (i), assume that $Y \cap Y^\perp \ne \emptyset$, and let $x \in Y \cap Y^\perp$. Then $x \in Y$ and $x \in Y^\perp$ and so $(x, x) = 0$. This implies that $x = 0$.
The proof of part (ii) is left as an exercise.
To prove part (iii), let $y \in Z^\perp$. Then $y \perp x$ for all $x \in Z$. Since $Z \supset Y$, it follows that $y \perp x$ for all $x \in Y$. Thus, $y \in Y^\perp$ whenever $y \in Z^\perp$ and $Y^\perp \supset Z^\perp$.
To prove part (iv) we note that, by part (ii) of this theorem, $Y^\perp \subset Y^{\perp\perp\perp}$. On the other hand, since $Y \subset Y^{\perp\perp}$, by part (iii) of this theorem, $Y^\perp \supset Y^{\perp\perp\perp}$. Thus, $Y^\perp = Y^{\perp\perp\perp}$.
The proof of part (v) is also left as an exercise. ∎
6.12.9. Exercise. Prove parts (ii) and (v) of Theorem 6.12.8.
In view of part (iv) of the above theorem, we can write $Y^\perp = Y^{\perp\perp\perp} = Y^{\perp\perp\perp\perp\perp} = \cdots$, and $Y^{\perp\perp} = Y^{\perp\perp\perp\perp} = Y^{\perp\perp\perp\perp\perp\perp} = \cdots$. Before giving the classical projection theorem, we state and prove the following preliminary result.
6.12.10. Theorem. Let $Y$ be a linear subspace of $X$, and let $x$ be an arbitrary vector in $X$. Let
$$\delta = \inf\{\|y - x\|: y \in Y\}.$$
If there exists a $y_0 \in Y$ such that $\|y_0 - x\| = \delta$, then $y_0$ is unique, and moreover $y_0 \in Y$ is the unique element in $Y$ such that $\|y_0 - x\| = \delta$ if and only if $(x - y_0) \perp Y$.
Proof. Let us first show that if $\|y_0 - x\| = \delta$, then $(x - y_0) \perp Y$. In doing so we assume to the contrary that there is a $y \in Y$ not orthogonal to $x - y_0$. We also assume, without loss of generality, that $y$ is a unit vector and that $(x - y_0, y) = \alpha \ne 0$. Defining a vector $z \in Y$ as $z = y_0 + \alpha y$, we have
$$\|x - z\|^2 = \|x - y_0 - \alpha y\|^2 = (x - y_0 - \alpha y, x - y_0 - \alpha y)$$
$$= (x - y_0, x - y_0) - (x - y_0, \alpha y) - (\alpha y, x - y_0) + (\alpha y, \alpha y)$$
$$= \|x - y_0\|^2 - |\alpha|^2 - |\alpha|^2 + |\alpha|^2 = \|x - y_0\|^2 - |\alpha|^2 < \|x - y_0\|^2;$$
i.e., $\|x - z\| < \|x - y_0\|$.
lIy", -
nY W
$\{\omega: X_2(\omega) \le x_2\} \cap \cdots \cap \{\omega: X_n(\omega) \le x_n\}$. Furthermore, for a random vector $X$, the function $F_X: R^n \to R$, defined by $F_X(x) = P\{X_1 \le x_1, \ldots, X_n \le x_n\}$, is called the distribution function of $X$.
If $X$ is a random variable and $g$ is a function, $g: R \to R$, such that the Stieltjes integral $\int_{-\infty}^{\infty} g(x)\,dF_X(x)$ exists, then the expected value of $g(X)$ is defined to be
$$E\{g(X)\} = \int_{-\infty}^{\infty} g(x)\,dF_X(x).$$
Similarly, if $X$ is a random vector and if $g$ is a function, $g: R^n \to R$, such that $\int_{R^n} g(x)\,dF_X(x)$ exists, then the expected value of $g(X)$ is defined to be
$$E\{g(X)\} = \int_{R^n} g(x)\,dF_X(x).$$
Some of the expected values of primary interest are $E(X)$, the expected value of $X$, $E(X^2)$, the second moment of $X$, and $E\{[X - E(X)]^2\}$, the variance of $X$.
If we let $\mathcal{L}_2$ denote the family of random variables defined on a probability space $\{\Omega, \mathcal{F}, P\}$ such that $E(X^2) < \infty$, then this space is a vector space over $R$ with the usual definition of addition and multiplication by a scalar. We say two random variables, $X_1$ and $X_2$, are equal almost surely if $P\{\omega: X_1(\omega) \ne X_2(\omega)\} = 0$. If we let $L_2$ denote the family of equivalence classes of all random variables which are almost surely equal (as in Example 5.5.31), then $\{L_2; (\cdot, \cdot)\}$ is a real Hilbert space where the inner product is defined by
$$(X, Y) = E(XY) \quad \text{for } X, Y \in L_2.$$
Throughout the remainder of this section, we let $\{\Omega, \mathcal{F}, P\}$ denote our underlying probability space, and we assume that all random variables belong to the Hilbert space $L_2$ with inner product $(X, Y) = E(XY)$.
C. Estimation of Random Variables
The special class of estimation problems which we consider may be formulated as follows: given a set of random variables $\{Y_1, \ldots, Y_m\}$, find the best estimate of another random variable, $X$. The sense in which an estimate is "best" will be defined shortly. Here we view the set $\{Y_1, \ldots, Y_m\}$ to be observations and the random variable $X$ as the unknown.
For any mapping $f: R^m \to R$ such that $f(Y_1, \ldots, Y_m) \in L_2$ for all observations $\{Y_1, \ldots, Y_m\}$, we call $\hat{X} = f(Y_1, \ldots, Y_m)$ an estimate of $X$. If $f$ is linear, we call $\hat{X}$ a linear estimate. Next, let $f$ be linear; i.e., let $f$ be a linear functional on $R^m$. Then there is a vector $a^T = (\alpha_1, \ldots, \alpha_m) \in R^m$ such that $f(y) = a^T y$ for all $y^T = (\eta_1, \ldots, \eta_m) \in R^m$. Now a linear estimate, $\hat{X} = \alpha_1 Y_1 + \cdots + \alpha_m Y_m$, is called the best linear estimate of $X$, given $\{Y_1, \ldots, Y_m\}$, if $E\{[X - \alpha_1 Y_1 - \cdots - \alpha_m Y_m]^2\}$ is minimum with respect to $a \in R^m$.
The classical projection theorem (see Theorem 6.12.12) tells us that the best linear estimate of $X$ is the projection of $X$ onto the linear vector space $V(\{Y_1, \ldots, Y_m\})$. Furthermore, Eq. (6.15.1) gives us the explicit form for $\alpha_i$, $i = 1, \ldots, m$. We are now in a position to summarize the above discussion in the following theorem, which is usually called the orthogonality principle.
6.15.8. Theorem. Let $X, Y_1, \ldots, Y_m$ belong to $L_2$. Then $\hat{X} = \alpha_1 Y_1 + \cdots + \alpha_m Y_m$ is the best linear estimate of $X$ if and only if $\{\alpha_1, \ldots, \alpha_m\}$ are such that $E\{[X - \hat{X}]Y_i\} = 0$ for $i = 1, \ldots, m$.
We also have the following result.
6.15.9. Corollary. Let $X, Y_1, \ldots, Y_m$ belong to $L_2$. Let $G = [\gamma_{ij}]$, where $\gamma_{ij} = E\{Y_i Y_j\}$ for $i, j = 1, \ldots, m$, and let $b^T = (\beta_1, \ldots, \beta_m) \in R^m$, where $\beta_i = E\{XY_i\}$ for $i = 1, \ldots, m$. If $G$ is non-singular, then $\hat{X} = \alpha_1 Y_1 + \cdots + \alpha_m Y_m$ is the best linear estimate of $X$ if and only if $a^T = b^T G^{-1}$.
6.15.10. Exercise. Prove Theorem 6.15.8 and Corollary 6.15.9.
Let us now consider a specific case.
6.15.11. Example. Let $X, V_1, \ldots, V_m$ be random variables in $L_2$ such that $E\{X\} = E\{V_i\} = E\{XV_i\} = 0$ for $i = 1, \ldots, m$, and let $R = [\rho_{ij}]$ be non-singular, where $\rho_{ij} = E[V_i V_j]$ for $i, j = 1, \ldots, m$. Suppose that the measurements $\{Y_1, \ldots, Y_m\}$ of $X$ are given by $Y_i = X + V_i$ for $i = 1, \ldots, m$. Then we have $E\{Y_i Y_j\} = E\{[X + V_i][X + V_j]\} = \sigma_x^2 + \rho_{ij}$ for $i, j = 1, \ldots, m$, where $\sigma_x^2 \triangleq E\{X^2\}$. Also, $E\{XY_i\} = E\{X(X + V_i)\} = \sigma_x^2$ for $i = 1, \ldots, m$. Thus, $G = [\gamma_{ij}]$, where $\gamma_{ij} = \sigma_x^2 + \rho_{ij}$ for $i, j = 1, \ldots, m$, $b^T = (\beta_1, \ldots, \beta_m)$, where $\beta_i = \sigma_x^2$ for $i = 1, \ldots, m$, and $a^T = b^T G^{-1}$. ∎
6.15.12. Exercise. In the preceding example, show that if $\rho_{ij} = \sigma_v^2 \delta_{ij}$ for $i, j = 1, \ldots, m$, where $\delta_{ij}$ is the Kronecker delta, then
$$\alpha_i = \frac{\sigma_x^2}{m\sigma_x^2 + \sigma_v^2} \quad \text{for } i = 1, \ldots, m.$$
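The closed form in Exercise 6.15.12 can be compared against a direct numerical solution of $Ga = b$ (equivalent to $a^T = b^T G^{-1}$ because $G$ is symmetric). The sketch below uses a small hand-written Gaussian elimination so that it needs no external libraries; the values of `m`, `var_x`, and `var_v` are arbitrary illustrative choices.

```python
def solve(G, b):
    """Solve G a = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(G)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    a = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * a[j] for j in range(i + 1, n))
        a[i] = (M[i][n] - s) / M[i][i]
    return a

m, var_x, var_v = 4, 2.0, 0.5  # illustrative values only

# gamma_ij = var_x + var_v * delta_ij and beta_i = var_x, as in the example.
G = [[var_x + (var_v if i == j else 0.0) for j in range(m)] for i in range(m)]
b = [var_x] * m

alpha = solve(G, b)
expected = var_x / (m * var_x + var_v)
```

Every weight comes out equal, so in this case the best linear estimate is a scaled average of the measurements, with the scale shrinking toward zero as the noise variance grows.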
The next result provides us with a useful means for finding the best linear estimate of a random variable $X$, given a set of random variables $\{Y_1, \ldots, Y_k\}$, if we already have the best linear estimate, given $\{Y_1, \ldots, Y_{k-1}\}$.
6.15.13. Theorem. Let $k \ge 2$, and let $Y_1, \ldots, Y_k$ be random variables in $L_2$. Let $\mathcal{Y}_j = V(\{Y_1, \ldots, Y_j\})$, the linear vector space generated by the random variables $\{Y_1, \ldots, Y_j\}$, for $1 \le j \le k$. Let $\hat{Y}_k(k - 1)$ denote the best linear estimate of $Y_k$, given $\{Y_1, \ldots, Y_{k-1}\}$, and let $\tilde{Y}_k(k - 1) = Y_k - \hat{Y}_k(k - 1)$. Then $\mathcal{Y}_k = \mathcal{Y}_{k-1} \oplus V(\{\tilde{Y}_k(k - 1)\})$.
Proof. By the classical projection theorem (see Theorem 6.12.12), $\tilde{Y}_k(k - 1) \perp \mathcal{Y}_{k-1}$. Now for arbitrary $Z \in \mathcal{Y}_k$, we must have $Z = c_1 Y_1 + \cdots + c_{k-1} Y_{k-1} + c_k Y_k$ for some $(c_1, \ldots, c_k)$. We can rewrite this as $Z = Z_1 + Z_2$, where $Z_1 = c_1 Y_1 + \cdots + c_{k-1} Y_{k-1} + c_k \hat{Y}_k(k - 1)$ and $Z_2 = c_k \tilde{Y}_k(k - 1)$. Since $Z_1 \in \mathcal{Y}_{k-1}$ and $Z_2 \perp \mathcal{Y}_{k-1}$, it follows from Theorem 6.12.12 that $Z_1$ and $Z_2$ are unique. Since $Z_1 \in \mathcal{Y}_{k-1}$ and $Z_2 \in V(\{\tilde{Y}_k(k - 1)\})$, the theorem is proved. ∎
We can extend the problem of estimation of (scalar) random variables to random vectors. Let $X_1, \ldots, X_n$ be random variables in $L_2$, and let $X = (X_1, \ldots, X_n)^T$ be a random vector. Let $Y_1, \ldots, Y_m$ be random variables in $L_2$. We call $\hat{X} = (\hat{X}_1, \ldots, \hat{X}_n)^T$ the best linear estimate of $X$, given $\{Y_1, \ldots, Y_m\}$, if $\hat{X}_i$ is the best linear estimate of $X_i$, given $\{Y_1, \ldots, Y_m\}$, for $i = 1, \ldots, n$. Clearly, the orthogonality principle must hold for each $\hat{X}_i$; i.e., we must have $E\{(X_i - \hat{X}_i)Y_j\} = 0$ for $i = 1, \ldots, n$ and $j = 1, \ldots, m$. In this case $\hat{X}$ can be expressed as $\hat{X} = AY$, where $A$ is an $(n \times m)$ matrix of real numbers and $Y = (Y_1, \ldots, Y_m)^T$. Corollary 6.15.9 assumes now the following matrix form.
6.15.14. Theorem. Let $X_1, \ldots, X_n, Y_1, \ldots, Y_m$ be random variables in $L_2$. Let $G = [\gamma_{ij}]$, where $\gamma_{ij} = E\{Y_i Y_j\}$ for $i, j = 1, \ldots, m$, and let $B = [\beta_{ij}]$, where $\beta_{ij} = E\{X_i Y_j\}$ for $i = 1, \ldots, n$ and $j = 1, \ldots, m$. If $G$ is non-singular, then $\hat{X} = AY$ is the best linear estimate of $X$, given $Y$, if and only if $A = BG^{-1}$.
6.15.15. Exercise. Prove Theorem 6.15.14.
We note that $B$ and $G$ in the above theorem can be written in an alternate way. That is, we can say that
$$\hat{X} = E\{XY^T\}[E\{YY^T\}]^{-1}Y \tag{6.15.16}$$
is the best linear estimate of $X$. By the expected value of a matrix of random variables, we mean the expected value of each element of the matrix.
In the remainder of this section we apply the preceding development to dynamic systems. Let $J = \{1, 2, \ldots\}$ denote the set of positive integers. We use the notation $\{X(k)\}$ to denote a sequence of random vectors; i.e., $X(k)$ is a random vector for each $k \in J$. Let $\{U(k)\}$ be a sequence of random vectors, $U(k) = [U_1(k), \ldots, U_p(k)]^T$, with the properties
$$E\{U(k)\} = 0 \tag{6.15.17}$$
and
$$E\{U(k)U^T(j)\} = Q(k)\delta_{jk} \tag{6.15.18}$$
for all $j, k \in J$, where $Q(k)$ is a symmetric positive definite $(p \times p)$ matrix for all $k \in J$. Next, let $\{V(k)\}$ be a sequence of random vectors, $V(k) = [V_1(k), \ldots, V_m(k)]^T$, with the properties
$$E\{V(k)\} = 0 \tag{6.15.19}$$
and
$$E\{V(k)V^T(j)\} = R(k)\delta_{jk} \tag{6.15.20}$$
for all $j, k \in J$, where $R(k)$ is a symmetric positive definite $(m \times m)$ matrix for all $k \in J$. Now let $X(1)$ be a random vector, $X(1) = [X_1(1), \ldots, X_n(1)]^T$, with the properties
$$E\{X(1)\} = 0 \tag{6.15.21}$$
and
$$E\{X(1)X^T(1)\} = P(1), \tag{6.15.22}$$
where $P(1)$ is an $(n \times n)$ symmetric positive definite matrix. We assume further that the relationships among the random vectors are such that
$$E\{U(k)V^T(j)\} = 0, \tag{6.15.23}$$
$$E\{X(1)U^T(k)\} = 0, \tag{6.15.24}$$
and
$$E\{X(1)V^T(k)\} = 0 \tag{6.15.25}$$
for all $k, j \in J$. Next, let $A(k)$ be a real $(n \times n)$ matrix for each $k \in J$, let $B(k)$ be a real $(n \times p)$ matrix for each $k \in J$, and let $C(k)$ be a real $(m \times n)$ matrix for each $k \in J$. We let $\{X(k)\}$ and $\{Y(k)\}$ be the sequences of random vectors generated by the difference equations
$$X(k + 1) = A(k)X(k) + B(k)U(k) \tag{6.15.26}$$
and
$$Y(k) = C(k)X(k) + V(k) \tag{6.15.27}$$
for $k = 1, 2, \ldots$.
We are now in a position to consider the following estimation problem: given the set of observations, $\{Y(1), \ldots, Y(k)\}$, find the best linear estimate of the random vector $X(k)$. We could view the observed random variables as a single random vector, say $\mathcal{Y}^T = [Y^T(1), Y^T(2), \ldots, Y^T(k)]$, and apply Theorem 6.15.14; however, it turns out that a rather elegant and significant algorithm exists for this problem, due to R. E. Kalman, which we consider next.
In the following, we adopt some additional convenient notation. For each $k, j \in J$, we let $\hat{X}(j \mid k)$ denote the best linear estimate of $X(j)$, given $\{Y(1), \ldots, Y(k)\}$. This notation is valid for $j < k$ and $j \ge k$; however, we shall limit our attention to the situation where $j \ge k$. In the present context, a recursive algorithm means that $\hat{X}(k + 1 \mid k + 1)$ is a function only of $\hat{X}(k \mid k)$ and $Y(k + 1)$. The following theorem, which is the last result of this section, provides the desired algorithm explicitly.
6.15.28. Theorem (Kalman). Given the foregoing assumptions for the dynamic system described by Eqs. (6.15.26) and (6.15.27), the best linear estimate of $X(k)$, given $\{Y(1), \ldots, Y(k)\}$, is provided by the following set of difference equations:
$$\hat{X}(k \mid k) = \hat{X}(k \mid k-1) + K(k)[Y(k) - C(k)\hat{X}(k \mid k-1)], \tag{6.15.29}$$
and
$$\hat{X}(k+1 \mid k) = A(k)\hat{X}(k \mid k), \tag{6.15.30}$$
where
$$K(k) = P(k \mid k-1)C^T(k)[C(k)P(k \mid k-1)C^T(k) + R(k)]^{-1}, \tag{6.15.31}$$
$$P(k \mid k) = [I - K(k)C(k)]P(k \mid k-1), \tag{6.15.32}$$
and
$$P(k+1 \mid k) = A(k)P(k \mid k)A^T(k) + B(k)Q(k)B^T(k) \tag{6.15.33}$$
for $k = 1, 2, \ldots$, with initial conditions
$$\hat{X}(1 \mid 0) = 0 \quad \text{and} \quad P(1 \mid 0) = P(1).$$
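The recursion of Theorem 6.15.28 can be sketched in a few lines of code. The following is a minimal illustration only, under simplifying assumptions that are not part of the theorem: the state and the observation are scalars ($n = m = p = 1$), and the coefficients $A$, $B$, $C$, $Q$, $R$ are constant in $k$. The function names `kalman_step` and `kalman_filter` are hypothetical.

```python
def kalman_step(x_pred, p_pred, y, a, b, c, q, r):
    """One cycle of Eqs. (6.15.29)-(6.15.33), scalar case (an assumption).

    x_pred, p_pred play the roles of X^(k|k-1) and P(k|k-1); y is Y(k).
    """
    gain = p_pred * c / (c * p_pred * c + r)    # Eq. (6.15.31)
    x_filt = x_pred + gain * (y - c * x_pred)   # Eq. (6.15.29)
    p_filt = (1.0 - gain * c) * p_pred          # Eq. (6.15.32)
    x_next = a * x_filt                         # Eq. (6.15.30)
    p_next = a * p_filt * a + b * q * b         # Eq. (6.15.33)
    return x_filt, p_filt, x_next, p_next


def kalman_filter(observations, a, b, c, q, r, p1):
    """Run the recursion from the initial conditions X^(1|0) = 0, P(1|0) = p1."""
    x_pred, p_pred = 0.0, p1
    estimates = []
    for y in observations:
        x_filt, p_filt, x_pred, p_pred = kalman_step(
            x_pred, p_pred, y, a, b, c, q, r)
        estimates.append((x_filt, p_filt))
    return estimates
```

For instance, `kalman_filter([2.0, 2.0], 1.0, 1.0, 1.0, 0.0, 1.0, 1.0)` processes two observations of a constant state. Each new estimate uses only the previous prediction and the current observation, which is the recursive property described before the theorem.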
Proof. Assume that $\hat{X}(k \mid k-1)$ is known for $k \in J$. We may interpret $\hat{X}(1 \mid 0)$ as the best linear estimate of $X(1)$, given no observations. We wish to find $\hat{X}(k \mid k)$ and $\hat{X}(k+1 \mid k)$. It follows from Theorem 6.15.13 (extended to the case of random vectors) that there is a matrix $K(k)$ such that
$$\hat{X}(k \mid k) = \hat{X}(k \mid k-1) + K(k)\tilde{Y}(k \mid k-1),$$
where $\tilde{Y}(k \mid k-1) = Y(k) - \hat{Y}(k \mid k-1)$, and $\hat{Y}(k \mid k-1)$ is the best linear estimate of $Y(k)$, given $\{Y(1), \ldots, Y(k-1)\}$. It follows immediately from Eqs. (6.15.23) and (6.15.27) and the orthogonality principle that $\hat{Y}(k \mid k-1) = C(k)\hat{X}(k \mid k-1)$. Thus, we have shown that Eq. (6.15.29) must be true. In order to determine $K(k)$, let $\tilde{X}(k \mid k-1) = X(k) - \hat{X}(k \mid k-1)$. Then it follows from Eqs. (6.15.26) and (6.15.29) that
$$\tilde{X}(k \mid k) = \tilde{X}(k \mid k-1) - K(k)[C(k)\tilde{X}(k \mid k-1) + V(k)].$$
To satisfy the orthogonality principle, we must have $E\{\tilde{X}(k \mid k)Y^T(j)\} = 0$ for $j = 1, \ldots, k$. We see that this is satisfied for any $K(k)$ for $j = 1, \ldots, k-1$. In order to satisfy $E\{\tilde{X}(k \mid k)Y^T(k)\} = 0$, $K(k)$ must satisfy
$$0 = E\{\tilde{X}(k \mid k-1)Y^T(k)\} - K(k)[C(k)E\{\tilde{X}(k \mid k-1)Y^T(k)\} + E\{V(k)Y^T(k)\}]. \tag{6.15.34}$$
Let us first consider the term
$$E\{\tilde{X}(k \mid k-1)Y^T(k)\} = E\{\tilde{X}(k \mid k-1)X^T(k)C^T(k) + \tilde{X}(k \mid k-1)V^T(k)\}. \tag{6.15.35}$$
We observe that $X(k)$, the solution to the difference equation (6.15.26) at (time) $k$, is a linear combination of $X(1)$ and $U(1), \ldots, U(k-1)$. In view of Eqs. (6.15.23) and (6.15.25) it follows that $E\{X(j)V^T(k)\} = 0$ for all $k, j \in J$. Hence, $E\{\tilde{X}(k \mid k-1)V^T(k)\} = 0$, since $\tilde{X}(k \mid k-1)$ is a linear combination of $X(k)$ and $Y(1), \ldots, Y(k-1)$. Next, we consider the term
$$E\{\tilde{X}(k \mid k-1)X^T(k)\} = E\{\tilde{X}(k \mid k-1)[X^T(k) - \hat{X}^T(k \mid k-1) + \hat{X}^T(k \mid k-1)]\}$$
$$= E\{\tilde{X}(k \mid k-1)[\tilde{X}^T(k \mid k-1) + \hat{X}^T(k \mid k-1)]\}$$
$$= P(k \mid k-1), \tag{6.15.36}$$
where
$$P(k \mid k-1) \triangleq E\{\tilde{X}(k \mid k-1)\tilde{X}^T(k \mid k-1)\}$$
and $E\{\tilde{X}(k \mid k-1)\hat{X}^T(k \mid k-1)\} = 0$, since $\hat{X}(k \mid k-1)$ is a linear combination of $\{Y(1), \ldots, Y(k-1)\}$.
Now consider
$$E\{V(k)Y^T(k)\} = E\{V(k)[X^T(k)C^T(k) + V^T(k)]\} = R(k). \tag{6.15.37}$$
Using Eqs. (6.15.35), (6.15.36), and (6.15.37), Eq. (6.15.34) becomes
$$0 = P(k \mid k-1)C^T(k) - K(k)[C(k)P(k \mid k-1)C^T(k) + R(k)]. \tag{6.15.38}$$
Solving for $K(k)$, we obtain Eq. (6.15.31). To obtain Eq. (6.15.32), let $\tilde{X}(k \mid k) = X(k) - \hat{X}(k \mid k)$ and $P(k \mid k) = E\{\tilde{X}(k \mid k)\tilde{X}^T(k \mid k)\}$. In view of Eqs. (6.15.27) and (6.15.29) we have
$$\tilde{X}(k \mid k) = \tilde{X}(k \mid k-1) - K(k)[C(k)\tilde{X}(k \mid k-1) + V(k)]$$
$$= [I - K(k)C(k)]\tilde{X}(k \mid k-1) - K(k)V(k).$$
From this it follows that
$$P(k \mid k) = [I - K(k)C(k)]P(k \mid k-1)[I - K(k)C(k)]^T + K(k)R(k)K^T(k)$$
$$= [I - K(k)C(k)]P(k \mid k-1) - [I - K(k)C(k)]P(k \mid k-1)C^T(k)K^T(k) + K(k)R(k)K^T(k)$$
$$= [I - K(k)C(k)]P(k \mid k-1) - \{P(k \mid k-1)C^T(k) - K(k)[C(k)P(k \mid k-1)C^T(k) + R(k)]\}K^T(k).$$
Using Eq. (6.15.38), it follows that Eq. (6.15.32) must be true. To show that $\hat{X}(k+1 \mid k)$ is given by Eq. (6.15.30), we simply show that the orthogonality principle is satisfied. That is,
$$E\{[X(k+1) - A(k)\hat{X}(k \mid k)]Y^T(j)\} = E\{A(k)[X(k) - \hat{X}(k \mid k)]Y^T(j)\} + E\{B(k)U(k)Y^T(j)\} = 0$$
for $j = 1, \ldots, k$.
Finally, to verify Eq. (6.15.33), we have from Eqs. (6.15.26) and (6.15.30)
$$\tilde{X}(k+1 \mid k) = A(k)\tilde{X}(k \mid k) + B(k)U(k).$$
From this, Eq. (6.15.33) follows immediately. We note that $\hat{X}(1 \mid 0) = 0$ and $P(1 \mid 0) = P(1)$. This completes the proof. ∎

6.16. NOTES AND REFERENCES
The material of the present chapter as well as that of the next chapter constitutes part of what usually goes under the heading of functional analysis. Thus, these two chapters should be viewed as a whole rather than two separate parts. There are numerous excellent sources dealing with Hilbert and Banach spaces. We cite a representative sample of these which the reader should consult for further study. References [6.6]-[6.8], [6.10], and [6.12] are at an introductory or intermediate level, whereas references [6.2]-[6.4] and [6.13] are at a more advanced level. The books by Dunford and Schwartz and by Hille and Phillips are standard and encyclopedic references on functional analysis; the text by Yosida constitutes a concise treatment of this subject, while the monograph by Halmos contains a compact exposition on Hilbert space. The book by Taylor is a standard reference on functional analysis at the intermediate level. The texts by Kantorovich and Akilov, by Kolmogorov and Fomin, and by Liusternik and Sobolev are very readable presentations of this subject. The book by Naylor and Sell, which presents a very nice introduction to functional analysis, includes some interesting examples. For references with applications of functional analysis to specific areas, including those in Section 6.15, see, e.g., Byron and Fuller [6.1], Kalman et al. [6.5], Luenberger [6.9], and Porter [6.11].
REFERENCES
[6.1] F. W. BYRON and R. W. FULLER, Mathematics of Classical and Quantum Physics. Vols. I, II. Reading, Mass.: Addison-Wesley Publishing Co., Inc., 1969 and 1970.*
[6.2] N. DUNFORD and J. SCHWARTZ, Linear Operators. Parts I and II. New York: Interscience Publishers, 1958 and 1964.
[6.3] P. R. HALMOS, Introduction to Hilbert Space. New York: Chelsea Publishing Company, 1957.
[6.4] E. HILLE and R. S. PHILLIPS, Functional Analysis and Semi-Groups. Providence, R.I.: American Mathematical Society, 1957.
[6.5] R. E. KALMAN, P. L. FALB, and M. A. ARBIB, Topics in Mathematical System Theory. New York: McGraw-Hill Book Company, 1969.
[6.6] L. V. KANTOROVICH and G. P. AKILOV, Functional Analysis in Normed Spaces. New York: The Macmillan Company, 1964.
[6.7] A. N. KOLMOGOROV and S. V. FOMIN, Elements of the Theory of Functions and Functional Analysis. Vols. I, II. Albany, N.Y.: Graylock Press, 1957 and 1961.
[6.8] L. A. LIUSTERNIK and V. J. SOBOLEV, Elements of Functional Analysis. New York: Frederick Ungar Publishing Company, 1961.
[6.9] D. G. LUENBERGER, Optimization by Vector Space Methods. New York: John Wiley & Sons, Inc., 1969.
[6.10] A. W. NAYLOR and G. R. SELL, Linear Operator Theory. New York: Holt, Rinehart and Winston, 1971.
[6.11] W. A. PORTER, Modern Foundations of Systems Engineering. New York: The Macmillan Company, 1966.
[6.12] A. E. TAYLOR, Introduction to Functional Analysis. New York: John Wiley & Sons, Inc., 1958.
[6.13] K. YOSIDA, Functional Analysis. Berlin: Springer-Verlag, 1965.
*Reprinted in one volume by Dover Publications, Inc., New York, 1992.
7
LINEAR OPERATORS
In the present chapter we concern ourselves with linear operators defined on Banach and Hilbert spaces and we study some of the important properties of such operators. We also consider selected applications in this chapter. This chapter consists of ten parts. Throughout, we consider primarily bounded linear operators, which we introduce in the first section. In the second section we look at inverses of linear transformations, in section three we introduce conjugate and adjoint operators, and in section four we study hermitian operators. In the fifth section we present additional special linear transformations, including normal operators, projections, unitary operators, and isometric operators. The spectrum of an operator is considered in the sixth, while completely continuous operators are introduced in the seventh section. In the eighth section we present one of the main results of the present chapter, the spectral theorem for completely continuous normal operators. Finally, in section nine we study differentiation of operators (which need not be linear) defined on Banach and Hilbert spaces. Section ten, which consists of three subsections, is devoted to selected topics in applications. Items touched upon include applications to integral equations, an example from optimal control, and minimization of functionals (method of steepest descent). The chapter is concluded with a brief discussion of pertinent references in the eleventh section.
7.1. BOUNDED LINEAR TRANSFORMATIONS
Throughout this section $X$ and $Y$ denote vector spaces over the same field $F$, where $F$ is either $R$ (the real numbers) or $C$ (the complex numbers). We begin by pointing to several concepts considered previously. Recall from Chapter 1 that a transformation or operator $T$ is a mapping of a subset $\mathfrak{D}(T)$ of $X$ into $Y$. Unless specified to the contrary, we will assume that $X = \mathfrak{D}(T)$. Since a transformation is a mapping we distinguish, as in Chapter 1, between operators which are onto or surjective, one-to-one or injective, and one-to-one and onto or bijective. If $T$ is a transformation of $X$ into $Y$ we write $T: X \to Y$. If $x \in X$ we call $y = T(x)$ the image of $x$ in $Y$ under $T$, and if $V \subset X$ we define the image of set $V$ in $Y$ under $T$ as the set
$$T(V) = \{y \in Y: y = T(v),\ v \in V \subset X\}.$$
On the other hand, if $W \subset Y$, then the inverse image of set $W$ under $T$ is the set
$$T^{-1}(W) = \{x \in X: y = T(x) \in W \subset Y\}.$$
We define the range of $T$, denoted $\mathfrak{R}(T)$, by
$$\mathfrak{R}(T) = \{y \in Y: y = T(x),\ x \in X\};$$
i.e., $\mathfrak{R}(T) = T(X)$. Recall that if a transformation $T$ of $X$ into $Y$ is injective, then the inverse of $T$, denoted $T^{-1}$, exists (see Definition 1.2.9). Thus, if $y = T(x)$ and if $T$ is injective, then $x = T^{-1}(y)$.
In Definition 3.4.1 we defined a linear operator (or a linear transformation) as a mapping of $X$ into $Y$ having the property that
(i) $T(x + y) = T(x) + T(y)$ for all $x, y \in X$; and
(ii) $T(\alpha x) = \alpha T(x)$ for all $\alpha \in F$ and all $x \in X$.
As in Chapter 3, we denote the class of all linear transformations from $X$ into $Y$ by $L(X, Y)$. Also, in the case of linear transformations we write $Tx$ in place of $T(x)$.
Of great importance are bounded linear operators, which turn out to be also continuous. We have the following definition.
7.1.1. Definition. Let $X$ and $Y$ be normed linear spaces. A linear operator $T: X \to Y$ is said to be bounded if there is a real number $\gamma \ge 0$ such that
$$\|Tx\|_Y \le \gamma \|x\|_X$$
for all $x \in X$.
The notation $\|x\|_X$ indicates that the norm on $X$ is used, while the notation $\|Tx\|_Y$ indicates that the norm on $Y$ is employed. However, since the norms of the various spaces are usually understood, it is customary to drop the subscripts and simply write $\|x\|$ and $\|Tx\|$.
Our first result allows us to characterize a bounded linear operator in an equivalent way.
7.1.2. Theorem. Let $T \in L(X, Y)$. Then $T$ is bounded if and only if $T$ maps the unit sphere into a bounded subset of $Y$.
7.1.3. Exercise. Prove Theorem 7.1.2.
In Chapter 5 we introduced continuous functions (see Definition 5.7.1). The definition of continuity of an operator in the setting of normed linear spaces can now be rephrased as follows.
7.1.4. Definition. An operator $T: X \to Y$ (not necessarily linear) is said to be continuous at a point $x_0 \in X$ if for every $\epsilon > 0$ there is a $\delta > 0$ such that
$$\|T(x) - T(x_0)\| < \epsilon$$
whenever $\|x - x_0\| < \delta$.
0 as $n \to \infty$. Then $\|Tx_n\| \le \gamma\|x_n\| \to 0$ as $n \to \infty$, and hence $T$ is continuous at the point $0 \in X$. From Theorem 7.1.5 it follows that $T$ is continuous at all points $x \in X$. Conversely, assume that $T$ is continuous at $x = 0$, and hence at all $x \in X$. Since $T0 = 0$ we can find a $\delta > 0$ such that $\|Tx\| < 1$ whenever $\|x\| \le \delta$. For any $x \ne 0$ we have $\|(\delta x)/\|x\|\| = \delta$, and hence
$$\|Tx\| = \left\|T\!\left(\frac{\delta x}{\|x\|}\right)\right\| \cdot \frac{\|x\|}{\delta} < \frac{\|x\|}{\delta}.$$
If we let $\gamma = 1/\delta$, then $\|Tx\| \le \gamma\|x\|$, and $T$ is bounded. ∎
(i) $\|T\| \ge 0$ for every $T \in B(X, Y)$, and $\|T\| = 0$ if and only if $T = 0$;
(ii) $\|S + T\| \le \|S\| + \|T\|$ for every $S, T \in B(X, Y)$; and
(iii) $\|\alpha T\| = |\alpha|\,\|T\|$ for every $T \in B(X, Y)$ and for every $\alpha \in F$.
Proof
be
EX } ;
I{ I
1"'1=\
can equivalently
for all x EX } ;
x
sup
I",I:S:\
II Til
)Y ,
B(X ,
E
0;
The proof of part (i) is obvious. To verify (ii) we note that
$$\|(S + T)x\| = \|Sx + Tx\| \le \|Sx\| + \|Tx\| \le (\|S\| + \|T\|)\|x\|.$$
If $x = 0$, then we are finished. If $x \ne 0$, then $\|(S + T)x\|/\|x\| \le \|S\| + \|T\|$ for all $x \in X$, $x \ne 0$, and so
$$\|S + T\| = \sup_{x \ne 0} \frac{\|(S + T)x\|}{\|x\|} \le \|S\| + \|T\|.$$
We leave the proof of part (iii), which is similar, as an exercise. ∎
For the space $B(X, X)$ we have the following results.
7.1.16. Theorem. If $S, T \in B(X, X)$, then $ST \in B(X, X)$ and
$$\|ST\| \le \|S\| \cdot \|T\|.$$
Proof. For each $x \in X$ we have
$$\|(ST)x\| = \|S(Tx)\| \le \|S\| \cdot \|Tx\| \le \|S\| \cdot \|T\| \cdot \|x\|,$$
which shows that $ST \in B(X, X)$. If $x \ne 0$, then
$$\|ST\| = \sup_{x \ne 0} \frac{\|(ST)x\|}{\|x\|} \le \|S\| \cdot \|T\|,$$
completing the proof. ∎
7.1.17. Theorem. Let $I$ denote the identity operator on $X$. Then $I \in B(X, X)$, and $\|I\| = 1$.
7.1.18. Exercise. Prove Theorem 7.1.17.
We now consider some specific cases.
7.1.19. Example. Let $X = l_2$, the Banach space of Example 6.1.6. For $x = (\xi_1, \xi_2, \ldots) \in X$, let us define $T: X \to X$ by
$$Tx = (0, \xi_2, \xi_3, \ldots).$$
The reader can readily verify that $T$ is a linear operator which is neither injective nor surjective. We see that
$$\|Tx\|^2 = \sum_{i=2}^{\infty} |\xi_i|^2 \le \sum_{i=1}^{\infty} |\xi_i|^2 = \|x\|^2.$$
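The behavior of the operator in Example 7.1.19 can be checked numerically on finite truncations of sequences. The sketch below is illustrative only and assumes that a sequence in $l_2$ is represented by a nonempty finite list of its leading coordinates, with all remaining coordinates equal to zero; the names `T` and `norm2` are hypothetical.

```python
import math


def T(x):
    """Map (xi_1, xi_2, ...) to (0, xi_2, xi_3, ...) on a finite truncation."""
    return [0.0] + list(x[1:])


def norm2(x):
    """The l_2 norm of a finitely supported sequence."""
    return math.sqrt(sum(abs(v) ** 2 for v in x))


x = [3.0, 4.0, 0.0]
# The first coordinate is discarded, so the norm cannot increase.
assert norm2(T(x)) <= norm2(x)
# Two different inputs with the same image: T is not injective.
assert T([1.0, 0.0]) == T([2.0, 0.0])
```

Every image has first coordinate zero, which is why no sequence with $\xi_1 \ne 0$ lies in the range.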
$R$ be a real-valued function, continuous on the square $a \le s \le b$, $a \le t \le b$. Define the operator $T: X \to X$ by
$$[Tx](s) = \int_a^b k(s, t)x(t)\,dt$$
for $x \in X$. Then $T \in L(X, X)$ (see Example 3.4.6). Then
$$\|Tx\| = \sup_{a \le s \le b} \left|\int_a^b k(s, t)x(t)\,dt\right| \le \left[\sup_{a \le s \le b} \int_a^b |k(s, t)|\,dt\right] \cdot \|x\| = \gamma_0\|x\|.$$
This shows that $T \in B(X, X)$ and that $\|T\| \le \gamma_0$. It can be shown that $\|T\| = \gamma_0$. ∎
7.1.21. Theorem. Let $T \in L(X, Y)$. If $X$ is finite dimensional, then $T$ is continuous.
Proof. Let $\{x_1, \ldots, x_n\}$ be a basis for $X$. For each $x \in X$, there is a unique set of scalars $\{\xi_1, \ldots, \xi_n\}$ such that $x = \xi_1 x_1 + \cdots + \xi_n x_n$. If we define $f_i: X \to F$ by $f_i(x) = \xi_i$, $i = 1, \ldots, n$, then by Theorem 6.6.1 we know that each $f_i$ is a continuous linear functional. Thus, there exists a set of real numbers $\{\gamma_1, \ldots, \gamma_n\}$ such that $|f_i(x)| \le \gamma_i\|x\|$ for $i = 1, \ldots, n$. Now
$$Tx = \xi_1 Tx_1 + \cdots + \xi_n Tx_n.$$
If we let $\beta = \max_i \|Tx_i\|$ and $\gamma_0 = \max_i \gamma_i$, then it follows that $\|Tx\| \le n\beta\gamma_0\|x\|$. Thus, $T$ is bounded and hence continuous. ∎
Next, we concern ourselves with various norms of linear transformations on the finite dimensional space $R^n$.
7.1.22. Example. Let $X = R^n$, and let $\{u_1, \ldots, u_n\}$ be the natural basis for $R^n$ (see Example 4.1.15). For any $A \in L(X, X)$ there is an $n \times n$ matrix, say $\mathbf{A} = [a_{ij}]$ (see Definition 4.2.7), which represents $A$ with respect to $\{u_1, \ldots, u_n\}$. Thus, if $Ax = y$, where $x = (\xi_1, \ldots, \xi_n) \in X$ and $y = (\eta_1, \ldots, \eta_n) \in X$, we may represent this transformation by $\mathbf{y} = \mathbf{A}\mathbf{x}$ (see Eq. (4.2.17)). In Example 6.1.5 we defined several norms on $R^n$, namely
$$\|x\|_p = [|\xi_1|^p + \cdots + |\xi_n|^p]^{1/p}$$
and
$$\|x\|_\infty = \max_i \{|\xi_i|\}.$$
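As a concrete illustration of the two vector norms recalled in Example 7.1.22, the following sketch evaluates them for a vector in $R^n$ stored as a Python list. The function names `p_norm` and `max_norm` are hypothetical, and the sketch assumes $p \ge 1$.

```python
def p_norm(x, p):
    """The norm [ |xi_1|^p + ... + |xi_n|^p ]^(1/p), assuming p >= 1."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def max_norm(x):
    """The norm max_i |xi_i|."""
    return max(abs(v) for v in x)


x = [3.0, -4.0]
one_norm = p_norm(x, 1)   # sum of absolute values
two_norm = p_norm(x, 2)   # Euclidean length
sup_norm = max_norm(x)    # largest absolute coordinate
```

For this vector the three values are ordered as `sup_norm <= two_norm <= one_norm`.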
0 such that $S(x; \epsilon_n) \cap A_n = \emptyset$. Now let $x_1 \in X$ and $\epsilon_1 > 0$ be such that $S(x_1; \epsilon_1) \cap A_1 = \emptyset$. Let $x_2 \in X$ and $\epsilon_2 > 0$ be such that $S(x_2; \epsilon_2) \subset S(x_1; \epsilon_1)$ and $S(x_2; \epsilon_2) \cap A_2 = \emptyset$. We see that it is possible to construct a sequence of closed nested spheres, $\{K_n\}$ (see Definition 5.5.34), in such a fashion that the diameter of these spheres, $\operatorname{diam}(K_n)$, converges to zero. In view of part (ii) of Theorem 5.5.35, $\bigcap_{n=1}^{\infty} K_n \ne \emptyset$. Let $x \in \bigcap_{n=1}^{\infty} K_n$. Then $x \notin A_n$ for all $n$. But this contradicts the fact that $X = \bigcup_{n=1}^{\infty} A_n$. This completes the proof of the proposition. ∎
Proof ofTheorem 7.2.6. Let Ak =
Clearly, Y
=
. U A
k- I
{ y E :Y k•
II r- I y II
(iii) if III > vector x
Let X
X),
let l E ,F
II T II, then Tx = h has a unique solution, namely x = II Til, then (T - 1/)-1 exists and is continuous on X ; II T II, then for a given y E X there is one and only such that (T -
E X
x =(iv) if III-
be a Banach space, let T E B(X,
Til
1' 2' ...• 'I., ... ) for all y
6.1.6) and define T:
= ('12' 1' 3' ... , 'I.,
=
2
y, is the operator
... )
12,
Recalling the definition of orthogonal complement (refer to Definition 6.12.1), we have the following important results for bounded linear operators on Hilbert spaces.
7.3.13. Theorem. L e t T be a bounded linear operator on a H i lbert X into a H i lbert space .Y Then, (i) { R < (T)}.L (ii) R < (T) = (iii) ~(T) = (iv) R < (T*) (v)
(~ T*)
(vi) R < (T)
=
=
space
~(T*);
~(T*).L;
R S.
B(X, X ) T- S> E
Let $S, T, U \in B(X, X)$ be hermitian operators, and let $\alpha \in R$. Then
(i) if $S \ge 0$, $T \ge 0$, then $(S + T) \ge 0$;
(ii) if $\alpha \ge 0$, $T \ge 0$, then $\alpha T \ge 0$;
(iii) if $S \le T$, $T \le U$, then $S \le U$; and
(iv) if $T \ge 0$, then for any $V \in B(X, X)$, $V^*TV \ge 0$. In particular, $V^*V \ge 0$.
Proof. The proofs of parts (i)-(iii) are obvious. For example, if $S \ge 0$, $T \ge 0$, then $(Sx, x) + (Tx, x) = (Sx + Tx, x) = ((S + T)x, x) \ge 0$ and $(S + T) \ge 0$. To prove part (iv) we note that $(V^*TVx, x) = (TVx, Vx) \ge 0$, since $Vx = y$ is a vector in $X$ and $(Ty, y) \ge 0$ for all $y \in X$. If we consider, in particular, $T = I = I^*$, then $V^*V \ge 0$. ∎
The proof of the next result follows by direct verification of the formulas involved.
7.4.15. Theorem. Let $A \in B(X, X)$, and let
$$U = \frac{1}{2}[A + A^*]$$
and
$$V = \frac{1}{2i}[A - A^*],$$
where $i = \sqrt{-1}$. Then
(i) $U$ and $V$ are hermitian operators; and
(ii) if $A = C + iD$, where $C$ and $D$ are hermitian, then $C = U$ and $D = V$.
7.4.16. Exercise. Prove Theorem 7.4.15.
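The decomposition in Theorem 7.4.15 can be tried out on matrices. The sketch below is an illustration under the assumption that $X = C^n$ with an orthonormal basis, so that the adjoint is the conjugate transpose; matrices are lists of rows with complex entries, and the names `adjoint` and `cartesian_parts` are hypothetical.

```python
def adjoint(a):
    """Conjugate transpose of a square matrix given as a list of rows."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


def cartesian_parts(a):
    """Return (U, V) with U = (A + A*)/2 and V = (A - A*)/(2i)."""
    n = len(a)
    s = adjoint(a)
    u = [[(a[i][j] + s[i][j]) / 2 for j in range(n)] for i in range(n)]
    v = [[(a[i][j] - s[i][j]) / 2j for j in range(n)] for i in range(n)]
    return u, v


A = [[1 + 2j, 3 + 0j],
     [0 + 1j, 4 + 0j]]
U, V = cartesian_parts(A)
# Both parts equal their own conjugate transposes, and A = U + iV.
```

Here `U` collects the hermitian part of `A`, and `V` is hermitian as well, so that `A` is recovered entrywise as `U[i][j] + 1j * V[i][j]`.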
Let us now consider some specific cases.
7.4.17. Example. Let $X = C^n$ with inner product given in Example 3.6.24. Let $A \in B(X, X)$, and let $\{e_1, \ldots, e_n\}$ be any orthonormal basis for $X$. As we saw in Example 7.3.10, if $A$ is represented by the matrix $\mathbf{A}$, then $A^*$ is represented by $\mathbf{A}^* = \bar{\mathbf{A}}^T$. In this case $A$ is hermitian if and only if $\mathbf{A} = \bar{\mathbf{A}}^T$. ∎
7.4.18. Example. Let $X = L_2[a, b]$ (see Example 6.11.10), and define $T \in B(X, X)$ by
$$y = Tx, \qquad y(t) = t\,x(t).$$
Then for any $z \in X$ we have
$$(Tx, z) = \int_a^b t\,x(t)z(t)\,dt = \int_a^b x(t)\,t\,z(t)\,dt = (x, Tz) = (T^*x, z).$$
Thus, $T = T^*$ and $T$ is hermitian. ∎
7.4.19. Exercise. Let $X = L_2[a, b]$, and define $T: X \to X$ by
$$y = Tx, \qquad y(t) = \int_a^t x(s)\,ds.$$
Show that $T^* \ne T$ and therefore $T$ is not hermitian.
7.4.20. Exercise. Let $X = L_2[a, b]$ and consider the Fredholm operator given in Example 7.3.11; i.e.,
$$(Tx)(t) = \int_a^b k(s, t)x(s)\,ds, \qquad t \in [a, b].$$
Show that $T = T^*$ if and only if $k(t, s) = k(s, t)$.
We conclude this section with the following result, which we will subsequently require.
7.4.21. Theorem. Let $X$ be a Hilbert space, let $T \in B(X, X)$ be a hermitian operator, and let $\lambda \in R$. Then there exists a real number $\gamma > 0$ such that $\gamma\|x\| \le \|(T - \lambda I)x\|$ for all $x \in X$ if and only if $(T - \lambda I)$ is bijective and $(T - \lambda I)^{-1} \in B(X, X)$, in which case $\|(T - \lambda I)^{-1}\| \le 1/\gamma$.
Proof. Let $T_\lambda = T - \lambda I$. It follows from Theorem 7.4.10 that $T_\lambda$ is also hermitian. To prove sufficiency, let $T_\lambda^{-1} \in B(X, X)$. It follows that for all $y \in X$, $\|T_\lambda^{-1}y\| \le \|T_\lambda^{-1}\| \cdot \|y\|$. Letting $y = T_\lambda x$ and $\gamma = \|T_\lambda^{-1}\|^{-1}$, we have $\|T_\lambda x\| \ge \gamma\|x\|$ for all $x \in X$.
To prove necessity, let $\gamma > 0$ be such that $\gamma\|x\| \le \|T_\lambda x\|$ for all $x \in X$. We see that $T_\lambda x = 0$ implies $x = 0$; i.e., $\mathfrak{N}(T_\lambda) = \{0\}$, and so $T_\lambda$ is injective. We next show that $\overline{\mathfrak{R}(T_\lambda)} = X$. It follows from Theorem 6.12.16 that $X = \overline{\mathfrak{R}(T_\lambda)} \oplus \mathfrak{R}(T_\lambda)^\perp$. From Theorem 7.3.13, we have $\mathfrak{R}(T_\lambda)^\perp = \mathfrak{N}(T_\lambda^*)$. Since $T_\lambda$ is hermitian, $\mathfrak{N}(T_\lambda^*) = \mathfrak{N}(T_\lambda) = \{0\}$. Hence, $\overline{\mathfrak{R}(T_\lambda)} = X$. We next show that $\mathfrak{R}(T_\lambda) = \overline{\mathfrak{R}(T_\lambda)}$, i.e., the range of $T_\lambda$ is closed. Let $\{y_n\}$ be a sequence in $\mathfrak{R}(T_\lambda)$ such that $y_n \to y$. Then there is a sequence $\{x_n\}$ in $X$ such that $T_\lambda x_n = y_n$. For any positive integers $m, n$, $\gamma\|x_m - x_n\| \le \|T_\lambda x_m - T_\lambda x_n\| = \|y_m - y_n\|$. Since $\{y_n\}$ is Cauchy, $\{x_n\}$ must also be Cauchy. Let $x_n \to x$. Then $y_n = T_\lambda x_n \to T_\lambda x = y$. Thus, $y \in \mathfrak{R}(T_\lambda)$ and so $\mathfrak{R}(T_\lambda)$ is closed. This proves that $T_\lambda$ is bijective. Finally, $\gamma\|T_\lambda^{-1}y\| \le \|y\|$ for all $y \in X$ implies $T_\lambda^{-1} \in B(X, X)$ and $\|T_\lambda^{-1}\| \le 1/\gamma$. This completes the proof of the theorem. ∎
7.5. OTHER LINEAR OPERATORS: NORMAL OPERATORS, PROJECTIONS, UNITARY OPERATORS, AND ISOMETRIC OPERATORS
In this section we consider additional important types of linear operators. Throughout this section $X$ is a complex Hilbert space, $T^*$ denotes the adjoint of $T \in B(X, X)$, and $I \in B(X, X)$ denotes the identity operator.
7.5.1. Definition. An operator $T \in B(X, X)$ is said to be a normal operator if $T^*T = TT^*$.
7.5.2. Definition. An operator $T \in B(X, X)$ is said to be an isometric operator if $T^*T = I$.
7.5.3. Definition. An operator $T \in B(X, X)$ is said to be a unitary operator if $T^*T = TT^* = I$.
Our first result is for normal operators.
7.5.4. Theorem. Let $T \in B(X, X)$. Let $U, V \in B(X, X)$ be hermitian operators such that $T = U + iV$. Then $T$ is normal if and only if $UV = VU$.
7.5.5. Exercise. Prove Theorem 7.5.4. Recall that $U$ and $V$ are unique by Theorem 7.4.15.
For the next result, recall that a linear subspace $Y$ of $X$ is invariant under a linear transformation $T$ if $T(Y) \subset Y$ (see Definition 3.7.9). Also, recall that a closed linear subspace $Y$ of a Hilbert space $X$ is itself a Hilbert space with inner product induced by the inner product on $X$ (see Theorem 6.2.1).
7.5.6. Theorem. Let $T \in B(X, X)$ be a normal operator, and let $Y$ be a closed linear subspace of $X$ which is invariant under $T$. Let $T_1$ be the restriction of $T$ to $Y$. Then $T_1 \in B(Y, Y)$ and $T_1$ is normal.
7.5.7. Exercise. Prove Theorem 7.5.6.
For isometric operators we have the following result.
7.5.8. Theorem. Let $T \in B(X, X)$. Then the following are equivalent:
(i) $T$ is isometric;
(ii) $(Tx, Ty) = (x, y)$ for all $x, y \in X$; and
(iii) $\|Tx - Ty\| = \|x - y\|$ for all $x, y \in X$.
Proof. If $T$ is isometric, then $(x, y) = (Ix, y) = (T^*Tx, y) = (Tx, Ty)$ for all $x, y \in X$. Next, assume that $(Tx, Ty) = (x, y)$. Then $\|Tx - Ty\|^2 = \|T(x - y)\|^2 = (T(x - y), T(x - y)) = ((x - y), (x - y)) = \|x - y\|^2$; i.e., $\|Tx - Ty\| = \|x - y\|$. Finally, assume that $\|Tx - Ty\| = \|x - y\|$. Then $(T^*Tx, x) = (Tx, Tx) = \|Tx\|^2 = \|x\|^2 = (x, x)$; i.e., $(T^*Tx, x) = (x, x)$ for all $x \in X$. But this implies that $T^*T = I$; i.e., $T$ is isometric. ∎
Theorem 7.5.8 yields the following corollary.
7.5.9. Corollary. If $T \in B(X, X)$ is an isometric operator, then $\|Tx\| = \|x\|$ for all $x \in X$ and $\|T\| = 1$.
For unitary operators we have the following result.
7.5.10. Theorem. Let $T \in B(X, X)$. Then the following are equivalent:
(i) $T$ is unitary;
(ii) $T^*$ is unitary;
(iii) $T$ and $T^*$ are isometric;
(iv) $T$ is isometric and $T^*$ is injective;
(v) $T$ is isometric and surjective; and
(vi) $T$ is bijective and $T^{-1} = T^*$.
7.5.11. Exercise. Prove Theorem 7.5.10.
Before considering projections, let us briefly return to Section 3.7. Recall that if (a linear space) $X$ is the direct sum of two linear subspaces $X_1$ and $X_2$, i.e., $X = X_1 \oplus X_2$, then for each $x \in X$ there exist unique $x_1 \in X_1$ and $x_2 \in X_2$ such that $x = x_1 + x_2$. We call a mapping $P: X \to X$ defined by $Px = x_1$ the projection on $X_1$ along $X_2$. Recall that $P \in L(X, X)$, $\mathfrak{R}(P) = X_1$, and $\mathfrak{N}(P) = X_2$. Furthermore, recall that if $P \in L(X, X)$ is such that $P^2 = P$, then $P$ is said to be idempotent and this condition is both necessary and sufficient for $P$ to be a projection on $\mathfrak{R}(P)$ along $\mathfrak{N}(P)$ (see Theorem 3.7.4). Now if $X$ is a Hilbert space and if $X_1 = Y$ is a closed linear subspace of $X$, then $X_2 = Y^\perp$ and $X = Y \oplus Y^\perp$ (see Theorem 6.12.16). If for this particular case $P$ is the projection on $Y$ along $Y^\perp$, then $P$ is an orthogonal projection (see Definition 3.7.16). In this case we shall simply call $P$ the orthogonal projection on $Y$.
7.5.12. Theorem. Let $Y$ be a closed linear subspace of $X$ such that $Y \ne \{0\}$ and $Y \ne X$. Let $P$ be the orthogonal projection onto $Y$. Then
(i) $P \in B(X, X)$;
(ii) $\|P\| = 1$; and
(iii) $P^* = P$.
Proof. We know that $P \in L(X, X)$. To show that $P$ is bounded let $x = x_1 + x_2$, where $x_1 \in Y$ and $x_2 \in Y^\perp$. Then $\|Px\| = \|x_1\| \le \|x\|$. Hence, $P$ is bounded and $\|P\| \le 1$. If $x_2 = 0$, then $\|Px\| = \|x\|$ and so $\|P\| = 1$.
To prove (iii), let $x, y \in X$ be given by $x = x_1 + x_2$ and $y = y_1 + y_2$, respectively, where $x_1, y_1 \in Y$ and $x_2, y_2 \in Y^\perp$. Then $(x, Py) = (x_1 + x_2, y_1) = (x_1, y_1)$ and $(Px, y) = (x_1, y_1 + y_2) = (x_1, y_1)$. Thus, $(x, Py) = (Px, y)$ for all $x, y \in X$. This implies that $P = P^*$. ∎
From the above theorem it follows that an orthogonal projection is a hermitian operator.
7.5.13. Theorem. Let $Y$ be a closed linear subspace of $X$, and let $P$ be the orthogonal projection onto $Y$. If
$$Y_1 = \{x \in X: Px = x\}$$
and if $Y_2$ is the range of $P$, then $Y = Y_1 = Y_2$.
Proof. Since $Y_2 = Y$, since $Y \subset Y_1$, and since $Y_1 \subset Y_2$, it follows that $Y = Y_1 = Y_2$. ∎
7.5.14. Theorem. Let $P \in L(X, X)$. If $P$ is idempotent and hermitian, then
$$Y = \{x \in X: Px = x\}$$
is a closed linear subspace of $X$ and $P$ is the orthogonal projection onto $Y$.
Proof. Since $P$ is a linear operator we have
$$P(\alpha x + \beta y) = \alpha Px + \beta Py.$$
If $x, y \in Y$, then $Px = x$ and $Py = y$, and it follows that
$$P(\alpha x + \beta y) = \alpha x + \beta y.$$
Therefore, $(\alpha x + \beta y) \in Y$ and $Y$ is a linear subspace of $X$. We must show that $Y$ is a closed linear subspace. First, however, we show that $P$ is bounded and therefore continuous. Since
$$\|Pz\|^2 = (Pz, Pz) = (P^*Pz, z) = (P^2z, z) = (Pz, z) \le \|Pz\|\,\|z\|,$$
we have $\|Pz\| \le \|z\|$ for all $z \in X$, and so $P$ is bounded. Now let $\{x_n\}$ be a sequence in $Y$ such that $x_n \to x_0$. Since $P$ is continuous and $Px_n = x_n$, it follows that $Px_0 = \lim Px_n = \lim x_n = x_0$, and hence $x_0 \in Y$.
Finally, we must show that $P$ is an orthogonal projection. Let $x \in Y$, and let $y \in Y^\perp$. Then $(Py, x) = (y, Px) = (y, x) = 0$, since $x \perp y$. Therefore, $Py \perp x$ and $Py \in Y^\perp$. But $P(Py) = Py$, since $P^2 = P$, and thus $Py \in Y$. Therefore, it follows that $Py = 0$, because $Py \in Y$ and $Py \in Y^\perp$. Now let $z = x + y \in X$, where $x \in Y$ and $y \in Y^\perp$. Then $Pz = Px + Py = x + 0 = x$. Hence, $P$ is an orthogonal projection onto $Y$. ∎
The next result is a direct consequence of Theorem 7.5.14.
7.5.15. Corollary. Let $Y$ be a closed linear subspace of $X$, and let $P$ be the orthogonal projection onto $Y$. Then $P(Y^\perp) = \{0\}$.
7.5.16. Exercise. Prove Corollary 7.5.15.
The next result yields the representation of an orthogonal projection onto a finite-dimensional subspace of $X$.
7.5.17. Theorem. Let $\{x_1, \ldots, x_n\}$ be a finite orthonormal set in $X$, and let $Y$ be the linear subspace of $X$ generated by $\{x_1, \ldots, x_n\}$. Then the orthogonal projection of $X$ onto $Y$ is given by
$$Px = \sum_{i=1}^{n} (x, x_i)x_i \quad \text{for all } x \in X.$$
Proof. We first note that $Y$ is a closed linear subspace of $X$ by Theorem 6.6.6. We now show that $P$ is a projection by proving that $P^2 = P$. For any $j = 1, \ldots, n$ we have
$$Px_j = \sum_{i=1}^{n} (x_j, x_i)x_i = x_j. \tag{7.5.18}$$
Hence, for any $x \in X$ we have
$$P(Px) = \sum_{i=1}^{n} (x, x_i)Px_i = \sum_{i=1}^{n} (x, x_i)x_i = Px.$$
Next, we show that $\mathfrak{R}(P) = Y$. It is clear that $\mathfrak{R}(P) \subset Y$. To show that $Y \subset \mathfrak{R}(P)$, let $y \in Y$. Then
$$y = \eta_1 x_1 + \cdots + \eta_n x_n$$
for some $\{\eta_1, \ldots, \eta_n\}$. It follows from Eq. (7.5.18) that $Py = y$ and so $y \in \mathfrak{R}(P)$.
Finally, to show that $P$ is an orthogonal projection, we must show that $\mathfrak{R}(P) \perp \mathfrak{N}(P)$. To do so, let $x \in \mathfrak{N}(P)$ and let $y \in \mathfrak{R}(P)$. Then
$$(x, y) = (x, Py) = \left(x, \sum_{i=1}^{n} (y, x_i)x_i\right) = \sum_{i=1}^{n} \overline{(y, x_i)}\,(x, x_i) = \left(\sum_{i=1}^{n} (x, x_i)x_i,\ y\right) = (Px, y) = (0, y) = 0.$$
This completes the proof. ∎
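The formula of Theorem 7.5.17 translates directly into code. The following sketch is illustrative only and makes a simplifying assumption not present in the theorem: vectors are real and stored as lists of floats, so the inner product is the ordinary dot product (no complex conjugation). The names `inner` and `project` are hypothetical.

```python
def inner(x, y):
    """Real inner product (x, y); the complex case would conjugate y."""
    return sum(a * b for a, b in zip(x, y))


def project(x, basis):
    """Compute Px = sum_i (x, x_i) x_i for an orthonormal list `basis`."""
    px = [0.0] * len(x)
    for e in basis:
        coeff = inner(x, e)
        px = [p + coeff * ei for p, ei in zip(px, e)]
    return px


basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # orthonormal set in R^3
x = [1.0, 2.0, 3.0]
px = project(x, basis)
residual = [a - b for a, b in zip(x, px)]
# Applying the projection twice changes nothing, and x - Px is
# orthogonal to Px, as in the proof above.
```

The caller is responsible for supplying an orthonormal set; the function does not check this.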
Referring to Definition 3.7.12 we recall that if $Y$ and $Z$ are linear subspaces of (a linear space) $X$ such that $X = Y \oplus Z$, and if $T \in L(X, X)$ is such that both $Y$ and $Z$ are invariant under $T$, then $T$ is said to be reduced by $Y$ and $Z$. When $X$ is a Hilbert space, we make the following definition.
7.5.19. Definition. Let $Y$ be a closed linear subspace of $X$, and let $T \in L(X, X)$. Then $Y$ is said to reduce $T$ if $Y$ and $Y^\perp$ are invariant under $T$.
Note that in view of Theorem 6.12.16, Definitions 3.7.12 and 7.5.19 are consistent. The proof of the next theorem is straightforward.
7.5.20. Theorem. Let $Y$ be a closed linear subspace of $X$, and let $T \in B(X, X)$. Then
(i) $Y$ is invariant under $T$ if and only if $Y^\perp$ is invariant under $T^*$; and
(ii) $Y$ reduces $T$ if and only if $Y$ is invariant under $T$ and $T^*$.
7.5.21. Exercise. Prove Theorem 7.5.20.
7.5.22. Theorem. Let $Y$ be a closed linear subspace of $X$, let $P$ be the orthogonal projection onto $Y$, let $T \in B(X, X)$, and let $I$ denote the identity operator on $X$. Then
(i) $Y$ is invariant under $T$ if and only if $TP = PTP$;
(ii) $Y$ reduces $T$ if and only if $TP = PT$; and
(iii) $(I - P)$ is the orthogonal projection onto $Y^\perp$.
Proof. To prove (i), assume that $TP = PTP$. Then for any $x \in Y$ we have $Tx = T(Px) = P(TPx) \in Y$, since $P$ applied to any vector of $X$ is in $Y$. Conversely, if $Y$ is invariant under $T$, then for any vector $x \in X$ we have $T(Px) \in Y$, because $Px \in Y$. Thus, $P(TPx) = TPx$ for every $x \in X$.
To prove (ii), assume that $PT = TP$. Then $PTP = P^2T = PT = TP$. Therefore, $PTP = TP$, and it follows from (i) that $Y$ is invariant under $T$. To prove that $Y$ reduces $T$ we must show that $Y$ is invariant under $T^*$. Since $P$ is hermitian we have $T^*P = (PT)^* = (TP)^* = P^*T^* = PT^*$; i.e., $T^*P = PT^*$. But above we showed that $PTP = TP$. Applying this to $T^*$ we obtain $T^*P = PT^*P$. In view of (i), $Y$ is now invariant under $T^*$. Therefore, the closed linear subspace $Y$ reduces the linear operator $T$. Conversely, assume that $Y$ reduces $T$. By part (i), $TP = PTP$ and $T^*P = PT^*P$. Thus, $PT = (T^*P)^* = (PT^*P)^* = PTP = TP$; i.e., $TP = PT$.
To prove (iii) we first show that $(I - P)$ is hermitian. We note that $(I - P)^* = I^* - P^* = I - P$. Next, we show that $(I - P)$ is idempotent. We observe that $(I - P)^2 = (I - 2P + P^2) = (I - 2P + P) = (I - P)$. Finally, we note that $(I - P)x = x$ if and only if $Px = 0$, which implies that $x \in Y^\perp$. Thus,
$$Y^\perp = \{x \in X: (I - P)x = x\}.$$
It follows from Theorem 7.5.14 that $(I - P)$ is a projection onto $Y^\perp$. ∎
The next result follows immediately from part (iii) of the preceding theorem.
7.5.23. Theorem. Let $Y$ be a closed linear subspace of $X$, and let $P$ be the orthogonal projection on $Y$. If $\|Px\| = \|x\|$, then $Px = x$, and consequently $x \in Y$.
7.5.24. Exercise. Prove Theorem 7.5.23.
We leave the proof of the following result as an exercise.
7.5.25. Theorem. Let $Y$ and $Z$ be closed linear subspaces of $X$, and let $P$ and $Q$ be the orthogonal projections on $Y$ and $Z$, respectively. Let $0$ denote the zero transformation in $B(X, X)$. The following are equivalent:
(i) $Y \perp Z$;
(ii) $PQ = 0$;
(iii) $QP = 0$;
(iv) $P(Z) = \{0\}$; and
(v) $Q(Y) = \{0\}$.
7.5.26. Exercise. Prove Theorem 7.5.25.
For the product of two orthogonal projections we have the following result.
7.5.27. Theorem. Let $Y_1$ and $Y_2$ be closed linear subspaces of $X$, and let $P_1$ and $P_2$ be the orthogonal projections onto $Y_1$ and $Y_2$, respectively. The product transformation $P_1P_2$ is an orthogonal projection if and only if $P_1$ commutes with $P_2$. In this case the range of $P_1P_2$ is $Y_1 \cap Y_2$.
Proof. Assume that $P_1P_2 = P_2P_1$. Then $(P_1P_2)^* = P_2^*P_1^* = P_2P_1 = P_1P_2$; i.e., if $P_1P_2 = P_2P_1$ then $(P_1P_2)^* = (P_1P_2)$. Also, $(P_1P_2)^2 = P_1P_2P_1P_2 = P_1P_1P_2P_2 = P_1P_2$; i.e., if $P_1P_2 = P_2P_1$, then $P_1P_2$ is idempotent. Therefore, $P_1P_2$ is an orthogonal projection.
Conversely, assume that $P_1P_2$ is an orthogonal projection. Then $(P_1P_2)^* = P_2^*P_1^* = P_2P_1$ and also $(P_1P_2)^* = P_1P_2$. Hence, $P_1P_2 = P_2P_1$.
Finally, we must show that the range of $P_1P_2$ is equal to $Y_1 \cap Y_2$. Assume that $x \in \mathfrak{R}(P_1P_2)$. Then $P_1P_2x = x$, because $P_1P_2$ is an orthogonal projection. Also, $P_1P_2x = P_1(P_2x) \in Y_1$, because any vector operated on by $P_1$ is in $Y_1$. Similarly, $P_2P_1x = P_2(P_1x) \in Y_2$. Now, by hypothesis, $P_1P_2 = P_2P_1$, and therefore $P_1P_2x = P_2P_1x = x \in Y_1 \cap Y_2$. Thus, whenever $x \in \mathfrak{R}(P_1P_2)$, then $x \in Y_1 \cap Y_2$. This implies that $\mathfrak{R}(P_1P_2) \subset Y_1 \cap Y_2$. To show that $\mathfrak{R}(P_1P_2) \supset Y_1 \cap Y_2$, assume that $x \in Y_1 \cap Y_2$. Then $P_1P_2x = P_1(P_2x) = P_1x = x \in \mathfrak{R}(P_1P_2)$. Thus, $Y_1 \cap Y_2 \subset \mathfrak{R}(P_1P_2)$. Therefore, $\mathfrak{R}(P_1P_2) = Y_1 \cap Y_2$. ∎
7.5.28. Theorem. Let $Y$ and $Z$ be closed linear subspaces of $X$, and let $P$ and $Q$ be the orthogonal projections onto $Y$ and $Z$, respectively. The following are equivalent:
(i) $P \le Q$;
(ii) $\|Px\| \le \|Qx\|$ for all $x \in X$;
(iii) $Y \subset Z$;
(iv) $QP = P$; and
(v) $PQ = P$.
Proof. Assume that $P \le Q$. Since $P$ and $Q$ are orthogonal projections, they are hermitian. For a hermitian operator, $P \ge 0$ means $(Px, x) \ge 0$ for all $x \in X$. If $P \le Q$, then $(Px, x) \le (Qx, x)$ for all $x \in X$ or $(P^2x, x) \le (Q^2x, x)$ or $(Px, Px) \le (Qx, Qx)$ or $\|Px\|^2 \le \|Qx\|^2$, and hence $\|Px\| \le \|Qx\|$ for all $x \in X$.
Next, assume that $\|Px\| \le \|Qx\|$ for all $x \in X$. If $x \in Y$, then $Px = x$ and
$$(x, x) = (Px, Px) = \|Px\|^2 \le \|Qx\|^2 \le \|Q\|^2\|x\|^2 = \|x\|^2 = (x, x),$$
and therefore $\|Qx\| = \|x\|$. From Theorem 7.5.23 it now follows that $Qx = x$, and hence $x \in Z$. Thus, whenever $x \in Y$ then $x \in Z$ and $Z \supset Y$.
Now assume that $Z \supset Y$ and let $y = Px$, where $x$ is any vector in $X$. Then $QPx = Qy = y = Px$ for all $x \in X$ and $QP = P$.
Suppose now that $QP = P$. Then $(QP)^* = P^*$, or $P^*Q^* = PQ = P^* = P$; i.e., $PQ = P$.
Finally, assume that $PQ = P$. For any $x \in X$ we have $(Px, x) = \|Px\|^2 = \|PQx\|^2 \le \|P\|^2\|Qx\|^2 = \|Qx\|^2 = (Qx, Qx) = (Q^2x, x) = (Qx, x)$; i.e., $(Px, x) \le (Qx, x)$, from which we have $P \le Q$. ∎
We leave the proof of the next result as an exercise.
7.5.29. Theorem. Let $Y_1$ and $Y_2$ be closed linear subspaces of $X$, and let $P_1$ and $P_2$ be the orthogonal projections onto $Y_1$ and $Y_2$, respectively. The difference transformation $P = P_1 - P_2$ is an orthogonal projection if and only if $P_2 \le P_1$. The range of $P$ is $Y_1 \cap Y_2^\perp$.
7.5.30. Exercise. Prove Theorem 7.5.29.
We close this section by considering some specific cases.

7.5.31. Example. Let $R$ denote the transformation from $E^2$ into $E^2$ given in Example 4.10.48. That transformation is represented by the matrix
$$R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
with respect to an orthonormal basis $\{e_1, e_2\}$. By direct computation we obtain
$$R_\theta^* = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}.$$
It readily follows that $R^*R = RR^* = I$. Therefore, $R$ is a linear transformation which is isometric, unitary, and normal. ∎
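The identity $R^*R = RR^* = I$ of Example 7.5.31 can be checked numerically. The following Python sketch builds the matrix of the rotation for one sample angle and multiplies it by its transpose (the adjoint of a real matrix). The helper names `rotation`, `transpose`, `matmul`, and `is_identity`, and the angle 0.7, are choices made here for illustration only.

```python
import math

def rotation(theta):
    """Matrix of the rotation with respect to an orthonormal basis {e1, e2}."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def transpose(a):
    """Adjoint of a real matrix, i.e., its transpose."""
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def is_identity(a, tol=1e-12):
    """True if a equals the identity matrix up to the tolerance tol."""
    n = len(a)
    return all(abs(a[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))

theta = 0.7  # arbitrary sample angle (an assumption of this sketch)
R = rotation(theta)
R_star = transpose(R)
print(is_identity(matmul(R_star, R)), is_identity(matmul(R, R_star)))
```

A single angle is of course only a spot check; the general statement follows from $\cos^2\theta + \sin^2\theta = 1$.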
7.5.32. Exercise. Let $X = L_2[0, \infty)$ and define the truncation operator $P_T$ by $y = P_Tx$, where
$$y(t) = \begin{cases} x(t) & \text{for all } 0 \le t \le T, \\ 0 & \text{for all } t > T. \end{cases}$$
Show that $P_T$ is an orthogonal projection with range
$$\mathfrak{R}(P_T) = \{x \in X: x(t) = 0 \text{ for } t > T\},$$
and null space
$$\mathfrak{N}(P_T) = \{x \in X: x(t) = 0 \text{ for all } t \le T\}.$$

Additional examples of different types of operators are considered in Section 7.10.

7.6. THE SPECTRUM OF AN OPERATOR
In Chapter 4 we introduced and discussed eigenvalues and eigenvectors of linear transformations defined on finite-dimensional vector spaces. In the present section we continue this discussion in the setting of infinite-dimensional spaces. Unless otherwise stated, $X$ will denote a complex Banach space and $I$ will denote the identity operator on $X$. However, in our first definition, $X$ may be an arbitrary vector space over a field $F$.

7.6.1. Definition. Let $T \in L(X, X)$. A scalar $\lambda \in F$ is called an eigenvalue of $T$ if there exists an $x \in X$ such that $x \neq 0$ and such that $Tx = \lambda x$. Any vector $x \neq 0$ satisfying the equation $Tx = \lambda x$ is called an eigenvector of $T$ corresponding to the eigenvalue $\lambda$.

7.6.2. Definition. Let $X$ be a complex Banach space and let $T: X \to X$. The set of all $\lambda \in F = C$ such that
(i) $\mathfrak{R}(T - \lambda I)$ is dense in $X$;
(ii) $(T - \lambda I)^{-1}$ exists; and
(iii) $(T - \lambda I)^{-1}$ is continuous (i.e., bounded)
is called the resolvent set of $T$ and is denoted by $\rho(T)$. The complement of $\rho(T)$ is called the spectrum of $T$ and is denoted by $\sigma(T)$.

The preceding definitions require some comments. First, note that if $\lambda$ is an eigenvalue of $T$, there is an $x \neq 0$ such that $(T - \lambda I)x = 0$. From Theorem 3.4.32 this is true if and only if $(T - \lambda I)$ does not have an inverse. Hence, if $\lambda$ is an eigenvalue of $T$, then $\lambda \in \sigma(T)$. Note, however, that there
are other ways that a complex number $\lambda$ may fail to be in $\rho(T)$. These possibilities are enumerated in the following definition.

7.6.3. Definition. The set of all eigenvalues of $T$ is called the point spectrum of $T$. The set of all $\lambda$ such that $(T - \lambda I)^{-1}$ exists but $\mathfrak{R}(T - \lambda I)$ is not dense in $X$ is called the residual spectrum of $T$. The set of all $\lambda$ such that $(T - \lambda I)^{-1}$ exists and such that $\mathfrak{R}(T - \lambda I)$ is dense in $X$ but $(T - \lambda I)^{-1}$ is not continuous is called the continuous spectrum. We denote these sets by $P\sigma(T)$, $R\sigma(T)$, and $C\sigma(T)$, respectively.

Clearly, $\sigma(T) = P\sigma(T) \cup C\sigma(T) \cup R\sigma(T)$. Furthermore, when $X$ is finite dimensional, then $\sigma(T) = P\sigma(T)$. We summarize the preceding definition in the following table.

$$
\begin{array}{l|c|c|c}
 & (T - \lambda I)^{-1} \text{ exists and is continuous} & (T - \lambda I)^{-1} \text{ exists but is not continuous} & (T - \lambda I)^{-1} \text{ does not exist} \\ \hline
\overline{\mathfrak{R}(T - \lambda I)} = X & \lambda \in \rho(T) & \lambda \in C\sigma(T) & \lambda \in P\sigma(T) \\
\overline{\mathfrak{R}(T - \lambda I)} \neq X & \lambda \in R\sigma(T) & \lambda \in R\sigma(T) & \lambda \in P\sigma(T)
\end{array}
$$

7.6.4. Table A. Characterization of the resolvent set and the spectrum of an operator
7.6.5. Example. Let $X = l_2$ be the Hilbert space of Example 6.11.9, let $x = (\xi_1, \xi_2, \ldots) \in X$, and define $T \in B(X, X)$ by
$$Tx = \left(\xi_1, \tfrac{1}{2}\xi_2, \tfrac{1}{3}\xi_3, \ldots\right).$$
For each $\lambda \in C$ we want to determine (a) whether $(T - \lambda I)^{-1}$ exists; (b) if so, whether $(T - \lambda I)^{-1}$ is continuous; and (c) whether $\overline{\mathfrak{R}(T - \lambda I)} = X$.

First we consider the point spectrum of $T$. If $Tx = \lambda x$, then
$$\left(\frac{1}{k} - \lambda\right)\xi_k = 0, \quad k = 1, 2, \ldots.$$
This holds for non-trivial $x$ if and only if $\lambda = 1/k$ for some $k$. Hence,
$$P\sigma(T) = \left\{\frac{1}{k}: k = 1, 2, \ldots\right\}.$$

Next, assume that $\lambda \notin P\sigma(T)$, so that $(T - \lambda I)^{-1}$ exists, and let us investigate the continuity of $(T - \lambda I)^{-1}$. We see that if $y = (\eta_1, \eta_2, \ldots) \in \mathfrak{R}(T - \lambda I)$, then $(T - \lambda I)^{-1}y = x$ is given by
$$\xi_k = \frac{\eta_k}{\frac{1}{k} - \lambda} = \frac{k\eta_k}{1 - \lambda k}.$$
Now if $\lambda = 0$, then
$$\|(T - \lambda I)^{-1}y\|^2 = \sum_{k=1}^{\infty} k^2|\eta_k|^2,$$
and $(T - \lambda I)^{-1}$ is not bounded and hence not continuous. On the other hand, if $\lambda \neq 0$, then $(T - \lambda I)^{-1}$ is continuous since $|\xi_k| \le (1/\delta)|\eta_k|$ for all $k$, where $\delta = \inf_k |1/k - \lambda| > 0$ ... and $\rho(T)$ is the complement of $P\sigma(T) \cup C\sigma(T)$. ∎
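The contrast between $\lambda = 0$ and $\lambda \neq 0$ in Example 7.6.5 can be made concrete on finite truncations. Since $T$ acts coordinatewise, $(T - \lambda I)^{-1}$ multiplies the $k$th coordinate by $1/(1/k - \lambda)$, and its norm on the span of the first $n$ coordinates is the largest of these factors in modulus. The sketch below is a minimal illustration under that observation; the function names and the sample value $\lambda = -0.5$ are assumptions made here, not part of the text.

```python
def inverse_gain(lam, k):
    """Factor by which (T - lam*I)^{-1} scales the k-th coordinate.

    Requires lam != 1/k, i.e., lam is not in the point spectrum.
    """
    return 1.0 / (1.0 / k - lam)

def inverse_norm_truncated(lam, n):
    """Norm of (T - lam*I)^{-1} restricted to the first n coordinates."""
    return max(abs(inverse_gain(lam, k)) for k in range(1, n + 1))

for n in (10, 100, 1000):
    # lam = 0: the value grows like n, so the inverse is unbounded on l2.
    # lam = -0.5: the value stays below 2 for every n.
    print(n, inverse_norm_truncated(0.0, n), inverse_norm_truncated(-0.5, n))
```

The growth for $\lambda = 0$ reflects the factor $k^2$ in the sum above, while for $\lambda = -0.5$ the factors $1/(1/k + 0.5)$ never exceed 2.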
7.6.6. Exercise. Let $X = l_2$, the Hilbert space of Example 6.11.9, let $x = (\xi_1, \xi_2, \xi_3, \ldots)$, and define the right shift operator $T_r: X \to X$ and the left shift operator $T_l: X \to X$ by
$$y = T_rx = (0, \xi_1, \xi_2, \ldots)$$
and
$$y = T_lx = (\xi_2, \xi_3, \ldots),$$
respectively. Show that
$$\rho(T_r) = \rho(T_l) = \{\lambda \in C: |\lambda| > 1\},$$
$$C\sigma(T_r) = C\sigma(T_l) = \{\lambda \in C: |\lambda| = 1\},$$
$$R\sigma(T_r) = P\sigma(T_l) = \{\lambda \in C: |\lambda| < 1\},$$
$$P\sigma(T_r) = R\sigma(T_l) = \varnothing.$$
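Before attempting Exercise 7.6.6, it may help to experiment with the two shifts on sequences having finitely many non-zero terms, represented here as Python lists. The sketch below is only an illustration under that representation: it checks that $T_lT_r = I$, that $T_r$ preserves the norm, and that the truncation of $(1, \lambda, \lambda^2, \ldots)$ nearly satisfies $T_lx = \lambda x$ when $|\lambda| < 1$. The function names and sample values are assumptions of this sketch.

```python
import math

def right_shift(x):
    """T_r x = (0, xi_1, xi_2, ...) for a finitely supported sequence x."""
    return [0.0] + list(x)

def left_shift(x):
    """T_l x = (xi_2, xi_3, ...) for a finitely supported sequence x."""
    return list(x[1:])

def norm(x):
    """l2 norm of a finite list of real or complex numbers."""
    return math.sqrt(sum(abs(c) ** 2 for c in x))

def left_eigen_residual(lam, n):
    """||T_l x - lam x|| for x = (1, lam, ..., lam^(n-1)) padded with zeros."""
    x = [lam ** k for k in range(n)]
    shifted = left_shift(x) + [0.0]  # pad so both lists have length n
    return norm([s - lam * c for s, c in zip(shifted, x)])

x = [3.0, -1.0, 2.0]
print(left_shift(right_shift(x)) == x)          # T_l T_r = I
print(abs(norm(right_shift(x)) - norm(x)))      # T_r preserves the norm
print(left_eigen_residual(0.5, 40))             # only the cut-off tail remains
```

The residual in the last line equals $|\lambda|^n$, the size of the first omitted term, which is consistent with $(1, \lambda, \lambda^2, \ldots)$ being an eigenvector of $T_l$ for $|\lambda| < 1$.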
We now examine some of the properties of the resolvent set and the spectrum.

7.6.7. Theorem. Let $T \in B(X, X)$. If $|\lambda| > \|T\|$, then $\lambda \in \rho(T)$ or, equivalently, if $\lambda \in \sigma(T)$, then $|\lambda| \le \|T\|$.

7.6.8. Exercise. Prove Theorem 7.6.7 (use Theorem 7.2.2).

7.6.9. Theorem. Let $T \in B(X, X)$. Then $\rho(T)$ is open and $\sigma(T)$ is closed.
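Proof. Since $\sigma(T)$ is the complement of $\rho(T)$, it is closed if and only if $\rho(T)$ is open. Let $\lambda_0 \in \rho(T)$. Then $(T - \lambda_0I)$ has a continuous inverse. For arbitrary $\lambda$ we now have
$$
\begin{aligned}
\|I - (T - \lambda_0I)^{-1}(T - \lambda I)\| &= \|(T - \lambda_0I)^{-1}(T - \lambda_0I) - (T - \lambda_0I)^{-1}(T - \lambda I)\| \\
&= \|(T - \lambda_0I)^{-1}[(T - \lambda_0I) - (T - \lambda I)]\| \\
&= \|(\lambda - \lambda_0)(T - \lambda_0I)^{-1}\| \\
&= |\lambda - \lambda_0|\,\|(T - \lambda_0I)^{-1}\|.
\end{aligned}
$$
Now for $|\lambda - \lambda_0|$ sufficiently small, we have
$$\|I - (T - \lambda_0I)^{-1}(T - \lambda I)\| = |\lambda - \lambda_0|\,\|(T - \lambda_0I)^{-1}\| < 1$$
...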
$\epsilon > 0$ there exists a non-zero vector $x \in X$ such that $\|Tx - \lambda x\| < \epsilon\|x\|$. We denote the approximate point spectrum by $\pi(T)$. If $\lambda \in \pi(T)$, then $\lambda$ is called an approximate eigenvalue of $T$.

Clearly, $P\sigma(T) \subset \pi(T)$. Other properties of $\pi(T)$ are as follows.

7.6.15. Theorem. Let $X$ be a Hilbert space, and let $T \in B(X, X)$. Then $\pi(T) \subset \sigma(T)$.

Proof. Assume that $\lambda \notin \sigma(T)$. Then $(T - \lambda I)$ has a continuous inverse, and for any $x \in X$ we have
$$\|x\| = \|(T - \lambda I)^{-1}(T - \lambda I)x\| \le \|(T - \lambda I)^{-1}\|\,\|(T - \lambda I)x\|.$$
Now let $\epsilon = 1/\|(T - \lambda I)^{-1}\|$. Then we have, from above, $\|Tx - \lambda x\| \ge \epsilon\|x\|$ for every $x \in X$, and $\lambda \notin \pi(T)$. Therefore, $\sigma(T) \supset \pi(T)$. ∎
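An approximate eigenvalue need not be an eigenvalue. For the operator $Tx = (\xi_1, \xi_2/2, \xi_3/3, \ldots)$ of Example 7.6.5, $\lambda = 0$ is not an eigenvalue, yet the unit coordinate vectors $e_k$ give $\|Te_k - 0 \cdot e_k\| = 1/k$, which can be made smaller than any $\epsilon > 0$. The Python sketch below illustrates this on a finite truncation; the names `apply_T`, `defect`, and `unit_vector` are introduced here for illustration only.

```python
import math

def apply_T(x):
    """(T x)_k = xi_k / k on a finite truncation of a sequence in l2."""
    return [xi / k for k, xi in enumerate(x, start=1)]

def norm(x):
    """l2 norm of a finite list."""
    return math.sqrt(sum(abs(c) ** 2 for c in x))

def defect(lam, x):
    """The ratio ||Tx - lam x|| / ||x|| for a non-zero vector x."""
    tx = apply_T(x)
    return norm([a - lam * b for a, b in zip(tx, x)]) / norm(x)

def unit_vector(k, n):
    """k-th coordinate vector e_k (1-based) in an n-term truncation."""
    return [1.0 if i == k else 0.0 for i in range(1, n + 1)]

for k in (1, 10, 100):
    print(k, defect(0.0, unit_vector(k, 100)))
```

The printed ratios decrease like $1/k$, so for every $\epsilon > 0$ some $e_k$ satisfies $\|Te_k\| < \epsilon\|e_k\|$, as the definition of an approximate eigenvalue requires.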
We leave the proof of the next result as an exercise.

7.6.16. Theorem. Let $X$ be a Hilbert space, and let $T \in B(X, X)$ be a normal operator. Then $\pi(T) = \sigma(T)$.

7.6.17. Exercise. Prove Theorem 7.6.16.
We can use the approximate point spectrum to establish some of the properties of the spectrum of hermitian operators.

7.6.18. Theorem. Let $X$ be a Hilbert space, and let $T \in B(X, X)$ be hermitian. Then
(i) $\sigma(T)$ is a subset of the real line;
(ii) $\|T\| = \sup\{|\lambda|: \lambda \in \sigma(T)\}$; and
(iii) $\sigma(T)$ is not empty and either $+\|T\|$ or $-\|T\|$ belongs to $\sigma(T)$.
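In finite dimensions the three assertions of Theorem 7.6.18 can be observed directly. The sketch below, offered only as an illustration, takes a real symmetric $2 \times 2$ matrix (a hermitian operator on a two-dimensional space), computes its eigenvalues in closed form, and compares the largest modulus with the operator norm obtained as the square root of the largest eigenvalue of $T^*T = T^2$. The sample entries and function names are assumptions of this sketch.

```python
import math

def sym_eigenvalues(a, b, c):
    """Eigenvalues of the real symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean - radius, mean + radius

def operator_norm(a, b, c):
    """Norm of T = [[a, b], [b, c]]: sqrt of the largest eigenvalue of T^2."""
    p = a * a + b * b      # (1, 1) entry of T^2
    q = b * (a + c)        # off-diagonal entry of T^2
    r = b * b + c * c      # (2, 2) entry of T^2
    return math.sqrt(max(sym_eigenvalues(p, q, r)))

a, b, c = 2.0, 1.0, -3.0   # arbitrary sample entries
eigs = sym_eigenvalues(a, b, c)
print(eigs)                                   # both eigenvalues are real
print(max(abs(l) for l in eigs), operator_norm(a, b, c))
```

For this sample the eigenvalue of largest modulus is negative, so it is $-\|T\|$ rather than $+\|T\|$ that belongs to the spectrum.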
Proof. To prove (i), note that if $T$ is hermitian it is normal and $\sigma(T) = \pi(T)$. Let $\lambda \in \pi(T)$, and assume that $\lambda$ is complex. Then for any $x \neq 0$ we have $0$ ...

... $\ge 0$ because $(Sx, x) = (T^*Tx, x) = (Tx, Tx) = \|Tx\|^2 \ge 0$. This last condition implies that $S$ has no negative eigenvalues. Specifically, if $\lambda$ is an eigenvalue of $S$, then there is an $x \neq 0$ in $X$ such that $Sx = \lambda x$. Now $0$ ... $\ge 0$. By Theorem 7.7.24, $S$ has an eigenvalue, $\mu$, where $\mu = \|S\| = \|T^*T\| = \|T\|^2$. Now let $N = \mathfrak{N}(S - \mu I) = \mathfrak{N}_\mu(S)$, and note that $N$ contains a non-zero vector. Since $T$ is normal, $TS = T(T^*T) = (T^*T)T = ST$. Similarly, we have $T^*S = ST^*$. By Theorem 7.6.24, $N$ is invariant under $T$ and under $T^*$. By Theorem 7.5.6 this means $T$ remains normal when its domain of definition is restricted to $N$. By Theorem 7.7.25, there is a $\lambda \in C$ and a vector $x \neq 0$ in $N$ such that $Tx = \lambda x$, and thus $T^*x = \bar{\lambda}x$. Now since $Sx = T^*Tx = T^*(\lambda x) = \lambda T^*x = \lambda\bar{\lambda}x = |\lambda|^2x$ for this $x \neq 0$, and since $Sx = \mu x$ for all $x \in N$, it follows that $|\lambda|^2 = \mu = \|S\| = \|T^*T\| = \|T\|^2$. Therefore, $|\lambda| = \|T\|$ and $\lambda$ is an eigenvalue of $T$. ∎
7.8. THE SPECTRAL THEOREM FOR COMPLETELY CONTINUOUS NORMAL OPERATORS

The main result of this section is referred to as the spectral theorem (for completely continuous operators). Some of the direct consequences of this theorem provide an insight into the geometric properties of normal operators. Results such as the spectral theorem play a central role in applications. In Section 7.10 we will apply this theorem to integral equations.

Throughout this section, $X$ is a complex Hilbert space. We require some preliminary results.
7.8.1. Theorem. Let $T \in B(X, X)$ be completely continuous and normal. For each $\epsilon > 0$, let $A_\epsilon$ be the annulus in the complex plane defined by
$$A_\epsilon = \{\lambda \in C: \epsilon < |\lambda| \le \|T\|\}.$$
Then the number of eigenvalues of $T$ contained in $A_\epsilon$ is finite.

Proof. To the contrary, let us assume that for some $\epsilon > 0$ the annulus $A_\epsilon$ contains an infinite number of eigenvalues. By the Bolzano-Weierstrass theorem, there is a point of accumulation $\lambda_0$ of the eigenvalues in the annulus $A_\epsilon$. Let $\{\lambda_n\}$ be a sequence of distinct eigenvalues such that $\lambda_n \to \lambda_0$ as $n \to \infty$, and let $Tx_n = \lambda_nx_n$, $\|x_n\| = 1$. Since $T$ is a completely continuous operator, there is a subsequence $\{x_{n_k}\}$ of $\{x_n\}$ for which the sequence $\{Tx_{n_k}\}$ converges to an element $u \in X$; i.e., $Tx_{n_k} \to u$ as $n_k \to \infty$. Thus, since $Tx_{n_k} = \lambda_{n_k}x_{n_k}$, we have $\lambda_{n_k}x_{n_k} \to u$. But $1/\lambda_{n_k} \to 1/\lambda_0$ because $\lambda_{n_k} \neq 0$ and $\lambda_0 \neq 0$. Therefore $x_{n_k} \to (1/\lambda_0)u$. But the $x_{n_k}$ are distinct eigenvectors corresponding to distinct eigenvalues. By part (iv) of Theorem 7.6.10, $\{x_{n_k}\}$ is an orthonormal sequence and $x_{n_k} \to (1/\lambda_0)u$. But $\|x_{n_k} - x_{n_j}\|^2 = 2$, and thus $\{x_{n_k}\}$ cannot be a Cauchy sequence. Yet, it is convergent by assumption; i.e., we have arrived at a contradiction. Therefore, our initial assumption is false and the theorem is proved. ∎
Our next result is a direct consequence of the preceding theorem.

7.8.2. Theorem. Let $T \in B(X, X)$ be completely continuous and normal. Then the number of eigenvalues of $T$ is at most denumerable. If the set of eigenvalues is denumerable, then we have a point of accumulation at zero and only at zero (in the complex plane). The non-zero eigenvalues can be ordered so that
$$|\lambda_1| \ge |\lambda_2| \ge \cdots.$$

7.8.3. Exercise. Prove Theorem 7.8.2.
The next result is known as the spectral theorem. Here we let $\lambda_0 = 0$, and we let $\{\lambda_1, \lambda_2, \ldots\}$ be the non-zero eigenvalues of a completely continuous operator $T \in B(X, X)$. Note that $\lambda_0$ may or may not be an eigenvalue of $T$. If $\lambda_0$ is an eigenvalue, then $\mathfrak{N}(T)$ need not be finite dimensional. However, by Theorem 7.7.20, $\mathfrak{N}(T - \lambda_iI)$ is finite dimensional for $i = 1, 2, \ldots$.

7.8.4. Theorem. Let $T \in B(X, X)$ be completely continuous and normal, let $\lambda_0 = 0$, and let $\{\lambda_1, \lambda_2, \ldots\}$ be the non-zero distinct eigenvalues of $T$ (this collection may be finite). Let $\mathfrak{N}_i = \mathfrak{N}(T - \lambda_iI)$ for $i = 0, 1, 2, \ldots$. Then the family of closed linear subspaces $\{\mathfrak{N}_i\}_{i=0}^{\infty}$ of $X$ is total.
Proof. The fact that each $\mathfrak{N}_i$ is a closed linear subspace of $X$ follows from Theorem 7.1.26. Now let $Y = \bigcup_n \mathfrak{N}_n$, and let $N = Y^{\perp}$. We wish to show that $N = \{0\}$. By Theorem 6.12.6, $N$ is a closed linear subspace of $X$. We will show first that $Y$ is invariant under $T^*$. Let $x \in Y$. Then $x \in \mathfrak{N}_n$ for some $n$ and $Tx = \lambda_nx$. Now $\lambda_n(T^*x) = T^*(\lambda_nx) = T^*Tx = T(T^*x)$; i.e., $T(T^*x) = \lambda_n(T^*x)$ and so $T^*x \in \mathfrak{N}_n$, which implies $T^*x \in Y$. Therefore, $Y$ is invariant under $T^*$. From Theorem 7.3.15 it follows that $Y^{\perp}$ is invariant under $T$. Hence, $N$ is an invariant closed linear subspace under $T$. It follows from Theorems 7.7.8 and 7.5.6 that if $T_1$ is the restriction of $T$ to $N$, then $T_1 \in B(N, N)$ and $T_1$ is completely continuous and normal. Now let us suppose that $N \neq \{0\}$. By Theorem 7.7.25 there is a non-zero $x \in N$ and a $\lambda \in C$ such that $T_1x = \lambda x$. But if this is so, $\lambda$ is an eigenvalue of $T$ and it follows that $x \in \mathfrak{N}_n$ for some $n$. Hence, $x \in N \cap Y$, which is impossible unless $x = 0$. This completes the proof. ∎

In proving an alternate form of the spectral theorem, we require the following result.
7.8.5. Theorem. Let $\{N_k\}$ be a sequence of orthogonal closed linear subspaces of $X$; i.e., $N_k \perp N_j$ for all $j \neq k$. Then the following statements are equivalent:
(i) $\{N_k\}$ is a total family;
(ii) $X$ is the smallest closed linear subspace which contains every $N_k$; and
(iii) for every $x \in X$ there is a unique sequence $\{x_k\}$ such that
(a) $x_k \in N_k$ for every $k$, and
(b) $\sum_{k=1}^{\infty} x_k = x$.

Proof. We first prove the equivalence of statements (i) and (ii). Let $Y = \bigcup_n N_n$. Then $Y \subset Y^{\perp\perp}$ by Theorem 6.12.8. Furthermore, $Y^{\perp\perp}$ is the smallest closed linear subspace which contains $Y$ by Theorem 6.12.8. Now suppose $\{N_k\}$ is a total family. Then $Y^{\perp} = \{0\}$. Hence, $Y^{\perp\perp} = X$ and so $X$ is the smallest closed linear subspace which contains every $N_k$. On the other hand, suppose $X$ is the smallest closed linear subspace which contains every $N_k$. Then $X = Y^{\perp\perp}$ and $Y^{\perp\perp\perp} = \{0\}$. But $Y^{\perp\perp\perp} = Y^{\perp}$. Thus, $Y^{\perp} = \{0\}$, and so $\{N_k\}$ is a total family.

We now prove the equivalence of statements (i) and (iii). Let $\{N_k\}$ be a total family, and let $x \in X$. For every $k = 1, 2, \ldots$, there is an $x_k \in N_k$ and a $y_k \in N_k^{\perp}$ such that $x = x_k + y_k$. If $x_k = 0$, then $(x, x_k) = 0$. If $x_k \neq 0$, then $(x, x_k/\|x_k\|) = (x_k + y_k, x_k/\|x_k\|) = \|x_k\|$. Thus, it follows from Bessel's inequality that
$$\sum_{k=1}^{\infty} \|x_k\|^2 \le \|x\|^2.$$
Hence, let ...
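$Y$. For $x = (\xi_1, \ldots, \xi_n) \in X$, let us write
$$f(x) = \begin{bmatrix} f_1(x) \\ \vdots \\ f_m(x) \end{bmatrix} = \begin{bmatrix} f_1(\xi_1, \ldots, \xi_n) \\ \vdots \\ f_m(\xi_1, \ldots, \xi_n) \end{bmatrix}.$$
For $x_0 \in X$, assume that the partial derivatives
$$\left.\frac{\partial f_i(x)}{\partial \xi_j}\right|_{x = x_0} \triangleq \frac{\partial f_i(x_0)}{\partial \xi_j}$$
exist and are continuous for $i = 1, \ldots, m$ and $j = 1, \ldots, n$. The Fréchet differential of $f$ at $x_0$ with increment $h = (h_1, \ldots, h_n) \in X$ is given by
$$\delta f(x_0, h) = \begin{bmatrix} \dfrac{\partial f_1(x_0)}{\partial \xi_1} & \cdots & \dfrac{\partial f_1(x_0)}{\partial \xi_n} \\ \vdots & & \vdots \\ \dfrac{\partial f_m(x_0)}{\partial \xi_1} & \cdots & \dfrac{\partial f_m(x_0)}{\partial \xi_n} \end{bmatrix} \begin{bmatrix} h_1 \\ \vdots \\ h_n \end{bmatrix}.$$
The Fréchet derivative of $f$ at $x_0$ is given by
$$f'(x_0) = \begin{bmatrix} \dfrac{\partial f_1(x_0)}{\partial \xi_1} & \cdots & \dfrac{\partial f_1(x_0)}{\partial \xi_n} \\ \vdots & & \vdots \\ \dfrac{\partial f_m(x_0)}{\partial \xi_1} & \cdots & \dfrac{\partial f_m(x_0)}{\partial \xi_n} \end{bmatrix},$$
which is also called the Jacobian matrix of $f$ at $x_0$. We sometimes write $f'(x) = \partial f(x)/\partial x$. ∎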
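The defining property of the Fréchet derivative, namely that $\|f(x_0 + h) - f(x_0) - f'(x_0)h\|/\|h\| \to 0$ as $\|h\| \to 0$, can be observed numerically for the Jacobian matrix. The sketch below uses a hypothetical map $f: R^2 \to R^2$ chosen here purely for illustration (it does not appear in the text), with its Jacobian worked out by hand.

```python
import math

def f(x):
    """Hypothetical map f: R^2 -> R^2, chosen only for illustration."""
    x1, x2 = x
    return [x1 * x1 * x2, math.sin(x1) + x2]

def jacobian(x):
    """Matrix of partial derivatives of f at x (the Frechet derivative)."""
    x1, x2 = x
    return [[2.0 * x1 * x2, x1 * x1],
            [math.cos(x1), 1.0]]

def differential(x0, h):
    """Frechet differential: the Jacobian at x0 applied to the increment h."""
    return [sum(row[j] * h[j] for j in range(len(h))) for row in jacobian(x0)]

def norm(v):
    """Euclidean norm of a finite list."""
    return math.sqrt(sum(c * c for c in v))

def remainder_ratio(x0, h):
    """||f(x0 + h) - f(x0) - f'(x0)h|| / ||h|| for a non-zero increment h."""
    shifted = f([a + b for a, b in zip(x0, h)])
    base = f(x0)
    lin = differential(x0, h)
    return norm([s - b - l for s, b, l in zip(shifted, base, lin)]) / norm(h)

x0 = [1.0, 2.0]
for t in (1e-1, 1e-2, 1e-3):
    print(t, remainder_ratio(x0, [t, -t]))
```

The printed ratios shrink roughly in proportion to $t$, as expected when the neglected terms are of second order in $h$.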
7.9.11. Example. Let $X = C[a, b]$, the family of real-valued continuous functions defined on $[a, b]$, and let $\{X; \|\cdot\|_\infty\}$ be the Banach space given in Example 6.1.9. Let $k(s, t)$ be a real-valued function defined and continuous on $[a, b] \times [a, b]$, and let $g(t, x)$ be a real-valued function which is defined and for which $\partial g(t, x)/\partial x$ is continuous for $t \in [a, b]$ and $x \in R$. Let $f: X \to X$ be defined by
$$f(x) = \int_a^b k(s, t)g(t, x(t))\,dt, \quad x \in X.$$
For fixed $x_0 \in X$, the Fréchet differential of $f$ at $x_0$ with increment $h \in X$ is given by
$$\delta f(x_0, h) = \int_a^b k(s, t)\frac{\partial g(t, x_0(t))}{\partial x}h(t)\,dt. \quad ∎$$
7.9.12. Exercise. Verify the assertions made in Examples 7.9.5 to 7.9.11.
We now establish some of the properties of Fréchet differentials.

7.9.13. Theorem. Let $f, g: X \to Y$ be Fréchet differentiable at $x_0 \in X$. Then
(i) $f$ is continuous at $x_0 \in X$; and
(ii) for all $\alpha, \beta \in F$, $\alpha f + \beta g$ is Fréchet differentiable at $x_0$ and $(\alpha f + \beta g)'(x_0) = \alpha f'(x_0) + \beta g'(x_0)$.

Proof. To prove (i), let $f$ be Fréchet differentiable at $x_0$, and let $F(x_0)$ be the Fréchet derivative of $f$ at $x_0$. Then
$$f(x_0 + h) - f(x_0) = f(x_0 + h) - f(x_0) - F(x_0)h + F(x_0)h,$$
and
$$\|f(x_0 + h) - f(x_0)\| \le \|f(x_0 + h) - f(x_0) - F(x_0)h\| + \|F(x_0)h\|.$$
Since $F(x_0)$ is bounded, there is an $M > 0$ such that $\|F(x_0)h\| \le M\|h\|$. Furthermore, for given $\epsilon > 0$ there is a $\delta > 0$ such that $\|f(x_0 + h) - f(x_0) - F(x_0)h\| \le \epsilon\|h\|$ provided that $\|h\| < \delta$. Hence, $\|f(x_0 + h) - f(x_0)\| \le (M + \epsilon)\|h\|$ whenever $\|h\| < \delta$. This implies that $f$ is continuous at $x_0$. The proof of part (ii) is straightforward and is left as an exercise. ∎

7.9.14. Exercise. Prove part (ii) of Theorem 7.9.13.

We now show that the chain rule encountered in calculus applies to Fréchet derivatives as well.

7.9.15. Theorem. Let $X$, $Y$, and $Z$ be normed linear spaces. Let $g: X \to Y$, $f: Y \to Z$, and let $\varphi: X \to Z$ be the composite function $\varphi = f \circ g$. Let $g$ be Fréchet differentiable on an open set $D \subset X$, and let $f$ be Fréchet differentiable on an open set $E \subset g(D)$. If $x \in D$ is such that $g(x) \in E$, then $\varphi$ is Fréchet differentiable at $x$ and $\varphi'(x) = f'(g(x))g'(x)$.
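A small numerical experiment illustrates the chain rule $\varphi'(x) = f'(g(x))g'(x)$ in the simplest non-scalar setting, with $X = R$, $Y = R^2$, and $Z = R$. The maps `g` and `f` below are hypothetical examples introduced only for this sketch; $f'(y)$ is then a row vector, $g'(x)$ a column vector, and their product a number that can be compared with a central difference quotient of $\varphi$.

```python
import math

def g(x):
    """Hypothetical inner map g: R -> R^2."""
    return [x * x, math.sin(x)]

def g_prime(x):
    """Derivative of g at x, viewed as a column vector."""
    return [2.0 * x, math.cos(x)]

def f(y):
    """Hypothetical outer map f: R^2 -> R."""
    return y[0] * y[1] + y[1]

def f_prime(y):
    """Derivative of f at y, viewed as a row vector (the gradient)."""
    return [y[1], y[0] + 1.0]

def phi(x):
    """Composite function phi = f o g."""
    return f(g(x))

def phi_prime_chain(x):
    """phi'(x) computed as the product f'(g(x)) g'(x)."""
    return sum(r * c for r, c in zip(f_prime(g(x)), g_prime(x)))

def phi_prime_numeric(x, step=1e-6):
    """Central difference approximation of phi'(x)."""
    return (phi(x + step) - phi(x - step)) / (2.0 * step)

x = 0.8
print(phi_prime_chain(x), phi_prime_numeric(x))
```

The two printed values agree to within the accuracy of the difference quotient.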
Proof. Let $y = g(x)$ and $d = g(x + h) - g(x)$, where $h \in X$ is such that $x + h \in D$. Then
$$\varphi(x + h) - \varphi(x) - f'(y)g'(x)h = f(y + d) - f(y) - f'(y)d + f'(y)\{g(x + h) - g(x) - g'(x)h\}.$$
Thus, given $\epsilon > 0$ there is a $\delta > 0$ such that $\|d\| < \delta$ ... $\|\varphi(x + h) - \varphi(x) - f'(y)g'(x)h\| \le \epsilon\|d\|$ ...
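... $d > 0$ such that $|\lambda| > d$ and $|\lambda - \lambda_k| > d$ for $k = 1, 2, \ldots$. We note from Theorem 7.8.7 that $P_iP_j = 0$ for $i \neq j$. Now for $N < \infty$, we have by the Pythagorean theorem,
$$
\begin{aligned}
\left\| -\frac{P_0y}{\lambda} + \sum_{k=1}^{N} \frac{P_ky}{\lambda_k - \lambda} \right\|^2
&= \frac{1}{|\lambda|^2}\|P_0y\|^2 + \sum_{k=1}^{N} \frac{1}{|\lambda_k - \lambda|^2}\|P_ky\|^2 \\
&\le \frac{1}{d^2}\|P_0y\|^2 + \frac{1}{d^2}\sum_{k=1}^{N}\|P_ky\|^2 \\
&= \frac{1}{d^2}\left[\|P_0y\|^2 + \sum_{k=1}^{N}\|P_ky\|^2\right] \\
&= \frac{1}{d^2}\left\|P_0y + \sum_{k=1}^{N} P_ky\right\|^2 \\
&\le \frac{1}{d^2}\left\|P_0y + \sum_{k=1}^{\infty} P_ky\right\|^2 = \frac{1}{d^2}\|y\|^2.
\end{aligned}
$$
This implies that $\sum_{k=1}^{\infty} \left|\frac{1}{\lambda_k - \lambda}\right|^2 \|P_ky\|^2$ is convergent, and so it follows from Theorem 6.13.3 that
$$\sum_{k=1}^{\infty} \frac{P_ky}{\lambda_k - \lambda}$$
is convergent to an element in $X$.

Let $j$ be a positive integer. By Theorem 7.5.12, $P_j$ is continuous, and so by Theorem 7.1.27,
$$P_j\sum_{k=1}^{\infty} \frac{P_ky}{\lambda_k - \lambda} = \sum_{k=1}^{\infty} \frac{P_jP_ky}{\lambda_k - \lambda} = \frac{P_jy}{\lambda_j - \lambda}.$$
Now let $x$ be given by Eq. (7.10.4) for arbitrary $y \in X$. We want to show that $Tx - \lambda x = y$. From Eq. (7.10.4) we have
$$P_0x = -\frac{1}{\lambda}P_0y$$
and
$$P_jx = \frac{1}{\lambda_j - \lambda}P_jy \quad \text{for } j = 1, 2, \ldots.$$
Thus, $P_0y = -\lambda P_0x$ and $P_jy = \lambda_jP_jx - \lambda P_jx$. Now from the spectral theorem (Theorem 7.8.7), we have
$$y = P_0y + \sum_{j=1}^{\infty} P_jy,$$
$$Tx = \sum_{j=1}^{\infty} \lambda_jP_jx,$$
and
$$\lambda x = \lambda P_0x + \sum_{j=1}^{\infty} \lambda P_jx.$$
Hence, $y = Tx - \lambda x$.

Finally, to show that $x$ given by Eq. (7.10.4) is unique, let $x$ and $z$ be such that $Tx - \lambda x = Tz - \lambda z = y$. Then it follows that $T(x - z) - \lambda(x - z) = y - y = 0$. Hence, $T(x - z) = \lambda(x - z)$. Since $\lambda$ is by assumption not an eigenvalue of $T$, we must have $x - z = 0$. This completes the proof. ∎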
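The relations $P_0x = -(1/\lambda)P_0y$ and $P_jx = P_jy/(\lambda_j - \lambda)$ used in the preceding proof can be tried out on a small matrix. The sketch below is an illustration only: it assumes the symmetric matrix $T$ with all entries equal to 1, whose eigenvalues are $\lambda_0 = 0$ and $\lambda_1 = 2$, writes down the two orthogonal projections by hand, assembles $x$ from them, and checks that $Tx - \lambda x = y$ for a $\lambda$ that is not an eigenvalue. All names and sample values are choices made here.

```python
def matvec(A, x):
    """Product of a nested-list matrix with a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Assumed example: T has eigenvalue 0 on span{(1, -1)} and 2 on span{(1, 1)}.
T = [[1.0, 1.0], [1.0, 1.0]]
P0 = [[0.5, -0.5], [-0.5, 0.5]]   # orthogonal projection onto the null space
P1 = [[0.5, 0.5], [0.5, 0.5]]     # orthogonal projection onto the eigenspace of 2
LAMBDA_1 = 2.0

def solve(lam, y):
    """x = -P0 y / lam + P1 y / (lambda_1 - lam); lam must not be 0 or 2."""
    p0y, p1y = matvec(P0, y), matvec(P1, y)
    return [-a / lam + b / (LAMBDA_1 - lam) for a, b in zip(p0y, p1y)]

lam = -1.0
y = [3.0, 1.0]
x = solve(lam, y)
residual = [t - lam * xi - yi for t, xi, yi in zip(matvec(T, x), x, y)]
print(x, residual)
```

The residual vanishes up to rounding, in agreement with the conclusion $y = Tx - \lambda x$ of the proof.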
In the next result we consider the case where $\lambda$ is a non-zero eigenvalue of $T$.

7.10.5. Theorem. Let $\{\lambda_n\}$ denote the non-zero distinct eigenvalues of $T$, and let $\lambda = \lambda_j$ for some positive integer $j$. Then there is a (non-unique) $x \in X$ satisfying Eq. (7.10.2) if and only if $P_jy = 0$, where $P_j$ is the orthogonal projection of $X$ onto $\mathfrak{N}_j = \{x: (T - \lambda_jI)x = 0\}$. If $P_jy = 0$, then a solution to Eq. (7.10.2) is given by
$$x = x_0 - \frac{P_0y}{\lambda} + \sum_{\substack{k=1 \\ k \neq j}}^{\infty} \frac{P_ky}{\lambda_k - \lambda}, \tag{7.10.6}$$
where $P_0$ is the orthogonal projection of $X$ onto $\mathfrak{N}(T)$ and $x_0$ is any element in $\mathfrak{N}_j$.

Proof. We first observe that $\mathfrak{N}_j$ reduces $T$ by part (iii) of Theorem 7.6.26. It therefore follows from part (ii) of Theorem 7.5.22 that $TP_j = P_jT$. Now suppose that $y$ is such that Eq. (7.10.2) is satisfied for some $x \in X$. Then it follows that $P_jy = P_j(Tx - \lambda_jx) = TP_jx - \lambda_jP_jx = \lambda_jP_jx - \lambda_jP_jx = 0$. In the preceding, we used the fact that $Tx = \lambda_jx$ for $x \in \mathfrak{N}_j$ and $P_jx \in \mathfrak{N}_j$ for all $x \in X$. Hence, $P_jy = 0$.

Conversely, suppose that $P_jy = 0$, and let $x$ be given by Eq. (7.10.6). The proof that $x$ satisfies Eq. (7.10.2) follows along the same lines as the proof of Theorem 7.10.3, and the details are left as an exercise. The non-uniqueness of the solution is apparent, since $(T - \lambda I)x_0 = 0$ for any $x_0 \in \mathfrak{N}_j$. ∎

7.10.7. Exercise. Complete the proof of Theorem 7.10.5.
B. An Example from Optimal Control

In this example we consider systems which can appropriately be described by the system of first-order ordinary differential equations
$$\dot{x}(t) = Ax(t) + Bu(t), \tag{7.10.8}$$
where $x(0) = x_0$ is given. Here $x(t) \in R^n$ and $u(t) \in R^m$ for every $t$ such that $0 \le t \le T$ for some $T > 0$, and $A$ is an $n \times n$ matrix, and $B$ is an $n \times m$ matrix. As we saw in part (vi) of Theorem 4.11.45, if each element of the vector $u(t)$ is a continuous function of $t$, then the unique solution to Eq. (7.10.8) at time $t$ is given by
$$x(t) = \Phi(t, 0)x(0) + \int_0^t \Phi(t, \tau)Bu(\tau)\,d\tau, \tag{7.10.9}$$
where $\Phi(t, \tau)$ is the state transition matrix for the system of equations given in Eq. (7.10.8).

Let us now define the class of vector valued functions $L_2^m[0, T]$ by
$$L_2^m[0, T] = \{u: u^T = (u_1, \ldots, u_m), \text{ where } u_i \in L_2[0, T],\ i = 1, \ldots, m\}.$$
If we define the inner product by
$$(u, v) = \int_0^T u^T(t)v(t)\,dt$$
for $u, v \in L_2^m[0, T]$, then it follows that $L_2^m[0, T]$ is a Hilbert space (see Example 6.11.11). Next, let us define the linear operator $L: L_2^m[0, T] \to L_2^n[0, T]$ by
$$[Lu](t) = \int_0^t \Phi(t, \tau)Bu(\tau)\,d\tau \tag{7.10.10}$$
for all $u \in L_2^m[0, T]$. Since the elements of $\Phi(t, \tau)$ are continuous functions on $[0, T] \times [0, T]$, it follows that $L$ is completely continuous.

Now recall from Exercise 5.10.59 that Eq. (7.10.9) is the unique solution to Eq. (7.10.8) when the elements of the vector $u(t)$ are continuous functions of $t$. It can be shown that the solution of Eq. (7.10.8) exists in an extended sense if we permit $u \in L_2^m[0, T]$. Allowing for this generalization, we can now consider the following optimal control problem. Let $\gamma \in R$ be such that $\gamma > 0$, and let $f$ be the real-valued functional defined on $L_2^m[0, T]$ given by
$$f(u) = \int_0^T x^T(t)x(t)\,dt + \gamma\int_0^T u^T(t)u(t)\,dt, \tag{7.10.11}$$
where $x(t)$ is given by Eq. (7.10.9) for $u \in L_2^m[0, T]$. The linear quadratic cost control problem is to find $u \in L_2^m[0, T]$ such that $f(u)$ in Eq. (7.10.11) is minimum, where $x(t)$ is the solution to the set of ordinary differential equations (7.10.8). This problem can be cast into a minimization problem in a Hilbert space as follows.
Let
$$v(t) = -\Phi(t, 0)x_0 \quad \text{for } 0 \le t \le T$$
... $\gamma > 0$, then there exists a unique $u_0 \in X$ such that $f(u_0) \le f(u)$ for all $u \in X$. Furthermore, $u_0$ is the solution to the equation
$$L^*Lu_0 + \gamma u_0 = L^*v. \tag{7.10.14}$$

Proof. Let us first examine Eq. (7.10.14). Since $L$ is a completely continuous operator, by Corollary 7.7.12, so is $L^*L$. Furthermore, the eigenvalues of $L^*L$ cannot be negative, and so $-\gamma$ cannot be an eigenvalue of $L^*L$. Making the association $T = L^*L$, $\lambda = -\gamma$, and $y = L^*v$ in Eq. (7.10.2), it is clear that $T$ is normal and it follows from Theorem 7.10.3 that Eq. (7.10.14) has a unique solution. In fact, this solution is given by Eq. (7.10.4), using the above definitions of symbols.

Next, let us assume that $u_0$ is the unique element in $X$ satisfying Eq. (7.10.14), and let $h \in X$ be arbitrary. It follows from Eq. (7.10.13) that
$$
\begin{aligned}
f(u_0 + h) &= (Lu_0 + Lh - v, Lu_0 + Lh - v) + \gamma(u_0 + h, u_0 + h) \\
&= (Lu_0 - v, Lu_0 - v) + 2(Lh, Lu_0 - v) + (Lh, Lh) + \gamma(u_0, u_0) + 2\gamma(u_0, h) + \gamma(h, h) \\
&= \|Lu_0 - v\|^2 + \gamma\|u_0\|^2 + 2(h, L^*Lu_0 + \gamma u_0 - L^*v) + \|Lh\|^2 + \gamma\|h\|^2 \\
&= \|Lu_0 - v\|^2 + \gamma\|u_0\|^2 + \|Lh\|^2 + \gamma\|h\|^2.
\end{aligned}
$$
Therefore, $f(u_0 + h)$ is minimum if and only if $h = 0$. ∎
The solution to Eq. (7.10.14) can be obtained from Eq. (7.10.4); however, a more convenient method is available for finding the solution when $L$ is given by Eq. (7.10.10). This is summarized in the following result.

7.10.15. Theorem. Let $\gamma > 0$, and let $f(u)$ be defined by Eq. (7.10.11), where $x(t)$ is the solution to Eq. (7.10.8). If
for all $t$ such that $0 \le$ ... differential equation ... $P(t)$ with $P(T) =$ ...

Proof. ... $c_1 > 0$. The reader can readily verify that the functional given in Eq. (7.10.13) is a special case of $f$, given in Eq. (7.10.18), where we make the association $M = L^*L + \gamma I$ (provided $\gamma > 0$), $w = L^*v$, and $\beta = (v, v)$.

Under the above conditions, the equation
$$Mx = w \tag{7.10.20}$$
has a unique solution, say $x_0$, and $x_0$ minimizes $f(x)$. Iterative methods are based on beginning with an initial guess to the solution of Eq. (7.10.20) and then successively attempting to improve the estimate according to a recursive relationship of the form
$$x_{n+1} = x_n + \alpha_nr_n, \tag{7.10.21}$$
where $\alpha_n \in R$ and $r_n \in X$. Different methods of selecting $\alpha_n$ and $r_n$ give rise to various algorithms of minimizing $f(x)$ given in Eq. (7.10.18) or, equivalently, finding the solution to Eq. (7.10.20). In this part we shall in particular consider the method of steepest descent. In doing so we let
$$r_n = w - Mx_n, \quad n = 1, 2, \ldots. \tag{7.10.22}$$
The term $r_n$ defined by Eq. (7.10.22) is called the residual of the approximation $x_n$. If, in particular, $x_n$ satisfies Eq. (7.10.20), we see that the residual is zero. For $f(x)$ given in Eq. (7.10.18), we see that
$$f'(x_n) = -2r_n,$$
where $f'(x_n)$ denotes the gradient of $f(x_n)$. That is, the residual, $r_n$, is "pointing" into the direction of the negative of the gradient, or in the direction of steepest descent. Equation (7.10.21) indicates that the correction term $\alpha_nr_n$ is to be a scalar multiple of the gradient, and thus the steepest descent method constitutes an example of one of the so-called "gradient methods." With $r_n$ given by Eq. (7.10.22), $\alpha_n$ is chosen so that $f(x_n + \alpha_nr_n)$ is minimum. Substituting $x_n + \alpha_nr_n$ into Eq. (7.10.18), it is readily shown that
$$\alpha_n = \frac{(r_n, r_n)}{(r_n, Mr_n)}$$
is the minimizing value. This method is illustrated pictorially in Figure B.

7.10.23. Figure B. Illustration of the method of steepest descent.
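The iteration described above, with $r_n = w - Mx_n$, $\alpha_n = (r_n, r_n)/(r_n, Mr_n)$, and $x_{n+1} = x_n + \alpha_nr_n$, translates directly into a few lines of code when $X = R^2$ and $M$ is a symmetric positive definite matrix. The following Python sketch is a minimal illustration under those assumptions; the sample matrix, right-hand side, starting point, and number of steps are choices made here and are not taken from the text.

```python
def matvec(M, x):
    """Product of a nested-list matrix with a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def dot(a, b):
    """Real inner product of two finite lists."""
    return sum(ai * bi for ai, bi in zip(a, b))

def steepest_descent(M, w, x, steps):
    """Iterate x_{n+1} = x_n + alpha_n r_n with r_n = w - M x_n and
    alpha_n = (r_n, r_n) / (r_n, M r_n); M is assumed symmetric positive definite."""
    for _ in range(steps):
        r = [wi - mi for wi, mi in zip(w, matvec(M, x))]   # residual r_n
        rr = dot(r, r)
        if rr == 0.0:          # x already solves M x = w
            break
        alpha = rr / dot(r, matvec(M, r))                  # minimizing step length
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
    return x

M = [[4.0, 1.0], [1.0, 3.0]]   # sample symmetric positive definite matrix
w = [1.0, 2.0]
x = steepest_descent(M, w, [0.0, 0.0], 50)
print(x)                        # approximates the solution of M x = w
print([wi - mi for wi, mi in zip(w, matvec(M, x))])   # final residual
```

For this sample the exact solution of $Mx = w$ is $(1/11, 7/11)$, and the iterates approach it rapidly because the two eigenvalues of $M$ are of comparable size.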
In the following result we show that under appropriate conditions the sequence $\{x_n\}$ generated in the heuristic discussion above converges to the unique minimizing element $x_0$ satisfying Eq. (7.10.20).

7.10.24. Theorem. Let $M \in B(X, X)$ be a self-adjoint operator such that for some pair of positive real numbers $\gamma$ and $\mu$ we have $\gamma\|x\|^2 \le (x, Mx) \le \mu\|x\|^2$ for all $x \in X$. Let $x_1 \in X$ be arbitrary, let $w \in X$, and let $r_n = w - Mx_n$, where $x_{n+1} = x_n + \alpha_nr_n$ for $n = 1, 2, \ldots$, and $\alpha_n = (r_n, r_n)/(r_n, Mr_n)$. Then the sequence $\{x_n\}$ converges to $x_0$, where $x_0$ is the unique solution to Eq. (7.10.20).

Proof. In view of the Schwarz inequality we have $(x, Mx) \le \|Mx\|\,\|x\|$. This implies that $\gamma\|x\| \le \|Mx\|$ for all $x \in X$, and so $M$ is a bijective mapping by Theorem 7.4.21, with $M^{-1} \in B(X, X)$ and $\|M^{-1}\| \le 1/\gamma$. By Theorem 7.4.10, $M^{-1}$ is also self-adjoint. Let $x_0$ be the unique solution to Eq. (7.10.20), and define $F: X \to R$ by
$$F(x) = (x - x_0, M(x - x_0)) \quad \text{for } x \in X.$$
We see that $F$ is minimized uniquely by $x = x_0$, and furthermore $F(x_0) = 0$. We now show that $\lim_n F(x_n) = 0$. If for some $n$, $F(x_n) = 0$, the process terminates and we are done. So assume in the following that $F(x_n) \neq 0$. Note also that since $M$ is positive, we have $F(x) \ge 0$ for all $x \in X$. We begin with the fact that
$$F(x_{n+1}) = F(x_n) - 2\alpha_n(r_n, My_n) + \alpha_n^2(r_n, Mr_n),$$
where we have let $y_n = x_0 - x_n$. Noting that $r_n = My_n$ and $F(x_n) = (y_n, My_n) = (M^{-1}r_n, r_n)$, we have $F(x_{n+1})$ ...

Hence, $F(x_{n+1})$ ... so $x_n \to$ ...
7.11.