Mariano Giaquinta Giuseppe Modica
Mathematical Analysis Linear and Metric Structures and Continuity
Birkhauser Boston • Basel • Berlin
Mariano Giaquinta, Scuola Normale Superiore, Dipartimento di Matematica, I-56100 Pisa, Italy
Giuseppe Modica, Università degli Studi di Firenze, Dipartimento di Matematica Applicata, I-50139 Firenze, Italy
Cover design by Alex Gerasev. Mathematics Subject Classification (2000): 00A35, 15-01, 32K99,46L99, 32C18, 46E15, 46E20 Library of Congress Control Number: 2006927565 ISBN-10: 0-8176-4374-5
e-ISBN-10: 0-8176-4514-4
ISBN-13: 978-0-8176-4374-4
e-ISBN-13: 978-0-8176-4514-4
Printed on acid-free paper. ©2007 Birkhauser Boston. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Birkhauser Boston, c/o Springer Science+Business Media LLC, 233 Spring Street, New York, NY 10013, USA) and the author, except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. 9 8 7 6 5 4 3 2 1 www.birkhauser.com
Preface
One of the fundamental ideas of mathematical analysis is the notion of a function; we use it to describe and study relationships among variable quantities in a system and transformations of a system. We have already discussed real functions of one real variable and a few examples of functions of several variables¹, but there are many more examples of functions that the real world, physics, natural and social sciences, and mathematics have to offer:

(a) not only do we associate numbers and points to points, but we associate numbers or vectors to vectors,
(b) in the calculus of variations and in mechanics one associates an energy or action to each curve y(t) connecting two points (a, y(a)) and (b, y(b)):

ℱ(y) := ∫_a^b F(t, y(t), y′(t)) dt

in terms of the so-called Lagrangian F(t, y, p),
(c) in the theory of integral equations one maps a function x(τ) into a new function of the variable s,

s ↦ ∫_a^b K(s, τ) x(τ) dτ,

by means of a kernel K(s, τ),
(d) in the theory of differential equations one considers transformations of a function x(t) into the new function

t ↦ ∫_a^t f(s, x(s)) ds,

where f(s, y) is given.
¹ In M. Giaquinta, G. Modica, Mathematical Analysis. Functions of One Variable, Birkhauser, Boston, 2003, which we shall refer to as [GM1], and in M. Giaquinta, G. Modica, Mathematical Analysis. Approximation and Discrete Processes, Birkhauser, Boston, 2004, which we shall refer to as [GM2].
Figure 0.1. Vito Volterra (1860-1940) and the frontispiece of his Leçons sur les fonctions de lignes.
Of course all the previous examples are covered by the abstract setting of functions or mappings from a set X (of numbers, points, functions, ...) with values in a set Y (of numbers, points, functions, ...). But in this general context we cannot grasp the richness and the specificity of the different situations, that is, the essential ingredients from the point of view of the question we want to study. In order to continue to treat these specificities in an abstract context in mathematics, but also use them in other fields, we proceed by identifying specific structures and studying the properties that only depend on these structures. In other words, we need to identify the relevant relationships among the elements of X and how these relationships reflect on the functions defined on X. Of course we may define many intermediate structures. In this volume we restrict ourselves to illustrating some particularly important structures: that of a linear or vector space (the setting in which we may consider linear combinations), that of a metric space (in which we axiomatize the notions of limit and continuity by means of a distance), that of a normed vector space (that combines linear and metric structures), that of a Banach space (where we may operate linearly and pass to the limit), and finally, that of a Hilbert space (that allows us to operate not only with the lengths of vectors, but also with the angles that they form). The study of spaces of functions and, in particular, of spaces of continuous functions, originating in Italy in the years 1870-1880 in the works of, among others, Vito Volterra (1860-1940), Giulio Ascoli (1843-1896), Cesare Arzelà (1847-1912) and Ulisse Dini (1845-1918), is especially relevant in the previous context. A descriptive diagram is the following:
Accordingly, this book is divided into three parts. In the first part we study the linear structure. In the first three chapters we discuss basic ideas and results, including Jordan's canonical form of matrices, and in the fourth chapter we present the spectral theorem for self-adjoint and normal operators in finite dimensions. In the second part, we discuss the fundamental notions of general topology in the metric context in Chapters 5 and 6, continuous curves in Chapter 7, and finally, in Chapter 8 we illustrate the notions of homotopy and degree, and Brouwer's and Borsuk's theorems with a few applications to the topology of R^n. In the third part, after some basic preliminaries, we discuss in Chapter 9 the Banach space of continuous functions, presenting some of the classical fixed point theorems that play a relevant role in the solvability of functional equations and, in particular, of differential equations. In Chapter 10 we deal with the theory of Hilbert spaces and the spectral theory of compact operators. Finally, in Chapter 11 we survey some of the important applications of the ideas and techniques that we previously developed to the study of geodesics, nonlinear ordinary differential and integral equations and trigonometric series. In conclusion, this volume² aims at studying continuity and its implications both in finite- and infinite-dimensional spaces. It may be regarded as a companion to [GM1] and [GM2], and as a reference book for multi-dimensional calculus, since it presents the abstract context in which concrete problems posed by multi-dimensional calculus find their natural setting. Though this volume discusses more advanced material than [GM1,2], we have tried to keep the same spirit, always providing examples and
exercises to clarify the main presentation, omitting several technicalities and developments that we thought to be too advanced, and supplying the text with several illustrations. We are greatly indebted to Cecilia Conti for her help in polishing our first draft and we warmly thank her. We would like to also thank Fabrizio Broglia and Roberto Conti for their comments when preparing the Italian edition; Laura Poggiolini, Marco Spadini and Umberto Tiberio for their comments and their invaluable help in catching errors and misprints; and Stefan Hildebrandt for his comments and suggestions, especially those concerning the choice of illustrations. Our special thanks also go to all members of the editorial technical staff of Birkhauser for the excellent quality of their work and especially to Avanti Paranjpye and the executive editor Ann Kostant.

Note: We have tried to avoid misprints and errors. But, like most authors, we are imperfect. We will be very grateful to anybody who wants to inform us about errors or just misprints or wants to express criticism or other comments. Our e-mail addresses are
[email protected]   [email protected]

We shall try to keep up an errata corrige at the following webpages:

http://www.sns.it/~giaquinta
http://www.dma.unifi.it/~modica

² This book is a translation and revised edition of M. Giaquinta, G. Modica, Analisi Matematica, III. Strutture lineari e metriche, continuità, Pitagora Ed., Bologna, 2000.
Mariano Giaquinta Giuseppe Modica Pisa and Firenze October 2006
Contents
Preface

Part I. Linear Algebra

1. Vectors, Matrices and Linear Systems
   1.1 The Linear Spaces R^n and C^n
       a. Linear combinations
       b. Basis
       c. Dimension
       d. Ordered basis
   1.2 Matrices and Linear Operators
       a. The algebra of matrices
       b. A few special matrices
       c. Matrices and linear operators
       d. Image and kernel
       e. Grassmann's formula
       f. Parametric and implicit equations of a subspace
   1.3 Matrices and Linear Systems
       a. Linear systems and the language of linear algebra
       b. The Gauss elimination method
       c. The Gauss elimination procedure for nonhomogeneous linear systems
   1.4 Determinants
   1.5 Exercises

2. Vector Spaces and Linear Maps
   2.1 Vector Spaces and Linear Maps
       a. Definition
       b. Subspaces, linear combinations and bases
       c. Linear maps
       d. Coordinates in a finite-dimensional vector space
       e. Matrices associated to a linear map
       f. The space L(X, Y)
       g. Linear abstract equations
       h. Changing coordinates
       i. The associated matrix under changes of basis
       j. The dual space L(X, K)
       k. The bidual space
       l. Adjoint or dual maps
   2.2 Eigenvectors and Similar Matrices
       2.2.1 Eigenvectors
           a. Eigenvectors and eigenvalues
           b. Similar matrices
           c. The characteristic polynomial
           d. Algebraic and geometric multiplicity
           e. Diagonalizable matrices
           f. Triangularizable matrices
       2.2.2 Complex matrices
           a. The Cayley-Hamilton theorem
           b. Factorization and invariant subspaces
           c. Generalized eigenvectors and the spectral theorem
           d. Jordan's canonical form
           e. Elementary divisors
   2.3 Exercises

3. Euclidean and Hermitian Spaces
   3.1 The Geometry of Euclidean and Hermitian Spaces
       a. Euclidean spaces
       b. Hermitian spaces
       c. Orthonormal basis and the Gram-Schmidt algorithm
       d. Isometries
       e. The projection theorem
       f. Orthogonal subspaces
       g. Riesz's theorem
       h. The adjoint operator
   3.2 Metrics on Real Vector Spaces
       a. Bilinear forms and linear operators
       b. Symmetric bilinear forms or metrics
       c. Sylvester's theorem
       d. Existence of p-orthogonal bases
       e. Congruent matrices
       f. Classification of real metrics
       g. Quadratic forms
       h. Reducing to a sum of squares
   3.3 Exercises

4. Self-Adjoint Operators
   4.1 Elements of Spectral Theory
       4.1.1 Self-adjoint operators
           a. Self-adjoint operators
           b. The spectral theorem
           c. Spectral resolution
           d. Quadratic forms
           e. Positive operators
           f. The operators A*A and AA*
           g. Powers of a self-adjoint operator
       4.1.2 Normal operators
           a. Simultaneous spectral decompositions
           b. Normal operators on Hermitian spaces
           c. Normal operators on Euclidean spaces
       4.1.3 Some representation formulas
           a. The operator A*A
           b. Singular value decomposition
           c. The Moore-Penrose inverse
   4.2 Some Applications
       4.2.1 The method of least squares
           a. The method of least squares
           b. The function of linear regression
       4.2.2 Trigonometric polynomials
           a. Spectrum and products
           b. Sampling of trigonometric polynomials
           c. The discrete Fourier transform
       4.2.3 Systems of difference equations
           a. Systems of linear difference equations
           b. Power of a matrix
       4.2.4 An ODE system: small oscillations
   4.3 Exercises

Part II. Metrics and Topology

5. Metric Spaces and Continuous Functions
   5.1 Metric Spaces
       5.1.1 Basic definitions
           a. Metrics
           b. Convergence
       5.1.2 Examples of metric spaces
           a. Metrics on finite-dimensional vector spaces
           b. Metrics on spaces of sequences
           c. Metrics on spaces of functions
       5.1.3 Continuity and limits in metric spaces
           a. Lipschitz-continuous maps between metric spaces
           b. Continuous maps in metric spaces
           c. Limits in metric spaces
           d. The junction property
       5.1.4 Functions from R^n into R^m
           a. The vector space C^0(A, R^m)
           b. Some nonlinear continuous transformations from R^n into R^m
           c. The calculus of limits for functions of several variables
   5.2 The Topology of Metric Spaces
       5.2.1 Basic facts
           a. Open sets
           b. Closed sets
           c. Continuity
           d. Continuous real-valued maps
           e. The topology of a metric space
           f. Interior, exterior, adherent and boundary points
           g. Points of accumulation
           h. Subsets and relative topology
       5.2.2 A digression on general topology
           a. Topological spaces
           b. Topologizing a set
           c. Separation properties
   5.3 Completeness
       a. Complete metric spaces
       b. Completion of a metric space
       c. Equivalent metrics
       d. The nested sequence theorem
       e. Baire's theorem
   5.4 Exercises

6. Compactness and Connectedness
   6.1 Compactness
       6.1.1 Compact spaces
           a. Sequential compactness
           b. Compact sets in R^n
           c. Coverings and ε-nets
       6.1.2 Continuous functions and compactness
           a. The Weierstrass theorem
           b. Continuity and compactness
           c. Continuity of the inverse function
       6.1.3 Semicontinuity and the Fréchet-Weierstrass theorem
   6.2 Extending Continuous Functions
       6.2.1 Uniformly continuous functions
       6.2.2 Extending uniformly continuous functions to the closure of their domains
       6.2.3 Extending continuous functions
           a. Lipschitz-continuous functions
       6.2.4 Tietze's theorem
   6.3 Connectedness
       6.3.1 Connected spaces
           a. Connected subsets
           b. Connected components
           c. Segment-connected sets in R^n
           d. Path-connectedness
       6.3.2 Some applications
   6.4 Exercises

7. Curves
   7.1 Curves in R^n
       7.1.1 Curves and trajectories
           a. The calculus
           b. Self-intersections
           c. Equivalent parametrizations
       7.1.2 Regular curves and tangent vectors
           a. Regular curves
           b. Tangent vectors
           c. Length of a curve
           d. Arc length and C^1-equivalence
       7.1.3 Some celebrated curves
           a. Spirals
           b. Conchoids
           c. Cissoids
           d. Algebraic curves
           e. The cycloid
           f. The catenary
   7.2 Curves in Metric Spaces
       a. Functions of bounded variation and rectifiable curves
       b. Lipschitz and intrinsic reparametrizations
       7.2.1 Real functions with bounded variation
           a. The Cantor-Vitali function
   7.3 Exercises

8. Some Topics from the Topology of R^n
   8.1 Homotopy
       8.1.1 Homotopy of maps and sets
           a. Homotopy of maps
           b. Homotopy classes
           c. Homotopy equivalence of sets
           d. Relative homotopy
       8.1.2 Homotopy of loops
           a. The fundamental group with base point
           b. The group structure on π_1(X, x_0)
           c. Changing base point
           d. Invariance properties of the fundamental group
       8.1.3 Covering spaces
           a. Covering spaces
           b. Lifting of curves
           c. Universal coverings and homotopy
           d. A global invertibility result
       8.1.4 A few examples
           a. The fundamental group of S^1
           b. The fundamental group of the figure eight
           c. The fundamental group of S^n, n ≥ 2
       8.1.5 Brouwer's degree
           a. The degree of maps S^1 → S^1
           b. An integral formula for the degree
           c. Degree and inverse image
           d. The homological definition of degree for maps S^1 → S^1
   8.2 Some Results on the Topology of R^n
       8.2.1 Brouwer's theorem
           a. Brouwer's degree
           b. Extension of maps into S^n
           c. Brouwer's fixed point theorem
           d. Fixed points and solvability of equations in R^{n+1}
           e. Fixed points and vector fields
       8.2.2 Borsuk's theorem
       8.2.3 Separation theorems
   8.3 Exercises

Part III. Continuity in Infinite-Dimensional Spaces

9. Spaces of Continuous Functions, Banach Spaces and Abstract Equations
   9.1 Linear Normed Spaces
       9.1.1 Definitions and basic facts
           a. Norms induced by inner and Hermitian products
           b. Equivalent norms
           c. Series in normed spaces
           d. Finite-dimensional normed linear spaces
       9.1.2 A few examples
           a. The space ℓ_p, 1 ≤ p ≤ ∞
           b. A normed space that is not Banach
           c. Spaces of bounded functions
           d. The space ℓ_∞(Y)
   9.2 Spaces of Bounded and Continuous Functions
       9.2.1 Uniform convergence
           a. Uniform convergence
           b. Pointwise and uniform convergence
           c. A convergence diagram
           d. Uniform convergence on compact subsets
       9.2.2 A compactness theorem
           a. Equicontinuous functions
           b. The Ascoli-Arzelà theorem
   9.3 Approximation Theorems
       9.3.1 Weierstrass and Bernstein theorems
           a. Weierstrass's approximation theorem
           b. Bernstein's polynomials
           c. Weierstrass's approximation theorem for periodic functions
       9.3.2 Convolutions and Dirac approximations
           a. Convolution product
           b. Mollifiers
           c. Approximation of the Dirac mass
       9.3.3 The Stone-Weierstrass theorem
       9.3.4 The Yosida regularization
           a. Baire's approximation theorem
           b. Approximation in metric spaces
   9.4 Linear Operators
       9.4.1 Basic facts
           a. Continuous linear forms and hyperplanes
           b. The space of linear continuous maps
           c. Norms on matrices
           d. Pointwise and uniform convergence for operators
           e. The algebra End(X)
           f. The exponential of an operator
       9.4.2 Fundamental theorems
           a. The principle of uniform boundedness
           b. The open mapping theorem
           c. The closed graph theorem
           d. The Hahn-Banach theorem
   9.5 Some General Principles for Solving Abstract Equations
       9.5.1 The Banach fixed point theorem
           a. The fixed point theorem
           b. The continuity method
       9.5.2 The Caccioppoli-Schauder fixed point theorem
           a. Compact maps
           b. The Caccioppoli-Schauder theorem
           c. The Leray-Schauder principle
       9.5.3 The method of super- and sub-solutions
           a. Ordered Banach spaces
           b. Fixed points via sub- and super-solutions
   9.6 Exercises

10. Hilbert Spaces, Dirichlet's Principle and Linear Compact Operators
    10.1 Hilbert Spaces
        10.1.1 Basic facts
            a. Definitions and examples
            b. Orthogonality
        10.1.2 Separable Hilbert spaces and basis
            a. Complete systems and basis
            b. Separable Hilbert spaces
            c. Fourier series and ℓ_2
            d. Some orthonormal polynomials in L^2
    10.2 The Abstract Dirichlet's Principle and Orthogonality
        a. The abstract Dirichlet's principle
        b. Riesz's theorem
        c. The orthogonal projection theorem
        d. Projection operators
    10.3 Bilinear Forms
        10.3.1 Linear operators and bilinear forms
            a. Linear operators
            b. Adjoint operator
            c. Bilinear forms
        10.3.2 Coercive symmetric bilinear forms
            a. Inner products
            b. Green's operator
            c. Ritz's method
            d. Linear regression
        10.3.3 Coercive nonsymmetric bilinear forms
            a. The Lax-Milgram theorem
            b. Faedo-Galerkin method
    10.4 Linear Compact Operators
        10.4.1 Fredholm-Riesz-Schauder theory
            a. Linear compact operators
            b. The alternative theorem
            c. Some facts related to the alternative theorem
            d. The alternative theorem in Banach spaces
            e. The spectrum of compact operators
        10.4.2 Compact self-adjoint operators
            a. Self-adjoint operators
            b. Spectral theorem
            c. Compact normal operators
            d. The Courant-Hilbert-Schmidt theory
            e. Variational characterization of eigenvalues
    10.5 Exercises

11. Some Applications
    11.1 Two Minimum Problems
        11.1.1 Minimal geodesics in metric spaces
            a. Semicontinuity of the length
            b. Compactness
            c. Existence of minimal geodesics
        11.1.2 A minimum problem in a Hilbert space
            a. Weak convergence in Hilbert spaces
            b. Existence of minimizers of convex coercive functionals
    11.2 A Theorem by Gelfand and Kolmogorov
    11.3 Ordinary Differential Equations
        11.3.1 The Cauchy problem
            a. Velocities of class C^0(D)
            b. Local existence and uniqueness
            c. Continuation of solutions
            d. Systems of higher order equations
            e. Linear systems
            f. A direct approach to the Cauchy problem for linear systems
            g. Continuous dependence on data
            h. The Peano theorem
        11.3.2 Boundary value problems
            a. The shooting method
            b. A maximum principle
            c. The method of super- and sub-solutions
            d. A theorem by Bernstein
    11.4 Linear Integral Equations
        11.4.1 Some motivations
            a. Integral form of second order equations
            b. Materials with memory
            c. Boundary value problems
            d. Equilibrium of an elastic thread
            e. Dynamics of an elastic thread
        11.4.2 Volterra integral equations
        11.4.3 Fredholm integral equations in C^0
    11.5 Fourier's Series
        11.5.1 Definitions and preliminaries
            a. Dirichlet's kernel
        11.5.2 Pointwise convergence
            a. The Riemann-Lebesgue theorem
            b. Regular functions and Dini test
        11.5.3 L^2-convergence and the energy equality
            a. Fourier's partial sums and orthogonality
            b. A first uniform convergence result
            c. Energy equality
        11.5.4 Uniform convergence
            a. A variant of the Riemann-Lebesgue theorem
            b. Uniform convergence for Dini-continuous functions
            c. Riemann's localization principles
        11.5.5 A few complementary facts
            a. The primitive of the Dirichlet kernel
            b. Gibbs's phenomenon
        11.5.6 The Dirichlet-Jordan theorem
            a. The Dirichlet-Jordan test
            b. Fejér's example
        11.5.7 Fejér's sums

A. Mathematicians and Other Scientists

B. Bibliographical Notes

C. Index
Mathematical Analysis Linear and Metric Structures and Continuity
Part I
Linear Algebra
William R. Hamilton (1805-1865), James Joseph Sylvester (1814-1897) and Arthur Cayley (1821-1895).
1. Vectors, Matrices and Linear Systems
The early developments of linear algebra, and related to it those of vectorial analysis, are strongly tied, on the one hand, to the geometrical representation of complex numbers and the need for more abstraction and formalization in geometry and, on the other hand, to the newly developed theory of electromagnetism. The names of William R. Hamilton (1805-1865), August Möbius (1790-1868), Giusto Bellavitis (1803-1880), Adhémar de Saint Venant (1797-1886) and Hermann Grassmann (1808-1877) are connected with the beginning of linear algebra, while J. Willard Gibbs (1839-1903) and Oliver Heaviside (1850-1925) established the basis of modern vector analysis motivated by the then recent Treatise on Electricity and Magnetism by James Clerk Maxwell (1831-1879). The subsequent formalization is more recent and relates to the developments of functional analysis and quantum mechanics. Today, linear algebra appears as a language and a collection of results that are particularly useful in mathematics and in applications. In fact, most modeling, whether done via linear programming, ordinary or partial differential equations or control theory, can be treated numerically by computers only after it has been transformed into a linear system; in the end most of the modeling on computers deals with linear systems. Our aim here is not to present an extensive account; for instance, we shall ignore the computational aspects (error estimations, conditioning, etc.), despite their relevance, but rather we shall focus on illustrating the language and collecting a number of useful results in a wider sense. There is a strict link between linear algebra and linear systems. For this reason in this chapter we shall begin by discussing linear systems in the context of vectors in R^n or C^n.
1.1 The Linear Spaces R^n and C^n

a. Linear combinations

Let K be the field of real numbers or complex numbers. We denote by K^n the space of ordered n-tuples of elements of K,

K^n := {x | x = (x^1, x^2, ..., x^n), x^i ∈ K, i = 1, ..., n}.
The elements of K^n are often called points or vectors of K^n; in the latter case we think of a point in K^n as the end-point of a vector applied at the origin. In this context the real or complex numbers are called scalars, as they allow us to regard a vector at different scales. We can sum points of K^n, or multiply them by a scalar, by summing their coordinates or multiplying the coordinates by λ:

x + y := (x^1 + y^1, x^2 + y^2, ..., x^n + y^n),    λx := (λx^1, λx^2, ..., λx^n),

if x = (x^1, ..., x^n), y = (y^1, y^2, ..., y^n), λ ∈ K. Of course, for all x, y, z ∈ K^n and all λ, μ ∈ K, we have

o (x + y) + z = x + (y + z), x + y = y + x,
o λ(x + y) = λx + λy, (λ + μ)x = λx + μx, (λμ)x = λ(μx),
o if 0 := (0, ..., 0), then x + 0 = 0 + x = x,
o 1 · x = x and, if −x := (−1)x, then x + (−x) = 0.
We write x − y for x + (−y) and, from now on, the vector 0 will be simply denoted by 0.

1.1 Example. If we identify R^2 with the plane of geometry via a Cartesian system, see [GM1], the sum of vectors in R^2 corresponds to the sum of vectors according to the parallelogram law, and the multiplication of x by a scalar λ to a dilatation by a factor |λ|, in the same sense of x if λ > 0 or in the opposite sense if λ < 0.
1.2 About the notation. A list of vectors in K^n will be denoted by a lower index, v_1, v_2, ..., v_k, and a list of scalars with an upper index, λ^1, λ^2, ..., λ^k. The components of a vector x will be denoted by upper indices. In connection with the product rows by columns, see below, it is useful to display the components as a column,

x = (x^1, x^2, ..., x^n)^T.

However, since this is not very convenient typographically, if not strictly necessary, we shall write instead x = (x^1, x^2, ..., x^n). Given k scalars λ^1, λ^2, ..., λ^k and k vectors v_1, v_2, ..., v_k of K^n, we may form the linear combination vector of v_1, v_2, ..., v_k with coefficients λ^1, λ^2, ..., λ^k given by

Σ_{j=1}^k λ^j v_j ∈ K^n.
1.3 Definition. (i) We say that W ⊂ K^n is a linear subspace, or simply a subspace of K^n, if all finite linear combinations of vectors in W belong to W.
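A quick numerical illustration of linear combinations and of membership in a subspace may help to fix ideas. The following Python/NumPy sketch is only an illustration (the sample vectors and the helper name are arbitrary, not part of the text): a vector lies in Span{v_1, ..., v_k} exactly when appending it as an extra column does not increase the rank.

```python
import numpy as np

def in_span(vectors, x, tol=1e-10):
    """True if x is a finite linear combination of the given vectors,
    i.e., if x belongs to Span{v_1, ..., v_k}."""
    V = np.column_stack(vectors)  # n x k matrix [v_1 | ... | v_k]
    return np.linalg.matrix_rank(np.column_stack([V, x]), tol) == \
           np.linalg.matrix_rank(V, tol)

v1, v2 = np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])
print(in_span([v1, v2], 2 * v1 - 3 * v2))             # True: a linear combination
print(in_span([v1, v2], np.array([0.0, 0.0, 1.0])))   # False: outside the subspace
```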
By (i) we can complete the basis {w_1, w_2, ..., w_n} of W to form a basis of Span{v_1, v_2, ..., v_k} with k elements; this is a contradiction since {e_1, e_2, ..., e_n} is already a basis of K^n, hence a maximal system of linearly independent vectors of K^n. (iii) follows as (ii). Let us prove that two bases of W have the same number of elements. Suppose that {v_1, v_2, ..., v_p} and {e_1, e_2, ..., e_k} are two bases of W with p < k. By (i) we may complete v_1, v_2, ..., v_p with k − p vectors chosen among e_1, e_2, ..., e_k to form a new basis {v_1, v_2, ..., v_p, e_{p+1}, ..., e_k} of W; but this is a contradiction since {v_1, v_2, ..., v_p} is already a basis of W, hence a maximal system of linearly independent vectors of W, see Proposition 1.5. Similarly, and we leave it to the reader, one can prove that k ≤ n. □
1.8 Definition. The number of elements of a (all) basis of a linear subspace W of K^n is called the dimension of W and denoted by dim W.
1.9 Corollary. The linear space K^n has dimension n and, if W is a linear subspace of K^n, then dim W ≤ n. Moreover, if k := dim W,
(i) there are k linearly independent vectors v_1, v_2, ..., v_k ∈ W,
(ii) a set of k linearly independent vectors v_1, v_2, ..., v_k ∈ W is always a basis of W,
(iii) any p vectors v_1, v_2, ..., v_p ∈ W with p > k are always linearly dependent,
(iv) if v_1, v_2, ..., v_p are p linearly independent vectors of W, then p ≤ k,
(v) for every subspace V ⊂ K^n such that V ⊂ W we have dim V ≤ k,
(vi) let V, W be two subspaces of K^n; then V = W if and only if V ⊂ W and dim V = dim W.

1.10 ¶. Prove Corollary 1.9.
d. Ordered basis

Until now, a basis S of a linear subspace W of K^n is just a finite set of linearly independent generators of W; every x ∈ W is a unique linear combination of the basis elements. Here, uniqueness means uniqueness of the value of each coefficient in front of each basis element. To be precise, one would write

x = Σ_{v∈S} λ(v) v.

It is customary to index the elements of S with natural numbers, i.e., to consider S as a list instead of as a set. We call any list made with the elements of a basis S an ordered basis. The order just introduced is then used to link the coefficients to the corresponding vectors by correspondingly indexing them. This leads to the simpler notation

x = Σ_{i=1}^k λ^i v_i

that we have already tacitly used. Moreover,

1.11 Proposition. Let W be a linear subspace of K^n of dimension k and let (v_1, v_2, ..., v_k) be an ordered basis of W. Then for every x ∈ W there is a unique vector λ ∈ K^k, λ := (λ^1, λ^2, ..., λ^k), such that x = Σ_{i=1}^k λ^i v_i.

1.12 Example. The list (e_1, e_2, ..., e_n) of vectors of K^n given by e_1 := (1, 0, ..., 0), e_2 := (0, 1, ..., 0), ..., e_n := (0, 0, ..., 1) is an ordered basis of K^n. In fact e_1, e_2, ..., e_n are trivially linearly independent and span K^n since
x = (x^1, x^2, ..., x^n) = x^1 e_1 + x^2 e_2 + ··· + x^n e_n = Σ_{j=1}^n x^j e_j

for all x ∈ K^n. (e_1, e_2, ..., e_n) is called the canonical or standard basis of K^n. We shall always think of the canonical basis as an ordered basis.
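Proposition 1.11 says that the coordinates of a vector with respect to an ordered basis are obtained by solving a linear system. The short NumPy sketch below (our own illustration, with arbitrary sample data) makes this concrete for a 2-dimensional subspace of R^3.

```python
import numpy as np

# An ordered basis (v_1, v_2) of a 2-dimensional subspace W of R^3,
# stored as the columns of V.
v1, v2 = np.array([1.0, 1.0, 0.0]), np.array([0.0, 1.0, 1.0])
V = np.column_stack([v1, v2])

x = 2.0 * v1 - 5.0 * v2                        # a vector of W
lam, residual, rank, _ = np.linalg.lstsq(V, x, rcond=None)
print(lam)                                     # [ 2. -5.]: the unique coordinate vector
print(np.allclose(V @ lam, x))                 # True: x = lambda^1 v_1 + lambda^2 v_2
```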
1.2 Matrices and Linear Operators

Following Arthur Cayley (1821-1895) we now introduce the calculus of matrices. An m × n matrix A with entries in K is an ordered table of elements of K arranged in m rows and n columns. It is customary to index the rows from top to bottom from 1 to m and the columns from left to right from 1 to n. If a^i_j denotes the element or entry in the ith row and the jth column, we write

A = \begin{pmatrix} a^1_1 & a^1_2 & \cdots & a^1_n \\ a^2_1 & a^2_2 & \cdots & a^2_n \\ \vdots & \vdots & & \vdots \\ a^m_1 & a^m_2 & \cdots & a^m_n \end{pmatrix}

or A = [a^i_j], i = 1, ..., m, j = 1, ..., n.
R is linear if and only if (i) φ(λ(x, y)) = λ φ(x, y) for all (x, y) ∈ R^2 and all λ ∈ R_+, (ii) there exist A and τ ∈ R such that φ((cos θ, sin θ)) = A cos(θ + τ) for all θ ∈ R.

1.24 ¶. The reader is invited to find the form of the associated matrices corresponding to the linear operators sketched in Figure 1.3.
Figure 1.3. Some linear transformations of the plane. In the figure the possible images of the square [0,1] x [0,1] are in shadow.
d. Image and kernel

Let A ∈ M_{m,n}(K) and let A(x) := Ax, x ∈ K^n, be the associated linear operator. The kernel and the image of A (or of A) are respectively defined by

ker A := {x ∈ K^n | A(x) = 0},    Im A := {y ∈ K^m | ∃ x ∈ K^n such that A(x) = y}.

Trivially, ker A is a linear subspace of the source space K^n, and it is easy to see that the following three claims are equivalent: (i) A is injective, (ii) ker A = {0}, (iii) a_1, a_2, ..., a_n are linearly independent in K^m. If one of the previous claims holds, we say that A is nonsingular, although in the current literature nonsingular usually refers to square matrices. Also observe that A may be nonsingular only if m ≥ n. Im A is a linear subspace of the target space K^m, and by definition Im A = Span{a_1, a_2, ..., a_n}. The dimension of Im A is called the rank of A (or of A) and is denoted by Rank A. By definition, Rank A is the maximal number of linearly independent columns of A; in particular Rank A ≤ min(n, m). Moreover, it is easy to see that the following claims are equivalent
(i) A is surjective, (ii) Im A = K^m, (iii) Rank A = m. Therefore A may be surjective only if m ≤ n. The following theorem is crucial.

1.25 Theorem (Rank formula). For every matrix A ∈ M_{m,n}(K) we have

dim Im A = n − dim ker A.

Proof. Let (v_1, v_2, ..., v_k) be a basis of ker A. According to Theorem 1.7 we can choose (n − k) vectors e_{k+1}, ..., e_n of the standard basis of K^n in such a way that v_1, v_2, ..., v_k, e_{k+1}, ..., e_n form a basis of K^n. Then one easily checks that (A(e_{k+1}), ..., A(e_n)) is a basis of Im A, thus concluding that dim Im A = n − k. □
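The rank formula, together with the equality Rank A = Rank A^T proved below, is easy to check numerically. The following sketch (ours, using NumPy and SciPy on a random matrix; not part of the text) computes dim Im A and dim ker A directly.

```python
import numpy as np
from scipy.linalg import null_space  # orthonormal basis of ker A

rng = np.random.default_rng(0)
m, n = 4, 6
A = rng.standard_normal((m, n))

rank = np.linalg.matrix_rank(A)            # dim Im A
dim_ker = null_space(A).shape[1]           # dim ker A

print(rank, dim_ker)                       # here: 4 and 2
print(rank == n - dim_ker)                 # True: the rank formula
print(rank == np.linalg.matrix_rank(A.T))  # True: Rank A = Rank A^T
```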
A first trivial consequence of the rank formula is the following.

1.26 Corollary. Let A ∈ M_{m,n}(K).
(i) If m < n, then ker A ≠ {0}, i.e., dim ker A > 0.
(ii) If m ≥ n, then A is nonsingular, i.e., ker A = {0}, if and only if Rank A is maximal, Rank A = n.
(iii) If m = n, i.e., A is a square matrix, then the following two equivalent claims hold:
  a) Let A(x) := Ax be the associated linear map. Then A is surjective if and only if A is injective.
  b) Ax = b is solvable for any choice of b ∈ K^n if and only if Ax = 0 has zero as its unique solution.

Proof. (i) From the rank formula we have dim ker A = n − dim Im A ≥ n − m > 0.
(ii) Again from the rank formula, dim Im A = n − dim ker A = n = min(n, m).
(iii) (a) Observe that A is injective if and only if ker A = {0}, equivalently if and only if dim ker A = 0, and that A is surjective if and only if Im A = K^n, i.e., dim Im A = m = n. The conclusion follows from the rank formula.
(iii) (b) The equivalence between (iii) (a) and (iii) (b) is trivial. □
Notice that (i) and (ii) imply that A : K^n → K^m may be injective and surjective only if n = m.

1.27 ¶. Show the following.

Proposition. Let A ∈ M_{n,n}(K) and A(x) := Ax. The following claims are equivalent:
(i) A is injective and surjective,
(ii) A is nonsingular, i.e., ker A = {0},
(iii) A is surjective,
(iv) there exists B ∈ M_{n,n}(K) such that BA = Id_n,
(v) there exists B ∈ M_{n,n}(K) such that AB = Id_n,
(vi) A is invertible, i.e., there exists a matrix B ∈ M_{n,n}(K) such that BA = AB = Id_n.
An important and less trivial consequence of the rank formula is the following.

1.28 Theorem (Rank of the transpose). Let A ∈ M_{m,n}. Then we have
(i) the maximum number of linearly independent columns and the maximum number of linearly independent rows are equal, i.e., Rank A = Rank A^T,
(ii) let p := Rank A. Then there exists a nonsingular p × p square submatrix of A.

Proof. (i) Let A = [a^i_j], let a_1, a_2, ..., a_n be the columns of A and let p := Rank A. We assume without loss of generality that the first p columns of A are linearly independent and we define B as the m × p submatrix formed by these columns, B := [a_1 | a_2 | ... | a_p]. Since the remaining columns of A depend linearly on the columns of B, we have

a^k_j = Σ_{i=1}^p r^i_j a^k_i    ∀k = 1, ..., m, ∀j = p+1, ..., n,

for some R = [r^i_j] ∈ M_{p,n−p}(K). In terms of matrices,

[a_{p+1} | a_{p+2} | ... | a_n] = [a_1 | ... | a_p] R = B R,

hence

A = [B | B R] = B [Id_p | R].

Taking the transposes, we have A^T ∈ M_{n,m}(K), B^T ∈ M_{p,m}(K) and

A^T = [Id_p | R]^T B^T.    (1.6)

Since [Id_p | R]^T is trivially injective, we infer that ker A^T = ker B^T, hence by the rank formula Rank A^T = m − dim ker A^T = m − dim ker B^T = Rank B^T, and we conclude that Rank A^T = Rank B^T ≤ min(m, p) = p = Rank A. Finally, by applying the above to the matrix A^T, we get the opposite inequality Rank A = Rank (A^T)^T ≤ Rank A^T, hence the conclusion.
(ii) With the previous notation, we have Rank B^T = Rank B = p. Thus B has a set of p independent rows. The submatrix S of B made by these rows is a square p × p matrix with Rank S = Rank S^T = p, hence nonsingular. □

1.29 ¶. Let A ∈ M_{m,n}(K), let A(x) := Ax and let (v_1, v_2, ..., v_n) be a basis of K^n. Show the following:
(i) A is injective if and only if the vectors A(v_1), A(v_2), ..., A(v_n) of K^m are linearly independent,
(ii) A is surjective if and only if {A(v_1), A(v_2), ..., A(v_n)} spans K^m,
(iii) A is bijective iff {A(v_1), A(v_2), ..., A(v_n)} is a basis of K^m.
e. Grassmann's formula

Let U and V be two linear subspaces of K^n. Clearly, both U ∩ V and

U + V := {x ∈ K^n | x = u + v for some u ∈ U and v ∈ V}

are linear subspaces of K^n. When U ∩ V = {0}, we say that U + V is the direct sum of U and V and we write U ⊕ V for U + V. If moreover U ⊕ V = K^n, we say that U and V are supplementary subspaces. The following formula is very useful.

1.30 Proposition (Grassmann's formula). Let U and V be linear subspaces of K^n. Then

dim(U + V) + dim(U ∩ V) = dim U + dim V.

Proof. Let (u_1, u_2, ..., u_h) and (v_1, v_2, ..., v_k) be two bases of U and V respectively. The vectors u_1, u_2, ..., u_h, v_1, v_2, ..., v_k span U + V, and a subset of them forms a basis of U + V. In particular, dim(U + V) = Rank L where L is the n × (h + k) matrix defined by

L := [u_1 | ... | u_h | −v_1 | ... | −v_k].

Moreover, a vector x = Σ_{i=1}^h x^i u_i ∈ K^n is in U ∩ V if and only if there exist unique y^1, y^2, ..., y^k such that

x = x^1 u_1 + ··· + x^h u_h = y^1 v_1 + ··· + y^k v_k,

thus, if and only if the vector w := (−x^1, −x^2, ..., −x^h, y^1, y^2, ..., y^k) ∈ K^{h+k} belongs to ker L. Consequently, the linear map φ : K^{h+k} → K^n,

φ(w) := −Σ_{i=1}^h w^i u_i,

is injective and surjective from ker L onto U ∩ V. It follows that dim(U ∩ V) = dim ker L and, by the rank formula,
dim(U ∩ V) + dim(U + V) = dim ker L + Rank L = h + k = dim U + dim V. □
1.31 ¶. Notice that the proof of Grassmann's formula is in fact a procedure to compute two bases of U + V and U ∩ V starting from two bases of U and V. The reader is invited to choose two subspaces U and V of K^n and to compute the bases of U + V and of U ∩ V.
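Following the invitation of 1.31, here is one way to carry out the computation numerically. The Python sketch below (our own illustration with arbitrary sample subspaces of R^4, using NumPy and SciPy) reads a basis of U + V off the column space of [U | V] and a basis of U ∩ V off the kernel of L, exactly as in the proof above.

```python
import numpy as np
from scipy.linalg import orth, null_space

# Bases of U and V stored as columns (arbitrary sample subspaces of R^4).
U = np.array([[1., 0.], [0., 1.], [1., 1.], [0., 0.]])
V = np.array([[1., 0.], [1., 0.], [2., 0.], [0., 1.]])
h, k = U.shape[1], V.shape[1]

L = np.hstack([U, -V])                  # n x (h + k), as in the proof
basis_sum = orth(np.hstack([U, V]))     # orthonormal basis of U + V
W = null_space(L)                       # each kernel vector w gives U @ w[:h] in U ∩ V
basis_int = orth(U @ W[:h, :]) if W.size else np.zeros((U.shape[0], 0))

print(basis_sum.shape[1], basis_int.shape[1])              # dim(U+V), dim(U∩V): 3 and 1
print(basis_sum.shape[1] + basis_int.shape[1] == h + k)    # True: Grassmann's formula
```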
f. Parametric and implicit equations of a subspace

1.32 Parametric equation of a straight line in K^n. Let a ≠ 0 and q be two vectors in K^n. The parametric equation of a straight line through q and direction a is the map r : K → K^n given by r(λ) := λa + q, λ ∈ K. The image of r, {x ∈ K^n | ∃λ such that x = λa + q}, is the straight line through q and direction a.
Figure 1.4. Straight line through q and direction a.
We have r(0) = q and r(1) = a + q. In other words, r(t) passes through q and a + q. Moreover, x is on the straight line passing through q and a + q if and only if there exists t ∈ K such that x = ta + q, or, more explicitly,

x^1 = t a^1 + q^1,
x^2 = t a^2 + q^2,
...                                  (1.7)
x^n = t a^n + q^n.
a:2 = ial^a^^q^ x^=^-^^^a^+q^,
The previous linear system can be written as A ( x — q) = 0 where A G Mn-i,nO^) the matrix defined by
f-a'^/a^ -a^/a^ A = I ~a^lo> -a'*/a" X-a'^lQ^
-1 0 0
0 -1 0U
0 0 - 11
0
0
0
...
is
0 0
-1/
1.34 ^ . Show that there are several parametric equations of a given straight line. A parametric equation of the straight line through a and b G M'^ is given by t —> r(t) := a-l-t(b-a), t GM. 1.35 P a r a m e t r i c a n d implicit e q u a t i o n s of a 2-plane in K^. Given two linearly independent vectors vi,V2 in R^ and a point q G M^, we call the parametric equation
20
1. Vectors, Matrices and Linear Systems
of the plane directed by v i , V2 and passing through q, the map (/? : K^ —> K^ d.efined by (p{{a, /3)) := a v i + /3v2 -h q, or in matrix notation
[v i I V2J I
^ I ct\
1^1+^.0)
Of course v? is Unear iff q = 0. The 2-plane determined by this parametrization is defined by n : = I m ( ^ = | x G M^ I x - q G I m A J . Suppose v i = (a, 6, c) and V2 = {d, e, / ) so that
Because of Theorem 1.28, there is a nonsingular 2 x 2 submatrix B of A and, without loss of generality, we can suppose that B = ( \b
1. We can then solve the system e
\x^ - q^ = ba -\- e(3 in the unknown (a,/3), thus finding a and /3 as linear functions of x^ — q^ and x^ — (p'. Then, substituting into the third equation, we can eliminate (a, ^) from the last equation, obtaining an implicit equation, or constraint, on the independent variables, of the form r (x^ - gl) + 5 {x^ -q^) + t (x^ - q^) = 0, that describes the 2-plane without any further reference to the free parameters
{a,0).
More generally, let W he a. linear subspace of dimension fc in K^, also called a k-plane (through the origin) of K"^. If v i , V 2 , . . . , v^ is a basis of W, we can write H^ = ImL where L :=
vi
V2
Vfc
We call X -^ L{x) := Lx the parametric equation of W generated by (vi, V2,..., Vfc). Of course a different basis of W yields a different parametrization. We can also write any subspace W of dimension k a.sW = ker A where A G Mn-k,n{^)' We call it an implicit representation of W. Notice that since ker A = W, we have Rank A-^ = Rank A = n — A: by Theorem 1.28 and the rank formula. Hence the rows of A are n — k linearly independent vectors of K*^. 1.36 Remark. A A;-dimensional subspace of K" is represented by means of k free parameters, i.e., the image of K^ through a nondegenerate parametric equation, or by a set of independent {n — k) constraints given by linearly independent scalar equations in the ambient variables.
1.2 Matrices and Linear Operators
21
1.37 P a r a m e t r i c a n d implicit representations. One can go back and forth from the parametric to the impUcit representation in several ways. For instance, start with W = I m L where L G M^^/c(K) has maximal rank, R a n k L = k. By Theorem 1.28 there is a A: X A: nonsingular matrix submatrix M of L. Assume that M is made by the first few rows of L so that M L = N where N G M^-kM^)Writing x as x = ( x ' , x ' 0 with x ' € K^ and x ' ' G K'^"'', the parametric equation x = Lt, t G K'^, writes as
I x' = Mt,
(1.8)
I x'' = Nt. As M is invertible,
(
t = M-ix', N M - i x ' = x".
We then conclude that x G I m L if and only if N M ~ ^ x ' = x " . The latter is an implicit equation for W, that we may write as A x = 0 if we define A G Mn-fc,n(K) by
A =
-Idfc
NM-
Conversely, let W = ker A where A G Mn,fc(K) has Rank A = n — k. Select n — k the square independent columns, say the first n —fcon the left, call B G Mn-k,n-kO^) matrix made by these columns, and split x as x = ( x ' , x ' ' ) where x ' G K'^~^ and x " G K^. Thus A x = 0 rewrites as
= 0,
or
B x ' -f C x ' ' = 0.
As B is invertible, the last equation rewrites as x ' = —B ^ C x " , Therefore x G ker A if and only if
x ' ' := L x ' ' ,
i.e., W =
lmL.
x'' G :
22
1. Vectors, Matrices and Linear Systems
1.3 Matrices and Linear Systems

a. Linear systems and the language of linear algebra

Matrices and linear operators are strongly tied to linear systems. A linear system of m equations and n unknowns has the form
(1.9)
[afx^+a^x'^^'"
m^n + a^x
Lm
The m-tuple (6^,..., b^) is the given right-hand side, the n-tuple ( x ^ , . . . , x"^) is the unknown and the numbers {a}}, i = 1 , . . . , m, j = 1 , . . . , n are given and called the coefficients of the system. If we think of the coefficients as the entries of a matrix A, /a\ A =
aj
a\
. ••
al
.
ai\ (1.10)
K-] =
\aT af . ..
a-/
and we set b := ( 6 \ 6^,..., V^) £ K'", x := {x^, x^,..., the system can be written in a compact way as Ax = b.
x") € K", then (1.11)
Introducing the linear map A(x) := Ax, (1.9) can be seen as a functional equation

A(x) = b    (1.12)

or, denoting by a_1, a_2, ..., a_n the n columns of A indexed from left to right, as

x^1 a_1 + x^2 a_2 + ··· + x^n a_n = b.    (1.13)

Thus, the discussion of linear systems, linear independence, matrices and linear maps are essentially the same, in different languages. The next proposition collects these equivalences.

1.38 Proposition. With the previous notation we have:
(i) Ax is a linear combination of the columns of A.
(ii) The following three claims are equivalent:
  a) the system (1.11), or (1.9), is solvable, i.e., there exists x ∈ K^n such that Ax = b;
  b) b is a linear combination of a_1, a_2, ..., a_n;
  c) b ∈ Im A.
(iii) The following four claims are equivalent:
1.3 Matrices and Linear Systems
23
a) Ax = b has at most one solution,
b) Ax = 0 implies x = 0,
c) A(x) = 0 has a unique solution,
d) ker A = {0},
e) a_1, a_2, ..., a_n are linearly independent.
(iv) ker A is the set of all solutions of the system Ax = 0.
(v) Im A is the set of all b's such that the system Ax = b has at least one solution.
(vi) Let x_0 ∈ K^n be a solution of Ax_0 = b. Then the set of all solutions of Ax = b is the set

{x_0} + ker A := {x ∈ K^n | x − x_0 ∈ ker A}.
With the previous notation, we see that b is Unearly dependent of a i , a 2 , . . . , an if and only if Rank a i
^n
= Rank a i
an b
Thus from Proposition 1.38 (ii) we infer the following. 1.39 Proposition (Rouche-Capelli). With the previous notation, the system (1.9) or (1.11) is solvable if and only if Rank a i
an
= Rank a i
an b
The m X (n + 1) matrix
/a{ b^
b]:^
ai
\aT
a^
...
.
b. The Gauss elimination method As we have seen, linear algebra yields a proper language to discuss linear systems, and conversely, most of the constructions in linear algebra reduce to solving systems. Moreover, the proofs we have presented are constructive and become useful from a numerical point of view if one is able to efficiently solve the following two questions: (i) find the solution of a nonsingular square system A x = b, (ii) given a set of vectors T c K^, find a subset S CT such that Span S = SpanT. In this section we illustrate the classical Gauss elimination method which efficiently solves both questions. 1.42 E x a m p l e . Let us begin with an example of how to solve a linear system. Consider the linear system
Sx + Sy + Qz 2x-hy-\-z where x := {x,y,x),
b := (61,62,63) and
=b2, =63,
Ax = b
1.3 Matrices and Linear Systems
25
A=-
We subtract from the second and third equations the first one multipUed by 1/2 and 1/3 respectively to get the new equivalent system:
1
6x-\-18y-\-6z
=bi,
< 3x-\-Sy + 6z-^{6x-\-18y-\-6z) [
2x-^ y-\-z - l{6x + 18y + 6z)
=-^bi-\-b2, =-§61+63,
i.e., ( 6x-\-lSy + 6z < - i / + 3z
= 61, =-§61+62,
[
=-§61+63.
-by-z
(1.15)
This essentially requires us to solve the system of the last two equations f - 2 / + 32 \ -5y-z
=-§61+62, =-§61+63.
We now apply the same argument to this last system, i.e., we subtract from the last equation the first one multiplied by 5 to get
(
6x-\-lSy-\-6z -y + Sz -52/ - z - 5(-2/ + 3^)
=61, =-§61+62, = - § 6 1 + 63 - 5 ( - § 6 i + 62),
i.e.,
(
6x-\-18y-\-6z
=61,
-y + Sz =-§61+62, -I62 =261-562+63. This system has exsictly the same solution as the original one and, moreover, it is easily solvable starting from the last equation. Finally, we notice that the previous method produced two matrices
U is upper triangular and L is lower triangular with 1 in the principal diagonal, so the original system A x = b rewrites as U x = Lb. Since L = [/*] is invertible (l] = 1 Vi) and x is arbitrary, we can rewrite the last formula as a decomposition formula for A, A = L-iU.
The algorithm we have just described in Example 1.42, that transforms the proposed 3 x 3 square system into a triangular system, extends to systems with an arbitrary number of unknowns and equations, and it is called the Gauss elimination method. Moreover, it is particularly efficient, but does have some drawbacks from a numerical point of view.
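The elimination procedure is straightforward to express in code. The following Python sketch is only an illustration of ours (no pivoting, no error control, arbitrary sample data): it reduces a square system to triangular form and then back-substitutes, mirroring the steps of Example 1.42.

```python
import numpy as np

def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination without pivoting (illustrative only)."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    # forward elimination: make A upper triangular
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]          # multiplier, assumes a nonzero pivot
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    # back substitution on the triangular system
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

A = np.array([[6, 18, 6], [3, 8, 6], [2, 1, 1]])   # a sample 3 x 3 system
b = np.array([1, 2, 3])
print(gauss_solve(A, b), np.linalg.solve(A, b))    # the two answers agree
```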
26
1. Vectors, Matrices and Linear Systems
Let (1.16)
Ax = 0
be a linear homogeneous system with m equations, n unknowns and a coefficient matrix given by /'
^2
A = \o
([e^(i) | . . . | e^(^)]) = ( - l ) ^ D ( [ e i | . . . | en]) and D([ei | . . . | en]) = 1, we conD clude that D{A) agrees with the right-hand side of (1.20), hence D{A) = det A.
The determinant can also be computed by means of an inductive formula. 1.55 Definition. Let A = [aj] E Mn^nOQ, n > 1. A r-minor of A is a r X r suhmatrix of A, that is a matrix obtained by choosing the common entries of a choice of r rows and r columns of A and relabeling the indices from 1 to r. For i,j = l,...,nwe define the complementing (i, j)-minor of the matrix A, denoted by M^(A), as the (n — 1) x (n — l)-minor obtained by removing the ith row and the jth column from A. 1.56 T h e o r e m (Laplace). Let A e Mn,n(^), det A : -
n>l. Then
A
X;=i(-lP+'«] detMJ(A)
z/n = 1,
(1.21)
ifn>l.
Proof. Denote by D ( A ) the right-hand side of (1.21). Let us prove that D{A) fulfills the conditions (i), (ii) and (iii) of Theorem 1.54, thus D{A) = det A. The conditions (i) and (ii) of Theorem 1.54 are trivially fulfilled by D{A). Let us also show that (iii) holds, i.e., if aj = a^+i for some j , then D(A) — 0. We proceed by induction on j . By the induction step, det M ^ ( A ) = Q ioi h ^ jj + 1, hence D{A) = (-l)-?+ia] det M ] ( A ) + (—l)^aL ^ d e t M L j ( A ) . Since a j = ^j-fi? ^^^^ consequently, M j ( A ) = Mj_j_^(A), we conclude that D{A) = 0 . D
Prom (1.20) we immediately infer the following.
1.4 Determinants
35
1.57 T h e o r e m ( D e t e r m i n a n t of t h e t r a n s p o s e ) . We have det A ^ = det A
for all A G Mn,n(lK).
One then shows the following important theorem. 1.58 Theorem (Binet's formula). Let A and B be two nxn Then det(BA) = d e t B det A.
matrices.
Proof. Let A - : [a]] = [ai | . . . | an], B = [6j] = [bi | . . . | bn] and let ( e i , . . . , e n ) be the canonical basis of K"^. Since
j=l
j,r=l
r=l
we have n
d e t ( B A ) = det Q ^ r=l =
n
a5[br I . . . I 5]1 « n b r ] ) r=l
Yl ^a(l)«a(2) • • • < ( n ) det[b^(i) | . . . | b^(^)] creVn
= E
( - i r < i ) < 2 ) - - - < ( n ) d e t B = detAdetB.
As stated in the beginning, the determinant gives us a criterion to decide whether a matrix is nonsingular or, equivalently, whether n vectors are linearly independent. 1.59 T h e o r e m . A nxn matrix A is nonsingular if and only if det
A^^.
Proof. If A is nonsingular, there is a B G Mn.ni]^) such that A B = Idn, see Exercise 1.27; by Binet's formula det A det B = 1. In particular det A ^ 0. Conversely, if the columns of A are linearly dependent, then it is not difficult to see that det A = 0 by using Theorem 1.54. •
Let A = [a^] be an m x n matrix. We say that the characteristic of A is r if all p-minors with p > r have zero determinant and there exists a r-minor with nonzero determinant. 1.60 T h e o r e m (Kronecker). The rank and the characteristic of a matrix are the same. Proof. Let A 6 Mm,ni^) and let r := Rank A . For any minor B , trivially R a n k B < Rank A = r, hence every p-minor is singular, i.e., has zero determinant, if p > n. On the other hand. Theorem 1.28 implies that there exists a nonsingular r-minor B of A , hence with det B ^ 0. •
The defining inductive formula (1.21) requires us to compute the determinant of the complementing minors of the elements of the first row; on account of the alternance, we can use any row, and on account of Theorem 1.57, we can use any column. More precisely,
36
1. Vectors, Matrices and Linear Systems
1.61 Theorem (Laplace's formulas). Let A be an n x n matrix. We have for all h^k = 1,...,n n
Skh det A = Yli-lf+^a';
det M^^(A),
n
Skh det A = ^ ^ ( - i r + ' ^ a l d e t M K A ) , where Shk is Kronecker^s symbol. 1.62 ^ . To compute the determinant of a square n x n matrix A we can use a Gauss reduced matrix G A of A. Show that det A = (—1)*^ n r = i ( ^ A ) i where a is the permutation of rows needed to compute G A , and the product is the product of the pivots.
It is useful to rewrite Laplace's formulas using matrix multiplication. Denote by cof(A) = [c*] the square n x n matrix, called the matrix of CO factors of A, defined by c}:=(-l)^+^detM^(A). Notice the exchange between the row and column indices: the (z, j ) entry of cof(A) is (—1)*"^-^ times the determinant of the complementing (j, z)-minor. Using the cofactor matrix, Laplace's formulas in Theorem L61 rewrite in matrix form as 1.63 Theorem (Laplace's formulas). Let A be annxn we have cQf(A) A = A c o f ( A ) = det A Idn-
matrix. Then (1.22)
We immediately infer the following. 1.64 Proposition. Let A = [ai | a2 | . . . | an] G Mn,n{^)
be nonsingular.
(i) We have A"' = : ^ c o f ( A ) . det A ^ ^ (ii)
The system A x = b , b G K"^, has a unique solu-
(CRAMER'S RULE)
tion given by _ / 1 J\.
—
\
2 )
n\ , . . . , t X /
^ J,
«£/
detBi
det A '
where := lai
. . . a^-i b a^-^i
. . . an I.
1.5 Exercises
37
Proof, (i) follows immediately from (1.22). (ii) follows from (i), but it is better shown using linearity and the alternating property of the determinant. In fact, solving A x = b is equivalent to finding x = (x^, x ^ , . . . , x^) such that b = Y17=i ^*^i- Now, linearity and the alternating property of the determninant yield n
det B i = det
• • • a^-i
n
V ^ X^SLJ ai_|_i j=i
...
= V ^ x^ det
• • • Sii-i
SLJ ai_|_i
...
j=i
Since the only nonzero addend on the right-hand side is the one with j = i, vje infer det B i = x^ det a i
1.65 f.
...
a^-i
a^ a i + i
...
an
= x^ det A.
Show that d e t c o f ( A ) = (det A)^
1.5 Exercises 1.66 1 . Find the values of x,y (1, y, y^) form a basis of M^.
e M for which the three vectors (1,1,1),
{l,x,x'^),
1.67 ^ . Let Q:i,a2 G C be distinct and nonzero. Show that e"i*, e"2*^ t G R, are linearly independent on C. [Hint: See [GM2] Corollary 5.54.] 1.68 %, Write the parametric equation of a straight line (i) through b = (1,1,1) and with direction a = (1,0,0), (ii) through a = (1,1,1) and b = (1,0,0). 1.69 %. Describe in a parametric or implicit way in E^, o a straight line through two points, o the intersection of a plane and a straight line, o a straight line that is parallel to a given plane, o a straight line on a plane, o a plane through three points, o a plane through a point containing a given straight line, o a plane perpendicular to a straight line. 1.70 ^ Afflne t r a n s f o r m a t i o n s . An affine transformation (f : K^ —>• K^ is a map of the type (p{x.) := L{x.) + qo where L : K^ —> K^ is linear and qo G K"^. Show that (p is an affine transformation if and only if (f maps straight lines onto straight lines. 1.71 f. Let Pi and P2 be two (n - l)-planes in W^. Show that either Pi = P2 or P i n F2 = 0 or Pi n P2 has dimension n - 2. 1.72 1. In M^ find (i) two 2-planes through the origin that meet only at the origin, (ii) two 2-planes through the origin that meet along a straight line. 1.73 1 . In E^ write the 2 x 2 matrix associated with the counterclockwise rotations of angles 7r/2, TT, 37r/2, and, in general, 6 eR.
38
1. Vectors, Matrices and Linear Systems
1.74 %, Write the matrix associated with the axial symmetry in R^ and to plane symmetries. 1.75 %. Write down explicit linear systems of 3, 4, 5 equations with 4 or 5 unknowns, and use the Gauss elimination procedure to solve them. 1.76 % Let A € Mn,n(K). Show that if A B = 0 VB € Mn,n(^)^
1.77 1 . Let A = ( ^
~^
^ I and B = I ^
~^
\0
-1
3J
\b
2
then A = 0.
^ | . Compute A -f B , \ / 2 A + B .
3J
1.78 If. Let
(
3
3
2
3^
2
2
0
2
-1
0
1
V
(
0
(2 3 B = 1
V2
-A 2 -1
5/
Compute A B , B B ^ , B ^ B . 1.79 K. Let 0 0 0
0\ 1 0 0 0/
Show that A B = 0. 1.80 f. Let A , B € Mn,n{K).
Show that if A B = 0 and A is invertible, then B = 0.
1.81 % Let
Show that o A 2 = B 2 = C2 o AB = - B A = o BC = - C B = o CA = - A C =
3=-Id, C, Id, B.
1.82 f. Let A , B e Mn,n- We say that A is symmetric if A = A ^ . Show that, if A is symmetric, then A B is symmetric if and only if A and B commute, i.e., A B = B A . 1.83 ^ . Let M G Mn,n{^) be an upper triangular matrix with all entries in the principal diagonal equal to 1. Suppose that for some k we have M'^ = M M • • • M = IdnShow that M = Idn1.84 %. Let A , B € Mn,n(K). In general A B ^ B A . The n x n matrix [A,B] := A B — B A is called the comm,utator or the Lie bracket of A and B . Show that (i) [A,B1 = - [ B , A ] , (ii)
(JACOBI'S IDENTITY) [[A,B], C] + [[B, C], A] -f [[C, A ] , B ]
=
0,
(iii) the trace of [A, B] is zero. The trace of a n x n matrix A = [aj] is defined as
1.5 Exercises
39
1.85 ^. Let A € Mn,n be diagonal. Show that B is diagonal if and only if [A, B] = 0. 1.86 ^ Block matrices. Write a, n x n matrix as
A=(^l
^2
where A J is the submatrix of the first k rows and h columns, A2 is the submatrix of the first k rows and n — h columns, etc. Show that A\
A A /B}
B A _/A}B}+A1B?
AjBi+AiBi
A?
A^MB?
B2/
AfBi+A^B:
lAfBj+A^B?
1.87 1. Let A G Mfc,fc(K), B G Mn,n(K) and A 0
C =
0 B
Compute det C. 1.88 1. Let A G Mfc,fe(K), B G Mn,n(K), C G Mfc,n(K) and A 0
M =
C B
Compute det M. 1.89 ^ Vandermonde determinant. Let Ai, A2, /I
A:=
. , An G
and
l\
1
1
Ai
A2
A3
An
^1 Aj
^2 A2
^3 A3
A3
VA? AJ AJ ... A^y Prove that det A = ni 1, and all the linear subspaces of K^ are vector spaces over K. Also, the space of m x n matrices with entries in K, Mm,n{^), is a vector space over K, with the two operations of sum of matrices and multiplication of a matrix by a scalar, see Section 1.2. 2.3 E x a m p l e . Let X be any set. Then the class T{X, K) of all functions ip : X —^ K is a vector space with the two operations of sum and multiplication by scalars defined by {if + ip)ix) := ip{x) -f 7p(x), (A(/?)(a:) := X(p{x) Va: G X. Several subclasses of functions are vector spaces, actually linear subspaces of J^(X,K). For instance, o the set C°([0,1],]R) of all continuous functions cp : [0,1] —>• R, the set of kdiflFerentiable functions from [0,1] into R, the set C'^([0,1],R) of all functions with continuous derivatives up to the order k, the set C"^([0,1],R) of infinitely differentiable functions, o the set of polynomials of degree less than /c, the set of all polynomials, o the set of all complex trigonometric polynomials, o the set of Riemann summable functions in ]0,1[, o the set of all sequences with values in K.
We now begin the study of properties that depend only on the linear structure of a vector space, independently of specific examples.

b. Subspaces, linear combinations and bases
2.4 Definition. A subset W of a vector space X is called a linear subspace, or shortly a subspace of X, if
(i) 0 ∈ W,
(ii) ∀ x, y ∈ W we have x + y ∈ W,
(iii) ∀ x ∈ W and ∀ λ ∈ K we have λx ∈ W.
Obviously the element 0 is the zero element of X and the operations of sum and multiplication by scalars are those of X. In a vector space we may consider the finite linear combinations of elements of X with coefficients in K, i.e.,
∑_{i=1}^n λ^i v_i ∈ X
where λ¹, λ², ..., λⁿ ∈ K and v₁, v₂, ..., vₙ ∈ X. Notice that we have indexed both the vectors and the relative coefficients, and we use the standard notation on the indices: a list of vectors has lower indices and a list of coefficients has upper indices. It is readily seen that a subset W ⊂ X is a subspace of X if and only if all finite linear combinations of elements of W with coefficients in K belong to W. Moreover, given a set S ⊂ X, the family of all finite linear combinations of elements of S is a subspace of X called the span of S and denoted by Span S. We say that a finite number of vectors v₁, ..., vₙ are linearly dependent if there are scalars λ¹, ..., λⁿ, not all zero, such that ∑_{i=1}^n λ^i v_i = 0,
Figure 2.1. Arthur Cayley (1821-1895) and the Lectures on Quaternions by William R. Hamilton (1805-1865).
or, in other words, if one vector is a linear combination of the others. If n vectors are not linearly dependent, we say that they are linearly independent. More generally, we say that a set S of vectors is a set of linearly independent vectors whenever any finite list of elements of S is made of linearly independent vectors. Of course linearly independent vectors are distinct and nonzero.
2.5 Definition. Let X be a vector space. A set S of linearly independent vectors such that Span S = X is called a basis of X. A set A ⊂ X is a maximal independent set of X if A is a set of linearly independent vectors and, whenever we add to it a vector w ∈ X \ A, A ∪ {w} is not a set of linearly independent vectors.
Thus a basis of X is a subset S ⊂ X such that
(i) every x ∈ X is a finite linear combination of some elements of S. Equivalently, for every x ∈ X there is a map λ : S → K such that x = ∑_{v∈S} λ(v) v, and λ(v) = 0 except for a finite number of elements (depending on x) of S,
(ii) each finite subset of S is a set of linearly independent vectors.
It is easy to prove that for every x ∈ X the representation x = ∑_{v∈S} λ(v) v is unique if S is a basis of X. Using the same proof as in Proposition 1.5 we then infer
2.6 Proposition. Let X be a vector space over K. Then S ⊂ X is a basis of X if and only if S is a maximal independent set.
Using Zorn's lemma, see [GM2], one can also show the following.
2.7 Theorem. Every vector space X has a basis. Moreover, two bases have the same cardinality.
2.8 Definition. A vector space X is finite dimensional if X has a finite basis.
In the most interesting infinite-dimensional vector spaces, one can show that a basis has nondenumerable cardinality. Later, we shall see that the introduction of the notion of limit, i.e., of a new structure on X, improves the way of describing vectors. Instead of trying to see every x ∈ X as a finite linear combination of elements of a nondenumerable basis, it is better to approximate it by a suitable sequence of finite linear combinations of a suitable countable set. For finite-dimensional vector spaces, Theorem 2.7 can be proved more directly, as we shall see later.
2.9 ¶. Show that the space of all polynomials and C⁰([0,1], ℝ) are infinite-dimensional vector spaces.
c. Linear maps
2.10 Definition. Let X and Y be two vector spaces over K. A map φ : X → Y is called K-linear, or linear for short, if
φ(x + y) = φ(x) + φ(y)   and   φ(λx) = λφ(x)
for any x, y ∈ X and λ ∈ K. A linear map that is injective and surjective is called a (linear) isomorphism.
Of course, if φ : X → Y is linear, we have φ(0) = 0 and, arguing by induction, we get the following.
2.11 Proposition. Let φ : X → Y be linear. Then
φ(∑_{i=1}^k λ^i e_i) = ∑_{i=1}^k λ^i φ(e_i)
for any λ¹, λ², ..., λ^k ∈ K and e₁, e₂, ..., e_k ∈ X. In particular, a linear map is fixed by the values it takes on a basis.
The space of linear maps φ : X → Y between two vector spaces X and Y, denoted by L(X, Y), is a vector space over K with the operations of sum and multiplication by scalars defined in terms of the operations on Y by (φ + ψ)(x) := φ(x) + ψ(x), (λφ)(x) := λφ(x) for all φ, ψ ∈ L(X, Y) and λ ∈ K. Notice also that the composition of linear maps is again a linear map, and that, if φ : X → Y is an isomorphism, then the inverse map φ⁻¹ : Y → X is also an isomorphism. It is easy to check the following.
2.12 Proposition. Let φ : X → Y be a linear map.
(i) If S ⊂ X spans W ⊂ X, then φ(S) spans φ(W).
(ii) If e₁, e₂, ..., eₙ are linearly dependent in X, then φ(e₁), ..., φ(eₙ) are linearly dependent in Y.
(iii) φ is injective if and only if any list (e₁, e₂, ..., eₙ) of linearly independent vectors in X is mapped into a list (φ(e₁), φ(e₂), ..., φ(eₙ)) of linearly independent vectors in Y.
(iv) The following claims are equivalent:
a) φ is an isomorphism,
b) S ⊂ X is a basis of X if and only if φ(S) is a basis of Y.
2.13 ¶. Show that the following maps are linear:
(i) the derivation map D : C¹([0,1]) → C⁰([0,1]) that maps a C¹-function into its derivative, f ↦ f′,
(ii) the map that associates to every function of class C⁰([0,1]) its integral over [0,1],
f ↦ ∫₀¹ f(t) dt,
(iii) the primitive map C⁰([0,1]) → C¹([0,1]) that associates to every continuous function f the primitive function
F(x) := ∫₀ˣ f(t) dt.
2.14 Definition. Let φ : X → Y be a linear map. The kernel of φ and the image of φ are respectively
ker φ := {x ∈ X | φ(x) = 0},   Im φ := {y ∈ Y | y = φ(x) for some x ∈ X}.

Given bases (e₁, e₂, ..., eₙ) of X and (f₁, f₂, ..., f_m) of Y, a basis of L(X, Y) is given by the mn maps {φ^i_j}, j = 1, ..., n, i = 1, ..., m, defined in terms of the bases as
φ^i_j(e_k) := f_i  if k = j,   φ^i_j(e_k) := 0  if k ≠ j,   k = 1, ..., n.
The matrix associated to φ^i_j is the m × n matrix with all entries 0 except for the entry (i, j), where we have 1.
Of course the matrix associated to ℓ depends on the coordinate systems we use on X and Y. When we want to emphasize such a dependence, we write M^F_E(ℓ) to denote the matrix associated to ℓ : X → Y using the coordinate system E on the source space and F on the target space. The composition of linear maps corresponds, at the level of coordinates, to the product rows by columns of the corresponding matrices. More precisely, we have the following.
2.24 Proposition. Let φ : X → Y and ψ : Y → Z be two linear maps, and let E : X → Kⁿ, F : Y → Kᵐ and G : Z → Kᵖ be three systems of coordinates on X, Y and Z. Then
M^G_E(ψ ∘ φ) = M^G_F(ψ) M^F_E(φ).

Let φ : X → Y be a linear map and y ∈ Y, and consider the linear equation
φ(x) = y.    (2.3)
The equation φ(x) = 0 is called the associated homogeneous equation to (2.3). Of course, we have
(i) the set of all solutions of the associated homogeneous equation φ(x) = 0 is ker φ,
(ii) (2.3) is solvable if and only if y ∈ Im φ,
(iii) (2.3) has at most one solution if ker φ = {0},
(iv) if φ(x₀) = y, then the set of all solutions of (2.3) is {x ∈ X | x − x₀ ∈ ker φ} (see the numerical sketch below).
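A minimal numpy illustration of claim (iv): every solution of a solvable linear system is a particular solution plus an element of the kernel. The matrix, data and tolerance below are our own choices.

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])          # rank 1, so ker A has dimension 2
y = np.array([6., 12.])               # y lies in the image of A

x0, *_ = np.linalg.lstsq(A, y, rcond=None)   # one particular solution
_, s, Vt = np.linalg.svd(A)
null_basis = Vt[np.sum(s > 1e-12):]          # rows spanning ker A

x = x0 + 1.5 * null_basis[0] - 2.0 * null_basis[1]   # another solution
print(np.allclose(A @ x0, y), np.allclose(A @ x, y))  # True True
```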
Taking into account the rank formula, we infer the following.
2.25 Corollary. Let X, Y be finite dimensional of dimension n and m respectively, and let φ : X → Y be a linear map. Then
(i) if m < n, then φ is not injective; in fact dim ker φ ≥ n − m > 0,
(ii) if m > n, then φ is injective if and only if Rank φ = n,
(iii) if n = m, then φ is injective if and only if φ is surjective.
The claim (iii) of Corollary 2.25 is one of the forms of Fredholm's alternative theorem: either φ(x) = y is solvable for every y ∈ Y or φ(x) = 0 has a nonzero solution.
2.26 Example. A second order linear equation
a y″ + b y′ + c y = f,   a, b, c ∈ ℝ,  f ∈ C⁰(ℝ),    (2.4)
can be seen as an abstract linear equation φ(y) = f by introducing the linear map φ : C²(ℝ) → C⁰(ℝ), y ↦ a y″ + b y′ + c y.

(i) The map e^i : X → K maps x ∈ X to the ith coordinate of x, so that
x = ∑_{i=1}^n e^i(x) e_i   ∀x ∈ X,
(ii) (e¹, e², ..., eⁿ) is a basis of X*,
(iii) if x = ∑_{i=1}^n x^i e_i ∈ X and ℓ = ∑_{i=1}^n l_i e^i ∈ X*, then ℓ(x) = ∑_{i=1}^n l_i x^i.
Proof. (i) If x = ∑_{i=1}^n x^i e_i, then e^j(x) = ∑_{i=1}^n x^i e^j(e_i) = x^j.
(ii) Let ℓ := ∑_{j=1}^n l_j e^j. Then ℓ(e_i) = l_i ∀i. Thus, if ℓ(x) = 0 ∀x, we trivially have l_i = 0 ∀i.
(iii) In fact,
ℓ(x) = (∑_{j=1}^n l_j e^j)(∑_{i=1}^n x^i e_i) = ∑_{i=1}^n ∑_{j=1}^n l_j x^i e^j(e_i) = ∑_{i=1}^n ∑_{j=1}^n l_j x^i δ^j_i = ∑_{i=1}^n l_i x^i. □

The system of linear maps (e¹, e², ..., eⁿ) characterized by (2.7) is called the dual basis of (e₁, e₂, ..., eₙ) in X*.
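In coordinates, if the basis vectors e₁, ..., eₙ are the columns of a matrix E, then the dual basis covectors e¹, ..., eⁿ are the rows of E⁻¹, since (E⁻¹E)ᵢⱼ = δᵢⱼ. A quick numpy check (our own illustration, not from the text):

```python
import numpy as np

E = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [0., 0., 1.]])          # basis vectors as columns
dual = np.linalg.inv(E)               # row i is the covector e^i

print(np.allclose(dual @ E, np.eye(3)))   # e^i(e_j) = delta_ij -> True

x = 2.0 * E[:, 0] - 1.0 * E[:, 1] + 3.0 * E[:, 2]
print(dual @ x)                           # recovers the coordinates [2, -1, 3]
```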
Figure 2.6. Giuseppe Peano (1858-1932) and the frontispiece of his Calcolo Geometrico.
2.31 Remark. Coordinates of vectors or covectors of a vector space X of dimension n are both n-tuples. However, to distinguish them, it is useful to index the coordinates of covectors with lower indices. We can reinforce this notation even more by writing the coordinates of vectors as column vectors and coordinates of covectors as row vectors.
k. The bidual space
Of course, we may also consider the space of linear forms on X*, denoted by X** and called the bidual of X. Every v ∈ X identifies a linear form on X*, i.e., an element of L(X*, K), by defining v** : X* → K by v**(ℓ) := ℓ(v). The map γ : X → X**, x ↦ γ(x) := x**, we have just defined is linear and injective. Since dim X** = dim X* = dim X, γ is surjective, hence an isomorphism, and we call it natural since it does not depend on other structures on X, as does the choice of a basis.
Since X** and X are naturally isomorphic, there is a "symmetry" between the two spaces X and X*. To emphasize this symmetry, it is usual to write < φ, x > instead of φ(x) if φ ∈ X* and x ∈ X, introducing the evaluation map
< , > : X* × X → K,   < φ, x > := φ(x).
2.32 ¶. Let X be a vector space and let X* be its dual. A duality between X and X* is a map < , > : X* × X → K that is
(i) linear in each factor,
< φ, αx + βy > = α < φ, x > + β < φ, y >,
< αφ + βψ, x > = α < φ, x > + β < ψ, x >,
for all α, β ∈ K, x, y ∈ X and φ, ψ ∈ X*,
(ii) nondegenerate, i.e., if < φ, x > = 0 ∀x, then φ = 0, and if < φ, x > = 0 ∀φ, then x = 0.
Show that the evaluation map < φ, x > := φ(x), which evaluates a linear map φ : X → K at x ∈ X, is a duality.
l. Adjoint or dual maps
Let X, Y be vector spaces, X* and Y* their duals and < , >_X and < , >_Y the evaluation maps on X* × X and Y* × Y. For every linear map ℓ : X → Y, one also has a map ℓ* : Y* → X* defined by
< ℓ*(y*), x >_X := < y*, ℓ(x) >_Y   ∀x ∈ X, ∀y* ∈ Y*.
It turns out that ℓ* is linear. Now if (e₁, e₂, ..., eₙ) and (f₁, ..., f_m) are bases in X and Y respectively, and (e¹, e², ..., eⁿ) and (f¹, f², ..., f^m) are the dual bases in X* and Y*, then the matrices L = [L^h_i] ∈ M_{m,n}(K) and M = [M^i_h] ∈ M_{n,m}(K), associated respectively to ℓ and ℓ*, are defined by
ℓ(e_i) = ∑_{h=1}^m L^h_i f_h,   ℓ*(f^h) = ∑_{i=1}^n M^i_h e^i.
By duality, M^i_h = L^h_i, i.e., M = L^T. Therefore we conclude that if L is the matrix associated to ℓ in a given basis, then L^T is the matrix associated to ℓ* in the dual basis.
We can now discuss how coordinate changes in X reflect on the dual space. Let X be a vector space of dimension n, X* its dual space, (e₁, ..., eₙ), (ẽ₁, ẽ₂, ..., ẽₙ) two bases on X and (e¹, e², ..., eⁿ), (ẽ¹, ẽ², ..., ẽⁿ) the corresponding dual bases on X*. Let ℓ : X → X be the linear map defined by ℓ(eᵢ) := ẽᵢ ∀i = 1, ..., n. Then by duality

If P₁ and P₂ are polynomials with deg P₁ ≥ deg P₂, we may divide P₁ by P₂, i.e., uniquely decompose P₁ as P₁ = Q P₂ + R where deg R < deg P₂. This allows us to define the greatest common divisor (g.c.d.) of two polynomials, which is defined up to a scalar factor, and to compute it by Euclid's algorithm. Moreover, since complex polynomials factor with irreducible factors of degree 1, the g.c.d. of two complex polynomials is a constant polynomial if and only if the two polynomials have no common root. We also have
2.56 Lemma. Let p(t) and q(t) be two polynomials with no common zeros. Then there exist polynomials a(t) and b(t) such that
a(t) p(t) + b(t) q(t) = 1   ∀t ∈ ℂ.
We refer the reader to [GM2], but for their convenience we add the proof of Lemma 2.56.
Proof. Let V := {r(t) := a(t) p(t) + b(t) q(t) | a(t), b(t) are polynomials} and let d = α p + β q be a nonzero polynomial of minimum degree in V. We claim that d divides both p and q. Otherwise, dividing p by d we would get a nonzero remainder r := p − m d and, since p and d are in V, r = p − m d ∈ V as well, a contradiction, since r has degree strictly less than the degree of d. Then we claim that the degree of d is zero. Otherwise, d would have a root that would be common to p and q, since d divides both p and q. In conclusion, d is a nonzero constant polynomial. □
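Lemma 2.56 is effective: the polynomials a(t), b(t) can be produced by the extended Euclid algorithm mentioned above. The following numpy sketch is our own (function names and the numerical tolerance are arbitrary); it computes them for a sample pair with no common zeros and normalizes so that a p + b q = 1.

```python
import numpy as np

def poly_trim(p, tol=1e-12):
    """Drop (numerically) zero leading coefficients; highest degree first."""
    p = np.atleast_1d(np.asarray(p, dtype=float))
    i = 0
    while i < len(p) - 1 and abs(p[i]) < tol:
        i += 1
    return p[i:]

def poly_extended_euclid(p, q):
    """Return (d, a, b) with a*p + b*q = d, d a g.c.d. of p and q."""
    r0, r1 = poly_trim(p), poly_trim(q)
    a0, a1 = np.array([1.0]), np.array([0.0])
    b0, b1 = np.array([0.0]), np.array([1.0])
    while len(r1) > 1 or abs(r1[0]) > 1e-12:      # while r1 is not the zero polynomial
        quot, rem = np.polydiv(r0, r1)
        r0, r1 = r1, poly_trim(rem)
        a0, a1 = a1, poly_trim(np.polysub(a0, np.polymul(quot, a1)))
        b0, b1 = b1, poly_trim(np.polysub(b0, np.polymul(quot, b1)))
    return r0, a0, b0

p = np.array([1.0, -3.0, 2.0])   # (t-1)(t-2)
q = np.array([1.0, -3.0])        # t-3: no common zeros with p
d, a, b = poly_extended_euclid(p, q)
a, b = a / d[-1], b / d[-1]      # normalize so that a*p + b*q = 1
check = np.polyadd(np.polymul(a, p), np.polymul(b, q))
print(d, poly_trim(check))       # d is a nonzero constant, check is the polynomial 1
```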
2.57 Proposition. For every polynomial p, the kernel of p(A) is an invariant subspace for A ∈ M_{n,n}(ℂ).
Proof. Let w ∈ ker p(A). Since t p(t) = p(t) t, we infer A p(A) = p(A) A. Therefore
p(A)(Aw) = (p(A) A) w = (A p(A)) w = A p(A) w = A 0 = 0.
Hence Aw ∈ ker p(A). □
2.58 Proposition. Let p be the product of two coprime polynomials, p(t) = p₁(t) p₂(t), and let A ∈ M_{n,n}(ℂ). Then ker p(A) = ker p₁(A) ⊕ ker p₂(A).
Proof. By Lemma 2.56, there exist two polynomials a₁, a₂ such that a₁(t) p₁(t) + a₂(t) p₂(t) = 1. Hence
a₁(A) p₁(A) + a₂(A) p₂(A) = Id.    (2.9)
Set
W₁ := ker p₁(A),   W₂ := ker p₂(A),   W := ker p(A).
Now for every x ∈ W, we have a₁(A) p₁(A) x ∈ W₂ since
p₂(A) a₁(A) p₁(A) x = a₁(A) p₁(A) p₂(A) x = a₁(A) p(A) x = 0,
and, similarly, a₂(A) p₂(A) x ∈ W₁. Since, by (2.9), x = a₁(A) p₁(A) x + a₂(A) p₂(A) x, we get W = W₁ + W₂. Finally, W = W₁ ⊕ W₂. In fact, if y ∈ W₁ ∩ W₂, then by (2.9) we have
y = a₁(A) p₁(A) y + a₂(A) p₂(A) y = 0 + 0 = 0. □
c. Generalized eigenvectors and the spectral theorem
2.59 Definition. Let A ∈ M_{n,n}(ℂ), and let λ be an eigenvalue of A of multiplicity k. We call generalized eigenvectors of A relative to the eigenvalue λ the elements of W := ker(λ Id − A)^k.
Of course,
(i) eigenvectors relative to λ are generalized eigenvectors relative to λ,
(ii) the spaces of generalized eigenvectors are invariant subspaces for A.
2.60 Theorem. Let A ∈ M_{n,n}(ℂ). Let λ₁, λ₂, ..., λ_k be the eigenvalues of A with multiplicities m₁, m₂, ..., m_k and let W₁, W₂, ..., W_k be the subspaces of the relative generalized eigenvectors, W_i := ker(λ_i Id − A)^{m_i}. Then
(i) the spaces W₁, W₂, ..., W_k are supplementary; consequently there is a basis of ℂⁿ of generalized eigenvectors of A,
(ii) dim W_i = m_i.
Consequently, if we write A in a basis (e₁, e₂, ..., eₙ) where the first m₁ elements span W₁, the following m₂ elements span W₂, ..., and the last m_k elements span W_k, we obtain a matrix A' similar to A of the form
A' = \begin{pmatrix} A₁ & 0 & \dots & 0 \\ 0 & A₂ & \dots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \dots & A_k \end{pmatrix}
where, for every i = 1, ..., k, the block A_i is an m_i × m_i matrix with λ_i as its only eigenvalue, with multiplicity m_i, and, of course, (λ_i Id − A_i)^{m_i} = 0.
Proof. (i) Clearly the polynomials p₁(s) := (λ₁ − s)^{m₁}, p₂(s) := (λ₂ − s)^{m₂}, ..., p_k(s) := (λ_k − s)^{m_k} factorize p_A and are pairwise coprime. Set N_i := p_i(A) and notice that W_i = ker N_i. Repeatedly applying Proposition 2.58, we then get
ker p_A(A) = ker(N₁ N₂ ··· N_k) = ker(N₁) ⊕ ker(N₂ N₃ ··· N_k) = ··· = W₁ ⊕ W₂ ⊕ ··· ⊕ W_k.
(i) then follows from the Cayley–Hamilton theorem, ker p_A(A) = ℂⁿ.
(ii) It remains to show that dim W_i = m_i ∀i. Let (e₁, e₂, ..., eₙ) be a basis such that the first h₁ elements span W₁, the following h₂ elements span W₂, ..., and the last h_k elements span W_k. A is therefore similar to a block matrix
A' = \begin{pmatrix} A₁ & 0 & \dots & 0 \\ 0 & A₂ & \dots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \dots & A_k \end{pmatrix}
where the block A_i is a square matrix of dimension h_i := dim W_i. On the other hand, the h_i × h_i matrix (λ_i Id − A_i)^{m_i} is zero, hence all the eigenvalues of λ_i Id − A_i are zero. Therefore A_i has the unique eigenvalue λ_i with multiplicity h_i, and p_{A_i}(s) = (s − λ_i)^{h_i}. We then have
p_A(s) = p_{A'}(s) = ∏_{i=1}^k p_{A_i}(s) = ∏_{i=1}^k (s − λ_i)^{h_i},
and the uniqueness of the factorization yields h_i = m_i. The rest of the claim is trivial. □
Another proof of dim W_i = m_i goes as follows. First we show the following.
2.61 Lemma. If 0 is an eigenvalue of B ∈ M_{n,n}(ℂ) with multiplicity m, then 0 is an eigenvalue of B^m with multiplicity m.
Proof. The polynomial 1 − λ^m, λ ∈ ℂ, can be factorized as
1 − λ^m = ∏_{i=0}^{m−1} (1 − ω^i λ),   ω := e^{i 2π/m},
since the two sides have the same degree and take the same values at the m-th roots of unity and at 0. For z, t ∈ ℂ
z^m − t^m = z^m (1 − (t/z)^m) = z^m ∏_{i=0}^{m−1} (1 − ω^i t/z) = ∏_{i=0}^{m−1} (z − ω^i t),
hence
z^m Id − B^m = ∏_{i=0}^{m−1} (z Id − ω^i B).
Write p_B(s) = s^m q(s) with q(0) ≠ 0 and set q₀(z) := ∏_{i=0}^{m−1} q(ω^i z), so that q₀(0) ≠ 0. Taking determinants (up to a factor ±1, which is irrelevant here),
p_{B^m}(z^m) := det(z^m Id − B^m) = ∏_{i=0}^{m−1} p_B(ω^i z) = ∏_{i=0}^{m−1} (ω^i z)^m q(ω^i z) = z^{m²} q₀(z).    (2.10)
On the other hand, p_{B^m}(s) = s^r q₁(s) for some q₁ with q₁(0) ≠ 0 and some r ≥ 1. Comparing the order of vanishing at 0 with (2.10) yields r = m, i.e., 0 is an eigenvalue of multiplicity m for B^m. □
Another proof that dim W_i = m_i in Theorem 2.60. Since ∑_i m_i = ∑_i dim W_i = dim X, it suffices to show that dim W_i ≤ m_i ∀i. Since 0 is an eigenvalue of B := λ_i Id − A of multiplicity m := m_i, 0 is an eigenvalue of multiplicity m for B^m by Lemma 2.61. Since W_i is the eigenspace corresponding to the eigenvalue 0 of B^m, it follows from Proposition 2.43 that dim W_i ≤ m. □
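A quick numerical illustration of Theorem 2.60 (ii), dim ker((λ Id − A)^m) = m, on a sample matrix (our own check, not part of the text):

```python
import numpy as np

A = np.array([[2., 0., 0., 0.],
              [1., 2., 0., 0.],
              [0., 0., 3., 1.],
              [0., 0., 0., 3.]])      # eigenvalues 2 and 3, each of multiplicity 2
for lam, m in [(2.0, 2), (3.0, 2)]:
    N = np.linalg.matrix_power(lam * np.eye(4) - A, m)
    dim_W = 4 - np.linalg.matrix_rank(N)      # dim ker N via the rank formula
    print(lam, dim_W)                          # prints 2.0 2 and 3.0 2
```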
d. Jordan's canonical form
2.62 Definition. A matrix B ∈ M_{n,n}(K) is said to be nilpotent if there exists k > 0 such that B^k = 0.
Let B ∈ M_{q,q}(ℂ) be a nilpotent matrix and let k be such that B^k = 0, but B^{k−1} ≠ 0. Fix a basis (e₁, e₂, ..., e_s) of ker B and, for each i = 1, ..., s, set e¹_i := e_i and define e²_i, e³_i, ..., e^{k_i}_i to solve the systems B e^j_i := e^{j−1}_i for j = 2, 3, ..., as long as possible. Let {e^j_i}, j = 1, ..., k_i, i = 1, ..., s, be the family of vectors obtained this way.
2.63 Theorem (Canonical form of a nilpotent matrix). Let B be a q × q nilpotent matrix. Using the previous notation, {e^j_i} is a basis of ℂ^q. Consequently, if we write B with respect to this basis, we get a q × q matrix B' similar to B of the form
B' = \begin{pmatrix} B₁ & & 0 \\ & \ddots & \\ 0 & & B_s \end{pmatrix}    (2.11)
where each block B_i has dimension k_i and, if k_i > 1, it has the form
B_i = \begin{pmatrix} 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \\ 0 & 0 & 0 & \dots & 0 \end{pmatrix}.    (2.12)
The reduced matrix B' is called the canonical Jordan form of the nilpotent matrix B.
Proof. The kernels H_j := ker B^j of B^j, j = 1, ..., k, form a strictly increasing sequence of subspaces
{0} = H₀ ⊂ H₁ ⊂ H₂ ⊂ ··· ⊂ H_{k−1} ⊂ H_k := ℂ^q.
The claim then follows by iteratively applying the following lemma. □
= dim Hj -\- dim f Im B PI ifj + i ) =
p-\-r.
Now consider a generic matrix A ∈ M_{n,n}(ℂ). We first rewrite A using a basis of generalized eigenvectors to get a new matrix A' similar to A of the form
A' = \begin{pmatrix} A₁ & & 0 \\ & \ddots & \\ 0 & & A_k \end{pmatrix}    (2.13)
where each block A_i has the dimension of the algebraic multiplicity m_i of the eigenvalue λ_i and has λ_i as its unique eigenvalue. Moreover, the matrix C_i := λ_i Id − A_i is nilpotent; precisely, C_i^{m_i} = 0. Applying Theorem 2.63 to each C_i, we then show that A_i is similar to λ_i Id + B' where B' is as in (2.11). Therefore, we conclude the following.
2.65 Theorem (Jordan's canonical form). Let λ₁, λ₂, ..., λ_k be all the distinct eigenvalues of A ∈ M_{n,n}(ℂ). For every i = 1, ..., k
(i) let (u_{i,1}, ..., u_{i,p_i}) be a basis of the eigenspace V_{λ_i} (as we know, p_i ≤ m_i),
(ii) consider the generalized eigenvectors relative to λ_i defined as follows: for any j = 1, 2, ..., p_i,
a) set e¹_{i,j} := u_{i,j},
b) set e^α_{i,j} to be a solution of
(A − λ_i Id) e^α_{i,j} = e^{α−1}_{i,j}    (2.14)
as long as the system (2.14) is solvable,
c) denote by α(i,j) the number of solved systems plus 1.
Then for every i = 1, ..., k the list (e^α_{i,j}) with j = 1, ..., p_i and α = 1, ..., α(i,j) is a basis for the generalized eigenspace W_i relative to λ_i. Hence the full list
(e^α_{i,j}),   i = 1, ..., k,  j = 1, ..., p_i,  α = 1, ..., α(i,j),    (2.15)
is a basis of ℂⁿ. By the definition of the {e^α_{i,j}}, if we set
S := [ e¹_{1,1} | e²_{1,1} | ··· | e¹_{1,2} | e²_{1,2} | ··· | e¹_{2,1} | e²_{2,1} | ··· ],
the matrix J := S⁻¹ A S, that represents x ↦ A x in the basis (2.15), has the form
J = \begin{pmatrix} J_{1,1} & 0 & \dots & & 0 \\ 0 & J_{1,2} & & & \\ \vdots & & \ddots & & \vdots \\ 0 & & \dots & & J_{k,p_k} \end{pmatrix}
where, for i = 1, ..., k and j = 1, ..., p_i, the block J_{i,j} has dimension α(i,j) and
J_{i,j} = ( λ_i )   if dim J_{i,j} = 1,
J_{i,j} = \begin{pmatrix} λ_i & 1 & 0 & \dots & 0 \\ 0 & λ_i & 1 & \dots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \dots & λ_i & 1 \\ 0 & 0 & \dots & 0 & λ_i \end{pmatrix}   otherwise.
A basis with the properties of the basis in (2.15) is called a Jordan basis, and the matrix J that represents A in a Jordan basis is called a canonical Jordan form of A.
2.66 Example. Find a canonical Jordan form of
A = \begin{pmatrix} 2 & 0 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 & 0 \\ 0 & 1 & 2 & 0 & 0 \\ 0 & 0 & 1 & 3 & 0 \\ 1 & 0 & 0 & 1 & 3 \end{pmatrix}.
A is lower triangular, hence the eigenvalues of A are 2 with multiplicity 3 and 3 with multiplicity 2. We then have
A − 2Id = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 & 1 \end{pmatrix}.
A − 2Id has rank 4 since its columns of indices 1, 2, 3 and 5 are linearly independent. Therefore the eigenspace V₂ has dimension 5 − 4 = 1 by the rank formula. We now compute a nonzero eigenvector,
(A − 2Id)(x, y, z, t, u)^T = (0, x, y, z + t, x + t + u)^T = (0, 0, 0, 0, 0)^T.
For instance, one eigenvector is s₁ := (0, 0, 1, −1, 1)^T. We now compute the Jordan basis relative to this eigenvalue. We have e¹_{1,1} = s₁ and it is possible to solve
(A − 2Id)(x, y, z, t, u)^T = (0, x, y, z + t, x + t + u)^T = s₁ = (0, 0, 1, −1, 1)^T;
for instance, s₂ := e²_{1,1} = (0, 1, 0, −1, 2)^T is a solution. Hence we compute a solution of
(A − 2Id)(x, y, z, t, u)^T = (0, x, y, z + t, x + t + u)^T = s₂ = (0, 1, 0, −1, 2)^T;
hence s₃ := e³_{1,1} = (1, 0, 0, −1, 2)^T. Looking now at the other eigenvalue,
A − 3Id = \begin{pmatrix} −1 & 0 & 0 & 0 & 0 \\ 1 & −1 & 0 & 0 & 0 \\ 0 & 1 & −1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \end{pmatrix}.
A − 3Id has rank 4 since its columns of indices 1, 2, 3 and 4 are linearly independent. Thus, by the rank formula, the eigenspace relative to the eigenvalue 3 has dimension 1. We now compute an eigenvector with eigenvalue 3. We need to solve
(A − 3Id)(x, y, z, t, u)^T = (−x, x − y, y − z, z, x + t)^T = (0, 0, 0, 0, 0)^T,
and a nonzero solution is, for instance, s₄ := (0, 0, 0, 0, 1)^T. Finally, we compute the Jordan basis relative to this eigenvalue. A solution of
(A − 3Id)(x, y, z, t, u)^T = (−x, x − y, y − z, z, x + t)^T = s₄ = (0, 0, 0, 0, 1)^T
is given by s₅ := e²_{2,1} = (0, 0, 0, 1, 0)^T. Thus, we conclude that the matrix
S = [ s₁ | s₂ | s₃ | s₄ | s₅ ] = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ −1 & −1 & −1 & 0 & 1 \\ 1 & 2 & 2 & 1 & 0 \end{pmatrix}
1 2 0 0
0 0 1 0 2 0 0 3
0\ 0 0 1
Vo 0 0 0 3/
2.2 Eigenvectors and Similar Matrices
75
e. Elementary divisors As we have seen, the characteristic polynomial det(5ld-A),
seK,
is invariant by similarity transformations. However, in general the equality of two characteristic polynomials does not imply that the two matrices be similar.
2.67 E x a m p l e . The unique eigenvalue of the matrix A ^ = I
V /i
I is AQ and has
XoJ
multiplicity 2. The corresponding eigenspace is given by the solutions of the system
{
O'x'^ +0'X^
=0,
fjLX^ + 0 • X ^ = 0 .
If /x 7^ 0, then VXO,M ^ ^ dimension 1. Notice that AQ is diagonal, while A^ is not diagonal. Moreover, AQ and A^ with fi ^ 0 are not similar.
It would be interesting to find a complete set of invariants that characterizes the class of similarity of a matrix, without going explictly into Jordan's reduction algorithm. Here we mention a few results in this direction. Let A e Mn,n(C). The determinants of the minors of order k of the matrix 5 Id — A form a subset T>k of polynomials in the s variable. Denote by Dk{s) the g.c.d. of these polynomials whose coefiicient of the maximal degree term is normalized to 1. Moreover set Do{s) := 1. Using Laplace's formula, one sees that Dk-i{s) divides Dk{s) for all k = l , . . . , n . The polynomiSfe
are called the elementary divisors of A. They form a complete set of invariants that describe the complex similarity class of A. In fact, the following holds. 2.68 Theorem. The following claims are equivalent (i) A and B are similar as complex matrices, (ii) A and B have the same Jordan's canonical form (up to permutations of rows and columns), (iii) A and B have the same elementary divisors.
76
2. Vector Spxaces and Linear Maps
2.3 Exercises 2.69 f. Write a few 3 x 3 real matrices and interpret them as linear maps from M^ into E^. For each of these linear maps, choose a new basis of R^ and write the associate matrix with respect to the new basis both in the source and the target R^. 2.70 %. Let Vi, V 2 , . . . , Vri be finite-dimensional vector spaces, and let / o , / i , - • • ? / n be linear maps such that {0}:^yiAy2^
. . . ^ ^ ' V n _ / ^ ' Vn ^ {0}.
Show that, if I m ( / i ) = ker(/i+i) Vi = 0 , . . . , n - 1, then E ? = i ( - 1 ) ' d i m Vi = 0 . 2.71 f. Consider R as a vector space over Q. Show that 1 and ( are linearly independent if and only if ^ is irrational, ? ^ Q. Give reasons to support that R as a vector space over Q is not finite dimensional. 2.72 ^ L a g r a n g e m u l t i p l i e r s . Let X, Y and Z be three vector spaces over K and let f : X ^>'Y, g : X —^ Z he two linear maps. Show that ker p C ker / if and only if there exists a linear map £ : Z -^ Y such that / := io g. 2.73 f.
Show that the matrices
;:)• 0°)' have the same eigenvalues but are not similar. 2.74 ^ . Let Ai, A 2 , . . . , An be the eigenvalues of A € Mn,n(C), possibly repeated with their multiplicities. Show that tr A = Ai + • • -f An and det A = Ai • A2 • • • An. 2.75 %. Show that p{s) = s'^ -\- an-is^~^ the n X n matrix /
H
0 0
1 0
0 1
-ao
—ai
—a2
2.76 %, Let A G Mfc,fc(K), B € Mn,n{^), polynomial of the matrix
h ao is the characteristic polynomial of ... ...
0 0
\
-fln-i
/
C 6 Mk,n{^)-
VO
Compute the characteristic
B
2.77 % L e t ^ r C ^ -^ C^ be defined by ^(ei) := ei^i if i = 1 , . . . , n - l and ^(en) = e i , where e i , e2, • . . , en is the canonical basis of C^. Show that the associated matrix L is diagonizable and that the eigenvalues are all distinct. [Hint: Compute t h e characteristic polynomial.]
2.3 Exercises
2.78 %. Let A 6 Mn,n{^)
77
and suppose A^ = Id. Show that A is similar to
/T
for some k, 1 < k < n. [Hint: Consider the subspeices V+ := {a^ I A x = x } and V- := {x I A x = - x } and show that V+ 0 y_ = R^. ] 2.79 i[. Let A, B G Mn,nW be two matrices such that A^ = B ^ = Id and tr A = tr B . Show that A and B are similar. [Hint: Use Exercise 2.78.] 2.80 f. Show that the diagonizable matrices span Mn,n{^)- [Hint: Consider the matrices Mij = diag (1, 2 , . . . , n ) + 'Eij where Eij has value 1 at entry (i, j ) and value zero otherwise.] 2.81 %. Let A , B e Mn,n(^) and let B be symmetric. Show that the polynomial t -^ det(A -I- t B ) has degree less than R a n k B . 2.82 %, Show that any linear operator A : W^ dimension 1 or 2.
V^ has an invariant subspace of
2.83 f F i t t i n g d e c o m p o s i t i o n . Let / : X —>• X be a linear operator of a finitedimensional vector space and set f^ := / o • • • o / /c-times. Show that there exists k, 1 < k 0 Vx G R"' and x^Gx = 0 if and only if X — 0, in particular, G is invertible. b. Hermit ian spaces A similar structure exists on complex vector spaces. 3.9 Definition. Let X be a vector space over C. A Hermitian product on X is a map {\):XxX-^C which is (i) (SESQUILINEAR),
i.e.,
{av -h (iw\z) = a{v\z) + l3{w\z), {v\aw -f /3z) = a{v\w) -h l3{v\z) (ii) (iii)
(HERMITIAN) {Z\UJ)
yzex.
= (wlz) "iw.z £ X, in particular {z\z) G \
(POSITIVE DEFINITE) (Z\Z)
> 0 and {z\z) = 0 if and only if z = 0.
The nonnegative real number \z\ :=^ y^{z\z) is called the norm of z E X. 3.10 Definition. A finite-dimensional complex space with a Hermitian product is called a Hermitian space.
3.1 The Geometry of Euclidean and Hermitian Spaces
83
3.11 E x a m p l e . Of course the product (z,w) -^ {z\w) := wz is a Hermitian product on C. More generally, the map ( | ) : C^ x C " —^ C defined by n
{z\w) := z » w := ^z^w^ J=i
Mz = (z^, z^,...,
^ " ) , w = {w'^, w'^,...,
w'^)
is a Hermitian product on C^, called the standard Hermitian product of C^. As we shall see later, see Proposition 3.25, C^ equipped with the standard Hermitian product is in a sense the only Hermitian space of dimension n.
Let X be a complex vector space with a Hermitian product ( | ). Prom the properties of the Hermitian product we deduce \z -h w\'^ = (z -{• w\z -\-w) = (z\z -h K;) + (w\z -\-w) I
I
\
I
/
V I
/
\
I
, (3.2)
/
= {z\z) + {z\w) + {w\z) -h {w\w) = \z\'^ -h \w\'^ + 2di{z\w) from which we infer at once the following. 3.12 T h e o r e m ,
(ii)
(i) We have
(PARALLELOGRAM IDENTITY)
\z + w\'^ -\-\z-wf (iii)
( P O L A R I T Y FORMULA)
We have
= 2 (|zp + \wf)
\/z, w
eX.
We have
— iw\ 4:{z\w) =: (\z -\- w\'^ — \z — w;p 1 4- if 1^: -h iw\'^ — \z 2^ for all z^w G X. We therefore can compute the Hermitian product of z and w by computing four norms. (iv) (CAUCHY-SCHWARZ INEQUALITY) The following inequality holds \{z\w)\ < \z\ \wl
yz.weX;
moreover {z\w) = \z\ \w\ if and only if either w = 0, or z = Xw for some A G M, A > 0. Proof, (i), (ii), (iii) follow trivially from (3.2). Let us prove (iv). Let z, w E X and A = te*^, t,e eR. From (3.2) 0 < |z + \w\'^ = t'^\w\'^ + \z\^ + 2t^{e-'yz\w))
Vt G M,
hence its discriminant is nonpositive, thus me-'»{z\w))\
< \z\ \w\.
Since 0 is arbitrary, we conclude |(2;|ii;)| < \z\ \w\. The second part of the claim then follows as in the real case. If {z\w) = \z\ \w\, then the discriminant of the real polynomial t —> |2 + ttyp, t e R, vanishes. If -w; ^i^ 0, for some t G M we have \z + tw\'^ = 0, i.e., z = —tw. Finally, —t is nonnegative since —t{w\w) = {z\w) = \z\ \w\ > 0. D
,
84
3. Euclidean and Hermitian Spaces
3.13 i[. Let A" be a complex vector space with a Hermitian product and let z^w ^ X. Show that K^^lif)! = \z\ \w\ if and only if either it; = 0 or there exists A 6 C such that z = \w.
3.14 Definition. Let X be a complex vector space with a Hermitian product ( I ). Two vectors z^w e X are said to be orthogonal, and we write z 1.W, if {z\w) = 0. Prom (3.2) we immediately infer the following. 3.15 P r o p o s i t i o n ( P y t h a g o r e a n t h e o r e m ) . Let X be a complex vector space with a Hermitian product ( | ). If z^w E X are orthogonal, then
We see here a diiference between the real and the complex cases. Contrary to the real case, two complex vectors, such that |z + i(;p = |2:p + |if;p holds, need not be orthogonal. For instance, choose X := C, {z\w) := wz, and let z = 1 and w = i. 3.16 P r o p o s i t i o n . Let X be a complex vector space with a Hermitian product on it. The norm of z e X, \z\ :=
^/(z\z),
is a real-valued function \ \ : X —^R with the following properties (i) \z\ G R+ Vz G X. (ii) (NONDEGENERACY) (iii) (iv)
\Z\=0 if and only if z = 0. (1-HOMOGENEITY) \XZ\ = |A| \z\ VA G C, Vz G X. (TRIANGULAR INEQUALITY) \Z-\-W\ < \z\ + \w\ \/z,w
G X.
Proof, (i), (ii), (iii) are trivial, (iv) follows from the Cauchy-Schwarz inequality since \z + w\^ = \z\^ -f k | 2 + 2^(z\w)
< \z\^ + |i/;|2 + 2 \(z\w)\
0 yz,w e X and d{z,w) = 0 if and only ii z = w. (ii) (SYMMETRY) d{z,w) = d{w,z) ^z.w G X. (iii)
(TRIANGULAR INEQUALITY) d{z, w) < d(z, x) + d{x, w) Ww, x,z
e
X.
We refer to d as to the distance on X induced by the Hermitian product.
3.1 The Geometry of Euclidean and Hermitian Spaces
85
3.17 Hermitian products in coordinates. If X is a Hermitian space, the Gram matrix associated to the Hermitian product is defined by setting G=
[gij],
9ij :=
{ei\ej).
Using Hnearity {z\w) = Y2 {ei\ej)z'w^ = z^Gw ',3 = 1
if z = (z^, z ^ , . . . , z'^), w = (it;^, w'^,..., w'^) G C^ are the coordinate vector columns of z and w in the basis (ei, e 2 , . . . , Cn)- Notice that (i) G is a Hermitian matrix, G = G, (ii) G is positive definite, i.e., z^Gz > 0 Vz G C^ and z^Gz = 0 if and only if z = 0, in particular, G is invertible. c. Orthonormal basis and the Gram—Schmidt algorithm 3.18 Definition. Let X be a Euclidean space with scalar product { \ ) or a Hermitian vector space with Hermitian product ( | ). ^4 system of vectors {^a}aeA ^ ^ '^s called orthonormal if iea\ep) =Sap
Va,/3 G A
Orthonormal vectors are hnearly independent. In particular, n orthonormal vectors in a Euclidean or Hermitian vector space of dimension n form a basis, called an orthonormal basis. 3.19 E x a m p l e . The canonical basis ( e i , e 2 , . . . , Cn) of E"^ is an orthonormal basis for the standard inner product in E " . Similarly, the canonical basis ( e i , 6 2 , . . . , en) of C^ is an orthonormal basis for the standard Hermitian product in C"^. 3.20 %. Let ( I ) be an inner (Hermitian) product on a Euclidean (Hermitian) space X of dimension n and let G be the associated Gram matrix in a basis (ei, 6 2 , . . . , en)Show that G = Idn if and only if (ei, e2,. •., en) is orthonormal.
Starting from a denumerable system of linearly independent vectors, we can construct a new denumerable system of orthonormal vectors that span the same subspaces by means of the Gram-Schmidt algorithm, 3.21 Theorem (Gram-Schmidt). Let X be a real (complex) vector space with inner (Hermitian) product ( | ). Let t'l, t;2,..., t'jt,... be a denumerable set of linearly independent vectors in X. Then there exist a set of orthonormal vectors wi, W2,. - -, Wk,- • - such that for each fc = 1,2,... Span|it;i, W2,..., wA = Spanjt'i, t'2,.--, Vkj.
86
3. Euclidean and Hermitian Spaces
Proof. We proceed by induction. In fact, the algorithm W[ = VI,
wi := -—f-, '^p='^pYl^j=lMwj)wj p -P A^j= w Wp:=
—-
never stops since Wp ^ 0 "ip = 1,2,3,...
and produces the claimed orthonormal basis. D
3.22 Proposition (Pythagorean theorem). LetX be a real (complex) vector space with inner (Hermitian) product ( | ). Let (ei, e2, • . . , e^) be an orthonormal basis of X. Then k X =
Y^{x\ej)ej
xeX,
2=1
that is the ith coordinate of x in the basis (ei, e2,. •., Cn) is the cosine director {x\ei) of x with respect to ei. Therefore we compute k
{x\y) = Y^(x|ei) {y\ei)
if X is Euclidean^
i=l k
{x\y) = 2_\{^\^i) {y\^i)
^ / ^ ^-^ Hermitian,
i=l
so that in both cases Pythagoras's theorem holds: k
\x\' = {x\x) = Y;^\ix\ei)\-'. i=l
Proof. In fact, by linearity, for j = 1 , . . . , A; and x — ^Y^=\ ^^^i ^® have n
n
n
i=l
i=\
i=\
Similarly, using linearity and assuming X is Hermitian, we have {x\y) = (Y^x'ei
I (jZv^ej)
i=l n
= f^ i,3 = ^
j=\ k
hence, by the first part, n
{x\y) =
^{x\ei){y\ei).
x*^(e,|e^)
3.1 The Geometry of Euclidean and Hermitian Spaces
87
d. Isometries 3.23 Definition. Let X^Y he two real (complex) vector spaces with inner (Hermitian) products ( | )x and ( | )y. We say that a linear map A : X -^ Y is an isometry if and only if \A{X)\Y
= \x\x
Vx e
X,
or, equivalently, compare the polar formula, if {A{x)\A{y))Y
= {x\y)x
^x,yeX.
Isometries are trivially injective, but not surjective. If there exists a surjective isometry between two Euclidean (Hermitian) spaces, then X and Y are said to be isometric. 3.24 %. Let X,Y be two real (complex) vector spaces with inner (Hermitian) products { \ )x and { \ )Y and let A : X —>• V be a linear map. Show that the following claims are equivalent (i) A is an isometry, (ii) B C X is am orthonormal basis if and only if A{B) is an orthonormal basis for A{X).
Let X be a real vector space with inner product ( | ) or a complex vector space with Hermitian product ( | ). Let (ei, e 2 , . . . , e^) be a basis in X and f : X -^ K^, (K = R of K = C) be the corresponding system of coordinates. Proposition 3.22 implies that the following claims are equivalent. (i) (ei, 6 2 , . . . , Cn) is an orthonormal basis, (ii) £{x) = ((x|ei),...,(x|en)), (iii) £ is an isometry between X and the Euclidean space W^ with the standard scalar product (or C^ with the standard Hermitian product). In this way, the Gram-Schmidt algorithm yields the following. 3.25 Proposition. Let X be a real vector space with inner product ( | ) (or a complex vector space with Hermitian product { \ )) of dimension n. Then X is isometric to R^ with the standard scalar product (respectively, to C^ with the standard Hermitian product), the isometry being the coordinate system associated to an orthonormal basis. In other words, using an orthonormal basis on X is the same as identifying X with R"^ (or with C") with the canonical inner (Hermitian) product. 3.26 I s o m e t r i e s in c o o r d i n a t e s . Let us compute the matrix associated to an isometry R : X -^ Y between two Euclidean spaces of dimension n and m respectively, in an orthonormal basis (so that X and Y are respectively isometric to R^ (C^) and W^ ( C ^ ) by means of the associated coordinate system). It is therefore sufficient to discuss real isometries i? : E^ -^ E"^ and complex isometries RiC^ -^C^. Let i? : E^ —)• E ^ be linear and let R € Mm,nW be the associated matrix, il(x) = R x , X G E"^. Denoting by ( e i , e 2 , . . •, en) the canonical basis of E " ,
3. Euclidean and Hermitian Spaces
R =
ri
r2
...
Fn ,
Ti = R e j Vi.
Since (ei, e 2 , . . . , Cn) is orthonormal, R is an isometry if and only if ( r i , rg, • •., Tn) are orthonormal. In particular, m > n and Tj Ti = Ti* Tj = 5ij
i.e., the matrix R is an orthogonal matrix^ R ^ R = Idn. When m — n, the isometries i l : R'^ —» R'^ are necessarily surjective being injective, and form a group under composition. As above, we deduce that the group of isometries of R'^ is isomorphic to the orthogonal group 0(n) defined by 0{n) := | R e Mn,n(R) | R ^ R = I d n } . Observe that a square orthogonal matrix R is invertible with R~^ = R-^. If follows that R R ^ = Id and | det R | = 1. Similarly, consider C^ as a Hermit ian space with the standard Hermitian product. Let R:C -^C^ he linear and let R € Mm,n(C) be such that R{z) = R z . Denoting by ( e i , e 2 , . . . , Cn) the canonical basis of R^, R =
r i r2
...
Fn ,
Ti = R e i Vi = 1 , . . . , m.
Since ( e i , e 2 , . . . , en) is orthonormal, R is an isometry if and only if r i , r 2 , . . •, rn are orthonormal. In particular, m > n and
i.e., the matrix R is a unitary
matrix, R^R=
Idn.
When 171 = 71, the isometries R : C^ -^ C^ are necessarily surjective being injective, moreover they form a group under composition. From the above, we deduce that the group of isometries of C^ is isomorphic to the unitary group U(n) defined by U{n) := | R € Mn,n(C) | R ^ R = I d n } . Observe that a square unitary matrix R is invertible with R R R ^ = Id and | det R | = 1.
^= R
. I t follows that
e. The projection theorem Let X be a real (complex) vector space with inner (Hermitian) product ( I ) that is not necessarily finite dimensional, let F C X be a finitedimensional linear subspace of X of dimension k and let (ei, 6 2 , . . . , e^) be an orthonormal basis of V. We say that x G X is orthogonal to V if {x\v) = 0 Vz; G 1^. As (ei, e 2 , . . . , ek) is a basis of V, x 1.V if andonly if (x|ei) = 0 Vi = 1 , . . . ,fc. For all a; G X, the vector Py{x) :=^{x\ei)ei i=i
eV
3.1 The Geometry of Euclidean and Hermitian Spaces
89
is called the orthogonal projection of x in F , and the map Py : X -^ V, X —> Pv{x), the projection map onto V. By Proposition 3.22, Py(x) = x if X G F , hence ImP = V and P^ = P. By Proposition 3.22 we also have |Py(x)p = Zli=i l(^ki)P- The next theorem explains the name for Pv{x) and shows that in fact Pv{x) is well defined as it does not depend on the chosen basis (ei, e2, • •., e/c). 3.27 Theorem (of orthogonal projection). With the previous notation, there exists a unique z G V such that x — z is orthogonal to V, i.e., {x — z\v) = 0 \fv e V. Moreover, the following claims are equivalent. (i) X — z is orthogonal to V, i.e., {x — z\v) = 0 ^v e V, (ii) z GV is the orthogonal projection of x onto V, z = Pv{x), (iii) z is the point in V of minimum distance from x, i.e., \x — z\ < \x — v\
Mv
GV^
V ^ z.
In particular, Pv{x) is well defined as it does not depend on the chosen orthonormal basis and there is a unique minimizer of the function v -^ \x — v\, V e V, the vector z = Pv{x). Proof. We first prove uniqueness. U zi,Z2 £V are such that (x — Zi\v) = 0 ioT i = I, 2, then {zi — Z2\v) = 0 Vt; G V, in particular \zi — Z2\'^ = 0. (i) => (ii). From (i) we have {x\ei) = {z\ei) Vi = 1 , . . . , fc. By Proposition 3.22 k
k
z = Y^{z\ei)ei
= '^{x\ei)ei
=
Pv{x).
1=1
i=i
This also shows existence of a point z such that x — z is orthogonal to V and that the definition of Pv{x) is independent of the chosen orthonormal basis (ei, 6 2 , . . . , e^). (ii) =^ (i). Ii z = Py{x),
we have for every j = 1,...
,k
k
{x - z\ej) = {x\ej) - '^{x\ei)(ei\ej)
= {x\ej) - {x\ej) = 0,
i=l
hence (x — z\v) = 0 Vi'. (i) => (iii). Let v EV.
Since {x — z\v) = 0 v/e have
\x-v\'^
= \x-
z-\- z-vl"^
= \x-
z\'^ -\-\z-
i;p,
hence (iii). (iii) =^ (i). Let v e V. The function t ^y \x — z + t v p , t G M, has a minimum point at t = 0. Since \x- z + tvl"^ = \x - z\'^ + 2t^{x - z\v) -\- t'^\v\'^, necessarily 3f?(x — z\v) = 0. If X is a real vector space, this means {x — z\v) = 0, hence (i). If X is a complex vector space, from R{x — z\v) = 0 \/v e V, we also have 3f?(e-^^(x - z\v)) = 0 V(9 € M Vv G V, hence {x - z\v) = 0 Vi; € V and thus (ii). D
We can discuss linear independence in terms of an orthogonal projection. In fact, for any finite-dimensional space V C X, x e V ii and only if X — PY{X) = 0, equivalently, the equation x — Pv{x) = 0 is an implicit equation that characterizes V as the kernel of Id- Py.
90
3. Euclidean and Hermitian Spaces
3.28 %, Let W = Span { v i , V 2 , . . . , v^} be a subspace of K^. Describe a procedure that uses the orthogonal projection theorem to find a basis of W. 3.29 %, Given A G Mm,n(^), describe a procedure that uses the orthogonal projection theorem in order to select a maximal system of independent rows and columns of A. 3.30 %. Let A G Mm^nC^)- Describe a procedure to find a basis of ker A. 3.31 ^. Given k linear independent vectors, choose among the vectors (ei, e2, • . . , en) of R"" (n — k) vectors that together with v i , V 2 , . . . , v^ form a basis of R"^. 3.32 P r o j e c t i o n s in c o o r d i n a t e s . Let X be a Euclidean (Hermitian) space of dimension n and let F C X be a subspace of dimension k. Let us compute the matrix associated to the orthogonal projection operator Py : X -^ X in an orthonormal basis. Of course, it suffices to think of Py as of the orthogonal projection on a subspace of R^ (C^ in the complex case). Let (ei, e 2 , . . . , en) be the canonical basis of R^ and V C R^. Let v i , V 2 , . . . , v^ be an orthonormal basis of V and denote by V = [vM the n x k nonsingular matrix
V : = ^vi I V2 I ... I VfcJ so that Vj = Z ^ i L i ^ j e j - Let P be the n x n matrix associated to the orthogonal projection onto V, Py(x) = P x , or, Pi = P e i , z = l , . . . , n .
' = [pi |P2 I ••• | p n j , Then Pi = Py{ei)
= ^{ei.Wj)wj
=
Y^v]wj
3=1
j=l
j=lh=l
h=l
(3.3)
I.e.,
P = VV^. The complex case is similar. With the same notation, instead of (3.3) we have k
Pi = Py{ei)
k
= ^{ei
.Vj )wj = ^
v^Wj
J=l
3=1
3=1h=l
h=l
(3.4)
i.e., P = VV^.
f. Orthogonal subspaces Let X be a real vector space with inner product ( | ) or a complex vector space with Hermitian product ( | ). Suppose X is finite dimensional and let W^ C X be a linear subspace of X. The subset ly-^ := {x G X I {x\y) is called the orthogonal of W in X.
=OyyeW^
3.1 The Geometry of Euclidean and Hermitian Spaces
91
3.33 Proposition. We have (i) W-^ is a linear subspace of Xj
(ii) wnw-^ = {o}, (iii) (W^)^ = W, (iv) W and W-^ are supplementary, hence dim W + dim W-^ = n, (v) if Pw dnd Pw^ tt^^ respectively, the orthogonal projections onto W and W-^ seen as linear maps from X into itself, then Pw^ = ^^x — PwProof. We prove (iv) and leave the rest to the reader. Let {vi, V2, •.., Vk) he a basis of W. Then we can complete (vi, V2,... •, v^) with n — k vectors of the canonical basis to get a new basis of X. Then the Gram-Schmidt procedure yields an orthonormal basis {wi, W2^" ", Wn) of X such that W = Span iwi, W2,... •, Wk\- On the other hand Wk-\-i,..., Wn € W-^, hence dim W-^ = n — k. •
g. Riesz's theorem 3.34 Theorem (Riesz). Let X be a Euclidean or Hermitian space of dimension n. For any L e X* there is a unique XL G X such that L{X) = {X\XL)
VXGX.
'
(3.5)
Proof. Assume for instance, that X is Hermitian. Suppose L ^ 0, otherwise we choose XL = 0, and observe that d i m l m L = 1, and V := kerL has dimension n — 1 if d i m X = n. Fix XQ G V-^ with |a:o| = 1, then every x E X decomposes as X = x' -\- XxQ,
x' 6 kerL, A = (x|a:o)-
Consequently, L{x) = I/(a:') -f AL(xo) = (x|a;o)I/(a:o) = (a;|L(xo)xo) and the claim follows choosing x^ '= L{xo)xo.
•
The map (3 : X* ^^ X, L ^^ XL defined by the Riesz theorem is called the Riesz map. Notice that /? is linear if X is Euclidean and antilinear if X is Hermitian. 3.35 T h e Riesz m a p in c o o r d i n a t e s . Let X be a Euclidean (Hermitian) space with inner (Hermitian) product ( | ),fixa basis and denote by x = (x^, x ^ , . . . , x'^) the coordinates of x, and by G the Gram matrix of the inner (Hermitian) product. Let L G X* and let L be the associated matrix, L{x) = L x . From (3.5) L x = L{x) = {X\XL)
= X-^GXL
if X is Euclidean,
L x = L{x) = {X\XL)
= X"^G5CL"
if X is Hermitian,
Gx£, = L-^
or XL = G~^L-^
if X is Euclidean,
7^
1 'p
i.e.,
Gxx, = L
or x/, = G
L
if X is Hermitian.
In particular, if the chosen basis (ei, 62, • • •, en) is orthonormal, then G = Id and XX, = L-^
if X is Euclidean,
—T
XL = L
if X is Hermitian.
92
3. Euclidean and Hermitian Spaces
Figure 3.1. Dynamometer.
3.36 E x a m p l e (Work and forces). Suppose a mass m is fixed to a dynamometer. If 9 is the inclination of the dynamometer, the dynamometer shows the number L = mg cos 0,
(3.6)
where p is a suitable constant. Notice that we need no coordinates in R"^ to read the dynamometer. We may model the lecture of the dynamometer as a map of the direction V of the dynamometer, that is, as a map L : S"^ —* R from the unit sphere 5"^ = {x € E^ Ma;I = 1} of the ambient space V into R. Moreover, extending L homogeneously to the entire space V by setting L{v) := \v\ L{v/\v\), v e R^ \ {0}, we see that such an extension is linear because of the simple dependence of L from the inclination. Thus we can model the elementary work done on the mass m, the measures made using the dynamometer, by a linear map L : V -^ R. Thinking of the ambient space V as Euclidean, by Riesz's theorem we can represent L as a scalar product, introducing a vector F := XL EV such that {v\F) = L(y)
Vt; G V.
We interpret such a vector as the force whose action on the mass produces the elementary work L{v). Now fix a basis (ei,62,63) of V. li F = (F^^F"^, F^)^ is the column vector of the force coordinates and L = (Li,L2,I/3) is the 1 x 3 matrix of the coordinates of L in the dual basis, that is, the three readings Li = L{ei), i = 1,2,3, of the dynamometer in the directions 61,62,63, then, as we have seen.
In particular, if (61,62,63) is an orthonormal basis.
h. The adjoint operator Let XyY be two vector spaces both on K = M or K = C with inner (Hermitian) products ( | )x and ( | )y and let A : X -^Y he a, hnear map. For any y eV the map X -> {A{x)\y)Y
3.1 The Geometry of Euclidean and Hermitian Spaces
93
defines a linear map on X, hence by Riesz's theorem there is a unique A*{y) e X such that {A{x)\y)Y = {y\A%x))x
Vx G X, Vy e Y,
(3.7)
It is easily seen that the map y -^ A*{y) from Y into X defined by (3.7) is linear: it is called the adjoint of A. Moreover, (i) let A,B : X -^ Y he two linear maps between two Euchdean or Hermitian spaces. Then {A + B)* = A* ^ B*, (ii) (XA)* = XA* if A G M and A : X ^ Y is SL hnear map between two Euclidean_spaces, (iii) (XA)* = XA"" if A G C and A : X ^ F is a hnear map between two Hermitian spaces, (iv) ( 5 o A)* = A* o 5 * if A : X ^ F and B : F ^ Z are hnear maps between Euclidean (Hermitian) spaces, (v) (A*)* = ^ i f A : X - ^ y i s a linear map. 3.37 %, Let X, Y be vector spaces. We have already defined an adjoint A : Y* —^ X* with no use of inner or Hermitian products, < A{y*),x>=
\/x e X, My* e
Y\
If X and Y are Euclidean (Hermitian) spaces, denote by /3x : X* —^ X, /Sy : Y* -^ Y the Riesz isomorphisms and by A* the adjoint of A defined by (3.7). Show that A* =
/3xoAo0-\ 3.38 T h e adjoint o p e r a t o r in c o o r d i n a t e s . Let X, Y be two Euclidean (Hermitian) spaces with inner (Hermitian) products { \ )x and ( | ) y . Fix two bases in X and y , and denote the Gram matrices of the inner (Hermitian) products on X and Y respectively, by G and H . Denote by x the coordinates of a vector x. Let A : X -^ Y be a linear map. A* be the adjoint map and let A, A* be respectively, the associated matrices. Then we have (A(x)\y)Y
= x^A^Hy,
{x\A*{y))
= x^'GA'y,
{x\A*iy))
= x^GA^^y,
if X and Y are Euclidean and (A{x)\y)Y
= x^A^Hy,
if X and Y are Hermitian. Therefore GA* = A ^ H
if X and Y are Euclidean,
GA* = A ^ H
if X and Y are Hermitian,
or, recalling that G ^ = G, ( G ~ i ) ^ = G - \ H ^ = H if X and Y are Euclidean and that G ^ = G, ( G - i ) ^ = G - i , and H ^ = H if X and Y are Hermitian, we find A* = G - ^ A ^ H
if X and Y are Euclidean,
A* = G~^ A ^ H
if X and r are Hermitian.
In particular. A* = A ^ in the Euclidean case, _r A* = A in the Hermitian case if and only if the chosen bases in X and Y are orthonormal.
(3.8)
94
3. Euclidean and Hermitian Spaces
3.39 Theorem. Let A: X -^Y be a linear operator between two Euclidean or two Hermitian spaces and let A* : Y -^ X be its adjoint. Then Rank^* = R a n k ^ . Moreover, (Im^)-^=ker^*,
Im^=(ker^*)^,
{lmA*)-^=kerA,
IinA* = (ker^)-^.
Proof. Fix two orthonormal bases on X and Y, and let A be the matrix associated to A using these bases. Then, see (3.8), the matrix associated to A* is A-^, hence Rank A* = Rank A ^ = Rank A = Rank A, and dim(ker A*)-^ = dim Y - dim ker A* = Rank A* = Rank A = dim Im A. On the other hand, Im A C (ker A*)-*- since, if t/ = A{x) and A*{v) = 0, then {y\v) = (A{x)\v) = {x\A*{v)) = 0. We then conclude that (ker A*)-*- = ImA. The other claims easily follow. In fact, they are all equivalent to I m A = (kerA*)-*-. D
As an immediate consequence of Theorem 3.39 we have the following. 3.40 Theorem (The alternative theorem). Let A : X -^ Y be a linear operator between two Euclidean or two Hermitian spaces and let A* : Y -^ X be its adjoint. Then A|kerA-L * (ker^)-^ -^ ImA and At^ : ImA —^ (ker^)-^ are injective and onto, hence isomorphisms. Moreover, (i) A(x) = y has at least a solution if and only if y is orthogonal to kerA^ (ii) y is orthogonal to ImA if and only if A*{y) = 0, (iii) A is injective if and only if A* is surjective, (iv) A is surjective if and only if A* is injective. 3.41 . A more direct proof of the equaUty ker A = (Im^*)-^ is the following. For simplicity, consider the real case. Clearly, it suffices to work in coordinates and by choosing an orthonormal basis, it is enough to show that Im A = (ker A^)-^ for every matrix A € Mm,n{^)Let A = (a'j) G Mm„n{^) and let a^, a^,..., a"^ be the rows of A, equivalently the columns of A ^ , /ai\
A =
\a^J Then,
3.2 Metrics on Real Vector Spaces
Ax
/ a\x^ + alx^ + • • • + al^x"" \ ajx^ + ajx^ + • • • + alx"" \ a f x^ + afx^ + • • • + a ^ x ^ /
95
2
\ a^ • x/
Consequently, x G ker A if and only if a* • x = 0 Vi = 1 , . . . , m, i.e., kerA = S p a n U \ s?,...,
a^}
= (ImA^)"^.
(3.9)
3-2 Metrics on Real Vector Spaces In this section, we discuss bilinear forms on real vector spaces. One can develop similar considerations in the complex setting, but we refrain from doing it. a. Bilinear forms and linear operators 3.42 Definition. Let X he a real linear space. A bilinear form on X is a map 6 : X X X —> M that is linear in each factor, i.e., b{ax + /3y, z)=a 6(x, z)-\-(3b{y, z), 6(x, ay -\- f3z) = a 6(x, y) + 0 b{z, z). for all X, y, X G X and all forms on X by B{X).
a,l3e
We denote the space of all bilinear
Observe that, if 6 G B{X), then 6(0, x) - b{0,y) = 0 Vx,?/ G X. The class of bihnear forms becomes a vector space if we set (6i + 62)(x,y) := 6i(x,y) + b2{x,y), (A6)(x, y) := 6(Ax, y) =- 6(x, A, y). Suppose that X is a linear space with an inner product denoted by ( I ). If 6 G S(X), then for every y e X the map x -^ b{x,y) is linear, consequently, by Riesz's theorem there is a unique B := B{y) G X such that b{x,y) = {x\B{y)) Vx G X. (3.10) It is easily seen that the map y —> B{y) from Y into X defined by (3.10) is linear. Thus (3.10) defines a one-to-one correspondence between B{X) and the space of linear operators £(X, X), and it is easy to see this correspondence is a linear isomorphism between B{X) and £(X, X).
3. Euclidean and Hermitian Spaces
96
Ueber
die Hjpothe8eQ,welohe derGeometne zu Gruad^ liegen.
d ie
Hypothesen, B. R i e m a n n.
welche der Oeometrie za 6nmde liegen. iBcljung. P l » n d«r Un BekaantUoh wut dte Qmmctrte to . den Begritf de« iUittmea, ab die «r*««n Grandbagriib tta die Baume al* etwaa Gegebene* wnxa. Sie giebt ton iha NomlnaldefinitioneD, w«hrend die wcMBtUchen BeetimmuBgeQ in Form TOO Axiomen auftretesi. D u VerMltaiM dieeer VoreuMetrangen bleibt dabei im Dankeln; man •ieht weder «in, ob uod in wie wdt "ihte Verbindung nothwendig, fiocli a priori, ob sie mflgHch Ut. Uiese Dankelheit wntde such von Ettklid bi« auf L e g e n d r e . der Oeometrie ni nennen. urn den berOl>mte»ten neoeren Bei weder Ton den MathenwUkem, noch on den PhiloMiphen, welch* sich te die* aeinen Oiund wohl dariu, damit betcbiitigttD. geboben. £ • h daM der allgemeine Begriff mehr£uh tttgedehnter i . ganz onbearbeitet blieb. Icli cliein die babe mir daher tunichet die A n ^ b e gesteilt. den Begriff einer mehrftch auigedehateo GrBMe au< aUgemrinei» Ght««»«nbegri«»n lu conetniimi. £ • wird daraos hervoigehea. dan ein« m«hrfe«h auigedehnte GrOwe ver-
B. R i e m a n n.
t) Dine AMModlooc H tm 10. Jani 18&4 ran d«m 2«Mk MiMT HaUiUtioa Ttnaitaltelen CoUoqiiam Iflerans eridlrt Mch die Form der D«nMlni(,
d«m dreix«bs«m B«nd« 4*r AUiandlaag«n der KSni|^ieiicD OtMllwiwrt der WiMMtehaftcn tn OSttingen.
Gdttingen, in der Diet«riolitoli»ii 1867,
BoohhandUng.
BmanMbwrig, im JuU 1M7.
Figure 3.2. Frontispiece and a page of the celebrated dissertation of G. F. Bernhard Riemann (1826-1866).
3.43 Bilinear forms in c o o r d i n a t e s . Let X be a finite-dimensional vector space and let (ei, 6 2 , . . . , e-n) be a basis of X. Let us denote by B the nxn matrix, sometimes called the Gram matrix of b. B = [bij]
bij =
b{ei,ej).
Recall that the first index from the left is the row index. Then by linearity, if for every x, 2/, X = {x^, cc^,..., ic^)^ and y = (y^, 2/^, • • •, y^)^ € M^ are respectively, the column vectors of the coordinates of x and y, we have
bix,y) = J2 ^ij^'y' = x^-lBy) =x^By. In particular, a coordinate system induces a one-to-one correspondence between bilinear forms in X and bilinear forms in W^. Notice that the entries of the matrix B have two lower indices that sum with the indices of the coordinates of the vectors x, y that have upper indices. This also reminds us that B is not the matrix associated to a linear operator B related to B . In fact, if instead N is the associated linear operator to 6, bix.y) = (x\N{y))
Vx,y€X,
then y ^ B x - b(x,y) = {x\N{y))
= y^N^Gx
where we have denoted by G the Gram matrix associated to the inner product on X , G = [gij], gij — (ei|ej), and by N the n v. n matrix associated to A^ : X —^ X in the basis (ei, 6 2 , . . . , en). Thus N^G = B or, recalling that G is symmetric and invertible, N = G-iB^.
3.2 Metrics on Real Vector Spaces
97
b. Symmetric bilinear forms or metrics 3.44 Definition. Let X be a real vector space. A bilinear form b G B{X) is said to be (i) symmetric or a metric, ifb{x^y) = b{y,x) Vx,?/ G X, (ii) antisymmetric ifb{x,y) = —b{y,x) ^x,y G X. The space of symmetric bilinear forms is denoted by Syra{X). 3.45 %, Let b G B(X). Show that bs{x,y) := ^{b{x,y) + b{y,x)), x,y € X , is a symmetric bilinear form and bA{x,y) :— ^{b{x,y) — b{y,x)), x,y £ X, is an antisymmetric bilinear form. In particular, one has the natural decomposition 6(x, y) = bs {x, y) + 6^ (x, y) of b into its symmetric and antisymmetric parts. Show that b is symmetric if and only if 6 = 65, and that b is antisymmetric if and only if 6 = 6^^. 3.46 %. Let 6 G B(X) be a symmetric form, and let B be the associated Gram matrix. Show that 6 is symmetric if and only if B ^ = B . 3.47 %, Let b e B(X) and let N be the associated linear operator, see (3.10). Show that AT is self-adjoint, N* = AT, if and only if 6 G Sym{X). Show that N* = -N if and only if b is antisymmetric.
c. Sylvester's theorem

3.48 Definition. Let X be a real vector space. We say that a metric on X, i.e., a bilinear symmetric form g : X × X → R, is
(i) nondegenerate if ∀x ∈ X, x ≠ 0, there is y ∈ X such that g(x, y) ≠ 0 and ∀y ∈ X, y ≠ 0, there is x ∈ X such that g(x, y) ≠ 0,
(ii) positive definite if g(x, x) > 0 ∀x ∈ X, x ≠ 0,
(iii) negative definite if g(x, x) < 0 ∀x ∈ X, x ≠ 0.
3.49 %. Show that the scalar product {x\y) on X is a symmetric and nondegenerate bilinear form. We shall see later, Theorems 3.52 and 3.53, that any symmetric, nondegenerate and positive bilinear form on a finite-dimensional space is actually an inner product.
3.50 Definition. Let X be a vector space of dimension n and let g ∈ Sym(X) be a metric on X.
(i) We say that a basis (e_1, e_2, ..., e_n) is g-orthogonal if g(e_i, e_j) = 0 ∀i, j = 1, ..., n, i ≠ j.
(ii) The radical of g is defined as the linear space rad(g) := {x ∈ X | g(x, y) = 0 ∀y ∈ X}.
(iii) The rank of the metric g is r(g) := n − dim rad(g).
Figure 3.3. Jorgen Gram (1850-1916) and James Joseph Sylvester (1814-1897).
(iv) The signature of the metric g is the triplet of numbers (i_+(g), i_-(g), i_0(g)), where
i_+(g) := maximum of the dimensions of the subspaces V ⊂ X on which g is positive definite, g(v, v) > 0 ∀v ∈ V, v ≠ 0,
i_-(g) := maximum of the dimensions of the subspaces V ⊂ X on which g is negative definite, g(v, v) < 0 ∀v ∈ V, v ≠ 0,
i_0(g) := dim rad(g).

3.52 Theorem (Sylvester). Let g be a metric on a vector space X of dimension n, let (e_1, e_2, ..., e_n) be a g-orthogonal basis and denote by n_+, n_-, n_0 the number of basis vectors e_i for which, respectively, g(e_i, e_i) > 0, g(e_i, e_i) < 0, g(e_i, e_i) = 0. Then n_+ = i_+(g), n_- = i_-(g) and n_0 = i_0(g). In particular, n_+, n_-, n_0 do not depend on the chosen g-orthogonal basis,
i_+(g) + i_-(g) = r(g)   and   i_+(g) + i_-(g) + i_0(g) = n.
Proof. Suppose that g(e_i, e_i) > 0 for i = 1, ..., n_+. For each v = Σ_{i=1}^{n_+} v^i e_i ≠ 0 we have
g(v, v) = Σ_{i=1}^{n_+} |v^i|² g(e_i, e_i) > 0,
hence dim Span{e_1, e_2, ..., e_{n_+}} ≤ i_+(g). On the other hand, if W ⊂ X is a subspace of dimension i_+(g) such that g(v, v) > 0 ∀v ∈ W, v ≠ 0, we have W ∩ Span{e_{n_+ + 1}, ..., e_n} = {0}, since g(v, v) ≤ 0 for all v ∈ Span{e_{n_+ + 1}, ..., e_n}. Therefore we also have i_+(g) ≤ n − (n − n_+) = n_+. Similarly one proves that n_- = i_-(g). Finally, since G := [g(e_i, e_j)] is the matrix associated to g in the basis (e_1, e_2, ..., e_n), we have i_0(g) = dim rad(g) = dim ker G and, since G is diagonal, dim ker G = n_0. □
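As an illustration of Sylvester's law (the example is made up), one can read the signature off the eigenvalues of the Gram matrix, since an orthonormal eigenbasis of a symmetric matrix is in particular a g-orthogonal basis; the counts are invariant under congruence:

```python
import numpy as np

def signature(G, tol=1e-10):
    """Return (i_plus, i_minus, i_zero) of the metric with symmetric Gram matrix G."""
    eigenvalues = np.linalg.eigvalsh(G)          # real, since G is symmetric
    i_plus = int(np.sum(eigenvalues > tol))
    i_minus = int(np.sum(eigenvalues < -tol))
    return i_plus, i_minus, len(eigenvalues) - i_plus - i_minus

# a metric of signature (2, 1, 1) on R^4
G = np.diag([3.0, 1.0, -2.0, 0.0])
print(signature(G))                              # (2, 1, 1)

# the signature does not change under a change of basis G -> R^T G R, R invertible
R = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 0.],
              [2., 0., 1., 0.],
              [0., 0., 1., 1.]])
print(signature(R.T @ G @ R))                    # (2, 1, 1) again
```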
d. Existence of g-orthogonal bases
The Gram–Schmidt algorithm yields the existence of an orthonormal basis in a Euclidean space X. We now see that a slight modification of the Gram–Schmidt algorithm allows us to construct, in a finite-dimensional space, a g-orthogonal basis for a given metric g.

3.53 Theorem (Gram–Schmidt). Let g be a metric on a finite-dimensional real vector space X. Then X has a g-orthogonal basis.

Proof. Let r be the rank of g, r := n − dim rad(g), and let (w_1, w_2, ..., w_{n−r}) be a basis of rad(g). If V denotes a supplementary subspace of rad(g), then V is g-orthogonal to rad(g) and dim V = r. Moreover, for every v ∈ V, v ≠ 0, there is z ∈ X such that g(v, z) ≠ 0. Decomposing z as z = w + t, w ∈ V, t ∈ rad(g), we then have g(v, w) = g(v, w) + g(v, t) = g(v, z) ≠ 0, i.e., g is nondegenerate on V. Since trivially (w_1, w_2, ..., w_{n−r}) is g-orthogonal and V is g-orthogonal to (w_1, w_2, ..., w_{n−r}), in order to conclude it suffices to complete the basis (w_1, w_2, ..., w_{n−r}) with a g-orthogonal basis of V; in other words, it suffices to prove the claim under the further assumption that g is nondegenerate.
We proceed by induction on the dimension of X. Let (f_1, f_2, ..., f_n) be a basis of X. We claim that there exists e_1 ∈ X with g(e_1, e_1) ≠ 0. In fact, if for some f_i we have g(f_i, f_i) ≠ 0, we simply choose e_1 := f_i; otherwise, if g(f_i, f_i) = 0 for all i, for some k ≠ i we must have g(f_i, f_k) ≠ 0, since by assumption rad(g) = {0}. In this case we choose e_1 := f_i + f_k, as
g(f_i + f_k, f_i + f_k) = g(f_i, f_i) + 2 g(f_i, f_k) + g(f_k, f_k) = 0 + 2 g(f_i, f_k) + 0 ≠ 0.
Now it is easily seen that the subspace
V_1 := {v ∈ X | g(e_1, v) = 0}
supplements Span{e_1}, and we find a basis (v_2, ..., v_n) of V_1 such that g(v_j, e_1) = 0 for all j = 2, ..., n by setting
v_j := f_j − (g(f_j, e_1) / g(e_1, e_1)) e_1.
Since g is nondegenerate on V_1, by the induction assumption we find a g-orthogonal basis (e_2, ..., e_n) of V_1, and the vectors (e_1, e_2, ..., e_n) form a g-orthogonal basis of X. □
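A rough computational sketch of this modified Gram–Schmidt step (illustrative only; it assumes the metric is nondegenerate, so the pivot step of the proof always succeeds):

```python
import numpy as np

def g_orthogonal_basis(G, tol=1e-12):
    """Columns of the returned matrix form a g-orthogonal basis for the
    nondegenerate metric with symmetric Gram matrix G on R^n."""
    n = G.shape[0]
    g = lambda u, v: u @ G @ v
    vectors = list(np.eye(n))                  # start from the given basis f_1, ..., f_n
    result = []
    while vectors:
        # pick a pivot e with g(e, e) != 0; if none exists, fix it as in the proof
        idx = next((i for i, v in enumerate(vectors) if abs(g(v, v)) > tol), None)
        if idx is None:
            i, j = next((i, j) for i in range(len(vectors))
                               for j in range(i + 1, len(vectors))
                               if abs(g(vectors[i], vectors[j])) > tol)
            vectors[i] = vectors[i] + vectors[j]   # e := f_i + f_k
            idx = i
        e = vectors.pop(idx)
        result.append(e)
        # replace the remaining vectors by v - (g(v, e)/g(e, e)) e, g-orthogonal to e
        vectors = [v - (g(v, e) / g(e, e)) * e for v in vectors]
    return np.column_stack(result)

G = np.array([[0., 1., 0.],
              [1., 0., 0.],
              [0., 0., 2.]])                   # nondegenerate, indefinite
E = g_orthogonal_basis(G)
print(np.round(E.T @ G @ E, 10))               # diagonal matrix
```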
A variant of the Gram–Schmidt procedure is the following one, due to Carl Jacobi (1804–1851). Let g : X × X → R be a metric on X, let (f_1, f_2, ..., f_n) be a basis of X and let G be the matrix associated to g in this basis, G = [g_ij], g_ij := g(f_i, f_j). Set Δ_0 = 1 and, for k = 1, ..., n, Δ_k := det G_k, where G_k is the k × k submatrix of the first k rows and k columns of G.

3.54 Proposition (Jacobi). If Δ_k ≠ 0 for all k = 1, ..., n, there exists a g-orthogonal basis (e_1, e_2, ..., e_n) of X; moreover,
g(e_k, e_k) = Δ_{k-1} / Δ_k.

Proof. We look for a basis (e_1, e_2, ..., e_n) of the form
e_1 = a_1^1 f_1,   e_2 = a_2^1 f_1 + a_2^2 f_2,   ...,
or, equivalently,
e_k := Σ_{j=1}^k a_k^j f_j,   k = 1, ..., n,   (3.11)
as in the Gram–Schmidt procedure, such that g(e_i, e_j) = 0 for i ≠ j. At first sight the system g(e_i, e_j) = 0, i ≠ j, is a system in the unknowns a_k^j. However, if we impose that for all k's
g(e_k, f_i) = 0   ∀i = 1, ..., k − 1,   (3.12)
then by linearity g(e_k, e_i) = 0 for i < k, and by symmetry g(e_k, e_i) = 0 for i > k. It suffices then to fulfill (3.12), i.e., to solve the system of k − 1 equations in the k unknowns a_k^1, a_k^2, ..., a_k^k
Σ_{j=1}^k g(f_j, f_i) a_k^j = 0,   ∀i = 1, ..., k − 1.   (3.13)
If we add the normalization condition
Σ_{j=1}^k g(f_j, f_k) a_k^j = 1,   (3.14)
we get a system of k equations in k unknowns of the type G_k x = b, where G_k = [g_ij], g_ij := g(f_i, f_j), x = (a_k^1, ..., a_k^k)^T and b = (0, 0, ..., 1)^T. Since det G_k = Δ_k and Δ_k ≠ 0 by assumption, the system is solvable. Due to the arbitrariness of k, we are able to find a g-orthogonal basis of type (3.11). It remains to compute g(e_k, e_k). From (3.13) and (3.14) we get
g(e_k, e_k) = Σ_{i,j=1}^k a_k^i a_k^j g(f_i, f_j) = Σ_{i=1}^k a_k^i ( Σ_{j=1}^k g(f_i, f_j) a_k^j ) = Σ_{i=1}^k a_k^i δ_{ik} = a_k^k,
and we compute a_k^k by Cramer's rule, a_k^k = Δ_{k-1}/Δ_k, hence g(e_k, e_k) = Δ_{k-1}/Δ_k. □
3.55 Remark. Notice that Jacobi's method is a rewriting of the Gram–Schmidt procedure in the case where g(f_i, f_i) ≠ 0 for all i's. In terms of Gram's matrix G := [g(f_i, f_j)], we have also proved that
T^T G T = diag{Δ_{k-1}/Δ_k}
for a suitable triangular matrix T.

3.56 Corollary (Sylvester). Suppose that Δ_1, ..., Δ_n ≠ 0. Then the metric g is nondegenerate. Moreover, i_-(g) equals the number of changes of sign in the sequence (1, Δ_1, Δ_2, ..., Δ_n). In particular, if Δ_k > 0 for all k's, then g is positive definite.

Let (e_1, e_2, ..., e_n) be a g-orthogonal basis of X. By reordering the basis in such a way that
g(e_j, e_j) > 0 if j = 1, ..., i_+(g),   < 0 if j = i_+(g) + 1, ..., i_+(g) + i_-(g),   = 0 otherwise,
and setting
f_j := e_j / √|g(e_j, e_j)|  if j = 1, ..., i_+(g) + i_-(g),   f_j := e_j  otherwise,
we get
g(f_j, f_j) = 1 if j = 1, ..., i_+(g),   = −1 if j = i_+(g) + 1, ..., i_+(g) + i_-(g),   = 0 otherwise.
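A small numerical check of Jacobi's leading-minor rule (illustrative only; the matrix is made up and all leading principal minors are assumed nonzero):

```python
import numpy as np

def leading_minors(G):
    """Delta_0 = 1, Delta_k = determinant of the top-left k x k block of G."""
    return [1.0] + [np.linalg.det(G[:k, :k]) for k in range(1, G.shape[0] + 1)]

G = np.array([[2., 1., 0.],
              [1., -1., 1.],
              [0., 1., 3.]])
deltas = leading_minors(G)                       # approximately [1, 2, -3, -11]
sign_changes = sum(1 for a, b in zip(deltas, deltas[1:]) if a * b < 0)
print(deltas, "i_minus =", sign_changes)         # one sign change, so i_-(g) = 1

# cross-check against Sylvester's law of inertia
print("negative eigenvalues:", int(np.sum(np.linalg.eigvalsh(G) < 0)))
```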
e. Congruent matrices It is worth seeing now how the matrix associated to a bilinear form changes when we change bases. Let (ei, 6 2 , . . . , Cn) and (/i, /2, • • •, fn) be two bases of X and let R be the matrix associated to the map R : X -^ X, R{ei) := fi in the basis (ei, 6 2 , . . . , e^), that is
R := [ri I r2 where r^ is the column vector of the coordinates of fi in the basis (ei, 6 2 , . . . , Cn)' As we know, if x and x' are the column vectors of the coordinates of X respectively, in the basis (ei, 6 2 , . . . , en) and (/i, /2, • • •, /n), then x = R x ' . Denote by B and B ' the matrices associated to b respectively, in the coordinates (ei, 6 2 , . . . , e^) and (/i, / 2 , . • •, fn)- Then we have
b(x, y) = x'^T B' y'   and   b(x, y) = x^T B y = (R x')^T B (R y') = x'^T R^T B R y',
hence
B' = R^T B R.   (3.15)
The previous argument can be of course reversed. If (3.15) holds, then B and B ' are the Gram matrices of the same metric h on W^ in different coordinates 6(x,2/)=x^bV = (RxfB(Ry). 3.57 Definition. Two matrices A , B G Mn^n{ if there exists a nonsingular matrix R E Mn,n{
are said to he congruent such that B = R ^ A R .
It turns out that the congruence relation is an equivalence relation on matrices, thus the nxn matrices are partitioned into classes of congruent matrices. Since the matrices associated to a bilinear form in different basis are congruent, to any bilinear form corresponds a unique class of congruent matrices. The above then reads as saying that two matrices A, B G Mn,n{^) are congruent if and only if they represent the same bilinear form in different coordinates. Thus, the existence of a ^r-orthogonal basis is equivalent to the following. 3.58 T h e o r e m . A symmetric matrix A G Mn^n{ diagonal matrix.
is congruent to a
Moreover, Sylvester's theorem reads equivalently as the following.

3.59 Theorem. Two diagonal matrices I, J ∈ M_{n,n}(R) are congruent if and only if they have the same number of positive, negative and zero entries on the diagonal. If, moreover, a symmetric matrix A ∈ M_{n,n}(R) is congruent to
diag(Id_a, −Id_b, 0),
then (a, b, n − a − b) is the signature of the metric y^T A x.

Thus the existence of a g-orthogonal basis, in conjunction with Sylvester's theorem, reads as the following.

3.60 Theorem. Two symmetric matrices A, B ∈ M_{n,n}(R) are congruent if and only if the metrics y^T A x and y^T B x on R^n have the same signature (a, b, r). In this case, A and B are congruent to
diag(Id_a, −Id_b, 0).
f. Classification of real metrics Since reordering the basis elements is a linear change of coordinates, we can now reformulate Sylvester's theorem in conjunction with the existence of a ^-orthonormal basis as follows. Let X, Y be two real vector spaces, and let g, h be two metrics respectively, on X and Y. We say that {X,g) and (F, h) are isometric if and only if there is an isomorphism L : X —^ Y such that h{L{x),L{y)) = g{x^y) Wx,y G X. Observing that two metrics are isometric if and only if, in coordinates, their Gram matrices are congruent, from Theorem 3.60 we infer the following. 3.61 Theorem. (X, g) and (y, h) are isometric if and only if g and h have the same signature, (hid) ^'^-{9)^0(9)) = (i+(/i),i_(/i),ioW). Moreover, if X has dimension n and the metric g on X has signature (a^b.r), a + b + r = n, then (X^g) is isometric to (lR^,/i) where /i(x,y) := x^Hy and
H = diag(Id_a, −Id_b, 0).
According to the above, the metrics on a real finite-dimensional vector space X are classified, modulo isometries, by their signature. Some of them have names:
(i) The Euclidean metric: i_+(g) = n, i_-(g) = i_0(g) = 0; in this case g is a scalar product.
(ii) The pseudoeuclidean metrics: i_0(g) = 0.
(iii) The Lorentz metric or Minkowski metric: i_+(g) = n − 1, i_-(g) = 1, i_0(g) = 0.
(iv) The Artin metric: i_+(g) = i_-(g) = p, i_0(g) = 0.

3.62 ¶. Show that a bilinear form g on a finite-dimensional space X is an inner product on X if and only if g is symmetric and positive definite.
g. Quadratic forms
Let X be a finite-dimensional vector space over R and let b ∈ B(X) be a bilinear form on X. The quadratic form φ : X → R associated to b is defined by φ(x) := b(x, x), x ∈ X. Observe that φ is fixed only by the symmetric part of b,
b_S(x, y) := 1/2 (b(x, y) + b(y, x)),
since b(x, x) = b_S(x, x) ∀x ∈ X. Moreover, one can recover b_S from φ, since b_S is symmetric,
b_S(x, y) = 1/2 (φ(x + y) − φ(x) − φ(y)).
Another important relation between a bilinear form b ∈ B(X) and its quadratic form φ is the following. Let x, v ∈ X. Since φ(x + tv) = φ(x) + t b(x, v) + t b(v, x) + t² φ(v), we have
d/dt φ(x + tv)|_{t=0} = 2 b_S(x, v).   (3.16)
We refer to (3.16) by saying that the symmetric part b_S of b is the first variation of the associated quadratic form.
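A quick numerical illustration of the polarization identity and of (3.16) (the matrix and vectors are made up for the example):

```python
import numpy as np

B = np.array([[1., 4.],
              [0., 2.]])                      # a non-symmetric bilinear form on R^2
phi = lambda x: x @ B @ x                     # associated quadratic form
b_S = lambda x, y: 0.5 * (phi(x + y) - phi(x) - phi(y))   # polarization

x, y = np.array([1., -2.]), np.array([3., 1.])
print(b_S(x, y), x @ (0.5 * (B + B.T)) @ y)   # the two values agree

# first variation: d/dt phi(x + t v) at t = 0 equals 2 b_S(x, v)
v, t = np.array([0.5, 1.0]), 1e-6
print((phi(x + t * v) - phi(x)) / t, 2 * b_S(x, v))
```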
6(x,y) := x^By = ^
bijx'y^
be the bilinear form defined by B on R*^, x = ( x \ x ^ , . . . , x"), y = ( y \ ?/2,..., 2/^). Clearly, n
(/)(x) = 6(x, x) = x^Bx = ^
bijx'^x^
is a homogeneous polynomial of degree two. Conversely, any homogeneous polynomial of degree two P{x) = ^
bijx'x^ = x^Bx
i,j = l,n i 0; (h) saddle a^x^ - b'^y^ — 2cz = 0, c > 0; (i) elliptic cyhnder: a^x^ + b'^y^ — 1; (j) straight line: a^x^ + 6^2/^ = 0; (k) imaginary straight line: c?x^ -\-b^y^ = —1; (1) hyperbolic cylinder cP'x^ — b^y^ — 1; (m) nonparallel planes: d^x^ — b'^y^ = 0; (n) parabolic cylinder a?x'^ — 2cz, c > 0; (o) parallel planes: a?x'^ = 1; (p) plane: a^x^ = 0; (q) imaginary plane: a?x'^ = —1.
in this new basis and the quadric can be written as (/>(x) = ( x O ^ A ' x ' + 2(b'|xO + 2 ( b ' ' | x ' 0 + C2 = 0 where x ' , b ' G Im A, x ' ' , b ' ' G ker A, x = x ' + x ' ' , b = b ' + b ' ' and det A ' / 0. Applying the argument in (iii) to ( x ' ) ^ A ' x ' + 2 b ' « x ' +C2, we may further transform the quadric into (l){x) = ( x O ^ A ' x ' + C3 + 2 b ' ' . x ' ' = 0, and, writing j / ' := —2 b " • x ' ' — C3, that is, by means of an affine transformation that does not change the variable x', we end up with (f){x) = ( x ' ) ' ^ A ' x ' — y' = 0.
3.3 Exercises 3.72 %, Starting from specific lines or planes expressed in parametric or implicit way in M^, write o the straight line through the origin perpendicular to a plane, o the plane through the origin perpendicular to a straight line, o the distance of a point from a straight line and from a plane, o the distance between two straight lines, o the perpendicular straight line to two given nonintersecting lines, o the symmetric of a point with respect to a line and to a plane, o the symmetric of a line with respect to a plane. 3.73 %. Let X, Y be two Euclidean spaces with inner products respectively, ( | ) x and ( I ) y . Show that X X y is an Euclidean space with inner product {xi\yi)-\-{x2\y2), (xi,X2), (2/1,2/2) E X xY. of X X y .
Notice that X x {0} and {0} x Y are orthogonal subspaces
3.74 If. Let X, 2/ G M"". Show that x ± y if and only if \x - ay\ > \x\ Va G M. 3.75 %, The graph of the map A{x) := Ax, A G Mm,n{^) GA := { ( x , y ) | a : G R ' ' , yeR^,
y = A{x)\
is defined as C R"" x R'^.
Show that GA is a linear subspace of M'^+'^ of dimension n and that it is generated by the column vectors of the {k -\- n) x n
1\
A Id„
Also show that the row vectors of the k x {n + k) matrix
-Idfe
A generates the orthogonal to GA-
3.76 %. Write in the standard basis of R^ the matrices of the orthogonal projection on specific subspaces of dimension 2 and 3. 3.77 %, Let X be Euclidean or Hermitian and let V, W be subspaces of X. Show that
v-^nw^ = {v-\-w)^. 3.78 If. Let / : Mn,n{K) -> K be a linear map such that / ( A B ) = / ( B A ) V A , B € Mn,niK). Show that there is A 6 K such that / ( X ) = A t r X for all X E Mn,n{K) where t r X : = E ^ i C c ^ i f X = [a.}]. 3.79 f. Show that the bilinear form b : Mn,n{^)
x Mn,n{R) -^ K given by n
6(A,B):=tr(A^B):=53(A^B)| i=l
defines an inner product on the real vector space Mn,n{R)- Find the orthogonal of the symmetric matrices. 3.80 f. Given n + 1 points zi, Z2,.-., Zn+i in C, show that there exists a unique polynomial of degree at most n with prescribed values at zi, ^2, • • ? Zn+i- [Hint: If Vn is the set of complex polynomials of degree at most n, consider the map : Vn -^ C^"*" given by (/>(P) := ( P ( 2 i ) , P f e ) , • • • ,P(^n)).] 3 . 8 1 % D i s c r e t e i n t e g r a t i o n . Let ti, t2, • - •, tn he n points in [a,b] C M. Show that there are constants a i , a 2 , . . •, an such that b
/
n
P{t)dt
=
^ajP{tj)
for every polynomial of degree at most n — 1. 3.82 If. Let g := fO, 1]^ = la; G M"" I 0 < a:i < 1,
i=l,...,n\
be the cube of side one in R'^. Show that its diagonal has length y/n. Denote by x i , . . . ,X2'n the vertices of Q and by x := ( 1 / 2 , 1 / 2 , . . . , 1/2) the center of Q. Show that the balls around x that do not intersect the balls B{xi, 1/2), i = 1 , . . . , 2^, necessarily have radius at most Rn '-= {y/n — 2)/2. Conclude that for n > 4, B(x, Rn) is not contained in Q. 3.83 f.
Give a few explicit metrics in M^ and find the corresponding orthogonal bases.
3.84 f. Reduce a few explicit quadratic forms in R^ and R'* to their canonical form.
4. Self-Adjoint Operators
In this chapter, we deal with self-adjoint operators on a Euclidean or Hermitian space, and, more precisely, with the spectral theory for self-adjoint and normal operators. In the last section, we shall see methods and results of linear algebra at work on some specific examples and problems.
4.1 Elements of Spectral Theory 4.1.1 Self-adjoint operators a. Self-adjoint operators 4.1 Definition. Let X be a Euclidean or Hermitian space X. A linear operator A : X -^ X is called self-adjoint if A* = A. As we can see, if A is the matrix associated to A in an orthonormal basis, then A ^ and A are the matrices associated to A* in the same basis according to whether X is Euclidean or Hermitian. In particular, A is self-adjoint if and only if A = A"^ in the Euclidean case and A = A^ in the Hermitian case. Moreover, as a consequence of the alternative theorem we have X = ker A 0 Im A,
ker A
±lmA
ii A : X -^ X is self-adjoint. Finally, notice that the space of self-adjoint operators is a subalgebra of £(X, X). Typical examples of self-adjoint operators are the orthogonal projection operators. In fact, we have the following. 4.2 Proposition. Let X be a Euclidean or Hermitian space and let P : X -^ X be a linear operator. P is the orthogonal projection onto its image if and only if P* = P and P o P = P.
Proof. This follows, for instance, from 3.32. Here we present a more direct proof. Suppose P is the orthogonal projection onto its image. Then for every y ∈ X, (y − P(y)|z) = 0 ∀z ∈ Im P. Thus y = P(y) if y ∈ Im P, that is, P(x) = P∘P(x) = P²(x) ∀x ∈ X. Moreover, for x, y ∈ X,
0 = (x − P(x)|P(y)) = (x|P(y)) − (P(x)|P(y)),
0 = (P(x)|y − P(y)) = (P(x)|y) − (P(x)|P(y)),
hence (P(x)|y) = (x|P(y)), i.e., P* = P.
Conversely, if P* = P and P² = P, we have
(x − P(x)|P(z)) = (P*(x − P(x))|z) = (P(x) − P²(x)|z) = (P(x) − P(x)|z) = 0
for all z ∈ X. □
z^), we have
1 A|z|2 = Er=i A z* z* = E«"=i ^' A? = E«",,=i "5 ^^ ^\ = E",=i 4 ^' ^'Since a* = a^ for a l H , j = 1 , . . . , n, we conclude that (A-A)|zp = 0
i.e.,
AGM.
In the Euclidean case, A ^ = A = A , also. (ii) Let w € V^. For every v £ V -we have {A{'w)\v) = (w\A{v)) = 0 since A{v) G V and w eV-^. Thus A{w) ± V. (iii) Let x, y be eigenvectors of A with eigenvalues A, /x, respectively. Then A and /x are real and (A - ^i){x\y) = iXx\y) - {x\ny) = (A(x)\y) - {x\A{y)) = 0. Thus (x\y) = 0 if A 5^ ^ .
D
Proof of Theorem 4-3. We proceed by induction on the dimension n oi X. On account of Proposition 4.4 (i), the claim trivially holds if d i m X = 1. Suppose the theorem has been proved for all self-adjoint operators on H when dim H = n — 1 and let us prove it for A. Because of (i) Proposition 4.4, all eigenvalues of A are real, hence there exists at least an eigenvector ui of A with norm one. Let H := Span {txi}"*" and let B := A^fj be the restriction of A to H. Because of (ii) Proposition 4.4, B{H) C H, hence B \ H ^^ H is a linear operator on H (whose dimension is n — 1); moreover, B is self-adjoint, since it is the restriction to a subspace of a self-adjoint operator. Therefore, by the inductive assumption, there is an orthonormal basis {u2^... ,Un) oi H made by eigenvectors of B , hence of A. Since U2,... ,Un are orthogonal to u i , (txi, W2, • • •, ttn) is an orthonormal • basis of X made by eigenvectors of ^ .
The next proposition expresses the existence of an orthonormal basis of eigenvectors in several diflFerent ways, see Theorem 2.45. We leave its simple proof to the reader. 4.5 Proposition. Let A : X ^^ X be a linear operator on a Euclidean or Hermitian space X of dimension n. Let (wi, U2, • • •, Un) be a basis of X and let Xi^ A2,..., An be real numbers. The following claims are equivalent (i) (lAi, 1^2, •. •, Un) is an orthonormal basis of X and each Uj is an eigenvector of A with eigenvalue Xj, i.e., A{uj) = XjUj,
(^il'^j) = ^ij
Vi, j = 1 , . . . , n,
(ii) {ui, U2',..., Un) is an orthonormal basis and n
J= l
(iii) (lii, 1/2,..., Un) is an orthonormal basis and for all x^y £ X (A(x)\y) = / ^ ^ = i ^ji^M (y\^j) ^f^ ^^ Euclidean, IX]?= 1 ^3i^Wj) iy\%•) if X is Hermitian. Moreover, we have the following, compare with Theorem 2.45. 4.6 Proposition. Let A : X —^ X be a self-adjoint operator in a Euclidean or Hermitian space X of dimension n and let A € Mn^n{^) be the matrix associated to A in a given orthonormal basis. Then A is similar to a diagonal matrix. More precisely, let {ui, 1*2,..., Un) be a basis of X of eigenvectors of A, /et Ai, A2,.. •, An E M be the corresponding eigenvalues and let S G Mn,n{^) be the matrix that has the n-tuple of components of Ui in the given orthonormal basis as the ith column. S :=
U i U2
Ur,
Then S^S = Id
and
S^AS = diag (Ai, A2,..., An)
if X is Euclidean, and S^S = Id
and
S^AS = diag (Ai, A2,..., A^).
if X is Hermitian. Proof. Since the columns of S are orthonormal, it follows that S ^ S = Id if X is Euclid—T
ean or S S = Id if X is Hermitian. The rest of the proof is contained in Theorem 2.45. D
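In floating point, this diagonalization is what `numpy.linalg.eigh` computes for a real symmetric (or complex Hermitian) matrix; a minimal sketch with a made-up matrix:

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])                           # symmetric, hence self-adjoint on R^3

eigenvalues, S = np.linalg.eigh(A)                     # columns of S: orthonormal eigenvectors

assert np.allclose(S.T @ S, np.eye(3))                 # S^T S = Id
assert np.allclose(S.T @ A @ S, np.diag(eigenvalues))  # S^T A S = diag(lambda_1, ..., lambda_n)
print(eigenvalues)
```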
c. Spectral resolution Let A : X —^ X he 3. self-adjoint operator on a Euclidean or Hermitian space X of dimension n, let (i^i, 1*2,..., Un) be an orthonormal basis of eigenvectors of A, let Ai, A2,..., A^; be the distinct eigenvalues of A and Vi, V2,..., Vk the corresponding eigenspaces. Let Pi : X -^ Vi he the projector on Vi so that
UjEVi
and by (ii) Proposition 4.4 k
A{x) = J2XiPi{x). i=l
As we have seen, by (iii) Proposition 4.4, we have Vi L Vj ii i •=^ j and, by the spectral theorem, Y2i=i dimV^ = n. In other words, we can say that {Vi}i is a decomposition of X in orthogonal subspaces or state the following. 4.7 Theorem. Let A : X -^ X be self-adjoint on a Euclidean or Hermitian space X of dimension n. Then there exists a unique family of projectors Pi, P25. • • 5 Pk CL'^d distinct real numbers Ai, A2,. •., A^^ such that k
p. o Pj = 6ijPj,
^
k
Pi = Id
and A = ^
XiPi.
Finally, we can easily complete the spectral theorem as follows. 4.8 Proposition. Let X be a Euclidean or Hermitian space. A linear opertor A : X -^ X is self-adjoint if and only if the eigenvalues of A are real and there exists an orthonormal basis of X made of eigenvectors of A.
d. Quadratic forms To a self-adjoint operator A : X —^ X we may associate the bilinear form a:X xX ^K, a{x,y) := {A{x)\y),
x,y e X,
which is symmetric if X is EucUdean and sesquilinear^ tt(^?y) = o.{y^x)^ if X is Hermitian. 4.9 Theorem. Let A : X —^ X be a self-adjoint operator, (ei, e2, • . . , e^) an orthonormal basis of X of eigenvectors of A and Ai, A2,. •., A^ be the corresponding eigenvalues. Then n
{A{x)\x) =Y.Xi\{x\ei)f
Vx e X,
(4.1)
2=1
In particular, if Amin ^nd Amax CiT'e respectively, the smallest and largest eigenvalues of A, then A m i n k l ^ < {A{x)\x)
< A max 1^1
Vx G X.
Moreover, we have {A{x)\x) = Amin |^P (resp. {A{x)\x) = Amax kPy^ if and only if x is an eigenvector with eigenvalue Amin (resp. Xmax)Proof. Proposition 4.5 yields (4.1) hence n
n
i=l
2=1
and, since l^p = J27=i l ( ^ k i ) P ^^ G X, the first part of the claim is proved. Let us prove that {A(x)\x) = Amin 12^ P if and only if x is an eigenvector with eigenvalue Amin- If x is an eigenvector of A with eigenvalue Amin) then A{x) = Amin 2: hence {A{x)\x) = (Amin^:^!^) = Amin k p . Conversely, suppose (ei, 6 2 , . . . , Cn) is a basis of X made by eigenvectors of A and the eigenspace Vx^-^^ is spanned by (ei, 6 2 , . . . , e^). Prom {A(x)\x) = AminkP^ we infer that 0 = iA(x)\x)
n - AminkP = E ( ^ ^ i=l
^min)|(x|ei)P
and, as AjAmin ^ 0, we get that (x|ei) = 0 Vz > A:, thus x G V\^.^. We proceed similarly for Amax.
D
All eigenvalues can, in fact, be characterized as in Theorem 4.9. Let us order the eigenvalues, counted with their multiplicity, as Ai < A2 < • • • < An and let (ei, e 2 , . . . , en) be an orthonormal basis of corresponding eigenvectors (ei, 6 2 , . . . , en), A{ei) = XiCi Vz = 1 , . . . , n; finally, set Vk := Span{ei, e 2 , . . . , e^},
Wk := {efc,efc+i,... ,en}.
Since T4, Wk are invariant subspaces under A and Vj^ = Wk-\-i, by applying Theorem 4.9 to the restriction of {A{x)\x) on 14 and VF^, we find Ai = mm{A{x)\x), 1x1=1 Xk = max< {A{x)\x)
(4.2) |x| = 1, x G Vjt >
= min< {A{x)\x)
|a:| = 1, x e Wk \
if A: = 2 , . . . , n — 1,
An = max(A(x)|x). kl=i Moreover, if 5 is a subspace of dimension n—fc-fl, we have 5014 ¥" {0}? then there is XQ ^ S ilVk with |a;o| = 1; thus min|(A(x)|x) |x| = 1, x e s\ < {A{xo)\xo) < max< {A{x)\x)
|x| = 1, x eVk\
= Xk-
Since dim W4 = n — k -\-1 and mmxeWk{^{^)\^) = A^, we conclude with the min-max characterization of eigenvalues that makes no reference to eigenvectors. 4.10 Proposition (Courant). Let A he a self-adjoint operator on a Euclidean or Hermitian space X of dimension n and let Xi < X2 < - - - < Xn be the eigenvalues of A in nondecreasing order and counted with multiplicity. Then Xk =
max
min
l^v V / I / I I I
min m a x n A ( x ) b ) dimS=k
J
\x\ =
l^v V / I / I I I
l,xeS>. J
4.11 A variational algorithm for the eigenvectors. Prom (4.2) we know that Afc :=min{(A(x)|x)| |x| = 1, x G 14-^1},
A: = l . . . , n ,
(4.3)
where F_i = {0}. This yields an iterative procedure to compute the eigenvalues of A. For j = 1 define Ai =
m.m{A{x)\x), kl=i
and for j = 1 , . . . , n — 1 set Vj := eigenspace of Aj, Aj+i := min< {A{x)\x) | |x| = 1, x e Wj >.
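A crude numerical rendering of this variational procedure (illustrative only; the minimum of the Rayleigh quotient over W_j is computed as the smallest eigenvalue of the restriction of A to W_j):

```python
import numpy as np

A = np.array([[4., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])
n = A.shape[0]

found_vectors = np.zeros((n, 0))
found_values = []
for _ in range(n):
    # orthonormal basis Q of W_j = (span of the eigenvectors found so far)^perp
    full = np.linalg.qr(np.hstack([found_vectors, np.eye(n)]))[0]
    Q = full[:, found_vectors.shape[1]:]
    # minimize (A x | x) over unit vectors of W_j: smallest eigenvalue of Q^T A Q
    values, vectors = np.linalg.eigh(Q.T @ A @ Q)
    found_values.append(values[0])
    found_vectors = np.hstack([found_vectors, Q @ vectors[:, [0]]])

print(np.round(found_values, 6))
print(np.round(np.linalg.eigvalsh(A), 6))   # same numbers, in nondecreasing order
```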
Notice that such an algorithm yields an alternative proof of the spectral theorem. We shall see in Chapter 10 that this procedure extends to certain classes of self-adjoint operators in infinite-dimensional spaces. Finally, notice that Sylvester's theorem, Gram-Schmidt's procedure or the other variants for reducing a quadratic form to a canonical form, see Chapter 3, allow us to find the numbers of positive, negative and null eigenvalues (with multiplicity) without computing them explicitly. e. Positive operators A self-adjoint operator A : X -^ X is called positive (resp. nonnegative) if the quadratic form (j){x) := (Ax|x) is positive for x ^ 0 (resp. nonnegative). Prom the results about metrics, see Corollary 3.56, or directly from Theorem 4.9, we have the following. 4.12 Proposition. Let A: X ^^ X be self-adjoint. A is positive (nonnegative) if and only if all eigenvalues of A are positive (nonnegative) or iff there is X> 0 (X>0) such that {Ax\x) > A|xp. 4.13 Corollary. A : X ^^ X is positive self-adjoint if and only if a{x,y) = {A{x)\y) is an inner (Hermitian) product on X. 4.14 Proposition (Simultaneous diagonalization). Let A^M : X ^>X be linear self-adjoint operators on X. Suppose M is positive. Then there exists a basis (ei, e 2 , . . . , e^) of X and real numbers Ai, A2,..., An such that {M{ei)\ej) = 6ij, A{ej) = XjMcj \/iJ = 1 , . . . ,n. (4.4) Proof. The metric g{x,y) := (M(x)\y) is a scalar (Hermitian) product on X and the Unear operator M~^A : X —> X is self-adjoint with respect to g since g{M-^A{x),y)
= {MM-''A(x)\y)
= {A{x)\y) =
= {x\MM-'^A(y))
= (Mx\M-'^A(y))
{x\A{y)) =
g{x,M-'^A{y)).
Therefore, M~^A has real eigenvalues and, by the spectral theorem, there is a gorthonormal basis of X made of eigenvectors of M ~ ^ A , g{ei,ej)
= {M{ei)\ej)
= Sij,
M~^A(ej)
= XjCj \/i,j = 1 , . . . , n .
4.15 Remark. We cannot drop the positivity assumption in Proposition 4.14. For instance, if
we have det(AId — M~^A) = A^ + 1, hence M~^A has no real eigenvalue.
4.16 %. Show the following. P r o p o s i t i o n . Let X be a Euclidean space and let g,b : X x X —^R be two metrics on X. Suppose g is positive. Then there exists a basis of X that is both g-orthogonal and b-orthogonal. 4 . 1 7 ^ . Let A,M be linear self-adjoint operators and let M be positive. Then M~^A is self-adjoint with respect to the inner product g{x,y) := (M{x)\y). Show that the eigenvalues Ai, A2, • •., An of M~^^A are iteratively given by Ai =
min
g(x,x) = l
g{M
A(x))x
= min
Xyt^O
{M{x)\x)
and for J = 1 , . . . , n — 1 I V^ := eigenspace of M~^A IWj
relative to Aj,
:=iVi®V2e'-'eVj)-^,
[Xj+1 := mm{{A{x)\x)
\ {M{x)\x)
= 1, x e
Wj},
where V-*- denotes the orthogonal to V with respect to the inner product g. 4 . 1 8 f.
Show the following.
P r o p o s i t i o n . Let T be a linear operator on K^. IfT-\-T* is positive then all eigenvalues of T have positive (nonnegative) real part.
(nonnegative),
f. The operators A* A and A A* Let A : X -^Y he Si linear operator between X and Y that we assume are either both Euclidean or both Hermitian. Prom now on we shall write Ax instead of A{x) for the sake of simplicity. As usual, A* : Y ^ X denotes the adjoint of A. 4.19 Proposition. The operator A* A : X -^ X is (i) self-adjoint, (ii) nonnegative, (iii) Ax, A*Ax and {A*Ax\x) are all nonzero or all zero, in particular A* A is positive if and only if A is infective, (iv) if ui^ i/2,..., Un are eigenvectors of A* A respectively, with eigenvalues Ai, A2,..., An; then {Aui\Auj) =
\i{ui\uj),
in particular, if ui, U2,. -., Un are orthogonal to each other, then Au\,..., Aun are orthogonal to each other as well. Proof
(i) In fact, {A*A)* = A*A** = A*A.
(ii) and (iii) If Ax = 0, then trivially A*Ax = 0, and if A*Ax = 0, then (A*Ax\x) = 0. On the other hand, {A*Ax\x) = {Ax\Ax) = | ^ x p hence Ax = 0 if (A*Ax\x) = 0. (iv) In fact, {Aui\Auj)
= {A*Aui\uj)
= \i{ui\uj)
= Xi\ui\'^Sij.
D
4.20 Proposition. The operator A A* :Y ^^Y
is
(i) self'adjoint, (ii) nonnegative, (iii) A*x, AA*x and {AA*x\x) are either all nonzero or all zero, in particular AA* is positive if and only if ker A* = {0}, equivalently if and only if A is surjective. (iv) if ui, U2j' >., Un are eigenvectors of AA"^ with eigenvalues respectively, Ai, A2,..., Xny then {A''ui\A*Uj) = Xi{ui\uj), in particular, if ui, U2,.. - ^ Un are orthogonal to each other, then A*ui^..., A*Un are orthogonal to each other as well Moreover, A A* and A* A have the same nonzero eigenvalues and Rank^Tl* = R a n k ^ M = Rank A = Rank A*. In particular, Rank A A* = Rank A* A < min(dim X, dim F ) . Proof. The claims (i) (ii) (iii) and (iv) are proved as in Proposition 4.19. To prove that A* A and A A* have the same nonzero eigenvalues, notice that if X € A", x 7«^ 0, is an eigenvalue for A*A with eigenvalue A 7^ 0, A*Ax = Xx, then Ax 7^ 0 by (iii) Proposition 4.19 and AA*{Ax) = A{A*Ax) = A{Xx) = XAx, i.e., Ax is a nonzero eigenvector for A A* with the same eigenvalue A. Similarly, one proves that if 2/ 7^ 0 is an eigenvector for A A* with eigenvalue X ^ 0, then by (iii) A*y ^ 0 and A*y is an eigenvector for A* A with eigenvalue A. Finally, from the alternative theorem, we have Rank A A* = Rank A* = Rank A = Rank A M .
g. Powers of a self-adjoint operator Let A : X -^ X he self-adjoint. By the spectral theorem, there is an orthonormal basis (ei, 6 2 , . . . , e-n) of X and real numbers Ai, A2,..., A^ such that n
Ax = 2_[^j{^\^j)^j
^^ ^ ^'
By induction, one easily computes, using the eigenvectors ei, 6 2 , . . . , Cn and the eigenvalues Ai, A2,..., An of A the /c-power oi A, A^ :=^ Ao- - -oA k times, V/c > 2, as n
A^x = Y,i^i)H^\ei)
ei
Vx e X
(4.5)
i=l
4.21 Proposition. Let A: X ^^ X be self-adjoint and fc > 1. Then (i) A^ is self-adjoint, (ii) A is an eigenvalue for A if and only if A^ is an eigenvalue for A^,
(iii) X E X is an eigenvector of A with eigenvalue A if and only if x is an eigenvector for A^ with eigenvalue X^. In particular, the eigenspaces of A relative to A and of A^ relative to X^ coincide. (iv) / / A is invertihle, equivalently, if all eigenvalues of A are nonzero, then 1
A~^x = 22 T~(^kO ^i
^^ ^ ^'
.=1 ^^ 4.22 % Let A: X -^ X he self-adjoint. Show that
li p{t) = YlT=i ^kt^ ^^ ^ polynomial of degree m, then (4.5) yields m
m
P{A){x) = Y^akA^x)
n
n
= ^5^afcA^^(x|e,)e,- = 5^p(A,)(x|e,)e,-. (4.6)
k=l
k=lj=l
j=l
4.23 Proposition. Let A : X -^ X be a nonnegative self-adjoint operator and let k E N, k > 1. There exists a unique nonnegative self-adjoint operator B : X -^ X such that B'^^ = A. Moreover, B is positive if A is positive. The operator B such that B'^^ = A \s called the 2A;th root of A and is denoted by ^ \ / A . Proof. If A{x) ^ X;7=i ^jiA^j^j^
(4.5) yields B^^ == A for n
Uniqueness remains to be shown. Suppose B and C are self-adjoint, nonnegative and such that A = B"^^ = C^'^. Then B and C have the same eigenvalues and the same eigenspaces by Proposition 4.21, hence B = C. •
In particular, if A : X —^ X is nonnegative and self-adjoint, the operator square root of A is defined by n \/A{X)
:= ^2 V ^ ( ^ l ^ i ) ^ J '
X e X,
i=l
if A has the spectral decomposition Ax = X]^=i
^j{^\^j)^j'
4 . 2 4 %. Prove Proposition 4.14 by noticing that, if A and M are self-adjoint and M is positive, then M~'^/^AM~'^/^ : X —>• X is well defined and self-adjoint. 4 . 2 5 ^ . Let A,B be self-adjoint and let A be positive. Show that B is positive if S := AB -h BA is positive. [Hint: Consider A~^/^BA~^/2 and apply Exercise 4.18.]
4.1.2 Normal operators a. Simultaneous spectral decompositions 4.26 Theorem. Let X be a Euclidean or Hermitian space. If A and B are two self-adjoint operators on X that commute, A = A\
B = B\
AB = BA,
then there exists an orthonormal basis (ei, 6 2 , . . . , Cn) on X of eigenvectors of A and B, hence n
n
z = ^{z\ei)ei,
Az = ^Xi{z\ei)ei,
2=1
1=1
n
Bz =
^fii{z\ei)ei, i—1
Ai, A2,..., An G M and /ii, /Li2,..., /in ^ I^ being the eigenvalues respectively of A and B. This is proved by induction as in Theorem 4.3 on account of the following. 4.27 Proposition. Under the hypoteses of Theorem 4-26, we have (i) A and B have a common nonzero eigenvector, (ii) if V is invariant under A and B, then V-^ is invariant under A and B as well. Proof, (i) Let A be an eigenvalue of A and let V\ be the corresponding eigenspace. For all y € Vx we have ABy = BAy = XBy, i.e., By G V^. Thus V^ is invariant under B , consequently, there is an eigenvector w £ Vx oi B^y^, i.e., common to A and B. (ii) For every w G V-^ and z £ V, we have Az,Bz G V and {Aw\z) = {w\Az) = 0, {Bw\z) = {w\Bz) = 0, i.e.. Aw, Bw eV-^. • 4 . 2 8 %. Show that two symmetric matrices A, B are simultaneously diagonizable if and only if they commute A B = B A .
b. Normal operators on Hermitian spaces A linear operator on a Euclidean or Hermitian space is called normal if NN* = N*N. Of course, if we fix an orthonormal basis in X, we may represent N with an n x n matrix N 6 Mn,n(C) and N is normal if and only if N N ^ = N ^ N if X is Hermitian or N N ^ = N ^ N if X is Euclidean. The class of normal operators, though not trivial from the algebraic point of view (it is not closed for the operations of sum and composition), is interesting as it contains several families of important operators as subclasses. For instance, self-adjoint operators A^ = A^*, anti-self-adjoint operators N* = —N, and isometric operators, N*N = Id, are normal operators. Moreover, normal operators in a Hermitian space are exactly the ones that are diagonizable. In fact, we have the following.
4.29 Theorem (Spectral theorem). Let X be a Hermitian space of dimension n and let N : X -^ X he a linear operator. Then N is normal if and only if there exists an orthonormal basis of X made by eigenvectors ofN. Proof. Let (ei, 6 2 , . . . , Cn) be an orthonormal basis of X made by eigenvectors of N. Then for every z £ X n
n
Nz = ^Xj{z\ej)ej,
N*z =
j= l
^^{z\ej)ej 3= 1
hence NN*z = XI^^i |AjP(^|e-,)ej- = N*Nz. Conversely, let N + N* N -N* A:= — , B := . 2 2i It is easily seen that A and B are self-adjoint and commute. Theorem 4.26 then yields a basis of orthonormal eigenvectors of A and B and therefore of eigenvectors oi N := A+ iB and N* = A- iB. D 4.30 1 . Show that AT : C"^ -^ C"^ is normal if and only if N and N* have the same eigenspaces.
c. Normal operators on Euclidean spaces Let us translate the information contained in the spectral theorem for normal operators on Hermitian spaces into information about normal operators on Euclidean spaces. In order to do that, let us first make a few remarks. As usual, in C^ we write z — x+iy, x,y EW^ for z — {x\ -\-iyi,..., Xn + iyn)' If VF is a subspace of W^, the subspace of C"^ WeiW
:= (zeC'^\z
= x-h iy, x,y
is called the complexified ofW. Trivially, A\mc{W®iW) if F is a subspace of C^, set V
ew\ = dim^ W. Also,
:={ZGC^|ZGF}.
4.31 Lemma. A_subspace V C C^ is the complexified of a real subspace W if and only ifV = V. Proof. liV vectors
= W^
iW, trivially V = V. Conversely, if 2 € F is such that z e V, the
''have real coordinates. Set
z -\- z 2 '
^ -
W :=^xeW\x=
z — z {z/i) -h z/i 2i 2
^ ^ , 2€ V};
then it is easily seen that V = W ® iW li V = V.
For N : M^ ^ W^ we define its complexified as the (complex) linear operator Nc : C -^ C defined by Nc{z) := Nx + iNy iiz = x-\-iy. Then we easily see that (i) A is an eigenvalue of N if and only if A is an eigenvalue of Nc^ (ii) N is respectively, a self-adjoint, anti-self-adjoint, isometric or normal operator if and only if Nc is respectively, a self-adjoint, anti-selfadjoint, isometric or normal operator on C"^, (iii) the eigenvalues of N are either real, or pairwise complex conjugate; in the latter case the conjugate eigenvalues A and A have the same multiplicity. 4.32 Proposition. Let N : W^ ^^ W^ be a normal operator. Every real eigenvalue \ of N of multiplicity k has an eigenspace Vx of dimension k. In particular, V\ has an orthonormal basis made of eigenvectors. Proof. Let A be a real eigenvalue for NQ, NQZ = Xz. We have NQZ
— Nx - iNy = N^z = Xz = Xz,
i.e., z € C^ is an eigenvector of N^ with eigenvalue A if and only if 'z is also an eigenvector with eigenvalue A. The eigenspace Ex of Nc relative to A is then closed under conjugation and by Lemma 4.31 Ex '•= Wx © iWx, where
Wx:={xeR''\x=^,
z-\- z
zeEx],
and dimR Wx — dime ^x • Since N^ is diagonizable in C and W\ C VA ? we have k — dime Ex = dim Wx < dimR V^. As dimV^ ^ k, see Proposition 2.43, the claim follows.
D
4.33 Proposition. Let X be a nonreal eigenvalue of the normal operator N :W^ —^W^ with multiplicity k. Then there exist k planes of dimension 2 that are invariant under N. More precisely, i / e i , e 2 , . . . , en G C"^ are k orthonormal eigenvectors that span the eigenspace Ex of Nc relative to A and we set U2J-1 : =
Cj -\- Cj 7=—,
V2 '
U2j :=
'''
^.— ^.
V2i '
then lii, ii2,. • •, U2k Q'Te orthonormal in W^, and for j = 1,... ,k the plane Span{iA2j_i5^2j}? is invariant under N; more precisely we have
{
N{u2j-l)
= OLU2J-1 - fiU2j,
N{u2j) = /3u2j-i + au2j
where X=: a-\- i/3. Proof. Let Ex, Ej be the eigenspax:es of Nc relative to A and A. Since Nc is diagonizable on C, then ^A -L ^Jdime ^A = dime -^X ~ ^' On the other hand, for z ^ Ex
Ncz = Nx — iNy = N^z =
\z.
Therefore, z ^ Ex'ii and only if 2 G E-^. The complex subspgice F\ := Ex®Ej of C^ has dimension 2k and is closed under conjugation; Lemma 4.31 then yields Fx = Wx ^iWx where Wx:=[xeR''\x=
^ ^ ,
zeEx\
and
dimR Wx = dime E = 2k.
(4.7)
If (ei, 6 2 , . . . , efc) is an orthonormal basis of Ex, (ei, 6 2 , . . . , e^) is an orthonormal basis of Ej; since y/2ej
=: U2J-1
-\-iu2j,
V^ej = : U2j-1
• ^y'2j,
we see that {uj} is an orthonormal basis of Wx- Finally, if A := a -f- z/?, we compute
= ^+ Ae-
(Niu2j-i) = Nc{^) \N{U2J)
= Af ( ^ )
= ^ ^ ^
• = aU2j-l
- 0U2j,
= • • • = /3«2,-l + a « 2 „
i.e., Span {tt2j-1, W2j} is invariant under N.
D
Observing that the eigenspaces of the real eigenvalues and the eigenspaces of the complex conjugate eigenvectors are pairwise orthogonal, from Propositions 4.32 and 4.33 we infer the following. 4.34 Theorem. Let N be a normal operator on M.^. Then R^ is the direct sum of 1-dimensional and 2-dimensional subspaces that are pairwise orthogonal and invariant under N. In other words, there is an orthonormal basis such that the matrix N associated to N in this basis has the block structure 0 \ 0 A Ai N'
0
0
0
0
To each real eigenvalue A of multiplicity k correspond k blocks A of_dimension 1 x 1 . To each couple of complex conjugate eigenvalues A, A of multiplicity k correspond fc 2 x 2 blocks of the form a -a
(3 a
where a + if3 := A. 4.35 Corolleiry. Let A/^: M" —> R" be a normal operator. Then (i) N is self-adjoint if and only if all its eigenvalues are real, (ii) A'' is anti-self-adjoint if and only if all its eigenvalues are purely imaginary (or zero), (iii) N is an isometry if and only if all its eigenvalues have modulus one. 4 . 3 6 % Show Corollary 4.35.
4.1.3 Some representation formulas a. The operator A* A Let yl: X —> y be a linear operator between two Euclidean spaces or two Hermitian spaces and let ^* : y ^ ' X be its adjoint. As we have seen, yl*A : X —> X is self-adjoint, nonnegative and can be written as n
A" Ax =
Y^Xi{x\ei)ei 2=1
where (ei, 6 2 , . . . , Cn) is a basis of X made of eigenvectors of A*A and for each 2 = 1 , . . . , n Ai is the eigenvalue relative to e^; accordingly, we also have {A*Ay^^x
:= ^
fJ.i{x\ei) e^.
2=1
where /Xi := ^/Xi. The operator (A*A)^/^ and its eigenvalues / i i , . . . ,/Xn, called the singular values of A, play an important role in the description of A 4 . 3 7 f. Let A G Mm,n{^)' Show that ||A|| := sup|a.|^i |Ax| is the greatest singular value of A. [Hint: | A x p = (A* Ax) •x .]
4.38 Theorem (Polar decomposition). Let A : X ^^Y between two Euclidean or two Hermitian spaces.
be an operator
(i) If dimX < d i m y , then there exists an isometry U : X -^ Y, i.e., tf'U = Id, such that Moreover, if A = US with f/*C/ = Id and S* = S, then S = (A* A)^/^ and U is uniquely defined on ker S-^ = ker A-^. (ii) If dimX > dim.Y, then there exists an isometry U : Y ^y X, i.e., U*U = Id such that A = {AA*y^^U\ Moreover, ifA = SU with U*U = Id and 5* = S, then S = {AA*)^^'^ and U is uniquely defined on ker 5-^ = I m ^ . Proof. Let us show (i). Set n := d i m X and N := d i m y . First let us prove uniqueness. If A = 175 where U*U = Id and 5* = S, then A*A = S*U*US = S*S = ^ 2 , i.e., S = (A*A)i/2. Now from A = U(A*A)^/^, we infer for i = 1 , . . . , n Aid)
= t/(A*A)i/2(ei) = Uimei)
=
fiiU{ei),
if (ei, 6 2 , . . . , en) is an orthonormal basis of X of eigenvectors of {A*A)^^^ with relative eigenvalues ^ 1 , /JL2, . • •, /Xn- Hence, U(ei) = —A(ei) if/Ltj ^ 0, i.e., U is uniquely defined by A on the direct sum of the eigenspaces relative to nonzero eigenvalues of (A* A)^/^, that is, on the orthogonal of ker(A*A)^/2 = ker A. Now we shall exhibit U. The vectors A{ei),..., A{en) are orthogonal and |A(ei)| = Mi as
{A{ei)\A{ej))
= {A*A{ei)\ej)
= fJLi{ei\ej) = f^i6ij.
Let us reorder the eigenvectors and the corresponding eigenvalues in such a way that for some k, 1 < k < n, the vectors A{ei),..., A{ek) are not zero and A(ek-\-i) = - • - = A{en) = 0. For i = 1 , . . . , A: we set i;i := , . r ^ ^ and we complete t;i, t ; 2 , . . . , ^^fc to form a new orthonormal basis (vi^ V2,-.., VN) oiY. Now consider U : X -^Y defined by U(ei) :=Vi
i=l,...,n.
By construction {U{ei)\U{ej)) — Sij, i.e., U*U = Id, and, since fXi = \A{ei)\ = 0 for i > k, we conclude for every i = 1,... ,n
I yrt^t;^ = Ovi = 0
if k < i < n
(ii) follows by applying (i) to ^ * .
D
b. Singular value decomposition Combining polar decomposition and the spectral theorem we deduce the so-called singular value decomposition of a matrix A. We discuss only the real case, since the complex one requires only a few straightforward changes. Let A G MM,nW with n < N. The polar decomposition yields A = U(A^A)i/2
^i^j^
uTu _ jj
On the other hand, since A-^A is symmetric, the spectral theorem yields S e Mn.nW such that (A^A)i/2 = S^diag(/ii, / i 2 , . . . , /in)S,
S^S = Id,
where /ii, /X25 • • •, /^n are the squares of the singular values of A. Recall that the ith column of S is the eigenvector of (A*A)^/^ relative to the eigenvalue fii. then T ^ T = Id, In conclusion, if we set T := U S ^ G MNA^), S^S = Id and A = Tdiag(/ii, / i 2 , . . . , /in)S. This is the singular value decomposition of A, that is implemented in most computer hbraries on linear algebra. Starting from the singular value decomposition of A, we can easily compute, of course, (A-^A)^/^, and the polar decomposition of A. 4.39 . We notice that the singular value decomposition can be written in a more symmetric form if we extend T to a square orthogonal matrix V G MN,N(^), V ^ V = Id and extending diag (/xi, /i2, • • •, /in) to a A/^ x n matrix by adding N — n null rows at the bottom. Then, again
A = VAS where V G MATXATW, V ^ V = Id, S G Mn,n(^),
S ^ S = Id and
0
/MI 0
/i2
0
0 0
0 0
Mn
\0
0
A =
0
0/
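A sketch of obtaining the singular value decomposition and, from it, the polar decomposition A = U (A^T A)^{1/2} with NumPy (the matrix is made up; `numpy.linalg.svd` is assumed available):

```python
import numpy as np

A = np.array([[1., 2.],
              [0., 1.],
              [1., 0.]])                             # N = 3 rows, n = 2 columns

T, mu, St = np.linalg.svd(A, full_matrices=False)    # A = T diag(mu) S^T
S = St.T
assert np.allclose(A, T @ np.diag(mu) @ S.T)

# polar decomposition: A = U (A^T A)^{1/2} with U^T U = Id
sqrt_AtA = S @ np.diag(mu) @ S.T                     # (A^T A)^{1/2}
U = T @ S.T                                          # an isometry from R^2 into R^3
assert np.allclose(U.T @ U, np.eye(2))
assert np.allclose(A, U @ sqrt_AtA)
print("singular values:", mu)
```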
c. The Moore-Penrose inverse Let A : X —^Y he 3. linear operator between two Euclidean or two Hermitian spaces of dimension respectively, n and m. Denote by P:X
^kevA^
Q:Y-^lmA
and
the orthogonal projections operators to kevA-^ and 1mA. Of course Ax = Qy has at least a solution x G X for any y ^Y. Equivalently, there exists X E X such that y — Ax ± Im ^ . Since the set of solutions of Ax — Qy is a translate of ker A, we conclude that there exists a unique x := A'^y E X such that y — Ax 1. Im A, [x e
Ax = Qy,
equivalently,
(4.8)
X = Px.
kerA^,
The linear map A"^ :Y ^^ X, y —^ A^y, defined this way, i.e.,
is called the Moore-Penrose inverse oi A: X ^^Y. {AA^
From the definition
=Q,
A^A = P, ker A+ =lmA^ ImA^ =
= kerQ,
keiA^.
4.40 Proposition. A^ is the unique linear map B :Y -^ X such that AB = Q,
BA = P
and
kerB = keTQ',
(4.9)
moreover we have A^AA"^ =A^AA''
=A\
(4.10)
Proof. We prove that B = A^ by showing for s\\ y £ Y the vector x := By satisfies (4.8). The first equaUty in (4.9) yields Ax = ABy = Qy and the last two imply x = By = BQy = BAx = Px. Finally, from AA^ = Q and A^A = P^ we infer that A*AA^ = A*Q = A\
A^AA* = PA* = A*,
using also that A*Q = A* and PA* = A* since A and A* are such that ImA (kerA*)-^ and ImA* = kerA-^.
= D
The equation (4.10) allow us to compute A^ easily when A is injective or surjective. 4.41 Corollary. Let A : X ^^ Y be a linear map between Euclidean or Hermitian spaces of dimension n and m, respectively. (i) If ker A = {0}, then n <m, A* A is invertible and A^ =
{A*A)-^A*;
moreover^ if A = [/(A*A)^/^ is the polar decomposition of A, then A^ = {A*A)-^/^U\ (ii) If ker A* = {0}, then n>m, AA* is invertible, and
moreover, if A = {AA*)^^'^U* is the polar decomposition of A, then At = C/(AA*)-V2. For more on the Moore-Penrose inverse, see Chapter 10.
4.2 Some Applications In this final section, we illustrate methods of linear algebra in a few specific examples.
4.2.1 The method of least squares a. The method of least squares Suppose we have m experimental data yi, 2/2, • • •, Vm when performing an experiment of which we have a mathematical model that imposes that the data should be functions, (/>(x), of a variable x e X. Our problem is that of finding x G X in such a way that the theoretical data 0(x) be as close as possible to the data of the experiment. We can formahze our problem as follows. We list the experimental data as a vector y = (yi, 2/2, • • •, 2/m) G W^ and represent the mathematical
4.2 Some Applications
model as a map 0 : X —> W^. Then, we introduce a cost function C = C{(j){x)^y) that evaluates the error between the expected result when the parameter is x, and the experimental data. Our problem then becomes that of finding a minimizer of the cost function C. If we choose (i) the model of the data to be linear^ i.e., X is a vector space of dimension n and (j) = A\ X -^ W^ is a linear operator, (ii) as cost function, the function square distance between the expected and the experimental data, C{x) = \Ax - 2/|2 = {Ax - y\Ax - y),
(4.11)
we talk of the {linear) least squares problem. 4.42 Theorem. Let X and Y he Euclidean spaces, A \ X ^^ Y a linear map, y EY and C : X ^^R the function C{x) := \Ax-y\Y^
x e X.
The following claims are equivalent (i) X is a minimizer of C, (ii) y - Ax ± 1mA, (iii) X solves the canonical equation A*{Ax-y)
= 0.
(4.12)
Consequently C has at least a minimizer in X and the space of minimizers of C is a translate o/ker A. Proof. Clearly minimizing is equivalent to finding z = Ax G I m A of least distance from y. By the orthogonal projection theorem, x is a minimizer if and only if Ax is the orthogonal projection of y onto Im A. We therefore deduce that a minimizer x G X for C exists, that for two minimizers xi,X2 of C we have Axi = Ax2, i.e., x i — X2 6 ker A and that (i) and (ii) are equivalent. Finally, since ImA-*- = kerA*, (ii) and (iii) are clearly equivalent. • 4 . 4 3 R e m a r k . The equation (4.12) expresses the fa-ct that the function x —>• | Aa: — 6p is stationary at a minimizer. In fact, compare 3.65, since Vx{z\x) = z and Vx(^x\x) = 2La; if L is self-adjoint, we have \Ax - 6|2 = |6|2 _ 2(6|Ax) + |Ax|2, V(6|Ax) = V(A*6|x) = A*6, Vx|Ax|2 = Vx{A*Ax\x)
=
2A*Xx
hence Vx\h-Ax\^
=
2A*{Ax-h).
As a consequence of Theorem 4.42 on account of (4.8) we can state the following 4.44 Corollary. The unique minimizer of C{x) = \Ax — y|y in Im A* = ker A-^ is X = A^y.
b. The function of linear regression Given m vectors xi, X2,. • •, Xm in a Euclidean space X and m corresponding numbers yi, 2/2, • • •, 2/m, we want to find a linear map L : X -^R that minimizes the quantity m
nL):='£\yi-Lixi)\\ i=l
This is in fact a dual formulation of the linear least squares problem. By Riesz's theorem, to every linear map L : X —> R corresponds a unique vector WL ^ X such that L{y) := {y\wL)j and conversely. Therefore, we need to find w G X such that m
C{w) := ^Ivi
- {xi\w)\'^ -^ min.
2=1
If y := (2/1, 2/2, • • •, 2/m) ^ R"^ and A: X —^ W^ is the linear map Aw := [{xi\w), {X2\w),...
{xn\w)j,
we are again seeking a minimizer of C : X —> M C{w) :=\y-Aw\^,
w e X.
Theorem 4.42 tells us that the set of minimizers is nonempty, it is a translate of ker A and the unique minimizer of C in ker A-^ = Im ^* is if; := A^y. Notice that n
A*a=j2
^i^i^
^=(«^
^^ • • -«"") ^ ^"^
2=1
hence, w £ IxnA'' = ker A"^ if and only if if; is a linear combination of xi, X2,..., Xm- We therefore conclude that A'^y is the unique minimizer of the cost function C that is a linear combination o / x i , 0:2,..., Xm- The corresponding linear map L{x) := {x\A^y) is called the function of linear regression.
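A short sketch of the least squares problem and of the regression map (the data are made up); `numpy.linalg.lstsq` returns precisely the minimizer w = A^+ y selected above:

```python
import numpy as np

# data vectors x_i in R^2 (rows of X) with measured values y_i
X = np.array([[1., 0.],
              [1., 1.],
              [2., 1.],
              [0., 3.]])
y = np.array([1.1, 1.9, 3.2, 2.8])

# minimizer of |X w - y|^2 lying in Im X^T = (ker X)^perp, i.e., w = X^+ y
w, residual, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(w)

# equivalently, w solves the canonical equation X^T (X w - y) = 0
print(np.round(X.T @ (X @ w - y), 10))

L = lambda x: x @ w                        # the function of linear regression
print(L(np.array([1., 2.])))
```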
4.2.2 Trigonometric polynomials Let us reconsider in the more abstract setting of vector spaces some of the results about trigonometric polynomials, see e.g.. Section 5.4.1 of [GM2]. Let Vn,2Tr be the class of trigonometric polynomials of degree m with complex coefficients n
Vn,2ir ••= [Pix) = ^
Cke''"' I Cfc e C, A; = - n , . . . , n } .
Recall that the vector {c-n,. "•,Cn) G C^"^"^^ is called the spectrum of P{^) — Z]fc=-n ^k^^^^' Clearly, Pn,27r is a vector space over C of dimension at most 2n + 1. The function (P|Q) : VU^-K X ^n,27r -^ C defined by
is a Hermitian product on Vn,2T^ that makes Pn,27r a Hermitian space. Since
(gifcx|^z/.x)=1. r ^i{k-h)x ^^ ^ ^^^^ see Lemma 5.45 of [GM2], we have the following. 4.45 Proposition. The trigonometric polynomials {e^^^}k=-n,n form an orthonormal set of2n-\-l vectors in Pn,27r o,nd we have the following. (i) T^n,27T is a Hermitian space of dimension 2n + 1. (ii) The map ^: 'Pn,27r -^ C^^"^^, that maps a trigonometric polynomial to its spectrum is well defined since it is the coordinate system in Vn,27r relative to the orthonormal basis {e'^^^}. In particular^ : Vn,2n -^ £2n-\-i j^g ^ (complex) isometry. (iii) (FOURIER COEFFICIENTS) For k = - n , ...,n we have
1 n
Cfe = (P|e"=") = — / (iv)
P{t)e-''''dt.
( E N E R G Y IDENTITY)
i- r \P{t)fdt = \\P\f := (P|P) = J2 |(P|e''=-)|2 = f; \Ck?. k=—n
k=—n
a. Spectrum and products Let P{x) = Y2=-n ^ke'^"" and Q{x) = Y2=-n dke'^'' be two trigonometric polynomials of order n. Their product is the trigonometric polynomial of order 2n
k=—n
/e=—n
h,k=—n
2n
= E ( E -Hd,)e'''. p=-2n
h-\-k=p
If we denote by {ck} * {dk} the product in the sense of Cauchy of the spectra of P and Q, we can state the following.
4.46 Proposition. The spectrum of P{x)Q{x) is the product in the sense of Cauchy of the spectra of P and Q (PQ)k = Pk^Qk4.47 Definition. The convolution product of P and Q is defined by p * Q{x) := ^
r
P{x +
t)Q{i)dt
^TT J-n
Notice that the operation (P^Q) ^^ P *Q is hnear in the first factor and antihnear in the second one. We have 4.48 Proposition. P^Q is a trigonometric polynomial of degree n. Moreover the spectrum of P^Q is the term-by-term product of the spectra of P and Q, {p7Q)^:=P,Qk. Proof. In fact for h, k = — n , . . . , n, we have
27r y_7r
27r J - T T
hence, if P{x) = Efc=_n Cfce^^=" and Q{x) = E f c = - n ^fce^''^, we have
P^Q{x)=
f^
f2 ^hdi^^hke'"''= Y.
h=—nk=—n
^fc^^'""-
k=—n
b. Sampling of trigonometric polynomials A trigonometric polynomial of degree n can be reconstructed from its values on a suitable choice of its values on 2n + 1 points, see Section 5.4.1 r - := — o^+li? 27r . J = "~^?..., n, then the sampling map of [GM2]. Set Xj C : Vn,2n
-- C ^ ^ + l ,
C{P)
:= ( P ( x _ n ) , • • • ,
is invertible, in fact, see Theorem 5.49 of [GM2], 1 '^ ^^""^ • " 2 ; r f T ^
P{xj)Dn{x-Xj)
3 = -n
where Dn{t) is the Dirichlet kernel of order n Dn{t)—
Yl
e^^* = l 4 - 2 ^ c o s A : t .
P{Xn))
4.2 Some Applications
133
Spectrum
Trigonometric polynomials of degree n
£2n-\-l
E
n
ikt
Samplings £2n+l
IDFT
Figure 4.1. The scenario of trigonometric polynomials, spectra and samples.
4.49 Proposition. K,27r given by
J^^^C
V2n-\-lC~'^(z)(x)
and its inverse yjin + 1C~^ : C^"''^^
:= , \^ ZiDJx - Xj) >/2nTT .^1^ -^ '
are isometries between Vn,2n O'Tid C^"^"^^. Proof. In fact, C maps e* '^*, k = — n , . . . , n, to an orthonormal basis of C^'^"'"^:
Prom the samples, we can directly compute the spectrum of P, 4.50 Proposition. Let P{x) e Vn,2n CL'^^d Xj := 2 ^ ^ j ? j = —n, . . . , n . Then
^ f^ P(t)e-'^' dt = ^ ^ J2 P{xj)e-''^^.
(4.13)
Proof. Since (4.13) is linear in P , it suffices to prove it when P{x) = e*'^*, h = — n , . . . , n. In this case, we have ^ J^^ P{t)e~'^ ^^ dt = 6hk ^^^
3=-n
since Dn{xj)
J=-n
= 0 for j ^ 0, j e [-n,n] and DnCO) = 2n + 1.
D
c. The discrete Fourier transform The relation between the values {P{tj)} of P e 'Pn,27r at the 2n + 1 points tj and the spectrum P of P in the previous paragraph is a special case of the so-called discrete Fourier transform. For each positive integer N^ consider the 27r-periodic function EN{t) : R -^ C given by „ /^N v^^ ikt 1 ^ EN{t):=2^e'^' = \ , k=o [ i-eit
if H s a multiple of 27r,
. . (4.14)
Otherwise.
Let uj = e*i5^ and let l,a;,c 0,
.^ ^^.
/O = 0, / i = 1, that is given by 1 / / l + \/5\n
/l-y/5\n
(4.22)
see e.g., [GM2]. Let us find it again as an application of the above. Set fn Un-i-1 then
F.,.H^^^M= /n+2/
'-''
\/n+/n+l7
0^ ' 1 and, F„=A»(; where 0 1
1 1
]=r
\1
^ ^n, 1
4.2 Some Applications
141
The characteristic polynomial of A is det(AId — A) = A(A — 1) — 1, hence A has two distinct eigenvalues l-hv/5 l-v/5 2 ' ^ 2 An eigenvalue relative to A is (1, A) and an eigenvector relative to /x is (1, /x). The matrix A is diagonizable a s A = S A S ~ ^ where
^X
IJLJ
A -/i
y A
-ly
\0
/x
It follows that
1
A
A — /x \^A
l \ / A'^ ii)
\—iJL^
Consequently,
'" = I^(--''") = 7!((H^)"-(^)") 4,2.4 An ODE system: small oscillations Let x i , X 2 , . . . , xjv be N point masses in M^ each respectively, to a nonzero mass m i , m 2 , . . . , TRN. Assume that each point exerts a force on the other points according to Hookers law^ i.e., the force exerted by the mass at Xj on Xi is proportional to the distance of Xj from x^ and directed along the line through x^ and Xj,
By Newton's reaction law, the force exerted by $x_i$ on $x_j$ is equal and opposite in direction, $f_{ji} = -f_{ij}$; consequently the elastic constants $k_{ij}$, $i\neq j$, satisfy the symmetry condition $k_{ij} = k_{ji}$. In conclusion, the total force exerted by the system on the mass at $x_i$ is
$$f_i = \sum_{\substack{j=1,\dots,N\\ j\neq i}} k_{ij}\,(x_j - x_i) = \sum_{\substack{j=1,\dots,N\\ j\neq i}} k_{ij}\,x_j - \Big(\sum_{\substack{j=1,\dots,N\\ j\neq i}} k_{ij}\Big)\,x_i.$$
Newton's equation then takes the form
$$m_i\,x_i'' - f_i = 0, \qquad i = 1,\dots,N, \tag{4.23}$$
with the particularity that the $j$th component of the force on $x_i$ depends only on the $j$th components of the positions of the masses. The system then splits into 3 systems of $N$ equations of second order, one for each coordinate. If we use matrix notation, things simplify. Denote by $M := \operatorname{diag}\{m_1, m_2,\dots,m_N\}$
the positive diagonal matrix of masses, by $K\in M_{N,N}(\mathbb{R})$ the symmetric matrix of elastic constants, with entries
$$K_{ij} := -k_{ij} \ \ (i\neq j), \qquad K_{ii} := \sum_{j\neq i} k_{ij},$$
so that $f_i = -\sum_{j=1}^{N} K_{ij}\,x_j$, and by $X^j(t)\in\mathbb{R}^N$ the vector of the $j$th coordinates of the points $x_1,\dots,x_N$,
$$X^j := (x_1^j,\dots,x_N^j)^T, \qquad x_i =: (x_i^1, x_i^2, x_i^3),$$
i.e., the columns of the matrix
$$X(t) := \big[X^1(t)\,\big|\,X^2(t)\,\big|\,X^3(t)\big] \in M_{N,3}(\mathbb{R}).$$
Then (4.23) transforms into the system of equations
$$M\,X''(t) + K\,X(t) = 0, \tag{4.24}$$
where the product is the row-by-column product. Finally, if $X''(t)$ denotes the matrix of the second derivatives of the entries of $X(t)$, the system (4.23) can be written as
$$X''(t) + M^{-1}K\,X(t) = 0 \tag{4.25}$$
in the unknown $X : \mathbb{R}\to M_{N,3}(\mathbb{R})$. Since $M^{-1}K$ is similar to the symmetric matrix $M^{-1/2}K M^{-1/2}$, it is diagonalizable with real eigenvalues: there are a basis $u_1, u_2,\dots,u_N$ of $\mathbb{R}^N$ and real numbers $\lambda_1, \lambda_2,\dots,\lambda_N$ such that
$$M^{-1}K\,u_j = \lambda_j\,u_j.$$
Denoting by $P_j$ the projection operator onto $\mathrm{Span}\{u_j\}$ along the remaining eigenvectors, we also have
$$\mathrm{Id} = \sum_{j=1}^{N} P_j, \qquad M^{-1}K = \sum_{j=1}^{N} \lambda_j\,P_j.$$
Thus, projecting (4.25) onto $\mathrm{Span}\{u_j\}$ we find
$$0 = P_j(0) = P_j\big(X'' + M^{-1}K\,X\big) = (P_j X)'' + \lambda_j\,(P_j X), \qquad \forall j = 1,\dots,N,$$
i.e., the system (4.25) splits into $N$ second order equations, each in the unknowns of the matrix $P_j X(t)$. Since $K$ is positive semidefinite, the eigenvalues $\lambda_j$ are nonnegative; when $\lambda_j > 0$, each entry of the matrix $P_j X(t)$ is a solution of the harmonic oscillator equation $y'' + \lambda_j y = 0$, hence
$$P_j X(t) = \cos\big(\sqrt{\lambda_j}\,t\big)\,P_j X(0) + \frac{\sin\big(\sqrt{\lambda_j}\,t\big)}{\sqrt{\lambda_j}}\,P_j X'(0)$$
(for $\lambda_j = 0$ the second coefficient is read as its limit, $t$). In conclusion, since $\mathrm{Id} = \sum_{j=1}^{N} P_j$, we have
$$X(t) = \sum_{j=1}^{N} P_j X(t) = \sum_{j=1}^{N}\left[\cos\big(\sqrt{\lambda_j}\,t\big)\,P_j X(0) + \frac{\sin\big(\sqrt{\lambda_j}\,t\big)}{\sqrt{\lambda_j}}\,P_j X'(0)\right]. \tag{4.26}$$
The numbers $\sqrt{\lambda_1}/(2\pi),\dots,\sqrt{\lambda_N}/(2\pi)$ are called the proper frequencies of the system. We may also use a functional notation,
$$\cos\big(t\sqrt{A}\big) := \sum_{n=0}^{\infty} (-1)^n \frac{t^{2n}}{(2n)!}\,A^n, \qquad \frac{\sin\big(t\sqrt{A}\big)}{\sqrt{A}} := \sum_{n=0}^{\infty} (-1)^n \frac{t^{2n+1}}{(2n+1)!}\,A^n,$$
and we can write (4.26) as
$$X(t) = \cos\big(t\sqrt{A}\big)\,X(0) + \frac{\sin\big(t\sqrt{A}\big)}{\sqrt{A}}\,X'(0),$$
where $A := M^{-1}K$.
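A minimal computational sketch of the procedure above (not from the book): two unit masses joined by a single spring with constant $k_{12}=1$ is a hypothetical toy system chosen only for illustration; the proper frequencies come from the eigenvalues of $A = M^{-1}K$, and the solution is assembled from the spectral projections.

```python
import numpy as np

M = np.diag([1.0, 1.0])                        # masses m_1, m_2 (assumed equal to 1)
K = np.array([[ 1.0, -1.0],
              [-1.0,  1.0]])                   # stiffness matrix of a single spring k_12 = 1
A = np.linalg.inv(M) @ K                       # A = M^{-1} K (symmetric here since M = Id)

lam, U = np.linalg.eigh(A)                     # eigenvalues 0 and 2
print("proper frequencies:", np.sqrt(np.clip(lam, 0, None))/(2*np.pi))

def X(t, X0, V0):
    # X(t) = cos(t sqrt(A)) X0 + sin(t sqrt(A))/sqrt(A) V0, assembled from the projections P_j;
    # for the zero eigenvalue the coefficient sin(t sqrt(l))/sqrt(l) is read as its limit t.
    out = np.zeros_like(X0)
    for l, u in zip(lam, U.T):
        P = np.outer(u, u)                     # orthogonal projection onto Span{u}
        s = np.sqrt(max(l, 0.0))
        out = out + np.cos(s*t)*(P @ X0) + (np.sin(s*t)/s if s > 1e-12 else t)*(P @ V0)
    return out

print("X(1.0) =", X(1.0, np.array([1.0, 0.0]), np.array([0.0, 0.0])))
```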
4.3 Exercises

4.57 ¶. Let $A$ be an $n\times n$ matrix and let $\lambda$ be its eigenvalue of greatest modulus. Show that $|\lambda|\le \sup_i\big(|a_{i1}| + |a_{i2}| + \cdots + |a_{in}|\big)$.

4.58 ¶ Gram matrix. Let $\{f_1, f_2,\dots,f_m\}$ be $m$ vectors in $\mathbb{R}^n$. The matrix $G = [g_{ij}]\in M_{m,m}(\mathbb{R})$ defined by $g_{ij} := (f_i|f_j)$ is called Gram's matrix. Show that $G$ is nonnegative, and that it is positive if and only if $f_1, f_2,\dots,f_m$ are linearly independent.

4.59 ¶. Let $A, B : \mathbb{C}^n\to\mathbb{C}^n$ be self-adjoint and let $A$ be positive. Show that the eigenvalues of $A^{-1}B$ are real. Show also that $A^{-1}B$ is positive if $B$ is positive.

4.60 ¶. Let $A = [a_{ij}]\in M_{n,n}(\mathbb{K})$ be self-adjoint and positive. Show that $\det A\le(\operatorname{tr}A/n)^n$ and deduce $\det A\le\prod_{i=1}^{n} a_{ii}$. Moreover, since $\operatorname{Rank}A = n$, select $n$ vectors $u_1, u_2,\dots,u_n\in\mathbb{K}^n$ such that $Au_1,\dots,Au_n\in\mathbb{K}^n$ are orthonormal. [Hint: Find $U\in M_{n,n}(\mathbb{K})$ such that $AU$ is an isometry.]
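The claim of Exercise 4.58 is easy to observe numerically. A minimal sketch (not from the book; the sample vectors are arbitrary): for independent vectors the Gram matrix has strictly positive eigenvalues, while adding a dependent vector makes one eigenvalue vanish.

```python
import numpy as np

rng = np.random.default_rng(3)
F_indep = rng.normal(size=(3, 5))                        # three (generically independent) vectors of R^5
F_dep = np.vstack([F_indep, F_indep[0] + F_indep[1]])    # add a linearly dependent fourth vector

for F in (F_indep, F_dep):
    G = F @ F.T                                          # Gram matrix, g_ij = (f_i | f_j)
    eig = np.linalg.eigvalsh(G)
    print("nonnegative:", bool(np.all(eig > -1e-10)), " positive:", bool(np.all(eig > 1e-10)))
```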
4.71 ¶. Let $A\in M_{N,n}(\mathbb{R})$ and $A = U\Lambda V$, where $U\in O(N)$, $V\in O(n)$. According to 4.39, show that $A^{+} = V^{T}\Lambda' U^{T}$, where
$$\Lambda' = \begin{pmatrix} \frac{1}{\mu_1} & & & 0\\ & \ddots & & \\ & & \frac{1}{\mu_k} & \\ 0 & & & 0\end{pmatrix}$$
(all remaining entries equal to zero), $\mu_1, \mu_2,\dots,\mu_k$ being the nonzero singular values of $A$.

4.72 ¶. For $u : \mathbb{R}\to\mathbb{R}^2$, discuss the system of equations
$$\frac{d^2 u}{dt^2} + \begin{pmatrix} 2 & -1\\ -1 & 2\end{pmatrix} u = 0.$$
4.73 ¶. Let $A\in M_{n,n}(\mathbb{R})$ be a symmetric matrix. Discuss the following systems of ODEs,
$$x'(t) + A\,x(t) = 0, \qquad -i\,x'(t) + A\,x(t) = 0, \qquad x''(t) + A\,x(t) = 0,$$
where $A$ is positive definite in the third system, and show that the solutions are given respectively by
$$e^{-tA}x(0), \qquad e^{-itA}x(0), \qquad \cos\big(t\sqrt{A}\big)\,x(0) + \frac{\sin\big(t\sqrt{A}\big)}{\sqrt{A}}\,x'(0).$$

4.74 ¶. Let $A$ be symmetric. Show that for the solutions of $x''(t) + A\,x(t) = 0$ the energy is conserved. Assuming $A$ positive, show that $|x(t)| < E/\lambda_1$, where $E$ is the energy of $x(t)$ and $\lambda_1$ the smallest eigenvalue of $A$.

4.75 ¶. Let $A$ be a Hermitian matrix. Show that $|x(t)| = \mathrm{const}$ if $x(t)$ solves the Schrödinger equation $i\,x' + A\,x = 0$.
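A minimal sketch (not from the book; the random matrix, initial data and sample times are arbitrary) illustrating Exercises 4.73 and 4.74: the spectral solution of $x'' + Ax = 0$ and the conservation of the energy $E = \frac12|x'|^2 + \frac12(Ax|x)$ along it.

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.normal(size=(3, 3))
A = B @ B.T + np.eye(3)                                   # a symmetric, positive definite matrix
lam, U = np.linalg.eigh(A)
x0, v0 = rng.normal(size=3), rng.normal(size=3)

def x(t):     # x(t)  = cos(t sqrt(A)) x0 + sin(t sqrt(A))/sqrt(A) v0
    return U @ (np.cos(np.sqrt(lam)*t)*(U.T @ x0) + np.sin(np.sqrt(lam)*t)/np.sqrt(lam)*(U.T @ v0))

def xdot(t):  # x'(t) = -sqrt(A) sin(t sqrt(A)) x0 + cos(t sqrt(A)) v0
    return U @ (-np.sqrt(lam)*np.sin(np.sqrt(lam)*t)*(U.T @ x0) + np.cos(np.sqrt(lam)*t)*(U.T @ v0))

energy = lambda t: 0.5*xdot(t) @ xdot(t) + 0.5*x(t) @ (A @ x(t))
print([round(energy(t), 10) for t in (0.0, 0.7, 1.9, 5.3)])   # constant along the motion
```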
Part II
Metrics and Topology
Felix Hausdorff (1869-1942), Maurice Fréchet (1878-1973) and René-Louis Baire (1874-1932).
5. Metric Spaces and Continuous Functions
The rethinking process of the infinitesimal calculus, which started with the definition of the limit of a sequence by Bernhard Bolzano (1781-1848) and Augustin-Louis Cauchy (1789-1857) at the beginning of the XIX century, and was carried on with the introduction of the system of real numbers by Richard Dedekind (1831-1916) and Georg Cantor (1845-1918) and of the system of complex numbers, with the parallel development of the theory of functions by Camille Jordan (1838-1922), Karl Weierstrass (1815-1897), J. Henri Poincaré (1854-1912), G. F. Bernhard Riemann (1826-1866), Jacques Hadamard (1865-1963), Émile Borel (1871-1956), René-Louis Baire (1874-1932) and Henri Lebesgue (1875-1941) during the whole of the XIX and the beginning of the XX century, led to the introduction of new concepts such as open and closed sets, the point of accumulation and the compact set. These notions found their natural place, and their correct generalization, in the notion of a metric space, introduced by Maurice Fréchet (1878-1973) in 1906 and eventually developed by Felix Hausdorff (1869-1942) together with the more general notion of topological space. The intuitive notion of a "continuous function" probably dates back to the classical age. It corresponds to the notion of deformation without "tearing". A function from X to Y is more or less considered to be continuous if, when x varies slightly, the target point y = f(x) also varies slightly. The critical analysis of this intuitive idea also led, with Bernhard Bolzano (1781-1848) and Augustin-Louis Cauchy (1789-1857), to the correct definition of continuity and of the limit of a function, and to the study of the properties of continuous functions. We owe the theorem of intermediate values to Bolzano and Cauchy; around 1860 Karl Weierstrass proved that continuous functions take on maximum and minimum values in a closed and bounded interval, and in 1870 Eduard Heine (1821-1881) studied uniform continuity. The notion of a continuous function also appears in the work of J. Henri Poincaré (1854-1912) in an apparently totally different context, the so-called analysis situs, which is today's topology and algebraic topology. For Henri Poincaré, analysis situs is the science that enables us to know the qualitative properties of geometrical figures. Poincaré referred to the properties that are preserved when geometrical figures undergo any kind of deformation except those that introduce tearing and glueing of points. An intuitive idea of some of these aspects may be provided by the following examples.
Figure 5.1. Frontispieces of Les espaces abstraits by Maurice Fréchet (1878-1973) and of the Mengenlehre by Felix Hausdorff (1869-1942).
o Let us draw a disc on a rubber sheet. No matter how one pulls at the rubber sheet, without tearing it, the disc stays whole. Similarly, if one draws a ring, any way one pulls the rubber sheet without tearing or glueing any points, the central hole is preserved. Let us think of a loop of string that surrounds an infinite pole. In order to separate the string from the pole one has to break one of the two. Even more, if the string is wrapped several times around the pole, the linking number between string and pole is constant, regardless of the shape of the coils.
o We have already seen Euler's formula for polyhedra in [GM1]. It is a remarkable formula whose context is not classical geometry. It was Poincaré who extended it to all surfaces of the type of the sphere, i.e., surfaces that can be obtained as continuous deformations of a sphere without tearing or glueing.
o $\mathbb{R}$, $\mathbb{R}^2$, $\mathbb{R}^3$ are clearly different objects as linear vector spaces. As we have seen, they have the same cardinality and are thus indistinguishable as sets. Therefore it is impossible to give meaning to the concept of dimension if one stays inside the theory of sets. One can show, instead, that their algebraic dimension is preserved by deformations without tearing or glueing.
At the core of this analysis of geometrical figures we have the notion of a continuous deformation, which corresponds to the notion of a continuous one-to-one map whose inverse is also continuous, called a homeomorphism. We have already discussed some relevant properties of continuous functions $f:\mathbb{R}\to\mathbb{R}$ and $f:\mathbb{R}^2\to\mathbb{R}$ in [GM1] and [GM2]. Here we shall discuss continuity in a sufficiently general context, though not in the most general one.
Poincaré himself was convinced of the enormous importance of extending the methods and ideas of his analysis situs to more than three dimensions.

... L'analysis situs à plus de trois dimensions présente des difficultés énormes; il faut, pour tenter de les surmonter, être bien persuadé de l'extrême importance de cette science. Si cette importance n'est pas bien comprise de tout le monde, c'est que tout le monde n'y a pas suffisamment réfléchi.^

In the first twenty years of the twentieth century, with the contribution, among others, of David Hilbert (1862-1943), Maurice Fréchet (1878-1973), Felix Hausdorff (1869-1942), Pavel Alexandroff (1896-1982) and Pavel Urysohn (1898-1924), the fundamental role of the notion of an open set in the study of continuity was made clear, and general topology was developed as the study of the properties of geometrical figures that are invariant with respect to homeomorphisms, thus linking back to Euler who, in 1736, had solved the famous problem of the Königsberg bridges with a topological method. There are innumerable successive applications, so much so that continuity and the structures related to it have become one of the most pervasive languages of mathematics. In this chapter and in the next, we shall discuss topological notions and continuity in the context of metric spaces.
5.1 Metric Spaces

5.1.1 Basic definitions

a. Metrics

5.1 Definition. Let $X$ be a set. A distance or metric on $X$ is a map $d : X\times X\to\mathbb{R}_+$ for which the following conditions hold:
(i) (Identity) $d(x,y) > 0$ if $x\neq y\in X$, and $d(x,x) = 0$ $\forall x\in X$.
(ii) (Symmetry) $d(x,y) = d(y,x)$ $\forall x,y\in X$.
(iii) (Triangle inequality) $d(x,y)\le d(x,z) + d(z,y)$ $\forall x,y,z\in X$.
A metric space is a set $X$ with a distance $d$. Formally, we say that $(X,d)$ is a metric space if $X$ is a set and $d$ is a distance on $X$. The properties (i), (ii) and (iii) are often called the metric axioms.
^ The analysis situs in more than three dimensions presents enormous difficulties; in order to overcome them one has to be strongly convinced of the extreme importance of this science. If its importance is not well understood by everyone, it is because they have not sufficiently thought about it.
Figure 5.2. Time as distance.
5.2 Example. The Euclidean distance $d(x,y) := |x-y|$, $x,y\in\mathbb{R}$, is a distance on $\mathbb{R}$. On $\mathbb{R}^2$ and $\mathbb{R}^3$ distances are defined by the Euclidean distance, given for $n = 2,3$ by
$$d(x,y) := \Big(\sum_{i=1}^{n} |x_i - y_i|^2\Big)^{1/2},$$
where $x := (x_1,x_2)$, $y := (y_1,y_2)$ if $n=2$, or $x := (x_1,x_2,x_3)$, $y := (y_1,y_2,y_3)$ if $n=3$. In other words, $\mathbb{R}$, $\mathbb{R}^2$, $\mathbb{R}^3$ are metric spaces with the Euclidean distance.

5.3 Example. Imagine $\mathbb{R}^3$ as a union of strips $E_n := \{(x_1,x_2,x_3)\mid n\le x_3 < n+1\}$, made of materials with different indices of refraction $v_n$. The time $t(A,B)$ needed for a light ray to go from $A$ to $B$ in $\mathbb{R}^3$ defines a distance on $\mathbb{R}^3$, see Figure 5.2.
5.4 Example. In the infinite cylinder $C = \{(x,y,z)\mid x^2 + y^2 = 1\}\subset\mathbb{R}^3$, we may define a distance between two points $P$ and $Q$ as the minimal length of the lines on $C$, or geodesics, connecting $P$ and $Q$. Observe that we can always cut the cylinder along a directrix in such a way that the curve is not touched. If we unfold the cut cylinder onto a plane, the distance between $P$ and $Q$ is the Euclidean distance of the two image points.

5.5 ¶. Of course $100\,|x-y|$ is also a distance on $\mathbb{R}$; only the scale factor has changed. More generally, if $f:\mathbb{R}\to\mathbb{R}$ is an injective map, then $d(x,y) := |f(x) - f(y)|$ is again a distance on $\mathbb{R}$.
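A minimal sketch (not from the book; the choice $f = \arctan$ and the sample points are arbitrary) spot-checking the metric axioms of Definition 5.1 for the distance $d(x,y) = |f(x)-f(y)|$ of 5.5 on finitely many points.

```python
import itertools
import math
import random

f = math.atan                                   # an injective function on R
d = lambda x, y: abs(f(x) - f(y))

random.seed(0)
pts = [random.uniform(-10, 10) for _ in range(30)]

for x, y, z in itertools.product(pts, repeat=3):
    assert d(x, x) == 0
    assert x == y or d(x, y) > 0                 # identity
    assert d(x, y) == d(y, x)                    # symmetry
    assert d(x, y) <= d(x, z) + d(z, y) + 1e-15  # triangle inequality (tiny rounding slack)
print("metric axioms hold on the sampled points")
```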
5.6 Definition. Let $(X,d)$ be a metric space. The open ball or spherical open neighborhood centered at $x_0\in X$ of radius $\rho>0$ is the set
$$B(x_0,\rho) := \{x\in X\mid d(x,x_0) < \rho\}.$$
Figure 5.3. Metrics on a cylinder and on the boundary of a cube.
Notice the strict inequality in the definition of $B(x,r)$. In $\mathbb{R}$, $\mathbb{R}^2$, $\mathbb{R}^3$ with the Euclidean metric, $B(x_0,r)$ is, respectively, the open interval $]x_0-r, x_0+r[$, the open disc of center $x_0\in\mathbb{R}^2$ and radius $r>0$, and the ball bounded by the sphere of $\mathbb{R}^3$ of center $x_0\in\mathbb{R}^3$ and radius $r>0$. We say that a subset $E\subset X$ of a metric space is bounded if it is contained in some open ball. The diameter of $E\subset X$ is given by
$$\operatorname{diam} E := \sup\{d(x,y)\mid x,y\in E\},$$
and, trivially, $E$ is bounded iff $\operatorname{diam} E < +\infty$. Despite the suggestive language, the open balls of a metric space need not be either round or convex; however, they have some of the usual properties of discs in $\mathbb{R}^2$. For instance:
(i) $B(x_0,r)\subset B(x_0,s)$ $\forall x_0\in X$ and $0 < r\le s$,
(ii) $\bigcup_{r>0} B(x_0,r) = X$ $\forall x_0\in X$,
(iii) $\bigcap_{r>0} B(x_0,r) = \{x_0\}$ $\forall x_0\in X$,
(iv) $\forall x_0\in X$ and $\forall z\in B(x_0,r)$ the open ball centered at $z$ with radius $\rho := r - d(x_0,z) > 0$ is contained in $B(x_0,r)$,
(v) for every couple of balls $B(x,r)$ and $B(y,s)$ with nonvoid intersection and $\forall z\in B(x,r)\cap B(y,s)$, there exists $t>0$ such that $B(z,t)\subset B(x,r)\cap B(y,s)$; in fact $t := \min(r - d(x,z),\, s - d(y,z))$ works,
(vi) for every $x,y\in X$ with $x\neq y$ the balls $B(x,r_1)$ and $B(y,r_2)$ are disjoint if $r_1 + r_2 < d(x,y)$.

5.7 ¶. Prove the previous claims. Notice how essential the strict inequality in the definition of $B(x_0,\rho)$ is.
b. Convergence

A distance $d$ on a set $X$ allows us to define the notion of a convergent sequence in $X$ in a natural way.

5.8 Definition. Let $(X,d)$ be a metric space. We say that the sequence $\{x_n\}\subset X$ converges to $x\in X$, and we write $x_n\to x$, if $d(x_n,x)\to 0$ in $\mathbb{R}$, that is, if for any $r>0$ there exists $\overline{n}$ such that $d(x_n,x) < r$ for all $n\ge\overline{n}$.

The metric axioms at once yield that the basic facts we know for limits of sequences of real numbers also hold for limits of sequences in an arbitrary metric space. We have:
(i) the limit of a sequence $\{x_n\}$ is unique, if it exists,
(ii) if $\{x_n\}$ converges, then $\{x_n\}$ is bounded,
(iii) computing the limit of $\{x_n\}$ consists in having a candidate $x\in X$ and then showing that the sequence of nonnegative real numbers $\{d(x_n,x)\}$ converges to zero,
(iv) if $x_n\to x$, then any subsequence of $\{x_n\}$ has the same limit $x$.
Thus, the choice of a distance on a given set $X$ suffices to pass to the limit in $X$ (in the sense specified by the metric $d$). However, given a set $X$, there is no distance on $X$ that is reasonably absolute (not even on $\mathbb{R}$): we may consider different distances on $X$, and the corresponding convergences have different meanings and can be suited to treat specific problems. They all use the same general language, but the exact meaning of convergence is hidden in the definition of the distance. This flexibility makes the language of metric spaces useful in a wide range of contexts.
5.1.2 Examples of metric spaces

Relevant examples of distances are provided by linear vector spaces over the fields $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$ on which a norm is defined.

5.9 Definition. Let $X$ be a linear space over $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$. A norm on $X$ is a function $\|\ \| : X\to\mathbb{R}_+$ satisfying the following properties:
(i) (Finiteness) $\|x\|\in\mathbb{R}$ $\forall x\in X$.
(ii) (Identity) $\|x\|\ge 0$, and $\|x\| = 0$ if and only if $x = 0$.
(iii) (1-homogeneity) $\|\lambda x\| = |\lambda|\,\|x\|$ $\forall x\in X$, $\forall\lambda\in\mathbb{K}$.
(iv) (Triangle inequality) $\|x+y\|\le\|x\| + \|y\|$ $\forall x,y\in X$.
If $\|\cdot\|$ is a norm on $X$, we say that $(X,\|\ \|)$ is a linear normed space or, simply, that $X$ is a normed space with norm $\|\ \|$. Let $X$ be a linear space with norm $\|\ \|$. It is easy to show that the function $d : X\times X\to\mathbb{R}_+$ given by
$$d(x,y) := \|x - y\|, \qquad x,y\in X,$$
satisfies the metric axioms, hence defines a distance on $X$, called the natural distance in the normed space $(X,\|\ \|)$. Obviously, such a distance is translation invariant, i.e., $d(x+z, y+z) = d(x,y)$ $\forall x,y,z\in X$.

Trivial examples of metric spaces are provided by the nonempty subsets of a metric space. If $A$ is a subset of a metric space $(X,d)$, then the restriction of $d$ to $A\times A$ is trivially a distance on $A$. We say that $A$ is a metric space with the distance induced from $X$.

5.10 ¶. For instance, the cylinder $C := \{(x,y,z)\in\mathbb{R}^3\mid x^2 + y^2 = 1\}$ is a metric space with the Euclidean distance that, for $x,y\in C$, yields $d(x,y) :=$ length of the chord joining $x$ and $y$. The geodesic distance $d_g$ of Example 5.4, that is, the length of the shortest path in $C$ between $x$ and $y$, defines another distance. $C$ with the geodesic distance $d_g$ has to be considered as another metric space, different from $C$ with the Euclidean distance. A simple calculation shows that [...]
[...] For instance, let $\varphi(t) := t\,e^{-t}$, $t\in\mathbb{R}_+$, and consider the sequence of sequences $\{x^k\}$, where $x^k := \{x^k_n\}_n$, $x^k_n := \varphi(n/k)$. Then for every $i$ we have $x^k_i = \frac{i}{k}\,e^{-i/k}\to 0$ as $k\to\infty$, while
$$\|x^k - 0\|_\infty = \sup\Big\{\tfrac{i}{k}\,e^{-i/k}\ \Big|\ i = 0,1,\dots\Big\} = e^{-1}\neq 0.$$
Of course, $\mathbb{R}^n$ with the metric $d_\infty$ of Exercise 5.12 is a subset of $\ell_\infty$ endowed with the induced metric $d_\infty$. This follows from the identification
$$(x^1,\dots,x^n)\mapsto(x^1,\dots,x^n,0,\dots,0,\dots).$$
5.18 Example ($\ell_p$ spaces, $p\ge 1$). Consider the space of all real (or complex) sequences $x := (x_1,\dots)$. For $1\le p<\infty$, $x = \{x_n\}$ and $y := \{y_n\}$, set
$$\|x\|_{\ell_p} := \Big(\sum_{n=1}^{\infty}|x_n|^p\Big)^{1/p}.$$
Trivially, $\|x\|_{\ell_p} = 0$ if and only if every element of the sequence $x$ is zero; moreover, Minkowski's inequality
$$\|x+y\|_{\ell_p}\le\|x\|_{\ell_p} + \|y\|_{\ell_p}$$
[...]

[...] the ball of radius $\varepsilon>0$ around $f$ is the set of all continuous functions $g\in C^0([0,1])$ such that
$$|g(x) - f(x)| < \varepsilon \qquad \forall x\in[0,1],$$
or the family of all continuous functions with graph in the tubular neighborhood of radius $\varepsilon$ of the graph of $f$,
$$U(f,\varepsilon) := \big\{(x,y)\mid x\in[0,1],\ y\in\mathbb{R},\ |y - f(x)| < \varepsilon\big\}, \tag{5.3}$$
see Figure 5.6. The uniform convergence in $C^0([0,1])$, that is, the convergence in the uniform norm, of $\{f_k\}\subset C^0([0,1])$ to $f\in C^0([0,1])$ amounts to computing
$$M_k := \|f_k - f\|_{\infty,[0,1]} = \sup_{x\in[0,1]}|f_k(x) - f(x)|$$
for every $k = 1,2,\dots$ and to showing that $M_k\to 0$ as $k\to+\infty$.
Figure 5.7. The function $f_k$ in (5.4).
5.20 Example (Functions of class $C^1([0,1])$). Denote by $C^1([0,1])$ the space of all functions $f : [0,1]\to\mathbb{R}$ of class $C^1$, see [GM1]. For $f\in C^1([0,1])$, set
$$\|f\|_{C^1([0,1])} := \sup_{x\in[0,1]}|f(x)| + \sup_{x\in[0,1]}|f'(x)| = \|f\|_{\infty,[0,1]} + \|f'\|_{\infty,[0,1]}.$$
It is easy to check that $f\mapsto\|f\|_{C^1([0,1])}$ is a norm in the vector space $C^1([0,1])$. Consequently,
$$d_{C^1([0,1])}(f,g) := \|f - g\|_{C^1([0,1])}$$
defines a distance in $C^1([0,1])$. In this case, a function $g\in C^1$ has distance less than $\varepsilon$ from $f$ if $\|f - g\|_{\infty,[0,1]} + \|f' - g'\|_{\infty,[0,1]} < \varepsilon$; equivalently, if the graph of $g$ is in the tubular neighborhood $U(f,\varepsilon_1)$ of the graph of $f$ and the graph of $g'$ is in the tubular neighborhood $U(f',\varepsilon_2)$ of $f'$, with $\varepsilon_1 + \varepsilon_2 = \varepsilon$, see (5.3). Moreover, convergence in the $C^1([0,1])$-norm of $\{f_k\}\subset C^1([0,1])$ to $f\in C^1([0,1])$, $\|f_k - f\|_{C^1([0,1])}\to 0$, amounts to
$$f_k\to f \ \text{ uniformly in } [0,1], \qquad f_k'\to f' \ \text{ uniformly in } [0,1].$$
Figures 5.8 and 5.9 show graphs of Lipschitz functions and of functions of class $C^1([0,1])$ that are closer and closer to zero in the uniform norm, but with uniform norm of the derivatives larger than one.

5.21 Example (Integral metrics). Another norm, and corresponding distance, in $C^0([0,1])$ is given by the distance in the mean
$$\|f\|_{L^1([0,1])} := \int_0^1 |f(x)|\,dx, \qquad d_{L^1([0,1])}(f,g) := \|f - g\|_{L^1([0,1])} := \int_0^1 |f - g|\,dx.$$
5.22 ¶. Show that the $L^1$-norm in $C^0([0,1])$ satisfies the norm axioms.

Convergence with respect to the $L^1$-distance differs from the uniform one. For instance, for $k = 1,2,\dots$ set
$$f_k(x) := \begin{cases} 1 - k^2 x & \text{if } 0\le x\le 1/k^2,\\ 0 & \text{if } 1/k^2 < x\le 1.\end{cases} \tag{5.4}$$
[...]

[...] the function $f(x) := d(x,x_0) : X\to\mathbb{R}$ is a Lipschitz-continuous function with $\operatorname{Lip}(f) = 1$. In fact, from the triangle inequality we get
$$|f(y) - f(x)| = |d(y,x_0) - d(x,x_0)|\le d(x,y) \qquad \forall x,y\in X,$$
hence $f$ is Lipschitz continuous with $\operatorname{Lip}(f)\le 1$. Choosing $x = x_0$, we have $|f(y) - f(x_0)| = |d(y,x_0) - d(x_0,x_0)| = d(y,x_0)$, thus $\operatorname{Lip}(f)\ge 1$.
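The difference between $L^1$ and uniform convergence is easy to see numerically. A minimal sketch (not from the book; the spike $f_k(x) = \max(0,\,1 - k^2 x)$ is the reconstruction of (5.4) used above and is an assumption consistent with Figure 5.7): the $L^1$ norms go to zero while the sup norms stay equal to one.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 200001)
for k in (1, 2, 5, 10, 50):
    fk = np.maximum(0.0, 1.0 - k**2 * x)
    l1 = fk.mean()                    # ≈ ∫_0^1 f_k dx = 1/(2 k^2) on the uniform grid
    sup = fk.max()                    # = 1 for every k
    print(f"k={k:2d}  ||f_k||_L1 ≈ {l1:.5f}   ||f_k||_inf = {sup:.1f}")
```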
Figure 5.9. On the left, the sequence $f_k(x) := k^{-1}\cos(kx)$, which converges uniformly to zero with slopes equibounded by one. On the right, $g_k(x) := k^{-1}\cos(k^2 x)$, which converges uniformly to zero, but with slopes that diverge to infinity. Given any function $f\in C^1([0,1])$, a similar phenomenon occurs for the sequences $f_k(x) := f(kx)/k$, $g_k(x) := f(k^2 x)/k$.
5.26 ¶ Distance from a set. Let $(X,d)$ be a metric space. The distance function from $x\in X$ to a nonempty subset $A\subset X$ is defined by
$$d(x,A) := \inf\{d(x,y)\mid y\in A\}.$$
It is easy to show that $f(x) := d(x,A) : X\to\mathbb{R}$ is a Lipschitz-continuous function with
$$\operatorname{Lip}(f) = \begin{cases} 0 & \text{if } d(x,A) = 0\ \ \forall x,\\ 1 & \text{otherwise.}\end{cases}$$
If $d(x,A)$ is identically zero, then the claim is trivial. On the other hand, for any $x,y\in X$ and $z\in A$ we have $d(x,z)\le d(x,y) + d(y,z)$, hence, taking the infimum in $z$,
$$d(x,A) - d(y,A)\le d(x,y) \qquad \forall x,y\in X,$$
and, exchanging the roles of $x$ and $y$, $|d(x,A) - d(y,A)|\le d(x,y)$: thus $\operatorname{Lip}(f)\le 1$. Moreover, if $d(\cdot,A)$ is not identically zero, choose $x$ with $d(x,A) > 0$; for every $n$ there exists $z_n\in A$ such that
$$\frac{d(x,z_n)}{d(x,A)} < 1 + \frac{1}{n}.$$
Therefore, since $d(z_n,A) = 0$,
$$|d(x,A) - d(z_n,A)| = d(x,A)\ge\frac{n}{n+1}\,d(x,z_n),$$
from which we infer that the Lipschitz constant of $x\mapsto d(x,A)$ cannot be smaller than one.
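A minimal sketch (not from the book; the finite set $A$ and the sample points are hypothetical choices) spot-checking that $x\mapsto d(x,A)$ is 1-Lipschitz in the Euclidean plane.

```python
import math
import random

A = [(0.0, 0.0), (2.0, 1.0), (-1.0, 3.0)]                     # a hypothetical finite set A
dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
dA = lambda p: min(dist(p, a) for a in A)                      # d(p, A) = inf over A

random.seed(1)
pts = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(200)]
assert all(abs(dA(p) - dA(q)) <= dist(p, q) + 1e-12 for p in pts for q in pts)
print("x -> d(x, A) is 1-Lipschitz on the sampled points")
```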
b. Continuous maps in metric spaces

The notion of continuity that we introduced in [GM1], [GM2] for functions of one real variable can be extended to the context of the abstract metric structure. In fact, by paraphrasing the definition of continuity for functions $f:\mathbb{R}\to\mathbb{R}$, we get
5.27 Definition. Let $(X,d_X)$ and $(Y,d_Y)$ be two metric spaces. We say that $f : X\to Y$ is continuous at $x_0$ if $\forall\varepsilon>0$ there exists $\delta>0$ such that $d_Y(f(x),f(x_0)) < \varepsilon$ whenever $d_X(x,x_0) < \delta$, i.e.,
$$\forall\varepsilon>0\ \exists\delta>0\ \text{ such that }\ f\big(B_X(x_0,\delta)\big)\subset B_Y\big(f(x_0),\varepsilon\big). \tag{5.6}$$
We say that f : X ^^ Y is continuous in E C X if f is continuous at every point XQ ^ E. When E = X and f : X ^^ Y is continuous at any point of X, we simply say that f : X ^^Y is continuous. 5.28 1[. Show that a-Holder-continuous functions, 0 < a < 1, in particular Lipschitzcontinuous functions, between two metric spaces are continuous.
Let (X, dx) and (F, dy) be two metric spaces and E C X. Since E is a, metric space with the induced distance of X, Definition 5.27 also appHes to the function f : E —^Y. Thus f : E —^Y is continuous dit XQ E E if \/e>03S>0
such that f{Bx{xo,
S) H E) C By{f{xo),
e)
(5.7)
and we say that / : E" ^ y is continuous ii f : E ^^ Y is continuous at any point XQ E E. 5.29 Remark. As in the case of functions of one real variable, the domain of the function / is relevant in order to decide if / is continuous or not. For instance, f : X -^Y is continuous in £" C X if Vxo G E V€ > 0 35 > 0 such that f{Bx{xo,
5)) C By{f{xo),
e),
(5.8)
while the restriction f\E - E -^ Y oi f to E is continuous in E if yxoeE\/e>03S>0
such that f{Bx{xo,S)nE)
C By{f{xo),e).
(5.9)
We deduce that the restriction property holds: if f : X ^^Y is continuous in E, then its restriction f\E '- ^ ~^^ ^o E is continuous. The opposite is in general false, as simple examples show. 5.30 Proposition. Let X,Y,Z be three metric spaces, and XQ E X. If f : X ^^ Y is continuous at XQ and g : Y -^ Z is continuous at f{xo), then g o f : X ^^ Z is continuous at XQ • In particular, the composition of two continuous functions is continuous. Proof. Let e > 0. Since g is continuous at f{xo), there exists a > 0 such that g{BY{f{xo),(T)) C Bz{g{f{xo)),e). Since / is continuous at XQ, there exists 5 > 0 such that f(BxixoyS)) C By(/(a;o),cr), consequently go f{Bx{xo,S))
C g{BYif{xo),a))
C Bzigo
f{xo),e). D
Continuity can be expressed in terms of convergent sequences. As in the proof of Theorem 2.46 of [GM2], one shows 5.31 Theorem. Let (X, dx) and (y, dy) be two metric spaces, f : X -^Y is continuous at XQ E X if and only if f{xn) -^ f{xo) in iy^dy) whenever {Xn}
d X,
Xn-^
XQ in ( X , d x ) .
164
5. Metric Spaces and Continuous Functions
c. Limits in metric spaces Related to the notion of continuity is the notion of the hmit. Again, we want to rephrase f{x) —^yo as x ^ XQ. For that we need / to be defined near XQ, but not necessarily at XQ. For this purpose we introduce the 5.32 Definition. Let X be a metric space and A C X. We say that XQ G X is an accumulation point of A if each ball centered at XQ contains at least one point of A distinct from XQ , Vr>0
B(xo,r)nA\{xo}^0.
Accumulation points are also called cluster points. 5.33 %, Consider R with the EucUdean metric. Show that (i) the set of points of accumulation of A :=]a, 6[, B = [a,b], C = [a, b[ is the closed interval [a, 6], (ii) the set of points of accumulation of A :=]0,1[U{2}, B = [0,1]U{2}, C = [0,1[U{2} is the closed interval [0,1], (iii) the set of points of accumulation of the rational numbers and of the irrational numbers is the whole R.
We shall return to this notion, but for the moment the definition suffices. 5.34 Definition. Let (X, dx) and (Y, dy) be two metric spaces, letEcX and let XQ G X be a point of accumulation of E. Given f : E\ {XQ} -^ Y, we say that y^ £Y is the limit of f{x) as x -^ XQ, X E E, and we write f{x) -^yo as x-^
XQ,
or
lim /(x) = yo X
'XQ
xeE
if for any e > 0 there exists 6 > 0 such that dy(/(x),2/o) < e whenever X e E and 0 < dx{x,xo) < S. Equivalently, Ve > 0 3(5 > 0 such that f{Bx{xo,
S)r]E\
{XQ}) C Byiyo, e).
Notice that, while in order to deal with the continuity of / at xo we only need / to be defined at XQ; when we deal with the notion of limit we only need that XQ be a point of accumulation of E. These two requirements are unrelated, since not all points of E are points of accumulation and not all points of accumulation of E are in E, see, e.g.. Exercise 5.33. Moreover, the condition 0 < dxix^xo) in the definition of limit expresses the fact that we can disregard the value of / at XQ (in case / is defined at XQ). Also notice that the limit is unique if it exists^ and that limits are preserved by restriction. To be precise, we have 5.35 Proposition. Let (X, dx) and (F, dy) be two metric spaces. Suppose F C E C X and let XQ G X be a point of accumulation for F. If f{x) —^y as X -^ xo; X £ E, then / ( x ) -^ y as x -^ XQ, X £ F. 5.36 ^ . As for functions of one variable, the notions of limit and continuity are strongly related. Show the following.
P r o p o s i t i o n . Let X and Y be two metric spaces, E C X and XQ ^ X. (i) / / XQ belongs to E and is not a point of accumulation of E, then every function f : E —^Y is continuous at XQ. (ii) Suppose that XQ belongs to E and is a point of accumulation for E. Then a) f : E -^ Y is continuous at XQ if and only if f{x) —>• f{xo) as x ^^ XQ, xe E, b) f(x) —> y as X -^ XQ, X ^ E, if and only if the function g : EU {XQ} -^ Y defined by \fix)
ifxeE\{xo}, if X =
is continuous
XQ
at XQ.
We conclude with a change of variable theorem for limits, see e.g., Proposition 2.27 of [GMl] and Example 2.49 of [GM2]. 5.37 Proposition. Let X^Y,Z be metric spaces, E C X and let XQ be a point of accumulation for E. Let f : E —^ Y, g : f{E) -^ Z be two functions and suppose that /(XQ) is an accumulation point of f{E). If (i) 9{y) -^ L as y -^ yo, y e f{E), (ii) f(x) -^ yo as X -^ xoy X e E, (iii) either f{xo) = yO) or f{x) ^ yo for all x e E and x ^ XQ, then g{f{x)) -^ L as x —^ XQ, X E E. d. The junction property A property we have just hinted at in the case of real functions is the junction property^ see Section 2.1.2 of [GMl], which is more significant for functions of several variables. Let X be a set. We say that a family {[/«} of subsets of a metric space is locally finite at a point XQ G X if there exists r > 0 such that B(xo, r) meets at most a finite number of the C/a's. 5.38 Proposition. Let {X^dx), (1^,dy) be metric spaces, f : X —^ Y a function, XQ E X, and let {Ua} be a family of subsets of X locally finite at XQ.
(i) Suppose that XQ is as X -^ XQ, X eUa, (ii) / / XQ G HaUa and then f : X ^^Y is
a point of accumulation of Ua and that f[x) -^ y for all a. Then f{x) —^y as x ^^ XQ, X £ X. f : Ua C X —^ Y is continuous at xo for all a, continuous at XQ .
5.39 t - Prove Proposition 5.38. 5.40 E x a m p l e . An assumption on the covering is necessary in order that the conclusions of Proposition 5.38 hold. Set A := {(x, y)\x'^ R, i = 1 , . . . , n, are Lipschitz continuous. (ii) Let {X, d) he a metric space. Then a) f : X -^ W^ is continuous at XQ e X if and only if all its components f^, / ^ , . . . , / ^ are continuous at XQ, b) '^f fi9 ' ^ -^^^ ^^6 continuous at XQ, then f -\- g : X —> W^ is continuous at XQ , c) if f : X -^ W^ and A : X —> R are continuous at XQ then the map Xf : X —^ W^ defined by A/(x) := A(x)/(x), is continuous at
XQ.
5.42 E x a m p l e . The function / : R^ —>• R, / ( x , y, x) := sin(x^y) + x^ is continuous at R^. In fact, if xo := (xo,yo,zo), then the coordinate functions x = (x, y,%) -^ x, x —^ y, X —>• z are continuous at XQ by Proposition 5.41. By Proposition 5.41 (iii), x -^ x'^y and X -^ z'^ are continuous at xo, and by (ii) Proposition 5.41, x -^ x'^y + z"^ is continuous at XQ. Finally sin(x^2/ -^ x^) is continuous since sin is continuous.
5.43 Definition. Let X and Y he two metric spaces. We denote hy C^{X, Y) the class of all continuous function f : X -^Y.
As a consequence of Proposition 5.41 C^(X,R"^) is a vector space. Moreover, if A G C^{X,R) and / G CO(X,R^), then A / : X -^ R^ given by Xf{x) := A(x)/(x), xeX, belongs to C^(X,R^). In particular, 5.44 Corollary. Polynomials in n variables belong to C^(R^,R). Therefore, maps f '.W^ —^ W^ whose components are polynomials of n variables are continuous. In particular, linear maps L G £(R^,R'^) are continuous. It is worth noticing that in fact 5.45 Proposition. Let L :W^ ^^ W^ be linear. Then L is Lipschitz continuous in R^. Proof. As L is linear, we have Lip (/) : ==
sup x,yeR'^
\\X-y\\Rn
Xy^y
=
sup — x,y£R^ \\x-y\\un
=
sup —— o^zeR^ IPIIR'^
= : ||L||.
xj^y
Let us prove that ||L|| < H-CXD. Since L is continuous at zero by Corollary 5.44, there exists S > 0 such that ||L(ii;)|| < 1 whenever \\w\\ < S. For any nonzero 2 € M^, set w := 2Jnr\- Since ||ti;|| < 6, we have ||L(i(;)|| < 1. Therefore, writing z = ^y^w and using the linearity of L
||L(.)|| = | | « L W | | = « | 1 L H | | < ^ | N 1 hence
||L||