Matrix Analysis for Scientists & Engineers
Alan J. Laub
University of California
Davis, California
Copyright © 2005 by the Society for Industrial and Applied Mathematics.

10 9 8 7 6 5 4 3 2 1

All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688.
MATLAB® is a registered trademark of The MathWorks, Inc. For MATLAB product information, please contact The MathWorks, Inc., 3 Apple Hill Drive, Natick, MA 01760-2098 USA, 508-647-7000, Fax: 508-647-7101, [email protected], www.mathworks.com
Mathematica is a registered trademark of Wolfram Research, Inc.

Mathcad is a registered trademark of Mathsoft Engineering & Education, Inc.

Library of Congress Cataloging-in-Publication Data

Laub, Alan J., 1948-
  Matrix analysis for scientists and engineers / Alan J. Laub.
    p. cm.
  Includes bibliographical references and index.
  ISBN 0-89871-576-8 (pbk.)
  1. Matrices. 2. Mathematical analysis. I. Title.

  QA188.L38 2005
  512.9'434--dc22
                                        2004059962
About the cover: The original artwork featured on the cover was created by freelance artist Aaron Tallon of Philadelphia, PA. Used by permission.
SIAM is a registered trademark.
To my wife, Beverley
(who captivated me in the UBC math library nearly forty years ago)
Contents

Preface  xi

1  Introduction and Review  1
   1.1  Some Notation and Terminology  1
   1.2  Matrix Arithmetic  3
   1.3  Inner Products and Orthogonality  4
   1.4  Determinants  4

2  Vector Spaces  7
   2.1  Definitions and Examples  7
   2.2  Subspaces  9
   2.3  Linear Independence  10
   2.4  Sums and Intersections of Subspaces  13

3  Linear Transformations  17
   3.1  Definition and Examples  17
   3.2  Matrix Representation of Linear Transformations  18
   3.3  Composition of Transformations  19
   3.4  Structure of Linear Transformations  20
   3.5  Four Fundamental Subspaces  22

4  Introduction to the Moore-Penrose Pseudoinverse  29
   4.1  Definitions and Characterizations  29
   4.2  Examples  30
   4.3  Properties and Applications  31

5  Introduction to the Singular Value Decomposition  35
   5.1  The Fundamental Theorem  35
   5.2  Some Basic Properties  38
   5.3  Row and Column Compressions  40

6  Linear Equations  43
   6.1  Vector Linear Equations  43
   6.2  Matrix Linear Equations  44
   6.3  A More General Matrix Linear Equation  47
   6.4  Some Useful and Interesting Inverses  47

7  Projections, Inner Product Spaces, and Norms  51
   7.1  Projections  51
        7.1.1  The four fundamental orthogonal projections  52
   7.2  Inner Product Spaces  54
   7.3  Vector Norms  57
   7.4  Matrix Norms  59

8  Linear Least Squares Problems  65
   8.1  The Linear Least Squares Problem  65
   8.2  Geometric Solution  67
   8.3  Linear Regression and Other Linear Least Squares Problems  67
        8.3.1  Example: Linear regression  67
        8.3.2  Other least squares problems  69
   8.4  Least Squares and Singular Value Decomposition  70
   8.5  Least Squares and QR Factorization  71

9  Eigenvalues and Eigenvectors  75
   9.1  Fundamental Definitions and Properties  75
   9.2  Jordan Canonical Form  82
   9.3  Determination of the JCF  85
        9.3.1  Theoretical computation  86
        9.3.2  On the +1's in JCF blocks  88
   9.4  Geometric Aspects of the JCF  89
   9.5  The Matrix Sign Function  91

10  Canonical Forms  95
    10.1  Some Basic Canonical Forms  95
    10.2  Definite Matrices  99
    10.3  Equivalence Transformations and Congruence  102
          10.3.1  Block matrices and definiteness  104
    10.4  Rational Canonical Form  104

11  Linear Differential and Difference Equations  109
    11.1  Differential Equations  109
          11.1.1  Properties of the matrix exponential  109
          11.1.2  Homogeneous linear differential equations  112
          11.1.3  Inhomogeneous linear differential equations  112
          11.1.4  Linear matrix differential equations  113
          11.1.5  Modal decompositions  114
          11.1.6  Computation of the matrix exponential  114
    11.2  Difference Equations  118
          11.2.1  Homogeneous linear difference equations  118
          11.2.2  Inhomogeneous linear difference equations  118
          11.2.3  Computation of matrix powers  119
    11.3  Higher-Order Equations  120

12  Generalized Eigenvalue Problems  125
    12.1  The Generalized Eigenvalue/Eigenvector Problem  125
    12.2  Canonical Forms  127
    12.3  Application to the Computation of System Zeros  130
    12.4  Symmetric Generalized Eigenvalue Problems  131
    12.5  Simultaneous Diagonalization  133
          12.5.1  Simultaneous diagonalization via SVD  133
    12.6  Higher-Order Eigenvalue Problems  135
          12.6.1  Conversion to first-order form  135

13  Kronecker Products  139
    13.1  Definition and Examples  139
    13.2  Properties of the Kronecker Product  140
    13.3  Application to Sylvester and Lyapunov Equations  144

Bibliography  151

Index  153
Preface

This book is intended to be used as a text for beginning graduate-level (or senior-level) students in engineering, the sciences, mathematics, computer science, or computational science who wish to be familiar with enough matrix analysis that they are prepared to use its tools and ideas comfortably in a variety of applications. By matrix analysis I mean linear algebra and matrix theory together with their intrinsic interaction with and application to linear dynamical systems (systems of linear differential or difference equations). The text can be used in a one-quarter or one-semester course to provide a compact overview of much of the important and useful mathematics that, in many cases, students meant to learn thoroughly as undergraduates, but somehow didn't quite manage to do. Certain topics that may have been treated cursorily in undergraduate courses are treated in more depth and more advanced material is introduced. I have tried throughout to emphasize only the more important and "useful" tools, methods, and mathematical structures.
Instructors are encouraged to supplement the book with specific application examples from their own particular subject area.

The choice of topics covered in linear algebra and matrix theory is motivated both by applications and by computational utility and relevance. The concept of matrix factorization is emphasized throughout to provide a foundation for a later course in numerical linear algebra. Matrices are stressed more than abstract vector spaces, although Chapters 2 and 3 do cover some geometric (i.e., basis-free or subspace) aspects of many of the fundamental notions. The books by Meyer [18], Noble and Daniel [20], Ortega [21], and Strang [24] are excellent companion texts for this book. Upon completion of a course based on this text, the student is then well-equipped to pursue, either via formal courses or through self-study, follow-on topics on the computational side (at the level of [7], [11], [23], or [25], for example) or on the theoretical side (at the level of [12], [13], or [16], for example).
Prerequisites for using this text are quite modest: essentially just an understanding of calculus and definitely some previous exposure to matrices and linear algebra. Basic concepts such as determinants, singularity of matrices, eigenvalues and eigenvectors, and positive definite matrices should have been covered at least once, even though their recollection may occasionally be "hazy." However, requiring such material as prerequisite permits the early (but "out-of-order" by conventional standards) introduction of topics such as pseudoinverses and the singular value decomposition (SVD). These powerful and versatile tools can then be exploited to provide a unifying foundation upon which to base subsequent topics. Because tools such as the SVD are not generally amenable to "hand computation," this approach necessarily presupposes the availability of appropriate mathematical software on a digital computer. For this, I highly recommend MATLAB® although other software such as
Mathematica® or Mathcad® is also excellent. Since this text is not intended for a course in numerical linear algebra per se, the details of most of the numerical aspects of linear algebra are deferred to such a course.

The presentation of the material in this book is strongly influenced by computational issues for two principal reasons. First, "real-life" problems seldom yield to simple closed-form formulas or solutions. They must generally be solved computationally and it is important to know which types of algorithms can be relied upon and which cannot. Some of the key algorithms of numerical linear algebra, in particular, form the foundation upon which rests virtually all of modern scientific and engineering computation. A second motivation for a computational emphasis is that it provides many of the essential tools for what I call "qualitative mathematics." For example, in an elementary linear algebra course, a set of vectors is either linearly independent or it is not. This is an absolutely fundamental concept. But in most engineering or scientific contexts we want to know more than that. If a set of vectors is linearly independent, how "nearly dependent" are the vectors? If they are linearly dependent, are there "best" linearly independent subsets? These turn out to be much more difficult problems and frequently involve research-level questions when set in the context of the finite-precision, finite-range floating-point arithmetic environment of most modern computing platforms.
Some of the applications of matrix analysis mentioned briefly in this book derive from the modern state-space approach to dynamical systems. State-space methods are now standard in much of modern engineering where, for example, control systems with large numbers of interacting inputs, outputs, and states often give rise to models of very high order that must be analyzed, simulated, and evaluated. The "language" in which such models are conveniently described involves vectors and matrices. It is thus crucial to acquire a working knowledge of the vocabulary and grammar of this language. The tools of matrix analysis are also applied on a daily basis to problems in biology, chemistry, econometrics, physics, statistics, and a wide variety of other fields, and thus the text can serve a rather diverse audience. Mastery of the material in this text should enable the student to read and understand the modern language of matrices used throughout mathematics, science, and engineering.

While prerequisites for this text are modest, and while most material is developed from basic ideas in the book, the student does require a certain amount of what is conventionally referred to as "mathematical maturity." Proofs are given for many theorems. When they are not given explicitly, they are either obvious or easily found in the literature. This is ideal material from which to learn a bit about mathematical proofs and the mathematical maturity and insight gained thereby.
It is my firm conviction that such maturity is neither encouraged nor nurtured by relegating the mathematical aspects of applications (for example, linear algebra for elementary state-space theory) to an appendix or introducing it "on-the-fly" when necessary. Rather, one must lay a firm foundation upon which subsequent applications and perspectives can be built in a logical, consistent, and coherent fashion.

I have taught this material for many years, many times at UCSB and twice at UC Davis, and the course has proven to be remarkably successful at enabling students from disparate backgrounds to acquire a quite acceptable level of mathematical maturity and rigor for subsequent graduate studies in a variety of disciplines. Indeed, many students who completed the course, especially the first few times it was offered, remarked afterward that if only they had had this course before they took linear systems, or signal processing,
or estimation theory, etc., they would have been able to concentrate on the new ideas they wanted to learn, rather than having to spend time making up for deficiencies in their background in matrices and linear algebra. My fellow instructors, too, realized that by requiring this course as a prerequisite, they no longer had to provide as much time for "review" and could focus instead on the subject at hand. The concept seems to work.
AJL, June 2004
Chapter 1

Introduction and Review
1.1  Some Notation and Terminology
We begin with a brief introduction to some standard notation and terminology to be used throughout the text. This is followed by a review of some basic notions in matrix analysis and linear algebra.

The following sets appear frequently throughout subsequent chapters:

1. R^n = the set of n-tuples of real numbers represented as column vectors. Thus, x ∈ R^n means

       x = [x_1, ..., x_n]^T, where x_i ∈ R for i ∈ n.

   Henceforth, the notation n denotes the set {1, ..., n}.

   Note: Vectors are always column vectors. A row vector is denoted by y^T, where y ∈ R^n and the superscript T is the transpose operation. That a vector is always a column vector rather than a row vector is entirely arbitrary, but this convention makes it easy to recognize immediately throughout the text that, e.g., x^T y is a scalar while x y^T is an n × n matrix.

2. C^n = the set of n-tuples of complex numbers represented as column vectors.

3. R^{m×n} = the set of real (or real-valued) m × n matrices.

4. R_r^{m×n} = the set of real m × n matrices of rank r. Thus, R_n^{n×n} denotes the set of real nonsingular n × n matrices.

5. C^{m×n} = the set of complex (or complex-valued) m × n matrices.

6. C_r^{m×n} = the set of complex m × n matrices of rank r.
We now classify some of the more familiar "shaped" matrices. A matrix A ∈ R^{n×n} (or A ∈ C^{n×n}) is

• diagonal if a_ij = 0 for i ≠ j.
• upper triangular if a_ij = 0 for i > j.
• lower triangular if a_ij = 0 for i < j.
• tridiagonal if a_ij = 0 for |i − j| > 1.
• pentadiagonal if a_ij = 0 for |i − j| > 2.
• upper Hessenberg if a_ij = 0 for i − j > 1.
• lower Hessenberg if a_ij = 0 for j − i > 1.

Each of the above also has a "block" analogue obtained by replacing scalar components in the respective definitions by block submatrices. For example, if A ∈ R^{n×n}, B ∈ R^{n×m}, and C ∈ R^{m×m}, then the (n + m) × (n + m) matrix [A B; 0 C] is block upper triangular.

The transpose of a matrix A is denoted by A^T and is the matrix whose (i, j)th entry is the (j, i)th entry of A, that is, (A^T)_ij = a_ji. Note that if A ∈ R^{m×n}, then A^T ∈ R^{n×m}. If A ∈ C^{m×n}, then its Hermitian transpose (or conjugate transpose) is denoted by A^H (or sometimes A*) and its (i, j)th entry is (A^H)_ij = ā_ji, where the bar indicates complex conjugation; i.e., if z = α + jβ (j = i = √−1), then z̄ = α − jβ. A matrix A is symmetric if A = A^T and Hermitian if A = A^H. We henceforth adopt the convention that, unless otherwise noted, an equation like A = A^T implies that A is real-valued while a statement like A = A^H implies that A is complex-valued.
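The "shaped" matrix definitions above translate directly into entrywise zero-pattern tests. The following sketch is in plain Python purely for illustration (the text itself assumes MATLAB-style software); the predicate names are our own:

```python
def is_banded(A, pred):
    """Check that A[i][j] == 0 whenever pred(i, j) holds (0-based indices)."""
    return all(A[i][j] == 0
               for i in range(len(A)) for j in range(len(A[0]))
               if pred(i, j))

def is_diagonal(A):         return is_banded(A, lambda i, j: i != j)
def is_upper_triangular(A): return is_banded(A, lambda i, j: i > j)
def is_lower_triangular(A): return is_banded(A, lambda i, j: i < j)
def is_tridiagonal(A):      return is_banded(A, lambda i, j: abs(i - j) > 1)
def is_pentadiagonal(A):    return is_banded(A, lambda i, j: abs(i - j) > 2)
def is_upper_hessenberg(A): return is_banded(A, lambda i, j: i - j > 1)
def is_lower_hessenberg(A): return is_banded(A, lambda i, j: j - i > 1)

T = [[1, 2, 0],
     [3, 4, 5],
     [0, 6, 7]]   # tridiagonal, hence also upper and lower Hessenberg
```

Note that each band definition is just a statement about which index pairs must carry zeros, which is why one helper suffices for all seven shapes.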
Remark 1.1. While √−1 is most commonly denoted by i in mathematics texts, j is the more common notation in electrical engineering and system theory. There is some advantage to being conversant with both notations. The notation j is used throughout the text but reminders are placed at strategic locations.

Example 1.2.
1. A = [5 3; 3 2] is symmetric (and Hermitian).

2. A = [5 7+j; 7+j 2] is complex-valued symmetric but not Hermitian.

3. A = [5 7+j; 7−j 2] is Hermitian (but not symmetric).
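These classifications are easy to check numerically. A sketch in plain Python (illustrative only; the helper names `transpose`, `hermitian`, etc., are ours, and `1j` plays the role of j), using matrices 2 and 3 of the example:

```python
def transpose(A):
    """A^T: swap rows and columns."""
    return [list(row) for row in zip(*A)]

def hermitian(A):
    """A^H: transpose and conjugate every entry."""
    return [[x.conjugate() for x in row] for row in zip(*A)]

def is_symmetric(A):
    return A == transpose(A)

def is_hermitian(A):
    return A == hermitian(A)

A2 = [[5, 7 + 1j], [7 + 1j, 2]]   # symmetric but not Hermitian
A3 = [[5, 7 + 1j], [7 - 1j, 2]]   # Hermitian but not symmetric
```

The convention of Example 1.2 is visible here: A2 equals its transpose but not its conjugate transpose, while for A3 it is the other way around.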
Transposes of block matrices can be defined in an obvious way. For example, it is easy to see that if A_ij are appropriately dimensioned subblocks, then

    [A_11  A_12; A_21  A_22]^T = [A_11^T  A_21^T; A_12^T  A_22^T].
1.2  Matrix Arithmetic

It is assumed that the reader is familiar with the fundamental notions of matrix addition, multiplication of a matrix by a scalar, and multiplication of matrices.

A special case of matrix multiplication occurs when the second matrix is a column vector x, i.e., the matrix-vector product Ax. A very important way to view this product is to interpret it as a weighted sum (linear combination) of the columns of A. That is, suppose
    A = [a_1, ..., a_n] ∈ R^{m×n} with a_i ∈ R^m and x = [x_1, ..., x_n]^T ∈ R^n.

Then

    Ax = x_1 a_1 + ... + x_n a_n ∈ R^m.
The importance of this interpretation cannot be overemphasized. As a numerical example, take A = [9 8 7; 6 5 4] and x = [3; 2; 1]. Then we can quickly calculate dot products of the rows of A with the column x to find Ax = [50; 32], but this matrix-vector product can also be computed via

    3 · [9; 6] + 2 · [8; 5] + 1 · [7; 4].
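The two viewpoints, row-times-column dot products versus a weighted combination of columns, can be compared directly on this example. A small sketch in plain Python (illustrative; the text itself works with MATLAB-style tools):

```python
A = [[9, 8, 7],
     [6, 5, 4]]
x = [3, 2, 1]

# Row-oriented: entry i of Ax is the dot product of row i of A with x.
Ax_rows = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

# Column-oriented: Ax = x_1*a_1 + ... + x_n*a_n, a combination of columns.
cols = list(zip(*A))          # the columns a_1, ..., a_n of A
Ax_cols = [0] * len(A)
for x_j, col in zip(x, cols):
    Ax_cols = [s + x_j * c for s, c in zip(Ax_cols, col)]
```

Both computations produce [50, 32], as in the text.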
For large arrays of numbers, there can be important computer-architecture-related advantages to preferring the latter calculation method.

For matrix multiplication, suppose A ∈ R^{m×n} and B = [b_1, ..., b_p] ∈ R^{n×p} with b_i ∈ R^n. Then the matrix product AB can be thought of as above, applied p times:

    AB = [Ab_1, ..., Ab_p].
There is also an alternative, but equivalent, formulation of matrix multiplication that appears frequently in the text and is presented below as a theorem. Again, its importance cannot be overemphasized. It is deceptively simple and its full understanding is well rewarded.

Theorem 1.3. Let U = [u_1, ..., u_n] ∈ R^{m×n} with u_i ∈ R^m and V = [v_1, ..., v_n] ∈ R^{p×n} with v_i ∈ R^p. Then

    U V^T = Σ_{i=1}^n u_i v_i^T ∈ R^{m×p}.
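Theorem 1.3 expresses a product U V^T as a sum of rank-one outer products u_i v_i^T. A quick numerical confirmation in plain Python (a sketch with matrices of our own choosing, not from the text):

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def outer(u, v):
    """The rank-one outer product u v^T."""
    return [[ui * vj for vj in v] for ui in u]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

U = [[1, 2], [3, 4], [5, 6]]   # columns u_1, u_2 in R^3, so m = 3, n = 2
V = [[7, 8], [9, 10]]          # columns v_1, v_2 in R^2, so p = 2

VT = [list(row) for row in zip(*V)]
lhs = matmul(U, VT)            # U V^T, a 3 x 2 matrix

rhs = [[0] * len(V) for _ in U]
for u, v in zip(zip(*U), zip(*V)):
    rhs = add(rhs, outer(u, v))   # accumulate u_i v_i^T
```

Here `lhs` and `rhs` agree entry by entry, illustrating the theorem for this m = 3, n = 2, p = 2 case.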
If matrices C and D are compatible for multiplication, recall that (CD)^T = D^T C^T (or (CD)^H = D^H C^H). This gives a dual to the matrix-vector result above. Namely, if C ∈ R^{m×n} has row vectors c_j^T ∈ R^{1×n}, and is premultiplied by a row vector y^T ∈ R^{1×m}, then the product can be written as a weighted linear sum of the rows of C as follows:

    y^T C = y_1 c_1^T + ... + y_m c_m^T ∈ R^{1×n}.

Theorem 1.3 can then also be generalized to its "row dual." The details are left to the reader.
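The row-oriented dual works the same way: y^T C is a weighted sum of the rows of C. A sketch in plain Python (illustrative data of our own):

```python
C = [[1, 2, 3],
     [4, 5, 6]]
y = [10, 100]

# Entrywise: component j of y^T C is the dot product of y with column j of C.
yTC = [sum(y_i * c_ij for y_i, c_ij in zip(y, col)) for col in zip(*C)]

# Row-oriented: y^T C = y_1 * c_1^T + ... + y_m * c_m^T.
yTC_rows = [0] * len(C[0])
for y_i, row in zip(y, C):
    yTC_rows = [s + y_i * c for s, c in zip(yTC_rows, row)]
```

The choice of weights 10 and 100 makes the contribution of each row of C visible in the digits of the result.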
1.3  Inner Products and Orthogonality
For vectors x, y ∈ R^n, the Euclidean inner product (or inner product, for short) of x and y is given by

    (x, y) := x^T y = Σ_{i=1}^n x_i y_i.

Note that the inner product is a scalar.

If x, y ∈ C^n, we define their complex Euclidean inner product (or inner product, for short) by

    (x, y)_c := x^H y = Σ_{i=1}^n x̄_i y_i.

Note that (x, y)_c is the complex conjugate of (y, x)_c; i.e., the order in which x and y appear in the complex inner product is important. The more conventional definition of the complex inner product is (x, y)_c = y^H x = Σ_{i=1}^n x_i ȳ_i, but throughout the text we prefer the symmetry with the real case.
Example 1.4. Let x = [1; j] and y = [1; 2]. Then

    (x, y)_c = x^H y = [1  −j] [1; 2] = 1 − 2j,

while

    (y, x)_c = y^H x = [1  2] [1; j] = 1 + 2j,

and we see that, indeed, (x, y)_c is the complex conjugate of (y, x)_c.

Note that x^T x = 0 if and only if x = 0 when x ∈ R^n but that this is not true if x ∈ C^n. What is true in the complex case is that x^H x = 0 if and only if x = 0. To illustrate, consider the nonzero vector x above. Then x^T x = 0 but x^H x = 2.

Two nonzero vectors x, y ∈ R^n are said to be orthogonal if their inner product is zero, i.e., x^T y = 0. Nonzero complex vectors are orthogonal if x^H y = 0. If x and y are orthogonal and x^T x = 1 and y^T y = 1, then we say that x and y are orthonormal. A matrix A ∈ R^{n×n} is an orthogonal matrix if A^T A = A A^T = I, where I is the n × n identity matrix. The notation I_n is sometimes used to denote the identity matrix in R^{n×n} (or C^{n×n}). Similarly, a matrix A ∈ C^{n×n} is said to be unitary if A^H A = A A^H = I. Clearly an orthogonal or unitary matrix has orthonormal rows and orthonormal columns. There is no special name attached to a nonsquare matrix A ∈ R^{m×n} (or ∈ C^{m×n}) with orthonormal rows or columns.
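The computations of Example 1.4, and the x^T x versus x^H x distinction, can be replayed in a few lines of Python (illustrative; `dot_T` and `dot_H` are our names for x^T y and x^H y, with `1j` playing the role of j):

```python
def dot_T(x, y):
    """x^T y: no conjugation."""
    return sum(xi * yi for xi, yi in zip(x, y))

def dot_H(x, y):
    """x^H y: conjugate the entries of the first argument."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))

x = [1, 1j]
y = [1, 2]
```

For this x, `dot_T(x, x)` vanishes even though x is nonzero, while `dot_H(x, x)` gives 2, exactly as noted in the text.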
1.4 Determinants
It is assumed that the reader is familiar with the basic theory of determinants. For A ∈ R^{n×n} (or A ∈ C^{n×n}) we use the notation det A for the determinant of A. We list below some of
the more useful properties of determinants. Note that this is not a minimal set; i.e., several of the properties are consequences of one or more of the others.
1. If A has a zero row or if any two rows of A are equal, then det A = 0.
2. If A has a zero column or if any two columns of A are equal, then det A = 0.
3. Interchanging two rows of A changes only the sign of the determinant.
4. Interchanging two columns of A changes only the sign of the determinant.
5. Multiplying a row of A by a scalar α results in a new matrix whose determinant is α det A.
6. Multiplying a column of A by a scalar α results in a new matrix whose determinant is α det A.
7. Multiplying a row of A by a scalar and then adding it to another row does not change the determinant.
8. Multiplying a column of A by a scalar and then adding it to another column does not change the determinant.
9. det A^T = det A (det A^H = the complex conjugate of det A if A ∈ C^{n×n}).
10. If A is diagonal, then det A = a_11 a_22 ... a_nn, i.e., det A is the product of its diagonal elements.
11. If A is upper triangular, then det A = a_11 a_22 ... a_nn.
12. If A is lower triangular, then det A = a_11 a_22 ... a_nn.
13. If A is block diagonal (or block upper triangular or block lower triangular), with square diagonal blocks A_11, A_22, ..., A_nn (of possibly different sizes), then det A = det A_11 det A_22 ... det A_nn.
14. If A, B ∈ R^{n×n}, then det(AB) = det A det B.
15. If A ∈ R^{n×n} is nonsingular, then det(A^{-1}) = 1 / det A.
16. If A ∈ R^{n×n} is nonsingular and D ∈ R^{m×m}, then

    det [A  B; C  D] = det A det(D - C A^{-1} B).

Proof: This follows easily from the block LU factorization

    [A  B; C  D] = [I  0; C A^{-1}  I] [A  B; 0  D - C A^{-1} B].

17. If D ∈ R^{m×m} is nonsingular and A ∈ R^{n×n}, then

    det [A  B; C  D] = det D det(A - B D^{-1} C).

Proof: This follows easily from the block UL factorization

    [A  B; C  D] = [I  B D^{-1}; 0  I] [A - B D^{-1} C  0; C  D].
Chapter 1. Introduction and Review
Remark 1.5. The factorization of a matrix A into the product of a unit lower triangular matrix L (i.e., lower triangular with all 1's on the diagonal) and an upper triangular matrix U is called an LU factorization; see, for example, [24]. Another such factorization is UL, where U is unit upper triangular and L is lower triangular. The factorizations used above are block analogues of these.
Remark 1.6. The matrix D - C A^{-1} B is called the Schur complement of A in [A  B; C  D]. Similarly, A - B D^{-1} C is the Schur complement of D in [A  B; C  D].
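Properties 16 and 17 and the two Schur complements are easy to check numerically. The following NumPy sketch (an illustration, not part of the text) uses arbitrary blocks chosen so that A and D are nonsingular:

```python
import numpy as np

# Arbitrary blocks with A (3 x 3) and D (2 x 2) nonsingular.
A = np.array([[2., 1., 0.], [1., 3., 1.], [0., 1., 2.]])
B = np.array([[1., 0.], [0., 1.], [1., 1.]])
C = np.array([[1., 2., 0.], [0., 1., 1.]])
D = np.array([[3., 1.], [1., 2.]])

M = np.block([[A, B], [C, D]])

# Property 16: det M = det A * det(D - C A^{-1} B).
schur_A = D - C @ np.linalg.solve(A, B)   # Schur complement of A
det16 = np.linalg.det(A) * np.linalg.det(schur_A)

# Property 17: det M = det D * det(A - B D^{-1} C).
schur_D = A - B @ np.linalg.solve(D, C)   # Schur complement of D
det17 = np.linalg.det(D) * np.linalg.det(schur_D)

detM = np.linalg.det(M)
```

Using `np.linalg.solve` rather than forming A^{-1} explicitly is the standard numerically preferable way to compute A^{-1}B.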
EXERCISES
1. If A ∈ R^{n×n} and α is a scalar, what is det(αA)? What is det(-A)?
2. If A is orthogonal, what is det A? If A is unitary, what is det A?
3. Let x, y ∈ R^n. Show that det(I - xy^T) = 1 - y^T x.
4. Let U_1, U_2, ..., U_k ∈ R^{n×n} be orthogonal matrices. Show that the product U = U_1 U_2 ... U_k is an orthogonal matrix.
5. Let A ∈ R^{n×n}. The trace of A, denoted Tr A, is defined as the sum of its diagonal elements, i.e., Tr A = Σ_{i=1}^n a_ii.
   (a) Show that the trace is a linear function; i.e., if A, B ∈ R^{n×n} and α, β ∈ R, then Tr(αA + βB) = α Tr A + β Tr B.
   (b) Show that Tr(AB) = Tr(BA), even though in general AB ≠ BA.
   (c) Let S ∈ R^{n×n} be skew-symmetric, i.e., S^T = -S. Show that Tr S = 0. Then either prove the converse or provide a counterexample.
6. A matrix A ∈ R^{n×n} is said to be idempotent if A^2 = A.

Chapter 2
Vector Spaces

2.1 Definitions and Examples

Definition 2.1. A field is a set F together with two operations +, · : F × F → F such that
(A1) α + (β + γ) = (α + β) + γ for all α, β, γ ∈ F.
(A2) there exists an element 0 ∈ F such that α + 0 = α for all α ∈ F.
(A3) for all α ∈ F, there exists an element (-α) ∈ F such that α + (-α) = 0.
(A4) α + β = β + α for all α, β ∈ F.
(M1) α · (β · γ) = (α · β) · γ for all α, β, γ ∈ F.
(M2) there exists an element 1 ∈ F such that α · 1 = α for all α ∈ F.
(M3) for all α ∈ F, α ≠ 0, there exists an element α^{-1} ∈ F such that α · α^{-1} = 1.
(M4) α · β = β · α for all α, β ∈ F.
(D) α · (β + γ) = α · β + α · γ for all α, β, γ ∈ F.
Axioms (A1)-(A3) state that (F, +) is a group and an abelian group if (A4) also holds. Axioms (M1)-(M4) state that (F \ {0}, ·) is an abelian group. Generally speaking, when no confusion can arise, the multiplication operator "·" is not written explicitly.
Example 2.2.
1. R with ordinary addition and multiplication is a field.
2. C with ordinary complex addition and multiplication is a field.
3. Ra[x] = the field of rational functions in the indeterminate x

   = { (α_0 + α_1 x + ... + α_p x^p) / (β_0 + β_1 x + ... + β_q x^q) : α_i, β_i ∈ R; p, q ∈ Z+ },

   where Z+ = {0, 1, 2, ...}, is a field.
4. R_r^{m×n} = {m × n matrices of rank r with real coefficients} is clearly not a field since, for example, (M1) does not hold unless m = n. Moreover, R_n^{n×n} is not a field either since (M4) does not hold in general (although the other 8 axioms hold).
Definition 2.3. A vector space over a field F is a set V together with two operations + : V × V → V and · : F × V → V such that
(V1) (V, +) is an abelian group.
(V2) (α · β) · v = α · (β · v) for all α, β ∈ F and for all v ∈ V.
(V3) (α + β) · v = α · v + β · v for all α, β ∈ F and for all v ∈ V.
(V4) α · (v + w) = α · v + α · w for all α ∈ F and for all v, w ∈ V.
(V5) 1 · v = v for all v ∈ V (1 ∈ F).
A vector space is denoted by (V, F) or, when there is no possibility of confusion as to the underlying field, simply by V.
Remark 2.4. Note that + and · in Definition 2.3 are different from the + and · in Definition 2.1 in the sense of operating on different objects in different sets. In practice, this causes no confusion and the · operator is usually not even written explicitly.
Example 2.5.
1. (R^n, R) with addition defined componentwise by

   x + y = [x_1 + y_1, ..., x_n + y_n]^T

and scalar multiplication defined by

   αx = [αx_1, ..., αx_n]^T

is a vector space. Similar definitions hold for (C^n, C).
2. (R^{m×n}, R) is a vector space with addition defined by

   A + B = [ α_11 + β_11   α_12 + β_12   ...   α_1n + β_1n
             α_21 + β_21   α_22 + β_22   ...   α_2n + β_2n
                ...                               ...
             α_m1 + β_m1   α_m2 + β_m2   ...   α_mn + β_mn ]

and scalar multiplication defined by

   γA = [ γα_11   γα_12   ...   γα_1n
          γα_21   γα_22   ...   γα_2n
            ...                   ...
          γα_m1   γα_m2   ...   γα_mn ].
3. Let (V, F) be an arbitrary vector space and let D be an arbitrary set. Let Φ(D, V) be the set of functions f mapping D to V. Then Φ(D, V) is a vector space with addition defined by

   (f + g)(d) = f(d) + g(d) for all d ∈ D and for all f, g ∈ Φ

and scalar multiplication defined by

   (αf)(d) = αf(d) for all α ∈ F, for all d ∈ D, and for all f ∈ Φ.

Special Cases:
(a) D = [t_0, t_1], (V, F) = (R^n, R), and the functions are piecewise continuous =: (PC[t_0, t_1])^n or continuous =: (C[t_0, t_1])^n.
(b) D = [t_0, +∞), (V, F) = (R^n, R), etc.
4. Let A ∈ R^{n×n}. Then {x(t) : ẋ(t) = Ax(t)} is a vector space (of dimension n).
2.2 Subspaces
Definition 2.6. Let (V, F) be a vector space and let W ⊆ V, W ≠ ∅. Then (W, F) is a subspace of (V, F) if and only if (W, F) is itself a vector space or, equivalently, if and only if (αw_1 + βw_2) ∈ W for all α, β ∈ F and for all w_1, w_2 ∈ W.
Remark 2.7. The latter characterization of a subspace is often the easiest way to check or prove that something is indeed a subspace (or vector space); i.e., verify that the set in question is closed under addition and scalar multiplication. Note, too, that since 0 ∈ F, this implies that the zero vector must be in any subspace.
Notation: When the underlying field is understood, we write W ⊆ V, and the symbol ⊆, when used with vector spaces, is henceforth understood to mean "is a subspace of." The less restrictive meaning "is a subset of" is specifically flagged as such.
Example 2.8.
1. Consider (V, F) = (R^{n×n}, R) and let W = {A ∈ R^{n×n} : A is symmetric}. Then W ⊆ V.
Proof: Suppose A_1, A_2 are symmetric. Then it is easily shown that αA_1 + βA_2 is symmetric for all α, β ∈ R.
2. Let W = {A ∈ R^{n×n} : A is orthogonal}. Then W is not a subspace of R^{n×n}.
3. Consider (V, F) = (R^2, R) and for each v ∈ R^2 of the form v = [v_1, v_2]^T identify v_1 with the x-coordinate in the plane and v_2 with the y-coordinate. For α, β ∈ R, define

   W_{α,β} = { v : v = [c, αc + β]^T ; c ∈ R }.

Then W_{α,β} is a subspace of V if and only if β = 0. As an interesting exercise, sketch W_{2,1}, W_{2,0}, W_{1/2,1}, and W_{1/2,0}. Note, too, that the vertical line through the origin (i.e., α = ∞) is also a subspace. All lines through the origin are subspaces. Shifted subspaces W_{α,β} with β ≠ 0 are called linear varieties.
Henceforth, we drop the explicit dependence of a vector space on an underlying field. Thus, V usually denotes a vector space with the underlying field generally being R unless explicitly stated otherwise.
Definition 2.9. If R and S are vector spaces (or subspaces), then R = S if and only if R ⊆ S and S ⊆ R.
Note: To prove two vector spaces are equal, one usually proves the two inclusions separately: An arbitrary r ∈ R is shown to be an element of S and then an arbitrary s ∈ S is shown to be an element of R.
2.3 Linear Independence
Let X = {v_1, v_2, ...} be a nonempty collection of vectors v_i in some vector space V.
Definition 2.10. X is a linearly dependent set of vectors if and only if there exist k distinct elements v_1, ..., v_k ∈ X and scalars α_1, ..., α_k not all zero such that

   α_1 v_1 + ... + α_k v_k = 0.

X is a linearly independent set of vectors if and only if for any collection of k distinct elements v_1, ..., v_k of X and for any scalars α_1, ..., α_k,

   α_1 v_1 + ... + α_k v_k = 0 implies α_1 = 0, ..., α_k = 0.
Example 2.11.
1. Let V = R^3. Then {[1, 0, 0]^T, [1, 1, 0]^T, [1, 1, 1]^T} is a linearly independent set. Why? However, {v_1, v_2, v_3} = {[1, 1, 0]^T, [1, 2, 1]^T, [-1, 0, 1]^T} is a linearly dependent set
(since 2v_1 - v_2 + v_3 = 0).
2. Let A ∈ R^{n×n} and B ∈ R^{n×m}. Then consider the rows of e^{tA} B as vectors in (C[t_0, t_1])^m (recall that e^{tA} denotes the matrix exponential, which is discussed in more detail in Chapter 11). Independence of these vectors turns out to be equivalent to a concept called controllability, to be studied further in what follows.
Let v_i ∈ R^n, i ∈ k, and consider the matrix V = [v_1, ..., v_k] ∈ R^{n×k}. The linear dependence of this set of vectors is equivalent to the existence of a nonzero vector a ∈ R^k such that Va = 0. An equivalent condition for linear dependence is that the k × k matrix V^T V is singular. If the set of vectors is independent, and there exists a ∈ R^k such that Va = 0, then a = 0. An equivalent condition for linear independence is that the matrix V^T V is nonsingular.
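The Gram-matrix test for independence of the columns of V reads, in NumPy (an illustration, not part of the text):

```python
import numpy as np

def columns_independent(V):
    # Columns of V are linearly independent iff the k x k matrix V^T V
    # is nonsingular, i.e., has full rank k.
    G = V.T @ V
    return np.linalg.matrix_rank(G) == V.shape[1]

# Two independent columns in R^3 ...
V1 = np.array([[1., 0.], [0., 1.], [0., 0.]])
# ... and a dependent pair (second column is twice the first).
V2 = np.array([[1., 2.], [1., 2.], [1., 2.]])

ok1 = columns_independent(V1)   # True
ok2 = columns_independent(V2)   # False
```

In floating-point arithmetic, `matrix_rank` makes the singular/nonsingular decision up to a tolerance, which is exactly the practical subtlety hidden in the word "singular" above.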
Definition 2.12. Let X = {v_1, v_2, ...} be a collection of vectors v_i ∈ V. Then the span of X is defined as

   Sp(X) = Sp{v_1, v_2, ...} = {v : v = α_1 v_1 + ... + α_k v_k ; α_i ∈ F, v_i ∈ X, k ∈ N},

where N = {1, 2, ...}.
Example 2.13. Let V = R^n and define
   e_1 = [1, 0, ..., 0]^T,  e_2 = [0, 1, 0, ..., 0]^T,  ...,  e_n = [0, ..., 0, 1]^T.

Then Sp{e_1, e_2, ..., e_n} = R^n.
Definition 2.14. A set of vectors X is a basis for V if and only if
1. X is a linearly independent set (of basis vectors), and
2. Sp(X) = V.
Example 2.15. {e_1, ..., e_n} is a basis for R^n (sometimes called the natural basis).
Now let b_1, ..., b_n be a basis (with a specific order associated with the basis vectors) for V. Then for all v ∈ V there exists a unique n-tuple {ξ_1, ..., ξ_n} such that

   v = ξ_1 b_1 + ... + ξ_n b_n = Bx,

where B = [b_1, ..., b_n] and x = [ξ_1, ..., ξ_n]^T.
Definition 2.16. The scalars {ξ_i} are called the components (or sometimes the coordinates) of v with respect to the basis {b_1, ..., b_n} and are unique. We say that the vector x of components represents the vector v with respect to the basis B.
Example 2.17. In R^n,

   v = [v_1, ..., v_n]^T = v_1 e_1 + v_2 e_2 + ... + v_n e_n.
We can also determine components of v with respect to another basis. For example, while

   [1, 2]^T = 1 · e_1 + 2 · e_2,

with respect to the basis

   { [-1, 2]^T, [1, -1]^T }

we have

   [1, 2]^T = 3 · [-1, 2]^T + 4 · [1, -1]^T.

To see this, write

   [1, 2]^T = x_1 · [-1, 2]^T + x_2 · [1, -1]^T = [-1  1; 2  -1] [x_1, x_2]^T.

Then

   [x_1, x_2]^T = [-1  1; 2  -1]^{-1} [1, 2]^T = [1  1; 2  1] [1, 2]^T = [3, 4]^T.
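Finding components with respect to a basis {b_1, b_2} amounts to solving the linear system Bx = v, where the basis vectors are the columns of B. A NumPy sketch (an illustration, not part of the text; the basis shown is just one possible choice):

```python
import numpy as np

# Columns of B are illustrative basis vectors b_1 = [-1, 2]^T, b_2 = [1, -1]^T.
B = np.array([[-1.,  1.],
              [ 2., -1.]])
v = np.array([1., 2.])

# Components x of v with respect to this basis: solve B x = v.
x = np.linalg.solve(B, v)

# Reconstruction: v = x[0] * b_1 + x[1] * b_2.
recon = B @ x
```

The solve succeeds precisely because the columns of B are linearly independent, i.e., because they form a basis.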
Theorem 2.18. The number of elements in a basis of a vector space is independent of the particular basis considered.
Definition 2.19. If a basis X for a vector space V (≠ 0) has n elements, V is said to be n-dimensional or have dimension n and we write dim(V) = n or dim V = n. For
consistency, and because the 0 vector is in any vector space, we define dim(0) = 0. A vector space V is finite-dimensional if there exists a basis X with n < +∞ elements; otherwise, V is infinite-dimensional.
Thus, Theorem 2.18 says that dim(V) = the number of elements in a basis.
Example 2.20.
1. dim(R^n) = n.
2. dim(R^{m×n}) = mn.
   Note: Check that a basis for R^{m×n} is given by the mn matrices E_ij, i ∈ m, j ∈ n, where E_ij is a matrix all of whose elements are 0 except for a 1 in the (i, j)th location. The collection of E_ij matrices can be called the "natural basis matrices."
3. dim(C[t_0, t_1]) = +∞.
4. dim{A ∈ R^{n×n} : A = A^T} = n(n + 1)/2.
   (To see why, determine n(n + 1)/2 symmetric basis matrices.)
5. dim{A ∈ R^{n×n} : A is upper (lower) triangular} = n(n + 1)/2.
2.4 Sums and Intersections of Subspaces
Definition 2.21. Let (V, F) be a vector space and let R, S ⊆ V. The sum and intersection of R and S are defined respectively by:
1. R + S = {r + s : r ∈ R, s ∈ S}.
2. R ∩ S = {v : v ∈ R and v ∈ S}.
Theorem 2.22.
1. R + S ⊆ V (in general, R_1 + ... + R_k =: Σ_{i=1}^k R_i ⊆ V, for finite k).
2. R ∩ S ⊆ V (in general, ∩_{α∈A} R_α ⊆ V for an arbitrary index set A).
Remark 2.23. The union of two subspaces, R ∪ S, is not necessarily a subspace.
Definition 2.24. T = R ⊕ S is the direct sum of R and S if
1. R ∩ S = 0, and
2. R + S = T (in general, R_i ∩ (Σ_{j≠i} R_j) = 0 and Σ_i R_i = T).
The subspaces R and S are said to be complements of each other in T.
Remark 2.25. The complement of R (or S) is not unique. For example, consider V = R^2 and let R be any line through the origin. Then any other distinct line through the origin is a complement of R. Among all the complements there is a unique one orthogonal to R. We discuss more about orthogonal complements elsewhere in the text.
Theorem 2.26. Suppose T = R ⊕ S. Then
1. every t ∈ T can be written uniquely in the form t = r + s with r ∈ R and s ∈ S.
2. dim(T) = dim(R) + dim(S).
Proof: To prove the first part, suppose an arbitrary vector t ∈ T can be written in two ways as t = r_1 + s_1 = r_2 + s_2, where r_1, r_2 ∈ R and s_1, s_2 ∈ S. Then r_1 - r_2 = s_2 - s_1. But r_1 - r_2 ∈ R and s_2 - s_1 ∈ S. Since R ∩ S = 0, we must have r_1 = r_2 and s_1 = s_2, from which uniqueness follows. The statement of the second part is a special case of the next theorem. □
Theorem 2.27. For arbitrary subspaces R, S of a vector space V,

   dim(R + S) = dim(R) + dim(S) - dim(R ∩ S).
Example 2.28. Let U be the subspace of upper triangular matrices in R^{n×n} and let L be the subspace of lower triangular matrices in R^{n×n}. Then it may be checked that U + L = R^{n×n} while U ∩ L is the set of diagonal matrices in R^{n×n}. Using the fact that dim{diagonal matrices} = n, together with Examples 2.20.2 and 2.20.5, one can easily verify the validity of the formula given in Theorem 2.27.
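The dimension count in Example 2.28 can also be verified numerically by vectorizing the natural basis matrices of each subspace and computing ranks (a NumPy sketch, not part of the text):

```python
import numpy as np

n = 4

def basis_vecs(keep):
    # Vectorized E_ij "natural basis matrices" for positions where keep(i, j) holds.
    vecs = []
    for i in range(n):
        for j in range(n):
            if keep(i, j):
                E = np.zeros((n, n))
                E[i, j] = 1.0
                vecs.append(E.ravel())
    return np.array(vecs)

U = basis_vecs(lambda i, j: j >= i)   # upper triangular positions
L = basis_vecs(lambda i, j: j <= i)   # lower triangular positions

dim_U = np.linalg.matrix_rank(U)                     # n(n+1)/2
dim_L = np.linalg.matrix_rank(L)                     # n(n+1)/2
dim_sum = np.linalg.matrix_rank(np.vstack([U, L]))   # dim(U + L) = n^2
# Theorem 2.27 then gives dim(U intersect L) = dim_U + dim_L - dim_sum = n.
dim_cap = dim_U + dim_L - dim_sum
```

With n = 4 this reproduces 10 + 10 - 16 = 4, the dimension of the diagonal matrices.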
Example 2.29. Let (V, F) = (R^{n×n}, R), let R be the set of skew-symmetric matrices in R^{n×n}, and let S be the set of symmetric matrices in R^{n×n}. Then V = R ⊕ S.
Proof: This follows easily from the fact that any A ∈ R^{n×n} can be written in the form

   A = (1/2)(A + A^T) + (1/2)(A - A^T).

The first matrix on the right-hand side above is in S while the second is in R. □
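The decomposition used in the proof is easy to check numerically (a NumPy sketch with an arbitrary matrix, not part of the text):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 10.]])

S = 0.5 * (A + A.T)   # symmetric part (lies in S)
K = 0.5 * (A - A.T)   # skew-symmetric part (lies in R)

sym_ok = np.allclose(S, S.T)     # S is symmetric
skew_ok = np.allclose(K, -K.T)   # K is skew-symmetric
sum_ok = np.allclose(S + K, A)   # the two parts reassemble A
```

Since a matrix that is both symmetric and skew-symmetric must be zero, this split is also unique, consistent with Theorem 2.26.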
EXERCISES
1. Suppose {v_1, ..., v_k} is a linearly dependent set. Then show that one of the vectors must be a linear combination of the others.
2. Let x_1, x_2, ..., x_k ∈ R^n be nonzero mutually orthogonal vectors. Show that {x_1, ..., x_k} must be a linearly independent set.
3. Let v_1, ..., v_n be orthonormal vectors in R^n. Show that Av_1, ..., Av_n are also orthonormal if and only if A ∈ R^{n×n} is orthogonal.
4. Consider the vectors v_1 = [2, 1]^T and v_2 = [3, 1]^T. Prove that v_1 and v_2 form a basis for R^2. Find the components of the vector v = [4, 1]^T with respect to this basis.
5. Let P denote the set of polynomials of degree less than or equal to two of the form p_0 + p_1 x + p_2 x^2, where p_0, p_1, p_2 ∈ R. Show that P is a vector space over R. Show that the polynomials 1, x, and 2x^2 - 1 are a basis for P. Find the components of the polynomial 2 + 3x + 4x^2 with respect to this basis.
6. Prove Theorem 2.22 (for the case of two subspaces R and S only).
7. Let P_n denote the vector space of polynomials of degree less than or equal to n, and of the form p(x) = p_0 + p_1 x + ... + p_n x^n, where the coefficients p_i are all real. Let PE denote the subspace of all even polynomials in P_n, i.e., those that satisfy the property p(-x) = p(x). Similarly, let PO denote the subspace of all odd polynomials, i.e., those satisfying p(-x) = -p(x). Show that P_n = PE ⊕ PO.
8. Repeat Example 2.28 using instead the two subspaces T of tridiagonal matrices and U of upper triangular matrices.
Chapter 3
Linear Transformations
3.1 Definition and Examples
We begin with the basic definition of a linear transformation (or linear map, linear function, or linear operator) between two vector spaces.
Definition 3.1. Let (V, F) and (W, F) be vector spaces. Then L : V → W is a linear transformation if and only if L(αv_1 + βv_2) = αLv_1 + βLv_2 for all α, β ∈ F and for all v_1, v_2 ∈ V.
The vector space V is called the domain of the transformation L while W, the space into which it maps, is called the codomain.
Example 3.2.
1. Let F = R and take V = W = PC[t_0, +∞). Define L : PC[t_0, +∞) → PC[t_0, +∞) by

   v(t) ↦ w(t) = (Lv)(t) = ∫_{t_0}^{t} e^{(t-τ)} v(τ) dτ.

2. Let F = R and take V = W = R^{m×n}. Fix M ∈ R^{m×m}. Define L : R^{m×n} → R^{m×n} by

   X ↦ Y = LX = MX.

3. Let F = R and take V = P^n = {p(x) = a_0 + a_1 x + ... + a_n x^n : a_i ∈ R} and W = P^{n-1}. Define L : V → W by Lp = p', where ' denotes differentiation with respect to x.
3.2 Matrix Representation of Linear Transformations
Linear transformations between vector spaces with specific bases can be represented conveniently in matrix form. Specifically, suppose L : (V, F) → (W, F) is linear and further suppose that {v_i, i ∈ n} and {w_j, j ∈ m} are bases for V and W, respectively. Then the ith column of A = Mat L (the matrix representation of L with respect to the given bases for V and W) is the representation of Lv_i with respect to {w_j, j ∈ m}. In other words,

   A = [ a_11  ...  a_1n
          ...        ...
         a_m1  ...  a_mn ] ∈ R^{m×n}

represents L since

   Lv_i = a_1i w_1 + ... + a_mi w_m = W a_i,

where W = [w_1, ..., w_m] and
a_i = [a_1i, ..., a_mi]^T is the ith column of A. Note that A = Mat L depends on the particular bases for V and W. This could be reflected by subscripts, say, in the notation, but this is usually not done.
The action of L on an arbitrary vector v ∈ V is uniquely determined (by linearity) by its action on a basis. Thus, if v = ξ_1 v_1 + ... + ξ_n v_n = Vx (where v, and hence x, is arbitrary), then

   Lv = LVx = ξ_1 Lv_1 + ... + ξ_n Lv_n = ξ_1 W a_1 + ... + ξ_n W a_n = WAx.
Thus, LV = WA since x was arbitrary.
When V = R^n, W = R^m and {v_i, i ∈ n}, {w_j, j ∈ m} are the usual (natural) bases, the equation LV = WA becomes simply L = A. We thus commonly identify A as a linear transformation with its matrix representation, i.e.,

   R^n ∋ x ↦ Ax ∈ R^m.

Thinking of A both as a matrix and as a linear transformation from R^n to R^m usually causes no confusion. Change of basis then corresponds naturally to appropriate matrix multiplication.
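As a concrete instance, the differentiation operator of Example 3.2.3 with n = 3 has the following matrix representation with respect to the monomial bases {1, x, x^2, x^3} for V and {1, x, x^2} for W (a NumPy sketch, not part of the text):

```python
import numpy as np

# Column i holds the coefficients of L(x^i) = i x^(i-1) in the basis {1, x, x^2}.
A = np.array([[0., 1., 0., 0.],
              [0., 0., 2., 0.],
              [0., 0., 0., 3.]])

# p(x) = 1 + 2x + 3x^2 + 4x^3, represented by its coefficient vector.
p = np.array([1., 2., 3., 4.])

dp = A @ p   # coefficients of p'(x) = 2 + 6x + 12x^2
```

Applying the matrix to a coefficient vector is exactly applying L to the polynomial it represents, which is the identification Lv = WAx made above.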
3.3  Composition of Transformations
Consider three vector spaces U, V, and W and transformations B from U to V and A from V to W. Then we can define a new transformation C as follows:

$$U \xrightarrow{\;B\;} V \xrightarrow{\;A\;} W, \qquad C = AB.$$
The above diagram illustrates the composition of transformations C = AB. Note that in most texts, the arrows above are reversed as follows:

$$W \xleftarrow{\;A\;} V \xleftarrow{\;B\;} U, \qquad C = AB.$$
However, it might be useful to prefer the former since the transformations A and B appear in the same order in both the diagram and the equation. If dim U = p, dim V = n, and dim W = m, and if we associate matrices with the transformations in the usual way, then composition of transformations corresponds to standard matrix multiplication. That is, we have C_{m×p} = A_{m×n} B_{n×p}. The above is sometimes expressed componentwise by the formula

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}.$$
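A quick NumPy check (illustrative, not from the text) that the componentwise formula agrees with ordinary matrix multiplication:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 4, 3, 2
A = rng.standard_normal((m, n))    # represents A : V -> W
B = rng.standard_normal((n, p))    # represents B : U -> V

# c_ij = sum_k a_ik b_kj, written out explicitly
C = np.array([[sum(A[i, k] * B[k, j] for k in range(n))
               for j in range(p)]
              for i in range(m)])

assert np.allclose(C, A @ B)       # matches standard multiplication
```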
Two Special Cases:

Inner Product: Let x, y ∈ ℝ^n. Then their inner product is the scalar

$$x^T y = \sum_{i=1}^{n} x_i y_i.$$
Outer Product: Let x ∈ ℝ^m, y ∈ ℝ^n. Then their outer product is the m × n matrix

$$xy^T = \begin{bmatrix} x_1 y_1 & \cdots & x_1 y_n \\ \vdots & & \vdots \\ x_m y_1 & \cdots & x_m y_n \end{bmatrix}.$$
Note that any rank-one matrix A ∈ ℝ^{m×n} can be written in the form A = xy^T above (or xy^H if A ∈ ℂ^{m×n}). A rank-one symmetric matrix can be written in the form xx^T (or xx^H).
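Both products, and the rank-one observation, are easy to illustrate numerically (NumPy sketch with illustrative vectors, not from the text):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

inner = x @ y           # the scalar x^T y
outer = np.outer(x, y)  # the 3 x 3 matrix x y^T

assert inner == 32.0                        # 1*4 + 2*5 + 3*6
assert np.linalg.matrix_rank(outer) == 1    # a nonzero outer product has rank one
```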
3.4  Structure of Linear Transformations
Let A : V → W be a linear transformation.
Definition 3.3. The range of A, denoted R(A), is the set {w ∈ W : w = Av for some v ∈ V}. Equivalently, R(A) = {Av : v ∈ V}. The range of A is also known as the image of A and denoted Im(A).

The nullspace of A, denoted N(A), is the set {v ∈ V : Av = 0}. The nullspace of A is also known as the kernel of A and denoted Ker(A).

Theorem 3.4. Let A : V → W be a linear transformation. Then

1. R(A) ⊆ W.

2. N(A) ⊆ V.

Note that N(A) and R(A) are, in general, subspaces of different spaces.

Theorem 3.5. Let A ∈ ℝ^{m×n}. If A is written in terms of its columns as A = [a_1, ..., a_n], then

R(A) = Sp{a_1, ..., a_n}.
Proof: The proof of this theorem is easy, essentially following immediately from the definition. □
Remark 3.6. Note that in Theorem 3.5 and throughout the text, the same symbol (A) is used to denote both a linear transformation and its matrix representation with respect to the usual (natural) bases. See also the last paragraph of Section 3.2.

Definition 3.7. Let {v_1, ..., v_k} be a set of nonzero vectors v_i ∈ ℝ^n. The set is said to be orthogonal if v_i^T v_j = 0 for i ≠ j and orthonormal if v_i^T v_j = δ_{ij}, where δ_{ij} is the Kronecker delta defined by

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}$$
Example 3.8.

1. {[1  1]^T, [1  −1]^T} is an orthogonal set.

2. {[1/√2  1/√2]^T, [1/√2  −1/√2]^T} is an orthonormal set.

3. If {v_1, ..., v_k} with v_i ∈ ℝ^n is an orthogonal set, then

$$\left\{ \frac{v_1}{\sqrt{v_1^T v_1}}, \ldots, \frac{v_k}{\sqrt{v_k^T v_k}} \right\}$$

is an orthonormal set.
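Item 3 is easy to check numerically; the NumPy sketch below (not from the text) normalizes an orthogonal set column by column:

```python
import numpy as np

# An orthogonal (but not orthonormal) set, stored as the columns of V
V = np.array([[1.0,  1.0],
              [1.0, -1.0]])
assert np.isclose(V[:, 0] @ V[:, 1], 0.0)      # v1^T v2 = 0

# Divide each v_i by sqrt(v_i^T v_i), as in item 3
U = V / np.sqrt(np.sum(V * V, axis=0))

assert np.allclose(U.T @ U, np.eye(2))         # u_i^T u_j = delta_ij
```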
Definition 3.9. Let S ⊆ ℝ^n. Then the orthogonal complement of S is defined as the set

S^⊥ = {v ∈ ℝ^n : v^T s = 0 for all s ∈ S}.

Theorem 3.14 (Decomposition Theorem). Let A : ℝ^n → ℝ^m. Then

1. every vector v in the domain space ℝ^n can be written in a unique way as v = x + y, where x ∈ N(A) and y ∈ N(A)^⊥ = R(A^T) (i.e., ℝ^n = N(A) ⊕ R(A^T)).

2. every vector w in the codomain space ℝ^m can be written in a unique way as w = x + y, where x ∈ R(A) and y ∈ R(A)^⊥ = N(A^T) (i.e., ℝ^m = R(A) ⊕ N(A^T)).

This key theorem becomes very easy to remember by carefully studying and understanding Figure 3.1 in the next section.
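The decomposition in part 1 can be computed explicitly using the orthogonal projector A⁺A onto N(A)^⊥ = R(A^T) (the pseudoinverse A⁺ is introduced in Chapter 4; the NumPy sketch below, with an illustrative random matrix, is not part of the text):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))      # A : R^5 -> R^3
v = rng.standard_normal(5)

# A^+ A is the orthogonal projector onto N(A)-perp = R(A^T)
P = np.linalg.pinv(A) @ A
y = P @ v                            # component in R(A^T)
x = v - y                            # component in N(A)

assert np.allclose(x + y, v)         # v = x + y
assert np.allclose(A @ x, 0.0)       # x really lies in N(A)
assert np.isclose(x @ y, 0.0)        # and the two pieces are orthogonal
```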
3.5  Four Fundamental Subspaces
Consider a general matrix A ∈ ℝ^{m×n} of rank r. When thought of as a linear transformation from ℝ^n to ℝ^m, many properties of A can be developed in terms of the four fundamental subspaces
[Figure 3.1 (schematic): A maps N(A)^⊥ (of dimension r) one-to-one onto R(A) (of dimension r), and maps N(A) (of dimension n − r) to {0}; the remaining subspace N(A^T) = R(A)^⊥ has dimension m − r.]
Figure 3.1. Four fundamental subspaces.

R(A), R(A)^⊥, N(A), and N(A)^⊥. Figure 3.1 makes many key properties seem almost obvious and we return to this figure frequently both in the context of linear transformations and in illustrating concepts such as controllability and observability.
Definition 3.15. Let V and W be vector spaces and let A : V → W be a linear transformation.

1. A is onto (also called epic or surjective) if R(A) = W.

2. A is one-to-one or 1-1 (also called monic or injective) if N(A) = 0. Two equivalent characterizations of A being 1-1 that are often easier to verify in practice are the following:

(a) Av_1 = Av_2 ⟹ v_1 = v_2.

(b) v_1 ≠ v_2 ⟹ Av_1 ≠ Av_2.
Definition 3.16. Let A : ℝ^n → ℝ^m. Then rank(A) = dim R(A). This is sometimes called the column rank of A (maximum number of independent columns). The row rank of A is
dim R(A^T) (maximum number of independent rows). The dual notion to rank is the nullity of A, sometimes denoted nullity(A) or corank(A), and is defined as dim N(A).

Theorem 3.17. Let A : ℝ^n → ℝ^m. Then dim R(A) = dim N(A)^⊥. (Note: Since N(A)^⊥ = R(A^T), this theorem is sometimes colloquially stated "row rank of A = column rank of A.")
Proof: Define a linear transformation T : N(A)^⊥ → R(A) by

Tv = Av for all v ∈ N(A)^⊥.
Clearly T is 1-1 (since N(T) = 0). To see that T is also onto, take any w ∈ R(A). Then by definition there is a vector x ∈ ℝ^n such that Ax = w. Write x = x_1 + x_2, where x_1 ∈ N(A)^⊥ and x_2 ∈ N(A). Then Ax_1 = w = Tx_1 since x_1 ∈ N(A)^⊥. The last equality shows that T is onto. We thus have that dim R(A) = dim N(A)^⊥ since it is easily shown that if {v_1, ..., v_r} is a basis for N(A)^⊥, then {Tv_1, ..., Tv_r} is a basis for R(A). Finally, if we apply this and several previous results, the following string of equalities follows easily: "column rank of A" = rank(A) = dim R(A) = dim N(A)^⊥ = dim R(A^T) = rank(A^T) = "row rank of A." □

The following corollary is immediate. Like the theorem, it is a statement about equality of dimensions; the subspaces themselves are not necessarily in the same vector space.

Corollary 3.18. Let A : ℝ^n → ℝ^m. Then dim N(A) + dim R(A) = n, where n is the dimension of the domain of A.
Proof: Theorems 3.11 3.11 and and 3.17 3.17 we we see see immediately Proof: From From Theorems immediately that that n = dimN(A) = dimN(A)
+ dimN(A)L + dim R(A) .
0
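Corollary 3.18 can be illustrated numerically (NumPy sketch with an illustrative matrix, not from the text); an SVD-based nullspace basis makes both dimensions explicit:

```python
import numpy as np

A = np.array([[1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])          # A : R^3 -> R^2
n = A.shape[1]

U, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-12))            # dim R(A)
null_basis = Vt[rank:]                   # remaining rows of V^T span N(A)

assert np.allclose(A @ null_basis.T, 0.0)   # they really lie in the nullspace
assert rank + null_basis.shape[0] == n      # dim R(A) + dim N(A) = n
```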
For completeness, we include here a few miscellaneous results about ranks of sums and products of matrices.

Theorem 3.19. Let A, B ∈ ℝ^{n×n}. Then

1. 0 ≤ rank(A + B) ≤ rank(A) + rank(B).

2. rank(A) + rank(B) − n ≤ rank(AB) ≤ min{rank(A), rank(B)}.

3. nullity(B) ≤ nullity(AB) ≤ nullity(A) + nullity(B).

4. If B is nonsingular, rank(AB) = rank(BA) = rank(A) and N(BA) = N(A).
Part 4 of Theorem 3.19 suggests looking at the general problem of the four fundamental subspaces of matrix products. The basic results are contained in the following easily proved theorem.
Theorem 3.20. Let A ∈ ℝ^{m×n}, B ∈ ℝ^{n×p}. Then

1. R(AB) ⊆ R(A).

2. N(AB) ⊇ N(B).

3. R((AB)^T) ⊆ R(B^T).

4. N((AB)^T) ⊇ N(A^T).
The next theorem is closely related to Theorem 3.20 and is also easily proved. It is extremely useful in the text that follows, especially when dealing with pseudoinverses and linear least squares problems.

Theorem 3.21. Let A ∈ ℝ^{m×n}. Then

1. R(A) = R(AA^T).

2. R(A^T) = R(A^T A).

3. N(A) = N(A^T A).

4. N(A^T) = N(AA^T).
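Parts 1 and 2 of Theorem 3.21 can be spot-checked numerically (NumPy sketch with an illustrative random matrix, not from the text); two column spaces coincide exactly when stacking the matrices side by side does not increase the rank:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))
rank = np.linalg.matrix_rank

def same_range(X, Y):
    # R(X) = R(Y)  iff  rank [X Y] = rank X = rank Y
    return rank(np.hstack([X, Y])) == rank(X) == rank(Y)

assert same_range(A, A @ A.T)    # R(A)   = R(A A^T)
assert same_range(A.T, A.T @ A)  # R(A^T) = R(A^T A)
```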
We now characterize 1-1 and onto transformations and provide characterizations in terms of rank and invertibility.

Theorem 3.22. Let A : ℝ^n → ℝ^m. Then

1. A is onto if and only if rank(A) = m (A has linearly independent rows or is said to have full row rank; equivalently, AA^T is nonsingular).

2. A is 1-1 if and only if rank(A) = n (A has linearly independent columns or is said to have full column rank; equivalently, A^T A is nonsingular).
Proof: Proof of part 1: If A is onto, dim R(A) = m = rank(A). Conversely, let y ∈ ℝ^m be arbitrary. Let x = A^T (AA^T)^{-1} y ∈ ℝ^n. Then y = Ax, i.e., y ∈ R(A), so A is onto.

Proof of part 2: If A is 1-1, then N(A) = 0, which implies that dim N(A)^⊥ = n = dim R(A^T), and hence dim R(A) = n by Theorem 3.17. Conversely, suppose Ax_1 = Ax_2. Then A^T Ax_1 = A^T Ax_2, which implies x_1 = x_2 since A^T A is invertible. Thus, A is 1-1. □
Definition 3.23. A : V → W is invertible (or bijective) if and only if it is 1-1 and onto.

Note that if A is invertible, then dim V = dim W. Also, A : ℝ^n → ℝ^n is invertible or nonsingular if and only if rank(A) = n.

Note that in the special case when A ∈ ℝ^{n×n} has rank r, the transformations A, A^T, and A^{-1} are all 1-1 and onto between the two spaces N(A)^⊥ and R(A). The transformations A^T and A^{-1} have the same domain and range but are in general different maps unless A is orthogonal. Similar remarks apply to A and A^{-T}.
If a linear transformation is not invertible, it may still be right or left invertible. Definitions of these concepts are followed by a theorem characterizing left and right invertible transformations.
Definition 3.24. Let A : V → W. Then

1. A is said to be right invertible if there exists a right inverse transformation A^{-R} : W → V such that AA^{-R} = I_W, where I_W denotes the identity transformation on W.

2. A is said to be left invertible if there exists a left inverse transformation A^{-L} : W → V such that A^{-L}A = I_V, where I_V denotes the identity transformation on V.
Theorem 3.25. Let A : V → W. Then

1. A is right invertible if and only if it is onto.

2. A is left invertible if and only if it is 1-1.

Moreover, A is invertible if and only if it is both right and left invertible, i.e., both 1-1 and onto, in which case A^{-1} = A^{-R} = A^{-L}.

Note: From Theorem 3.22 we see that if A : ℝ^n → ℝ^m is onto, then a right inverse is given by A^{-R} = A^T (AA^T)^{-1}. Similarly, if A is 1-1, then a left inverse is given by A^{-L} = (A^T A)^{-1} A^T.
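The two closed-form inverses in the Note are easy to verify directly (NumPy sketch with an illustrative full-row-rank matrix, not from the text):

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])          # full row rank, hence onto

AR = A.T @ np.linalg.inv(A @ A.T)        # right inverse A^{-R} = A^T (A A^T)^{-1}
assert np.allclose(A @ AR, np.eye(2))

B = A.T                                  # full column rank, hence 1-1
BL = np.linalg.inv(B.T @ B) @ B.T        # left inverse B^{-L} = (B^T B)^{-1} B^T
assert np.allclose(BL @ B, np.eye(2))
```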
Theorem 3.26. Let A : V → V. Then

1. If there exists a unique right inverse A^{-R} such that AA^{-R} = I, then A is invertible.

2. If there exists a unique left inverse A^{-L} such that A^{-L}A = I, then A is invertible.

Proof: We prove the first part and leave the proof of the second to the reader. Notice the following:

$$\begin{aligned}
A(A^{-R} + A^{-R}A - I) &= AA^{-R} + AA^{-R}A - A \\
&= I + IA - A \qquad \text{since } AA^{-R} = I \\
&= I.
\end{aligned}$$

Thus, (A^{-R} + A^{-R}A − I) must be a right inverse and, therefore, by uniqueness it must be the case that A^{-R} + A^{-R}A − I = A^{-R}. But this implies that A^{-R}A = I, i.e., that A^{-R} is a left inverse. It then follows from Theorem 3.25 that A is invertible. □
Example 3.27.

1. Let A = [1  2] : ℝ^2 → ℝ^1. Then A is onto. (Proof: Take any α ∈ ℝ^1; one can always find v ∈ ℝ^2 such that [1  2]v = α.) Obviously A has full row rank (= 1) and A^{-R} = [−1  1]^T is a right inverse. Also, it is clear that there are infinitely many right inverses for A. In Chapter 6 we characterize all right inverses of a matrix by characterizing all solutions of the linear matrix equation AR = I.
2. Let A = [1  2]^T : ℝ^1 → ℝ^2. Then A is 1-1. (Proof: The only solution to 0 = Av = [1  2]^T v is v = 0, whence N(A) = 0, so A is 1-1.) It is now obvious that A has full column rank (= 1) and A^{-L} = [3  −1] is a left inverse. Again, it is clear that there are infinitely many left inverses for A. In Chapter 6 we characterize all left inverses of a matrix by characterizing all solutions of the linear matrix equation LA = I.
3. The matrix

$$A = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{bmatrix},$$

when considered as a linear transformation on ℝ^3, is neither 1-1 nor onto. We give below bases for its four fundamental subspaces.
EXERCISES

1. Let A ∈ ℝ^{2×3} and consider A as a linear transformation mapping ℝ^3 to ℝ^2. Find the matrix representation of A with respect to given bases of ℝ^3 and ℝ^2.

2. Consider the vector space ℝ^{n×n} over ℝ, let S denote the subspace of symmetric matrices, and let R denote the subspace of skew-symmetric matrices. For matrices X, Y ∈ ℝ^{n×n} define their inner product by ⟨X, Y⟩ = Tr(X^T Y). Show that, with respect to this inner product, R = S^⊥.

3. Consider the differentiation operator L defined in Example 3.2.3. Is L 1-1? Is L onto?

4. Prove Theorem 3.4.
5. Prove Theorem 3.11.4.

6. Prove Theorem 3.12.2.
7. Determine bases for the four fundamental subspaces of the matrix A.

8. Suppose A ∈ ℝ^{m×n} has a left inverse. Show that A^T has a right inverse.
9. Let A = $\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$. Determine N(A) and R(A). Are they equal? Is this true in general? If this is true in general, prove it; if not, provide a counterexample.

10. Suppose A ∈ ℝ^{9×48} with rank(A) = 8. How many linearly independent solutions can be found to the homogeneous linear system Ax = 0?

11. Modify Figure 3.1 to illustrate the four fundamental subspaces associated with A^T ∈ ℝ^{n×m} thought of as a transformation from ℝ^m to ℝ^n.
Chapter 4

Introduction to the Moore-Penrose Pseudoinverse

In this chapter we give a brief introduction to the Moore-Penrose pseudoinverse, a generalization of the inverse of a matrix. The Moore-Penrose pseudoinverse is defined for any matrix and, as is shown in the following text, brings great notational and conceptual clarity to the study of solutions to arbitrary systems of linear equations and linear least squares problems.
4.1  Definitions and Characterizations

Consider a linear transformation A : X → Y, where X and Y are arbitrary finite-dimensional vector spaces. Define a transformation T : N(A)^⊥ → R(A) by

Tx = Ax for all x ∈ N(A)^⊥.

Then, as noted in the proof of Theorem 3.17, T is bijective (1-1 and onto), and hence we can define a unique inverse transformation T^{-1} : R(A) → N(A)^⊥. This transformation can be used to give our first definition of A^+, the Moore-Penrose pseudoinverse of A. Unfortunately, the definition neither provides nor suggests a good computational strategy for determining A^+.

Definition 4.1. With A and T as defined above, define a transformation A^+ : Y → X by

A^+ y = T^{-1} y_1,
where y = y_1 + y_2 with y_1 ∈ R(A) and y_2 ∈ R(A)^⊥. Then A^+ is the Moore-Penrose pseudoinverse of A.
Although X and Y were arbitrary vector spaces above, let us henceforth consider the case X = ℝ^n and Y = ℝ^m. We have thus defined A^+ for all A ∈ ℝ^{m×n}. A purely algebraic characterization of A^+ is given in the next theorem, which was proved by Penrose in 1955; see [22].
Theorem 4.2. Let A ∈ ℝ^{m×n}. Then G = A^+ if and only if

(P1) AGA = A.

(P2) GAG = G.

(P3) (AG)^T = AG.

(P4) (GA)^T = GA.

Furthermore, A^+ always exists and is unique.
Note that the inverse of a nonsingular matrix satisfies all four Penrose properties. Also, a right or left inverse satisfies no fewer than three of the four properties. Unfortunately, as with Definition 4.1, neither the statement of Theorem 4.2 nor its proof suggests a computational algorithm. However, the Penrose properties do offer the great virtue of providing a checkable criterion in the following sense. Given a matrix G that is a candidate for being the pseudoinverse of A, one need simply verify the four Penrose conditions (P1)-(P4). If G satisfies all four, then by uniqueness, it must be A^+. Such a verification is often relatively straightforward.
Example 4.3. Consider A = [1  2]^T. Verify directly that A^+ = [1/5  2/5] satisfies (P1)-(P4). Note that other left inverses (for example, A^{-L} = [3  −1]) satisfy properties (P1), (P2), and (P4) but not (P3).
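The verification suggested in Example 4.3 is easy to mechanize (NumPy sketch, not part of the text):

```python
import numpy as np

def penrose(A, G):
    """Return the truth values of conditions (P1)-(P4) for the pair (A, G)."""
    return (bool(np.allclose(A @ G @ A, A)),
            bool(np.allclose(G @ A @ G, G)),
            bool(np.allclose((A @ G).T, A @ G)),
            bool(np.allclose((G @ A).T, G @ A)))

A = np.array([[1.0], [2.0]])
A_plus = np.array([[0.2, 0.4]])   # [1/5  2/5]
A_L = np.array([[3.0, -1.0]])     # another left inverse

assert penrose(A, A_plus) == (True, True, True, True)
assert penrose(A, A_L) == (True, True, False, True)   # fails (P3) only
```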
Still another characterization of A^+ is given in the following theorem, whose proof can be found in [1, p. 19]. While not generally suitable for computer implementation, this characterization can be useful for hand calculation of small examples.

Theorem 4.4. Let A ∈ ℝ^{m×n}. Then

$$A^+ = \lim_{\delta \to 0}\,(A^T A + \delta^2 I)^{-1} A^T \tag{4.1}$$

$$\phantom{A^+} = \lim_{\delta \to 0}\,A^T (A A^T + \delta^2 I)^{-1}. \tag{4.2}$$
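Equation (4.1) also suggests a crude numerical approximation for a small fixed δ (NumPy sketch with an illustrative rank-deficient matrix, not from the text; this is not a recommended algorithm, merely an illustration of the limit):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [1.0, 2.0]])        # rank one, so no ordinary inverse exists
n = A.shape[1]

def penrose_limit(A, delta):
    # the expression inside the limit in (4.1): (A^T A + delta^2 I)^{-1} A^T
    return np.linalg.solve(A.T @ A + delta**2 * np.eye(n), A.T)

approx = penrose_limit(A, 1e-4)
exact = np.linalg.pinv(A)         # equals [[0.1, 0.1], [0.2, 0.2]]

assert np.allclose(approx, exact, atol=1e-6)
```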
4.2  Examples

Each of the following can be derived or verified by using the above definitions or characterizations.

Example 4.5. A^+ = A^T (AA^T)^{-1} if A is onto (independent rows) (A is right invertible).

Example 4.6. A^+ = (A^T A)^{-1} A^T if A is 1-1 (independent columns) (A is left invertible).

Example 4.7. For any scalar a,

$$a^+ = \begin{cases} a^{-1} & \text{if } a \neq 0, \\ 0 & \text{if } a = 0. \end{cases}$$
Example 4.8. For any vector v ∈ ℝ^n,

$$v^+ = \begin{cases} \dfrac{v^T}{v^T v} & \text{if } v \neq 0, \\ 0^T & \text{if } v = 0. \end{cases}$$
Example 4.9.

$$\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}^+ = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}.$$

Example 4.10.

$$\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}^+ = \begin{bmatrix} \tfrac14 & \tfrac14 \\ \tfrac14 & \tfrac14 \end{bmatrix}.$$

4.3  Properties and Applications
This section presents some miscellaneous useful results on pseudoinverses. Many of these are used in the text that follows.

Theorem 4.11. Let A ∈ ℝ^{m×n} and suppose U ∈ ℝ^{m×m}, V ∈ ℝ^{n×n} are orthogonal (M is orthogonal if M^T = M^{-1}). Then

$$(U A V^T)^+ = V A^+ U^T.$$
Proof: For the proof, simply verify that the expression above does indeed satisfy each of the four Penrose conditions. □

Theorem 4.12. Let S ∈ ℝ^{n×n} be symmetric with U^T S U = D, where U is orthogonal and D is diagonal. Then S^+ = U D^+ U^T, where D^+ is again a diagonal matrix whose diagonal elements are determined according to Example 4.7.
Theorem 4.13. For all A ∈ ℝ^{m×n},

1. A^+ = (A^T A)^+ A^T = A^T (A A^T)^+.

2. (A^T)^+ = (A^+)^T.
Proof: Both results can be proved using the limit characterization of Theorem 4.4. The proof of the first result is not particularly easy and does not even have the virtue of being especially illuminating. The interested reader can consult the proof in [1, p. 27]. The proof of the second result (which can also be proved easily by verifying the four Penrose conditions) is as follows:

$$\begin{aligned}
(A^T)^+ &= \lim_{\delta \to 0}\,(A A^T + \delta^2 I)^{-1} A \\
&= \lim_{\delta \to 0}\,\bigl[A^T (A A^T + \delta^2 I)^{-1}\bigr]^T \\
&= \Bigl[\lim_{\delta \to 0}\,A^T (A A^T + \delta^2 I)^{-1}\Bigr]^T \\
&= (A^+)^T. \qquad \square
\end{aligned}$$
4.12 and 4.13 Note that by combining Theorems 4.12 4.13 we can, can, in theory at least, compute the MoorePenrose pseudoinverse of any matrix (since AAT A AT and AT AT A are symmetric). This e.g., [7], [7], [II], [11], turns out to be a poor poor approach in finiteprecision arithmetic, however (see, (see, e.g., [23]), and better methods are suggested in text that follows. Theorem Theorem 4.11 4.11 is suggestive of a "reverseorder" property for pseudoinverses of prodnets of of matrices such as as exists exists for of products. nroducts TTnfortnnatelv. in general, peneraK ucts matrices such for inverses inverses of Unfortunately, in
As example consider [0 1J B= A = = [0 I] and and B = [LI. : J. Then Then As an an example consider A (AB)+ = 1+ = I
while while B+ A+
= [~
[]
~J ~ = ~.
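The counterexample is easy to reproduce (NumPy sketch, not part of the text):

```python
import numpy as np

A = np.array([[0.0, 1.0]])          # 1 x 2
B = np.array([[1.0],
              [1.0]])               # 2 x 1
pinv = np.linalg.pinv

lhs = pinv(A @ B)                   # (AB)^+ = 1^+ = 1
rhs = pinv(B) @ pinv(A)             # B^+ A^+ = [1/2 1/2] [0 1]^T = 1/2

assert np.isclose(lhs.item(), 1.0)
assert np.isclose(rhs.item(), 0.5)  # so (AB)^+ != B^+ A^+ here
```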
sufficient conditions under which the reverseorder reverseorder property does However, necessary and sufficient hold are known and we quote a couple of moderately useful results for reference. + + Theorem 4.14. 4.14. (AB)+ (AB)+ = = B B+ A A + ifif and and only only if if
1. R(B B^T A^T) ⊆ R(A^T), and

2. R(A^T A B) ⊆ R(B).
Proof: For the proof, see [9]. □
Theorem 4.15. (AB)^+ = B_1^+ A_1^+, where B_1 = A^+ A B and A_1 = A B_1 B_1^+.
Proof: For the proof, see [5]. □
Theorem 4.16. If A ∈ ℝ^{n×r} and B ∈ ℝ^{r×m} both have full rank r, then (AB)^+ = B^+ A^+.

Proof: Since A has full column rank, A^+ = (A^T A)^{-1} A^T, whence A^+ A = I_r. Similarly, since B has full row rank, B^+ = B^T (B B^T)^{-1}, whence B B^+ = I_r. The result then follows by taking B_1 = B, A_1 = A in Theorem 4.15. □
The following theorem gives some additional useful properties of pseudoinverses.

Theorem 4.17. For all A ∈ ℝ^{m×n},

1. (A^+)^+ = A.

2. (A^T A)^+ = A^+ (A^T)^+, (A A^T)^+ = (A^T)^+ A^+.

3. R(A^+) = R(A^T) = R(A^+ A) = R(A^T A).

4. N(A^+) = N(A A^+) = N((A A^T)^+) = N(A A^T) = N(A^T).

5. If A is normal, then A^k A^+ = A^+ A^k and (A^k)^+ = (A^+)^k for all integers k > 0.
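A few of these identities can be spot-checked numerically (NumPy sketch with an illustrative random matrix, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))
P = np.linalg.pinv

assert np.allclose(P(P(A)), A)                 # property 1: (A^+)^+ = A
assert np.allclose(P(A.T @ A), P(A) @ P(A.T))  # property 2: (A^T A)^+ = A^+ (A^T)^+
assert np.allclose(P(A.T), P(A).T)             # (A^T)^+ = (A^+)^T (Theorem 4.13)
```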
Note: Recall that A ∈ ℝ^{n×n} is normal if A A^T = A^T A. For example, if A is symmetric, skew-symmetric, or orthogonal, then it is normal. However, a matrix can be none of the preceding but still be normal, such as

$$A = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}$$

for scalars a, b ∈ ℝ.

The next theorem is fundamental to facilitating a compact and unifying approach to studying the existence of solutions of (matrix) linear equations and linear least squares problems.

Theorem 4.18. Suppose A ∈ ℝ^{n×p}, B ∈ ℝ^{n×m}. Then R(B) ⊆ R(A) if and only if AA^+ B = B.

Proof: Suppose R(B) ⊆ R(A) and take arbitrary x ∈ ℝ^m. Then Bx ∈ R(B) ⊆ R(A), so there exists a vector y ∈ ℝ^p such that Ay = Bx. Then we have
$$Bx = Ay = AA^+ A y = AA^+ B x,$$
where one of the Penrose properties is used above. Since x was arbitrary, we have shown that B = AA^+ B.

To prove the converse, assume that AA^+ B = B and take arbitrary y ∈ R(B). Then there exists a vector x ∈ ℝ^m such that Bx = y, whereupon
y = Bx = AA+Bx E R(A).
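The criterion of Theorem 4.18 is directly computable. In this illustrative sketch (matrices are hypothetical), B is built so its columns lie in R(A), so AA^+ B = B holds, while a generic C fails the test:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))          # R(A) is a 3-dimensional subspace of R^5
B = A @ rng.standard_normal((3, 4))      # columns of B lie in R(A) by construction
Ap = np.linalg.pinv(A)

range_inclusion_holds = np.allclose(A @ Ap @ B, B)   # AA^+ B = B, so R(B) ⊆ R(A)

C = rng.standard_normal((5, 4))          # generic C: R(C) not in R(A) almost surely
criterion_fails = not np.allclose(A @ Ap @ C, C)
```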
EXERCISES
1. Use Theorem 4.4 to compute the pseudoinverse of [ 1  2 ; 1  2 ].

2. If x, y ∈ R^n, show that (x y^T)^+ = (x^T x)^+ (y^T y)^+ y x^T.

3. For A ∈ R^{m×n}, prove that R(A) = R(AA^T) using only definitions and elementary properties of the Moore-Penrose pseudoinverse.

4. For A ∈ R^{m×n}, prove that R(A^+) = R(A^T).

5. For A ∈ R^{p×n} and B ∈ R^{m×n}, show that N(A) ⊆ N(B) if and only if BA^+ A = B.

6. Let A ∈ R^{n×n}, B ∈ R^{n×m}, and suppose further that D ∈ R^{m×m} is nonsingular.

   (a) Prove or disprove that

       [ A   AB ]^+   [ A^+   -A^+ A B D^{-1} ]
       [ 0    D ]   = [ 0            D^{-1}   ].

   (b) Prove or disprove that

       [ A   B ]^+   [ A^+   -A^+ B D^{-1} ]
       [ 0   D ]   = [ 0           D^{-1}  ].
Chapter 5

Introduction to the Singular Value Decomposition

In this chapter we give a brief introduction to the singular value decomposition (SVD). We show that every matrix has an SVD and describe some useful properties and applications of this important matrix factorization. The SVD plays a key conceptual and computational role throughout (numerical) linear algebra and its applications.

5.1 The Fundamental Theorem
Theorem 5.1. Let A ∈ R^{m×n} have rank r. Then there exist orthogonal matrices U ∈ R^{m×m} and V ∈ R^{n×n} such that

    A = U Σ V^T,                                                  (5.1)

where Σ = [ S  0 ; 0  0 ], S = diag(σ_1, ..., σ_r) ∈ R^{r×r}, and σ_1 ≥ ... ≥ σ_r > 0. More specifically, we have

    A = [ U_1   U_2 ] [ S   0 ] [ V_1^T ]                         (5.2)
                      [ 0   0 ] [ V_2^T ]

      = U_1 S V_1^T.                                              (5.3)

The submatrix sizes are all determined by r (which must be ≤ min{m, n}), i.e., U_1 ∈ R^{m×r}, U_2 ∈ R^{m×(m-r)}, V_1 ∈ R^{n×r}, V_2 ∈ R^{n×(n-r)}, and the 0-subblocks in Σ are compatibly dimensioned.
Proof: Since A^T A ≥ 0 (A^T A is symmetric and nonnegative definite; recall, for example, [24, Ch. 6]), its eigenvalues are all real and nonnegative. (Note: The rest of the proof follows analogously if we start with the observation that AA^T ≥ 0; the details are left to the reader as an exercise.) Denote the set of eigenvalues of A^T A by {σ_i^2, i ∈ n} with σ_1 ≥ ... ≥ σ_r > 0 = σ_{r+1} = ... = σ_n. Let {v_i, i ∈ n} be a set of corresponding orthonormal eigenvectors and let V_1 = [v_1, ..., v_r], V_2 = [v_{r+1}, ..., v_n]. Letting S = diag(σ_1, ..., σ_r), we can write A^T A V_1 = V_1 S^2. Premultiplying by V_1^T gives V_1^T A^T A V_1 = V_1^T V_1 S^2 = S^2, the latter equality following from the orthonormality of the v_i vectors. Pre- and postmultiplying by S^{-1} gives the equation

    S^{-1} V_1^T A^T A V_1 S^{-1} = I.                            (5.4)
Turning now to the eigenvalue equations corresponding to the eigenvalues σ_{r+1}, ..., σ_n, we have that A^T A V_2 = V_2 · 0 = 0, whence V_2^T A^T A V_2 = 0. Thus, A V_2 = 0. Now define the matrix U_1 ∈ R^{m×r} by U_1 = A V_1 S^{-1}. Then from (5.4) we see that U_1^T U_1 = I; i.e., the columns of U_1 are orthonormal. Choose any matrix U_2 ∈ R^{m×(m-r)} such that [U_1  U_2] is orthogonal. Then

    U^T A V = [ U_1^T A V_1   U_1^T A V_2 ]   [ U_1^T A V_1   0 ]
              [ U_2^T A V_1   U_2^T A V_2 ] = [ U_2^T A V_1   0 ]
since A V_2 = 0. Referring to the equation U_1 = A V_1 S^{-1} defining U_1, we see that U_1^T A V_1 = S and U_2^T A V_1 = U_2^T U_1 S = 0. The latter equality follows from the orthogonality of the columns of U_1 and U_2. Thus, we see that, in fact, U^T A V = [ S  0 ; 0  0 ], and defining this matrix to be Σ completes the proof.   □

Definition 5.2. Let A = U Σ V^T be an SVD of A as in Theorem 5.1.

1. The set {σ_1, ..., σ_r} is called the set of (nonzero) singular values of the matrix A and is denoted Σ(A). From the proof of Theorem 5.1 we see that σ_i(A) = λ_i^{1/2}(A^T A) = λ_i^{1/2}(A A^T). Note that there are also min{m, n} − r zero singular values.

2. The columns of U are called the left singular vectors of A (and are the orthonormal eigenvectors of A A^T).

3. The columns of V are called the right singular vectors of A (and are the orthonormal eigenvectors of A^T A).

Remark 5.3. The analogous complex case in which A ∈ C^{m×n} is quite straightforward. The decomposition is A = U Σ V^H, where U and V are unitary and the proof is essentially identical, except for Hermitian transposes replacing transposes.
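The relationship σ_i(A) = λ_i^{1/2}(A^T A) can be checked directly. A short sketch (illustrative matrix; NumPy's np.linalg.svd returns singular values in decreasing order):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))

U, s, Vt = np.linalg.svd(A)              # full SVD: A = U @ Sigma @ Vt
Sigma = np.zeros((4, 3))
Sigma[:3, :3] = np.diag(s)

reconstructs = np.allclose(U @ Sigma @ Vt, A)

# Singular values are the square roots of the eigenvalues of A^T A.
eigs = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]   # sort decreasing
matches_eigs = np.allclose(s, np.sqrt(eigs))
```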
Remark 5.4. Note that U and V can be interpreted as changes of basis in both the domain and co-domain spaces with respect to which A then has a diagonal matrix representation. Specifically, let C denote A thought of as a linear transformation mapping R^n to R^m. Then rewriting A = U Σ V^T as A V = U Σ we see that Mat C is Σ with respect to the bases {v_1, ..., v_n} for R^n and {u_1, ..., u_m} for R^m (see the discussion in Section 3.2). See also Remark 5.16.

Remark 5.5. The singular value decomposition is not unique. For example, an examination of the proof of Theorem 5.1 reveals that

• any orthonormal basis for N(A) can be used for V_2.

• there may be nonuniqueness associated with the columns of V_1 (and hence U_1) corresponding to multiple σ_i's.
• any U_2 can be used so long as [U_1  U_2] is orthogonal.

• columns of U and V can be changed (in tandem) by sign (or multiplier of the form e^{jθ} in the complex case).

What is unique, however, is the matrix Σ and the span of the columns of U_1, U_2, V_1, and V_2 (see Theorem 5.11). Note, too, that a "full SVD" (5.2) can always be constructed from a "compact SVD" (5.3).
Remark 5.6. Computing an SVD by working directly with the eigenproblem for A^T A or A A^T is numerically poor in finite-precision arithmetic. Better algorithms exist that work directly on A via a sequence of orthogonal transformations; see, e.g., [7], [11], [25].

Example 5.7.

    A = [ 1   0 ]
        [ 0   1 ]  = U I U^T,

where U is an arbitrary 2 × 2 orthogonal matrix, is an SVD.

Example 5.8.

    A = [ 1    0 ]   [  cos θ   sin θ ] [ 1   0 ] [ cos θ    sin θ ]
        [ 0   -1 ] = [ -sin θ   cos θ ] [ 0   1 ] [ sin θ   -cos θ ],

where θ is arbitrary, is an SVD.

Example 5.9.
    A = [ 1   1 ]   [ 1/3    2/√5    2√5/15 ] [ 3√2   0 ] [ √2/2    √2/2 ]
        [ 2   2 ] = [ 2/3   -1/√5    4√5/15 ] [  0    0 ] [ √2/2   -√2/2 ]
        [ 2   2 ]   [ 2/3     0      -√5/3  ] [  0    0 ]

          [ 1/3 ]
        = [ 2/3 ] (3√2) [ √2/2   √2/2 ]
          [ 2/3 ]
is an SVD.

Example 5.10. Let A ∈ R^{n×n} be symmetric and positive definite. Let V be an orthogonal matrix of eigenvectors that diagonalizes A, i.e., V^T A V = Λ > 0. Then A = V Λ V^T is an SVD of A.
A factorization U Σ V^T of an m × n matrix A qualifies as an SVD if U and V are orthogonal and Σ is an m × n "diagonal" matrix whose diagonal elements in the upper left corner are positive (and ordered). For example, if A = U Σ V^T is an SVD of A, then V Σ^T U^T is an SVD of A^T.
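The last observation is easy to confirm numerically. A sketch (illustrative matrix) building an SVD of A^T from an SVD of A:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))

U, s, Vt = np.linalg.svd(A)
Sigma = np.zeros((4, 3))
Sigma[:3, :3] = np.diag(s)

# If A = U Sigma V^T, then V Sigma^T U^T is an SVD of A^T.
At_factored = Vt.T @ Sigma.T @ U.T
transposed_svd_ok = np.allclose(At_factored, A.T)
```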
5.2 Some Basic Properties
Theorem 5.11. Let A ∈ R^{m×n} have a singular value decomposition A = U Σ V^T. Using the notation of Theorem 5.1, the following properties hold:

1. rank(A) = r = the number of nonzero singular values of A.

2. Let U = [u_1, ..., u_m] and V = [v_1, ..., v_n]. Then A has the dyadic (or outer product) expansion

    A = Σ_{i=1}^{r} σ_i u_i v_i^T.                                (5.5)

3. The singular vectors satisfy the relations

    A v_i = σ_i u_i,                                              (5.6)
    A^T u_i = σ_i v_i                                             (5.7)

for i ∈ r.

4. Let U_1 = [u_1, ..., u_r], U_2 = [u_{r+1}, ..., u_m], V_1 = [v_1, ..., v_r], and V_2 = [v_{r+1}, ..., v_n]. Then

(a) R(U_1) = R(A) = N(A^T)^⊥.

(b) R(U_2) = R(A)^⊥ = N(A^T).

(c) R(V_1) = N(A)^⊥ = R(A^T).

(d) R(V_2) = N(A) = R(A^T)^⊥.
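Part 4 can be exercised directly: partitioning the computed U and V at the rank r yields orthonormal bases for all four fundamental subspaces. A sketch with a hypothetical rank-2 matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
r, m, n = 2, 5, 4
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-2 matrix

U, s, Vt = np.linalg.svd(A)
U1, U2 = U[:, :r], U[:, r:]         # bases for R(A) and N(A^T)
V1, V2 = Vt[:r, :].T, Vt[r:, :].T   # bases for R(A^T) and N(A)

null_ok = np.allclose(A @ V2, 0)            # columns of V2 span N(A)
left_null_ok = np.allclose(A.T @ U2, 0)     # columns of U2 span N(A^T)
range_ok = np.allclose(U1 @ U1.T @ A, A)    # projecting onto R(U1) leaves A fixed
```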
Remark 5.12. Part 4 of the above theorem provides a numerically superior method for finding (orthonormal) bases for the four fundamental subspaces compared to methods based on, for example, reduction to row or column echelon form. Note that each subspace requires knowledge of the rank r. The relationship to the four fundamental subspaces is summarized nicely in Figure 5.1.

Remark 5.13. The elegance of the dyadic decomposition (5.5) as a sum of outer products and the key vector relations (5.6) and (5.7) explain why it is conventional to write the SVD as A = U Σ V^T rather than, say, A = U Σ V.

Theorem 5.14. Let A ∈ R^{m×n} have a singular value decomposition A = U Σ V^T as in Theorem 5.1. Then
    A^+ = V Σ^+ U^T,                                              (5.8)

where
    Σ^+ = [ S^{-1}   0 ]  ∈ R^{n×m},                              (5.9)
          [   0      0 ]

Figure 5.1. SVD and the four fundamental subspaces.

with the 0-subblocks appropriately sized. Furthermore, if we let the columns of U and V be as defined in Theorem 5.11, then
    A^+ = Σ_{i=1}^{r} (1/σ_i) v_i u_i^T.                          (5.10)
Proof: The proof follows easily by verifying the four Penrose conditions.   □
Remark 5.15. Note that none of the expressions above quite qualifies as an SVD of A^+ if we insist that the singular values be ordered from largest to smallest. However, a simple reordering accomplishes the task:

    A^+ = Σ_{i=1}^{r} (1/σ_{r+1-i}) v_{r+1-i} u_{r+1-i}^T.        (5.11)

This can also be written in matrix terms by using the so-called reverse-order identity matrix (or exchange matrix) P = [e_r, e_{r-1}, ..., e_2, e_1], which is clearly orthogonal and symmetric.
Then

    A^+ = (V_1 P)(P S^{-1} P)(P U_1^T)

is the matrix version of (5.11). A "full SVD" can be similarly constructed.
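Formula (5.8) gives a practical way to compute the pseudoinverse: invert only the nonzero singular values and transpose the diagonal pattern. A sketch (hypothetical rank-2 matrix; the tolerance cutoff mimics what library routines do):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, r = 5, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank 2

U, s, Vt = np.linalg.svd(A)
# Invert only singular values that are nonzero relative to the largest.
s_inv = np.array([1.0 / x if x > 1e-10 * s[0] else 0.0 for x in s])
Sigma_plus = np.zeros((n, m))
Sigma_plus[:len(s), :len(s)] = np.diag(s_inv)

A_plus = Vt.T @ Sigma_plus @ U.T        # A^+ = V Sigma^+ U^T, formula (5.8)
pinv_matches = np.allclose(A_plus, np.linalg.pinv(A))
```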
Remark 5.16. 5.16. Recall Recall the the linear linear transformation transformation T used in in the the proof proof of of Theorem Theorem 3.17 and Remark T used 3.17 and is determined determined by by its its action action on on aa basis, basis, and and since in Definition Definition 4.1. 4.1. Since in Since T T is since ({VI, v \ , ... . . .,,vvr r}}isisaa basisforN(A).l, then TT can can be be defined defined by by TVj u rr}} basis forJ\f(A)±, then TV; == OjUj cr, w,, ,i / E~. e r. Similarly, Similarly, since since {UI, [u\,... ... , ,u is a basis forR(A), then then TcanbedefinedbyTIu; = tv; From Section Section 3.2, the isabasisfor7£(.4), T~lI can be defined by T^'M, = ^u, ,i , / eE~. r. From 3.2, the with respect respect to to the the bases bases {{VI, and {u clearly matrix representation representation for matrix for T T with v \ , ... ..., , vvrr}} and {MII,, ... . . . ,, uurr]} is is clearly with respect respect to to S, while the the matrix matrix representation representation for the inverse linear transformation transformation TS, while for the inverse linear T~lI with 1 the same bases is is 5"" SI.. the same bases
5.3 Row and Column Compressions

Row compression

Let A ∈ R^{m×n} have an SVD given by (5.1). Then

    U^T A = Σ V^T = [ S   0 ] [ V_1^T ]   [ S V_1^T ]
                    [ 0   0 ] [ V_2^T ] = [    0    ]  ∈ R^{m×n}.

Notice that N(A) = N(U^T A) = N(S V_1^T) and the matrix S V_1^T ∈ R^{r×n} has full row rank. In other words, premultiplication of A by U^T is an orthogonal transformation that "compresses" A by row transformations. Such a row compression can also be accomplished by orthogonal row transformations performed directly on A to reduce it to the form [ R ; 0 ], where R is upper triangular. Both compressions are analogous to the so-called row-reduced echelon form which, when derived by a Gaussian elimination algorithm implemented in finite-precision arithmetic, is not generally as reliable a procedure.
Column compression

Again, let A ∈ R^{m×n} have an SVD given by (5.1). Then

    A V = U Σ = [ U_1   U_2 ] [ S   0 ]
                              [ 0   0 ]

              = [ U_1 S   0 ]  ∈ R^{m×n}.

This time, notice that R(A) = R(A V) = R(U_1 S) and the matrix U_1 S ∈ R^{m×r} has full column rank. In other words, postmultiplication of A by V is an orthogonal transformation that "compresses" A by column transformations. Such a compression is analogous to the
so-called column-reduced echelon form, which is not generally a reliable procedure when performed by Gauss transformations in finite-precision arithmetic. For details, see, for example, [7], [11], [23], [25].
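Both compressions are visible numerically: past the rank r, the rows of U^T A and the columns of A V vanish. A sketch with a hypothetical rank-2 matrix:

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, r = 4, 3, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank 2

U, s, Vt = np.linalg.svd(A)

# Row compression: U^T A has all rows beyond the r-th equal to zero.
row_compressed = U.T @ A
rows_zeroed = np.allclose(row_compressed[r:, :], 0)

# Column compression: A V has all columns beyond the r-th equal to zero.
col_compressed = A @ Vt.T
cols_zeroed = np.allclose(col_compressed[:, r:], 0)
```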
EXERCISES

1. Let X ∈ R^{m×n}. If X X^T = 0, show that X = 0.

2. Prove Theorem 5.1 starting from the observation that A A^T ≥ 0.

3. Let A ∈ R^{n×n} be symmetric but indefinite. Determine an SVD of A.

4. Let x ∈ R^m, y ∈ R^n be nonzero vectors. Determine an SVD of the matrix A ∈ R^{m×n} defined by A = x y^T.
5. Determine SVDs of the matrices

   (a) [ -1   0 ]
       [  0   1 ]

   (b) [ 1 ]
       [ 0 ]
       [ 1 ]
6. Let A ∈ R^{m×n} and suppose W ∈ R^{m×m} and Y ∈ R^{n×n} are orthogonal.

   (a) Show that A and W A Y have the same singular values (and hence the same rank).

   (b) Suppose that W and Y are nonsingular but not necessarily orthogonal. Do A and W A Y have the same singular values? Do they have the same rank?

7. Let A ∈ R^{n×n}. Use the SVD to determine a polar factorization of A, i.e., A = Q P where Q is orthogonal and P = P^T > 0. Note: this is analogous to the polar form z = r e^{jθ} of a complex scalar z (where i = j = √(−1)).
Chapter 6

Linear Equations

In this chapter we examine existence and uniqueness of solutions of systems of linear equations. General linear systems of the form

    A X = B;   A ∈ R^{m×n},  B ∈ R^{m×k},                         (6.1)

are studied and include, as a special case, the familiar vector system

    A x = b;   A ∈ R^{n×n},  b ∈ R^n.                             (6.2)

6.1 Vector Linear Equations
We begin with a review of some of the principal results associated with vector linear systems.

Theorem 6.1. Consider the system of linear equations

    A x = b;   A ∈ R^{m×n},  b ∈ R^m.                             (6.3)

1. There exists a solution to (6.3) if and only if b ∈ R(A).

2. There exists a solution to (6.3) for all b ∈ R^m if and only if R(A) = R^m, i.e., A is onto; equivalently, there exists a solution if and only if rank([A, b]) = rank(A), and this is possible only if m ≤ n (since m = dim R(A) = rank(A) ≤ min{m, n}).

3. A solution to (6.3) is unique if and only if N(A) = 0, i.e., A is 1-1.

4. There exists a unique solution to (6.3) for all b ∈ R^m if and only if A is nonsingular; equivalently, A ∈ R^{m×m} and A has neither a 0 singular value nor a 0 eigenvalue.

5. There exists at most one solution to (6.3) for all b ∈ R^m if and only if the columns of A are linearly independent, i.e., N(A) = 0, and this is possible only if m ≥ n.

6. There exists a nontrivial solution to the homogeneous system A x = 0 if and only if rank(A) < n.
Proof: The proofs are straightforward and can be consulted in standard texts on linear algebra. Note that some parts of the theorem follow directly from others. For example, to prove part 6, note that x = 0 is always a solution to the homogeneous system. Therefore, we must have the case of a nonunique solution, i.e., A is not 1-1, which implies rank(A) < n by part 3.   □
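The existence test in part 2, rank([A, b]) = rank(A), is directly computable. A sketch (illustrative rank-2 system; np.linalg.matrix_rank estimates rank via the SVD, consistent with Remark 5.12):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))  # rank 2, m=4, n=3

b_in = A @ rng.standard_normal(3)     # b in R(A): solvable by construction
b_out = rng.standard_normal(4)        # generic b: almost surely not in R(A)

def solvable(A, b):
    # Existence criterion: rank([A, b]) == rank(A).
    return np.linalg.matrix_rank(np.column_stack([A, b])) == np.linalg.matrix_rank(A)

has_solution = solvable(A, b_in)
no_solution = not solvable(A, b_out)
```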
6.2 Matrix Linear Equations

In this section we present some of the principal results concerning existence and uniqueness of solutions of the general matrix linear system (6.1). Note that the results of Theorem 6.1 follow from those below for the special case k = 1, while results for (6.2) follow by specializing even further to the case m = n.

Theorem 6.2 (Existence). The matrix linear equation

    A X = B;   A ∈ R^{m×n},  B ∈ R^{m×k},                         (6.4)

has a solution if and only if R(B) ⊆ R(A); equivalently, a solution exists if and only if A A^+ B = B.

Proof: The subspace inclusion criterion follows essentially from the definition of the range of a matrix. The matrix criterion is Theorem 4.18.   □

Theorem 6.3. Let A ∈ R^{m×n}, B ∈ R^{m×k} and suppose that A A^+ B = B. Then any matrix of the form

    X = A^+ B + (I − A^+ A) Y,  where Y ∈ R^{n×k} is arbitrary,   (6.5)

is a solution of

    A X = B.                                                      (6.6)
Furthermore, all solutions of (6.6) are of this form.

Proof: To verify that (6.5) is a solution, premultiply by A:

    A X = A A^+ B + A (I − A^+ A) Y
        = B + (A − A A^+ A) Y        by hypothesis
        = B                          since A A^+ A = A by the first Penrose condition.

That all solutions are of this form can be seen as follows. Let Z be an arbitrary solution of (6.6), i.e., A Z = B. Then we can write

    Z = A^+ A Z + (I − A^+ A) Z
      = A^+ B + (I − A^+ A) Z,

and this is clearly of the form (6.5).   □
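The general solution (6.5) is easy to exercise: any choice of the arbitrary matrix Y yields a solution. A sketch with a hypothetical consistent system:

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))  # rank-deficient
B = A @ rng.standard_normal((3, 2))                            # guarantees AA^+B = B
Ap = np.linalg.pinv(A)

Y = rng.standard_normal((3, 2))               # arbitrary
X = Ap @ B + (np.eye(3) - Ap @ A) @ Y         # general solution, formula (6.5)

is_solution = np.allclose(A @ X, B)
```

Varying Y sweeps out every solution, since (I − A^+A) projects onto N(A).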
Remark 6.4. When A is square and nonsingular, A^+ = A^{-1} and so (I − A^+ A) = 0. Thus, there is no "arbitrary" component, leaving only the unique solution X = A^{-1} B.

Remark 6.5. It can be shown that the particular solution X = A^+ B is the solution of (6.6) that minimizes Tr X^T X. (Tr(·) denotes the trace of a matrix; recall that Tr X^T X = Σ_{i,j} x_{ij}^2.)
Theorem 6.6 (Uniqueness). A solution of the matrix linear equation

    A X = B;   A ∈ R^{m×n},  B ∈ R^{m×k},                         (6.7)

is unique if and only if A^+ A = I; equivalently, (6.7) has a unique solution if and only if N(A) = 0.

Proof: The first equivalence is immediate from Theorem 6.3. The second follows by noting that A^+ A = I can occur only if r = n, where r = rank(A) (recall r ≤ n). But rank(A) = n if and only if A is 1-1 or N(A) = 0.   □
vD
Example A Ee lR,mxn; Example 6.S. 6.8. Characterize Characterize all right inverses of a matrix A ]Rmx"; equivalently, find all AR = solutions R of the equation AR = 1Imm., Here, we write 1m Im to emphasize the m x m identity matrix, matrix. Solution: There exists a right inverse if and only if R(Im) R(A) and this is 7£(/m) S; c 7£(A) equivalent to Im. Clearly, Clearly, this can occur occur if if and only if if rank(A) rank(A) = = rr = m (since AA + +I1m this can and only m (since equivalent to AA m = 1m. + rr ::: is then a right inverse). All right inverses < m) m) and this is equivalent to A A being onto (A (A+ of A are then of the form of A R = A+ 1m
+ (In
 A+ A)Y
=A++(IA+A)Y, + A+ A = I/ where Y Ee lR,nxm E"xm is arbitrary, arbitrary. There is a unique right inverse if and and only if A A 1 (N(A) = 0), in which case A A must be invertible and R R = AI. (AA(A) = A" .
Example 6.9. 6.9. Consider Consider the system of linear firstorder difference equations Example (6,8)
46 46
Equations Chapter 6. Linear Equations
nxmxm IR nxxn" and B E IR (n(rc>l,ra>l). ~ I, m ~ I). The vector Jt* Xk in linear system theory is with A Ee R" fieR" at time time k while while Uk is the the input (control) vector. known as as the known the state state vector vector at Uk is input (control) vector. The The general general solution solution of of (6.8) (6.8) is is given given by by
kJ Xk
= Akxo
+ LAkJj BUj
(6.9)
j=O
k
~Axo+[B.AB •...• A
UkJ ] Uk2
kJ
~o
B]
(6.10)
[
for kk > 1. We the question: question: Given Given XQ 0, does does there exist an sequence for ~ 1. We might might now now ask ask the Xo = 0, there exist an input input sequence k {uj x^ va in [Uj }}y~Q jj^ such such that takes an an arbitrary arbitrary value W ? In In linear linear system system theory, is aa question {u j 1 ~:b that Xk Xk takes value in 1R"? theory, this this is question of of reacbability. reachability. Since Since m ~ > I, 1, from from the the fundamental fundamental Existence Existence Theorem, Theorem, Theorem 6.2, we see that (6.8) is reachable if and only if if R([ B, AB, ... , A n  J B]) = 1R"
or, equivalently, if if or, equivalently, if and and only only if rank [B, AB, ... , A n  J B]
= n.
A related related question question is is the the following: following: Given Given an an arbitrary arbitrary initial initial vector vector XQ, does there there exexA Xo, does j such ist an an input input sequence sequence {u {"y}"~o such that that xXnn = = O? 0? In linear linear system system theory, theory, this this is is called called controllability. if controllability. Again from Theorem Theorem 6.2, we see that (6.8) is controllable if and only if
l'/:b
Clearly, reachability always implies controllability and, if A A is nonsingular, control1 lability and and reachability are equivalent. equivalent. The The matrices = [~ [ ° ~] andB5 == [~] f ^ 1provide providean an A = lability reachability are matrices A Q1and example example of of aa system system that that is is controllable controllable but but not not reachable. reachable. The standard conditions conditions with analogues for continuoustime models The above are standard with analogues for continuoustime models (i.e., (i.e., linear linear differential differential equations). equations). There There are are many many other other algebraically algebraically equivalent equivalent conditions. conditions.
Example We now now introduce Example 6.10. 6.10. We introduce an an output output vector vector Yk yk to to the the system system (6.8) (6.8) of of Example Example 6.9 6.9 by the equation by appending appending the equation (6.11) pxn E IR Pxn e R
pxm E IR Pxm €R
with C and (p pose some the with and D (p ~ > 1). 1). We We can can then then pose some new new questions questions about about the overall system that are are dual to reachability reachability and and controllability. overall system that dual in in the the systemtheoretic systemtheoretic sense sense to controllability. The The answers answers are are cast cast in in terms terms that that are are dual dual in in the the linear linear algebra algebra sense sense as as well. well. The The condition condition dual reachability is knowledge of l';:b dual to to reachability is called called observability: observability: When When does does knowledge of {u {"7j r/:b }"!Q and and {Yj {y_/}"~o suffice to determine xo? As aa dual we have have the of suffice to determine (uniquely) (uniquely) Jt dual to to controllability, controllability, we the notion notion of 0? As reconstructibility: When does knowledge of r/:b and and {;y/}"Io {YJ lj:b suffice to determine reconstructibility: When does knowledge of {u {wjy }"~Q suffice to determine result from theory is the following: following: (uniquely) xxn? The fundamental fundamental duality duality result from linear linear system system theory is the (uniquely) nl The
(A. [controllablcl if (AT,T. B TT)] is observable observable [reconsrrucrible] (A, B) B) iJ is reachable [controllable] if and and only if if(A [reconstructive].
6.4 Inverses 6.4 Some Some Useful Useful and and Interesting Interesting Inverses
47
To To derive derive aa condition condition for for observability, observability, notice notice that that
kl
Yk = CAkxo
+L
CAk1j BUj
+ DUk.
(6.12)
j=O
Thus,
$$\begin{bmatrix} y_0 - Du_0 \\ y_1 - CBu_0 - Du_1 \\ \vdots \\ y_{n-1} - \sum_{j=0}^{n-2} CA^{n-2-j}Bu_j - Du_{n-1} \end{bmatrix} = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix} x_0. \tag{6.13}$$

Let $v$ denote the (known) vector on the left-hand side of (6.13) and let $R$ denote the matrix on the right-hand side. Then, by definition, $v \in \mathcal{R}(R)$, so a solution exists. By the fundamental Uniqueness Theorem, Theorem 6.6, the solution is then unique if and only if $\mathcal{N}(R) = 0$, or, equivalently, if and only if $\operatorname{rank}(R) = n$.
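As a small numerical sketch (with hypothetical $2\times 2$ data, not from the text), one can stack $C$ and $CA$ and check that the resulting $R$ has full column rank, so that $\mathcal{N}(R) = 0$ and $x_0$ is uniquely determined:

```python
# Observability sketch: R = [C; CA] for n = 2, single output.
# (A and C are hypothetical data chosen for illustration.)
A = [[1, 1], [0, 1]]
C = [1, 0]

CA = [sum(C[k] * A[k][j] for k in range(2)) for j in range(2)]
R = [C, CA]                      # stacked observability matrix
det_R = R[0][0] * R[1][1] - R[0][1] * R[1][0]
print("det R =", det_R)          # nonzero => N(R) = 0 => x_0 unique
```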
6.3 A More General Matrix Linear Equation
Theorem 6.11. Let $A \in \mathbb{R}^{m\times n}$, $B \in \mathbb{R}^{m\times q}$, and $C \in \mathbb{R}^{p\times q}$. Then the equation
$$AXC = B \tag{6.14}$$
has a solution if and only if $AA^+BC^+C = B$, in which case the general solution is of the form
$$X = A^+BC^+ + Y - A^+AYCC^+, \tag{6.15}$$
where $Y \in \mathbb{R}^{n\times p}$ is arbitrary.
A compact matrix criterion for uniqueness of solutions to (6.14) requires the notion of the Kronecker product of matrices for its statement.

Example 7.12. Let $V = \mathbb{R}^n$. Then $\langle x, y\rangle = x^TQy$, where $Q = Q^T > 0$ is an arbitrary $n \times n$ positive definite matrix, defines a "weighted" inner product.

Definition 7.13. If $A \in \mathbb{R}^{m\times n}$, then $A^T \in \mathbb{R}^{n\times m}$ is the unique linear transformation or map such that $\langle x, Ay\rangle = \langle A^Tx, y\rangle$ for all $x \in \mathbb{R}^m$ and for all $y \in \mathbb{R}^n$.
It is easy to check that, with this more "abstract" definition of transpose, if the $(i, j)$th element of $A$ is $a_{ij}$, then the $(i, j)$th element of $A^T$ is $a_{ji}$. It can also be checked that all the usual properties of the transpose hold, such as $(AB)^T = B^TA^T$. However, the definition above allows us to extend the concept of transpose to the case of weighted inner products in the following way. Suppose $A \in \mathbb{R}^{m\times n}$ and let $\langle\cdot,\cdot\rangle_Q$ and $\langle\cdot,\cdot\rangle_R$, with $Q$ and $R$ positive definite, be weighted inner products on $\mathbb{R}^m$ and $\mathbb{R}^n$, respectively. Then we can define the "weighted transpose" $A^\#$ as the unique map that satisfies
$$\langle x, Ay\rangle_Q = \langle A^\#x, y\rangle_R \quad \text{for all } x \in \mathbb{R}^m \text{ and for all } y \in \mathbb{R}^n.$$
By Example 7.12 above, we must then have $x^TQAy = x^T(A^\#)^TRy$ for all $x, y$. Hence we must have $QA = (A^\#)^TR$. Taking transposes (of the usual variety) gives $A^TQ = RA^\#$. Since $R$ is nonsingular, we find
$$A^\# = R^{-1}A^TQ.$$
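A numerical sketch of the weighted transpose (with arbitrarily chosen diagonal positive definite $Q$ and $R$): the defining identity $\langle x, Ay\rangle_Q = \langle A^\#x, y\rangle_R$ should hold for every $x$ and $y$.

```python
# Verify <x, Ay>_Q = <A# x, y>_R with A# = R^{-1} A^T Q.
# Q and R are diagonal positive definite, chosen arbitrarily for the sketch.
A = [[1, 2], [3, 4]]
q = [2, 3]   # Q = diag(2, 3)
r = [4, 5]   # R = diag(4, 5)

# A# = R^{-1} A^T Q: entry (i, j) = (1/r_i) * A[j][i] * q_j
Ash = [[A[j][i] * q[j] / r[i] for j in range(2)] for i in range(2)]

def ip(w, u, v):
    # weighted inner product <u, v>_W = u^T W v with W = diag(w)
    return sum(w[i] * u[i] * v[i] for i in range(2))

x, y = [1.0, -2.0], [3.0, 0.5]
Ay = [sum(A[i][j] * y[j] for j in range(2)) for i in range(2)]
Ashx = [sum(Ash[i][j] * x[j] for j in range(2)) for i in range(2)]
print(ip(q, x, Ay), ip(r, Ashx, y))  # the two values agree
```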
We can also generalize the notion of orthogonality ($x^Ty = 0$) to Q-orthogonality ($Q$ is a positive definite matrix). Two vectors $x, y \in \mathbb{R}^n$ are Q-orthogonal if $\langle x, y\rangle_Q = x^TQy = 0$.

Definition 7.14. Let $V$ be a vector space over $\mathbb{C}$. Then $\langle\cdot,\cdot\rangle : V \times V \to \mathbb{C}$ is a complex inner product if

1. $\langle x, x\rangle \ge 0$ for all $x \in V$ and $\langle x, x\rangle = 0$ if and only if $x = 0$.

2. $\langle x, y\rangle = \overline{\langle y, x\rangle}$ for all $x, y \in V$.

3. $\langle x, \alpha y_1 + \beta y_2\rangle = \alpha\langle x, y_1\rangle + \beta\langle x, y_2\rangle$ for all $x, y_1, y_2 \in V$ and for all $\alpha, \beta \in \mathbb{C}$.

Remark 7.15. We could use the notation $\langle\cdot,\cdot\rangle_{\mathbb{C}}$ to denote a complex inner product, but if the vectors involved are complex-valued, the complex inner product is to be understood. Note, too, from part 2 of the definition, that $\langle x, x\rangle$ must be real for all $x$.

Remark 7.16. Note from parts 2 and 3 of Definition 7.14 that we have
$$\langle \alpha x_1 + \beta x_2, y\rangle = \bar{\alpha}\langle x_1, y\rangle + \bar{\beta}\langle x_2, y\rangle.$$

Remark 7.17. The Euclidean inner product of $x, y \in \mathbb{C}^n$ is given by
$$\langle x, y\rangle = \sum_{i=1}^n \bar{x}_iy_i = x^Hy.$$
The conventional definition of the complex Euclidean inner product is $\langle x, y\rangle = y^Hx$, but we use its complex conjugate $x^Hy$ here for symmetry with the real case.
Remark 7.18. A weighted inner product can be defined as in the real case by $\langle x, y\rangle_Q = x^HQy$, for arbitrary $Q = Q^H > 0$. The notion of Q-orthogonality can be similarly generalized to the complex case.
Chapter 7. Projections, Inner Product Spaces, and Norms
Definition 7.19. A vector space $(V, \mathbb{F})$ endowed with a specific inner product is called an inner product space. If $\mathbb{F} = \mathbb{C}$, we call $V$ a complex inner product space. If $\mathbb{F} = \mathbb{R}$, we call $V$ a real inner product space.

Example 7.20.

1. Check that $V = \mathbb{R}^{n\times n}$ with the inner product $\langle A, B\rangle = \operatorname{Tr} A^TB$ is a real inner product space. Note that other choices are possible since by properties of the trace function, $\operatorname{Tr} A^TB = \operatorname{Tr} B^TA = \operatorname{Tr} AB^T = \operatorname{Tr} BA^T$.

2. Check that $V = \mathbb{C}^{n\times n}$ with the inner product $\langle A, B\rangle = \operatorname{Tr} A^HB$ is a complex inner product space. Again, other choices are possible.
Definition 7.21. Let $V$ be an inner product space. For $v \in V$, we define the norm (or length) of $v$ by $\|v\| = \sqrt{\langle v, v\rangle}$. This is called the norm induced by $\langle\cdot,\cdot\rangle$.

Example 7.22.
1. If $V = \mathbb{R}^n$ with the usual inner product, the induced norm is given by $\|v\| = \left(\sum_{i=1}^n v_i^2\right)^{1/2}$.

Since $I - P$ is also a projection, the above result applies and we get
$$0 \le \langle (I - P)v, v\rangle = \langle v, v\rangle - \langle Pv, v\rangle = \|v\|^2 - \|Pv\|^2,$$
from which the theorem follows. $\Box$
Definition 7.24. The norm induced on an inner product space by the "usual" inner product is called the natural norm.

In case $V = \mathbb{C}^n$ or $V = \mathbb{R}^n$, the natural norm is also called the Euclidean norm. In the next section, other norms on these vector spaces are defined. A converse to the above procedure is also available. That is, given a norm defined by $\|x\| = \sqrt{\langle x, x\rangle}$, an inner product can be defined via the following.
Theorem 7.25 (Polarization Identity).

1. For $x, y \in \mathbb{R}^n$, an inner product is defined by
$$\langle x, y\rangle = x^Ty = \frac{\|x + y\|^2 - \|x - y\|^2}{4} = \frac{\|x + y\|^2 - \|x\|^2 - \|y\|^2}{2}.$$

2. For $x, y \in \mathbb{C}^n$, an inner product is defined by
$$\langle x, y\rangle = x^Hy = \frac{1}{4}\left(\|x + y\|^2 - \|x - y\|^2 - j\|x + jy\|^2 + j\|x - jy\|^2\right),$$
where $j = i = \sqrt{-1}$.
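The real form of the identity is easy to confirm numerically (a sketch):

```python
import math

# Real polarization identity: x^T y recovered from norms alone.
def norm(v):
    return math.sqrt(sum(t * t for t in v))

x = [1.0, 2.0, -3.0]
y = [4.0, 0.0, 1.5]

dot = sum(a * b for a, b in zip(x, y))
add = [a + b for a, b in zip(x, y)]
sub = [a - b for a, b in zip(x, y)]

via_4 = (norm(add) ** 2 - norm(sub) ** 2) / 4
via_2 = (norm(add) ** 2 - norm(x) ** 2 - norm(y) ** 2) / 2
print(dot, via_4, via_2)  # all three values agree
```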
7.3 Vector Norms
Definition 7.26. Let $(V, \mathbb{F})$ be a vector space. Then $\|\cdot\| : V \to \mathbb{R}$ is a vector norm if it satisfies the following three properties:

1. $\|x\| \ge 0$ for all $x \in V$ and $\|x\| = 0$ if and only if $x = 0$.

2. $\|\alpha x\| = |\alpha|\,\|x\|$ for all $x \in V$ and for all $\alpha \in \mathbb{F}$.

3. $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in V$. (This is called the triangle inequality, as seen readily from the usual diagram illustrating the sum of two vectors in $\mathbb{R}^2$.)

Remark 7.27. It is convenient in the remainder of this section to state results for complex-valued vectors. The specialization to the real case is obvious.

Definition 7.28. A vector space $(V, \mathbb{F})$ is said to be a normed linear space if and only if there exists a vector norm $\|\cdot\| : V \to \mathbb{R}$ satisfying the three conditions of Definition 7.26.
Example 7.29.
1. For $x \in \mathbb{C}^n$, the Hölder norms, or p-norms, are defined by
$$\|x\|_p = \left(\sum_{i=1}^n |x_i|^p\right)^{1/p}, \quad 1 \le p < \infty.$$
Special cases:

(a) $\|x\|_1 = \sum_{i=1}^n |x_i|$ (the "Manhattan" norm).

(b) $\|x\|_2 = \left(\sum_{i=1}^n |x_i|^2\right)^{1/2} = (x^Hx)^{1/2}$ (the Euclidean norm).

(c) $\|x\|_\infty = \max_{i\in\underline{n}} |x_i| = \lim_{p\to+\infty} \|x\|_p$.

(The second equality in (c) is a theorem that requires proof.)
2. Some weighted p-norms:

(a) $\|x\|_{1,D} = \sum_{i=1}^n d_i|x_i|$, where $d_i > 0$.

(b) $\|x\|_{2,Q} = (x^HQx)^{1/2}$, where $Q = Q^H > 0$ (this norm is more commonly denoted $\|\cdot\|_Q$).

3. On the vector space $(C[t_0, t_1], \mathbb{R})$, define the vector norm
$$\|f\| = \max_{t_0 \le t \le t_1} |f(t)|.$$
On the vector space $((C[t_0, t_1])^n, \mathbb{R})$, define the vector norm
$$\|f\|_\infty = \max_{t_0 \le t \le t_1} \|f(t)\|_\infty.$$

Theorem 7.30 (Hölder Inequality). Let $x, y \in \mathbb{C}^n$. Then
$$|x^Hy| \le \|x\|_p\|y\|_q, \quad \text{where } \frac{1}{p} + \frac{1}{q} = 1.$$

A particular case of the Hölder inequality is of special interest.
Theorem 7.31 (Cauchy-Bunyakovsky-Schwarz Inequality). Let $x, y \in \mathbb{C}^n$. Then
$$|x^Hy| \le \|x\|_2\|y\|_2,$$
with equality if and only if $x$ and $y$ are linearly dependent.

Proof: Consider the matrix $[x\ y] \in \mathbb{C}^{n\times 2}$. Since
$$[x\ y]^H[x\ y] = \begin{bmatrix} x^Hx & x^Hy \\ y^Hx & y^Hy \end{bmatrix}$$
is a nonnegative definite matrix, its determinant must be nonnegative. In other words, $0 \le (x^Hx)(y^Hy) - (x^Hy)(y^Hx)$. Since $y^Hx = \overline{x^Hy}$, we see immediately that $|x^Hy| \le \|x\|_2\|y\|_2$. $\Box$

Note: This is not the classical algebraic proof of the Cauchy-Bunyakovsky-Schwarz (CBS) inequality (see, e.g., [20, p. 217]). However, it is particularly easy to remember.

Remark 7.32. The angle $\theta$ between two nonzero vectors $x, y \in \mathbb{C}^n$ may be defined by $\cos\theta = \frac{|x^Hy|}{\|x\|_2\|y\|_2}$, $0 \le \theta \le \frac{\pi}{2}$. The CBS inequality is thus equivalent to the statement $|\cos\theta| \le 1$.

Remark 7.33. Theorem 7.31 and Remark 7.32 are true for general inner product spaces.

Remark 7.34. The norm $\|\cdot\|_2$ is unitarily invariant, i.e., if $U \in \mathbb{C}^{n\times n}$ is unitary, then $\|Ux\|_2 = \|x\|_2$ (Proof: $\|Ux\|_2^2 = x^HU^HUx = x^Hx = \|x\|_2^2$). However, $\|\cdot\|_1$ and $\|\cdot\|_\infty$
are not unitarily invariant. Similar remarks apply to the unitary invariance of norms of real vectors under orthogonal transformation.

Remark 7.35. If $x, y \in \mathbb{C}^n$ are orthogonal, then we have the Pythagorean Identity
$$\|x \pm y\|_2^2 = \|x\|_2^2 + \|y\|_2^2,$$
the proof of which follows easily from $\|z\|_2^2 = z^Hz$.
Theorem 7.36. All norms on $\mathbb{C}^n$ are equivalent; i.e., there exist constants $c_1, c_2$ (possibly depending on $n$) such that
$$c_1\|x\|_\alpha \le \|x\|_\beta \le c_2\|x\|_\alpha \quad \text{for all } x \in \mathbb{C}^n.$$
Example 7.37. For $x \in \mathbb{C}^n$, the following inequalities are all tight bounds; i.e., there exist vectors $x$ for which equality holds:
$$\|x\|_1 \le \sqrt{n}\,\|x\|_2, \qquad \|x\|_1 \le n\,\|x\|_\infty,$$
$$\|x\|_2 \le \|x\|_1, \qquad \|x\|_2 \le \sqrt{n}\,\|x\|_\infty,$$
$$\|x\|_\infty \le \|x\|_1, \qquad \|x\|_\infty \le \|x\|_2.$$
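A numerical sketch of these bounds in plain Python: the all-ones vector attains $\|x\|_1 = n\|x\|_\infty$ and $\|x\|_1 = \sqrt{n}\,\|x\|_2$, while a unit coordinate vector attains the remaining equalities.

```python
import math

def norm1(x):   return sum(abs(t) for t in x)
def norm2(x):   return math.sqrt(sum(t * t for t in x))
def norminf(x): return max(abs(t) for t in x)

n = 4
ones = [1.0] * n          # attains ||x||_1 = n ||x||_inf = sqrt(n) ||x||_2
e1 = [1.0] + [0.0] * 3    # attains ||x||_2 = ||x||_1 and ||x||_inf = ||x||_2

for x in (ones, e1, [3.0, -1.0, 2.0, 0.5]):
    assert norm1(x) <= math.sqrt(len(x)) * norm2(x) + 1e-12
    assert norm1(x) <= len(x) * norminf(x) + 1e-12
    assert norm2(x) <= norm1(x) + 1e-12
    assert norm2(x) <= math.sqrt(len(x)) * norminf(x) + 1e-12
    assert norminf(x) <= norm2(x) + 1e-12

print(norm1(ones), n * norminf(ones))  # the first bound is tight here
```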
Finally, we conclude this section with a theorem about convergence of vectors. Convergence of a sequence of vectors to some limit vector can be converted into a statement about convergence of real numbers, i.e., convergence in terms of vector norms.
Theorem 7.38. Let $\|\cdot\|$ be a vector norm and suppose $v, v^{(1)}, v^{(2)}, \ldots \in \mathbb{C}^n$. Then
$$\lim_{k\to+\infty} v^{(k)} = v \quad \text{if and only if} \quad \lim_{k\to+\infty} \|v^{(k)} - v\| = 0.$$

7.4 Matrix Norms
In this section we introduce the concept of matrix norm. As with vectors, the motivation for using matrix norms is to have a notion of either the size of or the nearness of matrices. The former notion is useful for perturbation analysis, while the latter is needed to make sense of "convergence" of matrices. Attention is confined to the vector space $(\mathbb{R}^{m\times n}, \mathbb{R})$ since that is what arises in the majority of applications. Extension to the complex case is straightforward and essentially obvious.

Definition 7.39. $\|\cdot\| : \mathbb{R}^{m\times n} \to \mathbb{R}$ is a matrix norm if it satisfies the following three properties:
1. $\|A\| \ge 0$ for all $A \in \mathbb{R}^{m\times n}$ and $\|A\| = 0$ if and only if $A = 0$.

2. $\|\alpha A\| = |\alpha|\,\|A\|$ for all $A \in \mathbb{R}^{m\times n}$ and for all $\alpha \in \mathbb{R}$.

3. $\|A + B\| \le \|A\| + \|B\|$ for all $A, B \in \mathbb{R}^{m\times n}$. (As with vectors, this is called the triangle inequality.)
Example 7.40. Let $A \in \mathbb{R}^{m\times n}$. Then the Frobenius norm (or matrix Euclidean norm) is defined by
$$\|A\|_F = \left(\sum_{i=1}^m\sum_{j=1}^n a_{ij}^2\right)^{1/2} = \left(\sum_{i=1}^r \sigma_i^2(A)\right)^{1/2} = \left(\operatorname{Tr}(A^TA)\right)^{1/2} = \left(\operatorname{Tr}(AA^T)\right)^{1/2}$$
(where $r = \operatorname{rank}(A)$).

Example 7.41. Let $A \in \mathbb{R}^{m\times n}$. Then the matrix p-norms are defined by
$$\|A\|_p = \max_{x\ne 0} \frac{\|Ax\|_p}{\|x\|_p} = \max_{\|x\|_p = 1} \|Ax\|_p.$$

The following three special cases are important because they are "computable." Each is a theorem and requires a proof.

1. The "maximum column sum" norm is
$$\|A\|_1 = \max_{j\in\underline{n}} \left(\sum_{i=1}^m |a_{ij}|\right).$$

2. The "maximum row sum" norm is
$$\|A\|_\infty = \max_{i\in\underline{m}} \left(\sum_{j=1}^n |a_{ij}|\right).$$

3. The spectral norm is
$$\|A\|_2 = \lambda_{\max}^{1/2}(A^TA) = \lambda_{\max}^{1/2}(AA^T) = \sigma_1(A).$$

Note: $\|A^+\|_2 = 1/\sigma_r(A)$, where $r = \operatorname{rank}(A)$.
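The first two of these norms translate directly into code (a sketch; the spectral norm needs an SVD or eigenvalue solver and is omitted):

```python
# Induced 1-norm (max column sum) and infinity-norm (max row sum).
def norm_1(A):
    m, n = len(A), len(A[0])
    return max(sum(abs(A[i][j]) for i in range(m)) for j in range(n))

def norm_inf(A):
    return max(sum(abs(a) for a in row) for row in A)

A = [[1.0, -2.0],
     [3.0, 4.0]]
print(norm_1(A), norm_inf(A))  # 6.0 7.0  (column sums 4, 6; row sums 3, 7)
```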
Example 7.42. Let $A \in \mathbb{R}^{m\times n}$. Then the Schatten p-norms are defined by
$$\|A\|_{S,p} = \left(\sigma_1^p + \cdots + \sigma_r^p\right)^{1/p}.$$
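The Frobenius norm of Example 7.40 is easy to compute both entrywise and via the trace characterization (a sketch in plain Python):

```python
import math

# ||A||_F two ways: entrywise sum of squares vs Tr(A^T A).
A = [[1.0, 2.0, 0.0],
     [3.0, -1.0, 4.0]]          # a 2 x 3 example

fro_entries = math.sqrt(sum(a * a for row in A for a in row))

# Tr(A^T A): sum of diagonal entries of the 3 x 3 matrix A^T A.
m, n = 2, 3
tr = sum(sum(A[k][j] * A[k][j] for k in range(m)) for j in range(n))
fro_trace = math.sqrt(tr)

print(fro_entries, fro_trace)   # identical values
```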
Some special cases of Schatten p-norms are equal to norms defined previously. For example, $\|\cdot\|_{S,2} = \|\cdot\|_F$ and $\|\cdot\|_{S,\infty} = \|\cdot\|_2$. The norm $\|\cdot\|_{S,1}$ is often called the trace norm.

Example 7.43. Let $A \in \mathbb{R}^{m\times n}$. Then "mixed" norms can also be defined by
$$\|A\|_{p,q} = \max_{x\ne 0} \frac{\|Ax\|_p}{\|x\|_q}.$$

Let $A \in \mathbb{C}^{n\times n}$ with eigenvalues $\{\lambda_1, \ldots, \lambda_n\}$. The spectral radius of $A$ is the scalar
$$\rho(A) = \max_i |\lambda_i|.$$
Exercises
8. Let
A=[~14
0 12
~].
5
Determine $\|A\|_F$, $\|A\|_1$, $\|A\|_2$, $\|A\|_\infty$, and $\rho(A)$.

9. Let
$$A = \begin{bmatrix} 8 & 1 & 6 \\ 3 & 5 & 7 \\ 4 & 9 & 2 \end{bmatrix}.$$
Determine $\|A\|_F$, $\|A\|_1$, $\|A\|_2$, $\|A\|_\infty$, and $\rho(A)$. (An $n \times n$ matrix, all of whose columns and rows as well as main diagonal and antidiagonal sum to $s = n(n^2+1)/2$, is called a "magic square" matrix. If $M$ is a magic square matrix, it can be proved that $\|M\|_p = s$ for all $p$.)

10. Let $A = xy^T$, where both $x, y \in \mathbb{R}^n$ are nonzero. Determine $\|A\|_F$, $\|A\|_1$, $\|A\|_2$, and $\|A\|_\infty$ in terms of $\|x\|_\alpha$ and/or $\|y\|_\beta$, where $\alpha$ and $\beta$ take the value 1, 2, or $\infty$ as appropriate.
Chapter 8

Linear Least Squares Problems

8.1 The Linear Least Squares Problem
Problem: Suppose $A \in \mathbb{R}^{m\times n}$ with $m \ge n$ and $b \in \mathbb{R}^m$. The minimum least squares residual is 0 if and only if $b$ is orthogonal to all vectors in $\mathcal{U}_2$.

If $A \in \mathbb{R}^{n\times n}$, then there is an easily checked relationship between the left and right eigenvectors of $A$ and $A^T$ (take Hermitian transposes of both sides of (9.2)). Specifically, if $y$ is a left eigenvector of $A$ corresponding to $\lambda \in \Lambda(A)$, then $\bar{y}$ is a right eigenvector of $A^T$ corresponding to $\bar{\lambda} \in \Lambda(A)$. Note, too, that by elementary properties of the determinant, we always have $\Lambda(A) = \Lambda(A^T)$, but that $\Lambda(A) = \overline{\Lambda(A)}$ only if $A \in \mathbb{R}^{n\times n}$.
Definition 9.7. If $\lambda$ is a root of multiplicity $m$ of $\pi(\lambda)$, we say that $\lambda$ is an eigenvalue of $A$ of algebraic multiplicity $m$. The geometric multiplicity of $\lambda$ is the number of associated independent eigenvectors $= n - \operatorname{rank}(A - \lambda I) = \dim\mathcal{N}(A - \lambda I)$.

If $\lambda \in \Lambda(A)$ has algebraic multiplicity $m$, then $1 \le \dim\mathcal{N}(A - \lambda I) \le m$. Thus, if we denote the geometric multiplicity of $\lambda$ by $g$, then we must have $1 \le g \le m$.

Definition 9.8. A matrix $A \in \mathbb{R}^{n\times n}$ is said to be defective if it has an eigenvalue whose geometric multiplicity is not equal to (i.e., less than) its algebraic multiplicity. Equivalently, $A$ is said to be defective if it does not have $n$ linearly independent (right or left) eigenvectors.
From the Cayley-Hamilton Theorem, we know that $\pi(A) = 0$. However, it is possible for $A$ to satisfy a lower-order polynomial. For example, if $A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, then $A$ satisfies $(\lambda - 1)^2 = 0$. But it also clearly satisfies the smaller degree polynomial equation
$$(\lambda - 1) = 0.$$
Definition 9.9. The minimal polynomial of $A \in \mathbb{R}^{n\times n}$ is the polynomial $\alpha(\lambda)$ of least degree such that $\alpha(A) = 0$.
It can be shown that $\alpha(\lambda)$ is essentially unique (unique if we force the coefficient of the highest power of $\lambda$ to be $+1$, say; such a polynomial is said to be monic and we generally write $\alpha(\lambda)$ as a monic polynomial throughout the text). Moreover, it can also be
shown that $\alpha(\lambda)$ divides every nonzero polynomial $\beta(\lambda)$ for which $\beta(A) = 0$. In particular, $\alpha(\lambda)$ divides $\pi(\lambda)$.

There is an algorithm to determine $\alpha(\lambda)$ directly (without knowing eigenvalues and associated eigenvector structure). Unfortunately, this algorithm, called the Bezout algorithm, is numerically unstable.

Example 9.10. The above definitions are illustrated below for a series of matrices, each of which has an eigenvalue 2 of algebraic multiplicity 4, i.e., $\pi(\lambda) = (\lambda - 2)^4$. We denote the geometric multiplicity by $g$.
$$A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{bmatrix} \text{ has } \alpha(\lambda) = (\lambda - 2)^4 \text{ and } g = 1.$$

$$A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix} \text{ has } \alpha(\lambda) = (\lambda - 2)^3 \text{ and } g = 2.$$

$$A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{bmatrix} \text{ has } \alpha(\lambda) = (\lambda - 2)^2 \text{ and } g = 2.$$

$$A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix} \text{ has } \alpha(\lambda) = (\lambda - 2)^2 \text{ and } g = 3.$$
It is then easy to show that the eigenvalues of $f(A)$ (defined as $\sum_{n=0}^{\infty} a_nA^n$) are $f(\lambda)$, but $f(A)$ does not necessarily have all the same eigenvectors (unless, say, $A$ is diagonalizable). For example, $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ has only one right eigenvector corresponding to the eigenvalue 0, but $A^2 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$ has two independent right eigenvectors associated with the eigenvalue 0. What is true is that the eigenvalue/eigenvector pair $(\lambda, x)$ maps to $(f(\lambda), x)$ but not conversely.
The following theorem is useful when solving systems of linear differential equations. Details of how the matrix exponential $e^{tA}$ is used to solve the system $\dot{x} = Ax$ are the subject of Chapter 11.

Theorem 9.20. Let $A \in \mathbb{R}^{n\times n}$ and suppose $X^{-1}AX = \Lambda$, where $\Lambda$ is diagonal. Then
$$e^{tA} = \sum_{i=1}^n e^{\lambda_it}x_iy_i^H.$$
Chapter 9. Eigenvalues and Eigenvectors
Proof: Starting from the definition, we have
$$e^{tA} = \sum_{k=0}^{\infty} \frac{(tA)^k}{k!} = \sum_{k=0}^{\infty} \frac{t^k(X\Lambda X^{-1})^k}{k!} = X\left(\sum_{k=0}^{\infty} \frac{t^k\Lambda^k}{k!}\right)X^{-1} = Xe^{t\Lambda}X^{-1} = \sum_{i=1}^n e^{\lambda_it}x_iy_i^H. \qquad \Box$$
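For a diagonalizable $2\times 2$ example (hypothetical data, with the eigenstructure worked out by hand in the comments), the dyadic expansion above agrees with a truncated power series for $e^{tA}$ (a sketch):

```python
import math

# A = [[1,1],[0,2]]: eigenvalues 1, 2 with right eigenvectors
# x1 = [1,0], x2 = [1,1]; the rows of X^{-1} give y1^T = [1,-1], y2^T = [0,1].
# Hence e^{tA} = e^t x1 y1^T + e^{2t} x2 y2^T:
t = 0.7
closed = [[math.exp(t), math.exp(2 * t) - math.exp(t)],
          [0.0, math.exp(2 * t)]]

# Compare with the series  e^{tA} = sum_k (tA)^k / k!
A = [[1.0, 1.0], [0.0, 2.0]]
S = [[1.0, 0.0], [0.0, 1.0]]     # running sum, starts at I
P = [[1.0, 0.0], [0.0, 1.0]]     # running term (tA)^k / k!
for k in range(1, 30):
    P = [[sum(P[i][l] * t * A[l][j] for l in range(2)) / k
          for j in range(2)] for i in range(2)]
    S = [[S[i][j] + P[i][j] for j in range(2)] for i in range(2)]

err = max(abs(S[i][j] - closed[i][j]) for i in range(2) for j in range(2))
print("max discrepancy:", err)   # essentially zero
```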
The following corollary is immediate from the theorem upon setting $t = 1$.

Corollary 9.21. If $A \in \mathbb{R}^{n\times n}$ is diagonalizable with eigenvalues $\lambda_i$, $i \in \underline{n}$, and right eigenvectors $x_i$, $i \in \underline{n}$, then $e^A$ has eigenvalues $e^{\lambda_i}$, $i \in \underline{n}$, and the same eigenvectors.
There are extensions to Theorem 9.20 and Corollary 9.21 for any function that is analytic on the spectrum of $A$, i.e., $f(A) = Xf(\Lambda)X^{-1} = X\operatorname{diag}(f(\lambda_1), \ldots, f(\lambda_n))X^{-1}$.

It is desirable, of course, to have a version of Theorem 9.20 and its corollary in which $A$ is not necessarily diagonalizable. It is necessary first to consider the notion of Jordan canonical form, from which such a result is then available and presented later in this chapter.
9.2 Jordan Canonical Form
Theorem 9.22.

1. Jordan Canonical Form (JCF): For all $A \in \mathbb{C}^{n\times n}$ with eigenvalues $\lambda_1, \ldots, \lambda_n \in \mathbb{C}$ (not necessarily distinct), there exists $X \in \mathbb{C}_n^{n\times n}$ such that
$$X^{-1}AX = J = \operatorname{diag}(J_1, \ldots, J_q), \tag{9.12}$$
where each of the Jordan block matrices $J_1, \ldots, J_q$ is of the form
$$J_i = \begin{bmatrix} \lambda_i & 1 & 0 & \cdots & 0 \\ 0 & \lambda_i & 1 & \ddots & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_i & 1 \\ 0 & \cdots & \cdots & 0 & \lambda_i \end{bmatrix} \in \mathbb{C}^{k_i\times k_i} \tag{9.13}$$
and $\sum_{i=1}^q k_i = n$.

2. Real Jordan Canonical Form: For all $A \in \mathbb{R}^{n\times n}$ with eigenvalues $\lambda_1, \ldots, \lambda_n$ (not necessarily distinct), there exists $X \in \mathbb{R}_n^{n\times n}$ such that
$$X^{-1}AX = J = \operatorname{diag}(J_1, \ldots, J_q), \tag{9.14}$$
where each of the Jordan block matrices $J_1, \ldots, J_q$ is of the form
$$J_i = \begin{bmatrix} \lambda_i & 1 & & \\ & \lambda_i & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_i \end{bmatrix}$$
in the case of real eigenvalues $\lambda_i \in \Lambda(A)$, and
$$J_i = \begin{bmatrix} M_i & I_2 & & \\ & M_i & \ddots & \\ & & \ddots & I_2 \\ & & & M_i \end{bmatrix}, \quad \text{where } M_i = \begin{bmatrix} \alpha_i & \beta_i \\ -\beta_i & \alpha_i \end{bmatrix} \text{ and } I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},$$
in the case of complex conjugate eigenvalues $\alpha_i \pm j\beta_i \in \Lambda(A)$.

Proof: For the proof see, for example, [21, pp. 120-124]. $\Box$
Transformations like $T = \begin{bmatrix} 1 & -j \\ 1 & j \end{bmatrix}$ allow us to go back and forth between a real JCF and its complex counterpart:
$$T^{-1}\begin{bmatrix} \alpha + j\beta & 0 \\ 0 & \alpha - j\beta \end{bmatrix}T = \begin{bmatrix} \alpha & \beta \\ -\beta & \alpha \end{bmatrix} = M.$$
For nontrivial Jordan blocks, the situation is only a bit more complicated. With
$$T = \begin{bmatrix} 1 & -j & 0 & 0 \\ 0 & 0 & 1 & -j \\ 1 & j & 0 & 0 \\ 0 & 0 & 1 & j \end{bmatrix},$$
it is easily checked that
$$T^{-1}\begin{bmatrix} \alpha + j\beta & 1 & 0 & 0 \\ 0 & \alpha + j\beta & 0 & 0 \\ 0 & 0 & \alpha - j\beta & 1 \\ 0 & 0 & 0 & \alpha - j\beta \end{bmatrix}T = \begin{bmatrix} M & I_2 \\ 0 & M \end{bmatrix}.$$
Definition 9.23. The characteristic polynomials of the Jordan blocks defined in Theorem 9.22 are called the elementary divisors or invariant factors of $A$.

Theorem 9.24. The characteristic polynomial of a matrix is the product of its elementary divisors. The minimal polynomial of a matrix is the product of the elementary divisors of highest degree corresponding to distinct eigenvalues.
Theorem 9.25. Let $A \in \mathbb{C}^{n\times n}$ with eigenvalues $\lambda_1, \ldots, \lambda_n$. Then

1. $\det(A) = \prod_{i=1}^n \lambda_i$.

2. $\operatorname{Tr}(A) = \sum_{i=1}^n \lambda_i$.

Proof:

1. From Theorem 9.22 we have that $A = XJX^{-1}$. Thus, $\det(A) = \det(XJX^{-1}) = \det(J) = \prod_{i=1}^n \lambda_i$.

2. Again, from Theorem 9.22 we have that $A = XJX^{-1}$. Thus, $\operatorname{Tr}(A) = \operatorname{Tr}(XJX^{-1}) = \operatorname{Tr}(JX^{-1}X) = \operatorname{Tr}(J) = \sum_{i=1}^n \lambda_i$. $\Box$
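For a $2\times 2$ matrix the eigenvalues are the roots of $\lambda^2 - \operatorname{Tr}(A)\lambda + \det(A)$, so the theorem can be verified directly (a sketch with hypothetical data):

```python
import math

# Eigenvalues of a 2 x 2 matrix from its characteristic polynomial,
# then check Theorem 9.25: product = det(A), sum = Tr(A).
A = [[4.0, 1.0],
     [2.0, 3.0]]
tr = A[0][0] + A[1][1]                        # 7
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]   # 10
disc = math.sqrt(tr * tr - 4 * det)
lam1, lam2 = (tr + disc) / 2, (tr - disc) / 2

print(lam1 * lam2, det)    # product of eigenvalues equals det(A)
print(lam1 + lam2, tr)     # sum of eigenvalues equals Tr(A)
```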
Example 9.26. Suppose $A \in \mathbb{R}^{7\times 7}$ is known to have $\pi(\lambda) = (\lambda - 1)^4(\lambda - 2)^3$ and $\alpha(\lambda) = (\lambda - 1)^2(\lambda - 2)^2$. Then $A$ has two possible JCFs (not counting reorderings of the diagonal blocks):

$$J^{(1)} = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 2 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 2 \end{bmatrix} \quad \text{and} \quad J^{(2)} = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 2 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 2 \end{bmatrix}.$$

Note that $J^{(1)}$ has elementary divisors $(\lambda - 1)^2$, $(\lambda - 1)$, $(\lambda - 1)$, $(\lambda - 2)^2$, and $(\lambda - 2)$, while $J^{(2)}$ has elementary divisors $(\lambda - 1)^2$, $(\lambda - 1)^2$, $(\lambda - 2)^2$, and $(\lambda - 2)$.
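The count of two possible JCFs can be reproduced mechanically (a sketch): for each eigenvalue, enumerate the partitions of its algebraic multiplicity whose largest part equals the exponent of that eigenvalue in $\alpha(\lambda)$.

```python
# Jordan structures for one eigenvalue = partitions of its algebraic
# multiplicity m whose largest part equals the exponent d in alpha.
def partitions(m, cap):
    if m == 0:
        yield []
        return
    for first in range(min(m, cap), 0, -1):
        for rest in partitions(m - first, first):
            yield [first] + rest

def structures(m, d):
    # largest block must be exactly d
    return [p for p in partitions(m, d) if p[0] == d]

# Example 9.26: pi = (l-1)^4 (l-2)^3, alpha = (l-1)^2 (l-2)^2
s1 = structures(4, 2)   # block sizes for eigenvalue 1
s2 = structures(3, 2)   # block sizes for eigenvalue 2
print(s1, s2, len(s1) * len(s2))  # [[2, 2], [2, 1, 1]] [[2, 1]] 2
```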
Example 9.27. Knowing $\pi(\lambda)$, $\alpha(\lambda)$, and $\operatorname{rank}(A - \lambda_iI)$ for distinct $\lambda_i$ is not sufficient to determine the JCF of $A$ uniquely. The matrices
$$A_1 = \begin{bmatrix} a & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & a & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & a & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & a & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & a & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & a & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & a \end{bmatrix}, \qquad A_2 = \begin{bmatrix} a & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & a & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & a & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & a & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & a & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & a & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & a \end{bmatrix}$$
both have $\pi(\lambda) = (\lambda - a)^7$, $\alpha(\lambda) = (\lambda - a)^3$, and $\operatorname{rank}(A - aI) = 4$, i.e., three eigenvectors.
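The same partition bookkeeping explains the ambiguity (a sketch): the blocks must partition 7, the largest block has size 3 (from $\alpha$), and there are exactly three blocks (from $g = 3$), which still leaves two candidates.

```python
# Candidate Jordan block-size lists for Example 9.27: partitions of 7
# with largest part exactly 3 (from alpha) and exactly 3 parts (g = 3).
def partitions(m, cap):
    if m == 0:
        yield []
        return
    for first in range(min(m, cap), 0, -1):
        for rest in partitions(m - first, first):
            yield [first] + rest

cands = [p for p in partitions(7, 3) if p[0] == 3 and len(p) == 3]
print(cands)   # [[3, 3, 1], [3, 2, 2]] : two distinct JCFs
```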
9.3 Determination of the JCF
The first critical item of information in determining the JCF of a matrix $A \in \mathbb{R}^{n\times n}$ is its number of eigenvectors. For each distinct eigenvalue $\lambda_i$, the associated number of linearly independent right (or left) eigenvectors is given by $\dim\mathcal{N}(A - \lambda_iI) = n - \operatorname{rank}(A - \lambda_iI)$. The straightforward case is, of course, when $\lambda_i$ is simple, i.e., of algebraic multiplicity 1; it then has precisely one eigenvector. The more interesting (and difficult) case occurs when $\lambda_i$ is of algebraic multiplicity greater than one. For example, suppose
\[
A = \begin{bmatrix} 3 & 2 & 1 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix}.
\]

Then

\[
A - 3I = \begin{bmatrix} 0 & 2 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]

has rank 1, so the eigenvalue 3 has two eigenvectors associated with it. If we let [ξ_1 ξ_2 ξ_3]^T denote a solution to the linear system (A − 3I)ξ = 0, we find that 2ξ_2 + ξ_3 = 0. Thus, both

\[
\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 0 \\ 1 \\ -2 \end{bmatrix}
\]

are eigenvectors (and are independent). To get a third vector x_3 such that X = [x_1 x_2 x_3] reduces A to JCF, we need the notion of principal vector.
Definition 9.28. Let A ∈ ℂ^{n×n} (or ℝ^{n×n}). Then x is a right principal vector of degree k associated with λ ∈ Λ(A) if and only if (A − λI)^k x = 0 and (A − λI)^{k−1} x ≠ 0.
Remark 9.29.

1. An analogous definition holds for a left principal vector of degree k.
Chapter 9. Eigenvalues and Eigenvectors
2. The phrase "of grade k" is often used synonymously with "of degree k."

3. Principal vectors are sometimes also called generalized eigenvectors, but the latter term will be assigned a much different meaning in Chapter 12.

4. The case k = 1 corresponds to the "usual" eigenvector.
5. A right (or left) principal vector of degree k is associated with a Jordan block J_i of dimension k or larger.
9.3.1 Theoretical computation
To motivate the development of a procedure for determining principal vectors, consider a 2 × 2 Jordan block \(\begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix}\). Denote by x^{(1)} and x^{(2)} the two columns of a nonsingular matrix X ∈ ℝ^{2×2} that reduces a matrix A to this JCF. Then the equation AX = XJ can be written

\[
A \begin{bmatrix} x^{(1)} & x^{(2)} \end{bmatrix} = \begin{bmatrix} x^{(1)} & x^{(2)} \end{bmatrix} \begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix}.
\]
The first column yields the equation Ax^{(1)} = λx^{(1)}, which simply says that x^{(1)} is a right eigenvector. The second column yields the following equation for x^{(2)}, the principal vector of degree 2:

\[
(A - \lambda I)x^{(2)} = x^{(1)}. \qquad (9.17)
\]

If we premultiply (9.17) by (A − λI), we find (A − λI)^2 x^{(2)} = (A − λI)x^{(1)} = 0. Thus, the definition of principal vector is satisfied.

This suggests a "general" procedure. First, determine all eigenvalues of A ∈ ℝ^{n×n} (or ℂ^{n×n}). Then for each distinct λ ∈ Λ(A) perform the following:
1. Solve (A − λI)x^{(1)} = 0.
This step finds all the eigenvectors (i.e., principal vectors of degree 1) associated with λ. The number of eigenvectors depends on the rank of A − λI. For example, if rank(A − λI) = n − 1, there is only one eigenvector. If the algebraic multiplicity of λ is greater than its geometric multiplicity, principal vectors still need to be computed from succeeding steps.

2. For each independent x^{(1)}, solve

(A − λI)x^{(2)} = x^{(1)}.

The number of linearly independent solutions at this step depends on the rank of (A − λI)^2. If, for example, this rank is n − 2, there are two linearly independent solutions to the homogeneous equation (A − λI)^2 x^{(2)} = 0. One of these solutions is, of course, x^{(1)} (≠ 0), since (A − λI)^2 x^{(1)} = (A − λI)0 = 0. The other solution is the desired principal vector of degree 2. (It may be necessary to take a linear combination of x^{(1)} vectors to get a right-hand side that is in R(A − λI). See, for example, Exercise 7.)
3. For each independent x^{(2)} from step 2, solve

(A − λI)x^{(3)} = x^{(2)}.
4. Continue in this way until the total number of independent eigenvectors and principal vectors is equal to the algebraic multiplicity of λ.

Unfortunately, this natural-looking procedure can fail to find all Jordan vectors. For more extensive treatments, see, for example, [20] and [21]. Determination of eigenvectors and principal vectors is obviously very tedious for anything beyond simple problems (n = 2 or 3, say). Attempts to do such calculations in finite-precision floating-point arithmetic generally prove unreliable. There are significant numerical difficulties inherent in attempting to compute a JCF, and the interested student is strongly urged to consult the classical and very readable [8] to learn why. Notice that high-quality mathematical software such as MATLAB does not offer a jcf command, although a jordan command is available in MATLAB's Symbolic Toolbox.

Theorem 9.30. Suppose A ∈ ℂ^{k×k} has an eigenvalue λ of algebraic multiplicity k and suppose further that rank(A − λI) = k − 1. Let X = [x^{(1)}, ..., x^{(k)}], where the chain of vectors x^{(i)} is constructed as above. Then X^{-1}AX is a single k × k Jordan block corresponding to λ.
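Like MATLAB's symbolic jordan command mentioned above, exact (symbolic) software sidesteps the floating-point difficulties. A Python sketch using SymPy's jordan_form, with a hypothetical matrix:

```python
from sympy import Matrix

# Hypothetical matrix: eigenvalue 3 with algebraic multiplicity 2 and
# geometric multiplicity 1, plus a simple eigenvalue 2.
A = Matrix([[3, 1, 0],
            [0, 3, 1],
            [0, 0, 2]])

# jordan_form returns (P, J) with A = P * J * P**(-1), computed exactly
P, J = A.jordan_form()
print(J)
```

Because the arithmetic is exact, the 2 × 2 Jordan block for the eigenvalue 3 is recovered reliably, something that cannot be guaranteed in floating point.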
Theorem 9.31. {x^{(1)}, ..., x^{(k)}} is a linearly independent set.

Theorem 9.32. Principal vectors associated with different Jordan blocks are linearly independent.

Example 9.33. Let
\[
A = \begin{bmatrix} 1 & 2 & -1 \\ 0 & 1 & 3 \\ 0 & 0 & 2 \end{bmatrix}.
\]

The eigenvalues of A are λ_1 = 1, λ_2 = 1, and λ_3 = 2. First, find the eigenvectors associated with the distinct eigenvalues 1 and 2.

(A − 2I)x_3^{(1)} = 0 yields

\[
x_3^{(1)} = \begin{bmatrix} 5 \\ 3 \\ 1 \end{bmatrix}.
\]

(A − 1I)x_1^{(1)} = 0 yields

\[
x_1^{(1)} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}.
\]

To find a principal vector of degree 2 associated with the multiple eigenvalue 1, solve (A − 1I)x_1^{(2)} = x_1^{(1)} to get

\[
x_1^{(2)} = \begin{bmatrix} 0 \\ \tfrac{1}{2} \\ 0 \end{bmatrix}.
\]

Now let

\[
X = \begin{bmatrix} x_1^{(1)} & x_1^{(2)} & x_3^{(1)} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 5 \\ 0 & \tfrac{1}{2} & 3 \\ 0 & 0 & 1 \end{bmatrix}.
\]

Then it is easy to check that

\[
X^{-1} = \begin{bmatrix} 1 & 0 & -5 \\ 0 & 2 & -6 \\ 0 & 0 & 1 \end{bmatrix}
\quad \text{and} \quad
X^{-1}AX = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}.
\]
9.3.2 On the +1's in JCF blocks
In this subsection we show that the nonzero superdiagonal elements of a JCF need not be 1's but can be arbitrary, so long as they are nonzero. For the sake of definiteness, we consider below the case of a single Jordan block, but the result clearly holds for any JCF. Suppose A ∈ ℝ^{n×n} and

\[
X^{-1}AX = J = \begin{bmatrix}
\lambda & 1 & & \\
 & \lambda & \ddots & \\
 & & \ddots & 1 \\
 & & & \lambda
\end{bmatrix}.
\]
Let D = diag(d_1, ..., d_n) be a nonsingular "scaling" matrix. Then

\[
D^{-1}(X^{-1}AX)D = D^{-1}JD = \begin{bmatrix}
\lambda & \frac{d_2}{d_1} & & \\
 & \lambda & \ddots & \\
 & & \ddots & \frac{d_n}{d_{n-1}} \\
 & & & \lambda
\end{bmatrix} \equiv \hat{J}.
\]
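The scaling identity above is easy to verify numerically; a sketch with hypothetical values of λ and the d_i:

```python
import numpy as np

lam = 4.0                                        # hypothetical eigenvalue
d = np.array([1.0, 2.0, 6.0])                    # hypothetical scaling values
J = lam * np.eye(3) + np.diag([1.0, 1.0], k=1)   # 3x3 Jordan block
D = np.diag(d)

Jhat = np.linalg.inv(D) @ J @ D

# Superdiagonal of D^{-1} J D is d_{i+1}/d_i
assert np.allclose(np.diag(Jhat, k=1), d[1:] / d[:-1])
print(np.diag(Jhat, k=1))  # [2. 3.]
```

Choosing the ratios d_{i+1}/d_i appropriately yields any desired nonzero superdiagonal entries.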
Appropriate choice of the d_i's then yields any desired nonzero superdiagonal elements. This result can also be interpreted in terms of the matrix X = [x_1, ..., x_n] of eigenvectors and principal vectors that reduces A to its JCF. Specifically, Ĵ is obtained from A via the similarity transformation XD = [d_1 x_1, ..., d_n x_n].

In a similar fashion, the reverse-order identity matrix (or exchange matrix)

\[
P = P^T = P^{-1} = \begin{bmatrix}
 & & & 1 \\
 & & 1 & \\
 & \iddots & & \\
1 & & &
\end{bmatrix} \qquad (9.18)
\]
can be used to put the superdiagonal elements in the subdiagonal instead if that is desired:

\[
P \begin{bmatrix}
\lambda & 1 & & \\
 & \lambda & \ddots & \\
 & & \ddots & 1 \\
 & & & \lambda
\end{bmatrix} P = \begin{bmatrix}
\lambda & & & \\
1 & \lambda & & \\
 & \ddots & \ddots & \\
 & & 1 & \lambda
\end{bmatrix}.
\]

9.4 Geometric Aspects of the JCF
The matrix X that reduces a matrix A ∈ ℝ^{n×n} (or ℂ^{n×n}) to a JCF provides a change of basis with respect to which the matrix is diagonal or block diagonal. It is thus natural to expect an associated direct sum decomposition of ℝ^n. Such a decomposition is given in the following theorem.

Theorem 9.34. Suppose A ∈ ℝ^{n×n} has characteristic polynomial
\[
\pi(\lambda) = (\lambda - \lambda_1)^{n_1} \cdots (\lambda - \lambda_m)^{n_m}
\]

and minimal polynomial

\[
\alpha(\lambda) = (\lambda - \lambda_1)^{\nu_1} \cdots (\lambda - \lambda_m)^{\nu_m}
\]

with λ_1, ..., λ_m distinct. Then

\[
\mathbb{R}^n = N(A - \lambda_1 I)^{n_1} \oplus \cdots \oplus N(A - \lambda_m I)^{n_m}
= N(A - \lambda_1 I)^{\nu_1} \oplus \cdots \oplus N(A - \lambda_m I)^{\nu_m}.
\]

Note that dim N(A − λ_i I)^{ν_i} = n_i.

Definition 9.35. Let V be a vector space over F and suppose A : V → V is a linear transformation. A subspace S ⊆ V is A-invariant if AS ⊆ S, where AS is defined as the set {As : s ∈ S}.
If V is taken to be ℝ^n over ℝ, and S ∈ ℝ^{n×k} is a matrix whose columns s_1, ..., s_k span a k-dimensional subspace S, i.e., R(S) = S, then S is A-invariant if and only if there exists M ∈ ℝ^{k×k} such that

\[
AS = SM. \qquad (9.19)
\]

This follows easily by comparing the ith columns of each side of (9.19):

\[
As_i = Sm_i,
\]

where m_i denotes the ith column of M.
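Equation (9.19) also gives a practical numerical test for invariance: with the columns of S spanning the candidate subspace, solve the least-squares problem SM ≈ AS and check the residual. A sketch with a hypothetical block-triangular matrix:

```python
import numpy as np

# Hypothetical A; span{e1, e2} is A-invariant because the (2,1) block is zero.
A = np.array([[1.0, 2.0, 5.0],
              [3.0, 4.0, 6.0],
              [0.0, 0.0, 7.0]])
S = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])  # columns span the candidate subspace

# Solve S @ M = A @ S in the least-squares sense
M, *_ = np.linalg.lstsq(S, A @ S, rcond=None)

# Zero residual confirms AS = SM, i.e., R(S) is A-invariant
assert np.allclose(A @ S, S @ M)
```

When R(S) is not invariant, the residual ‖AS − SM‖ is nonzero and measures how far the subspace is from being invariant.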
Example 9.36. The equation Ax = λx = xλ defining a right eigenvector x of an eigenvalue λ says that x spans an A-invariant subspace (of dimension one).

Example 9.37. Suppose X block diagonalizes A, i.e.,

\[
X^{-1}AX = \begin{bmatrix} J_1 & 0 \\ 0 & J_2 \end{bmatrix}.
\]

Rewriting in the form

\[
A \begin{bmatrix} X_1 & X_2 \end{bmatrix} = \begin{bmatrix} X_1 & X_2 \end{bmatrix} \begin{bmatrix} J_1 & 0 \\ 0 & J_2 \end{bmatrix},
\]

we have that AX_i = X_i J_i, i = 1, 2, so the columns of X_i span an A-invariant subspace.

Theorem 9.38. Suppose A ∈ ℝ^{n×n}.
1. Let p(A) = α_0 I + α_1 A + ··· + α_q A^q be a polynomial in A. Then N(p(A)) and R(p(A)) are A-invariant.

2. S is A-invariant if and only if S^⊥ is A^T-invariant.

Theorem 9.39. If V is a vector space over F such that V = N_1 ⊕ ··· ⊕ N_m, where each N_i is A-invariant, then a basis for V can be chosen with respect to which A has a block diagonal representation.
The Jordan canonical form is a special case of the above theorem. If A has distinct eigenvalues λ_i as in Theorem 9.34, we could choose bases for N(A − λ_i I)^{n_i} by, for example, SVD (note that the power n_i could be replaced by ν_i). We would then get a block diagonal representation for A with full blocks rather than the highly structured Jordan blocks. Other such "canonical" forms are discussed in text that follows.

Suppose X = [X_1, ..., X_m] ∈ ℝ^{n×n} is nonsingular and such that X^{-1}AX = diag(J_1, ..., J_m), where each J_i = diag(J_{i1}, ..., J_{ik_i}) and each J_{ik} is a Jordan block corresponding to λ_i ∈ Λ(A). We could also use other block diagonal decompositions (e.g., via SVD), but we restrict our attention here to only the Jordan block case. Note that AX_i = X_i J_i, so by (9.19) the columns of X_i (i.e., the eigenvectors and principal vectors associated with λ_i) span an A-invariant subspace of ℝ^n.

Finally, we return to the problem of developing a formula for e^{tA} in the case that A is not necessarily diagonalizable. Let Y_i ∈ ℂ^{n×n_i} be a Jordan basis for N(A^T − λ_i I)^{n_i}. Equivalently, partition
compatibly. Then

\[
A = XJX^{-1} = XJY^H = [X_1, \ldots, X_m] \, \mathrm{diag}(J_1, \ldots, J_m) \, [Y_1, \ldots, Y_m]^H
= \sum_{i=1}^{m} X_i J_i Y_i^H.
\]
In a similar fashion we can compute

\[
e^{tA} = \sum_{i=1}^{m} X_i e^{tJ_i} Y_i^H,
\]
which is a useful formula when used in conjunction with the result

\[
\exp t \begin{bmatrix}
\lambda & 1 & & & \\
 & \lambda & 1 & & \\
 & & \ddots & \ddots & \\
 & & & \lambda & 1 \\
 & & & & \lambda
\end{bmatrix}
= \begin{bmatrix}
e^{\lambda t} & t e^{\lambda t} & \frac{t^2}{2!} e^{\lambda t} & \cdots & \frac{t^{k-1}}{(k-1)!} e^{\lambda t} \\
 & e^{\lambda t} & t e^{\lambda t} & \cdots & \frac{t^{k-2}}{(k-2)!} e^{\lambda t} \\
 & & \ddots & \ddots & \vdots \\
 & & & e^{\lambda t} & t e^{\lambda t} \\
 & & & & e^{\lambda t}
\end{bmatrix}
\]

for a k × k Jordan block J_i associated with an eigenvalue λ = λ_i.
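The closed form above agrees with a general-purpose matrix exponential; for instance, comparing against scipy.linalg.expm for a 3 × 3 Jordan block with hypothetical values of λ and t:

```python
import numpy as np
from scipy.linalg import expm

lam, t = 0.5, 2.0                                  # hypothetical values
J = lam * np.eye(3) + np.diag([1.0, 1.0], k=1)     # 3x3 Jordan block

# Closed form: entry (i, j) of e^{tJ} is t^{j-i}/(j-i)! * e^{lam*t} for j >= i
E = np.exp(lam * t) * np.array([[1.0, t,   t**2 / 2.0],
                                [0.0, 1.0, t],
                                [0.0, 0.0, 1.0]])

assert np.allclose(expm(t * J), E)
```

Note the polynomial-in-t factors multiplying e^{λt}; these are exactly the source of the "secular" terms that appear when A is not diagonalizable.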
9.5 The Matrix Sign Function

In this section we give a very brief introduction to an interesting and useful matrix function called the matrix sign function. It is a generalization of the sign (or signum) of a scalar. A survey of the matrix sign function and some of its applications can be found in [15].

Definition 9.40. Let z ∈ ℂ with Re(z) ≠ 0. Then the sign of z is defined by

\[
\mathrm{sgn}(z) = \frac{\mathrm{Re}(z)}{|\mathrm{Re}(z)|} =
\begin{cases}
+1 & \text{if } \mathrm{Re}(z) > 0, \\
-1 & \text{if } \mathrm{Re}(z) < 0.
\end{cases}
\]
Definition 9.41. Suppose A ∈ ℂ^{n×n} has no eigenvalues on the imaginary axis, and let

\[
X^{-1}AX = \begin{bmatrix} N & 0 \\ 0 & P \end{bmatrix}
\]

be a Jordan canonical form for A, with N containing all Jordan blocks corresponding to the eigenvalues of A in the left half-plane and P containing all Jordan blocks corresponding to eigenvalues in the right half-plane. Then the sign of A, denoted sgn(A), is given by

\[
\mathrm{sgn}(A) = X \begin{bmatrix} -I & 0 \\ 0 & I \end{bmatrix} X^{-1},
\]
where the negative and positive identity matrices are of the same dimensions as N and P, respectively. There are other equivalent definitions of the matrix sign function, but the one given here is especially useful in deriving many of its key properties. The JCF definition of the matrix sign function does not generally lend itself to reliable computation on a finite-word-length digital computer. In fact, its reliable numerical calculation is an interesting topic in its own right.

We state some of the more useful properties of the matrix sign function as theorems. Their straightforward proofs are left to the exercises.
Theorem 9.42. Suppose A ∈ ℂ^{n×n} has no eigenvalues on the imaginary axis, and let S = sgn(A). Then the following hold:

1. S is diagonalizable with eigenvalues equal to ±1.

2. S^2 = I.

3. AS = SA.

4. sgn(A^H) = (sgn(A))^H.

5. sgn(T^{-1}AT) = T^{-1} sgn(A) T for all nonsingular T ∈ ℂ^{n×n}.

6. sgn(cA) = sgn(c) sgn(A) for all nonzero real scalars c.

Theorem 9.43. Suppose A ∈ ℂ^{n×n} has no eigenvalues on the imaginary axis, and let S = sgn(A). Then the following hold:
1. R(S − I) is an A-invariant subspace corresponding to the left half-plane eigenvalues of A (the negative invariant subspace).

2. R(S + I) is an A-invariant subspace corresponding to the right half-plane eigenvalues of A (the positive invariant subspace).

3. negA ≡ (I − S)/2 is a projection onto the negative invariant subspace of A.

4. posA ≡ (I + S)/2 is a projection onto the positive invariant subspace of A.
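SciPy provides a numerical matrix sign function as scipy.linalg.signm, and the properties in Theorems 9.42–9.43 are easy to spot-check on a hypothetical matrix with eigenvalues on both sides of the imaginary axis:

```python
import numpy as np
from scipy.linalg import signm

# Hypothetical matrix with eigenvalues -3 and 2 (none on the imaginary axis)
A = np.array([[-3.0, 1.0],
              [ 0.0, 2.0]])
S = signm(A)

assert np.allclose(S @ S, np.eye(2))   # S^2 = I        (Theorem 9.42)
assert np.allclose(A @ S, S @ A)       # AS = SA        (Theorem 9.42)

P_pos = (np.eye(2) + S) / 2            # projection onto positive invariant subspace
assert np.allclose(P_pos @ P_pos, P_pos)
```

The projections (I ± S)/2 are the basis of sign-function methods for computing invariant subspaces, e.g., in algebraic Riccati equation solvers.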
EXERCISES
1. Let A ∈ ℂ^{n×n} have distinct eigenvalues λ_1, ..., λ_n with corresponding right eigenvectors x_1, ..., x_n and left eigenvectors y_1, ..., y_n, respectively. Let v ∈ ℂ^n be an arbitrary vector. Show that v can be expressed (uniquely) as a linear combination of the right eigenvectors. Find the appropriate expression for v as a linear combination of the left eigenvectors as well.
2. Suppose A ∈ ℂ^{n×n} is skew-Hermitian, i.e., A^H = −A. Prove that all eigenvalues of a skew-Hermitian matrix must be pure imaginary.

3. Suppose A ∈ ℂ^{n×n} is Hermitian. Let λ be an eigenvalue of A with corresponding right eigenvector x. Show that x is also a left eigenvector for λ. Prove the same result if A is skew-Hermitian.

4. Suppose a matrix A ∈ ℝ^{5×5} has eigenvalues {2, 2, 2, 2, 3}. Determine all possible JCFs for A.
5. Determine the eigenvalues, right eigenvectors and right principal vectors if necessary, and (real) JCFs of the following matrices:

(a)
2 1 ] 0 ' [ 1
6. Determine the JCFs of the following matrices:
PAP^{-1} is called a similarity.

2. If W = V and if Q = P^T is orthogonal, the transformation A ↦ PAP^T is called an orthogonal similarity (or unitary similarity in the complex case).
The following results are typical of what can be achieved under a unitary similarity. If A = A^H ∈ ℂ^{n×n} has eigenvalues λ_1, ..., λ_n, then there exists a unitary matrix U such that U^H AU = D, where D = diag(λ_1, ..., λ_n). This is proved in Theorem 10.2. What other matrices are "diagonalizable" under unitary similarity? The answer is given in Theorem 10.9, where it is proved that a general matrix A ∈ ℂ^{n×n} is unitarily similar to a diagonal matrix if and only if it is normal (i.e., AA^H = A^H A). Normal matrices include Hermitian, skew-Hermitian, and unitary matrices (and their "real" counterparts: symmetric, skew-symmetric, and orthogonal, respectively), as well as other matrices that merely satisfy the definition, such as

\[
A = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}
\]

for real scalars a and b. If a matrix A is not normal, the most "diagonal" we can get is the JCF described in Chapter 9.
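The matrix [a b; −b a] above is indeed normal even though it is neither symmetric, skew-symmetric, nor (in general) orthogonal; a one-line check with hypothetical values of a and b:

```python
import numpy as np

a, b = 3.0, 4.0                       # hypothetical real scalars
A = np.array([[a, b],
              [-b, a]])

# A A^T = A^T A = (a^2 + b^2) I, so A is normal
assert np.allclose(A @ A.T, A.T @ A)
```

In fact AA^T = (a² + b²)I here, which also shows that A is a scalar multiple of an orthogonal (rotation) matrix.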
Theorem 10.2. Let A = A^H ∈ ℂ^{n×n} have (real) eigenvalues λ_1, ..., λ_n. Then there exists a unitary matrix X such that X^H AX = D = diag(λ_1, ..., λ_n) (the columns of X are orthonormal eigenvectors for A).
Chapter 10. Canonical Forms
Proof: Let x_1 be a right eigenvector corresponding to λ_1, and normalize it such that x_1^H x_1 = 1. Then there exist n − 1 additional vectors x_2, ..., x_n such that X = [x_1, ..., x_n] = [x_1 X_2] is unitary. Now

\[
X^H A X = \begin{bmatrix} x_1^H \\ X_2^H \end{bmatrix} A \begin{bmatrix} x_1 & X_2 \end{bmatrix}
= \begin{bmatrix} x_1^H A x_1 & x_1^H A X_2 \\ X_2^H A x_1 & X_2^H A X_2 \end{bmatrix}
= \begin{bmatrix} \lambda_1 & x_1^H A X_2 \\ 0 & X_2^H A X_2 \end{bmatrix} \qquad (10.1)
\]
\[
= \begin{bmatrix} \lambda_1 & 0 \\ 0 & X_2^H A X_2 \end{bmatrix}. \qquad (10.2)
\]
In (10.1) we have used the fact that Ax_1 = λ_1 x_1. When combined with the fact that x_1^H x_1 = 1, we get λ_1 remaining in the (1,1)-block. We also get 0 in the (2,1)-block by noting that x_1 is orthogonal to all vectors in X_2. In (10.2), we get 0 in the (1,2)-block by noting that X^H AX is Hermitian. The proof is completed easily by induction upon noting that the (2,2)-block must have eigenvalues λ_2, ..., λ_n. □

Given a unit vector x_1 ∈ ℝ^n, the construction of X_2 ∈ ℝ^{n×(n−1)} such that X = [x_1 X_2] is orthogonal is frequently required. The construction can actually be performed quite easily by means of Householder (or Givens) transformations as in the proof of the following general result.

Theorem 10.3. Let X_1 ∈ ℂ^{n×k} have orthonormal columns and suppose U is a unitary matrix such that UX_1 = \(\begin{bmatrix} R \\ 0 \end{bmatrix}\), where R ∈ ℂ^{k×k} is upper triangular. Write U^H = [U_1 U_2] with U_1 ∈ ℂ^{n×k}. Then [X_1 U_2] is unitary.
Proof: Let X_1 = [x_1, ..., x_k]. Construct a sequence of Householder matrices (also known as elementary reflectors) H_1, ..., H_k in the usual way (see below) such that

\[
H_k \cdots H_1 [x_1, \ldots, x_k] = \begin{bmatrix} R \\ 0 \end{bmatrix},
\]

where R is upper triangular (and nonsingular since x_1, ..., x_k are orthonormal). Let U = H_k ⋯ H_1. Then U^H = H_1 ⋯ H_k and

\[
X_1 = U^H \begin{bmatrix} R \\ 0 \end{bmatrix} = U_1 R, \quad \text{so} \quad X_1^H U_2 = R^H U_1^H U_2 = 0.
\]

Then x_i^H U_2 = 0 (i ∈ k̲) means that x_i is orthogonal to each of the n − k columns of U_2. But the latter are orthonormal since they are the last n − k rows of the unitary matrix U. Thus, [X_1 U_2] is unitary. □
The construction called for in Theorem 10.2 is then a special case of Theorem 10.3 for k = 1. We illustrate the construction of the necessary Householder matrix for k = 1. For simplicity, we consider the real case. Let the unit vector x_1 be denoted by [ξ_1, ..., ξ_n]^T.
10.1. Some Basic Canonical Forms
Then the necessary Householder matrix needed for the construction of X_2 is given by U = I − 2uu^+ = I − (2/(u^T u)) uu^T, where u = [ξ_1 ± 1, ξ_2, ..., ξ_n]^T. It can easily be checked that U is symmetric and U^T U = U^2 = I, so U is orthogonal. To see that U effects the necessary compression of x_1, it is easily verified that u^T u = 2 ± 2ξ_1 and u^T x_1 = 1 ± ξ_1. Thus,

\[
U x_1 = x_1 - \frac{2(u^T x_1)}{u^T u} u = x_1 - u = [\mp 1, 0, \ldots, 0]^T.
\]
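The Householder construction above is easily verified numerically; a sketch completing a hypothetical unit vector to a full orthogonal matrix (taking the + sign in u):

```python
import numpy as np

x1 = np.array([3.0, 0.0, 4.0]) / 5.0            # hypothetical unit vector
u = x1.copy()
u[0] += 1.0                                     # u = [xi_1 + 1, xi_2, ..., xi_n]^T
U = np.eye(3) - 2.0 * np.outer(u, u) / (u @ u)  # Householder reflector

assert np.allclose(U @ U.T, np.eye(3))          # U is orthogonal (and symmetric)
assert np.allclose(U @ x1, [-1.0, 0.0, 0.0])    # U compresses x1 to -e1

# The remaining columns of U^T complete x1 to an orthogonal matrix X = [x1 X2]
X2 = U.T[:, 1:]
X = np.column_stack([x1, X2])
assert np.allclose(X.T @ X, np.eye(3))
```

This is exactly the k = 1 case of Theorem 10.3, with U the product of a single reflector.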
Further details on Householder matrices, including the choice of sign and the complex case, can be consulted in standard numerical linear algebra texts such as [7], [11], [23], [25].

The real version of Theorem 10.2 is worth stating separately since it is applied frequently in applications.

Theorem 10.4. Let A = A^T ∈ ℝ^{n×n} have eigenvalues λ_1, ..., λ_n. Then there exists an orthogonal matrix X ∈ ℝ^{n×n} (whose columns are orthonormal eigenvectors of A) such that X^T AX = D = diag(λ_1, ..., λ_n).

Note that Theorem 10.4 implies that a symmetric matrix A (with the obvious analogue from Theorem 10.2 for Hermitian matrices) can be written
\[
A = XDX^T = \sum_{i=1}^{n} \lambda_i x_i x_i^T, \qquad (10.3)
\]

which is often called the spectral representation of A. In fact, A in (10.3) is actually a weighted sum of orthogonal projections P_i (onto the one-dimensional eigenspaces corresponding to the λ_i's), i.e.,

\[
A = \sum_{i=1}^{n} \lambda_i P_i,
\]

where P_i = P_{R(x_i)} = x_i x_i^T = x_i x_i^+ since x_i^T x_i = 1.
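The spectral representation (10.3) can be reproduced with numpy.linalg.eigh, which returns orthonormal eigenvectors for a symmetric matrix. A sketch with a hypothetical A:

```python
import numpy as np

# Hypothetical symmetric matrix
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])

w, X = np.linalg.eigh(A)   # A = X diag(w) X^T, columns of X orthonormal

# Rebuild A as a weighted sum of rank-one orthogonal projections P_i = x_i x_i^T
A_rebuilt = sum(w[i] * np.outer(X[:, i], X[:, i]) for i in range(3))
assert np.allclose(A, A_rebuilt)

# Each P_i is an orthogonal projection: P_i^2 = P_i = P_i^T
P0 = np.outer(X[:, 0], X[:, 0])
assert np.allclose(P0 @ P0, P0)
```

Truncating the sum to the largest |λ_i| terms gives the best low-rank approximations of A, which is one reason this representation is so widely used.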
The following pair of theorems form the theoretical foundation of the double Francis QR algorithm used to compute matrix eigenvalues in a numerically stable and reliable way.
Theorem 10.5 (Schur). Let A ∈ ℂ^{n×n}. Then there exists a unitary matrix U such that U^H AU = T, where T is upper triangular.
Proof: The proof of this theorem is essentially the same as that of Theorem 10.2 except that in this case (using the notation U rather than X) the (1,2)-block u_1^H AU_2 is not 0. □

In the case of A ∈ ℝ^{n×n}, it is thus unitarily similar to an upper triangular matrix, but if A has a complex conjugate pair of eigenvalues, then complex arithmetic is clearly needed to place such eigenvalues on the diagonal of T. However, the next theorem shows that every A ∈ ℝ^{n×n} is also orthogonally similar (i.e., real arithmetic) to a quasi-upper-triangular matrix. A quasi-upper-triangular matrix is block upper triangular with 1 × 1 diagonal blocks corresponding to its real eigenvalues and 2 × 2 diagonal blocks corresponding to its complex conjugate pairs of eigenvalues.
Theorem 10.6 (Murnaghan–Wintner). Let A ∈ ℝ^{n×n}. Then there exists an orthogonal matrix U such that U^T AU = S, where S is quasi-upper-triangular.
Example 10.8. 10.8. The The matrix matrix
s~ [ 20
4
h[
1
2
is is in in RSF. RSF. Its Its real real JCF JCF is is
1 1
5
0
0 0
n n
Note corresponding first Note that that only only the the first first Schur Schur vector vector (and (and then then only only if if the the corresponding first eigenvalue eigenvalue if U orthogonal) is is an an eigenvector. eigenvector. However, However, what what is is true, true, and and sufficient for virtually virtually is real real if is U is is orthogonal) sufficient for all applications applications (see, (see, for for example, example, [17]), is that that the the first first k Schur vectors span span the the same all [17]), is Schur vectors same Ainvariant subspace the eigenvectors corresponding to to the the first first k eigenvalues along the the invariant subspace as as the eigenvectors corresponding eigenvalues along diagonal of of T (or S). diagonal T (or S). While every every matrix matrix can can be be reduced reduced to to Schur Schur form (or RSF), RSF), it it is is of of interest interest to to know While form (or know when we we can go further further and reduce aa matrix matrix via via unitary unitary similarity to diagonal diagonal form. form. The when can go and reduce similarity to The following following theorem theorem answers answers this this question. question. x Theorem 10.9. 10.9. A C"nxn " is is unitarily unitarily similar Theorem A matrix matrix A A eE c similar to to a a diagonal diagonal matrix matrix ifif and and only only if if H H H A is is normal normal (i.e., (i.e., A AHAA = = AA A AA ).).
Proof: Suppose U is a unitary matrix such that U^H AU = D, where D is diagonal. Then

\[
AA^H = UDU^H UD^H U^H = UDD^H U^H = UD^H DU^H = A^H A,
\]

so A is normal.
Conversely, suppose A is normal and let U be a unitary matrix such that U^H A U = T, where T is an upper triangular matrix (Theorem 10.5). Then

    T T^H = U^H A U U^H A^H U = U^H A A^H U = U^H A^H A U = T^H T.

It is then a routine exercise to show that T must, in fact, be diagonal. □

10.2 Definite Matrices
Definition 10.10. A symmetric matrix A ∈ R^{n×n} is

1. positive definite if and only if x^T A x > 0 for all nonzero x ∈ R^n. We write A > 0.

2. nonnegative definite (or positive semidefinite) if and only if x^T A x ≥ 0 for all nonzero x ∈ R^n. We write A ≥ 0.

3. negative definite if −A is positive definite. We write A < 0.

4. nonpositive definite (or negative semidefinite) if −A is nonnegative definite. We write A ≤ 0.

Also, if A and B are symmetric matrices, we write A > B if and only if A − B > 0 or B − A < 0. Similarly, we write A ≥ B if and only if A − B ≥ 0 or B − A ≤ 0.
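The classes in Definition 10.10 can be checked numerically from the (real) eigenvalues of a symmetric matrix. Below is a minimal sketch assuming NumPy is available; the function name `definiteness` and the tolerance are illustrative choices, not from the text.

```python
import numpy as np

def definiteness(A, tol=1e-12):
    """Classify a symmetric matrix via its eigenvalues (cf. Definition 10.10)."""
    w = np.linalg.eigvalsh(A)          # real eigenvalues, ascending
    if np.all(w > tol):
        return "positive definite"
    if np.all(w >= -tol):
        return "nonnegative definite"
    if np.all(w < -tol):
        return "negative definite"
    if np.all(w <= tol):
        return "nonpositive definite"
    return "indefinite"

A = np.array([[2.0, 1.0], [1.0, 2.0]])    # eigenvalues 1, 3
B = np.array([[1.0, 1.0], [1.0, 1.0]])    # eigenvalues 0, 2
C = np.array([[1.0, 0.0], [0.0, -1.0]])   # eigenvalues 1, -1
print(definiteness(A), "/", definiteness(B), "/", definiteness(C))
```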
Remark 10.11. If A ∈ C^{n×n} is Hermitian, all the above definitions hold except that superscript H's replace T's. Indeed, this is generally true for all results in the remainder of this section that may be stated in the real case for simplicity.
Remark 10.12. If a matrix is neither definite nor semidefinite, it is said to be indefinite.

Theorem 10.13. Let A = A^H ∈ C^{n×n} with eigenvalues λ₁ ≥ λ₂ ≥ ⋯ ≥ λₙ. Then for all x ∈ C^n,

    λₙ x^H x ≤ x^H A x ≤ λ₁ x^H x.
Proof: Let U be a unitary matrix that diagonalizes A as in Theorem 10.2, say U^H A U = D = diag(λ₁, …, λₙ). Furthermore, let y = U^H x, where x is an arbitrary vector in C^n, and denote the components of y by ηᵢ, i ∈ n. Then

    x^H A x = (U^H x)^H D (U^H x) = y^H D y = Σ_{i=1}^{n} λᵢ |ηᵢ|².

But clearly

    Σ_{i=1}^{n} λᵢ |ηᵢ|² ≤ λ₁ y^H y = λ₁ x^H x

and

    Σ_{i=1}^{n} λᵢ |ηᵢ|² ≥ λₙ y^H y = λₙ x^H x,

from which the theorem follows. □
Remark 10.14. The ratio (x^H A x)/(x^H x) for A = A^H ∈ C^{n×n} and nonzero x ∈ C^n is called the Rayleigh quotient of x. Theorem 10.13 thus provides an upper bound (λ₁) and a lower bound (λₙ) for the Rayleigh quotient.
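The Rayleigh quotient bounds of Theorem 10.13 are easy to observe numerically. A sketch assuming NumPy; the random Hermitian matrix, seed, and sample size are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (M + M.conj().T) / 2              # a random Hermitian matrix
lam = np.linalg.eigvalsh(A)           # ascending; lam[0] = lambda_n, lam[-1] = lambda_1

# Sample Rayleigh quotients x^H A x / x^H x at random nonzero vectors.
X = rng.standard_normal((100, n)) + 1j * rng.standard_normal((100, n))
num = np.real(np.einsum('ij,jk,ik->i', X.conj(), A, X))   # x^H A x (real for Hermitian A)
den = np.real(np.einsum('ij,ij->i', X.conj(), X))         # x^H x
r = num / den
print(bool(lam[0] - 1e-9 <= r.min()), bool(r.max() <= lam[-1] + 1e-9))
```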
10.3 Equivalence Transformations and Congruence
Theorem 10.24. Let A ∈ C_r^{m×n}. Then there exist matrices P ∈ C_m^{m×m} and Q ∈ C_n^{n×n} such that

    P A Q = [ I_r 0 ; 0 0 ].    (10.4)
Proof: A classical proof can be consulted in, for example, [21, p. 131]. Alternatively, suppose A has an SVD of the form (5.2) in its complex version. Then

    [ S^{−1} 0 ; 0 I ] U^H A V = [ I_r 0 ; 0 0 ].

Take P = [ S^{−1} 0 ; 0 I ] U^H and Q = V to complete the proof. □
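The constructive SVD proof of Theorem 10.24 translates directly into a computation. A sketch assuming NumPy and a real A (so U^H = U^T); the sizes and rank are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 5, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # a rank-2 matrix

U, s, Vh = np.linalg.svd(A)
# P = diag(S^{-1}, I) U^T and Q = V give P A Q = [[I_r, 0], [0, 0]].
P = np.diag(np.concatenate([1.0 / s[:r], np.ones(m - r)])) @ U.T
Q = Vh.T
D = P @ A @ Q

target = np.zeros((m, n))
target[:r, :r] = np.eye(r)
print(np.allclose(D, target))
```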
Note that the greater freedom afforded by the equivalence transformation of Theorem 10.24, as opposed to the more restrictive situation of a similarity transformation, yields a far "simpler" canonical form (10.4). However, numerical procedures for computing such an equivalence directly via, say, Gaussian or elementary row and column operations, are generally unreliable. The numerically preferred equivalence is, of course, the unitary equivalence known as the SVD. However, the SVD is relatively expensive to compute, and other canonical forms exist that are intermediate between (10.4) and the SVD; see, for example, [7, Ch. 5], [4, Ch. 2]. Two such forms are stated here. They are more stably computable than (10.4) and more efficiently computable than a full SVD. Many similar results are also available.
Theorem 10.25 (Complete Orthogonal Decomposition). Let A ∈ C_r^{m×n}. Then there exist unitary matrices U ∈ C^{m×m} and V ∈ C^{n×n} such that

    U^H A V = [ R 0 ; 0 0 ],    (10.5)

where R ∈ C_r^{r×r} is upper (or lower) triangular with positive diagonal elements.

Proof: For the proof, see [4]. □
Theorem 10.26. Let A ∈ C_r^{m×n}. Then there exist a unitary matrix Q ∈ C^{m×m} and a permutation matrix Π ∈ C^{n×n} such that

    Q A Π = [ R S ; 0 0 ],    (10.6)

where R ∈ C_r^{r×r} is upper triangular and S ∈ C^{r×(n−r)} is arbitrary but in general nonzero.

Proof: For the proof, see [4]. □
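A decomposition of the flavor of (10.6) is commonly computed by QR with column pivoting. The following is an illustrative modified Gram–Schmidt sketch assuming NumPy; it is not a production algorithm (library routines use Householder transformations), and the function name and tolerance are assumptions.

```python
import numpy as np

def qr_col_pivot(A, tol=1e-10):
    """Column-pivoted QR via modified Gram-Schmidt: A[:, perm] ~ Q @ R."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    W = A.copy()                        # residual columns
    perm = np.arange(n)
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        # Pivot: bring the residual column of largest norm to position k.
        j = k + int(np.argmax(np.linalg.norm(W[:, k:], axis=0)))
        W[:, [k, j]] = W[:, [j, k]]
        R[:, [k, j]] = R[:, [j, k]]     # keep already-computed rows consistent
        perm[[k, j]] = perm[[j, k]]
        nrm = np.linalg.norm(W[:, k])
        if nrm <= tol:
            break                       # numerical rank reached; trailing R rows stay 0
        R[k, k] = nrm
        Q[:, k] = W[:, k] / nrm
        R[k, k + 1:] = Q[:, k] @ W[:, k + 1:]
        W[:, k + 1:] -= np.outer(Q[:, k], R[k, k + 1:])
    return Q, R, perm

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))   # rank 3
Q, R, perm = qr_col_pivot(A)
print(np.allclose(A[:, perm], Q @ R))
```

The number of diagonal entries of R above the tolerance reveals the numerical rank, which is the point of Remark 10.27 below.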
Remark 10.27. When A has full column rank but is "near" a rank-deficient matrix, various rank-revealing QR decompositions are available that can sometimes detect such phenomena at a cost considerably less than a full SVD. Again, see [4] for details.

Definition 10.28. Let A ∈ C^{n×n} and X ∈ C_n^{n×n}. The transformation A ↦ X^H A X is called a congruence. Note that a congruence is a similarity if and only if X is unitary.
Note that congruence preserves the property of being Hermitian; i.e., if A is Hermitian, then X^H A X is also Hermitian. It is of interest to ask what other properties of a matrix are preserved under congruence. It turns out that the principal property so preserved is the sign of each eigenvalue.

Definition 10.29. Let A = A^H ∈ C^{n×n} and let π, ν, and ζ denote the numbers of positive, negative, and zero eigenvalues, respectively, of A. Then the inertia of A is the triple of numbers In(A) = (π, ν, ζ). The signature of A is given by sig(A) = π − ν.
Example 10.30.

1. In [ 0 1 0 0 ; 1 0 0 0 ; 0 0 1 0 ; 0 0 0 0 ] = (2, 1, 1).

2. If A = A^H ∈ C^{n×n}, then A > 0 if and only if In(A) = (n, 0, 0).

3. If In(A) = (π, ν, ζ), then rank(A) = π + ν.

Theorem 10.31 (Sylvester's Law of Inertia). Let A = A^H ∈ C^{n×n} and X ∈ C_n^{n×n}. Then In(A) = In(X^H A X).
Proof: For the proof, see, for example, [21, p. 134]. □

Theorem 10.31 guarantees that rank and signature of a matrix are preserved under congruence. We then have the following.
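Sylvester's law is easy to observe numerically: congruence by any nonsingular X preserves the inertia. A sketch assuming NumPy; the test matrix and seed are arbitrary illustrative choices.

```python
import numpy as np

def inertia(A, tol=1e-10):
    """In(A) = (pi, nu, zeta) for a Hermitian/symmetric A, from its eigenvalues."""
    w = np.linalg.eigvalsh(A)
    return (int(np.sum(w > tol)), int(np.sum(w < -tol)), int(np.sum(np.abs(w) <= tol)))

A = np.diag([3.0, 1.0, -2.0, 0.0])      # In(A) = (2, 1, 1)
rng = np.random.default_rng(3)
X = rng.standard_normal((4, 4))         # almost surely nonsingular
print(inertia(A), inertia(X.T @ A @ X)) # congruence preserves the inertia
```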
Theorem 10.32. Let A = A^H ∈ C^{n×n} with In(A) = (π, ν, ζ). Then there exists a matrix X ∈ C_n^{n×n} such that X^H A X = diag(1, …, 1, −1, …, −1, 0, …, 0), where the number of 1's is π, the number of −1's is ν, and the number of 0's is ζ.
Proof: Let λ₁, …, λₙ denote the eigenvalues of A and order them such that the first π are positive, the next ν are negative, and the final ζ are 0. By Theorem 10.2 there exists a unitary matrix U such that U^H A U = diag(λ₁, …, λₙ). Define the n × n matrix

    W = diag(1/√λ₁, …, 1/√λ_π, 1/√(−λ_{π+1}), …, 1/√(−λ_{π+ν}), 1, …, 1).

Then it is easy to check that X = U W yields the desired result. □

10.3.1 Block matrices and definiteness
Theorem 10.33. Suppose A = A^T and D = D^T. Then

    [ A B ; B^T D ] > 0

if and only if either A > 0 and D − B^T A^{−1} B > 0, or D > 0 and A − B D^{−1} B^T > 0.

Proof: The proof follows by considering, for example, the congruence

    [ A B ; B^T D ] ↦ [ I −A^{−1}B ; 0 I ]^T [ A B ; B^T D ] [ I −A^{−1}B ; 0 I ].

The details are straightforward and are left to the reader. □
Remark 10.34. Note the symmetric Schur complements of A (or D) in the theorem.

Theorem 10.35. Suppose A = A^T and D = D^T. Then

    [ A B ; B^T D ] ≥ 0

if and only if A ≥ 0, A A^+ B = B, and D − B^T A^+ B ≥ 0.
Proof: Consider the congruence with

    [ I −A^+ B ; 0 I ]

and proceed as in the proof of Theorem 10.33. □

10.4 Rational Canonical Form
One final canonical form to be mentioned is the rational canonical form.
Definition 10.36. A matrix A ∈ R^{n×n} is said to be nonderogatory if its minimal polynomial and characteristic polynomial are the same or, equivalently, if its Jordan canonical form has only one block associated with each distinct eigenvalue.
Suppose A ∈ R^{n×n} is a nonderogatory matrix and suppose its characteristic polynomial is π(λ) = λ^n − (a₀ + a₁λ + ⋯ + a_{n−1}λ^{n−1}). Then it can be shown (see [12]) that A is similar to a matrix of the form

    [ 0    1    0    ⋯   0       ]
    [ 0    0    1    ⋯   0       ]
    [ ⋮                ⋱    ⋮       ]
    [ 0    0    0    ⋯   1       ]
    [ a₀   a₁   a₂   ⋯   a_{n−1} ].    (10.7)

Definition 10.37. A matrix A ∈ R^{n×n} of the form (10.7) is called a companion matrix or is said to be in companion form.
Companion matrices also appear in the literature in several equivalent forms. To illustrate, consider the companion matrix

    [ 0    1    0    0  ]
    [ 0    0    1    0  ]
    [ 0    0    0    1  ]
    [ a₀   a₁   a₂   a₃ ].    (10.8)

This matrix is a special case of a matrix in lower Hessenberg form. Using the reverse-order identity similarity P given by (9.18), A is easily seen to be similar to the following matrix in upper Hessenberg form:

    [ a₃   a₂   a₁   a₀ ]
    [ 1    0    0    0  ]
    [ 0    1    0    0  ]
    [ 0    0    1    0  ].    (10.9)
Moreover, since a matrix is similar to its transpose (see exercise 13 in Chapter 9), the following are also companion matrices similar to the above:

    [ 0   0   0   a₀ ]        [ a₃   1   0   0 ]
    [ 1   0   0   a₁ ]        [ a₂   0   1   0 ]
    [ 0   1   0   a₂ ]        [ a₁   0   0   1 ]
    [ 0   0   1   a₃ ],       [ a₀   0   0   0 ].    (10.10)
Notice that in all cases a companion matrix is nonsingular if and only if a₀ ≠ 0. In fact, the inverse of a nonsingular companion matrix is again in companion form. For example,

    [ 0    1    0    0  ]⁻¹     [ −a₁/a₀   −a₂/a₀   −a₃/a₀   1/a₀ ]
    [ 0    0    1    0  ]    =  [ 1        0        0        0    ]
    [ 0    0    0    1  ]       [ 0        1        0        0    ]
    [ a₀   a₁   a₂   a₃ ]       [ 0        0        1        0    ],    (10.11)
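Both the characteristic polynomial of (10.7) and the inverse formula (10.11) can be verified numerically. A sketch assuming NumPy; the coefficient vector is an arbitrary illustrative choice with a₀ ≠ 0.

```python
import numpy as np

def companion(a):
    """Companion matrix (10.7) for pi(lam) = lam^n - (a0 + a1*lam + ... + a_{n-1}*lam^{n-1})."""
    a = np.asarray(a, dtype=float)
    n = len(a)
    C = np.zeros((n, n))
    C[:-1, 1:] = np.eye(n - 1)     # superdiagonal of ones
    C[-1, :] = a                   # coefficients in the last row
    return C

a = np.array([2.0, -3.0, 1.0, 4.0])    # illustrative a0, a1, a2, a3 with a0 != 0
C = companion(a)

# Characteristic polynomial: np.poly returns [1, -a3, -a2, -a1, -a0].
print(np.allclose(np.poly(C), np.concatenate([[1.0], -a[::-1]])))
# The inverse is again in companion form, per (10.11).
print(np.allclose(np.linalg.inv(C)[0], np.concatenate([-a[1:] / a[0], [1.0 / a[0]]])))
```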
with a similar result for companion matrices of the form (10.10).

If a companion matrix of the form (10.7) is singular, i.e., if a₀ = 0, then its pseudoinverse can still be computed. Let a ∈ R^{n−1} denote the vector [a₁, a₂, …, a_{n−1}]^T and let c = 1/(1 + a^T a). Then it is easily verified that

    [ 0   I_{n−1} ]⁺     [ 0^T                0  ]
    [ 0   a^T     ]   =  [ I_{n−1} − c a a^T  ca ].

Note that I_{n−1} − c a a^T = (I_{n−1} + a a^T)^{−1}, and hence the pseudoinverse of a singular companion matrix is not a companion matrix unless a = 0.

Companion matrices have many other interesting properties, among which, and perhaps surprisingly, is the fact that their singular values can be found in closed form; see [14].
Theorem 10.38. Let σ₁ ≥ σ₂ ≥ ⋯ ≥ σₙ be the singular values of the companion matrix A in (10.7). Let α = a₁² + a₂² + ⋯ + a²_{n−1} and γ = 1 + a₀² + α. Then

    σ₁² = ½ (γ + √(γ² − 4a₀²)),

    σᵢ² = 1    for i = 2, 3, …, n − 1,

    σₙ² = ½ (γ − √(γ² − 4a₀²)).

If a₀ ≠ 0, the largest and smallest singular values can also be written in an equivalent form; note, in particular, that σ₁σₙ = |a₀|.
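The closed-form singular values of Theorem 10.38 can be checked against a numerical SVD. A sketch assuming NumPy; the coefficient vector is an arbitrary illustrative choice.

```python
import numpy as np

a = np.array([0.5, -1.0, 2.0, 0.25])    # a0, a1, a2, a3 for a 4x4 companion matrix
n = len(a)
C = np.zeros((n, n))
C[:-1, 1:] = np.eye(n - 1)
C[-1, :] = a

alpha = np.sum(a[1:] ** 2)
gamma = 1 + a[0] ** 2 + alpha
disc = np.sqrt(gamma ** 2 - 4 * a[0] ** 2)
# Predicted singular values, in decreasing order: sigma_1, 1, ..., 1, sigma_n.
pred = np.sqrt(np.array([(gamma + disc) / 2] + [1.0] * (n - 2) + [(gamma - disc) / 2]))

print(np.allclose(np.linalg.svd(C, compute_uv=False), pred))
```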
Remark 10.39. Explicit formulas for all the associated right and left singular vectors can also be derived easily.

If A ∈ R^{n×n} is derogatory, i.e., has more than one Jordan block associated with at least one eigenvalue, then it is not similar to a companion matrix of the form (10.7). However, it can be shown that a derogatory matrix is similar to a block diagonal matrix, each of whose diagonal blocks is a companion matrix. Such matrices are said to be in rational canonical form (or Frobenius canonical form). For details, see, for example, [12].

Companion matrices appear frequently in the control and signal processing literature but unfortunately they are often very difficult to work with numerically. Algorithms to reduce an arbitrary matrix to companion form are numerically unstable. Moreover, companion matrices are known to possess many undesirable numerical properties. For example, in general and especially as n increases, their eigenstructure is extremely ill conditioned, nonsingular ones are nearly singular, stable ones are nearly unstable, and so forth [14].
Companion matrices and rational canonical forms are generally to be avoided in floating-point computation.
Remark 10.40. Theorem 10.38 yields some understanding of why difficult numerical behavior might be expected for companion matrices. For example, when solving linear systems of equations of the form (6.2), one measure of numerical sensitivity is κ_p(A) = ‖A‖_p ‖A^{−1}‖_p, the so-called condition number of A with respect to inversion and with respect to the matrix p-norm. If this number is large, say O(10^k), one may lose up to k digits of precision. In the 2-norm, this condition number is the ratio of largest to smallest singular values which, by the theorem, can be determined explicitly as

    κ₂(A) = (γ + √(γ² − 4a₀²)) / (2|a₀|).

It is easy to show that γ/(2|a₀|) ≤ κ₂(A) ≤ γ/|a₀|, and when a₀ is small or γ is large (or both), then κ₂(A) ≈ γ/|a₀|. It is not unusual for γ to be large for large n. Note that explicit formulas for κ₁(A) and κ_∞(A) can also be determined easily by using (10.11).
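The explicit formula for κ₂(A) in Remark 10.40 can be compared with a general-purpose condition number computation. A sketch assuming NumPy; the coefficients (with a small a₀ to force ill conditioning) are an arbitrary illustrative choice.

```python
import numpy as np

a = np.array([1e-3, 0.8, -0.6, 0.4])    # small a0 makes the companion matrix ill conditioned
n = len(a)
C = np.zeros((n, n))
C[:-1, 1:] = np.eye(n - 1)
C[-1, :] = a

gamma = 1 + a[0] ** 2 + np.sum(a[1:] ** 2)
kappa2 = (gamma + np.sqrt(gamma ** 2 - 4 * a[0] ** 2)) / (2 * abs(a[0]))

# Compare the closed form with the SVD-based 2-norm condition number.
print(np.isclose(np.linalg.cond(C, 2), kappa2, rtol=1e-8))
```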
EXERCISES

1. Show that if a triangular matrix is normal, then it must be diagonal.

2. Prove that if A ∈ R^{n×n} is normal, then N(A) = N(A^T).

3. Let A ∈ C^{n×n} and define ρ(A) = max_{λ∈Λ(A)} |λ|. Then ρ(A) is called the spectral radius of A. Show that if A is normal, then ρ(A) = ‖A‖₂. Show that the converse is true if n = 2.

4. Let A ∈ C^{n×n} be normal with eigenvalues λ₁, …, λₙ and singular values σ₁ ≥ σ₂ ≥ ⋯ ≥ σₙ ≥ 0. Show that σᵢ(A) = |λᵢ(A)| for i ∈ n.
5. Use the reverse-order identity matrix P introduced in (9.18) and the matrix U in Theorem 10.5 to find a unitary matrix Q that reduces A ∈ C^{n×n} to lower triangular form.

6. Let A ∈ C^{2×2} be given. Find a unitary matrix U that reduces A to upper triangular form.
7. If A ∈ R^{n×n} is positive definite, show that A^{−1} must also be positive definite.

8. Suppose A ∈ R^{n×n} is positive definite. Is [ A I ; I A^{−1} ] ≥ 0?

9. Let R, S ∈ R^{n×n} be symmetric. Show that [ R I ; I S ] > 0 if and only if S > 0 and R > S^{−1}.
10. Find the inertia of the following matrices:

    (b) [ 2  1+j ; 1−j  2 ],    (d) [ −1  1+j ; 1−j  −1 ].
Chapter 11

Linear Differential and Difference Equations
11.1 Differential Equations

In this section we study solutions of the linear homogeneous system of differential equations

    ẋ(t) = A x(t);  x(t₀) = x₀ ∈ R^n    (11.1)
for t ≥ t₀. This is known as an initial-value problem. We restrict our attention in this chapter only to the so-called time-invariant case, where the matrix A ∈ R^{n×n} is constant and does not depend on t. The solution of (11.1) is then known always to exist and be unique. It can be described conveniently in terms of the matrix exponential.

Definition 11.1. For all A ∈ R^{n×n}, the matrix exponential e^A ∈ R^{n×n} is defined by the power series

    e^A = Σ_{k=0}^{+∞} (1/k!) A^k.    (11.2)

The series (11.2) can be shown to converge for all A (has radius of convergence equal to +∞). The solution of (11.1) involves the matrix

    e^{tA} = Σ_{k=0}^{+∞} (t^k/k!) A^k,    (11.3)

which thus also converges for all A and uniformly in t.
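The defining series (11.2) can be summed directly for matrices of modest norm. A sketch assuming NumPy; the truncation length and the test matrix (a rotation generator, for which e^{tA} is known in closed form) are illustrative choices.

```python
import numpy as np

def expm_series(A, terms=30):
    """Matrix exponential by the defining power series (11.2); fine for small ||A||."""
    E = np.eye(A.shape[0])
    T = np.eye(A.shape[0])
    for k in range(1, terms):
        T = T @ A / k          # T = A^k / k!
        E = E + T
    return E

A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # A^2 = -I, so e^{tA} is a plane rotation
t = 0.7
R = expm_series(t * A)
print(np.allclose(R, [[np.cos(t), np.sin(t)], [-np.sin(t), np.cos(t)]]))
```

Production codes use scaling-and-squaring rather than raw series summation, which loses accuracy for large ‖A‖.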
11.1.1 Properties of the matrix exponential

1. e⁰ = I.
Proof: This follows immediately from Definition 11.1 by setting A = 0.

2. For all A ∈ R^{n×n}, (e^A)^T = e^{A^T}.
Proof: This follows immediately from Definition 11.1 and linearity of the transpose.
3. For all A ∈ R^{n×n} and for all t, τ ∈ R, e^{(t+τ)A} = e^{tA} e^{τA} = e^{τA} e^{tA}.
Proof: Note that

    e^{(t+τ)A} = I + (t + τ)A + ((t + τ)²/2!) A² + ⋯

and

    e^{tA} e^{τA} = (I + tA + (t²/2!) A² + ⋯)(I + τA + (τ²/2!) A² + ⋯).

Compare like powers of A in the above two equations and use the binomial theorem on (t + τ)^k.
Compare powers of A in the above Compare like like powers of A in the above two two equations equations and and use use the the binomial binomial theorem theorem on(t+T)k. on (t + T)*. xn B 4. For all JRnxn and = all A, B Ee R" and for all all t Ee JR, R, et(A+B) et(A+B) =etAe =^e'Ae'tB = etBe e'Be'tAA if and and only if A and B commute, AB = BA. and B commute, i.e., i.e., AB =B A. Proof' Note that Proof: Note that 2
et(A+B)
= I
t + teA + B) + (A + B)2 + ...
2!
and and
while while tB tA
e e
=
(
1+ tB
t2 2 2 2 +... ) . + 2iB +... ) ( 1+ tA + t2!A
Compare like like powers of tt in in the first equation equation and the second second or or third third and the Compare powers of the first and the and use use the binomial theorem on on (A (A + B/ B)k and and the the commutativity commutativityof ofAAand andB.B. binomial theorem x 5. ForaH JRnxn" and For all A Ee R" and for for all all t eE JR, R, (etA)1 (e'A)~l = ee~'tAA.. Proof" Simply Proof: Simply take take TT = = t — t in in property property 3. 3.
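The commutativity condition in property 4 is easy to exhibit numerically: for a noncommuting pair the product rule fails, while for a commuting pair it holds. A sketch assuming NumPy; the matrices are illustrative choices (two nilpotent shifts with AB ≠ BA, and two diagonal matrices with CD = DC).

```python
import numpy as np

def expm_series(A, terms=40):
    """Matrix exponential via the series (11.2); adequate for these small matrices."""
    E, T = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        T = T @ A / k
        E = E + T
    return E

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 0.0]])     # AB != BA
C = np.array([[1.0, 0.0], [0.0, 2.0]])
D = np.array([[3.0, 0.0], [0.0, -1.0]])    # CD == DC

print(np.allclose(expm_series(A + B), expm_series(A) @ expm_series(B)))  # False
print(np.allclose(expm_series(C + D), expm_series(C) @ expm_series(D)))  # True
```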
6. Let £ denote the Laplace transform. Then for 6. Let denote the Laplace transform transform and and £1 £~! the the inverse inverse Laplace Laplace transform. Then for x E R" JRnxn" and for all tt € E lR, all A € R, tA } = (sI  A)I. (a) (a) .l{e C{etA } = (sIArl. 1 M (b) A)I} erA. (b) .lI{(sl£ 1 {(j/A)} == « .
Proof" prove only similarly. Proof: We We prove only (a). (a). Part Part (b) (b) follows follows similarly.
{+oo = io
et(sl)e
(+oo
=io
ef(Asl)
tA
dt
dt
since A and (sf) commute
111 111
11.1. Differential Differential Equations 11.1. Equations
= {+oo
10
=
t
e(AiS)t x;y;H dt assuming A is diagonalizable
;=1
~[fo+oo e(AiS)t dt]x;y;H 1
n
= '"'
  X i y;H
L..... s  A"I i=1
assuming Re s > Re Ai for i E !!
1 = A)I. = (sI (sl A).
The matrix matrix (s A) ~' I is is called called the the resolvent resolvent of A and and is is defined defined for for all all ss not not in A (A). The (s II — A) of A in A (A). Notice in in the the proof proof that that we we have have assumed, assumed, for convenience, that that A A is Notice for convenience, is diagonalizable. diagonalizable. If this is not scalar dyadic If this is not the the case, case, the the scalar dyadic decomposition decomposition can can be be replaced replaced by by m
    e^{t(A−sI)} = Σ_{i=1}^{m} Xᵢ e^{t(Jᵢ−sI)} Yᵢ^H

using the JCF. All succeeding steps in the proof then follow in a straightforward way.

7. For all A ∈ R^{n×n} and for all t ∈ R, (d/dt)(e^{tA}) = A e^{tA} = e^{tA} A.
Proof: Since the series (11.3) is uniformly convergent, it can be differentiated term-by-term, from which the result follows immediately. Alternatively, the formal definition
    (d/dt)(e^{tA}) = lim_{Δt→0} (e^{(t+Δt)A} − e^{tA}) / Δt

can be employed as follows. For any consistent matrix norm,
    ‖ (e^{(t+Δt)A} − e^{tA})/Δt − A e^{tA} ‖
        = ‖ (1/Δt)(e^{tA} e^{ΔtA} − e^{tA}) − A e^{tA} ‖
        = ‖ (1/Δt)(e^{ΔtA} − I) e^{tA} − A e^{tA} ‖
        = ‖ (1/Δt)(ΔtA + ((Δt)²/2!) A² + ⋯) e^{tA} − A e^{tA} ‖
        = ‖ ((Δt/2!) A² + ((Δt)²/3!) A³ + ⋯) e^{tA} ‖
        ≤ Δt ‖A²‖ ‖e^{tA}‖ (1/2! + (Δt/3!) ‖A‖ + ((Δt)²/4!) ‖A‖² + ⋯)
        ≤ Δt ‖A²‖ ‖e^{tA}‖ e^{Δt‖A‖}.
For fixed t, the right-hand side above clearly goes to 0 as Δt goes to 0. Thus, the limit exists and equals A e^{tA}. A similar proof yields the limit e^{tA} A, or one can use the fact that A commutes with any polynomial of A of finite degree and hence with e^{tA}.
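Property 7 can be observed numerically by comparing a finite-difference quotient of e^{tA} with A e^{tA}. A sketch assuming NumPy; the matrix, evaluation point, and step size are illustrative choices.

```python
import numpy as np

def expm_series(A, terms=40):
    """Matrix exponential via the series (11.2); adequate for this small matrix."""
    E, T = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        T = T @ A / k
        E = E + T
    return E

A = np.array([[0.2, 1.0], [-0.5, -0.3]])
t, dt = 0.9, 1e-6
lhs = (expm_series((t + dt) * A) - expm_series(t * A)) / dt   # forward difference
rhs = A @ expm_series(t * A)                                  # equals expm(tA) @ A as well
print(np.allclose(lhs, rhs, atol=1e-4))
```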
11.1.2 Homogeneous linear differential equations

Theorem 11.2. Let A ∈ R^{n×n}. The solution of the linear homogeneous initial-value problem

    ẋ(t) = A x(t);  x(t₀) = x₀ ∈ R^n    (11.4)

for t ≥ t₀ is given by

    x(t) = e^{(t−t₀)A} x₀.    (11.5)
Proof: Differentiate (11.5) and use property 7 of the matrix exponential to get ẋ(t) = A e^{(t−t₀)A} x₀ = A x(t). Also, x(t₀) = e^{(t₀−t₀)A} x₀ = x₀, so, by the fundamental existence and uniqueness theorem for ordinary differential equations, (11.5) is the solution of (11.4). □
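The solution formula (11.5) can be checked numerically: the trajectory t ↦ e^{(t−t₀)A} x₀ satisfies the differential equation and the initial condition. A sketch assuming NumPy; the system matrix and initial data are illustrative choices.

```python
import numpy as np

def expm_series(A, terms=40):
    """Matrix exponential via the series (11.2); adequate for this small matrix."""
    E, T = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        T = T @ A / k
        E = E + T
    return E

A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # a stable 2x2 system
x0 = np.array([1.0, 0.0])
t0 = 0.0

x = lambda t: expm_series((t - t0) * A) @ x0   # the solution (11.5)

t, dt = 1.3, 1e-6
xdot = (x(t + dt) - x(t)) / dt                 # finite-difference derivative
print(np.allclose(xdot, A @ x(t), atol=1e-4), np.allclose(x(t0), x0))
```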
11.1.3 Inhomogeneous linear differential equations

Theorem 11.3. Let A ∈ R^{n×n}, B ∈ R^{n×m} and let the vector-valued function u be given and, say, continuous. Then the solution of the linear inhomogeneous initial-value problem

    ẋ(t) = A x(t) + B u(t);  x(t₀) = x₀ ∈ R^n    (11.6)

for t ≥ t₀ is given by the variation of parameters formula

    x(t) = e^{(t−t₀)A} x₀ + ∫_{t₀}^{t} e^{(t−s)A} B u(s) ds.    (11.7)
l
q
(t)
pet)
f(x, t) dx =
l
q
af(x t) ' dx pet) at (t)
+
dq(t) dp(t) f(q(t), t )    f(p(t), t )  dt dt
Ir:
( s)A is used to Ae(ts)A Bu(s) ds + Bu(t) = Ax(t) + Bu(t). Also, to get get xx(t) (t) = = Ae(tto)A Ae{'to)AxXo0 + f'o Ae ' Bu(s) + Bu(t) = Ax(t) = (f fo)/1 x(to} =
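The variation of parameters formula (11.7) can be evaluated numerically and checked against the differential equation (11.6). A sketch assuming NumPy; the system matrices, the input u, and the midpoint-rule quadrature resolution are arbitrary illustrative choices.

```python
import numpy as np

def expm_series(A, terms=40):
    """Matrix exponential via the series (11.2); adequate for this small matrix."""
    E, T = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        T = T @ A / k
        E = E + T
    return E

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
x0 = np.array([1.0, -1.0])
u = lambda s: np.array([np.sin(s)])        # a given continuous input

def x(t, t0=0.0, steps=400):
    """Variation of parameters (11.7), with the integral done by the midpoint rule."""
    s = t0 + (np.arange(steps) + 0.5) * (t - t0) / steps
    integral = sum(expm_series((t - si) * A) @ B @ u(si) for si in s) * (t - t0) / steps
    return expm_series((t - t0) * A) @ x0 + integral

# Check xdot = A x + B u at one point by central differences.
t, dt = 1.0, 1e-4
xdot = (x(t + dt) - x(t - dt)) / (2 * dt)
print(np.allclose(xdot, A @ x(t) + B @ u(t), atol=1e-3))
```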