A Second Semester of Linear Algebra S. E. Payne 19 January ’09
Contents

Chapter 1  Preliminaries
  1.1  Fields
  1.2  Groups
  1.3  Matrix Algebra
  1.4  Linear Equations Solved with Matrix Algebra
  1.5  Exercises

Chapter 2  Vector Spaces
  2.1  Definition of Vector Space
       2.1.1  Prototypical Example
       2.1.2  A Second Example
  2.2  Basic Properties of Vector Spaces
  2.3  Subspaces
  2.4  Sums and Direct Sums of Subspaces
  2.5  Exercises

Chapter 3  Dimensional Vector Spaces
  3.1  The Span of a List
  3.2  Linear Independence and the Concept of Basis
  3.3  Exercises

Chapter 4  Linear Transformations
  4.1  Definitions and Examples
  4.2  Kernels and Images
  4.3  Rank and Nullity Applied to Matrices
  4.4  Projections
  4.5  Bases and Coordinate Matrices
  4.6  Matrices as Linear Transformations
  4.7  Change of Basis
  4.8  Exercises

Chapter 5  Polynomials
  5.1  Algebras
  5.2  The Algebra of Polynomials
  5.3  Lagrange Interpolation
  5.4  Polynomial Ideals
  5.5  Exercises

Chapter 6  Determinants
  6.1  Determinant Functions
       6.1.3  n-Linear Alternating Functions
       6.1.5  A Determinant Function - The Laplace Expansion
  6.2  Permutations & Uniqueness of Determinants
       6.2.1  A Formula for the Determinant
  6.3  Additional Properties of Determinants
       6.3.1  If A is a unit in Mn(K), then det(A) is a unit in K
       6.3.2  Triangular Matrices
       6.3.3  Transposes
       6.3.4  Elementary Row Operations
       6.3.5  Triangular Block Form
       6.3.6  The Classical Adjoint
       6.3.8  Characteristic Polynomial of a Linear Map
       6.3.10 The Cayley-Hamilton Theorem
       6.3.11 The Companion Matrix of a Polynomial
       6.3.13 The Cayley-Hamilton Theorem: A Second Proof
       6.3.14 Cramer's Rule
       6.3.15 Comparing ST with TS
  6.4  Deeper Results with Some Applications*
       6.4.1  Block Matrices whose Blocks Commute*
       6.4.2  Tensor Products of Matrices*
       6.4.3  The Cauchy-Binet Theorem - A Special Version*
       6.4.5  The Matrix-Tree Theorem*
       6.4.10 The Cauchy-Binet Theorem - A General Version*
       6.4.12 The General Laplace Expansion*
       6.4.13 Determinants, Ranks and Linear Equations*
  6.5  Exercises

Chapter 7  Operators and Invariant Subspaces
  7.1  Eigenvalues and Eigenvectors
  7.2  Upper-Triangular Matrices
  7.3  Invariant Subspaces of Real Vector Spaces
  7.4  Two Commuting Linear Operators
  7.5  Commuting Families of Operators*
  7.6  The Fundamental Theorem of Algebra*
  7.7  Exercises

Chapter 8  Inner Product Spaces
  8.1  Inner Products
  8.2  Orthonormal Bases
  8.3  Orthogonal Projection and Minimization
  8.4  Linear Functionals and Adjoints
  8.5  Exercises

Chapter 9  Operators on Inner Product Spaces
  9.1  Self-Adjoint Operators
  9.2  Normal Operators
  9.3  Decomposition of Real Normal Operators
  9.4  Positive Operators
  9.5  Isometries
  9.6  The Polar Decomposition
  9.7  The Singular-Value Decomposition
       9.7.3  Two Examples
  9.8  Pseudoinverses and Least Squares*
  9.9  Norms, Distance and More on Least Squares*
  9.10 The Rayleigh Principle*
  9.11 Exercises

Chapter 10  Decomposition WRT a Linear Operator
  10.1 Powers of Operators
  10.2 The Algebraic Multiplicity of an Eigenvalue
  10.3 Elementary Operations
  10.4 Transforming Nilpotent Matrices
  10.5 An Alternative Approach to the Jordan Form
  10.6 A "Jordan Form" for Real Matrices
  10.7 Exercises

Chapter 11  Matrix Functions*
  11.1 Operator Norms and Matrix Norms*
  11.2 Polynomials in an Elementary Jordan Matrix*
  11.3 Scalar Functions of a Matrix*
  11.4 Scalar Functions as Polynomials*
  11.5 Power Series*
  11.6 Commuting Matrices*
  11.7 A Matrix Differential Equation*
  11.8 Exercises

Chapter 12  Infinite Dimensional Vector Spaces*
  12.1 Partially Ordered Sets & Zorn's Lemma*
  12.2 Bases for Vector Spaces*
  12.3 A Theorem of Philip Hall*
  12.4 A Theorem of Marshall Hall, Jr.*
  12.5 Exercises*
Chapter 1  Preliminaries

Preface to the Student

This book is intended to be used as a text for a second semester of linear algebra either at the senior or first-year-graduate level. It is written for you under the assumption that you already have successfully completed a first course in linear algebra and a first course in abstract algebra. The first short chapter is a very quick review of the basic material with which you are supposed to be familiar. If this material looks new, this text is probably not written for you. On the other hand, if you made it into graduate school, you must have already acquired some background in modern algebra. Perhaps all you need is to spend a little time with your undergraduate texts reviewing the most basic facts about equivalence relations, groups, matrix computations, row reduction techniques, and the basic concepts of linear independence, span, and basis in the context of Rⁿ.

On the other hand, some of you will be ready for a more advanced approach to the material covered here than we can justify making a part of the course. For this reason I have included some starred sections that may be skipped without disturbing the general flow of ideas, but that might be of interest to some students. If material from a starred section is ever cited in a later section, that section will necessarily also be starred.

For the material we do cover in detail, we hope that you will find our presentation to be thorough and our proofs to be complete and clear. However, when we indicate that you should verify something for yourself, that means you should use paper and pen and write out the appropriate steps in detail.

One major difference between your undergraduate linear algebra text and
this one is that we discuss abstract vector spaces over arbitrary fields instead of restricting our discussion to the standard space of n-tuples of real numbers. A second difference is that the emphasis is on linear transformations from one vector space over the field F to a second one, rather than on matrices over F . However, a great deal of the general theory can be effortlessly translated into results about matrices.
1.1  Fields
The fields of most concern in this text are the complex numbers C, the real numbers R, the rational numbers Q, and the finite Galois fields GF(q), where q is a prime power. In fact, it is possible to work through the entire book and use only the fields R and C. And if finite fields are being considered, most of the time it is possible to use just those finite fields with a prime number of elements, i.e., the fields Zp ≅ Z/pZ, where p is a prime integer, with the algebraic operations just being addition and multiplication modulo p. For the purpose of reading this book it is sufficient to be able to work with the fields just mentioned. However, we urge you to pick up your undergraduate abstract algebra text and review what a field is. For most purposes the symbol F denotes an arbitrary field except where inner products and/or norms are involved. In those cases F denotes either R or C.

Kronecker delta  When the underlying field F is understood, the symbol δij (called the Kronecker delta) denotes the element 1 ∈ F if i = j and the element 0 ∈ F if i ≠ j. Occasionally it means 0 or 1 in some other structure, but the context should make that clear.

If you are not really familiar with the field of complex numbers, you should spend a little time getting acquainted. A complex number α = a + bi, where a, b ∈ R and i² = −1, has a conjugate ᾱ = a − bi, and α · ᾱ = a² + b² = |α|². It is easily checked that the conjugate of α · β equals ᾱ · β̄. Note that |α| = √(a² + b²) = 0 if and only if α = 0. You should show how to compute the multiplicative inverse of any nonzero complex number.

We define the square-root symbol as usual: for 0 ≤ x ∈ R, put √x = |y| where y is a real number such that y² = x. The following two lemmas are often useful.

Lemma 1.1.1. Every complex number has a square root.
Proof. Let α = a + bi be any complex number. With the definition of √ given above, and with γ = √(a² + b²) ≥ |a|, we find that

  ( √((γ + a)/2) ± i √((γ − a)/2) )² = a ± |b|i.

Now just pick the sign ± so that ±|b| = b.

Lemma 1.1.2. Every polynomial of odd degree with real coefficients has a (real) zero.

Proof. It suffices to prove that a monic polynomial P(x) = xⁿ + a1xⁿ⁻¹ + · · · + an with some ai ≠ 0, with a1, . . . , an ∈ R and n odd has a zero. Put a = |a1| + |a2| + · · · + |an| + 1 > 1, and ε = ±1. Then

  |a1(εa)ⁿ⁻¹ + · · · + an−1(εa) + an| ≤ |a1|aⁿ⁻¹ + · · · + |an−1|a + |an| ≤ (|a1| + |a2| + · · · + |an|)aⁿ⁻¹ = (a − 1)aⁿ⁻¹ < aⁿ.

It readily follows that P(a) > 0 and P(−a) < 0. (Check this out for yourself!) Hence by the Intermediate Value Theorem (from Calculus) there is a λ ∈ (−a, a) such that P(λ) = 0.
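If you like to check formulas numerically, here is a small Python sketch of the square-root formula from the proof of Lemma 1.1.1. It is an illustration only (the sample value of α is ours, not the book's); squaring the computed root should reproduce the original number.

    import math

    def complex_sqrt(a, b):
        """Return a square root of alpha = a + bi via the formula in Lemma 1.1.1."""
        gamma = math.sqrt(a * a + b * b)      # gamma = |alpha| >= |a|
        x = math.sqrt((gamma + a) / 2)
        y = math.sqrt((gamma - a) / 2)
        if b < 0:                             # pick the sign so that +-|b| = b
            y = -y
        return complex(x, y)

    alpha = complex(3, -4)
    root = complex_sqrt(alpha.real, alpha.imag)
    print(root, root * root)                  # (2-1j) and (3-4j)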
1.2  Groups
Let G be an arbitrary set and let ◦ : G × G → G be a binary operation on G. Usually we denote the image of (g1 , g2 ) ∈ G × G under the map ◦ by g1 ◦ g2 . Then you should know what it means for (G, ◦) to be a group. If this is the case, G is an abelian group provided g1 ◦ g2 = g2 ◦ g1 for all g1 , g2 ∈ G. Our primary example of a group will be a vector space whose elements (called vectors) form an abelian group under vector addition. It is also helpful if you remember how to construct the quotient group G/N , where N is a normal subgroup of G. However, this latter concept will be introduced in detail in the special case where it is needed.
1.3  Matrix Algebra
An m × n matrix A over F is an m by n array of elements from the field F. We may think of A as an ordered list of m row vectors from Fⁿ or equally well as an ordered list of n column vectors from Fᵐ. The element in row i and column j is usually denoted Aij. The symbol Mm,n(F) denotes the set of all m × n matrices over F. It is readily made into a vector space over F. For A, B ∈ Mm,n(F) define A + B by (A + B)ij = Aij + Bij. Similarly, define scalar multiplication by (aA)ij = aAij. Just to practice rehearsing the axioms for a vector space, you should show that Mm,n(F) really is a vector space over F. (See Section 2.1.)

If A and B are m × n and n × p over F, respectively, then the product AB is an m × p matrix over F defined by

  (AB)ij = Σ_{k=1}^n Aik Bkj.
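As a quick sanity check of the definition, the following Python sketch (using numpy, with matrices of our own choosing; it is not part of the original text) builds the product entry by entry from the defining sum and compares it with numpy's built-in matrix product.

    import numpy as np

    A = np.array([[1, 2, 0],
                  [3, -1, 4]])        # 2 x 3
    B = np.array([[2, 1],
                  [0, 1],
                  [5, -2]])           # 3 x 2

    # (AB)_ij = sum over k of A_ik * B_kj, taken straight from the definition
    C = np.array([[sum(A[i, k] * B[k, j] for k in range(A.shape[1]))
                   for j in range(B.shape[1])]
                  for i in range(A.shape[0])])

    assert np.array_equal(C, A @ B)   # agrees with numpy's matrix product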
Lemma 1.3.1. Matrix multiplication, when defined, is associative.

Proof. (Sketch) If A, B, C are m × n, n × p and p × q matrices over F, respectively, then

  ((AB)C)ij = Σ_{l=1}^p (AB)il Clj = Σ_{l=1}^p ( Σ_{k=1}^n Aik Bkl ) Clj = Σ_{1≤k≤n; 1≤l≤p} Aik Bkl Clj = (A(BC))ij.
The following observations are not especially deep, but they come in so handy that you should think about them until they are second nature to you. Obs. 1.3.2. The ith row of AB is the ith row of A times the matrix B.
Obs. 1.3.3. The jth column of AB is the matrix A times the jth column of B.

If A is m × n, then the n × m matrix whose (i, j) entry is the (j, i) entry of A is called the transpose of A and is denoted Aᵀ.

Obs. 1.3.4. If A is m × n and B is n × p, then (AB)ᵀ = BᵀAᵀ.

Obs. 1.3.5. If A is m × n with columns C1, . . . , Cn and X = (x1, . . . , xn)ᵀ is n × 1, then AX is the linear combination of the columns of A given by Σ_{j=1}^n xj Cj.

Obs. 1.3.6. If A is m × n with rows α1, . . . , αm and X = (x1, . . . , xm), then XA is the linear combination of the rows of A given by Σ_{i=1}^m xi αi.

At this point you should review block multiplication of partitioned matrices. The following special case is sometimes helpful.

Obs. 1.3.7. If A is m × n and B is n × p, view A as partitioned into n column vectors in Fᵐ and view B as partitioned into n row vectors in Fᵖ. Then using block multiplication we can view AB as

  AB = Σ_{j=1}^n colj(A) · rowj(B).
Note that colj (A) · rowj (B) is an m × p matrix with rank at most 1.
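Here is a brief numpy check of Obs. 1.3.7 (an illustrative sketch with randomly chosen integer matrices, not part of the text): the product AB is rebuilt as a sum of rank-at-most-one outer products colj(A) · rowj(B).

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-3, 4, size=(4, 3))
    B = rng.integers(-3, 4, size=(3, 5))

    # Sum of the n rank-(at most)-1 matrices col_j(A) row_j(B)
    outer_sum = sum(np.outer(A[:, j], B[j, :]) for j in range(A.shape[1]))

    assert np.array_equal(outer_sum, A @ B)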
1.4  Linear Equations Solved with Matrix Algebra
For each i, 1 ≤ i ≤ m, let

  Σ_{j=1}^n aij xj = bi

be a linear equation in the indeterminates x1, . . . , xn with the coefficients aij and bi all being real numbers. One of the first things you learned to do in your undergraduate linear algebra course was to replace this system of linear equations with a single matrix equation of the form
A~x = ~b, where A = (aij) is m × n with m rows and n columns, and ~x = (x1, . . . , xn)ᵀ, ~b = (b1, . . . , bm)ᵀ. You then augmented the matrix A with the column ~b to obtain

        [ a11  · · ·  a1n  b1 ]
   A′ = [ a21  · · ·  a2n  b2 ]
        [  .          .    .  ]
        [ am1  · · ·  amn  bm ]

At this point you performed elementary row operations on the matrix A′ so as to replace the submatrix A with a row-reduced echelon matrix R and at the same time replace ~b with a probably different column vector ~b′. You then used this matrix (R ~b′) to read off all sorts of information about the original matrix A and the system of linear equations.

The matrix R has r nonzero rows for some r, 0 ≤ r ≤ m. The leftmost nonzero entry in each row of R is a 1, and it is the only nonzero entry in its column. Then the matrix (R ~b′) is used to solve the original system of equations by writing each of the "leading variables" as a linear combination of the other (free) variables. The r nonzero rows of R form a basis for the row space of A, the vector subspace of Rⁿ spanned by the rows of A. The columns of A in the same positions as the leading 1's of R form a basis for the column space of A, i.e., the subspace of Rᵐ spanned by the columns of A. There are n − r free variables. Let each of these variables take a turn being equal to 1 while the other free variables are all equal to 0, and solve for the other leading variables using the matrix equation R~x = ~0. This gives a basis of the (right) nullspace of A, i.e., a basis of the space of solutions to the system of homogeneous equations obtained by replacing ~b with ~0.

In particular we note that r is the dimension of both the row space of A and of the column space of A. We call r the rank of A. The dimension of the right null space of A is n − r. Replacing A with its transpose Aᵀ, so that the left null space of A becomes the right null space of Aᵀ, shows that the left null space of A has dimension m − r.

A good reference for the basic material is the following text: David C. Lay, Linear Algebra and Its Applications, 3rd Edition, Addison Wesley, 2003. An excellent reference at a somewhat higher level is R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge Univ. Press, 1985.
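The whole row-reduction story can be replayed by machine. The sketch below (using sympy and a small matrix of our own choosing; it is not part of the original text) computes R, the rank, a basis for the column space, and a basis for the right null space.

    from sympy import Matrix

    A = Matrix([[1, 2, 1],
                [2, 4, 0],
                [3, 6, 1]])

    R, pivot_cols = A.rref()      # row-reduced echelon form and pivot columns
    r = A.rank()                  # r = number of nonzero rows of R

    col_basis  = [A[:, j] for j in pivot_cols]   # columns of A in the pivot positions
    null_basis = A.nullspace()                   # basis of the right null space

    print(r, len(null_basis))     # rank + nullity = number of columns (here 2 + 1 = 3)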
1.5  Exercises
1. Consider the matrix

       A = [  1  1 −1 ]
           [ −1  1 −3 ]
           [  2  1  0 ]
           [  1  2 −3 ]
   (a) Find the rank of A.

   (b) Compute a basis for the column space of A, the row space of A, the null space of A, and the null space of Aᵀ.

   (c) Give the orthogonality relations among the four subspaces of part (b).

   (d) Find all solutions x = (x1, x2, x3)ᵀ to Ax = (1, −3, 3, 0)ᵀ; and to Ax = (3, −1, 4, 4)ᵀ.

2. Block Multiplication of matrices. Let A be an m × n matrix over F whose elements have been partitioned into blocks. Also, let B be an n × p matrix over F whose elements are partitioned into blocks so that the number of blocks in each row of A is the same as the number of blocks in each column of B. Moreover, suppose that the block Aij in the i-th row and j-th column of blocks of A and the block Bjk in the j-th row and k-th column of blocks of B are compatible for multiplication, i.e., Aij · Bjk should be well-defined for all appropriate indices i, j, k. Then the matrix product AB (which is well-defined) may be computed by block multiplication. For example, if A is partitioned into an s × r array of blocks (Aik) and B into an r × t array of blocks (Bkj), then the (i, j)-th block of AB is given by the natural formula

      (AB)ij = Σ_{k=1}^r Aik Bkj.
3. Suppose the matrix A (which might actually be huge!) is partitioned into four blocks in such a way that it is block upper triangular and the two diagonal blocks are square and are invertible:

       A = [ A11  A12 ]
           [  0   A22 ]

   Knowing that A11 and A22 are both invertible, compute A⁻¹. Then illustrate your solution by using block computations to compute the inverse of the following matrix.

       A = [ 1  1  1 −1  2 ]
           [ 3  2  0  1 −1 ]
           [ 0  0  1  0  1 ]
           [ 0  0  1  1  0 ]
           [ 0  0  0  1  0 ]
Chapter 2  Vector Spaces

Linear algebra is primarily the study of linear maps on finite-dimensional vector spaces. In this chapter we define the concept of vector space and discuss its elementary properties.
2.1  Definition of Vector Space

A vector space over the field F is a set V together with a binary operation "+" on V such that (V, +) is an abelian group, along with a scalar multiplication on V (i.e., a map F × V → V) such that the following properties hold:

1. For each a ∈ F and each v ∈ V, av is a unique element of V, with 1v = v for all v ∈ V. Here 1 denotes the multiplicative identity of F.

2. Scalar multiplication distributes over vector addition: a(u + v) = (au) + (av), which is usually written as au + av, for all a ∈ F and all u, v ∈ V.

3. (a + b)u = au + bu, for all a, b ∈ F and all u ∈ V.

4. a(bv) = (ab)v, for all a, b ∈ F and all v ∈ V.
2.1.1  Prototypical Example
Let S be any nonempty set and let F be any field. Put V = F^S = {f : S → F : f is a function}. Then we can make V into a vector space over F as follows. For f, g ∈ V, define the vector sum f + g : S → F by

  (f + g)(s) = f(s) + g(s) for all s ∈ S.

Then define a scalar multiplication F × V → V as follows:

  (af)(s) = a(f(s)) for all a ∈ F, f ∈ V, s ∈ S.
It is a very easy but worthwhile exercise to show that with this vector addition and this scalar multiplication, V is a vector space over F . It is also interesting to see that this family of examples includes (in some abstract sense) all the examples of vector spaces you might have studied in your first linear algebra course. For example, let F = R and let S = {1, 2, . . . , n}. Then each f : {1, 2, . . . , n} → R is given by the n-tuple (f (1), f (2), . . . , f (n)). So with almost no change in your point of view you can see that this example is essentially just Rn as you knew it before.
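For readers who like to compute, here is a tiny Python sketch of the prototypical example with S = {1, 2, 3} and F = R (our own illustration, not part of the text): vectors are represented as functions on S, and the pointwise operations above become the familiar componentwise operations on triples.

    S = [1, 2, 3]

    def add(f, g):
        """Pointwise sum of two 'vectors' f, g : S -> R."""
        return lambda s: f(s) + g(s)

    def scale(a, f):
        """Scalar multiple of f : S -> R by a in R."""
        return lambda s: a * f(s)

    f = lambda s: s          # corresponds to the triple (1, 2, 3)
    g = lambda s: s * s      # corresponds to the triple (1, 4, 9)

    h = add(scale(2, f), g)  # 2f + g
    print([h(s) for s in S]) # [3, 8, 15], i.e. 2*(1,2,3) + (1,4,9)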
2.1.2  A Second Example
Let F be any field and let x be an indeterminate over F. Then the ring F[x] of all polynomials in x with coefficients in F is a vector space over F. In Chapter 5 we will see how to view F[x] as a subspace of the special case of the preceding example where S = {0, 1, 2, . . .}. However, in this case, any two elements of F[x] can be multiplied to give another element of F[x], and F[x] has the structure of a commutative ring with 1. It is even an integral domain! In fact, it is a linear algebra. See the beginning of Chapter 5 for the definition of a linear algebra, a term that really ought to be defined somewhere in a Linear Algebra course. (Look up any of these words that you are unsure about in your abstract algebra text.) (Note: Our convention is that the zero polynomial has degree equal to −∞.) If we fix the nonnegative integer n, then Pn = {f(x) ∈ F[x] : degree(f(x)) ≤ n} is a vector space with the usual addition of polynomials and scalar multiplication.
2.2  Basic Properties of Vector Spaces
We are careful to distinguish between the zero 0 of the field F and the zero vector ~0 in the vector space V .
Theorem 2.2.1. (Properties of zero) For a ∈ F and v ∈ V we have the following: av = ~0 if and only if a = 0 ∈ F or v = ~0 ∈ V.

Proof. First suppose that a = 0. Then 0v = (0 + 0)v = 0v + 0v. Now adding the additive inverse of 0v to both sides of this equation yields ~0 = 0v, for all v ∈ V. Similarly, if v = ~0, we have a~0 = a(~0 + ~0) = a~0 + a~0. Now add the additive inverse of a~0 to both sides to obtain ~0 = a~0. What remains to be proved is that if av = ~0, then either a = 0 or v = ~0. So suppose av = ~0. If a = 0 we are done. So suppose a ≠ 0. In this case a has a multiplicative inverse a⁻¹ ∈ F. So v = (a⁻¹a)v = a⁻¹(av) = a⁻¹~0 = ~0 by the preceding paragraph. This completes the proof.

Let −v denote the additive inverse of v in V, and let −1 denote the additive inverse of 1 ∈ F.

Lemma 2.2.2. For all vectors v ∈ V, (−1)v = −v.

Proof. The idea is to show that if (−1)v is added to v, then the result is ~0, the additive identity of V. So, (−1)v + v = (−1)v + 1v = (−1 + 1)v = 0v = ~0 by the preceding result.
2.3  Subspaces
Definition  A subset U of V is called a subspace of V provided that with the vector addition and scalar multiplication of V restricted to U, the set U becomes a vector space in its own right.

Theorem 2.3.1. The subset U of V is a subspace of V if and only if the following properties hold:
(i) U ≠ ∅.
(ii) If u, v ∈ U, then u + v ∈ U.
(iii) If a ∈ F and u ∈ U, then au ∈ U.

Proof. Clearly the three properties all hold if U is a subspace. So now suppose the three properties hold. Then U is not empty, so it has some vector u. Then 0u = ~0 ∈ U. For any v ∈ U, (−1)v = −v ∈ U. By these properties and
property (ii) of the Theorem, (U, +) is a subgroup of (V, +). (Recall this from your abstract algebra course.) It is now easy to see that U , with the addition and scalar multiplication inherited from V , must be a vector space, since all the other properties hold for U automatically because they hold for V.
2.4  Sums and Direct Sums of Subspaces
Definition  Let U1, . . . , Um be subspaces of V. The sum U1 + U2 + · · · + Um is defined to be

  Σ_{i=1}^m Ui := {u1 + u2 + · · · + um : ui ∈ Ui for 1 ≤ i ≤ m}.
You are asked in the exercises to show that the sum of subspaces is a subspace.

Definition  The indexed family, or ordered list, (U1, . . . , Um) of subspaces is said to be independent provided that if ~0 = u1 + u2 + · · · + um with ui ∈ Ui, 1 ≤ i ≤ m, then ui = ~0 for all i = 1, 2, . . . , m.

Theorem 2.4.1. Let Ui be a subspace of V for 1 ≤ i ≤ m. Each element v ∈ Σ_{i=1}^m Ui has a unique expression of the form v = Σ_{i=1}^m ui with ui ∈ Ui for all i if and only if the list (U1, . . . , Um) is independent.

Proof. Suppose the list (U1, . . . , Um) is independent. Then by definition ~0 has a unique representation of the desired form (as a sum of zero vectors). Suppose that some v = Σ_{i=1}^m ui = Σ_{i=1}^m vi has two such representations with ui, vi ∈ Ui, 1 ≤ i ≤ m. Then ~0 = v − v = Σ_{i=1}^m (ui − vi) with ui − vi ∈ Ui since Ui is a subspace. So ui = vi for all i. Conversely, if each element of the sum has a unique representation as an element of the sum, then certainly ~0 does also, implying that the list of subspaces is independent.

Definition  If each element of Σ_{i=1}^m Ui has a unique representation as an element in the sum, then we say that the sum is the direct sum of the subspaces, and write

  Σ_{i=1}^m Ui = U1 ⊕ U2 ⊕ · · · ⊕ Um = ⊕Σ_{i=1}^m Ui.
Theorem 2.4.2. Let U1, . . . , Um be subspaces of V for which V = Σ_{i=1}^m Ui. Then the following are equivalent:
(i) V = ⊕Σ_{i=1}^m Ui.
(ii) Uj ∩ Σ_{1≤i≤m; i≠j} Ui = {~0} for each j, 1 ≤ j ≤ m.

Proof. Suppose V = ⊕Σ_{i=1}^m Ui. If u ∈ Uj ∩ Σ_{1≤i≤m; i≠j} Ui, say u = −uj ∈ Uj and u = Σ_{1≤i≤m; i≠j} ui, then Σ_{i=1}^m ui = ~0, forcing all the ui's to equal ~0. This shows that (i) implies (ii). It is similarly easy to see that (ii) implies that ~0 has a unique representation in Σ_{i=1}^m Ui.

Note: When m = 2 this says that U1 + U2 = U1 ⊕ U2 if and only if U1 ∩ U2 = {~0}.
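As a concrete illustration of the m = 2 case (a numpy sketch with subspaces of our own choosing, not an example from the text): two subspaces of R³ given by spanning columns intersect trivially exactly when stacking their bases side by side gives a matrix whose rank is dim U1 + dim U2.

    import numpy as np

    # Bases (as columns) for two subspaces of R^3
    U1 = np.array([[1, 0],
                   [0, 1],
                   [0, 0]])          # the xy-plane
    U2 = np.array([[1],
                   [1],
                   [1]])             # the line through (1, 1, 1)

    stacked = np.hstack([U1, U2])
    # U1 + U2 is direct  <=>  U1 ∩ U2 = {0}  <=>  the combined columns stay independent
    is_direct = np.linalg.matrix_rank(stacked) == U1.shape[1] + U2.shape[1]
    print(is_direct)                 # True, and here in fact R^3 = U1 + U2, a direct sum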
2.5  Exercises
1. Determine all possible subspaces of F² = {f : {1, 2} → F : f is a function}.

2. Prove that the intersection of any collection of subspaces of V is again a subspace of V. Here V is any vector space over any field F.

3. Define the sum of a countably infinite number of subspaces of V and discuss what it should mean for such a collection of subspaces to be independent.

4. Prove that the set-theoretic union of two subspaces of V is a subspace if and only if one of the subspaces is contained in the other.

5. Prove or disprove: If U1, U2, W are subspaces of V for which U1 + W = U2 + W, then U1 = U2.

6. Prove or disprove: If U1, U2, W are subspaces of V for which U1 ⊕ W = U2 ⊕ W, then U1 = U2.

7. Let U1, U2, U3 be three subspaces of a vector space V over the field F for which U1 ∩ U2 = U1 ∩ U3 = U2 ∩ U3 = {0}. Prove that U1 + U2 + U3 = U1 ⊕ U2 ⊕ U3, or give a counterexample.
Chapter 3  Dimensional Vector Spaces

Our main concern in this course will be finite dimensional vector spaces, a concept that will be introduced in this chapter. However, in a starred section of Chapter 12, with the help of the appropriate axioms from set theory, we show that every vector space has a well-defined dimension. The key concepts here are: span, linear independence, basis, dimension. We assume throughout this chapter that V is a vector space over the field F.
3.1  The Span of a List
For a positive integer N, let {1, 2, . . . , N} denote the ordered set of the first N natural numbers, and let {1, 2, . . .} denote the ordered set of all natural numbers.

Definition: A list of elements of V is a function from some {1, 2, . . . , N} to V or from {1, 2, . . .} to V. Usually such a list is indicated by (v1, v2, . . . , vm) for some positive integer m, or perhaps by (v1, v2, . . .) if the list is finite of unknown length or countably infinite. An important aspect of a list of vectors of V is that it is ordered. A second important difference between lists and sets is that elements of lists may be repeated, but in a set repetitions are not allowed. For most of the work in this course the lists we consider will be finite, but it is important to keep an open mind about the infinite case.

Definition: Let L = (v1, v2, . . .) be a list of elements of V. We say that v ∈ V is in the span of L provided there are finitely many scalars a1, a2, . . . , am ∈ F such that v = Σ_{i=1}^m ai vi. Then the set of all vectors in the span of L is said to be the span of L and is denoted span(v1, v2, . . .). By
convention we say that the span of the empty list ( ) is the zero space {~0}. The proof of the following lemma is a routine exercise.

Lemma 3.1.1. The span of any list of vectors of V is a subspace of V.

If span(v1, . . . , vm) = V, we say (v1, . . . , vm) spans V. A vector space is said to be finite dimensional if some finite list spans V. For example, let Fⁿ denote the vector space of all column vectors with n entries from the field F, and let ~ei be the column vector (0, . . . , 0, 1, 0, . . . , 0)ᵀ with n − 1 entries equal to 0 ∈ F and a 1 ∈ F in position i. Then Fⁿ is finite dimensional because the list (e1, e2, . . . , en) spans Fⁿ.

Let f(x) ∈ F[x] have the form f(x) = a0 + a1x + · · · + anxⁿ with an ≠ 0. We say that f(x) has degree n. The zero polynomial has degree −∞. We let Pn(F) denote the set of all polynomials with degree at most n. Then Pn(F) is a subspace of F[x] with spanning list L = (1, x, x², . . . , xⁿ).
3.2  Linear Independence and the Concept of Basis
One of the most fundamental concepts of linear algebra is that of linear independence.

Definition: A finite list (v1, . . . , vm) of vectors in V is said to be linearly independent provided the only choice of scalars a1, . . . , am ∈ F for which Σ_{i=1}^m ai vi = ~0 is a1 = a2 = · · · = am = 0. An infinite list (v1, v2, . . .) of vectors in V is said to be linearly independent provided that for each positive integer m, the list (v1, . . . , vm) consisting of the first m vectors of the infinite list is linearly independent. A finite set S of vectors of V is said to be linearly independent provided each list of distinct vectors of S (no repetition allowed) is linearly independent. An arbitrary set S of vectors of V is said to be linearly independent provided every finite subset of S is linearly independent. Any subset of V or list of vectors in V is said to be linearly dependent provided it is not linearly independent. It follows immediately that a list L = (v1, v2, . . .) is linearly dependent provided there is some integer m ≥ 1 for which there are m scalars a1, . . . , am ∈ F such that Σ_{i=1}^m ai vi = ~0 and not all the ai's are equal to zero.

The following Linear Dependence Lemma will turn out to be extremely useful.
Lemma 3.2.1. Let L = (v1, v2, . . .) be a nonempty list of nonzero vectors in V. (Of course we know that any list that includes the zero vector is automatically linearly dependent.) Then the following are equivalent:
(i) L is linearly dependent.
(ii) There is some integer j ≥ 2 such that vj ∈ span(v1, . . . , vj−1). (Usually we will want to choose the smallest index j for which this holds.)
(iii) There is some integer j ≥ 2 such that if the jth term vj is removed from the list L, the span of the remaining list equals the span of L.

Proof. Suppose that the list L = (v1, v2, . . .) is linearly dependent and v1 ≠ ~0. For some m there are scalars a1, . . . , am for which Σ_{i=1}^m ai vi = ~0 with at least one of the ai not equal to 0. Since v1 ≠ ~0, not all of the scalars a2, . . . , am can equal 0. So let j ≥ 2 be the largest index for which aj ≠ 0. Then

  vj = (−a1 aj⁻¹)v1 + (−a2 aj⁻¹)v2 + · · · + (−aj−1 aj⁻¹)vj−1,
proving (ii). To see that (ii) implies (iii), just note that in any linear combination of vectors from L, if vj appears, it can be replaced by its expression as a linear combination of the vectors v1 , . . . , vj−1 . It should be immediately obvious that if (iii) holds, then the list L is linearly dependent. The following theorem is of major importance in the theory. It says that (finite) linearly independent lists are never longer than (finite) spanning lists. Theorem 3.2.2. Suppose that V is spanned by the finite list L = (w1 , . . . , wn ) and that M = (v1 , . . . , vm ) is a linearly independent list. Then m ≤ n. Proof. We shall prove that m ≤ n by using an algorithm that is interesting in its own right. It amounts to starting with the list L and removing one w and adding one v at each step so as to maintain a spanning list. Since the list L = (w1 , . . . , wn ) spans V , adjoining any vector to the list produces a linearly dependent list. In particular, (v1 , w1 , . . . , wn ) is linearly dependent with its first element different from ~0. So by the Linear Dependence Lemma we may remove one of the w’s so that the remaining list
of length n is still a spanning list. Suppose this process has been carried out until a spanning list of the form

  B = (v1, . . . , vj, w1′, . . . , w′n−j)

has been obtained. If we now adjoin vj+1 to the list B by inserting it immediately after vj, the resulting list will be linearly dependent. By the Linear Dependence Lemma, one of the vectors in this list must be a linear combination of the vectors preceding it in the list. Since the list (v1, . . . , vj+1) must be linearly independent, this vector must be one of the w′'s and not one of the v's. So we can remove this w′ and obtain a new list of length n which still spans V and has v1, . . . , vj+1 as its initial members. If at some step we had added a v and had no more w's to remove, we would have a contradiction, since the entire list of v's must be linearly independent even though when we added a v the list was supposed to become dependent. So we may continue the process until all the vectors v1, . . . , vm have been added to the list, i.e., m ≤ n.

Definition  A vector space V over the field F is said to be finite dimensional provided there is a finite list that spans V.

Theorem 3.2.3. If U is a subspace of the finite-dimensional space V, then U is finite dimensional.

Proof. Suppose that V is spanned by a list of length m. If U = {~0}, then certainly U is spanned by a finite list and is finite dimensional. So suppose that U contains a nonzero vector v1. If the list (v1) does not span U, let v2 be a vector in U that is not in span(v1). By the Linear Dependence Lemma, the list (v1, v2) must be linearly independent. Continue this process. At each step, if the linearly independent list obtained does not span the space U, we can add one more vector of U to the list keeping it linearly independent. By the preceding theorem, this process has to stop before a linearly independent list of length m + 1 has been obtained, i.e., U is spanned by a list of length at most m, so is finite dimensional.

Note: In the preceding proof, the spanning list obtained for U was also linearly independent. Moreover, we could have taken V as the subspace U on which to carry out the algorithm to obtain a linearly independent spanning list. This is a very important type of list.

Definition  A basis for a vector space V over F is a list L of vectors of V that is a spanning list as well as a linearly independent list. We have just observed the following fact:
Lemma 3.2.4. Each finite dimensional vector space V has a basis. By convention the empty set ∅ is a basis for the zero space {~0}.

Lemma 3.2.5. If V is a finite dimensional vector space, then there is a unique integer n ≥ 0 such that each basis of V has length exactly n.

Proof. If B1 = (v1, . . . , vn) and B2 = (u1, . . . , um) are two bases of V, then since B1 spans V and B2 is linearly independent, m ≤ n. Since B1 is linearly independent and B2 spans V, n ≤ m. Hence m = n.

Definition  If the finite dimensional vector space V has a basis with length n, we say that V is n-dimensional or that n is the dimension of V. Moreover, each finite dimensional vector space has a well-defined dimension. For this course it is completely satisfactory to consider bases only for finite dimensional spaces. However, students occasionally ask about the infinite dimensional case and we think it is pleasant to have a convenient treatment available. Hence we have included a treatment of the infinite dimensional case in Chapter 12 that treats a number of special topics. For our general purposes, however, we are content merely to say that any vector space which is not finite dimensional is infinite dimensional (without trying to associate any specific infinite cardinal number with the dimension).

Lemma 3.2.6. A list B = (v1, . . . , vn) of vectors in V is a basis of V if and only if every v ∈ V can be written uniquely in the form

  v = a1 v1 + · · · + an vn.     (3.1)
Proof. First assume that B is a basis. Since it is a spanning set, each v ∈ V can be written in the form given in Eq. 3.1. Since B is linearly independent, such an expression is easily seen to be unique. Conversely, suppose each v ∈ V has a unique expression in the form of Eq. 3.1. The existence of the expression for each v ∈ V implies that B is a spanning set. The uniqueness then implies that B is linearly independent. Theorem 3.2.7. If L = (v1 , . . . , vn ) is a spanning list of V , then a basis of V can be obtained by deleting certain elements from L. Consequently every finite dimensional vector space has a basis. Proof. If L is linearly independent, it must already be a basis of V . If not, consider v1 . If v1 = ~0, discard v1 . If not, then leave L unchanged. Since
L is assumed to be linearly dependent, there must be some j such that vj ∈ span(v1, . . . , vj−1). Choose the smallest j for which this is true and delete that vj from L. This will yield a list L1 of n − 1 elements that still spans V. If L1 is linearly independent, then L1 is the desired basis of V. If not, then proceed as before to obtain a spanning list of size n − 2. Continue this way, always producing spanning sets of shorter length, until eventually a linearly independent spanning set is obtained.

Lemma 3.2.8. Every linearly independent list of vectors in a finite dimensional vector space V can be extended to a basis of V.

Proof. Let V be a finite dimensional space, say with dim(V) = n. Let L = (v1, . . . , vk) be a linearly independent list. So 0 ≤ k ≤ n. If k < n we know that L cannot span the space, so there is a vector vk+1 not in span(L). Then L′ = (v1, . . . , vk, vk+1) must still be linearly independent. If k + 1 < n we can repeat this process, adjoining vectors one at a time to produce longer linearly independent lists until a basis is obtained. As we have seen above, this must happen when we have an independent list of length n.

Lemma 3.2.9. If V is a finite-dimensional space and U is a subspace of V, then there is a subspace W of V such that V = U ⊕ W. Moreover, dim(U) ≤ dim(V) with equality if and only if U = V.

Proof. We have seen that U must also be finite-dimensional, so that it has a basis B1 = (v1, . . . , vk). Since B1 is a linearly independent set of vectors in a finite-dimensional space V, it can be completed to a basis B = (v1, . . . , vk, vk+1, . . . , vn) of V. Put W = span(vk+1, . . . , vn). Clearly V = U + W, and U ∩ W = {~0}. It follows that V = U ⊕ W. The last part of the lemma should also be clear.

Theorem 3.2.10. Let L = (v1, . . . , vk) be a list of vectors of an n-dimensional space V. Then any two of the following properties imply the third one:
(i) k = n;
(ii) L is linearly independent;
(iii) L spans V.

Proof. Assume that (i) and (ii) both hold. Then L can be completed to a basis, which must have exactly n vectors, i.e., L already must span V. If (ii) and (iii) both hold, L is a basis by definition and must have n elements by definition of dimension. If (iii) and (i) both hold, L can be restricted to
form a basis, which must have n elements. Hence L must already be linearly independent.

Theorem 3.2.11. Let U and W be finite-dimensional subspaces of the vector space V. Then dim(U + W) = dim(U) + dim(W) − dim(U ∩ W).

Proof. Let B1 = (v1, . . . , vk) be a basis for U ∩ W. Complete it to a basis B2 = (v1, . . . , vk, u1, . . . , ur) of U and also complete it to a basis B3 = (v1, . . . , vk, w1, . . . , wt) for W. Put B = (v1, . . . , vk, u1, . . . , ur, w1, . . . , wt). We claim that B is a basis for U + W, from which the theorem follows.

First we show that B is linearly independent. So suppose that there are scalars ai, bi, ci ∈ F for which Σ_{i=1}^k ai vi + Σ_{i=1}^r bi ui + Σ_{i=1}^t ci wi = ~0. It follows that Σ_{i=1}^r bi ui = −(Σ_{i=1}^k ai vi + Σ_{i=1}^t ci wi) ∈ U ∩ W. Since B2 is linearly independent, this means all the bi's are equal to 0. This forces Σ_{i=1}^k ai vi + Σ_{i=1}^t ci wi = ~0. Since B3 is linearly independent, this forces all the ai's and ci's to be 0. But it should be quite clear that B spans U + W, so that in fact B is a basis for U + W.

At this point the following theorem is easy to prove. We leave the proof as an exercise.

Theorem 3.2.12. Let U1, . . . , Um be finite-dimensional subspaces of V with V = U1 + · · · + Um and with Bi a basis for Ui, 1 ≤ i ≤ m. Then the following are equivalent:
(i) V = U1 ⊕ · · · ⊕ Um;
(ii) dim(V) = dim(U1) + dim(U2) + · · · + dim(Um);
(iii) (B1, B2, . . . , Bm) is a basis for V.
(iv) The spaces U1, . . . , Um are linearly independent.
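The procedure of Theorem 3.2.7 (thin a spanning list down to a basis) can be carried out mechanically with row reduction. The sketch below uses sympy and a spanning list of our own choosing; it is an illustration, not part of the text.

    from sympy import Matrix

    # A spanning list for a subspace of R^3, with redundancies
    L = [Matrix([1, 0, 1]), Matrix([2, 0, 2]), Matrix([0, 1, 1]), Matrix([1, 1, 2])]

    M = Matrix.hstack(*L)          # put the list into the columns of a matrix
    _, pivots = M.rref()           # pivot columns index an independent sublist
    basis = [L[j] for j in pivots] # Theorem 3.2.7: delete the non-pivot vectors

    print(pivots)                  # (0, 2): the 2nd and 4th vectors were redundant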
3.3  Exercises
1. Write out a proof of Lemma 3.1.1.

2. Let L be any list of vectors of V, and let S be the set of all vectors in L. Show that the intersection of all subspaces having S as a subset is just the span of L.
3. Show that any subset T of a linearly independent set S of vectors is also linearly independent, and observe that this is equivalent to the fact that if T is a linearly dependent subset of S, then S is also linearly dependent. Note: The empty set ∅ of vectors is linearly independent.

4. Show that the intersection of any family of linearly independent sets of vectors of V is also linearly independent.

5. Give an example of two linearly dependent sets whose intersection is linearly independent.

6. Show that a list (v) of length 1 is linearly dependent if and only if v = ~0.

7. Show that a list (v1, v2) of length 2 is linearly independent if and only if neither vector is a scalar times the other.

8. Let m be a positive integer. Let V = {f ∈ F[x] : deg(f) = m or f = ~0}. Show that V is or is not a subspace of F[x].

9. Prove or disprove: There is a basis of Pm(F) all of whose members have degree m.
10. Prove or disprove: there exists a basis (p0, p1, p2, p3) of P3(F) such that
   (a) all the polynomials p0, . . . , p3 have degree 3.
   (b) all the polynomials p0, . . . , p3 give the value 0 when evaluated at 3.
   (c) all the polynomials p0, . . . , p3 give the value 3 when evaluated at 0.
   (d) all the polynomials p0, . . . , p3 give the value 3 when evaluated at 0 and give the value 1 when evaluated at 1.

11. Prove that if U1, U2, · · · , Um are subspaces of V then dim(U1 + · · · + Um) ≤ dim(U1) + dim(U2) + · · · + dim(Um).

12. Prove or give a counterexample: If U1, U2, U3 are three subspaces of a finite dimensional vector space V, then dim(U1 + U2 + U3) =
−dim(U1 ∩ U2 ) − dim(U2 ∩ U3 ) − dim(U3 ∩ U1 ) +dim(U1 ∩ U2 ∩ U3 ). 13. Suppose that p0 , p1 , . . . , pm are polynomials in the space Pm (F ) (of polynomials over F with degree at most m) such that pj (2) = 0 for each j. Prove that the set {p0 , p1 , . . . , pm } is not linearly independent in Pm (F ). 14. Let U, V, W be vector spaces over a field F, with U and W subspaces of V . We say that V is the direct sum of U and W and write V = U ⊕W provided that for each vector v ∈ F there are unique vectors u ∈ U , w ∈ W for which v = u + w. (a) Prove that if dimU + dimW = dimV and U ∩ W = {0}, then V = U ⊕ W. (b) Prove that if C is an m×n matrix over the field R of real numbers, then Rn = Col(C T ) ⊕ N (C), where Col(A) denotes the column space of the matrix A and N (A) denotes the right null space of A.
Chapter 4  Linear Transformations

4.1  Definitions and Examples
Throughout this chapter we let U, V and W be vector spaces over the field F.

Definition  A function (or map) T from U to V is called a linear map or linear transformation provided T satisfies the following two properties:
(i) T(u + v) = T(u) + T(v) for all u, v ∈ U, and
(ii) T(au) = aT(u) for all a ∈ F and all u ∈ U.

These two properties can be combined into the following single property:

Obs. 4.1.1. T : U → V is linear provided T(au + bv) = aT(u) + bT(v) for all a, b ∈ F and all u, v ∈ U.

Notice that T is a homomorphism of the additive group (U, +) into the additive group (V, +). So you should be able to show that T(~0) = ~0, where the first ~0 is the zero vector of U and the second is the zero vector of V.

The zero map: The map 0 : U → V defined by 0(v) = ~0 for all v ∈ U is easily seen to be linear.

The identity map: Similarly, the map I : U → U defined by I(v) = v for all v ∈ U is linear.

For f(x) = a0 + a1x + a2x² + · · · + anxⁿ ∈ F[x], the formal derivative of f(x) is defined to be f′(x) = a1 + 2a2x + 3a3x² + · · · + nanxⁿ⁻¹.

Differentiation: Define D : F[x] → F[x] by D(f) = f′, where f′ is the formal derivative of f. Then D is linear.
The prototypical linear map is given by the following. Let Fⁿ and Fᵐ be the vector spaces of column vectors with n and m entries, respectively. Let A ∈ Mm,n(F). Define TA : Fⁿ → Fᵐ by TA : X ↦ AX for all X ∈ Fⁿ. The usual properties of matrix algebra force TA to be linear.

Theorem 4.1.2. Suppose that U is n-dimensional with basis B = (u1, . . . , un), and let L = (v1, . . . , vn) be any list of n vectors of V. Then there is a unique linear map T : U → V such that T(ui) = vi for 1 ≤ i ≤ n.

Proof. Let u be any vector of U, so u = Σ_{i=1}^n ai ui for unique scalars ai ∈ F. Then the desired T has to be defined by T(u) = Σ_{i=1}^n ai T(ui) = Σ_{i=1}^n ai vi. This clearly defines T uniquely. The fact that T is linear follows easily from the basic properties of vector spaces.

Put L(U, V) = {T : U → V : T is linear}. The interesting fact here is that L(U, V) is again a vector space in its own right. Vector addition S + T is defined for S, T ∈ L(U, V) by: (S + T)(u) = S(u) + T(u) for all u ∈ U. Scalar multiplication is defined for a ∈ F and T ∈ L(U, V) by (aT)(u) = a(T(u)) for all u ∈ U. You should verify that with this vector addition and scalar multiplication L(U, V) is a vector space with the zero map being the additive identity.

Now suppose that T ∈ L(U, V) and S ∈ L(V, W). Then the composition (S ◦ T) : U → W, usually just written ST, defined by ST(u) = S(T(u)), is well-defined and is easily shown to be linear. In general the composition product is associative (when a triple product is defined) because this is true of the composition of functions in general. But also we have the distributive properties (S1 + S2)T = S1T + S2T and S(T1 + T2) = ST1 + ST2 whenever the products are defined. In general the multiplication of linear maps is not commutative even when both products are defined.
4.2  Kernels and Images
We use the following language. If f : A → B is a function, we say that A is the domain of f and B is the codomain of f . But f might not be onto
B. We define the image (or range) of f by Im(f) = {b ∈ B : f(a) = b for at least one a ∈ A}. And we use this language for linear maps also. The null space (or kernel) of T ∈ L(U, V) is defined by null(T) = {u ∈ U : T(u) = ~0}. NOTE: The terms range and image are interchangeable, and the terms null space and kernel are interchangeable. In the exercises you are asked to show that the null space and image of a linear map are subspaces of the appropriate spaces.

Lemma 4.2.1. Let T ∈ L(U, V). Then T is injective (i.e., one-to-one) if and only if null(T) = {~0}.

Proof. Since T(~0) = ~0, if T is injective, clearly null(T) = {~0}. Conversely, suppose that null(T) = {~0}. Then suppose that T(u1) = T(u2) for u1, u2 ∈ U. Then by the linearity of T we have T(u1 − u2) = T(u1) − T(u2) = ~0, so u1 − u2 ∈ null(T) = {~0}. Hence u1 = u2, implying T is injective.

The following Theorem and its method of proof are extremely useful in many contexts.

Theorem 4.2.2. If U is finite dimensional and T ∈ L(U, V), then Im(T) is finite-dimensional and dim(U) = dim(null(T)) + dim(Im(T)).

Proof. (Pay close attention to the details of this proof. You will want to use them for some of the exercises.) Start with a basis (u1, . . . , uk) of null(T). Extend this list to a basis (u1, . . . , uk, w1, . . . , wr) of U. Thus dim(null(T)) = k and dim(U) = k + r. To complete a proof of the theorem we need only show that dim(Im(T)) = r. We will do this by showing that (T(w1), . . . , T(wr)) is a basis of Im(T). Let u ∈ U. Because (u1, . . . , uk, w1, . . . , wr) spans U, there are scalars ai, bi ∈ F such that

  u = a1u1 + · · · + akuk + b1w1 + · · · + brwr.

Remember that u1, . . . , uk are in null(T) and apply T to both sides of the preceding equation.

  T(u) = b1T(w1) + · · · + brT(wr).
This last equation implies that (T(w1), . . . , T(wr)) spans Im(T), so at least Im(T) is finite dimensional. To show that (T(w1), . . . , T(wr)) is linearly independent, suppose that

  Σ_{i=1}^r ci T(wi) = ~0 for some ci ∈ F.
It follows easily that Σ_{i=1}^r ci wi ∈ null(T), so Σ_{i=1}^r ci wi = Σ_{i=1}^k di ui for some di ∈ F. Since (u1, . . . , uk, w1, . . . , wr) is linearly independent, we must have that all the ci's and di's are zero. Hence (T(w1), . . . , T(wr)) is linearly independent, and hence is a basis for Im(T).

Obs. 4.2.3. There are two other ways to view the equality of the preceding theorem. If n = dim(U), k = dim(null(T)) and r = dim(Im(T)), then the theorem says n = k + r. And if dim(V) = m, then clearly r ≤ m. So k = n − r ≥ n − m. If n > m, then k > 0 and T is not injective. If n < m, then r ≤ n < m says T is not surjective (i.e., onto).

Definition: Often we say that the dimension of the null space of a linear map T is the nullity of T.
4.3  Rank and Nullity Applied to Matrices
Let A ∈ Mm,n (F ). Recall that the row rank of A (i.e., the dimension of the row space row(A) of A) equals the column rank (i.e., the dimension of the column space col(A)) of A, and the common value rank(A) is called the rank of A. Let TA : F n → F m : X 7→ AX as usual. Then the null space of TA is the right null space rnull(A) of the matrix A, and the image of TA is the column space of A. So by Theorem 4.2.2 n = rank(A) + dim(rnull(A)). Similarly, m = rank(A) + dim(lnull(A)). (Clearly lnull(A) denotes the left null space of A.) Theorem 4.3.1. Let A be an m × n matrix and B be an n × p matrix over F . Then (i) dim(rnull(AB)) ≤ dim(rnull(A)) + dim(rnull(B)), and (ii) rank(A) + rank(B) - rank(AB) ≤ n.
Proof. Start with rnull(B) ⊆ rnull(AB) ⊆ F p . col(B) ∩ rnull(A) ⊆ rnull(A) ⊆ F n . Consider the map TB : rnull(AB) → col(B) ∩ rnull(A) : ~x 7→ B~x. Note that TB could be defined on all of F p , but we have defined it only on rnull(AB). On the other hand, rnull(B) really is in rnull(AB), so null(TB ) = rnull(B). Also, Im(TB ) ⊆ col(B) ∩ rnull(A) ⊆ rnull(A), since TB is being applied only to vectors ~x for which A(B~x) = 0. Hence we have the following:
dim(rnull(AB)) = dim(null(TB )) + dim(Im(TB )) = dim(rnull(B)) + dim(Im(TB )) ≤ dim(rnull(B)) + dim(rnull(A)).
This proves (i). This can be rewritten as p − rank(AB) ≤ (p − rank(B)) + (n − rank(A)), which implies (ii).
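A quick numerical spot check of Theorem 4.3.1 (an illustrative numpy sketch with randomly chosen integer matrices, not part of the text): both forms of the inequality are verified directly.

    import numpy as np

    rng = np.random.default_rng(1)
    for _ in range(100):
        m, n, p = 4, 3, 5
        A = rng.integers(-2, 3, size=(m, n))
        B = rng.integers(-2, 3, size=(n, p))
        rA = np.linalg.matrix_rank(A)
        rB = np.linalg.matrix_rank(B)
        rAB = np.linalg.matrix_rank(A @ B)
        assert rA + rB - rAB <= n                   # Theorem 4.3.1 (ii)
        assert (p - rAB) <= (p - rB) + (n - rA)     # part (i), written via ranks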
4.4  Projections
Suppose V = U ⊕ W. Define P : V → V as follows. For each v ∈ V, write v = u + w with u ∈ U and w ∈ W. Then u and w are uniquely defined for each v ∈ V. Put P(v) = u. It is straightforward to verify the following properties of P.

Obs. 4.4.1. (i) P ∈ L(V). (ii) P² = P. (iii) U = Im(P). (iv) W = null(P). (v) U and W are both P-invariant.

This linear map P is called the projection onto U along W (or parallel to W) and is often denoted by P = PU,W. Using this notation we see that

Obs. 4.4.2. I = PU,W + PW,U.
As a kind of converse, suppose that P ∈ L(V) is idempotent, i.e., P² = P. Put U = Im(P) and W = null(P). Then for each v ∈ V we can write v = P(v) + (v − P(v)) ∈ Im(P) + null(P). Hence V = U + W. Now suppose that u ∈ U ∩ W. Then on the one hand u = P(v) for some v ∈ V. On the other hand P(u) = ~0. Hence ~0 = P(u) = P²(v) = P(v) = u, implying U ∩ W = {~0}. Hence V = U ⊕ W. It follows readily that P = PU,W. Hence we have the following:

Obs. 4.4.3. The linear map P is idempotent if and only if it is the projection onto its image along its null space.
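A small numerical illustration (a numpy sketch with a projection of our own choosing, not from the text): the matrix P below is the projection of R² onto U = span{(1, 0)} along W = span{(1, 1)}. One checks that P² = P, that W is its null space, and that each vector splits as u + w with u ∈ U and w ∈ W.

    import numpy as np

    P = np.array([[1, -1],
                  [0,  0]])        # projection onto U = span{(1,0)} along W = span{(1,1)}

    assert np.array_equal(P @ P, P)                                  # idempotent
    assert np.array_equal(P @ np.array([1, 1]), np.array([0, 0]))    # W = null(P)

    v = np.array([3, 5])
    u, w = P @ v, v - P @ v
    print(u, w)                    # u = (-2, 0) in U,  w = (5, 5) in W,  u + w = v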
4.5  Bases and Coordinate Matrices
In this section let V be a finite dimensional vector space over the field F . Since F is a field, i.e., since the multiplication in F is commutative, it turns out that it really does not matter whether V is a left vector space over F (i.e., the scalars from F are placed on the left side of the vectors of V ) or V is a right vector space. Let the list B = (v1 , . . . , vn ) be a basis for V . So if v is anP arbitrary vector in V there are unique scalars c1 , . . . , cn in F for which v = ni=1 ci vi . The column vector [v]B = (c1 , . . . , cn )T ∈ F n is then called the coordinate matrix for v with respect to the basis B, and we may write v=
n X i=1
ci vi = (v1 , . . . , vn )
c1 c2 .. .
= B[v]B .
cn Perhaps we should discuss this last multiplication a bit. In the usual theory of matrix manipulation, if we want to multiply two matrices A and B to get a matrix AB = C, there are integers n, m and p such that A is m × n, B is n × p, and the product C is m × p. If in general the entry in the ith row and jth column of aP matrix A is denoted by Aij , then the (i, j)th entry of AB = C is (AB)ij = nk=1 Aik Bkj . If A is a row or a column the entries are usually indicated by a single subscript. So we might
4.5. BASES AND COORDINATE MATRICES write
A = (a1 , . . . , an ); B =
b1 b2 .. .
37
X ak b k . ; AB = k
In this context it is usually assumed that the entries of A and B (and hence also of C) come from some ring, probably a commutative ring R, so that in particular this sum \sum_k a_k b_k is a uniquely defined element of R. Moreover, using the usual properties of arithmetic in R it is possible to show directly that matrix multiplication (when defined!) is associative. Also, matrix addition is defined and matrix multiplication (when defined!) distributes over addition, etc.

However, it is not always necessary to assume that the entries of A and B come from the same kind of algebraic system. We may just as easily multiply a column (c_1, \ldots, c_n)^T of scalars from the field F by the row (v_1, \ldots, v_n) representing an ordered basis of V over F to obtain
v = (v_1, \ldots, v_n) \cdot (c_1, c_2, \ldots, c_n)^T = \sum_{i=1}^{n} c_i v_i.
We may also suppose A is an n × n matrix over F and write (u_1, u_2, \ldots, u_n) = (v_1, v_2, \ldots, v_n)A, so
u_j = \sum_{i=1}^{n} A_{ij} v_i.
Then it follows readily that (u_1, \ldots, u_n) is an ordered basis of V if and only if the matrix A is invertible, in which case it is also true that
(v_1, v_2, \ldots, v_n) = (u_1, u_2, \ldots, u_n)A^{-1}, \quad so \quad v_j = \sum_{i=1}^{n} (A^{-1})_{ij} u_i.
To see this, observe that
(u_1, \ldots, u_n) \cdot (c_1, c_2, \ldots, c_n)^T = (v_1, v_2, \ldots, v_n)A \cdot (c_1, c_2, \ldots, c_n)^T = (v_1, v_2, \ldots, v_n) \cdot (A(c_1, c_2, \ldots, c_n)^T).
Since (v_1, v_2, \ldots, v_n) is an independent list, this product is the zero vector if and only if A(c_1, c_2, \ldots, c_n)^T is the zero vector, from which the desired result is clear.
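The following short sketch (our own illustration, taking F = R; the variable names are hypothetical) carries out this computation for a concrete basis: the coordinate matrix [v]_B is found by solving the linear system B[v]_B = v, where the basis vectors are stored as the columns of a matrix.

```python
import numpy as np

# Columns of basis_mat play the role of the row (v1, ..., vn) of basis vectors.
basis_mat = np.array([[1.0, 1.0, 0.0],
                      [0.0, 1.0, 1.0],
                      [0.0, 0.0, 1.0]])
v = np.array([2.0, 3.0, 4.0])

# [v]_B is the unique column of scalars with (v1, ..., vn)[v]_B = v.
coords = np.linalg.solve(basis_mat, v)
assert np.allclose(basis_mat @ coords, v)   # v = B [v]_B
print(coords)
```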
4.6 Matrices as Linear Transformations
Let B_1 = (u_1, \ldots, u_n) be an ordered basis for the vector space U over the field F , and let B_2 = (v_1, \ldots, v_m) be an ordered basis for the vector space V over F . Let A be an m × n matrix over F . Define T_A : U → V by [T_A(u)]_{B_2} = A \cdot [u]_{B_1} for all u ∈ U . It is quite straightforward to show that T_A ∈ L(U, V ). It is also clear (by letting u = u_j) that the jth column of A is [T_A(u_j)]_{B_2}.

Conversely, if T ∈ L(U, V ), and if we define the matrix A to be the matrix with jth column equal to [T(u_j)]_{B_2}, then [T(u)]_{B_2} = A \cdot [u]_{B_1} for all u ∈ U . In this case we say that A is the matrix that represents T with respect to the pair (B_1, B_2) of ordered bases of U and V , respectively, and we write A = [T]_{B_2,B_1}.

Note that a coordinate matrix of a vector with respect to a basis has a subscript that is a single basis, whereas the matrix representing a linear map T has a subscript which is a pair of bases, with the basis of the range space listed first and that of the domain space listed second. Soon we shall see why this order is the convenient one. When U = V and B_1 = B_2 it is sometimes the case that we write [T]_{B_1} in place of [T]_{B_1,B_1}. And we usually write L(V ) in place of L(V, V ), and T ∈ L(V ) is called a linear operator on V .

In these notes, however, even when there is only one basis of V being used and T is a linear operator on V , we sometimes indicate the matrix that represents T with a subscript that is a pair of bases, instead of just one basis, because there are times when we want to think of T as a member of a vector
space so that it has a coordinate matrix with respect to some basis of that vector space. Our convention makes it easy to recognize when the matrix represents T with respect to a basis as a linear map and when it represents T as a vector itself which is a linear combination of the elements of some basis. We give an example of this.

Example 4.6.1. Let V = M_{2,3}(F). Put B = (v_1, \ldots, v_6) where the v_i are defined as follows:
v_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}; \quad v_2 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}; \quad v_3 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix};
v_4 = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}; \quad v_5 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}; \quad v_6 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
It is clear that B is a basis for M_{2,3}(F), and if A ∈ M_{2,3}(F), then the coordinate matrix [A]_B of A with respect to the basis B is [A]_B = (A_{11}, A_{12}, A_{13}, A_{21}, A_{22}, A_{23})^T. Given such a matrix A, define a linear map T_A : F^3 → F^2 by T_A(u) = Au for all u ∈ F^3. Let S_1 = (e_1, e_2, e_3) be the standard ordered basis of F^3, and let S_2 = (h_1, h_2) be the standard ordered basis of F^2. So, for example, h_2 = (0, 1)^T. You should check that [T_A]_{S_2,S_1} = A.

For 1 ≤ i ≤ 3, 1 ≤ j ≤ 2, let f_{ij} ∈ L(F^3, F^2) be defined by f_{ij}(e_k) = δ_{ik} h_j. Let B_3 = (f_{11}, f_{21}, f_{31}, f_{12}, f_{22}, f_{32}). We want to figure out what is the coordinate matrix [T_A]_{B_3}. (The next theorem shows that B_3 is indeed a basis of L(F^3, F^2).) Writing a_{ij} for the entries of A, we claim that [T_A]_{B_3} = (a_{11}, a_{12}, a_{13}, a_{21}, a_{22}, a_{23})^T. Because of the order in which we listed the basis vectors f_{ij}, this is equivalent to saying that T_A = \sum_{i,j} a_{ij} f_{ji}. If we evaluate this sum at e_k we get
\sum_{i,j} a_{ij} f_{ji}(e_k) = \sum_{i} a_{ik} f_{ki}(e_k) = \sum_{i} a_{ik} h_i = (a_{1k}, a_{2k})^T = A e_k = T_A(e_k).
This establishes our claim.
Recall that B_1 = (u_1, \ldots, u_n) is an ordered basis for U over the field F , and that B_2 = (v_1, \ldots, v_m) is an ordered basis for V over F . Now suppose that W has an ordered basis B_3 = (w_1, \ldots, w_p). Let S ∈ L(U, V ) and T ∈ L(V, W ), so that T ◦ S ∈ L(U, W ), where T ◦ S means do S first. Then we have
[(T ◦ S)(u)]_{B_3} = [T(S(u))]_{B_3} = [T]_{B_3,B_2}[S(u)]_{B_2} = [T]_{B_3,B_2}[S]_{B_2,B_1}[u]_{B_1} = [T ◦ S]_{B_3,B_1}[u]_{B_1} \quad for all u ∈ U.
This implies that [T ◦ S]_{B_3,B_1} = [T]_{B_3,B_2} \cdot [S]_{B_2,B_1}. This is the equation that suggests that the subscript on the matrix representing a linear map should have the basis for the range space listed first.

Recall that L(U, V ) is naturally a vector space over F with the usual addition of linear maps and scalar multiplication of linear maps. Moreover, for a, b ∈ F and S, T ∈ L(U, V ), it follows easily that [aS + bT]_{B_2,B_1} = a[S]_{B_2,B_1} + b[T]_{B_2,B_1}. We leave the proof of this fact as a straightforward exercise. It then follows that if U = V and B_1 = B_2, the correspondence T ↦ [T]_{B_1,B_1} = [T]_{B_1} is an algebra isomorphism. This includes consequences such as [T^{-1}]_B = ([T]_B)^{-1} when T happens to be invertible. Proving these facts is a worthwhile exercise!

Let f_{ij} ∈ L(U, V ) be defined by f_{ij}(u_k) = δ_{ik} v_j, 1 ≤ i, k ≤ n; 1 ≤ j ≤ m. So f_{ij} maps u_i to v_j and maps u_k to the zero vector for k ≠ i. This completely determines f_{ij} as a linear map from U to V .

Theorem 4.6.2. The set B^* = \{f_{ij} : 1 ≤ i ≤ n; 1 ≤ j ≤ m\} is a basis for L(U, V ) as a vector space over F .

Note: We could turn B^* into a list, but we don't need to.
Proof. We start by showing that B^* is linearly independent. Suppose that \sum_{ij} c_{ij} f_{ij} = \vec{0}, so that \vec{0} = \sum_{ij} c_{ij} f_{ij}(u_k) = \sum_j c_{kj} v_j for each k. Since (v_1, \ldots, v_m) is linearly independent, c_{k1} = c_{k2} = \cdots = c_{km} = 0, and this holds for each k, so the set of f_{ij} must be linearly independent.

We now show that it spans L(U, V ). For suppose that S ∈ L(U, V ) and that [S]_{B_2,B_1} = C = (c_{ij}), i.e., S(u_j) = \sum_{i=1}^{m} c_{ij} v_i. Put T = \sum_{i,k} c_{ik} f_{ki}. Then T(u_j) = \sum_{i,k} c_{ik} f_{ki}(u_j) = \sum_i c_{ij} v_i, implying that S = T since they agree on a basis.
Corollary 4.6.3. If dim(U) = n and dim(V) = m, then dim L(U, V ) = mn.
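To see the composition formula [T ◦ S]_{B_3,B_1} = [T]_{B_3,B_2}[S]_{B_2,B_1} in action, here is a numerical sketch (our own illustration over R with standard bases, so each map is given by an arbitrarily chosen matrix):

```python
import numpy as np

# With B1, B2, B3 the standard bases of R^4, R^3, R^2, the composition
# formula becomes ordinary matrix multiplication.  The matrices are arbitrary.
rng = np.random.default_rng(0)
S_mat = rng.integers(-3, 4, size=(3, 4)).astype(float)   # [S]_{B2,B1}
T_mat = rng.integers(-3, 4, size=(2, 3)).astype(float)   # [T]_{B3,B2}

def S(u):        # S : R^4 -> R^3
    return S_mat @ u

def T(v):        # T : R^3 -> R^2
    return T_mat @ v

u = np.array([1.0, -2.0, 0.5, 3.0])
# (T o S)(u) computed two ways: compose the maps, or multiply the matrices.
assert np.allclose(T(S(u)), (T_mat @ S_mat) @ u)
print(T_mat @ S_mat)     # this is [T o S]_{B3,B1}
```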
4.7 Change of Basis
We want to investigate what happens to coordinate matrices and to matrices representing linear operators when the ordered basis is changed. For the sake of simplicity we shall consider this question only for linear operators on a space V , so that we can use a single ordered basis. So in this section we write matrices representing linear operators with a subscript which is a single basis.

Let F be any field and let V be a finite dimensional vector space over F , say dim(V ) = n. Let B_1 = (u_1, \ldots, u_n) and B_2 = (v_1, \ldots, v_n) be two (ordered) bases of V . So for v ∈ V , say that [v]_{B_1} = (c_1, c_2, \ldots, c_n)^T, i.e., v = \sum_{i=1}^{n} c_i u_i. We often write this equality in the form v = (u_1, \ldots, u_n)[v]_{B_1} = B_1[v]_{B_1}. Similarly, v = B_2[v]_{B_2}.

Since B_1 and B_2 are both bases for V , there is an invertible matrix Q such that B_1 = B_2 Q and B_2 = B_1 Q^{-1}. The first equality indicates that (u_1, \ldots, u_n) = (v_1, \ldots, v_n)Q, or
u_j = \sum_{i=1}^{n} Q_{ij} v_i.   (4.1)
This equation says that the jth column of Q is the coordinate matrix of u_j with respect to B_2. Similarly, the jth column of Q^{-1} is the coordinate matrix of v_j with respect to B_1. For every v ∈ V we now have v = B_1[v]_{B_1} = (B_2 Q)[v]_{B_1} = B_2[v]_{B_2}. It follows that
Q[v]_{B_1} = [v]_{B_2}.   (4.2)
Now let T ∈ L(V ). Recall that the matrix [T]_B that represents T with respect to the basis B is the unique matrix for which
[T(v)]_B = [T]_B [v]_B \quad for all v ∈ V.

Theorem 4.7.1. Let B_1 = B_2 Q as above. Then [T]_{B_2} = Q[T]_{B_1} Q^{-1}.

Proof. In Eq. 4.2 replace v with T(v) to get Q[T(v)]_{B_1} = [T(v)]_{B_2} = [T]_{B_2}[v]_{B_2} = [T]_{B_2} Q[v]_{B_1} for all v ∈ V . It follows that Q[T]_{B_1}[v]_{B_1} = [T]_{B_2} Q[v]_{B_1} for all v ∈ V , implying
Q[T]_{B_1} = [T]_{B_2} Q,   (4.3)
which is equivalent to the statement of the theorem.

It is often convenient to remember this fact in the following form (let P play the role of Q^{-1} above):
If B_2 = B_1 P, then [T]_{B_2} = P^{-1}[T]_{B_1} P.   (4.4)
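A quick numerical sketch of Eq. 4.4 (our own illustration over R, with an arbitrarily chosen operator and change-of-basis matrix):

```python
import numpy as np

# P is the matrix whose columns express the new basis in terms of the old
# one (B2 = B1 P); both P and the matrix representing T are arbitrary.
A_B1 = np.array([[2.0, 1.0, 0.0],
                 [0.0, 3.0, 1.0],
                 [1.0, 0.0, 1.0]])          # [T]_{B1}

P = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])             # invertible, so B2 = B1 P is a basis

A_B2 = np.linalg.inv(P) @ A_B1 @ P          # [T]_{B2} = P^{-1} [T]_{B1} P

# Sanity check: similar matrices have equal determinant and trace.
assert np.isclose(np.linalg.det(A_B1), np.linalg.det(A_B2))
assert np.isclose(np.trace(A_B1), np.trace(A_B2))
print(A_B2)
```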
A Specific Setting

Now let V = F^n, whose elements we think of as being column vectors. Let A be an n × n matrix over F and define T_A ∈ L(F^n) by T_A(v) = Av, for all v ∈ F^n.
Let S = (e_1, \ldots, e_n) be the standard ordered basis for F^n, i.e., e_j is the column vector in F^n whose jth entry is 1 and all other entries are equal to 0. It is clear that we can identify each vector v ∈ F^n with [v]_S. Moreover, the jth column of A is Ae_j = [Ae_j]_S = [T_A e_j]_S = [T_A]_S [e_j]_S = [T_A]_S e_j = the jth column of [T_A]_S, which implies that
A = [T_A]_S.   (4.5)
Putting all this together, we have

Theorem 4.7.2. Let S = (e_1, \ldots, e_n) be the standard ordered basis of F^n. Let B = (v_1, \ldots, v_n) be a second ordered basis. Let P be the matrix whose jth column is v_j = [v_j]_S. Let A be an n × n matrix over F and define T_A : F^n → F^n by T_A(v) = Av. So [T_A]_S = A. Then [T_A]_B = P^{-1}AP.

Definition Two n × n matrices A and B over F are said to be similar (written A ∼ B) if and only if there is an invertible n × n matrix P such that B = P^{-1}AP. You should prove that "similarity" is an equivalence relation on M_n(F) and then go on to complete the details giving a proof of the following corollary.

Corollary 4.7.3. If A, B ∈ M_n(F), and if V is an n-dimensional vector space over F , then A and B are similar if and only if there are bases B_1 and B_2 of V and T ∈ L(V ) such that A = [T]_{B_1} and B = [T]_{B_2}.

The Dual Space

We now specialize to the case where V = F is viewed as a vector space over F . Here L(U, F) is denoted U^* and is called the dual space of U . An element of L(U, F) is called a linear functional. Write B = (u_1, \ldots, u_n) for the fixed ordered basis of U . Then 1 ∈ F is a basis of F over F , so we write this basis as \bar{1} = (1) and m = 1, and there is a basis B^* of U^* defined by B^* = (f_1, \ldots, f_n), where f_i(u_j) = δ_{ij} ∈ F . This basis B^* is called the basis dual to B. If f ∈ U^* satisfies f(u_j) = c_j for 1 ≤ j ≤ n, then [f]_{\bar{1},B} = [c_1, \ldots, c_n] = [f(u_1), \ldots, f(u_n)]. As above in the more general case, if g = \sum_i c_i f_i, then g(u_j) = \sum_i c_i f_i(u_j) = c_j, so f = g, and [f]_{B^*} = [c_1, \ldots, c_n]^T = ([f]_{\bar{1},B})^T. We restate this in general: For f ∈ U^*,
[f]_{B^*} = ([f]_{\bar{1},B})^T.
Given a vector space U , let GL(U) denote the set of all invertible linear operators on U . If we let composition of maps be the binary operation of GL(U), then GL(U) turns out to be a group called the general linear group on U . Suppose that T ∈ GL(U), i.e., T is an invertible element of L(U, U). We define a map \hat{T} : U^* → U^* by
\hat{T}(f) = f ◦ T^{-1}, \quad for all f ∈ U^*.
We want to determine the matrix [\hat{T}]_{B^*,B^*}. We know that the jth column of this matrix is
[\hat{T}(f_j)]_{B^*} = [f_j ◦ T^{-1}]_{B^*} = ([f_j ◦ T^{-1}]_{\bar{1},B})^T = ([f_j]_{\bar{1},B} \cdot [T^{-1}]_{B,B})^T = ((0, \ldots, 1_j, \ldots, 0)([T]_{B,B})^{-1})^T = ([T]_{B,B})^{-T}(0, \ldots, 1_j, \ldots, 0)^T = jth column of ([T]_{B,B})^{-T}.
This says that [\hat{T}]_{B^*,B^*} = ([T]_{B,B})^{-T}. Exercise 18 asks you to show that the map GL(V ) → GL(V^*) : T ↦ \hat{T} is an isomorphism.
4.8 Exercises
1. Let T ∈ L(U, V ). Then null(T) is a subspace of U and Im(T) is a subspace of V .

2. Suppose that V and W are finite-dimensional and that U is a subspace of V . Prove that there exists a T ∈ L(V, W ) such that null(T) = U if and only if dim(U) ≥ dim(V ) − dim(W ).

3. Let T ∈ L(V ). Put R = Im(T) and N = null(T). Note that both R and N are T-invariant. Show that R has a complementary T-invariant subspace W (i.e., V = R ⊕ W and T(W ) ⊆ W ) if and only if R ∩ N = \{\vec{0}\}, in which case N is the unique T-invariant subspace complementary to R.
4. State and prove Theorem 4.3.1 for linear maps (instead of for matrices).

5. Prove Corollary 4.6.3.

6. If T ∈ L(U, V ), we know that as a function from U to V , T has an inverse if and only if it is bijective (i.e., one-to-one and onto). Show that when T is invertible as a function, then its inverse is in L(V, U).

7. If T ∈ L(U, V ) is invertible, and if B_1 is a basis for U and B_2 is a basis for V , then ([T]_{B_2,B_1})^{-1} = [T^{-1}]_{B_1,B_2}.

8. Two vector spaces U and V are said to be isomorphic provided there is an invertible T ∈ L(U, V ). Show that if U and V are finite-dimensional vector spaces over F , then U and V are isomorphic if and only if dim(U) = dim(V ).

9. Let A ∈ M_{m,n}(F) and b ∈ M_{m,1}(F) = F^m. Consider the matrix equation A\vec{x} = \vec{b} as a system of m linear equations in n unknowns x_1, \ldots, x_n. Interpret Obs. 4.2.3 for this system of linear equations.

10. Let B_1, B_2 be bases for U and V , respectively, with dim(U) = n and dim(V ) = m. Show that the map M : L(U, V ) → M_{m,n}(F) : T ↦ [T]_{B_2,B_1} is an invertible linear map.

11. Suppose that V is finite-dimensional and that T ∈ L(V ). Show that the following are equivalent: (i) T is invertible. (ii) T is injective. (iii) T is surjective.

12. Suppose that V is finite dimensional and S, T ∈ L(V ). Prove that ST is invertible if and only if both S and T are invertible.

13. Suppose that V is finite dimensional and T ∈ L(V ). Prove that T is a scalar multiple of the identity if and only if ST = TS for every S ∈ L(V ).
14. Suppose that W is finite dimensional and T ∈ L(V, W ). Prove that T is injective if and only if there exists an S ∈ L(W, V ) such that ST is the identity map on V .

15. Suppose that V is finite dimensional and T ∈ L(V, W ). Prove that T is surjective if and only if there exists an S ∈ L(W, V ) such that TS is the identity map on W .

16. Let V be the vector space over the reals R consisting of the polynomials in x of degree at most 4 with coefficients in R, and with the usual addition of vectors (i.e., polynomials) and scalar multiplication. Let B_1 = \{1, x, x^2, \ldots, x^4\} be the standard ordered basis of V . Let W be the vector space of 2 × 3 matrices over R with the usual addition of matrices and scalar multiplication. Let B_2 be the ordered basis of W given as follows:
B_2 = \{v_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, v_2 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, v_3 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}, v_4 = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}, v_5 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, v_6 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}\}.
Define T : V → W by: For f = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4, put
T(f) = \begin{pmatrix} 0 & a_3 & a_2 + a_4 \\ a_1 + a_0 & a_0 & 0 \end{pmatrix}.
Construct the matrix A = [T]_{B_2,B_1} that represents T with respect to the pair B_1, B_2 of ordered bases.

17. Let T ∈ L(V ). Prove or disprove each of the following:
(a) V = null(T) ⊕ range(T).
(b) There exists a subspace U of V such that U ∩ null(T) = \{0\} and range(T) = \{T(u) : u ∈ U\}.

18. Let V be a finite dimensional vector space over F . Show that the map GL(V ) → GL(V^*) : T ↦ \hat{T} is an isomorphism.
Chapter 5
Polynomials

5.1 Algebras
It is often the case that basic facts about polynomials are taken for granted as being well-known and the subject is never developed in a formal manner. In this chapter, which we usually assign as independent reading, we wish to give the student a somewhat formal introduction to the algebra of polynomials over a field. It is then natural to generalize to polynomials with coefficients from some more general algebraic structure, such as a commutative ring.

The title of the course for which this book is intended includes the words "linear algebra," so we feel some obligation to define what a linear algebra is.

Definition Let F be a field. A linear algebra over the field F is a vector space A over F with an additional operation called multiplication of vectors which associates with each pair of vectors u, v ∈ A a vector uv in A called the product of u and v in such a way that
(a) multiplication is associative: u(vw) = (uv)w for all u, v, w ∈ A;
(b) multiplication distributes over addition: u(v + w) = (uv) + (uw) and (u + v)w = (uw) + (vw), for all u, v, w ∈ A;
(c) for each scalar c ∈ F , c(uv) = (cu)v = u(cv) for all u, v ∈ A.
If there is an element 1 ∈ A such that 1u = u1 = u for each u ∈ A, we call A a linear algebra with identity over F , and call 1 the identity of A. The algebra A is called commutative provided uv = vu for all u, v ∈ A.

Warning: What we have just called a "linear algebra" is sometimes called a "linear associative algebra," because multiplication of vectors is associative. There are important situations in which nonassociative linear
algebras (i.e., not necessarily associative algebras) are studied. We will not meet them in this course, so for us all linear algebras are associative.

Example 5.1.1. The set of n × n matrices over a field, with the usual operations, is a linear algebra with identity; in particular the field itself is an algebra with identity. This algebra is not commutative if n ≥ 2. Of course, the field itself is commutative.

Example 5.1.2. The space of all linear operators on a vector space, with composition as the product, is a linear algebra with identity. It is commutative if and only if the space is one-dimensional.

Now we turn our attention to the construction of an algebra which is quite different from the two just given. Let F be a field and let S be the set of all nonnegative integers. We have seen that the set of all functions from S into F is a vector space which we now denote by F^∞. The vectors in F^∞ are just infinite sequences (i.e., lists) f = (f_0, f_1, f_2, \ldots) of scalars f_i ∈ F . If g = (g_0, g_1, g_2, \ldots) and a, b ∈ F , then af + bg is the infinite list given by
af + bg = (af_0 + bg_0, af_1 + bg_1, \ldots).   (5.1)
We define a product in F^∞ by associating with each pair (f, g) of vectors in F^∞ the vector fg which is given by
(fg)_n = \sum_{i=0}^{n} f_i g_{n-i}, \quad n = 0, 1, 2, \ldots   (5.2)
Since multiplication in F is commutative, it is easy to show that multiplication in F ∞ is also commutative. In fact, it is a relatively routine task to show that F ∞ is now a linear algebra with identity over F . Of course the vector (1, 0, 0, . . .) is the identity, and the vector x = (0, 1, 0, 0, . . .) plays a distinguished role. Throughout this chapter x will continue to denote this particular vector (and will never be an element of the field F ). The product of x with itself n times will be denoted by xn , and by convention x0 = 1. Then x2 = (0, 0, 1, 0, . . .), x3 = (0, 0, 0, 1, 0, . . .), etc. Obs. 5.1.3. The list (1, x, x2 , . . .) is both independent and infinite. Thus the algebra F ∞ is not finite dimensional.
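The product rule of Eq. 5.2 is just a convolution of coefficient sequences, and it is easy to experiment with. The sketch below (our own illustration; integer coefficients stand in for elements of F , and only finitely many coordinates are stored) recovers x^2 and x^3 and checks commutativity on one example:

```python
def convolve(f, g):
    # Product rule (f g)_n = sum_{i=0}^{n} f_i g_{n-i}  (Eq. 5.2), truncated.
    h = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] += fi * gj
    return h

x = [0, 1]                       # the distinguished vector x = (0, 1, 0, 0, ...)
x2 = convolve(x, x)              # (0, 0, 1, 0, ...)
x3 = convolve(x2, x)             # (0, 0, 0, 1, 0, ...)
print(x2, x3)

f = [2, -1, 3]                   # 2 - x + 3x^2, chosen arbitrarily
g = [1, 5]                       # 1 + 5x
assert convolve(f, g) == convolve(g, f)   # the product is commutative
```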
The algebra F^∞ is sometimes called the algebra of formal power series over F . The element f = (f_0, f_1, f_2, \ldots) is frequently written as
f = \sum_{n=0}^{\infty} f_n x^n.   (5.3)
This notation is very convenient, but it must be remembered that it is purely formal. In algebra there is no such thing as an ‘infinite sum,’ and the power series notation is not intended to suggest anything about convergence.
5.2 The Algebra of Polynomials
Definition Let F[x] be the subspace of F^∞ spanned by the vectors 1, x, x^2, \ldots. An element of F[x] is called a polynomial over F .

Since F[x] consists of all (finite) linear combinations of x and its powers, a non-zero vector f in F^∞ is a polynomial if and only if there is an integer n ≥ 0 such that f_n ≠ 0 and such that f_k = 0 for all integers k > n. This integer (when it exists) is called the degree of f and is denoted by deg(f). The zero polynomial is said to have degree −∞. So if f ∈ F[x] has degree n, it may be written in the form
f = f_0 x^0 + f_1 x + f_2 x^2 + \cdots + f_n x^n, \quad f_n ≠ 0.
Usually f_0 x^0 is simply written f_0 and called a scalar polynomial. A non-zero polynomial f of degree n such that f_n = 1 is called a monic polynomial. The verification of the various parts of the next result guaranteeing that A = F[x] is an algebra is routine and is left to the reader.

Theorem 5.2.1. Let f and g be non-zero polynomials over F . Then
(i) fg is a non-zero polynomial;
(ii) deg(fg) = deg(f) + deg(g);
(iii) fg is a monic polynomial if both f and g are monic;
(iv) fg is a scalar polynomial if and only if both f and g are scalar polynomials;
(v) deg(f + g) ≤ max\{deg(f), deg(g)\}.

Corollary 5.2.2. The set F[x] of all polynomials over a given field F with the addition and multiplication given above is a commutative linear algebra with identity over F .
Corollary 5.2.3. Suppose f, g, and h are polynomials over F such that f ≠ 0 and fg = fh. Then g = h.

Proof. Since fg = fh, also f(g − h) = 0. Since f ≠ 0, it follows from (i) above that g − h = 0.

Let f = \sum_{i=0}^{m} f_i x^i and g = \sum_{j=0}^{n} g_j x^j, and interpret f_k = 0, g_t = 0, if k > m, t > n, respectively. Then
fg = \sum_{i,j} f_i g_j x^{i+j} = \sum_{i=0}^{m+n} \left( \sum_{j=0}^{i} f_j g_{i-j} \right) x^i,
where the first sum is extended over all integer pairs (i, j) with 0 ≤ i ≤ m and 0 ≤ j ≤ n.

Definition Let A be a linear algebra with identity over the field F . We denote the identity of A by 1 and make the convention that u^0 = 1 for each u ∈ A. Then to each polynomial f = \sum_{i=0}^{n} f_i x^i over F and u ∈ A, we associate an element f(u) in A by the rule
f(u) = \sum_{i=0}^{n} f_i u^i.
Warning: f_0 u^0 = f_0 \cdot 1, where 1 is the multiplicative identity of A.

Example 5.2.4. Let C be the field of complex numbers and let f = x^2 + 2.
(a) If A = C and z ∈ C, f(z) = z^2 + 2; in particular f(3) = 11 and f\left(\frac{1+i}{1-i}\right) = 1.
(b) If A is the algebra of all 2 × 2 matrices over C and if
B = \begin{pmatrix} 1 & 0 \\ -1 & 2 \end{pmatrix},
then
f(B) = 2\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \begin{pmatrix} 1 & 0 \\ -1 & 2 \end{pmatrix}^2 = \begin{pmatrix} 3 & 0 \\ -3 & 6 \end{pmatrix}.
(c) If A is the algebra of all linear operators on C^3 and T is the element of A given by
T(c_1, c_2, c_3) = (i\sqrt{2}\, c_1, c_2, i\sqrt{2}\, c_3),
then f(T) is the linear operator on C^3 defined by
f(T)(c_1, c_2, c_3) = (0, 3c_2, 0).
(d) If A is the algebra of all polynomials over C and g = x^4 + 3i, then f(g) is the polynomial in A given by
f(g) = -7 + 6ix^4 + x^8.

Theorem 5.2.5. Let F be a field, A a linear algebra with identity over F , f, g ∈ F[x], u ∈ A and c ∈ F . Then:
(i) (cf + g)(u) = cf(u) + g(u);
(ii) (fg)(u) = f(u)g(u) = (gf)(u).

Proof. We leave (i) as an exercise. So for (ii), suppose
f = \sum_{i=0}^{m} f_i x^i \quad and \quad g = \sum_{j=0}^{n} g_j x^j.
Recall that fg = \sum_{i,j} f_i g_j x^{i+j}. So using (i) we obtain
(fg)(u) = \sum_{i,j} f_i g_j u^{i+j} = \left( \sum_{i=0}^{m} f_i u^i \right)\left( \sum_{j=0}^{n} g_j u^j \right) = f(u)g(u).
Fix u ∈ A and define Eu : F [x] → A by Eu (f ) = f (u).
(5.4)
Using Theorem 5.2.5 it is now easy to see that the map E_u : F[x] → A is an algebra homomorphism, i.e., it preserves addition and multiplication. There is a special case of this that is so important for us that we state it as a separate corollary.

Corollary 5.2.6. If A = L(V ) and T ∈ A, and if f, g ∈ F[x], then (f \cdot g)(T) = f(T) ◦ g(T).
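As a concrete check of Example 5.2.4(b) and of the homomorphism property just stated (a sketch of our own; the helper name and the second polynomial g are arbitrary choices):

```python
import numpy as np

# Evaluate a polynomial, given by its coefficients from degree 0 upward,
# at a square matrix B, using the convention B^0 = I.
def eval_poly_at_matrix(coeffs, B):
    n = B.shape[0]
    result = np.zeros_like(B)
    power = np.eye(n, dtype=B.dtype)        # current power of B, starting at I
    for c in coeffs:
        result = result + c * power
        power = power @ B
    return result

B = np.array([[1.0, 0.0],
              [-1.0, 2.0]])

f = [2.0, 0.0, 1.0]                 # f = x^2 + 2
print(eval_poly_at_matrix(f, B))    # should be [[3, 0], [-3, 6]] as in the text

g = [1.0, 4.0]                      # g = 4x + 1, chosen arbitrarily
fg = [2.0, 8.0, 1.0, 4.0]           # coefficients of f*g = (x^2 + 2)(4x + 1)
# (f g)(B) = f(B) g(B), as in Corollary 5.2.6 with matrices in place of operators.
assert np.allclose(eval_poly_at_matrix(fg, B),
                   eval_poly_at_matrix(f, B) @ eval_poly_at_matrix(g, B))
```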
5.3 Lagrange Interpolation
Throughout this section F is a fixed field and t_0, t_1, \ldots, t_n are n + 1 distinct elements of F . Put V = \{f ∈ F[x] : deg(f) ≤ n\}, and define E_i : V → F by E_i(f) = f(t_i), 0 ≤ i ≤ n. By Theorem 5.2.5 each E_i is a linear functional on V . Moreover, we show that B^* = (E_0, E_1, \ldots, E_n) is the basis of V^* dual to a particular basis of V . Put
p_i = \prod_{j \ne i} \frac{x - t_j}{t_i - t_j}.
Then each p_i has degree n, so belongs to V , and
E_j(p_i) = p_i(t_j) = δ_{ij}.   (5.5)
It will turn out that B = (p_0, \ldots, p_n) is a basis for V , and then Eq. 5.5 expresses what we mean by saying that B^* is the basis dual to B.

If f = \sum_{i=0}^{n} c_i p_i, then for each j
f(t_j) = \sum_i c_i p_i(t_j) = c_j.   (5.6)
So if f is the zero polynomial, each c_j must equal 0, implying that the list (p_0, \ldots, p_n) is linearly independent in V . Since (1, x, x^2, \ldots, x^n) is a basis for V , clearly dim(V ) = n + 1. Hence B = (p_0, \ldots, p_n) must be a basis for V . It then follows from Eq. 5.6 that for each f ∈ V , we have
f = \sum_{i=0}^{n} f(t_i) p_i.   (5.7)
The expression in Eq. 5.7 is known as Lagrange's Interpolation Formula. Setting f = x^j in Eq. 5.7 we obtain
x^j = \sum_{i=0}^{n} (t_i)^j p_i.   (5.8)
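Lagrange's Interpolation Formula is easy to implement directly from Eq. 5.7. The sketch below (our own illustration over F = Q using Python's Fraction type, with arbitrarily chosen data) evaluates \sum_i f(t_i) p_i at integer points:

```python
from fractions import Fraction

def lagrange_eval(points, values, x):
    # Evaluate sum_i values[i] * p_i(x), with p_i as in the text.
    total = Fraction(0)
    for i, (ti, yi) in enumerate(zip(points, values)):
        pi = Fraction(1)
        for j, tj in enumerate(points):
            if j != i:
                pi *= Fraction(x - tj, ti - tj)   # the factor (x - t_j)/(t_i - t_j)
        total += yi * pi
    return total

t = [0, 1, 2, 3]                            # t_0, ..., t_n, distinct
y = [Fraction(v) for v in (1, 2, 5, 10)]    # values f(t_i) of f = x^2 + 1
# The interpolant of degree <= 3 through these points agrees with x^2 + 1 everywhere.
for x in range(-2, 6):
    assert lagrange_eval(t, y, x) == x * x + 1
```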
Definition Let A_1 and A_2 be two linear algebras over F . They are said to be isomorphic provided there is a one-to-one mapping u ↦ u' of A_1 onto A_2 such that
(a) (cu + dv)' = cu' + dv'
and
(b) (uv)' = u'v'
for all u, v ∈ A_1 and all scalars c, d ∈ F . The mapping u ↦ u' is called an isomorphism of A_1 onto A_2. An isomorphism of A_1 onto A_2 is thus a vector space isomorphism of A_1 onto A_2 which has the additional property of preserving products.

Example 5.3.1. Let V be an n-dimensional vector space over the field F . As we have seen earlier, each ordered basis B of V determines an isomorphism T ↦ [T]_B of the algebra of linear operators on V onto the algebra of n × n matrices over F . Suppose now that S is a fixed linear operator on V and that we are given a polynomial
f = \sum_{i=0}^{n} c_i x^i
with coefficients c_i ∈ F . Then
f(S) = \sum_{i=0}^{n} c_i S^i.
Since T ↦ [T]_B is a linear mapping,
[f(S)]_B = \sum_{i=0}^{n} c_i [S^i]_B.
From the additional fact that [T1 T2 ]B = [T1 ]B [T2 ]B for all T1 , T2 ∈ L(V ), it follows that [S i ]B = ([S]B )i , 2 ≤ i ≤ n. As this relation is also valid for i = 0 and 1, we obtain the result that Obs. 5.3.2. [f (S)]B = f ([S]B ) . In other words, if S ∈ L(V ), the matrix of a polynomial in S, with respect to a given basis, is the same polynomial in the matrix of S.
5.4 Polynomial Ideals
In this section we are concerned primarily with the fact that F[x] is a principal ideal domain.

Lemma 5.4.1. Suppose f and d are non-zero polynomials in F[x] such that deg(d) ≤ deg(f). Then there exists a polynomial g ∈ F[x] for which deg(f − dg) < deg(f).

Note: This includes the possibility that f = dg so deg(f − dg) = −∞.

Proof. Suppose
f = a_m x^m + \sum_{i=0}^{m-1} a_i x^i, \quad a_m ≠ 0,
and that
d = b_n x^n + \sum_{i=0}^{n-1} b_i x^i, \quad b_n ≠ 0.
Then m ≥ n and
f − \frac{a_m}{b_n} x^{m-n} d = 0 \quad or \quad deg\left(f − \frac{a_m}{b_n} x^{m-n} d\right) < deg(f).
Thus we may take g = \frac{a_m}{b_n} x^{m-n}.

This lemma is useful in showing that the usual algorithm for "long division" of polynomials works over any field.

Theorem 5.4.2. If f, d ∈ F[x] and d ≠ 0, then there are unique polynomials q, r ∈ F[x] such that
(i) f = dq + r;
(ii) deg(r) < deg(d).

Proof. If deg(f) < deg(d) we may take q = 0 and r = f . In case f ≠ 0 and deg(f) ≥ deg(d), the preceding lemma shows that we may choose a polynomial g ∈ F[x] such that deg(f − dg) < deg(f). If f − dg ≠ 0 and deg(f − dg) ≥ deg(d) we choose a polynomial h ∈ F[x] such that deg[f − d(g + h)] < deg(f − dg).
Continuing this process as long as necessary, we ultimately obtain polynomials q, r satisfying (i) and (ii). Suppose we also have f = dq_1 + r_1 where deg(r_1) < deg(d). Then dq + r = dq_1 + r_1 and d(q − q_1) = r_1 − r. If q − q_1 ≠ 0, then d(q − q_1) ≠ 0 and deg(d) + deg(q − q_1) = deg(r_1 − r). But since the degree of r_1 − r is less than the degree of d, this is impossible. Hence q = q_1 and then r = r_1.

Definition Let d be a non-zero polynomial over the field F . If f ∈ F[x], the preceding theorem shows that there is at most one polynomial q ∈ F[x] such that f = dq. If such a q exists we say that d divides f , that f is divisible by d, and call q the quotient of f by d. We also write q = f/d.

Corollary 5.4.3. Let f ∈ F[x], and let c ∈ F . Then f is divisible by x − c if and only if f(c) = 0.

Proof. By the theorem, f = (x − c)q + r where r is a scalar polynomial. By Theorem 5.2.5, f(c) = 0q(c) + r(c) = r(c). Hence r = 0 if and only if f(c) = 0.

Definition Let F be a field. An element c ∈ F is said to be a root or a zero of a given polynomial f ∈ F[x] provided f(c) = 0.

Corollary 5.4.4. A polynomial f ∈ F[x] of degree n has at most n roots in F .

Proof. The result is obviously true for polynomials of degree 0 or 1. We assume it to be true for polynomials of degree n − 1. If a is a root of f , f = (x − a)q where q has degree n − 1. Since f(b) = 0 if and only if a = b or q(b) = 0, it follows by our induction hypothesis that f has at most n roots.

Definition Let F be a field. An ideal in F[x] is a subspace M of F[x] such that fg belongs to M whenever f ∈ F[x] and g ∈ M .
Example 5.4.5. If F is a field and d ∈ F[x], the set M = dF[x] of all multiples df of d by arbitrary f in F[x] is an ideal. This is because d ∈ M (so M is nonempty), and it is easy to check that M is closed under addition and under multiplication by any element of F[x]. The ideal M is called the principal ideal generated by d and is denoted by dF[x]. If d is not the zero polynomial and its leading coefficient is a, then d_1 = a^{-1}d is monic and d_1F[x] = dF[x].

Example 5.4.6. Let d_1, \ldots, d_n be a finite number of polynomials over F . Then the (vector space) sum M of the subspaces d_iF[x] is a subspace and is also an ideal. M is the ideal generated by the polynomials d_1, \ldots, d_n.

The following result is the main theorem on ideals in F[x].

Theorem 5.4.7. Let M be any non-zero ideal in F[x]. Then there is a unique monic polynomial d ∈ F[x] such that M is the principal ideal dF[x] generated by d.

Proof. Among all nonzero polynomials in M there is (at least) one of minimal degree. Hence there must be a monic polynomial d of least degree in M . Suppose that f is any element of M . We can divide f by d and get a unique quotient and remainder: f = qd + r where deg(r) < deg(d). Then r = f − qd ∈ M , but deg(r) is less than the smallest degree of any nonzero polynomial in M . Hence r = 0. So f = qd. This shows that M ⊆ dF[x]. Clearly d ∈ M implies dF[x] ⊆ M , so in fact M = dF[x]. If d_1 and d_2 are two monic polynomials in F[x] for which M = d_1F[x] = d_2F[x], then d_1 divides d_2 and d_2 divides d_1. Since they are monic, they clearly must be identical.

Definition If p_1, \ldots, p_k ∈ F[x] and not all of them are zero, then the monic generator d of the ideal p_1F[x] + \cdots + p_kF[x] is called the greatest common divisor (gcd) of p_1, \ldots, p_k. We say that the polynomials p_1, \ldots, p_k are relatively prime if the greatest common divisor is 1, or equivalently if the ideal they generate is all of F[x].

There is a very important (but now almost trivial) consequence of Theorem 5.4.7. It is easy to show that if A is any n × n matrix over any field F , the set \{f(x) ∈ F[x] : f(A) = 0\} is an ideal in F[x]. Hence there is a unique monic polynomial p(x) of least degree such that p(A) = 0, and if g(x) ∈ F[x] satisfies g(A) = 0, then g(x) = p(x)h(x) for some h(x) ∈ F[x]. This polynomial p(x) is called the minimal polynomial of A.
The exercises at the end of this chapter are to be considered an integral part of the chapter. You should study them all.

Definition The field F is called algebraically closed provided each polynomial in F[x] that is irreducible over F has degree 1.

To say that F is algebraically closed means that every non-scalar irreducible monic polynomial over F is of the form x − c. So to say F is algebraically closed really means that each non-scalar polynomial f in F[x] can be expressed in the form
f = c(x − c_1)^{n_1} \cdots (x − c_k)^{n_k}
where c is a scalar and c_1, \ldots, c_k are distinct elements of F . It is also true that F is algebraically closed provided that each non-scalar polynomial over F has a root in F .

The field R of real numbers is not algebraically closed, since the polynomial x^2 + 1 is irreducible over R but not of degree 1. The Fundamental Theorem of Algebra states that the field C of complex numbers is algebraically closed. We shall prove this theorem later after we have introduced the concepts of determinant and of eigenvalues.

The Fundamental Theorem of Algebra also makes it clear what the possibilities are for the prime factorization of a polynomial with real coefficients. If f is a polynomial with real coefficients and c is a complex root of f , then the complex conjugate \bar{c} is also a root of f . Therefore, those complex roots which are not real must occur in conjugate pairs, and the entire set of roots has the form \{b_1, \ldots, b_r, c_1, \bar{c}_1, \ldots, c_k, \bar{c}_k\}, where b_1, \ldots, b_r are real and c_1, \ldots, c_k are non-real complex numbers. Thus f factors as
f = c(x − b_1) \cdots (x − b_r) p_1 \cdots p_k
where p_i is the quadratic polynomial
p_i = (x − c_i)(x − \bar{c}_i).
These polynomials p_i have real coefficients. We see that every irreducible polynomial over the real number field has degree 1 or 2. Each polynomial over R is the product of certain linear factors given by the real roots of f and certain irreducible quadratic polynomials.
5.5 Exercises
1. (The Binomial Theorem) Let \binom{m}{k} = \frac{m!}{k!(m-k)!} be the usual binomial coefficient. Let a, b be elements of any commutative ring. Then
(a + b)^m = \sum_{k=0}^{m} \binom{m}{k} a^{m-k} b^k.
Note that the binomial coefficient is an integer that may be reduced modulo any modulus p. It follows that even when the denominator of \binom{m}{k} appears to be 0 in some ring of characteristic p, for example, this binomial coefficient can be interpreted modulo p. For example
\binom{6}{3} = 20 ≡ 2 \pmod 3,
even though 3! is zero modulo 3.

The derivative of the polynomial f = c_0 + c_1 x + \cdots + c_n x^n is the polynomial
f' = Df = c_1 + 2c_2 x + \cdots + n c_n x^{n-1}.
Note: D is a linear operator on F[x].

2. (Taylor's Formula) Let F be any field, let n be any positive integer, and let f ∈ F[x] have degree m ≤ n. Then
f = \sum_{k=0}^{n} \frac{D^k(f)(c)}{k!} (x − c)^k.
Be sure to explain how to deal with the case where k! is divisible by the characteristic of the field F . If f ∈ F [x] and c ∈ F , the multiplicity of c as a root of f is the largest positive integer r such that (x − c)r divides f .
3. Show that if the multiplicity of c as a root of f is r ≥ 2, then the multiplicity of c as a root of f' is at least r − 1.

4. Let A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}. Let M = \{f ∈ F[x] : f(A) = 0\}. Show that M is a nonzero ideal. (Hint: consider the polynomial f(x) = x^2 − (a + d)x + (ad − bc).)

Definition Let F be a field. A polynomial f ∈ F[x] is said to be reducible over F provided there are polynomials g, h ∈ F[x] with degree at least 1 for which f = gh. If f is not reducible over F , it is said to be irreducible over F . A polynomial p(x) ∈ F[x] of degree at least 1 is said to be a prime polynomial over F provided whenever p divides a product gh of two polynomials in F[x] then it has to divide at least one of g and h.

5. Show that a polynomial p(x) ∈ F[x] with degree at least 1 is prime over F if and only if it is irreducible over F .

6. (The Primary Decomposition of f) If F is a field, a non-scalar monic polynomial in F[x] can be factored as a product of monic primes in F[x] in one and, except for order, only one way. If p_1, \ldots, p_k are the distinct monic primes occurring in this factorization of f , then
f = p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k},
where n_i is the number of times the prime p_i occurs in this factorization. This decomposition is also clearly unique and is called the primary decomposition of f (or simply the prime factorization of f).

7. Let f be a non-scalar monic polynomial over the field F , and let f = p_1^{n_1} \cdots p_k^{n_k} be the prime factorization of f . For each j, 1 ≤ j ≤ k, let
f_j = \frac{f}{p_j^{n_j}} = \prod_{i \ne j} p_i^{n_i}.
Then f_1, \ldots, f_k are relatively prime.
8. Using the same notation as in the preceding problem, suppose that f = p_1 \cdots p_k is a product of distinct non-scalar irreducible polynomials over F . So f_j = f/p_j. Show that
f' = p_1' f_1 + p_2' f_2 + \cdots + p_k' f_k.

9. Let f ∈ F[x] have derivative f'. Then f is a product of distinct irreducible polynomials over F if and only if f and f' are relatively prime.
10. (Euclidean algorithm for polynomials) Let f(x) and g(x) be polynomials over F for which deg(f(x)) ≥ deg(g(x)) ≥ 1. Use the division algorithm for polynomials to compute polynomials q_i(x) and r_i(x) as follows.
f(x) = q_1(x)g(x) + r_1(x), deg r_1(x) < deg(g(x)). If r_1(x) ≠ 0, then
g(x) = q_2(x)r_1(x) + r_2(x), deg r_2(x) < deg(r_1(x)). If r_2(x) ≠ 0, then
r_1(x) = q_3(x)r_2(x) + r_3(x), deg r_3(x) < deg(r_2(x)). If r_3(x) ≠ 0, then
r_2(x) = q_4(x)r_3(x) + r_4(x), deg r_4(x) < deg(r_3(x)). If r_4(x) ≠ 0, then
\ldots
r_j(x) = q_{j+2}(x)r_{j+1}(x) + r_{j+2}(x), deg r_{j+2}(x) < deg(r_{j+1}(x)). If r_{j+2}(x) = 0,
r_j(x) = q_{j+2}(x)r_{j+1}(x) + 0.
Show that rj+1 (x) = gcd(f (x), g(x)). Then use these equations to obtain polynomials a(x) and b(x) for which rj+1 (x) = a(x)f (x)+b(x)g(x). The case where 1 is the gcd of f (x) and g(x) is especially useful.
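The following sketch (our own illustration over F = Q; polynomials are stored as coefficient lists from degree 0 upward, and the helper names are hypothetical) implements the division algorithm of Theorem 5.4.2 and the Euclidean algorithm of Exercise 10:

```python
from fractions import Fraction

def poly_divmod(f, d):
    # Return (q, r) with f = d*q + r and deg r < deg d.
    f = [Fraction(c) for c in f]
    q = [Fraction(0)] * max(len(f) - len(d) + 1, 1)
    while len(f) >= len(d) and any(f):
        shift = len(f) - len(d)
        factor = f[-1] / Fraction(d[-1])
        q[shift] = factor
        for i, c in enumerate(d):                 # subtract factor * x^shift * d
            f[i + shift] -= factor * Fraction(c)
        while f and f[-1] == 0:                   # strip the cancelled leading term
            f.pop()
    return q, f                                   # f is now the remainder r

def poly_gcd(f, g):
    # Repeated division as in Exercise 10; the last nonzero remainder is the gcd.
    while any(g):
        _, r = poly_divmod(f, g)
        f, g = g, r
    lead = f[-1]
    return [c / lead for c in f]                  # normalize to a monic gcd

# f = (x - 1)(x + 2) = x^2 + x - 2,  g = (x - 1)(x + 3) = x^2 + 2x - 3
f = [-2, 1, 1]
g = [-3, 2, 1]
print([str(c) for c in poly_gcd(f, g)])           # expect ['-1', '1'], i.e. x - 1
```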
Chapter 6
Determinants

We assume that the reader has met the notion of a commutative ring K with 1. Our main goal in this chapter is to study the usual determinant function defined on the set of square matrices with entries from such a K. However, essentially nothing from the general theory of commutative rings with 1 will be used. One of the main types of application of the notion of determinant is to determinants of matrices whose entries are polynomials in one or more indeterminates over a field F . So we might have K = F[x], the ring of polynomials in the indeterminate x with coefficients from the field F . It is also quite useful to consider the theory of determinants over the ring Z of rational integers.
6.1 Determinant Functions
Throughout these notes K will be a commutative ring with identity. Then for each positive integer n we wish to assign to each n × n matrix over K a scalar (element of K) to be known as the determinant of the matrix. As soon as we have defined these terms we may say that the determinant function is n-linear alternating with value 1 at the identity matrix.

Definition Let D be a function which assigns to each n × n matrix A over K a scalar D(A) in K. We say that D is n-linear provided that for each i, 1 ≤ i ≤ n, D is a linear function of the ith row when the other n − 1 rows are held fixed.

Perhaps this definition needs some clarification. If D is a function from
M_n(K) into K, and if α_1, \ldots, α_n are the rows of the matrix A, we also write D(A) = D(α_1, \ldots, α_n), that is, we think of D as a function of the rows of A. The statement that D is n-linear then means
D(α_1, \ldots, cα_i + α_i', \ldots, α_n) = cD(α_1, \ldots, α_i, \ldots, α_n) + D(α_1, \ldots, α_i', \ldots, α_n).   (6.1)
If we fix all rows except row i and regard D as a function of the ith row, it is often convenient to write D(α_i) for D(A). Thus we may abbreviate Eq. 6.1 to
D(cα_i + α_i') = cD(α_i) + D(α_i'),
so long as it is clear what the meaning is. In the following sometimes we use A_{ij} to denote the element in row i and column j of the matrix A, and sometimes we write A(i, j).

Example 6.1.1. Let k_1, \ldots, k_n be positive integers, 1 ≤ k_i ≤ n, and let a be any element of K. For each n × n matrix A over K, define
D(A) = aA(1, k_1)A(2, k_2) \cdots A(n, k_n).   (6.2)
Then the function defined by Eq. 6.2 is n-linear. For, if we regard D as a function of the ith row of A, the others being fixed, we may write D(α_i) = A(i, k_i)b, where b is some fixed element of K (b is a multiplied by one entry of A from each row other than the ith row). Let α_i' = (A'_{i1}, \ldots, A'_{in}). Then we have
D(cα_i + α_i') = [cA(i, k_i) + A'(i, k_i)]b = cD(α_i) + D(α_i').
Thus D is a linear function of each of the rows of A.

A particular n-linear function of this type is just the product of the diagonal entries:
D(A) = A_{11}A_{22} \cdots A_{nn}.
Example 2. We find all 2-linear functions on 2 × 2 matrices over K. Let D be such a function. If we denote the rows of the 2 × 2 identity matrix by ε_1 and ε_2, then we have
D(A) = D(A_{11}ε_1 + A_{12}ε_2, A_{21}ε_1 + A_{22}ε_2).
Using the fact that D is 2-linear, we have
D(A) = A_{11}D(ε_1, A_{21}ε_1 + A_{22}ε_2) + A_{12}D(ε_2, A_{21}ε_1 + A_{22}ε_2) = A_{11}A_{21}D(ε_1, ε_1) + A_{11}A_{22}D(ε_1, ε_2) + A_{12}A_{21}D(ε_2, ε_1) + A_{12}A_{22}D(ε_2, ε_2).
This D is completely determined by the four scalars D(ε_1, ε_1), D(ε_1, ε_2), D(ε_2, ε_1), D(ε_2, ε_2). It is now routine to verify the following. If a, b, c, d are any four scalars in K and if we define
D(A) = A_{11}A_{21}a + A_{11}A_{22}b + A_{12}A_{21}c + A_{12}A_{22}d,
then D is a 2-linear function on 2 × 2 matrices over K and
D(ε_1, ε_1) = a, D(ε_1, ε_2) = b, D(ε_2, ε_1) = c, D(ε_2, ε_2) = d.

Lemma 6.1.2. A linear combination of n-linear functions is n-linear.

Proof. It suffices to prove that a linear combination of two n-linear functions is n-linear. Let D and E be n-linear functions. If a and b are elements of K the linear combination aD + bE is defined by
(aD + bE)(A) = aD(A) + bE(A).
Hence, if we fix all rows except row i,
(aD + bE)(cα_i + α_i') = aD(cα_i + α_i') + bE(cα_i + α_i') = acD(α_i) + aD(α_i') + bcE(α_i) + bE(α_i') = c(aD + bE)(α_i) + (aD + bE)(α_i').
NOTE: If K is a field and V is the set of n×n matrices over K, the above lemma says the following. The set of n-linear functions on V is a subspace of the space of all functions from V into K. Example 3. Let D be the function defined on 2 × 2 matrices over K by D(A) = A11 A22 − A12 A21 .
(6.3)
This D is the sum of two functions of the type described in Example 1:
D = D_1 + D_2, where D_1(A) = A_{11}A_{22} and D_2(A) = −A_{12}A_{21}.   (6.4)
Most readers will recognize this D as the "usual" determinant function and will recall that it satisfies several additional properties, such as the following one.
6.1.3 n-Linear Alternating Functions
Definition: Let D be an n-linear function. We say D is alternating provided D(A) = 0 whenever two rows of A are equal.

Lemma 6.1.4. Let D be an n-linear alternating function, and let A be n × n. If A' is obtained from A by interchanging two rows of A, then D(A') = −D(A).

Proof. If the ith row of A is α and the jth row of A is β, i ≠ j, and all other rows are being held constant, we write D(α, β) in place of D(A).
D(α + β, α + β) = D(α, α) + D(α, β) + D(β, α) + D(β, β).
By hypothesis, D(α + β, α + β) = D(α, α) = D(β, β) = 0. So
0 = D(α, β) + D(β, α).
If we assume that D is n-linear and has the property that D(A0 ) = −D(A) when A0 is obtained from A by interchanging any two rows of A, then if A has two equal rows, clearly D(A) = −D(A). If the characteristic of the ring K is odd or zero, then this forces D(A) = 0, so D is alternating. But, for example, if K is an integral domain with characteristic 2, this is clearly not the case.
6.1.5 A Determinant Function - The Laplace Expansion
Definition: Let K be a commutative ring with 1, and let n be a positive integer. Suppose D is a function from n × n matrices over K into K. We say that D is a determinant function if D is n-linear, alternating and D(I) = 1.

It is clear that there is a unique determinant function on 1 × 1 matrices, and we are now in a position to handle the 2 × 2 case. It should be clear that the function given in Example 3 is a determinant function. Furthermore, the formulas exhibited in Example 2 make it easy to see that the D given in Example 3 is the unique determinant function.

Lemma 6.1.6. Let D be an n-linear function on n × n matrices over K. Suppose D has the property that D(A) = 0 when any two adjacent rows of A are equal. Then D is alternating.

Proof. Let B be obtained by interchanging rows i and j of A, where i < j. We can obtain B from A by a succession of interchanges of pairs of adjacent rows. We begin by interchanging row i with row i + 1 and continue until the rows are in the order α_1, \ldots, α_{i-1}, α_{i+1}, \ldots, α_j, α_i, α_{j+1}, \ldots, α_n. This requires k = j − i interchanges of adjacent rows. We now move α_j to the ith position using (k − 1) interchanges of adjacent rows. We have thus obtained B from A by 2k − 1 interchanges of adjacent rows. Thus by Lemma 6.1.4, D(B) = −D(A).

Suppose A is any n × n matrix with two equal rows, say α_i = α_j with i < j. If j = i + 1, then A has two equal and adjacent rows, so D(A) = 0.
If j > i + 1, we interchange α_{i+1} and α_j and the resulting matrix B has two equal and adjacent rows, so D(B) = 0. On the other hand, D(B) = −D(A), hence D(A) = 0.

Lemma 6.1.7. Let K be a commutative ring with 1 and let D be an alternating n-linear function on n × n matrices over K. Then
(a) D(A) = 0 if one of the rows of A is 0.
(b) D(B) = D(A) if B is obtained from A by adding a scalar multiple of one row of A to a different row of A.

Proof. For part (a), suppose the ith row α_i is a zero row. Using the linearity of D in the ith row of A says D(α_i + α_i) = D(α_i) + D(α_i), which forces D(A) = D(α_i) = 0. For part (b), if i ≠ j, write D(A) = D(α_i, α_j), with all rows other than the ith one held fixed.
D(B) = D(α_i + cα_j, α_j) = D(α_i, α_j) + cD(α_j, α_j) = D(A) + 0.

Definition: If n > 1 and A is an n × n matrix over K, we let A(i|j) denote the (n − 1) × (n − 1) matrix obtained by deleting the ith row and jth column of A. If D is an (n − 1)-linear function and A is an n × n matrix, we put D_{ij}(A) = D[A(i|j)].

Theorem 6.1.8. Let n > 1 and let D be an alternating (n − 1)-linear function on (n − 1) × (n − 1) matrices over K. For each j, 1 ≤ j ≤ n, the function E_j defined by
E_j(A) = \sum_{i=1}^{n} (−1)^{i+j} A_{ij} D_{ij}(A)   (6.5)
is an alternating n-linear function on n × n matrices A. If D is a determinant function, so is each E_j.

Proof. If A is an n × n matrix, D_{ij}(A) is independent of the ith row of A. Since D is (n − 1)-linear, it is clear that D_{ij} is linear as a function of any row except row i. Therefore A_{ij}D_{ij}(A) is an n-linear function of A. Hence E_j is n-linear by Lemma 6.1.2. To prove that E_j is alternating it will suffice to show that E_j(A) = 0 whenever A has two equal and adjacent rows. Suppose α_k = α_{k+1}. If i ≠ k and i ≠ k + 1, the matrix A(i|j) has two equal rows, and thus D_{ij}(A) = 0. Therefore,
E_j(A) = (−1)^{k+j} A_{kj} D_{kj}(A) + (−1)^{k+1+j} A_{(k+1)j} D_{(k+1)j}(A).
Since α_k = α_{k+1}, A_{kj} = A_{(k+1)j} and A(k|j) = A(k + 1|j). Clearly then E_j(A) = 0. Now suppose D is a determinant function. If I^{(n)} is the n × n identity matrix, then I^{(n)}(j|j) is the (n − 1) × (n − 1) identity matrix I^{(n-1)}. Since I^{(n)}_{ij} = δ_{ij}, it follows from Eq. 6.5 that
E_j(I^{(n)}) = D(I^{(n-1)}).   (6.6)
Now D(I^{(n-1)}) = 1, so that E_j(I^{(n)}) = 1 and E_j is a determinant function.
We emphasize that this last Theorem (together with a simple induction argument) shows that if K is a commutative ring with identity and n ≥ 1, then there exists at least one determinant function on K^{n×n}. In the next section we will show that there is only one determinant function. The determinant function E_j is referred to as the Laplace expansion of the determinant along the jth column. There is a similar Laplace expansion of the determinant along the ith row of A which will eventually show up as an easy corollary.
6.2 Permutations & Uniqueness of Determinants

6.2.1 A Formula for the Determinant
Suppose that D is an alternating n-linear function on n × n matrices over K. Let A be an n × n matrix over K with rows α_1, α_2, \ldots, α_n. If we denote the rows of the n × n identity matrix over K by ε_1, ε_2, \ldots, ε_n, then
α_i = \sum_{j=1}^{n} A_{ij} ε_j, \quad 1 ≤ i ≤ n.   (6.7)
Hence
D(A) = D\left( \sum_j A_{1j} ε_j, α_2, \ldots, α_n \right) = \sum_j A_{1j} D(ε_j, α_2, \ldots, α_n).
If we now replace α_2 with \sum_k A_{2k} ε_k, we see that
D(A) = \sum_{j,k} A_{1j} A_{2k} D(ε_j, ε_k, \ldots, α_n).
In this expression replace α_3 by \sum_l A_{3l} ε_l, etc. We finally obtain
D(A) = \sum_{k_1, k_2, \ldots, k_n} A_{1k_1} A_{2k_2} \cdots A_{nk_n} D(ε_{k_1}, \ldots, ε_{k_n}).   (6.8)
Here the sum is over all sequences (k_1, k_2, \ldots, k_n) of positive integers not exceeding n. This shows that D is a finite sum of functions of the type described by Eq. 6.2. Note that Eq. 6.8 is a consequence just of the assumption that D is n-linear, and that a special case was obtained earlier in Example 2. Since D is alternating, D(ε_{k_1}, ε_{k_2}, \ldots, ε_{k_n}) = 0 whenever two of the indices k_i are equal. A sequence (k_1, k_2, \ldots, k_n) of positive integers not exceeding n, with the property that no two of the k_i are equal, is called a permutation of degree n. In Eq. 6.8 we need therefore sum only over those sequences which are permutations of degree n.

A permutation of degree n may be defined as a one-to-one function from the set \{1, 2, \ldots, n\} onto itself. Such a function σ corresponds to the n-tuple (σ1, σ2, \ldots, σn) and is thus simply a rule for ordering 1, 2, \ldots, n in some well-defined way. If D is an alternating n-linear function and A is an n × n matrix over K, we then have
D(A) = \sum_{σ} A_{1(σ1)} \cdots A_{n(σn)} D(ε_{σ1}, \ldots, ε_{σn})   (6.9)
where the sum is extended over the distinct permutations σ of degree n.
Next we shall show that
D(ε_{σ1}, \ldots, ε_{σn}) = ±D(ε_1, \ldots, ε_n)   (6.10)
where the sign ± depends only on the permutation σ. The reason for this is as follows. The sequence (σ1, σ2, \ldots, σn) can be obtained from the sequence (1, 2, \ldots, n) by a finite number of interchanges of pairs of elements. For example, if σ1 ≠ 1, we can transpose 1 and σ1, obtaining (σ1, \ldots, 1, \ldots). Proceeding in this way we shall arrive at the sequence (σ1, \ldots, σn) after n or fewer such interchanges of pairs. Since D is alternating, the sign of its value changes each time that we interchange two of the rows ε_i and ε_j. Thus, if we pass from (1, 2, \ldots, n) to (σ1, σ2, \ldots, σn) by means of m interchanges of pairs (i, j), we shall have
D(ε_{σ1}, \ldots, ε_{σn}) = (−1)^m D(ε_1, \ldots, ε_n).
In particular, if D is a determinant function
D(ε_{σ1}, \ldots, ε_{σn}) = (−1)^m,   (6.11)
where m depends only on σ, not on D. Thus all determinant functions assign the same value to the matrix with rows ε_{σ1}, \ldots, ε_{σn}, and this value is either 1 or −1.

A basic fact about permutations is the following: if σ is a permutation of degree n, one can pass from the sequence (1, 2, \ldots, n) to the sequence (σ1, σ2, \ldots, σn) by a succession of interchanges of pairs, and this can be done in a variety of ways. However, no matter how it is done, the number of interchanges used is either always even or always odd. The permutation is then called even or odd, respectively. One defines the sign of a permutation by
sgn σ = 1 if σ is even, and sgn σ = −1 if σ is odd.
We shall establish this basic property of permutations below from what we already know about determinant functions. However, for the moment let us assume this property. Then the integer m occurring in Eq. 6.11 is always even if σ is an even permutation, and is always odd if σ is an odd permutation. For any alternating n-linear function D we then have
D(ε_{σ1}, \ldots, ε_{σn}) = (sgn σ)D(ε_1, \ldots, ε_n),
and using Eq. 6.9 we obtain
D(A) = \left[ \sum_{σ} (sgn σ) A_{1(σ1)} \cdots A_{n(σn)} \right] D(I).   (6.12)
From Eq. 6.12 we see that there is precisely one determinant function on n × n matrices over K. If we denote this function by det, it is given by
det(A) = \sum_{σ} (sgn σ) A_{1(σ1)} A_{2(σ2)} \cdots A_{n(σn)},   (6.13)
the sum being extended over the distinct permutations σ of degree n. We can formally summarize this as follows.

Theorem 6.2.2. Let K be a commutative ring with 1 and let n be a positive integer. There is precisely one determinant function on the set of n × n matrices over K and it is the function det defined by Eq. 6.13. If D is any alternating n-linear function on M_n(K), then for each n × n matrix A, D(A) = (det A)D(I).

This is the theorem we have been working towards, but we have left a gap in the proof. That gap is the proof that for a given permutation σ, when we pass from (1, 2, \ldots, n) to (σ1, \ldots, σn) by interchanging pairs, the number of interchanges is always even or always odd. This basic combinatorial fact can be proved without any reference to determinants. However, we now point out how it follows from the existence of a determinant function on n × n matrices.

Let K be the ring of rational integers. Let D be a determinant function on the n × n matrices over K. Let σ be a permutation of degree n, and suppose we pass from (1, 2, \ldots, n) to (σ1, \ldots, σn) by m interchanges of pairs (i, j), i ≠ j. As we showed in Eq. 6.11
(−1)^m = D(ε_{σ1}, \ldots, ε_{σn}),
that is, the number (−1)^m must be the value of D on the matrix with rows ε_{σ1}, \ldots, ε_{σn}. If D(ε_{σ1}, \ldots, ε_{σn}) = 1, then m must be even. If D(ε_{σ1}, \ldots, ε_{σn}) = −1,
then m must be odd. From the point of view of products of permutations, the basic property of the sign of a permutation is that sgn (στ ) = (sgn σ)(sgn τ ).
(6.14)
This result also follows from the theory of determinants (but is well known in the theory of the symmetric group independent of any determinant theory). In fact, it is an easy corollary of the following theorem. Theorem 6.2.3. Let K be a commutative ring with identity, and let A and B be n × n matrices over K. Then det (AB) = (det A)( det B). Proof. Let B be a fixed n × n matrix over K, and for each n × n matrix A define D(A) = det (AB). If we denote the rows of A by α1 , . . . , αn , then D(α1 , . . . , αn ) = det (α1 B, . . . , αn B). Here αj B denotes the 1 × n matrix which is the product of the 1 × n matrix αj and the n × n matrix B. Since (cαi + αi0 )B = cαi B + αi0 B and det is n-linear, it is easy to see that D is n-linear. If αi = αj , then αi B = αj B, and since det is alternating, D(α1 , . . . , αn ) = 0. Hence, D is alternating. So D is an alternating n-linear function, and by Theorem 6.2.2 D(A) = (det A)D(I). But D(I) = det (IB) = det B, so det (AB) = D(A) = (det A)(det B).
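Eq. 6.13 can be turned directly into (very inefficient) code, which also lets one check Theorem 6.2.3 on examples. The sketch below (our own illustration over K = Z, with two arbitrary 3 × 3 matrices) computes sgn σ by counting inversions:

```python
from itertools import permutations

def sign(perm):
    # sgn(sigma) via the parity of the number of inversions.
    inversions = sum(1 for i in range(len(perm))
                       for j in range(i + 1, len(perm))
                       if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det(A):
    # det(A) = sum over permutations of sgn(sigma) * A[0][sigma(0)] * ... (Eq. 6.13)
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        term = sign(sigma)
        for i in range(n):
            term *= A[i][sigma[i]]
        total += term
    return total

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 2, 0], [3, -1, 4], [0, 5, 2]]
B = [[2, 0, 1], [1, 1, 0], [0, 3, -2]]
assert det(matmul(A, B)) == det(A) * det(B)      # det(AB) = (det A)(det B)
print(det(A), det(B), det(matmul(A, B)))
```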
6.3 Additional Properties of Determinants
Several well-known properties of the determinant function are now easy consequences of results we have already obtained, Eq. 6.13 and Theorem 6.2.3, for example. We give a few of these, proving some and leaving the others as rather routine exercises. Recall that a unit u in a ring K with 1 is an element for which there is an element v ∈ K for which uv = vu = 1.
6.3.1 If A is a unit in Mn(K), then det(A) is a unit in K
If A is an invertible n × n matrix over a commutative ring K with 1, then det (A) is a unit in the ring K.
6.3.2 Triangular Matrices
If the square matrix A over K is upper or lower triangular, then det (A) is the product of the diagonal entries of A.
6.3.3 Transposes
If A^T is the transpose of the square matrix A, then det(A^T) = det(A).

Proof. If σ is a permutation of degree n, (A^T)_{i(σi)} = A_{(σi)i}. Hence
det(A^T) = \sum_{σ} (sgn σ) A_{(σ1)1} \cdots A_{(σn)n}.
When i = σ^{-1}j, A_{(σi)i} = A_{j(σ^{-1}j)}. Thus
A_{(σ1)1} \cdots A_{(σn)n} = A_{1(σ^{-1}1)} \cdots A_{n(σ^{-1}n)}.
Since σσ^{-1} is the identity permutation,
(sgn σ)(sgn σ^{-1}) = 1, so sgn(σ^{-1}) = sgn(σ). Furthermore, as σ varies over all permutations of degree n, so does σ^{-1}. Therefore,
det(A^T) = \sum_{σ} (sgn σ^{-1}) A_{1(σ^{-1}1)} \cdots A_{n(σ^{-1}n)} = det(A).
6.3.4 Elementary Row Operations
If B is obtained from A by adding a multiple of one row of A to another (or a multiple of one column to another), then det(A) = det(B). If B = cA, then det(B) = c^n det(A).
6.3.5 Triangular Block Form
Suppose an n × n matrix A is given in block form
A = \begin{pmatrix} B & C \\ 0 & E \end{pmatrix},
where B is an r × r matrix, E is an s × s matrix, C is r × s, and 0 denotes the s × r zero matrix. Then
det \begin{pmatrix} B & C \\ 0 & E \end{pmatrix} = (det B)(det E).   (6.15)

Proof. To prove this, define
D(B, C, E) = det \begin{pmatrix} B & C \\ 0 & E \end{pmatrix}.
If we fix B and C, then D is alternating and s-linear as a function of the rows of E. Hence by Theorem 6.2.2 D(B, C, E) = (det E)D(B, C, I),
where I is the s × s identity matrix. By subtracting multiples of the rows of I from the rows (B C) and using the result of 6.3.4, we obtain D(B, C, I) = D(B, 0, I). Now D(B, 0, I) is clearly alternating and r-linear as a function of the rows of B. Thus D(B, 0, I) = (det B)D(I, 0, I), and D(I, 0, I) = det(I) = 1. Hence
D(B, C, E) = (det E)D(B, C, I) = (det E)D(B, 0, I) = (det E)(det B).
By taking transposes we obtain
det \begin{pmatrix} A & 0 \\ B & C \end{pmatrix} = (det A)(det C).
It is now easy to see that this result generalizes immediately to the case where A is in upper (or lower) block triangular form.
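A quick numerical check of Eq. 6.15 (our own sketch over R; the blocks below are arbitrary):

```python
import numpy as np

B = np.array([[2.0, 1.0],
              [0.0, 3.0]])                 # r x r block
E = np.array([[1.0, 4.0, 0.0],
              [2.0, 1.0, 1.0],
              [0.0, 1.0, 5.0]])            # s x s block
C = np.arange(6.0).reshape(2, 3)           # r x s block; its entries do not matter

A = np.block([[B, C],
              [np.zeros((3, 2)), E]])      # upper block-triangular matrix
assert np.isclose(np.linalg.det(A), np.linalg.det(B) * np.linalg.det(E))
print(np.linalg.det(A))
```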
6.3.6 The Classical Adjoint
Since the determinant function is unique and det(A) = det(A^T), we know that for each fixed column index j,
det(A) = \sum_{i=1}^{n} (−1)^{i+j} A_{ij} det A(i|j),   (6.16)
and for each row index i,
det(A) = \sum_{j=1}^{n} (−1)^{i+j} A_{ij} det A(i|j).   (6.17)
As we mentioned earlier, the formulas of Eqs. 6.16 and 6.17 are known as the Laplace expansion of the determinant in terms of columns, respectively, rows. Later we will present a more general version of the Laplace expansion. The scalar (−1)i+j det A(i|j) is usually called the i, j cofactor of A or the cofactor of the i, j entry of A. The above formulas for det (A) are called the expansion of det (A) by cofactors of the jth column (sometimes the expansion by minors of the jth column), or respectively, the expansion of det (A) by cofactors of the ith row (sometimes the expansion by minors of the ith row). If we set Cij = (−1)i+j det A(i|j), then the formula in Eq. 6.16 says that for each j,
det(A) = \sum_{i=1}^{n} A_{ij} C_{ij},
where the cofactor C_{ij} is (−1)^{i+j} times the determinant of the (n − 1) × (n − 1) matrix obtained by deleting the ith row and jth column of A. Similarly, for each fixed row index i,
det(A) = \sum_{j=1}^{n} A_{ij} C_{ij}.
If j ≠ k, then
\sum_{i=1}^{n} A_{ik} C_{ij} = 0.
To see this, replace the jth column of A by its kth column, and call the resulting matrix B. Then B has two equal columns and so det(B) = 0.
76
CHAPTER 6. DETERMINANTS
Since B(i|j) = A(i|j), we have 0 = det (B) n X = (−1)i+j Bij det (B(i|j)) i=1
=
n X
(−1)i+j Aik det (A(i|j))
i=1
=
n X
Aik Cij .
i=1
These properties of the cofactors can be summarized by n X
Aik Cij = δjk det (A).
(6.18)
i=1
The n × n matrix adj A , which is the transpose of the matrix of cofactors of A is called the classical adjoint of A. Thus (adj A)ij = Cji = (−1)i+j det (A(j|i)).
(6.19)
These last two formulas can be summarized in the matrix equation (adj A)A = (det (A))I.
(6.20)
(To see this, just compute the (j, k) entry on both sides of this equation.) We wish to see that A(adj A) = (det A)I also. Since AT (i|j) = (A(j|i))T , we have (−1)i+j det (AT (i|j)) = (−1)i+j det (A(j|i)), which simply says that the i, j cofactor of AT is the j, i cofactor of A. Thus adj (AT ) = (adj A)T . Applying Eq. 6.20 to AT , we have (adj AT )AT = (det (AT ))I = (det A)I. Transposing, we obtain
(6.21)
6.3. ADDITIONAL PROPERTIES OF DETERMINANTS
77
A(adj AT )T = (det (A))I. Using Eq. 6.21 we have what we want: A(adj A) = (det (A))I.
(6.22)
An almost immediate corollary of the previous paragraphs is the following: Theorem 6.3.7. Let A be an n × n matrix over K. Then A is invertible over K if and only if det (A) is invertible in K. When A is invertible, the unique inverse for A is A−1 = (det A)−1 adj A. In particular, an n × n matrix over a field is invertible if and only if its determinant is different from zero. NOTE: This determinant criterion for invertibility proves that an n × n matrix with either a left or right inverse is invertible. NOTE: The reader should think about the consequences of Theorem 6.3.7 in case K is the ring F [x] of polynomials over a field F , or in case K is the ring of rational integers.
6.3.8
Characteristic Polynomial of a Linear Map
If P is also an n × n invertible matrix, then because K is commutative and det is multiplicative, it is immediate that det (P −1 AP ) = det (A).
(6.23)
This means that if K is actually a field, if V is an n-dimensional vector space over K, if T : V → V is any linear map, and if B is any basis of V , then we may unambiguously define the characteristic polynomial cT (x) of T to be cT (x) = det(xI − [T ]B ). This is because if A and B are two matrices that represent the same linear transformation with respect to some bases of V , then by Eq. 6.23 and Theorem 4.7.1
78
CHAPTER 6. DETERMINANTS
det(xI − A) = det(xI − B). Given a square matrix A, a principal submatrix of A is a square submatrix centered about the diagonal of A. So a 1 × 1 principal submatrix is simply a diagonal element of A. Theorem 6.3.9. Let K be a commutative ring with 1, and let A be an n × n matrix over K. The characteristic polynomial of A is given by f (x) = det(xI − A) =
n X
ci xn−i
(6.24)
i=0
where c0 = 1, and for 1 ≤ i ≤ n, ci = i × i principal submatrices of −A.
P
det(B), where B ranges over all the
For an n × n matrix A, the trace of A is defined to be tr(A) =
n X
Aii .
i=1
Note: Putting i = 1 yields the fact that the coefficient of xn−1 is P − ni=1 Aii = −tr(A), and putting i = n says that the constant term is (−1)n det(A). Proof. Clearly det(xI − A) is a polynomial of degree n which is monic, i.e., c0 = 1, and and with constant term det(−A) = (−1)n det(A). Suppose 1 ≤ i ≤ n − 1 and consider the coefficient ci of xn−i in the polynomial det(xI − A). Recall that in general, if D = (dij ) is an n × n matrix over a commutative ring with 1, then X det(D) = (−1)sgn(π) · d1,π(1) d2,π(2) · · · dn,π(n) . π∈Sn
P So to get a term of degree n − i in det(xI − A) = π∈Sn (−1)sgn(π) (xI − A)1,π(1) · · · (xI − A)n,π(n) we first select n − i indices j1 , . . . , jn−i , with complementary indices k1 , . . . , ki . Then in expanding the product (xI−A)1,π(1) · · · (xI− A)n,π(n) when π fixes j1 , . . . , jn−i , we select the term x from the factors (xI − A)j1 ,j1 , . . . , (xI − A)jn−i jn−i , and the terms (−A)k1 ,π(k1 ) , . . . , (−A)ki ,π(ki ) otherwise. So if A(k1 , . . . , ki ) is the principal submatrix of A indexed by rows
6.3. ADDITIONAL PROPERTIES OF DETERMINANTS
79
and columns k1 , . . . , ki , then det(−A(k1 , . . . , ki )) isPthe associated contribution to the coefficient of xn−i . It follows that ci = det(B) where B ranges over all the principal i × i submatrices of −A. Suppose the permutation πP∈ Sn consists of k permutation cycles of sizes l1 , . . . , lk , respectively, where li = n. Then sgn(π) can be computed by sgn(π) = (−1)l1 −1+l2 −1+···lk −1 = (−1)n−k = (−1)n (−1)k . We record this formally as: sgn(π) = (−1)n (−1)k if π ∈ Sn is the product of k disjoint cycles. (6.25)
6.3.10
The Cayley-Hamilton Theorem
In this section we let K be an integral domain (i.e., K is a commutative ring with 1 such that a · b = 0 if and only if either a = 0 or b = 0) and let λ be an indeterminate over K, so that the polynomial ring K[λ] is also an integral domain. If B is any n × n matrix over an appropriate ring, |B| denotes its determinant. Let A be an n × n matrix over K and let n X f (λ) = |λI − A| = λn + cn−1 λn−1 + · · · + c1 λ + c0 = ci λ i i=0
be the characteristic polynomial of A. (Note that we put cn = 1.) The Cayley-Hamilton Theorem asserts that f (A) = 0. The proof we give actually allows us to give some other interesting results also. The classical adjoint of the matrix A (also known as the adjugate of A), will be denoted by Aadj . Recall that (Aadj )ij = (−1)i+j det(A(j|i)). Since λI − A has entries from an integral domain, it also has a classical adjoint (i.e., an adjugate), and (λI − A)adj
ij
= (−1)i+j det ((λI − A)(j|i)) .
Since the (i, j) entry of (λI − A)adj is a polynomial in λ of degree at most n − 1, it follows that (λI − A)adj must itself be a polynomial in λ of degree at most n − 1 and whose coefficients are n × n matrices over K. Say adj
(λI − A)
=
n−1 X i=0
Di λi
80
CHAPTER 6. DETERMINANTS
for some n × n matrices D0 , . . . , Dn−1 . At this point note that if λ = 0 we have: (−A)adj = (−1)n−1 Aadj = D0 . On the one hand we know that (λI − A)(λI − A)adj = det(λI − A) · I = f (λ) · I. On the other hand we have ! n−1 n−1 n X X X i i n (λI − A) Di λ = −AD0 + (Di−1 − ADi )λ + Dn−1 λ = ci · I · λi . i=0
i=1
i=0
If we define Dn = D−1 = 0, we can write n X
(Di−1 − ADi )λi =
i=0
n X
ci · I · λ i .
i=0
So Di−1 −ADi = ci ·I, 0 ≤ i ≤ n, which implies that Aj Dj−1 −Aj+1 Dj = cj Aj . Thus we find f (A)
n X
=
cj A j =
j=0
n X
Aj (Dj−1 − ADj ) =
j=0
= (D−1 − AD0 ) + (AD0 − A2 D1 ) + (A2 D1 − A3 D2 ) + + · · · + (An−1 Dn−2 − An Dn−1 ) + (An Dn−1 − An+1 Dn ) = D−1 − An+1 Dn = 0. This proves the Cayley-Hamilton theorem. Multiply Dj−1 −Aj Dj = cj I on the left by Aj−1 to get Aj−1 D−1 −Aj Dj = cj Aj−1 , and then sum for 1 ≤ j ≤ n. This gives n X
j−1
cj A
j=1
n X = (Aj−1 Dj−1 − Aj Dj ) = D0 − An Dn = D0 , j=1
so (−1)n−1 Aadj = D0 = An−1 + cn−1 An−2 + · · · + c1 I. Put g(λ) = (−1)n−1 Aadj , or
f (λ)−f (0) λ−0
= λn−1 + cn−1 λn−2 + · · · + c1 to obtain g(A) = Aadj = (−1)n−1 g(A).
(6.26)
6.3. ADDITIONAL PROPERTIES OF DETERMINANTS
81
This shows that the adjugate of a matrix is a polynomial in that matrix. Then if det(A) is a unit in K, so A is invertible, we see how to view A−1 as a polynomial in A. Now use the equations Dj−1 − ADj = cj I (with D−1 = Dn = 0, cn = 1) in a slightly different manner. Dn−1 Dn−2 Dn−3 Dn−4
= = = = .. .
Dn−j = .. . Dj = .. . D2 = D1 = D0 =
I ADn−1 + cn−1 I = A + cn−1 I ADn−2 + cn−2 I = A2 + cn−1 A + cn−2 I An−3 + cn−3 I = A3 + cn−1 A2 + cn−2 A + cn−3 I Aj−1 + cn−1 Aj−2 + cn−2 Aj−3 + · · · + cn−j+2 A + cn−j+1 I An−j−1 + cn−1 An−j−2 + cn−2 An−j−3 + · · · + cj+2 A + cj+1 I
AD3 + c3 I = An−3 + cn−1 An−4 + cn−2 An−5 + · · · + c4 A + c3 I AD2 + c2 I = An−2 + cn−1 An−3 + cn−2 An−4 + · · · + c3 A + c2 I AD1 + c1 I = An−1 + cn−1 An−2 + cn−2 An−3 + · · · + c2 A + c1 I Pn−1 Di λi and colSubstituting these values for Di into (λI − A)adj = i=0 lecting coefficients on fixed powers of A we obtain (λI − A)adj = + .. . + +
6.3.11
An−1 + (λ + cn−1 )An−2 + (λ2 + cn−1 λ + cn−2 )An−3 (λ3 + cn−1 λ2 + cn−2 λ + cn−3 )An−4 (6.27) n−2
n−3
n−4
(λ + cn−1 λ + cn−2 λ + · · · + c3 λ + c2 )A n−1 n−2 n−3 (λ + cn−1 λ + cn−2 λ + cn−3 λn−4 + · · · + c2 λ + c1 ) · A0
The Companion Matrix of a Polynomial
In this section K is a field and f (x) = xn + an−1 xn−1 + · · · + a1 x + a0 ∈ K[x]. Define the companion matrix C(f (x)) by
82
CHAPTER 6. DETERMINANTS
C(f (x)) =
· · · 0 0 −a0 · · · 0 0 −a1 · · · 0 0 −a2 .. . . . .. .. . . . 0 0 · · · 1 0 −an−2 0 0 · · · 0 1 −an−1 0 1 0 .. .
0 0 1 .. .
.
The main facts about C(f (x)) are in the next result. Theorem 6.3.12. det (xIn − C(f (x))) = f (x) is both the minimal and characteristic polynomial of C(f (x)). Proof. First we establish that f (x) = det(xIn − C(f (x))). This result is clear if n = 1 and we proceed by induction. Suppose that n > 1 and compute the determinant by cofactor expansion along the first row, applying the induction hypothesis to the first summand. det(xIn − C(f (x))) = det
x 0 ··· 0 0 a0 −1 x · · · 0 0 a1 0 −1 · · · 0 0 a2 .. .. . . . .. .. . .. . . . . 0 0 · · · −1 x an−2 0 0 · · · 0 −1 x + an−1
x ··· 0 0 a1 −1 · · · 0 0 a2 .. . . . . . . . . = x det . . . . . 0 · · · −1 x an−2 0 · · · 0 −1 x + an−1 −1 x · · · 0 0 0 −1 · · · 0 0 .. . . .. .. +a0 (−1)n+1 det ... . . . . 0 0 · · · −1 x 0 0 · · · 0 −1
6.3. ADDITIONAL PROPERTIES OF DETERMINANTS
83
= x(xn−1 + an−1 xn−2 + · · · + a1 ) + a0 (−1)n+1 (−1)n−1 = xn + an−1 xn−1 + · · · + a1 x + a0 = f (x). This shows that f (x) is the characteristic polynomial of C(f (x)). Now let T be the linear operator on K n whose matrix with respect to the standard basis S = (e1 , e2 , . . . , en ) is C(f (x)). Then T e1 = e2 , T 2 e1 = T e2 = e3 , . . . , T j e1 = T (ej ) = ej+1 for 1 ≤ j ≤ n − 1, and T en = −a0 e1 − a1 e2 − · · · − an−1 en , so (T n + an−1 T n−1 + · · · + a1 T + a0 I)e1 = 0. Also (T n + · · · + a1 T + a0 I)ej+1 = (T n + · · · + a1 T + a0 I)T j e1 = T j (T n + · · · + a1 T + a0 I)e1 = 0. It follows that f (T ) must be the zero operator. On the other hand, (e1 , T e1 , . . . , T n−1 e1 ) is a linearly independent list, so that no nonzero polynomial in T with degree less than n can be the zero operator. Then since f (x) is monic it must be that f (x) is also the minimal polynomial for T and hence for C(f (x)).
6.3.13
The Cayley-Hamilton Theorem: A Second Proof
Let dim(V ) = n and let T ∈ L(V ). If f is the characteristic polynomial for T , then f (T ) = 0. This is equivalent to saying that the minimal polynomial for T divides the characteristic polynomial for T . Proof. This proof is an illuminating and fairly sophisticated application of the general theory of determinants developed above. Let K be the commutative ring with identity consisting of all polynomials in T . Actually, K is a commutative algebra with identity over the scalar field F . Choose a basis B = (v1 , . . . , vn ) for V and let A be the matrix which represents T in the given basis. Then T (vj ) =
n X
Aij vi , 1 ≤ j ≤ n.
i=1
These equations may be written in the equivalent form
84
CHAPTER 6. DETERMINANTS
n X
(δij T − Aij I)vi = ~0, 1 ≤ j ≤ n.
i=1
Let B ∈ Mn (K) be the matrix with entries Bij = δij T − Aji I. Note the interchanging of the i and j in the subscript on A. Also, keep in mind that each element of B is a polynomial in T . Let f (x) = det (xI − A) = T det xI − A . Then f (T ) = det(B). (This is an element of K. Think about this equation until it seems absolutely obvious.) Our goal is to show that f (T ) = 0. In order that f (T ) be the zero operator, it is necessary and sufficient that det(B)(vk ) = ~0 for 1 ≤ k ≤ n. By the definition of B, the vectors v1 , . . . , vn satisfy the equations ~0 =
n X
(δij T − Aij I)(vi ) =
i=1
n X
Bji vi , 1 ≤ j ≤ n.
(6.28)
i=1
˜ be the classical adjoint of B, so that BB ˜ = BB ˜ = det(B)I. Note Let B ˜ ˜kj that B also has entries that are polynomials in the operator T . Let B operate on the right side of Eq. 6.28 to obtain ˜kj ~0 = B
n X
Bji (vi ) =
i=1
n X
(˜bkj Bji )(vi ).
i=1
So summing over j we have ! n n n X X X ˜ ~0 = Bkj Bji (vi ) = (δki det(B)) (vi ) = det(B)(vk ). i=1
j=1
i=1
At this point we know that each irreducible factor of the minimal polynomial of T is also a factor of the characteristic polynomial of T . A converse is also true: Each irreducible factor of the characteristic polynomial of T is also a factor of the minimal polynomial of T . However, we are not yet ready to give a proof of this fact.
6.3. ADDITIONAL PROPERTIES OF DETERMINANTS
6.3.14
85
Cramer’s Rule
We now discuss Cramer’s rule for solving systems of linear equations. Suppose A is an n × n matrix over the field F and we wish to solve the system of linear equations AX = Y for some given n-tuple (y1 , . . . , yn ). If AX = Y , then (adj A)AX = (adj A)Y implying (det A)X = (adj A)Y. Thus computing the j-th row on each side we have (det A)xj =
n X
(adj A)ji yi
i=1 n X = (−1)i+j yi det A(i|j). i=1
This last expression is the determinant of the n × n matrix obtained by replacing the jth column of A by Y . If det A = 0, all this tells us nothing. But if det A 6= 0, we have Cramer’s rule: Let A be an n × n matrix over the field F such that det A 6= 0. If y1 , . . . , yn are any scalars in F , the unique solution X = A−1 Y of the system of equations AX = Y is given by det Bj , j = 1, . . . , n, det A where Bj is the n × n matrix obtained from A by replacing the jth column of A by Y . xj =
6.3.15
Comparing ST with T S
We begin this subsection with a discussion (in terms of linear operators) that is somewhat involved and not nearly as elegant as that given at the end. But we find the techniques introduced first to be worthy of study. Then at the end of the subsection we give a better result with a simpler proof. Theorem 6.3.16 (a) Let V be finite dimensional over F and suppose S, T ∈ L(V ). Then ST and T S have the same eigenvalues, but not necessarily the same minimal polynomial.
86
CHAPTER 6. DETERMINANTS
Proof. Let λ be an eigenvalue of ST with nonzero eigenvector v. Then T S(T (v)) = T (ST (v)) = T (λv) = λT (v). This says that if T (v) 6= 0, then λ is an eigenvalue of T S with eigenvector T (v). However, it might be the case that T (v) = 0. In that case T is not invertible, so T S is not invertible. Hence it is not one-to-one. This implies that λ = 0 is an eigenvalue of T S. So each eigenvalue of ST is an eigenvalue of T S. By a symmetric argument, each eigenvalue of T S is an eigenvalue of ST . To show that ST and T S might not have the same minimal polynomial, consider the matrices 0 1 0 0 S= ; T = . 0 0 0 1 Then
ST =
0 1 0 0
and T S =
0 0 0 0
.
The minimal polynomial of ST is x2 and that of T S is x. Additional Comment 1: The above proof (and example) show that the multiplicities of 0 as a root of the minimal polynomials of ST and T S may be different. This raises the question as to whether the multiplicities of nonzero eigenvalues as roots of the minimal polynomials of ST and T S could be different. However, this cannot happen. We sketch a proof of this. Suppose the minimal polynomial of ST has the factor (x−λ)r for some positive integer r and some nonzero eigenvalue λ. This means that on the one hand, for each generalized eigenvector w 6= 0 associated with λ, (ST − λI)r (w) = 0, and on the other hand, there is a nonzero generalized eigenvector v associated with λ such that (ST − λI)r−1 (v) 6= 0. (See the beginning of Chapter 10.) Step 1. First prove (say by induction on m) that T (ST − λI)m = (T S − λI) T and (interchanging the roles of S and T ) S(T S−λI)m = (ST −λI)m S. This is for all positive integers m. Step 2. Show ST (v) 6= 0, as follows: Assume that ST (v) = 0. Then m
0 = = = =
(ST − λI)r (v) (ST − λI)r−1 (ST − λI)(v) (ST − λI)r−1 (0 − λv) −λ(ST − λI)r−1 (v) 6= 0,
6.3. ADDITIONAL PROPERTIES OF DETERMINANTS
87
a clear contradiction. Note that this also means that T (v) 6= 0. Step 3. From the above we have 0 = T (ST − λI)r (v) = (T S − λI)r T (v), so T (v) is a generalized eigenvector of T S associated with λ. Step 4. Show that (T S − λI)r−1 T (v) 6= 0. This will imply that (x − λ)r divides the minimum polynomial of T S. Then interchanging the roles of S and T will show that if (x−λ)r divides the minimum polynomial of T S, it also divides the minimum polynomial of ST . Hence for each nonzero eigenvalue λ, it has the same multiplicity as a factor in the minimum polynomial of T S as it has in the minimum polynomial of ST . The details are: Suppose that 0 = (T S − λI)r−1 T (v), then 0 = S(0) = S(T S − λI)r−1 T (v) = (ST − λI)r−1 ST (v) But then 0 = (ST − λI)r (v) = (ST − λI)r−1 (ST − λI)(v) = (ST − λI)r−1 (ST (v) − λv) = 0 − λ(ST − λI)r−1 (v) 6= 0, a clear contradiction. This completes the proof. Additional Comment 2: It would be easy to show that the two characteristic polynomials are equal if any two n × n matrices were simultaneously upper triangularizable. It is true that if two such matrices commute then (over an algebraically closed field) they are indeed simultaneously upper triangularizable. However, consider the following example: 1 0 1 1 S= , and T = . 0 0 1 0 a b Suppose there were an invertible matrix P = for which both c d P SP −1 and P T P −1 were upper triangular. Let ∆ = detP = ad − bc 6= 0. Then 1 ad −ab −1 , P SP = ∆ cd −bc
88
CHAPTER 6. DETERMINANTS
which is upper triangular if and only if cd = 0. 1 ad + bd − ac −ab − b2 + a2 −1 PTP = , cd + d2 − c2 cd + d2 − c2 ∆ which is upper triangular if and only if d2 + cd − c2 = 0. If c = 0 this forces d = 0, giving ∆ = 0, a contradiction. If d = 0, then c = 0. Additional Comment 3: The preceding Comment certainly raises the question as to whether or not the characteristic polynomial of ST can be different from that of T S. In fact, they must always be equal, and there is a proof of this that introduces a valuable technique. Suppose that Z is the usual ring of integers. Suppose that Y = (yij ) and Z = (zij ) are two n × n matrices of indeterminates over the integers. Let D = Z[yij , zij ], where 1 ≤ i, j ≤ n. So D is the ring of polynomials over Z in 2n2 commuting but algebraically independent indeterminates. Then let L = Q(yij , zij ) be the field of fractions of the integral domain D. Clearly the characteristic polynomials of Y Z and of ZY are the same whether Y and Z are viewed as being over D or L. But over L both Y and Z are invertible, so Y Z = Y (ZY )Y −1 and ZY are similar. Let x be an additional indeterminate so that the characteristic polynomial of ZY is det(xI − ZY ) = det(Y ) · det(xI − ZY ) · det(Y −1 ) = det(xI − Y Z), which is then also the characteristic polynomial of Y Z. Hence ZY and Y Z have the same characteristic polynomial. Now let R be a commutative ring with 1 6= 0. (You may take R to be a field, but this is not necessary.) Also let S = (sij ) and T = (tij ) be n × n matrices over R. There is a homomorphism θ : D[x] = Z[yij , zij ][x] → R[x] that maps yij 7→ sij and zij 7→ tij for all i, j, and that maps x to x. This homomorphism maps the characteristic polynomial of Y Z to the characteristic polynomial of ST and the characteristic polynomial of ZY to that of T S. It follows that ST and T S have the same characteristic polynomial since ZY and Y Z do. Exercise: If 0 has multiplicity at most 1 as a root of the characteristic polynomial of ST , then ST and T S have the same minimal polynomial. Now suppose that R is a commutative ring with 1, that A and B are matrices over R with A being m × n and B being n × m, with m ≤ n. Let
6.4. DEEPER RESULTS WITH SOME APPLICATIONS∗
89
fAB , fBA be the characteristic polynomials of AB and BA, respectively, over R. Theorem 6.3.15(b) fBA = xn−m fAB . Proof. For square matrices of size m + n we have the following two identities:
AB 0 I A AB ABA = B 0 0 I B BA I A 0 0 AB ABA = . 0 I B BA B BA Since the (m + n) × (m + n) block matrix I A 0 I is nonsingular (all its eigenvalues are +1), it follows that −1 AB 0 I A 0 0 I A = . 0 I B 0 0 I B BA Hence the two (m + n) × (m + n) matrices AB 0 0 0 C1 = and C2 = B 0 B BA are similar. The eigenvalues of C1 are the eigenvalues of AB together with n zeros. The eigenvalues of C2 are the eigenvalues of BA together with m zeros. Since C1 and C2 are similar, they have exactly the same eigenvalues, including multiplicities, the theorem follows.
6.4
Deeper Results with Some Applications∗
The remainder of this chapter may be omitted without loss of continuity. In general we continue to let K be a commutative ring with 1. Here M atn (K) denotes the ring of n × n matrices over K, and |A| denotes the element of K that is the determinant of A.
90
CHAPTER 6. DETERMINANTS
6.4.1
Block Matrices whose Blocks Commute∗
We can regard a k × k matrix M = (A(i,j) ) over M atn (K) as a block matrix, a matrix that has been partitioned into k 2 submatrices (blocks) over K, each of size n × n. When M is regarded in this way, we denote its determinant in K by |M |. We use the symbol D(M ) for the determinant of M viewed as a k × k matrix over M atn (K). It is important to realize that D(M ) is an n × n matrix. Theorem. Assume that M is a k × k block matrix of n × n blocks (i,j) A over K that pairwise commute. Then X |M | = |D(M )| = (sgn σ)A(1,σ(1)) A(2,σ(2)) · · · A(k,σ(k)) .
(6.29)
σ∈Sk
Here Sk is the symmetric group on k symbols, so the summation is the usual one that appears in a formula for the determinant. The first proof of this result to come to our attention is the one in N. Jacobson, Lectures in Abstract algebra, Vol. III – Theory of Fields and Galois Theory, D. Van Nostrand Co., Inc., 1964, pp 67 – 70. The proof we give now is from I. Kovacs, D. S. Silver, and Susan G. Williams, Determinants of CommutingBlock Matrices, Amer. Math. Monthly, Vol. 106, Number 10, December 1999, pp. 950 – 952. Proof. We use induction on k. The case k = 1 is evident. We suppose that Eq. 6.27 is true for k − 1 and then prove it for k. Observe that the following matrix equation holds:
I 0 ··· (2,1) −A I ··· .. .. . . ··· (k,1) −A 0 ···
0 0 .. . I
I ··· 0 (1,1) 0 A ··· 0 .. .. .. ... . . . (1,1) 0 0 ··· A
M =
A(1,1) ∗ ∗ ∗ 0 , .. . N 0
where N is a (k − 1) × (k − 1) matrix. To simplify the notation we write this as P QM = R,
(6.30)
6.4. DEEPER RESULTS WITH SOME APPLICATIONS∗
91
where the symbols are defined appropriately. By the multiplicative property of determinants we have D(P QM ) = D(P )D(Q)D(M ) = (A(1,1) )k−1 D(M ) and D(R) = A(1,1) D(N ). Hence we have (A(1,1) )k−1 D(M ) = A(1,1) D(N ). Take the determinant of both sides of the last equation. Since |D(N )| = |N | by the induction hypothesis, and using P QM = R, we find
|A(1,1) |k−1 |D(M )| = |A(1,1) ||D(N )| = |A(1,1) ||N | = |R| = |P ||Q||M | = |A(1,1) |k−1 |M |. If |A(1,1) | is neither zero nor a zero divisor, then we can cancel |A(1,1) |k−1 from both sides to get Eq. 6.29. For the general case, we embed K in the polynomial ringK[z], where z is an indeterminate, and replace A(1,1) with the matrix zI + A(1,1) . Since the determinant of zI + A(1,1) is a monic polynomial of degree n, and hence is neither zero nor a zero divisor, Eq. 6.27 holds again. Substituting z = 0 (equivalently, equating constant terms of both sides) yields the desired result.
6.4.2
Tensor Products of Matrices∗
Definition: Let A = (aij ) ∈ Mm1 ,n1 (K), and let B ∈ Bm2 ,n2 (K). Then the tensor product or Kronecker product of A and B, denoted A ⊗ B ∈ Mm1 m2 ,n1 n2 (K), is the partitioned matrix A⊗B =
a11 B a21 B .. .
a12 B a22 B .. .
··· ··· ...
am1 1 B am1 2 B · · ·
a1n1 B a2n1 B .. .
.
(6.31)
am1 n1 B
It is clear that Im ⊗ In = Imn . Lemma Let A1 ∈ Mm1 ,n1 (K), A2 ∈ Mn1 ,r1 (K), B1 ∈ Mm2 ,n2 (K), and B2 ∈ Mn2 ,r2 (K). Then (A1 ⊗ B1 )(A2 ⊗ B2 ) = (A1 A2 ) ⊗ (B1 B2 ).
(6.32)
92
CHAPTER 6. DETERMINANTS
Proof. Using block multiplication, we see that the (i, j) block of (A1 A2 ) ⊗ (B1 B2 ) is n1 X ((A1 )ik B1 ) ((A2 )kj B2 ) = k=1
=
n1 X
! (A1 )ik (A2 )kj
B1 B2 ,
k=1
which is also seen to be the (i, j) block of (A1 A2 ) ⊗ (B1 B2 ). Corollary Let A ∈ Mm (K) and B ∈ Mn (K). Then A ⊗ B = (A ⊗ In )(Im ⊗ B).
(6.33)
|A ⊗ B| = |A|n |B|m .
(6.34)
and Proof. Eq. 6.33 is an easy consequence of Eq. 6.32. Then prove Eq. 6.34 as follows. Use the fact that the determinant function is multiplicative. Clearly Im ⊗ B is a block diagonal matrix with m blocks along the diagonal, each equal to B, so det(Im ⊗ B) = (det(B))m . The determinant det(A ⊗ In ) is a little trickier. The matrix A ⊗ In is a block matrix whose (i, j)-th block is aij In , 1 ≤ i, j ≤ m. Since each two of the blocks commute, we can use the theorem that says that we can first compute the determinant as though it were an m × m matrix of elements from a commutative ring of n × n matrices, and then compute the determinant of this n × n matrix. Hence det(A ⊗ I) = det(det(A) · In ) = (det(A))n , and the proof is complete.
6.4.3
The Cauchy-Binet Theorem-A Special Version∗
The main ingredient in the proof of the Matrix-Tree theorem (see the next section) is the following theorem known as the Cauchy-Binet Theorem. It is more commonly stated and applied with the diagonal matrix 4 below taken to be the identity matrix. However, the generality given here actually simplifies the proof. Theorem 6.4.4. Let A and B be, respectively, r × m and m × r matrices, with r ≤ m. Let 4 be the m × m diagonal matrix with entry ei in the (i, i)position. For an r-subset S of [m], let AS and B S denote, respectively, the
6.4. DEEPER RESULTS WITH SOME APPLICATIONS∗
93
r × r submatrices of A and B consisting of the columns of A, or the rows of B, indexed by the elements of S. Then det(A 4 B) =
X
det(AS )det(B S )
S
Y
ei ,
i∈S
where the sum is over all r-subsets S of [m]. Proof. We prove the theorem assuming that e1 , . . . , em are independent (commuting) indeterminates over F . Of course it will then hold for all values of e1 , . . . , em in F . Recall that if C = (cij ) is any r × r matrix over F , then det(C) =
X
sgn(σ)c1σ(1) c2σ(2) · · · crσ(r) .
σ∈Sr
P Given that A = (aij ) and B = (bij ), the (i,j)-entry of A4B is m k=1 aik ek bkj , and this is a linear form in the indeterminates e1 , . . . , em . Hence det(A4B) is a homogeneous polynomial of degree r in e1 , . . . , em . Suppose that det(A4B) has a monomial et11 et22 . . . where the number of indeterminates ei that have ti > 0 is less than r. Substitute 0 for the indeterminates ei that do not appear in et11 et22 . . ., i.e., that have ti = 0. This will not affect the monomial et11 et22 . . . or its coefficient in det(A 4 B). But after this substitution 4 has rank less than r, so A 4 B has rank less than r, implying that det(A 4 B) must be the zero polynomial. Hence we see that the coefficient of a monomial in the polynomial det(A 4 B) is zero unless that monomial Qis the product of r distinct indeterminates ei , i.e., unless it is of the form i∈S ei for some r-subset S of [m]. Q The coefficient of a monomial i∈S ei in det(A 4 B) is found by setting ei = 1 for i ∈ S, and ei = 0 for i 6∈ S. When thisQ substitution is made in 4, A 4 B evaluates to AS B S . So the coefficient of i∈S ei in det(A 4 B) is det(AS )det(B S ). Exercise 6.4.4.1. Let M be an n × n matrix all of whose linesums are zero. Then one of the eigenvalues of M is λ1 = 0. Let λ2 , . . . , λn be the other eigenvalues of M . Show that all principal n − 1 by n − 1 submatrices have the same determinant and that this value is n1 λ2 λ3 · · · λn . Sketch of Proof: First note that since all line sums are equal to zero, the entries of the matrix are completely determined by the entries of M in the
94
CHAPTER 6. DETERMINANTS
first n − 1 rows and first n − 1 columns, and that the entry in the (n, n) position is the sum of all (n − 1)2 entries in the first n − 1 rows and columns. Clearly λ1 = 0 is an eigenvalue of A. Observe the appearance of the (n − 1) × (n − 1) subdeterminant obtained by deleting the bottom row and right hand column. Then consider the principal subdeterminant obtained by deleting row j and column j, 1 ≤ j ≤ n − 1, from the original matrix M . In this (n − 1) × (n − 1) matrix, add the first n − 2 columns to the last one, and then add the first n − 2 rows to the last one. Now multiply the last column and the last row by -1. This leaves a matrix that could have been obtained from the original upper (n − 1) × (n − 1) submatrix by moving its jth row and column to the last positions. So it has the same determinant. Now note that the coefficient of x in the characteristic polynomial Pf (x) = n−1 n−1 det(xI−A) is (−1) λ2 λ3 · · · λn , since λ1 = 0, and it is also (−1) det(B), where the sum is over all principal subdeterminants of order n − 1, which by the previous paragraph all have the same value. Hence det(B) = n1 λ2 λ3 · · · λn , for any principal subdeterminant det(B) of order n − 1.
6.4.5
The Matrix-Tree Theorem∗
The “matrix-tree” theorem expresses the number of spanning trees in a graph as the determinant of an appropriate matrix, from which we obtain one more proof of Cayley’s theorem counting labeled trees. An incidence matrix N of a directed graph H is a matrix whose rows are indexed by the vertices V of H, whose columns are indexed by the edges E of H, and whose entries are defined by: 0 1 N (x, e) = −1
if x is not incident with e, or e is a loop, if x is the head of e, if x is the tail of e.
Lemma 6.4.6. If H has k components, then rank(N ) = |V | − k. Proof. N has v = |V | rows. The rank of N is v − n, where n is the dimension of the left null space of N , i.e., the dimension of the space of row vectors g for which gN = 0. But if e is any edge, directed from x to y, then gN = 0 if and only if g(x) − g(y) = 0. Hence gN = 0 iff g is constant on each component of H, which says that n is the number k of components of H.
6.4. DEEPER RESULTS WITH SOME APPLICATIONS∗
95
Lemma 6.4.7. Let A be a square matrix that has at most two nonzero entries in each column, at most one 1 in each column, at most one -1 in each column, and whose entries are all either 0, 1 or -1. Then det(A) is 0, 1 or -1. Proof. This follows by induction on the number of rows. If every column has both a 1 and a -1, then the sum of all the rows is zero, so the matrix is singular and det(A) = 0. Otherwise, expand the determinant by a column with one nonzero entry, to find that it is equal to ±1 times the determinant of a smaller matrix with the same property.
Corollary 6.4.8. Every square submatrix of an incidence matrix of a directed graph has determinant 0 or ±1. (Such a matrix is called totally unimodular.) Theorem 6.4.9. (The Matrix-Tree Theorem) The number of spanning trees in a connected graph G on n vertices and without loops is the determinant of any n − 1 by n − 1 principal submatrix of the matrix D − A, where A is the adjacency matrix of G and D is the diagonal matrix whose diagonal contains the degrees of the corresponding vertices of G. Proof. First let H be a connected digraph with n vertices and with incidence matrix N . H must have at least n − 1 edges, because it is connected and must have a spanning tree, so we may let S be a set of n − 1 edges. Using the notation of the Cauchy-Binet Theorem, consider the n by n − 1 submatrix NS of N whose columns are indexed by elements of S. By Lemma 6.4.6, NS has rank n−1 iff the spanning subgraph of H with S as edge set is connected, i.e., iff S is the edge set of a tree in H. Let N 0 be obtained by dropping any single row of the incidence matrix N . Since the sum of all rows of N (or of NS ) is zero, the rank of NS0 is the same as the rank of NS . Hence we have the following:
det(NS0 )
=
±1 0
if S is the edge set of a spanning tree in H, otherwise.
(6.35)
Now let G be a connected loopless graph on n vertices. Let H be any digraph obtained by orienting G, and let N be an incidence matrix of H. Then we claim N N T = D − A. For,
96
CHAPTER 6. DETERMINANTS (N N T )xy =
P N (x, e)N (y, e) e∈E(G) deg(x) if x = y, = −t if x and y are joined by t edges in G. An n − 1 by n − 1 principal submatrix of D − A is of the form N 0 N 0T where N 0 is obtained from N by dropping any one row. By Cauchy-Binet, X X det(N 0 N 0T ) = det(NS0 ) det(NS0T ) = (det(NS0 ))2 , S
S
where the sum is over all n − 1 subsets S of the edge set. By Eq. 6.33 this is the number of spanning trees of G. Exercise 6.4.9.1. (Cayley’s Theorem) In the Matrix-Tree Theorem, take G to be the complete graph Kn . Here the matrix D − A is nI − J, where I is the identity matrix of order n, and J is the n by n matrix of all 1’s. Now calculate the determinant of any n − 1 by n − 1 principal submatrix of this matrix to obtain another proof that Kn has nn−2 spanning trees. Exercise 6.4.9.2. In the statement of the Matrix-Tree Theorem it is not necessary to use principal subdeterminants. If the n − 1 × n − 1 submatrix M is obtained by deleting the ith row and jth column from D − A, then the number of spanning trees is (−1)i+j det(M ). This follows from the more general lemma: If A is an n − 1 × n matrix whose row sums are all equal to 0 and if Aj is obtained by deleting the jth column of A, 1 ≤ j ≤ n, then det(Aj ) = − det(Aj+1 ).
6.4.10
The Cauchy-Binet Theorem - A General Version∗
Let 1 ≤ p ≤ m ∈ Z, and let Qp,m denote the set of all sequences α = (i1 , i2 , · · · , ip ) of p integers with 1 ≤ i1 < i2 < · · · < ip ≤ m. Note that |Qp,m | = mp . Let K be a commutative ring with 1, and let A ∈ Mm,n (K). If α ∈ Qp,m and β ∈ Qj,n , let A[α|β] denote denote the submatrix of A consisting of the elements whose row index is in α and whose column index is in β. If α ∈ Qp,m , then there is a complementary sequence α ˆ ∈ Qm−p,m consisting of the list of exactly those positive integers between 1 and m that are not in α, and the list is in increasing order. Theorem 6.4.11. Let A ∈ Mm,n (K) and B ∈ Mn,p (K). Assume that 1 ≤ t ≤ min{m, n, p} and let α ∈ Qt,m , β ∈ Qt,p . Then
6.4. DEEPER RESULTS WITH SOME APPLICATIONS∗
det(AB[α|β]) =
X
97
det(A[α|γ]) · det(B[γ|β]).
γ∈Qt,n
Proof. Suppose that α = (α1 , . . . , αt ), β = (β1 , . . . , βt ), and let C = AB[α|β]. Then Cij =
n X
aαi k bkβj .
k=1
So we have Pn b a b · · · kβ α k kβ t 1 1 k=1 k=1 .. .. .. C= . . . . Pn Pn k=1 aαt k bkβ1 · · · k=1 aαt k bkβt Pn
To calculate the determinant of C we start by using the n-linearity in the first row, then the second, row, etc. n X
det(C) =
aα1 k1
k1 =1
=
n X k1 =1
···
n X
· det
Pn bk1 β1 k=1 aα2 k bkβ1 .. Pn . k=1 aαt k bkβ1
··· ··· ··· ···
Pn bk1 βt k=1 aα2 k bkβt .. Pn .P k=1 aα2 k bkβt n k=1 aαt k bkβt
aα1 k1 · · · aαt kt
kt =1
bk1 β1 · · · .. · det . bkt β1 · · ·
bk 1 β t .. . . bkt βt
=
(6.36)
If ki = kj for i 6= j, then
bk1 β1 · · · .. det . bk t β 1 · · ·
bk1 βt .. = 0. . bk t β t
Then the only possible nonzero determinant occurs when (k1 , . . . , kt ) is a permutation of a sequence γ = (γ1 , . . . , γt ) ∈ Qt,n . Let σ ∈ St be the permutation of {1, 2, . . . , t} such that γi = kσ(i) for 1 ≤ i ≤ t. Then
98
CHAPTER 6. DETERMINANTS
bk1 β1 · · · .. det . bk t β 1 · · ·
bk1 βt .. sgn(σ) det(B[γ|β]). . bk t β t
(6.37)
Given a fixed γ ∈ Qt,n , all possible permutations of γ are included in the summation in Eq. 6.36. Therefore Eq. 6.36 may be rewritten , using Eq. 6.37, as ! X X det(C) = sgn(σ)aα1 γσ(1) · · · aαt γσ(t) det(B[γ|β]), γ∈Qt,n
σ∈St
which is the desired formula. The Cauchy-Binet formula gives another verification of the fact that det(AB) = det(A) · det(B) for square matrices A and B.
6.4.12
The General Laplace Expansion∗
P For γ = (γ1 , . . . , γt ) ∈ Qt,n , put s(γ) = tj=1 γj . Theorem: Let A ∈ Mn (K) and let α ∈ Qt,n (1 ≤ t ≤ n) be given. Then X (−1)s(α)+s(γ) det(A[α|γ]) · det(A[ˆ α|ˆ γ ]). (6.38) det(A) = γ∈Qt,n
Proof. For A ∈ Mn (K), define X (−1)s(α)+s(γ) det(A[α|γ]) · det(A[ˆ α|ˆ γ ]). Dα (A) =
(6.39)
γ∈Qt,n
Then Dα : Mn (K) → K is easily shown to be n-linear as a function on the columns of A. To complete the proof, it is only necessary to show that Dα is alternating and that Dα (In ) = 1. Thus, suppose that the columns of A labeled p and q, p < q, are equal. If p and q are both in γ ∈ Qt,n , then A[α|γ] will have two columns equal and hence have zero determinant. Similarly, if p and q are both in γˆ ∈ Qn−t,n , then det(A[ˆ α|ˆ γ ) = 0. Thus in the evaluation of Dα (A) it is only necessary to consider those γ ∈ Qt,n such that p ∈ γ and q ∈ γˆ , or vice versa. So suppose p ∈ γ, q ∈ γˆ , and defind a new sequence γ 0 in Qt,n by replacing p ∈ γ by q. Thus γˆ 0 agrees with γˆ except that q has been replaced by p. Thus
6.4. DEEPER RESULTS WITH SOME APPLICATIONS∗
s(γ 0 ) − s(γ) = q − p.
99
(6.40)
(Note that s(γ) − p and s(γ 0 ) − q are both the sum of all the things in γ except for p.) Now consider the sum 0
(−1)s(γ) det(A[α|γ]) det(A[ˆ α|ˆ γ ]) + (−1)s(γ ) det(A[α|γ 0 ]) det(A[ˆ α|ˆ γ 0 ]), which we denote by S(A). We claim that this sum is 0. Assuming this, since γ and γ 0 appear in pairs in Qt,n , it follows that Dα (A) = 0 whenever two columns of A agree, forcing Dα to be alternating. So now we show that S(A) = 0. Suppose that p = γk and q = γˆl . Then γ and γ 0 agree except in the range from p to q, as do γˆ and γˆ 0 . This includes a total of q − p + 1 entries. If r of these entries are included in γ, then γ1 < · · · < γk = p < γk+1 < · · · < γk+r−1 < q < γk+r < · · · < γt and A[α|γ 0 ] = A[α|γ]Pw−1 , where w is the r-cycle (k + r − 1, k + r − 2, . . . , k). Similarly, A[ˆ α|ˆ γ 0 ] = A[ˆ α|ˆ γ ]Pw0 where w0 is a (q − p + 1 − r)-cycle. Thus, 0
(−1)s(γ ) det(A[α|γ 0 ]) det(A[ˆ α|ˆ γ 0 ]) = 0
= (−1)s(γ )+(r−1)+(q−p)−r det(A[α|γ]) det(A[ˆ α|ˆ γ ]). Since s(γ 0 ) + (q − p) − 1 − s(γ) = 2(q − p) − 1 is odd, we conclude that S(A) = 0. Thus Dα is n-linear and alternating. It is routine to check that Dα (In ) = 1, completing the proof. Applying this formula for det(A) to det(AT ) gives the Laplace expansion in terms of columns.
100
6.4.13
CHAPTER 6. DETERMINANTS
Determinants, Ranks and Linear Equations∗
If K is a commutative ring with 1 and A ∈ Mm,n (K), and if 1 ≤ t ≤ min{m, n}, then a t × t minor of A is the determinant of any submatrix A[α|β] where α ∈ Qt,m , β ∈ Qt,n . The determinantal rank of A, denoted D-rank(A), is the largest t such that there is a nonzero t × t minor of A. With the same notation, Ft (A) = h{det A[α|β] : α ∈ Qt,m , β ∈ Qt,n }i ⊆ K. That is, Ft (A) is the ideal of K generated by all the t×t minors of A. Put F0 (A) = K and Ft (A) = 0 if t > min{m, n}. Ft (A) is called the tth -Fitting ideal of A. The Laplace expansion of determinants along a row or column shows that Ft+1 (A) ⊆ Ft (A). Thus there is a decreasing chain of ideals K = F0 (A) ⊇ F1 (A) ⊇ F2 (A) ⊇ · · · . Definition: If K is a PID, then Ft (A) is a principal ideal, say Ft (A) = hdt (A)i where dt (A) is the greatest common divisor of all the t × t minors of A. In this case, a generator of Ft (A) is called the tth -determinantal divisor of A. Definition If A ∈ Mm,n (K), then the M -rank(A) is defined to be the largest t such that {0} = Ann(Ft (A)) = {k ∈ K : kd = 0 for all d ∈ Ft (A)}. Obs. 6.4.14. 1. M -rank(A) = 0 means that Ann(F1 (a)) 6= {0}. That is, there is a nonzero a ∈ K with a · aij = 0 for all entries aij of A. Note that this is stronger than saying that every element of A is a zero divisor. For example, if A = (2 3) ∈ M1,2 (Z6 ), then every element of A is a zero divisor in Z6 , but there is no single nonzero element of Z6 that annihilates both entries in the matrix. 2. If A ∈ Mn (K), then M -rank(A) = n means that det(A) is not a zero divisor of K. 3. To say that M -rank(A) = t means that there is an a 6= 0 ∈ K with a · D = 0 for all (t + 1) × (t + 1) minors D of A, but there is no nonzero b ∈ K which annihilates all t × t minors of A by multiplication. In particular, if det(A[α|β]) is not a zero divisor of K for some α ∈ Qs,m , β ∈ Qs,n , then M -rank(A) ≥ s.
6.4. DEEPER RESULTS WITH SOME APPLICATIONS∗
101
Lemma 6.4.15. If A ∈ Mm,n (K), then 0 ≤ M − rank(A) ≤ D-rank (A) ≤ min{m, n}. Proof. Routine exercise. We can now give a criterion for solvability of the homogeneous linear equation AX = 0, where A ∈ Mm,n (K). This equation always has the trivial solution X = 0, so we want a criterion for the existence of a solution X 6= 0 ∈ Mn,1 (K). Theorem 6.4.16. Let K be a commutative ring with 1 and let A ∈ Mm,n (K). The matrix equation AX = 0 has a nontrivial solution X 6= 0 ∈ Mn,1 (K) if and only if M − rank(A) < n. Proof. Suppose that M -rank(A) = t < n. then Ann(Ft+1 (A)) 6= {0}, so choose b 6= 0 ∈ K with b · Ft+1 (A) = {0}. Without loss of generality, we may assume that t < m, since, if necessary, we may replace the system AX = 0 with an equivalent one (i.e., one with the same solutions) by adding some rows of zeros to the bottom of A. If t = 0, then baij = 0 for all aij and we may take b .. X = . . b Then X 6= 0 ∈ Mn,1 (K) and AX = 0. So suppose that t > 0. Then b 6∈ Ann(Ft (A)) = {0}, so b·det(A[ζ|β]) 6= 0 for some α ∈ Qt,m , β ∈ Qt,n . By permuting rows and columns, which does note affect whether AX = 0 has a nontrivial solution, we can assume α = (1, . . . , t) = β. For 1 ≤ i ≤ t + 1 let βi = (1, 2, . . . , ˆi, . . . , t + 1) ∈ Qt,t+1 , where ˆi indicates that i is deleted. Let di = (−1)t+1+i det(A[α|βi ]). Thus d1 , . . . , dt+1 are the cofactors of the matrix A1 = A[(1, . . . , t + 1)|(1, . . . , t + 1)] obtained by deleting row t + 1 and column i. Hence the Laplace expansion gives
102
CHAPTER 6. DETERMINANTS
( P t+1 j=1 aij dj = 0, Pt+1 j=i aij dj = det(A[(1, . . . , t, i)|(1, . . . , t, t + 1)]),
if 1 ≤ i ≤ t, if t < i ≤ m.
(6.41)
x1 Let X = ... , where xn xi = bdi , if 1 ≤ i ≤ t + 1, xi = 0, if t + 2 ≤ i ≤ n. Then X 6= 0 since xt+1 = b · det(A[α|β]) 6= 0. But Eq. 6.39 and the fact that b ∈ Ann(Ft+1 (A)) show that Pt+1 b j=1 a1j dj .. AX = . Pt+1 b j=1 amj dj 0 .. . 0 = = 0. b det(A[(1, . . . , t, t + 1)|(1, . . . , t, t + 1)]) .. . b det(A[(1, . . . , t, m)|(1, . . . , t, t + 1)]) Thus X is a nontrivial solution to the equation AX = 0. Conversely, assume that X 6= 0 ∈ Mn,1 (K) is a nontrivial solution to AX = 0, and choose k with xk 6= 0. We claim that Ann(Fn (A)) 6= {0}. If n > m, then Fn (A) = {0}, and hence Ann(Fn (A)) = K 6= (0). Thus we may assume that n ≤ m. Let α = (1, . . . , n) and for each β ∈ Qn,m , let Bβ = A[α|β]. Then since AX = 0 and since each row of Bβ is a full row of A, we conclude that Bβ = 0. The adjoint matrix formula (Eq. 6.20) then shows that (det(Bβ ))X = (Adj Bβ )Bβ X = 0, from which we conclude that xk det(Bβ ) = 0. Since β ∈ Qn,m is arbitrary, we conclude that xk · Fn (A) = 0, i.e., xk ∈ Ann(Fn (A)). But xk 6= 0, so Ann(Fn (A)) 6= {0}, and we conclude that M -rank(A) < n, completing the proof.
6.4. DEEPER RESULTS WITH SOME APPLICATIONS∗
103
In case K is an integral domain we may replace the M -rank by the ordinary determinantal rank to conclude the following: Corollary 6.4.17. If K is an integral domain and A ∈ Mm,n (K), then AX = 0 has a nontrivial solution if and only if D-rank(A) < n. Proof. If I ⊆ K, then Ann(I) 6= {0} if and only if I = {0} since an integral domain has no nonzero zero divisors. Therefore, in an integral domain Drank(A) = M -rank(A). The results for n equations are in n unknowns are even simpler. Corollary 6.4.18. Let K be a commutative ring with 1. 1. If A ∈ Mn (K), then AX = 0 has a nontrivial solution if and only if det(A) is a zero divisor of K. 2. If K is an integral domain and A ∈ Mn (K), then AX = 0 has a nontrivial solution if and only if det (A) = 0. Proof. If A ∈ Mn (K), then Fn (A) = hdet(A)i, so M -rank(A) < n if and only if det(A) is a zero divisor. In particular, if K is an integral domain then M -rank(A) < n if and only if det(A) = 0. There are still two other concepts of rank which can be defined for matrices with entries in a commutative ring. Definition Let K be a commutative ring with 1 and let A ∈ Mm,n (K). Then we will define the row rank of A, denoted by row-rank(A), to be the maximum number of linearly independent rows in A, while the column rank of A, denoted col-rank(A), is the maximum number of linearly independent columns. Corollary 6.4.19. Let K be a commutative ring with 1. 1. If A ∈ Am,n (K), then max{row-rank(A), col-rank(A) ≤ M − rank(A) ≤ D-rank(A). 2. If K is an integral domain, then row-rank(A) = col-rank(A) = M − rank(A) = D-rank(A).
104
CHAPTER 6. DETERMINANTS
Proof. We sketch the proof of the result when K is an integral domain. In fact, it is possible to embed K in its field F of quotients and do all the algebra in F . Recall the following from an undergraduate linear algebra course. The proofs work over any field, even if you did only consider them over the real numbers. Let A be an m × n matrix with entries in F . Row reduce A until arriving at a matrix R in row-reduced echelon form. The first thing to remember here is that the row space of A and the row space of R are the same. Similarly, the right null space of A and the right null space of R are the same. (Warning: the column space of A and that of R usually are not the same!) So the leading (i.e., leftmost) nonzero entry in each nonzero row of R is a 1, called a leading 1. Any column with a leading 1 has that 1 as its only nonzero entry. The nonzero rows of R form a basis for the row space of A, so the number r of them is the row-rank of A. The (right) null space of R (and hence of A) has a basis of size n − r. Also, one basis of the column space of A is obtained by taking the set of columns of A in the positions now indicated by the columns of R in which there are leading 1’s. So the column rank of A is also r. This has the interesting corollary that if any r independent columns of A are selected, there must be some r rows of those columns that are linearly independent, so there is an r × r submatrix with rank r. Hence this submatrix has determinant different from 0. Conversely, if some r × r submatrix has determinant different from 0, then the “short” columns of the submatrix must be independent, so the “long” columns of A to which they belong must also be independent. It is now clear that the row-rank, column-rank, M-rank and determinantal rank of A are all the same. Obs. 6.4.20. Since all four ranks of A are the same when K is an integral domain, in this case we may speak unambiguously of the rank of A, denoted rank(A). Moreover, the condition that K be an integral domain is truly necessary, as the following example shows. Define A ∈ M4 (Z210 ) by 0 2 3 5 2 0 6 0 A= 3 0 3 0 . 0 0 0 7 It is an interesting exercise to show the following: 1. row-rank(A) = 1.
6.4. DEEPER RESULTS WITH SOME APPLICATIONS∗
105
2. col-rank(A) = 2. 3. M -rank(A) = 3. 4. D-rank(A) = 4. Theorem 6.4.21. Let K be a commutative ring with 1, let M be a finitely generated K-module, and let S ⊆ M be a subset. If |S| > µ(M ) = rank(M ), then S is K-linearly dependent. Proof. Let µ(M ) = m and let T = {w1 , . . . , wm } be a generating set for M consisting of m elements. Choose n distinct elements {v1 , . . . , vn } of S for some n > m, which is possible by hypothesis. Since M = hw1 , . . . , wm i, we may write m X vj = aij wi , with aij ∈ K. i=1
Let A = (aij ) ∈ Mm,n (K). Since n > m, it follows that M -rank(A) ≤ m < n, so Theorem 6.4.16 shows that there is an X 6= 0 ∈ Mn,1 (K) such that AX = 0. Then n X
xj v j
=
j=1
n X j=1
=
xj
m X
! aij wi
i=1
m X
n X
i=1
j=1
! aij xj
wi
(6.42)
= 0, since AX = 0. Therefore, S is K-linearly dependent. Corollary 6.4.22. Let K be a commutative ring with 1, let M be a Kmodule, and let N ⊆ M be a free submodule. Then rank(N ) ≤ rank(M ). Proof. If rank(M ) = ∞, there is nothing to prove, so assume that rank(M ) = m < ∞. If rank(N ) > m, then there is a linearly independent subset of M , namely a basis of N , with more than m elements, which contradicts Theorem 6.4.21. Theorem 6.4.23. If A ∈ Mm,n (K) and B ∈ Mn,p (K), then D-rank(AB) ≤ min{D-rank(A), D-rank(B)}.
(6.43)
106
CHAPTER 6. DETERMINANTS
Proof. Let t > min{D-rank(A), D-rank(B)} and suppose that α ∈ Qt,m , β ∈ Qt,p . Then by the Cauchy-Binet formula det(AB[α|β]) =
X
det(A[α|γ]) det(B[γ|β]).
γ∈Qt,n
Since t > min {D-rank (A), D-rank(B)}, at least one of the determinants det(A[α|β]) or det(B[γ|β]) must be 0 for each γ ∈ Qt,n . Thus det(AB[α|β]) = 0, and since α and β are arbitrary, it follows that D-rank(AB) < t, as required.
The preceding theorem has a useful corollary. Corollary 6.4.24. Let A ∈ Mm,n (K), U ∈ GL(m, K), V ∈ GL(n, K). Then D-rank(U AV ) = D-rank(A). Proof. Any matrix B ∈ Mm,n (K) satisfies D-rank(B) ≤min{m, n}. Since D-rank(U ) = m and D-rank(V ) = n, it follows from Eq. 6.41 that D-rank(U AV ) ≤ min{D-rank(A), n, m} = D-rank(A) and D-rank(A) = D-rank(U −1 (U AV )V −1 ) ≤ D-rank(U AV ). This completes the proof.
6.5
Exercises
Except where otherwise noted, the matrices in these exercises have entries from a commutative ring K with 1.
∗ ∗ ∗ 1. If A = 0 0 ∗ , show that no matter how the five elements ∗ 0 0 ∗ might be replaced by elements of K it is always the case that det(A) = 0.
6.5. EXERCISES
107
2. Suppose that A is n × n with n odd and that AT = −A. Show that: (i) If 2 6= 0 in K, then det(A) = 0. (ii)∗ If 2 = 0 but we assume that each diagonal entry of A equals 0, then det(A) = 0. 3. Let A be an m × n complex matrix and let B be an n × m complex matrix. Also, for any positive integer p, Ip denotes the p × p identity matrix. So AB is m × m, while BA is n × n. Show that the complex number λ 6= 1 is an eigenvalue of Im − AB if and only if λ is an eigenvalue of In − BA. 4. (Vandermonde Determinant) Let t1 , . . . , tn be commuting indeterminates over K, and let A be the n × n matrix whose entries are from the commutative ring K[t1 , . . . , tn ] defined by
A=
1 t1 t21 · · · 1 t2 t22 · · · .. .. .. .. . . . . 2 1 tn tn · · ·
tn−1 1 tn−1 2 .. . tn−1 n
.
Then
det A =
Y
(ti − tj ).
1≤j 0 and choose a nonzero vector v ∈ V . Let T ∈ L(V ). Then the set (v, T (v), T 2 (v), . . . , T n (v)) of n + 1 vectors in an n-dimensional space cannot be linearly independent. So there must be scalars, not all zero, such that ~0 = a0 v + a1 T (v) + a2 T 2 (v) + · · · + an T n (v). Let m be the largest index such that am 6= 0. Since v 6= ~0, the coefficients a1 , . . . , an cannot all be 0, so 0 < m ≤ n. Use the a0 s to construct a polynomial which can be written in factored form as a0 + a1 z + a2 z 2 + · · · + am z m = c(z − λ1 )(z − λ2 ) · · · (z − λm ),
116
CHAPTER 7. OPERATORS AND INVARIANT SUBSPACES
where c ∈ F is nonzero, each λj ∈ F , and the equation holds for all z ∈ F . We then have ~0 = a0 v + a1 T (v) + · · · + am T m (v) = (a0 I + a1 T + · · · am T m )(v) = c(T − λ1 I)(T − λ2 I) · · · (T − λm I)(v).
(7.2)
If (T − λm I)(v) = ~0, then λm must be an eigenvalue. If it is not the zero vector, consider (T − λm−1 I)(T − λm I)(v). If this is zero, then λm−1 must be an eigenvalue. Proceeding this way, we see that the first λj (i.e., with the largest j) for which (T − λj I)(T − λj+1 I) · · · (T − λm I)(v) = ~0 must be an eigenvalue. Theorem 7.2.2. Suppose T ∈ L(V ) and B = (v1 , . . . , vn ) is a basis of V . Then the following are equivalent: (i) [T ]B is upper triangular. (ii) T (vk ) ∈ span(v1 , . . . , vk ) for each k = 1, . . . , n. (iii) The span(v1 , . . . , vk ) is T -invariant for each k = 1, . . . , n. Proof. By now the proof of this result should be clear to the reader. Theorem 7.2.3. Let F be an algebraically closed field, let V be an ndimensional vector space over F with n ≥ 1, and let T ∈ L(V ). Then there is a basis B for V such that [T ]B is upper triangular. Proof. We use induction on the dimension of V . Clearly the theorem is true if n = 1. So suppose that n > 1 and that the theorem holds for all vector spaces over F whose diminsion is a positive integer less than n. Let λ be any eigenvalue of T (which we know must exist by Theorem 7.2.1). Let U = Im(T − λI). Because T − λI is not injective, it is also not surjective, so dim(U ) < n = dim(V ). If u ∈ U , then T (u) = (T − λI)(u) + λu. Obviously (T − λI)(u) ∈ U (from the definition of U ) and λu ∈ U . Thus the equation above shows that T (u) ∈ U , hence U is T -invariant. Thus T |U ∈ L(U ). By our induction hypothesis, there is a basis (u1 , . . . , um ) of U
7.2. UPPER-TRIANGULAR MATRICES
117
with respect to which T |U has an upper triangular matrix. Thus for each j we have (using Theorem 7.2.2) T (uj ) = (T |U )(uj ) ∈ span(u1 , . . . , uj ).
(7.3)
Extend (u1 , . . . , um ) to a basis (u1 , . . . , um , v1 , . . . , vr ) of V . For each k, 1 ≤ k ≤ r, we have T (vk ) = (T − λI)(vk ) + λvk . The definition of U shows that (T − λI)(vk ) ∈ U = span(u1 , . . . , um ). Clearly T (vk ) ∈ span(u1 , . . . , um , v1 , . . . , vk ). It is now clear that that T has an upper triangular matrix with respect to the basis (u1 , . . . , um , v1 , . . . , vr ). Obs. 7.2.4. Suppose T ∈ L(V ) has an upper triangular matrix with respect to some basis B of V . Then T is invertible if and only if all the entries on the diagonal of that upper triangular matrix are nonzero. Proof. We know that T is invertible if and only if [T ]B is invertible, which by Result 6.3.2 and Theorem 6.3.7 is invertible if and only if the diagonal entries of [T ]B are all nonzero. Corollary 7.2.5. Suppose T ∈ L(V ) has an upper triangular matrix with respect to some basis B of V . Then the eigenvalues of T consist precisely of the entries on the diagonal of [T ]B . (Note: With a little care the diagonal of the upper triangular matrix can be arranged to have all equal eigenvalues bunched together. To see this, rework the proof of Theorem 7.2.3.) Proof. Suppose the diagonal entries of the n×n upper triangular matrix [T ]B are λ1 , . . . , λn . Let λ ∈ F . Then [T − λI]B is upper triangular with diagonal elements equal to λ1 − λ, λ2 − λ, . . . , λn − λ. Hence T − λI is not invertible if and only if λ equals one of the λj ’s. In other words, λ is an eigenvalue of T if and only λ equals one of the λj ’s as desired. Obs. 7.2.6. Let B = (v1 , . . . , vn ) be a basis for V . An operator T ∈ L(V ) has a diagonal matrix diag(λ1 , . . . , λn ) with respect to B if and only if T (vi ) = λi vi , i.e., each vector in B is an eigenvector of T .
118
CHAPTER 7. OPERATORS AND INVARIANT SUBSPACES
In some ways, the nicest operators are those which are diagonalizable, i.e., those for which there is some basis with respect to which they are represented by a diagonal matrix. But this is not always the case even when the field F is algebraically closed. Consider the following example over the complex numbers. Define T ∈ L(C 2 ) by T (y, z)= (z, 0). As you should verify, if S is 0 1 the standard basis of C 2 , then [T ]S = , so the only eigenvalue of T 0 0 is 0. But null(T − 0I) is 1-dimensional. So clearly C 2 does not have a basis consisting of eigenvectors of T . One of the recurring themes in linear algebra is that of obtaining conditions that guarantee that some operator have a diagonal matrix with respect to some basis. Theorem 7.2.7. If dim(V ) = n and T ∈ L(V ) has exactly n distinct eigenvalues, then T has a diagonal matrix with respect to some basis. Proof. Suppose that T has distinct eigenvalues λ1 , . . . , λn , and let vj be a nonzero eigenvector belonging to λj , 1 ≤ j ≤ n. Because nonzero eigenvectors corresponding to distinct eigenvalues are linearly independent, (v1 , . . . , vn ) is linearly independent, and hence a basis of V . So with respect to this basis T has a diagonal matrix. The following proposition gathers some of the necessary and sufficient conditions for an operator T to be diagonalizable. Theorem 7.2.8. Let λ1 , . . . , λm denote the distinct eigenvalues of T ∈ L(V ). Then the following are equivalent: (i) T has a diagonal matrix with respect to some basis of V . (ii) V has a basis consisting of eigenvectors of T . (iii) There exist one-dimensional T -invariant subspaces U1 , . . . , Un of V such that V = U1 ⊕ · · · ⊕ Un . (iv) V = null(T − λ1 I) ⊕ · · · ⊕ null(T − λm I). (v) dim(V ) = dim(null(T − λ1 I)) + · · · + dim(null(T − λm I)). Proof. At this stage it should be clear to the reader that (i), (ii) and (iii) are equivalent and that (iv) and (v) are equivalent. At least you should think about this until the equivalences are quite obvious. We now show that (ii) and (iv) are equivalent.
7.2. UPPER-TRIANGULAR MATRICES
119
Suppose that V has a basis B = (v1 , . . . , vn ) consisting of eigenvectors of T . We may group together those vi ’s belonging to the same eigenvalue, say v1 , . . . , vd1 belong to λ1 so span a subspace U1 of null(T − λ1 I); vd1 +1 , . . . , vd1 +d2 belong to λ2 so span a subspace U2 of null(T −λ2 I); . . . , and the last dm of the vi ’s belong to λm and span a subspace Um of null(T −λm I). Since B spans all of V , it is clear that V = U1 + · · · + Um . Since the subspaces of eigenvectors belonging to distinct eigenvalues are independent (an easy corollary of Theorem 7.1.2), it must be that V = U1 ⊕ · · · ⊕ Um . Hence each Ui is all of null(T − λi I). Conversely, if V = U1 ⊕ · · · ⊕ Um , by joining together bases of the Ui ’s we get a basis of V consisting of eigenvectors of T . There is another handy condition that is necessary and sufficient for an operator on a finite-dimensional vector space to be diagonalizable. Before giving it we need the following lemma. Lemma 7.2.9. Let T ∈ L(V ) where V is an n-dimensional vector space over the algebraically closed field F . Suppose that λ ∈ F is an eigenvalue of T . The λ must be a root of the minimal polynomial p(x) of T . Proof. Let v be a nonzero eigenvector of T associated with λ. It is easy to verify that p(T )(v) = p(λ) · v. But if P (T ) is the zero operator, then p(T )(v) = 0 = p(λ) · v implies that p(λ) = 0. Theorem 7.2.10. Let V be an n-dimensional vector space over the algebraically closed field F , and let T ∈ L(V ). Then T is diagonalizable if and only if the minimal polynomial for T has no repeated root. Proof. Suppose that T ∈ L(V ) is diagonalizable. This means that there is a basis B of V such that [T ]B is diagonal, say it is the matrix diag(λ1 , . . . , λn ) = A. If f (x) is any polynomial over F , then f (A) = diag(f (λ1 ), . . . , f (λn ). So if λ1 , . . . , λr are the distinct eigenvalues of T , and if p(x) = (x − λ1 )(x − λ2 ) · · · (x − λr ), then p(A) = 0, so p(T ) = 0. Hence the minimal polynomial has no repeated roots (since it must divide p(x)). The converse is a bit more complicated. Suppose that p(x) = (x − λ1 ) · · · (x − λr ) is the minimal polynomial of T with λ1 , . . . , λr distinct. By the preceding lemma we know that all the eigenvalues of T appear among λ1 , . . . , λr . Let
Wi = null(T − λi I) for 1 ≤ i ≤ r. We proved earlier that W1 + · · · + Wr = W1 ⊕ · · · ⊕ Wr . What we need at this point is to show that W1 ⊕ · · · ⊕ Wr is all of V . For 1 ≤ j ≤ r put

    pj (x) = p(x)/(x − λj ).

Note that p′(x) = Σ_{j=1}^r pj (x), so that p′(λi ) = Π_{k≠i} (λi − λk ). Also, pj (λi ) = 0 if i ≠ j, while pi (λi ) = p′(λi ). Use the partial fraction technique to write

    1/p(x) = Σ_{i=1}^r ai /(x − λi ).

The usual process shows that ai = 1/p′(λi ), so we find

    1 = Σ_{i=1}^r (1/p′(λi )) pi (x).

For each i, 1 ≤ i ≤ r, put Pi = (1/p′(λi )) pi (T ), so I = P1 + P2 + · · · + Pr . Since p(x) = (x − λi )pi (x), we have (T − λi I)pi (T )(v) = 0 for all v ∈ V . This means that Pi (v) ∈ Wi for all v ∈ V . Moreover, for each v ∈ V ,

    v = P1 (v) + P2 (v) + · · · + Pr (v) ∈ W1 ⊕ · · · ⊕ Wr .

This says V is the direct sum of the eigenspaces of T . Hence if we choose any basis for each Wi and take their union, we have a basis B for V consisting of eigenvectors of T , so that [T ]B is diagonal.
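The projections Pi appearing in this proof are easy to compute once a matrix for T is chosen. The following sketch is a purely illustrative numerical check (the matrix A and all names are arbitrary choices, not part of the text): it forms Pi = pi (A)/p′(λi ) for a diagonalizable A and verifies that the Pi sum to I and are idempotent.

    import numpy as np

    # Illustrative diagonalizable matrix with distinct eigenvalues 1, 2, 3.
    A = np.array([[2.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 3.0]])
    lams = np.unique(np.round(np.linalg.eigvals(A).real, 8))   # distinct eigenvalues

    n = A.shape[0]
    I = np.eye(n)
    projections = []
    for lam in lams:
        # P_i = product over the other eigenvalues of (A - mu I)/(lam - mu),
        # i.e. p_i(A) divided by p'(lam).
        P = I.copy()
        for mu in lams:
            if mu != lam:
                P = P @ (A - mu * I) / (lam - mu)
        projections.append(P)

    print(np.allclose(sum(projections), I))                # True: P_1 + ... + P_r = I
    print([np.allclose(P @ P, P) for P in projections])    # each P_i is idempotent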
Corollary 7.2.11. Let T ∈ L(V ) be diagonalizable. Let U be a T -invariant subspace of V . Then T |U is diagonalizable. Proof. Let p(x) be the minimal polynomial for T . If u ∈ U , then p(T )(u) = 0. So p(T )|U is the zero operator on U , implying that p(x) is a multiple of the minimal polynomial for T |U . Since T is diagonalizable its minimal polynomial has no repeated roots. Hence the minimal polynomial for T |U can have no repeated roots, implying that T |U is diagonalizable.
7.3
Invariant Subspaces of Real Vector Spaces
We have seen that if V is a finite dimensional vector space over an algebraically closed field F then each linear operator on V has an eigenvalue in
F . We have also seen an example that shows that this is not the case for real vector spaces. This means that an operator on a nonzero finite dimensional real vector space may have no invariant subspace of dimension 1. However, we now show that an invariant subspace of dimension 1 or 2 always exists.

Theorem 7.3.1. Every operator on a finite dimensional, nonzero, real vector space has an invariant subspace of dimension 1 or 2.

Proof. Suppose V is a real vector space with dim(V ) = n > 0 and let T ∈ L(V ). Choose v ∈ V with v ≠ ~0. Since (v, T (v), . . . , T^n (v)) must be linearly dependent (Why?), there are scalars a0 , a1 , . . . , an ∈ R such that not all the ai ’s are zero and

    ~0 = a0 v + a1 T (v) + · · · + an T^n (v).

Construct the polynomial f (x) = Σ_{i=0}^n ai x^i , which can be factored in the form

    f (x) = c(x − λ1 ) · · · (x − λr )(x² + α1 x + β1 ) · · · (x² + αk x + βk ),

where c is a nonzero real number, each λj , αj , and βj is real, r + k ≥ 1, αj² < 4βj , and the equation holds for all x ∈ R. We then have

    0 = a0 v + a1 T (v) + · · · + an T^n (v)
      = (a0 I + a1 T + · · · + an T^n )(v)                                   (7.4)
      = c(T − λ1 I) · · · (T − λr I)(T² + α1 T + β1 I) · · · (T² + αk T + βk I)(v),

which means that T − λj I is not injective for at least one j or that T² + αj T + βj I is not injective for at least one j. If T − λj I is not injective for some j, then T has an eigenvalue and hence a one-dimensional invariant subspace. If T² + αj T + βj I is not injective for some j, then we can find a nonzero vector w for which

    T² (w) + αj T (w) + βj w = ~0.                                           (7.5)

Using Eq. 7.5 it is easy to show that span(w, T (w)) is T -invariant and it clearly has dimension 1 or 2. If it had dimension 1, then w would be an eigenvector of T belonging to an eigenvalue λ that would have to be a root of x² + αj x + βj = 0, contradicting the assumption that αj² < 4βj . Hence T has a 2-dimensional invariant subspace.

In fact we now have an easy proof of the following:
Theorem 7.3.2. Every operator on an odd-dimensional real vector space has an eigenvalue. Proof. Let V be a real vector space with dim(V ) odd and let T ∈ L(V ). We know that the eigenvalues of T are the roots of the characteristic polynomial of T which has real coefficients and degree equal to dim(V ). But every real polynomial of odd degree has a real root by Lemma 1.1.2, i.e., T has a real eigenvalue.
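A quick numerical illustration of Theorems 7.3.1 and 7.3.2 (the matrices are arbitrary choices, not from the text): an operator on an odd-dimensional real space always has a real eigenvalue, while a plane rotation has none but still leaves a 2-dimensional subspace invariant.

    import numpy as np

    rng = np.random.default_rng(0)

    # Theorem 7.3.2: a random 5 x 5 real matrix has at least one real eigenvalue.
    A = rng.standard_normal((5, 5))
    eigs = np.linalg.eigvals(A)
    print(any(abs(e.imag) < 1e-10 for e in eigs))    # True

    # A rotation of R^2 has no real eigenvalue; its only invariant subspaces
    # of positive dimension are {0} excluded and R^2 itself (dimension 2).
    theta = 0.7
    B = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    print(np.linalg.eigvals(B))                      # a complex-conjugate pair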
7.4
Two Commuting Linear Operators
Let V be a finite-dimensional vector space over the field F , and let T be a linear operator on V . Suppose T has matrix

    A = (  0  1 )
        ( −1  0 )

with respect to some basis. If i is an element in F (or in some extension of F ) for which i² = −1, then the eigenvalues of T are ±i. So T has eigenvectors in V if and only if i ∈ F . For example, if F = R, then T has no eigenvectors. To avoid having to deal with this kind of situation we assume from now on that F is algebraically closed, so that each polynomial that has coefficients in F splits into linear factors over F . In particular, any linear operator T on V will have minimal and characteristic polynomials that split into linear factors over F . Our primary example of an algebraically closed field is the field C of complex numbers.

Recall that the ring F [x] of polynomials in the indeterminate x with coefficients from F is a principal ideal domain. This means that if I is an ideal of F [x], it must consist of all multiples of some particular element of F [x]. Our chief example is the following: Let T be any linear operator on V , let W be a T -invariant subspace of V , and let v be any vector in V . Put T (v, W ) = {f (x) ∈ F [x] : f (T )(v) ∈ W }. It is easy to show that T (v, W ) is an ideal of F [x]. (This just means that the sum of any two polynomials in T (v, W ) is also in T (v, W ), and if f (x) ∈ T (v, W ) and g(x) is any polynomial in F [x], then the product f (x)g(x) is back in T (v, W ).) Hence there is a unique monic polynomial g(x) of minimal degree in T (v, W ), called the T -conductor of v into W . For this conductor g(x) it is true that f (x) ∈ T (v, W ) if and only if there is some h(x) ∈ F [x] for which f (x) = g(x) · h(x). If W = {~0}, then g(x) is called the T -annihilator of v. Clearly the minimal polynomial p(x) of T is in T (v, W ), so g(x) divides p(x). All
these polynomials have coefficients in F , so by hypothesis they all split into linear factors over F . The fact that p(x) divides any polynomial q(x) for which q(T ) = 0 is quite important. Here is an example. Suppose W is a subspace invariant under T , so the restriction T |W of T to vectors in W is a linear operator on W . Let g(x) be the minimal polynomial for T |W . Since p(T ) = 0, clearly p(T |W ) = 0, so g(x) divides p(x). Theorem 7.4.1. Let V be a finite-dimensional vector space over (the algebraically closed) field F , with n = dim(V ) ≥ 1. Let S and T be commuting linear operators on V . Then every eigenspace of T is invariant under S, and S and T have a common eigenvector in V . Proof. Since dim(V ) ≥ 1 and F is algebraically closed, T has an eigenvector v1 in V with associated eigenvalue c ∈ F , i.e., 0 6= v1 ∈ V and (T − cI)(v1 ) = 0. Put W = {v ∈ V : (T − cI)(v) = 0}, so W is the eigenspace of T associated with the eigenvalue c. To see that W is invariant under S let w ∈ W , so T (w) = cw. Then since S commutes with T , it also commutes with T − cI, and (T − cI)(S(w)) = [(T − cI)S](w) = [S(T − cI)](w) = S((T − cI)(w)) = S(0) = 0. This says that S(w) is in W , so S acts on W , which has dimension at least 1. But then S|W is a linear operator on W and must have an eigenvector w1 in W . So w1 is a common eigenvector of S and T . Note: In the above proof we do not claim that every element of W is an eigenvector of S. Also, S and T play symmetric roles in the above proof, so that also each eigenspace of S is invariant under T . We need to use the concept of quotient space. Let W be a subspace of V , where V is any vector space over a field F . For each v ∈ V , the set v + W = {v + w : w ∈ W } is called a coset of the subgroup W in the additive group (V, +), and these cosets form a group V /W (called a quotient group) with the following binary operation: (v1 + W ) + (v2 + W ) := (v1 + v2 ) + W . From Modern Algebra we know that the quotient group (V /W, +) is also an abelian group (Think about the different roles being played by the symbol “+” here.) We can make this quotient group V /W into a vector space over the same field by defining a scalar multiplication as follows: for c ∈ F and v + W ∈ V /W , put c(v + W ) = cv + W . It is routine to show that this makes
V /W into a vector space. Moreover, if B1 = (v1 , . . . , vr ) is a basis for W , and B2 = (v1 , . . . , vr , vr+1 , . . . , vn ) is a basis for V , then (vr+1 + W, . . . , vn + W ) is a basis for V /W . (Be sure to check this out!!) Hence dim(V ) = dim(W ) + dim(V /W ). Moreover, if W is invariant under the operator T ∈ L(V ), then T induces a linear operator T̄ on V /W as follows:

    T̄ : V /W → V /W : v + W ↦ T (v) + W.

Clearly if T, S ∈ L(V ), if W is invariant under both S and T , and if S · T = T · S, then T̄ · S̄ = S̄ · T̄ . We are now ready for the following theorem on two commuting operators.

Theorem 7.4.2. Let S and T be two commuting operators on V over the algebraically closed field F . Then there is a basis B of V with respect to which both [T ]B and [S]B are upper triangular.

Proof. We proceed by induction on n, the dimension of V . By the previous theorem there is a vector v1 ∈ V such that T v1 = λ1 v1 and Sv1 = µ1 v1 for some scalars λ1 and µ1 . Let W be the subspace spanned by v1 . Then the dimension of V /W is n − 1, and the induced operators T̄ and S̄ on V /W commute, so by our induction hypothesis there is a basis B1 = (v2 + W, v3 + W, . . . , vn + W ) of V /W with respect to which both T̄ and S̄ have upper triangular matrices. It follows that B = (v1 , v2 , . . . , vn ) is a basis of V with respect to which both T and S have upper triangular matrices.

Theorem 7.4.3. Let T be a diagonalizable operator on V and let S ∈ L(V ). Then ST = T S if and only if each eigenspace of T is invariant under S.

Proof. Since T is diagonalizable, there is a basis B of V consisting of eigenvectors of T . Let W be the eigenspace of T associated with the eigenvalue λ, and let w ∈ W . If S(w) ∈ W , then (T S)(w) = T (Sw) = λSw = S(λw) = S(T w) = ST (w). So if each eigenspace of T is invariant under S, then S and T commute at each element of B, implying that ST = T S on all of V . Conversely, suppose that ST = T S. Then for any w ∈ W , T (Sw) = S(T w) = S(λw) = λS(w), implying that Sw ∈ W . So each eigenspace of T must be invariant under S.

Note that even if T is diagonalizable and S commutes with T , it need not be the case that S must be diagonalizable. For example, if T = I, then V is the only eigenspace of T , and if S is any non-diagonalizable operator on V ,
then S still commutes with T . However, if both T and S are known to be diagonalizable, then we can say a bit more. Theorem 7.4.4. Let S and T both be diagonalizable operators on the ndimensional vector space V over the field F . Then S and T commute if and only if S and T are simultaneously diagonalizable. Proof. First suppose that S and T are simultaneously diagonalizable. Let B be a basis of V with respect to which both [T ]B and [S]B are diagonal. Since diagonal matrices commute, [T ]B and [S]B commute, implying that T and S commute. Conversely, suppose that T and S commute. We proceed by induction on n. If n = 1, then any basis is equivalent to the basis consisting of any nonzero vector, and any 1 × 1 matrix is diagonal. So assume that 1 < n and the result holds over all vector spaces of dimension less than n. If T is a scalar times the identity operator, then clearly any basis that diagonalizes S will diagonalize both S and T . So suppose T is not a scalar times the identity. Let λ be an eigenvalue of T and put W = null(T − λI). So W is the eigenspace of T associated with λ, and by hypothesis 1 ≤ dim(W ) < n. By Theorem 7.4.3 W is invariant under S. It follows that T |W and S|W are both diagonalizable by Corollary 7.2.11, and hence by the induction hypothesis the two are simultaneously diagonalizable. Let BW be a basis for W which consists of eigenvectors of both T |W and S|W , so they are also eigenvectors of both T and S. Repeat this process for each eigenvalue of T and let B be the union of all the bases of the various eigenspaces. Then the matrices [T ]B and [S]B are both diagonal. Theorem 7.4.5. Let T be a diagonalizable operator on V , an n-dimensional vector space over F . Let S ∈ L(V ). Then there is a polynomial f (x) ∈ F [x] such that S = f (T ) if and only if each eigenspace of T is contained in a single eigenspace of S. Proof. First suppose that S = f (T ) for some f (x) ∈ F [x]. If T (v) = λv, then S(v) = f (T )(v) = f (λ)v. Hence the entire eigenspace of T associated with λ is contained in the eigenspace of S associated with its eigenvalue f (λ). This completes the proof in one direction. For the converse, let λ1 , . . . , λr be the distinct eigenvalues of T , and let Wi = null(T − λi I), the eigenspace of T associated with λi . Let µi be
the eigenvalue of S whose corresponding eigenspace contains Wi . Note that the values µ1 , . . . , µr might not be distinct. Use Lagrange interpolation to construct the polynomial f (x) ∈ F [x] for which f (λi ) = µi , 1 ≤ i ≤ r. Then define an operator S′ ∈ L(V ) as follows. Let B be a basis of V consisting of the union of bases of the eigenspaces Wi . For each v ∈ B, say v ∈ Wi , put S′(v) = f (T )(v) = f (λi )v = µi v = S(v). Then since S′ and S agree on a basis of V , they must be the same operator. Hence S = f (T ). (Here we have used the observation that if S′(v) = f (T )(v) for each v in some basis of V , then S′ = f (T ).)
Corollary 7.4.6. Let T ∈ L(V ) have n distinct eigenvalues where n = dim(V ). Then the following are equivalent: (i) ST = T S. (ii) S and T are simultaneously diagonalizable. (iii) S is a polynomial in T .

Proof. Since T has n distinct eigenvalues, its minimal polynomial has no repeated factors, so T is diagonalizable. Then Theorem 7.4.3 says that ST = T S iff each eigenspace of T is invariant under S. Since each eigenspace of T is 1-dimensional, this means that each eigenvector of T is also an eigenvector of S. Using Theorem 7.4.5 we easily see that the corollary is completely proved.

We note the following example: If T is any invertible operator, so the constant term of its minimal polynomial is not zero, it is easy to use the minimal polynomial to write I as a polynomial in T . However, any polynomial in I is just some constant times I. So if T is any invertible operator that is not a scalar times I, then I is a polynomial in T , but T is not a polynomial in I.
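Corollary 7.4.6 can be checked numerically: when T has distinct eigenvalues, any commuting S equals f (T ) for the Lagrange interpolating polynomial f sending each eigenvalue of T to the corresponding eigenvalue of S. A minimal sketch, with T and S taken diagonal purely for illustration:

    import numpy as np

    lam = np.array([1.0, 2.0, 4.0])      # eigenvalues of T (illustrative)
    mu  = np.array([5.0, -1.0, 3.0])     # corresponding eigenvalues of S
    T = np.diag(lam)
    S = np.diag(mu)                      # S commutes with T

    # Lagrange interpolation via a Vandermonde system: f(lam_i) = mu_i.
    coeffs = np.linalg.solve(np.vander(lam, increasing=True), mu)

    fT = sum(c * np.linalg.matrix_power(T, k) for k, c in enumerate(coeffs))
    print(np.allclose(fT, S))            # True: S = f(T)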
7.5
Commuting Families of Operators∗
Let V be an n-dimensional vector space over F , and let F be a family of linear operators on V . We want to know when we can simultaneously triangularize or diagonalize the operators in F, i.e., find one basis B such that all of the matrices [T ]B for T ∈ F are upper triangular, or they are all diagonal. In the case of diagonalization, it is necessary that F be a commuting family of
operators: U T = T U for all T, U ∈ F. That follows from the fact that all diagonal matrices commute. Of course, it is also necessary that each operator in F be a diagonalizable operator. In order to simultaneously triangularize, each operator in F must be triangulable. It is not necessary that F be a commuting family; however, that condition is sufficient for simultaneous triangulation as long as each T in F can be individually triangulated. The subspace W of V is invariant under (the family of operators) F if W is invariant under each operator in F. Suppose that W is a subspace that is T -invariant for some T ∈ L(V ). Recall the definition of the T -conductor of v into W . It is the set {f (x) ∈ F [x] : f (T )(v) ∈ W }. We saw that since F [x] is a PID there must be a monic polynomial g(x) of minimal degree in this ideal such that f (x) belongs to this ideal if and only if f (x) = g(x)h(x) for some h(x) ∈ F [x]. This polynomial g(x) was called the T -conductor of v into W . Since ~0 ∈ W , it is clear that g(x) divides the minimal polynomial for T . Lemma 7.5.1. Let F be a commuting family of triangulable linear operators on V . Let W be a proper subspace of V which is invariant under F. There exists a vector v ∈ V such that: (a) v is not in W ; (b) for each T in F, the vector T (v) is in the subspace spanned by v and W. Proof. Since the space of all linear operators on V is a vector space with dimension n2 , it is easy to see that without loss of generality we may assume that F has only finitely many operators. For let {T1 , . . . , Tr } be a maximal linearly independent subset of F, i.e., a basis for the subspace of L(V ) spanned by F. If v is a vector such that (b) holds for each Ti , then (b) holds for each each operator which is a linear combination of T1 , . . . , Tr . First we establish the lemma for one operator T . To do this we need to show that there is some vector v ∈ (V \ W ) for which the T -conductor of v into W is a linear polynomial. Since T is triangulable, its minimal polynomial p(x), as well as its characteristic polynomial, factor over F into a product of linear factors. Say p(x) = (x − c1 )e1 (x − c2 )e2 · · · (x − cr )er . Let w be any vector of V that is not in W , and let g(x) be the T -conductor of w into W . Then g divides the minimal polynomial for T . Since w is not in W , g is not constant. So g(x) = (x − c1 )f1 (x − c2 )f2 · · · (x − cr )fr where at least one of the integers fi is positive. Choose j so that fj > 0. Then (x − cj ) divides g(x):
g = (x − cj )h(x). If h(x) = 1 (it must be monic in any case), then w is the vector we seek. By definition of g, if h(x) has degree at least 1, then the vector v = h(T )(w) cannot be in W . But (T − cj I)(v) = (T − cj I)h(T )(w) = g(T )(w) ∈ W.
(7.6)
Now return to thinking about the family F. By the previous paragraph we can find a vector v1 not in W and a scalar c1 such that (T1 −c1 I)(v1 ) ∈ W . Let V1 be the set of all vectors v ∈ V such that (T1 − c1 I)(v) ∈ W . Then V1 is a subspace of V that properly contains W . Since T ∈ F commutes with T1 we have (T1 − c1 I)(T (v)) = T (T1 − c1 I)(v). If v ∈ V1 , then (T1 − c1 I)v ∈ W . Since W is invariant under each T in F, we have T (T1 − c1 I)(v) ∈ W i.e., T v ∈ V1 , for all v ∈ V1 and all T ∈ F. Note again that W is a proper subspace of V1 . Put U2 = T2 |V1 , the operator obtained by restricting T2 to the subspace V1 . The minimal polynomial for U2 divides the munimum polynomial for T2 . By the second paragraph of this proof applied to U2 and the invariant subspace W , there is a vector v2 ∈ V1 but not in W , and a scalar c2 such that (T2 − c2 I)(v2 ) ∈ W . Note that (a) v2 6∈ W ; (b) (T1 − c1 I)(v2 ) ∈ W ; (c) (T2 − c2 I)(v2 ) ∈ W . Let V2 = {v ∈ V1 : (T2 − c2 I)(v) ∈ W }. Then V2 is invariant under F. Apply the same ideas to U3 = T3 |V2 . Continuing in this way we eventually find a vector v = vr not in W such that (Tj − cj I)(v) ∈ W , for 1 ≤ j ≤ r. Theorem 7.5.2. Let V be a finite-dimensional vector space over the field F . Let F be a commuting family of triangulable linear operators on V (i.e., the minimal polynomial of each T ∈ F splits into linear factors). There exists an ordered basis for V such that every operator in F is represented by a triangular matrix with respect to that basis. Proof. Start by applying Lemma 7.5.1 to the F-invariant subspace W = {0} to obtain a nonzero vector v1 for which T (v1 ) ∈ W1 = hv1 i for all T ∈ F.
Then apply the lemma to W1 to find a vector v2 not in W1 but for which T (v2 ) ∈ W2 = ⟨v1 , v2 ⟩ for all T ∈ F. Proceed in this way until a basis B = (v1 , v2 , . . .) of V has been obtained. Clearly [T ]B is upper triangular for every T ∈ F.

Corollary 7.5.3. Let F be a commuting family of n × n matrices over an algebraically closed field F . There exists a nonsingular n × n matrix P with entries in F such that P⁻¹AP is upper-triangular, for every matrix A in F.

Theorem 7.5.4. Let F be a commuting family of diagonalizable linear operators on the finite-dimensional vector space V . There exists an ordered basis for V such that every operator in F is represented in that basis by a diagonal matrix.

Proof. If dim(V ) = 1 or if each T ∈ F is a scalar times the identity, then there is nothing to prove. So suppose 1 < n = dim(V ) and that the theorem is true for vector spaces of dimension less than n. Also assume that for some T ∈ F, T is not a scalar multiple of the identity. Since the operators in F are all diagonalizable, we know that each minimal polynomial splits into distinct linear factors. Let c1 , . . . , ck be the distinct eigenvalues of T , and for each index i put Wi = null(T − ci I). Fix i. Then Wi is invariant under every operator that commutes with T . Let Fi be the family of linear operators on Wi obtained by restricting the operators in F to the invariant subspace Wi . Each operator in Fi is diagonalizable, because its minimal polynomial divides the minimal polynomial for the corresponding operator in F. By hypothesis, dim(Wi ) < dim(V ). So the operators in Fi can be simultaneously diagonalized. In other words, Wi has a basis Bi which consists of vectors which are simultaneously characteristic vectors for every operator in Fi . Then B = (B1 , . . . , Bk ) is the basis of V that we seek.
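A small numerical illustration of simultaneous diagonalization (the matrices are arbitrary choices; B is manufactured as a polynomial in A simply to guarantee a commuting, diagonalizable pair):

    import numpy as np

    A = np.array([[3.0, 1.0],
                  [0.0, 1.0]])           # distinct eigenvalues 3 and 1, so diagonalizable
    B = 2 * A @ A - A + np.eye(2)        # a polynomial in A, hence commutes with A

    evals, P = np.linalg.eig(A)          # columns of P: a basis of eigenvectors of A
    Pinv = np.linalg.inv(P)

    def offdiag(M):
        return np.abs(M - np.diag(np.diag(M))).max()

    print(np.allclose(A @ B, B @ A))                                   # True
    print(offdiag(Pinv @ A @ P) < 1e-10, offdiag(Pinv @ B @ P) < 1e-10)  # True True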
7.6
The Fundamental Theorem of Algebra∗
This section is based on the following article: Harm Derksen, The Fundamental Theorem of Algebra and Linear Algebra, The American Mathematical Monthly, vol 110, Number 7, August-September 2003, 620 – 623. We start by quoting from the third paragraph of the article by Derksen. “Since the fundamental theorem of algebra is needed in linear algebra courses, it would be desirable to have a proof of it in terms of linear algebra. In this paper we prove that every square matrix with complex coefficients has
an eigenvector. This statement is equivalent to the fundamental theorem of algebra. In fact, we will prove the slightly stronger result that any number of commuting square matrices with complex entries have a common eigenvector. The proof lies entirely within the framework of linear algebra, and unlike most other algebraic proofs of the fundamental theorem of algebra, it does not require Galois theory or splitting fields. ” Preliminaries Several results we have obtained so far have made the assumption that the field F was algebraically closed. Moreover, we often gave the complex numbers C as the prototypical example. So in this section we have to be careful not to quote any results that might have hidden in them the assumption that C is algebraically closed. For the proof we use only the following elementary properties of real and complex numbers that were established much earlier. Lemma Every polynomial of odd degree with real coefficients has a (real) zero. Lemma Every complex number has a square root. Theorem 7.3.2 If A is real, n×n, with n odd, then A has an eigenvector (belonging to a real eigenvalue). An Induction Argument Keep in mind that we cannot use results that might have hidden in them the assumption that C is algebraically closed. For a field K and for positive integers d and r, consider the following statement: P (K, d, r): Any r commuting linear transformations A1 , A2 , . . . , Ar of a K- vector space V of dimension n such that d does not divide n have a common eigenvector. Lemma 7.6.1. If P (K, d, 1) holds, then P (K, d, r) holds for all r ≥ 1. It is important to realize that the smallest d for which the hypothesis of this lemma holds in a given situation might be much larger than d = 1. Proof. The proof is by induction on r. The case of P (K, d, 1) is true by hypothesis. For r ≥ 2, suppose that P (K, d, r − 1) is true and let A1 , . . . , Ar be commuting linear transformations of V of dimension such that d does not
divide n. Because P (K, d, 1) holds, Ar has an eigenvalue λ in K. Let W be the kernel and Z the image of Ar − λI. It is now easy to show that each of W and Z is left invariant by each of A1 , . . . , Ar−1 . First suppose that W ≠ V . Because dim W + dim Z = dim V , either d does not divide dim W or d does not divide dim Z. Since dim W < n and dim Z < n, we may assume by induction on n that A1 , . . . , Ar already have a common eigenvector in W or in Z. In the remaining case, W = V . Because P (K, d, r − 1) holds, we may assume that A1 , . . . , Ar−1 have a common eigenvector in V , say v. Since Ar v = λv (because W = V ), v is a common eigenvector of A1 , . . . , Ar .

Lemma 7.6.2. P (R, 2, r) holds for all r ≥ 1. In other words, if A1 , . . . , Ar are commuting linear transformations on an odd dimensional R-vector space, then they have a common eigenvector.

Proof. By Lemma 7.6.1 it is enough to show that P (R, 2, 1) is true. If A is a linear transformation of an odd dimensional R-vector space, det(xI − A) is a polynomial of odd degree, which has a zero λ by Lemma 1.1.2. Then λ is a real eigenvalue of A.

We now “lift” the result of Lemma 7.6.2 to the analogous result over the field C.

Definition If A is any m × n matrix over C we let A∗ denote the transpose of the complex conjugate of the matrix A, i.e., (A∗ )ij = Āji . Then A is said to be Hermitian if m = n and A∗ = A.

Lemma 7.6.3. P (C, 2, 1) holds, i.e., every linear transformation of a C-vector space of odd dimension has an eigenvector.

Proof. Suppose that TA : Cⁿ → Cⁿ : v ↦ Av is a C-linear map with n odd. Let V be the R-vector space Hermn (C), the set of n × n Hermitian matrices. Define two linear operators L1 and L2 on V by

    L1 (B) = (AB + BA∗ )/2   and   L2 (B) = (AB − BA∗ )/(2i).

It is now easy to show that dim V = n², which is odd. It is also routine to check that L1 and L2 commute. Hence by Lemma 7.6.2, P (R, 2, 2) holds
and implies that L1 and L2 have a common eigenvector B, say L1 (B) = λB and L2 (B) = µB, with λ and µ both real. But then (L1 + iL2 )(B) = AB = (λ + µi)B, and any nonzero column vector of B gives an eigenvector for the matrix A. Lemma 7.6.4. P (C, 2k , r) holds for all k ≥ 1and r ≥ 1. Proof. The proof is by induction on k. The case k = 1 follows from Lemmas 7.6.3 and 7.6.1. Assume that P (C, 2l , r) holds for l < k. We will establish that P (C, 2k , r) holds. In view of Lemma 7.6.1 it suffices to prove P (C, 2k , 1). Suppose that A : C n → C n is linear, where n is divisible by 2k−1 but not by 2k . Let V be the C-vector space Skewn (C) = {B ∈ Mn (C) : B ∗ = −B}, the set of n × n skew-symmetric matrices with complex entries. Note that dim V = n(n − 1)/2, which ensures that 2k−1 does not divide dim V . Define two commuting linear transformations L1 and L2 of V by L1 (B) = AB + BA∗ and L2 (B) = ABA∗ . It is an easy exercise to show that L1 and L2 are indeed both in L(V ) and they commute. By P (C, 2k−1 , 2), L1 and L2 have a common eigenvector B, say L1 (B) = λ(B) = AB + BA∗ , i.e., BA∗ = (λI − A)B and L2 (B) = µB = ABA∗ = A(λI − A)B = (λA − A2 )B, which implies (A2 − λA + µI)B = 0, where λ and µ are now complex numbers. Let v be a nonzero column of B. Then (A2 − λA − µI)v = 0.
Since each element of C has a square root, there is a δ in C such that δ² = λ² − 4µ. We can write x² − λx + µ = (x − α)(x − β), where α = (λ + δ)/2 and β = (λ − δ)/2. We then have
(A − αI)w = 0, where w = (A − βI)v. If w = 0, then v is an eigenvector of A with eigenvalue β; if w 6= 0, then w is an eigenvector of A with eigenvalue α. We have now reached the point where we can prove the main result for commuting operators on a complex space. Theorem 7.6.5. If A1 , A2 , . . . , Ar are commuting linear transformations of a finite dimensional nonzero C-vector space V , then they have a common eigenvector. Proof. Let n be the dimension of V . There exists a positive integer k such that 2k does not divide n. Since P (C, 2k , r) holds by Lemma 7.6.4, the theorem follows. Theorem 7.6.6. (The Fundamental Theorem of Algebra) If P (x) is a nonconstant polynomial with complex coefficients, then there exists a λ in C such that P (λ) = 0. Proof. It suffices to prove this for monic polynomials. So let P (x) = xn + a1 xn−1 + a2 xn−2 + · · · + an . Then P (x) = det (xI − A), where A is the companion matrix of P : A=
    0  0  · · ·  0  −an
    1  0  · · ·  0  −an−1
    0  1  · · ·  0  −an−2
    ⋮   ⋮    ⋱     ⋮     ⋮
    0  0  · · ·  1  −a1  .
Theorem 7.6.5 implies that A has a complex eigenvalue λ in C, from which it follows that P (λ) = 0.
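The companion matrix construction in this proof is easy to experiment with numerically. The sketch below is only an illustration (the cubic is an arbitrary choice): it builds the companion matrix of P (x) = x³ − 6x² + 11x − 6 and confirms that its eigenvalues are the roots 1, 2, 3.

    import numpy as np

    a = [-6.0, 11.0, -6.0]               # a1, a2, a3 for P(x) = x^3 + a1 x^2 + a2 x + a3
    n = len(a)
    A = np.zeros((n, n))
    A[1:, :-1] = np.eye(n - 1)           # subdiagonal of 1's
    A[:, -1] = [-a[n - 1 - k] for k in range(n)]   # last column: -a_n, -a_(n-1), ..., -a_1

    print(np.sort(np.linalg.eigvals(A).real))       # approximately [1. 2. 3.]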
7.7
Exercises
1. Let V be finite dimensional over F and let P ∈ L(V ) be idempotent. Determine the eigenvalues of P and show that P is diagonalizable. 2. If T, S ∈ L(V ) and T S = ST , show that (i) null(T ) is S-invariant; and (ii) If f (x) ∈ F [x], then null(f (T )) is S-invariant. 3. Suppose n is a positive integer and T ∈ L(F n ) is defined by T (z1 , z2 , . . . , zn ) = (z1 + · · · + zn , z1 + · · · + zn , . . . , z1 + · · · + zn ). Determine all eigenvalues and eigenvectors of T . 4. Suppose T ∈ L(V ) and dim(Im(T )) = k. Prove that T has at most k + 1 distinct eigenvalues. 5. Suppose that S, T ∈ L(V ), λ ∈ F and 1 ≤ k ∈ Z. Show that (T S − λI)k T = T (ST − λI)k . 6. Suppose that S, T ∈ L(V ). Prove that ST and T S have the same eigenvalues but not necessarily the same minimal polynomial. 7. Suppose that S, T ∈ L(V ) and at least one of S, T is invertible. Show that ST and T S have the same minimal and characteristic polynomials. 8. Suppose that F is algebraically closed, p(z) ∈ F [z] and a ∈ F . Prove that a is an eigenvalue of p(T ) if and only if a = p(λ) for some eigenvalue λ of T . (Hint: Suppose that a is an eigenvalue of p(T ). Factor p(z)−a = c(z −λ1 ) · · · (z −λm ). Use the fact that p(T )−aI is not injective. Don’t forget to consider what happens if c = 0.) 9. Suppose that S, T ∈ L(V ) and that T is diagonalizable. Suppose that each eigenvector of T is an eigenvector of S. Show that ST = T S. 10. Let T : V → V be a linear operator on the vector space over the field F . Let v ∈ V and let m be a positive integer for which v 6= 0, T (v) 6= 0, ..., T m−1 (v) 6= 0, but T m (v) = 0. Show that {v, T (v), . . . , T m−1 (v)} is a linearly independent set.
11. Let W be a subspace of the vector space V over any field F . (a) Prove the statements about the quotient space V /W made in the paragraph just before Theorem 7.4.2. (b) First Isomorphism Theorem: Each linear transformation T : V → W induces a linear isomorphism Φ : V /(null(T )) → Im(T ).
Chapter 8 Inner Product Spaces 8.1
Inner Products
Throughout this chapter F will denote a subfield of the complex numbers C, and V will denote a vector space over F . Definition An inner product on V is a scalar-valued function h , i : V × V → F that satisfies the following properties: (i) hu + v, wi = hu, wi + hv, wi for all u, v, w ∈ V . (ii) hcu, vi = chu, vi for all c ∈ F , u, v ∈ V . (iii) hv, ui = hu, vi for all u, v ∈ V , where the overline denotes complex conjugate . (iv) hu, ui > 0 if u 6= ~0. It is easy to check that the above properties force hu, cv + wi = chu, vi + hu, wi ∀u, v, w ∈ V, c ∈ F. Example 8.1.1. On F n there is a “standard” inner product defined as follows: for ~x = (x1 , . . . , xn ) and ~y = (y1 , . . . , yn ), put h~x, ~y i =
x1 ȳ1 + x2 ȳ2 + · · · + xn ȳn .

It is easy enough to show that the above definition really does give an inner product on F n . In fact, it is a special case of the following example.
Example 8.1.2. Let A be an invertible n × n matrix over F . Define h , i on F n (whose elements are written as row vectors for the purpose of this example) as follows. For u, v ∈ V put hu, vi = uAA∗ v ∗ ,
where B ∗ denotes the complex conjugate transpose of B.
It is a fairly straightforward exercise to show that this definition really gives an inner product. If A = I the standard inner product of the previous example is obtained.

Example 8.1.3. Let C(0, 1) denote the vector space of all continuous, real-valued functions on the interval [0, 1]. For f, g ∈ C(0, 1) define

    ⟨f, g⟩ = ∫₀¹ f (t)g(t) dt.
Again it is a routine exercise to show that this really gives an inner product.

Definition An inner product space is a vector space over F (a subfield of C) together with a specified inner product on that space.

Let V be an inner product space with inner product ⟨ , ⟩. The length of a vector v ∈ V is defined to be ||v|| = √⟨v, v⟩.

Theorem 8.1.4. If V is an inner product space, then for any vectors u, v ∈ V and any scalar c ∈ F ,
(i) ||cu|| = |c| ||u||;
(ii) ||u|| > 0 for u ≠ ~0;
(iii) |⟨u, v⟩| ≤ ||u|| ||v||;
(iv) ||u + v|| ≤ ||u|| + ||v||.

Proof. Statements (i) and (ii) follow almost immediately from the various definitions involved. The inequality in (iii) is clearly valid when u = ~0. If u ≠ ~0, put

    w = v − (⟨v, u⟩/||u||²) u.

It is easily checked that ⟨w, u⟩ = 0 and

    0 ≤ ||w||² = ⟨v − (⟨v, u⟩/||u||²)u, v − (⟨v, u⟩/||u||²)u⟩ = ⟨v, v⟩ − ⟨v, u⟩⟨u, v⟩/||u||² = ||v||² − |⟨u, v⟩|²/||u||².

Hence |⟨u, v⟩|² ≤ ||u||² ||v||². It now follows that Re⟨u, v⟩ ≤ |⟨u, v⟩| ≤ ||u|| · ||v||, and

    ||u + v||² = ||u||² + ⟨u, v⟩ + ⟨v, u⟩ + ||v||² = ||u||² + 2Re⟨u, v⟩ + ||v||² ≤ ||u||² + 2||u|| ||v|| + ||v||² = (||u|| + ||v||)².

Thus ||u + v|| ≤ ||u|| + ||v||.

The inequality in (iii) is called the Cauchy-Schwarz inequality. It has a very wide variety of applications. The proof shows that if u is nonzero, then |⟨u, v⟩| < ||u|| ||v|| unless v = (⟨v, u⟩/||u||²) u,
which occurs if and only if (u, v) is a linearly dependent list. You should try out the Cauchy-Schwarz inequality on the examples of inner products given above. The inequality in (iv) is called the triangle inequality. Definitions Let u and v be vectors in an inner product space V . Then u is orthogonal to v if and only if hu, vi = 0 if and only if hv, ui = 0, in which case we say u and v are orthogonal and write u ⊥ v. If S is a set of vectors in V , S is called an orthogonal set provided each pair of distinct vectors in S is orthogonal. An orthonormal set is an orthogonal set S with the additional property that ||u|| = 1 for every u ∈ S. Analogous definitions are made for lists of vectors.
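Following the suggestion above, here is a small numerical trial of the Cauchy-Schwarz and triangle inequalities for the standard inner product on C³ and (via a Riemann sum) for the integral inner product of Example 8.1.3; the sample vectors and functions are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(1)
    u = rng.standard_normal(3) + 1j * rng.standard_normal(3)
    v = rng.standard_normal(3) + 1j * rng.standard_normal(3)
    ip = np.vdot(v, u)                                  # <u, v> = sum u_i * conj(v_i)
    print(abs(ip) <= np.linalg.norm(u) * np.linalg.norm(v))                 # True
    print(np.linalg.norm(u + v) <= np.linalg.norm(u) + np.linalg.norm(v))   # True

    # A crude Riemann-sum version of <f, g> on [0, 1] for f = 2 + 3t, g = t^2.
    t = np.linspace(0.0, 1.0, 1001)
    f, g = 2 + 3 * t, t**2
    dt = t[1] - t[0]
    fg = np.sum(f * g) * dt
    print(fg**2 <= (np.sum(f * f) * dt) * (np.sum(g * g) * dt))             # True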
140
CHAPTER 8. INNER PRODUCT SPACES
Note: The standard basis of F n is an orthonormal list with respect to the standard inner product. Also, the zero vector is the only vector orthogonal to every vector. (Prove this!)

Theorem 8.1.5. An orthogonal set of nonzero vectors is linearly independent.

Proof. Let S be a finite or infinite orthogonal set of nonzero vectors in a given inner product space. Suppose v1 , . . . , vm are distinct vectors in S and that w = c1 v1 + c2 v2 + · · · + cm vm . Then

    ⟨w, vk ⟩ = ⟨Σ_{j} cj vj , vk ⟩ = Σ_{j} cj ⟨vj , vk ⟩ = ck ⟨vk , vk ⟩.

Since ⟨vk , vk ⟩ ≠ 0, it follows that

    ck = ⟨w, vk ⟩/||vk ||² ,   1 ≤ k ≤ m.

When w = ~0, each ck = 0, so S is an independent set.

Corollary 8.1.6. If a vector w is a linear combination of an orthogonal list (v1 , . . . , vm ) of nonzero vectors, then w is the particular linear combination

    w = Σ_{k=1}^m (⟨w, vk ⟩/||vk ||²) vk .      (8.1)
Theorem 8.1.7. (Pythagorean Theorem) If u ⊥ v, then ||u + v||² = ||u||² + ||v||².

Proof. Suppose u ⊥ v. Then

    ||u + v||² = ⟨u + v, u + v⟩ = ||u||² + ||v||² + ⟨u, v⟩ + ⟨v, u⟩ = ||u||² + ||v||².      (8.2)
8.1. INNER PRODUCTS
141
Theorem 8.1.8. (Parallelogram Equality) If u, v are vectors in the inner product space V , then ||u + v||2 + ||u − v||2 = 2(||u||2 + ||v||2 ). Proof. The details are routine and are left as an exercise (see exercise 1). Starting with an inner product h , i on a vector space U p over F (where F is some subfield of C), we defined a norm on U by ||v|| = hv, vi, for all v ∈ U . This norm function satisfies a variety of properties as we have seen in this section. Sometimes we have a norm function given and would like to know if it came from an inner product. The next theorem provides an answer to this question, but first we give an official definition of a norm. Definition A norm on a vector space U over the field F is a function || || : U → [0, ∞) ⊆ R such that (i) ||u|| = 0 iff u = ~0; (ii))||au|| = |a| · ||u|| ∀a ∈ F, u ∈ U ; (iii) ||u + v|| ≤ ||u|| + ||v||. Theorem 8.1.9. Let || || be a norm on U . Then there is an inner product 1 h , i on U such that ||u|| = hu, ui 2 for all u ∈ U if and only if || || satisfies the parallelogram equality. Proof. We have already seen that if the norm is derived from an inner product, then it satisfies the parallelogram equality. For the converse, now suppose that || || is a norm on U satisfying the parallelogram equality. We will show that there must have been an inner product from which the norm was derived in the usual fashion. We first consider the case F = R. It is then clear (from the real polarization identity - see Exercise 13) that h , i must be defined in the following way: ||u + v||2 − ||u − v||2 hu, vi = . 4 2
~
2
(8.3) 1
It is then clear that hu, ui = ||2u|| 4−||0|| | = ||u||2 , so ||u|| = hu, ui 2 for all u ∈ U . but it is not at all clear that h , i is an inner product. However, since hu, ui = ||u||2 , by the definition of norm we see that
(a) hu, ui ≥ 0, with equality if and only if u = ~0. Next we show that h , i is additive in the first slot. We use the parallel2 2 + ||u−v|| . ogram equality in the form ||u||2 + ||v||2 = ||u+v|| 2 2 Let u, v, w ∈ U . Then from the definition of h , i we have: 4(hu + v, wi − hu, wi − hv, wi) (which should be 0) = ||u + v + w||2 − ||u + v − w||2 − ||u + w||2 + ||u − w||2 − ||v + w||2 + ||v − w||2 = ||u+v +w||2 +(||u−w||2 +||v −w||2 )−||u+v −w||2 −(||u+w||2 +||v +w||2 ) ||u + v − 2w||2 ||u − v||2 ku + v + 2w||2 ||u − v||2 + −||u+v−w||2 − − 2 2 2 2 ||u + v + 2w||2 ||u + v − 2w||2 −(||u+v−w||2 +||w||2 )− = (||u+v+w||2 +||w||2 )+ 2 2 2 2 2 ||u + v + 2w|| ||u + v|| ||u + v − 2w|| = + + 2 2 2 2 2 ||u + v|| ||u + v − 2w|| ||u + v + 2w||2 − − − 2 2 2 = 0. = ||u+v+w||2 +
Hence hu + v, wi = hu, wi + hv, wi, proving (b) h , i is additive in the first slot. To prove that h , i is homogeneous in the first slot is rather more involved. First suppose that n is a positive integer. Then using additivity in the first slot we have hnu, vi = nhu, vi. Replacing u with n1 u gives hu, vi = nh n1 u, vi, so n1 hu, vi = h n1 u, vi. So if m, n are positive integers, h
1 m m u, vi = mh u, vi = hu, vi. n n n
Using property (ii) in the definition of norm we see ||−u|| = |−1|||u|| = ||u||. Then using the definition of h , i, we have h−u, vi =
|| − u + v||2 − || − u − v||2 | || − (u − v)||2 − || − (u + v)||2 = 4 4 =
−||u + v||2 + ||u − v||2 = −hu, vi. 4
8.1. INNER PRODUCTS
143
This shows that if r is any rational number, then hru, vi = rhu, vi. Now suppose that λ ∈ R is any real number (with special interest in the case where λ is not rational). There must be a sequence {rn }∞ n=1 of rational numbers for which limn→∞ rn = λ. Thus λhu, vi = limn→∞ rn hu, vi = limn→∞ hrn u, vi = ||rn u + v||2 − ||rn u − v||2 . = limn→∞ 4 We claim that limn→∞ ||rn u + v||2 = ||λu + v||2 and limn→∞ ||rn u − v||2 = ||λu − v||2 . Once we have shown this, we will have λhu, vi =
λu + v||2 − ||λu − v||2 = hλu, vi, 4
completing the proof that h , i is homogeneous in the first slot. For x, y ∈ U , ||x|| = ||y+(x−y)|| ≤ ||y||+||y−x||, so ||x||−||y|| ≤ ||x−y||. Interchanging the roles of x and y we see that also ||yk − kx|| ≤ ||y − x|| = ||x − y||. Hence |||x|| − ||y||| ≤ ||y − x||. With x = rn u + v and y = λu + v, this latter inequality gives |||rn u + v|| − ||λu + v||| ≤ ||(rn − λ)u|| = |rn − λ| · ||u||. Since rn − λ → 0, we have ||rn u + v|| → ||λu + v||. Replacing v with −v gives limn→∞ ||rn u − v|| = ||λu − v||. So indeed h , i is homogeneous in the first slot. Fiinally, we show that (d) hu, vi = hv, ui. So: hu, vi =
||v + u||2 − ||v − u||2 ||u + v||2 − ||u − v||2 = = hv, ui. 4 4
This completes the proof in the case that F = R. Now consider the case F = C. By the complex polarization identity (see Exercise 15) we must have 4
1X n hu, vi = i ||u + in v||2 . 4 n=1
(8.4)
144
CHAPTER 8. INNER PRODUCT SPACES
Then putting v = u we find 4
4
1X n 1X n hu, ui = i ||u + in u||2 = i |1 + in |2 ||u||2 4 n=1 4 n=1 ||u||2 [2i − 0 − 2i + 4] = ||u||2 , 4 as desired. But we must still show that h , i has the properties of an inner product. Because hu, u i = ||u||2 , it follows immediately from the properties of a norm that hu, ui ≥ 0 with equality if and only if u = ~0. =
For convenience define h , iR to be the real inner product defined above, so hu, viR =
||u + v||2 − ||u − v||2 . 4
Note that hu, vi = hu, viR + ihu, iviR . We have already proved that h , iR is additive in the first slot, which can now be used in a routine fashion to show that h , i is also additive in the first slot. It is even easier to show that hau, vi = ahu, vi for a ∈ R using the homogeneity of h , iR in the first slot. However, we must still extend this to all complex numbers. Note that: hiu, vi = ||iu + v||2 − ||iu − v||2 + i||iu + iv||2 − i||iu − iv||2 = 4 2 2 ||i(u + v)|| i − ||i(u − v)|| i − ||i(u + iv)||2 + ||i(u − iv)||2 = 4 2 2 ||u + v|| i − ||u − v|| i − ||u + iv||2 + ||u − iv||2 = 4 = ihu, vi. Combining this with additivity and homogeneity with respect to real numbers, we get that h(a + bi)u, vi = (a + bi)hu, vi ∀a, b ∈ R.
8.1. INNER PRODUCTS
145
Hence h , i is homogeneous in the first slot. The only thing remaining to show is that hv, ui = hu, vi. ||u + v||2 − ||u − v||2 + ||u + iv||2 i − ||u − iv||2 i 4 ||v + u||2 − ||v − u||2 + ||i(v − ui)||2 i − ||(−i)(v + ui)||2 i = 4 2 2 ||v + u|| − ||v − u|| + ||v + iu||2 i − ||v − iu||2 i = 4 = hv, ui.
hu, vi =
Now suppose that F is a subfield of C and that V = F n . There are three norms on V that are most commonly used in applications. Definition For vectors x = (x1 , . . . , xn )T ∈ V , the norms k · k1 , k · k2 , and k · k∞ , called the 1-norm, 2-norm, and ∞-norm, respectively, are defined as: kxk1 = |x1 | + |x2 | + · · · + |xn |; kxk2 = (|x1 |2 + |x2 |2 + · · · + |xn |2 )1/2 ; kxk∞ = max{|x1 |, |x2 |, . . . , |xn |}.
(8.5)
Put x = (1, 1)T ∈ F 2 and y = (−1, 1)T ∈ F 2 . Using these vectors x and y it is routine to show that k · k1 and k · k∞ do not satisfy the parallelogram equality, hence must not be derived from an inner product in the usual way. On the other hand, all three norms are equivalent in a sense that we are about to make clear. First we pause to notice that the so-called norms really are norms. The only step that is challenging is the triangle inequality for the 2-norm, and we proved this earlier. The details for the other two norms are left to the reader. Definition Let k · k be a norm on V . A sequence {vi }∞ i=1 of vectors is said to converge to the vector v∞ provided the sequence {kvi − v∞ k} of real numbers converges to 0. With this definition we can now talk of a sequence of vectors in F n converging by using norms. But which norm should we use? What we mean by saying that all three norms are equivalent is that a sequence of vectors
146
CHAPTER 8. INNER PRODUCT SPACES
converges to a vector v using one of the norms if and only if it converges to the same vector v using either of the other norms. Theorem 8.1.10. The 1-norm, 2-norm and ∞-norm on F n are all equivalent in the sense that (a) If a sequence {xi } of vectors converges to x∞ as determined in one of the norms, then it converges to x∞ in all three norms, and for fixed index j, the entries (xi )j converge to the entry (x∞ )j . This is an easy consequence of √ (b) kxk1 ≤ kxk2 n ≤ nkxk∞ ≤ nkxk1 . Proof. If u = (u1 , . . . , un ), put x = (|u1 |, . . . , |un |)T and y = (1, 1, . . . , 1)T . Then by the Cauchy-Schwarz inequality applied to x and y, we have qX X √ √ |ui |2 · n = kuk2 n. |hx, yi| = |ui | = kuk1 ≤ kxk2 kyk2 = √ So kxk1 ≤ kxk2 n for all x ∈ F n . 2 2 2 2 2 2 √ Next, kxk2 = x1 + · · · + xn ≤ n(max{|xi |}) = n · kx||∞ , implying kxk2 ≤ nkxk∞ . This proves the first and second inequalities in (b), and the third is quite obvious. In fact, any two norms on a finite dimensional vector space over F are equivalent (in the sense given above), but we do not need this result.
8.2
Orthonormal Bases
Theorem 8.2.1. Let L = (v1 , . . . , P vm ) be an orthonormal list of vectors in V and put W = span(L). If w = m i=1 ai vi is an arbitrary element of W , then (i) ai = hw, vP i i, and Pm 2 2 2 (ii) ||w|| = m |a | = i i=1 i=1 |hw, vi i| . P Pm Proof. If w = m i=1 ai vi , compute hw, vj i = h i=1 ai vi , vj i = aj . Then apply the Pythagorean Theorem. The preceding result shows that an orthonormal basis can be extremely handy. The next result gives the Gram-Schmidt algorithm for replacing a linearly independent list with an orthonormal one having the same span as the original.
8.2. ORTHONORMAL BASES
147
Theorem 8.2.2. (Gram-Schmidt method) If L = (v1 , . . . , vm ) is a linearly independent list of vectors in V , then there is an orthonormal list B = (e1 , . . . , em ) such that span(e1 , . . . , ej ) = span(v1 , . . . , vj ) for each j = 1, 2, . . . , m. Proof. Start by putting e1 = v1 /||v1 ||, so e1 has norm 1 and spans the same space as does v1 . We construct the remaining vectors e2 , . . . , em inductively. Suppose that e1 , . . . , ek have been determined so that (e1 , . . . , ek ) is orthonormal and span(e1 , . . . , ej ) = span(v1 , . . . , vj ) for each j = 1, 2, . . . , k. We then conP struct ek+1 as follows. Put e0k+1 = vk+1 − ki=1 hvk+1 , ei iei . Check that e0k+1 is orthogonal to each of e1 , . . . , ek . Then put ek+1 = e0k+1 /||e0k+1 ||. At this point we need to assume that V is finite dimensional. Corollary 8.2.3. Let V be a finite dimensional inner product space. (i) V has an orthonormal basis. (ii) If L is any orthonormal set in V it can be completed to an orthonormal basis of V . Proof. We know that any independent set can be completed to a basis to which we can then apply the Gran-Schmidt algorithm. Lemma 8.2.4. If T ∈ L(V ) has an upper triangular matrix with respect to some basis of V , then it has an upper triangular matrix with respect to some orthonormal basis of V . Proof. Suppose that B = (v1 , . . . , vn ) is a basis such that [T ]B is upper triangular. Basically this just means that for each j = 1, 2, . . . , n the subspace span(v1 , . . . , vj ) is T -invariant. Use the Gram-Schmidt algorithm to construct an orthonormal basis S = (e1 , . . . , en ) for V such that span(v1 , . . . , vj ) = span(e1 , . . . , ej ) for each j = 1, 2, . . . , n. Hence for each j, span(e1 , . . . , ej ) is T -invariant, so that [T ]S is upper triangular. Using the fact that C is algebraically closed we showed that if V is a finite dimensional complex vector space, then there is a basis B for V such that [T ]B is upper triangular. Hence we have the following result which is sometimes called Schur’s Theorem. Corollary 8.2.5. (Schur’s Theorem) If T ∈ L(V ) where V is a finite dimensional complex inner product space, then there is an orthonormal basis for V with respect to which T has an upper triangular matrix.
148
CHAPTER 8. INNER PRODUCT SPACES
We now apply the preceding results to the case where V = C n . First we introduce a little more language. If P is an invertible matrix for which P −1 = P ∗ , where P ∗ is the conjugate transpose of P , then P is said to be unitary. If P is real and unitary (so P −1 = P T ), we say P is an orthogonal matrix. Let A be an n×n matrix over C. View C n as an inner product space with the usual inner product. Let S be the standard ordered (orthonormal) basis. Define TA ∈ L(C n ) by TA (~x) = A~x. We know that [TA ]S = A. Let B = (v1 , . . . , vn ) be an orthonormal basis with respect to which TA has an upper triangular matrix. Let P be the matrix whose jth column is vj = [vj ]S . Then [TA ]B = P −1 AP by Theorem 4.7.2. Since B is orthonormal it is easy to check that P is a unitary matrix. We have proved the following. Corollary 8.2.6. If A is an n × n complex matrix, there is a unitary matrix P such that P −1 AP is upper triangular.
8.3
Orthogonal Projection and Minimization
Let V be a vector space over the field F , F a subfield of C, and let h , i be an inner product on V . Let W be a subspace of V , and put W ⊥ = {v ∈ V : hw, vi = 0 for all w ∈ W }. Obs. 8.3.1. W ⊥ is a subspace of V and W ∩ W ⊥ = {0}. Hence W + W ⊥ = W ⊕ W ⊥. Proof. Easy exercise. We do not know if each v ∈ V has a representation in the form v = w + w0 with w ∈ W and w0 ∈ W ⊥ , but at least we know from the preceding observation that it has at most one. There is one case where we know that V = W ⊕ W ⊥. Theorem 8.3.2. If W is finite dimensional, then V = W ⊕ W ⊥ . Proof. Suppose W is finite dimensional. Then using the Gram-Schmidt process, for example, we can find an orthonormal basis D = (v1 , . . . , vm ) of W . For arbitrary v ∈ V , put ai = hv, vi i. Then we know that w=
m X i=1
ai vi
8.3. ORTHOGONAL PROJECTION AND MINIMIZATION
149
is in W , and we write 0
v =w+w =
m X
ai vi + (v −
i=1
m X
ai vi ).
i=1
Pm
⊥ We show that v − i=1 ai vP i ∈ W . m So let u ∈ W , say u = j=1 bj vj . Since D is orthonormal,
hv −
m X
ai vi , ui = hv −
1
=
i=1
m X
bj hv, vj i −
m X
b j aj −
j=1
Pm
m X
ai vi ,
m X
bj vj i
j=1
bj ai hvi , vj i
i,j=1
j=1
=
m X
m X
bj aj hvj , vj i = 0.
j=1 0
So with w = i=1 ai vi and w = v − w, v = w + w0 is the unique way to write v as the sum of an element of W plus an element of W ⊥ . Now suppose V = W ⊕ W ⊥ , (which we have just seen is the case if W is finite dimensional). A linear transformation P : V → V is said to be an orthogonal projection of V onto W provided the following hold: (i) P (w) = w for all w ∈ W , (ii) P (w0 ) = 0 for all w0 ∈ W ⊥ . So let P be an orthogonal projection onto W by this definition. Let v = w + w0 be any element of V with w ∈ W , w0 ∈ W ⊥ . Then P (v) = P (w + w0 ) = P (w) + P (w0 ) = w + 0 = w ∈ W . So P (v) is a uniquely defined element of W for all v. Conversely, still under the hypothesis that V = W ⊕W ⊥ , define P 0 : V → V as follows. For v ∈ V , write v = w + w0 , w ∈ W , w0 ∈ W ⊥ (uniquely!), and put P 0 (v) = w. It is an easy exercise to show that P 0 really is linear, P 0 (w) = w for all w ∈ W , and P 0 (w0 ) = P 0 (0 + w0 ) = 0 for w0 ∈ W ⊥ . So P 0 : v = w + w0 7→ w is really the unique orthogonal projection of V onto W . Moreover, Obs. 8.3.3. P 2 = P ; P (v) = v if and only if v ∈ W ; P (v) = 0 if and only if v ∈ W ⊥ .
150
CHAPTER 8. INNER PRODUCT SPACES
Obs. 8.3.4. I − P is the unique orthogonal projection of V onto W ⊥ . Both Obs. 8.3.3 and 8.3.4 are fairly easy to prove, and their proofs are worthwhile exercises. Obs. 8.3.5. W ⊆ W ⊥⊥ . Proof. W ⊥ consists of all vectors in V orthogonal to every vector of W . So in particular, every vector of W is orthogonal to every vector of W ⊥ , i.e., each vector of W is in (W ⊥ )⊥ . But in general we do not know if there could be some vector outside W that is in W ⊥⊥ . Theorem 8.3.6. If V = W ⊕ W ⊥ , then W = (W ⊥ )⊥ . Proof. By Obs. 8.3.5, we must show that W ⊥⊥ ⊆ W . Since V = W ⊕ W ⊥ , we know there is a unique orthogonal projection P of V onto W , and for each v ∈ V , v − P (v) ∈ W ⊥ . Keep in mind that W ⊆ (W ⊥ )⊥ and P (v) ∈ W ⊆ (W ⊥ )⊥ , so hv − P (v), P (v)i = 0. It follows that for v ∈ W ⊥⊥ we have ||v−P (v)||2 = hv−P (v), v−P (v)i = hv, v−P (v)i−hP (v), v−P (v)i = 0−0 = 0 by the comments just above. Hence v = P (v), which implies that v ∈ W , i.e., W ⊥⊥ ⊆ W . Hence W ⊥⊥ = W . Note: If V = W ⊕ W ⊥ , then (W ⊥ )⊥ = W , so also V = W ⊥ ⊕ (W ⊥ )⊥ , and (W ⊥ )⊥⊥ = W ⊥ . Given a finite dimensional subspace U of the inner product space V and a point v ∈ V , we want to find a point u ∈ U closest to v in the sense that ||v − u|| is as small as possible. To do this we first construct an orthonormal basis B = (e1 , . . . , em ) of U . The unique orthogonal projection of V onto U is given by m X PU (v) = hv, ei iei . i=0
We show that PU (v) is the unique u ∈ U closest to v. Theorem 8.3.7. Suppose U is a finite dimensional subspace of the inner product space V and v ∈ V . Then ||v − PU (v)|| ≤ ||v − u|| ∀u ∈ U. Furthermore, if u ∈ U and equality holds, then u = PU (v).
Proof. Suppose u ∈ U . Then v − PU (v) ∈ U ⊥ and PU (v) − u ∈ U , so we may use the Pythagorean Theorem in the following: ||v − PU (v)||2 ≤ ||v − PU (v)||2 + ||PU (v) − u||2 = ||(v − PU (v)) + (PU (v) − u)||2 = ||v − u||2 ,
(8.6)
where taking square roots gives the desired inequality. Also, the inequality of the theorem is an equality if and only if the inequality in Eq. 8.6 is an equality, which is if and only if u = PU (v). Example 8.3.8. There are many applications of the above theorem. Here is one example. Let V = R[x], and let U be the subspace consisting of all polynomials f (x) with degree less than R4 and satisfying f (0) = f 0 (0) = 0. 1 Find a polynomial p(x) ∈ U such that 0 |2 + 3x − p(x)|2 dx is as small as possible. R1 Solution: Define an inner product on R[x] by hf, gi = 0 f (x)g(x)dx. Put g(x) = 2 + 3x, and note that U = {p(x) = a2 x2 + a3 x3 : a2 , a3 ∈ R}. We need an orthonormal basis of U . So start with the basis B = (x2 , x3 ) 2 and apply the Gram-Schmidt algorithm. We want to put e1 = ||xx2 || . First R1 compute ||x2 ||2 = 0 x4 dx = 15 , so e1 =
√
5 x2 .
(8.7)
Next we want to put e2 =
x3 − hx3 , e1 ie1 . ||x3 − hx3 , e1 ie1 ||
√ √ R1 Here hx3 , e1 i = 5 0 x5 dx = 65 . Then x3 − hx3 , e1 ie1 = x3 − 65 x2 . R1 1 Now ||x3 − 65 x2 ||2 = 0 (x3 − 56 x2 )2 dx = 7·36 . Hence
√ √ 5 e2 = 6 7(x3 − x2 ) = 7(6x3 − 5x2 ). 6 Then the point p ∈ U closest to g = 2 + 3x is
(8.8)
p = hg, e1 ie1 + hg, e2 ie2 Z 1 √ √ 2 2 = (2 + 3x)x dx 5 5x + 0 Z 1 √ √ 3 2 (2 + 3x)(6x − 5x )dx + 7 7(6x3 − 5x2 ),
(8.9)
0
(8.10) so that after a bit more computation we have p = 24x2 −
8.4
203 3 x. 10
(8.11)
Linear Functionals and Adjoints
Recall that if V is a vector space over any field F , then a linear map from V to F (viewed as a vector space over F ) is called a linear functional. The set V ∗ of all linear functionals on V is called the dual space of V . When V is a finite dimensional inner product space the linear functionals on V have a particularly nice form. Fix v ∈ V . Then define φv : V → F : u 7→ hu, vi. It is easy to see that φv is a linear functional. If V is finite dimensional then every linear functional on V arises this way. Theorem 8.4.1. Let V be a finite-dimensional inner product space. Let φ ∈ V ∗ . Then there is a unique vector v ∈ V such that φ(u) = hu, vi for every u ∈ V . Proof. Let (e1 , . . . , en ) be an orthonormal basis of V . Then for any u ∈ V , we have n X u= hu, ei iei . i=1
Hence n n X X φ(u) = φ( hu, ei iei ) = hu, ei iφ(ei )
=
i=1 n X
i=1
hu, φ(ei )ei i.
i=1
(8.12) P So if we put v = ni=1 φ(ei )ei , we have φ(u) = hu, vi for every u ∈ V . This shows the existence of the desired v. For the uniqueness, suppose that φ(u) = hu, v1 i = hu, v2 i ∀u ∈ V. Then 0 = hu, v1 − v2 i for all u ∈ V , forcing v1 = v2 . Now let V and W both be inner product spaces over F . Let T ∈ L(V, W ) and fix w ∈ W . Define φ : V → F by φ(v) = hT (v), wi. First, it is easy to check that φ ∈ V ∗ . Second, by the previous theorem there is a unique vector (that we now denote by T ∗ (w)) for which φ(v) = hT (v), wi = hv, T ∗ (w)i ∀v ∈ V. It is clear that T ∗ is some kind of map from W to V . In fact, it is routine to check that T ∗ is linear, i.e., T ∗ ∈ L(W, V ). T ∗ is called the adjoint of T . Theorem 8.4.2. Let V and W be inner product spaces and suppose that S, T ∈ L(V, W ) are both such that S ∗ and T ∗ exist. Then (i) (S + T )∗ = S ∗ + T ∗ . (ii) (aT )∗ = aT ∗ . (iii) (T ∗ )∗ = T . (iv) I ∗ = I. (v) (ST )∗ = T ∗ S ∗ . Proof. The routine proofs are left to the reader. Theorem 8.4.3. Let V and W be finite dimensional inner product spaces over F . Suppose T ∈ L(V, W ). Then (i) null(T ∗ ) = (Im(T ))⊥ ; (ii) Im(T ∗ ) = (null(T ))⊥ ; (iii) null(T ) = (Im(T ∗ ))⊥ ; (iv) Im(T ) = (null(T ∗ ))⊥ .
Proof. For (i), w ∈ null(T∗) iff T∗(w) = 0 iff ⟨v, T∗(w)⟩ = 0 for all v ∈ V iff ⟨T(v), w⟩ = 0 for all v ∈ V iff w ∈ (Im(T))⊥. This proves (i). By taking orthogonal complements of both sides of an equality, or by replacing an operator with its adjoint, the other three equalities are easily established.

Theorem 8.4.4. Let V be a finite dimensional inner product space over F and let B = (e₁, . . . , eₙ) be an orthonormal basis for V. Let T ∈ L(V) and let A = [T]_B. Then A_{ij} = ⟨T(e_j), e_i⟩.

Proof. The matrix A is defined by
\[ T(e_j) = \sum_{i=1}^n A_{ij}\, e_i. \]
Since B is orthonormal, we also have
\[ T(e_j) = \sum_{i=1}^n \langle T(e_j), e_i\rangle\, e_i. \]
Hence A_{ij} = ⟨T(e_j), e_i⟩.

Corollary 8.4.5. Let V be a finite dimensional inner product space over F and let B = (e₁, . . . , eₙ) be an orthonormal basis for V. Let T ∈ L(V) and let A = [T]_B. Then [T∗]_B = A∗, where A∗ is the conjugate transpose of A.

Proof. According to Theorem 8.4.4, A_{ij} = ⟨T(e_j), e_i⟩, and if B = [T∗]_B, then
\[ B_{ij} = \langle T^*(e_j), e_i\rangle = \overline{\langle e_i, T^*(e_j)\rangle} = \overline{\langle T(e_i), e_j\rangle} = \overline{A_{ji}}. \]
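To make the corollary concrete: with the standard inner product on Cⁿ the standard basis is orthonormal, so the matrix of T∗ is simply the conjugate transpose. The snippet below is an illustrative sketch (not part of the original notes; it assumes NumPy) checking the defining identity ⟨T(u), v⟩ = ⟨u, T∗(v)⟩ for a random complex matrix.
\begin{verbatim}
import numpy as np

# With <x, y> = y.conj() @ x on C^n, the standard basis is orthonormal and the
# adjoint of "multiply by A" is "multiply by A.conj().T".
rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))   # matrix of T
A_star = A.conj().T                                          # matrix of T*

inner = lambda x, y: y.conj() @ x
u = rng.normal(size=n) + 1j * rng.normal(size=n)
v = rng.normal(size=n) + 1j * rng.normal(size=n)
print(np.isclose(inner(A @ u, v), inner(u, A_star @ v)))     # True
\end{verbatim}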
8.5 Exercises
1. Parallelogram Equality: If u, v are vectors in the inner product space V , then ||u + v||2 + ||u − v||2 = 2(||u||2 + ||v||2 ). Explain why this equality should be so-named. 2. Let T : Rn → Rn be defined by T (z1 , z2 , . . . , zn ) = (z2 − z1 , z3 − z2 , . . . , z1 − zn ).
(a) Give an explicit expression for the adjoint, T ∗ . (b) Is T invertible? Explain. (c) Find the eigenvalues of T . (d) Compute the characteristic polynomial of T . (e) Compute the minimal polynomial of T . 3. Let T ∈ L(V ), where V is a complex inner product space and T has an adjoint T ∗ . Show that if T is normal and T (v) = λv for some v ∈ V , then T ∗ (v) = λv. 4. Let V and W be inner product spaces over F , and let T ∈ L(V, W ). If there is an adjoint map T ∗ : W → V such that hT (v), wi = hv, T ∗ (w)i for all v ∈ V and all w ∈ W , show that T ∗ ∈ L(W, V ). 5. Let T : V → W be a linear transformation whose adjoint T ∗ : W → V with respect to h , iV , h , iW does exist. So for all v ∈ V , w ∈ W , hT (v), wiW = hv, T ∗ (w)iV . Put N = null(T ) and R∗ = Im(T ∗ ). (a) Show that (R∗ )⊥ = N . (b) Suppose that R∗ is finite dimensional and show that N ⊥ = R∗ . 6. Prove Theorem 8.4.2. 7. Prove Theorem 8.4.3. 8. On an in-class linear algebra exam, a student was asked to give (for ten points) the definition of “real symmetric matrix.” He couldn’t remember the definition and offered the following alternative: A real, n × n matrix A is symmetric if and only if A2 = AAT . When he received no credit for this answer, he went to see the instructor to find out what was wrong with his definition. By that time he had looked up the definition and tried to see if he could prove that his definition was equivalent to the standard one: A is symmetric if and only if A = AT . He had a proof that worked for 2 × 2 matrices and thought he could prove it for 3×3 matrices. The instructor said this was not good enough. However, the instructor would give the student 5 points for a counterexample, or the full ten points if he could prove that the two definitions were equivalent for all n. Your problem is to determine whether or not the
student should have been able to earn ten points or merely five points. And you might consider the complex case also: show that A² = AA∗ if and only if A = A∗, or find a counterexample.
9. Let A be a real symmetric matrix. Either prove the following statement or disprove it with a counterexample: Each diagonal element of A is bounded above (respectively, below) by the largest (respectively, smallest) eigenvalue of A.
10. Suppose V and W are finite dimensional inner product spaces with orthonormal bases B₁ and B₂, respectively. Let T ∈ L(V, W), so we know that T∗ ∈ L(W, V) exists and is unique. Prove that [T∗]_{B₁,B₂} is the conjugate transpose of [T]_{B₂,B₁}.
11. Suppose that V is an inner product space over F. Let u, v ∈ V. Prove that ⟨u, v⟩ = 0 iff ‖u‖ ≤ ‖u + av‖ for all a ∈ F. (Hint: If u ⊥ v, use the Pythagorean theorem on u + av. For the converse, square the assumed inequality and then show that $-2\,\mathrm{Re}(\bar a\langle u, v\rangle) \le |a|^2\,\|v\|^2$ for all a ∈ F. Then put $a = -\langle u, v\rangle/\|v\|^2$ in the case v ≠ 0.)
12. For arbitrary real numbers a₁, . . . , aₙ and b₁, . . . , bₙ, show that
\[ \Big(\sum_{j=1}^n a_j b_j\Big)^2 \;\le\; \Big(\sum_{j=1}^n j\,a_j^2\Big)\Big(\sum_{j=1}^n b_j^2/j\Big). \]
13. Let V be a real inner product space. Show that
\[ \langle u, v\rangle = \tfrac14\|u + v\|^2 - \tfrac14\|u - v\|^2. \tag{8.13} \]
(This equality is the real polarization identity.)
14. Let V be a complex inner product space. Show that
\[ \langle u, v\rangle = \mathrm{Re}(\langle u, v\rangle) + i\,\mathrm{Re}(\langle u, iv\rangle). \tag{8.14} \]
15. Let V be a complex inner product space. Show that
\[ \langle u, v\rangle = \frac14\sum_{n=1}^4 i^n\,\|u + i^n v\|^2. \tag{8.15} \]
(This equality is the complex polarization identity.)
For the next two exercises let V be a finite dimensional inner product space, and let P ∈ L(V) satisfy P² = P.
16. Show that P is an orthogonal projection if and only if null(P) ⊆ (Im(P))⊥.
17. Show that P is an orthogonal projection if and only if ‖P(v)‖ ≤ ‖v‖ for all v ∈ V. (Hint: You will probably need to use Exercises 11 and 16.)
18. Fix a vector v ∈ V and define T ∈ V∗ by T(u) = ⟨u, v⟩. For a ∈ F determine the formula for T∗(a). Here we use the standard inner product on F given by $\langle a, b\rangle = a\bar b$ for a, b ∈ F.
19. Let T ∈ L(V, W). Prove that (a) T is injective if and only if T∗ is surjective; (b) T is surjective if and only if T∗ is injective.
20. If T ∈ L(V) and U is a T-invariant subspace of V, then U⊥ is T∗-invariant.
21. Let V be a vector space with norm ‖ · ‖. Prove that | ‖u‖ − ‖v‖ | ≤ ‖u − v‖.
22. Use exercise 21 to show that if $\{v_i\}_{i=1}^\infty$ converges to v in V, then $\{\|v_i\|\}_{i=1}^\infty$ converges to ‖v‖. Give an example in R² to show that the norms may converge without the vectors converging.
23. Let B be the basis for R² given by B = (e₁, e₁ + e₂), where e₁ = (1, 0)ᵀ and e₂ = (0, 1)ᵀ. Let T ∈ L(R²) be the operator whose matrix with respect to the basis B is $[T]_B = \begin{pmatrix} 1 & 1 \\ -1 & 2 \end{pmatrix}$. Compute the matrix [T∗]_B, where the standard inner product on R² is used.
24. Let A, B denote matrices in Mₙ(C).
(a) Define the term unitary matrix; define what it means for A and B to be unitarily equivalent; and show that unitary equivalence on Mₙ(C) is an equivalence relation.
(b) Show that if A = (a_{ij}) and B = (b_{ij}) are unitarily equivalent, then $\sum_{i,j=1}^n |a_{ij}|^2 = \sum_{i,j=1}^n |b_{ij}|^2$. (Hint: Consider tr(A∗A).)
(c) Show that $A = \begin{pmatrix} 3 & 1 \\ -2 & 0 \end{pmatrix}$ and $B = \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}$ are similar but not unitarily equivalent.
(d) Give two 2 × 2 matrices that satisfy the equality of part (b) above but are not unitarily equivalent. Explain why they are not unitarily equivalent.
25. Let U and W be subspaces of the finite-dimensional inner product space V.
(a) Prove that U⊥ ∩ W⊥ = (U + W)⊥.
(b) Prove that dim(W) − dim(U ∩ W) = dim(U⊥) − dim(U⊥ ∩ W⊥).
26. Let u, v ∈ V. Prove that ⟨u, v⟩ = 0 iff ‖u‖ ≤ ‖u + av‖ for all a ∈ F.
Chapter 9
Operators on Inner Product Spaces

Throughout this chapter V will denote an inner product space over F. To make life simpler we also assume that V is finite dimensional, so operators on V always have adjoints, etc.
9.1 Self-Adjoint Operators
An operator T ∈ L(V) is called Hermitian or self-adjoint provided T = T∗. The same language is used for matrices. An n × n complex matrix A is called Hermitian provided A = A∗ (where A∗ is the transpose of the complex conjugate of A). If A is real and Hermitian it is merely called symmetric.

Theorem 9.1.1. Let F = C, i.e., V is a complex inner product space, and suppose T ∈ L(V). Then
(a) If ⟨T(v), v⟩ = 0 for all v ∈ V, then T = 0.
(b) If T is self-adjoint, then each eigenvalue of T is real.
(c) T is self-adjoint if and only if ⟨T(v), v⟩ ∈ R for all v ∈ V.

Proof. It is routine to show that for all u, w ∈ V,
\[ \langle T(u), w\rangle = \frac{\langle T(u+w), u+w\rangle - \langle T(u-w), u-w\rangle}{4} + \frac{\langle T(u+iw), u+iw\rangle - \langle T(u-iw), u-iw\rangle}{4}\,i. \]
Note that each term on the right hand side is of the form ⟨T(v), v⟩ for an appropriate v ∈ V, so by hypothesis ⟨T(u), w⟩ = 0 for all u, w ∈ V. Put w = T(u) to conclude that T = 0. This proves part (a).

For part (b), suppose that T = T∗, and let v be a nonzero vector in V such that T(v) = λv for some complex number λ. Then
\[ \lambda\langle v, v\rangle = \langle T(v), v\rangle = \langle v, T(v)\rangle = \bar\lambda\langle v, v\rangle. \]
Hence $\lambda = \bar\lambda$, implying λ ∈ R, proving part (b).

For part (c) note that for all v ∈ V we have
\[ \langle T(v), v\rangle - \overline{\langle T(v), v\rangle} = \langle T(v), v\rangle - \langle v, T(v)\rangle = \langle T(v), v\rangle - \langle T^*(v), v\rangle = \langle (T - T^*)(v), v\rangle. \]
If ⟨T(v), v⟩ ∈ R for every v ∈ V, then the left side of the equation equals 0, so ⟨(T − T∗)(v), v⟩ = 0 for each v ∈ V. Hence by part (a), T − T∗ = 0, i.e., T is self-adjoint. Conversely, suppose T is self-adjoint. Then the right hand side of the equation above equals 0, so the left hand side must also be 0, implying ⟨T(v), v⟩ ∈ R for all v ∈ V, as claimed.

Now suppose that V is a real inner product space. Consider the operator T ∈ L(R²) defined by T(x, y) = (−y, x) with the standard inner product on R². Then ⟨T(v), v⟩ = 0 for all v ∈ R² but T ≠ 0. However, this cannot happen if T is self-adjoint.

Theorem 9.1.2. Let T be a self-adjoint operator on the real inner product space V and suppose that ⟨T(v), v⟩ = 0 for all v ∈ V. Then T = 0.

Proof. Suppose the hypotheses of the theorem hold. It is routine to verify
\[ \langle T(u), w\rangle = \frac{\langle T(u+w), u+w\rangle - \langle T(u-w), u-w\rangle}{4} \]
using hT (w), ui = hw, T (u)i = hT (u), wi because T is self-adjoint and V is a real vector space. Hence also hT (u), wi = 0 for all u, w ∈ V . With w = T (u) we see T = 0.
Lemma 9.1.3. Let F be any subfield of C and let T ∈ L(V) be self-adjoint. If α, β ∈ R are such that x² + αx + β is irreducible over R, i.e., α² < 4β, then T² + αT + βI is invertible.

Proof. Suppose α² < 4β and v ∈ V with v ≠ 0. Then
\[ \begin{aligned} \langle (T^2+\alpha T+\beta I)(v), v\rangle &= \langle T^2(v), v\rangle + \alpha\langle T(v), v\rangle + \beta\langle v, v\rangle \\ &= \langle T(v), T(v)\rangle + \alpha\langle T(v), v\rangle + \beta\|v\|^2 \\ &\ge \|T(v)\|^2 - |\alpha|\,\|T(v)\|\,\|v\| + \beta\|v\|^2 \\ &= \Big(\|T(v)\| - \frac{|\alpha|\,\|v\|}{2}\Big)^2 + \Big(\beta - \frac{\alpha^2}{4}\Big)\|v\|^2 \\ &> 0, \end{aligned} \tag{9.1} \]
where the first inequality holds by the Cauchy-Schwarz inequality. The last inequality implies that (T² + αT + βI)(v) ≠ 0. Thus T² + αT + βI is injective, hence it is invertible.

We have seen that some operators on a real vector space fail to have eigenvalues, but now we show that this cannot happen with self-adjoint operators.

Lemma 9.1.4. Let T be a self-adjoint linear operator on the real vector space V. Then T has an eigenvalue.

Proof. Suppose n = dim(V) and choose v ∈ V with v ≠ 0. Then (v, T(v), . . . , Tⁿ(v)) is a list of n + 1 vectors, so it must be linearly dependent. Hence there exist real numbers a₀, . . . , aₙ, not all 0, such that 0 = a₀v + a₁T(v) + · · · + aₙTⁿ(v). Construct the polynomial f(x) = a₀ + a₁x + a₂x² + · · · + aₙxⁿ, which can be written in factored form as
\[ f(x) = c(x^2+\alpha_1 x+\beta_1)\cdots(x^2+\alpha_k x+\beta_k)(x-\lambda_1)\cdots(x-\lambda_m), \]
where c is a nonzero real number, each αⱼ, βⱼ and λⱼ is real, each αⱼ² < 4βⱼ, m + k ≥ 1, and the equation holds for all real x. Then we have
\[ 0 = a_0 v + a_1 T(v) + \cdots + a_n T^n(v) = (a_0 I + a_1 T + \cdots + a_n T^n)(v) = c(T^2+\alpha_1 T+\beta_1 I)\cdots(T^2+\alpha_k T+\beta_k I)(T-\lambda_1 I)\cdots(T-\lambda_m I)(v). \]
Each T 2 +αj T +βj I is invertible by Lemma 9.1.3 because T is self-adjoint and each αj2 < 4βj . Also c 6= 0. Hence the equation above implies that 0 = (T − λ1 I) · · · (T − λm I)(v). It then follows that T − λj I is not injective for at least one j. This says that T has an eigenvalue. The next theorem is very important for operators on real inner product spaces. Theorem 9.1.5. The Real Spectral Theorem: Let T be an operator on the real inner product space V . Then V has an orthonormal basis consisting of eigenvectors of T if and only if T is self-adjoint. Proof. First suppose that V has an orthonormal basis B consisting of eigenvectors of T . Then [T ]B is a real diagonal matrix, so it equals its conjugate transpose, i.e., T is self-adjoint. For the converse, suppose that T is self-adjoint. Our proof is by induction on n = dim(V ). The desired result clearly holds if n = 1. So assume that dim(V ) = n > 1 and that the desired result holds on vector spaces of smaller dimension. By Lemma 9.1.4 we know that T has an eigenvalue λ with a nonzero eigenvector u, and without loss of generality we may assume that ||u|| = 1. Let U = span(u). Suppose v ∈ U ⊥ , i.e. hu, vi = 0. Then hu, T (v)i = hT (u), vi = λhu, vi = 0, so T (v) ∈ U ⊥ whenever u ∈ U ⊥ , showing that U ⊥ is T -invariant. Hence the map S = T |U ⊥ ∈ L(U ⊥ ). If v, w ∈ U ⊥ , then hS(v), wi = hT (v), wi = hv, T (w)i = hv, S(w)i, which shows that S is self-adjoint. Thus by the induction hypothesis there is an orthonormal basis of U ⊥ consisting of eigenvectors of S. Clearly every eigenvector of S is an eigenvector of T . Thus adjoining u to an orthonormal basis of U ⊥ consisting of eigenvectors of S gives an orthonormal basis of V consisting of eigenvectors of T , as desired. Corollary 9.1.6. Let A be a real n × n matrix. Then there is an orthogonal matrix P such that P −1 AP is a (necessarily real) diagonal matrix if and only if A is symmetric.
Corollary 9.1.7. Let A be a real symmetric n × n matrix with distinct (necessarily real) eigenvalues λ₁, . . . , λₘ. Then Rⁿ = null(A − λ₁I) ⊕ · · · ⊕ null(A − λₘI).
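As a quick numerical illustration of Corollaries 9.1.6 and 9.1.7 (a sketch added for this write-up, not part of the original notes, assuming NumPy is available): numpy.linalg.eigh orthogonally diagonalizes a real symmetric matrix.
\begin{verbatim}
import numpy as np

# Orthogonal diagonalization of a real symmetric matrix (Corollary 9.1.6).
rng = np.random.default_rng(1)
B = rng.normal(size=(4, 4))
A = B + B.T                                      # real symmetric
evals, P = np.linalg.eigh(A)                     # columns of P: orthonormal eigenvectors
print(np.allclose(P.T @ P, np.eye(4)))           # P is orthogonal
print(np.allclose(P.T @ A @ P, np.diag(evals)))  # P^{-1} A P is (real) diagonal
\end{verbatim}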
9.2 Normal Operators
Definition An operator T ∈ L(V ) is called normal provided T T ∗ = T ∗ T . Clearly any self-adjoint operator is normal, but there are many normal operators in general that are not self-adjoint. For example, if A is an n × n nonzero real matrix with AT = −A (i.e., A is skew-symmetric) , then A 6= A∗ but A is a normal matrix because AA∗ = A∗ A. It follows that if T ∈ L(Rn ) is the operator with [T ]S = A where S is the standard basis of Rn , then T is normal but not self-adjoint. Recall Theorem 7.1.2 (A list of nonzero eigenvectors belonging to distinct eigenvalues must be linearly independent.) Theorem 9.2.1. Let T ∈ L(V ). Then ||T (v)|| = ||T ∗ (v)|| ∀v ∈ V iff T is normal. Proof. T is normal
\[ \begin{aligned} &\iff T^*T - TT^* = 0 \\ &\iff \langle (T^*T - TT^*)(v), v\rangle = 0 \ \ \forall v \in V \\ &\iff \langle T^*T(v), v\rangle = \langle TT^*(v), v\rangle \ \ \forall v \in V \\ &\iff \|T(v)\|^2 = \|T^*(v)\|^2 \ \ \forall v \in V. \end{aligned} \tag{9.2} \]
Since T ∗ T − T T ∗ is self-adjoint, the theorem follows from Theorems 9.1.1 part (a) and 9.1.2. Corollary 9.2.2. Let T ∈ L(V ) be normal. Then (a) If v ∈ V is an eigenvector of T with eigenvalue λ ∈ F , then v is also an eigenvector of T ∗ with eigenvalue λ. (b) Eigenvectors of T corresponding to distinct eigenvalues are orthogonal. Proof. Note that (T − λI)∗ = T ∗ − λI, and that T is normal if and only if T − λI is normal. Suppose T (v) = λv. Since T is normal, we have 0 = ||(T − λI)(v)|| = hv, (T ∗ − λI)(T − λI)(v)i = ||(T ∗ − λI)(v)||.
Part (a) follows. For part (b), suppose λ and µ are distinct eigenvalues with associated eigenvectors u and v, respectively. So T (u) = λu and T (v) = µv, and from part (a), T ∗ (v) = µv. Hence (λ − µ)hu, vi = hλu, vi − hu, µvi = hT (u), vi − hu, T ∗ (v)i = 0. Because λ 6= µ, the above equation implies that hu, vi = 0. The next theorem is one of the truly important results from the theory of complex inner product spaces. Be sure to compare it with the Real Spectral Theorem. Theorem 9.2.3. Complex Spectral Theorem Let V be a finite dimensional complex inner product space and T ∈ L(V ). Then T is normal if and only if V has an orthonormal basis consisting of eigenvectors of T . Proof. First suppose that V has an orthonormal basis B consisting of eigenvectors of T , so that [T ]B = A is a diagonal matrix. Then A∗ is also diagonal and is the matrix A∗ = [T ∗ ]B . Since any two diagonal matrices commute, AA∗ = A∗ A. This implies that T is normal. For the converse, suppose that T is normal. Since V is a complex vector space, we know (by Schur’s Theorem) that there is an orthonormal basis B = (e1 , . . . , en ) for which A = [T ]B is upper triangular. If A = (aij ), then aij = 0 whenever i > j. Also T (e1 ) = a11 e1 , so ||T (e1 )||2 = |a11 |2 ||T ∗ (e1 )||2 = |a11 |2 + |a12 |2 + · · · + |a1n |2 . Because T is normal, ||T (e1 )|| = ||T ∗ (e1 )||. So the two equations above imply that all entries in the first row of A, except possibly the diagonal entry a11 , equal 0. It now follows that T (e2 ) = a12 e1 + a22 e2 = a22 e2 , so ||T (e2 )||2 = |a22 |2 , ||T ∗ (e2 )||2 = |a22 |2 + |a23 |2 + · · · + |a2n |2 . Because T is normal, ||T (e2 )|| = ||T ∗ (e2 )||. Thus the two equations just above imply that all the entries in the second row of A, except possibly the diagonal entry a22 , must equal 0. Continuing in this fashion we see that all the nondiagonal entries of A equal 0, i.e., A is diagonal.
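A small numerical companion to the Complex Spectral Theorem (an illustrative sketch, not part of the original notes; it assumes NumPy): the matrix of the operator T(x, y) = (−y, x) used earlier is normal but not self-adjoint, and over C it is unitarily diagonalizable with eigenvalues ±i.
\begin{verbatim}
import numpy as np

# T(x, y) = (-y, x): normal (A A* = A* A) but not self-adjoint.
A = np.array([[0., -1.],
              [1.,  0.]])
print(np.allclose(A @ A.conj().T, A.conj().T @ A))    # True: A is normal

evals, V = np.linalg.eig(A.astype(complex))
print(evals)                                          # i and -i (in some order)
print(np.allclose(V.conj().T @ V, np.eye(2)))         # eigenvector basis is orthonormal
print(np.allclose(V.conj().T @ A @ V, np.diag(evals)))  # unitarily diagonalized
\end{verbatim}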
Corollary 9.2.4. Let A be a normal, n × n complex matrix. Then there is a unitary matrix P such that P −1 AP is diagonal. It follows that the minimal polynomial of A has no repeated roots. Theorem 9.2.5. Let A be n × n. Then A is normal if and only if the eigenspaces of AA∗ are A-invariant. Proof. First suppose that A is normal, so AA∗ = A∗ A, and suppose that AA∗~x = λ~x. We show that A~x also belongs to λ for AA∗ : AA∗ (A~x) = A(AA∗~x) = A · λ~x = λ(A~x). For the converse, suppose the eigenspaces of AA∗ are A-invariant. We want to show that A is normal. We start with the easy case. Lemma 9.2.6. Suppose BB ∗ = diag(λ1 , . . . , λk , 0 . . . , 0) is a diagonal matrix, and suppose that the eigenspaces of BB ∗ are B-invariant. Then B is normal. Proof of Lemma: ~u = (0, . . . , 0, uk+1 , . . . , un )T is a typical element of the null space of BB ∗ , i.e., the eigenspace belonging to the eigenvalue 0. First note that the bottom n−k rows of B must be zero, since the (i, i) entry of BB ∗ is the inner product of the ith row of B with itself, which must be 0 if i ≥ k+1. So by hypothesis B(0, . . . , 0, uk+1 , . . . , un )T = (0, . . . , 0, vk+1 , . . . , vn )T . Since the top k entries of B(0, . . . , 0, uk+1 , . . . , un )T must be zero, the entries in the top k rows and last n − k columns must be zero, so B1 0 B= , 0 0 where B1 is k × k with rank k. For 1 ≤ i ≤ k, the standard basis vector ~ei is an eigenvector of BB ∗ belonging to λi . And by hypothesis, BB ∗ · B~ei = λi B~ei . So B · BB ∗~ei = B · λi~ei , implying that [B 2 B ∗ − BB ∗ B]~ei = ~0. But this latter equality is also seen to hold for k + 1 ≤ i ≤ n. Hence B 2 B ∗ = BB ∗ B. Now using block multiplication, B12 B1∗ = B1 B1∗ B1 , implying that B1 B1∗ = B1∗ B1 . But this implies that BB ∗ = B ∗ B. So B is normal, proving the lemma. Now return to the general case. Let ~u1 , . . . , ~un be an orthonormal basis of eigenvectors of AA∗ . Use these vectors as the columns of a matrix U , so that U ∗ (AA∗ )U = diag(λ1 , . . . , λk , 0 = λk+1 , . . . , 0 = λn ), where 0 6= λ1 · · · λk
and k = rank(A). Our hypothesis says that if AA∗~uj = λj ~uj , then AA∗ · A~uj = λj A~uj . Put B = U ∗ AU . So BB ∗ = U ∗ AU U ∗ A∗ U = U ∗ (AA∗ )U = diag(λ1 , . . . , λn ). Compute BB ∗ (U ∗~uj ) = U ∗ AA∗~uj = U ∗ · λj ~uj = λj U ∗~uj . So U ∗~uj = ~ej is an eigenvector of BB ∗ belonging to λj . Also (BB ∗ )BU ∗~uj = U ∗ AU U ∗ A∗ U U ∗ AU U ∗~uj = U ∗ AA∗ A~uj = U ∗ · λj A~uj = λj · U ∗ AU U ∗~uj = λj B(U ∗~uj ). So the eigenspaces of BB ∗ are B-invariant. Since BB ∗ is diagonal, by the Lemma we know B is normal. But now it follows easily that A = U BU ∗ must be normal
9.3 Decomposition of Real Normal Operators
Throughout this section V is a real inner product space. Lemma 9.3.1. Suppose dim(V ) = 2 and T ∈ L(V ). Then the following are equivalent: (a) T is normal but not self-adjoint. (b)The matrix of T with respect to every orthonormal basis of V has the a −b form , with b 6= 0. b a (c)The matrix of T with respect to some orthonormal basis of V has the a −b form , with b > 0. b a Proof. First suppose (a) holds and let S = (e1 , e2 ) be an orthonormal basis of V . Suppose a c [T ]S = . b d Then ||T (e1 )||2 = a2 + b2 and ||T ∗ (e1 )||2 = a2 + c2 . Because T is normal, by Theorem 9.2.1 ||T (e1 )|| = ||T ∗ (e1 )||. Hence b2 = c2 . Since T is not a b self-adjoint, b 6= c, so we have c = −b. Then [T ∗ ]S = . So −b d 2 a + b2 ab − bd a2 + b2 −ab + bd ∗ ∗ [T T ]S = , and [T T ]S = . ab − bd b2 + d2 −ab + bd b2 + d2 Since T is normal it follows that b(a − d) = 0. Since T is not self-adjoint, b 6= 0, implying that a = d, completing the proof that (a) implies (b). Now suppose that (b) holds and let B = (e1 , e2 ) be any orthonormal basis of V . Then either B or B 0 = (e1 , −e2 ) will be a basis of the type needed to show that (c) is satisfied.
Finally, suppose that (c) holds, i.e., there is an orthonormal basis B = (e1 , e2 ) such that [T ]B has the form given in (c). Clearly T 6= T ∗ , but a simple computation with the matrices representing T and T ∗ shows that T T ∗ = T ∗ T , i.e., T is normal, implying that (a) holds. Theorem 9.3.2. Suppose that T ∈ L(V ) is normal and U is a T -invariant subspace. Then (a) U ⊥ is T -invariant. (b) U is T ∗ -invariant. (c) (T |U )∗ = (T ∗ )|U . (d) T |U is a normal operator on U . (e) T |U ⊥ is a normal operator on U ⊥ . Proof. Let B 0 = (e1 , . . . , em ) be an orthonormal basis of U and extend it to an orthonormal basis B = (e1 , . . . , em , f1 , . . . , fn ) of V . Since U is T -invariant, A B [T ]B = , where A = [T |U ]B0 . 0 C For each j, 1 ≤ j ≤ m, ||T (ej )||2 equals the sum of the squares of the absolute values of the entries in the jth column of A. Hence m X
||T (ej )||2 =
j=1
the sum of the squares of the absolute values of the entries of A.
(9.3)
For each j, 1 ≤ j ≤ m, ||T ∗ (ej )||2 equals the sum of the squares of the absolute values of the entries in the jth rows of A and B. Hence m X j=1
||T ∗ (ej )||2 =
the sum of the squares of the absolute values of the entries of A and B.
(9.4)
Because T is normal, ||T (ej )|| = ||T ∗ (ej )|| for each j. It follows that the entries of B must all be 0, so A 0 [T ]B = . 0 C This shows that U ⊥ is T -invariant, proving (a). But now we see that ∗ A 0 ∗ [T ]B = , 0 C∗
implying that U is T ∗ -invariant. This completes a proof of (b). Now let S = T |U . Fix v ∈ U . Then hS(u), vi = hT (u), vi = hu, T ∗ (v)i ∀u ∈ U. Because T ∗ (v) ∈ U (by (b)), the equation above shows that S ∗ (v) = T ∗ (v), i.e., (T |U )∗ = (T ∗ )|U , completing the proof of (c). Parts (d) and (e) now follow easily. At this point the reader should review the concept of block multiplication for matrices partitioned into blocks of the appropriate sizes. In particular, if A and B are two block diagonal matrices each with k blocks down the diagonal, with the jth block being nj × nj , then the product AB is also block diagonal. Suppose that Aj , Bj are the jth blocks of A and B, respectively. Then the jth block of AB is Aj Bj : A1 0 · · · 0 B1 0 · · · 0 0 A2 · · · 0 0 B2 · · · 0 .. .. · .. .. . . . . . ··· . . . . ··· . 0 · · · · · · Am 0 · · · · · · Bm A1 B1 0 ··· 0 0 A2 B2 · · · 0 = .. . .. . . . . ··· . 0 · · · · · · Am Bm We have seen the example T (x, y) = (−y, x) of an operator on R2 that is normal but has no eigenvalues, so has no diagonal matrix. However, the following theorem says that normal operators have block-diagonal matrices with blocks of size at most 2 by 2. Theorem 9.3.3. Suppose that V is a real inner product space and T ∈ L(V ). Then T is normal if and only if there is an orthonormal basis of V with respect to which T has a block diagonal matrix where each block is a 1-by-1 matrix or a 2-by-2 matrix of the form a −b , (9.5) b a with b > 0.
Proof. First suppose that V has an orthonormal basis B for which [T ]B is block diagonal of the type described in the theorem. Since a matrix of the form given in Eq. 9.5 commutes with its adjoint, clearly T is also normal. For the converse, suppose that T is normal. Our proof proceeds by induction on the dimension n of V . For n = 1 the result is obvious. For n = 2 if T is self-adjoint it follows from the Real Spectral Theorem; if T is not self-adjoint, use Lemma 9.3.1. Now assume that n = dim(V ) > 2 and that the desired result holds on vector spaces of dimension smaller than n. By Theorem 7.3.1 we may let U be a T -invariant subspace of dimension 1 if there is one. If there is not, then we let U be a 2-dimensional T -invariant subspace. First, if dim(U ) = 1, let e1 be a nonzero vector in U with norm 1. So B 0 = (e1 ) is an orthonormal basis of U . Clearly the matrix [T |U ]B0 is 1-by-1. If dim(U ) = 2, then T |U is normal (by Theorem 9.3.2), but not self-adjoint (since otherwise T |U , and hence T , would have an eigenvector in U by Lemma 9.1.4). So we may choose an orthonormal basis of U with respect to which the matrix of T |U has the desired form. We know that U ⊥ is T -invariant and T |U ⊥ is a normal operator on U ⊥ . By our induction hypothesis there is an orthonormal basis of U ⊥ of the desired type. Putting together the bases of U and U ⊥ we obtain an orthonormal basis of V of the desired type.
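The block-diagonal form of Theorem 9.3.3 can be found in practice from the real Schur decomposition. The sketch below is an illustration only (it assumes SciPy is available, and the ordering and signs within the 2-by-2 blocks produced by the library may differ from the normalization in the statement above).
\begin{verbatim}
import numpy as np
from scipy.linalg import schur

# A normal real matrix is orthogonally similar to a block diagonal matrix with
# 1x1 blocks and 2x2 blocks of the form [[a, -b], [b, a]] (Theorem 9.3.3).
rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # a random orthogonal matrix
D = np.array([[2., 0., 0.],                    # one 1x1 block and one
              [0., 1., -3.],                   # 2x2 block of the stated form
              [0., 3., 1.]])
A = Q @ D @ Q.T                                # normal, generally not symmetric

T, Z = schur(A, output='real')                 # A = Z T Z^T with Z orthogonal
print(np.round(T, 6))                          # block diagonal, up to ordering/sign
print(np.allclose(Z @ T @ Z.T, A))
\end{verbatim}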
9.4 Positive Operators
In this section V is a finite dimensional inner product space. An operator T ∈ L(V ) is said to be a positive operator provided T = T ∗ and hT (v), vi ≥ 0 ∀v ∈ V. Note that if V is a complex space, then having hT (v), vi ∈ R for all v ∈ V is sufficient to force T to be self-adjoint (by Theorem 9.1.1, part (c)). So hT (v), vi ≥ 0 for all v ∈ V is sufficient to force T to be positive. There are many examples of positive operators. If P is any orthogonal projection, then P is positive. (You should verify this!) In the proof of Lemma 9.1.3 we showed that if T ∈ L(V ) is self-adjoint and if α, β ∈ R are such that α2 < 4β, then T 2 + αT + βI is positive. You should think about the analogy between positive operators (among all operators) and the nonnegative real numbers among all complex numbers. This will be made easier
by the theorem that follows, which collects the main facts about positive operators. If S, T ∈ L(V ) and S 2 = T , we say that S is a square root of T . Theorem 9.4.1. Let T ∈ L(V ). Then the following are equivalent: (a) T is positive; (b) T is self-adjoint and all the eigenvalues of T are nonnegative: (c) T has a positive square root; (d) T has a self-adjoint square root; (e) There exists an operator S ∈ L(V ) such that T = S ∗ S. Proof. We will prove the (a) =⇒ (b) =⇒ (c) =⇒ (d) =⇒ (e) =⇒ (a). Suppose that (a) holds, i.e., T is positive, so in particular T is self-adjoint. Let v be a nonzero eigenvector belonging to the eigenvalue λ. Then 0 ≤ hT (v), vi = hλv, vi = λhv, vi, implying that λ is a nonnegative number, so (b) holds. Now suppose that (b) holds, so T is self-adjoint and all the eigenvalues of T are nonnegative. By the Real and Complex Spectral Theorems, there is an orthonormal basis B = (e1 , . . . , en ) of V consisting of eigenvectors of T . Say T (ei ) = λj ei , where each λi ≥ 0. Define S ∈ L(B) by p S(ej ) = λj ej , 1 ≤ j ≤ n. Since S has a real diagonal matrix clearly selfp S is Pn with respect to the basisPB, n 2 adjoint. Now suppose v = j=1 aj ej . Then hS(v), vi = j=1 λj |aj | ≥ 0, so S is a positive square root of T . This shows (b) =⇒ (c). Clearly (c) implies (d), since by definition every positive operator is selfadjoint. So suppose (d) holds and let S be a self-adjoint operator with T = S 2 . Since S is self-adjoint, T = S ∗ S, proving that (e) holds. Finally, suppose that T = S ∗ S for some S ∈ L(V ). It is easy to check that T ∗ = T , and then hT (v), vi = hS ∗ S(v), vi = h(S(v), S(v)i ≥ 0, showing that (e) implies (a). An operator can have many square roots. For example, in addition to ±I being square roots of I, for each a ∈ F and for each nonzero b ∈ F , we have that a b A= satisfies A2 = I. 1−a2 −a b However, things are different if we restrict our attention to positive operators.
Theorem 9.4.2. Each positive operator on V has a unique positive square root. Proof. Let T ∈ L(V ) be positive with nonnegative distinct eigenvalues λ1 , . . . , λm . Since T is self-adjoint, we know by Theorem 7.2.8 (and the Spectral Theorems) that V = null(T − λ1 I) ⊕ · · · ⊕ null(T − λm I). By the preceding theorem we know that T has a positive square root S. 2 Suppose α is an eigenvalue of S. If v ∈ null(S √ − αI). Then T (v) = S (v) = 2 2 α v, so α is some eigenvalue of T , i.e., α = λi for some i. Clearly p null(S − λj ) ⊆ null(T − λj I). √ √ Since the only possible eigenvalues of S are λ1 , . . . , λm , and because S is self-adjoint, we also know that p p V = null(S − λ1 I) ⊕ · · · ⊕ null(S − λm I). A dimension argument then shows that p null(S − λj I) = null(T − λj I) for each p j. In other words, on null(T −λj I), the operator S is just multiplication by λj . Thus S, the positive square root of T , is uniquely determined by T .
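A numerical sketch of Theorems 9.4.1 and 9.4.2 (not part of the original notes; it assumes NumPy): build a positive operator as S∗S and compute its positive square root by diagonalizing and taking square roots of the eigenvalues, exactly as in the proof of Theorem 9.4.1.
\begin{verbatim}
import numpy as np

# The positive square root of a positive operator, computed via eigh.
rng = np.random.default_rng(3)
S = rng.normal(size=(4, 4))
T = S.T @ S                                   # T = S* S is positive (Theorem 9.4.1(e))

evals, P = np.linalg.eigh(T)                  # T = P diag(evals) P^T, evals >= 0
sqrtT = P @ np.diag(np.sqrt(np.clip(evals, 0, None))) @ P.T
print(np.allclose(sqrtT @ sqrtT, T))          # sqrtT is a square root of T
print(np.allclose(sqrtT, sqrtT.T))            # it is self-adjoint (symmetric here)
print(np.all(np.linalg.eigvalsh(sqrtT) >= -1e-12))   # with nonnegative eigenvalues
\end{verbatim}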
9.5 Isometries
An operator S ∈ L(V ) is called an isometry provided ||S(v)|| = ||v|| ∀v ∈ V. For example, λI is an isometry whenever λ ∈ F satisfies |λ| = 1. More generally, suppose that λ1 , . . . , λn are scalars with absolute value 1 and S ∈ L(V ) satisfies S(ej ) = λj ej for some orthonormal basis B = (e1 , . . . , en ) of V . For v ∈ V we have v = hv, e1 ie1 + · · · + hv, en ien
(9.6)
and (using the Pythagorean theorem) ||v||2 = |hv, e1 i|2 + · · · + |hv, en i|2 .
(9.7)
Apply S to both sides of Eq. 9.6: S(v) = λ₁⟨v, e₁⟩e₁ + · · · + λₙ⟨v, eₙ⟩eₙ. This last equation along with |λⱼ| = 1 shows that ‖S(v)‖² = |⟨v, e₁⟩|² + · · · + |⟨v, eₙ⟩|².
(9.8)
If we compare Eqs. 9.7 and 9.8, we see that ||v|| = ||S(v)||, i.e., S is an isometry. In fact, this is the prototypical isometry, as we shall see. The next theorem collects the main results concerning isometries. Theorem 9.5.1. Suppose V is an n-dimensional inner product space and S ∈ L(V ). Then the following are equivalent: (a) S is an isometry; (b) hS(u), S(v)i = hu, vi ∀u, v ∈ V ; (c) S ∗ S = I; (d) (S(e1 ), . . . , S(en )) is an orthonormal basis of V whenever (e1 , . . . , en ) is an orthonormal basis of V ; (e) there exists an orthonormal basis (e1 , . . . , en ) of V for which (S(e1 ), . . . , S(en )) is orthonormal; (f ) S ∗ is an isometry; (g) hS ∗ (u), S ∗ (v)i = hu, vi ∀u, v ∈ V ; (h) SS ∗ = I; (i) hS ∗ (e1 ), . . . , S ∗ (en )i is orthonormal whenever (e1 , . . . , en ) is an orthonormal list of vectors in V ; (j) there exists an orthonormal basis (e1 , . . . , en ) of V for which (S ∗ (e1 ), . . . , S ∗ (en )) is orthonormal. Proof. To start, suppose S is an isometry. If V is a real inner-product space, then for all u, v ∈ V , using the real polarization identity we have ||S(u) + S(v)||2 − ||S(u) − S(v)||2 4 ||S(u + v)||2 − ||S(u − v)||2 = 4 2 ||u + v|| − ||u − v||2 = 4 = hu, vi.
hS(u), S(v)i =
If V is a complex inner product space, use the complex polarization identity in the same fashion. In either case we see that (a) implies (b).

Now suppose that (b) holds. Then ⟨(S∗S − I)(u), v⟩ = ⟨S(u), S(v)⟩ − ⟨u, v⟩ = 0 for every u, v ∈ V. In particular, if v = (S∗S − I)(u), then necessarily (S∗S − I)(u) = 0 for all u ∈ V, forcing S∗S = I. Hence (b) implies (c).

Suppose that (c) holds and let (e₁, . . . , eₙ) be an orthonormal list of vectors in V. Then ⟨S(eⱼ), S(eₖ)⟩ = ⟨S∗S(eⱼ), eₖ⟩ = ⟨eⱼ, eₖ⟩. Hence (S(e₁), . . . , S(eₙ)) is orthonormal, proving that (c) implies (d). Clearly (d) implies (e).

Suppose that (e₁, . . . , eₙ) is a basis of V for which (S(e₁), . . . , S(eₙ)) is orthonormal. For v ∈ V,
\[ \|S(v)\|^2 = \Big\|S\Big(\sum_{i=1}^n \langle v, e_i\rangle e_i\Big)\Big\|^2 = \sum_{i=1}^n \|\langle v, e_i\rangle S(e_i)\|^2 = \sum_{i=1}^n |\langle v, e_i\rangle|^2 = \|v\|^2. \]
Taking square roots we see that (e) implies (a). We have now shown that (a) through (e) are equivalent. Hence replacing S by S∗ we have that (f) through (j) are equivalent. Clearly (c) and (h) are equivalent, so the proof of the theorem is complete.

The preceding theorem shows that an isometry is necessarily normal (see parts (a), (c) and (h)). Using the characterization of normal operators proved earlier we can now give a complete description of all isometries. But as usual, there are separate statements for the real and the complex cases.
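The equivalences of Theorem 9.5.1 are easy to test numerically. The sketch below (illustrative only, not part of the original notes; it assumes NumPy) builds a unitary matrix, i.e. an isometry of Cⁿ with the standard inner product, and checks S∗S = I, norm preservation, and, anticipating Theorem 9.5.2 below, that every eigenvalue has absolute value 1.
\begin{verbatim}
import numpy as np

# A unitary matrix as an isometry of C^n.
rng = np.random.default_rng(4)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
S, _ = np.linalg.qr(M)                 # QR of a (generically) invertible matrix: S is unitary

v = rng.normal(size=4) + 1j * rng.normal(size=4)
print(np.allclose(S.conj().T @ S, np.eye(4)))                  # S* S = I
print(np.isclose(np.linalg.norm(S @ v), np.linalg.norm(v)))    # ||S v|| = ||v||
print(np.allclose(np.abs(np.linalg.eigvals(S)), 1.0))          # |lambda| = 1
\end{verbatim}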
Theorem 9.5.2. Let V be a (finite dimensional) complex inner product space and let S ∈ L(V ). Then S is an isometry if and only if there is an orthonormal basis of V consisting of eigenvectors of S all of whose corresponding eigenvalues have absolute value 1. Proof. The example given at the beginning of this section shows that the condition given in the theorem is sufficient for S to be an isometry. For the converse, suppose that S ∈ L(V ) is an isometry. By the complex spectral theorem there is an orthonormal basis (e1 , . . . , en ) of V consisting of eigenvectors of S. For 1 ≤ j ≤ n, let λj be the eigenvalue corresponding to ej . Then |λj | = ||λj ej || = ||S(ej )|| = ||ej || = 1. Hence each eigenvalue of S has absolute value 1, completing the proof. The next result states that every isometry on a real inner product space is the direct sum of pieces that look like rotations on 2-dimensional subpaces, pieces that equal the identity operator, and pieces that equal multiplication by -1. It follows that an isometry on an odd-dimensional real inner product space must have 1 or -1 as an eigenvalue. Theorem 9.5.3. Suppose that V is a real inner product space and S ∈ L(V ). Then S is an isometry if and only if there is an orthonormal basis of V with respect to which S has a block diagonal matrix where each block on the diagonal is a 1-by-1 matrix containing 1 or -1, or a 2-by-2 matrix of the form cos(θ) −sin(θ) , with θ ∈ (0, π). (9.9) sin(θ) cos(θ) Proof. First suppose that S is an isometry. Because S is normal, there is an orthonormal basis of V such that with respect to this basis S has a block diagonal matrix, where each block is a 1-by-1 matrix or a 2-by-2 matrix of the form a −b , with b > 0. (9.10) b a If λ is an entry in a 1-by-1 block along the diagonal of the matrix of S (with respect to the basis just mentioned), then there is a basis vector ej such that S(ej ) = λej . Because S is an isometry, this implies that |λ| = 1 with λ real, forcing λ = ±1.
Now consider a 2-by-2 matrix of the form in Eq. 9.10 along the diagonal of the matrix of S. There are basis vectors ej , ej+1 such that S(ej ) = aej + bej+1 . Thus 1 = ||ej ||2 = ||S(ej )||2 = a2 + b2 . This equation, along with the condition that b > 0, implies that there exists a number θ ∈ (0, π) such that a = cos(θ) and b = sin(θ). Thus the matrix in Eq. 9.10 has the required form. This completes the proof in one direction. Conversely, suppose that there is an orthonormal basis of V with respect to which the matrix of S has the form stated in the theorem. There there is a direct sum decomposition V = U1 ⊕ · · · ⊕ Um , where each subspace Uj is a subspace of V having dimension 1 or 2. Furthermore, any two vectors belonging to distinct U ’s are orthogonal, and each S|Uj is an isometry mapping Uj into Uj . If v ∈ V , write v=
\[ \sum_{j=1}^m u_j, \qquad \text{with } u_j \in U_j. \]
Applying S to this equation and taking norms gives
\[ \|S(v)\|^2 = \|S(u_1) + \cdots + S(u_m)\|^2 = \|S(u_1)\|^2 + \cdots + \|S(u_m)\|^2 = \|u_1\|^2 + \cdots + \|u_m\|^2 = \|v\|^2. \]
This shows that S is an isometry, as desired.
9.6 The Polar Decomposition
Theorem 9.6.1. Polar Decomposition If T ∈ L(V ), then there exists an isometry S ∈ L(V ) such that √ T = S ◦ T ∗T .
Proof. Let T ∈√L(V ). Then for each √ v ∈ V , ||T (v)||2 = hT (v), T (v)i = √ hT ∗ T (v), vi = h T ∗ T (v), T ∗ T (v)i = || T ∗ T (v)||2 . So we have established √ ||T (v)|| = || T ∗ T (v)|| ∀v ∈ V.
(9.11)
The next step is to construct a map √ √ S1 : Im( T ∗ T ) → Im(T ) : T ∗ T (v) 7→ T (v), and show that it is a well-defined isometry. √ √ √ ||T (v1 )−T (v2 )|| = ||T (v1 −v2 )|| = || T ∗ T (v1 −v2 )|| = || T ∗ T (v1 )− T ∗ T (v2 )||. √ √ This shows that T (v1 ) = T (v2 ) if and only if T ∗ T (v1 ) = T√∗ T (v2 ). In fact it shows that S1 is well-defined and is a bijection from Im( T ∗ T ) onto Im(T ). One consequence of this is that √ dim(Im( T ∗ T )) = dim(Im(T )) and √ dim(Im( T ∗ T ))⊥ = dim(Im(T ))⊥ . √ It is also easy to check√that S1 is linear. Moreover, if v = T ∗ T (u), then ||S1 (v)|| = ||T (u)|| = || T ∗ T (u)|| = ||v||, implying that S1 is√an isometry. Now construct an orthonormal basis (e1 , . . . , em ) of (Im( T ∗ T )⊥ and an orthonormal basis (f1 , . . . , fm ) of (Im(T ))⊥ . Define √ S2 : (Im( T ∗ T )⊥ → (Im(T ))⊥ by: S2 (ej√ ) = fj (and extend linearly). It follows that ||S2 (w)|| = ||w|| for all w ∈ (Im( T ∗ T )⊥ . Here S2 is an isometry by part (e) of Theorem 9.5.1. (Notation: If U and W are independent, orthogonal subspaces of V so U + W = U ⊕ W and U ⊆ W ⊥ , we write U ⊥ W in place of U ⊕ W .) We know that
√ √ V = Im( T ∗ T ) ⊥ (Im( T ∗ T ))⊥ . √ √ For v ∈ V , write v = u + w with u ∈ Im( T ∗ T ), w ∈ (Im( T ∗ T ))⊥ . Define S : V → V by S(v) = S1 (u) + S2 (w). It is easy to check that S ∈ L(V ). Moreover, ||S(v)||2 = ||S1 (u) + S2 (w)||2 = ||S1 (u)||2 + ||S2 (w)||2 = ||u||2 + ||w||2 = ||v||2 , implying S is an isometry. (Here we used the fact that ⊥ S1 (u) ∈ Im(T √ ) and S2 (w) ∈ (Im(T )) .) The only thing left to check is that T = S ◦ T ∗ T , but this is obvious by the way S1 is defined on the image of √ T ∗T .
We can visualize this proof as follows. Start with T ∈ L(V). The domain decomposes as $V = \mathrm{Im}(\sqrt{T^*T}) \perp (\mathrm{Im}(\sqrt{T^*T}))^\perp$ and the codomain as $V = \mathrm{Im}(T) \perp (\mathrm{Im}(T))^\perp$. The map S₁ sends $\sqrt{T^*T}(v)$ to T(v), carrying $\mathrm{Im}(\sqrt{T^*T})$ onto Im(T), while S₂ sends the orthonormal basis (e₁, . . . , eₘ) of $(\mathrm{Im}(\sqrt{T^*T}))^\perp$ to the orthonormal basis (f₁, . . . , fₘ) of (Im(T))⊥, i.e., S₂(eⱼ) = fⱼ. Then S = S₁ ⊕ S₂ is an isometry, and we obtain the

Polar Decomposition: $T = S \circ \sqrt{T^*T}$.

If T is invertible, then $S = T \circ (\sqrt{T^*T})^{-1}$ is unique. If T is not invertible, then Im(T) ≠ V, so S₂ ≠ −S₂. Hence S′ = S₁ ⊕ (−S₂) yields a polar decomposition of T distinct from that given by S.

The polar decomposition states that each operator on V can be written as the product of an isometry and a positive operator. Thus we can write each operator on V as the product of two operators, each of which is of a type that we have completely described and understand reasonably well. We know there is an orthonormal basis of V with respect to which the isometry S has a diagonal matrix (if F = C) or a block diagonal matrix with blocks of size at most 2-by-2 (if F = R), and there is an orthonormal basis of V with respect to which $\sqrt{T^*T}$ has a diagonal matrix. Unfortunately, there may not be one orthonormal basis that does both at the same time. However, we can still say something interesting. This is given by the singular value decomposition which is discussed in the next section.
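For computation, the polar decomposition is available directly in SciPy. The following sketch (an illustration added here, not part of the original notes; it assumes scipy.linalg.polar is available) factors a random real matrix as T = SP with S an isometry and P = √(T∗T) positive.
\begin{verbatim}
import numpy as np
from scipy.linalg import polar

# Polar decomposition T = S P with S an isometry and P positive semidefinite.
rng = np.random.default_rng(5)
T = rng.normal(size=(3, 3))
S, P = polar(T, side='right')            # T = S P
print(np.allclose(T, S @ P))
print(np.allclose(S.T @ S, np.eye(3)))   # S is an isometry (orthogonal here)
print(np.allclose(P, P.T), np.all(np.linalg.eigvalsh(P) >= -1e-12))  # P is positive
\end{verbatim}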
9.7 The Singular-Value Decomposition
Statement of the General Result Let F be a subfield of the complex number field C, and let A ∈ Mm,n (F ). Clearly A∗ A is self-adjoint, so there is an orthonormal basis (v1 , . . . , vn ) for F n (whose elements we view as column vectors) consisting of eigenvectors of A∗ A with associated eigenvalues λ1 , . . . , λn . Let h , i denote the usual
inner product on F n . Since ||Avi ||2 = hAvi , Avi i = hA∗ Avi , vi i = hλi vi , vi i = λi ||vi ||2 = λi , we have proved the following lemma. Lemma 9.7.1. With the notation as above, (i) Each eigenvalue λi of A∗ A is real and nonnegative. Hence WLOG we may assume that λ√ 1 ≥ λ2 ≥ · · · ≥ λn ≥ 0. (ii) With si = λi for 1 ≤ i ≤ n, we say that the si , 1 ≤ i ≤ n, are the singular values of A. Hence the singular values of A are the lengths of the vectors Av1 , . . . , Avn . This is enough for us to state the theorem concerning the Singular Value Decomposition of A. Theorem 9.7.2. Given A ∈ Mm,n (F ) as above, there are unitary matrices U ∈ Mm (F ) and V ∈ Mn (F ) such that s1 0 ... (9.12) A = U ΣV ∗ , where Σ = s r 0 0 is a diagonal m × n matrix and s1 ≥ s2 ≥ · · · ≥ sr are the positive (i.e., nonzero) singular values of A. (i) The columns of U are eigenvectors of AA∗ , the first r columns of U form an orthonormal basis for the column space col(A) of A, and the last m − r columns of U form an orthonormal basis for the (right) null space of the matrix A∗ . (ii) The columns of V are eigenvectors of A∗ A, the last n − r columns of V form an orthonormal basis for the (right) null space null(A) of A, and the first r columns of V form an orthonormal basis for the column space col(A∗ ). (iii) The rank of A is r. Proof. Start by√supposing that λ1 ≥ λ2 ≥ · · · ≥ λr > λr+1 = · · · = λn = 0. Since ||Avi || = λi = si , clearly Avi 6= 0 iff 1 ≤ i ≤ r. Also, if i 6= j, then hAvi , Avj i = hA∗ Avi , vj i = λi hvi , vj i = 0. So (Av1 , . . . , Av √ n ) is an orthogonal list. Moreover, Avi 6= 0 iff 1 ≤ i ≤ r, since ||Avi || = λi = si . So clearly (Av1 , . . . , Avr ) is a linearly independent list that spans a subspace P of col(A). On the other hand suppose that y = Ax ∈ col(A). Then x = ci vi , so Pn Pr y = Ax = i=1 ci Avi = i=1 ci Avi . It follows that (Av1 , . . . Avr ) is a basis
for col(A), implying that r =dim(col(A)) = rank(A). Of course, then the right null space of A has dimension n − r. Avi For 1 ≤ i ≤ r, put ui = ||Av = s1i · Avi , so that Avi = si · ui . Now extend i || (u1 , . . . , ur ) to an orthonormal basis (u1 , . . . , um ) of F m . Let U = [u1 , . . . , um ] be the matrix whose jth column is uj . Similarly, put V = [v1 , . . . , vn ], so U and V are unitary matrices. And AV = [Av1 , . . . , Avr , 0, . . . , 0] = [s1 u1 , . . . , sr ur , 0, . . . , 0] = s1 0 ... = [u1 , . . . , um ] = U Σ, s r .. . 0 where Σ is diagonal, m × n, with the singular values of A along the diagonal. Then AV = U Σ implies A = U ΣV ∗ , A∗ = V Σ∗ U ∗ , so A∗ A = V Σ∗ U ∗ · U ΣV ∗ = V Σ∗ ΣV ∗ . Then λ1 ... (A∗ A)V = V Σ∗ ΣV ∗ V = V Σ∗ Σ = V = (λ1 v1 , . . . , λr vr , 0, · · · , 0). λr Since (A∗ A)vj = 0 for j > r, implying vj∗ A∗ Avj = 0, forcing Avj = 0, it is clear that vr+1 , p . . . , vn form an orthonormal basis for null(A). Since ∗ ∗ λj vj = (A A)vj = A λj uj , we see A∗ uj = sj vj . It now follows readily that v1 , . . . , vr form a basis for the column space of A∗ . Similarly, AA∗ = U ΣΣ∗ U ∗ , so that (AA∗ )U = U ΣΣ∗ = (λ1 u1 , . . . , λr ur , 0, · · · , 0). It follows that (u1 , . . . , un ) consists of eigenvectors of AA∗ . Also, ur+1 , . . . , um form an orthonormal basis for the (right) null space of A∗ , and u1 , . . . , ur form an orthonormal basis for col(A). Recapitulation We want to practice recognizing and/or finding a singular value decomposition for a given m × n matrix A over the subfield F of C.
A = U ΣV ∗ if and only if A∗ = V Σ∗ U ∗ .
(9.13)
Here Σ is m × n and diagonal with the nonzero singular values s1 ≥ s2 ≥ . . . , ≥ sr down the main diagonal as its only nonzero entries. Similarly, Σ∗ is n × m and diagonal with s1 ≥ · · · ≥ sr down the main diagonal as its only nonzero entries. So the nonzero eigenvalues of A∗ A are identical to the nonzero eigenvalues of AA∗ . The only difference is the multiplicity of 0 as an eigenvalue. For 1 ≤ i ≤ r, Avi = si ui and A∗ ui = si vi .
(9.14)
(v1 , . . . , vr , vr+1 , . . . , vn ) is an orthonormal basis of eigenvectors of A∗ A. (9.15)
(u1 , . . . , ur , ur+1 , . . . , um ) is an orthonormal basis of eigenvectors of AA∗ . (9.16)
(v1 , . . . , vr ) is a basis for col(A∗ ) and (vr+1 , . . . , vn ) is a basis for null(A). (9.17)
(u1 , . . . , ur ) is a basis for col(A) and (ur+1 , . . . , um ) is a basis for null(A∗ ). (9.18) We can first find (v1 , . . . , vn ) as an orthonormal basis of eigenvectors of A∗ A, put ui = s1i · Avi , for 1 ≤ i ≤ r, and then complete (u1 , . . . , ur ) to an orthonormal basis (u1 , . . . , um ) of eigenvectors of AA∗ . Sometimes here it is efficient to use the fact that (ur+1 , . . . , um ) form a basis for the null space of AA∗ or of A∗ . If n < m, this is the approach usually taken.
Alternatively, we can find (u1 , . . . , um ) as an orthonormal basis of eigenvectors of AA∗ (with eigenvalues ordered from largest to smallest), put vi = s1i A∗ ui for 1 ≤ i ≤ r, and then complete (v1 , . . . , vr ) to an orthonormal basis (v1 , . . . , vn ) of eigenvectors of A∗ A. If m < n, this is the approach usually taken.
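In practice one rarely carries out these steps by hand; numpy.linalg.svd produces U, the singular values, and V∗ directly. The sketch below (illustrative only, not part of the original notes; it assumes NumPy) applies it to the matrix of Problem 1 in the next subsection, where the nonzero singular value should be 3√2.
\begin{verbatim}
import numpy as np

# Singular value decomposition of the matrix from Problem 1 below.
A = np.array([[ 1., -1.],
              [-2.,  2.],
              [ 2., -2.]])
U, s, Vh = np.linalg.svd(A)               # s holds the singular values, largest first
print(s)                                  # approximately [4.2426, 0] = [3*sqrt(2), 0]
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vh))     # A = U Sigma V*
\end{verbatim}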
9.7.3 Two Examples
Problem 1. Find the Singular Value Decomposition of
\[ A = \begin{pmatrix} 1 & -1 \\ -2 & 2 \\ 2 & -2 \end{pmatrix}. \]
Solution: $A^*A = A^TA = \begin{pmatrix} 9 & -9 \\ -9 & 9 \end{pmatrix}$. In this case it is easy to find an orthonormal basis of R² consisting of eigenvectors of A∗A. Put
\[ v_1 = \begin{pmatrix} -\tfrac{1}{\sqrt 2} \\ \tfrac{1}{\sqrt 2} \end{pmatrix}, \qquad v_2 = \begin{pmatrix} \tfrac{1}{\sqrt 2} \\ \tfrac{1}{\sqrt 2} \end{pmatrix}, \qquad V = [v_1, v_2]. \]
Then A∗A[v₁, v₂] = [18v₁, 0 · v₂]. So put
\[ u_1 = \frac{Av_1}{\|Av_1\|} = \frac{1}{3\sqrt 2}\begin{pmatrix} -\sqrt 2 \\ 2\sqrt 2 \\ -2\sqrt 2 \end{pmatrix} = \begin{pmatrix} -\tfrac13 \\ \tfrac23 \\ -\tfrac23 \end{pmatrix} = \text{the first column of } U. \]
It is now easy to see that $w_1 = \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}$, $w_2 = \begin{pmatrix} -2 \\ 0 \\ 1 \end{pmatrix}$ form a basis of $u_1^\perp$. We apply Gram-Schmidt to (w₁, w₂) to obtain
\[ u_2 = \begin{pmatrix} \tfrac{2}{\sqrt 5} \\ \tfrac{1}{\sqrt 5} \\ 0 \end{pmatrix}, \qquad u_3 = \begin{pmatrix} -\tfrac{2}{\sqrt{45}} \\ \tfrac{4}{\sqrt{45}} \\ \tfrac{5}{\sqrt{45}} \end{pmatrix}. \]
Now put U = [u₁, u₂, u₃]. It follows that A = UΣV∗ is one singular value decomposition of A, where
\[ \Sigma = \begin{pmatrix} 3\sqrt 2 & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}. \]
1 0 i Problem 2. Compute a singular value decomposition of A = . 0 1 −i Solution: In thiscase AA∗ is 2 × 2, while A∗ A is 3 × 3, so we start with 2 −1 AA∗ = . Here AA∗ has eigenvalues λ1 = 3 and λ2 = 1. So the −1 2 √ singular values of A∗ (and hence of A) are s1 = 3 and s2 = 1. It follows √ √ 3 0 3 0 0 that Σ = . Also Σ∗ = 0 1 . 0 1 0 0 0 In this case we choose to compute an orthonormal basis (u1 , u2 ) of eigen −1 −1 vectors of AA∗ . It is simple to check that AA∗ − 3I = , and we −1 −1 ! √1 1 −1 ∗ 2 may take u1 = . Similarly, AA − I = , and we may −1 √ −1 1 2 ! take u2 =
√1 2 √1 2
. Then U = [u1 , u2 ]. At this point we must put 1 ! √ 1 0 1 6 √ 1 1 ∗ −1 2 0 1 √ , = √ v1 = A u1 = √ −1 6 s1 3 −2i 2 √ −i i 6
and
1 0 v2 = 1 · A ∗ u2 = 0 1 −i i
√1 2 √1 2
!
=
√1 2 √1 2
.
0
At this point we know that (v3 ) must be an orthonormal basis for {span(v1 , v2 )}⊥ . So we want v3 = (x, y, z)T with 0 = h(1, −1, −2i), (x, y, z)i = x − y − 2iz, and 0 = h(1, 1, 0), (x, y, z)i = x + y. It follows easily that (x, y, z) = (iz, −iz, z), where we must choose z so that the norm of this −i −1 √i vector is 1. If we put z = √ , i.e., z = √i3 , then (x, y, z) = ( √13 , √ , 3 ). 3 3 Then 1 2i ! √ √ √1 √ − 1 1 6 √ √ 3 0 0 √16 √1 6 2 2 0 A= , 1 1 2 2 − √2 √2 0 1 0 √1 √1 √i − − 3 3 3 which is a singular value decomposition of A.
THE REMAINDER OF THIS SECTION MAY BE CONSIDERED TO BE STARRED. Definition If λ is an eigenvalue of the matrix A, then dim(null(T − λI)) is called the geometric multiplicity of the eigenvalue λ. Theorem 9.7.4. Let M be m2 × m1 and N be m1 × m2 . Put 0 N A= . M 0 Then the following are equivalent: (i) λ 6= 0 is an eigenvalue of A with (geometric) multiplicity f . (ii) −λ 6= 0 is an eigenvalue of A with (geometric) multiplicity f . (iii) λ2 6= 0 is an eigenvalue of M N with (geometric) multiplicity f . (iv) λ2 6= 0 is an eigenvalue of N M with (geometric) multiplicity f . Proof. Step 1. Show (i) ↔ (ii) Let AU = λUfor some matrix U of rank U1 U1 f . Write U = , and put U˜ = , where Ui has mi rows for U2 −U2 i = 1, 2. Then AU = λU becomes 0 N U1 N U2 U1 = =λ , M 0 U2 M U1 U2 so N U2 = λU1 and M U1 = λU2 . −λU 0 N U 1 1 = −λU˜ . Since = This implies AU˜ = λU2 −U2 M 0 rank(U ) = rank(U˜ ), the first equivalence follows. Step 2. Show (iii) ↔ (iv). Let M N U 0 = λ2 U 0 for some matrix U 0 of rank f . Then (N M )(N U 0 ) = λ2 N U 0 , and rank(N U 0 ) = rank(U 0 ), since rank(λ2 U 0 ) = rank(M N U 0 ) ≤ rank(U 0 ), and λ 6= 0. So N M has λ2 as eigenvalue with geometric multiplicity at least f . Interchanging roles of N and M proves (iii) ↔ (iv). Step (i) ↔ (iii) Let AU = λU with U having rank f , and 3. Show U 1 U˜ = as in Step 1. Put n = m1 +m2 . Then A2 U ; U˜ = λ2 U ; U˜ . −U2 since N U2 = λU1 and M U1 = λU2 , U1 and U2 have the same row space. So row(U ) = row(U1 ) = row(U2 ). This implies rank(U1 ) = rank(U2 ) = f .
U1 U1 U1 0 Using column operations we can transform into , U2 −U2 0 U2 which has rank 2f . So λ2 is an eigenvalue of A2 with geometric multiplicity at least 2f . On the other hand, the geometric multiplicity of an eigenvalue λ2 6= 0 of A2 equals n − rank(A2 − λ2 I) = n − rank((A − λI)(A + λI)) ≤ n + n − rank(A − λI) − rank(A + λI) = 2f . Note: The singular values of a complex matrix N are sometimes de0 N fined to be the positive eigenvalues of . By the above result N∗ 0 we see that they are the same as the positive square roots of the nonzero eigenvalues of N N ∗ (or of N ∗ N ) as we have defined them above.
9.8 Pseudoinverses and Least Squares∗
Let A be an m × n matrix over F (where F is either R or C) and suppose the rank of A is r. Let A = U ΣV ∗ be a singular value decomposition of A. So Σ is an m × n diagonal matrix with the nonzero singular values s1 ≥ s2 ≥ . . . , ≥ sr down the main diagonal as its only nonzero entries. Define Σ+ to be the n × m diagonal matrix with diagonal equal to ( s11 , s12 , . . . , s1r , 0, . . .). Then both ΣΣ+ and Σ+ Σ have the general block form Ir 0 , 0 0 but the first product is m × m and the second is n × n. Definition The Moore-Penrose generalized inverse of A (sometimes just called the pseudoinverse of A) is the n × m matrix A+ over F defined by A+ = V Σ+ U ∗ . Theorem 9.8.1. Let A be an m × n matrix over F with pseudoinverse A+ (as defined above). Then the following three properties hold: (a) AA+ A = A; (b) A+ AA+ = A+ ; (c) AA+ and A+ A are hermitian (i.e., self-adjoint).
Proof. All three properties are easily shown to hold using the definition of A+ . You should do this now. Our definition of “the” Moore-Penrose generalized inverse would not be valid if it were possible for there to be more than one. However, the following theorem shows that there is at most one (hence exactly one!) such pseudoinverse. Theorem 9.8.2. Given an m × n matrix A, there is at most one matrix satisfying the three properties of A+ given in Theorem 9.8.1. This means that A has a unique pseudoinverse. Proof. Let B and C be pseudoinverses of A. i.e., satisfying the three properties of A+ in Theorem 9.8.1. Then CA = C(ABA) = CA(BA)∗ = CAA∗ B ∗ = (A(CA)∗ )∗ B ∗ = (ACA)∗ B ∗ = A∗ B ∗ = (BA)∗ = BA, i.e., CA = BA. Then B = BAB = B(AB)∗ = BB ∗ A∗ , so that B = BB ∗ (ACA)∗ = BB ∗ A∗ (AC)∗ = BAC = CAC = C.
At this point we want to review and slightly revise the Gram-Schmidt algorithm given earlier. Let (v1 , . . . , vn ) be any list of column vectors in F m . Let these be the columns of an m × n matrix A. Let W = span(v1 , . . . , vn ) be the column space of A. Define u1 = v1 . Then for 2 ≤ i ≤ n put ui = vi − α1i u1 − α2i u2 − · · · − αi−1,i ui−1 , where αji =
$\dfrac{\langle v_i, u_j\rangle}{\langle u_j, u_j\rangle}$,
if uj 6= ~0, and αji = 0 if uj = ~0. Hence
vi = α1i u1 + α2i u2 + α3i u3 + · · · + αi−1,i ui−1 + ui .
(9.19)
Let Q0 be the m×n matrix whose columns are u1 , u2 , . . . , un , respectively, and let R0 be the n × n matrix given by 1 α12 · · · α1n 0 1 · · · α2q R0 = .. .. (9.20) .. .. . . . . . 0 0 ··· 1 The ui in Q0 are constructed by the Gram-Schmidt method and form an orthogonal set. This means that Q0 has orthogonal columns, some of which may be zero. Let Q and R be the matrices obtained by deleting the zero columns from Q0 and the corresponding rows from R0 , and by dividing each nonzero column of Q0 by its norm and multiplying each corresponding row of R0 by that same norm. Then Eq. 9.20 becomes A = QR with R upper triangular and Q having orthonormal columns. (9.21) Eq. 9.21 is the normalized QR-decomposition of A. Note that if A has rank k, then Q is m × k with rank k and R is k × n and upper triangular with rank k. The columns of Q form an orthonormal basis for the column space W of A. If we compute Q∗ Q, since the columns of Q are orthonormal, we get Q∗ Q = Ik . Since (u1 , . . . , uk ) is an orthonormal basis for W , if P0 orthogonal projection onto W , then for v ∈ F n we have hv, u i u∗1 1 k X . .. P0 (v) = hv, ui iui = (u1 , . . . , uk ) = Q · .. . i=1 u∗k hv, uk i
∈ L(F n ) is the ∗ · v = QQ v.
So QQ∗ is the projection matrix projecting v onto W = col(Q). Hence QQ∗ v is the unique vector in W closest to v. Lemma 9.8.3. Suppose that the m × n matrix A has rank k and that A = BC, where B is m × k with rank k and C is k × n with rank k. Then A++ = C ∗ (CC ∗ )−1 (B ∗ B)−1 B ∗ is the pseudoinverse of A.
Proof. It is rather straightforward to verify that the three properties of Theorem 9.8.1 are satisfied by this A++ . Then by Theorem 9.8.2 we know that a matrix A+ satisfying the three properties of Theorem 9.8.1 is uniquely determined by these properties.) Corollary 9.8.4. If A = QR is a normalized QR-decomposition of A, then A+++ = R∗ (RR∗ )−1 Q∗ is the pseudoinverse of A. Given that A = U ΣV ∗ is a singular value decomposition of A, partition the two matrices as follows: U = (Uk , Um−k ) and V = (Vk , Vn−k ), where Uk is m × k and its columns are the first k columns of U , i.e., they form an orthonormal basis of col(A). Similarly, Vk is k × n and it columns are the first k columns of V , so they form an orthonormal basis for col(A∗ ). This is the same as saying that the rows of Vk∗ form an orthonormal basis for row(A). Now let D be the k × k diagonalmatrix with the nonzero singular values of D 0 A down the diagonal, i.e., Σ = . Now using block multiplication 0 0 we see that D 0 Vk∗ A = (Uk , Um−k ) = Uk Dk Vk∗ . ∗ 0 0 Vn−k This expression A = Uk Dk Vk∗ is called the reduced singular value decomposition of A. Here Uk is m × k with rank k, and Dk Vk∗ is k × n with rank k. Then A = QR = Uk · Dk Vk∗ where Q = Uk is m × k with rank k, and R = Dk Vk∗ is k × n with rank k. The columns of Q form an orthonormal basis for col(A), and the rows of R form an orthogonal basis for row(A). So a pseudoinverse A+ is given by A+ = R∗ (RR∗ )−1 (Q∗ Q)−1 Q∗ = Vk Dk−1 Uk∗ , after some computation. Suppose we are given the equation Ax = b,
(9.22)
where A is m × n and b is m × 1. It is possible that this equation is not consistent. In this case we want to find xˆ so that Aˆ x is as close to b as possible, i.e., it should be the case that Aˆ x is the projection ˆb of b onto the column space of A. Put xˆ = A+ b = Vk Dk−1 Uk∗ b.
Then $A\hat x = (U_k D_k V_k^*)\,V_k D_k^{-1} U_k^* b = U_k D_k D_k^{-1} U_k^* b = U_k U_k^* b$,
(9.23)
which by the paragraph preceding Lemma 9.8.3 must be the projection ˆb of b onto col(A). Thus xˆ is a “least-squares solution” to Ax = b, in the sense that xˆ minimizes ||Aˆ x − b|| (which is the square root of a sum of squares). Moreover, it is true that xˆ has the smallest length among all least-squares solutions to Ax = b. Theorem 9.8.5. Let A be m×n with rank k. Let A = U ΣV ∗ be a normalized singular value decomposition of A. Let U = (Uk , Um−k ) and V = (Vk , Vn−k ) be partitioned as above. Then A+ = Vk Dk−1 Uk∗ , as above, and xˆ = A+ b is a least-squares solution to Ax = b. If x0 is any other least-squares solution, i.e., ||Ax0 − b|| = kAˆ x − b||, then ||ˆ x|| ≤ ||x0 ||, with equality if and only if x0 = xˆ. Proof. What remains to be shown is that if ||Ax0 − b|| = kAˆ x − b||, then ||ˆ x|| ≤ ||x0 ||, with equality if and only if x0 = xˆ. We know that ||Ax − b|| is minimized precisely when Ax = ˆb is the projection Uk Uk∗ b of b onto col(A). So suppose Ax0 = ˆb = Aˆ x, which implies A(x0 − xˆ) = 0, i.e., x0 − xˆ ∈ null(A). By Eq. 9.17 this means that x0 = xˆ +Vn−k z for some z ∈ F n−k . We claim that −1 ∗ xˆ = A+ b = Vk Dk−1 Uk∗ b and Vn−k z are orthogonal. For, hVk DK Uk b, Vn−k zi = ∗ ∗ z ∗ (Vn−k Vk )Dk−1 UK∗ b = 0, because by Eq. 9.15, Vn−k Vk = 0(n−k)×k . Then ||x0 ||2 = ||ˆ x + Vn−k z||2 = ||ˆ x||2 + ||Vn−k z||2 ≥ ||ˆ x||2 , with equality if and only 2 if ||Vn−k z|| = 0 which is if and only if Vn−k z = ~0 which is if and only if x0 = xˆ. Theorem 9.8.6. Let A be m × n with rank k. Then A has a unique (MoorePenrose) pseudoinverse. If k = n, A+ = (A∗ A)−1 A∗ , and A+ is a left inverse of A. If k = m, A+ = A∗ (AA∗ )−1 , and A+ is a right inverse of A. If k = m = n, A+ = A−1 . Proof. We have already seen that A has a unique pseudoinverse. If k = n, A = B, C = Ik yields a factorization of the type used in Lemma 9.8.3, so A+ = (A∗ A)−1 A∗ . Similarly, if k = m, put B = Ik , C = A. Then A+ = A∗ (AA∗ )−1 . And if A is invertible, so that (A∗ A)−1 = A−1 (A∗ )−1 , by the k = n case we have A+ = (A∗ A)−1 A∗ = A−1 (A∗ )−1 A∗ = A−1 .
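Numerically, x̂ = A⁺b is exactly what numpy.linalg.lstsq returns for a rank-deficient system. The sketch below (illustrative, not part of the original notes; it assumes NumPy) checks that the two agree and that Ax̂ is the projection of b onto col(A), as in Eq. 9.23.
\begin{verbatim}
import numpy as np

# Minimum-norm least-squares solution via the pseudoinverse (Theorem 9.8.5).
rng = np.random.default_rng(7)
A = rng.normal(size=(6, 3)) @ rng.normal(size=(3, 4))   # 6 x 4 with rank 3
b = rng.normal(size=6)

x_hat = np.linalg.pinv(A) @ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)         # lstsq returns the minimum-norm minimizer
print(np.allclose(x_hat, x_lstsq))                      # True
P = A @ np.linalg.pinv(A)                               # projection onto col(A)
print(np.allclose(A @ x_hat, P @ b))                    # A x_hat is the projection of b
\end{verbatim}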
9.9 Norms, Distance and More on Least Squares∗
In this section the norm ‖A‖ of an m × n matrix A over C is defined by
\[ \|A\| = (\mathrm{tr}(A^*A))^{1/2} = \Big(\sum_{i,j} |A_{ij}|^2\Big)^{1/2}. \]
(Here we write tr(A) for the trace of A.) Note: This norm is the 2-norm of A thought of as a vector in the mndimensional vector space Mm,n (C). It is almost obvious that kAk = kA∗ k. Theorem 9.9.1. Let A, P , Q be arbitrary complex matrices for which the appropriate multiplications are defined. Then kAP k2 + k(I − AA+ )Qk2 = kAP + (I − AA+ )Qk2 . Proof. kAP + (I − AA+ )Qk2 = tr [AP + (I − AA+ )Q]∗ [AP + (I − AA+ )Q] = kAP k2 +tr (AP )∗ (I − AA+ )Q +tr ((I − AA+ )Q)∗ AP +k(I−AA+ )Qk2 . We now show that each of the two middle terms is the trace of the zero matrix. Since the second of these matrices is the conjugate transpose of the other, it is necessary only to see that one of them is the zero matrix. So for the first matrix: (AP )∗ (I − AA+ )Q = (AP )∗ (I − AA+ )∗ Q = ((I − AA+ )AP )∗ Q = = (AP − AA+ AP )∗ Q = (AP − AP )∗ Q = 0.
Theorem 9.9.2. Let A and B be m × n and m × p complex matrices. Then the matrix X0 = A+ B enjoys the following properties: (i) kAX − Bk ≥ kAX0 − Bk for all n × p X. Moreover, equality holds if and only if AX = AA+ B. (ii) kXk ≥ kX0 k for all X such that kAX − Bk = kAX0 − Bk, with equality if and only if X = X0 .
Proof. kAX − Bk2 = kA(X − A+ B) + (I − AA+ )(−B)k2 = kA(X − A+ B)k2 + k(I − AA+ )(−B)k2 = kAX − AA+ Bk2 + kAA+ B − Bk2 ≥ kAA+ B − Bk2 , with equality if and only if AX = AA+ B. This completes the proof of (i).
Now interchange A+ and A in the preceding theorem and assume AX = AA+ B so that equality holds in (i). Then we have kXk2 = kA+ B + X − A+ (AA+ B)k2 = kA+ B + (I − A+ A)Xk2 = kA+ Bk2 + kX − A+ AXk2 = kA+ Bk2 + kX − A+ Bk2 ≥ kA+ Bk2 , with equality if and only if X = A+ B.

NOTE: Suppose we have n (column) vectors v1 , . . . , vn in Rm and some vector y in Rm , and we want to find that vector in the space span(v1 , . . . , vn ) = V which is closest to the vector y. First, we need to use only an independent subset of the vi ’s. So use row-reduction techniques to find a basis (w1 , . . . , wk ) of V . Now use these wi to form the columns of a matrix A. So A is m × k with rank k, and the pseudoinverse of A is simpler to compute than if we had used all of the vi ’s to form the columns of A. Put x0 = A+ y. Then Ax0 = AA+ y is the desired vector in V closest to y. Now suppose we want to find the distance from y to V , i.e., the distance d(y, AA+ y) = ky − AA+ yk. It is easier to compute the square of the distance first:

ky − AA+ yk2 = k(I − AA+ )yk2 = y ∗ (I − AA+ )∗ (I − AA+ )y = y ∗ (I − AA+ )(I − AA+ )y = y ∗ (I − AA+ − AA+ + AA+ AA+ )y = y ∗ (I − AA+ )y.

We have proved the following:

Theorem 9.9.3. The distance between a vector y of Rm and the column space of an m × n matrix A is (y ∗ (I − AA+ )y)^{1/2} .
Theorem 9.9.4. AXB = C has a solution X if and only if AA+ CB + B = C, in which case for any Y , X = A+ CB + + Y − A+ AY BB + is a solution. This gives the general solution. (The reader might want to review the exercises in Chapter 6 for a different approach to a more general problem.) Proof. If AXB = C, then C = AXB = AA+ AXBB + B = AA+ CB + B. Conversely, if C = AA+ CB + B, then X = A+ CB + is a particular solution of AXB = C. Any expression of the form X = Y − A+ AY BB + satisfies AXB = 0. And if AXB = 0, then X = Y − A+ AY BB + for Y = X. Put X0 = A+ CB + and let X1 be any other solution to AXB = C. Then X = X1 − X0 satisfies AXB = 0, so that X1 − X0 = X = Y − A+ AY BB + for some Y.
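As an aside, Theorem 9.9.4 lends itself to a quick numerical check. The following minimal Python/numpy sketch uses made-up random matrices (the sizes, the seed, and the choice of Y are arbitrary assumptions for the illustration): it builds a consistent equation AXB = C, tests the criterion AA+ CB + B = C, and verifies that X = A+ CB + + Y − A+ AY BB + is a solution.

import numpy as np
rng = np.random.default_rng(0)

# Made-up consistent instance of AXB = C.
A = rng.standard_normal((4, 3))
B = rng.standard_normal((2, 5))
C = A @ rng.standard_normal((3, 2)) @ B

Ap, Bp = np.linalg.pinv(A), np.linalg.pinv(B)
print(np.allclose(A @ Ap @ C @ Bp @ B, C))         # the solvability criterion holds

Y = rng.standard_normal((3, 2))                    # arbitrary Y
X = Ap @ C @ Bp + Y - Ap @ A @ Y @ B @ Bp
print(np.allclose(A @ X @ B, C))                   # True: X is a solution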
If x = (x1 , . . . , xn )T and y = (y1 , . . . , yn )T are two points of C n , the distance between x and y is defined to be

d(x, y) = kx − yk = \left( \sum_{i=1}^n |x_i − y_i|^2 \right)^{1/2} .
A hyperplane H in C n is the set of vectors x = (x1 , . . . , xn )T satisfying an equation of the form \sum_{i=1}^n a_i x_i + d = 0, i.e., H = {x ∈ C n : Ax + d = 0}, where A = (a1 , . . . , an ) ≠ ~0. Write A = 1 · (a1 , . . . , an ), where 1 is 1 × 1 of rank 1 and (a1 , . . . , an ) is 1 × n of rank 1, so

A+ = A∗ (AA∗ )−1 = \frac{1}{kAk^2} A∗ .

For such an A you should now show that kA+ k = 1/kAk.
Let y0 = (y1 , . . . , yn )T be a given point of C n and let H : Ax + d = 0 be a given hyperplane. We propose to find the point x0 of H that is closest to y0 and to find the distance d(x0 , y0 ). Note that x is on H if and only if Ax + d = 0 if and only if A(x − y0 ) − (−d − Ay0 ) = 0. Hence our problem is to find x0 such that
(i) A(x0 − y0 ) − (−d − Ay0 ) = 0 (i.e., x0 is on H), and

(ii) kx0 − y0 k is minimal (so x0 is the point of H closest to y0 ).

Theorem 9.9.2 says that the vector x = x0 − y0 that satisfies Ax − (−d − Ay0 ) = 0 with kxk minimal is given by

x = x0 − y0 = A+ (−d − Ay0 ) = \frac{1}{kAk^2} A∗ (−d − Ay0 ).

Hence

x0 = y0 + \frac{−d}{kAk^2} A∗ − \frac{1}{kAk^2} A∗ A y0 .

And

d(y0 , H) = d(y0 , x0 ) = kx0 − y0 k = kA+ (−d − Ay0 )k = |d + Ay0 | · kA+ k = \frac{|Ay0 + d|}{kAk} .
Note: This distance formula generalizes the well-known formulas for the distance from a point to a line in R2 and from a point to a plane in R3 .

We now recall the method of approximating by least squares. Suppose y is a function of n real variables t(1) , . . . , t(n) , and we want to approximate y as a linear function of these variables. This means we want to find (real) numbers x0 , . . . , xn for which

1. x0 + x1 t(1) + · · · + xn t(n) is as “close” to the function y = y(t(1) , . . . , t(n) ) as possible.

Suppose we have m measurements of y corresponding to m different sets of values of the t(j) : (yi ; ti(1) , . . . , ti(n) ), i = 1, . . . , m. The problem is to determine x0 , . . . , xn so that
2. yi = x0 + x1 ti(1) + · · · + xn ti(n) + ri , 1 ≤ i ≤ m, where the ri ’s are small in some sense.
Put ti(j) = aij , y = (y1 , . . . , ym )T , x = (x0 , . . . , xn )T , r = (r1 , . . . , rm )T , and
A = \begin{pmatrix} 1 & a_{11} & \cdots & a_{1n} \\ 1 & a_{21} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ 1 & a_{m1} & \cdots & a_{mn} \end{pmatrix}.
Then 2. becomes:

3. y = Ax + r.

A standard interpretation of saying that “r is small” is to say that for some weighting constants w1 , . . . , wm the number S = \sum_{i=1}^m w_i r_i^2 = rT W r, where W = diag(w1 , . . . , wm ), is minimal.
As S = \sum_{i=1}^m w_i (y_i − x_0 − x_1 a_{i1} − x_2 a_{i2} − · · · − x_n a_{in})^2 , to minimize S as a function of x0 , . . . , xn we require that ∂S/∂xk = 0 for all k. Then

∂S/∂x0 = −2 \sum_i w_i (y_i − x_0 − x_1 a_{i1} − · · · − x_n a_{in}) = 0

implies

4.
\left( \sum_{i=1}^m w_i \right) x_0 + \left( \sum_{i=1}^m w_i a_{i1} \right) x_1 + · · · + \left( \sum_{i=1}^m w_i a_{in} \right) x_n = \sum_{i=1}^m w_i y_i .
And for 1 ≤ k ≤ n, ∂S/∂xk = −2 \sum_{i=1}^m w_i (y_i − x_0 − x_1 a_{i1} − · · · − x_n a_{in}) a_{ik} = 0 implies

5. \left( \sum_{i=1}^m w_i a_{ik} \right) x_0 + \left( \sum_{i=1}^m w_i a_{i1} a_{ik} \right) x_1 + · · · + \left( \sum_{i=1}^m w_i a_{in} a_{ik} \right) x_n = \sum_{i=1}^m w_i y_i a_{ik} .
It is easy to check that

A^T W = \begin{pmatrix} 1 & \cdots & 1 \\ a_{11} & \cdots & a_{m1} \\ \vdots & & \vdots \\ a_{1n} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} w_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & w_m \end{pmatrix} = \begin{pmatrix} w_1 & \cdots & w_m \\ a_{11} w_1 & \cdots & a_{m1} w_m \\ \vdots & & \vdots \\ a_{1n} w_1 & \cdots & a_{mn} w_m \end{pmatrix}.
So putting together the right hand sides of 4. and 5., we obtain

6. (A^T W ) y = \begin{pmatrix} \sum w_i y_i \\ \sum a_{i1} w_i y_i \\ \vdots \\ \sum a_{in} w_i y_i \end{pmatrix}.
Also we compute

A^T W A = \begin{pmatrix} w_1 & \cdots & w_m \\ a_{11} w_1 & \cdots & a_{m1} w_m \\ \vdots & & \vdots \\ a_{1n} w_1 & \cdots & a_{mn} w_m \end{pmatrix} \begin{pmatrix} 1 & a_{11} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ 1 & a_{m1} & \cdots & a_{mn} \end{pmatrix}.
Comparing the above with the left hand sides of 4. and 5., and putting the above equations together, we obtain the following system of n + 1 equations in n + 1 unknowns: 7. (AT W A)x = AT W y. We now reconsider this problem taking advantage of our results on generalized inverses. So A is a real m × n matrix, y is a real m × 1 matrix, and x ∈ Rn is sought for which (y − Ax)T (y − Ax) = kAx − yk2 is minimal (here we put W = I). Theorem 9.9.2 solves this problem by putting x = A+ y. We show that this also satisfies the above condition AT Ax = AT y, as should be expected. For suppose A = BC with B m × k, C k × n, where k = rank(A) = rank(B) = rank(C). Then AT Ax = AT AA+ y = C T B T BCC T (CC T )−1 (B T B)−1 B T y = C T B T y = AT y. So when W = I the generalized inverse solution does the following: First, it actually constructs a solution of the original problem that, second, has the smallest norm of any solution.
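As an aside, steps 3 through 7 are easy to trace numerically. The following minimal Python/numpy sketch uses made-up data and arbitrary positive weights (all of these choices are assumptions for the illustration): it solves the weighted normal equations (AT W A)x = AT W y, and then checks that for W = I the generalized-inverse solution x = A+ y satisfies AT Ax = AT y.

import numpy as np
rng = np.random.default_rng(1)

# Made-up data: m noisy samples of y = 3 + 2 t(1) - t(2).
m = 50
t = rng.standard_normal((m, 2))
y = 3.0 + 2.0 * t[:, 0] - 1.0 * t[:, 1] + 0.05 * rng.standard_normal(m)

A = np.column_stack([np.ones(m), t])            # design matrix as in the display above
W = np.diag(rng.uniform(0.5, 2.0, size=m))      # arbitrary positive weights

x_w = np.linalg.solve(A.T @ W @ A, A.T @ W @ y) # step 7: (A^T W A)x = A^T W y
print(np.round(x_w, 2))                         # approximately [3, 2, -1]

x_pinv = np.linalg.pinv(A) @ y                  # generalized-inverse solution (W = I)
print(np.allclose(A.T @ A @ x_pinv, A.T @ y))   # it satisfies A^T A x = A^T y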
9.10 The Rayleigh Principle∗
All matrices in this section are over the field C of complex numbers, and for any matrix B, B∗ denotes the conjugate transpose of B. Also, h , i denotes the standard inner product on C n given by hx, yi = y∗ x. Let A be n × n hermitian, so A has real eigenvalues λ1 ≤ λ2 ≤ · · · ≤ λn with an orthonormal set {v1 , . . . , vn } of associated eigenvectors: Avj = λj vj , hvj , vi i = vi∗ vj = δij . Let Q = (v1 , . . . , vn ) be the matrix whose jth column is vj . Then Q∗ Q = In and Q∗ AQ = Q∗ (λ1 v1 , . . . , λn vn ) = Q∗ QΛ = Λ = diag(λ1 , . . . , λn ), and Q is unitary (Q∗ = Q−1 ). For 0 ≠ x ∈ C n define the Rayleigh Quotient ρA (x) for A by

ρA (x) = \frac{hAx, xi}{hx, xi} = \frac{x∗ Ax}{||x||^2} .
(9.24)
Put O = {x ∈ C n : hx, xi = 1}, and note that for 0 ≠ k ∈ C, 0 ≠ x ∈ C n , ρA (kx) = ρA (x).
(9.25)
Hence {ρA (x) : x ≠ 0} = {ρA (x) : x ∈ O} = {x∗ Ax : x ∈ O}.
(9.26)
The set W (A) = {ρA (x) : x ∈ O} is called the numerical range of A. Observe that if x = Qy, then x ∈ O iff y ∈ O. Since W (A) is the continuous image of a compact connected set, it must be a closed bounded interval with a maximum M and a minimum m. Since Q : C n → C n : x 7→ Qx is nonsingular, Q maps O to O in a one-to-one and onto manner. Hence

M = max_{x∈O} {x∗ Ax} = max_{y∈O} {(Qy)∗ A(Qy)} = max_{y∈O} {y∗ Q∗ AQy} = max_{y∈O} \left\{ \sum_{j=1}^n λ_j |y_j|^2 \right\},

where y = (y1 , y2 , . . . , yn )T ∈ C n with \sum |y_i|^2 = 1. Similarly,

m = min_{x∈O} {ρA (x)} = min_{y∈O} \left\{ \sum_{j=1}^n λ_j |y_j|^2 \right\}.
By the ordering of the eigenvalues, for y ∈ O we have

λ_1 = λ_1 \sum_{j=1}^n |y_j|^2 ≤ \sum_{j=1}^n λ_j |y_j|^2 = ρ_Λ (y) ≤ λ_n \sum_{j=1}^n |y_j|^2 = λ_n .

(9.27)
Furthermore, with y = (1, 0, . . . , 0)∗ ∈ O and z = (0, . . . , 0, 1)∗ ∈ O, we have ρΛ (y) = λ1 and ρΛ (z) = λn . Hence we have almost proved the following first approximation to the Rayleigh Principle. Theorem 9.10.1. Let A be an n × n hermitian matrix with eigenvalues λ1 ≤ λ2 ≤ · · · ≤ λn . Then for any nonzero x ∈ O, λ1 ≤ ρA (x) ≤ λn , and
(9.28)
λ1 = minx∈O ρA (x); λn = maxx∈O ρA (x).
(9.29)
If 0 6= x ∈ C n satisfies ρA (x) = λi for either i = 1 or i = n,
(9.30)
then x is an eigenvector of A belonging to the eigenvalue λi .

Proof. Clearly Eqs. 9.28 and 9.29 are already proved. So consider Eq. 9.30. Without loss of generality we may assume x ∈ O. Suppose x = \sum_{j=1}^n c_j v_j , so that ρA (x) = x∗ Ax = \left( \sum_{j=1}^n c_j v_j \right)^{\!*} \left( \sum_{j=1}^n c_j λ_j v_j \right) = \sum_{j=1}^n λ_j |c_j|^2 . Clearly λ_1 = λ_1 \sum_{j=1}^n |c_j|^2 ≤ \sum_{j=1}^n λ_j |c_j|^2 with equality iff λj = λ1 whenever cj ≠ 0. Hence ρA (x) = λ1 iff x belongs to the eigenspace associated with λ1 . The argument for λn is similar.

Recall that Q = (v1 , . . . , vn ), and note that if x = Qy, so y = Q∗ x, then x = vi = Qy iff y = Q∗ vi = ei . So with the notation x = Qy, y = (y1 , . . . , yn )T , we have hx, vi i = x∗ vi = x∗ QQ∗ vi = y∗ ei = ȳi . Hence x ⊥ vi iff y = Q∗ x satisfies yi = 0.

Def. Tj = {x ≠ 0 : hx, vk i = 0 for k = 1, . . . , j} = {v1 , . . . , vj }⊥ \ {0}.

Theorem 9.10.2. ρA (x) ≥ λj+1 for all x ∈ Tj , and ρA (x) = λj+1 for some x ∈ Tj iff x is an eigenvector associated with λj+1 . Thus λj+1 = min_{x∈Tj} {ρA (x)} = ρA (vj+1 ).
Proof. 0 ≠ x ∈ Tj iff x = Qy where y = \sum_{k=j+1}^n y_k e_k iff x = \sum_{k=j+1}^n y_k v_k . Without loss of generality we may assume x ∈ O ∩ Tj . Then y = Q∗ x ∈ O and ρA (x) = \sum_{k=j+1}^n λ_k |y_k|^2 ≥ λ_{j+1} \sum_{k=j+1}^n |y_k|^2 = λ_{j+1} , with equality iff yk = 0 whenever λk > λj+1 . In particular, if y = ej+1 , so x = vj+1 , then ρA (x) = λj+1 .

Theorem 9.10.3. Put Sj = {x ≠ 0 : hx, vk i = 0 for k = n, n − 1, . . . , n − (j − 1)}. So Sj = {vn , vn−1 , . . . , vn−(j−1) }⊥ \ {~0}. Then ρA (x) ≤ λn−j for all x ∈ Sj , and equality holds iff x is an eigenvector associated with λn−j . Thus λn−j = max_{x∈Sj} {ρA (x)} = ρA (vn−j ).

Proof. We leave the proof as an exercise for the reader.

The Rayleigh Principle consists of Theorems 9.10.2 and 9.10.3.
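As a numerical aside, the pinching λ1 ≤ ρA (x) ≤ λn and the attainment at eigenvectors can be observed directly. The sketch below is a minimal Python/numpy illustration with a made-up hermitian matrix (the size and the random seed are arbitrary assumptions).

import numpy as np
rng = np.random.default_rng(2)

# Made-up hermitian matrix.
B = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
A = (B + B.conj().T) / 2

evals, evecs = np.linalg.eigh(A)                 # eigenvalues in increasing order

def rayleigh(A, x):
    # Rayleigh quotient x* A x / x* x (real for hermitian A).
    return (x.conj() @ A @ x).real / (x.conj() @ x).real

x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
print(evals[0] <= rayleigh(A, x) <= evals[-1])              # lambda_1 <= rho_A(x) <= lambda_n
print(np.isclose(rayleigh(A, evecs[:, 0]), evals[0]))       # rho_A(v_1) = lambda_1
print(np.isclose(rayleigh(A, evecs[:, -1]), evals[-1]))     # rho_A(v_n) = lambda_n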
9.11 Exercises
1. Show that if T ∈ L(V ), where V is any finite-dimensional inner product space, and if T is normal, then (a) Im(T ) = Im(T ∗ ), and (b) null(T ) = null(T ∗ ). 2. Prove that if T ∈ L(V ) is normal, then null(T k ) = null(T ) and Im(T k ) = Im(T ) for every positive integer k. 3. Prove that a normal operator on a complex inner product space is self-adjoint if and only if all its eigenvalues are real. 4. Let V be a finite dimensional inner product space over C. Suppose T ∈ L(V ) and U is a subspace of V . Prove that U is invariant under T if and only if U ⊥ is invariant under T ∗ .
5. Let V be a finite dimensional inner product space over C. Suppose that T is a positive operator on V (called positive semidefinite by some authors). Prove that T is invertible if and only if hT v, vi > 0 for every v ∈ V \ {0}.

6. In this problem Mn (R) denotes the set of all n × n real matrices, and Rn denotes the usual real inner product space of all column vectors with n real entries and inner product (x, y) = xT y = yT x = (y, x). Define the following subsets of Mn (R):

Πn = {A ∈ Mn (R) : (Ax, x) > 0 for all 0 ≠ x ∈ Rn }
Sn = {A ∈ Mn (R) : AT = A}
Σn = Πn ∩ Sn
Kn = {A ∈ Mn (R) : AT = −A}
For this problem we say that A ∈ Mn (R) is positive definite if and only if A ∈ Πn . A is symmetric if and only if it belongs to Sn . It is skew-symmetric if and only if it belongs to Kn . Note: It is sometimes the case that a real matrix is said to be positive definite if and only if it belongs to Sn also, i.e. A ∈ Σn . Remember that we do not do that here. Problem: Let A ∈ Mn (R). (i) Prove that there are unique matrices B ∈ Sn and C ∈ Kn for which A = B + C. Here B is called the symmetric part of A and C is called the skew-symmetric part of A. (ii) Show that A is positive definite if and only if the symmetric part of A is positive definite. (iii) Let A be symmetric. Show that A is positive definite if and only if all eigenvalues of A are positive. If this is the case, then det(A) > 0.
(iv) Let ∅ 6= S ⊆ {1, 2, . . . , n}. Let AS denote the submatrix of A formed by using the rows and columns of A indexed by the elements of S. (So in particular if S = {k}, then AS is the 1 × 1 matrix whose entry is the diagonal entry Akk of A.) Prove that if A is positive definite (but not necessarily symmetric), then AS is positive definite. Hence conclude that each diagonal entry of A is positive. (v) Let A ∈ Σn and for 1 ≤ i ≤ n let Ai denote the principal submatrix of A defined by using the first i rows and first i columns of A. Prove that det(Ai ) > 0 for each i = 1, 2, . . . , n. (Note: The converse is also true, but the proof is a bit more difficult.) 7. A is a real rectangular matrix (possibly square). (a) Prove that A+ = A−1 when A is nonsingular. (b) Determine all singular value decompositions of I with U = V . Show that all of them lead to exactly the same pseudoinverse. (c) The rank of A+ is the same as the rank of A. (d) If A is self-conjugate, then A+ is self-conjugate. (e) (cA)+ = 1c A+ for c 6= 0. (f) (A+ )∗ = (A∗ )+ . (g) (A+ )+ = A. (h) Show by a counterexample that in general (AB)+ 6= B + A+ . (i) If A is m × r, B is r × n, and both matrices have rank k, then (AB)+ = B + A+ . (j) Suppose that x is m × 1 and y is 1 × m. Compute (a) x+ ; (b) y + ; (c) (xy)+ . 3 1 −2 −1 2 , ~b = 1 . 8. Let A = 0 −4 3 2 2 5 (a) Find a least-squares solution of A~x = ~b.
(b) Find the orthogonal projection of ~b onto the column space of A.
(c) Compute the least-squares error (in the solution of part (a)).
9. (a) Show that if C ∈ Mn (C) is Hermitian and x∗ Cx = 0 for all x ∈ C n , then C = 0. (b) Show that for any A ∈ Mn (C) there are (unique) Hermitian matrices B and C for which A = B + iC. (c) Show that if x∗ Ax is real for all x ∈ C n , then A is Hermitian. 10. Let V be the vector space over the reals R consisting of the polynomials in x of degree at most 4 with coefficients in R, and with the usual addition of vectors (i.e., polynomials) and scalar multiplication. Let B1 = {1, x, x2 , . . . , x4 } be the standard ordered basis of V . Let W be the vector space of 2 × 3 matrices over R with the usual addition of matrices and scalar multiplication. Let B2 be the basis of ordered 0 1 0 1 0 0 W given as follows: B2 = {v1 = , v2 = , v3 = 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 , v4 = , v5 = , v6 = }. 0 0 0 1 0 0 0 1 0 0 0 1 Define T : V 7→ W by: For f = a0 + a1 x + a2 x2 + a3 x3 + a4 x4 , put T (f ) =
0 a3 a2 + a4 a1 + a0 a0 0
.
Construct the matrix A = [T ]B1 ,B2 that represents T with respect to the pair B1 , B2 of ordered bases. ∗ ~ ~ 11. Show that if A ∈ Mn (C) is normal, then A~x = 0if and only if A ~x = 0. 0 1 0 0 Use the matrices B = and B ∗ = to show that 0 0 1 0 this “if and only if” does not hold in general.
12. Let U ∈ Mp,m (C). Show that the following are equivalent:

(a) \begin{pmatrix} I_p & U \\ U^∗ & I_m \end{pmatrix} is positive definite;

(b) Ip − U U ∗ is positive definite;

(c) Im − U ∗ U is positive definite.

(Hint: consider

(v∗ , w∗ ) \begin{pmatrix} I_p & U \\ U^∗ & I_m \end{pmatrix} \begin{pmatrix} v \\ w \end{pmatrix} = v ∗ (Ip − U U ∗ )v + ??? = w∗ (Im − U ∗ U )w + ???)
13. Let A = \begin{pmatrix} 2 & −1 & 0 \\ −1 & 2 & −1 \\ 0 & −1 & 2 \end{pmatrix}. Show that A is positive definite in three different ways.

14. Compute a singular value decomposition of A where A = \begin{pmatrix} 1+i & 1 & 0 \\ 1−i & 0 & 1 \end{pmatrix}.

15. You may assume that V is a finite-dimensional vector space over F , a subfield of C, say dim(V ) = n. Let T ∈ L(V ). Prove or disprove each of the following:
(a) V = null(T ) ⊕ range(T ).
(b) There exists a subspace U of V such that U ∩ null(T ) = {0} and range(T ) = {T (u) : u ∈ U }.

16. Let R, S, T ∈ L(V ), where V is a complex inner product space.
(i) Suppose that S is an isometry and R is a positive operator such that T = SR. Prove that R = \sqrt{T ∗ T }.
(ii) Let σ denote the smallest singular value of T , and let σ ∗ denote the largest singular value of T . Prove that σ ≤ kT (v)k/kvk ≤ σ ∗ for every nonzero v ∈ V .
17. Let A be an n × n matrix over F , where F is either C or R. Show that there is a polynomial f (x) ∈ F [x] for which A∗ = f (A) if and only if A is normal. What can you say about the minimal degree of such a polynomial? 18. Let F be any subfield of C. Let F n be endowed with the standard inner product, and let A be a normal, n × n matrix over F . (For example, F could be the rational field Q.) The following steps give another view of normal matrices. Prove the following statements without using the spectral theorem (for C). (i) null(A) = (col(A))⊥ = null(A∗ ). (ii) If A2 x = 0, then Ax = 0. (iii) If f (x) ∈ F [x], then f (A) is normal. (iv) Suppose f (x), g(x) ∈ F [x] with 1 = gcd(f (x), g(x)). If f (A)x = 0 and g(A)y = 0, then hx, yi = 0. (v) Suppose p(x) = is the minimal polynomial of A. Then p(x) has no repeated (irreducible) factors. 19. (a) Prove that a normal operator on a complex inner product space with real eigenvalues is self-adjoint. (b) Let T : V → V be a self-adjoint operator. Is it true that T must have a cube root? Explain. (A cube root of T is an operator S : V → V such that S 3 = T .)
Chapter 10

Decomposition WRT a Linear Operator

To start this chapter we assume that F is an arbitrary field and let V be an n-dimensional vector space over F . Note that this necessarily means that we are not assuming that V is an inner product space.
10.1 Powers of Operators
First note that if T ∈ L(V ), if k is a nonnegative integer, and if T k (v) = ~0, then T k+1 (v) = ~0. Thus null(T k ) ⊆ null(T k+1 ). It follows that

{~0} = null(T 0 ) ⊆ null(T 1 ) ⊆ · · · ⊆ null(T k ) ⊆ null(T k+1 ) ⊆ · · · .
(10.1)
Theorem 10.1.1. Let T ∈ L(V ) and suppose that m is a nonnegative integer such that null(T m ) = null(T m+1 ). Then null(T m ) = null(T k ) for all k ≥ m.

Proof. Let k be a positive integer. We want to prove that null(T m+k ) = null(T m+k+1 ). We already know that null(T m+k ) ⊆ null(T m+k+1 ). To prove the inclusion in the other direction, suppose that v ∈ null(T m+k+1 ). Then ~0 = T m+1 (T k )(v). So T k (v) ∈ null(T m+1 ) = null(T m ), implying that ~0 = T m (T k (v)) = T m+k (v). Hence null(T m+k+1 ) ⊆ null(T m+k ), completing the proof.
Corollary 10.1.2. If T ∈ L(V ), n = dim(V ), and k is any positive integer, then null(T n ) = null(T n+k ). Proof. If the set containments in Eq. 10.1 are strict for as long as possible, at each stage the dimension of a given null space is at least one more than the dimension of the preceding null space. Since the entire space has dimension n, at most n + 1 proper containments can occur. Hence by Theorem 10.1.1 the Corollary must be true. Definition: Let T ∈ L(V ) and suppose that λ is an eigenvalue of T . A vector v ∈ V is called a generalized eigenvector of T corresponding to λ provided (T − λI)j (v) = ~0 for some positive integer j. (10.2) By taking j = 1 we see that each eigenvector is also a generalized eigenvector. Also, the set of generalized eigenvectors is a subspace of V . Moreover, by Corollary 10.1.2 (with T replaced by T − λI) we see that Corollary 10.1.3. The set of generalized eigenvectors corresponding to λ is exactly equal to null[(T − λI)n ], where n = dim(V ). An operator N ∈ L(V ) is said to be nilpotent provided some power of N is the zero operator. This is equivalent to saying that the minimal polynomial p(x) for N has the form p(x) = xj for some positive integer j, implying that the characteristic polynomial for N is xn . Then by the Cayley-Hamilton Theorem we see that N n is the zero operator. Now we turn to dealing with images of operators. Let T ∈ L(V ) and k ≥ 0. If w ∈ Im(T k+1 ), say w = T k+1 (v) for some v ∈ V , then w = T k (T (v) ∈ Im(T k ). In other words we have V = Im(T 0 ) ⊇ ImT 1 ⊇ · · · ⊇ Im(T k ) ⊇ Im(T k+1 ) · · · .
(10.3)
Theorem 10.1.4. If T ∈ L(V ), n = dim(V ), and k is any positive integer, then Im(T n ) = Im(T n+k ). Proof. We use the corresponding result already proved for null spaces. dim(Im(T n+k )) = n − dim(null(T n+k )) = n − dim(null(T n )) = dim(Im(T n )). Now the proof is easily finished using Eq. 10.3
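As an aside, the stabilization of the chain of null spaces is easy to watch numerically. The sketch below is a minimal Python/numpy illustration with a made-up operator (the particular matrix is an arbitrary assumption); it prints dim null(T k) for k = 0, 1, . . ., illustrating Theorem 10.1.1 and Corollary 10.1.2.

import numpy as np

# Made-up 4 x 4 example: a 2 x 2 nilpotent Jordan block plus an identity block.
T = np.array([[0., 1., 0., 0.],
              [0., 0., 0., 0.],
              [0., 0., 1., 0.],
              [0., 0., 0., 1.]])
n = T.shape[0]
nullities = [n - np.linalg.matrix_rank(np.linalg.matrix_power(T, k)) for k in range(6)]
print(nullities)   # [0, 1, 2, 2, 2, 2]: once two consecutive null spaces agree, the chain is constant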
10.2 The Algebraic Multiplicity of an Eigenvalue
If the matrix A is upper triangular, so is the matrix xI − A, whose determinant is the product of its diagonal elements and also is the characteristic polynomial of A. So the number of times a given scalar λ appears on the diagonal of A is also the algebraic multiplicity of λ as a root of the characteristic polynomial of A. The next theorem states that the algebraic multiplicity of an eigenvalue is also the dimension of the space of generalized eigenvectors associated with that eigenvalue. Theorem 10.2.1. Let n = dim(V ), let T ∈ L(V ) and let λ ∈ F . Then for each basis B of V for which [T ]B is upper triangular, λ appears on the diagonal of [T ]B exactly dim(null[(T − λI)n ]) times. Proof. For notational convenience, we first assume that λ = 0. Once this case is handled, the general case is obtained by replacing T with T − λI. The proof is by induction on n, and the theorem is clearly true when n = 1. We assume that n > 1 and that the theorem holds on spaces of dimension n − 1. Suppose that B = (v1 , . . . , vn ) is a basis of V for which [T ]B is the upper triangular matrix λ1 ∗ ... (10.4) . λn−1 0 λn Let U = span(v1 , . . . , vn−1 ). Clearly U is invariant under T and the matrix of T |U with respect to the basis (v1 , . . . , vn−1 ) is λ1 ∗ .. (10.5) . . 0 λn−1 By our induction hypothesis, 0 appears on the diagonal of the matrix in Eq. 10.5 exactly dim(null((T |U )n−1 ))) times. Also, we know that null((T |U )n−1 ) = null((T |U )n ), because dim(U ) = n − 1. Hence 0 appears on the diagonal of 10.5 dim(null((T |U )n ))) times.
(10.6)
The proof now breaks into two cases, depending on whether λn = 0 or not. First consider the case where λn 6= 0. We show in this case that null(T n ) ⊆ U.
(10.7)
This will show that null(T n ) = null((T |U )n ), and hence Eq. 10.6 will say that 0 appears on the diagonal of Eq. 10.4 exactly dim(null(T n )) times, completing the proof in the case where λn ≠ 0.
It follows from Eq. 10.4 that

[T n ]B = ([T ]B )n = \begin{pmatrix} λ_1^n & & & ∗ \\ & \ddots & & \\ & & λ_{n−1}^n & \\ 0 & & & λ_n^n \end{pmatrix}.   (10.8)

This shows that T n (vn ) = u + λ_n^n vn for some u ∈ U . Suppose that v ∈ null(T n ). Then v = ũ + avn where ũ ∈ U and a ∈ F . Thus ~0 = T n (v) = T n (ũ) + aT n (vn ) = T n (ũ) + au + aλ_n^n vn . Because T n (ũ) and au are in U and vn ∉ U , this implies that aλ_n^n = 0. Since λn ≠ 0, clearly a = 0. Thus v = ũ ∈ U , completing the proof of Eq. 10.7, and hence finishing the case with λn ≠ 0.
Suppose λn = 0. Here we show that dim(null(T n )) = dim(null((T |U )n )) + 1, which along with Eq. 10.6 will complete the proof when λn = 0. First consider

dim(null(T n )) = dim(U ∩ null(T n )) + dim(U + null(T n )) − dim(U ) = dim(null((T |U )n )) + dim(U + null(T n )) − (n − 1).

Also, n = dim(V ) ≥ dim(U + null(T n )) ≥ dim(U ) = n − 1.
(10.9)
It follows that if we can show that null(T n ) contains a vector not in U , then Eq. 10.9 will be established. First note that since λn = 0, we have T (vn ) ∈ U , hence T n (vn ) = T n−1 (T (vn )) ∈ Im[(T |U )n−1 ] = Im[(T |U )n ]. This says that there is some u ∈ U for which T n (u) = T n (vn ). Then u − vn is not in U but T n (u − vn ) = ~0. Hence Eq. 10.9 holds, completing the proof.

At this point we know that the geometric multiplicity of λ is the dimension of the null space of T − λI, i.e., the dimension of the eigenspace associated with λ, and this is less than or equal to the algebraic multiplicity of λ, which is the dimension of the null space of (T − λI)n and also equal to the multiplicity of λ as a root of the characteristic polynomial of T , at least in the case that T is upper triangularizable. This is always true if F is algebraically closed. Moreover, in this case the following corollary is clearly true:

Corollary 10.2.2. If F is algebraically closed, then the sum of the algebraic multiplicities of all the eigenvalues of T equals dim(V ).

For any f (x) ∈ F [x], T and f (T ) commute, so that the null space of f (T ) is invariant under T .

Theorem 10.2.3. Suppose F is algebraically closed and T ∈ L(V ). Let λ1 , . . . , λm be the distinct eigenvalues of T , and let U1 , . . . , Um be the corresponding subspaces of generalized eigenvectors. Then
(a) V = U1 ⊕ . . . ⊕ Um ;
(b) each Uj is T -invariant;
(c) each (T − λj I)|Uj is nilpotent.

Proof. Since Uj = null(T − λj I)n for each j, clearly (b) follows. Clearly (c) follows from the definitions. Also the sum of the multiplicities of the eigenvalues equals dim(V ), i.e.,

dim(V ) = \sum_{j=1}^m dim(Uj ).
Put U = U1 + · · · + Um . Clearly U is invariant under T . Hence we can define S ∈ L(U ) by S = T |U .
It is clear that S has the same eigenvalues, with the same multiplicities, as does T , since all the generalized eigenvectors of T are in U . Then the dimension of U is the sum of the dimensions of the generalized eigenspaces of T , forcing V = U and V = ⊕_{j=1}^m Uj , completing the proof of (a).

There is another style of proof that offers a somewhat different insight into this theorem, so we give it also. Suppose that f (λ) = (λ − λ1 )^{k_1} · · · (λ − λm )^{k_m} is the characteristic polynomial of T . Then by the Cayley-Hamilton theorem (T − λ1 I)^{k_1} · · · (T − λm I)^{k_m} is the zero operator. Put bj (λ) = \frac{f (λ)}{(λ − λ_j )^{k_j}} , for 1 ≤ j ≤ m. Then since gcd(b1 (λ), . . . , bm (λ)) = 1 there are polynomials aj (λ) for which 1 = \sum_j aj (λ)bj (λ), which implies that I = \sum_j aj (T )bj (T ). Then for any vector u ∈ V we have u = \sum_j aj (T )bj (T )(u), where aj (T )bj (T )(u) ∈ Uj . It follows that V = \sum_j Uj . Since the Uj are linearly independent, we have V = ⊕_j Uj .

By joining bases of the various generalized eigenspaces we obtain a basis for V consisting of generalized eigenvectors, proving the following corollary.

Corollary 10.2.4. Let F be algebraically closed and T ∈ L(V ). Then there is a basis of V consisting of generalized eigenvectors of T .

Lemma 10.2.5. Let N be a nilpotent operator on a vector space V over any field F . Then there is a basis of V with respect to which the matrix of N has the form

\begin{pmatrix} 0 & & ∗ \\ & \ddots & \\ 0 & & 0 \end{pmatrix}.   (10.10)

Proof. First choose a basis of null(N ). Then extend this to a basis of null(N 2 ). Then extend this to a basis of null(N 3 ). Continue in this fashion until eventually a basis of V is obtained, since V = null(N m ) for sufficiently large m. A little thought should make it clear that with respect to this basis, the matrix of N is upper triangular with zeros on the diagonal.
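As an aside, the content of Theorem 10.2.1 can be checked numerically on a made-up matrix (the sketch below is a minimal Python/numpy illustration and the particular matrix is an arbitrary assumption): dim null((A − λI)n ) recovers the algebraic multiplicity even when the geometric multiplicity is smaller.

import numpy as np

# Made-up example: eigenvalue 2 with algebraic multiplicity 3 but geometric multiplicity 1.
A = np.array([[2., 1., 0.],
              [0., 2., 1.],
              [0., 0., 2.]])
n = A.shape[0]
lam = 2.0

geom = n - np.linalg.matrix_rank(A - lam * np.eye(n))                              # dim null(A - 2I)
alg = n - np.linalg.matrix_rank(np.linalg.matrix_power(A - lam * np.eye(n), n))    # dim null((A - 2I)^3)
print(geom, alg)        # 1 3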
10.3 Elementary Operations
For 1 ≤ i ≤ n, let ei denote the column vector with a 1 in the ith position and zeros elsewhere. Then ei eTj is an n × n matrix with all entries other than the (i, j) entry equal to zero, and with that entry equal to 1.
Let Eij (c) = I + c ei eTj . We leave to the reader the exercise of proving the following elementary results:

Lemma 10.3.1. The following elementary row and column operations are obtained by pre- and post-multiplying by the elementary matrix Eij (c):
(i) Eij (c)A is obtained by adding c times the jth row of A to the ith row.
(ii) AEij (c) is obtained by adding c times the ith column of A to the jth column.
(iii) Eij (−c) is the inverse of the matrix Eij (c).

Moreover, if T is upper triangular with diagonal diag(λ1 , . . . , λn ), and if λi ≠ λj with i < j, then

T ′ = Eij (−c) T Eij (c) = Eij (−c) \begin{pmatrix} λ_1 & & & & ∗ \\ & \ddots & & & \\ & & λ_i & & \\ & & & \ddots & \\ 0 & & & & λ_n \end{pmatrix} Eij (c),

where T ′ is obtained from T by replacing the (i, j) entry tij of T with tij + c(λi − λj ). The only other entries of T that can possibly be affected are to the right of tij or above tij .

Using Lemma 10.3.1 over and over, starting low, working left to right in a given row and moving upward, we can transform T into a direct sum of blocks

P −1 T P = \begin{pmatrix} A_1 & & & \\ & A_2 & & \\ & & \ddots & \\ & & & A_r \end{pmatrix},

where each block Ai is upper triangular with each diagonal element equal to λi , 1 ≤ i ≤ r.

Our next goal is to find an invertible matrix Ui that transforms the block Ai into a matrix Ui−1 Ai Ui = λi I + N where N is nilpotent with a special
form. All entries on or below the main diagonal are 0, all entries immediately above the diagonal are 0 or 1, and all entries further above the diagonal are 0. This matrix λi I + N is called a Jordan block. We want to arrange it so that it is a direct sum of elementary Jordan blocks. These are matrices of the form λi I + N where each entry of N just above the diagonal is 1 and all other elements of N are 0. Then we want to use block multiplication to transform P −1 T P into a direct sum of elementary Jordan blocks.
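As an aside, the effect of the similarity T ′ = Eij (−c) T Eij (c) described in Lemma 10.3.1 can be seen directly. The following is a minimal Python/numpy sketch with a made-up 2 × 2 upper triangular matrix (an arbitrary assumption); c is chosen so that tij + c(λi − λj ) = 0.

import numpy as np

def E(n, i, j, c):
    # Elementary matrix E_ij(c) = I + c e_i e_j^T (0-based indices).
    M = np.eye(n)
    M[i, j] += c
    return M

# Upper triangular T with distinct diagonal entries; kill the (0, 1) entry.
T = np.array([[1., 5.],
              [0., 3.]])
c = -T[0, 1] / (T[0, 0] - T[1, 1])        # so that t_01 + c(lambda_0 - lambda_1) = 0
Tp = E(2, 0, 1, -c) @ T @ E(2, 0, 1, c)
print(Tp)                                  # diag(1, 3); the similarity removed the off-diagonal entry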
10.4 Transforming Nilpotent Matrices
Let B be n × n, complex and nilpotent of order p: B p−1 ≠ 0 = B p . By N (B j ) we mean the right null space of the matrix B j . Note that if j > i, then N (B i ) ⊆ N (B j ). We showed above that if N (B i ) = N (B i+1 ), then N (B i ) = N (B k ) for all k ≥ i.

Step 1.

Lemma 10.4.1. Let W = (w1 , . . . , wr ) be an independent list of vectors in N (B j+1 ) with hW i ∩ N (B j ) = {0}. Then BW = (Bw1 , . . . , Bwr ) is an independent list in N (B j ) with hBW i ∩ N (B j−1 ) = {0}.

Proof. Clearly BW ⊆ N (B j ). Suppose \sum_{i=1}^r c_i Bw_i + u_{j−1} = 0, with u_{j−1} ∈ N (B j−1 ). Then 0 = B j−1 \left( \sum_{i=1}^r c_i Bw_i + u_{j−1} \right) = \sum_{i=1}^r c_i B j w_i + 0. This says \sum_{i=1}^r c_i w_i ∈ N (B j ), so by hypothesis c1 = c2 = · · · = cr = 0, and hence also u_{j−1} = 0. It is now easy to see that the Lemma must be true.

Before proceeding to the next step, we introduce some new notation. Suppose V1 is a subspace of V . To say that the list (w1 , . . . , wr ) is independent in V \ V1 means first that it is independent, and second that if W is the span hw1 , . . . , wr i of the given list, then W ∩ V1 = {0}. To say that the list (w1 , . . . , wr ) is a basis of V \ V1 means that a basis of V1 adjoined to the list (w1 , . . . , wr ) is a basis of V . Also, hLi denotes the space spanned by the list L.

Step 2. Let Up be a basis of N (B p ) \ N (B p−1 ) = C n \ N (B p−1 ), since B p = 0. By Lemma 10.4.1 BUp is an independent list in N (B p−1 ) \ N (B p−2 ), with hBUp i ∩ N (B p−2 ) = {0}. Complete BUp to a basis (BUp , Up−1 ) of N (B p−1 ) \ N (B p−2 ). At this point we have that (BUp , Up−1 , Up ) is a basis of N (B p ) \ N (B p−2 ) = C n \ N (B p−2 ).
Step 3. (B 2 Up , BUp−1 ) is an independent list in N (B p−2 ) \ N (B p−3 ). Complete this to a basis (B 2 Up , BUp−1 , Up−2 ) of N (B p−2 ) \ N (B p−3 ). At this stage we have that (Up , BUp , Up−1 , B 2 Up , BUp−1 , Up−2 ) is a basis of N (B p ) \ N (B p−3 ). Step 4. Proceed in this way until a basis for the entire space V has been obtained. At that point we will have a situation described in the following array:
Independent set                                          basis for subspace
(Up )                                                    N (B p ) \ N (B p−1 ) = C n \ N (B p−1 )
(Up−1 , BUp )                                            N (B p−1 ) \ N (B p−2 )
(Up−2 , BUp−1 , B 2 Up )                                 N (B p−2 ) \ N (B p−3 )
(Up−3 , BUp−2 , B 2 Up−1 , B 3 Up )                      N (B p−3 ) \ N (B p−4 )            (10.11)
...
(U1 , BU2 , . . . , B p−2 Up−1 , B p−1 Up )              N (B p−(p−1) ) \ N (B 0 ) = N (B)

Choose x = x1 ∈ B p−j Up−j+1 and follow it back up to the top of that column: xp−j ∈ Up−j ; Bxp−j = xp−j−1 ; . . . ; Bx2 = x1 . So B p−j−1 xp−j = x1 . We now want to interpret what it means for
P −1 BP = H = \begin{pmatrix} H_1 & & \\ & \ddots & \\ & & H_s \end{pmatrix}, with H_i = \begin{pmatrix} 0 & 1 & & \\ & 0 & 1 & \\ & & \ddots & \ddots \\ & & & 0 & 1 \\ & & & & 0 \end{pmatrix}.
This last matrix is supposed to have 1’s along the superdiagonal and 0’s elsewhere. This says that the jth column of BP (which is B times the jth column of P ) equals the jth column of P H, which is either the zero column or the (j − 1)st column of P . So we need B(jth column of P ) = 0 or B(jth column of P ) = the (j − 1)st column of P .
So to form P , as the columns of P we take the vectors in the Array 10.11 starting at the bottom of a column, moving up to the top, then from the bottom to the top of the next column to the left, etc. Each column of Array 10.11 represents one block Hi , and the bottom row consists of a basis for the null space of B. Then we do have

P −1 BP = H = \begin{pmatrix} H_1 & & \\ & \ddots & \\ & & H_s \end{pmatrix}.

Suppose we have a matrix A = λI + B where B is nilpotent with all elements on or below the diagonal equal to 0. Construct P so that P −1 BP = H, i.e., P −1 AP = P −1 (B + λI)P = λI + H is a direct sum of elementary Jordan blocks each with the same eigenvalue λ along the diagonal and 1’s just above the diagonal. Such a matrix is said to be in Jordan form.

Start with a general n × n matrix A over C. Here is the general algorithm for finding a Jordan form for A.

1. Find P1 so that P1−1 AP1 = T is in upper triangular Schur form with equal eigenvalues grouped together along the diagonal, so

T = \begin{pmatrix} λ_1 & & & & & ∗ \\ & \ddots & & & & \\ & & λ_1 & & & \\ & & & λ_2 & & \\ & & & & \ddots & \\ 0 & & & & & λ_r \end{pmatrix}.

2. Find P2 so that

P2−1 T P2 = \begin{pmatrix} T_{λ_1} & & & 0 \\ & T_{λ_2} & & \\ & & \ddots & \\ 0 & & & T_{λ_r} \end{pmatrix} = C.
3. Find P3 so that P3−1 CP3 = D + H where (P1 P2 P3 )−1 · A · (P1 P2 P3 ) = D + H is in Jordan form.

An n × n elementary Jordan block is a matrix of the form Jn (λ) = λI + Nn , where λ ∈ C, and Nn is the n × n nilpotent matrix with 1’s along the superdiagonal and 0’s elsewhere. The minimal and characteristic polynomials of Jn (λ) are both equal to (x − λ)n , and Jn has a 1-dimensional space of eigenvectors spanned by (1, 0, . . . , 0)T and an n-dimensional space of generalized eigenvectors, all belonging to the eigenvalue λ. If A has a Jordan form consisting of a direct sum of s elementary Jordan blocks, the characteristic polynomials of these blocks are the elementary divisors of A. The characteristic polynomial of A is the product of its elementary divisors. If λ1 , . . . , λs are the distinct eigenvalues of A, and if for each i, 1 ≤ i ≤ s, (x − λi )^{m_i} is the largest elementary divisor involving the eigenvalue λi , then \prod_{i=1}^s (x − λi )^{m_i} is the minimal polynomial for A. It is clear that (x − λ) divides the minimal polynomial of A if and only if it divides the characteristic polynomial of A if and only if λ is an eigenvalue of A.

At this point we consider an example in great detail. In order to minimize the tedium of working through many computations, we start with a block upper triangular matrix N with square nilpotent matrices along the main diagonal, so it is clear from the start that N is nilpotent. Let N = (Pij ) where the blocks Pij are given by
P11 = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix},   P22 = P33 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix},   P44 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},   P55 = (0).
These diagonal blocks determine the sizes of each of the other blocks, and all blocks Pij with i > j are zero blocks. For the blocks above the diagonal
we have the following.
P12
P14
P23
P25 = P35 P45
−1 1 0 −1 1 0 = −1 1 0 ; P13 0 1 0 −1 1 −1 1 = −1 1 ; P15 = −1 1 −1 1 0 = −1 1 0 ; P24 0 1 0 −1 = −1 ; P34 = −1 −1 = . 0
−1 −1 = −1 −1 −1 −1 −1 ; −1 −1 −1 = −1 −1 1 −1 1 ; 0 1
1 1 1 1
0 0 ; 0 0
1 1 ; 1
The reader might want to write out the matrix N as a 13 × 13 matrix to visualize it better, but for computing N 2 using block multiplication the above blocks suffice. We give N 2 as an aid to computing its null space. 0 0 1 0 −1 0 1 −1 0 1 −1 0 0 0 0 0 1 −1 0 1 −1 0 1 −1 0 0 0 0 0 0 0 0 1 −1 0 1 −1 0 0 0 0 0 0 0 0 1 −1 0 1 −1 0 0 0 0 0 0 0 0 1 −1 0 1 −1 0 0 0 0 0 0 0 0 0 0 0 1 −1 0 0 N2 = 0 0 0 0 0 0 0 0 0 1 −1 0 0 0 0 0 0 0 0 0 0 0 1 −1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Another routine computation shows that N 3 has first row equal to (0 0 0 1 − 1 0 0 0 0 0 0 0 0)
and all other entries are zeros. Now let x = (x1 , x2 , . . . , x13 )T and write out the conditions for x to be in the null space of N j for j = 1, 2, 3, 4. The null space of N 4 is R13 . So N 4 has rank 0 and nullity 13. x is in the null space of N 3 if and only if x4 = x5 . N 3 has rank 1 and nullity 12. x is in the null space of N 2 if and only if x3 = x4 = x5 , x7 = x8 and x10 = x11 . N 2 has rank 4 and nullity 9. x is in the null space of N if and only if x2 = x3 = x4 = x5 , x6 = x7 = x8 , x9 = x10 = x11 and x12 = x13 . N has rank 8 and nullity 5. Now let S = (e1 , e2 , . . . , e13 ) be the standard basis of R13 . U4 is to be a basis of null(N 4 ) \ null(N 3 ) and must have one element since the nullity of N 4 is one greater than the nullity of N 3 . Put U4 = {e1 + e2 + e3 + e4 }. It is easy to see that this vector is not in the null space of N 3 . For the next step first compute N (e1 + e2 + e3 + e4 ) = e1 + e2 + e3 . This vector is in null(N 3 ) \ null(N 2 ). Since the nullity of N 3 is three more than the nullity of N 2 , we need two more vectors in N 3 \ N 2 . We propose e1 + e2 + · · · + e7 and e1 + · · · + e10 . Then we would have {e1 + · · · + e10 , e1 + · · · + e7 , e1 + e2 + e3 } as a basis of null(N 3 ) \ null(N 2 ). This is fairly easy to check. The next step is to compute the image of each of these three vectors under N . We get the three vectors {e1 + · · · + e9 , e1 + · · · + e6 , e1 + e2 } . Since the nullity of N 2 is four more than the nullity of N , we need one additional vector to have a basis of null(N 2 ) \ null(N ). We propose e1 + · · · + e12 . Then it is routine to show that {e1 + · · · + e12 , e1 + · · · e9 , e1 + · · · + e6 , e1 + e2 } is a basis of null(N 2 ) \ null(N ). We next compute the image of each of these four vectors under N and get {e1 + · · · + e11 , e1 + · · · + e8 , e1 + · · · + e5 , e1 }
in the null space of N . Since the nullity of N is 5, we need one more vector. We propose e1 + · · · + e13 . So then we have {e1 + · · · + e13 , e1 + · · · + e11 , e1 + · · · + e8 , e1 + · · · + e5 , e1 } as a basis for the null space of N . At this point we are ready to form the chains of generalized eigenvectors into a complete Jordan basis. The first chain is:
v1 = e1 , v2 = e1 + e2 , v3 = e1 + e2 + e3 , v4 = e1 + e2 + e3 + e4 . The second chain is v5 = e1 + · · · + e5 , v6 = e1 + · · · + e6 , v7 = e1 + · · · + e7 . The third chain is v8 = e1 + · · · + e8 , v9 = e1 + · · · + e9 , v10 = e1 + · · · + e10 . The fourth chain is v11 = e1 + · · · + e11 , v12 = e1 + · · · + e12 . The fifth chain is v13 = e1 + · · · + e13 . With the basis B = (v1 , v2 , . . . , v13 ) forming the columns of a matrix P , we have that P −1 N P is in Jordan form with elementary Jordan blocks of sizes 4, 3, 3, 2 and 1.
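As an aside, the chain construction can be checked mechanically. The sketch below is a minimal Python/numpy illustration that uses a simpler made-up 3 × 3 nilpotent matrix (an arbitrary assumption, not the 13 × 13 example above): it builds one chain and verifies that P −1 N P is a single elementary Jordan block.

import numpy as np

# Made-up nilpotent matrix with one chain of length 3 (N^3 = 0, N^2 != 0).
N = np.array([[0., 1., 1.],
              [0., 0., 1.],
              [0., 0., 0.]])

v3 = np.array([0., 0., 1.])      # a vector in null(N^3) \ null(N^2)
v2 = N @ v3                      # chain: N v3 = v2, N v2 = v1, N v1 = 0
v1 = N @ v2
P = np.column_stack([v1, v2, v3])
print(np.round(np.linalg.inv(P) @ N @ P, 10))   # the 3 x 3 elementary Jordan block for eigenvalue 0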
10.5 An Alternative Approach to the Jordan Form
Let F be an algebraically closed field. Suppose A is n × n over F with characteristic polynomial f (λ) = |λI − A| = (λ − λ1 )m1 (λ − λ2 )m2 · · · (λ − λq )mq , where λi 6= λj if i 6= j, and each mi ≥ 1. Let TA ∈ L(F n ) be defined as usual by TA : F n → F n : x 7→ Ax. We have just seen that there is a basis B of F n for which [TA ]B is in Jordan form. This was accomplished
by finding a basis B which is the union of bases, each corresponding to an elementary Jordan block. If Bi corresponds to the k × k elementary Jordan block J = λi I + Nk , then Bi = (v1 , v2 , . . . , vk ) where Avj = λi vj + vj−1 for 2 ≤ j ≤ k, and Av1 = λi v1 . We call (v1 , . . . , vk ) a k-chain of generalized eigenvectors for the eigenvalue λi . If P is the n × n matrix whose columns are the vectors in B arranged into chains, with all the chains associated with a given eigenvalue being grouped together, then P −1 AP = J1 ⊕ J2 ⊕ · · · ⊕ Jq , where Ji is the direct sum of the elementary Jordan blocks associated with λi . It is easy to see that P −1 AP = D + N , where D is a diagonal matrix whose diagonal entries are λ1 , . . . , λq , with each λi repeated mi times, and N is nilpotent. For an elementary block λI + N it is clear that λI · N = N · λI = λN , from which it follows that also DN = N D. So

Lemma 10.5.1. A = P DP −1 + P N P −1 , where P DP −1 is diagonalizable, P N P −1 is nilpotent and P DP −1 commutes with P N P −1 .

Write the partial fraction decomposition of 1/f (λ) as

\frac{1}{f (λ)} = \frac{a_1 (λ)}{(λ − λ_1 )^{m_1}} + · · · + \frac{a_q (λ)}{(λ − λ_q )^{m_q}} ,   (10.12)

where ai (λ) is a polynomial in λ with degree at most mi − 1. Put

b_i (λ) = \frac{f (λ)}{(λ − λ_i )^{m_i}} = \prod_{j ≠ i} (λ − λ_j )^{m_j} .   (10.13)

Then put

P_i = a_i (A) · b_i (A) = a_i (A) · \prod_{j ≠ i} (A − λ_j I)^{m_j} .   (10.14)
Since f (λ) divides bi (λ)bj (λ) when i ≠ j, we have

P_i P_j = 0, if i ≠ j.   (10.15)

Since 1 = \sum_{i=1}^q a_i (λ)b_i (λ), we have

I = \sum_{i=1}^q a_i (A)b_i (A) = P_1 + · · · + P_q .   (10.16)
Then Pi = Pi I = Pi (P1 + · · · + Pq ) = Pi^2 , so we can combine these last two results as

P_i P_j = δ_{ij} P_i .   (10.17)

Then using Pi^{m_i} = Pi we have

[P_i (A − λ_i I)]^{m_i} = P_i (A − λ_i I)^{m_i} = a_i (A)b_i (A)(A − λ_i I)^{m_i} = a_i (A)f (A) = 0.   (10.18)

Hence if we put Ni = Pi (A − λi I), then

N_i N_j = 0 if i ≠ j, and N_i^{m_i} = 0.   (10.19)
Note that Pi and Ni are both polynomials in A. Start with I = P1 + · · · + Pq and multiply by A on the right:

A = P_1 A + P_2 A + · · · + P_q A
  = P_1 (λ_1 I + A − λ_1 I) + · · · + P_q (λ_q I + A − λ_q I)
  = \sum_{i=1}^q (λ_i P_i + P_i (A − λ_i I))
  = \sum_{i=1}^q λ_i P_i + \sum_{i=1}^q N_i = D + N,   (10.20)
where D = \sum_{i=1}^q λ_i P_i and N = \sum_{i=1}^q N_i . Clearly D and N are polynomials in A, so they commute with each other and with A, and N is nilpotent. Starting with the observation that D^2 = \left( \sum_{i=1}^q λ_i P_i \right)^2 = \sum_{i=1}^q λ_i^2 P_i , it is easy to see that for any polynomial g(λ) ∈ F [λ] we have

g(D) = \sum_{i=1}^q g(λ_i ) P_i .   (10.21)
It then follows that if g(λ) = (λ − λ1 )(λ − λ2 ) · · · (λ − λq ), then g(D) = 0. This says the minimal polynomial of D has no repeated roots, so that D is diagonalizable. Theorem 10.5.2. Let A be n×n over F and suppose the minimal polynomial for A factors into linear factors. Then there is a diagonalizable matrix D and a nilpotent matrix N such that (i) A = D + N ; and (ii) DN = N D. The diagonalizable matrix D and the nilpotent matrix N are uniquely determined by (i) and (ii) and each of them is a polynomial in A.
Proof. We have just observed that we can write A = D + N where D is diagonalizable and N is nilpotent, and where D and N not only commute but are polynomials in A. Suppose that we also have A = D′ + N ′ where D′ is diagonalizable and N ′ is nilpotent, and D′ N ′ = N ′ D′ . (Recall that this was the case in Lemma 10.5.1.) Since D′ and N ′ commute with one another and A = D′ + N ′ , they commute with A, and hence with any polynomial in A, especially with D and N . Thus D and D′ are commuting diagonalizable matrices, so by Theorem 7.4.4 they are simultaneously diagonalizable. This means there is some invertible matrix P with P −1 DP and P −1 D′ P both diagonal and, of course, P −1 N P and P −1 N ′ P are both nilpotent. From A = D + N = D′ + N ′ we have D − D′ = N ′ − N . Then P −1 (D − D′ )P is diagonal and equals P −1 (N ′ − N )P , which is nilpotent (Why?), implying P −1 (D − D′ )P is both diagonal and nilpotent, forcing P −1 (D − D′ )P = 0. Hence D = D′ and N = N ′ .
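As an aside, the projection formulas (10.12) through (10.20) can be traced numerically on a small made-up matrix. The sketch below is a minimal Python/numpy illustration; the matrix A, the conjugating matrix S, and the partial fraction coefficients (worked out by hand for f (x) = (x − 1)^2 (x − 3)) are all assumptions made for the example.

import numpy as np

# Made-up matrix with eigenvalue 1 (one Jordan block of size 2) and eigenvalue 3.
J = np.array([[1., 1., 0.],
              [0., 1., 0.],
              [0., 0., 3.]])
S = np.array([[1., 2., 0.],
              [0., 1., 1.],
              [1., 0., 1.]])
A = S @ J @ np.linalg.inv(S)
I = np.eye(3)

# 1/f = a1(x)/(x-1)^2 + a2/(x-3) with a1(x) = -(x+1)/4, a2 = 1/4; b1(x) = x-3, b2(x) = (x-1)^2.
P1 = (-(A + I) / 4) @ (A - 3 * I)
P2 = (I / 4) @ np.linalg.matrix_power(A - I, 2)
D = 1 * P1 + 3 * P2
N = A - D

print(np.allclose(P1 + P2, I), np.allclose(P1 @ P2, np.zeros((3, 3))))   # Eqs. 10.15 and 10.16
print(np.allclose(D @ N, N @ D))                                         # D and N commute
print(np.allclose(np.linalg.matrix_power(N, 2), np.zeros((3, 3))))       # N is nilpotent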
10.6 A “Jordan Form” for Real Matrices
Theorem 10.6.1. Let V be a real vector space and T ∈ L(V ). Then there is a basis B of V for which

[T ]B = \begin{pmatrix} A_1 & & ∗ \\ & \ddots & \\ 0 & & A_m \end{pmatrix},
where each Aj is a 1-by-1 matrix or a 2-by-2 matrix with no eigenvalues. Proof. The result is clearly true if n = dim(V ) = 1, so suppose n = 2. If T has an eigenvalue λ, let v1 be any nonzero eigenvector of T belonging to λ. Extend (v1 ) to a basis (v1 , v2 ) of V . With respect to this basis T has an upper triangular matrix of the form λ a . 0 b In particular, if T has an eigenvalue, then there is a basis of V with respect to which T has an upper triangular matrix. If T has no eigenvalues, then choose any basis (v1 , v2 ) of V . With respect to this basis, the matrix of T has
no eigenvalues. So we have the desired result when n = 2. Now suppose that n = dim(V ) > 2 and that the desired result holds for all real vector spaces with smaller dimension. If T has an eigenvalue, let U be a 1-dimensional subspace of V that is T -invariant. Otherwise, by Theorem 7.3.1 let U be a 2-dimensional T -invariant subspace of V . Choose any basis of U and let A1 denote the matrix of T |U with respect to this basis. If A1 is a 2-by-2 matrix then T has no eigenvalues, since otherwise we would have chosen U to be 1-dimensional. Hence T |U and A1 have no eigenvalues. Let W be any subspace of V for which V = U ⊕ W . We would like to apply the induction hypothesis to the subspace W . Unfortunately, W might not be T -invariant, so we have to be a little tricky. Define S ∈ L(W ) by S(w) = PW,U (T (w)) ∀w ∈ W. Note that T (w) = PU,W (T (w)) + PW,U (T (w)) = PU,W (T (w)) + S(w) ∀w ∈ W.
(10.22)
By our induction hypothesis, there is a basis for W with respect to which S has a block upper triangular matrix of the form

\begin{pmatrix} A_2 & & ∗ \\ & \ddots & \\ 0 & & A_m \end{pmatrix},

where each Aj is a 1-by-1 matrix or a 2-by-2 matrix with no eigenvalues. Adjoin this basis of W to the basis of U chosen above, getting a basis of V . The corresponding matrix of T is a block upper triangular matrix of the desired form.

Our definition of the characteristic polynomial of the 2 × 2 matrix A = \begin{pmatrix} a & c \\ b & d \end{pmatrix} is that it is

f (x) = (x − a)(x − d) − bc = x^2 − (trace(A))x + det(A).

The fact that this is a reasonable definition (the only reasonable definition if we want the Cayley-Hamilton theorem to be valid for such matrices) follows from the next result.
Theorem 10.6.2. Suppose V is a real vector space with dimension 2 and T ∈ L(V ) has no eigenvalues. Suppose A is the matrix of T with respect to some basis. Let p(x) ∈ R[x] be a polynomial of degree 2.
(a) If p(x) equals the characteristic polynomial of A, then p(T ) = 0.
(b) If p(x) does not equal the characteristic polynomial of A, then p(T ) is invertible.

Proof. Part (a) follows immediately from the Cayley-Hamilton theorem, but it is also easy to derive it independently of that result. For part (b), let q(x) denote the characteristic polynomial of A with p(x) ≠ q(x). Write p(x) = x^2 + α1 x + β1 and q(x) = x^2 + α2 x + β2 for some α1 , α2 , β1 , β2 ∈ R. Now p(T ) = p(T ) − q(T ) = (α1 − α2 )T + (β1 − β2 )I. If α1 = α2 , then β1 ≠ β2 , since otherwise p = q. In this case p(T ) is some nonzero multiple of the identity and hence is invertible. If α1 ≠ α2 , then

p(T ) = (α1 − α2 )\left( T − \frac{β_2 − β_1}{α_1 − α_2} I \right),

which is an invertible operator since T has no eigenvalues. Thus (b) holds.
Now suppose V is 1-dimensional and T ∈ L(V ). For λ ∈ R, null(T − λI) equals V if λ is an eigenvalue of T and {0} otherwise. If α, β ∈ R with α2 < 4β, so that x2 + αx + β = 0 has no real roots, then null(T 2 + αT + βI) = {0}. (Proof: Because V is 1-dimensional, there is a constant λ ∈ R such that T v = λv for all v ∈ V . (Why is this true?) So if v is a nonzero vector in V , then (T 2 + αT + βI)v = (λ2 + αλ + β)v. The only way this can be 0 is for v = 0 or (λ2 + αλ + β) = 0. But this second equality cannot hold because we are assuming that α2 < 4β. Hence null(T 2 + αT + βI) = {0}.) Now suppose that V is a 2-dimensional real vector space and that T ∈ L(V ) still has no eigenvalues. For λ ∈ R, null(T −λI) = {0} exactly because T has no eigenvalues. If α, β ∈ R with α2 < 4β, then null(T 2 + αT + βI) equals all of V if x2 +αx+β is the characteristic polynomial of T with respect to some (and hence to all) ordered bases of V , and equals {0} otherwise by
part (b) of Theorem 10.6.2. It is important to note that in this case the null space of T 2 + αT + βI is either {0} or the whole 2-dimensional space!
The goal of this section is to prove the following theorem.

Theorem 10.6.3. Suppose that V is a real vector space of dimension n and T ∈ L(V ). Suppose that with respect to some basis of V , the matrix of T has the form

A = \begin{pmatrix} A_1 & & ∗ \\ & \ddots & \\ 0 & & A_m \end{pmatrix},   (10.23)

where each Aj is a 1 × 1 matrix or a 2 × 2 matrix with no eigenvalues (as in Theorem 10.6.1).
(a) If λ ∈ R, then precisely dim(null((T − λI)n )) of the matrices A1 , . . . , Am equal the 1 × 1 matrix [λ].
(b) If α, β ∈ R satisfy α2 < 4β, then precisely dim(null((T 2 + αT + βI)n ))/2 of the matrices A1 , . . . , Am have characteristic polynomial equal to x2 + αx + β. Note that this implies that null((T 2 + αT + βI)n ) must have even dimension.

Proof. With a little care we can construct one proof that can be used to prove both (a) and (b). For this, let λ, α, β ∈ R with α2 < 4β. Define p(x) ∈ R[x] by

p(x) = x − λ, if we are trying to prove (a);
p(x) = x2 + αx + β, if we are trying to prove (b).

Let d denote the degree of p(x). Thus d = 1 or d = 2, depending on whether we are trying to prove (a) or (b). The basic idea of the proof is to proceed by induction on m, the number of blocks along the diagonal in Eq. 10.23. If m = 1, then dim(V ) = 1 or dim(V ) = 2. In this case the discussion preceding this theorem implies that the desired result holds. Our induction hypothesis is that for m > 1, the desired result holds when m is replaced with m − 1.
Let B be a basis of V with respect to which T has the block upper-triangular matrix of Eq. 10.23. Let Uj denote the span of the basis vectors corresponding to Aj . So dim(Uj ) = 1 if Aj is 1 × 1 and dim(Uj ) = 2 if Aj is a 2 × 2 matrix (with no eigenvalues). Let U = U1 + U2 + · · · + Um−1 = U1 ⊕ U2 ⊕ · · · ⊕ Um−1 . Clearly U is invariant under T and the matrix of T |U with respect to the basis B′ obtained from the basis vectors corresponding to A1 , . . . , Am−1 is

[T |U ]B′ = \begin{pmatrix} A_1 & & ∗ \\ & \ddots & \\ 0 & & A_{m−1} \end{pmatrix}.   (10.24)

Suppose that dim(U ) = n′ , so n′ is either n − 1 or n − 2. Also, dim(null(p(T |U )n′ )) = dim(null(p(T |U )n )). Hence our induction hypothesis implies that

Precisely (1/d) dim(null(p(T |U )n )) of the matrices A1 , . . . , Am−1 have characteristic polynomial p.
(10.25)
Let um be a vector in Um . T (um ) might not be in Um , since the entries of the matrix A in the columns above the matrix Am might not all be 0. But we can project T (um ) onto Um . So let S ∈ L(Um ) be the operator whose matrix with respect to the basis corresponding to Um is Am . It follows that S(um ) = PUm ,U T (um ). Since V = U ⊕Um , we know that for any vector v ∈ V we have v = PU,Um (v) + PUm ,U (v). Putting T (um ) in place of v, we have T (um ) = PU,Um T (um ) + PUm ,U T (um ) = ∗ u + S(um ),
(10.26)
where ∗ u denotes an unknown vector in U . (Each time the symbol ∗ u is used, it denotes some vector in U , but it might be a different vector each time the symbol is used.) Since S(um ) ∈ Um , so T (S(um )) = ∗ u + S 2 (um ), we can apply T to both sides of Eq. 10.26 to obtain T 2 (um ) = ∗ u + S 2 (um ).
(10.27)
(It is important to keep in mind that each time the symbol ∗ u is used, it probably means a different vector in U .)
Using Eqs. 10.26 and 10.27 it is easy to show that p(T )(um ) = ∗ u + p(S)(um ).
(10.28)
Note that p(S)(um ) ∈ Um . Thus iterating the last equation gives p(T )n (um ) = ∗ u + p(S)n (um ).
(10.29)
Since V = U ⊕ Um , for any v ∈ V , we can write v = ∗ u + um , with ∗ u ∈ U and um ∈ Um . Then using Eq. 10.29 and the fact that U is invariant under any polynomial in T , we have p(T )n (v) = ∗ u + p(S)n (um ),
(10.30)
where v = ∗ u + um as above. If v ∈ null(p(T )n ), we now see that 0 = ∗ u + p(S)n (um ), from which it follows that p(S)n (um ) = 0.
The proof now breaks into two cases: Case 1 is where p(x) is not the characteristic polynomial of Am ; Case 2 is where p(x) is the characteristic polynomial of Am .
So consider Case 1. Since p(x) is not the characteristic polynomial of Am , we see that p(S) must be invertible. This follows from Theorem 10.6.2 and the discussion immediately following it, since the dimension of Um is at most 2. Hence p(S)n (um ) = 0 implies um = 0. This says:

The null space of p(T )n is contained in U .
(10.31)
This says that null(p(T )n ) = null(p(T |U )n ). But now we can apply Eq. 10.25 to see that precisely (1/d) dim(null(p(T )n )) of the matrices A1 , . . . , Am−1 have characteristic polynomial p(x). But this means that precisely (1/d) dim(null(p(T )n )) of the matrices A1 , . . . , Am have characteristic polynomial p(x). This completes Case 1.
Now suppose that p(x) is the characteristic polynomial of Am . It is clear that dim(Um ) = d.

Lemma 10.6.4. We claim that dim(null(p(T )n )) = dim(null(p(T |U )n )) + d.
This along with the induction hypothesis Eq. 10.25 would complete the proof of the theorem. But we still have some work to do. Lemma 10.6.5. V = U + null(p(T )n ). Proof. Because the characteristic polynomial of the matrix Am of S equals p(x), we have p(S) = 0. So if um ∈ Um , from Eq. 10.26 we see that p(T )(um ) ∈ U . So p(T )n (um ) = p(T )n−1 (p(T )(um )) ∈ range(p(T |U )n−1 ) = range(p(T |U )n ), where the last identity follows from the fact that dim(U ) < n. Thus we can choose u ∈ U such that p(T )n (um ) = p(T )n (u). Then p(T )n (um − u) = p(T )n (um ) − p(T )n (u) = p(T |U )n (u) − p(T |U )n (u) = 0.
(10.32)
This says that um − u ∈ null(p(T )n ). Hence um , which equals u + (um − u), is in U + null(p(T )n ), implying Um ⊆ U + null(p(T )n ). Therefore V = U + Um ⊆ U + null(p(T )n ) ⊆ V . This proves Lemma 10.6.5.
Since dim(Um ) = d and dim(U ) = n − d, we have

dim(null(p(T )n )) = dim(U ∩ null(p(T )n )) + dim(U + null(p(T )n )) − dim(U )
                  = dim(null(p(T |U )n )) + dim(V ) − (n − d)   (10.33)
                  = dim(null(p(T |U )n )) + d.   (10.34)

This completes a proof of the claim in Lemma 10.6.4, and hence a proof of Theorem 10.6.3.
Suppose V is a real vector space and T ∈ L(V ). An ordered pair (α, β) of real numbers is called an eigenpair of T if α2 < 4β and T 2 + αT + βI is not injective. The previous theorem shows that T can have only finitely many eigenpairs, because each eigenpair corresponds to the characteristic
polynomial of a 2 × 2 matrix on the diagonal of A in Eq. 10.23, and there is room for only finitely many such matrices along that diagonal. We define the multiplicity of an eigenpair (α, β) of T to be

dim(null((T 2 + αT + βI)^{dim(V )} )) / 2 .

From Theorem 10.6.3 we see that the multiplicity of (α, β) equals the number of times that x2 + αx + β is the characteristic polynomial of a 2 × 2 matrix on the diagonal of A in Eq. 10.23.

Theorem 10.6.6. If V is a real vector space and T ∈ L(V ), then the sum of the multiplicities of all the eigenvalues of T plus the sum of twice the multiplicities of all the eigenpairs of T equals dim(V ).

Proof. There is a basis of V with respect to which the matrix of T is as in Theorem 10.6.3. The multiplicity of an eigenvalue λ equals the number of times the 1 × 1 matrix [λ] appears on the diagonal of this matrix (from Theorem 10.6.3). The multiplicity of an eigenpair (α, β) equals the number of times x2 + αx + β is the characteristic polynomial of a 2 × 2 matrix on the diagonal of this matrix (from Theorem 10.6.3). Because the diagonal of this matrix has length dim(V ), the sum of the multiplicities of all the eigenvalues of T plus the sum of twice the multiplicities of all the eigenpairs of T must equal dim(V ).

Axler’s approach to the characteristic polynomial of a real matrix is to define it for matrices of sizes 1 and 2, and then define the characteristic polynomial of a real matrix A as follows. First find the matrix J = P −1 AP that is the Jordan form of A. Then the characteristic polynomial of A is the product of the characteristic polynomials of the 1 × 1 and 2 × 2 matrices along the diagonal of J. Then he gives a fairly involved proof (page 207) that the Cayley-Hamilton theorem holds, i.e., the characteristic polynomial of a matrix A has A as a zero. So the minimal polynomial of A divides the characteristic polynomial of A.

Theorem 10.6.7. Suppose V is a real, n-dimensional vector space and T ∈ L(V ). Let λ1 , . . . , λm be the distinct eigenvalues of T , with U1 , . . . , Um the corresponding spaces of generalized eigenvectors. So Uj = null(T − λj I)n . Let (α1 , β1 ), . . . , (αr , βr ) be the distinct eigenpairs of T . Let Vj = null((T 2 + αj T + βj I)n ), for 1 ≤ j ≤ r. Then
(a) V = U_1 ⊕ ··· ⊕ U_m ⊕ V_1 ⊕ ··· ⊕ V_r.
(b) Each U_i and each V_j is invariant under T.
(c) Each (T − λ_i I)|_{U_i} and each (T² + α_j T + β_j I)|_{V_j} is nilpotent.
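The count in Theorem 10.6.6 can be checked numerically in small cases. The following Python sketch (using numpy; the particular 3 × 3 matrix is a hypothetical example chosen only for illustration) verifies that the multiplicity of the one real eigenvalue plus twice the multiplicity of the one eigenpair equals dim(V).

import numpy as np

# A real operator on R^3 with eigenvalue 2 and eigenpair (alpha, beta) = (0, 1),
# since x^2 + 1 is the characteristic polynomial of the 2x2 rotation block.
A = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 2.0]])
n = A.shape[0]
I = np.eye(n)

def nullity(M):
    # dim null(M) = n - rank(M)
    return n - np.linalg.matrix_rank(M)

# multiplicity of the eigenvalue 2: dim null((A - 2I)^n)
mult_eigenvalue = nullity(np.linalg.matrix_power(A - 2 * I, n))

# multiplicity of the eigenpair (0, 1): dim null((A^2 + 0*A + 1*I)^n) / 2
alpha, beta = 0.0, 1.0
Q = A @ A + alpha * A + beta * I
mult_eigenpair = nullity(np.linalg.matrix_power(Q, n)) // 2

print(mult_eigenvalue, mult_eigenpair)            # expect 1 and 1
print(mult_eigenvalue + 2 * mult_eigenpair == n)  # expect True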
10.7 Exercises
1. Prove or disprove: Two n × n real matrices with the same characteristic polynomials and the same minimal polynomials must be similar.

2. Let A be an n × n idempotent matrix (i.e., A² = A) with real entries (or entries from ANY field). Prove that A must be diagonalizable.

3. Suppose A is a block upper-triangular matrix

        [ A_1        *  ]
    A = [     .         ]
        [ 0         A_m ]
where each A_j is a square matrix. Prove that the set of eigenvalues of A equals the union of the sets of eigenvalues of A_1, ..., A_m.

4. Suppose V is a real vector space and T ∈ L(V). Suppose α, β ∈ R are such that α² < 4β. Prove that null((T² + αT + βI)^k) has even dimension for every positive integer k.

5. Suppose V is a real vector space and T ∈ L(V). Suppose α, β ∈ R are such that α² < 4β and T² + αT + βI is nilpotent. Prove that dim(V) is even and (T² + αT + βI)^{dim(V)/2} = 0.

6. Let a and b be arbitrary complex numbers with b ≠ 0. Put

    A = [  a  b ]
        [ -b  a ].

Define a map T_A on M_2(C) by

    T_A : M_2(C) → M_2(C) : B ↦ AB − BA.
(i) Show that T_A ∈ L(M_2(C)), i.e., show that T_A is linear.
(ii) Let B be the ordered basis of M_2(C) given by B = {E_11, E_12, E_21, E_22}, where E_ij has a 1 in position (i, j) and zeros elsewhere. Compute the matrix [T_A]_B that represents T_A with respect to the basis B.
(iii) Compute the characteristic polynomial of the linear operator T_A and determine its eigenvalues.
(iv) Compute a basis of each eigenspace of T_A.
(v) Give a basis for the range of T_A.
(vi) Give the Jordan form of the matrix [T_A]_B and explain why your form is correct.
7. Define T ∈ L(C^3) by T : (x_1, x_2, x_3) ↦ (x_3, 0, 0).
(a) Find all eigenvalues and corresponding eigenvectors of T.
(b) Find all generalized eigenvectors of T.
(c) Compute the minimal polynomial for T.
(d) Give the Jordan form for T.

8. An n × n matrix A is called nilpotent if A^k = 0 for some positive integer k. Prove:
(a) A is nilpotent if and only if all its eigenvalues are 0.
(b) If A is a nonzero real nilpotent matrix, it cannot be symmetric.
(c) If A is a nonzero complex nilpotent matrix, it cannot be hermitian (i.e., self-adjoint).

9. Let

    A = [ 1 1 0 0 ]
        [ 0 0 1 0 ]
        [ 0 0 0 1 ]
        [ 0 1 0 0 ],

where A is to be considered as a matrix over C.
(a) Determine the minimal and characteristic polynomials of A and the Jordan form for A.
(b) Determine all generalized eigenvectors of A and a basis B of C^4 with respect to which the operator T_A : x ↦ Ax has Jordan form. Use this to write down a matrix P such that P^{-1}AP is in Jordan form.

10. Let A be an n × n symmetric real matrix with some power of A being the identity matrix. Show that either A = I or A = −I or A² = I but A is not a scalar times I.

11. Let V be the usual real vector space of all 3 × 3 real matrices. Define subsets of V by U = {A ∈ V : A^T = A} and W = {A ∈ V : A^T = −A}.
(a) (3 points) Show that both U and W are subspaces of V.
(b) (3 points) Show that V = U ⊕ W.
(c) (3 points) Determine the dimensions of U and W.
(d) (3 points) Let Tr : V → R : A ↦ trace(A). Put R = {A ∈ U : Tr(A) = 0}. Show that R is a subspace of U and compute a basis for R.
(e) (8 points) Put

    P = [ 0 1 0 ]
        [ 0 0 1 ]
        [ 1 0 0 ]

and define T : W → W : B ↦ PB + BP^T. Show that T ∈ L(W) and determine a basis B for W. Then compute the matrix [T]_B. What are the minimal and characteristic polynomials for T?

12. Let W be the space of 3 × 2 matrices over R. Put

    V_1 = { [ a 0 ; b 0 ; c 0 ] : a, b, c ∈ R },   V_2 = { [ 0 a ; 0 b ; 0 c ] : a, b, c ∈ R }.

So W is isomorphic to V_1 ⊕ V_2. Let A be the matrix

    A = [ 1 2  3 ]
        [ 2 0  4 ]
        [ 3 4 -1 ].
Define a map T ∈ L(W) by T : W → W : B ↦ AB. Compute the eigenvalues of T, the minimal polynomial for T, and the characteristic polynomial for T. Compute the Jordan form for T. (Hint: One eigenvector of the matrix A is (1, 1, 1)^T.)
13. Let V = R^5 and let T ∈ L(V) be defined by T(a, b, c, d, e) = (2a, 2b, 2c + d, a + 2d, b + 2e).
(a) Find the characteristic and minimal polynomials of T.
(b) Determine a basis of R^5 consisting of eigenvectors and generalized eigenvectors of T.
(c) Find the Jordan form of T (or of M(T)) with respect to your basis.

14. Suppose that S, T ∈ L(V). Suppose that T has dim(V) distinct eigenvalues and that S has the same eigenvectors as T (though not necessarily with the same eigenvalues). Prove that ST = TS.

15. Let k be a positive integer greater than 1 and let N be an n × n nilpotent matrix over C. Say N^m = 0 ≠ N^{m−1} for some positive integer m ≥ 2.
(a) Prove that I + N is invertible.
(b) Prove that there is some polynomial f(x) ∈ C[x] with degree at most m − 1 and constant term equal to 1 for which B = f(N) satisfies B^k = I + N.

16. Let A be any invertible n × n complex matrix, and let k be any positive integer greater than 1. Show that there is a matrix B for which B^k = A.

17. If A is not invertible the situation is much more complicated. Show that if the minimal polynomial for A has 0 as a root with multiplicity 1, then A has a k-th root. In particular if A is normal, then it has a k-th root.

18. If A is nilpotent a wide variety of possibilities can occur concerning k-th roots.
(a) Show that there is an A ∈ M_10(C) for which A^5 = 0 but A^4 ≠ 0.
(b) Show that any A as in Part (a) cannot have a cube root or any k-th root for k ≥ 3.

19. Let m, k be integers greater than 1, and let J be an m × m Jordan block with 1's along the superdiagonal and zeros elsewhere. Show that there is no m × m complex matrix B for which B^k = J.
Chapter 11 Matrix Functions∗

11.1 Operator Norms and Matrix Norms∗
Let V and W be vector spaces over F having respective norms ‖·‖_V and ‖·‖_W. Let T ∈ L(V, W). One common problem is to understand the "size" of the linear map T in the sense of its effects on the magnitude of the inputs. For each nonzero v ∈ V, the quotient ‖T(v)‖_W / ‖v‖_V measures the magnification caused by the transformation T on that specific vector v. An upper bound on this quotient valid for all v would thus measure the overall effect of T on the size of vectors in V. It is well known that in every finite dimensional space V, the quotient ‖T(v)‖_W / ‖v‖_V has a maximum value which is achieved with some specific vector v_0. Note that if v is replaced by cv for some nonzero c ∈ F, the quotient is not changed. Hence if O = {x ∈ V : ‖x‖_V = 1}, then we may define a norm for T (called a transformation norm) by

‖T‖_{V,W} = max{ ‖T(v)‖_W / ‖v‖_V : v ≠ 0 } = max{ ‖T(v)‖_W : v ∈ O }.   (11.1)

(In an Advanced Calculus course a linear map T is shown to be continuous, and the continuous image of a compact set is compact.) In this definition we could replace "max" with "supremum" in order to guarantee that ‖T‖_{V,W} is always defined even when V and W are not finite dimensional. However, in this text we just deal with the three specific norms we have already defined on the finite dimensional vector spaces (See Section 8.1). (For more on matrix norms see the book MATRIX ANALYSIS by Horn and Johnson.) The norm of a linear map from V to W defined in this way is a vector norm
on the vector space L(V, W). (See Exercise 1.) We can actually say a bit more.

Theorem 11.1.1. Let U, V and W be vector spaces over F endowed with vector norms ‖·‖_U, ‖·‖_V, ‖·‖_W, respectively. Let S, T ∈ L(U, V) and L ∈ L(V, W), and suppose they each have norms defined by Eq. 11.1. Then:
(a) ‖T‖_{U,V} ≥ 0, and ‖T‖_{U,V} = 0 if and only if T(u) = 0 for all u ∈ U.
(b) ‖aT‖_{U,V} = |a| ‖T‖_{U,V} for all scalars a.
(c) ‖S + T‖_{U,V} ≤ ‖S‖_{U,V} + ‖T‖_{U,V}.
(d) ‖T(u)‖_V ≤ ‖T‖_{U,V} · ‖u‖_U for all u ∈ U.
(e) ‖I‖_{U,U} = 1, where I ∈ L(U) is defined by I(u) = u for all u ∈ U.
(f) ‖L ∘ T‖_{U,W} ≤ ‖L‖_{V,W} · ‖T‖_{U,V}.
(g) If U = V, then ‖T^i‖_{U,U} ≤ (‖T‖_{U,U})^i.

Proof. Parts (a), (d) and (e) follow immediately from the definition of the norm of a linear map. Part (b) follows easily since |a| can be factored out of the appropriate maximum (or supremum). For part (c), from the triangle inequality for vector norms, we have

‖(S + T)(u)‖_V = ‖S(u) + T(u)‖_V ≤ ‖S(u)‖_V + ‖T(u)‖_V ≤ (‖S‖_{U,V} + ‖T‖_{U,V}) ‖u‖_U,

from part (d). This says the quotient used to define the norm of S + T is bounded above by ‖S‖_{U,V} + ‖T‖_{U,V}, so the least upper bound is less than or equal to this. Part (f) follows in a similar manner, and then (g) follows by repeated applications of part (f).

Let A be an m × n matrix over F (with F a subfield of C). As usual, we may consider the linear map T_A : F^n → F^m : x ↦ Ax. We consider the norm on L(F^n, F^m) induced by each of the standard norms ‖·‖_1, ‖·‖_2, ‖·‖_∞ (see Section 8.1), and use it to define a corresponding norm on the vector space of m × n matrices. It seems natural to let the transformation norm of T_A also be a "norm" of A. Specifically, we have

‖A‖_1 = max{ ‖Ax‖_1 / ‖x‖_1 : x ≠ 0 },
‖A‖_2 = max{ ‖Ax‖_2 / ‖x‖_2 : x ≠ 0 },
‖A‖_∞ = max{ ‖Ax‖_∞ / ‖x‖_∞ : x ≠ 0 }.
Theorem 11.1.2. Let A be m × n over the field F. Then:
(a) ‖A‖_1 = max{ Σ_{i=1}^{m} |A_ij| : 1 ≤ j ≤ n } (maximum absolute column sum).
(b) ‖A‖_∞ = max{ Σ_{j=1}^{n} |A_ij| : 1 ≤ i ≤ m } (maximum absolute row sum).
(c) ‖A‖_2 = maximum singular value of A.

Proof. To prove (a), observe

‖Ax‖_1 = Σ_{i=1}^{m} | Σ_{j=1}^{n} A_ij x_j | ≤ Σ_{i=1}^{m} Σ_{j=1}^{n} |A_ij| |x_j|
       = Σ_{j=1}^{n} ( Σ_{i=1}^{m} |A_ij| ) |x_j| ≤ Σ_{j=1}^{n} ( max_j Σ_{i} |A_ij| ) |x_j|
       = ( max_j Σ_{i=1}^{m} |A_ij| ) ‖x‖_1 = α ‖x‖_1.

So ‖A‖_1 ≤ α = max_j Σ_{i=1}^{m} |A_ij|. To complete the proof of (a) we need to construct a vector x ∈ F^n such that ‖Ax‖_1 = α ‖x‖_1. Put x = (0, 0, ..., 1, ..., 0)^T where the single nonzero entry 1 is in position j_0, and the maximum value of Σ_{i=1}^{m} |A_ij| occurs when j = j_0. Then ‖x‖_1 = 1, and ‖Ax‖_1 = Σ_{i=1}^{m} |A_{i j_0}| = α = α ‖x‖_1 as desired.

We now consider part (b). Define α by

α = max{ Σ_{j=1}^{n} |A_ij| : 1 ≤ i ≤ m }.

For an arbitrary v = (v_1, ..., v_n)^T ∈ F^n suppose that the maximum defining α above is attained when i = i_0. Then we have

‖T_A(v)‖_∞ = ‖Av‖_∞ = max_i |(Av)_i| = max_i | Σ_{j=1}^{n} A_ij v_j |
           ≤ max_i Σ_{j=1}^{n} ( |A_ij| |v_j| ) ≤ max_i Σ_{j=1}^{n} ( |A_ij| max_k |v_k| )
           ≤ α ‖v‖_∞,
so ‖T_A(v)‖_∞ / ‖v‖_∞ ≤ α for all nonzero v. This says that this norm of T_A is at most α. To show that it equals α we must find a vector v such that the usual quotient equals α. For each j = 1, ..., n, choose v_j with absolute value 1 so that A_{i_0 j} v_j = |A_{i_0 j}|. Clearly ‖v‖_∞ = 1, so that ‖T_A(v)‖_∞ ≤ α, which we knew would have to be the case anyway. On the other hand,

|(T_A(v))_{i_0}| = | Σ_{j=1}^{n} A_{i_0 j} v_j | = Σ_{j=1}^{n} |A_{i_0 j}| = α.

This shows that ‖T_A(v)‖_∞ = α for this particular v, and hence this norm of T_A is α.

Part (c) is a bit more involved. First note that ‖v‖ as used in earlier chapters is just ‖v‖_2. Suppose that S is unitary and m × m, and that A is m × n. Then ‖(SA)(v)‖_2 = ‖S(A(v))‖_2 = ‖A(v)‖_2 for all v ∈ F^n. It follows immediately that ‖SA‖_2 = ‖A‖_2. Now let A be an arbitrary m × n matrix with singular value decomposition A = UΣV*. So ‖A‖_2 = ‖UΣV*‖_2 = ‖ΣV*‖_2. Since ‖·‖_2 is a matrix norm coming from a transformation norm, by part (f) of Theorem 11.1.1, ‖ΣV*‖_2 ≤ ‖Σ‖_2 ‖V*‖_2 = ‖Σ‖_2. If Σ = diag(σ_1, σ_2, ..., σ_k, 0, ..., 0), with σ_1 ≥ σ_2 ≥ ··· ≥ σ_k > 0, then

‖Σx‖_2 = ‖(σ_1 x_1, ..., σ_k x_k, 0, ..., 0)^T‖_2 = sqrt(σ_1² x_1² + ··· + σ_k² x_k²) ≤ σ_1 · ‖x‖_2.

So ‖Σ‖_2 ≤ σ_1. Suppose x = (1, 0, ..., 0)^T. Then Σx = (σ_1 x_1, 0, ..., 0)^T. So ‖x‖_2 = 1, and ‖Σx‖_2 = |σ_1 x_1| = σ_1 = σ_1 ‖x‖_2. This says ‖Σ‖_2 = σ_1. So we know that ‖ΣV*‖_2 ≤ σ_1. Let x be the first column of V, so V*x = (1, 0, ..., 0)^T. Then ‖ΣV*x‖_2 = σ_1, showing that ‖ΣV*‖_2 = ‖Σ‖_2 = σ_1, so that finally we see ‖A‖_2 = σ_1.
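Theorem 11.1.2 is easy to test numerically. The following Python sketch (numpy; the particular matrix is an arbitrary illustrative choice) compares the column-sum, row-sum, and largest-singular-value formulas with numpy's built-in induced norms.

import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

col_sum = np.abs(A).sum(axis=0).max()            # maximum absolute column sum
row_sum = np.abs(A).sum(axis=1).max()            # maximum absolute row sum
sigma1  = np.linalg.svd(A, compute_uv=False)[0]  # largest singular value

# numpy's norm with ord = 1, inf, 2 computes exactly these induced norms
print(np.isclose(col_sum, np.linalg.norm(A, 1)))       # ||A||_1
print(np.isclose(row_sum, np.linalg.norm(A, np.inf)))  # ||A||_inf
print(np.isclose(sigma1,  np.linalg.norm(A, 2)))       # ||A||_2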
11.2 Polynomials in an Elementary Jordan Matrix∗
Let N_n denote the n × n matrix with all entries equal to 0 except for those along the super diagonal just above the main diagonal:

          [ 0 1 0 ... 0 ]
          [ 0 0 1 ... 0 ]
    N_n = [ ...     ... ]          (11.2)
          [ 0 ... 0 0 1 ]
          [ 0 ... ... 0 0 ]
So (N_n)_{ij} = 1 if j = i + 1, and 0 otherwise. The following lemma is easily established.

Lemma 11.2.1. For 1 ≤ m ≤ n − 1, (N_n^m)_{ij} = 1 if j = i + m, and 0 otherwise. Also, N_n^0 = I and N_n^n = 0.

Then let J_n(λ) be the elementary n × n Jordan block with eigenvalue λ given by J_n(λ) = λI + N_n. So

             [ λ 1 0 ... 0 ]
             [ 0 λ 1 ... 0 ]
    J_n(λ) = [ ...     ... ]  = λI + N_n,
             [ 0 ... 0 λ 1 ]
             [ 0 ... ... 0 λ ]
where N_n is nilpotent with minimal polynomial x^n.

Theorem 11.2.2. In this Theorem (and until notified otherwise) write J in place of J_n(λ). For each positive integer m,

          [ λ^m  C(m,1)λ^{m-1}  C(m,2)λ^{m-2}  ...  C(m,n-1)λ^{m-n+1} ]
          [  0       λ^m        C(m,1)λ^{m-1}  ...  C(m,n-2)λ^{m-n+2} ]
    J^m = [ ...                                                  ...  ]
          [  0       ...             0                     λ^m        ]

that is,

    (J^m)_{ij} = C(m, j−i) λ^{m−j+i}   for all 1 ≤ i, j ≤ n,

where C(m, k) denotes the binomial coefficient (interpreted as 0 when k < 0 or k > m).

Proof. It is easy to see that the desired result holds for m = 1 by the definition of J, so suppose that it holds for a given m ≥ 1. Then

    (J^{m+1})_{ij} = (J · J^m)_{ij} = Σ_{k=1}^{n} J_{ik} (J^m)_{kj} = λ (J^m)_{ij} + 1 · (J^m)_{i+1,j}
                   = λ C(m, j−i) λ^{m−j+i} + C(m, j−i−1) λ^{m−j+i+1}
                   = [ C(m, j−i) + C(m, j−i−1) ] λ^{m+1−j+i} = C(m+1, j−i) λ^{m+1−j+i}.
Now suppose that f(x) = Σ_{m=0}^{k} a_m x^m, so that f(J) = a_0 I + a_1 J + ··· + a_k J^k. Recall (or prove by induction on s) that for s ≥ 0,

    f^{(s)}(x) = Σ_{m=0}^{k} a_m C(m, s) s! x^{m−s},   so   Σ_{m=0}^{k} a_m C(m, s) λ^{m−s} = f^{(s)}(λ) / s!.   (11.3)

So for s ≥ 0,

    (f(J))_{i,i+s} = Σ_{m=0}^{k} a_m C(m, s) λ^{m−s} = (1/s!) f^{(s)}(λ).

This says that

           [ f(λ)  f'(λ)/1!  f''(λ)/2!  ...  f^{(n-1)}(λ)/(n-1)! ]
           [  0      f(λ)     f'(λ)/1!  ...  f^{(n-2)}(λ)/(n-2)! ]
    f(J) = [  0       0         f(λ)    ...  f^{(n-3)}(λ)/(n-3)! ]          (11.4)
           [ ...                                        ...      ]
           [  0       0          0      ...         f(λ)         ]
Now suppose that J is a direct sum of elementary Jordan blocks: J = J_1 ⊕ J_2 ⊕ ··· ⊕ J_s. Then for the polynomial f(x) we have f(J) = f(J_1) ⊕ f(J_2) ⊕ ··· ⊕ f(J_s). Here f(J_1), ..., f(J_s) are polynomials in separate Jordan blocks, whose values are given by Eq. 11.4. This result may be applied in the computation of f(A) even when A is not originally in Jordan form. First determine a T for which J = T^{-1}AT has Jordan form. Then compute f(J) as above and use f(A) = f(TJT^{-1}) = T f(J) T^{-1}.
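For a polynomial f, Eq. 11.4 can be verified directly. The sketch below (Python/numpy; the block size, eigenvalue, and polynomial are arbitrary illustrative choices) builds f(J) from the derivative formula and compares it with evaluating f at the matrix J.

import numpy as np
from math import factorial

n, lam = 4, 3.0
J = lam * np.eye(n) + np.eye(n, k=1)   # elementary Jordan block J_n(lambda)

coeffs = np.array([1.0, 0.0, -2.0, 5.0])   # f(x) = x^3 - 2x + 5 (highest degree first)

def poly_at_matrix(c, M):
    # Horner's rule with matrix multiplication
    R = np.zeros_like(M)
    for a in c:
        R = R @ M + a * np.eye(M.shape[0])
    return R

direct = poly_at_matrix(coeffs, J)

# precompute derivative coefficient arrays f, f', f'', ...
derivs = [coeffs]
for _ in range(n - 1):
    derivs.append(np.polyder(derivs[-1]))

# Eq. 11.4: (f(J))_{i, i+s} = f^{(s)}(lambda) / s!
from_derivatives = np.zeros((n, n))
for s in range(n):
    from_derivatives += (np.polyval(derivs[s], lam) / factorial(s)) * np.eye(n, k=s)

print(np.allclose(direct, from_derivatives))   # expect True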
11.3 Scalar Functions of a Matrix∗
For certain functions f : F → F (satisfying requirements stipulated below) we can define a matrix function f : A 7→ f (A) so that if f is a polynomial, the
value f(A) agrees with the value given just above. Start with an arbitrary square matrix A over F and let λ_1, ..., λ_s be the distinct eigenvalues of A. Reduce A to Jordan form T^{-1}AT = J_1 ⊕ J_2 ⊕ ··· ⊕ J_t, where J_1, ..., J_t are elementary Jordan blocks. Consider the Jordan block

                       [ λ_i  1  0  ...  ]
    J_i = J_{n_i}(λ_i) = [ 0   λ_i  1 ... ]          (11.5)
                       [ ...        λ_i ]

which has (x − λ_i)^{n_i} as its minimal and characteristic polynomials. If the function f : F → F is defined in a neighborhood of the point λ_i and has derivatives f^{(1)}(λ_i), ..., f^{(n_i−1)}(λ_i), then define f(J_i) by

             [ f(λ_i)  f'(λ_i)/1!  f''(λ_i)/2!  ...  f^{(n_i-1)}(λ_i)/(n_i-1)! ]
             [   0       f(λ_i)     f'(λ_i)/1!  ...  f^{(n_i-2)}(λ_i)/(n_i-2)! ]
    f(J_i) = [   0         0          f(λ_i)    ...  f^{(n_i-3)}(λ_i)/(n_i-3)! ]          (11.6)
             [  ...                                              ...           ]
             [   0         0            0       ...           f(λ_i)           ]

This says that

    (f(J_i))_{r, r+j} = (1/j!) f^{(j)}(λ_i), for 0 ≤ j ≤ n_i − r.   (11.7)
If f is defined in a neighborhood of each of the eigenvalues λ_1, ..., λ_s and has (finite) derivatives of the proper orders in these neighborhoods, then also

    f(J) = f(J_1) ⊕ f(J_2) ⊕ ··· ⊕ f(J_t),   (11.8)

and

    f(A) = T f(J) T^{-1} = T ( f(J_1) ⊕ ··· ⊕ f(J_t) ) T^{-1}.   (11.9)

The matrix f(A) is called the value of the function f at the matrix A. We will show below that f(A) does not depend on the method of reducing A to Jordan form (i.e., it does not depend on the particular choice of T),
and thus f really defines a function on the n × n matrices A. This matrix function is called the correspondent of the numerical function f . Not all matrix functions have corresponding numerical functions. Those that do are called scalar functions. Here are some of the simplest properties of scalar functions. Theorem 11.3.1. It is clear that the definition of scalar function was chosen precisely so that part (a) below would be true. (a) If f (λ) is a polynomial in λ, then the value f (A) of the scalar functioin f coincides with the value of the polynomial f (λ) evaluated at λ = A. (b) Let A be a square matrix over F and suppose that f1 (λ) and f2 (λ) are two numerical functions for which the expressions f1 (A) and f2 (A) are meaningful. If f (λ) = f1 (λ) + f2 (λ), then f (A) is also meaningful and f (A) = f1 (A) + f2 (A). (c) With A, f1 and f2 as in the preceding part, if f (λ) = f1 (λ)f2 (λ), then f (A) is meaningful and f (A) = f1 (A)f2 (A). (d) Let A be a matrix with eigenvalues λ1 , . . . , λn , each appearing as often as its algebraic multiplicity as an eigenvalue of A. If f : F → F is a numerical function and f (A) is defined, then the eigenvalues of the matrix f (A) are f (λ1 ), . . . , f (λn ). Proof. The proofs of parts (b) and (c) are analogous so we just give the details for part (c). To compute the values f1 (A), f2 (A) and f (A) according to the definition, we must reduce A to Jordan form J and apply the formulas given in Eqs. 11.8 and 11.9. If we show that f (J) = f1 (J)f2 (J), then from Eq. 11.8 we immediately obtain f (A) = f1 (A)f2 (A). In fact, f (J) = f (J1 ) ⊕ · · · ⊕ f (Jt ), and f1 (J)f2 (J) = f1 (J)f2 (J) = f1 (J1 )f2 (J1 ) ⊕ · · · ⊕ f1 (Jt )f2 (Jt ), so that the proof is reduced to showing that f (Ji ) = f1 (Ji )f2 (Ji ), (i = 1, 2, . . . , t),
where J_i is an elementary Jordan block. Start with the values of f_1(J_i) and f_2(J_i) given in Eq. 11.6, and multiply them together to find that

    [f_1(J_i) f_2(J_i)]_{r, r+s} = Σ_{j=0}^{s} (f_1(J_i))_{r, r+j} (f_2(J_i))_{r+j, r+s}
        = Σ_{j=0}^{s} (1/j!) f_1^{(j)}(λ_i) · (1/(s−j)!) f_2^{(s−j)}(λ_i)
        = (1/s!) Σ_{j=0}^{s} ( s! / (j!(s−j)!) ) f_1^{(j)}(λ_i) f_2^{(s−j)}(λ_i) = (1/s!) (f_1 f_2)^{(s)}(λ_i),

where the last equality comes from Exercise 2, part (i). Thus f_1(J_i) f_2(J_i) = f(J_i), completing the proof of part (c) of the theorem.

For part (d), the eigenvalues of the matrices f(A) and T^{-1} f(A) T = f(T^{-1}AT) are equal, and therefore we may assume that A has Jordan form. Formulas in Eq. 11.5 and 11.6 show that in this case f(A) is upper triangular with f(λ_1), ..., f(λ_n) along the main diagonal. Since the diagonal elements of an upper triangular matrix are its eigenvalues, part (d) is proved.

Similarly, if f and g are functions such that f(g(A)) is defined, and if h(λ) = f(g(λ)), then h(A) = f(g(A)). To finish this section we consider two examples.

Example 11.3.2. Let f(λ) = λ^{-1}. This function is defined everywhere except at λ = 0, and has finite derivatives of all orders everywhere it is defined. Consequently, if the matrix A does not have zero as an eigenvalue, i.e., if A is nonsingular, then f(A) is meaningful. But λ · f(λ) = 1, hence A · f(A) = I, so f(A) = A^{-1}. Thus the matrix function A ↦ A^{-1} corresponds to the numerical function λ ↦ λ^{-1}, as one would hope.

Example 11.3.3. Let f(λ) = √λ. To remove the two-valuedness of √· it is sufficient to slit the complex plane from the origin along a ray not containing any eigenvalues of A, and to consider one branch of the radical. Then this function, for λ ≠ 0, has finite derivatives of all orders. It follows that the expression √A is meaningful for all nonsingular matrices A. Putting λ = A in the equation f(λ)f(λ) = λ,
we obtain
f (A)f (A) = A. This shows that each nonsingular matrix has a square root.
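Example 11.3.3 can be illustrated numerically. The sketch below (Python; note that scipy.linalg.sqrtm computes a matrix square root by a Schur-based method rather than by the Jordan-form construction used in this section, but for a nonsingular matrix the conclusion f(A)f(A) = A is the same; the matrix itself is an arbitrary illustrative choice).

import numpy as np
from scipy.linalg import sqrtm

A = np.array([[4.0, 1.0],
              [0.0, 9.0]])   # nonsingular, so a square root exists

R = sqrtm(A)                 # a matrix with R @ R = A
print(np.allclose(R @ R, A)) # expect True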
11.4 Scalar Functions as Polynomials∗
At this point we need to generalize the construction given in Lagrange interpolation (see Section 5.3).

Lemma 11.4.1. Let r_1, ..., r_s be distinct complex numbers, and for each i, 1 ≤ i ≤ s, let m_i be a nonnegative integer. Let (a_ij) be a table of arbitrary numbers, 1 ≤ i ≤ s, and for each fixed i, 0 ≤ j ≤ m_i. Then there exists a polynomial p(x) such that p^{(j)}(r_i) = a_ij, for 1 ≤ i ≤ s and for each fixed i, 0 ≤ j ≤ m_i.

Proof. It is convenient first to construct auxiliary polynomials p_i(x) such that p_i(x) and its derivatives to the m_i-th order assume the required values at the point r_i and are all zero at the other given points. Put

    φ_i(x) = b_{i0} + b_{i1}(x − r_i) + ··· + b_{i m_i}(x − r_i)^{m_i} = Σ_{j=0}^{m_i} b_ij (x − r_i)^j,

where the b_ij are complex numbers to be determined later. Note that φ_i^{(j)}(r_i) = j! b_ij. Set

    Φ_i(x) = (x − r_1)^{m_i+1} ··· (x − r_{i−1})^{m_i+1} (x − r_{i+1})^{m_i+1} ··· (x − r_s)^{m_i+1},

and

    p_i(x) = φ_i(x) Φ_i(x) = ( Σ_{j=0}^{m_i} b_ij (x − r_i)^j ) · Π_{1 ≤ j ≤ s, j ≠ i} (x − r_j)^{m_i+1}.

By the rule for differentiating a product (see Exercise 2),

    p_i^{(j)}(r_i) = Σ_{l=0}^{j} C(j, l) φ_i^{(l)}(r_i) Φ_i^{(j−l)}(r_i),

or

    a_ij = Σ_{l=0}^{j} C(j, l) l! b_il Φ_i^{(j−l)}(r_i).   (11.10)

Using Eq. 11.10 with j = 0 and the fact that Φ_i(r_i) ≠ 0 we find

    b_{i0} = a_{i0} / Φ_i(r_i), 1 ≤ i ≤ s.   (11.11)

For each i = 1, 2, ..., s, and for a given j with 0 ≤ j < m_i, once b_il is determined for all l with 0 ≤ l < j, we can solve Eq. 11.10 for

    b_ij = a_ij / ( j! Φ_i(r_i) ) − Σ_{l=0}^{j−1} b_il Φ_i^{(j−l)}(r_i) / ( (j−l)! Φ_i(r_i) ), 1 ≤ i ≤ s.   (11.12)
This determines p_i(x) so that p_i(x) and its derivatives up to the m_i-th derivative have all the required values at r_i and all equal zero at r_t for t ≠ i. It is now clear that the polynomial p(x) = p_1(x) + p_2(x) + ··· + p_s(x) satisfies all the requirements of the Lemma.

Consider a numerical function λ ↦ f(λ) and an n × n matrix A for which the value f(A) is defined. We show that there is a polynomial p(x) for which p(A) equals f(A). Let λ_1, ..., λ_s denote the distinct eigenvalues of the matrix A. Using only the proof of the lemma we can construct a polynomial p(x) which satisfies the conditions

    p(λ_i) = f(λ_i), p'(λ_i) = f'(λ_i), ..., p^{(n−1)}(λ_i) = f^{(n−1)}(λ_i),   (11.13)
where if some of the derivatives f (j) (ri ) are superfluous for the determination of f (A), then the corresponding numbers in Eq. 11.12 may be replaced by zeros. Since the values of p(x) and f (λ) (and their derivatives) coincide at the numbers λi , then f (A) = p(A). This completes a proof of the following theorem. Theorem 11.4.2. The values of all scalar functions in a matrix A can be expressed by polynomials in A. Caution: The value f (A) of a given scalar function f can be represented in the form of some polynomial p(A) in A. However, this polynomial, for a given function f will be different for different matrices A. For example, the
minimal polynomial p(x) of a nonsingular matrix A can be used to write A^{-1} as a polynomial in A, but for different matrices A different polynomials will occur giving A^{-1} as a polynomial in A. Also, considering the function f(λ) = √λ, we see that for every nonsingular matrix A there exists a polynomial p(x) for which p(A)p(A) = A.

VERY IMPORTANT: With the help of Theorem 11.4.2 we can now resolve the question left open just preceding Theorem 11.3.1 concerning whether f(A) was well-defined. If we know the function f(λ) and its derivatives at the points λ_1, ..., λ_s, we can construct the polynomial p(x) whose value p(A) does not depend on the reduction of the matrix A to Jordan form, and at the same time is equal to f(A). Consequently, the value f(A) defined in the preceding section using the reduction of A to Jordan form does not depend on the way this reduction is carried out.

Let f(λ) be a numerical function, and let A be a matrix for which f(A) is meaningful. By Theorem 11.4.2 we can find a polynomial p(x) for which p(A) = f(A). For a given function f(λ), the polynomial p(x) depends only on the elementary divisors of the matrix A. But the elementary divisors of A and its transpose A^T coincide, so p(A^T) = f(A^T). Since for a polynomial p(x) we have p(A^T) = p(A)^T, it must be that f(A^T) = f(A)^T for all scalar functions f(λ) for which f(A) is meaningful.
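Theorem 11.4.2 is easy to illustrate when A has distinct eigenvalues, since then ordinary Lagrange interpolation (with no derivative conditions) already suffices. The sketch below (Python; the matrix and the choice f = exp are illustrative assumptions) builds the interpolating polynomial p with p(λ_i) = e^{λ_i} and checks that p(A) agrees with scipy's matrix exponential.

import numpy as np
from scipy.linalg import expm

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])          # distinct eigenvalues 2 and 3
lams = np.linalg.eigvals(A)
I = np.eye(2)

# Lagrange interpolation of f(x) = e^x at the eigenvalues:
# p(x) = sum_i e^{lam_i} * prod_{j != i} (x - lam_j)/(lam_i - lam_j), evaluated at x = A
pA = np.zeros((2, 2), dtype=complex)
for i, li in enumerate(lams):
    term = np.exp(li) * I.astype(complex)
    for j, lj in enumerate(lams):
        if j != i:
            term = term @ (A - lj * I) / (li - lj)
    pA += term

print(np.allclose(pA, expm(A)))   # expect True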
11.5 Power Series∗
All matrices are n × n over F, as usual. Let

    G(x) = Σ_{k=0}^{∞} a_k x^k

be a power series with coefficients from F. Let G_N(x) = Σ_{k=0}^{N} a_k x^k be the N-th partial sum of G(x). For each A ∈ M_n(F) let G_N(A) be the element of M_n(F) obtained by substituting A in this polynomial. For each fixed i, j we obtain a sequence of real or complex numbers c^N_ij, N = 0, 1, 2, ..., by taking c^N_ij to be the (i, j) entry of the matrix G_N(A). The series

    G(A) = Σ_{k=0}^{∞} a_k A^k
is said to converge to the matrix C in M_n(F) if for each i, j ∈ {1, 2, ..., n} the sequence {c^N_ij}_{N=0}^{∞} converges to the (i, j) entry of C (in which case we write G(A) = C). We say G(A) converges if there is some C ∈ M_n(F) such that G(A) = C. For A = (a_ij) ∈ M_n(F) define

    ‖A‖ = Σ_{i,j=1}^{n} |a_ij|,

i.e., ‖A‖ is the sum of the absolute values of all the entries of A.

Lemma 11.5.1. For all A, B ∈ M_n(F) and all a ∈ F
(a) ‖A + B‖ ≤ ‖A‖ + ‖B‖;
(b) ‖AB‖ ≤ ‖A‖ · ‖B‖;
(c) ‖aA‖ = |a| · ‖A‖.

Proof. The proofs are rather easy. We just give a proof of (b).

    ‖AB‖ = Σ_{i,j} |(AB)_ij| = Σ_{i,j} | Σ_k A_ik B_kj | ≤ Σ_{i,j,k} |A_ik| · |B_kj| ≤ Σ_{i,j,k,r} |A_ik| · |B_rj| = ‖A‖ · ‖B‖.

Suppose that G(x) = Σ_{k=0}^{∞} a_k x^k has radius of convergence equal to R. Hence if ‖A‖ < R, then Σ_{k=0}^{∞} a_k ‖A‖^k converges absolutely, i.e., Σ_{k=0}^{∞} |a_k| · ‖A‖^k converges. But then |a_k (A^k)_ij| ≤ |a_k| ‖A^k‖ ≤ |a_k| · ‖A‖^k, implying that Σ_{k=0}^{∞} |a_k (A^k)_ij| converges, hence Σ_{k=0}^{∞} a_k (A^k)_ij converges. At this point we have shown the following:

Lemma 11.5.2. If ‖A‖ < R where R is the radius of convergence of G(x), then G(A) converges to a matrix C.

We now give an example of special interest. Let f(λ) be the numerical function

    f(λ) = exp(λ) = Σ_{k=0}^{∞} λ^k / k!.
Since this power series converges for all complex numbers λ, each matrix A ∈ M_n(F) satisfies the condition in Lemma 11.5.2, so exp(A) converges to a matrix C = e^A. Also, we know that f(λ) = exp(λ) has derivatives of all orders at each complex number. Hence f(A) is meaningful in the sense of the preceding section. We want to be sure that the matrix C to which the series Σ_{k=0}^{∞} (1/k!) A^k converges is the same as the value f(A) given in the preceding section. So let us start this section over.

A sequence of square matrices

    A_1, A_2, ..., A_m, A_{m+1}, ...,   (11.14)
all of the same order, is said to converge to the matrix A provided the elements of the matrices in a fixed row and column converge to the corresponding element of the matrix A. It is clear that if the sequences {A_m} and {B_m} converge to matrices A and B, respectively, then {A_m + B_m} and {A_m B_m} converge to A + B and AB, respectively. In particular, if T is a constant matrix, and the sequence {A_m} converges to A, then the sequence {T^{-1} A_m T} will converge to T^{-1}AT. Further, if

    A_m = A_m^{(1)} ⊕ ··· ⊕ A_m^{(s)}, (m = 1, 2, ...),

where the orders of the blocks do not depend on m, then {A_m} will converge to some limit if and only if each block {A_m^{(i)}} converges separately. The last remark permits a completely simple solution of the question of the convergence of a matrix power series. Let

    a_0 + a_1 x + a_2 x² + ··· + a_m x^m + ···   (11.15)

be a formal power series in an indeterminate x. The expression

    a_0 I + a_1 A + a_2 A² + ··· + a_m A^m + ···   (11.16)

is called the corresponding power series in the matrix A, and the polynomial f_n(A) = a_0 I + a_1 A + ··· + a_n A^n is the n-th partial sum of the series. The series in Eq. 11.16 is convergent provided the sequence {f_n(A)}_{n=1}^{∞} of partial sums has a limit. If this limit exists it is called the sum of the series. Reduce the matrix A to Jordan form:
T^{-1}AT = J = J_1 ⊕ ··· ⊕ J_t, where J_1, ..., J_t are elementary Jordan blocks. We have seen above that convergence of the sequence {f_n(A)} is equivalent to the convergence of the sequence {T^{-1} f_n(A) T}. But

    T^{-1} f_n(A) T = f_n(T^{-1}AT) = f_n(J) = f_n(J_1) ⊕ ··· ⊕ f_n(J_t),

and the question of the convergence of the series Eq. 11.16 is equivalent to the following: Under what conditions is this series convergent for the Jordan blocks J_1, ..., J_t? Consider one of these blocks, say J_i. Let it have the elementary divisor (x − λ_i)^{n_i}, i.e., it is an elementary Jordan block with minimal and characteristic polynomial equal to (x − λ_i)^{n_i}. By Eq. 11.4,

               [ f_n(λ_i)  f_n'(λ_i)/1!  f_n''(λ_i)/2!  ...  f_n^{(n_i-1)}(λ_i)/(n_i-1)! ]
               [    0        f_n(λ_i)     f_n'(λ_i)/1!  ...  f_n^{(n_i-2)}(λ_i)/(n_i-2)! ]
    f_n(J_i) = [    0           0           f_n(λ_i)    ...  f_n^{(n_i-3)}(λ_i)/(n_i-3)! ]          (11.17)
               [   ...                                                    ...            ]
               [    0           0              0        ...            f_n(λ_i)          ]

Consequently, {f_n(J_i)} converges if and only if the sequences {f_n^{(j)}(λ_i)}_{n=1}^{∞} for each j = 0, 1, ..., n_i − 1 converge, i.e., if and only if the series Eq. 11.15 converges at λ_i, and the series obtained by differentiating it term by term up to n_i − 1 times, inclusive, converge at λ_i. It is known from the theory of analytic functions that all these series are convergent if either λ_i lies inside the circle of convergence of Eq. 11.15 or λ_i lies on the circle of convergence and the (n_i − 1)st derivative of Eq. 11.15 converges at λ_i. Moreover, when λ_i lies inside the circle of convergence of Eq. 11.15, then the derivative of the series evaluated at λ_i gives the derivative of the original function evaluated at λ_i. Thus we have

Theorem 11.5.3. A matrix power series in A converges if and only if each eigenvalue λ_i of A either lies inside the circle of convergence of the corresponding power series f(λ) or lies on the circle of convergence, and at the same time the series of (n_i − 1)st derivatives of the terms of f(λ) converges at λ_i to the derivative of the function given by the original power series evaluated at λ_i, where n_i is the highest degree of an elementary divisor belonging
to λ_i (i.e., where n_i is the size of the largest elementary Jordan block with eigenvalue λ_i). Moreover, if each eigenvalue λ_i of A lies inside the circle of convergence of the power series f(λ), then the j-th derivative of the power series in A converges to the j-th derivative of f(λ) evaluated at A.

Now reconsider the exponential function mentioned above. We know that f(λ) = Σ_{k=0}^{∞} λ^k/k! converges, say to e^λ, for all complex numbers λ. Moreover, the function f(λ) = e^λ has derivatives of all orders at each λ ∈ C and the power series obtained from the original by differentiating term by term converges to the derivative of the function at each complex number λ. In fact, the derivative of the function is again the original function, and the power series obtained by differentiating the original power series term by term is again the original power series. Hence the power series Σ_{k=0}^{∞} (1/k!) A^k not only converges to some matrix C, it converges to the value exp(A) defined in the previous section using the Jordan form of A. It follows that for each complex n × n matrix A there is a polynomial p(x) such that exp(A) = p(A).

Let J_λ denote the n × n elementary Jordan block

          [ λ 1 0 ... 0 ]
          [ 0 λ 1 ... 0 ]
    J_λ = [ ...     ... ]  = λI + N_n.
          [ 0 ...  0 λ 1 ]
          [ 0 ... ... 0 λ ]

If f(λ) = e^λ, then we know

             [ e^λ  (1/1!)e^λ  (1/2!)e^λ  ...  (1/(n-1)!)e^λ ]
             [  0      e^λ      (1/1!)e^λ ...  (1/(n-2)!)e^λ ]
    f(J_λ) = [ ...                                    ...    ]
             [  0       0          0      ...       e^λ      ]

Now let t be any complex number. Let B_n(t) = B_t be the n × n matrix defined by

          [ 1  t/1!  t²/2!  ...  t^{n-1}/(n-1)! ]
          [ 0   1     t/1!  ...  t^{n-2}/(n-2)! ]
    B_t = [ 0   0      1    ...  t^{n-3}/(n-3)! ]
          [ ...                          ...    ]
          [ 0  ...    ...   ...          1      ]
So (B_t)_{ij} = 0 if j < i and (B_t)_{ij} = t^{j−i}/(j−i)! if i ≤ j. With f(λ) = e^λ put g(λ) = f(tλ) = e^{tλ}. A simple induction shows that g^{(j)}(λ) = t^j g(λ). Then

    g(J_λ) = e^{t J_λ} = e^{tλ} B_t.
In particular, e^{J_λ} = e^λ B_1. We can even determine the polynomial p(x) for which p(J_λ) = e^{J_λ}. Put

    p_n(x) = Σ_{j=0}^{n−1} (x − λ)^j / j!.

It follows easily that

    p_n^{(j)}(λ) = 1 for 0 ≤ j ≤ n − 1, and p_n^{(n)}(x) = 0.

From Eq. 11.6 we see that p_n(J_n(λ)) = B_n(1), so e^λ p_n(J_n(λ)) = e^λ B_n(1) = e^{J_λ}.

We next compute B_t B_s, for arbitrary complex numbers t and s. Clearly the product is upper triangular. Then for i ≤ j we have

    (B_t · B_s)_{ij} = Σ_{k=i}^{j} (B_t)_{ik} · (B_s)_{kj} = Σ_{k=i}^{j} ( t^{k−i}/(k−i)! ) · ( s^{j−k}/(j−k)! )
                    = (1/(j−i)!) Σ_{k=i}^{j} C(j−i, j−k) t^{(j−i)−(j−k)} s^{j−k} = (1/(j−i)!) (t + s)^{j−i} = (B_{t+s})_{ij}.

Hence B_t · B_s = B_{t+s} = B_s · B_t. It now follows that

    e^{t J_λ} · e^{s J_λ} = e^{tλ} B_t · e^{sλ} B_s = e^{(t+s)λ} B_{t+s} = e^{(t+s) J_λ} = e^{s J_λ} · e^{t J_λ}.

Fix the complex number s and put T_0 = diag(s^{n−1}, s^{n−2}, ..., s, 1). It is now easy to verify that

    T_0^{-1} · s J_λ · T_0 = J_{sλ}.   (11.18)
Hence J_{sλ} is the Jordan form of s J_λ. Also, (T_0 A T_0^{-1})_{ij} = s^{j−i} A_{ij}. From this it follows easily that T_0 B_1 T_0^{-1} = B_s. It now follows that

    e^{s J_λ} = f(s J_λ) = f(T_0 · J_{sλ} · T_0^{-1}) = T_0 · f(J_{sλ}) · T_0^{-1} = T_0 · e^{J_{sλ}} · T_0^{-1} = T_0 (e^{sλ} B_1) T_0^{-1} = e^{sλ} B_s,

which agrees with an earlier equation.

Suppose A is a general n × n matrix with Jordan form T^{-1}AT = J_{λ_1} ⊕ ··· ⊕ J_{λ_t}, J_{λ_i} ∈ M_{n_i}(F). Let T_0 be the direct sum of the matrices diag(s^{n_i−1}, s^{n_i−2}, ..., s, 1). Then T_0^{-1} T^{-1} sA T T_0 = J_{sλ_1} ⊕ ··· ⊕ J_{sλ_t}, so

    e^{sA} = T ( e^{sλ_1} B_{n_1}(s) ⊕ ··· ⊕ e^{sλ_t} B_{n_t}(s) ) T^{-1}.

Several properties of the exponential function are now easy corollaries.

Corollary 11.5.4. Using the above descriptions of A and e^{sA}, etc., we obtain:
(a) A e^A = e^A A for all square A.
(b) e^{sA} e^{tA} = e^{(s+t)A} = e^{tA} e^{sA}, for all s, t ∈ C.
(c) e^0 = I, where 0 is the zero matrix.
(d) e^A is nonsingular and e^{−A} = (e^A)^{-1}.
(e) e^I = eI.
(f) det(e^{J_λ}) = e^{nλ} = e^{trace(J_λ)}, from which it follows that det(e^A) = e^{trace(A)}.
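The identities in Corollary 11.5.4 are easy to test numerically. A brief Python check with scipy (the random matrix and the scalars s, t are arbitrary illustrative choices):

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
s, t = 0.7, -1.3

print(np.allclose(A @ expm(A), expm(A) @ A))                     # (a)
print(np.allclose(expm(s * A) @ expm(t * A), expm((s + t) * A))) # (b)
print(np.allclose(expm(0 * A), np.eye(4)))                       # (c)
print(np.allclose(expm(A) @ expm(-A), np.eye(4)))                # (d)
print(np.isclose(np.linalg.det(expm(A)), np.exp(np.trace(A))))   # (f)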
11.6 Commuting Matrices∗
Let A be a fixed n × n matrix over C. We want to determine which matrices commute with A. If A commutes with matrices B and C, clearly A commutes with BC and with any linear combination of B and C. Also, if A commutes with every n × n matrix, then A must be a scalar multiple A = aI of the identity matrix I. (If you have not already verified this, do it now.)
Now suppose

    T^{-1}AT = J = J_1 ⊕ ··· ⊕ J_s   (11.19)

is the Jordan form of A with each J_i an elementary Jordan block. It is easy to check that X commutes with J if and only if T X T^{-1} commutes with A. Therefore the problem reduces to finding the matrices X that commute with J. Write X in block form corresponding to Eq. 11.19:

        [ X_11  X_12  ...  X_1s ]
    X = [ X_21  X_22  ...  X_2s ]          (11.20)
        [ ...               ... ]
        [ X_s1  X_s2  ...  X_ss ]

The condition JX = XJ reduces to the equalities

    J_p X_pq = X_pq J_q   (p, q = 1, ..., s).   (11.21)

Note that there is only one equation for each X_pq. Fix attention on one block X_pq = B = (b_ij). Suppose J_p is r × r and J_q is t × t. Then X_pq = B is r × t, J_p = λ_p I + N_r, J_q = λ_q I + N_t. From Eq. 11.21 we easily get

    λ_p B_uv + Σ_{k=1}^{r} (N_r)_{uk} B_{kv} = λ_q B_uv + Σ_{k=1}^{t} B_{uk} (N_t)_{kv},   (11.22)

for all p, q, u, v with 1 ≤ p, q ≤ s, 1 ≤ u ≤ r, 1 ≤ v ≤ t.

Eq. 11.22 quickly gives the following four equations:

    (For v ≠ 1, u ≠ r), λ_p B_uv + B_{u+1,v} = λ_q B_uv + B_{u,v−1}.   (11.23)
    (For v = 1, u ≠ r), λ_p B_{u1} + B_{u+1,1} = λ_q B_{u1} + 0.       (11.24)
    (For v ≠ 1, u = r), λ_p B_{rv} + 0 = λ_q B_{rv} + B_{r,v−1}.       (11.25)
    (For u = r, v = 1), λ_p B_{r1} + 0 = λ_q B_{r1}.                   (11.26)

First suppose that λ_p ≠ λ_q. Then from Eq. 11.26, B_{r1} = 0. Then from Eq. 11.24, (λ_q − λ_p) B_{u1} = B_{u+1,1}. Put u = r − 1, r − 2, ..., 1, in that order, to
get B_{u1} = 0 for 1 ≤ u ≤ r, i.e., the first column of B is the zero column. In Eq. 11.25 put v = 2, 3, ..., t to get B_{rv} = 0 for 1 ≤ v ≤ t, i.e., the bottom row of B is the zero row. Put u = r − 1 in Eq. 11.23 and then put v = 2, 3, ..., t, in that order, to get B_{r−1,v} = 0 for all v, i.e., the (r − 1)st row has all entries equal to zero. Now put u = r − 2 and let v = 2, 3, ..., t, in that order, to get B_{r−2,v} = 0 for 1 ≤ v ≤ t. Continue in this way for u = r − 3, r − 4, etc., to see that B = 0. So for λ_p ≠ λ_q we have X_pq = 0.

Now consider the case λ_p = λ_q. The equations Eq. 11.23 through 11.25 become

    (For v ≠ 1, u ≠ r), B_{u+1,v} = B_{u,v−1}.   (11.27)
    (For v = 1, u ≠ r), B_{u+1,1} = 0.           (11.28)
    (For v ≠ 1, u = r), B_{r,v−1} = 0.           (11.29)

It is now relatively straightforward to check that there are only the following three possibilities (each of which is said to be in linear triangular form):

1. r = t, in which case B = ζ_0 I + ζ_1 N_t + ··· + ζ_{t−1} N_t^{t−1} for some scalars ζ_i ∈ C.

2. r > t, in which case the first t rows of B form a matrix of the type in case 1 and the remaining r − t rows are zero:

    B = [ ζ_0 I + ζ_1 N_t + ··· + ζ_{t−1} N_t^{t−1} ]
        [               0_{r−t, t}                  ].

3. r < t, in which case the last r columns of B form a matrix of the type in case 1 and the first t − r columns are zero:

    B = [ 0_{r, t−r} | ζ_0 I + ζ_1 N_r + ··· + ζ_{r−1} N_r^{r−1} ].

Conversely, a direct check shows that if each block X_pq has one of the forms above when λ_p = λ_q, and X_pq = 0 when λ_p ≠ λ_q, then X commutes with J. For example, if

    B = (ρ I_3 + N_3) ⊕ (ρ I_2 + N_2) ⊕ (σ I_3 + N_3)   for ρ ≠ σ,   (11.30)

then the matrices X that commute with B have the form
        [ a_0 I_3 + a_1 N_3 + a_2 N_3²     [ c_0 I_2 + c_1 N_2 ]  ]
    X = [                                  [     0_{1,2}       ]  ]  ⊕  ( λ_0 I_3 + λ_1 N_3 + λ_2 N_3² ),
        [ [ 0_{2,1} | b_0 I_2 + b_1 N_2 ]   d_0 I_2 + d_1 N_2     ]

where a_i, b_i, c_i, d_i, λ_i are arbitrary scalars.

Polynomials in a matrix A have the special property of commuting, not only with the matrix A, but also with any matrix X which commutes with A.
Theorem 11.6.1. If C commutes with every matrix which commutes with B, then C is a polynomial in B.

Proof. In the usual way we may assume that B is in Jordan form. Suppose

    B = B_1 ⊕ ··· ⊕ B_s = J_{n_1}(λ_1) ⊕ ··· ⊕ J_{n_s}(λ_s) = (λ_1 I_{n_1} + N_{n_1}) ⊕ ··· ⊕ (λ_s I_{n_s} + N_{n_s}).

The auxiliary matrices

    X = a_1 I_{n_1} ⊕ ··· ⊕ a_s I_{n_s},

where the a_i are arbitrary numbers, are known to commute with B. So by hypothesis X also commutes with C. From the preceding discussion, if we take the a_i to be distinct, we see that C must be decomposable into blocks: C = C_1 ⊕ ··· ⊕ C_s. Moreover, it follows from CB = BC that these blocks have linear triangular form. Now let X be an arbitrary matrix that commutes with B. The general form of the matrix X was established above. By assumption, C commutes with X. Representing X in block form, we see that the equality CX = XC is equivalent to the relations

    C_p X_pq = X_pq C_q   (p, q = 1, 2, ..., s).   (11.31)

If the corresponding blocks B_p, B_q have distinct eigenvalues, then Eq. 11.31 contributes nothing, since then X_pq = 0. Hence we assume that B_p and B_q have the same eigenvalues. Let

    C_p = a_1 I + a_2 N_r + ··· + a_r N_r^{r−1},   C_q = b_1 I + b_2 N_t + ··· + b_t N_t^{t−1}.
For definiteness, suppose that r < t. Then Eq. 11.31 becomes

    ( Σ_{i=1}^{r} a_i N_r^{i−1} ) · [ 0_{r,t−r} | Σ_{i=1}^{r} ζ_i N_r^{i−1} ] = [ 0_{r,t−r} | Σ_{i=1}^{r} ζ_i N_r^{i−1} ] · ( Σ_{i=1}^{t} b_i N_t^{i−1} ),

where the ζ_i are arbitrary complex numbers. Multiply the top row times the right hand column on both sides of the equation to obtain

    (a_1, ..., a_r) · (ζ_r, ..., ζ_1)^T = (0, ..., 0, ζ_1, ..., ζ_r) · (b_t, ..., b_1)^T,

which is equivalent to

    a_1 ζ_r + a_2 ζ_{r−1} + ··· + a_r ζ_1 = b_1 ζ_r + b_2 ζ_{r−1} + ··· + b_r ζ_1.

Since the ζ_i can be chosen arbitrarily, it must be that

    a_i = b_i, for 1 ≤ i ≤ r.   (11.32)
It is routine to verify that a similar equality is obtained for r ≥ t. Suppose that the Jordan blocks of the matrix B are such that blocks with the same eigenvalues are adjacent. For example, let B = (B_1 ⊕ ··· ⊕ B_{m_1}) ⊕ (B_{m_1+1} ⊕ ··· ⊕ B_{m_2}) ⊕ ··· ⊕ (B_{m_k+1} ⊕ ··· ⊕ B_s), where blocks with the same eigenvalues are contained in the same set of parentheses. Denoting the sums in parentheses by B^{(1)}, B^{(2)}, ..., respectively, divide the matrix B into larger blocks which we call cells. Then divide the matrices X and C into cells in a corresponding way. The results above show that all the nondiagonal cells of the matrix X are zero; the structure of the diagonal cells was also described above. The conditions for the matrix C obtained in the present section show that the diagonal blocks of this matrix decompose into blocks in the same way as those of B. Moreover, the blocks of the matrix C have a special triangular form. Equalities 11.32 mean that in the blocks of the matrix C belonging to a given cell, the nonzero elements lying on a line parallel to the main diagonal are equal. For example, let B have the form given in Eq. 11.30,
        [ ρ 1 0 0 0 0 0 0 ]
        [ 0 ρ 1 0 0 0 0 0 ]
        [ 0 0 ρ 0 0 0 0 0 ]
    B = [ 0 0 0 ρ 1 0 0 0 ]          (11.33)
        [ 0 0 0 0 ρ 0 0 0 ]
        [ 0 0 0 0 0 σ 1 0 ]
        [ 0 0 0 0 0 0 σ 1 ]
        [ 0 0 0 0 0 0 0 σ ]

Our results show that every matrix C that commutes with each matrix which commutes with B must have the form

        [ a_0 a_1 a_2  0   0   0   0   0  ]
        [  0  a_0 a_1  0   0   0   0   0  ]
        [  0   0  a_0  0   0   0   0   0  ]
    C = [  0   0   0  a_0 a_1  0   0   0  ]          (11.34)
        [  0   0   0   0  a_0  0   0   0  ]
        [  0   0   0   0   0  d_0 d_1 d_2 ]
        [  0   0   0   0   0   0  d_0 d_1 ]
        [  0   0   0   0   0   0   0  d_0 ]

We need to prove that C can be represented as a polynomial in B. We do this only for the particular case where B has the special form given in Eq. 11.33, so that C has its form given in Eq. 11.34. By Lemma 11.4.1 there is a polynomial f(λ) that satisfies the conditions:

    f(ρ) = a_0, f'(ρ) = a_1, f''(ρ) = 2! a_2, f(σ) = d_0, f'(σ) = d_1, f''(σ) = 2! d_2.

By applying Eq. 11.6 we see that f(B) = C.
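The block description above can also be checked by computing the commutant of B directly. Writing XB − BX = 0 in vectorized form, vec(XB − BX) = (B^T ⊗ I − I ⊗ B) vec(X), so the commuting matrices form the null space of a Kronecker-product matrix. The sketch below (Python/numpy; the numerical values of ρ and σ are arbitrary) confirms that for the 8 × 8 matrix B of Eq. 11.33 the commutant has dimension 12, matching the count of free scalars a_i, b_i, c_i, d_i, λ_i in the display following Eq. 11.30.

import numpy as np

def jordan(lam, n):
    # elementary Jordan block lam*I + N_n
    return lam * np.eye(n) + np.eye(n, k=1)

rho, sigma = 2.0, 5.0
blocks = [jordan(rho, 3), jordan(rho, 2), jordan(sigma, 3)]

# assemble B as a block-diagonal 8x8 matrix
n = sum(b.shape[0] for b in blocks)
B = np.zeros((n, n))
pos = 0
for b in blocks:
    k = b.shape[0]
    B[pos:pos + k, pos:pos + k] = b
    pos += k

# XB = BX  <=>  (B^T kron I - I kron B) vec(X) = 0
K = np.kron(B.T, np.eye(n)) - np.kron(np.eye(n), B)
commutant_dim = n * n - np.linalg.matrix_rank(K)
print(commutant_dim)   # expect 12 = (3 + 2 + 2 + 2) + 3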
11.7 A Matrix Differential Equation∗
Let F denote either R or C. If B(t) is an n × n matrix each of whose entries is a differentiable function b_ij(t) from F to F, we say that the derivative of B with respect to t is the matrix (d/dt)(B(t)) whose (i, j) entry is b_ij'(t). Let x(t) = (x_1(t), ..., x_n(t))^T be an n-tuple of unknown functions F → F. The derivative of x(t) will be denoted ẋ(t). Let A be an n × n matrix over F, and consider the matrix differential equation (with initial condition):
    ẋ(t) = A x(t),   x(0) = x_0.   (11.35)
Theorem 11.7.1. The solution to the system of differential equations in Eq. 11.35 is given by x(t) = e^{tA} x_0.

Before we can prove this theorem, we need the following lemma.

Lemma 11.7.2. (d/dt)(e^{tA}) = A e^{tA}.
Proof. First suppose that A is the elementary Jordan block A = J = J_n(λ) = λI + N, where N = N_n. If n = 1, recall that the ordinary differential equation x'(t) = λx(t) has the solution x(t) = a e^{λt} where a = x(0). Since N_1 = 0, we see that the theorem holds in this case. Now suppose that n > 1. So A = J = λI + N where N^n = 0. Recall that

    e^{tJ} = e^{λt} B_t = Σ_{j=0}^{n−1} ( e^{λt} t^j / j! ) N^j.

A simple calculation shows that

    (d/dt) e^{tJ} = λ e^{λt} I + Σ_{j=1}^{n−1} ( ( λ e^{λt} t^j + j e^{λt} t^{j−1} ) / j! ) N^j.

On the other hand,

    J e^{tJ} = (λI + N) e^{λt} Σ_{j=0}^{n−1} (t^j / j!) N^j
             = e^{λt} (λI + N) [ I + Σ_{j=1}^{n−2} (t^j / j!) N^j + ( t^{n−1} / (n−1)! ) N^{n−1} ]
             = e^{λt} [ λI + (1 + λt) N + Σ_{j=2}^{n−2} ( λ t^j / j! + t^{j−1} / (j−1)! ) N^j
                        + ( λ t^{n−1} / (n−1)! + t^{n−2} / (n−2)! ) N^{n−1} ]
    = (d/dt) e^{tJ},

as desired. This completes a proof of the Lemma.

Now turn to a proof of Theorem 11.7.1. First suppose that A has the Jordan form J = P^{-1}AP, where J = J_1 ⊕ ··· ⊕ J_s. Since e^{tA} is a polynomial in tA, e^{P^{-1} tA P} = P^{-1} e^{tA} P, i.e.,

    e^{tA} = P · e^{tJ} · P^{-1} = P · ( e^{tJ_1} ⊕ ··· ⊕ e^{tJ_s} ) · P^{-1}.

It now follows easily that

    (d/dt) e^{tA} = A e^{tA}.

Since x_0 = x(0) is a constant matrix, (d/dt)( e^{tA} x_0 ) = A e^{tA} x_0, so that x = e^{tA} x_0 satisfies the differential equation ẋ = Ax. It is a standard result from the theory of differential equations that this solution is the unique solution to the given equation.
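Theorem 11.7.1 can be tested against a general-purpose ODE solver. The sketch below (Python with scipy; the matrix, initial condition, and final time are illustrative assumptions) compares x(t) = e^{tA} x_0 with a numerical integration of ẋ = Ax.

import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A = np.array([[0.0,  1.0],
              [-2.0, -3.0]])
x0 = np.array([1.0, 0.0])
t_final = 2.0

# closed-form solution from Theorem 11.7.1
x_exact = expm(t_final * A) @ x0

# numerical solution of x' = Ax
sol = solve_ivp(lambda t, x: A @ x, (0.0, t_final), x0, rtol=1e-10, atol=1e-12)
x_numeric = sol.y[:, -1]

print(np.allclose(x_exact, x_numeric, atol=1e-6))   # expect True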
11.8 Exercises
1. Show that the norm of T ∈ L(V, W) as defined in Equation 11.1 is a vector norm on L(V, W) viewed as a vector space in the usual way.

2. Let D denote the derivative operator, and for a function f (in our case a formal power series or Laurent series) let f^{(j)} denote the j-th derivative of f, i.e., D^j(f) = f^{(j)}.

(i) Prove that D^n(f · g) = Σ_{i=0}^{n} C(n, i) f^{(i)} g^{(n−i)}.

(ii) Derive as a corollary to part (i) the fact that D^j(f²) = Σ_{i_1 + i_2 = j} C(j; i_1, i_2) f^{(i_1)} f^{(i_2)}.

(iii) Now use part (i) and induction on n to prove that D^j(f^n) = Σ_{i_1 + ··· + i_n = j} C(j; i_1, ..., i_n) f^{(i_1)} ··· f^{(i_n)},

where C(j; i_1, ..., i_n) denotes the multinomial coefficient j! / (i_1! ··· i_n!).
3. The Frobenius norm kAkF of an m × n matrix A is defined to be the square root of the sum of the squares of the magnitudes of all the entries of A. (a) Show that the Frobenius norm really is a matrix norm (i.e., a vector norm on the vector space of all m × n matrices). (b) Compute kIkF and deduce that k · kF cannot be a transformation norm induced by some vector norm. (c) Let U and V be unitary matrices of the appropriate sizes and show that kAkF = kU AkF = kAV kF = kU AV kF . 4. Let λ 7→ f (λ) and λ 7→ g(λ) be two numerical functions and let A be an n × n matrix for which both f (A) and g(A) are defined. Show that f (A)g(A) = g(A)f (A).
Chapter 12 Infinite Dimensional Vector Spaces∗

12.1 Partially Ordered Sets & Zorn's Lemma∗
There are occasions when we would like to indulge in a kind of "infinite induction." Basically this means that we want to show the existence of some set which is maximal with respect to certain specified properties. In this text we want to use Zorn's Lemma to show that every vector space has a basis, i.e., a maximal linearly independent set of vectors in some vector space. The maximality is needed to show that these vectors span the entire space. In order to state Zorn's Lemma we need to set the stage.

Definition A partial order on a nonempty set A is a relation ≤ on A satisfying the following:
1. x ≤ x for all x ∈ A (reflexive);
2. if x ≤ y and y ≤ x then x = y for all x, y ∈ A (antisymmetric);
3. if x ≤ y and y ≤ z then x ≤ z for all x, y, z ∈ A (transitive).

Given a partial order ≤ on A we often say that A is partially ordered by ≤, or that (A, ≤) is a partially ordered set.

Definition Let the nonempty set A be partially ordered by ≤.
1. A subset B of A is called a chain if for all x, y ∈ B, either x ≤ y or y ≤ x.
2. An upper bound for a subset B of A is an element u ∈ A such that b ≤ u for all b ∈ B. 3. A maximal element of A is an element m ∈ A such that if m ≤ x for some x ∈ A, then m = x. In the literature there are several names for chains, such as linearly ordered subset or simply ordered subset. The existence of upper bounds and maximal elements depends on the nature of (A, ≤). As an example, let A be the collection of all proper subsets of Z + (the set of positive integers) ordered by ⊆. Then, for example, the chain {1} ⊆ {1, 2} ⊆ {1, 2, 3} ⊆ · · · does not have an upper bound. However, the set A does have maximal elements: for example Z + \ {n} is a maximal element of A for any n ∈ Z + . Zorn’s Lemma If A is a nonempty partially ordered set in which every chain has an upper bound, then A has a maximal element. It is a nontrivial result that Zorn’s Lemma is independent of the usual (Zermelo-Fraenkel) axioms of set theory in the sense that if the axioms of set theory are consistent, then so are these axioms together with Zorn’s Lemma or with the negation of Zorn’s Lemma. The two other most nearly standard axioms that are equivalent to Zorn’s Lemma (in the presence of the usual Z-F axioms) are the Axiom of Choice and the Well Ordering Principle. In this text, we just use Zorn’s Lemma and leave any further discussion of these matters to others.
12.2 Bases for Vector Spaces∗
Let V be any vector space over an arbitrary field. Let S = {A ⊆ V : A is linearly independent} and let S be partially ordered by inclusion. If C is any chain in S, then the union of all the sets in C is an upper bound for C. Hence by Zorn's Lemma S must have a maximal element B. By definition B is a linearly independent set not properly contained in any other linearly independent set. If a vector v were not in the space spanned by B, then B ∪ {v} would be a linearly independent set properly containing B, a contradiction. Hence B is a basis for V. This proves the following theorem:

Theorem 12.2.1. Each vector space V over an arbitrary field F has a basis. This is a subset B of V such that each vector v ∈ V can be written in just one way as a linear combination of a finite list of vectors in B.
12.3 A Theorem of Philip Hall∗
Let S and I be arbitrary sets. For each i ∈ I let Ai ⊆ S. If ai ∈ Ai for all i ∈ I, we say {ai : i ∈ I} is a system of representatives for A = (Ai : i ∈ I). If in addition ai 6= aj whenever i 6= j, even though Ai may equal Aj , then {ai : i ∈ I} is a system of distinct representatives (SDR) for A. Our first problem is: Under what conditions does some family A of subsets of a set S have an SDR? For a finite collection of sets a reasonable answer was given by Philip Hall in 1935. It is obvious that if A = (Ai : i ∈ I) has an SDR, then the union of each k of the members of A = (Ai : i ∈ I) must have at least k elements. Hall’s observation was that this obvious necessary condition is also sufficient. We state the condition formally as follows: Condition (H) : Let I = [n] = {1, 2, . . . , n}, and let S be any (nonempty) set. For each i ∈ I, let Si ⊆ S. Then A = (S1 , . . . , Sn ) satisfies Condition (H) provided for each K ⊆ I, | ∪k∈K Sk | ≥ |K|. Theorem 12.3.1. The family A = (S1 , . . . , Sn ) of finitely many (not necessarily distinct) sets has an SDR if and only if it satisfies Condition (H). Proof. As Condition (H) is clearly necessary, we now show that it is also sufficient. Br,s denotes a block of r subsets (Si1 , . . . , Sir ) belonging to A, where s = | ∪ {Sj : Sj ∈ Br,s }|. So Condition (H) says: s ≥ r for each block Br,s . If s = r, Br,s is called a critical block. (By convention, the empty block B0,0 is critical.) If Br,s = (A1 , . . . , Au , Cu+1 , . . . , Cr ) and Bt,v = (A1 , . . . , Au , Du+1 , . . . , Dt ), write Br,s ∩ Bt,v = (A1 , . . . , Au ); Br,s ∪ Bt,v = (A1 , . . . , Au , Cu+1 , . . . , Cr , Du+1 , . . . , Dt ). Here the notation implies that A1 , . . . , Au are precisely the subsets in both blocks. Then write Br,s ∩ Bt,v = Bu,w , where w = | ∪ {Ai : 1 ≤ i ≤ u}|, and Br,s ∪ Bt,v = By,z , where y = r + t − u, z = | ∪ {Si : Si ∈ Br,s ∪ Bt,v }|. The proof will be by induction on the number n of sets in the family A, but first we need two lemmas. Lemma 12.3.2. If A satisfies Condition (H), then the union and intersection of critical blocks are themselves critical blocks.
Proof of Lemma 12.3.2. Let Br,r and Bt,t be given critical blocks. Say Br,r ∩ Bt,t = Bu,v ; Br,r ∪ Bt,t = By,z . The z elements of the union will be the r + t elements of Br,r and Bt,t reduced by the number of elements in both blocks, and this latter number includes at least the v elements in the intersection: z ≤ r + t − v. Also v ≥ u and z ≥ y by Condition (H). Note: y + u = r + t. Hence r + t − v ≥ z ≥ y = r + t − u ≥ r + t − v, implying that equality holds throughout. Hence u = v and y = z as desired for the proof of Lemma 12.3.2 . Lemma 12.3.3. If Bk,k is any critical block of A, the deletion of elements of Bk,k from all sets in A not belonging to Bk,k produces a new family A0 in which Condition (H) is still valid. 0 Proof of Lemma12.3.3. Let Br,s be an arbitrary block, and (Br,s )0 = Br,s 0 0 the block after the deletion. We must show that s ≥ r. Let Br,s ∩Bk,k = Bu,v and Br,s ∪ Bk,k = By,z . Say
Br,s = (A1 , . . . , Au , Cu+1 , . . . , Cr ), Bk,k = (A1 , . . . , Au , Du+1 , . . . , Dk ). So Bu,v = (A1 , . . . , Au ), By,z = (A1 , . . . , Au , Cu+1 , . . . , Cr , Du+1 , . . . , Dk ). 0 0 0 The deleted block (Br,s )0 = Br,s 0 is (A1 , . . . , Au , Cu+1 , . . . , Cr ). But Cu+1 , . . . , Cr , as blocks of the union By,z , contain at least z − k elements not in Bk,k . Thus s0 ≥ v + (z − k) ≥ u + y − k = u + (r + k − u) − k = r. Hence s0 ≥ r, as desired for the proof of Lemma 12.3.3. As indicated above, for the proof of the main theorem we now use induction on n. For n = 1 the theorem is obviously true. Induction Hypothesis: Suppose the theorem holds (Condition (H) implies that there is an SDR) for any family of m sets, 1 ≤ m < n. We need to show the theorem holds for a system of n sets. So let 1 < n, assume the induction hypothesis, and let A = (S1 , . . . , Sn ) be a given collection of subsets of S satisfying Condition (H). First Case: There is some critical block Bk,k with 1 ≤ k < n. Delete the elements in the members of Bk,k from the remaining subsets, to obtain 0 0 a new family A0 = Bk,k ∪ Bn−k,v , where Bk,k and Bn−k,v have no common elements in their members. By Lemma 12.3.3, Condition (H) holds in A0 , 0 and hence holds separately in Bk,k and in Bn−k,v viewed as families of sets.
12.3. A THEOREM OF PHILIP HALL∗
263
0 By the induction hypothesis, Bk,k and Bn−k,v have (disjoint) SDR’s whose union is an SDR for A.
Remaining Case: There is no critical block for A except possibly the entire system. Select any Sj of A and then select any element of Sj as its representative. Delete this element from all remaining sets to obtain a family 0 0 A0 . Hence a block Br,s with r < n becomes a block Br,s 0 with s ∈ {s, s − 1}. By hypothesis Br,s was not critical, so s ≥ r + 1 and s0 ≥ r. So Condition (H) holds for the family A0 \ {Sj }, which by induction has an SDR. Add to this SDR the element selected as a representative for Sj to obtain an SDR for A. We now interpret the SDR problem as one on matchings in bipartite graphs. Let G = (X, Y, E) be a bipartite graph. For each S ⊆ X, let N (S) denote the set of elements of Y connected to at least one element of S by an edge, and put δ(S) = |S| − |N (S)|. Put δ(G) = max{δ(S) : S ⊆ X}. Since δ(∅) = 0, clearly δ(G) ≥ 0. Then Hall’s theorem states that G has an X-saturating matching if and only if δ(G) = 0. Theorem 12.3.4. G has a matching of size t (or larger) if and only if t ≤ |X| − δ(S) for all S ⊆ X.
Proof. First note that Hall’s theorem says that G has a matching of size t = |X| if and only if δ(S) ≤ 0 for all S ⊆ X iff |X| ≤ |X| − δ(S) for all S ⊆ X. So our theorem is true in case t = |X|. Now suppose that t < |X|. Form a new graph G0 = (X, Y ∪ Z, E 0 ) by adding new vertices Z = {z1 , . . . , z|X|−t } to Y , and join each zi to each element of X by an edge of G0 . If G has a matching of size t, then G0 has a matching of size |X|, implying that for all S ⊆ X, |S| ≤ |N 0 (S)| = |N (S)| + |X| − t, implying |N (S)| ≥ |S| − |X| + t = t − (|X| − |S|) = t − |X \ S|. This is also equivalent to t ≤ |X| − (|S| − |N (S)|) = |X| − δ(S).
Conversely, suppose |N(S)| ≥ t − |X \ S| = t − (|X| − |S|). Then |N'(S)| = |N(S)| + |X| − t ≥ (t − |X| + |S|) + |X| − t = |S|. By Hall's theorem, G' has an X-saturating matching M. At most |X| − t edges of M join X to Z, so at least t edges of M are from X to Y.

Note that t ≤ |X| − δ(S) for all S ⊆ X iff t ≤ min_{S ⊆ X} (|X| − δ(S)) = |X| − max_{S ⊆ X} δ(S) = |X| − δ(G).

Corollary 12.3.5. The largest matching of G has size |X| − δ(G) = m(G), i.e., m(G) + δ(G) = |X|.
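Condition (H) and the existence of an SDR can be checked by brute force for small families. The sketch below (Python, standard library only; the family of sets is an arbitrary illustration) tests Hall's condition over all nonempty subsets of indices and searches for a system of distinct representatives.

from itertools import combinations, product

family = [{1, 2}, {2, 3}, {1, 3}, {3, 4}]

def satisfies_condition_H(sets):
    # check |union of any k of the sets| >= k for every choice of k sets
    n = len(sets)
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            union = set().union(*(sets[i] for i in idx))
            if len(union) < k:
                return False
    return True

def find_sdr(sets):
    # brute force: try every choice of one representative per set
    for choice in product(*sets):
        if len(set(choice)) == len(sets):
            return choice
    return None

print(satisfies_condition_H(family))   # expect True
print(find_sdr(family))                # e.g. (1, 2, 3, 4)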
12.4 A Theorem of Marshall Hall, Jr.∗
Many of the ideas of “finite” combinatorics have generalizations to situations in which some of the sets involved are infinite. We just touch on this subject. Given a family A of sets, if the number of sets in the family is infinite, there are several ways the theorem of P. Hall can be generalized. One of the first (and to our mind one of the most useful) was given by Marshal Hall, Jr. (no relative of P. Hall), and is as follows. Theorem 12.4.1. Suppose that for each i in some index set I there is a finite subset Ai of a set S. The system A = (Ai )i∈I has an SDR if and only if the following Condition (H’) holds: For each finite subset I 0 of I the system A0 = (Ai )i∈I 0 satisfies Condition (H). Proof. We establish a partial order on deletions, writing D1 ⊆ D2 for deletions D1 and D2 iff each element deleted by D1 is also deleted by D2 . Of course, we are interested only in deletions which preserve Condition (H’). If all deletions in an ascending chain D1 ⊆ D2 ⊆ · · · ⊆ Di ⊆ · · · preserve Condition (H), let D be the deletion which consists of deleting an element b from a set A iff there is some i for which b is deleted from A by Di . We assert that deletion D also preserves Condition (H). In any block Br,s of A, (r, s < ∞), at most a finite number of deletions in the chain can affect Br,s . If no deletion of the chain affects Br,s , then of course D does not affect Br,s , and Condition (H) still holds for Br,s . Otherwise, let Dn be the last deletion that affects Br,s . So under Dn (and hence also under 0 0 D) (Br,s )0 = Br,s 0 still satisfies Condition (H) by hypothesis, i.e., s ≥ r. But Br,s is arbitrary, so D preserves Condition (H) on A. By Zorn’s Lemma,
The following theorem, sometimes called the Cantor–Schroeder–Bernstein Theorem, will be used together with the theorem of M. Hall, Jr. to show that any two bases of a vector space V over a field F must have the same cardinality.

Theorem 12.4.2. Let X, Y be sets, and let θ : X → Y and ψ : Y → X be injective mappings. Then there exists a bijection φ : X → Y.

Proof. The elements of X will be referred to as males, those of Y as females. For x ∈ X, if θ(x) = y, we say y is the daughter of x and x is the father of y. Analogously, if ψ(y) = x, we say x is the son of y and y is the mother of x. A male with no mother is said to be an “adam.” A female with no father is said to be an “eve.” Ancestors and descendants are defined in the natural way, except that each x or y is both an ancestor of itself and a descendant of itself. If z ∈ X ∪ Y has an ancestor that is an adam (resp., eve) we say that z has an adam (resp., eve). Partition X and Y into the following disjoint sets:

X1 = {x ∈ X : x has no eve}; X2 = {x ∈ X : x has an eve};
Y1 = {y ∈ Y : y has no eve}; Y2 = {y ∈ Y : y has an eve}.

Now a little thought shows that θ : X1 → Y1 is a bijection and ψ−1 : X2 → Y2 is a bijection. (Indeed, if x has no eve then neither does its daughter θ(x); a female with no eve is not herself an eve, so she has a father, who also has no eve. Similarly, a male with an eve is not an adam, so he has a mother, who also has an eve; and the son of any female with an eve has an eve.) So φ = θ|X1 ∪ ψ−1|X2 is a bijection from X to Y.
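The genealogical bookkeeping in this proof can be carried out explicitly. The sketch below is a hypothetical illustration, not part of the text: it takes X = Y = {0, 1, 2, . . .} with the made-up injections θ(x) = 2x and ψ(y) = 2y + 1, classifies each x according to whether its ancestor chain ends at an eve, and assembles φ exactly as in the proof.

```python
def theta(x):      # a made-up injection from X = {0, 1, 2, ...} into Y = {0, 1, 2, ...}
    return 2 * x

def psi(y):        # a made-up injection from Y into X
    return 2 * y + 1   # (used only implicitly below: psi^{-1}(x) = (x - 1) // 2 for odd x)

def has_eve(x):
    """Trace x -> mother -> her father -> ... and report whether the chain ends at
    an eve (a female with no father).  For these particular theta and psi, a male x
    has a mother iff x is odd, and a female y has a father iff y is even; the values
    strictly decrease, so the walk terminates."""
    node, is_male = x, True
    while True:
        if is_male:
            if node % 2 == 0:                     # an adam: no mother, hence no eve
                return False
            node, is_male = (node - 1) // 2, False
        else:
            if node % 2 == 1:                     # an eve
                return True
            node, is_male = node // 2, True

def phi(x):
    """The bijection of the proof: psi^{-1} on X2 (those with an eve), theta on X1."""
    return (x - 1) // 2 if has_eve(x) else theta(x)

values = [phi(x) for x in range(20)]
assert len(set(values)) == len(values)            # phi is injective on this window
print(values)                                     # [0, 2, 4, 1, 8, 10, 12, 3, ...]
```

Here the adams are the even numbers (not in the image of ψ) and the eves are the odd numbers (not in the image of θ); the theorem guarantees that the resulting φ is a bijection, and the assertion spot-checks injectivity on an initial segment.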
Corollary 12.4.3. If V is a vector space over the field F and if B1 and B2 are two bases for V, then |B1| = |B2|.

Proof. Let B1 = {xi : i ∈ I} and B2 = {yj : j ∈ J}. For each i ∈ I, let Γi = {j ∈ J : yj occurs with nonzero coefficient in the unique linear expression for xi in terms of the yj’s}. Each Γi is of course finite, and the union of any k (k ≥ 1) of the Γi’s, say Γi1, . . . , Γik, must contain at least k distinct elements. For otherwise xi1, . . . , xik would belong to a space spanned by fewer than k of the yj’s, hence to a space of dimension less than k, and so would be linearly dependent. Thus the family (Γi : i ∈ I) satisfies Condition (H′), so by Theorem 12.4.1 it has an SDR. An SDR assigns to each i ∈ I a distinct element of Γi ⊆ J, so there is a function θ : I → J which is an injection. Similarly, there is an injection ψ : J → I. So by the preceding theorem there is a bijection J ↔ I, i.e., |B1| = |B2|.
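In the finite-dimensional case the sets Γi can be read off from a change-of-basis matrix. The following sketch uses a made-up pair of bases of Q^3 (so it illustrates the construction only, not the infinite-dimensional situation the corollary is aimed at): Γi is the support of the coordinate vector of xi with respect to the yj’s, and the loop verifies the union condition used in the proof.

```python
from itertools import combinations

# A made-up change of basis in Q^3: coordinates of x_1, x_2, x_3 with respect to
# the basis y_1, y_2, y_3  (x_1 = y_1 + y_2,  x_2 = y_2,  x_3 = y_1 + y_3).
coords = {1: [1, 1, 0], 2: [0, 1, 0], 3: [1, 0, 1]}

# Gamma_i = set of indices j such that y_j has a nonzero coefficient in x_i.
Gamma = {i: {j + 1 for j, c in enumerate(v) if c != 0} for i, v in coords.items()}
print(Gamma)                                   # {1: {1, 2}, 2: {2}, 3: {1, 3}}

# As in the proof, any k of the Gamma_i must together contain at least k indices;
# otherwise x_{i_1}, ..., x_{i_k} would be linearly dependent.
for k in range(1, len(Gamma) + 1):
    for idx in combinations(Gamma, k):
        assert len(set().union(*(Gamma[i] for i in idx))) >= k
print("Condition (H) holds; an SDR gives an injection I -> J.")
```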
12.5 Exercises∗
Exercise 12.5.0.1. Let A = (A1, . . . , An) be a family of subsets of {1, . . . , n}. Suppose that the incidence matrix of the family is invertible. Show that the family has an SDR.

Exercise 12.5.0.2. Prove the following generalization of Hall's Theorem: Let A = (A1, . . . , An) be a family of subsets of X that satisfies the following property: There is an integer r with 0 ≤ r < n for which the union of each subfamily of k subsets of A, for all k with 0 ≤ k ≤ n, has at least k − r elements. Then there is a subfamily of size n − r which has an SDR. (Hint: Start by adding r “dummy” elements that belong to all the sets.)
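For Exercise 12.5.0.1, the incidence matrix of a family (A1, . . . , An) of subsets of {1, . . . , n} is the n × n matrix whose (i, j) entry is 1 if j ∈ Ai and 0 otherwise (up to the choice of row/column convention, which does not affect invertibility). The sketch below is only an illustration of this setup with a made-up family; it builds the matrix and computes its determinant exactly over the rationals.

```python
from fractions import Fraction

def incidence_matrix(family, ground):
    """M[i][j] = 1 if ground[j] lies in family[i], else 0."""
    return [[1 if x in A else 0 for x in ground] for A in family]

def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n, d = len(m), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            d = -d
        d *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return d

family = [{1, 2}, {2, 3}, {1, 3}]       # a made-up family of subsets of {1, 2, 3}
M = incidence_matrix(family, [1, 2, 3])
print(M)                                # [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
print(det(M))                           # 2, so M is invertible over Q
# ... and indeed the family has an SDR: 1 for {1,2}, 2 for {2,3}, 3 for {1,3}.
```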
Exercise 12.5.0.3. Let G be a (finite, undirected, simple) graph with vertex set V. Let C = {Cx : x ∈ V} be a family of sets indexed by the vertices of G. For X ⊆ V, let CX = ∪x∈X Cx. A set X ⊆ V is C-colorable if one can assign to each vertex x ∈ X a “color” cx ∈ Cx so that cx ≠ cy whenever x and y are adjacent in G. Prove that if |CX| ≥ |X| whenever X induces a connected subgraph of G, then V is C-colorable. (In the current literature of graph theory, the sets assigned to the vertices are called lists, and the desired proper coloring of G chosen from the lists is a list coloring of G. When G is a complete graph, this exercise gives precisely Hall's Theorem on SDR's. A current research topic in graph theory is the investigation of modifications of this condition that suffice for the existence of list colorings.)

Exercise 12.5.0.4. With the same notation as in the previous exercise, prove that if every proper subset of V is C-colorable and |CV| ≥ |V|, then V is C-colorable.
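The notion of C-colorability in Exercise 12.5.0.3 is easy to test by brute force on small examples. The sketch below is a made-up illustration: is_C_colorable tries every choice of colors from the lists, and the second call shows how, on a complete graph, a proper choice from the lists is exactly an SDR for the family of lists.

```python
from itertools import product

def is_C_colorable(vertices, edges, lists):
    """Is there a choice c_x in lists[x] for every vertex x with c_x != c_y
    whenever {x, y} is an edge?  (Brute force; fine for tiny examples.)"""
    vs = list(vertices)
    for choice in product(*(lists[v] for v in vs)):
        colour = dict(zip(vs, choice))
        if all(colour[u] != colour[v] for (u, v) in edges):
            return True
    return False

# A made-up 4-cycle a-b-c-d-a with two colours available at every vertex.
cycle_edges = [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'a')]
lists = {v: {1, 2} for v in 'abcd'}
print(is_C_colorable('abcd', cycle_edges, lists))   # True: colour a, c with 1 and b, d with 2

# On a complete graph a proper choice from the lists is exactly an SDR.
K3_edges = [('a', 'b'), ('b', 'c'), ('a', 'c')]
print(is_C_colorable('abc', K3_edges, {'a': {1, 2}, 'b': {1, 2}, 'c': {1, 2}}))  # False: no SDR
```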
Index

A = [T]B2,B1, 38
QR-decomposition, 186
T-annihilator of v, 122
T-conductor of v into W, 122, 127
[v]B, 36
∞-norm, 145
L(U, V), 32
n-linear, 61
1-norm, 145
2-norm, 145
adjoint of linear operator, 153
adjoint, classical, 76
adjugate, 79
algebraic multiplicity of an eigenvalue, 205
algebraically closed, 57
alternating (n-linear), 64
basis of a vector space, 24
basis of quotient space, 210
bijective, 45
binomial theorem, 58
block multiplication, 13
Cayley-Hamilton Theorem, 79, 83
chain, 259
characteristic polynomial, 77
classical adjoint, 76
codomain, 33
companion matrix, 81
complex number, 8
convergence of a sequence of matrices, 246
convergence of a sequence of vectors, 145
convergence of power series in a matrix, 245
coordinate matrix, 36
determinant, 65
diagonalizable, 118
dimension, 25
direct sum of subspaces, 18
dual basis, 43
dual space, 43, 152
eigenvalue, 113
eigenvector, 114
elementary divisors of a matrix, 213
elementary Jordan block, 213
Euclidean algorithm, 60
exponential of a matrix, 246, 250
finite dimensional, 24
Frobenius norm of a matrix, 258
Fundamental Theorem of Algebra, 57
generalized eigenvector, 204
geometric multiplicity of an eigenvalue, 183
Gram-Schmidt algorithm, 146
Hermitian matrix, 131, 159
Hermitian operator, 159
hyperplane, 191
ideal in F[x], 56
idempotent, 36
independent list of subspaces, 18
injective, 33
inner product, 137
inner product space, 138
isometry, 171
isomorphic, 45, 52
Jordan form, 212
kernel, 33
Kronecker delta, 8
Lagrange Interpolation, 52
Lagrange interpolation generalized, 242
Laplace expansion, 67, 75
least-squares solution, 188
linear algebra, 47
linear functional, 43, 152
linear operator, 38
linear transformation, 31
linear triangular form, 252
linearly dependent, 22
linearly independent, 22
list, 21
lnull(A), 34
maximal element, 260
minimal polynomial, 56
monic polynomial, 49
Moore-Penrose generalized inverse, 184
nilpotent, 204
norm, 145
norm (on a vector space), 141
normal matrix, 163
normal operator, 163
normalized QR-decomposition, 186
null space, 33
nullity, 34
orthogonal, 139
orthogonal matrix, 148
orthonormal, 139
partial order, 259
partially ordered set, 259
polarization identity, complex, 157
polarization identity, real, 156
polynomial, 49
positive (semidefinite) operator, 169
positive definite, 198
prime polynomial, 59
principal ideal, 56
principal submatrix, 78
projection, 35
projection matrix, 186
pseudoinverse, 184
quotient space, 123
range, 33
Rayleigh Principle, 196
reduced singular value decomposition, 187
rnull(A), 34
scalar matrix functions, 240
Schur's Theorem, 147
self-adjoint operator, 159
singular value, 178
singular value decomposition, 178
skew-symmetric matrix, 163
span, 22
sum of subspaces, 18
surjective, 34
system of distinct representatives (SDR), 261
Taylor's Formula, 58
tensor product, 91
trace of a matrix, 78
transformation norm, 233
unitary matrix, 148
upper bound, 260
Vec, 109
vector space, 15
vector subspace, 17
Zorn's Lemma, 260