Linear Algebra (Oxford Science Publications)


Author: Richard Kaye | Robert Wilson


Linear Algebra

Richard Kaye and Robert Wilson
School of Mathematics and Statistics
The University of Birmingham

OXFORD NEW YORK TOKYO
OXFORD UNIVERSITY PRESS

This book has been printed digitally and produced in a standard specification in order to ensure its continuing availability

OXFORD UNIVERSITY PRESS

Great Clarendon Street, Oxford OX2 6DP

Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide in

Oxford New York
Auckland Bangkok Buenos Aires Cape Town Chennai Dar es Salaam Delhi Hong Kong Istanbul Karachi Kolkata Kuala Lumpur Madrid Melbourne Mexico City Mumbai Nairobi Sao Paulo Shanghai Taipei Tokyo Toronto

Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries

Published in the United States by Oxford University Press Inc., New York

© Richard Kaye and Robert Wilson, 1998

The moral rights of the authors have been asserted
Database right Oxford University Press (maker)

Reprinted 2003

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above

You must not circulate this book in any other binding or cover and you must impose this same condition on any acquirer

ISBN 0-19-850237-0

Preface

This book constitutes a second course in linear algebra, and is based on second year courses given first by RW and then by RK in Birmingham over the last five years. The objectives of such a course are as follows. Firstly, the student must learn a whole host of algebraic methods associated with bilinear forms, inner products, eigenvectors, and diagonalization of matrices, and be confident in carrying out calculations with these in all areas of mathematics. Secondly, this course is likely to be one of the first places where the student meets the axiomatic method of abstract algebra, and as such serves as an introduction to abstract algebra in general.

We believe that these two requirements can be successfully married into a single course. By this stage in a student's career, vectors and matrices should be sufficiently familiar that the jump to the full rigour of the axiomatic treatment of vector spaces is not such a great one. Our approach throughout is to show how certain key examples suggest axioms, and then to prove 'structure theorems' showing that the abstract objects satisfying the axioms are isomorphic to one of the 'canonical' examples for finite dimensional vector spaces over $\mathbb{R}$ or $\mathbb{C}$ or other fields. More advanced theorems can then be proved in a 'concrete way' using matrices and column vectors. This approach has the advantages that matrices (with which students are generally quite comfortable) are never very far away, and that proofs coincide very closely with the calculations that the students will be required to do, so algorithms or methods can be obtained from studying proofs (a useful lesson to learn in general). Obviously, we give plenty of examples in the text, and in doing so we are able to point out exactly where the proofs of the corresponding theorems suggest how to do the calculation in question.
We have assumed a certain amount of familiarity with matrices and basic operations on them (addition, multiplication, transpose, determinants, and calculating inverses), and the student should be able to solve simultaneous linear equations of the form $Ax = b$, obtaining the full solution set. This much at least will be included in a first-year programme of study, and this material is summarized in Chapter 1. We also assume some basic knowledge of the complex numbers, but we do not assume the student has encountered vector spaces over $\mathbb{C}$ before. Properties of polynomials that are required for understanding of the Cayley-Hamilton Theorem and the minimum polynomial of a linear transformation are set out in a suitable place in Part III.

Typically, a student at this level will have met the concepts of vector space over $\mathbb{R}$ and dimension before, but full familiarity with these ideas is not necessary since this material is revised fully in Chapter 2. Clearly the extent to which this chapter needs to be revised or expanded upon is at the discretion of the lecturer or the student. We have chosen to include all of the basic material on linear independence, bases and dimension here, even though for most students this will be revision. Almost all of the time our vector spaces are finite dimensional, but since students at this level will occasionally meet applications (such as Fourier series) which require infinite dimensional vector spaces we have included some material on these too; in most cases we prove results for finite dimensional vector spaces only, indicating afterward whether or not the theorem is also true for infinite dimensional spaces.

We have also included a brief introduction to fields, and this may well be new material to a student at this level. The reasons for including this material are clear: many examples and applications of vector spaces that a typical undergraduate will see will involve fields other than $\mathbb{R}$ or $\mathbb{C}$, and the field axioms provide an important illustration of the axiomatic approach. We do not allow ourselves to dwell on the subject, though, giving for example just a few examples of finite fields rather than a complete classification. In any case the main emphasis of the book is on spaces over $\mathbb{R}$ or $\mathbb{C}$, and this section is optional.

Part II goes on to discuss inner product spaces in general, and also bilinear forms and quadratic forms on real vector spaces, culminating in their full description via the diagonal form given by the Gram-Schmidt orthonormalization (for inner products) and by 'Sylvester's law of inertia' in the more general case of symmetric bilinear forms. In the case of spaces over the complex numbers, conjugate symmetric forms are also considered and the corresponding laws are derived by the same methods. To keep things reasonably straightforward, Part II is concerned mostly with spaces over $\mathbb{R}$ or $\mathbb{C}$.
Part III contains a full discussion of linear transformations from a finite dimensional vector space $V$ to itself, their eigenvalues, eigenvectors, and diagonalization. The algorithms performing these computations are emphasized throughout. Determinants are used as an aid to computations (the characteristic polynomial) but are not required for a full understanding of the theory. The book ends with two chapters that emphasize applications of the material presented in the whole book: one on self-adjoint transformations on inner product spaces and the final chapter on Jordan normal form. In Part III, our vector spaces are over an arbitrary field $F$ with the only condition on $F$ being that the minimum polynomial of the linear transformation $f$ in question splits over $F$. The student may prefer to continue to read the text as if $F$ is $\mathbb{R}$ or $\mathbb{C}$.

Of course, there is too much material for a single course here, and it is up to the lecturer to decide on the course content and emphasis. With the material in Part I taken as understood, it would certainly be possible to cover all of the topics here as 'algorithms' or 'methods' in a single course, leaving the brighter students to follow up the sections explaining why some of the more difficult ideas (such as Sylvester's law of inertia, or the Jordan normal form) really do work. Alternatively, some sections and/or chapters can be omitted altogether without interrupting the flow of the text. For example, Sections 2.6, 4.4, 5.4, 6.2, 7.4 and 7.5 may be omitted and Chapters 13 and 14 are independent of each other, so one or both of these could be omitted (though it should be mentioned that these last two chapters are in some ways the most important of all for applications).

Birmingham
October 1997

RK
RW

Contents

PART I MATRICES AND VECTOR SPACES

1 Matrices
  1.1 Matrices
  1.2 Addition and multiplication of matrices
  1.3 The inverse of a matrix
  1.4 The transpose of a matrix
  1.5 Row and column operations
  1.6 Determinant and trace
  1.7 Minors and cofactors

2 Vector spaces
  2.1 Examples and axioms
  2.2 Subspaces
  2.3 Linear independence
  2.4 Bases
  2.5 Coordinates
  2.6 Vector spaces over other fields

PART II BILINEAR AND SESQUILINEAR FORMS

3 Inner product spaces
  3.1 The standard inner product
  3.2 Inner products
  3.3 Inner products over C

4 Bilinear and sesquilinear forms
  4.1 Bilinear forms
  4.2 Representation by matrices
  4.3 The base-change formula
  4.4 Sesquilinear forms over C

5 Orthogonal bases
  5.1 Orthonormal bases
  5.2 The Gram-Schmidt process
  5.3 Properties of orthonormal bases
  5.4 Orthogonal complements

6 When is a form definite?
  6.1 The Gram-Schmidt process revisited
  6.2 The leading minor test

7 Quadratic forms and Sylvester's law of inertia
  7.1 Quadratic forms
  7.2 Sylvester's law of inertia
  7.3 Examples
  7.4 Applications to surfaces
  7.5 Sesquilinear and Hermitian forms

PART III LINEAR TRANSFORMATIONS

8 Linear transformations
  8.1 Basics
  8.2 Arithmetic operations on linear transformations
  8.3 Representation by matrices

9 Polynomials

  9.1 Polynomials
  9.2 Evaluating polynomials
  9.3 Roots of polynomials over …

$\ldots\mu)A$. $(\lambda + \mu)A = \lambda A + \mu A$. $0(A) = 0$. $A + (-A) = 0$.

Two matrices $A = (a_{ij})$ and $B = (b_{ij})$ can sometimes be multiplied together, but only if their sizes agree in a special way. The rule is as follows: the matrix product $AB$ only exists when $A$ is $n \times m$ and $B$ is $m \times k$ for some $n, m, k$. When this is the case, $AB$ is the $n \times k$ matrix with $(i, j)$th entry

$$\sum_{r=1}^{m} a_{ir} b_{rj} = a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{im} b_{mj}.$$

The $n \times n$ identity matrix $I_n$ is the matrix $(a_{ij})$ with diagonal entries $a_{ii} = 1$ and all other entries $0$. When $n$ is clear from context, it is just denoted $I$. Note that matrix multiplication is not in general commutative, and it is not hard to find examples, such as

$$\begin{pmatrix} 1 & 2 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} \neq \begin{pmatrix} 2 & 2 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ -1 & 0 \end{pmatrix}.$$
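The product rule can be tried out numerically. The following is only a minimal sketch, assuming matrices are stored as Python lists of rows; the name `mat_mul` is a hypothetical helper chosen for illustration. It computes each $(i, j)$th entry as the sum over $r$ of $a_{ir} b_{rj}$ and shows one pair of $2 \times 2$ matrices for which $AB$ and $BA$ differ.

```python
def mat_mul(A, B):
    """Product of an n x m matrix A and an m x k matrix B (lists of rows)."""
    n, m, k = len(A), len(B), len(B[0])
    if any(len(row) != m for row in A):
        raise ValueError("AB is only defined when A is n x m and B is m x k")
    # The (i, j)th entry is the sum over r of a_ir * b_rj.
    return [[sum(A[i][r] * B[r][j] for r in range(m)) for j in range(k)]
            for i in range(n)]


A = [[1, 2], [-1, 0]]
B = [[1, -1], [0, 1]]
AB = mat_mul(A, B)  # [[1, 1], [-1, 1]]
BA = mat_mul(B, A)  # [[2, 2], [-1, 0]]
```

Calling `mat_mul` with incompatible sizes raises an error, which mirrors the remark that the product only exists when the sizes agree.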

Even worse, it may be that $AB$ is defined but $BA$ is not. However, provided the following matrices are all of the correct size for the expressions to be defined, we have the following laws.

8. (Associativity.) $(AB)C = A(BC)$.
9. (Zero.) $0A = A0 = 0$.
10. (Identity.) $IA = AI = A$.
11. (Distributivity.) $A(B + C) = AB + AC$.
12. (Distributivity.) $(A + B)C = AC + BC$.

Square matrices will be particularly important in this book. We note here that if $A$, $B$ are $n \times n$ matrices, then $A + B$, $AB$ and $\lambda A$, $\lambda B$ are all $n \times n$ matrices; thus all of the twelve laws just given hold for $n \times n$ matrices $A$, $B$, $C$.

1.3 The inverse of a matrix

If $A$ is a square matrix, it may be that there is a matrix $B$ such that $AB = I$. When this happens, $B$ is called a right inverse of $A$. Similarly, it may be that there is a left inverse $C$ of $A$, satisfying $CA = I$. Not all matrices have inverses in this way, but it is an interesting (and not entirely straightforward) fact that the left and right inverses of a square matrix $A$, if they exist, are the same.

Fact 1.1 Let $A$ be an $n \times n$ matrix and suppose $B$ is either a left or a right inverse of $A$. Then $B$ is a two-sided inverse, i.e. $BA = AB = I$, and this inverse is unique, so if $C$ satisfies either $CA = I$ or $AC = I$, then $C = B$.

(See Exercise 1.15 for a way to prove this.) We denote this unique inverse of a square matrix $A$ when it exists by $A^{-1}$. If $A^{-1}$ exists we say $A$ is invertible.

Proposition 1.2 If $A$ and $B$ are invertible square matrices, then $AB$ is also invertible and $(AB)^{-1}$ equals $B^{-1}A^{-1}$.

Proof Using the associativity law,

$$(AB)(B^{-1}A^{-1}) = A(B(B^{-1}A^{-1})) = A((BB^{-1})A^{-1}) = A(IA^{-1}) = AA^{-1} = I.$$

Hence $B^{-1}A^{-1}$ is a right inverse of $AB$, and therefore is the unique inverse of $AB$. □

1.4 The transpose of a matrix

The transpose operation (which is notated with a $T$ sign) converts an $n \times m$ matrix to an $m \times n$ matrix as follows:

$$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ a_{31} & a_{32} & \cdots & a_{3m} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{pmatrix}^{T} = \begin{pmatrix} a_{11} & a_{21} & a_{31} & \cdots & a_{n1} \\ a_{12} & a_{22} & a_{32} & \cdots & a_{n2} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{1m} & a_{2m} & a_{3m} & \cdots & a_{nm} \end{pmatrix}.$$

Proposition 1.3 For $n \times n$ matrices $A$, $B$:
(a) $(AB)^T = B^T A^T$;
(b) if $A$ is invertible, then so is $A^T$, and $(A^T)^{-1} = (A^{-1})^T$.

Proof (a) A typical entry $c_{ji}$ of $C = AB$ is $\sum_{k=1}^{n} a_{jk} b_{ki}$, and this is the $(i, j)$th entry of $(AB)^T$. On the other hand, the $(i, j)$th entry of $B^T A^T$ is $\sum_{k=1}^{n} b_{ki} a_{jk}$, which is precisely $c_{ji}$.
(b) We use the fact that the inverse $B^{-1}$ of a matrix $B$, when it exists, is both a left and right inverse (i.e. $B^{-1}B = BB^{-1} = I$) and is unique. By part (a) we have $(A^{-1})^T A^T = (AA^{-1})^T = I^T = I$ so $(A^{-1})^T$ is an inverse of $A^T$. By uniqueness $(A^{-1})^T = (A^T)^{-1}$. □
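As a quick numerical illustration of the identity $(AB)^T = B^T A^T$, here is a small sketch. It assumes matrices are lists of rows; `transpose` and `mat_mul` are hypothetical helper names, and the two matrices are arbitrary choices, not taken from the text.

```python
def transpose(A):
    """Swap rows and columns: an n x m matrix becomes m x n."""
    return [list(col) for col in zip(*A)]


def mat_mul(A, B):
    """Matrix product of lists of rows; assumes compatible sizes."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]
lhs = transpose(mat_mul(A, B))             # (AB)^T
rhs = mat_mul(transpose(B), transpose(A))  # B^T A^T
```

Both `lhs` and `rhs` come out as `[[10, 20], [-3, -5]]`; note the reversed order of the factors on the right-hand side.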

1.5 Row and column operations

An elementary row operation is a basic type of operation on matrices; for example, 'add row $i$ to row $j$' is an elementary row operation. The three kinds of elementary row operations are:

1. $\rho_j := \rho_j + \lambda\rho_i$; 'add $\lambda$ times row $i$ to row $j$', for any number $\lambda$ and any rows $i$, $j$;
2. $\rho_i := \lambda\rho_i$; 'multiply row $i$ by $\lambda$', for any nonzero number $\lambda$ and any row $i$;
3. swap$(\rho_i, \rho_j)$; 'swap rows $i$ and $j$', for any two rows $i$, $j$.

(Note: the Greek letter $\rho$ ('rho') sounds almost the same as the English word 'row'!) Each of these operations corresponds to multiplying on the left by a certain matrix. For example, for row operations on $3 \times n$ matrices,

1. $\rho_3 := \rho_3 + 2\rho_2$ corresponds to left-multiplying by $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{pmatrix}$;
2. $\rho_2 := \lambda\rho_2$ corresponds to left-multiplying by $\begin{pmatrix} 1 & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & 1 \end{pmatrix}$; and
3. swap$(\rho_1, \rho_2)$ corresponds to left-multiplying by $\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$.

A row operation is a combination of elementary row operations, performed consecutively, and so, by associativity, is equivalent to multiplication on the left by a product of matrices of the forms above.
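The correspondence between elementary row operations and left multiplication can be checked directly. The sketch below is illustrative only: it assumes lists of rows with 0-based row indices (so row 3 of the text is index 2), and the names `add_multiple`, `scale`, `swap` and the test matrix `B` are hypothetical choices.

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def add_multiple(n, j, i, lam):
    """Matrix of rho_j := rho_j + lam * rho_i (0-based indices, i != j)."""
    E = identity(n)
    E[j][i] = lam
    return E


def scale(n, i, lam):
    """Matrix of rho_i := lam * rho_i, for nonzero lam."""
    E = identity(n)
    E[i][i] = lam
    return E


def swap(n, i, j):
    """Matrix of swap(rho_i, rho_j)."""
    E = identity(n)
    E[i], E[j] = E[j], E[i]
    return E


def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


B = [[1, 2], [3, 4], [5, 6]]
added = mat_mul(add_multiple(3, 2, 1, 2), B)  # third row becomes row3 + 2*row2
scaled = mat_mul(scale(3, 1, 10), B)          # second row multiplied by 10
swapped = mat_mul(swap(3, 0, 1), B)           # first two rows exchanged
```

Each elementary matrix is obtained by applying the operation to the identity matrix, which is why left-multiplying by it reproduces the operation on any matrix with the right number of rows.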

Exercise 1 . 1 Check your understanding by calculating AB directly for B

=

(;

2 3 -1 1 -2 3 1

and for each matrix A in 1-3 above.

By combining elementary row operations, we can obtain row operations to perform several useful transformations of matrices.

Clearing the first column. Our first basic technique using row operations converts any matrix $A$ to another matrix $B$ of the form

$$\begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ 0 & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & & \vdots \\ 0 & b_{m2} & \cdots & b_{mn} \end{pmatrix}$$

so all but possibly the first entry in the first column is zero. We can furthermore arrange that $b_{11}$ is either $1$ or $0$.

1. If all entries in the first column are already zero there is nothing to do.
2. Else use the swap operation to arrange that the top left entry $a_{11}$ in the matrix is nonzero.
3. Optionally, use $\rho_1 := \lambda\rho_1$ where $\lambda = 1/a_{11}$ to ensure that the top left entry is one.
4. Now use operation $\rho_j := \rho_j + \mu\rho_1$ for each $j \geq 2$ and for suitable values of $\mu$ to ensure the rest of the first column is zero.

(

25 9)

) 2

Exercise 1 . 2 Apply this method to convert the following. (b)

1 0 -1 0 -1 1

1

10 11 . 1

1 1

Echelon form. A matrix is in echelon form if it contains no adjacent rows of the form

$$\begin{matrix} 0 & 0 & \cdots & 0 & x_1 & x_2 & \cdots \\ 0 & 0 & \cdots & 0 & y_1 & y_2 & \cdots \end{matrix}$$

with $y_1 \neq 0$ (irrespective of what $x_1$ is). In other words, for a matrix in echelon form, each row starts with a sequence of zeros, and the number of zeros in this initial sequence increases as you go from one row to the next row beneath it until we get to the very last row, or until all other rows are entirely zero.

(

2 24)

)

Exercise 1.3 Decide which of the following are in echelon form. Give reasons. (a)

0 1 0 0 0 0

(b)

(d)

(�

G

0 1 0 0 1 0 0 0

�

(c)

�

1

3

(� 2 2 2 1

1

1

0 0 3

1

Converting to echelon form. Any matrix can be converted to echelon form by row operations. The procedure is as follows.

1. Apply 'clearing the first column' (ignoring any initial columns of zeros) until the matrix is of the form

$$\begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ 0 & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & & \vdots \\ 0 & b_{m2} & \cdots & b_{mn} \end{pmatrix}.$$

2. Now put the matrix

$$\begin{pmatrix} b_{22} & \cdots & b_{2n} \\ \vdots & & \vdots \\ b_{m2} & \cdots & b_{mn} \end{pmatrix}$$

into echelon form by 'clearing the first column' of this matrix (again, ignoring any initial column of zeros). Note that row operations on this matrix correspond to row operations on the original. Continue in this way until the whole matrix is in echelon form.

If the 'optional' step in 'clearing the first column' above is applied each time, this method gives echelon form in which the first nonzero entry in each row is $1$.
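The procedure just described can be sketched in code. This is a minimal illustration rather than a definitive implementation: it assumes a matrix given as a list of rows, uses exact `Fraction` arithmetic, always applies the 'optional' scaling step, and the function name `echelon` is hypothetical.

```python
from fractions import Fraction


def echelon(A):
    """Return an echelon form of A whose leading nonzero entries are all 1."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    top = 0  # rows above `top` are already finished
    for c in range(cols):
        # Look for a nonzero entry in column c, at or below row `top`.
        pivot = next((r for r in range(top, rows) if M[r][c] != 0), None)
        if pivot is None:
            continue  # column of zeros here: ignore it and move right
        M[top], M[pivot] = M[pivot], M[top]        # swap operation
        lead = M[top][c]
        M[top] = [x / lead for x in M[top]]        # optional step: leading 1
        for r in range(top + 1, rows):
            mu = -M[r][c]
            # rho_r := rho_r + mu * rho_top clears the entry below the pivot
            M[r] = [x + mu * y for x, y in zip(M[r], M[top])]
        top += 1
        if top == rows:
            break
    return M


E1 = echelon([[1, 1, 2, -1], [-1, 0, 1, -1], [-1, 1, 4, -3]])
E2 = echelon([[0, 0, 2], [0, 3, 1]])
```

Here `E1` has rows `1 1 2 -1`, `0 1 3 -2` and a final row of zeros, while `E2` shows the effect of skipping an initial zero column and swapping rows.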

Exercise 1 . 4 Use row operations to put the following into echelon form . ( a)

G �) G 5 i) (! D c� 1 1 1

( d)

(

1 0 0

b)

1 1 3 4 4

2 1 3

(c)

-1

( 0 1

c)

1

2

G

2 3

- 2 -3 -4

2 3 2 3 0

1

-5�)

D

Rank. If $A$ can be converted to the echelon form matrix $B$ using row operations, and $B$ has exactly $k$ nonzero rows, then the rank of $A$, $\operatorname{rk} A$ (sometimes called the row-rank), is $k$.

Exercise 1.5 Say what the ranks of the matrices in Exercise 1.4 are. For further practice, calculate the ranks of the matrices in Exercises 1.2 and 1.3 also.

It is not at all obvious that this notion of rank is well-defined. In other words, it is not obvious whether or not there can be two different sequences of row operations converting $A$ into echelon forms $B$ and $C$ respectively, where $B$ has a different number of nonzero rows than $C$. In fact this is impossible, but we will defer a proper discussion of this until Chapter 8.

Converting to the identity matrix. If a matrix $A$ is $n \times k$ (i.e. has $n$ rows and $k$ columns) where $n \leq k$, then the leftmost $n \times n$ block of $A$ can sometimes be converted to the identity matrix using row operations. In fact, this can always be done if $A$ has echelon form

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & a_{1\,n+1} & \cdots & a_{1k} \\ 0 & a_{22} & a_{23} & \cdots & a_{2n} & a_{2\,n+1} & \cdots & a_{2k} \\ 0 & 0 & a_{33} & \cdots & a_{3n} & a_{3\,n+1} & \cdots & a_{3k} \\ \vdots & \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn} & a_{n\,n+1} & \cdots & a_{nk} \end{pmatrix}$$

with each of the diagonals $a_{jj} \neq 0$, or in other words, if $A$ has rank $n$.

1. By performing the row operation $\rho_i := (1/a_{ii})\rho_i$ for $i = 1, \ldots, n$ if necessary, convert the above echelon form to a similar one where the diagonal entries $a_{ii}$ are all $1$.
2. Now, starting at column $n$ and then working backwards to column $n - 1$, $n - 2$, up to column $1$, clear all nondiagonal entries in this column by carrying out row operations as follows:
   (a) for column $n$, use operations $\rho_i := \rho_i + \mu\rho_n$ for $i = 1, 2, \ldots, n - 1$, and suitable values of $\mu$ in each case;
   (b) for column $n - 1$, use operations $\rho_i := \rho_i + \mu\rho_{n-1}$ for $i = 1, 2, \ldots, n - 2$;
   (c) and so on.

This gives a form

$$\begin{pmatrix} 1 & 0 & 0 & \cdots & 0 & b_{1\,n+1} & \cdots & b_{1k} \\ 0 & 1 & 0 & \cdots & 0 & b_{2\,n+1} & \cdots & b_{2k} \\ 0 & 0 & 1 & \cdots & 0 & b_{3\,n+1} & \cdots & b_{3k} \\ \vdots & \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 & b_{n\,n+1} & \cdots & b_{nk} \end{pmatrix}$$

for the matrix.

Exercise 1.6 Where applicable, apply this method to the matrices in Exercises 1.2, 1.3, and 1.4.

Calculating the inverse. If $A$ is an $n \times m$ matrix and $B$ is an $n \times k$ matrix, the augmented matrix $(A|B)$ is the $n \times (m + k)$ matrix you get by writing down the entries of $A$ and $B$ next to one another in the obvious way. To compute the inverse of an $n \times n$ square matrix $A$, apply row operations to the augmented matrix $(A|I)$, where $I$ is the $n \times n$ identity matrix, to get $(I|B)$ for some matrix $B$, as described in the last section. This is not always possible, but as we have already seen it will be possible if $A$ can be converted to some echelon form with $n$ nonzero rows, i.e. if $A$ has rank $n$. Then $A^{-1} = B$.

For example, starting with

A=

( i i -�) -1 0

1

and following the above procedure we can get row operations converting

u

3 1 0 0 2 1 -1 0 1 0 1 0 0 1 0

(Try it! ) Thus

)

to

(

' 0 0 0 1 0 0 0 1

)

1/4 - 1 /2 -5/2 1 1 . 0 1 /4 - 1/2 - 1 /4

)

u l' c�

- 1/2 -5/2 1 1 . 1 /4 - 1/2 - 1/4

2 1 -1 1 0

To see why the method works, recall that each elementary row operation corresponds to multiplying on the left by an elementary row operation matrix. So applying several row operations to $(A|I)$ corresponds to multiplying on the left by a product $R = R_1 R_2 \ldots R_k$ of row operation matrices. If the result of these operations is $(I|B)$ then by associativity of matrix multiplication $R(A|I) = (I|B)$, i.e. $RA = I$ and $RI = B$. In other words, $R = B$ and $B$ is a left inverse of $A$, and hence by Fact 1.1 it is the inverse of $A$. We saw that this method will always find the inverse of an $n \times n$ matrix $A$ if $A$ has maximum possible rank $n$. It turns out $A^{-1}$ exists if and only if $\operatorname{rk} A = n$, so this method will always succeed whenever $A$ is invertible. The ideas here can be used to prove Fact 1.1. See Exercise 1.15.
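The method of reducing $(A|I)$ to $(I|B)$ can be sketched as follows. This is an illustrative variant under stated assumptions: matrices are lists of rows, arithmetic is exact via `Fraction`, the name `inverse` is hypothetical, and entries above and below each pivot are cleared in a single pass rather than in the two separate phases described in the text.

```python
from fractions import Fraction


def inverse(A):
    """Invert a square matrix by row-reducing the augmented matrix (A | I)."""
    n = len(A)
    # Build (A | I) with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            raise ValueError("rank is less than n, so no inverse exists")
        M[c], M[pivot] = M[pivot], M[c]            # swap
        lead = M[c][c]
        M[c] = [x / lead for x in M[c]]            # make the diagonal entry 1
        for r in range(n):
            if r != c:
                mu = -M[r][c]
                # rho_r := rho_r + mu * rho_c clears column c in row r
                M[r] = [x + mu * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]                  # right-hand block is B


B = inverse([[2, 1], [5, 3]])  # [[3, -1], [-5, 2]]
```

For a matrix of rank less than $n$, such as `[[1, 2], [2, 4]]`, the sketch raises `ValueError` because no pivot can be found in some column.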

Exercise 1. 7 Use this method to calculate the inverses of the following matrices. ( a)

(� I) 1 0 1 1 0 1 0 0

(b )

H -�) 0 1 -1

( c)

u �) 2 -1 2

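Solving linear equations. Row operations are commonly used to solve simultaneous linear equations, and we may illustrate the method here with an example. To solve

$$\begin{aligned} x + y + 2z &= -1 \\ -x + z &= -1 \\ -x + y + 4z &= -3, \end{aligned}$$

first put the equations in matrix form

$$\begin{pmatrix} 1 & 1 & 2 \\ -1 & 0 & 1 \\ -1 & 1 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} -1 \\ -1 \\ -3 \end{pmatrix}$$

and then put the augmented matrix formed from the matrix on the left with the column vector on the right into echelon form:

$$\left(\begin{array}{ccc|c} 1 & 1 & 2 & -1 \\ -1 & 0 & 1 & -1 \\ -1 & 1 & 4 & -3 \end{array}\right) \to \left(\begin{array}{ccc|c} 1 & 1 & 2 & -1 \\ 0 & 1 & 3 & -2 \\ 0 & 2 & 6 & -4 \end{array}\right) \to \left(\begin{array}{ccc|c} 1 & 1 & 2 & -1 \\ 0 & 1 & 3 & -2 \\ 0 & 0 & 0 & 0 \end{array}\right).$$

The full solution can now be read off directly:

$$\begin{aligned} x + y + 2z &= -1 \\ y + 3z &= -2 \\ 0 &= 0, \end{aligned}$$

so $z$ may be anything, $y = -2 - 3z$, and $x = -1 - 2z - y = z + 1$. This method works for any number of simultaneous equations in any number of unknowns, and always gives the most general solution.

Of course, a system of simultaneous linear equations may not have any solution at all. The following is useful in this regard.

Fact 1.4 The equation

$$A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = b$$

where $A$ is a $k \times n$ matrix and $b$ is a $k \times 1$ column vector has at least one solution if and only if $\operatorname{rk}(A) = \operatorname{rk}(A|b)$. It has exactly one solution if and only if $\operatorname{rk}(A) = \operatorname{rk}(A|b) = n$.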
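The rank criterion of Fact 1.4 lends itself to a short computational check. The sketch below is a hedged illustration: it assumes real (here rational) entries stored as lists of rows, counts pivots found during row reduction as the rank, and uses the hypothetical names `rank` and `classify`. The phrase 'infinitely many' relies on the field being infinite, as it is for the real numbers.

```python
from fractions import Fraction


def rank(A):
    """Number of nonzero rows in an echelon form of A."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    top = 0
    for c in range(cols):
        pivot = next((r for r in range(top, rows) if M[r][c] != 0), None)
        if pivot is None:
            continue
        M[top], M[pivot] = M[pivot], M[top]
        for r in range(top + 1, rows):
            mu = -M[r][c] / M[top][c]
            M[r] = [x + mu * y for x, y in zip(M[r], M[top])]
        top += 1
        if top == rows:
            break
    return top


def classify(A, b):
    """Compare rk(A) with rk(A|b), as in Fact 1.4."""
    n = len(A[0])                                   # number of unknowns
    augmented = [row + [bi] for row, bi in zip(A, b)]
    if rank(A) != rank(augmented):
        return "no solution"
    return "exactly one solution" if rank(A) == n else "infinitely many solutions"


A = [[1, 1, 2], [-1, 0, 1], [-1, 1, 4]]
consistent = classify(A, [-1, -1, -3])
inconsistent = classify(A, [-1, -1, 0])
```

With the right-hand side $(-1, -1, -3)$ the ranks agree at $2$, which is less than the number of unknowns, so there is a one-parameter family of solutions; changing the last entry to $0$ raises the rank of the augmented matrix to $3$ and no solution exists.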

Column operations. These are analogous to row operations except that they operate on columns instead of rows. In this book, the elementary column operations are denoted by $\kappa_j := \kappa_j + \lambda\kappa_i$, $\kappa_i := \lambda\kappa_i$ (for $\lambda \neq 0$), and swap$(\kappa_i, \kappa_j)$, in exact analogy with the notation for the row operations. Here, $\kappa_i$ is used to denote the $i$th column of a matrix, just as $\rho_i$ denoted the $i$th row. Column operations correspond to multiplying on the right by special column operation matrices, just as row operations correspond to left-multiplication by row operation matrices. In fact, column operations are not used as much as row operations in practice, and they will appear here only occasionally.

1.6 Determinant and trace

The determinant and trace operations take a square matrix $A$ and return a number, $\det A$ or $\operatorname{tr} A$. The trace operation is the simplest to calculate, as it is just the sum of the diagonal entries of the matrix. Thus if $A = (a_{ij})$ is an $n \times n$ matrix, the trace of $A$ is defined to be

$$\operatorname{tr} A = \operatorname{tr} \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} = a_{11} + a_{22} + \cdots + a_{nn} = \sum_{i=1}^{n} a_{ii}.$$

The significance of this operation is not at all clear from this definition. However, note that it has the obvious property that for $n \times n$ matrices $A$ and $B$, $\operatorname{tr}(A + B) = \operatorname{tr} A + \operatorname{tr} B$. One much less obvious property that will play an important role later is that for an invertible $n \times n$ matrix $P$ and any $n \times n$ matrix $A$, $\operatorname{tr}(P^{-1}AP) = \operatorname{tr} A$.

At this stage at least, the determinant of $A$ will seem just as mysterious. Its definition is an inductive one. For $1 \times 1$ matrices $A = (a_{11})$, we define

$$\det(a_{11}) = a_{11}, \qquad (1)$$

i.e. the number which is the only entry of the matrix $A$. For an $n \times n$ matrix $A = (a_{ij})$, we denote $\det A$ by the matrix $A$ with vertical lines round it, and define

$$\det A = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} = a_{11} \begin{vmatrix} a_{22} & \cdots & a_{2n} \\ \vdots & & \vdots \\ a_{n2} & \cdots & a_{nn} \end{vmatrix} - a_{12} \begin{vmatrix} a_{21} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n3} & \cdots & a_{nn} \end{vmatrix} + \cdots + (-1)^{n+1} a_{1n} \begin{vmatrix} a_{21} & \cdots & a_{2\,n-1} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{n\,n-1} \end{vmatrix}.$$

(This is sometimes called 'expansion by the first row'.) The rule is to take the entries in the top row in turn, with alternating signs, and multiply them by the determinants formed by deleting the row and column of the entry in question. This gives an expression for the $n \times n$ determinant in terms of $(n - 1) \times (n - 1)$ determinants, and the $(n - 1) \times (n - 1)$ determinants are evaluated by the same rule repeatedly until we get down to $1 \times 1$ determinants which are evaluated using (1). For example, applying this to the $2 \times 2$ case we have

$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc,$$

which should be memorized. In the $3 \times 3$ case we have

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}).$$
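The inductive definition translates almost word for word into a recursive function. The following sketch assumes matrices are lists of rows; `minor`, `det`, and `trace` are hypothetical helper names and the example matrices are arbitrary. Expansion by the first row takes on the order of $n!$ steps, so this is meant for illustration on small matrices only.

```python
def minor(A, i, j):
    """Delete row i and column j of A (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Determinant by expansion along the first row."""
    if len(A) == 1:
        return A[0][0]                      # the 1 x 1 case, rule (1)
    # Entries of the top row in turn, with alternating signs.
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j))
               for j in range(len(A)))


def trace(A):
    """Sum of the diagonal entries."""
    return sum(A[i][i] for i in range(len(A)))


A = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
d2 = det([[1, 2], [3, 4]])   # 1*4 - 2*3 = -2
d3 = det(A)                  # 2*(12 + 5) - 0 + 1*(5 - 0) = 39
t3 = trace(A)                # 2 + 3 + 4 = 9
```

The comments on the last three lines spell out the same arithmetic as the $2 \times 2$ and $3 \times 3$ formulas above.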

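Determinants often have a physical interpretation as areas or volumes. For example, the determinant

$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc$$

has magnitude equal to the area of the parallelogram with corners given by position vectors $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$, $\begin{pmatrix} a \\ c \end{pmatrix}$, $\begin{pmatrix} b \\ d \end{pmatrix}$, and $\begin{pmatrix} a + b \\ c + d \end{pmatrix}$.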

The first point of significance of determinants for matrix operations is the following.

Fact 1.5 For an $n \times n$ matrix $A$, the following are equivalent:
(a) $A^{-1}$ exists;
(b) $\det A \neq 0$;
(c) $\operatorname{rk} A = n$.

There are various useful rules to help calculate determinants. Firstly, for some matrices it is particularly easy to compute determinants. We say a matrix $A = (a_{ij})$ is upper triangular if $a_{ij} = 0$ for $i > j$; in other words the nonzero entries are all on or above the principal diagonal of the matrix so that

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22} & a_{23} & \cdots & a_{2n} \\ 0 & 0 & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn} \end{pmatrix}.$$

Fact 1.6 The determinant of an upper triangular matrix $A$ is equal to the product of its diagonal entries.

We also have

Fact 1.7 Let $A$, $B$ be $n \times n$ matrices and $I$ the $n \times n$ identity matrix. Then:
(a) $\det A = \det(A^T)$;
(b) $\det(AB) = \det A \det B$;
(c) if $A$ is invertible, $\det(A^{-1}) = (\det A)^{-1}$;
(d) $\det I = 1$;
(e) if $A$ has a row or column which is entirely zero, then $\det A = 0$.

Fact 1.7, part (b), is the central fact concerning determinants, and the key to proving all the results concerning determinants mentioned here. A proof of it is outlined in Exercise 1.20 below. It also suggests an alternative way to calculate determinants: instead of using the definition directly, we can apply row operations to get our matrix in echelon form, and then compute the determinant of this (using Fact 1.6) and also the determinants of the row operation matrices used to get to this form. In fact, the determinants of the basic row operation matrices are easy to calculate.

Fact 1.8
(a) If $S_{i,j}$ is the matrix for the row operation swap$(\rho_i, \rho_j)$ where $i \neq j$, then $\det S_{i,j} = -1$.
(b) If $T_{i,\lambda}$ is the matrix for the row operation $\rho_i := \lambda\rho_i$ where $\lambda \neq 0$, then $\det T_{i,\lambda} = \lambda$.
(c) If $A_{i,j,\lambda}$ is the matrix for the row operation $\rho_i := \rho_i + \lambda\rho_j$ where $i \neq j$, then $\det A_{i,j,\lambda} = 1$.

From the last two basic facts, all the usual rules for evaluating determinants can be deduced.
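The alternative way of calculating determinants suggested above can be sketched as follows. This is only an illustration under stated assumptions: lists of rows, exact `Fraction` arithmetic, and a hypothetical function name `det_by_row_ops`. It uses only swaps (each of determinant $-1$) and operations of the form $\rho_r := \rho_r + \mu\rho_c$ (each of determinant $1$), then multiplies the diagonal entries of the resulting upper triangular matrix.

```python
from fractions import Fraction


def det_by_row_ops(A):
    """Determinant via reduction to upper triangular form."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    sign = 1
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)       # a zero will remain on the diagonal
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            sign = -sign             # each swap multiplies det by -1
        for r in range(c + 1, n):
            mu = -M[r][c] / M[c][c]
            # Adding a multiple of another row leaves det unchanged.
            M[r] = [x + mu * y for x, y in zip(M[r], M[c])]
    product = Fraction(1)
    for i in range(n):
        product *= M[i][i]           # product of the diagonal entries
    return sign * product


d = det_by_row_ops([[2, 0, 1], [1, 3, -1], [0, 5, 4]])  # 39
```

For larger matrices this takes on the order of $n^3$ arithmetic operations, which is why it is usually preferred to expanding by a row.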

Proposition 1.9 If we swap any two rows in a determinant

$$\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}$$

the determinant changes sign (i.e. is multiplied by $-1$). In particular, if the matrix $A = (a_{ij})$ has two identical rows then $\det A = 0$.

Proof Use Fact 1.7 part (b) and Fact 1.8 part (a). If $A$ has two identical rows, then swapping them does not change $A$, so $\det A = -\det A$ and hence $\det A = 0$. □

Proposition 1.10 Multiplying any single row of a determinant by $\lambda$ multiplies the determinant by $\lambda$.

Proof Fact 1.7 part (b) and Fact 1.8 part (b). □

Be careful here: only one row of the matrix is multiplied by $\lambda$ to multiply the determinant by $\lambda$. This is in contrast with scalar multiplication of matrices, $\lambda A$, where every row is multiplied by $\lambda$. In fact, if $A$ is an $n \times n$ matrix, $\det(\lambda A) = \lambda^n \det A$.

Proposition 1.11 Applying any row operation $\rho_i := \rho_i + \lambda\rho_j$ ($i \neq j$) to a matrix $A$ leaves $\det A$ unchanged.

Proof Fact 1.7 part (b) and Fact 1.8 part (c). □

Proposition 1.12 A determinant can be expanded by any row, provided you remember that the sign associated with an entry $a_{ij}$ is $(-1)^{i+j}$.

Proof This can be derived from the definition, Fact 1.7 part (b) and Fact 1.8 part (a). □

Proposition 1.13 Any of the above rules for evaluating determinants for rows and row operations applies equally to columns and column operations.

Proof Fact 1.7 part (a). □

For example, a determinant with two identical columns is zero, just as one with two identical rows.

1.7 Minors and cofactors

Determinants are also used to transform an $n \times n$ square matrix $A = (a_{ij})$ to another matrix as follows. Given

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix},$$

define $b_{ij}$ to be the determinant of the $(n - 1) \times (n - 1)$ matrix obtained by deleting the $i$th row and the $j$th column of $A$. Then the matrix $B = (b_{ij})$ is called the matrix of minors of $A$. Now define $c_{ij} = (-1)^{i+j} b_{ij}$. In other words, $c_{ij}$ is the determinant of the $(n - 1) \times (n - 1)$ submatrix of $A$ with the sign correction you would use when evaluating $\det A$ by the $i$th row (or by the $j$th column). The number $c_{ij}$ is called the $(i, j)$th cofactor of $A$, and the matrix $C = (c_{ij})$ is called the matrix of cofactors of $A$. The transpose of this, $C^T$, is called the adjugate matrix of $A$, written $\operatorname{adj} A$. This matrix $C^T = \operatorname{adj} A$ has the useful property that

$$(\operatorname{adj} A)A = A(\operatorname{adj} A) = \det(A)\,I. \qquad (2)$$

Thus, in principle at least, determinants give another way of calculating inverses: if $\det A \neq 0$ then

$$A^{-1} = \frac{1}{\det A} \operatorname{adj} A.$$

In practice, this method is only used for particularly simple matrices, including all $2 \times 2$ matrices, where we have

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$
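The construction of the adjugate and property (2) can be checked on small examples. The sketch below is illustrative only, assuming lists of rows and integer entries; `minor`, `det`, `adjugate`, `inverse_via_adjugate`, and `mat_mul` are hypothetical helper names, and `det` expands by the first row (with the empty determinant taken to be $1$ so that $1 \times 1$ matrices also work).

```python
from fractions import Fraction


def minor(A, i, j):
    """Delete row i and column j of A (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    if not A:
        return 1                     # convention for the empty determinant
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j))
               for j in range(len(A)))


def adjugate(A):
    n = len(A)
    # Matrix of cofactors c_ij = (-1)^(i+j) * b_ij ...
    cof = [[(-1) ** (i + j) * det(minor(A, i, j)) for j in range(n)]
           for i in range(n)]
    # ... and the adjugate is its transpose.
    return [list(col) for col in zip(*cof)]


def inverse_via_adjugate(A):
    d = det(A)
    if d == 0:
        raise ValueError("det A = 0, so A has no inverse")
    return [[Fraction(x, d) for x in row] for row in adjugate(A)]


def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


A = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
check = mat_mul(adjugate(A), A)      # should equal det(A) * I
small = adjugate([[1, 2], [3, 4]])   # [[4, -2], [-3, 1]]
```

For the $3 \times 3$ example, `check` is $39$ times the identity matrix, in agreement with (2).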

In most other cases, it is usually simpler to calculate inverses using row operations.

Exercises

Exercise 1.8 To show you understand the definition of 'echelon form', give an example of an upper triangular matrix which is not in echelon form.

Exercise 1.9 Let $A$ be an $m \times n$ matrix and let $e_i$ be the $n \times 1$ column matrix with $i$th entry equal to $1$ and all other entries $0$. Show that $Ae_i$ is equal to the $i$th column of $A$.

Exercise 1.10 Using the definitions of the matrix operations directly, verify laws 1-3 for matrix addition, 1-4 for scalar multiplication, and 1-5 for matrix multiplication.

Exercise 1.11 In each case, determine if the system of equations has a solution, and if so give the most general solution. (a) x + 2y + z = 2 -1 -x + 2y 2y + 2z = 7 z = 1 X y 1 2x - y 2x + 2z = 1

5x (b)

(c)

X + 2y + Z X+ y+Z +z -X

1 1 1

Exercise 1 . 1 2 Do the same as the last exercise for the following. x + 2y - z + w = 1 (a) 2x + y + z = 2 (b)

X+y- Z y + z x + 2z x - y + 5z

(c)

1 2 1 1

x + 2y + 3z + -X + y + Zx + 5y + 7z +

w W

w

-1 2 1

Exercise 1 . 13 Calculate the adjugate of each of the matrices of Exercise 1 .7 and in each case verify that equation (2) on page 15 holds. Exercise 1 . 14 Calculate the adjugate of each of the following m atrices A. (b)

(-�

0

. - �)

2 1 0 1 2 -1 -1 1 1

Exercise 1.15 This exercise proves Fact 1.1 using the idea of row operations, and is for ambitious students only.
(a) For each $n \times n$ row operation matrix $R$, show directly that $R$ has an inverse, i.e. some $S$ with $SR = RS = I$.
(b) Show that if $R = R_1 R_2 \ldots R_k$ is a product of elementary row operation matrices then $R$ does not have a row that is entirely zero. [Hint: use (a) and associativity of matrix multiplication.]
(c) Suppose $A$, $B$ are $n \times n$ matrices with $AB = I$. Prove that there is some $C$ with $CA = I$. [Hint: let $R = R_1 R_2 \ldots R_k$ be row operation matrices so that $RA$ is in echelon form. By multiplying on the right by $B$, show that $A$ has rank $n$.]
(d) Considering transposes and part (c), show that if $A$, $B$ are square matrices with $BA = I$ then there is a matrix $C$ with $AC = I$.
(e) Show that if $CA = AB = I$ then $C = B$. [Hint: what is $CAB$?] Similarly, show that if $CA = BA = I$ then $C = B$.

Exercise 1.16 Show that any square matrix $A$ is the product $R_1 R_2 \cdots R_k$ of 'generalized' elementary row operation matrices, where 'generalized' means that the $\lambda$ in each of the row operations $\rho_i := \lambda\rho_i$ and $\rho_i := \rho_i + \lambda\rho_j$ is now allowed to be zero. [Hint: modify the 'converting to the identity matrix' method above to find ordinary elementary row operations $R_i$ and generalized elementary row operations $S_j$ such that $R_1 \cdots R_k A = S_1 \cdots S_l$.]

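Exercise 1.17 Show directly from the definition of determinant that

\[
\begin{vmatrix}
a_{11} + \lambda b_{11} & \cdots & a_{1n} + \lambda b_{1n} \\
a_{21} & \cdots & a_{2n} \\
\vdots & & \vdots \\
a_{n1} & \cdots & a_{nn}
\end{vmatrix}
=
\begin{vmatrix}
a_{11} & \cdots & a_{1n} \\
a_{21} & \cdots & a_{2n} \\
\vdots & & \vdots \\
a_{n1} & \cdots & a_{nn}
\end{vmatrix}
+ \lambda
\begin{vmatrix}
b_{11} & \cdots & b_{1n} \\
a_{21} & \cdots & a_{2n} \\
\vdots & & \vdots \\
a_{n1} & \cdots & a_{nn}
\end{vmatrix}.
\]

Deduce that the determinant of a matrix $A$ whose top two rows are equal is zero.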

Exercise 1.18 Show that the matrix of any generalized elementary row operation (in the sense of Exercise 1.16) can always be written as one of $S_i T_j R T_j S_i$, $S_i R S_i$, $T_j R T_j$, or $R$, where $R$ is the matrix of a generalized row operation acting on the first two rows only, and $S_i$, $T_j$ are the matrices of 'swap' operations $\mathrm{swap}(\rho_1, \rho_i)$ and $\mathrm{swap}(\rho_2, \rho_j)$ respectively.

Exercise 1.19 Prove by induction on $n$ that if $A$, $B$ are $n \times n$ matrices with $B$ obtained from $A$ by the operation $\mathrm{swap}(\rho_i, \rho_j)$, where $1 \le i < j \le n$, then $\det B = -\det A$. [Hint: if $i = 1$ and $j = 2$, expand $\det B$ twice, using the definition of determinant. Otherwise, use the induction hypothesis.]

Exercise 1.20 Using Exercises 1.16, 1.17, 1.18, and 1.19, show that $\det(AB) = \det A \det B$ for all $n \times n$ matrices $A$, $B$.

Exercise 1.21 Using Exercises 1.20 and 1.16, or otherwise, prove the following for an $n \times n$ matrix $A$.
(a) $\operatorname{rk} A = n$ if and only if $\det A \ne 0$.
(b) $\det A^T = \det A$.
(c) If $A$ is upper triangular then $\det A$ is the product of its diagonal entries.

Exercise 1.22 A permutation $\sigma$ of $\{1, 2, \ldots, n\}$ is a bijection

\[ \sigma : \{1, 2, \ldots, n\} \to \{1, 2, \ldots, n\}. \]

The set of all such permutations is denoted $S_n$. The matrix $M_\sigma$ of the permutation $\sigma$ is the $n \times n$ matrix with 1 in the $(\sigma(i), i)$th position for each $i = 1, 2, \ldots, n$, and all other entries equal to 0. The sign of the permutation $\sigma$, $\operatorname{sgn}(\sigma)$, is defined to be $\det M_\sigma$.
(a) Show that $M_\sigma e_i = e_{\sigma(i)}$ for all $i$, where $e_i$ is the $n \times 1$ column matrix of Exercise 1.9. Hence show that $M_\sigma M_\pi = M_{\sigma \circ \pi}$, where $\sigma \circ \pi \in S_n$ is the permutation defined by $\sigma \circ \pi(i) = \sigma(\pi(i))$.
(b) Show that $\operatorname{sgn}(\sigma) \in \{1, -1\}$ for all $\sigma \in S_n$.
(c) Calculate the signs of the permutations $\sigma, \pi \in S_3$ given by

\[ \sigma(1) = 2, \quad \sigma(2) = 1, \quad \sigma(3) = 3, \qquad \pi(1) = 2, \quad \pi(2) = 3, \quad \pi(3) = 1. \]

(d) Verify the formula

\[
\begin{vmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{vmatrix}
= a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{12}a_{21}a_{33} - a_{13}a_{22}a_{31} - a_{11}a_{23}a_{32}
= \sum \operatorname{sgn}(\sigma)\, a_{1\sigma(1)} a_{2\sigma(2)} a_{3\sigma(3)}
\]

where the summation is over all six permutations $\sigma$ of $\{1, 2, 3\}$.

Exercise 1.23 Show by induction on $n$ that the formula

\[ \det A = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)} \]

holds for all $n \in \mathbb{N}$.

2 Vector spaces

In this chapter we will review material concerning vector spaces. Some of this (but possibly not all of it) may already be familiar to you. This chapter is very important, however, because it sets out the style in which we shall study linear algebra, how we shall organize the text, and why it is particularly useful to do so in this way.

2.1 Examples and axioms

We start our study of vector spaces with some examples.

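Example 2.1 Let $\mathbb{R}^3$ be the set of all column vectors

\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \]

with entries $x, y, z$ from the real numbers $\mathbb{R}$. We can add two such vectors by the rule

\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} + \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} x + x' \\ y + y' \\ z + z' \end{pmatrix}. \]

It follows that when we add $\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$ to a vector $\mathbf{v}$ we get $\mathbf{v}$ again, just like $0 + r = r$ for all real numbers $r$. So $\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$ behaves just like 0 and is called the zero vector, denoted $\mathbf{0}$. We can also multiply a vector $\mathbf{v}$ with a real number $r$ by the rule

\[ r \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} rx \\ ry \\ rz \end{pmatrix}. \]

The operations of addition and multiplication by real numbers $r$ are related. For example, it turns out that $2\mathbf{v}$ is equal to $\mathbf{v} + \mathbf{v}$ for all vectors $\mathbf{v}$. More generally, the distributivity law, $(r + s)\mathbf{v} = (r\mathbf{v}) + (s\mathbf{v})$, holds, since

\[
(r + s) \begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} (r + s)x \\ (r + s)y \\ (r + s)z \end{pmatrix}
= \begin{pmatrix} rx + sx \\ ry + sy \\ rz + sz \end{pmatrix}
= \begin{pmatrix} rx \\ ry \\ rz \end{pmatrix} + \begin{pmatrix} sx \\ sy \\ sz \end{pmatrix}.
\]

Various other laws like this one hold too, as you will see in a moment.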
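The componentwise rules of Example 2.1 can be tried out on a computer. The following is only a minimal sketch: the helper names `add` and `scale` are hypothetical, vectors are modelled as Python tuples, and exact rational entries (`fractions.Fraction`) are assumed so that equality tests are not disturbed by rounding.

```python
from fractions import Fraction

def add(u, v):
    """Componentwise sum of two vectors of the same length."""
    return tuple(a + b for a, b in zip(u, v))

def scale(r, v):
    """Multiply every entry of the vector v by the scalar r."""
    return tuple(r * a for a in v)

# Sample data (chosen arbitrarily for illustration).
v = (Fraction(1), Fraction(-2), Fraction(3, 2))
r, s = Fraction(2), Fraction(-1, 3)
zero = (Fraction(0), Fraction(0), Fraction(0))

lhs = scale(r + s, v)                    # (r + s)v
rhs = add(scale(r, v), scale(s, v))      # rv + sv

print(lhs == rhs)                        # distributivity on this sample
print(add(zero, v) == v)                 # adding the zero vector changes nothing
print(scale(2, v) == add(v, v))          # 2v equals v + v
```

A check on one sample is of course not a proof; the argument in the example, which works entry by entry for arbitrary $x, y, z, r, s$, is what establishes the law.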

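Example 2.2 Now consider $M_{3,3}(\mathbb{R})$, the set of $3 \times 3$ matrices with entries from $\mathbb{R}$. As for vectors, matrices can be added by the rule

\[
\begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{pmatrix}
+
\begin{pmatrix} y_{11} & y_{12} & y_{13} \\ y_{21} & y_{22} & y_{23} \\ y_{31} & y_{32} & y_{33} \end{pmatrix}
=
\begin{pmatrix} x_{11} + y_{11} & x_{12} + y_{12} & x_{13} + y_{13} \\ x_{21} + y_{21} & x_{22} + y_{22} & x_{23} + y_{23} \\ x_{31} + y_{31} & x_{32} + y_{32} & x_{33} + y_{33} \end{pmatrix}
\]

and a matrix can be multiplied by a real number $r$ by

\[
r \begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{pmatrix}
=
\begin{pmatrix} rx_{11} & rx_{12} & rx_{13} \\ rx_{21} & rx_{22} & rx_{23} \\ rx_{31} & rx_{32} & rx_{33} \end{pmatrix}.
\]

Similar laws hold for such matrices. For example, the zero matrix (with all entries equal to 0) can be added to any other matrix without changing it, and the commutative and associative laws of addition, and the distributivity law, hold too, as we saw in Section 1.2.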

Example 2.3 Consider now $\mathbb{R}[X]$, the set of polynomials in $X$ with coefficients from $\mathbb{R}$. A typical polynomial is

\[ f(X) = a_n X^n + \cdots + a_2 X^2 + a_1 X + a_0 \]

where the $a_i$ are real numbers, and possibly 0. Two such polynomials can be added by

\[ (a_n X^n + \cdots + a_1 X + a_0) + (b_n X^n + \cdots + b_1 X + b_0) = (a_n + b_n) X^n + \cdots + (a_1 + b_1) X + (a_0 + b_0), \]

and multiplied by $r$ by

\[ r(a_n X^n + \cdots + a_1 X + a_0) = (ra_n) X^n + \cdots + (ra_1) X + (ra_0). \]
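The two polynomial operations of Example 2.3 can be sketched in code. The representation below is an assumption made for illustration, not something fixed by the text: a polynomial $a_0 + a_1 X + \cdots + a_n X^n$ is stored as the list `[a0, a1, ..., an]`, and the names `poly_add` and `poly_scale` are hypothetical. Padding with zeros reflects the remark that the coefficients $a_i$ are allowed to be 0, so two polynomials of different degree can still be added term by term.

```python
from itertools import zip_longest

def poly_add(f, g):
    """Add two polynomials given as coefficient lists [a0, a1, ..., an].

    The shorter list is padded with zero coefficients.
    """
    return [a + b for a, b in zip_longest(f, g, fillvalue=0)]

def poly_scale(r, f):
    """Multiply every coefficient of the polynomial f by the scalar r."""
    return [r * a for a in f]

f = [1, 0, 2]    # 1 + 2X^2
g = [3, -1]      # 3 - X

print(poly_add(f, g))      # [4, -1, 2], i.e. 4 - X + 2X^2
print(poly_scale(3, f))    # [3, 0, 6], i.e. 3 + 6X^2
```

Note that this representation is not unique (`[1, 0, 2]` and `[1, 0, 2, 0]` describe the same polynomial), so a fuller implementation would strip trailing zeros before comparing.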

Real vector spaces. As will be clear, these examples (and many others like them) have several features in common. When mathematicians want to concentrate on certain features of several examples (and possibly ignore other features of the examples, such as the rule for multiplying two matrices together, or the very different rule for multiplying two polynomials together) they write down axioms for the common features. The three examples above are all examples of real vector spaces and the next definition gives the axioms for real vector spaces.

Definition 2.4 A real vector space or vector space over $\mathbb{R}$ is a set $V$ containing a special zero vector 0, together with operations of addition of two vectors, giving $u + v$, and multiplication of a vector $v$ with a real number $\lambda$, giving $\lambda v$, satisfying the following laws for all $u, v, w \in V$ and $\lambda, \mu \in \mathbb{R}$.

\begin{align}
(u + v) + w &= u + (v + w) \tag{1} \\
u + v &= v + u \tag{2} \\
u + 0 &= u \tag{3} \\
v + (-1)v &= 0 \tag{4} \\
\lambda(\mu v) &= (\lambda\mu)v \tag{5} \\
\lambda(u + v) &= \lambda u + \lambda v \tag{6} \\
(\lambda + \mu)u &= \lambda u + \mu u \tag{7}
\end{align}

The vector $(-1)v$ is defined to be the result of multiplying $-1$ and $v$, but is more usually written $-v$; similarly, $u + (-v)$ is written $u - v$. The elements of $V$ are called the vectors of the space. Real numbers are often referred to as the scalars of the space.

Some of the axioms above have special names: (1) is called the associativity of addition; (2) is the commutativity of addition; and (6) and (7) are the distributivity laws.

Several advantages of the axiomatic approach are already apparent. We now have a way of dealing with examples like the three mentioned above (and many others like them) in one go, rather than proving theorems for each example individually, so this approach saves time and energy. It also helps in understanding the examples, since features which are irrelevant for some problems (matrix multiplication perhaps) are not mentioned in the axioms. Thirdly, it helps considerably in checking that proofs are correct, since for example a proof of a theorem about vector spaces can only use the axioms listed in the definition, and no other properties of any special example the reader or author might have in mind.

Of course, to apply any theorems we might prove about real vector spaces to a given space $V$, one must first prove that the axioms for real vector spaces are true of $V$. This is usually straightforward.

Example 2.5 The set of complex numbers $\mathbb{C}$ forms a real vector space with the usual addition $z_1 + z_2$ of complex numbers, and scalar multiplication,

\[ \lambda(x + iy) = (\lambda x) + i(\lambda y), \]

where $x, y, \lambda$ are all real.

Proof Most of the vector space axioms express well-known facts about $\mathbb{C}$. For example, $(z_1 + z_2) + z_3 = z_1 + (z_2 + z_3)$ is just associativity of addition of complex numbers, and $z_1 + z_2 = z_2 + z_1$ is commutativity. The zero vector 0 here is just the complex number $0 = 0 + i0$, so $z + 0 = z$ and $z + (-1)z = 0$ are clear. Also $\lambda(\mu z) = (\lambda\mu)z$, $\lambda(z_1 + z_2) = \lambda z_1 + \lambda z_2$, and $(\lambda + \mu)z = \lambda z + \mu z$ are consequences of associativity and distributivity of complex multiplication. □

Note how a vector space is defined. It is necessary to say what the set of vectors $V$ is, and also to say what the rules for addition of vectors and scalar multiplication are too.

The next example is very important, despite its simplicity. In fact, this example describes the simplest kind of vector space of all.

Example 2.6 The zero space over $\mathbb{R}$ is the vector space $V = \{0\}$, where (obviously) 0 denotes the zero vector. Addition and scalar multiplication are given by the rules $0 + 0 = 0$ and $\lambda 0 = 0$ for all $\lambda \in \mathbb{R}$. The vector space axioms are all true for this $V$ for a very simple reason: since every meaningful expression involving scalars and the vector 0 gives the vector 0, any two such expressions are equal, and hence equations (1)-(7) are all true in this special zero space.

The next proposition gives the first example of the use of the axioms to prove statements about vector spaces. It shows that many of the laws which you might have expected to be axioms in fact follow from the axioms already given.

Proposition 2.7 In a real vector space $V$, (a) $v = 1v$, (b) $0v = 0$, and (c) $\lambda 0 = 0$ for all $v \in V$ and all scalars $\lambda$.

Proof (a) Given $v \in V$ we have

\begin{align*}
1v &= (-1)((-1)v) && \text{by (5)} \\
   &= (-1)((-1)v) + 0 && \text{by (3)} \\
   &= (-1)((-1)v) + (v + (-1)v) && \text{by (4)} \\
   &= (-1)((-1)v) + ((-1)v + v) && \text{by (2)} \\
   &= ((-1)((-1)v) + (-1)v) + v && \text{by (1)} \\
   &= 0 + v && \text{by (4) and (2)} \\
   &= v && \text{by (3) and (2).}
\end{align*}

(b) Again,

\begin{align*}
0v &= (1 + (-1))v \\
   &= 1v + (-1)v && \text{by (7)} \\
   &= v + (-1)v && \text{by (a)} \\
   &= 0 && \text{by (4).}
\end{align*}

For (c),

\begin{align*}
\lambda 0 &= \lambda(0 + (-1)0) && \text{by (4)} \\
          &= \lambda 0 + \lambda((-1)0) && \text{by (6)} \\
          &= \lambda 0 + (-1)(\lambda 0) && \text{by (5)} \\
          &= 0 && \text{by (4),}
\end{align*}

as required. □

Because of the axioms and because of propositions like this last one, we will adopt a slightly more relaxed approach to notation, writing for example $u + v + w$ for the sum of three vectors $u, v, w$ in a vector space $V$ without specifying the order in which they are to be added. (This order does not matter, since by (1) and (2) $(u + v) + w$, $u + (v + w)$, $u + (w + v)$, $w + (v + u)$, $(w + u) + v$, and so on, are all the same.) Similarly, by (5), $(-2\lambda)v = (-2)(\lambda v) = -(2\lambda)v$, etc., and this vector will be denoted more simply by $-2\lambda v$. We will use the axioms (1)-(7) and Proposition 2.7 all the time, usually without explicit mention.

Complex vector spaces. The definition given above of a vector space over $\mathbb{R}$ can easily be modified to give the notion of a vector space over the complex numbers $\mathbb{C}$, by just replacing $\mathbb{R}$ by $\mathbb{C}$ in Definition 2.4 and letting $\lambda$ and $\mu$ range over all complex numbers in the laws (1)-(7) for scalar multiplication. For example, the set $\mathbb{C}^3$ of column vectors of height 3 and with entries from $\mathbb{C}$ can be regarded as a complex vector space, in just the same way as $\mathbb{R}^3$ can be regarded as a real vector space.

However, care is needed. For example, the set of complex numbers $\mathbb{C}$ can be regarded as a real vector space as we have already seen, or as a complex vector space with usual complex-number addition $z_1 + z_2$ and for scalar multiplication the complex-number multiplication $\lambda z$. Similarly, $\mathbb{C}^3$ can be regarded as either a real vector space or a complex vector space. The properties of these spaces (e.g. $\mathbb{C}$ as a real vector space, and $\mathbb{C}$ as a complex vector space) are quite different, as we shall see. The distinction of what scalars you allow is a very important one.

Canonical examples. This book is concerned with vector spaces in general, but we will constantly refer back to the more familiar vector spaces $\mathbb{R}^n$ and $\mathbb{C}^n$ with addition defined by

\[
\begin{pmatrix} x_1 \\ y_1 \\ \vdots \\ z_1 \end{pmatrix}
+
\begin{pmatrix} x_2 \\ y_2 \\ \vdots \\ z_2 \end{pmatrix}
=
\begin{pmatrix} x_1 + x_2 \\ y_1 + y_2 \\ \vdots \\ z_1 + z_2 \end{pmatrix}
\tag{8}
\]

and scalar multiplication defined by

\[
\lambda \begin{pmatrix} x \\ y \\ \vdots \\ z \end{pmatrix}
=
\begin{pmatrix} \lambda x \\ \lambda y \\ \vdots \\ \lambda z \end{pmatrix}
\tag{9}
\]

for scalars $\lambda$. Unless otherwise specified, $\mathbb{C}^n$ will be regarded as a complex vector space (so the scalars $\lambda$ in (9) will be complex numbers, and multiplication, $\lambda x$ etc., is ordinary multiplication of complex numbers). $\mathbb{R}^n$ is always considered as a real vector space. The bold face notation for vectors, as in $\mathbf{v}, \mathbf{w}, \mathbf{z}, \ldots$, will be reserved for the vector spaces $\mathbb{R}^n$ and $\mathbb{C}^n$ only, and we will use light face letters $x, y, u, v, w, \ldots$ for vectors (i.e. elements of $V$) in the general case. Scalars will be denoted with Greek letters $\lambda, \mu, \nu, \ldots$ to distinguish them from vectors.

We will also refer to the spaces $\mathbb{R}^0$ and $\mathbb{C}^0$. If $\mathbb{R}^n$ is the set of column vectors with $n$ entries, $\mathbb{R}^0$ should be the set of column vectors with no entries. There is only one such vector, and it might be written $(\,)$, just as the empty set is sometimes written $\{\,\}$. The only possible way to define addition and scalar multiplication is by defining $(\,) + (\,) = (\,)$ and $\lambda(\,) = (\,)$. In other words $\mathbb{R}^0$ is just the zero space over $\mathbb{R}$, as in Example 2.6, where $(\,)$ is thought of as 0. Similarly, $\mathbb{C}^0$ is the complex zero vector space; it too has a single vector $(\,)$ thought of as 0, but this time the scalars are the complex numbers.

The main disadvantage with column notation for vectors in $\mathbb{R}^n$ and $\mathbb{C}^n$ is the amount of paper it requires. For this reason, you may see people use so-called row vectors, but changing between row and column vectors is sometimes confusing. In this book, we will use column vectors throughout, but will sometimes notate a column vector as a row with a transpose sign $T$ as a reminder. Thus we often write the vector

\[ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \]

as $(x_1, x_2, x_3)^T$.

In our examples of vector spaces over the real numbers $\mathbb{R}$ (or the complex numbers $\mathbb{C}$), $\mathbb{R}$ (or $\mathbb{C}$) is called the field of scalars. To simplify terminology, we shall often talk about a vector space over $F$, or an $F$-vector space, where $F$ is $\mathbb{R}$ or $\mathbb{C}$. The optional Section 2.6 below takes this terminology a bit further and gives axioms and further examples of fields.

2.2 Subspaces

Almost invariably in pure mathematics, whenever you see a definition for an 'object' of some kind, you will also have a definition of a 'subobject'.

Definition 2.8 Given a vector space $V$ over $\mathbb{R}$ or $\mathbb{C}$, a subspace of $V$ is a subset $W \subseteq V$ which contains the zero vector of $V$ and is closed under the operations of addition and scalar multiplication. That is, for each $u, v \in W$ and each scalar $\lambda$, each of $u + v$, $\lambda u$ (and $\lambda v$) must be in $W$.

Since $-1$ is a scalar, every subspace $W$ contains $-v$ for each $v$ in $W$, so $W$ is also closed under subtraction of vectors, $v - u$, too. You should be able to check easily that $\{0\}$ and $V$ itself are both subspaces of the vector space $V$. The following lemma gives a 'minimal' condition for a subset $W$ of $V$ to be a subspace of $V$.

Lemma 2.9 Let $W \subseteq V$ be nonempty, where $V$ is a vector space over $\mathbb{R}$ or $\mathbb{C}$. Then $W$ is a subspace of $V$ if and only if $v + \lambda w \in W$ for each $v, w \in W$ and each scalar $\lambda$.

Proof Suppose the condition holds. Given $v, w \in W$ and $\lambda \in F$, we have $0 = v + (-1)v \in W$, $v + w = v + 1w \in W$, and $\lambda w = 0 + \lambda w \in W$. This proves the 'if' part. For the 'only if' part, suppose $W$ is a subspace of $V$, $v, w \in W$, and $\lambda$ is a scalar. Then $\lambda w \in W$ so $v + \lambda w \in W$. □

The laws (1)-(7) clearly hold in any subspace, so a subspace $W$ of a vector space $V$ is a vector space in its own right.

Subspaces will turn out to be very important indeed. We would like some way of specifying a subspace accurately and succinctly. For example, suppose we knew the vectors $a_1, \ldots, a_n$ were in a subspace $W$ of $V$. Does this determine $W$? Or, if not, is there some 'special' or 'best' subspace of $V$ containing $a_1, \ldots, a_n$? With this in mind we make the tentative definition,

If $V$ is a vector space over $F = \mathbb{R}$ or $\mathbb{C}$, and $A \subseteq V$ is any subset of vectors from $V$, then the smallest subspace $W$ of $V$ that contains $A$ is called the subspace spanned by $A$.

The problem with this is that it is not immediately obvious that this subspace $W$ exists for all $A$. Certainly, $V$ itself is a subspace of $V$ containing $A$, so some subspace of $V$ containing $A$ exists, but why is there a smallest subspace containing $A$?

However, one can rescue this idea as follows. If $a_1, a_2, \ldots, a_n \in W$ and $W$ is a subspace of $V$ then it follows from the fact that $W$ is closed under addition and scalar multiplication that

\[ \lambda_1 a_1 + \lambda_2 a_2 + \cdots + \lambda_n a_n \in W \tag{10} \]

for any scalars $\lambda_1, \lambda_2, \ldots, \lambda_n$. (An expression like (10) is called a linear combination of the vectors $a_1, \ldots, a_n$.) What is more, by the associativity and distributivity laws in the vector space,

\[ (\lambda_1 a_1 + \lambda_2 a_2 + \cdots + \lambda_n a_n) + (\mu_1 a_1 + \mu_2 a_2 + \cdots + \mu_n a_n) = (\lambda_1 + \mu_1) a_1 + (\lambda_2 + \mu_2) a_2 + \cdots + (\lambda_n + \mu_n) a_n \]

and so the sum of any two linear combinations of $a_1, \ldots, a_n$, or a scalar multiple of a linear combination of $a_1, \ldots, a_n$, is again a linear combination. In other words the set of linear combinations of $a_1, \ldots, a_n$ is a subspace of $V$, so we make the following definition.

Definition 2.10 Given a vector space $V$ over $F = \mathbb{R}$ or $\mathbb{C}$, and given a subset $A = \{a_1, a_2, \ldots, a_n\}$ of $V$,

\[ W = \{ \lambda_1 a_1 + \lambda_2 a_2 + \cdots + \lambda_n a_n : \lambda_1, \ldots, \lambda_n \in F \} \]

is the subspace of $V$ spanned by $A$. The elements of $W$,

\[ \lambda_1 a_1 + \lambda_2 a_2 + \cdots + \lambda_n a_n, \]

are called linear combinations of vectors from $A$. This subspace $W$ is denoted $\operatorname{span} A$ or $\operatorname{span}(a_1, a_2, \ldots, a_n)$.

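We note again that $\operatorname{span} A$ is a subspace of $V$, and any subspace containing all the vectors from $A$ must contain every linear combination of vectors from $A$, i.e. must contain each element of $W$, so $W$ is indeed the smallest subspace containing $A$. In particular, the zero vector 0 is always a linear combination of $a_1, a_2, \ldots, a_n$ since

\[ 0 = 0a_1 + 0a_2 + \cdots + 0a_n. \]

Example 2.11 Let $V = \mathbb{R}^3$ with the usual addition and scalar multiplication, and consider vectors $\mathbf{a} = (1, 2, 0)^T$ and $\mathbf{b} = (0, 1, -1)^T$. Then a typical linear combination of $\mathbf{a}, \mathbf{b}$ is

\[ \lambda\mathbf{a} + \mu\mathbf{b} = \begin{pmatrix} \lambda \\ 2\lambda + \mu \\ -\mu \end{pmatrix}. \]

If we write this as $(x, y, z)^T$ we easily see that $2x - y - z = 0$ since $x = \lambda$, $y = 2\lambda + \mu$, and $z = -\mu$. So every vector $(x, y, z)^T$ in $\operatorname{span}(\mathbf{a}, \mathbf{b})$ satisfies $2x - y - z = 0$. On the other hand, given a vector $(x, y, z)^T$ such that $2x - y - z = 0$, we have

\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x \\ 2x - z \\ z \end{pmatrix} = x\mathbf{a} + (-z)\mathbf{b} \]

so $(x, y, z)^T$ is in $\operatorname{span}(\mathbf{a}, \mathbf{b})$. Thus we have proved that

\[ \operatorname{span}(\mathbf{a}, \mathbf{b}) = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} : 2x - y - z = 0 \right\}. \]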
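Both directions of the argument in Example 2.11 can be spot-checked numerically. The sketch below is illustrative only: the function names `in_plane` and `decompose` are hypothetical, vectors are modelled as integer tuples, and the decomposition with coefficients $x$ and $-z$ is the one read off from $x = \lambda$, $z = -\mu$ in the example.

```python
def add(u, v):
    """Componentwise sum of two vectors."""
    return tuple(p + q for p, q in zip(u, v))

def scale(r, v):
    """Scalar multiple r*v."""
    return tuple(r * p for p in v)

a = (1, 2, 0)
b = (0, 1, -1)

def in_plane(v):
    """Test the defining equation 2x - y - z = 0."""
    x, y, z = v
    return 2 * x - y - z == 0

def decompose(v):
    """Return (lam, mu) with v = lam*a + mu*b, for v satisfying 2x - y - z = 0."""
    if not in_plane(v):
        raise ValueError("vector does not satisfy 2x - y - z = 0")
    x, y, z = v
    return x, -z

# Direction 1: every linear combination of a and b satisfies the equation
# (checked here only on a small grid of integer coefficients).
all_in_plane = all(
    in_plane(add(scale(lam, a), scale(mu, b)))
    for lam in range(-3, 4)
    for mu in range(-3, 4)
)
print(all_in_plane)

# Direction 2: a vector satisfying the equation is recovered from its coefficients.
v = (3, 1, 5)                      # 2*3 - 1 - 5 = 0
lam, mu = decompose(v)
print(add(scale(lam, a), scale(mu, b)) == v)
```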

If $A$ is the empty set, we define $\operatorname{span} A$ to be $\{0\}$. This is just a convention; if you like, you can think of 0 as the sum of an empty sequence of terms of the form $\lambda_i a_i$, but if this doesn't appeal, just learn the convention.

Example 2.12 If $V = \mathbb{R}^4$ with the usual addition and scalar multiplication, and

\[ W = \left\{ \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} : 3x + y = 0,\ x + y + z = w \right\} \]

we can easily check that $W$ is a subspace of $V$: if $x, y, z, w, x', y', z', w' \in \mathbb{R}$ with $3x + y = 0$, $x + y + z = w$, $3x' + y' = 0$, $x' + y' + z' = w'$, then $3(x + \lambda x') + (y + \lambda y') = 0$ and $(x + \lambda x') + (y + \lambda y') + (z + \lambda z') = (w + \lambda w')$, so $W$ is a subspace by Lemma 2.9.

For any vector $(x, y, z, w)^T \in W$, $y = -3x$ and $w = x + y + z = z - 2x$ so every vector of $W$ is of the form $(x, -3x, z, z - 2x)^T$ for some $x, z \in \mathbb{R}$. In particular the vectors $\mathbf{a} = (1, -3, 0, -2)^T$ and $\mathbf{b} = (0, 0, 1, 1)^T$ are in $W$, as you can check. In fact $\operatorname{span}(\mathbf{a}, \mathbf{b}) = W$ since

\[ \begin{pmatrix} x \\ -3x \\ z \\ z - 2x \end{pmatrix} = x \begin{pmatrix} 1 \\ -3 \\ 0 \\ -2 \end{pmatrix} + z \begin{pmatrix} 0 \\ 0 \\ 1 \\ 1 \end{pmatrix}. \]
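The checks in Example 2.12 can be mirrored in a few lines of code. This is a sketch under stated assumptions: the names `in_W`, `combo`, and `from_parameters` are hypothetical, and only integer sample values are tried, so it illustrates the criterion of Lemma 2.9 rather than proving it.

```python
def in_W(v):
    """Membership test for W: 3x + y = 0 and x + y + z = w."""
    x, y, z, w = v
    return 3 * x + y == 0 and x + y + z == w

def combo(v, lam, u):
    """The vector v + lam*u appearing in the criterion of Lemma 2.9."""
    return tuple(p + lam * q for p, q in zip(v, u))

def from_parameters(x, z):
    """The general element (x, -3x, z, z - 2x) of W."""
    return (x, -3 * x, z, z - 2 * x)

a = (1, -3, 0, -2)
b = (0, 0, 1, 1)

print(in_W(a), in_W(b))                  # both spanning vectors lie in W

# Closure under v + lam*u, tried on a few sample values.
closed = all(
    in_W(combo(from_parameters(x, z), lam, from_parameters(x2, z2)))
    for x in range(-2, 3) for z in range(-2, 3)
    for x2 in (-1, 2) for z2 in (0, 3)
    for lam in (-2, 0, 5)
)
print(closed)

# The general element equals x*a + z*b.
x, z = 2, 7
xa_plus_zb = tuple(x * p + z * q for p, q in zip(a, b))
print(from_parameters(x, z) == xa_plus_zb)
```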

It is sometimes useful to be able to define $\operatorname{span} A$ when $A$ is infinite. In this case we do not have any way to form an infinite sum of terms of the form $\lambda_i a_i$, so instead we are guided by our principle that $\operatorname{span} A$ should be the smallest vector subspace of $V$ containing $A$.

Definition 2.13 If $A$ is an infinite subset of $V$, where $V$ is a vector space over a field $F$, we define $\operatorname{span} A$, the subspace spanned by $A$, to be the set of all linear combinations of finite subsets of $A$. Thus $\operatorname{span} A$ is the union of subspaces of $V$ of the form $\operatorname{span} B$ where $B \subseteq A$ is finite. In symbols,

\[ \operatorname{span} A = \bigcup_{B \subseteq A,\ B \text{ finite}} \operatorname{span} B. \]

You can check that this definition makes $\operatorname{span} A$ into a subspace of $V$, the smallest subspace of $V$ to contain $A$. This is so important it is worth noting as a separate proposition.

Proposition 2.14 If $V$ is a vector space over $\mathbb{R}$ or $\mathbb{C}$, $B \subseteq V$, and $a_1, a_2, \ldots, a_k$ are vectors in $\operatorname{span} B$ then $\operatorname{span}(a_1, a_2, \ldots, a_k) \subseteq \operatorname{span} B$.

If $A$ is infinite, a linear combination of vectors from $A$ is just an element of $\operatorname{span} A$; that is, a linear combination of a finite number of elements of $A$. We repeat that, in general, there is no way to combine infinitely many elements into a single linear combination.

2.3 Linear independence

Suppose that $A = \{a_1, a_2, \ldots, a_n\} \subseteq V$ where $V$ is a vector space over $\mathbb{R}$ or $\mathbb{C}$. We have seen that the zero vector is always a linear combination of vectors from $A$, and we can ask if the expression

\[ 0 = 0a_1 + 0a_2 + \cdots + 0a_n \]

for 0 is unique. The set of vectors $A$ is said to be linearly independent if the expression above is the only linear combination of vectors from $A$ that gives 0, and $A$ is linearly dependent otherwise. Generalizing slightly to include the case when $A$ may be infinite we have,

Definition 2.15 A set $A \subseteq V$ of vectors in a vector space $V$ over $F = \mathbb{R}$ or $\mathbb{C}$ is linearly dependent if there is $n \in \mathbb{N}$, vectors $a_1, a_2, \ldots, a_n \in A$, and scalars $\lambda_1, \lambda_2, \ldots, \lambda_n$ not all zero such that

\[ \lambda_1 a_1 + \lambda_2 a_2 + \cdots + \lambda_n a_n = 0. \]

Otherwise, $A$ is linearly independent.

So a finite set $A = \{a_1, a_2, \ldots, a_n\}$ is linearly independent if and only if for all scalars $\lambda_1, \lambda_2, \ldots, \lambda_n \in F$

\[ \lambda_1 a_1 + \lambda_2 a_2 + \cdots + \lambda_n a_n = 0 \quad \text{implies} \quad \lambda_1 = \lambda_2 = \cdots = \lambda_n = 0. \]

Also, if $A$ is infinite, it is linearly independent if and only if every finite subset of $A$ is linearly independent. By convention, the empty set containing no vectors is linearly independent.

Example 2.16 In the real vector space $\mathbb{R}^3$ the vectors $\mathbf{a} = (1, 2, 0)^T$, $\mathbf{b} = (1, 1, 1)^T$, and $\mathbf{c} = (0, 0, 1)^T$ form a linearly independent set, for if

\[ \lambda\mathbf{a} + \mu\mathbf{b} + \nu\mathbf{c} = \mathbf{0} \]

for some scalars $\lambda, \mu, \nu$ then the following system of equations is satisfied:

\begin{align*}
\lambda + \mu &= 0 \\
2\lambda + \mu &= 0 \\
\mu + \nu &= 0.
\end{align*}

But this system has $\lambda = \mu = \nu = 0$ as its only solution. On the other hand, the vectors $\mathbf{a} = (1, 2, 0)^T$, $\mathbf{b} = (1, 1, 1)^T$, and $\mathbf{d} = (1, -1, 3)^T$ form a linearly dependent set since $2\mathbf{a} - 3\mathbf{b} + \mathbf{d} = \mathbf{0}$.

Example 2.17 Let $V$ be $\mathbb{C}$ considered as a real vector space with addition

\[ (x_1 + iy_1) + (x_2 + iy_2) = (x_1 + x_2) + i(y_1 + y_2) \tag{11} \]

and scalar multiplication

\[ \lambda(x + iy) = (\lambda x) + i(\lambda y). \tag{12} \]

Then $\{1, i\}$ is linearly independent, since if $\lambda, \mu$ are real numbers with $0 = \lambda \cdot 1 + \mu \cdot i = \lambda + i\mu$ then the real part, $\lambda$, and the imaginary part, $\mu$, of $\lambda + i\mu$ are both zero.

Now consider $V = \mathbb{C}$ as a complex vector space with operations as in (11) and (12) except now $\lambda$ may be a complex number. This time $\{1, i\}$ is linearly dependent, since

\[ 1 \cdot 1 + i \cdot i = 0 \]

so for $\lambda = 1$ and $\mu = i$, $\lambda \cdot 1 + \mu \cdot i = 0$.
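For vectors in $\mathbb{R}^n$ with rational entries, linear independence can be tested mechanically by row reduction. The sketch below is one possible approach, not the method of the text: it assumes the fact (in the spirit of the rank computations of Chapter 1) that a finite list of vectors is linearly independent exactly when the matrix having those vectors as rows has rank equal to the number of vectors. The names `rank` and `is_linearly_independent` are hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of the matrix whose rows are the given vectors (Gaussian elimination)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    found = 0                                  # number of pivots found so far
    for col in range(ncols):
        # Look for a row at or below position `found` with a nonzero entry here.
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[found])]
        found += 1
    return found

def is_linearly_independent(vectors):
    """True when no nontrivial linear combination of the vectors is zero."""
    return rank(vectors) == len(vectors)

a, b, c, d = (1, 2, 0), (1, 1, 1), (0, 0, 1), (1, -1, 3)

print(is_linearly_independent([a, b, c]))   # True, as in Example 2.16
print(is_linearly_independent([a, b, d]))   # False, since 2a - 3b + d = 0
print(is_linearly_independent([]))          # True: the empty set, by convention
```

The same function handles the real case of Example 2.17 if $1$ and $i$ are identified with the pairs `(1, 0)` and `(0, 1)` of real and imaginary parts; the complex case needs complex scalars and is not covered by this sketch.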

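In $\mathbb{R}^3$, a one-element set $\{\mathbf{a}\}$ is linearly independent just in case $\mathbf{a} \ne \mathbf{0}$. (Note in particular that $\{\mathbf{0}\}$ is linearly dependent since $1 \cdot \mathbf{0} = \mathbf{0}$, and the scalar 1 used here is nonzero.) Also, $\{\mathbf{a}, \mathbf{b}\}$ is linearly independent if and only if $\mathbf{a}$ and $\mathbf{b}$ do not lie on a single line through $\mathbf{0}$, and $\{\mathbf{a}, \mathbf{b}, \mathbf{c}\}$ is linearly independent if and only if $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ do not lie on a single plane through $\mathbf{0}$.

We started talking about linear independence via the uniqueness of the linear combination $0 = 0a_1 + 0a_2 + \cdots + 0a_n$ for the zero vector. However, if $A$ is linearly independent and $v$ is in the subspace spanned by $A$ then the linear combination for $v$ is also unique, as the following useful proposition shows.

Proposition 2.18 Suppose $A = \{a_1, \ldots, a_n\} \subseteq V$ is linearly independent, where $V$ is a vector space over $\mathbb{R}$ or $\mathbb{C}$. Suppose also that $v \in V$ and there are scalars $\lambda_1, \ldots, \lambda_n$ and $\mu_1, \ldots, \mu_n$ such that

\[ v = \lambda_1 a_1 + \cdots + \lambda_n a_n \]

and

\[ v = \mu_1 a_1 + \cdots + \mu_n a_n. \]

Then $\lambda_i = \mu_i$ for each $i$.

Proof We have

\[ \lambda_1 a_1 + \cdots + \lambda_n a_n = v = \mu_1 a_1 + \cdots + \mu_n a_n \]

so

\[ (\lambda_1 a_1 + \cdots + \lambda_n a_n) - (\mu_1 a_1 + \cdots + \mu_n a_n) = 0 \]

giving

\[ (\lambda_1 - \mu_1) a_1 + \cdots + (\lambda_n - \mu_n) a_n = 0 \]

and hence

\[ \lambda_1 - \mu_1 = \cdots = \lambda_n - \mu_n = 0 \]

since $\{a_1, a_2, \ldots, a_n\}$ is linearly independent. So $\lambda_1 = \mu_1$, $\lambda_2 = \mu_2$, $\ldots$, $\lambda_n = \mu_n$ as required. □

2.4 Bases

Definition 2.19 A basis of a vector space $V$ is a linearly independent set $B \subseteq V$ which spans $V$.

Example 2.20 The real vector space $\mathbb{R}^3$ has basis $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$, where $\mathbf{e}_1 = (1, 0, 0)^T$, $\mathbf{e}_2 = (0, 1, 0)^T$, and $\mathbf{e}_3 = (0, 0, 1)^T$. To prove this you need to check the set is linearly independent and spans the vector space in question. For the first, if

\[ \lambda_1 \mathbf{e}_1 + \lambda_2 \mathbf{e}_2 + \lambda_3 \mathbf{e}_3 = \mathbf{0} \]

then

\[ \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \]

so $\lambda_1 = \lambda_2 = \lambda_3 = 0$, as required. For the second, an arbitrary vector in $\mathbb{R}^3$ is $(x, y, z)^T$ where $x, y, z \in \mathbb{R}$. But

\[ (x, y, z)^T = x\mathbf{e}_1 + y\mathbf{e}_2 + z\mathbf{e}_3 \]

so $(x, y, z)^T$ is a linear combination of $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$.

Similarly, $\mathbb{R}^n$ has basis $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$, where $\mathbf{e}_i$ is the $n \times 1$ column vector with $i$th entry equal to 1 and all other entries zero. This basis is used so often that it is called the usual basis or standard basis of $\mathbb{R}^n$.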

The following theorem is particularly important.

Theorem 2.21 Let $V$ be a vector space over $\mathbb{R}$ or $\mathbb{C}$, and let $B$ [...] $\lambda_n^{-1}\lambda_1 a_1 - \cdots - \lambda_n^{-1}\lambda_{n-1} a_{n-1}$. Hence $a_n \in \operatorname{span}(a_1, \ldots, a_{n-1}, b)$, as required.

For the additional part, we are given that $\{a_1, \ldots, a_{n-1}, a_n\}$ is linearly independent; suppose scalars $\mu_i$ are given with

\[ \mu_1 a_1 + \cdots + \mu_{n-1} a_{n-1} + \mu_n b = 0. \]

Substituting $b = \lambda_1 a_1 + \cdots + \lambda_{n-1} a_{n-1} + \lambda_n a_n$ into this we get

\[ (\mu_1 + \mu_n \lambda_1) a_1 + \cdots + (\mu_{n-1} + \mu_n \lambda_{n-1}) a_{n-1} + \mu_n \lambda_n a_n = 0. \]

Now $\{a_1, \ldots, a_{n-1}, a_n\}$ is linearly independent so all these coefficients are zero. In particular, $\lambda_n \mu_n = 0$ so $\mu_n = 0$ since $\lambda_n \ne 0$. But this gives

\[ \mu_1 a_1 + \cdots + \mu_{n-1} a_{n-1} = 0 \]

so $\mu_1 = \mu_2 = \cdots = \mu_{n-1} = 0$ by the linear independence of $\{a_1, a_2, \ldots, a_n\}$, as required. □

Theorem 2.25 Suppose $A$, $B$ are both bases of a vector space $V$ over $\mathbb{R}$ or $\mathbb{C}$. Then $A$, $B$ have the same number of elements.

Again, the theorem is true generally, but we will prove it here in the special case when one of the two sets $A$, $B$ is finite.

Proof Suppose $A$ has at least as many elements as $B$, and $B$ is finite. List all the elements of $B$ as $b_1, b_2, \ldots, b_n$, and let $a_1, a_2, \ldots, a_n$ be distinct elements of $A$. Our task is to show that this in fact lists all the elements of $A$.

Now $a_1 \in V = \operatorname{span}(b_1, b_2, \ldots, b_n)$ so

\[ a_1 = \lambda_1 b_1 + \lambda_2 b_2 + \cdots + \lambda_n b_n \]

for some scalars $\lambda_i$. Certainly not all the $\lambda_i$ are zero, for otherwise $a_1 = 0 \in A$ so $A$ would not be linearly independent. By reordering the $b_i$ if necessary we may assume $\lambda_1 \ne 0$. Then $a_1 \notin \operatorname{span}(b_2, \ldots, b_n)$, for else there would be scalars $\mu_i$ with

\[ a_1 = 0b_1 + \mu_2 b_2 + \cdots + \mu_n b_n = \lambda_1 b_1 + \lambda_2 b_2 + \cdots + \lambda_n b_n \]

and $0 \ne \lambda_1$, contradicting the uniqueness of the coefficients in linear combinations of linearly independent sets (Proposition 2.18). So by the exchange lemma, $a_1, b_2, b_3, \ldots, b_n$ is a basis of $V$.

Now consider $a_2$. Again, $a_2 = \lambda_1 a_1 + \lambda_2 b_2 + \cdots + \lambda_n b_n$ for some scalars $\lambda_i$. Not all of $\lambda_2, \lambda_3, \ldots, \lambda_n$ are zero, else

\[ \lambda_1 a_1 - a_2 = 0 \]

contradicting the linear independence of $A$. By reordering if necessary, we may assume $\lambda_2 \ne 0$. So $a_2 \in \operatorname{span}(a_1, b_2, b_3, \ldots, b_n)$. But $a_2 \notin \operatorname{span}(a_1, b_3, \ldots, b_n)$, for else

\[ \lambda_1 a_1 + \lambda_2 b_2 + \lambda_3 b_3 + \cdots + \lambda_n b_n = \mu_1 a_1 + 0b_2 + \mu_3 b_3 + \cdots + \mu_n b_n \]

for some scalars $\mu_i$, with $\lambda_2 \ne 0$ contradicting Proposition 2.18. Therefore, by the exchange lemma $a_1, a_2, b_3, \ldots, b_n$ is a basis of $V$.

Continuing in this way, we eventually get that $a_1, a_2, \ldots, a_n$ is a basis of $V$. Now if $A \ne \{a_1, a_2, \ldots, a_n\}$ take $a \in A$ not equal to any $a_i$. Since $\{a_1, a_2, \ldots, a_n\}$ spans $V$ there are scalars $\nu_i$ with

\[ a = \nu_1 a_1 + \nu_2 a_2 + \cdots + \nu_n a_n \]

so $\{a_1, a_2, \ldots, a_n, a\}$ is not linearly independent, a contradiction. Therefore $A = \{a_1, a_2, \ldots, a_n\}$, and $A$ and $B$ have the same number of elements. □

Definition 2.26 The number of elements of a basis of $V$ (which depends only on $V$, and not on the choice of basis) is called the dimension of $V$. The dimension of $V$ is denoted $\dim V$.

The usual examples turn out to have the dimension you would expect. For example, $\mathbb{R}^3$ has dimension 3 since $\mathbf{e}_1 = (1, 0, 0)^T$, $\mathbf{e}_2 = (0, 1, 0)^T$, $\mathbf{e}_3 = (0, 0, 1)^T$ forms a basis of size 3. Similarly, $\mathbb{R}^n$ has dimension $n$. The complex vector space $\mathbb{C}^n$ also has dimension $n$, since the usual basis $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ is a basis for $\mathbb{C}^n$ too (but see also Example 2.17 and Exercise 2.5 for the dimension of $\mathbb{C}^n$ as a real vector space).

It is wise not to forget the case of dimension 0. This is when a vector space is spanned by the empty linearly independent set, $\emptyset$. But what vectors are a linear combination of vectors in $\emptyset$? The zero vector (by convention) is one such, and in fact it is the only one. So a vector space $V$ of dimension 0 is the zero space, i.e. $V = \{0\}$.

Corollary 2.27 If $V$ is a vector space over $\mathbb{R}$ or $\mathbb{C}$ and $U \subseteq V$ is a subspace of $V$ then $\dim U \le \dim V$. If, additionally, $\dim V$ is finite and $U \ne V$ then $\dim U < \dim V$.

Proof Let $B \subseteq V$ be a basis of $U$. Then by Theorem 2.21 $B$ extends to a basis $B' \supseteq B$ of $V$. Clearly as $B \subseteq B'$, $B'$ has at least as many elements as $B$. If $\dim V$ is finite and $U \ne V$, then $B'$ is finite and hence $B$ is also finite, so $U$ has finite dimension. But $U = \operatorname{span} B \ne V = \operatorname{span} B'$, so $B' \ne B$ and hence $B'$ has strictly more elements than $B$. □

The second part of this very useful corollary can be stated in an alternative form as follows. Note too that the finiteness assumption is essential here (unlike some of the results here which were proved only in the finite case but are nevertheless true in the infinite case too). See Exercise 2.10.

Corollary 2.28 Suppose that $V$ is a vector space over $\mathbb{R}$ or $\mathbb{C}$, $\dim V$ is finite, and $U \subseteq V$ is a subspace of $V$ with $\dim U = \dim V$. Then $U = V$.

Often, a vector space $V$ has finite dimension, in which case all bases of $V$ are finite, but this may not be the case. In this book we are mostly concerned with finite dimensional vector spaces, but occasionally infinite dimensional spaces are required.

Example 2.29 Let $V$ be the set of all functions $f, g, \ldots$ from the natural numbers $\mathbb{N} = \{0, 1, 2, 3, \ldots\}$ to $\mathbb{R}$, with addition $f + g$ defined by

\[ (f + g)(n) = f(n) + g(n) \quad \text{for all } n \in \mathbb{N} \]

and scalar multiplication $\lambda f$ by

\[ (\lambda f)(n) = \lambda \cdot f(n) \quad \text{for all } n \in \mathbb{N}. \]

Then $V$ is a real vector space with infinite dimension.

Proof We leave the verification of the vector space axioms as a straightforward exercise. To show that $V$ has infinite dimension, let $e_i \in V$ be the function defined by $e_i(n) = 0$ for $i \ne n$ and $e_i(i) = 1$. We show that $\{e_i : i \in \mathbb{N}\}$ is linearly independent, and hence by Theorem 2.21 can be extended to a (necessarily infinite) basis. Let

\[ f = \lambda_0 e_0 + \lambda_1 e_1 + \cdots + \lambda_n e_n \]

be an arbitrary linear combination of the vectors $e_i$, and suppose $f = 0$. We must show $\lambda_0 = \lambda_1 = \cdots = \lambda_n = 0$. The vector $f \in V$ is of course a function $\mathbb{N} \to \mathbb{R}$, and checking the definitions of $+$ and scalar multiplication we see that

\[ f(i) = \begin{cases} \lambda_i & \text{if } i \le n \\ 0 & \text{otherwise.} \end{cases} \]

But if $f = 0$ this means that $f(0) = f(1) = \cdots = f(n) = 0$, in other words $\lambda_0 = \lambda_1 = \cdots = \lambda_n = 0$ as required. □

2.5 Coordinates

If $V$ is a finite dimensional vector space over $\mathbb{R}$ or $\mathbb{C}$ then it has a finite basis $B \subseteq V$. Since $B$ spans $V$, every vector $v$ from $V$ can be written as a linear combination of elements of $B$. Thus if $B = \{v_1, v_2, \ldots, v_n\}$, each $v \in V$ can be written as

\[ v = \lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_n v_n \]

for some scalars $\lambda_1, \lambda_2, \ldots, \lambda_n$. By Proposition 2.18, these scalars $\lambda_1, \lambda_2, \ldots, \lambda_n$ are unique, so providing the ordering of $B$ as $v_1, v_2, \ldots, v_n$ is understood the column vector $(\lambda_1, \lambda_2, \ldots, \lambda_n)^T$ from $\mathbb{R}^n$ or $\mathbb{C}^n$ determines $v$ uniquely; the $\lambda_i$ are called the coordinates of $v$ with respect to the ordered basis $v_1, v_2, \ldots, v_n$ of $V$.
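For $V = \mathbb{R}^n$ with rational entries, finding coordinates amounts to solving $n$ simultaneous linear equations. The following is a minimal sketch of one way to do this by Gauss-Jordan elimination; the function name `coordinates` and the sample basis `w1, w2, w3` are hypothetical choices made for illustration and are not taken from the text.

```python
from fractions import Fraction

def coordinates(basis, v):
    """Coordinates of v with respect to the ordered basis (a list of n vectors in R^n).

    Solves l1*basis[0] + ... + ln*basis[n-1] = v by Gauss-Jordan elimination on
    the augmented matrix whose columns are the basis vectors followed by v.
    Raises ValueError if the given vectors do not form a basis.
    """
    n = len(v)
    if len(basis) != n or any(len(b) != n for b in basis):
        raise ValueError("need n vectors of length n")
    # Row i holds the i-th entries of the basis vectors, then the i-th entry of v.
    m = [[Fraction(b[i]) for b in basis] + [Fraction(v[i])] for i in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("vectors are linearly dependent, so not a basis")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]          # make the pivot entry 1
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return tuple(row[n] for row in m)

# A hypothetical ordered basis of R^3 and a sample vector.
w1, w2, w3 = (1, 1, 0), (0, 1, 1), (1, 0, 1)
v = (2, 3, 1)

print(coordinates([w1, w2, w3], v))   # (2, 1, 0): v = 2*w1 + 1*w2 + 0*w3
print(coordinates([w1, w3, w2], v))   # (2, 0, 1): reordering the basis reorders the coordinates
print(coordinates([(1, 0, 0), (0, 1, 0), (0, 0, 1)], v))   # (2, 3, 1) for the usual basis
```

The returned entries are `Fraction` objects, which compare equal to the corresponding integers. The second call illustrates why the ordering of the basis has to be understood before coordinates make sense.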

Example 2.30 For the real vector space $\mathbb{R}^3$ we may take ordered basis $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$

where

The coordinates of the vector v can be found by solving

=

(1, 0, 1)T with respect to this ordered basis 1/2,

1,

as three simultaneous equations in .A1 , .A2 , .-\3 . This gives .-\1 .-\2 = .A 3 so v has coordinates with respect t o the ordered basis v1 , v2 , v3 . Similarly, it has coordinates with respect to the ordered basis v1 , v3 , v 2 -changing the ordering of the basis changes the order of the coordin ates. Finally, v has coordinates with respect to the usual basis e1 , e2 , e3 0, of IR3 , just as you would expect.

1)T (1/2, 1/2,(1/2, 1, 1/2)T (1, 1)T

=

=
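Finding coordinates is just solving a linear system, which the following minimal sketch illustrates. The function name `coordinates` and the basis used are our own illustrative assumptions (this is not the basis of Example 2.30); exact rational arithmetic avoids rounding.

```python
from fractions import Fraction

def coordinates(basis, v):
    """Solve l1*b1 + ... + ln*bn = v by Gauss-Jordan elimination over the rationals."""
    n = len(v)
    # Augmented matrix: column j holds basis vector j, the last column holds v.
    rows = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(v[i])]
            for i in range(n)]
    for col in range(n):
        # A nonzero pivot exists whenever `basis` really is a basis.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [x / p for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]

# Hypothetical ordered basis of R^3, chosen only for illustration.
b1, b2, b3 = (1, 1, 0), (0, 1, 1), (0, 0, 1)
v = (1, 0, 1)
first = coordinates([b1, b2, b3], v)   # coordinates for the order b1, b2, b3
second = coordinates([b1, b3, b2], v)  # same basis, different order
```

Swapping the second and third basis vectors swaps the second and third coordinates, which is the point made in the example about ordered bases.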

The convention we shall use in this book is that when the ordering of a basis is important we shall say so, and also omit the curly brackets, as in the phrase 'the ordered basis $v_1, v_2, \dots, v_n$'. If the ordering is unimportant (so the basis can be just thought of as a set) we use curly brackets, as in 'the basis $\{v_1, v_2, \dots, v_n\}$'.

Ordered bases and coordinates are used to show that two vector spaces $V$, $W$ of the same dimension over the same scalar field $\mathbb{R}$ or $\mathbb{C}$ are isomorphic, or in other words look the same. Two isomorphic spaces $V$ and $W$ might not have exactly the same vectors, but vectors can be paired off, one from $V$ with one from $W$, so that the operations of addition and scalar multiplication in $W$ do the same to the paired vectors as the operations of addition and scalar multiplication in $V$ do to the original vectors.

Example 2.31 The real vector space $\mathbb{R}^2$ of column vectors (with the usual addition and scalar multiplication) looks rather similar to the complex numbers, $\mathbb{C}$, regarded as a real vector space. We can represent both diagrammatically as a plane (with $x$, $y$ coordinates in the case of $\mathbb{R}^2$, the Argand diagram in the case of $\mathbb{C}$). What's more, the column vector $(x, y)^T$ gives precisely the coordinates of the complex number $x + iy \in \mathbb{C}$ with respect to the ordered basis $1, i$ of $\mathbb{C}$. The idea is to pair these two vectors off with each other. This done, there are certain obvious similarities between the vector space operations. Compare

\[ \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} x + x' \\ y + y' \end{pmatrix} \]

with

\[ (x + iy) + (x' + iy') = (x + x') + i(y + y'), \]

and

\[ \lambda \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \lambda x \\ \lambda y \end{pmatrix} \]

with

\[ \lambda (x + iy) = (\lambda x) + i(\lambda y). \]
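The pairing of $x + iy$ with $(x, y)^T$ can be checked mechanically. The sketch below uses Python's built-in complex numbers; the helper names `to_pair`, `add_pairs` and `scale_pair` are our own and serve only as an illustration.

```python
def to_pair(z):
    """Coordinates of x + iy with respect to the ordered basis 1, i of C."""
    return (z.real, z.imag)

def add_pairs(p, q):
    """Addition of column vectors in R^2."""
    return (p[0] + q[0], p[1] + q[1])

def scale_pair(lam, p):
    """Multiplication of a column vector in R^2 by a real scalar."""
    return (lam * p[0], lam * p[1])

z, w, lam = complex(1, 2), complex(3, -5), 4.0

# Adding then pairing agrees with pairing then adding; likewise for real scalars.
assert to_pair(z + w) == add_pairs(to_pair(z), to_pair(w))
assert to_pair(lam * z) == scale_pair(lam, to_pair(z))
```

Only real scalars are used, since both spaces are being regarded as real vector spaces.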

Note that we ignore here the fact that in the complex numbers we have a multiplication operation $w \cdot z$ combining $z, w \in \mathbb{C}$ whereas there is no such multiplication of two vectors giving another vector defined on $\mathbb{R}^2$, since we are looking at the two spaces purely as vector spaces.

A 'pairing off' of vectors in $V$ and $W$ is called a one-to-one correspondence or a bijection, and is really a function $f : V \to W$ such that $f$ is injective, i.e.

\[ v \ne w \text{ implies } f(v) \ne f(w), \]

and surjective, i.e.

\[ w \in W \text{ implies } w = f(v) \text{ for some } v \in V. \]

Bijections are used to give the complete definition of isomorphisms of vector spaces.

Definition 2.32 Two vector spaces $V$, $W$, both over $\mathbb{R}$ or both over $\mathbb{C}$, are isomorphic if there is a bijection $f : V \to W$ such that

\[ f(u + v) = f(u) + f(v) \tag{13} \]
\[ f(\lambda v) = \lambda f(v) \tag{14} \]

for all $u, v \in V$ and all scalars $\lambda$. The bijection $f$ is said to be an isomorphism from $V$ to $W$. We write $V \cong W$ or $f : V \cong W$.

It is particularly important to realise that there are two different addition operations here, and two different scalar multiplication operations. So it might be better to write

\[ f(u + v) = f(u) \oplus f(v) \]

instead of (13), where $+$ is addition of vectors in $V$ and $\oplus$ is addition of vectors in $W$. Similarly,

\[ f(\lambda \cdot v) = \lambda \odot f(v) \]

would be a more accurate representation of (14), where $\cdot$ is scalar multiplication in $V$ and $\odot$ is scalar multiplication in $W$. Similarly, we should distinguish between the zero vector $0_V$ of $V$ and the zero vector $0_W$ of $W$ as these really are different vectors.

Note that if $f : V \cong W$, then $f(0_V) = 0_W$ (or as we shall usually write, $f(0) = 0$, ignoring the fact that these two zero vectors are actually different). Indeed, $f(0v) = 0 f(v)$ for any vector $v \in V$. But $0v$ is the zero vector of $V$, $f(v)$ is a vector in $W$, and any vector in $W$ multiplied by $0$ (according to scalar multiplication in $W$) is the zero vector of $W$. Note also that (13) and (14) together imply that

\[ f(\lambda_1 a_1 + \lambda_2 a_2 + \dots + \lambda_k a_k) = \lambda_1 f(a_1) + \lambda_2 f(a_2) + \dots + \lambda_k f(a_k) \]

for all $a_1, a_2, \dots, a_k \in V$ and all scalars $\lambda_1, \lambda_2, \dots, \lambda_k$, so an isomorphism $f$ takes linear combinations of $a_1, a_2, \dots, a_k \in V$ to linear combinations of $f(a_1), f(a_2), \dots, f(a_k)$.

The following theorem, when carefully formulated, is actually true for vector spaces of infinite dimension too, but we restrict our discussion here to the finite case to avoid the more difficult issues in set theory that would otherwise be needed.

Theorem 2.33 Suppose $V$ is a vector space over $\mathbb{R}$ with finite dimension $n \ge 0$. Then $V \cong \mathbb{R}^n$ as real vector spaces. Similarly, if $V$ is a vector space over $\mathbb{C}$ with finite dimension $n \ge 0$, then $V \cong \mathbb{C}^n$ as complex vector spaces.

Proof Suppose first that $V$ is a real vector space and $n > 0$. By the definition of 'dimension' there is an ordered basis $v_1, v_2, \dots, v_n$ of $V$ of size $n$. The idea is to define $f : V \to \mathbb{R}^n$ by taking each $v \in V$ to its coordinate form with respect to the ordered basis $v_1, v_2, \dots, v_n$. Specifically, we define $f$ by

\[ f(\lambda_1 v_1 + \lambda_2 v_2 + \dots + \lambda_n v_n) = (\lambda_1, \lambda_2, \dots, \lambda_n)^T, \]

noting that this definition is valid since every $v \in V$ has precisely one expression of the form $\lambda_1 v_1 + \lambda_2 v_2 + \dots + \lambda_n v_n$. We just have to check that $f$ is an isomorphism.

The function $f$ is injective, because if $v = \lambda_1 v_1 + \lambda_2 v_2 + \dots + \lambda_n v_n$, $w = \mu_1 v_1 + \mu_2 v_2 + \dots + \mu_n v_n$, and $f(v) = f(w)$, then

\[ (\lambda_1, \lambda_2, \dots, \lambda_n)^T = (\mu_1, \mu_2, \dots, \mu_n)^T, \]

so $\lambda_i = \mu_i$ for all $i$, so $v = w$. Also, $f$ is surjective, since if $r = (r_1, r_2, \dots, r_n)^T \in \mathbb{R}^n$, we have $f(r_1 v_1 + r_2 v_2 + \dots + r_n v_n) = r$. Furthermore, if $v = \lambda_1 v_1 + \lambda_2 v_2 + \dots + \lambda_n v_n$ and $w = \mu_1 v_1 + \mu_2 v_2 + \dots + \mu_n v_n$, then

\[ v + w = (\lambda_1 + \mu_1) v_1 + (\lambda_2 + \mu_2) v_2 + \dots + (\lambda_n + \mu_n) v_n, \]

so

\[ f(v + w) = (\lambda_1 + \mu_1, \lambda_2 + \mu_2, \dots, \lambda_n + \mu_n)^T = (\lambda_1, \lambda_2, \dots, \lambda_n)^T + (\mu_1, \mu_2, \dots, \mu_n)^T = f(v) + f(w), \]

and if $\nu$ is a scalar then

\[ \nu v = \nu\lambda_1 v_1 + \nu\lambda_2 v_2 + \dots + \nu\lambda_n v_n, \]

so

\[ f(\nu v) = (\nu\lambda_1, \nu\lambda_2, \dots, \nu\lambda_n)^T = \nu(\lambda_1, \lambda_2, \dots, \lambda_n)^T = \nu f(v) \]

as required. For the case of a vector space over $\mathbb{C}$, the argument is the same, but use scalars from $\mathbb{C}$ instead. $\square$

It follows that any two real vector spaces $V$, $W$ of dimension $n$ are isomorphic, as are any two complex vector spaces $V$, $W$ of dimension $n$.

2.6 Vector spaces over other fields

The observant reader might have noticed that the two kinds of vector spaces we have been considering (over the reals and over the complexes) have much in common, and he or she may wonder whether the notion of a vector space makes sense over any number system other than $\mathbb{R}$ or $\mathbb{C}$. The answer is yes. All we need is a number system in which we can perform the usual operations of addition, subtraction, multiplication, and division, subject to the usual rules, such as $a + 0 = a$, and $a \cdot a^{-1} = 1$, and so on.

Definition 2.34 A field is a set $F$ containing distinct elements $0$ and $1$, with two binary operations $+$ and $\cdot$, which satisfy the following axioms.

\[ a + b = b + a \tag{15} \]
\[ (a + b) + c = a + (b + c) \tag{16} \]
\[ a + 0 = a \tag{17} \]
\[ \text{for all } a \text{ there exists } -a \text{ such that } a + (-a) = 0 \tag{18} \]
\[ a \cdot b = b \cdot a \tag{19} \]
\[ (a \cdot b) \cdot c = a \cdot (b \cdot c) \tag{20} \]
\[ a \cdot 1 = a \tag{21} \]
\[ \text{for all } a \ne 0 \text{ there exists } a^{-1} \text{ such that } a \cdot a^{-1} = 1 \tag{22} \]
\[ a \cdot (b + c) = a \cdot b + a \cdot c. \tag{23} \]

If a field $F$ is finite, its order is the number of elements in $F$.

The axioms for fields can be thought of as forming three groups: (A) the rules for addition, (15)-(18); (B) the rules for multiplication, (19)-(22); and (C) the distributivity law, (23). Plenty of examples are furnished by arithmetic modulo some prime $p$.

Example 2.35 If $p$ is any prime number, let $\mathbb{F}_p$ denote the set $\{0, 1, \dots, p - 1\}$, and define operations $+$ and $\cdot$ on $\mathbb{F}_p$ as follows. First use ordinary integer arithmetic, and then 'reduce modulo $p$'; in other words, subtract whatever multiple of $p$ is necessary to bring the answer into the range $0$ to $p - 1$.

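Example 2.36 The field $\mathbb{F}_2$ has two elements, $0$ and $1$, subject to all the ordinary rules of arithmetic except that $1 + 1 = 0$.

Example 2.37 The field $\mathbb{F}_5$ of order 5 can be constructed as follows. First take ordinary arithmetic on the set $\{0, 1, 2, 3, 4\}$:

\[
\begin{array}{c|ccccc}
+ & 0 & 1 & 2 & 3 & 4 \\ \hline
0 & 0 & 1 & 2 & 3 & 4 \\
1 & 1 & 2 & 3 & 4 & 5 \\
2 & 2 & 3 & 4 & 5 & 6 \\
3 & 3 & 4 & 5 & 6 & 7 \\
4 & 4 & 5 & 6 & 7 & 8
\end{array}
\qquad
\begin{array}{c|ccccc}
\cdot & 0 & 1 & 2 & 3 & 4 \\ \hline
0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 2 & 3 & 4 \\
2 & 0 & 2 & 4 & 6 & 8 \\
3 & 0 & 3 & 6 & 9 & 12 \\
4 & 0 & 4 & 8 & 12 & 16
\end{array}
\]

which on reduction modulo 5 gives

\[
\begin{array}{c|ccccc}
+ & 0 & 1 & 2 & 3 & 4 \\ \hline
0 & 0 & 1 & 2 & 3 & 4 \\
1 & 1 & 2 & 3 & 4 & 0 \\
2 & 2 & 3 & 4 & 0 & 1 \\
3 & 3 & 4 & 0 & 1 & 2 \\
4 & 4 & 0 & 1 & 2 & 3
\end{array}
\qquad
\begin{array}{c|ccccc}
\cdot & 0 & 1 & 2 & 3 & 4 \\ \hline
0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 2 & 3 & 4 \\
2 & 0 & 2 & 4 & 1 & 3 \\
3 & 0 & 3 & 1 & 4 & 2 \\
4 & 0 & 4 & 3 & 2 & 1
\end{array}
\]

Example 2.38 The integers modulo 4 do not form a field. One way to see this is to observe that the element 2 does not have a multiplicative inverse, so that you cannot divide by 2. This is because for $x = 0, 1, 2, 3$, we have $2 \cdot x = 0, 2, 0, 2$ respectively, so there is no element $x$ with $2 \cdot x = 1$.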
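The tables of Example 2.37 and the failure in Example 2.38 are easy to reproduce by machine. The following is a small sketch; the function names `tables` and `units` are our own choices for illustration.

```python
def tables(n):
    """Addition and multiplication tables for the integers modulo n."""
    add = [[(i + j) % n for j in range(n)] for i in range(n)]
    mul = [[(i * j) % n for j in range(n)] for i in range(n)]
    return add, mul

def units(n):
    """Elements of {0, ..., n-1} that have a multiplicative inverse modulo n."""
    _, mul = tables(n)
    # a is invertible exactly when 1 appears in row a of the multiplication table.
    return [a for a in range(n) if 1 in mul[a]]

units_mod_5 = units(5)  # every nonzero element appears
units_mod_4 = units(4)  # 2 does not appear, so the integers modulo 4 are not a field
```

Row 2 of the multiplication table modulo 4 is `[0, 2, 0, 2]`, which is exactly the list of products $2 \cdot x$ computed in Example 2.38.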

In fact there is a field of order 4, but it cannot be obtained in this simple way.

Example 2.39 Let $\mathbb{F}_4$ denote the set $\{0, 1, a, a + 1\}$ with addition and multiplication defined by the following tables.

\[
\begin{array}{c|cccc}
+ & 0 & 1 & a & a+1 \\ \hline
0 & 0 & 1 & a & a+1 \\
1 & 1 & 0 & a+1 & a \\
a & a & a+1 & 0 & 1 \\
a+1 & a+1 & a & 1 & 0
\end{array}
\qquad
\begin{array}{c|cccc}
\cdot & 0 & 1 & a & a+1 \\ \hline
0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & a & a+1 \\
a & 0 & a & a+1 & 1 \\
a+1 & 0 & a+1 & 1 & a
\end{array}
\]

Then it can be verified that this is a field with four elements.

Notice that in the last example, the multiplication table tells us that $a \cdot a = a + 1$, which we can think of as a polynomial equation, $a^2 = a + 1$ or $a^2 + a + 1 = 0$. In fact all finite fields can be defined in a similar way, starting with a field of prime order (in this case $\mathbb{F}_2$), and adjoining some element $a$ satisfying a suitable polynomial equation. We shall see more of this in Chapter 9. For the moment, we merely state the following important theorem without proof.
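The rule $a^2 = a + 1$ is enough to compute in $\mathbb{F}_4$. In the sketch below an element $c_0 + c_1 a$ is stored as the pair `(c0, c1)` with entries in $\{0, 1\}$; this representation and the names `add` and `mul` are our own assumptions, not notation from the text.

```python
from itertools import product

# The element c0 + c1*a of F_4 is stored as the pair (c0, c1).
ELEMENTS = list(product((0, 1), repeat=2))
ZERO, ONE, A, A_PLUS_1 = (0, 0), (1, 0), (0, 1), (1, 1)

def add(x, y):
    """Add coefficientwise, reducing modulo 2."""
    return ((x[0] + y[0]) % 2, (x[1] + y[1]) % 2)

def mul(x, y):
    """Multiply as polynomials in a, then replace a^2 by a + 1."""
    # (x0 + x1 a)(y0 + y1 a) = x0 y0 + (x0 y1 + x1 y0) a + x1 y1 a^2
    c0 = x[0] * y[0] + x[1] * y[1]
    c1 = x[0] * y[1] + x[1] * y[0] + x[1] * y[1]
    return (c0 % 2, c1 % 2)

# Every nonzero element has a multiplicative inverse (axiom (22)).
for x in ELEMENTS:
    if x != ZERO:
        assert any(mul(x, y) == ONE for y in ELEMENTS)

# The distributive law (23) holds for all triples.
for x, y, z in product(ELEMENTS, repeat=3):
    assert mul(x, add(y, z)) == add(mul(x, y), mul(x, z))
```

The remaining axioms can be checked by the same kind of exhaustive loop, since there are only four elements.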

Theorem 2.40 For each prime $p$ and each positive integer $n$, there is a unique field of order $p^n$. Moreover, every finite field is of this form.

Example 2.41 The field of order 9 can be defined by adjoining a 'square root of $-1$' to the field $\mathbb{F}_3$ of order 3, in the same way that we obtain $\mathbb{C}$ from $\mathbb{R}$. That is, its elements are $x + iy$ with $x, y \in \mathbb{F}_3$, where $i^2 = -1$.

Vector spaces over finite fields. The whole of this chapter can be generalized to vector spaces over an arbitrary field $F$. Simply replace $\mathbb{R}$ or $\mathbb{C}$ in the definitions and theorems by the field $F$. All the theorems remain true in this more general context.

Example 2.42 Let $F = \mathbb{F}_2 = \{0, 1\}$, the field of order 2. Then

\[ F^8 = \{ (x_1, \dots, x_8)^T : x_i \in F \} \]

is a vector space of dimension 8 over $F$, with a basis

\[ \{ (1,0,0,0,0,0,0,0)^T, (0,1,0,0,0,0,0,0)^T, \dots, (0,0,0,0,0,0,0,1)^T \}. \]

These vectors are very important in computer science, where they are called 'bytes'. The number of such vectors is $2^8$, as there are two possibilities for $x_1$, two possibilities for $x_2$, and so on. More generally, if $F$ is a field of order $q$, then $F^n$ contains exactly $q^n$ vectors.

The generalization of the main theorem, Theorem 2.33, states that any vector space of dimension $n$ over a field $F$ is isomorphic to $F^n$. As a corollary we have the following.

Corollary 2.43 Any vector space of dimension $n$ over a field of order $q$ has exactly $q^n$ vectors.
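The count $q^n$ can be confirmed by listing the vectors. This is only a counting sketch: the function name `all_vectors` is ours, and the entries $0, \dots, q - 1$ are used merely as labels for the $q$ field elements.

```python
from itertools import product

def all_vectors(q, n):
    """All length-n tuples with entries drawn from q labels 0, ..., q-1."""
    return list(product(range(q), repeat=n))

bytes_over_f2 = all_vectors(2, 8)   # the vectors of Example 2.42
count_bytes = len(bytes_over_f2)    # equals 2**8
count_f3_4 = len(all_vectors(3, 4)) # equals 3**4
```

Each coordinate can be chosen independently in $q$ ways, which is the argument given in the example.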

In fact, we may use this result to prove the second part of Theorem 2.40; that is, that every finite field has order equal to some prime power.

Lemma 2.44 Let $F$ be a finite field, and let $F_0$ be the subset

\[ F_0 = \{0,\ 1,\ 1 + 1,\ 1 + 1 + 1,\ \dots\} \]

of $F$. Then $F_0$ is a subfield of $F$ (i.e. is closed under addition and multiplication, and is a field in its own right), and the order of $F_0$ is a prime number, $p$.

Proof $F_0$ is clearly closed under addition, by the associativity of addition. To show it is closed under multiplication, we use distributivity to show that

\[ (\underbrace{1 + \dots + 1}_{a}) \cdot (\underbrace{1 + \dots + 1}_{b}) = \underbrace{1 + \dots + 1}_{ab}. \tag{24} \]

Now $F$ is finite, so the elements $1$, $1 + 1$, $1 + 1 + 1$, $\dots$ cannot be all distinct, which implies that

\[ \underbrace{1 + 1 + \dots + 1}_{r} = \underbrace{1 + 1 + \dots + 1}_{s} \]

for some positive integers $r > s$. Subtracting $1$ from both sides of this equation $s$ times gives

\[ \underbrace{1 + 1 + \dots + 1}_{r - s} = 0. \]

Let $p$ be the smallest positive integer such that

\[ \underbrace{1 + 1 + \dots + 1}_{p} = 0, \]

and note that the argument just given shows that for no $0 \le s < r < p$ does

\[ \underbrace{1 + 1 + \dots + 1}_{r} = \underbrace{1 + 1 + \dots + 1}_{s} \]

for else $0 < r - s$ […]
[…] $y > 0\}$.

Exercise 2.5 What is the dimension ($0, 1, 2, 3, \dots$, or infinite) of the following vector spaces? (Give reasons.)
(a) $\mathbb{C}^5$, as a complex vector space.
(b) $\mathbb{C}^5$, as a real vector space.
(c) The set of polynomials $p(X)$ of degree at most 7 with coefficients from $\mathbb{R}$, as a real vector space.
(d) The set of polynomials $p(X)$ of arbitrary degree with coefficients from $\mathbb{R}$, as a real vector space.

Exercise 2.6 For each integer $n \ge 0$, let $f_n : \mathbb{R} \to \mathbb{R}$ be the function defined by $f_n(x) = x^n$. Show that the set $B = \{f_n : n \ge 0\}$ is a basis for the vector space $\mathbb{R}[x]$ of real polynomial functions. Show that the map $\phi$ defined on $\mathbb{R}[x]$ by

[…]

is injective but not surjective. Deduce that $\mathbb{R}[x]$ is isomorphic (as a vector space) to a proper subspace of itself.

Exercise 2.7 Write out the addition and multiplication tables for $\mathbb{F}_3$.

Exercise 2.8
(a) Write out the addition and multiplication tables for $\mathbb{Z}_6$ (the integers modulo 6), and show that $\mathbb{Z}_6$ is not a field.
(b) More generally, show that if $a$ and $b$ are any two integers bigger than 1, then $\mathbb{Z}_{ab}$ is not a field.

Exercise 2.9 Let $p$ be a prime number, and $\mathbb{Z}_p$ be the set of integers modulo $p$. For any nonzero element $a \in \mathbb{Z}_p$, we define the map 'multiplication by $a$' by

\[ m_a : b \mapsto ab \pmod{p}. \]

(a) Use uniqueness of prime factorization to show that $m_a$ is an injection, and hence a bijection.
(b) Let $b$ be the element such that $m_a(b) = 1 \pmod{p}$. Show that $b$ is an inverse to $a$, and deduce that $\mathbb{Z}_p$ is a field.

Exercise 2.10 Let $V$ be the set of all functions $f : \mathbb{N} \to \mathbb{R}$, and let $0$ be the function defined by $0(n) = 0$ (see Example 2.29).
(a) Prove that each of the axioms holds, showing that $V$ is a real vector space.
(b) Let $B = \{e_i : i \in \mathbb{N}\}$ where $e_i$ is as in Example 2.29. Show that $B$ does not span $V$. (Hint: consider $f$ such that $f(n) = 1$ for all $n$.)
(c) Show that $W = \operatorname{span} B$ and the real vector space $\mathbb{R}[X]$ of polynomials with coefficients from $\mathbb{R}$ are isomorphic to each other.
(d) Show that $V$ and $W$ are not isomorphic to each other, i.e. there is no isomorphism $f : V \cong W$. (This is tricky.)
[Hint for (d): given such an $f$, use induction on $n$ to define a function $g : \mathbb{N} \to \mathbb{R}$ so that for each $n$ there is $k \ge n$ such that […] in $\mathbb{R}^{k+1}$.]

Part II Bilinear and sesquilinear forms

3 Inner product spaces

In this chapter we consider ways of 'multiplying' two vectors in a vector space $V$ to give a scalar. Such a product is called an inner product, or scalar product, and the theory is based initially on a familiar example (sometimes called the dot product) on three-dimensional vectors in $\mathbb{R}^3$. To begin with, we consider real vector spaces, but we will consider complex vector spaces later on.

3.1 The standard inner product

The scalar product, inner product, or dot product, of two vectors in $\mathbb{R}^2$ or $\mathbb{R}^3$ is rather well known. We start this chapter with its definition and a description of its main properties. Throughout this section, $\|v\|$ will denote the length of the vector $v$ in $\mathbb{R}^2$ or $\mathbb{R}^3$.

The standard inner product in $\mathbb{R}^2$. Let $r, s \in \mathbb{R}^2$ be nonzero vectors, and let $\theta$ be the angle from vector $r$ to $s$. The angle $r$ makes with the $x$-axis will be denoted $\alpha$. (See Figure 3.1.) We suppose also that $r = (r_1, r_2)^T$ and $s = (s_1, s_2)^T$ in coordinate form, so

\[ \|r\| = \sqrt{r_1^2 + r_2^2} \quad \text{and} \quad \|s\| = \sqrt{s_1^2 + s_2^2}. \]

[Fig. 3.1 Two vectors and the angle between them]

Now, the matrix for rotation about the origin by an angle $\alpha$ in the clockwise (or negative) direction is the basis-changing matrix

\[ \begin{pmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{pmatrix} = \frac{1}{\|r\|} \begin{pmatrix} r_1 & r_2 \\ -r_2 & r_1 \end{pmatrix}. \]

[Fig. 3.2 The cosine rule]

In other words, multiplication of a position vector $(x, y)^T$ on the left by this matrix gives the corresponding position after rotation by the angle $\alpha$ as indicated. Clearly, this matrix moves $s$ to the vector $\|s\| (\cos\theta, \sin\theta)^T$, since $\theta$ is the angle between $r$ and $s$, so

\[ \|s\| \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix} = \frac{1}{\|r\|} \begin{pmatrix} r_1 & r_2 \\ -r_2 & r_1 \end{pmatrix} \begin{pmatrix} s_1 \\ s_2 \end{pmatrix}, \]

from which we deduce that

\[ r_1 s_1 + r_2 s_2 = \|r\| \|s\| \cos\theta \tag{1} \]

and

\[ r_1 s_2 - r_2 s_1 = \|r\| \|s\| \sin\theta. \tag{2} \]

This chapter and the next are concerned with expressions like these. In particular, $\|r\| \|s\| \cos\theta$ is called the standard inner product of $r$ and $s$, and will be denoted $r \cdot s$ or $\langle r | s \rangle$. Note that as $\cos 0 = 1$ we have $\langle r | r \rangle = \|r\|^2$.
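Identities (1) and (2) can be tested numerically. In the sketch below the angle from $r$ to $s$ is obtained from `math.atan2`; the function name `identities` and the sample vectors are our own, and comparisons allow for floating-point rounding.

```python
import math

def identities(r, s):
    """Return (lhs, rhs) for identity (1) and for identity (2), for nonzero r, s in R^2."""
    norm_r = math.hypot(r[0], r[1])
    norm_s = math.hypot(s[0], s[1])
    # Angle from r to s, measured anticlockwise (it may differ from the
    # principal value by 2*pi, which does not affect cos or sin).
    theta = math.atan2(s[1], s[0]) - math.atan2(r[1], r[0])
    one = (r[0] * s[0] + r[1] * s[1], norm_r * norm_s * math.cos(theta))
    two = (r[0] * s[1] - r[1] * s[0], norm_r * norm_s * math.sin(theta))
    return one, two

(dot, dot_rhs), (cross, cross_rhs) = identities((3.0, 1.0), (-2.0, 4.0))
assert math.isclose(dot, dot_rhs, abs_tol=1e-9)
assert math.isclose(cross, cross_rhs, abs_tol=1e-9)
```

For these sample vectors the left-hand sides are $3 \cdot (-2) + 1 \cdot 4 = -2$ and $3 \cdot 4 - 1 \cdot (-2) = 14$.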

The cosine rule. Before we look at inner products in three or more dimensions, we will deduce the familiar identity $a^2 = b^2 + c^2 - 2bc\cos\theta$ for a triangle with sides of length $a$, $b$, $c$ and angle $\theta$ opposite the side $a$. Let $r$, $s$ be the vectors in $\mathbb{R}^2$ with angle between them $\theta$ and lengths $\|r\| = b$ and $\|s\| = c$, and $\|s - r\| = a$, as in Figure 3.2; suppose that $r = (r_1, r_2)^T$, $s = (s_1, s_2)^T$. Then

\[
\begin{aligned}
\|s - r\|^2 &= (s_1 - r_1)^2 + (s_2 - r_2)^2 \\
&= s_1^2 - 2 s_1 r_1 + r_1^2 + s_2^2 - 2 s_2 r_2 + r_2^2 \\
&= s_1^2 + s_2^2 + r_1^2 + r_2^2 - 2(s_1 r_1 + s_2 r_2) \\
&= \|s\|^2 + \|r\|^2 - 2 \|r\| \|s\| \cos\theta.
\end{aligned}
\]

So $a^2 = b^2 + c^2 - 2bc\cos\theta$.

The standard inner product in $\mathbb{R}^3$. As in $\mathbb{R}^2$, the standard inner product $r \cdot s$ or $\langle r | s \rangle$ is defined to be $\|r\| \|s\| \cos\theta$ where $\theta$ is the angle between the two vectors. Suppose $r = (r_1, r_2, r_3)^T$ and $s = (s_1, s_2, s_3)^T$; then by the cosine rule (referring to Figure 3.2 if necessary) we have

\[
\begin{aligned}
2 \|r\| \|s\| \cos\theta &= \|r\|^2 + \|s\|^2 - \|s - r\|^2 \\
&= r_1^2 + r_2^2 + r_3^2 + s_1^2 + s_2^2 + s_3^2 - \big( (s_1 - r_1)^2 + (s_2 - r_2)^2 + (s_3 - r_3)^2 \big) \\
&= 2 (r_1 s_1 + r_2 s_2 + r_3 s_3),
\end{aligned}
\]

by cancelling terms. Therefore we have an expression very similar to that obtained in two dimensions for our inner product, i.e.

\[ r \cdot s = r_1 s_1 + r_2 s_2 + r_3 s_3. \tag{3} \]

3.2 Inner products

Although it is more difficult to interpret the idea of angle in four or more dimensions, the expression in (3) suggests we define $\langle v | w \rangle$ in $\mathbb{R}^n$ by

\[ \langle v | w \rangle = \sum_{i=1}^{n} v_i w_i \]

for two vectors $v = (v_1, v_2, \dots, v_n)^T$ and $w = (w_1, w_2, \dots, w_n)^T$ in $\mathbb{R}^n$. This is called the standard inner product on $\mathbb{R}^n$, but there are many other possible inner products, as we shall see. What are the essential properties of an inner product? We obviously have

\[ \langle w | v \rangle = \sum_{i=1}^{n} w_i v_i = \sum_{i=1}^{n} v_i w_i = \langle v | w \rangle, \]

\[ \langle v | \lambda w \rangle = \sum_{i=1}^{n} v_i \lambda w_i = \lambda \sum_{i=1}^{n} v_i w_i = \lambda \langle v | w \rangle, \]

and

\[ \langle u | v + w \rangle = \sum_{i=1}^{n} u_i (v_i + w_i) = \sum_{i=1}^{n} u_i v_i + \sum_{i=1}^{n} u_i w_i = \langle u | v \rangle + \langle u | w \rangle. \]

A slightly less obvious property which is nevertheless important is that

\[ \langle v | v \rangle = \sum_{i=1}^{n} v_i^2 \ge 0, \]

and also that if $\langle v | v \rangle = 0$ then $\sum_{i=1}^{n} v_i^2 = 0$ and hence $v_i = 0$ for all $i$, which implies $v = 0$. It is these properties which we (somewhat arbitrarily) choose as the defining properties of an inner product.

Definition 3.1 If $V$ is a vector space over $\mathbb{R}$, then an inner product on $V$ is a map (written $\langle\ |\ \rangle$) from $V \times V$ to $\mathbb{R}$ (taking a pair of vectors $(v, w)$ to a real number $\langle v | w \rangle$) with the following properties.
(a) (Symmetry) $\langle v | w \rangle = \langle w | v \rangle$ for all vectors $v$, $w$ in $V$.
(b) (Linearity) $\langle u | \lambda v + \mu w \rangle = \lambda \langle u | v \rangle + \mu \langle u | w \rangle$, for all vectors $u$, $v$, $w$ in $V$ and all scalars $\lambda$, $\mu$.
(c) (Positive definiteness)
i. $\langle v | v \rangle \ge 0$, and
ii. if $\langle v | v \rangle = 0$ then $v = 0$ (equivalently, if $v \ne 0$, then $\langle v | v \rangle \ne 0$)
for all vectors $v \in V$.

Notice that because it is symmetric, the linearity in the second variable implies linearity in the first variable. That is,

\[ \langle \lambda u + \mu v | w \rangle = \lambda \langle u | w \rangle + \mu \langle v | w \rangle. \]

Thus an inner product is linear in both variables, so we call it bilinear.

Definition 3.2 A finite dimensional vector space over $\mathbb{R}$ with an inner product defined on it is called a Euclidean space.

We can define all sorts of different inner products on a vector space, not just the standard ones given above.

Example 3.3 In $\mathbb{R}^2$ we could define

\[ \langle (a, b)^T | (c, d)^T \rangle = ac + bc + ad + 3bd. \]

This satisfies the above three properties, so is an inner product. The first two are easy to verify. To check the third one, observe first that $\langle (a, b)^T | (a, b)^T \rangle = a^2 + 2ab + 3b^2 = (a + b)^2 + 2b^2 \ge 0$. Moreover, if $\langle (a, b)^T | (a, b)^T \rangle = 0$, then $(a + b)^2 + 2b^2 = 0$, so $a + b = 0$ and $b = 0$, and therefore $(a, b)^T = (0, 0)^T$.
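As a sanity check (not a proof), the properties claimed in Example 3.3 can be tested on a grid of sample vectors. The function name `ip` and the choice of grid are our own assumptions; exact fractions are used so that equalities are tested without rounding.

```python
from fractions import Fraction
from itertools import product

def ip(v, w):
    """The inner product of Example 3.3 on R^2."""
    (a, b), (c, d) = v, w
    return a * c + b * c + a * d + 3 * b * d

# Sample vectors with coordinates -2, -3/2, ..., 3/2, 2.
grid = [Fraction(n, 2) for n in range(-4, 5)]
vectors = list(product(grid, repeat=2))

for v in vectors:
    # <v|v> = (a + b)^2 + 2 b^2, which is positive unless v = 0.
    assert ip(v, v) == (v[0] + v[1]) ** 2 + 2 * v[1] ** 2
    assert ip(v, v) > 0 or v == (0, 0)

for v, w in product(vectors, repeat=2):
    assert ip(v, w) == ip(w, v)  # symmetry
```

Passing on finitely many samples does not establish the properties in general; the algebraic argument in the example does that.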

Example 3.4 Another example is given by the vector space $\mathcal{C}[a, b]$ of all continuous functions¹ from the closed interval $[a, b]$ to $\mathbb{R}$. Here, addition and scalar multiplication are the usual pointwise addition and scalar multiplication of functions (similar to that in Example 2.29), and the zero element is the function which takes the value $0$ everywhere. If $f$ and $g$ are two functions in $\mathcal{C}[a, b]$, we can define their inner product to be

\[ \langle f | g \rangle = \int_a^b f(x) g(x) \, dx. \]

To prove that this is an inner product according to Definition 3.1, we first need the facts that

\[ \int_a^b f(x) g(x) \, dx = \int_a^b g(x) f(x) \, dx, \]

\[ \int_a^b f(x) \big( \alpha g(x) + \beta h(x) \big) \, dx = \alpha \int_a^b f(x) g(x) \, dx + \beta \int_a^b f(x) h(x) \, dx, \]

and

\[ \int_a^b (f(x))^2 \, dx \ge 0, \]

which immediately give properties (a), (b), and (c i). To prove (c ii), we need the (somewhat more difficult) result that if $g : [a, b] \to \mathbb{R}$ is a continuous function which is nonnegative and not identically $0$, then $\int_a^b g(x) \, dx \ne 0$. This result is proved in Appendix A, as Lemma A.2. Now suppose that $f$ is not identically $0$, so that $(f(x))^2$ defines a continuous function which is everywhere nonnegative and not identically zero. Then

\[ \langle f | f \rangle = \int_a^b (f(x))^2 \, dx \ne 0, \]

giving (c ii), so we have now proved that this is an inner product.

¹ We shall not give a completely rigorous account of 'continuous functions' and integration. Instead, the student is directed to any standard textbook in analysis. For now, the basic properties needed here can be taken on trust. See also Appendix A.
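For polynomial functions the integral inner product can be computed exactly, which makes the definition easy to experiment with. In this sketch a polynomial is a list of coefficients `[c0, c1, ...]`; that representation, the helper names, and the default interval $[0, 1]$ are all our own assumptions.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists [c0, c1, ...]."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += Fraction(pi) * Fraction(qj)
    return out

def integrate(p, a, b):
    """Exact integral of the polynomial p over [a, b]."""
    a, b = Fraction(a), Fraction(b)
    # The integral of c * x^k over [a, b] is c * (b^(k+1) - a^(k+1)) / (k+1).
    return sum(c * (b ** (k + 1) - a ** (k + 1)) / (k + 1) for k, c in enumerate(p))

def ip(f, g, a=0, b=1):
    """<f|g> = integral over [a, b] of f(x) g(x) dx, for polynomials f and g."""
    return integrate(poly_mul(f, g), a, b)

x = [0, 1]          # the polynomial x
one = [1]           # the constant polynomial 1
norm_sq_x = ip(x, x)             # integral of x^2 over [0, 1]
sym = ip(x, one) == ip(one, x)   # symmetry, property (a)
```

Changing the interval changes the inner product: on $[-1, 1]$ the polynomials $1$ and $x$ have inner product $0$, while on $[0, 1]$ it is $1/2$.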

Example 3.5 The vector space $\mathcal{C}[a, b]$ in Example 3.4 is a very large infinite dimensional space, but we can construct a similar finite dimensional example, by taking the space $\mathbb{R}_n[x]$ of all polynomials in $x$ of degree less than $n$, and defining an inner product by

\[ \langle f | g \rangle = \int_0^1 f(x) g(x) \, dx. \]

The ordinary scalar product on $\mathbb{R}^2$ and $\mathbb{R}^3$ is related to concepts of length and distance in a way that can easily be generalized to arbitrary Euclidean spaces. If we take $v = (x, y, z)^T \in \mathbb{R}^3$ then for the standard inner product, $v \cdot v = x^2 + y^2 + z^2$, which is the square of the length of the vector $v$. The distance between two vectors $v$ and $w$ is then naturally defined as the length of $v - w$. In general we define:

Definition 3.6 The norm (or length) of a vector $v$ is written $\|v\|$ and defined by $\|v\| = \sqrt{\langle v | v \rangle}$, the positive square root of the inner product of $v$ with itself. The distance between two vectors $v$ and $w$ is written $d(v, w)$ and defined by $d(v, w) = \|v - w\|$.

As a consequence, we note that $\|{-v}\| = \|v\|$; the 'length' of the vector $-v$ equals the length of $v$. More generally, if you multiply a vector by a scalar, then its length is multiplied by the absolute value of the same scalar.

Proposition 3.7 For all vectors $v$ in a Euclidean space $V$, and for all $\lambda \in \mathbb{R}$, we have $\|\lambda v\| = |\lambda| \cdot \|v\|$.

Proof $\|\lambda v\| = \sqrt{\langle \lambda v | \lambda v \rangle} = \sqrt{\lambda^2 \langle v | v \rangle} = |\lambda| \cdot \sqrt{\langle v | v \rangle} = |\lambda| \cdot \|v\|$. $\square$

If you add two vectors together, the relationship between the lengths is not so simple, but we still get the familiar triangle inequality, i.e. $\|v + w\| \le \|v\| + \|w\|$. To prove this in general requires the following very important result.

Proposition 3.8 (The Cauchy-Schwarz inequality) For all vectors $v$, $w$ in a Euclidean space $V$,

\[ |\langle v | w \rangle| \le \|v\| \cdot \|w\|. \]

Proof For every real value of $\lambda$ we have

\[
\begin{aligned}
0 \le \|v + \lambda w\|^2 &= \langle v + \lambda w | v + \lambda w \rangle \\
&= \langle v | v \rangle + \lambda \langle v | w \rangle + \lambda \langle w | v \rangle + \lambda^2 \langle w | w \rangle \\
&= \lambda^2 \|w\|^2 + 2\lambda \langle v | w \rangle + \|v\|^2.
\end{aligned}
\]

Now regard the right-hand side as a quadratic polynomial in the variable $\lambda$. This polynomial is always nonnegative, so it has at most one real root. Therefore the discriminant ('$b^2 - 4ac$' for the polynomial $ax^2 + bx + c$) is nonpositive. In symbols,

\[ (2 \langle v | w \rangle)^2 - 4 \|w\|^2 \|v\|^2 \le 0, \]

hence

\[ \langle v | w \rangle^2 \le \|v\|^2 \|w\|^2. \]

Now take the positive square root of both sides to obtain the result. $\square$

A slightly different proof is obtained by saying that the first inequality in the above proof, namely

\[ 0 \le \|v + \lambda w\|^2, \]

holds for all values of $\lambda$, in particular if $w \ne 0$ then the inequality above holds for

\[ \lambda = -\frac{\langle v | w \rangle}{\|w\|^2}. \]

Substituting in and simplifying yields the Cauchy-Schwarz inequality. Of course, if $w = 0$ then $\langle v | w \rangle = 0$ so both sides of the inequality are zero and so the

inequality is true here too. (Compare this also with the proof of the complex version given in Proposition 3.21 below.)

The Cauchy-Schwarz inequality has many forms, as it can be applied to all sorts of spaces. For example, applying it to the standard inner product on $\mathbb{R}^n$ we obtain the following.

Corollary 3.9 If $x_1, \dots, x_n, y_1, \dots, y_n$ are any real numbers, then

\[ \Big( \sum_{i=1}^{n} x_i y_i \Big)^2 \le \Big( \sum_{i=1}^{n} x_i^2 \Big) \Big( \sum_{i=1}^{n} y_i^2 \Big). \]

Similarly, we can apply it to Example 3.4 to obtain:

Corollary 3.10 If $f$ and $g$ are continuous real-valued functions on the closed interval $[a, b]$, then

\[ \Big( \int_a^b f(x) g(x) \, dx \Big)^2 \le \int_a^b (f(x))^2 \, dx \int_a^b (g(x))^2 \, dx. \]

Proposition 3.11 (The triangle inequality) For any vectors $v$, $w$ in a Euclidean space $V$,

\[ \|v + w\| \le \|v\| + \|w\|. \]

Proof Expanding directly,

\[
\begin{aligned}
\|v + w\|^2 &= \langle v + w | v + w \rangle \\
&= \langle v | v \rangle + 2 \langle v | w \rangle + \langle w | w \rangle \\
&\le \|v\|^2 + 2 |\langle v | w \rangle| + \|w\|^2 \\
&\le \|v\|^2 + 2 \|v\| \|w\| + \|w\|^2 \quad \text{by Proposition 3.8} \\
&= (\|v\| + \|w\|)^2.
\end{aligned}
\]

Now take the positive square root of each side. $\square$
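Both inequalities are easy to test on random vectors for the standard inner product on $\mathbb{R}^5$. This is a spot check rather than a proof; the helper names, the fixed seed and the small tolerance for floating-point rounding are our own choices.

```python
import math
import random

def ip(v, w):
    """Standard inner product on R^n."""
    return sum(x * y for x, y in zip(v, w))

def norm(v):
    """Norm induced by the standard inner product."""
    return math.sqrt(ip(v, v))

random.seed(0)  # fixed seed so that the run is reproducible
tol = 1e-9      # allowance for floating-point rounding
for _ in range(1000):
    v = [random.uniform(-10, 10) for _ in range(5)]
    w = [random.uniform(-10, 10) for _ in range(5)]
    # Cauchy-Schwarz (Proposition 3.8) and the triangle inequality (Proposition 3.11).
    assert abs(ip(v, w)) <= norm(v) * norm(w) + tol
    assert norm([x + y for x, y in zip(v, w)]) <= norm(v) + norm(w) + tol
```

Equality in Cauchy-Schwarz occurs when one vector is a scalar multiple of the other, for example $v = (1, 2)$ and $w = (2, 4)$.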

We have just seen how to generalize the concept of distance from ordinary three-dimensional Euclidean space to arbitrary real inner product spaces, in such a way that the basic theorems like the triangle inequality still hold in this more general context. Another concept that can be easily generalized is that of angle. Recall that two vectors in $\mathbb{R}^3$ are perpendicular (at right angles, or orthogonal) if their inner product is zero. More generally, if $v$ and $w$ are two nonzero vectors in $\mathbb{R}^3$ then the angle $\theta$ between them is given by $v \cdot w = \|v\| \|w\| \cos\theta$. These properties can be used as definitions in arbitrary spaces with inner products defined on them.

Definition 3.12 If $V$ is a Euclidean space, and $v$ and $w$ are elements of $V$, then $v$ and $w$ are said to be orthogonal if $\langle v | w \rangle = 0$. If both $v$ and $w$ are nonzero, the angle between $v$ and $w$ is defined to be $\theta$ where $0 \le \theta \le \pi$ and

\[ \cos\theta = \frac{\langle v | w \rangle}{\|v\| \cdot \|w\|}. \]

Note that the Cauchy-Schwarz inequality (Proposition 3.8) implies that

\[ -1 \le \frac{\langle v | w \rangle}{\|v\| \cdot \|w\|} \le 1 \]

and so this definition of $\theta$ makes sense.

Example 3.13 In $\mathbb{R}^3$ with the standard inner product, these definitions coincide with the ordinary geometrical definitions.
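Definition 3.12 translates directly into a computation for the standard inner product. In the sketch below the function name `angle` is ours, and the clamp guards against rounding pushing the quotient slightly outside $[-1, 1]$.

```python
import math

def ip(v, w):
    """Standard inner product on R^n."""
    return sum(x * y for x, y in zip(v, w))

def angle(v, w):
    """Angle in [0, pi] between nonzero vectors v and w, as in Definition 3.12."""
    c = ip(v, w) / math.sqrt(ip(v, v) * ip(w, w))
    # Rounding can push c slightly outside [-1, 1]; clamp before calling acos.
    return math.acos(max(-1.0, min(1.0, c)))

right_angle = angle((1, 0, 0), (0, 1, 0))  # orthogonal vectors
opposite = angle((1, 0), (-1, 0))          # vectors pointing in opposite directions
```

Orthogonal vectors give $\cos\theta = 0$, so the angle is $\pi/2$, in agreement with the ordinary geometrical meaning in $\mathbb{R}^3$.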

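Example 3.14 In $\mathcal{C}[-\pi, \pi]$ with the inner product

\[ \langle f | g \rangle = \int_{-\pi}^{\pi} f(x) g(x) \, dx, \]

define functions $f_k$ and $g_k$ by $f_k(x) = \cos(kx)$ for integers $k \ge 0$ and $g_k(x) = \sin(kx)$ for integers $k \ge 1$. Then you can check that any two of these functions are orthogonal to each other. For example,

\[ \langle f_k | g_m \rangle = \int_{-\pi}^{\pi} \cos(kx) \sin(mx) \, dx = 0 \]

since the integrand is an odd function of $x$, i.e. a function $f(x)$ such that $f(-x) = -f(x)$ for all $x \in \mathbb{R}$. Also, provided $k \ne m$,

\[
\begin{aligned}
\langle f_k | f_m \rangle &= \int_{-\pi}^{\pi} \cos(kx) \cos(mx) \, dx \\
&= \frac{1}{2} \int_{-\pi}^{\pi} \big( \cos(kx + mx) + \cos(kx - mx) \big) \, dx \\
&= \frac{1}{2} \left[ \frac{\sin(k\pi + m\pi)}{k + m} - \frac{\sin(-(k\pi + m\pi))}{k + m} + \frac{\sin(k\pi - m\pi)}{k - m} - \frac{\sin(-(k\pi - m\pi))}{k - m} \right] \\
&= 0.
\end{aligned}
\]

Similarly, if $k \ne m$,

\[
\begin{aligned}
\langle g_k | g_m \rangle &= \int_{-\pi}^{\pi} \sin(kx) \sin(mx) \, dx \\
&= \frac{1}{2} \int_{-\pi}^{\pi} \big( \cos(kx - mx) - \cos(kx + mx) \big) \, dx \\
&= 0.
\end{aligned}
\]

(If $k = m \ge 1$ then $\cos(kx - mx) = 1$ and both integrals come to $\pi$.) This is an important example which you will most likely meet elsewhere as well. It is the foundation of the theory of Fourier series.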
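The orthogonality relations can be checked numerically. The sketch below approximates the integral with the composite Simpson rule, which is our own choice of method (any reasonable quadrature would do); the names `integral`, `ip`, `f` and `g` are likewise ours.

```python
import math

def integral(h, a, b, n=2000):
    """Composite Simpson approximation to the integral of h over [a, b]; n must be even."""
    step = (b - a) / n
    total = h(a) + h(b)
    for i in range(1, n):
        # Simpson weights alternate 4, 2, 4, ... on the interior points.
        total += (4 if i % 2 else 2) * h(a + i * step)
    return total * step / 3

def ip(u, v):
    """Approximate inner product of u and v on [-pi, pi]."""
    return integral(lambda x: u(x) * v(x), -math.pi, math.pi)

def f(k):
    """The function x -> cos(kx)."""
    return lambda x: math.cos(k * x)

def g(k):
    """The function x -> sin(kx)."""
    return lambda x: math.sin(k * x)

cos_sin = ip(f(2), g(3))   # close to 0
cos_cos = ip(f(1), f(3))   # close to 0
sin_sin = ip(g(2), g(2))   # close to pi
```

The results agree with the exact values only up to a small numerical error, so comparisons should use a tolerance.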

3.3 Inner products over $\mathbb{C}$

So far in this chapter we have been working with real vector spaces. However, the whole theory goes through for complex vector spaces with very little change. In what follows we write $\bar{z}$ for the complex conjugate of $z$, and $|z|$ for the absolute value of $z$, so that $|z|^2 = z\bar{z}$. We shall also use $\operatorname{Re}(z)$ for the real part of $z$, and $\operatorname{Im}(z)$ for the imaginary part of $z$, so

\[ \frac{z + \bar{z}}{2} = \operatorname{Re}(z) \quad \text{and} \quad \frac{z - \bar{z}}{2i} = \operatorname{Im}(z). \]

Example 3.15 Let $V = \mathbb{C}$, regarded as a one-dimensional complex vector space. It would be nice if our definition of an inner product gave rise to $\|z - w\|$ being the distance from $z$ to $w$ in the Argand diagram, i.e. $|z - w|$, which would mean that $\|z\|^2 = |z|^2 = \bar{z}z$. This suggests we should define the inner product by $\langle z | w \rangle = \bar{z}w$. This has slightly different properties from the real inner products we have seen so far. For example, $\langle z | w \rangle = \bar{z}w = \overline{\bar{w}z} = \overline{\langle w | z \rangle}$. We still have $\langle z | \lambda w \rangle = \bar{z}\lambda w = \lambda\bar{z}w = \lambda \langle z | w \rangle$, but now $\langle \lambda z | w \rangle = \overline{\lambda z}\,w = \bar{\lambda}\bar{z}w = \bar{\lambda} \langle z | w \rangle$. Based on this example, we give the following formal definition of a complex inner product.

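Definition 3.16 If $V$ is a vector space over $\mathbb{C}$, then a map $\langle\ |\ \rangle$ from $V \times V$ to $\mathbb{C}$ (taking $(v, w)$ to $\langle v | w \rangle$) is an inner product if the following are true for all $u, v, w \in V$ and $\lambda, \mu \in \mathbb{C}$.
(a) (Conjugate-symmetry) $\langle v | w \rangle = \overline{\langle w | v \rangle}$.
(b) (Linearity) $\langle u | \lambda v + \mu w \rangle = \lambda \langle u | v \rangle + \mu \langle u | w \rangle$.
(c) (Positive definiteness)
i. $\langle v | v \rangle \in \mathbb{R}$ with $\langle v | v \rangle \ge 0$, and
ii. if $\langle v | v \rangle = 0$ then $v = 0$ (equivalently, if $v \ne 0$ then $\langle v | v \rangle \ne 0$).

Notice that (a) and (b) imply that

\[
\begin{aligned}
\langle \lambda u + \mu v | w \rangle &= \overline{\langle w | \lambda u + \mu v \rangle} \\
&= \overline{\lambda \langle w | u \rangle + \mu \langle w | v \rangle} \\
&= \bar{\lambda}\,\overline{\langle w | u \rangle} + \bar{\mu}\,\overline{\langle w | v \rangle} \\
&= \bar{\lambda} \langle u | w \rangle + \bar{\mu} \langle v | w \rangle.
\end{aligned}
\]

Note in particular that $\langle ru + sv | w \rangle = r \langle u | w \rangle + s \langle v | w \rangle$ for any real $r$, $s$; putting $r = s = 1$ we have $\langle u + v | w \rangle = \langle u | w \rangle + \langle v | w \rangle$. Although this type of inner product is not bilinear it is nearly so. It is sometimes called sesquilinear, i.e. 'one-and-a-half-linear', because of this and part (b) of Definition 3.16, and it is also called conjugate-symmetric in view of part (a) of Definition 3.16.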

Definition 3.17 A finite dimensional vector space over $\mathbb{C}$ with an inner product defined on it is called a unitary space.

We will also need to be able to refer to vector spaces with an inner product without specifying whether these spaces are over $\mathbb{R}$ or over $\mathbb{C}$, and without any assumption that they have finite dimension. Such spaces will be referred to in the rest of this book as inner product spaces.

Example 3.18 Let $V = \mathbb{C}^n$ and let $v$ and $w$ be any two vectors in $V$, say $v = (v_1, \dots, v_n)^T$ and $w = (w_1, \dots, w_n)^T$. Then define

\[ \langle v | w \rangle = \sum_{i=1}^{n} \overline{v_i}\, w_i. \]

Then $\langle\ |\ \rangle$ is an inner product (as you should check), called the standard inner product on $\mathbb{C}^n$.
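The properties of Definition 3.16 can be spot-checked for the standard inner product on $\mathbb{C}^2$ using Python's built-in complex numbers. The sample vectors and the function name `ip` are our own; the entries are small integers so that the floating-point arithmetic is exact.

```python
def ip(v, w):
    """Standard inner product on C^n: conjugate the entries of the first argument."""
    return sum(x.conjugate() * y for x, y in zip(v, w))

v = [1 + 2j, 3 - 1j]
w = [2 - 1j, 1j]
lam = 2 + 3j

# (a) conjugate-symmetry
assert ip(v, w) == ip(w, v).conjugate()
# (b) linearity in the second variable
assert ip(v, [lam * y for y in w]) == lam * ip(v, w)
# conjugate-linearity in the first variable, which follows from (a) and (b)
assert ip([lam * x for x in v], w) == lam.conjugate() * ip(v, w)
# (c) <v|v> is real and positive for this nonzero v
assert ip(v, v).imag == 0 and ip(v, v).real > 0
```

Here $\langle v | w \rangle = -1 - 2i$ and $\langle v | v \rangle = |1 + 2i|^2 + |3 - i|^2 = 15$.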

Most of the results for real inner product spaces also work for complex inner product spaces, with a sprinkling of complex conjugates and absolute values inserted.

Definition 3 . 1 9 If V is a unitary space and

v to be l v l = y'(vjv), and the d(v,w) = l v - w l .

v E V, we define the of v and w to be norm

distance between two vectors

(The concept of angle is not so easy to interpret in the complex case, so we will not try to do so. )

Proposition 3.20 For any vector $v$ in a complex inner product space and any $\lambda \in \mathbb{C}$, we have $\|\lambda v\| = |\lambda| \cdot \|v\|$.

Proof
$$\|\lambda v\| = \sqrt{(\lambda v|\lambda v)} = \sqrt{\bar\lambda\lambda\,(v|v)} = |\lambda| \cdot \sqrt{(v|v)} = |\lambda| \cdot \|v\|.$$
□

Proposition 3.21 (The Cauchy-Schwarz inequality) For any vectors $v, w$ in a complex inner product space $V$,
$$|(v|w)| \le \|v\| \cdot \|w\|.$$

Proof For variety, we give here a slightly different proof from the one we gave for the real case. Note carefully the places in the proof where we have to insert an absolute value, complex conjugate, or real part. For every complex value of $\lambda$ we have
$$\begin{aligned}
0 \le \|v + \lambda w\|^2 &= (v + \lambda w|v + \lambda w) \\
&= (v|v + \lambda w) + \bar\lambda\,(w|v + \lambda w) \\
&= (v|v) + \lambda(v|w) + \bar\lambda(w|v) + \bar\lambda\lambda(w|w) \\
&= (v|v) + \lambda(v|w) + \overline{\lambda(v|w)} + \bar\lambda\lambda(w|w) \\
&= \|v\|^2 + 2\,\mathrm{Re}(\lambda(v|w)) + |\lambda|^2\|w\|^2.
\end{aligned}$$
Now this is true for all $\lambda$, in particular if $w \neq 0$ it is true for $\lambda = -(w|v)/\|w\|^2$. Since $(v|w)(w|v) = (v|w)\overline{(v|w)} = |(v|w)|^2$, we have
$$0 \le \|v\|^2 + 2\,\mathrm{Re}\left(-\frac{|(v|w)|^2}{\|w\|^2}\right) + \frac{|(v|w)|^2}{\|w\|^4}\,\|w\|^2 = -\frac{|(v|w)|^2}{\|w\|^2} + \|v\|^2.$$
Therefore
$$|(v|w)|^2 \le \|v\|^2\,\|w\|^2,$$
from which the result for $w \neq 0$ follows by taking the positive square root of both sides. For $w = 0$ the inequality is trivial as $(v|w) = \|w\| = 0$. □

Proposition 3.22 (The triangle inequality) For any $v, w$ in a complex inner product space $V$,
$$\|v + w\| \le \|v\| + \|w\|.$$

Proof As before,
$$\begin{aligned}
\|v + w\|^2 &= (v + w|v + w) \\
&= (v|v) + (v|w) + (w|v) + (w|w) \\
&= (v|v) + 2\,\mathrm{Re}(v|w) + (w|w) \\
&\le \|v\|^2 + 2|(v|w)| + \|w\|^2 \\
&\le \|v\|^2 + 2\|v\| \cdot \|w\| + \|w\|^2 \quad \text{by Proposition 3.21} \\
&= (\|v\| + \|w\|)^2.
\end{aligned}$$
Now take the positive square root of each side. □
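The three results above can be spot-checked numerically. The following is only a sketch: it assumes the standard inner product on $\mathbb{C}^n$ (conjugate taken in the first argument, matching the convention of linearity in the second argument used here), and the vectors `v`, `w` and the scalar `lam` are arbitrary illustrative choices, not taken from the text.

```python
import math

def inner(v, w):
    """Standard inner product on C^n, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(v, w))

def norm(v):
    """Norm of v: the square root of (v|v), which is real and nonnegative."""
    return math.sqrt(inner(v, v).real)

# Arbitrary sample data (an assumption made for illustration only).
v = [1 + 2j, 3 - 1j, 0.5j]
w = [2 - 1j, 1j, 4 + 0j]
lam = 2 - 3j

# Proposition 3.20: the norm of lam*v equals |lam| times the norm of v.
scaled_norm = norm([lam * a for a in v])

# Proposition 3.21 (Cauchy-Schwarz): |(v|w)| <= ||v|| ||w||.
cs_left, cs_right = abs(inner(v, w)), norm(v) * norm(w)

# Proposition 3.22 (triangle inequality): ||v + w|| <= ||v|| + ||w||.
tri_left = norm([a + b for a, b in zip(v, w)])
tri_right = norm(v) + norm(w)

print(scaled_norm, abs(lam) * norm(v))
print(cs_left, "<=", cs_right)
print(tri_left, "<=", tri_right)
```

A check of this kind is of course no substitute for the proofs; it only illustrates where the absolute values and conjugates enter.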

Example 3.23 For $a < b$ from $\mathbb{R}$, the space $\mathcal{C}_{\mathbb{C}}[a,b]$ is the complex vector space of all continuous functions (see Appendix A) $f : [a,b] \to \mathbb{C}$ with pointwise addition and scalar multiplication, analogous to the real case (Example 3.4). As in the real case, $\mathcal{C}_{\mathbb{C}}[a,b]$ can be given an inner product,
$$(f|g) = \int_a^b \overline{f(x)}\,g(x)\,dx.$$
The axioms of linearity in the second argument and conjugate-symmetry are straightforward to verify; positive definiteness is proved just as in the real case, using $\overline{f(x)}f(x) = |f(x)|^2 \in \mathbb{R}$, and the fact that $|f(x)|^2$ defines a continuous real-valued function on $[a,b]$. The Cauchy-Schwarz inequality for this space says that whenever $f, g \in \mathcal{C}_{\mathbb{C}}[a,b]$ then $|(f|g)| \le \|f\| \cdot \|g\|$, or
$$\left|\int_a^b \overline{f(x)}\,g(x)\,dx\right| \le \left(\int_a^b |f(x)|^2\,dx\right)^{1/2} \left(\int_a^b |g(x)|^2\,dx\right)^{1/2}.$$

Example 3.24 Let $V$ be the set of all continuous functions $f : \mathbb{R} \to \mathbb{C}$ such that the integral
$$\int_{-\infty}^{\infty} |f(x)|^2\,dx = \lim_{a \to \infty,\ b \to \infty} \int_{-a}^{b} |f(x)|^2\,dx$$
exists in $\mathbb{R}$. We show that $V$ becomes a vector space over $\mathbb{C}$ when addition and scalar multiplication are defined pointwise, as in the previous example. The most difficult part is in showing that $V$ is closed under addition, i.e. that if $f, g \in V$ then $f + g \in V$, in particular that the integral
$$\int_{-\infty}^{\infty} |f(x) + g(x)|^2\,dx$$
is finite. For this, let $a, b > 0$ and note that
$$\begin{aligned}
|f(x) + g(x)|^2 &= \overline{(f(x) + g(x))}\,(f(x) + g(x)) \\
&= \overline{f(x)}f(x) + \overline{f(x)}g(x) + \overline{g(x)}f(x) + \overline{g(x)}g(x) \\
&= |f(x)|^2 + 2\,\mathrm{Re}\big(\overline{f(x)}g(x)\big) + |g(x)|^2.
\end{aligned}$$
So $\int_{-a}^{b} |f(x) + g(x)|^2\,dx$ equals
$$\int_{-a}^{b} |f(x)|^2\,dx + 2\int_{-a}^{b} \mathrm{Re}\big(\overline{f(x)}g(x)\big)\,dx + \int_{-a}^{b} |g(x)|^2\,dx \le (f|f)_{[-a,b]} + 2\,|(f|g)_{[-a,b]}| + (g|g)_{[-a,b]},$$
where $(\ |\ )_{[-a,b]}$ denotes the inner product in $\mathcal{C}_{\mathbb{C}}[-a,b]$ of the previous example. Now put $r = \int_{-\infty}^{\infty} |f(x)|^2\,dx$ and $s = \int_{-\infty}^{\infty} |g(x)|^2\,dx$; note also that
$$|(f|g)_{[-a,b]}| \le |(f|f)_{[-a,b]}|^{1/2}\,|(g|g)_{[-a,b]}|^{1/2}$$
by Cauchy-Schwarz in $\mathcal{C}_{\mathbb{C}}[-a,b]$. Hence $\int_{-a}^{b} |f(x) + g(x)|^2\,dx$ is at most
$$(f|f)_{[-a,b]} + 2\,|(f|f)_{[-a,b]}|^{1/2}\,|(g|g)_{[-a,b]}|^{1/2} + (g|g)_{[-a,b]} \le r + 2\sqrt{rs} + s.$$
This gives an upper bound on $I(a,b) = \int_{-a}^{b} |f(x) + g(x)|^2\,dx$ which is independent of $a, b$. Since $I(a,b)$ is nondecreasing as $a, b \to \infty$, this shows that $\lim_{a,b \to \infty} I(a,b) = \int_{-\infty}^{\infty} |f(x) + g(x)|^2\,dx$ exists, as required. To prove closure of $V$ under scalar multiplication, just note that for $\lambda \in \mathbb{C}$,
$$\int_{-\infty}^{\infty} |\lambda f(x)|^2\,dx = |\lambda|^2 \int_{-\infty}^{\infty} |f(x)|^2\,dx.$$
The vector space axioms are then verified in the usual way and all are now straightforward.

In fact,
$$(f|g) = \int_{-\infty}^{\infty} \overline{f(x)}\,g(x)\,dx$$
defines an inner product on $V$. Once again, the only difficult part here is to show that if $f, g \in V$ then the integral $\int_{-\infty}^{\infty} \overline{f(x)}\,g(x)\,dx$ exists in $\mathbb{C}$. This can be done by considering the real and imaginary parts of this integral separately, and using the Cauchy-Schwarz inequality in $\mathcal{C}_{\mathbb{C}}[-a,b]$, as before. The axioms for an inner product can be verified as for $\mathcal{C}_{\mathbb{C}}[a,b]$.

Exercises

Exercise 3.1 For each of the following pairs of vectors in $\mathbb{R}^3$, calculate the angle between them.
(a) $(1/\sqrt{2}, 1/\sqrt{2}, 0)^T$, $(0, 1, 0)^T$
(b) $(1, 1, 0)^T$, $(1, -1, 0)^T$
(c) $(1, 0, 1)^T$, $(-1, 1, 0)^T$.

Exercise 3.2 In each of the following cases compute the projection of $s$ in the direction of $r$. Then write down a nonzero vector $t$ in the space spanned by $s$ and $r$ orthogonal to $r$. Verify your answer by computing $(t|r)$.
(a) $s = (5, 2, 1)^T$, $r = (-1, 2, 3)^T$
(b) $s = (1, 1, 1)^T$, $r = (-1, 0, 1)^T$.

Exercise 3.3 From equation (2), the value $r_1 s_2 - r_2 s_1 = \|r\|\,\|s\| \sin\theta$ is the two-dimensional determinant
$$\begin{vmatrix} r_1 & s_1 \\ r_2 & s_2 \end{vmatrix}.$$
Give a geometrical interpretation of this determinant in terms of the area of the parallelogram drawn out by $r$ and $s$. Explain the significance of its sign in terms of the direction travelled as you go from $r$ to $s$.

Exercise 3.4 The triangle inequality is often used in one of the following alternative forms:
$$\|v - w\| \le \|v\| + \|w\|$$
$$\|v + w\| \ge \|v\| - \|w\|$$
$$\|v - w\| \ge \|v\| - \|w\|$$
Show that these forms follow from Proposition 3.11 by simple substitutions. (Remember that $\|-w\| = \|w\|$.)

Exercise 3.5 By expanding $(v + w|v + w)$ and $(v - w|v - w)$, prove that, for any vectors $v, w$ in a real inner product space,
(a) $(v|w) = \frac{1}{2}\left(\|v + w\|^2 - \|v\|^2 - \|w\|^2\right)$,
(b) $\|v + w\|^2 + \|v - w\|^2 = 2\left(\|v\|^2 + \|w\|^2\right)$.

Exercise 3.6 Prove that in any complex inner product space
(a) $(v|w) = \frac{1}{2}\left(\|v + w\|^2 + i\|v - iw\|^2 - (1 + i)\left(\|v\|^2 + \|w\|^2\right)\right)$,
(b) $(v|w) = \frac{1}{4}\left(\|v + w\|^2 - \|v - w\|^2 + i\|v - iw\|^2 - i\|v + iw\|^2\right)$.

Exercise 3.7 In each of (a), (b), (c) below, determine whether $(\ |\ )$ is an inner product on $V$. Justify your answers. In every case which is an inner product, write down the appropriate form of the Cauchy-Schwarz inequality.
(a) $V = \mathbb{R}^2$, $(v|w) = (v_1 - w_1)^2 + (v_2 - w_2)^2$, where $v = (v_1, v_2)^T$ and $w = (w_1, w_2)^T$.
(b) $V = \mathbb{R}^3$, $(x|y) = 2x_1y_1 + 3x_2y_2 + 2x_3y_3 + x_1y_3 + x_3y_1 - 2x_2y_3 - 2x_3y_2$, where $x = (x_1, x_2, x_3)^T$ and $y = (y_1, y_2, y_3)^T$.
(c) $V = \mathbb{R}^3$, $(x|y) = x_1y_1 + 2x_2y_2 - 2x_1y_3 - 2x_3y_1 - x_2y_3 - x_3y_2$, where $x = (x_1, x_2, x_3)^T$ and $y = (y_1, y_2, y_3)^T$.

Exercise 3.8 Do the same as the last question for the following.
(a) $V$ is the set of all $n \times n$ matrices with entries from $\mathbb{R}$, and $(A|B) = \mathrm{tr}(AB)$. (Recall: if $C$ is the matrix $(c_{ij})$, then $\mathrm{tr}(C)$ denotes the trace of $C$, defined by $\mathrm{tr}(C) = \sum_{i=1}^{n} c_{ii}$.)
(b) $V$ is the set of all $n \times n$ matrices with entries from $\mathbb{R}$, and $(A|B) = \mathrm{tr}(A^TB)$.
(For one of them, try to write down an expression for $(A|A)$, where $A = (a_{ij})$, in terms of the $a_{ij}$. It should remind you of $\mathbb{R}^{n^2}$. For the other, try to show that $(A|B)$ is not positive definite.)

Exercise 3.9 Complete the proof that
$$(f|g) = \int_{-\infty}^{\infty} \overline{f(x)}\,g(x)\,dx$$
is an inner product on the complex vector space $\mathcal{C}_{\mathbb{C}}[-a,a]$ of Example 3.24.

Exercise 3.10 Let $V$ be the real vector space $\mathbb{R}^3$ with the standard inner product. Describe the set of vectors in $V$ which are orthogonal to $(-1, 0, 2)^T$. In particular, show that this is a subspace of $V$, find a basis for it, and hence deduce its dimension.

4 Bilinear and sesquilinear forms

In the last chapter we introduced the idea of an inner product on a vector space. Our main example was the standard inner product on $\mathbb{R}^3$ (and, more generally, on $\mathbb{R}^n$). This inner product has a particularly elegant geometric meaning, but we saw other important examples (involving vector spaces of continuous functions and integration, for example) where the geometric interpretation isn't so clear. It turns out that there are many other interesting forms defined like inner products which are not positive definite, some of which, like Minkowski space (Example 4.1), have significant physical interpretations. This chapter starts off the study of these forms, and in particular shows how they may be represented by matrices.

4.1 Bilinear forms

Example 4.1 (Minkowski space) This example is very important in the theory of special relativity. Take $V = \mathbb{R}^4$, with three 'space' coordinates $x, y, z$ and one 'time' coordinate $t$, and define a function $F : V \times V \to \mathbb{R}$ by
$$F\big((x_1, y_1, z_1, t_1)^T, (x_2, y_2, z_2, t_2)^T\big) = x_1x_2 + y_1y_2 + z_1z_2 - c^2t_1t_2,$$
where $c$ is the speed of light. Then $F$ is symmetric,
$$F\big((x_1, y_1, z_1, t_1)^T, (x_2, y_2, z_2, t_2)^T\big) = x_1x_2 + y_1y_2 + z_1z_2 - c^2t_1t_2 = F\big((x_2, y_2, z_2, t_2)^T, (x_1, y_1, z_1, t_1)^T\big),$$
and linear in both arguments, e.g.
$$\begin{aligned}
F\big(\lambda(x_1, y_1, z_1, t_1)^T &+ \mu(x_2, y_2, z_2, t_2)^T, (x_3, y_3, z_3, t_3)^T\big) \\
&= (\lambda x_1 + \mu x_2)x_3 + (\lambda y_1 + \mu y_2)y_3 + (\lambda z_1 + \mu z_2)z_3 - c^2(\lambda t_1 + \mu t_2)t_3 \\
&= \lambda(x_1x_3 + y_1y_3 + z_1z_3 - c^2t_1t_3) + \mu(x_2x_3 + y_2y_3 + z_2z_3 - c^2t_2t_3) \\
&= \lambda F\big((x_1, y_1, z_1, t_1)^T, (x_3, y_3, z_3, t_3)^T\big) + \mu F\big((x_2, y_2, z_2, t_2)^T, (x_3, y_3, z_3, t_3)^T\big),
\end{aligned}$$
but if we try to define a 'norm' by $\|v\|^2 = F(v, v)$ and a 'distance' by $d(v, w) = \|v - w\|$, then we run into problems since $F$ is not positive definite. In fact $F(v - w, v - w)$ can be positive, giving a so-called space-like distance
$$d(v, w) = \sqrt{F(v - w, v - w)},$$
negative, giving a time-like distance
$$d(v, w) = \pm\sqrt{-F(v - w, v - w)},$$
or even zero, as is for example $d\big((0, 0, 0, 0)^T, (c, 0, 0, 1)^T\big)$.
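The three cases in Example 4.1 can be illustrated with a short sketch. The function names `minkowski` and `classify` are hypothetical, and the default value `c = 1.0` is just a convenient choice of units, not something fixed by the text; the label `"zero"` is used for the third case because the text does not name it.

```python
def minkowski(v, w, c=1.0):
    """F(v, w) = x1*x2 + y1*y2 + z1*z2 - c^2*t1*t2 for v = (x, y, z, t)."""
    x1, y1, z1, t1 = v
    x2, y2, z2, t2 = w
    return x1 * x2 + y1 * y2 + z1 * z2 - c * c * t1 * t2

def classify(v, w, c=1.0):
    """Report the sign of F(v - w, v - w) using the terminology of Example 4.1."""
    d = [a - b for a, b in zip(v, w)]
    q = minkowski(d, d, c)
    if q > 0:
        return "space-like"
    if q < 0:
        return "time-like"
    return "zero"

origin = (0.0, 0.0, 0.0, 0.0)
c = 2.0  # illustrative value only

print(classify(origin, (1.0, 0.0, 0.0, 0.0), c))  # separated in space only
print(classify(origin, (0.0, 0.0, 0.0, 1.0), c))  # separated in time only
print(classify(origin, (c, 0.0, 0.0, 1.0), c))    # the pair from the example
```

The last call uses the pair $(0,0,0,0)^T$, $(c,0,0,1)^T$ from the example, for which $F(v - w, v - w) = c^2 - c^2 = 0$.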

In this chapter we will generalize the idea of inner product by dropping the condition of being positive definite, and sometimes also the symmetry condition.

Definition 4.2 A bilinear form on a real vector space $V$ is a map $F$ from $V \times V$ to $\mathbb{R}$ which for all $u, v, w \in V$ and $\alpha, \beta \in \mathbb{R}$ satisfies
(a) $F(\alpha u + \beta v, w) = \alpha F(u, w) + \beta F(v, w)$, and
(b) $F(u, \alpha v + \beta w) = \alpha F(u, v) + \beta F(u, w)$.

Definition 4.3 A bilinear form $F$ on $V$ is symmetric if also
(c) $F(u, v) = F(v, u)$ for all $u, v \in V$.

Example 4.4 (The Lorentz plane) Take $V = \mathbb{R}^2$ and define the map $F$ from $V \times V$ to $\mathbb{R}$ by
$$F\big((a, b)^T, (c, d)^T\big) = ac - bd.$$
Then $F$ is a symmetric bilinear form on $V$ (as you should check), but is not an inner product since $F\big((0, 1)^T, (0, 1)^T\big) = -1$, and therefore property (c i) in Definition 3.1 of an inner product fails. We also have $F\big((1, 1)^T, (1, 1)^T\big) = 0$, so there is a nonzero vector whose 'norm' is zero, and property (c ii) fails also.

The next example is really a whole family of examples. It is particularly important since it will turn out that all bilinear forms on finite dimensional real vector spaces can be regarded as essentially just one of these examples, in the same way that every finite dimensional real vector space is isomorphic to $\mathbb{R}^n$ for some $n$.

Example 4.5 Let $V$ be the real vector space $\mathbb{R}^n$, and suppose $A$ is an $n \times n$ matrix with entries from $\mathbb{R}$. Define an operation $F$ on $V \times V$ by
$$F(x, y) = x^TAy.$$
Since $x^T$ is a $1 \times n$ matrix, $A$ is $n \times n$, and $y$ is $n \times 1$, this is well-defined matrix multiplication and also $F(x, y)$ is a $1 \times 1$ matrix, which we can consider as the same as the real number which is its only entry. Thus $F : V \times V \to \mathbb{R}$ and it turns out that $F$ is a bilinear form on $V$. Furthermore, if $A$ is symmetric, i.e. if $A^T = A$, then $F$ is symmetric.

Proof To prove all these claims, we must check the two axioms for a bilinear form. We have
$$F(\alpha x + \beta y, z) = (\alpha x + \beta y)^TAz = (\alpha x^T + \beta y^T)Az = \alpha x^TAz + \beta y^TAz = \alpha F(x, z) + \beta F(y, z),$$
and similarly,
$$F(x, \alpha y + \beta z) = x^TA(\alpha y + \beta z) = x^TA(\alpha y) + x^TA(\beta z) = \alpha x^TAy + \beta x^TAz = \alpha F(x, y) + \beta F(x, z).$$
Now suppose $A$ is symmetric. Then since the transpose of a $1 \times 1$ matrix is itself, $F(x, y) = F(x, y)^T$. Using the fact that the transpose of a product of matrices is equal to the product of the transposes taken in the opposite order (Proposition 1.3), we have
$$F(x, y) = F(x, y)^T = (x^TAy)^T = y^TA^T(x^T)^T = y^TAx = F(y, x),$$
since $A^T = A$ and $(x^T)^T = x$. □

Example 4.6 If $V = \mathbb{R}^n$ and $A$ is the $n \times n$ identity matrix then
$$F\big((x_1, \ldots, x_n)^T, (y_1, \ldots, y_n)^T\big) = (x_1, \ldots, x_n)A(y_1, \ldots, y_n)^T = (x_1, \ldots, x_n)(y_1, \ldots, y_n)^T = x_1y_1 + \cdots + x_ny_n,$$
so $F$ is the standard inner product on $V$.
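Examples 4.5 and 4.6 translate directly into a few lines of code. The sketch below is illustrative only: the helper name `form` and the sample matrices and vectors are assumptions made for the example, with matrices stored as lists of rows.

```python
def form(A, x, y):
    """Evaluate F(x, y) = x^T A y for a square matrix A given as a list of rows."""
    n = len(x)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))

identity = [[1, 0], [0, 1]]   # gives the standard inner product (Example 4.6)
lorentz = [[1, 0], [0, -1]]   # gives F((a,b),(c,d)) = ac - bd (Example 4.4)
general = [[1, 2], [3, 4]]    # not symmetric, so the form is not symmetric either

x, y, z = [1, 2], [3, -1], [0, 5]
alpha, beta = 2, -3

# Linearity in the first argument, axiom (a) of Definition 4.2.
combo = [alpha * a + beta * b for a, b in zip(x, y)]
left = form(general, combo, z)
right = alpha * form(general, x, z) + beta * form(general, y, z)

print(form(identity, [3, 4], [1, 2]))   # 3*1 + 4*2
print(form(lorentz, [0, 1], [0, 1]))    # negative, so not positive definite
print(form(lorentz, [1, 1], [1, 1]))    # a nonzero vector with 'norm' zero
print(left == right)
```

Note that a non-symmetric matrix such as `general` still gives a bilinear form; symmetry of the matrix is needed only for symmetry of the form.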

4.2 Representation by matrices

Suppose we have a real vector space $V$ with an ordered basis $e_1, \ldots, e_n$, and a bilinear form $F$ defined on $V$. Given two vectors $v$ and $w$ we can write them as linear combinations of the basis vectors, say $v = \sum_{i=1}^{n} v_ie_i$ and $w = \sum_{i=1}^{n} w_ie_i$. Then, using bilinearity,
$$F(v, w) = F\Big(\sum_{i=1}^{n} v_ie_i, \sum_{j=1}^{n} w_je_j\Big) = \sum_{i=1}^{n} v_i F\Big(e_i, \sum_{j=1}^{n} w_je_j\Big) = \sum_{i=1}^{n}\sum_{j=1}^{n} v_iw_jF(e_i, e_j).$$

Thus if we know the values the form $F$ takes on the basis elements, i.e. $F(e_i, e_j)$ for all $i, j$, then we can work out the form on any pair of vectors. For convenience, we may put the values $F(e_i, e_j)$ on the basis vectors into a matrix.

Definition 4.7 The matrix $A = (a_{ij})$ defined by $a_{ij} = F(e_i, e_j)$ is called the matrix of the bilinear form $F$ with respect to the ordered basis $e_1, \ldots, e_n$ of $V$.

Notice that if $F$ is a symmetric bilinear form, then $F(e_i, e_j) = F(e_j, e_i)$ for all $i, j$, so the matrix $A$ is symmetric in the sense that $a_{ij} = a_{ji}$ for all $i, j$, or in other words $A = A^T$.

Now, in $V$, any vector $v$ is a (unique) linear combination $v = v_1e_1 + v_2e_2 + \cdots + v_ne_n$ of basis vectors. Thus with respect to the basis $e_1, e_2, \ldots, e_n$, the vector $v$ is determined by its coordinate form, the column vector $(v_1, v_2, \ldots, v_n)^T$. This representation can be used to give an elegant formula for the form $F$, just as in Example 4.5.

Proposition 4.8 Suppose $V$ is a real vector space with ordered basis $e_1, \ldots, e_n$, and $F$ is a bilinear form defined on $V$, with matrix $A$ with respect to this basis. Then for any vectors $v, w \in V$ and their corresponding coordinate forms $\mathbf{v} = (v_1, \ldots, v_n)^T$, $\mathbf{w} = (w_1, \ldots, w_n)^T$ with respect to the same basis $e_1, e_2, \ldots, e_n$ we have
$$F(v, w) = \mathbf{v}^TA\mathbf{w}.$$

Proof We have
$$\mathbf{v}^TA\mathbf{w} = \sum_{i=1}^{n}\sum_{j=1}^{n} v_iF(e_i, e_j)w_j = F(v, w)$$
by bilinearity. □

The conclusion is that every bilinear form on a finite dimensional real vector space is 'the same as' (more precisely, isomorphic to) one of the examples described in Example 4.5. Thus matrices can be applied to calculations in examples of inner products on vector spaces other than $\mathbb{R}^n$, as in the next example.

Example 4.9 Let $V$ be the three-dimensional vector space $\mathbb{R}_3[x]$ of all polynomials of degree less than 3, with the inner product
$$(f|g) = \int_0^1 f(x)g(x)\,dx$$
defined in Example 3.5. Then $1, x, x^2$ is an ordered basis of $V$, and the corresponding matrix of inner products is
$$A = \begin{pmatrix} 1 & 1/2 & 1/3 \\ 1/2 & 1/3 & 1/4 \\ 1/3 & 1/4 & 1/5 \end{pmatrix}.$$
This matrix can be used to find $(x - 1|2x^2 + x)$, for example. We write $x - 1 = (-1) \cdot 1 + 1 \cdot x$ and $2x^2 + x = 1 \cdot x + 2 \cdot x^2$ in coordinate form with respect to this basis as $(-1, 1, 0)^T$ and $(0, 1, 2)^T$, and work out
$$(-1, 1, 0)\,A\,(0, 1, 2)^T = -1/3.$$
If you like, you can check this by direct integration, as follows.
$$\int_0^1 (x - 1)(2x^2 + x)\,dx = \int_0^1 (2x^3 - x^2 - x)\,dx = \frac{1}{2} - \frac{1}{3} - \frac{1}{2} = -\frac{1}{3}.$$
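The calculation in Example 4.9 can be reproduced with exact rational arithmetic. This is a sketch under our own conventions: polynomials are stored as coefficient lists `[p0, p1, p2]` meaning $p_0 + p_1x + p_2x^2$, and the helper names `poly_inner` and `form` are hypothetical.

```python
from fractions import Fraction

def poly_inner(p, q):
    """(p|q) = integral over [0, 1] of p(x) q(x) dx, using the fact that
    the integral of x^(i+j) over [0, 1] is 1/(i+j+1)."""
    return sum(Fraction(a * b, i + j + 1)
               for i, a in enumerate(p) for j, b in enumerate(q))

def form(A, x, y):
    """Evaluate x^T A y for a square matrix A given as a list of rows."""
    n = len(x)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))

basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # the polynomials 1, x, x^2
A = [[poly_inner(p, q) for q in basis] for p in basis]

v = [-1, 1, 0]   # coordinate form of x - 1
w = [0, 1, 2]    # coordinate form of 2x^2 + x

for row in A:
    print([str(entry) for entry in row])
print(form(A, v, w), poly_inner(v, w))
```

Both routes, the matrix product and the direct integration, give the same value, as Proposition 4.8 predicts.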

Being able to represent bilinear forms as matrices is not the end of the story, however, as the next two examples show.

Example 4.10 If we take the standard inner product on $\mathbb{R}^2$, then with respect to the standard basis $(1, 0)^T, (0, 1)^T$ the matrix of this inner product is the identity matrix $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. If we choose a different basis, however, then we get a different matrix representing the inner product. For example, we calculate the matrix of the standard inner product with respect to the ordered basis $(1, 2)^T, (-1, 0)^T$. Here, $\big((1, 2)^T|(1, 2)^T\big) = 1^2 + 2^2 = 5$, $\big((1, 2)^T|(-1, 0)^T\big) = -1 + 0 = -1$, $\big((-1, 0)^T|(1, 2)^T\big) = -1$ by symmetry of $(\ |\ )$, and $\big((-1, 0)^T|(-1, 0)^T\big) = 1$. So the representing matrix is $\begin{pmatrix} 5 & -1 \\ -1 & 1 \end{pmatrix}$.

Example 4.11 Take $V = \mathbb{R}^2$ and $F(x, y) = x_1y_1 + x_1y_2 + x_2y_1$ where $x = (x_1, x_2)^T$ and $y = (y_1, y_2)^T$. With respect to the ordered basis $e_1, e_2$ given by

we get the matrix B

=

(� i) , while with respect to the basis e� , e� , where _


( _3 -2) . 2 1 Notice that the form F and hence both the matrices B and B' are symmetric.

we get B' =

4.3 The base-change formula

Since we can choose all sorts of different bases, it is important to know what happens to the matrix of an inner product if we change the basis. Let $V$ be a real vector space with basis $e_1, \ldots, e_n$ and bilinear form $F$. Suppose we choose a new ordered basis $f_1, \ldots, f_n$ and write our new basis vectors in terms of the old ones, as
$$f_i = \sum_{k=1}^{n} p_{ki}e_k,$$
say. Write $P$ for the matrix $(p_{ij})$, the base-change matrix.

Example 4.12 Let $V = \mathbb{R}^3$ and take the standard basis $e_1 = (1, 0, 0)^T$, $e_2 = (0, 1, 0)^T$, $e_3 = (0, 0, 1)^T$. If we now choose for a new basis $f_1 = (1, 0, 1)^T$, $f_2 = (2, -1, 0)^T$, $f_3 = (0, 1, -1)^T$ we have
$$\begin{aligned}
f_1 &= 1e_1 + 0e_2 + 1e_3 \\
f_2 &= 2e_1 + (-1)e_2 + 0e_3 \\
f_3 &= 0e_1 + 1e_2 + (-1)e_3
\end{aligned}$$
so the base-change matrix from basis $e_1, e_2, e_3$ to $f_1, f_2, f_3$ is
$$P = \begin{pmatrix} 1 & 2 & 0 \\ 0 & -1 & 1 \\ 1 & 0 & -1 \end{pmatrix}.$$

In this example, the base-change matrix is the matrix formed from columns equal to the coordinate forms of the vectors $f_1, f_2, f_3$ with respect to the old basis $e_1, e_2, e_3$. In general,

The base-change matrix is the matrix $P$ formed from columns giving the coordinate forms of the new ordered basis $f_1, f_2, \ldots, f_n$ with respect to the old ordered basis $e_1, e_2, \ldots, e_n$.

To understand the importance of this matrix in calculations, consider a vector $v \in V$ written in coordinate form as $(v_1, v_2, \ldots, v_n)^T$ with respect to the new basis $\{f_1, f_2, \ldots, f_n\}$. We have
$$v = \sum_{i=1}^{n} v_if_i,$$
so the coordinate form of $v$ with respect to the old basis is the same linear combination of the coordinate forms of the $f_i$. Now $P(0, \ldots, 0, 1, 0, \ldots, 0)^T$ (where the 1 is in the $i$th position) is the column vector formed from the $i$th column of $P$, i.e. the coordinate form of $f_i$ with respect to the old basis $e_1, e_2, \ldots, e_n$. Therefore $P(v_1, v_2, \ldots, v_n)^T$ is the coordinate form of $v$ with respect to $e_1, e_2, \ldots, e_n$. In other words,

Multiplication of a column vector by the base-change matrix $P$ converts coordinate forms from the new ordered basis $f_1, f_2, \ldots, f_n$ to the old ordered basis $e_1, e_2, \ldots, e_n$.

Also, it turns out that the inverse $P^{-1}$ of the base-change matrix always exists, and

Multiplication of a column vector by the base-change matrix $P^{-1}$ converts coordinate forms from the old ordered basis $e_1, e_2, \ldots, e_n$ to the new ordered basis $f_1, f_2, \ldots, f_n$.
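The two conversion rules can be tried out on the matrix $P$ of Example 4.12. The code below is a sketch only: the helpers `mat_vec` and `inverse` are our own (a plain Gauss-Jordan elimination over exact fractions, which assumes the matrix is invertible), and the sample coordinate vector is an arbitrary choice.

```python
from fractions import Fraction

P = [[1, 2, 0],
     [0, -1, 1],
     [1, 0, -1]]   # columns are f1, f2, f3 in old coordinates (Example 4.12)

def mat_vec(M, v):
    """Multiply the matrix M (list of rows) by the column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def inverse(M):
    """Invert a square matrix by Gauss-Jordan elimination over the rationals.
    Assumes M is invertible; a singular matrix makes next() raise StopIteration."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

new_coords = [1, 1, 1]                        # v = f1 + f2 + f3
old_coords = mat_vec(P, new_coords)           # coordinates with respect to e1, e2, e3
back = mat_vec(inverse(P), old_coords)        # and back to the new basis

print(old_coords)
print([str(c) for c in back])
```

Here $f_1 + f_2 + f_3 = (3, 0, 0)^T$, so multiplying by $P$ should give $(3, 0, 0)^T$ and multiplying that by $P^{-1}$ should return $(1, 1, 1)^T$.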

Example 4 . 1 3 I n the complex vector space C2 , take ordered bases a1, a2 and b1, b2 where Then the base-change matrices from the usual basis

and from the usual basis (1,

O)T, (0, 1)T to b1, b2 is = ( -ii 1 ) Q

1

.

( 1 , O)r,

(0, 1)T to a1, a2 is

a1, a2 b1, b2 a1 , a2 , b 1 , b2 , ) - 1 (� -1 ) .

This can be used to find the base-change matrix from $a_1, a_2$ to $b_1, b_2$ as follows. If $(v_1, v_2)^T$ is the coordinate form of a vector $v$ with respect to $a_1, a_2$, then $P(v_1, v_2)^T$ is the coordinate form of $v$ with respect to the usual basis. Then $Q^{-1}P(v_1, v_2)^T$ is the coordinate form of $v$ with respect to $b_1, b_2$, so the base-change matrix from $a_1, a_2$ to $b_1, b_2$ is $Q^{-1}P$.

a 1 , a2 b1, b2 = ( i _ 11 ) - l (� = 1 (i1 -- 1i -i 1) = 21 ( -1 1 -i i -z

2i

z

2i -

+

-. 1 z

2+i

z

The base-change matrix also enables us to convert between the matrices for bilinear forms. Given a bilinear form $F$,
$$F(f_i, f_j) = F\Big(\sum_{k=1}^{n} p_{ki}e_k, \sum_{l=1}^{n} p_{lj}e_l\Big) = \sum_{k=1}^{n}\sum_{l=1}^{n} p_{ki}p_{lj}F(e_k, e_l) = \sum_{k=1}^{n} p_{ki} \sum_{l=1}^{n} F(e_k, e_l)p_{lj}.$$
Now, $p_{ki}$ is the $(i, k)$th entry of $P^T$, so this expression is equal to the $(i, j)$th entry of $P^TAP$. We have proved the following.

Proposition 4.14 (The base-change formula) Given two ordered bases of a Euclidean space $V$, $e_1, \ldots, e_n$ and $f_1, \ldots, f_n$, related by the base-change matrix $P$ from basis $e_1, \ldots, e_n$ to $f_1, \ldots, f_n$, suppose $A$ and $B$ are the matrices of the inner product with respect to $e_1, \ldots, e_n$ and $f_1, \ldots, f_n$. Then
$$B = P^TAP.$$
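As a sanity check on the base-change formula, one can compare $P^TAP$ entry by entry with the values of the form on the new basis vectors. The sketch below reuses the matrix $P$ of Example 4.12; the symmetric matrix `A` and the helper names are illustrative assumptions, not taken from the text.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def mat_mul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def form(A, x, y):
    """Evaluate x^T A y for a square matrix A given as a list of rows."""
    n = len(x)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))

P = [[1, 2, 0],
     [0, -1, 1],
     [1, 0, -1]]              # base-change matrix of Example 4.12

A = [[2, 1, 0],
     [1, 3, 0],
     [0, 0, 1]]               # an arbitrary symmetric matrix (assumption)

new_basis = transpose(P)      # rows are f1, f2, f3 in old coordinates

# Matrix of the form with respect to the new basis, via the formula.
B = mat_mul(mat_mul(transpose(P), A), P)

# The same matrix computed directly from the definition b_ij = F(f_i, f_j).
B_direct = [[form(A, fi, fj) for fj in new_basis] for fi in new_basis]

# With A the identity, P^T P is the matrix of the standard inner product.
gram = mat_mul(transpose(P), P)

print(B == B_direct)
print(gram)
```

The final matrix `gram` simply collects the inner products $(f_i|f_j)$ of the new basis vectors, which is what the formula reduces to when $A$ is the identity.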

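Example 4.15 Take the standard inner product on $\mathbb{R}^2$, with respect to the two bases $e_1, e_2$ and $f_1, f_2$ where
$$e_1 = (2, 1)^T,\quad e_2 = (1, -1)^T,\quad f_1 = (1, -2)^T,\quad f_2 = (0, 1)^T.$$
Then with the notation of Proposition 4.14 we have
$$A = \begin{pmatrix} 5 & 1 \\ 1 & 2 \end{pmatrix},\qquad B = \begin{pmatrix} 5 & -2 \\ -2 & 1 \end{pmatrix}.$$
The base-change matrix $P$ can be calculated as
$$P = \begin{pmatrix} 2 & 1 \\ 1 & -1 \end{pmatrix}^{-1}\begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix} = \frac{1}{3}\begin{pmatrix} 1 & 1 \\ 1 & -2 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix} = \frac{1}{3}\begin{pmatrix} -1 & 1 \\ 5 & -2 \end{pmatrix}.$$
(Alternatively, we can calculate $P$ from first principles, by writing $P = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ so $f_1 = ae_1 + ce_2$, giving $2a + c = 1$ and $a - c = -2$, which can be solved to give $a = -\frac{1}{3}$ and hence $c = \frac{5}{3}$. Similarly we find $b = \frac{1}{3}$ and $d = -\frac{2}{3}$.)

We can check that
$$P^TAP = \frac{1}{3}\begin{pmatrix} -1 & 5 \\ 1 & -2 \end{pmatrix}\begin{pmatrix} 5 & 1 \\ 1 & 2 \end{pmatrix}\cdot\frac{1}{3}\begin{pmatrix} -1 & 1 \\ 5 & -2 \end{pmatrix} = \frac{1}{9}\begin{pmatrix} 0 & 9 \\ 3 & -3 \end{pmatrix}\begin{pmatrix} -1 & 1 \\ 5 & -2 \end{pmatrix} = \frac{1}{9}\begin{pmatrix} 45 & -18 \\ -18 & 9 \end{pmatrix} = \begin{pmatrix} 5 & -2 \\ -2 & 1 \end{pmatrix} = B.$$

4.4 Sesquilinear forms over $\mathbb{C}$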

So far in this chapter, everything has been for vector spaces over the field of real numbers $\mathbb{R}$. If we replaced $\mathbb{R}$ throughout by any other field $F$, e.g. the field of complex numbers, everything remains valid, and we would have the beginnings of the theory of bilinear forms over the field $F$. In particular, a bilinear form on a finite dimensional vector space over $F$ is represented by a matrix $A$ with entries from $F$ and is isomorphic to one of the canonical examples $F(v, w) = v^TAw$ on $F^n$. But we are also interested in inner products over $\mathbb{C}$, and these forms are not bilinear, but are instead sesquilinear, so the previous sections do not apply. This last section concerns matrix representations of such forms over $\mathbb{C}$, and generalizes the notion of bilinear form in a different way, using the complex conjugate operation in $\mathbb{C}$.

Definition 4.16 A sesquilinear form on a complex vector space $V$ is a map $F$ from $V \times V$ to $\mathbb{C}$ which for all $u, v, w \in V$ and $\alpha, \beta \in \mathbb{C}$ satisfies
(a) $F(\alpha u + \beta v, w) = \bar\alpha F(u, w) + \bar\beta F(v, w)$, and
(b) $F(u, \alpha v + \beta w) = \alpha F(u, v) + \beta F(u, w)$.

Definition 4.17 A sesquilinear form $F : V \times V \to \mathbb{C}$ is conjugate-symmetric if also
(c) $F(u, v) = \overline{F(v, u)}$ for all $u, v \in V$.

As examples, note that any complex inner product (as in the last chapter) is a conjugate-symmetric sesquilinear form. Everything we have done here for bilinear forms on vector spaces over $\mathbb{R}$ applies equally well to sesquilinear forms on vector spaces over $\mathbb{C}$, provided complex-conjugate signs are added in the appropriate place. First, we have a family of 'canonical examples' exactly analogous to Example 4.5.

Example 4.18 Let $V$ be the complex vector space $\mathbb{C}^n$, and suppose $A$ is an $n \times n$ matrix with entries from $\mathbb{C}$. Define $F$ on $V \times V$ by
$$F(x, y) = \bar{x}^TAy,$$
where $\bar{x}$ is the complex conjugate of the column vector $x$, formed by taking the complex conjugate of each entry. As before, $F(x, y)$ is a $1 \times 1$ matrix, which we consider as the same as the complex number which is its only entry. Thus $F : V \times V \to \mathbb{C}$ and $F$ is a sesquilinear form on $V$. Also, if $A$ is conjugate-symmetric, i.e. if $\bar{A}^T = A$, then $F$ is conjugate-symmetric.

Proof This is just the same as before with a few complex conjugates added.
$$F(\alpha x + \beta y, z) = (\overline{\alpha x + \beta y})^TAz = (\bar\alpha\bar{x}^T + \bar\beta\bar{y}^T)Az = \bar\alpha\bar{x}^TAz + \bar\beta\bar{y}^TAz = \bar\alpha F(x, z) + \bar\beta F(y, z),$$
and
$$F(x, \alpha y + \beta z) = \bar{x}^TA(\alpha y + \beta z) = \bar{x}^TA(\alpha y) + \bar{x}^TA(\beta z) = \alpha\bar{x}^TAy + \beta\bar{x}^TAz = \alpha F(x, y) + \beta F(x, z).$$
Now suppose $A$ is conjugate-symmetric. Then $F(x, y) = F(x, y)^T$ and we have
$$F(x, y) = (\bar{x}^TAy)^T = y^TA^T\bar{x} = \overline{\bar{y}^T\bar{A}^Tx} = \overline{\bar{y}^TAx} = \overline{F(y, x)},$$
similar to the real case. □

Definition 4.19 Given an ordered basis $e_1, \ldots, e_n$ of a complex inner product space $V$, and a sesquilinear form $F$ on $V$, the matrix of the form $F$ with respect to this ordered basis is the matrix $B$ whose $(i, j)$th entry is $b_{ij} = F(e_i, e_j)$.

If $F$ is conjugate-symmetric, $b_{ji} = F(e_j, e_i) = \overline{F(e_i, e_j)} = \overline{b_{ij}}$, so $B$ is conjugate-symmetric. Notice that a diagonal entry $b_{ii}$ of a conjugate-symmetric matrix $B$ is always real, since $b_{ii} = \overline{b_{ii}}$.

Proposition 4.20 With the same notation, if
$$v = \sum_{i=1}^{n} v_ie_i \quad\text{and}\quad w = \sum_{j=1}^{n} w_je_j,$$
then
$$F(v, w) = \sum_{i=1}^{n}\sum_{j=1}^{n} \bar{v}_iF(e_i, e_j)w_j,$$
which we can interpret in matrix terms as $\bar{\mathbf{v}}^TB\mathbf{w}$.

Proposition 4.21 (The base-change formula) With the same notation, if we change to the new basis $f_1, \ldots, f_n$, given in terms of the old basis $e_1, \ldots, e_n$ by $f_i = \sum_{k=1}^{n} p_{ki}e_k$, then
$$F(f_i, f_j) = \sum_{k=1}^{n}\sum_{l=1}^{n} \bar{p}_{ki}F(e_k, e_l)p_{lj},$$
which is the $(i, j)$th entry of the matrix $\bar{P}^TBP$.
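The sesquilinear versions of these statements can be checked with Python's built-in complex numbers. In this sketch the matrix `A`, the vectors, the scalar `lam` and the base-change matrix `P` are all arbitrary illustrative choices, and `sesq`, `conj_transpose` and `mat_mul` are hypothetical helper names.

```python
def sesq(A, x, y):
    """Evaluate F(x, y) = conj(x)^T A y, conjugate-linear in the first argument."""
    n = len(x)
    return sum(complex(x[i]).conjugate() * A[i][j] * y[j]
               for i in range(n) for j in range(n))

def conj_transpose(M):
    return [[complex(M[j][i]).conjugate() for j in range(len(M))]
            for i in range(len(M[0]))]

def mat_mul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

A = [[2, 1 - 1j],
     [1 + 1j, 3]]            # conjugate-symmetric: conj(A)^T equals A
x = [1 + 1j, 2]
y = [1j, 1 - 2j]
lam = 2 - 1j

# Conjugate-symmetry of the form, as in Example 4.18.
print(sesq(A, x, y), sesq(A, y, x).conjugate())

# Scalars come out of the first argument conjugated, and of the second unchanged.
print(sesq(A, [lam * a for a in x], y), lam.conjugate() * sesq(A, x, y))
print(sesq(A, x, [lam * b for b in y]), lam * sesq(A, x, y))

# Base change (Proposition 4.21): the columns of P are the new basis vectors.
P = [[1, 1j],
     [0, 1]]
new_basis = [list(col) for col in zip(*P)]
B_new = mat_mul(mat_mul(conj_transpose(P), A), P)
B_direct = [[sesq(A, fi, fj) for fj in new_basis] for fi in new_basis]
print(B_new)
print(B_direct)
```

The only change from the real case is the conjugate on the first argument, which is why the transpose $P^T$ is replaced by $\bar{P}^T$.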

Exercises

Exercise 4.1 Calculate the base-change matrix $P$ for change of basis from $e_1, e_2$ to $e_1', e_2'$ in Example 4.11, and verify the formula $B' = P^TBP$ in this case.

Exercise 4.2 Which of the functions $F\big((x_1, x_2, x_3)^T, (y_1, y_2, y_3)^T\big)$ here are inner products on $\mathbb{R}^3$?
(a) $x_1 + y_1 + x_2y_2 + x_3y_3$. [Hint: what could the matrix representation be?]
(b) $3x_1y_1 + 5x_2y_2 + 4x_3y_3 + 2x_1y_2 + 2x_2y_1 + 3x_2y_3 + 3y_2x_3 - x_1y_3 - x_3y_1$. [Consider $2(x_1 + x_2)^2 + 3(x_2 + x_3)^2 + (x_1 - x_3)^2$.]
(c) $3x_1y_1 + 5x_2y_2 + 4x_3y_3 + 2x_1y_2 + 2x_2y_1 - 3x_2y_3 - 3y_2x_3 - x_1y_3 - x_3y_1$. [Consider $2(x_1 + x_2)^2 + 3(x_2 - x_3)^2 + (x_1 - x_3)^2$.]

Exercise 4.3 An inner product,

is defined on $\mathbb{R}^2$.
(a) Calculate the matrix $A$ representing $(\ |\ )$ with respect to the standard basis $e_1 = (1, 0)^T$, $e_2 = (0, 1)^T$.
(b) Calculate the matrix representing $(\ |\ )$ with respect to the basis $f_1 = (1/2, 0)^T$, $f_2 = (-1/2, 1)^T$.
(c) Calculate matrices representing the bilinear form $F$ with respect to $e_1, e_2$ and $f_1, f_2$. Is $F$ an inner product?

Exercise 4.4 Let $A$ be the matrix below. Perform row operations on $A$ of the form $\rho_i := \rho_i + \lambda\rho_j$, $1 \le j < i \le 3$ and $\lambda \in \mathbb{R}$, to write $A$ in the form


( ;) G ;) " b d

0 0

Compute a matrix P with

0

PA �

b d 0

Why is P invertible? Now calculate PAPT . What does this matrix represent? Exercise 4.5 A sesquilinear form on C3 is defined by F ((x, y, z ) ' , (x' , y ' , z') T ) ) � (X, ji, Z)

(-I;

2i

- 1 - 2i 0 -z

Compute the matrix of $F$ with respect to:
(a) the basis $(i, 1, 0)^T$, $(1, 1 + i, 1)^T$, $(0, 0, i)^T$;
(b) the basis $(-1 - 3i, 2 + i, -1 + 2i)^T$, $(2 + i, 1 - 2i, 2 + i)^T$, $(0, 0, -i)^T$.
Is $F$ an inner product?

Exercise 4.6 Let $F$ be a sesquilinear form on the complex vector space $\mathbb{C}^3$ and let $e_1, e_2, e_3$ be the usual ordered basis of $\mathbb{C}^3$. Suppose that $F(f_1, f_1) = 1$, $F(f_2, f_2) = 2$, $F(f_3, f_3) = 2$, and $F(f_i, f_j) = 0$ for $i \neq j$, where
$$f_1 = (1, i, 0)^T,\quad f_2 = (0, i, 1)^T,\quad f_3 = (0, 0, 1 + i)^T.$$
Calculate the matrix of $F$ with respect to the basis $e_1, e_2, e_3$. Is $F$ an inner product?

Exercise 4.7 Write down the matrices $A$, $B$, $C$ (respectively) of the standard inner product on $\mathbb{R}^3$ with respect to:
(a) the standard basis;
(b) the ordered basis $(2, 1, 1)^T$, $(1, 2, 0)^T$, $(0, 1, -1)^T$;
(c) the ordered basis $(-1, 1, 1)^T$, $(3, 0, 0)^T$, $(1, 2, -1)^T$.
Also write down base-change matrices $P$ and $Q$ (respectively) for:
(d) base change from (a) to (b);
(e) base change from (b) to (c).
Verify that $B = P^TAP$ and $C = Q^TBQ$ both hold.

Exercise 4.8 An alternating form $F$ is a bilinear form on a vector space $V$ satisfying $F(v, v) = 0$ for all $v \in V$.
(a) Show by expanding $F(v + w, v + w)$ that if $F$ is an alternating form on a vector space $V$ then $F$ is skew-symmetric, i.e. $F(v, w) = -F(w, v)$ for all $v$ and $w$ in $V$.
(b) Clearly the zero function is an alternating form; give another example. [Hint: find a suitable matrix $A$.]

5 Orthogonal bases

One important use of the standard inner product of $\mathbb{R}^3$ is that it allows us to distinguish between 'nice' bases such as the usual one where the basis vectors are orthogonal to each other and each has length 1, and others such as $(1, 0, 0)^T, (1, 1, 0)^T, (1, 1, 1)^T$ where this is not true. The main objective of this chapter is to show how to find such nice (or, as we shall call them, orthonormal) bases for a given finite dimensional vector space with an arbitrary inner product, and to study the properties of these orthonormal bases. This chapter concerns inner products on vector spaces, and all our vector spaces are over $\mathbb{R}$ or $\mathbb{C}$.

...

$$f(\lambda v) = \lambda f(v)$$
$$F(u, v) = G(f(u), f(v)) \qquad (4)$$
for all $u, v \in V$ and all scalars $\lambda \in \mathbb{R}$. Similarly, two complex vector spaces $V$, $W$ with forms $F : V \times V \to \mathbb{C}$ and $G : W \times W \to \mathbb{C}$ respectively are isomorphic if there is a bijection $f : V \to W$ such that
$$f(u + v) = f(u) + f(v) \qquad (5)$$
$$f(\lambda v) = \lambda f(v) \qquad (6)$$
$$F(u, v) = G(f(u), f(v)) \qquad (7)$$
for all $u, v \in V$ and all scalars $\lambda \in \mathbb{C}$.

Corollary 5.12 Let $V$ be a Euclidean space of dimension $n$. Then $V$ is isomorphic to $\mathbb{R}^n$ with the standard inner product as an inner product space. Similarly, each unitary space $V$ of dimension $n$ is isomorphic to $\mathbb{C}^n$ with the standard inner product.

Proof We do the real case, the complex case being identical. Denote the inner product in $V$ by $F$, and the standard inner product in $\mathbb{R}^n$ by $(\ |\ )$. By Corollary 5.9 there is an orthonormal basis $v_1, v_2, \ldots, v_n$ of $V$. By the proof of Theorem 2.33
$$f(\lambda_1v_1 + \lambda_2v_2 + \cdots + \lambda_nv_n) = (\lambda_1, \lambda_2, \ldots, \lambda_n)^T$$
defines an isomorphism of real vector spaces from $V$ to $\mathbb{R}^n$. We must show (4) holds too. Let $v = \lambda_1v_1 + \lambda_2v_2 + \cdots + \lambda_nv_n$ and $w = \mu_1v_1 + \mu_2v_2 + \cdots + \mu_nv_n$ be two typical elements of $V$, and let $a = (\lambda_1, \lambda_2, \ldots, \lambda_n)^T$ and $b = (\mu_1, \mu_2, \ldots, \mu_n)^T$ be the coordinate forms of $v$, $w$ with respect to the orthonormal basis $v_1, v_2, \ldots, v_n$ of $V$. Note that $f(v) = a$ and $f(w) = b$. Since the matrix representing the inner product $F$ on $V$ is the identity matrix $I$ (because $v_1, v_2, \ldots, v_n$ is orthonormal), by Proposition 4.8 we have
$$F(v, w) = a^TIb = a^Tb = \lambda_1\mu_1 + \lambda_2\mu_2 + \cdots + \lambda_n\mu_n = (f(v)|f(w))$$
as required. □

5.3 Properties of orthonormal bases

There are a number of useful results about orthonormal bases, all using the same basic idea we used in proving Proposition 5.3. Most of the results here concern finite dimensional spaces and are rather familiar in the case of $\mathbb{R}^n$ (and probably also $\mathbb{C}^n$); these could be proved by using Corollary 5.12 to show the space $V$ in question is isomorphic to $\mathbb{R}^n$ or $\mathbb{C}^n$ with the isomorphism taking an orthonormal basis to the standard basis of $\mathbb{R}^n$ or $\mathbb{C}^n$, and then proving the result for the standard basis in $\mathbb{R}^n$ or $\mathbb{C}^n$. However, in the cases of these particular results, it is just as easy to prove them directly, and this is what we shall do here, starting with the real case.

Inner product spaces over $\mathbb{R}$. We start by considering spaces over the real numbers.

Proposition 5.13 (Fourier expansion) Suppose that $\{e_1, \ldots, e_n\}$ is an orthonormal basis of a Euclidean space $V$. Then for any $v \in V$ we have
$$v = \sum_{i=1}^{n} (e_i|v)e_i.$$

Proof We can write $v$ in terms of the basis, say $v = \sum_{i=1}^{n} \lambda_ie_i$. Then take the inner product of both sides of this equation with $e_j$, to obtain
$$(e_j|v) = \sum_{i=1}^{n} \lambda_i(e_j|e_i) = \lambda_j.$$
Substituting back $\lambda_i = (e_i|v)$ into $v = \sum_{i=1}^{n} \lambda_ie_i$ we obtain the equation required. □

Example 5.14 You are probably familiar with this result in $\mathbb{R}^3$. If $\{i, j, k\}$ is the standard (orthonormal) basis, $u \cdot v$ is the standard inner product, and if $v \in \mathbb{R}^3$, then
$$v = (i \cdot v)i + (j \cdot v)j + (k \cdot v)k.$$
That is, $v$ is the sum of its projections onto the three coordinate axes.

Example 5.15 We consider the orthonormal basis $\{e_1, e_2, e_3, e_4\}$ of $\mathbb{R}^4$ with the standard inner product, where
$$\begin{aligned}
e_1 &= (1/2, 1/2, 1/2, 1/2)^T, \\
e_2 &= (1/2, 1/2, -1/2, -1/2)^T, \\
e_3 &= (1/2, -1/2, 1/2, -1/2)^T, \\
e_4 &= (1/2, -1/2, -1/2, 1/2)^T,
\end{aligned}$$
and express $v = (1, 2, 3, 4)^T$ in terms of this basis. We calculate $(e_1|v) = 5$, $(e_2|v) = -2$, $(e_3|v) = -1$, and $(e_4|v) = 0$, and therefore $v = 5e_1 - 2e_2 - e_3$.
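The computation in Example 5.15 is easy to reproduce exactly. The following sketch uses the basis and vector of that example; the helper `dot` and the variable names are our own.

```python
from fractions import Fraction

h = Fraction(1, 2)
basis = [[h, h, h, h],
         [h, h, -h, -h],
         [h, -h, h, -h],
         [h, -h, -h, h]]       # e1, e2, e3, e4 of Example 5.15
v = [1, 2, 3, 4]

def dot(x, y):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

# The basis is orthonormal: (e_i|e_j) is 1 if i = j and 0 otherwise.
gram = [[dot(e, f) for f in basis] for e in basis]

# Fourier coefficients (e_i|v), as in Proposition 5.13.
coeffs = [dot(e, v) for e in basis]

# Rebuild v as the sum of (e_i|v) e_i.
rebuilt = [sum(c * e[k] for c, e in zip(coeffs, basis)) for k in range(4)]

print([str(c) for c in coeffs])
print([str(c) for c in rebuilt])
```

The coefficients come out as $5, -2, -1, 0$, and summing $(e_i|v)e_i$ recovers $v$.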

The coefficients $(e_i|v)$ in Proposition 5.13 are the coordinates of the vector $v$ with respect to the ordered basis $e_1, \ldots, e_n$. They are also sometimes called Fourier coefficients because of their use in Example 3.14. That example, however, is an infinite dimensional space, for which Proposition 5.13 does not hold in general. With infinitely many dimensions, there may be convergence problems for infinite series which you might learn about elsewhere.

Example 5.16 Recall from Example 3.14 that the functions $\sin(kx)$ for different positive integer values of $k$ are mutually orthogonal, with respect to the inner product $(f|g) = \int_{-\pi}^{\pi} f(x)g(x)\,dx$, and have norm $\sqrt{\pi}$. Thus if $f : [-\pi, \pi] \to \mathbb{R}$ is defined by
$$f(x) = \sum_{k=1}^{n} \lambda_k \sin(kx)$$
then we can recover the coefficients $\lambda_k$ from the function $f(x)$ as follows. First normalize the functions to give an orthonormal set of functions $f_k(x)$, where $f_k(x) = (1/\sqrt{\pi})\sin(kx)$, so that $f(x) = \sum_{k=1}^{n} (\lambda_k\sqrt{\pi})f_k(x)$. Then by Proposition 5.13 the coefficients $\lambda_k\sqrt{\pi}$ are given by
$$\lambda_k\sqrt{\pi} = (f_k|f) = \frac{1}{\sqrt{\pi}}\int_{-\pi}^{\pi} f(x)\sin(kx)\,dx,$$
so
$$\lambda_k = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin(kx)\,dx.$$

Proposition 5.17 (Pythagoras's theorem) Suppose $e_1, \ldots, e_n$ is an orthonormal basis of a Euclidean space $V$. Then, for all $v \in V$,
$$\|v\|^2 = \sum_{i=1}^{n} (e_i|v)^2.$$

Proof From Proposition 5.13, $v = \sum_{i=1}^{n} (e_i|v)e_i$. Now take the inner product of both sides with $v$. □

In words, the square of the length of a vector is equal to the sum of the squares of its coordinates. This should make it clear why it is called Pythagoras's theorem.

Example 5.18 In Example 5.10 we saw that the polynomials $\frac{\sqrt{2}}{2}$, $\frac{\sqrt{6}}{2}x$, and $\frac{3\sqrt{10}}{4}\left(x^2 - \frac{1}{3}\right)$ form an orthonormal basis of the space $\mathbb{R}_3[x]$ of polynomials of degree less than 3 with inner product
$$(f|g) = \int_{-1}^{1} f(x)g(x)\,dx.$$
Applying this to the polynomial $x^2$, we have
$$x^2 = \frac{2\sqrt{10}}{15}\left(\frac{3\sqrt{10}}{4}\left(x^2 - \frac{1}{3}\right)\right) + \frac{\sqrt{2}}{3} \cdot \frac{\sqrt{2}}{2}.$$
So $x^2$ has 'coordinates' $\sqrt{2}/3$, $0$, and $2\sqrt{10}/15$. By Pythagoras's theorem
$$\|x^2\|^2 = \left(\frac{\sqrt{2}}{3}\right)^2 + \left(\frac{2\sqrt{10}}{15}\right)^2 = \frac{2}{9} + \frac{8}{45} = \frac{2}{5},$$
which you can check directly by computing $\int_{-1}^{1} x^4\,dx$.

Pythagoras's theorem is the special case $v = w$ of the following.

Corollary 5.19 (Parseval's identity) If $\{e_1, \ldots, e_n\}$ is an orthonormal basis of the Euclidean space $V$, and $v, w \in V$, then
$$(v|w) = \sum_{i=1}^{n} (v|e_i)(e_i|w).$$

Proof From Proposition 5.13, $w = \sum_{i=1}^{n} (e_i|w)e_i$, and taking the inner product of both sides of this with $v$ gives the required identity. □

In $\mathbb{R}^3$ with the standard inner product and standard basis, $(v|e_i)$ is the $i$th coordinate $v_i$ of $v$. In this case,

\[ (v|w) = \sum_{i=1}^{3} v_i w_i. \]

In other words, Parseval's identity is another way of stating the isomorphism in Corollary 5.12.

So far, the results concern finite dimensional spaces. The next result is a version of Pythagoras's theorem for infinite dimensional spaces. But since we cannot in general form an infinite sum $\sum_{i=1}^{\infty} (v|e_i)^2$, the infinite dimensional version of Pythagoras's theorem becomes an inequality.

Proposition 5.20 (Bessel's inequality) If $\{e_1, \dots, e_k\}$ is an orthonormal set (not necessarily a basis) of vectors in a real inner product space $V$, and $v \in V$, then

\[ \sum_{i=1}^{k} (e_i|v)^2 \leq \|v\|^2. \]

Proof Let $w = v - \sum_{i=1}^{k} (e_i|v)e_i$, and compute $\|w\|$ as follows.

\[
\begin{aligned}
0 \leq \|w\|^2 = (w|w) &= \Bigl(v - \sum_{i=1}^{k} (e_i|v)e_i \Bigm| v - \sum_{j=1}^{k} (e_j|v)e_j\Bigr) \\
&= (v|v) - 2\sum_{i=1}^{k} (e_i|v)^2 + \sum_{i=1}^{k}\sum_{j=1}^{k} (e_i|v)(e_j|v)(e_i|e_j) \\
&= (v|v) - \sum_{i=1}^{k} (e_i|v)^2,
\end{aligned}
\]

since $(e_i|e_j) = 0$ except when $i = j$, and $(e_i|e_i) = 1$, as required. $\square$

Example 5.21 We can apply Bessel's inequality to Example 5.16. If $g$ is any continuous function on $[-\pi, \pi]$, we obtain

\[ \sum_{k=1}^{n} \frac{1}{\pi}\left(\int_{-\pi}^{\pi} g(x)\sin(kx)\,dx\right)^2 \leq \int_{-\pi}^{\pi} g(x)^2\,dx, \]

so

\[ \sum_{k=1}^{n} \left(\int_{-\pi}^{\pi} g(x)\sin(kx)\,dx\right)^2 \leq \pi\int_{-\pi}^{\pi} g(x)^2\,dx. \]

In the special case when $g(x) = x$, we get

\[ \sum_{k=1}^{n} \left(\int_{-\pi}^{\pi} x\sin(kx)\,dx\right)^2 \leq \pi\int_{-\pi}^{\pi} x^2\,dx = 2\pi^4/3. \]

Now integrating $\int_{-\pi}^{\pi} x\sin(kx)\,dx$ by parts gives $(-1)^{k+1}2\pi/k$, as you can check, so we have proved

\[ \sum_{k=1}^{n} \frac{4\pi^2}{k^2} \leq \frac{2\pi^4}{3} \]

or

\[ \sum_{k=1}^{n} \frac{1}{k^2} \leq \frac{\pi^2}{6}. \]

In fact this is one of the special cases when the convergence problems can be overcome (although we will not prove it in this book), and it turns out that

\[ \sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{\pi^2}{6}. \]
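The inequality of Example 5.21 is easy to check for particular values of $n$. The sketch below takes the closed form $(-1)^{k+1}2\pi/k$ of the integral on trust from the example; the function names and the sample values of $n$ are choices made here for illustration.

```python
import math

def x_sin_integral(k):
    """Closed form of the integral of x*sin(kx) over [-pi, pi] (integration by parts)."""
    return (-1) ** (k + 1) * 2 * math.pi / k

def bessel_lhs(n):
    """Left-hand side of the inequality: sum over k = 1..n of the squared integrals."""
    return sum(x_sin_integral(k) ** 2 for k in range(1, n + 1))

# Right-hand side: pi times the integral of x^2 over [-pi, pi].
bound = 2 * math.pi ** 4 / 3

# Partial sums of 1/k^2 for a few sample values of n.
partial_sums = {n: sum(1 / k ** 2 for k in range(1, n + 1)) for n in (1, 10, 100, 1000)}
```

Each value in `partial_sums` stays below $\pi^2/6 \approx 1.6449$ and creeps towards it as $n$ grows, which is consistent with (but of course does not prove) the limit quoted above.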

Inner product spaces over $\mathbb{C}$

All of the above results have analogues for spaces over $\mathbb{C}$. In all cases the proofs are the same, perhaps with some complex conjugates added.

Proposition 5.22 If $\{e_1, \dots, e_n\}$ is an orthonormal basis of a complex inner product space $V$, and $v, w \in V$, then

(a) (Fourier expansion) $v = \sum_{i=1}^{n} (e_i|v)e_i$.

(b) (Pythagoras's theorem) $\|v\|^2 = \sum_{i=1}^{n} |(e_i|v)|^2$.

(c) (Parseval's identity) $(v|w) = \sum_{i=1}^{n} (v|e_i)(e_i|w) = \sum_{i=1}^{n} \overline{(e_i|v)}(e_i|w)$.

Proof (a) is exactly as before. For (b), note

\[ \|v\|^2 = (v|v) = \sum_{i=1}^{n} (v|e_i)(e_i|v) = \sum_{i=1}^{n} (v|e_i)\overline{(v|e_i)} = \sum_{i=1}^{n} |(v|e_i)|^2. \]

For (c), consider $w = \sum_{i=1}^{n} (e_i|w)e_i$ and take the inner product with $v$ of both sides of this equation. $\square$

Proposition 5.23 (Bessel's inequality) If $\{e_1, \dots, e_k\}$ is an orthonormal set (not necessarily a basis!) in a complex inner product space $V$, and $v \in V$, then

\[ \sum_{i=1}^{k} |(e_i|v)|^2 \leq \|v\|^2. \]

Proof Let $w = v - \sum_{i=1}^{k} (e_i|v)e_i$, and compute $\|w\|$ as follows.

\[
\begin{aligned}
\|w\|^2 = (w|w) &= \Bigl(v - \sum_{i=1}^{k} (e_i|v)e_i \Bigm| v - \sum_{j=1}^{k} (e_j|v)e_j\Bigr) \\
&= (v|v) - \sum_{i=1}^{k} \overline{(e_i|v)}(e_i|v) - \sum_{j=1}^{k} (e_j|v)(v|e_j) + \sum_{i=1}^{k}\sum_{j=1}^{k} \overline{(e_i|v)}(e_j|v)(e_i|e_j) \\
&= (v|v) - \sum_{i=1}^{k} |(e_i|v)|^2,
\end{aligned}
\]

since $(e_i|e_j) = 0$ except when $i = j$. The result now follows as $\|w\|^2 \geq 0$. $\square$

5.4 Orthogonal complements

This section describes the structure of vector spaces with an inner product in greater detail, and in particular introduces the notion of orthogonal subspaces. The main result uses the Gram-Schmidt method, and many of the ideas here are used implicitly in Chapter 7, especially in the proof of Theorem 7.8; also, in Chapter 13, Theorem 13.16 gives an alternative short proof using orthogonal complements of results that will be proved by alternative methods. Nothing, however, in this section is actually required later, so the section can be safely skipped on a first reading if necessary.

Fig. 5.2 Orthogonal complements in $\mathbb{R}^2$

Example 5.24 Let $V = \mathbb{R}^2$ and $U = \{(x, x)^T : x \in \mathbb{R}\}$. Then the subspace $W = \{(x, 0)^T : x \in \mathbb{R}\}$ has the property that every vector in $V$ is the sum of a vector from $U$ and a vector from $W$:

\[ (x, y)^T = (y, y)^T + (x - y, 0)^T. \]

The same is true for the subspace $W' = \{(0, x)^T : x \in \mathbb{R}\}$:

\[ (x, y)^T = (x, x)^T + (0, y - x)^T. \]

We will say that $W$ and $W'$ are complements to $U$ in the space $V$. In fact there are infinitely many complements, and in terms of the vector space structure on $V$ there is nothing to choose between them. But in the presence of the standard inner product one complement stands out, namely the perpendicular line $U' = \{(x, -x)^T : x \in \mathbb{R}\}$. (See Figure 5.2.)

We start by giving definitions covering the ideas of 'complement' and 'orthogonal complement'.

Definition 5.25 If $U$ and $W$ are subspaces of a vector space $V$, then the sum of $U$ and $W$ is defined as

\[ U + W = \{u + w : u \in U,\ w \in W\}. \]

Proposition 5.26 With this definition, $U + W$ is a subspace of $V$.

Proof If $v_1$ and $v_2$ are elements of $U + W$, then there are elements $u_1, u_2$ of $U$ and $w_1, w_2$ of $W$ such that $v_1 = u_1 + w_1$ and $v_2 = u_2 + w_2$. Then for any scalar $\alpha$ we have

\[ v_1 + \alpha v_2 = (u_1 + w_1) + \alpha(u_2 + w_2) = (u_1 + \alpha u_2) + (w_1 + \alpha w_2) \in U + W, \]

since $u_1 + \alpha u_2 \in U$ and $w_1 + \alpha w_2 \in W$. $\square$

Example 5.27 Let $V = \mathbb{R}^2$, and consider the two subspaces $U$ and $W$ defined by $U = \{(x, 0) : x \in \mathbb{R}\}$ and $W = \{(0, y) : y \in \mathbb{R}\}$. Then

\[
\begin{aligned}
U + W &= \{u + w : u \in U,\ w \in W\} \\
&= \{(x, 0) + (0, y) : x \in \mathbb{R},\ y \in \mathbb{R}\} \\
&= \{(x, y) : x \in \mathbb{R},\ y \in \mathbb{R}\} \\
&= V.
\end{aligned}
\]

On the other hand $U \cup W$ consists only of the points which are on one of the two coordinate axes.

Definition 5.28 If $V$ is a vector space and $U$ is a subspace of $V$, then $W$ is called a complement to $U$ in $V$ if (a) $W$ is a subspace of $V$, (b) $V = U + W$, and (c) $U \cap W = \{0\}$. If these three conditions are satisfied, we write $V = U \oplus W$, and say that $V$ is the direct sum of $U$ and $W$.

The point of Example 5.24 was to say that although there may be many vector space complements to a subspace $U$, in an inner product space there is a 'special' one, the orthogonal complement. This is defined next.

Definition 5.29 If $V$ is an inner product space and $U$ is a subspace of $V$, we define

\[ U^\perp = \{v \in V : (u|v) = 0 \text{ for all } u \in U\}. \]

This is called the orthogonal complement to $U$ in $V$, or '$U$ perp' for short.

In view of the name, we ought to prove at once that $U^\perp$ is actually a complement to $U$, in the sense of Definition 5.28. In fact, this is not true in general, but it is true if $U$ is finite dimensional. Before we prove this, we prove an easy lemma which will be very useful in practice when calculating orthogonal complements of subspaces.

Lemma 5.30 If $V$ is an inner product space, $U$ is a subspace of $V$, and $U$ has a basis $\{u_1, \dots, u_k\}$, then

\[ U^\perp = \{v \in V : (u_i|v) = 0 \text{ for all } i \text{ with } 1 \leq i \leq k\}. \]

Proof Let $W = \{v \in V : (u_i|v) = 0 \text{ for all } i \text{ with } 1 \leq i \leq k\}$. If $v \in U^\perp$ then $(u|v) = 0$ for all $u \in U$, so in particular $(u_i|v) = 0$ for all $i$, so $v \in W$. Conversely, if $w \in W$, then $(u_i|w) = 0$ for all $i$, and if $u \in U$, then $u$ can be written as $u = \sum_{i=1}^{k} \lambda_i u_i$ for some scalars $\lambda_i$. Thus

\[ (u|w) = \Bigl(\sum_{i=1}^{k} \lambda_i u_i \Bigm| w\Bigr) = \sum_{i=1}^{k} \lambda_i (u_i|w) = 0, \]

so $w \in U^\perp$. Hence $U^\perp = W$, as required. $\square$

This result means that in order to check whether a vector $w$ is in $U^\perp$ it is sufficient to check that $w$ is orthogonal to all vectors in some basis for $U$. We can now use this to show that $U^\perp$ is indeed a complement to $U$ in the sense of Definition 5.28.

Proposition 5.31 If $V$ is an inner product space, and $U$ is a finite dimensional subspace of $V$, then (a) $U^\perp$ is a subspace of $V$, (b) $U \cap U^\perp = \{0\}$, and (c) $U + U^\perp = V$. (Thus $V = U \oplus U^\perp$.)

Proof For (a), note that if $v, w \in U^\perp$, then $(u|v) = (u|w) = 0$ for all $u \in U$. Therefore $(u|v + \alpha w) = (u|v) + \alpha(u|w) = 0$ for all $u \in U$ and all scalars $\alpha$, so $v + \alpha w \in U^\perp$. So $U^\perp$ is a subspace of $V$.

For (b), observe that if $u \in U \cap U^\perp$ then in particular $(u|u) = 0$, so $u = 0$. Thus $U \cap U^\perp = \{0\}$.

We show for (c) that any vector can be resolved into two components, $v_s$ in $U$ and $v_p$ orthogonal to $U$. This is very similar to Lemma 5.6 (see also Figure 5.1). We first use Gram-Schmidt orthonormalization (Theorem 5.8) to find an orthonormal basis for $U$, say $\{u_1, \dots, u_k\}$. Then for any $v \in V$, define

\[ v_s = \sum_{j=1}^{k} (v|u_j)u_j \quad\text{and}\quad v_p = v - v_s. \]

By definition $v_s \in U$, and for all $i$

\[
\begin{aligned}
(v_p|u_i) &= (v|u_i) - (v_s|u_i) \\
&= (v|u_i) - \sum_{j=1}^{k} (v|u_j)(u_j|u_i) \\
&= (v|u_i) - (v|u_i) = 0.
\end{aligned}
\]

Therefore $v_p \in U^\perp$ by Lemma 5.30. Thus we have proved that any vector $v \in V$ can be written as $v = v_s + v_p$ with $v_s \in U$ and $v_p \in U^\perp$, so every vector in $V$ is in $U + U^\perp$, and therefore $U + U^\perp = V$, as required. $\square$

Proposition 5.32 If $V = U \oplus W$, then $\dim(V) = \dim(U) + \dim(W)$.

Proof Choose a basis $\{u_1, \dots, u_m\}$ of $U$ and a basis $\{w_1, \dots, w_n\}$ of $W$. By the hypothesis $V = U + W$, any vector $v$ in $V$ is of the form $v = u + w$ for some $u \in U$ and $w \in W$, so can be written as

\[ v = \sum_{i=1}^{m} \lambda_i u_i + \sum_{j=1}^{n} \mu_j w_j. \]

Thus $\{u_1, \dots, u_m, w_1, \dots, w_n\}$ spans $V$. On the other hand, if

\[ \sum_{i=1}^{m} \lambda_i u_i + \sum_{j=1}^{n} \mu_j w_j = 0 \]

then

\[ \sum_{i=1}^{m} \lambda_i u_i = -\sum_{j=1}^{n} \mu_j w_j \in U \cap W = \{0\}, \]

whence $\lambda_i = 0$ for all $i$ since $\{u_1, \dots, u_m\}$ is a basis of $U$, and $\mu_j = 0$ for all $j$ since $\{w_1, \dots, w_n\}$ is a basis of $W$. Thus $\{u_1, \dots, u_m, w_1, \dots, w_n\}$ is a linearly independent set, so is a basis for $V$, and

\[ \dim(V) = m + n = \dim(U) + \dim(W), \]

as required. $\square$
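The resolution $v = v_s + v_p$ used in the proof of Proposition 5.31 can be carried out mechanically. The following is a minimal sketch for $\mathbb{R}^n$ with the standard inner product, using floating-point arithmetic; the function names `gram_schmidt` and `decompose`, and the sample vector $(3, 1)^T$, are assumptions made for illustration.

```python
import math

def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = dot(e, v)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis

def decompose(v, spanning):
    """Return (v_s, v_p) with v_s in U = span(spanning) and v_p orthogonal to U."""
    v_s = [0.0] * len(v)
    for e in gram_schmidt(spanning):
        c = dot(v, e)                      # the coefficient (v|u_j)
        v_s = [s + c * ei for s, ei in zip(v_s, e)]
    v_p = [a - s for a, s in zip(v, v_s)]
    return v_s, v_p

# U is the line {(x, x)} of Example 5.24; the vector (3, 1) is an arbitrary choice.
v_s, v_p = decompose([3.0, 1.0], [[1.0, 1.0]])
```

Here `v_s` is approximately $(2, 2)^T$ and `v_p` approximately $(1, -1)^T$, which lies on the perpendicular line $U'$ of Example 5.24.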

Corollary 5.33 If $V$ is a finite dimensional inner product space, and $U$ is a subspace of $V$, then (a) $\dim(U) + \dim(U^\perp) = \dim(V)$, and (b) $(U^\perp)^\perp = U$.

Proof The first part is immediate from Propositions 5.31 and 5.32. Now if $x \in U$ then $(x|v) = 0$ for all $v \in U^\perp$. Therefore $x \in (U^\perp)^\perp$, and hence $U \subseteq (U^\perp)^\perp$. But from the first part applied to $U^\perp$, we have $\dim(U^\perp) + \dim((U^\perp)^\perp) = \dim(V)$, so $\dim(U) = \dim((U^\perp)^\perp)$, and therefore $U = (U^\perp)^\perp$. $\square$

Exercises

Exercise 5 . 1 Use the Gram-Schmidt process with the following inner products, starting with the ordered bases given, to obtain a new basis which is orthogonal. (a) V = B.3 with the standard inner product, and with the basis (2, -2, (b) V = IR2 , with inner product defined by

(1, 1, o)r,

(0, 1, -l)r,

l)r.

(x!y)

=

(1,0)r, (0, l)r. 1, {

(x1 x2 )

(- � - � ) (��) ,

and basis (c) V = IR3 [x] , with inner product (Jig) = f01 defined by f0 ( x ) = !J (x) = x, and h ( x )

f(x)g(x) dx, = x2 .

and basis

fa, !J , h

Exercise 5.2 If $\{e_1, \dots, e_n\}$ is any orthogonal basis of a real inner product space $V$, and $v \in V$, prove that

\[ v = \sum_{i=1}^{n} \frac{(e_i|v)}{(e_i|e_i)}\, e_i. \]

Use this to write $x^3$ as a linear combination of the Legendre polynomials $P_0(x) = 1$, $P_1(x) = x$, $P_2(x) = \frac{3}{2}x^2 - \frac{1}{2}$, and $P_3(x) = \frac{5}{2}x^3 - \frac{3}{2}x$.

IR3 is given by F (x y ) (x 1 X2 x3) ( � � =� ) (��) · -1 -1 3 Y3

Exercise 5 .3 A bilinear form on ,

=

Apply the Gram-Schmidt process to F, to find a basis with respect to which the matrix of F is diagonal. (You could start with the standard basis. Don ' t forget to check that the set of vectors that Gram-Schmidt gives you is a basis.) Is F an inner product? Explain your answer.

Exercise 5.4 Let $E = \{e_1, e_2, \dots\}$ be an infinite orthogonal set of nonzero vectors in a real inner product space $V$. By normalizing $E$ or otherwise, prove that for all $v \in V$

\[ \|v\|^2 \geq \sum_{i=1}^{n} \frac{(v|e_i)^2}{\|e_i\|^2} \]

for all natural numbers $n$.

Exercise 5.5 By considering $g(x) = x^2$, Bessel's inequality, and the method of Example 5.21, find an upper bound for $\sum_{n=1}^{N} (1/n^4)$ which is independent of $N$.

Exercise 5.6 Let $V$ be the real vector space $\mathbb{R}[x]$ of polynomials in $x$ with real coefficients, and let $(\ |\ )$ be an inner product on $V$. Suppose that the sets $\{P_0, P_1, \dots, P_n, \dots\} \subseteq V$ and $\{Q_0, Q_1, \dots, Q_n, \dots\} \subseteq V$ are both orthogonal, with $P_n$ and $Q_n$ both having degree $n$ for each $n \in \mathbb{N}$. Prove that $(P_n|R) = (Q_n|R) = 0$ for all polynomials $R$ with degree $k < n$. [Hint: consider Fourier expansions of $R$.] Hence show that there are scalars $\lambda_n$ with $P_n = \lambda_n Q_n$.

Exercise 5.7 (Legendre polynomials) Let $V$ be the real inner product space $\mathbb{R}[x]$ of polynomials over $\mathbb{R}$ with inner product $(P(x)|Q(x)) = \int_{-1}^{1} P(x)Q(x)\,dx$. The Legendre polynomials are defined to be the unique sequence of polynomials $P_n(x) \in V$ with $P_n$ having degree $n$, $(P_i|P_j) = 0$ for all $i \neq j$, and $P_n(1) = 1$ for all $n$. (See Exercise 5.6 to see why this specifies them uniquely.)

(a) Prove that for all $n \in \mathbb{N}$ there is $\alpha_n \in \mathbb{R}$ such that $P_{n+1}(x) = \alpha_n x P_n(x) + (1 - \alpha_n)P_{n-1}(x)$. [Consider the Fourier expansion of $xP_n(x)$.]

(b) Use integration by parts to show that $(xP_n(x)|P_n'(x)) = 1 - \frac{1}{2}\|P_n\|^2$ and $(P_{n-1}(x)|P_n'(x)) = 2$, where $'$ denotes differentiation with respect to $x$.

Exercises 91 (c) Using part (a) , show that

Hence deduce from (b) that

n ;?: 1.

for (d) Using part (a) (twice!) and that

Hence, by induction on

and

(xPn (x)IPn+ l (x)) = (Pn (x) l xPn+ l (x)), show

n, show that II Pn ll 2 = 2n2+ 1

n Pn+I(x) = 2nn++ 11 xPn (x) - -n + 1 Pn- 1 · --

(e) Show that

(1 - x2 )P�(x) - 2xP�(x) n(n + 1)Pn = 0 by taking inner products with Pi for � n. Exercise 5.8 (Hermite polynomials) Let V be the real vector space !R.(x] with (P(x)jQ(x)) = f�oo e-x2/ 2 P(x)Q(x) dx. Let In = (x n \1 ) . (a) Prove that Io = /21i. (Hint: write I6 = JJ e-(x 2 +y2 )/2 dx dy and change to polar coordinates to evaluate this multiple integral.] (b) Show that h n+ I 0 and I2n = ((2n)!/(2n n!) )Io. (Use induction on n.) Hence show that ( I ) is an inner product on V. (c) The Hermite polynomials are defined to be the unique sequence of polynomi als Hn (x) E V with Hn having degree n, {Hn : n E N} is orthogonal for this inner product, and the leading coefficient of Hn (X) is 1. Use Gram-Schmidt or otherwise to show that H3 (x) = x3 - 3x Ho(x) = 1 H4 (x) = x4 - 6x2 3 3 H 1 (x) = x Hs(x) = x5 - 10x + 15x. H2 (x) = x2 - 1 i

+

=

+

92 Orthogonal bases (e) Show that

(xHn (x) I H�(x)) = � ( - I Hn l 2 + 1 Hn+ 1 1 2 o;, II Hn- 1 1 2 ) +

and

by using integration by parts. (f) By taking the inner product of the equation in (d) with

H�(x), show that 1 = 1 + f1n · f1n+ (g) Hence derive the following properties of the polynomials Hn : 1 . 1 Hn l 2 = n !J27[; ii. Hn+1 (x) = xHn (x) - nHn - 1 (x); iii. H�(x) - xH�(x) + nHn (x) = 0. Exercise 5 . 9 (Laguerre polynomials) Let V be the real vector space IR[x] with ( P(x)I Q (x)) = J000 e - x P(x)Q(x) dx. (a) Prove that Jt x n e - x dx = n! for all n. (b) Hence show that ( I ) is an inner product on V. (c) The Laguerre polynomials are the unique sequence of polynomials Ln ( x) E V with Ln having degree n, { L n : n E N} orthogonal for this inner product, and the leading coefficient of L n (x) is 1. Use Gram-Schmidt to show that L3 (x) = x3 - 9x2 3+ 18x - 6 Lo(x) = 1 L4 (x) = x4 - 16x + 72x2 3- 96x + 24 L1(x) = x - l L2 (x) = x2 - 4x + 2 L5 (x) = x5 - 25x4 + 200x - 600x2 + 600x - 120. (d) Derive the following properties of the polynomials L n : 1. 1 Lnl l 2 = (n!) 2 ; ii. L n + 1 (x) (2n + 1 - x)L n (x) + n 2 Ln - 1 (x) = 0; and iii . xL�(x) + (1 - x)L�(x) nL n (x) = 0. Exercise 5 . 1 0 Prove the following generalization of Proposition 5.32: if U and W are subspaces of a vector space V, then dim(U) + dim(W) dirn(U + W) + dim(U W ) . +

+

n

=

Exercise 5.11 If $U$ and $W$ are subspaces of a finite dimensional inner product space $V$, show that (a) if $U \subseteq W$, then $W^\perp$ ...

... (a) If $F(v_i, v_i) > 0$ for all $i$ and $F(v_i, v_j) = 0$ for all $i \neq j$, then $F$ is positive definite. (b) If $F(v_i, v_i) < 0$ for all $i$ and $F(v_i, v_j) = 0$ for all $i \neq j$, then $F$ is negative definite.

Proof For (a), let $v \in V$ be arbitrary. Since the $v_i$ form a basis, $v = \sum_{i=1}^{n} \lambda_i v_i$ for some scalars $\lambda_i$. Then

\[ F(v, v) = \sum_{i=1}^{n}\sum_{j=1}^{n} \overline{\lambda_i}\lambda_j F(v_i, v_j) = \sum_{i=1}^{n} |\lambda_i|^2 F(v_i, v_i) \geq 0 \]

since each $F(v_i, v_j)$ $(i \neq j)$ is zero, each $F(v_i, v_i) > 0$, and each $|\lambda_i|^2 \geq 0$. Also, since each $F(v_i, v_i) > 0$ the only way $F(v, v)$ could be equal to $0$ is if $\lambda_1 = \lambda_2 = \cdots = \lambda_n = 0$, i.e. if $v = 0$. This shows that $F$ is positive definite as required. The argument for (b) is the same, just replacing $>$ with $<$ in the appropriate places. $\square$

Consider the Gram-Schmidt process applied to a symmetric bilinear form $F$ on a real vector space $V$ with basis vectors $v_1, v_2, \dots, v_n$ (or a conjugate-symmetric sesquilinear form on a complex vector space $V$). We compute vectors

\[ w_k = v_k - \sum_{i=1}^{k-1} \frac{F(w_i, v_k)}{F(w_i, w_i)}\, w_i \]

as far as possible, either until we have found some $w_k$ with $F(w_k, w_k) = 0$, or else until we have obtained all vectors $w_1, w_2, \dots, w_n$.

Suppose first that $F(w_i, w_i) \neq 0$ for all $i < k$ but $F(w_k, w_k) = 0$. In this case we have $w_i \in \operatorname{span}(v_1, v_2, \dots, v_i)$ for all $i \leq k$. But then $w_k \neq 0$ since otherwise

\[ v_k = \sum_{i=1}^{k-1} \frac{F(w_i, v_k)}{F(w_i, w_i)}\, w_i \in \operatorname{span}(v_1, v_2, \dots, v_{k-1}) \]
(e) $f\bigl(\sum_{i=1}^{n} \lambda_i u_i\bigr) = \sum_{i=1}^{n} \lambda_i f(u_i)$, for all $u, v \in V$ and all scalars $\lambda, \lambda_i$.

Proof For (a) use $f(0) = f(0u + 0v) = 0f(u) + 0f(v) = 0$ for all vectors $u, v \in V$. For (b), apply Definition 8.1 for $\mu = 0$, and for (c), apply part (b) with $\lambda = -1$. Part (d) is just Definition 8.1 where $\lambda = \mu = 1$, and (e) is by repeated applications of Definition 8.1. $\square$

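Example 8.3 Let $V$ be the real vector space $\mathbb{R}^3$ and let $W$ be $\mathbb{R}^2$. We define linear transformations $f, g : V \to W$ by

\[ f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 2x + y \\ y + z \end{pmatrix} \quad\text{and}\quad g\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x \\ x \end{pmatrix}. \]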

It is easy to check that Definition 8.1 is satisfied. For example,

Definition 8.4 Given $f : V \to W$ as in Definition 8.1, the image (or range) of $f$ is $\{f(v) : v \in V\}$ (written $f(V)$ or $\operatorname{im}(f)$). The kernel (or nullspace) of $f$ is $\{v \in V : f(v) = 0\}$ (written $\ker(f)$).

Proposition 8.5 If $f : V \to W$ is a linear transformation, then $\operatorname{im}(f)$ is a subspace of $W$ and $\ker(f)$ is a subspace of $V$.

Proof We verify the conditions in Lemma 2.9. If $v, w \in \ker f$ then $f(v + \lambda w) = f(v) + \lambda f(w) = 0 + \lambda 0 = 0$, so $v + \lambda w \in \ker f$. If $v, w \in \operatorname{im} f$ then $v = f(x)$ and $w = f(y)$ for some $x, y \in V$, so $f(x + \lambda y) = f(x) + \lambda f(y) = v + \lambda w \in \operatorname{im} f$. $\square$

Example 8.6 For $f$ and $g$ as in Example 8.3, $f$ is surjective since given $(u, v)^T \in \mathbb{R}^2$ we have $f(u, -u, u + v)^T = (u, v)^T$, but $g$ is not surjective as for example $(1, -1)^T$ is not equal to any $(x, x)^T$. The kernel of $f$ is

\[ \ker f = \{(x, y, z)^T : 2x + y = y + z = 0\} \]

which as you can check is spanned by the vector $(1, -2, 2)^T$. The kernel of $g$ is spanned by $(0, 1, 0)^T$, $(0, 0, 1)^T$.

The following is a particularly useful criterion for testing if a linear transformation is injective.

Proposition 8.7 A linear transformation $f : V \to W$ is injective if and only if $\ker f$ is the zero subspace $\{0\}$ of $V$.

Proof If $v \neq 0$ is in $\ker f$ then $f(v) = f(0) = 0$ so $f$ is not injective. Conversely, if $f(v) = f(w)$ for some $v \neq w$, then $v - w \neq 0$ and $f(v - w) = f(v) - f(w) = 0$, so $v - w \in \ker f$ and hence $\ker f \neq \{0\}$. $\square$

Proposition 8.5 allows us to make the following definition.

Definition 8.8 The rank of $f$ is the dimension of $\operatorname{im}(f)$, and the nullity of $f$ is the dimension of $\ker(f)$. We write $r(f)$ for the rank of $f$, and $n(f)$ for its nullity.

Example 8.9 In Examples 8.3 and 8.6 we have $n(f) = 1$ and $n(g) = 2$. We can calculate the ranks of $f$ and $g$ as follows: firstly, $r(f) = 2$ as $f$ maps $\mathbb{R}^3$ to $\mathbb{R}^2$ surjectively; secondly, $r(g) = 1$ as $(1, 1)^T$ forms a one-element basis of $\operatorname{im} g$. In both cases we check that $r(f) + n(f) = 3 = \dim \mathbb{R}^3$ and $r(g) + n(g) = 3 = \dim \mathbb{R}^3$. This is no accident; in fact, these are just particular cases of the rank-nullity formula.

Theorem 8.10 (The rank-nullity formula) If $f : V \to W$ is a linear map, then

\[ r(f) + n(f) = \dim(V). \]

Proof Choose a basis $\{v_1, \dots, v_k\}$ for $\ker(f)$, and extend this basis to a basis $\{v_1, \dots, v_m\}$ of $V$, so that $k = n(f)$ and $m = \dim(V)$. Now any vector $v$ in $V$ is of the form $v = \sum_{i=1}^{m} \lambda_i v_i$, so

\[
\begin{aligned}
f(v) = f\Bigl(\sum_{i=1}^{m} \lambda_i v_i\Bigr) &= \sum_{i=1}^{m} \lambda_i f(v_i), && \text{since $f$ is linear,} \\
&= \sum_{i=k+1}^{m} \lambda_i f(v_i), && \text{since $f(v_1) = \cdots = f(v_k) = 0$.}
\end{aligned}
\]

Thus the vectors $f(v_{k+1}), \dots, f(v_m)$ span the image of $f$. On the other hand if $\sum_{i=k+1}^{m} \lambda_i f(v_i) = 0$ then

\[ f\Bigl(\sum_{i=k+1}^{m} \lambda_i v_i\Bigr) = 0, \]

so

\[ \sum_{i=k+1}^{m} \lambda_i v_i \in \ker(f), \]

and hence there are scalars $\mu_j$ such that

\[ \sum_{i=k+1}^{m} \lambda_i v_i = \sum_{j=1}^{k} \mu_j v_j. \]

But $\{v_1, \dots, v_m\}$ is a linearly independent set and

\[ \sum_{j=1}^{k} \mu_j v_j + \sum_{i=k+1}^{m} (-\lambda_i)v_i = 0 \]

so $\lambda_i = 0$ for all $i$. Therefore $\{f(v_{k+1}), \dots, f(v_m)\}$ is a linearly independent set, and hence a basis for $\operatorname{im}(f)$. So

\[ r(f) = m - k = \dim(V) - n(f) \]

as required. $\square$

Rank and nullity give useful ways of determining if a transformation is injective or surjective.

Proposition 8.11 If $f : V \to W$ is a linear transformation of finite dimensional vector spaces $V, W$ (over the same field $F$) then (a) $f$ is injective if and only if $n(f) = 0$, and (b) $f$ is surjective if and only if $r(f) = \dim W$.

Proof The map $f$ is injective if and only if $\ker f = \{0\}$, if and only if $n(f) = 0$ by Proposition 8.7, and $f$ is surjective if and only if $\dim(\operatorname{im} f) = \dim W$ by Corollary 2.28. $\square$

Corollary 8.12 If $f : V \to W$ is a linear transformation of finite dimensional vector spaces $V, W$, then $f$ is injective if and only if $r(f) = \dim V$, and $f$ is surjective if and only if $n(f) = \dim V - \dim W$.

Proof By the previous proposition and the rank-nullity formula. $\square$

Just as matrices provided a canonical family of examples of bilinear forms in the previous part of this book, they provide examples of linear transformations too.

Example 8.13 Let $V = \mathbb{R}^n$ and $W = \mathbb{R}^m$, and let $A$ be an $m \times n$ matrix with real entries. Then we may define the linear transformation $f_A : V \to W$ by

\[ f_A(v) = Av. \]

(Note that if $v$ is an $n \times 1$ matrix, $Av$ is an $m \times 1$ matrix, so the matrix multiplication in this definition makes sense.) The transformation $f_A$ is linear by distributivity of matrix multiplication: $A(\lambda u + \mu v) = \lambda Au + \mu Av$.

We can do the same over any field $F$. If $A$ is an $m \times n$ matrix with entries from $F$, we can define the linear transformation $f_A : F^n \to F^m$ by $f_A(v) = Av$. It is convenient to extend the terminology and notation for image, kernel, rank, and nullity to matrices in this way: if $A$ is an $m \times n$ matrix over the field $F$, then $\operatorname{im} A$ denotes the image of $f_A$,

\[ \operatorname{im} A = \{Ax : x \in F^n\}, \]

$\ker A$ denotes the kernel

\[ \ker A = \{x \in F^n : Ax = 0\}, \]

and $r(A)$, $n(A)$ denote the rank and nullity of $A$, i.e. the dimensions of $\operatorname{im} A$ and $\ker A$ respectively.

Exercise 8.1 Show that, for an $m \times n$ matrix $A$ over a field $F$, the subspace $\operatorname{im} A$ of $F^m$ is the subspace spanned by the columns of $A$.

It is interesting to note that the rank of an $m \times n$ matrix $A$ over $\mathbb{R}$, as defined using echelon form in Section 1.5, is the same as the rank of the linear transformation $f_A$ just defined. To see this, consider a sequence of row operations converting $A$ to echelon form $B$. Since each such row operation corresponds to an $m \times m$ matrix $R_i$ over $\mathbb{R}$, we have

\[ B = R_k \cdots R_2 R_1 A. \]

Now, each row operation matrix $R_i$ is invertible, so

\[ A = R_1^{-1} R_2^{-1} \cdots R_k^{-1} B. \]

This means that the linear transformation $f_A$ is equal to the composition of the transformations $f_{R_1^{-1}}, f_{R_2^{-1}}, \dots, f_{R_k^{-1}}, f_B$. You can verify that each $f_{R_i^{-1}}$ is actually a bijection $\mathbb{R}^m \to \mathbb{R}^m$. It follows that $\dim(\operatorname{im} f_A) = \dim(\operatorname{im} f_B)$. But since $B$ is in echelon form we can spot a basis of $\operatorname{im} f_B$ immediately. If the nonzero rows of $B$ have leading entries $b_{i n_i} \neq 0$, then the subspace $\operatorname{im} f_B$, which is spanned by the columns of $B$, has basis given by

\[ (1, 0, 0, \dots, 0)^T,\quad (0, 1, 0, \dots, 0)^T,\quad (0, 0, 1, \dots, 0)^T,\quad \dots \]

(There is one of these vectors for each nonzero row in $B$.) So $\operatorname{im} f_B$ has dimension equal to the number of nonzero rows in $B$, i.e. equal to $\operatorname{rk} A$ as defined in Section 1.5.

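Example 8.14 Let $A$ be the matrix

\[ A = \begin{pmatrix} 1 & 1 & 2 & 3 \\ 1 & 2 & -1 & 1 \\ 1 & 3 & -4 & -1 \end{pmatrix}. \]

This is a $3 \times 4$ matrix so represents a linear transformation $\mathbb{R}^4 \to \mathbb{R}^3$. We can perform row operations to get $A$ into echelon form as follows.

\[
A \xrightarrow{\rho_2 := \rho_2 - \rho_1}
\begin{pmatrix} 1 & 1 & 2 & 3 \\ 0 & 1 & -3 & -2 \\ 1 & 3 & -4 & -1 \end{pmatrix}
\xrightarrow{\rho_3 := \rho_3 - \rho_1}
\begin{pmatrix} 1 & 1 & 2 & 3 \\ 0 & 1 & -3 & -2 \\ 0 & 2 & -6 & -4 \end{pmatrix}
\xrightarrow{\rho_3 := \rho_3 - 2\rho_2}
\begin{pmatrix} 1 & 1 & 2 & 3 \\ 0 & 1 & -3 & -2 \\ 0 & 0 & 0 & 0 \end{pmatrix} = B.
\]

So $A$ has rank 2, and hence nullity $\dim(\mathbb{R}^4) - 2 = 2$. By multiplying the elementary row operation matrices together we find that $RA = B$, where $R$ is the matrix

\[ R = \begin{pmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 1 & -2 & 1 \end{pmatrix}. \]

Now, $\operatorname{im} B$ is spanned by the columns of $B$, so a basis of this space is $(1, 0, 0)^T$ and $(0, 1, 0)^T$. It follows that $\operatorname{im} A$ is spanned by $R^{-1}(1, 0, 0)^T$ and $R^{-1}(0, 1, 0)^T$, and on calculating $R^{-1}$ we find that

\[ R^{-1}(1, 0, 0)^T = (1, 1, 1)^T \quad\text{and}\quad R^{-1}(0, 1, 0)^T = (0, 1, 2)^T, \]

so a basis of $\operatorname{im} A$ is formed by $(1, 1, 1)^T$, $(0, 1, 2)^T$.

To work out the kernels, note that

\[
\begin{aligned}
\ker B &= \{(x, y, z, w)^T : x + y + 2z + 3w = 0,\ y - 3z - 2w = 0\} \\
&= \{(x, y, z, w)^T : x = -2z - 3w - y,\ y = 3z + 2w\} \\
&= \{(-5z - 5w,\ 3z + 2w,\ z,\ w)^T : z, w \in \mathbb{R}\},
\end{aligned}
\]

so is spanned by $(-5, 3, 1, 0)^T$ and $(-5, 2, 0, 1)^T$. Also, $\ker A = \ker B$ since for any vector $v$,

\[ Av = 0 \iff RAv = 0 \iff Bv = 0 \]

because $R$ is invertible.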
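The computation in Example 8.14 can be reproduced with exact rational arithmetic. The sketch below reduces $A$ all the way to reduced echelon form (one step further than the example does) and reads a kernel basis off the free columns; the function names `rref` and `kernel_basis` are assumptions made for illustration.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row echelon form, list of pivot columns), computed exactly."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        m[r] = [x / m[r][c] for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def kernel_basis(rows):
    """One kernel vector per free column, read off from the reduced echelon form."""
    m, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for row, p in zip(m, pivots):     # the first len(pivots) rows are the pivot rows
            v[p] = -row[free]
        basis.append(v)
    return basis

A = [[1, 1, 2, 3], [1, 2, -1, 1], [1, 3, -4, -1]]
_, pivots = rref(A)
rank = len(pivots)                # number of nonzero rows in echelon form
nullity = len(A[0]) - rank        # rank-nullity formula with dim V = 4
kernel = kernel_basis(A)
```

For this matrix the rank and nullity are both 2, and `kernel` consists of the vectors $(-5, 3, 1, 0)^T$ and $(-5, 2, 0, 1)^T$ found above.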

8.2 Arithmetic operations on linear transformations

You will be used to the idea of adding functions together, defining the sum of two functions $f + g$ by

\[ (f + g)(x) = f(x) + g(x), \]

and of scaling functions, defining the function $\lambda f$ by

\[ (\lambda f)(x) = \lambda(f(x)). \]

For this to make sense, we simply need the codomain of the functions to have an addition and a scalar multiplication defined on it, and this is true in the case of linear transformations since the codomain here is a vector space $W$. It should come as no surprise that we expect these operations on functions to satisfy the vector space axioms.

In fact, if we take a set $S$ of functions from any set $X$ to a vector space $W$, such that $S$ is closed under addition and scalar multiplication, then $S$ is automatically a vector space. We do not need any other conditions on the functions, or on the domain $X$ of these functions. To prove this, we simply have to check the vector space axioms: for example, for any $f, g \in S$ and any scalar $\lambda$ we have $\lambda(f + g) = (\lambda f) + (\lambda g)$, since

\[ (\lambda(f + g))(x) = \lambda((f + g)(x)) = \lambda(f(x) + g(x)) = \lambda(f(x)) + \lambda(g(x)) \]

since $W$ is a vector space, and so

\[ (\lambda(f + g))(x) = (\lambda f)(x) + (\lambda g)(x) = ((\lambda f) + (\lambda g))(x). \]

The other axioms are equally easy to check.

Example 8.15 Let $\mathcal{D}[0, 1]$ be the set of all differentiable functions from $[0, 1]$ to $\mathbb{R}$. The codomain of these functions is $\mathbb{R}$, which may be regarded as a one dimensional vector space over itself. Then $\mathcal{D}[0, 1]$ is a vector space with the operations defined above, since it is closed under addition and scalar multiplication.

Example 8.16 Let $\mathcal{L}(V, W)$ be the set of all linear transformations from a vector space $V$ to a vector space $W$ over the same field $F$. Then $\mathcal{L}(V, W)$ is itself a vector space, over the same field $F$. The zero element in $\mathcal{L}(V, W)$ is the map that takes every vector in $V$ to zero in $W$.

The special case $\mathcal{L}(V, V)$ will be especially important for the study of linear transformations and applications. This space may be given an additional operation, namely composition of functions, by setting $f \circ g$ to be the function defined by

\[ (f \circ g)(x) = f(g(x)). \]

Composition is a sort of 'multiplication' of linear transformations from $V$ to $V$, but you should be warned that not all the laws you might expect for multiplication hold in this case. For example, the commutativity law $f \circ g = g \circ f$ is false in general, although the associativity law $(f \circ g) \circ h = f \circ (g \circ h)$ is always true.

For the record, we list all the properties of the arithmetic operations on linear transformations defined that hold for all $f, g, h \in \mathcal{L}(V, V)$ and all scalars $\lambda, \mu$.

1. (Associativity.) $(f + g) + h = f + (g + h)$.
2. (Commutativity.) $f + g = g + f$.
3. (Zero.) $0 + f = f + 0 = f$, where $0$ is the zero map defined by $0(x) = 0$.
4. $\lambda(\mu f) = (\lambda\mu)f$.
5. $(\lambda + \mu)f = \lambda f + \mu f$.
6. $0f = 0$.
7. $f + (-1)f = 0$.
8. $(f \circ g) \circ h = f \circ (g \circ h)$.
9. $I \circ f = f \circ I = f$, where $I$ is the identity map defined by $I(x) = x$.
10. $f \circ (g + h) = (f \circ g) + (f \circ h)$.
11. $(g + h) \circ f = (g \circ f) + (h \circ f)$.
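The pointwise definitions of sum, scalar multiple, and composition translate directly into code. The sketch below uses three linear maps on $\mathbb{R}^2$ chosen purely for illustration (`shear`, `swap`, and `double` are hypothetical names, as are the helpers `add`, `scale`, and `compose`); it spot-checks some of the listed properties on a few sample vectors, which illustrates the laws but is not a proof of them.

```python
def add(f, g):
    """Pointwise sum: (f + g)(v) = f(v) + g(v)."""
    return lambda v: tuple(a + b for a, b in zip(f(v), g(v)))

def scale(lam, f):
    """Scalar multiple: (lam f)(v) = lam * f(v)."""
    return lambda v: tuple(lam * a for a in f(v))

def compose(f, g):
    """Composition: (f o g)(v) = f(g(v))."""
    return lambda v: f(g(v))

# Three illustrative linear maps from R^2 to R^2.
shear = lambda v: (v[0] + v[1], v[1])
swap = lambda v: (v[1], v[0])
double = lambda v: (2 * v[0], 2 * v[1])

samples = [(1, 0), (0, 1), (2, -3)]

# Commutativity of composition fails: the two orders disagree at (1, 0).
fg = compose(shear, swap)((1, 0))
gf = compose(swap, shear)((1, 0))

# Property 10 on the sample vectors: f o (g + h) = (f o g) + (f o h).
left = [compose(shear, add(swap, double))(v) for v in samples]
right = [add(compose(shear, swap), compose(shear, double))(v) for v in samples]
```

Here `fg` is `(1, 1)` while `gf` is `(0, 1)`, so these two maps do not commute, whereas `left` and `right` agree on every sample.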

A diligent reader will stop to check these properties at this point, but for others, the main idea of the next section provides an alternative proof.

8.3 Representation by matrices

If $f : V \to W$ is a linear map, $v_1, \dots, v_n$ is a basis for $V$, and $w_1, \dots, w_m$ is a basis for $W$, then each $f(v_j)$ is in $W$, so can be written as a linear combination of the basis vectors. Thus we have

\[ f(v_j) = \sum_{i=1}^{m} a_{ij} w_i \]

for some scalars $a_{ij}$. (These scalars are uniquely determined by Proposition 2.18.) The matrix $A = (a_{ij})$ is called the matrix of $f$ with respect to the ordered bases $v_1, \dots, v_n$ of $V$ and $w_1, \dots, w_m$ of $W$.

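Example 8.17 (a) The matrix of the zero map $0 : V \to W$ taking any $v \in V$ to $0$ is just the zero matrix $0$. To see this, note that

\[ \sum_{i=1}^{m} a_{ij} w_i = 0(v_j) = 0 = \sum_{i=1}^{m} 0\, w_i \]

so each $a_{ij} = 0$ since these coefficients are unique.

(b) The matrix of the identity map $I : V \to V$ with $I(v) = v$, with respect to the same basis $v_1, \dots, v_n$ in both domain and codomain, is the identity matrix $I$. Again, to see this we just need to note that

\[ I(v_j) = v_j = \sum_{i=1}^{n} \delta_{ij} v_i \]

where $\delta_{ij} = 0$ if $i \neq j$ and $\delta_{ij} = 1$ if $i = j$.

Given the matrix $A$ of a linear map $f$ with respect to bases $v_1, \dots, v_n$ of $V$ and $w_1, \dots, w_m$ of $W$, we can work out $f(v)$ for any vector $v \in V$, as follows. First express $v$ as a linear combination of the basis vectors of $V$, say $v = \sum_{j=1}^{n} \lambda_j v_j$, so that

\[ f(v) = \sum_{j=1}^{n} \lambda_j f(v_j) \]

since $f$ is linear, and therefore

\[ f(v) = \sum_{j=1}^{n} \lambda_j \sum_{i=1}^{m} a_{ij} w_i. \qquad (2) \]

We can view this matrix in terms of the coordinates $\lambda_1, \dots, \lambda_n$ of the vector $v$ with respect to the ordered basis $v_1, \dots, v_n$ of $V$. (See Section 2.5 for a discussion of coordinates.) Equation (2) shows that the $i$th coordinate $\mu_i$ of $f(v)$ with respect to the ordered basis $w_1, \dots, w_m$ of $W$ is

\[ \mu_i = \sum_{j=1}^{n} a_{ij}\lambda_j, \]

or, in other words, the column vector $\mu = (\mu_1, \dots, \mu_m)^T$ is related to the column vector $\lambda = (\lambda_1, \dots, \lambda_n)^T$ and the matrix $A = (a_{ij})$ by matrix multiplication,

\[ \mu = A\lambda. \]

Example 8.18 Consider the linear transformations $f, g : \mathbb{R}^3 \to \mathbb{R}^2$ given by

\[ f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 2x + y \\ y + z \end{pmatrix}, \qquad g\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x \\ x \end{pmatrix}. \]

The matrices representing these with respect to the usual bases of $\mathbb{R}^3$ and $\mathbb{R}^2$ are

\[ \begin{pmatrix} 2 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix} \]

respectively. To see this note that

\[ f\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 2 \\ 0 \end{pmatrix} = 2\begin{pmatrix} 1 \\ 0 \end{pmatrix} + 0\begin{pmatrix} 0 \\ 1 \end{pmatrix}, \]

giving the first column of the matrix for $f$, and so on. Alternatively, note that $(x, y, z)^T$ is the coordinate form for this vector with respect to the usual basis, and

\[ f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} \quad\text{and}\quad g\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}. \]

The next important question is: what happens to the matrix $A$ if we change either the basis of $V$ or the basis of $W$? First, let us replace the basis $v_1, \dots, v_n$ by $v_1', \dots, v_n'$, related by the base-change matrix $P = (p_{ij})$, so that

\[ v_j' = \sum_{i=1}^{n} p_{ij} v_i. \]

Then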

\[
\begin{aligned}
f(v_j') &= \sum_{i=1}^{n} p_{ij} f(v_i) \\
&= \sum_{i=1}^{n} p_{ij} \sum_{k=1}^{m} a_{ki} w_k \\
&= \sum_{k=1}^{m} \Bigl(\sum_{i=1}^{n} a_{ki} p_{ij}\Bigr) w_k.
\end{aligned}
\]

Now the sum in brackets in this last expression is simply the $(k, j)$th entry of the matrix $AP$, so the matrix of $f$ with respect to the pair of bases $v_1', \dots, v_n'$ and $w_1, \dots, w_m$ is $AP$.

Similarly, we can replace the basis $w_1, \dots, w_m$ by $w_1', \dots, w_m'$, a new basis related to the old one by the base-change matrix $Q = (q_{ij})$ so that

\[ w_j' = \sum_{i=1}^{m} q_{ij} w_i. \]

Let $Q^{-1} = (r_{ij})$, so that

\[ w_i = \sum_{j=1}^{m} r_{ji} w_j'. \]

Then

\[ f(v_j) = \sum_{i=1}^{m} a_{ij} w_i = \sum_{i=1}^{m} a_{ij} \sum_{k=1}^{m} r_{ki} w_k' = \sum_{k=1}^{m} \Bigl(\sum_{i=1}^{m} r_{ki} a_{ij}\Bigr) w_k' \]

so that the matrix of $f$ with respect to $v_1, \dots, v_n$ and $w_1', \dots, w_m'$ is $Q^{-1}A$. Putting the two together we obtain the general form $Q^{-1}AP$, where $P$ is the base-change matrix for $V$ and $Q$ is the base-change matrix for $W$.

We are most interested in the case when $V = W$, where it is reasonable to suppose that we will use the same basis for both the domain and codomain. Thus if $f : V \to V$ is a linear map, and $v_1, \dots, v_n$ is a basis for $V$, we can write

\[ f(v_j) = \sum_{i=1}^{n} a_{ij} v_i, \]

and say that $A = (a_{ij})$ is the matrix of $f$ with respect to the ordered basis $v_1, \dots, v_n$. In this case, if we change basis by $P = (p_{ij})$, so that

\[ v_j' = \sum_{i=1}^{n} p_{ij} v_i, \]

then we change both occurrences of the basis, so the matrix of $f$ with respect to $v'_1, \ldots, v'_n$ is $P^{-1}AP$. Let us state this formally for future reference.

Proposition 8.19 Let $V$ be a vector space with ordered bases $\mathcal{B}$ given by $v_1, \ldots, v_n$ and $\mathcal{B}'$ given by $v'_1, \ldots, v'_n$. Let $P$ be the base-change matrix from $\mathcal{B}$ to $\mathcal{B}'$, so

$$v'_j = \sum_{i=1}^{n} p_{ij} v_i.$$

Suppose that $f : V \to V$ is a linear transformation which has matrix $A$ with respect to the basis $\mathcal{B}$ and matrix $B$ with respect to the basis $\mathcal{B}'$. Then $B = P^{-1}AP$.
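As a quick numerical check of Proposition 8.19, here is a minimal Python sketch. It is not part of the original text: the matrices `A` and `P` and the helper names `matmul` and `inverse2` are chosen purely for illustration. It computes $P^{-1}AP$ for a $2 \times 2$ example and confirms that the $j$th column of the result holds the coordinates of $f(v'_j)$ in the new basis, which is the same as checking $AP = PB$.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse2(M):
    """Inverse of a 2x2 matrix with nonzero determinant (exact arithmetic)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

# Illustrative data (an assumption, not from the text):
# A is the matrix of f in the standard basis of R^2, and the
# columns of P are the new basis vectors v'_1 = (1, 0), v'_2 = (1, 1).
A = [[1, 2], [3, 4]]
P = [[1, 1], [0, 1]]

B = matmul(inverse2(P), matmul(A, P))   # matrix of f in the new basis

# f(v'_j) = sum_i B[i][j] v'_i for each j is the statement A P = P B.
assert matmul(A, P) == matmul(P, B)
print(B)
```

Exact fractions are used so that the equality test is not affected by rounding.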

Warning. A matrix in isolation can represent many different things. It does not make sense to talk about changing the basis unless you know what the matrix in question represents. Thus changing the basis for a quadratic form has the effect of changing the representing matrix from $A$ to $P^T AP$, whereas changing the basis for a linear transformation has the effect of changing $A$ to $P^{-1}AP$. These are only the same under the very special circumstances when $P^{-1} = P^T$, circumstances which are examined in more detail in Chapter 13.

Definition 8.20 If $B = P^{-1}AP$ for some invertible matrix $P$, then $A$ and $B$ are called similar matrices. If $B = P^T AP$ for some invertible matrix $P$, then $A$ and $B$ are called congruent.

The final thing that we need to consider with matrix representation of linear transformations is how the matrix representation of the product $\lambda f$ of a scalar $\lambda$ and a linear transformation $f$, or the representation of the sum $f + g$ or composition $f \circ g$ of two linear transformations, can be obtained from the matrix representations of $f, g$. For scalar multiplication and addition, this is quite straightforward.

Proposition 8.21 Let $f, g \in \mathcal{L}(V, W)$ where $V, W$ are finite dimensional vector spaces, and let $f, g$ have matrix representations $A, B$ respectively, with respect to some ordered bases $\mathcal{A}$ and $\mathcal{B}$ of $V, W$ respectively. Then

(a) The matrix representation of $\lambda f$ with respect to $\mathcal{A}, \mathcal{B}$ is the scalar product $\lambda A$ of the matrix $A$.

(b) The matrix representation of the sum $f + g$ with respect to $\mathcal{A}, \mathcal{B}$ is the matrix sum $A + B$.

Proof This is almost immediate from the definition. The matrix of $f$ is given by

$$f(v_j) = \sum_{i=1}^{m} a_{ij} w_i$$

where $\mathcal{A}$ is the ordered basis $v_1, \ldots, v_n$ and $\mathcal{B}$ is $w_1, \ldots, w_m$. Then

$$(\lambda f)(v_j) = \lambda \sum_{i=1}^{m} a_{ij} w_i = \sum_{i=1}^{m} (\lambda a_{ij}) w_i$$

so the matrix for $\lambda f$ is $\lambda A$. Similarly if

$$g(v_j) = \sum_{i=1}^{m} b_{ij} w_i$$

then

$$(f + g)(v_j) = \sum_{i=1}^{m} a_{ij} w_i + \sum_{i=1}^{m} b_{ij} w_i = \sum_{i=1}^{m} (a_{ij} + b_{ij}) w_i$$

and hence the matrix for $f + g$ is $A + B$. □

Note also that the matrix for the zero transformation $0 \in \mathcal{L}(V, W)$ given by $0(x) = 0$ is the zero matrix of the appropriate size. The other arithmetic operation on linear transformations is composition, and this corresponds to matrix multiplication. Indeed, this perhaps explains why matrix multiplication is defined in the way it is.

Proposition 8.22 (Composition of functions) Let $U, V, W$ be finite dimensional vector spaces, with ordered bases $\mathcal{A}, \mathcal{B}, \mathcal{C}$ respectively, and let $g : U \to V$ and $f : V \to W$ be linear maps. If $f$ and $g$ are represented by the matrices $A$ and $B$ with respect to $\mathcal{A}, \mathcal{B}, \mathcal{C}$, then $f \circ g$ is represented with respect to the same bases by the matrix product $AB$.

Proof If $\mathcal{A}$ is the ordered basis $u_1, \ldots, u_l$, $\mathcal{B}$ is $v_1, \ldots, v_m$, and $\mathcal{C}$ is $w_1, \ldots, w_n$, then

$$(f \circ g)(u_k) = f(g(u_k)) = f\Big(\sum_{j=1}^{m} b_{jk} v_j\Big) = \sum_{j=1}^{m} b_{jk} f(v_j) = \sum_{j=1}^{m} b_{jk} \sum_{i=1}^{n} a_{ij} w_i = \sum_{i=1}^{n} \Big(\sum_{j=1}^{m} a_{ij} b_{jk}\Big) w_i.$$

But $\sum_{j=1}^{m} a_{ij} b_{jk}$ is precisely the $(i, k)$th element of the matrix product $AB$, so the matrix representing $f \circ g$ is $AB$. □
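The correspondence between composition and matrix multiplication can be checked on a small example. The Python sketch below is an illustration only: the matrices `A` and `B`, the test vector `u`, and the helper names are assumptions, not taken from the text. It applies $g$ and then $f$ to a coordinate vector and compares the result with a single multiplication by $AB$.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def apply(M, v):
    """Apply the matrix M to the coordinate vector v."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

# Illustrative maps (hypothetical): g : R^2 -> R^3 with matrix B,
# and f : R^3 -> R^2 with matrix A, both in the standard bases.
A = [[1, 0, 2],
     [0, 1, -1]]
B = [[1, 2],
     [0, 1],
     [3, 0]]

AB = matmul(A, B)          # matrix representing f o g
u = [5, -7]

# Applying g and then f agrees with applying AB once.
assert apply(AB, u) == apply(A, apply(B, u))
print(AB, apply(AB, u))
```

Note the order: the matrix of $g$, which is applied first, appears on the right of the product.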

In the special case of $\mathcal{L}(V, V)$, where $V$ is an $n$-dimensional vector space over a field $F$, what we have proved is this. Given an ordered basis $\mathcal{B}$ of the vector space $V$, we have a map from $\mathcal{L}(V, V)$ to the set $M_{n,n}(F)$ of $n \times n$ matrices with entries from $F$, taking $f$ to the matrix representing $f$. Every linear transformation corresponds to some matrix (which is uniquely determined once we have fixed $\mathcal{B}$), and every matrix is the matrix of some transformation $f$. What's more, the zero transformation corresponds to the zero matrix, the identity transformation $I(x) = x$ corresponds to the identity matrix $I_n$, and the operations of scalar multiplication and addition of linear transformations correspond to scalar multiplication and addition of matrices. In other words, the vector spaces $\mathcal{L}(V, V)$ and $M_{n,n}(F)$ over $F$ are isomorphic. But we can say a little bit more: these vector spaces have 'multiplication' operations. For $M_{n,n}(F)$ this is just multiplication of matrices, and for $\mathcal{L}(V, V)$ it is composition $f \circ g$ of functions, and Proposition 8.22 shows that $M_{n,n}(F)$ and $\mathcal{L}(V, V)$ are isomorphic with this operation too.

Exercises

Exercise 8.2 For each of the following $n \times m$ matrices $A$, give bases in $\mathbb{R}^n$ and $\mathbb{R}^m$ for $\operatorname{im} f_A$ and $\ker f_A$, where $f_A : \mathbb{R}^m \to \mathbb{R}^n$ is left-multiplication by $A$.

(a)

G�

1 1

(b)

( ; �) -1 0 1 1

(c)

(-� i -i) . -1 4

1

Exercise 8.3 For each of the following sets $S$ of vectors in $\mathbb{R}^n$, find a linear map $\mathbb{R}^n \to \mathbb{R}^3$ whose kernel is spanned by $S$.

(a) $S = \{(1, 1, 0), (1, 0, -1)\}$ in $\mathbb{R}^3$.
(b) $S = \{(1, -2, 1, 0)\}$ in $\mathbb{R}^4$.
(c) $S = \{(-2, 1)\}$ in $\mathbb{R}^2$.
(d) $S = \{(-1, 1, 2, -1), (0, 1, 2, 3)\}$ in $\mathbb{R}^4$.

Exercise 8.4 Let $f : \mathbb{R}^3 \to \mathbb{R}^3$ be the linear transformation defined by

$$f((x, y, z)^T) = (-x + y - z,\; x + 2y,\; -y + 3z)^T.$$

Calculate the images of the vectors

$$v_1 = (0, 1, 1), \quad v_2 = (1, -1, 1), \quad v_3 = (2, 1, 0).$$

Verify that $f(v_1) = -v_1 - 2v_2 + v_3$, and derive similar expressions for $f(v_2)$ and $f(v_3)$. Hence write down the matrix of $f$ with respect to the basis $v_1, v_2, v_3$ of $\mathbb{R}^3$.

Exercise 8.5 Do the same for the map $g$ defined by

$$g((x, y, z)^T) = (y - 2z,\; -x + 2y - z,\; x + y + z)^T.$$

Exercise 8 . 6 Define f : .. , say, and then =

$$a(x) + b(x)(1/\lambda)(1 - a(x)) = a(x) + \lambda(1/\lambda)(1 - a(x)) = 1$$

as required. Otherwise $r(x)$ is nonzero and we can apply the inductive hypothesis. This gives polynomials $f(x)$ and $g(x)$ such that

$$r(x)f(x) + b(x)g(x) = 1.$$

Substituting back for $r(x)$ we obtain

$$(a(x) - b(x)q(x))f(x) + b(x)g(x) = 1$$

and hence

$$a(x)f(x) + b(x)(-q(x)f(x) + g(x)) = 1$$

as required. □
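The inductive argument above is in effect an algorithm, and it can be sketched in a few lines of Python. The code below is an illustration under stated assumptions rather than the book's own procedure: polynomials are taken over the rationals and stored as lists of coefficients with the constant term first, the function names (`trim`, `add`, `mul`, `divmod_poly`, `bezout`, `bezout_one`) are hypothetical, and the loop is the usual iterative form of the extended Euclidean algorithm rather than a literal transcription of the induction.

```python
from fractions import Fraction

def trim(p):
    """Drop zero coefficients of the highest-degree terms; [] is the zero polynomial."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return trim([a + b for a, b in zip(p, q)])

def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def divmod_poly(a, b):
    """Return (q, r) with a = b*q + r and deg r < deg b; b must be nonzero."""
    q, r = [], trim(a)
    while len(r) >= len(b):
        shift = len(r) - len(b)
        coeff = Fraction(r[-1]) / b[-1]
        term = [Fraction(0)] * shift + [coeff]      # coeff * x^shift
        q = add(q, term)
        r = add(r, mul([-c for c in term], b))      # cancels the leading term of r
    return q, r

def bezout(a, b):
    """Return (s, t, g) with a*s + b*t = g, where g is a gcd of a and b."""
    a, b = trim(a), trim(b)
    s0, t0, s1, t1 = [Fraction(1)], [], [], [Fraction(1)]
    while b:
        q, r = divmod_poly(a, b)
        a, b = b, r
        neg_q = [-c for c in q]
        s0, s1 = s1, add(s0, mul(neg_q, s1))
        t0, t1 = t1, add(t0, mul(neg_q, t1))
    return s0, t0, a

def bezout_one(a, b):
    """Return (s, t) with a*s + b*t = 1, assuming the gcd is a nonzero constant."""
    s, t, g = bezout(a, b)
    assert len(g) == 1, "a and b have a nonconstant common factor"
    return [c / g[0] for c in s], [c / g[0] for c in t]

# Example: a(x) = x^2 + 1 and b(x) = x - 1 have no common factor.
a, b = [1, 0, 1], [-1, 1]
s, t = bezout_one(a, b)
assert add(mul(a, s), mul(b, t)) == [1]
print(s, t)
```

Exact rational arithmetic is used so that the leading coefficient is cancelled exactly at each division step.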

9.4 Roots of polynomials over other fields

Most of the rest of the book can be applied to vector spaces over arbitrary fields. This (optional) section is provided for readers who would like to see how the theory of linear transformations as we set it out applies to vector spaces over fields $F$ other than $\mathbb{C}$ and $\mathbb{R}$. In the chapters that follow, there is essentially very little that needs to be changed when we work over an arbitrary field $F$, except that we often need our polynomials to have a root in the field, or to factorize into linear factors over that field. The simplest way to ensure this is to restrict attention to algebraically closed fields, which are defined to satisfy these two (equivalent) conditions.

Definition 9.10 A field $F$ is algebraically closed if every polynomial with coefficients from $F$ factorizes into linear factors which themselves have coefficients from $F$.

However, many of the results below are true even for fields which are not algebraically closed, although the proofs sometimes require us to work in a larger field than we started with (just as results about $\mathbb{R}$ often require us to work in $\mathbb{C}$). To do this in general, we need the existence of an 'algebraic closure' of our field $F$, which is a 'smallest' algebraically closed field containing $F$. This idea is given formally in the next definition, but a proof that such a field exists is beyond the scope of this book.

Definition 9.11 If $F$ is any field, then $\bar{F}$ is an algebraic closure of $F$ if

(a) every polynomial with coefficients in $F$ has all its roots in $\bar{F}$, and
(b) every element of $\bar{F}$ is a root of some polynomial with coefficients from $F$.

Theorem 9.12 If $F$ is any field, then $F$ has an algebraic closure $\bar{F}$. Moreover, $\bar{F}$ is unique (up to isomorphism).

In the case of the field of real numbers, $\mathbb{R}$, its algebraic closure is just the complex number field, $\mathbb{C}$, which is formed from $\mathbb{R}$ by 'adding a square root of $-1$' (normally called $i$), i.e. by adding a root of the polynomial equation $x^2 + 1 = 0$.

This may suggest a method for constructing the algebraic closure of an arbitrary field $F$: simply adjoin roots of polynomials one after another until we can go no further. In general there are infinitely many polynomials to consider, and so this is an infinite process. It is not obvious that it can be 'completed' in a sensible way, or that the result is uniquely determined by the original field $F$.

Example 9.13 As indicated in Example 2.41, the field of order 9 may be constructed by adjoining a root of $x^2 + 1$ to the field of order 3.

In general, we may as well assume that we adjoin a root of an irreducible polynomial $f(x)$ (i.e. one which cannot be factorized in any nontrivial way over the original field). It can then be shown that by taking all polynomials modulo $f(x)$, we obtain a field. If the original field had order $q$, and the degree of $f(x)$ is $n$, then the new field will have order $q^n$. (See Exercises 9.6 and 9.7.)

The existence part of Theorem 9.12 is proved by an infinite process as follows. Given a field $F' \supseteq F$, either $F'$ is the algebraic closure of $F$, so there is nothing else to do, or else there is a polynomial equation $f(x) = 0$ over $F'$ with not all its roots in $F'$. By factorizing $f(x)$ if possible, we may assume $f(x)$ is irreducible. Then we may add a root of $f(x)$ to $F'$ by the method of Exercise 9.6, obtaining a new field $F'' \supseteq F'$, and the process continues. It may be that this process finishes rather quickly, as would be the case in constructing the algebraic closure of $\mathbb{R}$, or it may take infinitely many stages. However, even if infinitely many stages are required, we may take the union of all fields constructed at some stage as our algebraic closure.

The results on the fields $\mathbb{R}$ and $\mathbb{C}$ and polynomials earlier in this chapter apply to other fields too. For instance, we can generalize Euclid's algorithm (Proposition 9.9) immediately to algebraically closed fields, but for general fields it needs to be stated slightly differently.

Proposition 9.14 If $a(x)$ and $b(x)$ are nonzero polynomials with coefficients from an arbitrary field $F$, and $a(x)$ and $b(x)$ have no common factors other than constants, then there are polynomials $s(x)$ and $t(x)$ such that

$$a(x)s(x) + b(x)t(x) = 1.$$

Proof As before, but replacing 'root' by 'nonconstant factor' everywhere in the previous proof. □

Exercises

Exercise 9.1 Evaluate each of the polynomials $p(x) = x^3 - 2x^2 + x + 1$, $q(x) = x^2 - 5x - 2$, and $r(x) = x^2 + 6x + 9$ at each of the matrices A = G �) and B = (-31 -30) .

Exercise 9.2 Evaluate the same polynomials as in the last exercise at the linear transformations $f((a, b)^T) = (2a - b, a + b)^T$ and $g((a, b, c)^T) = (a + b, b + c, c - a)^T$ from $\mathbb{R}^2 \to \mathbb{R}^2$ and $\mathbb{R}^3 \to \mathbb{R}^3$ respectively.

Exercise 9.3 Expand $(A - 3I)(B^2 + 4B + 2I)$ and $(A - 3B)(B^2 + 4B + 3A + 2I)$ where $A$ and $B$ are unknown $n \times n$ real matrices and $I$ is the $n \times n$ identity matrix.

Exercise 9.4 Show that every polynomial of odd degree with coefficients from $\mathbb{R}$ has at least one real root. [Hint: show that if $\zeta$ is a complex root of a polynomial $p(x)$ over $\mathbb{R}$ then $\bar{\zeta}$ is also a root, so nonreal roots occur in pairs.]

Exercise 9.5 Show that every polynomial over $\mathbb{R}$ can be factorized into linear and quadratic factors over $\mathbb{R}$, i.e. written as a product $q_1(x) q_2(x) \cdots q_r(x)$ of polynomials over $\mathbb{R}$, where each $q_i$ has degree at most 2.

Exercise 9.6 Let $F$ be a field, and let $f(x)$ be an irreducible polynomial of degree $k$ over $F$. Use Euclid's algorithm to show that any nonzero polynomial $g(x)$ of degree less than $k$ has a multiplicative inverse modulo $f(x)$; that is, there exists $h(x)$ such that $g(x)h(x) \equiv 1 \pmod{f(x)}$. Deduce that the set of polynomials modulo $f(x)$ is a field.

Exercise 9.7 Let $F$ be a field, $f(x)$ an irreducible polynomial of degree $k$ over $F$, and $G$ be the field defined by adjoining a root of $f(x)$ to $F$. Prove that $G$ is a vector space of dimension $k$ over $F$.

Exercise 9.8 Let $F'$ be a field and let $F$ be a subfield of $F'$ such that the dimension of $F'$ as a vector space over $F$ is finite. (Such field extensions $F' \supseteq F$ are often called finite extensions.) By considering $1, a, a^2, a^3, \ldots, a^n$, or otherwise, show that every $a \in F'$ satisfies a polynomial equation $p(x) = 0$ for some polynomial $p(x)$ with coefficients from $F$.

10 Eigenvalues and eigenvectors

10.1 An example

Rather than giving the formal definition of eigenvalues and eigenvectors (the subject of this chapter, indeed of the rest of the book) straight away, we shall give a hypothetical example of their use to motivate their study.

We imagine a team of biologists studying a single-celled organism which reproduces by cell division. They have identified two different types of the organism, X and Y, and have noticed that on cell division a type X sometimes mutates into type Y, and sometimes a type Y mutates into type X. Their measurements indicate that both types of the organism reproduce at the same speed, on average doubling in number in unit time. Moreover, starting from a population of 100 type X organisms, after unit time they observe a population of 180 type X and 20 type Y. Similarly, starting from 100 type Y organisms, after unit time 190 type Y and 10 type X are observed. They want to use this data to predict the development of a mixed population over several units of time. If $x_n$ represents the number of type X at time $n$ and $y_n$ represents the number of type Y at time $n$ they suggest these numbers should be related by

$$x_{n+1} = 1.8x_n + 0.2y_n$$
$$y_{n+1} = 0.1x_n + 1.9y_n.$$

Equations like these are called simultaneous difference equations. In matrix form, they are written as

$$\begin{pmatrix} x_{n+1} \\ y_{n+1} \end{pmatrix} = \begin{pmatrix} 1.8 & 0.2 \\ 0.1 & 1.9 \end{pmatrix} \begin{pmatrix} x_n \\ y_n \end{pmatrix}. \qquad (1)$$

We shall solve these equations by 'pulling a rabbit out of a hat', considering the vectors $(1, 1)^T$ and $(-2, 1)^T$. These vectors are chosen because they have the nice property that

$$\begin{pmatrix} 1.8 & 0.2 \\ 0.1 & 1.9 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = 2 \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \begin{pmatrix} 1.8 & 0.2 \\ 0.1 & 1.9 \end{pmatrix} \begin{pmatrix} -2 \\ 1 \end{pmatrix} = 1.7 \begin{pmatrix} -2 \\ 1 \end{pmatrix}. \qquad (2)$$

These two vectors are linearly independent so form a basis of $\mathbb{R}^2$. We shall also consider the base-change matrix formed from these two vectors,

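$$P = \begin{pmatrix} 1 & -2 \\ 1 & 1 \end{pmatrix}.$$

This matrix has inverse

$$P^{-1} = \frac{1}{3} \begin{pmatrix} 1 & 2 \\ -1 & 1 \end{pmatrix}$$

and we define numbers $u_n, v_n$ by

$$\begin{pmatrix} u_n \\ v_n \end{pmatrix} = P^{-1} \begin{pmatrix} x_n \\ y_n \end{pmatrix}.$$

Now, from (1) we have

$$\begin{pmatrix} u_{n+1} \\ v_{n+1} \end{pmatrix} = P^{-1} \begin{pmatrix} x_{n+1} \\ y_{n+1} \end{pmatrix} = P^{-1} \begin{pmatrix} 1.8 & 0.2 \\ 0.1 & 1.9 \end{pmatrix} P P^{-1} \begin{pmatrix} x_n \\ y_n \end{pmatrix}$$

or

$$\begin{pmatrix} u_{n+1} \\ v_{n+1} \end{pmatrix} = P^{-1} \begin{pmatrix} 1.8 & 0.2 \\ 0.1 & 1.9 \end{pmatrix} P \begin{pmatrix} u_n \\ v_n \end{pmatrix}.$$

But it follows from (2) that

$$P^{-1} \begin{pmatrix} 1.8 & 0.2 \\ 0.1 & 1.9 \end{pmatrix} P = \begin{pmatrix} 2 & 0 \\ 0 & 1.7 \end{pmatrix}.$$

(Alternatively, this matrix multiplication can be checked directly.) We deduce

$$\begin{pmatrix} u_{n+1} \\ v_{n+1} \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 1.7 \end{pmatrix} \begin{pmatrix} u_n \\ v_n \end{pmatrix}$$

and hence $u_n = 2^n u_0$, $v_n = (1.7)^n v_0$. Also, $(u_0, v_0)^T = P^{-1}(x_0, y_0)^T = \frac{1}{3}(x_0 + 2y_0,\; y_0 - x_0)^T$ and

$$\begin{pmatrix} x_n \\ y_n \end{pmatrix} = P \begin{pmatrix} u_n \\ v_n \end{pmatrix} = \begin{pmatrix} u_n - 2v_n \\ u_n + v_n \end{pmatrix},$$

which can be expanded to give

$$x_n = \tfrac{1}{3}\left( 2^n (x_0 + 2y_0) - 2(1.7)^n (y_0 - x_0) \right)$$

and

$$y_n = \tfrac{1}{3}\left( 2^n (x_0 + 2y_0) + (1.7)^n (y_0 - x_0) \right),$$

being formulae for the numbers $x_n, y_n$ of organisms of type X, Y in terms of the initial populations $x_0, y_0$ of types X and Y.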
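The solution can be checked by comparing direct iteration of the difference equations (1) with the closed form obtained from $u_n = 2^n u_0$, $v_n = (1.7)^n v_0$, $x_n = u_n - 2v_n$, $y_n = u_n + v_n$, where $u_0 = \frac{1}{3}(x_0 + 2y_0)$ and $v_0 = \frac{1}{3}(y_0 - x_0)$. The Python sketch below is illustrative only: the function names `step` and `closed_form` and the starting populations are assumptions, and exact fractions are used so that the two computations can be compared with `==`.

```python
from fractions import Fraction

def step(x, y):
    """Advance one unit of time using the difference equations (1)."""
    return (Fraction(18, 10) * x + Fraction(2, 10) * y,
            Fraction(1, 10) * x + Fraction(19, 10) * y)

def closed_form(x0, y0, n):
    """Populations at time n from the diagonalized system."""
    u0 = Fraction(x0 + 2 * y0, 3)
    v0 = Fraction(y0 - x0, 3)
    un = 2 ** n * u0                      # u_n = 2^n u_0
    vn = Fraction(17, 10) ** n * v0       # v_n = (1.7)^n v_0
    return (un - 2 * vn, un + vn)         # (x_n, y_n) = P (u_n, v_n)

# Hypothetical starting populations: 100 of type X and 50 of type Y.
x, y = Fraction(100), Fraction(50)
for n in range(6):
    assert (x, y) == closed_form(100, 50, n)
    x, y = step(x, y)
print(closed_form(100, 50, 5))
```

The loop iterates the recurrence five times and confirms at each stage that it agrees with the formula.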

10.2 Eigenvalues and eigenvectors

Examples like the one given in the previous section show the importance of vectors such as $(1, 1)^T$ and $(-2, 1)^T$ satisfying properties like those in (2) above. We start by making this into a formal definition.

Definition 10.1 Let $A$ be an $n \times n$ matrix over a field $F$. Then a column vector $x$ in $F^n$ is called an eigenvector of $A$, with eigenvalue $\lambda \in F$, if $x \neq 0$ and $Ax = \lambda x$.

Theorem 10.2 A scalar, $\lambda$, is an eigenvalue of an $n \times n$ matrix, $A$, if and only if the matrix $A - \lambda I$ has nullity $n(A - \lambda I) > 0$.

Proof If $\lambda$ is an eigenvalue of $A$ with eigenvector $x \neq 0$ then $(A - \lambda I)x = Ax - \lambda Ix = \lambda x - \lambda x = 0$. Since $x \neq 0$ and lies in the kernel of $A - \lambda I$, $n(A - \lambda I) > 0$. Conversely, if $n(A - \lambda I) > 0$ then there is a nonzero vector $x$ in the kernel of $A - \lambda I$, so $(A - \lambda I)x = 0$ or $Ax = \lambda x$. Hence $\lambda$ is an eigenvalue. □

The proof of the preceding theorem also shows the following.

Theorem 10.3 Suppose $\lambda$ is an eigenvalue of an $n \times n$ matrix $A$. Then the eigenvectors of $A$ having eigenvalue $\lambda$ are precisely the nonzero vectors in $\ker(A - \lambda I) = \{x : (A - \lambda I)x = 0\}$.

Theorem 10.4 Every $n \times n$ matrix $A$ over $F = \mathbb{R}$ or $\mathbb{C}$ has an eigenvalue $\lambda$ in $\mathbb{C}$ and an eigenvector $x$ in $\mathbb{C}^n$ with eigenvalue $\lambda$.

Proof Fix any nonzero vector $v \in \mathbb{C}^n$. The vectors $v, Av, \ldots, A^n v$ form a set of $n + 1$ vectors in an $n$-dimensional vector space $\mathbb{C}^n$, so are linearly dependent. In other words there are scalars $a_0, a_1, \ldots, a_n \in \mathbb{C}$, not all zero, with

$$a_n A^n v + \cdots + a_1 Av + a_0 v = 0.$$

Now consider the polynomial $a_n z^n + \cdots + a_1 z + a_0$. Since we are working over $\mathbb{C}$ and every polynomial over $\mathbb{C}$ has all its roots in $\mathbb{C}$, there are $r_1, \ldots, r_n \in \mathbb{C}$ and $c \in \mathbb{C}$ with

$$a_n z^n + \cdots + a_1 z + a_0 = c(z - r_1) \cdots (z - r_n).$$

Then

$$0 = (a_n A^n + \cdots + a_1 A + a_0 I)v = c(A - r_1 I) \cdots (A - r_n I)v.$$

Since $v \neq 0$, there is some $i$ such that $(A - r_i I) \cdots (A - r_n I)v = 0$ but $(A - r_{i+1} I) \cdots (A - r_n I)v \neq 0$. It follows from this that at least one of the matrices $(A - r_i I)$ has nonzero nullity, so at least one of the $r_i$ is an eigenvalue for $A$. □

Remark 10.5 In general, the same result holds for a matrix $A$ over an arbitrary field $F$ if we replace .. .

v

f ( v) .Xv

.X

v

Suppose that $A = (a_{ij})$ is the matrix of the linear map $f : V \to V$ with respect to a basis $v_1, \ldots, v_n$ of $V$. Suppose that $v$ is an eigenvector of $f$ with eigenvalue $\lambda$, and write $v$ in terms of the basis vectors as $v = \sum_{i=1}^{n} \mu_i v_i$. Then we have $f(v) = \lambda v$. Expressing both sides of this equation in terms of the basis vectors, we have

$$\lambda v = \lambda \sum_{j=1}^{n} \mu_j v_j = \sum_{j=1}^{n} (\lambda \mu_j) v_j.$$

On the other hand,

$$f(v) = f\Big(\sum_{i=1}^{n} \mu_i v_i\Big) = \sum_{i=1}^{n} \mu_i f(v_i) = \sum_{i=1}^{n} \sum_{j=1}^{n} \mu_i a_{ji} v_j = \sum_{j=1}^{n} \Big(\sum_{i=1}^{n} \mu_i a_{ji}\Big) v_j.$$

So, comparing coefficients of each basis vector, we have $\lambda \mu_j = \sum_{i=1}^{n} a_{ji} \mu_i$ for each $j$, and therefore

$$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix} = \lambda \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix},$$

which means that $(\mu_1, \ldots, \mu_n)^T$ is an eigenvector of $A$ with eigenvalue $\lambda$. Conversely, if $(\mu_1, \ldots, \mu_n)^T$ is an eigenvector of $A$ with eigenvalue $\lambda$, then the corresponding vector

$$v = \sum_{i=1}^{n} \mu_i v_i$$

in $V$ is an eigenvector of $f$ with eigenvalue $\lambda$.

In particular, this result implies that the eigenvalues of $f$ are the same as the eigenvalues of a matrix representing $f$ with respect to any basis. Therefore any two representing matrices have the same eigenvalues. Using Proposition 8.19 we can restate this as follows.

Proposition 10.8 If $A$, $B$, and $P$ are $n \times n$ matrices related by $B = P^{-1}AP$, then $B$ and $A$ have the same eigenvalues.

10.3 Upper triangular matrices

As an important application of eigenvalues and eigenvectors, we aim to show here that any square matrix over $\mathbb{R}$ or $\mathbb{C}$ is similar to an upper triangular matrix over $\mathbb{C}$. In other words, for any square matrix $A$ over $\mathbb{R}$ or $\mathbb{C}$ there is an invertible matrix $P$ over $\mathbb{C}$ with

$$P^{-1}AP = \begin{pmatrix} b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\ 0 & b_{22} & b_{23} & \cdots & b_{2n} \\ 0 & 0 & b_{33} & \cdots & b_{3n} \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & b_{nn} \end{pmatrix}.$$

This theorem isn't quite as simple as it may seem. Firstly, it will turn out that all the diagonal entries in the form above will necessarily be eigenvalues of $A$. Secondly, it isn't always possible to put a real matrix $A$ into a real upper triangular form; in other words, the diagonal entries may turn out to be complex.

The same result is true more generally for a matrix $A$ over a field $F$. It turns out that we can always find an upper triangular matrix $B$ which is similar to $A$, but once again the entries from $B$ may have to be from the algebraic closure $\bar{F}$ of $F$ rather than $F$ itself.

The first lemma we need is quite straightforward, and has nothing to do with eigenvectors.

Lemma 10.9 Suppose $f : V \to V$ is a linear transformation of an $n$-dimensional vector space $V$ over a field $F$. If $f$ has nullity at least 1 then there is a basis $v_1, v_2, \ldots, v_n$ such that

$$f(v_j) \in \operatorname{span}(v_1, \ldots, v_{n-1})$$

for all $j = 1, \ldots, n$. In other words, the matrix of $f$ with respect to $v_1, v_2, \ldots, v_n$ is

$$\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n-1,1} & \cdots & a_{n-1,n} \\ 0 & \cdots & 0 \end{pmatrix}.$$

Proof Let $r = r(f)$. By the rank-nullity formula, $r(f) < n$. So suppose $v_1, \ldots, v_r$ is a basis for $\operatorname{im} f$, and extend this to a basis $v_1, \ldots, v_n$ of the whole of $V$. Then $f(v_j) \in \operatorname{span}(v_1, \ldots, v_r) \subseteq \operatorname{span}(v_1, \ldots, v_{n-1})$, as required. □

Proposition 10.10 Let $V = \mathbb{C}^n$ be the $n$-dimensional vector space over $\mathbb{C}$, and suppose $f$ is a linear transformation from $V$ to $V$. Then there is a basis $v_1, \ldots, v_n$ of $V$ such that, with respect to this basis, the matrix of $f$ is upper triangular.

Proof We use induction on $n$; assume the result is true for all spaces $V$ of dimension $n - 1$ over $\mathbb{C}$. Given $V$ of dimension $n$ and $f : V \to V$, let $\lambda$ be an eigenvalue of $f$. Such $\lambda$ exists by Theorem 10.4. Note that $f - \lambda I$ has nullity at least 1, so by the previous lemma, there is a basis $u_1, u_2, \ldots, u_n$ such that $(f - \lambda I)(u_i) \in \operatorname{span}(u_1, \ldots, u_{n-1})$ for all $i$. Then the matrix of $f$ with respect to this basis is of the form

$$\begin{pmatrix} * & \cdots & * & * \\ \vdots & & \vdots & \vdots \\ * & \cdots & * & * \\ 0 & \cdots & 0 & \lambda \end{pmatrix}.$$

Let $W$ be the subspace spanned by $u_1, \ldots, u_{n-1}$ and note that $f(w) \in W$ for all $w \in W$, so that the restriction of $f$ to $W$ is a linear transformation of $W$. By the induction hypothesis there is a basis $v_1, v_2, \ldots, v_{n-1}$ of $W$ such that the matrix of the restriction of $f$ to $W$ with respect to this is upper triangular. Put $v_n = u_n$; then $\{v_1, v_2, \ldots, v_{n-1}, v_n\}$ is a basis of $V$, since $v_n \notin \operatorname{span}(v_1, \ldots, v_{n-1}) = W = \operatorname{span}(u_1, \ldots, u_{n-1})$ and hence $\{v_1, \ldots, v_n\}$ is a linearly independent set of size $n = \dim V$. Also, $f(v_n) = \lambda v_n + w$ for some $w \in W$; hence the matrix of $f$ with respect to the basis $v_1, v_2, \ldots, v_n$ is upper triangular too. □

Remark 10.11 Once again, the proposition remains valid for any algebraically closed field $F$ in place of $\mathbb{C}$, by the same proof.

Example 10.12 Consider the matrix

A = (-� -1-� -i) . 0

A A

-2

To put into upper triangular form we first need an eigenvector, and from the first column of it is obvious that 0, is an eigenvector with eigenvalue Then

(1, o)T

A+

A+l= (� -i-1 -1-i)

-1.

0

so the image of I (considered as a linear transformation on IR3 by left multiplication ) has basis which as it happens in this particular case is also an eigenvector of We extend the vector we have so far to a basis 0, (0, of IR3 . Then the base-change matrix is

(-1, 1, -l)r, A. ( -1, 1, -1)T, (1, o)r, 1, O)T : P = (-1- 001

This has inverse

p-1 =

D

H 1 -D ( = 1A p- P � -1 -1�) '

and

0 0

-1

0 0

which is in upper triangular form .

Example 1 0 . 1 3 B y good fortune, we found the new basis rather easily i n the last example. For a slightly more typical example, consider

A = (--1� � =;) · 0

1, O)T

5

Here (0, is obviously an eigenvector with eigenvalue 4. The subspace which is the image of the linear transformation

158 Eigenvalues and eigenvectors

- 4I ( = � � =�)1 -1 0 has basis formed from f1 ( -1, -1, - l)T, and f2 ( 1, -3, l)T. We extend this to a basis of the whole space by adjoining f3 ( 1, 0, O)T, and so we have base-change matrix (-1-1-1 -311 p A

=

=

=

=

=

On calculating, we find that

( -� !) which has eigenvalue 4 and eigenvector ( 1 , 1 ) r. Also, B -4I ( = � i ) so a basis for the image of this is ( 1, 1 ) T. We extend this to the basis ( 1, 1 ) T, ( 1, 0 ) T of IR2 . Going back to IR3 what we have done is to replace the basis f , f , f with f1 f , f1 , f , and the base-change matrix for

We now look at B

=

=

,

1 2 3

this operation is

+

2

On calculating, we have q- I p -I APQ

in upper triangular form.

=

,

3

(4� -�1 401)

Example 10.14 Proposition 10.10 may fail for vector spaces over other fields. For example, let $V = \mathbb{R}^2$ over $\mathbb{R}$ and $f$ be the linear transformation $f(x, y)^T = (y, -x)^T$. The matrix of $f$ with respect to the usual basis is

$$\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.$$

Suppose for the sake of obtaining a contradiction that $P = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is a real base-change matrix putting this into upper triangular form,

$$P^{-1} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} P = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} x & y \\ 0 & z \end{pmatrix}.$$

Multiplying out gives

$$\frac{1}{ad - bc} \begin{pmatrix} cd + ab & d^2 + b^2 \\ -c^2 - a^2 & -cd - ab \end{pmatrix} = \begin{pmatrix} x & y \\ 0 & z \end{pmatrix},$$

so $-c^2 - a^2 = 0$, giving $a = c = 0$ (since $a$ and $c$ were supposed to be real) and hence $P$ is singular. We conclude that the original matrix is not similar to any upper triangular matrix over $\mathbb{R}$. Note, however, that

$$\begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}^{-1} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix} = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix},$$

so the original matrix can be put in upper triangular form over $\mathbb{C}$.

The eigenvalues of this particular example are $i$ and $-i$, which are imaginary and not real. In general, the diagonal entries of any upper triangular form for a matrix $A$ are the eigenvalues of $A$, as shown by the next proposition.

Proposition 10.15 If $A$ is an $n \times n$ upper triangular matrix, then the diagonal entries in $A$ are precisely the eigenvalues of $A$.

Proof Suppose $A = (a_{ij})$ and $\lambda = a_{ii}$ is a diagonal entry of $A$. Then

$$Ae_j = a_{1j} e_1 + \cdots + a_{j-1\,j} e_{j-1} + a_{jj} e_j$$

since $A$ is upper triangular. Using $a_{ii} = \lambda$, we obtain

$$(A - \lambda I)e_i = a_{1i} e_1 + \cdots + a_{i-1\,i} e_{i-1} \in \operatorname{span}(e_1, \ldots, e_{i-1}).$$

Thus $A - \lambda I$ defines a linear transformation $T$ from an $i$-dimensional space, $\operatorname{span}(e_1, \ldots, e_i)$, into an $(i - 1)$-dimensional space, $\operatorname{span}(e_1, \ldots, e_{i-1})$. By the rank-nullity formula, $T$ has nullity at least 1, so there is a nonzero $v \in \operatorname{span}(e_1, \ldots, e_i)$ with $T(v) = (A - \lambda I)v = 0$, i.e. $v$ is an eigenvector of $A$ with eigenvalue $\lambda$.

To see that all the eigenvalues of $A$ appear along the diagonal of this representation, suppose $\lambda$ is not equal to any diagonal entry $a_{ii}$. Then $A - \lambda I$ is in upper triangular form and has each diagonal entry nonzero. In other words, $A - \lambda I$ is in echelon form and has rank $n$, and hence nullity 0. Therefore $\lambda$ is not an eigenvalue. □
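Both Proposition 10.15 and the complex triangularization in Example 10.14 can be checked numerically. The Python sketch below is an illustration, not part of the text: the matrix `T` and the helper names are assumptions, and it uses the determinant as a shortcut for testing whether $T - \lambda I$ has nonzero nullity (a square matrix has nullity greater than 0 exactly when its determinant is 0), which is a standard fact rather than the method of the proof above.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def shifted(M, lam):
    """Return M - lam * I."""
    return [[M[i][j] - (lam if i == j else 0) for j in range(len(M))]
            for i in range(len(M))]

# An illustrative upper triangular matrix with diagonal entries 2, 5, 2.
T = [[2, 1, 4],
     [0, 5, 3],
     [0, 0, 2]]

# Each diagonal entry is an eigenvalue: T - lam*I is singular.
for lam in (2, 5):
    assert det(shifted(T, lam)) == 0
# Other scalars give an invertible T - lam*I, so they are not eigenvalues.
for lam in (0, 1, 3, 7):
    assert det(shifted(T, lam)) != 0

# Example 10.14 over C: conjugating by P gives the diagonal matrix diag(i, -i).
M = [[0, 1], [-1, 0]]
P = [[1, 1], [1j, -1j]]
d = det(P)                                    # equals -2i
P_inv = [[P[1][1] / d, -P[0][1] / d],
         [-P[1][0] / d, P[0][0] / d]]
D = matmul(P_inv, matmul(M, P))
expected = [[1j, 0], [0, -1j]]
assert all(abs(D[i][j] - expected[i][j]) < 1e-12
           for i in range(2) for j in range(2))
```

The complex comparison uses a small tolerance because Python's `complex` type is floating point.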

It would be possible to define the determinant of a matrix $A$ to be the product of the diagonal entries in any upper triangular form $P^{-1}AP$ for $A$. The problems are (1) that it is not immediately obvious that this definition agrees with the usual definition, and (2) in showing that this definition does not depend on the choice of $P$ or the choice of the upper triangular form for $A$. In fact these problems can be got round, and the definition can be made sound, but doing so would take us too far off track.¹

We conclude with a particularly useful observation concerning upper triangular matrices.

¹ The interested reader can follow up the details in 'Down with determinants', by Sheldon Axler, American Math Monthly, February 1995, pp. 139-145.

Theorem 10.16 If $A$ is any upper triangular $n \times n$ matrix with entries from $\mathbb{R}$ or $\mathbb{C}$, and $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the diagonal entries of $A$ including repetitions, then the matrix

$$(A - \lambda_1 I)(A - \lambda_2 I) \cdots (A - \lambda_n I)$$

is the zero matrix.

Proof If $n = 1$, this is obvious. We prove the general statement using induction on $n$. Given an upper triangular $n \times n$ matrix, take the standard basis $e_1, \ldots, e_n$ of the underlying vector space $\mathbb{R}^n$ or $\mathbb{C}^n$. Observe first that

from Section 9.2 in the last chapter. Now

since (A - An l)en = 0 since A is upper triangular with last row equal to (0, 0 , . . . , 0 , An ) · Also, the linear transformation given by A on the subspace span (e 1 , . . . , e n - ! ) has upper triangular matrix with respect to this basis, with diagonal entries A1 , . . . , A n - ! , so by the induction hypothesis we have for all i < for i
.; , and the matrix of f is by definition

0

0

]J

Conversely, if the matrix of f with respect to { v 1 , . . . , vn } is as above, then D = A;v; , so the basis vectors v; are eigenvectors of f.

f(v;)

12.2 A criterion for diagonalizability

This section uses the ideas from the previous chapter to give a precise criterion for when a matrix or linear transformation is diagonalizable. Recall that in the last chapter we identified two polynomials whose roots are precisely the eigenvalues of a linear transformation $f : V \to V$, namely the minimum polynomial $m_f(x)$ and the characteristic polynomial $\chi_f(x)$. If $f$ is diagonalizable, then the minimum polynomial of $f$ takes a particularly simple form.

Theorem 12.4 Let $V$ be a vector space, and let $f : V \to V$ be a linear transformation. If $f$ is diagonalizable, then the minimum polynomial $m_f(x)$ of $f$ has distinct roots.

Proof Assume that $f$ is diagonalizable, so there is a basis of eigenvectors $\{v_1, \ldots, v_n\}$ of $V$. Let $\lambda_1, \ldots, \lambda_r$ be the distinct eigenvalues of $f$, and define the polynomial

$$p(x) = (x - \lambda_1)(x - \lambda_2) \cdots (x - \lambda_r).$$

Then $p(f) = (f - \lambda_1 I)(f - \lambda_2 I) \cdots (f - \lambda_r I)$, and the factors $(f - \lambda_i I)$ all commute with one another by Proposition 9.5. Now each basis vector $v_i$ is an eigenvector, with eigenvalue $\lambda_j$ for some $j$, and therefore we have $(f - \lambda_j I)(v_i) = 0$. It follows that

$$p(f)(v_i) = (f - \lambda_1 I)(f - \lambda_2 I) \cdots (f - \lambda_r I) v_i = (f - \lambda_1 I) \cdots (f - \lambda_{j-1} I)(f - \lambda_{j+1} I) \cdots (f - \lambda_r I)(f - \lambda_j I) v_i = 0$$

for each $i$, and so $p(f)$ is the zero transformation. Therefore $m_f(x)$ divides $p(x)$, by Proposition 11.3. But $p(x)$ has distinct roots, and therefore $m_f(x)$ has distinct roots. □

In fact, the converse of this result is also true, and we will prove it in due course (see Corollary 12.13). Notice that when we have proved this, we will have a criterion which we can use to determine if a particular linear transformation is diagonalizable. Specifically: $f$ is diagonalizable if and only if $m_f(x)$ has distinct roots.

Recall from Theorem 10.2 that if $f : V \to V$, $\lambda$ is a scalar, and $v \in V$, then $v \in \ker(f - \lambda I)$ if and only if $v$ is either zero or an eigenvector of $f$ with eigenvalue $\lambda$.

Definition 12.5 If $\lambda$ is an eigenvalue of a linear transformation $f : V \to V$, then the subspace $\ker(f - \lambda I)$ is called the $\lambda$-eigenspace of $f$. Its dimension is called the geometric multiplicity of the eigenvalue $\lambda$.

We will prove that if the minimum polynomial of $f$ has distinct roots, then $V$ is the 'direct sum' of these eigenspaces. In other words, if we take whatever basis $B_\lambda$ we like for the $\lambda$-eigenspace $\ker(f - \lambda I)$, and do this for all eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$, then

$$B_{\lambda_1} \cup B_{\lambda_2} \cup \cdots \cup B_{\lambda_k}$$

is automatically a basis of $V$, provided $m_f(x)$ has distinct roots.
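Theorem 12.4 can be illustrated on two small matrices. The Python sketch below is an illustration only: the matrices `A` and `J` and the helper names are assumptions, not taken from the text. The matrix `A` has eigenvalues 2, 2, 5 and a basis of eigenvectors, and the polynomial $p(x) = (x - 2)(x - 5)$ with distinct roots annihilates it, as in the proof. The matrix `J` has 2 as its only eigenvalue, yet $J - 2I \neq 0$ while $(J - 2I)^2 = 0$, so its minimum polynomial is $(x - 2)^2$, which has a repeated root; by Theorem 12.4, `J` is not diagonalizable.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def apply(M, v):
    """Apply the matrix M to the coordinate vector v."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def shifted(M, lam):
    """Return M - lam * I."""
    return [[M[i][j] - (lam if i == j else 0) for j in range(len(M))]
            for i in range(len(M))]

def is_zero(M):
    return all(entry == 0 for row in M for entry in row)

# Diagonalizable example: eigenvectors (1,0,-1), (0,1,0) for 2 and (0,0,1) for 5.
A = [[2, 0, 0],
     [0, 2, 0],
     [3, 0, 5]]
assert apply(A, [1, 0, -1]) == [2, 0, -2]
assert apply(A, [0, 1, 0]) == [0, 2, 0]
assert apply(A, [0, 0, 1]) == [0, 0, 5]

# p(A) = (A - 2I)(A - 5I) is the zero matrix, so m_A(x) divides (x - 2)(x - 5).
assert is_zero(matmul(shifted(A, 2), shifted(A, 5)))

# Non-diagonalizable example: (x - 2) does not annihilate J, but (x - 2)^2 does.
J = [[2, 1],
     [0, 2]]
N = shifted(J, 2)
assert not is_zero(N)
assert is_zero(matmul(N, N))
```

Integer arithmetic keeps every comparison exact.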

Definition 12.6 $V$ is the direct sum of subspaces $V_1, \ldots, V_r$, if every vector $v \in V$ can be written uniquely as a sum $v = v_1 + \cdots + v_r$, where $v_i \in V_i$. We write $V = V_1 \oplus \cdots \oplus V_r$ or $V = \bigoplus_{i=1}^{r} V_i$.

A useful way to think of direct sums is that $V$ is a direct sum $V = \bigoplus_{i=1}^{r} V_i$ if and only if whenever we have bases $B_i$ of $V_i$, then their union $\bigcup_i B_i$ is a basis of $V$, as the following proposition shows.

Proposition 12.7 If $V_i$ is a subspace of a vector space $V$, where $V = \bigoplus_{i=1}^{r} V_i$, and if $B_i$ is a basis for $V_i$ for each $i$, then $\bigcup_{i=1}^{r} B_i$ is a basis for $V$.

Proof First, every vector $v \in V$ can be written as $v = v_1 + \cdots + v_r$, where $v_i \in V_i$. Therefore for each $i$, $v_i$ is a linear combination of the vectors in $B_i$, and so $v$ is a linear combination of the vectors in $\bigcup_{i=1}^{r} B_i$. Thus $\bigcup_{i=1}^{r} B_i$ spans the space $V$.

Now suppose that there is a linear dependence among the vectors of $\bigcup_{i=1}^{r} B_i$. Write $B_1 = \{a_1, \ldots, a_k\}$, $B_2 = \{b_1, \ldots, b_l\}$, ..., $B_r = \{z_1, \ldots, z_t\}$, and suppose the linear dependence is

$$\lambda_1 a_1 + \cdots + \lambda_k a_k + \mu_1 b_1 + \cdots + \mu_l b_l + \cdots + \omega_1 z_1 + \cdots + \omega_t z_t = 0.$$

Then $v_1 = \lambda_1 a_1 + \cdots + \lambda_k a_k \in V_1$, $v_2 = \mu_1 b_1 + \cdots + \mu_l b_l \in V_2$, and so on. Thus we have written the zero vector as

$$0 = v_1 + v_2 + \cdots + v_r,$$

where $v_i \in V_i$. But since $0 = 0 + \cdots + 0$ is the unique way of writing 0 as a sum of vectors in the $V_i$, we must have $v_i = 0$ for all $i$. This implies that $\lambda_1 a_1 + \cdots + \lambda_k a_k = 0$, so all the $\lambda_i = 0$, since $\{a_1, \ldots, a_k\}$ is a basis for $V_1$. Similarly, $\mu_i = 0$, ..., $\omega_i = 0$. In other words $\bigcup_{i=1}^{r} B_i$ is a linearly independent subset. □

Corollary 12.8 If $V = \bigoplus_{i=1}^{r} V_i$, then $\dim(V) = \sum_{i=1}^{r} \dim(V_i)$.

Proof Immediate from Proposition 12.7. □

The main theorem of this section is the following, from which we can easily deduce the converse to Theorem 12.4.

Theorem 12.9 Let $V$ be a complex vector space, and let $f : V \to V$ be a linear transformation. Suppose that

$$m_f(x) = (x - \lambda_1)(x - \lambda_2) \cdots (x - \lambda_r)$$

with $\lambda_1, \lambda_2, \ldots, \lambda_r$ distinct, and let $V_i$ be the $\lambda_i$-eigenspace of $f$. Then

$$V = V_1 \oplus V_2 \oplus \cdots \oplus V_r.$$

According to Definition 12.6, there are two parts to proving $V$ is a direct sum of subspaces: uniqueness (proved below in Corollary 12.11) and existence (Proposition 12.12). The first part uses the fact that eigenvectors with distinct eigenvalues are linearly independent. More formally,

Proposition 12.10 Let $f : V \to V$ be a linear transformation, and suppose that $v_1, \ldots, v_r$ are eigenvectors of $f$ with distinct eigenvalues $\lambda_1, \ldots, \lambda_r$ respectively. Then $\{v_1, \ldots, v_r\}$ is a linearly independent set.

Proof Suppose not, and let $k$ be the smallest integer such that $\{v_1, \ldots, v_k\}$ is linearly dependent. In particular, $\{v_1, \ldots, v_{k-1}\}$ is linearly independent, and there exists a linear dependence

$$\alpha_1 v_1 + \cdots + \alpha_k v_k = 0$$

with $\alpha_k \neq 0$. Moreover, $v_k \neq 0$ so at least one other $\alpha_i$ is nonzero ($i < k$). Applying $f$ to both sides of this equation we obtain

$$0 = f(0) = f(\alpha_1 v_1 + \cdots + \alpha_k v_k) = \alpha_1 f(v_1) + \cdots + \alpha_k f(v_k) = \alpha_1 \lambda_1 v_1 + \cdots + \alpha_k \lambda_k v_k.$$

Now subtract $\lambda_k$ times the first equation from the second, to obtain

$$\alpha_1 (\lambda_1 - \lambda_k) v_1 + \alpha_2 (\lambda_2 - \lambda_k) v_2 + \cdots + \alpha_{k-1} (\lambda_{k-1} - \lambda_k) v_{k-1} = 0.$$

But $\lambda_i - \lambda_k \neq 0$ since $\lambda_1, \ldots, \lambda_k$ are distinct, so this is a nontrivial linear dependence among $\{v_1, \ldots, v_{k-1}\}$, contradicting the fact that these vectors are linearly independent. □

Corollary 12.11 Let $f : V \to V$ be a linear transformation, and suppose that $v \in V$ can be written as $v = v_1 + \cdots + v_r$ where $v_i$ is an eigenvector of $f$ with eigenvalue $\lambda_i$, and $\lambda_1, \ldots, \lambda_r$ are distinct. If also $v = w_1 + \cdots + w_r$ with each $w_i$ an eigenvector of $f$ with eigenvalue $\lambda_i$, then $v_i = w_i$ for each $i$.

Proof Otherwise

$$(v_1 - w_1) + \cdots + (v_r - w_r) = 0$$

is a nontrivial linear dependence of eigenvectors with distinct eigenvalues (each nonzero difference $v_i - w_i$ being an eigenvector with eigenvalue $\lambda_i$), contradicting Proposition 12.10. □

This proves the uniqueness part of Theorem 12.9. The existence part is a little harder.

Proposition 12.12 Suppose that $f : V \to V$ is a linear transformation with minimum polynomial

$$m_f(x) = (x - \lambda_1) \cdots (x - \lambda_r)$$

where $\lambda_1, \ldots, \lambda_r$ are distinct, and suppose that $v \in V$. Then there exist eigenvectors $v_1, \ldots, v_r$ of $f$ such that $v = v_1 + \cdots + v_r$.

Proof For each $j$ in turn consider the polynomial $p_j(x)$ defined by

\[ p_j(x) = \frac{(x - \lambda_1) \cdots (x - \lambda_{j-1})(x - \lambda_{j+1}) \cdots (x - \lambda_r)}{(\lambda_j - \lambda_1) \cdots (\lambda_j - \lambda_{j-1})(\lambda_j - \lambda_{j+1}) \cdots (\lambda_j - \lambda_r)}. \]

Note that $p_j(x)$ is well-defined (since all the $\lambda_i$ are distinct) and that $p_j(\lambda_j) = 1$, while $p_j(\lambda_k) = 0$ if $k \neq j$. Now consider the polynomial $p(x)$ defined by

\[ p(x) = \sum_{j=1}^r p_j(x). \]

This has the property that $p(\lambda_i) = 1$ for each $i$, since $p_j(\lambda_i) = 0$ if $i \neq j$ and $p_i(\lambda_i) = 1$. Thus the polynomial $p(x) - 1$ has roots $\lambda_1, \dots, \lambda_r$. But $p(x) - 1$ has degree at most $r - 1$, since each $p_j(x)$ has degree $r - 1$, so if it were not the zero polynomial it would have at most $r - 1$ roots. Since all the $\lambda_i$ are distinct, the only way this can happen is if $p(x) - 1$ is identically $0$. Thus $p(x)$ is identically $1$. Hence $p(f)$ is the identity linear transformation, and so $p(f)(v) = v$. But

\[ p(f) = \sum_{j=1}^r p_j(f) \]

so

\[ v = \sum_{j=1}^r p_j(f)(v) = \sum_{j=1}^r v_j \]

where

\[ v_j = p_j(f)(v) = \frac{(f - \lambda_1 I) \cdots (f - \lambda_{j-1} I)(f - \lambda_{j+1} I) \cdots (f - \lambda_r I)}{(\lambda_j - \lambda_1) \cdots (\lambda_j - \lambda_{j-1})(\lambda_j - \lambda_{j+1}) \cdots (\lambda_j - \lambda_r)} (v). \]

Applying $f - \lambda_j I$ to both sides of this we conclude that $(f - \lambda_j I)(v_j)$ is a scalar multiple of $m_f(f)(v)$, which is $0$. Therefore $f(v_j) = \lambda_j v_j$, so $v_j$ is an eigenvector of $f$ with eigenvalue $\lambda_j$, as required. □

An alternative proof of this proposition can be given by Proposition 9.14 and induction. See also Proposition 12.17 below.

Theorem 12.9 now follows immediately from Corollary 12.11 and Proposition 12.12.
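The construction in the proof of Proposition 12.12 can be tried out numerically. The following is a minimal sketch, not part of the original text: it assumes the 3 x 3 matrix of the difference equations in Example 12.18 (whose minimum polynomial is stated there to be $(x - 1)(x - 2)$), and the helper names `mat_mul`, `mat_vec` and `projection` are hypothetical. It forms $p_j(A)$, splits a vector $v$ into the pieces $v_j = p_j(A)v$, and checks that each piece is fixed up to the factor $\lambda_j$ and that the pieces sum to $v$.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Multiply two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_vec(X, v):
    """Apply the matrix X to the column vector v."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in X]

def projection(A, eigenvalues, j):
    """Return p_j(A), where p_j(x) is the product over i != j of
    (x - lam_i) / (lam_j - lam_i), as in the proof of Proposition 12.12."""
    n = len(A)
    result = [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]
    for i, lam in enumerate(eigenvalues):
        if i == j:
            continue
        scale = Fraction(1) / (eigenvalues[j] - lam)
        factor = [[(A[r][c] - (lam if r == c else 0)) * scale for c in range(n)]
                  for r in range(n)]
        result = mat_mul(result, factor)
    return result

# Assumed data: the coefficient matrix of Example 12.18 and its eigenvalues.
A = [[Fraction(a) for a in row] for row in [[3, -4, 2], [1, -1, 1], [1, -2, 2]]]
eigenvalues = [Fraction(1), Fraction(2)]
v = [Fraction(1), Fraction(2), Fraction(-1)]

components = [mat_vec(projection(A, eigenvalues, j), v) for j in range(2)]
for lam, v_j in zip(eigenvalues, components):
    # Each v_j satisfies A v_j = lam_j v_j.
    assert mat_vec(A, v_j) == [lam * c for c in v_j]
# The pieces add up to the original vector.
assert [a + b for a, b in zip(*components)] == v
print([[int(c) for c in v_j] for v_j in components])
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the equalities can be tested without rounding tolerances.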

Corollary 12.13 Suppose that $f : V \to V$ is a linear transformation with minimum polynomial

\[ m_f(x) = (x - \lambda_1) \cdots (x - \lambda_r) \]

where $\lambda_1, \dots, \lambda_r$ are distinct. Then $f$ is diagonalizable.

Proof Choose a basis $B_i$ for each eigenspace $V_i$. Then by Proposition 12.7, $\bigcup_{i=1}^r B_i$ is a basis for $V$, and every element of this basis is by definition an eigenvector of $f$. The result follows from Proposition 12.3. □

12.3 Examples

We shall start by discussing some matrices whose minimum polynomials were calculated in the previous chapter.

Example 12.14 In Example 11.17 we showed that the matrix

has minimum polynomial $m_A(x) = x^2 - 8x + 16$. Since this polynomial factorizes completely into linear factors over $\mathbb{R}$, $A$ is similar to an upper triangular matrix, but $m_A(x) = (x - 4)^2$ has $4$ as a repeated root, so Theorem 12.4 says that $A$ cannot be diagonalized.

Example 12.15 The matrix

over $\mathbb{R}$ has minimum polynomial $x^2 - 2x + 2$. To see this, it suffices to check that $A^2 - 2A + 2I = 0$, and that $x^2 - 2x + 2$ cannot be factorized over $\mathbb{R}$. However, $m_A(x)$ has no real roots, so $A$ is not similar to any upper triangular matrix, let alone a diagonal one. Over $\mathbb{C}$, the situation is different as $x^2 - 2x + 2 = (x - (1 + i))(x - (1 - i))$, which has two distinct roots in $\mathbb{C}$, so $A$ is diagonalizable over $\mathbb{C}$. In fact, we find that

as you may check.

Example 12.16 Consider the matrix

\[ A = \begin{pmatrix} 3 & 0 & 2 \\ -4 & 2 & -5 \\ -4 & 0 & -3 \end{pmatrix} \]

of Example 11.18. This has $m_A(x) = (x - 2)(x - 1)(x + 1)$, so the eigenvalues of $A$ are $2, 1, -1$. The minimum polynomial $m_A(x)$ has its maximum number of roots (three) in $\mathbb{R}$ and all these roots are distinct, so $A$ is diagonalizable. In other words, there is an invertible $3 \times 3$ matrix $P$ with real entries so that

\[ P^{-1} A P = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}. \]

To find such a matrix $P$, it suffices to find a basis of eigenvectors of $A$, since $P$ is just the base-change matrix from the usual basis to a basis of eigenvectors. These eigenvectors can be found as usual by solving simultaneous equations. For eigenvalue $2$, we need to solve

\[ (A - 2I) \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 & 0 & 2 \\ -4 & 0 & -5 \\ -4 & 0 & -5 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}. \]

The full solution is that $(x, y, z)^T$ is any scalar multiple of $(0, 1, 0)^T$, so we may take $(0, 1, 0)^T$ as our first basis vector. Similarly, $(1, -1, -1)^T$ and $(1, -2, -2)^T$ are eigenvectors with eigenvalues $1, -1$ respectively. Proposition 12.10 says that these three vectors form a basis, and so (taking them in the same order as we took the eigenvalues $2, 1, -1$) we see we may take

\[ P = \begin{pmatrix} 0 & 1 & 1 \\ 1 & -1 & -2 \\ 0 & -1 & -2 \end{pmatrix}. \]

Note, however, that any three eigenvectors for the three eigenvalues form a basis with respect to which the matrix of the transformation $x \mapsto Ax$ is diagonal, so the choice of $P$ above is by no means unique.

Looking ahead to Chapter 14 and the primary decomposition theorem (Theorem 14.3), we can point out an alternative method for finding eigenvectors other than solving the obvious simultaneous equations. With $A$ as in the previous example, we already identified $m_A(x) = (x - 2)(x - 1)(x + 1)$, so it is clear that $\operatorname{im} B \subseteq \ker C$ where $B = (A - I)(A + I)$ and $C = (A - 2I)$, since the product $CB$ of these two matrices is zero. It turns out in fact that these subspaces are actually equal, $\operatorname{im} B = \ker C$, so to find the eigenvectors with eigenvalue $2$ it suffices to compute $B$ and find a basis of its image:

\[ B = (A - I)(A + I) = \begin{pmatrix} 2 & 0 & 2 \\ -4 & 1 & -5 \\ -4 & 0 & -4 \end{pmatrix} \begin{pmatrix} 4 & 0 & 2 \\ -4 & 3 & -5 \\ -4 & 0 & -2 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 3 & -3 \\ 0 & 0 & 0 \end{pmatrix}, \]

so the image is spanned by $(0, 1, 0)^T$. The other two eigenspaces can be computed in a similar way.
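The image computation just described is easy to check mechanically. The sketch below is illustrative only: it assumes that the matrix of Example 12.16 is $A$ with rows $(3, 0, 2)$, $(-4, 2, -5)$, $(-4, 0, -3)$ (the matrix determined by the eigenvalues and eigenvectors quoted in that example), and the helper names `mat_mul` and `shifted` are hypothetical. It forms $B = (A - I)(A + I)$ and $C = A - 2I$, confirms $CB = 0$, and lists the columns of $B$, which span its image.

```python
def mat_mul(X, Y):
    """Multiply two square integer matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def shifted(A, c):
    """Return the matrix A - c*I."""
    n = len(A)
    return [[A[i][j] - (c if i == j else 0) for j in range(n)] for i in range(n)]

# Assumed matrix of Example 12.16 (see the note above).
A = [[3, 0, 2], [-4, 2, -5], [-4, 0, -3]]

B = mat_mul(shifted(A, 1), shifted(A, -1))   # (A - I)(A + I)
C = shifted(A, 2)                            # A - 2I
CB = mat_mul(C, B)

# The columns of B span im B; every one of them is a multiple of (0, 1, 0).
columns_of_B = [[B[i][j] for i in range(3)] for j in range(3)]
print(B)
print(columns_of_B)
```

Since every column of $B$ has zero first and third entries, the image is the line spanned by $(0, 1, 0)^T$, in agreement with the eigenvector found by solving equations.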

In practice, this method seems useful for simpler matrices, especially when the work in identifying the minimum polynomial has already been done. But, in general, calculating bases of the image of a matrix still involves computing echelon forms, so there may not be any real saving in effort for more complicated examples. For the interested reader, we state the result that shows this method works.

Proposition 12.17 For any $n \times n$ matrix $A$ over a field $F$, if $A$ has minimum polynomial $m_A(x) = p(x) q(x)$, where $p(x)$ and $q(x)$ do not have any nonconstant factor in common, then $\operatorname{im}(p(A)) = \ker(q(A))$.

The proof uses the version of Euclid's algorithm given as Proposition 9.14, and is left as an exercise for the reader.

Diagonalization is frequently applied in the solution of simultaneous linear difference equations and simultaneous linear differential equations. For example, if sequences $x_n$, $y_n$ are defined by

\[ x_0 = r, \quad y_0 = s, \quad x_{n+1} = a_{11} x_n + a_{12} y_n, \quad y_{n+1} = a_{21} x_n + a_{22} y_n, \]

we would like to find formulae for $x_n$ and $y_n$ in terms of the known quantities $a_{11}, a_{12}, a_{21}, a_{22}, r, s$. Of course, we may write

\[ \begin{pmatrix} x_n \\ y_n \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}^n \begin{pmatrix} r \\ s \end{pmatrix} \]

but this just begs the question of determining a formula for the $n$th power of a square matrix. Diagonalization helps here, since if

\[ A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \quad \text{and} \quad P^{-1} A P = \begin{pmatrix} \lambda & 0 \\ 0 & \mu \end{pmatrix} \]

then

\[ (P^{-1} A P)^n = P^{-1} A P \, P^{-1} A P \cdots P^{-1} A P = P^{-1} A^n P \quad \text{and} \quad (P^{-1} A P)^n = \begin{pmatrix} \lambda^n & 0 \\ 0 & \mu^n \end{pmatrix}. \]

So

\[ A^n = P \begin{pmatrix} \lambda^n & 0 \\ 0 & \mu^n \end{pmatrix} P^{-1}, \]

which gives $A^n$ in terms of the eigenvalues $\lambda, \mu$ of $A$ and a basis of eigenvectors given by $P$. This is what is going on in the example in Section 10.1, and of course there is nothing special about $2 \times 2$ matrices here.
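As a small illustration of the identity $A^n = P D^n P^{-1}$, here is a sketch that compares it with repeated multiplication. The matrix $A$ with rows $(2, 1)$, $(1, 2)$, its eigenvalues $3$ and $1$, and the eigenvector matrix $P$ are hypothetical choices made for this sketch, not taken from the text, and the function names are likewise illustrative.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Multiply matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def power_by_diagonalization(P, P_inv, eigenvalues, n):
    """Compute A^n as P * diag(lam_1^n, ..., lam_k^n) * P^{-1}."""
    size = len(eigenvalues)
    D_n = [[eigenvalues[i] ** n if i == j else 0 for j in range(size)]
           for i in range(size)]
    return mat_mul(mat_mul(P, D_n), P_inv)

def power_by_repeated_multiplication(A, n):
    """Compute A^n directly, for comparison."""
    size = len(A)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, A)
    return result

# Hypothetical example: A has eigenvectors (1, 1) and (1, -1)
# with eigenvalues 3 and 1 respectively.
A = [[2, 1], [1, 2]]
P = [[1, 1], [1, -1]]
half = Fraction(1, 2)
P_inv = [[half, half], [half, -half]]
eigenvalues = [3, 1]

via_diagonal = power_by_diagonalization(P, P_inv, eigenvalues, 5)
direct = power_by_repeated_multiplication(A, 5)
assert via_diagonal == direct
```

The diagonal route needs only scalar powers $\lambda^n$, which is what turns the matrix power into an explicit formula in $n$.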

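Example 12.18 We solve the system of difference equations

\[ x_{n+1} = 3x_n - 4y_n + 2z_n, \quad y_{n+1} = x_n - y_n + z_n, \quad z_{n+1} = x_n - 2y_n + 2z_n, \]
\[ x_0 = 1, \quad y_0 = 2, \quad z_0 = -1. \]

The solution is

\[ \begin{pmatrix} x_n \\ y_n \\ z_n \end{pmatrix} = A^n \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} \quad \text{where} \quad A = \begin{pmatrix} 3 & -4 & 2 \\ 1 & -1 & 1 \\ 1 & -2 & 2 \end{pmatrix}. \]

Now, the minimum polynomial of $A$ was computed in Example 11.19 and found to be $(x - 1)(x - 2)$, which is a product of distinct linear factors, so $A$ is diagonalizable. To find a basis of $\mathbb{R}^3$ of eigenvectors we must solve the simultaneous equations $(A - I)(x, y, z)^T = 0$ and $(A - 2I)(x, y, z)^T = 0$. The first of these is

\[ \begin{pmatrix} 2 & -4 & 2 \\ 1 & -2 & 1 \\ 1 & -2 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \]

which has solution space

\[ \ker(A - I) = \operatorname{span}\big( (1, 1, 1)^T, (2, 1, 0)^T \big) \]

as you may check. The second eigenspace is the set of solutions of

\[ \begin{pmatrix} 1 & -4 & 2 \\ 1 & -3 & 1 \\ 1 & -2 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \]

which is

\[ \ker(A - 2I) = \operatorname{span}\big( (2, 1, 1)^T \big). \]

Therefore,

\[ P^{-1} A P = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix} \quad \text{where} \quad P = \begin{pmatrix} 1 & 2 & 2 \\ 1 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}. \]

(Of course, there are many other bases for the eigenspaces, and so many other suitable base-change matrices one might take.) This gives

\[ A^n = P \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2^n \end{pmatrix} P^{-1} = \begin{pmatrix} 1 & 2 & 2 \\ 1 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2^n \end{pmatrix} \begin{pmatrix} -1 & 2 & 0 \\ 0 & 1 & -1 \\ 1 & -2 & 1 \end{pmatrix} = \begin{pmatrix} -1 + 2^{n+1} & 4 - 2^{n+2} & -2 + 2^{n+1} \\ -1 + 2^n & 3 - 2^{n+1} & -1 + 2^n \\ -1 + 2^n & 2 - 2^{n+1} & 2^n \end{pmatrix} \]

so

\[ \begin{pmatrix} x_n \\ y_n \\ z_n \end{pmatrix} = A^n \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} = \begin{pmatrix} 9 - 2^{n+3} \\ 6 - 2^{n+2} \\ 3 - 2^{n+2} \end{pmatrix}. \]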
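A closed form of this kind is easy to sanity-check by iterating the recurrence. In the sketch below, the recurrence and initial values are those stated in Example 12.18; the closed form $x_n = 9 - 2^{n+3}$, $y_n = 6 - 2^{n+2}$, $z_n = 3 - 2^{n+2}$ is the formula that the diagonalization yields for these initial values and should be read as the claim under test. The function names are hypothetical.

```python
def step(state):
    """One step of the recurrence of Example 12.18."""
    x, y, z = state
    return (3 * x - 4 * y + 2 * z, x - y + z, x - 2 * y + 2 * z)

def iterate(n, start=(1, 2, -1)):
    """Apply the recurrence n times to the initial values (x0, y0, z0)."""
    state = start
    for _ in range(n):
        state = step(state)
    return state

def closed_form(n):
    """Claimed formula for (x_n, y_n, z_n) obtained from A^n = P D^n P^{-1}."""
    return (9 - 2 ** (n + 3), 6 - 2 ** (n + 2), 3 - 2 ** (n + 2))

# Compare the two for the first few values of n.
agreement = all(iterate(n) == closed_form(n) for n in range(12))
print(agreement)
```

A check like this does not replace the derivation, but it catches arithmetic slips in $P$, $P^{-1}$ or the final product quickly.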

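Similar methods can be used to solve simultaneous differential equations.

Example 12.19 Suppose variables $u, v$ depend on time, $t$, according to the equations

\[ \frac{du}{dt} = u + 3v, \quad \frac{dv}{dt} = u - v, \]

which can be written as

\[ \frac{d}{dt} \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 1 & 3 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix}. \]

It turns out that this $2 \times 2$ matrix has eigenvalues $2$ and $-2$, so can be diagonalized. In fact

\[ P^{-1} \begin{pmatrix} 1 & 3 \\ 1 & -1 \end{pmatrix} P = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix} \quad \text{where} \quad P = \begin{pmatrix} 3 & -1 \\ 1 & 1 \end{pmatrix}. \]

This suggests introducing variables $x, y$ with $(x, y)^T = P^{-1} (u, v)^T$, or

\[ x = (u + v)/4, \quad y = (-u + 3v)/4, \]

for then

\[ \frac{d}{dt} \begin{pmatrix} u \\ v \end{pmatrix} = \frac{d}{dt} P \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 & 3 \\ 1 & -1 \end{pmatrix} P \begin{pmatrix} x \\ y \end{pmatrix}, \]

or

\[ \frac{d}{dt} \begin{pmatrix} x \\ y \end{pmatrix} = P^{-1} \begin{pmatrix} 1 & 3 \\ 1 & -1 \end{pmatrix} P \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}. \]

This gives $dx/dt = 2x$, $dy/dt = -2y$, so $x = A e^{2t}$, $y = B e^{-2t}$ for some constants $A, B$, so the solution is $(u, v)^T = P (x, y)^T$, or

\[ u = 3A e^{2t} - B e^{-2t}, \quad v = A e^{2t} + B e^{-2t}. \]

The constants $A, B$ can be found as usual from boundary conditions. For example, if we are given that $u = u_0$ and $v = v_0$ at time $t = 0$, then $A = (u_0 + v_0)/4$ and $B = (3v_0 - u_0)/4$.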
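The solution of Example 12.19 can be checked numerically. The sketch below assumes the closed form $u = 3Ae^{2t} - Be^{-2t}$, $v = Ae^{2t} + Be^{-2t}$ with $A = (u_0 + v_0)/4$ and $B = (3v_0 - u_0)/4$, which is what $(u, v)^T = P(x, y)^T$ gives for $P$ with columns $(3, 1)^T$ and $(-1, 1)^T$. It estimates the derivatives by central differences and measures how far the equations $du/dt = u + 3v$, $dv/dt = u - v$ are from holding. The function names and the step size `h` are illustrative choices.

```python
import math

def solution(t, u0, v0):
    """Closed-form (u(t), v(t)) with u(0) = u0 and v(0) = v0."""
    A = (u0 + v0) / 4
    B = (3 * v0 - u0) / 4
    u = 3 * A * math.exp(2 * t) - B * math.exp(-2 * t)
    v = A * math.exp(2 * t) + B * math.exp(-2 * t)
    return u, v

def residual(t, u0, v0, h=1e-6):
    """Largest violation of du/dt = u + 3v, dv/dt = u - v at time t,
    with derivatives approximated by central differences of width 2h."""
    u_plus, v_plus = solution(t + h, u0, v0)
    u_minus, v_minus = solution(t - h, u0, v0)
    u, v = solution(t, u0, v0)
    du = (u_plus - u_minus) / (2 * h)
    dv = (v_plus - v_minus) / (2 * h)
    return max(abs(du - (u + 3 * v)), abs(dv - (u - v)))

start = solution(0.0, 1.0, 2.0)       # should reproduce the initial values
worst = max(residual(t, 1.0, 2.0) for t in (0.0, 0.25, 0.5, 1.0))
print(start, worst)
```

The residual is not exactly zero because of the finite-difference approximation, so it is compared against a small tolerance rather than against zero.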

The exercises following provide more examples, and one or two hints on some useful tricks that can be applied in similar cases. Not all matrices can be diagonalized, though, and when you meet such an example the methods of this chapter cannot be used. Instead, it may be necessary to put the matrix in Jordan normal form, and use the ideas from Chapter 14 below.

Exercises

Exercise 12.1 Calculate the characteristic polynomials and minimum polynomials of the following matrices.

( a) ( c)

G D 2 1 0

0 I 0 0

( b)

2 0 2 0

(� �)

( d)

Which of these ( if any ) is diagonalizable? Exercise 1 2 . 2 Show that, regarded as 2

(cos B sin B

- sin B cos B

)

x

(i �) 0 1 0

(! i) 2 0 2

0 1 0 0

3

2 matrices over C,

and

are similar. Exercise 1 2 . 3 Which of the following are diagonalizable? Explain your answers,

but try to do as little work as possible, using results from this and previous chapters where applicable.

( a) ( d)

-

-

G H) G D G D ( -� �i ) (-! ;) (b)

2 -4'

over �

Exercise 1 2 . 4 ( a) Find two 2

0 I

0

(e)

4 -I -2

( c)

-1 1 -2

-4

over C .

$\times$ 2 matrices over $\mathbb{R}$ which have the same characteristic polynomial but which are not similar.
(b) Find two $3 \times 3$ matrices over $\mathbb{R}$ which have the same minimum polynomial but which are not similar.
(c) Find two $4 \times 4$ matrices over $\mathbb{R}$ which have the same minimum polynomial and the same characteristic polynomial, but which are not similar.

Exercise 12.5 The Fibonacci numbers $x_n$ are defined by $x_{n+2} = x_{n+1} + x_n$, $x_0 = x_1 = 1$. Let $u_n = x_{2n}$, $v_n = x_{2n+1}$ and find a matrix $A$ so that

\[ \begin{pmatrix} u_{n+1} \\ v_{n+1} \end{pmatrix} = A \begin{pmatrix} u_n \\ v_n \end{pmatrix}. \]

Diagonalize $A$ and hence find a formula for $x_n$ in terms of $n$.

Exercise 12.6 Solve $x_{n+1} = -y_n + z_n$; $y_{n+1} = -y_n$; $z_{n+1} = 2x_n - 2y_n + z_n$; with initial values $x_0 = y_0 = 1$, $z_0 = 2$.

Exercise 12.7 Solve
(a) $x_{n+1} = x_n + 2y_n$; $y_{n+1} = 2x_n + y_n + 1$; where $x_0 = y_0 = 1$. [Hint: introduce $z_n$ with $z_{n+1} = z_n$ and $z_0 = 1$.]
(b) $x_{n+1} = 2x_n + 3y_n$; $y_{n+1} = 3x_n + 2y_n + 2^n$; where $x_0 = 1$, $y_0 = 2$. [Hint: introduce some suitable $z_n$.]

Exercise 12.8 Solve
(a) $x_{n+1} = x_n + 4x_n + 1$; $y_{n+1} = x_n + y_n$; $x_0 = y_0 = 1$.
(b) $x_{n+1} = 2x_n + y_n + 1$; $y_{n+1} = x_n + 2y_n$; $x_0 = y_0 = 1$.
[Hint: in each case, introduce $u_n = x_n + an + b$, $v_n = y_n + cn + d$ for certain constants $a, b, c, d$.]

Exercise 12.9 Solve the following systems of differential equations for functions $x(t)$, $y(t)$, and $z(t)$, where a dot denotes differentiation with respect to $t$.
(a) $\dot{x} = -y + z$; $\dot{y} = -y$; $\dot{z} = 2x - 2y + z$; with boundary conditions $x(0) = y(0) = 1$, $z(0) = 2$.
(b) $\dot{x} = x + 2y$; $\dot{y} = 2x + y + 1$; where $x(0) = y(0) = 1$.
(c) $\dot{x} = 2x + 3y$; $\dot{y} = 3x + 2y + e^{2t}$; where $x(0) = 1$, $y(0) = 2$.
(d) $\dot{x} = x + 4x + 1$; $\dot{y} = x + y$; $x(0) = y(0) = 1$.
(e) $\dot{x} = 2x + y + 1$; $\dot{y} = x + 2y$; $x(0) = y(0) = 1$.

Exercise 12.10 Show that $V$ is the direct sum of subspaces $U, W$ if and only if
(a) every $v \in V$ is equal to $u + w$ for some $u \in U$ and $w \in W$, and
(b) $U \cap W = 0$.
That is, show that Definition 5.28 and Definition 12.6 agree.

Exercise 12.11 Prove Proposition 12.17. Hence, using induction on dimension, give an alternative proof of Proposition 12.12.

13 Self-adjoint transformations

This chapter combines material concerning quadratic forms with material from the previous chapter on diagonalization. Throughout, $V$ is a finite dimensional vector space over $\mathbb{R}$ or $\mathbb{C}$ with an inner product $( \ | \ )$. The main goal in this chapter is to understand the nature of quadratic forms, symmetric bilinear forms and conjugate-symmetric sesquilinear forms on a finite dimensional inner product space $V$, in particular how the form relates to the inner product on $V$. It turns out that the key to describing such a form is an associated linear transformation on $V$. The linear transformations here are of interest in their own right, and have the property of being self-adjoint (as defined below). They can be diagonalized using methods in the last chapter, and this diagonalization provides a complete description of the bilinear or sesquilinear form we are interested in.

13.1 Orthogonal and unitary transformations

In earlier chapters we studied the behaviour of quadratic forms (or equivalently, symmetric bilinear forms) on arbitrary vector spaces over $\mathbb{R}$. The goal was to find a change of basis that diagonalizes the form, or at least makes it look as 'nice' as possible. In many ways, particularly so for applications to geometry, quantum mechanics, etc., it is much more interesting to study forms on inner product spaces. What this means is that a base-change transformation $f$ must preserve the inner product, i.e. must send a vector $v$ to another vector $f(v)$ of the same length as $v$, and send an orthogonal pair of vectors $v, w$ to another orthogonal pair $f(v), f(w)$. An equivalent view is that our base-change transformations should send orthonormal bases to orthonormal bases, so instead of allowing ourselves to use arbitrary bases, we only allow orthonormal bases.

Example 13.1 Let us suppose that we are working in $\mathbb{R}^2$ with the standard inner product, and we are considering the quadratic form $Q(x, y) = (x/a)^2 + (y/b)^2$. You should recognise the equation $Q(x, y) = 1$ as the equation of an ellipse. By scaling the coordinates, changing the basis to $(a, 0)^T$, $(0, b)^T$ (which is an orthogonal basis, but not orthonormal), we can write $Q$ as $Q(u, v) = u^2 + v^2$, and the ellipse turns into a unit circle. If on the other hand we only allowed ourselves to use orthonormal bases, then our ellipse would keep its shape, but it might be rotated. For example, changing basis to the orthonormal basis

\[ (1/\sqrt{2}, 1/\sqrt{2})^T, \quad (-1/\sqrt{2}, 1/\sqrt{2})^T \]

represents a rotation by $\pi/4$ about the origin, and hence preserves orthogonality and length.

From now to the end of this chapter we will deal with the real case and the complex case at the same time by writing complex-conjugate signs where they are required in the complex case. (In the real case, these complex-conjugate signs can always be ignored since the number in question is real.)

Definition 13.2 Let $V$ be a finite dimensional inner product space, and suppose $f$ is a linear transformation $V \to V$. We say $f$ is orthogonal (when $V$ is a vector space over $\mathbb{R}$) or unitary (when $V$ is a vector space over $\mathbb{C}$) ...

... $(\lambda - \bar{\lambda})(v|v) = 0$. But $(v|v) \neq 0$ since $v \neq 0$ and hence $\lambda = \bar{\lambda}$. □

Corollary 13.12 For any symmetric real $n \times n$ matrix $A$ or any conjugate-symmetric complex $n \times n$ matrix $A$, the characteristic polynomial $\chi_A(x)$ has $n$ real roots (counting multiplicities).

Proof $\chi_A(x)$ has $n$ complex roots, but each of these roots is real by the preceding theorem. □

Theorem 13.13 The minimum polynomial $m_f(x)$ of a self-adjoint linear transformation $f : V \to V$ of a finite dimensional inner product space $V$ has no repeated roots.

Proof If not, suppose $m_f(x) = (x - \lambda)^2 p(x)$ for some polynomial $p(x)$. Then $(f - \lambda) p(f) \neq 0$, so there is $v \in V$ with $(f - \lambda) p(f)(v) \neq 0$ and $(f - \lambda)^2 p(f)(v) = 0$. But then

\[ 0 \neq \big( (f - \lambda) p(f)(v) \,\big|\, (f - \lambda) p(f)(v) \big) = \big( p(f)(v) \,\big|\, (f - \lambda)(f - \lambda) p(f)(v) \big) = 0 \]

since $(f - \lambda)$ is self-adjoint. This is a contradiction. □

Corollary 13.14 Any self-adjoint $f : V \to V$ of a finite dimensional inner product space $V$ is diagonalizable.

Proof By the previous theorem and Corollary 12.13. □

In Proposition 12.10 we proved that if $v_1, v_2, \dots, v_k$ are eigenvectors of a linear transformation $f$, where $f(v_i) = \lambda_i v_i$ and the $\lambda_i$ are all distinct, then $\{v_1, v_2, \dots, v_k\}$ is linearly independent. For self-adjoint $f$ we can make the stronger statement that the $v_i$ are orthogonal.

Theorem 13.15 Let $f$ be a self-adjoint linear transformation $f : V \to V$, and suppose $v_1, v_2$ are eigenvectors of $f$ with corresponding eigenvalues $\lambda_1, \lambda_2$. If $\lambda_1 \neq \lambda_2$ then $v_1$ and $v_2$ are orthogonal.

Proof We have

\[ \lambda_1 (v_1|v_2) = \bar{\lambda}_1 (v_1|v_2) = (\lambda_1 v_1|v_2) = (f(v_1)|v_2) = (v_1|f(v_2)) = (v_1|\lambda_2 v_2) = \lambda_2 (v_1|v_2) \]

since $\lambda_1$ is real, by Theorem 13.11. But then $(\lambda_1 - \lambda_2)(v_1|v_2) = 0$, and hence $(v_1|v_2) = 0$, as $\lambda_1 \neq \lambda_2$. □

What this means is that, for a self-adjoint linear transformation $f$, we can always find an orthogonal basis of eigenvectors. In fact, we can do even better: by normalizing in the usual way there is an orthonormal basis of eigenvectors. To see this, we first find any basis of eigenvectors. Then for each eigenvalue $\lambda$, we take the set of basis vectors which have that eigenvalue, and apply the Gram-Schmidt algorithm to it. The result will be an orthonormal basis for the eigenspace, since any nonzero linear combination of eigenvectors with eigenvalue $\lambda$ is itself an eigenvector with eigenvalue $\lambda$.

The base-change matrix $P$ from the usual orthonormal basis to this orthonormal basis of eigenvectors will be unitary, i.e. $\bar{P}^T = P^{-1}$, since the new basis is orthonormal. It follows that, given a real symmetric matrix $A$, or a complex conjugate-symmetric matrix $A$, we can find an orthonormal basis of $\mathbb{R}^n$ ($\mathbb{C}^n$ respectively) for which both the linear transformation

\[ f(v) = Av \]

and the (symmetric bilinear, or sesquilinear) form

\[ F(v, w) = \bar{v}^T A w \]

are represented by the same diagonal matrix.

The proof just given of Corollary 13.14 and Theorem 13.15 is somewhat indirect. Using the notion of orthogonal complement from the 'optional' Section 5.4, it is possible to give a direct proof of these results. We do this now for the benefit of readers who have read the material on orthogonal complements.

Theorem 13.16 Suppose $f : V \to V$ is a self-adjoint linear transformation of a finite dimensional inner product space $V$ over $\mathbb{R}$ or $\mathbb{C}$. Then there is an orthonormal basis $\{v_1, v_2, \dots, v_n\}$ of $V$ such that each $v_i$ is an eigenvector of $f$.

Proof We use induction on the dimension $n$ of $V$. If $n = 0$ there is nothing to prove as the empty set $\emptyset$ is a suitable basis of $V$. Otherwise, since $f$ has a real eigenvalue $\lambda_1$, there is a nonzero $v_1 \in V$ with $f(v_1) = \lambda_1 v_1$ and $\|v_1\| = 1$. Let $U = \operatorname{span}(v_1)$ and $W = U^\perp = \{w \in V : (u|w) = 0 \text{ for all } u \in U\}$. Then for $w \in W$ and $u \in U$ we have $(f(w)|u) = (w|f(u)) = (w|\lambda_1 u) = \lambda_1 (w|u) = 0$, so $f(w) \in W$; thus we may regard $f$ as a self-adjoint linear transformation of $W$. Also, $U \oplus W = V$ and $U$ has dimension $1$, so $W$ has dimension $n - 1$. By our induction hypothesis, there is an orthonormal basis $v_2, \dots, v_n$ of $W$ consisting of eigenvectors of $f$, and clearly $v_1, v_2, \dots, v_n$ is the required basis of $V$. □

13.4 Applications

We shall indicate some of the applications of the results in the previous section here by way of some examples, all of which concern real vector spaces.

One obvious place in which real symmetric matrices arise is as the representing matrix of the symmetric bilinear form corresponding to a quadratic form $Q$ on $\mathbb{R}^n$. For example, given a quadratic form $Q(x, y, z)$ on $\mathbb{R}^3$, the equation $Q(x, y, z) = a$ represents a surface which we might want to describe. Simply completing the square as we did in the last part to find the rank and signature of $Q$ gives some information, but we lose the additional structure given by the usual inner product on $\mathbb{R}^3$ in the process. Somehow, we need to diagonalize the form $Q$ and the usual inner product simultaneously to get a full picture. This is illustrated by the following example.

Example 13.17 Consider the surface in $\mathbb{R}^3$ defined by the equation

\[ 5x^2 + 5y^2 + 5z^2 - 2xy - 2yz - 2zx = 3. \]

This is $Q(x, y, z) = 3$ where $Q$ is the quadratic form

\[ Q(x, y, z) = (x, y, z) \begin{pmatrix} 5 & -1 & -1 \\ -1 & 5 & -1 \\ -1 & -1 & 5 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}. \]

We now diagonalize the matrix

\[ A = \begin{pmatrix} 5 & -1 & -1 \\ -1 & 5 & -1 \\ -1 & -1 & 5 \end{pmatrix} \]

as if it represented a linear transformation, not a bilinear form. This is legitimate, provided we keep to orthonormal bases, since we know that $Q$ corresponds to a symmetric bilinear form, which in turn corresponds to a self-adjoint linear transformation. The matrix $A$ is symmetric, and hence the corresponding linear transformation $f_A$ is self-adjoint, and is diagonalizable. In fact, it turns out that this matrix has a basis of eigenvectors $(1, 1, 1)^T$, $(1, -1, 0)^T$, $(1, 0, -1)^T$ with corresponding eigenvalues $3, 6, 6$, as you can check. However, in this example we want an orthonormal basis of eigenvectors. Using the Gram-Schmidt process to orthogonalize $u_1 = (1, -1, 0)^T$, $u_2 = (1, 0, -1)^T$ we set $v_1 = u_1$ and

\[ v_2 = u_2 - \frac{(v_1|u_2)}{(v_1|v_1)} v_1 = (1, 0, -1)^T - \tfrac{1}{2} (1, -1, 0)^T = (\tfrac{1}{2}, \tfrac{1}{2}, -1)^T. \]

This gives the following orthogonal basis of eigenvectors of $A$:

\[ v_1 = (1, -1, 0)^T, \quad v_2 = (\tfrac{1}{2}, \tfrac{1}{2}, -1)^T, \quad v_3 = (1, 1, 1)^T. \]

Now normalize:

\[ w_1 = \begin{pmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \\ 0 \end{pmatrix}, \quad w_2 = \begin{pmatrix} 1/\sqrt{6} \\ 1/\sqrt{6} \\ -2/\sqrt{6} \end{pmatrix}, \quad w_3 = \begin{pmatrix} 1/\sqrt{3} \\ 1/\sqrt{3} \\ 1/\sqrt{3} \end{pmatrix}. \]

This gives the base-change matrix

\[ P = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{6} & 1/\sqrt{3} \\ -1/\sqrt{2} & 1/\sqrt{6} & 1/\sqrt{3} \\ 0 & -2/\sqrt{6} & 1/\sqrt{3} \end{pmatrix} \]

of Example 13.4. The point of that example was to show that $P^T = P^{-1}$ and so

\[ P^T A P = P^{-1} A P = \begin{pmatrix} 6 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 3 \end{pmatrix}. \]

This matrix is diagonal, so the base-change matrix $P$ diagonalizes the quadratic form $Q$ as well as the linear transformation $f_A$.

Now introduce 'new coordinates' $a, b, c$ by the rule

\[ \begin{pmatrix} a \\ b \\ c \end{pmatrix} = P^{-1} \begin{pmatrix} x \\ y \\ z \end{pmatrix} \]

so

\[ Q(x, y, z) = (x, y, z) P (P^{-1} A P) P^{-1} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = (a, b, c) \begin{pmatrix} 6 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = 6a^2 + 6b^2 + 3c^2, \]

since $(x, y, z) P$ is the transpose of $P^T (x, y, z)^T = P^{-1} (x, y, z)^T$. Thus the surface $Q(x, y, z) = 3$ is given by the equation

\[ 6a^2 + 6b^2 + 3c^2 = 3 \]

with respect to the new coordinates $a, b, c$ in the directions given by the vectors $w_1, w_2, w_3$.

This surface is an ellipsoid, with centre at the origin, and elongated in the $c$ or $w_3$ direction with radius $1$ in this direction, and with radius $1/\sqrt{2}$ in the directions orthogonal to this.
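The steps of Example 13.17 (orthogonalize within the repeated eigenspace, normalize, then form $P^T A P$) can be followed in code. This is a rough floating-point sketch using only the standard library; the matrix is the one of the quadratic form $5x^2 + 5y^2 + 5z^2 - 2xy - 2yz - 2zx$, the eigenvectors are those quoted in the example, and the helper names are hypothetical.

```python
import math

def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def mat_vec(A, v):
    """Apply the matrix A (list of rows) to the vector v."""
    return [dot(row, v) for row in A]

def gram_schmidt(vectors):
    """Orthogonalize the given vectors without normalizing them."""
    basis = []
    for u in vectors:
        w = [float(c) for c in u]
        for b in basis:
            coeff = dot(b, u) / dot(b, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        basis.append(w)
    return basis

def normalize(v):
    """Scale v to length 1."""
    length = math.sqrt(dot(v, v))
    return [c / length for c in v]

A = [[5, -1, -1], [-1, 5, -1], [-1, -1, 5]]

# Eigenvalue 6 has a two-dimensional eigenspace; eigenvalue 3 has eigenvector (1, 1, 1).
eig6 = gram_schmidt([[1, -1, 0], [1, 0, -1]])
w = [normalize(v) for v in eig6 + [[1.0, 1.0, 1.0]]]   # w[0], w[1], w[2] are the columns of P

# Entry (i, j) of P^T A P is w_i . (A w_j).
PtAP = [[dot(w[i], mat_vec(A, w[j])) for j in range(3)] for i in range(3)]
for row in PtAP:
    print([round(entry, 10) for entry in row])
```

Gram-Schmidt is applied only inside the eigenspace for the eigenvalue 6; by Theorem 13.15 the eigenvector for the eigenvalue 3 is already orthogonal to that eigenspace.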

A quadratic form of rank 3 over $\mathbb{R}^3$ can always be diagonalized as

\[ Q(x, y, z) = \pm (x/a)^2 \pm (y/b)^2 \pm (z/c)^2. \]

The surfaces given by the equation $Q(x, y, z) = 1$ have different shapes according to the signs (i.e. the signature). The surface

\[ Q(x, y, z) = (x/a)^2 + (y/b)^2 + (z/c)^2 = 1 \]

is an ellipsoid, with semi-axes $a, b, c$. The surface

\[ Q(x, y, z) = (x/a)^2 + (y/b)^2 - (z/c)^2 = 1 \]

is a one-sheet hyperboloid, something like a cooling tower extending to infinity in both directions. The surface

\[ Q(x, y, z) = (x/a)^2 - (y/b)^2 - (z/c)^2 = 1 \]

is a two-sheet hyperboloid, like a hill reflected in the sky. The final equation,

\[ Q(x, y, z) = -(x/a)^2 - (y/b)^2 - (z/c)^2 = 1, \]

clearly has no real solutions. (See Figure 13.2.)
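The case distinction above depends only on how many of the three signs are negative, which makes it simple to express as a lookup. The following is a minimal sketch under that reading; the function name `classify_rank3_quadric` is hypothetical, and it takes the three nonzero diagonal coefficients of a diagonalized form $Q$ and names the surface $Q(x, y, z) = 1$.

```python
def classify_rank3_quadric(coefficients):
    """Name the surface Q = 1 for a diagonalized rank 3 form Q on R^3.

    `coefficients` are the three nonzero diagonal entries of Q, for
    example the eigenvalues of its symmetric matrix.
    """
    if len(coefficients) != 3 or any(c == 0 for c in coefficients):
        raise ValueError("expected three nonzero coefficients (rank 3)")
    negatives = sum(1 for c in coefficients if c < 0)
    names = {
        0: "ellipsoid",
        1: "one-sheet hyperboloid",
        2: "two-sheet hyperboloid",
        3: "no real points",
    }
    return names[negatives]

print(classify_rank3_quadric([6, 6, 3]))
print(classify_rank3_quadric([1, 1, -1]))
```

For a surface $Q = k$ with $k > 0$ the same classification applies after dividing through by $k$, since that does not change any sign.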

Fig. 13.2 Surfaces defined by quadratic forms of rank 3: (a) ellipsoid; (b) one-sheet hyperboloid; (c) two-sheet hyperboloid.

Fig. 13.3 Surfaces defined by quadratic forms of rank 2: (a) elliptical cylinder; (b) hyperbolic cylinder.

The degenerate cases, when the quadratic form has smaller rank, are also worth noting. The surface $Q(x, y, z) = (x/a)^2 + (y/b)^2 = 1$ is an elliptical cylinder, while the surface $Q(x, y, z) = (x/a)^2 - (y/b)^2 = 1$ is a hyperbolic cylinder (if that makes sense!). (See Figure 13.3.)

We may also consider surfaces of the form $Q(x, y, z) = 0$. These are degenerate cases of $Q(x, y, z) = \varepsilon$ when $\varepsilon \to 0$. In particular, the surface $(x/a)^2 - (y/b)^2 - (z/b)^2 = 0$ is that of two cones joined together at their apexes, whereas the more general case $(x/a)^2 - (y/b)^2 - (z/c)^2 = 0$ is similar except the cross sections of the 'cones' are ellipses. The other degenerate case of this type is exemplified by $(x/a)^2 - (y/b)^2 = 0$, which is a pair of planes meeting at the line $x = y = 0$.

Exercises

Exercise 13.1 For each case, sketch the graph of the curve in question and describe all eigenvectors of the matrix $A$ geometrically.
(a) $x^2 + 4xy + y^2 = 7$, and $A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$.
(b) $x^2 + 2xy + y^2 = 7$, and $A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$.
(c) $5x^2 + 4xy + 5y^2 = 7$, and $A = \begin{pmatrix} 5 & 2 \\ 2 & 5 \end{pmatrix}$.
(d) $x^2 + y^2 = 7$, and $A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$.

Exercise 13.2 Let $T : \mathbb{C}^2 \to \mathbb{C}^2$ be defined by $T((x, y)^T) = (2ix + y, x)^T$.
(a) Write down the matrix $A$ of $T$ with respect to the usual basis of $\mathbb{C}^2$.
(b) Is $A$ symmetric?
(c) Is $A$ conjugate-symmetric?
(d) What are the eigenvalues of $A$?
(e) Is $A$ diagonalizable?

Exercise 13.3 A matrix A is of the form

(� �) , where a, b, c E JR. Suppose

that A has an eigenvalue .\ of algebraic multiplicity 2 . Prove that a = b and calculate the value of c. Exercise 13.4 Sketch the graph of each of the following.

(a) $5x^2 - 8xy + 5y^2 = 9$.
(b) $11x^2 - 24xy + 4y^2 + 6x + 8y = -15$.
(c) $16x^2 - 24xy + 9y^2 - 30x + 40y = 5$.
[Hint: for (b) and (c) diagonalize the matrix for the quadratic form first, then transform the whole equation including the nonquadratic parts.]

Exercise 13.5 Describe the set of points

\[ \{ (x, y, z) \in \mathbb{R}^3 : x^2 - y^2/3 + z^2 - 2xy - 2yz + 2xz = 1 \} \]

mentioning any rotational or translational symmetries that you can find. [Translational symmetry is when a figure looks the same after it has been shifted by a translation vector $v$. Rotational symmetry is similar, but the figure is rotated through an angle $\theta$ about a given axis.]

Exercise 13.6 The form $Q$ is defined on $\mathbb{R}^3$ by

\[ Q((x, y, z)^T) = x^2 + y^2 + 4z^2 + 14xy + 8xz + 8yz. \]

By finding a suitable orthogonal matrix $P$ and defining 'new coordinates' $a, b, c$ by

write $Q((x, y, z)^T)$ as

\[ \lambda a^2 + \mu b^2 + \nu c^2 \]

for some real constants $\lambda$, $\mu$, and $\nu$. Hence describe the following surfaces.
(a) $Q(x, y, z) = 1$.
(b) $Q(x, y, z) = 0$.
(c) $Q(x, y, z) + x + y - 2z = 1$.
(d) $Q(x, y, z) + x + y + z = 1$.
(e) $Q(x, y, z) + x + y + z = 0$.
(f) $Q(x, y, z) + x = 1$.
[Hint: for some of these, you may find it helpful to change the origin.]

Exercise 13.7 (This exercise is for students wondering why the terminology 'self-adjoint' is used.) Let $V$ be a finite dimensional inner product space, and let $e_1, \dots, e_n$ be an ordered orthonormal basis of $V$. Suppose $f : V \to V$ is a linear transformation with matrix $A$ with respect to this ordered basis, and define $f^\dagger : V \to V$ by

\[ f^\dagger(v) = \sum_{i=1}^n \overline{(v|f(e_i))} \, e_i. \]

$f^\dagger$ is called the adjoint of $f$.
(a) Show that $f^\dagger$ is a linear transformation of $V$.
(b) Show that

for all $i, j$.
(c) Using the previous part and linearity, show that

\[ (u|f(v)) = (f^\dagger(u)|v) \]

for all $u, v \in V$.
(d) Deduce that the matrix of $f^\dagger$ with respect to the basis $e_1, \dots, e_n$ is $\bar{A}^T$, and that the definition of $f^\dagger$ is independent of the orthonormal basis $e_1, \dots, e_n$ taken.

Exercise 13.8 Let $V$ be an inner product space over $\mathbb{R}$ or $\mathbb{C}$, and suppose that $f$ is a self-adjoint linear transformation from $V$ to $V$. Given $p(x)$, a polynomial with coefficients from the field of scalars for $V$, show that $p(f)$ is a self-adjoint linear transformation $V \to V$.

Exercise 13.9 Let $\alpha, \beta \in \mathcal{L}(V, V)$ be self-adjoint, where $V$ is an inner product space over $\mathbb{R}$ or $\mathbb{C}$. Write $\alpha\beta$ as $\tfrac{1}{2}(\alpha\beta - \beta\alpha) + \tfrac{1}{2}(\alpha\beta + \beta\alpha)$, to show that

\[ |(v|\alpha\beta(v))|^2 \geq \tfrac{1}{4} |(v|(\alpha\beta - \beta\alpha)(v))|^2 \]

for all vectors $v \in V$.

14 The Jordan normal form

If a linear transformation $f : V \to V$ is not diagonalizable, we may still ask for a basis with respect to which the matrix of $f$ is as 'nice as possible'. It turns out that we can always obtain such a basis where this matrix is in Jordan normal form; that is, a special upper triangular form where the only nonzero entries off the diagonal are entries equal to $1$ just above repeated eigenvalues. Such forms will enable us to solve a much greater variety of simultaneous difference and differential equations.

14.1 Jordan normal form

We have proved that if $f : V \to V$ is a linear transformation whose minimum polynomial has distinct roots, then $f$ is diagonalizable. We now consider the general case, when $m_f(x)$ may have repeated roots. First we need to generalize the concept of eigenspaces.

Definition 14.1 Suppose $f : V \to V$ has minimum polynomial

\[ m_f(x) = (x - \lambda_1)^{e_1} \cdots (x - \lambda_r)^{e_r} \]

where $\lambda_1, \dots, \lambda_r$ are the distinct eigenvalues of $f$. Then the subspaces

\[ \ker\big( (f - \lambda_i I)^{e_i} \big) \]

of $V$ are called the generalized eigenspaces of $f$.

Notice that if $e_i = 1$, i.e. $\lambda_i$ is an eigenvalue which occurs with multiplicity $1$ as a root of the minimum polynomial, then this is just the usual eigenspace. The most important result for our purposes is that $V$ is the direct sum of these generalized eigenspaces. This is a generalization of Theorem 12.9, and is stated below as Theorem 14.3 and will be proved in Section 14.4. But before we give this result formally, let us consider an example by way of illustration.

Example 14.2 Let $V = \mathbb{C}^3$, and define the linear map $f : V \to V$ by

\[ f \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 2x \\ -x - 3y - z \\ -x + 4y + z \end{pmatrix}. \]

Then $f$ is represented with respect to the standard basis by the matrix

\[ A = \begin{pmatrix} 2 & 0 & 0 \\ -1 & -3 & -1 \\ -1 & 4 & 1 \end{pmatrix}, \]

so we can calculate its characteristic polynomial in the usual way, as

\[ \chi_f(x) = \chi_A(x) = (2 - x)(x^2 + 2x + 1) = (2 - x)(x + 1)^2. \]

Thus $m_f(x)$ is either $(x - 2)(x + 1)$ or $(x - 2)(x + 1)^2$. But

\[ (A - 2I)(A + I) = \begin{pmatrix} 0 & 0 & 0 \\ -1 & -5 & -1 \\ -1 & 4 & -1 \end{pmatrix} \begin{pmatrix} 3 & 0 & 0 \\ -1 & -2 & -1 \\ -1 & 4 & 2 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 3 & 6 & 3 \\ -6 & -12 & -6 \end{pmatrix}, \]

which is not the zero matrix, so $m_f(x) = m_A(x) = (x - 2)(x + 1)^2$.

For the eigenvalue $2$, the generalized eigenspace is the same as the ordinary eigenspace, and is just the kernel of the linear map $g = f - 2I$. Now

\[ g \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ -x - 5y - z \\ -x + 4y - z \end{pmatrix} \]

so the kernel of $g$ is the set of all vectors $(x, y, z)^T$ satisfying the equations $-x - 5y - z = 0$ and $-x + 4y - z = 0$; equivalently, $y = 0$ and $x = -z$. So $\ker(g) = \{ (-z, 0, z)^T : z \in \mathbb{C} \}$.

For the eigenvalue $-1$, we work out the ordinary eigenspace in the same way. Writing $h = f + I$ we have

\[ h \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 3x \\ -x - 2y - z \\ -x + 4y + 2z \end{pmatrix} \]

and

\[ \ker(h) = \{ (0, y, z)^T : y, z \in \mathbb{C},\ -2y - z = 0,\ 4y + 2z = 0 \} = \{ (0, y, -2y)^T : y \in \mathbb{C} \}. \]

Now

\[ h^2 \begin{pmatrix} x \\ y \\ z \end{pmatrix} = h \begin{pmatrix} 3x \\ -x - 2y - z \\ -x + 4y + 2z \end{pmatrix} = \begin{pmatrix} 9x \\ -3x - 2(-x - 2y - z) - (-x + 4y + 2z) \\ -3x + 4(-x - 2y - z) + 2(-x + 4y + 2z) \end{pmatrix} = \begin{pmatrix} 9x \\ 0 \\ -9x \end{pmatrix} \]

so

\[ \ker(h^2) = \{ (x, y, z)^T : 9x = 0 \} = \{ (0, y, z)^T : y, z \in \mathbb{C} \}, \]

which has dimension $2$. Note that the image of $h^2$ is the set of all vectors of the form $(9x, 0, -9x)^T$, which is the same as the kernel of $g$. That is, $\operatorname{im}(h^2) = \ker(g)$. Similarly we have $\operatorname{im}(g) = \ker(h^2)$. (Compare Proposition 12.17 and the example preceding it.)

We now state our promised generalization of Theorem 12.9.

Theorem 14.3 (Primary decomposition) Let $f : V \to V$ be a linear transformation with minimum polynomial

\[ m_f(x) = (x - \lambda_1)^{e_1} \cdots (x - \lambda_r)^{e_r} \]

where $\lambda_1, \dots, \lambda_r$ are the distinct eigenvalues of $f$, and $e_1, \dots, e_r$ are positive integers. Let $V_1, \dots, V_r$ denote the corresponding generalized eigenspaces, i.e. $V_i = \ker((f - \lambda_i I)^{e_i})$. Then

\[ V = V_1 \oplus \dots \oplus V_r. \]

Proof See Section 14.4, Theorem 14.15. □

This is sometimes called the primary decomposition of $V$ with respect to $f$, and $V_1, \dots, V_r$ are the primary components. We have already seen an example of this in Example 14.2. In that example, the generalized eigenspaces are
\[
V_1 = \ker(g) = \{(x, 0, -x)^T : x \in \mathbb{C}\}
\]
and
\[
V_2 = \ker(h^2) = \{(0, y, z)^T : y, z \in \mathbb{C}\},
\]
and it is easy to check that $V = V_1 \oplus V_2$ in this case. If we now choose a basis $B_1$ for $V_1$ and a basis $B_2$ for $V_2$, then $B = B_1 \cup B_2$ is a basis for $V$, since $V = V_1 \oplus V_2$ is a direct sum, and then we can write $f$ with respect to the basis $B$. For example, take $B_1 = (1, 0, -1)^T$ and $B_2 = (0, 1, 0)^T, (0, 0, 1)^T$, and calculate
\[
\begin{aligned}
f(1, 0, -1)^T &= (2, 0, -2)^T = 2(1, 0, -1)^T \\
f(0, 1, 0)^T &= (0, -3, 4)^T = -3(0, 1, 0)^T + 4(0, 0, 1)^T \\
f(0, 0, 1)^T &= (0, -1, 1)^T = -(0, 1, 0)^T + (0, 0, 1)^T
\end{aligned}
\]
so the matrix of $f$ with respect to the basis $B$ is
\[
\begin{pmatrix} 2 & 0 & 0 \\ 0 & -3 & -1 \\ 0 & 4 & 1 \end{pmatrix}.
\]

Observe that this matrix is in block diagonal form, with the blocks corresponding to the different generalized eigenspaces of f.
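The calculations of Example 14.2 can be checked mechanically. The following is only a sketch in Python: it assumes that $f$ has matrix $A$ with rows $(2, 0, 0)$, $(-1, -3, -1)$, $(-1, 4, 1)$ in the standard basis, the helper names `matmul` and `plus_scalar` are invented for this illustration, and the inverse of the base-change matrix was worked out by hand rather than taken from the text.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def plus_scalar(X, c):
    """Return X + c*I for a square matrix X."""
    return [[x + (c if i == j else 0) for j, x in enumerate(row)]
            for i, row in enumerate(X)]

# Matrix of f in the standard basis (the reading of Example 14.2 assumed here).
A = [[2, 0, 0], [-1, -3, -1], [-1, 4, 1]]
ZERO = [[0, 0, 0] for _ in range(3)]

g = plus_scalar(A, -2)   # g = f - 2I
h = plus_scalar(A, 1)    # h = f + I

gh = matmul(g, h)        # (A - 2I)(A + I)
ghh = matmul(gh, h)      # (A - 2I)(A + I)^2

# Columns of P are the basis vectors (1,0,-1), (0,1,0), (0,0,1).
# P_inv was found by hand for this sketch.
P = [[1, 0, 0], [0, 1, 0], [-1, 0, 1]]
P_inv = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]
block_form = matmul(matmul(P_inv, A), P)

print(gh != ZERO)    # (x - 2)(x + 1) does not annihilate A
print(ghh == ZERO)   # (x - 2)(x + 1)^2 does
print(block_form)    # matrix of f with respect to the basis B
```

The first product being nonzero and the second being zero is exactly the argument that fixes the minimum polynomial, and `block_form` is the matrix of $f$ with respect to $B$.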

Proposition 14.4 With the notation of Theorem 14.3, if $B_i$ is a basis for $V_i$, then $\bigcup_{i=1}^{r} B_i$ is a basis for $V$, and with respect to this basis the matrix of $f$ has block diagonal form, i.e.
\[
\begin{pmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_r \end{pmatrix}
\]
where $A_1, \dots, A_r$ are square matrices giving the action of $f$ on $V_1, \dots, V_r$.

Proof We use the fact that $f$ and $(f - \lambda_i I)^{e_i}$ commute with each other, for all $i$. If $v_i \in V_i = \ker((f - \lambda_i I)^{e_i})$, then
\[
(f - \lambda_i I)^{e_i}(v_i) = 0,
\]
so
\[
f\bigl((f - \lambda_i I)^{e_i}(v_i)\bigr) = 0
\]
and
\[
(f - \lambda_i I)^{e_i}(f(v_i)) = 0;
\]
hence $f(v_i) \in V_i$. $\square$

The primary decomposition uses the factorization of the minimum polynomial of $f$ to give us a block diagonal form for the matrix of $f$. Each block now has a single eigenvalue: the minimum polynomial of an $n \times n$ block is, say, $(x - \lambda)^k$, and the characteristic polynomial is $(x - \lambda)^n$. Our next task is to simplify the shape of these blocks. In other words, we try to find as nice a basis as possible for each generalized eigenspace. We have already seen in Example 12.1 that in general we cannot find a basis with respect to which the matrix is diagonal. However, we can get close. To see the kind of thing that we can do, let us consider an example.

Example 14.5 Let $f : \mathbb{R}^3 \to \mathbb{R}^3$ be defined by
\[
f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} -x - y + z \\ 14x + 8y - 7z \\ 10x + 5y - 4z \end{pmatrix},
\]
so that $f$ is represented with respect to the standard basis by the matrix
\[
B = \begin{pmatrix} -1 & -1 & 1 \\ 14 & 8 & -7 \\ 10 & 5 & -4 \end{pmatrix}.
\]
First we calculate the characteristic polynomial $\chi_B(x) = (1 - x)^3$ and minimum polynomial $m_B(x) = (x - 1)^2$. In particular there is a single eigenvalue, namely 1, and its algebraic multiplicity is 3. Next we work out the eigenspace, $\ker(B - I)$, which consists of all vectors $(x, y, z)^T$ satisfying
\[
\begin{aligned}
-2x - y + z &= 0 \\
14x + 7y - 7z &= 0 \\
10x + 5y - 5z &= 0.
\end{aligned}
\]
Solving these equations in the usual way, we obtain a two-dimensional space of solutions, spanned by eigenvectors such as $(1, 0, 2)^T$ and $(0, 1, 1)^T$, for example. (Thus the geometric multiplicity of the eigenvalue is 2.) Moreover, as $(B - I)^2$ is the zero matrix, $\ker(B - I)^2$ is the whole space. To get a nice basis for the space, we first take a basis for $\ker(B - I)$ and then extend to a basis for $\ker(B - I)^2$. For example, we could take the ordered basis
\[
(1, 0, 2)^T,\ (0, 1, 1)^T,\ (1, 0, 0)^T.
\]
Applying the corresponding base-change matrix
\[
Q = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 2 & 1 & 0 \end{pmatrix}
\]
we obtain the new matrix
\[
Q^{-1} B Q = \begin{pmatrix} 1 & 0 & -2 \\ 0 & 1 & 14 \\ 0 & 0 & 1 \end{pmatrix},
\]
which is now an upper triangular matrix, with the eigenvalues on the diagonal.

But we can do better than this. If we apply $B - I$ to the basis vector $(1, 0, 0)^T$, then the image vector $(-2, 14, 10)^T$ is in $\ker(B - I)$, since $w = (B - I)(1, 0, 0)^T$ has $(B - I)w = (B - I)^2 (1, 0, 0)^T = 0$ as $\ker(B - I)^2 = \mathbb{R}^3$. So let us change our basis of $\ker(B - I)$ to include this vector. For example, we could take our new basis for the whole space to be

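\[
(-2, 14, 10)^T,\ (1, 0, 0)^T,\ (0, 1, 1)^T,
\]
which would give a base-change matrix
\[
Q' = \begin{pmatrix} -2 & 1 & 0 \\ 14 & 0 & 1 \\ 10 & 0 & 1 \end{pmatrix}
\]
and a new matrix
\[
Q'^{-1} B Q' = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\]
which is in so-called Jordan normal form. The only nonzero entries off the diagonal are entries equal to 1, one place immediately above the diagonal.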
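The two changes of basis in Example 14.5 can be reproduced with exact rational arithmetic. This is a sketch under stated assumptions: $B$ is taken to have rows $(-1, -1, 1)$, $(14, 8, -7)$, $(10, 5, -4)$, the Jordan basis is ordered with the image vector $(-2, 14, 10)^T$ first and $(1, 0, 0)^T$ second, and the helpers `inverse` and `change_basis` are written for this illustration only.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def inverse(M):
    """Invert a square matrix by Gauss-Jordan elimination over the rationals.

    Raises StopIteration if M is singular (no pivot can be found).
    """
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def change_basis(M, basis):
    """Matrix of M with respect to the given ordered basis (a list of vectors)."""
    Q = [list(col) for col in zip(*basis)]   # basis vectors become the columns of Q
    return matmul(matmul(inverse(Q), M), Q)

B = [[-1, -1, 1], [14, 8, -7], [10, 5, -4]]

# Basis of ker(B - I) extended by (1, 0, 0): upper triangular, not yet Jordan form.
upper = change_basis(B, [(1, 0, 2), (0, 1, 1), (1, 0, 0)])

# Image vector first, then the vector it came from, then a second eigenvector.
jordan = change_basis(B, [(-2, 14, 10), (1, 0, 0), (0, 1, 1)])

print(upper)
print(jordan)
```

The order of the basis matters: listing $(1, 0, 0)^T$ before $(-2, 14, 10)^T$ would place the off-diagonal 1 below the diagonal instead of above it.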

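Theorem 14.6 If $f : W \to W$ is a linear transformation with minimum polynomial $m_f(x) = (x - \lambda)^k$, then there is a basis of $W$ with respect to which the matrix of $f$ has $\lambda$ on the diagonal, 1 or 0 in each entry immediately above the diagonal, and 0 elsewhere. That is, the matrix of $f$ has the form
\[
\begin{pmatrix}
\lambda & 1 & & & & & & & \\
 & \lambda & 1 & & & & & & \\
 & & \ddots & \ddots & & & & & \\
 & & & \lambda & 1 & & & & \\
 & & & & \lambda & & & & \\
 & & & & & \ddots & & & \\
 & & & & & & \lambda & 1 & \\
 & & & & & & & \ddots & 1 \\
 & & & & & & & & \lambda
\end{pmatrix}
\]
with zeros everywhere except as indicated.

Proof See Section 14.2. $\square$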

This theorem tells you how each of the blocks $A_i$ in Proposition 14.4 can be rewritten. Putting all the blocks together again, we get the Jordan normal form of an arbitrary matrix, which has blocks of the shape given in Theorem 14.6, for various values of $\lambda$. The small blocks which make up this matrix, of the form
\[
E = \begin{pmatrix}
\lambda & 1 & 0 & \cdots & 0 \\
0 & \lambda & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda & 1 \\
0 & 0 & \cdots & 0 & \lambda
\end{pmatrix},
\]
are called elementary Jordan matrices. If $E$ is a $k \times k$ matrix of this form, then it is easy to show that $(E - \lambda I_k)^k = 0$, but that $(E - \lambda I_k)^{k-1} \neq 0$. Thus if a Jordan matrix $J$ has a $k \times k$ block $E$ as above, then the minimum polynomial must be divisible by $(x - \lambda)^k$. Indeed we have the following result (see Exercise 11.7).

Proposition 14.7 If $f : V \to V$ is a linear transformation and
\[
m_f(x) = (x - \lambda_1)^{e_1} \cdots (x - \lambda_r)^{e_r}
\]
then, in a matrix representation of $f$ in Jordan normal form, the largest elementary Jordan matrix with eigenvalue $\lambda_i$ is an $e_i \times e_i$ matrix.

With the same matrix $E$ as above, suppose that $v = (v_1, \dots, v_k)^T$ is an eigenvector of $E$. Then $Ev = \lambda v$, i.e.
\[
Ev = (\lambda v_1 + v_2,\ \lambda v_2 + v_3,\ \dots,\ \lambda v_{k-1} + v_k,\ \lambda v_k)^T = \lambda (v_1, v_2, \dots, v_k)^T,
\]
so $v_2 = v_3 = \cdots = v_k = 0$. So up to a scalar multiple, we have $v = (1, 0, \dots, 0)^T$. Thus each elementary Jordan matrix has a one-dimensional eigenspace. Putting all these together we obtain the following.

Proposition 14.8 The dimension of the $\lambda$-eigenspace of $f$ (i.e. the geometric multiplicity of $\lambda$) is equal to the number of elementary Jordan matrices for $\lambda$ in the Jordan normal form for $f$.

14.2 Obtaining the Jordan normal form

Here, we will be rather more precise on how the Jordan form of an arbitrary square matrix can be obtained.

Suppose $f : V \to V$ is a linear transformation. First, the primary decomposition theorem (Theorem 14.3 or Theorem 14.15) shows how we can get a block diagonal form for the matrix of $f$, by finding bases of the generalized eigenspaces. (All you need to know to be able to carry out this calculation is the definition of the generalized eigenspaces. In particular, you don't need to know the proof of the primary decomposition theorem.) This reduces the problem to finding a 'nice' representation for each block, i.e. finding a 'nice' basis for each generalized eigenspace. Each block corresponding to one of these generalized eigenspaces has a single eigenvalue. The minimum polynomial of an $n \times n$ block is $(x - \lambda)^k$, say, and the characteristic polynomial is $(x - \lambda)^n$.

We suppose, therefore, that we have a linear transformation $f : V \to V$ with minimum polynomial $m_f(x) = (x - \lambda)^k$, and for simplicity we consider the linear transformation $g = f - \lambda I$ instead. Then we have $m_g(x) = x^k$ and $\chi_g(x) = x^n$, so the only eigenvalue of $g$ is 0. As $m_g(x) = x^k$, $g^k$ is the zero map, so $V = \ker(g^k)$. Now clearly $\ker g^{r+1} \supseteq \ker g^r$ for all $r$, for if $v \in \ker(g^r)$ then $g^r(v) = 0$ and $g^{r+1}(v) = g(g^r(v)) = g(0) = 0$, so $v \in \ker(g^{r+1})$. This means we get a chain of subspaces
\[
V = \ker g^k \supseteq \ker g^{k-1} \supseteq \cdots \supseteq \ker g^2 \supseteq \ker g \supseteq \ker g^0 = \{0\}.
\]
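The dimensions along this chain are easy to compute, since $\dim \ker g^r = n - \operatorname{rank}(G^r)$ for a matrix $G$ representing $g$. The sketch below is an illustration only: the function names are invented here, the rank is found by Gaussian elimination over the rationals, and the first test matrix is $B - I$ under the assumption that $B$ in Example 14.5 has rows $(-1, -1, 1)$, $(14, 8, -7)$, $(10, 5, -4)$.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def rank(M):
    """Rank of M, computed by row reduction with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in M]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            factor = rows[r][col] / rows[rk][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk

def kernel_dimensions(G):
    """Return [dim ker G, dim ker G^2, ...], stopping once the dimension stops growing."""
    n = len(G)
    dims = []
    power = G
    while True:
        d = n - rank(power)
        if dims and d == dims[-1]:
            return dims
        dims.append(d)
        power = matmul(power, G)

# B - I for the matrix B assumed for Example 14.5.
G = [[-2, -1, 1], [14, 7, -7], [10, 5, -5]]
print(kernel_dimensions(G))

# A single 3 x 3 nilpotent Jordan block, as a second illustration.
N = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
print(kernel_dimensions(N))
```

For a nilpotent matrix the list ends at $n$, and its length is the exponent $k$ of the minimum polynomial $x^k$.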

The general method for finding a suitable basis of $V$ is as follows. First take a basis $v_1, \dots, v_{r_1}$ of $\ker g$, extend this to a basis $v_1, \dots, v_{r_1}, v_{r_1+1}, \dots, v_{r_2}$ of $\ker g^2$, and so on, until we have a basis
\[
v_1, \dots, v_{r_1},\ v_{r_1+1}, \dots, v_{r_2},\ \dots,\ v_{r_{k-1}+1}, \dots, v_{r_k}
\]
of $V = \ker g^k$. We now modify this basis: first write down those basis elements $v_{r_{k-1}+1}, \dots, v_{r_k}$ of $\ker g^k$ not in $\ker g^{k-1}$ as $a_1 = v_{r_{k-1}+1}, \dots, a_{n_1} = v_{r_k}$, giving
\[
a_1 \quad \cdots \quad a_{n_1}.
\]
Next calculate $b_1 = g(a_1), \dots, b_{n_1} = g(a_{n_1})$ and write these down underneath the $a_i$. These $b_i$ are all elements of $\ker g^{k-1}$ since $g^{k-1}(b_i) = g^k(a_i) = 0$, and it will turn out that all the vectors $a_i, b_j$ form a linearly independent set. Because of this, we can extend the list of the $b_i$ to $b_{n_1+1}, \dots, b_{n_2}$ so that
\[
v_1, \dots, v_{r_{k-2}},\ b_1, \dots, b_{n_2}
\]
is a basis of $\ker g^{k-1}$. We then work out $c_i = g(b_i)$ for each $i$, write these underneath, and extend what we have got to a basis of $\operatorname{span}(v_{r_{k-3}+1}, \dots, v_{r_k})$. When this process is complete, we will have a basis of the whole space $V$ written as a table of the form

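\[
\begin{array}{ccccccccccc}
a_1 & \cdots & a_{n_1} & & & & & & & & \\
b_1 & \cdots & b_{n_1} & b_{n_1+1} & \cdots & b_{n_2} & & & & & \\
c_1 & \cdots & c_{n_1} & c_{n_1+1} & \cdots & c_{n_2} & c_{n_2+1} & \cdots & c_{n_3} & & \\
\vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots & & \\
z_1 & \cdots & z_{n_1} & z_{n_1+1} & \cdots & z_{n_2} & z_{n_2+1} & \cdots & z_{n_3} & \cdots & z_{n_k}.
\end{array}
\]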

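All that is required is to order this basis in a suitable way. To do this, note that $g(a_i) = b_i$, $g(b_i) = c_i$, etc., so we order the basis reading up the columns first, and then left to right, as
\[
z_1, \dots, c_1, b_1, a_1,\ z_2, \dots, c_2, b_2, a_2,\ \dots,\ z_{n_k}.
\]
Because $g(a_i) = b_i$, $g(b_i) = c_i$, etc., the matrix of $g$ will be in Jordan normal form, with an elementary Jordan matrix of the form
\[
\begin{pmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & 0 & 1 \\
0 & 0 & \cdots & 0 & 0
\end{pmatrix}
\]
for each column of the table. The matrix of the original linear transformation $f = g + \lambda I$ is then formed of elementary Jordan matrices
\[
\begin{pmatrix}
\lambda & 1 & 0 & \cdots & 0 \\
0 & \lambda & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda & 1 \\
0 & 0 & \cdots & 0 & \lambda
\end{pmatrix}
\]
as required.

Clearly, the crucial point to this construction (and it is not immediately obvious) is that the basis modification actually does give a basis. The lemma that tells us that it really does work is the following.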

Lemma 14.9 If $\{u_1, \dots, u_r\}$ is a basis for $\ker(g^j)$, is extended to a basis
\[
\{u_1, \dots, u_r, v_1, \dots, v_s\}
\]
of $\ker(g^{j+1})$, and to a basis
\[
\{u_1, \dots, u_r, v_1, \dots, v_s, w_1, \dots, w_t\}
\]
of $\ker(g^{j+2})$, then $\{u_1, \dots, u_r, g(w_1), \dots, g(w_t)\}$ is a linearly independent subset of $\ker(g^{j+1})$.

Proof First note that $g^{j+1}(g(w_i)) = g^{j+2}(w_i) = 0$ so $g(w_i) \in \ker(g^{j+1})$. To show linear independence, suppose we have a linear dependence
\[
\sum_{i=1}^{r} \lambda_i u_i + \sum_{i=1}^{t} \mu_i g(w_i) = 0,
\]
so that
\[
g\Bigl(\sum_{i=1}^{t} \mu_i w_i\Bigr) = \sum_{i=1}^{t} \mu_i g(w_i) = -\sum_{i=1}^{r} \lambda_i u_i \in \ker(g^j).
\]
Therefore
\[
g^{j+1}\Bigl(\sum_{i=1}^{t} \mu_i w_i\Bigr) = 0,
\]
so $\sum_{i=1}^{t} \mu_i w_i \in \ker(g^{j+1})$, which means that it can be written as a linear combination of $\{u_1, \dots, u_r, v_1, \dots, v_s\}$. But
\[
\{u_1, \dots, u_r, v_1, \dots, v_s, w_1, \dots, w_t\}
\]
is a linearly independent set, so all $\mu_i = 0$. Therefore $\sum_{i=1}^{r} \lambda_i u_i = 0$, so all the $\lambda_i = 0$. $\square$
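One consequence of the construction is that the sizes of the elementary Jordan matrices can be read off from the kernel dimensions alone: counting from the bottom, row $r$ of the table has $\dim \ker g^r - \dim \ker g^{r-1}$ entries, and each column of the table gives one block whose size is the height of that column. The following sketch implements just this counting; the function name and the sample dimensions are chosen here for illustration and do not come from the text.

```python
def jordan_block_sizes(kernel_dims):
    """Block sizes for a nilpotent map g, largest first.

    kernel_dims[r - 1] must be dim ker g^r for r = 1, ..., k,
    where ker g^k is the whole space.
    """
    if not kernel_dims:
        return []
    # Length of each table row, starting with the bottom row (inside ker g).
    row_lengths = [d - prev for d, prev in zip(kernel_dims, [0] + kernel_dims[:-1])]
    # Column i exists in every row that has more than i entries,
    # so its height is the number of such rows.
    return [sum(1 for length in row_lengths if length > i)
            for i in range(row_lengths[0])]

# Kernel dimensions 2, 4, 5 give one block of size 3 and one of size 2.
print(jordan_block_sizes([2, 4, 5]))
```

The number of blocks returned is always the first kernel dimension, in agreement with Proposition 14.8, and the largest block has size $k$, in agreement with Proposition 14.7.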

Example 14.10 Let $A$ be the matrix
\[
A = \begin{pmatrix}
3 & 1 & 0 & 0 & 0 \\
-1 & 2 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 \\
-1 & 0 & 1 & 3 & 1 \\
1 & 0 & -1 & -1 & 1
\end{pmatrix}.
\]
You can calculate that $\chi_A(x) = (2 - x)^5$ and $m_A(x) = (x - 2)^3$. Now let $g(v) = Bv$ where
\[
B = A - 2I = \begin{pmatrix}
1 & 1 & 0 & 0 & 0 \\
-1 & 0 & 1 & 0 & 0 \\
1 & 0 & -1 & 0 & 0 \\
-1 & 0 & 1 & 1 & 1 \\
1 & 0 & -1 & -1 & -1
\end{pmatrix}.
\]
Calculating kernels as usual, we find that
\[
\ker g = \{(a, -a, a, b, -b)^T : a, b \in \mathbb{R}\}
\]
with basis
\[
(1, -1, 1, 0, 0)^T,\ (0, 0, 0, 1, -1)^T,
\]
\[
\ker g^2 = \{(x_1, x_2, x_3, x_4, x_5)^T : x_2 + x_3 = 0\}
\]
with basis
\[
(1, -1, 1, 0, 0)^T,\ (0, 0, 0, 1, -1)^T,\ (1, 0, 0, 0, 0)^T,\ (0, 0, 0, 1, 0)^T,
\]
and $\ker g^3 = \mathbb{R}^5$, with basis
\[
(1, -1, 1, 0, 0)^T,\ (0, 0, 0, 1, -1)^T,\ (1, 0, 0, 0, 0)^T,\ (0, 0, 0, 1, 0)^T,\ (0, 1, 0, 0, 0)^T.
\]
We now modify this basis according to the rules above. First, we set $a_1 = (0, 1, 0, 0, 0)^T$. Next, take $b_1 = g(a_1) = (1, 0, 0, 0, 0)^T$, and extend by adding $b_2 = (0, 0, 0, 1, 0)^T$. Finally, we set $c_1 = g(b_1) = (1, -1, 1, -1, 1)^T$ and $c_2 = g(b_2) = (0, 0, 0, 1, -1)^T$. These vectors are organized in the following way,
\[
\begin{array}{cc}
a_1 & \\
b_1 & b_2 \\
c_1 & c_2
\end{array}
\]
and we can order the basis we have just found reading up columns and across from left to right as $c_1, b_1, a_1, c_2, b_2$. The corresponding base-change matrix $P$ and its inverse are
\[
P = \begin{pmatrix}
1 & 1 & 0 & 0 & 0 \\
-1 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 \\
-1 & 0 & 0 & 1 & 1 \\
1 & 0 & 0 & -1 & 0
\end{pmatrix},
\qquad
P^{-1} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 \\
1 & 0 & -1 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & -1 \\
0 & 0 & 0 & 1 & 1
\end{pmatrix},
\]
and you can check that
\[
P^{-1} A P = \begin{pmatrix}
2 & 1 & 0 & 0 & 0 \\
0 & 2 & 1 & 0 & 0 \\
0 & 0 & 2 & 0 & 0 \\
0 & 0 & 0 & 2 & 1 \\
0 & 0 & 0 & 0 & 2
\end{pmatrix}
\]
in Jordan normal form.

14.3 Applications

Just as with diagonalization, Jordan form gives us a useful method for solving many kinds of simultaneous difference and differential equations. We illustrate the method here with an example of simultaneous difference equations.

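Example 14.11 Solve the equations
\[
\begin{aligned}
x_{n+1} &= 3x_n + z_n \\
y_{n+1} &= -x_n + y_n - z_n \\
z_{n+1} &= y_n + 2z_n
\end{aligned}
\]
subject to $x_0 = y_0 = z_0 = 1$. First, in matrix form this becomes
\[
\begin{pmatrix} x_{n+1} \\ y_{n+1} \\ z_{n+1} \end{pmatrix} = A \begin{pmatrix} x_n \\ y_n \\ z_n \end{pmatrix}
\quad\text{where}\quad
A = \begin{pmatrix} 3 & 0 & 1 \\ -1 & 1 & -1 \\ 0 & 1 & 2 \end{pmatrix}.
\]
By the usual calculations, we find that
\[
\chi_A(x) = (2 - x)^3 \quad\text{and}\quad m_A(x) = (x - 2)^3.
\]
Put $g(v) = (A - 2I)v$, so $\ker g$ has basis $(1, 0, -1)^T$, $\ker g^2$ has basis $(1, 0, -1)^T, (1, -1, 0)^T$, and $\ker g^3$ has basis $(1, 0, -1)^T, (1, -1, 0)^T, (1, 0, 0)^T$. Set $a_1 = (1, 0, 0)^T$, $b_1 = g(a_1) = (1, -1, 0)^T$, and $c_1 = g(b_1) = (1, 0, -1)^T$. This gives a base-change matrix $P$ and its inverse given by
\[
P = \begin{pmatrix} 1 & 1 & 1 \\ 0 & -1 & 0 \\ -1 & 0 & 0 \end{pmatrix}
\quad\text{and}\quad
P^{-1} = \begin{pmatrix} 0 & 0 & -1 \\ 0 & -1 & 0 \\ 1 & 1 & 1 \end{pmatrix},
\]
and
\[
P^{-1} A P = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}.
\]
As usual, we 'change coordinates' to
\[
\begin{pmatrix} a_n \\ b_n \\ c_n \end{pmatrix} = P^{-1} \begin{pmatrix} x_n \\ y_n \\ z_n \end{pmatrix},
\]
giving
\[
\begin{pmatrix} a_{n+1} \\ b_{n+1} \\ c_{n+1} \end{pmatrix} = P^{-1} A P \begin{pmatrix} a_n \\ b_n \\ c_n \end{pmatrix} = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} a_n \\ b_n \\ c_n \end{pmatrix}.
\]
Now the general solution to difference equations like this is
\[
\begin{aligned}
a_n &= 2^n a_0 + n 2^{n-1} b_0 + \frac{n(n-1)}{2}\, 2^{n-2} c_0 \\
b_n &= 2^n b_0 + n 2^{n-1} c_0 \\
c_n &= 2^n c_0.
\end{aligned}
\]
Substituting
\[
\begin{pmatrix} a_0 \\ b_0 \\ c_0 \end{pmatrix} = P^{-1} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} -1 \\ -1 \\ 3 \end{pmatrix}
\]
into this we get
\[
\begin{aligned}
a_n &= 2^{n-2}(3n^2/2 - 7n/2 - 4) \\
b_n &= 2^{n-1}(3n - 2) \\
c_n &= 2^n \cdot 3,
\end{aligned}
\]
which gives
\[
\begin{aligned}
x_n &= 2^{n-2}(3n^2/2 + 5n/2 + 4) \\
y_n &= 2^{n-2}(4 - 6n) \\
z_n &= 2^{n-2}(4 + 7n/2 - 3n^2/2).
\end{aligned}
\]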
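A closed form like this is easy to test against the recurrence itself. The sketch below assumes the system of Example 14.11 is $x_{n+1} = 3x_n + z_n$, $y_{n+1} = -x_n + y_n - z_n$, $z_{n+1} = y_n + 2z_n$ with $x_0 = y_0 = z_0 = 1$, and compares direct iteration with the formulas $x_n = 2^{n-2}(3n^2/2 + 5n/2 + 4)$, $y_n = 2^{n-2}(4 - 6n)$, $z_n = 2^{n-2}(4 + 7n/2 - 3n^2/2)$. The function names are invented for this illustration.

```python
from fractions import Fraction

def step(x, y, z):
    """One step of the recurrence assumed for Example 14.11."""
    return 3 * x + z, -x + y - z, y + 2 * z

def closed_form(n):
    """The claimed solution (x_n, y_n, z_n), evaluated exactly."""
    scale = Fraction(2) ** (n - 2)
    x = scale * (Fraction(3, 2) * n * n + Fraction(5, 2) * n + 4)
    y = scale * (4 - 6 * n)
    z = scale * (4 + Fraction(7, 2) * n - Fraction(3, 2) * n * n)
    return x, y, z

def agrees_up_to(limit):
    """Check the closed form against direct iteration for n = 0, ..., limit - 1."""
    state = (1, 1, 1)
    for n in range(limit):
        if closed_form(n) != state:
            return False
        state = step(*state)
    return True

print(agrees_up_to(20))
```

Exact fractions are used so that the factor $2^{n-2}$ for $n = 0, 1$ does not introduce rounding.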

For solving equations like these, the following 'standard forms' are useful.

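Theorem 14.12 The general solution of the difference equations
\[
\begin{pmatrix} x_1(i+1) \\ x_2(i+1) \\ x_3(i+1) \\ \vdots \\ x_k(i+1) \end{pmatrix}
=
\begin{pmatrix}
\lambda & 1 & 0 & \cdots & 0 \\
0 & \lambda & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda & 1 \\
0 & 0 & \cdots & 0 & \lambda
\end{pmatrix}
\begin{pmatrix} x_1(i) \\ x_2(i) \\ x_3(i) \\ \vdots \\ x_k(i) \end{pmatrix}
\]
is
\[
\begin{aligned}
x_1(n) &= \lambda^n x_1(0) + {}^nC_1 \lambda^{n-1} x_2(0) + {}^nC_2 \lambda^{n-2} x_3(0) + \cdots + {}^nC_{k-1} \lambda^{n-k+1} x_k(0) \\
x_2(n) &= \lambda^n x_2(0) + {}^nC_1 \lambda^{n-1} x_3(0) + \cdots + {}^nC_{k-2} \lambda^{n-k+2} x_k(0) \\
&\ \ \vdots \\
x_k(n) &= \lambda^n x_k(0),
\end{aligned}
\]
where ${}^nC_i$ is the coefficient of $x^i$ in $(x + 1)^n$, so ${}^nC_i = n!/(i!(n - i)!)$, ${}^nC_0 = 1$, ${}^nC_1 = n$, ${}^nC_2 = n(n - 1)/2$, ..., ${}^nC_n = 1$, etc. (Terms with $i > n$ have ${}^nC_i = 0$ and are omitted.)

Proof Use induction on $n$ together with the familiar identity
\[
{}^nC_i + {}^nC_{i+1} = {}^{n+1}C_{i+1}
\]
to obtain the theorem. $\square$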
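The standard form can be checked numerically by comparing it with repeated application of an elementary Jordan matrix. In this sketch the solution is taken in the form $x_j(n) = \sum_m {}^nC_m \lambda^{n-m} x_{j+m}(0)$, summed over those $m \le n$ for which $x_{j+m}$ exists; this is an assumption about how the theorem reads, and the function names are invented for the illustration. Vectors are indexed from 0 in the code.

```python
from fractions import Fraction
from math import comb

def jordan_step(lam, x):
    """One step x(i+1) = E x(i), where E is the elementary Jordan matrix with eigenvalue lam."""
    k = len(x)
    return [lam * x[j] + (x[j + 1] if j + 1 < k else 0) for j in range(k)]

def jordan_power_solution(lam, x0, n):
    """x(n) from the binomial formula, without iterating."""
    k = len(x0)
    return [sum(comb(n, m) * lam ** (n - m) * x0[j + m]
                for m in range(min(k - j, n + 1)))
            for j in range(k)]

def formula_matches(lam, x0, limit):
    """Compare the formula with direct iteration for n = 0, ..., limit - 1."""
    x = list(x0)
    for n in range(limit):
        if jordan_power_solution(lam, x0, n) != x:
            return False
        x = jordan_step(lam, x)
    return True

# lam = 2 with starting vector (-1, -1, 3): the system (a_n, b_n, c_n) of a 3 x 3 block.
print(formula_matches(2, [-1, -1, 3], 12))
print(formula_matches(Fraction(1, 2), [1, 0, 5, -2], 12))
```

Restricting the sum to $m \le n$ keeps every exponent $n - m$ non-negative, so the formula also makes sense for $\lambda = 0$.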

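The same ideas can be applied to differential equations too.

Example 14.13 Quantities $x(t)$, $y(t)$, $z(t)$ vary with time, $t$, and satisfy the equations
\[
\begin{aligned}
\frac{dx}{dt} &= 3x + z \\
\frac{dy}{dt} &= -x + y - z \\
\frac{dz}{dt} &= y + 2z
\end{aligned}
\]
and boundary conditions $x(0) = y(0) = z(0) = 1$. We find $x(t)$, $y(t)$, $z(t)$. In matrix form, we have
\[
\frac{d}{dt}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = A \begin{pmatrix} x \\ y \\ z \end{pmatrix}
\quad\text{where}\quad
A = \begin{pmatrix} 3 & 0 & 1 \\ -1 & 1 & -1 \\ 0 & 1 & 2 \end{pmatrix}
\]
is the matrix of the previous example. This suggests using the same base-change matrix $P$, and defining new quantities $u(t)$, $v(t)$, $w(t)$ by
\[
\begin{pmatrix} u \\ v \\ w \end{pmatrix} = P^{-1} \begin{pmatrix} x \\ y \\ z \end{pmatrix}.
\]
Then
\[
\frac{d}{dt}\begin{pmatrix} u \\ v \\ w \end{pmatrix} = P^{-1} A P \begin{pmatrix} u \\ v \\ w \end{pmatrix} = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} u \\ v \\ w \end{pmatrix},
\]
or
\[
\frac{dw}{dt} = 2w, \qquad \frac{dv}{dt} = 2v + w, \qquad \frac{du}{dt} = 2u + v.
\]
The solution to this standard system of differential equations is
\[
w = A e^{2t}, \qquad v = (At + B) e^{2t}, \qquad u = (\tfrac{1}{2} A t^2 + Bt + C) e^{2t}
\]
for constants of integration $A$, $B$, $C$. These constants are found using the boundary conditions
\[
\begin{pmatrix} u(0) \\ v(0) \\ w(0) \end{pmatrix} = P^{-1} \begin{pmatrix} x(0) \\ y(0) \\ z(0) \end{pmatrix} = P^{-1} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} -1 \\ -1 \\ 3 \end{pmatrix}.
\]
So $A = 3$, $B = -1$, and $C = -1$. This gives the required solution
\[
\begin{aligned}
x &= \tfrac{3}{2} t^2 e^{2t} + 2t e^{2t} + e^{2t} \\
y &= -3t e^{2t} + e^{2t} \\
z &= -\tfrac{3}{2} t^2 e^{2t} + t e^{2t} + e^{2t}
\end{aligned}
\]
of the original differential equations.

14.4 Proof of the primary decomposition theorem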

Here we prove the primary decomposition theorem. As will be clear, if the theorem is taken on trust, the proof is not required in calculations. However, the proof is given here for those readers who like to see the complete story.

Suppose that we have a linear map $f : V \to V$ with minimum polynomial $m_f(x) = (x - \lambda_1)^{e_1} \cdots (x - \lambda_r)^{e_r}$, where $\lambda_1, \dots, \lambda_r$ are the distinct eigenvalues of $f$, and let $V_1, \dots, V_r$ be the generalized eigenspaces, defined by
\[
V_i = \ker((f - \lambda_i I)^{e_i}).
\]
We want to prove that $V$ is the direct sum of these generalized eigenspaces. Define $p(x) = (x - \lambda_1)^{e_1}$ and $q(x) = (x - \lambda_2)^{e_2} \cdots (x - \lambda_r)^{e_r}$, so that $m_f(x) = p(x)q(x)$ and $V_1 = \ker(p(f))$. Also define $W_1 = \ker(q(f))$. Our plan is to show that $V_1 \oplus W_1 = V$, and then use induction on $\dim V$ to obtain a decomposition $W_1 = \bigoplus_{i=2}^{r} V_i$. We shall use the result in Proposition 9.9 (or Proposition 9.14) which says there are polynomials $t(x)$ and $s(x)$ such that
\[
t(x)p(x) + s(x)q(x) = 1.
\]
We shall also use the fact from Section 9.2 that $p(f)q(f) = q(f)p(f)$, etc., throughout.

Lemma 14.14 $V_1 \oplus W_1 = V$.

Proof Let $v \in V$. Then
\[
v = Iv = (t(f)p(f) + s(f)q(f))v = t(f)p(f)v + s(f)q(f)v.
\]
But $t(f)p(f)v \in W_1$ since $q(f)(t(f)p(f)v) = t(f)(p(f)q(f)v) = 0$ as $p(f)q(f) = 0$. Similarly, $s(f)q(f)v \in V_1$, so $V = V_1 + W_1$.

To show that this sum is direct, suppose $v \in V_1$, $w \in W_1$, and $v + w = 0$. We must show $v = w = 0$. But $v + w = 0$ implies $p(f)v + p(f)w = 0$, and $p(f)v = 0$ as $v \in \ker p(f)$, so $p(f)w = 0$. But this means
\[
w = Iw = (t(f)p(f) + s(f)q(f))w = t(f)p(f)w + s(f)q(f)w = 0,
\]
as $q(f)w = 0$ since $w \in \ker q(f)$. So $w = 0$. We prove $v = 0$ in exactly the same way. $\square$

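Theorem 14.15 If $f : V \to V$ is a linear map with minimum polynomial
\[
m_f(x) = (x - \lambda_1)^{e_1} \cdots (x - \lambda_r)^{e_r},
\]
where $\lambda_1, \dots, \lambda_r$ are the (distinct) eigenvalues of $f$, let $V_1, \dots, V_r$ be the generalized eigenspaces, defined by
\[
V_i = \ker((f - \lambda_i I)^{e_i}).
\]
Then $V = V_1 \oplus \cdots \oplus V_r$.

Proof By induction on the number $r$ of distinct eigenvalues of $f$. If $r = 1$ there is nothing to prove. If $r > 1$, we have $V = V_1 \oplus W_1$, and we can define $g : W_1 \to W_1$ by $g(w) = f(w)$. We only need to check that $g(w) \in W_1$. Now $W_1 = \operatorname{im}(p(f))$: the inclusion $\operatorname{im}(p(f)) \subseteq \ker(q(f)) = W_1$ holds because $q(f)p(f) = 0$, and conversely any $w \in W_1$ satisfies $w = t(f)p(f)w + s(f)q(f)w = p(f)(t(f)w)$. So if $w \in W_1$ then $w = p(f)(v)$ for some $v \in V$, so $g(w) = f(w) = (f \circ p(f))(v) = (p(f) \circ f)(v) = p(f)(f(v)) \in \operatorname{im}(p(f)) = W_1$. Moreover, $W_1 = \ker(q(f))$, so $q(g)$ maps every vector in $W_1$ to 0. Therefore the minimal polynomial $m_g(x)$ of $g$ divides
\[
q(x) = (x - \lambda_2)^{e_2} \cdots (x - \lambda_r)^{e_r}.
\]
In fact $m_g = q$: substituting $f$ into $p(x)m_g(x)$ gives a map which sends every vector of $V = V_1 \oplus W_1$ to 0, so $p(x)m_g(x)$ is divisible by $m_f(x) = p(x)q(x)$. It follows that $g$ has the $r - 1$ eigenvalues $\lambda_2, \dots, \lambda_r$, and we can use induction to say that $W_1 = V_2 \oplus \cdots \oplus V_r$, so that $V = V_1 \oplus W_1 = V_1 \oplus V_2 \oplus \cdots \oplus V_r$. $\square$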

Let us look at an example in detail to see how this primary decomposition works. Let $V = \mathbb{R}^3$ and suppose that $f : V \to V$ is given by
\[
f\begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} -a - c \\ 4a - b + 7c \\ 4a + 3c \end{pmatrix},
\]
so that
\[
f^2\begin{pmatrix} a \\ b \\ c \end{pmatrix} = f\begin{pmatrix} -a - c \\ 4a - b + 7c \\ 4a + 3c \end{pmatrix}
= \begin{pmatrix} -(-a - c) - (4a + 3c) \\ 4(-a - c) - (4a - b + 7c) + 7(4a + 3c) \\ 4(-a - c) + 3(4a + 3c) \end{pmatrix}
= \begin{pmatrix} -3a - 2c \\ 20a + b + 10c \\ 8a + 5c \end{pmatrix}.
\]
Then $f$ is represented with respect to the standard basis by the matrix
\[
A = \begin{pmatrix} -1 & 0 & -1 \\ 4 & -1 & 7 \\ 4 & 0 & 3 \end{pmatrix}
\]
and the characteristic polynomial of $f$ is
\[
\chi_f(x) = \chi_A(x) = -(x - 1)^2 (x + 1).
\]
Therefore the minimal polynomial of $f$, $m_f(x) = m_A(x)$, is either $(x - 1)(x + 1)$ or $(x - 1)^2 (x + 1)$. But $(A - I)(A + I) = A^2 - I \neq 0$, so in fact $m_A(x) = (x - 1)^2 (x + 1)$. (Note: throughout these calculations you can work either with the matrix $A$ or with the linear map $f$.)

Now we want to take out one linear factor of $m_A(x)$, to the full power, say $p(x) = (x - 1)^2$, and $q(x)$ is what is left, namely $q(x) = (x + 1)$. By definition we have $m_f(x) = p(x)q(x)$. Substituting $f$ into this identity, we obtain $p(f)q(f) = m_f(f) = 0$, so all vectors get mapped to 0 by $p(f) \circ q(f)$. Of course, some vectors go to 0 under $p(f)$ on its own (these are exactly the elements of $\ker(p(f))$), and some go to 0 under $q(f)$ on its own (these vectors form $\ker(q(f))$).

We have $q(x) = x + 1$, so $q(f) = f + I$ is defined by $(q(f))(v) = f(v) + v$, i.e.
\[
q(f) : (a, b, c)^T \mapsto (-c,\ 4a + 7c,\ 4a + 4c)^T.
\]
Therefore the image of $q(f)$ is spanned by the vectors $(0, 4, 4)^T$ and $(-1, 7, 4)^T$, or to take a simpler basis, $\{(0, 1, 1)^T, (1, 0, 3)^T\}$. Moreover, the kernel of $q(f)$ consists of all vectors $(a, b, c)^T$ such that $(-c, 4a + 7c, 4a + 4c) = (0, 0, 0)$, in other words $\ker(q(f)) = \operatorname{span}((0, 1, 0)^T)$. It is easy to see now that in this example $V = \operatorname{im}(q(f)) \oplus \ker(q(f))$.

Similarly we have $p(x) = (x - 1)^2$, so $p(f) = f^2 - 2f + I$ and
\[
p(f) : (a, b, c)^T \mapsto (0,\ 12a + 4b - 4c,\ 0)^T.
\]
Therefore the image of $p(f)$ is spanned by $(0, 1, 0)^T$. Also, the kernel of $p(f)$ consists of all vectors $(a, b, c)^T$ satisfying $12a + 4b - 4c = 0$. This is clearly a two-dimensional space, spanned by $(1, 0, 3)^T$ and $(0, 1, 1)^T$. Thus we see that the image of $p(f)$ is equal to the kernel of $q(f)$, and vice versa. In the notation above we have $V_1 = \operatorname{span}((1, 0, 3)^T, (0, 1, 1)^T)$, $W_1 = \operatorname{span}((0, 1, 0)^T)$, and $V = V_1 \oplus W_1$.
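The polynomials $t(x)$ and $s(x)$ used in the proof of Lemma 14.14 can be made explicit in this example. One possible choice, found by hand for this sketch and not given in the text, is $t(x) = \tfrac{1}{4}$ and $s(x) = \tfrac{1}{4}(3 - x)$, since $\tfrac{1}{4}(x - 1)^2 + \tfrac{1}{4}(3 - x)(x + 1) = 1$. Assuming $A$ has rows $(-1, 0, -1)$, $(4, -1, 7)$, $(4, 0, 3)$, the code below forms $t(A)p(A)$ and $s(A)q(A)$ and checks that they split each vector into its $W_1$ and $V_1$ components. The helper names are invented for the illustration.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def combine(a, X, b, Y):
    """Return the matrix a*X + b*Y."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
ZERO = [[0, 0, 0] for _ in range(3)]
A = [[-1, 0, -1], [4, -1, 7], [4, 0, 3]]   # matrix assumed for the worked example

A_minus_I = combine(1, A, -1, I3)
p_A = matmul(A_minus_I, A_minus_I)          # p(A) = (A - I)^2
q_A = combine(1, A, 1, I3)                  # q(A) = A + I

quarter = Fraction(1, 4)
t_A = combine(quarter, I3, 0, I3)           # t(A) = (1/4) I
s_A = combine(3 * quarter, I3, -quarter, A) # s(A) = (3I - A)/4

proj_W1 = matmul(t_A, p_A)   # t(f)p(f): the component in W_1 = ker q(f)
proj_V1 = matmul(s_A, q_A)   # s(f)q(f): the component in V_1 = ker p(f)

print(combine(1, proj_V1, 1, proj_W1) == I3)   # the two components add up to v
print(matmul(p_A, proj_V1) == ZERO)            # the image of proj_V1 lies in ker p(f)
print(matmul(q_A, proj_W1) == ZERO)            # the image of proj_W1 lies in ker q(f)
```

Every column of `proj_W1` is a multiple of $(0, 1, 0)^T$, as expected from $W_1 = \operatorname{span}((0, 1, 0)^T)$.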

If we choose the new basis
\[
v_1 = (1, 0, 3)^T, \quad v_2 = (0, 1, 1)^T, \quad v_3 = (0, 1, 0)^T
\]
and write $f$ with respect to this basis, we see that
\[
\begin{aligned}
f(v_1) &= (-4, 25, 13)^T = -4v_1 + 25v_2 \\
f(v_2) &= (-1, 6, 3)^T = -v_1 + 6v_2 \\
f(v_3) &= (0, -1, 0)^T = -v_3
\end{aligned}
\]
so the matrix of $f$ with respect to the ordered basis $v_1, v_2, v_3$ is
\[
B = \begin{pmatrix} -4 & -1 & 0 \\ 25 & 6 & 0 \\ 0 & 0 & -1 \end{pmatrix},
\]
which is in block diagonal form: we have separated out the different eigenvalues into different blocks.

Exercises

Exercise 1 4 . 1 Using matrices

p

-1

=

�

c

-1

2

2

-1 -1 - 1 -2

2

compute p - 1 AP, where

A�

(�

-

)

;2

-1

P=

-3 - 5 1 -1

-2 -2

-2

-1

0 0 I 1 0 0 2

(� �)

)

3 . 3

�

What are the characteristic and minimum polynomials of $A$? Give bases for each of the generalized eigenspaces $\ker(A - \lambda I)^r$.

Exercise 14.2 For the matrix

A=

(-� � =;) -3

2

-2

1

compute a base-change matrix $P$ such that $P^{-1}AP$ is in Jordan normal form, as follows.

(a) Compute $\chi_A(x)$. Show that it is of the form $-(x - \lambda_1)(x - \lambda_2)^2$ for some distinct $\lambda_1, \lambda_2$.

(b) Find bases $u$ of $\ker(A - \lambda_1 I)$, $v_1$ of $\ker(A - \lambda_2 I)$, and $v_1, v_2$ of $\ker(A - \lambda_2 I)^2$.

(c) Working from first principles, explain why $(A - \lambda_2 I)v_2$ is a scalar multiple of $v_1$. [Hint: what is $(A - \lambda_2 I)(A - \lambda_2 I)v_2$?]

(d) Let $w_1 = u$, $w_2 = (A - \lambda_2 I)v_2$, and $w_3 = v_2$. What is the matrix of the linear transformation $A$ with respect to this basis? Write down the base-change matrix $P$.

Exercise 1 4 . 3 Repeat the last exercise (except for part ( c ) ) for the matrix

(This time $\chi_A(x) = (\lambda - x)^3$ for some $\lambda$. Find bases for $\ker(A - \lambda I)^n$ for $n = 1, 2, 3$.)

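Exercise 14.4 Solve

(a)
\[
\begin{aligned}
x_{n+1} &= 2y_n - z_n \\
y_{n+1} &= y_n \\
z_{n+1} &= x_n - 2y_n + 2z_n \\
x_0 &= y_0 = z_0 = 1
\end{aligned}
\]

(b)
\[
\begin{aligned}
x_{n+1} &= 5x_n - 3y_n - 5z_n + 5w_n \\
y_{n+1} &= x_n + y_n - z_n + w_n \\
z_{n+1} &= 2x_n - 2y_n - 2z_n + 3w_n \\
w_{n+1} &= x_n - y_n - 2z_n + 3w_n \\
x_0 &= y_0 = z_0 = w_0 = 1.
\end{aligned}
\]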

(

Exercise 14.5 Find bases for the primary components of the linear map represented by the matrix

A=

-1 1 1

3

0

-4

0

15

-2 - 7 0 3

J)

and hence find a matrix $P$ such that $P^{-1}AP$ is in block diagonal form.

Exercise 14.6 ( a) Let

Find the characteristic polynomial, eigenvalues, and the minimum polynomial of $A$, and find the algebraic and geometric multiplicities of each eigenvalue. Write down the Jordan normal form $J$ for $A$.

(b ) Do the same for the matrix B =

( � -=_12) . i

Exercise 14.7 Write down all possible Jordan normal forms for matrices with characteristic polynomial $(x - \lambda)^5$. In each case, calculate the minimum polynomial and the geometric multiplicity of the eigenvalue $\lambda$. Verify that this information determines the Jordan normal form.

Exercise 14.8 Do the same for $(x - \lambda)^6$.

Exercise 14.9 Show that there are two $7 \times 7$ matrices in Jordan normal form which have the same minimum polynomial and for which $\lambda$ has the same geometric multiplicity, but which are not similar.

Exercise 14.10 Show that the Jordan normal form of a $2 \times 2$ matrix is determined by its minimum polynomial, but that this is not true for $3 \times 3$ matrices.

Exercise 14.11 Show that the Jordan normal form of a $3 \times 3$ matrix is determined by its minimum polynomial and its characteristic polynomial, but that this is not true for $4 \times 4$ matrices.

Appendix A A theorem of analysis

If $I$ is an interval of the reals (e.g. $[a, b]$, $(a, b)$, $[a, b)$ where $a < b$ and $a, b$ are either real or $\pm\infty$), a function $f : I \to \mathbb{R}$ or $f : I \to \mathbb{C}$ is said to be continuous at a point $c \in I$ if for all $\varepsilon > 0$ from $\mathbb{R}$ there is $\delta > 0$ in $\mathbb{R}$ with

$|f(x) - f(c)|$
