Advanced Linear Algebra (Graduate Texts in Mathematics)


Author: Steven Roman



Steven Roman

Advanced Linear Algebra Third Edition

Steven Roman
8 Night Star
Irvine, CA 92603
USA

Editorial Board

S. Axler
Mathematics Department
San Francisco State University
San Francisco, CA 94132
USA

ISBN-13: 978-0-387-72828-5

K.A. Ribet
Mathematics Department
University of California at Berkeley
Berkeley, CA 94720-3840
USA

e-ISBN-13: 978-0-387-72831-5

Library of Congress Control Number: 2007934001
Mathematics Subject Classification (2000): 15-01

© 2008 Springer Science+Business Media, LLC

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper.
9 8 7 6 5 4 3 2 1
springer.com

To Donna and to Rashelle, Carol and Dan

Preface to the Third Edition

Let me begin by thanking the readers of the second edition for their many helpful comments and suggestions, with special thanks to Joe Kidd and Nam Trang. For the third edition, I have corrected all known errors, polished and refined some arguments (such as the discussion of reflexivity, the rational canonical form, best approximations and the definitions of tensor products) and upgraded some proofs that were originally done only for finite-dimensional/rank cases. I have also moved some of the material on projection operators to an earlier position in the text. A few new theorems have been added in this edition, including the spectral mapping theorem and a theorem to the effect that $\dim(V) \le \dim(V^*)$, with equality if and only if $V$ is finite-dimensional. I have also added a new chapter on associative algebras that includes the well-known characterizations of the finite-dimensional division algebras over the real field (a theorem of Frobenius) and over a finite field (Wedderburn's theorem). The reference section has been enlarged considerably, with over a hundred references to books on linear algebra.

Steven Roman

Irvine, California, May 2007

Preface to the Second Edition

Let me begin by thanking the readers of the first edition for their many helpful comments and suggestions. The second edition represents a major change from the first edition. Indeed, one might say that it is a totally new book, with the exception of the general range of topics covered. The text has been completely rewritten. I hope that an additional 12 years and roughly 20 books' worth of experience have enabled me to improve the quality of my exposition. Also, the exercise sets have been completely rewritten. The second edition contains two new chapters: a chapter on convexity, separation and positive solutions to linear systems (Chapter 15) and a chapter on the QR decomposition, singular values and pseudoinverses (Chapter 17). The treatments of tensor products and the umbral calculus have been greatly expanded and I have included discussions of determinants (in the chapter on tensor products), the complexification of a real vector space, Schur's theorem and Geršgorin disks.

Steven Roman

Irvine, California February 2005

Preface to the First Edition

This book is a thorough introduction to linear algebra, for the graduate or advanced undergraduate student. Prerequisites are limited to a knowledge of the basic properties of matrices and determinants. However, since we cover the basics of vector spaces and linear transformations rather rapidly, a prior course in linear algebra (even at the sophomore level), along with a certain measure of “mathematical maturity,” is highly desirable. Chapter 0 contains a summary of certain topics in modern algebra that are required for the sequel. This chapter should be skimmed quickly and then used primarily as a reference. Chapters 1–3 contain a discussion of the basic properties of vector spaces and linear transformations. Chapter 4 is devoted to a discussion of modules, emphasizing a comparison between the properties of modules and those of vector spaces. Chapter 5 provides more on modules. The main goals of this chapter are to prove that any two bases of a free module have the same cardinality and to introduce Noetherian modules. However, the instructor may simply skim over this chapter, omitting all proofs. Chapter 6 is devoted to the theory of modules over a principal ideal domain, establishing the cyclic decomposition theorem for finitely generated modules. This theorem is the key to the structure theorems for finite-dimensional linear operators, discussed in Chapters 7 and 8. Chapter 9 is devoted to real and complex inner product spaces. The emphasis here is on the finite-dimensional case, in order to arrive as quickly as possible at the finite-dimensional spectral theorem for normal operators, in Chapter 10. However, we have endeavored to state as many results as is convenient for vector spaces of arbitrary dimension. The second part of the book consists of a collection of independent topics, with the one exception that Chapter 13 requires Chapter 12. 
Chapter 11 is on metric vector spaces, where we describe the structure of symplectic and orthogonal geometries over various base fields. Chapter 12 contains enough material on metric spaces to allow a unified treatment of topological issues for the basic Hilbert space theory of Chapter 13. The rather lengthy proof that every metric space can be embedded in its completion may be omitted. Chapter 14 contains a brief introduction to tensor products. In order to motivate the universal property of tensor products, without getting too involved in categorical terminology, we first treat both free vector spaces and the familiar direct sum, in a universal way. Chapter 15 (Chapter 16 in the second edition) is on affine geometry, emphasizing algebraic, rather than geometric, concepts. The final chapter provides an introduction to a relatively new subject, called the umbral calculus. This is an algebraic theory used to study certain types of polynomial functions that play an important role in applied mathematics. We give only a brief introduction to the subject, emphasizing the algebraic aspects rather than the applications. This is the first time that this subject has appeared in a true textbook. One final comment. Unless otherwise mentioned, omission of a proof in the text is a tacit suggestion that the reader attempt to supply one.

Steven Roman

Irvine, California

Contents

Preface to the Third Edition
Preface to the Second Edition
Preface to the First Edition

Preliminaries
  Part 1: Preliminaries
  Part 2: Algebraic Structures

Part I: Basic Linear Algebra

1 Vector Spaces
  Vector Spaces
  Subspaces
  Direct Sums
  Spanning Sets and Linear Independence
  The Dimension of a Vector Space
  Ordered Bases and Coordinate Matrices
  The Row and Column Spaces of a Matrix
  The Complexification of a Real Vector Space
  Exercises

2 Linear Transformations
  Linear Transformations
  The Kernel and Image of a Linear Transformation
  Isomorphisms
  The Rank Plus Nullity Theorem
  Linear Transformations from $F^n$ to $F^m$
  Change of Basis Matrices
  The Matrix of a Linear Transformation
  Change of Bases for Linear Transformations
  Equivalence of Matrices
  Similarity of Matrices
  Similarity of Operators
  Invariant Subspaces and Reducing Pairs
  Projection Operators
  Topological Vector Spaces
  Linear Operators on $V^{\mathbb{C}}$
  Exercises

3 The Isomorphism Theorems
  Quotient Spaces
  The Universal Property of Quotients and the First Isomorphism Theorem
  Quotient Spaces, Complements and Codimension
  Additional Isomorphism Theorems
  Linear Functionals
  Dual Bases
  Reflexivity
  Annihilators
  Operator Adjoints
  Exercises

4 Modules I: Basic Properties
  Motivation
  Modules
  Submodules
  Spanning Sets
  Linear Independence
  Torsion Elements
  Annihilators
  Free Modules
  Homomorphisms
  Quotient Modules
  The Correspondence and Isomorphism Theorems
  Direct Sums and Direct Summands
  Modules Are Not as Nice as Vector Spaces
  Exercises

5 Modules II: Free and Noetherian Modules
  The Rank of a Free Module
  Free Modules and Epimorphisms
  Noetherian Modules
  The Hilbert Basis Theorem
  Exercises

6 Modules over a Principal Ideal Domain
  Annihilators and Orders
  Cyclic Modules
  Free Modules over a Principal Ideal Domain
  Torsion-Free and Free Modules
  The Primary Cyclic Decomposition Theorem
  The Invariant Factor Decomposition
  Characterizing Cyclic Modules
  Indecomposable Modules
  Exercises

7 The Structure of a Linear Operator
  The Module Associated with a Linear Operator
  The Primary Cyclic Decomposition of $V_\tau$
  The Characteristic Polynomial
  Cyclic and Indecomposable Modules
  The Big Picture
  The Rational Canonical Form
  Exercises

8 Eigenvalues and Eigenvectors
  Eigenvalues and Eigenvectors
  Geometric and Algebraic Multiplicities
  The Jordan Canonical Form
  Triangularizability and Schur's Theorem
  Diagonalizable Operators
  Exercises

9 Real and Complex Inner Product Spaces
  Norm and Distance
  Isometries
  Orthogonality
  Orthogonal and Orthonormal Sets
  The Projection Theorem and Best Approximations
  The Riesz Representation Theorem
  Exercises

10 Structure Theory for Normal Operators
  The Adjoint of a Linear Operator
  Orthogonal Projections
  Unitary Diagonalizability
  Normal Operators
  Special Types of Normal Operators
  Self-Adjoint Operators
  Unitary Operators and Isometries
  The Structure of Normal Operators
  Functional Calculus
  Positive Operators
  The Polar Decomposition of an Operator
  Exercises

Part II: Topics

11 Metric Vector Spaces: The Theory of Bilinear Forms
  Symmetric, Skew-Symmetric and Alternate Forms
  The Matrix of a Bilinear Form
  Quadratic Forms
  Orthogonality
  Linear Functionals
  Orthogonal Complements and Orthogonal Direct Sums
  Isometries
  Hyperbolic Spaces
  Nonsingular Completions of a Subspace
  The Witt Theorems: A Preview
  The Classification Problem for Metric Vector Spaces
  Symplectic Geometry
  The Structure of Orthogonal Geometries: Orthogonal Bases
  The Classification of Orthogonal Geometries: Canonical Forms
  The Orthogonal Group
  The Witt Theorems for Orthogonal Geometries
  Maximal Hyperbolic Subspaces of an Orthogonal Geometry
  Exercises

12 Metric Spaces
  The Definition
  Open and Closed Sets
  Convergence in a Metric Space
  The Closure of a Set
  Dense Subsets
  Continuity
  Completeness
  Isometries
  The Completion of a Metric Space
  Exercises

13 Hilbert Spaces
  A Brief Review
  Hilbert Spaces
  Infinite Series
  An Approximation Problem
  Hilbert Bases
  Fourier Expansions
  A Characterization of Hilbert Bases
  Hilbert Dimension
  A Characterization of Hilbert Spaces
  The Riesz Representation Theorem
  Exercises

14 Tensor Products
  Universality
  Bilinear Maps
  Tensor Products
  When Is a Tensor Product Zero?
  Coordinate Matrices and Rank
  Characterizing Vectors in a Tensor Product
  Defining Linear Transformations on a Tensor Product
  The Tensor Product of Linear Transformations
  Change of Base Field
  Multilinear Maps and Iterated Tensor Products
  Tensor Spaces
  Special Multilinear Maps
  Graded Algebras
  The Symmetric and Antisymmetric Tensor Algebras
  The Determinant
  Exercises

15 Positive Solutions to Linear Systems: Convexity and Separation
  Convex, Closed and Compact Sets
  Convex Hulls
  Linear and Affine Hyperplanes
  Separation
  Exercises

16 Affine Geometry
  Affine Geometry
  Affine Combinations
  Affine Hulls
  The Lattice of Flats
  Affine Independence
  Affine Transformations
  Projective Geometry
  Exercises

17 Singular Values and the Moore–Penrose Inverse
  Singular Values
  The Moore–Penrose Generalized Inverse
  Least Squares Approximation
  Exercises

18 An Introduction to Algebras
  Motivation
  Associative Algebras
  Division Algebras
  Exercises

19 The Umbral Calculus
  Formal Power Series
  The Umbral Algebra
  Formal Power Series as Linear Operators
  Sheffer Sequences
  Examples of Sheffer Sequences
  Umbral Operators and Umbral Shifts
  Continuous Operators on the Umbral Algebra
  Operator Adjoints
  Umbral Operators and Automorphisms of the Umbral Algebra
  Umbral Shifts and Derivations of the Umbral Algebra
  The Transfer Formulas
  A Final Remark
  Exercises

References
Index of Symbols
Index

Preliminaries

In this chapter, we briefly discuss some topics that are needed for the sequel. This chapter should be skimmed quickly and used primarily as a reference.

Part 1 Preliminaries

Multisets

The following simple concept is much more useful than its infrequent appearance would indicate.

Definition Let $S$ be a nonempty set. A multiset $M$ with underlying set $S$ is a set of ordered pairs

$$M = \{(s_i, n_i) \mid s_i \in S,\ n_i \in \mathbb{Z}^+,\ s_i \ne s_j \text{ for } i \ne j\}$$

where $\mathbb{Z}^+ = \{1, 2, \dots\}$. The number $n_i$ is referred to as the multiplicity of the element $s_i$ in $M$. If the underlying set of a multiset is finite, we say that the multiset is finite. The size of a finite multiset $M$ is the sum of the multiplicities of all of its elements.

For example, $M = \{(a, 2), (b, 3), (c, 1)\}$ is a multiset with underlying set $S = \{a, b, c\}$. The element $a$ has multiplicity $2$. One often writes out the elements of a multiset according to multiplicities, as in $M = \{a, a, b, b, b, c\}$.

Of course, two multisets are equal if their underlying sets are equal and if the multiplicity of each element in the common underlying set is the same in both multisets.
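As an illustration only (not part of the original text), a finite multiset can be sketched in Python with `collections.Counter`, which stores each element together with its multiplicity. The element names and multiplicities below are chosen purely for illustration.

```python
from collections import Counter

# A finite multiset stored as element -> multiplicity (all multiplicities positive).
M = Counter({"a": 2, "b": 3, "c": 1})

underlying_set = set(M)          # the underlying set S
size = sum(M.values())           # size = sum of the multiplicities
listed = sorted(M.elements())    # elements written out according to multiplicity

# Two multisets are equal when the underlying sets and all multiplicities agree;
# the order in which the elements are listed is irrelevant.
same = M == Counter(["b", "a", "b", "c", "a", "b"])
```

Here `size` is 6 and `listed` repeats each element as many times as its multiplicity.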

Matrices

The set of $m \times n$ matrices with entries in a field $F$ is denoted by $\mathcal{M}_{m,n}(F)$ or by $\mathcal{M}_{m,n}$ when the field does not require mention. The set $\mathcal{M}_{n,n}(F)$ is denoted by $\mathcal{M}_n(F)$ or $\mathcal{M}_n$. If $A \in \mathcal{M}$, the $(i,j)$th entry of $A$ will be denoted by $A_{i,j}$. The identity matrix of size $n \times n$ is denoted by $I_n$. The elements of the base field $F$ are called scalars. We expect that the reader is familiar with the basic properties of matrices, including matrix addition and multiplication.

The main diagonal of an $m \times n$ matrix $A$ is the sequence of entries

$$A_{1,1}, A_{2,2}, \dots, A_{k,k}$$

where $k = \min\{m, n\}$.

Definition The transpose of $A \in \mathcal{M}_{m,n}$ is the matrix $A^t$ defined by

$$(A^t)_{i,j} = A_{j,i}$$

A matrix $A$ is symmetric if $A = A^t$ and skew-symmetric if $A^t = -A$.

Theorem 0.1 (Properties of the transpose) Let $A, B \in \mathcal{M}_{m,n}$. Then
1) $(A^t)^t = A$
2) $(A + B)^t = A^t + B^t$
3) $(rA)^t = rA^t$ for all $r \in F$
4) $(AB)^t = B^t A^t$ provided that the product $AB$ is defined
5) $\det(A^t) = \det(A)$.
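The first, second and fourth properties of the transpose can be spot-checked numerically. The following is a minimal sketch in plain Python, with matrices stored as lists of rows; the helper names `transpose`, `matmul` and `add` and the sample matrices are illustrative assumptions, not notation from the text. A check on particular matrices is of course not a proof.

```python
def transpose(A):
    """Return the transpose: entry (i, j) of the result is entry (j, i) of A."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Product of an m x n matrix A and an n x k matrix B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(A, B):
    """Entrywise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
B = [[0, 1, 0], [7, -1, 2]]       # 2 x 3
C = [[1, 0], [2, 1], [0, 3]]      # 3 x 2, so the product AC is defined

double_transpose_ok = transpose(transpose(A)) == A
sum_rule_ok = transpose(add(A, B)) == add(transpose(A), transpose(B))
product_rule_ok = transpose(matmul(A, C)) == matmul(transpose(C), transpose(A))
```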

Partitioning and Matrix Multiplication

Let $M$ be a matrix of size $m \times n$. If $B \subseteq \{1, \dots, m\}$ and $C \subseteq \{1, \dots, n\}$, then the submatrix $M[B, C]$ is the matrix obtained from $M$ by keeping only the rows with index in $B$ and the columns with index in $C$. Thus, all other rows and columns are discarded and $M[B, C]$ has size $|B| \times |C|$.

Suppose that $M \in \mathcal{M}_{m,n}$ and $N \in \mathcal{M}_{n,k}$. Let
1) $\mathcal{P} = \{B_1, \dots, B_p\}$ be a partition of $\{1, \dots, m\}$
2) $\mathcal{Q} = \{C_1, \dots, C_q\}$ be a partition of $\{1, \dots, n\}$
3) $\mathcal{R} = \{D_1, \dots, D_r\}$ be a partition of $\{1, \dots, k\}$
(Partitions are defined formally later in this chapter.) Then it is a very useful fact that matrix multiplication can be performed at the block level as well as at the entry level. In particular, we have

$$[MN][B_i, D_j] = \sum_{C_h \in \mathcal{Q}} M[B_i, C_h]\, N[C_h, D_j]$$

When the partitions in question contain only single-element blocks, this is precisely the usual formula for matrix multiplication

$$[MN]_{i,j} = \sum_{h=1}^{n} M_{i,h} N_{h,j}$$
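The block-level formula can be checked on a small example. The sketch below is an illustration under stated assumptions: the matrices, the particular partitions (written with 0-based indices) and the helper names are all hypothetical choices, not taken from the text. Note that the blocks of a partition need not consist of consecutive indices.

```python
def matmul(A, B):
    """Product of an m x n matrix A and an n x k matrix B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(A, B):
    """Entrywise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def submatrix(M, rows, cols):
    """Keep only the rows with index in `rows` and the columns with index in `cols`."""
    return [[M[i][j] for j in cols] for i in rows]

M = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]                      # 3 x 4
N = [[1, 0], [0, 1], [2, 2], [1, -1]]      # 4 x 2

row_blocks = [[0, 1], [2]]       # partition of the row indices of M
mid_blocks = [[0, 2], [1, 3]]    # partition of the shared index set
col_blocks = [[0], [1]]          # partition of the column indices of N

MN = matmul(M, N)

def block_of_product(B, D):
    """Sum of M[B, C] N[C, D] over all blocks C of the middle partition."""
    total = None
    for C in mid_blocks:
        term = matmul(submatrix(M, B, C), submatrix(N, C, D))
        total = term if total is None else add(total, term)
    return total

blockwise_ok = all(
    block_of_product(B, D) == submatrix(MN, B, D)
    for B in row_blocks
    for D in col_blocks
)
```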


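Block Matrices

It will be convenient to introduce the notational device of a block matrix. If $B_{i,j}$ are matrices of the appropriate sizes, then by the block matrix

$$M = \begin{bmatrix} B_{1,1} & B_{1,2} & \cdots & B_{1,n} \\ \vdots & \vdots & & \vdots \\ B_{m,1} & B_{m,2} & \cdots & B_{m,n} \end{bmatrix}_{\text{block}}$$

we mean the matrix whose upper left submatrix is $B_{1,1}$, and so on. Thus, the $B_{i,j}$'s are submatrices of $M$ and not entries. A square matrix of the form

$$M = \begin{bmatrix} B_1 & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & B_n \end{bmatrix}_{\text{block}}$$

where each $B_i$ is square and $0$ is a zero submatrix, is said to be a block diagonal matrix.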

Elementary Row Operations

Recall that there are three types of elementary row operations. Type 1 operations consist of multiplying a row of $A$ by a nonzero scalar. Type 2 operations consist of interchanging two rows of $A$. Type 3 operations consist of adding a scalar multiple of one row of $A$ to another row of $A$.

If we perform an elementary operation of type $k$ to an identity matrix $I_n$, the result is called an elementary matrix of type $k$. It is easy to see that all elementary matrices are invertible.

In order to perform an elementary row operation on $A \in \mathcal{M}_{m,n}$ we can perform that operation on the identity $I_m$, to obtain an elementary matrix $E$, and then take the product $EA$. Note that multiplying on the right by $E$ has the effect of performing column operations.

Definition A matrix $R$ is said to be in reduced row echelon form if
1) All rows consisting only of $0$'s appear at the bottom of the matrix.
2) In any nonzero row, the first nonzero entry is a $1$. This entry is called a leading entry.
3) For any two consecutive rows, the leading entry of the lower row is to the right of the leading entry of the upper row.
4) Any column that contains a leading entry has $0$'s in all other positions.

Here are the basic facts concerning reduced row echelon form.


Theorem 0.2 Matrices $A, B \in \mathcal{M}_{m,n}$ are row equivalent, denoted by $A \sim B$, if either one can be obtained from the other by a series of elementary row operations.
1) Row equivalence is an equivalence relation. That is,
   a) $A \sim A$
   b) $A \sim B \Rightarrow B \sim A$
   c) $A \sim B$, $B \sim C \Rightarrow A \sim C$.
2) A matrix $A$ is row equivalent to one and only one matrix $R$ that is in reduced row echelon form. The matrix $R$ is called the reduced row echelon form of $A$. Furthermore,
   $$R = E_k \cdots E_1 A$$
   where $E_i$ are the elementary matrices required to reduce $A$ to reduced row echelon form.
3) $A$ is invertible if and only if its reduced row echelon form is an identity matrix. Hence, a matrix is invertible if and only if it is the product of elementary matrices.

The following definition is probably well known to the reader.

Definition A square matrix is upper triangular if all of its entries below the main diagonal are $0$. Similarly, a square matrix is lower triangular if all of its entries above the main diagonal are $0$. A square matrix is diagonal if all of its entries off the main diagonal are $0$.
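To make the reduction to reduced row echelon form concrete, here is a minimal sketch of Gauss-Jordan elimination over the rationals, using only the three types of elementary row operations described above. The function name `rref` and the sample matrices are illustrative assumptions; exact arithmetic with `fractions.Fraction` is used so that no rounding occurs.

```python
from fractions import Fraction

def rref(A):
    """Return the reduced row echelon form of A, using exact rational arithmetic."""
    R = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(R), len(R[0])
    pivot_row = 0
    for col in range(cols):
        # Look for a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, rows) if R[r][col] != 0), None)
        if pivot is None:
            continue
        # Type 2 operation: interchange two rows.
        R[pivot_row], R[pivot] = R[pivot], R[pivot_row]
        # Type 1 operation: scale the pivot row so that its leading entry is 1.
        lead = R[pivot_row][col]
        R[pivot_row] = [x / lead for x in R[pivot_row]]
        # Type 3 operations: clear every other entry in the pivot column.
        for r in range(rows):
            if r != pivot_row and R[r][col] != 0:
                factor = R[r][col]
                R[r] = [x - factor * y for x, y in zip(R[r], R[pivot_row])]
        pivot_row += 1
        if pivot_row == rows:
            break
    return R

singular = [[2, 4, 1], [1, 2, 0], [3, 6, 1]]
invertible = [[1, 2], [3, 4]]

R1 = rref(singular)      # has a zero row, so `singular` is not invertible
R2 = rref(invertible)    # the identity matrix, so `invertible` is invertible
```

In line with part 3 of Theorem 0.2, the second matrix reduces to the identity while the first does not.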

Determinants

We assume that the reader is familiar with the following basic properties of determinants.

Theorem 0.3 Let $A \in \mathcal{M}_{n,n}(F)$. Then $\det(A)$ is an element of $F$. Furthermore,
1) For any $B \in \mathcal{M}_n(F)$,
   $$\det(AB) = \det(A)\det(B)$$
2) $A$ is nonsingular (invertible) if and only if $\det(A) \ne 0$.
3) The determinant of an upper triangular or lower triangular matrix is the product of the entries on its main diagonal.
4) If a square matrix $M$ has the block diagonal form
   $$M = \begin{bmatrix} B_1 & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & B_n \end{bmatrix}_{\text{block}}$$
   then $\det(M) = \prod_{i=1}^{n} \det(B_i)$.


Polynomials

The set of all polynomials in the variable $x$ with coefficients from a field $F$ is denoted by $F[x]$. If $p(x) \in F[x]$, we say that $p(x)$ is a polynomial over $F$. If

$$p(x) = a_0 + a_1 x + \cdots + a_n x^n$$

is a polynomial with $a_n \ne 0$, then $a_n$ is called the leading coefficient of $p(x)$ and the degree of $p(x)$ is $n$, written $\deg p(x) = n$. For convenience, the degree of the zero polynomial is $-\infty$. A polynomial is monic if its leading coefficient is $1$.

Theorem 0.4 (Division algorithm) Let $f(x), g(x) \in F[x]$ where $\deg g(x) > 0$. Then there exist unique polynomials $q(x), r(x) \in F[x]$ for which

$$f(x) = q(x)g(x) + r(x)$$

where $r(x) = 0$ or $0 \le \deg r(x) < \deg g(x)$.

If $p(x)$ divides $q(x)$, that is, if there exists a polynomial $f(x)$ for which

$$q(x) = f(x)p(x)$$

then we write $p(x) \mid q(x)$. A nonzero polynomial $p(x) \in F[x]$ is said to split over $F$ if $p(x)$ can be written as a product of linear factors

$$p(x) = c(x - r_1) \cdots (x - r_n)$$

where $c, r_i \in F$.

Theorem 0.5 Let $f(x), g(x) \in F[x]$. The greatest common divisor of $f(x)$ and $g(x)$, denoted by $\gcd(f(x), g(x))$, is the unique monic polynomial $p(x)$ over $F$ for which
1) $p(x) \mid f(x)$ and $p(x) \mid g(x)$
2) if $r(x) \mid f(x)$ and $r(x) \mid g(x)$ then $r(x) \mid p(x)$.
Furthermore, there exist polynomials $a(x)$ and $b(x)$ over $F$ for which

$$\gcd(f(x), g(x)) = a(x)f(x) + b(x)g(x)$$

Definition The polynomials $f(x), g(x) \in F[x]$ are relatively prime if $\gcd(f(x), g(x)) = 1$. In particular, $f(x)$ and $g(x)$ are relatively prime if and only if there exist polynomials $a(x)$ and $b(x)$ over $F$ for which

$$a(x)f(x) + b(x)g(x) = 1$$

Definition A nonconstant polynomial $f(x) \in F[x]$ is irreducible if whenever $f(x) = p(x)q(x)$, then one of $p(x)$ and $q(x)$ must be constant.

The following two theorems support the view that irreducible polynomials behave like prime numbers.


Theorem 0.6 A nonconstant polynomial $f(x)$ is irreducible if and only if it has the property that whenever $f(x) \mid p(x)q(x)$, then either $f(x) \mid p(x)$ or $f(x) \mid q(x)$.

Theorem 0.7 Every nonconstant polynomial in $F[x]$ can be written as a product of irreducible polynomials. Moreover, this expression is unique up to order of the factors and multiplication by a scalar.
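The division algorithm and the greatest common divisor can be computed explicitly. What follows is a minimal sketch over the field of rational numbers, assuming a polynomial $a_0 + a_1 x + \cdots + a_n x^n$ is stored as the coefficient list `[a_0, ..., a_n]` and the zero polynomial as the empty list. The names `trim`, `divmod_poly` and `gcd_poly` are hypothetical helpers introduced here, and the gcd is found by the Euclidean algorithm, which is one standard method rather than something prescribed by the text.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients so that the last entry is the leading coefficient."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def divmod_poly(f, g):
    """Return (q, r) with f = q*g + r and either r = 0 or deg r < deg g."""
    f = trim([Fraction(c) for c in f])
    g = trim([Fraction(c) for c in g])
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 0)
    r = f
    while len(r) >= len(g):
        shift = len(r) - len(g)
        coeff = r[-1] / g[-1]
        q[shift] = coeff
        # Subtract coeff * x^shift * g(x), which cancels the leading term of r.
        for i, c in enumerate(g):
            r[shift + i] -= coeff * c
        r = trim(r)
    return trim(q), r

def gcd_poly(f, g):
    """Monic greatest common divisor, computed by the Euclidean algorithm."""
    f = trim([Fraction(c) for c in f])
    g = trim([Fraction(c) for c in g])
    while g:
        f, g = g, divmod_poly(f, g)[1]
    return [c / f[-1] for c in f] if f else []

f = [-1, 0, 0, 1]     # x^3 - 1
g = [-1, 0, 1]        # x^2 - 1
q, r = divmod_poly(f, g)
d = gcd_poly(f, g)
coprime = gcd_poly([1, 1], [-1, 1])    # x + 1 and x - 1
```

For these inputs the quotient is $x$, the remainder is $x - 1$, the gcd is $x - 1$, and $x + 1$ and $x - 1$ are relatively prime.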

Functions

To set our notation, we should make a few comments about functions.

Definition Let $f \colon S \to T$ be a function from a set $S$ to a set $T$.
1) The domain of $f$ is the set $S$ and the range of $f$ is $T$.
2) The image of $f$ is the set $\operatorname{im}(f) = \{f(s) \mid s \in S\}$.
3) $f$ is injective (one-to-one), or an injection, if $x \ne y \Rightarrow f(x) \ne f(y)$.
4) $f$ is surjective (onto $T$), or a surjection, if $\operatorname{im}(f) = T$.
5) $f$ is bijective, or a bijection, if it is both injective and surjective.
6) Assuming that $0 \in T$, the support of $f$ is
   $$\operatorname{supp}(f) = \{s \in S \mid f(s) \ne 0\}$$

If $f \colon S \to T$ is injective, then its inverse $f^{-1} \colon \operatorname{im}(f) \to S$ exists and is well-defined as a function on $\operatorname{im}(f)$.

It will be convenient to apply $f$ to subsets of $S$ and $T$. In particular, if $X \subseteq S$ and if $Y \subseteq T$, we set

$$f(X) = \{f(x) \mid x \in X\}$$

and

$$f^{-1}(Y) = \{s \in S \mid f(s) \in Y\}$$

Note that the latter is defined even if $f$ is not injective.

Let $f \colon S \to T$. If $A \subseteq S$, the restriction of $f$ to $A$ is the function $f|_A \colon A \to T$ defined by

$$f|_A(a) = f(a)$$

for all $a \in A$. Clearly, the restriction of an injective map is injective.

In the other direction, if $f \colon S \to T$ and if $S \subseteq U$, then an extension of $f$ to $U$ is a function $g \colon U \to T$ for which $g|_S = f$.
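For finite sets, these notions can be sketched directly in Python. The helper names and the sample function below are illustrative assumptions; a function is modeled as a Python callable together with an explicit finite domain.

```python
def image(f, X):
    """f(X) = { f(x) : x in X }."""
    return {f(x) for x in X}

def preimage(f, S, Y):
    """f^{-1}(Y) = { s in S : f(s) in Y }; no injectivity is required."""
    return {s for s in S if f(s) in Y}

def is_injective(f, S):
    """Distinct elements of S have distinct images exactly when no values collide."""
    return len(image(f, S)) == len(set(S))

def is_surjective(f, S, T):
    """f is onto T when its image is all of T."""
    return image(f, S) == set(T)

def support(f, S):
    """Elements of S at which f is nonzero."""
    return {s for s in S if f(s) != 0}

S = {-2, -1, 0, 1, 2}
T = {0, 1, 4}
A = {0, 1, 2}                    # a subset of S

def square(s):
    return s * s

onto = is_surjective(square, S, T)          # square maps S onto T
one_to_one = is_injective(square, S)        # fails, since (-1)^2 == 1^2
restricted_one_to_one = is_injective(square, A)
fibre = preimage(square, S, {4})
```

The restriction of `square` to `A` is injective even though `square` is not injective on all of `S`, which shows that restricting a function can gain injectivity but, as noted above, never loses it.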


Equivalence Relations

The concept of an equivalence relation plays a major role in the study of matrices and linear transformations.

Definition Let $S$ be a nonempty set. A binary relation $\sim$ on $S$ is called an equivalence relation on $S$ if it satisfies the following conditions:
1) (Reflexivity) $a \sim a$ for all $a \in S$.
2) (Symmetry) $a \sim b \Rightarrow b \sim a$ for all $a, b \in S$.
3) (Transitivity) $a \sim b$, $b \sim c \Rightarrow a \sim c$ for all $a, b, c \in S$.

Definition Let $\sim$ be an equivalence relation on $S$. For $a \in S$, the set of all elements equivalent to $a$ is denoted by

$$[a] = \{b \in S \mid b \sim a\}$$

and called the equivalence class of $a$.

Theorem 0.8 Let $\sim$ be an equivalence relation on $S$. Then
1) $b \in [a] \Leftrightarrow a \in [b] \Leftrightarrow [a] = [b]$
2) For any $a, b \in S$, we have either $[a] = [b]$ or $[a] \cap [b] = \emptyset$.

Definition A partition of a nonempty set $S$ is a collection $\{A_1, \dots, A_n\}$ of nonempty subsets of $S$, called the blocks of the partition, for which
1) $A_i \cap A_j = \emptyset$ for all $i \ne j$
2) $S = A_1 \cup \cdots \cup A_n$.

The following theorem sheds considerable light on the concept of an equivalence relation.

Theorem 0.9
1) Let $\sim$ be an equivalence relation on $S$. Then the distinct equivalence classes with respect to $\sim$ are the blocks of a partition of $S$.
2) Conversely, if $\mathcal{P}$ is a partition of $S$, the binary relation $\sim$ defined by

   $a \sim b$ if $a$ and $b$ lie in the same block of $\mathcal{P}$

   is an equivalence relation on $S$, whose equivalence classes are the blocks of $\mathcal{P}$.
This establishes a one-to-one correspondence between equivalence relations on $S$ and partitions of $S$.

The most important problem related to equivalence relations is that of finding an efficient way to determine when two elements are equivalent. Unfortunately, in most cases, the definition does not provide an efficient test for equivalence and so we are led to the following concepts.

Definition Let $\sim$ be an equivalence relation on $S$. A function $f \colon S \to T$, where $T$ is any set, is called an invariant of $\sim$ if it is constant on the equivalence classes of $\sim$, that is,

$$a \sim b \Rightarrow f(a) = f(b)$$

and a complete invariant if it is constant and distinct on the equivalence classes of $\sim$, that is,

$$a \sim b \Leftrightarrow f(a) = f(b)$$

A collection $\{f_1, \dots, f_n\}$ of invariants is called a complete system of invariants if

$$a \sim b \Leftrightarrow f_i(a) = f_i(b) \text{ for all } i = 1, \dots, n$$
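The passage from an equivalence relation to a partition, and the difference between an invariant and a complete invariant, can be illustrated on a finite set. The sketch below uses congruence modulo 6 on the integers 0 through 11; this relation, the helper names and the candidate invariants are assumptions chosen for illustration and do not come from the text.

```python
def equivalence_classes(S, related):
    """Partition the finite collection S into the equivalence classes of `related`."""
    blocks = []
    for s in S:
        for block in blocks:
            # By symmetry and transitivity, testing one representative suffices.
            if related(s, block[0]):
                block.append(s)
                break
        else:
            blocks.append([s])
    return blocks

def is_invariant(f, S, related):
    """f is constant on equivalence classes: a ~ b implies f(a) == f(b)."""
    return all(f(a) == f(b) for a in S for b in S if related(a, b))

def is_complete_invariant(f, S, related):
    """f separates the classes as well: a ~ b if and only if f(a) == f(b)."""
    return all((f(a) == f(b)) == related(a, b) for a in S for b in S)

S = range(12)

def congruent_mod_6(a, b):
    return (a - b) % 6 == 0

def mod_3(a):
    return a % 3

def mod_6(a):
    return a % 6

blocks = equivalence_classes(S, congruent_mod_6)
```

Here `mod_3` is an invariant that is not complete (0 and 3 receive the same value but are not equivalent), while `mod_6` is a complete invariant.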

Definition Let $\sim$ be an equivalence relation on $S$. A subset $C \subseteq S$ is said to be a set of canonical forms (or just a canonical form) for $\sim$ if for every $s \in S$, there is exactly one $c \in C$ such that $c \sim s$. Put another way, each equivalence class under $\sim$ contains exactly one member of $C$.

Example 0.1 Define a binary relation $\sim$ on $F[x]$ by letting $p(x) \sim q(x)$ if and only if $p(x) = a\,q(x)$ for some nonzero constant $a \in F$. This is easily seen to be an equivalence relation. The function that assigns to each polynomial its degree is an invariant, since

$$p(x) \sim q(x) \Rightarrow \deg(p(x)) = \deg(q(x))$$

However, it is not a complete invariant, since there are inequivalent polynomials with the same degree. The set of all monic polynomials is a set of canonical forms for this equivalence relation.

Example 0.2 We have remarked that row equivalence is an equivalence relation on $\mathcal{M}_{m,n}(F)$. Moreover, the subset of reduced row echelon form matrices is a set of canonical forms for row equivalence, since every matrix is row equivalent to a unique matrix in reduced row echelon form.


Example 0.3 Two matrices $A, B \in \mathcal{M}_{m,n}(F)$ are row equivalent if and only if there is an invertible matrix $P$ such that $A = PB$. Similarly, $A$ and $B$ are column equivalent, that is, $A$ can be reduced to $B$ using elementary column operations, if and only if there exists an invertible matrix $Q$ such that $A = BQ$.

Two matrices $A$ and $B$ are said to be equivalent if there exist invertible matrices $P$ and $Q$ for which

$$A = PBQ$$

Put another way, $A$ and $B$ are equivalent if $A$ can be reduced to $B$ by performing a series of elementary row and/or column operations. (The use of the term equivalent is unfortunate, since it applies to all equivalence relations, not just this one. However, the terminology is standard, so we use it here.)

It is not hard to see that an $m \times n$ matrix $R$ that is in both reduced row echelon form and reduced column echelon form must have the block form

$$J_k = \begin{bmatrix} I_k & 0_{k,n-k} \\ 0_{m-k,k} & 0_{m-k,n-k} \end{bmatrix}_{\text{block}}$$

We leave it to the reader to show that every matrix $A$ in $\mathcal{M}_{m,n}$ is equivalent to exactly one matrix of the form $J_k$ and so the set of these matrices is a set of canonical forms for equivalence. Moreover, the function $f$ defined by $f(A) = k$, where $A \sim J_k$, is a complete invariant for equivalence.

Since the rank of $J_k$ is $k$ and since neither row nor column operations affect the rank, we deduce that the rank of $A$ is $k$. Hence, rank is a complete invariant for equivalence. In other words, two matrices are equivalent if and only if they have the same rank.

Example 0.4 Two matrices $A, B \in \mathcal{M}_n(F)$ are said to be similar if there exists an invertible matrix $P$ such that

$$A = PBP^{-1}$$

Similarity is easily seen to be an equivalence relation on $\mathcal{M}_n$. As we will learn, two matrices are similar if and only if they represent the same linear operators on a given $n$-dimensional vector space $V$. Hence, similarity is extremely important for studying the structure of linear operators. One of the main goals of this book is to develop canonical forms for similarity.

We leave it to the reader to show that the determinant function and the trace function are invariants for similarity. However, these two invariants do not, in general, form a complete system of invariants.

Example 0.5 Two matrices $A, B \in \mathcal{M}_n(F)$ are said to be congruent if there exists an invertible matrix $P$ for which

$$A = PBP^t$$

where $P^t$ is the transpose of $P$. This relation is easily seen to be an equivalence relation and we will devote some effort to finding canonical forms for congruence. For some base fields $F$ (such as $\mathbb{R}$, $\mathbb{C}$ or a finite field), this is relatively easy to do, but for other base fields (such as $\mathbb{Q}$), it is extremely difficult.
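Returning to Example 0.4, a small worked example (added here for illustration, with matrices chosen by us rather than taken from the text) shows why the determinant and trace together are not a complete system of invariants for similarity. The two matrices below share both invariants, yet the identity matrix is similar only to itself.

```latex
A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
B = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad
\operatorname{tr}(A) = \operatorname{tr}(B) = 2, \qquad
\det(A) = \det(B) = 1.

% Since A is the identity matrix, conjugating it by any invertible P returns A:
P A P^{-1} = P I_2 P^{-1} = I_2 \neq B
\quad \text{for every invertible } P \in \mathcal{M}_2(F).
```

Hence $A$ and $B$ are not similar, although the trace and the determinant cannot tell them apart.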

Zorn's Lemma

In order to show that any vector space has a basis, we require a result known as Zorn's lemma. To state this lemma, we need some preliminary definitions.

Definition A partially ordered set is a pair $(P, \le)$ where $P$ is a nonempty set and $\le$ is a binary relation called a partial order, read “less than or equal to,” with the following properties:
1) (Reflexivity) For all $a \in P$, $a \le a$
2) (Antisymmetry) For all $a, b \in P$, $a \le b$ and $b \le a$ implies $a = b$
3) (Transitivity) For all $a, b, c \in P$, $a \le b$ and $b \le c$ implies $a \le c$
Partially ordered sets are also called posets.

It is customary to use a phrase such as “Let $P$ be a partially ordered set” when the partial order is understood. Here are some key terms related to partially ordered sets.

Definition Let $P$ be a partially ordered set.
1) The maximum (largest, top) element of $P$, should it exist, is an element $M \in P$ with the property that all elements of $P$ are less than or equal to $M$, that is,
   $$p \in P \Rightarrow p \le M$$
   Similarly, the minimum (least, smallest, bottom) element of $P$, should it exist, is an element $N \in P$ with the property that all elements of $P$ are greater than or equal to $N$, that is,
   $$p \in P \Rightarrow N \le p$$
2) A maximal element is an element $m \in P$ with the property that there is no larger element in $P$, that is,
   $$p \in P,\ m \le p \Rightarrow m = p$$
   Similarly, a minimal element is an element $n \in P$ with the property that there is no smaller element in $P$, that is,
   $$p \in P,\ p \le n \Rightarrow p = n$$
3) Let $a, b \in P$. Then $u \in P$ is an upper bound for $a$ and $b$ if
   $$a \le u \text{ and } b \le u$$
   The unique smallest upper bound for $a$ and $b$, if it exists, is called the least upper bound of $a$ and $b$ and is denoted by $\operatorname{lub}\{a, b\}$.
4) Let $a, b \in P$. Then $\ell \in P$ is a lower bound for $a$ and $b$ if
   $$\ell \le a \text{ and } \ell \le b$$
   The unique largest lower bound for $a$ and $b$, if it exists, is called the greatest lower bound of $a$ and $b$ and is denoted by $\operatorname{glb}\{a, b\}$.

Let $S$ be a subset of a partially ordered set $P$. We say that an element $u \in P$ is an upper bound for $S$ if $s \le u$ for all $s \in S$. Lower bounds are defined similarly.

Note that in a partially ordered set, it is possible that not all elements are comparable. In other words, it is possible to have $x, y \in P$ with the property that $x \not\le y$ and $y \not\le x$.

Definition A partially ordered set in which every pair of elements is comparable is called a totally ordered set, or a linearly ordered set. Any totally ordered subset of a partially ordered set $P$ is called a chain in $P$.

Example 0.6
1) The set $\mathbb{R}$ of real numbers, with the usual binary relation $\le$, is a partially ordered set. It is also a totally ordered set. It has no maximal elements.
2) The set $\mathbb{N} = \{0, 1, \dots\}$ of natural numbers, together with the binary relation of divides, is a partially ordered set. It is customary to write $n \mid m$ to indicate that $n$ divides $m$. The subset $S$ of $\mathbb{N}$ consisting of all powers of $2$ is a totally ordered subset of $\mathbb{N}$, that is, it is a chain in $\mathbb{N}$. The set $P = \{2, 4, 8, 3, 9, 27\}$ is a partially ordered set under $\mid$. It has two maximal elements, namely $8$ and $27$. The subset $Q = \{2, 3, 5, 7, 11\}$ is a partially ordered set in which every element is both maximal and minimal!
3) Let $S$ be any set and let $\mathcal{P}(S)$ be the power set of $S$, that is, the set of all subsets of $S$. Then $\mathcal{P}(S)$, together with the subset relation $\subseteq$, is a partially ordered set.

Now we can state Zorn's lemma, which gives a condition under which a partially ordered set has a maximal element.
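The distinction between a maximum element and a maximal element is easy to explore on a finite poset. The following sketch orders small sets of positive integers by divisibility; the sets and the helper names are illustrative choices, and the order relation is passed in as a function so that other partial orders could be substituted.

```python
def divides(a, b):
    """a | b for positive integers a and b."""
    return b % a == 0

def maximal_elements(P, leq):
    """Elements m with no strictly larger element: m <= p implies m == p."""
    return [m for m in P if all(p == m or not leq(m, p) for p in P)]

def minimal_elements(P, leq):
    """Elements n with no strictly smaller element: p <= n implies p == n."""
    return [n for n in P if all(p == n or not leq(p, n) for p in P)]

def maximum(P, leq):
    """The maximum element (p <= M for every p in P) if it exists, else None."""
    tops = [M for M in P if all(leq(p, M) for p in P)]
    return tops[0] if tops else None

def is_chain(C, leq):
    """Every pair of elements of C is comparable."""
    return all(leq(a, b) or leq(b, a) for a in C for b in C)

P = [2, 4, 8, 3, 9, 27]
Q = [2, 3, 5, 7, 11]
powers_of_two = [1, 2, 4, 8, 16]

P_maximal = maximal_elements(P, divides)
P_maximum = maximum(P, divides)
Q_maximal = maximal_elements(Q, divides)
Q_minimal = minimal_elements(Q, divides)
```

The set `P` has two maximal elements but no maximum, while in `Q` every element is both maximal and minimal because no two distinct elements are comparable.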



Theorem 0.10 (Zorn's lemma) If $P$ is a partially ordered set in which every chain has an upper bound, then $P$ has a maximal element.

We will use Zorn's lemma to prove that every vector space has a basis. Zorn's lemma is equivalent to the famous axiom of choice. As such, it is not subject to proof from the other axioms of ordinary (ZF) set theory. Zorn's lemma has many important equivalents, one of which is the well-ordering principle. A well ordering on a nonempty set $X$ is a total order on $X$ with the property that every nonempty subset of $X$ has a least element.

Theorem 0.11 (Well-ordering principle) Every nonempty set has a well ordering.
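The difference between a maximal element and a maximum element in the divisibility order of Example 0.6 can be checked mechanically for finite sets. The following is a minimal sketch, not part of the text: the function names are hypothetical, and the sets `P` and `Q` are illustrative choices matching the description in that example (two maximal elements in one, every element maximal in the other).

```python
def divides(a, b):
    """Return True when a divides b (the partial order a | b on positive integers)."""
    return b % a == 0

def maximal_elements(poset, leq):
    """Elements m with no strictly larger element: leq(m, p) forces p == m."""
    return [m for m in poset if all(p == m or not leq(m, p) for p in poset)]

def maximum_element(poset, leq):
    """The element that lies above every element, or None when no such element exists."""
    tops = [m for m in poset if all(leq(p, m) for p in poset)]
    return tops[0] if tops else None

# Illustrative finite posets under divisibility (assumed values).
P = [2, 4, 8, 3, 9, 27]
Q = [2, 3, 5, 7, 11]

P_maximal = maximal_elements(P, divides)   # several maximal elements
P_maximum = maximum_element(P, divides)    # but no maximum element
Q_maximal = maximal_elements(Q, divides)   # every element is maximal
```

A poset can therefore have maximal elements without having a maximum, which is why Zorn's lemma is stated for maximal elements.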

Cardinality

Two sets $S$ and $T$ have the same cardinality, written $|S| = |T|$, if there is a bijective function (a one-to-one correspondence) between the sets. The reader is probably aware of the fact that $|\mathbb{Z}| = |\mathbb{N}|$ and $|\mathbb{Q}| = |\mathbb{N}|$, where $\mathbb{N}$ denotes the natural numbers, $\mathbb{Z}$ the integers and $\mathbb{Q}$ the rational numbers.

If $S$ is in one-to-one correspondence with a subset of $T$, we write $|S| \le |T|$. If $S$ is in one-to-one correspondence with a proper subset of $T$ but not with all of $T$, then we write $|S| < |T|$. The second condition is necessary, since, for instance, $\mathbb{N}$ is in one-to-one correspondence with a proper subset of $\mathbb{Z}$ and yet $\mathbb{N}$ is also in one-to-one correspondence with $\mathbb{Z}$ itself. Hence, $|\mathbb{N}| = |\mathbb{Z}|$.

This is not the place to enter into a detailed discussion of cardinal numbers. The intention here is that the cardinality of a set, whatever that is, represents the “size” of the set. It is actually easier to talk about two sets having the same, or different, size (cardinality) than it is to explicitly define the size (cardinality) of a given set. Be that as it may, we associate to each set $S$ a cardinal number, denoted by $|S|$ or $\operatorname{card}(S)$, that is intended to measure the size of the set. Actually, cardinal numbers are just very special types of sets. However, we can simply think of them as vague amorphous objects that measure the size of sets.

Definition
1) A set is finite if it can be put in one-to-one correspondence with a set of the form $\mathbb{Z}_n = \{0, 1, \dots, n-1\}$, for some nonnegative integer $n$. A set that is


not finite is infinite. The cardinal number (or cardinality) of a finite set is just the number of elements in the set.
2) The cardinal number of the set $\mathbb{N}$ of natural numbers is $\aleph_0$ (read “aleph nought”), where $\aleph$ is the first letter of the Hebrew alphabet. Hence, $|\mathbb{N}| = |\mathbb{Z}| = |\mathbb{Q}| = \aleph_0$.
3) Any set with cardinality $\aleph_0$ is called a countably infinite set and any finite or countably infinite set is called a countable set. An infinite set that is not countable is said to be uncountable.

Since it can be shown that $|\mathbb{R}| > |\mathbb{N}|$, the real numbers are uncountable.

If $S$ and $T$ are finite sets, then it is well known that $|S| \le |T|$ and $|T| \le |S|$ $\Rightarrow$ $|S| = |T|$. The first part of the next theorem tells us that this is also true for infinite sets.

The reader will no doubt recall that the power set $\mathcal{P}(S)$ of a set $S$ is the set of all subsets of $S$. For finite sets, the power set of $S$ is always bigger than the set itself. In fact, $|S| = n \Rightarrow |\mathcal{P}(S)| = 2^n$. The second part of the next theorem says that the power set of any set $S$ is bigger (has larger cardinality) than $S$ itself. On the other hand, the third part of this theorem says that, for infinite sets $S$, the set of all finite subsets of $S$ is the same size as $S$.

Theorem 0.12
1) (Schröder–Bernstein theorem) For any sets $S$ and $T$, $|S| \le |T|$ and $|T| \le |S|$ $\Rightarrow$ $|S| = |T|$
2) (Cantor's theorem) If $\mathcal{P}(S)$ denotes the power set of $S$, then $|S| < |\mathcal{P}(S)|$
3) If $\mathcal{P}_0(S)$ denotes the set of all finite subsets of $S$ and if $S$ is an infinite set, then $|S| = |\mathcal{P}_0(S)|$

Proof. We prove only parts 1) and 2). Let $f\colon S \to T$ be an injective function from $S$ into $T$ and let $g\colon T \to S$ be an injective function from $T$ into $S$. We want to use these functions to create a bijective function from $S$ to $T$. For this purpose, we make the following definitions. The descendants of an element $s \in S$ are the elements obtained by repeated alternate applications of the functions $f$ and $g$, namely

$f(s),\ g(f(s)),\ f(g(f(s))),\ \dots$

If $t$ is a descendant of $s$, then $s$ is an ancestor of $t$. Descendants and ancestors of elements of $T$ are defined similarly.

Now, by tracing an element's ancestry to its beginning, we find that there are three possibilities: the element may originate in $S$, or in $T$, or it may have no point of origin. Accordingly, we can write $S$ as the union of three disjoint sets

$\mathcal{S}_S = \{s \in S \mid s \text{ originates in } S\}$
$\mathcal{S}_T = \{s \in S \mid s \text{ originates in } T\}$
$\mathcal{S}_\infty = \{s \in S \mid s \text{ has no originator}\}$

Similarly, $T$ is the disjoint union of $\mathcal{T}_S$, $\mathcal{T}_T$ and $\mathcal{T}_\infty$.

Now, the restriction $f|_{\mathcal{S}_S}\colon \mathcal{S}_S \to \mathcal{T}_S$ is a bijection. To see this, note that if $t \in \mathcal{T}_S$, then $t$ originated in $S$ and therefore must have the form $f(s)$ for some $s \in S$. But $t$ and its ancestor $s$ have the same point of origin and so $t \in \mathcal{T}_S$ implies $s \in \mathcal{S}_S$. Thus, $f|_{\mathcal{S}_S}$ is surjective and hence bijective. We leave it to the reader to show that the functions $(g|_{\mathcal{T}_T})^{-1}\colon \mathcal{S}_T \to \mathcal{T}_T$ and $f|_{\mathcal{S}_\infty}\colon \mathcal{S}_\infty \to \mathcal{T}_\infty$ are also bijections. Putting these three bijections together gives a bijection between $S$ and $T$. Hence, $|S| = |T|$, as desired.

We now prove Cantor's theorem. The map $\iota\colon S \to \mathcal{P}(S)$ defined by $\iota(s) = \{s\}$ is an injection from $S$ to $\mathcal{P}(S)$ and so $|S| \le |\mathcal{P}(S)|$. To complete the proof we must show that no injective map $f\colon S \to \mathcal{P}(S)$ can be surjective. To this end, let

$X = \{s \in S \mid s \notin f(s)\} \in \mathcal{P}(S)$

We claim that $X$ is not in $\operatorname{im}(f)$. For suppose that $X = f(x)$ for some $x \in S$. Then if $x \in X$, we have by the definition of $X$ that $x \notin X$. On the other hand, if $x \notin X$, we have again by the definition of $X$ that $x \in X$. This contradiction implies that $X \notin \operatorname{im}(f)$ and so $f$ is not surjective.

Cardinal Arithmetic

Now let us define addition, multiplication and exponentiation of cardinal numbers. If $S$ and $T$ are sets, the cartesian product $S \times T$ is the set of all ordered pairs

$S \times T = \{(s, t) \mid s \in S,\ t \in T\}$

The set of all functions from $T$ to $S$ is denoted by $S^T$.


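Definition Let $\kappa$ and $\lambda$ denote cardinal numbers. Let $S$ and $T$ be disjoint sets for which $|S| = \kappa$ and $|T| = \lambda$.
1) The sum $\kappa + \lambda$ is the cardinal number of $S \cup T$.
2) The product $\kappa\lambda$ is the cardinal number of $S \times T$.
3) The power $\kappa^\lambda$ is the cardinal number of $S^T$.

We will not go into the details of why these definitions make sense. (For instance, they seem to depend on the sets $S$ and $T$, but in fact they do not.) It can be shown, using these definitions, that cardinal addition and multiplication are associative and commutative and that multiplication distributes over addition.

Theorem 0.13 Let $\kappa$, $\lambda$ and $\mu$ be cardinal numbers. Then the following properties hold:
1) (Associativity) $\kappa + (\lambda + \mu) = (\kappa + \lambda) + \mu$ and $\kappa(\lambda\mu) = (\kappa\lambda)\mu$
2) (Commutativity) $\kappa + \lambda = \lambda + \kappa$ and $\kappa\lambda = \lambda\kappa$
3) (Distributivity) $\kappa(\lambda + \mu) = \kappa\lambda + \kappa\mu$
4) (Properties of Exponents)
a) $\kappa^{\lambda + \mu} = \kappa^\lambda \kappa^\mu$
b) $(\kappa^\lambda)^\mu = \kappa^{\lambda\mu}$
c) $(\kappa\lambda)^\mu = \kappa^\mu \lambda^\mu$

On the other hand, the arithmetic of cardinal numbers can seem a bit strange, as the next theorem shows.

Theorem 0.14 Let $\kappa$ and $\lambda$ be nonzero cardinal numbers, at least one of which is infinite. Then $\kappa + \lambda = \kappa\lambda = \max\{\kappa, \lambda\}$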

It is not hard to see that there is a one-to-one correspondence between the power set $\mathcal{P}(S)$ of a set $S$ and the set of all functions from $S$ to $\{0, 1\}$. This leads to the following theorem.

Theorem 0.15 For any cardinal $\kappa$
1) If $|S| = \kappa$, then $|\mathcal{P}(S)| = 2^\kappa$
2) $\kappa < 2^\kappa$
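For a small finite set, both the correspondence between subsets and $\{0, 1\}$-valued functions and the diagonal construction used in the proof of Cantor's theorem can be verified exhaustively. The following is a minimal sketch under that finiteness assumption; the helper names (`power_set`, `indicator`, `diagonal_set`) are hypothetical and a map $f\colon S \to \mathcal{P}(S)$ is stored as a dictionary.

```python
from itertools import combinations, product

def power_set(S):
    """All subsets of S, as frozensets."""
    items = list(S)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def indicator(subset, S):
    """The function S -> {0, 1} attached to a subset, stored as a tuple of values."""
    return tuple(1 if s in subset else 0 for s in S)

def diagonal_set(f):
    """Cantor's set X = {s in S : s not in f(s)} for a map f: S -> P(S) given as a dict."""
    return frozenset(s for s in f if s not in f[s])

S = (0, 1, 2)
subsets = power_set(S)

# Distinct subsets give distinct indicator functions, so |P(S)| = 2^|S|.
indicator_count = len({indicator(A, S) for A in subsets})

# Check every map f: S -> P(S): the diagonal set never appears among the values of f.
never_hit = all(
    diagonal_set(dict(zip(S, images))) not in images
    for images in product(subsets, repeat=len(S))
)
```

The exhaustive check only covers one three-element set; the proof in the text is what handles arbitrary sets.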



We have already observed that $|\mathbb{N}| = \aleph_0$. It can be shown that $\aleph_0$ is the smallest infinite cardinal, that is, $\kappa < \aleph_0 \Rightarrow \kappa$ is a natural number.

It can also be shown that the set $\mathbb{R}$ of real numbers is in one-to-one correspondence with the power set $\mathcal{P}(\mathbb{N})$ of the natural numbers. Therefore, $|\mathbb{R}| = 2^{\aleph_0}$. The set of all points on the real line is sometimes called the continuum and so $2^{\aleph_0}$ is sometimes called the power of the continuum and denoted by $c$.

Theorem 0.14 shows that cardinal addition and multiplication have a kind of “absorption” quality, which makes it hard to produce larger cardinals from smaller ones. The next theorem demonstrates this more dramatically.

Theorem 0.16
1) Addition applied a countable number of times or multiplication applied a finite number of times to the cardinal number $\aleph_0$ does not yield anything more than $\aleph_0$. Specifically, for any nonzero $n \in \mathbb{N}$, we have $\aleph_0 \cdot \aleph_0 = \aleph_0$ and $\aleph_0^n = \aleph_0$
2) Addition and multiplication applied a countable number of times to the cardinal number $2^{\aleph_0}$ does not yield more than $2^{\aleph_0}$. Specifically, we have $\aleph_0 \cdot 2^{\aleph_0} = 2^{\aleph_0}$ and $(2^{\aleph_0})^{\aleph_0} = 2^{\aleph_0}$

Using this theorem, we can establish other relationships, such as

$2^{\aleph_0} \le (\aleph_0)^{\aleph_0} \le (2^{\aleph_0})^{\aleph_0} = 2^{\aleph_0}$

which, by the Schröder–Bernstein theorem, implies that $(\aleph_0)^{\aleph_0} = 2^{\aleph_0}$.

We mention that the problem of evaluating $\kappa^\lambda$ in general is a very difficult one and would take us far beyond the scope of this book.

We will have use for the following reasonable-sounding result, whose proof is omitted.

Theorem 0.17 Let $\{A_k \mid k \in K\}$ be a collection of sets, indexed by the set $K$, with $|K| = \kappa$. If $|A_k| \le \lambda$ for all $k \in K$, then

$\left| \bigcup_{k \in K} A_k \right| \le \lambda\kappa$

Let us conclude by describing the cardinality of some famous sets.


Theorem 0.18
1) The following sets have cardinality $\aleph_0$.
a) The rational numbers $\mathbb{Q}$.
b) The set of all finite subsets of $\mathbb{N}$.
c) The union of a countable number of countable sets.
d) The set $\mathbb{Z}^n$ of all ordered $n$-tuples of integers.
2) The following sets have cardinality $2^{\aleph_0}$.
a) The set of all points in $\mathbb{R}^n$.
b) The set of all infinite sequences of natural numbers.
c) The set of all infinite sequences of real numbers.
d) The set of all finite subsets of $\mathbb{R}$.
e) The set of all irrational numbers.

Part 2 Algebraic Structures We now turn to a discussion of some of the many algebraic structures that play a role in the study of linear algebra.

Groups

Definition A group is a nonempty set $G$, together with a binary operation denoted by $*$, that satisfies the following properties:
1) (Associativity) For all $a, b, c \in G$, $(a * b) * c = a * (b * c)$
2) (Identity) There exists an element $e \in G$ for which $e * a = a * e = a$ for all $a \in G$.
3) (Inverses) For each $a \in G$, there is an element $a^{-1} \in G$ for which $a * a^{-1} = a^{-1} * a = e$

Definition A group $G$ is abelian, or commutative, if $a * b = b * a$ for all $a, b \in G$. When a group is abelian, it is customary to denote the operation $*$ by $+$, thus writing $a * b$ as $a + b$. It is also customary to refer to the identity as the zero element and to denote the inverse $a^{-1}$ by $-a$, referred to as the negative of $a$.

Example 0.7 The set $\mathcal{F}$ of all bijective functions from a set $S$ to $S$ is a group under composition of functions. However, in general, it is not abelian.

Example 0.8 The set $\mathcal{M}_{m,n}(F)$ is an abelian group under addition of matrices. The identity is the zero matrix $0_{m,n}$ of size $m \times n$. The set $\mathcal{M}_n(F)$ is not a group under multiplication of matrices, since not all matrices have multiplicative



inverses. However, the set of invertible matrices of size $n \times n$ is a (nonabelian) group under multiplication.

A group $G$ is finite if it contains only a finite number of elements. The cardinality of a finite group $G$ is called its order and is denoted by $o(G)$ or simply $|G|$. Thus, for example, $\mathbb{Z}_n = \{0, 1, \dots, n-1\}$ is a finite group under addition modulo $n$, but $\mathcal{M}_{m,n}(\mathbb{R})$ is not finite.

Definition A subgroup of a group $G$ is a nonempty subset $S$ of $G$ that is a group in its own right, using the same operations as defined on $G$.

Cyclic Groups

If $a$ is a formal symbol, we can define a group $G$ to be the set of all integral powers of $a$:

$G = \{a^i \mid i \in \mathbb{Z}\}$

where the product is defined by the formal rules of exponents: $a^i a^j = a^{i+j}$. This group is denoted by $\langle a \rangle$ and called the cyclic group generated by $a$. The identity of $\langle a \rangle$ is $1 = a^0$. In general, a group $G$ is cyclic if it has the form $G = \langle a \rangle$ for some $a \in G$.

We can also create a finite group $C_n(a)$ of arbitrary positive order $n$ by declaring that $a^n = 1$. Thus,

$C_n(a) = \{1 = a^0, a, a^2, \dots, a^{n-1}\}$

where the product is defined by the formal rules of exponents, followed by reduction modulo $n$: $a^i a^j = a^{(i+j) \bmod n}$. This defines a group of order $n$, called a cyclic group of order $n$. The inverse of $a^k$ is $a^{(-k) \bmod n}$.
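The finite cyclic group just described can be modeled by storing only exponents. The sketch below is an illustration under that assumption: the power $a^i$ is represented by the integer $i$, and the function names are hypothetical.

```python
def multiply(i, j, n):
    """Product of a^i and a^j in the cyclic group of order n: a^((i + j) mod n)."""
    return (i + j) % n

def inverse(k, n):
    """Exponent of the inverse of a^k, namely (-k) mod n."""
    return (-k) % n

def is_group(n):
    """Brute-force check of associativity, identity and inverses on exponents 0..n-1."""
    G = range(n)
    associative = all(
        multiply(multiply(i, j, n), k, n) == multiply(i, multiply(j, k, n), n)
        for i in G for j in G for k in G
    )
    has_identity = all(multiply(0, i, n) == i == multiply(i, 0, n) for i in G)
    has_inverses = all(multiply(i, inverse(i, n), n) == 0 for i in G)
    return associative and has_identity and has_inverses
```

For instance, with $n = 6$ the call `multiply(4, 5, 6)` computes the exponent of $a^4 a^5$.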

Rings

Definition A ring is a nonempty set $R$, together with two binary operations, called addition (denoted by $+$) and multiplication (denoted by juxtaposition), for which the following hold:
1) $R$ is an abelian group under addition
2) (Associativity) For all $a, b, c \in R$, $(ab)c = a(bc)$


3) (Distributivity) For all $a, b, c \in R$, $(a + b)c = ac + bc$ and $c(a + b) = ca + cb$

A ring $R$ is said to be commutative if $ab = ba$ for all $a, b \in R$. If a ring $R$ contains an element $e$ with the property that $ea = ae = a$ for all $a \in R$, we say that $R$ is a ring with identity. The identity is usually denoted by $1$.

A field $F$ is a commutative ring with identity in which each nonzero element has a multiplicative inverse, that is, if $a \in F$ is nonzero, then there is a $b \in F$ for which $ab = 1$.

Example 0.9 The set $\mathbb{Z}_n = \{0, 1, \dots, n-1\}$ is a commutative ring under addition and multiplication modulo $n$:

$a \oplus b = (a + b) \bmod n, \quad a \odot b = ab \bmod n$

The element $1 \in \mathbb{Z}_n$ is the identity.

Example 0.10 The set $E$ of even integers is a commutative ring under the usual operations on $\mathbb{Z}$, but it has no identity.

Example 0.11 The set $\mathcal{M}_n(F)$ is a noncommutative ring under matrix addition and multiplication. The identity matrix $I_n$ is the identity for $\mathcal{M}_n(F)$.

Example 0.12 Let $F$ be a field. The set $F[x]$ of all polynomials in a single variable $x$, with coefficients in $F$, is a commutative ring under the usual operations of polynomial addition and multiplication. What is the identity for $F[x]$? Similarly, the set $F[x_1, \dots, x_n]$ of polynomials in $n$ variables is a commutative ring under the usual addition and multiplication of polynomials.

Definition If $R$ and $S$ are rings, then a function $\sigma\colon R \to S$ is a ring homomorphism if

$\sigma(a + b) = \sigma a + \sigma b, \quad \sigma(ab) = (\sigma a)(\sigma b), \quad \sigma 1 = 1$

for all $a, b \in R$.

Definition A subring of a ring $R$ is a subset $S$ of $R$ that is a ring in its own right, using the same operations as defined on $R$ and having the same multiplicative identity as $R$.



The condition that a subring $S$ have the same multiplicative identity as $R$ is required. For example, the set $S$ of all $2 \times 2$ matrices of the form

$A_a = \begin{bmatrix} a & 0 \\ 0 & 0 \end{bmatrix}$

for $a \in F$ is a ring under addition and multiplication of matrices (isomorphic to $F$). The multiplicative identity in $S$ is the matrix $A_1$, which is not the identity $I_2$ of $\mathcal{M}_{2,2}(F)$. Hence, $S$ is a ring under the same operations as $\mathcal{M}_{2,2}(F)$ but it is not a subring of $\mathcal{M}_{2,2}(F)$.

Applying the definition is not generally the easiest way to show that a subset of a ring is a subring. The following characterization is usually easier to apply.

Theorem 0.19 A nonempty subset $S$ of a ring $R$ is a subring if and only if
1) The multiplicative identity $1_R$ of $R$ is in $S$
2) $S$ is closed under subtraction, that is, $a, b \in S \Rightarrow a - b \in S$
3) $S$ is closed under multiplication, that is, $a, b \in S \Rightarrow ab \in S$

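Ideals

Rings have another important substructure besides subrings.

Definition Let $R$ be a ring. A nonempty subset $\mathcal{I}$ of $R$ is called an ideal if
1) $\mathcal{I}$ is a subgroup of the abelian group $R$, that is, $\mathcal{I}$ is closed under subtraction: $a, b \in \mathcal{I} \Rightarrow a - b \in \mathcal{I}$
2) $\mathcal{I}$ is closed under multiplication by any ring element, that is, $a \in \mathcal{I},\ r \in R \Rightarrow ar \in \mathcal{I}$ and $ra \in \mathcal{I}$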

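Note that if an ideal $\mathcal{I}$ contains the unit element $1$, then $\mathcal{I} = R$.

Example 0.13 Let $p(x)$ be a polynomial in $F[x]$. The set of all multiples of $p(x)$,

$\langle p(x) \rangle = \{q(x)p(x) \mid q(x) \in F[x]\}$

is an ideal in $F[x]$, called the ideal generated by $p(x)$.

Definition Let $S$ be a subset of a ring $R$ with identity. The set

$\langle S \rangle = \{r_1 s_1 + \cdots + r_n s_n \mid r_i \in R,\ s_i \in S,\ n \ge 1\}$

of all finite linear combinations of elements of $S$, with coefficients in $R$, is an ideal in $R$, called the ideal generated by $S$. It is the smallest (in the sense of set inclusion) ideal of $R$ containing $S$. If $S = \{s_1, \dots, s_n\}$ is a finite set, we write

$\langle s_1, \dots, s_n \rangle = \{r_1 s_1 + \cdots + r_n s_n \mid r_i \in R,\ s_i \in S\}$

Note that in the previous definition, we require that $R$ have an identity. This is to ensure that $S \subseteq \langle S \rangle$.

Theorem 0.20 Let $R$ be a ring.
1) The intersection of any collection $\{\mathcal{I}_k \mid k \in K\}$ of ideals is an ideal.
2) If $\mathcal{I}_1 \subseteq \mathcal{I}_2 \subseteq \cdots$ is an ascending sequence of ideals, each one contained in the next, then the union $\bigcup \mathcal{I}_k$ is also an ideal.
3) More generally, if $\mathcal{C} = \{\mathcal{I}_i \mid i \in I\}$ is a chain of ideals in $R$, then the union $\mathcal{J} = \bigcup_{i \in I} \mathcal{I}_i$ is also an ideal in $R$.

Proof. To prove 1), let $\mathcal{J} = \bigcap \mathcal{I}_k$. Then if $a, b \in \mathcal{J}$, we have $a, b \in \mathcal{I}_k$ for all $k \in K$. Hence, $a - b \in \mathcal{I}_k$ for all $k \in K$ and so $a - b \in \mathcal{J}$. Hence, $\mathcal{J}$ is closed under subtraction. Also, if $r \in R$, then $ra \in \mathcal{I}_k$ for all $k \in K$ and so $ra \in \mathcal{J}$. Of course, part 2) is a special case of part 3). To prove 3), if $a, b \in \mathcal{J}$, then $a \in \mathcal{I}_i$ and $b \in \mathcal{I}_j$ for some $i, j \in I$. Since one of $\mathcal{I}_i$ and $\mathcal{I}_j$ is contained in the other, we may assume that $\mathcal{I}_i \subseteq \mathcal{I}_j$. It follows that $a, b \in \mathcal{I}_j$ and so $a - b \in \mathcal{I}_j \subseteq \mathcal{J}$ and if $r \in R$, then $ra \in \mathcal{I}_j \subseteq \mathcal{J}$. Thus $\mathcal{J}$ is an ideal.

Note that in general, the union of ideals is not an ideal. However, as we have just proved, the union of any chain of ideals is an ideal.

Quotient Rings and Maximal Ideals

Let $S$ be a subset of a commutative ring $R$ with identity. Let $\equiv$ be the binary relation on $R$ defined by

$a \equiv b \iff a - b \in S$

It is easy to see that $\equiv$ is an equivalence relation when $S$ is a subgroup of the additive group of $R$, which we assume from here on. When $a \equiv b$, we say that $a$ and $b$ are congruent modulo $S$. The term “mod” is used as a colloquialism for modulo and $a \equiv b$ is often written $a \equiv b \bmod S$. As shorthand, we write $a \equiv b$.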



To see what the equivalence classes look like, observe that

$[a] = \{r \in R \mid r \equiv a\} = \{r \in R \mid r - a \in S\} = \{r \in R \mid r = a + s \text{ for some } s \in S\} = \{a + s \mid s \in S\} = a + S$

The set $a + S = \{a + s \mid s \in S\}$ is called a coset of $S$ in $R$. The element $a$ is called a coset representative for $a + S$. Thus, the equivalence classes for congruence mod $S$ are the cosets $a + S$ of $S$ in $R$. The set of all cosets is denoted by

$R/S = \{a + S \mid a \in R\}$

This is read “$R$ mod $S$.” We would like to place a ring structure on $R/S$. Indeed, if $S$ is a subgroup of the abelian group $R$, then $R/S$ is easily seen to be an abelian group as well under coset addition defined by

$(a + S) + (b + S) = (a + b) + S$

In order for the product $(a + S)(b + S) = ab + S$ to be well-defined, we must have

$b + S = b' + S \Rightarrow ab + S = ab' + S$

or, equivalently,

$b - b' \in S \Rightarrow a(b - b') \in S$

But $b - b'$ may be any element of $S$ and $a$ may be any element of $R$ and so this condition implies that $S$ must be an ideal. Conversely, if $S$ is an ideal, then coset multiplication is well defined.

Theorem 0.21 Let $R$ be a commutative ring with identity. Then the quotient $R/\mathcal{I}$ is a ring under coset addition and multiplication if and only if $\mathcal{I}$ is an ideal of $R$. In this case, $R/\mathcal{I}$ is called the quotient ring of $R$ modulo $\mathcal{I}$, where addition and multiplication are defined by

$(a + \mathcal{I}) + (b + \mathcal{I}) = (a + b) + \mathcal{I}$
$(a + \mathcal{I})(b + \mathcal{I}) = ab + \mathcal{I}$
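The role of the ideal condition in making coset multiplication well defined can be seen in two small cases. The sketch below is an illustration, not the book's construction: it takes the ideal $n\mathbb{Z}$ of $\mathbb{Z}$, where changing representatives never changes the product coset, and then the subgroup $\mathbb{Z}$ of the ring $\mathbb{Q}$, which is not an ideal of $\mathbb{Q}$ and where the product does depend on the representatives. All function names are hypothetical.

```python
from fractions import Fraction

def coset(a, n):
    """Canonical representative of the coset a + nZ in Z/nZ."""
    return a % n

def product_is_well_defined(n, span=3):
    """Check that (a + nZ)(b + nZ) = ab + nZ is unchanged when a and b are
    replaced by other representatives a + s and b + t with s, t in nZ."""
    shifts = [k * n for k in range(-span, span + 1)]
    return all(
        coset((a + s) * (b + t), n) == coset(a * b, n)
        for a in range(n) for b in range(n) for s in shifts for t in shifts
    )

def same_coset_mod_Z(x, y):
    """True when the rationals x and y differ by an integer, i.e. x + Z = y + Z."""
    return (x - y).denominator == 1

# In Q, the subgroup Z is not an ideal: 1/2 and 3/2 represent the same coset,
# yet multiplying each by 1/2 gives representatives of different cosets.
half, three_halves = Fraction(1, 2), Fraction(3, 2)
same_before = same_coset_mod_Z(half, three_halves)
same_after = same_coset_mod_Z(half * half, three_halves * half)
```

The first check samples only finitely many representatives, so it illustrates the claim rather than proving it.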


Definition An ideal $\mathcal{I}$ in a ring $R$ is a maximal ideal if $\mathcal{I} \ne R$ and if whenever $\mathcal{J}$ is an ideal satisfying $\mathcal{I} \subseteq \mathcal{J} \subseteq R$, then either $\mathcal{J} = \mathcal{I}$ or $\mathcal{J} = R$.

Here is one reason why maximal ideals are important.

Theorem 0.22 Let $R$ be a commutative ring with identity. Then the quotient ring $R/\mathcal{I}$ is a field if and only if $\mathcal{I}$ is a maximal ideal.

Proof. First, note that for any ideal $\mathcal{I}$ of $R$, the ideals of $R/\mathcal{I}$ are precisely the quotients $\mathcal{J}/\mathcal{I}$ where $\mathcal{J}$ is an ideal for which $\mathcal{I} \subseteq \mathcal{J} \subseteq R$. It is clear that $\mathcal{J}/\mathcal{I}$ is an ideal of $R/\mathcal{I}$. Conversely, if $\mathcal{K}'$ is an ideal of $R/\mathcal{I}$, then let

$\mathcal{K} = \{r \in R \mid r + \mathcal{I} \in \mathcal{K}'\}$

It is easy to see that $\mathcal{K}$ is an ideal of $R$ for which $\mathcal{I} \subseteq \mathcal{K} \subseteq R$.

Next, observe that a commutative ring $S$ with identity is a field if and only if $S$ has no nonzero proper ideals. For if $S$ is a field and $\mathcal{I}$ is an ideal of $S$ containing a nonzero element $r$, then $1 = r^{-1} r \in \mathcal{I}$ and so $\mathcal{I} = S$. Conversely, if $S$ has no nonzero proper ideals and $0 \ne s \in S$, then the ideal $\langle s \rangle$ must be $S$ and so there is an $r \in S$ for which $rs = 1$. Hence, $S$ is a field. Putting these two facts together proves the theorem.

The following result says that maximal ideals always exist.

Theorem 0.23 Any nonzero commutative ring $R$ with identity contains a maximal ideal.

Proof. Since $R$ is not the zero ring, the ideal $\{0\}$ is a proper ideal of $R$. Hence, the set $\mathcal{S}$ of all proper ideals of $R$ is nonempty. If $\mathcal{C} = \{\mathcal{I}_i \mid i \in I\}$ is a chain of proper ideals in $R$, then the union $\mathcal{J} = \bigcup_{i \in I} \mathcal{I}_i$ is also an ideal. Furthermore, if $\mathcal{J} = R$ is not proper, then $1 \in \mathcal{J}$ and so $1 \in \mathcal{I}_i$, for some $i \in I$, which implies that $\mathcal{I}_i = R$ is not proper. Hence, $\mathcal{J} \in \mathcal{S}$. Thus, any chain in $\mathcal{S}$ has an upper bound in $\mathcal{S}$ and so Zorn's lemma implies that $\mathcal{S}$ has a maximal element. This shows that $R$ has a maximal ideal.

Integral Domains

Definition Let $R$ be a ring. A nonzero element $r \in R$ is called a zero divisor if there exists a nonzero $s \in R$ for which $rs = 0$. A commutative ring $R$ with identity is called an integral domain if it contains no zero divisors.

Example 0.14 If $n$ is not a prime number, then the ring $\mathbb{Z}_n$ has zero divisors and so is not an integral domain. To see this, observe that if $n$ is not prime, then $n = ab$ in $\mathbb{Z}$, where $a, b \ge 2$. But in $\mathbb{Z}_n$, we have



$a \odot b = ab \bmod n = 0$

and so $a$ and $b$ are both zero divisors. As we will see later, if $n$ is a prime, then $\mathbb{Z}_n$ is a field (which is an integral domain, of course).

Example 0.15 The ring $F[x]$ is an integral domain, since $p(x)q(x) = 0$ implies that $p(x) = 0$ or $q(x) = 0$.

If $R$ is a ring and $rx = ry$ where $r, x, y \in R$, then we cannot in general cancel the $r$'s and conclude that $x = y$. For instance, in $\mathbb{Z}_4$, we have $2 \cdot 3 = 2 \cdot 1$, but canceling the $2$'s gives $3 = 1$. However, it is precisely the integral domains in which we can cancel. The simple proof is left to the reader.

Theorem 0.24 Let $R$ be a commutative ring with identity. Then $R$ is an integral domain if and only if the cancellation law

$rx = ry,\ r \ne 0 \Rightarrow x = y$

holds.
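Both the existence of zero divisors in $\mathbb{Z}_n$ for composite $n$ and the equivalence in Theorem 0.24 can be checked by brute force for small moduli. The following is a minimal sketch with hypothetical function names, using ordinary integers reduced modulo $n$.

```python
def zero_divisors(n):
    """Nonzero a in Z_n for which some nonzero b satisfies ab = 0 in Z_n."""
    return [a for a in range(1, n) if any((a * b) % n == 0 for b in range(1, n))]

def cancellation_holds(n):
    """True when rx = ry with r nonzero always forces x = y in Z_n."""
    return all(
        x == y
        for r in range(1, n) for x in range(n) for y in range(n)
        if (r * x) % n == (r * y) % n
    )

def is_integral_domain(n):
    """Z_n (n >= 2) is an integral domain exactly when it has no zero divisors."""
    return not zero_divisors(n)
```

For each modulus tried, `is_integral_domain(n)` and `cancellation_holds(n)` agree, as the theorem predicts.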

The Field of Quotients of an Integral Domain

Any integral domain $R$ can be embedded in a field. The quotient field (or field of quotients) of $R$ is a field that is constructed from $R$ just as the field of rational numbers is constructed from the ring of integers. In particular, we set

$R^+ = \{(p, q) \mid p, q \in R,\ q \ne 0\}$

where $(p, q) = (p', q')$ if and only if $pq' = p'q$. Addition and multiplication of fractions is defined by

$(p, q) + (r, s) = (ps + qr, qs)$

and

$(p, q) \cdot (r, s) = (pr, qs)$

It is customary to write $(p, q)$ in the form $p/q$. Note that if $R$ has zero divisors, then these definitions do not make sense, because $qs$ may be $0$ even if $q$ and $s$ are not. This is why we require that $R$ be an integral domain.
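The construction can be sketched for the integral domain $\mathbb{Z}$, where it produces the rational numbers. The class below is an illustrative assumption (the name `Quotient` and its methods are not from the text): pairs are compared by cross-multiplication rather than by reducing to lowest terms, exactly as in the definition of equality above.

```python
class Quotient:
    """A pair (p, q) with q != 0 over the integral domain Z, read as p/q."""

    def __init__(self, p, q):
        if q == 0:
            raise ValueError("denominator must be nonzero")
        self.p, self.q = p, q

    def __eq__(self, other):
        # (p, q) = (p', q') if and only if p q' = p' q
        return self.p * other.q == other.p * self.q

    def __add__(self, other):
        # (p, q) + (r, s) = (ps + qr, qs)
        return Quotient(self.p * other.q + self.q * other.p, self.q * other.q)

    def __mul__(self, other):
        # (p, q) . (r, s) = (pr, qs)
        return Quotient(self.p * other.p, self.q * other.q)

    def inverse(self):
        """Multiplicative inverse of a nonzero quotient: (p, q) -> (q, p)."""
        return Quotient(self.q, self.p)
```

The product `self.q * other.q` is nonzero precisely because $\mathbb{Z}$ has no zero divisors, which is the point made at the end of the preceding paragraph.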

Principal Ideal Domains

Definition Let $R$ be a ring with identity and let $a \in R$. The principal ideal generated by $a$ is the ideal

$\langle a \rangle = \{ra \mid r \in R\}$

An integral domain $R$ in which every ideal is a principal ideal is called a principal ideal domain.


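Theorem 0.25 The integers form a principal ideal domain. In fact, any nonzero ideal $\mathcal{I}$ in $\mathbb{Z}$ is generated by the smallest positive integer $a$ that is contained in $\mathcal{I}$.

Theorem 0.26 The ring $F[x]$ is a principal ideal domain. In fact, any nonzero ideal $\mathcal{I}$ is generated by the unique monic polynomial of smallest degree contained in $\mathcal{I}$. Moreover, for polynomials $p_1(x), \dots, p_n(x)$,

$\langle p_1(x), \dots, p_n(x) \rangle = \langle \gcd\{p_1(x), \dots, p_n(x)\} \rangle$

Proof. Let $\mathcal{I}$ be a nonzero ideal in $F[x]$ and let $m(x)$ be a monic polynomial of smallest degree in $\mathcal{I}$. First, we observe that there is only one such polynomial in $\mathcal{I}$. For if $n(x) \in \mathcal{I}$ is monic and $\deg(n(x)) = \deg(m(x))$, then

$b(x) = m(x) - n(x) \in \mathcal{I}$

and since $\deg(b(x)) < \deg(m(x))$, we must have $b(x) = 0$ and so $n(x) = m(x)$.

We show that $\mathcal{I} = \langle m(x) \rangle$. Since $m(x) \in \mathcal{I}$, we have $\langle m(x) \rangle \subseteq \mathcal{I}$. To establish the reverse inclusion, if $p(x) \in \mathcal{I}$, then dividing $p(x)$ by $m(x)$ gives

$p(x) = q(x)m(x) + r(x)$

where $r(x) = 0$ or $0 \le \deg r(x) < \deg m(x)$. But since $\mathcal{I}$ is an ideal,

$r(x) = p(x) - q(x)m(x) \in \mathcal{I}$

and so $0 \le \deg r(x) < \deg m(x)$ is impossible. Hence, $r(x) = 0$ and

$p(x) = q(x)m(x) \in \langle m(x) \rangle$

This shows that $\mathcal{I} \subseteq \langle m(x) \rangle$ and so $\mathcal{I} = \langle m(x) \rangle$.

To prove the second statement, let $\mathcal{I} = \langle p_1(x), \dots, p_n(x) \rangle$. Then, by what we have just shown,

$\mathcal{I} = \langle p_1(x), \dots, p_n(x) \rangle = \langle m(x) \rangle$

where $m(x)$ is the unique monic polynomial in $\mathcal{I}$ of smallest degree. In particular, since $p_i(x) \in \langle m(x) \rangle$, we have $m(x) \mid p_i(x)$ for each $i = 1, \dots, n$. In other words, $m(x)$ is a common divisor of the $p_i(x)$'s. Moreover, if $q(x) \mid p_i(x)$ for all $i$, then $p_i(x) \in \langle q(x) \rangle$ for all $i$, which implies that

$m(x) \in \langle m(x) \rangle = \langle p_1(x), \dots, p_n(x) \rangle \subseteq \langle q(x) \rangle$

and so $q(x) \mid m(x)$. This shows that $m(x)$ is the greatest common divisor of the $p_i(x)$'s and completes the proof.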
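The integer analogue of the last statement, $\langle a, b \rangle = \langle \gcd(a, b) \rangle$ in $\mathbb{Z}$, can be made concrete with the extended Euclidean algorithm. The sketch below is an assumption-laden illustration rather than the argument of the proof: `extended_gcd` is a hypothetical helper that returns the gcd together with coefficients expressing it as a combination of $a$ and $b$, which shows that the gcd lies in $\langle a, b \rangle$, while divisibility shows that $a$ and $b$ lie in $\langle \gcd(a, b) \rangle$.

```python
def extended_gcd(a, b):
    """Return (g, u, v) with g = gcd(a, b) and u*a + v*b == g, for nonnegative a, b."""
    if b == 0:
        return a, 1, 0
    # Recurse on (b, a mod b), then rewrite the combination in terms of a and b.
    g, u, v = extended_gcd(b, a % b)
    return g, v, u - (a // b) * v

def generate_same_ideal(a, b):
    """Check both inclusions between <a, b> and <g> in Z, where g = gcd(a, b)."""
    g, u, v = extended_gcd(a, b)
    g_in_ideal = (u * a + v * b == g)                 # g is a combination of a and b
    generators_in_g = (a % g == 0 and b % g == 0)     # a and b are multiples of g
    return g_in_ideal and generators_in_g
```

For example, `extended_gcd(12, 18)` returns a triple whose first entry is the generator of the ideal $\langle 12, 18 \rangle$.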



Example 0.16 The ring $R = F[x, y]$ of polynomials in two variables $x$ and $y$ is not a principal ideal domain. To see this, observe that the set $\mathcal{I}$ of all polynomials with zero constant term is an ideal in $R$. Now, suppose that $\mathcal{I}$ is the principal ideal $\mathcal{I} = \langle p(x, y) \rangle$. Since $x, y \in \mathcal{I}$, there exist polynomials $a(x, y)$ and $b(x, y)$ for which

$x = a(x, y)p(x, y)$ and $y = b(x, y)p(x, y)$    (0.1)

But $p(x, y)$ cannot be a constant, for then we would have $\mathcal{I} = R$. Hence, $\deg(p(x, y)) \ge 1$ and so $a(x, y)$ and $b(x, y)$ must both be constants, which implies that (0.1) cannot hold.

Theorem 0.27 Any principal ideal domain $R$ satisfies the ascending chain condition, that is, $R$ cannot have a strictly increasing sequence of ideals

$\mathcal{I}_1 \subset \mathcal{I}_2 \subset \cdots$

where each ideal is properly contained in the next one.

Proof. Suppose to the contrary that there is such an increasing sequence of ideals. Consider the ideal

$U = \bigcup \mathcal{I}_i$

which must have the form $U = \langle a \rangle$ for some $a \in U$. Since $a \in \mathcal{I}_k$ for some $k$, we have $\mathcal{I}_k = \mathcal{I}_j$ for all $j \ge k$, contradicting the fact that the inclusions are proper.

Prime and Irreducible Elements

We can define the notion of a prime element in any integral domain. For $r, s \in R$, we say that $r$ divides $s$ (written $r \mid s$) if there exists an $x \in R$ for which $s = xr$.

Definition Let $R$ be an integral domain.
1) An invertible element of $R$ is called a unit. Thus, $u \in R$ is a unit if $uv = 1$ for some $v \in R$.
2) Two elements $a, b \in R$ are said to be associates if there exists a unit $u$ for which $a = ub$. We denote this by writing $a \sim b$.
3) A nonzero nonunit $p \in R$ is said to be prime if $p \mid ab \Rightarrow p \mid a$ or $p \mid b$
4) A nonzero nonunit $r \in R$ is said to be irreducible if $r = ab \Rightarrow a$ or $b$ is a unit

Note that if $p$ is prime or irreducible, then so is $up$ for any unit $u$. The property of being associate is clearly an equivalence relation.


Definition We will refer to the equivalence classes under the relation of being associate as the associate classes of $R$.

Theorem 0.28 Let $R$ be an integral domain.
1) An element $u \in R$ is a unit if and only if $\langle u \rangle = R$.
2) $r \sim s$ if and only if $\langle r \rangle = \langle s \rangle$.
3) $r$ divides $s$ if and only if $\langle s \rangle \subseteq \langle r \rangle$.
4) $r$ properly divides $s$, that is, $s = xr$ where $x$ is not a unit, if and only if $\langle s \rangle \subset \langle r \rangle$.

In the case of the integers, an integer is prime if and only if it is irreducible. In any integral domain, prime elements are irreducible, but the converse need not hold. (In the ring $\mathbb{Z}[\sqrt{-5}] = \{a + b\sqrt{-5} \mid a, b \in \mathbb{Z}\}$ the irreducible element $2$ divides the product $(1 + \sqrt{-5})(1 - \sqrt{-5}) = 6$ but does not divide either factor.) However, in principal ideal domains, the two concepts are equivalent.

Theorem 0.29 Let $R$ be a principal ideal domain.
1) An $r \in R$ is irreducible if and only if the ideal $\langle r \rangle$ is maximal.
2) An element in $R$ is prime if and only if it is irreducible.
3) The elements $a, b \in R$ are relatively prime, that is, have no common nonunit factors, if and only if there exist $r, s \in R$ for which $ra + sb = 1$. This is denoted by writing $(a, b) = 1$.

Proof. To prove 1), suppose that $r$ is irreducible and that $\langle r \rangle \subseteq \langle a \rangle \subseteq R$. Then $r \in \langle a \rangle$ and so $r = xa$ for some $x \in R$. The irreducibility of $r$ implies that $a$ or $x$ is a unit. If $a$ is a unit, then $\langle a \rangle = R$ and if $x$ is a unit, then $\langle a \rangle = \langle xa \rangle = \langle r \rangle$. This shows that $\langle r \rangle$ is maximal. (We have $\langle r \rangle \ne R$, since $r$ is not a unit.) Conversely, suppose that $r$ is not irreducible, that is, $r = ab$ where neither $a$ nor $b$ is a unit. Then $\langle r \rangle \subseteq \langle a \rangle \subseteq R$. But if $\langle a \rangle = \langle r \rangle$, then $r \sim a$, which implies that $b$ is a unit. Hence $\langle r \rangle \ne \langle a \rangle$. Also, if $\langle a \rangle = R$, then $a$ must be a unit. So we conclude that $\langle r \rangle$ is not maximal, as desired.

To prove 2), assume first that $p$ is prime and $p = ab$. Then $p \mid a$ or $p \mid b$. We may assume that $p \mid a$. Therefore, $a = xp = xab$. Canceling $a$'s gives $1 = xb$ and so $b$ is a unit. Hence, $p$ is irreducible. (Note that this argument applies in any integral domain.)

Conversely, suppose that $r$ is irreducible and let $r \mid ab$. We wish to prove that $r \mid a$ or $r \mid b$. The ideal $\langle r \rangle$ is maximal and so $\langle r, a \rangle = \langle r \rangle$ or $\langle r, a \rangle = R$. In the former case, $r \mid a$ and we are done. In the latter case, we have

$1 = xa + yr$

for some $x, y \in R$. Thus,

$b = xab + yrb$

and since $r$ divides both terms on the right, we have $r \mid b$.

To prove 3), it is clear that if $ra + sb = 1$, then $a$ and $b$ are relatively prime. For the converse, consider the ideal $\langle a, b \rangle$, which must be principal, say $\langle a, b \rangle = \langle x \rangle$. Then $x \mid a$ and $x \mid b$ and so $x$ must be a unit, which implies that $\langle a, b \rangle = R$. Hence, there exist $r, s \in R$ for which $ra + sb = 1$.

Unique Factorization Domains

Definition An integral domain $R$ is said to be a unique factorization domain if it has the following factorization properties:
1) Every nonzero nonunit element $r \in R$ can be written as a product of a finite number of irreducible elements $r = p_1 \cdots p_n$.
2) The factorization into irreducible elements is unique in the sense that if $r = p_1 \cdots p_n$ and $r = q_1 \cdots q_m$ are two such factorizations, then $m = n$ and after a suitable reindexing of the factors, $p_i \sim q_i$.

Unique factorization is clearly a desirable property. Fortunately, principal ideal domains have this property.

Theorem 0.30 Every principal ideal domain $R$ is a unique factorization domain.

Proof. Let $r \in R$ be a nonzero nonunit. If $r$ is irreducible, then we are done. If not, then $r = r_1 r_2$, where neither factor is a unit. If $r_1$ and $r_2$ are irreducible, we are done. If not, suppose that $r_2$ is not irreducible. Then $r_2 = r_3 r_4$, where neither $r_3$ nor $r_4$ is a unit. Continuing in this way, we obtain a factorization of the form (after renumbering if necessary)

$r = r_1 r_2 = r_1 (r_3 r_4) = (r_1 r_3)(r_5 r_6) = (r_1 r_3 r_5)(r_7 r_8) = \cdots$

Each step is a factorization of $r$ into a product of nonunits. However, this process must stop after a finite number of steps, for otherwise it will produce an infinite sequence $s_1, s_2, \dots$ of nonunits of $R$ for which $s_{i+1}$ properly divides $s_i$. But this gives the ascending chain of ideals

$\langle s_1 \rangle \subset \langle s_2 \rangle \subset \langle s_3 \rangle \subset \langle s_4 \rangle \subset \cdots$

where the inclusions are proper. But this contradicts the fact that a principal ideal domain satisfies the ascending chain condition. Thus, we conclude that every nonzero nonunit has a factorization into irreducible elements.

As to uniqueness, if $r = p_1 \cdots p_n$ and $r = q_1 \cdots q_m$ are two such factorizations, then because $R$ is an integral domain, we may equate them and cancel like factors, so let us assume this has been done. Thus, $p_i \ne q_j$ for all $i, j$. If there are no factors on either side, we are done. If exactly one side has no factors left,


then we have expressed $1$ as a product of irreducible elements, which is not possible since irreducible elements are nonunits.

Suppose that both sides have factors left, that is,

$p_1 \cdots p_n = q_1 \cdots q_m$

where $p_i \ne q_j$. Then $q_m \mid p_1 \cdots p_n$, which implies that $q_m \mid p_i$ for some $i$. We can assume by reindexing if necessary that $p_n = a_n q_m$. Since $p_n$ is irreducible $a_n$ must be a unit. Replacing $p_n$ by $a_n q_m$ and canceling $q_m$ gives

$a_n p_1 \cdots p_{n-1} = q_1 \cdots q_{m-1}$

This process can be repeated until we run out of $p$'s or $q$'s. If we run out of $p$'s first, then we have an equation of the form $u q_1 \cdots q_k = 1$ where $u$ is a unit, which is not possible since the $q_i$'s are not units. By the same reasoning, we cannot run out of $q$'s first and so $n = m$ and the $p$'s and $q$'s can be paired off as associates.

Fields

For the record, let us give the definition of a field (a concept that we have been using).

Definition A field is a set $F$, containing at least two elements, together with two binary operations, called addition (denoted by $+$) and multiplication (denoted by juxtaposition), for which the following hold:
1) $F$ is an abelian group under addition.
2) The set $F^*$ of all nonzero elements in $F$ is an abelian group under multiplication.
3) (Distributivity) For all $a, b, c \in F$, $(a + b)c = ac + bc$ and $c(a + b) = ca + cb$

We require that $F$ have at least two elements to avoid the pathological case in which $0 = 1$.

Example 0.17 The sets $\mathbb{Q}$, $\mathbb{R}$ and $\mathbb{C}$, of all rational, real and complex numbers, respectively, are fields, under the usual operations of addition and multiplication of numbers.

Example 0.18 The ring $\mathbb{Z}_n$ is a field if and only if $n$ is a prime number. We have already seen that $\mathbb{Z}_n$ is not a field if $n$ is not prime, since a field is also an integral domain. Now suppose that $n = p$ is a prime.

We have seen that $\mathbb{Z}_p$ is an integral domain and so it remains to show that every nonzero element in $\mathbb{Z}_p$ has a multiplicative inverse. Let $0 \ne a \in \mathbb{Z}_p$. Since $a < p$, we know that $a$ and $p$ are relatively prime. It follows that there exist integers $u$ and $v$ for which

$ua + vp = 1$

Hence,

$ua \equiv (1 - vp) \equiv 1 \bmod p$

and so $u \odot a = 1$ in $\mathbb{Z}_p$, that is, $u$ is the multiplicative inverse of $a$.

The previous example shows that not all fields are infinite sets. In fact, finite fields play an extremely important role in many areas of abstract and applied mathematics.

A field $F$ is said to be algebraically closed if every nonconstant polynomial over $F$ has a root in $F$. This is equivalent to saying that every nonconstant polynomial splits over $F$. For example, the complex field $\mathbb{C}$ is algebraically closed but the real field $\mathbb{R}$ is not. We mention without proof that every field $F$ is contained in an algebraically closed field $\overline{F}$, called the algebraic closure of $F$. For example, the algebraic closure of the real field is the complex field.

The Characteristic of a Ring

Let $R$ be a ring with identity. If $n$ is a positive integer, then by $n \cdot r$, we simply mean

$n \cdot r = \underbrace{r + \cdots + r}_{n \text{ terms}}$

Now, it may happen that there is a positive integer $n$ for which

$n \cdot 1 = 0$

For instance, in $\mathbb{Z}_n$, we have $n \cdot 1 = n = 0$. On the other hand, in $\mathbb{Z}$, the equation $n \cdot 1 = 0$ implies $n = 0$ and so no such positive integer exists.

Notice that in any finite ring, there must exist such a positive integer $n$, since the members of the infinite sequence of numbers

$1 \cdot 1,\ 2 \cdot 1,\ 3 \cdot 1,\ \dots$

cannot be distinct and so $i \cdot 1 = j \cdot 1$ for some $i < j$, whence $(j - i) \cdot 1 = 0$.

Definition Let $R$ be a ring with identity. The smallest positive integer $c$ for which $c \cdot 1 = 0$ is called the characteristic of $R$. If no such number $c$ exists, we say that $R$ has characteristic $0$. The characteristic of $R$ is denoted by $\operatorname{char}(R)$.

If $\operatorname{char}(R) = c$, then for any $r \in R$, we have

$c \cdot r = \underbrace{r + \cdots + r}_{c \text{ terms}} = (\underbrace{1 + \cdots + 1}_{c \text{ terms}})\, r = 0 \cdot r = 0$


Theorem 0.31 Any finite ring has nonzero characteristic. Any finite integral domain has prime characteristic.
Proof. We have already seen that a finite ring has nonzero characteristic. Let $F$ be a finite integral domain and suppose that $\operatorname{char}(F) = c$. If $c = pq$, where $p, q < c$, then $pq \cdot 1 = 0$. Hence, $(p \cdot 1)(q \cdot 1) = 0$, implying that $p \cdot 1 = 0$ or $q \cdot 1 = 0$. In either case, we have a contradiction to the fact that $c$ is the smallest positive integer such that $c \cdot 1 = 0$. Hence, $c$ must be prime.

Notice that in any field $F$ of characteristic $2$, we have $2a = 0$ for all $a \in F$. Thus, in $F$,

$$a = -a \quad \text{for all } a \in F$$

This property takes a bit of getting used to and makes fields of characteristic $2$ quite exceptional. (As it happens, there are many important uses for fields of characteristic $2$.) It can be shown that all finite fields have size equal to a positive integral power $p^n$ of a prime $p$ and for each prime power $p^n$, there is a finite field of size $p^n$. In fact, up to isomorphism, there is exactly one finite field of size $p^n$.
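The two computations discussed for the rings of integers modulo n can be tried directly. The following is a minimal sketch, not part of the original text: it finds the characteristic by adding the identity to itself until the sum vanishes, and it finds inverses modulo a prime with the extended Euclidean algorithm, which produces the integers u and v used in Example 0.18. The function names `characteristic_of_Zn`, `extended_gcd` and `inverse_mod_p` are hypothetical, chosen only for this illustration.

```python
def characteristic_of_Zn(n):
    """Smallest positive c with c * 1 == 0 in Z_n, found by repeated addition."""
    total, c = 1 % n, 1
    while total != 0:
        total = (total + 1) % n
        c += 1
    return c


def extended_gcd(a, b):
    """Return (g, u, v) with g = gcd(a, b) and u*a + v*b == g."""
    old_r, r = a, b
    old_u, u = 1, 0
    old_v, v = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
        old_v, v = v, old_v - q * v
    return old_r, old_u, old_v


def inverse_mod_p(a, p):
    """Multiplicative inverse of the residue a modulo p (fails if gcd(a, p) != 1)."""
    g, u, _ = extended_gcd(a % p, p)
    if g != 1:
        raise ValueError("a and p are not relatively prime")
    return u % p


if __name__ == "__main__":
    print(characteristic_of_Zn(12))
    print({a: inverse_mod_p(a, 7) for a in range(1, 7)})
```

When the modulus is not prime, `inverse_mod_p` raises an error for any residue that shares a factor with it, which reflects the fact that such a ring is not a field.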

Algebras

The final algebraic structure of which we will have use is a combination of a vector space and a ring. (We have not yet officially defined vector spaces, but we will do so before needing the following definition, which is placed here for easy reference.)

Definition An algebra $\mathcal{A}$ over a field $F$ is a nonempty set $\mathcal{A}$, together with three operations, called addition (denoted by $+$), multiplication (denoted by juxtaposition) and scalar multiplication (also denoted by juxtaposition), for which the following properties hold:
1) $\mathcal{A}$ is a vector space over $F$ under addition and scalar multiplication.
2) $\mathcal{A}$ is a ring under addition and multiplication.
3) If $r \in F$ and $a, b \in \mathcal{A}$, then $r(ab) = (ra)b = a(rb)$

Thus, an algebra is a vector space in which we can take the product of vectors, or a ring in which we can multiply each element by a scalar (subject, of course, to additional requirements as given in the definition).

Part I—Basic Linear Algebra

Chapter 1

Vector Spaces

Vector Spaces

Let us begin with the definition of one of our principal objects of study.

Definition Let $F$ be a field, whose elements are referred to as scalars. A vector space over $F$ is a nonempty set $V$, whose elements are referred to as vectors, together with two operations. The first operation, called addition and denoted by $+$, assigns to each pair $(u, v)$ of vectors in $V$ a vector $u + v$ in $V$. The second operation, called scalar multiplication and denoted by juxtaposition, assigns to each pair $(r, u) \in F \times V$ a vector $ru$ in $V$. Furthermore, the following properties must be satisfied:
1) (Associativity of addition) For all vectors $u, v, w \in V$, $u + (v + w) = (u + v) + w$
2) (Commutativity of addition) For all vectors $u, v \in V$, $u + v = v + u$
3) (Existence of a zero) There is a vector $0 \in V$ with the property that $0 + u = u + 0 = u$ for all vectors $u \in V$.
4) (Existence of additive inverses) For each vector $u \in V$, there is a vector in $V$, denoted by $-u$, with the property that $u + (-u) = (-u) + u = 0$
5) (Properties of scalar multiplication) For all scalars $a, b \in F$ and for all vectors $u, v \in V$,
$a(u + v) = au + av$
$(a + b)u = au + bu$
$(ab)u = a(bu)$
$1u = u$

Note that the first four properties in the definition of vector space can be summarized by saying that $V$ is an abelian group under addition. A vector space over a field $F$ is sometimes called an $F$-space. A vector space over the real field is called a real vector space and a vector space over the complex field is called a complex vector space.

Definition Let $S$ be a nonempty subset of a vector space $V$. A linear combination of vectors in $S$ is an expression of the form

$$a_1 v_1 + \cdots + a_n v_n$$

where $v_1, \ldots, v_n \in S$ and $a_1, \ldots, a_n \in F$. The scalars $a_i$ are called the coefficients of the linear combination. A linear combination is trivial if every coefficient $a_i$ is zero. Otherwise, it is nontrivial.

Examples of Vector Spaces

Here are a few examples of vector spaces.

Example 1.1
1) Let $F$ be a field. The set $F^F$ of all functions from $F$ to $F$ is a vector space over $F$, under the operations of ordinary addition and scalar multiplication of functions:
$$(f + g)(x) = f(x) + g(x) \quad \text{and} \quad (af)(x) = a(f(x))$$
2) The set $\mathcal{M}_{m,n}(F)$ of all $m \times n$ matrices with entries in a field $F$ is a vector space over $F$, under the operations of matrix addition and scalar multiplication.
3) The set $F^n$ of all ordered $n$-tuples whose components lie in a field $F$, is a vector space over $F$, with addition and scalar multiplication defined componentwise:
$$(a_1, \ldots, a_n) + (b_1, \ldots, b_n) = (a_1 + b_1, \ldots, a_n + b_n)$$
and
$$c(a_1, \ldots, a_n) = (ca_1, \ldots, ca_n)$$
When convenient, we will also write the elements of $F^n$ in column form. When $F$ is a finite field $F_q$ with $q$ elements, we write $V(n, q)$ for $F_q^n$.
4) Many sequence spaces are vector spaces. The set $\operatorname{Seq}(F)$ of all infinite sequences with members from a field $F$ is a vector space under the componentwise operations
$$(s_n) + (t_n) = (s_n + t_n) \quad \text{and} \quad a(s_n) = (as_n)$$
In a similar way, the set $c_0$ of all sequences of complex numbers that converge to $0$ is a vector space, as is the set $\ell^\infty$ of all bounded complex sequences. Also, if $p$ is a positive integer, then the set $\ell^p$ of all complex sequences $(s_n)$ for which
$$\sum_{n=1}^{\infty} |s_n|^p < \infty$$
is a vector space under componentwise operations. To see that addition is a binary operation on $\ell^p$, one verifies Minkowski's inequality
$$\left( \sum_{n=1}^{\infty} |s_n + t_n|^p \right)^{1/p} \le \left( \sum_{n=1}^{\infty} |s_n|^p \right)^{1/p} + \left( \sum_{n=1}^{\infty} |t_n|^p \right)^{1/p}$$
which we will not do here.

Subspaces

Most algebraic structures contain substructures, and vector spaces are no exception.

Definition A subspace of a vector space $V$ is a subset $S$ of $V$ that is a vector space in its own right under the operations obtained by restricting the operations of $V$ to $S$. We use the notation $S \le V$ to indicate that $S$ is a subspace of $V$ and $S < V$ to indicate that $S$ is a proper subspace of $V$, that is, $S \le V$ but $S \ne V$. The zero subspace of $V$ is $\{0\}$.

Since many of the properties of addition and scalar multiplication hold a fortiori in a nonempty subset $S$, we can establish that $S$ is a subspace merely by checking that $S$ is closed under the operations of $V$.

Theorem 1.1 A nonempty subset $S$ of a vector space $V$ is a subspace of $V$ if and only if $S$ is closed under addition and scalar multiplication or, equivalently, $S$ is closed under linear combinations, that is,

$$a, b \in F, \; u, v \in S \;\Rightarrow\; au + bv \in S$$

Example 1.2 Consider the vector space $V(n, 2)$ of all binary $n$-tuples, that is, $n$-tuples of $0$'s and $1$'s. The weight $\mathcal{W}(v)$ of a vector $v \in V(n, 2)$ is the number of nonzero coordinates in $v$. For instance, $\mathcal{W}(101010) = 3$. Let $E_n$ be the set of all vectors in $V(n, 2)$ of even weight. Then $E_n$ is a subspace of $V(n, 2)$. To see this, note that

$$\mathcal{W}(u + v) = \mathcal{W}(u) + \mathcal{W}(v) - 2\mathcal{W}(u \cap v)$$

where $u \cap v$ is the vector in $V(n, 2)$ whose $i$th component is the product of the $i$th components of $u$ and $v$, that is,

$$(u \cap v)_i = u_i \cdot v_i$$

Hence, if $\mathcal{W}(u)$ and $\mathcal{W}(v)$ are both even, so is $\mathcal{W}(u + v)$. Finally, scalar multiplication over $F_2$ is trivial and so $E_n$ is a subspace of $V(n, 2)$, known as the even weight subspace of $V(n, 2)$.

Example 1.3 Any subspace of the vector space $V(n, q)$ is called a linear code. Linear codes are among the most important and most studied types of codes, because their structure allows for efficient encoding and decoding of information.
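The even weight subspace is small enough to check by brute force. The sketch below is an illustration only: it enumerates all binary 4-tuples, verifies that the weight of a sum equals the sum of the weights minus twice the weight of the componentwise product, and confirms that the even weight vectors are closed under addition modulo 2. The helper names `weight`, `add_mod2` and `meet` are hypothetical.

```python
from itertools import product


def weight(v):
    """Number of nonzero coordinates of a binary tuple."""
    return sum(1 for x in v if x != 0)


def add_mod2(u, v):
    """Componentwise sum over the two-element field."""
    return tuple((a + b) % 2 for a, b in zip(u, v))


def meet(u, v):
    """Componentwise product of two binary tuples."""
    return tuple(a * b for a, b in zip(u, v))


n = 4
vectors = list(product((0, 1), repeat=n))
even = [v for v in vectors if weight(v) % 2 == 0]

# Weight identity checked over every pair of vectors.
identity_holds = all(
    weight(add_mod2(u, v)) == weight(u) + weight(v) - 2 * weight(meet(u, v))
    for u in vectors
    for v in vectors
)

# Closure of the even weight vectors under addition.
closed = all(add_mod2(u, v) in even for u in even for v in even)

if __name__ == "__main__":
    print(identity_holds, closed, len(even))
```

Since the only scalars are 0 and 1, closure under addition together with the presence of the zero vector is all that has to be checked here.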

The Lattice of Subspaces

The set $\mathcal{S}(V)$ of all subspaces of a vector space $V$ is partially ordered by set inclusion. The zero subspace $\{0\}$ is the smallest element in $\mathcal{S}(V)$ and the entire space $V$ is the largest element. If $S, T \in \mathcal{S}(V)$, then $S \cap T$ is the largest subspace of $V$ that is contained in both $S$ and $T$. In terms of set inclusion, $S \cap T$ is the greatest lower bound of $S$ and $T$:

$$S \cap T = \operatorname{glb}\{S, T\}$$

Similarly, if $\{S_i \mid i \in K\}$ is any collection of subspaces of $V$, then their intersection is the greatest lower bound of the subspaces:

$$\bigcap_{i \in K} S_i = \operatorname{glb}\{S_i \mid i \in K\}$$

On the other hand, if $S, T \in \mathcal{S}(V)$ (and $F$ is infinite), then $S \cup T \in \mathcal{S}(V)$ if and only if $S \subseteq T$ or $T \subseteq S$. Thus, the union of two subspaces is never a subspace in any "interesting" case. We also have the following.


Theorem 1.2 A nontrivial vector space $V$ over an infinite field $F$ is not the union of a finite number of proper subspaces.
Proof. Suppose that $V = S_1 \cup \cdots \cup S_n$, where we may assume that

$$S_1 \not\subseteq S_2 \cup \cdots \cup S_n$$

Let $w \in S_1 \setminus (S_2 \cup \cdots \cup S_n)$ and let $v \notin S_1$. Consider the infinite set

$$A = \{rw + v \mid r \in F\}$$

which is the "line" through $v$, parallel to $w$. We want to show that each $S_i$ contains at most one vector from the infinite set $A$, which is contrary to the fact that $V = S_1 \cup \cdots \cup S_n$. This will prove the theorem.

If $rw + v \in S_1$ for $r \ne 0$, then $w \in S_1$ implies $v \in S_1$, contrary to assumption. Next, suppose that $r_1 w + v \in S_i$ and $r_2 w + v \in S_i$, for $i \ge 2$, where $r_1 \ne r_2$. Then

$$S_i \ni (r_1 w + v) - (r_2 w + v) = (r_1 - r_2) w$$

and so $w \in S_i$, which is also contrary to assumption.

To determine the smallest subspace of $V$ containing the subspaces $S$ and $T$, we make the following definition.

Definition Let $S$ and $T$ be subspaces of $V$. The sum $S + T$ is defined by

$$S + T = \{u + v \mid u \in S, \; v \in T\}$$

More generally, the sum of any collection $\{S_i \mid i \in K\}$ of subspaces is the set of all finite sums of vectors from the union $\bigcup S_i$:

$$\sum_{i \in K} S_i = \left\{ s_1 + \cdots + s_n \;\middle|\; s_j \in \bigcup_{i \in K} S_i \right\}$$

It is not hard to show that the sum of any collection of subspaces of $V$ is a subspace of $V$ and that the sum is the least upper bound under set inclusion:

$$S + T = \operatorname{lub}\{S, T\}$$

More generally,

$$\sum_{i \in K} S_i = \operatorname{lub}\{S_i \mid i \in K\}$$

If a partially ordered set $P$ has the property that every pair of elements has a least upper bound and greatest lower bound, then $P$ is called a lattice. If $P$ has a smallest element and a largest element and has the property that every collection of elements has a least upper bound and greatest lower bound, then $P$ is called a complete lattice. The least upper bound of a collection is also called the join of the collection and the greatest lower bound is called the meet.

Theorem 1.3 The set $\mathcal{S}(V)$ of all subspaces of a vector space $V$ is a complete lattice under set inclusion, with smallest element $\{0\}$, largest element $V$, meet

$$\operatorname{glb}\{S_i \mid i \in K\} = \bigcap_{i \in K} S_i$$

and join

$$\operatorname{lub}\{S_i \mid i \in K\} = \sum_{i \in K} S_i$$

Direct Sums As we will see, there are many ways to construct new vector spaces from old ones.

External Direct Sums

Definition Let $V_1, \ldots, V_n$ be vector spaces over a field $F$. The external direct sum of $V_1, \ldots, V_n$, denoted by

$$V = V_1 \boxplus \cdots \boxplus V_n$$

is the vector space $V$ whose elements are ordered $n$-tuples:

$$V = \{(v_1, \ldots, v_n) \mid v_i \in V_i, \; i = 1, \ldots, n\}$$

with componentwise operations

$$(u_1, \ldots, u_n) + (v_1, \ldots, v_n) = (u_1 + v_1, \ldots, u_n + v_n)$$

and

$$r(v_1, \ldots, v_n) = (rv_1, \ldots, rv_n)$$

for all $r \in F$.

Example 1.4 The vector space $F^n$ is the external direct sum of $n$ copies of $F$, that is,

$$F^n = F \boxplus \cdots \boxplus F$$

where there are $n$ summands on the right-hand side.

This construction can be generalized to any collection of vector spaces by generalizing the idea that an ordered $n$-tuple $(v_1, \ldots, v_n)$ is just a function $f \colon \{1, \ldots, n\} \to \bigcup V_i$ from the index set $\{1, \ldots, n\}$ to the union of the spaces with the property that $f(i) \in V_i$.


Definition Let $\mathcal{F} = \{V_i \mid i \in K\}$ be any family of vector spaces over $F$. The direct product of $\mathcal{F}$ is the vector space

$$\prod_{i \in K} V_i = \left\{ f \colon K \to \bigcup_{i \in K} V_i \;\middle|\; f(i) \in V_i \right\}$$

thought of as a subspace of the vector space of all functions from $K$ to $\bigcup V_i$.

It will prove more useful to restrict the set of functions to those with finite support.

Definition Let $\mathcal{F} = \{V_i \mid i \in K\}$ be a family of vector spaces over $F$. The support of a function $f \colon K \to \bigcup V_i$ is the set

$$\operatorname{supp}(f) = \{i \in K \mid f(i) \ne 0\}$$

Thus, a function $f$ has finite support if $f(i) = 0$ for all but a finite number of $i \in K$. The external direct sum of the family $\mathcal{F}$ is the vector space

$$\bigoplus_{i \in K}^{\text{ext}} V_i = \left\{ f \colon K \to \bigcup_{i \in K} V_i \;\middle|\; f(i) \in V_i, \; f \text{ has finite support} \right\}$$

thought of as a subspace of the vector space of all functions from $K$ to $\bigcup V_i$.

An important special case occurs when $V_i = V$ for all $i \in K$. If we let $V^K$ denote the set of all functions from $K$ to $V$ and $(V^K)_0$ denote the set of all functions in $V^K$ that have finite support, then

$$\prod_{i \in K} V = V^K \quad \text{and} \quad \bigoplus_{i \in K}^{\text{ext}} V = (V^K)_0$$

Note that the direct product and the external direct sum are the same for a finite family of vector spaces.

Internal Direct Sums

An internal version of the direct sum construction is often more relevant.

Definition A vector space $V$ is the (internal) direct sum of a family $\mathcal{F} = \{S_i \mid i \in I\}$ of subspaces of $V$, written $V =$
is denoted by $\mathcal{L}(V, W)$.
1) A linear transformation from $V$ to $V$ is called a linear operator on $V$. The set of all linear operators on $V$ is denoted by $\mathcal{L}(V)$. A linear operator on a real vector space is called a real operator and a linear operator on a complex vector space is called a complex operator.
2) A linear transformation from $V$ to the base field $F$ (thought of as a vector space over itself) is called a linear functional on $V$. The set of all linear functionals on $V$ is denoted by $V^*$ and called the dual space of $V$.

We should mention that some authors use the term linear operator for any linear transformation from $V$ to $W$. Also, the application of a linear transformation $\tau$ on a vector $v$ is denoted by $\tau(v)$ or by $\tau v$, parentheses being used when necessary, as in $\tau(u + v)$, or to improve readability, as in $(\sigma\tau u)$ rather than $\sigma(\tau(u))$.

Definition The following terms are also employed:
1) homomorphism for linear transformation
2) endomorphism for linear operator
3) monomorphism (or embedding) for injective linear transformation
4) epimorphism for surjective linear transformation
5) isomorphism for bijective linear transformation.

6) automorphism for bijective linear operator.

Example 2.1
1) The derivative $D \colon V \to V$ is a linear operator on the vector space $V$ of all infinitely differentiable functions on $\mathbb{R}$.
2) The integral operator $\tau \colon F[x] \to F[x]$ defined by
$$\tau f = \int_0^x f(t)\,dt$$
is a linear operator on $F[x]$.
3) Let $A$ be an $m \times n$ matrix over $F$. The function $\tau_A \colon F^n \to F^m$ defined by $\tau_A v = Av$, where all vectors are written as column vectors, is a linear transformation from $F^n$ to $F^m$. This function is just multiplication by $A$.
4) The coordinate map $\phi \colon V \to F^n$ of an $n$-dimensional vector space is a linear transformation from $V$ to $F^n$.

The set $\mathcal{L}(V, W)$ is a vector space in its own right and $\mathcal{L}(V)$ has the structure of an algebra, as defined in Chapter 0.

Theorem 2.1
1) The set $\mathcal{L}(V, W)$ is a vector space under ordinary addition of functions and scalar multiplication of functions by elements of $F$.
2) If $\sigma \in \mathcal{L}(U, V)$ and $\tau \in \mathcal{L}(V, W)$, then the composition $\tau\sigma$ is in $\mathcal{L}(U, W)$.
3) If $\tau \in \mathcal{L}(V, W)$ is bijective then $\tau^{-1} \in \mathcal{L}(W, V)$.
4) The vector space $\mathcal{L}(V)$ is an algebra, where multiplication is composition of functions. The identity map $\iota \in \mathcal{L}(V)$ is the multiplicative identity and the zero map $0 \in \mathcal{L}(V)$ is the additive identity.
Proof. We prove only part 3). Let $\tau \colon V \to W$ be a bijective linear transformation. Then $\tau^{-1} \colon W \to V$ is a well-defined function and since any two vectors $w_1$ and $w_2$ in $W$ have the form $w_1 = \tau v_1$ and $w_2 = \tau v_2$, we have

$$\tau^{-1}(aw_1 + bw_2) = \tau^{-1}(a\tau v_1 + b\tau v_2) = \tau^{-1}(\tau(av_1 + bv_2)) = av_1 + bv_2 = a\tau^{-1}(w_1) + b\tau^{-1}(w_2)$$

which shows that $\tau^{-1}$ is linear.

One of the easiest ways to define a linear transformation is to give its values on a basis. The following theorem says that we may assign these values arbitrarily and obtain a unique linear transformation by linear extension to the entire domain.

Theorem 2.2 Let $V$ and $W$ be vector spaces and let $\mathcal{B} = \{v_i \mid i \in I\}$ be a basis for $V$. Then we can define a linear transformation $\tau \in \mathcal{L}(V, W)$ by


specifying the values of $\tau v_i$ arbitrarily for all $v_i \in \mathcal{B}$ and extending $\tau$ to $V$ by linearity, that is,

$$\tau(a_1 v_1 + \cdots + a_n v_n) = a_1 \tau v_1 + \cdots + a_n \tau v_n$$

This process defines a unique linear transformation, that is, if $\tau, \sigma \in \mathcal{L}(V, W)$ satisfy $\tau v_i = \sigma v_i$ for all $v_i \in \mathcal{B}$ then $\tau = \sigma$.
Proof. The crucial point is that the extension by linearity is well-defined, since each vector in $V$ has an essentially unique representation as a linear combination of a finite number of vectors in $\mathcal{B}$. We leave the details to the reader.

Note that if $\tau \in \mathcal{L}(V, W)$ and if $S$ is a subspace of $V$, then the restriction $\tau|_S$ of $\tau$ to $S$ is a linear transformation from $S$ to $W$.

The Kernel and Image of a Linear Transformation

There are two very important vector spaces associated with a linear transformation $\tau$ from $V$ to $W$.

Definition Let $\tau \in \mathcal{L}(V, W)$. The subspace

$$\ker(\tau) = \{v \in V \mid \tau v = 0\}$$

is called the kernel of $\tau$ and the subspace

$$\operatorname{im}(\tau) = \{\tau v \mid v \in V\}$$

is called the image of $\tau$. The dimension of $\ker(\tau)$ is called the nullity of $\tau$ and is denoted by $\operatorname{null}(\tau)$. The dimension of $\operatorname{im}(\tau)$ is called the rank of $\tau$ and is denoted by $\operatorname{rk}(\tau)$.

It is routine to show that $\ker(\tau)$ is a subspace of $V$ and $\operatorname{im}(\tau)$ is a subspace of $W$. Moreover, we have the following.

Theorem 2.3 Let $\tau \in \mathcal{L}(V, W)$. Then
1) $\tau$ is surjective if and only if $\operatorname{im}(\tau) = W$
2) $\tau$ is injective if and only if $\ker(\tau) = \{0\}$
Proof. The first statement is merely a restatement of the definition of surjectivity. To see the validity of the second statement, observe that

$$\tau u = \tau v \;\Leftrightarrow\; \tau(u - v) = 0 \;\Leftrightarrow\; u - v \in \ker(\tau)$$

Hence, if $\ker(\tau) = \{0\}$, then $\tau u = \tau v \Leftrightarrow u = v$, which shows that $\tau$ is injective. Conversely, if $\tau$ is injective and $u \in \ker(\tau)$, then $\tau u = \tau 0$ and so $u = 0$. This shows that $\ker(\tau) = \{0\}$.

Isomorphisms

Definition A bijective linear transformation $\tau \colon V \to W$ is called an isomorphism from $V$ to $W$. When an isomorphism from $V$ to $W$ exists, we say that $V$ and $W$ are isomorphic and write $V \approx W$.

Example 2.2 Let $\dim(V) = n$. For any ordered basis $\mathcal{B}$ of $V$, the coordinate map $\phi_{\mathcal{B}} \colon V \to F^n$ that sends each vector $v \in V$ to its coordinate matrix $[v]_{\mathcal{B}} \in F^n$ is an isomorphism. Hence, any $n$-dimensional vector space over $F$ is isomorphic to $F^n$.

Isomorphic vector spaces share many properties, as the next theorem shows. If $\tau \in \mathcal{L}(V, W)$ and $S \subseteq V$ we write

$$\tau S = \{\tau s \mid s \in S\}$$

Theorem 2.4 Let $\tau \in \mathcal{L}(V, W)$ be an isomorphism. Let $S \subseteq V$. Then
1) $S$ spans $V$ if and only if $\tau S$ spans $W$.
2) $S$ is linearly independent in $V$ if and only if $\tau S$ is linearly independent in $W$.
3) $S$ is a basis for $V$ if and only if $\tau S$ is a basis for $W$.

An isomorphism can be characterized as a linear transformation $\tau \colon V \to W$ that maps a basis for $V$ to a basis for $W$.

Theorem 2.5 A linear transformation $\tau \in \mathcal{L}(V, W)$ is an isomorphism if and only if there is a basis $\mathcal{B}$ for $V$ for which $\tau\mathcal{B}$ is a basis for $W$. In this case, $\tau$ maps any basis of $V$ to a basis of $W$.

The following theorem says that, up to isomorphism, there is only one vector space of any given dimension over a given field.

Theorem 2.6 Let $V$ and $W$ be vector spaces over $F$. Then $V \approx W$ if and only if $\dim(V) = \dim(W)$.

In Example 2.2, we saw that any $n$-dimensional vector space is isomorphic to $F^n$. Now suppose that $B$ is a set of cardinality $\kappa$ and let $(F^B)_0$ be the vector space of all functions from $B$ to $F$ with finite support. We leave it to the reader to show that the functions $\delta_b \in (F^B)_0$ defined for all $b \in B$ by

$$\delta_b(x) = \begin{cases} 1 & \text{if } x = b \\ 0 & \text{if } x \ne b \end{cases}$$

form a basis for $(F^B)_0$, called the standard basis. Hence, $\dim((F^B)_0) = |B|$.

It follows that for any cardinal number $\kappa$, there is a vector space of dimension $\kappa$. Also, any vector space of dimension $\kappa$ is isomorphic to $(F^B)_0$.


Theorem 2.7 If $n$ is a natural number, then any $n$-dimensional vector space over $F$ is isomorphic to $F^n$. If $\kappa$ is any cardinal number and if $B$ is a set of cardinality $\kappa$, then any $\kappa$-dimensional vector space over $F$ is isomorphic to the vector space $(F^B)_0$ of all functions from $B$ to $F$ with finite support.

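The Rank Plus Nullity Theorem

Let $\tau \in \mathcal{L}(V, W)$. Since any subspace of $V$ has a complement, we can write

$$V = \ker(\tau) \oplus \ker(\tau)^c$$

where $\ker(\tau)^c$ is a complement of $\ker(\tau)$ in $V$. It follows that

$$\dim(V) = \dim(\ker(\tau)) + \dim(\ker(\tau)^c)$$

Now, the restriction of $\tau$ to $\ker(\tau)^c$,

$$\tau^c \colon \ker(\tau)^c \to W$$

is injective, since

$$\ker(\tau^c) = \ker(\tau) \cap \ker(\tau)^c = \{0\}$$

Also, $\operatorname{im}(\tau^c) \subseteq \operatorname{im}(\tau)$. For the reverse inclusion, if $\tau v \in \operatorname{im}(\tau)$, then since $v = u + w$ for $u \in \ker(\tau)$ and $w \in \ker(\tau)^c$, we have

$$\tau v = \tau u + \tau w = \tau w = \tau^c w \in \operatorname{im}(\tau^c)$$

Thus $\operatorname{im}(\tau^c) = \operatorname{im}(\tau)$. It follows that

$$\ker(\tau)^c \approx \operatorname{im}(\tau)$$

From this, we deduce the following theorem.

Theorem 2.8 Let $\tau \in \mathcal{L}(V, W)$.
1) Any complement of $\ker(\tau)$ is isomorphic to $\operatorname{im}(\tau)$
2) (The rank plus nullity theorem)
$$\dim(\ker(\tau)) + \dim(\operatorname{im}(\tau)) = \dim(V)$$
or, in other notation,
$$\operatorname{rk}(\tau) + \operatorname{null}(\tau) = \dim(V)$$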

Theorem 2.8 has an important corollary.

Corollary 2.9 Let $\tau \in \mathcal{L}(V, W)$, where $\dim(V) = \dim(W) < \infty$. Then $\tau$ is injective if and only if it is surjective.

Note that this result fails if the vector spaces are not finite-dimensional. The reader is encouraged to find an example to support this statement.
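The rank plus nullity theorem can be checked on a concrete matrix. The sketch below is only an illustration over the rational numbers: it row reduces a sample matrix with exact `Fraction` arithmetic, reads the rank off the pivot columns, builds a basis of the kernel from the free columns, and confirms that rank plus nullity equals the number of columns. The function names `rref` and `kernel_basis` and the sample matrix are assumptions made for this example.

```python
from fractions import Fraction


def rref(rows):
    """Return (R, pivots): the reduced row echelon form and its pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivots = []
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        pivot = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        pivot_value = m[r][c]
        m[r] = [x / pivot_value for x in m[r]]
        for i in range(n_rows):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots


def kernel_basis(rows):
    """Basis of the solution space of A x = 0, one vector per free column."""
    R, pivots = rref(rows)
    n_cols = len(rows[0])
    basis = []
    for f in (c for c in range(n_cols) if c not in pivots):
        v = [Fraction(0)] * n_cols
        v[f] = Fraction(1)
        for row_index, p in enumerate(pivots):
            v[p] = -R[row_index][f]
        basis.append(v)
    return basis


A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]
rank = len(rref(A)[1])
nullity = len(kernel_basis(A))

if __name__ == "__main__":
    print(rank, nullity, rank + nullity == len(A[0]))
```

For a square matrix such as this one, a nonzero nullity also shows the failure of surjectivity, in line with Corollary 2.9.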

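Linear Transformations from $F^n$ to $F^m$

Recall that for any $m \times n$ matrix $A$ over $F$ the multiplication map

$$\tau_A(v) = Av$$

is a linear transformation. In fact, any linear transformation $\tau \in \mathcal{L}(F^n, F^m)$ has this form, that is, $\tau$ is just multiplication by a matrix, for we have

$$(\tau e_1 \mid \cdots \mid \tau e_n)\, e_i = (\tau e_1 \mid \cdots \mid \tau e_n)^{(i)} = \tau e_i$$

and so $\tau = \tau_A$, where

$$A = (\tau e_1 \mid \cdots \mid \tau e_n)$$

Theorem 2.10
1) If $A$ is an $m \times n$ matrix over $F$ then $\tau_A \in \mathcal{L}(F^n, F^m)$.
2) If $\tau \in \mathcal{L}(F^n, F^m)$ then $\tau = \tau_A$, where
$$A = (\tau e_1 \mid \cdots \mid \tau e_n)$$
The matrix $A$ is called the matrix of $\tau$.

Example 2.3 Consider the linear transformation $\tau \colon F^3 \to F^3$ defined by

$$\tau(x, y, z) = (x - 2y, \; z, \; x + y + z)$$

Then we have, in column form,

$$\tau \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x - 2y \\ z \\ x + y + z \end{pmatrix} = \begin{pmatrix} 1 & -2 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}$$

and so the standard matrix of $\tau$ is

$$A = \begin{pmatrix} 1 & -2 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix}$$

If $A \in \mathcal{M}_{m,n}$, then since the image of $\tau_A$ is the column space of $A$, we have

$$\dim(\ker(\tau_A)) + \operatorname{rk}(A) = \dim(F^n)$$

This gives the following useful result.

Theorem 2.11 Let $A$ be an $m \times n$ matrix over $F$.
1) $\tau_A \colon F^n \to F^m$ is injective if and only if $\operatorname{rk}(A) = n$.
2) $\tau_A \colon F^n \to F^m$ is surjective if and only if $\operatorname{rk}(A) = m$.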
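The recipe in Theorem 2.10, in which the columns of the matrix are the images of the standard basis vectors, translates directly into code. The following sketch is an illustration under stated assumptions: the sample map `tau`, sending (x, y, z) to (x - 2y, z, x + y + z), is chosen to be of the kind treated in Example 2.3, and the helper names `standard_matrix` and `apply` are hypothetical.

```python
def tau(v):
    """Sample linear map on 3-tuples (an assumption made for this illustration)."""
    x, y, z = v
    return (x - 2 * y, z, x + y + z)


def standard_matrix(f, n):
    """Matrix whose i-th column is f(e_i), returned as a list of rows."""
    columns = []
    for i in range(n):
        e_i = tuple(1 if j == i else 0 for j in range(n))
        columns.append(f(e_i))
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def apply(A, v):
    """Matrix-vector product A v."""
    return tuple(sum(a * x for a, x in zip(row, v)) for row in A)


A = standard_matrix(tau, 3)

if __name__ == "__main__":
    for row in A:
        print(row)
    print(apply(A, (3, 1, 4)) == tau((3, 1, 4)))
```

Multiplying by the resulting matrix agrees with applying the map directly, which is the content of part 2) of the theorem for this particular map.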


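Change of Basis Matrices

Suppose that $\mathcal{B} = (b_1, \ldots, b_n)$ and $\mathcal{C} = (c_1, \ldots, c_n)$ are ordered bases for a vector space $V$. It is natural to ask how the coordinate matrices $[v]_{\mathcal{B}}$ and $[v]_{\mathcal{C}}$ are related. Referring to Figure 2.1,

[Figure 2.1: the coordinate maps $\phi_{\mathcal{B}}, \phi_{\mathcal{C}} \colon V \to F^n$ and the change of basis operator $\phi_{\mathcal{C}}\phi_{\mathcal{B}}^{-1} \colon F^n \to F^n$]

the map that takes $[v]_{\mathcal{B}}$ to $[v]_{\mathcal{C}}$ is

$$\phi_{\mathcal{B},\mathcal{C}} = \phi_{\mathcal{C}}\phi_{\mathcal{B}}^{-1}$$

and is called the change of basis operator (or change of coordinates operator). Since $\phi_{\mathcal{B},\mathcal{C}}$ is an operator on $F^n$, it has the form $\tau_A$, where

$$A = (\phi_{\mathcal{B},\mathcal{C}}(e_1) \mid \cdots \mid \phi_{\mathcal{B},\mathcal{C}}(e_n)) = (\phi_{\mathcal{C}}\phi_{\mathcal{B}}^{-1}([b_1]_{\mathcal{B}}) \mid \cdots \mid \phi_{\mathcal{C}}\phi_{\mathcal{B}}^{-1}([b_n]_{\mathcal{B}})) = ([b_1]_{\mathcal{C}} \mid \cdots \mid [b_n]_{\mathcal{C}})$$

We denote $A$ by $M_{\mathcal{B},\mathcal{C}}$ and call it the change of basis matrix from $\mathcal{B}$ to $\mathcal{C}$.

Theorem 2.12 Let $\mathcal{B} = (b_1, \ldots, b_n)$ and $\mathcal{C}$ be ordered bases for a vector space $V$. Then the change of basis operator $\phi_{\mathcal{B},\mathcal{C}} = \phi_{\mathcal{C}}\phi_{\mathcal{B}}^{-1}$ is an automorphism of $F^n$, whose standard matrix is

$$M_{\mathcal{B},\mathcal{C}} = ([b_1]_{\mathcal{C}} \mid \cdots \mid [b_n]_{\mathcal{C}})$$

Hence

$$[v]_{\mathcal{C}} = M_{\mathcal{B},\mathcal{C}}[v]_{\mathcal{B}}$$

and $M_{\mathcal{C},\mathcal{B}} = M_{\mathcal{B},\mathcal{C}}^{-1}$.

Consider the equation

$$A = M_{\mathcal{B},\mathcal{C}}$$

or equivalently,

$$A = ([b_1]_{\mathcal{C}} \mid \cdots \mid [b_n]_{\mathcal{C}})$$

Then given any two of $A$ (an invertible $n \times n$ matrix), $\mathcal{B}$ (an ordered basis for $F^n$) and $\mathcal{C}$ (an ordered basis for $F^n$), the third component is uniquely determined by this equation. This is clear if $\mathcal{B}$ and $\mathcal{C}$ are given or if $A$ and $\mathcal{C}$ are given. If $A$ and $\mathcal{B}$ are given, then there is a unique $\mathcal{C}$ for which $A^{-1} = M_{\mathcal{C},\mathcal{B}}$ and so there is a unique $\mathcal{C}$ for which $A = M_{\mathcal{B},\mathcal{C}}$.

Theorem 2.13 If we are given any two of the following:
1) an invertible $n \times n$ matrix $A$
2) an ordered basis $\mathcal{B}$ for $F^n$
3) an ordered basis $\mathcal{C}$ for $F^n$.
then the third is uniquely determined by the equation

$$A = M_{\mathcal{B},\mathcal{C}}$$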
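A small numerical check may help fix the direction of the change of basis matrix. The sketch below works in the plane over the rationals and is only an illustration: the two ordered bases and the vector are arbitrary sample choices, coordinates are found by Cramer's rule, and the helper names `coords`, `change_of_basis`, `mat_vec` and `mat_mul` are hypothetical. It builds the matrix whose columns are the coordinates of the first basis relative to the second, then checks that it converts coordinates and that the matrix for the opposite direction is its inverse.

```python
from fractions import Fraction


def coords(basis, v):
    """Coordinates of v relative to an ordered basis (c1, c2) of the plane."""
    (a, c), (b, d) = basis  # c1 = (a, c), c2 = (b, d)
    det = Fraction(a * d - b * c)
    x = Fraction(v[0] * d - b * v[1]) / det
    y = Fraction(a * v[1] - c * v[0]) / det
    return (x, y)


def change_of_basis(B, C):
    """Matrix whose columns are the C-coordinates of the vectors of B."""
    cols = [coords(C, b) for b in B]
    return [[cols[0][0], cols[1][0]], [cols[0][1], cols[1][1]]]


def mat_vec(M, v):
    return tuple(sum(m * x for m, x in zip(row, v)) for row in M)


def mat_mul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


B = ((1, 1), (1, -1))
C = ((2, 0), (1, 1))
v = (3, 5)

M_BC = change_of_basis(B, C)
M_CB = change_of_basis(C, B)

if __name__ == "__main__":
    print(mat_vec(M_BC, coords(B, v)) == coords(C, v))
    print(mat_mul(M_CB, M_BC) == [[1, 0], [0, 1]])
```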

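The Matrix of a Linear Transformation

Let $\tau \colon V \to W$ be a linear transformation, where $\dim(V) = n$ and $\dim(W) = m$ and let $\mathcal{B} = (b_1, \ldots, b_n)$ be an ordered basis for $V$ and $\mathcal{C}$ an ordered basis for $W$. Then the map

$$\theta \colon [v]_{\mathcal{B}} \to [\tau v]_{\mathcal{C}}$$

is a representation of $\tau$ as a linear transformation from $F^n$ to $F^m$, in the sense that knowing $\theta$ (along with $\mathcal{B}$ and $\mathcal{C}$, of course) is equivalent to knowing $\tau$. Of course, this representation depends on the choice of ordered bases $\mathcal{B}$ and $\mathcal{C}$.

Since $\theta$ is a linear transformation from $F^n$ to $F^m$, it is just multiplication by an $m \times n$ matrix $A$, that is,

$$[\tau v]_{\mathcal{C}} = A[v]_{\mathcal{B}}$$

Indeed, since $[b_i]_{\mathcal{B}} = e_i$, we get the columns of $A$ as follows:

$$A^{(i)} = Ae_i = A[b_i]_{\mathcal{B}} = [\tau b_i]_{\mathcal{C}}$$

Theorem 2.14 Let $\tau \in \mathcal{L}(V, W)$ and let $\mathcal{B} = (b_1, \ldots, b_n)$ and $\mathcal{C}$ be ordered bases for $V$ and $W$, respectively. Then $\tau$ can be represented with respect to $\mathcal{B}$ and $\mathcal{C}$ as matrix multiplication, that is,

$$[\tau v]_{\mathcal{C}} = [\tau]_{\mathcal{B},\mathcal{C}}[v]_{\mathcal{B}}$$

where

$$[\tau]_{\mathcal{B},\mathcal{C}} = ([\tau b_1]_{\mathcal{C}} \mid \cdots \mid [\tau b_n]_{\mathcal{C}})$$

is called the matrix of $\tau$ with respect to the bases $\mathcal{B}$ and $\mathcal{C}$. When $V = W$ and $\mathcal{B} = \mathcal{C}$, we denote $[\tau]_{\mathcal{B},\mathcal{B}}$ by $[\tau]_{\mathcal{B}}$ and so

$$[\tau v]_{\mathcal{B}} = [\tau]_{\mathcal{B}}[v]_{\mathcal{B}}$$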

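Example 2.4 Let $D \colon \mathcal{P}_2 \to \mathcal{P}_2$ be the derivative operator, defined on the vector space of all polynomials of degree at most $2$. Let $\mathcal{B} = \mathcal{C} = (1, x, x^2)$. Then

$$[D(1)]_{\mathcal{C}} = [0]_{\mathcal{C}} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \quad [D(x)]_{\mathcal{C}} = [1]_{\mathcal{C}} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad [D(x^2)]_{\mathcal{C}} = [2x]_{\mathcal{C}} = \begin{pmatrix} 0 \\ 2 \\ 0 \end{pmatrix}$$

and so

$$[D]_{\mathcal{B}} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}$$

Hence, for example, if $p(x) = 5 + x + 2x^2$, then

$$[Dp(x)]_{\mathcal{C}} = [D]_{\mathcal{B}}[p(x)]_{\mathcal{B}} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 5 \\ 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 \\ 4 \\ 0 \end{pmatrix}$$

and so $Dp(x) = 1 + 4x$.

The following result shows that we may work equally well with linear transformations or with the matrices that represent them (with respect to fixed ordered bases $\mathcal{B}$ and $\mathcal{C}$). This applies not only to addition and scalar multiplication, but also to matrix multiplication.

Theorem 2.15 Let $V$ and $W$ be finite-dimensional vector spaces over $F$, with ordered bases $\mathcal{B} = (b_1, \ldots, b_n)$ and $\mathcal{C} = (c_1, \ldots, c_m)$, respectively.
1) The map $\mu \colon \mathcal{L}(V, W) \to \mathcal{M}_{m,n}(F)$ defined by
$$\mu(\tau) = [\tau]_{\mathcal{B},\mathcal{C}}$$
is an isomorphism and so $\mathcal{L}(V, W) \approx \mathcal{M}_{m,n}(F)$. Hence,
$$\dim(\mathcal{L}(V, W)) = \dim(\mathcal{M}_{m,n}(F)) = m \times n$$
2) If $\sigma \in \mathcal{L}(U, V)$ and $\tau \in \mathcal{L}(V, W)$ and if $\mathcal{B}$, $\mathcal{C}$ and $\mathcal{D}$ are ordered bases for $U$, $V$ and $W$, respectively, then
$$[\tau\sigma]_{\mathcal{B},\mathcal{D}} = [\tau]_{\mathcal{C},\mathcal{D}}[\sigma]_{\mathcal{B},\mathcal{C}}$$
Thus, the matrix of the product (composition) $\tau\sigma$ is the product of the matrices of $\tau$ and $\sigma$. In fact, this is the primary motivation for the definition of matrix multiplication.
Proof. To see that $\mu$ is linear, observe that for all $i$,

$$\begin{aligned} [s\sigma + t\tau]_{\mathcal{B},\mathcal{C}}[b_i]_{\mathcal{B}} &= [(s\sigma + t\tau)(b_i)]_{\mathcal{C}} \\ &= [s\sigma(b_i) + t\tau(b_i)]_{\mathcal{C}} \\ &= s[\sigma(b_i)]_{\mathcal{C}} + t[\tau(b_i)]_{\mathcal{C}} \\ &= s[\sigma]_{\mathcal{B},\mathcal{C}}[b_i]_{\mathcal{B}} + t[\tau]_{\mathcal{B},\mathcal{C}}[b_i]_{\mathcal{B}} \\ &= (s[\sigma]_{\mathcal{B},\mathcal{C}} + t[\tau]_{\mathcal{B},\mathcal{C}})[b_i]_{\mathcal{B}} \end{aligned}$$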

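and since $[b_i]_{\mathcal{B}} = e_i$ is a standard basis vector, we conclude that

$$[s\sigma + t\tau]_{\mathcal{B},\mathcal{C}} = s[\sigma]_{\mathcal{B},\mathcal{C}} + t[\tau]_{\mathcal{B},\mathcal{C}}$$

and so $\mu$ is linear. If $A \in \mathcal{M}_{m,n}$, we define $\tau$ by the condition $[\tau b_i]_{\mathcal{C}} = A^{(i)}$, whence $\mu(\tau) = A$ and $\mu$ is surjective. Also, $\ker(\mu) = \{0\}$ since $[\tau]_{\mathcal{B},\mathcal{C}} = 0$ implies that $\tau = 0$. Hence, the map $\mu$ is an isomorphism. To prove part 2), we have

$$[\tau\sigma]_{\mathcal{B},\mathcal{D}}[v]_{\mathcal{B}} = [\tau(\sigma(v))]_{\mathcal{D}} = [\tau]_{\mathcal{C},\mathcal{D}}[\sigma(v)]_{\mathcal{C}} = [\tau]_{\mathcal{C},\mathcal{D}}[\sigma]_{\mathcal{B},\mathcal{C}}[v]_{\mathcal{B}}$$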
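The derivative operator on polynomials of degree at most 2 gives a convenient test case for both the matrix of an operator and the composition rule in Theorem 2.15. The sketch below is an illustration, not the author's code: polynomials are stored as coordinate triples relative to the ordered basis of monomials 1, x and x squared, the sample polynomial 5 + x + 2x^2 is an arbitrary choice, and the names `derivative_coords`, `matrix_of`, `mat_vec` and `mat_mul` are hypothetical.

```python
def derivative_coords(p):
    """Coordinates of p' from the coordinates (a0, a1, a2) of p in (1, x, x^2)."""
    a0, a1, a2 = p
    return (a1, 2 * a2, 0)


def matrix_of(op, n):
    """Matrix whose i-th column is op applied to the i-th standard basis vector."""
    cols = [op(tuple(1 if j == i else 0 for j in range(n))) for i in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def mat_vec(A, v):
    return tuple(sum(a * x for a, x in zip(row, v)) for row in A)


def mat_mul(A, B):
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]


def second_derivative_coords(p):
    """The composition of the derivative with itself, in coordinates."""
    return derivative_coords(derivative_coords(p))


D = matrix_of(derivative_coords, 3)
p = (5, 1, 2)            # coordinates of 5 + x + 2x^2
Dp = mat_vec(D, p)       # coordinates of the derivative

D2 = matrix_of(second_derivative_coords, 3)

if __name__ == "__main__":
    print(D)
    print(Dp)
    print(D2 == mat_mul(D, D))
```

The last comparison is an instance of part 2) of the theorem with all three bases equal: the matrix of the composite operator is the product of the matrices.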

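Change of Bases for Linear Transformations

Since the matrix $[\tau]_{\mathcal{B},\mathcal{C}}$ that represents $\tau$ depends on the ordered bases $\mathcal{B}$ and $\mathcal{C}$, it is natural to wonder how to choose these bases in order to make this matrix as simple as possible. For instance, can we always choose the bases so that $\tau$ is represented by a diagonal matrix?

As we will see in Chapter 7, the answer to this question is no. In that chapter, we will take up the general question of how best to represent a linear operator by a matrix. For now, let us take the first step and describe the relationship between the matrices $[\tau]_{\mathcal{B},\mathcal{C}}$ and $[\tau]_{\mathcal{B}',\mathcal{C}'}$ of $\tau$ with respect to two different pairs $(\mathcal{B}, \mathcal{C})$ and $(\mathcal{B}', \mathcal{C}')$ of ordered bases. Multiplication by $[\tau]_{\mathcal{B}',\mathcal{C}'}$ sends $[v]_{\mathcal{B}'}$ to $[\tau v]_{\mathcal{C}'}$. This can be reproduced by first switching from $\mathcal{B}'$ to $\mathcal{B}$, then applying $[\tau]_{\mathcal{B},\mathcal{C}}$ and finally switching from $\mathcal{C}$ to $\mathcal{C}'$, that is,

$$[\tau]_{\mathcal{B}',\mathcal{C}'} = M_{\mathcal{C},\mathcal{C}'}[\tau]_{\mathcal{B},\mathcal{C}}M_{\mathcal{B}',\mathcal{B}} = M_{\mathcal{C},\mathcal{C}'}[\tau]_{\mathcal{B},\mathcal{C}}M_{\mathcal{B},\mathcal{B}'}^{-1}$$

Theorem 2.16 Let $\tau \in \mathcal{L}(V, W)$ and let $(\mathcal{B}, \mathcal{C})$ and $(\mathcal{B}', \mathcal{C}')$ be pairs of ordered bases of $V$ and $W$, respectively. Then

$$[\tau]_{\mathcal{B}',\mathcal{C}'} = M_{\mathcal{C},\mathcal{C}'}[\tau]_{\mathcal{B},\mathcal{C}}M_{\mathcal{B}',\mathcal{B}} \tag{2.1}$$

When $\tau \in \mathcal{L}(V)$ is a linear operator on $V$, it is generally more convenient to represent $\tau$ by matrices of the form $[\tau]_{\mathcal{B}}$, where the ordered bases used to represent vectors in the domain and image are the same. When $\mathcal{B} = \mathcal{C}$, Theorem 2.16 takes the following important form.

Corollary 2.17 Let $\tau \in \mathcal{L}(V)$ and let $\mathcal{B}$ and $\mathcal{C}$ be ordered bases for $V$. Then the matrix of $\tau$ with respect to $\mathcal{C}$ can be expressed in terms of the matrix of $\tau$ with respect to $\mathcal{B}$ as follows:

$$[\tau]_{\mathcal{C}} = M_{\mathcal{B},\mathcal{C}}[\tau]_{\mathcal{B}}M_{\mathcal{B},\mathcal{C}}^{-1} \tag{2.2}$$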

Equivalence of Matrices

Since the change of basis matrices are precisely the invertible matrices, (2.1) has the form

$$[\tau]_{\mathcal{B}',\mathcal{C}'} = P[\tau]_{\mathcal{B},\mathcal{C}}Q^{-1}$$

where $P$ and $Q$ are invertible matrices. This motivates the following definition.

Definition Two matrices $A$ and $B$ are equivalent if there exist invertible matrices $P$ and $Q$ for which

$$B = PAQ^{-1}$$

We have remarked that $B$ is equivalent to $A$ if and only if $B$ can be obtained from $A$ by a series of elementary row and column operations. Performing the row operations is equivalent to multiplying the matrix $A$ on the left by $P$ and performing the column operations is equivalent to multiplying $A$ on the right by $Q^{-1}$.

In terms of (2.1), we see that performing row operations (premultiplying by $P$) is equivalent to changing the basis used to represent vectors in the image and performing column operations (postmultiplying by $Q^{-1}$) is equivalent to changing the basis used to represent vectors in the domain.

According to Theorem 2.16, if $A$ and $B$ are matrices that represent $\tau$ with respect to possibly different ordered bases, then $A$ and $B$ are equivalent. The converse of this also holds.

Theorem 2.18 Let $V$ and $W$ be vector spaces with $\dim(V) = n$ and $\dim(W) = m$. Then two $m \times n$ matrices $A$ and $B$ are equivalent if and only if they represent the same linear transformation $\tau \in \mathcal{L}(V, W)$, but possibly with respect to different ordered bases. In this case, $A$ and $B$ represent exactly the same set of linear transformations in $\mathcal{L}(V, W)$.
Proof. If $A$ and $B$ represent $\tau$, that is, if

$$A = [\tau]_{\mathcal{B},\mathcal{C}} \quad \text{and} \quad B = [\tau]_{\mathcal{B}',\mathcal{C}'}$$

for ordered bases $\mathcal{B}$, $\mathcal{C}$, $\mathcal{B}'$ and $\mathcal{C}'$, then Theorem 2.16 shows that $A$ and $B$ are equivalent. Now suppose that $A$ and $B$ are equivalent, say

$$B = PAQ^{-1}$$

where $P$ and $Q$ are invertible. Suppose also that $A$ represents a linear transformation $\tau \in \mathcal{L}(V, W)$ for some ordered bases $\mathcal{B}$ and $\mathcal{C}$, that is,

$$A = [\tau]_{\mathcal{B},\mathcal{C}}$$

Theorem 2.13 implies that there is a unique ordered basis $\mathcal{B}'$ for $V$ for which $Q = M_{\mathcal{B},\mathcal{B}'}$ and a unique ordered basis $\mathcal{C}'$ for $W$ for which $P = M_{\mathcal{C},\mathcal{C}'}$. Hence

$$B = M_{\mathcal{C},\mathcal{C}'}[\tau]_{\mathcal{B},\mathcal{C}}M_{\mathcal{B}',\mathcal{B}} = [\tau]_{\mathcal{B}',\mathcal{C}'}$$

Hence, $B$ also represents $\tau$. By symmetry, we see that $A$ and $B$ represent the same set of linear transformations. This completes the proof.

We remarked in Example 0.3 that every matrix is equivalent to exactly one matrix of the block form

$$J_k = \begin{pmatrix} I_k & 0_{k,n-k} \\ 0_{m-k,k} & 0_{m-k,n-k} \end{pmatrix}_{\text{block}}$$

Hence, the set of these matrices is a set of canonical forms for equivalence. Moreover, the rank is a complete invariant for equivalence. In other words, two matrices are equivalent if and only if they have the same rank.

Similarity of Matrices

When a linear operator $\tau \in \mathcal{L}(V)$ is represented by a matrix of the form $[\tau]_{\mathcal{B}}$, equation (2.2) has the form

$$[\tau]_{\mathcal{B}'} = P[\tau]_{\mathcal{B}}P^{-1}$$

where $P$ is an invertible matrix. This motivates the following definition.

Definition Two matrices $A$ and $B$ are similar, denoted by $A \sim B$, if there exists an invertible matrix $P$ for which

$$B = PAP^{-1}$$

The equivalence classes associated with similarity are called similarity classes.

The analog of Theorem 2.18 for square matrices is the following.

Theorem 2.19 Let $V$ be a vector space of dimension $n$. Then two $n \times n$ matrices $A$ and $B$ are similar if and only if they represent the same linear operator $\tau \in \mathcal{L}(V)$, but possibly with respect to different ordered bases. In this case, $A$ and $B$ represent exactly the same set of linear operators in $\mathcal{L}(V)$.
Proof. If $A$ and $B$ represent $\tau \in \mathcal{L}(V)$, that is, if

$$A = [\tau]_{\mathcal{B}} \quad \text{and} \quad B = [\tau]_{\mathcal{C}}$$

for ordered bases $\mathcal{B}$ and $\mathcal{C}$, then Corollary 2.17 shows that $A$ and $B$ are similar. Now suppose that $A$ and $B$ are similar, say

$$B = PAP^{-1}$$

Suppose also that $A$ represents a linear operator $\tau \in \mathcal{L}(V)$ for some ordered basis $\mathcal{B}$, that is,

$$A = [\tau]_{\mathcal{B}}$$

Theorem 2.13 implies that there is a unique ordered basis $\mathcal{C}$ for $V$ for which $P = M_{\mathcal{B},\mathcal{C}}$. Hence

$$B = M_{\mathcal{B},\mathcal{C}}[\tau]_{\mathcal{B}}M_{\mathcal{B},\mathcal{C}}^{-1} = [\tau]_{\mathcal{C}}$$

Hence, $B$ also represents $\tau$. By symmetry, we see that $A$ and $B$ represent the same set of linear operators. This completes the proof.

We will devote much effort in Chapter 7 to finding a canonical form for similarity.
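The relation between two matrices of the same operator can be verified without computing an inverse, by checking the equivalent identity in which the change of basis matrix is moved to the other side. The sketch below is an illustration over the rationals in the plane: the operator `tau`, the two ordered bases and the helper names `coords`, `matrix_of`, `change_of_basis` and `mat_mul` are all assumptions made for this example.

```python
from fractions import Fraction


def coords(basis, v):
    """Coordinates of v relative to an ordered basis (c1, c2) of the plane."""
    (a, c), (b, d) = basis  # c1 = (a, c), c2 = (b, d)
    det = Fraction(a * d - b * c)
    x = Fraction(v[0] * d - b * v[1]) / det
    y = Fraction(a * v[1] - c * v[0]) / det
    return (x, y)


def columns_to_matrix(cols):
    return [[cols[0][0], cols[1][0]], [cols[0][1], cols[1][1]]]


def matrix_of(op, basis):
    """Matrix of op with respect to one ordered basis used for domain and image."""
    return columns_to_matrix([coords(basis, op(b)) for b in basis])


def change_of_basis(B, C):
    """Matrix whose columns are the C-coordinates of the vectors of B."""
    return columns_to_matrix([coords(C, b) for b in B])


def mat_mul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def tau(v):
    """Sample operator on the plane (an assumption made for this illustration)."""
    x, y = v
    return (x + y, y)


B = ((1, 0), (0, 1))
C = ((1, 1), (1, -1))

A_B = matrix_of(tau, B)
A_C = matrix_of(tau, C)
M = change_of_basis(B, C)

# A_C = M A_B M^{-1} is equivalent to A_C M = M A_B.
similar = mat_mul(A_C, M) == mat_mul(M, A_B)

if __name__ == "__main__":
    print(A_B, A_C, similar)
```

The two matrices look quite different, yet they share the trace and determinant, as similar matrices must.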

Similarity of Operators We can also define similarity of operators. Definition Two linear operators Á B²= ³ are similar, denoted by , if there exists an automorphism B²= ³ for which ~ c The equivalence classes associated with similarity are called similarity classes. Note that if 8 ~ ² Á Ã Á ³ and 9 ~ ² Á Ã Á ³ are ordered bases for = , then 49Á8 ~ ²´ µ8 Ä ´ µ8 ³ Now, the map defined by ² ³ ~ is an automorphism of = and 49Á8 ~ ²´² ³µ8 Ä ´² ³µ8 ³ ~ ´µ8 Conversely, if ¢ = ¦ = is an automorphism and 8 ~ ² Á Ã Á ³ is an ordered basis for = , then 9 ~ ² ~ ² ³Á Ã Á ~ ² ³³ is also a basis: ´µ8 ~ ²´² ³µ8 Ä ´² ³µ8 ³ ~ 49Á8 The analog of Theorem 2.19 for linear operators is the following. Theorem 2.20 Let = be a vector space of dimension . Then two linear operators and on = are similar if and only if there is a matrix ( C that represents both operators, but with respect to possibly different ordered bases. In this case, and are represented by exactly the same set of matrices in C . Proof. If and are represented by ( C , that is, if ´ µ8 ~ ( ~ ´µ9 for ordered bases 8 and 9, then ´µ9 ~ ´ µ8 ~ 49Á8 ´ µ9 48Á9 As remarked above, if ¢ = ¦ = is defined by ² ³ ~ , then

72


´µ9 ~ 48Á9 and so c ´µ9 ~ ´µc 9 ´ µ9 ´µ9 ~ ´ µ9

from which it follows that and are similar. Conversely, suppose that and are similar, say ~ c where is an automorphism of = . Suppose also that is represented by the matrix ( C , that is, ( ~ ´ µ8 for some ordered basis 8 . Then ´µ8 ~ 49Á8 and so c ´µ8 ~ ´c µ8 ~ ´µ8 ´ µ8 ´µc 8 ~ 4 9 Á8 ´ µ 8 4 9 Á 8

It follows that ( ~ ´ µ8 ~ 48Á9 ´µ8 48c Á9 ~ ´ µ9 and so ( also represents . By symmetry, we see that and are represented by the same set of matrices. This completes the proof. We can summarize the sitiation with respect to similarity in Figure 2.2. Each similarity class I in B²= ³ corresponds to a similarity class J in C ²- ³: J is the set of all matrices that represent any I and I is the set of all operators in B²= ³ that are represented by any 4 J .

Figure 2.2 (diagram pairing the similarity classes of $\mathcal{L}(V)$ with the similarity classes of matrices)

Invariant Subspaces and Reducing Pairs

The restriction of a linear operator $\tau \in \mathcal{L}(V)$ to a subspace $S$ of $V$ is not necessarily a linear operator on $S$. This prompts the following definition.

Definition Let $\tau \in \mathcal{L}(V)$. A subspace $S$ of $V$ is said to be invariant under $\tau$ or $\tau$-invariant if $\tau S \subseteq S$, that is, if $\tau s \in S$ for all $s \in S$. Put another way, $S$ is invariant under $\tau$ if the restriction $\tau|_S$ is a linear operator on $S$.

If

$$V = S \oplus T$$

then the fact that $S$ is $\tau$-invariant does not imply that the complement $T$ is also $\tau$-invariant. (The reader may wish to supply a simple example with $V = \mathbb{R}^2$.)

Definition Let $\tau \in \mathcal{L}(V)$. If $V = S \oplus T$ and if both $S$ and $T$ are $\tau$-invariant, we say that the pair $(S, T)$ reduces $\tau$.

A reducing pair can be used to decompose a linear operator into a direct sum as follows.

Definition Let $\tau \in \mathcal{L}(V)$. If $(S, T)$ reduces $\tau$ we write

$$\tau = \tau|_S \oplus \tau|_T$$

and call $\tau$ the direct sum of $\tau|_S$ and $\tau|_T$. Thus, the expression

$$\rho = \sigma \oplus \tau$$

means that there exist subspaces $S$ and $T$ of $V$ for which $(S, T)$ reduces $\rho$ and

$$\sigma = \rho|_S \quad\text{and}\quad \tau = \rho|_T$$

The concept of the direct sum of linear operators will play a key role in the study of the structure of a linear operator.
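One possible answer to the parenthetical request for an example in $\mathbb{R}^2$ can be checked numerically. The sketch below assumes a hypothetical shear operator $\tau(x, y) = (x + y, y)$, with $S$ the $x$-axis and $T$ the $y$-axis; neither the operator nor the helper names come from the text.

```python
def apply(matrix, vector):
    """Apply a 2x2 matrix (tuple of rows) to a vector in R^2."""
    return tuple(sum(a * x for a, x in zip(row, vector)) for row in matrix)


def in_line(vector, direction):
    """True if `vector` lies on the line spanned by the nonzero `direction`."""
    return vector[0] * direction[1] - vector[1] * direction[0] == 0


# Hypothetical operator tau(x, y) = (x + y, y), a shear of R^2.
tau = ((1, 1), (0, 1))
s_direction = (1, 0)  # S = x-axis
t_direction = (0, 1)  # T = y-axis, a complement of S

# A line is tau-invariant exactly when tau maps its spanning vector back into it.
s_is_invariant = in_line(apply(tau, s_direction), s_direction)
t_is_invariant = in_line(apply(tau, t_direction), t_direction)
```

Here $S$ is invariant but $T$ is not, so the pair $(S, T)$ does not reduce this shear.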

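Projection Operators

We will have several uses for a special type of linear operator that is related to direct sums.

Definition Let $V = S \oplus T$. The linear operator $\rho_{S,T}\colon V \to V$ defined by

$$\rho_{S,T}(s + t) = s$$

where $s \in S$ and $t \in T$ is called projection onto $S$ along $T$.

Whenever we say that the operator $\rho_{S,T}$ is a projection, it is with the understanding that $V = S \oplus T$. The following theorem describes a few basic properties of projection operators. We leave proof as an exercise.

Theorem 2.21 Let $V$ be a vector space and let $\rho \in \mathcal{L}(V)$.
1) If $V = S \oplus T$ then
$$\rho_{S,T} + \rho_{T,S} = \iota$$
2) If $\rho = \rho_{S,T}$ then
$$\operatorname{im}(\rho) = S \quad\text{and}\quad \ker(\rho) = T$$
and so
$$V = \operatorname{im}(\rho) \oplus \ker(\rho)$$
In other words, $\rho$ is projection onto its image along its kernel. Moreover,
$$v \in \operatorname{im}(\rho) \iff \rho v = v$$
3) If $\sigma \in \mathcal{L}(V)$ has the property that
$$V = \operatorname{im}(\sigma) \oplus \ker(\sigma) \quad\text{and}\quad \sigma|_{\operatorname{im}(\sigma)} = \iota$$
then $\sigma$ is projection onto $\operatorname{im}(\sigma)$ along $\ker(\sigma)$.

Projection operators are easy to characterize.

Definition A linear operator $\rho \in \mathcal{L}(V)$ is idempotent if $\rho^2 = \rho$.

Theorem 2.22 A linear operator $\rho \in \mathcal{L}(V)$ is a projection if and only if it is idempotent.

Proof. If $\rho = \rho_{S,T}$, then for any $s \in S$ and $t \in T$,

$$\rho^2(s + t) = \rho s = s = \rho(s + t)$$

and so $\rho^2 = \rho$. Conversely, suppose that $\rho$ is idempotent. If $v \in \operatorname{im}(\rho) \cap \ker(\rho)$, then $v = \rho x$ and so

$$0 = \rho v = \rho^2 x = \rho x = v$$

Hence $\operatorname{im}(\rho) \cap \ker(\rho) = \{0\}$. Also, if $v \in V$, then

$$v = (v - \rho v) + \rho v \in \ker(\rho) \oplus \operatorname{im}(\rho)$$

and so $V = \ker(\rho) \oplus \operatorname{im}(\rho)$. Finally, $\rho(\rho x) = \rho^2 x = \rho x$ and so $\rho|_{\operatorname{im}(\rho)} = \iota$. Hence, $\rho$ is projection onto $\operatorname{im}(\rho)$ along $\ker(\rho)$.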
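The decomposition $v = (v - \rho v) + \rho v$ used in the proof can be traced on a small example. The following is a minimal sketch, assuming a hypothetical idempotent matrix on $\mathbb{R}^2$ and helper names chosen for illustration only.

```python
def mat_mul(a, b):
    """Product of two square matrices given as tuples of rows."""
    n = len(a)
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )


def apply(matrix, vector):
    """Apply a matrix (tuple of rows) to a vector."""
    return tuple(sum(a * x for a, x in zip(row, vector)) for row in matrix)


# Hypothetical idempotent operator rho(x, y) = (x + y, 0).
rho = ((1, 1), (0, 0))
is_idempotent = mat_mul(rho, rho) == rho


def split(vector):
    """Write vector = kernel_part + image_part, following the proof of Theorem 2.22."""
    image_part = apply(rho, vector)
    kernel_part = tuple(v - p for v, p in zip(vector, image_part))
    return kernel_part, image_part


kernel_part, image_part = split((3, 4))
```

For this choice of `rho`, the image is the $x$-axis and the kernel is the line spanned by $(1, -1)$, so `rho` is projection onto the $x$-axis along that line.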

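Projections and Invariance

Projections can be used to characterize invariant subspaces. Let $\tau \in \mathcal{L}(V)$ and let $S$ be a subspace of $V$. Let $\rho = \rho_{S,T}$ for any complement $T$ of $S$. The key is that the elements of $S$ can be characterized as those vectors fixed by $\rho$, that is, $s \in S$ if and only if $\rho s = s$. Hence, the following are equivalent:

$$\tau S \subseteq S$$
$$\tau s \in S \text{ for all } s \in S$$
$$\rho(\tau s) = \tau s \text{ for all } s \in S$$
$$(\rho\tau\rho)s = (\tau\rho)s \text{ for all } s \in S$$

Thus, $S$ is $\tau$-invariant if and only if $\rho\tau\rho s = \tau\rho s$ for all vectors $s \in S$. But this is also true for all vectors in $T$, since both sides are equal to $0$ on $T$. This proves the following theorem.

Theorem 2.23 Let $\tau \in \mathcal{L}(V)$. Then a subspace $S$ of $V$ is $\tau$-invariant if and only if there is a projection $\rho = \rho_{S,T}$ for which

$$\rho\tau\rho = \tau\rho$$

in which case this holds for all projections of the form $\rho = \rho_{S,T}$.

We also have the following relationship between projections and reducing pairs.

Theorem 2.24 Let $V = S \oplus T$. Then $(S, T)$ reduces $\tau \in \mathcal{L}(V)$ if and only if $\tau$ commutes with $\rho_{S,T}$.

Proof. Theorem 2.23 implies that $S$ and $T$ are $\tau$-invariant if and only if

$$\rho_{S,T}\,\tau\,\rho_{S,T} = \tau\,\rho_{S,T} \quad\text{and}\quad (\iota - \rho_{S,T})\,\tau\,(\iota - \rho_{S,T}) = \tau\,(\iota - \rho_{S,T})$$

and a little algebra shows that this is equivalent to

$$\rho_{S,T}\,\tau\,\rho_{S,T} = \tau\,\rho_{S,T} \quad\text{and}\quad \rho_{S,T}\,\tau = \rho_{S,T}\,\tau\,\rho_{S,T}$$

which is equivalent to $\rho_{S,T}\,\tau = \tau\,\rho_{S,T}$.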
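The two criteria, $\rho\tau\rho = \tau\rho$ for invariance of $S$ and $\rho\tau = \tau\rho$ for a reducing pair, can be compared on concrete matrices. This sketch assumes $V = \mathbb{R}^2$, with $S$ the $x$-axis and $T$ the $y$-axis; the two test operators and all function names are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square matrices given as tuples of rows."""
    n = len(a)
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )


# Projection onto S = x-axis along T = y-axis.
rho = ((1, 0), (0, 0))


def leaves_s_invariant(tau):
    """Criterion of Theorem 2.23: rho tau rho = tau rho."""
    return mat_mul(mat_mul(rho, tau), rho) == mat_mul(tau, rho)


def is_reduced_by_pair(tau):
    """Criterion of Theorem 2.24: tau commutes with rho."""
    return mat_mul(rho, tau) == mat_mul(tau, rho)


diagonal = ((2, 0), (0, 3))  # both axes invariant
shear = ((1, 1), (0, 1))     # x-axis invariant, y-axis not
```

The shear satisfies the first criterion but not the second, which matches the fact that only $S$ is invariant under it.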

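Orthogonal Projections and Resolutions of the Identity

Observe that if $\rho$ is a projection, then

$$\rho(\iota - \rho) = (\iota - \rho)\rho = 0$$

Definition Two projections $\rho, \sigma \in \mathcal{L}(V)$ are orthogonal, written $\rho \perp \sigma$, if $\rho\sigma = \sigma\rho = 0$.

Note that $\rho \perp \sigma$ if and only if

$$\operatorname{im}(\rho) \subseteq \ker(\sigma) \quad\text{and}\quad \operatorname{im}(\sigma) \subseteq \ker(\rho)$$

The following example shows that it is not enough to have $\rho\sigma = 0$ in the definition of orthogonality. In fact, it is possible for $\rho\sigma = 0$ and yet $\sigma\rho$ is not even a projection.

Example 2.5 Let $V = F^2$ and consider the $X$- and $Y$-axes and the diagonal:

$$X = \{(x, 0) \mid x \in F\}$$
$$Y = \{(0, y) \mid y \in F\}$$
$$D = \{(x, x) \mid x \in F\}$$

Then

$$\rho_{D,X}\,\rho_{D,Y} = \rho_{D,Y} \ne \rho_{D,X} = \rho_{D,Y}\,\rho_{D,X}$$

From this we deduce that if $\rho$ and $\sigma$ are projections, it may happen that both products $\rho\sigma$ and $\sigma\rho$ are projections, but that they are not equal. We leave it to the reader to show that $\rho_{Y,X}\,\rho_{X,D} = 0$ (which is a projection), but that $\rho_{X,D}\,\rho_{Y,X}$ is not a projection.

Since a projection $\rho$ is idempotent, we can write the identity operator as a sum of two orthogonal projections:

$$\rho + (\iota - \rho) = \iota, \qquad \rho \perp (\iota - \rho)$$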
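The claims of Example 2.5 can be verified by writing each projection as a $2 \times 2$ matrix acting on column vectors $(x, y)$. The sketch below uses integer entries, which is an assumption made for illustration; the matrix names are not from the text.

```python
def mat_mul(a, b):
    """Product of two square matrices given as tuples of rows."""
    n = len(a)
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )


rho_dx = ((0, 1), (0, 1))   # onto D along X: (x, y) -> (y, y)
rho_dy = ((1, 0), (1, 0))   # onto D along Y: (x, y) -> (x, x)
rho_yx = ((0, 0), (0, 1))   # onto Y along X: (x, y) -> (0, y)
rho_xd = ((1, -1), (0, 0))  # onto X along D: (x, y) -> (x - y, 0)
zero = ((0, 0), (0, 0))

# Both products of rho_dx and rho_dy are projections, but they differ.
first_product = mat_mul(rho_dx, rho_dy)
second_product = mat_mul(rho_dy, rho_dx)

# One product of rho_yx and rho_xd is zero; the other is not idempotent.
zero_product = mat_mul(rho_yx, rho_xd)
other_product = mat_mul(rho_xd, rho_yx)
other_is_idempotent = mat_mul(other_product, other_product) == other_product
```

The last product sends $(x, y)$ to $(-y, 0)$, a nonzero operator whose square is zero, so it cannot be idempotent.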

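Let us generalize this to more than two projections.

Definition A resolution of the identity on $V$ is a sum of the form

$$\rho_1 + \cdots + \rho_k = \iota$$

where the $\rho_i$'s are pairwise orthogonal projections, that is, $\rho_i \perp \rho_j$ for $i \ne j$.

There is a connection between the resolutions of the identity on $V$ and direct sum decompositions of $V$. In general terms, if

$$\sigma_1 + \cdots + \sigma_k = \iota$$

for any linear operators $\sigma_i \in \mathcal{L}(V)$, then for all $v \in V$,

$$v = \sigma_1 v + \cdots + \sigma_k v \in \operatorname{im}(\sigma_1) + \cdots + \operatorname{im}(\sigma_k)$$

and so

$$V = \operatorname{im}(\sigma_1) + \cdots + \operatorname{im}(\sigma_k)$$

However, the sum need not be direct.

Theorem 2.25 Let $V$ be a vector space. Resolutions of the identity on $V$ correspond to direct sum decompositions of $V$ as follows:
1) If $\rho_1 + \cdots + \rho_k = \iota$ is a resolution of the identity, then
$$V = \operatorname{im}(\rho_1) \oplus \cdots \oplus \operatorname{im}(\rho_k)$$
and $\rho_i$ is projection onto $\operatorname{im}(\rho_i)$ along
$$\ker(\rho_i) = \bigoplus_{j \ne i} \operatorname{im}(\rho_j)$$
2) Conversely, if $V = S_1 \oplus \cdots \oplus S_k$ and if $\rho_i$ is projection onto $S_i$ along the direct sum $\bigoplus_{j \ne i} S_j$, then $\rho_1 + \cdots + \rho_k = \iota$ is a resolution of the identity.

Proof. To prove 1), if $\rho_1 + \cdots + \rho_k = \iota$ is a resolution of the identity, then

$$V = \operatorname{im}(\rho_1) + \cdots + \operatorname{im}(\rho_k)$$

Moreover, if

$$\rho_1 x_1 + \cdots + \rho_k x_k = 0$$

then applying $\rho_i$ gives $\rho_i x_i = 0$ and so the sum is direct. As to the kernel of $\rho_i$, we have

$$\operatorname{im}(\rho_i) \oplus \ker(\rho_i) = V = \operatorname{im}(\rho_i) \oplus \Bigl(\bigoplus_{j \ne i} \operatorname{im}(\rho_j)\Bigr)$$

and since $\rho_i\rho_j = 0$ for $j \ne i$, it follows that

$$\bigoplus_{j \ne i} \operatorname{im}(\rho_j) \subseteq \ker(\rho_i)$$

and so equality must hold. For part 2), suppose that

$$V = S_1 \oplus \cdots \oplus S_k$$

and $\rho_i$ is projection onto $S_i$ along $\bigoplus_{j \ne i} S_j$. If $i \ne j$, then

$$\operatorname{im}(\rho_i) = S_i \subseteq \ker(\rho_j)$$

and so $\rho_i \perp \rho_j$. Also, if $v = s_1 + \cdots + s_k$ for $s_i \in S_i$, then

$$v = s_1 + \cdots + s_k = \rho_1 v + \cdots + \rho_k v = (\rho_1 + \cdots + \rho_k)v$$

and so $\iota = \rho_1 + \cdots + \rho_k$ is a resolution of the identity.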
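Part 2) of the theorem can be illustrated with a decomposition that is not along coordinate axes. The sketch assumes $V = \mathbb{R}^2 = S_1 \oplus S_2$ with $S_1$ the $x$-axis and $S_2$ the diagonal; the matrices and names are illustrative choices, not taken from the text.

```python
def mat_mul(a, b):
    """Product of two square matrices given as tuples of rows."""
    n = len(a)
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )


def mat_add(a, b):
    """Entrywise sum of two matrices given as tuples of rows."""
    return tuple(tuple(x + y for x, y in zip(ra, rb)) for ra, rb in zip(a, b))


rho_1 = ((1, -1), (0, 0))  # onto the x-axis along the diagonal
rho_2 = ((0, 1), (0, 1))   # onto the diagonal along the x-axis
identity = ((1, 0), (0, 1))
zero = ((0, 0), (0, 0))

sums_to_identity = mat_add(rho_1, rho_2) == identity
pairwise_orthogonal = (
    mat_mul(rho_1, rho_2) == zero and mat_mul(rho_2, rho_1) == zero
)
both_idempotent = (
    mat_mul(rho_1, rho_1) == rho_1 and mat_mul(rho_2, rho_2) == rho_2
)
```

All three conditions hold, so $\rho_1 + \rho_2 = \iota$ is a resolution of the identity for this decomposition.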

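The Algebra of Projections

If $\rho$ and $\sigma$ are projections, it does not necessarily follow that $\rho + \sigma$, $\rho - \sigma$ or $\rho\sigma$ is a projection. For example, the sum $\rho + \sigma$ is a projection if and only if

$$(\rho + \sigma)^2 = \rho + \sigma$$

which is equivalent to

$$\rho\sigma = -\sigma\rho$$

Of course, this holds if $\rho\sigma = \sigma\rho = 0$, that is, if $\rho \perp \sigma$. But the converse is also true, provided that $\operatorname{char}(F) \ne 2$. To see this, we simply evaluate $\rho\sigma\rho$ in two ways:

$$(\rho\sigma)\rho = -(\sigma\rho)\rho = -\sigma\rho$$

and

$$\rho(\sigma\rho) = -\rho(\rho\sigma) = -\rho\sigma$$

Hence, $\sigma\rho = \rho\sigma = -\sigma\rho$ and so $2\sigma\rho = 0$, which gives $\sigma\rho = 0$ since $\operatorname{char}(F) \ne 2$. It follows that $\rho\sigma = -\sigma\rho = 0$ and so $\rho \perp \sigma$. Thus, for $\operatorname{char}(F) \ne 2$, we have $\rho + \sigma$ is a projection if and only if $\rho \perp \sigma$.

Now suppose that $\rho + \sigma$ is a projection. For the kernel of $\rho + \sigma$, note that

$$(\rho + \sigma)v = 0 \;\Rightarrow\; \rho(\rho + \sigma)v = 0 \;\Rightarrow\; \rho v = 0$$

and similarly, $\sigma v = 0$. Hence, $\ker(\rho + \sigma) \subseteq \ker(\rho) \cap \ker(\sigma)$. But the reverse inclusion is obvious and so

$$\ker(\rho + \sigma) = \ker(\rho) \cap \ker(\sigma)$$

As to the image of $\rho + \sigma$, we have

$$v \in \operatorname{im}(\rho + \sigma) \;\Rightarrow\; v = (\rho + \sigma)v = \rho v + \sigma v \in \operatorname{im}(\rho) + \operatorname{im}(\sigma)$$

and so $\operatorname{im}(\rho + \sigma) \subseteq \operatorname{im}(\rho) + \operatorname{im}(\sigma)$. For the reverse inclusion, if $v = \rho x + \sigma y$, then

$$(\rho + \sigma)v = (\rho + \sigma)(\rho x + \sigma y) = \rho x + \sigma y = v$$

and so $v \in \operatorname{im}(\rho + \sigma)$. Thus, $\operatorname{im}(\rho + \sigma) = \operatorname{im}(\rho) + \operatorname{im}(\sigma)$. Finally, $\rho\sigma = 0$ implies that $\operatorname{im}(\sigma) \subseteq \ker(\rho)$ and so the sum is direct and

$$\operatorname{im}(\rho + \sigma) = \operatorname{im}(\rho) \oplus \operatorname{im}(\sigma)$$

The following theorem also describes the situation for the difference and product. Proof in these cases is left for the exercises.

Theorem 2.26 Let $V$ be a vector space over a field $F$ of characteristic $\ne 2$ and let $\rho$ and $\sigma$ be projections.
1) The sum $\rho + \sigma$ is a projection if and only if $\rho \perp \sigma$, in which case
$$\operatorname{im}(\rho + \sigma) = \operatorname{im}(\rho) \oplus \operatorname{im}(\sigma) \quad\text{and}\quad \ker(\rho + \sigma) = \ker(\rho) \cap \ker(\sigma)$$
2) The difference $\rho - \sigma$ is a projection if and only if
$$\rho\sigma = \sigma\rho = \sigma$$
in which case
$$\operatorname{im}(\rho - \sigma) = \operatorname{im}(\rho) \cap \ker(\sigma) \quad\text{and}\quad \ker(\rho - \sigma) = \ker(\rho) \oplus \operatorname{im}(\sigma)$$
3) If $\rho$ and $\sigma$ commute, then $\rho\sigma$ is a projection, in which case
$$\operatorname{im}(\rho\sigma) = \operatorname{im}(\rho) \cap \operatorname{im}(\sigma) \quad\text{and}\quad \ker(\rho\sigma) = \ker(\rho) + \ker(\sigma)$$
(Example 2.5 shows that the converse may be false.)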

Topological Vector Spaces This section is for readers with some familiarity with point-set topology.

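The Definition

A pair $(V, \mathcal{T})$ where $V$ is a real vector space and $\mathcal{T}$ is a topology on the set $V$ is called a topological vector space if the operations of addition

$$\mathcal{A}\colon V \times V \to V, \qquad \mathcal{A}(v, w) = v + w$$

and scalar multiplication

$$\mathcal{M}\colon \mathbb{R} \times V \to V, \qquad \mathcal{M}(r, v) = rv$$

are continuous functions.

The Standard Topology on $\mathbb{R}^n$

The vector space $\mathbb{R}^n$ is a topological vector space under the standard topology, which is the topology for which the set of open rectangles

$$\mathcal{B} = \{I_1 \times \cdots \times I_n \mid I_i\text{'s are open intervals in } \mathbb{R}\}$$

is a base, that is, a subset of $\mathbb{R}^n$ is open if and only if it is a union of open rectangles. The standard topology is also the topology induced by the Euclidean metric on $\mathbb{R}^n$, since an open rectangle is the union of Euclidean open balls and an open ball is the union of open rectangles.

The standard topology on $\mathbb{R}^n$ has the property that the addition function

$$\mathcal{A}\colon \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n\colon (v, w) \mapsto v + w$$

and the scalar multiplication function

$$\mathcal{M}\colon \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n\colon (r, v) \mapsto rv$$

are continuous and so $\mathbb{R}^n$ is a topological vector space under this topology. Also, the linear functionals $f\colon \mathbb{R}^n \to \mathbb{R}$ are continuous maps.

For example, to see that addition is continuous, if

$$(u_1, \dots, u_n) + (v_1, \dots, v_n) \in (a_1, b_1) \times \cdots \times (a_n, b_n) \in \mathcal{B}$$

then $u_i + v_i \in (a_i, b_i)$ and so there is an $\epsilon > 0$ for which

$$(u_i - \epsilon, u_i + \epsilon) + (v_i - \epsilon, v_i + \epsilon) \subseteq (a_i, b_i)$$

for all $i$. It follows that if

$$(u_1, \dots, u_n) \in I := (u_1 - \epsilon, u_1 + \epsilon) \times \cdots \times (u_n - \epsilon, u_n + \epsilon) \in \mathcal{B}$$

and

$$(v_1, \dots, v_n) \in J := (v_1 - \epsilon, v_1 + \epsilon) \times \cdots \times (v_n - \epsilon, v_n + \epsilon) \in \mathcal{B}$$

then

$$(u_1, \dots, u_n) + (v_1, \dots, v_n) \in \mathcal{A}(I, J) \subseteq (a_1, b_1) \times \cdots \times (a_n, b_n)$$

The Natural Topology on $V$

Now let $V$ be a real vector space of dimension $n$ and fix an ordered basis $\mathcal{B} = (v_1, \dots, v_n)$ for $V$. We wish to show that there is precisely one topology $\mathcal{T}$ on $V$ for which $(V, \mathcal{T})$ is a topological vector space and all linear functionals are continuous. This topology is called the natural topology on $V$.

Our plan is to show that if $(V, \mathcal{T})$ is a topological vector space and if all linear functionals on $V$ are continuous, then the coordinate map $\phi_{\mathcal{B}}\colon V \approx \mathbb{R}^n$ is a homeomorphism. This implies that if $\mathcal{T}$ does exist, it must be unique. Then we use $\psi = \phi_{\mathcal{B}}^{-1}$ to move the standard topology from $\mathbb{R}^n$ to $V$, thus giving $V$ a topology $\mathcal{T}$ for which $\phi_{\mathcal{B}}$ is a homeomorphism. Finally, we show that $(V, \mathcal{T})$ is a topological vector space and that all linear functionals on $V$ are continuous.

The first step is to show that if $(V, \mathcal{T})$ is a topological vector space, then $\psi$ is continuous. Since $\psi = \sum \psi_i$ where $\psi_i\colon \mathbb{R}^n \to V$ is defined by

$$\psi_i(r_1, \dots, r_n) = r_i v_i$$

it is sufficient to show that these maps are continuous. (The sum of continuous maps is continuous.) Let $O$ be an open set in $\mathcal{T}$. Then

$$\mathcal{M}^{-1}(O) = \{(r, x) \in \mathbb{R} \times V \mid rx \in O\}$$

is open in $\mathbb{R} \times V$. This implies that if $rx \in O$, then there is an open interval $I \subseteq \mathbb{R}$ containing $r$ for which

$$Ix = \{sx \mid s \in I\} \subseteq O$$

We need to show that the set $\psi_i^{-1}(O)$ is open. But

$$\psi_i^{-1}(O) = \{(r_1, \dots, r_n) \in \mathbb{R}^n \mid r_i v_i \in O\} = \mathbb{R} \times \cdots \times \mathbb{R} \times \{r_i \in \mathbb{R} \mid r_i v_i \in O\} \times \mathbb{R} \times \cdots \times \mathbb{R}$$

In words, an $n$-tuple $(r_1, \dots, r_n)$ is in $\psi_i^{-1}(O)$ if the $i$th coordinate $r_i$ times $v_i$ is in $O$. But if $r_i v_i \in O$, then there is an open interval $I \subseteq \mathbb{R}$ for which $r_i \in I$ and $I v_i \subseteq O$. Hence, the entire open set

$$U = \mathbb{R} \times \cdots \times \mathbb{R} \times I \times \mathbb{R} \times \cdots \times \mathbb{R}$$

where the factor $I$ is in the $i$th position is in $\psi_i^{-1}(O)$, that is,

$$(r_1, \dots, r_n) \in U \subseteq \psi_i^{-1}(O)$$

Thus, $\psi_i^{-1}(O)$ is open and $\psi_i$, and therefore also $\psi$, is continuous.

Next we show that if every linear functional on $V$ is continuous under a topology $\mathcal{T}$ on $V$, then the coordinate map $\phi = \phi_{\mathcal{B}}$ is continuous. If $v \in V$ denote by $[v]_{\mathcal{B},i}$ the $i$th coordinate of $[v]_{\mathcal{B}}$. The map $\mu_i\colon V \to \mathbb{R}$ defined by $\mu_i v = [v]_{\mathcal{B},i}$ is a linear functional and so is continuous by assumption. Hence, for any open interval $I_i \subseteq \mathbb{R}$ the set

$$A_i = \{v \in V \mid [v]_{\mathcal{B},i} \in I_i\}$$

is open. Now, if $I_i$ are open intervals in $\mathbb{R}$, then

$$\phi^{-1}(I_1 \times \cdots \times I_n) = \{v \in V \mid [v]_{\mathcal{B}} \in I_1 \times \cdots \times I_n\} = \bigcap A_i$$

is open. Thus, $\phi$ is continuous.

We have shown that if a topology $\mathcal{T}$ has the property that $(V, \mathcal{T})$ is a topological vector space under which every linear functional is continuous, then $\phi$ and $\psi = \phi^{-1}$ are homeomorphisms. This means that if $\mathcal{T}$ exists, its open sets must be the images under $\psi$ of the open sets in the standard topology of $\mathbb{R}^n$.

It remains to prove that the topology $\mathcal{T}$ on $V$ that makes $\phi$ a homeomorphism makes $(V, \mathcal{T})$ a topological vector space for which any linear functional $f$ on $V$ is continuous.

The addition map on $V$ is a composition

$$\mathcal{A} = \phi^{-1} \circ \mathcal{A}' \circ (\phi \times \phi)$$

where $\mathcal{A}'\colon \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ is addition in $\mathbb{R}^n$ and since each of the maps on the right is continuous, so is $\mathcal{A}$. Similarly, scalar multiplication in $V$ is

$$\mathcal{M} = \phi^{-1} \circ \mathcal{M}' \circ (\iota \times \phi)$$

where $\mathcal{M}'\colon \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$ is scalar multiplication in $\mathbb{R}^n$. Hence, $\mathcal{M}$ is continuous.

Now let $f$ be a linear functional. Since $f$ is continuous if and only if $f \circ \phi^{-1}$ is continuous, we can confine attention to $V = \mathbb{R}^n$. In this case, if $e_1, \dots, e_n$ is the standard basis for $\mathbb{R}^n$ and $|f(e_i)| \le M$ for all $i$, then for any $x = (r_1, \dots, r_n) \in \mathbb{R}^n$, we have

$$|f(x)| = \Bigl|\sum r_i f(e_i)\Bigr| \le \sum |r_i|\,|f(e_i)| \le M \sum |r_i|$$

Now, if $|x| < \epsilon/(Mn)$, then $|r_i| < \epsilon/(Mn)$ and so $|f(x)| < \epsilon$, which implies that $f$ is continuous at $x = 0$.

According to the Riesz representation theorem (Theorem 9.18) and the Cauchy–Schwarz inequality, we have

$$\|f(x)\| \le \|R_f\|\,\|x\|$$

where $R_f \in \mathbb{R}^n$. Hence, $x_n \to 0$ implies $f(x_n) \to 0$ and so by linearity, $x_n \to x$ implies $f(x_n) \to f(x)$ and so $f$ is continuous at all $x$.

Theorem 2.27 Let $V$ be a real vector space of dimension $n$. There is a unique topology on $V$, called the natural topology, for which $V$ is a topological vector space and for which all linear functionals on $V$ are continuous. This topology is determined by the fact that the coordinate map $\phi\colon V \to \mathbb{R}^n$ is a homeomorphism, where $\mathbb{R}^n$ has the standard topology induced by the Euclidean metric.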

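Linear Operators on $V^{\mathbb{C}}$

A linear operator $\tau$ on a real vector space $V$ can be extended to a linear operator $\tau^{\mathbb{C}}$ on the complexification $V^{\mathbb{C}}$ by defining

$$\tau^{\mathbb{C}}(u + iv) = \tau(u) + i\tau(v)$$

Here are the basic properties of this complexification of $\tau$.

Theorem 2.28 If $\tau, \sigma \in \mathcal{L}(V)$, then
1) $(a\tau)^{\mathbb{C}} = a\tau^{\mathbb{C}}$, $a \in \mathbb{R}$
2) $(\tau + \sigma)^{\mathbb{C}} = \tau^{\mathbb{C}} + \sigma^{\mathbb{C}}$
3) $(\tau\sigma)^{\mathbb{C}} = \tau^{\mathbb{C}}\sigma^{\mathbb{C}}$
4) $[\tau v]^{\mathbb{C}} = \tau^{\mathbb{C}}(v^{\mathbb{C}})$.

Let us recall that for any ordered basis $\mathcal{B}$ for $V$ and any vector $v \in V$ we have

$$[v + 0i]_{\operatorname{cpx}(\mathcal{B})} = [v]_{\mathcal{B}}$$

Now, if $\mathcal{B} = (b_1, \dots, b_n)$ is an ordered basis for $V$, then the $i$th column of $[\tau]_{\mathcal{B}}$ is

$$[\tau b_i]_{\mathcal{B}} = [\tau b_i + 0i]_{\operatorname{cpx}(\mathcal{B})} = [\tau^{\mathbb{C}}(b_i + 0i)]_{\operatorname{cpx}(\mathcal{B})}$$

which is the $i$th column of the coordinate matrix of $\tau^{\mathbb{C}}$ with respect to the basis $\operatorname{cpx}(\mathcal{B})$. Thus we have the following theorem.

Theorem 2.29 Let $\tau \in \mathcal{L}(V)$ where $V$ is a real vector space. The matrix of $\tau^{\mathbb{C}}$ with respect to the ordered basis $\operatorname{cpx}(\mathcal{B})$ is equal to the matrix of $\tau$ with respect to the ordered basis $\mathcal{B}$:

$$[\tau^{\mathbb{C}}]_{\operatorname{cpx}(\mathcal{B})} = [\tau]_{\mathcal{B}}$$

Hence, if a real matrix $A$ represents a linear operator $\tau$ on $V$, then $A$ also represents the complexification $\tau^{\mathbb{C}}$ of $\tau$ on $V^{\mathbb{C}}$.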
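The statement that the same real matrix represents both $\tau$ and $\tau^{\mathbb{C}}$ can be observed directly for $V = \mathbb{R}^2$ with the standard basis. This is a minimal sketch, assuming a hypothetical matrix and vectors, and using Python's built-in complex numbers to model $u + iv$.

```python
def apply(matrix, vector):
    """Apply a matrix (tuple of rows) to a vector with real or complex entries."""
    return tuple(sum(a * x for a, x in zip(row, vector)) for row in matrix)


# Hypothetical real operator on R^2 and a pair of real vectors u, v.
tau = ((1, 2), (3, 4))
u = (1, -1)
v = (2, 5)

# The vector u + iv in the complexification, written in the basis cpx(B).
complex_vector = tuple(complex(a, b) for a, b in zip(u, v))

# Left side: the real matrix applied to the complex coordinates.
matrix_side = apply(tau, complex_vector)

# Right side: the definition tau^C(u + iv) = tau(u) + i tau(v).
definition_side = tuple(
    complex(a, b) for a, b in zip(apply(tau, u), apply(tau, v))
)
```

Both sides agree, which is the content of Theorem 2.29 for this particular matrix.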

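Exercises

1. Let $A \in \mathcal{M}_{m,n}$ have rank $k$. Prove that there are matrices $X \in \mathcal{M}_{m,k}$ and $Y \in \mathcal{M}_{k,n}$, both of rank $k$, for which $A = XY$. Prove that $A$ has rank $1$ if and only if it has the form $A = x^t y$ where $x$ and $y$ are row matrices.
2. Prove Corollary 2.9 and find an example to show that the corollary does not hold without the finiteness condition.
3. Let $\tau \in \mathcal{L}(V, W)$. Prove that $\tau$ is an isomorphism if and only if it carries a basis for $V$ to a basis for $W$.
4. If $\tau \in \mathcal{L}(V_1, W_1)$ and $\sigma \in \mathcal{L}(V_2, W_2)$ we define the external direct sum $\tau \boxplus \sigma \in \mathcal{L}(V_1 \boxplus V_2, W_1 \boxplus W_2)$ by
$$(\tau \boxplus \sigma)((v_1, v_2)) = (\tau v_1, \sigma v_2)$$
Show that $\tau \boxplus \sigma$ is a linear transformation.
5. Let $V = S \oplus T$. Prove that $S \oplus T \approx S \boxplus T$. Thus, internal and external direct sums are equivalent up to isomorphism.
6. Let $V = A + B$ and consider the external direct sum $E = A \boxplus B$. Define a map $\tau\colon A \boxplus B \to V$ by $\tau(v, w) = v + w$. Show that $\tau$ is linear. What is the kernel of $\tau$? When is $\tau$ an isomorphism?
7. Let $\tau \in \mathcal{L}_F(V)$ where $\dim(V) = n < \infty$. Let $A \in \mathcal{M}_n(F)$. Suppose that there is an isomorphism $\sigma\colon V \approx F^n$ with the property that $\sigma(\tau v) = A(\sigma v)$. Prove that there is an ordered basis $\mathcal{B}$ for which $A = [\tau]_{\mathcal{B}}$.
8. Let $\mathcal{T}$ be a subset of $\mathcal{L}(V)$. A subspace $S$ of $V$ is $\mathcal{T}$-invariant if $S$ is $\tau$-invariant for every $\tau \in \mathcal{T}$. Also, $V$ is $\mathcal{T}$-irreducible if the only $\mathcal{T}$-invariant subspaces of $V$ are $\{0\}$ and $V$. Prove the following form of Schur's lemma. Suppose that $\mathcal{T}_V \subseteq \mathcal{L}(V)$ and $\mathcal{T}_W \subseteq \mathcal{L}(W)$ and $V$ is $\mathcal{T}_V$-irreducible and $W$ is $\mathcal{T}_W$-irreducible. Let $\alpha \in \mathcal{L}(V, W)$ satisfy $\alpha\mathcal{T}_V = \mathcal{T}_W\alpha$, that is, for any $\mu \in \mathcal{T}_V$ there is a $\lambda \in \mathcal{T}_W$ such that $\alpha\mu = \lambda\alpha$ and for any $\lambda \in \mathcal{T}_W$ there is a $\mu \in \mathcal{T}_V$ such that $\alpha\mu = \lambda\alpha$. Prove that $\alpha = 0$ or $\alpha$ is an isomorphism.
9. Let $\tau \in \mathcal{L}(V)$ where $\dim(V) < \infty$. If $\operatorname{rk}(\tau^2) = \operatorname{rk}(\tau)$ show that $\operatorname{im}(\tau) \cap \ker(\tau) = \{0\}$.
10. Let $\tau \in \mathcal{L}(U, V)$ and $\sigma \in \mathcal{L}(V, W)$. Show that
$$\operatorname{rk}(\sigma\tau) \le \min\{\operatorname{rk}(\tau), \operatorname{rk}(\sigma)\}$$
11. Let $\tau \in \mathcal{L}(U, V)$ and $\sigma \in \mathcal{L}(V, W)$. Show that
$$\operatorname{null}(\sigma\tau) \le \operatorname{null}(\tau) + \operatorname{null}(\sigma)$$
12. Let $\tau, \sigma \in \mathcal{L}(V)$ where $\tau$ is invertible. Show that
$$\operatorname{rk}(\tau\sigma) = \operatorname{rk}(\sigma\tau) = \operatorname{rk}(\sigma)$$
13. Let $\tau, \sigma \in \mathcal{L}(V, W)$. Show that
$$\operatorname{rk}(\tau + \sigma) \le \operatorname{rk}(\tau) + \operatorname{rk}(\sigma)$$
14. Let $S$ be a subspace of $V$. Show that there is a $\tau \in \mathcal{L}(V)$ for which $\ker(\tau) = S$. Show also that there exists a $\sigma \in \mathcal{L}(V)$ for which $\operatorname{im}(\sigma) = S$.
15. Suppose that $\tau, \sigma \in \mathcal{L}(V)$.
a) Show that $\sigma = \tau\mu$ for some $\mu \in \mathcal{L}(V)$ if and only if $\operatorname{im}(\sigma) \subseteq \operatorname{im}(\tau)$.
b) Show that $\sigma = \mu\tau$ for some $\mu \in \mathcal{L}(V)$ if and only if $\ker(\tau) \subseteq \ker(\sigma)$.
16. Let $\dim(V) < \infty$ and suppose that $\tau \in \mathcal{L}(V)$ satisfies $\tau^2 = 0$. Show that $2\operatorname{rk}(\tau) \le \dim(V)$.
17. Let $A$ be an $m \times n$ matrix over $F$. What is the relationship between the linear transformation $\tau_A\colon F^n \to F^m$ and the system of equations $AX = B$? Use your knowledge of linear transformations to state and prove various results concerning the system $AX = B$, especially when $B = 0$.
18. Let $V$ have basis $\mathcal{B} = \{v_1, \dots, v_n\}$ and assume that the base field $F$ for $V$ has characteristic $0$. Suppose that for each $i, j$ we define $\tau_{i,j} \in \mathcal{L}(V)$ by
$$\tau_{i,j}(v_k) = \begin{cases} v_k & \text{if } k \ne i \\ v_i + v_j & \text{if } k = i \end{cases}$$
Prove that the $\tau_{i,j}$ are invertible and form a basis for $\mathcal{L}(V)$.
19. Let $\tau \in \mathcal{L}(V)$. If $S$ is a $\tau$-invariant subspace of $V$ must there be a subspace $T$ of $V$ for which $(S, T)$ reduces $\tau$?
20. Find an example of a vector space $V$ and a proper subspace $S$ of $V$ for which $V \approx S$.
21. Let $\dim(V) < \infty$. If $\tau, \sigma \in \mathcal{L}(V)$ prove that $\sigma\tau = \iota$ implies that $\tau$ and $\sigma$ are invertible and that $\sigma = p(\tau)$ for some polynomial $p(x) \in F[x]$.
22. Let $\tau \in \mathcal{L}(V)$. If $\tau\sigma = \sigma\tau$ for all $\sigma \in \mathcal{L}(V)$ show that $\tau = a\iota$, for some $a \in F$, where $\iota$ is the identity map.
23. Let $V$ be a vector space over a field $F$ of characteristic $\ne 2$ and let $\rho$ and $\sigma$ be projections. Prove the following:
a) The difference $\rho - \sigma$ is a projection if and only if
$$\rho\sigma = \sigma\rho = \sigma$$
in which case
$$\operatorname{im}(\rho - \sigma) = \operatorname{im}(\rho) \cap \ker(\sigma) \quad\text{and}\quad \ker(\rho - \sigma) = \ker(\rho) \oplus \operatorname{im}(\sigma)$$
Hint: $\rho$ is a projection if and only if $\iota - \rho$ is a projection and so $\rho - \sigma$ is a projection if and only if
$$\tau = \iota - (\rho - \sigma) = (\iota - \rho) + \sigma$$
is a projection.
b) If $\rho$ and $\sigma$ commute, then $\rho\sigma$ is a projection, in which case
$$\operatorname{im}(\rho\sigma) = \operatorname{im}(\rho) \cap \operatorname{im}(\sigma) \quad\text{and}\quad \ker(\rho\sigma) = \ker(\rho) + \ker(\sigma)$$
24. Let $f\colon \mathbb{R}^n \to \mathbb{R}$ be a continuous function with the property that
$$f(x + y) = f(x) + f(y)$$
Prove that $f$ is a linear functional on $\mathbb{R}^n$.
25. Prove that any linear functional $f\colon \mathbb{R}^n \to \mathbb{R}$ is a continuous map.
26. Prove that any subspace $S$ of $\mathbb{R}^n$ is a closed set or, equivalently, that $S^c = \mathbb{R}^n \setminus S$ is open, that is, for any $x \in S^c$ there is an open ball $B(x, \epsilon)$ centered at $x$ with radius $\epsilon > 0$ for which $B(x, \epsilon) \subseteq S^c$.
27. Prove that any linear transformation $\tau\colon V \to W$ is continuous under the natural topologies of $V$ and $W$.
28. Prove that any surjective linear transformation $\tau$ from $V$ to $W$ (both finite-dimensional topological vector spaces under the natural topology) is an open map, that is, $\tau$ maps open sets to open sets.
29. Prove that any subspace $S$ of a finite-dimensional vector space $V$ is a closed set or, equivalently, that $S^c$ is open, that is, for any $x \in S^c$ there is an open ball $B(x, \epsilon)$ centered at $x$ with radius $\epsilon > 0$ for which $B(x, \epsilon) \subseteq S^c$.
30. Let $S$ be a subspace of $V$ with $\dim(V) < \infty$.
a) Show that the subspace topology on $S$ inherited from $V$ is the natural topology.
b) Show that the natural topology on $V/S$ is the topology for which the natural projection map $\pi\colon V \to V/S$ is continuous and open.
31. If $V$ is a real vector space, then $V^{\mathbb{C}}$ is a complex vector space. Thinking of $V^{\mathbb{C}}$ as a vector space $(V^{\mathbb{C}})_{\mathbb{R}}$ over $\mathbb{R}$, show that $(V^{\mathbb{C}})_{\mathbb{R}}$ is isomorphic to the external direct product $V \boxplus V$.
32. (When is a complex linear map a complexification?) Let $V$ be a real vector space with complexification $V^{\mathbb{C}}$ and let $\sigma \in \mathcal{L}(V^{\mathbb{C}})$. Prove that $\sigma$ is a complexification, that is, $\sigma$ has the form $\tau^{\mathbb{C}}$ for some $\tau \in \mathcal{L}(V)$ if and only if $\sigma$ commutes with the conjugate map $\chi\colon V^{\mathbb{C}} \to V^{\mathbb{C}}$ defined by $\chi(u + iv) = u - iv$.
33. Let $W$ be a complex vector space.
a) Consider replacing the scalar multiplication on $W$ by the operation
$$(z, w) \mapsto \bar{z}w$$
where $z \in \mathbb{C}$ and $w \in W$. Show that the resulting set with the addition defined for the vector space $W$ and with this scalar multiplication is a complex vector space, which we denote by $\overline{W}$.
b) Show, without using dimension arguments, that $(W_{\mathbb{R}})^{\mathbb{C}} \approx W \boxplus \overline{W}$.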

Chapter 3

The Isomorphism Theorems

Quotient Spaces

Let $S$ be a subspace of a vector space $V$. It is easy to see that the binary relation on $V$ defined by

$$u \equiv v \iff u - v \in S$$

is an equivalence relation. When $u \equiv v$, we say that $u$ and $v$ are congruent modulo $S$. The term mod is used as a colloquialism for modulo and $u \equiv v$ is often written

$$u \equiv v \bmod S$$

When the subspace in question is clear, we will simply write $u \equiv v$.

To see what the equivalence classes look like, observe that

$$[v] = \{u \in V \mid u \equiv v\} = \{u \in V \mid u - v \in S\} = \{u \in V \mid u = v + s \text{ for some } s \in S\} = \{v + s \mid s \in S\} = v + S$$

The set

$$[v] = v + S = \{v + s \mid s \in S\}$$

is called a coset of $S$ in $V$ and $v$ is called a coset representative for $v + S$. (Thus, any member of a coset is a coset representative.)

The set of all cosets of $S$ in $V$ is denoted by

$$V/S = \{v + S \mid v \in V\}$$

This is read "$V$ mod $S$" and is called the quotient space of $V$ modulo $S$. Of course, the term space is a hint that we intend to define vector space operations on $V/S$.

The natural choice for these vector space operations is

$$(u + S) + (v + S) = (u + v) + S$$

and

$$r(u + S) = (ru) + S$$

but we must check that these operations are well-defined, that is,
1) $u_1 + S = u_2 + S,\ v_1 + S = v_2 + S \;\Rightarrow\; (u_1 + v_1) + S = (u_2 + v_2) + S$
2) $u_1 + S = u_2 + S \;\Rightarrow\; ru_1 + S = ru_2 + S$

Equivalently, the equivalence relation $\equiv$ must be consistent with the vector space operations on $V$, that is,
3) $u_1 \equiv u_2,\ v_1 \equiv v_2 \;\Rightarrow\; (u_1 + v_1) \equiv (u_2 + v_2)$
4) $u_1 \equiv u_2 \;\Rightarrow\; ru_1 \equiv ru_2$

This scenario is a recurring one in algebra. An equivalence relation on an algebraic structure, such as a group, ring, module or vector space is called a congruence relation if it preserves the algebraic operations. In the case of a vector space, these are conditions 3) and 4) above.

These conditions follow easily from the fact that $S$ is a subspace, for if $u_1 \equiv u_2$ and $v_1 \equiv v_2$, then

$$u_1 - u_2 \in S,\ v_1 - v_2 \in S \;\Rightarrow\; r(u_1 - u_2) + s(v_1 - v_2) \in S \;\Rightarrow\; (ru_1 + sv_1) - (ru_2 + sv_2) \in S \;\Rightarrow\; ru_1 + sv_1 \equiv ru_2 + sv_2$$

which verifies both conditions at once. We leave it to the reader to verify that $V/S$ is indeed a vector space over $F$ under these well-defined operations.

Actually, we are lucky here: For any subspace $S$ of $V$, the quotient $V/S$ is a vector space under the natural operations. In the case of groups, not all subgroups have this property. Indeed, it is precisely the normal subgroups $N$ of $G$ that have the property that the quotient $G/N$ is a group. Also, for rings, it is precisely the ideals (not the subrings) that have the property that the quotient is a ring.

Let us summarize.

Theorem 3.1 Let $S$ be a subspace of $V$. The binary relation

$$u \equiv v \iff u - v \in S$$

is an equivalence relation on $V$, whose equivalence classes are the cosets

$$v + S = \{v + s \mid s \in S\}$$

of $S$ in $V$. The set $V/S$ of all cosets of $S$ in $V$, called the quotient space of $V$ modulo $S$, is a vector space under the well-defined operations

$$r(u + S) = ru + S$$
$$(u + S) + (v + S) = (u + v) + S$$

The zero vector in $V/S$ is the coset $0 + S = S$.
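Because every coset can be listed when the field is finite, the well-definedness of the quotient operations can be checked exhaustively in a small case. The sketch below assumes $V = (\mathbb{Z}_3)^2$ and $S = \operatorname{span}\{(1, 1)\}$, choices made purely for illustration.

```python
from itertools import product

P = 3  # work over the field Z_3 (an illustrative assumption)
V = list(product(range(P), repeat=2))
S = frozenset((k, k) for k in range(P))  # the subspace spanned by (1, 1)


def add(u, v):
    """Vector addition in (Z_3)^2."""
    return ((u[0] + v[0]) % P, (u[1] + v[1]) % P)


def scale(r, u):
    """Scalar multiplication in (Z_3)^2."""
    return ((r * u[0]) % P, (r * u[1]) % P)


def coset(v):
    """The coset v + S as a frozenset of vectors."""
    return frozenset(add(v, s) for s in S)


cosets = {coset(v) for v in V}

# (u + S) + (v + S) must not depend on the chosen representatives.
addition_well_defined = all(
    coset(add(u1, v1)) == coset(add(u2, v2))
    for u1 in V for u2 in coset(u1)
    for v1 in V for v2 in coset(v1)
)

# r(u + S) must not depend on the chosen representative either.
scaling_well_defined = all(
    coset(scale(r, u1)) == coset(scale(r, u2))
    for r in range(P) for u1 in V for u2 in coset(u1)
)
```

There are three cosets here, each with three vectors, and the coset of the zero vector is $S$ itself.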

The Natural Projection and the Correspondence Theorem

If $S$ is a subspace of $V$, then we can define a map $\pi_S\colon V \to V/S$ by sending each vector to the coset containing it:

$$\pi_S(v) = v + S$$

This map is called the canonical projection or natural projection of $V$ onto $V/S$, or simply projection modulo $S$. (Not to be confused with the projection operators $\rho_{S,T}$.) It is easily seen to be linear, for we have (writing $\pi$ for $\pi_S$)

$$\pi(ru + sv) = (ru + sv) + S = r(u + S) + s(v + S) = r\pi(u) + s\pi(v)$$

The canonical projection is clearly surjective. To determine the kernel of $\pi$, note that

$$v \in \ker(\pi) \iff \pi(v) = 0 \iff v + S = S \iff v \in S$$

and so

$$\ker(\pi) = S$$

Theorem 3.2 The canonical projection $\pi_S\colon V \to V/S$ defined by

$$\pi_S(v) = v + S$$

is a surjective linear transformation with $\ker(\pi_S) = S$.

If $S$ is a subspace of $V$, then the subspaces of the quotient space $V/S$ have the form $T/S$ for some intermediate subspace $T$ satisfying $S \subseteq T \subseteq V$. In fact, as shown in Figure 3.1, the projection map $\pi_S$ provides a one-to-one correspondence between intermediate subspaces $S \subseteq T \subseteq V$ and subspaces of the quotient space $V/S$. The proof of the following theorem is left as an exercise.

Figure 3.1: The correspondence theorem

Theorem 3.3 (The correspondence theorem) Let $S$ be a subspace of $V$. Then the function that assigns to each intermediate subspace $S \subseteq T \subseteq V$ the subspace $T/S$ of $V/S$ is an order-preserving (with respect to set inclusion) one-to-one correspondence between the set of all subspaces of $V$ containing $S$ and the set of all subspaces of $V/S$.

Proof. We prove only that the correspondence is surjective. Let

$$X = \{u + S \mid u \in U\}$$

be a subspace of $V/S$ and let $T$ be the union of all cosets in $X$:

$$T = \bigcup_{u \in U} (u + S)$$
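$$\det(xI - C[p_2(x)]) = \det\begin{bmatrix} x & a_0 \\ -1 & x + a_1 \end{bmatrix} = p_2(x; a_0, a_1)$$

So, let us define

$$A(x; a_0, \dots, a_{n-1}) = xI - C[p_n(x)] = \begin{bmatrix} x & 0 & \cdots & 0 & a_0 \\ -1 & x & \cdots & 0 & a_1 \\ 0 & -1 & \ddots & \vdots & \vdots \\ \vdots & \vdots & \ddots & x & a_{n-2} \\ 0 & 0 & \cdots & -1 & x + a_{n-1} \end{bmatrix}$$

where $x$ is an independent variable. The determinant of this matrix is a polynomial in $x$ whose degree equals the number of parameters $a_0, \dots, a_{n-1}$. We have just seen that

$$\det(A(x; a_0, a_1)) = p_2(x; a_0, a_1)$$

and this is also true for $n = 1$. As a basis for induction, if

$$\det(A(x; a_0, \dots, a_{n-1})) = p_n(x; a_0, \dots, a_{n-1})$$

then expanding along the first row gives

$$\begin{aligned}
\det(A(x; a_0, \dots, a_n)) &= x\det(A(x; a_1, \dots, a_n)) + (-1)^n a_0 \det\begin{bmatrix} -1 & x & \cdots & 0 \\ 0 & -1 & \ddots & \vdots \\ \vdots & & \ddots & x \\ 0 & \cdots & 0 & -1 \end{bmatrix}_{n \times n} \\
&= x\det(A(x; a_1, \dots, a_n)) + a_0 \\
&= x\,p_n(x; a_1, \dots, a_n) + a_0 \\
&= a_1 x + a_2 x^2 + \cdots + a_n x^n + x^{n+1} + a_0 \\
&= p_{n+1}(x; a_0, \dots, a_n)
\end{aligned}$$

We have proved the following.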


Lemma 7.17 For any $p(x) \in F[x]$,

$$\det(xI - C[p(x)]) = p(x)$$

Now suppose that $R$ is a matrix in the elementary divisor form of rational canonical form. Since the determinant of a block diagonal matrix is the product of the determinants of the blocks on the diagonal, it follows that

$$\det(xI - R) = \prod_{i,j} p_i^{e_{i,j}}(x) = c_R(x)$$

Moreover, if $A \sim R$, say $A = PRP^{-1}$, then

$$\det(xI - A) = \det(xI - PRP^{-1}) = \det[P(xI - R)P^{-1}] = \det(P)\det(xI - R)\det(P^{-1}) = \det(xI - R)$$

and so

$$\det(xI - A) = \det(xI - R) = c_R(x) = c_A(x)$$

Hence, the fact that all matrices have a rational canonical form allows us to deduce the following theorem.

Theorem 7.18 Let $\tau \in \mathcal{L}(V)$. If $A$ is any matrix that represents $\tau$, then

$$c_\tau(x) = c_A(x) = \det(xI - A)$$
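Lemma 7.17 can be checked on a specific monic polynomial by computing $\det(xI - C[p(x)])$ exactly. The sketch below is not the inductive argument of the text: it swaps in the Faddeev–LeVerrier recursion to obtain the coefficients, which requires dividing by $1, \dots, n$ and therefore assumes characteristic $0$. The companion matrix layout (ones on the subdiagonal, $-a_i$ in the last column) is assumed to match the matrix $xI - C[p_n(x)]$ displayed earlier; all function names are hypothetical.

```python
from fractions import Fraction


def companion(coeffs):
    """Companion matrix of x^n + a_{n-1} x^{n-1} + ... + a_0, given [a_0, ..., a_{n-1}]."""
    n = len(coeffs)
    matrix = [[Fraction(0)] * n for _ in range(n)]
    for i in range(1, n):
        matrix[i][i - 1] = Fraction(1)       # ones on the subdiagonal
    for i in range(n):
        matrix[i][n - 1] = Fraction(-coeffs[i])  # last column holds -a_i
    return matrix


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def char_poly(matrix):
    """Coefficients [c_0, ..., c_n] of det(xI - A), by Faddeev-LeVerrier."""
    n = len(matrix)
    coeffs = [Fraction(0)] * (n + 1)
    coeffs[n] = Fraction(1)
    m = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # M_1 = I
    for k in range(1, n + 1):
        am = mat_mul(matrix, m)
        coeffs[n - k] = -sum(am[i][i] for i in range(n)) / k
        # M_{k+1} = A M_k + c_{n-k} I
        m = [
            [am[i][j] + (coeffs[n - k] if i == j else 0) for j in range(n)]
            for i in range(n)
        ]
    return coeffs


# p(x) = x^3 + 2x^2 - 5x + 7, listed from the constant term up.
recovered = char_poly(companion([7, -5, 2]))
```

The recovered coefficient list is that of $p(x)$ itself, as the lemma predicts for this example.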

Changing the Base Field

A change in the base field will generally change the primeness of polynomials and therefore has an effect on the multiset of elementary divisors. It is perhaps a surprising fact that a change of base field has no effect on the invariant factors, hence the adjective invariant.

Theorem 7.19 Let $F$ and $K$ be fields with $F \subseteq K$. Suppose that the elementary divisors of a matrix $A \in \mathcal{M}_n(F)$ are

$$P = \{p_1^{e_{1,1}}, \dots, p_1^{e_{1,k_1}}, \dots, p_n^{e_{n,1}}, \dots, p_n^{e_{n,k_n}}\}$$

Suppose also that the polynomials $p_i$ can be further factored over $K$, say

$$p_i = q_{i,1}^{d_{i,1}} \cdots q_{i,m_i}^{d_{i,m_i}}$$

where $q_{i,j}$ is prime over $K$. Then the prime powers

$$Q = \{q_{i,j}^{d_{i,j}e_{i,k}} \mid 1 \le i \le n,\ 1 \le j \le m_i,\ 1 \le k \le k_i\}$$

are the elementary divisors of $A$ over $K$.

Proof. Consider the companion matrix $C[p_i^{e_{i,k}}(x)]$ in the rational canonical form of $A$ over $F$. This is a matrix over $K$ as well and Theorem 7.15 implies that

$$C[p_i^{e_{i,k}}(x)] \sim \operatorname{diag}\bigl(C[q_{i,1}^{d_{i,1}e_{i,k}}], \dots, C[q_{i,m_i}^{d_{i,m_i}e_{i,k}}]\bigr)$$

Hence, $Q$ is an elementary divisor basis for $A$ over $K$.

As mentioned, unlike the elementary divisors, the invariant factors are field independent. This is equivalent to saying that the invariant factors of a matrix $A \in \mathcal{M}_n(F)$ are polynomials over the smallest subfield of $F$ that contains the entries of $A$.

Theorem 7.20 Let $A \in \mathcal{M}_n(F)$ and let $E \subseteq F$ be the smallest subfield of $F$ that contains the entries of $A$.
1) The invariant factors of $A$ are polynomials over $E$.
2) Two matrices $A, B \in \mathcal{M}_n(F)$ are similar over $F$ if and only if they are similar over $E$.

Proof. Part 1) follows immediately from Theorem 7.19, since using either $P$ or $Q$ to compute invariant factors gives the same result. Part 2) follows from the fact that two matrices are similar over a given field if and only if they have the same multiset of invariant factors over that field.

Example 7.2 Over the real field, the matrix

$$A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$$

is the companion matrix for the polynomial $x^2 + 1$, and so

$$\operatorname{ElemDiv}_{\mathbb{R}}(A) = \{x^2 + 1\} = \operatorname{InvFact}_{\mathbb{R}}(A)$$

However, as a complex matrix, the rational canonical form for $A$ is

$$R = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}$$

and so

$$\operatorname{ElemDiv}_{\mathbb{C}}(A) = \{x - i, x + i\} \quad\text{and}\quad \operatorname{InvFact}_{\mathbb{C}}(A) = \{x^2 + 1\}$$

Exercises

1. We have seen that any $\tau \in \mathcal{L}(V)$ can be used to make $V$ into an $F[x]$-module. Does every module $V$ over $F[x]$ come from some $\tau \in \mathcal{L}(V)$? Explain.
2. Let $\tau \in \mathcal{L}(V)$ have minimal polynomial
$$m_\tau(x) = p_1^{e_1}(x) \cdots p_n^{e_n}(x)$$
where $p_i(x)$ are distinct monic primes. Prove that the following are equivalent:
a) $V$ is $\tau$-cyclic.
b) $\deg(m_\tau(x)) = \dim(V)$.
c) The elementary divisors of $\tau$ are the prime power factors $p_i^{e_i}(x)$ and so
$$V = \langle\langle v_1 \rangle\rangle \oplus \cdots \oplus \langle\langle v_n \rangle\rangle$$
is a direct sum of $\tau$-cyclic submodules $\langle\langle v_i \rangle\rangle$ of order $p_i^{e_i}(x)$.
3. Prove that a matrix $A \in \mathcal{M}_n(F)$ is nonderogatory if and only if it is similar to a companion matrix.
4. Show that if $A$ and $B$ are block diagonal matrices with the same blocks, but in possibly different order, then $A$ and $B$ are similar.
5. Let $A \in \mathcal{M}_n(F)$. Justify the statement that the entries of any invariant factor version of a rational canonical form for $A$ are "rational" expressions in the coefficients of $A$, hence the origin of the term rational canonical form. Is the same true for the elementary divisor version?
6. Let $\tau \in \mathcal{L}(V)$ where $V$ is finite-dimensional. If $p(x) \in F[x]$ is irreducible and if $p(\tau)$ is not one-to-one, prove that $p(x)$ divides the minimal polynomial of $\tau$.
7. Prove that the minimal polynomial of $\tau \in \mathcal{L}(V)$ is the least common multiple of its elementary divisors.
8. Let $\tau \in \mathcal{L}(V)$ where $V$ is finite-dimensional. Describe conditions on the minimal polynomial of $\tau$ that are equivalent to the fact that the elementary divisor version of the rational canonical form of $\tau$ is diagonal. What can you say about the elementary divisors?
9. Verify the statement that the multiset of elementary divisors (or invariant factors) is a complete invariant for similarity of matrices.
10. Prove that given any multiset of monic prime power polynomials
$$M = \{p_1^{e_{1,1}}(x), \dots, p_1^{e_{1,k_1}}(x), \dots, p_n^{e_{n,1}}(x), \dots, p_n^{e_{n,k_n}}(x)\}$$
and given any vector space $V$ of dimension equal to the sum of the degrees of these polynomials, there is an operator $\tau \in \mathcal{L}(V)$ whose multiset of elementary divisors is $M$.
11. Find all rational canonical forms (up to the order of the blocks on the diagonal) for a linear operator on $\mathbb{R}^6$ having minimal polynomial $(x - 1)^2(x + 1)^2$.
12. How many possible rational canonical forms (up to order of blocks) are there for linear operators on $\mathbb{R}^6$ with minimal polynomial $(x - 1)(x + 1)^2$?
13. a) Show that if $A$ and $B$ are $n \times n$ matrices, at least one of which is invertible, then $AB$ and $BA$ are similar.
b) What do the matrices
$$A = [\;\dots\;] \quad\text{and}\quad B = [\;\dots\;]$$
have to do with this issue?
c) Show that even without the assumption on invertibility the matrices $AB$ and $BA$ have the same characteristic polynomial. Hint: Write
$$A = P I_{n,r} Q$$
where $P$ and $Q$ are invertible and $I_{n,r}$ is an $n \times n$ matrix that has the $r \times r$ identity in the upper left-hand corner and $0$'s elsewhere. Write $B' = QBP$. Compute $AB$ and $BA$ and find their characteristic polynomials.
14. Let $\tau$ be a linear operator on $F^4$ with minimal polynomial $m_\tau(x) = (x^2 + 1)(x^2 - 2)$. Find the rational canonical form for $\tau$ if $F = \mathbb{Q}$, $F = \mathbb{R}$ or $F = \mathbb{C}$.
15. Suppose that the minimal polynomial of $\tau \in \mathcal{L}(V)$ is irreducible. What can you say about the dimension of $V$?
16. Let $\tau \in \mathcal{L}(V)$ where $V$ is finite-dimensional. Suppose that $p(x)$ is an irreducible factor of the minimal polynomial $m(x)$ of $\tau$. Suppose further that $u, v \in V$ have the property that $o(u) = o(v) = p(x)$. Prove that $u = f(\tau)v$ for some polynomial $f(x)$ if and only if $v = g(\tau)u$ for some polynomial $g(x)$.

Chapter 8

Eigenvalues and Eigenvectors

Unless otherwise noted, we will assume throughout this chapter that all vector spaces are finite-dimensional.

Eigenvalues and Eigenvectors

We have seen that for any $\tau \in \mathcal{L}(V)$, the minimal and characteristic polynomials have the same set of roots (but not generally the same multiset of roots). These roots are of vital importance. Let $A = [\tau]_{\mathcal{B}}$ be a matrix that represents $\tau$. A scalar $\lambda \in F$ is a root of the characteristic polynomial $c_\tau(x) = c_A(x) = \det(xI - A)$ if and only if

$$\det(\lambda I - A) = 0 \tag{8.1}$$

that is, if and only if the matrix $\lambda I - A$ is singular. In particular, if $\dim(V) = n$, then (8.1) holds if and only if there exists a nonzero vector $x \in F^n$ for which

$$(\lambda I - A)x = 0$$

or equivalently,

$$Ax = \lambda x$$

If $[v]_{\mathcal{B}} = x$, then this is equivalent to

$$[\tau]_{\mathcal{B}} [v]_{\mathcal{B}} = \lambda [v]_{\mathcal{B}}$$

or in operator language,

$$\tau v = \lambda v$$

This prompts the following definition.

Definition Let $V$ be a vector space over a field $F$ and let $\tau \in \mathcal{L}(V)$.
1) A scalar $\lambda \in F$ is an eigenvalue (or characteristic value) of $\tau$ if there exists a nonzero vector $v \in V$ for which $\tau v = \lambda v$. In this case, $v$ is called an eigenvector (or characteristic vector) of $\tau$ associated with $\lambda$.
2) A scalar $\lambda \in F$ is an eigenvalue for a matrix $A$ if there exists a nonzero column vector $x$ for which $Ax = \lambda x$. In this case, $x$ is called an eigenvector (or characteristic vector) for $A$ associated with $\lambda$.
3) The set of all eigenvectors associated with a given eigenvalue $\lambda$, together with the zero vector, forms a subspace of $V$, called the eigenspace of $\lambda$ and denoted by $\mathcal{E}_\lambda$. This applies to both linear operators and matrices.
4) The set of all eigenvalues of an operator or matrix is called the spectrum of the operator or matrix. We denote the spectrum of $\tau$ by $\operatorname{Spec}(\tau)$.

Theorem 8.1 Let $\tau \in \mathcal{L}(V)$ have minimal polynomial $m_\tau(x)$ and characteristic polynomial $c_\tau(x)$.
1) The spectrum of $\tau$ is the set of all roots of $m_\tau(x)$ or of $c_\tau(x)$, not counting multiplicity.
2) The eigenvalues of a matrix are invariants under similarity.
3) The eigenspace $\mathcal{E}_\lambda$ of the matrix $A$ is the solution space to the homogeneous system of equations $(\lambda I - A)(x) = 0$.
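As a small illustration of the definitions above, the following sketch finds the eigenvalues of a 2 × 2 integer matrix as the roots of det(xI − A) and checks candidate eigenvectors directly against Ax = λx. The function names are hypothetical and chosen only for this example, and the sketch assumes integer entries and handles only the case where the roots are rational.

```python
from fractions import Fraction
from math import isqrt

def char_poly_2x2(A):
    """Return (c1, c0) with det(xI - A) = x^2 + c1*x + c0 for a 2x2 matrix A."""
    (a, b), (c, d) = A
    return (-(a + d), a * d - b * c)

def rational_eigenvalues_2x2(A):
    """Roots of the characteristic polynomial of an integer 2x2 matrix.

    Returns a sorted list of Fractions, or None when the discriminant is
    not a perfect square (the roots are then irrational or non-real).
    """
    c1, c0 = char_poly_2x2(A)
    disc = c1 * c1 - 4 * c0
    if disc < 0 or isqrt(disc) ** 2 != disc:
        return None
    s = isqrt(disc)
    return sorted({Fraction(-c1 - s, 2), Fraction(-c1 + s, 2)})

def is_eigenvector(A, lam, x):
    """Check that x is nonzero and that A x = lam x."""
    Ax = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
    return any(x) and Ax == [lam * xi for xi in x]

A = [[2, 1], [1, 2]]
eigenvalues = rational_eigenvalues_2x2(A)   # the roots of x^2 - 4x + 3
print(eigenvalues)
print(is_eigenvector(A, 3, [1, 1]))
print(is_eigenvector(A, 1, [1, -1]))
```

Exact rational arithmetic is used so that the equality test Ax = λx is not affected by rounding; for larger matrices one would normally turn to numerical methods instead.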

One way to compute the eigenvalues of a linear operator $\tau$ is to first represent $\tau$ by a matrix $A$ and then solve the characteristic equation

$$\det(xI - A) = 0$$

Unfortunately, it is quite likely that this equation cannot be solved exactly when $\dim(V) \geq 3$ (and for degree 5 or higher there is no general formula in radicals). As a result, the art of approximating the eigenvalues of a matrix is a very important area of applied linear algebra.

The following theorem describes the relationship between eigenspaces and eigenvectors of distinct eigenvalues.

Theorem 8.2 Suppose that $\lambda_1, \ldots, \lambda_k$ are distinct eigenvalues of a linear operator $\tau \in \mathcal{L}(V)$.
1) Eigenvectors associated with distinct eigenvalues are linearly independent; that is, if $v_i \in \mathcal{E}_{\lambda_i}$ is nonzero for each $i$, then the set $\{v_1, \ldots, v_k\}$ is linearly independent.
2) The sum $\mathcal{E}_{\lambda_1} + \cdots + \mathcal{E}_{\lambda_k}$ is direct; that is, $\mathcal{E}_{\lambda_1} \oplus \cdots \oplus \mathcal{E}_{\lambda_k}$ exists.

Proof. For part 1), if $\{v_1, \ldots, v_k\}$ is linearly dependent, then by renumbering if necessary, we may assume that among all nontrivial linear combinations of these vectors that equal $0$, the equation

$$r_1 v_1 + \cdots + r_j v_j = 0 \tag{8.2}$$

has the fewest number of terms. Applying $\tau$ gives

$$r_1 \lambda_1 v_1 + \cdots + r_j \lambda_j v_j = 0 \tag{8.3}$$

Multiplying (8.2) by $\lambda_1$ and subtracting from (8.3) gives

$$r_2(\lambda_2 - \lambda_1)v_2 + \cdots + r_j(\lambda_j - \lambda_1)v_j = 0$$

But this equation has fewer terms than (8.2) and so all of its coefficients must equal $0$. Since the $\lambda_i$'s are distinct, $r_i = 0$ for $i \geq 2$ and so $r_1 = 0$ as well. This contradiction implies that the $v_i$'s are linearly independent.

The next theorem describes the spectrum of a polynomial $p(\tau)$ in $\tau$.

Theorem 8.3 (The spectral mapping theorem) Let $V$ be a vector space over an algebraically closed field $F$. Let $\tau \in \mathcal{L}(V)$ and let $p(x) \in F[x]$. Then

$$\operatorname{Spec}(p(\tau)) = p(\operatorname{Spec}(\tau)) = \{p(\lambda) \mid \lambda \in \operatorname{Spec}(\tau)\}$$

Proof. We leave it as an exercise to show that if $\lambda$ is an eigenvalue of $\tau$, then $p(\lambda)$ is an eigenvalue of $p(\tau)$. Hence, $p(\operatorname{Spec}(\tau)) \subseteq \operatorname{Spec}(p(\tau))$. For the reverse inclusion, let $\lambda \in \operatorname{Spec}(p(\tau))$, that is,

$$(p(\tau) - \lambda)v = 0$$

for $v \neq 0$. If

$$p(x) - \lambda = (x - r_1)^{e_1} \cdots (x - r_n)^{e_n}$$

where $r_i \in F$, then writing this as a product of (not necessarily distinct) linear factors, we have

$$(\tau - r_1) \cdots (\tau - r_1) \cdots (\tau - r_n) \cdots (\tau - r_n)v = 0$$

(The operator $r_i \iota$ is written $r_i$ for convenience.) We can remove factors from the left end of this equation one by one until we arrive at an operator $\sigma$ (perhaps the identity) for which $\sigma v \neq 0$ but $(\tau - r_i)\sigma v = 0$. Then $\sigma v$ is an eigenvector for $\tau$ with eigenvalue $r_i$. But since $p(r_i) - \lambda = 0$, it follows that $\lambda = p(r_i) \in p(\operatorname{Spec}(\tau))$. Hence, $\operatorname{Spec}(p(\tau)) \subseteq p(\operatorname{Spec}(\tau))$.

The Trace and the Determinant

Let $F$ be algebraically closed and let $A \in \mathcal{M}_n(F)$ have characteristic polynomial

$$c_A(x) = x^n + c_{n-1}x^{n-1} + \cdots + c_1 x + c_0 = (x - \lambda_1) \cdots (x - \lambda_n)$$

where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$. Then $c_A(x) = \det(xI - A)$ and setting $x = 0$ gives $c_0 = \det(-A) = (-1)^n \det(A)$ on one side and $c_0 = (-1)^n \lambda_1 \cdots \lambda_n$ on the other, so that

$$\det(A) = (-1)^n c_0 = \lambda_1 \cdots \lambda_n$$

Hence, if $F$ is algebraically closed then $\det(A)$ is, up to sign, the constant term of $c_A(x)$, and it is the product of the eigenvalues of $A$, including multiplicity.

The sum of the eigenvalues of a matrix over an algebraically closed field is also an interesting quantity. Like the determinant, this quantity is one of the coefficients of the characteristic polynomial (up to sign) and can also be computed directly from the entries of the matrix, without knowing the eigenvalues explicitly.

Definition The trace of a matrix $A \in \mathcal{M}_n(F)$, denoted by $\operatorname{tr}(A)$, is the sum of the elements on the main diagonal of $A$.

Here are the basic properties of the trace. Proof is left as an exercise.

Theorem 8.4 Let $A, B, C \in \mathcal{M}_n(F)$.
1) $\operatorname{tr}(rA) = r \operatorname{tr}(A)$, for $r \in F$.
2) $\operatorname{tr}(A + B) = \operatorname{tr}(A) + \operatorname{tr}(B)$.
3) $\operatorname{tr}(AB) = \operatorname{tr}(BA)$.
4) $\operatorname{tr}(ABC) = \operatorname{tr}(CAB) = \operatorname{tr}(BCA)$. However, $\operatorname{tr}(ABC)$ may not equal $\operatorname{tr}(ACB)$.
5) The trace is an invariant under similarity.
6) If $F$ is algebraically closed, then $\operatorname{tr}(A)$ is the sum of the eigenvalues of $A$, including multiplicity, and so $\operatorname{tr}(A) = -c_{n-1}$, where $c_A(x) = x^n + c_{n-1}x^{n-1} + \cdots + c_1 x + c_0$.

Since the trace is invariant under similarity, we can make the following definition.

Definition The trace of a linear operator $\tau \in \mathcal{L}(V)$ is the trace of any matrix that represents $\tau$.

As an aside, the reader who is familiar with symmetric polynomials knows that the coefficients of any polynomial

$$p(x) = x^n + c_{n-1}x^{n-1} + \cdots + c_1 x + c_0 = (x - r_1) \cdots (x - r_n)$$

are, up to sign, the elementary symmetric functions of the roots:

$$c_{n-1} = (-1)^1 \sum_i r_i$$
$$c_{n-2} = (-1)^2 \sum_{i<j} r_i r_j$$
$$c_{n-3} = (-1)^3 \sum_{i<j<k} r_i r_j r_k$$
$$\vdots$$
$$c_0 = (-1)^n \prod_{i=1}^n r_i$$

The most important elementary symmetric functions of the eigenvalues are the first and last ones:

$$-c_{n-1} = \lambda_1 + \cdots + \lambda_n = \operatorname{tr}(A)$$

and

$$(-1)^n c_0 = \lambda_1 \cdots \lambda_n = \det(A)$$
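The cyclic property of the trace, and its failure for non-cyclic rearrangements, can be seen on very small matrices. The following sketch uses three 2 × 2 matrices picked for this illustration (they do not come from the text), and the helper names are hypothetical.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(A):
    """Sum of the entries on the main diagonal."""
    return sum(A[i][i] for i in range(len(A)))

A = [[0, 1], [0, 0]]
B = [[0, 0], [1, 0]]
C = [[1, 0], [0, 0]]

# Cyclic rearrangements of a product all have the same trace ...
tr_ABC = trace(mat_mul(mat_mul(A, B), C))
tr_CAB = trace(mat_mul(mat_mul(C, A), B))
tr_BCA = trace(mat_mul(mat_mul(B, C), A))

# ... but swapping two factors is not a cyclic rearrangement.
tr_ACB = trace(mat_mul(mat_mul(A, C), B))

print(tr_ABC, tr_CAB, tr_BCA, tr_ACB)

# For a 2x2 matrix, det(xI - M) = x^2 - tr(M) x + det(M), so the trace and
# determinant are read off from the coefficients up to sign.
M = [[2, 1], [1, 2]]
c1, c0 = -trace(M), M[0][0] * M[1][1] - M[0][1] * M[1][0]
print(c1, c0)
```

Here AC is the zero matrix, so tr(ACB) vanishes even though tr(ABC) does not.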

Geometric and Algebraic Multiplicities

Eigenvalues actually have two forms of multiplicity, as described in the next definition.

Definition Let $\lambda$ be an eigenvalue of a linear operator $\tau \in \mathcal{L}(V)$.
1) The algebraic multiplicity of $\lambda$ is the multiplicity of $\lambda$ as a root of the characteristic polynomial $c_\tau(x)$.
2) The geometric multiplicity of $\lambda$ is the dimension of the eigenspace $\mathcal{E}_\lambda$.

Theorem 8.5 The geometric multiplicity of an eigenvalue $\lambda$ of $\tau \in \mathcal{L}(V)$ is less than or equal to its algebraic multiplicity.

Proof. We can extend any basis $\mathcal{B}_\lambda = \{v_1, \ldots, v_k\}$ of $\mathcal{E}_\lambda$ to a basis $\mathcal{B}$ for $V$. Since $\mathcal{E}_\lambda$ is invariant under $\tau$, the matrix of $\tau$ with respect to $\mathcal{B}$ has the block form

$$[\tau]_{\mathcal{B}} = \begin{bmatrix} \lambda I_k & A \\ 0 & B \end{bmatrix}_{\text{block}}$$

where $A$ and $B$ are matrices of the appropriate sizes and so

$$c_\tau(x) = \det(xI - [\tau]_{\mathcal{B}}) = \det(xI_k - \lambda I_k)\det(xI_{n-k} - B) = (x - \lambda)^k \det(xI_{n-k} - B)$$

(Here $n$ is the dimension of $V$.) Hence, the algebraic multiplicity of $\lambda$ is at least equal to the geometric multiplicity $k$ of $\lambda$.
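The inequality in Theorem 8.5 can be strict. A standard small example, added here as an illustration and not taken from the text, is a 2 × 2 matrix with a single eigenvalue of algebraic multiplicity 2 whose eigenspace is only one-dimensional:

```latex
\[
A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix},
\qquad
c_A(x) = \det(xI - A) = (x - 1)^2 ,
\]
so $\lambda = 1$ has algebraic multiplicity $2$. The eigenspace is the solution
space of
\[
(1 \cdot I - A)x
= \begin{bmatrix} 0 & -1 \\ 0 & 0 \end{bmatrix}
  \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\iff x_2 = 0 ,
\]
so $\mathcal{E}_1$ is spanned by $(1, 0)^{T}$ and the geometric multiplicity is $1 < 2$.
```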



The Jordan Canonical Form

One of the virtues of the rational canonical form is that every linear operator $\tau$ on a finite-dimensional vector space has a rational canonical form. However, as mentioned earlier, the rational canonical form may be far from the ideal of simplicity that we had in mind for a set of simple canonical forms and is really more of a theoretical tool than a practical tool. When the minimal polynomial $m_\tau(x)$ of $\tau$ splits over $F$,

$$m_\tau(x) = (x - \lambda_1)^{e_1} \cdots (x - \lambda_n)^{e_n}$$

there is another set of canonical forms that is arguably simpler than the set of rational canonical forms.

In some sense, the complexity of the rational canonical form comes from the choice of basis for the cyclic submodules $\langle\langle v_{i,j} \rangle\rangle$. Recall that the $\tau$-cyclic bases have the form

$$\mathcal{B}_{i,j} = \left(v_{i,j}, \tau v_{i,j}, \ldots, \tau^{d_{i,j}-1} v_{i,j}\right)$$

where $d_{i,j} = \deg(p_i^{e_{i,j}})$. With this basis, all of the complexity comes at the end, so to speak, when we attempt to express

$$\tau(\tau^{d_{i,j}-1}(v_{i,j})) = \tau^{d_{i,j}}(v_{i,j})$$

as a linear combination of the basis vectors. However, since $\mathcal{B}_{i,j}$ has the form

$$\left(v, \tau v, \tau^2 v, \ldots, \tau^{d-1} v\right)$$

any ordered set of the form

$$p_0(\tau)v, p_1(\tau)v, \ldots, p_{d-1}(\tau)v$$

where $\deg(p_k(x)) = k$ will also be a basis for $\langle\langle v_{i,j} \rangle\rangle$. In particular, when $m_\tau(x)$ splits over $F$, the elementary divisors are

$$p_i^{e_{i,j}}(x) = (x - \lambda_i)^{e_{i,j}}$$

and so the set

$$\mathcal{C}_{i,j} = \left(v_{i,j}, (\tau - \lambda_i)v_{i,j}, \ldots, (\tau - \lambda_i)^{e_{i,j}-1} v_{i,j}\right)$$

is also a basis for $\langle\langle v_{i,j} \rangle\rangle$. If we temporarily denote the $k$th basis vector in $\mathcal{C}_{i,j}$ by $b_k$, then for $k = 0, \ldots, e_{i,j} - 2$,

$$\begin{aligned}
\tau b_k &= \tau[(\tau - \lambda_i)^k (v_{i,j})] \\
&= (\tau - \lambda_i + \lambda_i)[(\tau - \lambda_i)^k (v_{i,j})] \\
&= (\tau - \lambda_i)^{k+1}(v_{i,j}) + \lambda_i (\tau - \lambda_i)^k (v_{i,j}) \\
&= b_{k+1} + \lambda_i b_k
\end{aligned}$$

For $k = e_{i,j} - 1$, a similar computation, using the fact that

$$(\tau - \lambda_i)^{k+1}(v_{i,j}) = (\tau - \lambda_i)^{e_{i,j}}(v_{i,j}) = 0$$

gives

$$\tau(b_{e_{i,j}-1}) = \lambda_i b_{e_{i,j}-1}$$

Thus, for this basis, the complexity is more or less spread out evenly, and the matrix of $\tau|_{\langle\langle v_{i,j} \rangle\rangle}$ with respect to $\mathcal{C}_{i,j}$ is the $e_{i,j} \times e_{i,j}$ matrix

$$\mathcal{J}(\lambda_i, e_{i,j}) = \begin{bmatrix}
\lambda_i & 0 & \cdots & \cdots & 0 \\
1 & \lambda_i & \ddots & & \vdots \\
0 & 1 & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & \lambda_i & 0 \\
0 & \cdots & 0 & 1 & \lambda_i
\end{bmatrix}$$

which is called a Jordan block associated with the scalar $\lambda_i$. Note that a Jordan block has $\lambda_i$'s on the main diagonal, 1's on the subdiagonal and 0's elsewhere. Let us refer to the basis $\mathcal{C} = \bigcup \mathcal{C}_{i,j}$ as a Jordan basis for $\tau$.

Theorem 8.6 (The Jordan canonical form) Suppose that the minimal polynomial of $\tau \in \mathcal{L}(V)$ splits over the base field $F$, that is,

$$m_\tau(x) = (x - \lambda_1)^{e_1} \cdots (x - \lambda_n)^{e_n}$$

where $\lambda_i \in F$.
1) The matrix of $\tau$ with respect to a Jordan basis $\mathcal{C}$ is

$$\operatorname{diag}\left(\mathcal{J}(\lambda_1, e_{1,1}), \ldots, \mathcal{J}(\lambda_1, e_{1,k_1}), \ldots, \mathcal{J}(\lambda_n, e_{n,1}), \ldots, \mathcal{J}(\lambda_n, e_{n,k_n})\right)$$

where the polynomials $(x - \lambda_i)^{e_{i,j}}$ are the elementary divisors of $\tau$. This block diagonal matrix is said to be in Jordan canonical form and is called the Jordan canonical form of $\tau$.
2) If $F$ is algebraically closed, then up to order of the block diagonal matrices, the set of matrices in Jordan canonical form constitutes a set of canonical forms for similarity.

Proof. For part 2), the companion matrix and corresponding Jordan block are similar:

$$C[(x - \lambda_i)^{e_{i,j}}] \sim \mathcal{J}(\lambda_i, e_{i,j})$$

since they both represent the same operator $\tau$ on the subspace $\langle\langle v_{i,j} \rangle\rangle$. It follows that the rational canonical matrix and the Jordan canonical matrix for $\tau$ are similar.

Note that the diagonal elements of the Jordan canonical form $\mathcal{J}$ of $\tau$ are precisely the eigenvalues of $\tau$, each appearing a number of times equal to its algebraic multiplicity. In general, the rational canonical form does not "expose" the eigenvalues of the matrix, even when these eigenvalues lie in the base field.
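A Jordan block in the convention used here (eigenvalue on the main diagonal, 1's on the subdiagonal) is easy to build and probe directly. The sketch below, with hypothetical helper names, constructs a block of size e and confirms that subtracting the eigenvalue leaves a nilpotent matrix of index exactly e, which is consistent with the block having the single elementary divisor (x − λ)^e.

```python
def jordan_block(lam, e):
    """e x e matrix with lam on the main diagonal and 1's on the subdiagonal."""
    return [[lam if i == j else 1 if i == j + 1 else 0 for j in range(e)]
            for i in range(e)]

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def nilpotency_index(N):
    """Smallest k >= 1 with N^k = 0, or None if no power up to len(N) vanishes."""
    power = N
    for k in range(1, len(N) + 1):
        if all(entry == 0 for row in power for entry in row):
            return k
        power = mat_mul(power, N)
    return None

lam, e = 5, 3
J = jordan_block(lam, e)

# Column k of J holds the coordinates of the image of the k-th basis vector:
# lam in row k and 1 in row k + 1, matching  b_k -> b_{k+1} + lam * b_k.
N = [[J[i][j] - (lam if i == j else 0) for j in range(e)] for i in range(e)]
print(J)
print(nilpotency_index(N))
```

Some texts place the 1's on the superdiagonal instead; that corresponds to listing the same basis vectors in reverse order.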

Triangularizability and Schur's Lemma

We have discussed two different canonical forms for similarity: the rational canonical form, which applies in all cases and the Jordan canonical form, which applies only when the base field is algebraically closed. Moreover, there is an annoying sense in which these sets of canonical forms leave something to be desired: One is too complex and the other does not always exist.

Let us now drop the rather strict requirements of canonical forms and look at two classes of matrices that are too large to be canonical forms (the upper triangular matrices and the almost upper triangular matrices) and one class of matrices that is too small to be a canonical form (the diagonal matrices).

The upper triangular matrices (or lower triangular matrices) have some nice algebraic properties and it is of interest to know when an arbitrary matrix is similar to a triangular matrix. We confine our attention to upper triangular matrices, since there are direct analogs for lower triangular matrices as well.

Definition A linear operator $\tau \in \mathcal{L}(V)$ is upper triangularizable if there is an ordered basis $\mathcal{B} = (v_1, \ldots, v_n)$ of $V$ for which the matrix $[\tau]_{\mathcal{B}}$ is upper triangular, or equivalently, if

$$\tau v_i \in \langle v_1, \ldots, v_i \rangle$$

for all $i = 1, \ldots, n$.

As we will see next, when the base field is algebraically closed, all operators are upper triangularizable. However, since two distinct upper triangular matrices can be similar, the class of upper triangular matrices is not a canonical form for similarity. Simply put, there are just too many upper triangular matrices.

Theorem 8.7 (Schur's theorem) Let $V$ be a finite-dimensional vector space over a field $F$.
1) If the characteristic polynomial (or minimal polynomial) of $\tau \in \mathcal{L}(V)$ splits over $F$, then $\tau$ is upper triangularizable.
2) If $F$ is algebraically closed, then all operators are upper triangularizable.

Proof. Part 2) follows from part 1). The proof of part 1) is most easily accomplished by matrix means, namely, we prove that every square matrix $A \in \mathcal{M}_n(F)$ whose characteristic polynomial splits over $F$ is similar to an upper triangular matrix. If $n = 1$ there is nothing to prove, since all $1 \times 1$ matrices are upper triangular. Assume the result is true for $n - 1$ and let $A \in \mathcal{M}_n(F)$.

Let $v_1$ be an eigenvector associated with the eigenvalue $\lambda_1 \in F$ of $A$ and extend $\{v_1\}$ to an ordered basis $\mathcal{B} = (v_1, \ldots, v_n)$ for $F^n$. The matrix of $A$ with respect to $\mathcal{B}$ has the form

$$[A]_{\mathcal{B}} = \begin{bmatrix} \lambda_1 & * \\ 0 & A_1 \end{bmatrix}_{\text{block}}$$

for some $A_1 \in \mathcal{M}_{n-1}(F)$. Since $[A]_{\mathcal{B}}$ and $A$ are similar, we have

$$\det(xI - A) = \det(xI - [A]_{\mathcal{B}}) = (x - \lambda_1)\det(xI - A_1)$$

Hence, the characteristic polynomial of $A_1$ also splits over $F$ and the induction hypothesis implies that there is an invertible matrix $P \in \mathcal{M}_{n-1}(F)$ for which

$$U = P A_1 P^{-1}$$

is upper triangular. Hence, if

$$Q = \begin{bmatrix} 1 & 0 \\ 0 & P \end{bmatrix}_{\text{block}}$$

then $Q$ is invertible and

$$Q[A]_{\mathcal{B}}Q^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & P \end{bmatrix} \begin{bmatrix} \lambda_1 & * \\ 0 & A_1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & P^{-1} \end{bmatrix} = \begin{bmatrix} \lambda_1 & * \\ 0 & U \end{bmatrix}$$

is upper triangular.
