Generalized Inverses of Linear Transformations
Books in the Classics in Applied Mathematics series are monographs and...
415 downloads
1965 Views
6MB Size
Report
This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
Report copyright / DMCA form
Generalized Inverses of Linear Transformations
Books in the Classics in Applied Mathematics series are monographs and textbooks declared out of print by their original publishers, though they are of continued importance and interest to the mathematical community. SIAM publishes this series to ensure that the information presented in these texts is not lost to today's students and researchers.
Editor-in-Chief Robert E. O'Malley, Jr., University of Washington Editorial Board
John Boyd, University of Michigan Leah Edelstein-Keshet, University of British Columbia William 0. Fans, University of Arizona Nicholas J. Higham, University of Manchester Peter Hoff, University of Washington Mark Kot, University of Washington Hilary Ockendon, University of Oxford Peter Olver, University of Minnesota Philip Protter, Cornell University Gerhard Wanner, L'Université de Geneve Classics in Applied Mathematics C. C. Lin and L. A. Segel, Mathematics Applied to Deterministic Problems in the Natural Sciences Johan 0. F. Belinfante and Bernard Kolman, A Suntey of Lie Groups and Lie Algebras with Applications and Computational Methods James M. Ortega, Numerical Analysis. A Second Course Anthony V. Fiacco and Garth P. McCormick, NonhineaT Sequential Unconstrained Minimization Techniques F. H. Clarke, Optimization and Nonsmooth Analysis George F. Carrier and Carl E Pearson, Ordinary Differential Equations Leo Breiman, Probability
R. Bellman and 0. M. Wing, An Introduction to Invariant Imbedding Abraham Berman and Robert J. Plemmons, Nonnegative Matrices in the Mathematical Sciences Olvi L Mangasarian, Nonlinear Programming °Carl Friedrich Gauss, Theory of the Combination of Observations Least Subject to Errors: Part One, Part Supplement. Translated by 0. W. Stewart Richard Bellman, Introduction to Matrix Analysis U. M. Ascher, R. M. M. Mattheij, and R. D. Russell, Numerical Solution of Boundary Value Problems for Ordinary Differential Equations K. E. Brenan, S. L Campbell, and L. R. Petzold, Numerical Solution of lnitial.Value Problems in Differential.Algebmic Equations Charles 1. Lawson and Richard J. Hanson, Solving Least Squares Problems J. E. Dermis, Jr. and Robert B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations
Richard E. Barlow and Frank Proechan, Mathematical Theory of Reliability Cornelius Lanczos, Linear Differential Operators Richard Bellman, Introduction to Matrix Analysis, Second Edition
*First time in print.
Classics in Applied Mathematics (continued) Beresford N. Partert, The Symmetric Eigenvaluc Problem Richard Haberman, Mathematical ModeLs: Mechanical Vibrations, R*ulation Dynamics, and Traffic Flow Peter W. M. John, Statistical Design and Analysis of Experiments and Geert Jan Olsder, Dynamic Noncooperative Game Theory, Second Edition Tamer Emanuel Parzen, StOChaStiC Processes Petar Kokotovi& Hassan K. KhaLit, and John O'ReilLy, Singular Perturbation Methods in Cont,ob Analysis
and Design Jean Dickinson Gibbons, Ingram 01km, and MiLton Sobel, Selecting and Ordering Fbpulations: A New Statistical Methodology James A. Murdock, Perturbations: Theory and Methods and Variational Problems Ivar EkeLand and Roger Témam, Convex AnaLysis Ivar Stalcgold, Boundary Value Problems of Mathematical Physics, Volumes I and ii J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables David Kinderlehrer and Guido Srampacchia, An introduction to Variational inequalities and Their Applications
Natterer, The Mathematics of Computerized Tomography Avinash C. Kak and MalcoLm Slaney, Principles of Computerized Tomographic Imaging E
R. Wong, Asymptotic Approximations of integrals
0. Axelsson arid V. A. Barker, Rnite Element Solution of Boundary Value Problems: Theory and Computation David R. Brillinger, TIme Series: Data Analysis and Theory Joel N. Franklin, Methods of Mathematical Economics linear and Nonlinear Programming, Rxed-ibint Theorems
Philip Hartman, Ordinary Differential Equations, Second Edition Michael D. Inmilgator, Mathematical Optimization and Economic Theory Philippe 0.
The FInite Element Method for EUiptiC Problems Jane K. Cullum and Ralph A. Willoughby, Lanczos Algorithms for Large Symmetric Eigennalue Vol.
Theory
M. Vidyasagar, NOrjineaT Systems Analysis, Second Edition Robert Mattheij and Jaap Molenaar, Ordinary Differential Equations in Theory and Practice Shanti S. Gupta and S. Panchapakesan, Multiple Decision Procedures: Theory and Methodology of Selecting and Ranking Populations Eugene L Ailgower and Kurt Georg, Introduction to Numerical Continuation Methods
Leah Edelstein-Keshet, Mathematical Models in Biology Heinz-Otto Kreiss and Jens Lorenz, initial.Boundary Value Problems and the Navier-Stokes Equations J. L. Hodges, Jr. and E L Lehrnann, Basic Concepts of Probability and Statistics, Second Edition George F. Carrier, Max Krook, and Carl E. Pearson, Functions of a Complex Variable Theory and Technique Friedrich Pukeisheim, Optimal Design of Experiments Israel Oohberg, Peter Lancaster, and Leiba Rodman, Invariant Subspaces of Matrices with Applications Lee A. Segel with 0. H. Handelman, Mathematics Applied to Continuum Mechanics Rajendra Bhatia, Perturbation Bounds for Matrix Eigenvalues Barry C. ArnoLd, N. Batakrishnan, and H. N. Nagaraja, A First Course in Order Statistics Charles A. Desoer and M. Vidyasagar, Feedback Systems: lnput.Output Properties Stephen L Campbell and Carl D. Meyer, Generalized Inverses of linear Transformations
Generalized Inverses of Linear Transformations
ci
Stephen L. Campbell
Carl D. Meyer North Carolina State University Raleigh, North Carolina
Society for Industrial and Applied Mathematics Philadelphia
Copyright © 2009 by the Society for Industrial and Applied Mathematics This SIAM edition is an unabridged republication of the work published by Dover Publications, Inc., 1991, which is a corrected republication of the work first published by Pitman Publishing Limited, London, 1979.
1098 7 6 5 43 2 1 All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 Market Street, 6th Floor, Philadelphia, PA 19104-2688 USA. Library of Congress Cataloging.in#Publication Data Campbell, S. L. (Stephen La Vern) Generalized inverses of linear transformations I Stephen L. Campbell, Carl D. Meyer.
p. cm. -- (Classics in applied mathematics ; 56) Originally published: London: Pitman Pub., 1979. Includes bibliographical references and index. ISBN 978-0-898716-71-9 1. Matrix inversion. 2. Transformations (Mathematics) I. Meyer, C. D. (Carl Dean) II. Title. QA188.C36 2009 512.9'434--dc22 2008046428
512.111.
is
a registered trademark.
To 4ciI1L
Contents
Preface to the Classics Edition
Preface o
Introduction and other preliminaries I
2 3 I
2
Exercises
6
I
8
4
Computation of At Generalized inverse of a product
8 10 12 19
5
Exercises
25
Least squares solutions What kind of answer is 2 Fitting a linear hypothesis 3 Estimating the unknown parameters 4 Goodness of lit 5 An application to curve fitting 6 Polynomial and more general fittings 1
7 WhyAt? 3
I
Prerequisites and philosophy Notation and basic geometry
The Moore—Penrose or generalized inverse I Basic definitions 2 Basic properties of the generalized inverse 3
2
xiii xix
28 28 30 32 34 39 42 45
Sums, partitioned matrices and the constrained generalized inverse I The generalized inverse of a sum 2 Modified matrices 3 Partitioned matrices 4 Block triangular matrices
46
The fundamental matrix of constrained minimization 6 Constrained least squares and constrained generalized
63
5
7
inverses Exercises
46 SI 53 61
65 69
CONTENTS
x
4
Partial isometries and EP matrIces I Introduction 2 Partial isometrics 3 EP matrices 4 Exercises
71 71 71
74 75
5 The generalized Inverse in electrical engineering I Introduction 2 n-port networks and the impedance matrix 3 Parallel sums 4 Shorted matrices 5 Other uses of the generalized inverse 6 Exercises 7 References and further reading
77 77 77 82 86 88 88 89
6 (i,j, k)-Generajlzed inverses and linear estImation I Introduction 2 Definitions 3 (1)-inverses 4 Applications to the theory of linear estimation
91
5
Exercises
7 The Drazin Inverse I Introduction 2 3
4
Definitions Basic properties of the Drazin inverse Spectral properties of the Drazin inverse
ADasapolynomialinA 6 ADasa limit 5
7 8
The Drazin inverse of a partitioned matrix Other properties
8 ApplicatIons of the Drazin inverse to the theory of finite Markov I Introduction and terminology 2 Introduction of the Drazin inverse into the theory of finite Markov chains 3 Regular chains
4 Ergodic chains 5 Calculation of A' and w for an ergodic chain 6 7
Non-ergodic chains and absorbing chains References and further reading
9 ApplIcations of the DraiJa Inverse 1 Introduction 2 Applications of the Drazin inverse to linear systems of differential equations
91 91
96 104 115 121)
120 121
127 129 130 136 139 147
151 151
152 157 158 160 165 170
171 171 171
CONTENTS
Applications of the Drazin inverse to difference equations 4 The Leslie population growth model and backward population projection 5 Optimal control 6 Functions of a matrix 3
7 Weak Drazin inverses 8
10
ContInuity of die generalized inverse 1
2 3
4 5
6 7 8
9 11
Exercises
Introduction Matrix norms Matrix norms and invertibility Continuity of the Moore—Penrose generalized inverse Matrix valued functions Non-linear least squares problems: an example Other inverses Exercises
References and further reading
Linear programming
Introduction and basic theory 2 Pyle's reformulation I
12
3
Exercises
4
References and further reading
ComputatIonal concerns
Introduction 2 Calculation of A' 3 Computation of the singular value decomposition 1
4 (1)-inverses
Computation of the Drazin inverse 6 Previous algorithms 5
xi
181
184 187
200 202 208
210 210 210 214 216 224 229 231
234 235
236 236 241
244 245
246 246 247 251
255 255 260
Exercises
261
Bibliography
263
Index
269
7
Preface to the Classics Edition
The first edition of Generalized Inverses of Linear Transfor,narions was
written toward the end of a period of active research on generalized inverses. Generalized inverses of various kinds have become a standard and important mathematical concept in many areas. The core chapters of this book, consisting of Chapters 1—7, 10, and 12. provide a development of most of the key generalized inverses. This presentation is as up to date and readable as ever and can be profitably read by anyone interested in learning about generalized inverses and their application. Two of the application chapters, however, turned out to be on the ground floor of the development of application areas that have gone on to become significant areas of applied mathematics. Chapter 8 focuses on applications involving Markov chains. While the basic relation between the group inverse and the theory of Markov chains is still relevant, several advances have been made. Most notably, there has been a wealth of new results concerning the use of the group inverse to characterize the sensitivity of the stationary probabilities to perturbations in the underlying transition probabilities—representative results are found in [8, 9, 13, 16, 28, 29, 23, 36, 37, 38, 39,40].' More generally, the group inverse has found applications involving expressions for differentiation of eigenvectors and eigenvalues [10, 11, 12, 14, 30, 31]. Since the original version of this book appeared, researchers in more theoretical areas have been applying the group inverse concept to the study of M-matrices, graph theory, and general nonnegative matrices [4, 5, 6, 7, 17, 18, 19, 20, 21, 22, 32, 33, 34]. Finally, the group inverse has recently proven to be fundamental in the analysis of Google's PageRank system. Some of these applications are described in detail in [26, 27]. 'Citations here correspond only to the references immediately following this preface.
xiv
PREFACE TO THE CLASSICS EDITION
Chapter 9 discusses the Drazin inverse and its application to
differential equations of the form + Bx = f. In Chapter 9 these equations are called singular systems of differential equations, and some applications to control problems are given. It turns out that many physical processes are most naturally modeled by such implicit differential equations. Since the publication of the first edition of this book there has been a major investigation of the applications, numerical solution, and theory behind such implicit differential equations. Today, rather than being called singular systems, they are more often called differential algebraic equations (DAEs) in applied mathematics and called either DAES or descriptor systems in the sciences and engineering. Chapter 9 still provides a good introduction to the linear time invariant case, but now it should be viewed as the first step in understanding a much larger and very important area. Readers interested in reading further about DAEs are referred to the general developments [3, 1,24, 15] and the more technical books [25, 35]. There has been, of course, some additional work on generalized inverses since the first edition was published. A large and more recent bibliography can be found in [2].
Stephen L Campbell Carl D. Meyer September 7, 2008
References [1] Ascher, U. M. and Petzold, L. R. Computer Methods for Ordinary Equations and Equations. SIAM, Philadelphia, 1998. [2] Ben-Israel, A. and Greville, T. N. E. Generalized Inverses. Theory and Applications. 2nd ed. Springer-Verlag, New York, 2003.
[3] Brenan, K. E., Campbell, S. L., and Petzold, L. R. Numerical Solution of Initial-Value Pmblems in Equations. Classics in Appl. Math. 14, SIAM, Philadelphia, 1995. [4] Catral, M., Neumann, M., and Xu, J. Proximity in group inverses of M-matrices and inverses of diagonally dominant M-matrices. LinearAlgebraAppl. 409,32—50,2005.
PREFACE TO THE CLASSICS EDITION
xv
[5] Catral, M., Neumann, M., and Xu, J. Matrix analysis of a Markov chain small-world model. LinearAlgebra App!. 409, 126—146, 2005.
[6] Chen, Y., Kirkland, S. J., and Neumann, M. Group generalized
inverses of M-matrices associated with periodic and nonperiodic Jacobi matrices. Linear Muhi!inearAlgebra. 39, 325—340, 1995. (7] Chen, Y., Kirkland, S. J., and Neumann, M. Nonnegative
alternating circulants leading to Al-matrix group inverses. Linear Algebra App!. 233, 8 1—97, 1996. (8] Cho, 0. and Meyer, C. Markov chain sensitivity measured by mean first passage times. LinearA!gebra App!. 316,21—28, 2000. (9] Cho,
0. and Meyer, C. Comparison of perturbation bounds for the
stationary distribution of a Markov chain. Linear Algebra App!. 335, 137—150, 2001.
[10] Deutsch, E. and Neumann, M. Derivatives of the Perron root at an essentially nonnegative matrix and the group inverse of an M-matrix. J. Math. Anal. App!. 102, 1—29, 1984. (11] Deutsch, E. and Neumann, M. On the first and second derivatives of the Perron vector. Linear Algebra App!. 71,57—76, 1985. [12] Deutsch, E. and Neumann, M. On the derivative of the Perron vector whose infinity norm is fixed. Linear Multilinear Algebra. 21,75—85, 1987.
(13] Funderlic, R. E. and Meyer, C. Sensitivity of the stationary distribution vector for an ergodic Markov chain. Linear Algebra AppI. 17, 1—16, 1986.
[14] Golub, 0. H. and Meyer, Jr., C. D. Using the QR factorization and group inversion to compute, differentiate, and estimate the sensitivity of stationary probabilities for Markov chains. SIAM J. Aig. Discrete Met!,. 7, 273—281, 1986.
[15] Hairer, E. and Wanner, G. Solving Ordinary Differential Equations II. St and Differential-Algebraic Problems. 2nd ed. Springer Ser. Comput. Math. 14, Springer-Verlag, Berlin, 1996.
PREFACE TO THE CLASSICS EDITION
xvi
[16] Ipsen, I. and Meyer, C. D. Uniform stability of Markov SJAMJ. MatrLxAnal. App!. 15, 1061—1074, 1994.
chains.
[17] Kirkland, S. J. and Neumann, M. Convexity and concavity of the Perron root and vector of Leslie matrices with applications to a population model. SIAM I. Matrix Anal. App!. 15, 1092—1107, 1994. [18]
[19]
Kirkland, S. J. and Neumann, M. Group inverses of M-matrices associated with nonnegative matrices having few eigenvalues. LinearAlgebra App!. 220, 181—213, 1995.
Kirkland, S. J. and Neumann, M. The M-matrix group generalized inverse problem for weighted trees. SIAM J. Matrix Anal. App!. 19, 226—234, 1998.
[20] Kirkland, S. J. and Neumann, M. Cutpoint decoupling and first passage times for random walks on graphs. SIAM J. Matrix Anal. App!. 20,860—870, 1999.
[21] Kirkland, S. J., Neumann, M., and Shader, B. L. Distances in weighted trees and group inverse of Laplacian matrices. SIAM I. Matrix Anal. App!. 18, 827—841, 1997.
[22] Kirkland, S. J., Neumann, M.,
and Shader, B. L. Bounds on the
subdominant eigenvalue involving group inverses with applications to graphs. Czech.
Math. J. 48,
1—20, 1998.
[23] Kirkland, S. J., Neumann, M., and Sze, N.-S. On optimal condition
numbers for Markov chains. Numer
Math. 110,521—537,2008.
[24] Kumar, A. and Daoutidis, P. Control of Nonlinear Djfferential
Equation Systems with Applications to Chemical Processes. Chapman and Hall/CRC, Boca Raton, FL, 1999.
Algebraic
[25] Kunkel, P. and Mehrmann, V. Differential-Algebraic Equations. Analysis and Numerical Solution. EMS Textbooks in Math., European Mathematical Society, ZUrich, 2006. [26]
Meyer, C. Google 's PageRank and Beyond: The Science of Search Engine Rankings. Princeton University Press, Langville, A. and
Princeton, NJ, 2006.
PREFACE TO THE CLASSICS EDITION
xvii
[27] Langville, A. and Meyer, C. D. Updating Markov chains with an eye on Google's PageRank. SIAM J. Matrix Anal. App!. 27, 968—987,2006.
[28] Meyer, C. The character of a finite Markov chain. In Linear Algebra, Markoi' Chains, and Queuing Models. IMA Vol. Math. Appl. 48, Springer. New York, 1993.47—58.
[291 Meyer, C. Sensitivity of the stationary distribution of a Markov chain. SlAM). Matrix Anal. App!. 15, 715—728, 1994.
[30] Meyer, C. Matrix Analysis and Applied Linear Algebra. 2nd ed. SIAM, Philadelphia, to appear. [31] Meyer, C. and Stewart, G. W. Derivatives and perturbations of eigenvectors. SIAM I. Anal. 25,679—691, 1988. [32] Neumann, M. and Werner, H. J. Nonnegative group inverses. LinearAlgebra App!. 151, 85—96, 1991.
[33] Neumann, M. and Xu, J. A parallel algorithm for computing the group inverse via Perron complementation. Electron. J. Linear Algebra 13, 13 1—145, 2005.
[34] Neumann, M. and Xu, J. A note on Newton and Newton-like inequalities for M-matrices and for Drazin inverses of M-matrices. Electron. J. Linear Algebra 15, 314—328, 2006. [35] Riaza, R. Differential-Algebraic Systems: Analytical Aspects and Circuit Applications. World Scientific, River Edge, NJ, 2008. [36] Seneta, E. Sensitivity to perturbation of the stationary distribution: Some refinements. LinearAlgebra App!. 108, 12 1—126, 1988. [37] Seneta, E. Perturbation of the stationary distribution measured by ergodicity coefficients. Adv. App!. Probab. 20,228—230, 1988. [38] Seneta, E. Sensitivity analysis, ergodicity coefficients, and rank-one updates for finite Markov chains. In Numerical Solution of Mar/coy Chains. (W. J. Stewart, ed.) Marcel Dekker, New York, 1991, 121—129.
xviii
PREFACE TO THE CLASSICS EDITION
[39] Seneta, E. Explicit forms for ergodicity coefficients of stochastic matrices. Linear Algebra AppI. 191, 245—252, 1993. [40] Seneta, E. Sensitivity of finite Markov chains under perturbation.
Statist. Probab. Leu. 17, 163—168, 1993.
Preface
During the last two decades, the study of generalized inversion of linear transformations and related applications has grown to become an important topic of interest to researchers engaged in the study of linear mathematical problems as well as to practitioners concerned with applications of linear mathematics. The purpose of this book is twofold. First, we try to present a unified treatment of the general theory of generalized inversion which includes topics ranging from the most traditional to the most contemporary. Secondly, we emphasize the utility of the concept of generalized inversion by presenting many diverse applications in which generalized inversion plays an integral role. This book is designed to be useful to the researcher and the practitioner, as well as the student. Much of the material is written under the assumption that the reader is unfamiliar with the basic aspects of the theory and applications of generalized inverses. As such, the text is accessible to anyone possessing a knowledge of elementary Linear algebra. This text is not meant to be encyclopedic. We have not tried to touch on all aspects of generalized inversion—nor did we try to include every known application. Due to considerations of length, we have been forced to restrict the theory to finite dimensional spaces and neglect several important topics and interesting applications. In the development of every area of mathematics there comes a time when there is a commonly accepted body of results, and referencing is limited primarily to more recent and less widely known results. We feel that the theory of generalized inverses has reached that point. Accordingly, we have departed from previous books and not referenced many of the more standard facts about generalized inversion. To the many individuals who have made an original contribution to the theory of generalized inverses we are deeply indebted. We are especially indebted to Adi Ben-Israel, Thomas Greville, C. R. Rao, and S. K. Mitra whose texts undoubtedly have had an influence on the writing of this book.
xx
PREFACE
In view of the complete (annotated) bibliographies available in other texts, we made no attempt at a complete list of references.
Special thanks are extended to Franklin A. Graybill and Richard 3. Painter who introduced author Meyer to the subject of generalized inverses and who prov led wisdom and guidance at a time when they were most needed. S. L. Campbell C. D. Meyer, Jr North Carolina State University at Raleigh
0
Introduction and other preliminaries
1.
Prerequisites and philosophy
The study of generalized inverses has flourished since its rebirth in the
early 1950s. Numerous papers have developed both its theory and its applications. The subject has advanced to the point where a unified treatment is possible. It would be desirable to have a book that treated the subject from the viewpoint of linear algebra, and not with regard to a particular application. We do not feel that the world needs another introduction to linear algebra. Accordingly, this book presupposes some familiarity with the basic facts and techniques of linear algebra as found in most introductory courses. It is our hope that this book would be suitable for self-study by either students or workers in other fields. Needed ideas that a person might well have forgotten or never learned, such as the singular value decomposition, will be stated formally. Unless their proof is illuminating or illustrates an important technique, it will be relegated to the exercises or a reference. There are three basic kinds of chapters in this book. Chapters 0, 1, 2, 3, 4, 6, 7, 10, and 12 discuss the theory of the generalized inverse and related notions. They are a basic introduction to the mathematical theory. Chapters 5, 8,9 and 11 discuss applications. These chapters are intended to illustrate the uses of generalized inverses, not necessarily to teach how to use them. Our goal has been to write a readable, introductory book which will whet the appetite of the reader to learn more. We have tried to bring the reader far enough so that he can proceed into the literature, and yet not bury him under a morass of technical lemmas and concise, abbreviated proofs. This book reflects our rather definite opinions on what an introductory book is and what it should include. In particular, we feel that the numerous applications are necessary for a full appreciation of the theory. Like most types of mathematics, the introduction of the various generalized inverses is not necessary. One could do mathematics without
2
GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
ever defining a ring or a continuous function. However, the introduction
of generalized inverses, as with rings and continuous functions, enables us to more clearly see the underlying structure, to more easily manipulate it, and to more easily express new results. No attempt has been made to have the bibliography comprehensive. The bibliography of [64] is carefully annotated and contains some 1775 entries. In order to keep this book's size down we have omitted any discussion of the infinite dimensional case. The interested reader is referred to [64].
2.
Notation and basic geometry
Theorems, facts, propositions, lemmas, corollaries and examples are
numbered consecutively within each section. A reference to Example 3.2.3. refers to the third example in Section 2 of Chapter 3. If the reader were already in Chapter 3, the reference would be just to Example 2.3. Within Section 2, the reference would be to Example 3. Exercise sections are scattered throughout the chapters. Exercises with an (*) beside them are intended for more advanced readers. In some cases the require knowledge from an outside area like complex function theory. At other times they involve fairly complicated proofs using basic ideas. xli) C(R) is the field of complex (real) numbers. CTM is the vector-space of m x n complex (real) matrices over C(R). C'(R") is the vector-space of n-tuples of complex (real) numbers over C(R). We will frequently not distinguish between C"(R") and C" X '(R" 1)• That is, n-tuples will be written as column vectors. Except where we specifically state otherwise, it is to be assumed that we are working over the complex field. If A e C'" then with respect to the standard basis of C" and C'", A induces a linear transformation : C" -. C'" by = Au for every ueC". Whenever we go from one of A or to the other, it is to be understood that it is with respect to the standard basis. The capital letters, A, B, C, X, Y, Z are reserved for matrices or their corresponding linear transformations. Subspaces are denoted by the capital letters M, N. Subspaces are always linear subspaces. The letters U, V, W are reserved for unitary matrices or partial isometrics. I always denotes the identity matrix. If I e we sometimes write 1. Vectors are denoted by b, u, v, y, etc., scalars by a, b, )., k, etc. R(A) denotes the range of A, that is, the linear span of the columns of A. The range of is denoted Since A is derived from by way of the standard basis, we have R(A) = The null space of A, N(A), is : Ax = 0). A matrix A is hermiuan if its conjugate transpose, A*, equals A. Jf A2 = A, then A is called a projector of C" onto R(A). Recall that rank (A) = TI(A) if A2 = A. If A2 = A and A = A *, then A is called an orthogonal projector. If A, then [A,B] = AB — BA. The inner product between two vectors u, we C" is denoted by (u, v). If $" X
INTRODUCTION AND OTHER PRELIMINARIES
3
= {ueC" :(u,v) = 0 for every VE.Y}. The smallest subspace of C" containing .9' is denoted LS([f). Notice that = LS(.9'). Suppose now that M,N1, and N2 are subspaces of C". Then N1 + N2 = {u + v:ueN1 and veN2}, while AN1 = {Au :uEN1}. If M = N1 + N2 and N1 N2 = {O}, then M is called the direct sum of N1 and N2. In this case we write M = N1 + N2. If M = N1 + N2 and N1 J.. N2, that is (u,v) = 0 for every u€N1 and yeN2, then M is called the orthogonal sum of N1 and N2. This will be written M = N1 N2. If two vectors are orthogonal. their sum will frequently be written with a also. If C" = N1 + N2, then N1 and N2 are called complementary subs paces. Notice that C" = N1 Nt. The dimension of a subspace M is denoted dim M. One of the most basic facts used in this book is the next proposition. is a subset of C", then ,$f'-'-
Proposition 0.2.1
Suppose that AeCtm
Then R(A) =
Proof We will show that R(A)
N(A*)I and dim R(A) = dim Suppose that uER(A). Then there is a such that Av = u. If wEN(A*), (v,A*w) w) = then (u, w) = (As', = (v,O) =0. Thus UEN(A*)J. and R(A) N(A*)i. But dim R(A) = rank A = rank = m — dim N(A*) = dim N(A*)L.
•
Thus R(A)=N(A*).. A useful consequence of Proposition 1 is the 'star cancellation law'. Proposition 0.2.2 (Star cancellation law) Suppose that Then (i) A*AB MAC and only (ii) N(AtA) = N(A), (iii) R(A*A) = R(A*).
and
= AC. Also
Proof. (i) may be rewritten as A*A(B — C) =0 if and only if A(B — C)
=0.
Clearly (i) and (ii) are equivalent. To see that (ii) holds notice that by Proposition 1, N(M) = R(A)L and thus MAx =011 and only if Ax =0. To see (iii), note that using (ii) we have that R(A*A) = N(A*A)L = N(A)- =
R(A*). U
Propositions 1 and 2 are basic and will be used frequently in what
follows without comment. If M is a subspace of C", then we may define the orthogonal projector, = u if ueM and PMU = 0 M by Notice that = and that for any orthogonal projector, P. we have P= — It is frequently helpful to write a matrix A in block form, that is, express A as a matrix made up of matrices, A = If B is a second block matrix, then (AB)1, =
submatrices A.,, cations and additions. Example 0.2.1
and (A + B)1, =
+ B1, provided the
the correct size to permit the indicated multipli-
LetA=
112341 1
2
01
1
1
iJ
riol
andB= 10 OI.ThenA 10 OJ
4 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
can be written as A
written as
where A11
= [1], A12 = [2,3],
=
[i] where B1 = [1
0], B2
and B3 = [1,0].
=
B3
_1A11 Th US AB
LA21
A1311B11 1A11B1 + A12B2 + A13B3 A22 A23J1B2 1LA21B1+A22B2+A22B3 LB3J A12
If all the submatrices are from C' 'for a fixed r, then block matrices K
may be viewed as matrices over a ring. We shall not do so. This notation is especially useful when dealing with large matrices or matrices which have submatrices of a special type. There is a close connection between block matrices, invariant subspaces, and projections which we now wish to develop. then M is called Definition 0.2.1 If M is a subspace of and an invariant subspace of A Ef and only (lAM = (Au : ueM} c M. If M is an invariant subspace for both A and then M is called a reducing subspace
of A.
Invariant and reducing subspaces have a good characterization in terms of projectors.
Proposition 0.2.3
Let PM and M isa subspace of be the orthogonal projector onto M. Then (i) M is an invariant subspace for A (land only if PMAPM = APM. (ii) M is a reducing subspace for A (land only (fAPM = PMA. The proof of Proposition 3 is left to the next set of exercises. Knowledge of invariant subspaces is useful for it enables the introduction of blocks of Suppose
zeros into a matrix. Invariant subspaces also have obvious geometric importance in studying the effects of on
Proposition 0.2.4 Suppose that A E CN and that M is an invariant subspace of A of dimension r. Then there exists a unitary matrix U such that
(i) A=U*IIA
Al
If M is a reducing subspace of A, then (ii) A = U*I
L°
A 11
ii
1 IU where
A22j
' A 22
Then CN = M M1. Let Proof Let M be a subspace of orthonormal basis for M and be an orthonormal basis for
be an Then
INTRODUCTION AND OTHER PRELIMINARIES
5
Order the vectors in P so that P2 is an orthonormal basis for are listed first. those in Let U be a unitary transformation that maps the standard basis for onto the ordered basis fi. Then P = P1
M
r
—
Lo
—
0]'
I
12
*
LA21
(1)
A,2
r = dim M. Suppose now that M is an invariant subspace for A. Thus PMAPM = APM by Proposition 3. This is equivalent to where
E C'
X
UPMUUAUUPMU = U*AUU*PMU. Substituting (1) into this gives IA11 L0
0]
[A11
0
0JLA,1 0
Thus A21 =0 and part (i) of Proposition 4 follows. Part (ii) follows by substituting (1) into U*PMUU*AU = U*AUU*PMU. • If AECnXn, then R(A) is always an invariant subspace for A. If A is hermitian, then every invariant subspace is reducing. In particular. R(A) is reducing. If A = A*, then there exists a unitary marix U and an
Proposition 0.2.5
irnertible hermitian matrix A1 such that A =
1
U.
Proposition 5 is. of course, a special case of the fact that every hennitian matrix is unitarily equivalent to a diagonal matrix. Viewed in this manner. it is clear that a similar result holds if hermitian is replaced by normal where a matrix is called normal if A*A = AA*. We assume that the reader is already familiar with the fact that normal and hermitian matrices are unitarily equivalent to diagonal matrices. Our purpose here is to review some of the 'geometry' of invariant subspaces and to gain a facility with the manipulation of block matrices. Reducing subspaces are better to work with than invariant subspaces, has no reducing = n> 1, does have invariant subspaces
but reducing subspaces need not always exist. A X
subspaces. Every matrix in since it has eigenvectors. And if is a set of eigenvectors corresponding to a particular eigenvalue of A, then LS(S1') is an invariant subspace for A. We shall see later that unitary and hermitian matrices are often easier to work with. Thus it is helpful if a matrix can be written as a product of such factors. If there is such a decomposition. It is called the polar form. The name comes from the similarity between it and the polar form of a complex number z = re1° where reR and 1e101 = I.
Theorem 0.2.1
If
then there exists a unitary matrix U and
hermitian matrices B,C such that A = UB = CU.
6
GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
If A E cm
n, then one cannot hope to get quite as good an expression for A since A is not square. There are two ways that Theorem 1 can be extended. One is to replace U by what is called a partial isometry. This will be discussed in a later section. The other possibility is to replace B by a matrix like a hermitian matrix. By 'like a hermitian matrix' we are thinking of the block form given in Proposition 5. where m
Theorem 0.2.2
(Singular Value Decomposition) Suppose that A E Ctm 'C". Then there exist unitary matrices UeCM 'Cm and VeC" 'C", and an invertible
hermitian diagonal matrix D = Diag{a1, ... ,a,}, whose diagonal entries are the positive square roots of the eigenvalues of(A*A) repeated according
to multiplicity, such that A = u
0°]v.
The proofs of Theorems I and 2 are somewhat easier if done with the notation of generalized inverses. The proofs will be developed in the exercises following the section on partial isometrics. Two comments are in order. First, the matrix
] is not a square
matrix, although D is square. Secondly, the name 'Singular Value Decomposition' comes from the numbers fri, ... , a,) which are frequently referred to as the singular values of A. The notation of the functional calculus is convenient and will be used from time to time. If C = U Diag{A1, ... , for some unitary U and is a function defined on then we definef(C) = U Diag{f(21), ... This definition only makes sense if C is normal. If p(A) = aMA" + ... + a0, then p(C) as we have defined it here agrees with the standard definition of p(C) = aRC" + ... + a0!. For any A E C" 'Ca a(A) denotes the set of eigenvalues of A.
I
3.
Exercises
Prove that if P is hermitian, then P is a projector if and only if P3 = P2. 2. Prove that if M1 c M2 C" are invariant subspaces for A, then A is 1.
unitarily equivalent to a matrix of the form J 0
X X where X
LOOXJ
denotes a non-zero block. Generalize this to t invariant subspaces such that 3. If M2 c C" are reducing subspaces for A show that A is unitarily
Ix equivalent to a matrix in the form reducing subspaces such that M1
4. Prove Proposition 2.3.
0
01
0 X 0 1. Generalize this to t LOOXJ ...
M.
INTRODUCTION AND OTHER PRELIMINARIES X is an invariant subspace for A, then and M c 5. Prove that if A M1 is an invariant subspace for A*.
6. Suppose A = A*. Give necessary and sufficient conditions on A to guarantee that for every pair of reducing subspaces M1 , M2 of A that either M1 ±M2 or M1rM2
{O}.
7
1
The Moore—Penrose or generalized inverse
1.
Basic definitions
Equations of the form
Ax = b,AECMXA,XECA,b€Cm
(1)
X9, occur in many pure and applied problems. If A and is invertible, then the system of equations (1) is, in principle, easy to solve. The unique solution is x = A - 1b. If A is an arbitrary matrix in C"' ", then it becomes more difficult to solve (1). There may be none, one, or an infinite number of solutions depending on whether b€ R(A) and whether n-rank (A)> 0. One would like to be able to find a matrix (or matrices) C, such that solutions of (1) are of the form Cb. But if R(A), then (1) has no solution. This will eventually require us to modify our concept of what a solution of (1) is. However, as the applications will illustrate, this is not as unnatural as it sounds. But for now we retain the standard definition of solution. To motivate our first definition of the generalized inverse, consider the functional equation
(2)
wheref is a real-valued function with domain S". One procedure for solving (2) is to restrict the domain off to a smaller set is one to so that one. Then an inverse functionj' from R(J) to b°' is defined byj 1(y) = x
if xe9" andf(x) = y. Thus! '(y) is a solution of(2) for y€R(J). This is how the arcsec, arcsin, and other inverse functions are normally defined. The same procedure can be used in trying to solve equation (1). As usual, we let be the linear function from C" into C"' defined by = Ax for xeC". To make a one to one linear transformation it must be restricted to a subspace complementary to N(A). An obvious one is N(A)- = R(A*). This suggests the following definition of the generalized inverse.
Definition 1.1.1
Functional definition of the generalized inverse
If
THE MOORE—PENROSE OR GENERALIZED INVERSE
define the linear transformation 'X and AtX
=
by The matrix :Cm
9
=0 is denoted At
and is called the generali:ed inverse of A.
It is easy to check that AAtX = Oil xER(A)1 and AAtx = x if x€R(A). Similarly, AtAX = 0 if xEN(A)= and A'Ax = x if XER(A*) = R(At). Thus AAt is the orthogonal projector of Ctm onto R(A) while A'A is the orthogonal projector of onto R(A*) = R(At). This suggests a second definition of the generalized inverse due to E. H. Moore
If Definition 1.1.2 Moore definition of the generalized inverse AECrnXn. then the generali:ed inuerse of A is defined to be the unique matrix
such 1/tat
(a) AAt
=
PR(A).
and
(b) AtA=PR(A.).
Moore's definition was given in 1935 and then more or less forgotten. This is possibly due to the fact that it was not expressed in the form of Definition 2 but rather in a more cumbersome (no pun intended) notation. An algebraic form of Moore's definition was given in 1955 by Penrose who was apparently unaware of Moore's work.
Definition 1.1.3
Penrose definition of the generalized inverse At then is the unique matrix in CnXtm such that
(i)
If
AAtA=A,
(ii) AtAAt = At, (iii) (AAt)* = AAt, (iv) (AtA)* = AtA. The first important fact to be established is the equivalence of the definitions. Theorem 1.1.1 The functional, Moore and Penrose definitions of the generalized inverse are equivalent.
Proof We have already noted that if At satisfies Definition 1. then it satisfies equations (a) and (b). If a matrix At satisfies (a) and (b) then it immediately satisfies (iii) and (iv). Furthermore (i) follows from (a) by observing that AAtA = PR(A,A = A. (ii) will follow from (b) in a similar manner. Since Definition I was constructive and the A' it constructs satisfies (a), (b) and (i)—(iv), the question of existence in Definitions 2 and 3 is already taken care of. There are then two things remaining to be proven. One is that a solution of equations (i)—(iv) is a solution of(a) and (b). The second is that a solution of(a) and (b) or (i)—(iv) is unique. Suppose then that A' is a matrix satisfying (i)—(iv). Multiplying (ii) on the left by A gives (AAt)2 = (AAt). This and (iii) show that AAt is an orthogonal projector. We must show that it has range equal to the range
10 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
of A. Using (i) and the fact that R(BC) c R(B) for matrices B and C, we get R(A) = R(AAtA) c R(AAt) R(A), so that R(A) = R(AAt). as desired. The proof that AtA = is similar and is Thus = left to the reader as an exercise. One way to show uniqueness is to show that if At satisfies (a) and (b), or (i)—(iv), then it satisfies Definition 1. Suppose then that At is a matrix satisfying (i)—(iv), (a), and (b). If then by (a), AAtx =0. Thus by (ii) A'x = AtAAtX = A'O = 0. II xeR(A), then there exist yeR(A*) such that Ay = x. But AtX = AtAy = y. The last equality follows by observing that taking the adjoint of both sides of 1x. Thus (1) gives PR(At)A = A* so that R(A) c R(At). But y = At satisfies Definition I. U As this proof illustrates, equations (i) and (ii) are, in effect, cancellation laws. While we cannot say that AB = AC implies B = C, we can say that if AtAB = AtAC then AB = AC. This type of cancellation will frequently appear in proofs and the exercises. For obvious reasons, the generalized inverse is often referred to as the Moore—Penrose inverse. Note also that if A E C" "and A is invertible, then A -' = At so that the generalized inverse lives up to its name.
2.
Basic properties of the generalized inverse
Before proceeding to establish some of what is true about generalized inverses, the reader should be warned about certain things that are not true. While it is true that R(A*) = R(At), if At is the generalized inverse, condition (b) in Definition 2 cannot be replaced by AtA =
Example 1.2.1
o] = A' = XAX
?].SinceXA=AX= =
X satisfies AX =
and hence X #
and XA
At: Note that XA
=
But
and thus
X. If XA = PR( A.), AX = PR(A), and in addition XAX = X, then
X = At. The proof of this last statement is left to the exercises. In computations involving inverses one frequently uses (AB)' = B- 1A A and B are invertible. This fails to hold for generalized inverses even if AB = BA.
Fact 1.2.1 If as BtAt. Furthermore Example 1.2.2
then (AB)t is not necessarily the same is
not necessarily equal to (A2)t.
Ii
THE MOORE—PENROSE OR GENERALIZED INVERSE
A2
= A while At2 =
Thus (At)2A2 =
which is not a projection.
(A2)t.
Thus (At)2 Ways of calculating At will be given shortly. The generalized inverses in Examples 1 and 2 can be found directly from Definition 1 without too much difficulty. Examples 2 illustrates another way in which the properties of the generalized inverse differ from those of the inverse. If A is invertible, then
2Eo(A) if and only
If A
in Example 2, then
—
(7(A) = {1,0} while a(At) =
If A is similar to a matrix C, then A and C have the same eigenvalues. the same Jordan form, and the same characteristic polynomial. None of these are preserved by taking of the generalized inverse.
1 Example 1.2.2
Ii B= 10
0 1
Let A =
1
1
0 —2
2
L—i
ii
—11 Then A =
BJB'
where
1J
1
10 1 0] IIandJ=IO 0 0J.ThecharacteristicpolynomialofA
Liooi
Loo2J
and J is
ro o = Ji 0
A2 and 2— 2.
LO
and the characteristic polynomial of divisors 22,(2 — 1/2).
is
—
1/2)
0
o
0 1/2
with elementary
11
An easy computation gives At = 1/12
6
2—11 0
L—' —2
6
1J
— (1 — (1 + and hence a diagonal Jordan form. Thus, if A and C are similar, then about the only thing that one can always say about At and C is that they have the same rank. A type of inverse that behaves better with respect to similarity is discussed in Chapter VII. Since the generalized inverse does not have all the properties of the inverse, it becomes important to know what properties it does have and which identities it does satisfy. There are, of course, an arbitrarily large number of true statements about generalized inverses. The next theorem lists some of the more basic properties.
But At has characteristic polynomial 1(2 —
Theorem 1.2.1 (P1)
Then
Suppose that
(At)t=A 0
(P2) (At)* = (A*)t (P3) IfAeC,(AA)t = A'At where
ifA=0. = AAAt = AtAA* (P4) (P5) (A*A)t = AtA*t
= — 412 #0 and
=0
12
GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
At = (A*A)?A* = A*(AA*)? (P7) (UAV)t = v*A?u* where U, V are unitary matrices. (P6)
Proof We will discuss the properties in the order given. A look at Definition 2 and a moment's thought show that (P1) is true. We leave (P2) and (P3) to the exercises. (P4) follows by taking the adjoints of both A = (AAt)A and A = A(AtA). (P5), since it claims that something is a generalized inverse, can be checked by using one of the definitions. Definition 2 is the quickest. AtA*tA*A = At(A*tA*)A = A?(AAt)*A = AtAA'A = AtA = Similarly, A*AA?A*? = A*(AAt)A*t = = A*(AAt)*A*t = A*(A*tA*)A*t = (A*A*?)(A*A*?) = A*A*t = = A?A*? by Definition 2. (A*A)? = Thus = = (P6) follows from (P5). (P7) is left as an exercise, a
01t
IA
Proposition
1.2.1
•.
I
I
:
0
=I [0
Ló The proof of Proposition 1 is left as an exercise. As the proof of Theorem I illustrates, it is frequently helpful to know the ranges and null spaces of expressions involving At. For ease of reference we now list several of these basic geometric properties. (P8) and (P9) have been done already. The rest are left as an exercise. :AmJ
Theorem 1.2.2 If then (P8) R(A) = R(AAt) = R(AA*) (P9) R(At) = R(A*) = R(AtA) = (PlO) R(I — AA') = N(AAt) = = N(At) = R(A)(P11) R(I — AtA)= N(AtA) = N(A)= 3.
Computation of At
In learning any type of mathematics, the working out of examples is useful, if not essential. The calculation of At from A can be difficult, and will be discussed more fully later. For the present we will give two methods which will enable the reader to begin to calculate the generalized inverses of small matrices. The first method is worth knowing because using it should help give a feeling of what A' is. The method consists of 'constructing' At according to Definition 1.
Example 1.3.1
[1127 Let A = I 0
Ii
2
2 I . Then R(A*) is
0
1
spanned by
L'°' {
[i].
[o]. [o]}. A subset forming a basis of R(A*) is
THE MOORE—PENROSE OR GENERALIZED INVERSE
[?] =
=
roi
3
and At 4 =
I1
. I
We now
must calculate a basis for R(A)-- =
N(A*).
Lii
1
+ x4 [1/2] where
Solving the system A*x =0 we get that x = x3
—1
x3.x4eC. Then At
—11
=At
1/2
1/21
= O€C3. Combining
all
of this
01
0 3
3
—1
gives At 2
4
1/2
21 21 ri
At=I0
0 1
1
Li
1J
—1 1/2 =
1
0
0
1
0 0 0
01
oJ
[1
0
10
1
o
01
0 Olor
Lii
0
oJ
3
3
—1
—1
2
4
1/2
1,2
2
I
1
0
2
I
0
1
—l
The indicated inverse can always be taken since its columns form a basis for R(A) R(A)L and hence are a linearly independent set of four vectors in C4. Below is a formal statement of the method described in Example 1. Xfl Theorem 1.3. 1 Let AECTM have rank r. If {v1 , v2,... "r} is a basis for R(A*) and {w1 , w2, ... , is a basis for N(A*), then
Proof By using Definition I
I
I
A'[Av 1I••.I'Av,:w I
I
Furthermore,
I
I
I
I
I
I
1.1
it
is clear that
I
{Av1,Av2, ... ,Av,) must be a basis for R(A). Since R(A)1 =
N(A*), it follows that the matrix [Av1 ...
w1
...
non-singular. The desired result is now immediate. •
must be
13
14 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
The second method involves a formula which is sometimes useful. It
depends on the following fact:
Proposition
then there exists that A = BC and r = rank (A) = rank (B) = rank (C).
such
1.3.1
The proof of Proposition 1 is not difficult and is left as an exercise. It means that the next result can, in theory, always be used to calculate At. or (B*BYl if C or B have See Chapter 12 for a caution on taking small singular values. Theorem 1.3.2 If A = BC where r = rank (A) = rank (B) = rank (C), then At
Prool Notice that
= C(CC) '(BB) iB*.
and CC are rank r matrices in C' 'so that it X
makes sense to take their inverses. Let X = C*(CC*) *(B*B) IB*. We will show X satisfies Definition 3. This choice is made on the grounds that the more complicated an expression is, the more difficult it becomes geometrically to work with it, and Definition 3 is algebraic. Now AX = BCC*(CC*) i(B*B) 1B* = B(BB) 'B,so(AX) = AX. Also XA = C*(CC*) IB*BC = C*(CC*) 1C, so (XA)* = XA. Thus (iii) and (iv) hold. To check (1) and (ii) use XA = C*(CC*) 'C to get that A(XA) = BC(C*(CC*)... 'C) = BC = A. And (XA)X = C(CC) 1CC*(CC*)i x (B*B) 'B = C*(CC*) '(BB) IB* = X. Thus X = At by Definition 3.U
Example 1.3.2
1.A=BC where
Let
BeC2'" and CeC1 x3 In fact, a little thought shows that A
Then B*B=[5],CC*=[6].ThusAt= Ill
1
2].
2
LzJ 1
is typical as the next result shows.
Theorem 1.3.3
If
and rank(A) = 1, then At = !A* where
The proof is left to the exercises. The method of computing At described in Example 1.3.1 and the method of Theorem 1.3.2 may both be executed by reducing A by elementary row operations.
Definition 1.3.1 A matrix echelon form .fE is of the form
which has rank r is said to be in row
(1) (m — r) X
THE MOORE—PENROSE OR GENERALIZED INVERSE
where the elements c, of C (
15
= C,. satisfy the following conditions,
(1)
(ii) The first non-zero entry in each row of C is 1. (iii) If = 1 is the first non-zero entry of the ith row, then the jth column of C is the unit vector e1 whose only non-zero entry is in the ith position.
For example, the matrix
120 —2 3501 400 E= 0 0 0 0 0 000 0000 000 0000 00
1
1
3
(2)
is in row echelon form. Below we state some facts about the row echelon form, the proofs of which may be found in [65]. For A e CTM "such that rank (A) = r:
(El) A can always be row reduced to row echelon form by elementary row operations (i.e. there always exists a non-singular matrix P€C" such that PA = EA where EA is in row echelon form). (E2) For a given A, the row echelon form EA obtained by row reducing A is unique. (E3) If Eq is the row echelon form for A and the unit vectors in EA appear in columns i2,... , and i,, then the corresponding columns of A are a basis for R(A). This particular basis is called the set of distinguished columns of A. The remaining columns are called the undistinguished columns of A. (For example, if A is a matrix such that its row echelon form is given by (2) then the first, third, and sixth columns of A are the distinguished columns. (E4) If EA is the row echelon form (1) for A, then N(A) = = N(C). (ES) If (1) is the row echelon form for A, and if the matrix made up of the distinguished columns of A (in the same order as they are in A), then A = BC where C is obtained from the row echelon form. This is a full rank factorization such as was described in Proposition 1.
Very closely related to the row echelon form is the hermite echelon form. However, the hermite echelon form is defined only for square matrices.
Definition 1.3.2
tf its elements
A matrix
is said to be in hermite echelon form satisfies the following conditions.
(i) H is upper triangular (i.e. =0 when i (ii) is either 0 or 1. (iii) If h,1 =0, then h, = Ofor every k, 1 k (iv) If h1, = 1, then hkg = Ofor every k # I.
>j). n.
16
GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
For example, the matrix
120 0000 3501 000 —2 4 0 0 00 H=000 0000 000 0000 000 0013 000 0000 1
is in hermite form. Below are some facts about the hermite form, the proofs of which may be found in [65). For can always be row reduced to a hermite form. If A is reduced to its row echelon form, then a permutation of the rows can always be performed to obtain a hermite form. (H2) For a given matrix A the hermite form HA obtained by row reducing A is unique. (H3) = HA (i.e. HA is a projection). (H4) N(A) = N(HA) = R(I — HA) and a basis for N(A) is the set of non-zero columns of I — HA. A
We can now present the methods of Theorems 1 and 2 as algorithms.
Algorithm 1.3.1
To obtain the generalized inverse of a square matrix
Ae CA
Row reduce A* to its hermite form HA,. (II) Select the distinguished columns of A*. Label these columns v, and place them as columns in a matrix L. '1' (III) Form the matrix AL. (IV) Form I — HA. and select the non-zero columns from this matrix. Label these columns w1 ,w2,... , (V) Place the columns of AL and the w1's as columns in a matrix (I)
M= rows of M -
and compute M '.(Actually only the first r l are
needed.)
(VI) Place the first r rows of M '(in the same order as they appear in M ')in a matrix called R. (VII) Compute At as At = LR. Although Algorithm 1 is stated for square matrices, it is easy to use it for non-square matrices. Add zero rows or zero columns to construct a square
matrix and use the fact that Algorithm 1.3.2 inverse for any
=
= [AtjO*].
To obtain the full rank factorization and the generalized
THE MOORE—PENROSE OR GENERALIZED INVERSE 17
(I) Reduce A to row echelon form EA.
(II) Select the distinguished columns of A and place them as the columns in a matrix B in the same order as they appear in A. (III) Select the non-zero rows from EA and place them as rows in a matrix C in the same order as they appear in EA. (IV) Compute (CC*yl and (B*B)'. (V) Compute A' as At = C*(CC*) l(B*BY We will use Algorithm 1 to find At where
Example 1.3.3
ri A
4 6
2
1
—10 0
1
1
0 0
1
40
12 Lo
(I) Using elementary row operations on A* we get that its hermite echelon form is
10
10
00
01
H
(H) The first, second and fourth columns of A* are distinguished. Thus
ft 2 0 12 4 0 L=11 0
0
6
1
(III) Then
AL=
22 34
34 56
5
6
41 61
461 1
(IV) The non-zero column oil — HA. is = [— 1, 1/2. 1,0]*. (V) Putting AL and w1 into a matrix M and then computing M' gives r22
34
M— 134
56 6
J5
4 6
—
40
—20
19
14
1
46 —4 —44
0
40
20
1
1/2
1
61
(VI) The first three rows of M ' give R as 40 i 1
—20 14
50 —26
[—46—4 -44
—90 18
342
50 —26
40
—
901
18!
oJ
18 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
(VII) Thus
4 —1 —27
1
2 20 0
—
Example 1.3.4
A=
8
—10 0
—2 —54 25
—45
0
45
We will now use Algorithm 2 to fmd At where
12141 24066 0 24066 1
2
3
3
(I) Using elementary row operations we reduce A to its row echelon form
Ii
2
0
3
31
10
0
1
1
—21
EA=10
00
0
01
L0000
oJ
(II) The first and third columns are distinguished. Thus
B=[? 1]. (III) The matrix C is made up of the non-zero rows of EA so that
20 Lo 0
(IV) Now CC*
1
3
3
1
—2
[23
B*B =[1?
—
Calculating
=
we get
(V) Substituting the results of steps (II), (III) and (IV) into the formula
for At
At = C*(CC*) l(B*B) 1B*
=1
27 54 207 288 —333
6
3
6
12
6
12
—40 —20 —40 —22 —11 —22 98
49
98
THE MOORE—PENROSE OR GENERALIZED INVERSE
19
Theorem 2 is a good illustration of one difficulty with learning from a
text in this area. Often the hard part is to come up with the right formula. To verify it is easy. This is not an uncommon phenomenon. In differential equations it is frequently easy to verify that a given function is a solution. The hard part is to show one exists and then to find it. In the study of generalized inverses, existence is usually taken care of early. There remains then the problem of finding the right formula. For these reasons, we urge the reader to try and derive his own theorems as we go. For example, can you come up with an alternative formula to that of Theorem 2? The resulting formula should, of course, only involve At on one side. Ideally, Then ask yourseff, can I do better by it would not even involve B' and imposing special conditions on B and C or A? Under what conditions does the formula simplify? The reader who approaches each problem, theorem and exercise in this manner will not only learn the material better, but will be a better mathematician for it. 4.
Generalized inverse of a product
As pointed out in Section 2, one of the major shortcomings of the
Moore—Penrose inverse is that the 'reverse order law' does not always hold, that is, (AB)' is not always BtAt. This immediately suggests two questions. What is (AB)'? When does (AB)t = BtAt? The question, 'What is (AB)'?' has a lot of useless, or non-answer, answers. For example, (AB)' = is a non-answer. It merely restates condition (ii) of the Penrose definition of (AB)t. The decision as to whether or not an answer is an answer is subjective and comes with experience. Even then, professional mathematicians may differ on how good an answer is depending on how they happen to view the problem and mathematics. The authors feel that a really good answer to the question, 'What is (AB)'?' does not, and probably will not exist. However, an answer should:
(A) have some sort of intuitive justification if possible; (B) suggest at least a partial answer to the other question, 'When does
= B'A'?' Theorem 4.1 is, to our knowledge, the best answer available. We shall now attack the problem of determining a formula for (AB)'. The first problem is to come up with a theorem to prove. One way to come up with a conjecture would be to perform algebraic manipulations on (AB)' using the Penrose conditions. Another, and the one we now follow, is to draw a picture and make an educated guess. If that guess does not work, then make another. Figure 1.1 is, in a sense, not very realistic. However, the authors find it a convenient way to visualize the actions of linear transformations. The vertical lines stand for CTM, and C. A sub-interval is a subspace. The rest of the interval is a (possibly orthogonal) complementary subspace. A
20 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS C
1'
— C
C,,
---—=A '45R (5) Fig. 1.1
shaded band represents the one to one mapping of one subspace onto Xfl another. It is assumed that A€C'" and BEC" ". In the figure: (a, c) =
R(B*), (a', c') = R(B), (b', d') = R(A*), and (b", d") = R(A). The total shaded band from C" to C" represents the action of B (or The part that is shaded more darkly represents PR(A.)B = AtAB. The total shaded area
from CN to C'" is A. The darker portion is APR(B) = ABBt. The one to one from C" to C'" may be viewed as the v-shaped portion of the mapping dark band from C" to C'". Thus to 'undo' AB, one would trace the dark band backwards. That is, (AB)t = (PRA.B)t(APRB)t. There are only two things wrong with this conjecture. For one,
AB = A(AtA)(BBt)B
(1)
and not AB = A(BBt)(A?A)B. Secondly, the drawing lies a bit in that the interval (b', c') is actually standing for two slightly skewed subspaces and not just one. However, the factorization (I) does not help and Fig. 1.1 does seem to portray what is happening. We are led then to try and prove the following theorem due to Cline and Greville.
Theorem 1.4.1
IfA€C'"
X
and BEC"
X
then (AB)t = (PR(A. )B)(APR(B)).
Proof Assuming that one has stumbled across this formula and is wondering if it is correct, the most reasonable thing to do is see if the formula for (AB)t satisfies any of the definitions of the generalized inverse. Let X = (PR( A.)B)'(APR(B))t = (AtAB)t(ABBt)t. We will proceed as follows. We will assume that X satisfies condition (i) of the Penrose definition. We will then try and manipulate condition (1) to get a formula that can be verified independently of condition (1). If our manipulations are reversible, we will have shown condition (i) holds. Suppose then that ABXAB = AB, or equivalently, AB(AtAB)?(ABBt)tAB = AB.
(2)
Multiply (2) on the left by At and on the right by Bt. Then (2) becomes A?AB(AtAB)t(ABBt)tABBt = AtABBt,
(3)
THE MOORE—PENROSE OR GENERALIZED INVERSE
or,
21
A.PR(B1• Equivalently,
BVA' ) =
(4)
= To see that (4) is true, we will show that if E1
=
R(BPBB' R(A),E2 =
(5)
Suppose first that u€R(B)-. Then E1u = E2u =0. Suppose then that ueR(B). Now to find E1u we will need to But BBtR(A*) is a subspace of R(B). Let u = u1 calculate R(A*) and u2e[BBtR(A*)]1 n R(B). Then where u1 then E1u = E2u for all
=
R(BPBB'
=
R(B)U1 =
(6)
AtAu1.
(7)
Equality (7) follows since u1 ER(B) and the projection of R(B) onto AtAR(B) is accomplished by A'A. Now E2u = PR(A.)PR(B)u = PR(A.)u = AtAii, SO = A'Au1, that is, if U2EN(A) = (4) will now follow provided that R(A*)J.. Suppose then that vER(A*). By definition, u2 BBty. Thus o = (BBtv,u2) = (V,BBtU2) = (v,u2). Hence u2eR(A*)I. as desired. (4) is now established. But (4) is equivalent to (3). Multiply (3) on the left by A and the right by B. This gives (2). Because of the particular nature of X it turns out to be easier to use ABXAB = AB to show that X satisfies the Moore definition Of(AB)t. Multiply on the right by (AB)t. Then we have But N(X) c N((ABBt)t) = R(ABBt)1 = R(AB). ABXPR(AB) = Thus XPR(As)
=
and hence
ABX=PR(AB).
(8)
)XAB = Now multiply ABXAB = AB on the left by (AB)t to get B•A• But R(X) c R((AtAB)t) = R((AtAB)*) = R(B*A*A*t) = R(B*A*) and hence BA X = X. Thus XAB=PR((AB).).
(9)
Equations (8) and (9) show that Theorem 1 holds by the Moore definition.
U Comment It is possible that some readers might have some misgivings about equality (7). An easy way to see that R(B)U1 = AtAu1 is as EAtAR(B). But follows. Suppose that u1 ER(B). Then A(u1 — AtAu1)= Au1 — Au1 = 0. Thus u1 — AtAu1eR(A*)1 c(AfAR(B))L. Hence u1 = AtAU1 (u1 — Thus PR(AAB)u3 = A'Au1. The formula for simplifies if either = I or = I.
Corollary 1.4.1 Suppose that AECTM "and BeC" P (i) If rank (A) = n, then (AB)' = Bt(APR(a,)t. )B)A. (ii) If rank (B) = n, then (AB)' =
22 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Part (i) (or ii) of Corollary 1 is, of course, valid if A (or B) is invertible.
and rank(A)= rank(B) = n. Then (AB)t = BtAt and At = A*(AA*)l while Bt = (B*B) 1B*. The formulas for At and Bt in Corollary 1 come from Theorem 3.2 and B In fact Corollary 1 can be and the factoring of A = derived from Theorem 3.2 also. The assumptions of Corollary 2 are much more stringent than are necessary to guarantee that (AB)t = BtAt.
Corollary 1.4.2
Example 1.4.1
Suppose that
LetA=
10101 10
0
1a001
ii andB=
10
andBt=
01 where a,b,ceC.
LOOd
L000J 10001 0 01
b
10
bt
cland(AB)t= Ibt 0 0l.Ontheotherhand ctOJ L000J Lo 0
10001
BtAt= Ibt 0 LO
sothat(AB)t=BtAt.NoticethatBA=
ctOJ
IOaOl 10 0 bI. L000J
By varying the values of a,b and cone can see that (AB)t = B?At
possible without
(i)AB=BA (a#b) (ii) rank(A)=rank(B) (iii)
(a=b=0,c=1)
rank(AB)=rank(BA) (a=b#0,c=0).
The list can be continued, but the point has been made. The question
remains, when is (AB)t = BtAt? Consider Example 1 again. What was it about that A and B that made (AB)t = BtAt? The only thing that A and B seem to have in common are some invariant subspaces. The subspaces R(A), R(A), N(A), and N(A) are all invariant for both A and B. A statement about invariant subspaces is also a statement about projectors. A possible method of attack has suggested itself. We will assume that = BtAt. From this we will try to derive statements about projectors. In Example 1, MA and B were simultaneously diagonalizabk so we should see if MA works in. Finally, we should check to see if our conditions are necessary as well as sufficient. Assume then that and
= BtAt.
(10)
Theorem 1 gives another formula for (AB)'. Substitute that into (10) to get (AtAB)t(ABBt)t = B?At. To change this into a projector equation,
THE MOORE—PENROSE OR GENERALIZED INVERSE
23
multiply on the left by (A'AB)and on the right by (ABBt), to give
PR(A'AB) P
—'P R(A') PR(B)1
11
By equation (4), (11) can be rewritten as
and
=
hence is a projector. But the product of two hermitian projectors is a projector if and only if the two hermitian projectors commute. Thus (recall that [X, Y] = XY — YX) = 0, or equivalently, [A'A,BB'] = 0, If(AB)t = B?At. (12)
Is (12) enough to guarantee (10)? We continue to see if we can get any additional conditions. Example 1 suggested that an AA term might be useful. If(10) holds, then ABBtAI is hermitian. But then A*(ABBtAt)A is hermitian. Thus A*(ABBt)A?A = AtABBtA*A. (13) Using (12) and the fact that or
= A5, (13) becomes
=
{A*A,BBtJ = 0.
(14)
Condition (14) is a stronger condition than (12) since it involves AA and not JUSt
AA )•
In Theorem 1 there was a certain symmetry in the formula. It seems unreasonable that in conditions for (10) that either A or B should be more important. We return to equation (10) to try and derive a formula like (14) but with the roles of A and B 'reversed'. Now BtAtAB is hermitian is hermitian. Proceedsince we are assuming (10) holds. Thus ing as before, we get that
[BB*,AtA]=O.
(15)
Continued manipulation fails to produce any conditions not implied by
(14) and (15). We are led then to attempt to prove the following theorem. Theorem 1.4.2
Suppose that A e C'" statements are equivalent:
Xfl
and BE C"
X
Then the following
(AB)' = BtAt. (ii) BB*AtA and A*ABBt are hermitian. (iii) R(A5) is an invariant subspace for BB5 and R(B) is an invariant (1)
subs pace of A5A.
= 0 and =0. AtABBSAS (v) = BB5A5 and BB'A5AB = A5AB.
(iv)
Proof Statement (iii) is equivalent to equations (14) and (15) so that (i) implies (ii). We will first show that (ii)—(v) are all equivalent. Since BBS and A5A are hermitian, all of their invariant subspaces are reducing. Thus (ii) and (iii) are equivalent. Now observe that if C is a matrix and M a
24 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
= (I — = then + PMCPtI. This + is an invariant subspace if and only if =0. Thus says that (iii) and (iv) are equivalent. Since (v) is written algebraically, we will show it is equivalent to (ii). Assume then that BB*AtA is hermitian. Then BB*AtA = AtABB*. Thus BB*(AtA)A* = A?ABB*A*, or subspace,
BB*A* = AtABB*A*.
(16)
Similarly if A*ABBt is hermitian. then BBtA*AB = A*AB
(17)
so that (ii) implies (v). Assume now that (16) and (17) hold. Multiply (17) on the right by Bt and (16) on the right by A*t. The new equations are precisely statement (iii) which is equivalent to (ii). Thus (ii)—(v) are all equivalent. Suppose now that (ii)—(v) hold. We want to prove (1). Observe that BtAt = Bt(BBt)(AtA)At = Bt(AtA)(BBt)At, while by Theorem I (AB)t = Theorem 2 will be proven if we can show that
(AtAB)t=Bt(AtA) and
(18)
(ABBt)t = BBtAt.
(19)
To see 11(18) holds, check the Penrose equations. Let X = Bt(AtA). Then AtABX = AtABB?AtA = AtAAtABBt = A?ABBt = A'ABB') = Thus Penrose conditions (i) and (iii) are satisfied. Now X(AtAB) = Bt(AtA)(AtA)B = Bt(AtA)B. Thus X(AtAB)X = Bt(AtA)BB?(AtA) = BtAtA = X and Penrose condition (ii) is satisfied. There remains then only to show that X(A'AB) is hermitian. But A?ABB* is hermitian by assumption (ii). Thus B?(A?ABB*)Bt* is hermitian and Bt(A?A)BB*B*t = Bt(AtA)B = X(A'AB) as desired. The proof of (19) is similar and left to
the exercises. S It is worth noticing that conditions (ii)—(v) of Theorem 1.4.2 make Fig. 1.1 correct. The interval (b', c') would stand for one subspace R(AtABB?) rather than two skewed subspaces, R(AtABBt) and R(BBtAtA). A reader might think that perhaps there is a weaker appearing set of conditions than (ii)—(v) that would imply (ABt) = BtAt. The next Example shows that even with relatively simple matrices the full statement of the conditions is needed. 10 0 01 ri 0 0] Example 1.4.2 Let A = 10 1 and B = 10 1 0 I. Then B = B* = Lo i oJ Lo 0 OJ 1
10
00 ] 100
Bt=B2andAt=IO 11 Lo
i]'I=Io 0
Li oJ J
[0 i
01
1I.N0wBB*AtAis
—ij
hermitian so that [BB*, AtA] = 0 and [BBs, AtA] =0. However,
10001
A*ABBt = 10 2 LO
I
which is not hermitian so that (AB)t # B'At. 0J
THE MOORE—PENROSE OR GENERALIZED INVERSE
25
An easy to verify condition that implies (AB)t = BtAt is the following.
Corollary 1.4.3 IJA*ABB* = BB*A*A, then (AB)t = B'A'. The proof of Corollary 2 is left to the exercises. Corollary 2 has an advantage over conditions (ii)—(iv) of Theorem 2 in that one does not have to calculate At. Bt, or any projectors to verify it. It has the disadvantage that it is only sufficient and not necessary. Notice that the A, B in Example I satisfy [AtA, BB*] =0 while those in Example 2 do not. There is another approach to the problem of determining when (AB)t = BtAt. It is to try and define a different kind of inverse of A, call it, A so B A . This approach will not be discussed. that (AB)
5. 1.
Exercises Each of the following is an alternative set of equations whose unique solution X is the generalized inverse of A. For each definition show that it is equivalent to one of the three given in the text.
(a) AX = PR(A),MX ) = N(A). (b) AX = PR(A),XA = PR(A.),XAX = X. (c) XAA* =
d
A*,XX*A* = x.
XAx—J"
if
10 ifxeN(A*).
(e) XA = PR(A),N(X) = N(A*). Comment: Many of these have appeared as definitions or theorems in the
literature. Notice the connection between Example 2.1, Exercise 1(b), and Exercise 3 below. 2. Derive a set of conditions equivalent to those given in Definition 1.2 or Definition 1.3. Show they are equivalent. Can you derive others? 3. Suppose that A€Cm Xfl Prove that a matrix X satisfies AX = XA = if and only if X satisfies (i), (iii), and (iv) of Definition 1.3. Such an X is called a (1,3,4)-inverse of A and will be discussed later. Observe that it cannot be unique if it is not At since trivially At is also a (1,3,4)-inverse.
4. Calculate At from Theorem 3.1 when A
and when
=
Hint: For the second matrix see Example 6. 5. Show that if rank (A) =
1,
then At = !A* where k = 1=1 j= 1
6.
If and rank (A) = n, notice that in Theorem 3.2, C may be chosen as an especially simple invertible matrix. Derive the formula
26 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
for At under the assumption that rank A = n. Do the same for the case
when rank (A) = 0 f 0 0 7. Let A = 1
I
0
L—1
1
0
21 I. Cakulate At from Definition 1.1. 1
—1 0]
8. Verify that (At)* = (A*)t. 9. Verify that (AA)t = AtAt, AEC, where ,V = 10. If AeCTM
Xiv
1
if A
0 and 0' =0.
and U, V are unitary matrices, verify that (UAV)t =
V*AtU*. 11. Derive an explicit formula for A' that involves no (t)'s by using the singular value decomposition.
*12. Verify that At =
*13. Verify the At =
I 2iri
f !(zI z
— A*A) IA*dz where C is a closed contour
containing the non-zero eigenvalues of A*A, but not containing the zero eigenvalue of A*A in or on it.
14. Prove Proposition 1.1. Exercises 15—21 are all drawn from the literature. They were originally done by Schwerdtfeger, Baskett and Katz, Greville, and Erdelyi. Some follow almost immediately from Theorem 4.1. Others require more work.
such that rank A = rank B and if the eigenvectors corresponding to non-zero elgenvalues of the two matrices ASA and BB* span the same space, then (AB)' = B'A'. and AAt = AtA,BBt = BtB,(AB)tAB= 16. Prove that if AEC AB(AB)', and rank A = rank B = rank(AB), then (AB)' = BtAt. Be 17. Assume that AECTM Show that the following statements are equivalent. 15. Prove that if AECrnXv and
X
(i) (AB)t = B'A'. (ii) A?ABB*A*ABBt = BB*A*A. (iii) A'AB = B(AB)'AB and BBtA* = A*AB(AB)t. (iv) (AtABBt)* = (AtABBt)t and the two matrices ABBtAt and UtAtAB are both hermitian.
18. Show that if [A, = 0, [At, (Bk] =0, [B, =0, and [Bt,PR(A,)] = 0, then (AB)' = 19. Prove that if A* = At and B* = Bt and if any of the conditions of Exercise 18 hold, then (AB)' = BtAt = B*A*. 20. Prove that if A* = A' and the third and fourth conditions of Exercise 18 hold, then (AB)' = BtAt = BtA*. 21. Prove that if B* = B' and the first and second conditions of Exercise 18 hold, then (AB)t = BtAt. 22. Prove that if [A*A, BB*] =0, then (AB)t = B'A'.
THE MOORE—PENROSE OR GENERALIZED INVERSE
27
Verify that the product of two hermitian matrices A and B is hermitian if and only if [A, B] =0. 24. Suppose that P. Q are hermitian projectors. Show that PQ is a projector if and only if [P, Q] =0. = 25. Assuming (ii)—(v) of Theorem 8, show that (APR(B))t = BtAt. PR(B)A without directly using the fact that (AS)' = 26. Write an expression for (PAQ)t when P and Q are non-singular. 27. Derive necessary and sufficient conditions on P and A, P non-singular, 23.
for (P 'AP)' to equal P 'A'P. X
28. Prove that if Ae Br and the entries of A are rational numbers, then the entries of At are rational.
2
Least squares solutions
1.
What kind of answer is Atb?
At this point the reader should have gained a certain facility in working with generalized inverses, and it is time to find out what kind of solutions they give. Before proceeding we need a simple geometric lemma. Recall \1/2 if w = [w1, ... , w,3*GCP, then l( w = E 1w112) = (w*w)U2 that denotes the Euclidean norm of w.
Lemma 2.1.1
Ifu,veC"and(u,v)=O,thenhlu+v112=11u112+11v112.
Proof Suppose that
and (u,v) =0. Then
llu+v112 =(u+v,u+v)=(u,u)+(v,u)+(u,v)+(v,v)= hull2 + 11,112. Now consider again the problem of finding solutions u to (1)
If(1) is inconsistent, one could still look for u that makes Au — as possible.
b
as small
Definition 2.1.1
Suppose that and b€Cm. Then a vector is called a least squares solution to Ax = b if II Au — b Av — b for all veC'. A vector u is called a minimal least squares solution to Ax = b tf u is a least squares solution to Ax = b and liii < w for all other least squares solutions w.
The name 'least squares' comes from the definition of the Euclidean
norm as the square root of a sum of squares. If beR(A), then the notions of solution and least squares solution obviously coincide. The next theorem speaks for itself. Theorem 2.1.1
Suppose that AeCtm'" and beCTM. Then Atb is the
minima! least squares solution to Ax = b.
LEAST SQUARES SOLUTIONS 29
Proof Notice that IIAx—b112 = II(Ax
(I -AA')b112
—
= IlAx —
+ 11(1 —AAt)b112.
Thus x will be a least squares solution if and only if x is a solution of the consistent system Ax = AAtb. But solutions of Ax = AAtb are of the form
AtA)h = — A'A)h. Since 11x112 = IIAtbII2 we see that there is exactly one minimal least squares solution x = Atb. As a special case of Theorem 2.1.1, we have the usual description of an orthogonal projection.
x=
—
Corollary 2.1.1
Suppose that M is a subs pace of
•
and
is the
orthogonal projector of onto M. If beCI*, then PMb is the unique closest vector in M to b with respect to the Euclidean norm.
In some applications, the minimality of the norm of a least squares solution is important, in others it is not. If the minimality is not important, then the next theorem can be very useful.
Theorem 2.1.2 Suppose that AECTM "and beCTM. Then the following statements are equivalent (1) u is a least squares solution of Ax = b,
(ii) u is a solution of Ax = AAtb, (iii) ii is a solution of A*Ax = A*b. (iv) u is of the form A'b + h where heN(A).
Proof We know from the proof of Theorem 1 that (i), (ii) and (iv) are equivalent. If (1) holds, then multiplying Au = b on the left by A* gives (iii). On the other hand, multiplying A*Au = A*b on the left by A*t gives Au = AA'b. Thus (iii) implies (ii). U Notice that the system of equations in statement (iii) of Theorem 2 does not involve At and is a consistent system of equations. They are called the normal equations and play an important role in certain areas of statistics.
It was pointed out during the introduction to this section that if X satisfies AXA = A, and be R(A), then Xb is a solution to (1). Thus, for consistent systems a weaker type of inverse than the Moore—Penrose would suffice. However, if then the condition AXA = A is not enough to guarantee that Xb is a least squares solution.
Fact There exist matrices X, A and vector b, R(A), such that AXA = A but Xb is not a least squares solution of Ax = b. Example 2.1.1
If X satisfies AXA = A, then X is of the
Let A
= form 1
Lx21
squares
1. Let b =
X22J
1 1. Then by Theorem 2 a vector u isa least L1J
solution to Ax = b if and only if Ax = b1 where b1
least squares solution, then IIAu — bil =
11b1 —
bli
=
1.
r1i
If u is a
= Loi [1 +2X1 2].
But A(Xb) =
30 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
= (1 +41 x12 12)1/2. If x22 0 then A(Xb) — b > I and Thus A(Xb) — Xb will not be a least squares solution Ax = b.
Example 2 also points out that one can get least squares solutions of the form Xb where X is not At. Exactly what conditions need to be put on X to guarantee Xb is a least squares solution to Ax = b will be discussed in Chapter 6.
2.
Fitting a linear hypothesis
Consider the law of gravitation which says that the force of attraction y
between two unit-mass points is inversely proportional to the square of the distance d between the points. If x = l/d2. then the mathematical formulation of the relationship between y and x isy =f(x) = /ix where fi is an unknown constant. Because the functionf is a linear function, we consider this to be a linear functional relationship between x and y. Many such relationships exist. The physical sciences abound with them. Suppose that an experiment is conducted in which a distance d0 between two unit masses is set and the force of attraction y0 between them is measured. A value for the constant fi is then obtained as fi = y0/x0 = However, if the experiment is conducted a second time, one should not be greatly surprised if a slightly different value of fi is obtained. Thus, for the purposes of estimating the value of fi, it is more realistic to say that for each fixed value of x, we expect the observed values yj of yto satisfy an equation of the form y1 = fix + e1 where ej is a measurement error which occurs more or less at random. Furthermore, if continued observations of y were made at a fixed value for x, it is natural to expect that the errors would average out to zero in the long run. Aside from measurement errors, there may be another reason why different observations of y might give rise to different values of/i. The force of attraction may vary with unknown quantities other than distance (e.g. the speed of the frame of reference with respect to the speed of light). That is, the true functional relationship may be y = fix + g(u1 , u2, ... ,UN) where the function g is unknown. Here again, it may not be unreasonable to expect that at each fixed value of x, the function g assumes values more or less at random and which average out to zero in the long run. This second type of error will be called functional error. Many times, especially in the physical sciences, the functional relationship between the quantities in question is beyond reproach so that measurement error is the major consideration. However, in areas such as economics, agriculture, and the social sciences the relationships which exist are much more subtle and one must deal with both types of error. The above remarks lead us to the following definition.
Definition 2.2.1 When we hypothesize that y is related linearly to x1 ,x2, ... ,xN, we are hypothesizing that for each set of values p1 = (x11, x12,... , x1)for x1, x2,... ,x,,, the observations y1for y atp1 can be expressed where(i)/i0,fi1,...,fiare asy,=fi0+fi1x11 +fi2x12 + ...
LEAST SQUARES SOLUTIONS
31
unknown constants (called parameters). (ii) e,1 is a value assumed by an unknown real valued function e, such that e, has the property that the values which it assumes will 'average out' to zero over all possible observat ions y at p..
That is, when we hypothesize that y is related linearly to x3 , x2, ...
,;,
are hypothesizing that for each point p = (x1, , the 'expected ... , value', E(,y.), of the observation y. at (that is, the 'average observation' at we
p1) satisfies the equation
=
+
+
+ ... +
and not that y1 does. This can be easily pictured in the case when only two
variables are involved. Suppose we hypothesize that y is related linearly to the single variable x. This means that we are assuming the existence of a linef(x) = II,,, + such that each point (x1, E(y.)) lies on this straight line. See Fig. 21.
In the case when there are n independent variables, we would be hypothesizing the existence of a surface in (which is the translate of a subspace) which passes through the points (p1. E(y.)). We shall refer to such a surface as a flat. In actual practice, the values E(y.) are virtually impossible to obtain exactly. Nevertheless, we will see in the next section that it is often possible to obtain good estimates for the unknown parameters, and therefore produce good estimates for the E(y.)'s while also producing a reasonable facsimile of the hypothesized line of flat. The statistically knowledgeable reader will by now have observed that we have avoided, as much as possible, introducing the statistical concepts which usually accompany this type of problem. Instead, we have introduced vague terms such as 'average out'. Admittedly, these terms being incorporated in a definition would (and should) make a good mathematician uncomfortable. However, our purpose in this section is to examine just the basic aspects of fitting a linear hypothesis without introducing statistical
The set of oil
E(.Vm)
possible
at x,,, set of all possible observations
4The set of oil possible observations at x1
A'
Fig. 2.1
32 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
concepts. For some applications, the methods of this section may be
sufficient. A rigorous treatment of linear estimation appears in Chapter 6. In the next two sections we will be concerned with the following two basic problems.
(I) Having hypothesized a linear relationship, y, = + + ... + for the unknown parameters P1. 1=0,1,... find estimates , n. + e,, (II) Having obtained estimates for the p1's, develop a criterion to help decide to what degree the functionf(x1 , x2, ... , = Po + + ... + 'models' the situation under question. 3.
Estimating the unknown parameters
We will be interested in two different types of hypotheses.
Definition 2.3.1
When the term
is present in the expression (1)
we shall refer to (1) as an intercept hypothesis. When does not appear, we will call (1) a no intercept hypothesis. Suppose that we have hypothesized that y is related linearly to by the no intercept hypothesis x1 , x2,...
,;
y1 = P1;1
+ P2;2 + ... +
(2)
+ e,1.
To estimate the parameters P1 , select (either at random or by ... , design) a set of values for the x's. Call them p1 = [x11,x12, ... ,x1J. Then observe a value for y at p1 and call this observation y1. Next select a second set of values for the x's and call them p2 = [x21 ,x22, ... ,x2j(they need not be distinct from the first set) and observe a corresponding value for y. Call it y2. Continue the process until m sets of values for the x's and m observations for y have been obtained. One usually tries to have rn> n. If the observations for the x's are placed as rows in a matrix
xli x21
xml
pm_i
which we will call the design matrix, and the observed values for y are
placed in a vector y = [y1 ,...
we may write our hypothesis (2) as
y=Xb+e, where b is the vector of unknown parameters b =
(3)
,... , p,,JT and e, is the
unknown e,,= [e,1,... ,e,,JT.
In the case of an intercept hypothesis (1) the design matrix X1 in the equation
y=X,b,+e,
(4)
LEAST SQUARES SOLUTIONS 33
takes on a slightly different appearance from the design matrix X which
arose in (3). For an intercept hypothesis. X1
is
of the form
ri I
I
[1 x2 1
I
i=I
:
Li
Xmi
and b3 is of the form
I
Xm2 b1
L'
maJipnx(n+1)
= [IJ0IbT]T,
=
b=
Consider a no intercept hypothesis and the associated matrix equation b is to use the (3). One of the most useful ways to obtain estimates information contained in X and y, and impose the demand that 6 be a vector such that 'X6 is as close to y as possible', or equivalently, 'e, is as close to 0 as possible. That is, we require 6 to be a least squares solution of Xb = y. Therefore, from Theorem 1.2, any vector of the form 6= Xty + h,heN(X), could serve as an estimate for b. If X is not of full column rank, to select a particular estimate one must impose further restrictions on 6. In passing, we remark that one may always impose the restriction that 11611 be minimal among all least squares estimates so that the desired estimate is 6= Xty. Depending on the application, this may or may not be the estimate to choose. defined by x6 = is For each least squares estimate 6 of b, the values', E(y), where an estimate for the vector of
E(y)= [E(y1),... Although it is a trivial consequence of Theorem 1.2, it is useful to observe the following. The vector = X6 is the same for all least sjuares solutions 6, of X6 = Moreover,for all least squares solutions 5, = x6 = = XXty and r = y — = = (I — XXt)y.
Theorem 2.3.1
A closely related situation is the following. Suppose (as is the case in
many applications) one wishes to estimate or predict a value for a particular linear combination (5)
of the
on the basis of previous observations. Here,
= [c1 ,... , ce].
That is, we want to predict a value, y(c*), for y at the point = [C1, c2, ... , cjon the basis of observations made at the points p1 , p2, ... p,,,. If = cab, we know that it may be possible to have we use = infinitely many estimates 6. Hence y(c*) could vary over an infinite set of values in which there may be a large variation. However, there when c*b is invariant among all least squares estimates 6, so that y(c*) has a unique value.
Theorem 2.3.2
Let
The linear form
least squares solutions of X6 = c*16
= C*X?Y.
invariant among all y !f and only iice R(X*); in which case, is
34 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Proof If S is a least squares solution of X6 = y, then S is of the form 6 = X'y + h where heN(X). Thus c"S = cXty + ch is invariant if and only if c*h =0 for all hEN(X). That is, ceR(X*) = N(X)L. • Note that in the important special case when X has full column rank, cb is trivially invariant for any Most of the discussion in this section concerned a no intercept hypothesis and the matrix equation (3). By replacing X by X1 and b by b,, similar remarks can be made about an intercept hypothesis via the matrix equation (4).
4.
Goodness of fit
Consider first the case of a no intercept hypothesis. Suppose that for
various sets of values p, of the x's, we have observed corresponding values y, for y and set up the equation
y=Xb+e,.
(1)
From (1) we obtain a set of least squares estimates parameters ,81
,
for the
...
As we have seen, one important application is to use the to estimate or predict a value for y for a given point = [c1 ,c2, ... ,cJ by means
of what we shall refer to as the estimating equation (2)
How good is the estimating equation? One way to measure its effectiveness is to use the set of observation points which gave rise to X and measure how close the vector y of observed values is to the vector of estimated values. That is how close does the flat defined by (2) come to passing From through the points (ps, y— Theorem 3.1 we know that is invariant among all least squares estimates S and that y — = r = 11(1— XXt)y II. One could be tempted to say that if r Ills small, then our estimating
equation provides a good fit for our data. But the term 'small' is relative. If we are dealing with a problem concerning distances between celestial objects, the value r = 10 ft might be considered small whereas the same value might be considered quite large if we are dealing with distances between electrons in an atomic particle. Therefore we need a measure of relative error rather than absolute error. Consider Fig. 2.2. This diagram Suggests another way to measure how close y is to The magnitude of the angle 9 between the two vectors is such a measure. In Ce', it is more convenient to measure Icos 0,, rather than I°I, by means of the equation cos 0i_j!.tll —
_IIxxtyII —
(Throughout, we assume y 0, otherwise there is no problem.) LikeWise, Isin 0, or stan might act as measures of relative error. Since y can be
LEAST SQUARES SOLUTIONS 35
R
Fig. 2.2
decomposed into two components, one in R(X) and the other in R(X)--, y = + r, one might say, in rough terms, that Icos 01 represents the 1. percentage ofy which lies in R(X). Let R = cos 0 so that 0 RI Notice that if IRI = 1, then all of y is in R(X),y = and r = 0. If R = 0, then y j.. R(X), =0, and r = y. Thus when I RI = 1, the flat defined by the actually passes 131x1 + j32x2 + ... + equationf(x1,x2, ... through each of the data points (p1. y1) so that we have an 'exact' or as possible and 'perfect' fit. When R =0, y is as far away from we say that we have no fit at all. In practice, it is common to use the term R2 = cos2 0 = I! 112/Il 112 rather than I RI to measure the goodness of fit. Note that R2 RI since R 1. Thus R2 is a more conservative measure. For example, if R2 = 0.96, one would probably consider the fit to be fairly good since this indicates that, in fact, about 98% of y lies in R(X). A familiar form for R2, when all numbers are real, is /.,. \2 1=1
R2—
1=1
1=1
where
denotes the ith entry of XXty = and y. is the ith entry of y.
This follows because II
112
[y*yJctyj = [y*fl = 1=1
Hence R2 =
Notice that R and R2
2 II 3' II
r=
is not. In statistical circles, R goes by the name of the product moment correlation and R2 is known as the between the observed y1's and the predicted are unit free measures whereas
coefficient of determination.
y
—
36 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Consider now the case of the intercept hypothesis
= Po + Pixji + ... +
+
As mentioned earlier, this gives rise to the matrix equation y = X,b + e,
where X, = [i!X1. If one wishes to measure the goodness of fit in this case, one may be
tempted to copy the no intercept case and use IIx,x,yII
(3
This would be a mistake. In using the matrix X,, more information would be used than the data provided because of the first column, j. The
expression (3) does not provide a measure of how well the flat
fitsthedatapoints Instead, the expression (3) measures how well the flat in C".2 fits the points = + + f(xO,xl,x2, ... (1, p.. y.). In order to decide on a measure of goodness of fit, we need the following fact. = The vector 6, = Theorem 2.4.1 Let X, ,fl,J' = [p0:ST]bT = [Ps.... ,$j, isa least squares solution of X,b, = y and only if = !j*(y — X6) (4) and 6 is a least squares solution of (5)
Here J =
is
a matrix of ones.
Proof Suppose first that
satisfies (4) and that Sis a least squares b"]' is a least squares solution of solution of(5). To show that = X,b, = y, we shall use Theorem 1.2 and show that 6, satisfies the normal = equations, Note first that
lw
'Lxi —I
Therefore, x1'x,S, = (6)
Xb) +
j*y
I X*Jy
—
x*JxG + x*x6
LEAST SQUARES SOLUTIONS
Since S is a least squares solution of (5), we
is a solution of
/
/ \
I
'
mj\ mj 1
1
/
\2
/ \
know from Theorem 2.1 that 6 I
'
mj\ mj 1
\*/ /
/ =II__J) 1
37
1
(7)
\
orthogonal projector onto N(J)). the equation (7) is equivalent to X*X6 — !X*JXS = X*Y
or
—
!X*Jy — !X*JX6 + X*X6 = X*Y.
(8)
= which proves that 6, is a = = y. Conversely, assume now that is a least
Therefore, (6) becomes x,x,61
least squares solution of squares solution of x16, = y. Then 61 satisfies the normal equations = That is, fl0 and S must satisfy m
—
I
—
Direct multiplication yields (9)
X*j$0 + X*X6 = X*y.
Equation (9) implies that value of X*Jy —
(10)
=
into (10) yields
X*JX6 + X*X6 =
—
X6), which is (4). Substituting this
—
X6) + X*X6 = X*y or equivalently,
which is equation (9). Hence, 6
satisfies (7) so that S is a least squares solution of (i
(i
—
—
j )y, and the theorem is proven.
Let XMand yMdenote the matrices XM =
(i !J)y and let i1 =!
and =
(i
—
—
!
and
y.. That is,
xS =
YM
=
the mean of
—
the jth column of X, is the mean of the values assumed by thejth independent variable Likewise, is just the mean of all of the observations y, for y.
38 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
and YM are
Therefore,
the matrices
I x21—X1 X22—X2
XMI
:
1Y2—YI
I'YMI
x,,2—i2
I
Theorem 1 says that in obtaining least squares solutions of X,51 = y, X15 = YM In we are really only obtaining least squares solutions effect, this says that fitting the intercept hypothesis + P,x, + y. = /1,,, + fl1x13 + fl2x12 + by the theory of least squares is equivalent to fitting the no intercept hypothesis
=
—
+ P2(x,2
—
—
x2) +
+
—
i.,) +
Thus, a measure of goodness of fit for an intercept hypothesis is 2
p2_ —
M
JM
— —
2
;
2 2
A familiar form for R2 when all numbers are real is
(11)
[91_irJ2) i—i
where y. is the ith entry of y and (11), we must first prove that
=
is the ith entry of = X,Xy. To prove (12)
—
To see that (12) is true, note that so that by Theorem 1,
is a least squares solution of X,,6 = y
[!j*y — L
a least squares solution of = y. Thus all least squares solutions of X161 = y are of the form 61=s+ h where heN(X1). Because Xy is a least squares solution of = y, there must be a vector b0e N(X1) such that is
= X1(x + h0) = X1s = !jj*(y —
= s + h0. Therefore,
= !Jy =
+
—
+
= !Jy + (i —
+ =
+
from which (12) follows. Now observe that (12)
LEAST SQUARES SOLUTIONS 39
implies that the ith entry (YM)j of
=
is
given by (13)
We can now obtain (11) as 2
R2=
YM
4
=
IIYMII2
IIYMIIIIY%jII
(14)
=
By using (13) along with the definition of YM we see that (14) reduces to (11). We summarize the preceding discussion in the following theorem.
Theorem 2.4.2
For the no intercept hypothesis
= fl1x11 +
+ ... +
+ e,. the number,
)2 XXty 112_Il
R2 —
112
-
11y1I2
1=1
1=1
is a measure of goodness offit. For the intercept hypothesis, = fl0 + measure of goodness offit is given by + + ... + fi
)2 R2
YM
where XM =
=
-
11211
=11
(i
—
=
IIYMII
YM =
(i
—
Here X1
and
is the ith entry of
andJ =jj*,j =[1,...,lJ*.
In each case 0 R2 1 and is free of units. When R2 = 1, the fit is exact and when R =0, there is no fit at all. 5.
An application to curve fitting
Carl Friedrich Gauss was a famous and extremely gifted scientist who lived
from 1777 to 1855. In January of 1801 an astronomer named G. Piazzi briefly observed and then lost a 'new planet' (actually this 'new planet' was the asteroid now known as Ceres). During the rest of 1801 astronomers and other scientists tried in vain to relocate this 'new planet' of Piazzi. The task of finding this 'new planet' on the basis of a few observations seemed hopeless. Astronomy was one of the many areas in which Gauss took an active interest. In September of 1801, Gauss decided to take up the challenge of
40 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
finding the lost planet. Gauss hypothesized an elliptical orbit rather than
the circular approximation which previously was the assumption of the astronomers of that time. Gauss then proceeded to develop the method of least squares. By December, the task was completed and Gauss informed the scientific community not only where to look, but also predicted the position of the lost planet at any time in the future. They looked and it was where Gauss had predicted it would be. This extraordinary feat of locating a tiny, distant heavenly body from apparently insufficient data astounded the scientific conununity. Furthermore, Gauss refused to reveal his methods. These events directly lead to Gauss' fame throughout the entire scientific community (and perhaps most of Europe) and helped to establish his reputation as a mathematical and scientific genius of the highest order. Because of Gauss' refusal to reveal his methods, there were those who even accused Gauss of sorcery. Gauss waited until 1809 when he published his Theoria Motus Corporum Coelestiwn In Sectionibus Conicis Dolem Ambientium to systematically develop the theory of least squares and his methods of orbit calculation. This was in keeping with Gauss' philosophy to publish nothing but well polished work of lasting significance. Gauss lived before linear algebra as such existed and he solved the problem of finding Ceres by techniques of calculus. However, it can be done fairly simply without calculus. For the sake of exposition, we will treat a somewhat simplified version of the problem Gauss faced. To begin with, assume that the planet travels an elliptical orbit centred about a known point and that m observations were made. Our version of Gauss' problem is this.
Problem A
Suppose that (x1 , y1), (x2 , y2),... , (x,,,, y,,) represent the m
coordinates in the plane where the planet was observed. Find the ellipse in standard position x2/a2 ÷ y2/b2 = 1, which comes as close to the data
points as possible. If there exists an ellipse which actually passes through the m data points, then there exist parameters = 1/a2, = 1/b2, which satisfy each of the m equations fl1(x1)2 + $2(y,)2 = 1 for I = 1,2, ... , m. However, due to measurement error, or functional error, or both, it is reasonable to expect =1 that no such ellipse exists. In order to find the ellipse fl1x2 + which is 'closest' to our m data points, let
=
1
for I = 1,2,... ,m.
Then, in matrix notation, (1) is written as I
e11 e21
X2
eJ
x2
;i LP2J
'
(1)
LEAST SQUARES SOLUTIONS 41
or e = Xb — j. There are many ways to minimize the
could require that
1e11
be minimal
or that max
For example, we I,1e21,...
be
i—I
minimal. However, Gauss himself gave an argument to support the claim
that the restriction that minimal
e 112 be
(2)
i—i
gives rise to the 'best closest ellipse', a term which we will not define here.
(See Chapter 6.) Intuitively, the restriction (2) is perhaps the most reasonable if for no other reason than that it agrees with the usual concept of euclidean length or distance. Thus Problem A can be reformulated as follows.
Problem B
Find a vector 6=
, fl2]T
that is a least squares solution
of Xb = j. From Theorem 2.4, we know that all possible least squares solutions are of the form 6 = Xtj + h where hEN(X). In our example, the rank of XeC"2 wilibe two unless for each i= l,2,...,m; in which case
the data points line on a straight line. Assuming non-colinearity, it follows that N(X) = {O} and there is a unique least squares solution 6= Xtj = (X*X) 1X"j. That the matrix X is of full column rank is characteristic of curve fitting by least squares techniques. Example 2.5.1 We will find the ellipse in standard position which comes as close as possible to the four data points (1, 1), (0,2), ( — 1,1), and (— 1,2).
ThenX=[? the least squares solution to X6 = j and e = I( X6 — if is approximately = I is the ellipse that fits 'best' (Fig. 2.3). A measure of goodness of fit is R2 = f 2/lIi 112 * 0.932 ( * means 'approximately equal to') which is a decent fit. Notice that there is nothing in the working of Problem B that forced Xtj to have positive coefficients. If instead of an ellipse, we had tried to fit a hyperbola in standard position to the data, we would have wound up with the same least squares problem which has only an ellipse as a least squares solutions. To actually get a least squares problem equivalent to Problem A it would have to look something like this: 0.5. Thus j71x2 +
II
Problem C f Au — b H
Find a vector u with positive coefficients such that for all v with positive coefficients.
Av — b
II
The idea of a constrained least squares problem will not be discussed
here. It is probably unreasonable to expect to know ahead of time the
42 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
0
OzOo$o point Fig. 2.3 The ellipse
x2
+1 yz =
orientation of a trajectory. Perhaps a more reasonable problem than
Problem A would be:
Problem 0 Given some data points find the conic section that provides the 'closest fit.' For convenience assume that the conic section does not go through the origin. Then it may be written as ax2 + by2 + cx + dy +fxy = I. If we use the same four data points of the previous example there are many least squares solutions all of which pass through the four data points. The minimal least squares solution is a hyperbola.
Polynomial and more general fittings
6.
In the previous section we were concerned with fitting a conic to a set of
data points. A variation which occurs quite frequently is that of trying to find the nth degree polynomial (1)
Usually one has which best fits m data points (x1 ,yj, (x2,y2), ... rn> n + 1; otherwise there is no problem because jim n + 1, then the interpolation polynomial of degree m — I n provides an exact fit.
We proceed as before by setting (2) J—o
Thus
Il I! es,,
XNS
XM*XJ
fbi
y11
II]
yJ '
LEAST SQUARES SOLUTIONS 43
or e = Xb — y. If the restriction that e be minimal is imposed, then a closest nth degree polynomial to our data points has as its coefficients Where the are the components of a least squares solution b = Xty h,heN(X) of Xb = y. Notice that if the xi's are all distinct then X has full column rank. (It is an example of what is known as a Vandermonde segment.) Hence N(X) = {O} and there is a unique least = (X*X) 1X*y. squares solution, 5= To measure goodness of fit, observe that (2) was basically an intercept hypothesis so that from Theorem 4.2, one would use the coefficient of 112/Il YM determination R2 = A slightly more general situation than polynomial fitting is the following. Suppose that you are given n functions g1(x) and n linear functions I. of Now suppose that you are given m data points k unknown parameters (x1, y,). The problem is to find values so that
... is as close to the data points as possible. Let
WJk; and define e. =
x=:
=
+ ... +
!,gjx1) — y.. Then the corresponding matrix
e= [e1,...,e,,JT,b= [$1
equation ise= g1(x1) g1(x2)
,
g2(x1) ... g2(x2) ...
Note that this problem is equivalent to finding values so that is as close to the data points as y = fl1L1(x) + fl2L2(x) + ... + possible where L1(x) is a linear combination of the functions g1(x),
To insure that e is minimal, the parameters must be the components of a least squares solution 6,, of XW6W = y. By Theorem 1.7 we have 5w
=
x1)W) (XPR(w))Y + Ii, he N(XW).
In many situations W is invertible. It is also frequently the case that X has full column rank. If W is invertible and N(X) = {O}, then N(XW) = (0) so that (2) gives a unique least squares solution
= W 'Xty = W i(X*x) IX*y = (X*XW) 'X1y. Example 2.6.1
We will find parameters
$2. and
(3) so
that the
function
f(x)= $1x + fi2x2 + fi3 (sin x) best fits the four data points (0,1),
(, 3). and (it 4). Then we will
44 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
find the
so that
f(x) = (fir + ft2)x + (P2 + = P1x + + x2) +
+ sin x + sinx)
best fits the same four data points. In each case we will also compute R2, the coefficient of determination. In the first case, we seek least squares solutions of = y where
000
1
1
2
X=4irir
3
4
In this case, X has full column rank so that the least squares solution is unique and is given by = Xty [9.96749, — 2.76747, — 5.82845]T. Also
0.9667
so that we have a fairly good fit.
11101
Inthesecondcase,W= 10
1
1
LOOLJ
solution is given by
fi
—
1
10
1
Lo
0
ii 1 9.967491
ii iJ
1—2.767471 L—
5.82845J
1 6.906511
=
I
3.060981.
[— 5.82845J
Since N(X) = {O} and W is invertible we have XW(XW)t = XWW XXt and 2
=
IIXW(XW)'y12
hO In closing this section we would like to note that while in curve fitting problems it is usually possible to get X of full column rank. There are times when this is not the case. In the study of Linear Statistical Models, in say Economics, one sometimes has several that tend to move together (are highly correlated). As a result some columns of X, if not linearly dependent, will be nearly so. In these cases Xis either not of full rank or is ill-conditioned (see Chapter 12) [71]. The reader interested in a more statistically rigorous treatment of least squares problems is referred to Section 6.4.
LEAST SQUARES SOLUTIONS 45
7.
WhyAt?
It may come as a surprise to the reader who is new to the ideas in this book that not everyone bestows upon the Moore—Penrose inverse the same central role that we have bestowed upon it so far. The reason for this disfavour has to do with computability. There is another type of inverse, which we denote for now by A' such that A'b is a least squares solution to Ax = b. (A' is any matrix such that AA' = The computation of A' or A'b frequently requires fewer arithmetic operations than the computation of At. Thus, if one is only interested in finding a least squares solution, then A'b is fine and there would appear to be no need for At. Since this is the case in certain areas, such as parts of statistics, they are usually happy with an A'b and are not too concerned with At. Because they are useful we will discuss the A' in the chapter on other types of inverses (Chapter 6). We feel, however, that the generalized inverse deserves the central role it has played so far. The first and primary reason is pedagogical. A' stands for a particular matrix while A' is not unique. Two different formulas for an A' of a given matrix A might lead to two different matrices. The authors believe very strongly that for the readers with only an introductory knowledge of linear algebra and limited 'mathematical maturity' it is much better to first learn thoroughly the theory of the Moore—Penrose generalized inverse. Then with a firm foundation they can easily learn about the other types of inverses, some of which are not unique and some of which need not always exist.
Secondly, a standard way to check an answer is to calculate it again by a different means. This may not work if one calculates an A' by two different techniques, for it is quite possible that the two different correct approaches will produce very different appearing answers. But no matter how one calculates A' for a given matrix A, the answer should be the same.
3
Sums, partitioned matrices and the constrained generalized inverse
1.
The generalized inverse of a sum
For non-singular matrices A, B, and A + B, the inverse of the sum is rarely the sum of the inverses. In fact, most would agree that a worthwhile
expression for (A +'is not known in the general case. This would tend to make one believe that there is not much that can be said, in general, about (A + B)t. Although this may be true, there are some special cases which may prove to be useful. In the first section we will state two results and prove a third. The next sections apply the ideas of the first to develop computational algorithms for At and prove some results on the generalized inverse of special kinds of partitioned matrices. Our first result is easily verified by checking the four Penrose conditions of Definition 1.1.3. Theorem 3.1.1 If A, B€Cm Xn and if AB* = 0 and B*A = 0, then (A + B)t = At + Bt. The hypothesis of Theorem 1 is equivalent to requiring that R(A*) R(B*) and R(A) j. R(B). Clearly this is very restrictive. If the hypothesis of Theorem I is relaxed to require only that R(A*) j.. R(B*), which is still a very restrictive condition, or if we limit our attention to special sums which have the form AA* + BB*, it is then possible to prove that the following rather complicated formulas hold.
j
Theorem 3.1.2 IfAeC BeCTM ", then (AA* + BB*)? = (I — Ct*B*)At [I — AtB(I — CtC)KB*At*JAt(I — BCt) + C'C' where C = (I — AAt)B, K = [I + (I — CtC)B*At*AtB(I — CtC)] '. If A, and AB* = 0, then (A + B)t = At +(I — AtB)[C? +(I — CtC) x KB*At*At(I — where C and K are defined above. Since Theorem 2 is stated only to give the reader an idea of what types of statements about sums are possible and will not be used in the sequel, its proof is omitted. The interested reader may find the proof in Cline's paper [30].
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE 47
We will now develop a useful formulation for the generalized inverse of
a particular kind of sum. For any matrix BeC'" ",the matrix B can be denote the written as a sum of matrices of rank one. For example, let matrix in C'" "which contains a I in the (i.j)th position and 0's elsewhere. then If B = (1)
is the sum of rnn rank one matrices. It can be easily shown that if rank (B) = then B can be written as the sum of just r matrices of rank one. Furthermore, if FECrnx? is any matrix of rank one, then F can be written as the product of two vectors, F = cd*, where ceC'", deC". Thus BeC'" can always be written as (2)
Throughout this chapter e1 will denote a vector with a I in the ith place and zeros elsewhere. Thus if ... ,eJ C'", then {e1 ,... ,e,,J would be the standard basis for C'". If B has the decomposition given in (1), let = where e1eC'". Then the representation (2) assumes the form B= It should be clear that a representation such as (2) is not unique. Now if one had at one's disposal a formula by which one could g-invert (invert in the generalized sense) a sum of the form A + cd* where ceC'" and deC", then B could be written as in (2) and (A + B)? could be obtained by
recursively using this formula. In order for a formula for (A + cd)' to be useful, it is desirable that it be of the form (A + cd)' = At + G where G a matrix made up of sums and products of only the matrices A, At, c, d, and their conjugate transposes. The reason for this requirement will become clearer in Sections 2 and 3. Rather than present one long complicated expression for (A + cd*)f Exercise 3.7.18.), it is more convenient to consider the following six logical possibilities which are clearly exhaustive. (1)
and
and 1 + d*Afc arbitrary; and 1 + d*A?C =0;
(ii) ceR(A) and (iii) ceR(A) and d arbitrary and 1 + dA'c #0; (iv) and deR(A*) and 1 + d*Atc =0;
(v) c arbitrary and deR(A) and 1 + d*Afc #0; (vi) ceR(A) and deR(A*) and I +d*Afc=0. Throughout the following discussion, we will make frequent use of the fact that the generalized inverse of a non-zero vector x is given by = x*/IIx 112 where lix 112 =(x,x). Theorem 3.1.3 For CEC'" and dEC" let k = the column AtC, h = the row d*At, u = the column (I — AAt)c, v = the row d*(I — AtA), and = the scalar I + d*AtC. (Notice that ceR(A) and only if u =0 and
48 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
deR(A*)
and only if,
= 0.) Then the generalized inverse of(A + cd*) is
as follows. (1)
Ifu#Oand,#O,then(A+cd*)t=At_kut _vth+p,tut.
(ii)
If u = 0 and #0, then (A + cd*)? = A' +
and
(iv)
qt = _QI')2k*At
÷k).
where p1 =
—
= 11k0211v112+1P12.
Ifu#0,,=O,andfl=0,then(A+cd*)t=At_Athth_ku?.
(v) Ifv=OandP#0, then (A +cd*)t =At
whereP2=_(0")Ath*+k). 02 = 11h11211u112 + 1P12. (vi) Ifu=O,,=O,andp=0,then(A+cd*)t=At_kktAt_
Athth + (ktAtht)kh. Before proving Theorem 1, two preliminary facts are needed. We state them as lemmas.
IA
Lemma 3.1.1
L'
uli—i.
—PJ
Proof This follows immediately from the factorization
IA+cd* L
0*
ci —li—Lb
O1FA
'JL'
ullI kill
0
—PJL0" lJLd*
Lemma 3.1.2 If M and X are matrices such that XMMt = X and MtM = XM, then X = Mt.
Proof Mt = (MtM)Mt = XMMt = X. U We now proceed with the proof of Theorem 3. Throughout, we assume
c #0 and d #0.
Proof of (i). Let X1 denote the right-hand side of the equation in (i) and let M = A + cd*. The proof consists of showing that X1 satisfies the four Penrose conditions. Using Mt =0, dv' = 1, = — 1, and c — Ak = AAt it is easy to see that MX1 = + wit so that the thfrd Penrose condition holds. Using UtA =0, utc = 1, be = — 1, and d* — hA = v, one obtnint X1M = AtA = and hence the fourth condition holds. The first and second conditions follow easily. Proof of (ii). Let X2 denote the right-hand side of the equality (ii). By using = 0,d*vt = 1, and d*k = — 1, it is seen that (A + cd*)X2 = AA', Ak = C,
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE 49
which is hermitian. From the facts that ktAtA = kt. hc = — I, and d* — hA = v, it follows that X2(A + cd*) = A'A — kk' + v'v, which is also
hermitian. The first and second Penrose conditions are now easily verified.
Proof of (iii) This case is the most difficult. Here u = 0 so that CER(A) and hence it follows that R(A + cdi c R(A). Since fi 0 it is clear from Lemma 1 that rank (A + cd*) = rank (A) so that R(A + cdi = R(A). Therefore
+
(A +
= AAt
(3)
because AAt is the unique orthogonal projector onto R(A). Let X3 denote the right-hand side of the equation in (iii). Because = it follows immediately from (3) that X3(A + cd*)(A + cd*)t = X3. Hence the first condition of Lemma 2 is satisfied. To show that the second condition of Lemma 2 is also satisfied, we first The matrix AtA — show that (A + cd*)?(A + cd*) = AtA — kk' + kkt + is hermitian and idempotent. The fact that it is hermitian is clear and the fact that it is idempotent follows by direct computation using AtAk = k, AtAp1 = — k, and kkt p1 = — k. Since the rank of an idempotent matrix is equal to its trace and since trace is a linear function, it follows = Tr(AtA) — = Tr(A'A — kkt + that rank(AtA — kkt + Tr(kkt) + Now, kkt and are idempotent matrices of rank = trace = 1 and AtA is an idempotent matrix whose rank is equal to rank (A), so that rank(AtA — kk' + rank(A + cd*). (4) 1,d*p1 = I —a1fl',and UsingthefactsAk=c,Ap1 = d*AtA = — v, one obtains (A + cd*)(AtA — kkt + p1 = A + cd* — 2, c(v + flk' + 'p11). Now, II so that = k 2a1 I P1 2=PhIklL2 and hence +fiIIkII = — v _flkt. 112
Ii
Thus, (A + cd*)(AtA — kkt + p1p'1) = A + cd*. Because A'A — kkt + is an orthogonal projector, it follows that R(A* + dc*) c R(AtA — kkt + By virtue of(4), we conclude that R(A* + dc*) = R(AtA — kkt + or p1p'1), and hence(A* + dc*)(A* ÷ dci' = AtA — kk' + equivalently, (A + + cdi = AtA — kkt + p1 To show that X3(A + cd*) = A'A + p1 — kkt, we compute X3(A + cd*). 1, Observe that k*AfA = k*, = — and + d* = — 11v112P1k+v. 1
Now, X3(A + cd*)
/
1
a, = AtA +
—
= AtA + !,*k* —
a,
a,
—
+
" k2 - v*)d* P
p,d* — a1
+ d*)
ft
+ p,d*
50 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
=AtA+!v*k*__p,(v_ Write v as = — fill k fl + parentheses and use the fact that X3(A + cd*) = Since
+
AtA+
and substitute this in the expression in 2 p1 = 1fi12c' Ilk II to obtain p1k*.
+ p, P,
+
II
2k =
—
k —k —k = — kk'. we arrive at X3(A + cd*) = A'A + Thus (A + cd*)t(A + cd*) = X3(A + cd*) so that X3 = (A + Cd*)t by Lemma 2. Proof of (iv) and (v). (iv) follows from (ii) and (v) follows from (iii) by
taking conjugate transposes and using the fact that for any matrix = (Me)'. h'h and AtA — kkt is an Proof of (vi). Each of the matrices AAt — orthogonal projector. The fact that they are idempotent follows from AA'h' = lit, MA' = h,A'Ak = k and ktAtA = kt. It is clear that each is hermitian. Moreover, the rank of each is equal to its trace and hence each has rank equal to rank (A) — 1. Also, since u =0, v =0, and fi =0, it follows from Lemma 1.1 that rank(A + cd*) = rank(A) — 1. Hence, rank(A + cd*) = rank (AA'
—
hth)
= rank (AAt
—
k'k).
(5)
With the facts AAtC = c, hc = — I, and hA = d*, it is easy to see that (AAt — hth)(A + cd*) = (A + cd*), so that R(A + cd*) c R(AAt — h'h). Likewise, using d*A?A = d*, d*k = — 1, and Ak = c, one sees that (A + cd*)(AtA — kk') = A + cd*. Hence R(A* + dc*) R(A'A — kk'). By virtue of (5), it now follows that
(A + cd*)(A + cd*)t = AA'
—
h'h, and
(6)
(A + cd*)t(A + cd*) = A'A
—
kk'.
(7)
If X4 denotes the right-hand side of (vi), use (6) and the fact that bAA' = h to obtain X4(A + cd*)(A + cd*)t = X4 which is the first condition of Lemma 2. Use k'A'A = kt, hA = d*, and hc = — 1 to obtain X4(A + cd) =
AtA — kk'. Then by (7), we have that the second condition of Lemma 2 is satisfied. Hence X4 = (A + cd*)f
Corollary 3.1.1
•
When ceR(A), deR(A*), and
inverse of A + cd* is given by (A +
0, the generalized
= At —
At
—
Proof Setv=Oin(iii),u=Oin(v). U Corollary 1 is the analogue of the well known formula which states that
if both A and A + cd are non-singular, then (A + cdT' =
A'
—
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE
2.
51
Modified matrices
At first glance. the results of Theorem 1.3 may appear too complicated
to be any practical value. However, a closer examination of Theorem 1.3 will reveal that it may be very useful and it is not difficult to apply to a large class of problems. Suppose that one is trying to model a particular situation with a mathematical expression which involves a matrix A C'" XsI and its generalized inverse A'. For a variety of reasons, it is frequently the case that one wishes to modify the model by changing one or more entries of A to produce a 'modified' matrix A, and then to compute At. The modified model involving A and A' may then be analysed and compared with the original model to determine what effects the modifications produce. A similar situation which is frequently encountered is that an error is discovered in a matrix A of data for which A' has been previously calculated. It then becomes necessary to correct or modify A to produce a matrix A and then to compute the generalized inverse A' of the modified matrix. In each of the above situations, it is highly desirable to use the already known information; A,A' and the modifications made. in the computation of A' rather than starting from scratch. Theorem 1.3 allows us to do this since any matrix modification can always be accomplished by the addition of one or more rank one matrices. To illustrate these ideas, consider the common situation in which one wishes to add a scalar to the (i,j)th entry of A€C'" to produce the modified matrix A. Write A as where Write A' as A' =
= [c1 ...
(1)
rr =
[r
That is, g•j denotes the (i,j)-entry of At, c1 is the ith column of At, and r1 is the ith row of At. The dotted lines which occur in the block matrix of At are included to help the reader distinguish the blocks and their arrangement. They will be especially useful in Section 3 where some blocks have rather complicated expressions. To use Theorem 1.3 on the modified matrix (1), order the computation as follows.
Algorithm 3.2.1 To g-invert the modified matrix A + (I) Compute k and h. This is easy since k = Ate1 and h =
(II) Compute u and v by u = e1 — Ac1 and v =
(Ill) Compute
this is also easy since fi = 1 +
so that
—
so that fi = 1 + (IV) Decide which of the six cases to use according as u, ,, and are zero or non-zero. (V) Depending on which case is to be used, carefully arrange the he1,
52 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
computation of the terms involved so as to minimize the number of
multiplications and divisions performed. To illustrate step (V) of Algorithm 1, consider the term kktAt, which has to be computed in cases (ii) and (vi) of Theorem 1.3. It could be computed in several ways. Let us examine two of them. To obtain kt (we k 112 = assume) k #0) we use k' = This requires 2n operations (an operation is either a multiplication or division). If we perform the calculations by next forming the product kk' and then the product (kkt)At, it would be necessary to do an additional mn2 + n2 operations, making a total of n2(m + 1) + 2n operations. However, if kktAt is computed by first obtaining kt and then forming the product ktAt, followed then by forming the product (k(ktAt)), the number of operations required is reduced to 2n(m + 1). This could amount to a significant saving in time and effort as compared to the former operational count. It is important to observe that the products AA' or AtA do not need to be explicitly computed in order to use Theorem 1.3. If one were naive enough to form the products AAt or AtA, a large amount of unnecessary effort would be expended.
Example 3.2.1 2
Suppose
0
A= 10
1
—1
Lo
0
1
that
3 ii 0I,andA'=— 3 12
—iJ
—3
01
5 7
41
4J
3-7-8J
has been previously computed. Assume that an error has been discovered in A in that the (3,3)-entry of A should have been zero instead of one. Then A is corrected by adding — 1 to the (3,3)-entry. Thus the modified matrix is A = A + e3( — To obtain A' we proceed as follows. (I) The terms k and h are first read from A' as
—4].
7
(II) The terms u and v are easily calculated. Ac3 = e3 so that u =0.
v=
—1 —1
—
ij.
(III) The term $ is also read from At as fi = 1 (IV) Since u =0, v used to obtain At.
0, and
—
g33
=
#0, case (iii) of Theorem 1.3 must be
(V) Computing the terms in case (iii) we get
k 112
= c3 112 =
and
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE
53
I
Then by (in) of Theorem 1.3., A =
2
—2
2
2
— 5
2
0
3.
0 —6
Partitioned matrices where A, B, C and D are four matrices such that E is
Let E
=
also a matrix. Then A, B, C, D are called conformable. There are two ways to think of E. One is that E is made up of blocks A, B, C, D. In this case an E to the blocks are, in a sense, considered fixed. If one is trying to have a certain property, then one might experiment with a particular size or kind of blocks. This is especially the case in certain more advanced areas of mathematics such as Operator Theory where specific examples of linear operators on infinite dimensional vector spaces are often defined in terms of block matrices. One can also view E as partitioned into its blocks. In this viewpoint one starts with E, views it as a partitioned matrix and tries to compute things about E from the blocks it is partitioned into. In this viewpoint E is fixed and different arrangements of blocks may be considered. Partitioned matrix and block matrix are equivalent mathematical terms. However, in a given area of mathematics one of the two terms is more likely to be in vogue. We shall try to be in style. Of course, a partitioned or block matrix may have more or less than four blocks.
54 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
This section is concerned with how to compute the generalized inverse
of a matrix in terms of various partitions of it. As with Section 2, it is virtually impossible to come up with usable results in the general case. However, various special cases can be handled and, as in Section 2, they are not only theoretically interesting, but lead to useful algorithms. In fact. Theorem 1.3 will be the basis of much of this section, including the first case we consider. Let be partitioned by slicing off its last column, so that P = [B cJ where BeCtm and ceCtm. Our objective is to obtain a useful P may also be written as P = [Bj011 + 1] where expression for 01ECM, O2ECA.
and
Then P is in a form for which Theorem 1.3 applies. Let A = = [0 1]. Using the notation of Theorem 1.3 and the fact that At
one easily obtains h = d*At = 0 so that
=
= 1 + d*Afc = 1 and
v = d* #0. Also, u = (I — AAt)C = (I — BB')c. Thus, there are two cases to consider, according as to whether u #0 or u =0. Consider first the case when u #0 (i.e. Then case (1) in Theorem 1.3 is used to obtain pt In this case
AC=[fBtcl 0
IBtl IB'cutl
101
10
L
Next, consider the case when u =0 so that (iii) of Theorem 1.3 must be
used. Let k = Btc. Then = — k*Bt so that Bt —
=
= 1 + c*B'*Btc = 1 + k*k,
=
and —
kk*Bt 1 + k*k k*Bt
.
Thus we have the following theorem.
1 + kk Theorem 3.3.1 For ceCTM, and P = let k = Btc and u = (I — BBt)C = c — Bk. The generalized inverse of P is given by t
IBt_kyl
(ut
]whereY=l(l+k*k)_lk.Bt ifu=0.
[B]
Theorem 4 can be applied to matrices of the form p = r a row vector, by using the conjugate transpose. By using Theorem 1 together with Theorem 1.3, it is possible to consider the more general partitioned matrix M
=
where
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE
dECN,
so that M can be written as
and xeC. Let P =
+
+
M
55
=
=
Theorem 1.3 can be applied to obtain Mt as M' = [Pt 01 + G. pt is
I+
known from Theorem 1. Clearly, x
[Pt oiI?1 = I L
0. Thus either case (i)or case (v) of Theorem 1.3 must be
.J
used, depending
on whether or not
Idi
IIA*
The details, which are somewhat involved, are left to the interested reader, or one may see Meyer's paper [551. We state the end result. Theorem 3.3.2
let
For AECtm N CEC
:],k=Atc,h=dsAt,u=(I_AAt)
c,
= I + 11k02,w2= 1 + The generalized inverse for M is as follows.
(i) If U
0 and V
lAt — kut — vth — 5,tut
0, then Mt =
I
u
L
At (ii)
10
lkk*At
—
'0
I
then M'
(iii)
where p1
— k.
=[ 2k*At
=
(iv) If v =0 and t5 =0, then M
iAt_
where p2
(vi)
=
0, then Mt
— k,
u
11,112 +
t
—i
I
rAt
= cv,
—
*
u
L
(v) If v = 0 and 5
—
,
0
—3-- IAtII*U*
=L +
1]'.
=5
—
h, and 42 = W2 U 112 + 1512.
Ifu=O,v=O, andS=O, then
M' = I
[
—kA
0
J
+ k*Ath*Iklrh, —1
w1w2[—lJ''
56 GENERAUZED INVERSES OF LINEAR TRANSFORMATIONS
Frequently one finds that one is dealing with a hermitian or a real
symmetric matrix. The following corollary might then be useful.
Corollary 3.3.1
For AECTMXM such that A = A* and for xeR, the
generalized inverse of the hermitian matrix H =
0, then Ht
(1) If u
rAt
k
tk*
as follows: (5
t' U:
t.
=L
(ii)
If u =0 and 6=0, then
H' =
1
1 and 2 may be used to recursively compute a generalized inverse, and were the basis of some early methods. However, the calculations are sensitive to ill conditioning (see Chapter 12). The next two algorithms, while worded in terms of calculating At should only be used for that purpose on small sized, reasonably well conditioned matrices. The real value of the next two algorithms, like Algorithm 2.1, is in updating. Only in this case instead of updating A itself, one is adding rows and/or columns to A. This corresponds, for example in least squares problems, to either making an additional observation or adding a new parameter to the model.
Algorithm 3.3.1 To g-invert any (I)
(II) For i 2, set B = (III) k = B_ 1c1, (IV) u1 = —
!c1],
1k1,
if ii,
0,
and k*Bt
(VI) B
if
0,
=
(VII) Then
Example 3.3.1
Suppose that we have the linear system Ax = b, and
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE
57
have computed At so we know that
10 A— —
2
1
3
1
11
'r
At
1
0
'
1
1
—1
—1
3
But now we wish to add another independent variable and solve Ai = b where
10
—
A
=
1
21
0
3
1
1
1
1
—1
—t by computing A.
We will use Algorithm 1. In the notation of Algorithm I, A = B3, A = B2,
c*=[1 0
1
—2
=
3
—
1
7], so that 2
3
2 —2
0
1
1-2
1]
—7J• If the matrix in our model is always hermitian and we add both an independent variable and a row, the next Algorithm could be useful. L
Algorithm 3.3.2
5
3
To g-invert H =
such
that H = H*
(I) Set A1=h11.
(II) For i 2, set
=
where c. =
I,] (III) Let k = A1_ 1c1, (IV) ô. = — and (V) u1
0, then
=
V1
=
and
At (VII) If U1 = 0 and 61 #0, then
(VIII) If ii, =0 and
At
= [_L
= 0, then let r1 =
A;j;L1
58 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
=
=( At =
so that
and z1 =
(r1_z1)*
L
(IX) Then
For a general hermitian matrix it is difficult to compare operational counts, however Algorithm 2 is usually more efficient than Algorithm 1 when applicable since it utilizes the symmetry. There is, of course, no clear cut point at which it is better to recompute At from scratch then use augmentation methods. If A were 11 x 4 and 2 rows were added, the authors would go with Algorithm 1. If A were 11 x 4 and 7 rows were added, we would recompute At directly from A. It is logical to wonder what the extensions of Theorems 1 and 2 are.
That is, what are [A!C]t and
when C and D no longer are just
columns and B is no longer a scalar? When A, B, C and D are general conformable matrices, the answer to 'what is [A! C]?' is difficult. A useful
answer to 'what is
rA Cit ?' is not yet known though formulas exist. LD B]
The previous discussion suggests that in some cases, at least when C has a 'small' number of columns, such as extensions could be useful. We will begin by examining matrices of the form [A! C] where A and C are general conformable matrices. One representation for [A!C]t is as follows.
Theorem 3.3.3 For AeCTM 'and CECTM [A C] can be written as
r
AA rA'clt I i — LT*(I + fl'*) I
I
VA —
I(At
—
where B = (I — AAt)C and T = AtC(I
the generalized inverse of
AtCRt
AtCBt) + Bt —
BtB).
Proof One verifies that the four Penrose conditions are satisfied. U A representation similar to that of Theorem 3 is possible for matrices partitioned in the form
by taking transposes.
The reader should be aware that there are many other known ways of representing the generalized inverse for matrices partitioned as [A! C] or
as []. The interested reader is urged to consult the following references to obtain several other useful representations. (See R. E. Cline [31], A. Ben-Israel [12], and P. V. Rao [72].)
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE
As previously mentioned, no useful representation for
rA L
R
59
Cl' has, D]
up to this time, been given where A, C, R and D are general comformable matrices. However, if we place some restrictions on the blocks in question, we can obtain some useful results.
Lemma 3.3.1
If A, C, R and D are conformable matrices such that A is then D = RA 'C.
square and non-singular and rank (A) = rank
Furthermore, EJP = RA' and Q = A'C then
Proof
The factorization I
F
O1[A
I][R
yelds rank
I
I
—A'Cl fA
DJL0
i
IAC1 = rank rA;
0
]Lo 1
0
D—RA'C
= rank (A) +
rank (D — RA - 'C). Therefore, it can be concluded that rank (D — RA 'C) =0, or equivalently, D = RA 'C. The factorization (1) follows
directly. • Matrices of the type discussed in Lemma 1 have generalized inverses which possess a relatively simple form. Theorem 3.3.4
Let A, C, R and D be conformable matrices such that
A is square, non-singular, and rank (A) any matrices such that
[A C]t =
[
[R D] =
= rank
If P and Q are
Q],
then
([I + P*P]A[I + QQ*Jy 1[I P*]
and G = [I, Q]. Notice B= = that rank(B) = rank(G) = rank(A) = rank(M) = r = (number of columns
Proof Let
M
of B) = number of rows of G). Thus, we may apply Theorem 1.3.2 to obtain Mt = (BG)t = G*(GG*) '(BB) IB*. Since (B*B) 1B* = [A*(I + P*P)A] = A 1(1 + p*p) l[UP*] and G*(GG*rl =
+
desired result is obtained. •
It is always possible to perform a permutation of rows and columns to
60 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
any matrix so as to bring a full-rank non-singular block to the upper left
hand corner. Theorem 3.3.4 may then be used as illustrated below. Example 3.3.3 In order to use Theorem 4 to find Mt, the first step is to reduce M to row echelon form, EM. This not only will reveal the rank of M, but will also indicate a full set of linearly independent columns.
LetM=
2
1
2
1
1
3
1
2
4
2
2
6
0 0
1
2' —1 0
2
,sothatEM=
0 1
0 0 0
1/2 1/2 0
5/21 1/2 0
00000
24 015 Thus, rank(M)
=2 and the first and third columns of M form a full independent set. Let F be the 5 x 5 permutation matrix obtained by exchanging the second and third rows of so that [1
11213
MF =
—
[2
independent
=
X2]. The next step is to select two
01415
rows from the
matrix X1. This may be
accomplished in several
rows reduction to echelon form, or one might
ways. One could have obtained this information by noting which
were interchanged
during the
just look at X1 and select the appropriate rows, or one might reduce to echelon form. In our example, it is easy to see that the first and third rows of X1 are independent. Let E be the 4 x 4 permutation matrix obtained by exchanging the second and third rows of 14 so that
EMF
=
11
1
Ii
—1
12
1
2
1
3
2
0
2
01415
[2
IA
C
= I
permutation matrices are unitary, Theorem 1.2.1 allows us to write (EMF)t = F*M?E* so that = Now apply Theorem 4 to obtain (EMF)t. In our example, Since
so that
—88
66
—6
36
—12
30
15
—35
30
—20
9
1
18
10
18
15
36
30
33
330
1
—3 —6 —6 —12 15
66 30
9
18
33
—55
—88 —55 —35 —20 1
10
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE
Theorem 3.3.5.
Let A. C, R and D be conformable matrices such that
lAd
A is square, non-singular, and rank (A) = rank
matrix M
61
R
=
and let
D
•
Let M denoe the
The matrix W is
l[A* R*].
non-singular and Mt =
Proof We first prove the matrix W is non-singular. Write W as W = A*AA* + R*RA* + A*CC* + R*DC*. From Lemma I we know = that D = RA 'C so that W = A*AA + R*RA* + A*CC* + R*RA (A*A + R*R)A - l(AA* + CCt). Because A is non-singular, the matrices (AtA + RtR) and (AAt + CCt), are both positive definite, and thus non-singular. Therefore, W must be non-singular. Furthermore, W' = + CC*yl + RtR) '.Using this, one can now verify
the four Penrose conditions are satisfied. U In both Theorems 4 and 5, it is necessary to invert only one matrix whose dimensions are equal to those of A. In Theorem 5, it is not necessary to obtain the matrices P and Q as in Theorem 4. However, where problems of ill-conditioning are encountered (see Chapter 12), Theorem 4 might be preferred over Theorem 5.
4.
Block triangular matrices
Definition 3.4.1 For conformable matrices T,, , T,2 , T21, and T22, matrices oftheform fT11
L°
0
122J
LT21
T22
are called upper block triangular and lower block triangular, respectively. It is important to note that neither T,, nor T22 are required to be square in the above definition. Throughout this section, we will discuss only upper block triangular matrices. For each statement we make about upper block triangular matrices, there is a corresponding statement possible for lower block triangular matrices.
Definition 3.4.2
For an upper block triangular matrix,
T is a properly partitioned upper block triangular matrix if T is upper block triangular of the form T = I L
A
"22J
where the
62 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
dimensions of G11 and G22 are the same as the dimensions of the transposes of T11 and T22, respectively. Any partition ofT which makes T into a
properly partitioned matrix is called a proper partition for T.
111101 lo 0 0 ii
LetT=
Example 3.4.1
Loooi]
1/30 1/3 0 1/30
0 0 0
0 1/2 1/2 There are several ways to partition T so that T will have an upper block triangular form. For example,
Iii 1101 and T2 Ii = 100101 L000: iJ Loolo 1 i
are two different partitions of T which both give rise to upper block triangular forms. Clearly, T, is a proper partition of T while T2 is not. In fact, T1 is the only proper partition of T. Example 3.4.2 If T is an upper block triangular matrix which is partitioned as (1) whether T11 and T22 are both non-singular, then T is properly partitioned because
T_l_1Tu
I
T1 22
1 I
Not all upper block triangular matrices can be properly partitioned.
Example 3.4.3
Let T =
111
2 L20
'I
11
2
12
4 0J . Since there are no zeros in
11
L001J
—11
8
25J The next theorem characterizes properly partitioned matrices. —
10
Theorem 3.4.1 Let T be an upper block triangular matrix partitioned as (1). T is properly partitioned (land only c 1) and Furthermore, when T is properly partitioned, Tt is given by R(Tr2)
rTt all —L
_TfT Tt I
I
i
(Note the resemblance between this expression and that of Example 2.)
Proof Suppose first that T is properly partitioned so that Tt is upper block triangular. It follows that iTt and TtT must also be upper block triangular. Since iT' and VT are hermitian, it must be the case that they
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE 63
are of the form
fR
0]
FL
0
Lo
R2]
Lo
L2
By using the fact that 1'TtT = T, one obtains R1T11 =T11 and L2T22 =T22, R1T12 =T12 and T12L2 =T12
(3)
Also, R1 , R2, L1 and L2 must be orthogonal projectors because they are
hermitian and idempotent. Since R1 = T1 1X for some X and R1T11 = T11, Likewise, we can conclude R(R = i)' and hence T = From (3), L2 = VT22 for some Y and T22 = T22L2 implies L2 = we now have 2
T —T12a"dP
T*_T* 12 12'
4
c
(5)
and therefore R(T12) c R(T11) and
To prove the converse, one first notes that (5) implies (4) and then uses this to show that the four Penrose conditions of Definition 1.1.3 are satisfied by the matrix (2). A necessary condition for an upper block triangular to be properly partitioned is easily obtained from Theorem 1.
•
Let T be partitioned as in (1). If T is properly partitioned, rank(T11) + rank(T22).
Corollary 3.4.1 then rank(T)
Proof If T is properly partitioned, then Tt is given by (2) and T1
rTTtL'_ 1T12 = T12 so that iTt =L
rank(T)= rank(TTt) = rank(T1 rank(T22). U 5.
1
—
i
1
0 22
+ rank
Thus, 22
= rank(T1
+
The fundamental matrix of constrained minimization
Definition 3.5.1
Let V E CA
be any matrix in
xr•
x
be a positive semi-definite matrix and let C*
The block matrix B
Iv C*1 is called the =LC 0 ]
fundamental matrix of constrained minimization.
This matrix is so named because of its importance in the theory of constrained minimization, as is demonstrated in the next section. It also plays a fundamental role in the theory of linear estimation. (See Section 4 of Chapter 6.) Throughout this section, the letter B will always denote the matrix of Definition 1. Our purpose here is to obtain a form for Bt.
64 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Note that ifS is the permutation matrix S =
then
rc* vit
B=[0 vi cjS*sothatBt=SLo rc*
Thus, we may use Theorem 3.4.1 to obtain the following result.
Theorem 3.5.1
Bt =
jf and only tfR(V) c R(C*).
In the case when R(V) R(C*), it is possible to add an appropriate term to the expression in Theorem 1 to get Bt.
Theorem 3.5.2 For any positive semi-definite let E = I — and let Q = (EVE)t. Then Bt
Proof
XII
—
=
and any C*ECN
VCt].
(1)
+
Since V is positive semi-definite, there exists
XII
such
that
Now, E = E* = E2so that
V=
(2) Q = (E*A*AE)t = ([AE]*AE)t and hence R(Q) = R( [AE]*) R(EA*) c R(E). This together with the fact that Q = Q* implies
EQ=QandQE=Q,
(3)
so that CQ = 0 and Q*C =0. Let X denote the right-hand side 01(1). We shall show that X satisfies the four Penrose conditions. Using the above information, calculate BX as
+ EVQ 0
—
BX
:
EVCt
- EVQVCt
Use (3) to write EVQ = EVEQ = QtQ and EVQV = EVEQEV = EA*AE(AE;t(AE)*t(AE)*A = EA*AE(AE)t(AE) x (AE)tA = EA*AE(AE)f A = (AE)*(AE)(AE)tA = (AE)*A
=EA*A=EV.
(4)
Thus,
+ Q'Q
BX — L
0
0 1
:ccti' :
which is hermitian. Using (5), compute BXB
(5)
=
From (3) and (4) it is easy to get that EVEQV = EVQV = EV, and hence BXB = B. It follows by direct computation using (5) that XBX = X.
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE 65
Finally, compute XB
I
QVE+CtC
01
:
From (3)
= QVE = QEVE = QQ'. In a manner similar to that used in obtaining (4), one can show that VQVE = yE, so that XB
=
which is hermitian. We have shown that X satisfies all four Penrose
conditions so that Bt = X. U 6.
Constrained least squares and constrained generalized inverses
In this section, we deal with two problems in constrained minimization. Let beC'", and f€C". Let 5" denote the set
5= {xIx=Ctf+ N(C)}. That is, 9' is the set of solutions (or least squares solutions) of Cx = 1, depending on whether or not fER(C). .9' will be the set of constraints. It
can be argued that the function Ax — b attains a minimum value on .9'. The two problems which we will consider are as follows. Problem
Find mm
1
Ax
—
b
and describe the points in 9' at which
the minimum is attained as a function of A, b, C, and
f.
Problem 2 Among the points in 9 for which the minimum of Problem 1 is attained, show there is a unique point of minimal norm and then describe it asa function of A,b,C, and f. The solutions of these two problems rest on the following fundamental theorem. This theorem also indicates why the term 'fundamental matrix' was used in Definition 3.5.1. Theorem 3.6.1 q(x) = Ax — b Ax0 —
2•
Let A, b, C, 1, and .5" be as described above, and let A vector x satisfies the conditions that XØE.9' and
bil Ax — bil for all XE9' !fand only !f there is a vector
such that z0
IA*A
Lc
[°] is a least squares solution of the system = :
C*][x]
fA*b
o]Ly]Lf
Proof Let B and v denote the block matrices IA*A C*1 IA*bl B Suppose first that is a least squares = =L g solution of Bz = v. From Theorem 2.1.2, we have that Bz0 = BBt,. From equations (2) and (5) of Section 5, we have I
66 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
BB
ICtC + (AE)t(AE) =[
o
where E = I
-'
—
C C,
so that Bz, = BBt, implies A*Ax0 = C*y. = CtCA*b + (AE)t(AE)A*b = CtCA*b + (AE)t(AE)(AE)*b = CtCA*b + (AE)*b = A*b
(1)
and
Cx0=CCtf. From (2) we know x E9'. Write x, = Ctf + h0 where h0eN(C). For every xe9' we have x = C1f+ so that
IIAC'f+ = AC'f + =
(2)
Ax —b112 —
ACtf — Ah, +
A; — b
(3)
112
For all h€N(C), we may use (1) to get (Ah,Ax0 — b) = (h,A*(Ax0
—
b))= +
(h, _C*y0)= —(Ch,y0)=O. Hence(3)becomesq(x)= q(x0) so that q(x) q(x) for all xeb°, as desired. Conversely, suppose x0e9' and q(x) q(x). If Ctm is decomposed as Ctm =
A(N(C)) + [A(N(C))]'-, then
A; — b = Ah + w, where hEN(C),WE[A(N(C))]-'-.
(4)
We can write q(x0) =
Ah + w II 2 = H
Ah 2 + w 2.
(5)
Now observe that (x0 — h)E$° because x0E$" and heN(C) implies C(x0 — I.) = CCtf. By hypothesis, we have q(x) q(x) for all xeb' so that q(x0) q(x0 — h) = II (Ax — b) — Ala 112 = (A1 + w) — Ah 112 = w (from (4)) = q(x0) — Ala 112 (from (5)). Thus Ah =0 and (Ax0 — b)E
[A(N(C))]' by (4). Hence for any geN(C), 0= (Ag, Ax0 — b) = (g, A*Ax0 A*b), and (A*Ax, — = R(C*). This means there exists a — A*b or vector( — such that C*( — y0) = A*Ax + C*y, = A*b = A*b — (AE)*b + (AE)*b
(6)
= A*b — (AE)*b + (AE)t(AE)(AE)*b = A*b — EA*b + (AE)IAEA*b
= [CtC + (AE)I(AE)]A*b Now (6) together with the fact that x0e9', gives
1
= RBt
rA*bl L
therefore
[c
is a least squares solution of —
• IJ
FA*bl
The solution to Problem 11$ obtained directly from Theorem 1.
j
and
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE
Theorem 3.6.2
The set of vectors M c .9' at which mm
67
Ax — b is
atta.ned is given by
M = {(AE)t(b — AC'f) + Ctf+(I
=I—
(M will be called the set of constrained least squares solutions). Furthermore where E
Ax — b
mm
= 11(1 —
A(AE)t)(ACtf —
b)
(7)
Proof From Theorem I, we have M
=
Bt[A;b]+(I — BtB)['L].
and
arbitrary).
By Theorem 5.2, M =
QQt = [(AE)*(AE)]l(AE)*(AE) = = (AE)t(AE)sO that M becomes M {QA*(b — ACtf) + Ctf + (I — Note that, R((AE)') = R((AE)*) = R(EA*) R(E), so that Q
E(AE)t = (AE)t and (3) of Section 5 yields QA* = (EQE)A* = E(AE)t(AE)*t(AE)* = E(AE)t = (AE)t. Thus M becomes M = {(AE)t(b — ACtf) + Ctf+ For each mEM, we wish to write the (I expression Am — b Ii. In order to do this, observe (8) implies that A(AE)t(AE) = AE and = so that A(I — when
(8)
=0
for all {eN(C). Expression (7) now follows. I The solution to Problem 2 also follows quickly. Let M denote the set of constrained least squares solutions as given in Theorem 6.2. If u denotes the vector u = (AE)t(b — Act f) + Cti, then u is the unique constrained least squares x for all XE M such solution of minimal norm. That is, U EM and
Theorem 3.6.3
that x
u.
Proof The fact that UE M is a consequence of Theorem 2 by taking =0. To see u has minimal norm, suppose x€M and use Theorem 2 to = N(ct*) Since R((AE)t(AE)) = write x = u + (I — (AE,t(AE)g, R((AE)*) = R(EA*) R(E) = N(Ct*), it follows that ct*(AE)t AE =0 and Therefore it is now a simple matter to verify that u (I — — (AE)t(AE)g 112 112 u 112 with equality holding if and lix = u 112 + 11(1 only if(I — = 0, i.e if and only if u = x. U From Theorems 2 and 3, one sees that the matrix (AE)t is the basic quantity which allows the solution of the constrained least squares problem to be written in a fashion analogous to that of the solution of the unconstrained problem. Suppose one wished to define a 'constrained generalized inverse for
68 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
A with respect to C' so that it would have the same type of least squares
properties in the constrained sense as At has in the unconstrained sense. Suppose you also wanted it to reduce to At when no constraints are present (i.e. C = 0). The logical definition would be the matrix (AE)t. Definition 3.6.1 For Ae C and CE Xfl the constrained generalized inverse of A with respect to C, denoted by is defined to be — CtC))t. = (APN(c))t = (A(I (Notice that reduces to At when C = 0.) The definition of could also have been formulated algebraically, see Exercise 7.18. The solutions of Problem 1 and Problem 2 now take on a familiar form. The constrained least squares solution of Ax = b of minimal norm is. + (1—
xM =
(9)
The set of constrained least squares solutions is given by
M=
+ (I —
(10)
Furthermore, mm
lAx —bli =
(11)
b)lt.
11(1 —
The special case when the set of constraints defines a subspace instead of just a flat deserves mention as a corollary. Let V be a subspace of and P = The point X_E Vofminimal norm at which mm Ax — b H is attained is given by
Corollary 3.6.1
(12)
and the set of points M Vat which mm H Ax — b H is attained is "€1,
(13)
Furthermore, mm
Ax
—
b
= H (I —
AA' )b Il.
(14)
Proof C = and f=0, in (9),(10), and (11). Whether or not the constrained problem Ax = b, x e V is consistent also has an obvious answer. Corollary 3.6.2 If Vis a subspace of Ax = b, xe Vhas a solution and only
and P =
= b (i.e. problem is consistent, then the solution set is given by V) and the minimal norm solution is Xm =
then the problem
If the + (I —
Proof The problem is consistent if and only if the quantity in (14) is zero, that is, = b. That this is equivalent to saying
PARTITIONED MATRICES AND CONSTRAINED GENERALIZED INVERSE 69
follows from (8). The rest of the proof follows from (13) and (12). U In the same fashion one can analyse the consistency of the problem Ax = b, xe{f+ Vis a subspace) or one can decide when two systems
possess a common solution. This topic will be discussed in Chapter 6 from a different point of view. 7. 1.
Exercises Use Theorem 3.1.3 to prove Theorem 3.3.2.
2. Prove that if rank(A) =
R(C)
R(A), and R(R) c R(A*),
then D = RAT. 3. Let Q = D — RAtC. If R(C) c R(A), R(R*) c R(A*), R(R) c R(Q), and R(C*) R(Q*), prove that — fA — fAt + AtCQtRAt —QtRAt LR Qt
DJL
4. Let P = A — CDtR. If R(R)
R(D), R(C*) c R(D*), R(R*)
R(P*), and
R(P), write an expression for
R(C)
in terms of
5. If M =
C, R and D.
..
rA
.
[c* Dj is a positive semi-definite hermitian matrix such
that R(C*At) R(D — C*AtC), write an expression for Mt. 6. Suppose A is non-singular in the matrix M of Exercise 5. Under this assumption, write an expression for Mt. 7. If T22 is non-singular in (1) of Section 4 prove that B= — T1 1)T12 + is non-singular and then prove that
8.
If T11 is non-singular in Exercise 7, write an expression for Tt.
9. Let T
= IA
LO*
ci where
Derive an
CECTM, and
expression for TT
10. Prove that the generalized inverse of an upper (lower) triangular matrix T of rank r is again upper (lower) triangular if and only if there
exists a permutation matrix P such that PTP
] where
=
T1 E C' 'is a non-singular upper (lower) triangular matrix. 11. For such that rank(A) = r, prove that AtA = AA' if and only if there exists a unitary matrix W such that W*AW
=
] where
70 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
T1 eC'
xr
a nonsingular triangular matrix. Use Exercise 10 and the fact that any square matrix is unitarily equivalent to an upper (lower) triangular matrix. is
and
12. For
write an expression for
rAlt LRi
13. Prove Theorem 3.4.1 for lower block triangular matrices. 14. Give an example to show that the condition rank(T) = rank(T1 + rank (T22) is not sufficient for (1) of Section 4 to be properly partitioned. 15. Complete the proof of Theorem 3.3.3. 16. Complete the proof of Theorem 3.3.5. 17. If V is a positive definite hermitian matrix and if C is conformable, let K = V + CCt and R = C*KtC. Show that
Cit IKt — KtCRtC*Kt
Iv
[Ct
:
oJ
[
:
KtcRt
RtC*Kt
18. The constrained generalized inverse of A with respect to C is the unique solution X of the five equations (1) AXA = A on N(C), (2) XAX = X, (3) (AX)t = AX, (4) PN(C)(XA)t = XA, on N(C) (5) CX =0. 19. Complete the proof of Theorem 3.1.1 20. Derive Theorem 3.3.1 from Theorem 3.3.3.
4
Partial isometries and EP matrices
1.
Introduction
There are certain special types of matrices which occur frequently and called unitary if A* = A - l, have useful properties. For example, A e hermitian if A = A*, and normal if A*A = AA*. This should suggest to the reader questions like: when is A* = At?, when is A = At?, and when is
AtA = AAt? The answering of such questions is useful in understanding the generalized in'.'erse and is probably worth doing for that reason alone. It turns out, however, that the matrices involved are useful. It should probably be pointed out that one very rarely has to use partial isometrics or the polar form. The ideas discussed in this short chapter tend to be geometrical in nature and if there is a geometrical way of doing something then there is probably an algebraic way (and conversely). It is the feeling of the authors, however, that to be able to view a problem from more than one viewpoint is advantageous. Accordingly, we have tried to develop both the geometric and algebraic theory as we proceed. Throughout this chapter denotes the Eudhdean norm on C". 2.
Partial isometries
Part of the difficulity with generalizing the polar form in Theorem 0.3.1 X form to AECM X m # n, was the need for a 'non-square unitary'.
We will now develop the appropriate generalization of a unitary matrix. Definition 4.2.1 Suppose that VeCtm x n m. Then V is called an isometry fl Vu = u fl for all UECN. The equation Vu = u may be rewritten (Vu, Vu) = (u, u) or (V*yu, u) = (u, u). Now if C1, C2 are hermitian matrices in C" then (C1u,u)= (C2u,u) for all UEC" if and only if C1 = C2. Thus we have: Proposition 4.2.1 is an isonietry if and only if V*V = A more general concept than isometry is that of a partial isometry.
72
GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
a subs pace M. Then Definition 4.2.2 Let =M partial isometry (of M into Ctm) and only if (1)
is a
IIVUH=IIuIIforallu€Mand
(ii) Vu=Oifu€M'.
The subspace M is called the initial space of V and R(V) is called the final space.
A partial isometry V (or y) sends its initial space onto its final space without changing the lengths of vectors in its initial space or the angles between them. In other words, a partial isometry can be viewed as the identification of two subspaces. Orthogonal projections are a special type of partial isometry. Partial isometrics are easy to characterize. Theorem 4.2.1
Suppose that VeCtm
X
Then the following are equivalent.
(i) V is a partial isometry
(ii)
V*=Vt.
VV*—D a R( 's') s"" (iv) V=VV*V. (v) V* = (vi) (VV)2 = (V*V).
—
R( V) —
Initial space oF V
(vii) (VV*)2 =
Proof The equivalence of (1) and (iv)—(vii) is left to the exercises, while the equivalence of (ii) and (iii) is the Moore definition of yt• Suppose then that V is a partial isometry and M is its initial space. If ueM, then (Vu, Vu) = (V*Vu, u) = (u, u). But also R(V*V) = R(V*) = N(V)1 = M. Thus is hermitian. If ueM-'-, then V*Vu = 0 since )LYIM =IIM since Vu =0. Thus Similar arguments show that VV* = and = (iii) follows. To show that (iii) implies (I) the above argument can be done in reverse.
Corollary 4.2.1 If V is a partial isometry, then so is V*. For partial isometrics the Singular Value Decomposition mentioned in Chapter 0 takes a form that is worth noting. We are not going to prove the Singular Value Decomposition but our proof of this special case and of the general polar form should help the reader do so for himself.
Proposition 4.2.2
Xfl
Suppose that V cC"' is a partial isometry of rank r. Then there exist unitary matrices UeC"' X and We C" X "such that
:Jw. Proof Suppose that V cC'" XIt is a partial isometry. Let M = R(V*) be its initial space. Let { b1,... ,b,} be an orthonormal basis for M. Extend this to
PARTIAL ISOMETRIES AND EP MATRICES 73
an orthonormal basis
... ,bj of C". Since V is
= {b1 ,... ,b,,
isometric on M, {Vb1, ... ,Vbj is an orthonormal basis for R(V). Extend {Vb1,... ,Vb,} to an orthonormal basis = {Vb1, ... of CTM. Let W be the unitary transformation which changes a vector to its Let U be the unitary transformation coordinates with respect to basis which changes a 132-coordinate vector into a coordinate vector with respect to the standard basis of CTM. Then (1) follows. • We are now in a position to prove the general polar form.
Theorem 4.2.2
(General Polar Form). Suppose that AeCTM
Xn
Then
(1) There exists a hermitian BeC" such that N(B) = N(A) and a partial isometry such that R(V) = R(A), N(V) = N(B), and A = VB. X (ii) There exists a hermitian Ce C'" TM such that R(C) = R(A) and a partial isometry W such that N(W) = N(A), R(W) = R(C), and A = CW.
Proof The proof is motivated by the complex number idea it generalizes. We will prove (1) of Theorem If z = rern, then r = (zzl"2 and e" = z(zzT 2 and leave (ii), which is similar, to the exercises. Let B = (Recall the notation of page 6.) Then BeC" and N(A*A) N(B) = = N(A). Let V = AB'. We must show that V is the required partial isometry. Notice that Bt is hermitian, N(Bt) = N(B), and R(Bt) = R(B). Thus R(V) = R(ABt) = R(AB) = R(A(A*A)) = R(A) and N(V) = N(AA*A) = N(A) = N(B) as desired. Suppose then that ueN(V)1 = R(B). Then Vu 112 = (Vu, Vu) = (ABtU, AB'u) = (BtA*ABtU, u) = (BtBIBtU, u) = u Thus V is the required partial isometry. U The proof of the singular value decomposition theorem is left to the
exercises. Note that if D is square, then
ID O'lIi L o'
01
o'iLo
where
ID L0
01
0] can be factored as Ii 01. a partial
ID 01.is square and
k
[o
Of
isometry. A judicious use of this observation, Theorem 2, and the proof of Proposition 2 should lead to a proof of the singular value decomposition. While partial isometries are a generalization of unitary matrices there are some differences. For example, the columns or rows of a partial isometry need not be an orthonormal set unless a subset of the standard basis for C" or C'" is a basis for R(V*) or R(V).
Example 4.2.1
0
.ThenVisa
0 partial isometry but neither the columns nor the rows (or a subset thereof) form an orthonormal basis for R(V) or R(V). It should also be noted that, in general, the product of a pair of partial isometrics need not be a partial isometry. Also, unlike unitary operators,
74 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
square partial isometrics can have eigenvalues of modulus unequal to
one or zero. Example 4.2.2
Let
v=
Then V is a partial isometry and
a(V)= 10
Example 4.2.3
3.
Let V =
1
0
01 .
1
[000J
Then V is a partial isometry and
EP matrices
The identities A5A
= AM for normal matrices and A 'A = AA' for
invertible matrices are sometimes useful. This suggests that it might be
helpful to know when AtA = AAt.
Definition 4.3.1
and rank(A)= r. If AtA = AAt,
Suppose that
then A is called an EP,, or simply EP, matrix. The basic facts about EP matrices are set forth in the next theorem.
Theorem 4.3.1 (I)
Suppose that AeC"
X
Then the following are equivalent.
AisEP
(ii) R(A) = R(A5) = R(A) N(A) (iii) (iv) There exists a unitarr matrix U and an invertible r x r matrix A1, r = rank (A), such that (1)
: U5.
Proof (1). (ii) and (iii) are clearly equivalent. That (iv) implies (iii) is obvious. To see that (iii) implies (iv) let 13 be an orthonormal basis for C" consisting of first an orthonormal basis for R(A) and then an orthonormal basis for N(A). is then the coordinate transformation from standard
coordinates to fl-coordinates. • If A is EP and has the factorization given by (1), then since U, unitary
are
(2)
Since EP matrices have a nice form it is helpful if one can tell when a matrix is EP. This problem will be discussed again later. Several conditions implying EP are given in the exercises.
PARTIAL ISOMETRIES AND EP MATRICES
75
It was pointed out in Chapter 1 that, unlike the taking of an inverse, the taking of a generalized inverse does not have a nice 'spectral mapping property'. If A e invertible, then Aec(A) if and only
')
(3)
and Ax = Ax
if and only if A 'x =
(1\ x. )
(4)
While it is difficult to characterize matrices which satisfy condition (3), it is relatively easy to characterize those that satisfy condition (4). Notice that (4) implies (3).
Theorem 4.3.2
Suppose
that
Then A is EP if and only jf
(Ax = Ax if and only if Atx = Atx).
(5)
Proof Suppose that A is EP. By Theorem 1, A = is unitary and A11 exists. Then Ax = Ax if and only
]
U where U
]
Ux = A
=0, then u1 =0, and AtX =0. Thus (5) holds for A =0. If A
0,
then u2 =0 and u1 is an eigenvector for A1. Thus (5) follows from (2) and (4). Suppose now that (5) holds. Then N(A) = = Thus A is EP
by condition (iii) of Theorem 1. U Corollary 4.3.1 If A is EP, then Aec(A) if and only tfAtea(At). Corollary 1 does not of course, characterize when A is EP.
Example 4.3.1
Notice that a(A) = {O) and At = A*.
Let A
= Thus Aeo(A) if and only if A'ec(A). However, AtA
AAt_.[i 4. 1.
while
=
Exercises If V, W, and VW arc partial isometrics, show that (VW)t = WtVt using only Theorem 1.
76 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
=0, If V. W are partial isometries and [W, VVt] =0 or [V. then WV is a partial isometry. a partial isometry and U, W are unitary, show that 3. If VeC" x UVW is a partial isometry. 4. Show that the following conditions are equivalent. 2.
(i) V is a partial isometry (ii) VV*V = V (iii) V*VV* = (iv) (V*V)2 = = VV* (v) 5. Prove part (ii) of Theorem 2. *6. Prove the Singular Value Decomposition Theorem (Theorem 0.2). 7. Prove that if A*A = AA*, then A is EP. 8. Prove that the following are equivalent.
(a) A is EP
(b) [AtA,A+At]=0 (c) [AAt, A + At] =0 (d) [AtA,A + A*] = 0 (e) [AAt,A + A*] = 0
(f) [A,AtA]=O (g) [A,AAt] = 0
9. Prove that if A is EP, then (At)2 = (A2)t. Find an example of a matrix A 0, such that (A2)t = (At)2 but A is not EP. *10. Prove that A is EP if and only if both (At)2 = (A2)t and R(A) = R(A2). *11. Prove that A is EP if and only if R(A2)= R(A) and [AtA,AAt] = 0. = (Al)t, then [AtA,AAt] = 0 but not conversely. Comment: Thus the result of Exercise 11 implies the result of Exercise 10. Exercise 11 has a fairly easy proof if the condition [AtA, AAt] =0 is translated into a decomposition of C". 12. Suppose that X = What can you say about X? Give an example X of a X such that X = a partial isometry. What conditions in addition to X = Xt are needed to make X a partial isometry? 13. Prove that V is an orthogonal projector if and only if V = Vt = 14. Prove that if A, B are EP (not necessarily of the same rank) and AB = BA, then (AB)t = BtAt.
5. The generalized inverse in electrical engineering
1.
Introduction
In almost any situation where a system of linear equations occurs there is the possibility of applications of the generalized inverse. This chapter will describe a place where the generalized inverse appears in electrical engineering. To make the exposition easily accessible to those with little knowledge of circuit theory, we have kept the examples and discussion at an elementary level. Technical terms will often be followed by intuitive definitions. No attempt has been made to describe all the uses of generalized inverses in electrical engineering, but rather, one particular use will be discussed in some detail. Additional uses will be mentioned in the closing paragraphs to this chapter. It should be understood that almost everything done here can be done for more complex circuits. Of course, curve fitting and least squares analysis as discussed in Chapter 2 is useful in electrical engineering. The applications of this chapter are of a different sort. The Drazin Inverse of Chapter 7 as shown in Chapter 9 can be used to study linear systems of differential equations with singular coefficients. Such equations sometimes occur in electrical circuits if, for example, there are dependent sources.
2.
n-port network and the impedance matrix
It is sometimes desirable, or necessary, to consider an electrical network in terms of how it appears from the outside. One should visualize a box (the network) from which lead several terminals (wires). The idea is to describe the network in terms of measurements made at the terminals. One thus characterizes the network by what it does, rather than what it physically is. This is the so-called 'black box' approach. This approach appears in many other fields such as nuclear engineering where the black
78 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
box might be a nuclear reactor and the terminals might represent
measurements of variables such as neutron flow, temperature, etc. We will restrict ourselves to the case when the terminals may be treated in pairs. Each pair is called a port. It is assumed that the amount of current going into one terminal of a port is the same as that coming out of the other terminal of the same port. This is a restriction on the types of devices that might be attached to the network at the port. It is not a restriction on the network. A network with n ports is called an n-port network.
Given an n-port network there are a variety of ways to characterize it depending on what one wants to do. In particular, there are different kinds of readings that can be taken at the ports. Those measurements thought of as independent variables are called inputs. Those thought of as dependent variables are called outputs. We will assume that our networks have the properties of homogeneity and superposition. Homogeneity says that if the inputs are multiplied by a factor, then the outputs are multiplied by that same factor. If the network has the property of superposition, then the output for the sum of several inputs is the sum of the outputs for each input. We will use current as our input and voltage as our output. Kirchhofl's laws are useful in trying to determine if a particular pair of terminals are acting like a port. We will also use them to analyse a particular circuit. A node is the place where two or more wires join together. A loop is any closed conducting path. KIRCH HOFF'S CURRENT LAW: The algebraic sum of all the instantaneous currents leaving a node is zero. KIRCH HOFF'S VOLTAGE LAW: The algebraic sum of all the voltage drops around any loop is zero. Kirchhoff's current law may also be applied to the currents entering and leaving a network if there are no current sources inside the network. Suppose that r denotes a certain amount of resistance to the current in a wire. We will assume that our wires have no resistance and that the resistance is located in certain devices called resistors. Provided that the resistance of the wires is 'small' compared with that of other devices in the circuit this is not a 'bad' approximation of a real circuit. Let v denote the voltage (pressure forcing current) across the resistor. The voltage across the resistor is also sometimes referred to as the 'voltage drop' across the resistor, or the 'change in potential'. Let i denote the current in the resistor. Then
v=ir. that r is constant but v and i vary with time. lithe one-sided Laplace transform is taken of both sides of (1), then v = ir where v and i are now functions of a frequency variable rather than of a time variable. When v and I are these transformed functions (for any circuit), then the ratio v/i is called the impedance of the circuit. Impedance is in the same units (ohms) as resistance but is a function of frequency. If the circuit Suppose
(1)
THE GENERALIZED INVERSE IN ELECTRICAL ENGINEERING
Poril
79
Port3
0-
JPod4 Fig. 5.1 A 4.pori network.
consists only of resistors, then the impedance is constant and equals the
resistance. Impedance is usually denoted by a z. In order to visualize what is happening it is helpful to be able to picture
a network. We will denote a current source by t ,a resistor by
a
terminal by and a node by —o-----. The reader should be aware that not all texts distinguish between terminals and nodes as we do. We reserve the word 'terminal' for the ports. Our current sources will be idea! current sources in that they are assumed to have zero resistance. Resistors are assumed to have constant resistance. Before proceeding let us briefly review the definition of an n-port network. Figure 5.1 is a 4-port network where port 1 is open, port 2 has a current source applied across it, port 3 is short-circuited, and port 4 has a resistor across it. Kirchhofls current law can be applied to show that ports 1, 2 and 3 actually are ports, that is, the current entering one terminal is the same as that leaving the other terminal. Port 1 is a port since it is open and there is no current at all. Now consider the network in Fig. 5.2. The network in Fig. 5.2 is not a 4-port network. As before, the pairs of terminals 5 and 6, 7 and 8, do form ports. But there is no way to guarantee,
(4
I
5
2
6
:3
Fig. 5.2 A network which is not an n-port.
80 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Port 2
Port3
Port
Fig. 5.3
without looking inside the box, that the current coming out of terminal 4 is the same as that flowing into any of terminals 1, 2 or 3. Thus terminal 4 cannot be teamed up with any other terminal to form a port. There are, of course, ways of working with terminals that cannot be considered as ports, but we will not discuss them here. It is time to introduce the matrices. Consider a 3-port network, Fig. 5.3, which may, in fact, be hooked up to other networks not shown. Let be the potential (voltage) across the jth port. Let be the current through one of the terminals of the jth port. Since the v,, are variables, it really does not matter which way the arrow for points. Given the values of the voltages v1, v2, v3 and are determined. But we have assumed our network was homogeneous and had the property of superposition. Thus the can be written in terms of the by a system of linear equations. = Z11i1 + Z12j2 + z13i3 V2 = Z21i1 + Z22i2 + Z23j3, = z31i1 + z32i2 + z33i3 V1
(2)
or in matrix notation,
= Zi, where v,ieC3,
Z is called the impedance matrix of the network since it has the same units as impedance and (3) looks like (1). In the system of equations (2), i,, and the are all functions of the frequency variable mentioned earlier. If there are devices other than just resistors, such as capacitors, in the network, then Z will vary with the frequency. The numbers have a fairly elementary physical meaning. Suppose that we take the 3-port of Fig. 5.3 and apply a current of strength i1 across the terminals forming port 1, leave ports 2 and 3 open, and measure the voltage across port 3. Now an ideal voltmeter has infinite resistance, that is, there is no current in it. (In reality a small amount of current goes through it.) Thus i3 =0. Since port 2 was left open, we have i2 =0. Then (2) says that v3 = z31i1 or z31 = v3/i1 when = = 0. The other Zkj have similar interpretations. We shall calculate the impedance matrix of the network Example 1 in Fig. 5.4. In practice Z would be calculated by actual physical
THE GENERALIZED INVERSE IN ELECTRICAL ENGINEERING
81
Iport3
_J
1_
Fig. 5.4 A particular 3-port network. The circled number give the resistance in ohms of the resistor.
measurements of currents and voltages. We shall calculate it by looking
'inside the box'. If a current i1
is
applied across port 1 we have the situation
in Fig. 5.5. The only current is around the indicated loop. There is a resistance of
1 ohm on this loop so that v1 = Thus = v1/i1 = I. Now there is no current in the rest of the network so there can be no changes in potential. This means that v2 =0 since there is no potential change across the terminals forming port 2. It also means that the potential v3 across port 3 is the same as the potential between nodes a and b in Fig. 5.5. That is, v3 = 1 also. Hence z21 = v2/i1 =0 and z31 = v3/i1 = 1. Continuing we get
Ii
0
Li
2
Z=IO 2
11
(4)
21.
3j
In order to calculate z33 recall that if two resistors are connected in series (Fig. 5.6), then the resistance of the two considered as one resistor is the sum of the resistance of each. Several comments about the matrix (4) are in order. First, the matrix (4) is hermitian. This happened because our network was reciprocal. A network is reciprocal if when input and output terminals are interchanged, the relationship between input and output is unchanged. That is, = Second, the matrix (4) had only constant terms since the network in Fig. 5.5
'I
L___ -J Fig. 5.5 Application of a current to port I.
82 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Fig. 5.6 Two resistors in series.
was resistive, that is, composed of only resistors. Finally, notice that (4) was not invertible. This, of course, was due to the fact that v3 = v1 + v2. One might argue that v3 could thus be eliminated. However, this dependence might not be known a priori. Also the three-port might be needed for joining with other networks. We shall also see later that theoretical considerations sometimes lead to singular matrices. 3.
Parallel sums
Suppose that R3 and R2 are two resistors with resistances r1 and r2. Then if R1 and R2 are in series (see Fig. 5.6) we have that the total resistance is
r1 + r2. The resistors may also be wired in parallel (Fig. 5.7). The total resistance of the circuit elements in Fig. 5.7 is r1r2/(r1 + r2) unless r1 = r2 =0 in which case it is zero. The number r1r2/(r1 + r2)
(1)
is called the parallel sum of r1 and r2. It is sometimes denoted r1 : r2. This section will discuss to what extent the impedance matrices of two n-ports, in series or in parallel, can be computed from formulas like those of simple resistors. It will be convenient to alter our notation of an n-port slightly by writing the 'input' terminals on the left and the 'output' terminals on the right. The numbers j, j' will label the two terminals forming the jth port. Thus the 3-port in Fig. 5.8a would now be written as in Fig. 5.8b. The notation of Fig. 5.8a is probably more intuitive while that of Fig. 5.8b is more convenient for what follows. The parallel and series connection of two n-ports is done on a port basis.
Fig. 5.7 Two resistors wired in parallel.
I
t
(0)
Fig. 5.8 Two ways of writing a 3-port network.
THE GENERALIZED INVERSE IN ELECTRICAL ENGINEERING
83
FIg. 5.9 Series connection of two 3-ports.
That is, in the series connection of two n-port networks, the ports labelled 1 are in series, the ports labelled 2 are in series, etc. (Fig. 5.9). Note, though, that the designation of a port as 1, 2, 3,... is arbitrary. Notice that the parallel or series connection of two n-ports forms what appears to be a new n-port (Fig. 5.10).
Proposition 5.3.1
Suppose that one has two n-ports N1 and N2 with impedance matrices Z1 and Z2. Then the impedance matrix of the series connection of N1 and N2 will be Z1 + Z2 provided that the two n-ports are
stillfunctioning as n-ports. Basically, the provision says that one cannot expect to use Z1 and Z2,
if in the series connection,
and N2 no longer act like they did when
Z1 and Z2 were computed.
It is not too difficult to see why Proposition 1 is true. Let N be the network formed by the series connection of two n-ports N1 and N2. Suppose in the series connect.c,n that N1 and N2 still function as n-ports. Apply a current of I amps across the ith port of N. Then a current of magnitude I goes into the 'first' terminal of port i of N1. Since N1 is an n-port, the same amount of current comes out of the second terminal of port i of N1 and into the first terminal of port i of N2. But N2 is also functioning as an n-port. Thus I amps flow out of terminal 2 of the ith port of N2. The resulting current is thus equivalent to having applied a current of! amps across the ith ports of and N2 separately. But the is the sum of the potentials potential across the jth port of N, denoted across the jth ports of N1 and N2 since the second terminal of port j of N1 and the first terminal of port j of N2 are at the same potential. But Nk, = k = 1,2 are functioning as n-ports so we have that = and where the superscript refers to the network. Thus = vs/I = (41) + = as desired. + 1
Fig. 5.10 Parallel connection of Iwo 3-ports.
84 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
L______ I FIg. 5.11 Two 2-ports connected in series.
Example 5.3.1 Consider the series connection of two 2-ports shown in Fig. 5.11. All resistors are assumed to be 1 ohm. The impedance matrix of while that of the second is
the first 2-port in Fig. 5.11 is Z1
Suppose now that a current of magnitude I is applied across
=
port 1 of the combined 2-port of Fig. 5.11. The resistance between nodes a and b is 1:2=2/3. The potential between a and b is thus 2/3 I. But what is important, 1/3 of the current goes through branch b and 2/3 through branch a. Thus in the series hookup of Fig. 5.11, there is I amperes of current going in terminal 1 of the first 2-port but only 21/3 coming out of terminal I of the first 2-port. Thus the first port of the first 2-port is no longer acting like a port. If the impedance matrix Z of the entire network
z1 + z2. in many cases,
of Fig. 5.11 is calculated, we get Z
=
however, the n-ports still act like n-ports when in series and one may add the impedance matrices. In practice, there is a simple procedure that can be used to check if the n-ports are still functioning as n-ports. We see then that when n-ports are in series, that the impedance matrix of the whole network can frequently be calculated by adding the individual impedance matrices. Likewise, when in parallel a formula similar to (1) can sometimes be used to calculate the impedance matrix. Suppose that Then define the parallel sum A :B of A and B by
A :B = A(A + B)tB. If a reciprocal network is composed solely of resistive elements, then the impedance matrix Z is not only hermitian but also positive semi-definite. That is, (Zx, x) 0 for all If Z is positive semi-definite, we sometimes write Z 0. If Z is positive semi-definite, then Z is hermitian. (This depends on the fact that and not just Ra.) if A — B 0 for A A is greater than or equal to B.
Proposition 5.3.2
Suppose
that N1 and N3 are two reciprocal n-ports
THE GENERALIZED INVERSE IN ELECTRICAL ENGINEERING
85
which are resistive networks with impedance ?natriCes Z1 and Z2. Then the impedance matrix of the paralle! connection of N1 and N2 is Z1 : Z2.
Proof In order to prove Proposition 2 we need to use three facts about the parallel sum of hermitian positive semi-definite matrices Z ,Z2. The first is that (Z1 : Z2) = (Z2 : Z1). The second is that R(Z1) + R(Z2) = R(Z1 + Z2), so that, in particular. R(Z1), R(Z2) c R(Z1 + Z2). The third is is hermitian. The proof of these facts = N(Z1), i = 1,2. since that is left to the exercises. Let N1 and N2 be two n-ports connected in parallel to form an n-port N. Let Z1, Z2 and Z be the impedance matrices of N1, N2 and N respectively. and v be the current and voltage vectors Similarly, let i1, i; v1, for N1, N2 and N. To prove Proposition 2 we must show that
v=Z1[Z1 +Z2]tZi=(Z1:Z2)i.
(2)
The proof of (2) will follow the derivation of the simple case when N1, N2 are two resistors and Z1, Z2 are positive real numbers. The current vector i may be decomposed as
i=i1+i2.
(3)
= v2 since N1 and N2 are connected in parallel. Thus
But v =
v=Z1i1,andv=Z2i2.
(4)
We will now transform (3) into the form of(2). Multiply (3) by Z1 and Z2 to get the two equations Z1i = Z1i1 + Z1i2 = v + Z1i2, and Z2i = Z2i1 + Z2i2 = v + Z2i1. Now multiply both of these equations by (Z1 + Z2)t. This gives
(Z1 + Z2)tZ1i = (Z1 + Z2)tv + (Z1 + Z2)tZ1i2, and (Z1
+
= (Z1 +
+ (Z1 + Z2)tZ2i1.
(5) (6)
Multiply (5) on the left by Z2 and (6) on the left by Z1. Equations (5) and (6) become
(Z2:Z1)i=Z2(Z1 +Z2)tv+(Z2:Z1)i2,and (Z1 : Z2)i = Z1(Z1 + Z2)t, + (Z1 : Z2)i1.
(7)
=+
But (Z1 :Z2) =(Z2 :Z1), i and Z1 + Z2 is hermitian. Thus addition of the two equations in (7) gives us that
(Z1 :Z2)i =(Z1 + Z2)(Z1 + Z2)tv =
+ z)V.
(8)
Now the impedance matrix gives v from i. Thus v must be in R(Z1) and R(Z2) by (4) so that + 1)V =v and (8) becomes (Z1 Z2)i = v as
desired. •
Example 5.3.2
Consider the parallel connection of two 3-port networks
86 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
(9)
Fig. 5.12 The parallel connection of N3 and N2.
shown in Fig. 5.12. The impedance matrices of N1 and N2 are
Ii Z1=I0
0
1]
2
Li
2
Ii
0
1
2IandZ2=I0
1
1
3J
1
2
LI
By Proposition 2, the impedance matrix of circuit (9) is Z1 : Z2. 11
0
11
111
0
Z1:Z2=Io
ii
2
21
(10
2
21
Li
2
3]
\L'
2
3]
Ii
0
= —10
ii r
2
84 21 I— 60
2
3] L
324
11/2
=
0
0 2/3 2/3
24
+
1'
0
10
1
Li
I
11\tli
0
iii 10
2J1L1
ii
1
ii
1
2]
i]
—60 66
241
Ii
0
61
10
1
II
6
30J
Li
1
2J
1/21
2/31. 7/6J Ll/2 The generalized inverse can be computed easily by several methods. The values obtained from Z1 Z2 reader is encouraged to verify that the agree with those obtained by direct computation from (9). 4.
I
Shorted matrices
The generalized inverse appears in situations other than just parallel
connections Suppose that one is interested in a 3-port network N with impedance matrix Z. Now short out port 3 to produce a new 3-port network, N' and denote its impedance matrix by Z'. Since v3 is always zero in N' we must have = Z'33 =0, that is, the bottom row of Z' is zero. If N is a
THE GENERALIZED INVERSE IN ELECTRICAL ENGINEERING
87
reciprocal network, then the third column of Z must also consist of zeros.
Z' would then have the form
z,=
(1)
.
The obvious question is: What is the relationship between the ZkJ and the The answer, which at first glance is probably not obvious, is:
Proposition 5.4.1
Suppose that N is a resistive n-port network with
impedance matrix Z. Partition Z as Z
=
1 s n. Then Z' =
is the impedance matrix of
the network N'formed by shorting the last sports of N N is reciprocal.
Iii
Proof Write i = I
.°
I. v
L13J
lvi where
= [°j
v5eC5. Then v = Zi may be
s
written as V0
= =
ho
+ (2)
+
Suppose now that the last s ports of N are shorted. We must determine the matrix X such that v0 = Xi0. Since the last s ports are shorted, =0. Thus the second equation of(2) becomes Z22i = — Z21i0. Hence
=
—
+ [I
—
=—
+ h where h€N(Z22). (3)
If (Zi, 1) = 0, then Zi = 0 since Z 0. Thus N(Z22)c N(Z32). (Consider i with = 0). Substituting equation (3) into the first equation of (2) now + h)) = Z1 110 — gives v0 =Z1 110 + Z1215 = Z1 110 + Z12( — = (Z11 — as desired. The zero blocks appear in
the V matrix for the same reason that zeros appeared in the special
case(1). U Z' is sometimes referred to as a shorted matrix. Properties of shorted matrices often correspond to physical properties of the circuit. We will mention one. Others are developed in the exercises along with a generalization of the definition of shorted matrix. Suppose that Z,Z' are as in Proposition 1. Then Z V 0. This corresponds to the physical fact that a short circuit can only lower resistance of a network and not increase it. It is worth noting that in the formula for V in Proposition 1 a weaker
88 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
type of inverse than the generalized inverse would have sufficed. However,
as we will see later, it would then have been more difficult to show that Z Z'. In the parallel sum the Penrose conditions are needed and a weaker inverse would not have worked. 5.
Other uses of the generalized inverse
The applications of the generalized inverse in Sections 3 and 4 were chosen
partly because of their uniqueness. There are other uses of the generalized inverse which are more routine. For example, suppose that we have an n-port network N with impedance matrix Z. Then v = Zi. It might be desirable to be able to produce a particular output v0. In that case we would want to solve v0 = Zi. If then we must seek approximate solutions. This would be a least squares problem as discussed in Chapter 2. would correspond to that least squares solution which requires the least current input (in the sense that Iii II is minimized). Of course, this approach would work for inputs and outputs other than just current inputs and voltage outputs. The only requirements are that with respect to the new variables the network has the properties of homogeneity and superposition otherwise we cannot get a linear system of equations unless a first order approximation is to be taken. In practice, there should also be a way to compute the matrix of coefficients of the system of equations. Another use of the generalized inverse is in minimizing quadratic forms subject to linear constraints. Recall that a quadratic form is a function 4) from C" to C of the form 4)(x) = (Ax, x), AE C" Xst for a fixed A. The instantaneous power dissipated by a circuit, the instantaneous value of the energy stored in the inductors (if any are in the circuit), and the instantaneous value of the energy stored in the capacitors, may all be written in the form (Al, i) where i is a vector made up of the loop currents. A description of loop currents and how they can be used to get a system of equations describing a network may be found in Chapter III of Huelsman. His book provides a very good introduction to and amplification of the ideas presented in this chapter.
6.
Exercises
For Exercises (I)—(8) assume that A, B, C, D 0 are n x n hermitian matrices.
1. Show that A:B=B:A. 2. Show that A:B0. 3. Prove that R(A : B) = R(A) R(B).
4. Prove that (A:B):C=A:(B:C). *5 Prove that Tr (A : B) (Tr A) : (Tr B). *6. Prove that det (A : B) (det A) : (det B).
THE GENERALIZED INVERSE IN ELECTRICAL ENGINEERING
Fig. 5.14
Fig. 5.13 7.
89
a. Provethat ifAB,then
b. Formulate the physical analogue of 7a in terms of impedances. *8. Show that (A + B) : (C + D) A : C + B D. (This corresponds to the assertion that Fig. 5.13 has more impedance than Fig. 5.14 since the latter has more paths.) For Exercises (9)—(14) assume that is an hermitian positive semidefinite operator on C". Pick a subspace M c C". Let B be an orthonormat basis consisting of With respect to B, first an orthonormal basis for M and then one for
A
B
has the matrix E =[B* C]' where E 0. Define EM =[ 9.
A_BCtB* 0 o
Prove that EEM.
10. Show that if D is hermitian positive semi-definite and E D 0 and
R(D)cM, DEM. *11. Prove that EM=lim E:nPM. For the next three exercises F is another hermitian positive semi-definite matrix partitioned with respect to B just like E was.
12. Suppose that E F 0. Show that EM FM 0. 13. Prove that (E + F)M EM + FM. 14. Determine when equality holds in Exercise 13. Let L. M be subspaces. Prove = :
7.
References and further reading
A good introduction to the use of matrix theory in circuit theory is [44].
Our Section 2 is a very condensed version of the development there. In particular, Huelsman discusses how to handle more general networks than ours with the port notation by the use of 'grounds' or reference nodes. Matrices other than the impedance matrix are also discussed in detail. Many papers have been published on n-ports. The papers of Cederbaum, together with their bibliographies, will get the interested reader started. Two of his papers are listed in the bibliography at the end of this book [27], [28]. The parallel sum of matrices has been studied by Anderson and colleagues in [2], [3], [4], [5] and his thesis. Exercises (1 )—(8) come from [3] while (9)—(14) are propositions and lemmas from [2]. The theory of shorted operators is extended to operators on Hubert space in [5]. In [4] the operations of ordinary and parallel addition are treated as special
90 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
of a more general type of matrix addition. The minimization of quadratic forms is discussed in [10]. The authors of [42] use the generalized inverse to minimize quadratic forms and eliminate 'unwanted variables'. The reader interested in additional references is referred to the bibliographies of the above and in particular [4]. cases
6
(/jk)-generalized inverses and linear estimation
1.
Introduction
We have seen in the earlier chapters that the generalized inverse At of A E C'" although useful, has some shortcomings. For example: computation of At can be difficult, At is lacking in desirable spectral properties, and the generalized inverse of a product is not necessarily the product of the generalized inverses in reverse order. It seems reasonable that in order to define a generalized inverse which overcomes one or more of these deficiencies, one must expect to give up something. The importance one attaches to the various types of generalized inverses will depend on the particular applications which one has in mind. For some applications the properties which are lost will not be nearly as important as those properties which are gained. From a theoretical point of view, the definition and properties of the generalized inverse defined in Chapter 1 are probably more elegant than those of this chapter. However, the concepts of this chapter are considered by many to be more practical than those of the previous chapters.
2.
Definitions
Recall that L(C", C'") denotes the set of linear transformations from C"
into C'". For 4eL(C",C'"), was defined in Chapter 1 as follows. C" was denoted the decomposed into the direct sum of N(4) and a one to one mapping of restriction of 4 to N(4)', so that 4j was onto R(4). was then defined to be Atx -
if xeR(A) — tO
if
Instead of considering orthogonal complements of N(4) and R(4), one could consider any pair of complementary subspaces and obtain a linear transformation which could be considered as a generalized inverse for 6.
92 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Definition 6.2.1
(Functional Definition) Let 44eL(C", C'") and let N and R be complementary subs paces of N(4) and R(44), that is, C" = N(44) + N and C'" = R(4) + R. Let 41 =44 L (i.e. 44 restricted to N. Note that 441 is a one to R(44)—' N exists). For xeC'", one mapping of N onto R(44) so that 44 let x = r1 + r2 where r1 ER(4) and r2eR. The function QN,R defined by
9 is either called the (N, R)-generalized inverse for 4 or a prescribed range!
null space generalized inverse for 44. For a given N and R, QNR is a uniquely defined linear transformation from C'" into C". Therefore,for AECrnXA, A induces afunction 44 and we can define GNR€Cnxrn to be the matrix 0fQNR (with respect to the standard basis).
In the terminology of Definition 1, At is the (R(A*), inverse for A. In order to avoid confusion, we shall henceforth refer to At as the Moore—Penrose inverse of A. In Chapter 1 three equivalent definitions for At were given; a functional definition, a projective definition (Moore's definition), and an algebraic definition (Penrose's definition). We can construct analogous definitions for the (N, R)-generalized inverse. It will be assumed throughout this section that N, R are as in Definition 1. The projection operators which we will be dealing with in this chapter will not necessarily be orthogonal projectors. So as to avoid confusion, the following notation will be adopted. Notation. To denote the oblique projector whose range is M and whose null space is N we shall use the symbol The symbol will denote, as before, the orthogonal projector whose range is M and whose null space
is M'. Starting with Definition 1 it is straightforward to arrive at the following
two alternative characterizations of GNR.
Definition 6.2.2 (Projective Definition) For AeC'"
is called the
Il) = N, N(GNR) = R, AGNR =
(N, R)-generalized inverse for A and GNRA =
Definition 6.2.3 (Algebraic Definition) For AeC'" (N, R)-generalized inverse for A and GNRAGNR = GNR.
",
GNR is the
R(GNR) = N, N(GN R) = R, AGN RA = A,
Theorem 6.2.1
The functional, the projective, and the algebraic definitions of the (N, R)-generalized inverse are equivalent.
Proof We shall first prove that Definition 1 is equivalent to Definition 2 and then that Definition 2 is equivalent to Definition 3. If QN R satisfies the conditions of Definition 1, then it is clear that R(QNR)'= N and N(Q$R) = R. But 44QNR is the identity function on R(4) and the zero function on R. Thus AGNR
=
Similarly GNRA =
GENERALIZED INVERSES AND LINEAR ESTIMATION
93
and Definition I implies Definition 2. Conversely, if GNR satisfies
the conditions of Definition 2, then N and R must be complementary subspaces for N(A) and R(A) respectively. It also follows that must be the identity on R while is the identity on N and zero on N(A). Thus QNR satisfies the conditions of Definition I, and hence Definition 2 implies Definition 1. That Definition 2 implies Definition 3 is clear. To complete the proof, we need only to show that Definition 3 implies Definition 2. Assuming G satisfies the conditions of Definition 3, we obtain (AG)2 = AG and (GA)2 = GA so that AG and GA are projectors. Furthermore, R(AG) R(A) = R(AGA) c R(AG) and R(GA) R(G) = R(GAG) c R(GA). Thus R(AG) = R(A) and R(GA) = R(G) = N. Likewise, it is a simple matter to show that N(AG) = N(G) = R and N(GA) = N(A), so that AG = and GA = which are the
conditions of Definition 2 and the proof is complete. • K
Corollary 6.2.1 For AeCTM the class of all prescribed range/null space generalized inverses for A is precisely the set (1)
i.e. those matrices which satisfy the first and second Penrose conditions. The definition of a prescribed range/null space inverse was formulated as an extension of the Moore—Penrose inverse with no particular applications in mind. Let us now be a bit more practical and look at a problem of fundamental importance. Consider a system of linear equations written as Ax = b. If A is square and non-singular, then one of the A characteristics of is the solution. In order to generalize this property, one migbt ask for A Ctm what are the characteristics of a matrix GE Km such that Gb is a solution of Ax = b for every beCTM for which Ax = b is consistent? That is, what are the characteristics of G if AGb = b for every beR(A)?
(2)
In loose terms, we are asking what do the 'equation solving generalized inverses of A' look like? This is easy to answer. Since AGb = b for every beR(A), it is clear that AGA = A. Conversely, suppose that G satisfies AGA = A. For every beR(A) there exists an such that Ax,, = b. Therefore, AGb = AGAx,, = Ax, = b for every bER(A). Below is a formal statement of our observations.
Theorem 6.2.2
For AeCTM has the property that Gb is a solution of Ax = bfor every be Ctm for which Ax = b is consistent and only if
GE{XECAXmIAXA = A}.
Thus the 'equation solving generalized inverses for A' are precisely those which satisfy the first Penrose condition of Definition 1.1.3.
(3)
94 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
such that, in addition to being an 'equation solving inverse' in the sense of (2), we also z for all z Gb and require that for each beR(A), Gb ze{x Ax = b). That is, for each beR(A) we want Gb to be the solution of minimal norm. On the basis of Theorem 2.1.1 we may restate our objective as follows. For each beR(A) we require that Gb = Atb. Therefore, G must satisfy the equation GA = AtA = (4) Let us now be more particular. Suppose we seek
which is equivalent to
AGA = A and (GA)* = GA.
(5)
The equivalence of (4) and (5) is left to the exercises. Suppose now that G is any matrix which satisfies (5). All of the above implications are reversible so that Gb must be the solution of minimal norm. Below is a formal statement of what we have just proven.
Theorem 6.2.3
For AeC'"
and beR(A), A(Gb) = band
0, use Theorem 3.1 and write
(A"' + zI)
r(cs+1
I —'C'
I
01
-:
=
+ zI) 'C' = C- ', (2) is proven. A' and (1) follows. •
Since C is non-singular, and Jim (C"'
If l k,
z-O
then C? = Since it is always true that k n, we
also have the following
THE DRAZIN INVERSE
Corollary 7.7.1
AD
For
The index of
= urn
137
+ zI) 'An.
also be characterized in terms of a limit. Before doing this, we need some preliminary results. The first is an obvious consequence of Theorem 3.1. can
Xli
Lemma 7.6.1 Let Ind(A") = I and only
a singular matrix. For a positive integer p, p Ind(A). Equivalently, the smallest positive integer Ifor which Ind(A1) = 1 is the index of A. be
Lemma 7.6.2 Let be a nilpotent matrix such that Ind(N) = k. For non-negative integers m and p. the limit (3)
z-0 exists
and only
+ p k. When the limit exists, its value is given by
lit4
=
urn zm(N + zI)
ifm>O ,fm=O
0
z—0
(4)
Proof If N = 0, then, from Lemma 2.1, we know that k =
1.
The limit
under consideration reduces to
ifp=O
ifpl.
0,
z—0
It is evident this limit will exist if and only if either p 1 or m 1, which is equivalent to m + p 1. Thus the result is established for k = 1. Assume k—i
N1
nowthatk1,i.e.N#O.Since(N+zIY1=
Z
1=0
+(_lr_2zNm+P_2 ÷
+(—
+
(—
z
+ (5)
.
If m + p k, then clearly the limit (3) exists. Conversely, if the limit (3) exists, then it can be seen from (5) that
Theorem 7.6.2
For
= 0 and hence m + p k. U
where Ind(A) = k and for non-negative
integers m and p. the limit (6) urn z"(A + zI) z-0 exists tf and only !fm + p k: in which case the value of the limit is given by
limf'(A + zI)'A" =
{( — Ir
(I_AAD)Am4Pi,
(7)
138 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Proof If k =0, the result is immediate. Assume k 1 and use Theorem 2.1 where P and C are non-singular and N is
to write A = nilpotent of index k. Then
zm(A + zI) 'A"=
(8)
Because C is non-singular, we always have
limz'"(C+zI)
1
,
z-.O
ifm>O ifm=O
(9)
Thus the limit (6) exists if and only if the limit lim z"(N + zI) exists, :-. 0 which, by Lemma 2, is equivalent to saying m + p k. The expression (7)
is obtained from (8) by using (9) and (4). • There are some important corollaries to the above theorem. The first characterizes Ind(A) in terms of a limit. Corollary 7.6.2
For AeC"
the following statements are equivalent
(i) Ind(A) = k. (ii) k is the smallest non-negative integer such that the limit lim (A + zI) tAk exists. z-0 (iii) k is the smallest non-negative integer such that the limit urn z*(A +
exists.
(iv) If Ind(A) = k, then lim (A + zI) iAk = (AAD)Akl z-0 (v) And when k> 0, lim zk(A + zI)' = (—
=
1)
1)
z-0
Corollary 7.6.3 For lim (A + zI) '(A' + z'I) = z-'O
Corollary 7.6.4
A''.
For
and for every integer l Ind(A) >0,
the following statements are equivalent.
(1) Ind(A)l. (ii) lim(A+zI)1A=AA'. z-0
=I—AA'.
(iii) Jim
z-0 The index can also be characterized in terms of the limit (1).
Theorem 7.6.3
For
X
the smallest non-negative integer, 1, such
that
lim (A'11 +zI)'A' exists is the index of A.
(10)
THE DRAZIN INVERSE
139
Proof If Ind(A) = 0, then the existence of(10) is obvious. So suppose Ind(A) = k 1. Using Theorem 2.1 we get
zI)
— —
0
L
1
The term (C'4 ' + zI) IC: has a limit for alt 1 0 since C is invertible.
which has a limit if and only if N' =0. That is, 1 Ind(A). U
The Drazin inverse of a partitioned matrix
7.
This section will investigate the Drazin inverse of matrices partitioned as
M
fl where A and C are always assumed to be square.
=
Unfortunately, at the present time there is no known representation for MD with A, B, C, D arbitrary. However, we can say something if either D =0 or B =0. In the following theorem, we assume D =0.
If M
Theorem 7.7.1
where A and Care square,
= k = Ind(A), and 1= Ind(C), then MD =
X = (AU)2[
(AD)IBCi](I
—
[AD
CD] where
CCD)
jzO
+ (I —
—
= (AD)2[
—
ADBCD
CCD)
IwO
rk-* 1 + (I — AAD)I Z AB(CDY l(CD)2 — ADBCD. Ls.O J (We define 00
(2)
I)
Proof Expand the term AX as follows. i—i
lCD
'BC' — 0
o
-
k—i
s—i
AX = k—I
2 —
0
140 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
=
/
\ / ADBCCD
1-2
I ADB +
(AD)1. 2BC11' 1—I
0
++
+ /k—I
+1
iB(CDYi +
— (
k-i
+E
0
1k-i
= ADB +
1-2
)—
1—2
1—2
(AD)i+2BCI.I.l 0 —
—
ADBCCD — 0
k-i
ADA1+ iB(CD)i+l
+
—
AADBC.
Now expand the term XC as follows. (AD)I.I.2BCI.41
XC =
k—i
I—i
1—i
o
(AD)i+2BC1+2CD +
—
0
o k— I
—
—
ADBCDC
0
/1—2
'+' +(AD)l1BCI
=1
'BC') + (BcD +
+
1B(CDY11)
+ It
—
\
/1—2
1—f
\o
I
k—i
AIB(CDY+1)
—
(ADABCD
ADBCDC.
ADX AB cD]=[o CD][o ci'
AB ADX
Fromthisitfollowsthat[0
so that condition (3) of Definition 2.3 is satisfied. To show that condition (2)
holds, note that
FAD X irA B1IAD X] [0 CDJ[o cj[o CDf[0
I
ADAX + XCCD + ADBCD CD
:
Thus, it is only necessary to show that ADAX + XCCD + ADBCD = X.
However, this is immediate from (2). Thus condition (2) of Definition 2.3 is satisfied. Finally, we will show that
IA
[o cJ
[0
X 1
IA
cDf[o c] IA
.]
(3)
IA' S(p)1 =Lo c'
THE DRAZIN INVERSE
141
p—i
'BC'. Thus, since n + 2> k and n + 2>1,
whereS(p)= 1=0
IA
]
X
[0 C]
An42X+S(fl+2)CD
c°][o
[o
Therefore, it is only necessary to show that 2X + S(n + 2)CD S(n + 1). Observe first that since 1+ k < n + 1, it must be the case that AhI(AD)i
=
for I = 1, 2, ...
=
(AD)IBCI 1(1
(4)
,1 — 1.
Thus, 1 Li=o
ri-i
= L 1—0
—
CCD)
1BCD
—
(5)
J 1
An-IBCI](I — CCD) —
I—i
I—i
1=0
10
IBCD
= =
Now, S(n +
IBCICD
=
IBCICD 1=0
1—0 11+ i
+ 1 1
1
- IBCICD =
By writing
1BCD
.IBCICD
+
1—0
=A11BCD+
1=1 l—1
1=0
we
obtain 1_i S(n
+
2)CD
=
1BCD
lCD +
+ 1=0
(6) 1=1+ 1
It is now easily seen from (5) and (6) that 1—1
A"2X +S(n + 2)CD_ Z
+1—1+1 Z
i—I
n
1=0
1=1
=
=
A"'BC' = S(n +
1),
i—0
which is the desired result. U By taking transposes we also have the following.
142 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
If
Corollary 7.7.1
L
where
A andC are square
= with Ind(A) = k and Ind(C)
1, then LD is given by LD
AD] where
=
X is the matrix given in (2).
There are many cases when one deals with a matrix in which two blocks are zero matrices. Corollary 7.7.2
Let
lAB] Mi=[0
lAo]
1001
A
A is square and each
is square. Then,
rAD
rAD 0
—L0
lOB
]'
2—
til "IMD_ 3 — (AD)2B AD 0]'
and B(AD)2 AD
Each of these cases follows directly from Theorem 1 and Corollary 1.
The next result shows how lnd(M) is related to Ind(A) and Ind(C). Theorem 7.7.2
If M
with A, C square, then
= Max{Ind(A), Ind(C)} Ind(M) lnd(A) + Ind(C).
[
Proof By using (iii) of Corollary 6.2 we know that if Ind(M) =
m,
then
the limit
limf'(M+zI)'
(7)
z-0
exists. Since
z"(M + 1)'—
+ 0
f'(C+zI)'
J'
(8)
one can see that the existence of the limit (7) implies that the limits
lim z"(A + 21)_i and urn f'(C + zI)' exist. From Corollary 6.2 we can z—'O
conclude that Ind(A) m = Ind(M) and Ind(C) m = Ind(M), which
establishes the first inequality of the theorem. On the other hand, if Ind(A) = k and Ind(C) =1, then by Theorem 6.2 the limits lim zkft(A + zI) lim + zI)' and lim 2k '(A + zI)'B(C + ZI)_1 = z—O z—O z—'O lim [z"(A + zI) I] B[z(C + zI)'J each exist. z—O
THE DRAZIN IN VERSE 143
+ zI)
Thus urn
exists and lnd(M) k +
I.
U
In the case when either A or C is non-singular, the previous theorem reduces to the following.
Corollary 7.7.3
Let AECPXr,
C
non-singular, (Ind(C) = 0), then lnd(M) = Ind(A). Likewise, if A is nonsingular, then Ind(M) = Ind(C).
The case in which Ind(M) 1 is of particular interest and will find applications in the next chapter. The next theorem characterizes these matrices. Theorem 7.7.3
If
and M
then Ind(M)
1
=
ifand only if each of the following conditions is true: Ind(A) 1, Ind(C) 1,
(9)
and
(I—AA')B(I—CC')=O.
(10)
Furthermore, when M' exists, it is given by
c'
Lo
(11)
Proof Suppose first that Ind(M) 1. Then from Theorem 2, it follows that Ind(A) 1, Ind(C) I and (9) holds. Since Ind(M) 1, we know that MD = M' is a (1)-inverse for M. Also, Theorem 5.1 guarantees that M' is
a polynomial in M so that M' must be an upper block triangular (1)-inverse for M. Theorem 6.3.8 now implies that (10) must hold. Conversely, suppose that (9) and (10) hold. Then (9) implies that AD = A' and CD = C'. Since A' and C' are (1)-inverses for A and C, (10) along with Theorem 6.3.8 implies that there exists an upper block triangular (1)-inverse for M. Theorem 6.3.9 implies that rank(M) = rank(A) + rank(C). Similarly, rank(M2) = rank(A2) + rank(C2) = rank(A) + rank(C) = rank(M)
so that Ind(M) 1. The explicit form forM' given in (11) is a direct consequence of (2).
U
Corollary 7.7.4
If Ind(A) N(C) c N(B), then Ind(M)
1, 1
Ind(C) 1, and either R(B)
R(A) or
where M is as in Theorem 3.
Proof R(B) c R(A) implies that (I — AA' )B =0 and N(C) c N(B)
impliesthatB(I—CC')=O. •
It is possible to generalize Theorem 3 to block triangular matrices of a
general index.
144 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Theorem 7.7.4
Let
both be singular
non-singular, then Corollary 3 applies) and let M
A"'
positive integer p, let S(p) =
either is
For each
=[
with the convention that 00
= I.
i— 0
Then, Ind(M)
m if and only
each of the following conditions are true:
Ind(A)m, Ind(C)m,and
(12) (13)
(I — AAD)S(m)(I
—
CCD) =0.
(14)
Proof Notice that for positive integers p.
M"
(15)
= [s—— Assume first that Ind(M) m. Then Ind(MM) = 1. From Theorem 3 and the singularity of A, C we can conclude that Ind(AM) = 1, Ind(C'") = 1, and (I — Am(Am)D)S(m)(I
(16)
=0.
—
(17)
Then (12), (13) hold by (16). Clearly, (17) reduces to (14). Conversely, suppose (12)—(14) hold. Then (16) and (17) hold. Theorem 3 now implies
that Ind(Mm) = 1. Therefore, Ind(M) 1. • SeCNXS, such that rank(R) = n and Lemma 7.7.1 For rank(S) = n, it is true that rank(RTS) = rank(T).
Proof Note that RtR = and = Thus rank(RTS) rank(T) rank(RtRTSSt) rank(RTS). We now consider the Drazin inverse of a non-triangular partitioned matrix.
•
Theorem 7.7.5
Let
and M
If rànk(M) = rank(A)= r,
= then Ind(M) = Ind[A(I — QP)] +1= Ind[(I — QP)A] +1, where
P=CA' andQ=A'B.
Proof From Lemma 3.3.3, we have that D = CA 'B so that M=
Q]. Thus, for every positive integer i;
+
QJ
+ QP)AJ'1[I
Q]. (18)
THE DRAZIN INVERSE
Since
[i,]
A has full column rank and [I
Q]
145
has full row rank, we can
')= rank([A(I + QP)]"). Therefore, rank([A(I + QP)]")= rank([A(I + QP)] 1) if and only if ')= rank(M). Hence Ind([A(I + QP)]) + 1 = Ind(M). In a similar manner, one can show that Ind[(I + QP)A] + 1 = Ind(M). • conclude from Lemma 1 that
An immediate corollary is as follows.
Corollary 7.7.5 For the situation of Theorem 5, Ind(M) = 1 and only tf (I + QP) is non-singular. The results we are developing now are not only useful in computing the index but also in computing AD. Theorem 7.7.6 rank(A) = r, then
1]A[(sA)2riI!A_IB]
=
(19)
Proof Let R denote the matrix R
1][
=
A 'B],
and let m = Ind(M). By using (18), we obtain l[(44S)2]DA[I
'R
By Theorem 5 we know Ind(AS) = m —
1
A 'B].
so that
'[(4%S)2J° =
(ASr-'. Thus, it now follows that
'A[I
'R
A 1B] = MM.
The facts that MR = RM and RMR are easily verified and we have that R satisfies the algebraic definition for MD. The second equality of
(19)is similarly verified. U The case when Ind(M) = 1 is of particular interest.
Theorem 7.7.7
Let
If rank(M)=
and M
=
S'
exists where S = I + rank(A) = r, then Ind(M) = 1 if and only A 1BCA '.When Ind(M) =1, M' is given by
M'
146 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
+BC)'A(A2 + BC)'[A!BI. This result follows directly from Corollary 5 and Theorem 6. Before showing how some of these results concerning partitioned matrices can be used to obtain the Drazin inverse of a general matrix we need one more Lemma. Consider a matrix of the form M
Lemma 7.7.2 AeCPXF.
] where
=
Ind(A) Ind(M) Ind(A)+ 1.
(20)
Furthermore, suppose rank(M) = r. Then Ind(M) = non-singular.
1
and only
A is
Proof (20) follows from Theorem 2. To prove the second statement of the lemma, first note that md (A) =0 implies md (M) = md (0) = 1 by Corollary 3. Conversely, if rank (M) = r and md (M) = 1, then r = rank(M2) or equivalently, r = rank ([A2, AB]) rank (A) r. Thus rank (A) = r so
that A-' exists. • The next theorem can be used to compute the group inverse in case
Ind(M)=' 1. Theorem 7.7.8
Let
where R is a non-singular matrix
suchthatRM_[
]whereUeC". Then MD — R
LO
—
If Ind(M) =1, then M' =
0
R
R
Proof
Since Ind(M) = Ind(RMR '), we know from Lemma 2 that Ind(M) = 1 if and only if U' exists. The desired result now follows from
Corollary 2. • Example 7.7.1
1='Ii
12 Li
2
Let 1
4
2
0
0
We shall calculate M' using Theorem 7. Row reduce [M I] to
R]
THE DRAZIN INVERSE
147
where EM is in row echelon form. (R is not unique.) Then,
Ii 0 01 EM=I0
0
1
ii
0 0
1
LO 0 0]
[—2
[1
—fl, 0]
1
2
0
andRM=E%I.NowR1=12 4
1
[1
0
0
120 [KV sothatRMR'=
00:0 00
Clearly, U' exists. This imphes that Ind(M) = I and rig—' ' ii—2v1
[0
—
0
J
R—
—8
4
2
—10 —5
21
From Lemma 6.1, we know that lip Ind(M), then Ind(M") =
1. Thus, = the above method could be applied to to obtain For a general matrix M with p Ind(M), MD is then given by MD = = Another way Theorem 7 can be used to obtain M of index greater than 1 is described below. Suppose p Ind(M). Then
Ind((RMR ')')= Ind(RM"R ')= Ind(M")= 1. Thus if RM
then RMR'
S
=
T
0]and(RMR'Y'=
].
It follows from (20) that Ind(SP) 1. Therefore one can use Theorem 7 to (This is an advantage because
fmd MD
=R
=
r(SP).sP—
'L
is a smaller size matrix.) Then, 1
1
(SP)#SP.. 2T1
]R.
Finally, note that the singular value decomposition can be used in conjunction with Theorem 7. (See Chapter 12).
8.
Other properties
It is possible to express the Drazin inverse in terms of any (1)-inverse. Theorem 7.8.1
If
is such that Ind(A) = k, then for each integer
148 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
I k, and AD =
any (1)-inverse of l)tAI
X= AD
01
= P[0
rc-2:-i
AD
=
In particular,
',P and C non-singular and N nilpotent.
Proof Let A = Then
;
'.If X is a (1)-inverse, then it is easy to see
1
where X1, X2, X3 are arbitrary. That
L"i
= A'XA' is easily verified by multiplying the block matrices together. U In Theorem 1.5, the Moore—Penrose inverse was expressed by using the full rank factorization of a matrix. The Drazin inverse can also be obtained by using the full rank factorization.
Theorem 7.8.2 Suppose A e and perform a sequence offull rank factorizations: A = B,C1,C1B, = B2C2,C2B2 = B3C3,... so that isafull rank factorization of C,_ ,B1_ 1 ,for i = 2,3 Eventually, there will be a pair offactors, Bk and Ck, such that either (CkBk) 1 exists or CkBk =0. If k denotes the first integer for which this occurs, then
fk when (CkBk)' exist 'lk+l whenCkBk=O.
I nd(A'
—
When CkBk is non-singular, rank(Ak) = number of columns of Bk = number of rows of
and
R(Ak)= R(B, B2...
Bk), N(Ak)= N(CkCk...l ... C1).
Moreover, AD
— 5B1 B2 ... 0
C1
when
exists
when CkBk=O.
Proof IfC1B1isp xpand has rankq0 means each entry of X is positive.)
If an ergodic chain is not regular, then it is said to be cyclic. It can be shown that if an ergodic chain is cyclic, then each state can only be entered at periodic intervals. A state is said to be absorbing if once it is entered, it can never be left. A chain is said to be an absorbing chain if it has at least one absorbing state and from every state it is possible to reach an absorbing state (but not necessarily in one step). The theory of finite Markov chains provides one of the most beautiful and elegant applications of the theory of matrices. The classical theory of Markov chains did not include concepts relating to generalized inversion of matrices. In this chapter it will be demonstrated how the theory of generalized inverses can be used to unify the theory of finite Markov chains. It is the Drazin inverse rather than any of the (i,j, k)-inverses which must be used. Some types of (1)-inverses, including the Moore—Penrose inverse, can be 'forced' into the theory because of their equation solving abilities. However, they lead to cumbersome expressions which do little to enhance or unify the theory and provide no practical or computational advantage. Throughout this chapter it is assumed that the reader is familiar with the classical theory as it is presented in the text by Kemeny and Snell [46]. All matrices used in this chapter are assumed to have only real entries so that (.)* should be taken to mean transpose.
2.
Introduction of the Drazin inverse into the theory of finite Markov chains.
For an rn-state chain whose transition matrix is T, we will be primarily concerned with the matrix A = I — T. Virtually everything that one wants to know about a chain can be extracted from A and its Drazin inverse. One of the most important reasons for the usefulness of the Drazin inverse is the fact that Ind(A) = I for every transition matrix T so that the Drazin inverse is also the group inverse. This fact is obtainable from the classical theory of elementary divisors. However, we will present a different proof utilizing the theory of generalized inverses. After the theorem is proven, we will use the notation A in place of in order to emphasize the fact that we are dealing with the group inverse.
Theorem 8.2.1
If TE W" is any transition matrix (i.e. T is a stochastic matrix) and if A = I — T, then Ind(A) = 1 (i.e. exists).
APPLICATIONS OF THE DRAZIN INVERSE TO THE MARKOV CHAINS 153
Proof The proof is in two parts. Part I is for the case when T is irreducible. Part H is for the case when T is reducible. (I) If T is a stochastic matrix and j is a vector of l's, then Tj = j so that = 1,(see page 211), it follows that p(T)= 1. 1EO(T). Since p(T) lIT If T is irreducible, then the Perron — Frobenius Theorem implies that the eigenvalue 1 has algebraic multiplicity equal to one. Thus, Oec(A) with algebraic multiciplicity equal to one. Therefore, Ind(A) = 1, which is exists by Theorem 7.2.4. N equivalent to saying that Before proving Part II of this theorem, we need the following fact. Lemma 8.2.1 If B 0 is irreducible and M 0 Is a non-zero matrix that B + M = S is a transition matrix, then p(B) < 1.
such
Proof Suppose the proposition is false. Then p(B) 1. However, since 1. Thus, p(B) S is stochastic and M 0, it follows that B B 1. Therefore, it must be the case that 1 = p(B) = p(B*). The Perron—Frobenius Theorem implies that there exists a positive eigenvector. v >0, corresponding to the eigenvalue 1 for B. Thus, v = Bv = = j*S*v — j*M*v = (S* — M)v. By using the fact that Sj = j, we obtain j*i, — Therefore, j*M*v =0. However, this is impossible because
•
We are now in a position to give the second part of the proof of Theorem 1. (II) Assume now that the transition matrix is reducible. By a suitable permutation of the states, we can write
T
ix
(—
indicates equality after a suitable permutation
LO ZJ
hasbeenperformed)
where X and Z are square. If either X or Z is reducible, we can perform another permutation so that
ruvw TJ0 CD. Lo
0
E
If either U, C or E, is reducible, then another permutation is performed. Continuing in this manner, we eventually get B11
0
B22
o
O..:B,,,,
is irreducible. If one or more rows of blocks are all zero except where =0 for for the diagonal block (i.e. if there are subscripts I such that
154 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
each
k # I), then perform one last permutation and write T11
T12
O
T23
...
T1, T2,
T1,+2 T2,..2
T1,+1
Tir+i
...
...
T 0
0
o
Each T11 (1= 1,2,... ,n) is irreducible. From Part I of this proof, we know that (I — T11)' exists for every i. However, for i = 1,2, ... ,r, there is at least r, by an identity matrix and call the resulting matrix 1'. From Theorem 2.1 we know that < 1 for i = 1,2,... ,r so that urn '1" exists A-.
and therefore must be given by urn f
= I — AA' where A = I — f. This
modified chain is clearly an absorbing chain and the probability of eventual absorption into state when initially in 5", is given by
(urn
/1k
= (I — AA')Ik. From this, it should be clear that in the original
chain the probability of eventual absorption into the set [9',] is simply
the sum over
of the absorption probabilities. That is,
P (eventual absorption into [5",] I initially in .9'.) =
(I —
AA '),,. (3)
'ES."
We must now show that the
can be eliminated from (3). In order to do
APPLICATIONS OF THE DRAZIN INVERSE TO THE MARKOV CHAINS
167
this, write A and A as
Ill
:oi
G
A—
'
I
E
Theorem 2.1 guarantees that
I-
II
ndA—
12
I
G22. is non-singular and Theorem 7.7.3 yields
=
When T is in the form (1) of Section 2, the set of indicies 91k Will be = {h,h + 1, h + 2,... h + r}. Partition I — AA' and sequential. I— as follows:
columns h, h +
I
o...0 0...0
1,...h+t
WIN
...
Wpq
row i is
in here W,fl
=
(4)
0...0
0".0 0...0 and
V
W1qWqq •..
i.r+ i
1w,+ i.,+ +
I—AM =
+
+
columns h, h + 1, ... h + 1
W,,qWqq
1
row
... W,,,W,,,
v-.lis }
...
...
(5)
W+ir+i
...
0
...
0
...
0
0
where W
=I—
the gth row of
In here
Suppose the ith row of I — AA lies along If P denotes the probability given in (3), then it
168 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
is
clear that P is given by
P = gth row sum of W,,,.
(6)
It is also evident that
= gth row sum of
(I —
(7)
lEfk
Since Wq is the limiting matrix of Tqq and Tqq is the transition matrix of an ergodic it follows that the rows of Wqq are identical and sum to 1. Therefore, the gth row sum of W,,qWq = gth row sum of W,,,, and the
desired result now follows by virtue ol(6) and (7). U Theorem 8.6.4
If
is a transient state and is an ergodic state in an ergodic class [6"j which is regular, then the limiting value for the nth step = (I — AA')Ik. transition probability from 50• to 60k is given by lim SI-.
Proof It is not hard to see that urn absorption
= P (eventual
where
=
A-.
into [9'J I initially in 6".) and
the component of the fixed probability vector associated with [5°,] corresponding to the state 6",. Suppose [9'J corresponds to Tqq when the transition matrix T is written in the form (1) of Section 2, and suppose the ith row of I — AA' lies along the gth row of the block in (5). The kth column of I — AA' must therefore lie along one of the columns of W say thefth one, so that = we can use (6) to obtain (I — AA = (gth row sum of WN) x is
W
W,Pk
U
=
itself contain important information about a general chain with more than one transient state. The elements of
Theorem 8.6.5
If 6". and 9', are transient states, then (A')1, is the expected number of times in .9', when initially in Furthermore 5°. and 6", belong to the same transient set (A')1, > 0 and (A' >0
Proof Permute the states so that T has the form (1) of Section 2 so that T=
and A =
[!j9.-! T2E]
where p(Q) < I. Notice that TSk = Qil because 6". and 6", are both
\
/1,—i
transient states. By using the fact that (
/a-I
T'
\
is the Q' ) = ( ) Ii* \s"o 1k expected number of times in 6°, in n steps when initially in 6",, it is easy to see that the expected number of times in .9', when initially in is
\:=o
lim
(a_i)
•
1=0
Theorem 8.6.6
For a general chain, let
denote the set of indices
APPLICATIONS OF THE DRAZIN INVERSE TO THE MARKOV CHAINS 169
corresponding to the transient states. If the chain is initially in the transient states then is the expected number of times the chain is in a k€f transient state. (Expected number of times to reach an ergodic set.)
Proof If a pennutation is performed so that the transition matrix has the form (1) of Section 2, then
O(tA — tA1)D(IA — tA1) = (A — U Using Corollary 9.6.1, one may get the following version of Theorem 2.3. Theorem 9.6.2 Suppose there exists a c such that cA + B is invertible. = (cA + B) 1B, and CA = AA + ñfor all A. Suppose Let A = (cA + B)- 1A, is not invertible for some A. Let {A,,... ,2,} be the Afor which AA + B is not invertible. Then all solutions of Ax + Bx =0 are of the form CA
r
=
(—
(5)
—
m.
or equivalently,
, (—
rn
— c + ADreAut(I
—
[(At — c)A + I]D[(A1 — c)A
+ I])q (6)
where k. = md CAL, q an arbitrary vector in
Proof The general solution of Ax + Bx =0 is x = e
ADBtADAq,
q an
arbitrary vector by Theorem 10.13. Since cA + = I, we have x=e
AD(I_CA)IADAq
=
AD +cI)IADAq.
(7)
Also if A1A + B is not invertible, then A.A + is not. Hence AA + I — cA is not. Thus 21A + B is not invertible if and only if — (A1 — c) - is an eigenvalue of A. But then (— + c) is an eigenvalue of AD. Thus A,A + B is not invertible if and only if A. is an eigenvalue of ci — Both (4) and (6)
now follow from (4), (7) and a little algebra. •
202 GENERALIZED IN VERSES OF LINEAR TRANSFORMATIONS
Note that if the k1 in Theorem 1 are unknown or hard to compute, one may use n in their place. It is interesting to note that while AA + and + B have the same eigenvalues (2 for which det(AA + B) = 0), it is the algebraic multiplicity of the eigenvalue in the pencil 2A + that is important and not the multiplicity in 2A + B. In some sense, Ax + =0 is a more natural
equation to consider than Ax + Bx =0. It is possible to get formulas like (5), (6) using inverses other than the Drazin. However, they tend to often be less satisfying, either because they apply to more restrictive cases, introduce extraneous solutions, or are more cumbersome. For example, one may prove the following corollary.
Corollary 9.6.2 Let A, B be as in Theorem 1. Then all solutions of Ax + Bx = 0 are of the form m
=
(—
—
m.
(8)
The proof of Corollary 2 is left to the exercises. Formula (8) has several disadvantages in comparison to (5) or (6). First At and CA do not necessarily commute. Secondly, in (5) or (6) one has q = x(O) while in (8) one needs to find q, such that —
= q, which may be a non-trivial task.
1=1
7.
Weak Drazin inverses
The preceding sections have given several applications of the Drazin inverse.
It can, however, be difficult to compute the Drazin inverse. One way to lessen this latter problem is to look for a generalized inverse that would play much the same role for AD as the (1)-inverses play for At. One would not expect such an inverse to be unique. It should, at least in some cases of interest, be easier to compute. It should also be usable as a replacement for in many of the applications. Finally, it should have additional applications of its own. Consider the difference equation (1)
From Section 3 we know that all solutions of (1) are o( the form x, = It is the fact that the Drazin inverse solves (I) that helps explain its applications to differential equations in Section 2. We shall define an inverse so that it solves (1) when (1) is consistent. Note that in (1), we have x,, = for I 0. Thus if our inverse is to always solve (1) it the must send R(Ak), k = Ind(A), onto itself and have its restriction to
APPLICATIONS OF THE DRAZIN INVERSE
203
to R(A"). That is. it provides the unique as the inverse of A solution to Ax = b, XER(Ak), when bE R(A"). same
Definition 9.7.1
"and k = Ind(A). Then B is a weak
that Drazin inverse, denoted Ad, Suppose
(d)
B is called a projective weak Drazin inverse of A if B satisfies (d) and
(p) R(BA) = R(AAD). B is called a commuting weak Drazin inverse of A (1 B satisfies (d) and
(c) AB=BA. B is called a minimal rank weak Drazin inverse of A B satisfies (d) and (m) rank(B) = rank(AD).
Definition 9.7.2
An (ia,... , i,,)-int'erse of A is a matrix B satisfying the
properties listed in the m-tu pie. Here i, E { 1,2,3,4. d, m, c. p }. The integers 1, 2, 3, 4 represent the usual defining relations of the Moore—Penrose inverse. Properties d, m, c, p are as in Definition 1. We shall only be concerned with properties { 1,2. m, d, c. p }. Note that
they are all invariant under a simultaneous similarity of A and B. Also 'B = Ak, and get note that one could define a right weak (d)-inverse by a theory analogous to that developed here. Theorem 9.7.1 Suppose that AEC" non-singular matrix such that TAT...I
X
k
= md A. Suppose TEC" XI?
C non-singular. Nk =0.
=
is
a
(2)
Then B is a (d)-inuerse of A if and only if
TBT'
X,Yarbitrary.
(3)
B is an (m,d)-inverse for A if and only if
Ic.-1 xl
TBT -1
Lo
B is a (p. d)-inverse
TBT'
=
oj'
Xarburary.
(4)
of A if and only if X arbitrary, YN =0.
(5)
B is a (c,d)-inverse of A ifand only if
rc-1 o' YNNY. Tff1'=Lo y]'
(6)
204 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
B is a (I, d).inverse of A
(land only if
TBT'
XN=O,Na(1).inverseofN.
(7)
B is a (2,d)-inverse of A (land only :f
YNY=Y,XNY=O.
(8)
If TAT'
is nilpotent, then (3)—(8) are to be interpreted as the (2,2)-block in the matrix. If A is invertible, then all reduce to A -
Proof Let A be written as in (2). That each of (3)—(8) are the required types of inverses is a straight-forward verification. Suppose then that B is a (d)-inverse of A. The case when A is nilpotent or invertible is trivial, so assume that A is neither nilpotent nor invertible. Since B leaves R(Ak)
invariant, we have TBT1
(d) gives only ZC"' =
Izxl
= Lo Hence
for some Z, X, Y. Substituting into
Z = C1 and (3) follows. (4) is clear.
Assume now that B satisfies (3). If B is a (p, d)-inverse then
iT' XN1\_ (F '
RkLO
° o
Thus (5) follows. If B is a (c, d)-inverse of A, then
r' cxl Lo
NY]
XN Lo YN
1'
But then CkX = XNk =0 and (6) follows. Similarly, (7) and (8) follow from (3) and the definition of properties { 1,2). Note that any number I md (A) can be used in place of k in (d). In and Ak+ lAd Ak. Although AdA and AAd are not general, AkA4A always projections, both are the identity on R(Ak). From (3), (4), and (6);
Corollary 9.7.1
is the unique (p.c. d')-inverse of A. AD is also a
(2, p, c, d)-inverse and is the unique (2, c, d)-inverse of A by definition.
Corollary 9.7.2
Suppose that Ind(A) =
1.
Then
(i) B is a (1,d)-inverse of A (land only (lB is a (d)-inverse, and (ii) B is a (2,d)-inverse of A (land only (1 B is an (m,d)-inverse.
Corollary 9.7.3
Suppose that Ind(A)
2. Then there are no (1,c,d)-
inverses or (1, p. 6)-inverses.
Proof Suppose that md (A) 2 and B is a (1, c, d)-inverse of A. Then by =0 (3), (6), (7) we have X =0, NYN = N, and NY = YN. But then
APPLICATIONS OF THE DRAZIN INVERSE 205
which is a contradiction. If B is a (1, p. d)-inverse we have by (3), (5), (7)
that X =0, Y =0, and NON = N which is a contradiction. U Most of the (d)-inverses are not spectral in the sense of [40] since no assumptions have been placed on N(A), N(Ad). However;
Corollary 9.7.4
The operation of taking (m,d)-inverses has the spectral mapping property. That is. 2 is a non-zero eigenvalue for A (land only if 1/2 is a non-zero eigenvalue for the (m, d)-inverse B. Furthermore, the eigenspaces for 2 and 1/2 are the same. Both A and B either have a zero eigenvalue or are invertible. The zero eigenspaces need not be the same.
Note that if 2 is a non-zero elgenvalue of A, then 1/1 is an eigenvalue of any (d)-inverse of A.
Corollary 9.7.5 JfB1,... , B, are (d)-inverses of A, then B1B2 ... B, is a is a (d')-inverse of Atm. (d)-inverse of A'. In particular, Corollary 5 is not true for (1)-inverses. For B
is
=
a (1,2)-
but B2 = 0 and hence B2 is not a (1)-inverse of
inverse of A
= A2 = A. This is not surprising, for (A')2 may not be even a (1)-inverse of A2.
Theorem 9.7.2
Ind(A) = k. Then
Suppose that
(i) (AD + Z(I — ADA)IZ€Cn is the set of all (d)-inverses of A, ADA {AD ADAZ(I — (ii) is the set of all (m,d)-inverses of A, + — ADA) (AD ZA = AZ) is the set of all (c, d)-inverses of A, (iii) + Z(I X
and
(iv) {AD + (I — ADA)[A(I — ADA)] 1(1 — ADA)[A(I ADA) =0) is the set of all (1, d)-inverses of A.
—
ADA)]A(I —
Proof (i)—(iv) follow from Theorem 1. We have omitted the (p. d)- and (2, d)-inverses since they are about as appealing as (iv). U Just as it is possible to calculate A' given an A, one may calculate A" from any Ad.
Jfk = lnd(A), then AD = (Ad)I+ 'A'for any I k. The next two results are the weak Drazin equivalents of Theorem 7.8.1.
Corollary 9.7.6
Theorem 9.7.3
Suppose that
A
where C is
= invertible. Then all (d)-inverses of A are given by Ad
IC1 _C.1DEd+Z(I_EàE)]. Lo
Ed
j'
206 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS Ed any (t)-inverse of E, Ea an (m, d)-inverse of E, Z an arbitrary
matrix of the
correct size.
Proof Suppose A Then Ak
with C invertible. Let k = Ind(A) = Ind(E).
=
ic' e1 = Lo
II
.
.
01.
Ek]' where Ois some matrix. Now the range of[0 0]
in R(Ak). Hence AD and any Ad agree on it. Thus (10)
Now suppose (10) is a (d)-inverse of A. Then AAdAk = A'. Hence
fC D1ICi Lo EJLO
XI1ICA 811C' & X2J1.o
II
CX1+DX211C' ThusLo EX2 ]Lo &+(CX1+DX2)E'=O, EX2E'=E'. If AdAk +2
E'f[0
E'
E'][0
E']'°'
01 ic' el
(11) (12)
= A' is to hold, one must have X2 a (d)-inverse of E. Let X2 = E'
for some (d)-inverse of E. Then (12) holds. Now (11) becomes X1E' = — C IDEdEk. Let Eà be an (m,d)-inverse of E. Then EaE is a projection onto R(E'). Hence X1 must be of the form — C IDEa + Z(I — EaE) and (9) follows. To see that (9) defines a (d)-inverse of A is a direct computation. = A' implies ABA' = A', the It should be pointed out that while two conditions are not equivalent.
Corollary 9.7.7
Suppose there exists an invertible T such that (13)
]T is an (m,d)-inverse
with C invertible and N nilpotent. Then T
for A. If one wanted AD from (13) it would be given by the more complicated
expressionTADTI
rc-i
=Lo
/k-i
C'XN'
Although for block triangular matrices it is easier to compute a weak Drazin than a Drazin inverse, in practice one frequently does not have a block triangular matrix to begin with. We now give two results which are the weak Drazin analogues of Algorithm 7.6.1.
Theorem 9.7.4
Suppose that
and that p(x) = x'(c0 + ... + c,xl,
APPLICATIONS OF THE DRAZIN INVERSE 207
0, is the characteristic (or minimal) polynomial of A.
c0
Then
Ad=__(c11+...+c,Ar_l)
(14)
is a (cd)-inverse of A. If(14) is not invertible, then Ad + (I invertible (c,d)-inverse of A.
—
AdA) is an
Proof Since p(A) =0, we have (c01 + ... + c,A')A1 =0. Hence (c11 + ... + c,A' ')A'4' = — c0A'. Since Ind(A) I, we have that (14) is a (d)-inverse. It is commuting since it is a polynomial in A. Now let A be as in (2). Then since Ad is a (c, d)-inverse it is in the form (6). But then
y= is
—
!(c11 + c2N + ... + c,N' c0
1)•
Ifc1 #0, then V is invertible since N
nilpotent and we are done. Suppose that c1 =0, then
is nilpotent. That Ad + (I — AdA) is a (c, d)-inverse
follows from the fact that Ad Note that Theorem 4 requires no information on eigenvalues or their multiplicities to calculate a (C, d)-inverse. If A has rational entries, (14) would provide an exact answer if exact arithmetic were used. Theorem 4 suggests that a variant of the Souriau — Frame algorithm could be used to compute (c, d)-inverses. In fact, the algorithm goes through almost unaltered.
Theorem 9.7.5
Suppose that A
and
X
Let B0 = I. For j = 1,2, ...
, n,
let
If p5#O, but
=0, then (15)
is a (c,d)-inverse. In fact, (14) and (15) are the same matrix.
Proof Let k = Ind(A). Observe that = — — — ... — If r is the smallest integer such that B, =0 and s is the largest such that #0, then Ind(A) = r — s. Since B, =0, we have A'=p1A'1 — ... —p5A'5=O. Hence,
A'5 = —(A' — p1A'
... — p5
1)
=
!(As—' _P1A5_2_..._P,_,I)Ar_s41.Thatis,Ak=(!B5....i)Ak+1
desired. U
as
208 GENERALIZED INVERSES OF LINEAR TRANSFORMATIONS
Suppose that A, "and AB = BA. Let Ad be any (4)-inverse of A. Then AdBAAD = BAdAAD = BADAAD = ADBAAD.
Lemma 9.7.1
Proof If AB = BA, then ADB = BAD. Also if A is given by (2), then
TBT' =
with B1C = CB1. Lemma 1 now follows from
Theorem 1. • As an immediate consequence of Lemma 1, one may use a (d)-inverse in many of the applications of the Drazin inverse. For example, see the next theorem.
Theorem 9.7.6
Suppose that A,BeC""'. Suppose that Ax + Bx = 0 has
unique solutions for consistent initial conditions, that is, there is a scalar c such that (cA+ B) is invertible. Let A = (cA + B) 1A, = (cA + B) 18. Let k = md (A), jfAx + Bx =0, x(O) = q, is consistent, then the solution is If Ad is an (m,d)-inverse of A, then all solutions of Ax + Bx =0 x=e are of the form x = qeC", and the space of consistent initial conditions is R(AAd) = R(AI2A).
Note in Theorem 6 that AAd need not equal AdA even if is an (m, d)inverse of A. Weak Drazin inverses can also be used in the theory of Markov chains. For example, the next result follows from the results of Chapter 8.
If T is the transition matrix of an rn-state ergodic chain and if A = I — T, then the rows of I — AdA are all equal to the unique fixed probability vector ofTfor any (4)-inverse of A. Theorem 9.7.8
8.
Exercises
Exercises 1—6 provide a generalization of some of the results in Section 2.
Proofs may be found in [18]. 1. Suppose that A, B are m x n matrices. Let ()° denote a (2)-inverse. Show that the following are equivalent: (i) (AA + B)°A, + B)°B commute. (ii) (AA + B)(AA + B)°A[I —v.A + B)°(AA + B)] = 0. (iii) + B)(AA + B)°B{I — (AA + + B)] =0.
+ B)tA, 2. Prove that if A,B are hermitian, then + B)tB commute if and only if there exists a I such that N(XA + B) = N(A) N(B). Furthermore, if I exists, then (IA + B)tA, (IA + B)tB commute. 3. Prove that if A, BeC" 'are such that one is EP and the other is positive semi-definite, then there exists A such that AA + B is invertible if and only if N(A) N(S) = {0). 4. Prove that if A, Be C" Xli are such that one is EP and one is positive semi-definite, then there exists A such that N(AA ÷ B) = N(A) rs N(S).
APPLICATIONS OF THE DRAZIN INVERSE 209 5.
Suppose that A, are such that N(A) N(B) reduces both A and B. Suppose also that there exists a 2 such that N(AA + B) = N(A) N(B). Prove that when Ax + Bx =1, f n-times continuously differentiable is consistent if and only if + B) for all t, that is, (AA + B) x B)tf = f. And that if it is consistent, then all solutions are of the + form X
= + [(AA + B)D(AA + B) — ADarADAq
+e
—
+ [I — (2A + B)D(A.A + B)]g
where A = (AA + B)DA, B = (A.A + B)DB, I = (AA + q is an arbitary vector, g an arbitrary vector valued function, and k = md (A). 6. Prove that if A, B are EP and one is positive semi-definite, 2 as in Exercise 4, then all solutions of Ax + Bx = f are in the form given in Exercise 5. 7. Derive formula (8) in Corollary 9.6.2. 8. Derive an expression for the consistent set of initial conditions for Ax + Bx = f when f is n-times differentiable and AA + B is onto. 9. Verify that Corollary 9.7.6 is true. 10. Fill in the details in the proof of Theorem 9.5.7. + 'B = A". 11. If A E and k = md (A), define a right weak Drazin by Develop the right equivalent of Theorems 1, 2, 3 and their corollaries. 12. Solve Ax(t) + Bx(t) = b, A, B as in Example 2.2, b = [1 20]*. X
Answer: x1(t) =
—
x2(r) =
—
x3(t) = 13.
+ 2x3(0)) —
—
+
+ 2x3(0)) — + 2x3(0)) —
—
—
—
+ 2t —
t
Let T be a matrix of the form (1). Assume each P, > 0(If any P, = 0, Thus, we then there would never be anyone in the age interval A, agree to truncate the matrix T just before the first zero survival probability.) The matrix T is non-singular if and only if bm (the last birth rate) is non-zero. Show that the characteristic equation for T is
O=xm —b,x"''
—p1p2b3x"3 — Pm_2bm.- ,)x —(p,p2 ... ibm)•
10
Continuity of the generalized inverse
1.
Introduction
Consider the following statement: X (A) If is a sequence of matrices and C"' converges to an invertible matrix A, then for large enough j, A, is invertible and A
to its obvious theoretical interest, statement (A) has practical computational content. First, if we have a sequence of 'nice' matrices which gets close to A, it tells us that gets close to A'. Thus approximation methods might be of use in computing Secondly, statement (A) gives us information on how sensitive the inverse of A is to terrors' in determining A. It tells us that if our error in determining A was 'small', then the error resulting in A1 due to the error in A will also be 'small'. This chapter will determine to what extent statement (A) is true for the Moore—Penrose and Drazin generalized inverses. But first, we must discuss what we mean by 'near', 'small', and 'converges to'.
2.
Matrix norms
In linear algebra the most common way of telling when things are close is by the use of norms. Norms are to vectors what absolute value is to numbers.
Definition 10.2.1. A function p sending a vector space V into the non-negative reals is called a norm if for all u, v ∈ V and all scalars α:
(i) p(u) = 0 if and only if u = 0,
(ii) p(αu) = |α| p(u), and
(iii) p(u + v) ≤ p(u) + p(v) (triangle inequality).
We will usually denote p(u) by ‖u‖.
There are many different norms that can be put on C^n. If u ∈ C^n and u has coordinates (u_1, …, u_n), then the sup norm of u is given by
‖u‖_∞ = sup_{1≤i≤n} |u_i|.
The p-norm of u is given by
‖u‖_p = (Σ_{i=1}^{n} |u_i|^p)^{1/p} for p ≥ 1.
The function ‖·‖_p is not a norm for 0 < p < 1 since it fails to satisfy (iii). The norm ‖·‖_2 is the ordinary Euclidean norm; that is, ‖u‖_2 is the geometric length of u. We are using the term norm a little loosely. To be precise we would have to say that ‖·‖_p is actually a family of norms, one for each C^n. However, to avoid unenlightening verbiage we shall continue to talk of the norm ‖·‖_p, the norm ‖·‖_∞, etc. The m × n matrices, C^{m×n}, are isomorphic, as a vector space, to C^{mn}. Thus C^{m×n} can be equipped with any of the above norms. However, in working out estimates it is extremely helpful if ‖AB‖ ≤ ‖A‖ ‖B‖ whenever
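These definitions are easy to experiment with. The sketch below uses arbitrary illustrative vectors to evaluate the sup norm and two p-norms, and to exhibit the failure of property (iii) when p = 1/2.

import numpy as np

def p_norm(u, p):
    # ||u||_p = (sum |u_i|^p)^(1/p); for p >= 1 this is a norm.
    return np.sum(np.abs(u) ** p) ** (1.0 / p)

u = np.array([3.0, -4.0, 1.0])
print(np.max(np.abs(u)))            # sup norm ||u||_inf = 4
print(p_norm(u, 1), p_norm(u, 2))   # 8.0 and sqrt(26)

# For 0 < p < 1 the triangle inequality fails: with p = 1/2 and the unit
# coordinate vectors e1, e2 we get ||e1 + e2||_p = 4 > ||e1||_p + ||e2||_p = 2.
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(p_norm(e1 + e2, 0.5), p_norm(e1, 0.5) + p_norm(e2, 0.5))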
AB is defined. A norm for which ‖AB‖ ≤ ‖A‖ ‖B‖ whenever AB is defined is called a matrix norm. Not all norms are matrix norms.

Example 10.2.1
If A = [a_{ij}] ∈ C^{n×n}, then define ‖A‖ = max_{i,j} |a_{ij}|. This is just the sup norm of C^{n²} applied to C^{n×n}. Now let A and B be the 2 × 2 matrices all of whose entries are 1. Then ‖A‖ = ‖B‖ = 1, but ‖AB‖ = 2. Thus this norm is not a matrix norm.
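A quick numerical check of the example, with A = B the all-ones 2 × 2 matrix used above:

import numpy as np

def max_entry_norm(A):
    # ||A|| = max_{i,j} |a_ij|, the sup norm of C^(n*n) applied to matrices
    return np.max(np.abs(A))

A = np.ones((2, 2))
B = np.ones((2, 2))
print(max_entry_norm(A), max_entry_norm(B))   # 1.0 and 1.0
print(max_entry_norm(A @ B))                  # 2.0 > 1.0 * 1.0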
There is a standard way to develop a matrix norm from a vector norm. Suppose that ‖·‖_s is a norm on C^r for all r. Define ‖A‖_os by
‖A‖_os = sup{‖Au‖_s : u ∈ C^n, ‖u‖_s = 1}.
It is possible to generalize this definition by using a different norm on C^m to 'measure' Au than the one used on C^n to 'measure' u. However, in working problems it is usually easier to use a fixed vector norm such as ‖·‖_2 or ‖·‖_∞. We will not need the more general definition. There is another formulation of ‖·‖_os.

Proposition 10.2.1
Suppose that A ∈ C^{m×n} and ‖·‖_s is a norm on C^m and C^n. Then ‖A‖_os = inf{K : ‖Au‖_s ≤ K‖u‖_s for every u ∈ C^n}.

The proof of Proposition 1 is left to the exercises. If A is thought of as a linear transformation from C^n to C^m, where C^n and C^m are equipped with the norm ‖·‖_s, then ‖A‖_os is the norm that is usually used. ‖A‖_os is also referred to as the operator norm of A relative to ‖·‖_s, hence the subscript os. Conversely, if ‖·‖ is a matrix norm, then by identifying C^n with C^{n×1} it induces a norm on C^n, say ‖·‖_s. Since ‖·‖ is a matrix norm we have
‖Au‖_s ≤ ‖A‖ ‖u‖_s.  (1)
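The operator norm ‖·‖_os can be estimated directly from its definition. In the sketch below the matrix is a random illustrative choice; the supremum for the Euclidean vector norm is approximated by sampling unit vectors, compared with the exact spectral norm, and inequality (1) is checked for one vector.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))

# Crude estimate of ||A||_os for the 2-norm: maximize ||Au||_2 over
# many random unit vectors u.  The exact value is the spectral norm.
us = rng.standard_normal((3, 10000))
us /= np.linalg.norm(us, axis=0)                 # normalize the columns
estimate = np.max(np.linalg.norm(A @ us, axis=0))
exact = np.linalg.norm(A, 2)
print(estimate, exact)                           # estimate <= exact, and close to it

# Consistency (1): ||Au||_2 <= ||A||_os ||u||_2 for any u.
u = rng.standard_normal(3)
print(np.linalg.norm(A @ u) <= exact * np.linalg.norm(u) + 1e-12)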
If (1) holds for a matrix norm ‖·‖ and a vector norm ‖·‖_s on C^n and C^m, we say that ‖·‖ is consistent with the vector norm ‖·‖_s. By Proposition 1 we know that ‖·‖ is consistent with ‖·‖_s if and only if ‖A‖_os ≤ ‖A‖. We pause briefly to recall the definition of a limit.
Definition 10.2.2
Suppose that {A_j} is a sequence of m × n matrices and ‖·‖ is a norm on C^{m×n}. Then {A_j} converges to A (written A_j → A or lim_{j→∞} A_j = A) if for every real number ε > 0 there exists a real number j_0 such that if j ≥ j_0, then ‖A_j - A‖ < ε.

Notice that the definition of convergence seems to depend on the norm used. However, in finite dimensional spaces there is no difficulty.
Theorem 10.2.1
Suppose that ‖·‖ and ‖·‖′ are two norms on a finite dimensional vector space V. Then ‖·‖ and ‖·‖′ are equivalent. That is, there exist constants k, l such that k‖u‖ ≤ ‖u‖′ ≤ l‖u‖ for all u ∈ V.

Theorem 1 is a standard result in most introductory courses in linear algebra. A proof may be found, for example, in [49]. Theorem 1 tells us that if A_j → A with regard to one norm, then A_j → A with regard to any norm. It is worth noting that A_j → A if and only if the entries of A_j converge to the corresponding entries of A. To further develop this circle of ideas, and for future reference, let us see what form Theorem 1 takes for the norms we have been looking at. Recall that an inequality is called sharp if it cannot be improved by multiplying one side by a scalar.

Theorem 10.2.2
Suppose that u ∈ C^n and that p, q are two real numbers greater than or equal to one. Then
(i) ‖u‖_q ≤ ‖u‖_p if p ≤ q;
(ii) ‖u‖_∞ ≤ ‖u‖_p;
(iii) ‖u‖_p ≤ n^{1/p} ‖u‖_∞;
(iv) ‖u‖_p ≤ n^{1/p - 1/q} ‖u‖_q if p ≤ q.
… rank(P_{j_i}) > rank(P). That is, dim R(P_{j_i}) > dim R(P). But N(P) is complementary to R(P). Thus for each j_i there is a vector u_i ≠ 0 such that u_i ∈ N(P) ∩ R(P_{j_i}). But then u_i = P_{j_i}u_i = Pu_i + E_{j_i}u_i = E_{j_i}u_i, where E_j = P_j - P. Let ‖·‖ be an operator norm and take ‖u_i‖ = 1 for all i. Then 1 = ‖u_i‖ = ‖E_{j_i}u_i‖ ≤ ‖E_{j_i}‖. But ‖E_{j_i}‖ → 0 and we have a contradiction. Thus the required j_0 does exist. ■
We next prove a special case of Theorem 1.
Proposition 10.7.2
Suppose that A ∈ C^{n×n} and A_j → A. Suppose further that Ind(A_j) = Ind(A) and core-rank(A_j) = core-rank(A) for j greater than some fixed j_0. Then A_j^D → A^D.

Proof
Suppose that A_j → A, Ind(A_j) = Ind(A) = k, and core-rank(A_j) = core-rank(A). From Chapter 7 we know that A_j^D = A_j^k(A_j^{2k+1})†A_j^k while A^D = A^k(A^{2k+1})†A^k. But rank(A_j^{2k+1}) = core-rank(A_j) = core-rank(A) = rank(A^{2k+1}). Thus (A_j^{2k+1})† → (A^{2k+1})† by Theorem 4.1. Proposition 2 now follows. ■
We are now ready to prove Theorem 1.
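The representation A^D = A^k(A^{2k+1})†A^k invoked in this proof is easy to exercise numerically; the following sketch uses a small illustrative matrix whose index k = 2 is supplied by hand.

import numpy as np

def drazin_via_pinv(A, k):
    # A^D = A^k (A^(2k+1))^dagger A^k, valid whenever k >= Ind(A).
    Ak = np.linalg.matrix_power(A, k)
    return Ak @ np.linalg.pinv(np.linalg.matrix_power(A, 2 * k + 1)) @ Ak

# Illustrative matrix with Ind(A) = 2: a 2x2 nilpotent block plus an
# invertible 1x1 block.
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 3.0]])
AD = drazin_via_pinv(A, 2)
print(AD)
# Check the defining equations: A^(k+1) A^D = A^k,  A^D A A^D = A^D,  A A^D = A^D A.
print(np.allclose(np.linalg.matrix_power(A, 3) @ AD, np.linalg.matrix_power(A, 2)))
print(np.allclose(AD @ A @ AD, AD))
print(np.allclose(A @ AD, AD @ A))

Forming A^{2k+1} explicitly is fine for a toy example like this, though it is not something one would do for large matrices.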
Proof of Theorem 1
Suppose that A_j, A are m × m matrices and A_j → A.
We will first prove the only if part of Theorem 1. Suppose that A_j^D → A^D. Then A_jA_j^D → AA^D. But A_jA_j^D and AA^D are projectors onto R(A_j^{k_j}) and R(A^k) respectively, where k_j = Ind(A_j) and k = Ind(A). Thus rank(A_jA_j^D) = core-rank(A_j) and rank(AA^D) = core-rank(A). That core-rank(A_j) = core-rank(A) for large j now follows from Proposition 1 and the fact that A_jA_j^D → AA^D.
To prove the if part of Theorem 1, assume that core-rank(A_j) = core-rank(A). Let A_j = C_j + N_j and A = C + N be the core-nilpotent decompositions of A_j and A as in Chapter 7. Now A_j^l → A^l for all integers l ≥ 1. Pick l ≥ sup{Ind(A_j), Ind(A)}. Then A_j^l = C_j^l and A^l = C^l, so that C_j^l → C^l. Since core-rank(A_j) = core-rank(A), we have rank(C_j^l) = rank(C^l), and Ind(C_j^l), Ind(C^l) are either both zero or both one. We may assume the indices are one, else A is invertible and we are done. Thus (C_j^l)^D → (C^l)^D by Proposition 2. But (C_j^l)^D = (A_j^D)^l and (C^l)^D = (A^D)^l, so that A_j^D = A_j^{l-1}(A_j^D)^l = A_j^{l-1}(C_j^l)^D converges to A^{l-1}(C^l)^D = A^{l-1}(A^D)^l = A^D. ■
We would like to conclude this section by examining the continuity of the index. In working with the rank it was helpful to observe that if A_j → A, then rank(A_j) ≥ rank(A) for large enough j. This is not true for the index.
Example 10.7.4
Let {A_j} be a sequence of invertible matrices converging to a matrix A of index two. Then Ind(A_j) = 0 while Ind(A) = 2.

Notice that A_j^D ↛ A^D in Example 4.

Proposition 10.7.3
Suppose that A_j → A and A_j^D → A^D. Then there exists a j_0 such that Ind(A_j) ≥ Ind(A) for j ≥ j_0.

Proof Suppose A_j → A and A_j^D → A^D. Let j_0 be such that Ind(A_j) does not take on any of its finite number of values a finite number of times for j ≥ j_0. Let l = inf{Ind(A_j) : j ≥ j_0}, and let {A_{j_i}} be the subsequence such that Ind(A_{j_i}) = l. Let N_j, N be the nilpotent parts of A_j, A respectively. Then 0 = N_{j_i}^l = A_{j_i}^l(I - A_{j_i}^D A_{j_i}) → A^l(I - A^D A) = N^l. Hence Ind(N) ≤ l. ■
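A concrete instance of the behaviour in Example 10.7.4 (the particular sequence below is an illustrative assumption, not necessarily the matrices of the original example): invertible A_j of index zero converging to a nilpotent A of index two, with A_j^D = A_j^{-1} diverging.

import numpy as np

# A_j is invertible (Ind = 0) for every j, but the limit A is nilpotent
# with Ind(A) = 2, so A^D = 0.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
for j in (10, 100, 1000):
    Aj = np.array([[0.0, 1.0],
                   [1.0 / j, 0.0]])
    # For an invertible matrix the Drazin inverse is the ordinary inverse.
    AjD = np.linalg.inv(Aj)
    print(j, np.linalg.norm(AjD))   # ||A_j^D - A^D|| = ||A_j^D|| grows like j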
8.
Exercises
1. Prove that ‖·‖_p, p ≥ 1, defines a norm on C^n.
2. Prove that for A ∈ C^{m×n} and a norm ‖·‖_s on C^n and C^m,
sup{‖Au‖_s : u ∈ C^n, ‖u‖_s = 1} = inf{K : ‖Au‖_s ≤ K‖u‖_s for every u ∈ C^n}.
3. Suppose that {A_k} is a sequence of m × n matrices and A ∈ C^{m×n}. Prove that A_k → A if and only if lim_{k→∞} (A_k)_{ij} = (A)_{ij} for 1 ≤ i ≤ m, 1 ≤ j ≤ n.
4. Show that if A ∈ C^{m×n}, then ‖A‖_{0∞} = max_{1≤i≤m} Σ_{j=1}^{n} |a_{ij}|.
5. If A ∈ C^{n×n}, let ‖A‖_{01} = n max_{i,j} |a_{ij}| = n‖A‖_∞. Since C^{n×n} is isomorphic to C^{n²} we can give C^{n×n} the ‖·‖_2 norm,
‖A‖_2 = (Σ_{i,j} |a_{ij}|^2)^{1/2}.
(a) Prove that ‖·‖_{01} is a matrix norm in the following sense: if A, B ∈ C^{n×n}, then ‖AB‖_{01} ≤ ‖A‖_{01} ‖B‖_{01}.
(b) Prove that ‖·‖_2 is a matrix norm. (This will probably require the inequality
Σ_i |a_i b_i| ≤ (Σ_i |a_i|^2)^{1/2} (Σ_i |b_i|^2)^{1/2},
which goes by the name of Cauchy's inequality.)
(c)-(k) Suppose that A ∈ C^{n×n}. Show the sharp inequalities that bound each of the norms ‖A‖_∞, ‖A‖_{01}, ‖A‖_2, ‖A‖_{02}, and ‖A‖_{0∞} in terms of the others, with constants that are powers of n^{1/2}; for example, n^{-1/2}‖A‖_2 ≤ ‖A‖_{02} ≤ ‖A‖_2.
(l) In parts (c)-(k) determine the correct inequalities if
A ∈ C^{m×n} rather than C^{n×n}.
6. Show that the inequalities (c)-(k) of Exercise 5 are all sharp.
7. Prove Proposition 10.3.1.
8. Let I be the identity matrix on C^{n×n}. Show that ‖I‖ ≥ 1 for any matrix norm on C^{n×n}.
9. Let ‖·‖ be any matrix norm (‖I‖ > 1 is permitted). …

In practice there is a number ε > 0, which depends on the
equipment used, such that numbers less than ε are considered zero. Several algorithms suitable for hand calculation or exact arithmetic on small matrices have been given earlier. For small matrices, those methods are sometimes preferable to the more complicated ones we shall now discuss. This chapter is primarily interested in computer calculation for 'large' matrices. Throughout this chapter ‖·‖ denotes a matrix norm as described in Chapter 11. For invertible A, we define the condition number of A with respect to the norm ‖·‖ as κ(A) = ‖A‖ ‖A^{-1}‖. We frequently write κ instead of κ(A). If A is singular, then κ(A) = ‖A‖ ‖A†‖.
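A minimal numerical sketch of the condition number just defined, using the spectral norm and an illustrative nearly singular 2 × 2 matrix; it also shows κ(A) acting as the amplification factor for relative errors when Ax = b is solved.

import numpy as np

def condition_number(A, ord=2):
    # kappa(A) = ||A|| * ||A^{-1}|| in the chosen matrix norm.
    return np.linalg.norm(A, ord) * np.linalg.norm(np.linalg.inv(A), ord)

# A mildly ill-conditioned illustrative matrix.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
print(condition_number(A))         # about 4e4
print(np.linalg.cond(A, 2))        # numpy's built-in value agrees

# A small relative change in b can produce a relative change in the
# solution of Ax = b that is larger by roughly a factor kappa(A).
b = np.array([2.0, 2.0001])
x = np.linalg.solve(A, b)
db = 1e-6 * np.array([1.0, -1.0])
dx = np.linalg.solve(A, b + db) - x
print((np.linalg.norm(dx) / np.linalg.norm(x)) /
      (np.linalg.norm(db) / np.linalg.norm(b)))   # close to kappa(A)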
2.
Calculation of A†
This section will be concerned with computing A†.
The first difficulty is that this is not, as stated, a well-posed problem. If A is a matrix which is not of full column or row rank then it is possible to change the rank of A
by an arbitrarily small perturbation. Using the notation of Chapter 12 we have:
Unpleasant fact
Suppose A ∈ C^{m×n} is of neither full column nor full row rank. Then for any real number K and any δ > 0, there exists a matrix E, ‖E‖