Control System Synthesis A Factorization Approach Part II
Synthesis Lectures on Control and Mechatronics Editors Chaou...
111 downloads
568 Views
2MB Size
Report
This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
Report copyright / DMCA form
Control System Synthesis A Factorization Approach Part II
Synthesis Lectures on Control and Mechatronics Editors Chaouki Abdallah, University of New Mexico Mark W. Spong, University of Texas at Dallas
Control System Synthesis: A Factorization Approach, Part II Mathukumalli Vidyasagar 2011
Control System Synthesis: A Factorization Approach, Part I Mathukumalli Vidyasagar 2011
The Reaction Wheel Pendulum Daniel J. Block, Karl J. Åström, and Mark W. Spong 2007
Copyright © 2011 by Morgan & Claypool
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, mechanical, photocopy, recording, or any other except for brief quotations in printed reviews, without the prior permission of the publisher.
Control System Synthesis: A Factorization Approach, Part II Mathukumalli Vidyasagar www.morganclaypool.com
ISBN: 9781608456611 ISBN: 9781608456635
paperback ebook (Part II)
DOI 10.2200/S00358ED1V01Y201105CRM003
A Publication in the Morgan & Claypool Publishers series SYNTHESIS LECTURES ON CONTROL AND MECHATRONICS Lecture #3 Series Editors: Chaouki Abdallah, University of New Mexico Mark W. Spong, University of Texas at Dallas Series ISSN Synthesis Lectures on Control and Mechatronics Print 1939-0564 Electronic 1939-0572
Control System Synthesis A Factorization Approach Part II
Mathukumalli Vidyasagar University of Texas at Dallas
SYNTHESIS LECTURES ON CONTROL AND MECHATRONICS #3
M &C
Morgan
& cLaypool publishers
ABSTRACT This book introduces the so-called “stable factorization approach” to the synthesis of feedback controllers for linear control systems. The key to this approach is to view the multi-input, multioutput (MIMO) plant for which one wishes to design a controller as a matrix over the fraction field F associated with a commutative ring with identity, denoted by R, which also has no divisors of zero. In this setting, the set of single-input, single-output (SISO) stable control systems is precisely the ring R, while the set of stable MIMO control systems is the set of matrices whose elements all belong to R. The set of unstable, meaning not necessarily stable, control systems is then taken to be the field of fractions F associated with R in the SISO case, and the set of matrices with elements in F in the MIMO case. The central notion introduced in the book is that, in most situations of practical interest, every matrix P whose elements belong to F can be “factored” as a “ratio” of two matrices N, D whose elements belong to R, in such a way that N, D are coprime. In the familiar case where the ring R corresponds to the set of bounded-input, bounded-output (BIBO)-stable rational transfer functions, coprimeness is equivalent to two functions not having any common zeros in the closed right half-plane including infinity. However, the notion of coprimeness extends readily to discrete-time systems, distributed-parameter systems in both the continuous- as well as discrete-time domains, and to multi-dimensional systems. Thus the stable factorization approach enables one to capture all these situations within a common framework. The key result in the stable factorization approach is the parametrization of all controllers that stabilize a given plant. It is shown that the set of all stabilizing controllers can be parametrized by a single parameter R, whose elements all belong to R. Moreover, every transfer matrix in the closed-loop system is an affine function of the design parameter R. Thus problems of reliable stabilization, disturbance rejection, robust stabilization etc. can all be formulated in terms of choosing an appropriate R. This is a reprint of the book Control System Synthesis: A Factorization Approach originally published by M.I.T. Press in 1985.
KEYWORDS stable factorization, coprimeness, Bezout identity, simultaneous stabilization, robust stabilization, robust regulation, genericity
Dedicated to the memories of My paternal grandfather, Mathukumalli Narasimharao (1881-1965) and My father, Prof. Mathukumalli Venkata Subbarao (1921-2006)
ix
kt
я
dv
May God protect us together May God nourish us together May we work together with great energy May our study be vigorous and effective May we not have ill feelings Let there be peace, peace, peace
(A vedic prayer recited by the teacher and student(s) together at the commencement of studies. Found in several upanishads including Taittiriyopanishad )
xi
Contents Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii Preface for the Original Edition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
6
Filtering and Sensitivity Minimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 6.1 6.2 6.3 6.4 6.5 6.6 6.7
7
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Graph Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Graph Metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Robustness Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Robust and Optimal Regulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Genericity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
207 209 229 236 252 269
Extensions to General Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279 8.1 8.2 8.3
A
153 156 162 166 176 193 199
Robustness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207 7.1 7.2 7.3 7.4 7.5 7.6
8
Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Some Facts about Hardy Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sensitivity Minimization: Scalar Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sensitivity Minimization: Fat Plant Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sensitivity Minimization: General Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Two-Parameter Compensator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Coprime Factorizations over a Commutative Ring . . . . . . . . . . . . . . . . . . . . . . . . . 279 Coprime Factorizations for Distributed Systems . . . . . . . . . . . . . . . . . . . . . . . . . . 298 Controller Synthesis over a General Ring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
Algebraic Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307 A.1 A.2 A.3 A.4
Rings, Fields and Ideals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rings and Fields of Fractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Principal Ideal Domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Euclidean Domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
307 312 314 319
xii
B
Preliminaries on Matrix Rings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323 B.1 B.2
C
Matrices and Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323 Canonical Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
Topological Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337 C.1 C.2
Topological Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337 Topological Rings and Normed Algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347 Author’s Biography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357 Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
Preface This is a reprint of my book Control System Synthesis: A Factorization Approach, originally published by M.I.T. Press in 1985. That book went out of print about ten years after its publication, but nevertheless continued to be cited in the published literature. I would like to believe that this is because the contents of the book are still relevant to linear control theory as it is currently practiced. I am therefore grateful to Morgan & Claypool for their decision to reprint my book, so that the current generation of graduate students and researchers are able to access its contents without having to photocopy it surreptitiously from the library. The original text was created by me using the troff text processing system created at Bell Laboratories. Indeed, the 1985 book was the first time that I produced a camera-ready copy, instead of subjecting myself to the mercy and vagaries of typists and/or typesetters. Back then this was a novel thing to do; however today it is standard practice. Dr. C. L. Tondo of T&T TechWorks, Inc. and his troops have expertly keyed in the entire text into LATEX, for which I am extremely grateful. There is no doubt that the physical appearance of the text has been significantly improved as a result of this switch. With the entire text at my disposal, I could in principle have made major changes. After thinking over the matter, I decided to stick with the original text, and restricted myself to correcting typographical errors. Upon re-reading the original text after a gap of perhaps twenty years, I felt that the entire material still continues to be relevant, except for Chapter 6 on H∞ -control. Just about the time that I was finalizing the book, two seminal papers appeared, giving the connection between interpolation theory and H∞ -control; these are cited in the book as references [40] and [115]. A few years after my book was published, the influential book [37] appeared. This was followed in short order by the paper [28], which gives a complete state-space computational procedure for H∞ control; this paper is perhaps the most cited paper in the history of control theory. Subsequently, the book [29] gave an elementary introduction to the theory, while [45, 117] are advanced treatments of the subject. It would therefore have required a massive effort on my part to rewrite Chapter 6 of the book to bring it up to date, and I felt that I could contribute nothing beyond the excellent texts already in print. So I decided to leave the book as it is, on the basis that the conceptual framework for H∞ -control presented here still remains relevant. I had dedicated the original book to my paternal grandfather. In the interim, my father too has passed away, and I have therefore added his name to the dedication of the Morgan & Claypool edition. About twenty years ago, while perusing a book on Telugu writers, I discovered that one of my ancestors, bearing the same name as my paternal grandfather, was a well-known Telugu poet during the 19th century; he lived between 1816 and 1873. The article about him states that the clan
xiv
PREFACE
was famous for scholarship for three centuries (emphasis added). I am truly blessed to have such a distinguished lineage.
Dallas and Hyderabad, June 2011
Preface for the Original Edition The objective of this book is to present a comprehensive treatment of some recent results in the area of linear multivariable control that can be obtained using the so-called "factorization" approach. It is intended as a second level graduate text in linear systems and as a research monograph. The prerequisites for reading this book are covered in three appendices, but a reader encountering these topics for the first time would undoubtedly have difficulty in mastering this background on the basis of these appendices alone. Moreover, the appendices concentrate on the mathematical background needed to understand the material covered here, but for motivational background a standard first course in graduate level linear system theory would be desirable. The central idea that is used repeatedly in the book is that of "factoring" the transfer matrix of a (not necessarily stable) system as the "ratio" of two stable rational matrices. This idea was first used in a paper published in 1972 (see [92]), but the emphasis there was on analyzing the stability of a given system rather than on the synthesis of control systems as is the case here. It turns out that this seemingly simple stratagem leads to conceptually simple and computationally tractable solutions to many important and interesting problems; a detailed description can be found in Chapter l. The starting point of the factorization approach is to obtain a simple parametrization of all compensators that stabilize a given plant. One could then, in principle, choose the best compensator for various applications. This idea was presented in the 1976 paper by Youla, Jabr, and Bongiorno entitled "Modern Wiener-Hopf Design of Optimal Controllers, Part II: The Multivariable Case," which can be considered to have launched this entire area of research. The viewpoint adopted in this book, namely that of setting up all problems in a ring, was initially proposed in a 1980 paper by Desoer, Liu, Murray, and Saeks. This paper greatly streamlined the Youla et al. paper and reduced the problem to its essentials. Thus virtually all of the research reported here is less than five years old, which bears out the power of this approach to formulate and solve important problems. In writing the book, some assumptions have been made about the potential readership. First, it is assumed that the reader is already well versed in the aims and problems of control system analysis and design. Thus, for example, the book starts off discussing the problem of stabilization without any attempt to justify the importance of this problem; it is assumed that the reader already knows that stable systems are better than unstable ones. Also, as the book is aimed at professional researchers as well as practitioners of control system synthesis, a theorem-proof format has been adopted to bring out clearly the requisite conditions under which a particular statement is true, but at the same time, the principal results of each section have been stated as close to the start of the section as is practicable. In this way, a reader who is interested in pursuing the topics presented here in greater depth is enabled to do so; one who is only interested in using the results can rapidly obtain an idea of what they are by scanning the beginnings of various sections and by skipping proofs. In this
xvi
PREFACE FOR THE ORIGINAL EDITION
connection it is worth noting that Chapter 3 is a painless introduction to the factorization approach to scalar systems that could, in my opinion, be taught to undergraduates without difficulty. At various times, I have taught the material covered here at Waterloo, Berkeley, M.I.T., and the Indian Institute of Science. Based on these experiences, I believe that the appendices plus the first five chapters can be covered in a thirteen-week period, with three hours of lectures per week. In a standard American semester consisting of eighteen weeks of lectures with three hours of lectures per week, it should be possible to cover the entire book, especially if one starts directly from Chapter I rather than the appendices. Most of the sections in the appendices and the first five chapters contain several problems, which contain various ancillary results. The reader is encouraged to attempt all of these problems, especially as the results contained in the problems are freely used in the subsequent sections. It is now my pleasure to thank several persons who aided me in this project. My wife Shakunthala was always a great source of support and encouragement during the writing of this book, which took considerably more time than either of us thought it would. Little Aparna came into being at about the same time as the third draft, which gave me a major incentive to finish the book as quickly as possible, so that I might then be able to spend more time with her. Several colleagues gave me the benefit of their helpful comments on various parts of the manuscript. Of these, I would like to mention Ken Davidson, Charlie Desoer, John Doyle, Bruce Francis, Allen Tannenbaum, George Verghese, N. Viswanadham, and Alan Willsky. My students Chris Ma and Dean Minto went over large parts of the material with a great deal of care and exposed more than one serious mistake. I would like to acknowledge my indebtedness to all of these individuals. The final camera ready copy was produced by me using the troff facility and the pic preprocessor to generate the diagrams. In this connection, I would like to thank. Brian Haggman, a differential topologist turned computer hack, for much valuable advice. Finally, at the outset of this project, I was fortunate enough to receive an E. W. R. Steacie Memorial Fellowship awarded by the Natural Sciences and Engineering Research Council of Canada, which freed me from all teaching duties for a period of two years and enabled me to concentrate on the research and writing that is reflected in these pages. I would like to conclude this preface with a historical aside, which I hope the reader will find diverting. It is easy enough to discover, even by a cursory glance at the contents of this book, that one of the main tools used repeatedly is the formulation of the general solution of the matrix equation XN + Y D = I , where all entities are matrices with elements in a principal ideal domain. Among recent writers, Tom Kailath [49] refers to this equation as the Bezout identity, while V. Kuˇcera [60] refers to it as the Diophantine equation. In an effort to pin down just exactly what it should be called, I started searching the literature in the area of the history of mathematics for the details of the person(s) who first obtained the general solution to the equation ax + by = 1, where a and b are given integers and one seeks integer-valued solutions for x and y. It appears that the equation "Diophantine equation" was commonly applied by European mathematicians of the seventeenth century and later to any equation where the unknowns were required to assume only integer values. The phrase honors the Greek mathematician Diophantus,
PREFACE FOR THE ORIGINAL EDITION
xvii
who lived (in Alexandria) during the latter part of the third century A. D. However, the general solution of the linear equation in integer variables mentioned above was never studied by him. In fact, Smith [90, p. 134] states that Diophantus never studied indeterminate equations, that is equations that have more than one solution. According to Colebrooke [17, p. xiii], the first occidental mathematician to study this equation and to derive its general solution was one Bachet de Meziriac in the year 1624. The first mathematician of antiquity to formulate and find all solutions to this equation was an ancient Hindu by the name of Aryabhatta, born in the year 476. A detailed and enjoyable exposition of this subject can be found in the recent comprehensive book by van der Waerden [104]. Thus, in order to respect priority, I submit that the equation in question should henceforth be referred to as Aryabhatta’s identity.
153
CHAPTER
6
Filtering and Sensitivity Minimization 6.1
PROBLEM STATEMENT
The problem studied in this chapter is the following: Suppose P is a given plant, not necessarily stable, and W1 , W2 are given proper stable rational matrices. The objective is to find, among all stabilizing compensators of P , a C that minimizes the weighted performance measure J = W1 (I + P C)−1 W2 .
(6.1.1)
The significance of the problem can be explained in terms of Figure 6.1.
v W1 t
W2
r
+–
e
W2
C
P
+ + d y
z
W1
Figure 6.1: Pictorial Representation of the Weighted Performance Measure.
Here W2 can be interpreted as the generator of the disturbance input d, and (I + P C)−1 W2 it the transfer matrix from j to the measured output y. In this case W1 y is the weighted measured output, and W1 (I + P C)−1 W2 is the transfer matrix from v to the weighted measured output. Alternately, one can think of W2 as generating a (bounded) reference input r, and (I + P C)−1 W2 is the transfer matrix from t to the tracking error e. In this case W1 e is the weighted error. By suitable definition of the norm · , the weighted performance measure J can be made to represent several quantities of practical significance. Case I. Filtering Suppose r is a random process with power spectral density W2 (j ω)2 , and that it is desired to minimize the weighted energy of the resulting process e by selecting C appropriately.
154
6. FILTERING AND SENSITIVITY MINIMIZATION
In this case, t represents white noise and W2 is a whitening filter; e(s) = [(I + P C)−1 W2 )](s), and the power spectral density of e is given by e(j ω)2 = [(I + P C)−1 W2 ](j ω)2 .
(6.1.2)
In assessing the situation, one may be more concerned about certain frequencies more than others. The role of the weighting matrix W1 is to reflect the relative importance attached to energy at different frequencies. Thus, the weighted error z equals W1 e and its power spectral density is z(j ω)2 = [W1 (I + P C)−1 W2 ](j ω)2 .
(6.1.3)
The objective is to minimize the energy of the process z, i.e., 1 J = 2π =
1 2π
∞ z(j ω)2 dω −∞ ∞
[W1 (I + P C)−1 W2 ](j ω)2 dω .
(6.1.4)
−∞
Case II. Sensitivity Minimization Suppose u is a square-integrable input with an L2 -norm no larger than one, and suppose it is desired to minimize the largest L2 -norm that z may have in response to such an input. Here again the objective function is to minimize the weighted energy of the error e, but in this case the input u is assumed to be any L2 -input of unit norm or less, rather than white noise as in Case I. In this case the objective function to be minimized is J = sup z2 u2 ≤1
= sup W1 (I + P C)−1 W2 u2 u2 ≤1
= sup [W1 (I + P C)−1 W2 ](j ω) ,
(6.1.5)
ω∈R
where || · ||2 denotes the L2 -norm, and the last equality is a standard result from [26, 94] as pointed out in the next section. Since in this case the quantity measures the maximum error in response to a unit-norm input u, this problem is sometimes referred to as the minimax sensitivity problem. In both the above cases, as well as in others that the reader may construct, the minimization of J in (6.1.1) is required to be carried out subject to the constraint that C stabilizes P . As things stand, the compensator C is thus constrained, and also enters the objective function J in a highly nonlinear fashion. However, using the results of Chapter 5, it is possible to transform this into an unconstrained optimization problem involving a much simpler objective function. The set of all compensators C that stabilize a given plant P is neatly parametrized in The˜ N) ˜ be any r.c.f. and any l.c.f. of P , and select X, ˜ Y˜ ∈ M(S) such that orem 5.2.1. Let (N, D), (D,
6.1. PROBLEM STATEMENT
155
N˜ X˜ + D˜ Y˜ = I . Then, from this theorem, S(P ) = {(X˜ + DR)(Y˜ − NR)−1 : R ∈ M(S) and |Y˜ − NR| = 0} .
(6.1.6)
Moreover, if C ∈ S(P) equals (X˜ + DR)(Y˜ − NR)−1 for a certain R ∈ M(S), then (cf. (5.2.7)) (I + P C)−1 = (Y˜ − NR)D˜ .
(6.1.7)
Thus, the problem of minimizing J (R) = W1 (I + P C)−1 W2 subject to the constraint that C ∈ S(P) is equivalent to minimizing ˜ 2 , J (R) = W1 (Y˜ − NR)DW
(6.1.8)
subject to the constraint that |Y˜ − NR| = 0. Now, from Lemma 5.2.4, |Y˜ − NR| = 0 for all R belonging to an open dense subset of M(S), and if P is strictly proper, then from Lemma 5.2.6, |Y˜ − N R| = 0 for all R ∈ M(S). Hence, the original problem of minimizing the weighted performance measure J of (6.1.1) is equivalent, in the case where P is strictly proper, to an unconstrained minimization of the objective function J (·) of (6.1.8) with respect to R. Moreover, the original ˜ 2 , where the noteworthy expression W1 (I + P C)−1 W2 has been replaced by W1 (Y˜ − NR)DW feature is that the quantity inside the norm is affine in R. If the plant is not strictly proper, the quantity R is constrained to the extent that it has to satisfy the “nonsingularity” constraint. However, since this constraint is only operational on a nowhere dense set, it can be ignored during the minimization and can be tested once an unconstrained minimum has been found. If by chance the optimal choice of R does not satisfy the nonsingularity constraint, then arbitrarily small perturbations in R will satisfy the constraint and will result in only an arbitrarily small increase in the performance measure (since the latter is a continuous function). Up to now we have only talked about minimizing the weighted energy of the error signal in response to various types of inputs. Depending on the application, one might prefer instead to minimize the weighted energy of some other signal within the loop. Since the transfer matrix between any two points can be expressed as an affine function of R using the results of Chapter 5, such a minimization is qualitatively no different from the problems discussed earlier. For instance, if it is desired to minimize the weighted energy of the plant output in response to an input u, then J = W1 P C(I + P C)−1 W2 .
(6.1.9)
However, if C ∈ S(P ), then by (5.2.7),
for some R ∈ M(S), so that
P C(I + P C)−1 = (X˜ + DR)N˜ ,
(6.1.10)
˜ 2 . J = W1 (X˜ + DR)NW
(6.1.11)
In fact, it is easy to see that the general problem becomes one of minimizing J = F − GRH ,
(6.1.12)
156
6. FILTERING AND SENSITIVITY MINIMIZATION
where F, G, H ∈ M(S) are given matrices. Finally, the above discussion applies even when a two-parameter control scheme is used. In this case, by (5.6.15), all transfer matrices are affine functions of the matrix S = [Q R], and the general cost function assumes the form J = F − GSH ,
(6.1.13)
where F, G, H ∈ M(S) are given matrices. The discrete-time analogs of the above problems are simply obtained by replacing S by R∞ .
6.2
SOME FACTS ABOUT HARDY SPACES
In the previous section, it was shown that several problems of filtering and sensitivity minimization can be formulated as follows: Minimize J (R) = F − GRH ,
(6.2.1)
where F, G, H ∈ M(S) are given and R ∈ M(S) is to be chosen. The solution of this class of problems is facilitated by the use of some concepts and results from the theory of Hp - (or Hardy) spaces. The purpose of this section is to present a listing of the basic concepts and results from this theory, mostly without proof. The reader is referred to [34] for an excellent treatment of this theory, and to [59] for some specialized results. ¯ the closed Throughout this section, D denotes the open unit disc in the complex plane, and D unit disc. For each p ∈ [1, ∞), the set Hp consists of all analytic functions f on the open unit disc D with the property that ⎡ ⎤1/p 2π 1 sup ⎣ |f (rej θ )|p dθ ⎦ =: f p < ∞ . (6.2.2) 2π r∈(0,1) 0
The set H∞ consists of all analytic functions f on D such that sup
max |f (rej θ )| =: f ∞ < ∞ .
r∈(0,1) θ∈[0,2π]
(6.2.3)
If f ∈ Hp for some p ∈ [1, ∞], then strictly speaking f (z) is only defined when |z| < 1. However, for almost all θ ∈ [0, 2π ], it is possible to define f (ej θ ) as a nontangential limit of f (z) as z → ej θ . If f ∈ Hp then its boundary value function f (ej θ ) belongs to the Lebesgue space Lp [0, 2π]. Moreover, ⎡ ⎤1/p 2π 1 f p = ⎣ |f (ej θ )|p dθ ⎦ if 1 ≤ p < ∞ , 2π 0
= ess.
sup |f (ej θ )| if p = ∞ . θ∈[0,2π]
(6.2.4)
6.2. SOME FACTS ABOUT HARDY SPACES
157
In other words, the norm f p of a function f ∈ Hp can be computed from its boundary value function. For each p ∈ [1, ∞], the space Hp is a Banach space. In addition, H2 is a (complex) Hilbert space with inner product defined by 1 f, g = 2π
2π
f¯(ej θ )g(ej θ ) dθ ,
(6.2.5)
0
where the bar denotes complex conjugation. It turns out that the set of functions {zi }∞ i=0 forms an orthonormal basis for H2 . Hence, a function f (·) analytic in D belongs to H2 if and only if it can be expanded in the form ∞ fi z i , (6.2.6) f (z) = i=0
where the sequence {fi } is square-summable and the infinite sum converges in the sense of the H2 -norm. Thus, one can think of H2 as the set of z-transforms of square-summable sequences. Similarly, if {fi } is an absolutely summable sequence, then f (z) =
∞
fi z i ,
(6.2.7)
i=0
belongs to H∞ , and in fact f ∞ ≤
∞
|fi | .
(6.2.8)
i=0
In general, the converse of the above statement is false: Not every function in H∞ is of the form (6.2.7) where the coefficient sequence is absolutely summable. However, it is easy to verify that a rational function belongs to H∞ if and only if it is the z-transform of an absolutely summable sequence. If the product of two H∞ -functions is defined to be their pointwise product, then H∞ becomes a Banach algebra. Suppose f ∈ H∞ . Then one can associate with f a continuous linear operator Tf mapping H2 into itself by (Tf g)(z) = f (z)g(z) .
(6.2.9)
In other words, an application of Tf consists simply of pointwise multiplication by f (·). It can be shown that this association is norm-preserving. That is, the operator norm of Tf equals f ∞ ; or in other words, Tf := sup f g2 = f ∞ . g2 ≤1
(6.2.10)
158
6. FILTERING AND SENSITIVITY MINIMIZATION
Suppose m is a nonnegative integer, and that {ai } is a sequence of nonzero complex numbers such ∞ (1 − |ai |) converges. Then the function that |ai | < 1 for all i, and the sum i=1 ∞ |ai | z − ai b(z) = z , ai 1 − a¯ i z m
(6.2.11)
i=1
is called the Blaschke product corresponding to m and the sequence {ai }. The function b(·) is an element of H∞ and satisfies |b(z)| < 1 for all z in D, |b(ej θ )| = 1 for almost all θ ∈ [0, 2π ]. Moreover, the set of zeros of b(·) in D consists precisely of the set {ai }, where repeated occurrence of a particular complex number corresponds to multiple zeros of b. If m is a nonnegative integer and {a1 , · · · , an } is a finite set of points (not necessarily distinct) such that 0 < |ai | < 1 for all i, then b(z) = zm
n |ai | z − ai , ai 1 − a¯ i z
(6.2.12)
i=1
is called the associated (finite) Blaschke product. It too belongs to H∞ and satisfies |b(z)| < 1 for all z ∈ D and |b(ej θ )| = 1 ∀θ ∈ [0, 2π ]. Suppose f is any rational function in H∞ . Let m denote the multiplicity of z = 0 as a zero of f (if f (0) = 0 set m = 0), and let a1 , · · · , an denote the other zeros of f in D. If f has a multiple zero at some point in D, repeat this point in the sequence {a1 , · · · , an } as many times as the multiplicity. If b(·) is now defined as in (6.2.12), then it is clear that f can be factored in the form f (z) = a(z)b(z), where both a, b ∈ H∞ , b is a Blaschke product, and a has no zeros in the open unit disc. Further, if f has no zeros on the unit circle, then a is actually a unit of H∞ , i.e., it has an inverse in H∞ . The factorization described above is a special case of what is known as an inner-outer factorization. A function g ∈ H∞ is inner if it satisfies |g(ej θ )| = 1 for almost all θ ∈ [0, 2π]. The only rational inner functions are finite Blaschke products multiplied by a complex number of unit modulus. In general, an inner function can be expressed as a product of a Blaschke product and a so-called singular inner function; we will not require the latter concept in this book. The definition of an outer function is more technical: A function f ∈ H∞ is outer if the range of the map Tf defined in (6.2.9) is dense in H2 . However, a rational function f ∈ H∞ is outer if and only if it has no zeros in the open unit disc (but it could have zeros on the unit circle). With these definitions, we have the following result:
Every f ∈ H∞ can be factored as ab where a is outer and b is inner. If f is rational and has no zeros on the unit circle, then f can be factored in the form ab where a is a unit of H∞ and b is a finite Blaschke product. Fact 6.2.1
6.2. SOME FACTS ABOUT HARDY SPACES
159
In the case of matrices over Hp -spaces, the essential results for our purposes are as follows: Let n > 1 be an integer. Then Hn2 is a Hilbert space with the inner product 1 f, g = 2π
2π
f ∗ (ej θ )g(ej θ ) dθ .
(6.2.13)
0 n×m where ∗ denotes the conjugate transpose. Corresponding to every F ∈ H∞ one can define a m n continuous linear operator TF : H2 → H2 by
(TF g)(z) = F (z)g(z) ∀z ∈ D .
(6.2.14)
Moreover, the norm of the operator TF equals the norm of F ; that is, TF = sup TF g2 = F ∞ = g2 ≤1
sup σ¯ (F (ej θ )) ,
(6.2.15)
θ∈[0,2π]
where σ (·) denotes the largest singular value. n×m is inner if The matrix analog of Fact 6.2.1 is stated next. A matrix G ∈ H∞ ∗ j θ j θ G (e )G(e ) = I for almost all θ ∈ [0, 2π ] (in this definition it is important to note that n n×m need not equal m). A matrix F ∈ H∞ is outer if the range of the map TF defined in (6.2.14) is n n×m dense in H2 . In particular, a rational matrix F ∈ H∞ is outer if and only if rank F (z) = n for all z ∈ D. Note that if G inner (resp. outer) then n ≥ m (resp. n ≤ m). To state the next result precisely, we recall the set R∞ introduced in Section 2.1, which consists of all rational functions that have no poles in the closed unit disc. It is easy to see that R∞ is just the set of rational functions in H∞ (this explains the symbolism), and that the norm on R∞ defined in (2.2.21) is consistent with (6.2.4). n×m Suppose G ∈ H∞ and has rank min{n, m}.1 Then G can be factored as Gi Go where Gi is inner and Go is outer. If n ≥ m, then Go is square, whereas if n ≤ m, then Gi is square.
Fact 6.2.2
n×m has rank n and Fact 6.2.2 has several interesting consequences. First, suppose G ∈ R∞
that n ≤ m. Then, by applying Fact 6.2.2 to F = G , we see that G can be factored as AB where n×n n×m A ∈ R∞ is outer and B ∈ R∞ is the transpose of an inner matrix. This is quite distinct from the factorization in Fact 6.2.2. Second, noting that the transpose of a square inner matrix is also inner, we see that a square matrix G ∈ M(R∞ ) of full rank can be factored as Gi Go and also as Fo Fi where Gi , Fi are inner and Go , Fo are outer. Finally, observe that if G is a square inner matrix, then |G| is an inner function, and the adjoint matrix of G is also a square inner matrix. Also, note that left multiplication by an inner matrix preserves norms in both H2 and H∞ . That is, if F ∈ M(H∞ ) is inner, G ∈ M(H∞ ) and H ∈ M(H2 ), then F G∞ = G∞ and F H 2 = H 2 . 1This means that at least one full-size minor of G is nonzero.
160
6. FILTERING AND SENSITIVITY MINIMIZATION
One can also define the class Hp over the open right half-plane, rather than the open unit disc as is the case above. Suppose f : C+ → C is analytic; then f belongs to the class Hp (C+ ), where 1 ≤ p < ∞, if ⎡ ⎤1/p ∞ 1 =: f p+ < ∞ . |f (σ + j ω)|p dω⎦ (6.2.16) sup ⎣ σ >0 2π −∞
The function f belongs to H∞ (C+ ) if sup sup |f (σ + j ω)| =: f ∞+ < ∞ .
σ >0
ω
(6.2.17)
If f ∈ Hp (C+ ), then its domain definition can be extended to the j ω-axis, and the function ω → f (j ω) belongs to the Lebesgue space Lp (R). Moreover, ⎡ ⎤1/p ∞ 1 |f (j ω)|p dω⎦ if p ∈ [1, ∞), f p+ = ⎣ 2π −∞
f ∞+ = ess. sup |f (j ω)| ω
if p = ∞ .
(6.2.18)
H2 (C+ ) is a Hilbert space, with the inner product of two functions f and g defined by 1 f, g + = 2π
∞
f¯(j ω)g(j ω) dω .
(6.2.19)
−∞
The notions of Blaschke product, inner and outer functions all have their counterparts in this setting. In particular, a function f ∈ H∞ (C+ ) is inner if |f (j ω)| = 1 for almost all ω; a rational f ∈ H∞ (C+ ) with f (j ω) = 0 for all ω is outer if it is a unit. Facts 6.2.1 and 6.2.2 have their obvious analogs for the space H∞ (C+ ), obtained by replacing “unit circle” by “imaginary axis” and “ej θ ” by “j ω” throughout. Suppose f ∈ H∞ (of the unit disc). Then the function g : C+ → C defined by g(s) = f [(s − 1)/(s + 1)] ,
(6.2.20)
belongs to H∞ (C+ ), and satisfies, g∞+ = f ∞ . However, not all functions in H∞ (C+ ) can be generated in this manner. For example, g(s) = e−s belongs to H∞ (C+ ), but is not of the form (6.2.20) for any f ∈ H∞ .The situation is worse when p < ∞. Even if f ∈ Hp , the corresponding g might not belong to Hp (C+ ). An easy way to see this is to note that every g ∈ Hp (C+ ) must be strictly proper. Thus, if f ∈ Hp is rational and f (1) = 0, then from (6.2.20) g(∞) = f (1) = 0, so that g ∈ / Hp (C+ ). Over the unit disc, H∞ is a subset of Hp for all p < ∞, and indeed Hp is a subset of Hq whenever p > q; this follows from the compactness of the interval [0, 2π ]. However, Hp (C+ ) and Hq (C+ ) are distinct sets whenever p = q, with neither being a subset of the other.
6.2. SOME FACTS ABOUT HARDY SPACES
161
There is however an interesting relationship, which is given next. Fact 6.2.3
Suppose f ∈ Hp , for some p ∈ [1, ∞), and define g by (6.2.20). Let h(s) =
21/p g(s) . (1 + s)2/p
(6.2.21)
Then h ∈ Hp (C+ ) and hp+ = f p . Proof. By definition p f p
1 = 2π
∞ |f (ej θ )|p dθ,
(6.2.22)
0 p hp
1 = 2π =
1 2π
∞
g(j ω) p
(1 + ω2 )1/p 2 dω
−∞ ∞
|f ([(j ω − 1)/(j ω + 1)])|p −∞
2dω . 1 + ω2
(6.2.23)
To relate (6.2.22) and (6.2.23), let ej θ = (j ω − 1)/(j ω + 1). Then e−j θ/2 + ej θ/2 1 + ej θ = 1 − ej θ e−j θ/2 − ej θ/2 = j cot θ/2 .
jω =
(6.2.24)
Hence, θ = 2 cot−1 ω;
dθ =
Combining (6.2.25) and (6.2.23) completes the proof.
2dω . 1 + ω2
(6.2.25)
2
Finally, if f ∈ H2 , then its boundary function θ → f (ej θ ) belongs to L2 [0, 2π ]. Let + H+ 2 [0, 2π] denote the set of boundary values of functions in H2 . Then H2 [0, 2π ] is a closed sub− space of L2 [0, 2π]. Let H2 [0, 2π ] denote the orthogonal complement of H+ 2 [0, 2π ] in L2 [0, 2π ]. Then every function f ∈ L2 can be expressed uniquely in the form f = f+ + f− where f+ ∈ H+ 2 and f− ∈ H− (we drop the [0, 2π ] for brevity). This splitting up has a very intuitive interpretation. 2 The collection of functions {ej nθ , n ∈ Z} forms an orthonormal basis for L2 , so that every f ∈ L2 has the Fourier series ∞ fn ej nθ . (6.2.26) f (ej θ ) = n=−∞
162
6. FILTERING AND SENSITIVITY MINIMIZATION
j nθ , n ∈ Z }, while H− is the span of the set {ej nθ , n < 0}. Now H+ + 2 is the span of the functions {e 2 Hence, ∞ −1 f+ (ej θ ) = fn ej nθ , f− (ej θ ) = fn ej nθ . (6.2.27) n=0
n=−∞
Note that the function z → z−1 f− (z−1 ) belongs to H2 . Such a function is called coanalytic. In case f (·) is rational and has no poles on the unit circle, the above decomposition is even more familiar: Expand f into partial fractions; then f+ (resp. f− ) is the sum of the terms corresponding to poles outside (resp. inside) the unit disc. The above discussion applies, mutatis mutandis, to the space L2 (R).
6.3
FILTERING
The problem studied in this section is the following: Suppose a plant P ∈ M(R(s)) is specified, together with weighting matrices W1 , W2 . The objective is to minimize the weighted sensitivity measure J = W1 (I + P C)−1 W2 2+ , (6.3.1) over all C ∈ S(P ). At this stage, several assumptions are made to facilitate a solution. First, note that J might not even be defined for all C ∈ S(P ), because W1 (I + P C)−1 W2 , might not be strictly proper. To get around this problem, it is assumed that either W1 or W2 is strictly proper. Next, it is assumed that W1 , W2 are square, and that W1 (s) = 0, W2 (s) = 0 for all s ∈ C+ . Finally, for technical reasons, it is assumed that the plant P has at least as many inputs as outputs. This is the so-called “square” or “fat” plant case; the case of “tall” plants is discussed in Section 6.6. As pointed out in Section 6.1, this problem can be reformulated in a more manageable ˜ N˜ ) be an r.c.f. and an l.c.f. of P , and let X, ˜ Y˜ ∈ M(S) be any solutions of form. Let (N, D), (D, N˜ X˜ + D˜ Y˜ = I . Then, by Theorem 5.2.1, every C ∈ S(P ) is of the form (X˜ + DR)(Y˜ − NR)−1 for some R ∈ M(S) such that |Y˜ − NR| = 0. Moreover, the corresponding weighted sensitivity measure equals ˜ 2 2 J (R) = W1 (Y˜ − NR)DW ˜ 2 − W1 NR DW ˜ 2 2 . = W1 Y˜ DW
(6.3.2)
Hence, the problem is to minimize J (R) by a suitable choice of R ∈ M(S).2 This problem has a ˜ 2 generates a very nice geometrical interpretation. As R varies over M(S), the product W1 NR DW subspace of M(H2 (C+ )). To minimize J (·), one has to find the closest point in this subspace to ˜ 2 . Now, since at least one of the matrices W1 or W2 is strictly proper, it turns out that the W1 Y˜ DW ˜ 2 : R ∈ M(S)} is not closed. As a result, there may not exist a closest point in subspace {W1 N R DW ˜ 2 . In other words, the infimum of J (·) is not attained by any R, except in this subspace to W1 Y˜ DW trivial cases. It is nevertheless possible to compute the infimum of J (·), and to construct a family 2The constraint that |Y˜ − N R| = 0 is ignored since it is satisfied by almost all R (see Lemma 5.2.4).
6.3. FILTERING
(Rε ) of functions in M(S) such that J (Rε ) approaches this infimum as ε → based on the following lemma.
0+ . This
163
procedure is
Suppose F, G, H ∈ M(S), and suppose in addition that (i) F is strictly proper, (ii) G has full row rank at all points on the j ω-axis and at infinity,3 and (iii) H is square and nonsingular at all points on the j ω-axis and at infinity. Define Lemma 6.3.1
Jm (R) = F − GRH 2 ,
(6.3.3)
/ M(H2 (C+ )). Factorize G, H as G = Gi Go , H = Ho Hi , where and let Jm (R) = ∞ if F − GRH ∈ Gi , Hi are inner and Go , Ho are outer. For any, A ∈ M(S), let A∗ denote the rational matrix, A∗ (s) = A (−s).4 Then min Jm (R) = (Gi ∗ F Hi )− 2 , (6.3.4) R
and this minimum is attained by any R satisfying Go RHo = (Gi ∗ F Hi ∗ )+ .
(6.3.5)
Proof. First of all, observe that the equation Go RHo = A has a solution for R corresponding to every A ∈ M(S). The reason is that Go has a right inverse and Ho is a unit. To prove the lemma, note that Gi ∗ (s)Gi (s) = I, Hi ∗ (s)Hi (s) = Hi (s)Hi ∗ (s) = I ∀s, since both Gi and Hi are inner matrices. In particular, Gi ∗ (j ω) is a unitary matrix for all ω, and as a result for any A ∈ M(H2 (C+ )) we have A22 = Gi ∗ A22 , (6.3.6) where the norm on the left is the H2 -norm or the L2 -norm, while the norm on the right side is strictly the L2 -norm (since Gi ∗ A need not belong to M(H2 (C+ ))). In the same way, multiplication by Hi ∗ is also norm-preserving. Using this fact leads to (Jm (R))2 = F − GRH 22 = F − Gi Go RHo Hi 22 = Gi ∗ F Hi ∗ − Go RHo 22 .
(6.3.7)
If Gi ∗ F Hi is decomposed as (Gi ∗ F Hi ∗ )+ + (Gi ∗ F Hi ∗ )− , then by orthogonality, (Jm (R))2 = (Gi ∗ F Hi ∗ )− 22 + (Gi ∗ F Hi ∗ )+ − Go RHo 22 ≥ (Gi ∗ F Hi ∗ )− 22 . On the other hand, equality can be achieved in (6.3.8) by choosing R to satisfy (6.3.5).
(6.3.8)
2
3This means that G has at least as many columns as rows. 4 In particular, if s = j ω, then, A∗ (j ω) is the conjugate transpose of A(j ω), so that this notation is consistent with earlier usage.
164
6. FILTERING AND SENSITIVITY MINIMIZATION
If we attempt to apply Lemma 6.3.1 to minimize the function J (·) of (6.3.2), we run ˜ 2 may not satisfy the hypotheses into the difficulty that the functions G = W1 N and H = DW of Lemma 6.3.1. In particular, if P (j ω) has less than full row rank for some ω, then so does N(j ω) ˜ and of |H |. Finally, since (i.e., G(j ω)). Similarly, any j ω-axis poles of P correspond to zeros of |D| W1 or W2 is assumed to be strictly proper, either |G(∞)| = 0 or |H (∞)| = 0. To get around these difficulties, the following modification of Lemma 6.3.1 is required. Suppose F, G, H ∈ M(S), and suppose in addition that (i) G has the form G = G1 Dg , where G1 ∈ M(S) has full row rank at all points on the j ω-axis and at infinity, while Dg ∈ M(S) is a diagonal matrix Dg = Diag {dg1 , · · · , dgn }, where the only C+e -zeros of dgi (if any) are at infinity or on the j ω-axis. (ii) H has the form H = Dh H1 where H1 ∈ M(S) is square and nonsingular at all points on the j ω-axis and at infinity, while Dh ∈ M(S) is a diagonal matrix of the form Dh = Diag {dh1 , · · · , dhm }, where the only C+e -zeros of dhi (if any) are at infinity or on the j ω-axis. Under these conditions, Lemma 6.3.2
inf
R∈M(S)
F − GRH 2 =
inf
R∈M(S)
F − G1 RH1 2 .
(6.3.9)
Proof. As shown by Lemma 6.3.1, the infimum on the right side of (6.3.9) is actually attained. Let R0 ∈ M(S) be an element that achieves this minimum. Now every matrix of the form GRH is also of the form G1 R1 H1 (just let R1 = Dg RDh ). Thus, the infimum on the left side of (6.3.9) can be no smaller than that on the right side. Hence, in order to prove the lemma, it is enough to construct a family of functions (Rε )ε>0 in M(S) such that F − GRε H 2 → F − G1 R0 H1 2 as ε → 0+ . Suppose a ∈ S has the property that a(s) = 0 whenever Re s > 0, so that the only possible C+e -zeros of a are at infinity or on the j ω-axis. (Note that all the dgi , dhi , are of this type.) To be specific, let j ω1 , · · · , j ωl (where ωl may be infinite) denote the distinct j ω-axis zeros of a, with multiplicities m1 , · · · , ml , respectively. It is claimed that there exists a family of functions (bε )ε>0 in S such that (i) bε /a is a unit of S for each ε > 0. (ii) bε (j ω) → 1 uniformly on all compact subsets of (−∞, ∞) that do not contain any of the ωi ’s, as ε → 0+ . To see this, note that a(s) is of the form l 1 n s − j ωi mi · , (6.3.10) a(s) = u(s) s+1 s+1 i=1
for some unit u ∈ S. Hence, bε (s) =
n mi l s − j ωi 1 · , s + ε − j ωi εs + 1 i=1
(6.3.11)
6.3. FILTERING
165
satisfies the requirements (i) and (ii). In view of this, it follows that there exists families Bεg , Bεh of diagonal matrices in M(S) such that (i) Dg−1 Bεg is unimodular for all ε > 0. (ii) Bεh Dh−1 is unimodular for all ε > 0. (iii) Bεg (j ω) → I, Bεh (j ω) → I as ε → 0+ uniformly on all compact subsets of (−∞, ∞) that do not contain the zeros of |Dg |, |Dh |. Now define Rε = Dg−1 Bεg R0 Bεh Dh−1 , where R0 ∈ M(S) attains the infimum on the right side of (6.3.9). Then F − GRε H 2 = F − G1 Dg · Dg−1 Bεg R0 Bεh Dh−1 · Dh H1 2 = F − G1 Bεg R0 Bεh H1 2 .
(6.3.12)
The proof of the lemma is complete if it can be shown that the last quantity in (6.3.12) approaches F − G1 R0 H1 2 as ε → 0+ . But this follows from the limiting behavior of Bεg , Bεh as described in (iii) above. 2 We are now ready to tackle the general problem of minimizing J (·) as defined in (6.3.2). Let ˜ 2 , and suppose Gs denote the Smith form of W1 N and Hs the Smith form of DW Ug Gs Vg = W1 N,
˜ 2, Uh Hs Vh = DW
(6.3.13)
where Ug , Vg , Uh , Vh ∈ U(S). Then ˜ 2 2 ˜ 2 − W1 NR DW J (R) = W1 Y˜ DW ˜ 2 − Ug Gs Vg RUh Hs Vh 2 . = W1 Y˜ DW
(6.3.14)
Since Vg , Uh are unimodular, one can replace Vg RUh by another “free” parameter, say R1 . Moreover, Gs (and Hs ) can be expressed as a product of two diagonal matrices Eg and Dg (resp. Eh and Dh ) such that |Eg | is nonzero at all points on the j ω-axis and at infinity, while |Dg (s)| = 0 whenever s ∈ C+ . Now, defining G1 = Ug Eg , H1 = Eh Vh , (6.3.15) shows that
˜ 2 − G1 Dg R1 Dh H1 2 , J (R) = W1 Y˜ DW
(6.3.16)
where R1 = Vg RUh . Hence, by Lemma 6.3.2, ˜ 2 − G1 RH1 2 , inf J (R) = inf W1 Y˜ DW R
R
(6.3.17)
which can be computed using Lemma 6.3.1. The situation is much cleaner in the case of discrete-time systems. First, since the interval [0, 2π ] is compact, every H∞ -function is also in H2 . Further, a strictly causal plant, which is the
166
6. FILTERING AND SENSITIVITY MINIMIZATION
discrete analog of a strictly proper plant, is one whose transfer matrix is zero at z = 0. As a result, in the formulation of the discrete-time filtering problem, the weighting matrices W1 and W2 can be any rational H∞ -matrices and need not be strictly causal. Moreover, an optimal controller always exists and the approximation procedure of Lemma 6.3.2 is not needed.
6.4
SENSITIVITY MINIMIZATION: SCALAR CASE
Because the sensitivity minimization problem is much more involved technically than the filtering problem, the discussion of the former is divided into three parts, namely the scalar case, the fat plant case, and the general case. Accordingly, the problem studied in this section is the following: Suppose p is a given scalar plant, and w is a given element of S. The objective is to find a c ∈ S(p) that minimizes the weighted sensitivity measure5
w(j ω)
.
J = sup
(6.4.1)
ω∈R 1 + p(j ω)c(j ω) As in the previous section, one can transform this problem into an affine minimization problem. Let (n, d) be a coprime factorization of p, and let x, y ∈ S be such that xn + yd = 1. Then the problem becomes: Minimize J (r) = sup |[(y − rn)dw](j ω)| ω∈R
= (y − rn)dw∞ ,
(6.4.2)
by suitable choice of r ∈ S. Moreover, it can be assumed without loss of generality that the weighting function w is outer. To see this, factor w in the form w = wi wo where wi and wo are inner and outer, respectively. Then, since multiplication by the inner function wi is norm-preserving, we have J (r) = (y − rn)dwo ∞ . In order to take full advantage of the available results on Hp -spaces, a bilinear transformation is now used to map the set S into H∞ . Suppose f ∈ S, and define the function f˜ by f˜(z) = f [(1 + z)/(1 − z)] .
(6.4.3)
Since the bilinear transformation s = (1 + z)(1 − z) maps the open unit disc into the open right half-plane, it follows that f˜ is a rational element of H∞ , i.e., f ∈ R∞ . Moreover, f˜∞ =
sup |f˜(ej θ )| θ∈[0,2π]
= sup |f (j ω)| = f S ,
(6.4.4)
ω∈R
since the unit circle maps onto the imaginary axis plus the point at infinity. Conversely, suppose g˜ ∈ H∞ is rational, and define g by g(s) = g[(s ˜ − 1)/(s + 1)] . 5 Note that the two weights w and w are combined into one, since we are dealing with the scalar case. 1 2
(6.4.5)
6.4. SENSITIVITY MINIMIZATION: SCALAR CASE
167
Then g ∈ S and g ˜ ∞ = gS . In this way one can set up a one-to-one norm-preserving correspondence between functions in S and functions in R∞ . If this procedure is adopted, the problem under study becomes the following: Find a function r˜ ∈ R∞ that minimizes J = (y˜ − r˜ n) ˜ d˜ w ˜ ∞.
(6.4.6)
At this stage we drop the tilde and state the problem in H∞ rather than in R∞ ; the reason for the latter is that H∞ is a Banach space while R∞ is merely a normed space. The problem in essence is as follows: Given two rational functions f, g ∈ H∞ , find a rational function r ∈ H∞ that minimizes J (r) = f − gr∞ .
(6.4.7)
The definitions of f and g when J is as in (6.4.6) are obvious. The problem of minimizing J has a very simple geometric interpretation: Let gH∞ denote the subspace {gr : r ∈ H∞ }. Then the problem is to find the closest point to f in the subspace gH∞ , where distances are measured using the norm · ∞ . It turns out that the subspace gH∞ is closed if and only if g has no zeros on the unit circle. If gH∞ is a closed subspace, then it can be shown that, corresponding to each rational f ∈ H∞ there exists a unique closest point in gH∞ (more on this shortly). Unfortunately, if the plant p has zeros on the j ω-axis or at infinity, then so does n. As a result, g = n˜ = n˜ d˜ w˜ has one or more zeros on the unit disc, and the subspace gH∞ is not closed. As a consequence, if one defines J0 = inf f − gr∞ , r∈H∞
(6.4.8)
then the infimum in (6.4.8) may or may not be attained, depending on f . In the original context, this means that in general there does not exist a single compensator that minimizes the weighted sensitivity measure (6.4.1), in the practically important case where the plant is strictly proper (or has j ω-axis zeros). In view of the above discussion, one would nevertheless like to know the value of J0 , as well as a sequence {ri } of functions in H∞ such that J (ri ) → J0 . This information can then be translated back into a greatest lower bound on the weighted sensitivity measure of (6.4.1), as well as an optimizing sequence {ci } of compensators. In order to do this, first carry out an inner-outer factorization of g. Thus, g = bg0 where b is a finite Blaschke product and g0 is outer. Now factor g0 as vu where v ∈ H∞ has all of its zeros on the unit circle and u is a unit of H∞ . Then J0 = inf f − bdur∞ = inf f − bvr∞ , r∈H∞
r∈H∞
(6.4.9)
since the unit factor u can be absorbed into the “free” parameter r. Now consider the modified problem: Minimize f − rb∞ by a suitable choice of r ∈ H∞ . Define Jm = inf f − rb∞ , r∈H∞
(6.4.10)
168
6. FILTERING AND SENSITIVITY MINIMIZATION
as the value of the modified problem. In view of earlier discussion, the infimum in (6.4.10) is actually attained, since b has no zeros on the unit circle. Now, since every function of the form bvr is also a multiple of b, it is clear that Jm ≤ J0 . The next lemma gives conditions under which equality holds. Lemma 6.4.1
only if
|f (ej θi )|
Let ej θ1 , · · · , ej θl denote the distinct zeros of v on the unit circle. Then J0 = Jm if and ≤ Jm ∀i.
Proof. “only if ” Since v(ej θi ) = 0 ∀i, we have (bvr)(ej θi ) = 0 ∀i, ∀r ∈ H∞ .6 Hence, (f − bvr)(ej θi ) = f (ej θi ) ∀i, and as a result f − bvr∞ =
sup |[f − bvr](ej θ )| θ∈[0,2π]
≥ max |[f − bvr](ej θi )| i
= max |f (ej θi )|, ∀r ∈ H∞ . i
(6.4.11)
Combining (6.4.11) with the hypothesis that J0 = Jm leads to Jm = J0 = inf f − bvr∞ ≥ max |f (ej θi )| . i
r∈H∞
(6.4.12)
“if ” Select an r0 ∈ H∞ such that Jm = f − r0 b∞ ; such an r0 exists, by earlier remarks. Let f0 = f − r0 b. Next, introduce the function
and define
hθ,ε (z) = (z − ej θ )/[z − (1 + ε)ej θ ] ,
(6.4.13)
l [hθi ,ε (z)]mi , hε (z) =
(6.4.14)
i=1
where mi is the multiplicity of ej θi as a zero of v(·). Now observe that (i) hε /v is a unit of H∞ for each ε > 0, and (ii) hε (ej θ ) → 1 uniformly on all compact subsets of [0, 2π ] that do not contain any of the θi ’s, as ε → 0+ . Consider the family (rε )ε>0 in H∞ defined by rε = r0 hε /v. Then by (6.4.9), f − bvrε ∞ ≥ J0 ,
for each ε > 0 .
(6.4.15)
On the other hand, f − bvrε = f − br0 hε so that |(f − bvrε )(ej θ )| → |f0 (ej θ )| ≤ Jm uniformly on all compact subsets of [0, 2π ] that do not contain any of the θi ’s, and |(f − bvrε )(ej θi )| = |f (ej θi )| ≤ Jm by hypothesis. This means that lim sup f − bvrε ∞ ≤ Jm . ε→0+
6 Strictly speaking, r(ej θi ) is undefined for a general H
(6.4.16)
∞ -function, and this statement is only valid under more restrictive conditions, e.g., r is rational. These technicalities can be addressed, but spelling out the details would obscure the main ideas of the proof.
6.4. SENSITIVITY MINIMIZATION: SCALAR CASE
169
Combining (6.4.15), (6.4.16) and the fact that J0 ≥ Jm leads to the conclusion that J0 = Jm and that (6.4.17) f − bvrε ∞ → Jm as ε → 0+ .
2
This completes the proof.
In the above proof, it is easy to see that there is nothing special about the function hε of (6.4.14), and that any function hε can be used so long as (i) hε ∈ H∞ ∀ε > 0, (ii) hε (ej θi ) = 0 ∀i, and (iii) hε (ej θ ) → 1 uniformly on all compact subsets of [0, 2π ] that do not contain any of the 0i ’s. Note that it f (ej θi ) = 0 ∀i, then the condition of Lemma 6.4.1 is automatically satisfied and J0 = Jm . Going back to (6.4.6), we have f = y˜ d˜ w˜ and g = n˜ d˜ w. ˜ Hence, if the weighting function w is chosen such that every unit circle zero of n˜ is also a zero of w, ˜ then J0 = Jm . (Note that unit circle zeros of d˜ are already common to both f and g.) Translating this condition back to the s-domain yields the result that J0 = Jm provided w is strictly proper whenever p is, and vanishes at the j ω-axis zeros of the plant. Once such a weighting function is chosen, then any unit circle zeros of n˜ d˜ w˜ can be “deleted” without affecting the infimum of J . Example 6.4.2
Let p be a strictly proper, nonminimum phase plant with the transfer function p(s) =
s−1 . (s − 3)(s 2 + 1)
Let the weighting function w equal w(s) =
1 . s+1
Then w(∞) = p(∞) = 0, so that J0 = Jm in this case. Now, substituting s = (1 + z)/(1 − z) gives p(z) ˜ =
n(z) ˜ z(z − 1)2 = , 2 ˜ 2(2z − 1)(z + 1) d(z)
where n, ˜ d˜ are the numerator and denominator polynomials of p, ˜ respectively. A particular solution of x˜ n˜ + y˜ d˜ = 1 can be obtained by Euclidean division and back substitution, namely x(z) ˜ = 6z2 +
13 , 2
1 y(z) ˜ = − (6z2 − 9z + 2) . 4
Finally, w(z) ˜ =
1−z . 2
Hence, J (r) = f − rg∞ ,
170
6. FILTERING AND SENSITIVITY MINIMIZATION
where 1 ˜ f (z) = (w˜ y˜ d)(z) = (z − 1)(2z − 1)(z2 + 1)(6z2 − 9z + 2), 4 ˜ g(z) = (w˜ n˜ d)(z) = −z(z − 1)(2z − 1)(z2 + 1) . If we factor g in the form g = bvu where b is a Blaschke product, v has all of its zeros on the unit circle, and u is a unit, then z(z − 0.5) , 1 − 0.5z v(z) = (z − 1)3 (z2 + 1), b(z) =
u(z) = −2(1 − 0.5z) . Now Lemma 6.4.1 allows us to ignore v and u, in that inf J (r) = Jm = inf f − rb∞ . r
r
Lemma 6.4.1 provides conditions under which, by solving the minimization problem in (6.4.10) (which does have a solution), it is possible to determine the value of the infimum in (6.4.9). Actually, it turns out that it is possible to compute Jm (the value of the problem in (6.4.10)) without actually computing the r0 that attains the minimum. If the conditions of Lemma 6.4.1 hold, this Jm is also the infimum in (6.4.9). In this way, it is possible to obtain rapidly a figure of merit against which suboptimal choices of r in (6.4.9) can be compared. If the optimization problem in (6.4.10) is solved completely, i.e., if r0 is also determined, then a family rε such that f − grε ∞ → Jm can be computed as in the proof of Lemma 6.4.1. The next step then is to give a procedure for computing Jm . We begin with a result that is now obvious, but has some practical significance. Lemma 6.4.3 Suppose the plant p is strictly proper and stable, and has no zeros in the open right half-plane; suppose also that the weighting function w is stable, strictly proper and vanishes at all j ω-axis zeros (if any) of p. Then
w inf sup
(j ω)
= 0 . (6.4.18) c∈S(p) ω 1 + pc
Remarks 6.4.4 In other words, Lemma 6.4.3 states that if the plant is strictly proper, stable, and “minimum phase,”7 then the weighted sensitivity measure can be made arbitrarily small provided the weighting function is also stable and vanishes at all RHP zeros of the plant. 7 Normally, the term “minimum phase” implies that the plant has no zeros in the closed right half-plane; here the usage is slightly
different.
6.4. SENSITIVITY MINIMIZATION: SCALAR CASE
171
Proof. In this case d˜ is a unit, and all zeros of n˜ in the closed unit disc are in fact on the unit circle. As a result, the finite Blaschke product b equals 1, and the condition of Lemma 6.4.1 is satisfied. Finally, it is clear that inf f − rb∞ = inf f − r∞ = 0 , (6.4.19) r
r
2
corresponding to the choice r = f .
In the general case the infimum Jm in (6.4.10) can be computed by associating, with each H∞ -function f − rb, the corresponding continuous linear operator Tf −rb mapping H2 into itself, as defined in (6.2.9). Since Th = h∞ ∀h ∈ H∞ , it follows that Jm = inf Tf −rb . r∈H∞
(6.4.20)
Let bH2 denote the range of the operator Tb ; i.e., bH2 = {bg : g ∈ H2 } ,
(6.4.21)
and let N denote the orthogonal complement of bH2 . Thus, N = {h ∈ H2 : h, bg = 0 ∀g ∈ H2 } .
(6.4.22)
Finally, let denote the orthogonal projection mapping H2 onto N. Now, as r varies over H∞ , the operator Tf −rb ranges over various operators. However, the operator Tf −rb is the same for all r, and in fact equals Tf . To see this, let g ∈ H2 , be arbitrary; then Tf −rb g = [(f − rb)g] = (f g) − (rbg) = (f g) − (brg) = (f g) = Tf g ,
(6.4.23)
since brg ∈ bH2 and hence its projection onto N is zero. Using this idea, one can prove the following result (see Sarason [86] for the proof ): Fact 6.4.5
Define a linear map A : N → N to be the restriction of Tf to N. Then inf f − rb∞ = A .
r∈H∞
(6.4.24)
Moreover, if f (ej θ ) is a continuous function of θ, then there is a unique function r0 ∈ H∞ that attains the infimum in (6.4.26). Let f0 = f − r0 b, and let g ∈ N be a function such that Ag2 = g2 . Then f0 /A is inner, and (Ag)(z) f0 (z) = . (6.4.25) g(z) Fact 6.4.5 is very useful because it is possible to compute all the quantities involved using only linear algebra. First of all, the subspace N is finite-dimensional, and in fact its dimension equals the
172
6. FILTERING AND SENSITIVITY MINIMIZATION
number of zeros of the Blaschke product b. Next, one can readily find a basis for N, which can be converted to an orthonormal basis using the Gram-Schmidt procedure. Let {v1 , · · · , vn } denote this orthonormal basis. Then one can compute a matrix representation of the linear operator A with respect to this basis; call it A0 . Now (6.4.24) states that the infimum of f − rb∞ is precisely the largest singular value of A0 . Further, if p = [p1 · · · pn ] denotes the (column) eigenvector of A0 ∗ A0 corresponding to the eigenvalue (σ¯ (A0 ))2 , then n
f0 (z) =
(A0 p)i vi (z)
i=1 n
.
(6.4.26)
pi vi (z)
i=1
Let us elaborate each of these points in turn.
Let b be a finite Blaschke product, and define N as in (6.4.22). Let {λ1 , · · · , λn } denote the distinct zeros of b, with multiplicities {m1 , · · · , mn }, respectively. Then a (nonorthonormal) basis for N is given by the functions Lemma 6.4.6
zk−1 , (1 − λ¯ i z)k
k = 1, · · · , mi ; i = 1, · · · , n .
(6.4.27)
The proof follows readily from the following lemma.
Lemma 6.4.7
Suppose λ is a complex number with |λ| < 1, k is an integer greater than or equal to 1,
and define φ(z) =
zk−1 . ¯ k (1 − λz)
(6.4.28)
Then φ ∈ H2 ; moreover, for any g ∈ H2 , φ, g =
g (k−1) (λ) . (k − 1)!
(6.4.29)
6.4. SENSITIVITY MINIMIZATION: SCALAR CASE
173
¯ By definition (6.2.5), we have Proof. Clearly φ ∈ H2 , since its only singularity is at 1/λ. 1 φ, g = 2π =
=
1 2π 1 2π
2π 0 2π
0 2π
0
1 = 2πj
C
¯ j θ )g(ej θ ) dθ φ(e e−j (k−1)θ g(ej θ ) dθ ¯ −j θ )k (1 − λe ej θ g(ej θ ) dθ (ej θ − λ)k g(z) dz , (z − λ)k
(6.4.30)
where C denotes the unit circle oriented counterclockwise (note that z = ej θ on C, so that dz = j ej θ dθ). Finally, by the Cauchy integral formula, the last term in (6.4.30) equals g (k−1) (λ)/(k − 1)!. 2 Proof of Lemma 6.4.6. The subspace N is defined by (6.4.22). Let g ∈ H2 be arbitrary. Then bg ∈ H2 has a zero of multiplicity mi at λi , for i = 1, · · · , n. Hence, if φik denotes the function φik (z) =
zk−1 , (1 − λ¯ i z)k
(6.4.31)
then, by Lemma 6.4.7, φik , bg =
(bg)(k−1) (λi ) =0, (k − 1)!
(6.4.32)
whenever 1 ≤ k ≤ mi . Thus, each of the functions φik belongs to N. It is routine to verify that they are linearly independent. Finally, in order to show that they span N, it is enough to show that the orthogonal complement of the span of the φik ’s is contained in bH2 . Accordingly, suppose h ∈ H2 is orthogonal to each φik . Then, by Lemma 6.4.7, h has a zero of multiplicity mi at λi , for all i, and 2 is hence of the form bg for some g ∈ H2 . Once a basis has been found for N, it is easy to construct an orthonormal basis using the Gram-Schmidt procedure. Let us drop the double subscript notation, let n denote the dimension of the null space N, and let {φ1 , · · · , φn } denote the basis for N given by 6.4.6. Let ∈ C n×n denote the associated Grammian matrix; i.e., γij = φi , φj .
(6.4.33)
174
6. FILTERING AND SENSITIVITY MINIMIZATION
Note that the constants γij can be readily computed using Lemma 6.4.7. For instance, if b has only simple poles at λ1 , · · · , λn , then Lemma 6.4.6 shows that the set φi (z) =
1 , 1 − λ¯ i z
(6.4.34)
is a basis for N. Next, by Lemma 6.4.7, γij = φi , φj = φj (λi ) =
1 . 1 − λ¯ j λi
(6.4.35)
Since is the Grammian of a linearly independent set, it is Hermitian and positive definite. Hence, it is possible to factor in the form B ∗ B where B is nonsingular. Let X = B −1 , and define vj (z) =
n
φi (z)xij ,
j = 1, · · · , n .
(6.4.36)
i=1
Then {v1 , · · · , vn } is an orthonormal basis for N. If the factor B is selected to be upper triangular, then this is just the Gram-Schmidt procedure. The next step is to obtain a matrix representation A0 for the operator A = Tf with respect to the basis {vi }. This is given by aij = vi , Avj = vi , Tf vj = vi , f vj .
(6.4.37)
Now, by Lemma 6.4.7, a quantity of the form φi , f φj is very easy to compute. Hence, it is desirable to expand aij as a summation of terms of the form φi , f φj using (6.4.36). Let denote the matrix in C n×n whose ij -th element is φi , f φj . Then routine computations show that A0 = X∗ X .
(6.4.38)
Now Fact 6.4.5 asserts that the infimum Jm in (6.4.10) equals A0 . Further, if p ∈ C n is a column vector such that A0 p2 = p2 , then the optimal function f0 is given by (6.4.26). Finally, as is the case with the filtering problem, the solution of the sensitivity minimization problem is a lot cleaner in the discrete-time case than in the continuous-time case. If p(z) is the transfer function of a strictly-causal plant, then p(0) = 0. If, in addition, p(·) has no poles nor zeros on the unit circle, and the weighting function w(·) has no zeros on the unit circle, then an optimal controller always exists. This section is concluded with an illustrative example. Example 6.4.8
Consider once again the plant of Example 6.4.2, namely p(s) =
s−1 , (s − 3)(s 2 + 1)
6.4. SENSITIVITY MINIMIZATION: SCALAR CASE
175
together with the weighting function w(s) =
1 . s+1
By previous discussion, the sensitivity minimization problem reduces to finding an r ∈ H∞ that minimizes f − rb∞ , where f and b are computed in Example 6.4.2. The first objective of this example is to show that it is unnecessary to compute f and b explicitly. Now b is the Blaschke product part of n˜ d˜ w. ˜ Hence, to form b, it is only necessary to know the locations of the zeros of ˜ w˜ inside the open unit disc. In the s-domain, this translates to knowing the open RHP zeros n, ˜ d, and poles of the plant, as well as the open RHP zeros of the weighting function. Moreover, as stated earlier, it can be assumed without loss of generality that the weighting function is outer, so that it has no open RHP zeros. In the problem at hand, p has a zero at s = 1, and a pole at s = 3. Under the bilinear transformation z = (s − 1)/(s + 1), these points correspond to z = 0, 0.5, respectively. Hence, z(z − 0.5) b(z) = . 1 − 0.5z Next, in order to apply Fact 6.4.5 to compute the optimal function f0 , it is only necessary to know the values of f (and its derivatives in case b has repeated zeros) at the zeros of b. Since b is the inner part of n˜ d˜ w˜ and f = y˜ d˜ w, ˜ it follows that if z is a zero of b, then f (z) = w(z) ˜ or 0, depending on whether z is a zero of n˜ (i.e., a plant zero) or a zero of d˜ i.e., a plant pole). In the present instance, f (0) = w(0) ˜ = w(1) = 0.5, f (0.5) = 0. All of this discussion is made more precise in the next section, where it is shown that the sensitivity minimization problem is one of interpolation by a function of minimum norm. To compute the optimal function f0 , we first find a basis for the null space N of (6.4.22). By Lemma 6.4.6, such a basis is given by φ1 (z) = 1,
φ2 (z) =
1 . 1 − 0.5z
From (6.4.35), the Grammian matrix is 1 1
= , 1 4/3 which can be factored as
1 0√
= 1 1/ 3 Hence, X=B
−1
1 1√ = B B . 0 1/ 3
√ 1 −√3 , = 3 0
176
6. FILTERING AND SENSITIVITY MINIMIZATION
and an orthonormal basis for N is given by v1 (z) = φ1 (z) = 1,
v2 (z) =
√ √ z . 3(φ2 (z) − φ1 (z)) = 3 2−z
Next we need to compute A0 = X Xwhere φij = φi , f φj . From Lemma 6.4.7, φ1 , f φ1 = (f φ1 )(0) = w(0) ˜ = 0.5, φ1 , f φ2 = (f φ2 )(0) = 0.5, φ2 , f φ1 = (f φ1 )(0.5) = 0, φ2 , f φ2 = (f φ2 )(0.5) = 0 . Therefore
0.5 0 √ A0 = X X = . −0.5 3 0
It is routine to compute √ that σ¯ (A0 ) = 1, and that the associated vector p is [1 0] , which leads to A0 p = [0.5 − 0.5 3] . Hence, by (6.4.26) the optimal function f0 is given by √ 0.5v1 (z) − 0.5 3v2 (z) f0 (z) = v1 (z) 1 − 2z = . 2−z
6.5
SENSITIVITY MINIMIZATION: FAT PLANT CASE
The sensitivity minimization problem for multivariable systems is much deeper than the corresponding problem for scalar systems. At present, a clean solution to the problem is available only in the case where the plant has at least as many inputs as outputs. Not all of the details of this solution are given in this section, as they require some background concepts from functional analysis not covered previously. Suppose P is a given strictly proper plant with at least as many inputs as outputs, and that W1 , W2 ∈ M(S) are weighting matrices such that W1 has full row rank (as a rational matrix) and W2 is square and nonsingular. The objective is to minimize the weighted sensitivity measure J = W1 (I + P C)−1 W2 ∞ .
(6.5.1)
˜ N˜ ) be any r.c.f. and any l.c.f. of P , and let X, ˜ Y˜ ∈ M(S) be any solutions of Let (N, D), (D, N˜ X˜ + D˜ Y˜ = I . Then, as shown in Section 6.2, one can replace (6.5.1) by the equivalent problem of minimizing ˜ 2 ∞ , ˜ 2 − W1 NR DW J (R) = W1 Y˜ DW (6.5.2)
6.5. SENSITIVITY MINIMIZATION: FAT PLANT CASE
177
as the “free” parameter R varies over M(S). Now define rational H∞ -matrices F, G, H by ˜ 2 )[(1 + z)/(1 − z)] , F (z) = (W1 Y˜ DW
(6.5.3)
G(z) = (W1 N)[(1 + z)/(1 − z)] , ˜ 2 )[(1 + z)/(1 − z)] . H (z) = (DW
(6.5.4) (6.5.5)
Then the problem becomes one of minimizing J (R) = F − GRH ∞ ,
(6.5.6)
suitably choosing R from the set of rational H∞ -matrices. Since the plant is strictly proper, we have that G(1) = 0. As a result, the infimum of J is not attained in general. To deal with this situation, it is necessary to extend Lemma 6.4.1 to the matrix case. For this purpose, define J0 =
inf
R∈M(H∞ )
F − GRH ∞ .
(6.5.7)
Suppose G and H have the following properties: G is of the form G1 g where G1 ∈ M(H∞ ), g ∈ H∞ , and in addition G1 (ej θ ) has full row rank at all θ ∈ [0, 2π ], and all zeros of g(·) are on the unit circle. Similarly, H is of the form H1 h where H1 ∈ M(H∞ ), h ∈ H∞ , and in addition H1 (ej θ ) is square and nonsingular at all θ ∈ [0, 2π], and all zeros of h(·) are on the unit circle. This means in particular that if G(ej θ ) has less than full rank for some θ , then G(ej θ ) must in fact equal the zero matrix; similar remarks apply to H . Note that the assumptions on G and H are trivially satisfied in the scalar case. Now define Jm =
inf
R∈M(H∞ )
F − G1 RH1 ∞ .
(6.5.8)
It can be shown that the infimum in (6.5.8) is achieved, owing to the rank conditions on G1 and H1 . The next result is a generalization of Lemma 6.4.1. Let ej θ1 , · · · , ej θl denote the distinct zeros of the product gh on the unit circle. Then J0 = Jm if and only if F (ej θi ) ≤ Jm ∀i. Lemma 6.5.1
Proof. Since every matrix of the form GRH is also of the form G1 RH1 it is easy to see that Jm ≤ J0 . “only if ” Since gh(ej θi ) = 0 ∀i, it follows that GRH (ej θi ) = 0 ∀i, ∀R ∈ M(H∞ ). Hence, (F − GRH )(ej θi ) = F (ej θi ) ∀i, ∀R ∈ M(H∞ ), and as a result F − GRH ∞ =
sup (F − GRH )(ej θ ) θ∈[0,2π]
≥ max (F − GRH )(ej θi ) i
= max F (ej θi ), ∀R ∈ M(H∞ ) . i
(6.5.9)
178
6. FILTERING AND SENSITIVITY MINIMIZATION
Combining (6.5.9) with the hypothesis that J0 = Jm leads to Jm = J0 =
inf
R∈M(H∞ )
F − GRH ∞ ≥ max F (ej θi ) . i
(6.5.10)
“if ” Select an R0 ∈ M(H∞ ) such that Jm = F − G1 R0 H1 ∞ . Such an R0 exists, by earlier remarks. Now recall the function (cf. (6.4.13)) hθ,ε (z) = and define hε (z) =
z − ej θ , z − (1 + ε)ej θ
(6.5.11)
l [hθi ,ε (z)]mi ,
(6.5.12)
i−1
where mi is the multiplicity of ej θi as a zero of gh. Define the family (Rε ) in M(H∞ ) by Rε = R0 hε /gh. Then F − GRε H = F − G1 R0 H1 hε . Moreover, as in the proof of Lemma 6.4.1, it will 2 follow that F − G1 R0 H1 hε ∞ → F − G1 R0 H1 ∞ = Jm as ε → 0+ . Hence, J0 = Jm . To interpret Lemma 6.5.1 in the context of the sensitivity minimization problem, suppose the plant P has blocking zeros8 at s = ±j ω1 , · · · , ±j ωn and at infinity, but has full row rank at all other points on the j ω-axis. Suppose also that P has no j ω-axis poles. Select the weighting matrices W1 , W2 such that W1 has the same blocking zeros as P on the extended j ω-axis, but has full row rank at all other points on the j ω-axis, while W2 (j ω) is square and nonsingular at all points on the extended j ω-axis.Then F (ej θi ) = 0 ∀i and the condition F (ej θi ) ≤ Jm is automatically satisfied; as a result, J0 = Jm . Moreover, if R0 is a choice of R that attains the infimum in (6.5.8), then the family (Rε = R0 hε /gh) is an optimizing family for J (·). This leads to a family of compensators (Cε ) for P such that W1 (I + P Cε )−1 W2 ∞ → Jm as ε → 0+ . Let us therefore turn our attention to the problem of computing the infimum in (6.5.8) and an R that attains this infimum. As a first step, factorize G1 and H1 in the form G1 = Gi Go , H1 = Ho Hi , where Gi , Hi are inner, Go ∈ M(H∞ ) has a right inverse, and Ho ∈ M(H∞ ) is a unit. This is possible in view of the rank conditions on G1 and H1 (see Fact 6.2.2). Now, for any V ∈ M(H∞ ), the equation V = Go RHo has a solution for R in M(H∞ ). Hence, inf
R∈M(H∞ ) adj
F − G1 RH1 ∞ =
inf
V ∈M(H∞ )
F − Gi V Hi ∞ .
(6.5.13)
adj
Next, let Gi , Hi denote the adjoint matrices of Gi , Hi , respectively, and note that these adjoint matrices are themselves inner. Let b = |Gi | · |Hi |, and note that b is an inner function. Now, since multiplication by inner matrices is norm-preserving, it follows that adj
adj
F − Gi V Hi ∞ = Gi F Hi
adj
= F0 − bV ∞ , ∀V ∈ M(H∞ ) , 8 Recall that s is a blocking zero of P if P (s) = 0.
adj
− Gi Gi V Hi Hi
∞ (6.5.14)
6.5. SENSITIVITY MINIMIZATION: FAT PLANT CASE
where F0 =
adj adj Gi F Hi . Hence, from
Jm =
179
(6.5.8), (6.5.13), and (6.5.14), inf
V ∈M(H∞ )
F0 − bV ∞ .
(6.5.15)
As a final simplification, we may as well assume that b is a (finite) Blaschke product: Since b is inner, it equals a constant times a Blaschke product; the constant can be absorbed into the “free” matrix V . Let us now switch notation and restate the problem. Given a rational matrix F ∈ M(H∞ ) and a finite Blaschke product b, find an R ∈ M(H∞ ) that minimizes F − bR∞ . There are two aspects to this problem: (i) computing the minimum, and (ii) finding an R that attains the minimum. Each aspect is discussed separately. Backtracking through the various manipulations, the reader can verify that the matrices F and R have the same dimensions, and that they have at least as many columns as rows. To be specific, suppose they are of dimension r × s, where r ≤ s. Recall from Section 6.2 that Hs2 is a Hilbert space under the inner product 2π 1 f, g = f ∗ (ej θ )g(ej θ ) dθ . (6.5.16) 2π 0
As in the scalar case, with each matrix H ∈ Hs×s ∞ one can associate a continuous linear operator TH : Hs2 → Hs2 in the obvious way, namely: (TH f )(z) = H (z)f (z) ∀z ∈ D,
f ∈ Hs2 .
(6.5.17)
sup σ¯ (H (ej θ )) .
(6.5.18)
Moreover, this association is norm-preserving; that is, TH f 2 f ∈Hs \0 f 2
TH = sup 2
= H ∞ = ess.
θ∈[0,2π]
It turns out that Fact 6.4.5 has a complete analog in the multivariable case in so far as computing the infimum of F − bR∞ . This analog is based on a preliminary result taken from [86]. Some notation smoothens the way to a presentation of the latter. Let bHs2 = {bf : f ∈ Hs2 } ,
(6.5.19)
and let N denote the orthogonal complement of bHs2 in Hs2 ; thus N = {g ∈ Hs2 : g, bf = 0 ∀f ∈ Hs2 } .
(6.5.20)
Finally, let denote the orthogonal projection mapping Hs2 onto N. s For each ∈ Hs×s ∞ , let denote the map T from H2 into N. Let denote the collection { : ∈ Hs×s ∞ }. Then, corresponding to each map M in the collection , there exists a matrix such that M = ∞ and = M. ∈ Hs×s ∞
Lemma 6.5.2
180
6. FILTERING AND SENSITIVITY MINIMIZATION
2
Proof. See [86], Theorem 3.
Theorem 6.5.3
Suppose F ∈ Hr×s ∞ with r ≤ s. If r < s, define F0 =
F 0r−s×s
,
(6.5.21)
so that F0 ∈ Hs×s ∞ . Define a linear mapping, A : N → N as the restriction of TF0 to N. Then inf
s×s R∈H∞
F − bR∞ = A .
(6.5.22)
As in the scalar case, the subspace N is finite-dimensional, so that A can be computed by purely algebraic means. More on this later. Remarks 6.5.4
Proof. As a first step, suppose r = s so that F is square. As R varies over Hs×s ∞ , the operator TF −bR ranges over various operators. However, the operator TF −bR is the same for all R, and equals TF . To see this, let g ∈ Hs2 be arbitrary. Then TF −bR g = (F − bR)g = (F g) − (bRg) = TF g ,
(6.5.23)
since bRg ∈ Hs2 and hence its projection onto N is zero. Now TF is what is called F in Lemma 6.5.2, which states that there exists a matrix ∈ Hs×s ∞ such that = F and such that ∞ = F = (TF ). But = F is equivalent to (F − ) = 0, i.e., (F − )g ∈ Hs2 ∀g ∈ Hs2 .
(6.5.24)
Let ei denote the i-th elementary unit vector, which has a “1” in the i-th position and zeros elsewhere, and let (6.5.25) g(z) = ei , ∀z ∈ D . Then g ∈ Hs∞ (which also implies that g ∈ Hs2 ); moreover, (6.5.24) now implies that the i-th column of the matrix F − is a multiple (in H∞ ) of the inner function b. Repeating this reasoning for s×s all indices i shows that F − ∈ bHs×s ∞ , i.e., that = F − bR0 for some R0 ∈ H∞ . It is now claimed that (6.5.26) F − bR0 ∞ = inf F − bR∞ . s×s R∈H∞
6.5. SENSITIVITY MINIMIZATION: FAT PLANT CASE
181
To prove (6.5.26), note that F − bR∞ = TF −bR by (6.5.18), ≥ TF −bR since ≤ 1, = TF , by (6.5.23), = ∞ = F − bR0 ∞ . The preceding discussion can be summarized as follows: If F is square, then inf F − bR∞ = TF . s×s R∈H∞
(6.5.27)
(6.5.28)
To complete the proof of the theorem, it only remains to show that TF restricted to N has the same norm as TF . Suppose g ∈ Hs2 . Then g can be expressed uniquely in the form f + bh where f ∈ N and h ∈ Hs2 ( since Hs2 is the direct sum of N and bHs2 ). Now9 TF g = (Ff + F bh) = (Ff + bF h) = Ff . Moreover, g2 ≥ f 2 , so that
F g
2
g2
≤
Ff 2 f 2
.
(6.5.29)
(6.5.30)
Hence, A (which is the restriction of TF to N) has the same norm as TF . Next, the restriction that F be square is removed. Suppose r < s and define F0 as in (6.5.21). r×s s×s
Partition any R0 ∈ Hs×s ∞ as R0 = [R R1 ] where R ∈ H∞ . Then, for any R0 ∈ H∞ , we have F − bR F0 − bR0 ∞ = −bR1 ∞ (F − bR)(ej θ ) = ess. sup σ¯ . (6.5.31) −bR1 (ej θ ) θ∈[0,2π] Now for any two complex matrices A, B of compatible dimensions, it is easy to show that A σ¯ ≥ σ¯ (A) . (6.5.32) B Hence, from (6.5.31), F0 − bR0 ∞ ≥ F − bR∞ , so that inf
s×s R0 ∈H∞
F0 − bR0 ∞ ≥
inf
r×s R∈H∞
F − bR∞ .
(6.5.33)
To prove the reverse inequality, let R be any matrix in Hs×s ∞ that attains the infimum on the right side of (6.5.33), and define R0 = [R 0s×r−s ] . Then F0 − bR0 ∞ = F − bR∞ . Hence, the inequality in (6.5.33) is actually an equality. Moreover, from the previous paragraph, the infimum on the left side of (6.5.33) equals the norm of TF0 when restricted to N. 2 9 Note that this argument fails if b is replaced by an inner matrix.
182
6. FILTERING AND SENSITIVITY MINIMIZATION
As in the scalar case, it is easy to compute a basis for N.
Lemma 6.5.5 Let b be a finite Blaschke product with distinct zeros λ1 , · · · , λn , of multiplicities m1 , · · · , mn , respectively. For i ∈ {1, · · · , s}, let ei denote the i-th s × 1 elementary unit vector. Define N as in (6.5.20). Then, a (nonorthonormal) basis for N is given by the vector-valued functions i vkl (z) =
zk−1 · ei , 1 − λ¯ l z
1 ≤ i ≤ s; 1 ≤ k ≤ ml ; 1 ≤ l ≤ n .
(6.5.34)
Proof. By Lemma 6.4.7 and (6.5.16), (k−1)
i vkl , f = fi
(zl ), ∀f ∈ Hs2 ,
(6.5.35)
Since f ∈ bHs2 if and only if the right side of (6.5.35) is zero for all i, k, l, the lemma follows.
2
Once a basis for N has been determined, it is possible to convert it into an orthonormal basis, compute a matrix representation of A0 of A with respect to this basis, and finally to compute σ¯ (A0 ); the details are exactly the same as in the scalar case (see (6.4.33) et. seq.). However, if one is only interested in computing σ¯ (A0 ), a short-cut is available, and is described next. It should be emphasized that the procedure described next is equally applicable to the scalar case, and can be used to compute σ¯ (A0 ) more readily than by using the formula (6.4.38). However, there would be no advantage gained by this: In the scalar case, Fact 6.4.5 gives a method for computing the infimum of f − br∞ as well as for finding an r that attains the infimum, all in one shot. In contrast, the procedure given below only determines the infimum, and a separate interpolation procedure has to be used to find an optimizing r. This is no disadvantage in the multivariable case, however, since such a two-stage procedure is all that is available at present. Now for the short-cut: Theorem 6.5.3 states that the infimum of F − bR∞ equals A, where A : N → N is the restriction of TF to N. Lemma 6.5.5 provides a (nonorthonormal) basis for N. To compute A, the first step is to observe that A = A∗ , where A∗ : N → N is the adjoint map of A. Now compute a representation of A∗ with respect to the basis given in Lemma 6.5.5, and denote it by . Also, let denote the Grammian matrix of the basis in Lemma 6.5.5. Then it is easy to show that A = inf{λ : λ − ∗ is positive definite} .
(6.5.36)
Alternately, A is the largest solution to the generalized eigenvalue problem |λ − ∗ | = 0. The effectiveness of the above short-cut derives from the fact that the matrix representation is block-diagonal, as shown in Lemma 6.5.6 below. For the sake of clarity of notation, Lemma 6.5.6 is
6.5. SENSITIVITY MINIMIZATION: FAT PLANT CASE
183
stated only for the case where the Blaschke product b has simple zeros; the general case is conceptually not more difficult and is left to the reader. Suppose b is a finite Blaschke product with distinct simple zeros λ1 , · · · , λm , and that is rational. Define ∈ C ms×mr , ∈ C mr×mr by
Lemma 6.5.6
F ∈
Hr×s ∞
= Block Diag {F ∗ (λ1 ), · · · , F ∗ (λm )} ,
(6.5.37)
= ( ij ), i, j = 1, · · · , m ,
(6.5.38)
where
1 · Ir . 1 − λ¯ j λi
(6.5.39)
F − bR∞ = inf{λ : λ − ∗ is positive definite} .
(6.5.40)
ij = Then inf
s×s R∈H∞
Proof. Define N as in (6.5.20).Then N has dimension ms, and a basis for N is given by Lemma 6.5.5 as 1 ej , j = 1, · · · , s; i = 1, · · · , m . (6.5.41) vij (z) = 1 − λ¯ i z Thus, any h ∈ N can be expanded as h(z) =
s m
hij vij (z) .
(6.5.42)
i=1 j =1
Arrange the coordinates hij in an ms × 1 column vector as follows: ⎤ h11 ⎢ .. ⎥ ⎢ . ⎥ ⎥ ⎢ ⎢ h1s ⎥ ⎥ ⎢ ⎢ .. ⎥ ⎢ . ⎥ . ⎥ ⎢ ⎢hm1 ⎥ ⎥ ⎢ ⎢ . ⎥ ⎣ .. ⎦ ⎡
(6.5.43)
hms Then the claim is that the matrix representation of A∗ with respect to this basis is 0 = Block Diag {F0 (λ1 ), · · · , F0 (λm )} .
(6.5.44)
184
6. FILTERING AND SENSITIVITY MINIMIZATION
To prove the claim, let vkl be any basis function of the form (6.5.41). We must show that the matrix (6.5.44) gives the right answer for A∗ vkl . Now the coordinate vector of vkl is [0 · · · 1 · · · 0] , where the 1 is in the position (k − 1)s + l. Applying the matrix 0 to this vector gives [0 · · · F0 (λk )el · · · 0]∗ as the coordinate vector of A∗ vkl ; in other words, ∗
A vkl =
s
∗ f0lp (λk )vkp .
(6.5.45)
p=1
To show that (6.5.45) is correct, it is enough to show that it gives the right answer for the inner products vij , A∗ vkl for all i, j . By definition, vij , A∗ vkl = A∗ vkl , vij = vkl , Avij = vkl , TF0 vij = vkl , F0 vij = el∗ F0 (λk )vij (λk ) 1 = f0lj (λk ) , 1 − λ¯ i λk
(6.5.46)
where the last two steps follow from Lemma 6.4.7 and (6.5.41), respectively. On the other hand, letting ukl denote the function in (6.5.45) leads to vij , ukl =
s
∗ f0lp (λk )vij , vkp .
(6.5.47)
p=1
Note that
1 δjp , 1 − λ¯ k λi where δ is the Kronecker delta. Hence, (6.5.47) becomes vij , vkp =
∗ (λk ) · vij , ukl = f0lj
1 , 1 − λ¯ k λj
(6.5.48)
(6.5.49)
which agrees with (6.5.46). By previous discussion, A∗ is given by A∗ = inf{λ : λ 0 − ∗0 0 is positive definite} ,
(6.5.50)
where 0 is the Grammian matrix of the basis in (6.5.41). One can readily compute that 0 = ( 0ij )m i,j =1 where 1
0ij = · Is , (6.5.51) 1 − λ¯ j λi Now the only difference between and 0 is in their dimensions. Similarly, if we define as in (6.5.37), then is a submatrix of 0 , since F is a submatrix of F0 . If r = s then we are done,
6.5. SENSITIVITY MINIMIZATION: FAT PLANT CASE
185
since = 0 , = 0 and (6.5.50) is the same as (6.5.40). Otherwise, by symmetric row and column permutations, one can transform λ 0 − ∗0 0 into the form λ 0 λ 0 −→ , (6.5.52) 0 λ − ∗ where = (ij )m i,j =1 , and ij =
1 · Is−r . 1 − λ¯ j λi
(6.5.53)
It is clear from (6.5.52) that λ 0 − 0 ∗ 0 is positive definite if and only if λ − ∗ is; hence, (6.5.50) reduces to (6.5.40). 2 Now that an expression is available for the infimum of F − bR∞ the next step is to give a procedure for computing an R that attains the infimum. In the case where the inner function b has only simple zeros, such a procedure is provided by the matrix Nevanlinna-Pick algorithm.10 This algorithm is also applicable to the scalar problem studied in Section 6.4, and forms an alternative to the procedure of Fact 6.4.5. However, the latter is preferable, since it leads to an optimal interpolating function in one step, where as the Nevanlinna-Pick algorithm is iterative, and requires as many iterations as there are zeros of b. The main result in the sequel is Theorem 6.5.9, which overlaps somewhat with Lemma 6.5.6. However, an independent proof is given for this theorem, as the iterative interpolation algorithm is really a consequence of the method of proof. To set up the procedure, define μ=
inf
r×s R∈H∞
F − bR∞ ,
(6.5.54)
and let λ1 , · · · , λn denote the (simple) zeros of b. Then the problem is to find a matrix G ∈ Hr×s ∞ such that G(λi ) = F (λi ), i = 1, · · · , n, and G∞ = μ . (6.5.55) s×s (Note that G ∈ Hr×s ∞ is of the form F − bR for some R ∈ H∞ if and only if G(λi ) = F (λi ) ∀i). Equivalently, if we define Fi = [F (λi )]/μ , (6.5.56)
then the problem is one of finding a matrix ∈ Hr×s ∞ such that ∞ = 1 and (λi ) = Fi ∀i. The remainder of this section is devoted to the matrix version of the so-called NevanlinnaPick problem, which can be stated as follows: Given complex numbers λ1 , · · · , λn , with |λi | < 1 ∀i, together with complex matrices F1 , · · · , Fn , with Fi ≤ 1 ∀i, find all matrix-valued functions (if any) such that ∞ ≤ 1 and (λi ) = Fi ∀i. The relevance of this problem to sensitivity minimization is clear. 10 Strictly speaking, the Pick problem only makes sense for square matrices, so it is probably more accurate to speak of the matrix
Nevanlinna algorithm.
186
6. FILTERING AND SENSITIVITY MINIMIZATION
In the sequel, it is important to note that · denotes the norm of a complex matrix whereas · ∞ denotes the norm of a matrix-valued function. Before proceeding to the interpolation algorithm, it is established that one may assume that Fi < 1 ∀i, without any loss of generality. Consider first the scalar case, where one is given scalars f1 , · · · , fn , with fi ≤ 1 ∀i, and is asked to find a φ ∈ H∞ such that φ∞ ≤ 1 and φ(λi ) = fi ∀i. Suppose one of the constants has magnitude exactly equal to 1, and renumber the constants if necessary so that |f1 | = 1. Since λ1 belongs to the open unit disc, it follows from the maximum modulus principle that the only H∞ -function with norm 1 that equals f1 at λ1 is the constant function φ(z) ≡ f1 ∀z. Hence, the interpolation problem does not have a solution unless fi = f1 ∀i. In the latter case, the solution is the obvious one, namely φ(z) ≡ f1 . Thus, one may as well assume at the outset that |fi | < 1 ∀i, because the problem is trivial otherwise. But this reasoning breaks down in the matrix case, because there exist nonconstant matrices such that (z) ≡ 1 ∀z ∈ D; for example, let 1 0 (z) = . (6.5.57) 0 z However, Lemma 6.5.7 below shows that all such functions are essentially of the form (6.5.57). One can think of Lemma 6.5.7 as providing a matrix version of the maximum modulus principle. Suppose ∈ Hk×l ∞ and that ∞ ≤ 1. Suppose (z0 ) = 1 for some z0 in the open unit disc, and let r denote the multiplicity of 1 as a singular value of (z0 ) (i.e., as an eigenvalue of ∗ (z0 ) (z0 )). Then (z) = 1 ∀z ∈ D. Moreover, there exist unitary matrices U ∈ C k×k , V ∈ C l×l such that 0l−r×r Ir , (6.5.58) U (z)V = 0r×k−r (z) Lemma 6.5.7
where (z) < 1 ∀z ∈ D. The proof of Lemma 6.5.7 requires the following observation on complex matrices. Suppose M ∈ C k×l has norm 1. Then (i) |mij | ≤ 1 ∀i, j , (ii) If |mij | = 1 for some i, j , then all other elements of row i and column j are zero.
Lemma 6.5.8
Proof. Temporarily let · 2 denote the Euclidean vector norm on both C k and C l . Then Mx2 ≤ x2 ∀x ∈ C l ,
(6.5.59)
since M = 1. To prove (i), suppose |mij | > 1 for some i, j , and let x = ej , the j -th elementary unit vector in C l . Then Mx2 > 1, which is a contradiction. To prove (ii), suppose |mij | = 1 for some i, j , and again let x = ej .Then Mej 2 = 1 implies that all other elements of the j -th column of M are zero. The same reasoning applied to M ∗ shows that all other elements of the i-th row of M are zero. 2
6.5. SENSITIVITY MINIMIZATION: FAT PLANT CASE
187
∗ (z
Proof of Lemma 6.5.7. Suppose 1 is an r-fold eigenvalue of 0 ) (z0 ), so that 1 is an r-fold k×k singular value of (z0 ). Thus, there exist unitary matrices U ∈ C , V ∈ C l×l such that 0r×l−r Ir , (6.5.60) U (z0 )V = 0k−r×r M where M < 1. Define y(z) = (U (z)V )11 , i.e., y is the corner element of U (z)V . Then y is analytic, satisfies |y(z)| ≤ 1 ∀z ∈ D (by Lemma 6.5.8 (i)), and y(z0 ) = 1 (by (6.5.60)). Hence, by the maximum modulus theorem, y(z) = 1 ∀z ∈ D, since z0 is in the open unit disc. Now by Lemma 6.5.8 (ii), all other elements in the first row and first column of U (z)V are identically zero. The same reasoning applies to all elements of the form (U (z)V )ii where i ≤ r. Hence, U (z)V is of the form (6.5.58), and (z) = 1 ∀z ∈ D. It only remains to show that (z) < 1 ∀z ∈ D. Suppose to the contrary that (z1 ) = 1 for some z1 ∈ D. Then, by previous reasoning, (z) = 1 ∀z ∈ D. In particular, M = (z0 ) = 1 , which contradicts the assumption that 1 is an r-fold singular value of (z0 ). Hence, (z) < 1 ∀z ∈ D. 2 Let us now return to the interpolation problem, where one is given points λ1 , · · · , λn , and matrices F1 , · · · , Fm , with |λi | < 1 and Fi ≤ 1 ∀i. Recall that the objective is to show that one may assume Fi < 1 ∀i without sacrificing generality. Assume to the contrary that Fi = 1 for some values of i. Among these select one for which the multiplicity of 1 as a singular value is the largest, and renumber it as F1 . Select unitary matrices U, V such that Ir 0 , (6.5.61) U F1 V = 0 G1 where r is the multiplicity of 1 as a singular value of F , and G1 < 1. Now, by Lemma 6.5.7, any such that ∞ ≤ 1 and (λi ) = Fi must satisfy 0 Ir ∀z ∈ D . (6.5.62) U (z)V = 0 (z) In particular, F2 , · · · , Fn must satisfy I U Fi V = r 0
0 , Gi
i = 2, · · · , n .
(6.5.63)
Otherwise the interpolation problem has no solution. On the other hand, suppose (6.5.63) holds. Then Gi < 1 ∀i; otherwise, if Gi = 1 for some i, then Fi has 1 as a singular value of multiplicity larger than r, contradicting the manner in which F1 was selected. Now replace the original interpolation problem by the following modified problem: Find a such that ∞ ≤ 1 and (λi ) = Gi ∀i. Then there is a one-to-one correspondence between solutions of the original problem and of the modified problem, via (6.5.62). But in the modified problem all the matrices Gi , have norms strictly less than one.
188
6. FILTERING AND SENSITIVITY MINIMIZATION
We are now ready for the main result on the matrix Nevanlinna problem. Suppose λ1 , · · · , λn are distinct complex numbers and F1 , · · · , Fn are complex matrices with |λi | < 1 and Fi < 1 ∀i. Define a partitioned matrix
Theorem 6.5.9
⎡
P11 ⎢ .. P =⎣ . Pn1 where Pij =
⎤ · · · P1n .. ⎥ , . ⎦
(6.5.64)
· · · Pnn
1 · (I − Fi∗ Fj ) . 1 − λ¯ i λj
(6.5.65)
Then there exists a ∈ M(H∞ ) such that ∞ < 1 and (λi ) = Fi ∀i, if and only if the matrix P is positive definite. In the current set-up Lemma 6.5.6 leads to the following result: There exists a ∈ M(H∞ ) such that ∞ ≤ 1 and (λi ) = Fi if and only if the matrix P is nonnegative definite. The proof of Theorem 6.5.9 is rather long and is based on a preliminary lemma. We shall see that an iterative algorithm for computing will emerge from the proof. Lemma 6.5.10
Suppose E < 1, and define L(E) =
A B , C D
(6.5.66)
where A = (I − EE ∗ )−1/2 ,
B = −(I − EE ∗ )−1/2 E,
C = −(I − E ∗ E)−1/2 E ∗ ,
D = (I − E ∗ E)−1/2 .
(6.5.67)
(Note that M 1/2 denotes the Hermitian square root of M.) Then for each matrix X with X ≤ 1, the matrix TE (X) = (AX + B)(CX + D)−1 , (6.5.68) is well-defined. Moreover, TE (X) ≤ 1 (resp. < 1) ⇐⇒ X ≤ 1 (resp. < 1) .
(6.5.69)
6.5. SENSITIVITY MINIMIZATION: FAT PLANT CASE
189
− E ∗ E)−1/2 (E ∗ X
Proof. First of all, CX + D = (I + I ). Since E < 1, it follows that, whenever ∗ X ≤ 1, the matrix E X also has norm less than one, so that E ∗ X + I has an inverse. Hence, TE (X) is well-defined. Now define I 0 J = . (6.5.70) 0 −I Then routine calculations show that L∗ J L = J ; these calculations make use of the matrix identities (I − MN )−1 M = M(I − NM)−1 , (I − MN )
−1
= I + M(I − NM)
(6.5.71) −1
N.
(6.5.72)
Note that for any matrix M, we have that M ≤ 1 (resp. < 1) if and only if M ∗ M − I ≤ 0 (resp. < 0). Hence, (6.5.69) is equivalent to T ∗ T − I ≤ 0 (resp. < 0) ⇐⇒ X∗ X − I ≤ 0 (resp. < 0) ,
(6.5.73)
where T is a shorthand for TE (X). But since CX + D is nonsingular, T ∗ T − I ≤ 0 (resp. < 0) ⇐⇒ (CX + D)∗ (T ∗ T − I )(CX + D) ≤ 0 (resp. < 0) .
(6.5.74)
The latter quantity equals (AX + B)∗ (AX + B) − (CX + D)∗ (CX + D) X ∗ ∗ = [X I ]L J L I X = [X∗ I ]J I = X∗ X − I .
(6.5.75)
Hence, (6.5.69) follows from (6.5.74) and (6.5.75).
2
Proof of Theorem 6.5.9. “if ” The proof is by induction on the number n of points to be interpolated. If n = 1, the theorem is clearly true: Take (z) = F1 . To establish the inductive step, suppose the theorem is true for n − 1 points, and suppose that P is positive definite; it is shown below that this implies the existence of a ∈ M(H∞ ) satisfying ∞ < 1 and (λi ) = Fi ∀i. Let y be the inner function11 y(z) = 11 Note that y(z) = z if λ = 0. 1
|λ1 |(z − λ1 ) , λ1 (1 − λ¯ 1 z)
(6.5.76)
190
6. FILTERING AND SENSITIVITY MINIMIZATION
and define Ti = TF1 (Fi ), 1 Gi = Ti , y(λi )
i = 2, · · · , n ,
(6.5.77)
i = 2, · · · , n ,
(6.5.78)
where TF1 (·) is defined in (6.5.68). Now define a partitioned matrix ⎡ ⎤ Q22 · · · Q2n ⎢ .. ⎥ , Q = ⎣ ... . ⎦ Qn2 where Qij =
(6.5.79)
· · · Qnn
1 · (I − G∗i Gj ) . ¯ 1 − λi λj
(6.5.80)
It is claimed that Q is positive definite. To prove the claim, compute Qij from (6.5.80), and make liberal use of the identities (6.5.71) and (6.5.72). This leads to 1 1 ∗ Qij = T Tj I− 1 − y(λ ¯ i )y(λj ) i 1 − λ¯ i λj 1 1 1 = (6.5.81) ·I + (I − Ti∗ Tj ) , 1− 1 − y(λ ¯ i )y(λj ) 1 − y(λ ¯ i )y(λj ) 1 − λ¯ i λj where from (6.5.77), I − Ti∗ Tj = (I − F1∗ F1 )1/2 (I − Fi∗ F1 )−1 (I − Fi∗ Fj ) (I − F1∗ Fj )−1 (I − F1∗ F1 )1/2 .
(6.5.82)
Ri−1 = (I − F1∗ Fi )−1 (I − F1∗ F1 )1/2 ,
(6.5.83)
For clarity, define
and note that
i = 1, · · · , n ,
1 1 λ¯ 1 λ1 − 1 . · 1− = 1 − y(λ ¯ i )y(λj ) 1 − λ¯ i λj (λ¯ i − λ¯ 1 )(λj − λ1 )
(6.5.84)
Then, (6.5.81) becomes Qij =
Rj−1 R ∗ −1 λ¯ 1 λ1 − 1 , · I + i Pij y(λj ) y(λi ) (λ¯ i − λ¯ 1 )(λj − λ1 )
(6.5.85)
1 − λ¯ 1 λ1 I + y(λ ¯ i )Qij y(λj ) Rj . (1 − λ¯ i λ1 )(1 − λj λ¯ 1 )
(6.5.86)
or, upon rearrangement, Pij =
Ri∗
6.5. SENSITIVITY MINIMIZATION: FAT PLANT CASE
Now (6.5.86) implies that P =R where ci =
∗
0 R, Q
c1 I 0
191
(6.5.87)
1 for i = 1, · · · , n , 1 − λ¯ 1 λi
(6.5.88)
and R is a partitioned matrix of the form ⎡
c1 R1 ⎢ 0 ⎢ ⎢ R=⎢ 0 ⎢ . ⎣ ..
c2 R2 y(λ2 )c2 R2 0 .. .
c3 R 3 0 y(λ3 )c3 R3 .. .
0
0
0
··· ··· ···
c n Rn 0 0 .. .
⎤ ⎥ ⎥ ⎥ ⎥ . ⎥ ⎦
(6.5.89)
· · · y(λn )cn Rn
Since R is nonsingular, c1 > 0 and P is positive definite, it follows from (6.5.87) that Q is also positive definite. In particular, Qii is positive definite for all i, and Gi < 1 for all i. By the inductive hypothesis, there exists a function ∈ M(H∞ ) such that ∞ < 1 satisfying (λi ) = Gi for i = 2, · · · , n. Now define (z) = (A − y(z)(z)C)−1 (y(z)(z)D − B) ,
(6.5.90)
where A = (I − F1 F1∗ )−1/2 ,
B = (I − F1 F1∗ )−1/2 F1 ,
C = (I − F1∗ F1 )−1/2 F1∗ ,
D = (I − F1∗ F1 )−1/2 .
(6.5.91)
Note that (6.5.90) is the inverse of the transformation TF1 , defined in (6.5.68). Hence, from (6.5.77), (6.5.78), and (6.5.90), (λi ) = Fi for i = 2, · · · , n .
(6.5.92)
Since y(λ1 ) = 0, (6.5.90) shows that (λ1 ) = A−1 B = F1 .
(6.5.93)
Finally, by Lemma 6.5.10, (z) < 1 for all z implies that (z) < 1 for all z. Hence, is a solution to the interpolation problem with n points. This completes the proof by induction. “only if ” One needs to prove the following statement: “Suppose ∈ M(H∞ ) and that ∞ < 1. Let λ1 , · · · , λn be any n distinct points in the open unit disc, let Fi = (λi ), and define P as in (6.5.64), (6.5.65). Then P is positive definite.”
192
6. FILTERING AND SENSITIVITY MINIMIZATION
The proof is again by induction on the number n of points. The statement is clearly true if n = 1. Suppose it is true for n − 1 points. Given the n points λ1 , · · · , λn and the function , let Fi = (λi ) and define the function ∈ M(H∞ ) by (z) =
1 (A (z) + B)(C (z) + D)−1 , y(z)
(6.5.94)
where y is as in (6.5.76) and A, B, C, D are as in (6.5.91). Then ∞ < 1 by Lemma 6.5.10. Moreover, (λi ) = Gi where Gi is defined in (6.5.78). By the inductive hypothesis, the matrix Q of (6.5.79) is positive definite. Since (6.5.87) holds, P is also positive definite. Hence, the statement is true for n points. 2 The proof of the “if ” part of the theorem suggests an iterative algorithm for computing . By the transformation (6.5.77)–(6.5.78) one can replace the given interpolation problem involving n points by another involving only n − 1 points. Applying the transformation repeatedly reduces the number of interpolation constraints by one each time, until eventually one is left with a single matrix Hn such that Hn < 1, and is faced with the problem of finding a matrix (·) ∈ M(H∞ ) such that (λn ) = Hn and ∞ < 1. An obvious solution is (z) ≡ Hn ∀z. By applying the inverse transformation (6.5.90) n − 1 times one can find a solution to the original problem. Moreover, there is a one-to-one correspondence between the set of all solutions (·) to the above problem and the set of all solutions to the original interpolation problem. The fact that the matrix R in (6.5.87) is upper-triangular suggests that (6.5.87) is the first step in a block Cholesky decomposition of P . This fact can be used to generate Q quickly and reliably. Suppose now that it is desired to find a function (·) ∈ M(H∞ ) satisfying (λi ) = Fi and such that ∞ ≤ 1 (as opposed to ∞ < 1). In this case, a necessary and sufficient condition that such a exist is that the test matrix P of (6.5.64) be nonnegative definite (as opposed to positive definite as in Theorem 6.5.9). To construct such a , one can use the iterative procedure described above, but with a few modifications. As mentioned before Theorem 6.5.9, one can assume at the outset that each matrix Fi has a norm strictly less than one, by applying the reduction procedure of Lemma 6.5.7 if needed. Then, by applying the transformation (6.5.77)–(6.5.80), one can replace the original problem involving the n matrices F1 , · · · , Fn by a reduced interpolation problem involving the n − 1 matrices G2 , · · · , Gn . Moreover, if P is nonnegative definite, so is Q, because of (6.5.87). However, even if Fi < 1 ∀i, it can happen that Gi = 1 for some i. In such a case, it is necessary first to reduce the size of the matrices by first using Lemma 6.5.7 before carrying out another step of the Nevanlinna iteration. This section concludes with an example that illustrates the use of the matrix Nevanlinna algorithm. Suppose it is desired to find a matrix ∈ H1×2 ∞ that has norm less than one and satisfies the interpolation constraints Example 6.5.11
(0) = [0.2 0.4], (0.5) = [0.3 0.7] .
6.6. SENSITIVITY MINIMIZATION: GENERAL CASE
193
The test matrix P of (6.5.64) is of order 4 × 4 in this case. To save some labor, let us consider the 2×1 equivalent problem of finding a ∈ H∞ that has norm less than one and satisfies the interpolation constraints 0.2 0.3 (0) = , (0.5) = . 0.4 0.7 In this case λ1 = 0, λ2 = 0.5, F1 , F2 are self-evident, and 0.8 0.66 P = , 0.66 0.56 which is positive definite. Hence, the problem has a solution. From (6.5.67), ⎤ ⎡ 1.0236 0.0472 −0.2236 L(F1 ) = ⎣ 0.0472 1.0944 −0.4472⎦ . −0.2236 −0.4472 1.118 In this case y(z) = z, y(λ2 ) = 0.5. Hence, from (6.5.77) and (6.5.78), 1 0.11652 0.15791 · = , T2 = 0.33304 0.73788 0.45135 1 0.31582 . T2 = G2 = 0.9027 0.5 One can as well choose (z) = G2 ∀z. Then, from (6.5.90), −1 1.0236 + 0.0706z 0.0472 + 0.1412z 0.2236 + 0.3531z (z) = 0.0472 + 0.2018z 1.0944 + 0.4037z 0.4472 + 1.0092z 1 0.2236 + 0.3659z = 1.118 + 0.4743z 0.4472 + 1.0028z 1 0.2 + 0.3273z . = 1 + 0.4242z 0.4 + 0.8969z
6.6
SENSITIVITY MINIMIZATION: GENERAL CASE
In this section, the sensitivity minimization problem is studied in full generality, i.e., without any simplifying assumptions on the relative numbers of plant inputs and outputs, and on the dimensions of the weighting matrices. The filtering problem in the general case is not discussed separately in the interests of brevity, but reasoning similar to that given here applies to this problem as well. As shown in earlier sections, the problem is one of minimizing a cost function J (R) = F − GRH ∞ ,
(6.6.1)
194
6. FILTERING AND SENSITIVITY MINIMIZATION
by a suitable choice of R ∈ M(H∞ ), where F, G, H ∈ M(H∞ ) are given rational matrices. It is assumed throughout that G(ej θ ), H (ej θ ) have full rank (either row or column, as appropriate) for all θ ∈ [0, 2π]. If this is not so, then it is necessary to extend Lemma 6.5.1 to the present case. This extension is straight forward and is left to the reader. Some preliminary results are presented before we proceed to the main solution algorithm. In the sequel, the notation A > 0 (resp. A ≥ 0) means that the Hermitian matrix A is positive definite (resp. nonnegative definite).The notation A > B (resp. A ≥ B) denotes that A and B are Hermitian, and that A − B > 0 (resp. A − B ≥ 0). The notations A < B and A ≤ B are defined analogously. Lemma 6.6.1
Suppose A ∈ M(R(z)) and satisfies two conditions: A (z−1 ) = A(z) ∀z, and A(e ) ≥ 0 ∀ θ ∈ [0, 2π ] . jθ
(6.6.2) (6.6.3)
Then there exists an outer matrix U ∈ M(R∞ ) such that A(z) = U (z−1 )U (z) ∀z .
(6.6.4)
Further, if (6.6.3) is strengthened to A(ej θ ) > 0
∀θ ∈ [0, 2π ] ,
(6.6.5)
then U can be chosen to a unit of M(R∞ ). In this case, U is unique to within left multiplication by an orthogonal matrix. A matrix U satisfying (6.6.4) is called a spectral factor of A and is denoted by A1/2 .
2
Proof. See [3, p. 240].
n×m Suppose A ∈ R∞ where n > m, and that A is inner. Then there exists an inner matrix n×(n−m) n×n B ∈ R∞ such that = [A B] ∈ R∞ is also inner.
Lemma 6.6.2
The matrix B is sometimes referred to as a complementary inner matrix of A. Proof. Since A is inner, it has full column rank in M(R∞ ). By permuting rows if necessary, which does not affect the validity of the lemma, it can be assumed that D A= , (6.6.6) N m×m n−m×m where |D| = 0 as an element of R∞ . Note that D ∈ R∞ , N ∈ R∞ . Now define
P (z) = −[D (z−1 )]−1 N (z−1 ) .
(6.6.7)
6.6. SENSITIVITY MINIMIZATION: GENERAL CASE
195
Then P ∈ and therefore has an r.c.f. (U, V ) over M(R∞ ). Specifically, U ∈ m×(n−m) (n−m)×(n−m) R∞ , V ∈ R∞ , and (R(z))m×(n−m)
P (z) = U (z)[V (z)]−1 ∀z .
(6.6.8)
Combining (6.6.8) and (6.6.9) shows that D (z−1 )U (z) + N (z−1 )V (z) = 0 ∀z . Now define W = and express (6.6.9) in the form
U n×(n−m) ∈ R∞ , V
A (z−1 )W (z) = 0 ∀z .
(6.6.9)
(6.6.10)
(6.6.11)
Clearly W has full column rank as an element of M(R∞ ). Hence, by Fact 6.2.2, W has an innern×(n−m) (n−m)×(n−m) outer factorization Wi Wo , where Wi ∈ R∞ and Wo ∈ R∞ . Multiplying (6.6.11) by −1 [Wo (z)] gives (6.6.12) A (z−1 )Wi (z) = 0 ∀z . It is claimed that the choice B = Wi makes = [A B] inner (and of course square). Note that −1 A (z )A(z) A (z−1 )B(z) = I ∀z , (6.6.13) (z−1 ) (z) = B (z−1 )A(z) B (z−1 )B(z) since A, B are both inner and satisfy (6.6.12).
2
n×m Now we come to the main result of this section. To be specific, suppose F ∈ R∞ ,G ∈ k×m ∈ R∞ , and that n > l, k < m; the cases n ≤ l and/or k ≥ m are simpler and the requisite modifications will be obvious. Recall that the problem is to select R ∈ Rl×k ∞ so as to minimize n×l R∞ ,H
J (R) = F − GRH ∞ .
(6.6.14)
The first step in the solution procedure is to show that the above problem can be reduced to that of minimizing A−Q B , (6.6.15) C D ∞ by a suitable choice of Q ∈ Rl×k ∞ , where A, B, C, D are L∞ (not H∞ !) matrices of appropriate n×l dimensions. For this purpose, first factor G as Gi Go , where Gi ∈ R∞ is inner and Go ∈ Rl×l ∞ is n×n outer. Then, using Lemma 6.6.2, construct a square inner matrix ∈ R∞ containing Gi as its first l columns. Then, Go G= . (6.6.16) 0(n−l)×l
196
6. FILTERING AND SENSITIVITY MINIMIZATION
m×k Similar reasoning applied to H ∈ R∞ shows that there exist an outer matrix Ho ∈ Rk×k ∞ and a m×m square inner matrix ∈ R∞ such that
H = [Ho 0k×(m−k) ] .
(6.6.17)
Hence, Go R[Ho 0] ∞ J (R) = F − 0 ∗ Go ∗ R[Ho 0]∞ . = F − 0 Now define
A B = ∗ F ∗ ∈ M(L∞ ) , C D
(6.6.18)
(6.6.19)
and let Q denote the “free” parameter Go RHo (recall that both Go and Ho are units of M(H∞ ) by virtue of the rank conditions on G and H ). Then the cost function becomes A−Q B . (6.6.20) C D ∞ To proceed further, we require another very simple result: Lemma 6.6.3 Y ∞ , then
Suppose W = [X Y ] ∈ M(L∞ ) is rational. Then W ∞ ≥ Y ∞ . Further, if γ > W ∞ ≤ γ ⇐⇒ (γ 2 I − Y Y ∗ )−1/2 X∞ ≤ 1 ,
(6.6.21)
where Y ∗ (z) = Y (z−1 ), and (γ 2 I − Y Y ∗ )−1/2 denotes the spectral factor of γ 2 I − Y Y ∗ . Proof. It is obvious that W ∞ ≥ Y ∞ . Moreover, if γ > Y ∞ , then (γ 2 I − Y Y ∗ )−1/2 exists and is a unit of M(R∞ ), by Lemma 6.6.1; hence, (γ 2 I − Y Y ∗ )−1/2 is well-defined. By definition, (6.6.22) W ∞ = max σ¯ (W (ej θ )) . θ∈[0,2π]
Hence, W ∞ ≤ γ ⇐⇒ σ¯ (W (ej θ )) ≤ γ ∀θ ∈ [0, 2π ] ⇐⇒ (W (ej θ )W ∗ (ej θ ) ≤ γ 2 I ∀θ ⇐⇒ [XX ∗ + Y Y ∗ ](ej θ ) ≤ γ 2 I ∀θ ⇐⇒ (XX ∗ )(ej θ ) ≤ γ 2 I − (Y Y ∗ )(ej θ ) ∀θ . This leads immediately to (6.6.21).
(6.6.23)
2
6.6. SENSITIVITY MINIMIZATION: GENERAL CASE
197
A parallel result to Lemma 6.6.3 is the following. Lemma 6.6.4
Suppose W =
X ∈ M(L∞ ) , Y
(6.6.24)
is rational. Then W ∞ ≥ Y ∞ . Further, if γ > Y ∞ , then W ∞ ≤ γ ⇐⇒ X(γ 2 I − Y ∗ Y )−1/2 ∞ ≤ 1 .
(6.6.25)
Now let us return to the problem of minimizing the cost function in (6.6.20). Let γ be a current guess of the minimum, and let K denote [C D]. Then, from Lemma 6.6.4, A−Q B ≤γ , (6.6.26) min C D ∞ Q∈M(H∞ ) if and only if min
Q∈M(R∞ )
[A − Q B](γ 2 I − K ∗ K)−1/2 ≤ 1 . ∞
(6.6.27)
Now note that (γ 2 I − K ∗ K)−1/2 is a unit of M(R∞ ). Hence, the product Q(γ 2 I − K ∗ K)−1/2 can be replaced by another free parameter R. If we define A1 = A(γ 2 I − K ∗ K)−1/2 ∈ M(L∞ ) , ∗
B1 = B(γ I − K K) 2
−1/2
∈ M(L∞ ) ,
(6.6.28) (6.6.29)
then (6.6.27) is equivalent to min
R∈M(H∞ )
[A1 − R B1 ]∞ ≤ 1 .
(6.6.30)
Next, from Lemma 6.6.3, (6.6.30) holds if and only if min
R∈M(H∞ )
(I − B1 − B1∗ )−1/2 (A1 − R)∞ ≤ 1 .
(6.6.31)
Once again, it is possible to replace (I − B1 − B1∗ )−1/2 R by another free parameter S ∈ M(H∞ ). If we define A2 = (I − B1 − B1∗ )−1/2 A1 ∈ M(L∞ ) , (6.6.32) then (6.6.31) is equivalent to min
S∈M(H∞ )
A2 − S∞ ≤ 1 .
(6.6.33)
The problem of finding the minimum in (6.6.33) is one of finding the closest matrix in M(H∞ ) to a given matrix in M(L∞ ). This can be reduced to the problem studied in Section 6.5,
198
6. FILTERING AND SENSITIVITY MINIMIZATION
namely minimizing a cost function of the form (6.5.22). This can be accomplished as follows: Since A2 ∈ M(L∞ ) and is rational, it has an l.c.f. (T , U ) over M(R∞ ). Now T is square and full rank in M(R∞ ), and therefore has a factorization To Ti , where To is a unit of M(R∞ ) and Ti is inner. Hence, A2 (z) = [T (z)]−1 U (z) = [Ti (z)]−1 [To (z)]−1 U (z) = [Ti (z)]−1 V (z) ,
(6.6.34)
where V = To−1 U ∈ M(R∞ ). Now A2 − S∞ = Ti−1 V − S∞ = Ti (Ti−1 V − S)∞ = V − Ti S∞ adj
= Ti
V − bS∞ ,
(6.6.35)
where b = |Ti |. This is the type of objective function covered by Theorem 6.5.3. The preceding discussion can be summarized in the following iterative procedure for computing the minimum of the function in (6.6.20). Step 1 Choose
B ∞ } . γ ≥ max {[C D]∞ , D
(6.6.36)
Step 2 Form L = (γ 2 I − K ∗ K)−1/2 ,
(6.6.37)
M = (I − BLL∗ B ∗ )−1/2 .
(6.6.38)
where K = [C D]. Step 3 Form lf BL∞ ≥ 1 , increase γ and go to Step 2; otherwise go on to Step 4. Step 4 Compute J∗ =
min
S∈M(H∞ )
MAL − S∞ .
(6.6.39)
If J ∗ < 1, decrease γ and go to Step 2. If J ∗ > 1, increase γ and go to Step 2. If γ = 1, go on to Step 5. Step 5 Set Q = M −1 SL−1 , where S achieves the minimum in (6.6.39). This choice of Q minimizes the original objective function in (6.6.20).
6.7. TWO-PARAMETER COMPENSATOR
6.7
199
TWO-PARAMETER COMPENSATOR
The objective of this section is to discuss the relative merits of a two-parameter compensator in filtering and sensitivity minimization applications, compared to a one-parameter compensator. The conclusions can be briefly stated as follows: In problems of disturbance rejection, both schemes are equivalent. In tracking problems involving a stable plant, again both schemes are equivalent. Finally, in tracking problems with an unstable plant, whether one uses the H2 -norm or the H∞ -norm, there exist plants for which the optimum achievable with a two-parameter scheme is as small as one wishes compared to the one-parameter optimum. To avoid purely technical details caused by the nonexistence of optimal compensators in the continuous-time case, the discussion below is addressed to the discrete-time case. However, with the aid of Lemmas 6.3.2, 6.4.1, and 6.5.1, this discussion can be translated to the continuous-time case. Accordingly, suppose P is a given multivariable plant with transfer matrix P (z), where z is the transfer function of a unit delay. Suppose u1 is a reference input to be tracked, while u2 and u3 are sensor noises to be rejected. Let W1 , W2 , W3 , W4 be given weighting matrices in M(H∞ ). It is assumed that (i) P has no poles on the unit circle, (ii) W1 , W3 are outer, (iii) W2 , W4 are square outer, and (iv) P (ej θ ), W1 (ej θ ), · · · , W4 (ej θ ) all have full row rank at all θ ∈ [0, 2π ]. For convenience, a two-parameter compensation scheme is reproduced in Figure 6.2. The
u1
~
Nc1
+
z1
–
~
Dc—1
y1
+
u2 D —1
+ +
u3
~
Nc2
z2 y2
+
N
Figure 6.2: Two-Parameter Compensation Scheme.
optimal tracking problem is to minimize either J2 = W1 (I − Hyu1 )W2 2 ,
(6.7.1)
J∞ = W1 (I − Hyu1 )W2 ∞ ,
(6.7.2)
or by means of a suitable stabilizing compensator. The optimal disturbance rejection problem is to select a stabilizing compensator that minimizes either K21 = W3 Hyu2 W4 2 or K22 = W3 Hvu3 W4 2 ,
(6.7.3)
200
6. FILTERING AND SENSITIVITY MINIMIZATION
or K∞1 = W3 Hyu2 W4 ∞ or K∞2 = W3 Hvu3 W4 ∞ .
(6.7.4)
From the discussion in Section 5.6, it is at once clear that, as far as minimizing K21 , K22 , K∞1 , K∞2 , there is no advantage to using a two-parameter scheme instead of a oneparameter scheme. This is because the set of achievable stable transfer matrices Hyu2 , Hvu3 are exactly the same with either scheme. But with the optimal tracking problem, the situation is quite different. The minimum J2 or J∞ achievable with a two-parameter scheme is no larger than with a one-parameter scheme, and can be arbitrarily smaller, depending on the plant. Moreover, with the two-parameter scheme, it is possible simultaneously to minimize J2 and K21 , (or J2 and K22 , or J∞ and K∞1 or any other combination). This is the overwhelming advantage of the two-parameter scheme. To justify the latter statement first, recall from (5.6.15) that the set of achievable stable transfer matrices from (u1 , u2 , u3 ) to y are parametrized by ˜ − N(X + R D)] ˜ , [NQ N(Y − R N)
(6.7.5)
˜ N˜ ) are any r.c.f. and l.c.f. of P , X, Y ∈ M(H∞ ) satisfy XN + Y D = I , and where (N, D), (D, Q, R ∈ M(H∞ ) are arbitrary. Thus, to minimize J∞ and K22 simultaneously, for example, one chooses Q to minimize J∞ = W1 (I − N Q)W2 ∞ , (6.7.6) and R to minimize
˜ K22 = W3 (I − N(X + R D))W 4 2 .
(6.7.7)
The associated compensator N˜ c1 = Q,
˜ N˜ c2 = X + R D,
D˜ c = Y − R N˜ ,
(6.7.8)
minimizes the weighted tracking error and optimally rejects the weighted disturbance. From (5.6.16), the set of stable transfer matrices Hyu1 , that can be achieved using a one˜ : R ∈ M(H∞ )}. Let us define parameter compensator is {N(X + R D) ∗ J∞ =
J2∗ = h∗∞ = h∗2 =
min
W1 (I − N Q)W2 ∞ ,
(6.7.9)
min
W1 (I − N Q)W2 2 ,
(6.7.10)
min
˜ W1 (I − N(X + R D))W 2 ∞ ,
(6.7.11)
min
˜ W1 (I − N(X + R D))W 2 2 .
(6.7.12)
Q∈M(H∞ ) Q∈M(H∞ ) R∈M(H∞ ) R∈M(H∞ )
∗ , J ∗ (resp. h∗ , h∗ ) are the best weighted tracking errors achievable with a two-parameter Then, J∞ ∞ 2 2 ∗ ≤ h∗ , J ∗ ≤ h∗ . If P is a stable plant, then (resp. one-parameter) compensation scheme. Clearly J∞ ∞ 2 2 ∗ = h∗ . On one can choose N = P , X = 0, D˜ = I , in which case it is immediate that J2∗ = h∗2 , J∞ ∞ ∗ ∗ ∗ ∗ the other hand, there exist plants for which the ratios J∞ / h∞ or J2 / h2 are as small as one wishes.
6.7. TWO-PARAMETER COMPENSATOR
201
To set up these examples, let us study the computation of the various minima in (6.7.9)–(6.7.12). ∗ , h∗ and restrict attention to the scalar case for ease of discussion. In Consider first the constants J∞ ∞ this case we can define a single weighting function w = w1 w2 . Also, factor n as ni no , where ni , no are respectively inner and outer functions. Then ∗ = min (1 − nq)w∞ . J∞ q∈H∞
(6.7.13)
At this stage it can be assumed without loss of generality that w is outer: If w = wi wo is an inner-outer factorization of w, then multiplication by wi does not affect the norm. Also, since w(ej θ ) = 0 ∀θ ∈ [0, 2π ], w is actually a unit of H∞ . Therefore ∗ = min w − ni no qw∞ J∞ q∈H∞
= min w − ni q∞ , q∈H∞
(6.7.14)
after absorbing the units no and w into q. Thus, the problem is to find a function f of minimum H∞ -norm such that f interpolates the values of w at the zeros of ni (i.e., at the plant zeros inside the unit disc). Now consider the computation of h∗∞ . By definition, h∗∞ = min [1 − n(x + rd)]w∞ r∈H∞
= (y − rn) dw∞ .
(6.7.15)
Factor d as di do where di , do are respectively inner and outer, and absorb all units into the free parameter r. This leads to h∗∞ = min [ydo w − ni r]di ∞ r∈H∞
= min ydo w − ni r∞ , r∈H∞
(6.7.16)
since multiplication by di preserves norms. Thus, the problem is to find a function g of minimum H∞ -norm that interpolates the values of ydo w at the zeros of ni . This can be restated in a more transparent way. Suppose λ is a zero of ni . Then ni (λ) = 0 ⇒ (nx)(λ) = 0 ⇒ (yd)(λ) = 1 ⇒ (ydo )(λ) = 1/di (λ) .
(6.7.17)
Thus, g must interpolate the values of w/di at the zeros of ni , while f interpolates the values of w at the zeros of ni . Since |di (z)| < 1 for all z in the open unit disc, we see that h∗∞ is in general larger ∗ . In fact, if p has a pole-zero pair close together inside the unit disc, then |d (λ)| will be than J∞ i ∗. very small at some zero λ of ni , and h∗∞ will be considerably larger than J∞ Example 6.7.1
Let w = 1, and let p(z) =
z − 0.5 − ε . z − 0.5
202
6. FILTERING AND SENSITIVITY MINIMIZATION
Then z − 0.5 − ε , 1 − (0.5 + ε)z z − 0.5 . di = 1 − 0.5z
n = z − 0.5 − ε, d = z − 0.5,
ni =
Since ni has only one zero, at z = 0.5 + ε, ∗ J∞ = |w(0.5 + ε)| = 1,
h∗∞ = |(w/di )(0.5 + ε)| = 1/|di (0.5 + ε)| 0.75 − 0.5ε 0.75 = ≈ . ε ε ∗ / h∗ can be made as small as one desires. By choosing ε sufficiently small, the ratio J∞ ∞
Now consider the computation of J2∗ and h∗2 . Recall from Section 6.2 that the space L2 [0, 2π ] − + can be decomposed as the direct sum of the orthogonal subspaces H+ 2 and H2 , where H2 consists of the boundary values of H2 -functions. Given f ∈ L2 [0, 2π ], let f+ (resp. f− ) denote its projection − − ∗ −1 −1 onto H+ 2 (resp. H2 ). Finally, if f ∈ H2 , then f denotes the function z → z f (z ), while if F ∈ M(H∞ ), then Fo (resp. Fi ) denotes the outer (resp. inner) part of F . For ease of discussion, suppose W1 = I , and denote W2 by W . Also, suppose W is a unit of M(H∞ ), without loss of generality. Then J2∗ = = = =
min
(I − N Q)W 2
min
W − Ni Q2 , after absorbing No , W into Q
min
Ni∗ W − Q2
Q∈M(H∞ ) Q∈M(H∞ )
Q∈M(H∞ ) (Ni∗ W )− 2
(6.7.18)
,
corresponding to the choice Q = (Ni∗ W )+ . On the other hand, h∗2 =
min
R∈M(H∞ )
˜ (I − N(X + R D))W 2 .
(6.7.19)
˜ Y˜ ∈ M(H∞ ) such that To bring out the connection between h∗2 and J2∗ more clearly, select X,
Y −N˜
X D˜
D N
−X˜ D = Y˜ N
−X˜ Y˜
Y −N˜
X =I . D˜
(6.7.20)
6.7. TWO-PARAMETER COMPENSATOR
203
˜ )o denote the outer part of DW ˜ , then This is always possible, by Theorem 4.1.16. If we let (DW h∗2 =
min
˜ 2 (Y˜ − NR)DW
min
˜ )o − NR(DW ˜ ) o 2 Y˜ (DW
min
˜ )o − No R(DW ˜ ) o 2 Ni∗ Y˜ (DW
R∈M(H∞ )
=
R∈M(H∞ )
=
R∈M(H∞ )
˜ )o min Ni∗ Y˜ (DW R∈M(H∞ ) ˜ )o ]− 2 . [Ni∗ Y˜ (DW
= =
− R2 (6.7.21)
To compare (6.7.18) and (6.7.21), note that N X + Y˜ D˜ = I , Ni∗ W
=
(6.7.22)
Ni∗ (N X
˜ . ˜ + Y˜ D)W = No XW + Ni∗ Y˜ DW
(6.7.23)
However, since No XW ∈ M(H2 ), ˜ ]− . (Ni∗ W )− = [Ni∗ Y˜ DW So if we define F =
˜ Ni∗ Y˜ DW
∈ M(L2 ) then
J2∗ = F− 2 ,
˜ )∗i ]− 2 . h∗2 = [F (DW
This implies that h∗2 , is in general larger than J2∗ . Example 6.7.2
(6.7.24)
Consider the scalar case; let w = 1, and let p(z) =
zm , 1 − 2z
where m is an integer. Then we can choose n = zm ,
d = 1 − 2z,
x=2 ,
y = 1 + 2z + 4z2 + · · · + 2m−1 zm−1 .
m
In this case n = ni since n is already inner. Hence, we have ni = zm , n∗i = z−m . z − 0.5 1 − 0.5z di = , di∗ = . 1 − 0.5z z − 0.5 f = n∗i yd = n∗i (1 − nx) = n∗i − x = z−m − 2m .
J2∗ = f− 2 = z−m 2 = 1 .
f di∗ = (z−m − 2m ) di∗ ,
h∗2 = (f di∗ )− 2 = (z−m di∗ )− − (2m di∗ )− 2 ≥ 2m (di∗ )− 2 − (z−m di∗ )− 2 .
(6.7.25)
204
6. FILTERING AND SENSITIVITY MINIMIZATION
Now expand di (z) in a Fourier series di (z) =
∞
ci z i ,
i=0
where c0 = −0.5,
ci = 1.5(0.5)i for i ≥ 1 .
Then, since di is inner, ∞
1 = di 22 =
ci2 .
i=0
Therefore, ∞
ci2 = 0.75 .
i=1
Since, di∗ (z)
=
∞
ci z−i ,
i=0
(di∗ )− =
∞
ci z−i ,
i=1
we see that (di∗ )− 22 =
∞
ci2 = 0.75 .
i=1
On the other hand, if m ≥ 1, then we have (z−m di∗ )− = z−m di∗ =
∞
ci z−(i+m) ,
i=0
(z−m di∗ )− 2 = di∗ 2 = 1 . Hence, h∗2 ≥ 2m−1 ·
√
3−1.
By choosing the integer m sufficiently large, the ratio J2∗ / h∗2 can be made as small as one desires.
6.7. TWO-PARAMETER COMPENSATOR
205
NOTES AND REFERENCES Section 6.2 contains only the barest minimum of the theory of Hardy spaces. For excellent treatments of this topic, see Duren [34], Koosis [59] or Sz. Nagy and Foias [69]. Solving the filtering problem of Section 6.3 was the original motivation of Youla, Jabr and Bongiorno [111] in parametrizing the set of all compensators that stabilize a given plant. Kucera [60] gives a solution to the discrete-time filtering problem that is similar in its approach to that of [111]. The treatment given here follows Francis [36]. The sensitivity minimization problem was formulated by Zames [113] and solved in the scalar case by Zames and Francis [40, 115].The treatment here is similar to that in [36]. In the multivariable case, the fat plant case was solved by Chang and Pearson [12] and Francis, Helton, and Zames [38]. Our treatment is adapted from the former. The general sensitivity minimization problem was solved by Doyle [31]. Note that it is possible to carry out all of the steps of Sections 6.4 to 6.6 using purely state-space methods; see [16, 31, 44]. The paper of Sarason [86] contains some mathematical results pertinent to the sensitivity minimization problem. The Nevanlinna-Pick algorithm used in Section 6.5 is taken from the paper by Delsarte, Genin, and Kamp [18], but this paper treats only the square matrix case. Finally, Potapov [75] contains several useful results on matrices of analytic functions, such as Lemma 6.5.6.
207
CHAPTER
7
Robustness This chapter begins with a detailed study of the issue of the robustness of feedback stability in the presence of plant and/or compensator perturbations. Both the nature and the extent of the perturbations that maintain feedback stability are discussed. Then the problem of robust regulation (i.e., stabilization plus tracking and/or disturbance rejection) is examined, for both the one-parameter as well as the two-parameter compensation schemes. Finally, various generic properties of control systems are studied.
7.1
INTRODUCTION
Consider the standard feedback system shown in Figure 7.1, and suppose the compensator C stabilizes the plant P , i.e., that the closed-loop transfer matrix
(I + P C)−1 H (P , C) = C(I + P C)−1
−P (I + CP )−1 (I + CP )−1
,
(7.1.1)
belongs to M(S). The central question studied in this chapter is: What sort of perturbations can be u2 u1
+
e1
–
C
y1
+
+
e2
P
y2
Figure 7.1: Feedback System.
permitted in P and/or C without destroying feedback stability? This question is very relevant in any practical application of the controller synthesis theory developed in the previous two chapters. The first step in designing a compensator for any given plant is to obtain a mathematical model for it. As this process always involves some modeling errors, simplifying assumptions, etc., the “true” plant is in general different from the nominal model used in designing a compensator. In the same vein, one can normally expect some errors or simplifications in implementing the compensator, so that the “true” compensator is not necessarily the same as the designed compensator. These phenomena can be captured by supposing that the true plant and compensator are perturbed versions of the nominal
208
7. ROBUSTNESS
plant and compensator respectively. Hence, there is interest in obtaining characterizations of both the nature and the extent of the permissible perturbations. Before attempting to characterize permissible perturbations, we should first decide on a reasonable definition of what constitutes a permissible perturbation. Suppose (P0 , C0 ) is a nominal plant-compensator pair which is stable, so that H (P0 , C0 ) ∈ M(S), and suppose P0 , C0 are perturbed to P , C, respectively. In order for (P , C) to be a permissible perturbation of (P0 , C0 ), we require as a minimum that (P , C) also be a stable pair. But this is not enough. The objective of compensator design is not merely to stabilize a plant but also to improve its response (this is why compensation is applied even to stable plants in some instances). Hence (P , C) is deemed to be a permissible perturbation of (P0 , C0 ) if (P , C) is stable, and in addition the closed-loop transfer matrix H (P , C) is “close” to H (P0 , C0 ). There are two aspects to the study of robustness. The first is qualitative, and pertains to the nature of the permissible perturbations. Mathematically, this is the problem of defining a topology on the set of plant-compensator pairs.The second is quantitative, and is concerned with the extent of the permissible perturbations. This is the problem of defining a metric on the set of plant-compensator pairs. Each has its own significance, which is discussed at the appropriate place. In the study of robustness, a very helpful feature is that the map from the pair (P , C) to the closed-loop transfer matrix H (P , C) is one-to-one.That is, given H (P , C) ∈ M(S), one can uniquely determine P and C, by proceeding as follows: Define C 0 0 I G= , F = . (7.1.2) 0 P −I 0 Then it is easy to verify (as in Section 5.1) that H (P , C) = (I + F G)−1 . Solving this relationship for G in terms of H (P , C) gives C 0 G= = F [I − (H (P , C))−1 ] , (7.1.3) 0 P where we use the fact that F −1 = −F . Now suppose that (P0 , C0 ) is a given stable pair corresponding to the nominal plant and compensator, and let B denote a subset of M(S) containing H (P0 , C0 ), representing the acceptable perturbations of the closed-loop transfer matrix resulting from changes in the plant and/or compensator. Now the set {G ∈ M(R(s)) : G = F (I − H )−1
for some H ∈ B} ,
(7.1.4)
certainly contains all pairs (P , C) such that H (P , C) ∈ B. But there is a flaw in this argument: If H is an arbitrary element of B, then G = F (I − H )−1 need not be block-diagonal and therefore, need not correspond to a plant-compensator pair. Of course, the set of block-diagonal matrices of the form (7.1.4) is precisely the set of permissible plant and compensator perturbations. However, this set is very difficult to characterize explicitly. Hence we are forced to seek alternate approaches. This is what is done in the next few sections.
7.2. GRAPH TOPOLOGY
7.2
209
GRAPH TOPOLOGY
Suppose it is desired to design a stabilizing compensator for a plant that is only specified imprecisely. Suppose further that the plant uncertainty can be modeled by considering a family of plants Pλ , where the uncertainty parameter λ assumes values in some topological space . Typically λ represents some physical parameters of the plant, such as component values, and the function λ → Pλ captures the manner in which this uncertainty in the physical parameters is reflected in the plant description. The parameter space may be Rn for some integer n, or a more general normed space, or any topological space. An example of the latter situation arises in the design of compensators for singularly perturbed systems, where = [0, ∞). For technical reasons, it is assumed that is a first-countable topological space (see Section C.1 for the definition). As all metric spaces are first-countable, this poses no practical restriction. Let Pλ0 be the nominal plant description on which the compensator design is based. The question is: When is it possible to design a stabilizing compensator for Pλ0 that is “robust” against the plant uncertainty? To answer this question, it is necessary first to define robust stabilization. The definition adopted here is that a compensator1 C ∈ S(Pλ0 ) robustly stabilizes the family of plants Pλ if (i) there exists a neighborhood N of λ in such that C ∈ S(Pλ ) ∀λ ∈ N, and (ii) the stable closed-loop transfer matrix H (Pλ , C) is continuous in λ at λ0 . The first condition means that C stabilizes not only the nominal plant Pλ0 but also all “perturbed” plants Pλ for all λ, sufficiently near λ0 . But this alone is not enough: The objective of compensator design is not merely to stabilize a plant but also to shape the stable closed-loop response. Thus, the second condition is imposed, requiring that the perturbed (stable) closed-loop response H (Pλ , C) be close to the nominal (stable) closed-loop response H (Pλ0 , C) whenever λ0 is near λ. Note that the question as posed above is purely qualitative, in that one is not concerned about the “size” of the neighborhood N, but merely about its existence. The quantitative aspects of robustness are studied in the next two sections. The question studied (and answered) in this section is: When does there exist a robustly stabilizing compensator for a given family Pλ of plants? The answer involves the introduction of an appropriate topology on the set M(R(s)) of possibly unstable plants, called the graph topology. In terms of the graph topology, the answer to the above question turns out to be exceedingly simple and satisfactory: Robust stabilization is possible if and only if the function λ → Pλ is continuous in the graph topology at λ = λ0 . In other words, robust stabilization is possible if and only if any variations in the parameter λ from its nominal value λ0 affect the plant description in a continuous manner, where the continuity of Pλ is judged in the context of the graph topology. In this section, the graph topology is defined, and some of its properties are proved, including the above-stated result on robust stabilizability. Then several examples are given with a view to developing some insight into the graph topology. One of these “examples” contains a fairly substantial application of these ideas to the design of compensators for singularly perturbed systems. As the name implies, the graph topology has to do with the graph of a plant, and we begin with this notion. 1 Recall that S(P ) is the set of compensators that stabilize P .
210
7. ROBUSTNESS
Suppose P ∈ M(R(s)). Then P is a possibly unstable plant; that is, not all bounded inputs applied to P need result in bounded outputs. But undoubtedly some (though not necessarily all) bounded inputs applied to P will produce bounded outputs. For example, suppose p(s) = 1/(s − 1); then it is intuitively clear that a bounded input u(·) whose Laplace transform has a zero at s = 1 will produce a bounded output from p. The graph of a plant P is just the set of bounded input-output pairs corresponding to P . Precisely, if P ∈ R(s)n×m , then GS (P ) = {(u, y) ∈ Sn+m : y = P u} ,
(7.2.1)
is the graph of P with respect to S. Lemma 7.2.1
Suppose P ∈ R(s)n×m , and let (N, D) be any r.c.f. of P . Then GS (P ) = {(Dz, N z) : z ∈ Sm } .
(7.2.2)
This lemma shows that the graph of a plant P can be very simply parametrized in terms of any r.c.f. (N, D) of P . As the free parameter z varies over Sm , the pair (Dz, N z) generates the set of all “bounded” input-output pairs corresponding to P . Remarks 7.2.2
As in [26, 81], let L∞ denote the set of all measurable essentially bounded functions mapping [0, ∞) into the real numbers, and let L2 denote the set of all measurable square-integrable functions mapping [0, ∞) into R. Then one can also define the graph of P with respect to the spaces L∞ and L2 in a manner entirely analogous with (7.2.1), namely n+m : yˆ = P u} ˆ , GL∞ (P ) = {(u, y) ∈ L∞ GL2 (P ) = {(u, y) ∈ L2n+m : yˆ = P u} ˆ ,
(7.2.3) (7.2.4)
where the hat denotes Laplace transformation. It can be shown [95] that these graphs can also be simply characterized in terms of an r.c.f. of P . Specifically, if (N, D) is any r.c.f. of P , then GL∞ (P ) = {(Dz, N z) : z ∈ Lm ∞} , GL2 (P ) = {(Dz, N z) : z ∈ Lm 2},
(7.2.5) (7.2.6)
where D and N are now interpreted as bounded operators on L∞ or L2 . But technically Lemma 7.2.1 is easier to prove than (7.2.5) or (7.2.6). Proof. Suppose u = Dz for some z ∈ Sm .Then P u = ND −1 Dz = Nz ∈ Sn . Hence, for all z ∈ Sm , the pair (Dz, Nz) belongs to GS (P ). It remains to show that every point (u, y) ∈ GS (P ) is of the form (Dz, Nz) for some z ∈ Sm . Since N and D are right-coprime, one can select X, Y ∈ M(S) such that XN + Y D = 1. Now suppose (u, y) ∈ GS (P ), i.e., suppose u ∈ Sm , y ∈ Sn and y = P u.
7.2. GRAPH TOPOLOGY
D −1 u
211
[R(s)]m .
Define z = ∈ Then clearly (u, y) = (Dz, N z). The proof is complete if it can be shown that z actually belongs to Sm . Note that z = D −1 u = (XN + Y D)D −1 u = XND −1 u + Y u = Xy + Y u ∈ Sm , since all quantities in the last expression belong to M(S). Corollary 7.2.3
(7.2.7)
2
Suppose P ∈ Sn×m . Then GS (P ) = {(z, P z) : z ∈ Sm } .
Proof. Note that (P , I ) is an r.c.f. of P .
(7.2.8)
2
Now we are ready to define the graph topology. Suppose (N, D) is a right-coprime pair, and that |D| = 0. Then there exists a constant μ = μ(N, D) such that (N1 , D1 ) is also a right-coprime pair and |D1 | = 0 whenever [(N1 − N) (D1 − D) ] < μ(N, D). To see this, first select any X, Y ∈ M(S) such that XN + Y D = I , and let μ1 = 1/[X Y ]. Then N1 − N < μ1 . (7.2.9) XN1 + Y D1 − I < 1 whenever D1 − D This shows that XN1 + Y D1 ∈ U(S), whence N1 and D1 are right-coprime. Next, since |D| = 0, the ball B(|D|, r) does not contain 0 whenever r < |D|. Let r < |D|. Since the map D1 → |D1 | is continuous, the set of matrices D1 such that |D1 | ∈ B(|D|, r) contains a ball B(D, μ2 ) centered at D. Now let μ = min{μ1 , μ2 }. Then the preceding arguments show that N1 , D1 are coprime and |D1 | = 0 whenever (N1 , D1 ) ∈ B((N, D), μ). Of course, the bound μ is not uniquely specified by the above conditions, but this does not matter. Given a plant P ∈ M(R(s)), a basic neighborhood of P is defined as follows: Let (N, D) be any r.c.f. of P , and let ε be any positive number less than μ(N, D). Then the set −1 N1 − N N(N, D; ε) = {P1 = N1 D1 : < ε} , (7.2.10) D1 − D is a basic neighborhood of P . Thus, a basic neighborhood of P merely consists of all plants P1 which have an r.c.f. (N1 , D1 ) that is “close” to an r.c.f. of P . To put it another way, a basic neighborhood of P is obtained by starting with some r.c.f. (N, D) of P and making a “small” perturbation of (N, D) to (N1 , D1 ), and then taking the ratio, N1 D1−1 . Now by varying ε over all positive numbers less than μ(N, D), by varying (N, D) over all r.c.f.’s of P , and by varying P over all elements of M(R(s)), we obtain a collection of basic neighborhoods. Lemma 7.2.4
The collection of basic neighborhoods forms a base for a topology on M(R(s)).
212
7. ROBUSTNESS
Proof. In accordance with the contents of Section C.1, it is necessary to prove that axioms (B1) and (B2) are satisfied. For convenience, these axioms are recalled below: (B1) The collection of basic neighborhoods covers M(R(s)). (B2) If N1 , N2 are basic neighborhoods and P ∈ N1 ∩ N2 , then there is a basic neighborhood N such that P ∈ N ⊆ (N1 ∩ N2 ). Of these, it is at once obvious that (B1) holds, since every P ∈ M(R(s)) belongs to some basic neighborhood. To prove (B2), suppose ¯ −1 N − N1 ¯ ¯ (7.2.11) < ε1 } , N1 = {N D : ¯ D − D1 N¯ − N2 (7.2.12) N2 = {N¯ D¯ −1 : D¯ − D2 < ε2 } , and suppose P ∈ N1 ∩ N2 . Then P has r.c.f.’s (N¯ 1 , D¯ 1 ), (N¯ 2 , D¯ 2 ) such that ¯ N¯ 1 − N1 =: r1 < ε1 , N2 − N2 =: r2 < ε2 . D¯ 2 − D2 D¯ 1 − D1
(7.2.13)
Moreover, there exists a U ∈ U(S) such that N¯ 2 = N¯ 1 U, D¯ 2 = D¯ 1 U . Now select a positive number r such that r ≤ min{ε1 − r1 , ε2 − r2 /U }, and define ¯ −1 N − N1 < r} . (7.2.14) N = {ND : D − D¯ 1 Then N is certainly a basic neighborhood containing P . It only remains show that N ⊆ N1 ∩ N2 . This follows readily: First, N − N¯ 1 < r ⇒ N − N1 < r1 + r < ε1 , (7.2.15) D − D1 D − D¯ 1 which shows that N D −1 ∈ N1 . Next, N − N¯ 1 < r ⇒ NU DU D − D¯ 1 NU ⇒ DU
− N¯ 2 < rU − D¯ 2 − N2 < r2 + rU < ε2 , − D2
which shows that (NU )(DU )−1 = ND −1 ∈ N2 . Hence N ⊆ N1 ∩ N2 .
(7.2.16)
2
The topology on M(R(s)) defined by the collection of sets of the form (7.2.10) as base is called the graph topology. In it, two plants P1 and P2 are “close” if they have r.c.f.’s (N1 , D1 ), (N2 , D2 ) such that (N1 , D1 ) − (N2 , D2 ) is “small.” In view of Lemma 7.2.1, this means that P1 and P2 are “close” if their graphs are “close.” This is the reason for the name “graph topology.”
7.2. GRAPH TOPOLOGY
213
The central result of this section is that the graph topology on M(R(s)) is the weakest topology in which feedback stability is a robust property. In order to state this result precisely, it is necessary first to characterize the convergence of sequences and continuity of functions in the graph topology. Suppose {Pi } is a sequence in M(R(s)) and that P ∈ M(R(s)). Then by definition (Section C.1), {Pi } converges to P if and only if every neighborhood of P contains all but a finite number of terms of the sequence {Pi }. Since every neighborhood of P contains a basic neighborhood of P , it is easy to see that {Pi } converges to P if and only if every basic neighborhood of P contains all but a finite number of terms of {Pi }. All this is a ready consequence of the definition of the graph topology and of the definition of convergence in a topological space. The next lemma gives several equivalent characterizations of convergence in the graph topology. Suppose {Pi } is a sequence in M(R(s)) and that P ∈ M(R(s)). Then the following statements are equivalent: (i) {Pi } converges to P in the graph topology. (ii) For every r.c.f. (N, D) of P , there exist r.c.f.’s (Ni , Di ) of Pi such that Ni → N, Di → D in M(S). (iii) There exist an r.c.f. (N, D) of P and a sequence {(Ni , Di )} of r.c.f.’s of Pi such that Ni → N, Di → D in M(S). ˜ N) ˜ of P , there exists a sequence {(D˜ i , N˜ i )} of l.c.f.’s of Pi such that (iv) For every l.c.f. (D, ˜ ˜ ˜ ˜ Ni → N , Di → D in M(S). ˜ N) ˜ of P and a sequence {(D˜ i , N˜ i )} of l.c.f ’s of Pi such that N˜ i → (v) There exist an l.c.f. (D, ˜ ˜ ˜ N , Di → D in M(S). Lemma 7.2.5
Proof. (i) ⇐⇒ (ii). This is a ready consequence of the definition of the graph topology and of the discussion in the paragraph preceding the lemma. Suppose {Pi } converges to P . Then every basic neighborhood of P contains all but a finite number of terms in the sequence {Pi }. In particular, let (N, D) be any r.c.f. of P and let εi be any sequence of positive numbers converging to zero. (Assume without loss of generality that εi < μ(N, D) ∀i.) Define N¯ − N −1 Ni = {N¯ D¯ : (7.2.17) D¯ − D < εi } . Then for each integer k there exists an integer jk such that Pi ∈ Nk ∀i ≥ jk . That is, there exist r.c.f.’s (Ni , Di ) of Pi such that Ni − N (7.2.18) Di − D < εk ∀i ≥ jk . From this, one can conclude that
Ni − N Di − D → 0 .
(7.2.19)
214
7. ROBUSTNESS
i.e., that Ni → N, Di → D in M(S). This shows that (i) ⇒ (ii). The converse implication follows just as easily. Suppose (ii) is true and let N be any basic neighborhood of P . Then N is of the form (7.2.10) for some r.c.f. (N, D) of P and some ε > 0. By (ii), there exists a sequence {(Ni , Di )} of r.c.f.’s of Pi such that (Ni , Di ) → (N, D) in M(S). Hence [Ni Di ] − [N D ] < ε for large enough i, i.e., Pi ∈ N for large enough i. Since this is true for all basic neighborhoods of P , it follows that {Pi } converges to P . (ii) ⇒ (iii). Obvious. (iii) ⇒ (ii). Suppose (N, D) is an r.c.f. of P with the property that there exists a sequence ¯ D) ¯ be any other r.c.f. of P . It is {(Ni , Di )} of r.c.f.’s of Pi such that Ni → N, Di → D. Let (N, ¯ D¯ i → D. ¯ Since shown that there exists a sequence {(N¯ i , D¯ i )} of r.c.f.’s of Pi such that N¯ i → N, ¯ ¯ ¯ ¯ (N, D), (N D) are both r.c.f.’s of P , there exists a U ∈ U(S) such that N = NU, D = DU . Now ¯ D¯ i → D. ¯ Thus, (ii) holds. define N¯ i = Ni U, D¯ i = Di U . Then N¯ i → N, (iv) ⇐⇒ (v). The proof is entirely analogous to that of (ii) ⇐⇒ (iii) above. Thus far, it has been shown that (i), (ii), (iii) are all equivalent, and that (iv) and (v) are equivalent. The proof of the lemma is completed by showing that (iii) ⇐⇒ (v). (iii) ⇒ (v). By hypothesis, there exist an r.c.f. (N, D) of P and a sequence of r.c.f.’s {(Ni , Di )} ˜ N) ˜ of P and matrices X, Y, X, ˜ Y˜ ∈ M(S) of Pi such that Ni → N, Di → D. Now select an l.c.f. (D, such that Y X D −X˜ =I . (7.2.20) −N˜ D˜ N Y˜ This can always be done, in view of Theorem 4.1.16. Since (Ni , Di ) → (N, D) and since U(S) is an open subset of M(S), it follows that Di −X˜ ∈ U(S) for large enough i . (7.2.21) Fi = Ni Y˜ Now let Gi = Fi−1 ∈ M(S) for large enough i, and partition Gi as Xi Yi Gi = . −N˜ i D˜ i
(7.2.22)
Then N˜ i , D˜ i are left-coprime (for large enough i), since N˜ i X˜ + D˜ i Y˜ = I . Further, N˜ i Di + D˜ i Ni = 0 implies that D˜ i−1 N˜ i = Ni Di−1 = Pi .2 Hence (D˜ i , N˜ i ) is an l.c.f. of Pi . Finally, since inversion of unimodular matrices is continuous, we have that Y X . (7.2.23) Gi → (lim Fi )−1 = −N˜ D˜ i ˜ N), ˜ which is an l.c.f. of P . This is (v). This shows that (D˜ i , N˜ i ) → (D, (v) ⇒ (iii). This is entirely analogous to the above.
2
2 It still has to be shown that D ˜ i is nonsingular for this argument to be valid. However, this is straight-forward; see the proof of
Lemma 8.1.17.
7.2. GRAPH TOPOLOGY
215
The next result characterizes the continuity of functions mapping into M(R(s)). It is a ready consequence of Lemma 7.2.5. Corollary 7.2.6 Suppose is a first-countable topological space, and that λ → Pλ is a function mapping
into M(R(s)). Then the following are equivalent: (i) The function λ → Pλ is continuous at λ = λ0 when M(R(s)) is equipped with the graph topology. (ii) For each r.c.f. (N0 , D0 ) of Pλ0 , there exists a family of r.c.f.’s (Nλ , Dλ ) of Pλ such that (Nλ0 , Dλ0 ) = (N0 , D0 ), and such that Nλ , Dλ are continuous (in M(S)) at λ = λ0 . (iii) There exists a family (Nλ , Dλ ) of r.c.f.’s of Pλ such that Nλ , Dλ are continuous (in M(S)) at λ = λ0 . (iv) For each l.c.f. (D˜ 0 , N˜ 0 ) of Pλ0 , there exists a family (D˜ λ , N˜ λ ) of l.c.f.’s of Pλ such that ˜ (Dλ0 , N˜ λ0 ) = (D˜ 0 , N˜ 0 ) and such that (D˜ λ , N˜ λ ) are continuous (in M(S)) at λ = λ0 . (v) There exists a family (D˜ λ , N˜ λ ) of l.c.f.’s of Pλ such that D˜ λ , N˜ λ are continuous (in M(S)) at λ = λ0 .
Now we come to the main result of this section. It is rather wordy in the interests of precision, but can be loosely stated as follows: A family of plants {Pλ } can be robustly stabilized by a compensator C ∈ S(Pλ0 ) if and only if the function λ → Pλ is continuous at λ = λ0 in the graph topology: further, if λ → Pλ is continuous at λ0 , then every C ∈ S(Pλ0 ) robustly stabilizes the family {Pλ }. Suppose λ → Pλ is a function mapping a first-countable topological space into M(R(s)). (i) Suppose the function λ → Pλ is continuous at λ = λ0 when M(R(s)) is equipped with the graph topology. Let C be any element of S(Pλ0 ). Then there exists a neighborhood N of λ0 such that C ∈ S(Pλ ) ∀λ ∈ N, and moreover H (Pλ , C) is continuous (in M(S)) as a function of λ at λ0 . (ii) Conversely, suppose there exists a C ∈ S(Pλ0 ) such that C ∈ S(Pλ ) for all λ in some neighborhood N of λ0 , and that H (Pλ , C) is continuous (in M(S)) as a function of λ at λ0 . Then the function λ → Pλ is continuous when M(R(s)) is equipped with the graph topology.
Theorem 7.2.7
Proof. (i) By hypothesis, the function λ → Pλ is continuous at λ0 . By Corollary 7.2.6 (iii), there exists a family (Nλ , Dλ ) of r.c.f.’s of Pλ such that Nλ , Dλ are continuous at λ0 . Since C ∈ S(Pλ0 ), there is an l.c.f. (D˜ c , N˜c ) of C such that D˜ c Dλ0 + N˜ c Nλ0 = I (see Corollary 5.1.8). Moreover, H (Pλ0 , C) =
I − Nλ0 N˜ c Dλ0 N˜ c
−Nλ0 D˜ c Dλ0 D˜ c
.
(7.2.24)
Now define λ = D˜ c Dλ + N˜ c Nλ . Then λ0 = I and λ → λ is a continuous function from into M(S). Since U(S) is an open subset of M(S), there is a neighborhood N of λ0 in such that
216
7. ROBUSTNESS
λ ∈ U(S) ∀λ ∈ N. Hence C ∈ S(Pλ ) ∀λ ∈ N. Moreover, since inversion of unimodular matrices is continuous, the map λ → −1 λ is continuous. It now follows from ˜ c Nλ −1 D˜ c N I − Nλ −1 λ λ , (7.2.25) H (Pλ , C) = Dλ −1 N˜ c Dλ −1 D˜ c λ
λ
that H (Pλ , C) is continuous at λ = λ0 . (ii) By hypothesis, there exists a C ∈ S(Pλ0 ) and a neighborhood N of λ0 such that C ∈ S(Pλ ) ∀λ ∈ N. Select any r.c.f. (N0 , D0 ) of Pλ0 . Since C ∈ S(Pλ0 ), by Corollary 5.1.8 there exists an l.c.f. (D˜ c , N˜ c ) of C such that D˜ c D0 + N˜ c N0 = I . Now, since C ∈ S(Pλ ) ∀λ ∈ N, it follows by symmetry that Pλ ∈ S(C) ∀λ ∈ N. Hence, once again by Corollary 5.1.8, there exists r.c.f.’s (Nλ , Dλ ) of Pλ such that D˜ c Dλ + N˜ c Nλ = I ∀λ ∈ N. Moreover, I − N0 N˜ c −N0 D˜ c , (7.2.26) H (Pλ0 , C) = D0 N˜ c D0 D˜ c I − Nλ N˜ c −Nλ D˜ c . (7.2.27) H (Pλ , C) = Dλ N˜ c Dλ D˜ c By hypothesis H (Pλ , C) is continuous at λ = λ0 . Hence so is the function Nλ N˜ c Nλ D˜ c . Fλ = Dλ N˜ c Dλ D˜ c Since Nλ0 , Dλ0 are fixed elements of M(S) (i.e., independent of λ), the function N0 Nλ = (since N˜ c N0 + D˜ c D0 = I ) , Fλ D0 Dλ
(7.2.28)
(7.2.29)
is also continuous at λ0 . Thus, Nλ , Dλ are continuous at λ0 . Finally, since H (Pλ , C) is continuous at λ0 , it follows that lim H (Pλ , C) = H (Pλ0 , C) . (7.2.30) λ→λ0
Then from (7.2.29) it follows that lim
λ→λ0
Nλ N0 = . Dλ D0
Now by Corollary 7.2.6 (ii), the map λ → Pλ is continuous at λ0 .
(7.2.31)
2
In the proof of Theorem 7.2.7, the only properties of M(S) used were (i) the set of units is open, and (ii) inversion of a unit is continuous. Thus, Theorem 7.2.7 holds under extremely general conditions. Specifically, one can replace S by any topological ring with the above two properties, plus the obvious ones (namely S is a commutative domain with identity), and Theorem 7.2.7 remains valid. The same remark applies to Theorem 7.2.8 and Corollary 7.2.9 below.
7.2. GRAPH TOPOLOGY
217
Theorem 7.2.7 is addressed to the robustness of feedback stability in the case where the plant is perturbed but the compensator remains fixed. The next result is addressed to the case where both the plant and the compensator may be perturbed. Suppose a family of plants {Pλ } and a family of compensators {Cλ } is specified, and consider the function λ → (Pλ , Cλ ). Theorem 7.2.8 below relates the behavior of this function to the robustness of feedback stability. Before the presentation of the theorem, however, a word is needed about the problem formulation. Modeling the plant and compensator perturbations by a family {Pλ , Cλ } where both P and C are functions of the same parameter λ gives the misleading impression that plant and compensator perturbations are in some way “coupled.” But this is not so. Suppose plant uncertainty is modeled by specifying a family of plants Pα where α belongs to a topological space A, and compensator uncertainty similarly leads to a family of compensators Cβ where β belongs to another topological space B. In such a case, define the parameter space to be the cartesian product of A and B equipped with the product topology, and define P(α,β) = Pα , C(α,β) = Cβ . In this way, “independent” plant and compensator uncertainty can also be treated by Theorem 7.2.8. Suppose λ → Pλ , λ → Cλ are functions mapping a first-countable topological space into M(R(s)), and suppose that the pair (Pλ0 , Cλ0 ) is stable. (i) Suppose the functions λ → Pλ , λ → Cλ are continuous at λ = λ0 in the graph topology. Then there exists a neighborhood N of λ0 such that (Pλ , Cλ ) is stable for all λ ∈ N, and in addition H (Pλ , Cλ ) is continuous at λ = λ0 in the norm topology on M(S). (ii) Conversely, suppose there is a neighborhood N of λ0 such that (Pλ , Cλ ) is stable for all λ ∈ N, and such that H (Pλ , Cλ ) is continuous at λ = λ0 in the norm topology on M(S). Then the functions λ → Pλ , λ → Cλ are continuous at λ = λ0 in the graph topology. Theorem 7.2.8
The proof of this theorem is deferred to Section 7.3. However, combining Theorems 7.2.8 and 7.2.7 leads to the following result, which is relevant in the context of singularly perturbed systems. Suppose λ → Pλ is a function mapping a first-countable topological space into M(R(s)), and suppose a point λ0 ∈ is specified. If there exist a neighborhood N of λ0 and a family {Cλ , λ ∈ N} in M(R(s)) of compensators such that (Pλ , Cλ ) is stable for all λ ∈ N and such that H (Pλ , Cλ ) is continuous at λ = λ0 , then there exists a C ∈ M(R(s)) and a neighborhood B of λ0 such that (Pλ , C) is stable for all λ ∈ B and H (Pλ , C) is continuous at λ = λ0 . Corollary 7.2.9
Remarks 7.2.10 The point of the corollary is this: Suppose a family of plants {Pλ } is specified, together with a nominal plant Pλ0 . If it is possible to design a family of compensators Cλ such that Cλ robustly stabilizes the family of plants {Pλ }, then the same can be accomplished using a fixed (i.e., λ-independent) compensator.
218
7. ROBUSTNESS
Proof. The hypotheses imply, by Theorem 7.2.8 (ii), that both λ → Pλ , λ → Cλ are continuous in the graph topology at λ = λ0 . Now let C = lim Cλ , or any other element of S(Pλ0 ). Then, by λ→λ0
Theorem 7.2.7 (i), C robustly stabilizes the family {Pλ }.
2
The preceding results show that the graph topology is of fundamental importance in the study of the robustness of feedback stability. Hence it is worthwhile to develop a further understanding of it, with particular emphasis on what continuity in this topology means. The next several propositions are directed towards this aim. On the set M(S) viewed as a subset of M(R(s)), the norm topology and the graph topology are the same. M(S) is an open subset of M(R(s)) in the graph topology. Proposition 7.2.11
Proof. To prove the first sentence, it is enough to show that convergence in the norm topology is equivalent to convergence in the graph topology in M(S). Accordingly, suppose {Pi } is a sequence3 in M(S) and that P ∈ M(S). Suppose first that Pi − P → 0. Then C = 0 stabilizes P as well as Pi , since P , Pi ∈ M(S). Further, since Pi − P → 0, it follows from (7.1.1) that H (Pi , 0) → H (P , 0) in M(S). Hence, by Theorem 7.2.7 (ii), we conclude that {Pi } converges to P in the graph topology. To prove the converse, and incidentally to prove the second sentence, suppose {Pi } is a sequence in M(R(s)) (not necessarily in M(S)) converging in the graph topology to P ∈ M(S). Now (P , I ) is an r.c.f. of P . Hence, by Lemma 7.2.5 (ii) there exist r.c.f.’s (Ni , Di ) of Pi such that Ni → P , Di → I in M(S). Now, I is a unimodular matrix. Since U(S) is an open subset of M(S) and since the inversion of unimodular matrices is continuous, this implies that Di is unimodular for large enough i and that Di−1 → I in M(S). Hence, Pi = Ni Di−1 ∈ M(S) for large enough i; moreover, since Ni − P → 0 and Di−1 − I → 0, it follows that Pi − P = Ni Di−1 − I → 0. As it has been shown during the preceding argument that every sequence in M(R(s)) that converges to a limit in M(S) must eventually lie in M(S), we conclude that M(S) is an open subset of M(R(s)). 2 Proposition 7.2.11 shows that, in the case of functions mapping into M(S), continuity in the graph topology is the same as continuity in the usual norm topology. In the case of unstable plant sequences, however, several strange things can happen. Before illustrating these difficulties, a preliminary result is presented that is often helpful in concluding lack of continuity. Suppose λ → Pλ is a function from a first-countable topological space into M(R(s)), and suppose this function is continuous in the graph topology at λ = λ0 . Let s1 , · · · , sl denote the poles of Pλ0 in the closed RHP. (i) Let be any compact subset of the open RHP such that none of the si lies on the boundary of , and let γ denote the number of poles of Pλ0 inside , counted according to their McMillan degrees. Then there is a neighborhood N of λ0 such that Pλ has exactly γ poles inside for all λ ∈ N. Proposition 7.2.12
3 Strictly speaking, one should examine the convergence of nets rather than sequences. However, it is shown in Section 7.3 that
the graph topology is metrizable and hence first-countable; so it is enough to examine only sequences.
7.2. GRAPH TOPOLOGY
219
(ii) Let be any closed subset of the closed RHP that does not contain s1 , · · · , sl ; if Pλ0 has a pole at infinity, suppose in addition that is bounded. Then Pλ (s) → Pλ0 (s) as λ → λ0 , uniformly for all s ∈ .
In general, the number of closed RHP poles of Pλ and of Pλ0 need not be equal. For example, let = [0, ∞), λ0 = 0 and Pλ (s) = 1/(s + λ). Remarks 7.2.13
Proof. Since the function λ → Pλ is continuous in the graph topology at λ = λ0 , it follows from Corollary 7.2.6 (iii) that there exists a family (Nλ , Dλ ) of r.c.f.’s of Pλ such that Nλ , Dλ are continuous at λ0 . To prove (i), observe that |Dλ | is continuous at λ0 ; hence |Dλ |(s) → |Dλ0 |(s) uniformly in s as λ → λ0 . Since |Dλ | and |Dλ0 | are both analytic in the open RHP, and since |Dλ0 | has no zeros on the boundary of , it follows by applying the principle of the argument to |Dλ | and to |Dλ0 | that, for some neighborhood N of λ0 , the number of zeros of |Dλ | inside is the same for all λ ∈ N. Since the RHP poles of Pλ are in one-to-one correspondence with the zeros of |Dλ |, and since the McMillan degree of an RHP pole of Pλ is precisely its multiplicity as a zero of |Dλ |, (i) follows. To prove (ii), observe that since Dλ (s) → Dλ0 (s) uniformly over the closed RHP, it follows that [Dλ (s)]−1 → [Dλ0 (s)]−1 wherever the latter is well-defined, i.e., wherever |Dλ0 (s)| = 0. Moreover, if we exclude a neighborhood of every closed RHP zero of |Dλ0 |, the convergence is uniform with respect to s. Since Pλ (s) = Nλ (s)[Dλ (s)]−1 , the same is true of Pλ . 2 One of the nice features of the norm topology in M(S) is that it is a product topology: That is, a sequence of matrices {P (i) } in M(S) converges to P ∈ M(S) if and only if each component (i) sequence {Pj k } converges to pj k for all j, k. Consequently, a function λ → Pλ mapping into M(S) is continuous at λ0 if and only if each component of Pλ is continuous at λ0 . Now Proposition 7.2.11 shows that the graph topology and the norm topology coincide on M(S). Hence, when restricted to M(S), the graph topology is also a product topology. However, in the case where a function λ → Pλ maps into unstable multivariable plants, the continuity of the matrix-valued map Pλ at λ0 neither implies nor is implied by the continuity at λ0 of the scalar-valued maps λ → pij λ for all i, j . This is illustrated through some examples. First a scalar example is presented to set the stage. Example 7.2.14
Let = R, λ0 = 0, and consider the function pλ (s) =
s−1−λ . s−1
(7.2.32)
Then at λ = 0 the function pλ has a pole-zero cancellation in the open right half-plane, namely at s = 1. This causes pλ to be discontinuous at λ = 0, as can be seen by the following reasoning. Now p0 = 1, which belongs to S. If pλ were to be continuous at λ = 0, it would follow from Proposition 7.2.11 that pλ ∈ S for all small enough λ. Since clearly pλ ∈ / S ∀λ = 0, the conclusion is that p is not continuous at λ = 0.
220
7. ROBUSTNESS
Example 7.2.15
Again let = R, λ0 = 0 and consider the function P λ(s) =
−λs s−1 2s 2 −1 s 2 −1
s−1−λ s−1 −s 2 +s+1 s 2 −1
.
(7.2.33)
Then, from Example 7.2.14, the (1,2)-component of Pλ is not continuous in the graph topology at λ = 0. Nevertheless, the function λ → Pλ mapping R into M(R(s)) is continuous in the graph topology at λ = 0. To see this, define
z+λ z−1 , Nλ (s) = z−2 1
z−1 z Dλ (s) = , z z−1
(7.2.34)
where z = s/(s + 1) ∈ S.Then it is easy to see that Nλ is continuous at λ = 0, and Dλ is independent of λ. Moreover, Pλ = Nλ Dλ−1 , as is easily checked. Hence, by Corollary 7.2.6 (iii), the function Pλ is continuous in the graph topology at λ = 0.This example shows that the continuity of a matrix-valued function does not imply the continuity of its components.
Example 7.2.16
Once again let = R, λ0 = 0, and consider P λ(s) =
s+1 s−1−λ
Then, P0 (s) =
s+1 s−1
s+1 s−1+λ
s+1 s−1
.
(7.2.35)
.
(7.2.36)
It is immediate from (7.2.35) and (7.2.36) that each component of Pλ is continuous in the graph topology at λ = 0. Nevertheless, Pλ itself is not continuous at λ = 0. To see why this is so, note that P0 has a pole at s = 1 of McMillan degree one, and this is the only pole of P0 . On the other hand, for each nonzero λ ∈ (−1, 1), Pλ , has two poles in the closed RHP, each of McMillan degree one. Since Proposition 7.2.12 (i) fails to hold, the conclusion is that Pλ is not continuous in the graph topology at λ = 0. The next result shows that, while the graph topology may be the “right” topology for studying the robustness of feedback stability, addition and multiplication in R(s) (or M(R(s))) are not necessarily continuous operations in this topology.
Proposition 7.2.17
R(s) equipped with the graph topology is not a topological ring.
7.2. GRAPH TOPOLOGY
221
Proof. It is necessary to demonstrate that either addition or multiplication fails to be continuous. In actuality, both operations are not continuous in general. First, let {εi } be any sequence converging to zero, and let s − 3 − εi , s−2 s−3 p(s) = , s−2
pi (s) =
1 , s−2 1 q(s) = . s−2
qi (s) =
(7.2.37)
Then pi → p, qi → q the graph topology, but (pi + qi )(s) =
s − 2 − εi , s−2
(7.2.38)
does not approach p + q = 1 in the graph topology (see Example 7.2.14). Hence, addition is not continuous in general. An example of the noncontinuity of multiplication is provided by s − 1 − εi , s+1 s−1 p(s) = , s+1
pi (s) =
s+1 , s − 1 + εi s+1 q(s) = . s−1 qi (s) =
(7.2.39)
2
The details are left to the reader. Using entirely analogous reasoning, one can show that the matrix ring R(s)n×n
equipped with
the graph topology fails to be a topological ring for every integer n ≥ 1. Up to now we have seen a spate of negative results, including the fact that the continuity of a matrix-valued function cannot be related in a simple way to the continuity of its components, and that addition and multiplication need not be continuous. These results are presented primarily to impress upon the reader the need to proceed with caution, and to demonstrate that “obvious” statements such as “ lim (Pλ + Qλ ) = lim Pλ + lim Qλ ” need not always hold true. Happily, λ→λ0
λ→λ0
λ→λ0
there are also several positive statements that one can make about the graph topology. Suppose is a first-countable topological space, and that P , Q : → M(R(s)), R : → M(S) satisfy Pλ = Qλ + Rλ ∀λ ∈ . Suppose in addition that R is continuous at λ0 . Then P is continuous at λ0 if and only if Q is continuous at λ0 . Proposition 7.2.18
Remarks 7.2.19
As we have seen in the proof of Proposition 7.2.17, the statement lim Pλ = lim Qλ + lim Rλ ,
λ→λ0
λ→λ0
λ→λ0
(7.2.40)
is not valid in general. Thus, the point of Proposition 7.2.18 is that (7.2.40) is valid provided one of the two limits on the right side of (7.2.40) belongs to M(S).
222
7. ROBUSTNESS
Proof. “if ” Since Q is continuous at λ0 , by Corollary 7.2.6 (iii) it follows that there exists a family (Nλ , Dλ ) of r.c.f.’s of Qλ such that Nλ , Dλ are continuous at λ0 . Since Rλ ∈ M(S), the pair (Nλ + Rλ Dλ , Dλ ) is an r.c.f. of Pλ . Finally, since M(S) is a topological ring, the functions λ → Nλ + Rλ Dλ and λ → Dλ are both continuous at λ0 . This implies, by Corollary 7.2.6, that Pλ is continuous at λ0 . “only if ” Write Qλ = Pλ − Rλ and apply the above reasoning. 2 The multiplicative analog of Proposition 7.2.18 is stated next. Suppose is a first-countable topological space, and that P , Q : → M(R(s)), R : → U(S) satisfy Pλ = Qλ Rλ ∀λ ∈ . Suppose in addition that R is continuous at λ0 . Then P is continuous at λ0 if and only if Q is continuous at λ0 . Proposition 7.2.20
It is left to the reader to prove Proposition 7.2.20, by exploiting the fact that Rλ is unimodular for all λ, and to extend Proposition 7.2.20 to the case where Pλ = Rλ Qλ . The final result along these lines shows that the graph topology on M(R(s)) is a product topology under certain circumstances. Proposition 7.2.21 Suppose P maps a first-countable topological space into M(R(s)), and that P is block-diagonal of the form P = Block Diag {P1 , · · · , Pn } where the dimensions of the blocks are the same for all λ. Then P is continuous at λ0 ∈ if and only if each function Pi is continuous at λ0 .
The proof of this proposition is given in Section 7.3. One can generate some more results by combining the preceding three propositions in various ways. Consider a function λ → Pλ mapping into M(R(s)). In view of Examples 7.2.15 and 7.2.16, one can ask whether there is a simple test to determine whether or not Pλ is continuous at λ0 . Such a test is provided by Theorem 7.2.7, and can be stated as follows: Find any C ∈ S(Pλ0 ); then, Pλ is continuous at λ0 if and only if C robustly stabilizes the family {Pλ }. Of course, this test begs the question, since the issue of continuity arises only in conjunction with the existence of a robustly stabilizing compensator. Thus, the next several propositions are addressed to some very specific situations of practical interest. Proposition 7.2.22
Let = R, λ0 = 0, and suppose Pλ (s) is the transfer matrix of the system x(t) ˙ = Aλ x(t) + Bλ u(t),
y(t) = Cλ x(t) .
(7.2.41)
Suppose the triple (C0 , A0 , B0 ) is stabilizable and detectable, and that (Cλ , Aλ , Bλ ) is continuous in λ at λ = 0 in the usual norm topology on Euclidean space. Then the map λ → Pλ is continuous in the graph topology at λ = 0.
7.2. GRAPH TOPOLOGY
223
Proof. This is a straight-forward application of Lemma 4.2.2. Select a matrix F such that A0 − F C0 is Hurwitz, i.e., such that all eigenvalues of A0 − F C0 have negative real parts. By continuity, Aλ − F Cλ is also Hurwitz for sufficiently small λ. From Lemma 4.2.2, an l.c.f. of Pλ is given by D˜ λ (s) = I − Cλ (sI − Aλ + F Cλ )−1 F , N˜ λ (s) = Cλ (sI − Aλ F Cλ )−1 Bλ ,
(7.2.42) (7.2.43)
for sufficiently small λ. Since D˜ λ , N˜ λ are continuous at λ = 0 in the norm topology on M(S), it follows from Corollary 7.2.6 (iii) that Pλ is continuous in the graph topology at λ = 0. 2 A consequence of Proposition 7.2.22 is that any compensator C that stabilizes the nominal plant P0 , which is described by (7.2.41) with λ = 0, also stabilizes Pλ for all sufficiently small λ; moreover, the closed-loop transfer matrix H (Pλ , C) is continuous at λ = 0. In Proposition 7.2.22, each of the plants Pλ has the same dynamic order (i.e., the same statespace dimension). Proposition 7.2.23 below is addressed to the so-called singular perturbation case, where the plant P0 has a lower dynamic order than the plants Pλ , λ > 0. Proposition 7.2.23 system
Suppose = [0, ∞), and for each λ > 0 let Pλ (s) be the transfer matrix of the
A12 x(t) B + 1 u(t) , A22 z(t) B2 x(t) + D1 u(t) . y(t) = [C1 C2 ] z(t)
x(t) ˙ A11 = A21 λ˙z(t)
(7.2.44) (7.2.45)
Suppose A22 is a Hurwitz (and hence, nonsingular) matrix, and let P0 (s) be the transfer matrix of the system resulting from substituting λ = 0 in (7.2.44) and (7.2.45), namely x(t) ˙ = Ax(t) + Bu(t), y(t) = Cx(t) + D0 u(t) ,
(7.2.46)
−1 A = A11 − A12 A−1 22 A21 , B = B1 − A12 A22 B2 , −1 C = C1 − C2 A−1 22 A21 , D0 = D1 − C2 A22 B2 .
(7.2.47)
where
Suppose the triple (C,A,B) is stabilizable and detectable. Then Pλ is continuous at λ = 0 in the graph topology if C2 (sI − A22 )−1 B2 ≡ 0. Remarks 7.2.24 Equations of the form (7.2.44)–(7.2.45) are usually used to model systems with both fast and slow dynamical variables. The state variables x (resp. z) are referred to as the slow (resp. fast) modes. In this context, P0 represents a simplified model of Pλ obtained by neglecting the fast modes. A relevant question for such systems is the following: When does a compensator designed
224
7. ROBUSTNESS
on the basis of the simplified system P0 work for the true system Pλ ? More precisely, suppose one has found a stabilizing compensator C0 for the simplified system P0 . The issue is whether C0 also stabilizes Pλ for all sufficiently small positive λ, and if so whether H (Pλ , C0 ) approaches H (P0 , C0 ) as λ → 0+ . On the basis of Theorem 7.2.7, we know that this is so if and only if Pλ is continuous in the graph topology at λ = 0. Now Proposition 7.2.23 states that continuity holds if C2 (sI − A22 )−1 B2 ≡ 0. Proposition 7.2.23 has several interesting consequences. For instance, suppose the simplified system P0 is stabilized by state feedback.This corresponds to the case y(t) = x(t) (i.e., C1 = I, C2 = 0), and C0 = K, a constant matrix. Since the requisite condition holds in this case, it follows that the control law u(t) = −Kx(t) also stabilizes Pλ for all sufficiently small positive λ, and that H (Pλ , K) converges to H (P0 , K) as λ → 0+ . Similarly, if the control input u does not directly affect the last modes, (i.e., B2 = 0), then it is again immediate that any compensator that stabilizes the reduced order model also robustly stabilizes the singularly perturbed system for all sufficiently small positive λ. Proof. Suppose C2 (sI − A22 )−1 B2 ≡ 0. The continuity of Pλ at λ = 0 is established using Corollary 7.2.6 (iii), i.e., by constructing a family (D˜ λ , N˜ λ ) of l.c.f.’s of Pλ , λ ≥ 0, such that both D˜ λ and, N˜ λ are continuous at λ = 0 in the norm topology on M(S). Select a matrix F such that A − F C is Hurwitz; this can always be done since (C, A) is detectable. Define D˜ 0 (s) = I − C(sI − A + F C)−1 F , N˜ 0 (s) = C(sI − A + F C)−1 (B − F D0 ) + D0 .
(7.2.48) (7.2.49)
Then, by Theorem 4.2.1, (D˜ 0 , N˜ 0 ) is an l.c.f. of P0 . To obtain an l.c.f. of Pλ , let F0 = [F 0] and define A12 F A11 ¯ Aλ = − [C1 C2 ] A21 /λ A22 /λ 0 A11 − F C1 A12 − F C2 . (7.2.50) = A21 /λ A22 /λ From [15], the eigenvalues of A¯ λ are asymptotically equal to those of A22 /λ plus those of A11 − F C1 − (A12 − F C2 )A−1 22 A21 = A − F C .
(7.2.51)
Hence A¯ λ is Hurwitz for all sufficiently small positive λ. Thus, once again by Theorem 4.2.1, an l.c.f. of Pλ is given by D˜ λ (s) = I − C0 (sI − A¯ λ )−1 F0 , N˜ λ (s) = C0 (sI − A¯ λ )−1 (Bλ − F0 D1 ) + D1 ,
(7.2.52) (7.2.53)
7.2. GRAPH TOPOLOGY
where
B1 , B2 /λ
225
Bλ =
C0 = [C1 C2 ] .
(7.2.54)
The remainder of the proof consists of studying N˜ λ and D˜ λ in detail and showing that ˜ Nλ → N˜ 0 , D˜ λ → D˜ 0 as λ → 0+ . Recall that the inverse of a partitioned matrix is given by
X W
Y V
−1
=
where
−1 −1 −V W −1
−−1 Y V −1 −1 V + V −1 W −1 Y V −1
,
(7.2.55)
= X − Y V −1 W .
(7.2.56)
−1 sI − A11 + F C1 −A12 + F C2 F ˜ Dλ = I − [C1 C2 ] −A21 /λ sI − A22 /λ 0 −1 −1 −1 = I − C 1 F + C2 V W F ,
(7.2.57)
= (sI − A11 + F C1 ) − (A12 − F C2 )(sI − A22 /λ)−1 A21 /λ = (sI − A11 + F C1 ) − (A12 − F C2 )(λsI − A22 )−1 A21 , V = sI − A22 /λ , W = −A21 /λ .
(7.2.58) (7.2.59)
Hence
where
Simplification of (7.2.57) yields where
D˜ λ = I − M−1 F ,
(7.2.60)
M = C1 − C2 V −1 W = C1 + C2 (λsI − A22 )−1 A21 .
(7.2.61)
Now, letting A¯ denote the Hurwitz matrix A − F C, we get ¯ −1 F D˜ λ = I − C(sI − A) ¯ −1 F + C2 A−1 A21 (sI − A) ¯ −1 F . = I − C1 (sI − A) 22
(7.2.62)
Hence, it can be concluded that D˜ λ → D˜ 0 if it can be shown that ¯ −1 , −1 → (sI − A) −1 ¯ −1 . (λsI − A22 ) A21 −1 → −A−1 22 A21 (sI − A)
(7.2.63) (7.2.64)
4 Note that (λsI − A22 )−1 does not approach −A−1 22 . Hence, (7.2.64) does not automatically follow from (7.2.63). 4 Observe that (λsI − A )−1 → 0 as s → ∞ whenever λ > 0. Hence, the norm of the difference between the functions (λsI − 22 −1 A22 )−1 and A−1 22 is at least A22 , no matter how small λ is.
226
7. ROBUSTNESS
To prove (7.2.63), let A¯ 11 denote A11 − F C1 , and note that
where Hence,
−1 = [sI − A¯ 11 − (A12 − F C2 )(λsI − A22 )−1 A21 ]−1 = {sI − A¯ 11 − (A12 − F C2 )A−1 22 A21 −1 − (A12 − F C2 )[(λsI − A22 )−1 − A−1 22 ]A21 } ¯ −1 , = (I − R)−1 (sI − A)
(7.2.65)
¯ −1 . R = (A12 − F C2 )[(λsI − A22 )−1 − A−1 22 ](sI − A)
(7.2.66)
¯ −1 = [(I − R)−1 − I ](sI − A) ¯ −1 . −1 − (sI − A)
(7.2.67)
Now (I − R)(s) approaches I uniformly on every compact subset of C+e as λ → 0, and (sI − ¯ −1 → 0 as |s| → ∞. Hence, −1 → (sI − A) ¯ −1 and (7.2.63) is proved. To establish (7.2.64), A) −1 −1 observe that (λsI − A22 ) approaches −A22 uniformly on all compact subsets of C+e as λ → 0, ¯ −1 are strictly proper. Thus (7.2.64) follows from (7.2.63) plus the and that both −1 and (sI − A) strict properness of the quantities on both sides of (7.2.63). Finally, (7.2.63) and (7.2.64) together imply that D˜ λ → D˜ 0 as λ → 0. Next, we examine the “numerator” matrix N˜ λ . From (7.2.53) sI − A11 + F C1 −A12 + F C2 B1 − F D1 + D1 . (7.2.68) N˜ λ = [C1 C2 ] −A21 /λ sI − A22 /λ B2 /λ Expanding the above equation results in an expression for N˜ λ as a sum of five terms the first four of which arise from the matrix product and the last is D1 . From (7.2.55), the first of these terms is ¯ −1 (B1 − F D1 ). C1 −1 (B1 − F D1 ). Now (7.2.63) implies that this term converges to C1 (sI − A) The second term in the expansion is − C2 V −1 W −1 (B1 − F D1 ) = C2 (sI − A22 /λ)−1 (A21 /λ)−1 (B1 − F D1 ) = C2 (λsI − A22 )−1 A21 −1 (B1 − F D1 ) .
(7.2.69)
From reasoning analogous to that used to establish (7.2.64), it follows that this term converges to −1 −C2 A−1 22 A21 (sI − A22 ) (B1 − F D1 ). The third term is − C1 −1 Y V −1 B2 /λ = C1 −1 (A12 − F C2 )(sI − A22 /λ)−1 B2 /λ = C1 −1 (A12 − F C2 )(λsI − A22 )−1 B2 .
(7.2.70)
As before, this converges to ¯ −1 (A12 − F C2 )A−1 B2 . −C1 (sI − A) 22
(7.2.71)
7.2. GRAPH TOPOLOGY
However, note that
C2 A−1 22 B2
227
= 0. Hence the above limit simplifies to ¯ −1 A12 A−1 B2 . −C1 (sI − A) 22
(7.2.72)
The fourth term is the most complicated one; it equals C2 V −1 (I + W −1 Y V −1 )B2 /λ = C2 (sI − A22 /λ)−1 · I + Aλ21 −1 (A12 − F C2 ) sI −
−1 A22 λ − A22 )−1 ]B2
·
= C2 (λsI − A22 )−1 [I + A22 −1 (A12 − F C2 )(λsI = C2 (λsI − A22 )−1 B2 + C2 (λsI − A22 )−1 A21 −1 · (A12 − F C2 )(λsI − A22 )−1 B2 .
B2 λ
(7.2.73)
Now the first term on the right side is identically zero by hypothesis, while the second term is strictly −1 ¯ −1 proper. As a result, the second term converges without difficulty to C2 A−1 22 A21 (sI − A) A12 A22 B2 . Of course the fifth and final term is the constant matrix D1 . Combining the five limits, we see that, N˜ λ approaches −1 ¯ −1 (C1 − C2 A−1 22 A21 )(sI − A) (B1 − F D1 − A12 A22 B2 ) + D1 .
(7.2.74)
Finally, using the relations (7.2.47), and noting that D1 = D0 , since C2 A−1 22 B2 = 0, the above limit simplifies to ¯ −1 (B − F D0 ) + D0 = N˜ 0 . (7.2.75) C(sI − A) From Corollary 7.2.6, it now follows that Pλ converges to P0 in the graph topology.
2
While the result contained in Proposition 7.2.23 is interesting, its use is limited by the fact that the condition it requires is quite restrictive. The origin of the restrictiveness is the definition of robust stabilizability. Recall that a controller C ∈ S(P0 ) was not only required to stabilize Pλ for all sufficiently small positive λ, but the resulting stable family of transfer matrices H (Pλ , C) was required to converge uniformly over C+e to H (P0 , C) as λ → 0. The next proposition is devoted to a study of the situation where a weaker form of convergence of the family H (Pλ , C) is acceptable. Consider the family {Pλ } defined in (7.2.44)–(7.2.45). Suppose C ∈ (P0 ) is strictly proper. Then C ∈ S(Pλ ) for all sufficiently small positive λ. Moreover, [H (Pλ , C)](s) converges to [H (P0 , C)](s) uniformly over all compact subsets of C+e as λ → 0. Proposition 7.2.25
Proof. In the interests of brevity, let us say that a family {Fλ } converges weakly to F ∈ M(S) if Fλ (s) → F (s) uniformly over all compact subsets of C+e as λ → 0. In contrast, we say simply that {Fλ } converges to F if the convergence of Fλ (s) to F (s) is uniform over the entire C+e . Also note that, if {Fλ } converges weakly to F and {Gλ } converges to G, then the product {Gλ Fλ } converges to GF provided G is strictly proper.
228
7. ROBUSTNESS
Since C ∈ S(P0 ), it has an r.c.f. (Nc , Dc ) such that N˜ 0 Nc + D˜ 0 Dc = I . Moreover, since C is strictly causal, so is Nc . Now consider the return difference matrix λ = N˜ λ Nc + D˜ λ Dc .
(7.2.76)
In the proof of Proposition 7.2.23, it was shown that {N˜ λ } converges to N˜ 0 provided C2 (sI − A22 )−1 B2 ≡ 0. However, from (7.2.73) one can easily verify that, even if the above condition is not satisfied, the family {N˜ λ } still converges weakly to N˜ 0 . Since Nc is strictly proper, it follows that N˜ λ Nc → N˜ 0 Nc . Also D˜ λ → D˜ 0 , since the condition was not used in the proof of this result (see (7.2.63)–(7.2.67)). Hence, λ → N˜ 0 Nc + D˜ 0 Dc = I , which implies that λ is unimodular for sufficiently small (positive) λ. By Theorem 5.1.6, this means that C ∈ S(Pλ ) for small enough λ. Next, the twin formula to (5.1.21) shows that ˜ λ −Dc −1 N˜ λ D Dc −1 λ λ H (Pλ , C) = . (7.2.77) Nc −1 D˜ λ I − Nc −1 N˜ λ λ
λ
˜ All terms except −Dc −1 λ Nλ , converge to their corresponding values when λ = 0. But this term only converges weakly to −Dc N˜ 0 , since neither Dc , nor the limit of −1 λ is strictly proper. However, this is enough to conclude that {H (Pλ , C)} converges weakly to H (P0 , C). 2 Note that the set of all strictly proper compensators in S(P0 ) can be parametrized using Proposition 5.2.10. If C ∈ S(P0 ) is not strictly proper, then it is possible to construct a family {Pλ } such that (7.2.47) holds, but C ∈ / S(Pλ ) for all λ > 0. In other words, if P0 is stabilized using a nonstrictly proper compensator, then there always exists a singular perturbation of P0 that is not stabilized by C for each λ > 0. In closing, let us discuss the graph topology in the case where the set of “stable” transfer matrices is something other than the set S. A careful examination of the various proofs in this section reveals that the actual nature of the elements of the set S was not used anywhere: Rather, the pertinent properties of S were (i) the set of units in S is open and (ii) inversion is continuous. Thus, the graph topology can be defined over the set of fractions associated with an arbitrary ring R, provided only that R has the two properties mentioned above. In particular, the graph topology for discrete-time systems is self-evident, as it is in the case where the set of desirable transfer functions is taken to be the set SD where D is some prespecified region in the complex plane. Finally, note that in [114] a so-called “gap metric” is defined which is nothing but a metrization of the graph topology corresponding to a particular choice of “stable” transfer functions, namely the set of bounded linear operators mapping the Hilbert space H2 into itself. Such a choice of “stable” transfer functions takes into account the fact that BIBO-stability is a desirable attribute of a system, but does not take into account the requirement that a stable system should be causal in order to be physically useful.
7.3. GRAPH METRIC
7.3
229
GRAPH METRIC
The previous section discusses the qualitative aspects of the robustness of feedback stability. Specifically, the set M(R(s)) of all (possibly unstable) plants is topologized by means of the graph topology, and it is shown that continuity in the graph topology is a necessary and sufficient condition for feedback stability to be robust (see Theorem 7.2.7). In order to obtain quantitative estimates of robustness, it is essential to metrize the graph topology, i.e., to define a metric on M(R(s)) such that convergence (and hence, continuity) in this metric is equivalent to convergence in the graph topology. Such a metric, called the graph metric, is defined in this section. Its applications to deriving quantitative robustness estimates are given in the next section. At this stage, it should be noted that the graph metric presented in this section is only one possible means of metrizing the graph topology. It is undoubtedly possible to define alternate metrics on M(R(s)) that induce exactly the same topology. Each of these metrics would lead to a different quantitative estimate for the robustness of feedback stability, but would leave the qualitative result of Theorem 7.2.7 unchanged. Thus, in some sense the graph topology is more fundamental than the graph metric, defined below. The graph metric is defined using the concept of a normalized r.c.f. of a rational matrix, and this is introduced first. Suppose P ∈ M(R(s)). An r.c.f. (N, D) of P is normalized if N ∗ (s)N (s) + D ∗ (s)D(s) = I, ∀s ,
(7.3.1)
where N ∗ (s) = N (−s). Lemma 7.3.1 Every P ∈ M(R(s)) has a normalized r.c.f. which is unique to within right multiplication by an orthogonal matrix.
Proof. Let (N1 , D1 ) be any r.c.f. of P , and define (s) = N1∗ (s)N1 (s) + D1∗ (s)D1 (s) .
(7.3.2)
Then ∗ (s) = (s) ∀s. Moreover, by the right-coprimeness of N1 and D1 , there exist positive constants α and β such that αI ≤ (j ω) ≤ βI ∀ω. Hence, by [3, p. 240], one can do a spectral factorization of as (s) = R ∗ (s)R(s), where R ∈ U(S). Now let N = N1 R −1 , D = D1 R −1 . Then (N, D) is also an r.c.f. of P by Theorem 4.1.13, and (7.3.1) holds. To prove the assertion regarding uniqueness, let (N2 , D2 ) be another normalized r.c.f. of P . Then by definition N2∗ (s)N2 (s) + D2∗ (s)D2 (s) = I, ∀s . (7.3.3) Moreover, by Theorem 4.1.13, there is a unimodular matrix U ∈ U(S) such that N2 = NU, D2 = DU . Substituting this into (7.3.3) and using (7.3.1) shows that U ∗ (s)U (s) = I . In other words, U is a spectral factor of the identity matrix. Hence, U (s) must be constant as a function of s and must equal an orthogonal matrix. 2
230
7. ROBUSTNESS
Suppose P1 , P2 ∈ M(R(s)) have the same dimensions, and let (Ni , Di ) be a normalized r.c.f. of Pi for i = 1, 2. Let Di , i = 1, 2 , (7.3.4) Ai = Ni Definition 7.3.2
and define δ(P1 , P2 ) =
inf
U ∈M(S),U ≤1
A1 − A2 U ,
d(P1 , P2 ) = max{δ(P1 , P2 ), δ(P2 , P1 )} .
(7.3.5) (7.3.6)
Then d(P1 , P2 ) is the graph metric on M(R(s)). It is left to the reader to verify that d(P1 , P2 ) is a well-defined quantity depending only on P1 and P2 , even though A1 and A2 are only unique to within right multiplication by an orthogonal matrix. The reason is that multiplication by an orthogonal matrix does not change the norm. Lemma 7.3.3
d is a metric on M(R(s)) assuming values in the interval [0, 1].
Proof. If U = 0, then A1 − A2 U = A1 = 1. Hence, δ(P1 , P2 ) ≤ 1 and d(P1 , P2 ) ≤ 1 for all plants P1 , P2 . In proving that d is a metric, the only nonobvious part is the proof of the triangle inequality. Accordingly, let P1 , P2 , P3 ∈ M(R(s)), and select U, V ∈ M(S) such that U ≤ 1, V ≤ 1, and A1 − A2 U ≤ δ(P1 , P2 ) + ε , A2 − A3 V ≤ δ(P2 , P3 ) + ε ,
(7.3.7) (7.3.8)
where ε is some positive number. Then V U ∈ M(S) and V U ≤ 1; moreover, A1 − A3 V U ≤ A1 − A2 U + A2 U − A3 V U ≤ A1 − A2 U + A2 − A3 V U ≤ δ(P1 , P2 ) + δ(P2 , P3 ) + 2ε
(7.3.9)
where in the last step we use the fact that U ≤ 1. Now (7.3.9) shows that δ(P1 , P3 ) ≤ δ(P1 , P2 ) + δ(P2 , P3 ) + 2ε .
(7.3.10)
Since (7.3.10) holds for every ε > 0, it follows that δ(P1 , P3 ) ≤ δ(P1 , P2 ) + δ(P2 , P3 ) .
(7.3.11)
δ(P3 , P1 ) ≤ δ(P3 , P2 ) + δ(P2 , P1 ) .
(7.3.12)
By symmetry, The triangle inequality for d now follows from (7.3.6).
2
7.3. GRAPH METRIC
231
The main result of this section is Theorem 7.3.5 which states that the topology on M(R(s)) induced by the graph metric is the same as the graph topology. The following technical result is needed to prove this theorem. Lemma 7.3.4 Suppose P1 , P2 ∈ M(R(s)) have the same dimensions, and let A1 , A2 be associated matrices in M(S) obtained from their normalized r.c.f.’s. Suppose U, V ∈ M(S) satisfy U ≤ 1, and A1 − A2 U + A2 − A1 V < 1. Then U and V are unimodular.
Proof. Note that, because of (7.3.1), we have A1 X = A2 X = X ∀X ∈ M(S) (in other words, multiplication by Ai preserves norms). Hence, I − V U = A1 − A1 V U ≤ A1 − A2 U + A2 U − A1 V U ≤ A1 − A2 U + A2 − A1 V < 1 .
(7.3.13)
Hence, V U ∈ U(S) by Lemma 7.2.5, and V , U ∈ U(S) as a consequence.
2
A sequence {Pi } in M(R(s)) converges to P ∈ M(R(s)) in the graph topology if and only if d(Pi , P ) → 0.
Theorem 7.3.5
Proof. “if ” Suppose d(Pi , P ) → 0. Let A, Ai be matrices of the form (7.3.4) obtained from normalized r.c.f.’s of P , Pi , respectively. Then for each ε in the interval (0, 1/4) there exists an integer n such that d(Pi , P ) ≤ ε ∀i ≥ n. For each i, there exists Ui , Vi in the unit ball of M(S) such that A − Ai Ui ≤ d(Pi , P ) + ε, Ai − AVi ≤ d(Pi , P ) + ε. Hence, for all i ≥ n, we have A − Ai Ui + Ai − AVi ≤ 4ε < 1 .
(7.3.14)
By Lemma 7.3.4, this implies that Ui , Vi are unimodular.Thus, if A = [D N ] and Ai = [Di Ni ] , then (Ni Ui , Di Ui ) is an r.c.f. of Pi and A − Ai Ui ≤ 2ε ∀i ≥ n. It now follows from Lemma 7.2.5 that Pi → P in the graph topology. “only if ” Suppose Pi → P in the graph topology, and let (N, D) be a normalized r.c.f. of P . Then by Lemma 7.2.5 there exist r.c.f.’s (Ni , Di ) of Pi such that Ni → N, Di → D. However, (Ni , Di ) need not be normalized. Let Mi = [Di Ni ] , and suppose Mi = Ai Ri where Ai corresponds to a normalized r.c.f. of Pi and Ri ∈ U(S). Now Mi − A → 0 and Mi = Ai Ri = Ri since Ai is an isometry. Hence, Ri → 1. Define Ui = Ri /Ri . Then Ui = 1; moreover, A − Ai Ui ≤ A − Ai Ri + Ai Ri − Ai Ui = A − Mi + Ri − Ui = A − Mi + |1 − Ri | → 0 as i → ∞ .
(7.3.15)
232
7. ROBUSTNESS
This shows that δ(P , Pi ) → 0. To prove that δ(Pi , P ) → 0, it is necessary to estimate Ri−1 . Let γi = A − Mi . Then for all vectors x ∈ M(L2 ), we have Mi x ≥ Ax − (A − Mi )x ≥ (1 − γi )x. Since x = Ai x = Mi Ri−1 x ≥ (1 − γi )Ri−1 x, it follows that Ri−1 ≤ 1/(1 − γi ) whenever γi < 1. In particular, Ri−1 → 1 as i → ∞. Let Vi = Ri−1 /Ri−1 . Then, as in (7.3.15), Vi = 1 and Ai − AVi = Mi Ri−1 − AVi ≤ Mi Ri−1 − ARi−1 + ARi−1 − AVi ≤ Mi − ARi−1 + Ri−1 − Vi → 0 as i → ∞ .
(7.3.16)
2
Thus δ(Pi , P ) → 0.
At present, the graph metric distance between two plants cannot be computed exactly, since the problem of determining the infimum in (7.3.5) is still open. However, it is possible to give upper and lower bounds.
Suppose (N1 , D1 ) is a normalized r.c.f. of P1 , (N2 , D2 ) is a (not necessarily normalized) r.c.f. of P2 , and let A1 = [D1 N1 ] , M2 = [D2 N2 ] . Suppose A1 − M2 =: γ < 1. Then Lemma 7.3.6
d(P1 , P2 ) ≤
2γ . 1−γ
(7.3.17)
Proof. Suppose M2 = A2 R2 where A2 corresponds to a normalized r.c.f. of P2 and R2 ∈ U(S). Now M2 ≤ A1 + M2 − A1 = 1 + γ . Hence, R2 = M2 ≤ 1 + γ and R2−1 ≤ 1/(1 − γ ), as in the proof of Theorem 7.3.5. Moreover, (7.3.15) and (7.3.16) imply that δ(P1 , P2 ) ≤ 2γ , δ(P2 , P1 ) ≤ 2γ /(1 − γ ) , d(P1 , P2 ) = max{δ(P1 , P2 ), δ(P2 , P1 )} ≤ 2γ /(1 − γ ) ,
(7.3.18) (7.3.19) (7.3.20)
2
This completes the proof. To compute a lower bound on the graph metric distance between two plants, note that δ(P1 , P2 ) ≥
inf
R∈M(S)
A1 − A2 R .
This can be computed using the methods of Section 6.6.
(7.3.21)
7.3. GRAPH METRIC
233
We are now in a position to undertake a proof of Proposition 7.2.21. For ease of reference it is restated here in an equivalent form. Proposition 7.3.7 Suppose {Pi } is a sequence in M(R(s)), P ∈ M(R(s)), and suppose Pi , P are block-diagonal in the form
Pi = P =
Block Diag {Pi1 , · · · , Pil } , Block Diag {P¯1 , · · · , P¯l } ,
(7.3.22) (7.3.23)
where the partitioning is commensurate. Then d(Pi , P ) → 0 if and only if d(Pij , P¯j ) → 0 as i → ∞, for each j in {1, · · · , l}. Remarks 7.3.8
Proposition 7.3.7 is equivalent to Proposition 7.2.21 in view of Theorem 7.3.5.
Proof. It is enough to prove the proposition for the case where l = 2, as the general case will then follow by induction. Accordingly, let us change notation, and suppose we are given a sequence {Pi } of block-diagonal matrices and a block-diagonal matrix P having the form Qi 0 Q 0 Pi = , P = . (7.3.24) 0 Ri 0 R The claim is that d(Pi , P ) → 0 if and only if d(Qi , Q) → 0 and d(Ri , R) → 0. “if ” Suppose d(Qi , Q) → 0, d(Ri , R) → 0. Then, by Theorem 7.3.5, {Qi } converges to Q and {Ri } converges to R in the graph topology. This implies, by Lemma 7.2.5, that there are r.c.f.’s (NQi , DQi ), (NRi , DRi ), (NQ , DQ ), (NR , DR ) of Qi , Ri , Q, R such that (NQi , DQi ) → (NQ , DQ ), (NRi , DRi ) → (NR , DR ) Now define 0 0 NQi DQi Ni = , Di = . (7.3.25) 0 NR i 0 DR i Then clearly (Ni , Di ) is an r.c.f. of Pi , (N, D) is an r.c.f. of P , and (Ni , Di ) → (N, D) (because the norm topology on M(S) is a product topology). Again by Lemma 7.2.5, {Pi } converges to P in the graph topology, so that d(Pi , P ) → 0 by Theorem 7.3.5. “only if ” Suppose d(Pi , P ) → 0. Then {Pi } converges to P in the graph topology. Let (NQ , DQ ), (NR , DR ) be normalized r.c.f.’s of Q, R respectively. Then (N, D) as defined in (7.3.26) is easily shown to be a normalized r.c.f. of P . Similarly, let (NQi , DQi ), (NRi , DRi ) be normalized r.c.f.’s of Qi , Ri , respectively. Then (Ni , Di ) as defined in (7.3.25) is a normalized r.c.f. of Pi . By hypothesis, there is a sequence {Ui } of unimodular matrices such that ⎡ ⎤ ⎡ ⎤ 0 DQi DQ 0 ⎢ 0 ⎢ ⎥ DR i ⎥ ⎢ ⎥ U1i U2i → ⎢ 0 DR ⎥ , (7.3.26) ⎣ NQ ⎦ ⎣ 0 U3i U4i NQ 0 ⎦ i 0 NR i 0 NR
234
7. ROBUSTNESS
where Ui is partitioned in the obvious way. Since the topology on M(S) is a product topology, each block in the partitioned matrix on the left side of (7.3.26) converges to the corresponding block on the right side of (7.3.26). In particular, 0 D Qi , U2i → NQi 0
DRi 0 . U3i → NR i 0
(7.3.27)
Since (NQi , DQi ), (NRi , DRi ) are both normalized r.c.f.’s, (7.3.27) implies that U2i → 0, U3i → 0. Let Ai (resp. A) denote the block 4 × 2 matrix on the left (resp. right) side of (7.3.26). Since U2i → 0, U3i → 0, it follows that 0 U2i →0. (7.3.28) Ai U3i 0 Combining (7.3.26) and (7.3.28) gives Ai U¯ i → A , where
U1i ¯ Ui = 0
0 U4i
(7.3.29) .
(7.3.30)
Now (7.3.29) shows that U¯ i → 1. Hence, if we define Wi = Ui /Ui , then (7.3.29) shows that A − Ai Wi → 0. On the other hand, since d(Pi , P ) → 0, there exists a sequence {Vi } of matrices in the unit ball of M(S) such that Ai − AVi → 0.Thus A − Ai Wi + Ai − AVi < 1 for large enough i, and Wi = 1. Hence by Lemma 7.3.4, both Wi and Vi are unimodular for large enough i. In particular, U1i and U4i are both unimodular for large enough i. Hence, (NQi U1i , DQi U1i ) is an r.c.f. of Qi , and (NRi U4i , DRi U4i ) is an r.c.f. of Ri . Since (7.3.29) implies that DQ D Qi U1i → , NQ i NQ
DR DRi U4i → . NR i NR
(7.3.31)
It follows from Lemma 7.2.5 that Qi → Q, Ri → R in the graph topology. Now Theorem 7.3.5 shows that d(Qi , Q) → 0, d(Ri , R) → 0. 2 This section is concluded with a proof of Theorem 7.2.8. It is restated here in an equivalent form for ease of reference. Proposition 7.3.9 Suppose {Pi }, {Ci } are sequences in M(R(s)), and that P0 , C0 ∈ M(R(s)). Then, the following statements are equivalent: (i) d(Pi , P0 ) → 0, d(Ci , C0 ) → 0 as i → ∞. (ii) d(Hi , H0 ) → 0 as i → ∞, where Hi = H (Pi , Ci ), H0 = H (P0 , C0 ).
7.3. GRAPH METRIC
235
If the pair (P0 , C0 ) is stable, then the above result reduces to Theorem 7.2.8. In general, it shows that the map from the plant-compensator pair to the closed-loop transfer matrix is continuous if all spaces are equipped with the graph topology, even if the pair (P0 , C0 ) is not stable. Remarks 7.3.10
Proof. Let
G0 =
C0 0
0 P0
,
Gi =
Ci 0
0 Pi
.
(7.3.32)
Then Proposition 7.3.7 shows that (i) is equivalent to the following statement: (iii) d(Gi , G0 ) → 0 as i → ∞. Thus the statement to be proved is: d(Gi , G0 ) → 0 if and only if d(Hi , H0 ) → 0. Define 0 I F = . (7.3.33) −I 0 Then (cf. Section 5.1) Hi = (I + F Gi )−1 ,
H0 = (I + F G0 )−1 .
(7.3.34)
Suppose (Ni , Di ) is an r.c.f. of Gi . Then Hi = Di (Di + F Ni )−1 , from (7.3.34). Moreover, (Di , Di + F Ni ) is an r.c.f. of Hi , since Di + F Ni I F Di = , (7.3.35) Di I 0 Ni and the matrix on the right side of (7.3.35) is unimodular, with the inverse 0 I . −F F (Note that F −1 = −F .) Conversely, suppose (Bi , Ai ) is an r.c.f. of Hi . Then Di 0 I Ai = , Ni −F F Bi
(7.3.36)
(7.3.37)
is an r.c.f. of Gi . Similar remarks apply to G0 and H0 . The statement to be proved is that d(Gi , G0 ) → 0 if and only if d(Hi , H0 ) → 0. Suppose first that d(Gi , G0 ) → 0. Then, by Theorem 7.3.5, {Gi } converges to G0 in the graph topology. Hence, according to Lemma 7.2.5, there exist r.c.f.’s (Ni , Di ) of Gi and (N0 , D0 ) of G0 such that Ni → N0 , Di → D0 . Now define Ai I F Di A0 I F D0 = , = . (7.3.38) Bi B0 I 0 Ni I 0 N0 Then (Bi , Ai ) is an r.c.f. of Hi , (B0 , A0 ) is an r.c.f. of H0 , and Ai → A0 , Bi → B0 . Hence, {Hi } converges to H0 in the graph topology and d(Hi , H0 ) → 0. The reverse implication follows by reversing the above steps. 2
236
7. ROBUSTNESS
We conclude this section with a discussion of the graph metric in the case where the set of “stable” transfer functions is something other than the set S. First of all, the graph metric extends almost effortlessly to linear distributed systems. The central step in defining a graph metric is the existence of a normalized r.c.f. In the case of distributed systems, the existence of such a factorization is assured by Beurling’s theorem in the scalar case and the Beurling-Lax theorem in the multivariable case, which states that a closed shift-invariant subspace of Hn2 is the image of an inner operator. Thus, even in the case of distributed systems, if a plant has an r.c.f., then it has a normalized r.c.f. Of course, not all distributed systems have r.c.f.’s (see Section 8.1). Thus, the graph metric can only be defined between pairs of plants when each has an r.c.f. Next, in the case of discrete-time systems, the arguments given here carry over without any major modifications: One only has to replace the extended closed right half-plane by the closed unit disc. Finally, if the set of “stable” transfer functions is a set of the form SD where D is some prespecified region of the complex plane, or its discrete analog, then the graph metric can be defined by first conformally mapping D into C+e (or the closed unit disc in the discrete-time case). This of course requires that the region D be simply connected.
7.4
ROBUSTNESS CONDITIONS
In this section, the focus is on designing stabilizing compensators for imprecisely known plants. Thus, a nominal plant description is available, together with a description of the plant uncertainty, and the objective is to design a compensator that stabilizes all plants lying within the specified band of uncertainty. The types of plant uncertainty considered include additive perturbations, multiplicative perturbations, stable-factor perturbations, and graph-metric perturbations. The solution of the design problem is divided into two steps: The first step is the derivation of necessary and sufficient conditions that any compensator must satisfy in order to be a solution of the design problem. The second step is a further examination of these conditions to see when there actually exists a compensator satisfying these conditions.This leads naturally to the notion of optimally robust compensators. This section concludes with a discussion of the case where there are uncertainties in both the plant and compensator. Throughout this section, several models of plant uncertainty are used; these are described next. Suppose P0 is a given nominal plant and that r ∈ S is a given function. An additive perturbation of P0 is only defined in the case where P0 has no poles on the imaginary axis. The class A(P0 , r) consists of all P ∈ M(R(s)) that have the same number of open RHP poles as P0 (counted according to McMillan degree), and satisfy P (j ω) − P0 (j ω) < |r(j ω)| ∀ω ∈ R .
(7.4.1)
In particular, (7.4.1) implies that if P ∈ A(P0 , r), then P also has no poles on the j ω-axis. However, P − P0 need not be analytic in the open RHP, because even though P and P0 have the same number of open RHP poles, they could be at different locations. Hence, (7.4.1) does not imply that P (s) − P0 (s) < |r(s)| ∀s ∈ C+e . The assumption that r ∈ S is not too restrictive. Suppose it is
7.4. ROBUSTNESS CONDITIONS
237
known that P has the same number of RHP poles as P0 , and that P (j ω) − P0 (j ω) < |f (j ω)| where f is some proper rational function. Then it is always possible to find a stable proper rational function r such that r(j ω) = f (j ω) ∀ω. In fact, one can assume without loss of generality that r is outer. The class of multiplicative perturbations of P0 is also defined under the restriction that P0 has no j ω-axis poles. The class M(P0 , r) consists of all P ∈ M(R(s)) that have the same number of open RHP poles as P0 and satisfy P (s) = (I + M(s))P0 (s) , M ∈ M(R(s)) , M(j ω) < |r(j ω)| ∀ω ∈ R .
where (7.4.2)
Finally, the class of stable-factor perturbations is specified not just in terms of the nominal plant description P0 , but in terms of a particular r.c.f. (N0 , D0 ) of P0 and a function r ∈ S. The class S(N0 , D0 , r) consists of all plants P ∈ M(R(s)) satisfying P (s) = N(s)[D(s)]−1 , where N(s) − N0 (s) D(s) − D0 (s) < |r(s)| ∀s ∈ C+e .
(7.4.3)
The rationale behind the above three models of plant uncertainty is fairly clear. In the case of both A(P0 , r) and M(P0 , r), the perturbed plant P has the same number of open RHP poles as the nominal plant P0 (though possibly at different locations). In the case of A(P0 , r), |r(j ω)| provides an upper bound on the absolute magnitude of the plant uncertainty at the frequency ω, while in the case of M(P0 , r), |r(j ω)| provides an upper bound on the relative magnitude of uncertainty at the frequency ω. The class of stable-factor perturbations is free from the restructive assumption that the perturbed plant and the unperturbed plant have the same number of RHP poles. As well, P0 is permitted to have j ω-axis poles, as is the perturbed plant P . Using (7.4.3) in conjunction with the principle of the argument over various subregions of C+ , one can localize regions in C+ where a P ∈ S(N0 , D0 , r) can have poles. Theorems 7.4.1 and 7.4.2 below give the main robustness results in the case of additive and multiplicative perturbations, respectively. Suppose P ∈ S(N0 , D0 , r), r ∈ M(R(s)) are specified, and suppose C stabilizes P0 . Then C stabilizes all P in the class A(P0 , r) if and only if5
Theorem 7.4.1
[C(I + P0 C)−1 ](j ω) · |r(j ω)| ≤ 1 ∀ω ∈ R .
(7.4.4)
Theorem 7.4.2 Suppose P0 ∈ M(P0 , r), r ∈ M(R(s)) are specified, and suppose C stabilizes P0 . Then C stabilizes all P in the class M(P0 , r) if and only if
[P0 C(I + P0 C)−1 ](j ω) · |r(j ω)| ≤ 1 ∀ω ∈ R . 5 Recall that the norm M of a matrix M equals its largest singular value.
(7.4.5)
238
7. ROBUSTNESS
The proofs of both theorems make use of a graphical criterion for feedback stability that generalizes the Nyquist criterion. As it is of independent interest, it is stated separately. Lemma 7.4.3 Suppose P , C ∈ M(R(s)) have γp , γc poles in the open RHP, respectively, counted according to McMillan degree, and none on the j ω-axis. Then C stabilizes P if and only if the plot of |(I + P C)(j ω)| as ω decreases from ∞ to −∞ does not pass through the origin, and encircles the origin γp + γc times in the clockwise sense.
A little notation and an appeal to the principle of the argument simplifies the proof of Lemma 7.4.3. Suppose f is a rational function that has neither poles nor zeros on the extended j ω-axis. Then, as ω decreases from ∞ to −∞, the graph of f (j ω) traces out a closed curve in the complex plane. Let 1 ind f = lim [arg f (−j ω) − arg f (j ω)] , (7.4.6) 2π ω→∞ denote the index of this closed curve with respect to the origin. In more prosaic terms, ind f is the number of times that the plot of f (j ω) encircles the origin in the counter-clockwise sense as ω decreases from ∞ to −∞. The principle of the argument states that ind f = no. of zeros of f in C+ − no. of poles of f in C+ .
(7.4.7)
Since arg ab = arg a + arg b for any two nonzero complex numbers a and b, it follows that ind fg = ind f + ind g for any two appropriate rational functions f and g. Proof of Lemma 7.4.3. Let (Np , Dp ), (D˜ c , N˜c ) denote any r.c.f. and any l.c.f. of P and C, respectively. Then, by Theorem 5.1.6, C stabilizes P if and only if D˜ c Dp + N˜c Np is unimodular, or equivalently, f = |D˜ c Dp , +N˜ c Np | is a unit of S. Using the principle of the argument, this condition can be expressed in terms of the behavior of f (j ω), and ultimately, |(I + P C)(j ω)|. Note that f = |D˜ c Dp + N˜ c Np | = |D˜ c | · |I + CP | · |Dp | = |D˜ c | · |I + P C| · |Dp |
(7.4.8)
since |I + CP | = |I + P C|. If |(I + P C)(j ω)| = 0 for some ω (finite or infinite), it is immediate that f is not a unit. So suppose that |(I + P C)(j ω)| = 0 ∀ω and at infinity. Then ind f is welldefined and (7.4.9) ind f = ind |D˜ c | + ind |I + P C| + ind |Dp | . Now for a function g ∈ S with no j ω-axis zeros, ind g equals its degree δ(g) defined in (2.1.13). Moreover, g is a unit if and only if δ(g) = 0. Finally, from Theorem 4.3.12 and (7.4.7), it follows
7.4. ROBUSTNESS CONDITIONS
239
that ind |D˜ c | = γc , ind |Dp | = γp . Combining this reasoning with (7.4.9) shows that f is a unit if and only if ind |I + P C| = −(γc + γp ) . (7.4.10)
2
This completes the proof.
If P and/or C has j ω-axis poles, Lemma 7.4.3 has to be modified by replacing j ω-axis throughout by indented j ω-axis. This modification is standard (see e.g., [26, Sec.IV.5]). Proof of Theorem 7.4.1. For the sake of simplicity, it is assumed that C has no j ω-axis poles. The modifications required to handle the case where C has j ω-axis poles are obvious and left to the reader. “only if ” It is shown that if (7.4.4) does not hold, then there exists a P in the class A(P0 , r) that is not stabilized by C. Accordingly, suppose (7.4.4) is not true, and let F denote C(I + P0 C)−1 .Then either F (∞) · |r(∞)| > 1, or else F (j ω0 ) · |r(j ω0 )| > 1 for some finite ω0 . Let Re denote the extended real line, i.e., the set R ∪ {∞}. Then the hypothesis is that F (j ω0 ) · |r(j ω0 )| > 1 for some ω0 ∈ Re . Let F denote F (j ω0 ), let σ1 , · · · , σs denote the nonzero singular values of F , and let denote the diagonal matrix = Diag {σ1 , · · · , σs }. Select unitary matrices U, V such that 0 U F (j ω0 )V = , (7.4.11) 0 0 Note that σ1 · |r(j ω0 )| > 1 by assumption. There are now two cases to consider. Case (i) ω0 = 0 or ∞. In this case F is a real matrix, so that U and V can also be chosen to be real (and hence orthogonal), and r(j ω0 ) is real. Now define Q ∈ M(S) by r(s) 1/σ1 0 U Q(s) = −V , (7.4.12) 0 0 r(j ω0 ) and let P = P0 + Q. Then, since Q is stable, P has the same number of (and in fact the same) RHP poles as P0 . Moreover, Q(j ω) =
|r(j ω)| < |r(j ω)| ∀ω ∈ R , σ1 |r(j ω0 )|
(7.4.13)
since σ1 |r(j ω0 )| > 1. Hence P belongs to the class A(P0 , r). However, it is claimed that C does not stabilize P . To prove this, it is enough, in view of Lemma 7.4.3, to show that |(I + P C)(j ω0 )| = 0. Let v1 denote the first column of V , and note that V v1 = e1 = [1 0 · · · 0] because V is orthogonal. Hence, (I + QF )(j ω0 )v1 = v1 − (QF )(j ω0 )v1 1/σ1 0 = v1 − V 0 0 0 = v1 − v1 = 0 .
0 e 0 1 (7.4.14)
240
7. ROBUSTNESS
Since v1 = 0, this implies that |(I + QF )(j ω0 )| = 0. Now |I + P C| = |I + (Q + P0 C)| = |I + QF | · |I + P0 C|, from which it follows that |(I + P C)(j ω0 )| = 0. Case (ii) ω0 is nonzero and finite. In this case the only complication is that the matrices U, V and the constant r(j ω0 ) may not be real, so that the Q defined in (7.4.12) may not be a real rational matrix. To handle this possibility, write r(j ω0 ) = |r(j ω0 )| exp(j δ) where δ ∈ [0, 2π ). Let v1 , u1 denote respectively the first column of V and the first row of U . Express each component of v1 , u1 in the form (v1 )i = v¯i exp(j φi ), (u1 )i = u¯ i exp(j θi ) , (7.4.15) where v¯i , u¯ i are real numbers and φ − δ, θ − δ ∈ (−π, 0] ∀i. Define Q(s) = −
r(s) v(s) ¯ u¯ (s) , σ1 |r(j ω0 )|
(7.4.16)
where ⎤ v¯1 (s − α1 )/(s + α1 ) ⎥ ⎢ .. v(s) ¯ =⎣ ⎦ , . ⎡
v¯ (s − αn )/(s + αn ) ⎤ ⎡ n u¯ 1 (s − β1 )/(s + β1 ) ⎥ ⎢ .. u(s) ¯ =⎣ ⎦ , . u¯ n (s − βn )/(s + βn ) , and the nonnegative constants αi , βi are adjusted such that6
j ω0 − αi j ω0 − βi arg = φi − δ , arg = θi − δ . j ω0 + α i j ω0 + βi
(7.4.17)
(7.4.18)
Then Q is a real rational matrix, Q ∈ M(S), and Q(j ω0 ) = −v1 u1 /σ1 .
(7.4.19)
Moreover, Q satisfies (7.4.13). Hence, P = P0 + Q belongs to the class A(P0 , r). As before, (I + QF )(j ω0 ) is singular, because it maps v1 into the zero vector. Hence C does not stabilize P . “if ” Suppose (7.4.4) holds. Let P be any plant in the class A(P0 , r) and let denote P − P0 . Then, by (7.4.1), (j ω) < |r(j ω)| ∀ω . (7.4.20) Combining (7.4.4) with (7.4.20) and letting F denote C(I + P0 C)−1 leads to (j ω) · F (j ω) < 1 ∀ω . 6 One can assume without loss of generality that ω > 0, so that such α , β exist. i i 0
(7.4.21)
7.4. ROBUSTNESS CONDITIONS
241
Hence |I + λ(j ω)F (j ω)| = 0 ∀ω ∈ R, ∀λ ∈ [0, 1] .
(7.4.22)
Let X denote the topological space C − 0 (i.e., the “punctured plane”) with the topology inherited from C. Define two closed curves f (·) and g(·) in X by f (j ω) = 1 ∀ω ,
g(j ω) = |I + (j ω)F (j ω)| ∀ω .
(7.4.23)
Now (7.4.22) shows that the function h(λ, j ω) = |I + λ(j ω)F (j ω)| ,
(7.4.24)
continuously transforms the curve f into the curve g in X. Hence, the curves f and g are homotopic in X, and as a result they have the same index with respect to 0. Since the index of f is clearly zero, it follows that ind g = ind |I + F | = 0 . (7.4.25) Finally, note that |I + P C| = |I + F | · |I + P0 C| .
(7.4.26)
Since C stabilizes P0 , it follows from Lemma 7.4.3 that ind |I + P0 C| = −(γ0 + γc ) ,
(7.4.27)
where γ0 (resp. γc ) is the number of RHP poles of P0 (resp. C) counted according to McMillan degree. Now (7.4.25)–(7.4.27) imply that ind |I + P C| = −(γ0 + γc ) .
(7.4.28)
Since P also has γ0 RHP poles by assumption, (7.4.28) and Lemma 7.4.3 show that C stabilizes P . 2 ¯ 0 , r) to be the set of all plants P that have the same number of RHP Suppose we define A(P poles as P0 , and satisfy P (j ω) − P0 (j ω) ≤ |r(j ω)| ∀ω . (7.4.29) (The difference between (7.4.1) and (7.4.29) is that the < is replaced by ≤.) Then the proof of Theorem 7.4.1 can be easily adapted to prove the following result. Corollary 7.4.4
¯ 0 , r) if and only if Suppose C ∈ S(P0 ). Then C stabilizes all plants in the class A(P sup [C(I + P0 C)−1 ](j ω) · |r(j ω)| < 1 . ω
(7.4.30)
242
7. ROBUSTNESS
Sketch of Proof of Theorem 7.4.2. Suppose P = (I + M)P0 . Then |I + P C| = |I + (I + M)P0 C| = |I + MG| · |I + P0 C| ,
(7.4.31)
where G = P0 C(I + P0 C)−1 . By assumption, M satisfies M(j ω) < |r(j ω)| ∀ω, and (7.4.5) states that G(j ω) · |r(j ω)| ≤ 1 ∀ω. 2 ¯ 0 , r) to be the set of all plants P To derive a result analogous to Corollary 7.4.4, define M(P that have the same number of RHP poles as P0 , and satisfy P = (I + M)P0 where M ∈ M(R(s)) and M(j ω) ≤ |r(j ω)| ∀ω . (7.4.32) Corollary 7.4.5
¯ 0 , r) if and only if Suppose C ∈ S(P0 ). Then C stabilizes all plants in the class M(P sup [P0 C(I + P0 C)−1 ](j ω) · |r(j ω)| < 1 . ω
(7.4.33)
Finally, consider the case of stable-factor perturbations. Suppose P0 ∈ M(R(s)). Suppose an r.c.f. (N0 , D0 ) of P0 and a function r ∈ S are specified, and define the class S(N0 , D0 , r) as in (7.4.3). Suppose C ∈ S(P0 ), and select an l.c.f. (D˜ c , N˜ c ) of C such that D˜ c D0 + N˜ c N0 = I . Then C stabilizes all P in the class, S(N0 , D0 , r), if and only if [D˜ c (j ω) N˜ c (j ω)] · |r(j ω)| ≤ 1 ∀ω . (7.4.34)
Theorem 7.4.6
Since D˜ c , N˜ c ∈ M(S), r ∈ S, the condition (7.4.34) actually implies that ˜ ˜ [Dc (s) Nc (s)]r(s) ≤ 1 ∀s ∈ C+e . Remarks 7.4.7
Proof. For convenience, define
D0 A0 = N0
,
D A1 = , N
A˜ c = [D˜ c N˜ c ] .
(7.4.35)
“if ” Suppose (7.4.34) holds, and suppose P is an arbitrary plant in the class S(N0 , D0 , r). Then P has an r.c.f. (N, D) such that A(s) − A0 (s) < |r(s)| ∀s ∈ C+e . Now consider the return difference matrix D˜ c D + N˜ c N = A˜ c A. Since A˜ c A0 = I , it follows that A˜ c A = I + A˜ c (A − A0 ). However, from (7.4.34) we get A˜ c (A − A0 )](s) ≤ A˜ c (s) · [A − A0 ](s) < 1 ∀s ∈ C+e . This shows that |(A˜ c A)(s)| = |I + [A˜ c (A − A0 )](s)| = 0 ∀s ∈ C+e , so that A˜ c A is unimodular. Hence, by Theorem 5.1.6, C stabilizes P .
7.4. ROBUSTNESS CONDITIONS
243
“only if ” Suppose (7.4.34) is false, and select an ω0 such that A˜ c (j ω) · |r(j ω0 )| > 1. Let σ1 ≥ σ2 ≥ · · · ≥ σm denote the singular values of A˜ c (j ω0 ), and note that σ1 |r(j ω0 )| > 1. Let = Diag {σ1 , · · · σm }, and select unitary matrices U, V such that7 U A˜ c (j ω0 )V = [ 0] . Since A˜ c (j ω0 )A0 (j ω0 ) = I , it follows that A0 (j ω0 ) must be of the form −1 A0 (j ω0 ) = V U. X By assumption, σ1 |r(j ω0 )| > 1. Now define R = Diag {1/σ1 , 0, · · · , 0}, and r(s) R Q(s) = −V U , T r(j ω0 )
(7.4.36)
(7.4.37)
(7.4.38)
where T is a constant matrix chosen as follows: If the first column of the matrix X in (7.4.37) is nonzero, simply choose T = 0. Otherwise, if the first column of X is zero, choose T to be a matrix whose columns are all zero except for the first column, and whose first column is nonzero and has sufficiently small norm that Q(j ω0 ) < 1/|r(j ω0 )|. If T is so chosen, then the first column of X − T is not identically zero. Now let P be the plant ND −1 , where [N D ] = A = A0 + Q. Then P belongs to the class S(N0 , D0 , r), since Q(s) · |r(s)| ≤ 1 ∀s ∈ C+e .8 However, −1 −R U, (7.4.39) A(j ω0 ) = A0 (j ω0 ) + Q(j ω0 ) = V X−T and the first column of −1 − R is zero. Hence A˜ c (j ω0 )A(j ω0 ) is singular, and A˜ c A is not unimodular. The question is: Does the singularity of |D˜ c D + N˜ c N| at j ω0 imply that C fails to stabilize P ? We can conclude that this is so, provided it can be shown that N and D are right-coprime, or at least, that any greatest common right divisor of N and D is nonsingular at j ω0 . To amplify this argument, suppose N = N1 B, D = D1 B, where B is a g.c.r.d. of N and D, and N1 and D1 are right-coprime. Then |D˜ c D + N˜ c N| = |D˜ c D1 + N˜ c N1 | · |B|. Hence, if |B(j ω0 )| = 0, then |D˜ c D1 + N˜ c N1 | vanishes at j ω0 , and C does not stabilize P . Thus the proof is complete if it can be shown that |B(j ω0 )| = 0. For this it is enough to show that the matrix [D N ] = A0 − Q has full column rank at j ω0 . But this last fact is immediate from (7.4.39), since the first column of X − T is nonzero. 2 There is an interesting anomaly associated with the robustness condition (7.4.34). Comparing Theorem 7.4.1 with Corollary 7.4.4, or Theorem 7.4.2 with Corollary 7.4.5, we see that it makes very little difference, in the case of additive and multiplicative perturbations, whether the class of 7 Note that A˜ (s) is a “fat” matrix and has full row rank at all s ∈ C . c +e 8The matrix Q may not be a real rational matrix; but this can be fixed up as in the proof of Theorem 7.4.1.
244
7. ROBUSTNESS
perturbations is defined with a strict or nonstrict inequality. If the class of perturbations is defined with a strict inequality, then the robustness condition has a nonstrict inequality, and vice versa. But ¯ 0 , D0 , r) by this is not so in the case of stable-factor perturbations. Define the class S(N ¯ 0 , D0 , r) = {P = ND −1 : (N − N0 )(s) S(N (D − D0 )(s) ≤ |r(s)| ∀s ∈ C+e } . (7.4.40) One might be tempted to conjecture the following result: C ∈ S(P0 ) stabilizes all P in the class ¯ 0 , D0 , r); if and only if S(N sup A˜ c (j ω) · |r(j ω)| < 1 . (7.4.41) ω
But this is false: (7.4.41) is certainly sufficient for robust stability, but not necessary, as the next example shows. Example 7.4.8
Consider p0 (s) =
where n0 (s) =
n0 (s) 4 = s−3 d0 (s)
4(s + 7) , (s + 1)2
d0 (s) =
(s − 3)(s + 7) . (s + 1)2
Let c(s) = 4/(s + 5). Then c stabilizes p0 . Moreover, dc d0 + nc n0 = 1 with dc (s) =
s+5 , s+7
nc (s) =
4 . s+7
¯ 0 , d0 , r); with r = 1. Since Now consider the class S(n dc (∞) 1 nc (∞) = 0 = 1 , ¯ 0 , d0 , r), which shows (7.4.41) does not hold. Nevertheless, c stabilizes every p in the class S(n that (7.4.41) is not always necessary for robust stability. To establish the claim made in the above paragraph, suppose n − n0 n(s) p(s) = , where (s) ≤ 1 ∀s ∈ C+e . d − d0 d(s) Let Q = [q1 q2 ] denote [d − d0 n − n0 ] . Then there are two cases to consider, namely: (i) q1 (∞) + 1 = 0, and (ii) q1 (∞) + 1 = 0. In either case we have [dc (s) nc (s)] < 1 ∀s ∈ C+ , so that (dc d + nc n)(s) = 1 + [dc (s) nc (s)]Q(s) = 0 ∀s ∈ C+ .
7.4. ROBUSTNESS CONDITIONS
245
On the other hand, [dc (∞) nc (∞)] = [1 0] so that (dc d + nc n)(∞) = 1 + q1 (∞) . Hence, if q1 (∞) = −1, then dc d + nc n has no zeros in the extended RHP and is thus a unit of S. Therefore c stabilizes p. It only remains to show that c stabilizes p even if q1 (∞) = −1. In this case q2 (∞) = 0 since Q(∞) ≤ 1. Hence, n(∞) = d(∞) = 0, and α = 1/(s + 1) is a common divisor of n and d. Further, since (dc d + nc n)(s) = 0 ∀s ∈ C+ , we see that α, its powers and associates are the only possible common divisors of n and d. Now it is claimed that, whatever be q1 , the function d0 + q1 can have only a simple zero at infinity. Let us accept this claim for a moment; then α is a greatest common divisor of n and d, since d/α does not vanish at infinity. Let n1 = 1/α, d1 = d/α; then (n1 , d1 ) is a coprime factorization of p. Now (dc d1 + nc n1 )(∞) = d1 (∞) = 0 , since d has only a simple zero at infinity. On the other hand, since it has already been established that (dc d + nc n)(s) = 0 ∀s ∈ C+ , it follows that (dc d1 + nc n1 )(s) = 0 ∀s ∈ C+e . Hence, dc d1 + nc n1 is a unit of S, and c stabilizes p. Thus, the example is complete if the claim can be established. This is most easily done using the bilinear transformation z = (s − 1)/(s + 1), which sends the function d0 (s) into 1+z a(z) = d0 = (4 − 3z)(2z − 1) , 1−z and sends q1 (s) into an associated rational function b(z). In this context, the claim is that a − b has only a simple zero at z = 1 whenever b(1) = −1 and b∞ ≤ 1.9 We prove the contrapositive, namely, if g is a rational H∞ -function such that g(1) = 1 and a − g has a double zero at z = 1, then g∞ > 1. Expand g in a power series around z = 1, as g(z) =
∞
gi (z − 1)i .
i=0
If a − g has (at least) a double zero at z = 1, then g(1) = a(1) = 1 ,
g (1) = a (1) = −1 .
Hence, for z < 1 and sufficiently close to 1, we have g(z) > 1. Therefore, g∞ > 1. This completes the example. 9 Here ·
∞ denotes the norm in the space H∞ ; see Section 6.2.
246
7. ROBUSTNESS
Each of Theorems 7.4.1, 7.4.2 and 7.4.6 gives conditions for a particular C ∈ S(P0 ) to stabilize all plants in a given class of perturbations. But they do not answer the question whether, given a class of perturbations, there actually exists a C ∈ S(P0 ) that satisfies the robustness conditions. This question is addressed next. Note that the solutions given below make use of the H∞ -norm minimization results derived in Sections 6.5 and 6.6. Suppose a nominal plant P0 ∈ M(R(s)), free of poles on the extended j ω-axis, and a function r ∈ S are specified. Suppose r does not have any zeros on the extended j ω-axis. Let ˜ N˜ ) be any r.c.f. and any l.c.f. of P0 , and let X, ˜ Y˜ ∈ M(S) be any particular solutions of (N, D), (D, ˜ ˜ ˜ ˜ ˜ the identity N X + D Y = I . Factor D, D in the form
Theorem 7.4.9
D = Di Do , D˜ = D˜ o D˜ i ,
(7.4.42)
where Di , D˜ i , are inner and Do , D˜ o are outer, and assume without loss of generality that |Di | = |D˜ i |.10 Finally, factor r as ro ri where ro , ri are outer and inner, respectively. Under these conditions, there exists a C ∈ S(P0 ) that stabilizes all plants in the class A(P0 , r) if and only if there exists an S ∈ M(S) such that adj Di X˜ D˜ o ro + δS∞ ≤ 1 , (7.4.43) where δ = |Di | = |D˜ i |. Proof. By Theorem 5.2.1, the set of all C ∈ S(P0 ) is given by S(P0 ) = {(X˜ + DS)(Y˜ − NS)−1 : S ∈ M(S), |Y˜ − NS| = 0} ,
(7.4.44)
where the nonsingularity constraint is satisfied by an open dense subset of S ∈ M(S) (see Lemma 5.2.4). Moreover, from (5.2.7), if C = (X˜ + DS)(Y˜ − NS)−1 , then C(I + P0 C)−1 = (X˜ + DS)D˜ .
(7.4.45)
Hence, there exists a C ∈ S(P0 ) satisfying (7.4.4) if and only if there exists an S ∈ M(S) such that ˜ + DS Dr ˜ ∞≤1. X˜ Dr
(7.4.46)
A few simple manipulations bring (7.4.46) into the form (7.4.43). Note that ˜ + DS Dr ˜ ∞ = X˜ D˜ o D˜ i ro ri + Di Do S D˜ o D˜ i ro ri X˜ Dr adj = Di X˜ D˜ o ro ri δ + δ 2 Do S D˜ o ro ri ∞ adj = δri (D X˜ D˜ o ro + δQ)∞ adj
= Di
i
X˜ D˜ o ro + δQ∞ ,
(7.4.47)
10 By Theorem 4.1.15, |D| and |D| ˜ are associates; so are |Do | and |D˜ o |, since both are units. Hence, |Di | and |D˜ i | are associates,
and since both are inner, |Di | = ±|D˜ i | in any case.
7.4. ROBUSTNESS CONDITIONS adj Di
247
adj D˜ i ,
where, in successive steps, we multiply on the left by and on the right by replace the matrix S by a new “free” parameter Q = Do S D˜ o ro , and observe that multiplication by the inner function δri preserves norms. Hence, (7.4.46) and (7.4.43) are equivalent. 2 If δ has only simple RHP zeros (or equivalently, if P0 has only simple RHP poles), then (7.4.43) can be stated in a more readily testable form using the results of Section 6.5. Corollary 7.4.10
Fi =
Suppose δ has only simple RHP zeros, and let λ1 , · · · , λn denote these zeros. Let = 1, · · · , n, and define
adj ˜ (Di XD o ro )(λj ), j
I + Fj∗ Fk Qj k = , 1 ≤ j, k ≤ n , λ¯ + λk ⎤ ⎡j Q11 · · · Q1n ⎢ .. ⎥ . Q = ⎣ ... . ⎦ Qn1 · · · Qnn
(7.4.48)
(7.4.49)
Then there exists a C ∈ S(P0 ) that stabilizes all plants in the class A(P0 , r) if and only if Q is nonnegative definite. In (7.4.43), X˜ is any particular solution of the identity N˜ X˜ + D˜ Y˜ = I . Hence, it is nice to know that the test matrix Q is the same no matter which X˜ is used. To see this, let X˜ 1 , Y˜1 be another set of matrices in M(S) satisfying N˜ X˜ 1 + D˜ Y˜1 = I . Then, from the proof of Theorem 5.2.1, it follows that X˜ 1 = X˜ + DR for some R ∈ M(S). Hence, adj
Di
adj adj X˜ 1 D˜ o ro = Di (X˜ + DR)D˜ o ro = Di X˜ D˜ o ro + δDo R D˜ o ro ,
(7.4.50)
since D = Di Do . Now the second term on the right side of (7.4.50) vanishes at all zeros of δ. Hence F1 , · · · , Fn are independent of which particular solution X˜ is used to compute them. Similar remarks apply even in the case where δ has repeated zeros. If the plant P0 is scalar, then the expression for Fj (or fj in the scalar case) is more elegant. In adj this case, Di = 1 (since by convention the adjoint of a 1 × 1 matrix is 1) and fj = (xdo ro )(λj ). Now, in the scalar case δ = di ; hence, it follows from d = di do = δdo and nx + yd = 1 that, at any zero λj of δ, we have n(λj )x(λj ) = 1. Hence, x(λj )do (λj ) =
do (λj ) . n(λj )
(7.4.51)
With the plant p0 , associate the stable plant q0 = p0 δ = n/do . Then fj =
do (λj ) ro ro (λj ) = (λj ) . n(λj ) q0
(7.4.52)
248
7. ROBUSTNESS
The treatment of multiplicative perturbations is entirely similar to that of additive perturbations. Suppose a function r ∈ S and a nominal plant P0 ∈ M(R(s)) are specified, and ˜ N) ˜ be any r.c.f. and any l.c.f. of suppose that P0 has no extended j ω-axis poles. Let (N, D), (D, P0 , and let X, Y ∈ M(S) be any particular solutions to the identity XN + Y D = I . Under these conditions, there exists a C ∈ S(P0 ) that stabilizes all plants in the class M(P0 , r) if and only if there exists an R ∈ M(S) such that ˜ (7.4.53) N(X + R D)r ∞ ≤1.
Theorem 7.4.11
Further, if P0 has at least as many inputs as outputs and r has no extended j ω-axis zeros, then (7.4.53) can be simplified as follows: Factor N, D˜ in the form N = N i No ,
D˜ = D˜ o D˜ i ,
(7.4.54)
where Ni , D˜ i are inner and No , D˜ o are outer. Finally. factor r as ri ro where ri is inner and ro is a unit of S. Under these conditions, there exists a C ∈ S(P0 ) that stabilizes all plants in the class M(P0 , r) if and only if there exists an S ∈ M(S) such that adj No X D˜ i ro + δS∞ ≤ 1 ,
(7.4.55)
where δ = |D˜ i |. ˜ −1 (X + R D) ˜ for some Proof. From Theorem 5.2.1, every C ∈ S(P0 ) is of the form (Y − R N) R ∈ M(S). Moreover, ˜ . P0 C(I + P0 C)−1 = P0 (I + CP0 )−1 C = N(X + R D)
(7.4.56)
Hence (7.4.5) holds for some C ∈ S(P0 ) if and only if there exists an R ∈ M(S) such that (7.4.53) holds. Now suppose the additional assumptions on P0 , r hold. Then, using by now familiar manipulations, we note that ˜ ∞ = Ni No Xro ri + Ni No R D˜ o D˜ i ro ri ∞ N Xr + NR Dr adj = Ni ri (No Xro D˜ + δS)∞ = No Xro D˜ i
adj
i
+ δS)∞ ,
(7.4.57)
where S = No R D˜ o and we use the fact that multiplication by the inner matrix Ni ri reserves norms. Now (7.4.57) is precisely (7.4.55). 2 adj If the plant P0 is scalar, one can again obtain a simple expression for the No Xro D˜ i evaluated at the zeros of δ. Suppose nx + dy = 1 and suppose δ(λ) = 0 at some point λ ∈ C+ .Then d(λ) = 0, adj nx(λ) = 1, and (No Xro D˜ i )(λ) = ro (λ)/ni (λ).
7.4. ROBUSTNESS CONDITIONS
249
The foregoing results also lead very naturally to the notion of optimally robust compensators. Suppose a nominal plant P0 is given, together with a function f ∈ S, which represents an uncertainty profile. Consider now the class A(P0 , r) consisting of all plants satisfying (7.4.1), where r = λf . It is now reasonable to ask: What is the largest value of the parameter λ for which the class A(P0 , r) is robustly stabilizable, and what is a corresponding robustly stabilizing compensator? This problem can be solved very easily using the results derived so far. Factor f as fi fo where fi , fo are respectively inner and outer, and define adj γ = min Di X˜ D˜ o fo + δS , (7.4.58) S∈M(S)
where all symbols are as in Theorem 7.4.9. Now, by applying Theorem 7.4.9 with r = λf , it follows that there exists a single compensator that stabilizes all the plants in the class A(P0 , r) if and only if γ λ ≤ 1. Hence, the largest value of λ for which there exists a robustly stabilizing compensator is given by λ = 1/γ . Further, if S is any matrix which attains the minimum in (7.4.58), then the compensator C = (X˜ + DS)(Y˜ − NS)−1 is an optimally robust compensator. The case of multiplicative perturbations is entirely similar and is left to the reader. Next, the case of stable-factor perturbations is studied. Suppose a function r ∈ S and a nominal plant P0 ∈ M(R(s)) are specified, together with an r.c.f. (N0 , D0 ) of P0 . Let X, Y ∈ M(S) satisfy XN0 + Y D0 = I , and let (D˜ 0 , N˜ 0 ) be any l.c.f. of P0 . Then there exists a C ∈ S(P0 ) that stabilizes all plants in the class S(N0 , D0 , r) if and only if there exists a matrix R ∈ M(S) such that A − BR ≤ 1, where X D˜ 0 A= , B= . (7.4.59) Y −N˜ 0
Theorem 7.4.12
The proof is obvious and is left to the reader. This section is concluded with a discussion of the case where there are perturbations in both the plant and the compensator. The results in this case make use of the graph metric introduced in Section 7.3. A bit of notation makes the presentation smoother. Given P , C ∈ M(R(s)), define I 0 T (P , C) = H (P , C) − , (7.4.60) 0 0 where H (P , C) is defined in 7.1.1. Using this last equation, one can derive a few other useful representations of T (P , C). For instance, −P C(I + P C)−1 −P (I + CP )−1 −P T (P , C) = (7.4.61) = (I + CP )−1 [C I ] . C(I + P C)−1 (I + CP )−1 I Suppose the pair (P0 , C0 ) is stable, and that P0 , C0 are perturbed to P , C, respectively. Then the pair (P , C) is stable provided
Theorem 7.4.13
d(P , P0 )T (P0 , C0 ) + d(C, C0 )T (C0 , P0 ) < 1 .
(7.4.62)
250
7. ROBUSTNESS
Under the hypotheses of Theorem 7.4.13, the pair (P , C0 ) is stable provided d(P , P0 ) < 1/T (P0 , C0 ). Corollary 7.4.14
The significance of Theorem 7.4.13 and Corollary 7.4.14 is as follows: The type of plant perturbations studied in these two results is the most unstructured one considered in this section, in that (i) one is permitted to simultaneously perturb both the plant and the compensator. (ii) there is no restriction on the number of RHP poles of the perturbed and unperturbed plant being the same, and (iii) the perturbations are not couched in terms of a particular coprime factorization of the plant. The stability condition (7.4.62) is interesting in that the effects of the perturbations in the plant and controller enter additively. When only the plant is perturbed, the stability condition given in Corollary 7.4.14 is reminiscent of the small gain theorem (see e.g., [D1, Chapter VI]). These results also bring out the utility of the graph metric in obtaining quantitative robustness estimates. Proof. Let (N0 , D0 ), (X˜ 0 , Y˜0 ) be normalized r.c.f.’s of P0 , C0 , respectively. Then from the stability of the pair (P0 , C0 ), and Problem 5.1.1, the matrix
D0 N0
U0 =
−X˜ 0 Y˜0
,
(7.4.63)
.
(7.4.64)
is unimodular. Let V0 = U0−1 , and partition V0 as
Y0 V0 = −N˜ 0
X0 D˜ 0
Then (D˜ 0 , N˜ 0 ), (Y0 , X0 ), are l.c.f.’s of P0 , C0 , respectively. Now select real numbers δp > d(P , P0 ), δc > d(C, C0 ) such that δp T (P0 , C0 ) + δc T (C0 , P0 ) < 1 .
(7.4.65)
This is possible in view of (7.4.62). Let (Np , Dp ), (Nc , Dc ) be any normalized r.c.f.’s of P , C respectively. Then, from the definition of the graph metric (see 7.3.2), there exist matrices Wp , Wc , ∈ M(S) with Wp ≤ 1, Wc ≤ 1 such that D0 − Dp Wp N0 − Np Wp ≤ δp , Y˜0 − Dc Wc X˜ 0 − Nc Wc ≤ δc .
(7.4.66) (7.4.67)
7.4. ROBUSTNESS CONDITIONS
251
Define F as in (7.3.33) and observe that F is inner so that multiplication by F is norm-preserving. Thus (7.4.67) implies that −(X˜ 0 − Nc Wc ) (7.4.68) Y˜0 − Dc Wc ≤ δc . Next, define
Dp Wp −Nc Wc N W Dc Wc p p Dp −Nc Wp = 0 Np Dc
U=
0 Wc
.
(7.4.69)
It is now shown that (7.4.65) implies the unimodularity of U . This will show, a fortiriori, that Wp and Wc , are both unimodular, and that the pair (P , C) is stable, by Problem 5.1.1. Note that U − U0 = [A B] , where A=
D0 − Dp Wp , N0 − Np Wp
B=
−(X˜ 0 − Nc Wc ) . Y˜0 − Dc Wc
(7.4.70)
(7.4.71)
Now, if [A B]U0−1 < 1, then (7.4.70) implies that U is unimodular. From (7.4.64), [A B]U0−1 = [A B]V0 = A[Y0 X0 ] + B[−N˜ 0 D˜ 0 ] . [A B]U0−1 ≤ A[Y0 X0 ] + B[−N˜ 0 D˜ 0 ] ≤ δp [Y0 X0 ] + δc [−N˜ 0 D˜ 0 ]
(7.4.72) (7.4.73)
where the last step follows from (7.4.66) and (7.4.68). The proof is completed by showing that [Y0 X0 ] = T (P0 , C0 ) , [−N˜ 0 D˜ 0 ] = T (C0 , P0 ) .
(7.4.74) (7.4.75)
Then (7.4.65) and (7.4.73) will imply the unimodularity of U and the stability of (P , C). To prove (7.4.74), recall that (N0 , D0 ) is a normalized r.c.f. of P0 , and (Y0 , X0 ) is the corresponding l.c.f. of C0 such that Y0 D0 + X0 N0 = I . Hence, from (5.1.23) and (7.4.60), −N0 X0 −N0 Y0 −N0 (7.4.76) = [X0 Y0 ] ; T (P0 , C0 ) = D0 X0 D 0 Y0 D0 −N0 is an isometry T (P0 , C0 ) = [X0 Y0 ] since D0 = [Y0 X0 ] . (7.4.77) The proof of (7.4.75) follows essentially by symmetry arguments, after noting that [−N˜ 0 D˜ 0 ] = [N˜ 0 D˜ 0 ]. Recall that (X˜ 0 , Y˜0 ) is a normalized r.c.f. of C0 , that P0 stabilizes C0 , and that (D˜ 0 , N˜ 0 ) is the corresponding l.c.f. of P0 such that D˜ 0 Y˜0 + N˜ 0 X˜ 0 = I . 2
252
7. ROBUSTNESS
It is not known at present how close the conditions of Theorem 7.4.13 and Corollary 7.4.14 are to being necessary. In particular, it is not known whether the condition T (P0 , C) < r −1 is necessary for a controller C to stabilize all plants within a distance of r from the plant P0 . Nevertheless, it is reasonable to seek, among all stabilizing controllers for P0 , an optimally robust controller C0 for which T (P0 , C0 ) is as small as possible. The problem of finding such a C0 can be formulated as an H∞ -norm minimization problem. As C varies over S(P0 ), the corresponding T (P0 , C) varies over all matrices of the form −N0 (X0 + R D˜ 0 ) −N0 (Y0 − R N˜ 0 , (7.4.78) T1 (R) = D0 (X0 + R D˜ 0 ) D0 (Y0 − R D˜ 0 where (N0 , D0 ), (D˜ 0 , N˜ 0 ) are any r.c.f. and l.c.f. of P0 , X0 , Y0 ∈ M(S) satisfy X0 N0 + Y0 D0 = I , and R ∈ M(S) is a free parameter (see (5.2.6)). Thus, minimizing T (P0 , C) over all C ∈ S(P0 ) is equivalent to the unconstrained minimization of T1 (R) as R varies over M(S). Now note that T1 (R) is of the form U − V RW , where −N˜ 0 −N˜ 0 Y [X ] , V = (7.4.79) , W = [D˜ 0 − N˜ 0 ] . U= 0 0 D˜ 0 D˜ 0 This type of minimization problem is studied in Section 6.6. Finally, the results contained in this section carry over with obvious modifications to the study of the robustness of discrete-time systems, and to cases where the set of “stable” transfer functions is something other than S.
7.5
ROBUST AND OPTIMAL REGULATION
In this section, several previously derived results are brought to bear to provide solutions of two problems, namely robust regulation and optimal robust regulation. The robust regulation problem encompasses two aspects, namely robust tracking and robust disturbance rejection. The robust tracking problem can be described as follows: Suppose we are given a plant P , together with an unstable reference signal generator which is factored in the form D˜ r−1 N˜ r , where D˜ r , N˜ r are left-coprime. The objective is to design a compensator such that the plant output tracks the unstable reference signal, and continues to do so even as the plant is slightly perturbed. This problem is depicted in Figure 7.2, where, to achieve the greatest possible generality, a two-parameter compensation scheme is shown. Alternatively, suppose the unstable signal is a disturbance which is injected at the plant output. In this case, the objective is to design a compensator such that the plant output is insensitive to the disturbance, and remains so even as the plant is slightly perturbed. This situation is shown in Figure 7.3, and is referred to as the robust disturbance rejection problem. Note that a feedback, or one-parameter, configuration is employed in Figure 7.3 since, as shown in Section 5.6, both configurations are equivalent for the purposes of disturbance rejection. In more precise terms, the robust disturbance rejection problem can be stated as follows: Find a compensator such that three conditions are met, namely
7.5. ROBUST AND OPTIMAL REGULATION v
r
~
~
Dr—1 Nr
~
~
N c1
Dc—1
+–
P
253
y
~
N c2
Figure 7.2: Robust Tracking Using a Two-Parameter Compensator. v ~ —1 ~
Dr
C
+–
P
+
Nr
+
y
Figure 7.3: Robust Disturbance Rejection.
(i) The feedback system of Figure 7.2 is stable. (ii) The transfer matrix from v to the tracking error r − y belongs to M(S). (iii) There exists a neighborhood N(P ) of P in the graph topology such that conditions (i) and (ii) hold for all plants in N(P ). If a one-parameter or feedback compensator is used, the robust tracking problem is mathematically equivalent to the robust disturbance rejection problem, since the transfer matrix from v to the tracking error e in Figure 7.4 is exactly the same as the transfer matrix from v to the plant output y in Figure 7.3. On the other hand, if a two-parameter compensation scheme is used, there is a distinct difference between robust tracking and robust disturbance rejection. As a matter of fact, there is nothing to be gained by using a two-parameter scheme if the objective is to reject a disturbance (see Section 5.6). For this reason, throughout this section, attention is focused on the robust tracking problem. v
~
~
Dr—1 Nr
r
+–
e
C
P
Figure 7.4: Robust Tracking Using a One-Parameter Compensator.
y
254
7. ROBUSTNESS
Let us now formulate the problem in more mathematical terms. Given the plant P , let ˜ N˜ ) be any r.c.f. and l.c.f. of P . The objective is to find a compensator C such that three (N, D), (D, conditions are satisfied: (R1) C ∈ S(P ). (R2) (I + P C)−1 D˜ r−1 N˜ r ∈ M(S).11 (R3) There exists a δ such that (R1) and (R2) hold whenever P is replaced by P1 = D˜ 1−1 N˜ 1 where [D˜ − D˜ 1 N˜ − N˜ 1 ] < δ . (7.5.1) The solution to the robust tracking problem is given next. Let αr denote the largest invariant factor of D˜ r .12 Then there exists a C satisfying (R1)–(R3) if and only if N and αr I are left-coprime. If N and αr I are left-coprime, the set of all C that satisfy (R1)–(R3) is given by
Theorem 7.5.1
{C1 /αr : C1 ∈ S(P /αr )} .
(7.5.2)
Checking whether or not the problem at hand has a solution is made easier by the next result. The following statements are equivalent: (i) N and αr I are left-coprime. (ii) N (s) has full row rank at all C+e -zeros of αr . (iii) N has full row rank, and the largest invariant factor of N and αr are coprime.
Lemma 7.5.2
Thus, Lemma 7.5.2 shows that only square or fat plants can be robustly regulated, and reduces the matrix coprimeness test of Theorem 7.5.1 to a scalar coprimeness criterion. Once one is sure that the robust tracking problem does indeed have a solution in the case at hand, the formula (7.5.2) describing all compensators C that achieve robust tracking has a very simple interpretation: In Figure 7.5, choose C1 to be any compensator that stabilizes the feedback system. Then C = C1 /αr achieves (R1)–(R3). Moreover, every such C is of the above form. The presence of the 1/αr term within the compensator is referred to elsewhere as the “internal model principle” [107]. The proof of Theorem 7.5.1 requires two preliminary results. The first of these is the same as Lemma 5.7.3, and is reproduced here for convenience. Suppose T , U, V ∈ M(S) with |U | = 0 and U, V left-coprime.Then T U −1 V ∈ M(S) if and only if T U −1 ∈ M(S). Lemma 7.5.3
Because of Lemma 7.5.3, condition (R2) can be replaced by 11 Note that (R2) is redundant unless D ˜ r , is non-unimodular, i.e., the reference signal generator is unstable. 12 Note that α is a nonunit since D ˜ r is not unimodular. r
7.5. ROBUST AND OPTIMAL REGULATION
v
~
~
Dr—1 Nr
r
+–
C
Ơr
255
P
Figure 7.5: Solution to the Robust Tracking Problem.
(R2’) (I + P C)−1 D˜ r−1 ∈ M(S). The second result is actually a special case of a fact established in the next section (see Proposition 7.6.5). It is stated here without proof, and the reader may wish to provide a specialized proof that applies to the ring S (see Problem 7.5.1). Suppose A, B ∈ M(S) are square, have the same dimensions, and suppose |A| = 0. Then for every ε > 0, the ball (7.5.3) B(B, ε) = {B1 ∈ M(S) : B1 − B < ε} ,
Lemma 7.5.4
contains a matrix B1 such that A, B1 are right-coprime. The proof of Theorem 7.5.1 is very long. The first step, which takes us most of the way, is to establish the next result. Lemma 7.5.5 Suppose (Nc , Dc ) is an r.c.f. of C ∈ S(P ). Then C achieves (R1)–(R3) if and only if αr−1 Dc ∈ M(S) (i.e., if and only if αr divides every component of Dc ).
Proof. Since the robustness condition (R3) involves perturbing the plant from its nominal description P , we can assume at the outset that D˜ and D˜ r are right-coprime. If this is not so originally, it can be ˜ by Lemma 7.5.4. Also, since the truth or falsity achieved by an arbitrarily small perturbation of D, −1 of the condition αr Dc is unaffected by multiplying Dc by a unimodular matrix, we can as well ˜ c + NN ˜ c = I . Finally, note that since αr choose the particular r.c.f. (Nc , Dc ) of C that satisfies DD −1 ˜ ˜ is the largest invariant factor of Dr , we have that αr Dr ∈ M(S) (see Problem 7.5.2). “if ” Suppose Dc = αr T , where T ∈ M(S). Now (R1) holds by assumption, since C ∈ S(P ). Next, (7.5.4) (I − P C)−1 D˜ r−1 = Dc D˜ D˜ r−1 = αr T D˜ D˜ r−1 ∈ M(S) , so that (R2’) holds. To prove (R3), suppose P = D˜ −1 N˜ is replaced by P1 = D˜ 1−1 N˜ 1 , where D˜ 1 = ˜ N˜ 1 = N˜ + N˜ . Then the corresponding return difference matrix is D˜ + D, ˜ M = D˜ 1 Dc + N˜ 1 Nc = I + [D˜ N]
Dc Nc
,
(7.5.5)
256
7. ROBUSTNESS
which is unimodular provided −1 Dc ˜ ˜ [D N] < δ = Nc .
(7.5.6)
Hence, (R1) holds robustly. To prove that (R2’) also holds robustly, note that (I + P1 C)−1 D˜ r−1 = Dc M −1 D˜ 1 D˜ r−1 = T M −1 D˜ 1 αr D˜ r−1 ∈ M(S) ,
(7.5.7)
since M −1 ∈ M(S) due to the unimodularity of M. “only if ” Suppose (Rl), (R2’), (R3) hold; it is to be shown that αr−1 Dc ∈ M(S). This part of the proof is divided into several steps. First, since D˜ and Dr are right-coprime, let (A, B) be an l.c.f. of D˜ D˜ r−1 . Then, by Theorem 4.1.15, A and D˜ r have the same invariant factors. In particular, αr is the largest invariant factor of A. Step 1 It is shown that Dc A−1 ∈ M(S) . (7.5.8) By assumption, (R2’) holds, i.e., Dc D˜ D˜ r−1 = Dc A−1 B ∈ M(S). An application of Lemma 7.5.3 gives (7.5.8). Step 2 It is shown that ˜ c A−1 ∈ M(S) ∀Q ∈ M(S) . Dc QNN
(7.5.9)
˜ c ). Then I − εQNN ˜ c is Let Q ∈ M(S) be arbitrary, and choose ε in the interval (0, 1/QNN −1 ˜ unimodular. Define M = (I − εQNNc ) . Then M=I+
∞ ˜ c )i = I + Rε NN ˜ c, (εQNN
(7.5.10)
i=1
for some rational matrix Rε , (note that M is rational). Moreover, Rε → 0 as ε → 0. Hence, Rε < ˜ δ for sufficiently small ε, where δ is the number defined in (7.5.1). Now let P1 = D˜ −1 (I + Rε )N. −1 −1 ˜ By assumption, (I + P1 C) Dr ∈ M(S), since P1 is a valid perturbation of P . But ˜ c + (I + Rε )NN ˜ c ]−1 D˜ D˜ r−1 = Dc (I + Rε NN ˜ c )−1 A−1 B (I + P1 C)−1 D˜ r−1 = Dc [DD ˜ c )A−1 B ∈ M(S) . (7.5.11) = Dc (I − εQNN Applying Lemma 7.5.3 gives ˜ c )A−1 ∈ M(S) . Dc (I − εQNN
(7.5.12)
˜ c A−1 ∈ M(S), and dividing by ε leads to (7.5.9). Subtracting (7.5.8) yields εDc QNN Step 3 It is shown that Dc QA−1 ∈ M(S) ∀Q ∈ M(S) .
(7.5.13)
7.5. ROBUST AND OPTIMAL REGULATION
257
˜ c + N˜ Nc = I , Indeed, since DD ˜ c + NN ˜ c )A−1 = (Dc QD)D ˜ c A−1 + Dc QNN ˜ c A−1 ∈ M(S) , Dc QA−1 = Dc Q(DD by (7.5.8) and (7.5.9), respectively. Step 4 It is shown that
αr−1 Dc ∈ M(S) ,
which is the desired conclusion. Select unimodular U, V ∈ U(S) such that ⎡ ⎤ 0 α1 ⎢ ⎥ .. U AV = ⎣ ⎦=S, . 0
(7.5.14)
(7.5.15)
(7.5.16)
αr
is the Smith form of A. By (7.5.8), Dc QA−1 ∈ M(S) ∀Q ⇐⇒ Dc QV S −1 U ∈ M(S) ∀Q ⇐⇒ Dc QV S −1 ∈ M(S) ∀Q ⇐⇒ Dc Q1 S −1 ∈ M(S) ∀Q1
(7.5.17)
where Q1 denotes QV , and Q, Q1 range over M(S). Now let Q1 = I . Then Dc S −1 ∈ M(S), which shows that αr divides the last column of Dc . Next, successively let Q1 equal the permutation matrix that interchanges column j with the last column, and let the index j vary over all possible values. Then Dc Q1 S −1 ∈ M(S) implies that αr divides every column of Dc . 2 The argument used in the last step is sufficiently useful that it deserves display as a separate item. Suppose A, B ∈ M(S), have the same number of columns, and in addition A is square and nonsingular. Then BQA−1 ∈ M(S) ∀Q ∈ M(S) , (7.5.18)
Proposition 7.5.6
if and only if the largest invariant factor of A divides every element of B. Proof of Theorem 7.5.1. It is first shown that the robust regulation problem has a solution if and ˜ Y˜ ∈ M(S) such that N˜ X˜ + D˜ Y˜ = I . Then by Theoonly if N and αr I are left-coprime. Select X, rem 5.2.1, S(P ) = {(X˜ + DR)(Y˜ − NR)−1 : R ∈ M(S)} . (7.5.19) Thus, the set of all Dc as C varies over S(P ) is given by {Y˜ − NR : R ∈ M(S)}. By Lemma 7.5.5, the robust tracking problem has a solution if and only if αr−1 (Y˜ − NR) ∈ M(S) for some R ∈ M(S), or equivalently, there exist R, W ∈ M(S) such that Y˜ − NR = αr W .
(7.5.20)
258
7. ROBUSTNESS
Now (7.5.20) can be rearranged as
NR + αr W = Y˜ .
(7.5.21)
Thus, the proof is complete if it can be shown that the left-coprimeness of N and αr I is necessary and sufficient for (7.5.21) to have a solution. Accordingly, suppose first that N, αr I are left-coprime, and select T , U ∈ M(S) such that NT + αr U = I ; then R = T Y˜ , W = U Y˜ is a solution of (7.5.21). Conversely, suppose N and αr I are not left-coprime, and let V be a nontrivial (i.e., nonunimodular) common left-divisor of N and αr I . Then NR + αr W is a left-multiple of V for all R, W . On the other hand, from Corollaries 4.1.18 and 4.1.7, N and Y˜ are left-coprime, so that Y˜ is not a left-multiple of V . Hence, (7.5.21) does not have a solution. To complete the proof, it is shown that the set of all C ∈ S(P ) such that αr Dc−1 ∈ M(S) is precisely {C1 /αr : C1 ∈ S(P /αr )}, provided of course that N and αr I are left-coprime. First, suppose Dc = αr T for some T ∈ M(S). Then, since ˜ c = αr DT ˜ + NN ˜ c, ˜ c + NN I = DD
(7.5.22)
˜ −1 N˜ = P /αr . we see that T and Nc are right-coprime, and that C1 = Nc T −1 stabilizes (αr D) Hence, every solution to the robust tracking problem is of the form C1 /αr , for some C1 ∈ S(P /αr ). ˜ −1 N; ˜ moreover, it can be shown that αr D˜ Conversely, suppose C1 ∈ S(P /αr ). Now P /αr = (αr D) ˜ N) ˜ is a l.c.f. of P /αr . Since C1 ∈ S(P /αr ), C1 has an r.c.f. and N˜ are left-coprime. Hence, (αr D, (Nc1 , Dc1 ) satisfying ˜ c1 + NN ˜ c1 = I . αr DD (7.5.23) Thus, C = C1 /αr = Nc1 (αDc1 )−1 stabilizes P , and the denominator matrix of C is a multiple of αr . Hence, C solves the robust tracking problem. 2 Corollary 7.5.7
Suppose N and αr I are left-coprime, and let T , U ∈ M(S) be particular solutions of NT + αr U = I.
(7.5.24)
Let (B, A) be an r.c.f. of N/αr . Then the set of all C ∈ S(P ) that solve the robust tracking problem is {[X˜ + D(T Y˜ + AS)][αr (U Y˜ − BS)]−1 : S ∈ M(S)} .
(7.5.25)
Proof. From the proof of Theorem 7.5.1, we see that C solves the robust tracking problem if and only if C = (X˜ + DR)(Y˜ − NR)−1 for some R satisfying (7.5.21). Now it is routine to verify that the set of all solutions of (7.5.21) is R = T Y˜ + AS ,
W = U Y˜ − BS ,
S ∈ M(S) .
(7.5.26)
Substituting for R from (7.5.26) and noting that Y˜ − NR = αr W = αr (U Y˜ − BS) leads to (7.5.25). 2
7.5. ROBUST AND OPTIMAL REGULATION
259
Proof of Lemma 7.5.2. Recall from Section 4.2 that N and αr I are left-coprime if and only if the matrix F = [N αr I ] has full row rank at all s ∈ C+e . If αr (s) = 0, the rank condition holds; therefore, it is only necessary to verify the rank condition at the C+e -zeros of αr . If αr (s) = 0, then F (s) has full row rank if and only if N(s) has full row rank. Hence, we conclude that N and αr I are left-coprime if and only if N (s) has full row rank at all C+e -zeros of αr . Clearly this is the same as requiring the largest invariant factor of N and αr to be coprime. 2 Lemma 7.5.2 also holds over any principal ideal domain, and not just over S; but the proof in the general case is a little more complicated (see Problem 7.5.3). Now let us consider the problem of robust tracking using a two-parameter compensation scheme. Recall from Section 5.6 that such a scheme can be depicted as shown in Figure 7.2. In this scheme, the tracking error between the reference input r and the plant output y is given by e = [I − N(N˜ c2 N + D˜ c D)−1 N˜ c1 ]r .
(7.5.27)
Hence, the transfer matrix from v to e is given by
where
F (N, D) = [I − NM −1 N˜ c1 ]D˜ r−1 N˜ r ,
(7.5.28)
M = M(N, D) = N˜ c2 N + D˜ c D .
(7.5.29)
As before, the objectives of the robust tracking problem are to determine the compensator parameters N˜ c1 , N˜ c2 , D˜ c such that the closed-loop system is stable and F (N, D) ∈ M(S) for the nominal plant as well as for all plants in some neighborhood of the nominal plant. A partial solution to the problem is given by the next theorem. Consider the system of Figure 7.2, and suppose P has at least as many inputs as outputs. Let (N, D) be an r.c.f. of the nominal plant, and let (N1 , D1 ) denote perturbations of (N, D). Let (Nc2 , Dc ) be an r.c.f. of the feedback compensator D˜ c−1 N˜ c2 .Then the triple (N˜ c1 , N˜ c2 , D˜ c ) solves the robust tracking problem if and only if (i) αr−1 Dc ∈ M(S), and (ii) N1 [M(N1 , D)]−1 (N˜ c2 − N˜ c1 )D˜ r−1 ∈ M(S) ∀(N1 , D1 ). If P is square, (ii) is equivalent to (ii’) (N˜ c2 − N˜ c1 )D˜ r−1 ∈ M(S). If P is nonsquare, (ii’) implies but is not implied by (ii).
Theorem 7.5.8
If we use a one-parameter, or feedback compensator, then N˜ c2 = N˜ c1 , so that (ii) above is automatically satisfied. In this case, (i) is a necessary and sufficient condition for robust tracking. This is precisely the result contained in Lemma 7.5.5. Remarks 7.5.9
260
7. ROBUSTNESS
Proof. “if ” Suppose first that (i) and (ii) hold. It is desired to show that (I − N1 M −1 N˜ c1 )D˜ r−1 ∈ M(S) ∀(N1 , D1 ) , where M = M(N1 , D1 ) is as in (7.5.29). Define D1 S= N1
−Nc2 Dc
(7.5.30)
(7.5.31)
.
By the robustness of closed-loop stability, S is unimodular for (N1 , D1 ) sufficiently close to (N, D). Let T = S −1 , and partition T as A B T = . (7.5.32) K L Now [A B] is equivalent to [I 0], since it is part of a unimodular matrix. This, plus the fact that −Nc2 =0, (7.5.33) [A B] Dc implies that [A B] is an l.c.f. of D˜ c−1 N˜ c2 , i.e., that [A B] = U [D˜ c N˜ c2 ] Similarly,
for some U ∈ U(S) .
[K L] = [−N˜ 1 D˜ 1 ] ,
where (N˜ 1 , D˜ 1 ) is an l.c.f. of the perturbed plant
N1 D1−1 . Now
(7.5.34)
(7.5.35) T S = I implies that
U (D˜ c D1 + N˜ c2 N1 ) = I ,
(7.5.36)
or U = M −1 . Also, ST = I implies in particular that
or Therefore
N1 B + Dc L = I ,
(7.5.37)
NM −1 N˜ c2 + Dc D˜ 1 = I .
(7.5.38)
(I − N1 M −1 N˜ c1 )D˜ r−1 = Dc D˜ 1 D˜ r−1 + N1 M −1 (N˜ c2 − N˜ c1 )D˜ r−1 ,
(7.5.39)
which belongs to M(S) by (i) and (ii). Hence, robust tracking is achieved. “only if ” Suppose the triple (N˜ c1 , N˜ c2 , D˜ c ) achieves robust tracking. It is desired to show that (i) and (ii) hold. The hypothesis is that {I − N1 [M(N1 , D1 )]−1 N˜ c1 }D˜ r−1 N˜ r ∈ M(S) ∀(N1 , D1 ) ,
(7.5.40)
7.5. ROBUST AND OPTIMAL REGULATION
261
where (N1 , D1 ) range over some neighborhood of (N, D). By Lemma 7.5.3, (7.5.40) is equivalent to {I − N1 [M(N1 , D1 )]−1 N˜ c1 }D˜ r−1 ∈ M(S) ∀(N1 , D1 ) . (7.5.41) As a prelude to the proof, observe first that it can be assumed without loss of generality that the numerator matrix N of the nominal plant and αr I are left-coprime. If this is not so originally, it can be achieved by an arbitrarily small perturbation of the plant: It suffices to perturb N to a nearby matrix which has full row rank at all C+e -zeros of αr (see Lemma 7.5.2). Also, we may as well assume that the r.c.f. (N, D) of the nominal plant and the triple (N˜ c1 , N˜ c2 , D˜ c ) satisfy N˜ c2 N + D˜ c D = I .
(7.5.42)
˜ N˜ ) be an l.c.f. of the nominal plant defined by the equation Finally, let (D,
D N
D˜ c −Nc2 = Dc −N˜
N˜ c2 D˜
−1 .
(7.5.43)
Then, since N˜ and N have the same invariant factors, it follows from Lemma 7.5.2 that N˜ and αr I are also left-coprime. The first step of the proof is to observe that the hypotheses imply that N˜ c1 and D˜ r are right-coprime. To see this, note that (7.5.41) implies that
Hence,
W = (I − N N˜ c1 )D˜ r−1 ∈ M(S) .
(7.5.44)
N N˜ c1 + W D˜ r = I ,
(7.5.45)
which shows that N˜ c1 and D˜ r are right-coprime. Let (A, B) be an l.c.f. of N˜ c1 D˜ r−1 , and note that αr is also the largest invariant factor of A. Next, fix the plant numerator at N and vary its denominator. Then (7.5.41) states that {I − N [M(N, D1 )]−1 N˜ c1 }D˜ r−1 ∈ M(S) ∀D1 .
(7.5.46)
Subtracting (7.5.44) from (7.5.46) leads to N{[M(N, D1 )]−1 − I }N˜ c1 D˜ r−1 = N{[M(N, D1 )]−1 − I }A−1 B ∈ M(S) ∀D1 .
(7.5.47)
Hence, from Lemma 7.5.3, N{[M(N, D1 )]−1 − I }A−1 ∈ M(S) ∀D1 .
(7.5.48)
Now M = M(N, D1 ) = I + D˜ c (D1 − D). By an argument analogous to that in the proof of Lemma 7.5.5 (see especially Step 2), it can be shown that M −1 − I can be made equal ε D˜ c Q for an arbitrary matrix Q ∈ M(S), by suitable choice of D1 . Therefore, (7.5.48) implies that N D˜ c QA−1 ∈ M(S) ∀Q ∈ M(S) .
(7.5.49)
262
7. ROBUSTNESS
Now Proposition 7.5.6 leads to the conclusion that αr the largest invariant factor of A, divides each ˜ Hence, element of N D˜ c . However, from (7.5.43), N D˜ c = Dc N. Dc αr−1 N˜ ∈ M(S) .
(7.5.50)
Finally, since αr I and N˜ are left-coprime, an application of Lemma 7.5.3 to (7.5.50) gives Dc αr−1 ∈ M(S) .
(7.5.51)
This proves (i). The proof of (ii) is now very easy. From previous arguments, robust tracking is equivalent to (7.5.39). However, since αr divides every element of Dc , we have that Dc D˜ 1 D˜ r−1 ∈ M(S) ∀(N1 , D1 ) .
(7.5.52)
Subtracting (7.5.52) from (7.5.39) shows that N1 M −1 (N˜ c2 − N˜ c1 )D˜ r−1 ∈ M(S) ∀(N1 , D1 ) .
(7.5.53)
This proves (ii). To complete the proof, it is necessary to show two things: 1) Condition (ii’) implies (ii); 2) If the plant is square, (ii) implies (ii’). Of these, the first statement is obvious: If N˜ c2 − N˜ c1 is a left-multiple of D˜ r , then (7.5.53) is clearly satisfied. To prove the second statement, define Q = (N˜ c2 − N˜ c1 )D˜ r−1 .
(7.5.54)
Then (7.5.53) implies in particular that NQ ∈ M(S). Now select unimodular matrices U, V ∈ U(S) such that U QV is in Smith-McMillan form, and let S = U −1 QV −1 = Diag {s1 /t1 , · · · , sr /tr } .
(7.5.55)
Since N Q ∈ M(S), it follows that ti divides the i-th column of NU , for all i. Now, if Q itself does not belong to M(S), then t1 is not a unit. Note that t1 is a divisor of αr , because of (7.5.54). Now let s be any C+e -zero of t1 . Then, since t1 divides the first column of NU, NU (s) is singular, which in turn implies, by the unimodularity of U , that N(s) is singular. However, since t1 is a divisor of αr , this contradicts the left-coprimeness of N and αr I . Hence, Q ∈ M(S). 2 Theorem 7.5.8 shows that (ii) implies (ii’) for square plants. It is now shown by example that there exist situations involving nonsquare plants where (ii) holds but (ii’) is false. Example 7.5.10
Suppose the plant P =
4s + 1 3s 2s − 1 2s − 1
7.5. ROBUST AND OPTIMAL REGULATION
263
is to be compensated so as to robustly track a step input. In this case, we may take D˜ r = αr = s/(s + 1). For convenience, let z denote s/(s + 1), and note that z ∈ S. Then P = [3z 3z + 1](3z − 1)−1 , and an r.c.f. for P is given by N = [−z 3z + 1] , 1 + z −1 − 3z D= . −2z 6z − 1 Now choose
1 0 ˜ , Dc = z z
N˜ c2 =
1 . 1−z
Then it is claimed that robust tracking is achieved provided αr divides the second component of N˜ c2 − N˜ c1 , even if αr does not divide the first component of N˜ c2 − N˜ c1 .This is proved by establishing that condition (ii) of Theorem 7.5.8 holds. For this purpose, suppose N, D are perturbed to N1 , D1 , and partition these as N1 = [na nb ], D1 = [Da Db ] . Also, partition N˜ c1 , N˜ c2 , D˜ c as x ˜ N c1 = a , xb
y ˜ Nc2 = a , yb
D˜ ˜ Dc = ˜ a . Db
The key point to note here is that D˜ b is a multiple of z = αr . Now partition M = N˜ c2 N1 + D˜ c D1 , in the form maa mab M= . mba mbb Then N1 M −1 (N˜ c2 − N˜ c1 )/αr = |M|−1 [(na mbb − nb mba )(ya − xa )]/αr + |M|−1 [(−na mab − nb maa )(yb − xb )]/αr . Since αr divides yb − xb , the second term belongs to S, provided the plant perturbation is small enough to maintain the unimodularity of M, i.e., to maintain closed-loop stability. As for the first term, mba = yb na + D˜ b Da , mbb = yb nb + D˜ b Db na mbb − nb mba = D˜ b (na Db − nb Da )
264
7. ROBUSTNESS
is a multiple of αr , since αr divides D˜ b . Hence, (ii) holds and robust tracking is achieved, whatever be xa ! Now let us consider the problem of optimal regulation.The problems studied in Chapter 6 can be interpreted as minimizing the effect of a stable disturbance on the plant output, or minimizing the tracking error in response to a stable reference input. With the aid of the results derived thus far in this section, it is now possible to tackle the same problems in the presence of unstable disturbances or reference inputs. Specifically, suppose P is a given plant, D˜ r−1 N˜ r is a given disturbance or reference input generator, and W1 , W2 are given square weighting matrices. The objective is to minimize the objective function Jp = W1 (I + P C)−1 D˜ r−1 N˜ r W2 p , (7.5.56) over all C that achieve robust regulation. This can be interpreted as the problem of optimal disturbance rejection, or optimal tracking using a one-parameter compensator. Using Theorem 7.5.1 and Corollary 7.5.7, this problem can be transformed into one of the type solved in Chapter 6. With all symbols as in Theorem 7.5.1 and Corollary 7.5.7, the set of all C over which Jp is to be minimized is given by (7.5.25). Substituting from (7.5.25) into (7.5.56) shows that ˜ , (I + P C)−1 D˜ r−1 = (U Y˜ − BS)DL Jp = E − W1 BSF p ,
(7.5.57) (7.5.58)
L = αr D˜ r−1 , E = W1 U Y˜ F , ˜ N˜ r W2 . F = DL
(7.5.59)
where
Note that the robust regulation problem has a solution if and only if the plant has at least as many inputs as outputs. Hence, F is square or fat, and as a result the minimization of Jp can be carried out using the techniques presented in Section 6.5. Consider now the problem of optimal robust tracking of square plants using a two-parameter compensation scheme. From Corollary 7.5.7 and Theorem 7.5.8 it follows that the set of all triples (N˜ c1 , N˜ c2 , D˜ c ) that achieve robust regulation is given by D˜ c−1 N˜ c2 = [(X˜ + D)(T Y˜ + AS)][αr (U Y˜ − BS)]−1 , N˜ c1 = N˜ c2 − QD˜ r , S, Q ∈ M(S) .
(7.5.60)
The above expression is not very convenient, as it does not explicitly specify the compensator parameters. Such an expression can be obtained using a doubly coprime factorization for the plant. Suppose Y X D −X˜ =I . (7.5.61) −N˜ D˜ N Y˜
7.5. ROBUST AND OPTIMAL REGULATION
265
Then by Problem 5.2.2 it follows that ˜ −1 (X + R D) ˜ ∀R ∈ M(S) . (X˜ + DR)(Y˜ − NR)−1 = (Y − R N)
(7.5.62)
Hence, the set of all compensators that achieve robust tracking can be described with the aid of Corollary 7.5.7 and (7.5.62) as D˜ c = Y − (T Y˜ + AS)N˜ , N˜ c2 = X + (T Y˜ + AS)D˜ , D˜ c1 = X + (T Y˜ + AS)D˜ + QD˜ r .
(7.5.63)
For the class of compensators described above, the tracking error transfer matrix can be computed from (7.5.39) as ˜ + NQ)N˜ r . (Dc D˜ D˜ r−1 + NQ)N˜ r = ((U Y˜ − BS)DL (7.5.64) Comparing (7.5.64) with (7.5.57), we see that the extra flexibility arising from using a two-parameter compensation scheme is the presence of the term NQ in (7.5.64).The expression (7.5.64) can actually be simplified. Since the plant is square, N and αr I are both left- as well as right-coprime. Hence, (N, αr I ) is a r.c.f. for αr−1 N , or in other words, we may take A = αr I, B = N . Hence, in (7.5.64), ˜ + N Q = U Y˜ DL ˜ + N(Q − S DL) ˜ ˜ + NR , (U Y˜ − BS)DL = U Y˜ DL
(7.5.65)
where R = Q − S D˜ is a free parameter. To reiterate, the set of stable transfer matrices achievable using a feedback compensator is ˜ − NS DL) ˜ N˜ r , (U Y˜ DL
S ∈ M(S) ,
(7.5.66)
whereas with a two-parameter scheme, the set is ˜ + NQ)N˜ r , (U Y˜ DL
Q ∈ M(S) .
(7.5.67)
Jp = W1 (I − N N˜ c2 )D˜ r−1 N˜ r W2 p ,
(7.5.68)
Now, the problem of minimizing
over all robustly stabilizing two-parameter compensators can be reformulated as that of minimizing Jp = E − W1 NQN˜ r W2 p ,
(7.5.69)
as Q varies over M(S), where E is defined in (7.5.59). It is easy to see that the infimum in (7.5.69) is no larger than that in (7.5.58), since (7.5.69) contains (7.5.58) as a special case by choosing ˜ Moreover, as in Section 6.6, there always exist infinitely many optimal compensators if a Q = S DL.
266
7. ROBUSTNESS
two-parameter scheme is used, and this flexibility can be exploited to achieve both good disturbance rejection and good tracking simultaneously. Example 7.5.11 For the sake of variety we consider discrete-time systems in the next two examples. Thus z denotes the unit delay and its z-transform. Note that the z-transform of a sequence {f0 , f1 , . . .} is given by ∞ fi z i . f (z) = i=0
Consider the set-up shown in Figure 7.3, and suppose it is desired to robustly reject a step input disturbance. Since the z-transform of a sequence of one’s is 1/(1 − z), we may take nr = 1, dr = αr = 1 − z . Suppose the plant has two inputs and one output, and has the transfer matrix P = [z(z − 0.5) z(z + 0.5)](z + 0.25)−1 . An l.c.f. of this plant is given by D˜ = z + 0.25 , N˜ = [z(z − 0.5) z(z + 0.5)] , and a particular solution of the Bezout identity N˜ X˜ + D˜ Y˜ = I is given by X˜ = [4 − 4] , Y˜ = 4 . An r.c.f. of P is given by N = [z 0] , −1 −z − 0.5 D= . 1 z − 0.5 Now N and αr are left-coprime. In fact (7.5.24) is satisfied with T = [1 0] , U = 1 . Hence, the robust disturbance rejection problem has a solution. Further, all solutions of this problem can be parametrized using Corollary 7.5.7. An r.c.f. of N/αr is z−1 0 B = [z 0] , A = . 0 1 From (7.5.25), the set of all compensators that achieve robust disturbance rejection is given by C = Nc Dc−1 ,
7.5. ROBUST AND OPTIMAL REGULATION
267
where Dc = (4 − zs1 ) ,
−4z + 3 + (1 − z)s1 − (z + 0.5)s2 , 4z − 3 + (z − 1)s1 + (z − 0.5)s2 s1 , s2 ∈ H∞ are rational .
Nc = (z + 0.25)
Now consider the problem of optimal robust disturbance rejection as stated in (7.5.56), with both weighting functions equal to unity. From (7.5.57) and (7.5.58), the set of achievable transfer functions from v to y is f (s1 , s2 ) = (4 − zs1 )(z + 0.25) . Suppose it is desired to minimize the H∞ -norm of this transfer function. Using the methods described in Section 6.4, one can verily that the minimum is achieved with s1 = 0, with a minimum cost of 5. The collection of compensators that achieve the minimum is −4z + 3 − (z + 0.5)s2 Dc = 4(1 − z) , Nc = , s2 ∈ H∞ . 4z − 3 + (z − 0.5)s2
Consider the set-up shown in Figure 7.2, where the input to be tracked is the sinusoidal sequence {0, 1, 0, −1, 0, 1, . . .}. Since the z-transform of this sequence is z/(1 + z2 ), we may take nr = z , αr = 1 + z2 . Example 7.5.12
Suppose the plant is scalar, with the transfer function p=
z . z + 0.5
Then, dropping the tildes because everything is a scalar, we have n = z , d = z + 0.5 , x = −2 , y = 2 . a = αr = 1 + z2 , b = n = z , t = −z , u = 1 . Hence, the set of r, w that satisfy (7.5.21) is r = ty + as = −2z + (1 + z2 )s , w = uy − bs = 2 − zs , s ∈ H∞ . From Corollary 7.5.7, the set of all feedback, or one-parameter compensators that achieve robust tracking is given by c = nc /dc , where dc = (2 − zs)(1 + z2 ) , nc = −2 + (z + 0.5)(−2z + (1 + z2 )s) ,
268
7. ROBUSTNESS
and s ∈ H∞ is arbitrary. If it is desired to have optimal tracking by minimizing the objective function (7.5.56) with unity weighting, then one has to minimize J∞ = (2 − zs)(z + 0.5)z∞ , s ∈ H∞ . Using the techniques of Section 6.4, one can verify that the minimum is 3, corresponding to the choice s = 0. Hence the unique feedback compensator that achieves optimal robust tracking is c=
−(1 + 0.5z + z2 ) . 1 + z2
Now let us see how much better we can do using a two-parameter compensation scheme. The set of transfer functions from v to the tracking error that can be achieved is nr ((uy − bs)d + nq) = z[(2 − zs) (z + 0.5) + zq], s , q ∈ H∞ . = z(1 + zq1 ) , where q1 = q − (z + 0.5)s + 2. The H∞ -norm of the above is minimized when q1 = 0, and equals 1. Hence, not only does the two-parameter scheme result in a reduction in the optimal tracking error by a factor of 3, but there are infinitely many compensators that achieve this minimum. In this way it is possible to achieve both optimal disturbance rejection as well as good tracking. For example, a compensator that achieves optimal tracking corresponds to the choice q = −2, s = 0, which leads to the compensator dc = 2 + 2z2 , nc2 = −2(1 + 0.5 + z2 ) , nc1 = nc2 + 2αr = −z . Note that the feedback compensator is the same as the optimal compensator discussed above. But the use of a two-parameter scheme reduces the tracking error by a factor of 3.
PROBLEMS 7.5.1. Prove Lemma 7.5.4. 7.5.2. Suppose A ∈ M(S) is square and nonsingular and let a denote its largest invariant factor. Show that aA−1 ∈ M(S). 7.5.3. Prove Lemma 7.5.2 over a general principal ideal domain R (Hint: Partition all the full-size minors of the matrix [N αr I ] into several groups, namely: those that do not involve any columns of αr I , those that involve one column, two columns, etc.). ˜ N˜ are left-coprime. 7.5.4. Suppose D˜ and N˜ are left-coprime, as are N˜ and αr I . Show that αr D,
7.6. GENERICITY
269
7.5.5. Using the techniques of Section 6.7, construct a sequence of plants Pi and a sequence of reference inputs D˜ r−1 N˜ ri such that the ratio of the optimal tracking error achievable using i a two-parameter compensation scheme to that achievable using a one-parameter scheme approaches zero as i approaches ∞.
7.6
GENERICITY
In this section, the focus is on the notion of genericity, which is a mathematically precise way of expressing the intuitive idea that some statement or property is almost always true. The concept of genericity is defined first, and then it is shown that various useful properties of control systems are generically true. These include coprimeness, strong stabilizability, and simultaneous stabilizability (subject of course to suitable restrictions). We begin with a formal definition of genericity. Let X denote a topological space. Recall that a binary relation R on X is a subset of the product space X × X. It is symmetric if (x, y) ∈ R ⇐⇒ (y, x) ∈ R. More generally, an n-ary relation is a subset of the product space Xn . A binary relation R on X is generic if it is an open dense subset of X. It is strongly generic if the following is true: For each x ∈ X, define
Definition 7.6.1
B(x) = {y ∈ X : (x, y) ∈ R} .
(7.6.1)
Then the set of x for which B(x) is an open dense subset of X is itself an open dense subset of X. One can state the same definition in words: R is generic if two statements are true: (i) If a pair (x, y) belongs to R, then there is a neighborhood of (x, y) such that every point in the neighborhood also belongs to R. (ii) If a pair (x, y) does not belong to R, then every neighborhood of this pair contains a point that belongs to R. In other words, R is generic if it has two features: (i) If a pair satisfies the relation, then sufficiently small perturbations will not destroy the relation. (ii) If a pair does not satisfy the relation, then arbitrarily small perturbations in the pair will make it satisfy the relation. A simple (and pertinent) example is that, in the polynomial ring R[s] topologized as in Problem C.2.3, the relation R defined by R = {(x, y) : x and y have no common zeros} ,
(7.6.2)
is generic. The exact relationship between genericity and strong genericity is brought out in Problem 7.6.1. But, roughly speaking, the symmetric relation R is strongly generic if, for almost all x, almost all y satisfy the relation (x, y) ∈ R. Proposition 7.6.2 and dense.
Let A, B be open dense subsets of a topological space X. Then C = A ∩ B is also open
270
7. ROBUSTNESS
Proof. Clearly C is open, being the intersection of two open sets. Hence, it only remains to show that it is dense. For this, it is necessary to show the following: Let x ∈ X be arbitrary. Then every neighborhood of x contains a point of A ∩ B. Accordingly, let x ∈ X be chosen arbitrarily, and let N be a neighborhood of x. Since A is dense, N contains an element a ∈ A. Moreover, since A is open, there is a neighborhood N1 ⊆ A containing a. One can assume that N1 is also a subset of N by replacing N1 by N ∩ N1 if necessary. Next, since B is also dense, the neighborhood N1 contains an element b ∈ B. Since b also belongs to A, b belongs to A ∩ B and is contained in the neighborhood N of x. 2 The topological space-under study in this section is the ring S of proper stable rational functions. Recall that the topology on S is the natural one induced by its norm. Thus, a base for the topology on S is provided by open balls of various radii centered at various points. Most of the arguments given below can be extended to an arbitrary topological ring whose set of units is open and where inversion of a unit is continuous, but at a considerable increase in the level of abstraction. We begin by deriving some results concerning the genericity of coprimeness. Proposition 7.6.3
Define a symmetric binary relation R on S by R = {(f, g) : f and g are coprime} .
(7.6.3)
Then R is strongly generic as well as generic. Proof. It is shown only that R is strongly generic. The proof of genericity is similar and is left as an exercise (see Problem 7.6.2). Let f ∈ S, and define Rf = {g ∈ S : f and g are coprime} .
(7.6.4)
It is shown below that Rf is an open dense subset of S whenever f = 0. Since the set of nonzero elements is open and dense in S, this would prove the strong genericity of R. To show that Rf is open, suppose that f and g are coprime, and choose a, b ∈ S such that af + bg = 1. If h − g is less than b−1 , then af + bh = 1 + b(h − g) ,
(7.6.5)
is a unit of S, which shows that h and f are also coprime. Thus, if g ∈ Rf , then h ∈ Rf whenever h belongs to the ball B(g, b−1 ). This shows that Rf is open. To show that Rf is dense whenever f = 0, let g be an arbitrary element of S. If f and g are coprime, then nothing further needs to be done. On the other hand, if f and g are not coprime, it must be shown that every neighborhood of g contains an element which is coprime with f . Towards
7.6. GENERICITY
271
this end, let S denote the set of C+e -zeros of the function f , and note that S is a finite set since f is nonzero. Now partition the set S into two disjoint subsets S1 and S2 . as follows: S1 = {s ∈ S : g(s) = 0} , S2 = {s ∈ S : g(s) = 0} .
(7.6.6)
r = min |g(s)| ,
(7.6.7)
Define s∈S2
and note that r > 0. Now let ε > 0 be arbitrary; it will be shown that the ball B(g, ε) contains an element h which is coprime with f . Select a positive δ < min(r, ε) and define h = g + δ · 1. Then, it is claimed that f and h are coprime, since they do not have any common C+e -zeros. To establish the claim, note that the only possible common C+e -zeros of f and h are the elements of the set S. Now, if s ∈ S1 , then h(s) = δ = 0, whereas if s ∈ S2 , then |h(s)| > r − δ > 0. Hence, f and h are coprime. 2 Thus, Proposition 7.6.3 shows that, generically, any two functions in S are coprime. In order to extend this result to matrices, a preliminary lemma is needed. Proposition 7.6.4
Suppose m > 1 is some integer. Then the set Mm = {D ∈ Sm×m : |D| = 0} ,
(7.6.8)
is an open dense subset of Sm×m . Proof. The map D → |D| is continuous, and the set of nonzero elements in S is open. Hence, Mm , which is the preimage of an open set under a continuous map, is itself open. To show that it is also dense, suppose D is a matrix in Sm×m with |D| = 0, and let ε be an arbitrary positive number. It is shown next that the ball B(D, ε) contains a matrix with a nonzero determinant. Select matrices U, V ∈ U(S) such that U DV = Diag {d1 , · · · , dr , 0, · · · , 0} , (7.6.9) is the Smith form of D, and r is the rank of D. Observe now that di divides di+1 for all i. Now define (7.6.10) S = Diag {d1 , · · · , dr , δdr , · · · , δdr } , and let D1 = U −1 SV −1 . Then |D1 | = 0, and D1 can be made to lie in the ball B(D, ε) by choosing the constant δ sufficiently small. 2 The next proposition generalizes Proposition 7.6.3 to the matrix case. Proposition 7.6.5
Suppose m < n are positive integers, and define R(m, n) = {A ∈ Sm×n : The g.c.d. of all minors of A is 1 } .
(7.6.11)
272
7. ROBUSTNESS
Then R(m, n) is an open dense subset of Sm×n .
1) If R(m, n) is viewed as mn-ary relation on S, then the proposition states that this relation is generic. 2) Another way of stating the proposition is: Suppose F ∈ Sm×k , G ∈ Sm×l . Then F and G are generically coprime if k + l > m. 3) The definition (7.6.11) can be equivalently stated as: Remarks 7.6.6
R(m, n) = {A ∈ Sm×n : A ∼ [I 0]} .
(7.6.12)
Proof. Clearly it is enough to prove the proposition for the case where n = m + 1. The openness of the set R(m, n) is obvious, and it only remains to show that the set is dense. This is done by induction on m. Note that the proposition is true when m = 1, by Proposition 7.6.3. Suppose the proposition is true for matrices with m − 1 rows and m (or more) columns.To establish the statement for matrices with m rows, suppose A ∈ Sm×n but does not belong to R(m, n). To be shown is that every neighborhood of A contains a matrix belonging to R(m, n). Let ε > 0 be any specified number, and partition A in the form (7.6.13) A = [A1 a2 ] , where A1 and a2 have dimensions m × m and m × 1, respectively. Now let A11 denote the submatrix consisting of the first m − 1 rows of A1 . By the inductive hypothesis and Proposition 7.6.4, there exists an m × m matrix B11 , (7.6.14) B1 = B12 such that (i) |B1 | = 0, (ii) [I 0] is a Smith form of B11 , and (iii) B1 − A1 < ε/2. Now let S denote the set of C+e -zeros of |B1 |, and partition the set S into two disjoint sets S1 and S2 , as follows: S1 = {s ∈ S : all minors of [B1 a2 ] are zero} , S2 = {s ∈ S : not all minors of [B1 a2 ] are zero} .
(7.6.15)
The idea, as in the proof of Proposition 7.6.3, is to perturb a2 in such a way as to destroy all the common zeros in the set S1 while ensuring that none of the points in S2 becomes a common zero of all the minors. For this purpose, note that all minors of the matrix [B1 a2 ] belong to S and are thus, bounded over the closed right half-plane. Let μ be an upper bound on the norms of all these minors. Now, at each point s ∈ S1 , find an (m − 1) × (m − 1) minor of B1 which is nonzero at s; such a minor must exist, since [I 0] is a Smith form of B1 . Suppose such a minor is obtained by
7.6. GENERICITY
273
omitting the j -th column of B1 . Let em denote the m-th elementary unit vector. Then the m × m minor of B = [B1 a2 + δem ] , (7.6.16) obtained by omitting the j -th column is nonzero at s, whenever δ = 0. Hence, at all s ∈ S1 , at least one of the m × m minors of the matrix B in (7.6.16) is nonzero. The next step is to show that, if δ is sufficiently small, then at least one of the m × m minors of the matrix B is nonzero at all points in the set S2 as well. This can be shown as follows: In analogy with (7.6.7), choose a real number r > 0 such that, at each s ∈ S2 , at least one m × m minor of the matrix [B1 a2 ] is nonzero. Now choose δ < r/μ. Then, at each s ∈ S2 , the corresponding m × m minor of B is also nonzero. Finally, since S = S1 ∪ S2 , is the zero set of |B1 |, these are the only possible points in the closed RHP where all m × m minors of the matrix B could vanish. Hence, we conclude that, whenever δ < min(r/μ, ε), B has the Smith form [I 0]. 2 Proposition 7.6.5 shows that a rectangular matrix generically has the Smith form [I 0]. The next result is addressed to the Smith form of square matrices. Proposition 7.6.7
Suppose m is a positive integer, and define Tm = {A ∈ Sm×m : A ∼ Diag {1, · · · , 1, |A|}} .
(7.6.17)
Then Tm is an open dense subset of Sm×m . Proof. The openness of Tm is left to the reader to prove. To prove that Tm is dense, suppose A does not belong to Tm , and let ε > 0 be a given number. To construct an element A1 belonging to Tm lying in the ball B(A, ε), we proceed as follows: First, by Proposition 7.6.4, it can be assumed that |A| = 0, by perturbing the matrix A slightly if necessary. Now select unimodular matrices U, V such that S = U AV = Diag {a1 , · · · , am } , (7.6.18) is a Smith form of A.Then, for any δ > 0, one can find functions bi ∈ B(ai , δ) such that the functions b1 , · · · , bm are all pairwise coprime (see Proposition 7.6.3). Hence, the matrix B = U −1 Diag {b1 , · · · , bm }V −1 ,
(7.6.19)
has the Smith form Diag {1, · · · , 1, |B|}. By adjusting δ, one can ensure that B ∈ B(A, ε).
2
The first principal result of this section states that a collection of r multivariable plants is generically simultaneously stabilizable provided the plants have at least r inputs or r outputs. A useful corollary is that generically a multivariable plant can be strongly stabilized. These results make use of Proposition 7.6.5 concerning the generic Smith form of rectangular matrices. Suppose l, m are given positive integers, and let M(R(s)) denote the set [R(s)]l×m , equipped with the graph topology. Let r be a positive integer, and define the r-ary relation R on
Theorem 7.6.8
274
7. ROBUSTNESS
M(R(s)) by R = {(P1 , · · · , Pr ) :
S(Pi ) = ∅} .
(7.6.20)
i
Then R is an open dense subset of M(R(s)) whenever r ≤ max(l, m). The proof of the theorem is divided into two parts: First, it is shown that the result is true in the case where min(l, m) = 1, i.e., the case where all plants are either single-input or single-output. Then it is shown that the general case can be reduced to this special case. Suppose min(l, m) = 1. Then R defined in (7.6.20) is an open dense subset of M(R(s)) if max(l, m) ≥ r.
Lemma 7.6.9
Proof. The proof is given for the case where l = 1 and m ≥ r. The other case can be proved by taking transposes. (i) (i) (i) Given the plants P1 , · · · , Pr , each of dimension 1 × m, find l.c.f.’s (dp , [np1 · · · npm ]), for i = 1, · · · , r. Define the matrix Q ∈ M(R(s))r×(m+1) by qij = dp(i) if j = 1 , qij = n(i) p(j −1) if j > 1 .
(7.6.21)
Let C ∈ M(R(s))m×1 be a compensator, and let ([nc1 , · · · , ncm ] , dc ) be an r.c.f. of C. Then by Theorem 5.1.6, C stabilizes each of the plants Pi if and only if each of the functions ui are defined by ⎡ ⎤ ⎡ ⎤ dc u1 ⎢ nc ⎥ ⎢ .. ⎥ ⎢ 1⎥ (7.6.22) ⎣ . ⎦ = Q ⎢ .. ⎥ , ⎣ . ⎦ ur ncm is a unit. Turning this argument around, the set of plants {P1 , · · · , Pr } can be simultaneously stabilized if and only if there exist units ui , i = 1, · · · , r such that (7.6.22) has a solution in M(R(s)) for the unknowns dc and nci , i = 1, · · · , m, where the first component of the solution (corresponding to dc ) is nonzero. Now, if m ≥ r, then the matrix Q has more columns than rows. Hence, by Proposition 7.6.5, Q is generically right-invertible. That is, if Q is not right-invertible, it can be made so by an arbitrarily small perturbation in each of its elements, which is precisely the same as slightly perturbing each of the plants Pi in the graph topology. Thus, by slightly perturbing each of the plants, we can ensure that (7.6.22) has a solution in M(R(s)), whatever be the left hand side. In particular, one can choose any arbitrary set of units {u1 , · · · , ur }, and (7.6.22) has a solution in M(R(s)). The only additional point to be worried about is that the first element of this solution (corresponding to dc ) should be nonzero. But this too can be guaranteed by perturbing the units slightly if necessary. The details of this last argument are routine and are left to the reader. 2
7.6. GENERICITY
275
The next step is to reduce the case of an l × m system where l < m to the case of a single-output system (the case where l > m is handled by transposition). Lemma 7.6.10 Suppose l, m are given positive integers. Define the relation R on the product space M(S)(m+l)×m × Rl as the set of all (D, N, v) such that vN, D are right-coprime. Then the relation R is generic.
Proof. Let n(1) , · · · , n(l) denote the rows of the matrix N . By Proposition 7.6.5, it is generically true that the Smith form of the matrix D Mi = (i) , (7.6.23) n is [I 0] for all i. In other words, this means that the g.c.d. of all m × m minors of the matrix D M= , (7.6.24) N involving exactly one row of N is equal to 1. Now, if v is an l × 1 vector, then the pair (D, vN) is right-coprime if and only if the g.c.d. of all m × m minors of the matrix D Q= , (7.6.25) vN is equal to 1. We show that generically this is the case for almost all v ∈ Rl . Now the matrix Q has dimensions (m + 1) × m, as do the matrices Mi defined in (7.6.23). Moreover, since the determinant function is multilinear, a minor of Q involving the last row can be expressed as a linear combination of the corresponding minors of the various Mi . More precisely, let qi denote the minor of Q obtained j by omitting the j -th row the matrix D, and define mi analogously. Then q = j
l
j
vi mi .
(7.6.26)
j
(7.6.27)
i=1
By hypothesis, it is generically true that g.c.d.i,j {mi } = 1 . j
Hence, if the elements mi satisfy the rank condition of [100], it follows that, for almost all vectors v, g.c.d.j {q j } = 1 . (7.6.28) It is left to the reader to show that the rank condition is generically satisfied.This shows that (D, vN) 2 are right-coprime and completes the proof that the relation R is generic.
276
7. ROBUSTNESS
The significance of Lemma 7.6.10 is in showing that, generically, a multivariable plant can be stabilized using a rank one compensator. Suppose P is a given multivariable plant with an r.c.f. (N, D), and suppose there exists a vector v such that vN, D are right-coprime. Then one can find a pair of elements (a, B) ∈ M(S) such that avN + BD = I . Note that a is a column vector. Now the compensator C = B −1 av has rank one as an element of M(R(s)) and stabilizes P . Hence, if one can stabilize P using a rank one compensator if one can find a row vector v such that vN, D are right-coprime. Lemma 7.6.10 states that generically this is always possible. With the aid of Lemma 7.6.10, the proof of Theorem 7.6.8 can be completed. Proof of Theorem 7.6.8. If min(l, m) = 1, then the truth of the theorem is established by Lemma 7.6.9. If not, suppose without loss of generality that r ≤ m ≥ l. For each plant Pi , it is generically true that the set of row vectors v such that the m-input-scalar-output plant vPi can be stabilized by an appropriate compensator is open and dense in Rl . Since the intersection of a finite number of open and dense subsets of Rl is again open and dense (see Proposition 7.6.2), generically the set of row vectors v such that each of the plants vPi can be stabilized by an appropriate compensator is open and dense. Moreover, by Lemma 7.6.10, generically there exists a common stabilizing compensator C1 , since r ≤ m. Hence, C = vC1 simultaneously stabilizes each of the plants Pi , i = 1, · · · , r. 2 In Section 5.4 it was shown that the problem of simultaneously stabilizing r plants is equivalent to that of simultaneously stabilizing r − 1 plants using a stable compensator. Now Theorem 5.2.1 shows that r plants of dimension l × m can be generically simultaneously stabilized provided r ≤ max(l, m). Thus, it is natural to conjecture that r plants of dimension l × m can generically be simultaneously stabilized using a stable compensator provided r < max(l, m). In fact this is true, and the proof follows very nearly the reasoning above. However, a few minor adjustments are needed. The reason for this is the following: Given r plants Pi , i = 1, · · · , r, they can be simultaneously stabilized using a stable compensator if and only if the plants r + 1 plants 0, Pi , i = 1, · · · , r can be simultaneously stabilized. Now, the reasoning used in Lemma 7.6.9 and Theorem 7.6.8 relies on being able to perturb all of the plants as needed. However, in the present instance, one can only perturb all except one plant (namely the zero plant). However, there is a saving feature: In choosing an l.c.f. of the zero plant, the numerator matrix has to be zero, but the denominator matrix is arbitrary. Hence, the proof of the above result is complete if it can be shown that the Smith form of an (r + 1) × (m + 1) matrix of the form d1 N1 M= , (7.6.29) d2 0 is generically equal to [I 0] whenever r < m. Comparing the matrix in (7.6.29) to that in Proposition 7.6.5, we see that the only difference is the presence of a row of zeros in the bottom of the matrix M in (7.6.29). Now, checking back to the proof of Proposition 7.6.5, the ingredients to the proof by induction were: (i) the fact that the r × m matrix N1 generically had the Smith form [I 0], and (ii) the ability to adjust the corner element d2 . As both of these features are present in (7.6.29),
7.6. GENERICITY
277
we can conclude that the matrix M is generically right-invertible if r < m. Hence, the following corollary to Theorem 5.2.1 is proved: Corollary 7.6.11 Suppose the plants Pi , i = 1, · · · , r have dimension l × m.Then they can generically be simultaneously stabilized using a stable compensator if r < m.
PROBLEMS 7.6.1. (a) Show that if R is open then Rx is open for all x. (b) Show that if Rx is a dense subset of X for all x belonging to a dense subset of X, then R is a dense subset of X × X. 7.6.2. Show that the relation R of (7.6.3) is generic. 7.6.3. Show that the relation Tm defined in (7.6.17) is open.
NOTES AND REFERENCES The graph topology of Section 7.2 and the graph metric of Section 7.3 were introduced in [96]. For another metric on the set of square rational plants, see [114]. The robustness results in Section 7.4 are due to several researchers, including Doyle and Stein [32], Chen and Desoer [13], Kimura [58], and Vidyasagar and Kimura [99]. For other results on the robustness of feedback stability, specifically the gain margin of systems, see [57, 67]. Some of the results on robust regulation are from [39]. The genericity of simultaneous stabilizability was first established for two plants by Vidyasagar and Viswanadham [102]. The general results given here were first obtained by Ghosh and Byrnes [43] using state-space methods. The present treatment follows [100].
279
CHAPTER
8
Extensions to General Settings The previous six chapters concentrated on the problems of controller synthesis in the case where the underlying ring of “stable” transfer functions is the ring S defined in Section 2.1. As pointed out in Section 4.1, the only property of S that is used repeatedly in Chapters 2–7 is the fact that S is a principal ideal domain. Thus, most if not all of the contents of these chapters remain valid if S is replaced by any p.i.d. This is why the synthesis theory of Chapters 2–7 encompasses, within a single framework, lumped continuous-time as well as discrete-time systems, and situations where the stable region in the complex plant is something other than the left half-plane (or the closed unit disc in the discrete-time case). The objective of the present chapter is to generalize the earlier synthesis theory by removing the assumption that the underlying set of “stable” systems is a p.i.d. In effect, the generalization extends the scope of the theory to include linear distributed systems, 2-D systems and the like. In contrast with earlier chapters, this chapter, and in particular Section 8.1, presupposes a mathematical background on the part of the reader well beyond that contained in the appendices. Several references to the literature are given at appropriate places, but in general, [14, 116] are good references for concepts from abstract algebra, while [42] is a handy source for results on Banach algebras.
8.1
COPRIME FACTORIZATIONS OVER A COMMUTATIVE RING
Throughout this section, R denotes a commutative domain with identity, and F denotes the field of fractions associated with R. As in the rest of the book, the symbols M(R), M(F), U(R) denote respectively the set of all matrices with elements in R, the set of matrices with elements in F, and the set of all unimodular matrices with elements in R. If R is a principal ideal domain, then the contents of Section 4.1 show that every matrix over F has both right- and left-coprime factorizations over R. In fact, for this it is enough that R be a Bezout domain, i.e., that every finitely generated ideal in R be principal. However, if R is not Bezout, then the questions of the existence of r.c.f.’s and l.c.f.’s and their interrelationships need to be examined anew. It is the purpose of this section to do so. The concepts of coprimeness and coprime factorizations introduced in Section 4.1 can be extended in a natural way to a general ring R. Suppose A, B ∈ M(R) have the same number of columns. Then A and B are right-coprime if there exist X, Y ∈ M(R) such that XN + Y D = I .
(8.1.1)
280
8. EXTENSIONS TO GENERAL SETTINGS
Suppose F ∈ M(F). A pair (N, D) where N, D ∈ M(R) is a right-coprime factorization (r.c.f.) of F if (i) D is square and |D| = 0, (ii) F = ND −1 , and (iii) N, D are right-coprime. Left-coprime matrices and l.c.f.’s are defined analogously. Given an arbitrary F ∈ M(F), the existence of an r.c.f. for F is not automatic unless R is a Bezout domain (see Corollary 8.1.5 below). However, if F does have an r.c.f., then it is easy enough to characterize all r.c.f.’s of F . In fact, the relevant part of Theorem 4.1.13 carries over in toto. For ease of reference, the result is repeated here, but the proof is omitted. Lemma 8.1.1 Suppose F ∈ M(F) has an r.c.f. (N, D). Then the set of all r.c.f.’s of F consists of the set of pairs (NU, DU ) as U ranges over all matrices in U(R).
But when does an F ∈ M(F) have an r.c.f.? We begin with the scalar case. Lemma 8.1.2 Suppose f ∈ F , and express f as a/b where a, b ∈ R.Then f has a coprime factorization if and only if the ideal generated by a and b in R is principal.
Proof. “if ” Suppose g is a generator of the ideal generated by a and b. Then a = ng, b = dg, g = xa + yb for appropriate elements n, d, x, y ∈ R. Since b is the denominator of a fraction, it is nonzero, which in turn implies that both d and g, are nonzero. This, together with earlier relations, leads to f = n/d, xn + yd = 1. Hence, (n, d) is a coprime factorization of f . “only if ” Suppose (n, d) is a coprime factorization of f , and choose x, y ∈ R such that xn + yd = 1. Let r = b/d. Then r = bd −1 = b(xnd −1 + y) = b(xab−1 + y) = xa + yb ∈ R .
(8.1.2)
Hence, d divides b in R, and dr = b. This implies that a = nr, and that the ideal generated by a, b 2 is the same as the principal ideal generated by r. The matrix case is essentially similar, if one observes that every matrix F ∈ M(F) can be expressed in the form AB −1 where A, B ∈ M(R) and |B| = 0. To see this, suppose fij = pij /qij where pij , qij ∈ R. Define b = qij , and let aij = bpij /qij ∈ R. If A denotes the matrix of the i
j
aij ’s, then F = A(bI )−1 . The result in the matrix case is stated without proof. Lemma 8.1.3
Suppose F ∈ Fn×m , and express F in the form AB −1 where A, B ∈ M(R) and |B| = 0.
Define I = {XA + Y B : X ∈ Rm×n , Y ∈ Rm×m } ,
(8.1.3)
and note that I is a left ideal in Rm×m . Then F has an r.c.f. if and only if I is a left principal ideal. Lemma 8.1.4
Every f ∈ F has a coprime factorization if and only if R is a Bezout domain.
8.1. COPRIME FACTORIZATIONS OVER A COMMUTATIVE RING
281
Proof. The following statement is easy to prove: R is a Bezout domain if and only if every pair of elements in R generates a principal ideal. So, if R is not Bezout, then there exist a, b ∈ R such that the ideal generated by a and b is not principal. Obviously, both a and b are nonzero. Hence, f = a/b ∈ F, but does not have a coprime factorization, by Lemma 8.1.2. This proves the “only if ” part. The “if ” part follows readily from Lemma 8.1.2. 2 Corollary 8.1.5
Every F ∈ Fn×m has an r.c.f. for all integers n, m ≥ 1 if and only if R is a Bezout
domain. Proof. If R is a Bezout domain, the proof of Theorem 4.1.13 applies to show that every F ∈ Fn×m has an r.c.f. for all n, m ≥ 1. 2 The next several paragraphs are devoted to deriving conditions for a matrix in M(R) to have a left (or right) inverse in M(R), using the idea of reduction modulo maximal ideals. Since a finite set of matrices A1 , · · · , An in M(R) are left-coprime if and only if the matrix [A1 , · · · , An ] has a right inverse, this problem arises naturally in the study of the coprimeness of two (or more) matrices. In the case where R is a Bezout domain, several equivalent criteria for left invertibility are given in Section 4.1. It turns out that Corollary 4.1.5 is the result that extends most naturally to the case of a general ring R (see Theorem 8.1.8 below). An ideal I in R is maximal if I = R and I is not properly contained in any ideal of R other than R itself. Equivalently (see [116, p. 150]), I is maximal if and only if the factor ring R/I is a field. Let max(R) denote the set of maximal ideals of R. Then R/I is a field for each I ∈ max(R), even though for different I this field could be different. However, if R is in addition a Banach algebra over the complex field, then R/I is isomorphic to the complex field C for all I ∈ max(R) [42, p. 31]. For x ∈ R and I ∈ max(R), let xˆI denote the coset x + I in the factor ring R/I. Lemma 8.1.6
Suppose a1 , · · · , an ∈ R. Then there exist x1 , · · · , xn ∈ R such that n
xi ai = 1
(8.1.4)
i=1
if and only if rank [aˆ 1I · · · aˆ nI ] = 1 ∀I ∈ max(R) .
(8.1.5)
Remarks 8.1.7 Note that (8.1.4) is equivalent to the statement that a1 , · · · , an generate R, while (8.1.5) states that no maximal ideal of R contains all of the ai ’s. Thus, Lemma 8.1.6 is a natural restatement of a well-known result [42, p. 24] that an element in R is a unit if and only if it is not contained in any maximal ideal of R.
282
8. EXTENSIONS TO GENERAL SETTINGS
Proof. “if ” Suppose (8.1.5) holds and let J denote the ideal generated by the ai ’s. If J = R, then J is contained in a maximal ideal of R [116, p. 151]. Suppose J ⊆ I ∈ max(R). Then I contains all the ai ’s, in contradiction to (8.1.5). Hence, J = R, i.e., (8.1.4) holds. “only if ” Suppose (8.1.4) holds. If (8.1.5) is false, then there exists an I ∈ max(R) that contains all the ai ’s. In view of (8.1.4), this implies that 1 ∈ I , which is absurd. 2
Theorem 8.1.8
Suppose A ∈ Rn×m with n ≥ m. Then A has a left inverse in Rm×n if and only if1 rank Aˆ I = m ∀I ∈ max(R) .
(8.1.6)
An equivalent, and extremely useful, way of stating this theorem is: A ∈ Rn×m has a left inverse if and only if the set of m × m minors of A together generate R. Proof. “only if ” Suppose B ∈ Rm×n is a left inverse of A, so that BA = Im . Since the map x → xˆI from R into R/I is a homomorphism for each I ∈ max(R), it follows, upon taking cosets modulo I, that Bˆ I Aˆ I = Im ∀I ∈ max(R) . (8.1.7) Now (8.1.7) implies (8.1.6) . “if ” This part of the proof is made more intelligible by using the symbol S(m, n) defined in Section B.1. Recall that S(m, n) denotes the set of all strictly increasing m-tuples taken from {1, · · · , n}. The case m = n is very easy: In this case (8.1.6) merely states that |A| is not contained in any maximal ideal of R. Hence, |A| is a unit of R, and A−1 ∈ Rm×m . Now suppose m < n. For each m-tuple J ∈ S(m, n), let FJ ∈ Rm×m denote the m × m submatrix of A corresponding to the rows in J. In other words, if J = {j1 , · · · , jm }, then ⎡
aj1 1 ⎢ .. FJ = ⎣ . a jm 1
···
⎤ a j1 m .. ⎥ . . ⎦
(8.1.8)
· · · a jm m
Let fJ = |Fj | denote the m × m minor of A corresponding to the rows in J . Then (8.1.6) states that, for each I ∈ max(R), we have fˆJ I = 0 for some J ∈ S(m, n). Hence, by Lemma 8.1.6, the minors fJ , J ∈ S(m, n) together generate R; i.e., there exist elements xJ , J ∈ S(m, n) such that xJ fJ = 1 . (8.1.9) J ∈S(m,n) 1 Note that Aˆ is an n × m matrix, with elements in the field R/I, whose ij -th element is aˆ . I ij I
8.1. COPRIME FACTORIZATIONS OVER A COMMUTATIVE RING
283
Next, for each J ∈ S(m, n), define a matrix BJ ∈ R as follows: Suppose J = {j1 , · · · , jm }, m×m define FJ as in (8.1.8), and let GJ ∈ R denote the adjoint matrix of FJ . Let gJ 1 , · · · , gJ m denote the columns of GJ . Then m×n
BJ = [0 gJ 1 gJ 2 · · · gJ m 0].
(8.1.10)
In other words, columns j1 , · · · , jm of BJ equal gJ 1 · · · , gJ m , respectively, while all other columns of BJ are zero. By construction, (8.1.11) BJ A = GJ FJ = fJ Im . Now define
B=
xJ BJ .
(8.1.12)
J ∈S(m,n)
Then BA = Im , from (8.1.11) and (8.1.9).2
2
Suppose A, B ∈ M(R) have the same number of columns. Then A and B are right
coprime if and only if the matrix [Aˆ I Bˆ I ] has full column rank for all I ∈ max(R).
Corollary 8.1.9
Application 8.1.10 [110] Let k be an algebraically closed field (e.g., the complex numbers), and let R be the ring k[x1 , · · · , xr ] of polynomials in the r indeterminates x1 , · · · , xr with coefficients in k. This ring arises in the study of multidimensional linear systems. For each r-tuple p = (p1 , · · · , pr ) ∈ k r , the set (8.1.13) Ip = {φ ∈ k[x1 , · · · , xr ] : φ(p) = 0}
of polynomials that vanish at p is a maximal ideal of R. Moreover, since k is algebraically closed, every maximal ideal of R is of the form (8.1.13) for some p ∈ k r [4, p. 82]. Now R/Ip is isomorphic to k for all p ∈ k r . Further, the canonical map φ → φ + Ip takes the polynomial φ into the field element φ(p). In this case, Lemma 8.1.6 is the “weak” form of Hilbert’s nullstellensatz, namely: A finite collection of polynomial φ1 , · · · , φn generates R if and only if, at each point p ∈ k r , at least one of the φ(p) is nonzero. Corollary 8.1.9 states that two matrices A, B ∈ M(R) are right-coprime if and only if the matrix [A (p) B (p)] has full column rank for all p ∈ k r . If R is a Banach algebra over the complex field, then Lemma 8.1.6 and Theorem 8.1.8 continue to apply, but some further refinements are possible because of the topological structure of the maximal ideal space of R. As mentioned earlier, in this case R/I is isomorphic to C for all I ∈ max(R). Hence, in this case one can think of the canonical projection x → xˆI as a homomorphism from R into C. Thus, with every I ∈ max(R) one can associate a nonzero homomorphism from R to C.The converse is also true: If φ : R → C is a nonzero homomorphism, then the kernel of φ is a maximal ideal of 2This proof was communicated to the author by N.K. Bose.
284
8. EXTENSIONS TO GENERAL SETTINGS
R. Let us now switch notation. Let denote the maximal ideal space of R, and for each ω ∈ and each x ∈ R, let xˆω denote the image (in C) of x under the homomorphism associated with ω. Let be equipped with the weakest topology in which the map ω → xˆω is continuous for each x ∈ R. Such a topology can be defined in terms of its base, consisting of sets of the form B(ω; x1 , · · · , xn ) = {θ ∈ : |xˆiθ − xˆiω | < ε ∀i} .
(8.1.14)
Then becomes a compact topological space. Let C() denote the Banach algebra of continuous (complex-valued) functions on , with the supremum norm; i.e., f C = max |f (ω)| , ω∈
(8.1.15)
and multiplication defined pointwise. Then, with each element x ∈ R, one can associate an element xˆ ∈ C() by the rule x(ω) ˆ = xˆω ∀ω ∈ . The function xˆ ∈ C() is called the Gel’fand transform of ˆ denote the subset of C() consisting of the set of Gel’fand transforms of all x ∈ R. Then x. Let R ˆ is a subalgebra of C(), but does not in general equal C(); in fact, R ˆ may not even be a closed R subalgebra of C(). The utility of the above theory arises from the fact that, for several Banach algebras of interest, the maximal ideal space can be identified explicitly. Examples 8.1.11 Let R = A, the disc algebra defined in Section 2.4. One can think of A as the set of transfer functions of 2 -stable shift-invariant digital filters whose frequency responses are continuous. The maximal ideal space of A is the closed unit disc D. For each z ∈ D, the map f → f (z) (the “evaluation” map) defines a homomorphism from A into C; conversely, all nonzero homomorphisms from A into C are of this type. Hence, given f ∈ A, the function z → f (z) is the Gel’fand transform of f . Since the space C() is just the set of continuous functions C(D), in this case the Gel’fand transform is just the injection map of A into C(D). Let R = 1 (Z+ ), the set of sequences {fi }∞ i=0 that are absolutely summable, with convolution as the multiplication. As shown in [26, 97], 1 (Z+ ) is precisely the set of unit-pulse responses of ∞ -stable shift-invariant causal digital filters. The maximal ideal space of 1 (Z+ ) is once again the closed unit disc D. Corresponding to each z ∈ D, one can associate the homomorphism
fˆ(z) =
∞
fi zi ,
(8.1.16)
i=0
which maps 1 (Z+ ) into C. The map z → fˆ(z) which is a function in C(D) is the Gel’fand transform of f ∈ 1 (Z+ ). In this case, it is just the usual z-transform of the sequence {fi }. More generally, let R = 1 (Zn+ ) denote the set of all absolutely summable n-indexed sequences {fi1 ···in } where the each index ranges over the nonnegative integers. The maximal ideal space of R
8.1. COPRIME FACTORIZATIONS OVER A COMMUTATIVE RING
285
n
is D , the cartesian product of the unit disc with itself n times, also referred to as the n-polydisc. Suppose z = (z1 , · · · , zn ) ∈ Dn . Then the function ∞
fˆ(z) =
···
i1 =0
∞
fi1 ···in z1i1 · · · znin
(8.1.17)
in =0
defines a homomorphism from 1 (Zn+ ) into C. The map z → fˆ(z) which belongs to C(Dn ) is the Gel’fand transform of f , which is again the usual z-transform. Now suppose R = 1 (Z), the set of absolutely summable two-sided sequences {fi }∞ i=−∞ . In this case the maximal ideal space of R is the unit circle T. For each z ∈ T, the corresponding homomorphism from 1 (Z) to C is again given by (8.1.16), with the summation now running from −∞ to ∞. If R = 1 (Zn ), then the maximal ideal space of R is Tn , and (8.1.17) defines a homomorphism of R into C (with the obvious change of 0 to −∞ everywhere). One can replace Zn by any locally compact Abelian group and the theory is still complete; see [80]. In some applications, one does not know the maximal ideal space of R explicitly, but does know a dense subset of . Two such instances are discussed in Examples 8.1.15 and 8.1.16 below. Theorem 8.1.8 can be easily extended to this case. Suppose R is a Banach algebra over C, with maximal ideal space . Suppose is a dense subset of , and suppose a1 , · · · , an ∈ R. Then there exist x1 , · · · , xn ∈ R such that
Lemma 8.1.12
n
xi ai = 1
(8.1.18)
|aˆ i (ω)| > 0 .
(8.1.19)
i=1
if and only if inf
ω∈
n i=1
Proof. “if ” Suppose (8.1.19) holds; then it is claimed that n
|aˆ i (ω)| = 0 ∀ω ∈ ,
(8.1.20)
i=1
or equivalently, that at least one aˆ i (ω) is nonzero at each ω ∈ . The proof of (8.1.20) is by contradiction. Suppose that, for some ω0 ∈ , we have aˆ i (ω0 ) = 0 ∀i. Since is dense in , there is a sequence3 {ωj } in converging to ω0 , and n i=1
|aˆ i (ωj )| →
n
|aˆ i (ω0 )| = 0 .
i=1
3 If is not a first countable topological space, this argument has to be modified by replacing all sequences by nets.
(8.1.21)
286
8. EXTENSIONS TO GENERAL SETTINGS
But (8.1.21) contradicts (8.1.19). Hence, (8.1.20) is true. The existence of the xi ’s satisfying (8.1.18) now follows from Lemma 8.1.6. “only if ” Suppose (8.1.18) holds. Then, by Lemma 8.1.6, (8.1.20) is also true. Now, since is a compact topological space, (8.1.20) implies that inf
ω∈
n
|aˆ i (ω)| > 0 .
(8.1.22)
i=1
Finally, (8.1.22) implies (8.1.19), since is a subset of .
2
Let R, , be as in Lemma 8.1.12, and suppose A ∈ Rn×m with n ≥ m. Then A has a left inverse in R if and only if ˆ >0, (8.1.23) inf σm (A(ω)) Lemma 8.1.13
m×n
ω∈
where σm (·) denotes the smallest singular value of a matrix.
An equivalent way of stating (8.1.23) is as follows: There exists an α > 0 such ˆ that Aˆ ∗ (ω)A(ω) ≥ αI ∀ω ∈ .
Remarks 8.1.14
ˆ Proof. “if ” Suppose σm (A(ω)) ≥ δ ∀ω ∈ . Then a simple exercise in numerical analysis shows that
|aˆ J (ω| ≥
J ∈S(m,n)
nδ 2 > 0 ∀ω ∈ , c
(8.1.24)
ˆ and c = n!/[m!(n − m)!]. From where aˆ J , J ∈ S(m, n) are the various m × m minors of A, Lemma 8.1.12, (8.1.24) implies that the minors aˆ J , J ∈ S(m, n) generate R. The existence of a left inverse for A now follows as in the proof of Theorem 8.1.8. “only if ” Suppose A has a left inverse. Then, by Theorem 8.1.8, the minors aJ , J ∈ S(m, n) of A generate R. Hence, by Lemma 8.1.12, inf |aˆ j (ω)| > 0 . (8.1.25) ω∈
It is now routine to prove (8.1.23).
J
2
Application 8.1.15 Let R = H∞ , which can be interpreted as the set of transfer functions of causal, 2 -stable, shift-invariant digital filters, without the requirement that the frequency response be continuous. The maximal ideal space of H∞ is not known explicitly. However, an extremely deep result due to Carleson [34, Chapter 12], called the corona theorem, states that the open unit disc
8.1. COPRIME FACTORIZATIONS OVER A COMMUTATIVE RING
287
is a dense subset of the maximal ideal space of H∞ . Hence, Lemma 8.1.12 states, in this case, that two functions f, g ∈ H∞ are coprime if and only if inf {|f (z)| + |g(z)|} > 0 .
|z|0.
s∈C+
(8.1.30)
Now we change direction somewhat. Lemma 8.1.4 shows that if R is not a Bezout domain, then not every matrix in M(R) has an r.c.f., and not every matrix has an l.c.f. At this stage it is natural to ask: If a matrix has one, does it have the other? The rest of this section is devoted to answering this question for various rings of interest.
288
8. EXTENSIONS TO GENERAL SETTINGS
Suppose A ∈ Rm×n with m < n; then we say that A can be complemented if there exists a unimodular matrix U ∈ Rn×n containing A as a submatrix. Suppose A can be complemented, and let U = [A B ] ∈ Rn×n be unimodular. By expanding |U | along the first m rows using Laplace’s expansion (Fact B.1.4), we see that the m × m minors of A must together generate R, or equivalently (by Theorem 8.1.8), that A has a right inverse in Rn×m . Thus, only right-invertible matrices can be complemented. But is right-invertibility sufficient? We postpone answering this question in order to demonstrate the relevance of complementability to the issue at hand. ˜ N). ˜ Then F has an r.c.f. if and only if the matrix Suppose F ∈ M(F) has an l.c.f. (D, A = [D˜ N˜ ] can be complemented.
Lemma 8.1.17
˜ is obtained from A by elementary Proof. “if ” Suppose A can be complemented. Since B = [−N˜ D] 4 column operations, B can also be complemented. Suppose Y X U= (8.1.31) −N˜ D˜ is a unimodular matrix containing B as a submatrix, and partition U −1 ∈ M(R) as D −X˜ U −1 = . N Y˜
(8.1.32)
The claim is that (N, D) is an r.c.f. of F . From U U −1 = I , we get Y D + XN = I , ˜ + DN ˜ =0. −ND
(8.1.33) (8.1.34)
Thus, D and N are right-coprime from (8.1.33), and the proof is complete if it can be established that |D| = 0, because (8.1.34) would then imply that ND −1 = D˜ −1 N˜ = F . ˜ N) ˜ is an To show that D is nonsingular, begin by observing that D˜ is nonsingular, since (D, l.c.f. of F . Therefore the matrix I 0 D −X˜ −1 U = (8.1.35) −N˜ D˜ 0 I is also nonsingular. It now follows from (8.1.35) that |D| = 0. “only if ” This part exactly parallels the proof of Theorem 4.1.16. Suppose (N, D) is an r.c.f. ˜ Y˜ ∈ M(R) such that XN + Y D = I, N˜ X˜ + D˜ Y˜ = I . Define of F , and select X, Y, X, Y X D −X˜ U= , W = . (8.1.36) −N˜ D˜ N Y˜ 4The change from [D ˜ N˜ ] to [−N˜ D] ˜ is made simply to make U resemble the matrix in Theorem 4.1.16.
8.1. COPRIME FACTORIZATIONS OVER A COMMUTATIVE RING
Then,
UW =
I −Y X˜ + X Y˜ . 0 I
289
(8.1.37)
Since |U W | = 1, it follows that |U | is a unit and that U is unimodular. This shows that B = ˜ can be complemented. The same is true of A = [D˜ N], ˜ since A can be transformed to B [−N˜ D] by elementary column operations. 2
Suppose F ∈ M(F) has an r.c.f. (N, D). Then F has an l.c.f. if and only if the matrix can be complemented.
Corollary 8.1.18
[D
N ]
The proof is easy and is left as an exercise. Suppose A ∈ Rm×n be complemented.Then by definition there exists a matrix B ∈ R(n−m)×m such that [A B ] is unimodular. The next result parametrizes all such matrices B. Suppose A ∈ Rm×n can be complemented, and suppose [A B0 ] is unimodular. Then the set of B ∈ M(R(n−m)×m such that [A B ] is unimodular is given by Lemma 8.1.19
{B : B = RA + V B0 , where R ∈ M(R), V ∈ U(R)} .
(8.1.38)
Proof. First, suppose B = RA + V B0 where V ∈ U(R). Then, A A I = = B RA + V B0 R
0 V
A , B0
(8.1.39)
which is unimodular since both matrices on the right side are in U(R). Conversely, suppose [A B ] ∈ U(R). Partition [A B0 ] −1 as [F G], and note that AF = I, AG = 0. Now, by assumption, −1 A A A I = [F G] = B B0 B BF
0 BG
(8.1.40)
is in U(R), which shows that BG ∈ U(R). Let V = BG, R = BF . Then, from (8.1.38), A A I 0 A = , (8.1.41) = RA + V B0 B R V B0 which implies that B = RA + V B0 .
2
290
8. EXTENSIONS TO GENERAL SETTINGS
Now we return to the question: When can every right-invertible matrix over R be complemented? The answer is provided by the notion of a Hermite ring. A row [a1 , · · · , an ] ∈ R1×n is a unimodular row if a1 , · · · , an together generate R. A ring R is Hermite if every unimodular row can be complemented. Lemma B.2.1 shows that every Bezout domain is Hermite. But it is shown below that several other commonly encountered rings are Hermite, though they are not Bezout. Before this, a minor technicality is disposed of. The following statements are equivalent: (i) R is a Hermite ring. (ii) If A ∈ Rk×n has a right inverse, then A can be complemented.
Lemma 8.1.20
Remarks 8.1.21 (a) The objective of the lemma is to deal with “fat” matrices, not just row vectors; (b) compare (ii) with Corollary 4.1.7.
Proof. (ii) ⇒ (i) Obvious: just let m = 1. (i) ⇒ (ii) Fix n. The proof is by induction on k. The definition of a Hermite ring shows that (ii) is true when k = 1. Suppose (ii) is true for matrices in Rm×n whenever m ≤ k − 1 < n − 1, and suppose A ∈ Rk×n has a right inverse, say C. Partition A and C as shown below: 1 A (8.1.42) A = 2 , C = [C1 c2 ] , a where a 2 is the last row of A and c2 is the last column of C. Then A1 C1 = Ik−1 . Hence, by the inductive hypothesis (and taking transposes), there exists a unimodular matrix U ∈ Rn×n of the form U = [C1 D]. Now 1 Ik−1 Z A =: M . (8.1.43) AU = 2 [C1 D] = 0 y a Since A has a right inverse, so does M, and hence so does y ∈ R1×(n−k+1) . Let b ∈ R(n−k+1)×1 be a right inverse of y. Since R is Hermite, there exists a unimodular matrix B ∈ R(n−k+1)×(n−k+l) containing b as its first column. Thus, Ik−1 ZB Ik−1 × × Ik−1 0 = = =: N , (8.1.44) AU 0 B 0 yB 0 1 × since the first element of yB ∈ R1×(n−k+1) is a one. (In (8.1.44), × is a generic symbol denoting an element or a matrix whose value is unimportant.) Now the matrix N in (8.1.44) can be complemented, since the determinant of ⎡ ⎤ Ik−1 × × ⎣ 0 (8.1.45) 1 × ⎦ 0 0 In−k is one. Since N is a right associate of A, it follows that A can also be complemented.
2
8.1. COPRIME FACTORIZATIONS OVER A COMMUTATIVE RING
291
Suppose R is a Hermite ring and that A ∈ R with m < n. Then A has a right m×m n×n inverse if and only if there exist unimodular V ∈ R ,U ∈ R such that m×n
Corollary 8.1.22
V AU = [Im 0] .
(8.1.46)
Corollary 8.1.22 shows that in a Hermite ring, even though there is no Smith form in general for arbitrary matrices, there is one for right-invertible matrices (and of course for left-invertible matrices). The following three statements are equivalent: (i) R is a Hermite ring. (ii) If F ∈ M(F) has an l.c.f., then it has an r.c.f. (iii) If F ∈ M(F) has an r.c.f., then it has an l.c.f.
Theorem 8.1.23
Proof. The equivalence of (ii) and (iii) follows readily by taking transposes. ˜ N). ˜ Then A = (i) ⇒ (ii) Suppose R is Hermite, and suppose F ∈ M(F) has an l.c.f., say (D, ˜ ˜ [D N ] has a right inverse. From Lemma 8.1.20, A can be complemented, and from Lemma 8.1.17, it follows that F has an r.c.f. (ii) ⇒ (i) Suppose [a1 · · · an ] ∈ R1×n is an arbitrary unimodular row; it must be shown that this row can be complemented. Since a1 , · · · , an generate R, at least one of the ai must be nonzero. Suppose a1 = 0 by renumbering if necessary, and defineF ∈ F1×(n−1) by a2 an · · · F = . (8.1.47) a1 a1 Now the unimodularity of the row [a1 · · · an ] shows that F has an l.c.f., namely (a1 , [a2 · · · an ]). Since (ii) is assumed to hold, F also has an r.c.f. By Lemma 8.1.17, this implies that the row [a1 · · · an ] can be complemented. Hence, R is Hermite. 2 Theorem 8.1.23 shows that Hermite rings are the next best thing to Bezout domains, at least as far as controller synthesis theory goes. If R is Bezout, then every matrix in M(F) has both an r.c.f. and an l.c.f., and all of the synthesis theory of Chapter 5 goes through very nicely. If R is not Bezout, then not every matrix in M(F) has an r.c.f., and not every matrix has an l.c.f. However, if R is at least Hermite, then the matrices in M(F) that have an r.c.f. are precisely those that have an l.c.f. Moreover, this property characterizes Hermite rings. It is therefore worthwhile to explore whether various commonly encountered rings are Hermite. If R is a complex Banach algebra, then there is a very general result available. The proof of this result is well beyond the scope of this book and can be found in [62]; see [62, Theorem 3]. Suppose R is a complex Banach algebra with maximal ideal space . Then a unimodular row [a1 · · · an ] in R can be complemented if and only if the row [aˆ 1 · · · aˆ n ] can be complemented in C(). In particular, if is contractible, then R is Hermite.
Theorem 8.1.24
292
8. EXTENSIONS TO GENERAL SETTINGS
Recall that a topological space is contractible if the identity map on is homotopic to a constant map. In other words, is contractible if there exists an ω0 ∈ and a continuous function f : [0, 1] × → such that f (0, ω) = ω, f (1, ω) = ω0 , ∀ω ∈ .
(8.1.48)
Theorem 8.1.24 is similar in spirit to one of the basic results in the Gel’fand theory of Banach algebras, which states that, given any complex Banach algebra R and any f ∈ R, f can be inverted in R if and only if fˆ can be inverted in C(). Note that a convex set in C n is contractible: If ⊆ C n is convex, let ω0 ∈ be arbitrary and define f (λ, ω) = λω0 + (1 − λ)ω . (8.1.49) This observation leads to a very useful criterion. Corollary 8.1.25
convex subset of
Cn
Suppose R is a complex Banach algebra whose maximal ideal space is a (compact) for some integer n. Then R is a Hermite ring.
[6, 54] The disc algebra, 1 (Z+ ), 1 (Zn+ ) are all Hermite. Theorem 8.1.24 and Corollary 8.1.25 are not valid in general if R is a real Banach algebra. However, in [6, 54] it is shown that the following real algebras are indeed Hermite rings: (i) The set As of functions in the disc algebra A whose Fourier coefficients are all real (i.e., the set of all symmetric functions in A). (ii) The set 1r (Zn+ ) consisting of all real sequences in 1 (Zn+ ). Examples 8.1.26
Theorem 8.1.24 suggests a simple example of a ring that is not Hermite. Let S 2 denote the unit sphere in R3 , and let R = C(S 2 ), the set of all continuous complex-valued functions on S 2 .5 Then, = S 2 , which is not contractible. It is claimed that R is not a Hermite ring. To show this, it is enough to demonstrate a unimodular row in R1×3 that cannot be complemented. Define a = [a1 a2 a3 ] by setting a(x) equal to the unit outward normal vector at x, for each x ∈ S 2 . Then a(·) is continuous and therefore belongs to R1×3 . Moreover, since a(x) = 0 ∀x ∈ S 2 , it follows from Lemma 8.1.6 that a is a unimodular row. Now suppose a could be complemented, and suppose [a b c ] ∈ R3×3 is unimodular. Then for each x ∈ S 2 , the three vectors a(x), b(x), c(x) must span the vector space C 3 . Let T (x) denote the (two-dimensional) tangent space to S 2 at x. Since span {a(x), b(x), c(x)} = C 3 contains T (x), and since a(x) is orthogonal to T (x), we see that the projections of b(x), c(x) onto T (x) must both be nonzero and must together span T (x). Let f(x), g(x) denote the projections of b(x), c(x) onto T (x), respectively. Then f(·) is a vector field on S 2 that does not vanish anywhere. However, this contradicts the Borsuk-Ulam Example 8.1.27
5This example was suggested to the author by Ken Davidson.
8.1. COPRIME FACTORIZATIONS OVER A COMMUTATIVE RING
theorem [46, p. 116], which states that every vector field on S 2
293
must have a zero. Hence, a(·) cannot
be complemented, and R is not a Hermite ring. If we define F = [a2 /a1 a3 /a1 ], then F has an l.c.f. but not an r.c.f. We conclude this section with a theorem that is very important in several branches of control theory and indeed in mathematics. Let k be a field, and let R = k[x1 , · · · , xl ], the ring of polynomials in l indeterminates with coefficients in k. Then R is a Hermite ring.
Theorem 8.1.28 (Quillen-Suslin)
(i) Until it was proven, the above result was known as Serre’s conjecture. See [61] for an excellent historical introduction to this result, its origins and its significance. (ii) Quillen [76] shows that the result remains valid if k is replaced by any p.i.d. However, the proof given below does not apply to this case. (iii) The ring k[[x1 , · · · , , xl ]] of formal power series in l indeterminates with coefficients in k is also a Hermite ring; see [63]. (iv) The proof given below is taken from [79, pp. 138–145]. See [112] for another proof that applies to the special case when k = C.
Remarks 8.1.29
The elementary proof of Theorem 8.1.28 is rather long, and makes use of a few concepts not previously introduced. A ring B is local if it contains a unique maximal ideal [4, p. 38]. Suppose A is a domain, and let P be a prime ideal in A. The set consisting of all fractions of the form a/b where a ∈ A, b ∈ A \ P, is a local ring, called the localization of A at P, and is denoted by Ap . The unique maximal ideal of Ap is the set of fractions a/b where a ∈ P, b ∈ A \ P. The pertinent property of local rings is the following: If B is a local ring and J is its unique maximal ideal, then every b ∈ / J is a unit. This follows from Lemma 8.1.6. If B is a ring and b1 , · · · , bn ∈ B, the symbol (b1 , · · · , bn ) denotes the ideal generated by the bi ’s. If f ∈ B[y], then δ(f ) denotes the degree of the polynomial f . More generally, if f ∈ B[y1 , · · · , ym ], then δ(f ) denotes the total degree of f . That is, if f is of the form fi y1i1 · · · ylil , (8.1.50) f (y1 , · · · , yl ) = i
where i is a shorthand for the l-tuple (i1 , · · · , il ) then δ(f ) is the largest of the sums (i1 + · · · + il ) for which the coefficient fi is nonzero. The proof of Theorem 8.1.28 is basically inductive. Clearly k[x1 ] is Hermite, since it is a p.i.d. (see Lemma B.2.1). The proof consists of showing that if k[x1 , · · · , xl−1 ] is Hermite, then so is k[x1 , · · · , xl ]. This proof consists of two independent steps. The first step (which is actually proved last here, since it is more complicated than the other) is to show that if B is a Hermite ring and [a1 , · · · , an ] is a unimodular row in B1×n , then it can be complemented, provided one of the ai ’s is a
294
8. EXTENSIONS TO GENERAL SETTINGS
monic polynomial. The second step (which is proved first) is to show that in the ring k[x1 , · · · , xl ], it is always possible to effect a change of variables in such a way that every unimodular row contains at least one monic polynomial in the new set of variables. Lemma 8.1.30 (Nagata)
Suppose a ∈ R, and let m = δ(a) + 1. Define a new set of indeterminates yl = xl , yi = xi − xlm
l−i
.
(8.1.51)
Then a(x1 , · · · , xl ) = rb(y1 , . . . , yl ), where r = 0 belongs to k,and b is a monic polynomial in yl (with coefficients in k[y1 , · · · , yl−1 ]). Proof. Denote an l-tuple (j1 , · · · , jl ) where each of the ji is nonnegative by the symbol j. Then the polynomial a can be written as j j rj x1i · · · xl l , (8.1.52) a= j
where rj ∈ k. Note that (8.1.51) has the inverse transformation xl = yl , xi = yi + ylm
l−i
.
Substituting from (8.1.53) into (8.1.52) gives l−1 j rj (y1 + ylm )j1 · · · (yl−1 + ylm )jl−1 yl l . b=
(8.1.53)
(8.1.54)
j
Now let m denote the l-tuple (ml−1 , · · · , m, 1). The first point to note is that if i = (i1 , · · · , il ) and j = (j1 , · · · , jl ) are two distinct l-tuples with 0 ≤ ik , jk ≤ m − 1 ∀k, then i · m and j · m are distinct integers, where · denotes the usual vector dot product. This follows upon noting that i · m < ml , and that the l-tuple i is just the representation of the integer i · m with respect to the base m; similarly for the integer j · m. Returning to (8.1.54), let J = {j : rj = 0}. Then there is a unique l-tuple j0 ∈ J such that j0 · m > j · m ∀j ∈ J \ j0 . Hence, when expanded, (8.1.54) is of the form j ·m
b = rj0 yl 0
+ terms involving lower powers of yl .
Hence, b equals rj0 times a monic polynomial in yl of degree j0 · m.
(8.1.55)
2
Note that if k is an infinite field, then it is possible to achieve the same result as in Lemma 8.1.30 by using only a linear transformation of variables [112]. However, the more complicated transformation is needed in the case of an arbitrary (and possibly finite) field. To prove that R = k[x1 , · · · , xl ] is a Hermite ring, suppose [a1 · · · an ] is an arbitrary unimodular row in R1×n ; it must be shown that this row can be complemented. In order to do this, we can first transform the indeterminates x1 , · · · , xl into the set y1 , · · · , yl using Lemma 8.1.30
8.1. COPRIME FACTORIZATIONS OVER A COMMUTATIVE RING
295
such that one of the bi ’s is monic in yl , where bi (y1 . · · · , yl ) = a(x1 , · · · , xl )∀i. Since the rings k[x1 , · · · , xl ] and k[y1 , · · · , yl ] are clearly isomorphic, it is enough to show that the row [b1 · · · bn ] can be complemented. Thus, the proof of Theorem 8.1.28 is complete once the following lemma is established. Note that this lemma is of independent interest. Suppose B is a Hermite ring, and let R = B[x]. Suppose [a1 · · · an ] ∈ R1×n is a unimodular row, and that at least one of the ai ’s is monic. Then this row can be complemented. Lemma 8.1.31
The proof of Lemma 8.1.31 requires two preliminary results. Lemma 8.1.32 (Suslin)
Suppose B is a ring, R = B[x], and suppose f, g ∈ R are of the form f (x) = x r + a1 x r−1 + · · · + ar , g(x) = b1 x r−1 + · · · + br .
(8.1.56)
Then, for each j ∈ {1, · · · , r} the ideal (f, g) contains a polynomial hj such that δ(hj ) ≤ r − 1 and the leading coefficient of hj is bj . Proof. Define I ⊆ B to be the set of leading coefficients of all h in the ideal (f, g) such that δ(h) ≤ r − 1. Then I is an ideal in B. To see this, suppose p1 , q1 , ∈ I, and let p, q ∈ (f, g) be the corresponding polynomials with leading coefficients p1 , q1 and degrees less than or equal to r − 1. Assume without loss of generality that δ(p) ≤ δ(q). Then the polynomial h(x) = q(x) − x δ(q)−δ(p) p(x)
(8.1.57)
has leading coefficient q1 − p1 and degree no larger than r − 1. Hence, I is closed under subtraction. Finally, it is clear that if p1 ∈ I and s ∈ B, then p1 s ∈ I; just take h(x) = sp(x). To prove the lemma, it is enough to establish the following statement: If h ∈ (f, g) and δ(h) ≤ r − 1, then all coefficients of h belong to I. This is established by induction, by proving that: for each j ∈ {1, · · · , r}, if h ∈ (f, g) and δ(h) ≤ r − 1, then the coefficients of x r−1 , · · · , x r−j belong to I. Now the statement is true for j = 1, from the definition of I. Suppose the statement is true up to j − 1, and suppose h is an arbitrary element of the ideal (f, g) with δ(h) ≤ r − 1. Let δ = δ(h), and write h(x) = h1 x r−1 + · · · + hr , (8.1.58) where some of the hi may be zero. By the inductive hypothesis, h1 , · · · , hj −1 belong to I. Now, xh(x) − h1 f (x) ∈ (f, g) has a degree no larger than r − 1. Moreover, r (hi+1 − h1 ai )x r−i . xh(x) − h1 f (x) =
(8.1.59)
i=1
The (j − 1)-st coefficient of this polynomial is hj − h1 aj −1 , which belongs to I by the inductive hypothesis. Since h1 aj −1 ∈ I, it follows that hj ∈ I. This completes the proof of the inductive step. 2
296
8. EXTENSIONS TO GENERAL SETTINGS
The last preliminary result is the same as Lemma 8.1.31, except that the ring B is restricted to be a local ring. Let B be a local ring, let R = B[x], and suppose [a1 · · · an ] ∈ R1×n is a unimodular row with at least one of the ai being monic. Then this row can be complemented. Lemma 8.1.33
Proof (Suslin). Assume without loss of generality that n ≥ 3,6 and that a1 is monic. The proof is by induction on the degree of a1 . If δ(a1 ) = 0, then a1 = 1 and there is nothing further to prove. If δ(a1 ) =: r > 0, we will show that [a1 · · · an ] can be reduced, through elementary column operations, to a row [b1 · · · bn ] where b1 is monic and δ(b1 ) < r. Accordingly, suppose r = δ(a1 ). Since a1 is monic, by performing column operations if necessary we can assume that δ(ai ) < r for i = 2, · · · , n. Let J be the unique maximal ideal of B, and for b ∈ B let bˆ ∈ B/J denote the coset b + J. For f ∈ R = B[x], define fˆ ∈ [B/J][x] in the obvious way, namely, if f = fi x i , then fˆ = fˆi x i . Now the row [aˆ 1 · · · aˆ n ] is unimodular in [B/J][x]. Since δ(aˆ 1 ) = δ(a1 ) > 0, not all of a2 , · · · , an can be zero. By renumbering if necessary, suppose a2 = 0. Then some coefficient of a2 does not belong to J. Suppose a2 (x) =
r
b2i x r−i ,
(8.1.60)
i=1
/ J. Since B is a local ring, this implies that b2j is a unit. Let us now apply and suppose b2j ∈ Lemma 8.1.32 with f = a1 , g = a2 , and define I as in Lemma 8.1.32. Then Lemma 8.1.32 implies that b2j ∈ I, i.e., that the ideal (a1 , a2 ) contains a monic polynomial of degree less than r. So by elementary column operations, a3 can be replaced by a monic polynomial of degree less than r. Now apply the inductive hypothesis. 2 Proof of Lemma 8.1.31. The lemma is proved by establishing the following fact: There exists a unimodular matrix U ∈ U(R) such that aU = b, where b = [b1 · · · bn ] ∈ B1×n is a unimodular row. In fact, such a transformation is possible (as the following proof will reveal) for any ring B. Since B is Hermite, b can be complemented, whence a can also be complemented. Suppose f, g ∈ R = B[x]; then f · g ∈ B[x] can be defined in the obvious way, namely, if f = fi x i , then (f · g)(x) = fi (g(x))i . Given the row a, define I to be the subset of B consisting of all b ∈ B which have the property that, for each f, g ∈ R there exists a unimodular matrix U = U (f, g) such that a(f + bg) = a(f )U , where a(f + bg), a(f ) are row vectors in R1×n defined by (a(f + bg))i = ai · (f + bg) , (a(f ))i = ai · f . 6 Note that a unimodular row consisting of exactly two elements can always be complemented, in any ring.
(8.1.61)
8.1. COPRIME FACTORIZATIONS OVER A COMMUTATIVE RING
297
Then I is an ideal in B: Clearly 0 ∈ I (take U = I ∀f, g). If b, c ∈ I, then for each f, g ∈ R there exist unimodular matrices U (f, g), V (f, g) such that a(f + bg) = a(f )U, a(f + cg) = a(f )V . Hence, we have a(f + (b + c)g) = a(f + bg + cg) = a(f + bg)V (f + bg, g) = a(f )U (f, g)V (f + bg, g). Since the matrix U (f, g)V (f + bg, g) is unimodular for all f, g ∈ R, it follows that b + c ∈ I. Finally, if b ∈ I and c ∈ B, then for any f, g ∈ R we have a(f + bcg) = a(f )U (f, cg), which shows that bc ∈ I. If it can be shown that I = B, then the proof is complete. If 1 ∈ I, then for each f, g ∈ R there exists a U (f, g) ∈ U(R) such that a(f + g) = a(f ). In particular, let f = x, g = −x; then a(0) = a(x)U for some U ∈ U(R). If we now define b = a(0), then the desired statement follows upon observing that a(0) is a unimodular row in B. To show that I = B, suppose by way of contradiction that I = B. Then I is contained in some maximal ideal of B, say J. Let L denote the localization BJ . Then B ⊆ L, and a is a (monic) unimodular row in L[x]. By Lemma 8.1.33, a can be complemented in L[x]. In other words, there exists a unimodular matrix U over L[x] such that a(x) = [1 · · · 0 0]U (x) .
(8.1.62)
Now adjoin a new indeterminate y to L[x], and define V (x, y) = [U (x + y)]−1 U (x) .
(8.1.63)
Then V is also a unimodular matrix over L[x, y], and V (x, 0) = In . From (8.1.62), it follows that a(x + y) = [1 · · · 0 0]U (x + y) ,
(8.1.64)
so that, from (8.1.64) and (8.1.63), a(x + y)V (x, y) = [1 · · · 0 0]U (x) = a(x) .
(8.1.65)
Next, note that V (x, y) is a matrix over L[x, y], so that each element of V (x, y) is a polynomial in x and y with coefficients in L. Moreover, since V (x, 0) = In , it follows that each element of V is of the form r dij k (x)y k , (8.1.66) Vij (x, y) = cij + k=1
where cij = 0 or 1. Loosely speaking, each term in the polynomial expansion of vij is either a “constant” 0 or 1, or else involves a positive power of y. Recall that L consists of fractions p/q where p ∈ B, q ∈ B \ J. Let b denote the product of the denominators of all the coefficients occurring in the matrix V (x, y); thus b is the product of the denominators of the coefficients of dij k (x), for all i, j, k. Then V (x, by) is actually a unimodular matrix over B[x, y], since replacing y by by wipes out all the denominators. Now (8.1.65) implies that a(x + by)V (x, by) = a(x) .
(8.1.67)
298
8. EXTENSIONS TO GENERAL SETTINGS
So for any f, g ∈ B[x], (8.1.67) implies that a(f + bg)V (f, bg) = a(f ) .
(8.1.68)
Therefore b ∈ I, by definition. On the other hand, b is the product of elements in B \ J, and hence itself belongs to B \ J. In particular, b ∈ / I ⊆ J. This contradiction shows that I = B. 2
8.2
COPRIME FACTORIZATIONS FOR DISTRIBUTED SYSTEMS
Suppose R is a commutative domain with identity, and let F denote the associated field of fractions. The previous section was concerned with the question of when a matrix in M(F) has a right or left-coprime factorization, and when the existence of one implies the existence of the other. It was shown that the notion of a Hermite ring plays an important role in answering the above questions. In particular, if R is a Hermite ring, then a matrix in M(F) has an r.c.f if and only if it has an l.c.f. However, aside from some general and abstract results, very little was said about the existence of coprime factorizations. The objective of this section is to study this issue in more detail for the case of linear distributed systems, both continuous-time as well as discrete-time. This is done by first proving a general theorem about the existence of coprime factorizations, and then specializing this theorem to the two special cases mentioned above. As a consequence, we shall obtain two very large classes of transfer matrices whose elements all have both r.c.f ’s as weil as l.c.f.’s. The main result of this section is given next. Suppose I is a subset of R satisfying two conditions: (I1) I is a saturated multiplicative system; (I2) If a ∈ I and b ∈ R, then the ideal in R generated by a and b is principal. Under these conditions, every matrix G ∈ M(I−1 R) has both an r.c.f. as well as an l.c.f.
Theorem 8.2.1
1) Recall from Section A.2 that a subset I of R is said to be multiplicative if a, b ∈ I implies that ab ∈ I. It is saturated if a ∈ I and b ∈ R divides a together imply that b ∈ I. 2) By induction, one can show that (I2) is equivalent to the following statement: (I2’) If a ∈ I and b1 , . . . , bn ∈ R, then the ideal in R generated by a, b1 , . . . , bn is principal. Let d be a generator of this principal ideal. Then one can show quite easily that d is a greatest common divisor (g.c.d.) of the set {a, b1 , . . . , bn }. 3) Recall from Section A.2 that I−1 R denotes the ring consisting of all fractions of the form a/b, where a ∈ R, and b ∈ I. One may assume without loss of generality that 1 ∈ I. By identifying a ∈ R with a/1, one can imbed R as a subring of I−1 R. Remarks 8.2.2
The proof of Theorem 8.2.1 is facilitated by some preliminary results. Lemma 8.2.3 Suppose a ∈ I, b1 , . . . , bn ∈ R, and let d be a g.c.d. of the set {a, b1 , . . . , bn }. Then there exists a matrix U ∈ U(R) such that |U | = d and [a, b1 , . . . , bn ] is the first column of U .
8.2. COPRIME FACTORIZATIONS FOR DISTRIBUTED SYSTEMS
299
The proof of this lemma exactly follows that of Lemma B.2.1, if one notes that every subset {a, b1 , · · · , bi } has a g.c.d. by condition (I2). Corollary 8.2.4 Suppose a, b1 , · · · , bn have a g.c.d. of 1. Then there exists a U ∈ U(R) with [a, b1 , · · · , bn ] as its first column.
Lemma 8.2.5
Suppose a, b1 , · · · , bn ∈ R have a g.c.d. of d. Then there exists a V ∈ U(R) such that ⎡ ⎤ ⎡ ⎤ a d ⎢b ⎥ ⎢ 0 ⎥ ⎢ 1⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ · ⎥ ⎢·⎥ V ⎢ ⎥ = ⎢ ⎥. (8.2.1) ⎢ · ⎥ ⎢·⎥ ⎢ ⎥ ⎢ ⎥ ⎣ · ⎦ ⎣·⎦ bn 0
Proof. Let c0 = a/d, ci = bi /d, i ≥ 1. Then c0 ∈ I since I is saturated, and 1 is a g.c.d. of all the ci ’s. By Corollary 8.2.4 above, there exists a U ∈ U(R) such that [c0 · · · cn ] is the first column of 2 U . Now let V = U −1 . Suppose G ∈ M(I−1 R) is of order n × m, and G = BA−1 , where B ∈ Rn×m , A ∈ Rm×m , Then G has an r.c.f. provided there exists a U ∈ U(R) such that A D U = . (8.2.2) B 0n×m Lemma 8.2.6
The proof of this lemma is the same as that of Theorem 4.1.3 with a few inessential modifications, and is left to the reader. Proof of Theorem 8.2.1. It is shown that every G ∈ M(I−1 R) has an r.c.f. The existence of an l.c.f. for every such G will then follow by taking transposes. Suppose G ∈ M(I) is of order n × m. Then a typical element gij of G is of the form cij /aij , where cij ∈ R, aij ∈ I ∀i, j . Define a= aij , bij = gij a , (8.2.3) i,j
and let B be the matrix whose elements are the bij ’s, A = aI . Then G = BA−1 , and the theorem is proved if it can be shown that there exists a U ∈ U(R) such that (8.2.2) holds. Such a U is constructed
300
8. EXTENSIONS TO GENERAL SETTINGS
iteratively. As the first step, let dm be a g.c.d. of the set {a, b1m , · · · , bnm }. Such a dm exists, since a ∈ I (by the multiplicativity of the set I). Select a unimodular matrix Vm ∈ R(n+1)×(n+1) such that ⎡ ⎤ ⎡ ⎤ a dm ⎢b ⎥ ⎢ 0 ⎥ ⎢ 1m ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ · ⎥ ⎢ · ⎥ (8.2.4) Vm ⎢ ⎥=⎢ ⎥ . ⎢ · ⎥ ⎢ · ⎥ ⎢ ⎥ ⎢ ⎥ ⎣ · ⎦ ⎣ · ⎦ 0 bnm Such a Vm exists, by Lemma 8.2.5. Now define I Um = m−1 0
0 . Vm
(8.2.5)
Then routine computations show that Um where
A aIm−1 = X B
em , 0
em = [0 · · · 0 dm ] ,
(8.2.6)
(8.2.7)
and X ∈ M(R). It is now possible to repeat the procedure by working on the next-to-last column of the matrix in (8.2.6). After m repetitions of the reduction, one arrives at a matrix of the form shown in (8.2.2) 2 As an application of the above theory, consider the ring A described in (8.1.16), which is recalled here for convenience. The ring A consists of all distributions f (·) with support in [0, ∞) of the form ∞ f (t) = fi δ(t − ti ) + fa (t) , (8.2.8) i=0
where 0 ≤ t1 < t2 < . . . are real numbers, fi are real, f (·) is measurable, and in addition f A =
∞ i=0
∞ |fi | +
|fa (t)| dt < ∞ .
(8.2.9)
0
It can be shown that A is precisely the set of impulse responses of BIBO-stable linear time-invariant systems, and is a ring with pointwise addition and convolution as the product. Now, it can be shown [101] that A is not a Bezout domain. Hence, not every element in the field of fractions of A has a coprime factorization. Moreover, it is not known whether A is a Hermite ring or not. Hence, it is of interest to identify a subset B of the fraction field of A such that every matrix in M(B) has both
8.2. COPRIME FACTORIZATIONS FOR DISTRIBUTED SYSTEMS
301
an r.c.f. and an l.c.f. This is done in [8] and can be described in the current framework as follows: Let ˆ denote the set of Laplace transforms of distributions in A. Then A ˆ is isomorphic to A, and has A pointwise addition and multiplication as the ring operations. It is routine to verify that each Laplace ˆ converges over the closed right half-plane C+e , and is analytic over the open right transform in A ˆ 0 denote the subset of A ˆ defined as follows: A ˆ 0 consists of those functions half-plane. Now let A ˆ which are analytic over the closed right half-plane. More precisely, f ∈ A ˆ 0 if there exists f ∈A an ε > 0 (which may depend on f ) such that f (·) is analytic over the half-plane {s : Re s > −ε}. ˆ 0 is a subring of A. ˆ We take A ˆ 0 as the ring R. The subset I of R is defined as follows: A Then A ˆ 0 is bounded away from zero et infinity if, whenever {si } is a sequence in C+e , such function f in A ˆ 0 that are that |si | → ∞, we have that lim inf |f (si )| > 0. Now I consists precisely of those f in A bounded away from zero at infinity. It is easy to verify that I is a saturated multiplicative system in R (see Problem 8.2.1), so that condition (I1) is satisfied. The proof of (I2) is a little more complicated, but can be found in [101]. Hence, on the basis of Theorem 8.2.1, we conclude that every function in B = I−1 R has a coprime factorization, and more generally, every G ∈ M(B) has both an r.c.f. as well as an l.c.f. In view of the above very useful property, it is worthwhile to understand a little better the class ˆ then any zero of of functions B. Since the zeros of an analytic function are always isolated, if f ∈ A ˆ 0 , then every zero of f in the closed RHP is isolated. f in the open RHP is isolated. Now, if f ∈ A Finally, if f ∈ I, then f is also bounded away from zero at infinity, which implies that all zeros of f in C+e must in fact be contained in a compact subset thereof. Since any such zero is isolated and of finite multiplicity, we conclude that every f ∈ I has only a finite number of isolated zeros, each of finite order, in the closed RHP. Hence, every g ∈ B has only a finite number of singularities in the closed RHP, and each of these is a pole of finite order. Now let us consider discrete-time systems. In this case, a good choice for the set of transfer functions of stable systems is the disc algebra A, consisting of functions that are analytic on the open unit disc and continuous on the closed unit disc. The argument given in [101] shows that A is not a Bezout domain, but A is a Hermite ring as shown in Example 8.1.26. Hence, every matrix in M(A) that has an r.c.f. also has an l.c.f. and vice versa. But it is still desirable to have simple sufficient conditions for the existence of an r.c.f. (or an l.c.f.). For this purpose, let R be the disc algebra A, and let I consist of those functions f ∈ A that have only a finite number of zeros in the closed unit disc and none on the unit circle (hence they are all in the open unit disc). With this definition, it is once again routine to verify condition (I1). To check (I2), suppose f ∈ A, g ∈ I. Then f and g can have only a finite number of common zeros in the closed unit disc, and these must all be in the open unit disc. Suppose these common zeros are α1 , . . . , αn , with multiplicities m1 , . . . , mn . Let φ(z) be the polynomial
φ(z) =
n (z − αi )mi . i=1
(8.2.10)
302
8. EXTENSIONS TO GENERAL SETTINGS
Then it can be shown that the ideal in A generated by f and g is the same as the principal ideal generated by φ (see Problem 8.2.2). Hence, (I2) holds, and every matrix whose elements all belong to I−1 A have both an r.c.f. and an l.c.f.
PROBLEMS ˆ 0 that are bounded away from zero 8.2.1. Show that the set I consisting of the functions in A infinity satisfies condition (I1). 8.2.2. Show that the subset I of the disc algebra defined above satisfies conditions (I1) and (I2).
8.3
CONTROLLER SYNTHESIS OVER A GENERAL RING
The objective of this section is to extend the results contained in Chapter 5, and specifically Section 5.2, to the case where the set S is replaced by a general ring R. The main obstacle to doing so is that the plant and compensator can not be assumed a priori to have right- or left-coprime factorizations. It is shown below that this difficulty can be mostly circumvented. Suppose R is a commutative domain with identity, and denote its field of fractions by F. In the present set-up, both the plant P and the compensator C are assumed to belong to M(F). The pair (P , C) is stable if (I + P C)−1 −P (I + CP )−1 H (P , C) = , (8.3.1) C(I + P C)−1 (I + CP )−1 is well-defined and belongs to M(R). The symbol S(P ) denotes the set of C ∈ M(F) such that (P , C) is a stable pair. The first objective is to parametrize the set S(P ) corresponding to a given plant P . This is made easier by the next result. Lemma 8.3.1
Suppose P has an r.c.f. (resp. an l.c.f.) and that C ∈ S(P ). Then C has an l.c.f. (resp. an
r.c.f.). Proof. Consider the case where P has an r.c.f.; in case P has an l.c.f. the arguments are entirely similar. Suppose P = N D −1 where N, D ∈ M(R) are right-coprime, and select X, Y ∈ M(R) such that XN + Y D = I . Now H (P , C) = H (ND −1 , C) I − N(D + CN)−1 C = D(D + CN)−1 C
−N(D + CN)−1 D(D + CN)−1
(8.3.2)
belongs to M(R), since C ∈ S(P ). Define S, T ∈ M(F) by S = (D + CN)−1 , T = (D + CN)−1 C .
(8.3.3)
8.3. CONTROLLER SYNTHESIS OVER A GENERAL RING
Then,
−NS , DS
I − NT H (P , C) = DT
303
(8.3.4)
which shows that N S, N T , DS, DT ∈ M(R). This in turn implies that S = X(NS) + Y (DS) ∈ M(R) ; T = X(NT ) + Y (DT ) ∈ M(R) .
(8.3.5)
Also, from (8.3.3), C = S −1 T . Thus, in order to prove that (S, T ) is an l.c.f. of C, it only remains to establish the left-coprimeness of S and T . But this is immediate, since SD + T N = (D + CN)−1 (D + CN) = I .
(8.3.6)
2
Hence, (S, T ) is an l.c.f. of C.
Corollary 8.3.2 and an l.c.f.
Suppose P has both an r.c.f. and an l.c.f. and that C ∈ S(P ). Then C has both an r.c.f.
Corollary 8.3.3
Suppose P , C ∈ M(F) and that the pair (P , C) is stable. Then the matrix G=
C 0
0 P
(8.3.7)
has both an r.c.f. and an l.c.f.
Remarks 8.3.4
It is not claimed that P and C individually have both an r.c.f. and an l.c.f. - only
that G has both.
Proof. Define
0 I F = . −I 0
(8.3.8)
Then (P , C) is stable if and only if (G, F ) is stable. Since F is a constant matrix, it has both an r.c.f. as well as an l.c.f. The conclusion now follows from Corollary 8.3.2. 2
304
8. EXTENSIONS TO GENERAL SETTINGS
Thus, if one is interested in designing stabilizing compensators for a plant that has both an r.c.f. and an l.c.f., the basic result of Chapter 5, namely Theorem 5.2.1, carries over completely. It is repeated here for emphasis. ˜ N). ˜ Select X, Y, X, ˜ Y˜ ∈ Suppose P ∈ M(F) has an r.c.f. (N, D) and an l.c.f. (D, ˜ ˜ ˜ ˜ M(R) such that XN + Y D = I, N X + D Y = I . Then,
Theorem 8.3.5
˜ : R ∈ M(R) and |Y − R N| ˜ = 0} ˜ −1 (X + R D) S(P ) = {(Y − R N) = {(X˜ + DR)(Y˜ − NR)−1 : R ∈ M(R) and |Y˜ − NR| = 0}.
(8.3.9)
Once a parametrization of S(P ) is available, Theorem 5.4.1 concerning the simultaneous stabilizability of two plants P0 and P1 carries over to a general ring R. Suppose P0 and P1 ∈ M(F) have the same dimensions, and that each has both an r.c.f. and an l.c.f. Define
Theorem 8.3.6
A0 = X0 N1 + Y0 D1 , B0 = −N˜ 0 D1 + D˜ 0 N1 ,
(8.3.10)
where the symbols are as in (5.4.1). Then there exists a C ∈ M(F) that stabilizes both P0 and P1 if and only if there exists an M ∈ M(R) such that A0 + MB0 U(R). Thus, as before, the simultaneous stabilizability question for two plants can be reduced to a strong stabilizability question for a single plant. Unfortunately, this latter question is very difficult to answer, especially for multivariable plants. The strong stabilizability question for scalar plants is the following: Given coprime elements n, d ∈ R, when does there exist a c ∈ R such that d + cn is a unit of R? The answer to this question depends strongly on the specifics of the ring R. In case R = A, the disc algebra, and n has only a finite number of zeros inside the closed unit disc, the existence or otherwise of a suitable c can be decided using the parity interlacing test, as in Section 2.4. However, if n has a sequence of zeros clustering on the unit circle, the methods of Section 2.4 are no longer applicable. For multivariable plants, the strong stabilization question is: Given a right-coprime pair N, D ∈ M(R), does there exist a C ∈ M(R) such that D + CN ∈ U(R)? If R is a proper Euclidean domain, the results of Section 4.4 can be used to transform this matrix problem into a scalar problem. Specifically, such a C exists if and only if the pair (n, d) can be strongly stabilized, where n is the g.c.d. of all elements of N and d = |D|. However, such a reduction is not possible over a general ring. As a result, the strong stabilization problem for multivariable plants over a general ring is still open. The other contents of Chapter 5, such as the two-parameter compensation scheme for example, can also be extended to a general ring without undue difficulty. The details are left to the reader.
8.3. CONTROLLER SYNTHESIS OVER A GENERAL RING
305
The results of Chapters 6 and 7 can be extended almost without modification to distributed systems. The reason is that these results are mainly based on the properties of the function spaces H2 and H∞ , and the rationality of the various functions involved does not play a central role. This section is concluded with a discussion of the possibility of designing lumped compensators (i.e., those with rational transfer function matrices) for distributed systems. Such compensators are desirable because of ease of implementation. Consider first the discrete-time case, and suppose the ring R is the disc algebra A. Then the polynomials are dense in A. As a consequence, the set of rational functions (i.e., ratios of polynomials) are dense in the fraction field of A in the graph topology. Thus, if c is any element (compensator) in the fraction field of A, then every neighborhood of c contains a rational function. This fact can be exploited to design lumped stabilizing compensators, as follows: First, use Theorem 8.3.5 together with any other mathematical apparatus to design a good, or even optimal, compensator C in the form Nc Dc−1 or D˜ c−1 N˜ c , where the matrices Nc , Dc or N˜ c , D˜ c have their elements in the ring A. By approximating the elements of these matrices by polynomials or rational H∞ -functions to a sufficiently high degree of accuracy, one can come up with lumped compensators that stabilize the plant under study, or achieve near-optimal performance. In the continuous-time case, the situation is a little more complicated, because the elements ˆ For example, the function f (s) = exp(−s) belongs to A, ˆ since it is of S are not dense in the ring A. the Laplace transform of a delayed impulse distribution. But it cannot be uniformly approximated by any stable rational function. Hence, it is not always possible to approximate compensators in ˆ by rational compensators. However, it is not too difficult to verify that any the fraction field of A function which is the Laplace transform of a distribution of the form f (t) = f0 δ(t) + fa (t) ,
(8.3.11)
where the function fa is absolutely integrable (i.e., belongs to the space L1 ), can be approximated arbitrarily closely by a function in S. Thus, if R denotes the set of Laplace transforms of such functions, then S is a dense subset of R. Hence, any compensator of the form D˜ c−1 N˜ c , or Nc Dc−1 where Dc , Nc , N˜ c , D˜ c ∈ M(R), can be approximated by a lumped compensator. The situation is more satisfactory in the case of the ring B discussed in the previous section. Recall that B is the ring of fractions of the form I−1 A, where I and A are defined in Section 8.2. It can be shown that every function in B is the sum of a rational function and the Laplace transform of a function belonging to L1 (see [8]). Thus, if F ∈ M(B), then F can be written as F = G + H , where G ∈ M(R(s)) and H is the Laplace transform of a matrix consisting of functions from L1 . Now, the problem of designing a stabilizing compensator for F is essentially that of designing one for the “unstable part” G of F . Suppose (Ng , Dg ) is an r.c.f. of G over the ring M(S), and select matrices X, Y ∈ M(S) that satisfy the identity XNg + Y Dg = I .Then, as shown in Problem 4.1.12, ˆ (which contains S as a subring), the pair (Ng + H Dg , Dg ) is an r.c.f. of F , this time over the ring A and moreover, X(Ng + H Dg ) + (Y − XH )Dg = I .
(8.3.12)
306
8. EXTENSIONS TO GENERAL SETTINGS
Thus, the compensator (Y − XH )−1 X stabilizes the plant F . Next, note that Y − XH ∈ M(R), where R is the set of Laplace transforms of distributions of the form (8.3.11). Thus, there exists a sequence of rational matrices {Ai } in M(S) that converges Y − XH . By choosing the index i sufficiently large, we can ensure that the lumped compensator A−1 i X stabilizes the plant F . If |Y − XH | = 0, this presents no problem, since one can always choose the sequence {Ai } to consist of nonsingular matrices. Thus, we have shown that every distributed plant whose transfer matrix belongs to the set B can be stabilized by a lumped compensator.
NOTES AND REFERENCES The references for the material in Section 8.1 are given at the appropriate places. The design of lumped compensators for distributed systems is discussed in [70, 71], and for the special case of timedelay systems in [55]. See [51] for results relating the internal and external stability of distributed systems, and [6, 54, 64, 77] for controller synthesis theory in very general settings. The factorization approach carries over with some limitations to nonlinear systems as well. In this connection, see [1, 23, 24].
307
APPENDIX
A
Algebraic Preliminaries This appendix contains a brief introduction to some basic properties of rings. It is important to note that the choice of results from ring theory that are presented here is highly selective; the reader is referred to [52] or [116] for a more comprehensive treatment.
A.1
RINGS, FIELDS AND IDEALS
A ring is a nonempty set R together with two binary operations + (addition) and · (multiplication) such that the following axioms are satisfied: Definition A.1.1
(R1) (R, +) is a commutative group. This means that a + (b + c) = (a + b) + c, ∀ a, b, c ∈ R. a + b = b + a, ∀ a, b ∈ R. There exists an element 0 ∈ R such that a + 0 = 0 + a = a, ∀ a ∈ R. For every element a ∈ R, there exists a corresponding element −a ∈ R such that a + (−a) = 0. (R2) (R, ·) is a semigroup. This means that a · (b · c) = (a · b) · c, ∀a, b, c ∈ R. (R3) Multiplication is distributive over addition. This means that a · (b + c) = a · b + a · c, (a + b) · c = a · c + b · c, ∀ a, b, c ∈ R. As is customary, a · b is denoted by ab, and a + (−b) is denoted by a − b. A ring R is said to be commutative if ab = ba ∀ a, b ∈ R, and is said to have an identity if there exists an element 1 ∈ R such that 1 · a = a · 1 = a ∀ a ∈ R. A classic example of a ring is the set of integers, denoted by Z, with the usual notions of addition and multiplication. Clearly, Z is a commutative ring with identity. The set E of even integers, together with the usual addition and multiplication is an example of a ring
Examples A.1.2
308
A. ALGEBRAIC PRELIMINARIES
without identity; but E is commutative. The set Z2×2 of 2 × 2 matrices with integer elements is a noncommutative ring with identity. A ring R is said to be a domain (or an integral domain) if a ∈ R, b ∈ R, ab = 0 implies that either a = 0 or b = 0. In other words, R is a domain if the product of every pair of nonzero elements is nonzero. Example A.1.3
In Example A.1.2 above, both Z and E are domains, but Z2×2 is not.
Consider the ring Z/(6) consisting of the six elements {0, 1, 2, 3, 4, 5}, with addition and multiplication defined modulo 6. For example, 3 + 5 = 8 mod 6 = 2, and 2.5 = 10 mod 6 = 4.This ring is commutative and has an identity. However, since 2 · 3 = 0, it is not a domain. Suppose R is a ring with identity. An element x ∈ R is called a unit of R if there is a y ∈ R such that xy = yx = 1. It can be easily shown that such a y is unique; y is called the inverse of x and is denoted by x −1 .
Example A.1.4
Let R = C[0, 1], the set of continuous real-valued functions defined over the interval [0, 1]. If addition and multiplication on R are defined pointwise, i.e.,
Example A.1.5
(x + y)(t) = x(t) + y(t) , (xy)(t) = x(t)y(t), ∀t ∈ [0, 1], ∀x, y ∈ R , then R is a commutative ring with identity. However, it is not a domain; for example, let
1 − 2t 0 ≤ t ≤ 0.5 x(t) = 0 0.5 < t ≤ 1
0 0 ≤ t ≤ 0.5 y(t) = . 2t − 1 0.5 < t ≤ 1 Then xy = 0, even though x = 0, y = 0. A function x ∈ R is a unit of R if and only if x(t) = 0 ∀t ∈ [0, 1]. In other words, the units in R are the functions that do not change sign over [0, 1]. Definition A.1.6 A field is a commutative ring F with an identity, satisfying two additional assumptions: (F1) F contains at least two elements. (F2) Every nonzero element of F is a unit.
The rational numbers, real numbers and complex numbers are all well-known examples of fields. Another example is Z/(p) where p is a prime integer, consisting of the elements
Examples A.1.7
A.1. RINGS, FIELDS AND IDEALS
309
{0, 1, · · · , p − 1}. Addition and multiplication are defined modulo p. This set is clearly a commutative ring with identity, and it contains at least two elements. To show that it satisfies axiom (F2), suppose a is any nonzero element of Z/(p). Since the greatest common divisor of a and p (in the usual sense) is 1, there exist integers x and y such that ax + py = 1 . Moreover, for any integer q, we have that a(x − qp) + p(y + qa) = 1 . So by a suitable choice of q, the integer x − qp can be made to lie between 0 and p − 1. As a result, it follows that there exist integers b and c, with b lying between 0 and p − 1, such that ab + pc = 1, or ab = 1 − pc ≡ 1
mod p .
Hence, a is a unit and b is its inverse. For example, in the ring Z/(13), the inverse of 5 is 8, and the inverse of 2 is 7. The field Z/(2), consisting of the two elements {0, 1}, is called the binary field. A subset S of a ring R is a subring of R if it is a ring in its own right, i.e., if 0 ∈ S, and the sum, difference and product of two elements of S again belong to S. A subset I in a ring R is said to be a left ideal if (i) I is a subgroup of the additive group of R, and (iiL) a ∈ I, x ∈ R imply that xa ∈ I. I is a right ideal if (i) I is a subgroup of the additive group of R and (iiR) a ∈ I, x ∈ R imply that ax ∈ I. I is an ideal if it is both a left ideal and a right ideal.
Definition A.1.8
In the above definition, condition (i) means the following: First, 0 ∈ I, and second, if x, y ∈ I, then x ± y ∈ I. Condition (iiL) (resp. (iiR)) means that if and element of I is multiplied on the left (resp. right) by any element of R, the product is once again in I. Clearly, if R is a commutative ring, the concepts of a left ideal, right ideal, and ideal all coincide. Example A.1.9
Let R be the ring Z2×2 of 2 × 2 matrices with integer elements. Let I1 = {M ∈ Z2×2 : m11 = 0, m21 = 0} .
Then it is easy to verify that I1 is a left ideal in R. Similarly, I2 = {M ∈ Z2×2 : m21 = 0, m22 = 0} is a right ideal in R. The set of diagonal matrices D = {M ∈ Z2×2 : m21 = 0, m12 = 0}
310
A. ALGEBRAIC PRELIMINARIES
is a subring of R but is neither a left ideal nor a right ideal. Let a be an integer. Then M(a) = {M ∈ Z2×2 : mij is divisible by a ∀i, j } is an ideal in R.
Example A.1.10
Consider the ring C[0, 1] of Example A.1.5, and let t0 ∈ [0, 1]. Then It0 = {x(·) ∈ C[0, 1] : x(t0 ) = 0}
is an ideal. More generally, let S be any subset of [0, 1]. Then IS = {x(·) ∈ C[0, 1] : x(t) = 0 ∀t ∈ S} is an ideal. Suppose a is some element of a ring R. Then the set of all elements of the form xa where x ∈ R, i.e., the set of all left multiples of a, is a left ideal. It is called the left principal ideal generated by a. Similarly, the right principal ideal generated by a is the set of all elements ax where x ∈ R. An ideal I in a commutative ring R is a prime ideal if a ∈ R, b ∈ R, ab ∈ I implies that either a ∈ I or b ∈ I. Equivalently, I is a prime ideal if and only if the set R − I is closed under multiplication (i.e., the product of two elements not belonging to I cannot itself belong to I).
Consider the ring of integers Z, and let n be any integer. Then the set of multiples of n, denoted by (n), is the principal ideal generated by n. It is a prime ideal if and only if n divides ab implies that n divides a or n divides b, which is true if and only if n is a prime number.
Example A.1.11
Example A.1.12 Consider the ideal It0 of Example A.1.10. If a product xy belongs to this ideal, then (xy)(t0 ) = 0, which means that either x(t0 ) = 0 or y(t0 ) = 0. Hence, It0 is a prime ideal. However, if S contains more than one point, then IS is not a prime ideal. By convention, the entire ring R is not considered to be a prime ideal. This is in conformity with the convention that 1 is not considered a prime number (see Example A.1.11). Thus, “I is a prime ideal in R” automatically implies, by convention, that I is a proper subset of R.
A.1. RINGS, FIELDS AND IDEALS
311
PROBLEMS A.1.1. (i) Show that the zero element of a ring is unique (Hint: if 01 and 02 are two additive identities, then 01 = 01 + 02 = 02 ). (ii) Show that the additive inverse of an element is unique. A.1.2. (i) Show that a · 0 = 0 for all a in a ring (Hint: ab = a(b + 0)). (ii) Show that, if a ring R contains an identity and has at least one nonzero element, then 1 = 0 (Hint: let a = 0; then a0 = a = a1). A.1.3. Show that, if R is a domain, then the cancellation law holds, i.e., ab = ac, a = 0 implies that b = c. A.1.4. Let R be a ring with identity and at least one nonzero element. (i) Show that the set of units of R is a group under multiplication; i.e., show that (a) if u is a unit, then so is u−1 , and (b) if u, v are units, so are uv and vu. (ii) Show that 0 can never be a unit (Hint: see Problem A.1.2). (iii) Show that if x has an inverse, then it is unique. (iv) Show that if x has a left inverse y and a right inverse z, then y = z (i.e., show that yx = 1, xz = 1 implies y = z). A.1.5. Let R be the set of functions mapping the interval [0, 1] into the set of integers Z. (i) Show that R is a ring under pointwise addition and multiplication. (ii) What are the units of this ring? A.1.6. Consider the ring Z/(p) of Example A.1.7. Show that this ring is a field if and only if p is a prime number. A.1.7. (i) Consider the ring Z/(9), consisting of {0, 1, · · · , 8}, with addition and multiplication modulo 9. Determine the units of this ring. (Answer: 1,2,4,5,7,8) (ii) Consider the ring Z/(n). Show that m is a unit of this ring if and only if the greatest common divisor of m and n is 1. A.1.8. Consider the ring Zn×n of n × n matrices with integer elements. Show that a matrix M in this ring is a unit if and only if its determinant is ±1. A.1.9. Show that a commutative ring R is a domain if and only if {0} is a prime ideal.
312
A. ALGEBRAIC PRELIMINARIES
A.2
RINGS AND FIELDS OF FRACTIONS
Throughout this section “ring” means a commutative ring with identity. Suppose R is a ring. An element a ∈ R is an absolute nondivisor of zero if b ∈ R, ab = 0 implies that b = 0. In every ring, there are absolute nondivisors of zero; for example, all the units are of this type. But there may be others as well. In the ring Zn×n , a matrix M is an absolute nondivisor of zero if and only if the determinant of M is nonzero. Thus, a nonunit matrix can be an absolute nondivisor of zero (cf. Problem A.1.8). Now consider the ring C[0, 1] defined in Example A.1.5. Suppose a function x(·) belonging to this ring vanishes at only a finite number of points. If y ∈ C[0, 1] and xy = 0, then x(t)y(t) ≡ 0, which means that y(t) = 0 for all except a finite number of values of t. However, since y(·) is continuous, this implies that y(t) ≡ 0, or that y is the zero element of the ring. Hence, x is an absolute nondivisor of zero. If a is an absolute nondivisor of zero and ab = ac, then b = c; in other words, absolute nondivisors of zero can be “cancelled.” Of course, if R is a domain, then every nonzero element of R is an absolute nondivisor of zero. A set M in a ring R is said to be a multiplicative system if a, b ∈ M implies that ab ∈ M. It is saturated if a ∈ R, b ∈ R, ab ∈ M implies that a ∈ M, b ∈ M. Examples A.2.1
Fact A.2.2
The set N of absolute nondivisors of zero in a ring R is a multiplicative system.
Proof. Suppose a, b ∈ N, y ∈ R, and aby = 0. Then, since a, b ∈ N, it follows successively that 2 aby = 0 ⇒ by = 0 ⇒ y = 0. Hence, ab ∈ N. Suppose R is a ring, M is a multiplicative system in R containing 1, and M is a subset of N (the set of absolute nondivisors of zero in R). We now begin a construction which will ultimately result in a ring L which contains R as a subring, and in which every element of M is a unit. Consider the set R × M, and define a binary relation ∼ on R × M as follows: (a, b) ∼ (c, d) ⇐⇒ ad = bc. The relation ∼ is an equivalence relation: Clearly ∼ is reflexive and symmetric. To show that it is transitive, suppose (a, b) ∼ (c, d) and (c, d) ∼ (e, f ).Then ad = bc and cf = de. Multiplying the first equation by f and the second one by b gives adf = bcf = bde. Now d ∈ N since d ∈ M and M ⊆ N. Hence, d can be cancelled in the above equation to give af = be, i.e., (a, b) ∼ (e, f ). Since ∼ is an equivalence relation, the set R × M can be partitioned into disjoint equivalence classes under ∼. The set of equivalence classes R × M/ ∼ is denoted by L. The set L therefore consists of fractions (a, b), or a/b in more familiar terms, where we agree to treat two fractions a/b
A.2. RINGS AND FIELDS OF FRACTIONS
313
and c/d as equivalent if ad = bc. Addition and multiplication of fractions in L are defined in the familiar way, namely a + b a · b
c ad + bc = . d bd c ac = . d bd
(A.1) (A.2)
Note that, if b, d ∈ M, then so does bd. Hence, the right sides of (A.1) and (A.2) are valid fractions. Actually, (A.1) and (A.2) represent operations on equivalence classes; but the reader can verify that the sum and product of two fractions (i.e., two equivalence classes) do not depend on which representatives of the equivalence classes are used. With addition and multiplication defined by (A.1) and (A.2), L is a ring. Moreover, if every a ∈ R is identified with the fraction a/1, then R is isomorphic to a subring of L. The element 1/1 serves as an identity for L. Finally, if d ∈ M, then d corresponds to d/1 ∈ L; moreover, d/1 is a unit in L with the inverse 1/d. Thus, every element of (the isomorphic image of ) M is a unit in L. The ring L is called the ring of fractions of R with respect to M, and is denoted by M−1 R. Since M is a subset of N (the set of absolute nondivisors of zero), the ring N−1 R contains the largest number of units. If in particular R is a domain, then N = R \ 0, i.e., N consists of all nonzero elements of R. In this case, the ring F = (R \ 0)−1 R has the property that every nonzero element of R is a unit in F. Moreover, every nonzero element of F is also a unit: If a/b ∈ F and a = 0, then b/a is the inverse of a/b. Hence, F is a field; it is referred to as the field of fractions or quotient field associated with the domain R. There is a particularly important class of fraction rings that is frequently encountered. Suppose R is a ring, and that I is a prime ideal in R. If M denotes the complement of I, then M is a multiplicative system (see Problem A.1.4). The corresponding fraction ring M−1 R is called the localization of R with respect to I.
Examples A.2.3 The set of integers Z is a domain. The field of fractions associated with the integers is the set of rational numbers. In the ring Z, let M denote the set of numbers that are not divisible by 3. Then M is a multiplicative system (since 3 is a prime number), and 1 ∈ M. The ring of fractions M−1 Z is the set of rational numbers whose denominators (when expressed in reduced form) are not divisible by 3. This is a subring of the field of rational numbers.
PROBLEMS A.2.1. Suppose R is a ring, a, b, c ∈ R, and a is an absolute nondivisor of zero. Show that if ab = ac, then b = c.
314
A. ALGEBRAIC PRELIMINARIES
A.2.2. Show that the addition and multiplication rules (A.1) and (A.2) are unambiguous, in the sense that the final answers do not depend on which representatives of the equivalence classes are used in the computation. A.2.3. Consider the ring Z/(10), consisting of the integers 0, · · · , 9 with addition and multiplication defined modulo 10. (i) Determine the units of this ring. (ii) Show that the set N of absolute nondivisors of zero consists of just the units. (iii) Using (ii), show that the ring of fractions with numerators in Z/(10) and denominators in N is again just Z/(10). A.2.4. Suppose R is a ring and that I is a prime ideal in R. Let M denote the complement of I. Show that M is a multiplicative system.
A.3
PRINCIPAL IDEAL DOMAINS
Throughout this section, “ring” means a commutative ring with identity. A ring R is said to be a principal ideal ring if every ideal in R is principal. R is a principal ideal domain (p.i.d.) if it is a domain as well as a principal ideal ring. Definition A.3.1
Recall that a principal ideal I consists of all multiples of some element a, i.e., I = {xa : x ∈ R}. Thus, in a principal ideal ring, every ideal is generated by a single element. If x and y are elements of a ring R with x = 0, we say that x divides y, and y is a multiple of x, if there is an element z ∈ R such that y = xz; this is denoted by x | y. If x and y are elements of a ring R such that not both are zero, a greatest common divisor (GCD) of x, y is any element d ∈ R such that (GCD1) d | x and d | y . (GCD2) c | x, c | y ⇒ c | d . In the above definition, it is implicit that d = 0, since d divides a nonzero element by virtue of (GCD1). We say a greatest common divisor because a GCD is not unique. Certainly, if d is a GCD of x and y, then so is −d. Actually, the following stronger result holds: Suppose d is a GCD of x and y; then so is du whenever u is a unit. Suppose in addition that R is a domain; then every GCD d1 of x and y is of the form d1 = du for some unit u.
Fact A.3.2
Proof. The first sentence is obvious. To prove the second sentence, observe that if d1 is another GCD of x and y, then by (GCD2) d1 | d and d | d1 . Hence, d1 = du for some u. Moreover, du | d since d1 | d. Since R is a domain and d = 0, this implies that u | 1, i.e., that u is a unit. 2
A.3. PRINCIPAL IDEAL DOMAINS
315
Fact A.3.2 states that if R is a domain, then once we have found one GCD of a given pair of elements, we can quickly find them all. However, a question that is not answered by Fact A.3.2 is: Does every pair of elements have a GCD? The answer to this question is provided next. Suppose R is a principal ideal ring. Then every pair of elements x, y ∈ R, not both of which are zero, has a GCD d which can be expressed in the form
Theorem A.3.3
d = px + qy
(A.1)
for appropriate elements p, q ∈ R. Moreover, if R is a domain, then every GCD of x and y can be expressed in the form (A.1). Proof. Given x, y ∈ R, consider the set I = {ax + by, a, b ∈ R} .
(A.2)
In other words, I is the set of all “linear combinations” of x and y. It is easy to verify that I is an ideal in R.1 Since R is a principal ideal ring, I must be a principal ideal; that is, there exists an element d ∈ I such that I is the set of all multiples of d. Since x = 1 · x + 0 · y ∈ I, it follows that x is a multiple of d, i.e., d | x; similarly d | y. Thus, d satisfies the axiom (GCD1). Next, since d ∈ I and I is the ideal generated by x, y, there exist p, q ∈ R such that (A.1) holds. Now suppose c | x, c | y. Then c | (px + qy), i.e., c | d. Hence, d also satisfies (GCD2) and is thus a GCD of x and y. Up to now, we have shown that every pair of elements x and y has a GCD which can be written in the form (A.1) for a suitable choice of p and q. Now, if in addition R is a domain, then every GCD d1 of x, y is of the form d1 = du for some unit u, by Fact A.3.2. Thus, every GCD d1 can be written in the form d1 = du = upx + upy. 2 Two elements x, y ∈ R are relatively prime or simply coprime if every GCD of x, y is a unit. In view of Fact A.3.2, if R is a domain, this is equivalent to saying that x and y are coprime if and only if 1 is a GCD of x and y. Now suppose R is a principal ideal domain. Then, by Theorem A.3.3, x, y ∈ R are coprime if and only if the ideal I in (A.2) is the same as the ideal generated by 1. But the latter is clearly the entire ring R. This can be summarized as follows: Fact A.3.4 Let R be a principal ideal domain. Then x, y ∈ R are coprime if and only if there exist p, q ∈ R such that px + qy = 1.
One can also define a GCD of an n-tuple of elements (x1 , · · · , xn ), not all of which are zero. An element d ∈ R is a GCD of the collection (x1 , · · · , xn ) if (i) d | xi ∀i , (ii) c | xi ∀i ⇒ c | d . 1 I is referred to as the ideal generated by x and y.
(A.3) (A.4)
316
A. ALGEBRAIC PRELIMINARIES
As in Fact A.3.2, it follows that a GCD of a given n-tuple is unique to within a unit provided the ring is a domain. Moreover, Theorem A.3.3 can be generalized as follows: Suppose R is a principal ideal ring, and suppose x1 , · · · , xn ∈ R, with at least one element not equal to zero. Then the n-tuple (x1 , · · · , xn ) has a GCD d which can be expressed in the form n pi xi (A.5) d=
Theorem A.3.5
i=1
for appropriate elements p1 , · · · , pn ∈ R. Moreover, if R is a domain, then every GCD of this n-tuple can be expressed in the form (A.5). The proof is entirely analogous to that of Theorem A.3.3 and is left as an exercise. In the field of rational numbers, every ratio a/b of integers has an equivalent “reduced form” f/g where f and g are coprime. The next result shows that such a statement is also true in the field of fractions associated with any principal ideal domain. Let R be a principal ideal domain, and let F be the field of fractions associated with R. Given any fraction a/b in F, there exists an equivalent fraction f/g such that f and g are coprime.
Fact A.3.6
Proof. If a and b are already coprime, then there is nothing to be done. Otherwise, let d be a GCD of a, b, and define f = a/d, g = b/d. Then clearly, f/g = a/b, and it only remains to show that f and g are coprime. From Theorem A.3.3, there exist p, q ∈ R such that d = pa + qb = pf d + qgd .
(A.6)
Cancelling d from both sides of (A.6) gives 1 = pf + qg, which shows that f, g are coprime.
2
Thus, the procedure for “reducing” a fraction is the natural one: namely, we extract a greatest common divisor from the numerator and the denominator. Two elements a, b ∈ R are associates (denoted by a ∼ b) if there is a unit u such that a = bu. One can readily verify that ∼ is an equivalence relation on R. A nonunit, nonzero element p ∈ R is a prime if the only divisors of p are either units or associates of p. An equivalent definition is the following: p is a prime if p = ab, a, b ∈ R implies that either a or b is a unit. In the ring of integers, the primes are precisely the prime numbers. A useful property of principal ideal domains is stated next, without proof (see [116]). Every nonunit, nonzero element of a principal ideal domain can be expressed as a product of primes. Moreover, this factorization is unique in the following sense: If Fact A.3.7
x=
n i=1
pi =
m i=1
qi ,
(A.7)
A.3. PRINCIPAL IDEAL DOMAINS
317
where pi , qi are all primes, then n = m, and the qi ’s can be renumbered such that pi ∼ qi ∀i. Thus, the only nonuniqueness in the prime factorization of an element arises from the possibility of multiplying some of the prime factors by a unit (for example, 6 = 2 · 3 = (−2) · (−3)). Using prime factorizations, one can give a simple expression for a GCD of a set of elements. The proof is left as an exercise (see Problem A.3.7). Fact A.3.8 Suppose x1 , · · · , xn is a set of elements such that none is a unit nor zero. Express each element xi in terms of its prime factors in the form
xi =
n
α
pj ij
(A.8)
i=1
where the pj ’s are distinct (i.e., nonassociative) primes, αij ≥ 0 and αij = 0 if pj is not a divisor of xi . Then a GCD d of (x1 . · · · , xn ) is given by d=
n
β
pj j
(A.9)
i=1
where βj = min αij . i
Corollary A.3.9
(A.10)
Two elements x, y are coprime if and only if their prime divisors are distinct.
Up to now, we have talked about the greatest common divisor of a set of elements. A parallel concept is that of the least common multiple. Suppose (x1 , · · · , xn ) is a set of elements, none of which is zero. We say that y is a least common multiple of this set of elements if (LCM1) xi | y ∀i , (LCM2) xi | z ∀i implies that y | z . Using prime factorizations, one can give a simple expression for a l.c.m. that parallels Fact A.3.8. Suppose (x1 , · · · , xn ) is a collection of nonzero, nonunit elements. Express each element xi in terms of its prime factors as in (A.8). Then a l.c.m. of this collection of elements is given by n γ pj j , (A.11) y= Fact A.3.10
i=1
318
A. ALGEBRAIC PRELIMINARIES
where γj = max αij . i
(A.12)
The proof is left as an exercise (see Problem A.3.8).
PROBLEMS A.3.1. Prove that two elements x, y in a p.i.d. R are coprime if and only if there exist p, q ∈ R such that px + qy is a unit. A.3.2. Let R be a domain, and let x, y denote a GCD of x, y which is unique to within a unit factor. Show that if u, v are units, then x, y = ux, vy (in other words, every GCD of x, y is also a GCD of ux, vy and vice versa). A.3.3. Let R be a p.i.d., and let x, y, z ∈ R. Show that x, y, z = x, y , z . More generally, show that if x1 , · · · , xn ∈ R, then x1 , · · · , xn = x1 , · · · , xm , xm+1 , · · · , xn for 1 ≤ m < n. A.3.4. Let R be a p.i.d., and let x1 , · · · , xn ∈ R. Let d be a GCD of this set of elements and select p1 , · · · , pn ∈ R such that pi xi = d. Show that 1 is a GCD of the set of elements p1 , · · · , pn . A.3.5. Suppose x, y are coprime and z divides y, in a p.i.d. R. Show that x and z are coprime. A.3.6. Suppose u is a unit and x1 , · · · , xn are nonzero elements. Show that u, x1 , · · · , xn = 1. A.3.7. Prove Fact A.3.8. A.3.8. Prove Fact A.3.10. A.3.9. (i) Find the prime factorizations of 8, 24, 42 in the ring of integers. (ii) Find a GCD of the above three numbers using Fact A.3.8. (iii) Find a l.c.m. of the above three numbers using Fact A.3.10. A.3.10. Show that the following three statements are equivalent: (i) x divides y. (ii) x is a GCD of x, y. (iii) y is a l.c.m. of x, y. A.3.11. Suppose R is a p.i.d., that x, y, z ∈ R, x and y are coprime, and that x divides yz. Show that x divides z. (Hint: Use Corollary A.3.9.)
A.4. EUCLIDEAN DOMAINS
A.4
319
EUCLIDEAN DOMAINS
In this section, we study a special type of ring that finds a lot of application in this book. Throughout this section “domain” means a commutative domain with identity. A domain R is a Euclidean domain if there is a degree function δ : R \ 0 → Z+ a satisfying the following axioms:2 Definition A.4.1
(ED1) For every x, y ∈ R with y = 0, there exists a q ∈ R such that either r := x − qy is zero, or else δ(r) < δ(y). (ED2) If x | y then δ(x) ≤ δ(y). One can think of q as a quotient, and r as a remainder, after “dividing” x by y.The axiom (ED1) states that we can always get a remainder that is either zero or else has a smaller degree than the divisor y. We speak of a quotient and remainder because q and r are not necessarily unique unless additional conditions are imposed on the degree function δ(·). The axiom (ED2) implies that δ(1) ≤ δ(x) ∀x = 0, since 1 divides every nonzero element. Hence, it can be assumed without loss of generality that δ(1) = 0. The same axiom also implies that if x and y are associates, then they have the same degree (because if x and y are associates, then x | y and y | x). In particular, δ(u) = 0 whenever u is a unit. Fact A.4.2
Let R be a Euclidean domain with degree function δ(·). and suppose δ(x + y) ≤ max{δ(x), δ(y)} , δ(xy) = δ(x) + δ(y) .
(A.1) (A.2)
Then, for every x, y ∈ R with y = 0, there exists a unique q ∈ R such that δ(x − yq) < δ(y), where the degree of zero is taken as −∞. Proof. By (ED1), there exists at least one such q. Now suppose δ(x − ay) < δ(y), δ(x − by) < δ(y), and define r = x − ay, s = x − by.Then x = ay + r = by + s. Rearranging gives (a − b)y = s − r. If a = b, then δ((a − b)y) = δ(a − b) + δ(y) ≥ δ(y), by (A.2). On the other hand, δ(s − r) ≤ max{δ(r), δ(s)} < δ(y). This contradiction shows that a = b and also r = s. 2
Definition A.4.3 A Euclidean domain R with degree function δ(·) is called a proper Euclidean domain if R is not a field and δ(·) satisfies (A.2).3 2 Note that Z denotes the set of nonnegative integers. + 3This is slightly different from the definition in [65, p. 30].
320
A. ALGEBRAIC PRELIMINARIES
Note that, in a proper Euclidean domain, the division process might still produce nonunique quotients and remainders, because (A.1) is not assumed to hold. This is the case, for example, in the ring of proper stable rational functions, which are studied in Chapter 2. Fact A.4.4
Every Euclidean domain is a principal ideal domain.
Proof. Let R be a Euclidean domain, and let I be an ideal in R. If I = {0}, then I is principal with 0 as the generator. So suppose I contains some nonzero elements, and let x be an element of I such that δ(x) is minimum over all nonzero elements of I. We claim that I is the ideal generated by x and is hence principal. To prove this, let y ∈ I be chosen arbitrarily; it is shown that x divides y. By axiom (ED1), there exists a q ∈ R such that either r := y − qx is zero or else δ(r) < δ(x). If r = 0, then δ(r) < δ(x) contradict’s the manner in which x was chosen. Hence, r = 0, i.e., x divides y. 2 We now present a very important example of a Euclidean domain, namely the ring of polynomials in one indeterminate with coefficients in a field. To lead up to this example, an abstract definition of a polynomial is given. Let R be a ring. Then a polynomial over R is an infinite sequence {a0 , a1 , · · · } such that only finitely many terms are nonzero. The sum and product of two polynomials a = {ai } and b = {bi } are defined by (a + b)i = ai + bi , i i ai−j bj = aj bi−j . (ab)i = j =0
(A.3) (A.4)
j =0
For notational convenience, a polynomial a = {ai } can be represented by a0 + a1 s + a2 s 2 + · · · where s is called the “indeterminate.” The highest value of the index i such that ai = 0 is called the degree of a polynomial a = {a0 , a1 , · · · }.4 Thus, if a is a polynomial of degree m, we can write a(s) = a0 + a1 s + · · · + am s m =
m
ai s i .
(A.5)
i=0
The set of polynomials over R is denoted by R[s], and is a commutative ring with identity. Moreover, if R is a domain, so is R[s]. Suppose F is a field. Then F[s] is a Euclidean domain if the degree of a polynomial in F[s] is defined as above. Fact A.4.5
4The degree of the zero polynomial is taken as −∞.
A.4. EUCLIDEAN DOMAINS
321
Proof. To prove axiom (ED1), suppose f (s) =
n i=0
fi s i ,
g(s) =
m
gi s i , gm = 0 .
(A.6)
i=0
It is necessary to show the existence of a q ∈ F[s] such that δ(f − gq) < δ(g), where δ(0) = −∞. If n < m, take q = 0. If n ≥ m, define q1 (s) = (fn /gm )s n−m ; then δ(f − gq1 ) ≤ n − 1. By repeating this process if necessary on the polynomial f − gq1 we can ultimately find a q ∈ F[s] such that δ(f − gq) < δ(g). Thus, (ED1) is satisfied. The proof of (ED2) is straight-forward. 2 Since the degree function δ(·) satisfies both (A.1) and (A.2), the Euclidean division process yields a unique remainder and quotient r, q corresponding to each pair f, g with g = 0. Moreover, it is clear that F[s] is not a field (the polynomial s has no inverse). Hence, F[s] is a proper Euclidean domain. The field of fractions associated with F[s] is denoted by F(s), and is called the set of rational functions over F. Note that every element of F(s) is a ratio of two polynomials (hence the name rational function).
PROBLEMS A.4.1. Show that the set of integers Z is a Euclidean domain if we define the degree of an integer to be its absolute value. Does the division process result in unique remainders? A.4.2. Consider the ring R[s], consisting of polynomials with real coefficients. What are the primes of this ring? A.4.3. Suppose R is a proper Euclidean domain and x ∈ R. Show that if δ(x) = 1, then x is a prime. Is the converse true? (Hint: See Problem A.4.2.) A.4.4. Let R be a Euclidean domain. Show that x ∈ R is a unit if and only if δ(x) = 0. (Hint: Use axiom (ED1).)
323
APPENDIX
B
Preliminaries on Matrix Rings The objective of this appendix is to gather some well-known facts on matrices whose elements belong to a ring or a field, and to state them in as much generality as possible.
B.1
MATRICES AND DETERMINANTS
Let R be a ring, and let Rn×n denote the set of square matrices of order n × n whose elements belong to R. If the sum and product of two matrices in Rn×n are defined in the familiar way, namely (A + B)ij = aij + bij , n aik bkj , (AB)ij =
(B.1) (B.2)
k=1
then Rn×n becomes a ring, usually referred to as a ring of matrices over R. If R contains an identity and n ≥ 2, then Rn×n is not commutative. For instance, if n = 2, we have 1 0 0 0 0 0 1 0 = . (B.3) 0 0 1 0 1 0 0 0 Similar examples can be constructed if n > 2. Also, if n ≥ 2, then Rn×n is not a domain because Diag {1, 0, · · · , 0} Diag {0, 1, 0, · · · , 0} = 0n×n .
(B.4)
The determinant of a matrix A ∈ Rn×n is denoted by |A| and is defined in the familiar way, namely |A| =
φ∈n
sign φ
n
aiφ(i) ,
(B.5)
i=j
where n denotes the set of permutations of the set N = {1, · · · , n} into itself, and sign φ = ± 1 depending on whether φ is an even or odd permutation.1 Most of the usual properties of determinants hold in the present abstract setting. The required results are stated without proof, and the reader is referred to [65] for further details. Define a function : Rn × · · · × Rn → R as follows: For every v1 , · · · , vn ∈ Rn , define (v1 , · · · , vn ) to be the determinant of the matrix V ∈ Rn×n whose columns are v1 , · · · , vn in that 1Throughout this section, the symbol “1” is used to denote both the integer as well as the identity element of the ring R. It is
usually clear from the context which is meant.
324
B. PRELIMINARIES ON MATRIX RINGS
order. Thus, is just the determinant function viewed as a function of the columns of a matrix. ¯ 1 , · · · , vn ) to be the determinant of the matrix V ∈ Rn×n whose rows are Similarly, define (v v1 , · · · , vn , in that order. Fact B.1.1
We have ¯ 1 , · · · , vn ) ∀ v1 , · · · , vn ∈ Rn . (v1 , · · · , vn ) = (v
(B.6)
¯ is alternating and multilinear. That is, if two arguments of are The function (and hence ) interchanged, then changes sign, and if two arguments of are equal, then equals zero; finally, (αv1 + βw1 , v2 , · · · , vn ) = α(v1 , v2 , · · · , vn ) + β(w1 , v2 , · · · , vn )
(B.7)
for all possible choices of the arguments. Fact B.1.2
Let A ∈ Rn×n , n ≥ 2. Then, for any i, j ∈ {1, · · · , n}, |A| =
n (−1)i+j aij mij (A) ,
(B.8)
j =1
n |A| = (−1)i+j aij mij (A) ,
(B.9)
i=1
where mij (A) is the ij -th minor of A, defined as the determinant of the (n − 1) × (n − 1) matrix obtained from A by deleting its i-th row and j -th column.2 Fact B.1.3
Let A ∈ Rn×n , n ≥ 2. Then n j =1 n
aij (−1)
k+j
mkj (A) =
aik (−1)
i+j
mij (A) =
i=1
|A| if i = k 0 if i = k
(B.10)
|A| if j = k . 0 if j = k
(B.11)
Now a bit of notation is introduced to make subsequent theorem statements more compact. Suppose m and n are positive integers, with m ≤ n.Then S(m, n) denotes the collection of all strictly increasing m-tuples {i1 , · · · , im }, where 1 ≤ i1 < i2 < · · · < im ≤ n. For example, S(3, 5) = {(1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 4), (1, 3, 5) , (1, 4, 5), (2, 3, 4), (2, 3, 5), (2, 4, 5), (3, 4, 5)} . 2To be consistent with subsequent notation, one should write a
(B.12)
N\i,N \j instead of mij (A). But the latter notation is more convenient.
B.1. MATRICES AND DETERMINANTS
325
If m = n, then S(m, n) is the singleton set {(1, 2. · · · , n)}, while if m = 0 then S(m, n) is just the empty set. Suppose A ∈ Rm×n , and let I ∈ S(l, m), J ∈ S(l, n). Then aI J denotes the l × l minor of A consisting of the rows from I and the columns from J . In particular, if I and J are singleton sets of the form I = {i}, J = {j }, then aI J is just the element aij . Note that aI J ∈ R. We use AI J to denote the l × l matrix consisting of the elements from the rows in I and the columns in J . Thus, aI J = |AI J |. In some situations it is of interest to examine a submatrix of A consisting of the rows in I and all columns of A; such a submatrix is denoted by AI. . The notation A.J is similarly defined. Fact B.1.4
(Laplace’s Expansion of a Determinant) Suppose A ∈ Rn×n , and suppose I ∈ S(m, n).
Then, |A| =
(−1)v(I,J ) aI J aN\I,N\J ,
(B.13)
J ∈S(m,n)
where v(I, J ) =
i+
i∈I
j.
(B.14)
j ∈J
Equation (B.13) generalizes (B.8). The corresponding generalization of (B.9) is similar and is left to the reader. (Binet-Cauchy Formula) Suppose A ∈ Rn×m , B ∈ Rm×l , and let C = AB ∈ Rn×l . Let I ∈ S(p, n), J ∈ S(p, l). Then, cI J = aI K bKJ . (B.15) Fact B.1.5
K∈S(p,m)
In particular, if n = m = l, then |C| = |A| · |B|. Note that (B.15) is a natural generalization of (B.2). Using the multilinearity of the determinant function, one can obtain an expression for the determinant of the sum of two matrices. For example, if A, B ∈ R2×2 and a1 , a2 , b1 , b2 are the columns of the two matrices, then |A + B| = (a1 + b1 , a2 + b2 ) = (a1 , a2 ) + (a1 , b2 ) + (b1 , a2 ) + (b1 , b2 ) .
(B.16)
If A, B ∈ Rn×n then the formula for |A + B| will involve the sum of 2n terms. In case one of the matrices is diagonal, each term in this expansion can be neatly expressed as a product of principal minors of A and B. Fact B.1.6
Suppose A, B ∈ Rn×n and that A is diagonal (i.e., aij = 0 for i = j ). Then, |A + B| =
n
l=0 I ∈S(l,n)
aI I bN\I,N\I ,
(B.17)
326
B. PRELIMINARIES ON MATRIX RINGS
where aI I is interpreted as 1 when I is empty. If A ∈ Rn×n , its adjoint matrix, denoted by Aadj , is defined by (Aadj )ij = (−1)i+j mj i (A) .
(B.18)
In view of (B.10) and (B.11), it is seen that, for any A ∈ Rn×n , A · Aadj = Aadj · A = |A|In ,
(B.19)
where In denotes the n × n identity matrix. A matrix A ∈ Rn×n ts unimodular if it has an inverse in Rn×n , i.e., it is a unit in the ring n×n R .
Fact B.1.7
A ∈ Rn×n is unimodular if and only if |A| is a unit in R.
Proof. “if ” Suppose |A| is a unit in R and let b = |A|−1 ∈ R. Then bAadj ∈ Rn×n and A · bAadj = bAadj · A = In . Hence, A is unimodular. “only if ” Suppose A is unimodular, and let B ∈ Rn×n be the inverse of A. Then 1 = |In | = |A| · |B|, which shows that |A| is a unit in R. 2 Now consider the set Fn×n of matrices with elements in a field F. Since a field is also a ring, all of the preceding discussion applies. In addition, since every nonzero element of F is a unit, we see that every A ∈ Fn×n such that |A| = 0 has an inverse in Fn×n . It is customary to call a matrix A nonsingular if |A| = 0. From Fact B.1.3, we see that if A is nonsingular, then A−1 is given by (A−1 )ij = (−1)i+j mj i (A)/|A| .
(B.20)
The relation (B.20) is a special case of the following result, which gives the relationship between the minors of A and A−1 . The proof can be found in [41, pp. 21–22]. Suppose A ∈ Fn×n is nonsingular and let B = A−1 . Let N = {1, · · · , n} and suppose I, J ∈ S(l, n) for some l. Then,
Fact B.1.8
bI J = (−1)v(N\I,N\J ) aN\I,N\J |A|−1 , where the function v is defined in (B.14). If I and J are singleton sets then (B.21) reduces to (B.20). Suppose F ∈ Fr×s , and suppose F can be partitioned as A B F = , C D
(B.21)
(B.22)
B.1. MATRICES AND DETERMINANTS
327
where A is a nonsingular matrix of order n × n. Then the matrix G := D − CA−1 B ∈ Fr−n×s−n
(B.23)
is called the Schur complement of F with respect to A, and is sometimes denoted by F /A. The next result relates the minors of G to those of F . Fact B.1.9
Suppose J ∈ S(t, m), K ∈ S(t, l). Then gJ K = |A|−1 fN∪({n}+J ),N∪({n}+K) ,
(B.24)
where {n} + J denotes the set sum (thus if J = {j1 , · · · , jt }, then {n} + J = {n + j1 , · · · , n + jt }).
Proof. Observe that
I −CA−1
0 A B F = =: E . I 0 G
(B.25)
Now suppose P ∈ S(n + t, n + m), Q ∈ S(n + t, n + l) and that N is a subset of both P and Q. Consider the minor eP Q . The rows in P of E are obtained by adding multiples of the first n rows of F to the rows in P of F . Thus, eP Q = fP Q , since a minor is unchanged by adding multiples of some rows to others. Observe that the matrix EP Q is block-lower-triangular, so that eP Q = |A| · eP −N,Q−N . The identity (B.24) now follows by choosing P = N ∪ ({n} + J ), Q = N ∪ ({n} + K). 2 Suppose A ∈ Fn×n , B ∈ Fm×n , |A| = 0, and let G = BA−1 . Suppose J ∈ S(l, m), K ∈ S(l, n). Then |A| · gJ K equals the determinant of the n × n matrix obtained from A by replacing the rows in K of A by the rows in J of B. Fact B.1.10
A simple example helps to illustrate the statement of the result. Suppose A ∈ F5×5 , B ∈ F3×5 , and let a i , bi denote the i-th rows of A and B, respectively. Each of these is a 1 × 5 row vector. Suppose J = (2, 3), K = (3, 5). Then the above result states that
1
a
2
a
|A| · g(2,3),(3,5) =
b2
. (B.26)
a4
b3
Remarks B.1.11
Note that if B is a column vector then the above fact reduces to Cramer’s rule for solving linear equations.
328
B. PRELIMINARIES ON MATRIX RINGS
Proof. Define
F =
A −I . B 0
(B.27)
Then G is the Schur complement of F with respect to A. By Fact B.1.9, |A| · gJ K = fN ∪ ({n} + J ), N ∪ ({n} + K),
A −I.K
. =
BJ. 0
(B.28)
Note that each of the last l columns of the minor consists of all zeros except for a single 1. If we expand this minor about the last l columns using Laplace’s expansion (Fact B.1.4), then the expansion consists of a single term, since the submatrix consisting of the last l columns has only nonzero l × l minor. Thus, if K = {k1 , · · · , kl }, then
v l AN\K
|A| · gJ K = (−1) (−1)
, (B.29) BJ.
where
l l (n + i) + ki . v= i=1
(B.30)
i=1
Now the minor on the right side of (B.29) is not quite in the form stated in Fact B.1.10, since the rows of B occur below those of A, rather than substituting for them (as in Example B.26, for instance). It is a matter of detail to verify that the parity factor (−1)v+l accounts for this difference. 2 Suppose A ∈ Fn×n , B ∈ Fn×m , |A| = 0 and let G = A−1 B. Suppose J ∈ S(l, n), K ∈ S(l, m). Then |A| · gJ K equals the determinant of the n × n matrix obtained from A by replacing the columns in J of A by the columns in K of B. Fact B.1.12
The proof is similar to that of Fact B.1.10 and is left to the reader.
B.2
CANONICAL FORMS
Let R be a principal ideal domain, and let F be the field of fractions associated with R. In this section, we study the sets Rn×m and Fn×m , consisting of n × m matrices whose elements belong to R and F, respectively. We prove the existence of two canonical forms, namely the Smith form on Rn×m and the Smith-McMillan form on Fn×m . A matrix A ∈ Rn×m is a left associate of B ∈ Rn×m (denoted by A =L B) if there is a unimodular matrix U ∈ Rn×n such that A = U B. A is a right associate of B (denoted by A =R B) if there is a unimodular matrix V ∈ Rm×m such that A = BV . A is equivalent to B (denoted by A ∼ B) if
B.2. CANONICAL FORMS
329
there are unimodular matrices U ∈ R , V ∈ R . such that A = U BV . It is left to the reader to verify that =L , =R , ∼ are all equivalence relations. We now commence our study of canonical forms. n×n
m×m
Lemma B.2.1 Suppose a1 , · · · , an ∈ R, and let dn be a GCD of this set of elements. Then there exists a matrix Pn ∈ Rn×n whose first row is [a1 · · · an ] and whose determinant is dn .
Proof. The proof is by induction on n. If n = 2, by Theorem A.3.3 there exist p1 , p2 ∈ R such that p1 a1 + p2 a2 = d2 . Now let a1 a2 (B.1) P2 = −p2 p1 For larger values of n, let dn−1 be a GCD of a1 , · · · , an−1 . By the inductive hypothesis, we can construct a matrix Pn−1 ∈ Rn−1×n−1 whose first row is [a1 · · · an−1 ] and whose determinant is dn−1 . By Fact B.1.3, n−1
aj (−1)1+j m1j (Pn−1 ) = dn−1 .
(B.2)
n−1 aj (−1)1+j m1j (Pn−1 ) = 1 . dn−1
(B.3)
j =1
so that j =1
For convenience, let zj = −aj /dn−1 . Next, by Problem A.3.3, dn is a GCD of dn−1 and an . By Theorem A.3.3, there exist x, y ∈ R such that xdn−1 + yan = dn . Now define ⎡ ⎢ ⎢ ⎢ Pn = ⎢ ⎢ ⎣ yz1
⎤ an 0⎥ ⎥ .. ⎥ . .⎥ ⎥ 0⎦
Pn−1
···
yzn−1
(B.4)
x
Expanding |Pn | about the last column and using (B.3) gives |Pn | = xdn−1 + yan = dn .
2
(Hermite Form) Every A ∈ Rn×n is a left associate of a matrix B that is lowertriangular (i.e., bij = 0 for j > i).
Theorem B.2.2
330
B. PRELIMINARIES ON MATRIX RINGS
Proof. Let dn be a GCD of {a1n , · · · , ann }, i.e., the elements of the last column of A. By Theorem A.3.5, there exist elements p1 , · · · , pn ∈ R such that n
pi ain = dn .
(B.5)
i=1
By Problem A.3.4, 1 is a GCD of the set p1 , · · · , pn . Hence, by a slight variation of Lemma B.2.1, there exists a unimodular matrix U ∈ Rn×n such that its last row is [p1 · · · pn ]. Now (U A)nn = dn . Also, for i = 1, · · · , n − 1, (U A)in belongs to the ideal generated by a1n , · · · , ann and is thus a multiple of dn . Let zi = (U A)in /dn for i = 1, · · · , n − 1, and define ⎡
1 0 ··· ⎢0 1 · · · ⎢ ⎢ .. Un = ⎢ ... ... . ⎢ ⎣0 0 · · · 0 0 ···
0 0 .. .
−z1 −z2 .. .
⎤
⎥ ⎥ ⎥ ⎥. ⎥ 1 −zn−1 ⎦ 0 1
(B.6)
Then Un is unimodular since its determinant is 1. Moreover, Un U A is of the form ⎡ ⎢ ⎢ Un U A = ⎢An−1 ⎣ a n−1
⎤ 0 .. ⎥ .⎥ ⎥. 0⎦
(B.7)
dn
In other words, the last column of A has been reduced to zero above the diagonal by means of left multiplication by an appropriate unimodular matrix. One can now repeat the procedure with the n − 1 × n − 1 matrix An−1 , and eventually arrive at a lower triangular matrix. For clarity, the next step of the algorithm is briefly outlined: Let dn−1 denote a GCD of the elements of the last column of An−1 . As above, there exists a unimodular matrix U¯ n−1 ∈ Rn−1×n−1 such that ⎡ ⎢ ⎢ U¯ n−1 An−1 = ⎢An−2 ⎣ a n−2
⎤ 0 .. ⎥ . ⎥ ⎥. 0 ⎦
(B.8)
dn−1
Now define the n × n unimodular matrix Un−1 = Block Diag{U¯ n−1 , 1} .
(B.9)
B.2. CANONICAL FORMS
Then,
⎡ ⎢ ⎢An−2 ⎢ Un−1 Un U A = ⎢ ⎢ ⎣ a n−2 a n−1
0 .. . 0 dn−1 ·
⎤ 0 .. ⎥ .⎥ ⎥ . 0⎥ ⎥ ⎦ 0
2
Every matrix A ∈ Rn×n , is a left associate of an upper triangular matrix C ∈ Rn×n .
Proof. In the proof of Theorem B.2.2, start with the first column of A instead of the last. Corollary B.2.4
(B.10)
dn
The rest of the proof is now obvious. Corollary B.2.3
331
2
Every matrix A ∈ Rn×n is a right associate of a lower (resp. upper) triangular matrix.
Proof. In the proof of Theorem B.2.2, start with the last (resp. first) row of A.
2
For rectangular matrices, the following result holds: Corollary B.2.5
Suppose A ∈ Rn×m . Then A is a left associate of a matrix of the form D if n > m, [D E] if n < m . 0
(B.11)
where D can be chosen to be either lower or upper triangular. It is left to the reader to state and prove the result analogous to Corollary B.2.5 concerning right associates of rectangular matrices. Next, we start developing the Smith form. Recall that a matrix A ∈ Rn×m is said to have rank l if (i) there is an l × l submatrix of A with nonzero determinant, and (ii) every (l + 1) × (l + 1) minor of A is zero. An elementary row operation on the matrix A consists of one of the following: (i) interchanging two rows of A, or (ii) adding a multiple of one row to another. An elementary column operation is similarly defined. It is easy to see that an elementary row (column) operation on a matrix A can be accomplished by multiplying A on the left (right) by a unimodular matrix. Thus, a matrix obtained from A by elementary row and column operations is equivalent to A. Now suppose A ∈ Rn×m is a (left or right) multiple of B ∈ Rn×m , and let bl denote a GCD of all l × l minors of B. Then it follows from the Binet-Cauchy formula (Fact B.1.5) that bl divides all l × l minors of A. Thus, if al denotes a GCD of all l × l minors of A, we see that bl divides al .
332
B. PRELIMINARIES ON MATRIX RINGS
From this, it follows that if A and B are equivalent matrices, then al and bl are associates, and A and B have the same rank. (Smith Form) Suppose A ∈ Rn×m has rank l. Then, A is equivalent to a matrix of the form ⎤ ⎡ h1 0 · · · 0 0 ⎢ 0 h2 · · · 0 0⎥ ⎥ ⎢ ⎢ .. .. .. .. ⎥ , (B.12) H = ⎢ ... ⎥ . . . . ⎥ ⎢ ⎦ ⎣0 0 0 hl 0
Theorem B.2.6
H ∈R
n×m
0
0
where hi divides hi+1 for i = 1, · · · , l − 1. Moreover, h1 · · · hi is a GCD of all i × i minors of A, and the hi ’s are unique to within multiplication by a unit.
Remarks B.2.7
h1 , · · · , hn are called the invariant factors of A.
Proof. Since A has rank l, it contains an l × l submatrix with nonzero determinant. By elementary row and column operations, this submatrix can be brought to the upper left-hand corner of A. As in the proof of Theorem B.2.2, there is a unimodular matrix U ∈ Rn×n such that (U A)11 is a GCD of all elements in the first column of A. By elementary row operations, the first column ¯ If of U A can be made to contain all zeros except in the (1, 1)-position. Call the resulting matrix A. ¯ a¯ 11 divides all elements of the first row of A, then all elements of the first row (except in the (1, 1) ¯ can be made to equal zero by a suitable choice of a unimodular matrix V of the position) of AV form ⎡ ⎤ 1 −v12 −v13 · · · −v1m ⎢· 1 0 · 0 ⎥ ⎢ ⎥ ⎢· · 1 · 0 ⎥ V =⎢ (B.13) ⎥. ⎢ .. .. .. ⎥ ⎣· ⎦ · . . . ·
·
·
·
1
¯ will continue to have zeros except in the (1, 1) position. On the Moreover, the first column or AV ¯ we can choose a unimodular other hand, if a¯ 11 does not divide all elements in the first row of A. ¯ )11 is a GCD of the elements of the first row of AV ¯ . If this is done, the first matrix V so that (AV ¯ column of AV may no longer contain zeros. In such a case, we repeat the above row and column operations. This process cannot continue indefinitely, because the original element a¯ 11 has only a finite number of prime factors (see Fact A.3.7) and each successive corner element is a proper (i.e., nonassociative) divisor of its predecessor. Thus, in a finite number of steps, we arrive at a matrix B equivalent to A such that b11 divides all elements of the first row as well as first column of B. By
B.2. CANONICAL FORMS
333
elementary row and column operations, all of the elements of the first row and column of B can be made equal to zero. Thus, ⎡ ⎤ b11 0 · · · 0 ⎢ 0 ⎥ ⎢ ⎥ A ∼ B ∼ ⎢ . (B.14) ⎥. ⎣ .. ⎦ B1 0 By proceeding to the first row and column of B1 (i.e., the second row and column of B) and then repeating the procedure, we will eventually have ⎡ ⎤ d1 · · · 0 ⎢ .. ⎥ .. .. ⎢ . . 0⎥ A ∼ ⎢. (B.15) ⎥. ⎣ 0 · · · dl ⎦ 0
M
Now, di = 0 ∀i, since these are all divisors of the elements of the first l rows and columns of A; and none of these rows nor columns is identically zero. Next, M must be zero; otherwise A is equivalent to a matrix of rank at least l + 1, which contradicts the fact that the rank of A is l. Thus, far we have shown that ⎡ ⎤ d1 0 · · · 0 ⎢ 0 d2 · · · 0 0 ⎥ ⎢ ⎥ ⎢ ⎥ .. .. .. A ∼ ⎢ ... (B.16) ⎥. . . . ⎢ ⎥ ⎣ 0 0 · · · dl ⎦ 0
0
Let D = Diag {d1 , · · · , dl }. We will show that D ∼ Diag {h1 , · · · , hl }, where hi divides hi+1 for i = 1, · · · , l − 1. This is enough to prove the theorem. By adding columns 2 to l to the first column of D, it follows that ⎡ ⎤ d1 0 · · · 0 ⎢d2 d2 · · · 0 ⎥ ⎢ ⎥ D ∼ ⎢. (B.17) .. .. .. ⎥ . . ⎣. . . .⎦ dl
0
· · · dl
By multiplying the latter matrix on the left by an appropriate unimodular matrix, we get another matrix E whose (1, 1)-element is a GCD of {d1 , · · · , dl } and whose first column is zero otherwise. Let h1 := a GCD of {d1 . · · · , dl }. Then, ⎡ ⎤ h1 ⎢ 0 E¯ ⎥ ⎢ ⎥ E=⎢. (B.18) ⎥, ⎣ .. ⎦ 0
334
B. PRELIMINARIES ON MATRIX RINGS
where every element of E¯ is in the ideal generated by d2 , · · · , dl , and is thus a multiple of h1 . Since the first row of E¯ is a multiple of h1 , it follows that ⎡
h1 ⎢0 ⎢ E ∼ ⎢. ⎣ ..
0 ··· 0 E1
⎤ ⎥ ⎥ ⎥, ⎦
(B.19)
0 where every element of E1 is a multiple of h1 . Now, by the preceding paragraph, E1 ∼ Diag {g2 , · · · , gl } where h1 divides gi for all i. Now let h2 be a GCD of g2 , · · · , gl . Then clearly h1 divides h2 . Moreover, h2 0 E1 ∼ . (B.20) 0 E2 Repeating this procedure, we finally get D ∼ Diag {h1 , · · · , hl } where hi divides hi+1 for all i. This completes the proof of (B.12). The divisibility conditions on the hi imply that the product h1 · · · hi is a GCD of all i × i minors of H . Since A and H are equivalent, product is also a GCD of all i × i minors of A. Now suppose A is also equivalent to another matrix ⎡
g1 ⎢0 ⎢ ⎢ G = ⎢ ... ⎢ ⎣0
0 g2 .. .
··· ··· .. .
0
· · · gl 0
0 0 .. .
⎤ 0⎥ ⎥ ⎥ ⎥, ⎥ ⎦
(B.21)
0
where gi divides gi+1 for all i. Then the reasoning of the preceding paragraph shows that the product h1 · · · hi is an associate of the product g1 · · · gi for all i. Now, if h1 ∼ g1 and h1 h2 ∼ g1 g2 , then h2 ∼ g2 . Reasoning inductively in this fashion, we conclude that hi ∼ gi for all i. 2
Corollary B.2.8
associates.
Two matrices A, B ∈ Rn×m are equivalent if and only if their invariant factors are
B.2. CANONICAL FORMS
335
n×m
Let F be the field of fractions associated with R. We now begin a study of the set F of matrices with elements from F and show the existence of a canonical form known as the SmithMcMillan form. (Smith-McMillan Form) Suppose F ∈ Fn×m has rank l. Then there exist unimodular matrices U ∈ Rn×n , V ∈ Rm×m such that ⎡ ⎤ a1 /b1 0 ··· 0 0 ⎢ 0 a2 /b2 · · · 0 0⎥ ⎢ ⎥ ⎢ .. . . . .. ⎥ , .. .. .. UFV = ⎢ . (B.22) .⎥ ⎢ ⎥ ⎣ 0 ⎦ 0 · · · al /bl
Theorem B.2.9
0
0
where ai , bi are coprime for all i; ai divides ai+1 and bi+1 divides bi for i = 1, · · · , l − 1; and b1 is a least common multiple of the denominators of all elements of F , expressed in reduced form.3 Proof. Let y denote an l.c.m. of the denominators of all elements of F , expressed in reduced form. Then yF ∈ Rn×m , and the rank of yF is also l. By Theorem B.2.6, there exist unimodular matrices U ∈ Rn×n , V ∈ Rm×m such that ⎡ ⎤ h1 · · · 0 ⎢ .. ⎥ .. .. ⎢. . . 0⎥ ⎢ ⎥ UyF V = ⎢ (B.23) .. ⎥ , ⎣ 0 · · · hl . ⎦ 0 0 where hi divides hi+1 for all i. Hence, ⎡
h1 /y ⎢ .. ⎢ . UFV = ⎢ ⎢ ⎣ 0
··· .. .
0 .. .
· · · hl /y 0
⎤ ⎥ 0⎥ ⎥ .. ⎥ . .⎦
(B.24)
0
Let ai /bi be a reduced form for the fraction hi /y, for i = 1, · · · , l. Since hi | hi+1 , let hi+1 = hi ri where ri ∈ R. Let [·] denote the reduced form of a matrix. Then r ai+1 hi+1 hi ri ai ri i = = = = ai , (B.25) y y bi bi bi+1 where in the last step we used the fact that ai , bi are coprime. Now (B.25) implies that ai | ai+1 , bi+1 | bi for all i. 3 See Fact A.3.6 for the definition of a reduced form.
336
B. PRELIMINARIES ON MATRIX RINGS
Finally, to show that b1 ∼ y, it is enough to show that h1 and y are coprime, since a1 /b1 is a reduced form of h1 /y. This is most easily done using prime factorizations. Let pij /qij be a reduced form of the element fij , for all i, j . Then y is an l.c.m. of all qij , and h1 is a GCD of all yfij = ypij /qij for all i, j . Suppose t is a prime factor of y, of multiplicity of α. (By this we mean that t is a prime, t α divides y but t α+1 does not.) Then, from Fact A.3.10, t must be a prime divisor of multiplicity α, of some qij . The corresponding pij is not divisible by t, since pij , qij are coprime (see Corollary A.3.9). As a result, the corresponding term yfij = ypij /qij is also not divisible by t (since t is a factor of multiplicity α of both ypij and qij ). Thus, h1 , being a GCD of all yfij , is also not divisible by t (see Fact A.3.8). Since this is true of every prime factor of y, it follows from Corollary A.3.9 that y and h1 are coprime. 2
337
APPENDIX
C
Topological Preliminaries In this appendix, a few basic concepts from topology are introduced. For greater detail, the reader is referred to [56] or [89].
C.1
TOPOLOGICAL SPACES
This section contains a brief introduction to topological spaces. Let S be a set. A collection T of subsets of S is a topology if1 (TOP1) Both S and Ø (the empty set) belong to T . (TOP2) A finite intersection of sets in T again belongs to T . (TOP3) An arbitrary union of sets in T again belongs to T . The ordered pair (S, T ) is called a topological space, and subsets of S belonging to T are said to be open. A subset of S is closed if its complement in S is open. Definition C.1.1
Let S be any set, and let T1 consist of just the two sets S and Ø. Then (S, T1 ) is a topological space. T1 is referred to as the trivial topology on S. Let T2 consist of all subsets of S. Then (S, T2 ) is also a topological space. T2 is referred to as the discrete topology on S. Suppose S is a set, and T1 , T2 are topologies on S. Then T1 is weaker than T2 (and T2 is stronger than T1 ) if T1 is a subset of T2 , i.e., every set that is open in the topology T1 is also open in the topology T2 . It is obvious that, for any set S, the trivial topology and the discrete topology are, respectively, the weakest and strongest topologies that can be defined on S. Examples C.1.2
To give interesting and useful examples of topological spaces, the notion of base is introduced. To motivate this notion, recall the familiar definition of an open subset of the real line: A subset U of R is open if and only if, corresponding to every x ∈ U, there is a number δ > 0 such that the interval (x − δ, x + δ) is also contained in U. The concept of a base is an abstraction of this idea, whereby intervals of the form (x − δ, x + δ) are replaced by more general sets satisfying appropriate axioms. Let S be a set, and let B be a collection of subsets of S that satisfies two axioms: (B1) The sets in B cover S (i.e., the union of the sets in B is S). (B2) Whenever B1 , B2 are sets in B with a nonempty intersection and x ∈ B1 ∩ B2 , there exists a B(x) in B such that x ∈ B(x) ⊆ (B1 ∩ B2 ). 1Throughout this section, upper case italic letters denote collections of sets, while bold face letters denote sets.
338
C. TOPOLOGICAL PRELIMINARIES
Using this collection B, another collection T of subsets of S is defined as follows: A subset U of S belongs to T if and only if, for every x ∈ U, there is a set B(x) such that x ∈ B(x) ⊆ U. Fact C.1.3 A subset U of S belongs to the collection T if and only if it is a union of sets in B. T is a topology on S. Moreover, T is the weakest topology on S containing all sets in B.
Remarks C.1.4
B is referred to as a base for the topology T , and T is the topology generated by the
base B. To clarify Fact C.1.3, consider the set Rn , consisting of n-tuples of real numbers. Let B denote the collection of “balls” B(x, ε) of the form Example C.1.5
B(x, ε) = {y ∈ Rn : x − y < ε}
(C.1)
as x varies over Rn and ε varies over the positive numbers. Here · denotes the usual Euclidean (or any other) norm on Rn . It is a straight-forward matter to verify that this collection of balls satisfies axioms (B1) and (B2). Hence, this collection forms a base for a topology on Rn , in which a set U ⊆ Rn is open if and only if, for every x ∈ U, there is a ball B(x, ε) ⊆ U. This coincides with the “usual” definition of open sets on Rn . The one extra bit of information that comes out of Fact C.1.3 is that a set is open if and only if it is a union of balls. Proof of Fact C.1.3. To prove the first sentence, suppose first that U is a union of sets in B; it is shown Bi , where I is an index set and Bi that U belongs to the collection T . Specifically, suppose U = i∈I
is in the collection B for all i ∈ I . Let x be an arbitrary element of U. Then x ∈ Bi for some i ∈ I , and for this i we have x ∈ Bi ⊆ U. Hence, by the definition of T , U is in T . Conversely, suppose U is in T . Then for every x ∈ U there is a B(x) in B such that x ∈ B(x) ⊆ U. It is now claimed that U= B(x), which would show that U is a union of sets from B. To prove the claim (and thereby x∈U B(x) since each B(x) ⊆ U; conversely, complete the proof of the first sentence), note that U ⊇ x∈U since every x ∈ U also belongs to the corresponding B(x), it follows that U ⊆ B(x). Hence, the x∈U
two sets are equal. The second sentence is proved by verifying that T satisfies the three axioms (TOP1)–(TOP3). First, Ø belongs to T since Ø vacuously satisfies the defining condition for a set to belong to T . Next, since S is the union of all sets in B, it is also in T . Hence, T satisfies (TOP1). To establish (TOP2), it is enough to show that a nonempty intersection of two sets in T again belongs to T ; it will then follow by induction that every finite intersection of sets in T again belongs to T . Accordingly, suppose U, V are in T and let x be any element of U ∩ V; we will show the existence of a B(x) in B such that x ∈ B(x) ⊆ (U ∩ V), which in turn will establish that the intersection
C.1. TOPOLOGICAL SPACES
339
U ∩ V is open. Since x ∈ U and x ∈ V, there exist B1 (x), B2 (x) in B such that x ∈ B1 (x) ⊆ U, x ∈ B2 (x) ⊆ V. Now, by axiom (B2), there is a B(x) in B such that x ∈ B(x) ⊆ (B1 (x) ∩ B2 (x)), which in turn is contained in U ∩ V. Thus, T satisfies (TOP2). To establish (TOP3), suppose {Uα } is a family of sets in T . Then each Uα is a union of sets belonging to B, whence their union is also a union of sets belonging to B. Thus, Uα is also in T . α
To prove the last sentence, let T1 be another topology on S such that every set in B is contained in T1 . Since T1 is a topology, arbitrary unions of sets in B again belong to T1 . Since every set in T can be expressed as a union of sets in B, it follows that every set in T is in T1 , i.e., T is weaker than T1 . 2 A topological space (S, T ) is first-countable if T has the following property: For every x ∈ S, there is a countable collection of open sets {Bi (x), i ∈ Z+ }, each containing x, such that every open set containing x also contains some Bi (x). In view of Fact C.1.3, the collection Bi (x), x ∈ S is a base for the topology T . Since the set S may be uncountable, the base for the topology may also be uncountable. However, in a first-countable topology, the collection of open sets containing each particular point has a countable base. For example, consider the set Rn together with the topology of Example C.1.5. For a fixed x ∈ Rn , the collection of balls B(x, 1/m), m an integer ≥ 1 is countable; moreover, every open set containing x also contains at least one of the balls B(x, 1/m). Hence, the topological space of Example C.1.5 is first-countable. A very general class of first-countable topological spaces is that of metric spaces, which are defined next. Definition C.1.6 A metric space (S, ρ) is a set S, together with a function ρ : S → R satisfying the following axioms: (M1) ρ(y, x) = ρ(x, y) ∀ x, y ∈ S. (M2) ρ(x, y) ≥ 0 ∀x, y ∈ S; ρ(x, y) = 0 ⇐⇒ x = y. (M3) ρ(x, z) ≤ ρ(x, y) + ρ(y, z) ∀ x, y, z ∈ S. If (S, ρ) is a metric space, then there is a natural topology that can be defined on S. Let B(x, ε) denote the ball B(x, ε) = {y ∈ S : ρ(x, y) < ε} . (C.2)
Then the collection of sets B(x, ε) as x varies over S and ε varies over all positive numbers, is a base for a topology T on S. T is referred to as the topology on S induced by the metric ρ. In this topology, a set U in S is open if and only if, for every x ∈ U, there is a ball B(x, ε) ⊆ U. Every metric space is first-countable as a topological space: For a fixed x, consider the countable collection of sets B(x, 1/m), m an integer ≥ 1. Then every open set containing x also contains at least one of the sets B(x, 1/m). The question of convergence of sequences in topological spaces is examined next. Suppose (S, T ) is a topological space. A set N is said to be a neighborhood of x ∈ S if x ∈ N, and N contains
340
C. TOPOLOGICAL PRELIMINARIES
a set in T that contains x. In other words, a set N is a neighborhood of x if it contains an open set containing x. Note that a neighborhood itself need not be open. A sequence {xi } in S is said to converge to x ∈ S if every neighborhood of x contains all but a finite number of terms of the sequence {xi }. Consider the set Rn together with the topology defined in Example C.1.5. Then a sequence {xi } converges to x if and only if, for every ε > 0, there is a number N such that xi ∈ B(x, ε) ∀ i ≥ N. This is the familiar notion of convergence in Rn . Example C.1.7
Consider a set S together with the discrete topology of Example C.1.2. Let {xi } be a sequence in S converging to x. Since the singleton set {x} is also a neighborhood of x, it follows that xi = x for all except a finite number of values of i. Example C.1.8
Consider a set S together with the trivial topology of Example C.1.2. Then S is the only neighborhood of any x ∈ S. As a result, every sequence {xi } converges to every x ∈ S. The preceding two examples show that the meaning of convergence is very much dependent on the particular topology defined on a set. Moreover, in general, a sequence does not necessarily have a unique limit. To address the latter problem, we define Hausdorff topologies. A topology T on a set S is Hausdorff if distinct points have disjoint neighborhoods. It is easy to verify that if T is a Hausdorff topology on a set S, then the limit of a sequence in S is unique if it exists. Also, if (S, ρ) is a metric space, then the topology induced by the metric ρ is Hausdorff. Let S be a set, and let T1 , T2 be two topologies on S, with T1 weaker than T2 . Then, whenever a sequence {xi } in S converges to x in the topology T2 , it also converges to x in the topology T1 . However, it is possible that a sequence may converge in T1 but not in T2 . This shows that convergence in T1 is in general a weaker requirement than convergence in T2 (and helps to explain the terminology). Recall that a set in S is closed if its complement is open. One of the desirable features of firstcountable topologies is that one can give an alternate, and very useful, characterization of closed sets. Suppose U is a set in S. An element x ∈ S is said to be a limit point (or a cluster point, accumulation point) of U if every neighborhood of x contains an element of U other than x. It can be shown that a set is closed if and only if it contains all of its limit points. Now suppose (S, T ) is a first-countable topological space. Then, for each x ∈ S, there is a countable collection of open sets {Bi (x), i ∈ Z} such that every neighborhood of x contains at least one of the Bi (x). As a result, x is a limit point of a set U if and only if there exists a sequence {xi } in U converging to x such that xi = x ∀i. Based on this, one can show that U is closed if and only if every convergent sequence {xi } in U has a limit in U (see Problem C.1.5). Example C.1.9
C.1. TOPOLOGICAL SPACES
341
Now we come to the notion of continuity. Suppose (S1 , T1 ), (S2 , T2 ) are topological spaces, and f is a function mapping S1 into S2 . Given any subset U of S2 , its preimage under f is the subset of S1 denoted by f −1 (U) and defined by f −1 (U) = {x ∈ S1 : f (x) ∈ U} .
(C.3)
The function f is continuous at x ∈ S1 if, whenever U is an open subset of S2 containing f (x), its preimage f −1 (U) is an open subset of S1 (containing x). f is continuous if it is continuous at all x ∈ S1 . Clearly, whether or not f is continuous is very much dependent on the topologies T1 and T2 . For example, if T1 is the discrete topology, then every function f : S1 → S2 is continuous. Also, one can easily verify the following: If f is continuous at x ∈ S1 and N is a neighborhood of f (x), then f −1 (N) is a neighborhood of x (see Problem C.1.6). The converse is also true, but more difficult to prove: If f −1 (N) is a neighborhood of x whenever N is a neighborhood of f (x), then f is continuous at x. The proof can be found in [56].
Fact C.1.10 Suppose (S1 , T1 ), (S2 , T2 ) are topological spaces, and f : S1 → S2 is continuous at x ∈ S1 . Then, whenever {xi } is a sequence in S1 converging to x, the sequence {f (xi )} converges to f (x).
Proof. Let N be any neighborhood of f (x). Since f is continuous at x, f −1 (N) is a neighborhood of x. Since {xi } converges to x, f −1 (N) contains all but a finite number of terms of the sequence {xi }. This is the same as saying that f (xi ) ∈ N for all but a finite number of values of i. Since this is true for every neighborhood of f (x), we conclude that {f (xi )} converges to f (x). 2 If the topologies T1 , T2 are first-countable, then the converse of Fact C.1.10 is also true: If {f (xi )} converges to f (x) for all sequences {xi } converging to x, then f is continuous at x.
PROBLEMS C.1.1. Let (S, ρ) be a metric space, and let B(x, ε) be defined by (C.2). Show that the collection of sets B(x, ε) is a base for a topology on S by verifying that axiom (B1) is satisfied. C.1.2. Let S be any set, and define ρ : S × S → R by ρ(x, y) = 0 if x = y, 1 if x = y. (i) Verify that (S, ρ) is a metric space. (ii) Show that the topology on S induced by ρ is the discrete topology. C.1.3. Let (S, T ) be a first countable topological space, and let U be a subset of S. Show that U is closed if and only if the following is true: xi ∈ U, {xi } converges to x implies that x ∈ U (Hint: either x = xi for some i or else x is a limit point of U).
342
C. TOPOLOGICAL PRELIMINARIES
C.1.4. Suppose (S, ρ) is a metric space, and define d(x, y) =
ρ(x, y) . 1 + ρ(x, y)
(i) Show that (S, d) is also a metric space. (ii) Show that ρ and d induce the same topology on S. C.1.5. Show that, if (S, T ) is a topological space and T is a Hausdorff topology, then a sequence in S can converge to at most one point. C.1.6. Suppose (S1 , T1 ), (S2 , T2 ) are topological spaces, and f : S1 → S2 is continuous at x ∈ S1 . Show that, if N is a neighborhood of f (x), then f −1 (N) is a neighborhood of x.
C.2
TOPOLOGICAL RINGS AND NORMED ALGEBRAS
This section contains a brief introduction to the subject of topological rings. Roughly speaking, a topological ring is a ring, together with a topology, such that subtraction and multiplication are continuous operations with respect to the topology. Before talking about topological rings as such, it is necessary to define the product topology on the cartesian product of topological spaces. Recall that the cartesian product S1 × S2 of two sets S1 and S2 consists of all ordered pairs (x, y) where x ∈ S1 and y ∈ S2 . Now suppose T1 , T2 are topologies on S1 , S2 , respectively. The objective is to define a topology on the product set S = S1 × S2 . This is done in the following way: Let B denote the collection of subsets of S of the form U × V where U ∈ T1 , V ∈ T2 . In other words, B is the collection of subsets of S formed by taking cartesian products of an open subset of S1 and an open subset of S2 . It can be shown that B is a base for a topology on S (see Problem C.2.1), which is referred to as the product topology on S. Now suppose Si , i = 1, 2, 3 are topological spaces, with the topologies Ti , i = 1, 2, 3. Let S = S1 × S2 , and let T denote the product topology on S. Suppose f is a function mapping S into S3 . By definition, f is continuous if f −1 (W) is an open subset of S whenever W is an open subset of S3 . Now suppose f is continuous, that (x, y) ∈ S and let f (x, y) =: z ∈ S3 . Since f is continuous, it follows that, whenever N is a neighborhood of z, f −1 (N) is a neighborhood of (x, y). Recalling the definition of the product topology on the set S, we see that, whenever N is a neighborhood of z, there exist neighborhoods N1 of x and N2 of y such that f (x1 , y1 ) ∈ N ∀ x1 ∈ N1 , ∀ y1 ∈ N2 . In particular, f (x, y1 ) ∈ N∀ y1 ∈ N2 , and f (x1 , y) ∈ N ∀ x1 ∈ N1 . The conclusion is that if f is a continuous function, then for each x ∈ S1 the function f (x, ·) : S2 → S3 is continuous; similarly, for each y ∈ S2 , the function f (·, y) : S1 → S3 is continuous. Now we come to topological rings. Suppose R is a ring and T is a topology on R. The pair (R, T ) is a topological ring if the functions (x, y) → x − y and (x, y) → xy are continuous functions from R × R into R, when R × R is given the product topology. Definition C.2.1
C.2. TOPOLOGICAL RINGS AND NORMED ALGEBRAS
343
Several facts are immediate consequences of the above definition. First, if the function (x, y) → x − y is continuous, then for each fixed x , the function y → x − y is a continuous function from R into R. In particular, taking x = 0, it follows that the function y → −y is continuous. Next, since compositions of continuous functions are again continuous, the function (x, y) → x + y = x − (−y) is also continuous. Similarly, the function (x, y) → y − x = −(x − y) is also continuous. Given a subset U ⊆ R and an element x ∈ R, let x + U denote the set defined by x + U = {x + y : y ∈ U} .
(C.1)
One can think of x + U as the set U “translated” by x. Now suppose U is an open subset of R, and define the function fx : R → R by fx (y) = y − x. As seen earlier, fx is continuous. Hence, the set fx−1 (U) is open. But clearly this set equals x + U. Thus, we have shown that if U is open, so is x + U for all x ∈ R. In other words, the topology on a topological ring is “translation-invariant,” in the sense that translates of open sets are again open. A consequence of this is that, once we know all the open sets containing 0, we know all open sets. We now speak briefly about normed algebras. Suppose A is a linear vector space over a field F. This means that there is a concept of addition between members of A, and scalar multiplication between an element of F and an element of A. Now A is an algebra (over F) if, in addition, one can define a product of two elements of A in such a way that A becomes a ring. Thus, an algebra A has associated with it a field F, and three operations: addition between elements of A, multiplication between two elements of A, and multiplication between an element of F and an element of A. A is a ring with respect to addition and multiplication, and A is a linear vector space with respect to addition and scalar multiplication. For example, if F is any field, the set of matrices Fn×n is an algebra over F. Now suppose A is an algebra over R (the real numbers) or C (the complex numbers). Then (A, · ) is a normed algebra if one can define a norm · on A such that (NA1) a ≥ 0 ∀ a ∈ A, a = 0 ⇐⇒ a = 0, (NA2) αa = |α| · a ∀ α ∈ R( or C), ∀ a ∈ A, (NA3) a + b ≤ a + b ∀ a, b ∈ A, (NA4) ab ≤ a · b , ∀ a, b ∈ A, Axioms (NA1)–(NA3) are just the usual requirements for (A, · ) to be a normed space, and (NA4) is the additional requirement for it to be a normed algebra. Let (A, · ) be a normed algebra, and let B denote the collection of balls B(x, ε) = {y ∈ A : x − y < ε} .
(C.2)
Then, B is a base for a topology T on A. This is the same as the topology induced by the metric ρ(x, y) = x − y. It is now shown that (A, T ) is a topological ring. In order to do this, it is necessary to show that the functions (x, y) → x − y and (x, y) → xy are both continuous. Let us begin with subtraction. Suppose x, ¯ y¯ ∈ A, z¯ = x¯ − y, ¯ and let S ⊆ A be an open set containing z¯ . We must show that the set {(x, y) : x − y ∈ S} =: V is an open subset of A × A. This is done by showing that, whenever
344
C. TOPOLOGICAL PRELIMINARIES
(x, y) ∈ V, there are balls B(x, δ), B(y, δ) such that B := B(x, δ) × B(y, δ) is contained in V. Since S is open and x − y ∈ S, by definition there is an ε > 0 such that B(x − y, ε) ⊆ S. Now let δ = ε/2, and, suppose x1 ∈ B(x, ¯ δ), y1 ∈ B(y, ¯ δ). Then (x1 − y1 ) − (x − y) ≤ x1 − x + y1 − y < 2δ < ε .
(C.3)
Hence, (x1 , y1 ) ∈ V. This shows that subtraction is continuous. Next, to show that multiplication is continuous, suppose x, ¯ y¯ ∈ A, let z¯ = x¯ y, ¯ and let S be an open set in A containing z¯ . Define V = {(x, y) : xy ∈ S}. We show that V is open by showing that, whenever (x, y) ∈ V, there exist balls B(x, δ), B(y, δ) such that B(x, δ) × B(y, δ) is contained in V. Since V is open, by definition there is an ε > 0 such that B(xy, ε) ⊆ S. Suppose x1 ∈ B(x, δ), y1 ∈ B(y, δ) where δ is yet to be specified. Then x1 y1 − xy = [(x1 − x) + x] · [(y1 − y) + y] − xy = (x1 − x)(y1 − y) + (x1 − x)y + x(y1 − y) .
(C.4)
x1 y1 − xy ≤ x1 − x · y1 − y + x1 − x · y x · y1 − y
+
≤ δ + δ(x + y) . 2
(C.5)
Hence (x1 , y1 ) ∈ V if δ is chosen such that δ 2 + δ(x + y) ≤ ε .
(C.6)
Such a δ can always be found since the left side of (C.6) is continuous in δ and equals zero when δ is zero. This completes the proof that multiplication is continuous.
Suppose (A, · ) is a normed algebra, and let U denote the set of units of A (i.e., the set of elements of A that have a multiplicative inverse in A). Then the function f : u → u−1 maps U into itself continuously. Fact C.2.2
By the continuity of the “inversion” function f , we mean the following: Given any u ∈ U and any ε > 0, there exists a δ > 0 such that
Remark C.2.3
u−1 − v −1 < ε whenever v ∈ U and u − v < δ .
(C.7)
C.2. TOPOLOGICAL RINGS AND NORMED ALGEBRAS
345
Proof. Let u ∈ U and let ε > 0 be specified. Suppose v ∈ U and v ∈ B(u, δ), where δ is not as yet specified. Then u−1 − v −1 = u−1 (v − u)v −1 = ≤ u−1 · v − u · v −1 ≤ δu−1 · v −1 ≤ δu−1 · (u−1 + u−1 − v −1 ) .
(C.8)
Solving for u−1 − v −1 from (C.8) gives u−1 − v −1 ≤
δu−1 2 . 1 − δu−1
(C.9)
Thus, (C.7) is satisfied if δ is chosen such that the right side of (C.9) is less than ε. But such a δ can always be found, since the right side of (C.9) is continuous in δ and equals zero when δ is zero. 2 This shows that the function f is continuous.
PROBLEMS C.2.1. Suppose (S1 , T1 ), (S2 , T2 ) are topological spaces. (i) Let T denote the collection of subsets of S1 × S2 of the form U × V where U ∈ T1 , V ∈ T2 . Show that T is a base for a topology on the set S1 × S2 (Hint: Show that T satisfies the axiom (B1) of Section B.1). (ii) Suppose B1 , B2 are respectively bases for the topologies T1 , T2 . Let B denote the collection of subsets of S1 × S2 of the form B1 × B2 where B1 ∈ B1 , B2 ∈ B2 . Show that B is also a base for a topology on S1 × S2 . (iii) Show that B and T generate the same topology on S1 × S2 (Hint: Show that every set in T is a union of sets in B). C.2.2. Consider the ring Rn×n of n × n matrices with real elements. Let · be any norm on the vector space Rn , and define the norm of a matrix by A = supx=0
Ax x
Show that (Rn×n , · ) is a normed algebra. C.2.3. Consider the ring R[s] of polynomials in the indeterminate s with real coefficients. Define the norm of a(s) = ai s i by i a(·) = |ai | . i
Show that (R[s], · ) is a normed algebra.
347
Bibliography [1] V. Anantharam and C. A. Desoer, “On the stabilization nonlinear systems,” IEEE Trans. on Auto. Control, AC-29, pp. 569–572, June 1984. DOI: 10.1109/TAC.1984.1103584 Cited on page(s) 306 [2] B. D. O. Anderson, “A note on the Youla-Bongiorno-Lu condition,” Automatica, 12, pp. 387– 388, July 1976. DOI: 10.1016/0005-1098(76)90060-1 Cited on page(s) [3] B. D. O. Anderson and J. B. Moore, Optimal Filtering, Prentice-Hall, Englewood Cliffs, New Jersey, 1979. Cited on page(s) 194, 229 [4] M. F. Atiyah and I. G. MacDonald, Introduction to Commutative Algebra, Addison-Wesley, Reading, MA., 1969. Cited on page(s) 283, 293 [5] J. A. Ball and J. W. Helton, “A Beurling-Lax theorem for the Lie group U (m, n) which contains most classical interpolation theory,” J. Operator Theory, 9, pp. l07–142, 1983. Cited on page(s) [6] C. I. Byrnes, M. W. Spong and T. J. Tarn, “A several complex variables approach to feedback stabilization of neutral delay-differential systems,” Math. Sys. Thy., 17, pp. 97–134, May 1984. DOI: 10.1007/BF01744436 Cited on page(s) 292, 306 [7] F. M. Callier and C. A. Desoer,“Open-loop unstable convolution feedback systems with dynamical feedback,” Automatica, 13, pp. 507–518, Dec. 1976. DOI: 10.1016/0005-1098(76)90010-8 Cited on page(s) [8] F. M. Callier and C. A. Desoer, “An algebra of transfer functions of distributed linear timeinvariant systems,” IEEE Trans. Circ. and Sys., CAS-25, pp. 651–662, Sept. 1978. DOI: 10.1109/TCS.1978.1084544 Cited on page(s) 287, 301, 305 [9] F. M. Callier and C. A. Desoer, “Simplifications and clarifications on the paper ’An algebra of transfer functions of distributed linear time-invariant systems,’ ” IEEE Trans. Circ. and Sys., CAS-27, pp. 320–323, Apr. 1980. DOI: 10.1109/TCS.1980.1084802 Cited on page(s) [10] F. M. Callier and C. A. Desoer, “Stabilization, tracking and disturbance rejection in multivariable convolution systems,” Annales de la Societé Scientifique de Bruxelles, 94, pp. 7–51, 1980. Cited on page(s) [11] F. M. Callier and C. A. Desoer, Multivariable Feedback Systems, Springer-Verlag, New York, 1982. Cited on page(s)
348
BIBLIOGRAPHY
[12] B. C. Chang and J. B. Pearson,“Optimal Disturbance reduction in linear multivariable systems,” IEEE Trans. on Auto. Control, AC-29, pp. 880–888, Oct. 1984. DOI: 10.1109/TAC.1984.1103409 Cited on page(s) 205 [13] M. J. Chen and C. A. Desoer, “Necessary and sufficient conditions for robust stability of linear distributed feedback systems,” Int. J. Control, 35, pp. 255–267, 1982. DOI: 10.1080/00207178208922617 Cited on page(s) 277 [14] M. J. Chen and C. A. Desoer, “Algebraic theory of robust stability of interconnected systems,” IEEE Trans. on Auto. Control, AC-29, pp. 511–519, June 1984. DOI: 10.1109/TAC.1984.1103572 Cited on page(s) 279 [15] J. H. Chow and P. V. Kokotovic, “Eigenvalue placement in two-time-scale systems,” Proc. IFAC Symp. on Large Scale Sys., Udine, Italy, pp. 321–326, June 1976. Cited on page(s) 224 [16] C.-C. Chu and J. C. Doyle, “On inner-outer and spectral factorizations,” Proc. IEEE Conf. on Decision and Control, Las Vegas, Dec. 1984. Cited on page(s) 205 [17] H. T. Colebrooke, Algebra with Arithmetic and Mensuration, John Murray, London, 1817, reprinted by Dr. Martin Sandig OHg, Wiesbaden, W. Germany, 1973. Cited on page(s) xvii [18] P. R. Delsarte, Y. Genin and Y. Kamp, “The Nevanlinna-Pick problem for matrix-valued functions,” SIAM J. Appl. Math, 36, pp. 47–61, Feb. 1979. DOI: 10.1137/0136005 Cited on page(s) 205 [19] C. A. Desoer and W. S. Chan, “The feedback interconnection of linear time-invariant systems,” J. Franklin Inst., 300, pp. 335–351, 1975. DOI: 10.1016/0016-0032(75)90161-1 Cited on page(s) [20] C. A. Desoer and M. J. Chen, “Design of multivariable feedback systems with stable plant,” IEEE Trans. on Auto. Control, AC-26, pp. 408–415, April 1981. DOI: 10.1109/TAC.1981.1102594 Cited on page(s) [21] C. A. Desoer and C. L. Gustafson, “Design of multivariable feedback systems with simple unstable plant,” IEEE Trans. on Auto. Control, AC-29, pp. 901–908, Oct. 1984. DOI: 10.1109/TAC.1984.1103400 Cited on page(s) [22] C. A. Desoer and C. L. Gustafson, “Algebraic theory of linear multivariable feedback systems,’ IEEE Trans. on Auto. Control, AC-29, pp. 909–917, Oct. 1984. DOI: 10.1109/TAC.1984.1103410 Cited on page(s) [23] C. A. Desoer and C. A. Lin,“Two-step compensation of nonlinear systems,” Systems and Control Letters, 3, pp. 41–46, June 1983. DOI: 10.1016/0167-6911(83)90036-1 Cited on page(s) 306
BIBLIOGRAPHY
349
[24] C. A. Desoer and R. W. Liu, “Global parametrization of feedback systems with nonlinear plants,” Systems and Control Letters, 1, pp. 249–251, Jan. 1982. DOI: 10.1016/S0167-6911(82)80006-6 Cited on page(s) 306 [25] C. A. Desoer, R. W. Liu, J. Murray and R. Saeks, “Feedback system design: The fractional representation approach to analysis and synthesis,” IEEE Trans. Auto. Control, AC-25, pp. 399– 412, June 1980. DOI: 10.1109/TAC.1980.1102374 Cited on page(s) [26] C. A. Desoer and M. Vidyasagar, Feedback Systems: Input-Output Properties, Academic Press, New York, 1975. Cited on page(s) 154, 210, 239, 284, 287 [27] J. C. Doyle, “Robustness of multiloop linear feedback systems,” Proc. 17th Conf. Decision and Control, Ft. Lauderdale, pp. 12–18, 1979. Cited on page(s) [28] J. C. Doyle, K. Glover, P. P. Khargonekar and B. A. Francis, “State space solutions to standard H2 and H∞ control problems,” IEEE Transactions on Automatic Control, 34, 831-847, 1989. Cited on page(s) xiii [29] J. C. Doyle, B. A. Francis and A. Tannenbaum, Feedback Control Theory, MacMillan, New York, 1991 Cited on page(s) xiii [30] J. Doyle, “Analysis of feedback systems with structured uncertainties,” IEE Proc., Part D, 129, pp. 242–250, Nov. 1982. Cited on page(s) [31] J. C. Doyle, “Synthesis of robust controllers and filters,” Proc. IEEE Conference on Decision and Control, pp. 109–114, 1983. DOI: 10.1109/CDC.1983.269806 Cited on page(s) 205 [32] J. C. Doyle and G. Stein, “Multivariable feedback design: Concepts for a classical/modern synthesis,” IEEE Trans. Auto. Control, AC-26, pp. 4–16, Feb. 1981. DOI: 10.1109/TAC.1981.1102555 Cited on page(s) 277 [33] J. C. Doyle, J. E. Wall and G. Stein, “Performance and robustness resuits for structured uncertainty,” Proc. IEEE Conf. on Decision and Control, pp. 628–636, 1982. Cited on page(s) [34] P. L. Duren, The Theory of H p -spaces, Academic Press, New York, 1970. Cited on page(s) 156, 205, 286 [35] B. A. Francis, “The multivariable servomechanism problem from the input-output viewpoint,” IEEETrans. Auto. Control, AC-22, pp. 322–328, June 1977. DOI: 10.1109/TAC.1977.1101501 Cited on page(s) [36] B. A. Francis,“On the Wiener-Hopf approach to optimal feedback design,” Systems and Control Letters, 2, pp. 197–201, Dec. 1982. DOI: 10.1016/0167-6911(82)90001-9 Cited on page(s) 205
350
BIBLIOGRAPHY
[37] B. A. Francis, A Course in H∞ Control Theory, Lecture Notes in Control and Information Sciences, Volume 88, Springer-Verlag, Heidelberg, 1987. Cited on page(s) xiii [38] B. A. Francis, J. W. Helton and G. Zames, “H ∞ -optimal feedback controllers for linear multivariable systems,” IEEE Trans. on Auto. Control, AC-29, pp. 888–900, Oct. 1984. DOI: 10.1109/TAC.1984.1103387 Cited on page(s) 205 [39] B. A. Francis and M. Vidyasagar, “Algebraic and topological aspects of the regulator problem for lumped linear systems,” Automatica, 19, pp. 87–90, Jan. 1983. DOI: 10.1016/0005-1098(83)90078-X Cited on page(s) 277 [40] B. A. Francis and G. Zames, “On optimal sensitivity theory for SISO feedback systems,” IEEE Trans. Auto. Control, AC-29, pp. 9–16, Jan. 1984. DOI: 10.1109/TAC.1984.1103357 Cited on page(s) xiii, 205 [41] F. R. Gantmacher, Theory of Matrices, Chelsea, New York. Cited on page(s) 326 [42] I. M. Gel’fand, D. Raikov and G. Shilov, Commutative normed rings, Chelsea, New York, 1964. Cited on page(s) 279, 281 [43] B. K. Ghosh and C. I. Byrnes, “Simultaneous stabilization and simultaneous pole-placement by non-switching dynamic compensation,” IEEE Trans. Auto. Control, AC-28, pp. 735–741, June 1983. DOI: 10.1109/TAC.1983.1103299 Cited on page(s) 277 [44] K. Glover, “All optimal Hankel-norm approximations of linear multivariable systems and their L∞ -error bounds,” Int. J. Control, 39, pp. 1115–1193, June 1984. DOI: 10.1080/00207178408933239 Cited on page(s) 205 [45] M. Green and D. J. N. Limebeer, Linear Robust Control, Prentice-Hall, Englewood Cliffs, New Jersey, 1995. Cited on page(s) xiii [46] V. Guillemin and A. Pollack, Differential Topology, Prentice-Hall, Englewood Cliffs, NJ., 1979. Cited on page(s) 293 [47] C. L. Gustafson and C. A. Desoer, “Controller design for linear multivariable feedback systems with stable plant,” Int. J. Control, 37, pp. 881–907, 1983. DOI: 10.1080/00207178308933018 Cited on page(s) [48] E. Hille and R. S. Phillips, Functional Analysis and Semigroups, Amer. Math. Soc., Providence, RI, 1957. Cited on page(s) 287 [49] K. Hoffman, Banach Spaces of Analytic Functions, Prentice-Hall, Englewood Cliffs, NJ., 1962. Cited on page(s)
BIBLIOGRAPHY
351
[50] N.T. Hung and B. D. O. Anderson,“Triangularization technique for the design of multivariable control systems,” IEEE Trans. Auto. Control, AC-24, pp. 455–460, June 1979. DOI: 10.1109/TAC.1979.1102052 Cited on page(s) [51] C. A. Jacobson, “Some aspects of the structure and stability of a class of linear distributed systems,” Robotics and Automation Lab. Rept. No. 31, Renn. Poly. Inst., Dept. of ECSE, May 1984. Cited on page(s) 306 [52] N. Jacobson, Lectures in Abstract Algebra, Vol. I, Van-Nostrand, New York. 1953. Cited on page(s) 307 [53] T. Kailath, Linear Systems, Prentice-Hall, Englewood Cliffs, NJ., 1980. Cited on page(s) [54] E. W. Kamen, P. P. Khargonekar and A. Tannenoaum, “A local theory of linear systems with noncommensurate time delays,” submitted for publication. Cited on page(s) 292, 306 [55] E. W. Kamen, P. P. Khargonekar and A. Tannenbaum, “Stabilization of time-delay systems using finite-dimensional compensators,” IEEE Trans. on Auto. Control, AC-30, pp. 75–78, Jan. 1985. DOI: 10.1109/TAC.1985.1103789 Cited on page(s) 306 [56] J. L. Kelley, General Topology, Van-Nostrand, New York, 1955. Cited on page(s) 337, 341 [57] P. P. Khargonekar and A. Tannenbaum, “Noneuclidean metrics and the robust stabilization of systems with parameter uncertainty,” IEEE Trans. Auto. Control, AC-30, pp. 1005-1013, Oct. 1985. DOI: 10.1109/TAC.1985.1103805 Cited on page(s) 277 [58] H. Kimura, “Robust Stabilizability for a class of transfer functions,” IEEE Trans. Auto. Control, AC-29, pp. 788–793, Sept. 1984. DOI: 10.1109/TAC.1984.1103663 Cited on page(s) 277 [59] P. Koosis, The Theory of H p spaces, Cambridge University Press, Cambridge, 1980. Cited on page(s) 156, 205 [60] V. Kuˇcera, Discrete Linear Control: The Polynomial Equation Approach, Wiley, New York, 1979. Cited on page(s) xvi, 205 [61] T. Y. Lam, Serre’s Conjecture, Lecture notes in Mathematics, No. 635, Springer-Verlag, Berlin, 1978. Cited on page(s) 293 [62] V. Ya. Lin, “Holomorphic fiberings and multivalued functions of elements of a Banach algebra,” (English translation) Funct. Anal. Appl., 37, pp. 122–128, 1973. DOI: 10.1007/BF01078884 Cited on page(s) 291 [63] H. Lindel and W. Lutkebohmert, “Projektive modulen fiber polynomialen erweiterungen von potenzreihenalgebren,” Archiv der Math, 28, pp. 51–54, 1977. DOI: 10.1007/BF01223888 Cited on page(s) 293
352
BIBLIOGRAPHY
[64] R. W. Liu and C. H. Sung, “Linear feedback system design,” Circ. Sys. and Sig. Proc., 2, pp. 35– 44, 1983. DOI: 10.1007/BF01598142 Cited on page(s) 306 [65] C. C. MacDuffee, Theory of Matrices, Chelsea, New York, 1946. Cited on page(s) 319, 323 [66] H. Maeda and M. Vidyasagar,“Some results on simultaneous stabilization,” Systems and Control Letters, 5, pp. 205-208, Sept. 1984. DOI: 10.1016/S0167-6911(84)80104-8 Cited on page(s) [67] H. Maeda and M. Vidyasagar, “Infinite gain margin problem in multivariable feedback systems,” Automatica, 22, pp. 131-133, Jan. 1986. DOI: 10.1016/0005-1098(86)90115-9 Cited on page(s) 277 [68] A. S. Morse, “System invariants under feedback and cascade control,” Proc. Int. Conf. on Math. Sys. Thy., Udine, ltaly, 1976. Cited on page(s) [69] B. Sz-Nagy and C. Foias, Harmonic Analysis of Operators on Hilbert Space, Elsevier, New York, 1970. Cited on page(s) 205 [70] C. N. Nett, “The fractional representation approach to robust linear feedback design: A selfcontained exposition,” Robotics and Automation Lab. Rept. No. 30, Renn. Poly. Inst., Dept. of ECSE, May 1984. Cited on page(s) 306 [71] C. N. Nett, C. A. Jacobson and M. J. Balas, “Fractional representation theory: Robustness with applications to finite-dimensional control of a class of linear distributed systems,” Proc. IEEE Conf. on Decision and Control, pp. 268–280, 1983. DOI: 10.1109/CDC.1983.269841 Cited on page(s) 306 [72] C. N. Nett, C. A. Jacobson and M. J. Balas, “A connection between state-space and doubly coprime fractional representations,” IEEE Trans. Auto. Control, AC-29, pp. 831–832, Sept. 1984. DOI: 10.1109/TAC.1984.1103674 Cited on page(s) [73] G. Newton, L. Gould and J. F. Kaiser, Analytic Design of Linear Feedback Controls, Wiley, New York, 1957. Cited on page(s) [74] L. Pernebo, “An algebraic theory for the design of controllers for linear multivariable systems,” IEEE Trans. Auto. Control, AC-26, pp. 171–194, Feb. 1981. DOI: 10.1109/TAC.1981.1102554 Cited on page(s) [75] V. P. Potapov, “The Multiplicative structure of J -contractive matrix functions,” Translations of the Amer. Math. Soc., 15, pp. 131–243. 1960. Cited on page(s) 205 [76] D. Quillen, “Projective modules over polynomial rings,” Invent. Math., 36, pp. 167–171, 1976. DOI: 10.1007/BF01390008 Cited on page(s) 293
BIBLIOGRAPHY
353
[77] V. R. Raman and R. W. Liu, “A necessary and sulficient condition for feedback stabilization in a factorial ring,” IEEE Trans. Auto. Control, AC-29, pp. 941–942, Oct. 1984. DOI: 10.1109/TAC.1984.1103395 Cited on page(s) 306 [78] H. H. Rosenbrock, State Space and Multivariable Theory, Nelson, London, 1970. Cited on page(s) [79] J. J. Rotman, An Introduction to Homological Algebra, Academic Press, New York, 1979. Cited on page(s) 293 [80] W. Rudin, Fourier Analysis on Groups, John Wiley, New York, 1962. Cited on page(s) 285, 287 [81] W. Rudin, Real and Complex Analysis, McGraw-Hill, New York, 1974. Cited on page(s) 210 [82] R. Saeks and J. Murray, “Feedback System Design: The tracking and disturbance rejection problems,” IEEE Trans. Auto. Control, AC-26, pp. 203–218, Feb. 1981. DOI: 10.1109/TAC.1981.1102561 Cited on page(s) [83] R. Saeks and J. Murray, “Fractional representation, algebraic geometry and the simultaneous stabilization problem,” IEEE Trans. Auto. Control, AC-27, pp. 895–903, Aug. 1982. DOI: 10.1109/TAC.1982.1103005 Cited on page(s) [84] R. Saeks, J. Murray, O. Chua, C. Karmokolias and A. Iyer, “Feedback system design: The single variate case - Part I,” Circ., Sys. and Sig. Proc., 1, pp. 137–170, 1982. DOI: 10.1007/BF01600050 Cited on page(s) [85] R. Saeks, J. Murray, O. Chua, C. Karmokolias and A. Iyer, “Feedback system design: The single variate case - Part II,” Circ., Sys. and Sig. Proc., 2, pp. 3–34, 1983. DOI: 10.1007/BF01598141 Cited on page(s) [86] D. Sarason, “Generalized interpolation in H ∞ ,” Trans. Amer. Math. Soc., 127, pp. 179–203, May 1967. DOI: 10.2307/1994641 Cited on page(s) 171, 179, 180, 205 [87] D. D. Siljak, “On reliability of control,” Proc. IEEE Conf. Decision and Control, pp. 687–694, 1978. Cited on page(s) [88] D. D. Siljak, “Dynamic reliability using multiple control systems,” Int. J. Control, 31, pp. 303– 329, 1980. DOI: 10.1080/00207178008961043 Cited on page(s) [89] G. Simmons, Introduction to Topology and Modern Analysis, McGraw-Hill, New York, 1966. Cited on page(s) 337 [90] D. E. Smith, History of Mathematics. Vol. I, Dover, New York, 1958. Cited on page(s) xvii [91] A. A. Suslin, “Projective modules over a polynomial ring are free,” Soviet Math. Doklady, 17, pp. 1160–1164, 1976. Cited on page(s)
354
BIBLIOGRAPHY
[92] M. Vidyasagar, “Input-output stability of a broad class of linear time-invariant multivariable feedback systems,” SIAM J. Control, 10, pp. 203–209, Feb. 1972. DOI: 10.1137/0310015 Cited on page(s) xv [93] M. Vidyasagar, “Coprime factorizations and the stability of muitivariable distributed feedback systems,” SIAM J. Control, 13, pp. 1144–1155, Nov. 1975. DOI: 10.1137/0313071 Cited on page(s) [94] M. Vidyasagar, Nonlinear Systems Analysis, Prentice-Hall, Englewood Cliffs, NJ., 1978. Cited on page(s) 154 [95] M. Vidyasagar, “On the use of right-coprime factorizations in distributed feedback systems containing unstable subsystems,” IEEE Trans. Circ. and Sys., CAS-25, pp. 916–921, Nov. 1978. DOI: 10.1109/TCS.1978.1084408 Cited on page(s) 210 [96] M. Vidyasagar, “The graph metric for unstable plants and robustness estimates for feedback stability,” IEEE Trans. Auto. Control, AC-29, pp. 403–418, May 1984. DOI: 10.1109/TAC.1984.1103547 Cited on page(s) 277 [97] M. Vidyasagar and N. K. Bose, “Input-output stability of linear systems defined over measure spaces,” Proc. Midwest Symp. Circ. and Sys., Montreal, Canada, pp. 394–397, Aug. 1975. Cited on page(s) 284 [98] M. Vidyasagar and K. R. Davidson, “A parametrization of all stable stabilizing compensators for single-input-output systems,” (under preparation). Cited on page(s) [99] M. Vidyasagar and H. Kimura, “Robust controllers for uncertain linear multivariable systems,” Automatica, 22, pp. 85-94, Jan. 1986. DOI: 10.1016/0005-1098(86)90107-X Cited on page(s) 277 [100] M. Vidyasagar and B. C. Lévy, “On the genericity of simultaneous stabilizability,” Symp. Math. Thy. of Netw. and Sys., Stockholm, 1985. Cited on page(s) 275, 277 [101] M. Vidyasagar, H. Schneider and B. A. Francis, “Algebraic and topological aspects of feedback stabilization,” IEEE Trans. Auto. Control, AC-27, pp. 880–894, Aug. 1982. DOI: 10.1109/TAC.1982.1103015 Cited on page(s) 300, 301 [102] M. Vidyasagar and N. Viswanadham, “Algebraic design techniques for reliable stabilization,” IEEE Trans. Auto. Control, AC-27, pp. 1085–1095, Oct. 1982. DOI: 10.1109/TAC.1982.1103086 Cited on page(s) 277 [103] M. Vidyasagar and N. Viswanadham, “Reliable stabilization using a multi-controller configutation,” Proc. IEEE Conf. on Decision and Control, pp. 856–859, 1983. DOI: 10.1016/0005-1098(85)90008-1 Cited on page(s)
BIBLIOGRAPHY
355
[104] B. L. van der Waerden, Geomety and Algebra in Ancient Civilizations, Springer Verlag, New York, 1984. Cited on page(s) xvii [105] J. L. Walsh, Interpolation and Approximation by Rational Functions in the Complex Domain, AMS Colloqium Publ., Providence, RI., 1935. Cited on page(s) [106] W. Wolovich, Linear Multivariable Control, Springer-Verlag, New York, 1974. Cited on page(s) [107] W. M. Wonham, Linear Multivariable Control: A Geometric Approach, 2nd ed., SpringerVerlag, New York, 1979. Cited on page(s) 254 [108] D. C. Youla, J. J. Bongiorno, Jr. and C. N. Lu, “Single-loop feedback stablization of linear multivariable plants,” Automatica, 10, pp. 159–173, 1974. DOI: 10.1016/0005-1098(74)90021-1 Cited on page(s) [109] D. C. Youla, J. J. Bongiorno, Jr. and H. A. Jabr, “Modern Wiener-Hopf design of optimal controllers, Part I: The single-input case,” IEEE Trans. on Auto. Control, AC-21, pp. 3–14, Feb. 1976. DOI: 10.1109/TAC.1976.1101139 Cited on page(s) [110] D. C. Youla and G. Gnavi, “Notes on n-dimensional system theory,” IEEE Trans. Circ. and Sys., CAS-26, pp. 105–111, Feb. 1979. DOI: 10.1109/TCS.1979.1084614 Cited on page(s) 283 [111] D. C. Youla, H. A. Jabr and J. J. Bongiorno, Jr., “Modern Wiener-Hopf design of optimal controllers, Part II: The multivariable case,” IEEE Trans. Auto. Control, AC-21, pp. 319–338, June 1976. DOI: 10.1109/TAC.1976.1101223 Cited on page(s) 205 [112] D. C. Youla and R. Pickel, “The Quillen-Suslin theorem and the structure of n-dimensional elementary polynomial matrices,” IEEE Trans. Circ. and Sys., CAS-31, pp. 513–518, June 1984. DOI: 10.1109/TCS.1984.1085545 Cited on page(s) 293, 294 [113] G. Zames,“Feedback and optimal sensitivity: Model reference transformations, multiplicative seminorms and approximate inverses,” IEEE Trans. Auto. Control, AC-26, pp. 301–320, April 1981. DOI: 10.1109/TAC.1981.1102603 Cited on page(s) 205 [114] G. Zames and A. El-Sakkary, “Unstable systems and feedback: The gap metric,” Proc. Allerton Conf., pp. 380–385, 1980. Cited on page(s) 228, 277 [115] G. Zames and B. A. Francis, “A new approach to classical frequency methods: feedback and minimax sensitivity,” IEEE Trans. Auto. Control, AC-28, pp. 585–601, May 1983. DOI: 10.1109/TAC.1983.1103275 Cited on page(s) xiii, 205 [116] O. Zariski and P. Samuel, Commutative Algebra, Vol. I, Van-Nostrand, New York, 1958. Cited on page(s) 279, 281, 282, 307, 316
356
BIBLIOGRAPHY
[117] K. Zhou, J. C. Doyle and K. Glover, Robust and Optimal Control, Prentice-Hall, Englewood Cliffs, New Jersey, 1996. Cited on page(s) xiii
357
Author’s Biography MATHUKUMALLI VIDYASAGAR Mathukumalli Vidyasagar was born in Guntur, India in 1947. He received the B.S., M.S., and Ph.D. degrees in electrical engineering from the University of Wisconsin in Madison, in 1965, 1967, and 1969, respectively. From the next twenty years he taught mostly in Canada, before returning to his native India in 1989. Over the next twenty years, he built up two research organizations from scratch, first the Centre for Artificial Intelligence and Robotics under the Ministry of Defence, Government of India, and later the Advanced Technology Center in Tata Consultancy Services (TCS), India’s largest software company. After retiring from TCS in 2009, he joined the University of Texas at Dallas as the Cecil & Ida Green Chair in Systems Biology Science, and he became the Founding Head of the Bioengineering Department. His current research interests are stochastic processes and stochastic modeling, and their application to problems in computational biology. He has received a number of awards in recognition of his research, including the 2008 IEEE Control Systems Award.
359
Index absolute nondivisor of zero, 312 accumulation point, 340 additive perturbation, 236 associate, 316 left, 328 right, 328 base for a topology, 338 Bezout domain, 279 relation to coprime factorizations, 280 binary relation, 269 Binet-Cauchy formula, 325 Blaschke product, 158 closed set, 337 cluster point, 340 complementary inner matrix, 194 complementation, 288 in a local ring, 296 relation to coprime factorizations, 288 comprime factorization over a general ring, 279 relation to Bezout domains, 280 relation to principal ideals, 279, 280 comprimeness over a general ring, 281 continuity, 341 contractible set, 291 relation to Hermite rings, 291 coprimeness, 315 in A, 287 corona theorem, 286
degree function, 319 determinant, 323 multilinearity of, 324 domain, 308 equivalence, 328 Euclidean domain, 319 proper, 319 field, 309 of fractions, 313 filtering, 153 solution, 162, 163 fraction, 312 Gel’fand transform, 284 generic relation, 269 genericity of comprimeness, 270 of nonsingularity, 271 of simultaneous strong stabilizability, 273, 277 of Smith forms, 271, 273 graph, 209 graph metric, 229 estimates for, 232 relation to graph topology, 230 graph topology, 212 and singular perturbations, 223 and state space models, 221 continuity in, 215
360
INDEX
convergence in, 213 relation to product topology, 219 relation to robust stabilization, 215 greatest common divisor, 314
normalized r.c.f., 228 normed algebra, 342 nullstellensatz, 283 Nyquist criterion, 238
Hardy spaces, 156 Hausdorff topology, 340 Hermite form, 329 Hermite ring, 290 relation to coprime factorizations, 291
open set, 337 optimal robust regulation, 264 outer function, 158 outer matrix, 159
ideal, 309 left, 309 maximal, 281 prime, 310 principal, 310 right, 309 index, 238 inner function, 158 inner matrix, 159 inner-outer factorization, 158, 159 internal model principle, 254 invariant factor, 332 Laplace’s expansion, 328 least common multiple, 317 limit point, 340 local ring, 293 localization, 293, 313 locally compact Abelian group, 285 maximal ideal space, 283 maximum modulus principle, 186 metric space, 339 multiplicative perturbation, 236 multiplicative set, 298, 312 Nagata’s lemma, 294 neighborhood, 339 Nevanlinna-Pick algorithm, 185
performance index, 153 prime element, 316 prime factorization, 317 principal ideal domain, 314 principal ideal ring, 314 product topology, 342 proper Euclidean domain, 319 Quillen-Suslin theorem, 293 quotient field, 313 reduced form, 316 relation, 269 binary, 269 generic, 269 strongly generic, 269 symmetric, 269 ring, 307 commutative, 307 of fractions, 313 with an identity, 307 robust disturbance rejection, 252 solution, 252 robust regulation, 252 robust stabilizability, 209 robust stabilization existence of compensators, 244, 246, 248 theorems, 237, 242 robust tracking, 253 using two-parameter compensation, 259
INDEX
saturated set, 298, 312 Schur complement, 327 sensitivity minimization, 154 example, 174, 192 for a fat plant, 176 for a general plant, 193 for a scalar plant, 166 Serre’s conjecture, 293 singular perturbations, 223 and strictly proper compensators, 227 Smith form, 332 Smith-McMillan form, 335 spectral factor, 194 spectral factorization, 194 stabilization by lumped compensators, 305 stable-factor perturbation, 236
strongly generic relation, 269 subring, 309 symmetric relation, 269 topological ring, 342 topological space, 337 topology, 337 discrete, 337 first-countable, 339 Hausdorff topology, 340 product, 342 trivial, 337 unimodular row, 289 weighted performance index, 153
361