STUDIES IN COMPUTATIONAL MATHEMATICS 6
Editors:
C. BREZINSKI University of Lille Villeneu~'e d'Ascq, Fran~'e
L. WUYT...
72 downloads
976 Views
19MB Size
Report
This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
Report copyright / DMCA form
STUDIES IN COMPUTATIONAL MATHEMATICS 6
Editors:
C. BREZINSKI University of Lille Villeneu~'e d'Ascq, Fran~'e
L. WUYTACK Uni~'ersity of Antw'elT~ Wihjjk, Belgium
ELSEVIER Amsterdam
-
Lausanne
-
New
York
-
Oxford
-
Shannon
-
Singapore
-
Tokyo
LINEAR ALGEBRA, RATIONAL APPROXIMATION AND ORTHOGONAL POLYNOMIALS
LINEAR ALGEBRA, RATIONAL APPROXIMATION AND ORTHOGONAL POLYNOMIALS
Adhemar BULTHEEL Marc VAN BAREL Departnlent of Conqmter S~'ien~'e Katholieke U,i~'ersiteit Lettve, He~'erlee. Belgi,m
1997
ELSEVIER Amsterdam
-
Lausanne
-
New
York
-
Oxford
-
Shannon
-
Singapore
-
Tokyo
ELSEVIER SCIENCE B.V.
Sara Burgerhartstraat 25 RO. Box 211, 1000 AE Amsterdam, The Netherlands
Library o f C o n g r e s s C a t a l o g i n g - i n - P u b l i c a t i o n
Data
Bulrheel, Adhemar• Linear algebra, rational approximation, and o r t h o g o n a l p o l y n o m i a l s / Adhemar B u l t h e e l , Marc van B a r e l . p. cm. - - ( S t u d i e s in c o m p u t a t i o n a l m a t h e m a t i c s - 6) lncludes bibliographical references and i n d e x . ISBN 0 - 4 4 4 - 8 2 8 7 2 - 9 (acid-free paper) I. Euclidian algorithm. 2. Algebras, Linear• 3. Pade approximant. 4. Orthogonal polynomials. I. Barel, Marc v a n , 1 9 6 0 • II. Title. III. Series. QA242.B88 1997 512'.72--dc21 97-40610 CIP
ISBN" () 444 82872 9
© 1997 ELSEVIER SCIENCE B.V. All righls reserved. N~ part of this publication may hc reproduced, stored in a retrieval system or transmitted in any form or by any rncans, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher, Elsevier Scicrlcc B.V., Copyright & Permissions Department, P.O. Box 521, 1()()() A M Amsterdam, The Netherlands. Special regulations lk)r readers in the U.S.A. - This publication has hccn registered with the Copyrighl Clearance Center Inc. (CCC), 222 RoscwC~C~d Drive, Danvcrs, M A ()]923. infomlation can bc obtained from the ('CC aboul conditi~ms under which ph~lC~copics of paris of this publication may hc made in the U.S.A. All ()thor copyright qucsli()ns, including ph~tocopying oulsidc of the U.S.A., should be referred It) lhc publisher. NC~ rcsFmsibilily ix assunlcd by the publisher Iktr any injury am_l/Ctr clama,~c IC~ persons Ctr properly as a matter ~1 pntducls liahilily, negligence ~r ~tthcrwisc, ~tr lr~tm any use or opcrali~m of any methods, pn~ducls, inslructi~ms ~r iclcas c~mtaincd in the malcrial herein. This b~t~k ix printed ~m acid-free paper. Prinlcd in lhc Nclhcrlands
Preface It is very interesting to see how the same principles and techniques are developed in different fields of mathematics, quite independently from each other. It is only after a certain m a t u r i t y is reached that a result is recognized as a variation on an old theme. This is not always a complete waste of effort because each approach has its own merits and usually adds something new to the existing theory. It is virtually impossible and there is no urgent need to stop this natural development of things. If tomorrow a new application area is emerging with its own terminology and ways of thinking, it is not impossible that an existing method is rediscovered and only later recognized as such. It will be the twist in the formulation, the slightly different objectives, an extra constraint etc. that will revive the interest in an old subject. The availability of new technological or m a t h e m a t i c a l tools will force researchers to rethink theories that have long been considered as complete and dead as far as research is concerned. In this text we give a good illustration of such an evolution. For the underlying principle we have chosen the algorithm of Euclid. It is probably the oldest known nontrivial algorithm, which can be found in the most elementary algebra textbooks. It is certainly since the introduction of digital computers and the associated algorithmic way of thinking that it received a new impetus by its many applications which resulted in an increasing interest, but its simplicity in dealing with situations that are, at least in certain problem formulations not completely trivial, explains its success. Already in ancient times, long before modern computers became an essential part of the scene, the algorithm has been used in many applications. In its original form, it deals with a geometrical problem. At least, Euclid himself describes it in his 7th book of the Elements as a way of constructing the largest possible unit rule which measures the length of two given rules as an integer times this unit rule. Nowadays we recognize the Euclidean algorithm as a method to compute the greatest common divisor of two integers or of two polynomials. This may seem a trivial step, yet, it
vi
PREFACE
links geometry, algebra and number theory. Of course, the distinction between different mathematical disciplines is purely artificial and is invented by mathematicians. Luckily, the self-regulating mathematical system maintains such links between the different types of mathematicians, and prevents them from drifting too far apart. This trivial observation would not justify this text if there weren't many more applications of this computational method. It was recognized with the invention of continued fractions that the algorithm does not only compute the final result: the greatest common divisor, but all the intermediate results also appear as numerators or denominators in a (finite) continued fraction expansion of a rational number. The next, quite natural step, is to apply the algorithm to test whether a number is rational or not and to see what happens when the number is not rational. The algorithm will go on indefinitely, and we get an infinite continued fraction expansion. The study of these expansions became a useful and powerful tool in number theory. The same technique can be applied to (infinite formal) power series, rather than just polynomials, and again the algorithm will end after a finite number of steps if we started from a representation of a rational fraction or it will give an infinite (formal) expansion which might or might not converge in a certain region e.g., of the complex plane. This hooks up the Euclidean algorithm with (rational) approximation theory. From an algebraic point of view, the Euclidean domains (and several of its generalizations) became a study object in their own right. From the approximation side, the kind of rational approximants that you get are known as Pad6 approximants. Although this theory has celebrated its hundredth anniversary, the recognition of the Euclidean algorithm as an elegant way of constructing approximants in a nonnormal table came only a couple of decades ago. Traditionally, the computation of a Padd approximant was done via the solution of a linear system of equations. The matrix of the system has a special structure because it contains only O(n) different elements while the entries on an antidiagonal are all the same. It is a Hankel matrix. The linear algebra community became interested in solving this special type of systems because their solution can be obtained with sequential computations in O(n 2) operations, rather than the O(n 3) for general systems. The Hankel matrices, and the related Toeplitz matrices, show up in numerous applications and have been studied for a long time and that explains why it is important to have efficient solution methods for such systems which fully exploit their special structure. Not only the solution of such systems, but also other, related linear algebra problems can benefit from the known
PREFACE
vii
theory of the Euclidean algorithm. The Lanczos algorithm, which is recently rediscovered as a fashionable research area is intimately related to the Euclidean algorithm. Again the algorithm of Euclid can serve as an introduction to the fast growing literature on fast algorithms for matrices with special structure, of which Toeplitz and Hankel matrices are only the most elementary examples. Connected, both to Toeplitz and Hankel matrices and to Pad~ approximation, are orthogonal polynomials and moment problems. Given the moments for some inner product, the problem is to find the orthogonal polynomials and eventually the measure itself. For a measure with support on the real line, the moment matrix is typically a Hankel matrix, for Szeg6's theory where the support is the unit circle of the complex plane, the moment matrix is typically Toeplitz. To have a genuine inner product, the moment matrices should be positive definite and strongly nonsingular, that is, all its principal leading minors are nonsingular. In Pad~ approximation, this has been formally generalized to orthogonality with respect to some linear functional and the denominators of the approximants are (related to) the orthogonal polynomials. However, in this situation, the moment matrix is Hankel, but in general neither positive definite nor strongly nonsingular and then the Euclidean algorithm comes again to the rescue because it helps to jump over the singular blocks in a nonnormal Pad~ table. A similar situation occurs in the problem of Laurent-Pad~ approximation which is related to a Toeplitz moment matrix, and also here the matrix is neither positive definite nor strongly nonsingular. The analogs of the Euclidean algorithm which can handle these singular situations are generalizations of the Schur and Szeg6 recursions in classical moment theory. A final cornerstone of this text that we want to mention here is linear systems theory. This is an example of an engineering application where many, sometimes deep, mathematical results come to life. Here, both mathematicians and engineers are active to the benefit of both. The discipline is relatively young. It was only since the nineteenthirties that systems theory became a mathematical research area. In the current context we should mention the minimal partial realization problem for linear systems. It is equivalent with a Pad~ approximation problem at infinity. The minimality of the realization is however important from a practical point of view. The different formulation and the extra minimality condition makes it interesting because classical Pad~ approximation doesn't give all the answers and a new concept of minimal Pad~ approximation is the natural equivalent in the theory of Pad~ approximation. A careful examination of the Euclidean algorithm will reveal that it is actual]~, a variant of the Berlekamp-Massey
viii
PREFACE
algorithm. The latter was originally developed as a method for handling error-correcting codes by shift register synthesis. It became known to the engineering community as also solving the minimal partial realization problem. Another aspect that makes systems theory interesting in this aspect is that in this area it is quite natural to consider systems with n inputs and m outputs. When m and n are equal to 1, we get the scalar theory, but with m and n larger than 1, the moments, which are called Markov parameters in this context, are matrices and the denominators of the realizations are square polynomial matrices and the numerators rectangular polynomial matrices. So, many, but not all, of the previously mentioned aspects are generalized to the matrix or block matrix case for multi-input multi-output systems. In the related areas of linear algebra, orthogonal polynomials and Pad~ approximation, these block cases are underdeveloped to almost nonexisting at all. The translation of results from multi-input multi-output systems theory to the fields mentioned above was one of the main incentives for putting the present text together. In this book we shall only consider the scalar theory and connections that we have sketched above. Excellent textbooks exist on each of the mentioned areas. In most of them the Euclidean algorithm is implicitly or explicitly mentioned, but the intimate interplay between the different fields is only partially covered. It is certainly not our intention to replace any of these existing books, but we want in the first place put their interconnection at the first plan and in this way we hope to fill an empty space. We make the text as selfcontained as possible but it is impossible to repeat the whole theory. If you are familiar with some of the fields discussed it will certainly help in understanding our message. For the theory of continued fractions and their application in Pad~ approximation as well as in number theory, you can consult the book by Perron Die Lehre yon den Kettenbriicken [202] and Wall's Analytic theory of continued fractions [237, 238] which are classics, but Jones and Thron's Continued fractions, analytic theory and applications [161] can be considered as a modern classic in this domain. The most recent book on the subject is Continued fractions with applications [183] by Lorentzen and Waadeland who include also many applications, including orthogonal polynomials and signal processing. For Pad~ approximations you may consult e.g. Baker's classic" Essentials of Padg Approximation [5], but a more recent substitute is the new edition of the book Padd Approximants by Baker and Graves-Morris [6, 7]. See also [69]. The connection between Pad~ approximation and formal orthogonal polynomials is explicitly discussed by Brezinski in Padd-type approximation
PREFACE
ix
and general orthogonal polynomials [24]. And much more on formal orthogonal polynomials is given in A. Draux's PolynSmes orthogonaux formels applications [87]. For the linear algebra aspects, the book by Heinig and Rost Algebraic methods for Toeplitz-like matrices and operators [144] is a cornerstone that resumes many results. For the theory of linear systems there is a vast literature, but Kailath's book Linear systems [165] is a very good introduction to many of the aspects we shall discuss. First we guide the reader from the simplest formulation of the Euclidean algorithm to a more abstract formulation in a Euclidean domain. We give some of its variants and some of its most straightforward applications. In a second chapter we discuss some aspects and applications in linear algebra, mainly including the factorization of Hankel matrices. In the third chapter, we give an introduction to the Lanczos algorithm for unsymmetric matrices and some of its variants. The fourth chapter on orthogonal polynomials translates the previous results to orthogonal polynomials with respect to a general biorthogonal form with an arbitrary moment matrix. The Hankel matrices that were studied in previous chapters are a very special case. We also give some results about Toeplitz matrices which form another important subclass. As a preparation for the matrix case, we give most formulations in a noncommutative field which forces us to use left/right terminology. This is not really essential, but it forces us to be careful in writing the products and inverses so that the results reflect already (at least partially) the block case. Chapter 5 treats Pad~ approximations. Perhaps the most important results of this chapter are the formulations of minimal Pad~ problems and the method to solve them. The next chapter gives a short introduction to linear systems and illustrates how the previous results can be used in this context. It includes a survey of recent developments in stability tests of R.outh-Hurwitz and Schur-Cohn type. Finally, Chapters 7 and 8 give some less elaborated perspectives of further applications which are closely related to what has been presented in the foregoing chapters. Chapter 7 gives a general framework for solving very general rational interpolation problems of which (scalar) Pad~ approximants are a special case. It also introduces the look-ahead strategy for solving such problems and which is most important in numerical computations. It is left to the imagination of the reader to translate the look-ahead ideas to all the other interpretations one can give to these algorithms in terms of rational approximation, of orthogonal polynomials, iterative methods for large matrices,
x
PRE, F A C E
solution of structured systems etc. The last chapter introduces the application of the Euclidean algorithm in the set of Laurent polynomials to the factorization of a polyphase matrix into a product of elementary continued fraction-like matrices. These polyphase matrices occur is the formulation of wavelet transforms and the factorization is interpreted as primal and dual lifting steps which allow for an efficient computation of wavelet transform and its inverse. While we were preparing this manuscript, we became aware of the P h . D . thesis by Marlis Hochbruck [151] which treats similar subjects. As for the iterative solution of systems, Claude Brezinski is preparing another volume in this series [28] which is completely devoted to this subject. It is clear that in this monograph all the topics of the project ROLLS are present. That is Rational approximation, Orthogonal functions, Linear algebra, Linear systems, and Signal processing. The remarkable observation is that the Euclidean algorithm, in one form or another, is a "greatest common divisor" of all these topics.
A c k n o w l e d g e m e n t : This research was supported by the Belgian National Fund for Scientific Research (NFWO) project LANczos under contract #2.0042.93 and the Human Capital and Mobility project ROLLS under contract CHl~X-CT93-0416.
Contents Preface
v
Table of contents
xi
List of symbols 1
2
3
Euclidean
x,v
fugues
1
1.1
T h e a l g o r i t h m of Euclid
. . . . . . . . . . . . . . . . . . . . .
1
1.2
E u c l i d e a n ring a n d g.c.l.d . . . . . . . . . . . . . . . . . . . . .
3
1.3
Extended Euclidean algorithm
1.4
Continued fraction expansions . . . . . . . . . . . . . . . . . .
15
.................
9
1.5
A p p r o x i m a t i n g f o r m a l series . . . . . . . . . . . . . . . . . . .
21
1.6
Atomic Euclidean algorithm . . . . . . . . . . . . . . . . . . .
32
1.7
Viscovatoff a l g o r i t h m . . . . . . . . . . . . . . . . . . . . . . .
37
1.8 1.9
Layer peering vs. layer adjoining m e t h o d s . . . . . . . . . . . Left-l~ight d u a l i t y . . . . . . . . . . . . . . . . . . . . . . . .
52 54
Linear
algebra
of Hankels
61
2.1
Conventions and notations . . . . . . . . . . . . . . . . . . . .
61
2.2
Hankel m a t r i c e s . . . . . . . . . . . . . . . . . . . . . . . . . .
63
2.3
Tridiagonal matrices
2.4
S t r u c t u r e d Hankel i n f o r m a t i o n
.................
81
2.5
Block G r a m - S c h m i d t a l g o r i t h m . . . . . . . . . . . . . . . . .
84
2.6
T h e Schur a l g o r i t h m
85
2.7
T h e Viscovatoff a l g o r i t h m
Lanczos
. . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
algorithm
74
95 99
3.1
K r y l o v spaces . . . . . . . . . . . . . . . . . . . . . . . . . . .
100
3.2
Biorthogonality . . . . . . . . . . . . . . . . . . . . . . . . . .
101
3.3
T h e generic a l g o r i t h m
104
. . . . . . . . . . . . . . . . . . . . . . xi
xii
CONTENTS 3.4 The Euclidean Lanczos algorithm . . . . . . . . . . . . . . . . 105 3.5 Breakdown . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 Note of warning . . . . . . . . . . . . . . . . . . . . . . . . . . 132 3.6
4
O r t h o g o n a l polynomials 135 4.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 4.2 Orthogonal polynomials . . . . . . . . . . . . . . . . . . . . . 138 149 4.3 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 4.4 Hessenberg matrices . . . . . . . . . . . . . . . . . . . . . . . 4.5 Schur algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 155 159 4.6 Rational approximation . . . . . . . . . . . . . . . . . . . . . 4.7 Generalization of Lanczos algorithm . . . . . . . . . . . . . . 166 169 4.8 The Hankel case . . . . . . . . . . . . . . . . . . . . . . . . . 4.9 Toeplitz case . . . . . . . . . . . . . . . . . . . . . . . . . . . 178 4.9.1 Quasi-Toeplitz matrices . . . . . . . . . . . . . . . . . 179 4.9.2 The inner product . . . . . . . . . . . . . . . . . . : . 182 183 4.9.3 Iohvidov indices . . . . . . . . . . . . . . . . . . . . . 185 4.9.4 Row recurrence . . . . . . . . . . . . . . . . . . . . . . 4.9.5 Triangular factorization . . . . . . . . . . . . . . . . . 197 4.9.6 Rational approximation . . . . . . . . . . . . . . . . . 199 4.9.7 Euclidean type algorithm . . . . . . . . . . . . . . . . 201 207 4.9.8 Atomic version . . . . . . . . . . . . . . . . . . . . . . 4.9.9 Block bidiagonal matrices . . . . . . . . . . . . . . . . 210 4.9.10 Inversion formulas . . . . . . . . . . . . . . . . . . . . 213 4.9.11 Szego-Levinson algorithm . . . . . . . . . . . . . . . . 220 4.10 Formal orthogonality on an algebraic curve . . . . . . . . . . 226
5
Pad6 a p p r o x i m a t i o n 231 5.1 Definitions and terminology . . . . . . . . . . . . . . . . . . . 232 5.2 Computation of diagonal PAS . . . . . . . . . . . . . . . . . . 235 5.3 Computation of antidiagonal PAS . . . . . . . . . . . . . . . . 241 5.4 Computation of staircase PAS . . . . . . . . . . . . . . . . . . 248 5.5 Minimal indices . . . . . . . . . . . . . . . . . . . . . . . . . . 250 5.6 Minimal Pad6 approximation . . . . . . . . . . . . . . . . . . 254 265 5.7 The Massey algorithm . . . . . . . . . . . . . . . . . . . . . .
6
Linear s y s t e m s 271 6.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272 6.2 More definitions and properties . . . . . . . . . . . . . . . . . 287 6.3 The minimal partial realization problem . . . . . . . . . . . . 292 6.4 Interpretation of the Padk results . . . . . . . . . . . . . . . . 298
CONTENTS
xiii
300 6.5 The mixed problem . . . . . . . . . . . . . . . . . . . . . . . . 6.6 Interpretation of the Toeplitz results . . . . . . . . . . . . . . 304 6.7 Stability checks . . . . . . . . . . . . . . . . . . . . . . . . . . 306 306 6.7.1 Routh-Hurwitz test . . . . . . . . . . . . . . . . . . . . 322 6.7.2 Schur-Cohn test . . . . . . . . . . . . . . . . . . . . .
7 General rational interpolation 351 7.1 General framework . . . . . . . . . . . . . . . . . . . . . . . . 351 7.2 Elementary updating and downdating steps . . . . . . . . . . 360 365 7.3 A general recurrence step . . . . . . . . . . . . . . . . . . . . 366 7.4 Pad6 approximation . . . . . . . . . . . . . . . . . . . . . . . 380 7.5 Other applications . . . . . . . . . . . . . . . . . . . . . . . . 8 Wavelets
8.1 8.2 8.3 8.4 8.5 8.6 8.7
385
Interpolating subdivisions . . . . . . . . . . . . . . . . . . . . 385 390 Multiresolution . . . . . . . . . . . . . . . . . . . . . . . . . . 395 Wavelet transforms . . . . . . . . . . . . . . . . . . . . . . . . 398 The lifting scheme . . . . . . . . . . . . . . . . . . . . . . . . 402 Polynomial formulation . . . . . . . . . . . . . . . . . . . . . Euclidean domain of Laurent polynomials . . . . . . . . . . . 407 409 Factorization algorithm . . . . . . . . . . . . . . . . . . . . .
Bibliography
413
List of Algorithms
435
Index
436
This Page Intentionally Left Blank
List of Symbols C
the complex numbers
R
t h e real n u m b e r s t h e integers
Z N mod div 7)
t h e n a t u r a l n u m b e r s including 0
u(v)
m o d u l o o p e r a t i o n in a E u c l i d e a n d o m a i n . . . . . . . . . . . . . . division o p e r a t i o n in a E u c l i d e a n d o m a i n . . . . . . . . . . . . . . i n t e g r a l ring or d o m a i n . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . t h e u n i t s of 7) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1 1 3 3
alb a,,,b
a divides b . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . a is a s s o c i a t e of b . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4 4
g e l d ( a , b) g e r d ( a , b)
g r e a t e s t c o m m o n left divisor of a a n d b . . . . . . . . . . . . . . . . g r e a t e s t c o m m o n right divisor of a a n d b . . . . . . . . . . . . . .
5 5
g e d ( a , b)
g r e a t e s t c o m m o n divisor of a a n d b . . . . . . . . . . . . . . . . . . . . degree f u n c t i o n in a E u c l i d e a n ring . . . . . . . . . . . . . . . . . . . .
5 6
a ldiv b a rmod b
left r e m a i n d e r in a E u c l i d e a n ring . . . . . . . . . . . . . . . . . . . . . left q u o t i e n t in a E u c l i d e a n ring . . . . . . . . . . . . . . . . . . . . . . . right r e m a i n d e r in a E u c l i d e a n ring . . . . . . . . . . . . . . . . . . .
7 7 55
a rdiv b
right q u o t i e n t in a E u c l i d e a n ring . . . . . . . . . . . . . . . . . . . .
55
F
A n a r b i t r a r y field, possibly skew . . . . . . . . . . . . . . . . . . . . . .
2
p o l y n o m i a l s over F . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2
0(.) a lmod b
F(z)
f o r m a l L a u r e n t series over F w i t h finitely m a n y n e g a t i v e
F[[z]]
powers formal powers formal
o r d (a) deg (a)
L a u r e n t p o l y n o m i a l s over F . . . . . . . . . . . . . . . . . . . . . . . . . o r d e r of a f o r m a l L a u r e n t series a . . . . . . . . . . . . . . . . . . . . degree of a f o r m a l L a u r e n t series a . . . . . . . . . . . . . . . . . . .
of z . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 L a u r e n t series over F w i t h finitely m a n y p o s i t i v e of z . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 p o w e r series over F . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 407 21 22
i
t h e s t a c k i n g v e c t o r of a . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . reversal o p e r a t o r . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
62 63
Z
shift o p e r a t o r . . . . . . . . . . . . .
62
a
XV
. ..........................
L I S T OF S Y M B O L S
xvi diag(ai) ~n O~n
nl,IIn F(p) det A
span{x/} dim 2d range A rank A ker A
V* /z~j p# p#
S U
s;,
S~_
II~, H~, II[
~ zkF,
,
a diagonal matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . K r o n e c k e r index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
67 68
E u c l i d e a n index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . truncation operators ................................. c o m p a n i o n m a t r i x of p o l y n o m i a l p . . . . . . . . . . . . . . . . . . . t h e d e t e r m i n a n t of m a t r i x A . . . . . . . . . . . . . . . . . . . . . . . . .
68 69 74
77
t h e s p a n of t h e vectors x~ . . . . . . . . . . . . . . . . . . . . . . . . . . . t h e d i m e n s i o n of t h e space X" . . . . . . . . . . . . . . . . . . . . . . . the r a n g e of m a p p i n g A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . the r a n k of t h e m a t r i x A . . . . . . . . . . . . . . . . . . . . . . . . . . . . the kernel of m a t r i x A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . bilinear f o r m or inner p r o d u c t . . . . . . . . . . . . . . . . . . . . . . . dual space for v e c t o r space V . . . . . . . . . . . . . . . . . . . . . . . moment ............................................ reciprocal p o l y n o m i a l . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
100 100 101 116 119 136 136 139 182
reciprocal of vector of p o l y n o m i a l s . . . . . . . . . . . . . . . . . . Iohvidov indices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
219 184
(strict) minimal index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . t i m e set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
251 272
signal space ( = t i m e d o m a i n ) . . . . . . . . . . . . . . . . . . . . . . . set of i n p u t signals of a linear s y s t e m . . . . . . . . . . . . . . . set of o u t p u t signals of a linear s y s t e m . . . . . . . . . . . . . . i n p u t - o u t p u t m a p of a linear s y s t e m . . . . . . . . . . . . . . . . p r e s e n t , p a s t , f u t u r e signal space . . . . . . . . . . . . . . . . . . . . p r o j e c t o r s o n t o p r e s e n t , p a s t , f u t u r e signal space . . . . i m p u l s e signal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
272 272 272 272 273 273 274
two-sided f o r m a l L a u r e n t series (= frequency domain) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
274
Ib{r} T ID
present, past and f u t u r e in t h e f r e q u e n c y d o m a i n . . . . . . . . . . . . . . . . . . . . . . z-transform ......................................... i m p u l s e response . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . s t a t e space of a linear s y s t e m . . . . . . . . . . . . . . . . . . . . . . . observability a n d r e a c h a b i l i t y m a t r i x . . . . . . . . . . . . . . . . C a u c h y index of F over interval [a, b] . . . . . . . . . . . . . . . . c o m p l e x unit circle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . open unit disk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
E
c o m p l e m e n t of closed unit disk . . . . . . . . . . . . . . . . . . . . . .
322
/,
p a r a - c o n j u g a t e of f w.r.t, i R . . . . . . . . . . . . . . . . . . . . . . . . or w . r . t . T . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
310 322
h = ?-/6 $ O,Tr
276 274
277 278 284 309 322 322
LIST OF SYMBOLS
xvii
p# I(F)
r e c i p r o c a l of p o l y n o m i a l P . . . . . . . . . . . . . . . . . . . . . . . . . .
322
i n d e x of F w . r . t , iIi~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
311
or w . r . t . T . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
326
n u m b e r of z e r o s in left h a l f p l a n e . . . . . . . . . . . . . . . . . . . .
311
or in ID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
326
Z
set of i n t e r p o l a t i o n p o i n t s . . . . . . . . . . . . . . . . . . . . . . . . . . .
352
F[[z]]z
f o r m a l N e w t o n series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
352
N(P)
This Page Intentionally Left Blank
Chapter 1
Euclidean fugues 1.1
The algorithm of Euclid
The algorithm which Euclid wrote down in 300 B.C. in the Elements (Book 7, Theorems 1 and 2) is probably the oldest nontrivial algorithm t h a t is still of great practical importance in our days [173]. There are earlier traces of it in the work of Greek mathematicians [25, p. 6] and some even go back to the Babylonian period [3, p. 27]. It is best known as an algorithm to compute a greatest common divisor of two integers or two polynomials, but it can as well be used to compute a greatest common divisor of two elements from a general Euclidean domain. This is commonly defined as an integral domain where a certain division property holds which forms the basis of the Euclidean algorithm. For example, consider the set of integers Z and let us denote the absolute value of a E Z by i)(a). The division property for the integers Z then says t h a t for any pair a, b C Z with i)(a) > i)(b) ~ O, there exist a quotient q C Z and a remainder r C Z such t h a t
a = qb + r with 0(r) < cO(b).
(1.1)
For future reference of the relation (1.1), we used already at this point the curious notation O(a) to denote the absolute value of a number a. For the set of positive integers, one could simply describe the Euclidean algorithm as follows : Given a couple of positive integers, we subtract the smallest number from the larger one and replace the largest one by this difference. We do the same with the resulting couple until the numbers are equal. Then we have found a greatest common divisor of the given couple.
2
CHAPTER
1.
EUCLIDEAN FUGUES
The above description is close to the original description of Euclid. The successive subtractions simulate the divisions. Note that relation (1.1) does not define the quotient and remainder uniquely. We could write e . g . - 3 7 = ( - 4 ) ( 8 ) - 5 or - 3 7 = ( - 5 ) ( 8 ) + 3. Both of these satisfy the property (1.1). There is a remedy for this nonuniqueness, which we shall give in section 1.2 in a more general context. Anyway, we shall use the notation q - a div b to denote that q is a quotient and r = a m o d b to indicate that r is the (corresponding) remainder. A formal definition of the Euclidean algorithm can now be given as described in algorithm S c a l a r _ E u c l i d . Algorithm 1.1: Scalar_Euclid Given a and b Set 7 ' _ 1 = a and r0 = b k=l w h i l e rk-1 ~ 0 Set rk = r k - 2 rood rk-1 k=k+l endwhile
The last nonzero element in the sequence of the r i ' s is r k - 2 which is the greatest common divisor of a and b. The main step of the algorithm is an application of the division property. We compute rk as a remainder when rk-2 is divided by rk-1. As we have said before, this will not define the sequence of numbers {rk) uniquely, not even for positive a and b. As a m a t t e r of fact, because of this nonuniqueness, the greatest common divisor will not be unique either, and strictly speaking, we should refer to a greatest common divisor. It turns out however, and we come back to this in the next section, that whatever the choice made at every step of the Euclidean algorithm, one will always end up with the same greatest common divisor up to a sign change. Note also t h a t the algorithm will certainly terminate since O(rk) is a strictly decreasing sequence of nonnegative numbers. Hence r k will eventually become zero. The above results are immediately transferable to the Euchdean domain F[z] of polynomials over a field F. However some of the entities have to be reinterpreted. The absolute value of the integers should be replaced by the degree of the polynomial. This explains why we introduced the curious
1.2.
EUCLIDEAN
RING
AND
G.C.L.D.
3
notation O(a) for the absolute value. For the polynomial case, there is no freedom in the choice of the quotient and the remainder in every step, but as will be seen in section 1.3, when we want to compute the greatest common divisor by the extended Euchdean algorithm, it is possible to build in some freedom and eventually end up with a greatest common divisor that is unique up to a constant. In elementary textbooks on algebra, one finds the notion of a Euclidean domain, which is an integral domain with a function 0(.). This function has exactly the same properties for the elements of the general Euclidean domain as the absolute value function had for the elements of the Euclidean domain Z. Thus in the general Euchdean domain, a division property like (1.1) should hold. In view of a possible generalization, we briefly review the theory of greatest common divisors in Euchdean rings. A Euclidean ring is a slight generalization of a Euchdean domain in the sense that it need not be commutative.
1.2
Euclidean ring and g.c.l.d.
We consider an integral ring 79 which is a ring without zero-divisors. This means the following : 9 79 is an Abelian group under addition that has neutral element 0 (zero). 9 79 is a monoid under multiplication that has neutral element 1. The inverse of an element need not exist, but all the elements that do have an inverse, form a multiplicative group. The invertible elements are called u n i t s . We shall denote the set of units for 79 by L/(T~). 9 Multiplication is (left and right) distributive over the addition. 9 There are no zero-divisors, i.e., ab = 0 implies a = 0 or b = 0. Note that the latter two properties imply the cancellation law : If ab = ac and a ~ 0 then b = c and similarly if ba = ca and a ~ 0 then b = c. This in turn implies that a left inverse is also a right inverse. Indeed, if ab = 1, then aba = a and the cancellation law gives ba = 1. If the monoid is a commutative multiplicative structure, then the integral ring is called an integral d o m a i n . If the elements 790 = 7) \ ~0} form a multiplicative group, i.e., when //(7)) = 790, then 79 is called a field. If it is an Abelian group, the field is
4
CHAPTER
1.
EUCLIDEAN
FUGUES
called commutative, otherwise it is said to be skew. A skew field is sometimes called a division ring. Thus, besides the other properties, the main differences are in the properties of the multiplicative group 7)0 as given in the following table
Do is not Abelian Abelian
a monoid integral ring integral domain
a group skew field commutative field
Unless stated otherwise, we shall assume in this chapter that 7) represents an arbitrary integral ring. For a field we shall use the notation F. Since in general we do not have commutativity, we should add to most of the notions the prefix left or right. We develop the theory mainly for the left versions and assume the reader can construct the right duals for him or herself. Of course for an integral domain or a commutative field, the distinction between left and right disappears. We now introduce some notions from divisibility theory. D e f i n i t i o n 1.1 (left d i v i s o r ) We say that a E 7) is a left divisor of b C 7) iff there exists some c C 7) such that b = ac. N o t a t i o n : alb. Note that any element from 7) will be a left divisor of 0. Thus we can write alO , Ya C 7). On the other hand 0 is a left divisor of 0 but of nothing else :
0
b, vb e
=
\ {0}.
D e f i n i t i o n 1.2 (left a s s o c i a t e ) We say that two elements a,b E 7) are left associates, denoted as a ,,~ b, iff there is a unit u C H(7)) such that a=bu. E x a m p l e 1.1 The set of integers Z is an integral domain w i t h / 4 ( Z ) = { - 1 , 1}. The polynomials over an arbitrary field F form an integral ring or domain F[z] and/4(F[z]) = {constant polynomials # 0} is the group of units.
The following properties are elementary to prove and can be found in any elementary textbook on algebra e.g. [66]. We shall not repeat the proofs here. Theorem
1.1 In an integral ring 7), we get the following properties :
1. The relation ,,~ is an equivalence relation in 7). 2. a ,,~ b r
alb and bla.
1.2.
E U C L I D E A N R I N G A N D G.C.L.D.
5
3. If a ,.~ b and c ,.~ d then a lc ~ bid.
By the equivalence relation ,,~, we can partition 1) into equivalence classes =
e u(v)}.
Note t h a t the equivalence class of 0 contains only one element : [0] = {0}. The equivalence class [1] is just the set of units : [1] = L/(:D). The third s t a t e m e n t in the previous t h e o r e m says t h a t the left divisibility p r o p e r t y of two elements propagates to all their left associates in the same equivalence class. Thus it suffices to study divisibility in the quotient structure of equivalence classes, i.e., in D/,,~ = {[a]: a e T)}. It is then most convenient to choose a uniquely defined representative for each of the equivalence classes. For [0], this is of course 0. For the other equivalence classes one agrees upon a particular choice. C o m m o n practice is to represent a nonzero equivalence class in Z by its positive representative. And a nonzero equivalence class in F[z] is usually represented by a monic representative. A greatest common left divisor (g.c.l.d.) can now be defined as follows : D e f i n i t i o n 1.8 ( g r e a t e s t c o m m o n left d i v i s o r ) A greatest c o m m o n left divisor of two elements s, r C 7) is an element g E 7) such that 9 gls and glr 5 t is a c o m m o n left divisor of s and r)
9
and
hlg 5t
greatest amon9 all
l ft di,i o
of s and r).
The notation we use is g = g c l d ( s , r). Note t h a t if one of the elements is zero, then the other element will always be a g.c.l.d, of the couple : g c l d ( s , 0) = s, for all s C 7). Note also t h a t g c l d ( 0 , 0) = 0. This might be surprising but it is conform to the previous definition. All g.c.l.d, of two elements s and r are equal up to a unit right multiple, i.e., they all are left associates, as one can easily verify. T h e o r e m 1.2 If s and r are elements from an integral ring 7), then if g is a g.c.l.d, of s and r, then h will also be a g.c.l.d, of s and r i f f h ,.~ g. P r o o f . Because both g and h are g.c.l.d.'s of r and s, they left divide one a n o t h e r : hlg and glh. T h e o r e m 1.1(2) then implies h ~ g. z] This imphes t h a t in the quotient structure T~/,,~, the g.c.l.d, is unique whenever it exists. If we agree to choose a unique representative from
6
CHAPTER
1. E U C L I D E A N F U G U E S
[gcld(s,r)], we can say that the g.c.l.d. ~ if it exists ~ is "unique" in T~ too. For practical computations however, one can temporarily disregard the uniqueness and use some normalizations that turn out to be convenient at that moment. If one ends up with some representative of the equivalence class of g.c.l.d.'s, it is usually not a difficult task to find the unique representative with some normalizing procedure at the very end of the algorithm. For further reference, we give one more definition of prime and coprime elements, before we pass to the definition of Euclidean ring and the Euclidean algorithm. D e f i n i t i o n 1.4 ( p r i m e , left c o p r i m e ) If a g.c.l.d, of s and r is a unit we shall call s and r left coprime. A n o n u n i t p C ~o - ~) \ ~0~ is called prime if p - ab (a, b C 1)o) implies that either a or b is a unit. Otherwise we call p composite. It should be stressed that in an arbitrary integral ring, there need not exist a g.c.l.d, for an arbitrary couple of elements. Indeed, g anf h can both be common left divisors of the same couple, but neither g]h nor h[g. Hence it can not be decided which one is the "greatest". However if the integral ring possesses some extra property which turns it into a Euclidean ring, then there will always be a g.c.l.d, and moreover, the right sided Euclidean algorithm shall give a constructive way of computing a g.c.l.d, for a given couple. Note the switch that we make here. We shall use the right sided Euclidean algorithm in a right sided Euclidean ring to compute a greatest common left divisor. So now we come to the definition of a right sided Euclidean ring. Recall that we denote the nonzero elements of :D as :Do =
v \ {0}. D e f i n i t i o n 1.5 ( r i g h t s i d e d E u c l i d e a n r i n g ) A right sided Euclidean ring as an integral ring Z~ together with a f u n c t i o n 0(.) 9 lP ~ N that satisfies
1. O ( a ) - O iJy
- o,
2. (division property) Va C 19, b C ~)o there ezists a "left quotient" q (i.e. the quotient of a left division b -1 a) and a "left remainder" r in 13 such that
- bq +
< o(b).
I f 1) is an integral domain, then this defines a Euclidean domain. This is not the classical definition. Usually, one imposes that 1) is a ring without zero-divisors (which need not have a unit-element a priori) but in
1.2. E U C L I D E A N R I N G A N D G.C.L.D.
7
addition, 0(.) satisfies O(ab) >_ O(a), for all nonzero a and b. It can then be proved t h a t 7) will automatically have a unit-element [206, p. 327]. Since we do not need the property O(ab) >_ O(a), we gave the a d a p t e d definition. As before, we shall denote the above decomposition of a by d i v and m o d operations but, to make the left and right difference, we denote the q and r obtained by the relations above as q-aldivb
and
r-almodb.
It is also common practice to denote the fact that a and r differ only by a right multiple of b by a = r ( l m o d b). Thus a = r ( l m o d b) r
(a - r ) l m o d b = 0 r
a l m o d b = r l m o d b.
Note t h a t r = a l m o d b implies r = a ( l m o d b), but the converse is not true. E x a m p l e 1.2 Probably the best known example of a Euclidean domain is the set of integers Z where we can take O(a) to be the absolute value. Another common example is the ring of polynomials over a field F where we may now use O(a) - 2 dega with deg0 - - c r by convention, so t h a t 0(0) = 0. Note t h a t more choices for 0(-) are possible. Another choice for the polynomials F[z] could be O ( a ) = 1 + deg a with deg 0 = - 1 . 1, we get tk(z) - ck(ak + z) -1 for k > 1, which is an elementary step from a classical continued fraction:
vo
+
+laa
Note the effect of V0 where u0 - co - 0 and which accounts for the unit factors v0 and a0. The k t h convergent or approzimant of this continued fraction is given by CO,k
= to o t~ o . . . o
tk(o).
(1.6)
aO,k
A right n u m e r a t o r - d e n o m i n a t o r pair can be c o m p u t e d by
[ ] [0] ~o,k
a0,k
- Y0,k
1
9
(1 7)
The more general situation, where uk and vk are a r b i t r a r y for k > 1, corresponds to a more general "continued fraction" which has the form t22 + ?32 ,00
~~ ~ /
cl + v l a2 -+- u2
c2 + v2 ~~~ / ao
(~.8)
al + Ul a2 + u~
It is assumed t h a t Vo still corresponds to the previous choice: u0 - co - 0 while a0 and v0 are a r b i t r a r y units. This V0 causes the unit factors vo and
1.4.
CONTINUED FRACTION EXPANSIONS
17
a0 in the above formula. However, if the Vk for k _ 1 are all of the general form, then the fraction is forking in n u m e r a t o r as well as in denominator at each step. It grows in two directions like a binary tree. Its kth convergent is then defined as before by co,kao, k-1 where this n u m e r a t o r and denominator are still given by ( 1 . 6 ) o r (1.7). This is what is obtained in (1.8) when we set u k - - v k - - 0 . If we build up the matrix product V0,k by multiplying successively from the right, i.e., using the recursion Vo,k-lVk, then this corresponds to a forward evaluation scheme to get the successive convergents for increasing k. N u m e r a t o r and denominator of the n t h convergent can be found in the last column of V0,n. This is a forward scheme since by the product V0,k-lIrk we compute (c0,k, a0,k) from (c0,k-1, a0,k-1). If we want to find the n t h convergent for a fixed n, we can also start from the other side and successively build up the product V0,n starting from the right with Vn and each time multiply from the left with another Vk until we reach V0,n. This corresponds to the backward evaluation scheme. This is a backward scheme since we build the continued fraction starting from the tail. W i t h each multiplication, we lift the tail to the tail of a higher level in the fraction depth, until we reach the whole fraction in the end. As it is explained above, a right MSbius transform of z - uv -1 corresponds to multiplying the column [u v] T from the left with a V-matrix. Also the multiplication of a row [s r] from the right with a V - m a t r i x corresponds to a M5bius transformation, a left one, and hence to an elementary step of a left continued fraction. For later use, we shah introduce this next. If we set then we can associate with it the dual MSbius transformation
ik(z) - (vk + z u k ) - l ( c k + zak) - z' with z - s - l r and z ' -
s ' - l r '.
The following property is trivial to verify. L e m m a 1.11 Suppose the MSbius transformations tk and tk are both related with the matriz Vk in the primal and dual sense respectively, i.e., if
[vk ck1 Uk
ak
then tk(Z) - (Ck -~- V k z ) ( a k -4- ltkZ) -1
and
ik(z)-
Then, supposing the inverses ezist, z' - tk(z) iff - z -
+
+ z k). ik(-z').
18
CHAPTER
1. E U C L I D E A N F U G U E S
P r o o f . This is easy, solving z from z' - tk(z) in terms of z' gives the result directly. D Let us define now
which corresponds to -
o
o...o
Then the previous property (see also property 2.4 of [41]) implies that -s-lr
If we apply this for n -
--S
- to
o
tl o . . .
t,(-s~,lr,,).
o
2, we get (right divisions)
--I
c2 -~- '/)2z2 ) Cl + Via2 + u2z2
vo ?--
c2 + v2z2 ) ao al + Ula2 + u2z2
with z2 - - s ~ l r 2 . Thus by building up the product VoV1V2...Vk from left to right, we do not only have a forward evaluation scheme for the computation of a right pair of numerators Co,k and denominators ao,k for the successive convergents, but, if we also work out [s r]VoV1... Vn - [sn rn], we also have a recursion for a left pair of numerators rn and denominators sn of the tails zn - - s ~ l r n of the (generalized) right continued fraction expansion o f - s - l r . Thus, starting from two elements s and r from a right sided Euclidean ring :D, we can now consider the right sided extended Euclidean algorithm as a method to generate successive right rational approximants for the element f - - s - l r in the (skew) field F. Each of the convergents -1 Co,kao, k can be considered as an approximant which is obtained by setting the corresp9nding tail equal to zero. Suppose we apply the right sided extended Euclidean algorithm with an initialization G_I-
0
1
s
?
.
Note that for this initialization [,
- 1]c_
-[0
0]
1.4.
CONTINUED
FRACTION
19
EXPANSIONS
holds. The algorithm then generates Gk -
G _ ~ Vo V 1 . . . V k
for which the previous relation is kept throughout the computations. Therefore we have Sco,k
~ rao,k
-- r k
-- 0
or --8
-1
17, _ C o , k a o , k
_
_s-
1
-
rkao,lk .
(1.9)
We can interpret this as if co,kao,- 1k were a right rational approximation for f - -s-it. The minus sign seems to be inappropriate here, but we obtain a much more natural relation if we set s . . . . 1, so that f - r and the previous relation becomes f
-- c O , k a O-, 1k - - r k a o , -l k .
In the next section we shall apply these ideas in the case of formal Laurent series. Before we border this generalization, it is important to set up the correct framework. W h a t exactly is going on? Why are the truncated parts of the continued fractions called approximants? W h a t is approximated and in what sense? Well, here is the formal framework. Suppose as we have put at the beginning of this section that F is a skew field. Divisibility theory has not much sense in a field because every nonzero element has an inverse, meaning that we always have the trivial form a - bq q- r with q - b - l a and r - 0 of the division property. However, we may consider a subset 13 of F which contains 1, and which, with the addition and multiplication of F forms a ring. Thus not every element in 7) should have an inverse in 7). All the elements in the continued fraction were elements from D. Thus the convergents or approximants of a right continued fraction are elements which can be written as a right fraction of elements from 7). E x a m p l e 1.3 The field F could be the field of real numbers R while the ring 7) could be the ring of integers Z. Thus we get rational (= ratio of integers) approximants for real numbers. Or we could consider F to be the skew field of formal series of the form X:k ck zk with only a finite number of ck with k positive that are different from zero. The coefficients are in some unspecified skew field. For 13 we take the subset of polynomials. Then, again, we get rational (-- ratio of polynomials) approximants for a given series from F. 9
20
C H A P T E R 1. EUCLIDEAN FUGUES
To measure how good the approximation is, we have to introduce the function 0(.) again. Suppose that on :D, this function turns :D into a Euclidean ring as we defined it above. Thus 0(-) takes on integer values in :D and O(a) - 0 iff a - 0. For the other elements in F, we assume t h a t O(-) is a positive real number. W h a t we have here is almost like a norm and -log 0 ( . ) i s almost like a valuation of the skew field F [243]. There is a complete theory on valuation rings, normed fields and similar constructs, which is close to what we do here but we will not go into the details of this theory. To allow the Euclidean algorithm to work with elements from F, it remains to define what we mean by the division property when a and b are elements from a (skew) field F. We shall suppose t h a t for all a C F and b E F0 = F \ {0} there exists a "left quotient" q C :D and a "left remainder" r E F such that = bq + < O(b). If we happen to take a and b in "D, then we have the usual properties of a Euchdean ring, and the Euchdean algorithm will end up by finding a g.c.l.d. of a and b, for exactly the same reasons we had before. However, when a a n d / o r b are not in D, then the Euchdean algorithm will still generate successive rk with O(rk) strictly decreasing, but at a certain step we may get 1 < 0(rk), and there is no guarantee that the algorithm will end; there may be no g.c.l.d, of a and b. E x a m p l e 1.4 When F is the set of formal series as defined in the previous example and 19 is the subset of polynomials, then for a, b C F, but nonzero, we can split b-la C F into a polynomial part q C 7) and a strictly proper part s C F \ I). Hence (1.10) holds if we set O(f) - 2 deg(f). Note that O(f) C N for f polynomial, while O(f) C (0, 1) when f is strictly proper.
-1 of the continued fraction To measure how good the approximant co,nao,,~ -1
expansion of f approximates, we can e.g. use O ( f - Co,nao,n)- O(rka -lo,k) or O(fao,n- co,,,)= 0(rk). Since O(rk)is strictly decreasing in the Euclidean algorithm, the approximation gets better and better. See also [65]. The construction we have set up above will be called a (skew) rational approximation field (P~AF) of which we recapitulate the definition below. D e f i n i t i o n 1.6 ( r a t i o n a l a p p r o x i m a t i o n field) A right sided rational approzimation field (RAF) is an arbitrary (skew) field of which a subset 19 forms a right sided Euclidean ring for a function 0(.). Moreover this function and the division property in 1) is eztended to elements in F such that
1.5. A P P R O X I M A T I N G F O R M A L SERIES
21
1. O(f) is a positive real number for f E F \ T) 2. for any couple of nonzero elements a, b E F, there exist q C 19 and r C F such that = bq + < o(b). In a I~AF, the Euclidean algorithm is well defined, but it could be t h a t it does not stop after a finite number of steps.
1.5
Approximating formal series
In this section, we study RAFs of formal series, which will play a very i m p o r t a n t role in the rest of this book. Before we come to the point of this section we shall take the opportunity to introduce some notations concerning polynomials and formal series. As before, F is an arbitrary field (possibly finite). If we do not suppose comm u t a t i v i t y of the multiplication, then F is supposed to be a skew field or division ring which means that it is a ring where all the nonzero elements are invertible. Except for c o m m u t a t i v i t y of the multiphcation, it has all the properties of a field. In many applications, we may think of F as being the field of reals ll~ or the field C of complex numbers. F[z] is the ring of polynomials over F and F[[z]] is the ring of formal power series ( f p s ) i n z with coefficients in F. F(z) denotes the skew field of all formal Laurent series (fls) that can be written as ~ = d ckz k, for some d E Z, i.e., it has a finite number of terms with a negative power of z. If in connection with this fls we use Ck with k < d, then we think of it as being zero. The subset { ~ ckz k} of F(z) for which ck = 0 for all k < d is indicated by Fd(Z). Similarly, all polynomials of degree at most d are represented by Fd[Z]. D e f i n i t i o n 1.7 ( o r d e r ) If in the fls c(z) - E ck zk E F(z), the coefficient Cd ~ 0 and ck = O, Vk < d, then we call d the order of the fls. We denote it as d = ord c(z). We set ord 0 = oo. For the m o m e n t we shall be interested in the set F(z -1) of fls which are of the form d
c(z)-
~
ckz k with d < oo.
k"--oo
Thus
F 1. When all ak - 1, then it takes the form
Vo
el I-I[ a~O)z + a 1(1) I
a~O)"t/'lC2 (1)I_.~_ Z + a2
2c3 a~1)L+ . . . ) ao 1 [ a(~
+
which is called a Jacobi or J-fraction. Chebyshev mentions in the paper [54] on least squares approximation that in the continued fraction the partial denominators can have degree at most one, which is exactly the J-fraction. Also in [56] and [57] this fraction for power series in z -~ is used explicitly. Let us consider the approximating mechanism in some greater detail. We start from the equation (1.11). Without loss of generality, suppose for simplicity that degs > degr. Thus f - s - ~ r - ~ = ~ f k z -k is strictly proper. The degree of a0,k is denoted by ~k and the degree of ak, which is by the definition in Algorithm 1.4 equal to deg sk-1 - deg rk-1, is denoted by ak. Because sk-~ - rk-2Uk-1, with uk-~ a nonzero constant, we get ak -deg rk-2--deg rk-1. We also know that deg rk < deg rk-1, thus ak _> 1. This is because rk - (rk-2uk-1 l m o d rk-1)ck. It then follows directly that deg rk - deg s - at - a2 . . . . .
ak+l.
(1.12)
The relation aO,k
ak
--
"U,O,k-1Ck + a O , k - 1
--
aO,k- 2"//,k- 1 Ck ~- aO,k- 1 ak
(1.13)
1.5.
APPROXIMATING
FORMAL
with initial values ao,o implies by induction t h a t
aor
25
SERIES
0 (a nonzero constant) and ao,1 -
aoal,
k ~k - deg ao,k - E hi. i=1
This implies the well known relation for the Euclidean algorithm deg s - deg so - deg rk + deg ao,k+l.
(1.14)
It follows from the equation (1.11) t h a t deg ( f - Co,kao,~)
--
deg ( - s - 1 rkao.~:)
=
deg rk - deg s - deg ao,k
=
--2tC,k -- Otk+l
Now we suppose for simplicity, and this is again no loss of generality, t h a t s - - 1 . In t h a t case rk is strictly proper for all k _> 0 with strictly decreasing degree. Since at the other h a n d co,k as well as ao,k are polynomials, it follows from fao,k - Co,k - rk (1.15) t h a t co,k is the polynomial part of fao,k, which has a degree t h a t is precisely 1 less t h a n the degree of ao,k (if fl ~t 0). Let us summarize w h a t we obtained so far in a theorem. T h e o r e m 1.13 Let s - - 1 a n d let f - - s - X r - r - ~ o f k z - k be in F ( z -1 ). A p p l y the right sided e z t e n d e d E u c l i d e a n algorithm with the initial matriz G_~-
0 s
1 r
.
This m e a n s the f o l l o w i n g 9 Choose n o n z e r o c o n s t a n t s vo and ao C Fo F \ { O} a n d uo - Co - O. For k >_ 1, choose uk, ck C Fo, vk - 0 and ak - - ( s k - 1 l d i v rk-1)Ck. S e t
Lr ]
V k - / vk
ck
Uk
ak
'
k_>0,
and generate I vO,k Gk -- a - 1 VoVI " " " Vk
--
~O,k
Sk
CO,k aO,k tk
CHAPTER
26
1. E U C L I D E A N
FUGUES
Then the following relations hold anddegak-ak_>
degao-ao-O,
k ~i=o
deg ao,k - ak -
1,
k_> 1
ai
co,k - f ao,k d i v 1 - the polynomial part of f ao,k deg co,k - deg ao,k + deg f
rk - f ao,k m o d 1 deg rk - - a k + l
deg ( f -
a-1 co,k o,k)
- -~k
-
Nk+l
-2~k
--
-
a k + l .
This is a good place to sit back for a while and check the previous results with an example. E x a m p l e 1.5 We give an example where s is not j u s t - 1 . It illustrates t h a t this r e q u i r e m e n t makes the f o r m u l a t i o n of the t h e o r e m a bit simpler, but it is not really needed. T h e a l g o r i t h m still works. We take for the given s and r the following L a u r e n t series s=-z+z
-4 a n d r =
l+z
-4.
Since we have c o m m u t a t i v i t y , we do not have to distinguish between left a n d right and we shall simplify n o t a t i o n accordingly. S u p p o s e we take vo - ao - 1, t h e n we can directly s t a r t with so - s and ro - r. Suppose we choose ck and uk equal to 1 for all k _> 1. T h e n the successive c o m p u t a t i o n s give us.
al - - c l (So d i v to) - z S1
7'I
]
-
To Iv1 z+z,
z
a2 - - c 2 ( s l d i v r l ) - - z 3 + z 2 - z + 1
S2
r2]
-- [ ~1
?'1]V3
:
[ l + z -4 z-3+z-4 ] [
:
[ z-3+z-4
2z-4 ]
0 1
1 -z 3 + z2 - z + 1
1.5.
APPROXIMATING
FORMAL SERIES
27
a3 - -c3(s2 div r2) - - ( z + 1)/2
,3
~3 ]
~
1 ]
1 - ( ~ + 1)/2
-
[2~ -~
0].
Here the algorithm stops because r 3 - - 0 which means that - s - l r and c0,3ao, ~ correspond exactly. The given data corresponded to a rational function which is recovered after 3 steps of the algorithm. The successive V0,k matrices, containing the successive approximants are given by
[011 1
Z
] V 0,2 -
Vo,IV2
-
[1 z
-z
z3 z2z l] 4 -~- z 3 -
- z ~ + ,~ - z + ~ Vo,3 -
Vo,2V3
-
- Z 4 -~ Z 3 - - Z 2 -~- Z -[- 1
z 2 --~ z --~
1
;
(z ~ + ~)/2 ] (Z 5 --
1)/2
"
The degree properties can be easily checked. For example, one finds that -1 the series expansions of f - - s - l r - [1 + z - 4 ] / [ z - z -4] and co,3ao, 3 = [(z4+ 1)/2]/[(z 5 - 1)/2] match completely. How to get approximants with a monic denominator is described in the next theorem. F is a skew field and by the leading coefficient of a nonzero r C F(z -1) we mean the coefficient ~ in r(z) - ~zd~s(r)+ lower degree terms. T h e o r e m 1.14 ( n o r m a l i z a t i o n ) Suppose 0 ~ s e F(z -1) has leading coefficient ~. For this s and some r C F(z -1 ), with deg r < deg s, we apply the right sided (extended) Euclidean algorithm as described above. We fix the arbitrary units as follows. Choose Vo - _~-1 and ao - 1 and for k >_ 1 choose at each step ck -- rk-1 and uk - -~-kl_ 1, where rk denotes the leading coefficient of rk. Then this normalization yields ao,k and ak as monic polynomials and the leading coefficient of sk is - 1 . P r o o f . That the leading coefficient of sk is - 1 follows from s k - rk-lUk, k _ 1. The choice for uk and the initial condition so - - s $ -1 proves the result for sk, k >_ O.
28
C H A P T E R 1. E U C L I D E A N F U G U E S
As we have seen before, the degrees of a0,k are strictly increasing. This implies t h a t the leading coefficient of a0,k is given by the p r o d u c t of the leading coefficients of ak and a0,k-1. Therefore it is sufficient to prove t h a t ak is monic. Because a0 - 1, the proof then follows by induction as follows. The leading coefficient of ak for k >_ 1 is given by the ratio of the leading coefficient of sk-1 and the leading coefficient of rk-1 times - c k . Since the leading coefficient of sk-1 is - 1 , the leading coefficient for ak is equal to ~-11ck , and this reduces to 1 by the choice ck - Tk-1. [--] If we redo the c o m p u t a t i o n s of the previous E x a m p l e 1.5, with the normalizations proposed in T h e o r e m 1.14, then we get E x a m p l e 1.6 ( E x . 1.5 c o n t i n u e d ) The condition t h a t the leading coefficient of so be - 1 is already satisfied and we can take V0 to be the identity matrix. Thus so = s = - z + z -4 and r0 = r = 1 + z -4. Now the successive c o m p u t a t i o n s give us. ct
--
r0 -- 1
U1
--
--tO 1 -- --1
al
=
-cx(s0divro) - z
[ ~01~0 ].1
S1 I T 1
_ [ _~_ z-, i z-~ +z-, ]
C2 U2 a2
-
~i-i
--
--7'1 1
=
-c2(sldivrl)-z
-
- i
a-z
2+z-1
- [ ~11~1],~ - [_~_z-,~z-~+z-,] [ _~ -
[ -z-~- z-' I -~z-' ]
1 z3 - z 2 + z -
1
1.5. A P P R O X I M A T I N G
FORMAL SERIES
29
C3
--
r2----2
21,3
-
-~i -~-
a3
-
-c3(s2divr2)-z+l
1/2
[,~ I~ ],~ _ [_~-~_z-,l_~.~-,]
[s31T3]
-[-~-'
[o 2] ~/~
z+l
to ].
The successive V0,k matrices, containing the successive approximants are now given by
[01 1] --
Z
; V 0,2 -
go,iV2 -
[1
z
z 4-
z 3 -~- z 2 -
( ~ - ~ + z - ~)/2 V0,3-V0,2V3-
(z 4 - z 3 + z 2 - z -
1)/2
z-
1] 1
;
~' + ~ ] zS- 1
" 0
To introduce the idea of getting an expansion of the left ratio of two fls, we started from the case where these fls were elements from F(z - ] ) . This is because the resemblance with the Euclidean algorithm as applied to polynomials is almost immediate. These fls have a degree, just like polynomials have etc. It will not require much imagination from the reader to see that it is possible to give dual formulations of the above set-up if the two fls are both in F(z). So we shall have a P-part ("proper part" or "principal part + constant term") for a fls in F(z), defined as 9 D e f i n i t i o n 1.11 ( P - p a r t ) If d - ordc(z) , deg r < deg s. The algorithms I n i t i a l i z e and Algorithm 1.6" Atomic_Euclid
Given s and r
k=O Initialize ~o = 0
k=l w h i l e rk-1 ~ 0
Update_k k=k+l endwhile
N o r m a l i z e are exactly as before. The algorithm Update__k is described in the algorithmic frame 1.7. With this version of the algorithm, we are able to see that the Euclidean algorithm can be implemented in two different versions. The difference being in the way t h a t the coefficients $(') and rk are obtained. Suppose that we consider the case s = - 1 , r = f. In our first version of the Euclidean algorithm, only the sk and rk appeared. In the extended Euclidean algorithm, also the V0,k matrices with entries u0,k, v0,k, co,k and a0,k appeared. The coefficients in the Irk matrix were still found from the sk-1 and rk-1. However, because at every stage f Uo,k (i) + ~(i) ~o,k - sk(i) ,
(1.17)
(~) the coefficient - $ ~ ) of z -~k-~-i-1 in this expression is not affected by V0,k because this is a polynomial and - ~ k - 1 - i 1 is negative. Thus with f - ~ k > 0 fk z-k, we find that
(i)
36
C H A P T E R 1. E U C L I D E A N F U G U E S
Algorithm 1.7" U p d a t e _ . k Define ~k as deg s - deg rk-1 and set ak - ~ k - ~k-1 Define G(k-1) -- Gk-1 Define rk as the coefficient of z d~s'-~h in r k - 1 - - r(k-1) for i - 0, 1 , . . . , a k Define ~ - 1) as the coefficient of z des s-,~k_l-i in s k(i- 1) and set q(~k
- i ) = ~ . ~ ~ ( i - ~)
sk
Define T/.(i) "~: - [ Set endfor
1 .(~k-i)
-'tk
zak
-i
0] 1
C~) -- C~-1)"(i) "k
Gk-G(~k)[
01 01]
Normalize
(i) i.e. where the vector u0,k (i) contains the coefficients of u'O,k u(~)c 0,k,z)- [ 1 z
.
.
.
z,~k ],(i) UO,k"
The latter way of arranging the computations requires inner products of vectors to be evaluated as in (1.18). It is not necessary any more to keep track of the sk and rk to keep the recursion going. Indeed, if we know only f and V0,k-1, we can always compute the necessary elements with inner products of the form (1.18). We can also note that the computation of the q(')-coefficients can be done by solving the upper triangular Toeplitz system
o~
9
,
9 Trek,k-1
__
qC~k) k
~
"S~h - c E k , k - 1
Herein we used the notations oo
Sk_l
E
--
.9i,k-1z-i
i=~k-I
and
co
7.k_ 1 - -
E ?i,k-1 z i=~k
-i
(~k-1 - r,~.,k-x ).
,
(1.19)
1.7. VISCOVATOFF ALGORITHM
37
The atomic algorithm does this by a kind of backward substitution. Note also that the entries needed to define this system can all be found by evaluating inner products. The right-hand side is found by scaling the r k - 2 from the previous step with the factor uk-1. The entries of the Toeplitz matrix are found from the multiplications
f t~k + a k
"""
f 2tck
rt~k + a k , k - - 1
where the left-hand side Hankel matrix has dimension ak + 1 by tck-1 + 1 and a0,k-1 is the vector containing the coefficients of ao,k-1 (z). Thus a0,k-1 (z) = [1 z . . . z~k-t]a0,k_l, which is known from the previous step. We shall return to these types of matrix interpretations in the next chapter. To put it in other words: all what the atomic Euclidean algorithm does is to split the matrix
i10][01] -qk
1
1
0
with
qk(z)- q(~k)Z~k +''" + q(O)
as a product of more elementary matrices V(~ "''k~Z(=k) i , which corresponds to writing out the backward substitution steps for the system (1.19). The two versions (with or without the inner products) that can be obtained for almost all of the algorithms that we present can be given an interpretation of "layer peering" or "layer adjoining". These terms stem from scattering theory. In Section 1.8 we briefly go into this difference.
1.7
Viscovatoff algorithm
There is another way of splitting one step of the usual Euclidean algorithm into more elementary steps. It is however definitely different from the atomic Euclidean algorithm. To introduce the idea, we start with the simplest possible case. Consider a series f - - s - l r C ]~(z-1) with s - - 1 , thus f - r. Suppose the normalization takes v0 - a0 - 1 and uk - ck - 1 for all k > 0, thus ak - -qk and sk - rk-1 for k _> 1. Suppose moreover that we are in the normal case, that is ctk - 1 for all k. The Euclidean algorithm then constructs the continued fraction (right divisions)
1 i+...+ +la2
1 ]an+zn
I
38
C H A P T E R 1. E U C L I D E A N F U G U E S
in which all ak are polynomials of degree 1. These are obtained as ak - - ( s k ldiv rk) - - ( r k - : ldiv rk). The zk --.sklrk are the tails. The Euclidean algorithm is a succession of invert-and-split operations: invert zk-1 and split in a polynomial part ak - -qk and a strictly proper part zk. -
~_:(~)-: - ~,(~)+ ~(z) - (~(~~ + ~(~:>)+ z~(~). The atomic Euclidean algorithm does not take out the complete polynomial ak at once, but splits this into two steps
Zk_ 1( Z ) - 1 - - a(O)z + z~(z) zlk(z)- aO) + zk(z)
or
z , _ , ( z ) - (4~
+
(:) + z, (z)))-:
The Viscovatoff version splits this by introducing a new continued fraction step"
~-1(~)-: 4(z)-: -
-
4~ + 4(z)
or
a~(:)+ z k ( z )
z,
(4
(:)
-:) -:
Note that although we have used the same notation for the atomic Euclidean algorithm and the Viscovatoff algorithm, it refers to different objects. The Viscovatoff version thus results in a so called KITZ -1 continued fraction [149, p. 525].
1 I 1 I+ iO~zI+ :~i)1+'"+ f(z)--la~O)z+[a~l) [a I
I zI [~. I
I (i 21)
[ a(n~ -~- , (1) q_ zn(z )"
"
Basically, in the atomic Euclidean algorithm, we fit the first two terms in the series z~J 1 by an interpolating polynomial viz., a(~ + a k(1) , while in the Viscovatoff splitting, we fit them by a "rational approximant" a(~
+
[aO)] - t , which happens to be a polynomial. We can do something similar for a formal series f 6 F(z) of order zero: f ( z ) = fo + s +.... The splitting of one continued fraction step into two elementary continued fraction steps comes very naturally in this setting as we explain below. When we do not split up the steps, as in the usual Euclidean algorithm, then according to (1.16), we get an expansion in the normal case which looks like
y(z)
1 -
~ +
z~l+... + +1 aa
. [ a. + z 2 ~
(1.22)
1.7. V I S C O V A T O F F A L G O R I T H M
39
where again each ak(z) a (~ O) is a polynomial of degree 0
deg c0,2k < k,
deg CO,2k+l
deg ao,2k -- k,
deg aO,2k+l is what we want to give now. At first glance this seems to be the simpler case since we do not work with P-parts, but directly with polynomial parts, thus no shifting with powers of z seems to be necessary. However, as we have seen above in (1.21), the general form of the continued fraction in the normal case is
I+
+
l alz
Xl + 11+ . . . ]a3z
l a,
with ak constants. Thus we have to distinguish between an "odd step" and an "even step". This is also true in the general case. This can be explained as follows. The Vk matrix which we have just given for the series in F(z> has the form
-
[ 0 Uk z - v
ck ] z '~k
~l,kz - v
5k - - ( z ~'sk-1 ldiv ?'k-1
)r
where we used ~ k ( z ) - z - V h a k ( z ) , (t,'k -- a k - v ) t h e P-part o f z v S k _ l T " ; l l . Thus 5k is a polynomial in z -1 of precise degree ~k. The extra factor z ah in the formula for Vk represented an equivalence transform, needed to make everything polynomial. For the case of series in FIz -~), we have to repace z by z -1 everywhere and ak will be the polynomial part of ( z ' r k _ l ) - l s k _ l .
1.7. VISCOVATOFF ALGORITHM
47
Thus the equivalence transformation with the factor z ~k is not needed. Thus as long as v = 0, every step corresponds to a m a t r i x Vk of the form
[0
Uk
ck] ak
For v = 1 though, an odd-even pair of two successive Vk matrices is
[0 ck][ 0 ck+x] [0 ck][ 0 C+l ll,kZ
akz
l/,k+ l z
ak+l Z
lZk
akz
Z
ak+l
Uk+l
And the last factor z is discarded by an equivalence transformation. E x a m p l e 1.9 ( E x . 1.8 c o n t i n u e d ) We illustrate this by the previous example where we formally replace z by 1/z. Thus we obtain z4 + 1
I-z~-l-I
z-l]+
[-l+z
1
-1
z -4
2
-z-
-3
z~j z~j
[+
+z
+
i
1
I'2
An equivalence transformation with powers of z will turn all negative powers into positive powers of z. Thus we get
/ _ rl__]+ z
1
1 I
11
-z/2+[ -2
[ -z 3+z 2-z+l+[
O
which has the form (1.26) with this time ak polynomials in z.
The result is t h a t for series in F(z -1) with deg < 0, we have to replace the Algorithm 1.8 by Algorithm 1.9. We check this algorithm against the results in the last example. E x a m p l e 1.10 ( E x . 1.9 c o n t i n u e d ) Let f = - s so, using trivial normalizations,
G o --
-1
-(~ +
I 1 ~1 0 1- zs
1 1 + z4
W i t h v - 1, we find a l - 1 and al - - ( s o ldiv zro)z- z.
cl
--
--
o[01] I 0 --
1
z
1
1-+-z 4
1
Z
l+z
~)/(P
- ~),
48
CHAPTER
Algorithm
1.9:
1.
EUCLIDEAN
FUGUES
Compute_Gk(v)
{ series in ]g(z -1>, deg < 0 ) if k odd t h e n tk - v e l s e tk - 0 e n d i f ak = deg sk-1 - deg rk-1 qk - ( s k - 1 l d i v z t k r k _ l ) z *k. DefineVk-[
01 -qkl ]
Set G~ - Gk-1V~ Normalize
For k - 2 we have a2 - 3, a2 - - ( s l l d i v G2 = G I V2 gives
[0
72 ~ -
1
i
G2 =
rl)
1
-z 3 + z2 - z + 1 z
- - - - Z 3 q- Z 2 -- Z -+- 1.
1
-z 4 + z3 - z2 + z + 1
l+z
2
For k = 3 we have a3 = 1, a3 : - ( s 2 l d i v zr2)z = - z / 2 . gives
V 3 ~-
[0 1] 1
I G3 =
-z/2
Thus
1 Thus
(]3 =
G2V3
'
-z ~ + z 2 - z + 1 -z 4 + z 3 - z 2 + z + 1
(z 4 - z 3 + z 2 - z + 2 ) / 2 (Z 5 -
Z 4 -~- Z 3 -
Z 2 -]-
Z)/2
1
-1
2
Finally for k - 4, a4 - 0, a4 - - ( s 3 l d i v r3) - - 2 .
4 ---
G 4 --
[ol] 1
-2
I
(z ~ -
(z ~ -
' z~ + z~ -
z 4 +z 3-
-1
z + 2)/2
z 2 +z)/2
-z 4 - 1 -zS+ 1 0
Thus we get exactly the same continued fraction as we had above.
O
1.7.
VISCOVATOFF
49
ALGORITHM
Note t h a t in the previous example a4 Viscovatoff case when v - 1, we can have odd k, we still have ak :> 1. Indeed, for k thus qk - zq~ and r~ - s k - 1 - z q ~ r k - 1 so (Zrk_1)-18k_1
-- q~ +
0. Until now all ak >_ 1. In the ak - 0 in case k is even. For an o d d denote qkI -- Sk-1 l d i v Z r k - 1 , that
(zrk_l)-l(rk).
By the division property, r~ is the r e m a i n d e r of 8 k - l C k l d i v z r k - 1 , deg r~ < deg rk-1 + 1, thus, because rk - r ~ c k ,
SO t h a t
deg rk - deg r~ < deg r k - 1 -- deg sk. Therefore a k + l -- deg s k - deg rk could be zero. But in an even step, setting qk -- s k - l C k l d i v r k - 1 , thus ak - --qk, (rk-1)-lsk-1
-
-
qk + ( r k - 1 ) - l r k
where for analogous reasons deg rk < deg r k - 1 -- deg sk, so t h a t in this case
ak+l ~ 1. The analog of T h e o r e m 1.18 for series in F(z - 1 ) includes T h e o r e m 1.13 as a special case for v - 0, so t h a t p a r t is not new. Including also the Viscovatoff variant, i.e., for v - 1 we get" Theorem
1 . 1 9 F is a s k e w field. L e t s, r E F(z - 1 ) s u c h t h a t f F ( z - 1 1 w i t h deg f ___ - 1. D e f i n e
I1~ 0
1
3
r
-
-s-lr
E
.
C h o o s e v C {0, 1}. Fork>l" I f k odd, t h e n s e t tk - v, else s e t tk -- O. Set
G k -
I vO,k
eO,k
uo,k
ao,k
Sk
rk
-
a n d Zk -- - - s - k i r k w i t h uo -
G k_ ~Vk ,
Co -
Uk --
Vk Uk
Ck ak
~
O, ao, vo G Fo a r b i t r a r y , ao -
k>l ak -- - d e g Zk-1 -- deg S k - 1 -- deg r k - 1 , ak -- - - ( s k - i l d i v z t k r k _ i ) c k z tk ck, uk E Fo a r b i t r a r y Vk -- O.
I'~k -- .
~i
O, a n d f o r
50
CHAPTER
1. E U C L I D E A N F U G U E S
Then, for n - 1 , 2 , . . .
ulc21+ ... +
f -vo
+l
a2k > 0,
a2k-1 >_ 1,
a2
ao 1 [ a n + Un Zn /
fork>_1
deg c0,k - tck - a l ,
deg
ao,k
--
~;k
and for k >_ 0 -1
1
-1
f - Co,kao,k - - s - rkao, k deg rk - deg sk - ak+l - - n k + l + deg s.
-1
The Co,kao,k are right Padd approzimants at cr for the series f . Moreover, if v - 1 then for k > 1
C0,2k_1(0 ) # O, aO,2k(O) # O, aO,2k_l(O ) -- 0
P r o o f . As for the case with series in F(z), almost everything follows immediately by simple induction arguments and we leave it as an exercise. To check t h a t we get Padfi approximants for f , it is sufficient to note that
a-1
deg ( f - c0,k 0,k)
-
- d e g ,S -~- (--/~k+l + deg s) - deg ao,k
1. The c0,k and a0,k are right coprime since by construction, they are obtained by multiplying coprime polynomials in z -1 with appropriate powers of z. D In the normal case for v = 1, in the previous algorithm we have ak = tk, i.e. ak - 1 for k odd and ak = 0 for k even. This corresponds to taking out separately the fist degree term and the constant term of the first degree polynomial, ak that the usual Euclidean algorithm would give, as we have explained in the beginning of this section. In the normal case, the general theorem speciahzes to C o r o l l a r y 1.20 Let s , r C ~(z -1 ) $~ch that f - s - 1 T E ~"(Z -1 ) with deg f - - 1 . Choose v E {0, 1} and vo, ao C F0 arbitrary. Define
G_I -
I1~ 0
1
T
,
Vo -
[ ] vo 0
0 ao
'
Go-
G-1Vo.
1.7.
VISCOVATOFF
51
ALGORITHM
F o r k > 1: Choose vk = 0 a n d ck, uk C ~o arbitrary. Set ak = deg sk-1 - deg rk-1, tk = v if k is odd a n d tk = 0 otherwise,
ak-
- ( s k - 1 ldiv z t k r k _ l ) c k C F,
Yk --
Vk Uk
Ck z tk ak
] '
I vO,k C-O,k Gk -
Uo,k
ao,k
Sk
rk
-- G k_ 1 Vk .
Then, if all ak ~ 0
?30 _s-lr _
Cl
I+
l alztl
co,kaO '
--
_8-1
1c21 + . . . + I a2
n-lCn
ao 1
Jan ztk + UnZn]
rkao,- 1k
a n d co,kao, ~ is a right P a d d a p p r o x i m a n t s f o r f at cr S u p p o s i n g v = 1, we have
a2k-1 --
a2k
1,
= O,
/~2k = tC2k-1 = k
deg co,2k = deg co,2k-1 = k - 1, degao,2k = degao,2k_l = k, degs2k=degs-k,
co,2k-1(0) ~ 0
ao,2k(O) y~ O,
ao,2k-l(O) = O,
degs2k_l = d e g s - k - 1
degrzk = degr2k+l = d e g s - k -
1
while f o r v = O, we have ak = 1,
~k = k
degco,k=k-1, degrk = d e g s k - 1
degao,k=k, =degs-k-
1
In the chapter on Pad6 approximants it will be explained t h a t the case v = 0 gives diagonal approximants from the Pad6 table, while v = 1 will correspond to a staircase.
C H A P T E R 1. E U C L I D E A N FUGUES
52
1.8
Layer peeling vs. layer adjoining methods
In [37], Bruckstein and Kailath developed a unified conceptual framework for studying (inverse) scattering procedures. Within this framework, they distinguish "layer peeling" and "layer adjoining" algorithms. We shall explain these terms starting from our knowledge about the algorithm of Euclides. Suppose we start from the Euclidean algorithm in its simplest form which corresponds to the continued fraction expansion of a series f C F(z -1 ) which is strictly proper. Suppose moreover that we disregard normalizations, i.e., we choose uk = ck = 1 and vk = 0 for k = 1 , 2 , . . . and set V0 = / 2 . Thus Vk reduces to Vk-
[
0 I
I -qk
"
The Euclidean algorithm then starts from z0 = f and in step k, it takes the inverse of zk-1 and splits this in its polynomial part qk and the strictly proper part, which is the tail zk. It is thus a sequence of invert-and-split operations. We know that, as this procedure proceeds, we build up better and better -1 of the continued fracPad~ approximants (i.e., the approximants c0,ka 0,k tion), meaning that we match more and more of the initial terms in the original series f. The corresponding tail zk somehow represents information about that part of the series f that has not been approximated yet. Note that in this formulation of the algorithm, the approximants are never needed during the computations. The qk, which are generated are sufficient to construct the approximants, but they are not needed to keep the algorithm running. In fact the next step can start as soon as the strictly proper part of the inverse of the current tail is known, thus after inversion of the current tail, the polynomial part qk is peeled off as excess terms which are "superfluous" since the next step is just waiting for the strictly proper part zk. One can see this as if the "signal" f - ~ fkz -k, or the "time series" (fk), where larger k refer to instances deeper in the future, is scattered through a layered medium. The effect of each layer is the invert-and-split operation. In the picture
Z2
-1
I
Zk-l J
-1
[ Zk I
r
each block represents such an operation or a layer and as the signal penetrates deeper in the layered medium, more and more information is caught
1.8. L A Y E R PEELING VS. L A Y E R ADJOINING METHODS
53
from the most recent future in the successive layers while the signal Zk that is k layers deep contains the residual information. It is not only the tail of the continued fraction, but it is also related to information that is k steps ahead in the future of f. Thus the Euclidean algorithm is tracking the signal as it penetrates in the layered medium. To answer the question "How will this signal get through the layered medium?", it is first pushed through the first layer and once it is known what comes through, the first layer can be peeled off. We may forget the original signal and any information about the first layer completely. With the output after layer one, we are back to the original question with a new input and medium with one layer less. The same terminology can be used in the more general formulation of the Euclidean algorithm. The input for layer k is the couple (sk-1, rk-1) and the output is (sk, rk). Layer k "absorbs" the most relevant information at that stage as Vk. In the previous approach we were mainly interested in the effect of the medium on the signal, which is obtained by peeling off more and more of the upper layers and find out what these layers did to the signal. If we are more interested in the medium itself, we should make a model for it. One way of obtaining such a model is by glueing together the successive layers that we peeled of in the previous approach to form an approximation of the original medium. Mathematically this means that we build up this model for the medium by multiplying the V~ matrices since V0,k will completely describe the global effect of the upper crust of the medium, up to and including layer k. Thus the main objective here is not to see the effect of the medium on the given signal, but to build up a model for the medium by adjoining more and more of the upper layers. In terms of the continued fraction, this means that we are more interested in the successive approximants of the continued fraction, rather than in the tails. The extended Euclidean algorithm uses a layer peeling technique to compute the effect of the successive layers (compute Vk) and uses this information to build up the model (multiply out V0,k). However, this is not what is usually meant by a layer adjoining algorithm. In contrast to the layer peeling method where the effect of the first layer is computed for the whole signal, the layer adjoining method will only process as much of the signal as is needed to identify the first layer. After we have identified the first k - 1 layers by building the matrix Vo,k-1, only that part of the initial signal, needed to compute the next layer, is pushed through all the layers that have been identified so far. Each time a new layer Vk is constructed, it is added to the already existing model for the layered medium Vl,k-1 to get
CHAPTER 1. EUCLIDEAN FUGUES
54 W l ,k :
W l ,k - l V k .
Comparing the layer peeling and layer adjoining approach, we can conclude that a layer adjoining method computes a model of the medium adjoining more and more layers. Therefore an increasing part of the original signal is pushed through the medium with an increasing number of layers. The original signal is kept intact during the whole computation. The model could be built explicitly (the product V0,k is evaluated), as it usually is, or it could be stored in cascade form (the factored form VoV1.." Vk). The effect on the model on the original signal is of no importance and the result (the tails zk or the couple (sk, rk)) are in most cases not explicitly computed). A layer peeling method on the other hand computes this effect explicitly. The model of the medium is absent or only present at a second level. It is only present in factored form V1, V2,..., Vk. There is a computational edge to this distinction. In the layer adjoining methods, we have to compute the scattering effect of k layers on (part of) the original signal. This is given by convolution, or series times polynomial or I-Iankel times vector operations. All of this translates into inner products of two vectors. On the other hand, in layer peeling methods, we only have to describe the effect of one layer, which manipulates information much more locally and hence does not require inner products of long vectors. Typical operations are the division or inversion of formal series, or the inversion of a triangular Toeplitz matrix. It is well known that inner products are not quite suitable for parallel implementation. As the extended Euclidean algorithm, also the atomic Euchdean algorithm as we have formulated it, is not "pure sang". It builds up the concatenation of the layers, which is the layer adjoining element, but it uses the coefficients in residual series rk and sk, which is the layer peeling element. At the end of that section however, we have observed how the computations could be done in terms of the original data f and how the inner products then appeared. When these ideas are implemented, one gets a truly layer adjoining method.
1.9
Left-Right duality
The previous results were all derived in one of their two possible versions. They were either for a left or for a right formulation. We encourage the reader to derive the corresponding dual results. For further reference, we shall summarize some of them below. One can also consider it as a summary of what we have obtained in this first chapter. To distinguish with the previous formulations, we shall give all quantities a hat.
1.9. L E F T - R I G H T D U A L I T Y
55
Suppose we want to find a greatest common right divisor of the elements and § in a left sided Euclidean ring 7) or we want to find a continued fraction expansion of _§ with ~ and § in a left sided KAF. The operations r d i v and r m o d are defined as in a right sided Euclidean ring or a right sided I~AF, except that now right divisions are used. This can be obtained with a left sided Euclidean algorithm. The latter will run as follows. Choose some units 60 and a0 and define s0 = /~0s and § = ao§ Furthermore we set ~20 = c0 = 0. This initialization for the left sided Euclidean algorithm is obtained by the following initialization procedure.
G-1-
I10 ] 0
1
§
'
CO &O
--
[ 00] 0
&O
The following steps for k = 1, 2 , . . . now look as follows. Set
~k
~k
where ~k and ~2k are units, ~)k = 0 and ~k - --~k(~k-1 r d i v § The updating of the (~k matrix is then [~o,k
k--fTkGk-
--
O,k
~2o,k ~k ]
aO,k §
"
When at step n we find that § = 0 (which will certainly happen if ~ and § are elements from 7)), then we have found a g.c.r.d, of ~ and § gcrd(~, ~) = ~. = vo,.S + uo,.§ The approximation interpretation when ~ and § are elements from a left sided rational approximation field, (e.g., the RAF of formal series from F(z -1) where F is a skew field), can also be formulated in a dual setting. When we apply the left sided extended Euclidean algorithm as described above to these data, we should choose/~0, ?z0 and for k > 0 also ~2k and ck to be units (e.g., nonzero elements from F). In the example of the formal series, the r d i v operation is defined to mean that sk-1 r d i v § is the polynomial part of ~k-1 ~-_11-
56
C H A P T E R 1. E U C L I D E A N F U G U E S
In general, one shall obtain the following left sided formal continued fraction expansion in terms of the quantities defined by the Euclidean algorithm.
C2~lao ~3,~2
al ao -'[-
'
b
a2 -~-
an-1 + ^ an + ~n~2n where the ratio means here left division as indicated. The nth convergent ^-1 of this continued fraction is equal to a0,n~0,n. Again in the case where the I~AF is F(z -1), it approximates ] - _§ in the sense that the right-hand side of -- ao,n CO,n
has degree -2~,~-c~n+1 where ak is the degree of the polynomial ak and sk ~ hi. Furthermore, the degree of a0,n is equal to ~n, deg § - deg ~ - t~n+l and the tails z.n of the continued fraction are given by ~,~ - - § 1. If the I~AF is twosided, then left as well as right approximants can be generated. These two approximants are the same however. We illustrate this for the P~AF F(z -1>. Suppose a formal series f from F(z -1) has the following left and right fraction descriptions" f - s - i t - § We prove that its left and right Pad~ approximants are the same. T h e o r e m 1.21 Let F be a skew field and s - l r - ~ - 1 with r , s , ~ , ~ C F(z -1 >. Applying the right sided Euclidean algorithm to G-1 and the left sided Euclidean algorithm to G-1 with
G-1-
I1 ~1 0 s
1 T
and
G-1-
[lo ] 0
1
will give the same approzimants aO,kColk _ ~ - lo,kaO,k.
The tails are the same up to unit factors ek(s~-lrk) -- (§
-1)ek,
with ek,ek C Fo.
1.9. L E F T - R I G H T D U A L I T Y
57
P r o o f . Define 1
0
(i.~.7)
'
then equality of the left and right descriptions is expressed as
[~ ,]r~[~ §
_ 0.
We consider the case where we choose ao - vo - rio - ~)o - 1 and ck - 6k -1 and uk - uk - - 1 for k _> 1. The corresponding quantities are indicated with a prime. For example we have
[0
-1-q~
and
V~-
[0
1-0~
where q~ is the polynomial part of s - i t and ~ is the polynomial part of § Thus q~ - ~ . A direct consequence is t h a t V ~ V ~ - D and thus
0
or ( s ~ ) - l r l - §
-
[~ ~]r~
=
[~
[] §
~]r~
d
-~. By induction, it then follows t h a t for all k _> 1
q~-il~,
(s~)-lr~-§
The latter relation gives Vo,nNVo, n ' ^'
-1,
and
V~P~V~-N.
- P~ and because V/0,n, being generated by
the Euclidean algorithm is invertible, we also have Vo,,,~Vo, ^' n - 5]. Taking the (2,2) element from the latter product implies
CO,n(ao,n) --(a0,n)- c0,n" I
I
-1
^l
1 ^1
This proves the theorem with the special normalization. For general normalization, we have for example
V1- [ 0 L u~
t::l ] -qlc~
where q~ - So ldiv ro - S'oVo l d i v r~ao. Thus q~ - ao ~q~vo. The u p d a t i n g step gives -
=
[So ro]V1 - [ f O U l
s o c l - roqlcl]
[r'oaou 1 (s'o - r'oq~)vocl] [~'~t~
~ ~' ] ,
t~ -
-ao~
,
~
-
vo~.
58
CHAPTER
1.
EUCLIDEAN
FUGUES
For the next step ql
-
81
t ldiv rl - 8tltl ldiv r tl w l - w 1- 1 q2tl.
Therefore -
[,~
~x]y~ - [ ~
=
[~w~,2
=
[4t~
,~
- ~lq~]
' ' ) t 1 c2] (~1' - ~1q2 ~o~],
t~ - - ~ o ~ ,
~
- t~.
A proof by induction yields for general k _> 1
[,k ~k]- [,'ktk /k~ok],
t k - - -- W k - 1 U k ~
?13k - - t k - 1 Ck
where w0 - a0 and to - v0. More generally
0 In particular
wk
"
[ ] [ ] cO,k
_
CO,k !
aO,k
Wk
ao, k
, , -1 so that co,k a -o1, k - Co,k(ao,k) 9 A completely dual analysis can be given for the Euclidean algorithm applied to (~-1. This would give units {k and zbk such that
(~k-[
ikO zbk0] ( ~ .
Thus ,
a~
CO,kao, lk -- Co,k(O,k)
-1
^~
-- ( a o , k ) -
1~
o,k
_&-I
o,kCo,k"
Since also
§
z~k§ ]
we get ,k) - (§
The theorem is proved.
~)(ik~). [3
As we said before, ao,kCo,-1k is a right sided Pad~ approximant and ~o,~h0,k a left sided Pad6 approximant for f. Thus the previous theorem states that the left and the right Pad~ approximants are the same.
1.9. L E F T - R I G H T D U A L I T Y
59
We know that V0,n and V0,n, being generated by Euclidean algorithms are invertible, and hence, their row elements are left coprime and their column elements are right coprime. This says among other things that the denominators of two successive right Pad~ approximants constructed by the Euclidean algorithm, are left coprime and that the same holds for the corresponding numerators. It says also that the Pad~ approximants which are constructed are irreducible (as they should be by definition of Pad~ approximant). If the RAF is not skew, an alternative, much more direct proof is available. It holds for a general integral domain 7). T h e o r e m 1.22 In an integral domain 7), let
V O ' n - - V I V 2 " ' ' V n - - [ v~
ao,nCo'nI
be the matrix generated by the Euclidean algorithm, then the pairs formed by rows or columns in Vo,n are coprime. P r o o f . We have now the determinant of a matrix available and we find from det Vo,n - det V0 det V I ' " det Vn - c with c a unit, that vo,nao,n- Uo,nCo,n = c, and thus that the theorem holds. [:] In the case of the P~AF F(z -1 ), the entries in the V0,n matrix are polynomials. Then we arrive at the remarkable identity, attributed to Frobenius [100]. C o r o l l a r y 1.23 ( F r o b e n i u s i d e n t i t y ) Let F be a commutative field and
let VO,n be the matrix generated by the Euclidean algorithm applied to the data r - f C F(z-1), strictly proper and s - -1, using a monic normalization. Then Vo,nao,n
- - "lto,nC.O~n - -
1.
P r o o f . In this case it holds that n
det Vo,n - vo,nao,n- Uo,nCo,n -- aOvo H (--ckuk) k=l
with a 0 - v 0 - 1 and UkCk- --1 for k -
1, 2, . . . .
[::]
Chapter 2
Linear algebra of Hankels The results we obtained in the previous chapter can be given an interpretation in linear algebra. The idea is very simple: a polynomial or a series can be represented by the vector of its coefficients, which we shall call its stacking vector. A product of series can be written as a convolution of the sequences of their coefficients, that is conveniently written as a product of a Toeplitz or Hankel matrix and a vector. Expressing the properties of the previous chapter in this manner will lead to factorization results for structured matrices like Hankel and Toeplitz matrices. These factorization algorithms are called fast because they obtain triangular factors in O(n 2) operations as opposed to O(n 3) operations for general matrices. Since also F F T techniques can be used, one may obtain not only fast but even superfast algorithms. In this chapter we start with Hankel matrices. Our first objective is to write the results of the previous chapter in terms of matrices and vectors, rather than series and polynomials. This will lead to an inverse triangular factorization of a Hankel matrix, computable by an algorithm of the layer adjoining type. The direct translation of the Euclidean algorithm gives a layer peeling type of scheme that computes a triangular factorization of the Hankel matrix itself.
2.1
C o n v e n t i o n s and n o t a t i o n s
We use results and notations of the previous chapter. However, we follow some additional conventions. 9 We work with formal series f C F / z - l / w h i c h , in the notation of the previous chapter, can be described as the ratio of two elements s and r in F{z -1)" f - - s - l r . To simplify t h e notation, we assume now 61
62
CHAPTER
2.
LINEAR
ALGEBRA
OF HANKELS
that s - - 1 . Thus f is the same as r and is supposed to be strictly proper" f - ~ o f k z - k . n o r m a l i s a t i o n . In almost all the results we assume that we use the normalizations in the Euclidean algorithms of Theorem 1.14. These normalizations have as effect t h a t all the polynomials ak and hence also all the polynomials a0,k are monic and that all the series - s k are monic too, i.e. their leading coefficient which corresponds to the highest degree term, is 1. This normalization is obtained by taking V0 equal to the identity matrix and for k _> 1 we choose ck - rk-1 and uk - -~-_11 where ~k-1 is the leading coefficient of rk-1. See (2.3) below. 9 In order not to complicate the notation unduly, we restrict ourselves in the present chapter to the case where the basic structure F we work in has a commutative multiplication. Thus F is a field, that is n o t skew anymore. Moreover, we introduce some general notation to be used in the remainder of the book. l e a d i n g c o e f f i c i e n t . We denote the leading coefficient of a nonzero series r - ~] r k z k C F(z -~) by ~, i.e., ~ - r t - ford(r)if l - - max{k 9 rk ~ 0} - ord (r). This was used in Theorem 1.14 and shall be used persistently in the rest of the text. s t a c k i n g v e c t o r . We often need stacking vectors of coefficients of polynomials and series. As we did in the last chapter, we use here and in the rest of this book, the convention that the special Sans Serif font is reserved to indicate these stacking vectors. For example, if a is a nonzero polynomial, then by a we mean the column vector obtained by stacking the coefficients of the polynomial a. Thus a ( z ) = [1 z - . . zdega]a. More generally, if s ( z ) - ~] s k z k is a fls, then s is the stacking vector such that s ( z ) - [... z -1 1 z z 2 ...]s. We shall also say that s ( z ) generates s. On several occasions, we shall use z to mean [1 z z 2 --.]T which may be finite or infinite, as will be appropriate. Thus for a ( z ) e F[[z]] or F[z] we have a ( z ) - zTa. 9 shift a n d r e v e r s a l o p e r a t o r . We denote the down shift operator for vectors by Z. Thus we denote the stacking vector of z s ( z ) by Zs if s is the stacking vector of s ( z ) . Thus Zs means "shift the elements of the vector s one position down, possibly introducing a zero at the top" when the vector is only half infinite or finite. Correspondingly, Z -1
2.2. HANKE, L MATRICES
63
will denote a left inverse for the operator Z and it will shift all the elements of the vector one place up. Observe that Z -1 - i Z i with the reversal operator. This means t h a t ]~ reverses the order of the elements in the vector. It turns the vector upside down. All these operators are supposed to have the appropriate dimensions, possibly infinite, t h a t should be clear from the context. A final note on terminology. 9 H a n k e l m a t r i x . A Hankel matrix is a m a t r i x whose entries depend only on the sum of the row and column indices: H = [fi+j+l]. T o e p l i t z m a t r i x . A Toeplitz matrix is a matrix whose entries depend only on the difference of the row and column indices: T = [fj-i]. t r i a n g u l a r m a t r i c e s . We use the term "lower triangular" and "upper triangular" in the usual sense for matrices t h a t are zero above respectively below their main diagonal. A m a t r i x is called "right lower" or "left upper" triangular if it is zero above respectively below its anti-diagonal. For Hankel matrices, the right/left specifiers are redundant and we shall drop them on occasion. Thus a lower triangular Hankel matrix is a Hankel matrix t h a t is right lower triangular and an upper triangular Hankel m a t r i x is left upper triangular.
2.2
Hankel matrices
Let us consider a strictly proper fls to arrive at a matrix formulation of equation ( 1 . 1 1 ) o r its simplified form Note t h a t for any positive integer (1.15): fao,k- co,k = rk is given by
[.it
...
f ~-~F fk z-k E ]~(Z-I>. We want the approximation property given in (1.15). l, the coefficient of z -l in the relation -
/t+,,,, ] o,k
where, as we agreed above, a0,k is the stacking vector of the polynomial a0,k. Now define the Hankel matrices
j=O,...,/
If necessary, we explicitly indicate the dependence of Hk,l on the sequence fl, f 2 , . . , or, what is the same, on the fls f = ~] fkz -k by denoting this Hankel m a t r i x by Hkj(f). f is called the symbol of the Hankel matrix.
C H A P T E R 2. L I N E A R A L G E B R A OF H A N K E L S
64
The fact that deg r k -
--gk+l can be expressed as
H,~k+~- l,~k a O , k
--
[
0
0
9 9 9
0
(9..9.)
~k
where rk # 0 is the leading coefficient in rk, i.e.,
(2.3)
rk(z) - rkz -~k+l + terms of lower degree.
In other words, ao, k is an element from the null space or kernel of the Hankel matrix H,~h+~_2,~k. Here we stumble into another related field" the solution of Hankel systems, the study of their kernels, etc. An overwhelming literature exists on this topic. See e.g., the books of Heinig and l~ost [144] and Iohvidov [156] or the survey paper [40]. Some results on the factorization of Hankel matrices, given in the Gragg and Lindquist paper [121] can easily be obtained. We write this down for an oo • oo matrix H = Hoo,oo. Just as we can embed the polynomials in the fps, we can suppose that the stacking vectors are extended with infinitely many zeros downwards, so that we can consider a0,k as belonging to F ~176 if this happens to be convenient. Furthermore Zia0,k is obviously the stacking vector of ziao,k(z). The fact that the strictly proper part of fao,k is given by rk can be expressed as a generalization of (2.2) by
Ha0,k -
/~ rk
where rk is the stacking vector of rk, i.e., rk - [ " " rk-[
...
~k
0
...
Z -2
Z -1
0 ]T,
] rk, thus (2.4)
~k+l
and rk is the leading coefficient of rk. More generally, we have for i < '~k+l
We write this down for i - 0, 1 , . . . , O~k+1 - - 1 < ~k+l and summarize this in a global relation
HA.,k - fr R.,k where we have set
A.,k-- [ ao,k I
Z 0,k I
9
I Z
lao, k
]
(2.5)
and
I Zrk
...
Z~k+~-lrk ].
(2.6)
2.2.
65
HANKEL MATRICES
Let us partition A.,k and R.,k into blocks as follows Ao,k
A.,k-
Ak,k
Rk,k
and R.,k
_
Ro ,k
where we suppose that Ai,k as well as Ri,k have dimensions ai+l X a k + l . The blocks Ai,k are zero for i > k and the blocks Ri,k are zero for i < k. All the blocks are Toeplitz blocks. The blocks A0,k are lower triangular and the blocks Ak,k and Rk,k are upper triangular. Our next step is to bring the previous results together for several values of k. Therefore define
A
-[A.o
I
...
Rk-
[ R.,o [ R.,1
I
and
"'"
R.,k ] 9
(2.8)
Then HAk
-
(2.9)
T Rk
where Ak is block upper triangular with Toeplitz blocks and /~ Rk is block lower triangular with Hankel blocks. The matrix Ak is also upper triangular in the scalar sense, but f Rk is not lower triangular in the scalar sense since its diagonal blocks are not (see the example below). These diagonal blocks are zero above the anti-diagonal, hence lower Hankel triangular. Since both A T and I R k are block lower triangular, it follows that also their product A T ] Rk - A T k H Ak is block lower triangular. The transpose is therefore block upper triangular 9 But since A kT H Ak is symmetric, it is equal to its transpose which is simultaneously block upper and block lower triangular. Hence it is block diagonal with blocks Dk,k - A.,T ] R.,k - AkTk f Rk,k.
(2.10)
These blocks are lower Hankel triangular because AkTk is lower triangular Toeplitz and i Rk,k is lower Hankel triangular. Denote the ( i , j ) t h element of the Hankel block Dk,k by d2,~k+i+j-1 (1 ~k. In linear system theory, these numbers nk are known to be the Kronecker indices of partial sections of f [165]. So we introduce the following definition D e f i n i t i o n 2.1 ( K r o n e c k e r i n d i c e s ) For a given infinite (scalar} sequence f l , f 2 , . . , in ~ we define the Hankel matrices Hid as before in (2.1). The Kronecker indices ~k, k - O, 1 , . . . are then defined by ~o - 0 and recursively, we define nk+l as the smallest number larger than e;k such that the Hankel matrix H,~k+t - 1,,~k has full rank. As we know, these Kronecker indices are such that for vk - nk+l - 1, the matrices Hk - H~h,~k are precisely the successive invertible leading submatrices of H. Also this characterization could be used as a definition of the Kronecker indices. However, the previous definition has the "advantage" that to define ~k+l, we only need the elements of the Hankel H,~,+~-l,,~k, which contains less fk-elements than the submatrix Hk. For the differences between the Kronecker indices, there doesn't seem to be a widespread term. Because they indicate the degrees of the successive quotients as generated by the Euclidean algorithm, we call them Euclidean indices. D e f i n i t i o n 2.2 ( E u c l i d e a n i n d i c e s ) Let the sequence ( f k } as in the previous definition have Kvoneckev indices {tCk}. Then the numbers ao - O, c~k - t c k - tck-1, k - 1, 2 , . . . ave called the Euclidean indices of the sequence.
2.2. H A N K E L M A T R I C E S
69
To summarize what we obtained so far in a linear algebra setting, we can formulate the following theorem which was given in the paper of Gragg and Lindquist [121]. T h e o r e m 2.1 Let {fk}k>~ be an infinite sequence in F with Kronecker indices {'r and Euclidean indices {ak}k>0. Suppose that with f we associate the Hankel matrices H i j as in (2.1) and set H - Hoo,oo and find a_o,k from equation (2.13) to get aT T o,k _ [gS,k 1 0 0 ...]. Define the upper triangular matrix. An as in (2.5,2.7). Then A T H An - Dn with Dn - diag(D0,0Dl,1... Dn,n) a block diagonal matrix where each block Dk,k is a lower Hankel triangular matrix of dimension ak+l x ak+l with en12ah+1-1 which is given by the relations (2.11). tries of the subsequence {d2~k+iji=~ The elements d2~h+i are zero for i -- 1 , . . . , ak+l - 1. The columns of the matrix An are the stacking vectors of the corresponding polynomials appearing in the normalized extended Euclidean algorithm applied to s - - 1 and r - f - ~ k > l fk z - k . The stacking vectors of the series rk that are generated in that algorithm are found in the corresponding columns of the R,~ matrix which can be defined by (2.9), i.e. i Rn - HA,,. In this theorem we have kept infinite dimensional matrices, mainly because the rk are infinite series. We can however easily obtain finite dimensional formulations by just truncating at the appropriate place. Since we need this in the sequel, we give such a formulation now. To do this we introduce a new operation and notation. T r u n c a t i o n o p e r a t o r . This operator is denoted as Ikl- It is represented by the first k rows of a identity matrix of appropriate dimensions (here infinite dimensional) so that for a vector s, the result of Ikl s will be the first k elements of s only. Similarly, we denote by Ilk the operator which selects the last k rows. It is represented by the last k rows of the identity matrix. For example, with u + l 1fork>
1.
If we equate the coefficients of z i in the left-hand side and the right-hand side for i < 0, we get JR.,k-1 I Z ~ k r k - 1 ] a k - --rk-2Uk-lCk + rk,
(2.z5)
2.2. HANKEL MATRICES
71
The m a t r i x in the left-hand side is R.,k-1 as in (2.6) but extended with one more column. Thus it is
Now we split again ak as a kT - [gkT 1] and w e bring the last column of (2.16) to the right-hand side in (2.15). This gives R..,k-l~k = -Zakrk-1
(2.17)
- rk-2t3k -[- rk
with ~k - Uk-lCk -- --~k-_12~k-1. (R.ecall the normalization ck - rk-1 and uk-1 - -~k-_12.) The first ak nontrivial equations herein can be used to define gk. T h e y are l~k-l,k-la-k
=
(2.18)
--[k-1
where ~ - 1 of the same dimension as gk is the stacking vector containing in reverse order the coefficients of z - j for j -- ~tr + 1 , . . . , ~;k of z'~krk_l + rk-2flk. If we want to have the ak defined in terms of the p a r a m e t e r s di as introduced in T h e o r e m 2.1, then we take equations (2.17) and multiply from the left with ATk_I i to get the following relation which defines ak in terms of the dk t h a t a p p e a r in T h e o r e m 2.1 D k - l , k - l g k -- - d k
(2.19)
with D k - l , k - 1 the lower Hankel triangular m a t r i x as defined in terms of the d2~k_~+i in (2.11) and d k is given by d_k-[
d2,~k_~+~k+l
""
d2~j, ] T 9
For the last conclusion we use the fact t h a t A T . , k - 1 /~ rk-2 as well as ATk_I /~ rk give a zero vector. E x a m p l e 2.2 ( E x . 1.5 c o n t i n u e d ) Let us take the previous example again. We can start with f - - s - l r which is in this case
f(z)-
( z - z - 4 ) - 1 ( 1 + z -4) - z -1 + z -S + z -6 + z -1~ + z -11 + z -15 + . . -
Since we know t h a t the Hankel m a t r i x will have finite rank, we need only write down a finite part of it. The 5 by 5 part will be sufficient. The c o m p u t a t i o n s can be redone with s = - 1 and r = f. The same polynomials ak will emerge. Thus we have a l = 1, a2 = 3 and a 3 : 1. The normalized polynomials ak are given by a0-1,
al-z,
a2-z
3-z 2+z-1
and
a3-z+l.
72
C H A P T E R 2.
LINEAR ALGEBRA
OF H A N K E L S
These give for the a0,k polynomials: a0,0
1,
a0,1 -- z,
a0,2
Z 4 __
--
Z3
+ z2 - z-
1
and
a0,3 --
Z 5
- 1.
Since ~3 - 1 + 3 + 1 - 5, the m a t r i x A2 is a 5 x 5 matrix. The m a t r i x H2 is
1 0 0 0 1
H2-
0 0 0 1 1
0 0 1 1 0
0 1 1 0 0
1 1 0 0 0
and the m a t r i x A2 is given by 0 0 1
A2-
0 0 0 1
-1 -1 1 -1 1
which delivers 1 0 0 1
A T H 2 A 2 - diag(D0,0, D1,1, D2,2) -
0 1 1
1 1 0 -2
as one can easily verify. Also the relations (2.19) can be easily verified. For example the coefficients of a2 satisfy the system
I~176 Ill 0
1
1
i
1
0
a_2--
0
.
0
The relation (2.18) can be found more directly. As a m a t t e r of fact, the following construction is how S.-Y. Kung [177] found the recursion as a generalization of the Lanczos algorithm. The idea of equation (2.9) is to construct the m a t r i x Ak such t h a t the right-hand side m a t r i x I Rk is in a kind of echelon form. By this it is m e a n t t h a t the m a t r i x has unique pivot positions for each column. In other words, the first nonzero element
2.2.
HANKEL MATRICES
73
in each column should appear on different rows. The matrix is then lower triangular up to column interchanges. Suppose one wants to construct the first column of the block column A.,k. Note then that
/~ [ ao,~_2,
ao,k_l,
Zao,~_l,
= ~ [ ro,k-~ I ro,k-11
...
Zro,~-i
Z~
, -
, Z~
] ]
has the form 0
0 ~
0
0
'~k-2
rk-
•
X
~
•
X
The middle block in /~ is /~ Rk-l,k-1 which has size ak x ak. It is lower triangular Hankel with diagonal elements rk-1. 7"k_ 1
2~ Rk-~,k-~
-I
9
~
'~k-~ X
The last column of the matrix /~ violates the echelon condition. In this column, rk-1 and the ak subsequent elements should be eliminated and this can be done by linear combinations of the previous columns. The resulting column is then called f rk. The first coefficient is eliminated by adding flk times the first column to the last one with flk - - r k - l r k - 1 2 9 Denote the first ak coefficients following this position in the result by ~ - 1 . Then these coefficients are eliminated by adding R k - l , k - l ~ to it with a_k satisfying (2.18). Thus we have shown with a hnear algebra argument t h a t rk -- Z ~
~. r k - 2 f l k -I- R.,k-la_a_k.
(2.20)
or
rk - rk-2/~k + rk-~ak.
(2.21)
which is at this stage a famihar formula. It is indeed the same as the recursion of the Euclidean algorithm rk - sk_~ck + rk_~ak because Tk
--
S k - l Ck + r k _ l a k
=
rk-2Uk-lCk
+ rk-lak
74
CHAPTER
2.
LINEAR
ALGEBRA
and because of the normalizing choice uk-1 -
OF HANKELS
-~k-_l2 and ck - rk-1, we
have Uk-lCk -- --rkl2rk-l_ -- ilk, this reduces to (2.21). This same three term recurrence relation holds also for the a-vectors instead of the r-vectors because they satisfy the same recurrence. In fact, this three term recurrence relation for the a0,k polynomials was already derived in (1.13). The latter will play an i m p o r t a n t role in the next section.
2.3
Tridiagonal matrices
Also the three term recurrence relation (1.13) mentioned at the end of the previous section, can be written in an explicit matrix form. To get this result, we shah need the companion matrix of a polynomial. D e f i n i t i o n 2.3 ( c o m p a n i o n
matrix)
Let p ( z ) - ~ p k z k be a m o n i c poly-
n o m i a l o f degree m (pro = 1). T h e n its companion matrix is defined as
F(p) =
0
0
...
0
-Po
1
0
...
0
-Pl
0
1
...
0
-P2
:
:
:
:
9
,
.
~
0
0
...
1
(9..22)
-Prn-1
The last column of F ( p ) shall be denoted by - p . In this section we drop again the redundant zeros in the stacking vectors of the a0,k polynomials and consider An as a ~,~+1 by ~,~+1 matrix. The An of this section is what was denoted as A~ with v + 1 = ~n+l in the previous section. Thus here all the a-vectors are supposed to be in 1~"+~. We can now write the recurrence (1.13) aO,k -- aO,k- 2 Uk- 1Ck + aO,k- 1 ak
for the monic normalization ( q - ri-1, ui-1 - -~i-__ 1 ) _ in vector notation. This becomes
aO,k -- aO,k-2~k + [ aO,k-1
Zao,k-1
...
Zekao,k_l ] ak
with ~k = u k - l c k and czk = deg ak. This is the same as
Zakao,k_l - --ao,k_2~k -- [ ao,k_l
...
ZCxk-lao,k_l
] ak + aO,k
or, with Ak defined as in (9..7) Z~ka0,k-l-Ak[~l
- i l k 0 ..- 0 I - a kT I 1 0 --- 0]T ~k--2
ock_ 1
(x k
~
2.3.
TRIDIA GONAL
MATRICES
75
In the left-hand side, we find Z times the ~kth column of the m a t r i x Ak. T h a t is Z times the last column of the block column A.,k-1. The right-hand side involves the first column of the previous block, the previous columns of the same block and the first column of the next block. We could as well express the complete block column A.,k-1 as (n _ k) Z A.,k_ I - A~ [ 0 . . . 0 ] T k-2,k-1 T
T I TT-x,k-1 I T k,k-1 [0...
O] T
with
(2.23)
T k - l , k - 1 -- F ( a k ) E ]~kx~
Tk-2,k-1 --
0 ...
0 -Zk
0 .
0 .
0 .
.
.
0
-..
9
Tk,k-1 --
0
...
0
0 0 .
--.--
0 0 .
9
0
---
C ]Wh-~ x~k
1 0 .
.
.
0
0
E F ~k+~ x~k.
(2.24)
(2.25)
If we write this down for severaJ k-values, it is easy to find, by checking the boundaries t h a t F(ao,k+l )Ak - A k T k (2.26) with Tk a block tridiagonal m a t r i x To,o
To,1
T1,0 TI,1 T1,2 Tk-
T2,1 T2,2
(2.27)
"'. 99
Tk,k-x
Tk-l,k
Tk,k
We call these matrices Jacobi matrices. In the scalar (i.e. the nondegenerate case) some authors prefer to restrict the name Jacobi m a t r i x to the symmetric version of Tk to be defined in Definition 2.4. However, since we generalize to the block case anyway, we broaden the meaning of Jacobi m a t r i x to any block tridiagonal m a t r i x which will turn out to generalize in some sense the three term recurrence relation for orthogonaJ polynomials, and this is the case for Tk as we shall see in the next chapter. An expression for the inverse of Ak in terms of Rk and Dk was given in Corollary 2.2 but it can also be given in terms of the m a t r i x Tk. It
76
C H A P T E R 2. L I N E A R A L G E B R A OF H A N K E L S
follows from ( 2 . 2 6 ) t h a t [ F(ao,k+l)]iAk - AkT~. Take the first column of these relations with i - 0 , . . . , a k + l - 1. The left-hand sides are then the successive columns of the identity matrix. Hence we have found that Ak-1 - - [ e l I Tkel I "'" I rP~h+t-1 "~k eI ]
(2.28)
withe1-[100 . . . 0 ] T E F ~k+~. We can summarize the results of this section in the next theorem which can also be found in the paper of Gragg and Lindquist [121]. 2.3 Suppose the (monic) polynomials ao,k are given by the recurrence relation ao,k - a o , k - 2 ~ k + ao,k-lak
Theorem
with j3k constant and ak a monic polynomial of degree ak > O. The initial conditions are ao,-1 - 0 and ao,o - 1. Then ao,k is of degree ~k - ~kl ai. Define the unit upper triangular matrix An as in (2.7). Then F(ao,n+z )A~ = ANT,,
(2.29)
with Tn as defined in (2.27) and f ( a o , n + l ) is the companion matrix of ao,n+l. The inverse A~, z is given by (2.28). E x a m p l e 2.3 ( E x . 1.5 c o n t i n u e d ) For n - 2, we check again with the same example if the previous theorem gives the correct result. We cannot take n > 2 because the algorithm broke down after step n = 3 because we had an exact fit. The equation (2.26) for k = 2 looks as follows 1 0 0 0 0
1
-1 -1 1 -1 1
0 1
1 1 1 1 1
0 1
0 0 1
0 0 0 1
0 1
0 0 1
1 1
0 0 0 1
-1 -1 1 -1 1
1 1 -1 1 1
=
0 -2 0 0 -1
.
Computing the successive powers of T2 gives 9
T2-
0 1 0 0 0
0 0 1 0 0
0 0 0-1 1 0
1 1
0 -2 0 1 0 1-1
,T~-
0 0 1 0 0
0 0 0-1 1 0
1 1 1 10
1 0 0 0
0 2 -2 0 1
,
2.3. TRIDIA GONAL MATRICES 0 0 0 1 0
T~
1 1 -1 1 1
1 0 0 0 0
0 1 0 0 0
77
0 -2 2 -2 -1
1 1 -1 1 1
1 0 0 0 0
0 0 1 0 0 1 0 0 0 0
-2 0 0 0 -1
so that the inverse A21 is given by 1 0 0 1 0 A21-
1
0 0 0
1 1 -1
1
1 1
From the previous theorem, it is easy to find a determinant expression for the polynomials a0,n+l.
Corollary
2.4 Let ao,k, An, F(a0,n+l) and Tn be as defined in the previous theorem. Then ao,,~+l is the characteristic polynomial of Tn. This means that d e t ( z l n - T n ) - ao,n+l(z). (2.30)
Hence, the zeros of a0,n+l are the eigenvalues of Tn. P r o o f . Just note that it follows from (2.29) that a0,n+l(z)
since det An
-
-
d e t [ z l n - f(ao,n+l)]- d e t ( z l n - AnTnA~, 1)
=
det[An(zIn- Tn)A~ ~]
=
det(An) det(zln - Tn) det(A~, 1)
=
det(zIn - Tn)
det A~, 1
-
1.
D
There is another way of finding the inverse A~ 1 than the formula (2.28) as we have seen in the last section. See Corollary 2.2. There is a way to link the block tridiagonal matrix Tn with the Hankel matrices from the previous section more explicitly. Therefore, we introduce the Hankel matrix H ( z f ) . Recall that if OO
-k k=l
78
C H A P T E R 2. L I N E A R A L G E B R A OF H A N K E L S
then H ( f ) represents the Hankel matrix
H ( / ) = [/i+j+1]i,3:o,1,.... Let us set v + l = tcn+~. Then the ( v + 1) • ( v + l ) leading principal submatrix of H ( f ) was denoted as Hn = H~,,~,(f). Now with oo
zf(z) - ~ fk+az -k, k=0
we can again associate a Hankel matrix
H ( z f ) = [fi+j+2]i,j:0,1 .... 9 Its (v + 1) • (L, + 1)leading principal submatrix H~,,~,(zf ) is derived from H~,,~,(f) by deleting the first column and extending the Hankel matrix with one extra column at the end, or equivalently, by deleting the first row and extending the Hankel matrix with one extra row at the bottom. Obviously H ( z f ) = H ( . f ) Z where Z is the shift operator. Therefore we also use the notation H < for H ( z f ) because applying Z to the right shifts all columns one place to the left. Equivalently, we could denote it as
g(zf)-
ZTH(f)-
H A.
The following matrix relation holds. L e m m a 2.5 Let g = H ( f ) be the Hankel matrix associated with f ( z ) =
E~=x fk z-k and H~,,~,(f) - I~,+IlHI 1,+11 T a leading principal submatriz. Similarly, define the shifted Hankel matrix H < = H A with leading principal submatriz H n~. This is indeed true, if we denote the matrix [Zo[Zll...Iz,~] as Xn, then A'n = range Xn and extending X n to X n + l or further does not add anything as the following lemma shows. We have used there the notation a instead of n~ + 1 for simplicity. L e m m a 3.1 L e t X k - [xolAxol-." IAkxo], k - O, 1 , . . . be Krylov matrices. Then rank Xo~ - a if and only if the first a columns are linearly i n d e p e n d e n t and the first a + 1 columns are linearly dependent. P r o o f . Let us denote zk - A k z o . Suppose there exists an a such that ot
akzk--O, a.#O. i=0
Then the first a + 1 columns are linearly dependent and we can express z~ in terms of the previous columns. But it then follows by induction that all the following columns are also linear combinations of the first a columns and therefore the rank of Xor is at most a. Again by choosing a minimal a, we catch the rank of Xoo. [:3
3.2. B I O R T H O G O N A L I T Y
101
It follows from this l e m m a that the Krylov spaces are strictly included in each other until a maximal space is reached"
Xo c ,r~ c . . .
c x,,.
-
,v,,.+x = - - . =
xoo.
(3.~)
Thus we can give the following definition. D e f i n i t i o n 3.1 For a given vector Zo C F m+l and a square (m+ 1) x ( m + 1) matrix A, we can generate the Krylov spaces as in (3.1). The space 'u
- Xoo - span{z0, A z 0 , . . . , An'z0} - range[z0]Az01... IA~'x0]
is called the maximal Krylov subspace generated by zo. We define the grade of Zo (with respect to the matriz A) as the dimension n . + 1 of the mazimal Krylov subspace of A generated by zo. At the same time, ,In. is an A-invariant subspace as one can readily see. It is the smallest A-invariant subspace containing z0. Indeed, if it has to contain z0, and be A-invariant, it should also contain Az0. Therefore, it should also contain A2z0 etcetera until we get some A'*'+lz0 which is already in the space. The first number nx + 1 for which this happens is precisely the grade of z0. Note t h a t the minimality of the A-invariant subspace and the maximality of the Krylov subspace are no contradictions, but designate the same thing. A thorough account on invariant subspaces for matrices and their applications can be found in [112]. By analogy, the spaces Yk will terminate in an AH-invariant subspace of dimension n u + 1. It are these invariant subspaces t h a t we want to catch. T h a t this can be done by the Euclidean algorithm is explained in the next section.
3.2
Biorthogonality
To describe the invariant subspaces, we have to find some bases for them. Of course the Krylov sequences form bases, but because they are not orthogonal, it does not show at what point we loose linear independency unless we examine the linear independency, which could be done by an orthogonalization, or in this case, more efficiently by a biorthogonalization procedure. Thus the idea is to find nested biorthogonal bases {xk} and {Yk} such t h a t
2dk def k - xk, - span{zi }i=0
k-
Yk d~f span{:yi }i=o k
k - 0 , . . . , n u.
-
-
Yk,
o,...,~,
(3.9.)
CHAPTER
102
3. L A N C Z O S
ALGORITHM
Note that this condition is equivalent with requiring that ~k has the form "xk -- p ( A ) z o with p(z) some polynomial of degree k. A similar relation holds of course for yk. We also require some formal biorthogonality relation -
i , j - 0, 1 , . . .
with ~i nonzero and B a square nonsingular matrix that commutes with A. The above product is a genuine inner product if B is a self-adjoint positive definite matrix, however we shall allow an arbitrary nonsingular matrix here. R.elevant examples are B - I, B - A or B - A -1. This matrix B was introduced by Gutknecht in [125]. However, we shall discard it by assuming that B has been taken care of by replacing x0 by Bxo or by replacing Y0 by BHyo. It can at any instance be reintroduced by taking it out of the initial vector again and using the commuting property with A. If we define the matrices and which are both in written as
~,.,,+1)x(,.,+I), ~g~.
then the biorthogonality condition can be
_ On - d i a g ( ~ 0 , . . . , ~n).
It will turn out that these matrices reduce A to a tridiagonal matrix, i.e., ]THAXn is a tridiagonal matrix. We shall show that the extended Euclidean algorithm as we have seen it before solves this problem. To begin with, we show that it solves the problem at least in the generic case, by which we mean that all the degrees ak of the polynomials ak are K-~n+ 1 one. This implies that n,~+l z..,k=0 ak will be equal to n + 1. First, define the numbers fi by the relations
Y H z j -- y H a i a J z o - fi+j+l, Thus when we set X , - [ x 0 1
i , j - O, 1 , 2 , . . .
-.. I ~ , ] and Yn - [Y01 . . . l y, ], then
r#x. is a Hankel matrix based on the coefficients fi. In general these matrices had dimensions ~n+l, which, in our generic case, is just n + 1. The extended Euclidean algorithm, appfied to the data f - ~ i ~ 1 fi z - i will generate among other things, the polynomials ak (supposed to be of degree 1) and the polynomials a0,k whose coefficients appear in the matrix An as defined in (2.5, 2.7). Because this matrix is (unit) upper triangular, the matrices (recall that denotes the involution in F)
Jr,,, - X,,,An and 1~,~ - YnAn
3.2. B I O R T H O G O N A L I T Y
103
shall have columns that satisfy the conditions (3.2). We verify next the biorthogonality relation. We can express the previous relations as :~k
-
"'" I Akzo]ao,k
[zolAzol
= ao,k(A)zo and similarly we find that
flk.- a---o,k(AH)yo -[ao,k(A)]gYo. We can easily verify that
~H](,.,_ ATyHx,.,An - ATH,.,A,.,- D,.,
(3.3)
with Dn a diagonal matrix as we have seen in Theorem 2.1. Recall ak = 1 for all k which implies that Dn is n + 1 by n + 1 diagonal and its diagonal blocks Dk,k are scalar. Thus in the notation of the previous sections: Dk,k = ~k = d2k+~. Writing down the ( i , j ) t h element of the previous relation (3.3) we get ^H^ T 0] T -- ~ i j D i , i (3.4) Yi :r'.i -- [a0,i 0 . 9 " 0 ] H n [ a 0,j T 0... which shows how the orthogonality of a0,i and a0,j with respect to the symmetric indefinite weight matrix Hn is equivalent with the biorthogonality of
f I i - [ao,i(A)]Hyo
and
~j - [ a o d ( A ) ] z o
with respect to the inner product with weight I. Now we should show that yTApf~ is tridiagonal. relations zk+l = Azk for k = 0 , . . . , n 1
We can write the
and z , , + l - ao,,,+l ( A ) x o - A n + l zo + [ Zo I "'" ] z,, ]ao,n+ 1
as one matrix relation
AXn - Ix1 I "'"
] - X,.,F(ao,,.,+l)+[O . . . 01
]
where as before, F(ao,,.,+l) represents the Frobenius matrix of the polynomial a0,n+l. Because we have shown in Theorem 2.3 that
F(ao,,.,+l )A,., - ANT,., with Tn a tridiagonal Jacobi matrix, we find that
AX,.,A,., - X,.,F(ao,,.,+I)AT, + [ 0
.
.
.
0 I ~,~+1 ]An
104
CHAPTER
3. L A N C Z O S
ALGORITHM
or, by definition of )(n and because An has diagonal elements equal to 1, A](,., - XnT,., + [ 0
" - 0 [ ~n+t ].
(3.5)
... 0 I Sn+, ].
(3.6)
As a side result we note that by analogy A H y n - Y,~T,~ + [ 0
Hence we get the symmetric Jacobi matrix Jn in
2A.L,,
-
? 2 L , T,, + [ o
=
D.T.-J.
... o
]
because xn+l is orthogonal to all yk, k - 0 , . . . , n by the relations (3.4). Thus y H A X n is tridiagonal and we are done. We do not have that ]~Hxn is the unit matrix, but we can always write the diagonal matrix Dn as the product of diagonal matrices D~,~D" and replace ]I,~ by the normalized Y,~[D~]-1 and 5(,~ by Xn[D,.,] ^ , - 1 9 In that case Yn and )f,, will be biorthonormal. More generally, we could have imposed any normalization on these vectors by taking for D~ and D~ arbitrary nonsingular diagonal matrices.
3.3
The generic algorithm
Now that we know that the extended Euclidean algorithm can solve the problem, let us reformulate it in terms of the data of the present problem. Kather than computing all the fi first, then applying the algorithm to f to get the a0,k from which zk = ao,k(A)zo follows, we want to translate the algorithm directly in terms of a recursion for the zk. The result corresponds to a two-sided Gram-Schmidt biorthogonalization of the initial Krylov sequences formed by the columns of Xn and Y,~. The fact that ak is of degree 1, means that the orthogonahzation process can be obtained in a very efficient way. It is indeed sufficient to subtract from A~k-1 its components along xk-1 and zk-2, since its components along xi for i = 0, 1 , . . . , k - 3 are zero. This is how in the recurrence relation 5:k - ak(A)~.k-1 +
~kxk-2
appear only three terms. This three term recurrence is an immediate translation of the three term recurrence relation for the polynomials ao,k a s indeed by definition xk - ao,k(A)zo and because ao,k - akao,k-1 + flkao,k-2 as we know from (1.13). Recall that flk - u k - l c k . For the monic normalization
3.4.
THE
EUCLIDEAN
LANCZOS
105
ALGORITHM
(see Theorem 1.14) we should choose u k - 1 - -~k-_12 and ck then
-
~ k - 1 , SO t h a t
-
(3.7)
The Tk-1 elements are found as ~k-1Dk-l,k-1
=
T [aO,k-1
0 ' ' ' 0]H.[a O,k-1 T 0...0] T
=
[a0,k_ 11T r k -Hl X k - l [ a O , k - 1 ]
:
The polynomial coefficients ak - [ a Tk 1 ]T are given by the solution of Dk_l,k_l
a_k -- -d_. k
as in (2.19). In the present case where all ak are assumed to be 1, this gives a very simple expression because a_k is just a scalar and so is Dk-l,k-1 = d 2 k - 1 -- r k - 1 . The vector d_k also reduces to a scalar, viz. d2~k_~+2 = d2k, which, according to (2 911), is given by Y^S k - 1 A&k- 1. The algorithm can start with xo - x0, !)o - Yo and x-1 - 0, !}-1 - 0. The general iteration step is
X,k-
Ax, k-1
--
^H A~.k -~ ^ ~H_ 1 ~k, Yk-1 -^H ~.k z~:_~ ^H 5:k Yk-1
-I
Yk-2
^ Xk_ 2 .
-2
Of course a similar relation holds for the ~)k recursion. This is closely related to the recursion of the 0RTHODIR variant of the conjugate gradient m e t h o d as described by Young and Jea in [242] and in the thesis of Joubert [162]. It is possible to find other expressions for the inner products. For example, it follows from the biorthogonality relations and because the a0,k are monic that
3.4
The
Euclidean
Lanczos
algorithm
Note t h a t a zero division will cause a breakdown of the generic algorithm. T h a t is when one of the rk-1 is zero. This will certainly happen when a maximal subspace is reached. For example when the m a x i m a l Xk-1 is reached, then Xk C A'k-1 and thus also xk C A'k-1. Because Yk is chosen orthogonal to A'k-1, it will also be orthogonal to zk- Thus ~H~k -- 0. By s y m m e t r y the same holds when a maximal subspace Yk-1 is reached. If we are looking for the invariant subspaces, then this breakdown does not m a t t e r because we then have reached our goal. So the question is in which situation can a breakdown (i.e., rk-1 = 0) occur.
106
CHAPTER
3.
LANCZOS
ALGORITHM
If we are in the generic situation where all the blocks have size 1, then by the determinant relation det Hn - det Dn, we see that this will happen if and only if one of the leading principal submatrices of the Hankel matrix becomes singular. For a positive definite self-adjoint matrix A and with z0 - y0, it turns out that Yn - Xn and consequently H,., - x H x n is self-adjoint positive semi-definite. Thus det Hn - 0 if and only if n _> rank H - n~ + 1 - n~ + 1 where n~ + 1 is the maximal dimension of the Krylov spaces ,u - Yn. Thus in this case, at the moment of a breakdown, we always reached the rank of the Hankel matrix and at the same time we reached a maximal Krylov subspace. Note that reaching the rank of the Hankel matrix means t h a t we know its symbol exactly. This does not mean though that we know everything about A; only the information about A that is caught in the symbol f ( z ) - ~ k f k z - k where fk - y H A k - l z o , k - 1 , 2 , 3 , . . . will be available. How much of A is kept in f will be explained below. Since we are after these invariant subspace, this case is not really a breakdown, but the algorithm just comes to an end because everything has been computed. However, when A is not self-adjoint or when x0 ~ y0, it may happen that the algorithm breaks down before r a n k H is reached. This is a situation that can be remedied by a nongeneric Lanczos algorithm. Because the generic algorithm breaks down but the situation can be saved by the nongeneric algorithm, it is said that we have a "curable" breakdown. This shortcoming of the generic version explains why for some time the Lanczos algorithm has not been very popular, except for positive definite matrices where a breakdown theoretically never happens except at the end of the computations. So we can always continue the computations until we reach the rank of H. Of course we then always have a breakdown. Either we have then reached an invariant subspace, i.e., a maximal Krylov subspace, or not. The first case is what we wanted and the algorithm stops normally as in the positive definite case. However in the second case, we have an "incurable breakdown". T h a t is a breakdown when the rank of H is reached before a maximal Krylov subspace is obtained. Note that rank H < m i n { n ~ , n ~ ) + 1 is perfectly possible also for a nonsingular matrix A. E x a m p l e 3.1 As a simple example consider zo - [ 1 0] T, yo - [0 1]T and A the 2 by 2 identity matrix. Clearly n . - nu - 0 while H - 0 and thus rankH-0 < 1.
3.4. THE EUCLIDEAN LANCZOS ALGORITHM
107
A more detailed analysis of breakdown will be given in Section 3.5. First let us show that the extended Euclidean algorithm also works in the nongeneric case, i.e. the case where some of the ak can be larger than 1 and that it really heals the curable breakdown. A most important step in the design of a nongeneric Lanczos algorithm was made in 1985 when Parlett, Taylor and Liu proposed the Look-Ahead Lanczos (LAL) algorithm [201]. They propose to anticipate the case where rk-1 = 0. If it occurs that 9H_l~k_l -- 0, then they look for the number 9H_IA~k_I and when this is nonzero they can compute both pairs 9k, yk+l and ~k, ~k+l simultaneously. The price paid is that Tn is not quite tridiagonal, but it has some bump or bulge where a vanishing rk occurs. For numerical reasons, it is best to do such a block pivoting every time some rk is not zero but small. This is precisely what Parlett et al. propose. Computing couples of vectors simultaneously may not be sufficient because if the 2 by 2 block
k-1
^H A Yk-I
][
~k-1
A~k_~
]
is still singular, one should take a 3 by 3 block etc. If we go through the previous train of ideas again, but leave out the assumption that ai = 1 for all i, then we see that the Euclidean algorithm does precisely something like the LAL algorithm. We should stress however that the correspondence is purely algebraic. There are no numerical claims whatsoever. The Euclidean algorithm has no sound numerical basis, but at least in theory it gives a reliable algorithm. We start as before from a matrix A E F (m+l)• and some initial vectors x0, y0 C 1~+1, to define the Krylov sequences
zk--Akzo
and
yk--(AH)kyo
for
k-0,1,...
as wel as the corresponding spaces ~'k and Yk. Defining the elements fk+l
--
we can associate with f ( z ) -
Hk,k
-
yHAkxo,
k - O, 1 , 2 , . . . ,
Ek>l fk z-k the Hankel matrices
H k , k ( f ) - [Y0 Yl.-.yk]H[xo
Xl...Xk]
-
ykHXk
Let ~k and ak be the Kronecker and Euchdean indices respectively which are associated with the sequence {fk}. It may happen that 9H~k -- 0 without having reached an invariant subspace, thus k < min{nz, nu} , with nx + 1 and n u + 1 the dimensions of the maximal Krylov subspaces. Then
CHAPTER 3. LANCZOS ALGORITHM
108
we have to deal with some ak > 1 and the simple recursion of the generic case does not hold anymore. The solution to this problem is the following. We generate the xk and ~)k in clusters of ak elements. To describe a general step computing such a cluster, it is useful to renumber the vectors, so t h a t the grouping into clusters is reflected in the notation. We shah use the following notation to indicate the first vector of the kth cluster : z0,k-z~ k-ao,k(A)zo,
k-0,1,...
where we define '~0 - a0 - 0. Similar notations are also used for other vector sequences like xk, yk, yk etc. We also use the abbreviation we used in Section 2.2 to denote Hk - H~k-l,,~k-l(f ). Suppose t h a t we have computed the vectors xo,i-
ao,i(A)xo
~/o,i- [ao,i(A)]Hyo,
and
for
i-- 0,1,...,k-
1
and t h a t ~H 0,k_l$0,k_l -- 0. The generic version described above breaks down at this stage since we have to divide by this number. This is the place where a new cluster of more t h a n one element starts. The number of vectors in the cluster is defined by the smallest number i such t h a t the Hankel m a t r i x [Y0,k-11A H~)o,k-1 [...
I(AH)i-1Yo,k-1 ]H [z0,k-11Az0,k-1 I . . .
IAi-1 z0,k-1 ] (3.8) is nonsingular. The first nonsingular such m a t r i x is precisely D k - l , k - 1 which we need in the c o m p u t a t i o n of ak. The ( i , j ) t h d e m e n t in this m a t r i x is
-
yH[ao,k-1 (A)]A i+j-2 [ao,k-l(A)]xo
__
[zi+j-2
=
d2,~k_t+i+j-1.
^H Ai+J-2 z0,k-1 Y0,k-1
a0,/~-l] T Hk[a0,k-1]
As we know from Section 2.2, the first nonzero element in the sequence d2~k_~+k, k - 0, 1 , . . . is T k - 1 -- d 2 ~ k _ l +otk -- Y^H o , k - 1 A~k-l& 0 , k - 1 .
So the first i for which (3.8)is nonsingular is i - ak since the Hankel m a t r i x (3.8) then takes the form 0 ......
0
d2~h_~ +ak
9
Dk-l,k-1 --
,
" 0 d2~k_t+ctk
" 999
d2~k-1
, with d2,~_~+~k = ~k-1.
(3.9)
3.4.
THE E U C L I D E A N L A N C Z O S A L G O R I T H M
109
We then choose e.g. z~k-~+, - Aiz0,k-1 and y,~j,_~+ifor i -
1,2,...,ak-
1. This defines the ( k -
(AH)i~lo,k-1
1)st clusters
Xk-1 - [5:o,k-llA:/:o,k-l[... [A'~k-15:o,k-1] and
Yk-x - [
o,k-ll... [(AH) ak-1 Y0,k-1]
o,k-xlAn
so that s p a n { ) ( o , . . . , J(k-1}
span{:~o,..., x~k-1}
-
=
span{xo, . . . , Z~;k_l }
and -
=
span{~)o, 9 9 ~)~k-1 } span{yo,...,y,~k_l}
while
g/H~j _ ~i,jDi,j;
O _ ~n+l - 1. The nonsingular submatrices H~,,~ of rank v + 1 are found for v + 1 equal to a Kronecker index, i.e. v + 1 - ~1, ~ 2 , . . . , ~n+l. These are the matrices t h a t we denoted as Hk, k - 0 , . . . , n . -
-
C H A P T E R 3. L A N C Z O S A L G O R I T H M
116
We also h a d the Krylov matrices (A C F (m+l)x(m+l))
X , . , - [xo[Axol... fA~'xo] and Y ~ , - [YolAHyol ... I(AH)"yo] and the Krylov spaces 2'~ - r a n g e X ~ - s p a n { z k - Akzo 9 k - 0 , . . . , v } y~ - range Y~ - span{yk - (AH)kyo " k - 0 , . . . , u} where dim X~ - r a n k X~ _< r a n k Xn~ - r a n k X o o - nx + 1 dim Y~ - r a n k Y~ < r a n k Yn~ - r a n k Yoo - nu + 1. Because H -- Yoo H Xoo, we clearly have ~n+l - r a n k H < m i n { n ~ , n u } + 1 < m + 1. We s t a r t by a simple l e m m a (see e.g. [87, w 1.2]). L e m m a 3.3 The infinite Hankel matrix H has rank a if and only if its first a columns (rows) are linearly independent while the first a + 1 columns (rows) are linearly dependent. Proof.
If hk, k - 0, 1 , . . . represent the columns of H , t h e n hi - Z - l h 0 , . . . , h k
- Z -kh0,.. .
where Z - 1 represents the u p w a r d shift. If a is an integer such t h a t
aoho + al hl + . . . + a~h~ - O,
ao, ~ O,
t h e n the columns h 0 , . . . , h~ are linearly d e p e n d e n t . s t a n t s such t h a t h~ can be w r i t t e n as
h~ - ~
Thus there exist con-
bihi.
i=0
We can plug this into c~-I
h,~+l
-
Z - l h`~
- ~
bzhz+l.
i=0
This shows t h a t also h~+l is a linear c o m b i n a t i o n of the first a columns. By i n d u c t i o n we find t h a t all the other columns are also in this span, which
3.5. B R E A K D O W N
117
proves that there can be no more than a hnearly independent columns. Thus the rank can be at most a. If a is the smallest such number, then it is equal to the rank. By symmetry considerations, this holds also for the rows. D Note the similarity with the result of Lemma 3.1 which we showed earlier for Krylov matrices. Any function of the form f ( z ) - y H ( z I - A)-lZo with A finite dimensional will be rational, and conversely, for any rational f(z), there exists a triple (A, zo, yo) so that the previous relation holds. By the Sylvester determinant identity formula, it holds that
f ( z ) - yoH(zI- A) -1 X 0
yo --
dj( - A) o d e t ( z I - A)
Here a d j ( z I - A) represents the adjugate matrix containing the cofactors. Hence, it follows that the poles of f ( z ) will be zeros of the denominator, which are clearly eigenvalues of A. However, the converse is not true : it may well be that zeros of numerator and denominator cancel. In that case, it is possible to write f(z) as f(z) - ( y ~ ) g ( z I - A')-lZ~o with the size of A ~ smaller than the size of A. We give in this connection the following definitions. D e f i n i t i o n 3.2 ( ( m i n i m a l ) r e a l i z a t i o n ) If f ( z ) is a rational function and B, C C F (m+~)x~ and A C y,,,• such that f ( z ) - C H ( z I - A)-~B. Then we call the triple (A, B, C) a realization of f(z). If m is the smallest pos-
sible dimension for which a realization ezists, then the realization is called minimal. D e f i n i t i o n 3.3 ( E q u i v a l e n t r e a l i z a t i o n s ) If (A, B, C) and ( ft, B, C) are
two realization triples of the same size, then they are called equivalent if they represent the same function f(z) - C H ( z I - A ) - I B - O H ( z I - /~)-1~. The following is true. Two realizations related by C F H c , [~ - F - 1 B and A - F - 1 A F with F some invertible transformation matrix are equivalent and if the realizations are minimal the converse also holds [165, Theorem 2.4-7]. -
The dimension of A in a minimal realization (A, B, C) of f ( z ) is called the McMillan degree of f ( z ) and also of the sequence {fk} if f ( z ) - ~ = 1 fk z-k.
D e f i n i t i o n 3.4 ( M c M i l l a n d e g r e e )
C H A P T E R 3. LANCZOS A L G O R I T H M
118
Clearly if f ( z ) = N ( z ) / D ( z ) w i t h N ( z ) and D(z) coprime, then the McMillan degree of f is also equal to deg D. We can now give the Kronecker theorem.
Theorem 3.4 (Kronecker) The rank of the Hankel matrix H ( f ) is equal to the McMillan degree of f. P r o o f . The entries of the Hankel matrix are the coefficients fk in the asymptotic expansion
f(z)-
f l z -1 q- f2z -2 Jr f3z -3 Jr...
If f has McMillan degree a, then f has a coprime representation f ( z ) = g ( z ) / D ( z ) with degree of D(z) equal to c~. It then follows from equating the coefficients of the negative powers of z in the equality
f(z)D(z) = N(z) that there exists a nontrivial linear combination of the columns h0, h i , . . . , h~ in the Hankel matrix which gives zero. Thus the rank can not be larger t h a n a by Lemma 3.3. On the other hand, the rank can not be smaller than a, say a ~ < a because then
hoqo + hlql + ' " + h~,q~, = 0 would mean that there exists a relation of the form
f ( z ) D ' ( z ) = N'(z) or equivalently f ( z ) = N ' ( z ) / D ' ( z ) with D' and N ' polynomials and with the degree of D ~ less than a. This is impossible because f = N / D was irreducible. D We shall now describe how one can, at least in principle, find a minimal realization triple for some rational f(z). We consider the space 8 = F m+l and the Krylov matrices Xoo and Yoo whose columns are in 8. The space is called the state space for the realization triple (A, z0, Y0). On the other hand we have the space
Fr~- { a - (fo, f l , . . . ) ' f k
e F}
of one sided infinite sequences of elements in F where only a finite number of f~ are nonzero. We can see the infinite Hankel matrix H - H ( f ) - Yoo H Xoo as an operator on the space F N. Some input sequence u C F N is mapped onto
3.5.
119
BREAKDOWN
some output sequence w E F ~ by first mapping it to a state s,," u ~ s,, = X o o u E 8, and then mapping this state to the output w" su ~
w -
yHsu.
The subspace ,~, - range Xoo is the part in the state space that can be reached by applying all possible inputs. The subspace S ~ - ker y H will be mapped to zero by y H and hence can not be observed in the output. The subscripts r and o stand for r e a c h a b l e and o b s e r v a b l e respectively. When they have an upperbar, this stands for the negation of that. These terms originate from linear system theory and we refer to Chapter 6 for further explanation. The state space can obviously be decomposed as 8 - range Xoo | ker X H - $r | S~ or as ,9 -
range
Yoo
| ker y H _ ,9~ | ,9~.
Recall that range Xoo ( r a n g e Y ~ ) i s the smallest A-invariant (AH-invariant) subspace containing x0 (Y0). On the other hand, k e r X H (kerYH ) is the largest AH-invariant (A-invariant) subspace which is orthogonal to x0 (y0). However, for our purposes, we want to split the state space as
a-&|174 where Sr - range Xoo and S ~ - ker Yoo H as above, but where S~- and 8o are subspaces complementary to Sr and S~ respectively but they need not be the orthogonal complements. So we define them to be complementary, i.e. dimS~|
m+l
-dimS~|
and S~N So -
{0}
-- 8r N 8~-
and such that we maximize the dimensions of 8ro- SonSr
and
S~--5- ,~nS~.
By taking intersections, we can define four complementary spaces which subdivide the state space. We set
~r-5 &o
We have obviously
-
Sr N , ~ -
range Xoo N ker y H
=
S r N So -- rangeXcr N So
-
S~N&-
=
&nSo.
$~NkerYH
C H A P T E R 3. LANCZOS A L G O R I T H M
120
L e m m a 3.5 With the definitions given above &
=
&
=
&
=
$
=
&o|
P r o o f . We first note that these four spaces S,~, S,o, 8 ~ , and S~o are complementary because the spaces So and S~ as we1 as $~ and S~ are complementary. Therefore, they can only have the zero vector in common. To prove the first relation, assume that v, is an arbitrary vector from 8r. Since ,~o C S,, there is a unique way of writing v, = V,o + v' with Vro E S,o and v ' in the complementary space S, & $~o. Obviously v ~ E S,. Thus it can have no components along S~o or along $ ~ since these are both subsets of S~-. This v ~ can not have a component along So because that component would be in Sro by definition, hence contained in V,o. Therefore v ~ is completely contained in S~. In other words, v ~ C S, N S~ = S,~. The next three relations follow for similar reasons and the last one is a direct consequence. El This construct is somewhat more complicated than the one in [200, p. 584] which uses the orthogonal complements. However the latter does not seem to be quite correct since each of the four spaces could have dimension zero, which can never give the complete state space of course. Also the proof of this result in [172, p. 23] is not completely correct. If we now choose a basis for these four subspaces, we can obtain a realization triple with a special structure, which is known as the Structure Theorem or the Canonical Decomposition Theorem. See for example [165,
p. 133]. T h e o r e m 3.6 ( S t r u c t u r e T h e o r e m ) Let (A, xo, yo) be a realization triple for f ( z ) - y H ( z I - A)-lXo. Let the subspaces ,,,era, Sro, S ~ and S~o be as described above. By choosing bases for these subspaces, then clearly, with respect to this basis, f ( z ) can be realized by an equivalent triple (A, ~0, y0) which has the form
,ill 2~ --
,ila iila ,il4
0
A22
0 0
0 0
0
~/24
A33 A34 0 A44
-
-
F -1AF
(3.16)
3.5.
121
BREAKDOWN
for some invertible matrix F and
L Xo -
0
O)
- F-lxo
a n d ~1o -
-
(3.17)
FSyo.
o
P r o o f . This follows immediately from the fact that the columns of F represent the basis vectors of the subspaces. Using the invariant nature of these spaces, we get the zeros at the correct places. First, since Sr~ - range Xoo N ker y H is A-invariant as the intersection of two A-invariant subspaces, we get the three zeros in the first column of A. Because also range Xoo - ,S,. - ,S,.-6 | S,.o is A-invariant, we have the block of four zeros in the left lower corner. Furthermore, a vector in the third subspace S ~ Sv N S~ is part of ,.q~- ker y g which is A-invariant and thus its image under A will be in the same space Sz = S ~ | S ~ . This explains the two zeros in the third column of A. The format of the vectors :Co and Y0 can be seen as follows. The vector x0 is in range Xoo - ,S,.-6 | ,9,.~ Thus, it has no components in the complementary space, which gives the two b o t t o m zeros in x0. The vector y0 belongs to range Yoo _L ker y g _ 8~. Thus it can have no components along the subspaces in S~. Since both S ~ and S ~ are subspaces of S~, we get the two zeros as in Y0. [7 Note that some of the blocks in this theorem could be empty. We give an example. E x a m p l e 3.3 Assume
AThen Xoo-
0
2
,
x0
-
[11 0
[11...] 0
0
Thus Sr-rangeXoo-Span{
'
Y0 -
[
--.
[1] 0
}
1-1] 11 ] 1
1
"
1
-..
[1]
and
By our definition of the complementary subspaces, we have 1
}
and
"
So-span{
0
}"
C H A P T E R 3. LANCZOS A L G O R I T H M
122 Note that
s an [01 1
}
and
[ ]
9
S~ - r a n g e Y o o - s p a n {
-1
1 }"
Then we define the following subspaces by taking intersections 0
}'
S~o-span{
[,]~ },
s~-~pan{
S~o-{
0
}'
[0]
o }.
Note however that
S, NS~- Sr nS~ - S~ NS~- S~ NS~ -- {0}. Taking the basis vectors of the decomposition
S = S~z | S~o | S ~ | S~o as the colums of F: F
we find after applying the equivalence transformation on the triple (A, x0, yo) that -
1
0
A-
0
2
"1
,
~0-
,
y0-
O
"
This is indeed as predicted by the Structure Theorem with empty rows and columns in the first and fourth position. The Structure Theorem leads to the following corollary. C o r o l l a r y 8.7 Let (A, x0, y0) be a realization for f ( z ) - y H ( z I - A ) - l x o and let (A, Z.o, ~1o) be an equivalent realization as described in the Structure Theorem. Then
f ( z ) - y H ( z I - A)-~Zo - [ ~ 0 2 ] n ( z I - fi~22)-1~02 .
(3.~8)
Note that A22 - P~o Al~o where A]~orepresents the restriction of the operator A to the subspace S~o and P~o the projection on this space. The size of the
3.5. B R E A K D O W N
123
matrix, 7t22, which is the same as dim &o, is the McMillan degree of f ( z ) . Thus dim &o - deg f ( z ) - rank H. In other words, the second realization of f(z) is minimal. It also holds that the grade of ~2 with respect to ~t22 and the grade of f/2 with respect to ft H are both equal to the rank of H. P r o o f . That (3.18) is true can be simply verified by filling in the transformed quantities. That it is indeed the minimal size, follows from the following considerations. We have the following mappings.
The rank of H Y~H X ~ which is the dimension of its range will be given by the number of linearly independent vectors that can get through from left to right. Since all the vectors which are in the kernel of y H are annihilated by y H , we should look in the range of Xoo for the vectors that are not in the kernel of y H . These are precisely the vectors in S~o with dimension equal to the size of A22. Since by Kronecker's theorem, the rank of a Hankel matrix H ( f ) is equal to the McMillan degree of its symbol f ( z ) , it follows that there is no matrix smaller than A22 to represent f ( z ) . All the poles of f ( z ) are given as the zeros of the determinant d e t ( z I - A22). There is no cancellation of poles and zeros. The statement about the grades can be seen as follows. Let )~cr and Yoo be the grylov matrices that are generated by the pairs (-/]22, Xo 2) and (/]H,~02) respectively, then H - l:H)(oo and therefore rank H is at most equal to these grades. On the other hand the grades can never be larger than the size of the matrix A22. Since the size of the latter is precisely equal to the rank of H, the assertion follows. D -
The ideal situation is when the Lanczos process can continue until the very end. That is when the following quantities are all the same 1. The McMillan degree of f ( z ) - the rank of the Hankel matrix H ( f ) N;n+l
2. The grade of x0 with respect to A 9 n~ -t- 1 3. The grade of Y0 with respect to A H" n~ + 1 4. The size of the matrix A 9 m -t- 1. In that case, we have :ff(nlAXm - Tn because ]:HJ(rn -- Dn, and thus ~ H _ Dn~-gl. Therefore A and Tn are similar matrices and as such have
CHAPTER 3. LANCZOS ALGORITHM
124
the same eigenstructure. The characteristic polynomials of A and of Tn are the same and the principal vectors of A can be easily transformed into the corresponding ones for Tn and conversely. This is more than we can hope for in general. Kecall t h a t the inequalities n,,+l _< min{nx, ny} + 1 k - k(i) are supposed to be zero. This system can always be solved by choosing ~i and rli equal to zero for k < i --
O~
i-O
~
1~.. . ~ v - 1
.
Of course a similar definition holds for the left polynomials. Note that a true orthogonal polynomial is not exactly an orthogonal polynomial as in the nondegenerate case, because a true orthogonal polynomial of degree v is orthogonal to all polynomials of lower degree, but it can be orthogonal to z ~ as well. From what we have discussed before, it will be clear that we have the following property. T h e o r e m 4.1 The first polynomial ao,k of the kth block as generated by the Block_two_s• algorithm is a true right orthogonal polynomial.
If it is chosen to be monic, then it is uniquely defined. This holds for any k = O, 1 , . . . and for left as well as for right polynomials.
4.2.
ORTHOGONAL
145
POLYNOMIALS
For this reason, we shall occasionally refer to the first polynomial in a block as the TOP of that block. The remaining (inner) polynomials of the kth block are not uniquely defined. Some of them are true right or left orthogonal polynomials, but others are not. Note that the first polynomial in the block is a monic true orthogonal polynomial and all the polynomials in a block are orthogonal to all the polynomials that are in the blocks that precede the kth but not necessarily to polynomials in the same block. Since block orthogonahty still means monic polynomials of precise degree, it means that we have to stick to unit upper triangular An and Bn and that we should keep the block diagonal structure of (4.2). Consequently the freedom left in the choice of the polynomials in one block is that we can right multiply each block with a unit upper triangular matrix. Thus we may replace ak and bk by akUk,k and bkVk,k respectively where Uk,k and Vk,k are both unit upper triangular matrices. This describes all possible sets of block orthogonal polynomials. Now, because the block an is right orthogonal to all bk, for k = 0, 1 , . . . , n - 1, it follows that it is right orthogonal to all polynomials of degree ~t most ~,~ - 1, a sp~ce which is also spanned by the blocks ak, k = 0 , . . . , n - 1. Hence, the sequence of blocks ak is block right orthogonal to itself. In general however, since the moment matrix need not be symmetric, the block sequence ak need not be block left orthogonal to itself. Similarly, the sequence bk is left, but not necessarily right block orthogonal to itself. A possible choice of these upper triangular matrices Uk,k and Vk,k is to make the diagonal blocks in the unit upper triangular matrices An and Bn unit matrices This is obtained by choosing Uk,k -- A k,k -1 and Vk,k -- Bk. ~ where Ak,k and Bk,k are the ak+l • ak+l upper triangular blocks in the matrices 9
A.,k-
"
and B.,k -
Ak,k
" Bk,k
where Ai,k and Bi,k a r e ai+l x c~k+l and where A.,k and B.,k contain the stacking vectors of ak and bk. These block biorthogonal polynomials can be recursively computed as described in the next theorem. We first recall the following well known property.
Lemma 4.2 (Schur complement determinant formula) square matrix, subdivided as M
M0 M10
M01 ] Mll
Let M
be a
C H A P T E R 4. O R T H O G O N A L P O L Y N O M I A L S
146
with Mo nonsingular and M(o) - M l l - M l o M o 1Mol the Schur complement of Mo in M, then det M - det M0 det M(0 ) . The l e m m a gives an expression for the determinant of a Schur complement. T h e o r e m 4.3 Suppose Mn-1 and Mn are two successive nonsingular leading principal submatrices of the moment matrix M. Their size is 'on and 'r - 'on + an+l respectively. Let Bn and An be the upper triangular matrices containing the stacking vectors of the left/right block orthogonal polynomials for the moment matrix Mn where we suppose that the diagonal blocks in these matrices are equal to unit matrices. Dn is the block diagonal matrix Dn - B H MnAn. The matrices Bn-1, An_l and Dn-1 are defined similarly, containing one block less. Subdivide these matrices as follows.
[ nl n]
Mn -
Mn,.
mn
'
On [Oni 0 ] 0
An [An io An]i
Dn,n
'
[ nio n]i
Then the following relations hold A.,, - -M(~I 1M.,. and B.,n - -M(~ H M H
(4.3)
Dn,n - M(n) - mn - Mn,.M~: 1M.,n
(4.4)
- -
_ _
n~
9
and
is the Schur complement of Mn-1 in Mn. Furthermore, the determinants of Mn and M,~-I are related by det Mn - det
Mn-1 det Dn,n;
det Dn,n - det M(n).
(4.5)
These determinant relations hold for any set of monic block orthogonal polynomials. P r o o f . The equalities (4.3) and (4.4) follow immediately from the fact t h a t B H M n A n - Dn. The equalities (4.5) follow because the determinants of An and B H are equal to 1, so that by induction det Mn - det Dn - det
Dn-1 det D,~,n - det Mn-1 det Dn,n.
The expression for det Dn,n follows by the Schur complement determinant formula. [:]
4.2.
ORTHOGONAL POLYNOMIALS
147
We have seen that there is a considerable degree of freedom in choosing the polynomials in a block. One way of making them unique is by requiring that they are monic in a matrix sense, i.e., we could make the diagonal blocks in the matrices An and Bn unit matrices. Another way to reduce this degree of freedom for the intermediate polynomials is by requiring that each block Dk,k in the block diagonal Dn is a permuted diagonal, or, if we do not force the polynomials to be monic, we can normalize them to make the blocks Dk,k equal to permutation matrices. So we may introduce the following definition. D e f i n i t i o n 4.10 ( q u a s i - o r t h o g o n a l ) Suppose that the leading principal submatriz Mn of the moment matrix M is nonsingular and has size v + 1 = an+l. We call the set {bi, ai)~=o a set of quasi-orthogonal left/right polynomials if it is block orthogonal and if moreover each polynomial at (tck o
The Schur algorithm computes the hk and gk recursively to get a block LDU factorization of M. We have proved in Section 2.6 that in the case
4.5. SCHUR ALGORITHM
157
where M is Hankel, this is equivalent with the Euclidean algorithm, for which it was shown that it also gave a factored form of the successive Schur complements. Schur developed the algorithm originally for positive definite Toeplitz matrices [215]. We show that the same objectives can be reached in the case of a general matrix. We describe one step of the algorithm, which can be chosen to be the first one, without loss of generality. So let M be as above with Mo of size a l . Suppose we have a factorization
M-HHG
with
H-
E
Hoo Hlo
Hol Hll
]
'
G -
[coocol] Glo
Gll
Both Hoo and Goo are of size al. In Section 4.7 on Krylov subspace methods, it will be shown that such a factorization is naturally available. Since
Mo - [HH HH][G H GH] H is nonsingular, both [H H H HI and [C H G H] have full rank c~1 and thus it is possible by applying row permutations on G and corresponding column permutations on H H to make Goo nonsingular. It turns out that then automatically also Hoo will be nonsingular. Indeed, we can write M = H H G - HHQ-1QG where Q is such that
[ oo 1o I 0
(~1
while
Q
1
-
HH
/:/H
with Goo nonsingular and of size al. Since it follows from this form that Mo HooGoo H and because Mo and Goo are nonsingular, also Hoo is nonsingular. Thus, we can assume without loss of generality that Goo and Hoo are nonsingular. In that case, we can define -
~--GlOGoo 1
and
c ~ - - H l o H ~ 1.
Set further S = I + o~H~ and T = I + ~ H . because s - • + ~~
Both S and T are nonsingular
~ -1 - n J M o V o o - ~ + ~ o ~ HloGloGoo
~
Thus S is nonsingular. By Lemma 4.2, also T will be nonsingular because det
ii ~
I
-det(I+~a
H)-•
[i
-a H
I
"
C H A P T E R 4. O R T H O G O N A L P O L Y N O M I A L S
158 Define
8G ----
[s10
,
and -
I
0
T~'
] j
where eoLSR = S and TLTR = T are factorizations of S and T. 8H8C = I. It is easily checked that G -- eGG --
[
V00 0
]
V01 Vii
and
I-I H
--
H HOH - -
Then
H
[
0
~[11
]
and hence
M-
HHG-
HHeHeGG-
.[-IHG.
~ o o - Mo. Introducing the following notation Note H~oHo G
~(=)- ~x- [ ~OOo
(~ol
-0
0
go(=)]
='~'c1(=)
and similarly for ~r(y), we obtain
M(:r,,y)
-
[-I(y)Hc_,(x)
-
=
ho(y)HMolgo(x)+
[ h0,, 1 [ o1 0] [ ,0,., ] ya~Hi(y)
0
I
=~'GI(=)
(Yx)a'HI(y)HGI(x)
or equivalently
H~(y)HC,~(=)-
(yx)-o,, [M(z,
y) - h o ( y ) H M o l g o ( x ) ]
-
M(X, y).
The Schur algorithm, of which we have just described the first step, repeats this and computes successive Schur complements in factored form. The a and /3 matrices are generalizations of the reflection coefficients in the classical Schur algorithm. Our construction of 8c and 8H was given so as to maintain as much s y m m e t r y as possible. If we do not care about this symmetry, we could produce the zero blocks in the factors H and G one after the other. This could be seen as a Viscovatoff-like version of this Schur algorithm, in the sense t h a t it is a two-step algorithm like the Viscovatoff algorithm is a twostep staircase computation of a diagonal step, eliminating the two nonzero
4.6. R A T I O N A L A P P R O X I M A T I O N
159
residual blocks one by one. For example, we could set a before and generate
-HloH~o 1
as
HHG _ (HHO,H)(O,GG) dr H,HG,,
,,ode,[i
0
o.][i I
0
I
-I.
This makes HI0 - 0 The next step is to choose/3 - -G10G00' -1 and
H H G _ (H,HO~)(O~G) dr H,,HG,,,
o o de,[ _filli Thi~ m , k e ~ Ci'o -
0 ,nd
k~p~
0][1 0]_i
I
HI~ -
flH
O.
I
The desired effect has been
obtained in two steps.
4.6
Rational approximation
In the first chapter we have seen that the polynomials a0,n appeared as -1k which were approximants denominators in the rational functions co,kao, for the formal series f C F. Also these approximation results can be generalized to a certain extent. The formal series approximated here will be oo
/(z)- ~=o ~ zk+~ ,0,~ e F.
(4.10)
The rational forms which approximate can be given in the matrix notation as
fn(z)-
e T D n ( z I - Tn) -le0, with e0T - [ 1 0 0 999 0].
(4.11)
The matrix Tn is the block upper Hessenberg matrix and Dn the block diagonal matrix associated with the moment matrix M as in the previous sections. The approximating property is given in the following theorem. Theorem
4.13 Let f ( z ) and fn(z) be as defined in (4.10) and (~.11) above.
As before the index n refers to the fact that the dimension of the matrices is y + 1 - '~n+l where the '~k are the block indices for the moment matrix M. Then it holds that deg ( f ( z ) -
fn(z)) < - ( u + 2).
CHAPTER
160
4. O R T H O G O N A L
POLYNOMIALS
P r o o f . First note that f,.,(z) has the formal Laurent series expansion
Then we use the relation A~.IFnAn - T,., of Theorem 4.9 where Fn is the companion matrix for the right orthogonal polynomial a0,n+l. This gives us
T k _ A n 1FkA,.,. Since An is unit upper triangular, A,.,eo = eo. Thus f,.,(z) may be written as oo
f , . , ( z ) - eToD,.,A;I~"
Z-(k+l)
Fnk eo
k=O
By the form of the companion matrix we now get k e 0 --
ek fork-0,1,...,v -a_o,n+ 1 f o r k - u + l .
On the other hand
eT D,.,A~ 1 - eT B H M, ., -- eT M,., - [#0,o it0,1 "'" #o,~].
(4.12)
This leads us to the fact that f n ( z ) __ ~0,0_~_ ~0,1 ~O,v z --fi-+" " "H. zv+l.
eTo Mr,~,,.,+l . ZV+2 . .
eTo MnF,',a-o,,.,+l . . ZV+3.
(4 913)
By the relation (4.1), we know that the coefficient of the term in z -(~'+2) is equal to ~ 0 , v + l - This proves the theorem. [3 Note that M,.,F,., - M < so that the coefficient of the term in z -~'-3 in (4.13) is equal to -[#0,1 #0,2 "'" tt0,~,+l]a_0,,+l, which is in general not equal to ~/,0,v+2 9
Alternative determinant expressions for the rational function f n ( z ) can easily be obtained. C o r o l l a r y 4.14 The rational function f n ( z ) of the previous theorem can also be written as
fn(z)
-
e T D , . , ( z I - T,.,)-~eo
= =
eTD,.,(zDn-D,.,T,.,)-lD,.,eo eToD,.,A~l(zM,.,- M < ) - I B ~ H D , . , e o
=
eTMn(zM,.,-
M
endfor { here 6 - 2~k+1 } Solve Dk,ka_k+l = -d_k+x with l~k+t 1 D k,k -- [d2,~k+ p T q - 1 Jp,q= and ~+1 - [d6-~k+~+l... d6] T Set a k + l ( Z ) - z T [a__k+ T 1 1] T ao,k+l(z) - a k + l ( z ) a o , k ( z ) + ~k+lao,k-l(z)
endfor
172
C H A P T E R 4. O R T H O G O N A L P O L Y N O M I A L S
Then the following ezpression holds for the reproducing kernel K n ( x , y) -
a0,n+l(x)a0,n(y) -
where ~,~ - (z ~', ao,n} # 0 with v - 'r
ao,.+l ( y ) a o , n ( x )
- 1.
P r o o f . From the relation (2.34)
K , . , - M(, T -
M~
1 -
A,.,D~,IA T,
we get
g n ( x , y ) - x T M n l y -- A n ( x ) D ~ l A n ( y ) T. Let Tn be the (truncated) Jacobi matrix, An(x) - xTAn the row vector containing the first n + 1 blocks of orthogonal polynomials and F(a0,n+l) the Frobenius matrix as before. Then the relation F(ao,n+l)An - AnTn, or equivalently T x A n ( x ) - An(x)Tn + e~,ao,n+l(X), leads to
xKn(x, y) - A,~(x)T,~D~IAn(y) T + (eTD~lAn(y)T)ao,n+l(X). Because D~ 1 is block diagonal with upper triangular Hankel blocks, it is directly seen that e_ , ,
r~D..-1 __~,
~
[0
9
9
9
0~ ~;,~ 0
9
9
9
0]
Therefore
xKn(x, y) - An(x)TnD~,IAn(y) T +
ao,n+l(x)ao,n(y)
(4.21)
?'n
Note that TnD~ 1 - D~ 1JnD~, 1 with Jn the symmetric Jacobi matrix is itself symmetric and this means that expression (4.21) is completely symmetric in x and y, so that we can immediately write that also
yKn(z, y) - A,~(x)T,~D~IAn(y) T +
~0,~+l(y)~o,~(~)
(4.22)
T n
Subtracting (4.22) from (4.21) gives the Christoffel-Darboux relation for the non normal case. [::] This formula has as an immediate consequence
4.8. THE H A N K E L CASE
173
C o r o l l a r y 4.19 The inverse of a Hankel matrix is a quasi-Hankel matrix. P r o o f . The Christoffel-Darboux formula shows that the generating kernel for Kn - M,~ 1 has the form required for Kn to be a quasi-Hankel matrix. Just take g2(x) - ao,n+l(Z) and gl(x) - ao,n(z)/~n in the general definition 2.7. [3 Note that the polynomial gl(x) = ao,n(X)/~n = --U0,n+l(Z) where 'tt0,n+ 1 is the polynomial appearing in the (2,1) position of the matrix V0,n+l of the extended Euclidean algorithm when we adopt the monic normalization. G(0) # 0 since Kn is invertible. Also Lemma 2.13, which was proved independently, is a consequence of the Christoffel-Darboux formula. To see this, we define the Hankel matrix S k whose first row is given by
[ ~ " 0 dcL dot+l ... ] cLk--1 with a - 2 a ; k - 1 - } - a k . This means that S k is the Hankel matrix whose entries are the dk elements, computed in the algorithm Block_MGS, after deleting the first 2ak-1 elements. The first leading nonsingular matrix in S k is Dk-l,k-1. Now we recall the relation (2.13)
Dk_l,k_la_k -- --dk which gave the coefficients of the monic polynomial ak which appears in the recurrence relation a0,k(Z ) --
ao,k_2(Z)~ k -Jr-ao,k_l(Z)ak(z).
This relation says precisely that ak is the first true orthogonal polynomial (TOP element) in block 1 of the block orthogonal polynomials that can be associated with the Hankel matrix S k. The TOP of block 0 being 1 by definition, it follows that by applying the general Chistoffel-Darboux formula for the Hankel S k and for n - 0 that the formula
Dkll,k_l(X y ) _ ak(y) - ak(x) '
(y-
X)~k-1
of Lemma 2.13 holds. The reproducing kernel, which by definition can be written as n y)
-
-
k=0
174
C H A P T E R 4.
ORTHOGONAL POLYNOMIALS
reduces in the normal case (when all blocks Dk,k have size 1 and contain the nonzero element rk) to the familiar formula n
K.(x, y)-
~ k=0
ao,k(z)__ao,k(y) rk
In the non normal case, we can write this as above in terms of the blocks of orthogonal polynomials where ak(z)-
ao,k(z)[1
z
...
za],
a-
12k+ 1
-
-
1.
When we use the above expression for D k,k -1 (z, y), we get n
K n ( z , y) - ~
ao.k(z) ak+l(X)
--
ak+l(Y)
For ak+l(z) - z + "~k+l, this reduces to the previous formula. Another interesting corollary of the Christoffel-Darboux formula is an inversion formula for a Hankel matrix given in the next theorem. T h e o r e m 4.20 (Hankel inversion formula) W e use the same notation as in Theorem 4.18. Introduce the coefficients of the TOPs
=0,.(z)
-
-
k=0
k=0
where p,~. - ~,~,,+~ = 1 and let all pk - 0 f o r k > ~,~ and Ak - 0 for k > a , + l . Then K , - M(, 1 is the difference of two triangular ToeplitzHankel products" K . - ~I(TpH>, - T~Hp) with A1
Do
Pl T~
m
9
Po ,
,
9
Pl]
Pl
"'"
,~0 )~1 T~
HA-
9
~
.
."
~l)
""
/~0 ~
.~176
Hp-
9
P12 ~
0
~ ~
P12
)~o
1
9
"~
9
~
~,
1
Po
l
.
"'"
0
""
1
4.8. THE HANKEL CASE
175
P r o o f . We use the formal expansion 1
x-y
-+
y
+
y2
+...
to get
~.g.(~
y)- ~ k=l
oo
~o,.+~(~) ao,n(Y)yk-1-- E ao'n(X) ~ao,n+l(y)y xk xk
k-1
k=l
The first sum can be written as
~+12 aT
Z2
O,n+l aT O,n+l
YT[ao,,,IZao,nlZ 2~o,nl 9 ]
Z3
where Z is the finite dimensional shift matrix of size gn+l and Z the shift matrix of size ~n+l + 1. Note that since they are finite dimensional these matrices cut the elements that are shifted out and thus Z k - 0 for k _ ~n+l. Furthermore, noting that ,,,r~ - [ 9 zk~o
"""
0. po
"" "
p~-k] r
k
and
ar ~ . . 2 k - [ ~ k n,n_t.
1
9 .
.
:~ 1 ~0 .. 0j .
~r
k
This means that the contribution of the first sum to the matrix Kn is indeed the product TpH~. Similarly, it is shown that the second sum contributes the product T~Hp. D This formula was given by Lander [179] and Heinig [133]. Also in the book of Heinig and Rost [144, p. 32], this and other inversion formulas for Hankel matrices can be found. D e f i n i t i o n 4.13 Define for the polynomials 12
12
~(z)- ~ p~z~ an~ b(z)- ~ ~ z ~ k=O
k=O
the bilinear form
B(~, y) - ~(~)b(y) - ~(y)b(~) x--y
176
C H A P T E R 4. O R T H O G O N A L P O L Y N O M I A L S
This B ( z , y) is called the Bezoutiant of the polynomials a and b. The matrix B whose generator is given by B ( x , y ) is called the Bezoutian of the polynomials a and b. D e f i n i t i o n 4.14 ( S y l v e s t e r m a t r i x ) Ira(z) and b(z) are two formal power series O0 O0
a(z) -- ~
pkz k - zTa
and
b(z)- ~
k=O
)tkz k - zTb
k=O
then the matrix (possibly cut after a finite number of rows} [z f l - l a
I "'"
Ia
I Z'~-ab I
"'"
I b]
is called a Sylvester matrix. In the special case where a and b are polynomials, of degree a and fl respectively,
~(~)-
~
p~z ~ - ~
~nd
b(z)- ~
k=O
~~
- gb
k=O
one may construct their Sylvester matrix of size a + /9O
t90
)tO
P~
P~ where the first block column has fl copies of a and the second one contains a copies of b. Its determinant is the resultant for the system of equations
{
~(~)- 0 b(~)- 0
Bezout [15] introduced these in 1764 to find a condition under which the polynomials have common zeros thus to find their greatest common divisor. The two polynomials are coprime when the determinant of their Sylvester matrix is nonzero. In search for an explicit expression for these determinants, Sylvester introduced the terminology Bezoutians and Bezoutiants in 1853 [221]. There is a vast literature on Bezoutians [59, 60, 90, 91, 114, 135, 138, 143, 144, 145, 146, 147, 153, 154, 204, 205, 209, 210, 211,212], mostly in the context of stability properties of polynomials. We shall return to this
4.8.
THE HANKEL CASE
177
later. The relation between Hankel matrices and Bezoutians is attributed to Hermite in [175]. In our case we find that the polynomials a0,n and a0,n+l are coprime by Theorem 1.22 and we have by the Frobenius identity det Vo,n+l - vo,n+1ao,n+l - uo,n+ 1CO,n+I -- 1 in the case of a monic normalization. U0,n+ 1 -- a o , n U n + l , this also means that
Because vo,n+l = co,,.,u,,,+l and
a o , n + l CO, n - - a o , n C O , n + 1 ~
-1 Un+ 1
which is also equal to -~n for a monic normalization. Thus we have shown that the inverse of a Hankel matrix of size u + 1 is a Bezoutian of two coprime polynomials, one of degree u + 1 and the other of lower degree. The converse is also true. Therefore we need the inverse form of the Euclidean algorithm, which was already discussed in Section 2.4. We have explained there that, given the monic TOPs a0,n and a0,n+l, all the previous monic TOPs can be reconstructed in a unique way, and thus that the Hankel matrix Hn is uniquely defined by these two polynomials. Of course, the polynomials being monic is not essential, if they are not, we can normalize them first and apply the same procedure. Thus we have now also proved the converse in T h e o r e m 4.21 The inverse of a Hankel matrix of size u + 1 is a Bezoutian of two coprime polynomials a(z) and b(z) with u - deg a > deg b and conversely, the Bezoutian of two such polynomials is the inverse of a Hankel matrix of size u + 1. The procedure of reconstructing the Hankel data from the polynomials a0,k -1 and a0,k+l by dividing ao,kao,k+l and splitting in polynomial part (ak+l) and a-1 strictly proper part/3k+1 o,ka0,k-1 (see Section 2.4) is nothing but the Euclidean algorithm , which is applied to the data [a0,n+l,-U0,n+l ]T 9 Indeed the direct Euclidean algorithm builds up the matrix V0,n+x = V1-"Vn+I from left to right; this inverse procedure takes out the factors in reverse order by multiplying from the right with Vn+l, -1 V~-1 etc. This is of course equivalent with building up the product V0,n+l from right to left. Indeed, taking -1 1 - I, gives [Uo,n+l ao,n+l]Vo,n+ 1 - 1 __-[0 1] the second line of Vo,n+lVo,n+ Since for a monic normalization, V0,n+ 1 - 1 ---~v'T0,,,+IG with ~_[01
-1]0
178
C H A P T E R 4. O R T H O G O N A L P O L Y N O M I A L S
as in (1.27), we can organize this as
[1]0 0n l[ o0nl ] Thus the matrix V0,n+i is rebuilt from right to left when the Euclidean algorithm in its left formulation is applied to the vector in the right-hand side. At the same time, it constructs a solution to the Diophantine equation
~(z)~(z)- ~(,)~(~)
-
(4.23)
i
where u(z) = uo,n+l(z) and a(z) = ao,n+l(z) are given polynomials with degu < dega, while v(z) = vo,n+l and c(z) = co,n+l(z) are polynomial solutions with degv < degu and degc < dega. Note that v and c are the numerators of the Pad~ approximants whose denominators are u and a. Obviously, in general, such an equation can only have a solution when u(z) and a(z) are coprime. In this context, the equation (4.23)is often called the Bezout equation. Our previous construction shows that the following theorem holds. T h e o r e m 4.22 The Bezout equation (4.23) with u(z) and a(z) given coprime polynomials with deg u < deg a has a unique polynomial solution v(z) and c(z) with deg v < deg u and deg c < deg a. These solutions are constructed by the Euclidean algorithm (with suitable normalization) applied to the data [a(z) - u(z)]. For further study of Bezoutians we refer to the literature cited above.
4.9
T o e p l i t z case
Let us now consider the case where the moment matrix is a Toeplitz matrix T - [#j_~]- [(z~, zJ>]. Before we introduce the orthogonal polynomials, let us first have a closer look at Toeplitz matrices and their generators. Since a Toeplitz matrix is completely defined in terms of its first row and its first column, we should have, as in the Hankel case a simplified description of the bivariate generating kernel T(x, y) - ygTx. Let us define the series oo
s+(~) - ~ k=O
oo
~_~
~nd
S-(~) - - ~ k-1
~z ~
4.9.
TOEPLITZ CASE
179
and the formal Laurent series O O
f(z)-f+(z)-f-(z - 1 ) -
~
#_kz k.
The series f is called the symbol of the Toeplitz matrix and it can be used to define the inner product. The inner product can indeed be described in terms of a fixed linear form f*, which is defined on the set of the Laurent polynomials (these are the formal Laurent series with only finitely many nonzero coefficients in both directions). The definition of this linear form is
f * ( z k) - I~k,
k - O,-Fl,:s
....
In other words, for any Laurent polynomial p, f*(p) will give the coefficient of z ~ in the formal product f ( z ) p ( z ) . The inner product is related to this linear form by
We introduced here the formal para-conjugation c, to mean the following" If ~ ( z ) - E ~kz k e F, then w~ ~ t ~ , ( z ) - E ~ k z -k. It i~ imm~diat~ly checked that - f ' ( z j - i ) - i ~ j _ , . 4.9.1
Quasi-Toeplitz
matrices
A Toeplitz matrix T has the displacement property
AT-
T-
ZTZ T
-
#0 ]1,-1
#1 #2
#-2 ,
#o
~--I ~-2
--
-
1
0 0
[01][0 1
0
-I
0
0
.
.
~
.
~
.
HH~G.
In terms of the generating kernel yHTx, this means (1-
xy)T(x, y ) -
[f+(~)
- 1]~[f-(r
- 1] T - f + ( ~ ) -
f-(x).
Conversely, any matrix whose generator satisfies such a relation with f - ( 0 ) 0 is a Woeplitz matrix with symbol f ( z ) - f + ( z ) - f - ( z - ~ ) .
180
CHAPTER
4.
ORTHOGONAL
POLYNOMIALS
Note that the choice of the G and H is not unique. Another choice could be for example
i f + ( y ) + ~I .f+ (Y) - -~
HH(y) -
and
a(~) -
[
f-(~) + ~ f - ( ~ ) -
Like the Hankel matrices, Toeplitz matrices can be embedded in a larger class of quasi-Toeplitz matrices. D e f i n i t i o n 4.15 ( q u a s i - T o e p l i t z m a t r i x )
We say that a matrix M is quasi-Toeplitz if it satisfies A M - M - Z M Z T - H H E G with G and H arbitrary matrices with 2 rows: G T - [gl g2], H T - [hi h2], such that G(O) 7~ 0 and H(O) # O with G ( x ) - Gx and H ( y ) - H y . Thus AM - M - ZMZ T - HHEG-
-h2gT - -hlg T.
This means M ( z , y) - y H M x -
H(y)Hr~G(x) I- zy
1- zy
The conditions G(0) r 0 and H(0) r 0 are again imposed to avoid that the first row or column of M are zero and that it does not have any nonsingular leading principal submatrix. The factorization of A M - M - Z M Z T - H H N G is not unique since any couple (OH, Or) of N-biorthogonal matrices, i.e., 0HNoG -- E, will give H(y)HEG(z)-
H(y)HOHEOcG(z)-
~r(y)HEl~(x)
with 1~ - OGG and H OH H. This freedom can be exploited to generate standard factors. We say that the factorization is in standard form if H(O) H - [ # o o -1] andG(O)-[0 -1] T. -
L e m m a 4.23 For a quasi-Toeplitz matrix M , we can always bring the factorization (1 - z ~ ) M ( z , y) - H ( y ) H E G ( x ) in standard form. P r o o f . Suppose H ( z ) - [hi(z) h2(z)] T and G ( z ) ously ttoo - gl(O)h2(O)- g2(O)hl(O). We define
[gl(Z) g2(z)] T. Obvi-
[ g2,o,gl,o, and oo [ g,o,u
4.9. TOEPLITZ CASE
181
then all the conditions /~HEOc-E,
H(o)Ht? H-I/zoo
-c],
OaG(O)-[O
- 1 ] T,
cCFo
are satisfied if -
-
o
Note that the determinant of this system is #oo. Thus it has a unique solution if/Zoo ~ O. This unique solution exists for any choice of c, so we can e.g., choose c = 1. This proves the l e m m a for #oo ~ O. If ttoo = O, then there is a nonzero c~ C Fo such t h a t [hl(0) h 2 ( 0 ) ] - [ g l ( 0 ) g 2 ( 0 ) ] c ' . If c is chosen to be this c ~, then the system has a solution. We can now make this c - 1 by dividing the second column of H H 8 H by c and multiplying the first row of 6~G by c. This proves the l e m m a for/zoo - 0. El A quasi-Toeplitz m a t r i x is equivalent with a Toeplitz matrix. More precisely 4 . 2 4 A matrix, M is quasi-Toeplitz iff there ez,ist upper triangular Toeplitz matrices RH and Rc such that T - R H M R c is a Toeplitz matrix.
Theorem
P r o o f . Suppose we write the generator of M with a s t a n d a r d factorization ( 1 - x~)M(:r,,y)- H ( y ) H E G ( z ) - [hi(y) h2(y)]F-,[gl(:r,) g2(x)] T with H(O) H - [/zoo - 1] and G ( 0 ) - [ 0 - 1]T. Let R c and RH be the upper triangular Toeplitz matrices whose first row is generated by 1/g2(z) and 1/h2(z)respectively. Then the generator for T - R H M R c is given by
T(z,y)
=
M(:r,, y)
=
gl(:r,)h2(y) - g2(:r,)hl(y)
f+(y)- f-(z) 1 - my with
f+(ff)_
hi(y)
and f-(x)-
which shows t h a t T is a Toeplitz matrix. The converse is trivial.
gl(x)
f-(O)-O
C H A P T E R 4.
182 4.9.2
ORTHOGONAL POLYNOMIALS
The inner product
Let us now see what properties for the inner product are implied by the moment matrix being Toeplitz. By definition, the inner product is only defined for the polynomials, thus (z ~, zJ) - #j_, is in principle only defined for i and j nonnegative integers. However, by setting
we may as well assume that the inner product is not only defined for polynomials, but that it is defined for Laurent polynomials as well. We shall freely use this property in the sequel. Let us first introduce the following definition, which plays an important role in the derivation of Levinson-like recursions. D e f i n i t i o n 4.16 ( r e c i p r o c a l ) Let p~,(z) e F~,[z] be a polynomial. we define its reciprocal as the following involution in F~,[z] 9 12
]2
k=0
k=0
Then
e
Note that the definition of the reciprocal depends on the index v and since a polynomial p belongs to F~,[z] for any v _ degp, the reciprocal is not well defined, unless the v is explicitly mentioned. However, in order not to complicate the notation, we assume that it should be clear from the context what this v is meant to be. The moment matrix T being Toeplitz implies that it is persymmetric. Persymmetry means that the matrix is symmetric with respect to the antidiagonal, which means that it is equal to its own reflection in its antidiagonal, thus that it satisfies I T ] - T T, where /~ is the usual reversal operator. This relation also holds for the finite sections of the moment matrix, i.e., if i denotes now a finite reversal operator, then it also holds that Tv,~, ] - T T Also the moment matrix does not change when we shift it up-left, which means that Z T T Z - T, where Z is the down shift operator. The latter relation holds for the infinite dimensional moment matrix, but for the finite it needs an adaptation in the sense that with Z now a finite dimensional shift, ZTT~,~Z - T~-1,~-1 | O. The latter two properties of the moment matrix are reflected in the properties of the inner product which are given under 1 in the next theorem. T h e o r e m 4.25 Suppose that the inner product, defined for the polynomials, has a m o m e n t matrix (with respect to the standard bases) which is Toeplitz.
4.9. TOEPLITZ CASE
183
1. Let p and q be polynomials in F~,[z], then ( p , q ) - and ( p , q ) - . 2. If p~,(z) e F~,[z] is right (left) orthogonal to z k, then p#v(z) e F~,[z] is left (right) orthogonal to z "-k . Proof. 1. For the first part we see that
(p, q)
_
pHT~,,~, q _ []; p]H[ ~ T~,,,,, .I ][ I q]
=
[.i p]H[T~,,~,]T[ .i q] -- [i q]TT~,,~,[ i p]
= q#HTt,,~,p#--. For the second equality we note that
(zp, zq)
-: = -
[0 pH]Tv+l.v+l[0 qT]T [pH o]ZTTv+I.~+IZ[qT 0]T [pg 0][T..~ | 0][q T 0] T pHT~,,~,q -- (p, q).
2. For the second property note that (zk,p(z)> - 0 implies that also
- 0 by part 1 of this theorem. El
4.9.3
Iohvidov indices
We introduce now indices which are closely related to the notion of characteristic of a Toeplitz matrix as given by Iohvidov [156, p. 106] or the one given by Heinig and Kost [144, p. 91]. In those definitions, the characteristic is a triple of integer numbers that can be associated with an arbitrary Toeplitz matrix. We shall associate with an infinite Toeplitz matrix T two sequences of couples of integers: a sequence of left couples and a sequence of right couples. They will fix the size of the sequence of nonsingular leading principal submatrices of T, hence the block sizes for the biorthogonal polynomials. We shall call them Iohvidov indices, although this might be different from what is meant by the same term in other publications.
C H A P T E R 4. O R T H O G O N A L P O L Y N O M I A L S
184
Let T - [#./_~] be a Toeptitz matrix whose nonsingular leading principal submatrices are Tk, k - 1, 2 , . . . of size tck. Let bo,k and ao,k be the uniquely defined TOPs of degree ~r associated with the moment matrix T. Define ak+ 1 to be the smallest integer a >_ O, such that
Definition
4.17 (Iohvidov
indices)
-[~ct+l
0 such that 0 and an+ 1 + - 0, then shifting of a0,n gives a simple update across a~+l classical blocks of size 1. We shah rearrange the right block orthogonal polynomials (where block has the classical meaning of being related to nonsingular leading principal submatrices of the moment matrix) in somewhat larger blocks. For example, the shifted KOPs of the previous lemma would typically belong to the same block of ROPs, even though they might (or might not) belong to different classical blocks of size 1. Thus we rearrange the ROPs in blocks defined in function of the computational procedure. To make the distinction, we call them right blocks (R-blocks) while the classical blocks related to the nonsingular leading principal submatrices of T are called T-blocks (T-blocks do not depend on the adjective left or right). A formal definition of the Rblocks is given below. The R-blocks are the same as the T-blocks, except that some of the l~-blocks may group a sequence of T-blocks of size 1. Our previous block structure is a refinement of the R-block structure: The boundaries of l~-blocks are boundaries of T-blocks, but the converse need not be true. There is a similar algorithm for the left orthogonal polynomials which will generate these as (in principle) yet another set of blocks, which will be called L-blocks. We shall develop the recursion for the right orthogonal polynomials, leaving the left dual as an exercise for the reader. We shall describe how to compute R-blocks of right orthogonal polynomials. The R-block sizes will be denoted by pk, k - 1, 2, .... We denote also P0,n - ~ = 1 Pk, which are the l~-block indices. The right Iohvidov indices corresponding to l~-block number n will be denoted by Pn+l• They form a subsequence of the Iohvidov indices a,~+l+ which correspond to the T-blocks. We are now ready to start the derivation of the row recursion. We first show how to start up the recursion. L e m m a 4.28 Let T = [#j_i] be a moment matrix and let ao = 1. The right Iohvidov indices for R-block 1 (and thus also for T-block 1) are p{--a{ pl+ - a
t
-
min{a 9 / Z a + l - {1, za+la0) # 0 } - o r d f + ( z ) -
--
min{a 9 # _ , - ( z " , a 0 ) # 0 } - o r d f - ( z ) .
1
1. If p+ - oo, then T is strictly upper triangular and there is no invertible leading principal submatrix. There is only one infinite T-block which is also an R-block. The right orthogonal polynomials do not have to satisfy any orthogonality condition and they are arbitrary of appropriate degree.
C H A P T E R 4.
188
ORTHOGONAL POLYNOMIALS
2. If Pl - oo, then we may choose ak(z) - z k, k - O, 1 , . . . which gives an infinite R-block, the only one there is. (a) If P+I - O, then T is lower triangular with nonzero diagonal. All the T-blocks have size 1. (b) If p+ > O, then T is strictly lower triangular. There is no invertible leading principal submatriz. There is only one infinite T-block which is an infinite R-block. The above choice for the right orthogonal polynomials still holds but any other set of polynomials of appropriate degree would do as well since there are no orthogonality conditions to be satisfied. 3. If v - p+ + p~ < oo then there is an R-block number 0 of size pl - v + 1 containing the right orthogonal polynomials ak(z) - z k, k - 0 , . . . , v. (a) T~,,~, is the first nonsingular leading principal submatriz of T for which a monic true right orthogonal polynomial a~,+l exists, which can not be chosen to be z ~'+1 . (b) If p+ - O, the R-block number 0 contains v T-blocks of size 1. (c) If P+I > O, then T~,,~, is the first nonsingular leading principal submatriz of T. It has the form 0
...
0
I~l+p~
...
9
#,, 9
9
0
,L/,l_l_p~
#_p~
0
9
9
9
,u_,,
9
~
""
~-o~
0
...
0
with #_p+ #l+p~- ~ 0. (d) The right orthogonal polynomial av+l, which is the first one of R-block number 1 can be computed by solving the system Tv,~,~,+l -- - [ # v + l -
z
+
"'" #I]T.
The system can be solved very
e.ffi-
ciently. P r o o f . We only check case 3. The fact t h a t the polynomials in B.-block 0 can be chosen to be the powers of z follows from L e m m a 4.27 if p+ - 0, and
4.9.
189
TOEPLITZ CASE
in the case pl+ > 0, this is a trivial observation since T~,~ is then the first nonsingular leading principal submatrix of T. For k < v, we can choose for ak any monic polynomial of degree k because it need not be orthogonal to any other polynomial of lower degree. We have chosen R-block 0 to be of size v + 1 because it is the first value of n for which the orthogonality requirements - 0 for k - 0 , . . . , n, lead to a unique nontrivial monic solution. Indeed, there is either a trivial solution z n+l (for n - 0 , . . . , p ~ - - 1) or no monic solution (for n - & , . . . , v 1). Since T~,~ is nonsingular, the system defining a~+l and reflecting the true right orthogonality of a~+l, is uniquely solvable. [:] !
Note t h a t we could have a situation with IZ0 ~ 0 and ~1 - 0. In that case p+ - 0 and p~- > 0. The submatrices T0,0 as well as T1,1 are nonsingular, while the polynomial al is obtained by shifting a0, i.e. both are considered to be in the same R-block number 0 of right orthogonal polynomials while T-block number 0 in the sense of earlier sections clearly has size 1. It was said in the previous l e m m a that if p~- - cr then an infinite 1%block of right orthogonal polynomials can be obtained by shifting the initial one. This observation is true in general. If T,,_I - T~,~ (v + 1 - P0,n) is nonsingular and the corresponding Iohvidov index Pn+l is infinite, then an infinite block of shifted right orthogonal polynomials exists. This fact is shown in the next lemma. L e m m a 4.29 Let T be a Toeplitz matrix and assume T~,,v is nonsingular. A s s u m e it defines the first n R-blocks, thus v + 1 = po,,~. The monic right block orthogonal polynomial a~,+l is a T O P and thus uniquely defined and therefore also the corresponding right Iohvidov indices Pn+ + 1 are defined. Let a~,+l (z), k O, 1 , . . . is a set of right P,~+I be infinite. Then a v + k + l ( Z ) - z k orthogonal polynomials. The R-block number n is infinite. If moreover pn+ 1 - O, then this R-block consists of infinitely m a n y Tblocks of size 1 9 If however Pn+l + > O, then there is no leading principal submatrix Tk,k that is invertible for some finite k > v, i.e. the T-block is also infinite. P r o o f . By L e m m a 4.27, we know that zka~,+l(z) is right orthogonal to all z i for i - 0 , . . . , v + k. We can express this in matrix terms as ([lz ...],[a0al
. . . a: l a~+l za~+l . . . ]> - T A - R
190
C H A P T E R 4.
ORTHOGONAL POLYNOMIALS
with T the infinite moment matrix and A the infinite unit upper triangular matrix containing the stacking vectors of the right orthogonal polynomials, as we have described them above. The matrix R will have the form R
R0o R10
0 ] Rll
with R00 of size v + 1, block lower triangular and invertible and Rll is lower triangular Toeplitz with zero diagonal element iff P++x > 0. Comparing the principal leading principal submatrices on the left and on the right, we see that all submatrices Tk,k are singular for k -> v + 1 if P,~+I + > 0 or they are all invertible if Pn+l + - O. [3 m
Note that the condition Pn+l - c~ is sufficient to have an infinite block of right orthogonal polynomials, but the extra condition Pn+l + > 0 is necessary to have an infinite block of singular submatrices. We can check this by considering the following example E x a m p l e 4.4 Consider the moments satisfying /z0 ~t 0 while #k - 0 for all k > 0. Then p~- - oo but p+ - 0 and all leading principal submatrices are invertible (all T-blocks have size 1) while there is an infinite It-block of right orthogonal polynomials which are just the powers of z.
The situation where P~,+I - oo clearly requires the simplest possible updating procedure. Another simple situation occurs when Pn+l + - oo. This situation is discussed in the next lemma and after that the general update will be given. Before that we have to define formally how we subdivide the right orthogonal polynomials into K-blocks, which, as we have repeatedly said, are not always the same blocks as the T-blocks corresponding to invertible leading principal submatrices. Deft n i t i o n 4.18 ( R - b l o c k s ) Let T be a m o m e n t matrix which is Toeplitz. We set by definition po,o - O. R-block number 0 has size pl - p{ + p+ + 1. Set v + 1 = pl, then if v < oa, the matrix T~,,~, is invertible. Hence its Iohvidov indices are defined. Denote them by p~. R-block number I of right orthogonal polynomials will have size p2 - p2 + p+ + 1. In general denote the cumulative R-block sizes as po,n - ~'~=1 pk. It will then be proved below that for v + 1 = po,n, the leading principal submatrices T~,,~, are all invertible whenever v < oo. Hence its lohvidov indices Pn+l + are well defined and R-block number n of right orthogonal polynomials will have size Pn+ l - Pn+ l + P++a + 1.
4.9.
TOEPLITZ CASE
191
The invertibility of T~,~ when v + 1 = p0,n proves t h a t boundaries of Rblocks are also boundaries of T-blocks. This property will be proved by induction in the following lemma's. The initialization of this induction was already given in L e m m a 4.28. In K-block number n of right orthogonal polynomials, two polynomials will be of special interest to describe the recursions. T h a t is the first one of the R-block, which we denote as a0,n (referred to as the T O P of t h a t block) and h,~ - zP:+ ~+1 ao,n (referred to as the NOP of t h a t block see below). This is a slight abuse of notation. Previously a0,n referred to a~., the T O P of T-block n, whereas we now use it with the meaning of ap0,. , the T O P of l~-block n. We are convinced though that this will not lead to confusion. Considering the sequence of polynomials zkao,n for k - 0, 1 , . . . , the polynomial an is the first one in the sequence which is not a right orthogonal polynomial from P~-block n, because it is not right orthogonal to the constant polynomial 1. Indeed, by definition of P~+I, the inner product
- (1, z p'~--+'+Iao,n > # ( zk
II
0. Note also that
an)-O
for k - 1,
9
9
9
,po,n+l-1
but (zp~247 ~ , a . )
-
zP~
~ , a0,.
r 0
by definition of Pn+l. + For easy reference we shall call the polynomial fin the NOP for K-block n, referring to the fact that it is Not a right Orthogonal Polynomial. We can refer to it as the N OP for block n if we assume it is uniquely defined by a monic normalization. We now give the l e m m a describing the u p d a t e when PO,n+ 00. L e m m a 4 . 3 0 Let T be Toeplitz and T~,,v with v + 1 = po,n a nonsingufar leading principal submatriz. Let the corresponding Iohvidov indices be P++I - oo and Pn-+a < 00. Then all Tk,k are singular for k > v. There is an infinite R-block of right orthogonal polynomials which can be
computed as follows: Set a~+k+~ (z) - zka~+l (Z) for k - 0 , . . . , Pn-+l. Shifting the last of these polynomials once more will give the N O P fin. Setting - v + P~+I + 2, we can now find a nonzero constant cn+x = - ~ / r n - - 1 (see page 18g) and monic polynomials s of degree j such that a~,+j(z)
-- z3~l,n_l (z)cn+ 1 -~- an(z)d~+l,j(z), j - 0, 1, 2 , . . .
P r o o f . T h a t Tk,k is singular for all k > v follows from the fact t h a t Ta~+l - 0 because Pn+l + - oo. Let A be the unit upper triangular m a t r i x
192
C H A P T E R 4.
ORTHOGONAL POLYNOMIALS
whose u + 2 first columns are the stacking vectors of the right orthogonal polynomials ak, k = 0 , . . . , u + 1 and the remaining columns are columns from the identity matrix. Then, setting T A = R, we see that the determinant of a leading principal submatrix Tk,k of T will be equal to the determinant of the corresponding leading principal submatrix of R. Since the latter always contains a zero column for k > u, all Tk,k are singular for k>u. The choice for the first P~+I + 1 polynomials a~,+i+l is correct by Lemma 4.27. The remaining polynomials can be obtained as indicated. For example, take j = 0. We have
{ :riO f o r k - 0 '
-0
for k -- 1 , 2 , . . .
Similarly (
){ z k,a,,-1
Thus there exists a right orthogonal to More generally, right orthogonal to
#0 -0
fork-O fork-l,...,p0,n
-1
nonzero constant Cn+I such that ap - a,n-lc,,+1 + an is z k for k - 0 , . . . , p 0 , n - 1. suppose ap+i for i - 0 , . . . , j 1 have been computed all z k for k - 0 , . . . , p 0 , n - 1. Now, z j ap - z ~(a,,_ 1 c,,+ 1 + a . )
is right orthogonal to z k for k - j , . . . , p 0 , n - 1 + j. By adding multiples of z'~n, i - 0, 1 . . . , j 1, we can satisfy the remaining orthogonality conditions. It remains to be shown that the polynomials ap+j are monic of the correct degree. This will be the case if deg 5n-1 < deg ~,,. Now deg an-1
=
Po,n-1 + P~, + 1
0 it is the smallest nonsingular leading principal submatrix of T which contains T~,,~,. In other words, pn+l has the meaning of the block size of a T-block. If Pn+l + - O, then all T~,+i,~+i for i - O, 99 9, Pn+l are nonsingular. The corresponding right orthogonal polynomials are all TOPs, but Tv,,~,, is the smallest submatrix containing T~,~ for which the T O P a~,+l can not be obtained by simply shifting ao,n.
P r o o f . The choice for the first P~+I + 1 polynomials a~+i+l is justified by Lemma 4.27.
CHAPTER
194
4. O R T H O G O N A L
POLYNOMIALS
To prove that the remaining polynomials in the same R-block are of the indicated form, we may repeat the proof of the previous lemma for a finite number of updates. To show the claim about the (non)singularity of the leading principal submatrices, we consider the current block of right orthogonal polynomials. Express the right orthogonality of the polynomials ak, k = 0 , . . . v ' , which we have obtained so far in a matrix form. If we put their stacking vectors in a unit upper triangular matrix A~, and if we denote Tn = T~,,~,, then it holds that T,.,A,., = R,., where Rn is block lower triangular. The determinants of the leading principal submatrices of Tn are equal to the determinants of the leading principal submatrices of Rn because An is unit upper triangular. Now we can write R,~ as (suppose for the moment that Pn+l+ > 0)
R,, -
[Rnl ~ X
[~ R12] ,47,
R,,,,,
R22
X
with R22 a lower triangular Toeplitz matrix with diagonal elements cn+l + + (zk,ao,n> dn+l(O).
--
Both of the inner products are nonzero by definition of p+ and Pn+~ + respectively. Because we also know t h a t cn+~ is nonzero, it follows t h a t also dn+l (0) can not be zero. D Note however t h a t the previous corollary is only true for polynomials starting a new P~-block. T-blocks corresponding to nonsingular leading principal submatrices may start with polynomials with vanishing constant terms if they do not coincide with the start of an R-block. We are now able to prove an approximation property which will be identified as a two-point Pad6 approximation in a later section. The two series t h a t will be approximated are
f +(z) --f_(z)
-
-
-
[tO "{- ~ - l Z + ~-2 z 2 + ' ' "
(4.34)
#1 z - 1 + ~2 z - 2 + ' ' '
(4.35)
The series f+ is the same as the previously mentioned f + , but for ease of notation we have used f_(z) for what was previously denoted as f - ( z - 1 ) . The rational approximant has denominator a0,n and the n u m e r a t o r can be found as follows. First note that for v + 1 - p0,n, we may write /tO
~1
"""
~v
9
#-1 9
9
#-~,
#o
".
,
~v+ 1
9
#,,
9
.
a v + l -- 0
.
9
9
/.tl
"'"
#-1
9
tto
#1
which can be split up into /.to ~
Cv+l
--
/to
#-1
av+l 9
~--v+l
0
~1
~2 9
~ 9
~
9 ..
/to
9 .
.
/.tv+
0 1
o
av+l. ~2
0
#~
4.9. T O E P L I T Z CASE
201
Clearly c~+1 is the stacking vector of a polynomial c~+1 = co,,, which has degree at most v. It can be described as the polynomial part of f_(z)ao,n(z) or as the initial terms in f+(z)ao,n(z). Using the notation of Section 1.5, we can express this as
co,,(z)
-
--
f_(z)ao,,(z) div 1 ZP~
(4.36) zP~
(4.37)
It follows readily from this definition t h a t
r~, (z) - f_ (z)ao,n(z) - Co,n(z)
-
~ z - l - P ~ + 1 + lower degree terms §
r+(z) - f+(z)ao,n(z) - C0,n(Z)
-
~+z~'+l+~
+ higher order terms
with nonzero ~ , and ~ . Now because the constant t e r m as well as the leading coefficient in a0,n is nonzero, we can t r a n s f o r m this into f_(~) - c0,,~a0, -In
=
r~-n z -.-2-.:+ 1 + lower degree terms
(4.38)
-1 -- C.o,nao, n
__
~
(4.39)
f +(z)
ZU-[-1 +o,~+~ +
+ higher order terms.
The rational function co,nao, n-1 which has 2u + 2 degrees of freedom, fits in the two series f+ a total of 2(u + 1) + pn+l - 1 coefficients with pn+l + Pn+l + P~,+I + 1 >_ 1. Since one a p p r o x i m a t i o n is in the neighborhood of z - 0 and the other in the neighborhood of z - oo one calls this a two-point Pad6 a p p r o x i m a n t . The complete analysis of the generic case and the block s t r u c t u r e of the nongeneric case can be found in the book [41]. 4.9.7
Euclidean
type
algorithm
W h e n using the n o t a t i o n of series as we just did, it is not difficult to give a Euclidean type algorithm for the Toeplitz case too. It is related to the type of algorithms described in [38]. First note t h a t the nonzero initial coefficients ~+ and ~ t h a t we introduced above, correspond to
~+ --rn
_
-(
( : 0 , . ++~+~, a0,~ ) z - 1 - ~ 2 4 7 , ao,n
(z~~ ~+~, a ~ )
) -- (1, an)
This means t h a t the constant cn+l in the three t e r m recurrence can be expressed as
~+~ = - ( ~ _ ~ ) - ~ ( ~ ) .
CHAPTER 4. ORTHOGONAL POLYNOMIALS
202
The monic polynomial d~+l(Z ) which also occurs in the recursion, has a stacking vector satisfying (4.30). This system expresses the equality of the polynomial part of oo
oo
(E ?'i,n Z -
-i+1
t
)dn+l(Z)
+
- -Cn+l(E
i=1
?'i,n-lZ-i+l)
Zp'+I"
k=l
!
Thus d,+z(z ) is obtained as -+~ +pZ-p~+l r~,_z (z)c~+l div r~, (z). d~+ (z) - - z p+ All the other dn+l, j (z) are obtained by taking fewer terms in this division. ' To be precise d'+ a,j(Z) -- -- zJ+PZ -':"~+1 7 ,-n _ l ( Z ) C n + l div r~, (z), quadj - 0,...,P~+I. + t t t t is Note that dn+xj(z ) - zdn+xj_x(z ) + dn+l, i(0), i.e. that the next dn+ld obtained by continuing the division one step further. Since, according to Theorem 4.3 , the largest j for which we need d~+~j(z) in the recurrence is j - Pn+l, + it is sufficient to compute d~+l(z ) - d'n+l,p++ (z). iF we know
this one, we know them all. The polynomial dn+x(z) in the three term recursion is a composition of two parts:
d n + l ( z ) - z~
!
I!
dn+l(Z ).
(4.40)
It is explained above how to find the higher degree part d~+ 1. The lower II II degree part dn+ l(z) has a stacking vector d,~+l , satisfying (4.31). As for d~+l, this system expresses the equality of the first P~+I + 1 coefficients in oo
oo
ljn
i=0
z' ) d"n + l ( Z ) - - c n + l
$1n
-1 z' ) "
i=0
This polynomial d~+~(z) is at most of degree Pn-+l" The previous relation means that it can be obtained by dividing suitably shifted versions of r+_l (z) and r+(z). To be precise a"+l(Z)
-
-
+ -p~+ +o;+1 r n+_
1
divr+(z)]
Thus to get an algorithm in Euclidean division style, we should be able to compute the successive r k+ and r~-. By the linearity of their definitions one can easily find that they obey exactly the same recurrence as the denominators a0,k. We can collect these results in Algorithm 4.6. To show the
4.9.
TOEPLITZ
203
CASE
similarity with the Hankel case Euclidean algorithm, one could write the recurrence in terms of G and V-matrices. So if we set
Ok -
?'k-1
Tk
ao ,k- 1
ao ,k
rk_ 1
rk
I
+
,
+
+ -Fpk-I-1
[
0
V k + 1 --
Z ph+l
1
Ok+ 1
dk+l(Z)
]
then Gk+l = G k V k + l . Note however t h a t the G m a t r i x contains two residual series and the denominators, whereas for the Hankel case we had the numerators, denominators and one residual series. We now give an example which illustrates the result of the algorithm. Example
4.5 We consider the following series I+(~)
:
I z2 + ~ 1 Z3 + g 1 Z4 + 1-6 z5 + 9 z6 + "
I-(~)
=
-z -I+~
l z _ 3 _ 1 -s
~z
+g
lz
-
7
1
-9
- 1--~z + . . .
The algorithm initializes a0,0 - 1 and c0,0 - 0 and the corresponding residuals are
r 0(z) _
=
Iz_3_ -z -1+ ~
1
~z
-s
I
+ gz
-7
1 -9 - 1--~z + . . .
The Iohvidov indices are p+ - 2 and p~- - 0 so t h a t Po,l -
Pl -
P+ -I- P l -I- I - 3.
The leading coefficient in r o is ro - - 1 . To find a0,1(z) we should solve a homogeneous Toeplitz system. Setting a0,1 - [ ~ T 1 1] T, this system is
I~176 0 1
0
1
a_o,x-
0 0
-i
The m a t r i x of this system is the first nonsingular leading principal s u b m a t r i x of the m o m e n t m a t r i x T - [#j_,]. The solution of this system defines a0,1(z) ~s , t
ao,l(z) - z a + ~ z L
2.
C H A P T E R 4.
204
ORTHOGONAL POLYNOMIALS
Algorithm 4.6: Toeplitz_Euclid Civen the y+(z) - E~~ # - k z k ~nd - y _ (z) - E~~ t, kz -k Set aoo - 1, ~oo - O, %+(z) - f § p~- - min{p _> 0 " ~l+p # O} p+ - min{p > 0 " # _ v # 0} Set ro - - t tp~-+1
To(Z)-
y(z),
poo-
0
Solve T,,,v+lao,1 - 0 , T,,,,,+I = [ t t j - i j i l~,,v+ = o j = o1
with u + 1 - Po,1 - Pl - P t + P l + 1 Set Co,1(z) - f(z) - ao, 1 (z) d i v 1 r t ( z ) -- f + ( z ) a o , l ( Z ) - c0,1(z) r l ( Z ) -- f - ( z ) a o , l ( Z ) - C0,1(Z) for k - 1 , 2 , . . . Define Pk+l and ~- by r k ( z ) -- r k ( Z ) Z-I-p-~+I if 1.d.t., ~- ~ 0 Define Pk+l + by + r k ( z ) - r v+ k ( z ) zPO,k+P++~ -F h . o . t . , rv+ k r
Define Ck+l - - ( ~ -
1)-1~ -
Set d~+ l(z) - -[zP++ 1+P; r~-_l (z)ck+l] div[z pk+l r~- (z)] m
It Set dk+l(z ) _ _zpk+~ [zpk++,-pk+~ +p~-+l rk_ 1 + (z)ck+l d i v rk+ (z)]
Set dk+l(Z) -- dg+l(Z ) + ZPk-+l+ldlk+l(Z ) Tk+l Tk-1 + +p~-+1 a0,k+l aO,k-1 z pk+~ Ck+l + a O , k dk+l + + r+k rk+l rk-1 co,k+l(Z) - f _ ( z ) a o , k + l ( z ) d i v 1 + Set pk+l - Pk+l + Pk-+l + 1 and po,k+l - po,k + pk+l endfor -
-
4.9.
TOEPLITZ
CASE
205
The corresponding n u m e r a t o r c0,1 can be found as the polynomial part of f _ (z)a0,1(z). It is C0,1(Z) -- . f _ ( z ) a o , l ( Z
) div 1 -
-2.
The corresponding residuals are now obtained as
~+(z)
~{-(z)
--
f+(z)ao,l(z)-
_-
_l_z~_ _lz,_ A z ~ _ ! z ~ 4
8
16
f_(z)ao,l(z)-
-
=
C0,1(Z) 32
c0,1(z)
2z_ 1 _ I Z - 3 + l2z _ s + Oz_ s . . .
This shows us i m m e d i a t e l y t h a t p~ - 0 and ~{- - 2. Hence P0,2 4. We can check the correct orders of approximation"
C.O,l(z)ao,l(Z) -1
1
=
- - z 2 ( z 3 -[- ~ z - -
=
_z-1
_-
~l z 2 +
-
-
P0,1 + 1
-
-
2) - 1
+ ~1 z - 3 - 2 z - 4 + . . .
z3+
1 z4 + " -
W h e n we compare this with the series f_ and f+ respectively, we see t h a t the difference starts deferring from zero with the t e r m in z -4 where 4 p0,1 + P2 + 1 and for the other one with the t e r m in z 3 where 3 - p0,1 + p+. We s t a r t now a regular iteration. Next we have to identify c2 as E2-
--('PO)-I'p1
--"
2 --1 = 2.
Then we c o m p u t e
d~2 -
- [ r o C l ] div[r{-]
=
- [ - 2 z -1 + z -a - l_z-S + . . . ] d i v [ 2 z -1 - - Z -~ + ...] 2
:
1.
In fact, the above calculation is unnecessary because we know this should be 1 because the polynomial has to be monic. The second part of the recursion polynomial d2 is found as
=
-[z 3+
----
4.
z 4 + ~l z s
+'"
i divi_ z3_ z4+ i
206
C H A P T E R 4.
ORTHOGONAL
POLYNOMIALS
Consequently the polynomial d2(z) is d2(z) - z + 4. We can now do an update.
ao,~(z)
=
++p;-+a ao,o zp' c2 + ao,l(z)d2(z) 1 1.z.2+(z ~+~z-2)(z+4)
=
z4+4z 3+~ lz2 +2z-8.
-
The residuals can be u p d a t e d by similar recursions"
~+(z)
~+(z)z.~++p?+~ ~ + ~+(z)d~(z)
=
~ - (z)
lz4 4
_
lzs
_
8
_lz6 +.... 16
~o- (z)z.~++p~-+l ~ + ~ (z)a~(z)
-
:
8Z - 1 _ 4Z - 3 +
2z - s + . . . .
This gives p3~ - 0 again and r2 - 8. Thus/90,3 P0,2 W 1 -- 5. We have again the predicted correspondence between the expansions of c0,2ao, ~. We have indeed t h a t -
c0,2(z)
:
-
f_ (z)a0,2(z) d i v 1 _ z 3 _ 4z 2.
The expansions give co,2(z)ao,2(z) -1
_
x + 1_z-3 33 -s -- - - Z ~2 4 l z 2 + iz3 + L z 4 + . . .
-
~
=
--Z--
~
"'"
32
We check just one more iteration. The constant c3 is ~
-
_(~-)-~
-
--- --4.
Now the higher degree part of the recurrence polynomial is d~
=
-[r{-c2] div[r~-] - [ - 8 z -1 + 4 z - 3 . . . . 1.
] div[8z - 1 _ 4z -3 + --.]
4.9. TOE, PLITZ CASE
207
Again, we know in advance that this should be 1 because the polynomial has to be monic. The lower degree part is again of degree zero and is given by d~
-
-[zr+c2] div[r +]
=-[z4+ --
zS+...]div[-~
z 4 -- i z S
....
]
8
4.
This delivers d3(z) - z + 4. Now we are ready for another update: a0,3(z)
--
aO,l(z)zP+3+P;+lc3 Avao,2(z)da(z)
=
(z 3 + ~ z1 - 2 ) z ( - 4 ) + ( z
=
zS+4z 4+~6z3+2z
4 + 4 z 3 + ~ 1 z2 + 2 z - 8 ) ( z +
4)
2+8z_32.
2
The corresponding numerator is c0,3(z) - - z 4 - 4z 3 - 16z 2. The expansions of the approximants are =
1 -3 1 s - - Z - 1 -~- ~ Z -- ~ Z --
_
_1z2 +
-
2
lz3 + lz4 +
32z -6 + . . .
...
etc.
4.9.8
Atomic
version
The previous algorithm was formulated in terms of R-blocks, which in a computational sense is the most natural thing to do. It computed explicitly the TOPs of the R-blocks, and not the inner polynomials of the blocks. It is of course possible to explicitly compute the inner polynomials as well and at the same time define the T-blocks. This would correspond to an atomic version of the previous algorithm. We do not give all the details, but we describe the decomposition of R-block number n. Since the T-block partitioning is a refinement of this R-block partinioning, this decomposition
C H A P T E R 4. O R T H O G O N A L P O L Y N O M I A L S
208
should also give the T-block recursion. Therefore we let the index n refer to a T-block. Suppose we know the T O P s of two successive l~-blocks. This is what we need to make the recursion work. Suppose t h a t the second l~-block corresponds to T-block n u m b e r n. We will then denote the T O P as a0,n and the corresponding residuals as r + and r~,. Thus co,,,
=
f - a0,n d i v 1
rn
:
f-ao,n
- - Co,n
r I"L +
:
f+ao,n
- - Co,n
m
The T O P of the previous l~-block will also be the T O P of some previous T-block, which need not necessarily be T-block n u m b e r n - 1. Assume it is T-block n u m b e r 7i with fi. 0, thus t h a t the T-block has size larger t h a n 1, we find t h a t the N O P 5n-1 is of degree nn-1 + a ~ + 1 which is less t h a n the degree of a0,n which is n,, - '~n-1 + a~, + a + + 1, while its stacking vector satisfies an equation like q with the 1 in the right-hand side replaced by ~,-1 ~ 0. Thus in this case q - ~,,_1(~,_1) -1. If however a + - 0, then an-1 has the same degree as a0,n and we do not get the zero at the position v + 1 of vector q. We could in t h a t case set [qr
0it
_
_
whereby the offending leading coefficient is eliminated without changing the right-hand side. This however is not so nice to work with. This is because we only used the right polynomials. At this point, it is b e t t e r to introduce the left polynomials as well since the vector q appears more naturally in terms of a reversed left polynomial as we shall see. For the dual results, we use the following notation. The analog of (4.34,4.35) is g+(z) --g_(z)
Note t h a t g ( z ) a p p e a r in
g+(z)-
-
#o + # l z + # 2 z 2 + . . .
(4.47)
--
~_1 z-1 + ~_2 z-2 + ' ' ".
(4.48)
g_(z) -
f(z-1).
The corresponding residuals
with s~ (z)
-
~, z - l - ~ + ~ + lower degree terms
s+ (z)
-
~+ z"+l+~++ ~ + higher order terms
214
CHAPTER
4.
ORTHOGONAL
POLYNOMIALS
with nonzero $~, and $+. Using Theorem 4.25 and the definition of the Iohvidov indices fl+ , it follows that q'(z) - z ~+ b0,,,-1(z) # is right orthogonal to all z k for k - 1 , . . . , u - ~,~- 1 but not for k - 0 since (1, q'(z)) - sn_ ~+ x O. Recall that the size of the T-blocks is not left-right sensitive, and thus { an-~0,nandan-1 a n - a + +a~, + l - f l n - / 3 + + fl~ + l
ifa +-f~+-0 otherwise.
Thus the previous orthogonality property means that the polynomial q = ( sn_ ~+ 1)- ~q~ of degree at most a,~-I + f~+ < an has a stacking vector which satisfies the defining equations of the vector q. This choice is always valid, independent of the size of the T-block. Of course it should be the same as before, since the system has a unique solution. The left and right polynomials can be computed independently from each other by the algorithms we have seen. There are however combined computational schemes, which are generalizations of the Levinson and Szeg6 algorithms. These will be given in Section 4.9.11. Since the fundamental polynomial p ( z ) - zTp -- ao,n(z) is a TOP polynomial, we shall denote the corresponding residuals r~ temporarily as r~. Recall r ; (z)
-
f _ ( z ) a o , , ( z ) - co,,(z),
r+(z)
-
f+(z)ao,n(z) - Co,n(z),
deg r~-(z) - - a ~ + 1 - 1 + o r d r + ( z ) - an + an+x
and that f ( z ) p ( z ) - r + ( z ) - rp(Z). For the other fundamental polynomial q(z), we introduce similar notations. We set ?'q- (Z)
--
_ ('~d-Sn_l)-18n_d- 1 ( z - l ) z'~'-~+~+ ,
deg r~-(z) - 0
Tq+ (Z)
__
__ 8n_l)("d- -1 8 n _ l (Z -1 )Z ~;'~-1 +fl: ,
oral ?'q
-
To motivate this choice, note that
g ( z ) b o , n _ l ( Z ) - 8n_ § 1 (Z) -- Sn_ I(z) -- f ( z -1 )b0,n-1 (z) f(z)b-o,n-l(Z - 1 ) - 8 dn _ i ( Z - 1 ) - - 8 n -_ l ( Z - 1 ) f ( z ) b ~ n _ l ( Z ) - Z~n-l[,~n+ i(Z--1 ) -- 8 n _ l ( Z - - 1 ) ] . Multiply the last relation with z fl~+ (Sn+_l)-1, t h e n the above definitions lead to f ( z ) q ( z ) - r + ( z ) - r~-(z). L e m m a 4.35 With our definitions, just introduced, we have r+(z)p(z)-
r+(z)q(z)-
z '~" - r ; ( z ) p ( z ) -
r;(z)q(z).
4.9. T O E P L I T Z CASE
215
P r o o f . The relations f ( z ) q ( z ) lead to
0
1
r+(z)-r;(z)and f(z)p(z)-
q(z)
p(z)
q(z)
r+ (z)-r~- (z)
p(z)
"
The determinant of the first matrix is 1. Checking the orders of the residual series r + and r + it follows that the determinant of the second matrix has an order at least '~n. Similarly, checking the degrees of the series rp and r~and the polynomials p and q, it follows that the determinant of the third matrix has a degree at most '~n. More precisely, it is of the form z ~ + lower degree terms. Combining these results, the only possibihty is that the determinants of the second and third matrices are equal to z ~ . This proves the result. [:3 To express r~- and r~- in terms of the vectors p and q, and the given Toeplitz matrix, we use the following notation. q0
P0
Pl
ql
P0
qo
~ ~
Lp
P0 Pv
q~,
Lq
q~
Pl 9
q0 ql
(4.49)
9 , ,
0
,
"'.
: qv
Pls
1
0
The invertible Toeplitz matrix T - Tn-i - T~,~ has size u -{- 1 - ,~n. If we extend it with a square block T s of the same size so that
then, by definition of the residual series rp- and r~-, the products Up]
_
R~-and
[
IT']
[ Lq
]-R;
(4.50)
deliver both upper triangular Toeplitz matrices containing the initial coefficients of these series. R~-
9 "rO,p
with
r ; ( z ) - ~ - ~_ r_~ . p_ z k-O
C H A P T E R 4. O R T H O G O N A L P O L Y N O M I A L S
216
and similarly for R~-. Before we prove the inversion formula, we need one last lemma. L e m m a 4.36 With the notation just introduced, R g U, - R ;
-
P r o o f . We recall that the multiplication of series in z -1 is equivalent with the multiplication of upper triangular Toeplitz matrices. For f, g 6 F[[z -1]], and denoting by Th for h 6 F[[z-~]], an upper triangular Toeplitz matrix with symbol h, then TfTg - Tfg. When f, g 6 F(z -1 ), the Woeplitz matrices are not exactly upper triangular. They are however upper triangular up to a certain shift and a similar result holds. Applying this to the relation r;(z)q(z)-
z
and cutting out matrices of appropriate sizes gives the desired relation.
Now we can give the inversion formulas for Toeplitz matrices which is due to Heinig [134], see also [144]. T h e o r e m 4.37 ( T o e p l i t z i n v e r s i o n f o r m u l a s ) Let p(z), q(z) be a fundamental pair of polynomials for the invertible Toeplitz matrix Tn-1 = T~,,~,, u T 1 = nn. This means that their stacking vectors satisfy the equations (~.46). Furthermore, define the triangular Toeplitz matrices Lp, Up, iq, Uq as in (~.49). Then
T(,~_~ - L q U p - LpUq - U p L q - UqLp. P r o o f . Using (4.51)and (4.50), we find
I-
T(LqUp - LpUq) + T'(UqUp - UpUq).
Since both Up and Uq are upper triangular Toeplitz matrices, they commute and therefore the last term in the right-hand side will vanish. The first formula then follows immediately. For the second formula, we can make use of the persymmetry of Toeplitz matrices i.e., .f T .~ - T T for any Toeplitz matrix T. When applied to the first inversion formula, we get the second one. [:] As an immediate consequence of the inversion formula, we find a Christoffel-Darboux type formula for block orthogonal polynomials with respect to an arbitrary Toeplitz matrix.
4.9.
TOEPLITZ CASE
217
T h e o r e m 4.38 ( C h r i s t o f f e l - D a r b o u x f o r m u l a ) Let Tn-1 = T~,~ with v + i - an be an invertible Toeplitz matrix. Set Kn-1 - T(~T , then its generator is a reproducing kernel for the polynomials of degree at most v. Let ak and bk be the blocks of monic block orthogonal polynomials associated with T. These reduce Tn-1 to the block diagonal matrix Dn-1 by a decomposition of the form B,~H1Tn_IAn_I - Dn-1. If (p, q) is a fundamental pair .for Tn-1, then p(z) - ao,n(z) and q(z) have
-($+_l)-lzf3+b o,,~# 1 ( z ) and we
n-1
bk(y)D-k, Tak(m) T k-O
q(~lp#(y) q#(y)p(~) -
i-
xy
P r o o f . With the notations of the previous theorem, we get
[
Lq
Uq Up - Up Uq
-
0
-
Uq L p - Up L q
01
T,~11
"
The last equality follows from the Toeplitz inversion formulas and the commutativity of upper (and lower) triangular Toeplitz matrices. Multiplying from the left with xT and from the right with y gives (1 - (xy) ~+1) K n _ l ( X , y) - [q(x)p#(y) - p(x)q#(y)](1 + x y + . . . + (xy) v) where (p(z), q ( z ) ) i s a fundamental pair for Tn_ 1 and the reciprocal is taken for degree v -{- 1 for both of them. Since (1 - z)(1 + z + . - . + z ~) - 1 - z ~+1 , the result follows. [:] A direct expression in terms of the orthogonai polynomials is not so nice. Substituting for the fundamental polynomials in terms of the orthogonal polynomials, we get
Kn_l(x,y) -
~"-+ b0,._~ (y)~o,~(~) - ~'-+ b0,n-l(x) # ~#o,n(Y) v+
sn_ i (I -
x~)
C H A P T E R 4. O R T H O G O N A L P O L Y N O M I A L S
218 if a + > 0 and
Kn_l(x,y)
=
~bo,._~(y)~o,~(~)- b#o,._~(~) ~ ( y ) v+ Sn_ 1(1 - x y )
if a + - fl+ - O. These Christoffel-Darboux formulas are not s y m m e t r i c in their left/right aspects, although the defining formula for Kn-1 is. This is because our analysis was based on the f u n d a m e n t a l pair, which should in fact be called a right f u n d a m e n t a l pair, since there is a complete s y m m e t r i c definition for a left f u n d a m e n t a l pair. A left f u n d a m e n t a l pair (/5, ~) would be defined as
[
i5o ~o
"'" "'"
i5, ~,~
]
1 0
T,~+I,,~=
[00 0] 1
0
...
0
or equivalently
-
[o] t.
'
1. If z 7~ 0, we set z - re {~ which gives for d 5r 1 that Td-1
_
ei(d+l)O.
Thus, r - 1. Therefore, besides the origin, the other points in the variety are given by the points e iO where the 0 are solutions of e iO - - e - i d O
or
e i(d+l)O =
1.
That is, these 0 are given by 2k
O- ~~r,
k-O,l,...,d
d+l
ford>
1 and by
0for d < - 1 . points"
2k
~~', d+l
For example, for d -
(0, 0), (1, 0), and for d -
k-O,-1,...,d+2 2, the algebraic variety consists of 4
(-1/2, vf/2)
- 2 , there is only the one point (1, 0).
Chapter 5
Padd approximation The history of continued fractions, and associated with it, the problem of Pad~ approximation is one of the oldest in the history of mathematics. One can read about the history in Brezinski's book [25]. There are very early predecessors, but the study was really started in the 18th century and came to maturity in the 19th century. The serious work started with Cauchy in his famous Cours d'analyse [50] and Jacobi [157] and was continued by Frobenius [100] and Pad~ [198]. A current standard work is the book by G. Baker Jr. and P. Graves-Morris [7]. For the proofs of the theorems given in this section we refer to the latter. For the less patient reader, there are surveys by Brezinski and van Iseghem [33, 35]. This chapter is mainly devoted to linking the previous algorithms to the recursive computation of Pad~ approximants on a cetain path in the Pad~ table. The recursions are mostly well known and follow also immediately from the recursion formulas for continued fraction expansions of the given formal series. Many of them can be found in the book of Wall [238] and in the references on Pad~ approximants given above. Of course these relations can be translated in a terminology of orthogonal polynomials and the recursive computation corresponds to the recurrences of orthogonal polynomials and adjacent orthogonal polynomials. For this interpretation we refer to [24, 41, 238, 240] for the normal case and to [87, 88] for the more general situation. One of our main intentions in this chapter is the introduction of the notion on minimal Pad~ approximant in Section 5.6. This is the translation of the notion of minimal partial realization that is extensively discussed in Chapter 6. In this chapter, we shall assume again that the coefficients are taken from a skew field F. This implies that we should discuss again left/right duality. We shall give a development for the right sided terminology. The 231
232
C H A P T E R 5. PADF, A P P R O X I M A T I O N
left sided analog is left as an exercise for the reader.
5.1
Definitions
and terminology
Let us start with the definition, or rather some definitions, since one could accept different definitions for a Pad6 approximant. Definition 1.13 for Pad6 approximant that we have given at the end of Section 1.5 corresponds to the Baker-type definition. We repeat it here in a slightly different form. D e f i n i t i o n 5.1 ( P A - B a k e r ) Given a formal power series f ( z ) - E~=o fk zk C F[[z]], with fk C F, we define a right Pad6 approximant ( P A ) [ ~ / a ] of type (~, a), a , ~ > O, as a rational fraction c(z)a(z) - t satisfying 1.
deg c(z) O, for the series f ( z ) e F[[z]] iff 1.
deg c(z) _ w + 1
(a(z) ~ O)
(5.8) (5.9) (5.10)
Such a PA will also be denoted as For the Baker definition, a(0) ~ 0 and thus a comonic normalisation seemed natural. For the Frobenius definition, it can not be ensured that a(0) ~ 0 and another normalisation has to be adopted. We shall occasionally assume
234
CHAPTER
5.
PADE APPROXIMATION
that the PA is normalised by making a ( z ) monic, i.e., that a,~ = 1 with = deg a. By this normalisation, the PA will be fixed uniquely. Note that conditions (5.5) and (5.10) of the two definitions are the same and that (5.8)is the linearized form of (5.3). The original Frobenius definition [7] is the above definition without condition (5.10). Without this restriction, it allows a complete equivalence class of PAN for entries in a singular block of the Pad@ table. A single one could be selected by taking the reduced form, i.e., the rational function defined by it. Then in a square block of the Pad~ table, all the PAN (viewed as rational functions) are equal to the left top element, including those below the main antidiagonal of the block. Note that the Frobenius definition is weaker than the Baker definition. A PA in the Baker sense will automatically be a PA in Frobenius sense, but the converse is not true in general. Depending on the approach we take, (see the next two sections), the Euclidean algorithm sometimes computes entries of the Pad~ table that are below the antidiagonal of a square block in an unreduced form. Therefore it might then be convenient to define the PAN by conditions (5.6)-(5.9)and make them as simple as possible by imposing condition (5.10). This is our main incentive to introduce the Frobenius type of PA besides the stronger Baker definition. In the sequel we shah assume the PAN are defined by the Baker type Definition 5.1 or the Frobenius type Definition 5.3 depending on what will be the most appropriate. E x a m p l e 5.1 To illustrate the difference between both definitions, consider the example F ( z ) - 1 + z 4 + z ~ + z 9 + . . . . Its Pad@ table has a 4 by 4 block in its left top corner. It can be filled in as in Table 5.1. The original Table 5.1: Pad~ table for different definitions of PA
1/1 1/1 1/1 1/1
1/1 1/1 1/1
1/1 1/1
Definition 5.1 Baker
1/1
0 1/1 1 1/1 2 1/1 3 1/1
1/1 1/1 1/1 z/z
1/1 1/1
1/1 z/z
z/z z2/z 2
Z2 / Z 2 Z3 / Z 3
Definition 5.3 Frobenius
5.2.
COMPUTATION
235
OF D I A G O N A L P A S
Frobenius definition has 1/1 all over the square. Note that our Baker and Frobenius definitions are equiwlent iff a(0) 7t 0.
We shall see next how the Euclidean algorithm or one of its variants, computes the Pad6 approximants which are located at certain paths in the Pad6 table.
5.2
Computation of diagonal PAs
It has already been discussed in Section 1.7 on Viscovatoff's algorithm that the Euclidean algorithm (and the Euchdean algorithm is a special case obtained by setting the parameter v - 0 in the general algorithm) will produce Pad~ approximants for a series f e F[[z]]. This property of the Euclidean algorithm was described in Theorem 1.18 (case v - 0). However, it was assumed there that the series f - - s - l r has ord f _> 1. When we work with power series it is more conventional to start the series with a constant term f0, which need not be zero. Thus we should allow f to be replaced by a series f - f0 + f with f0 C F, possibly nonzero. This is a simple m a t t e r since to make this work, we only have to replace the matrix
V0 -
[vo 0] by [vo oao] 0
ao
0
a0
with a0, v0 E F0. Thus if f0 7t 0, the algorithm starting with V0 will compute approximants which are different from the approximants computed in the case f0 = 0, i.e., when it starts with Vo. However, it is easily checked that if Co,k and ao,k denote the polynomials as they are generated in Theorem 1.18 for the series f and if Co,k and 50,k are the polynomials generated with the initial matrix Vo, replacing f by ] - f0 + f, then it holds that ~o,ka0,k--~ ---fo + Co,kao,k-1 and thus ~o,k(z) - f o a o , k ( z ) + co,k(z) and ao,k - ao,k. The degrees of the c0,k are still bounded by the Kronecker indices ~k and the orders of approximation are still the same. By using the same method as in the proof of Theorem 1.18, it can be shown that c0,k and a0,k are right coprime and hence that ~0,kho,~ is a [~k/~k] PA for the series ]. Thus we can formulate the following theorem.
T h e o r e m 5.3 Let s - - 1 and f - s - l r - r - ~_,k~176f k z k C F[[z]]. We apply the Extended Euclidean algorithm with the above initialisation step,
236
C H A P T E R 5.
PADE APPROXIMATION
which means the following. Define
ilol
G-l-
0
1
8
r
Vo-
[ ivao vo
foao
Vo, ao C Fo arbitrary.
For k > 1 set
I UO,k vo,k aO,k ~o,k1
Gk-
$k
ckzk 1
'
rk
~k~"~ ~k(~)
and zk - - s k 1rk with ak - ord rk-1 -- ord Sk-~, ak(z) -- --(sk-1 l d i v rk-1)ckz ~k Ck, uk C F0 arbitrary Vk -- O. Then,
-s
-
l
al
I+
[
.
Ir-fo+v
o
~ 1 - - O:1,
~ k - - O~k-1 -~-O~k,
aO,k(O)- a o a l ( O ) ' - "ak(O) ~ O,
+ ...+
a2
l an+unz~
ao 1 ,
k _> 2
k ~_ 0 k
degc0,k _< 'ok,
degao,k _< 'ok,
ord zk
--
O~k+l, i=1
f
--1
a-1
- c0,k a0,k - rk 0,k
ord rk
-
-
ord sk +
r
-- ak+l
+ 2~;k.
1 The polynomials co,k and ao,k are right coprime and all co,k a -o,k are Padd approximants for f .
Note t h a t since ao,k(0) ~ 0, we need not distinguish between the Baker and the Frobenius definition here. Let us illustrate this with an example. E x a m p l e 5.2 The example is nothing but a translation of E x a m p l e 1.5 to the present situation. It also shows t h a t the algorithm works in more general situations of given Laurent series s and r. The restriction to s - - 1
5.2. COMPUTATION OF DIAGONAL PAS was j u s t for t h e ease of f o r m u l a t i o n .
237
N o t e also t h a t in this e x a m p l e , it so
h a p p e n s t h a t fo - 0. W e c h o o s e for t h e g i v e n s a n d r t h e following L a u r e n t series $-
r-
--Z - 1 -~- Z 4 a n d
1 + z 4.
W e shall n o t d i s t i n g u i s h b e t w e e n left a n d r i g h t in o u r n o t a t i o n . Vo - ao -
We take
1, t h e n we c a n d i r e c t l y s t a r t w i t h so - s a n d ro - r. S u p p o s e we
c h o o s e ck a n d
uk e q u a l t o 1 for all k _ 1. T h e n t h e successive c o m p u t a t i o n s
give us t h e following results" a l - o r d ro - o r d so - 0 - ( - 1 ) al
--
Note that deg al - 0 < 31
d i v r 0 ) z "~ - z -1 z, t h u s a l -
--(s0 al
--
-
1. T h u s
1.
1.
[~o ~o],1
T1 ]
-
l+z'][
~
_ [z+z~ ~ , § N e x t we find t h a t O~2 -- o r d r l - o r d
S 1 -- 4 -
1 - 3. So t h a t we get
a2 -- - - ( 8 1 d i v r l ) Z 3 -- - 1 + z -
z 2 ~- z 3.
Thus 82
T2 ]
[~1 ~]~ _ [~§ z4§
F o r t h e n e x t s t e p : a3 - o r d r2 - o r d
[oz~
s2 - 8 -
7-
-1
+ z-
z)12.
= [zT+z ~ 2~8][ z0
-(1
-(s2 divr2)z
-
T h i s gives $3
T3] -[~ -
~ ]v~ [2z 9 o].
]
1 a n d t h u s we find
-(1 +
a3 -
z 2 + Z3
z ] + z)/2
238
CHAPTER
5.
PADE
APPROXIMATION
Here the algorithm stops because r3 - 0 which means that - s - l r and co,3ao, t3 correspond exactly. The given data corresponded to a rational function which is recovered after 3 steps of the algorithm. The successive V0,k matrices, containing the successive approximants are given by Vo,1-
[0z] z
Vo,3_
1
[z4 ; Vo,2 - Vo,I V2 -
Vo,2V3-
-
za
+
-z+z
-l
-
+ z-
+
z~ + za + z4
(z +
)/2
2-z 3+z 4+z 5 (1-z5)/2
;
"
The degree properties can be easily checked. For example, we find that the series expansions of f - - s - l r - [1 + z 4 ] / [ z - 1 z 4] and C0,aa0-~ = [(z + z S ) / 2 ] / [ ( 1 zS)/2] match completely. Note that f - - s - l r has the continued fraction expansion z4
[+
whose successive convergents a r e c o , k a o , k ,-1 for k - 1 ,2 3, . A picture representing the computed elements graphically is given in Figure 5.1. The successive approximants are indicated by an X.
Figure 5.1: Pad~ table for f ( z ) - z + z s + z 6 + z t~ + z it + z is + . . - : Diagonal path
0 1 2 3 4 5 6 7 8
x0i
i i i :X~
,
,-,
~
i i i
" ,-,
,
,-,
,
, : :
:
: ,
,
r
,
,-,
,
,-,
,
,
~
,',
,
,
,
,-,
,
,
,
,
. .
:Iil
"
'i
...........
,,
, ' , ,
~
x2
............
.
~
,
, ' , ,
, ' , ,
,
,.
,
,%
,
,',
,
,-~
,
,-,
,
,-,
.
.
,
,
.
,_-:
:
~ , ,
,::
=
:,
=
~ ,
,_-:
.
.
~",,
,
,
~
"~
,
,
r
,
,
o
.
,",
, ~
.
.
,
.
.
,-,
.
.
.
.
In the case where we replace the previous f by f + 1, i.e., we choose for example s - - z -1 + z 4 and r l + z -1 then we can start with the
5.2.
COMPUTATION
OF D I A G O N A L P A S
239
multiplication with the V0 matrix to get
'30 7'0 ]
and thus, after this initialisation step, we can reuse the previous calculations. Thus all the V0 matrix does is to reduce the problem with a term f0 to a problem without a constant term. Note also t h a t if f0 - 0, V0 - V0. We have seen t h a t the extended Euclidean algorithm computes (reduced) PAs on the main diagonal of the Pad6 table. We could wonder whether we did not miss any, i.e., is there any other PA on the main diagonal of the Pad6 table in between Co,k_lao,k_ -a 1 and co,kao, ~. The answer to the latter question is negative. We indeed find them all. If the main diagonal cuts successive singular blocks in the Pad6 table, then the algorithm produces for each block precisely one PA, which is the reduced one in the left top corner of the block. The correctness of this statement should be clear if we identify the tck as the Kronecker indices of the sequence {fk}k>l and the ak as the Euclidean indices. Knowing that the denominator
a,~(z) = ao + ' . . + a,,,z '~ of an [a/a] approximant in the Baker sense should satisfy the Hankel system 9 ""
f
+l
"'"
fa+2 9
-.-
0
aa-1 9
ao
0 ~
,
ao#O.
0
It then follows from the definition of Kronecker indices and the interpretation as right orthogonal polynomials of the denominators t h a t these denominators are uniquely defined (up to a nonzero constant factor) when a is some Kronecker index tck and all solutions of denominators in between are polynomial multiples of a0,k. The same polynomial multiples will then occur in the corresponding numerators c0,k. Thus all PAs t h a t might exist between two successive Kronecker indices tck and tck+l can be represented in the reduced form cO,kao,k. -1
C H A P T E R 5. PADE A P P R O X I M A T I O N
240
In conclusion we can say that the (main) diagonal algorithm computes the PAs along a diagonal, precisely one per singular block the diagonal passes through, namely its reduced left top element. If we are interested in other approximants than those on the main diagonal, then we can use the following theorem (see [7, Theorem 1.5.4]). T h e o r e m 5.4 ( T r u n c a t i o n t h e o r e m )
Given f ( z ) -
~
fkz k ~
F[[z]].
Define fk = 0 for k < O. Define for some shift parameter a C Z oo
f~(z)-
~
~ f(z)fkz ~+k and p ~ ( z ) - [ 0
z-"f,,(z)
if a < 0 if a > O
k=-a
Suppose a(z) and c(z) are polynomials satisfying ord (fa(z)a(z) - c(z))
>
deg a(z)
_
_ 0. Then
f~(z) - z~ f ( z ) and p~(z) - O. From the defining relations for a(z) and c(z), it follows that the solution has got to have the form c ( z ) - z~c'(z), with c'(z) a polynomial of degree a - a at most (it is zero when a < a). Hence ~ ( z ) - z - ~ c ( z ) - c'(z) has a degree as claimed. Also the order of approximation is as it should be since
f(z)a(~)
-
e(z)
-
z-.(f~(~)~(z)- ~(z)).
This gives the proof for a _> 0. Now let a < 0. Then
f~(z) p~(z)
--
f_o.
-~- f _ o . + l
Z .4- . . .
--
f o -~- f l Z .-~ . . . ..~ f _ o . _ l Z
--o'--1
5.3. COMPUTATION OF ANTIDIAGONAL PAS so that f(z) - p,,(z) + z-"f,,(z). degree < a - a and the order of f(z)a(z)-
241
Thus ~(z) - p,~(z)a(z)+ z -'rc(z) has
e(z) :
_
is as claimed.
[3
The theorem says that if we want to find a PA for f(z) on diagonal a, we can always produce one starting from a diagonal 0 approximant for the series f,~(z). Thus it is sufficient if we know how to compute PAs on the main diagonal of the Pad~ table. Another fact that can be investigated is an adaptation of the atomic Euclidean algorithm to the case of series in F(z). It turns out that if we do this carefully, that we can obtain approximants that are on a downsloping staircase path in the Pad~ table. In the special case of a normal Pad~ table for example, the adaptation of the atomic Euclidean algorithm can be arranged such that it decomposes one step of the extended Euclidean algorithm into two more elementary steps. The first one will find an approximant whose order of approximation is one higher and which is located on the neighboring diagonal. The second one adds again one to the order of approximation and brings us back on the main diagonal again. One can imagine how this has to be extended to the nonnormal case where blocks may occur in the Pad~ table. For further details we refer to [38, 39, 87] etc. In Section 5.4 we discuss the Viscovatoff algorithm which also computes a staircase. Also the Berlekamp-Massey algorithm is an atomic version of the diagonal algorithm and it computes staircases. We shall come back to this in greater detail in Section 5.7. In the paper by Brent, Gustavson and Yun [22] a divide-and-conquer strategy was applied to speed up the algorithms.
5.3
Computation
of antidiagonal
PAs
The idea which was used in the previous section to translate the results for series in F(z -1) into corresponding results for series in F(z) was a transformation z ~ z -1. This resulted in an algorithm which computed the reduced elements in the sequence [ a - a / a ] for a - O, 1 , . . . , where a refers to the chosen diagonal in the Pad~ table, a - 0 refers to the main diagonal, negative a's number diagonals below it and positive a's refer to diagonals above the main diagonal. Hence, the downward sloping diagonals in the Pad~ table,
242
C H A P T E R 5. PADE A P P R O X I M A T I O N
which are described by a constant a, will be called a-lines for short. The upward sloping antidiagonals are orthogonal to the a-lines. These are lines where the lower bound w + 1 for the order of approximation is constant. We shall call them w-lines. Thus the coordinate net of fl's and a's can be replaced by another orthogonal net which is rotated over 45 degrees and which numbers the diagonals in the Pad6 table. The diagonal algorithm discussed in the previous section computes Pad6 approximants along a a-line. To compute the PAs along an w-fine another technique is used than the one proposed in the previous section. Note that the length of an w-line is always finite. If we have to fit the first w + 1 coefficients of f e F[[z]], then it is sufficient to know only h for k = 0 , . . . , w. The remaining coefficients do not enter the computations and could as well be thought of as being zero. Since an w-line starts at the point (w, 0) and ends at the point (0, w), we immediately know how to start the computations since [w, 0] - ~ = o h z k / 1 is obviously the simplest element in [w/0]. As on a a-line, the algorithm should compute the elements on the current w-line, or rather only those that correspond to the first entry found at the intersection of the w-line and the blocks in the Pad6 table. However, it will be most convenient now to use a monic normalization for the denominators of the approximants. If we consult Table 5.1 again, we see that, if the w-fine intersects a block below its main antidiagonal, then according to the Baker definition, (left-hand side table in Table 5.1) the PA does not exist there. Since we do need a representative for each block, we shah use the Frobenius definition (righthand side table in Table 5.1) and use a reducible (by some power of z) PA (below the antidiagonal) as a representative of the block. Note that we can not take the reduced left top corner element as in the previous section, since this one will not meet the required order of approximation imposed by w if the intersection is below the antidiagonal. Hence the monic normalization for the denominator (rather than a comonic one) is a correct choice here. The use of the Euclidean algorithm in this context is based on the following observation. Let f be a series from F[[z -1]]. The extended Euclidean algorithm gave the relation a-1 a-1 f - c o , k O,k-- rk o,k which can obviously be written as
This means that we can freely interchange the role of numerator and residual, certainly if all the series involved are actually finite series, i.e., polyno-
5.3.
COMPUTATION
OF A N T I D I A G O N A L
243
PAS
mials. Here we can choose for f the strictly proper series =
k=0
k=0
From the analysis of the extended Euclidean algorithm applied to a series -1 from F[[z -1]], we know that the degrees of ao,k and c0,ka0, k are nondecreasing while the degree of rka0-,~ is nonincreasing. Taking rk in the role of numerator and c0,k in the role of residual, this is precisely what we want to obtain when walking along an w-hne from left to right. The desired quantities then will only need a shifting back with the power z ~+1. This leads us to the following theorem which is an analog of Theorem 1.13. T h e o r e m 5.5 Let f ( z ) - E F : o h zk e F[[z]] be given. Let s - - z w+l and r - ~ f k z k. Apply the right sided extended Euclidean algorithm with the initial matrix
I- ol
G-l-
This means F \ { 0 } and vk - 0 and polynomials Set
the following uo - C o - O. ak - - ( s k - 1 and should be
1
0 8
.
r
9 Choose nonzero constants vo and ao C F0 F o r k >_ 1 and until rk - O, choose uk, ck C Fo, l d i v r k _ l ) c k . The ldiv operation acts here on considered as a division of elements from F ( z - 1 ) .
V k _ [ v k uk
ck] ' ak
k > O,
and generate G k -
a-
x V o V~ . . . V k
-
vO,k
-I cO,k |
ltO,k 9Sk
aO,k J 9 rk
Then the following relations hold
degao-ao-O,
anddegak-ak_>
1,
k>_ 1
k
deg ao,k - ~k - ~ i = o ai ordvo,k >_ w + 1 and ordco,k >_ w + 1. degrk--w+l-t~k+l ord (fao,k - rk) - ord co,k >_ w + 1. The rational fractions r k a o,k - 1 ate P A s of f of type (w - ~k, ~k). This means they are approzimants on the w-line number w in the Padd table of f .
C H A P T E R 5. PADF, A P P R O X I M A T I O N
244
The algorithm shall eventually end because rn will become zero for finite n ~k - deg ao,k, it follows that rkao--,~ satisfy the degree-order conditions of a Pad4 approximant (in Frobenius sense) on the indicated w-line.
5.3.
COMPUTATION
OF ANTIDIA
GONAL
245
PAS
By Theorem 1.16, Co, k~ and a0,k are right coprime. Since by f a o , k - C o , k rk, any common right divisor of c0,k and a0,k is also a common right divisor of a0,k and rk, it follows that the only common divisor that rk and ao,k can have is a power of z. To show that the degree of the denominator is as low as possible, we should prove that if rk and a0,k have a common factor z, then it is impossible to divide out this factor in the relation f a o , k - rk = co,k without violating the order condition, thus that if a0,k(0) = 0, then ord c0,k is exactly equal to w + 1. To prove this, we introduce
[1121w12],2 Note that Wo,,~ is a polynomial matrix (see e.g. Theorem 1.4). Thus we have 0
1
uo,k
aO,k
1/321 1022
or explicitly __
VO,k1011 -Jc CO,k1/321
(5.11)
0
:
'00,k1012 ~ C0,k1022
(5.12)
0
=
U0,k 1/311 "~- ao,k w21
(5.13)
1
--
U0,klOl2 -[- a0,kl/322 .
(5.14)
ZW+ 1
From (5.11) it follows that min{ord vo,k + ord w11, ord Co,k + ord w21 } _< w + 1. Since ord v0,k _ w + 1 and ord c0,k _> w + 1, this is only possible if (ordvo,k=w+l (ordco,k=w+l Now suppose that
aO,k(O) =
0,
& ordwll-O) - 0). & ord
(5.15)
i.e. ord ao,k _> 1, then
(5.13) :::> Uo,k(O)w11(O)- 0 1
(5.14)
or
1
f
11(0) = 0.
But if w11(0) = 0, then ord wll _> 1 and thus by (5.15) we find that ord co,k = w + 1, which confirms what we said before. Thus the degree of a0,k is indeed as low as possible and this means that the approximants rka0--,~ are Pad6 approximants in Frobenius sense.
CHAPTER 5. PADJE APPROXIMATION
246
T h a t the algorithm will end is obvious since the degree of rk is strictly decreasing with k (because ak _ 1) and therefore, one shall end up with an rn - 0. The computations actually correspond to the (extended) Euclidean algorithm computing the greatest common left divisor o f - s and r. When rn has become zero, then sn is the g.c.l.d, of s and r. [::] Note that we do not necessarily have ord
( f - rka 0,k) -1 > w + 1
since we are not sure that a0,k(0) ~ 0. T h a t this constant term may indeed be zero is illustrated in the next example where X2 is reducible and a0,2(0) 0. E x a m p l e 5.3 Take the example w - 6, the initial matrix G-1 is
G~I
f(z) - 1 + z 4 + z s + z 9 + . . . .
z7
0
0 -z 7
1 1 + z4 + zs
Choose
We choose the matrix V0 to be the identity matrix, so that Go Consequently, we have the initial approximant -1 7'0,0a0, 0 :
G-1.
1 -~- Z 4 -[- Z 5
In the rest of this example, we shall always choose the constants uk and ck such that - s k as well as a0,k are monic polynomials. This is always possible to obtain: in the initialisation step by an appropriate choice of the m a t r i x V0 and in every subsequent step ck is used to make ak monic and if this is monic, then also a0,k will be monic by induction. On the other hand, uk which is used to transform r k - 1 into sk, can be used to make - s k monic (see the normalization Theorem 1.14). For example we can choose ul - - 1 , while we can take cl = 1. Thus we get
iv1 c1] [01 1] U1
a 1
z2- z + 1
--
Note that al is the quotient of z 7 divided by z S + z 4 + 1. This gives
Go,l-
I
0
-1 -z s-z 4-1
Z7
z2-z+1 z4+z 2-z+1
1
-1
andr1%,l =
1--Z+Z2+Z 4 l-z+z
2
5.3. COMPUTATION OF ANTIDIAGONAL PAS
247
T h e n e x t V - m a t r i x is given by
iv2 c2] [01 1] u2
a2
-
z + 1
w h e r e c2 = - u 2 = 1 a g a i n a n d a2 is t h e q u o t i e n t of z 5 + z 4 + 1 a n d z4+z 2-z+l. So we find t h a t
I Go,2 -
--Z7
Z 8 q- z 7
-z 2 + z- 1 -z 4 - z 2 + z- 1
z3 z3
N e x t we find
,03 C3 ] '/t3 a3
V3
I
z3 a n d r2ao, 2 = z3.
_[o i] --i
Z
and
I _ Z8 _ Z 7 _Z 3 Go,3 : _Z 3
z 9 + z 8 _ z ~' z4- z 2 + z- 1 I -z2+z-1
-l+z-z a n d r3ao, 1 -
2
--l+z--z2+z
4"
A n e x t step gives
~34 ~4
V4
0 1
C4 ] --
a4
-1 l+z
]
and I
Go,4 --
z9 + z 8 z4-
z7
z 2 + z-
z 10 _[- 2z 9 + z 8 z s + z4-
1
-z2+z-1
1
-1 a n d r4ao, ~ -
T h e last step gives
Y~ --
us
1
z 1~ + 2z 9 + z a
I
z s + z4-1
+ z 4 + z S"
[0 1]
a5
1 - z + z2
with
Go,5
-1
--1
1
z 12 ~-
Z11 _~-Z 7 z7 0
1 .
T h e successive solutions we o b t a i n e d are i n d i c a t e d by an X in t h e P a d ~ t a b l e of f ( z ) in F i g u r e 5.2.
248
CHAPTER
Figure 5.2" Pad6 table for Antidiagonal path
f(z)
-
5.
PADE
APPROXIMATION
1 + z 4 § z S -Jr- z 9 -+- z 1~ -+- z 14 -}- . . . .
0 1 2 3 4 5 6 7 8
II
I
! I•
-1 0 1 "i'"i'"i ....... X;!"!"i'" 2 : X3: : :X2 3 4 I J• ..... 5 6 Xo: 7 8 o
r ,
, . ~ ,
, - , ,
,-,
,
,_-
:
e , ,
,
.
.
.
.
.
-,
~
,
, ' , ,
, 7 ,
,',
,
,r:
,-,
,
~
:
_-,
0
; ~
,
r
, ;
,
,
il
,
,'0
~
~
,
,-,
,
~
o
~
,
~
,-,
,
-
-
,
,
,
"~
.
Note that the entry X2 is reducible since it is below the antidiagonal of its block. All the other entries indicated are irreducible because they are in the upper left triangular part of the blocks. The entry Xs is purely artificial. It corresponds to the "approximant" in G0,s which is O / z ~. We shall say more about it in Section 5.6.
This version of the Euclidean algorithm is often called the Kronecker algorithm [176] (see [7]). Also Chebyshev was aware of this algorithm. In [58] he describes how to expand a rational function into a continued fraction and this corresponds to what we have described in this section.
5.4
C o m p u t a t i o n of staircase PAs
In the previous sections, we have considered the computation of PAs on a diagonal or an antidiagonal PAs, depending on the initial conditions and on whether the algorithm is working on series in z or series in z -1. The fact that for the antidiagonal, the algorithm stops when the upper border of the Pad~ table is reached does only happen because one starts the algorithm by taking a polynomial as the initial condition. If we would add to this polynomial infinitely many negative powers of z, then the algorithm could go on just as it does in the case of series in z on a downward sloping diagonal. So
5.4.
COMPUTATION
OF
STAIRCASE
249
PAS
if we consider bi-infinite Laurent series, then depending on whether these are truncated on the side of positive powers or on the side of negative powers, we can compute an infinite diagonal in one direction or the other. The transformation z ~ z -1 then flips the corresponding bi-infinite (half plane) Pad4 table upside down along the horizontal axis and the duality is quite natural. This is the idea used in the construction of Laurent-Pad6 approximants which were considered in [41]. W h a t has been discussed for the diagonals can be repeated for staircases that alternate between two neighbouring diagonals. This is precisely what the Viscovatoff algorithm does. If we consider the general Viscovatoff algorithm with the parameter v 6 {0, 1} as described in Section 1.7, then the case v = 0 corresponds to what has been explained in the foregoing sections and the algorithm computes PAs on a diagonal or antidiagonal. When the Viscovatoff algorithm is applied to series in F(z>, with the parameter v set to 1, then it follows from Theorem 1.18 that the successive elements which are computed are the relevant elements on a staircase path of the Pad6 table. E x a m p l e 5.4 One can have a second look at the Example 1.8 and it can be observed that the elements which are computed are located on the staircase path as given in Figure 5.3. All of them are Pad6 approximants in the Baker sense. Figure 5.3:Pad4 table for Staircase path
= z + z 5 -4- z 6 + z 1~ -4- z 11 + z 15 _[_ ...:
f(z)
012345678 .
~o: ,-,,- , ,
,,i,,:~
-. '
: .
"
,
,"
]
.
.
.
.
.
[
~
.
. . . . . . . . . .
..................
.
o,
.
:..:,.?,,:
250
C H A P T E R 5. PADE A P P R O X I M A T I O N
Using the Truncation Theorem 5.4, it is possible to compute other staircases parallel to the main one. For more details see [7, Section 4.2] or [38]. When the Viscovatoff algorithm is applied to polynomials which are considered as truncated power series in z -1, one can also compute upward sloping staircases. In this case, like for the antidiagonal, not all the approximants computed will be Pad~ approximants in the Baker sense because the approximant which is computed is the first one that one finds in a block at the place where that path enters a block. Since an upward sloping staircase can enter a block either from below or from the left, one may have a reducible Pad6 approximant in the Frobenius sense when the p a t h enters the block from below but not in the first column of the block. In that case the first entry that is find in the block is below the main antidiagonal of that block where no Baker type Pad6 approximant exists. We have identified the Berlekamp-Massey algorithm as a variant of the Euclidean algorithm in Chapter 1. Thus it will obviously compute diagonal paths in a Pad6 table. However, the original idea behind the design of this algorithm is the construction of a minimal annihilating polynomial. This polynomial can be chosen as the denominator of a rational approximant that is "as simple as possible" and approximates up to a certain order. Depending on what is defined to be "as simple as possible" or "minimal", one can generate diagonals or staircases. This leads us to the notion of minimal Pad6 approximation. Minimal Pad~ approximation is a translation of the notion of minimal partial realization. The latter is usually discussed in a framework of series in z -1. Translating this to a framework of power series in z, as we have done several times before in this book, we obtain minimal Pad~ approximants. This is what we shall do in the next sections.
5.5
M i n i m a l indices
Before we discuss minimal Pad6 approximation, we introduce in this section the notion of minimal indices and minimal annihilating polynomials. These are needed to connect the denominator and numerator degrees of minimal Pad6 approximants with the Kronecker indices. In a sense, the way minimal indices are defined is an alternative way to characterize the Kronecker indices. Here is the definition.
Definition 5.4 (minimal indices, mAP)
Let
f
-
{fk)k>~
sequence and r > 0 an integer. Consider the Hankel matrices //~-~-1,~,
a = O, 1 , . . . , r
1
be a given
5.5. M I N I M A L I N D I C E S
251
]k,l
with Hk,t - [fi+j+lji,j=o. We define H-I,r as an empty matrix with zero rows and r + 1 columns. Let a(0; f ) - 0 and for r > 1, we define a(r f ) as the smallest integer a >_ 0 for which the homogeneous system
I~176176 9
-
.=
9
,
(5.16)
0
has a nonzero solution. The a(r f ) is called the minimal index of order r and a(z) - ~/]~ akz k is called a minimal annihilating polynomial ( m A P } of order r for the sequence f . If we impose the extra condition aa 7~ 0 in (5.16}, then we call a(r f ) the constrained minimal indices and a(z) the constrained m A P . This definition means that a(r f ) is the smallest a for which the columns of Hr are linearly dependent. The matrix has a kernel or right null space, that is of dimension at least 1. Any vector from the kernel defines a minimal annihilating polynomial. If we consider H = [f~+j+l] as a moment matrix, we can rephrase this by saying that a(z) - ~ akz k is the right orthogonal polynomial of degree a at most which is right orthogonal to all polynomials of degree < r a with a as low as possible. We represent this in a picture so that the relation with the Kronecker indices will become obvious. Let H be the infinite moment matrix and A the unit upper triangular matrix whose columns are the stacking vectors of the block orthogonal polynomials. Then ATHA - D where D is block diagonal with block sizes ak. Each block is a lower Hankel matrix. This block diagonal D is depicted in Figure 5.4. The orthogonality requirement formulated above means that we should walk along the diagonal r from left to right and find the first column for which there exists among all columns considered so far, one which is completely zero, on and above the C-line. There are two possibilities to consider: either the C-line cuts a block above its main antidiagonal or not. This corresponds to the left and right picture in Figure 5.4. In the first case (to which we shall refer as case A), there is no doubt: the minimal index is a Kronecker index of the sequence fl, f2, .... This is true for the constrained as well as for the unconstrained case. The column always coincides with the starting point of a new block, i.e., a(r f ) is a Kronecker index. If we denote the Kronecker indices of fl, f 2 , . . , by ~k and the Euclidean indices
252
CHAPTER
5.
PAD.E A P P R O X I M A T I O N
Figure 5.4: Minimal indices and orthogonal polynomials
~(r
r
.
a1(r
~2(r
r
/
/
/
/
/
/
/
/
/
/
/
/
II
/ /
/
/
/
I /
/
P
/ /
/
,/
/
/
/ /
by ak, then in case A case A"
a(r f ) -
nk
for
2~;k _< r < 2nk+l - a k + l .
As a m A P we should take the true right orthogonal polynomial ( T O P ) a0,k of degree ~k. The second case, when the C-line cuts a block on or below its main a n t i d i a g o n a l - see the right-hand side situation of Figure 5 . 4 will be called case B. It now depends on the minimal indices being constrained or not what value they will have. When we are in case B and choose the constrained minimal indices, (case Bc) then the m A P is an orthogonal polynomial that should be of strict degree a and then the only possible choice for the minimal indices is the next Kronecker index, which is indicated in the figure by a2(r Hence for the constrained minimal indices, we have case Bc:
a(r f ) - ~k
for
2~k - ak _< r < 2~;k.
As a m A P we can not only take the TOP a0,k, but also any polynomial of strict degree ~;k which is right orthogonal to z i for i - 0 , . . . , r ~ k - 1. That means any polynomial from the manifold
~0,k + ~p~n{~'a0,k_," i - 0,...,2~k - (r + 1)}. Their stacking vectors span the kernel of Hr
for a - ~;k.
253
5.5. M I N I M A L INDICES
If the minimal indices are unconstrained (case Bu), then we find the minimal indices to be caseBu:
a(r162
for
2,r162
This is indicated by a1(r in Figure 5.4. For the mAP we can take the TOP ao,k-l(z). Note however that for r = 2,r 1, the minimal index is a(r f) = 'ok, thus for that particular value of r we can choose any combination of a0,k and a0,k-1 as a mAP. This exceptional value r = 2,r - 1 with 'on a Kronecker index, will be needed several times in the sequel. For further reference, we call it an E (for exceptional) situation or an E value of r We summarize in the following theorem what has been found so far. T h e o r e m 5.6 Let f l , f 2 , . . , be a sequence with Kronecker indices 'ok and Euclidean indices ak. Then for given r > O, the (unconstrained) minimal indices are given in case A by
a(r f ) -- tCk
2tCk_< r < 2~k+1 -
for
O:k+l
with corresponding m A P equal to ao,k. In case B, they are
a(r162
for
2,r -- at: 0 the numbers { a k } ~ = l such that -
j-a,a+l,...
_
k--1
with a as small as possible. The Massey algorithm (for a - 1) does compute the coefficients ak and the minimal number a needed on a recursive basis, i.e., they are rechecked and updated when needed as the data fk become available.
Chapter 6
Linear systems and partial realization This chapter will serve to introduce several concepts from linear system theory and to give an interpretation of the previously obtained results in this context. Since it is our experience that specialists in approximation theory are not always familiar with the notions and the terminology from linear systems, we think that it is worth introducing the basic concepts. Eventually we shall work with and apply our results to very specific systems that can be defined quickly. However, we think that then the meaning of certain assumptions will be lost and in that case one is playing with mathematical objects that haven't any physical interpretation. This is what we want to prevent and therefore we start with the most general situation and introduce several concepts at the lowest possible level. We hope that this will give the reader a grip on what all the terminology means. Of course we restrict ourselves to the bare minimum since we do not have the ambition to rewrite any of the excellent books on linear system theory that already exist. The reader is warmly recommended to consult the extensive literature [36, 101, 165, 169]. Linear system theory is a place where many mathematical disciplines meet. In harmony with the previous chapters of the book, we shall give a rather formal algebraic treatment. So most of the definitions will be given for a commutative field. However at a certain point this will not be possible anymore. We then switch from formal series and rational forms to functions of a complex variable. In contrast with this relatively extensive introduction, our interpretation of the results of previous chapters will be relatively short. Most of the theorems will be just reformulations of earlier ones in a different terminology 271
272
CHAPTER
6. L I N E A R S Y S T E M S
without needing a new proof. Other results, like the theory of orthogonal polynomials with respect to a Toeplitz moment matrix, as we have discussed it, will be definitely too thin to serve system theoretic demands. In both cases, the discussion will be brief.
6.1
Definitions
In this section, we shall introduce the notion of a hnear system and several ways to represent them. To give the reader insight into these mathematical objects, called hnear systems, we shall develop some results without demanding much other mathematical background. We start by defining what we mean by a signal.
Definition 6.1 (signal) Let F be a field and ~ C_ R a time set.
Then a scalar signal is a f u n c t i o n defined everywhere on ~ with scalar values in F. Similarly, a vector signal is a f u n c t i o n ~ ~ yr and a m a t r i z signal is a f u n c t i o n ~ ~ Fpxm with the integer numbers p, m > O. The set of all possible signals is denoted as S.
Note that S is a vector space over F. It is in fact equal to 1~, the set of mappings from lr to F (or of (IP') ~ for the vector signals). In this chapter, we shall only work with scalar signals; if not, this will be stated explicitly. Two choices for the time set 11are of particular interest. When ]I - R, the set of real numbers, we have a continuous time set and the signals are continuous time signals or continuous signals for short. On the other hand, also 1[ - Z, the set of all integers, is important. This is a discrete time set and the signals are called discrete (time) signals. Now we can introduce the concept of a system with a single input and a single output. It is basically a device that maps certain (input) signals into (output) signals.
Definition 6.2 ( S I S O s y s t e m ) A single-input single-output s y s t e m or SISO s y s t e m is a triple (U, Y, 7( ) where U C S is a set of input signals, Y C S is a set of output signals and 75 9 U ~ 3{ is an input-output map. connects with each input signal u C U an output signal y - TI u C 3{.
7(
This is a rather general definition. In the sequel, we shah impose further limitations onto the set of systems to get more interesting subsets.
Definition 6.3 ( l i n e a r s y s t e m ) If the input and output sets U and Y are subspaces of S, then we call the system (U, u 7"l ) linear if 7-[ is a vector space homomorphism.
Thus
273
6.1. D E F I N I T I O N S
Because we have an ordering on the time set, we can speak about the past, the future and the present with respect to a certain time r. D e f i n i t i o n 6.4 ( p a s t , p r e s e n t , f u t u r e ) For every ~" E ~ we can partition the time set into
If we consider the singleton { r } as the present, then we refer to the first set as the past and to the last set as the future. The signal space S can be written as a direct sum of three subspaces
where S~+ is called the space of past signals: it contains all signals that are zero on the present and future, S r- is the space of future signals" it contains all signals that are zero on the present and past and S~ is the space of present signals" it contains all signals that can only be nonzero for t - 7. The projectors onto the past (present, future) signals are denoted as IV+,
II ).
e S,
H
) the
(p,
ent,
of the signal s with respect to 7. Occasionally we shall use combinations: e.g. (II~_ + II~)s is called the past and present of s and it will be abbreviated as II~_0s etc.
Note that we have indicated the past by a "+"-sign, the present by a "0" and the future by a "-"-sign. The reason for this will become clear later on when we define the z-transform following the engineering convention. We introduce now the concept of causality. D e f i n i t i o n 6.5 ( c a u s a l i t y ) Let the space of input signals U be projection invariant, i.e. VT C ~ " IIT"U C U, where IIT" is any of the projectors introduced above. A system is called causal or dynamical if with respect to all possible times T the present input signals can only influence present and future output signals" 7"[ H~U C II~_u for all 7" C ~. A strictly causal system is a system where present input signals can only influence future output signals, i.e. 7-l II~U C IV_u Note that causalitycan be expressed in other, equivalent ways. A system is causal if and only if two input signals with the same past, have corresponding outputs with the same past" VT E ][,
VUl,U2 C U
7"
7"
9 I I : u I -- I I + u 2 ==~ H + ~ ~ u I -
7"
H+']-~ U 2.
274
C H A P T E R 6. L I N E A R S Y S T E M S
Equivalently I I ~ _ u - 0 => II~_~ u - 0 or 7-/II~_U C_ II~_Y. Yet another way to describe causality is II~.~ II~_ - 0 for all ~" C ]I. In the sequel we shall use the latter expression, but the other interpretations may be useful too. Now we define time invariance of a system. D e f i n i t i o n 6.6 ( t i m e i n v a r i a n t ) Let the time set ~ and the space of input signals U be shift invariant, which means VT, t C ~ " t - 7 C ~ and Z - ~ U C_ U, where Z - r is the backward shift or delay operator, defined on S by Z - r s ( t ) s(t - T), s E S. A system (U, u 7-/) with a shift invariant set U of input signals is called a constant or time invariant system if Z -~" and 7[ commute for all 7 E ~, i.e. 7-[ Z - ~ u - Z-~7-l u. Note that the continuous time set R as well as the discrete time set Z are shift invariant. To obtain further results, we shall limit the discussion for the m o m e n t to discrete linear time invariant causal systems. The time set wiU now be ]I = Z. Moreover, the set of input and output signals will be equal to the set of all possible functions Z ~ F: U = u = S = F ~. A parallel t r e a t m e n t can be given for continuous systems, but we do not want to complicate the exposition more than strictly necessary. Note that in the discrete case, the signal space S can be described in terms of the canonical basis consisting of shifted impulse signals. D e f i n i t i o n 6.7 ( i m p u l s e s i g n a l ) The impulse signal ~ is the Kronecker delta function at O, i.e., ~k = 1 for k = 0 and zero elsewhere. W i t h the impulse signal and the shift operator, we can describe the signal space S as span~{Zk6 9 k 6 Z}. It will be useful in certain instances to describe a system in an isomorphic framework. The isomorphic image of the signal space S will turn out to be the space of two-sided formal Laurent series, which is defined as follows. D e f i n i t i o n 6.8 ( t w o - s i d e d f o r m a l L a u r e n t s e r i e s ) The space of the twosided formal Laurent series F(z, z -1) is the set of all mathematical objects of the f o r m ~ = - o ~ sk z - k with sk E F. Clearly F(z, z -1) is a vector space over F. It is isomorphic to the space S of discrete signals. The isomorphism is given by the z-transform. D e f i n i t i o n 6.9 ( z - t r a n s f o r m ) The z-transform s - ~ of a discrete time signal s C S is defined as the two-sided formal Laurent series whose coefficients are the signal values at the discrete times, i.e. ~.s - ~k=-oo+c~ skz -k with sk - s ( k ) , k C ~ - Z. It is clear that also the inverse z-transform
6.1. D E F I N I T I O N S
275
is defined for all two-sided formal Laurent series generating a signal whose values at the discrete times are the coefficients of the formal Laurent series. So we have a one-to-one relationship between s and ~. s is an isomorphism between S and F(z, z - l ) 9 S - / Z S - F(z, z-l}. The inverse z-transform is a generalization of what we described previously as a stacking operation. As we associated with a polynomial or a formal series a column matrix which was called its stacking vector, the inverse z-transform does basically the same thing. To describe the properties of a discrete linear time invariant causal system, we can work with the space S, but it is also possible to describe it in its isomorphic image S - F(z, z -1). This has been given a name. D e f i n i t i o n 6.10 ( t i m e d o m a i n , f r e q u e n c y d o m a i n ) The signal space S is called the time domain and the space of two-sided formal Laurent series is called the frequency domain. The signal space S is shift invariant with respect to the shift operator Z. That we have represented the shift operator by Z is not a coincidence because the corresponding operator Z in S defines a multiplication with z. The reader will recognize it as the down shift operator for stacking vectors. We shall often write the multiplication operator ,~ simply as z. Thus S being shift invariant means that Vs E S,VT E I[ : s
= z~s
which can also be expressed a s / : Z - 2 ~ o r / : - 2 - 1 ~ Z - ZZ:Z -1. Not only the shift operator in S has an isomorphic equivalent in S. We can generalize this to any operator acting on S. We shall use the convention that an operator T acting on S has a corresponding operator in S which is denoted by T. They are related b y / : T - T/:. This holds especially for the input-output map 7-/ and the projection operators we introduced before. Adding the shift operation to the vector space structure of F ( z , z - 1 ) , we see that the multiplication of a two-sided formal Laurent series with a polynomial is well defined, because each coefficient of the result can be computed with finitely many operations in F. Multiplication with a formal Laurent series, two-sided or not, is in general not defined. Recall however that multiplication of elements in the subspaces of formal Laurent series F(z} or F(z -1 } are well defined. In the frequency domain S - F ( z , z - 1 ) , the properties of our linear system can be reformulated as linearity:
is a linear operator on F(z, z - l ) .
C H A P T E R 6. LINEAR S Y S T E M S
276
" T ~ II"T0 _ - 0 f o r a l l T E l I II+
causality:
7:t 2 - 27:t.
time invariant"
Note that time invariance implies Z-17-/Z = 7-I which means that with respect to the canonical basis in S, the operator 7/ must have a matrix representation with a bi-infinite Toeplitz structure. Suppose that we are considering the past, present and future input and output signals with respect to a certain time T = k C lI = Z. Then in the frequency domain, the past, present and future subspaces correspond to
n~
_
~-k+~F[[z]] aoj F~
~ik__.~
__
z _ k _ 1 F[[Z_ 1 ]]
de__fFk (Z_ 1 ).
Note that the following property holds for the projection operators
HI + H0~ + rt ~_ - z
(6.~)
and for II any of the possible projection operators, we have
Hk -
Z - k I I ~ k.
(6.2)
In general one can write for s = {sk}k C S that IIko_3 -- B k Z - k ~
-3l-
IIok_ +1 s.
(6.3)
Suppose that we subdivide the signal space S into the direct sum S-
S~_ | Sok_ with S~_ - H~_S and Sok_ - IIok_S.
Accordingly, the effect of an operator can be split up in four parts, e.g. for the operator 7-/:
-
+
+
no
+ no
no _.
(6.4)
For example II~_7-/II0k_ maps the present and future into the past. For a causal system, we know that this is zero. It is of practical importance to describe what the reaction of a system will be at present and in the future when the system is driven by an arbitrary input signal. Thus we should be able to describe the effect of II0k_7-/. As we just said, for a causal system, the second term in the decomposition - II~_7/II~_ + II~_7/II0k_ gives zero anyway. If the input signal has
II~_7-I
6.1. DEFINITIONS
277
no past, then the so called impulse response of the system is sufficient to describe its full effect. However, if the input signal has a past, then we shall need the notion of state of a system. The state condenses in some sense the complete history in a compact form. The state depends on the past of the input signal and the set of all possible states a system can reach at a certain m o m e n t is the state space. These are the i m p o r t a n t concepts to be introduced next. We start with the impulse response. For a shift invariant system, the output can partly be described in terms of the impulse response which is the output generated by the system when driven by the impulse signal. D e f i n i t i o n 6.11 ( i m p u l s e r e s p o n s e ) The impulse response h of a discrete time system is the output signal produced by the system when the impulse ~ is given as input signal, i.e. h = 7-[ ~ or in the frequency domain formulation h - ~ ~ - ~ 1. The impulse response has the following properties. 9 If the system is time invariant, we know that zh - 7~ Z~ Zh = ZTI ~ = 7"l Z~.
-
7~ z
or
9 If the system is linear and time invariant, ]~(azk+flz l) - ~ (azk+flzt), or ( Z k h a + Zlht3) - 7"l ( a Z k + flZt)6, 9 If the system is causal, we know that II~]z - 0, i.e., ]z E F[[z-1]] F_l(Z -~) or h C S~ If the system is strictly causal, we know t h a t l~I~0]z- 0, i.e., ]z e F0(z -1 ) or h e S ~ Now suppose that u C Sok_ has a zero past. Then H ~ u - 0. If moreover the system is causal, then we can see from the decomposition (6.4) t h a t
y=7-lu
-
iIo _
=
Ilok_g
=
Ho
=
h.(II0k_u)-
h.u
where 9 denotes the convolution operator. D e f i n i t i o n 6.12 ( c o n v o l u t i o n ) If h and g are two discrete signals, then their convolution f = h . g is defined as another signal given by
f(j) - ~ iEz
h ( i ) g ( j - i),
278
C H A P T E R 6. L I N E A R S Y S T E M S
whenever this exists for all j E Z.
Note that these sums are infinite and need not converge in general. However when both h and g are only half infinite, i.e., when both have a zero past or future for some time instant, then these sums are actually finite and the convolution is then well defined. For example, in the situation above, h C S ~ and u C S0k_, so that their convolution y - h . u is only nonzero for time instances > k and J j f(k
+ j) -
h(i)
(k + j -
i=0
-
h(j -
+
j>o.
i=0
Thus y E S0k_. We may now conclude that we have proved the following theorem. T h e o r e m 6.1 Let (S,S, 7-I ) be a discrete linear time invariant causal system with impulse response h - 7-[ 5. For given k C Z, let u C Sko_ be an input signal with zero past. Then the output signal y = 7-[ u is given by the convolution of h and u: y-
7-l u -
h.u
k
C So_.
In the frequency domain, this translates into the following. Let it C S ko_ = Fk-1 (z -1 ) be given and 7~ a linear shift invariant map on ~ - F(z, z -1 ) with impulse response h - 7~ 1 e F_I (z -1 ). Then ~1 - 7~ iz is given by the product
f/-- ~ ~t-- h~ E Fk-l(Z -1). In other words, 7~ represents the multiplication operator with h in S.
Note that with respect to the canonical basis in S0k_, the convolution operator II0k 7-/II0k_ for a causal system is represented by a lower triangular Toeplitz matrix whose nonzero entries are the impulse response values. However not all input signals are in S0k_, so how about the contribution of the past component H~_u C S~_ to the present and future output? That is IIok 7-/IIku. To describe this we shall need the notion of state. D e f i n i t i o n 6.13 ( s t a t e , s t a t e s p a c e ) The state at time k of a discrete system with input signal u is defined as the present-future component of the output signal generated by the past of u which is shifted to time O:
The set of all possible states at time k is called the state space at time k and denoted by S k : s
-
c s~
6.1. D E F I N I T I O N S
279
We have the following properties. T h e o r e m 6.2 If the system is time invariant, then the state space S k is independent of the time k. If the system is linear, then the state space forms a vector space. P r o o f . We can rewrite the state space for time k as
s~-
z~iio~_~ ii~s =
Z kZ-klIo_Zkl-I Z-kII~
= n o_ ~ n~ z~s = n~
H~
For the second line we used (6.2) and for the third line the time invariance of the system while the last line follows from the shift invariance of S. The fact that the state space is a vector space is obvious. [3 Because the state space is independent of time k, we shall drop the superscript k from the notation. We are interested in systems with a finite-dimensional state space. D e f i n i t i o n 6.14 ( f i n i t e - d i m e n s i o n a l s y s t e m ) A discrete linear time invariant system is called finite-dimensional if the state space is finite-dimensional. Hence, for a discrete finite-dimensional linear time invariant system, we can span the complete state space by some finite basis of signals from So_, e.g. yl, y 2 , . . . y n if the state space has dimension n. We can now completely describe the effect of II0k_H with the help of the state space basis and the impulse response. We get the following result. T h e o r e m 6.8 For a discrete linear time invariant causal finite-dimensional system with state space dimension n and state space basis signals yl, y 2 , . . . , yn, and impulse response h, we can write the present-future component of the output signal at time k as IIko_"]-~ "Ul,- z - k [ y
~ith z-k[y ~ y ~ . . . r
1 y2 . . .ynjT, k ._[_h .
(IIko_'U,),
x k C:_.~ ' ,
n~o_n H~_~.
P r o o f . We know from Theorem 6.1 that II0k~ u
- n0~_ n ~0~_= + n0~_ n n~ = h 9 (n0~_=) + z -~z~n0~_n n~
(6.5)
C H A P T E R 6. L I N E A R S Y S T E M S
280
Because the system is finite-dimensional, we can interpret in the second t e r m ZkIIko_~ II~.u ~s a state and write it as a linear combination of the state space basis signals y l , . . . , yn determined by the coefficient vector xk. D
Note that, given the state space basis, the vector zk characterizes the state of the system uniquely. Note also that the state space is independent of time, but the vector zk is not. D e f i n i t i o n 6.15 ( s t a t e v e c t o r ) Once we have chosen a state space basis for a discrete linear time invariant causal finite-dimensional system, we can write the state at time k of the system uniquely as a linear combination of these basis vectors. The vector containing the coordinate coefficients is called the state vector x k. To describe the time dependence of the system, we should be able to describe the evolution of the state, or equivalently, of the state vector. In the next theorem, we give the relationship of the next state vector Xk+l and the output value yk in function of zk and uk. Such a description of a system is called a state space description. T h e o r e m 6.4 ( s t a t e s p a c e d e s c r i p t i o n ) For every discrete linear time invariant causal finite-dimensional system the present-future output II'o_ TI u for any input signal u, can be described by a state space description, characterized by a quadruple (A, B, C, D), as follows. Given an initial state xi, it holds for k >_ i that zk+~ -
Azk +
A E ]~x,., B E ]W xl
Buk,
CE]~
Yk -- cTT, k + Duk,
xl
DEF.
Vice versa, every quadruple (A, B, C, D) of the given form can be interpreted as the characterization of a state space description of a discrete linear time invariant finite-dimensional system. For a strictly causal system, i.e. with D = O, we can characterize the state space by a triple (A, B, C) instead of a quadruple. A is called the state transition matrix. P r o o f . If we can prove the theorem for i - 0, the same result will also be valid for i r 0 because the system is time invariant. We start from (6.5) in Theorem 6.3
n0 _n
- H0 _y - z-
[y
+ h,
6.1.
281
DEFINITIONS
Now we use property (6.3) to rewrite
1Io~_ y
-
y~ z -~ ~ + no~_+a y
no~_.
-
. ~ z - ~ 6 + Ho~_+'.
II~ yJ
-
yJoS + Il lo_ yJ , j - 1 ,
-
ho~ + n~_h.
n~
The signals Z I I ~ _ y J and Z I I ~ h we can write
. . .,n
are both in the state space and therefore n
znx_yJ
E
yiaij,
aij C F
i=1 n
ZH~_h
E
Yibi,
bi G F.
i=1 Using all this we get yk z -k ~ + Hko+_~U
z-~[yo~ yo~...y~]~ + Z - k - l [ y l y2 . . . y ~ ] A z k +
(h05 if- z - l [ y 1 y2 . . . yn]B ) , (,U,k Z - k 5 .4_ ii0k+lu) where the components of A 6 F nxn and B 6 F nxl are aij and bi respectively. Taking the projection IIok results in
Yk -- C T x k + D u k , with C T - [ y ~
y ~ . . . y'~] 6 ~lxn and D - ho 6 F. The projection IIk gives
no~_+ly
z - k - 1 [yl y2 . . . y " ] A z k + Z - k - 1 [yl y2 . . . y " ] B u k +
h 9 (IIok_+lu) which should be compared with
Ho~+~y - z - ~ - i [ y x y ~ . . . y " ] ~ + ~ + h 9 (no~_+i=). Because the signals yl, y 2 , . . . , yn are linearly independent, we can write zk+~ - A z k + B u k .
This proves the first part of the theorem.
282
C H A P T E R 6. L I N E A R S Y S T E M S
It is clear that every quadruple (A, B, C, D) characterizes a state space description of a discrete hnear time invariant causal system. We prove t h a t this system is finite-dimensional. The state space is the set of the presentfuture output signals generated by all possible past input signals. In this case, the past is characterized by a vector x0. All possible present-future output signals with initial vector x0 (x0 represents the past input) and present-future input values uk - 0, k _ 0 have values yk - C T A k - l x o . Therefore, any state y can be written as a linear combination of the states y' with y~ the ith component of the row vector C T A k-1. Hence, the system is finite-dimensional. Note that the dimension of the state space is not necessarily equal to n. It could be smaller. The system is strictly causal if and only if D - h0 - 0. [:3 The state space description is especially interesting when we want to know the present-future output signal from a certain time on, e.g. time 0. The influence of the past input signal is then condensed into the state vector x0. In the following theorem, we shall write the impulse response in terms of the state space description. T h e o r e m 6.5 Given the state space quadruple ( A , B , C , D ) of a discrete linear time invariant causal finite-dimensional system, the impulse response h of this system can be written in terms of this quadruple as h0 hk
-
or in the frequency domain h -
D C T A k- ~B,
k>0
~.h can be written compactly as
h(z) - D + C T ( z I -
A)-IB
where the operations have to be carried out in the field ]~(z-1).
P r o o f . The impulse response is h - 7-/6. Knowing t h a t the system is causal, and I I ~ - 0, the state at time k - 0 is II0~ I I ~ - 0. Hence the state vector x0 - 0. From the state space description theorem, we then find easily that yo-
D,
xl - B,
yt - C T B ,
z2 - A B ,
y2 - C T A B , . . . .
For the induction step, suppose that xk-1
_
A k-
2
B
and
hk-1
_
Yk-1
_
C T A k- 2 B, for some k _ 2,
6.1. D E F I N I T I O N S
283
then we get xk-
Axk_l - A k - ~ B
and
hk - Yk
-
c T T, k -- C T A
k-lB.
Hence, we have proved the first part of the theorem. To prove the second part, we use the fact that if the operations are carried out in the field F(z -1), we get (zI-
A) -1
-
I z -1 + A z -2 + A2z -3 + A3z -4 + . . . .
Hence D + CT(zI-
A ) - I B - D + C T B z -1 + C T A B z -2 4- C T A 2 B z -3 + " ".
Another way to prove the second part of the theorem starts from the state space description, which we can write using the z-transform as follows z ~ , ( z ) - A~,(z) + B'~(z) ~)(z) - cT~,(z) + Diz(z) where & denotes the z-transform of a vector signal. Eliminate & and use the fact that u - 6 ( ~ - 1) and xo - 0 to get h - D + C T ( z I - A ) - I B . [:] From now on, we assume that the systems we are working with are discrete linear time invariant causal and finite-dimensional. Moreover we assume that the field F is commutative. Suppose that it is strictly causal (D - 0), then the impulse response can be represented in the frequency domain by a polynomial fraction h(z)-CT(zI-A)-IB
-
C T adj(zI- A)B d e t ( z I - A)
where a d j ( z I - A) represents the adjugate matrix containing the cofactors of z I - A. Note that the degree of the denominator d e t ( z I - A) is equal to n with n the dimension of the state transition matrix A. The degree of the numerator C T a d j ( z I - A ) B is smaller than n. Conversely every polynomial fraction with the degree of the numerator smaller than the degree of the denominator (a strictly proper fraction) can be interpreted as the frequency domain representation h of the impulse response as stated in the following theorem. ^
284
C H A P T E R 6. L I N E A R S Y S T E M S
T h e o r e m 6.6 ( p o l y n o m i a l f r a c t i o n r e p r e s e n t a t i o n ) Given a state space triple (A, B, C) of a system, then its impulse response is in the frequency domain given by the polynomial fraction h(z) - C T ( z I - A ) - I B -
C r adj(zI- A)B det(zI - A)
with the degree of the numerator smaller then the degree of the denominator the latter being equal to the dimension of the state transition matriz A. Vice versa, every polynomial fraction pq-1 with deg p < deg q = n can be interpreted as a polynomial fraction representation in the frequency domain of the impulse response of a system. P r o o f . The first part of the proof was given above. To prove the second part, we shall explicitly give one of the possible state space representations of the system with polynomial fraction representation pq-1. We know that the impulse response satisfies ]~ - ~ = 1 hkz -k - pq-1 E F(z -1 ). Note that the system does not change when we normalize the polynomial fraction representation by making q monic. A possible state space representation is given by A-
F(q),
B-
[l O O...O] and C T - [ h 1
h2...h,.,],
with n - deg q and where F(q) is the companion matrix of q (see Definition 2.3). The reader should check that hk - C T A k - I B , k > 0. Note that the C-vector which consists of the first values of the impulse response h can be computed as follows 1 q,.,-1 "'" 0 1 ... C T-
[hl h 2 " " h n ] - [P,.,-1 p,.,-2"" "P0]
9
.
9
o
0
0
"'"
0
0
'
ql q2
--I
qn-1
1
with p ( z ) - ~k=o ,.,-1 Pk zk and q(z) - ~'~=o qk zk The state space description that we have given here is called the controllability form by Kailath [165, p. 51]. KI With the state space description, two other concepts are easily introduced. D e f i n i t i o n 6.16 ( o b s e r v a b l e , r e a c h a b l e ) If ( A , B , C) is a state space triple representing a system with state transition matriz A of dimension
6.1. DEFINITIONS
285
n, then we call (A, C) and the system itself observable if the observability matrix 0 - [C A T c ... (AT)n-Ic]T has full rank. The couple ( A , B ) and also the system itself is called reachable if the reachability matrix ~ = [B A B ... An-IB] has full rank. The matrix H = O ~ is the n x n Hankel matrix consisting of the first impulse response entries. It is called the Hankel matrix of the system. With respect to the appropriate basis, it is the matrix representation of the operator II0k_?-/II~_ mapping past into present and future. The matrices O and T~ appeared already as Krylov matrices in our discussion of the Lanczos algorithm. The reader should recall that if rank O = n then it has reached its maximal possible rank. Extending the Krylov matrix further has no influence on the rank. The same observation holds for the rank of T~. Note that ~t(z) - p(z)q(z) -1 - C T ( z I - A ) - I B when we consider z as a variable can also be interpreted as a rational function and not just as a formal ratio. According to Definition 3.2 the state space triple (A, B, C) is then a realization of the rational function pq-i. This leads us to the definition of the transfer function which we shall also give for the case where D might be nonzero. D e f i n i t i o n 6.17 ( t r a n s f e r f u n c t i o n , r e a l i z a t i o n ) /f ( A , B , C , D ) is the
state space quadruple of a system, the rational function f(z) - D + CT(zI- A)-IB is called the transfer function of the system. Describing a system with data like the transfer function, the impulse response or the polynomial fraction description is often called an external description, while the state space representation is called an internal description. It is clear that each realization of a rational transfer function is another name for a state space description of the system whose impulse response can be found from a polynomial fraction representation for the rational function. Therefore, we can speak about the (minimal) realization of a system as being the (minimal) realization of the transfer function of the system. We can also speak about equivalent realizations for the same system as defined by Definition 3.3. For completeness, we repeat it here. D e f i n i t i o n 6.18 ( ( m i n i m a l ) r e a l i z a t i o n ) If f ( z ) is a rational transfer function of a system and B , C C F"xl, D E F and A C Fnxn such that f ( z ) - D + C T ( z I - A ) - I B . Then we call the quadruple ( A , B , C , D ) a realization of f(z). If n is the smallest possible dimension for which a realization ezists, then the realization is called minimal.
CHAPTER 6. LINEAR SYSTEMS
286
Two realization quadruples ( A, B, C, D) and ( ~i, B, C, D) are equivalent if they represent the same transfer function D e f i n i t i o n 6.19 ( E q u i v a l e n t r e a l i z a t i o n s )
f(z) - D + C T ( z I - A ) - I B - b + C T ( z I - /~)-1 and have the same size. Two realizations related by D-D,
C-
FTC,
JB- F - 1 B
and
tt- F -1AF
with F some nonsingular transformation matrix are equivalent and if the realizations are minimal the converse also holds [165, Theorem 2.4-7]. Translating Definition 3.4 of the McMillan degree within this context gives us
The McMillan degree of a system is the dimension of the state transition matrix A of a minimal realization quadruple (A, B, C, D) for the system. It is clear that the minimal denominator degree of all polynomial fraction representations of the system is also equal to this McMillan degree. Hence, the McMillan degree is equal to the denominator degree of a coprime representation of the transfer function.
D e f i n i t i o n 6.20 ( M c M i l l a n d e g r e e )
The McMillan degree can be seen as a measure for the complexity of a system. Indeed, all systems having McMillan degree n can be represented by a coprime polynomial fraction representation pq-1 with deg p < deg q = n. It is a well known fact that a minimal realization can be characterized in several ways. For example. T h e o r e m 6.7 Let the triple (A, B, C) be a state space triple for a system
with the dimension of A equal to n. Let T~ and 0 be the reachability and observability matrices and C ( z I - A ) - I B the transfer function. Then (A, B, C) is a minimal realization iff one of the following conditions is satisfied. 1. C ( z I - A ) - I B is irreducible, i.e. d e t ( z I - a) and C a d j ( z I - A)B are coprime. 2. rank 7~ - rank 0 = n. I.e., the system is reachable and observable. 3. rank OT~ - n. P r o o f . See for example in Ka~lath [165, p. 126-128].
6.2. M O R E DEFINITIONS A N D P R O P E R T I E S
6.2
287
More definitions and properties
Most of the derivations in the previous section were given for discrete systems. A similar route can be followed for continuous ones. However, where the discrete case allows easily a formal approach, since we are working with formal series and sequences, this is much more difficult to maintain for continuous systems. Since we want to include again continuous time systems in the discussion, we shall suppose for simplicity that our basic field F is the field of complex numbers C. The series we shall consider are now in a complex variable and do (or do not) converge to complex functions in certain regions of the complex plane. In the same way as for discrete time systems, we can define the continuous time impulse signal as a generalized function 6 (a Dirac impulse) satisfying
f+f
6 ( t - T)f(t)d(t)
f(~').
The transform which is the equivalent of the discrete z-transform, is the Laplace transform of the signal. We shall not build up the theory for continuous systems from scratch as we did for the discrete time case, but immediately summarize the most important results. The reader should consult the literature if he wants to know more about the derivations. Although not really necessary, we shall suppose that all the signals we consider in the sequel are zero for negative time. If the system is causal, this means that also the initial state at time zero will be zero. So we come to the following characterization of a system. Some authors see it as the actual definition of a system.
The state space description of a linear time invariant causal finite-dimensional system has the following
D e f i n i t i o n 6.21 ( s t a t e s p a c e d e s c r i p t i o n )
form: s:(t)
-
Az(t)+Bu(t),
y(t)
-
c T x ( t ) + Du(t),
~(o)
-
O,
t E ~,
BE][~"~,xl
A E F "~xn, C e F ~xl
,
.D E l~l x l ,
t>_O.
As before, u is the input, y is the output and z is the state vector. For a discrete time (d.t.) system the operator S represents the shift operator Z, i.~. s=(t) - z=(t) - =(t + 1). Fo~ a ~ o ~ t i ~ o ~ tim~ (~.t.) ~y~t~m S ~ t ~ ~ for differentiation Sx (t) - d x (t). Let s denote the Laplace transform in continuous time or the discrete Laplace transform (i.e. z-transform) in discrete time. This means ] - s f
CHAPTER 6. LINEAR SYSTEMS
288
depends on the frequency domain variable z, in the following way oo
:(z) - ~ I ( t ) z - '
(d.t.)
t=0
-
Jo"
e-*tf(t)dt
(c.t.)
whenever the right-hand side makes sense. In the frequency domain, we can write the state space description as z~(z)
=
A~(z)+B~(z),
f/(z)
-
cT~(z) + D~(z)
which gives an expression for the transfer function Z(z) as
f/(z) - Z(z)~(z) with Z(z) - D + C T ( z I - A)-IB. The reaction of the system to an impulse signal 5 as input, which is the Kronecker delta 5 = {50k}ke~ for d.t. or the Dirac impluse function with spike at zero for c.t., is called the impulse response h. Since in both cases, the frequency domain equivalent of an impulse is the constant function - /:6 - 1, the impulse response will correspond to the transfer function in the frequency domain. ]z(z) - Z(z). 1 - Z(z). Therefore, the impulse response itself is the inverse transform of Z(z)" h - s Thus we have the following relations between the transfer function Z(z) and the impulse response h(t) whenever they make sense:
Z(z)- ~oo h(t)z-'; h(t)- ~1
fo 2~ e~:tZ(e~ ~ )d~
(d.t.)
(6.6/
F
(c.t.).
(6.7)
t--0
Z(z)-
/o e-*th(t)dt;
h(t)-
ei~tZ(iw)dw
oo
Without loss of generality, we shall assume strict causality, i.e. D = 0 in the sequel. Hence, the system can be described by the transfer function Z(z) connected with the state space triple (A, B, C) as follows:
Z(z)-CT(zI-A)-IB. It is a strictly proper rational function. In the theory of linear systems, it is important that the system model is stable. Stability means the following.
6.2. MORE DEFINITIONS AND PROPERTIES
289
D e f i n i t i o n 6.22 ( s t a b i l i t y ) We shall call a system (bounded-input-bounded-output) stable (or BIBO stable) iff a bounded input generates a bounded output. This is known to be equivalent with
(~.t.)
fo ~ ]h(t)]dt < M < ~o oo
Ih(t)l < M < oo
(~.t.)
t=l
with h(t) the impulse response of the system. A system is called internally stable iff
rt~(r
< o
(~.t.)
ICzl < 1
(d.t.)
where (i are the eigenvalues of the matrix A from the state space representation (A, B, C) for the system. Re (() is the real part of the complex number
r Internal stability implies BIBO stability but the converse is only true if the state space representation is minimal. For a physical system, we should expect that it is BIBO stable. However, for the approximants that are generated by (minimal) partial realization (see next section), we cart not guarantee that they will be stable. Some issues about the stability checks for systems (or more generally for polynomials) will we discussed in Section 6.7. To come to the definition of the minimal partial realization (mPR) problem , we introduce the Markov parameters of a linear system as follows" D e f i n i t i o n 6.28 ( M a r k o v p a r a m e t e r s ) For a linear system with transfer function Z(z), and state space triple (A, B, C), we cart the Mk, defined by
1 dkZ(z) Mk = k~ dz k
- CA k - l B ,
k - 1,2,...
(6.8)
z~oo
the Markov parameters of the system. Using equations (6.6-6.8) we can write the Markov parameters in function of the impulse response as follows
Mk - h(k) Mk-
dk-lh(t) dtk-1
(d.t.) (c.t.). t=0
C H A P T E R 6. LINEAR S Y S T E M S
290
This indicates that the first Markov parameters contain information about the transient response of the linear system. That is the behavior of the system for small values of time. This is easily verified for a discrete system. The output for small t depends on the input for small t and the initial Markov parameters, i.e., the first terms in the impulse response. For continuous time, it is somewhat harder to see, but it remains basically the same. Where the Markov parameters of a system give the most adequate information about the transient behavior, the time moments will be better to model the steady state behavior, i.e. the behavior for t ~ oo. We define them as follows. 6.24 ( t i m e m o m e n t s ) /f Z(z) is the transfer function of a continuous system and Z(z) is analytic in z - O, then its time moments are defined as mk--(--1)k (d) k , k - 0,1,2,... (6.9)
Definition
z--0
If Z(z) is the transfer function of a discrete system and Z(z) is analytic in z - 1, then its time moments are defined as
mk - ( - 1 )
(d)k
Z(z)
,
k - 0,1,2,...
(6.10)
z=l
Note that for stable systems the analyticity condition for the transfer function Z will be satisfied. From this definition we get the following connection between the time moments and formal power series. T h e o r e m 6.8 Matching the first ~ time moments is equivalent with matching the first ~ coefficients Tk in oo
T(z) - ~
Tkz k
k=O
for continuous systems, where T(z) is the MacLaurin ezpansion of Z(z), and with matching the first A coefficients Tk in T(z) - ~
Tk(z - 1) k
k=O
for discrete systems, where T(z) is the Taylor expansion of Z(z) around z - 1. These coefficients Tk are called time coefficients.
6.2. M O R E DEFINITIONS A N D P R O P E R T I E S
291
P r o o f . For continuous systems this is clear from the definition of the time moments. For discrete systems we get from (6.10) that each time moment mk is a linear combination of the time coefficients To, T 1 , . . . , Tk. The coefficient in this linear combination for Tk is different from zero. This proves the theorem. [::] To show that time moments give information about the steady state behavior of stable linear systems, we give the following theorem. T h e o r e m 6.9 Let mk be the time moments of a stable linear system with
impulse response h(t) (see (6.6-6.7)). Then these time moments are equal to
mk - ~ tkh(t)
(d.t.)
t=O
and ink-
tkh(t)dt
~0~176
(c.t.).
P r o o f . See the paper of Decoster and van Cauwenberghe [?4].
[::]
From this theorem it is clear that the time moments m k of a stable system are a weighted average of the impulse response h(t). When k becomes larger more emphasis is laid on the impulse response for larger t. Hence, we can conclude that matching the first time moments of a stable system will model the steady state behavior if the approximating system is also stable. If some initial Markov parameters are known, we know the initial coefficients of a series expansion of the transfer function in the neighborhood of infinity; of a series in F[[z -~ ]] say. We can thus construct with our algorithms some rational approximant that will match these Markov parameters and since these parameters model the transient behavior of the system, we may hope that we get a good approximate model for the system as it behaves for small time. This type of approximation problem is the partial realization problem. An important issue is the simplicity of the approximant, i.e., its minimality. Hence, one is usually interested in a minimal partial realization problem. We shall interpret the previously obtained results in this context in the next section. A severe drawback of this type of approximation however is that we can not be sure that the approximant will correspond to a stable system, even if we started out with a stable one. So stability checks as will be discussed in Section 6.7 are in order. Other techniques consist in allowing only restricted types of approximants that will be guaranteed to be stable. For this a price is paid because less coefficients can be fitted. Some examples are reviewed in the survey paper [42].
C H A P T E R 6. L I N E A R S Y S T E M S
292
If some time coefficients are known, we can use the division algorithms of the previous chapters for fls in F(z) to generate approximants. This is a problem like described in Pad~ approximation since we are now dealing with power series (in z or z - 1). Thus we shall give a system theoretic interpretation of the Pad~ results in Section 6.4. Of course, one can think of a mixed minimal partial realization problem, i.e. where both time coefficients and Markov parameters are given. One may hope to have a model that is good for the transient behavior as well as for the steady state behavior. For that we should approximate both a series in z -1 (Markov parameters) and a power series in z or z - 1 (time coefficients). Hence we have here a sort of two-point Pad6 approximation problem. If we want to do the previous computations recursively, i.e., we suppose more and more Markov paramaters (or time coefficients) become available as the computations go on, we are basically facing a problem with "Hankel structure". We mean that the data can be collected in a Hankel matrix which is bordered with new rows and columns as new data flow in. If there are only data of one sort, i.e., either only Markov parameters or only time coefficients, then the data expand in only one direction, just like a bordered Hankel matrix gets only new entries in its SE corner. Even if we have a mixed problem but with a preset, fixed number of data of one kind, we are basically in the same situation, viz., data expanding in only one direction. However, if Markov parameters as well as time moments keep flowing in, the problem has a "Toeplitz structure". If a Toeplitz matrix is bordered with an extra b o t t o m row and an extra last column, the new d a t a appear in the NE and the SW corner of the matrix, which illustrates the growth in two different directions. Thus in the first case, we shall be dealing with the results we obtained for Hankel matrices, while in the second case we shall have to interpret the results we have given for Toeplitz matrices. We shall investigate this in more detail in the following sections.
6.3
The minimal partial realization problem
We assume in the rest of this section that the systems taken into consideration are linear time invariant and strictly causal. Usually, the realization problem is defined as the problem of finding a state space description triple (A, B, C) given the Markov parameters Mk, k - 1,2,3, .... However, because the state space approach is equivalent to the polynomial fraction description approach, we can also look for a
6.3. THE MINIMAL PARTIAL R E A L I Z A T I O N P R O B L E M
293
polynomial fraction. In other words, given the coefficients in the expansion
M1z-1 -[- M2z -2 + M3z -3 + . . . , find the function it represents, either in the form of a state space realization Z(z) - C T ( z I - A ) - I B or in the form of a polynomial fraction Z(z) c(z)a(z) -1. Thus our definition is as follows" D e f i n i t i o n 6.25 ( ( m ) ( P ) R - M p r o b l e m ) Given the Markov parameters of a finite-dimensional system, find a polynomial fraction c(z)a(z) -1 which represents the transfer function of this system. We speak about a minimal problem if we look for a polynomial fraction c(z)a(z) -1 with a minimal degree for the denominator a(z). This is equivalent to finding a state space representation(A, B, C) of the transfer function with the order n of the state transition matrix A as small as possible. In short, we look for a minimal state space description or a coprime polynomial fraction representation of the transfer function, with a complexity equal to the McMiUan degree of the system. The adjective partial refers to the fact that we do not consider all Markov parameters (not necessarily connected to a finite-dimensional system) but .y only the first ones, {hk}k=l say. In this case the set of (finite-dimensional) systems having the same first 7 Markov parameters, i.e. the set of partial realizations of degree 7 (not to be confused with the degree of the transfer function), consists of more than one element. The partial realization problem looks for a polynomial fraction description of one or all of these systems, while the minimal problem looks for all those with the smallest possible McMillan degree. By solving a partial realization problem, one finds a (rational) transfer function which can be expanded as an element of F(z -1) whereby the first 7 coefficients in the expansion correspond to the given Markov parameters. We shall have to show that the previously described techniques give indeed (minimal) solutions. Let us start by looking again at the results of Section 1.5 where we obtained approximations for formal series from F(z -1 }. From Theorem 1.13, we can easily derive the following reformulation which can be stated without proof. T h e o r e m 6.10 Given the Markov parameters M k , k >_ 1 and using the notation of Theorem 1.13, the (right sided) extended Euclidean algorithm applied to s - 1 and r ( z ) f(z)Z(z)E ~ - i Mk z-k, the given transfer function, generates partial realizations co,kao,k-1 of degree 2t~k_ 1 + ak, . 9 2ak, . . . , 2ak + ak+l -- 1 having McMillan degree ak.
C H A P T E R 6. L I N E A R S Y S T E M S
294
In fact, the partial realizations co,kao, ~ are minimal. This result is not contained in Theorem 1.13, but it follows from the corresponding results about orthogonal polynomials. In fact, the previous theorem is a special case of Theorem 6.12 to be given below. We continue our scan of the results of the previous chapters. The pre-1 The intervious theorem says something about the approximants co,k a o,k" mediate results that are generated by the atomic Euclidean algorithm are partial realizations too. Indeed, interpretation of the results of Section 1.6 within the context of linear systems leads to the following theorem. T h e o r e m 6.11 Given the Markov parameters Mk, k > 1 and using the notation of Section 1.6, the atomic Euclidean algorithm applied to s - - 1 and Ci), (i) _~ , r - f ( z ) - Z ( z ) - ~-]~=1 hk z - k generates partial realizations Vo,ktUo,k)
i -- O, 1 , . . . , ak, of degree 2,r
+ ak + i having McMillan degree 'ok.
P r o o f . From equation (1.17), we get u0,k - ~0,k with deg,~'0,k (i) -
'ok and deg s (i) -
-'r
right-hand side of equation (6 " 11) by -(i) ~'O,k
(6.11) i - 1 9 Dividing the left and proves the theorem,
o
Note that the polynomial fractions Co,kao,~ as well as ~0,k(~0,k) can be written as continued fractions using the results of Section 1.4. Some of the linear algebra results of Chapter 2 can also be restated in the current context of linear systems. Let us start by rewriting the partial realization problem as a linear algebra problem. Given the Markov parameters M k , k > 0, i.e. the transfer function f ( z ) - Z ( z ) , the partial realization ca -1 of degree 7 has to satisfy f - ca -1 - r with deg r _< - 7 - 1. Suppose deg a - a. Hence, we get the equivalent linearized condition f a c - r a - r ~ with deg r ~ 0 with 2~k - ak _< 7 < 2nk+x - ak+x
6.3.
THE M I N I M A L P A R T I A L R E A L I Z A T I O N P R O B L E M
295
are given by the manifold ao,k +pao,k-1 with p an arbitrary polynomial having degp < 2 ~ k - (7 + 1) and a0,k and ao,k-~ the true (right) monic orthogonal polynomial for the Hankel moment matrix [Mi+j-1] having degrees ~k and ~k-1 respectively. For each of the denominator polynomials a = ao,k + pao,k-1 of this manifold, the corresponding numerator polynomial c can be found as the polynomial part of f a or c = Co,k + pco,k-1. For each different choice of p we get a coprime polynomial fraction representation of a different system, each system having McMillan degree nk. P r o o f . The first part of the theorem follows immediately from Theorem 5.6. The polynomial fraction ca -1 is coprime because, otherwise, the reduced rational form c~a~-1 would be a partial reahzation with smaller degree for the denominator. This is impossible because dega - ~k is minimal. Because c and a for different choices of p are always coprime and because ao,k, ao,k-1,..., z'ao,k-1 with i - 2~k - (7 + 1) are hnearly independent, ca -1 is a coprime polynomial fraction which is different for each choice of p. Therefore, for each choice of p this polynomial fraction represents a different system. [3 We assumed above that we started from the Markov parameters Mk or, equivalently, from the transfer function Z(z) - ~ = 1 Mk z-k. We then looked for a representation ca -1 having smallest possible McMillan degree and having the same first 7 Markov parameters. Instead of the Markov parameters we could consider the situation where a polynomial fraction is being given, one of high complexity say and possibly not coprime. We can then simplify the model (the given polynomial fraction) by replacing it by another one with a lower complexity. This is called model reduction. Thus suppose that in that case we know the transfer function for the given system in the form of a numerator-denominator pair ca -1. We can then also apply the different versions of the Euclidean algorithm which we described before with initial data s - a and r - c and get a reduced model (one with reduced McMillan degree or one with the same McMillan degree but represented by a coprime polynomial fraction). Instead of a given set of Markov parameters or a transfer function or its given polynomial fraction representation, we could consider the case where an input signal u ~ 0 is given together with its corresponding output signal y. When we assume that the past input signal is zero, we can also use the different versions of the algorithm of Euclid with initial data s - ~2 and r - -~) because the transfer function equals Z(z) - fl~z-1. In this case the problem of constructing a rational representation for Z(z) in one way or another from these data is usually called system identification. If the initial system is finite-dimensional the
296
CHAPTER
6.
LINEAR
SYSTEMS
algorithm will stop after a finite number of steps as in the previous case and we shall get a coprime polynomial fraction for this system. Otherwise, the algorithm will generate coprime polynomial fractions being minimal partial but never full realizations of the given system. E x a m p l e 6.1 Let us reinterpret the results of Example 1.5 within the context of linear system theory. We can view it(z) - - s ( z ) - z - z -4 as the z-transform of the input signal of a discrete system with corresponding output signal ~(z) - r ( z ) - 1 + z -4. This is the system identification interpretation. Because the input signal as well as the output signal have only a finite number of nonzero values, we can also adapt a model reduction viewpoint. We could see ca -1 with a(z)-z4~(z)-z
~- 1
and
4+1
c(z)-z4~(z)-z
as a polynomial fraction representation of the transfer function of a systern with great complexity for which we want to compute a less complex approximation (i.e. having smaller McMillan degree) or a reduced polynomial fraction representation. Implicitly we work with the transfer function, which for a discrete system corresponds to the z-transform of the impulse response Z(z)
-
-
-
z
+
z
+
z
+
z
+....
In the two cases, we can use the extended Euclidean algorithm to compute the true (monic) orthogonal polynomials a0,k together with the corresponding numerators c0,k. For the system identification problem, we start with the series s and r while for the model reduction problem, we start with the polynomials a and c. We could even use the transfer function (impulse response) explicitly by starting the algorithm with s - - 1 and r - Z - h. The Kronecker indices are ^
' ~ 1 - 1,
'~2-4
and
'~k-5
for
k>_3.
Hence, following the previous theorem, the denominator polynomials of all minimal partial realizations of degree 7 with 5 _< 7 < 9 are given by a 0 , 2 -~-
pao, 1
with
a0,2(z) - z 4 - z 3 + z 2 - z - 1
and
a0,1(z )
-- Z
and p an arbitrary polynomial with deg p _ 8 - (7 + 1). For example, take 7 - 7. Hence deg p _< 0 or p C F. All minimal partial realizations of degree 7 can be written as
(gO,2 -~- Pgo,1)(ao,2 + pa0,1) -1
6.3. THE MINIMAL PARTIAL R E A L I Z A T I O N P R O B L E M
297
with impulse response
z -1 + z -5 4- z -6 --pz -8 - - ( 2 p - 2)z -9 - ( p -
3)z -1~ + . . . .
Note that for p # 0 the first 7 Markov parameters are the same, while for p = 0 exactly 8 Markov parameters are matched.
Up until now, we have always looked for a (minimal) (partial) realization in the polynomiM fraction representation. We can easily translate the results to state space descriptions. T h e o r e m 6.13 Suppose the Markov parameters Mk, k > 0 are given defining the transfer function f ( z ) = Z(z). Using the notations of Corollary 2.30 and Theorem 2.7, we can give several state space representations (A, B, C) for the minimal partial realization co,kao, -1k of Theorem 6.12:
9 A- F(ao,k)- Hk-l(f)-lHk_l(zf), B T - [1 0 . . . 0 ] and C T - [ M 1 M2"..M,~k], with F(ao,k) the companion matrix of the (monic) polynomial ao,k. 9 A - Tk_l, B T [l 0 ' ' ' 0 ] and C T - eTDk_l - - [ 0 . . - 0 ~00--'0] where ro ~ 0 is the a l th element with a l the first Euclidean index of -
Y. P r o o f . Lemma 2.5 gives us
F(ao,k)- Hk-l(f)-lHk-l(zf). The controllability form of Theorem 6.6 gives us the first state space representation. Starting with this controllability form, we can construct the equivalent representation (Ak-_l1F(ao,k)Ak-1, A-kl_IB, AT_I C). Equation (2.29) gives us
A k l l F(ao,k)Ak_l
-
-
Tk_l"
Because Ak-1 is a unit upper triangular matrix, A~-_ll will also be unit upper triangular. Hence A~-_ll B - a0 - e0.
298
CHAPTER
6. L I N E A R S Y S T E M S
Finally, C T A k - 1 - eT H k - l A k - 1 -- e T ] R k - 1 .
This proves the theorem. Note that the results also follow from Corollary 4.14 applied to the special case where the moment matrix is equal to a Hankel matrix. From equation (4.12), we also have C T A k _ I -- e T D k _ l . [:]
E x a m p l e 6.2 Let us continue Example 6.1. A state space representation (A, B, C ) o f the minimal partial realization of degree 8 is given by
A
=
F(a0,2)=
BT
-
[1 0 0 0 ] ,
c r
=
[1000].
0 1 0 0
0 0 1 0
0 0 0 1
1 1 -1 1
The other state space representation based on the block tridiagonal matrix T1 gives the same representation for this example. Note that T1 is the upper left part of T2 given in Example 2.3. l having )~ - a rows and a + 1 columns. As for the mPK-M problem for the Markov parameters, all solutions of the mPR-T problem for the time moments are described in the following theorem. T h e o r e m 6.14 Given the time coefficients Tk, k >_ O. Associate with it the Hankel m o m e n t matriz [Ti+j] and the Kronecker indices tck and Euclidean indices ak. Let us write the minimal partial realizations for these time coefficients in the form of a coprime polynomial fraction representation as c'(z)a'(z) -1 with a'(z) - z~a(z -1) with deg a - a and c'(z) - z ~ - l c ( z - 1 ) . I f 2~k - ak < ~ < 2tck, the reversed polynomials a, associated with the denominator polynomials a ~ of all minimal partial realizations of order ~ for the given time coefficients can be written as a-
ao,k + pao,k-l~
degp < 2~;k - (~ + 1)
and with the extra condition that a(0) = ao,k(0)+ p(0)ao,k-l(0) ~ 0. The polynomials ao,k and ao,k-1 are the true (right} orthogonal polynomials with respect to the m o m e n t matriz [Ti+j] of degree tck and ~k-1 respectively. If 2t;k < ,k < 2t;k+l - a k + l , we have to make a distinction between two other cases. If ao,k(0) ~ 0, the unique minimal partial realization denominator a' is the reverse of the polynomial a = ao,k. If ao,k(0) = 0, the denominators of art minimal partial realizations of order ~ are associated with the reversed polynomials a = pao,k + cao,k-1 with p a monic polynomial of degree ~ - 2tck + 1 and 0 ~ c C F = C. In all these cases, we have chosen for a monic normalization for a, i.e. a comonic normalization for a'.
300
C H A P T E R 6. L I N E A R S Y S T E M S
Together with the numerator polynomial c ~ which can be found as the polynomial part of degree < a - deg a of f a ~, f - ~k~176Tkz k, we get minimal partial realizations of order )~ written as a coprime polynomial fraction. For each different denominator, we get a different system. The McMiUan degree of all these systems is '~k except when a0,k(0) = O. In this case the McMillan degree is )~ - '~k + 1. P r o o f . The proof is very similar to the proof we have given for the mPKM case in the previous section. In the same way, we can use the results we obtained in Section 5.5 about minimal indices. However, there is an additional element here. It is important that a(0) should be different from zero. To show that such an a can always be found by the results above, we have to show that a0,k(0) and a0,k-~ (0) can not vanish simultaneously. This is easy to see because
E
"O0'k
E0'k
"U,0,k aO,k
]
-- Y 0 , k - 1
Evkck] Uk
ak
with vk - 0 or [v0,k Uo,k] T -- Uk[CO,k-1 ao,k-1] T and V0,k a unimodular matrix, i.e., its matrix is a nonzero constant. Hence the matrix V0,k(0) is invertible and this proves not only that a0,k(0) and a0,k-~(0) can not be equal to zero simultaneously but also that c0,k(0) and a0,k(0) can not be equal to zero at the same time either. We leave the rest of the proof to the reader. [J Note that according to Section 5.6 about minimal Pad~ approximation we can connect the a polynomials to the Frobenius definition (along an antidiagonal, see Definition 5.5 and equation (5.22)) and the a ~ polynomials to the Baker definition (along a diagonal, see Definition 5.6 and equation (5.28)). As before, we could give state space descriptions for co,kao,k~ ~-1. Again, we leave the details for the reader.
6.5
The mixed problem
In the two previous sections we considered the cases where either only Markov parameters or time coefficients were given. With the same methods, we can also solve mixed problems where both time coefficients and Markov parameters are given on condition that there is only a finite number of one of them. We try to find a realization having minimal McMillan degree with the same first Markov parameters and time coefficients, i.e. the mixed problem.
6.5.
THE, MIXED P R O B L E M
301
Suppose ~ + 1 time coefficients Tk are given and 7 Markov parameters Mk, which we want to fit simultaneously. Let the denominator degree be bounded by a, then the numerator degree is bounded by a - 1 since we want strict causality and at least one of both degrees is exact, because we want a minimal realization. If we denote the denominator by q and the numerator as p, then q(0) should be nonzero, as was derived for the minimal Pad6 approximation problem in Bakers sense. This means that numerator and denominator are linked to the time coefficients Tk by the systems
T,)t
~ ~ 9
T,~
-- Ot
O, T1 ... ...
To
~ * "
O
To 0
9
.
9
.
To
qo#O
q-
...
(6.12) q
-
ip.
0
This is just a rewriting of the systems (5.28). We used ~ instead of w, the coefficients fk are replaced by the time coefficients Tk and ~ was set to 1. If you then reverse the order of rows and columns, you get the system above. We could also consider it as a special case of (5.22), but with the extra condition that q(0) ~ 0. Similarly, since also 7 Markov parameters have to be fitted, we also have a system of equations that looks like
...
0 0 M1
M1
M 1
~ ~ ~
9
9 ~ ~
Mr:
9 9 9
M , ~ + I
q
ip
~
(6.13)
M2 9
,
, ,
q
O,
q,~ # 0 .
M,,]r
In both systems, a should be minimal. If we now subtract the appropriate equations in (6.12) and ( 6 . 1 3 ) f r o m each other, we obtain a homogeneous
C H A P T E R 6. LINEAR S Y S T E M S
302 Hankel system t h a t looks like
0
0
O
0
9
fa+l fc~+2
q-0,
qoq~O
(6.14)
9
where we have set
/ -T,~+I_k fk
-
-
-
Mk-~-I 0
1,...,
1
k~+ k - )~ + 2 , . . . , ~ + 7 + 1 otherwise
~+7+1
(6.15) (6.16)
It was tacidly understood that a was _ )~ and < 7, but it is easily checked t h a t we can come to the latter system for other values of a too. This system has the form (see (5.16)) Hn-~-l,~q - 0 with a minimal and qoq~, ~ 0 and we have seen before how to solve this. We should consider the minimal indices associated with the sequence {fk}nk=l and the denominator is found to be an annihilating polynomial for t h a t sequence. Because of the nonvanishing constant term and leading coefficient, it is actually a constrained problem as we discussed it in Section 5.5. Once we have obtained the denominator as a solution, we can easily find the n u m e r a t o r coefficients from (6.12) or (6.13). Both of them give the same numerator. Thus we take the sequence of ~ + 1 time coefficients in reverse order and with opposite sign followed by the 7 Markov parameters, i.e. -Tx,-Tx_I,...,-To,
M1, M 2 , . . . , M~.
Rename the elements of this sequence as f k , k - 1 , 2 , . . . , r / , and we can adapt a combination of Theorems 6.12 and 6.14 to the current situation, so t h a t it reads T h e o r e m 6.15 Let there be given ~ + 1 time moments T 0 , T 1 , . . . , T x and 7 Markov parameters M~, M ~ , . . . , M r. Define ~7 and fk as in (6.15} and (6.16}. Associate with this sequence the Kronecker indices nk and Euclidean indices ak. Denote by ao,k the monic true (right} orthogonal polynomials for the Hankel moment matriz with coefficients fk.
6.5.
THE M I X E D P R O B L E M
303
Then, if2tck--ak < ~? < 2t;k, all minimal partial realizations that fit these )~ + 1 time moments and the 7 Markov parameters have (monic) denominators of the form a = ao,k + pao,k-~ with p an arbitrary (monic) polynomial of degree 2ak - (~/+ 1) at most and subject to the condition a(O) ~ O. If 2ak l and solve the problem with this sequence. A theorem much like the previous one can be formulated then. In fact the solutions for c and a described in the previous one are the reversed polynomials of n u m e r a t o r and denominator of the solution of the current problem. Thus we have T h e o r e m 6.16 Let the polynomials c and a be defined as in the previous theorem but now for the sequence { fk } = { M-r, U~_ 1 , . . . , U l , -To, - T I , . . .}, then the minimal partial realizations for the 7 Markov parameters Mk, k = 1 , . . . , 7 and )~+ 1 time coefficients Tk, k = 0,...,)~ are given by c'(z)a'(z) -1 with d ( z ) za-lc(z)and a'(z)zaa(z), where a is the
304
CHAPTER
6.
LINEAR
SYSTEMS
M c M i l l a n degree o f ca -1, the m i n i m a l partial realization described by the p r e v i o u s theorem.
In the next section we shall interpret the Toeplitz results of Section 4.9 as a solution to the mixed minimal partial realization problem. There, the number of Markov parameters and time coefficients can both grow simultaneously.
6.6
Interpretation of the Toeplitz results
We shall reinterpret the results obtained in Section 4.9 where orthogonal polynomials with respect to a Toeplitz moment matrix were briefly studied. It will lead to a solution of the mixed minimal partial realization problem. Thus we suppose as in the previous section that Markov parameters as well as time coefficients are given. The advantage of the present approach is that the methods will work recursively when the number of time coefficients and the number of Markov parameters are increasing simultaneously. This is in contrast with the previous section where we supposed a finite and fixed number of either time coefficients or Markov parameters as known in advance. The two-point Pad~ interpretation that we gave in Section 4.9 gives us a possible method to find partial realizations that match both Markov parameters and time coefficients. So as in equations (4.34) and (4.35), we couple the series f+ and f_ with the given Markov parameters Mk, k > 1 and time coefficients Tk, k _> 0 respectively" + T2z 2 + . . . ,
/+(z)
-
To +
- f_(z)
-
M l z - 1 - F M 2 z -2 -F " " " .
This means that in the notation of Section 4.9 we have set f k -- Tk for k = 0 , 1 , . . . and f-k = Mk for k = 1,2, .... Now consider the Toeplitz moment matrix M - [f~_j] and associate with it the orthogonal polynomials as in Section 4.9. We recall that for example a0,n was the first of a new block of (right) orthogonal polynomials and P0,k were the (right) block indices and pk the (right) block sizes for the Toeplitz moment matrix M. Furthermore, define co,n(z) as in (4.36)or (4.37). Then it is easy to see from (4.38)and (4.39) that a partial realization for the first p0,n + P~+I Markov parameters and the first p0,n+ Pn+l + time coefficients is given by Co,nao, -1n having McMillan degree P0,n. They need not be minimal though. It is also possible to generalize these results to the cases where other numbers of Markov parameters and/or time coefficients are given but we
6.6. I N T E R P R E T A T I O N
OF T H E T O E P L I T Z R E S U L T S
305
shall not do this here. The Toeplitz case is only developed in the chapter on orthogonal polynomials and certain aspects were not scrutinized as we did for the Hankel case. The Toeplitz case really needs a separate t r e a t m e n t because they can not be directly obtained from the Hankel case by some mechanical translation and the results are different, since, although similar to the Euclidean algorithm, the recurrence is not the same. For example, we did not deepen the ideas of minimal indices for Toeplitz matrices as we did for Hankel matrices. So we shall also be brief in this chapter about the Toeplitz case. Instead of proving new things, we just give an example or rather give a system interpretation of an example that we gave earlier. E x a m p l e 6.8 ( E x . 4.5 c o n t i n u e d ) Let us interpret the results of Example 4.5 as solutions of a mixed partial realization problem. First of all, let us assume that we have a continuous time system with the first Markov parameters - 1 , 0, 1/2, 0 , - 1 / 4 , 0, 1/8, 0 , - 1 / 1 6 , . . . and first time coefficients 0, 0, 1/2, 1/4, 1/8, 5/16, 9/32, .... For example, we find that the polynomial fraction c0,2 (z) - z3 - 4 z 2
ao,2(z)
1 z 2 +2z-8 z 4 + 4 z 3 + -~
is a mixed partial realization for the first po,2 + P3 - 4 Markov parameters and the first po,2 + P+ -- 4 time coefficients. In the same way, we can interpret the result as a mixed partial realization for discrete time system data when we take into account that the time coefficient information is now around 1 instead of around 0. Hence, co,2[z'_ - -'I)_
a o , 2 ( x - 1)
- x 3 - x 2 + 5x - 3
x4
__
~ x 2 + 9x
2
25
is a partial realization of order and degree 4 for the same time coefficients as above but for the transformed Markov parameters ( z - x - 1) -1,
-1,
-1/2,
1/2, 7/4, 11/4, 23/8, 13/8,
-17/16, ....
Note that the steady state behaviour of the original system will not necessarily be matched by the partial realization because the realization is not stable not only for the continuous time interpretation but also for the discrete time viewpoint. Indeed, the zeros of a0,2(z) are approximately equal to 1.0257, - 0 . 4 5 6 9 • 1.2993i, -4.1118. deg P1, then the Euclidean algorithm generates remainder polynomials Pk and quotient polynomials qk = Pk-1 div Pk such that P k - l ( Z ) : q k ( z ) P k ( z ) - Pk+l(Z),
k = 1,2,...,t
(6.17)
leading to a situation with Pt+l = 0 and hence Pt = gcd(P0, P1). If deg Pt = 0 then P0 and/)1 are coprime. Note that this corresponds to the continued fraction expansion P0 1 l..... 1] Pl -- q l - [ q 2 I qt" The simplest way to link the Euchdean algorithm with the real root location problem is through the theory of orthogonal polynomials. We say that a system of real polynomials {Pk}~=0 with deg pk - k is orthogonal on
6.7.
STABILITY
307
CHECKS
the real line with respect to a positive measure # if
oo
with 6k,t the Kronecker delta and vk - Ilpkll2 > 0. The link with the general algebraic o r t h o g o n a h t y which we discussed in C h a p t e r 4, is by the H a m b u r g e r m o m e n t problem. If the m o m e n t s for the m e a s u r e / z are defined as
#k -
V
zkdtt(z),
k - O, 1, . . ., n
then if all the Hankel matrices Hm - [ttk+l]~,t=o, m -- O, 1 , . . . , n are positive definite, the measure # will be positive and the inner product is expressed as
(pk
,
r PkHmPt, m >_ m a x { k , t } ,
pj(z)-
T pjx.
In the Hamburger moment problem, one investigates the existence, unicity and characterization of the measures t h a t have a (infinite) sequence of prescribed moments. See [1]. The following theorem is a classical result which is known as Favard's theorem. T h e o r e m 6 . 1 7 Let Pk, k = 0 , . . . , n (n O.
For a proof, we refer to the literature. See for example [92, Theorem 1.5, p. 60]. Note t h a t for orthogonal polynomials which are not necessarily monic, but assuming they have positive leading coefficients ),t > 0, we can write a similar recurrence relation [222, p. 42] Pk+l(X) - a k ( x ) p k ( x ) + 6 k P k - l ( X ) = O,
where now ak(x) = flkx + 7k with 7k E R and flk -- ~ k + l / ~ k
> 0
and
6k > 0.
Another observation to make is t h a t if pn+l and pn are polynomials of degree n + 1 and n respectively whose zeros are real and interlace, then they
308
C H A P T E R 6. L I N E A R S Y S T E M S
can always be considered as two polynomials in a sequence of orthogonal polynomials. To see this, we assume without loss of generality that both their leading coefficients are positive. We construct pn-1 by the Euclidean algorithm, so that
PnTl(X) -~- Pn_l(X) =
an(X)pn(X)
where an(x) is of the form flax + 7n with ~,~ > 0. Since at the zeros (i of pn, the right-hand side is zero and because pn+l((~) ~ 0 by the interlacing property, we see that Pn+l((i)Pn-l((z) < 0. Thus Pn-1 alternates in sign at least n - 1 times. Thus it has at least n - 1 zeros which interlace the (i. Thus, degpn-1 = n - 1 and its leading coefficient is positive. The same reasoning is now applied to pn and pn-1 etc. By Favard's theorem we can conclude that the sequence pk, k = 0, 1 , . . . , n is a sequence of orthogonal polynomials with respect to a positive measure on the real line. C o r o l l a r y 6.18 Consider two polynomials Pn+l and Pn of degree n + 1 and degree n respectively. Then these polynomials are orthogonal with respect to a positive measure ~ on the real line iff the zeros of pn+l and pn are real, simple and interlace. P r o o f . The interlacing property of the real zeros for orthogonal polynomials is well known [222, p. 44] and follows easily by induction from the recurrence relation. The converse has been shown above. 77 This result implies the following observations. If P is a real polynomial of degree n, with n simple zeros on the real line, then its derivative P~ is a polynomial of degree n - 1 whose zeros interlace the zeros of P. Thus if we apply the Euclidean algorithm to the polynomials P0 = P and P1 = P~, then it should generate a recurrence relation for orthogonal polynomials. Thus we get the following theorem. T h e o r e m 6.19 The zeros of the real polynomial P will be real iff the Euclidean algorithm applied to Po - P and P1 - P~ will generate quotient polynomials of degree 1 with a positive leading coefficient. The zeros of P are simple iff Pt -- gcd(P0, P1) and deg Pt -- 0. P r o o f . This follows from our previous observations and because obviously the polynomials Pk -- P n - k / P t should be a system of orthogonal polynomials. If deg Pt - 0, then P and P~ are coprime and hence the zeros of P are simple. [~
6.7. S T A B I L I T Y C H E C K S
309
In fact, a more general result exists giving more precise information about the number of zeros in a real interval. This is based on the theory of Cauchy indices and Sturm sequences. To formulate this result, we introduce the following definitions. D e f i n i t i o n 6.27 ( C a u c h y i n d e x ) Let F be a rational function with a real pole ~. Tracing the value of F ( x ) when x crosses ~ from left to right, then either we have a jump from - c ~ to +co, in which case the Cauchy index at is 1, or a jump from +oc to - c ~ , in which case the Cauchy index at ~ is - 1 or the left and right limit at ~ give infinity with the same sign, in which case the Cauchy index at ~ is set to zero. Let I be a real interval such that the boundary points are not poles of F. Then the Cauchy index of F for I is the sum of the Cauchy indices at all the poles of F that are in I. If I - [a, b], we denote it as I b { F } . Let P be a real polynomial such that P ( a ) P ( b ) # O, a < b and let P' be its derivative. Then by a partial fraction decomposition, it is easily seen that, the Cauchy index of F = P ' / P gives the number of distinct zeros of P in the interval [a, b]. D e f i n i t i o n 6.28 ( S t u r m s e q u e n c e ) A sequence of polynomials {pk} tk = 0 is a Sturm sequence for an interval I - In, b] if
p0(a)p0(b)# 0 2. If ~ 6 I and Pk(~) - 0 and 1 0.
6.7. S T A B I L I T Y CHECKS
313
Therefore, since the zeros of a polynomial are continuous functions of its coefficients, we have N(1 + P) = N(Q=) with Q= = 1 + a P , for any a > 0. Furthermore N ( Q = ) = N ( Q ~ ) w i t h Q ~ ( z ) = zn[1 + aP(1/z)]. The zero z = 0 of multiplicity n of the monomial z n will be perturbed into n roots by adding a z n P ( 1 / z ) to this monomial. For a small enough, these roots can be approximated by the roots of z n + a i n+lc0 and the latter are given by
k = O, . . ., n - 1 ,
~k=pexp{i(r with
r
arg(in-lco)/n,
and
p-I~coi
If n is even, then, because there are no zeros on the imaginary axis, there are precisely n/2 roots in the left half plane and the same number of roots in the right half plane. If n is odd, there will be either (n + 1)/2 or ( n - 1)/2 zeros in the open right half plane and the others are in the open left half plane. This depends on the value of r = (n • 1)~r/(2n) where the +1 is for the case co < 0 and the - 1 for the case Co > 0. Thus r = ~r/2 • ~r/(2n) and hence for co < 0, the root ~0 will be in the left half plane and for co > 0, it will be in the right half plane. Therefore the number of roots in the right half plane is given by (n + sgn co)/2. [::1 For a more general pseudo-lossless function we can use a similar continuity argument and we obtain a proof for the following property which we leave as an exercise to the reader. The details can be found in [83, Theorem 2]. 6.24 Let F be a pseudo-lossless function whose distinct poles in
Theorem
the closed right half plane are given by ( 1 , . . . , ~r. If #k is the multiplicity of ~k and if the principal parts of the Laurent expansions of F at these poles is given by c k ( z - ~k) -uk for ~k ~ oo and ckz "k for ~k = ~ , then I ( F ) = i~ + . . . + i, with ik ik ik
=
~tk ~
=
#k12,
-
(#k + i u h + l s g n c k ) / 2 ,
for Re ~k > 0 for Re ~k = 0 and lzk even for Re ~k - 0 and lzk odd.
Note that in the last case, ck is indeed a real number so that sgn ck is well defined. This has as an immediate consequence the remarkable additivity property. See [83, Theorem 3].
C H A P T E R 6. L I N E A R S Y S T E M S
314
T h e o r e m 6.25 If F1 and F2 are two pseudo-lossless functions, which do not have a common pole in the closed right half plane, then I(F1 + F2) =
Z(F ) + Let P0 and P1 be the para-even and para-odd parts of P. Since we assumed that the leading coefficient of P was real the degrees of P0 and P1 are different. Assume deg P0 = n > deg/91 (if not one has to exchange the role of P0 and P1). Suppose we apply the Euclidean algorithm with starting polynomials P0 and P1. As we said before, we only consider here the simplest form of the Euclidean algorithm, that is, we have the recursion (6.18) for the polynomials Pk. L e m m a 6.26 Let P be a complex polynomial of degree n with para-even and para-odd part given by Po and P1. Let the sequence of remainder polynomials Pk and the sequence of quotient polynomials qk be as constructed by the Euclidean algorithm (6.18) for k = 1 , . . . , t. Let Pt = gcd(P0, P1). Then ~
The para-parity of all P2j is the same as the parity of n and opposite to the para-parity of all P2j+I.
2. The rational functions Fk-1 = Pk-1/ Pk and the quotients qk are pseudo-lossless. 3. Suppose deg Pt >_ 1. Then ( is a para-conjugate zero of P if and only if it is a zero of Pt. P r o o f . The first and second point follow from Lemma 6.21. Thus Pt will be either para-even or para-odd. Thus it can only have para-conjugate zeros. These zeros are common zeros of P0 and P1, hence also zeros of P = P0 + P1. Conversely, para-conjugate zeros of P are common zeros of P0 and P1, hence are zeros of Pt. D Define now the pseudo-lossless functions Fk = Pk/Pk+l. Then it holds that
Fk-i =qk + l/Fk,
k= 1,...,t
and because the para-odd polynomial qk is a pseudo-lossless rational function with all its poles at infinity, and by construction 1/Fk is strictly proper, qk and 1/Fk do not have common poles in the closed right half plane, so that by the additivity property
6.7. S T A B I L I T Y C H E C K S
315
Thus, by induction t
I(Fo) -
~
I(qk).
k=l
Setting Q - P / P t , then obviously N ( P ) -
N ( Q ) + N ( P t ) . Furthermore,
P Po P1 _ x( Po/ P, p /p ) - Z(Fo). N ( Q ) - N ( -~t ) - N ( -~t + -~t )
So that we have t
N(P) - N(Q) + N(Pt) - ~
I(qk) + N(Pt).
k=l
Thus P will have N ( P t ) para-conjugate zeros in the open right half plane. The number of zeros in the open right half plane which are not paraconjugate is given by N ( Q ) . To find I(qi), we can use Theorem 6.23 which we gave above. As a special case, consider a situation where the polynomial has real coefficients. The para-even and para-odd parts are then given by
Po(z) - c , z n + cn-2 Xn--2 + cn-4z "-4 + " " and
P l ( z ) - c,-1
xn-I
+ cn-3z
n-3
+ cn-5
xn-5
+""
when
P ( z ) - cnz n + s
Xn--I
2t- Cn--2 xn--2 -4- "'"-'~ CO.
For a stable polynomial, all the zeros have to be in the left half plane. Thus there can be no para-conjugate ones. This implies that deg Pt - O. Moreover we need ~ k I(qk) = O. The only possible way to obtain this is that all qk are of degree 1, and that their leading coefficients are positive. Thus the Euclidean algorithm is nondegenerate (all quotients have degree 1), with para-odd quotient polynomials qk(x) = ~kz, and since we need I(qk) = 0 we should have sgn flk = + 1. Thus f~k > 0 for all k. We have thus proved T h e o r e m 6.27 (classical R o u t h - H u r w i t z a l g o r i t h m ) A realpolynomial P of degree n has all its zeros in the (open) left half plane 5t is said to be strictly Hurwitz or stable) iff the Euclidean algorithm applied to the paraeven and para-odd part of P has n nondegenerate steps and all the quotient polynomials qk are of the form qk(z) = ]~kX with ~k > O.
316
CHAPTER
6.
LINEAR
SYSTEMS
This problem of stability became famous since it was proposed by A. Stodola to his colleague A. Hurwitz around 1895. It had been studied earlier by E. P~outh and A. Lyapunov. The l~outh array [104] is a practical organization of the Euclidean algorithm. It arranges the coefficients of the polynomials Pk from highest degree coefficients to the constant t e r m as the rows of a triangular array. The polynomial is then stable if the elements in the first column of the array are all positive. We have given the criterion for real polynomials only, although it is a simple m a t t e r to generalize this to complex polynomials. A more serious problem is that the test only works if all the zeros are strictly in the left half plane. If they are not, the algorithm can be degenerate and in this case we loose more precise information. This however can be remedied. It is indeed possible to extract much more information about the zero location of a complex polynomial. For example, how many zeros there are on the imaginary axis, what multiplicity they have etc. To find the multiplicity, of the zeros of a polynomial Q0 = P , it is obvious that if we start the Euclidean algorithm with P0 = Q0 and P1 - Q~, then a g.c.d. Qi - gcd(P0, P1) will contain all the zeros of Q0 that have multiplicity at least 2. Restarting the procedure with P0 = Q1 and P1 - Q~, we find a g.c.d. Q2 which contains all the zeros of P of multiplicity at least 3 and we can repeat this until we have found the multiplicity of all the zeros of P. Suppose this procedure ends with deg Q~ = 0, thus t h a t P has a zero of multiplicity s but not higher. By defining the polynomials Ps -- Q s - 1 / Q s ,
and
pk - Q k - l Qk+l / Q ~ ,
k - 1,..., s - 1
(6.20)
we see t h a t they have simple zeros and t h a t we have found a factorization of the polynomial P which clearly shows the multiplicity of its zeros P(z) - cpl(z)p](z)...p~(z),
c-
Q-~I e C.
In other words, the number of distinct zeros of multiplicity k is given by deg pk. If we are able to decide how many of the para-conjugate zeros of P, i.e., how many of the zeros of Pt are strictly in the right half plane and how many are on the imaginary axis, then we know exactly how the zeros of P are distributed on the imaginary axis, to the left and to the right of it. Recall from L e m m a 6.26 that P~ is para-even or para-odd. In t h a t case, we can easily compute the information we want. So in our previous approach, we now suppose that Q0 = P is para-even or para-odd, then all its zeros are para-conjugate. By induction, also the zeros of the subsequent Qk, k - 1 , . . . , s - 1 will be para-conjugate. Hence, the rational functions
6.7. STABILITY CHECKS
317
Fk+l - Q~/Qk, k - 0 , . . . , s -
1 which can be constructed from these polynomials will be pseudo-lossless and have simple para-conjugate poles. More precisely, let iwj, j = 1 , . . . , j k be the purely imaginary zeros of P which have a multiplicity nj >_ k + 1 and let (r l - 1 , . . . , Ik be the other para-conjugate pairs of zeros of P with a multiplicity mt _ k + 1, then these zeros are precisely the zeros of Qk where they will appear with multiplicity n j - k and m l - k respectively. Thus Fk+l has the form
F~+~(z) - ~
+
j--1 Z -- io.)j
+ 1--1
_
Z -- ~l
Z "~- ~!
The index of Fk+l is the sum of the indices of each (pseudo-lossless) term in the sum. To compute the latter, note that N ( z - iwj + nj - k) - 0 and that N [ ( z - ~,)(z + ~t) + ( m , - k)(2z - r + r - 1. Thus I(Fk+l) the right half Qk which are k = 0,..., s-
is equal to the number of poles of F k + 1 which are strictly in plane. By construction this is the number of distinct zeros of in Re z > O. Thus, because N(Qk) = I(Fk+l)+ N(Qk+I) for 1, we find by induction that s-1
N(Qo) - N(P) - ~ I(Fk+ 1) k-O
and hence P has E I ( F k + l ) pairs of para-conjugate zeros and thus deg ( P ) 2 ~ I ( F k + l ) purely imaginary zeros. The practical way to find I ( F k + l ) i s by applying the Euclidean algorithm to the polynomials P0 - Q k and/91 - Q~. When it produces quotient polynomials qj, then I(Fk+l) - ~ j I(qj) and the latter are easily found as in Theorem 6.23. E x a m p l e 6.4 Consider the example P ( z ) - ( z - 2)(z 2 - 1)3(z 2 + 1) 2 9 explicit form, this is P ( z ) - Po(z)+ P~(z)with
In
Po(z)-z 11-z 9-2z 7+2z s+z 3-z
and Pl(z) - - 2 z 1~ -}- 2z 8 + 4z 6 - 4 z 4 - 2z 2 + 2 the para-even and para-odd parts respectively. ends after one step. Po(z) -
z 11 -
z 9 - 2z ~ + 2z 5 + z 3 -
The Euchdean algorithm
z
P l ( Z ) - - 2 z I~ + 2z 8 + 4z 6 - 4 z 4 - 2z 2 + 2
P~.(z)- o
--,
Oo(z)-
--, q l ( z ) - - z 1 2
- 2 z 1~ + 2 z ~ + 4 z ~ -
4z ~ - 2z ~ + 2
CHAPTER
318
6. L I N E A R S Y S T E M S
Hence, because I ( q l ) = 1, there is exactly one zero in the right half plane which is not p a r a - c o n j u g a t e (the zero z - 2). To investigate the p a r a - c o n j u g a t e zeros, we have to analyse the g.c.d. p o l y n o m i a l Q0, which is a para-even polynomial. So we s t a r t again the Euclidean a l g o r i t h m with P0 - Q0 and P1 - Q~. This gives
eo(z) PI(Z)P2(z) P~(z) P4(z)Ps(z)I(F~) -
- 2 z 10 q- 2z 8 q - 4 z 6 - 4z 4 - 2z 2 --k 2 - 2 0 z 9 + 16z 7 + 24z s - 16z 3 - 4z 2(z 8 + 4z 6 - 6z 4 - 4z 2 + 5 ) / 5 9 6 ( z ~ - z ~ - z ~ + z) 2(z 6 - z 4 - z 2 + 1) 0 I ( q ~ ) + I ( q , ) + I ( q ~ ) + I(q~) -
= Qo(z) q~(z) : z / l O q2(z) = - 5 0 z -+ q~(z)= zl24o q4(z) = 48z ~ Ql(z)2(z 6- z 4 - z 2 T l)
Since deg Q1 - 6 > O, we have to s t a r t again with Po - Q1 a n d / ) 1 - Q~. Po(z) P~ (z) P~(z) P3(z)
-
2 ( z ~ - z ' - z ~ + 1) 4 ( 3 z ~ - 2z ~ - z) 4 ( - z ~ - 2z ~. + 3)/6 3 2 ( - z 3 + z) P4(z) - - 2 z 2 + 2
-+ q l ( z ) - z / 6 q2(z) = - 1 8 ~ --, q ~ ( z ) = z/48 --+ q 4 ( z ) = 16z
Ps(z) - 0 I(F2) - I ( q l ) + I ( q 2 ) + I ( q 3 ) + I(q4) -- 1
Q ~ ( ~ ) - - 2 ( ~ ~ - 1)
Again deg Q2 - 2 > 0 so t h a t we have to s t a r t a n o t h e r Euclidean sweep with P0 - Q2 a n d / ) 1 - Q~. Po(~)P~(z) : P~(z) : Pz(z) = I(F3) =
-2(z ~-4z 2 o
1) q~(z)--* q 2 ( z ) --, Q ~ ( z ) -
z/2 -2z 2
I ( q l ) + I(q2) = 1
Now deg Q3 - 0 and we have all the i n f o r m a t i o n we need. T h e n u m b e r of p a r a - c o n j u g a t e pairs of zeros off the i m a g i n a r y axis is I ( F ~ ) + I ( F 2 ) + I ( F 3 ) 3 (the pair • with multiplicity 3). T h u s there r e m a i n 11 - 1 - 3 . 2 - 4 zeros on the i m a g i n a r y axis (the zeros + i with multiplicity 2). We can find the factorization of Q0 by defining
p~(z)-
Qo(z)Q2(z) = 1 Ql(Z)2 '
p2(z)-
Q l ( z ) Q 3 ( z ) _ z2 -t- 1 Q2(z)2 '
6.7.
STABILITY
319
CHECKS
and Q
(z) =
Q (z) Then 1 z 2 + 1) 2 ( 1 Qo(Z) - ~ P l ( Z ) P 2 ( z ) P 3 ( z ) - 2(
z 2 )3 .
The number of distinct pairs of para-conjugate zeros in these factors which are not on the imaginary axis is given by I(F2) - I ( F 1 ) = 0 for Pl, I ( F 3 ) I(F2) = 0 for P2 and I(F3) = 1 for P3.
We can summarize the results in a general form of the Routh-Hurwitz algorithm, which is formulated as Algorithm 6.1. Algorithm 6.1" Routh_Hurwitz Given P = cnz n + . . . E C[z] with 0 ~ cn C R Set P0 = [P + ( - 1 ) n P . ] / 2 and P~ = [ P - ( - 1 ) n P . ] / 2 if deg P0 < deg P1 t h e n interchange P0 and P1 N=M=O k=l
w h i l e Pk Pk+l qk = if ak
r 0 = Pk m o d Pk-1 Pk div Pk-~ = - i ~ k ( i z ) ~'k + " " + 7k odd t h e n N - N + sgn~k
k=k+l
endwhile Q0 = Pk-1, s = 0 w h i l e deg Q, > 0 Po - Q,; P1 - Q's; M s - 0
k=l w h i l e Pk ~ 0 Pk+l = Pk m o d Pk- 1 qk = Pk div Pk-~ = - i f l k ( i z ) '~h + " " + 7k if ak odd t h e n Ms = Ms + sgnflk k=k+l endwhile M = M + Ms, s = s + 1, Q~ = Pk-1 endwhile
CHAPTER 6. LINEAR SYSTEMS
320
In a first cycle, the g.c.d. Q0 of the para-even and the para-odd part of the polynomial P is computed. Then, this g.c.d, polynomial is further analysed in the s-cycle of Euclidean algorithms applied to P0 = Q~ and P1 - Q',. The N and M counters are used to give the information about the location of the zeros of P. For example the number of zeros in the right half plane is given by N ( P ) = ( n - N - M)/2. To see this, recall that t
N(P) - ~ I(qk) + N(Pt). k=l
Now,
I(qk)
_
f
gn k)/2
-
akl2
Since ~ k deg qk - n - deg Pt and ~=k I(qk)
-
(n -
if ak - deg qk is odd if ak - deg qk is even. odd
sgn flk -- N, we have
deg P t
-
N)/2.
k
Since Qo - Pt, we have by a similar argument that (degQk_l-degQk-Mk_l)/2,
k-
1,...,s
is the number of zeros of Q o - Pt in the right half plane which have multiplicity k. Summing this up for k - 1 , . . . , s and using degQ~ - 0, we get $
N(Pt) - (deg Pt - ~ Mk-1)/2 -- (deg Pt - M)/2. k=l
Therefore
N(P)-
degPt + d e g P t 2 - M = n - N 2 - M
n-N-2
The number of (para-conjugate) zeros on the imaginary axis is given by N o ( P ) - M because
N~
degP*-M)2 -M.
Thus the number of zeros in the left half plane is given by (n + N - M)/2. Indeed, this number is n-
N(P)-
No(P)
-
n-
n-N-M 2
-
M
-
n+N-M 2
6.7. S T A B I L I T Y C H E C K S
321
The polynomials (6.20) could be defined to give the factorization
Qo(z)-
c e c.
The number of zeros of Pk that are on the imaginary axis is given by No(pk) = Mk-1, and the number of (para-conjugate) zeros of pk which are in the right half plane is given by N (Pk) = (deg pk - M k - ~) / 2. A number of classical zero location criteria can be obtained as a special case of this general l~outh-Hurwitz algorithm. For example, the classical l~outh-Hurwitz algorithm also works for complex polynomials. A polynomial P C C[z] is said to be Hurwitz in a strict sense if all its zeros are in the open left half plane. We have C o r o l l a r y 6.28 ( s t r i c t s e n s e H u r w i t z ) The polynomial P C C[z] is strict sense Hurwitz iff the general Routh-Hurwitz algorithm gives quotient polynomials qk(z) - ~kz + iTk where ~k > 0 and deg Q0 = 0. A polynomial P E C[z] is said to be Hurwitz in wide sense if all its zeros are in the closed left half plane and if there are zeros on the imaginary axis, then these zeros should be simple. C o r o l l a r y 6.29 ( w i d e s e n s e H u r w i t z ) The polynomial P C C[z] is wide sense Hurwitz iff the general Routh-Hurwitz algorithm gives quotient polynomials qk(z) - ~kz + iTk where ~k > 0 and deg Qo - 0 (in which case it is Hurwitz in strict sense) or deg Q0 > 0 but deg Q~ = 0 (in which case there are deg Q o simple zeros on the imaginary axis). Also the criterion for the zeros of a real polynomial to be real can be generalized to complex polynomials. Therefore we have to use the transformation z ~ iz to bring the real axis to the imaginary axis. If P ( z ) has only real zeros, then P(iz) has only zeros on the imaginary axis. These are paraconjugate and therefore P(iz) is either para-even or para-odd. Thus the first cycle in the Generalized P~outh-Hurwitz algorithm should vanish and we can immediately start with P0 - Q0 - P and P1 - Q~ - P~. Working out the details is left to the reader. The result is as follows. C o r o l l a r y 6.30 ( r e a l z e r o s ) The polynomial P C C[z] has only real zeros iff the general Routh-Hurwitz algorithm which should now be applied with initial polynomials Po = P and P1 = P~ (instead of the para-even and paraodd parts of P) is regular with quotient polynomials qk(z) - ~kz + iTk where ~k > 0 and Tk C R. Moreover, if the pk, k = 1 , . . . , s are defined as in (6.20), then P has deg Pk zeros of multiplicity k. P has only simple zeros if degQ1 = 0. Several other related tests can be derived from this general P~outhHurwitz algorithm. For more applications we refer to [107].
C H A P T E R 6. L I N E A R S Y S T E M S
322 6.7.2
Schur-Cohn
test
It will not be surprising that the stability checks for continuous time systems which require an appropriate location of polynomial zeros with respect to the imaginary axis have an analog for discrete time systems, where the location with respect to the complex unit circle is important. LetD{z e C ' [ z [ < 1 } , ' l g - {z e C ' [ z [ 1} and E - {z e C " [z[ > 1} U {oo} denote the open unit disk, the unit circle and its exterior respectively. The most classical test to check whether all the zeros of a complex polynomial are in D is the Schur-Cohn-Jury test [64, 163, 164, 186, 215]. Again the link can be easily made when considering orthogonal polynomials with respect to the unit circle. Let # be a positive measure on the unit circle, then the sequence {pk}~, n < oo is called a sequence of orthogonal polynomials on the unit circle with respect to # if
(Pk,Pt) - fv p k ( t ) p t ( t ) d # ( t ) - 5k,tvk,
k,l - O, 1 , . . . , n
with 6k,t the Kronecker delta and u k - Ilpkll2 > 0. Note incidentally that if the trigonometric moments are given by
#k-fv tkd#(t)' kcZ, then
( P k , P t ) - pHTmpk,
m _> max{k,l},
pj(t)-
pTt
where T,,~ is the Toeplitz matrix T,n - [#t-k]'~,t=o Such polynomials are called Szeg6 polynomials [222]. It is well known that the zeros of these polynomials are all in D. To give the recurrence relation for Szeg6 polynomials, we need to adapt our notion of pard-conjugate which we have given for the imaginary axis, to the situation of the unit circle.
Definition 6.31
( p a r a - c o n j u g a t e , r e c i p r o c a l ) For any complez function f , we define its pard-conjugate (us.r.t. the unit circle) as the function f , (z) =
If P is a complex polynomial of degree < n, then its reciprocal with respect to n is given by P # (z) - z"P, (z). We say that ~ is a pare-conjugate zero of a polynomial P if P(() = 0 and also P( (. ) = 0 with ~, = 1/~. A polynomial P is called e-reciprocal if P # - eP with e C T. A polynomial is called reciprocal if it is e-reciprocal with e = 1. A polynomial is called anti-reciprocal if it is e-reciprocal with e = - 1 .
6.7.
STABILITY
323
CHECKS
The definition of reciprocal depends on the degree of the polynomial. It should be clear from the context what degree is assumed. If P ( z ) = c n z n + c,,-1 Zn-I + " " + Co, with c,., - c n - 1 = " " = c k + l - 0 and ck ~ 0, then it is said to have a zero of order n - k at co. Thus P C C_~[z] has a zero of order n - k at co iff P # has a zero of order n - k at the origin. Note t h a t the zeros of P are para-conjugates of the zeros of P # i.e. if P ( ( ) - 0 then P # ( r - 0 with (. - 1/~. This implies t h a t ( i s a para-conjugate zero of P if and only if it is a common zero of P and P # . Para-conjugate zeros are on "ll' or appear in pairs ((, (.). Thus ( is a paraconjugate zero of P iff it is a zero of g c d ( P , P # ) . In other words, the greatest common divisor of P and P # collects all the para-conjugate zeros of P and it has only these zeros. A polynomial is e-reciprocal iff it has only para-conjugate zeros. As for the real line case, the Szeg6 polynomials satisfy a three term recurrence relation, but they also satisfy another type of recurrence which is used in the Levinson algorithm. For simplicity, assume t h a t the Szeg5 polynomials are monic, then we can write this recurrence as
I, Pk+l(z) -- ZPk(Z)"~- PkTlP~k(Z),
p0
-
Pk-.}-I e ]D),
k - 0,...,
7/,- 1.
(6.22)
The Pk are known as Szeg5 or Schur parameters or reflection coefficients or partial correlation (PAt~COI~) coefficients. Note that they are given by = pk(o).
It is also well known t h a t the zeros of the Szeg5 polynomials are all in lD. Thus, to check if a polynomial P of degree n has all its zeros in lD, we could check if it is the n t h polynomial in a sequence of Szeg5 polynomials. To do this, we can rely upon Favard's theorem which says t h a t a sequence of polynomials is a sequence of Szeg5 polynomials iff it satisfies a recurrence of Szeg5 type. Thus, given pn = P monic, we can invert the recurrences as
(1-
IPkl2)zpk_
(z) - p k ( z ) -
pkp
(z),
Pk -
pk(0)
k - n,n-
1 , . . . , 1.
Thus if all the Pk which are generated in this way are in ID, then p,, = P will be the n t h Szeg5 polynomial in a sequence of orthogonal polynomials because it can be generated by the Szeg5 recurrence and hence it has all its zeros in D. Note that we can leave out the factor ( 1 - Ipkl 2) since it will not change the value of pt: - P k ( O ) / P # k ( O ) . This is the computation used in the classical Schur-Cohn test. We thus found t h a t if P is stable, then the pk are all in ID. But the converse is also true. If all the pk are in D, then P will be stable. Therefore
C H A P T E R 6. L I N E A R S Y S T E M S
324
we note that if P has all its zeros in D, then Sn - P / P # will be a Blaschke product of degree n which is analytic in D and its modulus equals 1 on T. This is a Schur function. D e f i n i t i o n 6.32 A complex function f is called a Schur function if it is analytic in D and if ]f(z)] _< 1 for all z E D. A classical theorem by Schur gives us the tool to invert the previous statement about the Blachke product. Theorem
6.31 ( S c h u r )
p = S(O) C D
and
The function S is a Schur function iff either i S(z)- p S - I ( Z ) - ; I -fiS(z)
is a Schur function
-
or
S ( z ) is a constant function of modulus 1. Its proof is easily obtained by observing that if p = S(O) C D, then
z-p
1
-
is a MSbius transform which is a one-to-one map of the unit disk into itself. Moreover Mp(S) has a zero in the origin which can be divided out. Hence the Schur function S is m a p p e d into the Schur function S-1. If S(0) C T, then this map is not well defined, hence the second possibility, which follows by the m a x i m u m modulus principle. Note that if S - P / P # , with P a polynomial of degree n, then we can express S ~ S-1 as
1 P(z) - p P # ( z ) S _ l ( z ) - z P # ( z ) - -tiP(z)'
p-
p(o)/P#(O),
or in terms of polynomials: S-1 - P-1//)-#1 with P-1 (z) - z1 [P(z) - pP#(z)],
19#-1(z) - P # ( z ) - -tiP(z),
P(0) P - P#(0)
Note that this corresponds to one step in the inverse Szeg6 recurrence. So, we reformulate it as follows: Given a polynomial P of degree n, set Pn = P and generate until Pt#_t(O)- 0 (t _> O) the sequence of polynomials Pk as Pk(O)
Pk#_l(Z) - P k # ( z ) - --#kPk(z),
k - n,n-
1,...,t.
(6.23)
6.7.
STABILITY CHECKS
325
We refer to it as the classical Schur-Cohn algorithm. Thus if Pn has all its zeros in D, then Sn - P n / P ~ will be a Schur function of degree n and thus the Schur-Cohn algorithm will generate Schur parameters pk which are all i n D . If for s o m e t > 0 we get Pt C T, then St is a c o n s t a n t and Pt-1 is identically zero. Thus Pt is pt-reciprocal: P t ( z ) - p t P ~ ( z ) . Since common zeros of Pk and P f are also common zeros of Pk+l and Pk#+I it follows that Pt - g c d ( P , P #) if pt C T. Since Pt can only have para-conjugate zeros, this is impossible if we started with a Pn that had all its zeros in D. Thus the polynomials Pk, k = 0, 1 , . . . , n satisfy a Szeg6 recurrence and thus they are orthogonal by Favard's theorem. To recapitulate, we have obtained the following analog of the Favard Theorem 6.17. T h e o r e m 6.32 Let Pk, k - 0 , . . . , n be a set of complex monic polynomials with deg pk = k. Then these polynomials are orthogonal with respect to a positive measure on the unit circle iff they satisfy a recurrence relation of
the/o,'m For a proof of this Favard type theorem, see for example [89]. The property saying that the zeros of Szeg& polynomials are in D is also classical. See for example [222]. From our previous observations, we also have the following analog of Corollary 6.18. C o r o l l a r y 6.33 Let Pn be a complex polynomial of degree n, then it is the nth polynomial in a sequence of polynomials orthogonal with respect to a positive measure on the unit circle iff it has all its zeros in D. We also have proved the classical Schur-Cohn test. T h e o r e m 6.34 (classical S c h u r - C o h n t e s t ) Let P be a complex polynomial of degree n, then it has all its zeros in D iff the classical Schur-Cohn algorithm (6.23) generates n reflection coefficients pk which are all in D. If we do not start from a polynomial P with all its zeros in D, then it may happen that P ~ ( 0 ) - 0. If Pt-1 is identically zero, we still have Pt - g c d ( P , P #) as before. If P ~ I ( 0 ) - 0 without P t - i being identically zero, then P ~ i ( 0 ) - P ~ ( 0 ) - - f i t P t ( O ) - 0 or St(O) - 1/-fit and since by definition St(O) = pt, it follows that Pt = 1~-fit. Thus also in that situation we have pt C T. To deal with this situation conveniently, we rely again on the index theory of pseudo-lossless functions.
D e f i n i t i o n 6.33 ( p s e u d o - l o s s l e s s , p s e u d o - i n n e r ) A rational function F is called pseudo-lossless (w.r.t. T ) if F + F, = O. A rational function S is called pseudo-inner (w.r.t. T ) if S S , = 1.
326
C H A P T E R 6. L I N E A R S Y S T E M S
Note that there is a one-to-one mapping between the pseudo-lossless and the pseudo-inner functions given by the Cayley transform S-
1-F I+F
or
1-S I+S"
F=
(6.24)
Obviously, a rational pseudo-inner function S is of the form S - P / P # with P a polynomial. Definition 6.34 ( i n d e x ) Let T be a polynomial and denote by N ( T ) the number of zeros of T that are in D. The index of a rational pseudo-inner function S - P / P # with P and P # coprime is defined as I ( S ) - N ( P #). The index of a rational pseudo-lossless function F = P / Q with P and Q coprime is defined as I ( F ) = N ( P + Q). If S and F are related by (6.24), then they have the same index: I ( S ) = I ( F ) . Indeed, if S - P / P # , then F - ( P # - P ) / ( P # + P), so that I(F) - N(2P # ) - I(S).
The following properties hold for pseudo-lossless fuctions. For a proof we refer to the literature [83]. See also [81, 84]. The proof goes along the same lines as the proof of Theorem 6.24 or it can be directly obtained from that theorem by applying a Cayley transform mapping the half plane to the unit circle. T h e o r e m 6.35 1. If F is a rational pseudo-lossless function then 1/F, F + ic with c C R are also pseudo-lossless of the same degree and with the same index as F. 2. If F1 and F2 are rational pseudo-lossless functions without common pole, then F = F1 + F2 is pseudo-lossless of degree deg F = deg F1 + deg F2 and with inde~ I( F) = I( F1 ) + I( F2 ). 3. If F is a rational pseudo-lossless function with distinct poles 7k C T of multiplicity tk, for k = 1 , . . . , t and with poles ~t E D of multiplicity st, for l = 1 , . . . , s , then F = F1 + . . . + Ft + G1 + " " + G s + ic with c C R and -
fjr
j=l
+, z
rj
z
and at(z)
-
j
j=l
_
z- G
1 - ~jz
6.7. S T A B I L I T Y CHECKS
327
All the Fk and Gl are pseudo-lossless and I ( a l ) - st and I(Fk) - tk/2 if tk is even while I ( F k ) - ( t k - sgn ftk)/2 if tk is odd. Moreover, they have no common poles and I ( F ) - ~':~=1 I(Fk) + ~-~=11(Gl). If P is a complex polynomial and if g o d ( P , P # ) - Pt, then P - Q Pt and since Pt is e-reciprocal, we also have P# - eQ # Pt. Hence
F = 1 - P / P # _- P# - P __ eQ # - Q 1 + P/P# P# + P eQ# + Q is a pseudo-lossless function and S - g / P # function and obviously
I(F)-
I(S)-
- Q/eQ # is a pseudo-inner
N(Q#).
Thus
N ( P #) - N ( Q #) + N ( P t ) -
I ( S ) + N(Pt).
To find N ( P # ) , we should be able to compute I ( S ) and N(Pt) with S = P / P # , a pseudo-inner function and Pt - g c d ( P , P # ) an e-reciprocal polynomial. The Schur-Cohn algorithm actually does this. Indeed, with the Pk as in the classical Schur-Cohn algorithm ( 6 . 2 3 ) a n d Sk - Pk/Pk#, we have deg Sk - deg Pk - k. Now the MSbius transform z
~-+
z-p 1 - -pz
maps D onto D if p C D and maps D onto E if p C E. Because this mapping is one-to-one, we have
I(Sk) - I ( S k _ l )
ifpkEDand
I(Sk) - k - I ( S k _ l )
ifpkcE.
This allows us to compute conveniently I(Sn) as long as we do not meet the singular situation pk C T. We know that this will happen when P~-I (0) - 0. Either Pk-~ is identically zero, and then Pk - g c d ( P n , P ~ ) a n d Sk is a unimodular constant (N(Sk) = 0). Or Pk-1 is not identically zero. This is a curable breakdown of the Schur-Cohn algorithm which can be solved as follows. Since P~_I(0) - 0, it has a zero at the origin. Suppose this zero has multiplicity m. Because P~-I - P ~ - PkPk with Pk C T, it is an e-reciprocal (with respect to k) polynomial with e = - p k and hence it will have degree k - m. Thus it is of the form (z) - z m T k _ 2 m ( z )
328
C H A P T E R 6. L I N E A R S Y S T E M S
where T k _ 2 m is an e-reciprocal polynomial of degree k - 2m. The pseudolossless function of degree k l + p k S k = P~ +-PkPk 1 - ~kSk Pff - P~Pk
i~k-
will have a para-conjugate pair of poles (0, co) of multiplicity m. These poles can be isolated in an elementary pseudo-lossless term Gm of the form ,n
hj
where the hj are suitable complex numbers, so that Fk - G m
+
F k - 2 rn
with Gm having only poles at 0 and co and F k _ 2 m pseudo-lossless and analytic in 0 and co. By the additivity property of Theorem 6.35(2), we have I ( F k ) - I ( G m ) + I ( F k - 2 m ) and because I ( G m ) - m, we get I(/~k) - m + I(Fk-2r,,).
We can transform this back to the pseudo-inner formulation by setting 1 + Fk-2m S k - 2m =
i ~ Fk_2rn
and we have
z ( s k ) - Z(~kSk)- Z ( P k ) - m + Z(Fk_~m) - m + Z ( S k _ ~ ) . To put this procedure into a practical (polynomial) computational scheme, we should be able to generate the polynomial Pk-2m in the representation Sk-2m - Pk-2m/P#_2m from the polynomial Pk in Sk - Pk/Pk#. Therefore, we note that Gm has the form
v,,,(z) -
D2,.,(z) Z TM
with D2m a polynomial of degree 2m which has the particular structure D2m(z)
-
do + dlz +
=
b,,,_,(z)- ~'+'b~_,(z).
9 . .
+ d~-i
Z m-1
zm+l - -din-1
.....
-doz2m
6.7. S T A B I L I T Y C H E C K S
329
Letting
Wk(z) F~(~)- ~"Tk_~,,,(z)
P:(~) + -p~P~(~) P~ ( z ) - -ZkPk ( z )
and
Fk_~,,,(z)- y~_~,,,(z)
Tk-2m(z)
P:_~,,,(z) + P~_~,,,(z)
wzJ
T~_~,,,(z)- ~
with polynomials k
w~(~)-
k-2m
~
and
ts~ s,
j=O
j=0 we find from the relation Fk - G m
+ Fk-2m that
Tk_2~(z)D2m(z) + z m V k - 2 m ( z ) - Wk(z).
(6.25)
Equating the coefficients of z j, j - 0, 1 , . . . , m - 1, one sees t h a t the second term is not involved and we find t h a t the coefficients di, j - 0 , . . . , m - 1 of the polynomial Din-l, which completely defines the polynomial D2,,,, can be found by solving the triangular Toeplitz system
to tl
to
i
i
tin- 1
tin- 2
do dl
Wo wl
i
i
din- 1
ZOrn- 1
"'. "
99 to
Once D2m is known, we can solve (6.25) for Pk-2m is found as
Yk-2rn
and finally the polynomial
P~-2m = Tk-2m + Vk-2m. So we have managed to j u m p from Pk to Pk-2m, thus from Sk to Sk-2m in the case Pk C T. After that, the Schur-Cohn algorithm can go on. Since we are now able to deal with any situation where pk can be in D, in E or on T, the Schur-Cohn algorithm will only stop when pt C T and Pt-1 identically zero so t h a t Pt - g c d ( P , P # ) . The computation of the index I(S) thus follows the rules
I(Sk)
--
I(Sk-1),
I(Sk)
-
k-I(Sk_l),
I( Sk )
-
m + I( Sk_2m ),
if Pk e D ifpkeE ifpkcT
C H A P T E R 6. L I N E A R S Y S T E M S
330
where in the last case m is the multiplicity of the zero z - 0 in P~-I" Thus we are able to compute I ( S ) where S - P / P # and we end up with I ( S ) and Pt - g o d ( P , P # ) . Thus
N ( P #) - I ( S ) + N(Pt) - N ( P # / P t ) + N(Pt). Example
6.5 ( G e n i n [83]) We consider the polynomial
P(z)-l+z+z
2 + 2 z 3 + z 4 + z 5.
To compute N ( P ) , we set P5# - P and find that in the very first step of the Schur-Cohn algorithm we have p5 - P s ( O ) / P ~ ( O ) - 1. The polynomial P ~ is given by P 4 # ( z ) - z 2 ( - 1 + z) which is not identically zero. Thus we have to rely on the special procedure to overcome this singular situation. Note that z = 0 is a zero of multiplicity m - 2. The general formula P~_l(Z) - z'~Tk_2m(z) gives Tl(z) - - 1 q- z. The n u m e r a t o r of/55 is W ~ ( z ) - P ~ ( z ) + -;~P~(z) - 2 + 2 z + 3z ~ + 3z ~ + 2~ ~ + 2z ~
Thus we have to solve the triangular system
giving do = - 2 and
dl
-
- 4 . Thus
D4(z) - - 2 - 4z + 4z 3 + 2z 4. We then find V1 as
V~(z) - z - 2 [ W s ( z ) - Tl(z)D4(z)] - 7(1 + z) and hence PlY(Z) - y , ( z ) + T , ( z ) - 6 + 8z.
We note t h a t I ( $ 5 ) = 2 -}- 1(81)
and we can continue with the Schur-Oohn algorithm using P1#.
px - PI(O)/PI~(O)- 4/3 E E. Thus /(Sl)
-- I - - / ( S 0 )
-- I
We get
6.7.
STABILITY
331
CHECKS
since So is a constant. This means that Ps and P ~ are coprime and thus N ( P ) - N(P~#) - I ( S s ) - 2 + I ( S 1 ) - 2 + 1 - 3.
There are no para-conjugate zeros. There are 3 zeros in D and 2 zeros in E. In general P and P # can have a nontrivial g.c.d. P~ which will contain all the para-conjugate zeros of P. Then we have to find a convenient way to compute N ( P t ) . Note that Pt is always e-reciprocal. Our next objective is therefore to compute N ( Q ) with Q an arbitrary e-reciprocal polynomial where for simplicity we assume that Q ( 0 ) ~ 0. We start by noting that such a polynomial is always of the form ~!
Tn I
Q(z) - ,7 I I ( z - n) TM I-[ 1--1
n
[(z - r
- ~z)]
~
q, zJ
- ~
m--1
(6.26)
j--O II
with 77 C T, TI C T, l - 1 , . . . , I ' , 0 ~ (m E D, m - 1 , . . . , m ' and ~ t = l nt + TIq~t 2 ~ m = l nm - n. Define the pseudo-lossless function F(z) - n-
(6.27)
2z Q'(z---~)
Q(z)
where Q' is the derivative of Q. Using a partial fraction decomposition of F and the definition and additivity of the index of pseudo-lossless functions, it can be shown [108] that i e m m a 6.36 With Q of the f o r m (6.26) and F as in (6.27), we have that I(F) = m'= N(Q). P r o o L By a partial fraction decomposition, we see that F(z)
-
+
1=1
7"/-- Z
~-~nm
m=l
+
(rn-- Z
-
,
l--~mZ
which incidently shows that F is pseudo-lossless. For the terms in the first sum we note that X nt
-N Tl--
(nt+l)Tt+(nt--1)z
--0.
Z
For the terms in the second sum we have with bin(z) - 2((m - ~ m z 2) and am(z) - (,,,,, - ( 1 + I(,nl2)z + (,~z 2 that its index is given by
i
nm~
- N('~mbm + ~m).
C H A P T E R 6. LINEAR S Y S T E M S
332 Setting
P(z)
-
n ~ b m ( z ) + a~(z)
-
(1 + 2n,,,,)~,,,, - ( 1 + Ir
+ (1 -
the product of its zeros is (1 + 2nm)/(1 - 2nm)" ~,~/~,~ r T. Thus it has no para-conjugate zeros so that we can apply the Schur-Cohn test with the theory we have seen so far. We obtain p2 and pl C E. Thus N ( P ) = N(nmbm + am) = 1. Using the additivity property for the indices, we see t h a t indeed I ( F ) = m'. [:] Thus I(F) counts the number of para-conjugate pairs of zeros of Q which are not on T. To compute I(F) practically, we first note 6.37 With Q as in (6.26) and T(z) = n Q ( z ) - 2zQ'(z) the numerator of the function F in (6.27), we have g c d ( Q , Q') = g c d ( Q , T).
Lemma
P r o o f . Obviously, any common zero of Q and Q' is also a zero of T. Conversely, a common zero of Q and T is also a zero of zQ'(z). Since Q is e-reciprocal, it can not have a zero at the origin because this would imply t h a t it is not of degree n. Thus a common zero of Q and T is also a zero of Q'. This proves the lemma. 0 Let Q1 = g c d ( Q , Q'). Then, since F = T / Q , we have
I(F)-
T/Q1 I(Q/Q1)-
N[(Q + T)/Q1] - N(U/Q1)
with n
U(z) - Q(z) + T(z) - (n + 1 ) Q ( z ) - 2zQ'(z) - ~ . ( n - 2j + 1)qjz j. j=O Since also Q1 is e-reciprocal and
Q - VQ1,
Q ' - RQ1,
T - WQ1,
hence
U - (V + W)Q1,
for some polynomials V, R, W, it follows that Q1 is also the g.c.d, of U and U #. Thus Q1 can be computed by applying the Schur-Cohn algorithm to V. Moreover, the algorithm provides us with N(U/Q1) = I(F). Because Q1 = g c d ( Q , Q'), it contains the zeros of Q which have a multiplicity at least 2. Because Q1 is again e-reciprocal, thus of the same form as Q, this procedure can be repeated, replacing Q by Q1. Thus, after the classical
6.7.
333
STABILITY CHECKS
Schur-Cohn algorithm has computed Q - P~ - g c d ( P , P # ) , the general Schur-Cohn algorithm will append to it the computation of the polynomials
Qo - Q,
Uk(z) - n k Q k ( z ) - 2zQ~(z),
nk -- deg Qk,
k - O, 1 , . . . , s - 1
and apply the classical Schur-Cohn steps to these Uk to give Qk+l - gcd(Uk, Uff) - gcd(Qk, Q~)
and
Nk - N ( U k / Q k + I ) .
This iteration stops when eventually degQ, = 0. The number of paraconjugate pairs of zeros of Q, hence also of P, which are not on T and that have a multiplicity at least k is given by Ark. Thus if P is a complex polynomial and Q0 - g c d ( P , P #) is e-reciprocal, then the number of its para-conjugate pairs of zeros which are not on 2" is given by N ( Q o ) = $--1 ~k=0 Ark and the number of zeros of P that are on "IF is given by N o ( P ) No(Qo) = deg Q 0 - 2N(Q0). In conclusion we can say that the total number of zeros for P that are in D is given by N ( P ) = N ( P / Q o ) + N(Qo). By applying the Schur-Cohn algorithm to P we get N ( P / Q o ) and Q0. By the s subsequent Schur-Cohn sweeps applied to the polynomials U0, U1,..., U~-I, we get the quantities N ( Q o ) and N o ( P ) = go(Qo). As in the Routh-Hurwitz case, one can obtain from the Qk the factors pk that collect precisely the zeros of Q that have multiplicity k. The formulas are the same p~ - Qs-1/Q~,
and
pk - Q k - l Q k + l / Q ~ ,
k-
1,...,s-1
and
Q o ( z ) - cpl(z)pi(z)...pSs(z),
c e C.
with N ( p k ) - Nk - N k - 1 , the number of para-conjugate pairs of zeros which are not on 21"and that have multiplicity k. See [108]. E x a m p l e 6.6 Consider the polynomial 9 1 ) 2 ( z - 2 ) ( z - 1 / 2 ) - 1 - 5z + 7z 2 _ 59 Z 3 +
Q(z) - ( z -
We construct 4
U(z)
j=o
27
9
2j),sz'
-
-
5-
-z
2
+ 7z
+
-
2
3z
Z4
"
C H A P T E R 6. L I N E A R S Y S T E M S
334
Applying the Schur-Cohn algorithm to this polynomial P ~ = U, we get the successive steps 5-
+
9 Z3 -
3z4
9 --~ p3 -- - g E E
-r~ + ~ z ~ + ~ z 3 17 9 19_2
~--
pg(z)
P~(z)-
19
20 + g z - $6~ -
-
3
--*p4---g ED --'* P 2 -
z)
i-fEE
---, pl = - 1 E T.
Thus we arrive in the singular situation Pl E T. But since P0# - 0, we have found that gcd(Q, Q ' ) - gcd(U, U #) - Q l ( z ) -
Pp(z)-
18 --~-(1 - z)
which contains the zero z = 1, which is indeed the only zero of Q that has multiplicity larger than 1. To count the number of para-conjugate pairs, we need -
=
3-
3-(2-
N C P I # / P I # ) ) - 3 - 2 - 1.
Thus Q has one pair of para-conjugate zeros which are not on T of multiplicity 1, and it has one zero on T of multiplicity 2.
We mention that both the P~outh-Hurwitz and the Schur-Cohn algorithm have an operation count which is O(n 2) with n - deg P. However, since the Schur-Cohn algorithm works with the full polynomials while the l~outhHurwitz algorithm needs para-even and para-odd polynomials which contain approximately half the number of coefficients of the full polynomials, the Schur-Cohn algorithm will need about twice as many operations as the Routh-Hurwitz algorithm. This observation has led Y. Bistritz [16] to design an alternative for the Schur-Cohn algorithm which makes use of the symmetric and anti-symmetric parts of the polynomial when it has real coefficients. These polynomials, like e-reciprocal polynomials show a symmetry in their coefficients so that only half of them have to be computed, and thus also for the circle case the number of operations can be halved just like in the l~outh-Hurwitz test. Later, this gave rise to so called "split" versions of the Levinson and Schur algorithm as explored by Ph. Delsarte, Y. Genin and Y. Kamp [75, 76, 77, 78, 79, 80, 106]. They also give a generalization of the Bistritz test to complex polynomials [81].
6.7. S T A B I L I T Y CHECKS
335
We first sketch the ideas of the split Levinson algorithm and show how the inversion of this algorithm leads to a stability test in the normal case. Later on we give the complex Bistritz algorithm for the general case. Suppose that {pk} are the Szeg6 polynomials. Define a new set of polynomials for arbitrary nonzero wk by R0=w0+~0, 0#w0EC Rk(z) -- "WkPk_ # 1 (Z) Ar ~ k Z P k _ l (Z),
"U)k -- R k ( O ) ,
k -
1, 2, . . . .
(6.28)
Obviously these polynomials form a set of reciprocal polynomials (whence the notation R), meaning that Rff - Rk. Because of the symmetry in the coefficients of such polynomials, they are completely defined in terms of the first half of their coefficients. It is possible to give an alternative expression for the Rk. L e m m a 6.38 The polynomials Rk of (6.28) can also be expressed as
okak(z) -
~kp~(~)+
~kPk(z),
~k -- O k - l ( ~ k -- P--k~k),
where the ak are defined by the recurrence 0 ~ o~ -
o-k_~(i
- I,okl~),
k -
a-1
= ao E
k -- 0 , 1 , ~ , . . . (6.29) R and
o, 1, 2,...
where we have introduced the artificial po = O. P r o o f . This is immediately obtained by using the SzegtJ recurrence (6.22) in (6.28). o The interesting thing about these reciprocal polynomials Rk is that they satisfy a specific three term recurrence on condition that the parameters wk are chosen in an appropriate way. T h e o r e m 6.39 One can choose the parameters wk such that the polynomials (6.28) satisfy a three term recurrence of the form
Ro = wo + Wo Rl(z) = Wl + @1z
~k+l(Z) --
(6.30)
(0l k + - ~ k Z ) R k ( Z ) + Z e k _ l ( Z )
: O,
k :
1, 2 , . . .
P r o o f . This is obtained by using the definition (6.28) and the Szeg6 recurrence (6.22)in the expression R k + l ( z ) + z R k - l ( Z ) . This takes the form (ak + -Skz)Rk(z) when we set ak = wk+l/wk which is defined in terms of wk, wk-1 and the Schur parameters pk-1 and pk such that ak - Wk+l _
(Wk-lPk-1 --
wk-i)
k - 1, 2, 3, .
~
~
336
CHAPTER
6. L I N E A R S Y S T E M S
Using the expression for ak and the definition of the Tk, it is immediately seen that a k - wk+l _- ~k'-_____A,1 k - 1, 2,3, ... (6.31) "tOk
Tk
T h e o r e m 6.40 The reciprocal polynomials Rk and the Szeg6" polynomials pk are also related by Rk+l(z)-
)~k+lzRk(z) - wk+l(1 - ~kz)p#k (z),
k - O, 1 , 2 , . . .
(6.32)
where ~k+~ = w--k+~ak _ Rk+~(,?k)
~k
k-
0 ~ 1 ~ 2 ~''*
nkRk(nk) '
The quantities l/)k+l, Tk and ak are as above and ~?k - wk+~Tk C T
k - O, 1 2,
ll3k+ l T k
P r o o f . This follows easily by combining the formulas (6.28) and (6.29).
The )~k are called Jacobi parameters [78, 79]. Finally, for the Schur-Cohn algorithm, the most important observation is that
~k+l
llJ---k+1 Tk-1 (Tk
Ak
~krkak-z
= I~kl~-(1 - Ipkl~),
k - 1, 2, 3 , . . .
(6.33)
Therefore we find that Ak+t/Ak is real and that Pk C D (V, IE) iff Ak+l/Ak > 0 ( - 0, < 0). The efficient version of the Schur-Cohn test will therefore reverse the recurrence (6.30) to compute the polynomials Rk by
R k _ , ( z ) - z-X[(~k + ~ k z ) R k ( z ) - Rk+a(z)],
k-
~, ~ -
1,..., 1
(6.34)
where ak = Rk+l(O)/Rk(O). This will work as long as Rk(O) ~ O, which will be assumed for the time being. Note that all the polynomials Rk are reciprocal and we thus have to compute only the first half of its coefficients. To get started, we have to find appropriate Rn and Rn+l. Given a polynomial P, this P will play the role of the Szeg6 polynomial ion, but since in general it will not be orthogonal with respect to a positive definite
6.7. STABILITY CHECKS
337
measure on the unit circle, we switch to the notation with capitals. Thus we set Pn = P. We assume without loss of generality that it is monic, thus P # ( 0 ) - 1. A possible choice is then
Rn+l(z)- P#(z) + zP(z)
and
(6.a5)
R , ( z ) - P(z) + P#(z).
Both of them are reciprocal and the relation (6.32)holds with p,(z) = pn(z), wn+l = 1, ~,~ - 1 and ),n+l = 1. This is a most convenient situation because it follows from (6.31) that arg (Wk+l)+ arg ('rk) = arg (Wk) + arg ('rk_l) while ~n = 1 implies a r g ( w , + l ) + arg (Tn) = 0. Therefore, it follows by induction t h a t for all k we have arg (wk+l) + arg (rk) = 0 and thus that all ~k = 1. Thus Ak+l = Rk+l (1)/Rk(1). The evaluation of a polynomial in the point 1 is particularly easy since we only have to add its coefficients. Because all the polynomials Rk are reciprocal, we can compute Rk(1) by taking twice the sum of the real parts of the first half of its coefficients. Thus all Rk(1) and hence all Ak are real numbers. Since we only need the sign of the )~k, hence of the Rk(1), we can even discard the factor 2. Let us first consider the nondegenerate situation. In the Schur-Cohn algorithm, this means that none of the pk (except for the last one) are on T. By the definition of the rk (6.29) we see that this means that all rk # 0 and by (6.31) this also implies that all ak ~ 0 and thus that all wk # 0. The converse however is not so simple. We do know the following: Since ~k = 1, we get from (6.32)
P : (z) - Rk+l (z) - /~k+l z R k ( z ) Wk+I(1--Z) with P ~ ( 0 ) -
1 and pk - Pk(0). Thus
Pk(O) _ 1 (~k+l,Wk __ ,t.Ok+l) __ 'Wk+l ('~k+___~l Pk - pk# (O) - ~ k+----~ t//%+l O~k
1).
Therefore Pk C T iff 1 - )~k+l/ak C 2". This implies that if wk = Rk(0) = 0, then ak = oo, so that pk E T. Also, from (6.29), we find
O'k-l'WkTk-1 Thus if R k - l ( 1 ) = 0, then ~k = oo and thus Pk E T.
--
~
1
338
CHAPTER
6.
LINEAR
SYSTEMS
However Rk(0) = 0 is not necessary for Pk to be in 21" and neither is Rk_~(1) = 0. It may happen that pk E ql" but Rk(0) r 0 as the following example shows. In fact all kinds of situations can occur as we will illustrate IIOW.
E x a m p l e 6.7 Suppose P ( z ) - 1 + z + 2 z 2 + z 3 so that p3 - P ( O ) / P # ( O ) 1. The polynomial P has one real zero inside ]D and two complex conjugate zeros in E. The algorithm should be initialized with R , ( ~ ) - ~ + 3z + 2 z ~ + 3~ ~ + z ~
~nd
e~(z) - 2 + 3z + 3z ~ + 2z ~
After one step one obtains R2(z) - ( - 1 + 2 z - z2)/2 and we note that R2(1) = 0. However, the algorithm can terminate without some Rk(0) becoming zero. We have R 4 ( z ) = 1 + 3 z + 2z 2 + 3 z 3 + z 4
R3(z) = 2 + 3z + 3z 2 + 2z 3 R~(z) - (-1 + 2z- z~)/2 R~ (z) = - 5 ( ~ + z) R0(z) = - 2
R,(1) R~(1) R~(1) R~(1) R0(1)
= = = = =
10 10 0 -10 -2
Another example: consider P ( z ) - 1 + z + z 2 + 2 z 3 -[- z 4 -+- z 5 where again p~ = 1. The initialization is R6(z) - 1 + 2z + 2z 2 + 4z 3 + 2 z 4 + 2 z 5 + z 6 R s ( z ) = 2 + 2z + 3z 2 + 3z 3 + 2 z 4 + 2 z 5.
and
Now we obtain after one step R4(z) - ( z - 2z 2 + z3)/2, thus R4(0) - 0 and R~(1)=o.
A last example: P ( z ) = 1 + z + 2z 2 - z 3 with P3 = - 1 . The initialization is R4(z)--l+3z+2z
2+3z 3-z 4
and
R3(z)-3z+3z
2,
which immediately gives R 3 ( 0 ) = 0 but R 3 ( 1 ) ~ 0.
Suppose we have a nondegenerate situation and let us consider an example of a polynomial P # of degree 5, for which the classical Schur-Cohn algorithm computes the polynomials P ~ and the Schur parameters pk, k = 5 , . . . , 1 as in the table below. If also the inverse split Levinson algorithm is applied to this polynomial to compute the reciprocal polynomials Rk and the coefficients )~k, then the relation (6.33) implies the signs of the ~k and of the Rk(1) as indicated. We know that )~6 -- R6(1)/Rs(1) - 1,
6.7. S T A B I L I T Y CHECKS
339
so that R6(1) and Rh(1) have the same sign. We have assumed ample that R6(1) > 0. The other possibility R6(1) < 0 would signs of all the subsequent Rk(1). It is easily checked t h a t the zeros N ( P #) t h a t are in D corresponds to the number of sign the sequence Rk(1).
! R . ( 1 ) :> 0 sgn flips
N(P:)Rs(1) > 0
]
N(p~) R4(1) > 0
I
N(P:)Ra(1) > 0
!
3
N(P2#) R2(1) < 0 1
--t--
I
3
N(P1#)
R1(1) > 0 1
-'t-
in this exchange the number of changes in
I
-3 /t0(1) < 0 1=3
This observation holds in general as long as we have a nondegenerate situation. Nondegenerate means that all pk ~ T, thus that none of the Rk(0) and none of the Rk(1) become zero. We formulate it as a theorem, but leave the details of the proof to the reader. T h e o r e m 6.41 If the inverse split Levinson algorithm (6.34,6.35) is applied for a given complex monic polynomial P of degree n, then if there is no degenerate situation during the execution (all Rk(O) and all Rk(1) are nonzero) then the number of sign changes in the real sequence Rk(1) ~ O, k - n , . . . , 0 is equal to N ( P #), the number of zeros of P# in D. In particular, the polynomial P will have all its zeros in D, iff all the numbers Rk(1) have the same sign. This inverse split Levinson algorithm is however unable to deal elegantly with singular situations (where some pk C 'IF). Only when all wk = Rk(0) 0, k = n , n - 1 , . . . , t and Rt-1 =- 0, we can say something, l~ecall t h a t Rn+l (0) - P # ( 0 ) - 1 is always nonzero. Note t h a t when Rt-1 = 0, then R~+l(z) - (at + - h t z ) R t ( z ) . Thus g c d ( R t + l , R t ) = Rt. It is easy to see from ( 6 . 3 4 ) t h a t a common zero of Rn+l and Rn is also a zero of Rn-1, Rn-2, ... Thus if at a certain stage in the algorithm we find Rt-1 = 0, then we have found g c d ( R n + l , Rn) - Rt. Since we do not have to count sign changes for this, we do not need the polynomial P to be monic. So let us assume that P is any polynomial of degree n but t h a t it has no zero at z = 1. Such a zero is easily recognized and can be eliminated. Then we renormalize the resulting P such that 0 ~ P(1) C E. 6.42 Suppose 0 ~ P(1) C R. Then with the polynomials as defined above, we have g c d ( R n + l , Rn) - g c d ( P , P # ) .
Lemma
P r o o f . Any common zero of P and P # will also be a zero of Rn+l and Rn because of (6.35). Thus g c d ( P , P # ) divides g c d ( R n + l , Rn).
340
C H A P T E R 6. L I N E A R S Y S T E M S
Conversely, any common zero of Rn+l and Rn will also be a zero of a n + l ( z ) - R n ( z ) - ( z - 1)P(z) and of z R n ( z ) - Rn+l(z) - ( z - 1)P#(z). Now Rn(1) - Rn+l(1) - 0 iff Ke P(1) - 0. Thus the common zeros of Rn+l and Rn are common zeros of P and P # . This proves the lemma. [:] Thus we may conclude the following theorem. T h e o r e m 6.43 Given a complex polynomial P with 0 ~ P(1) C I~. Suppose that in the algorithm (6.3~,6.35) all Rk(O) ~ 0 for k = n, n - 1 , . . . , t and that Rt-1 - O. Then Rt - g c d ( P , P # ) . We note that this result depends only on the Rk(O) being nonzero and not on the fact that some pk is or is not on T. Because g c d ( P , P # ) contains precisely all the para-conjugate zeros of P, we arrive at the second stage of the algorithm where such zeros are treated. It has been explained before how this can be handled by the Schur-Cohn algorithm or by any alternative for it. This solves the problem that arrises when at a certain stage in the algorithm we find a polynomial Rt-1 that vanishes identically but none of the R k ( 0 ) a r e 0 for k = n, n - 1 , . . . , t. As we said before, there is no simple adaptation of the inverse split Levinson algorithm to deal with other singular situations. We shall have to switch to the complex Bistritz algorithm to solve these problems. The disadvantage of the complex Bistritz algorithm is there is not a simple link with the Szeg6 polynomials as in the inverse split Levinson algorithm. Most is surprisingly however, that, except for the initialization, these algorithms are formally exactly the same. We give the derivation of the complex Bistritz algorithm, based on the index theory as given in [81]. We drop the condition that P should be monic (which was mainly used to make the link with the monic Szeg6 polynomials) but instead assume that P(1) is real and nonzero. As we stated above, we can easily get rid of a zero z = 1 and then renormalize to make P(1) real. Thus assume P is a complex polynomial of degree n with 0 # P(1) C R. We split P # in its reciprocal and anti-reciprocal part P # = Rn + An where Rn = P # q- P is a reciprocal polynomial and An = P # - P is an anti-reciprocal polynomial. Because of the normalization P(1) e R, we can write An(z) - ( 1 - z ) R n _ l ( z ) with R~-I a reciprocal polynomial of degree n - 1. Then by definition, the complex Bistritz algorithm applies the same steps as in the inverse split Levinson recursion to these initial polynomials Rn and Rn-1 to give (in the nondegenerate situation) the reciprocal polynomials
Rk-X (z) -- z -1 [(C~k + -~kz)Rk(z) -- Rk+x (z)],
C~k --
k = n-
Rk+l(O) ak(O)
1 , n - 2 , . . . , 1.
6.7. S T A B I L I T Y CHECKS
341
If we then also compute the (real) numbers )~k = Rk(1)/Rk-l(1), then it turns out that the number N ( P #) of zeros of P # in D is given by the number of negative elements in the sequence Sk, k = n, n - 1 , . . . , 1. Before we show this general result, we give an example first. E x a m p l e 6.8 Consider the polynomial
P(z) - 1 + z + z 2 +
2z 3 + Z 4 "-~ 1 z 5
which has a pair of complex conjugate zeros in E, a pair of complex conjugate zeros in D and a real zero in D. Thus N ( P #) - 2. The Bistritz algorithm generates
k Rk( ) 5 4 3 2 1 0
53 + 2 z + 3 z 2 + 3 z 3 + 2 z 4 + 5 z3 s Z2 - Z3 - Z4 ) - ~1( - 1 - z + 1 - 3 z - 3z 2 + Z 3 89 + 5z + 3z 21 ~ ( 1 + z) 1
Rk(0)
Rk(i)
3/2
+
1/2 1
3/2 3/2
i/2
2
-3 -1/2 3/2 2/3
+ +
+ +
+
+
The number of sign changes in the sequence Rk(1) is 2 and this is equal to the number of negative ~k's.
The justification of this result is not as simple as in the Schur-Cohn test because the recurrence is not a simple translation of the Schur recurrence. If Pt - god(P, P # ) and S - P / P # is pseudo-inner, then N ( P # / P t ) - I(S). If Fn = (1 + S ) / ( 1 - S), then F,~ is pseudo-lossless and I ( S ) = I(1/F~,) = I(Fn). Thus the problem of counting N ( P # / Pt) reduces to the computation of I(Fn) where Fn(z) = R n ( z ) / [ ( 1 - z)Rn_l(Z)]. When we do not have degenerate situations, the degree of Fn is reduced by 2 in two steps, after which a pseudo-lossless function Fn-2 of the same form is obtained. This process is repeated until R0 is reached. We shall now describe in detail such a 2-step reduction of Fn to Fn-2. So suppose we have given at stage k the reciprocal polynomials Rk and Rk-1 and the pseudo-lossless function of degree k defined by F k ( z ) = R k ( z ) / [ ( 1 First, it is seen that this function has a simple pole at z = 1 which can be extracted by writing (all functions are pseudo-lossless) -
+
CHAPTER 6. LINEAR SYSTEMS
342 with
~ ( ~ ) _ ),k(1 + ~) 2(1- z) ' ~k
-
-
Rk(i) Rk-i(1)"
H~ has no pole at z - 1 and hence by the additivity property of the indices I(Fk) - I ( g ~ ) + I ( g ~ ) . On the other hand splitting Fk as
F~ - c~ + 1/a~ with alk(z)
__ Otk-1 -iF "~k-1 Z ,
Otk-1 __
1- z
Rk(O)
Rk_~ (0)
we find after elimination of Fk that 1
zRk_2(~)
C~(z) = [ H ~ ( z ) - a~(z)] + H~(z) - - ( 1 - z)Rk_l(z) where Rk_~(z)-
z-~[(~k_~ + - a k _ ~ ) R k _ ~ ( z ) - nk(z)].
(6.36)
Because, by construction, H~ - G ~ and H~ have no common pole, we have -
I(H~ - a ~ ) + I(H~)
= I(H~ - a~)+ I ( F k ) - I(H~) Some computations yield that
I(H~)- { while ~(H~ - G~) -
{1 o
1 if~k < 0 0 if,~k > 0
if 2Keak_l - ~k > 0 if 2Ke O~k_1 -- )~k < 0.
This can be expressed compactly as
z ( a ~ ) - ~(Fk)+ ~l[sgn)~k + sgn (2Ke ak_l
,~k)].
Setting z = 1 in (6.36) yields 2Keak_i - ~k = 1/~k-1 so that 1 Z(Fk) - I(V~) - ~(~g~ ~k + ~gn ~k-~).
For the second step, we isolate the pole z - 0 from G~ as follows (again all functions are pseudo-lossless) G ~ - K~ + K~
6.7. STABILITY CHECKS
343
with
K~(z) -
Or.k-2
F -~k-2Z~
Rk-l(0) ak-2 = Rk_2(0)"
Writing this as
a2k(z) - K~(z) - K ~ ( 1 ) + with K~(1) purely imaginary and F~_~(z)
-
(1- z)Rk-3(z)
where
Rk-3(z) - z - l [ ( a k - 2 + -ak-2z)Rk-2(z)- Rk-l(Z)] and
K ~ ( z ) - K~(1) - Lk(z) - - ( ak-2z + ~ k - 2 ) ( 1 - z ) . Obviously I ( L , , ) - 1 so that I ( G ~ ) - 1 + I(Fk-2) and therefore Z(Fk) - Z(Fk_~)+
1 -
~1 (sgn
,kk + sgn )~k-1)"
Since Fk-2 has the same form as Fk, we can repeat this two-step procedure until Rt-1 - 0. Since
1
1 - ~(sgn )~k + sgn ~k-1 ) --
/ 2l
if ~k < 0 and )~k-1 < 0 if ),kAk-~ < 0 0 if ),k > 0 and ~k-1 > 0
it follows that I(Fn) is equal to the number of negative ~k's for k = 1, 2 , . . . , n, which is the same as the number of sign changes in the sequence of Rk(1), k - 0, 1 , . . . , n . We shall not prove the Bistritz algorithm in the most general case. The reader is referred to [81] for all the details. The result is a four-step algorithm which is described as follows. Given a complex polynomial P with 0 ~t P(1) E R, it computes N ( P # / P t ) - I(Fn) and Pt where Pt - gcd(P, P # ) and Rn(z) f.(~)
-
(1 - z ) R ~ _ ~ ( z )
with
R,.,(z) - P#(z) + P(z)
and
R,~-I (z) -
P # ( z ) - P(z) 1-z
CHAPTER
344
6. L I N E A R S Y S T E M S
The iteration at stage k is entered with a pseudo-lossless function
Fk(z) -
ak(z) (1 - z)Rk_~ (z)
or equivalently, the reciprocal polynomials Rk and R,k-1 of the indicated degree. Then the following 4 steps are executed. STEP
1
If Rk-1 - 0 then stop, I(Fk) - 0 and Rk - gcd(P, P # ) . Otherwise write
ak-.(~) - ~"Rp_.(z).
R.,_~(O) ~ O.
p-
k-
2~.
Compute the anti-reciprocal polynomial D2m from
Rk(z) - (1 - z)D2m(z)Rr,_l(z ) m o d z TM. Compute the reciprocal polynomial Rp from
Rr,(z ) - z - m [ R k ( z ) -
( 1 - z)D2,,,,(z)Rr,_l(z)]
Define the pseudo-lossless function Fp and the Jacobi parameter )~p
aS
R.(z) Fp(z)-
(1 -
Z)Rp_l(Z)
and
x.= Rp(1) Rp_l(1)"
Then
~(F~) - ~ + ~(F~) Set k - p and go to STEP 2. STEP 2
Compute
Rk-2(z) - z-l[(otk-l-["-~k-l Z)Rk_l(Z)--Rk(Z)], Set Rk-l(1)
~k-~
=
Rk-2(1)"
Define the function C ~ ( z ) - - (1 - ~ ) a k _ ~ ( ~ ) zRk_2(z)
Rk(0) O~k_ 1
--
Rk-l(0)"
6.7.
STABILITY
345
CHECKS
Then I(Fk) - I(G#)+ a
with
o / Go to
0 1
5 (sgn)~k + sgn ;Xk_1 ) 1 ~(1 - sgn Ak)
-
STEP
if R k - l ( 1 ) - 0 and Rk-2(1)5r 0 if R k - l ( 1 ) r 0 and Rk-2(1)7 ~ 0 if R k - l ( 1 ) r 0 and R k _ 2 ( 1 ) - 0
3.
STEP 3
If Rk-2 - 0 then stop, I ( G ~ ) - 0 and R k - t - gcd(P, P # ) Otherwise write
Compute the reciprocal polynomial T.-1 from Rk-l(z) + zT,,_l(z)Rq_2(z)-
Compute the reciprocal polynomial
0mod
(1 - z) ".
Rq-1 from
R~_~(z) - (i - iz)-~[zT~_~(z)R~_~(z) +
Rk_~(z)].
Define the function
C~(z) -
-
( 1 - z)R~_~(~) zRq_2(z)
Then
Z(G~) - Z(G~)+
b
with
b-
if v is odd
{ 89 - i) l(v-
Set k - q and go to
1
STEP
sgn Rh_t(1)
if v is even.
4.
STEP 4
If Rk-2 - 0 then stop, R k - 1 -- gcd(P, P # ) . Otherwise write
346
C H A P T E R 6. L I N E A R S Y S T E M S C o m p u t e the reciprocal polynomial
U2w+l from
Rk-1 (z) -- U2w+l ( z ) R r - 2 ( z ) m o d z w+l. C o m p u t e the reciprocal polynomial R r - 3 from
R~_~(z)- z-(~+l~[u~+l(z)R,_~(z)- ak_~(z)]. Define the pseudo-lossless function
F~_~(z)
-
R~_~(z) ( ~ _ z)a,_~(z)
Then Z(G~)
Set k = r -
2 and go to
- ~ + ~ + ~(F,_~). STEP
1.
Remarks. 1. In step 1, m is the R k - l ( 0 ) ~ 0, which and step 1 becomes Rv-1 = Rk-1 so that
multiplicity of z = 0 as a zero of Rk-1. If is the nondegenerate situation, then m = 0 trivial because then D2m = 0, Rp - Rk and Fp = Fk.
2. Step 2 of this general procedure corresponds to the first step of the nondegenerate version. In the nondegenerate case, the second choice for a holds. 3. In step 3, v is the multiplicity of z = 1 as a zero of Rk-2. If Rk-2(1) 0, then we are in the nondegenerate situation and this step becomes again trivial because then v = 0, T.-1 is a polynomial with a negative degree and is therefore equal to zero. The polynomials Rq-1 = Rk-1 and Rq-2 - Rk-2, so t h a t G~ - G~. Obviously, b is then equal to zero. 4. Step 4 is the generalization of the second step of the nondegenerate version. Indeed, if Rk-2(0) ~ 0, then w = 0. The polynomial U2w+l is then of degree 1, hence of the form Ul(z) = u + ~z and u has to be chosen such t h a t the constant terms in Rk-1 and T I R k - 2 are the same, which means that u - R k - l ( O ) / R k - 2 ( O ) - ak-2. Thus Rk-3 is obtained by the recurrence of the nondegenerate case.
6.7. S T A B I L I T Y CHECKS
347
E x a m p l e 6.9 We reconsider the polynomial of E x a m p l e 6.5 which is also the second polynomial in E x a m p l e 6.7. We know from the Schur-Cohn test t h a t the polynomial
P(z) - 1 + z + z 2 + 2z 3 + z 4 + z s has no p a r a - c o n j u g a t e zeros, 3 zeros in ]D and 2 zeros in E. Taking P in the role of P # in the initialization of the Bistritz algorithm will yield N ( P / P t ) - I(Fs). This initialization is thus
Rs(z) - P(z)-+- P # ( z ) - 2 + 2z + 3z 2 + 3z 3 + 2z 4 + 2z s
and
R4(z) - P ( z ) -
P # ( z ) = _z2. 1-z
So in step 1 we have with k - 5 t h a t R4(0) - 0, m - 2, p - 1, R p - 1 - - 1 and n4(z) has the form n 4 ( z ) - do + d l z - di z3- doz4. Its coefficients are obtained from R ~ ( z ) - (1 - z)D4(z)Ro(z) m o d z 2 which gives rise to the system
so t h a t d o - - 2 and dl - - 4 . T h e n Rl(z) is c o m p u t e d from Rl(Z)-
z - 2 [ R s ( z ) - (1 - z ) D 4 ( z ) R o ( z ) ] - 7(1 -{- z).
Step 1 is concluded with the c o m p u t a t i o n of )~1 = - 1 4 and the knowledge t h a t I(F5 ) - 2 -{- I(F1 ). In step 2, R - 1 is c o m p u t e d and of course it turns out to be identically zero (it should have degree - 1 ) . So )~0 = oo, but it does not occur in the index c o m p u t a t i o n because we are in situation 3 for the c o m p u t a t i o n of a: a-
1 ~(1-
sgn ( - 1 4 ) ) -
1
and thus
/(El)
- ~(a~)+
~.
The algorithm moves on to step 3, but there it stops because R - 1 - 0. Thus g c d ( P , P # ) - R0 - - 1 so t h a t P and P # are coprime, thus P has no p a r a - c o n j u g a t e zeros. Finally we add up the indices to find t h a t
N ( e ) - I(F~)- 2 + I ( F 1 )
- 2 + 1 + I(a~)
There are 3 zeros in D hence 2 zeros in E.
- 3.
CHAPTER 6. LINEAR SYSTEMS
348
If we switch the role of P and P # , the c o m p u t a t i o n s are practically the same. Rs is the same polynomial. There is a sign change in R4, hence also in R0 and D4. So R1 is again the same as before, but )~1 changes sign. Therefore in step 2 we now find a = 0 and thus
N ( P #) - I(Fh)
-
2 + I(F1) -
2 + 0 +
I(G21) - 2
because again step 3 t e r m i n a t e s the procedure while R-1 = 0. Thus we find indeed the symmetric situation where there are 2 zeros in ]D and 3 zeros in E.
E x a m p l e 6 . 1 0 For the other polynomials in E x a m p l e 6.7, the Bistritz test gives the following results. The polynomial P(z) - 1 + z + 2z 2 + z 3 gives rise to the initializations R3(z) - P ( z ) + P#(z) - 2 + 3z + 3z 2 + 2z 3 '
R2(z) - P(z) - P#(z) = z. 1-z
Step 1 finds m 1, R o ( z ) - 1 and D 2 ( z ) - 2 - 2z 2 so t h a t R l ( z ) 5(1 + z). Therefore )~1 = 5 and I ( F 3 ) = 1 + I(F1). Step 2 computes R - 1 - 0 and I(F1)+I(G2)-Fb with b - 8 9 0. Step 3 concludes the c o m p u t a t i o n with g c d ( P , P # ) - R0 - 1 and o. Thus P and P # are coprime. There is no p a r a - c o n j u g a t e zero in P and
N(P)-
I(F3)-
1 + I(F1) - 1 + 0 +
I(G21)- 1.
There is one (real) zero in ][3) and hence 2 zeros in E. For the polynomial P(z) - 1 + z + 2z 2 - z 3, the initialization gives R3(z) - P ( z ) + P # ( z ) -
3z(l+z)and
R2(z) - P ( z ) - f # ( z ) = 2 + z + 2 z 2 . 1-z
Step 1 gives m = 0, so t h a t it is trivial except for the Jacobi p a r a m e t e r c o m p u t a t i o n ) ~ 3 - 6 / 5 > 0. Step 2 computes a2 = R3(O)/R2(O)= 0 and therefore
+ z). Thus )~2 = R2(1)/Rl(1) = 5 / ( - 6 ) formula and it gives a-
< 0. Now a is c o m p u t e d by the second
21(sgn>'3+sgn)~2)-0'
so t h a t
I(F3)-I(G
2)+0.
6.7. STABILITY CHECKS
349
In step 3 we find v - 0 so t h a t this step is trivial and gives
I(C]) - •
+ b with
b-
l[v-1-
~gn ( R ~ ( ~ ) / R , ( ~ ) ) ]
- 0.
We arrive at step 4 where it is found t h a t w - 0. This means as we have said before t h a t we are in the nondegenerate situation and thus
U l ( Z ) - al +-~,z
with
~1
--
R2(O)/RI(O)- - 2 / 3 .
Thus
R o ( z ) - z - l [ a l ( 1 + z ) R l ( z ) - R 2 ( z ) ] - 3. The conclusion here is I(G 3) - 1 + I(F~). We now return to step 1, which gives m - 0 so t h a t it becomes again trivial and we only compute $1 - - 6 / 3 - 2 < 0. We pass to step 2 where a0 - - 1 and this gives R - 1 - 0. The applicable formula for a is here 1
a - ~ ( 1 - sgn ~ ) -
1 so t h a t
~(f~ ) - ~(C~) + ~.
Step 3 terminates again the algorithm because R-1 - 0 and g c d ( P , P # ) 0
Thus there are no para-conjugate zeros in P and N(P) - I(F3), which after filling in the successive steps gives N(P) - 2. This polynomial has indeed a pair of complex conjugate zeros in D and hence one real zero in
Chapter 7
General rational interpolation As explained in Chapters 2 and 5, the recurrence relations described in this book allow to compute Pad~ approximants along different paths in the Pad~ table: on a diagonal, a staircase, an antidiagonal, a row, . . . . In this chapter, we develop a general framework generalizing the above in two directions. Firstly, a more general interpolation problem is considered from which, e.g., the Pad~ approximation problem can be derived as a special case. Secondly, recurrence relations are constructed allowing to follow an arbitrary p a t h in the "solution table" connected to the new interpolation problem. In Pad~ approximation, the approximant matches a maximal number of coefficients in a power series in z or in z -1 . This corresponds to a number of interpolation conditions at the origin or at infinity. In the Toeplitz case, we had two power series: one at the origin and one at co. As shown in (4.38), (4.39), the interpolation conditions are distributed over the two series. The general framework will allow to m a t c h a number of coefficients in several formal series which are given at several interpolation points. Again all the degrees of freedom in the approximant are used to satisfy a maximal number of interpolation conditions that are distributed over the different formal series. This could therefore be called a multipoint Pad~ approximation problem.
7.1
General framework
The main ideas for this general framework are given by Beckermann and Labahn [12, 13] and Van Sarel and Sultheel [228, 229]. The reader can 351
352
CHAPTER 7. GENERAL RATIONAL INTERPOLATION
consult these papers for more detailed information. We sketch the main ideas. As before F denotes a (finite or infinite, commutative) field, F[z] denotes the set of polynomials and F[[z - ~]] denotes the set of formal power series around ~ C F and F ( z - ~) denotes the set of formal Laurent series around with finitely many negative powers of ( z - ~). We need these formal series for different points r C F, called interpolation points. The set (finite or infinite) of all these interpolation points is denoted by Z. Now suppose Z = { ~ l , . . . , ~,~ and that we want to find a rational form of type (fl, a) whose series expansion at ~ matches k~ coefficients of the given series gz e F[[z-~z]] for i = 1 , . . . , n . I f k l + . ' - + k n is equal to the number of degrees of freedom in the approximant, namely a + fl + 1, then the approximant is a multipoint Pad~ approximant for this collection of series {g~ : i = 1 , . . . , n } . In the special case that k~ = 1 for all i, then the multipoint Pad~ approximant is just a rational interpolant. If n - 1 and hence kl = a + fl + 1, then we have an ordinary Pad~ approximant at ~1. All the information which is used for the interpolation (the k~ terms of g~, i = 1 , . . . , n) could of course be collected in one Newton polynomial, which could be the start of a formal Newton series expansion of a function. However, in this formal framework, infinite power series need not converge and thus need not represent functions. So we have to replace the notion of function by a set of formal series at the interpolation points. In principle, these series are totally unrelated. Therefore we define a formal Newton series as a collection of power series {g~}z~Z with gr e F[[z - r D e f i n i t i o n 7.1 ( f o r m a l N e w t o n series) The set F[[z]]z of formal Newton series with respect to the set of interpolation points Z C F is defined as
~[[z]]z
-
{ g - {g~}r162 e F[[z- r
We call gr the expansion of g at ~. Elements in F[[z]]z can be multiplied as follows. With f, g e F[[z]]z, the product h - fg e F[[z]]z is defined as
h- fg-{h(}~ez
with
h ( - f(g(.
Also division can be defined. When gr ~ 0, V( E Z, the quotient h is defined as h - f / g - {hr162 with h r fr162
Note that in general h i - f r F[[z- r
f/g
belongs to F(z - () and not necessarily to
7.1. G E N E R A L F R A M E W O R K
353
Because polynomials can be written as an element of F [ [ z - (']] for any (" E Z, we can consider the set of polynomials F[z] as a subset of the set of formal Newton series: F[z] C F[[z]]z. Hence, the product of g E F[[z]]z and p E F[z] is well-defined resulting in an element of F[[z]]z. Similarly for the quotient. ,~x, denotes the set of m x s matrices whose entries are in F[[z]]z and similarly for F[z] 'nx~ etc. Recall the Pad~ interpolation condition (5.9) for power series in z ord ( f ( z ) a ( z ) - c(z)) >_ a + 13 + 1. Setting G(z)-
[1
- f(z)],
P(z) - [c(z) a(z)] T,
and
w(z)-
z ~+~+1, (7.1)
this can be rewritten as
a ( z ) P ( z ) - w(z)R(z),
R(z) e
(7.2)
The power series R(z) is called the residual. For power series in z - (', the factor w in the right-hand side should be replaced by w(z) - ( z - ~)~+1 and R should contain only nonnegative powers of ( z - ('). Generalizing this notion further to our situation of formal Newton series where multiple interpolation points are involved, w(z) can be any monic polynomial having interpolation points as zeros. When G(z) is not a row vector, but a general matrix, we introduce not just one polynomial w, but a vector of polynomials tO.
Definition 7.2 ( o r d e r v e c t o r ) Let m be an integer with m >_ 2. An order vector ~ - ( w t , . . . , w m ) with respect to Z is defined as a vector of monic polynomials having interpolation points as zeros. Now we can express interpolation conditions for the polynomial matrix P, given a matrix series G by requiring that G P - diag(a~)R with residual R containing only nonnegative powers of ( z - (') for all (" C Z. We say that P has G-order a3. D e f i n i t i o n 7.3 ( G - o r d e r ) Let G e F[[z]]~ x " and ~ an order vector. The polynomial matriz P C F[z] mxs is said to have G-order ~ iff GP-
diag(wl,...,wm)R
with R e F[[z]]~ x~.
(7.3)
The matriz R is called the order residual (for P). The set of all polynomial vectors having G-order ~ is denoted by S ( G , ~ ) , i.e., 8(G,5)-
{Q e F[z]mXl 9 Q has G-order ~}.
354
C H A P T E R 7. G E N E R A L R A T I O N A L I N T E R P O L A T I O N
E x a m p l e 7.1 The G-order of a polynomial P will depend on the set of interpolation points Z. For example suppose
G(z)-
[zz 1, z] ~ ( ~ - 2)
z-
[1]
~
~- 9
Then G(z)P(z)-
(z-
2 ) ( 2 z - 1)
"
Therefore, if Z - {0, 1} or Z - {0}, then P has G-order ~7 - (z k, 1) for k - 0, 1. If Z - {0, 2}, then P has G-order a7 - (z k, ( z - 2) t) for k, l - 0, 1. If
a(z)-
z(z-~)
~-~
~-~
,
then
so that for Z - {0, 1}, this P has G-order ~ - ( z P ( z - 1)q, ( z - 1) k) for p , q - 0, 1 , 2 , . . . and k e {0, 1}. When Z - {0, 1,2}, we can add the factor ( z - 2) ~ to the first component of ~7 with r any nonnegative integer.
Before we look into the algebraic structure of S(G, ~), we need the following definition; see [17, p. 3, Th. 7.6, 7.4, p. 105]. D e f i n i t i o n 7.4 ( m o d u l e , free m o d u l e , f i n i t e - d i m e n s i o n a l ) Let R be a ring. Then an additive commutative group M with operators R is a (left) R-module if the law of external composition R x M ~ M 9 (~, a ) ~ ~a, is subject to the following axioms, for all elements ~, ~ E R and a, b C M" a( a + b) (a + $)a
-
(~)~la
-
aa + ab, aa + $a,
~(~), a.
A (left) R-module M is free if it has a basis, i.e., if every element of M can be expressed in a unique way as a (left) linear combination of the elements of the basis. The dimension dim M of a free R-module M is the cardinality of any of its bases.
7.1. G E N E R A L F R A M E W O R K
355
We have a right R-module when instead of (a)~)a = a(),a) we have (a)~)a - ~(aa). Since we only need the left version here, we drop the adjective "left" everywhere. It is clear that F[z] TM is an m-dimensional F[z]-module. A possible basis is given by the set {ek)k~=~ with ek - [ 0 , . . . , 0, 1, 0 , . . . , 0] T where the one is at position k. A principal ideal domain is defined as follows [66, p. 301,318]. D e f i n i t i o n 7.5 (ideal, p r i n c i p a l ideal d o m a i n ) An ideal A in a ring R is a subgroup of the additive group of R such that R A C A and A R C A. In any commutative ring R, an ideal is said to be principal if it can be generated by a single element, i.e., when it can be written in the form aR with a C R. An integral domain in which every ideal is principal is called a principal ideal domain. The Euclidean domain of polynomials F[z] is a principal ideal domain [66, Th. 1, p. 319]. Every submodule of a free R-module will also be free if R is a principal ideal domain. We have indeed [17, Th. 16.3] T h e o r e m 7.1 Let R be a principal ideal domain and let M be a free Rmodule. Then every submodule N of M is free with dim N > O, as a rational fraction c(z)a(z) -1, which can be written as a polynomial vector Q(z) - [c(z) a(z)] T (or as a polynomial couple (c(z), a(z)) when appropriate), satisfying ord (G(z)Q(z)) > (v + 1, O) (row-wise) with v - / 3 + a
deg ( H ( z ) Q ( z ) ) _ 0 a n d f l >_ 0 and hence, v >__ O. Let B be a (,O,a)-basis matrix and define [1
f]B- [s r].
-
Then, B has one of the two (mutually excluding)forms" (a) B has the unique form
B -
Iv c] z2u ~ a
with the following normalization
a(O)- I
and s(O)- i.
(b) B has the (nonunique) form
,_iv zc] u
za
with the following normalization u(O)- 1
and
~ ( o ) - 1.
370
C H A P T E R 7. G E N E R A L R A T I O N A L I N T E R P O L A T I O N
P r o o f . From Theorem 7.6, we get that all solutions (c, a) having H-degree _ 0 and satisfying
('/.12)
c ( z ) - f(z)a(z) - ?'(Z)Z c~+/3+1 form a subspace of dimension 1. There are two possibilities:
(a) a(0) ~ 0. Hence, we can take for the second column of B the unique polynomial couple (c, a)with a(0) = 1. From Theorem 7.6, we get that there exists a basis vector (v", u") of H-degree 1, linearly independent of (c, a) and (zc, za). Because a(0) ~ 0, it is clear that there exist (unique) 71,72 C F such that
('V'1, "t/,H) + ")'1(r a)+ ")'2(zc, za) has the form (v, z2u ') with H-degree 1. This polynomial couple is unique up to a nonzero constant factor. By taking the determinant of the equality G B - diag u3. R we obtain that
R(0)-
,(0)]
with u -
Z2lt !
is nonsingular. Because u ( O ) - O, s(O)~ 0 and (v, u ) c a n be normalized such that s(0) - 1. (b) a(0) = 0. Because u = a + f l >_ 0, this implies that also c(0) = 0. Because R ( 0 ) i s nonsingular and a(0) = 0, r(0) # 0 and u(0) ~ 0. Hence, we can scale (c,a)such that r(0) = 1 and (v,u) such that u(0) = 1. Note that ( u , v ) i s not unique. A linear combination of (c, a ) a n d (zc, za) can be added. Thus the theorem is proved.
[3
Because the basis matrix of the previous theorem in case (a) is unique, we will call it the canonical (fl, a)-basis matrix for the weakly normal data (G, H,&) of type (a). An example of weakly normal data of type (b) is given below. E x a m p l e 7.2 ( w e a k l y n o r m a l d a t a of t y p e (b)) Let f ( z ) - 1 + Oz + 0z 2 - lz 3 + . - . a n d consider the case a - 1 and f l - 2. It turns out that each solution having H-degree _ 1, then the first column of the canonical (~, a)-basis matrix is the (unique with s(O) - 1) PA of type (f~- 1, a - 1) multiplied by Z2 .
P r o o f . From Lemma 7.9, we know that the data ( G , H , ~ ) are weakly normal of type (a) iff the dimension of the space of solutions in $(G,05) having H-degree _< 0 is one and each of these solutions (c, a) satisfies a(0) ~t 0. Let a(z) - a o + a l z + . . . + a ~ z ~. The a-part ofasolution (c,a) 6 S ( G , ~ ) having H-degree 0.
O
v~ v~ I,
.
0 1
(:.:4)
l
0
(1,2) u(+l
0
If ~ + 77 > 0, it follows that vO'2) - 0. If no new interpolation conditions are added, i.e. when : + 77 - 0, v0(1,2) - 1 / s o(:) ~ 0 and the first column of B(:'2) can be computed as the solution of the following set of linear equations. -
(:,2)
-
v1
0 ~
(1,2)
T1,2
;II:I)
-- __V~ 1,2)
+1
~!1,2)
v ~+n+2 (:) -
(1,2) . U~+l
0
_
Let us consider two special cases of the previous general recurrence between two arbitrary weakly normal data points in the Pad~ table. We shall look for the next weakly normal data point on a diagonal and on a row in the Pad~ table.
Diagonal
path
Suppose we know the canonical (fl, a)-basis matrix B (1) for the weakly normal data point 03, a). We want to construct B(2), the canonical basis matrix for the next weakly normal data point on the same diagonal in the Pad~ table, i.e., of the form (fl + ~, a + ~). For general ~, not necessarily minimal, the coefficients of the polynomial elements of B(1,2) can be found as the solution of the following set of linear equations (we drop the superscripts in
7.4.
PADE
375
APPROXIMATION
u (1'2), V(1'2), C(1'2) a n d a (1'2) a n d t h e s u p e r s c r i p t s in s (1) a n d r (1))
T(~)
Vl
CO
0
--7' 0
V2
Cl
0
-rl
9
9
9
9
9
~
v(
c(-1
0
'/L2
al
0
-rl~_l --r~
u(
a(-t
0
--r2~_ 2
I
--r2~-i
ul~+l
at~
(r 5)
where
80
0
"'"
81
80
"9
0
-..
0
0
ro
"'.
0
0
9
T(~)
-
S~-1 8~
9
9
S~-2
9. .
80
r~-2
...
ro
0
8~-1
" " "
81
r~-I
999
rl
ro
r(_l
r(_2
r(
r(-1
9 9
, ~
,
82~-2
82~-3
9. .
8~--I
r2(-3
""
82~-1
82~-2
9. .
s~
r2(-2
999
a n d w h e r e vo - uo - ul - 0 a n d a w e a k l y n o r m a l d a t a p o i n t , the smallest n o n s i n g u l a r m a t r i x T ( ( ) r k - 2 - 0 a n d rk-1 ~ 0. This T ( k )
So
0
ao - 1. B e c a u s e (f~ + ( , a + ( ) has to be m a t r i x T ( ~ ) should be n o n s i n g u l a r . T h e occurs for ~ - k w h e n ro - r l = " " has a lower t r i a n g u l a r f o r m
...
0
0
...
0
0
"'.
0
0
0
0
0
0
9
81
80
9
.
0
8k-1
8k-2
" 9"
80
0
Sk
8k-1
9
s1
rk-1
9
T(k)
-
9
9
9
9
9
9
82k-2
82k-3
82k-1
82k-2
.
9
...
"'" 9
9 9
9
.
9
9
9. .
8k__ 1
r2k-3
"''
Sk
72k-2
"""
rk-1 ?'k
.
.
0 ?'k-1
C H A P T E R 7. G E N E R A L R A T I O N A L I N T E R P O L A T I O N
376
and the system becomes
2}1 ?32
T(k)
r c1
Vk 2t2
Ok-1 al
9
.
Uk Ukq-1
ak-1 ak
0
0
0 0
0 --7'k_ 1
0
--Tk
9
~
9
~
0
--r2k_2
1
--~'2k-1
Because of the lower triangular structure of the coefficient matrix T(k) and the initial zeros in the columns of the right-hand side, we get that
v(z) - 0
u(z) - uk+lz k+l ~ and
c(z) - Ck-1 Zk-1
with Uk+l - 1/rk_l # 0 and ck-1 - - r k - 1 / s o ~ O. The polynomial a(z) of degree _< k is the solution of
o r d ( c k _ l z k - l s ( z ) + r(z)a(z))
-
Ck-lS(z) + zr(z)q(z -1)
-
2k
or
with ord ( c k - l s ) -
O, ord ( z r ) -
k, ord (r') > k + 1 and q(z -1) - z-ka(z).
Hence, according to Definition 1.12, we can write
q(z -1)
-
a(z) - zkq(z -1)
-
- ( s d i v zr)ck_l
or z k - ( s div zr)ck_l .
Note that Ck_ 1 WaS chosen such that a(0) - 1. Finally, the basis matrix B (1'2) looks as follows
I B(I'2)(z)
-
0 "ll,k+l zk+l
Ck_ 1Z k-1 ] a(z)
with a defined by (7.16). We obtain the same results as in Section 5.2, more precisely, as in Theorem 5.3 taking into account that the (v, u) here are z times the (v, u) polynomial couple of Theorem 5.3. In a similar way, canonical basis matrices can be computed on an antidiagonal following a sequence of subsequent weakly normal data points.
377
7.4. P A D E A P P R O X I M A T I O N Row
path
To find the elements of the basis m a t r i x B (1'2) t r a n s f o r m i n g the c a n o n i c a l basis m a t r i x B(1) for t h e w e a k l y n o r m a l d a t a point (13, a ) to the c a n o n i c a l basis m a t r i x B (2) for the w e a k l y n o r m a l d a t a point (~, a + [), we have to solve t h e following set of linear e q u a t i o n s as a special case of (7.13) a n d (7.14) (also here we d r o p the s u p e r s c r i p t s , b u t d e n o t e the u, v, c a n d a w i t h a p r i m e to d i s t i n g u i s h t h e m f r o m the d i a g o n a l case) "
v~ v~
c~ c~
0
--T O
0 --r~-2 1 0 0
T([)
a _x
--r~-I
(7.17)
0 0
,
0
0
w i t h a~ - 1, v~ - u~ - u~ - 0 a n d w h e r e t h e coefficient m a t r i x T(~r s t a n d s for "
so
0
Sl
SO
9
.
...
9
0
0
.
r0 9
o
9
... ". ,
0
0
0
0
,
9
now
-
9
.
,
-
0 0 9
_ v#+l
If c~ ~t 0, the m a t r i x
999 999 9
...
0 v#+l ,
... 999
0 c~
c~
"..
c~-~+2
c~ C~_l
.
v~_~+3
v~_~+2
c~_~+1 _
[ 0] v#+l
is n o n s i n g u l a r . cient m a t r i x
0 0
V~+l v~
c~
If c o -- O, t h e n V~+l r 0 b e c a u s e the highest degree coeffi-
[ v/3+1 C/3 ] U~+l
a#
is n o n s i n g u l a r . S u p p o s e now t h a t c~ - c~-1 = "-" = c~-k+2 - 0 a n d c ~ - k + l ~t 0 a n d t h a t r0 - rl = . . ' = rl-1 - 0 a n d rt 7t 0. It can easily be
378
CHAPTER
7.
GENERAL
RATIONAL
INTERPOLATION
checked that the smallest nonsingular matrix T(~)is reached for ~ - k + l T(k +
l)O(z+x)x(k-1)
L( so, . . . , sk +z_ l )
L(rz, 99 r~+z-2)
R(v~+~,...,v~_~._z+2)
O(a+z)x(k-1)
O(k+z)x(z+~) O(k-t)x(z+i) R(c#_k+~, . . .,c~_~_z+~)
with L a lower triangular Toeplitz matrix
L(to, tl,. . .,tk)
I to
.."
0
i
"'.
i
-
tk
.. 9 to
and R a lower triangular Hankel matrix
R(to, tl,. . .,tk)
--
I O .-:
.."
to
"..
to 1 "
tk
The system (7.17) has the form" 0
0 ~
I Vl T(k+t)
u2 u;
c1
0
0
0
--7"l
1
--rk+l-1
0
0
a! L,0
9
~
o
o
0
0
Here we have split up a'(z) as a'(z)-
at(z ) + zka~H(Z)-
[1
z ...
zk-1]a'c +
zk[1 z .-. z~-k]a~y.
The notation a'L,O refers to the fact that we dropped the constant term (which is 1). Similarly we have set ~"(z)-
z~[1 z "'" zk-~]u 'L + z k+~ [1 z ... z ~-k]us
7.4. P A D E A P P R O X I M A T I O N
379
It is clear that v'(z) - 0 and u'(z) - u~z k. Hence we can write the above system of hnear equations as
ord(s(z)d(z)+r(z)a~L(Z))>_ deg (v(z)d(z) + c(z)zka~(z)) LP --->
r
P(z
-1)T
P(z)
- ~ Z -1
----~ B P --->
Figure 8.5: Classical representation of wavelet transform
)
~
~ LP
m
H
I
;BP
This polyphase representation corresponds to a more classical approach to wavelets. There the signal is filtered by a low pass f i l t e r / ~ ( z -1) and a band pass filter (~(z -1) which are both subsampled. See Figure 8.5. In the classical approach the filters (~ a n d / ~ are applied, that is a band pass and a low pass filter, and the filtered results are subsampled. In general, this is however computationaUy more demanding as the following example as well as Example 8.6 illustrate. E x a m p l e 8.8 For the case where we applied lifting to the linear interpolating subdivision as in Example 8.6, we have h(z)-
i
'
g(z)
-
1
- - ~ -~
1
1
z
2z 2'
s(z) -
-
1
(l+z)
-4
"
Thus -
h e ( z ) - 1,
-
ho(z)-O,
1
#e(z)--~(l+-),z
Therefore
1
P(z)
&o(~) ~o(z) _
i -~( 0
+ ~)
0]
-,(~-') 1 o I ~(1+ z)1
1
~ o ( z ) - 1.
8.5. POLYNOMIAL FORMULATION _
~3 - ~1( ~ + ; 1)
-
1(1 + z)
405
1 5 ( i + ~1]) _ [ /I~ (z) q~(z) - ] 1
Ho(z)
Co(z)
Thus -
2
1
3
1
1
1 2
H ( z ) - --8z- + -~ -~ + ~ + ~ z - gz and
-
1
a(z)-
--~ + ~-
i
-5
lz-2
'
which corresponds to our findings in Example 8.6. We have here the (N = 2 , / V - 2) Cohen-Daubechies-Feauveau biorthogonal wavelet [63].
Usually, one wants no information to be lost. This means that the signal can be perfectly reconstructed from its wavelet transform. In the polyphase representation, this is most easily represented by the condition that
p(z)f~(z-1)T _ I. Supposing that all the filters involved are finite impulse reponse filters, then they can be represented by Laurent polynomials. If both P(z) and its inverse should only have Laurent polynomial entries, then the determinant of P(z) should be a monomial, say of the form C z ~. Without loss of generality, we can assume that it is a constant which we shall moreover assume to be 1. We say that the polyphase matrix is then normalized or that the filters G(z) and H(z) are complementary. It is easily verified that perfect reconstruction implies in this case
fi~(~) - ao(~ -~ ),
~o(~) - - G ( ~ -~ ),
and
G(~) - -Ho(z-1),
Go(z) - H~(z -1).
In other words, the primal and dual filters for a perfect reconstruction scheme should satisfy
G(z)-z-lH(-z
-1)
and
[-I(z)--z-lG(-z-1).
The operations of primal and dual lifting are now characterized by the following theorem. T h e o r e m 8.5 (lifting) Assume that ( G ( z ) , H ( z ) ) is a couple of comple-
mentary finite impulse response filters, then
CHAPTER 8. WAVELETS
406
1. (G'(z), H(z)) will be another couple of complementary finite impulse response filters if and only if G'(z) is of the form C'(z)
- C(z)
+ H(z)
(z
where s(z) is a Laurent polynomial. 2. ( G( z), H'( z) ) will be another couple o] complementary finite impulse response filters if and only if H'(z) is of the form g'(z)
-
H(z) + G(z)t(z 2)
where t(z) is a Laurent polynomial. P r o o f . We only prove the first part, since the second part is completely similar. For the first part, it is sufficient to note that the even and odd components for H(z)s(z 2) are given by H~(z)s(z) and Ho(z)s(z). Thus the polyphase matrix for primal rifting has the form P ' ( z ) - P(z)
0
"
1
This proves the theorem.
[3
Note that the dual polyphase matrix for primal lifting is given by
P'(z)- P(z) [
1
0]
i.e., - [t(z)
-
We shall now consider the inverse problem. Suppose we are given the filters G, (~, H , / - / a s in some classical approach to biorthogonal wavelets. We shall give a procedure to decompose these filtering operations in elementary lifting steps which will lead to an efficient computation of the wavelet transform and of its inverse. The answer will be given by a factorization of the polyphase matrices. Suppose that we succeed in factoring such a polyphase matrix P(z) into a product of the form
P(z) = P~p(z)P~d(Z) " " "Pmp(z)Pmd(Z) with the elementary matrix factors given by Pip(z)-
0
1
ti(z)
1
'
"" '
8.6.
E U C L I D E A N D O M A I N OF L A U R E N T P O L Y N O M I A L S
407
where si(z) and ti(z) are Laurent polynomials. Then we have reduced the application of the polyphase matrix to a successive application of couples of dual and primal lifting steps to the lazy wavelet transform. Indeed, the multiplication with a matrix Pzd(Z) represents a dual lifting step and the multiplication with a matrix Pip(Z) represents a primal lifting step. We shall now show that such a factorization is obtained by the Euclidean algorithm. First we shall introduce the Euclidean algorithm for Laurent polynomials.
8.6
Euclidean
domain
of Laurent
polynomials
Let us consider the ring of Laurent polynomials over the field F which we denote as F[z, z-l]. We assume for simplicity that F is commutative. The units are all the invertible elements. Unlike for polynomials, where the units consist of all nonzero constants, here the units are given by all monomials az k a E F0 k C Z We turn this into a Euclidean domain as follows. First we set for any p e F[z,z -1]" I P l - - 1 if p - 0 and for p 7~ 0 }
9
u
Ip(z)l - ~ - l > 0
if
p(~) - ~ pkz k,
p~p~ # 0.
k-l
Then we define O ( p ) - 1 + IPl. The division property requires that for any two Laurent polynomials a and b, there should exist a quotient q and a remainder r such that a-
bq + r,
O(r) < O(b).
Such a quotient and remainder always exist, but they are far from unique. There is a much larger degree of freedom than in the polynomial case. We can best illustrate this with an example. E x a m p l e 8.9 Consider the Laurent polynomials
~(z)-
~-~ + 2z + 3z ~
a~d
b(z)-
z -~ + z.
Since ]b] - 2, we have to find a Laurent polynomial q(z) such that in [ a - bq[ < 2. Thus there may only remain at most two successive nonzero coefficients in the result. Setting q(z) - q_2z -2 + q _ l z -1 + qo + qlz + q2z 2
(other possibilities do not lead to a solution), we see that the remainder is in general r(z)-(a
_
bq)(z) - r_3z - 3 + r _ 2
Z-2
+r_lz-
1
+ ro + r l z + r2 Z2 + r3 Z3
CHAPTER
408
8.
WAVELETS
with r_3 r_2 r_l r0 rl r2 r3
--
q-2 1 -q-1 -q-a-q0 - q - 1 - q~ 2-q0-q2 3 - ql -q2
Now one can choose to keep the successive coefficients rk and rk+l, for some k E { - 3 , . . . , 2} and make all the others equal to zero. This corresponds to a system of 5 linear equations in 5 unknowns. Possible solutions are therefore q(Z) -- --2Z -2 -- 3Z -1 -~- 2 -]- 3Z
T(Z) -- --2Z -3 ~- 4Z -2
q(z) -- --3Z -1 + 2 + 3Z
r(z)
q ( z ) - z - ~ + 2 + a~ q ( z ) - z -1 + a~
~(z) ~(z) -
q(z) - z-~ - z
~ ( z ) - 2 z + 4~ ~
q ( z ) - ~-~ - z + 2 z ~
~(z) - 4z-
-
-
4Z -2
-- 2Z -1
-2z-~ -4 - 4 + 2~ 2 z ~.
It is clear t h a t the quotient and the remainder are far from uniquely defined. In general, if Ua
~(~)-
Ub
~ ~z ~ k=la
then we have
a~d
b(z)-
~ b~ ~ k=lb
Uq
q(z)-
~ q~ k=lq
with uq = u ~ - I b and lq = l ~ - u b so t h a t Iql = l a l + Ibl 9 The quotient has lal + Ibl + 1 coefficients to be defined. For the product bq we have Iqbl = lal + 21bl, thus it has lal + 21b I + 1 coefficients. Thus also a - bq has t h a t m a n y coefficients. Since at most Ibl subsequent of these coefficients may be a r b i t r a r y and all the others have to be zero, it follows t h a t there are always lal + Ibl + 1 equations for the lal + Ibl ~- 1 unknowns. W h e n these coefficients are made zero in a - bq then there remain at most Ibl successive coefficients which give the remainder r. We can conclude t h a t the quotient and remainder always exists and thus we do have a Euclidean domain. Therefore we can apply the Euclidean algorithm and obtain a greatest common divisor which will be unique up to
8.7. FA CTORIZATION ALGORITHM
409
a unit factor. It is remarkable that, with all the freedom we have at every stage of the Euclidean algorithm, we will always find the same greatest common divisor up to a monomial factor.
8.7 Factorization algorithm Suppose we start with a filter H(z) - He(z 2) + z -1Ho(z 2) and some other complementary filter G. The Laurent polynomials H~ and Ho are coprime. Indeed, if they were not, then they would have a nontrivial common divisor which would divide all the entries in P(z), thus also divide det P(z), but we assumed that det P(z) = 1, so this is impossible. The Euclidean algorithm will thus compute a greatest common divisor which we can always assume to be a constant, say K. This leads to [H~(z) Ho(z)]V~(z)...Vn(z)-[K
0]
with the Vk matrices of the form
~1
1 l
-qi(z)
where q~(z) are Laurent polynomials. After inverting and transposing, this reads
where the matrices Wk(z) are given by
Wk(z)-
) 1] -r [ qk( 1 0 "
We can always assume that n is even. Indeed, if it were odd, we can multiply the filter H with z and the filter G with z -1. They would still be complementary since the determinant of P(z) does not change. This would interchange the role of H~ and Ho which would introduce some "dummy" V0 which does only interchange these two Laurent polynomials. Let G~(z) be a filter which is complementary to H(z) for which G~ and G~ are defined by
P~(z)-
Ho(z)
G~,(z)
- Wl(z)...W,(z)
K0
K -1
"
C H A P T E R 8.
410 Because
qk(z) 1
1][
WAVELETS
qk,z,][01]_[01] [ 1 0]
0
1
0
1
1
0
qk(z)
'
1
we can set
nJ2[lq kl,z,][ 1 P~(z)-
II
0
k=l
1
0]
q2k(z)
1
0
K -1
"
In case our choice of G ~ does not correspond to the given complementary filter G, then by an application of Theorem 8.5, we can find a Laurent polynomial s(z) such that
P(z)-
1 ,(z)] 1 "
P~(z)
o
As a conclusion we can formulate the following theorem. T h e o r e m 8.6 Given two complementary finite impulse response filters ( H ( z ) , G(z)), then there exist Laurent polynomials sk(z) and tk(z), k = 1 , . . . , m and some nonzero constant K such that the polyphase matrix can be factored as
P(z)-
rail
1-I
k=l
o
10][
1
tk(z)
1
0
0]
K -1
"
The interpretation of this theorem is obvious. It says that any couple of complementary filters which does (one step of) an inverse wavelet transform can be implemented as a sequence of primal and dual lifting steps and some scaling (by the constants K and K - l ) . For the forward transform in the corresponding analysis step of a perfectly reconstructing scheme, the factorization is accordingly given by
.P(z) -
m[ 1 0][1
II k=l
--,~k(Z-1)
1
0
1
0
0]
K
"
E x a m p l e 8.10 One of the simplest of the classical wavelets one can choose are the Haar wavelets. They are described by the filters
H ( ~ ) - ~ + z-~
~d
1
C(z)- -~ +
1 -1 ~z
8.7. FACTORIZATION A L G O R I T H M
411
The dual filters are H(z)-
1
1
~+~z
-1
and
-1
G(z)--l+z
It is clear how these compute a wavelet transform" the low pass filter /~ takes the average and the high pass filter G takes the difference of two successive samples. Thus 1
J~l,k --
~()~l+l,2k -t- "~l+l,2k+l)
and
"Tl,k
= J~l+l,2k-t-1- ,~l-t-l,2k-
The polyphase matrix is trivially factored by the Euclidean algorithm as
P(z)-
i11j2] [10][1 lj21 1
1/2
-
11
0
1
"
The dual polyphase matrix is factored as
P(z)-
[lj211 E11][10] 1/2
1
-
0
1
1/2
1
"
One could object that these are not factorizations of the form we proposed above, but they are since we have left out the identity matrix in front of the two factors and an identity matrix between the two factors. For practical implementation, these of course do not do anything and they are just kept out of the computation. Thus, to compute the forward wavelet transform, we have to apply the lazy wavelet, i.e., take the even and the odd samples separately. Then, a first lifting step leaves the even samples untouched and computes the difference 71,k = )~l+~,2k+l -- ,~l+~,2k. In the next lifting step, this result is left untouched, but the even samples are modified by computing :Xl,k = )~l+l,2k + 1/27l,k. For the inverse transform, first one computes )~l+l,2k = ~t,k- 1/27l,k, and then )~1+1,2k+1 -- )~t+~,2k + 7t,k. This is just a matter of interchanging addition and subtraction. Note that in this simple example, there is not really a gain in computational effort, but as our earlier examples showed, in general there is.
Many more examples of this idea can be found in the paper [73].
Bibliography [1] N.I. Akhiezer [Achieser]. The classical moment problem. Oliver and Boyd, Edinburgh, 1969. Originally published Moscow, 1961. [2] A.V. Aho, J.E. Hopcroft, and J.D. Ullman. The design and analysis of computer algorithms. Addison Wesley, Reading, Mass., 1974. [3] R.V. Andree. Selections from modern abstract algebra. Holt, Rinehart and Winston, New York, 1958. [4] O. Axelsson. Iterative solution methods. Cambridge University Press, 1994.
[5] G.A. Baker Jr. Essentials of Padd Approzimants. Academic Press, New York, 1975. [6] G.A. Baker, Jr. and P.R. Graves-Morris. Padd Approzimants. Part II: Eztensions and Applications, volume 14 of Encyclopedia of Mathematics and its Applications. Addison-Wesley, Reading, MA, 1981. [7] G.A. Baker, Jr. and P.R. Graves-Morris. Pad~ Approzimants, volume 59 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, 2nd edition, 1996. [8] S. Barnett. Polynomials and linear control systems. Marcel Dekker, New york, 1983.
[9]
R. Barret et al. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. SIAM, 1993.
[10]
G. Baxter. Polynomials defined by a difference system. J. Math. Anal. Appl., 2:223-263, 1961.
[11]
B. Beckermann. A reliable method for computing M-Pad~ approximants on arbitrary staircases. J. Comput. Appl. Math., 40:19-42, 1992. 413
414
BIBLIOGRAPHY
[12] B. Beckermann and G. Labahn. A uniform approach for Hermite-Pad~ and simultaneous Pad~ approximants and their matrix-type generalizations. Numer. Algorithms, 3(1-4):45-54, 1992. [13] B. Beckermann and G. Labahn. Recursiveness in matrix rational interpolation problems. Technical Report AN0-357, Laboratoire d'Analyse Num~rique et d'Optimisation, Universit~ des Sciences et Technologies de Lille, 1996. [14] E.I~. Berlekamp. Algebraic coding theory. McGraw-Hill, New York, 1968. [15] E. Bezout. Sur le degr~ des ~quations r~sultantes de 1' ~vanouissement des inconnues. Mdm. de l' Acad. Royale des Sciences, pages 288-338, 1764. [16] Y. Bistritz. Zero location with respect to the unit circle of discretetime linear system polynomials. Proc. IEEE, 72:1131-1142, 1984. [17] T.S. Blyth. Module theory. Claredon Press- Oxford University Press, 1990. [18] A.W. Bojanczyk and G. Heinig. A multi-step algorithm for Hankel matrices. J. Complexity, 10:142-164, 1994. [19] D.L. Boley, S. Elhay, G. H. Golub, and M.H. Gutknecht. Nonsymmetric Lanczos and finding orthogonal polynomials associated with indefinite weights. Numer. Algorithms, 1(1):21-44, 1991. [20] I~.C. Bose and D.K. l~ay-Chaudhuri. Further results on error correcting binary group codes. Inf. Control, 3:68-79, 1960. [21] I~.C. Bose and D.K. l~ay-Chaudhuri. On a class of error correcting binary group codes. Inf. Control, 3:68-79, 1960. [22] I~.P. Brent, F.G. Gustavson, and D.Y.Y. Yun. Fast solution of Toeplitz systems of equations and computation of Pad~ approximants. J. Algorithms, 1:259-295, 1980. [23] I~.P. Brent and H.T. Kung. Systolic VLSI arrays for polynomial GCD computation. IEEE Trans. Comput., C-33(8):731-736, 1984. [24] C. Brezinski. Padd-type approximation and general orthogonal polynomials, volume 50 of Internat. Set. of Numer. Math. Birkhs Verlag, Basel, 1980.
BIBLIOGRAPHY
415
[25] C. Brezinski. History of continued fractions and Padd approximants, volume 12 of Springer Set. Comput. Math. Springer, Berlin, 1990. [26] C. Brezinski. Biorthogonality and its applications to numerical analysis, volume 156 of Lecture Notes in Pure and Appl. Math. Marcel Dekker Inc., 1992. [27] C. Brezinski. Formal orthogonality on an algebraic curve. Ann. Numet. Math., 2:21-33, 1995. [28] C. Brezinski, editor. Projection methods for systems of equations. Studies in Computational Mathematics. Elsevier, Amsterdam, 1997. To appear. [29] C. Brezinski and M. Redivo Zaglia. Orhogonal polynomials of dimension - 1 in the non definite case. Rend. Mat. Roma, Set. VII, 14:127-133, 1994. [30] C. Brezinski, M. l~edivo Zaglia, and H. Sadok. Avoiding breakdown and near-breakdown in Lanczos type algorithms. Numer. Algorithms, 1(3):261-284, 1991. Addendum, vol. 2(2):133-136, 1992. [31] C. Brezinski, M. Redivo Zaglia, and H. Sadok. A breakdown free Lanczos type algorithm for solving linear systems. Numer. Math., 63(1):29-38, 1993. [32] C. Brezinski and H. Sadok. Lanczos type algorithms for solving systems of linear equations. Appl. Numer. Math., 11(6):443-473, 1993. [33] C. Brezinski and J. van Iseghem. Pad~ approximations. In P.G. Ciarlet and J.L. Lions, editors, Handbook of Numerical Analysis, volume 3. North-Holland, 1993. [34] C. Brezinski and J. van Iseghem. Vector orthogonal polynomials of dimension -d. In R.V.M. Zahar, editor, Approximation and Computation, volume 119 of Internat. Set. of Numer. Math., pages 29-39. Birkhs Verlag, 1994. [35] C. Brezinski and J. van Iseghem. A taste of Pad~ approximation. In Acta Numerica, pages 53-103, 1995. [36] l~.W. Brockett. Finite dimensional linear systems. John Wiley and Sons, New York, 1970.
416
BIBLIOGRAPHY
[37] A.M. Bruckstein and T. Kailath. An inverse scattering framework for several problems in signal processing. IEEE ASSP magazine, pages 6-20, 1987. [38] A. Bultheel. Division algorithms for continued fractions and the Pad~ table. J. Comput. Appl. Math., 6(4):259-266, 1980. [39] A. Bultheel. l~ecursive algorithms for non normal Pad~ tables. SIAM J. Appl. Math., 39(1):106-118, 1980. [40] A. Bultheel. Triangular decomposition of Toeplitz and related matrices: a guided tour in the algorithmic aspects. Bull. Soc. Math. Belg. Sdr. A, 37(3):101-144, 1985. [41] A. Bultheel. Laurent series and their Padd approzimations, volume OT-27 of Oper. Theory: Adv. Appl. Birkhs Verlag, Basel-Boston, 1987. [42] A. Bultheel and M. Van Barel. Pad~ techniques for model reduction in linear system theory. J. Comput. Appl. Math., 14:401-438, 1986. [43] S. Cabay and D.K. Choi. Algebraic computations of scaled Pad~ fractions. SIAM J. Comput., 15(1):243-270, 1986. [44] S. Cabay, A. Jones, and G. Labahn. Computation of numerical Pad~Hermite and simultaneous Pad~ systems I: Near inversion of generalized Sylvester matrices. SIAM J. Matrix Anal. Appl., 17:248-267, 1996. [45] S. Cabay, A. Jones, and G. Labahn. Computation of numerical Pad~Hermite and simultaneous Pad~ systems II: A weakly stable algorithm. SIAM J. Matriz Anal. Appl., 17:268-297, 1996. [46] S. Cabay, A.I~. Jones, and G. Labahn. Experiments with a weakly stable algorithm for computing Pad~-Herm ite and simultaneous Pad~ approximants. A CM Trans. Math. Software, 1996. Submitted. [47] S. Cabay and G. Labahn. A superfast algorithm for multi-dimensional Pad~ systems. Numer. Algorithms, 2(2):201-224, 1992. [48] S. Cabay, G. Labahn, and B. Beckermann. On the theory and computation of non-perfect Pad~-Hermite approximants. J. Comput. Appl. Math., 39:295-313, 1992.
BIBLIOGRAPHY
417
[49] S. Cabay and R. Meleshko. A weakly stable algorithm for Pad~ approximants and the inversion of Hankel matrices. SIAM J. Matrix Anal. Appl., 14:735-765, 1993. [50] A.L. Cauchy. Cours d'analyse de l'Ecole Royale Polytechnique. Premiere pattie" Analyse Algdbrique. Paris, 1821. [51] T. F. Chan and P.C. Hansen. A look-ahead Levinson algorithm for general Toeplitz systems. IEEE Trans. Sig. Proc., 40(5):1079-1090, 1992. [52] T. F. Chan and P.C. Hansen. A look-ahead Levinson algorithm for indefinite Toeplitz systems. SIAM J. Matrix Anal. Appl., 13(2):490506, 1992. [53] S. Chandrasekaran and A.H. Sayed. Stabihzing the generahzed Schur algorithm. SIAM J. Matrix Anal. Appl., 17:950-983, 1996. [54] P.L. Chebyshev. Sur les fractions continues. Journ. de Math. Pures et Appliqudes, Sdr II, 3:289-323, 1858. See (Euvres, Tome I, Chelsea Pub. Comp. pp.-. [55] P.L. Chebyshev. Sur l'interpolation par la m~thode des moindres carr~s. Mere. Acad. Impdr. des Sciences St. Petersbourg, sdr. 7, 1:124, 1859. See (Euvres, Tome I, Chelsea Pub. Comp. pp. 471-498. [56] P.L. Chebyshev. Sur les fractions continues alg~briques. Journ. de Math. Pures et Appliqudes, Sdr II, 10:353-358, 1865. See (Euvres, Tome I, Chelsea Pub. Comp. pp. 609-614. [57] P.L. Chebyshev. Sur le d~veloppement de fonctions en s~ries s l'aide des fractions continues. In A. Markoff and N. Sonin, editors, (Euvres de P.L. Tchebycheff, Tome I, pages 615-631, New York, 1866. Chelsea Publishing Company. (Original in Russian Mem. Acad. Impdr. des Sciences St. Petersbourg, 9, Append. 1). [58] P.L. Chebyshev. Sur la d~termination des fonctions d'apr~s les valeurs qu'elles ont pour certaines valeurs de variables. Math. Sb., 4:231-245, 1870. See (Euvres, Tome II, Chelsea Pub. Comp. pp. 71-82. [59] G. Chen and Z. Yang. Bezoutian representation via Vandermonde matrices. Linear Algebra Appl., 186:37-44, 1993. [60] G.-N. Chen and H.-P. Zhang. Note on products of Bezoutians and Hankel matrices. Linear Algebra Appl., 225:23-36, 1995.
418
BIBLIOGRAPHY
[61] T. Chihara. An introduction to orthogonal polynomials. Gordon and Breach Science Publishers, New York, 1978. [62] G.C. Clark Jr. and J.B. Cain. Error-correction coding for digital communications. Plenum Press, New York, London, 1981. [63] A. Cohen, I. Daubechies, and J.C. Feauveau. Biorthogonal bases of compactly supported wavelets. Comm. Pure Appl. Math., 45(5):485560, 1992. [64] A. Cohn. Uber die Anzahl der Wurzeln einer Algebraischen Gleichung in einem Kreise. Math. Zeit., 14:110-148, 1922. [65] P.M. Cohn. Free rings and their relations. Academic Press, London, 1971. [66] P.M. Cohn. Algebra, volume 1. John Wiley & Sons, Chichester, 2nd edition, 1982. [67] J.K. Cullum and I~.A. Willoughby. Lanczos algorithms for large symmetric eigenvalue computations, Volume 1, Theory. Birkh/~user Verlag, Basel, 1985. [68] A. Cuyt. A review of multivariate Pad~ approximation theory. Jo Comput. Appl. Math., 12/134:221-232, 1985. [69] A. Cuyt and L. Wuytack. Nonlinear methods in numerical analysis, volume 136 of Mathematical Studies. North Holland, Amsterdam, 1987. [70] A.M. Cuyt. Padd approzimants for operators: theory and applications, volume 1065 of Lecture Notes in Math. Springer, 1984. [71] A.M. Cuyt. General order multivariate rational Hermite interpolants. Habilitation, University of Antwerp, July 1986. [72] G. Cybenko. An explicit formula for Lanczos polynomials. Linear Algebra Appl., 88/89:99-115, 1987. [73] I. Daubechies and W. Sweldens. Factoring wavelet transforms into lifting steps. 1996. Technical Report, Bell Labs, Lucent Technologies. [74] M. Decoster and A.B.. van Cauwenberghe. A comparative study of different reduction methods (part 2). Journal A, 17(3):125-134, 1976.
BIBLIOGRAPHY
419
[75] Ph. Delsarte and Y. Genin. The split Levinson algorithm. IEEE Trans. Acoust. Speech Signal Process., ASSP-34:470-478, 1986. [76] Ph. Delsarte and Y. Genin. On the splitting of classical algorithms in linear prediction theory. IEEE Trans. Acoust. Speech Signal Process., ASSP-35:645-653, 1987.
[77] Ph. Delsarte and Y. Genin. A survey of the split approach based techniques in digital signal processing applications. Phillips J. Res., 43:346-374, 1988. [78] Ph. Delsarte and Y. Genin. The tridiagonal approach to Szegt~ orthogonal polynomials, Toeplitz linear systems and related interpolation problems. SIAM J. Math. Anal., 19:718-735, 1988. [79] Ph. Delsarte and Y. Genin. An introduction to the class of split Levinson algorithms. In G. Golub and P. Van Dooren, editors, Numerical linear algebra, digital signal processing and parallel algorithms, volume 70 of NATO-ASI Series, F: Computer and Systems Sciences, pages 112-130, Berlin, 1991. Springer. [80] Ph. Delsarte and Y. Genin. On the split approach based algorithms for DSP problems. In G. Golub and P. Van Dooren, editors, Numerical linear algebra, digital signal processing and parallel algorithms, volume 70 of NATO-ASI Series, F: Computer and Systems Sciences, pages 131-147, Berlin, 1991. Springer. [81] Ph. Delsarte, Y. Genin, and Y. Kamp. Application of the index theory of pseudo-lossless functions to the Bistritz stability test. Philips J. Res., 39:226-241, 1984. [82] Ph. Delsarte, Y. Genin, and Y. Kamp. A generalization of the Levinson algorithm for hermitian Toeplitz matrices with any rank profile. IEEE Trans. Acoust. Speech Signal Process., ASSP-33(4):964-971, 1985. [83] Ph. Delsarte, Y. Genin, and Y. Kamp. Pseudo-lossless functions with application to the problem of locating the zeros of a polynomial. IEEE Trans. Circuits and Systems, CAS-32:371-381, 1986. [84] Ph. Delsarte, Y. Genin, and Y. Kamp. A survey od the split approach based techniques in digital signal processing. Philips J. Res., 43:346374, 1988.
420
BIBLIOGRAPHY
[85] G. Deslauriers and S. Dubuc. Interpolation dyadique. In Fractals, Dimensions non-enti~res et applications, pages 44-55, Paris, 1987. Masson. [86] G. Deslauriers and S. Dubuc. Symmetric iterative interpolation processes. Constr. Approx., 5(1):49-68, 1989. [87] A. Draux. Polyn6mes orthogonauz formels - applications, volume 974 of Lecture Notes in Math. Springer, Berlin, 1983. [88] A. Draux and P. van Ingelandt. Polyn6mes Othogonaux et Approximants de Padd- Logiciels. Editions Technip, Paris, 1987. [89] T. Erd~lyi, P. Nevai, J. Zhang, and J.S. Geronimo. A simple proof of "Favard's theorem" on the unit circle. Atti. Sem. Mat. Fis. Univ. Modena, 29:551-556, 1991. Proceedings of the Meeting "Trends in Functional Analysis and Approximation Theory", 1989, Italy. [90] M. Fiedier. Special matrices and their Applications in Numerical Mathematics. Martinus Nijhoff Publ., Dordrecht, 1986. [91] M. Fiedler and V. Pts Loewner and Bezout matrices. Linear Algebra Appl., 101:187-202, 1988. [92] G. Freud. Orthogonal polynomials. Pergamon Press, Oxford, 1971. [93] K.W. Freund. A look-ahead Bareiss algorithm for general Toeplitz matrices. Numer. Math., 68:35-69, August 1994. AT~T Bell Labs. Numerical Analysis Manuscript 93-11. [94] I~.W. Freund, G.H. Golub, and N.M. Nachtigal. Iterative solution of linear systems. Acta Numerica, 1:57-100, 1992. [95] P~.W. Freund and N.M. Nachtigal. QMK: A quasi-minimal residual method for non-Hermitian matrices. Numer. Math., 60:315-339, 1991. [96] P~.W. Freund and N.M. Nachtigal. Implementation details of coupled QMR algorithm. In L. Reichel, A. l~uttan, and K.S. Varga, editors, Numerical Linear Algebra, pages 123-140, Berlin, 1993. W. de Gruyter. [97] I~.W. Freund and N.M. Nachtigal. An implementation of the QMR method based on coupled two-term recurrences. SIAM J. Sci. Statist. Comput., 15(2), 1993.
BIBLIOGRAPHY
421
[98] R,.W. Freund and H. Zha. Formally biorthogonal polynomials and a look-ahead Levinson algorithm for general Toeplitz systems. Linear Algebra Appl., 188/189:255-303, 1993. [99] I~.W. Freund and H. Zha. A look-ahead strategy for the solution of general Hankel systems. Numer. Math., 64:295-322, 1993. [100] G. Frobenius. Uber relationen zwischen den N~herungsbriichen yon Potenzreihen. J. Reine Angew. Math., 90:1-17, 1881. [101] P.A. Fuhrmann. A polynomial approach to linear algebra. Springer, 1996. [102] K. Gallivan, S. Thirumalai, P. Van Dooren, and V. Vermaut. High performance algorithms for Toeplitz and block Toeplitz matrices. Linear Algebra Appl., 241-243:343-388, 1996. [103] K. Gallivan, S. Thirumalai, and P. Van Dooren. A block Toeplitz lookahead Schur algorithm. In M. Moonen and B. De Moor, editors, SVD and Signal Processing III, pages 199-206, Amsterdam, 1995. Elsevier. [104] F.I~. Gantmacher. The theory of matrices, Volume II. Chelsea, New York, 1959. [105] W. Gautschi. On generating orthogonal polynomials. SIAM J. Sci. Statist. Comput., 3:289-317, 1982. [106] Y. Genin. An introduction to the modern theory of positive functions and some of its today applications to signal processing, circuits and system problems. In Advances in modern circuit theory and design (Paris, 1987), pages 195-234, Amsterdam-New York, 1987. NorthHolland.
[107] Y. Genin. Euclid algorithm, orthogonal polynomials and generalized l~outh-Hurwitz algorithm. Linear Algebra Appl., 246:131-158, 1996. [108] Y. Genin. On polynomials nonnegative on the unit circle and related questions. Linear Algebra Appl., 1996. To appear. [109] I. Gohberg, editor. I. Schur methods in operator theory and signal processing, volume 18 of Oper. Theory: Adv. Appl. Birkhs Verlag, Basel, 1986.
422
BIBLIOGRAPHY
[110] I. Gohberg, T. Kailath, I. Koltracht, and P. Lancaster. Linear complextity parallel algorithms for linear systems of equations with recursive structure. Linear Algebra Appl., 88/89:271-315, 1987. [111] I. Gohberg, T. Kailath, and V. Olshevsky. Fast Gaussian elimination with partial pivoting for matrices with displacement structure. Math. Comp., 64:1557-1576, 1995. [112] I. Gohberg, P. Lancaster, and L. P~odman. Invariant Subspaces of Matrices with Applications. John Wiley & Sons, New York, 1986. [113] I.C. Gohberg, T. Kailath, and I. Koltracht. Efficient solution of linear systems of equations with recursive structure. Linear Algebra Appl., 80:81-113, 1986. [114] G. Gomez and L. Lerer. Generalized Bezoutians for analytic operator functions and inversion of structured operators. In U. Helmke, 1~. Mennicken, and J. Saurer, editors, Systems and Networks: Mathematical theory and applications, volume II, volume 79 of Mathematical Research, pages 691-696. Akademie Verlag, 1994. [115] V.D. Goppa. A new class of linear error-correcting codes. Peredach. Inform., 6:24-30, 1970.
Probl.
[116] M.J.C. Gover and S. Barnett. Inversion of Toeplitz matrices which are not strongly non-singular. IMA J. Numer. Anal., 5:101-110, 1985. [117] W. B. Gragg and M. H. Gutknecht. Stable look-ahead versions of the Euclidean and Chebyshev algorithms. In K.V.M. Zahar, editor, Approximation and Computation: A Festschrifl in Honor of Walter Gautschi, pages 231-260. Birkhs Verlag, 1994. [118] W.B. Gragg. The Pad~ table and its relation to certain algorithms of numerical analysis. SIAM Rev., 14:1-62, 1972. [119] W.B. Gragg. Matrix interpretations and applications of the continued fraction algorithm. Rocky Mountain J. Math., 4(2):213-225, 1974. [120] W.B. Gragg. Positive definite Toeplitz matrices, the Arnoldi process for isometric operators and Gaussian quadrature on the unit circle. In E.S. Nikolaev, editor, Numerical methods in linear algebra, pages 16-32. Moscow university press, 1982. (In Russian). [121] W.B. Gragg and A. Lindquist. On the partial realization problem. Linear Algebra Appl., 50:277-319, 1983.
BIBLIOGRAPHY
423
[122] M. Gu. Stable and efficient algorithms for structured systems of linear equations. Technical Report LBL-37690, Lawrence Berkeley Laboratory, University of California, Berkeley, 1995. [123] M. Gutknecht and M. Hochbruck. Optimized look-ahead recurrences for adjacent rows in the Pad~ table. BIT, 36:264-286, 1996. [124] M.H. Gutknecht. The unsymmetric Lanczos algorithms and their relations to Pad~ approximation, continued fractions, the QD algorithm, biconjugate gradient squared algorithms and fast Hankel solvers. In Proceedings of the Copper Mountain Conference on Iterative Methods, 1990. [125] M.H. Gutknecht. A completed theory of the unsymmetric Lanczos process and related algorithms. Part I. SIAM J. Matrix Anal. Appl., 13(2):594-639, 1992. [126] M.H. Gutknecht. Stable row recurrences for the Pad~ table and generically superfast lookahead solvers for non-Hermitian Toeplitz systems. Linear Algebra Appl., 188/189:351-422, 1993. [127] M.H. Gutknecht. A completed theory of the unsymmetric Lanczos process and related algorithms. Part II. SIAM J. Matrix Anal. Appl., 15:15-58, 1994. [128] M.H. Gutknecht. Lanczos-type solvers for nonsymmetric linear systems of equations. Acta Numerica, 6, 1997. [129] M.H. Gutknecht and M. Hochbruck. Look-ahead Levinson- and Schurtype recurrences in the Pad~ table. Electron. Trans. Numer. Anal., 2:104-129, 1994. [130] M.H. Gutknecht and M. Hochbruck. Look-ahead Levinson and Schur algorithms for non-Hermitian Toeplitz systems. Numer. Math., 70:181-228, 1995. [131] S. Gutman and E.I. Jury. A general theory for matrix root-clustering in subregions of the complex plane. IEEE Trans. Automat. Control, 26:853-863, 1981. [132] P.C. Hansen and P.Y. Yalamov. Stabilization by perturbation of a 4n 2 Toeplitz solver. Preprint N25, Technical University of l~usse, Bulgaria, January 1995. Submitted to SIAM J. Matrix Anal. Appl.
424
BIBLIOGRAPHY
[133] G. Heinig. Beitriige zur Spektraltheorie yon Operatorbiischeln und zur algebraischen Theorie yon Toeplitzmatrizen. PhD thesis, TH KarlMarx-Stadt, 1979. [134] G. Heinig. Inversion of Toeplitz and Hankel matrices with singular sections. Wiss. Zeitschr. d. TH. Karl-Marx-Stadt, 25(3):326-333, 1983. [135] G. Heinig. On structured matrices, generalized Bezoutians and generalized Christoffel-Darboux formulas. In H. Bart, I. Gohberg, and M.A. Kaashoek, editors, Topics in matrix and operator theory, volume 50 of Oper. Theory: Adv. Appl., pages 267-281. Birkhs Verlag, 1991. [136] G. Heinig. Inversion of generalized Cauchy matrices and other classes of structured matrices. In Linear Algebra in Signal Processing, volume 69 of IMA volumes in Mathematics and its Applications, pages 95-114. IMA, 1994. [137] G. Heinig. Inversion of Toeplitz-like matrices via generalized Cauchy matrices and rational interpolation. In Systems and Networks: Mathematical Theory and Applications, volume 2, pages 707-711. Akademie Verlag, 1994. [138] G. Heinig. Matrix representations of Bezoutians. Appl., 1994.
Linear Algebra
[139] G. Heinig. Solving Toeplitz systems via tangential Lagrange interpolation. SIAM J. Matrix Anal. Appl., 1996. Submitted. [140] G. Heinig. Transformation approaches for fast and stable solution of Toeplitz systems and polynomial equations. In Proceedings of the International Workshop "Recent Advances in Applied Mathematics", pages 223-238, State of Kuwait, May 4-7 1996. [141] G. Heinig and A. Bojanczyk. Transformation techniques for Toeplitz and Toeplitz-plus-Hankel matrices I. Transformations. Linear Algebra Appl., pages 1-24, 1996. To appear. [142] G. Heinig and A. Bojanczyk. Transformation techniques for Toeplitz and Toeplitz-plus-Hankel matrices II. Algorithms. Linear Algebra Appl., pages 1-20, 1996. To appear. [143] G. Heinig and F. Hellinger. On the Bezoutian structure of the MoorePenrose inverses of Hankel matrices. SIAM J. Matrix Anal. Appl., 14(3):629-645, 1993.
BIBLIOGRAPHY
425
[144] G. Heinig and K. P~ost. Algebraic methods for Toeplitz-like matrices and operators. Akademie Verlag, Berlin, 1984. Also Birkhs Verlag, Basel. [145] G. Heinig and K. Kost. Matrices with displacement structure, generalized Bezoutians, and Moebius transforms. In H. Dym, S. Goldberg, M. Kaashoek, and P. Lancaster, editors, The Gohberg anniversary collection, volume I: The Calgary conference and matrix theory papers, volume 40 of Oper. Theory: Adv. Appl., pages 203-230, Boston, 1989. Birkhs Verlag. [146] U. Helmke. Rational functions and Bezout forms" a functional correspondence. Linear Algebra Appl., 122/123/124:623-640, 1987. [147] U. Helmke and P.A. Fuhrmann. Bezoutians. Linear Algebra Appl., 122/124:1039-1097, 1989. [148] P. Henrici. Power series, integration, conformal mapping, location of zeros, volume I of Applied and Computational Complex Analysis. John Wiley & Sons, New York, London, Sydney, Toronto, 1974. [149] P. Henrici. Applied and computational complex analysis. Volume 2: Special functions, integral transforms, asymptotics, continued fractions, volume II of Pure and Applied Mathematics, a Wileyinterscience series of texts, monographs and tracts. John Wiley Sons, New York, 1977. [150] M. Hochbruck. Further optimized look-ahead recurrences for adjacent rows in the Pad ~ table and Toeplitz matrix factorizations, August 1996. Manuscript. [151] M. Hochbruck. The Padd table and its relation to certain numerical algorithms. PhD thesis, Mathematische Fakults Universits Tfibingen, November 1996. [152] Hocquenghem. Codes correcteurs d'erreurs. Chiffres, 2"147-156, 1959. [153] A.S. Householder. Bigradients and the problem of Routh and Hurwitz. SIAM Rev., 10:56-66, 1968. [154] A.S. Householder. Bezoutiants, elimination and localization. SlAM Rev., 12"106-119, 1970.
426
BIBLIOGRAPHY
[155] T. Huckle. A look-ahead algorithm for solving nonsymmetric linear Toeplitz equations. In Proceedings of the Fifth SIAM Conference on Applied Linear Algebra, Snowbird, Utah, June 199~, pages 455-459, 1994. [156] I.S. Iohvidov. Hankel and Toeplitz matrices and forms. Birkh~iuser Verlag, Boston, 1982. [157] C.G.J. Jacobi. Uber die Darstellung einer l~eihe gegebener Werte durch einer gebrochenen rationale Funktion. J. fiir reine und angew. Math., 30:127-156, 1845. [158] E.A. Jonckheere and C. Ma. l~ecursive partial realization from combined sequence of Markov parameters and moments. Linear Algebra Appl., 122/123/124:565-590, 1989. [159] E.A. Jonckheere and C. Ma. A simple Hankel interpretation for the Berlekamp-Massey algorithm. Linear Algebra Appl., 125:65-76, 1989. [160] M. T. Jones. The use of Lanczos' method to solve the generalized eigenproblem. PhD thesis, Duke University, Department of Mathematics, 1990. [161] W.B. Jones and W.J. Thron. Continued Fractions. Analytic Theory and Applications. Addison-Wesley, Reading, Mass., 1980. [162] W. Joubert. Generalized conjugate gradient and Lanczos methods for the solution of nonsymmetric systems of linear equations. PhD thesis, Center of Numerical Analysis, The University of Texas at Austin, Austin, Texas, January 1990. l~eport CNA-238. [163] E.I. Jury. Theory and applications of the z-transform method. J. Wiley Sons, New York, 1964. [164] E.I. Jury. Inners and stability of dynamical systems. Wiley, New York, 1974. [165] T. Kailath. Linear Systems. Prentice-Hall, Inc., Englewood Cliffs, N.J., 1980. [166] T. Kailath and A.H. Sayed. Displacement structure: theory and applications. SIAM Rev., 37:297-386, 1995.
BIBLIOGRAPHY
427
[167] 1~. Kalman. On partial realizations, transfer functions, and canonical forms. Acta Technica Scandinavica, 31:9-32, 1979. (Zbl 424.93020, MR 80k:93022). [168] tt.E. Kalman. Algebraic characterization of polynomials lie in certain algebraic domain. Proc. Natl. Acad. Sci. USA, 64:818-823, 1969. [169] P..E. Kalman, P.L. Falb, and M.A. Arbib. Topics in mathematical system theory. International Series in Pure and Applied Mathematics. McGraw-Hill Book Company, New York, San Francisco, St. Louis, Toronto, London, Sydney, 1969. [170] M.D. Kent. Chebyshev, Krylov, Lanczos : matrix relationships and computations. PhD thesis, Stanford University, Dept. of Computer Science, June 1989. l~ept. STAN-CS-89-1271. [171] A.N. Khovanskii. The application of continued fractions and their generalizations to problems in approximation theory. Noordhoff, Groningen, 1963. [172] M. Kimura. Chain scattering approach to H-infinity-control. Birkhs Verlag, 1997. [173] D.E. Knuth. The art of computer programming, Vol 2 : Seminumerical algorithms. Addison Wesley, Reading, Mass., 1969. [174] P. Kravanja and M. Van Bard. A fast Hankel solver based on an inversion formula for Loewner matrices. Linear Algebra Appl., 1996. Submitted. [175] M.G. KreYn and M.A. Naimark. The method of symmetric and Hermitian forms in the theory of the separation of roots of algebraic equations. 1936. Translated in English: Linear and Multilinear Algebra, 10 (1981)265-308. [176] L. Kronecker. Zur Theorie der Elimination einer Variabel aus zwei algebraischen Gleichungen. Monatsber. K6nigl. Preus. Akad. Wiss., pages 535-600, 1881. [177] S. Kung. Multivariable and multidimensional systems : analysis and design. PhD thesis, Stanford University, 1977. [178] C. Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Res. Natl. Bur. Stand., 45:225-280, 1950.
428
BIBLIOGRAPHY
[179] F.I. Lander. The Bezoutian and the inversion of Hankel and Toeplitz matrices. Matem. Issled., 9(2):69-87, 1974. [180] N. Levinson. The Wiener rms (root mean square) error criterion in filter design and prediction. J. Math. Phys., 25:261-278, 1947. [181] J.D. Lipson. Elements of algebra and algebraic computing. Addison Wesley Publishing Co., Reading, Mass., 1981. [182] C. Van Loan. Matriz frameworks for the Fast Fourier Transform, volume 10 of Frontiers in Applied Mathematics. SIAM, 1992. [183] L. Lorentzen and H. Waadeland. Continued fractions with applications, volume 3 of Studies in Computational Mathematics. NorthHolland, 1992. [184] A. Magnus. Certain continued fractions associated with the Pad~ table. Math. Z., 78:361-374, 1962. [185] A. Magnus. Expansion of power series into P-fractions. Math. Z., 80:209-216, 1962. [186] M. Marden. The geometry of polynomial approzimants. Amer. Math. Soc., Providence, l~.I., 1966. [187] P. Maroni. Sur quelques espaces de distributions qui sont des formes lin~aires sur l'espace vectoriel des polyn6mes. In C. Brezinski, A. Draux, A.P. Magnus, P. Maroni, and A. l~onveaux, editors, Proc. Polyn6mes Orthogoneaux et Applications, Bar-le-Duc, 198~, volume 1171 of Lecture Notes in Math., pages 184-194. Springer, 1985. [188] P. Maroni. Prol~gom~nes s l'~tude de polynSmes orthogonaux. Ann. Mat. Pura ed Appl., 149:165-184, 1987. [189] P. Maroni. Le calcul de formes lin~aires et des polyn6mes orthogonaux semi-classiques. In M. Alfaro et al., editors, Orthogonal polynomials and their applications, volume 1329 of Lecture Notes in Math., pages 279-288. Springer, 1988. [190] P. Maroni. Une th~orie alg~brique de polyn6mes orthogonaux. Application aux polynSmes orthogonaux semi-classiques. In C. Brezinski, L. Gori, and A. l~onveaux, editors, Orthogonal Polynomials and their Applications, volume 9 of IMACS annals on computing and applied mathematics, pages 95-130, Basel, 1991. J.C. Baltzer AG.
BIBLIOGRAPHY
429
[191] P. Maroni. An introduction to second degree forms. Adv. Comput. Math., 3:59-88, 1995. [192] J.L. Massey. Shift-register synthesis and BCH decoding. IEEE Trans. Inf. Th., IT-15:122-127, 1969. [193] l~.J. McEliece. The theory of information and coding: A mathematical framework for communication, volume 3 of Encyclopedia of mathematics. Addison Wesley, Reading, Mass., 1977. [194] l~.J. McEliece and J.B. Shearer. A property of Euclid's algorithm and an application to Pad~ approximation. SIAM J. Appl. Math., 34:611-616, 1978. [195] M.S. Moonen, G. Golub, and B.L.I~. De Moor, editors. Linear algebra for large scale and real-time applications, volume 232 of NATO-ASI series E: Applied Sciences. Kluwer Acad. Publ., Dordrecht, 1993. [196] M. Morf. Fast algorithms for multivariable systems. PhD thesis, Stanford University, 1974. [197] O. Nevanlinna. Convergence of iterations for linear equations. Birkhs Verlag, 1993. [198] H. Pad~. Sur la Reprdsentation Approchde d'une Fonction par des Fractions Rationelles. PhD thesis, Ann. Ecole. Norm. Sup., vol. 9, pages 3-93, Paris, 1892. [199] D. Pal and T. Kailath. Fast triangular factorization and inversion of Hermitian Toeplitz, and related matrices with arbitrary rank profile. SIAM J. Matrix Anal. Appl., 14(4):1016-1042, 1993. [200] B.N. Parlett. Reduction to tridiagonal form and minimal realizations. SIAM J. Matrix Anal. Appl., 13(2):567-593, 1992. [201] B.N. Parlett, D.R. Taylor, and Z.A. Liu. A look-ahead Lanczos algorithm for unsymmetric matrices. Mathematics of Comp., 44(169):105124, 1985. [202] O. Perron. Die Lehre yon den Kettenbriichen. Teubner, 1977. [203] S. Pombra, H. Lev-Ari, and T. Kailath. Levinson and Schur algorithms for Toeplitz matrices with singular minors. In Int. Conf. Acoust., Speech and Signal proc., pages 1643-1646, 1988.
430
BIBLIOGRAPHY
[204] V. Pts Explicit expressions for Bezoutians. Linear Algebra Appl., 59:43-54, 1984. [205] V. Pts Lyapunov, Bezout and Hankel. Linear Algebra Appl., 58:363390, 1984. [206] L. l~dei. Algebra, Erster Tell. Akademische Verlagsgesellschaft, 1959. [207] I.S. Reed and G. Solomon. Polynomial codes over certain finite fields. J. Soc. Ind. Appl. Math., 8:300-304, 1960. [208] J. Rjssanen. l~ecursive identification of linear systems. SIAM J. Control, 9(3):420-430, 1971. [209] K. l~ost. MSbius transformations, matrix representations for generalized Bezoutians and fast algorithms for displacement structure systems of equations. Wiss. A. Techn. Univ. Chemnitz, 33:29-36, 1991. [210] K. l~ost. Generalized companion matrices and matrix representations for generalized Bezoutians. Linear Algebra Appl., 193:151-172, 1993. [211] K. l~ost. Generalized Lyapunov equations, matrices with displacement structure and generalized Bezoutians. Linear Algebra Appl., 193:7594, 1993. [212] E.M. l~ussakovskii. The theory of V-Bezoutians and its applications. Linear Algebra Appl., 212/213:437-460, 1994. [213] Y. Saad. Numerical methods for large eigenvalue problems. Algorithms and Architectures for Advanced Scientific Computation. Manchester University Press/Halsted Press, 1992. [214] E. Saff. Orthogonal polynomials from a complex perspective. In P. Nevai, editor, Orthogonal polynomials, pages 363-393, Dordrecht, 1990. NATO, Kluwer Academic Press. [215] I. Schur. Uber Potenzreihen die im Innern des Einheitskreises Beschrs sind I. J. Reine Angew. Math., 147:205-232, 1917. See also [109, p.31-59]. [216] T.J. Stieltjes. Quelques recherches sur la th~orie des quadratures dites m~caniques. Ann. Aci. Ecole Norm. Paris, Sdr. 3, 1:409-426, 1884. Oeuvres vol. 1, pp. 377-396.
BIBLIOGRAPHY
431
[217] G.W. Struble. Orthogonal polynomials: variable-signed weight functions. Numer. Math., 5:88-84, 1963. [218] Y. Sugiyama. An algorithm for solving discrete-time Wiener-Hopf equations based upon Euclid's algorithm. IEEE Trans. Inf. Th., IT32(3):394-409, 1986. [219] Y. Sugiyama, M. Kasahara, S. Hirasawa, and T. Namekawa. A method for solving key equation for decoding goppa codes. Information and Control, 27:87-99, 1975. [220] W. Sweldens and P. SchrSder. Building your own wavelets at home. In Wavelets in Computer Graphics, ACM SIGGRAPH Course Notes. ACM, 1996. [221] J.J. Sylvester. On the theory of sysygetic relations of two rational integral functions, comprising an application to the theory of Sturm functions and that of the greatest algebraic common measure. Philos. Trans. Roy. Soc. London, 143:407-548, 1853. [222] G. Szeg~. Orthogonal polynomials, volume 33 of Amer. Math. Soc. Colloq. Publ. Amer. Math. Soc., Providence, Rhode Island, 3rd edition, 1967. First edition 1939. [223] D.K. Taylor. Analysis of the look ahead Lanczos algorithm. PhD thesis, Center for Pure and Applied Mathematics, University of California, Berkeley, 1982. [224] E.E. Tyrtyshnikov. New cost-effective and fast algorithms for special classes of Toeplitz systems. Soy. J. Numer. Math. Modelling, 3(1):6376, 1988. [225] M. Van Barel. Nested Minimal Partial Realizations and Related Matriz Rational Approximants. PhD thesis, K.U. Leuven, January 1989. [226] M. Van Barel and A. Bultheel. A new approach to the rational interpolation problem: the vector case. J. Comput. Appl. Math., 33(3):331346, 1990. [227] M. Van Barel and A. Bultheel. The computation of non-perfect Pad~Hermite approximants. Numer. Algorithms, 1:285-304, 1991. [228] M. Van Barel and A. Bultheel. A general module theoretic framework for vector M-Pad~ and matrix rational interpolation. Numer. Algorithms, 3:451-461, 1992.
432
BIBLIOGRAPHY
[229] M. Van Barel and A. Bultheel. The "look-ahead" philosophy applied to matrix rational interpolation problems. In U. Helmke, R. Mennicken, and J. Saurer, editors, Systems and networks: Mathematical theory and applications, Volume II: Invited and contributed papers, volume 79 of Mathematical Research, pages 891-894. Akademie Verlag, 1994.
[230] M. Van Barel and A. Bultheel. A look-ahead method to compute vector Padé-Hermite approximants. Constr. Approx., 11:455-476, 1995.
[231] M. Van Barel and A. Bultheel. Look ahead methods for block Hankel systems. J. Comput. Appl. Math., 1995. Submitted.
[232] M. Van Barel and A. Bultheel. Look-ahead schemes for block Toeplitz systems and formal orthogonal matrix polynomials. In Proceedings of the Workshop on orthogonal polynomials: the non-definite case, Rouen, France, April 24-26, 1995, 1995. To appear.
[233] M. Van Barel and A. Bultheel. A look-ahead algorithm for the solution of block Toeplitz systems. Linear Algebra Appl., 1996. Accepted.
[234] J. van Iseghem. Approximants de Padé vectoriels. PhD thesis, Lille, 1987.
[235] J. van Iseghem. Convergence of the vector QD-algorithm. Zeros of vector orthogonal polynomials. J. Comput. Appl. Math., 25:33-46, 1989.
[236] H. Viscovatoff. De la méthode générale pour réduire toutes sortes de quantités en fractions continues. Mém. Acad. Sci. Imp. St. Pétersbourg, 1:226-247, 1805.
[237] H.S. Wall. Analytic theory of continued fractions. Van Nostrand, Princeton, 1948.
[238] H.S. Wall. Analytic theory of continued fractions. Chelsea, 1973.
[239] W.A. Wolovich. Linear Multivariable Systems. Springer, New York, 1974.
[240] P. Wynn. A general system of orthogonal polynomials. Quart. J. Math., 18:81-96, 1967.
[241] P.Y. Yalamov. Convergence of the iterative refinement procedure applied to stabilization of a fast Toeplitz solver. In P. Vassilevski and S. Margenov, editors, Proc. Second IMACS Symposium on Iterative Methods in Linear Algebra, pages 354-363. IMACS, 1996.
[242] D.M. Young and K.C. Jea. Generalized conjugate-gradient acceleration of non-symmetrizable iterative methods. Linear Algebra Appl., 34:159-194, 1980.
[243] O. Zariski and P. Samuel. Commutative algebra, vol. II, volume 29 of Graduate texts in mathematics. Springer, New York, Heidelberg, Berlin, 1960.
List of Algorithms

1.1 Scalar_Euclid
1.2 Extended_Euclid
1.3 Initialize
1.4 Compute_Gk
1.5 Normalize
1.6 Atomic_Euclid
1.7 Update_k
1.8 Compute_Gk(v)
1.9 Compute_Gk(V)
2.1 Block_MGS
3.1 Lanczos
4.1 Two_sided_GS
4.2 Two_sided_MGS
4.3 Block_two_sided_MGS
4.4 Generalized_Lanczos
4.5 Block_MGSP
4.6 Toeplitz_Euclid
5.1 Massey
6.1 Routh_Hurwitz
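The first two entries, Scalar_Euclid and Extended_Euclid, are the classical and extended Euclidean algorithms of Chapter 1. As a purely illustrative aid, a minimal sketch of the extended version, specialized from a general Euclidean domain to the nonnegative integers, could look as follows; the function name and interface are ours, not the book's pseudocode.

    def extended_euclid(a, b):
        # Return (g, s, t) with g = gcd(a, b) and g = s*a + t*b.
        # Illustrative sketch only; assumes nonnegative integers a, b.
        r0, r1 = a, b          # remainder sequence
        s0, s1 = 1, 0          # cofactors of a
        t0, t1 = 0, 1          # cofactors of b
        # Invariant: r0 = s0*a + t0*b and r1 = s1*a + t1*b.
        while r1 != 0:
            q = r0 // r1                  # quotient of the division step
            r0, r1 = r1, r0 - q * r1      # division with remainder
            s0, s1 = s1, s0 - q * s1      # cofactor updates keep the invariant
            t0, t1 = t1, t0 - q * t1
        return r0, s0, t0

For example, extended_euclid(240, 46) returns (2, -9, 47), since gcd(240, 46) = 2 = (-9)*240 + 47*46.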
Index

Atomic_Euclid, 35
Block_MGS, 86
Block_MGSP, 171
Block_two_sided_MGS, 142
Compute_Gk, 11
Compute_Gk(v), 45
Compute_Gk(V), 48
Extended_Euclid, 11
Initialize, 11
Lanczos, 110
Massey, 268
Normalize, 11
Routh_Hurwitz, 319
Scalar_Euclid, 2
Two_sided_GS, 140
Two_sided_MGS, 140
Update_k, 36

adjoint, 166
adjoint mapping, 137
algebraic curve, 226
algebraic variety, 227
algorithm
  atomic Euclidean, 32, 241, 270, 294
  Berlekamp-Massey, 24, 241, 265
  Chebyshev, 23, 381
  classical Routh-Hurwitz, 315
  conjugate gradient, 105
  division, 292
  Euclidean, 32, 62, 68, 73, 78, 234, 254, 265, 295, 303, 306, 381
  Euclidean type, 201
  extended Euclidean, 9, 23, 30, 69, 102, 104, 111, 239, 242, 293, 296
  fast, 61, 94, 133, 382
  generalized Lanczos, 167
  Gram-Schmidt, 139
  Lanczos, 72, 100, 115, 135, 166, 285
  Look-Ahead, 99, 107
  layer adjoining, 37
  layer peeling, 37
  left sided Euclidean, 55
  Levinson, 306, 323
  Look-Ahead-Lanczos, 107
  modified Gram-Schmidt, 140, 152, 166
  right sided atomic Euclidean, 35
  right sided Euclidean, 23
  Routh-Hurwitz, 306
  Schur, 158, 381
  Schur-Cohn, 323, 325
  split Levinson, 334
  superfast, 61, 381
  Toeplitz Euclidean, 204
  two-sided Gram-Schmidt, 135, 186
  Viscovatoff, 37, 95, 158, 235, 241, 249
annihilating polynomial, 302
antidiagonal, 234, 241, 251, 255, 260, 264, 300, 351, 364, 368, 376
approximant, 18, 31
  minimal Padé, 254, 300
  mPA-Baker, 261
  mPA-Frobenius, 255
  PA-Baker, 232
  PA-Frobenius, 233, 366
  Padé, 31, 99, 231, 232, 233, 292
    Baker, 300
    Frobenius, 300
    minimal, 254
    two-point, 200, 292, 304
  Padé-Hermite, 382
  rational, 23, 30, 159, 200, 351
  simultaneous Padé, 382
  vector Padé, 163
artificial row, 256
associates, 4
backward substitution, 37
basis
  (G, ~)-basis, 355
  biorthogonal, 101
  canonical, 274, 370
  dual, 137
  module, 354
BCH code, 265
behavior
  steady state, 290
  transient, 290
Bezout
  equation, 178
  theorem, 14
Bezoutian, 88, 176
Bezoutiant, 176
bilinear form, 135, 136, 138, 166, 226
biorthogonal bases, 101
biorthogonal polynomials, 139
biorthogonality, 102, 135, 392
biorthogonalization, 166
  two-sided Gram-Schmidt, 104
biorthonormal, 104
block
  index, 141, 152, 153, 154, 159, 168, 187, 197
    right, 190
  singular, 239, 255, 263
  size, 141, 193
block orthogonal polynomial, 141
breakdown, 105, 112, 115, 127, 141
  curable, 128, 327
  incurable, 106, 130
canonical basis, 274, 370
canonical decomposition theorem, 120
Cauchy index, 309
causal system, 273
Cayley transform, 326
characteristic of a Toeplitz matrix, 183
characteristic polynomial, 77, 124, 129, 356
Chebyshev algorithm, 23
Christoffel-Darboux relation, 90, 170, 173, 217, 219
circle
  unit, 169
circuit theory, 312
class
  equivalence, 234
cluster, 108
code
  BCH, 265
  Goppa, 265
  Reed-Solomon, 265
cofactor, 117, 283
common divisor, 246
comonic polynomial, 232
complementary filters, 405
complementary subspaces, 119
complex conjugate, 100, 169, 310
complexity, 286, 295
composite, 6
conjugate gradient method, 99
constant system, 274
continued fraction, 15, 231, 294
  approximant, 31
  convergent, 16, 56
  left sided formal, 56
  principal part, 30
  tail, 18, 30, 56
continuous time, 272, 287
controllability form, 284
convergent, 16, 56
convolution operator, 277
coprime, 6, 232
  polynomials, 118
curable breakdown, 128, 327
decoding, 265
decomposition
  LDU, 143
  partial fraction, 131
defect, 255, 257, 261, 264
degree, 22, 254, 264
  H-degree, 358
  McMillan, 286
degree residual, 358
delay, 274
denominator, 161
description
  external, 285
  internal, 285
  state space, 280, 287, 297, 300
determinant, 67, 70, 77, 80, 141, 146, 154, 160, 194, 212
diagonal, 235, 240, 251, 256, 300
  main, 239, 265
Dirac impulse, 287, 288
discrete Laplace transform, 287
discrete time, 272
division, 202, 243
division property, 6, 22, 30
divisor, 4
  common, 246
domain
  Euclidean, 1
  frequency, 275, 288
  integral, 3
  time, 275
down shift operator, 62, 125, 152, 182
dual
  basis, 137
  space, 136, 138
dual Möbius transformation, 17
dynamical system, 273
echelon form, 72
eigenstructure, 127
eigenvalue, 77, 117, 289
eigenvector, 132
elimination
  elementary row, 148
  Gaussian, 147
equivalence class, 234
equivalent realizations, 117, 286
Euclidean
  algorithm, 32, 62, 68, 73, 234, 254, 265, 295
  domain, 1, 2, 3, 407
  index, 68, 107, 141, 239, 251, 253, 259, 294
  ring, 3, 18, 55
evaluation
  backward, 17
  forward, 17
exceptional
  situation, 253
  value, 253
expansion
  Fourier, 151
  Laurent series, 160
factorization, 147
  LDU, 141
  of the moment matrix, 197
  triangular, 151
Favard theorem, 307, 323
field, 21
  skew, 21, 231
finite impulse response, 405
finite-dimensional system, 279
form
  bilinear, 136, 138, 166, 226
  linear, 136, 138, 179
  linearized, 234
  rational, 159
  reduced, 232, 234
  unreduced, 234
formal Fourier series, 149
formal Laurent series, 21
formal Newton series, 352
formal power series, 21, 138
formal series, 21
Fourier expansion, 151
fraction
  polynomial, 295
  rational, 232, 233, 243
frequency domain, 275, 288
function
  rational, 115, 160, 201, 232, 285
  transfer, 285, 288, 290, 294, 295, 298
fundamental system, 213
future, 273
Gaussian elimination, 147
(G, ~)-basis, 355
G-order, 353
grade, 101, 123, 126, 132
Gram-Schmidt procedure, 151
greatest common divisor, 1
  definition, 5
  uniqueness, 5
Hamburger moment problem, 307
Hankel
  symbol, 63
H-Bezoutian, 88
H-degree, 358
higher order term, 201
homomorphism, 136, 272
Hurwitz polynomial
  strict sense, 315, 321
impulse
  Dirac, 287, 288
  response, 277, 282, 284, 288, 291
  signal, 274, 287
incurable breakdown, 106
indefinite weight, 169
index, 326
  block, 141, 152, 153, 154, 159, 162, 168, 187, 197
  Cauchy, 309
  Euclidean, 68, 107, 141, 239, 251, 253, 259, 294, 302
  Iohvidov, 184, 186, 189, 190, 193, 198, 203
  Kronecker, 68, 78, 107, 113, 115, 141, 239, 251, 259, 266, 268, 294, 302
  minimal, 250, 255, 256, 260, 261, 263, 300, 302, 305
    constrained, 251, 266
    unconstrained, 253
  of rational function, 311
inner polynomial, 143
inner product, 103, 105, 138, 166, 179, 182, 191, 197, 266
  nondegenerate, 139
input signal, 272, 295
input-output map, 272
interpolation points, 352
invariant set, 274
invariant subspace, 101, 119, 127, 128
inverse, 76, 77
involution, 100, 136
Iohvidov index, 184, 186, 189, 190, 193, 198, 203
isomorphism, 274
Jacobi parameter, 336
Jordan
  block, 124
  chain, 126
  curve, 169
  form, 131
Julia set, 169
jump, 263, 264, 266, 269
kernel, 64
  moment generating, 149, 153, 162
  reproducing, 150
Kronecker
  delta, 139, 274
  index, 68, 115, 141, 239, 251, 259, 266, 268, 294
  theorem, 118, 123
Krylov
  matrix, 100, 114, 116, 166, 285
  maximal subspace, 101
  sequence, 100, 107, 135, 166
  space, 100
Lagrange polynomial, 137
Lanczos, 99
  polynomials, 99
Laplace transform, 287, 402
Laurent
  polynomial, 169, 179, 407
  series, 21, 236
lazy wavelet, 403
leading coefficient, 62
left-right duality, 54
lifting scheme, 385, 400
linear
  form, 136, 138, 179
  functional, 138, 169
  operator, 166
  system, 271, 272
linear system
  continuous time, 287
  discrete time, 287
  input, 287
  output, 287
  state vector, 287
linearized form, 234
Loewner matrix, 169, 382
look-ahead, 133, 379, 381
lower degree term, 201
Möbius transformation, 15
  dual, 17
main diagonal, 239
Markov parameter, 289, 301
matrix
  adjugate, 117
  bi-diagonal block, 211
  block tridiagonal, 75
  companion, 74, 76, 78, 79, 152, 160, 284
  diagonal, 103
    block, 79, 153, 198, 251
  Frobenius, 210, 212
  Gram, 139
  Hankel, 37, 63, 77, 78, 102, 107, 114, 116, 135, 151, 169, 251, 294, 299, 302
  Hankel Hessenberg, 79
  Hessenberg, 159, 161, 168, 197, 210, 211
    block, 151
    block upper, 152, 153
    unit upper, 152
  infinite dimensional, 69
  Jacobi, 75, 114, 129, 151
  Krylov, 100, 114, 116, 166
  Loewner, 169, 382
  moment, 135, 139, 146, 149, 151, 153, 159, 166, 167, 178, 184, 190, 203, 226, 251, 302, 304
    truncated, 161
  observability, 285
  of bilinear form, 136
  permutation, 147
  positive definite, 102, 106
  quasi-Hankel, 169, 173
  quasi-Toeplitz, 169, 180
  rank deficient, 68
  reachability, 285
  shifted, 78, 153
  similar, 123
  state transition, 280
  Sylvester, 176, 381
  Toeplitz, 37, 178, 182, 190, 194, 197, 304
  triangular
    left upper, 63
    lower/upper Hankel, 63
    right lower, 63
    unit upper, 76, 102, 140, 185, 251
  tridiagonal, 66, 74, 102, 103
    Jacobi, 103
  truncated, 168
  unimodular, 300
  Vandermonde, 137, 169, 382
  weight, 103
McMillan degree, 117, 123, 126, 286, 300, 303
measure, 169
minimal
  annihilating polynomial, 250, 270
  index, 250, 255, 256, 260, 261, 300
    constrained, 251, 266
    unconstrained, 253
  Padé approximant, 254, 300
  partial realization problem, 289, 291
  polynomial, 126, 129
  realization, 117, 126, 254, 285
Mismatch theorem, 130
model reduction, 295
modified Gram-Schmidt procedure, 166
module, 354
moment, 139, 162, 166, 190
  generating kernel, 149, 153, 162
  modified, 139
monic, 234
monomial, 407
mPA-Baker approximant, 261
mPA-Frobenius approximant, 255
multiplicity, 124
  geometric, 126
multiresolution, 390
normal data, 360
normalization, 10, 27, 34, 62, 66, 74, 104, 111, 232, 258, 261, 267
  monic, 104, 242, 299
null space, 64, 251
numerator, 161
observability, 119, 285
operator
  convolution, 277
  delay, 274
  down shift, 62, 125, 152, 182
  linear, 166
  projection, 276
  reversal, 63, 182
  shift, 274, 287
  truncation, 69
order, 21, 254, 264
  of approximation, 266
  of multiresolution, 393
order residual, 353
order vector, 353
orthogonal, 104, 119, 136
  on an algebraic curve, 226
output signal, 272, 295
P-fraction, 30
P-part, 29
PA-Baker approximant, 232
PA-Frobenius approximant, 233, 366
Padé
  approximant, 231, 232, 233, 292
    Baker, 300
    Frobenius, 300
    two-point, 200, 292, 304
  table, 232, 263, 382
    antidiagonal, 241, 300
    artificial row, 256
    diagonal, 240, 300
    main diagonal, 239, 265
    nonnormal, 232
    normal, 232
    path, 235
    singular block, 233, 239
    staircase, 241
Padé-Hermite approximant, 382
para-conjugate, 310, 322
  zero, 311
para-even, 310
para-odd, 310
part
  polynomial, 22, 201
  strictly proper, 22
partial correlation coefficient, 323
partial fraction decomposition, 125, 131, 309
partial realization, 23, 271
  mixed problem, 292
past, 273
path, 235
pencil, 81
persymmetry, 182
pivot element, 147
pole, 117, 124, 126, 309
polynomial, 21
  annihilating
    minimal, 250, 270
  biorthogonal, 135, 139
  block orthogonal, 141, 161, 186, 195
  characteristic, 77, 124, 129, 356
  coprime, 118, 232
  fraction, 295
    representation, 283, 295
  inner, 143
  Lanczos, 99
  Laurent, 169, 179, 407
  minimal, 126, 129
  monic, 234
  orthogonal, 135, 138, 239, 251, 294, 302, 304, 306
    on an algebraic curve, 226
  part, 201
  quasi-orthogonal, 147, 197
  second kind, 163
  shifted, 210
  Szegő, 220
  true orthogonal, 144, 184, 195, 252, 296
    monic, 145
  true right orthogonal
    monic, 154
polynomial part, 22
polyphase matrix, 403
present, 273
prime, 6
principal ideal domain, 355
principal part, 29
principal vector, 124, 132
procedure
  Gram-Schmidt
    modified, 166
  updating, 190
product
  inner, 36
projection, 122, 276
projector, 273
proper fls, 22
proper part, 29
pseudo-inner, 325
pseudo-lossless, 310, 325
quadruple, 280
quasi-Hankel matrix, 169, 173
quasi-orthogonal polynomial, 147, 197
quasi-Toeplitz matrix, 169, 180
quotient
  left, 30
  polynomial, 22
rank, 68, 115, 123, 126
rational
  approximant, 159, 199, 200, 351
  form, 159
  fraction, 232, 233, 243
  function, 115, 160, 201, 232, 285
    strictly proper, 162
reachability, 119, 285
realization, 117, 285
  equivalent, 117, 122, 286
  minimal, 117, 123, 126, 285
  partial, 271
  triple, 118, 120
realization problem, 293, 298
  minimal, 293, 298
  minimal partial, 289, 291, 298
  mixed problem, 300, 304
  partial, 291, 293, 298
reciprocal, 182, 322
recurrence
  Szegő-Levinson, 220
  three term, 201, 210
    generalized, 186, 193
reduced
  H-reduced, 358
  column reduced, 358
  form, 232, 234
reflection coefficient, 158, 225, 323
remainder, 22
  left, 30
representation
  polynomial fraction, 283, 295, 299
reproducing kernel, 150
reproducing property, 151
residual, 206, 243, 266
response
  impulse, 277, 282, 284, 288, 291
  transient, 290
reversal operator, 63, 182
right block, 190
ring
  division, 21
  Euclidean, 3, 55
  integral, 3
  right sided Euclidean, 6
  valuation, 20
Routh array, 316
Routh-Hurwitz test, 228, 306, 334
scaling function, 387
scattering theory, 37
Schur
  algorithm, 158, 381
  function, 324
  parameter, 323, 325
  theorem, 324
Schur complement determinant formula, 146, 150, 161
Schur-Cohn test, 228, 323, 334
self-adjoint, 100
sequence
  Krylov, 100, 107, 135, 166
  one sided infinite, 138
series
  formal Fourier, 149
  formal Newton, 352
  Laurent, 236
  power, 290
  two-sided Laurent series, 274
set
  invariant, 274
  Julia, 169
  time, 272
shift, 274
  operator, 274, 287
  register, 24, 265
  upward, 116
shift invariant, 274
shift-register synthesis, 270
signal, 272, 385, 401, 402
  impulse, 274, 287
  input, 272, 295
  output, 272, 295
similar matrices, 123
simultaneous Padé approximant, 382
singular block, 239, 255, 263
SISO system, 272
size
  block, 141, 193
skew field, 231
space
  dual, 136, 138
  Krylov, 100
  vector, 136
spectrum, 129
stability
  BIBO, 288
  internal, 289
stability check, 306
stacking vector, 62, 139, 151, 198, 201
staircase, 241
state, 278
  space, 278
    description, 280, 287, 297, 300
    triple, 284
  transition matrix, 280
  vector, 280
state space, 118
state space description, 287
strictly proper, 22
structure theorem, 120
Sturm sequence, 309
subspace
  complementary, 119
  invariant, 101, 119, 127, 128
  maximal Krylov, 101
  observable, 119
  reachable, 119
Sylvester determinant formula, 117
Sylvester matrix, 176, 381
symbol
  Hankel, 63
  Toeplitz, 179
synthesis
  shift-register, 270
system
  causal, 273
  constant, 274
  dynamical, 273
  finite-dimensional, 279
  Hankel, 239
  identification, 295
  linear, 271, 272
  SISO, 272
  theory, 254
  time invariant, 274
  Toeplitz, 203
systolic array, 34
Szegő polynomial, 220, 323, 335
Szegő theory, 169
Szegő-Levinson recurrence, 220
table
  Padé, 232, 263
    antidiagonal, 241, 300
    artificial row, 256
    diagonal, 240, 300
    main diagonal, 239, 265
    nonnormal, 232
    normal, 232
    path, 235
    singular block, 233, 239
    staircase, 241
term
  constant, 201
  higher order, 201
  lower degree, 201
three term recurrence, 74, 104, 151, 185, 201, 210
  generalized, 186, 193
time
  continuous, 272, 287
  discrete, 272
  domain, 275
  moment, 290
  set, 272
time coefficient, 290, 298, 301
time invariant system, 274
Toeplitz, 36, 276
  symbol, 179
  system, 203
transfer function, 285, 288, 290, 294, 295, 298
transform
  atomic, 32
  discrete Laplace, 287
  dual Möbius, 17
  Laplace, 287, 402
  Möbius, 15
  z-transform, 274, 287, 402
transient response, 290
true orthogonal polynomial, 144
truncation operator, 69
truncation theorem, 240
two-point Padé approximant, 200, 292, 304
two-sided Laurent series, 274
unimodular matrix, 300, 363
unit, 3
  circle, 169, 185, 226, 306, 382
unreduced form, 234
upward shift, 116
valuation, 20
Vandermonde matrix, 137, 169, 382
vector
  Padé approximant, 163
  principal, 124, 132
  stacking, 139, 151, 198, 201
  state, 280
vector space, 136
Viscovatoff algorithm, 37, 95, 158, 235, 241, 249
VLSI, 34
wavelet domain, 396
wavelet transform, 396
wavelets, 385
weakly normal data, 360
weight, 169
zero, 117, 311
  para-conjugate, 311
zero division, 105
z-transform, 274, 287, 402