SVD AND SIGNAL PROCESSING, III Algorithms, Architectures and Applications
SVD AND SIGNAL PROCESSING, III Algorithms, Architectures and Applications
Edited by
MARC MOONEN BART DE MOOR ESAT-SISTA Department of Electrical Engineering Katholieke Universiteit Leuven Leuven, Belgium
1995
ELSEVIER
Amsterdam - Lausanne - New York - Oxford - Shannon - Tokyo
ELSEVIER SCIENCE B.V. Sara Burgerhartstraat 25 P.O. Box 211, 1000 AE Amsterdam, The Netherlands
Library of Congress Cataloging-in-Publication Data

SVD and signal processing III : algorithms, architectures, and applications / edited by Marc Moonen, Bart De Moor.
   p. cm.
   Includes bibliographical references and index.
   ISBN 0-444-82107-4 (alk. paper)
   1. Signal processing--Digital techniques--Congresses. 2. Decomposition (Mathematics)--Congresses. I. Moonen, Marc S., 1963- . II. Moor, Bart L. R. de, 1960- . III. Title: SVD and signal processing three. IV. Title: SVD and signal processing 3.
   TK5102.9.S83 1995
   621.382'2'015194--dc20        95-3193
                                 CIP
ISBN: 0 444 82107 4
© 1995 Elsevier Science B.V. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher, Elsevier Science B.V., Copyright & Permissions Department, P.O. Box 521, 1000 AM Amsterdam, The Netherlands.
Special regulations for readers in the U.S.A. - This publication has been registered with the Copyright Clearance Center Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923. Information can be obtained from the CCC about conditions under which photocopies of parts of this publication may be made in the U.S.A. All other copyright questions, including photocopying outside of the U.S.A., should be referred to the copyright owner, Elsevier Science B.V., unless otherwise specified.
No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. pp. 375-382, 415-422, 467-474: Copyright not transferred. This book is printed on acid-free paper. Printed in The Netherlands.
PREFACE

This book is a compilation of papers dealing with matrix singular value decomposition and its application to problems in signal processing. Algorithms and implementation architectures for computing the SVD are discussed, as well as a variety of applications such as systems and signal modeling and detection. The book opens with a translation of a historical paper by Eugenio Beltrami, containing one of the earliest published discussions of the singular value decomposition. It also contains a number of keynote papers, highlighting recent developments in the field.

The material in the book is based on the author contributions presented at the 3rd International Workshop on SVD and Signal Processing, held in Leuven, August 22-25, 1994. This workshop was partly sponsored by EURASIP and the Belgian NFWO (National Fund for Scientific Research), and organized in co-operation with the IEEE Benelux Signal Processing Chapter and the IEEE Benelux Circuits and Systems Chapter. It was a continuation of two previous workshops of the same name which were held at the Les Houches Summer School for Physics, Les Houches, France, September 1987, and the University of Rhode Island, Kingston, Rhode Island, U.S.A., June 1990. The results of these workshops were also published by Elsevier Science Publishers B.V. (SVD and Signal Processing, Algorithms, Applications and Architectures, edited by E.F. Deprettere, and SVD and Signal Processing, Algorithms, Analysis and Applications, edited by R. Vaccaro).

It has been a pleasure for us to organize the workshop and to work together with the authors to assemble this book. We feel amply rewarded with the result of this co-operation, and we want to thank all the authors here for their effort. We would be remiss not to thank the other members of the workshop organizing committee, Prof. Ed Deprettere, Delft University of Technology, Delft, The Netherlands, Prof. Gene Golub, Stanford University, Stanford CA, U.S.A., Dr. Sven Hammarling, The Numerical Algorithms Group Ltd, Oxford, U.K., Prof. Franklin Luk, Rensselaer Polytechnic Institute, Troy NY, U.S.A., and Prof. Paul Van Dooren, Université Catholique de Louvain, Louvain-la-Neuve, Belgium. We also gratefully acknowledge the support of L. De Lathauwer, who managed all of the workshop administration.
Marc Moonen Bart De Moor
Leuven, November 1994
CONTENTS

A short introduction to Beltrami's paper

On bilinear functions
   E. Beltrami, 1873; English translation: D. Boley

PART 1. KEYNOTE PAPERS

1. Implicitly restarted Arnoldi/Lanczos methods and large scale SVD applications
   D.C. Sorensen   21

2. Isospectral matrix flows for numerical analysis
   U. Helmke   33

3. The Riemannian singular value decomposition
   B.L.R. De Moor   61

4. Consistent signal reconstruction and convex coding
   N.T. Thao, M. Vetterli   79

PART 2. ALGORITHMS AND THEORETICAL CONCEPTS

5. The orthogonal qd-algorithm
   U. von Matt   99

6. Accurate singular value computation with the Jacobi method
   Z. Drmač   107
7. Note on the accuracy of the eigensolution of matrices generated by finite elements
   Z. Drmač, K. Veselić   115

8. Transpose-free Arnoldi iterations for approximating extremal singular values and vectors
   M.W. Berry, S. Varadhan   123

9. A Lanczos algorithm for computing the largest quotient singular values in regularization problems
   P.C. Hansen, M. Hanke   131

10. A QR-like SVD algorithm for a product/quotient of several matrices
   G.H. Golub, K. Sølna, P. Van Dooren   139

11. Approximating the PSVD and QSVD
   S. Qiao   149

12. Bounds on singular values revealed by QR factorizations
   C.-T. Pan, P.T.P. Tang   157

13. A stable algorithm for downdating the ULV decomposition
   J.L. Barlow, H. Zha, P.A. Yoon   167

14. The importance of a good condition estimator in the URV and ULV algorithms
   R.D. Fierro, J.R. Bunch   175

15. L-ULV(A), a low-rank revealing ULV algorithm
   R.D. Fierro, P.C. Hansen   183

16. Fast algorithms for signal subspace fitting with Toeplitz matrices and applications to exponential data modeling
   S. Van Huffel, P. Lemmerling, L. Vanhamme   191

17. A block Toeplitz look-ahead Schur algorithm
   K. Gallivan, S. Thirumalai, P. Van Dooren   199

18. The set of 2-by-3 matrix pencils - Kronecker structures and their transitions under perturbations - And versal deformations of matrix pencils
   B. Kågström   207

19. J-Unitary matrices for algebraic approximation and interpolation - The singular case
   P. Dewilde   209
PART 3. ARCHITECTURES AND REAL TIME IMPLEMENTATION

20. Sphericalized SVD Updating for Subspace Tracking
   E.M. Dowling, R.D. DeGroat, D.A. Linebarger, H. Ye   227

21. Real-time architectures for sphericalized SVD updating
   E.M. Dowling, R.D. DeGroat, D.A. Linebarger, Z. Fu   235

22. Systolic Arrays for SVD Downdating
   F. Lorenzelli, K. Yao   243

23. Subspace separation by discretizations of double bracket flows
   K. Hüper, J. Götze, S. Paul   251

24. A continuous time approach to the analysis and design of parallel algorithms for subspace tracking
   J. Dehaene, M. Moonen, J. Vandewalle   259

25. Stable Jacobi SVD updating by factorization of the orthogonal matrix
   F. Vanpoucke, M. Moonen, E.F. Deprettere   267

26. Transformational reasoning on time-adaptive Jacobi type algorithms
   H.W. van Dijk, E.F. Deprettere   277

27. Adaptive direction-of-arrival estimation based on rank and subspace tracking
   B. Yang, F. Gersemsky   287

28. Multiple subspace ULV algorithm and LMS tracking
   S. Hosur, A.H. Tewfik, D. Boley   295
PART 4. APPLICATIONS

29. SVD-based analysis of image boundary distortion
   F.T. Luk, D. Vandevoorde   305

30. The SVD in image restoration
   D.P. O'Leary   315

31. Two dimensional zero error modeling for image compression
   J. Skowronski, I. Dologlou   323

32. Robust image processing for remote sensing data
   L.P. Ammann   333

33. SVD for linear inverse problems
   M. Bertero, C. De Mol   341

34. Fitting of circles and ellipses, least squares solution
   W. Gander, R. Strebel, G.H. Golub   349

35. The use of SVD for the study of multivariate noise and vibration problems
   D. Otte   357

36. On applications of SVD and SEVD for NURBS identification
   W. Ma, J.P. Kruth   367

37. A tetradic decomposition of 4th-order tensors: Application to the source separation problem
   J.-F. Cardoso   375

38. The application of higher order singular value decomposition to independent component analysis
   L. De Lathauwer, B. De Moor, J. Vandewalle   383

39. Bandpass filtering for the HTLS estimation algorithm: design, evaluation and SVD analysis
   H. Chen, S. Van Huffel, J. Vandewalle   391
40. Structure preserving total least squares method and its application to parameter estimation
   H. Park, J.B. Rosen, S. Van Huffel   399

41. Parameter estimation and order determination in the low-rank linear statistical model
   R.J. Vaccaro, D.W. Tufts, A.A. Shah   407

42. Adaptive detection using low rank approximation to a data matrix
   I.P. Kirsteins, D.W. Tufts   415

43. Realization of discrete-time periodic systems from input-output data
   E.I. Verriest, J.A. Kullstam   423

44. Canonical correlation analysis of the deterministic realization problem
   J.A. Ramos, E.I. Verriest   433

45. An updating algorithm for on-line MIMO system identification
   M. Stewart, P. Van Dooren   441

46. Subspace techniques in blind mobile radio channel identification and equalization using fractional spacing and/or multiple antennas
   D.T.M. Slock   449

47. Reduction of general broad-band noise in speech by truncated QSVD: implementation aspects
   S.H. Jensen, P.C. Hansen, S.D. Hansen, J.A. Sorensen   459

48. SVD-based modelling of medical NMR signals
   R. de Beer, D. van Ormondt, F.T.A.W. Wayer, S. Cavassila, D. Graveron-Demilly, S. Van Huffel   467

49. Inversion of bremsstrahlung spectra emitted by solar plasma
   M. Piana   475

Authors index   485
SVD AND SIGNAL PROCESSING, III Algorithms, Architectures and Applications M. Moonen and B. De Moor (Editors) © 1995 Elsevier Science B.V. All rights reserved.
A SHORT INTRODUCTION TO BELTRAMI'S PAPER
Over the last 30 years the singular value decomposition has become a popular numerical tool in statistical data analysis, signal processing, system identification and control system analysis and design. This book certainly is one of the convincing illustrations of this claim. It may come as a surprise that its existence was already established more than 120 years ago, in 1873, by the Italian geometer Beltrami. This is only 20 years after the conception of a matrix as a multiple quantity by Cayley in his 'Memoir on Matrices' of 1858. Says Lanczos in [8, p.100]: "The concept of a matrix has become so universal in the meantime that we often forget its great philosophical significance. To call an array of letters by the letter A was much more than a matter of notation. It had the significance that we are no longer interested in the numerical values of the coefficients a_ij. The matrix A was thus divested of its arithmetic significance and became an algebraic operator." And Sylvester himself says: "The idea of subjecting matrices to the additive process and of their consequent amenability to the laws of functional operation was not taken from it (Cayley's 'Memoir') but occurred to me independently before I had seen the memoir or was acquainted with its contents." Approximately twenty years thereafter, there was a sequence of papers on the singular value decomposition, starting with the one by Beltrami [3] that is translated here. Others were by Jordan [7], Sylvester [11] [12] [13] (who wrote some of his papers in French and called the singular values les multiplicateurs canoniques), and by Autonne [1] [2]. After that, the singular value decomposition was rediscovered a couple of times and received more and more attention. For the further history and a more detailed explanation of these historical papers, we refer to the paper by Stewart [9] and a collection of historical references in the book by Horn and Johnson [6, Chapter 3]. We would like to thank Prof. Dan Boley of the University of Minnesota, Minneapolis, U.S.A., for his translation of the Beltrami paper.
References

[1] Autonne L. Sur les matrices hypohermitiennes et les unitaires. Comptes Rendus de l'Académie des Sciences, Paris, Vol. 156, 1913, pp. 858-860.
[2] Autonne L. Sur les matrices hypohermitiennes et sur les matrices unitaires. Ann. Univ. Lyon, Nouvelle Série I, Fasc. 38, 1915, p. 177.
[3] Beltrami E. Sulle Funzioni Bilineari. Giornale di Matematiche, Battaglini G., Fergola E. (editors), Vol. 11, 1873, pp. 98-106.
[4] Cayley A. Trans. London Phil. Soc., Vol. 148, 1858, pp. 17-37. Collected works: Vol. II, pp. 475-496.
[5] Dieudonné J. Oeuvres de Camille Jordan. Gauthier-Villars & Cie, Editeur-Imprimeur-Libraire, Paris, 1961, Tome I-II-III.
[6] Horn R.A., Johnson C.R. Topics in Matrix Analysis. Cambridge University Press, 1991.
[7] Jordan C. Mémoire sur les formes bilinéaires. J. Math. Pures Appl. II, Vol. 19, 1874, pp. 35-54 (see reprint in [5, Vol. III, pp. 23-42]).
[8] Lanczos C. Linear Differential Operators. D. Van Nostrand Company Limited, London, 1961.
[9] Stewart G.W. On the early history of the singular value decomposition. SIAM Review, Vol. 35, No. 4, December 1993, pp. 551-566.
[10] Sylvester J.J. C.R. Acad. Sci. Paris, Vol. 108, 1889, pp. 651-653; Mess. of Math., Vol. 19, 1890, pp. 42-46.
[11] Sylvester J.J. Sur la réduction biorthogonale d'une forme linéo-linéaire à sa forme canonique. Comptes Rendus, CVIII, 1889, pp. 651-653 (see also [10, p. 638]).
[12] Sylvester J.J. A new proof that a general quadric may be reduced to its canonical form (that is, a linear function of squares) by means of a real orthogonal substitution. Messenger of Mathematics, XIX, 1890, pp. 1-5 (see also [10, p. 650]).
[13] Sylvester J.J. On the reduction of a bilinear quantic of the nth order to the form of a sum of n products by a double orthogonal substitution. Messenger of Mathematics, XIX, 1890, pp. 42-46 (see also [10, p. 654]).
SVD AND SIGNAL PROCESSING, III Algorithms, Architectures and Applications M. Moonen and B. De Moor (Editors) 1995 Elsevier Science B.V.
ON BILINEAR FUNCTIONS
(SULLE FUNZIONI BILINEARI)
by E. BELTRAMI

English Translation: Copyright © 1990 Daniel Boley
A Translation from the Original Italian of one of the Earliest Published Discussions of the Singular Value Decomposition.
Eugenio Beltrami (Cremona, 1835 - Rome, 1900) was professor of Physics and Mathematics at the University of Pavia from 1876 and was named a senator in 1899. His writings were on the geometry of bilinear forms, on the foundations of geometry, on the theory of elasticity, electricity, and hydrodynamics, on the kinematics of fluids, on the attractions of ellipsoids, on potential functions, and even some in experimental physics.
GIORNALE DI MATEMATICHE
AD USO DEGLI STUDENTI DELLE UNIVERSITÀ ITALIANE
PUBBLICATO PER CURA DEI PROFESSORI G. BATTAGLINI, E. FERGOLA
IN UNIONE DEI PROFESSORI E. D'OVIDIO, G. TORELLI e C. SARDI

Volume XI. - 1873

NAPOLI
BENEDETTO PELLERANO EDITORE
LIBRERIA SCIENTIFICA E INDUSTRIALE
Strada di Chiaia, 60
JOURNAL OF MATHEMATICS
FOR USE OF THE STUDENTS OF THE ITALIAN UNIVERSITIES
PUBLISHED UNDER THE DIRECTION OF THE PROFESSORS G. BATTAGLINI, E. FERGOLA
TOGETHER WITH THE PROFESSORS E. D'OVIDIO, G. TORELLI and C. SARDI

Volume XI. - 1873

NAPLES
BENEDETTO PELLERANO PUBLISHER
SCIENTIFIC AND INDUSTRIAL BOOKSTORE
60 Chiaia Road

English Translation: Copyright © 1990 Daniel Boley
SVD AND SIGNAL PROCESSING, III Algorithms, Architectures and Applications M. Moonen and B. De Moor (Editors) © 1995 Elsevier Science B.V. All rights reserved.
ON BILINEAR FUNCTIONS
by E. BELTRAMI

The theory of bilinear functions, already the subject of subtle and advanced research on the part of the eminent geometers KRONECKER and CHRISTOFFEL (Journal of BORCHARDT, t. 68), gives occasion for elegant and simple problems, if one removes from it the restriction, almost always assumed until now, that the two series of variables be subjected to identical substitutions, or to inverse substitutions. I think it not entirely unuseful to treat briefly a few of these problems, in order to encourage the young readers of this Journal to become familiar ever more with these algebraic processes that form the fundamental subject matter of the new analytic geometry, and without which this most beautiful branch of mathematical science would remain confined within a symbolic geometry, which for a long time the perspicuity and the power of pure synthesis has dominated.

Let f = Σ_rs c_rs x_r y_s be a bilinear function formed with the two groups of independent variables x_1, x_2, ... x_n; y_1, y_2, ... y_n. Transforming these variables simultaneously with two distinct linear substitutions

(1)    x_r = Σ_p a_rp ξ_p ,    y_s = Σ_q b_sq η_q ,

(whose determinants one supposes to be always different from zero) one obtains a transformed form φ = Σ_pq γ_pq ξ_p η_q, whose coefficients γ_pq are related to the coefficients c_rs of the original function by the n² equations that have the following form:

(2)    γ_pq = Σ_rs c_rs a_rp b_sq .

Setting for brevity

Σ_m c_mr a_ms = h_rs ,    Σ_m c_rm b_ms = k_rs ,

this typical equation can be written in the two equivalent forms

(3)    Σ_s h_sp b_sq = γ_pq ,    Σ_r k_rq a_rp = γ_pq .
Indicating with A, B, C, H, K, Γ the determinants formed respectively with the elements a, b, c, h, k, γ, one has, from these last equations, Γ = HB = KA. But, by the definition of the quantities h, k, one has as well H = CA, K = CB; hence Γ = ABC, that is, the determinant of the transformed function is equal to that of the original one multiplied by the product of the moduli of the two substitutions.

Let us suppose initially that the linear substitutions (1) are both orthogonal. In such a case, their 2n² coefficients depend, as is known, on n² − n independent parameters, and on the other hand, the transformed function φ can be, generally speaking, subjected to as many conditions. Now the coefficients γ_pq whose indices p, q are mutually unequal are exactly n² − n in number: one can therefore seek if it is possible to annihilate all these coefficients, and to reduce the bilinear function f to the canonical form

φ = γ_1 ξ_1 η_1 + γ_2 ξ_2 η_2 + ... + γ_n ξ_n η_n .

To resolve this question, it suffices to observe that if, after having set in equations (3)

γ_pq = 0 for p ≠ q and γ_pp = γ_p ,

one multiplies the first by b_rq and one carries out on the result the summation Σ_q; then one multiplies the second by a_sp and one carries out the summation Σ_p, one obtains

h_rp = γ_p b_rp ,    k_sq = γ_q a_sq .

These two typical equations are mutually equivalent to the corresponding ones of equations (3), and, as a consequence of these latter ones, one could in this way recover the equations (3) in the process. In them is contained the entire resolution of the problem posed, that one obtains in this way: Writing out the last two equations in the following way

(4)    c_1r a_1s + c_2r a_2s + ... + c_nr a_ns = γ_s b_rs ,
       c_r1 b_1s + c_r2 b_2s + ... + c_rn b_ns = γ_s a_rs ,

then setting, for brevity,

c_r1 c_s1 + c_r2 c_s2 + ... + c_rn c_sn = μ_rs ,
c_1r c_1s + c_2r c_2s + ... + c_nr c_ns = ν_rs ,

(so that μ_rs = μ_sr, ν_rs = ν_sr), the substitution into the second equation (4) of the values of the quantities b obtained from the first one yields

(5)_1    μ_r1 a_1s + μ_r2 a_2s + ... + μ_rn a_ns = γ_s² a_rs ;
likewise, the substitution into the first equation (4) of the values of the quantities a recovered from the second one yields

(5)_2    ν_r1 b_1s + ν_r2 b_2s + ... + ν_rn b_ns = γ_s² b_rs .

The elimination of the quantities a from the n equations that one deduces from equations (5)_1 setting r = 1, 2, ... n in succession, leads one to the equation

         | μ_11 − γ²   μ_12        ...   μ_1n       |
Δ_1  =   | μ_21        μ_22 − γ²   ...   μ_2n       |  =  0 ,
         | ........................................ |
         | μ_n1        μ_n2        ...   μ_nn − γ²  |

which the n values γ_1², γ_2², ... γ_n² of γ² must satisfy. Likewise, the elimination of the quantities b from the n equations that one deduces from equation (5)_2 setting r = 1, 2, ... n in succession, leads one to the equation

         | ν_11 − γ²   ν_12        ...   ν_1n       |
Δ_2  =   | ν_21        ν_22 − γ²   ...   ν_2n       |  =  0 ,
         | ........................................ |
         | ν_n1        ν_n2        ...   ν_nn − γ²  |

possessing the same properties as the preceding equation. It follows that the two determinants Δ_1 and Δ_2 are mutually identical for any value of γ (*). In fact, they are entire functions of degree n with respect to γ², which become identical for n + 1 values of γ², that is for the values γ_1², γ_2², ... γ_n² that simultaneously make both determinants zero, and for the value γ = 0 that makes them both equal to C². The n roots γ_1², γ_2², ... γ_n² of the equation Δ = 0 (likewise indicating indifferently Δ_1 = 0 or Δ_2 = 0) are all real, by virtue of a very well known theorem; to convince oneself that they are also positive, it suffices to observe that the coefficients of γ⁰, −γ², γ⁴, −γ⁶, etc. are sums of squares. But one can also consider that, by the elementary theory of ordinary quadratic forms and by virtue of the preceding equations, one has

F = Σ_rs μ_rs x_r x_s = γ_1² ξ_1² + γ_2² ξ_2² + ... + γ_n² ξ_n² ,
G = Σ_rs ν_rs y_r y_s = γ_1² η_1² + γ_2² η_2² + ... + γ_n² η_n² ;

on the other hand, one has as well

F = Σ_m (c_1m x_1 + c_2m x_2 + ... + c_nm x_n)² ,
G = Σ_m (c_m1 y_1 + c_m2 y_2 + ... + c_mn y_n)² ;

hence the two quadratic functions F and G are essentially positive, and the coefficients γ_1², γ_2², ... γ_n² of the transformed expressions, i.e. the roots of the equation Δ = 0, are necessarily all positive.

(*) This theorem one finds demonstrated in a different way in § of the Determinants by BRIOSCHI.
The proposed problem is therefore susceptible of a real solution, and here is the procedure: Find first the roots γ_1², γ_2², ... γ_n² of the equation Δ = 0 (which is equivalent to reducing one or the other of the quadratic functions F, G to the canonical form); then with the help of the equations of the form (5)_1 and of those of the form

a_1s² + a_2s² + ... + a_ns² = 1 ,

one determines the coefficients a of the first substitution (coefficients that admit an ambiguity of sign common to all those items in the same column). This done, the equations that have the form (4) supply the values of the coefficients b of the second substitution (coefficients that also admit an ambiguity of sign common to all those items in the same column, so that each of the quantities γ_s is determined only by its square γ_s²). Having done all these operations, one has two orthogonal substitutions that yield, exactly as desired by the problem, the identity

Σ_rs c_rs x_r y_s = Σ_m γ_m ξ_m η_m ,

in which every one of the coefficients γ_m must be taken with the same sign that it is assigned in the calculation of the coefficients b. It is worth observing that the quadratic functions denominated F and G can be derived from the bilinear function f setting in the latter on the one hand

y_s = c_1s x_1 + c_2s x_2 + ... + c_ns x_n ,

and on the other

x_r = c_r1 y_1 + c_r2 y_2 + ... + c_rn y_n .

Now if in these two relations one applies the substitutions (1), one then sees immediately that they are respectively converted into the following relations in the new variables ξ and η:

η_m = γ_m ξ_m ,    ξ_m = γ_m η_m ,

which transform the canonical bilinear function

γ_1 ξ_1 η_1 + γ_2 ξ_2 η_2 + ... + γ_n ξ_n η_n

into the respective quadratic functions

γ_1² ξ_1² + γ_2² ξ_2² + ... + γ_n² ξ_n² ,    γ_1² η_1² + γ_2² η_2² + ... + γ_n² η_n² .

And, in fact, we have already noted that these two last functions are equivalent to the quadratics F, G. We ask of what form must be the bilinear function f so that the two orthogonal substitutions that reduce it to the canonical form turn out substantially mutually identical. To this end, we observe that setting

b_1s = ±a_1s ,    b_2s = ±a_2s ,    ...    b_ns = ±a_ns ,
the equations (4) are converted to the following:

c_1r a_1s + c_2r a_2s + ... + c_nr a_ns = ±γ_s a_rs ,
c_r1 a_1s + c_r2 a_2s + ... + c_rn a_ns = ±γ_s a_rs ,

which, since they must hold for every value of r and s, give c_rs = c_sr. Reciprocally this hypothesis implies the equivalence of the two linear substitutions. So every bilinear form of the desired form is associated harmonically with an ordinary quadratic form: that is to say that designating this quadratic form with

θ = Σ_rs c_rs x_r x_s ,

the bilinear function is

f = (1/2) Σ_s (dθ/dx_s) y_s .

In this case, the equation Δ = 0 can be decomposed in this way:

      | c_11 − γ   c_12      ...   c_1n     |     | c_11 + γ   c_12      ...   c_1n     |
Δ  =  | c_21       c_22 − γ  ...   c_2n     |  ×  | c_21       c_22 + γ  ...   c_2n     |  .
      | ................................... |     | ................................... |
      | c_n1       c_n2      ...   c_nn − γ |     | c_n1       c_n2      ...   c_nn + γ |

The first factor of the second member, set to zero, gives the well known equation that serves to reduce the function θ to its canonical form. If the two substitutions are absolutely mutually identical, the coefficients γ are the same for the quadratic function and for the bilinear one. But if, as we have already supposed, one concedes the possibility of an opposition of sign between the coefficients a and b belonging to two columns of equal index, the coefficient γ_s of the corresponding index in the bilinear function can have sign opposite to that in the quadratic. From this, the presence in the equation Δ = 0 of a factor having for roots the quantities γ taken negatively. In the particular case just now considered, the quadratic function denoted F is

F = (1/4) Σ_s (dθ/dx_s)² ,

and, on the other hand, one finds that there always exists an orthogonal substitution which makes simultaneously identical the two equations

θ = Σ_m γ_m ξ_m² ,    F = Σ_m γ_m² ξ_m² .

This is a consequence of the fact that, as is well known,

Σ_s (dθ/dx_s)²

is a symbol invariant with respect to any orthogonal substitution.
We translate into geometrical language the results of the preceding analysis, assuming (as is generally useful to do) that the signs of the coefficients 7 are chosen to make AB = 1 and hence F = C. Let Sn, S~ be two spaces of n dimensions with null curvature, referred to, respectively, by the two systems of orthogonal linear coordinates z and y, for which we will call O and O ~ the origins. To a straight line $1 drawn through the origin 0 in the space Sn, there corresponds a specific set of ratios zz : x2 : ... : x,~; and on the other hand, the equation f = 0, homogeneous and of first degree in the xl, x2, ..., zn and in the Yl, Y2, ..., yn defines a correlation of figures in which to each line through the point O in the space Sn corresponds a locus of first order in n - 1 dimensions that we call S ~ n--1 within the space S~; and vice-versa. By virtue of the demonstrated theorem, it is always possible to substitute for the original coordinate axes in the z's and y's new axes in the ~'s and r/'s, respectively, with the same origins O and O ~, so that the correlation rule assumes the simpler form ~1~IT}1 "~" ~2~2~2 "~" ... "~- "~n~nT}r~ : 0.
Said this, one may think of the axis system r/, together with its figure, moved so that its origin O ~ falls on O, and that each axis ~r falls on its homologous axis ~r (*). In such a hypothesis, the last equation expresses evidently that the two figures are found to be in polar or involutory correlation with respect to the quadric cone (in n - 1 dimensions)
γ_1 ξ_1² + γ_2 ξ_2² + ... + γ_n ξ_n² = 0
that has its vertex on O. Hence, one can always convert a correlation of first degree of the above type, through a motion of one of the figures, into a polar or involutory correlation with respect to a quadric cone (in n - 1 dimensions) having its vertex on the common center of the two figures overlaid. In the case of n = 2, this general proposition yields the very well known theorem that two homographic bundles of rays can always be overlaid in such a way that they constitute a quadratic involution of rays. In the case that n = 3, one has the theorem, also known, that two correlative stars (i.e., such that to every ray in one corresponds a plane in the other, and vice-versa) can always be overlaid in such a way that they become reciprocal polar with respect to a quadric cone having its vertex on the common center. One can interpret the analytic theorem in another way, and recover other geometric properties in the cases n = 2, and n = 3. If with YlYx + Y2Y2 -[- ... + YnYn "b 1 = 0 one represents a locus of first order (S~_l) lying in the space S~, to every orthogonal substitution of the form Ys -"
y_s = Σ_q b_sq η_q
(*) Something that is possible by having AB = 1.
applied on the local coordinates y, corresponds an identical orthogonal substitution Y_s = Σ_q b_sq E_q applied to the tangential coordinates Y. Given this, suppose that between the x's and y's one institutes the n − 1 relations that result from setting equal the n ratios

y_r / (c_1r x_1 + c_2r x_2 + ... + c_nr x_n)        (r = 1, 2, ... n).
This is equivalent to considering two homographic stars with centers on the points 0 and 0 ~. With such hypotheses, the equation ylY1 + y2Y2 + ... + ynY,~ = 0, which corresponds to this other equation in the ~?-axis system ~?IE1 + rl2E: + ... + ~?~E,~ = 0, is equivalent to the following relation between the x's and the y's ~rsCrsXrYs = O,
and this is in turn reducible, with two simultaneous orthogonal substitutions, to the canonical form γ_1 ξ_1 E_1 + γ_2 ξ_2 E_2 + ... + γ_n ξ_n E_n = 0. Now the relation that this establishes between the new tangential coordinates E cannot differ from that contained in the third to last equation; hence it must be

η_1 / (γ_1 ξ_1) = η_2 / (γ_2 ξ_2) = ... = η_n / (γ_n ξ_n) .
From these equations, which are nothing else than the relations of homography, expressed in the new coordinates ~ and ~?, it emerges evidently that the axis ~1 and r/l, ~2 and ~72, ... ~n and r/n are pairs of corresponding straight lines in the two stars. Since it is thus possible to move one of the stars in such a way that the ~ axes and the ~7 axes coincide one for one, one concludes that, given two homographic stars in an n dimensional space, one can always overlay one upon the other so that the n double rays acted upon by the overlaying, constitute a system of n orthogonal cartesian rays. From this, setting n = 2 and n = 3, one deduces that two homographic groups of rays can always be overlaid so that the two double rays are orthogonal; and that the two homographic stars can always be overlaid so that the three double rays form an orthogonal cartesian triple. Returning now to the hypothesis of two arbitrary linear substitutions, but always acting to give to the transformed function ~ the canonical form r = EmTm~m~?m, one solve the equations (3) with respect to the quantities h and k, respectively, with which one finds the two typical equations, mutually equivalent, (6)
B h_rs = B_rs γ_s ,    A k_rs = A_rs γ_s ,
in which At,, B,s are the algebraic complements of the elements ara, bra in the respective determinants A, B. The equations (4) are nothing else than particularizations of these. Let ff be a second bilinear function, given by the expression f = Z~clrszrys,
and suppose we want to transform simultaneously, with the very same linear substitutions (1), the function f into the canonical form ~o and the function f into the canonical form = Z,~7,~,~y,~. Indicating with h ~ and k ~ quantities analogous to the h and k for the second function, together with the equations (6), these further equations must hold (6)'
B h'_rs = B_rs γ'_s ,    A k'_rs = A_rs γ'_s .
Dividing the first two equations of each of the pairs (6) and (6)' one by the other, one obtains h_rs − λ_s h'_rs = 0, where λ_s = γ_s / γ'_s, or

(7)    (c_1r − λ_s c'_1r) a_1s + (c_2r − λ_s c'_2r) a_2s + ... + (c_nr − λ_s c'_nr) a_ns = 0 .
Setting r = 1, 2, ... n in this last equation and eliminating the quantities a_1s, a_2s, ... a_ns from the n equations obtained in this way, one arrives at the equation

        | c_11 − λc'_11   c_12 − λc'_12   ...   c_1n − λc'_1n |
Θ  =    | c_21 − λc'_21   c_22 − λc'_22   ...   c_2n − λc'_2n |   =   0 ,
        | ................................................... |
        | c_n1 − λc'_n1   c_n2 − λc'_n2   ...   c_nn − λc'_nn |
which must be satisfied by the n values λ_1, λ_2, ... λ_n of λ. One can convince oneself of this in another way, by observing that Θ is the determinant of the bilinear function f − λf', and that hence one has, by virtue of the general theorem on the transformation of this determinant (*),

Θ = AB (γ_1 − λγ'_1)(γ_2 − λγ'_2) ... (γ_n − λγ'_n) ,

whence emerges, just as expected, for A and B different from zero, that the equation Θ = 0 has for its roots the n ratios

γ_1/γ'_1 ,    γ_2/γ'_2 ,    ...    γ_n/γ'_n .
For the rest, since the two series of quantities γ and γ' are not determined except by these ratios, it is useful to assume for more simplicity that γ'_1 = γ'_2 = ... = γ'_n = 1, and hence γ_s = λ_s.

(*) From this one sees that the determinant of a bilinear function is zero only when the function itself can be reduced to contain two fewer variables, a property that one can easily show directly.
Here then is the procedure that leads to the solution of the problem: Find the n roots (real or imaginary) A1, A2, ..., A,~ of the equation | = 0; then substitute them successively into the equations of the form (7). One obtains in this way, for each value of s, n linear and homogeneous equations, one of which is a consequence of the other n - 1, so that one can recover only the values of the ratios az, : a2s :... : arts. Having chosen arbitrarily the values of the quantities al,, a2s, ..., an, so that they have these mutual ratios, the equations that are of the form of the second part of (6) ~ yield (8)
I A I b,, - Czr z, + C2rA2s + ... + CIA
I Cnr An,.
and in this way all the unknown quantities can be determined. One can observe that if for the coefficients at, one substitutes the products paara, something that is legitimate by a previous observation, the coefficients bra, determined by the last equation, are converted into ~ . This is the same as saying that if for the variables ~, one substitutes p,~,, the variables ~?s are converted to ~ ; but this change leaves unaltered the transformed functions 7~ and 7~~. If one denotes by | the algebraic complement of the element cr, - Adr, in the determinant $, and by Sr,(A,) what results by setting A = A, in this complement, it is easy to see that the equations (7) are satisfied by setting a,., = c~,O,.,(,L).
In this way, if one forms the equations analogous to (7) and containing the coefficients b in place of the coefficients a, they can be satisfied by setting
b,, =/~,o,,(~,,). The quantities aa and ~a are not determined completely: but the equations analogous to (2) determine the product aa~a. In any case, these factors are not essential, since by writing ~a in place of aa~a and ~?, in place of ~,~7,, they can be removed. From the expressions a.. = O . . ( A . ) .
b. = O..(A.).
that result from this supposition, it is evident that this important property emerges, that ~ i.e. when the two bilinear functions are associated harmonically when Cr, = C,r, %, = C,r, to tWO quadratic functions, the substitutions that reduce them both simultaneously to canonical form are s u b s t a n t i a l l y m u t u a l l y identical. If we suppose that the second bilinear function ft has the form f l : XlYl + x2Y2 + ... + XnYn,
the general problem just treated assumes the following form: Reduce the bilinear function f to the canonical form
with two simultaneous linear substitutions, so that the function XlYl
+ x2Y2
q" ... q" Z n Y n
is transformed into itself. In this case, the general formula (8) becomes Abe, = A~,, whence Ay~
=
Arlr/1
q- A r 2 r / 2
+
... q-
Arnrh~,
and hence ~Ta = a l s Y l + a28Y2 + ... + anaYn.
This last formula shows that the two substitutions (a) and (b) are inverses of each other, something that follows necessarily from the nature of the function that is transformed into itself. The relations among the coefficients a and b that follow from this ~rarsbr, = 1, ~,arsbrs = 1 Y~rarsbr,, = 0, ~,arsbr,s = 0, AB=I make a perfect contrast to those that hold among the coefficients of one orthogonal substitution, and they reduce to them when ara = br,, such a case arising (by what we have recently seen) when the form f is associated to a quadratic form. In the special problem which we have mentioned (and which has already been treated by Mr. CHRISTOrFEL at the end of his Memoirs), the equation 0 = 0 takes on the form C l l - - )~
s
...
Cln
C21
C22 - - '~
""
C'2n
........,
, ....................
Cnl
Cn2
...
- - O~
Cnn - -
and under the hypothesis cr, = csr (that we have just alluded to), it is identified, as is natural, with what the analogous problem leads to via a true quadratic form. We will not add any word on the geometric interpretation of the preceding results, since their intimate connection with the whole theory of homogeneous coordinates is evident. Likewise, we will not discuss for now the transformations of bilinear functions into themselves, an important argument but less easy to handle than the preceding ones, for which Mr. CHItISTOFFEL has already presented the treatment in the case of a single function and a single substitution.
PART 1

KEYNOTE PAPERS
SVD AND SIGNAL PROCESSING, III Algorithms, Architectures and Applications M. Moonen and B. De Moor (Editors) © 1995 Elsevier Science B.V. All rights reserved.
IMPLICITLY RESTARTED ARNOLDI/LANCZOS METHODS AND LARGE SCALE SVD APPLICATIONS

D.C. SORENSEN
Department of Computational and Applied Mathematics
Rice University
P.O. Box 1892
Houston, TX 77251
sorensen@rice.edu

ABSTRACT. Implicit restarting is a technique for combining the implicitly shifted QR mechanism with a k-step Arnoldi or Lanczos factorization to obtain a truncated form of the implicitly shifted QR-iteration for eigenvalue problems. The software package ARPACK that is based upon this technique has been successfully used to solve large scale symmetric and nonsymmetric (generalized) eigenvalue problems arising from a variety of applications. The method only requires a pre-determined limited storage proportional to n times the desired number of eigenvalues. Numerical difficulties and storage problems normally associated with Arnoldi and Lanczos processes are avoided. This technique has also proven to be very useful for computing a few of the largest singular values and corresponding singular vectors of a very large matrix. Biological 3-D Image Reconstruction and Molecular Dynamics are two interesting applications. The SVD plays a key role in a classification procedure that is the computationally intensive portion of the 3-D image reconstruction of biological macromolecules. A relatively new application is to analyze the motions of proteins using the SVD instead of normal mode analysis. The primary research goal is to pick out the non-harmonic phenomena that usually contain most of the interesting behavior of the protein's motion. This paper reviews the Implicitly Restarted Arnoldi/Lanczos Method and briefly discusses the biological applications of the SVD.

KEYWORDS. Lanczos/Arnoldi methods, implicit restarting, image reconstruction, molecular dynamics, computational biology.
1   INTRODUCTION
Large scale eigenvalue problems arise in a variety of applications. In many of these, relatively few eigenvalues and corresponding eigenvectors are required. The Lanczos method
for symmetric problems and the Arnoldi method, which is a generalization to nonsymmetric problems, are two important techniques used to compute a few eigenvalues and corresponding vectors of a large matrix. These methods are computationally attractive since they only require matrix-vector products. They avoid the cost and storage of the dense factorizations required by the more traditional Q R-methods that are used on small dense problems. A number of well documented numerical difficulties are associated with the Lanczos process [11]. These include maintenance of numerically orthogonal basis vectors or related techniques to avoid the introduction of spurious approximations to eigenvalues; required storage of basis vectors on peripheral devices if eigenvectors are desired; no a-priori bound on the number of Lanczos steps (hence basis vectors) required before sufficiently accurate approximations to desired eigenvalues appear. Since the Arnoldi technique generalizes the Lanczos process, these difficulties as well as the additional difficulties associated with computing eigenvalues of nonsymmetric matrices are present [13]. The Arnoldi method does not lead to a three term recurrence in the basis vectors as does Lanczos. Hence considerable additional arithmetic and storage is associated with the more general technique. Implicit restarting is a technique for combining the implicitly shifted QtL mechanism with a k-step Arnoldi or Lanczos factorization to obtain a truncated form of the implicitly shifted Qtt-iteration for eigenvalue problems. The numerical difficulties and storage problems normally associated with Arnoldi and Lanczos processes are avoided. The technique has been developed into a high quality public domain software package ARPACK that has been useful in a variety of applications. This software has proven to be very effective for computing a few of the largest singular values and corresponding singular vectors of a very large matrix. Two applications of the SVD occur in Biological 3-D Image Reconstruction and in Molecular Dynamics. The SVD plays a key roll in a classification procedure that is the computationally intensive portion of the 3-D image reconstruction of biological macromolecules. A relatively new application is to analyze the motions of the proteins using the SVD instead of normal mode analysis. The primary research goal is to pick out the non-harmonic phenomena that usually contains most of the interesting behavior of the protein's motion. This paper will review the implicit restarting technique for eigenvalue calculations and will discuss several applications in computing the SVD and related problems.
2   SOFTWARE FOR LARGE SCALE EIGENANALYSIS
The Arnoldi process is a well known technique for approximating a few eigenvalues and corresponding eigenvectors of a general square matrix. A new variant of this process has been developed in [14] that employs an implicit restarting scheme. Implicit restarting may be viewed as a truncation of the standard implicitly shifted QR-iteration for dense problems. Numerical difficulties and storage problems normally associated with Arnoldi and Lanczos processes are avoided. The algorithm is capable of computing a few (k) eigenvalues with user specified features such as largest real part or largest magnitude using 2nk + O(k2)storage. No auxiliary storage is required. Schur basis vectors for the k-dimensional eigen-space are computed which are numerically orthogonal to working precision.
This method is well suited to the development of mathematical software. A software package ARPACK [10] based upon this algorithm has been developed. It has been designed to be efficient on a variety of high performance computers. Parallelism within the scheme is obtained primarily through the matrix-vector operations that comprise the majority of the work in the algorithm. The software is capable of solving large scale symmetric, nonsymmetric, and generalized eigenproblems from significant application areas. This software is designed to compute approximations to selected eigenvalues and eigenvectors of the generalized eigenproblem (1)
A x = λ M x ,
where both A and M are real n × n matrices. It is assumed that M is symmetric and positive semi-definite but A may be either symmetric or nonsymmetric. Arnoldi's method is a Krylov subspace projection method which obtains approximations to eigenvalues and corresponding eigenvectors of a large matrix A by constructing the orthogonal projection of this matrix onto the Krylov subspace Span{v, Av, ..., A^{k-1}v}. The Arnoldi process begins with the specification of a starting vector v and in k steps produces the decomposition of an n × n matrix A into the form

(2)    A V = V H + f e_k^T ,

where v is the first column of the matrix V ∈ R^{n×k}, V^T V = I_k; H ∈ R^{k×k} is upper Hessenberg, f ∈ R^n with 0 = V^T f, and e_k ∈ R^k the k-th standard basis vector. The vector f is called the residual. This factorization may be advanced one step at the cost of a (sparse) matrix-vector product involving A and two dense matrix-vector products involving V^T and V. The explicit steps are:
1. β = ||f||;  v ← f/β;  V ← (V, v);  H ← ( H ; β e_k^T ).
2. z ← A v;
3. h ← V^T z;  f ← z − V h;
4. H ← (H, h).
D. C. Sorensen
24
Then x, 0 is an approximate eigenpair for A with
liAr, - xS[I = []f[lleTy[, and this provides a means for estimating the quality of the approximation. The information obtained through this process is completely determined by the choice of the starting vector. Eigen-information of interest may not appear until k gets very large. In this case it becomes intractable to maintain numerical orthogonality of the basis vectors V and it also will require extensive storage. Failure to maintain orthogonality leads to a number of numerical difficulties. Implicit restarting provides a means to extract interesting information from very large Krylov subspaces while avoiding the storage and numerical difficulties associated with the standard approach. It does this by continually compressing the interesting information into a fixed size k dimensional subspace. This is accomplished through the implicitly shifted QR mechanism. An Arnoldi factorization of length k + p is compressed to a factorization of length k by applying p implicit shifts resulting in
AVk++p = Yk+pHk+ + + p + fk+pek+pQ T ,
(3)
where Vk~p = Vk+pQ, Hk+ p+ = QTHk+pQ, and Q = Q1Q2"" "Qp, with Qj the orthogonal matrix associated with the shift #j. It may be shown that the first k - 1 entries of the T vector ek+pQ are zero. Equating the first k columns on both sides yields an updated k - s t e p Arnoldi factorization
Av: = V:H+
+
(4)
with an updated residual of the form f+ = Vk+_4.pek+l~k+ fk+po'. Using this as a starting point it is possible to use p additional steps of the Arnoldi process to return to the original form. Each of these applications implicitly applies a polynomial in A of degree p to the starting vector. The roots of this polynomial are the shifts used in the Q R process and these may be selected to filter unwanted information from the starting vector and hence from the Arnoldi factorization. Full details may be found in [14]. The software ARPACK that is based upon this mechanism provides several features which are not present in other (single vector) codes to our knowledge 9 Reverse communication interface 9 Ability to return k eigenvalues which satisfy a user specified criterion such as largest real part, largest absolute value, largest algebraic value (symmetric case), etc. 9 A fixed pre-determined storage requirement suffices throughout the computation. Usually this is n , O ( 2 k ) + O ( k 2) where k is the number of eigenvalues to be computed and n is the order of the matrix. No auxiliary storage or interaction with such devices is required during the course of the computation. 9 Eigenvectors may be computed on request. The Arnoldi basis of dimension k is always computed. The Arnoldi basis consists of vectors which are numerically orthogonal to working accuracy. 9 Accuracy: The numerical accuracy of the computed eigenvalues and vectors is user specified and may be set to the level of working precision. At working precision,
Implicitly Restarted A r n o l d i / L a n c z o s Methods
25
the accuracy of the computed eigenvalues and vectors is consistent with the accuracy expected of a dense method such as the implicitly shifted QR iteration. 9 Multiple eigenvalues offer no theoretical or computational difficulty other than additional matrix vector products required to expose the multiple instances. This cost is commensurate with the cost of a block version of appropriate blocksize. 3
COMPUTING
THE SVD WITH ARPACK
In later sections we shall discuss large scale applications of the Singular Value Decomposition (SVD). To avoid confusion with the basis vectors V computed by the Arnoldi factorization, we shall denote the singular value decomposition of a real m x n matrix M by M = U SW T
and we consider only the short form with U E R mx'~ , W E l:t"~x'~ and S = d i a g ( a l , a2, ..., an) with o'j ~_ aj+l. We assume m > n and note that u T u = w T w = In. The applications we have in mind focus upon computing the largest few singular values and corresponding singular vectors of M. In this case (with m > n ) it is advantageous to compute the desired right singular vectors W directly using ARPACK z ~ Av
where A = M T M
If Wk is the matrix comprised of the first k columns of W and Sk is the corresponding leading principal submatrix of S then A W k -- W k ( S k 2)
The corresponding left singular vectors are obtained through the relation
vk = M W k S [ ~. Although this mechanism would ordinarily be numerically suspect, it appears to be safe in our applications where the singular values of interest are large and well separated. Finally, it should be noted that the primary feature that would distinguish the implicit restarting method from other Lanczos style iterative approaches [1] to computing the SVD is the limited storage and the ability to focus upon the desired singular values. An important feature of this arrangement is that space on the order of 2 n k is needed to compute the desired singular values and corresponding right singular vectors and then only m k storage locations are needed to compute the corresponding left singular vectors. The detail of the matrix vector product z ~ A v calculation is also important. The matrix M can be partitioned into blocks M T = (MT, M T, ..., M T) and then matrix-vector product z ~ A v may be computed in the form
26
D. C. Sorensen
9 for j = 1, 2, ..., b, 1. z ~ z + M T ( M j v ) ;
end. Moreover, on a distributed memory parallel machine or on a network or cluster of workstations these blocks may be physically distributed to the local memories and the memory traffic associated with reading the matrix M every time a matrix vector product is required could be completely avoided. 4
LARGE SCALE APPLICATIONS
OF THE SVD
In this section we discuss two important large scale applications of the SVD. The first of these is the 3-D image reconstruction of biological macromolecules from 2-D projections obtained through electron micrographs. The second is an application to molecular dynamical simulation of the motions of proteins. The SVD may be used to compress the data required to represent the simulation and more importantly to provide an analytical tool to help in understanding the function of the protean. 4.1
ELECTRON
MICROGRAPHS
We are currently working with Dr. Wah Chiu and his colleagues at the Baylor College of Medicine to increase the efficiency of their 3-dimensional image reconstruction software. Their project involves the identification and classification of 2-dimensional electron micrograph images of biological macromolecules and the subsequent generation of the corresponding high resolution 3-D images. The underlying algorithm [16] is based upon the statistical technique of principal component analysis [17]. In this algorithm, a singular value decomposition (SVD) of the data set is performed to extract the largest singular vectors which are then used in a classification procedure. Our initial effort has been to replace the existing algorithm for computing the SVD with ARPACK which has increased the speed of the analysis by a factor of 7 on an Iris workstation. The accuracy of the results were also increased dramatically. Details are reported in [6]. The next phase of this project will introduce the parallel aspects discussed previously. There are typically several thousand particle images in a working set, and in theory, each one characterizes a different viewing angle of the molecule. Once each image is aligned and processed to eliminate as much noise as possible the viewing angle must be determined. The set is then Fourier transformed, and each transformed image is added to a Fourier composite in a position appropriate to the this angle. The composite is then transformed back to yield a high resolution 3-D image. If we consider each 2-D image to be a vector in N dimensional space, where N is the number of pixels, we know that we can describe each image as a weighted sum of the singular vectors of the space defined by the set. The distribution of the images in this N dimensional coordinate system is then used to break up the set into classes of viewing angles, and the images in a class are averaged. Determining the left singular vectors of
Implicitly Restarted Arnoldi/Lanczos Methods
27
the space of images is presently the computational bottleneck of the process. As is true in many applications, the classification can be accomplished with relatively few of the singular vectors corresponding to the largest singular values. In a test example (using real data) the largest 24 singular triplets uiaiw T were extracted from a data set of size 4096x1200, corresponding to 1200 64x64 images. The decomposition was done with an existing software package as well as with a newer algorithm with the hopes that the computational bottleneck would be resolved. The existing package ran with 24 iterations of its method in roughly 29 minutes of CPU time. The new routines performed 7 iterations in just under 4 minutes, an improvement in speed by a factor of 7. In this test, the data set fits into memory. The ultimate goal is to process data sets consisting of 100,000 images each 128x128 pixels and this will require 6.4Gbytes of storage. I/O will then be a severe limitation unless a parallel I/O mechanism as described above can be implemented.
4.2
MOLECULAR DYNAMICS
Knowledge of the motions of proteins is important to understanding their function. Proteins are not rigid static structures, but exist as a complex dynamical ensemble of closely related substates[7]. It is the dynamics of the system that permit transitions between these conformational substates and there is evidence that this dynamic behavior is also responsible for the kinetics of ligand entry and ligand binding in systems such as myoglobin. These transitions are believed to involve anharmonic and collective motions within the protein. X-ray crystallography is able to provide high-resolution maps of the average structure of a protein, but this structure only represents a static conformation. Molecular dynamics, which solves the Newtonian equations of motion for each atom, is not subject to such approximations and can be used to model the range of motions available to the protein on short (nanosecond) time-scales. Time-averaged refinement extends the molecular dynamics concept, restraining the simulation by requiring the time-averaged structure to fit the observed X-ray data[2]. With this methodology, an ensemble of structures that together are consistent with the dynamics force-fields and the observed data is generated. This ensemble represents a range of conformations that are accessible to proteins in the crystalline state. One difficulty with time-averaged refinements is the plethora of generated conformations. No longer is one structure given as the model, rather, thousands of structures collectively describe the model. Although it is possible to analyze this data directly by manually examining movies of each residue in the simulation until some pattern emerges, this is not a feasible approach. Nor is it feasible to watch an animation of an entire protein and discern the more subtle global conformation states that may exist. The SVD can provide am important tool for both compressing the data required to represent the simulation and for locating interesting functional behavior. In order to employ the SVD we construct a matrix C from the simulation data. A column of C represents the displacement of the atoms from their mean positions at a given instant of time in the simulation. Each group of three components if this column vector make up the spatial displacement of a particular atom from its mean position. As discussed in Section 2 above, we construct an approximation to C using the leading k singular values
28
D. C. Sorensen
and vectors. C ~ Ck = U k & V T.
For a visually realistic reconstruction of the simulation (ie. run the graphics with Ck in place of C one can typically set k around 15 to 20). This representation not only compresses the data but classifies the fundamental spatial modes of motion of the atoms in the protein. The SVD also includes temporal information in a convenient form. It is the temporal information that sets it apart from the more traditional eigendecomposition analysis. This transform can also be thought of as a quasinormal mode calculation based on the dynamics data rather than the standard normal mode analysis obtained by an approximation to the system's potential energy function. This standard approach only admits harmonic motions. Using the SVD, it is possible to extract information about the distribution of motion in the protein, and to classify the conformational states of the protein and determine which state the system is in at any point in the simulation. Projections of the entire trajectory onto the SVD basis can also be used for understanding the extent of configuration space sampling of dynamics simulations and the topography of the system's energy hypersurface[3]. In [12], we have computed a time-averaged refinement of a mutant myoglobin. Using the SVD, we have characterized the motions within the protein and begun to automate this analysis. We have also used the SVD to distinguish between different conformational states within the dynamics ensemble. This study exposes anharmonic motions associated with the ligand binding sites through analysis of the distribution of the components of the right singular vectors. It also documents the efficiency of computing the SVD through ARPACK in this application. 5
G E N E R A L A P P L I C A T I O N S OF A R P A C K
AR.PACK has been used in a variety of challenging applications, and has proven to be useful both in symmetric and nonsymmetric problems. It is of particular interest when there is no opportunity to factor the matrix and employ a "shift and invert" form of spectral transformation, fi ~ ( A -
a I ) -1 .
(5)
Existing codes often rely upon this transformation to enhance convergence. Extreme eigenvalues {#} of the matrix A are found very rapidly with the Arnoldi/Lanczos process and the corresponding eigenvalues {A} of the original matrix A are recovered from the relation A = 1/# + a. Implementation of this transformation generally requires a matrix factorization. In many important applications this is not possible due to storage requirements and computational costs. The implicit restarting technique used in ARPACK is often successful without this spectra/transformation. One of the most important classes of application arise in computational fluid dynamics. Here the matrices are obtained through discretization of the Navier-Stokes equations. A typical application involves linear stability analysis of steady state solutions. Here one linearizes the nonlinear equation about a steady state and studies the stability of this state
Implicitly Restarted Arnoldi/Lanczos Methods
29
through the examination of the spectrum. Usually this amounts to determining if the eigenvalues of the discrete operator lie in the left halfplane. Typically these are parametrically dependent problems and the analysis consists of determining phenomena such as simple bifurcation, Hopf bifurcation (an imaginary complex pair of eigenvalues cross the imaginary axis), turbulence, and vortex shedding as this parameter is varied. ARPACK is well suited to this setting as it is able to track a specified set of eigenvalues while they vary as functions of the parameter. Our software has been used to find the leading eigenvalues in a Couette-Taylor wavy vortex instability problem involving matrices of order 4000. One interesting facet of this application is that the matrices are not available explicitly and are logically dense. The particular discretization provides efficient matrix-vector products through Fourier transform. Details may be found in [5]. Very large symmetric generalized eigenproblems arise in structural analysis. One example that we have worked with at Cray Research through the courtesy of Ford motor company involves an automobile engine model constructed from 3D solid elements. Here the interest is in a set of modes to allow solution of a forced frequency response problem ( K - AM)x = f(t), where f ( t ) is a cycnc forcing function which is used to simulate expanding gas loads in the engine cylinder as well as bearing loads from the piston connecting rods. This model has over 250,000 degrees of freedom. The smallest eigenvalues are of interest and the ARPACK code appears to be very competitive with the best commercially available codes on problems of this size. For details see [15]. Another source of problems arise in magnetohydrodynamics (MHD) involving the study of the interaction of a plasma and a magnetic field. The MHD equations describe the macroscopic behavior of the plasma in the magnetic field. These equations form a system of coupled nonlinear PDE. Linear stability analysis of the linearized MHD equations leads to a complex eigenvalue problem. Researchers at the Institute for Plasma Physics and Utrecht University in the Netherlands have modified the codes in ARPACK to work in complex arithmetic and are using the resulting code to obtain very accurate approximations to the eigenvalues lying on the Alfven curve. The code is not only computes extremely accurate solutions, it does so very efficiently in comparison to other methods that have been tried. See [9] for details. There are many other applications. It is hoped that the examples that have been briefly discussed here will provide an indication of the versatility of the ARPACK software as well a the wide variety of eigenvalue problems that arise. Acknowledgements The author is grateful to Ms. Laurie Feinswog and to Mr. Tod Romo for working with ARPACK in the 3-D image reconstruction and molecular dynamics applications mentioned above. They provided most of the information conveyed in Section 3.
References [1] M. W. Berry, "Large scale singular value computations", IJSA ,6,13-49,(1992).
D. C. Sorensen
30
[2] J.B. Clarage, G.N. Phillips, Jr. "Cross-validation Tests of Time-averaged Molecular Dynamics Refinements for Determination of Protein Structures by X-ray Crystallography," Acta Cryst., D50,24-36,(1994). [3] J.B. Clarage, T.D. Romo, B.M. Pettit, G.N. Phillips, Jr., "A Sampling Problem in Molecular Dynamics Simulations of Macromolecules, Keck Center Rept., Rice University, (1994). [4] J. Daniel, W.B. Gragg, L. Kaufman, G.W. Stewart, "Reorthogonalization and stable algorithms for updating the Gram-Schmidt QR factorization," Math. Comp.,30, 772795,(1976). [5] W.S. Edwards, L.S. Tuckerman, R.A. Friesner and D.C. Sorensen, "Krylov Methods for the Incompressible Navier-Stokes Equations," Journal of Computational Physics,
~0,82-102 (1994). [6] L. Feinswog, M. Sherman, W. Chiu, D.C. Sorensen, "Improved Computational Methods for 3-Dimensional Image Reconstruction" , CRPC Tech. Rept., PAce University (in preparation). [7] H. Frauenfelder, S.G. Sligar, P.G. Wolynes, " The Energy Landscapes and Motions of Proteins, Science, 254,1598-1603, (1991). [8] G.H. Golub, C.F. Van Loan. Matrix computations. North Oxford Academic Publishing Co., Johns Hopkins Press,(1988). [9] M.N. Kooper, H.A. van der Vorst, S. Poedts, and J.P. Goedbloed, "Application of the Implicitly Updated Arnoldi Method with a Complex Shift and Invert Strategy in MHD," Tech. Rept., Institute for Plasmaphysics, FOM Rijnhuizen, Nieuwegin, The Netherlands,(Sep. 1993) ( submitted to Journal of Computational Physics). [10] R. Lehoucq, D.C. Sorensen, P.A. Vu, ARPACK: Fortran subroutines for solving large scale eigenvalue problems, Release 2.1, available from
[email protected], (1994). [11] B.N. Parlett, The Symmetric Eigenvalue Problem , Prentice-Hall, Englewood Cliffs, NJ. (1980). [12] T.D. Romo, J.B. Clarage, D.C. Sorensen, and G.N. Phillips, Jr., "Automatic Identification of Discrete Substates in Proteins: Singular Value Decomposition Analysis of Time Averaged Crystallographic Refinements," CRPC-TR 94481, Rice University, (Oct. 1994). [13] Y. Saad, Numerical Methods for Large Eigenvalue Problems, Halsted Press-John Wiley & Sons Inc., New York (1992). [14] D. C. Sorensen, "Implicit application of polynomial filters in a k-Step Arnoldi Method," SIAM J. Matt. Anal. Apps.,13 , 357-385, (1992).
Implicitly Restarted Arnoldi/Lanczos Methods
31
[15] D.C. Sorensen, P.A. Vu, Z. Tomasic, "Algorithms and Software for Large Scale Eigenproblems on High Performance Computers," High Performance Computing 1993Grand Challenges in Computer Simulation,Adrian Tentner ed., Proceedings 1993 Simulation Multiconference, Society for Computer Simulation, 149-154, (1993). [16] M. Van Heel, J. Frank, "Use of Multivariate Statistics in Analysing the Images of Biological Macromolecules," Ultramicroscopy, 6 187-194, (1981). [17] S. Van Huffel and J. Vandewalle, The Total Least Squares Provblem: Computational Aspects and Analysis, Frontiers in Applied Mathematics 9, SIAM Press, Philadelphia,(1991).
This Page Intentionally Left Blank
SVD AND SIGNAL PROCESSING, III Algorithms, Architectures and Applications M. Moonen and B. De Moor (Editors) 9 1995 Elsevier Science B.V. All rights reserved.
ISOSPECTRAL
MATRIX
FLOWS FOR NUMERICAL
33
ANALYSIS
U. HELMKE
Department of Mathematics University of Regensburg 93040 Regensburg Germany
[email protected] ABSTRACT. Current interest in analog computation and neural networks has motivated the investigation of matrix eigenvalue problems via eigenvalue preserving differential equations. An example is Brockett's recent work on a double Lie bracket quadratic matrix differential equation with applications to matrix diagonalization, sorting and linear programming. Another example is the Toda flow as well as many other classical completely integrable Hamiltonian systems. Such isospectral flows appear to be a useful tool for solving matrix eigenvahe problems. Moreover, generalizations of such flows are able to compute the singular value decomposition of arbitrary rectangular matrices. In neural network theory similar flows have appeared in investigations on learning dynamics for networks achieving the principal component analysis. In this lecture we attempt to survey these recent developments, with special emphasis on flows solving matrix eigenvalue problems. Discretizations of the flows based on geodesic approximations lead to new, although slowly convergent, algorithms. A modification of such schemes leads to a new Jacobi type algorithm, which is shown to be quadratically convergent. KEYWORDS. Dynamical systems, Riemannian geometry, SVD, Jacobi method.
1
INTRODUCTION
Isospectral matrix flows are differential equations evolving on spaces of matrices, which have the defining property that their solutions have constant eigenvalues. Such equations have been studied over the past 25 years, appearing in diverse areas such as classical and statistical mechanics, nonlinear partial differential equations, numerical analysis, optimization, neural networks, signal processing and control theory. The early development of isospectral flows goes back to the pioneering work of Rutishauser on the infinitesimal analogue of
U. Helmke
34
the quotient - difference algorithm. However Rutishauser's work in the 1950s has stayed isolated and it took more than 20 years that interest into such flows did renew through the discovery of Flaschka, Symes and Kostant that the Toda flow from statistical mechanics is a continuous - time analogue of the QlZ - algorithm on tridiagonal matrices. More precisely the QI~ - algorithm was seen as a discretization of an associated isospectral differential equation, the discretization being at integer time. This work on the Toda flow has spurred further investigations of Chu, Driessel, Watkins, Elsner and others on continuous time flows solving matrix eigenvalue problems. More recently, work by Brockett and others has considerably broadened the scope of applications of isospectral flows. In this paper I will focus on the applications within numerical linear algebra. More specifically the focus will be on matrix eigenvalue methods. After a brief outline on the historical development of isospectral flows the double bracket equation of Roger Brockett is studied. This is a quadratic matrix differential equation which is derived as a gradient flow for a least squares cost function. Approximations of the flow based on short length geodesics are introduced. These lead to linearly convergent algorithms involving calculations of matrix exponentials. A variant of the classical Jacobi algorithm is introduced which is based on optimization of the least squares cost function along several geodesics. This algorithm, termed Sort-Jacobi algorithm, has attractive properties such as quadratic convergence and incorporates sorting of the eigenvalues. The extension to the SVD case is straightforward. This is a survey paper. Accordingly most proofs are omitted. I have tried to preserve the structure of my talk at the workshop in as much as possible.
2
ISOSPECTlZAL FLOWS
A differential equation on matrix space ~nxn
A(*) = F(A(t)),
A e I~'~•
(1)
is called isospectral, if the eigenvalues of the solutions A(t) are constant in time, i.e. if spectrum(A(t)) = spectrum(A(0))
(2)
holds for all t and all initial conditions A(0) E I~nxn. A more restrictive class of isospectral matrix flows are the self-similar flows on I~nxn. These are defined by the property that
A(t) = S(t)A(O)S(t) -1
(3)
holds for all initial conditions A(0) and times t, and suitable invertible transformations S(t) E GL(n, n~). Thus the Jordan canonical form of the solutions of a self-similar flow does not change in time. It is easily seen that every self-similar flow (1) has the Lie-bracket form =
[A, f(A)]
= Af(A)- f(A)A
(4)
Isospectral Matrix Flows f o r Numerical Analysis
35
for a suitable matrix-valued function of the coefficients of A. Conversely, every differential equation on I~nxn of the form (4)is isospectral; see e.g. Chu [7], Helmke and Moore [19]. Of course, instead of considering.time-invariant differential equations (1) we might as well consider time-varying systems A = F ( t , A ( t ) ) . More generally we might even consider isospectral flows on spaces of linear, possibly infinite-dimensional, operators, but we will not do so in the sequel. E x a m p l e 1 ( L i n e a r Flows) The simplest example of an isospectral flow is the linear self-similar flow (5)
A=[A,B]
for a constant matrix B E Ii~nxn. The solutions are A ( t ) = e - t s A(O)e t s ,
which immediately exhibits the self-equivalent nature of the flow. E x a m p l e 2 ( B r o c k e t t ' s Flow) The second example is Brockett's double Lie bracket equation
(6)
= [A, [A, B]] = A 2 B + B A 2 - 2 A B A .
For this equation there is no general solution formula known; we will study this flow in section 3.
Example 3 (QR-Flow) This is a rather nontrivial looking example. Here = [A, (log A)_]
(7)
where B_ denotes the unique skew-symmetric part of B with respect to the orthogonal decomposition
B=
0
-b21
. . . .
b21
"'.
"'.
b,n
...
bn,,~-i
bnl
+ 0
*
......
*
0
"'.
:
0
...
0
,
into skew-symmetric and upper triangular matrices. This differential equation implements exactly the QR-algorithm in the sense that for all n E N: A(n) = n-
th step of Qtt-algorithm applied to A(0).
Let us now briefly describe some of the earlier developments on isospectral flows.
U. Helmke
36 2.1
KORTEWEG-DE VRIES EQUATION
An important part of the theory of isospectral matrix flows is linked to comparatively recent developments in the theory of nonlinear partial differential equations and, in particular, with the theory of solitons. This is explained as follows. Let us consider the Korteweg-de Vries (KdeV) ~
= 6~
- ~.
,
~ = ~ ( ~ , t).
(8)
This nonlinear partial differential equation models the dynamics of water flowing in a shallow one-dimensional channel. By a beautiful observation of P. Lax the KdeV equation has the appealing form Lt = [P, L]
(9)
where 92
/-, = -~-'~'z2 -F u(x,t)
(I0)
denotes the SchrSdinger operator and
o3
P = -4~--dx3 + 6u
+ 3ux.
(11)
Therefore the KdeV equation is equivalent to the isospectral flow (9) on a space of differential operators. In particular, the spectrum of the SchrSdinger operator (10) yields invariants for the solutions u(z,t) of (t). It is thus obvious that the isospectral nature of (9) may lead, and indeed has led, to further insights into the solution structure of the KdeV equation. Following this fundamental observation of Lax several other important nonlinear partial differential equations (such as, e.g., the nonlinear SchrSdinger equation or the LandauLifshitz equation) have also been recognized as isospectral flows. These developments then have culminated into what is nowadays called the theory of solitons, with important contributions by Gelfand-Dikii, Krichever, Novikov, Lax, Adler and van Moerbeke and many others. For further information we refer to [1], [2].
2.2
EULElZ EQUATION
One of the many mechanical systems which do admit an interpretation as an isospectral flow is the classical Euler equation
Isospectral Matrix Flows for Numerical Analysis
I=~=
=
(Ia - h)~'l~oa
Ia~aa
=
( h -/2)~01~,=.
37
(12)
Using the notation
B'=
0 0
I~"1 0 0 I~-1
,
A: =
r/3 -72
0 771
-r/1 0
(13)
where (~h, ~12,*/3) = (/1 r
I2w2, I3w3 )
a simple calculation shows that /[ = [A2, B] = [A, AB -i- BA].
(14)
Thus the Euler equation (12) is equivalent to the quadratic isospectral matrix flow (14) on the Lie algebra so(3) of 3 x 3 skew-symmetric matrices. In particular we conclude that IIAII2 = Ii2wl2 + I~02 2 2 + i3w32 is an invariant of the Euler equation. Moreover, from the identity tr(A[B, C]) = tr([A, B]C) it follows that
2tr(BAfi) = tr(BA + AB),;i = tr((AB + BA)[A, AB + BA]) = O. Thus
tr(BA 2) = / 1 ( / 2 + I3)w 2 + I2(I1 + I3)w~ + I3(I~ + / 2 ) w 2 is another invariant of the Euler flow. Thus for c _> 0, d >_ 0 the ellipsoids
Mc = {A e so(3) [ tr(A2B) = -c} as well as the spheres {A e so(3) lllAII 2 = d} are invariant submanifolds of (14). From this the familiar phase-portrait description of the Euler equation on II~3 is easily obtained; see e.g. Arnol'd [2]. Thus the isospectral form (14) of the Euler equation leads to a complete integration of the flow. For a further analysis of generalized Euler equations defined on arbitrary Lie algebras we refer to Arnol'd [2], Fomenko [16].
38 2.3
U. H e l m k e
TODA FLOW
The Toda flow is one of the earliest examples of an isospectral flow which yields a continuous analogue of a numerical matrix eigenvalue problem. The original motivation for the flow stems from statistical mechanics; cf. Flaschka (1974, 1975), Moser (1975), Kostant (1979). Consider the one-dimensional lattice of n idealized mass points Zl < ... < xn, located on the real axis I~. We assume that the potential energy of the system is described by
Y(xl,..., z~) = ~ e ~ - ~ + ~ k= 92 ?%
X o ' = - o o , x n + l " = +oo; while the kinetic energy has the usual form ] ~] x~. k=l
Thus, if the particles are sufficiently far apart, then the total potential energy is very small and the system behaves roughly like a gas. The Hamilton function of the system is
(~ = ( ~ , . . . , ~ ) , v = (v~,..., y~))
H(~,v)=~
lfiv~+ fi~ - ~ + ~ k=l
(15)
k=l
where Yk = :~k denotes the velocity of the k-th mass point. The associated Hamiltonian system is
'xk
=
~]k
=
OH
= Yk Oyk OH OXk = e X k - l - x k -- eXk-xk+l
(16)
for k = 1 , . . . , n. These are the Toda lattice equations on 11~2n. Equivalently, the Toda lattice is described by the second order differential equations Xk
-- s
_
exk--xk+l
,
k =
1,...,
(17)
n.
To relate the Toda system to an isospectral flow we make the following (not invertible) change of variables (this observation is due to Flaschka). (xl,...,xn,
Yl,...,Yn)'
)(al,...,an-l,bl,...,bn)
with ak --
1 e (xk-xk+l)/2
'
bk --
-1 T
Yk
~
k - 1,
"'''
n.
Then, in the new variables, the Toda system becomes (ao = 0, bn+l = 0).
(18)
Isospectral Matrix Flows for Numerical Analysis hk = ak(bk+l -- bk) bk = 2 ( a ~ - a ~ _ l ) ,
k= 1,...,n.
39
(19)
This is called the Toda flow. Thus we have associated to every solution ( x ( t ) , y ( t ) ) o f (16) a corresponding solution (a(t),b(t)) of (19). Conversely, if (a(t),b(t))is a solution of (19) with initial conditions satisfying al(0) > 0 , . . . , an-l(0) > 0~ then one can uniquely reconstruct from (a(t),b(t)) via (18) the velocity vector (yl(t),...,yn(t)) as well as the pairwise differences x l ( t ) x 2 ( t ) , . . . , X n _ l ( t ) - xn(t) of the positions of the mass points. Moreover, if (x(0), y(0))is an initial condition of (16), then by conservation of momentum we have
9~(t) = t. ~ w(0)+ ~ ~ ( 0 ) k=l
k=l
k=l
Thus, given any initial condition (x(0), y(0))of (16) and a corresponding solution (a(t), b(t)) of (19), one can explicitly determine the solution (x(t),y(t)) of (16). In this sense the systems (16) and (19) are equivalent. In order to exhibit the isospectral nature of the flow (19) we store the state variables as components of a tridiagonal matrix 9 Let
A:=
bl
al
al
b2
0 "'.
9
0
A
9
(20)
an_ 1 bn
9
at,-1 0
al
--al
0 9 9
0
0 "'. " 9
an-1
(21) an-1 0
With these notations the flow (19) is easily verified to be equivalent to the isospectral Toda flow on tridiagonal matrices ,~i = [A, A_]
(22)
The convergence properties of the Toda flow (22) are summarized in the following theorem, whose proof is due to Moser [23], Flaschka [15], Delft, Nanda and Tomei [11]. T h e o r e m 1.1
(a) The solutions of the Toda flow (22)
40
U. Helmke A = [A,A_]
exist and are tridiagonal for all time.
(b) Every tridiagonal solution of (22) converges to a diagonal matrix lim
A(t) =
diag(A1,..., An),
where ) u , . . . , A n are the eigenvalues of A(O). For almost every initial condition A(O) in the set of tridiagonal Jacobi matrices the solutions converges to diag(Ax,..., An) with )u < ..._ ... >_ A~.
All other critical points diag(A,r(1),..., A~(n)) for r # id are saddle points. (c) The Hessian of fA" M(A) ~ I~ at each critical point is nondegenerate, for A1 > . . . > An. El
Case 2: Block Diagonalization This case is of obvious interest for principal component analysis. Here we choose A = diag(1,..., 1,0,...,0), the eigenvalue 1 appearing k times. Theorem 1 then specializes to C o r o l l a r y 2.3 Let A be as above. Then (a) The critical points of fA" M(A) ---, [~ are all block diagonal matrices X = diag(Xll,X22) E M(A)
with Xlx E IRkxk. (b) The set of all local = global minima consists of all block diagonal matrices X = diag(Xll,X22)
where Xxl has eigenvalues ~1 >_ ... >__Ak
and X22 has eigenvalues Ak+l >_ ... _> A,~.
3.2
THE DOUBLE BRACKET FLOW
The above results show that we can characterize the tasks of symmetric matrix diagonalization or block diagonalization as a constrained optimization problem for the cost function f A : M ( A ) ~ IR. A standard approach to tackle optimization problems on manifolds is by a steepest descent approach, following the solution of a suitable gradient flow. What is then the gradient flow of fA: M(A) ---, IR? To compute the gradient of fA one have to specify a Riemannian metric on M(A). As the answer depends crucially on the proper choice of a Riemannian metric it is important to make a good choice.
46
U. Helmke
It turns out that there is indeed (in some sense) such a best possible metric, the so-called normal Riemannian metric. With respect to this normal metric the double bracket flow (27) is then seen as the gradient flow of fA: M(A) ~ II~; see Bloch, Brockett, and Ratiu (1990). All other choices of Riemannian metrics lead to much more complicated differential equations. T h e o r e m 2.4 ( B r o c k e t t (1988), Bloch, B r o c k e t t a n d l:tatiu (1990)). Let A = A T E I~ nXn. (a) The gradient flow of fA: M(A)---, R is the double bracket flow )( = IX, [X, A]] on symmetric matrices X = X T E I~nx'~.
(b) Th~
diff~,'~ntial~quatio,~ (27) d ~ n ~
matrices X E ]~nxn.
a~ i~osp~ctratflow on th~ ~t o/,'~al~ymm~t,'i~
(c) Every solution X ( t ) of (27) exists/or all t e It~ and converges to an equilibrium point X ~ . The equilibrium points of (27) are characterized by [A, X~] = 0, i.e. by A X ~ = X ~ A . (d) Zet A = d i a g ( m , . . . , ~ ) with #1 > . . . > #,~. Every solution X ( t ) of (27) converges to lim X ( t ) = diag()~TrO),... )~r(n))
t----~oo
'
for a suitable permutation 7r: { 1 , . . . , n} ~ { 1 , . . . , n}. There exists an open and dense subset U C M(A) such that for every initial condition X(O) E U then lim Z ( t ) = diag(,\l,...An)
,
~1 >_ ..._> ~ .
t--*oo
Moreover, convergence of X ( t ) to this unique attractor of the flow is exponentially fast.
Thus solving the double bracket flow leads us to a diagonalization method for symmetric matrices. Of course, this is a rather naive method which is not recommended in practice! D o u b l e B r a c k e t Flow D i a g o n a l i z a t i o n M e t h o d 9 The to be diagonalized matrix X0 is the initial condition of (27). 9 Choose, e.g., A = d i a g ( n , n - 1 , . . . , 1 ) . 9 Solve the differential equation (27)! Then X ( ~ ) = diag(A1,..., An)
with ) q , . . . , An the eigenvalues of X0.
Isospectral Matrix Flows for Numerical Analysis
3.3
47
DISCRETIZATIONS
A major drawback of the double bracket flow diagonalization method is that the solutions of a nonlinear ODE have to be computed. For this it would be most helpful if explicit solutions of the double bracket flow were known. Unfortunately this appears to be possible only in one simple case.
3.3.1 Exact Solutions Suppose that X is a projection operator, that is X 2 = X holds. Then the double bracket equation (27) is equivalent to the matrix Riccati equation = AX + XA-
2XAX
(32)
For this Riccati equation explicit solutions are well-known. We obtain the formula
x ( t ) = r177
Xo + d'AXo)-~r 'A
(33)
for the solution of the double bracket flow (27) with X02 = X0, X0T = X0. Note that (33) defines indeed a projection operator for each t E I~. We use this formula to show that discretizing (27) at integer time t = k E N is equivalent to the power method for the matrix eA. For simplicity let us restrict to the case where rkXo = 1. Connection with the Power Method Suppose X(0) is a rank 1 projection operator. Then the solution formula (33) simplifies to
X(t)-
etAX(O)e tA tr(e2tAX(O)).
Thus
eAX(k)e A
x(k + 1)= t~(~2AX(k))"
(34)
Since X is a rank 1 projection operator we have X = ~ T for a unit vector ~ E I~n. The above recursion (34) is then obtained by squaring up the power iterations for the matrix eA:
eA~k O
3.3.2 Geodesic Approximation As we have mentioned above, it is in general not possible to find explicit solution formulas for the double bracket equation. Thus one is forced to look for approximate solutions. A straightforward although naive approach would be to discretize numerically the ODE on the space of symmetric matrices, using Euler type
48
U. Helmke
or Runge-Kutta methods. Such a standard discretization method is however not recommended, as it may lead to large errors in the eigenvalues. The difficulty with such an approach is that the fixed eigenvalue constraints are not satisfied during the iterations. Thus our goal is to find suitable approximations which preserve the isospectral nature of the flow. Such approximate solutions are given by the geodesics of M(A). The geodesics of the homogeneous space M(A) can be explicitly computed. They are of the form e t ~ X e -t~
,
X E M(A)
where f~ is any skew-symmetric matrix in the tangent space of M(A) at X. We are thus led to consider the following Geodesic Approximation Scheme Xk+l = e~k[Xk'A]Xke-~k[Xk'A]
(35)
for a still to be determined choice of a step-size factor ak. We assume, that there exists a function aA:M(A) ~ It~ such that ~k = a A ( X k ) for all k. Moreover we require that the step-size factor satisfies the following condition. Boundedness Condition There exist 0 < a < b < oc, with C~A:M(A)---, [a,b] continuous everywhere, except possibly at those points X E M(A) where A X = X A.
We then have the following general convergence result of (35). P r o p o s i t i o n ([22]) Let A = diag(#l,. . ., I.tn) with #1 > . . . > #n and let aA" M(A) ~ [a, b] satisfy the Boundedness Condition. For Xk E M(A) let ~k = a A ( X k ) and define A f A ( X k , ak) = t r ( A ( X k - Xk+l )) where Xk+l is given by (35). Suppose A f A ( X k , ak) < 0
whenever [A, Xk] # O.
Then
(a) The fixed points X e M(A) of the discrete-time iteration (35) are characterized by AX = XA. (b) Every solution Xk, for k e N, of (35) converges as k ---, oc to a diagonal matrix Xoo = diag()~r(1), . " , )~r(n)), :r a permutation. []
Note that the above result does not claim that Xk converges to the optimal fixed point A, which minimizes fA" M(A) ~ ~. In fact, the algorithm will converge to this local attractor A = diag(A1,..., A~), with A1 _> ... >_ A,~, only for generic initial conditions. For
Isospectral Matrix Flows for Numerical Analysis
49
a nongeneric initial condition the algorithm may well get stuck in a saddle point. This is in sharp contrast to the globally convergent Sort-Jacobi algorithm which will be introduced later. The following examples yield step-size selections ak which satisfy the assumption of the Proposition. E x a m p l e 1: C o n s t a n t Step-Size ([22]) 1 o~ : 4IIAll " ilXoll
(36)
E x a m p l e 2" Variable Step-Size I ([22])
1 I[A, Xdll = ) ~k--II[A, Xk]ll log(1 + IlXo : [i[-~:~.:x~]]ll
(a7)
E x a m p l e 3: Variable Step-Size II ([6])
211[A, Xk]ll 2
~k -- II[A, [A, Xk]]ll" II[[A, Xk], Xk]ll
(as)
In all three cases the isospectral algorithm (35) converges to a diagonalization. While the last variable step-size (38) is always larger than the (36) or (37), the on-line evaluation of the norms of Lie brackets, in comparison with the constant step size, causes additional computational problems, which may thus slow down overall convergence speed. What are the advantages and disadvantages of the proposed geodesic approximation method? 1. The algorithm (35) is isospectral and thus preserves the eigenvalues. 2. Linear convergence rates hold in a neighbourhood of the global attractor. 3. By combining the method with known quadratically convergent optimization schemes, such as the Newton method or a conjugate gradient method, local quadratic convergence can be guaranteed. However, the domains in M(A) where quadratic convergence holds can be rather small, in addition to the hugh computational complexity one faces by implementing e.g. Newton's method. Thus these quadratically convergent modifications are not suitable in practice. See the recent Ph. D. thesis of R. Mahony [21] and S. Smith [26] for coordinate free description of such methods. 4. Despite of all such possible advantages of the above geodesic approximation algorithm, the bottleneck of the method really is the need to compute on-line matrix exponentials. This makes such gradient-flow based schemes hardly practical. In section 5 we thus describe a way how to circumvent such difficulties.
U. Helmke
50 4
GENERALIZATIONS
The previous theory of double bracket flows can be extended in a straightforward way to compute eigenvectors or the singular value decomposition. Moreover, the whole theory is easily generalized to flows on Lie algebras of compact semisimple Lie groups.
4.1
FLOWS ON ORTHOGONAL MATRICES
Let SO(n) denote the Lie group of n x n real orthogonal matrices of determinant one and let A, X0 be n x n real symmetric matrices. We consider the task of optimizing the smooth function CA,Xo: SO(n) ----* I~
CA,Xo = [[A- OXoOTII2;
,
(39)
this time the optimization taking place on the Lie group SO(n) rather than on the homogeneous space M(Xo). We have the following analogous result of Theorem 2.1 by Brockett. Let us assume for simplicity that A = d i a g ( # l , . . . , #n) with #1 > ... > #~.
Theorem 3.1 (Brockett (1988)) (a) A matrix O e SO(n) is a critical point of CA,x0 : SO(n) ---, I~ if and only if[A, OXo| T] = O. Equivalently, for A = d i a g ( # l , . . . , # n ) , with #1 > ... > #n, Q E SO(n) is a critical point if and only if the row vectors of | form an orthonormal basis of eigenvectors of Xo. (b) The local minima of CA,Xo : S O ( n ) ~ (c) CA,X0: S O ( n ) ~
~ coincide with the global minima.
I~ is a Morse-Bott function.
D
The gradient flow of CA,X0: SO(n) ---, It~ with respect to the (normalized) Killing form is easily computed to be the cubic matrix differential equation
(9 = [|174
,
|
E SO(n).
(40)
Moreover, the solutions | E SO(n) exist for all t E It( and converge, as t ~ =t=~, to an orthogonal basis of eigenvectors of X0; see Brockett [5]. The associated recursions on orthogonal matrices which discretize the gradient flow (40) are
Ok+l = eak[Okx~174
(41)
where ak is any step-size selection as in section 3.3. In particular, if ak is chosen as in (36) - (37) (with Xk = QkXoQT), then the convergence properties of the gradient flow (2) remain in force for the recursive system (40); see [22] for details. Thus
Ok+~ ' e~[O~X~
,
o~ = 1/(411Xoll
IIAII),
(42)
Isospectral Matrix Flows for Numerical Analysis
51
converges to a basis of eigenvectors of X0. The rate of convergence in a neighbourhood of the stable equilibrium point is linear.
4.2
SINGULAR VALUE D E C O M P O S I T I O N
One can parallel the analysis of double bracket flows for symmetric matrix diagonalization with a corresponding theory for flows computing the singular value decomposition of rectangular matrices 9 For a full analysis we refer to [18], [25]. Let A, E E IR'~xm, m >_ n, be rectangular matrices of the form
A:=
E'=
#1
"'"
0
.
:
"..
:
9
0
...
#,~
o'1
.."
0
:
"..
:
0
"''
(7" n
O~,x(~_,-,)
(43)
: 0~x(.~_~)
(44)
.
with al _> 9149149 > a,~ >_ 0 and #1 > . . . > #n > 0. The Lie group O(n) x O(m) acts on the matrix space I~ n x m by ~: o ( ~ ) • o ( m ) • ~ •
((u, v ) , x )
--~ ~-~
~• uxy T
(45)
Thus the set M ( E ) of n x m real matrices X with singular values a l , . . . , an M ( E ) = { u E v T I u e O(n), V E O(m)}
(46)
is an orbit of a and therefore M ( E ) is a smooth, compact manifold. Consider the task of optimizing the smooth least squares distance function
FA: M ( E ) ---* II~ FA(X) =
IIA- XII 2
= E (o? + ~ ) - 2t~(AX~). i-1
For proofs of the following results we refer to [18], [25]; see also [9].
(47)
52
U. Helmke
T h e o r e m 3.2
(a) A matrix X E M ( 2 ) is a critical point of FA: M ( 2 ) - - , IR if and only if A X T = X A T and A T x = X T A holds. Equivalently, for A as in (~3) and al > ... > an > O, X is a critical point if an only if X = (diag(exa~(1),...,ena~(,~)), O,~•
(48)
with ei E {q-l} and rr: { 1 , . . . n } --~ { 1 , . . . , n} is a permutation.
(b) The local minima coincide with the global minima. T h e o r e m 3.3
(a) The gradient flow of the least squares distance function FA: M ( E ) ---, IR, F A ( X ) =
IIA- Xll 2,
is
2 = (AX T- xAT)X
- x(ATX
- xTA).
(49)
This differential equation defines a self-equivalent, i.e. singular value preserving, flow on I~ r~• m .
(b) The solutions X ( t ) of (~9) exist for all t e I~ and converge to a critical point of FA: M ( E ) ~ I~. In particular, every solution X ( t ) converges to a matrix Xr
= (diag(Ax,...,)~n), Onx(m-,~))
where ])ql, i = 1 , . . . , n, are the singular values of X(O) (up to a possible re-ordering).
(50) []
Similarly also recursive versions of (49) and corresponding flows for the singular vectors exist. We refer to [22] for details.
4.3
FLOWS ON LIE ALGEBRAS
Brockett's double bracket equation generalizes in a straightforward way to flows on Lie algebras. The importance of such generalizations lies in the ability to cope in a systematic way with various structural constraints, such as for skew-symmetric or Hamiltonian matrices. The standard example of a Lie algebra is the Lie algebra of skew-symmetric matrices. 1 More generally let us consider an arbitrary compact semi-simple Lie group G with Lie algebra ~. Then G acts on ~ via the adjoint representation Ad: G x ~
----* r
(g, ~) ~
Ad(g) .
1The set of symmetric matrices is not a Lie algebra, but a Jordan algebra!
(51)
Isospectral Matrix Flows for Numerical Analysis
53
where Ad(g). ~ is defined as d.
t~
Ad(g) . ~ = --~(ge g-1)lt=o. By differentiating Ad(g).~ with respect to the variable g we obtain the Lie bracket operation
ad: ~ x ~
(~,,7)
~
---,
[~, hi.
(52)
We also use the notation ad~(~)= [~, 7/]. Then the orbit
M(~) = {Ad(g). ~ l g e G)
(53)
is a smooth compact submanifold of the Lie algebra qS. If G is semi-simple, then the Killing form (~, r/): = -tr(ad~ o adn)
(54)
defines a positive definite inner product on @. Consider for a fixed element 7/E q5 the trace function Cn:M(~)
~~
,
Cn(~) = ((,77).
(55)
We now have the following natural generalization of Theorems 2.1, 2.4 to arbitrary compact semi-simple Lie algebras.
Theorem 3.4 (Bloch, Brockett, Ratiu (1990)) (a) There exists a Riemannian metric on M(~) such that the double Lie bracket flow
= [~, [~, ~]]
(56)
is the gradient flow of Cn: M(~) ~ ~ . (b) Every solution ((t) of (17) converges to an equilibrium point ~oo, characterized by [~oo, ~] = o. Therefore the double bracket flow (17) converges to the intersection of tow sets, one being defined by the a spectral constraints ~oo E M(~) while the other set is defined by the linear constraint [~oo,r/] = 0. Thus such flows are capable of solving certain structured inverse eigenvalue problems. This can in fact be done in more generality; cf. [10] for further examples and ideas.
54
U. Helmke
Remarks
(a) It is not quite proved (nor even stated!) in [4] that every solution r of the gradient flow (56) really converges to a single equilibrium point, rather than a set of equilibria. While this would be a trivial consequence of standard properties of gradient flows if (56) were known to have only finitely many equilibrium points, it is not so obvious in case (16) has a continuum of equilibria. However, as r M(~) ~ I~ can be shown to be a MorseBott function, convergence to a single equilibrium point then it follows from appropriate generalizations of standard convergence results of gradient flows; see e.g. Helmke and Moore (1994), Prop. 1.3.6. (b) There is a step-size selection ak similar to (38), such that the recursion ~k+l -- Ad(e-~
(57)
9~k
has the same convergence properties as (56); see Brockett (1993).
5
SORT-JACOBI
ALGORITHM
In the previous sections we have seen that double Lie bracket isospectral flows and their respective discretizations lead to convergent algorithms for matrix diagonalization. Although such algorithms are attractive from a theoretical viewpoint, they do lead only to highly unpractical, slowly convergent algorithms. Our goal in this section is to show how modifications of such algorithms yield efficient, quadratically convergent algorithms for matrix diagonalization. The algorithm which we derive is actually a modification of the classical Jacobi algorithm, which also incorporates sorting of the eigenvalues. This part is forthcoming joint work with K. Hiiper, where full details will appear.
5.1
MATRIX DIAGONALIZATION
We use the notation of section 3. Let ei E I~n denote the i-th standard basis vector of I~n. For 1 _< i < j _ n and t E I~ let Gij(t) = cos(t). (eie T + eje T) - sin(t). (eie T - eje T)
(58)
denote the Jacobi-rotation in the (i,j)-plane. Using any fixed ordering of {(i,j) E 1~21 1 _< i < j _< n} we denote by
G~(t),
. . ., GN(t)
,
lV =
ln(n-
1)
(59)
the Jacobi rotations of I~n. The Sort-Jacobi algorithm is build up similarly to standard numerical eigenvalue methods via the iteration of sweeps.
Isospectral Matrix Flows for Numerical Analysis
55
k-th Sort-Jacobi Sweep Define
X 0)
.=
G l ( t O ) ) X k G l ( - t (1))
X~ 2) .= G2(t!2))X~I)G2(-t!2) ) X~ N) "---- GN(t!N)~ y(N-1)GN(
)
(60)
where t! 0 9= arg re.in .{fA(Gi(t J"k ~ ( i - 1 )Gi(-t)}. tE[0,2~rJ for/=
(61)
1,...,N.
Thus X~ i) is recursively defined as the minimum of the least squares distance function
fA: M(Zo) ~ ~, when restricted to the i-th Jacobi geodesic {Gi(t)X~i-1)Gi(-t) l t e I~} r(~-~)
containing ~ k
9
The Sort-Jacobi algorithm then consists of the iteration of sweeps. Sort-Jacobi Algorithm 9 Let X o , . . . , X k
E M(Xo) be given for k E H0.
9 Define the recursive sequence
Xk(1) , X~2) , . . . ,
x~N)
as above (sweep). 9 Set Xk+l"= X~ N). Proceed with the next sweep.
The convergence properties of the algorithm are established by the following result. T h e o r e m 4.1 Let Xo = XTo E Xn•
be given and A = diag(#l,. . .,#n) with #1 > ... > #n. Let Xk, k = O, 1, 2,..., denote the sequence generated by the Sort-Jacobi algorithm. Then
(a) Global convergence holds, that is lim Xk = diag(11,... k---,oo
1,~)
(62)
with A1 >_ ... _> A~.
(63)
U. Helmke
56
Here hx,...,)~n are the eigenvalues of Xo. (b) The algorithm is quadratically convergent. Remarks
(a) The above algorithm automatically sorts the eigenvalues of X0 in a given order. If the diagonal entries #i in A were, e.g., chosen in an increasing way, than the eigenvalues A1,...,An of X0 would also appear in an increasing order. Similarly for any other permutation. This property does not hold for standard versions of the aacobi algorithm. (b) The theorem remains in force for any a priori chosen order, in which the sweep steps are applied. (c) The basic difference to standard textbook Jacobi algorithms is that we are maximizing the linear trace function tr(AX) in each step, rather than minimizing the quadratic Jacobi cost function Iloffdiag(X)ll ~. This is important for the proof, as the critical point structure of the trace function on M(A) is much simpler that that of I]offdiag(X)ll 2. This difference in the choice of cost functions is the main reason why our theory appears to be simpler than the classical theory of aacobi algorithms. P r o o f of T h e o r e m 4.1 ( S k e t c h ) For details we refer to our forthcoming work Hiiper and Helmke (1994). First we establish global convergence. By construction
/A(Xk+,) < /A(Zk). It is easily seen that equality holds if and only if Xk+l = Xk is a critical point of fA: M(Xo) Ii~. By a Lyapunov type argument we deduce that the sequence (Xk) converges to a critical point of fA: M(Xo) ~ Ii~; i.e. to a diagonal matrix X ~ . Suppose Xr162were a saddle point. Then a single sweep involving permutations, applied to Xk for k sufficiently large, would bring us arbitrarily close to the unique global minimum. Contradiction. Thus (Xk) converges to the global minimum of the cost function. To prove quadratic convergence we consider the algorithm in a neighbourhood of the global minimum A. Using the implicit function theorem one shows that there exists a smooth map ~p: U + U, defined on an open neighbourhood U C M(A) of A, such that for all sufficiently large k
Xk+~ = ~(Xk) holds. Thus the algorithm consists in iterating a locally defined smooth map of M(A). A calculation shows that the first derivative of ~: U ~ U at the fixed point A is zero! Thus the result follows from a standard Taylor expansion argument. Q.E.D. []
Isospectral Matrix Flows for Numerical Analysis 5.2
57
SVD
In an analogous way a Sort-Jacobi algorithm which computes the singular values of a matrix is obtained. Let A" = ( d i a g ( # : , . . . , #n),
0nx(m-~))
(64)
E : = (diag(a:,...,a,~),
0n•
(65)
with #: > ... > #n > 0 and a: _> .. >_ an _> 0. Using the well-known transformation
f~'--
[0A AT
0
'
0
we can reduce the problem to the symmetric case. Thus for any matrix Xo E I~n• with singular values a l , . . . , an we can apply the Sort-Jacobi algorithm to the symmetric (m + n) • (m + n) matrix )(0. The only difference now is that we only perform those Jacobi iterations which do preserve the structure of )(0. That is we are applying, in each sweep, only Jacobi-iterations of the form G:XG2 E M ( X ) . We then have an analogous convergence theory for the SVD Sort-Jacobi algorithm. T h e o r e m 4.2 Let Xo E I~n• and let A be defined by (64). W i t h # : >_ ...>_ #n >_ 0 let Xk, k = O, 1, 2,..., denote the sequence generated by the Sort-Jacobi SVD algorithm. Then (a) Global convergence holds, i.e. lira Xk = (diag(a:,... k-,,,,+ ~
an)
On•
(66)
'
with a: > _ . . . > a n > _ O .
(67)
Here a : , . . . , an are the singular values of Xo. (b) The algorithm converges quadratically fast.
Acknowledgement This work was partially supported by the German-Israeli-Foundation for scientific research and development, under grant 1-0184-078.06/91.
58
U. Helmke
References
[1] Ablowitz, M.J. and Clarkson, P.A., Solitons, Nonlinear Evolution Equations and Inverse Scattering, London Math. Soc. Lecture Notes Series 149, Cambridge Univ. Press, 1991 [2] Arnold, V.I., Mathematical Methods of Classical Mechanics, Second Edition, Graduate Texts in Mathematics 60, Springer, New York, 1989 [3] Bloch, A.M., Steepest descent, linear programming and Hamiltonian flows, Contemporary Math. 114, 77-88, 1990 [4] Bloch, A.M., Brockett, R.W. and Ratiu, T., A new formulation of the generalized Toda lattice equations and their fixed point analysis via the moment map, Bull. Amer. Math. Soc., 23,447-456, 1990 [5] Brockett, I~.W., Dynamical systems that sort lists and solve linear programming problems, Proc. 27th Conference on Decision and Control, Austin, TS, 779-803. See also Linear Algebra Appl. 146 (1991), 79-91, 1988 [6] Brockett, I~.W., Differential geometry and the design of gradient algorithms, Proceedings of Symposia in Pure Mathematics, 54, 69-91, 1993 [7] Chu, M.T., The generalized Toda flow, the QR algorithm and the center manifold theory, SIAM J. Disc. Meth., 5,187-201, 1984b [8] Chu, M.T., A differential equation approach to the singular value decomposition of bidiagonal matrices, Linear Algebra and its Appl., 80, 71-80, 1986 [9] Chu, M.T. and Driessel, K.R., The projected gradient method for least squares matrix approximations with spectral constraints, SIAM J. Numer. Anal. 27, 1050-1060, 1990 [10] Chu, M.T., Matrix differential equations: A continuous realization process for linear algebra problems, Nonlinear Analysis, TMA 18, 1125-1146, 1992a [11] Deift, P., Nanda, T. and Tomei, C., Ordinary differential equations for the symmetric eigenvalue problem, SIAM J. Numer. Anal. 20, 1-22, 1983 [12] Duistermaat, J.J., Kolk, J.A.C. and Varadarajan, V.S., Functions, flow and oscillatory integrals on flag manifolds and conjugacy classes in real semisimple Lie groups, Compositio Math. 49,309-398, 1983 [13] Faybusovich, L., Toda flows and isospectral manifolds, Proc. American Mathematical Society 115,837-847, 1992 [14] Flaschka, H., The Toda lattice, I, Phys. Rev. B 9, 1924-1925, 1974 [15] Flaschka, H., Discrete and periodic illustrations of some aspects of the inverse methods, in dynamical system, theory and applications, J. Moser, ed., Lecture Notes in Physics, 38, Springer-Verlag, Berlin, 1975
Isospectral Matrix Flows for Numerical Analysis
59
[16] Fomenko, A.T., Symplectic Geometry, Advanced Studies in Contemporary Mathematics, Vol. 5, Gordon and Breach Publ., New York, 1988 [17] Golub, G.H. and Van Loan, C.F., Matrix Computations, Second Edition, The John Hopkins University Press, Baltimore, 1989 [18] Helmke, U. and Moore, J.B., Singular value decomposition via gradient and self equivalent flows, Linear Algebra and its Appl. 69,223-248, 1992 [19] Helmke, U. and Moore, J.B., Optimization and Dynamical Systems, CCES, SpringerVerlag, London, 1994 [20] Kostant, B., The Solution to a generalized Toda lattice and representation theory, Advances in Mathematics 34, 195-338, 1979 [21] Mahony, R., Optimization Algorithms on Homogeneous @aces, Ph.D. Thesis, Canberra, ANU, 1994 [22] Moore, J.B., Mahony, R.E. and Helmke, U., Numerical gradient algorithms for eigenvalue and singular value decomposition, SIAM J. of Matrix Analysis, Appl., Vol. 15, 881-902, 1994 [23] Moser, J., Finitely many mass points on the line under the influence of an exponential potential- An integrable system, in J. Moser, Ed., Dynamical Systems Theory and Applications, Springer-Verlag, Berlin - New York, 467-497, 1975 [24] Rutishauser, H., Ein InfinitesimaIes Analogon zum Algorithmus, Arch. Math., (Basel), 5, 132-137, 1954
Quo tienten- Diff erenzen-
[25] Smith, S.T., Dynamical systems that perform the singular value decomposition, Systems and Control Letters 16,319-328, 1991 [26] Smith, S.T., Geometric optimization methods for adaptive filtering, PhD Thesis, Harvard University, 1993 [27] Symes, W.W., Systems of the Toda type, inverse spectral problems and representation theory, Inventiones Mathematicae 59, 13-51, 1980 [28] Symes, W.W., The QR algorithm and scattering for the finite nonperiodic Toda lattice, Physica 4D, 275-280, 1982 [29] Tomei, C., The topology of isospectral manifolds of tri-diagonal matrices, Duke Math. Journal 51,981-996, 1984 [30] Watkins, D.S., Isospectral Flows, SIAM Rev., 26,379-391, 1984 [31] Watkins, D.S. and Elsner, L., Self-similar flows, Linear Algebra and its Appl. 110, 213-242, 1988 [32] Watkins, D.S. and Eisner, L., Self-equivalent flows associated with the singular value decomposition, SIAM J. Matrix Anal. Appl. 10, 244-258, 1989
This Page Intentionally Left Blank
SVD AND SIGNAL PROCESSING, III Algorithms, Architectures and Applications M. Moonen and B. De Moor (Editors) 9 1995 Elsevier Science B.V. All rights reserved.
THE RIEMANNIAN
SINGULAR
61
VALUE DECOMPOSITION
B.L.R. DE MOOR ESAT-SISTA, K a t h o l i e k e Universiteit L e u v e n K a r d i n a a l M e r c i e r l a a n 9~ 3001 L e u v e n , Belgium bart. demoor@esat, kuleu ven. ac. be
ABSTRACT. We define a nonlinear generalization of the SVD, which can be interpreted as a restricted SVD with Riemannian metrics in the column and row space. This so-called Riemannian SVD occurs in structured and weighted total least squares problems, for instance in the least squares approximation of a given matrix A by a rank deficient Hankel matrix B. Several algorithms to find the 'minimizing' singular triplet are suggested. This paper reveals interesting and sometimes unexplored connections between linear algebra (structured matrix problems), numerical analysis (algorithms), optimization theory, (differential) geometry and system theory (differential equations, stability, Lyapunov functions). We also point out some open problems. KEYWORDS. (Restricted) singular value decomposition, gradient flows, differential geometry, continuous algorithms, total least squares problems, power method.
1
THE RIEMANNIAN
SVD: M O T I V A T I O N
Since the work by Eckart-Young [14], we know how to obtain the best rank deficient least squares approximation of a given matrix A E Itr215 of full column rank q. This approximation follows from the SVD of A by subtracting from A the rank one matrix u . a . v T, where (u, a, v) is the singular triplet corresponding to the smallest singular value a, which satisfies A v = uo" ~ ATu : va ,
uTu = 1 , v T v : 1.
(1)
Here u E li~p and v E ~q are the corresponding left resp. right singular vector. When formulated as an optimization problem, we obtain B
min
E It~T M
y EIt~q
I]A-BI] ~
subject to
By=0, yTy = 1 ,
(2)
62
B.L.R. De Moor
the solution of which follows from (1) as B = A - u a v T, y - v. Here, a is the smallest singular value of A and it is easy to prove that minB I I A - BII ~ = a 2 if S is required to be rank deficient. This problem is also known as the Total (Linear) Least Squares (TLS) problem and has a long history [1] [2] [14] [16] [17] [20] [25] [27] [28]. The vector y describes a linear relation between the columns of the approximating matrix B, which therefore is rank deficient as required. In this paper we consider two generalizations of the TLS problem. First of all, in the remainder of this Section, we describe how the introduction of user-given weights and/or the requirement that the approximant B is to be structured (e,g. Hankel or Toeplitz), leads to a (nonlinear) generalization of the SVD, which we call the R i e m a n n i a n S V D . In Section 2, we show that the analysis and design of continuous time algorithms provides useful insight in many problems and how they also provide a unifying framework in which several mathematical engineering disciplines meet. In Section 3, we discuss the power method in more or less detail to illustrate this claim. We also derive some new continuous-time algorithms, that are not yet completely understood. In Section 4, we discuss some ideas to generate algorithms for the Riemannian SVD. Conclusions are presented in Section 5. There are at least two important variations on the Total Least Squares problem (2) 1. The first one is called the Weighted TLS problem: p q min ~_~ ~ ( a i j - bij)2wij subject to B 6 n~pxq i=lj=l y6N q
By.o..-_ 0,1 tJ~u '
(3)
where the scalars wij E 1~+ are user-defined weights. An example is given by chosing wii - 1/a~j in which case one minimizes (a first order approximation to) the sum of relative errors squared (instead of the sum of absolute errors squared as in (2)). Another example corresponds to the choice wij E {0, 1) in which case some of the elements of B will be the same as corresponding elements of A (namely the ones that correspond to a weight wij - 1, see also the Example in Section 4.2). Yet other examples of weighted TLS problems are given in [10] [11]. The second extension consists of adding a so-called structural constraint to the optimization problem (2), in which case we have a Structured TLS problem. For instance, we could require the matrix B to be a Hankel matrix. The problem we are solving then is to approximate a given (possible structured) full column rank matrix A by a rank deficient Hankel matrix B. Rank deficient Hankel matrices in particular have important applications in systems and control theory. But in general there are many other applications where it is required to find rank deficient structured approximants [10] [11] [12] [13]. The results of this paper apply to affine matrix structures, i.e. matrices that can be written as a linear combination of a given set of fixed basis matrices. Examples are (block) Hankel, Toeplitz, 1The list of constraints discussed in this paper is not exhaustive. For instance, one could impose constraints on the vectors y in the null space, an example that is not treated here (see e.g. [9]).
63
The Riemannian SVD
Brownian, circulant matrices, matrices with a given zero pattern or with fixed entries, etc... The main result, which we have derived in [10], applies to both Weighted and/or Structured TLS problems and states that these problems can be solved by obtaining the singular triplet that corresponds to the smallest singular value that satisfies: Av-
u T D v u = 1,
D~ua ,
ATu---- D u v a ,
vTDuv-
1.
(4)
Notice the similarity with the SVD expressions in (1). Here A is the structured data matrix that one wants to approximate by a rank deficient one. Du and Dv are nonnegative or positive definite matrix functions of the components of the left and right singular vectors u and v. Their precise structure depends on the weights in (3) and/or the required affine structure of the rank deficient approximant B. To give one example (which happens to be a combination of a Weighted and Structured TLS problem), let us consider the approximation of a full column rank Hankel matrix A 9 ~pxq, p _> q, rank(A) = q, by a rank deficient Hankel matrix B such that [[A- S[[~ is minimized. In this case, the matrix D~ has the form Dv - T v W - 1 T T where W=diag[123
...
qq ...q
...321],
y-
(p-q+l)
times
iv v v oool
and Tv is a banded Toeplitz matrix (illustrated here for the case p - 4, q - 3) of the form"
T~ =
0
V1
V2
V3
0
0
0 0
0 0
Vl 0
v2 vx
va v2
0 v3
"
The matrix Du is constructed similarly as D~, - T u W - 1 T T. Obviously, in this example, both Du and D~ are positive definite matrices. Observe that B has disappeared from the picture, but it can be reconstructed as B = A - multilinear function of (u, a, v), (see the constructive proof in [11] for details). Observe that the modification to A is no longer a rank one matrix as is the case with the 'unstructured' TLS problem (2). Instead, the modification is a multilinear function of the 'smallest' singular triplet, the detailed formulas of which can be found in [10] [11]. We are interested in finding the smallest singular value in (4) because it can be shown that its square is precisely equal to the object function. For more details and properties and other examples and expressions for Du and Dv for weighted and structured total least squares problems we refer to [10] [11] [12] [13]. In the special case that D~ = Ip and Dv = Iq we obtain the SVD expressions (1). In the case that D~ and Dv are fixed positive definite matrices that are independent of u and v, one obtains the so-called Restricted SVD, which is extensively studied in [7] together with some structured/weighted TLS problems for which it provides a solution. In the Restricted
B.L.R. De Moor
54
SVD, D~ and D~ are positive (or nonnegative) definite matrices which can be associated to a certain inner product in the column and row space of A. Here in (4), D~ and D~ are also positive (nonnegative) definite, but instead of being constant, their elements are a function of the components of u and v. It turns out that we can interprete these matrices as Riemannian metrics, an interpretation which might be useful when developping new (continuous-time) algorithms. For this reason, we propose to call the equations in (4), the
Riemannian SVD 2. 2
CONTINUOUS-TIME
ALGORITHMS
As discussed in the previous section, in order to solve a (weighted a n d / o r structured) TLS problem, we need the 'smallest' singular triplet of the (Riemannian) SVD. The calculation of the complete SVD is obviously unnecessary (and in the case of the Riemannian SVD there is even no 'complete' decomposition). For the calculation of the smallest singular value and corresponding singular vectors of a given matrix A, one could apply the power method or inverse iteration to the matrix AT A (see e.g. [16] [24] [29]), which will be discussed in some more detail in the next Section. One of the goals of this paper is to point out several interesting connections between linear algebra, optimization theory, numerical analysis, differential geometry and system theory. We also would like to summarize some recent developments by which continuous time algorithms for solving and analyzing numerical problems have gained considerable interest the last decade or so. Roughly speaking, a continuous time method involves a system of differential equations. The idea that a computation can be thought as a flow t h a t starts at a certain initial state and evolves until it reaches an equilibrium point (which then is the desired result of the computation) is a natural one when one thinks about iterative algorithms and even more, about recent developments in natural information processing systems related to artificial neural networks 3. There are several reasons why the study of continuous time algorithms is important. Continuous time methods can provide new additional insights with respect to and shed light upon existing discrete time iterative or recursive algorithms (such as e.g. in [22], where convergence properties of recursive algorithms are analysed via an associated differential equation). In contrast to the local properties of some discrete time methods, the continuous time approach often offers a global picture. Even if they are not particularly competitive with discrete time algorithms, the combination of parallellism and analog implementation does seem promising for some continuous-time algorithms. In many cases, continuous-time algorithms provide an alternative or sometimes even better understanding of discrete-time versions (e.g. in optimization, see the continuous-time version of interior point techniques in [15, p.126], or in numerical analysis, the self-similar 2This name is slightly misleading in the sense that we do NOT want to suggest that there is a complete decomposition with rain(p, q) different singular triplets, which are mutual 'independent' (they are 'orthogonal') and which can for instance be added together to give an additive decomposition of the matrix A (the dyadic decomposition). There might be several solutions to (4) (for some examples, there is only one), but since each of these solutions goes with a different matrix D~ and D~, it is not exaclty clear how these solutions relate to each other, let alone that they would add together in one way or another to obtain the matrix A. SSpecifically for neural nets and SVD, we refer to e.g. [3] [4] [21] [23] (and the references in these papers).
The Riemannian SVD
65
iso-spectral (calculating the eigenvalue decomposition) or self-equivalent (singular value decomposition) matrix differential flows, see the article by Helmke [19] in this Volume and its rather complete list of references). The reader should realize that it is not our intention to claim that these continuous-time algorithms are in any sense competitive with classical algorithms from e.g. numerical analysis. Yet, there are examples in which we have only continuous-time solutions and for which the discrete-time iterative counterpart has not yet been derived. To give one example of a continuous-time unconstrained minimization algorithm, let us derive the continuous-time steepest descent method and prove its convergence to a local minimum. Consider the minimization of a scalar object function f ( z ) in p variables z E I~p. Let its gradient with respect to the elements of z be Vzf(z). It is straightforward to prove that the system of differential equations ~(t) = - V ~ f ( z ) , z ( 0 ) = z0,
(5)
where z0 is a given initial state, will converge to a local minimum. This follows directly from the chain rule: df(z) = (Vzf(z))TL " = dt
_llVzf(z)ll2
(6)
which implies that the time derivative of f ( z ) is always negative, hence, as a function of time, f ( z ) is non-increasing. This really means that the norm of the gradient is a Lyapunov function for the differential equation, proving convergence to a local minimum. In Section 3.8 we will show how to derive continuous-time algorithms for constrained minimization problems, using ideas from differential geometry. 3
CONTINUOUS-TIME PROBLEM
ALGORITHMS
FOR THE EIGENVALUE
The symmetric eigenvalue problem is the following: Given a matrix C - C T E IRpxp, find at least one pair (x, A) with x E I~p and A E I~ such that C x - " x ~ , x T x -- 1 .
(7)
Of course, this problem is of central importance in most engineering fields and we refer to [16] [24] [29] for references and bibliography. Here we will concentrate on power methods, both in continuous and discrete time. The continuous-time power methods are basically systems of vector differential equations. Some of these have been treated in the literature (see e.g. [6] [18] [26]). Others presented here are new. The difference with the matrix differential equations referred to in the previous section is that typically, these vector differential equations compute only one eigenvector or singular vector while the matrix differential equations upon convergence deliver all of them (i.e. the complete decomposition). Obviously, when solving (structured/weighted) TLS problems, we only need the 'minimal' triplet in (1) or (4). For the TLS problem (2) one might also (in principle) calculate the smallest eigenvalue of A T A and its corresponding eigenvector.
B.L.R. De Moor
56
3.1
G E O M E T R I C INTERPRETATION
Consider the two quadratic surfaces xTcx = 1 and xTx = 1. While the second surface is the unit sphere in p dimensions, the first surface can come in many disguises. For instance, when C is positive definite, it is an ellipsoid. In three dimensions, depending on its inertia, it can be a one-sheeted or two-sheeted hyperboloid or a (hyperbolic) cilinder. In higher dimensions, there are many possibilities, the enumeration of which is not relevant right now. In each of these cases, the vectors Cx and x are the normal vectors at x to the two surfaces. Hence, when trying to solve the symmetric eigenvalue problem, we are looking for a vector x such that the normal at x to the surface x T c x = 1 is proportional to the normal at x to the unit sphere xTx -- 1. The constant of proportionality is precisely the eigenvalue A.
3.2
THE EXTREME EIGENVALUES AS AN UNCONSTRAINED OPTIMIZATION PROBLEM
Consider the unconstrained optimization problem: min f(z)
zE~P
with f ( z ) -
~(zTCz)/(zTz) .
(8)
It is straightforward to see that Vzf(z) = ( C z ( z T z ) - z(zTCz))/(zTz) 2 ,
(9)
from which it follows that the stationary points are given by
C(z/
zVq~z)= ( z / ~ ) ( z T C z ) / ( z T z )
.
This can only be satisfied if x = z/[[z[[ is an eigenvector of C with corresponding eigenvalue x T c x . Obviously, the minimum will correspond to the minimal eigenvalue of C (which can be negative, zero or positive). The maximum of f(z) will correspond to the maximum eigenvalue of C.
3.3
E X T R E M E EIGENVALUES AS A CONSTRAINED OPTIMIZATION PROBLEM
We can also formulate a constrained optimization problem as: min f(x) subject to x ElmP
xTx--1
where
f(x)----xTcx.
(10)
The Lagrangian for this constrained problem is L(x, A ) = x T c x -1- A ( 1 - x T z ) , where A E R is a scalar Lagrange multiplier. The necessary conditions for a stationary point now follow from VxL(x, A) = 0 and V~L(x, A) = 0, and will correspond exactly to the two equations in (7). Observe that these equations have p solutions (x, A) while we are only interested in the minimizing one.
The Riemannian SVD
3.4
67
THE CONTINUOUS-TIME P O W E R M E T H O D
Let us now apply the continuous-time minimization idea of (6) to the unconstrained minimization problem (8). We then find the 'steepest descent' nonlinear differential equation ~. -- - ( C z ( z T z) -- z ( z T C z ) ) / ( z T z )
2.
(11)
An important property of this flow is that it is isonormal, i.e. the norm of the state vector z is constant over time: []z(t)[[ = [Iz(O)[[, Yt > O. This is easy to see since d[[z[[2/dt = 2 zT2; = O, hence [Iz(t)[[ is constant over time. This means that the state of the nonlinear system (11) evolves on a sphere with radius given by [[z(0)[ I. We can assume without loss of generality that [Iz(0)[]- 1. Hence (11)evolves on the unit sphere. Replacing z by x with Ilxll 2 - x T x -- 1 we can rewrite (11) = -cz
+ z
xTCx xTx
.
(12)
We know about this flow that it is isonormal and that it will converge to a local minimum. Hence, (12) is a system of nonlinear differential equations that will converge to the eigenvector corresponding to the minimal eigenvalue of the matrix C. This flow can also be interpreted as a special case of Brockett's [5] double bracket flow (see [19] for this interpretation).
3.5
THE CONTINUOUS-TIME P O W E R M E T H O D F O R THE LARGEST SINGULAR VALUE
The preceeding observations can also be used to derive a system of nonlinear vector differential equations that converges to the largest singular value of a matrix A E I~TM, p _> q. It suffices to choose in (12) C=
() 0 Ar
A 0
and
z =
v
.
(13)
This matrix C will have q eigenvalues given by a~(A), q eigenvalues equal to -a~(A) and p - q eigenvalues equal to 0. Hence the smallest eigenvalue is -amax(A). The first q components of z will go to the corresponding left singular vector u while its last q components converge to the right singular vector v.
3.6
THE D I S C R E T E - T I M E P O W E R M E T H O D
The surprising fact about the nonlinear differential equation (12) is that it can be solved analytically. Its solution is 9 (t) - -
e-c~(o)/ll~-C~x(O)ll,
(14)
which can be verified by direct substitution. If we consider the analytic solution at integer times t = k = 0, 1 , 2 , . . . , we see that x(k + 1) = e-Cx(k)/lIe-Cx(k)[I,
B.L.R. De Moor
68
i o.8
...............i................ ~ ..........
i
..... : .... :. i............. i .............. i................ : .................................
0.6
"~):-... ........
0.4I\,
0 2~-\..x 9
0
i-:~ ...........
i ..............
:.................................
................ 9
!.x.~.:-~ .........
; ................
:................
:................
...... i.. ::-,..:. k ..... i ................ ~ : .....%,:. \ : '...'..'.~: ~ ~ ~
i ............... : :
~ ..........
..........
~...i ~. . . . . . . . . . . . ".'.'::~;;. ;f,T:.~ ,.,,,',~,=.7.:.: ~. . . . . .
: ................
i ................ : : .-_.~~.-~-.~..~ - _
i ............... : : ~
.
,,~_,..
-o.2
..............::..".......:..i................!...............::................i...............
-0.4
..............
-o.~
-o.8
::...............
i ................
:: ...............
i ................
:: ...............
...............
" ...............
i ................
...............
i ...............
i ................
i. ...............
i ................
i................
::................
i ................
i
................
-1
0
0.5
1
1.5
2
2.5
3
Figure 1: As a function of time, we show the convergence of the elements of the vectors u(t) and v(t) of the differential equation (12) with C given by (13), where A is a 4 x 3 'diagonal' matrix with on its diagonal the numbers (5, 3, 1). The initial vector z0 = (u(O) T v(O)T) T is random. This differential system converges to the eigenvector corresponding to the smallest eigenvalue of C in (13), which is -amax(A) = - 5 . Hence, this flow upon convergence delivers the 'largest' singular triplet of the matrix A. The picture was generated using the Matlab numerical integration function 'ode45'.
which shows that the continuous time equation (12) interpolates the discrete time power method 4 for the matrix e -c'. This implies that we now have the clue to understand the global convergence behavior of the flow (12). Obviously, the stationary points (points where = 0) are the eigenvectors of C, but there is only one stationary point that is stable (as could be shown by linearizing (12) around all the stationary points and calculating the eigenvalues of the linearized system). It will always convergence to the eigenvector corresponding to the smallest eigenvalue of C, except when the initial vector x(0) is orthogonal to the smallest eigenvector. These observations are not too surprising since the solution for the linear part in (12) (the first term of the right hand side) is x(t) = e x p ( - V t ) x ( O ) . The second term is a normalization term which at each time instant projects the solution of the linear part back to the unit sphere. We can rewrite equation (12) as xx T . 5~ = ( I - ~-~x ) C x . This clearly shows how ~ is obtained from the orthogonal projection of the vector C x (which is the gradient of the unconstrained object function 0 . h z T C x ) onto the hyperplane which is tangent to the unit sphere at z.
4...which is exactly the reason why (12) is called the continuous-time power method.
The Riemannian SVD 3.7
69
A D I F F E R E N T I A L EQUATION DRIVEN BY T H E RESIDUAL
Another interesting interpretation is the following: Define the residual vector r(t) - Cx x(xTCx/xTx). Then the differential equation (12) reads = -~(t).
Hence, the differential equation is driven by a residual error and when r(t) = 0, we also have ~ = 0 so that a zero residual results in a stationary point. Moreover, using a little known fact in numerical anaylsis [24, p.69] we can give a backward error interpretation as follows. Define the rank one matrix M(t) - r(t).x(t) T. Then we easily see t h a t
( C - M(t))x(t) = x(t))~(t) with A(t)
= (x(t))Tcx(t)/(x(t)Tx(t))
.
The interpretation is that, at any time t, the real number )~(t) is the exact eigenvalue of a modified matrix, namely C - M(t). The norm of the modification is given by IIM(t)l] = IIr(t)ll, which is the norm of the residual vector and from the convergence, we know that
3.8
DERIVATION AS A G R A D I E N T F L O W
Gradient flows on manifolds can be used to solve constrained optimization problems. The idea is to consider the feasible set (i.e. the set of vectors that satisfies the constraints) as a manifold, and then by picking out an appropriate Riemannian metric, determining a gradient flow. Let us illustrate this by deriving a gradient flow to solve the constrained minimization problem (10). The set of vectors that satisfies the constraints is the unit sphere, which is known to be a manifold:
M= { z e~Pl xrz= l }. The tangent space at x is given by the vectors z that belong to
T x M = { z e I~Pl z T x = O }. The directional derivative Dxg(z) is the amount by which the object function changes when moving in the direction z of a vector in the tangent space:
Dxg(z) -~ xTCz
9
Next, we can choose a Riemannian metric represented by a smooth matrix function W(x) which is positive definite for all x E M. It is well known (see e.g. [18] that, given the metric W(x), the gradient Vg can be uniquely determined from two conditions:
1 Compatibility: Dzg(z)= z T C x -
2 Tangency: Vg E T~M
r
x TVg
zTW(x)Vg. = O.
70
B.L.R. De Moor
The unique solution to these two equations applied to the constrained optimization problem (10) results in the gradient given by
Vg(~) =
W(~)-~(C~
-
9
xrW(z)-lCz). ~rW(~)-~
~,From this we obtain the gradient flow: =
-w(~)-~(c~
_ ~
~rw(~)-1C~ ~rw(~)-~ )"
(15)
It is easily seen that the stationary points of this system must be eigenvector-eigenvalues of the matrix C. Convergence is guaranteed because one can easily find a Lyapunov function (essentially the norm of the gradient) using the chain rule (see e.g. [18] for details). Observe that the norm IIx(t)ll is constant for all t. This can be seen from 1 dllzll 2 = xT ~ = _ z T w ( z ) _ X C x + z T w ( z ) _ X C x = 0 2 dt
Hence, if IIx(0)ll = 1, we have IIx(t)l I = 1,Vt > 0. If we choose the Euclidean metric, W ( x ) = I~, we obtain the continuous-time power method (12). An interesting open problem is how to chose the metric W ( x ) such that for instance the convergence speed could be increased and whether a choice for W ( x ) other than Ip leads to new iterative discrete-time algorithms.
3.9
NON-SYMMETRIC CONTINUOUS TIME POWER METHOD
So far we have only considered symmetric matrices. However most of the results still hold true mutatis mutandis if C is a non-symmetric matrix. For instance, the analytic solution to (12) is still given by (14) even if C is nonsymmetric. For the convergence proof, we can find a Lyapunov function using the left and right eigenvectors of the non-symmetric matrix C. Let X E ]tip• be the matrix of right eigenvectors of C while Y is the matrix of left eigenvectors, normalized such that AX-XA, yTx--Ip , ATy = YA , X Y T - Ip . For simplicity we assume that all the eigenvalues of C are real (although this is not really a restriction). The vector y~n E ~P is the left eigenvector corresponding to the smallest eigenvalue. It can be shown that the scalar function L(z) = (xTymin) 2 / ( x T y y T x )
is a Lyapunov function for the differential equations (12) because L > 0, Yr. Note that x - x ( y T x ) , which says that the vector y T x contains the components of x with respect to the basis generated by the column vectors in X (which are the right eigenvectors). The denominator is just the norm squared of this vector of components. The numerator is the component of x along the last eigenvector (last column of X), which corresponds to the smallest eigenvalue. Since g > 0, this component grows larger and larger relative to all other ones, which proves convergence to the 'smallest' eigenvector.
The Riemannian SVD 3.10
71
N O N - S Y M M E T R I C C O N T I N U O U S - T I M E P O W E R M E T H O D F O R T H E SMALLEST SINGULAR VALUE
We can exploit these insights to calculate the smallest singular triplet of a matrix A 6 i~p• as follows. Consider the 'asymmetric' continuous-time power method:
i)
=
-
AT
-~Iq
v
v
~
'
(16)
with = ~(ur~-
v rv) I (~r~ + ~rv) "
Here c~ is a user-defined real scalar. When it is chosen such t h a t a > amin(A), u and v will converge to the left resp. right singular vector corresponding to the smallest singular value of A. This can be understood by realizing that the 2q (we assume t h a t p _> q) eigenvalues of the matrix
AT
-a Iq
are Ai = 4-~/a2 _ a2(A) and that there are p - q eigenvalues equal to c~. Hence, if c~ > hi, the corresponding eigenvalues Ai are real, else they are pure imaginary. So if a > amin, the smallest eigenvalue is A _ _ ~/c~2 _ O'mi 2 n to which (16) will converge. This is illustrated in Figure 2.
3.11
CHRISTIAAN'S F L O W F O R THE SMALLEST SINGULAR VALUE
The problem with the asymmetric continuous time power method (16) is t h a t we have to know a scalar ~ that is an upper bound to the smallest singular value. Here we propose a new continuous-time algorithm for which we have strong indications t h a t it always converges to the 'smallest' singular value, but for which there is no formal proof of convergence yet. Let A 6 n~T M (p >_ q) and r 6 I~ be a given (user-defined) scalar satisfying 0 < r < 1. Consider the following system of differential equations, which we call Christiaan's flow 5:
(it)(aIPi) = -
CA T
-ar
(17)
It is our experience that (u(t), a(t), v(t)) converges to the smallest singular triplet of A. The convergence behavior can be influenced by r (for instance, when r --* 1, there are many oscillations (bad for numerical integration) but convergence is quite fast in time; when r --* 0, there are no oscillations but convergence is slow in time). There are many similarities with the continuous-time algorithms discussed so far. When we rewrite these equations as
it = iJ -
Av-ua, -r
va) ,
(18)
5... after one of our PhD students Christiaan Moons who one day just tried it out and to his surpise found out that it always converges.
72
B.L.R. De Moor
1
!
!
i
1
!
0.8
0.8 "i"
0.6
0.6
0.4
0.4
o~o
~ -0.2 -
-0.4
-0.4
:
:
:
:
!
!
!
!
!
i 2
2.5
3
!
i 3.5
i
-0.2
-0.6
!
L
..... 1
-0.6-
:
-0.8 -0.8 -1 ................................
0
0.5
.
":.................................................................
, 1
1.5
2
2.5
3
.11 0
.
i
!
o 0.5
, 1
.
I 1.5
Figure 2: Convergence as a function of time of the components of u(t) and v(t) of the asymmetric continuous-time power method (16) for the same matrix A and the same random initial vectors u(0) and v(0) as in the previous Figure. The left picture shows the behavior for a = 2, which is larger than the smallest singular value and therefore converges. The right picture shows the dynamic behavior for a - 0.5. Now, all the eigenvalues of the system matrix are pure imaginary. Therefore, there is no convergence but instead, there is an oscillatory regime.
we easily see from (1) that both equations are 'driven' by the (scaled) residual error (see
Section 3.7). Another intriguing connection is seen by rewriting (17) as
(0) (0 01), -I
0
0
A
u
u
which compares very well to (15), except that the metric here is indefinite! As for a formal convergence proof, there are the following facts: It is readily verified that the singular triplets of A are the stationary points of the flow. When we linearize the system around these stationary points, it is readily verified that they are all unstable, except for the stationary point corresponding to the 'smallest' singular triplet of A. However, at this moment of writing, we do not have a formal proof of convergence (for instance a Lyapunov function that proves stability) ~.
4
ALGORITHMS
FOR THE RIEMANNIAN
SVD
In this section, we'll try to use the ideas presented in the previous section to come up with continuous time algorithms for the Riemannian SVD (4) and hence for structured/weighted total least squares problems. The ideas presented here may be premature but on the other 6...and as we have done before, we offer a chique dinner in the most exquis restaurant of Leuven if somebody solves our problem.
The Riemannian SVD
!
I
73
!
0.8
o.6
~. .....
'
~
:
0.4
~: 9 ...... ~i . . . . . . . . . . .
02
:
: .........
~
i .........
i..........
!.........
:
:
:
:
:
i !.........
i ~.........
i i .........
i
i
i
7
8
9
i i ~ ......... i ..........
:
i
9
-0,
2
3
4
i
5
i .........
i .........
. i
6
i
10
Figure 3: Example of the convergence behavior of Christiaan's flow (17) for the same matrix A and initial vectors u(0) and v(0) as in the first Figure. The full lines are the components of u, the dashed ones those of v. Both vectors converge asymptotically to the left and right singular vectors of A corresponding to the smallest singular value.
hand, they offer intriguing and challenging perspectives, which is the reason why we mention them. It should be noted that we have derived a heuristic iterative discrete-time algorithm for the Riemannian SVD in [12], the basic inspiration of which is the classical power method. While this algorithm works well in most cases, there is no formal proof of convergence nor is there any guarantee that it will convergence to (an at least local) minimum. Therefore, we try to go via these continuous-time algorithms to find an approach which would be guaranteed to converge to a local minimum. So far we have not succeeded in doing so, but we hope that the elements presented in this section provide enough new material to convince the reader of the usefulness of the presented approach.
4.1
AN OPTIMIZATION PROBLEM
Let us try to solve (see the equations (4):
uTAv vTATu a = min -- min . u,v uTDvu u,v vTDuv A nice property which can often that for every vector u and v, we The fact that D= is independent continous-time algorithm (5) and da 1 it -du = uTDv"-'--"~( - A v
iJ =
da 1 -- ~ ( - A T u dv vT Du v
be used in manipulating formulas in this framework, is always have uTDvu = vTDuv (see [12] for a proof). of v and Dv is independent of u allows us to apply the derive the system of differential equations uTAv + Dvu u r D v u ) ,
-~" Duv
vTATu ) 9 vT Du v
74
B.L.R. De Moor
While this system will converge to a local minimum, it is not this local minimum that solves our structured/weighted TLS problem. Indeed, to see why, it suffices to consider the special case where Dv = Ip and Du - Iq in which case we recover the system (12)-(13) which converges to -amax(A) and the corresponding singular vectors (and NOT amin(A)!).
4.2
CHRISTIAAN'S FLOW FOR THE RIEMANNIAN SVD
Although we are completely driving on heuristics here, it turns out that the following generalization of Christiaan's flow (17) works very well to find the minimal singular triplet of the Riemannian SVD (4).
with r E II~ a user-defined number satisfying 0 < r < 1. The only difference between (17) and (19) is the introduction of the positive definite metric matrices D,~ and D,~. We have no idea whatsoever about possible convergence properties (that are formally provable), except for the fact that in a stationary point, the necessary conditions (4) are satisfied. Yet, our numerical experience is that this system of nonlinear differential equations converges to a local minimum of the object function. As an example, let A 6 n~6x5 be a given matrix (we took a random matrix), that will be approximated in Frobeniusnorm by a rank deficient matrix B, by not modifying all of its elements but only those elements that are 'flagged' by a '1' in the following matrix
V=
oo
0 1 000 1 0 1 0 1 O 0 1 1 0
1 1 1 0 1
; Let W =
oo
oo
oo
oo
i
i
oo
1
I
c~
I
I
oo oo
0 1 1 1 1 0 0 1 1 1
1
1
oo
i
I
i
oo
1
i
I
oo
oo
i i i
be the elementwise inverse of V. The matrix W contains the weights as in (3) and the element oo means that we impose an infinite weight on the modification of the corresponding element in A (which implies that it will not be modified and that the corresponding element in B will be equal to that in A). It can be shown [11] that the metric matrices D~ and Dv for this flow are diagonal matrices given by
D~=diag(V
v~
),
OH = d i a g ( Y T
u~
)"
Here vi and ui denote the i-th component of v, resp. u and u 2 and v2 are their squares. As an initial vector for Christiaan's flow (19) we took [u(0) T v(0) T] = leT/v/6 eT/v/5] where ek E irk is a vector with all ones as its components. The resulting behavior as a function of
75
The Riemannian SVD
x~xxxx ~
4).
0
2
4
6
8
10
1
14
16
1
20
Figure 4: Vectors u(t) and v(t) as a function of time for Christiaan's flow (19), which solves the problem of least squares approximation of a given matrix A by a rank deficient one, while not all of its elements can be modified as specified by the elements in the matrix V which belong to {0, 1}. The vector differential equation converges to the same solution as the one provided by the discrete-time algorithm of [11], which on its turn is inspired by the discrete-time power method. time is shown in Figure 4. 5
CONCLUSIONS
In this paper, we have discussed how weighted and/or structured total least squares problems lead to a nonlinear generalization of the SVD, which we have called the Riemannian SVD. Next, we have derived several interesting interpretations of the continuous-time power method (geometric, unconstrained and constrained optimization, gradient flow, residual driven differential equation). We have also discussed Christiaan's flow, which is a set of nonlinear differential equations that seems to converge to the 'smallest' singular triplet, both of the SVD and the Riemannian SVD. As of now, there is however no formal proof of convergence. There are interesting analogies between the SVD and the Riemannian SVD, which are enumerated in the following table: There are still several open problems, which all together form a complete research program: Proofs of convergence of some of the algorithms we have introduced here, discretization (e.g. based on geodesic approximations (see e.g. [26])), decreasing the rank with more than 1 while preserving the structure, etc . . . . Also in the applications there are several interesting sideways that could be followed. For instance, in [8] we have shown how our approach leads to an H2-model reduction algorithm, while in [12] [13] we show how the structured total least squares problem with a 'double' Hankel matrix of noisy inputs and outputs is the
B.L.R. De Moor
76
Problem Applications
Decomposition Approximation B Algorithms
'Euclidean' TLS (Eckart-Young) Linear relations, noisy data
SVD of A 'Identity' metric Rank 1 modification Power method Christiaan's flow
'Riemannian' Weighted/Structured TLS Hankel matrices Model reduction, Dynamic TLS Relative error TLS Maximum likelihood Riemannian SVD of A Positive definite metric Du, Dv Multilinear matrix function Power method Christiaan's flow
same as the L2-optimal so-called errors-in-variables problem from system identification. Acknowledgements The author is a Senior Research Associate of the Belgian National Fund for Scientific Research. This work was supported by grants from the Federal Ministry of Scientific Policy (DWTC, with grants IUAP/PAI-17 (Modelling and Control of Dynamical Systems), IUAP/PAI-50 (Automation in Design and Production)), the Flemish NFWO-project no. G.0292.95 (Matrix algorithms for adaptive signal processing systems, identification and control) and the SIMONET (System Identification and Modelling Network) supported by the Human Capital and Mobility Program of the European Commision. I would sincerely like to thank Christiaan Moons, Jeroen Dehaene and Johan Suykens for many lively discussions as well as Thomas De Moor for letting me use his balloon in my attempts to illustrate the equivalence principle of general relativity in Gene Golub's car. References
[1] Adcock R.J. Note on the method of least squares. The Analyst, Vol IV, no.6, Nov. 1877, pp.183-184. [2] Adcock R.J. A problem in least squares. The Analyst, March 1878, Vol V, no.2, pp.5354. [3] Baldi P., Hornik K. Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks, Vol.2, pp.53-58, 1989. [4] Bourlard H., Kamp Y. Auto-association by multilayer perceptrons and singular value decomposition. Biol. Cybern., 59, pp.291-294, 1988. [5] Brockett R.W. Dynamical systems that sort lists and solve linear programming problems. Linear Algebra and its Applications, 146, pp.79-91, 1991.
The Riemannian SVD
77
[6] M.T. Chu, On the continuous realization of iterative processes, SIAM Review, 30, 3, pp.375-387, September 1988. [7] De Moor B., Golub G.H. The restricted singular value decomposition: properties and applications. Siam Journal on Matrix Analysis and Applications, Vol.12, no.3, July 1991, pp.401-425. [8] De Moor B., Van Overschee P., Schelfhout G. H2-model reduction for SISO systems. Proc. of the 12th World Congress International Federation of Automatic Control, Sydney, Australia, July 18-23 1993, Vol. II pp.227-230.7 [9] De Moor B., David J. Total linear least squares and the algebraic Riccati equation. System ~z Control Letters, Volume 18, 5, pp. 329-337, May 1992 s. [10] De Moor B. Structured total least squares and L2 approximation problems. Special issue of Linear Algebra and its Applications, on Numerical Linear Algebra Methods in Control, Signals and Systems (eds: Van Dooren, Ammar, Nichols, Mehrmann), Volume 188-189, July 1993, pp.163-207. [11] De Moor B. Total least squares for aJi~nely structured matrices and the noisy realization problem . IEEE Transactions on Signal Processing, Vol.42, no.11, November 1994. [12] De Moor B. Dynamic Total Linear Least Squares. SYSID '94, Proc. of the 10th IFAC Symposium of System Identification, 4-6 July 1994, Copenhagen, Denmark. [13] De Moor B., Roorda B. L2-optimal linear system identification: Dynamic total least squares for SISO systems. ESAT-SISTA TR 1994-53, Department of Electrical Engineering, Katholieke Universiteit Leuven, Belgium. Accepted for publication in the Proc. of 33rd IEEE CDC, Florida, December 1994. [14] Eckart C., Young G. The approximation of one matrix by another of lower rank. Psychometrika, 1, pp.211-218, 1936. [15] Fiacco A.V., McCormick G.P. Nonlinear programming; Sequential unconstrained minimization techniques. SIAM Classics in Applied Mathematics, 1990. [16] Golub G.H., Van Loan C. Matrix Computations. Johns Hopkins University Press, Baltimore 1989 (2nd edition). [17] Golub G.H., Van Loan C.F. An analysis of the total least squares problem. Siam J. of Numer. Anal., Vol. 17, no.6, December 1980. [18] Helmke U., Moore J.B. Optimization and dynamical systems. CCES, Springer Verlag, London, 1994. [19] Helmke U. Isospectral matrix flows for numerical analysis. This volume. 7Reprinted in the Automatic Control World Congress 1993 5-Volume Set, Volume 1: Theory. 8Forms also the basic material of a Chapter in "Peter Lancaster, Leiba Rodman. Algebraic Riccati Equations. Oxford University Press, 1994.
78
B.L.R. De Moor
[20] Householder A.S., Young G. Matrix approximation and latent roots. Americ. Math. Monthly, 45, pp.165-171, 1938. [21] Kung S.Y., Diamantaras K.I., Taur J.S. Neural networks for extracting pure/constrained/oriented principal components. Proc. of the 2nd International Workshop on SVD and Signal Processing, June 1990.
[22] Ljung L. Analysis of recursive stochastic algorithms. IEEE Transactions on Automatic Control, Vol.AC-22, no.4., August 1977, pp.551-575. [23] Oja E. A simplified neuron model as a principal component analyzer. J. Math. Biology, 15, pp.267-273, 1982. [24] Parlett B. The symmetric eigenvalue problem. Prentice Hall, Englewood Cliffs, NJ, 1980. [25] Pearson K. On lines and planes of closest fit to systems of points in space. Phil. Mag., 2, 6-th series, pp.559-572. [26] S.T. Smith, Geometric Optimization Methods for Adaptive Filtering, PhD Thesis, Harvard University, May 1993 Cambridge, Massachusetts, USA. [27] Van Huffel S., Vandewalle J. The Total Least Squares Problem: Computational Aspects and Analysis. Frontiers in Applied Mathematics 9, SIAM, Philadelphia, 300 pp., 1991. [28] Young G. Matrix approximation and subspace fitting. Psychometrika, vol.2, no.l, March 1937, pp.21-25. [29] Wilkinson J. The algebraic eigenvalue problem. New York, Oxford University Press, 1965.
SVD AND SIGNAL PROCESSING, III Algorithms, Architectures and Applications M. Moonen and B. De Moor (Editors) 9 1995 Elsevier Science B.V. All rights reserved.
CONSISTENT
SIGNAL
RECONSTRUCTION
79
AND
CONVEX
CODING
N.T. THAO
Department of Electrical and Electronic Engineering Hong Kong University of Science and Technology Clear Water Bay, Kowloon Hong Kong M. VETTERLI
Department of Electrical Engineering and Computer Science University of California, Berkeley Berkeley, CA 9~720 U.S.A. ABSTRACT. The field of signal processing has known tremendous progress with the development of digital signal processing. The first foundation of digital signal processing is due to Shannon's sampling theorem which shows that any bandlimited analog signal can be reduced to a discrete-time signal. However, digital signals assume a second digitization operation in amplitude. While this operation, called quantization, is as deterministic as time sampling, it appears from the literature that no strong theory supports its analysis. By tradition, quantization is only approximately modeled as an additive source of uniformly distributed and independent white noise. We propose a theoretical framework which genuinely treats quantization as a deterministic process, is based on Hilbert space analysis and overcomes some of the limitations of Fourier analysis. While, by tradition, a digital signal is considered as the representation of an approximate signal (the quantized signal), we show that it is in fact the representation of a deterministic convex set of analog signals in a Hilbert space. We call the elements of the set the analog estimates consistent with the digital signal. This view leads to a new framework of signal processing which is non-linear and based on convex projections in Hilbert spaces. This approach has already proved effective in the field of high resolution A/D conversion (oversampling, Sigma-Delta modulation), by showing that the traditional approach only extracts partial information from the digital signal (3dB of SNR are "missed" for every octave of oversampling).
80
N. Z Thao and M. Vertterli
The more general motivation of this paper is to show that any discretization operation, including A/D conversion but also signal compression, amounts to encoding sets of signals, that is, associating digital signals with sets of analog signals. With this view and the framework presented in this paper, directions of research for the design of new types of high resolution A/D converters and new signal compression schemes can be proposed. KEYWORDS. A/D conversion, digital representation, oversampling, quantization, SigmaDelta modulation, consistent estimates, convex projections, set theoretic estimation, coding.
1
INTRODUCTION
Although numbers are usually thought of real continuous numbers in theory, signal processing is nowadays mostly performed digitally. Traditionally, digital signals are considered as the encoded version of an approximated analog signal. In many cases, the approximation error is considered negligible and digital numbers are thought of quasi-continuous. However, this assumption starts to be critical in more and more emerging fields such as high resolution data conversion (oversampled A/D conversion) and signal compression. In this paper, we ask the basic question of the exact correspondence which exists between analog signals and there encoded digital signals. This starts by reviewing the existing foundations of analog-to-digital (A/D) conversion. It is known that A/D conversion consists of two discretization operations, that is, one in time and one in amplitude. While a strong theory (Shannon's sampling theorem) describes the operation of time discretization, we will see in Section 2 that the analysis of the amplitude discretization, or quantization, is only approximate and statistical. This approach turns out to be insufficient in fields such as oversampled A/D conversion. To find out what the exact analog information contained in a digital signal is, it is necessary to have a more precise description of the whole A/D conversion chain. In Section 3 we define a theoretical framework which permits a more precise description of A/D conversion. To do this, we go back to the basic description of an analog signal as an element of a Hilbert space (or Euclidean space in finite dimension), and wedescribe any signal transformation geometrically in this space, instead of using the traditional Fourier analysis which is limited to time-invariant and linear transformations. In this framework, we show that the precise meaning of a digital signal is the representation of a deterministic convex set of analog signals. The elements of the set are called the analog estimates consistent with the digital signal. Because of the convexity property, we show that, given a digital signal, a consistent estimate must be picked as a necessary condition for optimal reconstruction. With this new interpretation, digital signal processing implies a new framework of (nonlinear) signal processing based on convex projections in Hilbert spaces and presented in Section 4. In fact, the case of A/D conversion which is thoroughly considered in this paper is only a particular case of digitization system. The more general motivation of this paper is to
81
Consistent Signal Reconstruction and Convex Coding
show that the basic function of any digitization system, including high resolution data acquisition systems (Section 5) but also signal compression systems, is to associate digital representations with sets of analog signals, or, to encode sets of analog signals. Not only does this view give a genuine description of their functions, but it indicates new directions of research for the design of A/D converters and signal compression systems. 2
CLASSICAL PRESENTATION
OF A / D C O N V E R S I O N
The term of "digital signal processing" often designates what should be actually be called "discrete-time signal processing" [1]. Thanks to Shannon's sampling theorem, it is known that any bandlimited analog signal can be reduced to a discrete-time signal, provided that the sampling rate is larger than or equal to the Nyquist rate, that is, twice the maximum frequency of the input signal. Mathematically speaking, there exists a invertible mapping between bandlimited continuous-time signals x(t) and sequences (Zk)keZ such that 1 xk = x ( k T s ) , provided that T, = ]'8 >_ 2fro, where fm is the maximum frequency of x(t). Therefore, any processing of the continuous-time signal x(t) can be performed in the discrete-time domain. This constitutes the foundation of discrete-time processing. However, digital signal processing assumes that a second discretization in amplitude, or quantization, is performed on the samples, as indicated by Figure 1. The digital output bandlimited
x(t)
real
-Isamplerl
-I
real
xk _! quantizer l
I
time discretization
ampfitude discretization
c~
integer
_1 coder I
-I
linear scaling
dk
I
Figure 1: Analog-to-digital (A/D) conversion sample dk of an A/D converter is an integer representation of ck which is a quantized version of the continuous-amplitude sample xk. The transformation from Xk to Ck is known to introduce an error ek " - C k - Xk, called the quantization error. While the time discretization process is supported by a solid theory, it appears from the literature that there only exists an approximate analysis of the quantization process. Either the quantization error is neglected and the quantization operation is considered as "transparent", or, when some close analysis is needed, it is commonly modeled as a uniformly distributed and independent white noise [2, 1] 9 This leads to the classical mean squared quantization error of ~where q is the 12 quantization step size. However, this model, which is in fact only accurate under certain conditions [3, 4], does not take into account the deterministic nature of the quantization operation. This is particularly critical when dealing with oversampled A/D conversion. Oversampiing is commonly used in modern data conversion systems to increase the resolution of conversion while using coarse quantization. While the independent white noise model validity conditions become less and less valid with oversampling [4], it is still used as basic model to recover a high resolution estimate of the source signal from the oversampled and
82
N.T. Thao and M. Vertterli
coarsely quantized signal. With this model, a frequency analysis of the quantized signal shows in the frequency domain that only a portion of the quantization error energy lies in the input baseband region. Thus, the total energy of quantization noise can be reduced by the oversampling factor R = 2f-~, by using a linear lowpass filter at cut off frequency f,n (see Figure 2). In practice, the lowpass filtering is performed digitally on the encoded lowpass filter cut off =fm
=Isampler
quantizerI (~)
I
v
c(o)) X (co)~ in-band E ((o),~
Wj
-rr,
2~/R
error rr,
(b) Figure 2: Oversampled A/D conversion. (a) Principle: the sampling is performed at the frequency fs > 2fro. (b) Power spectrum of the quantized signal (Ck)keZ with the white quantization noise model. version (dk)keZ of represented.
(Ck)keZ. On
Figure 2(a), only the equivalent discrete-time operation is
Although this noise reduction can be observed in practice under certain conditions, this does not tell us how much exactly we know about the analog source signal from the oversampled and quantized signal. We can already give a certain number of hints which tell us that a linear and statistical approach of the quantization process is not sufficient to give a full analysis of the signal content process. First, it is not clear whether the in-band noise which cannot be canceled by the lowpass filter is definitely irreversible. Because quantization is a deterministic process, there does exist some correlation between the input signal and the quantization error signal, even after filtering. Second, with the linear filtering approach, it appears that the quantization mean squared error (MSE) has a non-homogeneous dependence with the time resolution and the amplitude resolution. Indeed, the MSE is divided by 4 when the amplitude resolution is multiplied by 2 (that is, q is divided by 2), whereas it is divided by 2 only when the time resolution is multiplied by 2 (that is, R is multiplied by 2). This is a little disappointing when thinking of A/D conversion as the two dimensional discretization of a continuous graph. In fact, an example can already be given which shows by some straightforward mechanisms that the in-band noise is indeed not irreversible. Figure 3 shows a numerical example
Consistent Signal Reconstruction and Convex Coding
2q
! ,
.
.
! .
:
:
quantization threshold
' ,.
~ ~ ,
~
analog input signal : x(t)
3q/2 ................. :-- * -* '--:- :- *- ". z~* -~ i
'
:
"
'
.
i
X
',
i
'
.
i
i
83
'
,
9 quantized signal
ck
,, estimate c k obtained from linearfiltering of
ck
~, projection of the estimate
ckonC
.... i .... i .... ) .... i 1 :2 3 4
5 6
'7 '8 9 fO 1.1 1~2 13 1'4 1.5 1'6 17
I remaining error
Figure 3: Example of oversampling and quantization of a bandlimited signal with reconstruction by linear filtering. of a bandlimited signal x(t), shown by a solid line, which is oversampled by 4 and quantized, giving a sequence of values (ck)keZ represented by black dots. The classical discrete-time reconstruction (~k)keZ obtained by lowpass filtering (ck)keZ is shown by the sequence of crosses. Some error represented by grey shades can be observed between the signal reconstruction ((3k)keZ and the samples of the input signal. We know that this error forms a signal located in the frequency domain in the baseband region. However, some anomalies can be observed in the time domain. At instants 11 and 12, it can be seen that the values of (dk)kez are larger than q, while the given values of the quantized signal (ck)keZ tell us that the input signal's samples necessarily belong to the interval [0, q]. Not only is the sequence (ck)keZ not consistent with the knowledge we actually have about the source input signal, but this knowledge also gives us a deterministic way to improve the reconstruction estimate (ck)keZ. Indeed, although we don't know where exactly the samples of the input signal are located within the interval [0, q] at instants 11 and 12, we know that projecting the two respective samples of (ck)keZ on the level q leads to a necessary reduction of the error (see Figure 3). This shows that the in-band error is not irreversible. These hints show that a new framework of analysis is necessary.
3 3.1
NEW ANALYSIS OF A/D CONVERSION SIGNAL
ANALYSIS
FRAMEWORK
The goal is to define a framework where quantization can be analyzed in a deterministic way with the given definition of an error measure. Bandlimited analog signals are usually formalized as elements of the space /~2(R) of square summable functions, where Fourier decomposition is applicable. Thanks to Shannon's sampling theorem, the analog signals x(t) bandlimited by a maximum frequency fm can be studied as elements of the space s of square summable sequences, thanks to the
84
N. T Thao and M. Vertterli
invertible mapping
z(t)
, (xk)k~Z E L2(Z) where xk = x(kT,),
under the condition that fa = ~1 _ 2fro. Errors between the bandlimited analog signals are measured using the canonical norm of s and can also be evaluated in the discrete-time space/:2(Z) using its own canonical norm, thanks to the relation:
• f~ I~(t)l~t = ~ I~kl~ T8
R
kEZ
Unfortunately, this framework cannot be used to study quantization because the quantized version (ck)keZ of an element (Zk)keZ of s is not necessarily an element of s (or, is not necessarily square summable). For example, using the quantization configuration of Figure 3, although a sequence (Xk)kE Z may be decaying towards 0 when k goes to infinity, its quantized version (ck)keZ never goes below ~ in absolute value. On the other hand, while the MSE type of error measure can be applied for the analysis of quantized signals, it cannot be applied to the elements of s since it would systematically lead to the value 0. Therefore, we propose to confine ourselves to another space of bandlimited signals which can be entirely defined on a finite time window [0, T0]. Precisely, we assume that the sinusoidal components of the Fourier series expansion of x(t) on [0, T0] are zero as soon as the corresponding frequencies are larger than fro. This is equivalent to saying that the T0-periodized version of z(t) defined on [0, To] is bandlimited by the maximum frequency f,n. Under this assumption, we have a finite time version of Shannon's sampling theorem. It can be easily shown that, under the condition ~N-1 _ 2fro equivalent to the Nyquist condition, there is an invertible mapping between z(t) and its discrete-time version X = (Xk)l 1, C is obviously the N dimensional cross-product of real intervals and therefore forms geometrically a hypercube of R N parallel to the canonical axes. As a generalization of the case N = 1, the quantized signal C appears to be the particular consistent estimate located at the geometric center of C. As in the case N = 1, we consider that the digital output D is the encoded version of the whole set C, not of the signal C. In the case of oversampled A/D conversion, the quantization operation is performed, not on any element of R N, but on the sampled version of bandlimited signals only. Indeed, it is easy to see from the assumption of Section 3.1 that the bandlimited signals have a finite Fourier series expansion containing not more than 2f,.,To + 1 components. As a
86
N. I". Thao and M. Vertterli
consequence, they belong to a space of finite dimension equal to W = [2freT0 + 1], where [y] designates the smallest integer greater than or equal to y. As a second consequence, their sampled version also belongs to a W dimensional space, since the sampling operation applied on bandlimited signals is a linear and invertible mapping. Because of the Nyquist rate condition ~ > 2fro, note that we necessarily have W < N. Therefore, the sampled versions of the bandlimited signals belong to a W dimensional subspace S of l:t N. By abuse of language, we call 5` the space of bandlimited discrete-time signals. It can be shown that the dimensional ratio N coincides approximately with the oversampling ratio R. To recapitulate, in the oversampling context, the inputs to the quantizer are elements of the subspace S C R g. Once X E 5' is quantized into C, the complete knowledge which is available about X is that X belongs to the set S N 17 where 17 =bf Q-I[C]. We will say that S g117 is the set of estimates consistent with C. This set is geometrically represented in Figure 5.
..........1
Figure 5: Geometric representation of oversampled A/D conversion.
3.3
NECESSITY FOR CONSISTENT RECONSTRUCTION
In the previous section, it was shown that when an input signal is quantized, the exact information which remains available to us is that it belongs to the set of consistent estimates. However, nothing tells us until now that we must pick a consistent estimate if we want to estimate the input signal from its quantized version. We show in this section that this is in fact the case in a certain sense. It can be easily shown from the previous section that sets of consistent estimates are convex. We recall that .,4 is a convex set if and only if for any couple of elements X, Y E `4, the segment [X, Y] is entirely included in ,4. Because the considered norm I1" II in R g is a euclidean norm, the convexity property will appear to play an important role thanks to the following lemmas: L e m m a 3.1 [7] Let X be an element of I~ N and A C I~ N be a convex set. There exists a unique element X ' of the closure -~ of A such that for all Y E `4, I l Z ' - Y[[ < [Iz - Yli. The transformation from X to X ~ is then a mapping of R N called the convez projection on
Consistent Signal Reconstruction and Convex Coding
87
A~ L e m m a 3.2 [8] If X I is the convex projection of X on a convex set A C R N and X q~ "A, then for all Xo e ,4, [ I X ' - Xo[[ < [ I X - Zo[[. These lemmas are illustrated by Figure 6. They lead to the following proposition.
Figure 6: Geometric representation of convex projection. P r o p o s i t i o n 3.3 Let Xo E I~ N be the input of a quantizer, Co the output of the quantizer and X an element of R y which does not belong to the set A of estimates consistent with Co. Then, although we don't know where the input Xo is located within the set A, the distance between X and Xo can be deterministically reduced 1 by a convex projection of X on .A. We recall that .A is equal to Q-I[C0] without oversampling, and SNQ-I[C0] with oversampiing. In any case, the operation of convex projection on .A is uniquely determined by the knowledge of Co. As a conclusion, when reconstructing a discrete-time signal from its quantized version Co, any non-consistent estimate is by necessity non-optimal and can be deterministically improved using the knowledge of Co.
3.4
DETERMINISTIC ANALYSIS OF OVERSAMPLED A/D CONVERSION
We have already seen from Figure 3 an example where the reconstruction estimate C = (ck)l 0 such that Vn E N, a,~ E [e, 2 - e]. In practice, the speed of convergence can be often accelerated by empirical adjustments of the coefficients an in ]1, 2[. One drawback of the alternating projection algorithm is that it does not permit parallel processing. A new algorithm involving parallel projections was recently introduced by Combettes [13] and based on the following operation
Xn+l -- ~
'Wi,n" Pi[Zn], where Wi,n >_ 0 and ~
ieln
wi,,~ = 1,
ieI,,,
and where In is a subset of indices of {1,...,p}. Qualitatively speaking, at each step n, a certain number of sets among C1, ..., Cv is selected (the set of the indices of the selected sets is called In) and the convex projections of Xn on these selected sets are applied. This forms a set of points {Pi[X,~] / i E In) and Xn+l is chosen in the convex envelop of this set. These operations are illustrated in Figure 7. The distance of the estimate X,~ with any element of ,4 is shown to be reduced by this transformation [14] and the infinite iteration is proved to converge to an element of ,4 under certain conditions on the sequence (In)heN [13]. The admissible choices of (I~)~eN include two particular cases: (i) I,~ = {1, ...,p}: This is the case where all convex projections are performed in parallel at each step. (ii) I,~ = {n mod p + 1}: This fails back to the case of alternating projections. A version with relaxation coefficients is also introduced in [13] as: X,,,+I = an" ~ wi,,~Vi[X,-,] + (1 - an)" X~,. iE I,., The convergence to an element of ,4 is shown to be guaranteed if 3e > 0, gn E N, an E [e, 2Lr, - e] where _
E ~ . ~,~
- It- , 7o 5
APPLICATION
IIP~[X~]-
X~ll ~
itl TO HIGH RESOLUTION
DATA CONVERSION
Although the deterministic approach was introduced on the simple version of oversampled A / D conversion in Section 3, it is also applicable to modern techniques of high resolution data conversion such as oversampled ~ A [15, 16]. The conversion scheme is similar to that of Figure 2, but the quantizer is replaced by a more sophisticated circuit called a ~ A 3The reduction of distance is strict when Xn ~t S rood v+l and ~n E [0, 2].
Consistent Signal Reconstruction and Convex Coding
91
modulator, including an integrator, a quantizer and a feedback loop (see Figure 9). This
ck
I
ZA I
sampler
,
(a)
I
-I
I
TM
V% (b) Figure 9: 2 A modulation. (a) Overall principle. (b) Detail of the ~A modulator. type of data conversion allows the use of very coarse quantization (down to one bit), and thus simple circuitry, while reproducing a high resolution estimate after lowpass filtering. Although the conditions of validity of the white quantization noise model are not really applicable here [4], ~ A modulation is still classically analyzed using this model [16]. In this context, it is shown that a ~A behaves like an additive source of independent noise whose spectrum is "shaped" as shown in Figure 10. Then, it is easy to show that the ~,. i ~
portion of noise energy contained in the baseband of the quantized signal decreases with the oversampling ratio R as R^{-3}, which represents a decrease of 9 dB per octave of R. In spite of the limited validity of the assumed model, this result is observed in practice. More sophisticated architectures of ΣΔ modulation exist, which include a higher number of integrators [17]. In general, for an n-th order ΣΔ modulator, the noise energy remaining in the baseband of the quantized signal depends on R as R^{-(2n+1)}. Now, the same kind of question as in Section 3 can be raised here: what do we know exactly about a bandlimited signal after it is oversampled and processed through a ΣΔ modulator? Like a single quantizer, a ΣΔ modulator can also be studied as a many-to-one mapping of R^N. The set C of estimates consistent with the output of a ΣΔ modulator can be obtained
by inversion of this mapping. It is shown in [5, 6] that the set is no longer a hypercube, but a parallelepiped (the edges are no longer perpendicular). However, this is still a convex set, and it is shown that the quantized signal C = (c_k) is still located at its geometric center. As in Section 3, the set of consistent estimates is S ∩ C. This is geometrically represented in Figure 11.
Figure 11: Geometric representation of oversampled ΣΔ modulation.
Although the distance between X and C, due to the in-band error remaining in the quantized signal C, decreases with R faster than in the case of simple quantization, it appears that C is still not necessarily a consistent estimate. In fact, numerical experiments performed on bandlimited and T0-periodic signals [5, 6] show that the MSE yielded by consistent estimates decreases on average with R in O(R^{-4}) instead of O(R^{-3}). In general, for an n-th order ΣΔ modulator, it was shown that the average MSE of consistent estimates behaves in O(R^{-(2n+2)}) instead of O(R^{-(2n+1)}), implying, as in the case of simple quantization, a faster decrease of MSE by 3 dB per octave of R, regardless of the order n. With a deterministic approach, these experiments show that the output of a ΣΔ modulator contains more information about the input signal than that recovered with the classical approach of A/D conversion.
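As a quick arithmetic check of the rates just quoted, the following lines tabulate the decibels of MSE reduction gained per octave of R under the classical O(R^{-(2n+1)}) behaviour and the O(R^{-(2n+2)}) behaviour of consistent estimates (an illustrative computation only).

```python
import math

# dB of MSE reduction obtained by doubling the oversampling ratio R,
# for the classical O(R^-(2n+1)) rate and the consistent O(R^-(2n+2)) rate.
for n in (1, 2, 3):
    classical = 10 * math.log10(2 ** (2 * n + 1))
    consistent = 10 * math.log10(2 ** (2 * n + 2))
    print(f"order {n}: classical {classical:.1f} dB/octave, "
          f"consistent {consistent:.1f} dB/octave")
# order 1 gives ~9.0 vs ~12.0 dB/octave: the quoted 3 dB/octave gain.
```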
6 CONCLUSION AND RELATED RESEARCH
The full meaning of a digital signal is obtained by a deterministic analysis of the digitization process as a many-to-one mapping. Thus, a digital signal is the representation of, not a single estimate, but a whole set of analog signals, called the set of consistent estimates. This set plays two roles: (i) it gives the exact knowledge of the possible locations of the original analog signal; (ii) it is the set where a signal should be picked when estimating the original signal from its digital version. The second item is due to the convexity of the set, as observed on classical quantization schemes. With this approach, not only is a more precise analysis of the A/D conversion process given, but, in the context of oversampling and ΣΔ modulation, it also leads to the conclusion that a digital output signal contains more analog information about the input
signal than that traditionally recovered by the classical analysis of A/D conversion. Namely, the MSE of consistent estimates decreases with the oversampling ratio R faster than that of the classical linear reconstruction estimate by 3 dB per octave. This new approach to digital signals implies a new framework of signal processing based on convex projections in Hilbert spaces, derived from the field of set theoretic estimation. This research leads to the new idea that the intrinsic function of an A/D converter is to split the space of analog input signals into convex sets, and assign a digital representation to each of them. For this reason, we say that an A/D converter is a convex coder. The intrinsic performance of a convex coder can be evaluated by its ability to split the input space into small sets with respect to the considered error measure. Recent research has been done to measure the intrinsic performance of an oversampled A/D converter or a ΣΔ modulator [18, 19, 20]. Figures 12 and 13 show that the evolution of the intrinsic performance of a ΣΔ modulator with the oversampling ratio R or with the order n of the modulator can be graphically observed by the set partition it defines in the input space. The intrinsic performance of the encoder can be measured by the average MSE of optimal
reconstruction, which consists of picking for each cell of the input space partition its centroid. It was shown in [19, 20] that optimal reconstruction yields the same MSE behavior in R as consistent reconstruction. This input space view can be a new direction for the design of high resolution data converters, traditionally designed using the noise shaping approach. The convex coding approach can also be applied to signal compression [21]. Although this field implies a digital-to-digital transformation, the input signal is usually considered as quasi-continuous in amplitude. In this context, it is shown in [21] that classical signal compression schemes such as block DCT coding can be analyzed as convex coding schemes. An example of a new signal compression scheme is proposed by a direct and active control of the encoded sets.
Figure 12: Partition defined by a first order ΣΔ modulator in the two-dimensional space of T0-periodic sinusoids with arbitrary amplitude and phase: (a) case of oversampling ratio R = 4; (b) case R = 6.
Figure 13: Partition defined by a ΣΔ modulator in the two-dimensional space of T0-periodic sinusoids with arbitrary amplitude and phase at oversampling ratio R = 4. (a) Single-loop case. (b) Double-loop case.
References
[1] A.V. Oppenheim and R.W. Schafer, Discrete-Time Signal Processing. Prentice Hall, 1989.
[2] N.S. Jayant and P. Noll, Digital Coding of Waveforms. Prentice-Hall, 1984.
[3] W.R. Bennett, "Spectra of quantized signals," Bell System Technical Journal, vol. 27, pp. 446-472, July 1948.
[4] R.M. Gray, "Quantization noise spectra," IEEE Trans. Information Theory, vol. IT-36, pp. 1220-1244, Nov. 1990.
[5] T.T. Nguyen, "Deterministic analysis of oversampled A/D conversion and ΣΔ modulation, and decoding improvements using consistent estimates," Ph.D. dissertation, Dept. of Elect. Eng., Columbia Univ., Feb. 1993.
[6] N.T. Thao and M. Vetterli, "Deterministic analysis of oversampled A/D conversion and decoding improvement based on consistent estimates," IEEE Trans. on Signal Proc., vol. 42, pp. 519-531, Mar. 1994.
[7] D.G. Luenberger, Optimization by Vector Space Methods. Wiley, 1969.
[8] L.M. Bregman, "The method of successive projection for finding a common point of convex sets," Soviet Mathematics - Doklady, vol. 6, no. 3, pp. 688-692, May 1965.
[9] N.T. Thao and M. Vetterli, "Reduction of the MSE in R-times oversampled A/D conversion from O(1/R) to O(1/R^2)," IEEE Trans. on Signal Proc., vol. 42, pp. 200-203, Jan. 1994.
[10] D.G. Luenberger, Linear and Nonlinear Programming. Wiley, 1984.
[11] P.L. Combettes, "The foundations of set theoretic estimation," Proc. IEEE, vol. 81, no. 2, pp. 1175-1186, Feb. 1993.
[12] D.C. Youla and H. Webb, "Image restoration by the method of convex projections: part 1 - theory," IEEE Trans. Medical Imaging, vol. 1, no. 2, pp. 81-94, Oct. 1982.
[13] P.L. Combettes and H. Puh, "A fast parallel projection algorithm for set theoretic image recovery," Proc. IEEE Int. Conf. ASSP, vol. V, pp. 473-476, Apr. 1994.
[14] P.L. Combettes and H. Puh, personal communication, May 1994.
[15] J.C. Candy, "A use of limit cycle oscillations to obtain robust analog-to-digital converters," IEEE Trans. Commun., vol. COM-22, pp. 298-305, Mar. 1974.
[16] J.C. Candy and G.C. Temes, eds., Oversampling Delta-Sigma Data Converters: Theory, Design and Simulation. IEEE Press, 1992.
[17] S.K. Tewksbury and R.W. Hallock, "Oversampled, linear predictive and noise shaping coders of order N > 1," IEEE Trans. Circuits and Systems, vol. CAS-25, pp. 436-447, July 1978.
[18] S. Hein, K. Ibraham, and A. Zakhor, "New properties of sigma-delta modulators with dc inputs," IEEE Trans. Commun., vol. COM-40, pp. 1375-1387, Aug. 1992.
[19] N.T. Thao and M. Vetterli, "Lower bound on the mean squared error in multi-loop ΣΔ modulation with periodic bandlimited signals," Proc. 27th Asilomar Conf. on Signals, Systems and Computers, Pacific Grove, CA, Nov. 1993.
[20] N.T. Thao and M. Vetterli, "Lower bound on the mean squared error in oversampled quantization of periodic signals," IEEE Trans. Information Theory. Submitted June 1993, revised Sept. 1994.
[21] K. Asai, N.T. Thao, and M. Vetterli, "A study of convex coders with an application to image coding," Proc. IEEE Int. Conf. ASSP, vol. V, pp. 581-584, Apr. 1994.
PART 2
ALGORITHMS AND THEORETICAL CONCEPTS
THE ORTHOGONAL QD-ALGORITHM
U. VON MATT
Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, U.S.A.
[email protected]
ABSTRACT. We present the orthogonal qd-algorithm to compute the singular value decomposition of a bidiagonal matrix. This algorithm represents a modification of Rutishauser's qd-algorithm, and it is capable of determining all the singular values to high relative precision. We also introduce a generalization of the Givens transformation, which has applications besides the orthogonal qd-algorithm. KEYWORDS. Generalized Givens transformation, Newton's method, Laguerre's method, orthogonal qd-algorithm, singular value decomposition.
1 INTRODUCTION
In 1990 J. Demmel and W. Kahan showed that all the singular values of a bidiagonal matrix are determined to high relative accuracy by the entries of the matrix [2]. They modified the SVD-algorithm by Golub and Reinsch [6] as implemented in the LINPACK library [3] by introducing special zero shifts to compute the small singular values to high relative accuracy. Their code is now a part of the LAPACK library [1]. A different approach is proposed by K. V. Fernando and B. N. Parlett [4]. They use a modification of Rutishauser's qd-algorithm (cf. [5, 9, 10, 11, 12, 13]) to compute the singular values of a bidiagonal matrix. However, their algorithm can no longer be expressed as a sequence of orthogonal transformations applied to the bidiagonal matrix. Consequently, it is not possible to compute the singular vectors simultaneously with the singular values. Our approach is based on Rutishauser's qd-algorithm, too. But the orthogonal qd-algorithm can be expressed as a sequence of Givens rotations applied from the left and the right to the bidiagonal matrix. This enables us to compute the singular vectors along with the singular values.
2 GENERALIZED GIVENS TRANSFORMATION
Usually the Givens transformation
G = \begin{pmatrix} c & s \\ -s & c \end{pmatrix}
with c^2 + s^2 = 1 is determined such that
G \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} r \\ 0 \end{pmatrix}.
The matrix G is therefore used to selectively annihilate elements in a vector or a matrix. But it is also possible to introduce another value \sigma different from zero:
G \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} r \\ \sigma \end{pmatrix}.
The value of r is given by r = \pm\sqrt{x_1^2 + x_2^2 - \sigma^2}. Obviously, this transformation is only possible if |\sigma| \le \sqrt{x_1^2 + x_2^2}. It is easily verified that
\begin{pmatrix} c \\ s \end{pmatrix} = \frac{1}{x_1^2 + x_2^2} \begin{pmatrix} x_1 & x_2 \\ x_2 & -x_1 \end{pmatrix} \begin{pmatrix} r \\ \sigma \end{pmatrix}.
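A small numerical sketch of this transformation, and of the differential variant introduced in the next paragraph, is given below (my own illustration in Python/numpy, not code from the paper); it evaluates c and s from the formulas above and can be checked by applying the resulting rotation.

```python
import numpy as np

def generalized_givens(x1, x2, sigma, differential=False):
    """Generalized Givens rotation: returns (c, s) with c^2 + s^2 = 1 such that
    [c s; -s c] @ [x1, x2] equals [r, sigma] (standard form) or [r1, r2]
    (differential form), following the formulas in the text."""
    n2 = x1**2 + x2**2
    if differential:
        if abs(sigma) > abs(x1):
            raise ValueError("need |sigma| <= |x1|")
        r1 = np.sign(x1) * np.sqrt(x1**2 - sigma**2)
        r2 = np.sign(x2) * np.sqrt(x2**2 + sigma**2)
        target = np.array([r1, r2])
    else:
        if sigma**2 > n2:
            raise ValueError("need |sigma| <= sqrt(x1^2 + x2^2)")
        target = np.array([np.sqrt(n2 - sigma**2), sigma])
    c, s = (np.array([[x1, x2], [x2, -x1]]) @ target) / n2
    return c, s

# Quick check: the rotation built from (c, s) maps (x1, x2) onto the target pair.
c, s = generalized_givens(3.0, 4.0, 1.0)
print(np.array([[c, s], [-s, c]]) @ np.array([3.0, 4.0]))   # ~ [sqrt(24), 1]
```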
We will also need a variant of the generalized Givens transformation, its so-called differential form. In this case we determine the matrix G such that
G \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} r_1 \\ r_2 \end{pmatrix},
where
r_1 := \mathrm{sign}(x_1)\sqrt{x_1^2 - \sigma^2}, \qquad r_2 := \mathrm{sign}(x_2)\sqrt{x_2^2 + \sigma^2}.
Of course, this is only possible if |\sigma| \le |x_1|. It is easily verified that the values of c and s are given by
\begin{pmatrix} c \\ s \end{pmatrix} = \frac{1}{x_1^2 + x_2^2} \begin{pmatrix} x_1 & x_2 \\ x_2 & -x_1 \end{pmatrix} \begin{pmatrix} r_1 \\ r_2 \end{pmatrix}.
3 ORTHOGONAL QUOTIENT-DIFFERENCE STEPS
In what follows, we will refer to the following n-by-n lower bidiagonal matrix L and upper bidiagonal matrix U:
L := \begin{pmatrix} \alpha_1 & & & \\ \beta_1 & \alpha_2 & & \\ & \ddots & \ddots & \\ & & \beta_{n-1} & \alpha_n \end{pmatrix}, \qquad U := \begin{pmatrix} \gamma_1 & \delta_1 & & \\ & \gamma_2 & \ddots & \\ & & \ddots & \delta_{n-1} \\ & & & \gamma_n \end{pmatrix}.
~149149
Figure 1: Orthogonal Left lu-Step with Shift s. D e f i n i t i o n 3.1 Let Q be an orthogonal (2n)-by-(2n) matrix. We call the transformation
(2
aI
=
~/a 2 + s2I
an orthogonal left lu-step with shift s. This transformation exists if and only if the condition Isl _< a ~ n ( L ) holds. The singular values of L are diminished by the amount of the shift s, i.e. =
The matrix Q can be constructed by the sequence of Givens rotations depicted in Figure 1. The quantity p is an abbreviation for V'a2+ s 2. Note that every other step consists of a generalized Givens transformation. In order to avoid numerical problems for small shifts s it is necessary to use the differential form of the generalized Givens transformation. D e f i n i t i o n 3.2 Let Q denote an orthogonal (2n)-by-(2n) matrix. We call the transformation
Q
:
v/a2 -~- S2I
an orthogonal left ul-step with shift s. This transformation represents the dual version of the orthogonal left lu-step. It can be carried out if and only if Isl < a ~ n ( U ) . The singular vMues of U are reduced by the amount of the shift s, i.e. :
U. von Matt
102
|174
al
'~
~1
0
"~
9176 ~
0 0~2
0
#2
~3
9
~
~
". ~
,~
Figure 2: 0rthogonal Right ul-Step. An orthogonal left ul-step, too, can be executed by a sequence of Givens rotations. The mechanism is the same as in Figure 1, with the only exception that the transformations are applied from bottom to top. D e f i n i t i o n 3.3 Let Q denote an orthogonal n-by-n matrix. We call the transformation I
QT
Q=
aI
an orthogonal right ul-step. This transformation can always be executed, and it leaves the singular values of U unchanged. If 7n = 0 we have an = fin-1 = 0 after the transformation. This property is useful to deflate a matrix U with Vn = 0. The sequence of Givens rotations necessary for an orthogonal right ul-step is depicted as Figure 2. D e f i n i t i o n 3.4 Let Q denote an orthogonal n-by-n matrix. We call the transformation
an orthogonal right lu-step. This transformation represents the dual version of the orthogonal right ul-step. It can also be executed unconditionally, and it preserves the singular values of the matrix L. If the first row of L is zero, i.e. if a l = 0, we get a matrix U with 3'1 = 61 = 0. We can therefore use this transformation to deflate a matrix L with a l = 0. An orthogonal right lu-step can also be carried out by a sequence of Givens rotations. The same technique is used as shown in Figure 2, except that the transformations are applied from bottom to top. We will refer to the four transformations that have been introduced in this section by the generic term of orthogonal qd-steps. 4
4 ORTHOGONAL QUOTIENT-DIFFERENCE ALGORITHM
Let B denote a given n-by-n lower bidiagonal matrix whose singular value decomposition is desired. The orthogonal qd-algorithm transforms the (2n)-by-n matrix
A := \begin{pmatrix} B \\ 0 \end{pmatrix}
by a sequence of orthogonal qd-steps into the (2n)-by-n matrix
\begin{pmatrix} 0 \\ \Sigma \end{pmatrix},
where \Sigma denotes an n-by-n diagonal matrix with the singular values of B. In matrix terms this process can be described by the equation
P^T \begin{pmatrix} B \\ 0 \end{pmatrix} Q = \begin{pmatrix} 0 \\ \Sigma \end{pmatrix},   (1)
where P denotes an orthogonal (2n)-by-(2n) matrix, and Q denotes an orthogonal n-by-n matrix.
5 DEFLATION
In this section we analyse the conditions that allow us to set to zero an entry of the lower bidiagonal matrix L. This reduced matrix will be called \tilde{L}. We compare the singular values of the original matrix
A := \begin{pmatrix} L \\ aI \end{pmatrix}
with those of the modified matrix
\tilde{A} := \begin{pmatrix} \tilde{L} \\ aI \end{pmatrix}.
We require that all the singular values of A and \tilde{A} agree within the precision of the computer arithmetic. The analysis in [14, pp. 143-149] shows that we can set the diagonal entry \alpha_k to zero if the condition [...] applies numerically. It should be noted that \beta_0 = \beta_n = 0. The numerical criterion for an off-diagonal element \beta_k reads
a^2 + |\beta_k|\,\big(|\beta_k| + \min(|\alpha_k|, |\alpha_{k+1}|)\big) = a^2.
Similar criteria can be derived for an upper bidiagonal matrix U.
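Read in floating point, the criterion simply asks whether adding the \beta_k-term to a^2 changes a^2 at all. A possible form of such a test (a sketch under that reading, not the author's code) is:

```python
import numpy as np

def can_deflate_offdiag(a, alpha, beta, k):
    """Numerical deflation test for the off-diagonal entry beta[k] of a lower
    bidiagonal matrix with diagonal alpha: beta[k] may be set to zero if
    a^2 + |beta_k| (|beta_k| + min(|alpha_k|, |alpha_k+1|)) rounds to a^2."""
    t = abs(beta[k]) * (abs(beta[k]) + min(abs(alpha[k]), abs(alpha[k + 1])))
    return np.float64(a * a + t) == np.float64(a * a)

alpha = np.array([1.0, 2.0, 3.0])
beta = np.array([1e-200, 1e-3])
print(can_deflate_offdiag(1.0, alpha, beta, 0))   # True: negligible coupling
print(can_deflate_offdiag(1.0, alpha, beta, 1))   # False: coupling still matters
```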
6 SINGULAR VECTORS
If the orthogonal transformations are accumulated, the orthogonal qd-algorithm delivers the decomposition (1). In most cases, however, we would like to get the singular value decomposition
B = U \Sigma V^T   (2)
of the n-by-n lower bidiagonal matrix B, where U and V denote orthogonal n-by-n matrices. It is straightforward to identify the matrix Q from (1) with the matrix V in (2). The
matrix U can then be obtained from the QR-decomposition (without pivoting) of BV:
BV = UR.
Care must be taken that the diagonal elements of R remain nonnegative. We get an alternative way of computing the left singular vectors by partitioning the orthogonal matrix P in (1) into n-by-n submatrices:
P = \begin{pmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{pmatrix}.
If B is nonsingular then we must have P_{11} = P_{22} = 0, and P_{12} is an orthogonal matrix consisting of the left singular vectors of B. Even in the presence of rounding errors we can force P_{11} and P_{22} to go to zero by an appropriate convergence criterion.
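For the first route, a small dense sketch of recovering U from the accumulated matrix V (my own numpy illustration, with the sign fix that keeps the diagonal of R nonnegative) could look as follows.

```python
import numpy as np

def left_singular_vectors_from_V(B, V):
    """Recover U in B = U * Sigma * V^T from the QR factorization BV = UR,
    flipping column signs so that the diagonal of R becomes nonnegative."""
    Q, R = np.linalg.qr(B @ V)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0          # leave exact zeros on the diagonal alone
    return Q * signs                 # U = Q D, with D = diag(signs), D^2 = I
```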
7 SHIFTS
The performance of the orthogonal qd-algorithm mainly depends on the choice of the shift s in each step. In this section we describe two different shift strategies based on Newton's and Laguerre's method to compute the zeros of a polynomial. The zeros of the characteristic polynomial p(\lambda) := \det(U^T U - \lambda I) are the eigenvalues of U^T U, which are equal to the squares of the singular values of U. Thus if we use Newton's or Laguerre's method to approximate the smallest zero of p(\lambda) we can also get an approximation for the smallest singular value of U. Newton's method can be described by the iteration
\lambda_{k+1} = \lambda_k - \frac{p(\lambda_k)}{p'(\lambda_k)},
and in the case of Laguerre's method we have
\lambda_{k+1} = \lambda_k + \frac{p(\lambda_k)}{-p'(\lambda_k)} \cdot \frac{n}{1 + \sqrt{(n-1)\left(n\,\dfrac{p'(\lambda_k)^2 - p(\lambda_k)\,p''(\lambda_k)}{p'(\lambda_k)^2} - 1\right)}}.
We choose \lambda_0 = 0 as our initial value. It is well known (cf. [7, 8] and [15, pp. 443-445]) that both methods will then converge monotonically to the smallest zero of p(\lambda). In particular we have
0 = \lambda_0 \le \lambda_1 \le \lambda_{\min}(U^T U).
Laguerre's method enjoys cubic convergence (cf. [8, pp. 353-362] and [15, pp. 443-445]). On the other hand Newton's method will converge only quadratically [15, p. 441]. We will use \sqrt{\lambda_1} as the shift in an orthogonal qd-step. Thus we define Newton's shift and Laguerre's shift as follows:
s_{\mathrm{Newton}} := \sqrt{-\frac{p(0)}{p'(0)}}, \qquad
s_{\mathrm{Laguerre}} := \sqrt{\frac{p(0)}{-p'(0)} \cdot \frac{n}{1 + \sqrt{(n-1)\left(n\,\dfrac{p'(0)^2 - p(0)\,p''(0)}{p'(0)^2} - 1\right)}}}.
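For small dense matrices the two shifts can be evaluated directly from the characteristic polynomial, as in the following illustrative sketch (the paper's implementation instead uses the recurrence formulas mentioned next; forming the characteristic polynomial explicitly is only for demonstration).

```python
import numpy as np

def qd_shifts(U):
    """Newton and Laguerre shifts (lower bounds on sigma_min(U)) computed from
    p(lambda) = det(U^T U - lambda I). Dense and illustrative only."""
    n = U.shape[0]
    coeffs = np.poly(U.T @ U)              # coefficients of det(lambda I - U^T U)
    if n % 2 == 1:                         # det(U^T U - lambda I) = (-1)^n * that
        coeffs = -coeffs
    p = np.polynomial.Polynomial(coeffs[::-1])
    p0, p1, p2 = p(0.0), p.deriv(1)(0.0), p.deriv(2)(0.0)
    s_newton = np.sqrt(-p0 / p1)
    disc = (n - 1) * (n * (p1**2 - p0 * p2) / p1**2 - 1.0)
    s_laguerre = np.sqrt(n * (-p0 / p1) / (1.0 + np.sqrt(disc)))
    return s_newton, s_laguerre

U = np.diag([3.0, 2.0, 1.0]) + np.diag([0.5, 0.5], 1)   # small bidiagonal test
print(qd_shifts(U), np.linalg.svd(U, compute_uv=False).min())
```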
Both Newton's and Laguerre's shift can be computed by recurrence formulas. Special attention is necessary to avoid overflow and underflow problems. More details may be found in [14, pp. 149-151].
8 NUMERICAL RESULTS
In this section we compare the performance of the orthogonal qd-algorithm with the subroutine sbdsqr from the LAPACK library [1] (see also [14, pp. 152-155]). This subroutine represents an implementation of the work of Demmel and Kahan [2]. Both algorithms compute all the singular values to high relative accuracy. We also observe that Laguerre's shift is quite expensive to evaluate. If n is the size of the matrix, the calculation of Laguerre's shift needs on the order of O(n) operations. On the other hand, Demmel and Kahan use Wilkinson's shift which only needs O(1) operations. Wilkinson's shift also requires fewer iterations per singular value to converge. Unfortunately it can only be applied in conjunction with the QR-algorithm since it does not compute a lower bound for the smallest singular value. Laguerre's shift, on the other hand, always gives us a lower bound on the smallest singular value of the bidiagonal matrix, which is essential for the orthogonal qd-algorithm. If only the k smallest singular values are desired the orthogonal qd-algorithm can compute them in O(kn) operations. If the singular vectors are also needed the operation count increases to O(kn^2). This represents a significant savings compared to O(n^3), which is the computational complexity of a complete singular value decomposition.
9 CONCLUSIONS
We have presented the orthogonal qd-algorithm to compute the singular values of a bidiagonal matrix to high relative accuracy. Our approach differs from the qd-algorithm by Fernando and Parlett as we do not transpose the bidiagonal matrix in each step. This enables us to accumulate the orthogonal transformations and thus obtain the singular vectors. We use Newton's and Laguerre's method to compute the shifts for the orthogonal qd-steps. Although Laguerre's shift does not quite attain the efficiency of Wilkinson's shift, it has the advantage that it always computes a lower bound on the smallest singular value of the bidiagonal matrix. We have also presented two generalizations of the Givens transformation. They come in very handy in the context of the orthogonal qd-algorithm, but they should also be useful in other applications.
Acknowledgments
Our thanks go to W. Gander, G. H. Golub, and J. Waldvogel for their support. The author also thanks G. W. Stewart for his helpful comments.
References
[1] E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov and D. Sorensen. LAPACK Users' Guide. SIAM Publications, Philadelphia, 1992.
[2] J. Demmel and W. Kahan. Accurate Singular Values of Bidiagonal Matrices. SIAM J. Sci. Stat. Comput. 11, pp. 873-912, 1990.
[3] J. J. Dongarra, C. B. Moler, J. R. Bunch and G. W. Stewart. LINPACK Users' Guide. SIAM Publications, Philadelphia, 1979.
[4] K. V. Fernando and B. N. Parlett. Accurate singular values and differential qd algorithms. Numer. Math. 67, pp. 191-229, 1994.
[5] W. Gander, L. Molinari and H. Svecova. Numerische Prozeduren aus Nachlass und Lehre von Prof. Heinz Rutishauser. Internat. Ser. Numer. Math., Vol. 33, Birkhäuser, Basel, 1977.
[6] G. H. Golub and C. Reinsch. Singular value decomposition and least squares solutions. Numer. Math. 14, pp. 403-420, 1970.
[7] E. Hansen and M. Patrick. A Family of Root Finding Methods. Numer. Math. 27, pp. 257-269, 1977.
[8] A. M. Ostrowski. Solution of Equations in Euclidean and Banach Spaces. Third Edition of Solution of Equations and Systems of Equations, Academic Press, New York, 1973.
[9] H. Rutishauser. Der Quotienten-Differenzen-Algorithmus. ZAMP 5, pp. 233-251, 1954.
[10] H. Rutishauser. Der Quotienten-Differenzen-Algorithmus. Mitteilungen aus dem Institut für angewandte Mathematik Nr. 7, Birkhäuser, Basel, 1957.
[11] H. Rutishauser. Über eine kubisch konvergente Variante der LR-Transformation. ZAMM 40, pp. 49-54, 1960.
[12] H. Rutishauser. Les propriétés numériques de l'algorithme quotient-différence. Rapport EUR 4083f, Communauté Européenne de l'Energie Atomique - EURATOM, Luxembourg, 1968.
[13] H. Rutishauser. Lectures on Numerical Mathematics. Birkhäuser, Boston, 1990.
[14] U. von Matt. Large Constrained Quadratic Problems. Verlag der Fachvereine, Zürich, 1993.
[15] J. H. Wilkinson. The Algebraic Eigenvalue Problem. Clarendon Press, Oxford, 1965.
ACCURATE SINGULAR VALUE COMPUTATION WITH THE JACOBI METHOD
Z. DRMAČ
Fachbereich Mathematik, FernUniversität Hagen, Postfach 940, D-58084 Hagen, Germany.
[email protected]
ABSTRACT. The main interest in this work is implementation of the Jacobi method as reliable software for computing the singular values of a general real matrix. The modifications of formulas for computing the rotation angle and modified Jacobi transformation are given in order to ensure reliable computation of singular values in the full range of floating point numbers. If the gradual underflow is used, then the effect of underflow on the accuracy properties of the modified Jacobi SVD method is not worse than the uncertainty caused by roundoff errors. KEYWORDS. Jacobi method, relative accuracy, singular values.
1 INTRODUCTION
The following is a very short summary of recent results in the field of singular value computation:
The singular values of general matrices are not perfectly conditioned (in the sense of Weyl's theorem) if we consider relative perturbation estimates. In other words, if \sigma > 0 is a singular value of A ∈ R^{m×n}, then a small perturbation \delta A = A \circ E, \delta A_{ij} = A_{ij} E_{ij},
|E_{ij}| \le \varepsilon, [...] m > n. (In the eigenvalue computation the positive definite H is factorized, \pi^T H \pi = L L^T, by Cholesky factorization with optional pivoting, and the SVD of L is computed.) Due to the unitary invariance of the spectral norm, \kappa_2(A_S) = \kappa_2(R_S), where A_S = A\,\mathrm{diag}(\|A e_i\|_2)^{-1}, R_S = R\,\mathrm{diag}(\|R e_i\|_2)^{-1}. The accelerated Jacobi method [4] runs on R^T (on L in the eigenvalue computation of \pi^T H \pi = L L^T). The following two theorems show that the perturbations of singular values caused by the Jacobi procedure are of the same order as those caused by the QR (Cholesky) factorization.
Theorem 2.1 Let A ∈ R^{m×n} and let \sigma_1 \ge ... \ge \sigma_n > 0 be its singular values. Let R be the upper triangular matrix computed by Givens QR factorization in the IEEE arithmetic with relative precision \varepsilon, and let \delta A be the backward perturbation (A + \delta A = QR). If the overall process can be performed in p parallel steps, then the singular values \tilde\sigma_1 \ge ... \ge \tilde\sigma_n of R satisfy
\left| \frac{\tilde\sigma_i - \sigma_i}{\sigma_i} \right| \le \sqrt{q}\, \|A_S^{\dagger}\|_2 \left( (1 + 6\varepsilon)^{p} - 1 \right), \qquad 1 \le i \le n,   (12)
where q < n denotes the maximal number of nonzero entries in any row of \delta A, and provided that the right-hand side in (12) is less than one. For m >> n and a suitably chosen pivot strategy, p \approx \log_2 m + (n - 1)\log_2\log_2 m.
Theorem 2.2 If one sweep of the Jacobi SVD method with some pivot strategy can be performed in p parallel steps on A ∈ R^{n×n}, then after l sweeps
\left| \frac{\sigma_i - \tilde\sigma_i}{\sigma_i} \right| \le \sqrt{n}\, \|A_S^{\dagger}\|_2 \left( (1 + 10\varepsilon)^{lp} - 1 \right), \qquad 1 \le i \le n,   (13)
where \sigma_1 \ge ... \ge \sigma_n > 0 and \tilde\sigma_1 \ge ... \ge \tilde\sigma_n are the singular values of A and of the matrix obtained at the end of the l-th sweep, respectively, and provided that the bound in (13) is less than one.
(Footnotes: For the sake of technical simplicity we do not consider underflow exceptions. Take e.g. \|a_p\|_2 \gg \|a_q\|_2 with a_p and a_q nearly parallel.)
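As a small illustration of why such bounds are phrased in terms of the column-scaled matrix A_S rather than A itself, the following assumed toy example compares the two condition numbers for a badly column-scaled matrix.

```python
import numpy as np

# Illustrative (assumed) example: kappa_2(A_S), which drives the relative
# accuracy bounds, can be far smaller than kappa_2(A) when the columns of A
# are badly scaled.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5)) @ np.diag([1.0, 1e-3, 1e-6, 1e-9, 1e-12])
A_S = A / np.linalg.norm(A, axis=0)          # A_S = A diag(||A e_i||_2)^{-1}
print(np.linalg.cond(A), np.linalg.cond(A_S))
```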
The main difference between our analysis and that of Demmel and Veselić is that we estimate relative errors in matrix rows while transforming its columns. The point is that the condition number of row-scaled matrices remains unchanged during the process because of the unitary invariance. The conclusion of the error analysis is: if gradual underflow is used and the singular values of the stored matrix belong to (\nu, \omega), then the effect of underflow on the accuracy of the modified Jacobi SVD method is not worse than the uncertainties caused by roundoff errors.
2.4 FAST CONVERGENCE ON GRADED MATRICES. WHY?
As we know, if the pivot columns are differently scaled, then the Jacobi rotation behaves like the Gram-Schmidt orthogonalization. On the other hand, the modified Gram-Schmidt procedure follows the pattern (1,2), (1,3), ..., (n-1, n), just as the row-cyclic Jacobi process does. With proper pivoting, introduced by de Rijk [2], the Jacobi SVD algorithm is forced to behave on graded matrices like the modified Gram-Schmidt orthogonalization. (Initial sorting of matrix columns in nonincreasing sequence additionally supports this similarity.) This explains the fast convergence of the Jacobi SVD method on graded matrices.
2.5 CONDITION BEHAVIOUR. AN EXAMPLE
Explaining the excellent behaviour of \max_k \kappa_2(A^{(k)}_S)/\kappa_2(A_S) is stated in [4] as an important open problem. For the sake of compatibility with [4], [10] we consider hereafter the symmetric two-sided Jacobi method. Set H = A^T A, H_S = A_S^T A_S, H^{(k)} = (A^{(k)})^T A^{(k)}, k = 1, 2, .... Mascarenhas [10] found examples where \max_{k>0} \kappa_2(H^{(k)})/\kappa_2(H_S) can be as large as n/4. Although this is not bad because of factors of n already in the floating point errors, the condition behaviour in the Jacobi method remains an interesting problem. Mascarenhas used S(n,a) = (1 - a)I + a e e^T, e = (1, ..., 1)^T, 0 < a < 1, with n = 2^l and a pivot strategy that produced recursively matrices of the same type. In the first step he obtained S^{(1)} = (1 - a)I_{n/2} \oplus (1 + a)S(n/2, (1 - a)/(1 + a)); in the next step the same strategy was applied to the lower right 2^{l-1} \times 2^{l-1} submatrix of S^{(1)}, etc. To the growth of the condition of scaled matrices shown in [10], we added in [8] the following observation:
The matrix S(n,a) is already optimally scaled, i.e. \kappa_2(S(n,a)) \le [...] > 0, find the Cholesky factor of \tilde{K} in a forward stable way, using R and w_1, ..., w_n. Without loss of generality we take w_1 > 0, w_2 = ... = w_n = 0. The sign patterns are as follows:
D
9
m .
.
.
.
.
.
.
.
.
.
. m
+
.
.
.
.
.
. . . .
A-
R~I,
~
021 el
ArA=I(
=
+
.
:
:
:
9
9
.
.
.
". 9
.
+
Rotating at (1, n § 1) in A produces o
.
A
9
.
9
.
.
:=
+
+
...
+ +
and this transformation is obviously forward stable 9 Now, rotate at (2, n § 1). The sign patern of the transformation of pivot rows is -
+
+
+
+
...
+
0
+
+
...
+
"
All 'pluses' can be computed in a forward stable way. The minus signs ( @ ) are secure because /~ is an M-matrix, but in the application of the rotation they are computed by
proper subtraction, which does not obey the forward stability property. We can easily handle this problem by using formulas from the Cholesky factorization of K. Indeed,
A_{2j} := \frac{1}{A_{22}}\left(K_{2j} - A_{12}A_{1j}\right), [...]
Let A be an m x n (m >= n) matrix with rank(A) = r. The singular value decomposition (SVD) of A can be defined as
A = U \Sigma V^T,
(1)
where U^T U = V^T V = I_n and \Sigma = diag(\sigma_1, ..., \sigma_n), \sigma_i > 0 for 1 <= i <= r, \sigma_i = 0 for i >= r + 1. The first r columns of the orthogonal matrices U and V define the orthonormalized eigenvectors associated with the r nonzero eigenvalues of A A^T and A^T A, respectively. The singular values of A are defined as the diagonal elements of \Sigma, which are the nonnegative square roots of the n eigenvalues of A A^T. The set {u_i, \sigma_i, v_i} is called the i-th singular triplet. The singular vectors (triplets) corresponding to large (small) singular values are called large (small) singular vectors (triplets). SVDPACKC uses Lanczos, block-Lanczos, subspace iteration, and trace minimization methods to approximate extremal singular values and corresponding singular vectors. Two canonical sparse symmetric eigenvalue problems are solved by SVDPACKC routines to (indirectly) compute the sparse SVD. For well-conditioned matrices A, the n x n eigensystem of A^T A is the desired eigenvalue problem. For SVDPACKC methods such as BLS2, which implements a block Lanczos iteration similar to [5], each step of the Lanczos recursion requires multiplication by A and A^T. If memory is not sufficient to store both A and A^T, the matrix A is normally stored in an appropriate compressed sparse matrix format [2]. Subsequently, the multiplication by A^T may perform poorly as compared to multiplication by A, simply due to excessive indirect addressing enforced by the data structure in A [3]. Can the multiplication by A^T be avoided altogether? Probably not, but as demonstrated in the next section, the block Arnoldi method applied to the m x m matrix [A | 0] is an alternative scheme which can suppress the multiplication by A^T up to restarts.
2 ARNOLDI SVD METHOD
Saad and Schultz [8] [9] originally proposed the GMRES method for solving large sparse nonsymmetric linear systems based on the Arnoldi process [1] for computing the eigenvalues of a matrix. Can Arnoldi's method, which is essentially a Gram-Schmidt method for computing an orthonormal basis for a particular Krylov subspace, be used to compute the SVD in (1) without having to reference A^T explicitly? Consider the block Arnoldi iteration below.
Algorithm 2.1 Transpose-Free ARnoldi (TFAR) SVD iteration.
1. Choose V_1 (m x b) such that V_1^T V_1 = I_b, and define B = [A | 0], so that B is an m x m nonsymmetric matrix, and A is a given m x n (m >= n) sparse matrix.
2. For j = 1, 2, ..., k
   W = B V_j,
   H_{i,j} = V_i^T W,  i = 1, 2, ..., j,
   W = W - \sum_{i=1}^{j} V_i H_{i,j},
   W = Q R,  Q^T Q = I_b and R is upper triangular,
   V_{j+1} = Q,  H_{j+1,j} = R.
is an orthogonal factorization of 1)" where Q k is n x bk and has orthonormal columns. It can be shown that AQ, k = VCk + Vk+l HTk+l,k • GT]~-I,
(2)
where Ck = H k R -1 , Hk is the bk• block upper Hessenberg matrix constructed by deleting the last b • b submatrix o f / t k , and G T is the b • bk matrix defined by G T = [0, ... , 0 I/b]. Via (2), the extremal singular triplets of the m • n matrix A are approximated by the k singular triplets of the k • k Hessenberg matrix Ck. If the SVD of Ck = (]kEkV T , where uT~rk = ~'TVk = Ik and ~k = diag[~l, a2, . . . , ak], then A(Ok~/k) = (V~fk)Ek +
(Vk+~HT+~,k)(aTk-~fZk),
(3)
and the i-th diagonal element of Ek is an approximation to the exact singular value, hi, of A in (1). Now, define Yk = QkVk = [Yl, Y2, . . . , Yk] and Zk = YUk = [zl, z2, . . . , Zk], where Yk and Zk are n x k and m x k, respectively. Then, (3) becomes AYk -- Z k ~ k + (Vk+l H Tk+l,k )(GT/~-IVk)
(4)
where the columns of Zk and Yk are approximations to the left and right singular vectors of A, respectively. If the columns of Yk are not suitably accurate approximations of the right singular vectors of A, Algorithm 2.1 can be restarted by setting
vxr = [(ATzk~;~ ) r 10~_~,b], r
(5)
126
M. W. Berry and S. Varadhan
as an improved approximation to b-largest right singular vectors of A. Hence, b multiplications of A T are required upon each restart of Algorithm 2.1. If one can easily solve systems of the form
(AT A - b2I)y k+l = 0,
(6)
then Inverse Iteration (INVIT) can be used to greatly improve the accuracy (see [7]) of the restart vectors stored as columns of V1. The use of INVIT for restarting Algorithm 2.1 is referred to as Arnoldi with INVerse iteration or AINV. 3
PRELIMINARY
PERFORMANCE
The performance of Algorithm 2.1 or TFAR and BL52 from SVDPACKC [4] is compared with respect to the number of required multiplications by A and A T. All experiments were conducted using MATLAB (Version 4.1) on a Sun SPARCstation 10. For the four test matrices mentioned below, the p-largest singular triplets, where p = 2q, q = 1, 2, 3, 4, are approximated to residual accuracies [Iri[[2 = []A~i- aifii[]~ of order (..9(10-3) or less. Deflation was used in both methods to avoid duplicate singular value approximations. In Figure 1, the well-separated spectrum of the 374 x 82 term-document matrix from [3] is approximated. Initial block sizes b = p and Krylov subspace dimension (bk) bounds of 2p were used in both methods. The TFAR scheme required, on average, 60% fewer multiplications by A T (represented by y=A'x) with a 40% increase in multiplications by A only for the cases p = 6, 8. In Figure 2, a well-separated portion of the spectrum of the 457 x 331 constraint matrix from the collection of linear programming problems assembled in [6] was approximated. These experiments used initial block sizes of 2, 4, 6, 4 and corresponding Krylov subspace dimension bounds of 12, 24, 24, 20 for p = 2, 4, 6, 8 triplets, respectively. On average, TFAR required 65% fewer multiplications by A T but unfortunately at the cost of over 200% more multiplications by A. Figures 3 and 4 compare the performance of AINV, BLS2, and TFAR on two 50 x 50 synthetic matrices with clustered spectra. The spectra of these test matrices (see table below) are specified by the function r fl, 6, k,n,~x, ima=) = (ak + t3) + 6i, where k = 0 , 1 , . . . , k m a x is the index over the clusters, i = 1 , 2 , . . . , irnax is the index within a duster, 6 is the uniform separation between singular values within a duster, and a, fl are scalars.
Matrix CLUS1 CLUS2
Number of Clusters 13 5
Cluster Separation (6) 10 -3 10-6
Maximum Cluster Size 4
10
~.(a,13,6, kmax, ima=)
r r
12,4)
In approximating the p-largest triplets of CLUS1, both TFAR and BL52 used initial block sizes of 2, 4, 6, 4 and corresponding Krylov subspace dimension bounds of 12, 24, 24, 20 for p = 2, 4, 6, 8, respectively. For CLUS2, initial block sizes of 2, 4, 6, 8 and corresponding Krylov subspace dimension bounds of 4, 8, 12, 16 were used. An implementation of AINV using a Krylov subspace dimension bound of 16 and deflation for a static block size b = 1
Transpose-free Arnoldi Iterations
127
470
300
= --m" -,a-
BLS2 (y=Ax) BLS2 (y=A'x) TFAR (y=Ax) TFAR (y=A'x)
/ / /
,,4~4" ,,,4"," /("
cg~ 100
Number of Singular Triplets
Figure 1: Performance of BLS2 and TFAR on the 374 • 82 matrix ADI for 10 -3 residual accuracy.
470
3oo
= BLS2 (y=Ax) --i- BLS2 (y=A'x) -*-4 TFAR (y=Ax)
.~
~2oo
100
9
1
"
~
"
8
Number of Singular Triplets
Figure 2: Performance of BLS2 and TFAR on the 457 x 331 matrix SCFXM1 for 10 -3 residual accuracy.
M.W. Berry and S. Varadhan
128
: - 4-
:
- g-
AINV (y=Ax) (y=A'x) BLS2 (y=Ax) BLS2 (y=A'x)
AINV
,; ,"
s
9 TFAR(y=Ax)
..o
--~- TFAR (y=A'x)
~ 100I I
1
2
---
"
----O-
1
~
8
Number of Singular Triplets
Figure 3: Performance of AINV, BLS2, and TFAR on the 50 • 50 matrix CLUS1 for 10 -3 residual accuracy.
| | / .2~2oo.~
:
-4-
mNV
sI
(y=Ax)
AINV (y=A'x) BLS2 ( y = A x ) - - - BLS2 (y=A'x) 9 TFAR (y:Ax)
/."
=
~ "
/." /~'"
sssssSSSS
2
4 6 Number of Singular Triplets
8
Figure 4: Performance of AINV, BLS2, and TFAR on the 50 x 50 matrix CLUS2 for 10 -3 residual accuracy.
129
Transpose-free Arnoldi Iterations
(i.e., a single-vector iteration) was also applied to these synthetic matrices. In Figure 3, about 85% fewer multiplications (on average) by A T and 35% fewer multiplications (on average) by A are required by TFAR. In Figure 4, the reduction factors in multiplications by A and A T for TFAR are about 50% and 80%, respectively. At the cost of solving the systems in (6), both Figures 3 and 4 demonstrate that AINV requires approximately 42% and 96% fewer multiplications by A and A T, respectively, than that of BIS2.
20
40 60 Subspace Dimension
•
Mults by A
l
Mults by A'
80
Figure 5: Distribution of sparse matrix-vector multiplications for TFAR as the Krylov subspace dimension increases. The largest singular triplet of the 374 x 82 term-document matrix ADI is approximated to a residual accuracy no larger than 10 -3. Figure 5 illustrates how the distribution of sparse matrix-vector multiplications by A or A T varies as the Krylov subspace dimension (k from Algorithm 2.1) increases. Here, k ranges from 10 to 75 as we approximate the largest singular triplet { u l , a l , Vl} of the 374 x 82 ADI matrix. Clearly, the need for restarting (see (5)) diminishes as the subspace dimension increases and thereby requires fewer and fewer multiplications by A T at the cost of requiring more storage for the Arnoldi vectors (columns of V in Algorithm 2.1). 4
FUTURE WORK
Although extensive testing of TFAR and AINV is certainly warranted, the results obtained thus far for selected matrices is promising. Future research will include comparisons with recent Arnoldi methods based on implicitly shifted QR-iterations [10]. Although there does not appear to be a way to completely remove the multiplication by A T in computing the SVD of A, there can be substantial gains for methods such as TFAR (or AINV) when multiplication by A T is more expensive than multiplication by A. Alternative restarting
130
M.W. Berry and S. Varadhan
procedures may improve the global convergence rate of the TI=AR method and further reduce the need for multiplications by A T . References
[1] W. E. Arnoldi, The Principle of Minimized Iteration in the Solution of the Matrix Eigenvalue Problem, Quart. Appl. Math., 9 (1951), pp. 17-29. [2] R. Barrett et al., Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM, Philadelphia, 1994. [3] M. W. Berry, Large Scale Singular Value Computations, International Journal of Supercomputer Applications, 6(1992), pp. 13-49. [4] M. W. Berry et al., SVDPACKC: Version 1.0 User's Guide, Tech. Rep. CS-93-194, University of Tennessee, Knoxville, TN, October 1993. [5] G. Golub, F. Luk, and M. Overton, A Block Lanczos Method for Computing the Singular Values and Corresponding Singular Vectors of a Matrix, ACM Transactions on Mathematical Software, 7 (1981), pp. 149-169. [6] I. J. Justig, An Analysis of an Available Set of Linear Programming Test Problems, Tech. Rep. SOL 87-11, Systems Optimization Laboratory, Stanford University, Stanford, CA, August 1987. [7] B. N. Parlett, The Symmetric Eigenvalue Problem, Prentice Hall, Englewood Cliffs, NJ, 1980. [8] Y. Saad, Krylov Subspace Methods for Solving Large Unsymmetric Linear Systems, Mathematics of Computation, 37 (1981), pp. 105-126. [9] Y. Saad and M. H. Schultz, GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems, SIAM Journal of Statistical and Scientific Computing, 7 (1986), pp. 856-869. [10] D. C. Sorensen, Implicit Application of Polynomial Filters in a K-Step Arnoldi Method, SIAM Journal of Matrix Analysis and Applications, 13 (1992), pp. 357-385.
SVD AND SIGNAL PROCESSING, III Algorithms, Architectures and Applications M. Moonen and B. De Moor (Editors) 9 1995 Elsevier Science B.V. All rights reserved.
131
A LANCZOS ALGORITHM FOR COMPUTING THE LARGEST QUOTIENT SINGULAR VALUES IN REGULARIZATION PROBLEMS
P.C. HANSEN
UNI. C, Building 304 Technical University of Denmark DK-2800 Lyngby, Denmark Per.
[email protected] M. HANKE
Institut fiir Praktische Mathematik Universitgt Karlsruhe, Englerstrasse 2 D-76128 Karlsruhe, Federal Republic of Germany hanke @ipmsun l. mathematik, uni-karlsruhe, de
ABSTRACT. In linear regularization problems of the form min{llA x - bll 2 + )~21IBxll2}, the quotient singular value decomposition (QSVD) of the matrix pair (A, B) plays an essential role as analysis tool. In particular, the largest quotient singular values and the associated singular vectors are important. We describe an algorithm based on Lanczos bidiagonalization that approximates these quantities. KEYWORDS. Lanczos bidiagonalization, QSVD, regularization.
1
INTRODUCTION
If the coefficient matrix of a (possibly overdetermined) system of linear equations is very ill conditioned, then the solution is usually very sensitive to errors (measurement errors, approximation errors, rounding errors, etc.), and regularization is therefore needed in order to compute a stabilized solution which is less sensitive to the errors. Perhaps the most popular regularization method is the method due to Tikhonov and Phillips, see [7]. In this method, the regularized solution is computed as the solution to the following problem
\min_x \left\{ \|A x - b\|_2^2 + \lambda^2 \|B x\|_2^2 \right\}.
(1)
P.C. Hansen and M. Hanke
132
Here, A is the coefficient matrix and b is the right-hand side of the linear system to be solved. In many regularization problems in signal processing, such as image restoration and other deconvolution problems, the coefficient matrix A is either (block) Toeplitz or (block) Hankel. This structure of A should be utilized whenever possible. The matrix B in (1) defines a seminorm, and typically B is a discrete approximation to loss of generality we can assume that B has full row rank. We will also assume that B is reasonably well conditioned--otherwise it will not have the desired regularizing effect.
a derivative operator. Without
The parameter A is the regularization parameter, and it controls how much emphasis is put on minimizing the smoothness of the solution x, measured by lib zl12 , relative to minimization of the residual norm IIA x - b[12. How to choose )~ is an important topic which we shall not consider here; see, e.g., [9] for details. In addition to the Tikhonov/Phillips method, many other regularization methods can be defined. Surveys of some of these methods can be found in, e.g., [4], [7] and [9]. Common for all these methods is the appearance of the matrix pair (A, B), where A is the coefficient matrix, and B is the regularization matrix. For all these regularization methods involving the pair (A,B), it turns out that the a very valuable tool for analysis of the problem as well as for computing regularized sohtions. Let A and B have dimensions m x n and p x n, respectively, satisfying m __ n _> p. Then the QSVD takes the form
Quotient Singular Value Decomposition1 (QSVD) is
A = U ( ~0 In-,O) X-I'
B = V ( M , O ) X -1,
(2)
where U and V have orthonormal columns, X is nonsingular, and E and M are p x p diagonal matrices, = diag(ai),
M = diag(#i).
(3)
Then the quotient singular values are defined as the ratios \gamma_i = \sigma_i/\mu_i,
i = 1,...,p.
(4)
We note that if B is nonsingular, then the quotient singular values are the ordinary singular values of the matrix quotient A B -1. The reason why the QSVD is so important in the analysis of regularization problems is that the regularized solution can often be expressed in terms of the QSVD components. For example, for Tikhonov regularization (1) the regularized solution is given by
~'r
"/? + '~2 ffi
i=p+l
Here, ui and xi are the columns of U and X, respectively. Other regularized solutions can be obtained from (5) by replacing 72/(72 + ,~2) with the appropriate filter factors for the particular regularization method. Thus, we see that information about the QSVD components gives important insight into the regularization problem. 1The QSVDwas previously known as the
GeneralizedSVD.
A Lanczos Algorithm
133
In particular, the QSVD components corresponding to the largest quotient singular values 7i are important because they constitute the essentially "unregularized" component of the regularized solution, i.e., the component for which the associated filter factors are close to one. In fact, one can define a regularization method (called "truncated GSVD" in [10]) in which the corresponding filter factors are simply ones and zeros. Another important use of these QSVD components is to judge the choice of the regularization matrix B. Only limited theoretical results are available regarding this choice, and often the user wants to experiment numerically with different B-matrices and monitor the influence of B on the basis vectors--the columns of X ~ f o r the regularized solution. The largest quotient singular value is also needed as a scaling factor in connection with the v-method for iterative regularization, cf. [9, w Hence, there is a need for efficient algorithms that compute the largest QSVD components when A is large and structured or sparse. A Lanczos-based algorithm for computing the largest QSVD components of a general matrix pair was proposed recently by Zha [14]. In this paper we present an alternative algorithm specifically designed for the case where B is a banded matrix with full row rank~which is the typical situation in regularization problems. By making use of this property of B, we obtain an algorithm which is faster than the more general algorithm. A related direct algorithm which involves a full SVD computation can be found in [11].
2
THE
QSVD L A N C Z O S B I D I A G O N A L I Z A T I O N
ALGORITHM
The key idea in our algorithm is to apply the Lanczos bidiagonalization algorithmmfor computing the largest singular values, see, e.g., [1, w [2, w and [6, w the matrix A, but "preconditioned" from the right in a specialized way such that the quotient singular values and associated vectors are obtained. The "preconditioner" is BtA, the A-weighted generalized inverse of B, see [3], which in terms of the QSVD can simply be written as
BtA = X ( M-1) 0 vT '
(6)
and we see that A BtA is then given by
0 Hence, the ordinary
(7)
SVD of A BtA is identical to part of the QSVD of (A, B).
The use of this "preconditioning" for iterative methods was initially advocated in [8]. We emphasize that the purpose of the "preconditioner" is not to improve the spectrum of the matrix, but rather to transform the problem such that the desired quotient singular values are computed. More details can be found in [9]. In the Lanczos bidiagonalization algorithm we need to multiply a vector with A BtA and
(A BtA)T. Hence, two operations with BtA are required, namely, matrix-vector multiplication
P.C. Hansen and M. Hanke
134
with BtA and (BtA) T. With the aid of some limited preprocessing, both operations can be implemented in ( . 9 ( ( n - p + l)n) operations, where I is the band width of B. The preprocessing stage takes the following form. PREPROCESSING STAGE W .-- basis for null space of B A W = Q R ( QR factorization) S ~ R-1QTA /3 ~ augmented B (see text below) /} = ],/)" (L V factorization of/}). Usually, the n x (n - p) matrix W can be constructed directly from information about B; see, e.g., [11] and (19) below. The QR decomposition of the "skinny" m x ( n - p) matrix A W is needed to compute its pseudoinverse (A W) t = R -1 QT. The computation of S requires O ( m ( n - p)2) operations plus n - p multiplications with A and with A T. Next, the p x n matrix B is augmented with n - p rows such that the resulting m a t r i x / 3 is square and nonsingular, followed by L U factorization of J3. If B is a discrete approximation to the (n - p)th derivative operator of the form B=
(1_1) "..
then one can set
".. 1
or
B=
-1
(1_21) "..
".. 1
0),
".. -2
,
(s)
1
(~)
and no L U factorization is actually required because /} is lower triangular. If B is a discrete approximation to the Laplacian on a square N x N grid (in which case n = N 2 and p = n - 4), then one particular choice s of B has the block form B(~ B(1)
0 B(2) 9..
B=
B(1) ..
..
B(i) B(~) BO) 0
where B (~ is ( N -
B (o) B (1)
= =
(~0)
B (~
2) • N, B (1) and B (~) are N x N, and these matrices have the form
"..
"..
"..
2
-4
(11) 2
d i a g ( 2 , 1 , 1 , . . . , 1,2)
2This particular choice of B corresponds to the MATLABfunction del2.
(12)
A Lanczos Algorithm
-4 1
0 -4 "..
B (2) =
1 ".. 1
".. . -4 1 0 -4 Then one can obtain/} by augmenting B (~ as follows ~(o) ,_
(40...0) (o)
,
135
(la)
(14)
0...04 and since B has bandwidth 2N + 1, an L U factorization without pivoting requires O ( n N 2) = O(N 4) operations, see [6, w For this choice of/}, pivoting is not required because/~ is column diagonally dominant (see [6, w Given the matrices W and 5', and possibly an LUfactorization of/}, the two algorithms for multiplication with B?A and (B?A) T take the following form. COMPUTE ,-y ~ y ~
y = BtA z x (pad x with zeros according to B ~ / ~ ) /}-12 y-Wb'y.
COMP,TE V = ( B ~ ) r 9 X ~-- X-- s T w T x
fl y
,*-
(i~-~ )r ~ ~ (extract elements of ~) according to B ~ B).
Both algorithms require (.9((n- p + t)n) operations, where l is the band width of B. In particular, if B is given by (8) and (10), then 3 ( n - p)n and 2(N + 4)N 2 multiplications axe required, respectively.
3
HOW TO EXTRACT
THE NECESSARY QUANTITIES
The Lanczos algorithm immediately yields approximations of the largest quotient singular values 7i and the associated singular vectors ui and v~ (the columns of U and V in (2)). From 7i, the quantities ai and #i in (3) can easily be computed from 7~ 1 "'
= J1 +-~'
~'
: J1 +-y?
(15)
which follow from (4). Finally, we need to be able to compute the columns xi of X corresponding to 7i. Using the Q S V D (2) and the definition (6) of B?A, we get
B~v=x
0
'
(16)
136
P.C. Hansen and M. Hanke
from which we obtain the relation zi = ~i BOAvi.
(17)
Hence, xi can be computed from the corresponding vi by multiplication with B?A, which can be performed by means of the algorithm from the previous section. Notice that the vectors ui and xi for i = p + 1 , . . . , n cannot be determined by our Lanczos algorithm. However, the spaces span{xp+i,..., xn} and span{up+i,..., un}--which may be more important than the vectors--are spanned by the columns of the two matrices W and A W, respectively. Moreover, the vector
9o = ~ i..---p+l
~Tb ~i,
which is the component of the regularized solution x~ (5) in the null space of B, can be computed directly by means of the relation
xo = W (A W ) t b = W R -1QT b,
(iS)
where Q and R form the QR factorization of A W, cf. the preprocessing stage. We emphasize that our single-vector Lanczos algorithm cannot compute the multiplicity of the quotient singular values. If this is important, one should instead switch to a block Lanczos algorithm, cf. [5]. 4
NUMERICAL
EXAMPLES
We illustrate the use of our Lanczos algorithm with a de-blurring example from image reconstruction. All our experiments 3 are carried out in MATLAB using the REGULARIZATION TOOLS package [12]. The images have size N x N. The matrix A is N ~ x N 2, it represents Gaussian blurring of the image, and it is doubly Toeplitz, i.e., it consists of N x N Toeplitz blocks which are the same along each block diagonal; see [9, w167 and 8.2] for details. The matrix B is the discrete approximation to the Laplacian given by (10)-(13), and its size is ( g 2 - 4) x g 2. The starting vector is Ae/llAell2 with e = ( 1 , . . . , 1) T, and no reorthogonalization is applied. The matrix W representing the null space of B is computed by applying the modified Gram-Schmidt process to a matrix W0 of the following form, shown for N = 4: wT =
Ii
1 1 1
1 1 1 1 1 1 1 1 1 1 1
1 2 2
1 1 3 4 3 4
2 2 1 2 2 4
2 3 6
2 4 8
3 1 3
3 2 6
3 3 9
3 4 12
1 1
4 4 1 2 4 8
1
4 3 12
i\
4 4 16
"
(19)
Our first experiment involves a small problem, N = 16, for which we computed all the 252 quotient singular values 7i explicitly by means of routine gsvd from [12]. The first five 7i axe very well separated in the range 175-46, the next 15 7i are less well separated in the range 38-6.3, and the next 30 7i are even closer separated in the range 5.5-0.67. 3The MATLABroutines used in these tests are available from the authors.
137
A Lanczos Algorithm
number of iterations k 10 20 30 40 50
1 1 3 4 5 7
2 1 2 3 4 5
3 4 1 2 3 4 5
index i 6 5 7 1 1 1 2 2 3 2 3 3
of exact quotient singular value 3'~ 10 14 8 9 11 12 13 15 16 17 1 1 2 2
1 1 2 2
1 1 2 2
1 1 1 2
1 1 2
1 1 1
1 1 1
1 1 1
18
19 20
1 1
Table 1: The convergence history of the quotient singular values in the small example with N = 16. The table shows the number of approximations to the ith exact quotient singular value 3'i, after k Lanczos iterations, whose absolute accuracy is better than 10 -l~ Multiple Vi, such as V3 and 74, are grouped. O
k 10 20 30 40 50
u(k) 5 11 14 21 28
~
smallest converged 1.19.103 2.56 9102 2.56 9102 8.98.101 3.21.101
Table 2" The convergence of the quotient singular values for the large example with N = 64. The table shows the number u(k) of converged distinct quotient singular values after k Lanczos iterations, together with the smallest converged value.
The convergence of the approximate quotient singular values reflects this distribution, see Table 1 next page where we list the computed quotient singular values whose absolute accuracy is better than 10 -l~ We see that the first, well separated 7i are captured fast by the algorithm, while more iterations are required to capture the 7{ which are less well separated. We also observe the appearance of multiple (or "spurious") quotient singular values which is a typical phenomenon of the Lanczos process in finite precision and which is closely related to the convergence of the approximate 7i, cf. [13, w Our next example involves a larger problem, N = 64, which is too large to allow computation of the exact quotient singular singular values by means of routine gsvd. Table 2 shows the number of converged distinct quotient singular values after k = 10, 20, 30, 40, and 50 Lanczos iterations, using a loose convergence criterion (10 -3) and identifying spurious values using the test by Cullum & Willoughby [2, w The largest quotient singular value is 71 = 7.66.103, and the table also lists the size of the smallest converged quotient singular value. The numbers u(k) are for N = 64 are not inconsistent with the results for
N=16.
138
P.C. Hansen and M. Hanke
References [1] /~. BjSrck. Least Squares Methods. In 9P.G. Ciarlet and J.L. Lions (Eds.), Handbook of Numerical Analysis, Vol. I~ North-Holland, Amsterdam, 1990. [2] J.K. Cullum and R.A. Willoughby. Lanczos Algorithms for Large Symmetric Eigenvalue Computations. Vol.I Theory. Birkh~user, Boston, 1985. [3] L. Eld~n. A weighted pseudoinverse, generalized singular values, and constrained least squares problems. BIT 22, pp 487-502, 1982. [4] H.W. Engl. Regularization methods for the stable solution of inverse problems. Surv. Math. Ind. 3, pp 71-143, 1993. [5] G.H. Golub, F.T. Luk and M.L. Overton. A block Lanczos algorithm for computing the singular values and corresponding singular vectors of a matrix. A CM Trans. Math. Soft. 7, pp 149-169, 1981. [6] G.H. Golub and C.F. Van Loan. Matrix Computations. 2. Ed. Johns Hopkins University Press, Baltimore, 1989. [7] C.W. Groetsch. Inverse Problems in the Mathematical Sciences. Vieweg, Wiesbaden, 1993. [8] M. Hanke. Iterative solution of underdetermined linear systems by transformation to standard form. In : Proceedings Numerical Mathematics in Theory and Practice, Dept. of Mathematics, University of West Bohemia, Plzefi, pp 55-63 (1993) [9] M. Hanke and P.C. Hansen. Regularization methods for large-scale problems. Surv. Math. Ind. 3, pp 253-315, 1993. [10] P.C. Hansen. Regularization, GSVD and truncated GSVD. BIT 29, pp 491-504, 1989. [11] P.C. Hansen. Relations between SVD and GSVD of discrete regularization problems in standard and general form. Lin. Alg. Appl. 141, pp 165-176, 1990. [12] P.C. Hansen. Regularization Tools: A Matlab package for analysis and solution of discrete ill-posed problems. Numerical Algorithms 6, pp 1-35, 1994. [13] B.N. Parlett. The Symmetric Eigenvalue Problem. Prentice-Hall, Englewood Cliffs, N.J., 1980. [14] H. Zha. Computing the Generalized Singular Values/Vectors of Large Sparse or Structured Matrix Pairs. Report CSE-94-022, Dept. of Computer Science and Engineering, Pennsylvania State University, January 1994; submitted to Numer. Math. A condensed version appears in these proceedings.
SVD AND SIGNAL PROCESSING, III Algorithms, Architectures and Applications M. Moonen and B. De Moor (Editors) © 1995 Elsevier Science B.V. All rights reserved.
A QR-LIKE SVD ALGORITHM FOR A PRODUCT/QUOTIENT OF SEVERAL MATRICES
G.H. GOLUB
Computer Science Department, Stanford University, Stanford, CA, U.S.A.
golub@sccm.stanford.edu

K. SOLNA
Computer Science Department, Stanford University, Stanford, CA, U.S.A.
solna@sccm.stanford.edu

P. VAN DOOREN
Cesame, Université Catholique de Louvain, Louvain-la-Neuve, Belgium
vandooren@anma.ucl.ac.be
ABSTRACT. In this paper we derive a new algorithm for constructing a unitary decomposition of a sequence of matrices in product or quotient form. The unitary decomposition requires only unitary left and right transformations on the individual matrices and amounts to computing the generalized singular value decomposition of the sequence. The proposed algorithm is related to the classical Golub-Kahan procedure for computing the singular value decomposition of a single matrix in that it constructs a bidiagonal form of the sequence as an intermediate result. When applied to two matrices, this new method is an alternative way of computing the quotient and product SVD and is more economical than current methods.
KEYWORDS. Numerical methods, generalized singular values, products of matrices, quotients of matrices.
1 INTRODUCTION
The two basic unitary decompositions of a matrix A yielding some spectral information are the Schur form A = UTU* - where U is unitary and T is upper triangular - and the singular value decomposition A = UΣV* - where U and V are unitary and Σ is diagonal (for the latter A does not need to be square). It is interesting to notice that both these forms are computed by a QR-like iteration [4]. The SVD algorithm of Golub-Kahan [3] is indeed an implicit QR-algorithm applied to the Hermitian matrix A*A. When looking at unitary decompositions involving two matrices, say A and B, a similar implicit algorithm was given in [6] and is known as the QZ-algorithm. It computes A = Q T_a Z* and B = Q T_b Z*, where Q and Z are unitary and T_a and T_b are upper triangular. This algorithm is in fact the QR-algorithm again, performed implicitly on the quotient B^{-1}A. The corresponding decomposition is therefore also known as the generalized Schur form. This is not the case, though, when considering the generalized singular value decomposition of two matrices, appearing as a quotient B^{-1}A or a product BA. In this case the currently used algorithm is not of QR type but of Jacobi type. The reason for this choice is that Jacobi methods extend to products and quotients without too many problems. The bad news is that the Jacobi algorithm typically has a (moderately) higher complexity than the QR algorithm. Yet, so far, nobody has proposed an implicit QR-like method for the SVD of a product or quotient of two matrices. In this paper we show that, in fact, such an implicit algorithm is easy to derive and that it even extends straightforwardly to sequences of products/quotients of several matrices. Moreover, the complexity will be shown to be lower than for the corresponding Jacobi-like methods.
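As a quick numerical check of the two connections mentioned above - singular values of A versus eigenvalues of A*A, and the QZ (generalized Schur) form of a pair (A, B) - the following Python/NumPy/SciPy sketch may help; it is our illustration, not part of the paper.

```python
import numpy as np
from scipy.linalg import qz

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))

# Golub-Kahan connection: sigma_i(A) = sqrt(lambda_i(A^T A))
sigma = np.linalg.svd(A, compute_uv=False)
lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(np.allclose(sigma, np.sqrt(lam)))                     # True

# QZ (generalized Schur) form: A = Q Ta Z^T, B = Q Tb Z^T.
# For real input Ta may be only quasi-triangular (2x2 blocks on its diagonal).
Ta, Tb, Q, Z = qz(A, B)
print(np.allclose(A, Q @ Ta @ Z.T), np.allclose(B, Q @ Tb @ Z.T))
```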
2 IMPLICIT SINGULAR VALUE DECOMPOSITION
Consider the problem of computing the singular value decomposition of a matrix A that is an expression of the following type:

A = A_K^{s_K} · ... · A_2^{s_2} · A_1^{s_1},                                   (1)

where s_i = ±1, i.e. a sequence of products of quotients of matrices. For simplicity we assume that the A_i matrices are square n x n and invertible, but as was pointed out in [2], this does not affect the generality of what follows. While it is clear that one has to perform left and right transformations on A to get U* A V = Σ, these transformations will only affect A_K and A_1. Yet, one can insert an expression Q_i^* Q_i = I_n in between every pair A_{i+1}^{s_{i+1}} A_i^{s_i} in (1). If we also define Q_K ≡ U and Q_0 ≡ V, we arrive at the following expression:

U* A V = (Q_K^* A_K^{s_K} Q_{K-1}) · ... · (Q_2^* A_2^{s_2} Q_1)(Q_1^* A_1^{s_1} Q_0).        (2)

With the degrees of freedom present in these K+1 unitary transformations Q_i at hand, one can now choose each expression Q_i^* A_i^{s_i} Q_{i-1} to be upper triangular. Notice that the expression Q_i^* A_i^{s_i} Q_{i-1} = T_i^{s_i} with T_i upper triangular can be rewritten as

Q_i^* A_i Q_{i-1} = T_i   for s_i = 1,        Q_{i-1}^* A_i Q_i = T_i   for s_i = -1.          (3)
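For contrast, here is a minimal sketch (ours) of the naive, explicit approach that the implicit algorithm avoids: form the product/quotient (1) directly, including explicit inverses, and compute a single SVD. The exponent choice s = [+1, -1, +1] is an arbitrary example of ours.

```python
import numpy as np

rng = np.random.default_rng(1)
A1, A2, A3 = (rng.standard_normal((4, 4)) for _ in range(3))
s = [+1, -1, +1]   # hypothetical exponents s1, s2, s3

# Naive reference: explicitly form A = A3^{s3} A2^{s2} A1^{s1}, then one SVD.
# The paper's point is to avoid this explicit product and the explicit inverses.
A = (np.linalg.matrix_power(A3, s[2]) @
     np.linalg.matrix_power(A2, s[1]) @
     np.linalg.matrix_power(A1, s[0]))
U, sig, Vh = np.linalg.svd(A)
print(sig)   # generalized (quotient/product) singular values of the sequence
```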
From the construction of a normal QR decomposition, it is clear that, while making the matrix A upper triangular, this "freezes" only one matrix Q_i per matrix A_i. The remaining unitary matrix leaves enough freedom to finally diagonalize the matrix A as well. Since meanwhile we computed the singular values of (1), it is clear that such a result can only be obtained by an iterative procedure. On the other hand, one intermediate form that is used in the Golub-Kahan SVD algorithm [3] is the bidiagonalization of A, and this can be obtained in a finite recurrence. We show in the next section that the matrices Q_i in (2) can be constructed in a finite number of steps in order to obtain a bidiagonal Q_K^* A Q_0 in (2). In carrying out this task one should try to do as much as possible implicitly. Moreover, one would like the total complexity of the algorithm to be comparable to - or less than - the cost of K singular value decompositions. This means that the complexity should be O(K n^3) for the whole process.
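The following Python/NumPy sketch (ours, not the paper's) shows the finite Householder recurrence for bidiagonalizing a single square matrix, the Golub-Kahan intermediate form referred to above; the helper names are hypothetical.

```python
import numpy as np

def householder(x):
    # Householder vector v such that (I - 2 v v^T / v^T v) x has only its first entry nonzero.
    v = x.astype(float).copy()
    v[0] += (np.sign(x[0]) or 1.0) * np.linalg.norm(x)
    return v

def bidiagonalize(A):
    # Reduce a square matrix to upper bidiagonal form by alternating row/column reflectors.
    B = A.astype(float).copy()
    n = B.shape[0]
    for i in range(n):
        v = householder(B[i:, i])                      # zero column i below the diagonal
        if np.linalg.norm(v) > 0:
            B[i:, i:] -= 2.0 * np.outer(v, v @ B[i:, i:]) / (v @ v)
        if i < n - 2:                                  # zero row i beyond the superdiagonal
            v = householder(B[i, i+1:])
            if np.linalg.norm(v) > 0:
                B[i:, i+1:] -= 2.0 * np.outer(B[i:, i+1:] @ v, v) / (v @ v)
    return B

A = np.random.default_rng(2).standard_normal((5, 5))
B = bidiagonalize(A)
print(np.round(B, 3))                                  # upper bidiagonal, up to rounding
print(np.allclose(np.linalg.svd(A, compute_uv=False),
                  np.linalg.svd(B, compute_uv=False))) # singular values are preserved
```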
3 IMPLICIT BIDIAGONALIZATION
We now derive such an implicit reduction to bidiagonal form. Below H(i, j) denotes the group of Householder transformations having (i, j) as the range of rows/columns they operate on. Similarly G(i, i+1) denotes the group of Givens transformations operating on rows/columns i and i+1. We first consider the case where all s_i = 1. We thus only have a product of matrices A_i, and in order to illustrate the procedure we show its evolution operating on a product of 3 matrices only, i.e. A_3 A_2 A_1. Below is a sequence of "snapshots" of the evolution of the bidiagonal reduction. Each snapshot indicates the pattern of zeros ('0') and nonzeros ('x') in the three matrices. First perform a Householder transformation Q_1^{(1)} in H(1, n) on the rows of A_1 and the columns of A_2. Choose Q_1^{(1)} to annihilate all but one element in the first column of A_1:

[ x x x x x ] [ x x x x x ] [ x x x x x ]
[ x x x x x ] [ x x x x x ] [ 0 x x x x ]
[ x x x x x ] [ x x x x x ] [ 0 x x x x ]
[ x x x x x ] [ x x x x x ] [ 0 x x x x ]
[ x x x x x ] [ x x x x x ] [ 0 x x x x ] .
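A minimal numerical illustration of this first step (our sketch): a Householder reflector built from the first column of A1 is applied to the rows of A1 and, as its own inverse, to the columns of A2, so the product A2·A1 is unchanged.

```python
import numpy as np

rng = np.random.default_rng(6)
A1, A2 = rng.standard_normal((5, 5)), rng.standard_normal((5, 5))

# Householder reflector chosen from the first column of A1, as in the step above.
x = A1[:, 0].copy()
v = x.copy(); v[0] += (np.sign(x[0]) or 1.0) * np.linalg.norm(x)
Q1 = np.eye(5) - 2.0 * np.outer(v, v) / (v @ v)   # symmetric orthogonal reflector, Q1 @ Q1 = I

A1_new = Q1 @ A1          # rows of A1: first column now has a single nonzero
A2_new = A2 @ Q1          # columns of A2 absorb the inverse transformation
print(np.round(A1_new[:, 0], 12))                 # [r, 0, 0, 0, 0]
print(np.allclose(A2_new @ A1_new, A2 @ A1))      # the product is unchanged
```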
Then perform a Householder transformation Q_2^{(1)} in H(1, n) on the rows of A_2 and the columns of A_3. Choose Q_2^{(1)} to annihilate all but one element in the first column of A_2:

[ x x x x x ] [ x x x x x ] [ x x x x x ]
[ x x x x x ] [ 0 x x x x ] [ 0 x x x x ]
[ x x x x x ] [ 0 x x x x ] [ 0 x x x x ]
[ x x x x x ] [ 0 x x x x ] [ 0 x x x x ]
[ x x x x x ] [ 0 x x x x ] [ 0 x x x x ] .
Then perform a Householder transformation Q_3^{(1)} in H(1, n) on the rows of A_3. Choose Q_3^{(1)} to annihilate all but one element in the first column of A_3:

[ x x x x x ] [ x x x x x ] [ x x x x x ]
[ 0 x x x x ] [ 0 x x x x ] [ 0 x x x x ]
[ 0 x x x x ] [ 0 x x x x ] [ 0 x x x x ]
[ 0 x x x x ] [ 0 x x x x ] [ 0 x x x x ]
[ 0 x x x x ] [ 0 x x x x ] [ 0 x x x x ] .

Notice that this third transformation yields the same form also for the product of the three matrices:

[ x x x x x ] [ x x x x x ] [ x x x x x ]   [ x x x x x ]
[ 0 x x x x ] [ 0 x x x x ] [ 0 x x x x ]   [ 0 x x x x ]
[ 0 x x x x ] [ 0 x x x x ] [ 0 x x x x ] = [ 0 x x x x ]
[ 0 x x x x ] [ 0 x x x x ] [ 0 x x x x ]   [ 0 x x x x ]
[ 0 x x x x ] [ 0 x x x x ] [ 0 x x x x ]   [ 0 x x x x ] .
At this stage we are interested in the first row of this product. This row can be constructed as the product of the first row of A_3 with the matrices to the right of it, and this requires only O(K n^2) flops. Once this row is constructed we can find a Householder transformation Q_0^{(1)} in H(2, n) operating on the last (n-1) elements which annihilates all but two elements:

[ x x 0 0 0 ].

This transformation is applied to A_1 only and this completes the first stage of the bidiagonalization since

Q_3^{(1)*} A Q_0^{(1)} =
[ x x 0 0 0 ]
[ 0 x x x x ]
[ 0 x x x x ]
[ 0 x x x x ]
[ 0 x x x x ] .                                   (4)

Now perform a Householder transformation Q_1^{(2)} in H(2, n) on the rows of A_1 and the columns of A_2. Choose Q_1^{(2)} to annihilate all but two elements in the second column of A_1:

[ x x x x x ] [ x x x x x ] [ x x x x x ]
[ 0 x x x x ] [ 0 x x x x ] [ 0 x x x x ]
[ 0 x x x x ] [ 0 x x x x ] [ 0 0 x x x ]
[ 0 x x x x ] [ 0 x x x x ] [ 0 0 x x x ]
[ 0 x x x x ] [ 0 x x x x ] [ 0 0 x x x ] .

Then perform a Householder transformation Q_2^{(2)} in H(2, n) on the rows of A_2 and the columns of A_3. Choose Q_2^{(2)} to annihilate all but two elements in the second column of A_2:

[ x x x x x ] [ x x x x x ] [ x x x x x ]
[ 0 x x x x ] [ 0 x x x x ] [ 0 x x x x ]
[ 0 x x x x ] [ 0 0 x x x ] [ 0 0 x x x ]
[ 0 x x x x ] [ 0 0 x x x ] [ 0 0 x x x ]
[ 0 x x x x ] [ 0 0 x x x ] [ 0 0 x x x ] .
Then perform a Householder transformation Q_3^{(2)} in H(2, n) on the rows of A_3 and choose it to annihilate all but two elements in the second column of A_3:

[ x x x x x ] [ x x x x x ] [ x x x x x ]
[ 0 x x x x ] [ 0 x x x x ] [ 0 x x x x ]
[ 0 0 x x x ] [ 0 0 x x x ] [ 0 0 x x x ]
[ 0 0 x x x ] [ 0 0 x x x ] [ 0 0 x x x ]
[ 0 0 x x x ] [ 0 0 x x x ] [ 0 0 x x x ] .
For the product we then know that:

[ x x x x x ] [ x x x x x ] [ x x x x x ]   [ x x 0 0 0 ]
[ 0 x x x x ] [ 0 x x x x ] [ 0 x x x x ]   [ 0 x x x x ]
[ 0 0 x x x ] [ 0 0 x x x ] [ 0 0 x x x ] = [ 0 0 x x x ]
[ 0 0 x x x ] [ 0 0 x x x ] [ 0 0 x x x ]   [ 0 0 x x x ]
[ 0 0 x x x ] [ 0 0 x x x ] [ 0 0 x x x ]   [ 0 0 x x x ] .
At this stage we are interested in the second row of this product (the second row of the matrix displayed above). This row can be constructed as the product of the second row of A_3 with the matrices to the right of it, and this again requires only O(K(n-1)^2) flops. Once constructed, we can find a Householder transformation Q_0^{(2)} in H(3, n) operating on the last (n-2) elements which annihilates all but two elements:

[ 0 x x 0 0 ].                                   (5)
This transformation is then applied to A1 only, completing the second step of the bidiagonalization of A :
Q_3^{(2)*} Q_3^{(1)*} A Q_0^{(1)} Q_0^{(2)} =
[ x x 0 0 0 ]
[ 0 x x 0 0 ]
[ 0 0 x x x ]
[ 0 0 x x x ]
[ 0 0 x x x ] .
It is now clear from the context how to proceed further with this algorithm to obtain after n-1 stages:

[ x x x x x ] [ x x x x x ] [ x x x x x ]   [ x x 0 0 0 ]
[ 0 x x x x ] [ 0 x x x x ] [ 0 x x x x ]   [ 0 x x 0 0 ]
[ 0 0 x x x ] [ 0 0 x x x ] [ 0 0 x x x ] = [ 0 0 x x 0 ]
[ 0 0 0 x x ] [ 0 0 0 x x ] [ 0 0 0 x x ]   [ 0 0 0 x x ]
[ 0 0 0 0 x ] [ 0 0 0 0 x ] [ 0 0 0 0 x ]   [ 0 0 0 0 x ] .
Notice that we never construct the whole product A = A_3 A_2 A_1, but rather compute one of its rows when needed for constructing the transformations Q_0^{(i)}. The only matrices that are kept in memory and updated are the A_i matrices, and possibly Q_K and Q_0 if we require the singular vectors of A afterwards.
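A small sketch (ours) of this row-at-a-time idea for a plain product A = A_3 A_2 A_1: a single row of the product is obtained from one row of A_3 by repeated row-times-matrix products, i.e. O(K n^2) flops, without ever forming A.

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 5, 3
As = [rng.standard_normal((n, n)) for _ in range(K)]   # A1, A2, A3, with A = A3 A2 A1

def product_row(As, i):
    # Row i of A3 A2 A1, computed from the i-th row of the leftmost factor
    # by K-1 vector-matrix products; the full n x n product is never formed.
    row = As[-1][i, :]
    for Ai in reversed(As[:-1]):
        row = row @ Ai
    return row

A = As[2] @ As[1] @ As[0]
print(np.allclose(product_row(As, 1), A[1, :]))         # True
```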
The complexity of this bidiagonalization step is easy to evaluate. Each matrix A_i gets pre- and post-multiplied with essentially n Householder transformations of decreasing range. For updating all A_i we therefore need 5Kn^3/3 flops, and for updating Q_K and Q_0 we need 2n^3 flops. For constructing the required row vectors of A we need (K-1)n^3/3 flops. Overall
we thus need 2Kn^3 flops for the construction of the triangular T_i and 2n^3 for the outer transformations Q_K and Q_0. Essentially this is 2n^3 flops per updated matrix. If some of the s_i = -1, we cannot use Householder transformations anymore. Indeed, in order to construct the rows of A when needed, the matrices A_i^{-1} have to be triangularized first, say with a QR factorization. This QR factorization is performed in an initial step. From there on the same procedure is followed, but using Givens rotations instead of Householder transformations. The use of Givens rotations allows us to update the triangularized matrices A_i^{-1} while keeping them upper triangular. Each time a Givens rotation destroys this triangular form, another Givens rotation is applied to the other side of that matrix in order to restore its triangular form. The same technique is, e.g., used to keep the B matrix upper triangular in the QZ algorithm applied to B^{-1}A. The bookkeeping of this algorithm is a little more involved and so are the operation counts, which is why we do not develop this here. One shows that when there are inverses involved, the complexity of the bidiagonalization step amounts to less than 4n^3 flops per updated matrix.
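The following sketch (ours) illustrates this QZ-style bookkeeping on a single 5x5 upper triangular matrix: a row rotation creates one subdiagonal bulge, and a column rotation applied from the other side removes it, restoring the triangular form.

```python
import numpy as np

rng = np.random.default_rng(3)
R = np.triu(rng.standard_normal((5, 5)))
i = 1

# A rotation of rows i, i+1 (as would be propagated from the other factors)
# creates a single subdiagonal bulge at (i+1, i) ...
c, s = np.cos(0.3), np.sin(0.3)
G_left = np.eye(5); G_left[i:i+2, i:i+2] = [[c, s], [-s, c]]
R1 = G_left @ R
print(abs(R1[i+1, i]) > 1e-12)        # True: bulge appeared

# ... and a rotation of columns i, i+1 chosen to annihilate that entry
# restores the upper triangular form without disturbing the rest of the pattern.
a, b = R1[i+1, i], R1[i+1, i+1]
r = np.hypot(a, b)
c2, s2 = b / r, a / r
G_right = np.eye(5); G_right[i:i+2, i:i+2] = [[c2, s2], [-s2, c2]]
R2 = R1 @ G_right
print(np.allclose(R2, np.triu(R2)))   # True: triangular again
```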
4 COMPUTING THE SINGULAR VALUES
The use of Householder and Givens transformations for all operations in the bidiagonalization step guarantees that the obtained matrices T_i in fact correspond to slightly perturbed data as follows:

T_i = Q_i^*(A_i + δA_i)Q_{i-1},  s_i = 1,        T_j = Q_{j-1}^*(A_j + δA_j)Q_j,  s_j = -1,      (6)

where ||δA_i|| ...

... m ≥ n and given k,
1 ≤ k ≤ n, there exists a column permutation Π such that

AΠ = QR = Q [ R11  R12 ]
            [  0   R22 ] ,                                   (1)

where R11 is k-by-k, and

σ_min(R11) ≈ σ_k(A),        σ_max(R22) ≈ σ_{k+1}(A),         (2)

where Q in R^{m x n}, Q^T Q = I, R in R^{n x n} is upper triangular, and
σ_1(A) ≥ σ_2(A) ≥ ... ≥ σ_n(A) are the singular values of A. In particular, if k is the numerical rank of A, i.e., σ_k(A) ≫ σ_{k+1}(A) ≈ 0 ([4, 5]), the factorization is rank revealing ([1]). Unfortunately, since the choice of the permutation Π in (1) relies partially on information from the SVD of A, the proof does not render a practical algorithm to substitute for the SVD in determining rank. In this paper, we give an algorithm which identifies the permutation Π without using any information on the SVD. We introduce a pair of dual concepts: pivoted blocks and reverse pivoted blocks. They are the natural generalization of what we call pivoted magnitudes and reverse pivoted magnitudes, which are the results of Golub's well-known column pivoting [3] and a less well-known column pivoting strategy proposed by Stewart [7]. The main result is that a pivoted block R11 (or equivalently a reverse pivoted block R22) ensures (2). This result allows us to devise a column pivoting strategy, which we call cyclic pivoting, that produces the pivoted and reverse pivoted blocks. In particular, if k is the numerical rank of A, the cyclic pivoting strategy guarantees a rank-revealing QR factorization. It should be pointed out that our pivoting strategy, which does not use the SVD, is similar to that of Hybrid-III in [2], but the two methods are nevertheless fundamentally different. The rest of the paper is organized as follows. Section 2 establishes bounds on singular values derived from the properties of pivoted and reverse pivoted magnitudes. Section 3 presents our main theorems on pivoted blocks and reverse pivoted blocks. Section 4 presents the cyclic column pivoting algorithm used to find the pivoted blocks. Section 5 presents some numerical experiments used to confirm the theoretical results.
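As a small illustration of (1)-(2) (ours, not from the paper), the sketch below applies SciPy's QR with column pivoting to a matrix of numerical rank 5 and compares σ_min(R11) and σ_max(R22) with the corresponding singular values; ordinary column pivoting usually, though not provably, produces such a rank-revealing factorization.

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(4)
k = 5
# A 30 x 20 matrix of numerical rank 5, plus a tiny perturbation.
A = rng.standard_normal((30, k)) @ rng.standard_normal((k, 20)) \
    + 1e-10 * rng.standard_normal((30, 20))

Q, R, piv = qr(A, mode='economic', pivoting=True)
R11, R22 = R[:k, :k], R[k:, k:]
sig = np.linalg.svd(A, compute_uv=False)
print(np.linalg.svd(R11, compute_uv=False).min(), sig[k-1])  # sigma_min(R11) vs sigma_k(A)
print(np.linalg.svd(R22, compute_uv=False).max(), sig[k])    # sigma_max(R22) vs sigma_{k+1}(A)
```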
2 PIVOTED MAGNITUDES
Given a matrix A, there are two well-known pivoting strategies for QR factorization that produce tight bounds on σ_max(A) and σ_min(A) from particular entries in R. Those particular entries of R are important to our subsequent discussion, so we make two definitions. Given an m-by-n matrix A, m ≥ n, let Π_{i,j} be the permutation such that AΠ_{i,j} interchanges columns i and j of A. The pivoted magnitude of A, η(A), is defined to be the maximum
magnitude of the (1, 1) entry of the R factors of AΠ_{1,l}, l = 1, 2, ..., n, thus:

η(A) := max_l |r_11|,    where   AΠ_{1,l} = Q [ r_11  r_12  ...  r_1n ]
                                               [       r_22  ...  r_2n ]
                                               [              ...  ... ]
                                               [                  r_nn ] ,   l = 1, 2, ..., n.
Algorithmically, one can think of applying QR factorization with column pivoting [3] to A. Then, the magnitude of r_11 of the resulting R factor is η(A). Clearly,

η(A) = max_j ||A e_j||_2,    j = 1, 2, ..., n,
where e_j is the standard basis vector in R^n. We now define the reverse pivoted magnitude of A, τ(A), to be the minimum magnitude of the (n, n) entry of the R factors of AΠ_{l,n}, l = 1, 2, ..., n, thus:

τ(A) := min_l |r_nn|,    where   AΠ_{l,n} = Q [ r_11  r_12  ...  r_1n ]
                                               [       r_22  ...  r_2n ]
                                               [              ...  ... ]
                                               [                  r_nn ] ,   l = 1, 2, ..., n.
If A is nonsingular, we also have

τ(A) = 1 / max_j ||e_j^T A^{-1}||_2,

as shown in [7], where Stewart calls a related column pivoting strategy the reverse pivoting strategy. The following lemma is not new. The result for τ(A) is proved in [7] and [1]. The result for η(A) is rather straightforward. We therefore only state the results.

Lemma 1  Let A be an m-by-n matrix, m ≥ n. Then,

η(A) ≤ σ_max(A) ≤ sqrt(n) η(A),                              (3)

and

(1/sqrt(n)) τ(A) ≤ σ_min(A) ≤ τ(A).                          (4)
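A quick numerical check of Lemma 1 (our sketch): η(A) is the largest column norm, and τ(A) is evaluated here from the rows of the pseudoinverse - the formula above is stated for nonsingular A, so using the pseudoinverse for a rectangular full-rank matrix is our assumption.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((8, 6))
n = A.shape[1]

eta = np.linalg.norm(A, axis=0).max()                         # eta(A): largest column norm
tau = 1.0 / np.linalg.norm(np.linalg.pinv(A), axis=1).max()   # tau(A) via rows of A^+ (full-rank assumption)
sig = np.linalg.svd(A, compute_uv=False)

print(eta <= sig[0] <= np.sqrt(n) * eta)       # bound (3)
print(tau / np.sqrt(n) <= sig[-1] <= tau)      # bound (4)
```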
Now we consider the QR factorization

A = QR = Q [ R11  R12 ]
           [  0   R22 ] ,

with R11 of size k-by-k. Two related submatrices are important in subsequent discussions: the (k+1)-by-(k+1) leading principal submatrix of R, denoted R̂11, and the (n-k+1)-by-(n-k+1) trailing principal submatrix of R, denoted Ř22. The next two lemmas facilitate the discussions to follow.
Lemma 2  If τ(R̂11) ≥ f η(Ř22) for some 0 < f ≤ 1, then σ_min(R11) ... f ... for l = 1, 2, ..., k.
Among other benefits, the f-factor provides us with flexibility in algorithm implementation without sacrificing theoretical rigor.

Theorem 1  Let

A = QR = Q [ R11  R12 ]
           [  0   R22 ] ,

with R11 of size k-by-k. If R11 is a pivoted block, then σ_min(R11) ≤ σ_k(A) ... f |r^{(l)}_{k+1,k+1}| for l = k+1, k+2, ..., n.
It turns out that pivoted blocks and reverse pivoted blocks are dual concepts.

Theorem 2  Let

A = QR = Q [ R11  R12 ]
           [  0   R22 ] ,

with R11 of size k-by-k. Then R11 is a pivoted block if and only if R22 is a reverse pivoted block. More generally, R11 is an f-pivoted block if and only if R22 is a reverse f-pivoted block, where 0 < f < 1. (See [6] for a proof.)
4 ALGORITHM
In the previous section, the main properties of pivoted and reverse pivoted blocks were discussed. Do these pivoted blocks always exist? The following algorithm answers this question positively. We use the notation R(M) to denote the R factor of a matrix M.

Algorithm 1 (Cyclic Pivoting)  Given k, 1 ≤ k ≤ n, ...

Table 1: ... m ≥ n. The number Wd(A) is the average work to compute the singular value and vector estimate.
Required     L-ULV(A)
L            8mnk + 3(m+n)k + (k+1)Wd(A)
L, V         8mnk + 3(m+n)k + 4n^2 k + (k+1)Wd(A)
L, U         4m^2 k + 8mnk + 3(m+n)k + (k+1)Wd(A)
L, U1        12mnk + 3(m+n)k + (k+1)Wd(A)
L, U, V      4m^2 k + 4(2m+n)nk + 3(m+n)k + (k+1)Wd(A)
L, U1, V     12mnk + 3(m+n)k + 4n^2 k + (k+1)Wd(A)

Required     R-SVD (m ≥ 5n/3)      Golub-Reinsch SVD (5n/3 ≥ m ≥ n)
Σ            2mn^2 + 2n^3          4mn^2 - 4n^3/3
Σ, V         2mn^2 + 11n^3         4mn^2 + 8n^3
Σ, U         4m^2 n + 13n^3        4m^2 n + 8mn^2
Σ, U1        6mn^2 + 11n^3         14mn^2 - 2n^3
Σ, U, V      4m^2 n + 22n^3        4m^2 n + 8mn^2 + 9n^3
Σ, U1, V     6mn^2 + 20n^3         14mn^2 + 8n^3
In Table 1 we summarize the flop counts for our L-ULV(A) algorithm. We compare the flop counts with those of two SVD algorithms, cf. [5]: the Golub-Reinsch SVD algorithm and the R-SVD algorithm. An important consideration is "how much" of the ULV decomposition is required. For example, in many total least squares problems, only k and V are required. Therefore, we give flop counts for various combinations of the factors U, U1, L and V, and compare with the two SVD algorithms. In the table, Wd(A) denotes the average work involved in a deflation step to compute the required singular value and vector estimate. We refer to the sections below for more details about Wd(A) and for an evaluation of the algorithm. As mentioned earlier, in [4] we present and analyze ALGORITHM ULV(L), which requires an initial "skinny" QR factorization of A. We note that if the matrix A has Toeplitz structure, then the initial skinny QR factorization of A can be computed much faster by means of specialized algorithms: R and U1 can be computed in mn + 6n^2 and 12mn flops, respectively; see, e.g., [9] (a more accurate R can be computed in 2mn + 6.5n^2 flops by a new algorithm by Park & Eldén [10]). The Toeplitz structure is lost in the R factor. Altogether, the flop count for ALGORITHM L-ULV(L) is mn(6k+13) + n^2(12k+6) + (k+1)Wd(L) when A has Toeplitz structure and U1 and V are required explicitly with L. See also [4] for a discussion of an implicit ULV algorithm in which the L matrix is never explicitly computed but kept in factored form.
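To get a feel for the numbers, one can plug sample dimensions into the flop-count formulas of Table 1 as reconstructed above (our sketch; the deflation term (k+1)Wd(A) is omitted since it depends on the estimator used, and the dimensions are arbitrary).

```python
# Sample evaluation of selected flop-count formulas from Table 1
# (deflation work (k+1)*Wd(A) omitted; m, n, k chosen only for illustration).
m, n, k = 2000, 400, 20

l_ulv_LV  = 8*m*n*k + 3*(m + n)*k + 4*n*n*k     # L-ULV(A), computing L and V
r_svd_SV  = 2*m*n*n + 11*n**3                   # R-SVD, computing Sigma and V
gr_svd_SV = 4*m*n*n + 8*n**3                    # Golub-Reinsch SVD, Sigma and V

for name, flops in [("L-ULV(A), L and V", l_ulv_LV),
                    ("R-SVD, Sigma and V", r_svd_SV),
                    ("Golub-Reinsch, Sigma and V", gr_svd_SV)]:
    print(f"{name}: {flops:.2e} flops")
```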
3 THE IMPORTANCE OF A GOOD SINGULAR VECTOR ESTIMATE
In this section we prove that a very important part of the algorithms given above is the estimation of the largest singular value and associated singular vector in the ith deflation step. The following theorem reveals the importance of a good singular vector estimate during the ith stage of ALGORITHM L-ULV(A), and readily applies to ALGORITHM L-ULV(L) (see [4] for the proof).

Theorem 1  Let σ_est^{(i)} and u_est^{(i)} denote the estimates of the largest singular value σ_1^{(i)} and associated left singular vector u_1^{(i)} of A(i:m, i:n). Let θ_i denote the angle between u_est^{(i)} and u_1^{(i)}. Let the orthogonal matrices P^{(i)} and Q^{(i)} come from ALGORITHM L-ULV(A), and partition the updated P^{(i)} A(i:m, i:n) Q^{(i)} according to

P^{(i)} A(i:m, i:n) Q^{(i)} = [ l_i      0     ]   1
                              [ h_i   L^{(i)} ]   m-i                (2)

with column blocks of widths 1 and n-i. Then ... ≤ sin θ_i ... and ...
... ≥ σ_M                                   (1)
yields a basis V_1 of the signal subspace R(V_1). L ≥ M is assumed and * denotes the conjugate transpose. However, since O(LM^2 + M^3) operations are needed, faster factorizations are considered here, such as the rank-revealing (RR) URV decomposition:

T = U T^{(K)} V* = [ U1  U2  U3 ] [ D  F ] [ V1  V2 ]* ,             (2)
                                  [ 0  G ]
                                  [ 0  0 ]

where U1, U2 and U3 have K, M-K and L-M columns respectively, V1 and V2 have K and M-K columns, U and V are unitary, and D in C^{K x K} and G are upper triangular. The URV decomposition is rank revealing in the sense that the numerical rank K of T is exhibited in (2) from

||F||_F^2 + ||G||_F^2 ≈ σ_{K+1}^2 + ... + σ_M^2,

where σ_k denotes the kth singular value of T. In the noise-free case, all the singular values σ_{K+1}, ..., σ_M are equal to zero in exact arithmetic. However, in a noisy environment, the singular values σ_{K+1}, ..., σ_M correspond to the noise part of the data, and the numerical rank K has to be estimated. The RR URV decomposition was first introduced by Stewart [11] for matrices of which the dimension of the signal subspace is large, i.e. when K is only slightly smaller than M. Recently, it was adapted to the computation of a basis for the signal subspace when K ...

... (σ_2^2 ≈ σ_3^2 ≈ σ_4^2), 3. Dominant Subspace Size Correct (σ_1^2 > σ_2^2 > σ_3^2 ≈ σ_4^2), 4. Increase Dominant Subspace Size (σ_1^2 ≥ σ_2^2 ≥ σ_3^2 > σ_4^2). Thus, it is only necessary to test for four possible outcomes, i.e., q = 0, r-1, r, r+1. For the details and the strong consistency theorem, see [10].
4 SIMPLIFICATIONS WITH FORWARD-BACKWARD AVERAGING
The data model of [8] assumes forward-only averaged data. Here we discuss how to reduce computations when forward-backward averaging is applied. For a more detailed discussion of the topics covered here, see [12]. Write the forward-backward averaged correlation matrix as

R_{FB,k} = (1/2) [ A_k^H   J A_k^T ] [ A_k     ]  = A_{FB,k}^H A_{FB,k}.        (26)
                                     [ A_k^* J ]
The following theorem, proven in [12], shows how to transform this correlation matrix to a purely real matrix with purely real square root data factors.
Theorem.  Let Z_FB be defined according to

Z_FB = L^H A_FB K,                                   (27)

where K is given by

K = (1/√2) [ I   jI ]          or          K = (1/√2) [ I   0    jI ]
           [ J  -jJ ]                                 [ 0  √2    0  ]
                                                      [ J   0   -jJ ]           (28)

for even and odd ordered matrices respectively, A_FB is the FB data matrix defined in (26), and L is a unitary transformation defined by

L = (1/√2) [ I   jI ]
           [ I  -jI ] .                              (29)

Then

B = K^H R_FB K = K^H A_FB^H A_FB K = Z_FB^T Z_FB,    (30)
[ Re(A1 + JA2) -Im(A1 + JA2)
I m ( A 1 - JA~)] Re(A, - JA2) "
(31)
Assume the data is complex and let ZFB be defined by (27). Similarly, let AFB be defined by (26). If the SVD of ZFB is given by Z F B -----U Z ~ Z v rZ
(32)
where U z and V z contain the left and right singular vectors, respectively, and ]Ez is a non-square diagonal matrix containing the singular values of ZFB , and the SVD of A F B -- U AN2AV/~ , then U A ---- L U z ,
~A = ~Z
and
V A = KV z.
(33)
The SVD computational cost is thus reduced by approximately 75%. SVD updating of A F B , k c a n be more efficiently accomplished by updating ZFB,k instead. First permute AFB,k as defined in (26) by adding a pair of forward backward data vectors. To compute a forward-backward update in less computations than a forward-only update, /x 1. From the k th data vector, x H = a k = [al,k according to (31).
a2,k] , compute rows z2k-1 and Z2k
2. Append Z2k-1 and Z~k t o ZFB,k_ 1 and use two SSA4 updates (or other subspace tracking method) to update the right singular vectors, V z and the singular wlues. 3. As needed, compute VA,k = KVz, k 9(Note: ~-,A,k = }3Z, k .)
234 5
E.M. Dowling et al.
CONCLUSION
In this paper we derived the algorithm ODE for the Single sided Signal Averaged 4-level (SSA4) subspace tracker. We used this ODE to prove convergence in the mean with probability one to the signal subspace in a stationary environment. The algorithm is inexpensive and useful in practical slowly time-varying sonar and radar tracking problems. SSA4-MDL monitors the signal rank. Forward-backward averaging adds structure that can be exploited to reduce computation and boost performance.
Acknowledgments This work was supported in part by the National Science Foundation Grant MIP-9203296 and the Texas Advanced Itesearch Program Grant 009741-022.
References [1] I. Karasalo, "Estimating the covariance matrix by signal subspace averaging," IEEE Trans. ASSP, vol. ASSP-34, pp. 8-12, Feb. 1986. [2] It. D. DeGroat, "Non-iterative subspace tracking," IEEE Trans. Signal Processing, vol. 40, pp. 571-577, Mar. 1992. [3] E. M. Dowling and It. D. DeGroat, "Adaptation dynamics of the spherical subspace tracker," IEEE Transactions on Signal Processing, pp. 2599-2602, Oct. 1992. [4] It. D. DeGroat and E. M. Dowling, "Non-iterative subspace updating," in SPIE Adv.SP Alg., Arch. and Imp. II, (San Diego, California), pp. 376-387, July 1991. [5] It. D. DeGroat and E. M. Dowling, "Spherical subspace tracking: analysis, convergence and detection schemes," 26th Asilomar Conference, pp. 561-565, Oct. 1992. [6] E. M. Dowling and It. D. DeGroat, "A spherical subspace based adaptive filter," in ICASSP, pp. III:504-507, April 1993. [7] It. D. DeGroat, H. Ye, and E. M. Dowling, "An asymptotic analysis of spherical subspace tracking," 27th Asilomar Conference, Nov. 1993. [8] E. M. Dowling, It. D. DeGroat, D. A. Linebarger, and Z. Fu, "Iteal time architectures for sphericalized SVD updating," in this volume. [9] L. Ljung and T. Soderstrom, Theory and prac. of rec. id. MIT Press, 1983. [10] It. D. DeGroat, E. M. Dowling, H. Ye, and D. A. Linebarger, "Multilevel spherical subspace tracking," IEEE Trans. Signal Processing, Submitted August 1994. [11] M. Wax and T. Kailath, "Detection of signals by information theoretic criteria," IEEE Trans. ASSP, vol. ASSP-33, pp. 387-392, Apr. 1985. [12] D. A. Linebarger, It. D. DeGroat, and E. M. Dowling, "Efficient direction finding methods employing forward/backward averaging," IEEE Tr. on Sig. Proc., Aug. 1994.
SVD AND SIGNAL PROCESSING, III Algorithms, Architectures and Applications M. Moonen and B. De Moor (Editors) 9 1995 Elsevier Science B.V. All rights reserved.
REAL TIME ARCHITECTURES ING
FOR SPHERICALIZED
235
SVD UPDAT-
E.M. DOWLING, R.D. DEGROAT, D.A. LINEBARGER, Z. FU University of Texas at Dallas, EC33 P.O. Box 830688 Richardson, TX 75083-0688 U.S.A.
[email protected] ABSTRACT. In this paper we develop a low complexity square root algorithm and a pseudo-systolic array architecture to track the dominant or subdominant singular subspace associated with time-varying data matrices in real time. The pseudo-systolic architectures are ideally suited for implementation with custom CORDIC VLSI or networks of available parallel numeric processing chips such as iWarps or TMS320C40's. The real time update complexity is O(n) where n is the data dimension. Further parallelization is possible to scale complexity down linearly as more processors are added. We show how to track the dimension of the signal subspace and gear-shift the array when the subspace dimension changes. KEYWORDS. SVD updating, subspace tracking, systolic arrays.
1
INTRODUCTION
The background information about the Single sided Signal Averaged 4-level (SSA4) subspace tracking algorithm is discussed in the companion paper in this volume [1], so it will not be repeated here. The rest of the paper is organized as follows. In section II we will develop the SSA4 parallel subspace tracking algorithm. Section III describes the array algorithm that prealigns the spherical subspace basis, and section IV covers the operations associated with the core 5• SVD Jacobi sweep. The use of Jacobi sweeps in subspace updating algorithms has been employed by both Moonen and Proakis [2, 3]. In section V we show how the array implements SSA4-MDL and how it gear-shifts to increase/decrease the dimension of the tracked subspace during tracking. Section VI concludes the paper.
E.M. Dowling et al.
236
2
SSA4 ALGORITHM
DEVELOPMENT
Consider a vector time series {xk} whose associated slowly time-varying correlation matrix may be estimated recursively according to
where 0 < a < l is an exponential window fading factor and l~k is the correlation matrix estimate. Suppose the vector time series is generated by the nonstationary narrowband model [4], ..L% xk = ~_~ a(Oj(k))sj, k + nk (2) j--1
where a(0j(k))EC n is a signal vector, 8j(k) is the jth signal direction at the k th sample, sj,k is the k th sample of the jth modulating process, r is the number of sources and nkEC n is additive white noise. Here the set of signal vectors, { a ( O j ( k ) ) l O j ( k ) , j = 1, ...r} defines the signal subspace. If we define the matrix r ( 0 ( k ) ) e c ~• to have ~(0s(k)) ~ its jth column, then we can write
xk = r ( 0 ( k ) ) ~ + nk
(3)
where now the jth element of s k E C r is sd, k . Now consider the square root data matrix formulation,
9.. l
xk
9 1
~ v~xk
(4)
xH
where W k = diag( ak ,..., a, 1) is an exponential weighting matrix. Note that for stationary data, (1/(1 - a ) ) R k = ~j=ok a k - J X k X H = AHAk so that the r dominant right singular vectors of Ak span the signal subspace, i.e., the range of r(8(k)) . Suppose that at time k - 1 we have a 4-level sphericalized SVD estimate of the data matrix. A 4-level SVD estimate can be obtained from a full SVD by replacing the first r - 1 singular values by their root-square average value, and by replacing the last n - r - 2 singular values by their root-square average value. At time k = 0 we initialize the estimate to be in the 4-level form and from then on each rank one SVD update is resphericalized to produce a new 4-level sphericalized estimate. Hence, at time k - 1 we have the 4-level sphericalized SVD estimate,
(5) which has the 4-level structure 0.1I t - 1
A~4)x_ = Uk-1
I
0"2 0"3 a4Im-r-1
v~' V~
(~)
In (6), al and 0"4 represent the root-square average singular-levels and the columns of the sphericalized portions of the signal and noise subspaces, V , and Vn are the associated
237
Real Time Architectures for Sphericalized SVD Updating
spherical right singular subspace basis vectors. To avoid expensive and redundant computations, the algorithm will actually only update either the signal or noise subspace basis. Thus we will only carry along the smaller of V8 or V,~. Also, since we are only interested in the right singular subspace, we do not update U k. Once (6) is available, we append a new row as follows:
So the question is, how do we restore the form of (6) to the form of (7)? At this point we will take advantage of the spherical subspace structure induced on the problem to doubly deflate the computation to a core 5x4 SVD. Write the appended and A
H
A
rotated input, j3=x k Vk-1, and define G = diag(Z~'/]Zl]," "Z~/]Zn]) so that 7 = ZG is real. Then, since G is unitary, we can rewrite the decomposition as
or,
xH-
=[Uk-10
01jL G][0
01][ C-~DTk-~]GHVHk-t"
(9)
Note that the core matrix is diagonal except for the bottom row and that all elements are real. For notational convenience, denote the two left most matrices on the right hand side of (9) as Ok-1 and G . Block embedded matrices will follow this hat-convention in all that follows. Let J1ER nxn be an orthogonal transformation that, when multiplied onto 7, annihilates the first r - 2 elements. Let J2ER nx'~ be an orthogonal transformation that, when multiplied onto 7, annihilates the last n - r - 2 elements. These transformations can be selected as sequences of plane rotations. Next, using the hat-convention, apply these rotations as (note the inner J's cancel due to the structure of D in (6))
xk
7
to doubly deflate the core matrix, Xk
1
0...7'...0
k-1
where now 7'ER lx4 and falls under the middle portion of Dk-1 as outlined in (6).
238
E.M. Dowling at al.
At this point the problem is nearly solved. All that remains is to diagonalize the core deflated 5 x 4 matrix given by, 0.1 0.2
0"4
This can be done in a number of ways [5]. One parallel approach is to apply Jacobi sweeps which we collect as the J'a and J3 transformations (left and right sides) needed to complete the diagonalization. It turns out that in practice, one Jacobi sweep is usually sufficient to make the off diagonal elements in (12) close enough to zero. Thus we apply one sweep and set the off diagonal elements to zero. Once done, we are left with, Xk
-3
-~ -1
k-1
which can be readily put back into the form of (6). We see that the updated subspace is given by Vk = Vk-~ GJ1J~J3.
(14)
Now we reduce computational complexity by implicitly computing certain quantities 9 Suppose we are tracking the signal subspace. Then the J1 rotations must be applied to annihilate the top portion of 7, and must also be applied to the first r - 1 columns of Vk-1. The J~ rotations are applied implicitly and we only carry along two noise subspace basis vectors. We compute / 12
,I.(I -Iv.
v~+~
][v.
~+~
]") xll~
(15)
and [V. V r + l ][V~ V r + l ]H)x. 7r+2 The J1 and the J~ rotations do not affect vr , vr+l , 7r nor 7r+1. Vr+2
1 (I --~ _.--7"--"
(16)
The algorithm may be summarized as follows: 1. Compute ~ = xH[V, 2. C o m p u t e r
v r + , ] = [ft,
(I-[Vs
Vr+,][Vs
~r+,] and [x]~ . Vr+l]H)x=x--Vs~s-Zr+lVr+l.
3. Compute the diagonal elements of G = diag(Z~'/]~l],'"Z*+l/]Zr+ll) and compute 7=/9G. 4. Scale the columns of [V8
vr+l ] according to [Vs, vr+l]G H.
5. Using "h, ...,7r-1, compute the sequence of rotations J ( i , i + 1,0) for i = 1 , 2 , . . . , r - 2 to annihilate all but the last element 9This will produce %-1 9Apply these rotations N down the columns of Vs .
Real Time Architectures for Sphericalized SVD Updating
239
, - - %+1 and compute %'+2 -- j ilxll 2 _ 7 r,2- l - - T)r2 --%+1 ,2 9 Check 6. Set 7r' = T r ,%+1 each update to insure this quantity is real, and if not, set it to zero. 7. Compute vr+2 = 1__~._r r+2
8. Construct the 5• core matrix (12), compute and apply a :Iacobi sweep, and apply the right rotations, J3 to Vr-1, Vr, Vr+l, Vr+a. 9. Re-average, i.e., re-sphericalize the signal and noise subspace singular values according to al = i (r- 2~-a~+~{~ and a4 = i ( m - r -m2- r)- 1~ + ~ g wherea~ and a~ are the modified 1 values after the core Jacobi sweep is applied. 3
INPUT PROCESSING
AND BASIS PREALIGNMENT
In this section we show how to pipeline the sequence of J1 plane rotations together with the implicit J2 rotations which are implemented via an innerproduct according to (16). These computations serve to compress the input vector and prealign the spherical subspace bases. Other topological variations of the array structure are outlined in [6, 7]. We will refer to the steps of the above algorithm summary as we proceed. The first step of the plane-rotation based SSA4 tracker is to compute/3 = x H [Vsvr+l] = [/~s/~r+l] and [Ixl[~. The [/3s /3r+l ] vector can be computed systolically using the scheme depicted in Figure 1, where we assume r = 4. Here the elements of the current input vector, x stream in one at a time and move to the right. Meanwhile, the signal subspace basis vectors stream in from above staggered by one element in order for the heads of each of the v-streams to meet with the head of the x-stream. With this alignment, for j = 1, 2, ..., r + 1 the jth cell can accumulate/~j = xHvj in place. The (r + 2) nd cell accumulates IIx]]2. *
*
*
V3,4
V2,5
*1%2* [vs,3 Ivy4 jv~,, v~l * Ivy2 IVy3 v~s '4~ 71,1
Figure I: Innerproduct array for input compression and noise vector computation
Figure 2: Bi-linear array architecture for SSA4 updating. The boxes represent innerproduct cells and the diamonds represent rotation cells.
A convenient way to specify the operation of the systolic array is through the use of cell programs. Collectively, the cell programs specify the operation, timing, and data flow of the array. In this paper we do not have the space to report the cell programs, but the cell programs of closely related two-sided EVD versions of this algorithm (SA2 and SA4) are found in [6, 7]. C-language cell programs that fork multiple Unix processes which communicate through Unix shared memory segments have been implemented and tested. These programs are based on code supplied by G.W. Stewart.
E.M. Dowling et al.
240
Now consider how to pipeline steps 1 and 2 of the SSA4 tracking algorithm. The problem is to pipeline the step 2 computation right behind the computations of step 1. Note that the ~--outputs from step 1 are inputs to step 2. Hence step two can start execution as soon as the ~-values become available. As soon as the cells P(1)-P(r + 1) finish the computations of Figure 1, they start executing step 2. To see how, write r as r ---- X -
Yl/~l -- Y2~2 - - ' ' ' - -
Vr-{-1/~r+l
(17)
where vj denotes the jth column of V. Here each of the terms in this expansion are systolic streams that combine into an accumulator stream. During step 1, cell P(j) stores a local copy of vj, for j = 1, 2, ..., r + 1 and P ( r + 2) stores a copy of the input vector, x. Hence each cell of Figure 1 next generates one of the terms in (17) as soon as the /~-values become available. Here P(1) starts the process by propagating the v1~1 stream to the right. Next P(j) for j = 2, .., r accepts as input the accumulator stream containing acc = v~x + ... + v/3j_l and outputs the stream acc = v1~1 + ... + v~/~j. The last cell, P(r + 2) finishes the computation described by (17) and routes the v~+2-stream into a vector variable for later use. Next the array computes the sequence of plane rotations to annihilate the first r - 1 elements of the -},-vector. The array also applies these rotations to the columns of Vs to accumulate the basis according to (11). The basis accumulation array and its interconnection with the previously described innerproduct array is depicted in Figure 2. The cells in the basis accumulation array are represented by diamonds while those of the innerproduct array are represented by boxes. The cells with different functions from the others are highlighted. In the figure cells P(1, r), P(1, r + 1) and P(1, r + 1) double as innerproduct array cells and basis accumulation array cells. Also, the figure identifies the input and output streams of the various cells as labeled by the communications arrows. To understand the operation of the basis accumulation array, assume that each P ( 2 , j ) for j = 1, 2, ...r + 1 contains the vector vj. The first thing that these cells do is propagate their initial contents south. This provides input to the array of Figure 1 via the connections described by Figure 2. After the P ( 2 , j ) cells send their vj vectors south, they must each receive back their jth fl-value. As soon as /3j is available, P ( 2 , j ) can compute the jth diagonal element of G as described in step 3 of the SSA4 tracking algorithm. Hence cells P(2,1) - P(2, r + 1) in Figure 2 send down their vj-vector and receive back/~j and compute gj and 7j. Next the cells exchange information in an overlapped pattern from left to right to compute the sequence of plane rotations that annihilate the first r - 2 elements of columns of 7 and produce 7'-1. These rotations are applied in an overlapped systolic fashion to successive pairs of columns of V s = V s G J 1 9 For more details to include cell programs, see [6, 7]. 4
5•
JACOBI OPERATIONS
The computations of the last section computed 7'ER 1• and set up [Vr-lVrV~+lVr+2] for their final set of core rotations. The array next computes the applies the core 5• Jacobi sweep to (12) and applies the right-rotations to the columns of [Vr-lV~Vr+lVr+2] 9 This core Jacobi sweep exchanges information between the signal and noise subspaces.
Real Time Architectures for Sphericalized SVD Updating
241
Figure 3: Application of Jacobi rotations to update the boundary subspace basis vectors. The subarray, {P(i,j)li = 1,2; j = r, r + 1, r + 2}, computes and applies the Jacobi sweep to (12) and applies the right rotations to [Vr-lVrVr+lVr+2]. First it applies a sequence of left rotations to convert (12) into an upper triangular matrix whose bottom row is annihilated. These operations involve only length-4 or less quantities so the communications needed is local. Next it applies the Jacobi rotations to diagonalize the 4• upper triangular portion following the method described on page 455 of [8] or in [2, 3]. The main computational load involves applying these right-rotations to [V~-lV~Vr+lV~+2]. The rotation locations in the subarray and the data flow is depicted in figure 3. 5
SSA4-MDL AND GEAR-SHIFTING
The SSA4 array uses SSA4-MDL to track the number of sources. SSA4-MDL is described in the companion paper in this volume [1]. The cell programs for the two-sided version, SA4-MDL, are presented in [7] and do not differ much from the cell programs needed to implement SSA4-MDL. SSA4-MDL uses the four singular levels retained by the SSA4 algorithm to direct the array to increase, decrease, or not change the subspace dimension. After the completion of the Jacobi operations, the top part of the subarray will contain the singular level estimates used to make the SSA4-MDL gear shift decision. If the decision is to leave the rank the same, when the Jacobi rotations are applied as depicted in Figure 3, the streams [v~-lv~vr+lvr-2] will return to their host cells in the top portion of the basis accumulation array. If the decision is to increase the rank, an idle cell in the left portion of the array will become active and all the basis vector streams will shift left by one cell. Here the second computed noise vector, vr+2 , becomes vr+l, and the new Vr+ 2 is computed according to (16) during the next update. If the decision is to decrease the signal subspace dimension, the left-most active cell becomes inactive and all of the cells route their host basis vectors one cell to the right while v~+l is discarded. 6
CONCLUSION
In this paper we derived the Single sided Signal Averaged 4-level (SSA4) subspace tracker. The algorithm uses stochastic approximation techniques to reduce computation and is quite useful in slowly time-varying signal scenarios as encountered in practical sonar and radar angle and frequency tracking problems. We designed a pseudo-systolic array for real time
E.M. Dowling et al.
242
processing applications. The algorithm was designed using innerproducts and plane rotations, so that it is inherently parallel and amenable to pipelining. The array uses SSA4MDL to track the number of signals and automatically gear shifts when the number of tracked signals changes. The array presented herein can be modified to l-D, 2-D and 3-D mesh topologies as discussed in [6, 7].
Acknowledgments This work was supported in part by the National Science Foundation Grant MIP-9203296 and the Texas Advanced Research Program Grant 009741-022. References
[1]
E. M. Dowling, R. D. DeGroat, D. A. Linebarger, and H. Ye, "Sphericalized SVD updating for subspace tracking," in SVD and Signal Processing: Algorithms, Applications and Architectures III, North Holland Publishing Co. 1995.
[2]
M. Moonen, P. VanDooren, and J. Vanderwalle, "Updating singular value decompositions. A parallel implementation," in SPIE Advanced Algorithms and Architectures for Signal Processing, (San Diego, California), pp. 80-91, 1989.
[3]
W. Ferzali and J. G. Proakis, "Adaptive SVD algorithm and applications," in SVD and signal processing II, (Elsevier), 1992.
[4]
E. M. Dowling, L. P. Ammann, and R. D. DeGroat, "A TQR-iteration based SVD for real time angle and frequency tracking," IEEE Transactions on Signal Processing, pp. 914-925, April 1994.
[5]
S. VanHuffel and H. Park, "Parallel reduction of bounded diagonal matrices," Army high performance computing research center, 1993.
[6]
Z. Fu, E. M. Dowling, and R. DeGroat, "Spherical subspace tracking on systolic arrays," Journal of High Speed Electronics and Systems, Submitted may 1994.
[7]
Z. Fu, E. M. Dowling, and R. D. DeGroat, "Systolic MIMD architectures for 4-level spherical subspace tracking," Journal of VLSI Signal Processing, Submitted Feb. 1994.
[8]
G. H. Golub and C. F. VanLoan, Matrix Computations, 2nd Edition. Baltimore, Maryland: Johns Hopkins University Press, 1989.
SVD AND SIGNAL PROCESSING, III Algorithms, Architectures and Applications M. Moonen and B. De Moor (Editors) 9 1995 Elsevier Science B.V. All rights reserved.
SYSTOLIC ARRAYS
243
FOR SVD DOWNDATING
F. LORENZELLI, K. YAO Electrical Engineering Dept., UCLA Los Angeles, CA 90024-159~, U.S.A. lorenz@ee, ucla.edu, yao@ee,ucla.edu ABSTRACT. In many applications it is required to adaptively compute the singular value decomposition (SVD) of a data matrix whose size nominally increases with time. In situations of high nonstationarity, the window of choice is a constant amplitude sliding window. This implies that whenever new samples are appended to the data matrix, outdated samples must be dropped, raising the need for a downdating operation. Hyperbolic transformations can be used for downdating with only minor changes with respect to the updating algorithms, but they can be successfully used only on well conditioned matrices. Orthogonal downdating algorithms which exploit the information contained in the orthogonal matrices of the decomposition are less prone to instabilities. Here, we consider hyperbolic and orthogonal downdating procedures applied to two algorithms for SVD computation, and discuss the respective implementational issues. KEYWORDS. SVD, orthogonal downdating, hyperbolic transformations. 1
MOTIVATION
In numerous situations, such as adaptive filtering, bearing estimation, adaptive beamforming, etc., the received signals (appropriately sampled and digitized) are stored in a matrix form. Linear algebraic techniques are subsequently employed to extract from these matrices useful information for further processing. The quantities of interest are typically given by numerical rank, individual singular values and/or singular vectors, or global information about singular (signal/noise) subspaces. The estimation of these entities is then used for the calculation of filter weights, predictor coefficients, parametrical spectral estimators, etc. The receiving signals are quite commonly nonstationary in nature, and so are all the quantities pertaining to the data matrices. Moreover, it can be analytically proved that some characteristics of the data would change in time even in a stationary environment, by the mere presence of noise. These are the reasons which motivate the interest in adaptive algorithms, i.e., algorithms which can track the time variability of the involved signals.
244
F. Lorenzelli and K. Yao
As data change, it is advisable to give prior importance to the latest samples and discard the outdated ones. It is thus common practice to "window" the data matrices prior to any computation. There are many different kinds of windowing techniques, but two have received the most attention, namely exponential forgetting and the sliding window approach. The former method involves the multiplication of the old data by a forgetting parameter 0 < fl _( 1, before the new samples get appended to the data matrix. As a consequence, each sample decays exponentially with time. In turn, the alternative sliding window approach retains the same number of samples at any given time, thus requiring one (block) downdate per each (block) update. Exponential forgetting is usually accomplished with only minor changes to the fundamental algorithms. This is in contrast to the more complicated operation of downdating, required in the sliding window approach. Hsieh shows in [6] that the sliding window approach is more effective than the exponential forgetting in highly nonstationary environments, even when the parameters are optimally chosen. Many algorithms can be used to solve the problems listed above, with different tradeoffs of computational complexity, parallelism, up/downdating capabilities, etc. The theoretically best algorithms are usually based on the singular value decomposition (SVD) of the data matrix. Despite its numerous properties, the SVD was until recently considered too computationally intensive and exceedingly difficult to update. With the advent of parallel architectures, this view has been changing and various SVD parallel algorithms have been proposed. Of particular interest are the linear SVD array based on Hestenes' algorithm and proposed in [3] and the 2-D SVD array based on Jacobi rotations [10] and fully analyzed in [4], [7] and [8]. The former array performs block computations, whereas the latter is ideally suited for updating with exponential forgetting. Both algorithms can be extended to include up/downdating with sliding window, by making use of hyperbolic or orthogonal transformation. 2
T H E J A C O B I SVD A L G O R I T H M - H Y P E R B O L I C D O W N D A T I N G
The Jacobi SVD algorithm can be transformed to include hyperbolic downdating operations.
Consider t h e m a t r i x ( Az_) , where z_ is the information to be downdated.
Let
A = ~r ( ~ ~H be the SVD of A. Let the hyperbolic transformation H be chosen so that \ 0] 0
~T~
-
H
R
(re+l) xCm+l)
0
(~)
and g is an n x n upper triangular matrix. In general, let H be hyperexchange with respect to ~, in the sense that
0
-1
'
m-n+1
n
m-n+l
0
~2
'
where ~k = d i a g ( + l , . . . , -4-1), k = 1, 2. If Jt and Jr are the Jacobi rotations that diagonalize R, i.e., JeAJ H = R, and Je is hyperexchange with respect to ~1, so that JH~I Je - ~3, we
Systolic Arrays for SVD Downdating obtain AHA - xHz
--
r
X_.
-- ~," ( ~ (a) X_
245
(_X~r)S)(~
v~?(RH o ) H H ~ H ( R ) v H - V A V H 0 where V = V Jr and A_. = A~3A. As suggested in [9], by using hyperexchange matrices instead of hypernormal matrices (for which (I) = (I)), possible ill-conditioning can be detected by the negative elements of A_. If the problem is well posed, we have of course A_ = ~.._2. =
2.1
GENERATION OF THE HYPERBOLIC TRANSFORMATIONS
In recursive Jacobi SVD algorithms, the rotations involved in the triangularization of (1), as well as the Jacobi rotations ,It and Jr, are usually computed from sequences of 2 x 2 transformations. Consider for the time being the triangularization operation of (1). There are three main kinds of hyperexchange 2 x 2 rotations, according to how the signature matrices change. Let /2 be the signature matrix identical to the 2 x 2 identity matrix, and
92 (10 )-01 Let H2 =
(
( : 0 1 /~ .
'
)
(~
hll h12 the 2 x 2 transformation imposed on vector v = The three h21 h22 b " possibilities are as follows: 1) /2 ~ I2: In this case, the rotation is orthogonal and well defined. We have hll = h22 = cos S, h12 = -h21 = sin 0, for some angle 0. 2) ~2 ~ ~2, or ~2 ---* ~2" The hyperbolic transformation H2 is well defined if la[ > Ibl. We have that hll = h22, h12 = h21, and h~l - h122 = 1. 3) ~2 --* ~2, or ~2 ~ ~2" In this case lal < Ibl. It follows that hll = h22, h12 = h21, and h21 - h122 = - 1 . Notice that the rotations listed above, corresponding to signatures different from the identity, are defined only when [a[ ~ Ib[. It is possible to implement the rescue device suggested in [9] for these cases in which the hyperbolic transformation cannot be defined. The triangularization of (1) is replaced by the following two-sided transformation: 0 x_.V
= H
R 0
GH '
where H is again hyperexchange with respect to (~, while G is an orthogonal rotation. In order to explain how the matrices G and H are computed, consider a generic diagonal processor of the algorithm in [8]. This processor has access to the 2 x 2 submatrix R2 = 0
d " During the triangularization step, the 2-element vector (v, w), derived from the
product x . V, is input into the processor. If Iv[ ~ [al, then a hyperbolic transformation can be generated which zeroes out the vector component v against the matrix entry a. If [v I = la[, then additional computation is required. In particular, left and right orthogonal rotations, respectively by angles r and r are generated and applied as follows (ci = cos r
246
F. Lorenzelli and K. Yao
si = sin r
i = 1, 2):
0
d"
=
v~ w~
d~
-82
c2
0
-k
0
0
1
v ~ w~
=
-82
c2
0
0
d
Cl
0
0
1
v
~/)
81
-81
.
CX
The rotation angle Cx is chosen so that la" I # Iv'l. The sub diagonal fill-in, here symbolized by a ',', is zeroed out by the left rotation by angle r One can choose the rotation angle r which maximizes the difference (a") 2 - (v') 2. It can be shown that angles r and r satisfy tan(2r ) = 2(ab - v w ) / (a 2 - b 2 - d 2 - v 2 -t- w 2 ), tan r = - d sin r / (a cos Cx + b sin r ). The rotation angle r is then propagated to the V array. Consider now the rediagonalization step. In this case, left and right rotations are simultaneously generated. With a careful selection of rotation parameters, it is always possible to generate an orthogonal right rotation and a hypernormal left transformation which diagonalize a generic 2 • 2 triangular matrix R2, defined as before, i.e., s
c
0
hyperbolic
d
a
7 -y orthogonal
'
Feasible hyperbolic and orthogonal parameters are given by c=
(1 + r) t/2 + (1 - r) 1/2 2(1-r)1/4 ,
s=
(1 + r) 1/2 - (1 - r) 1/2 2(1-r)l/4 '
a
(bc + ds)
7
ac
'
where r - - 2 b d / ( a 2^ + b 2 -t- d2). The above implies that the rotation J~ can always be made hypernormal, and ~3 = ~x. Combining the triangularization and the rediagonalization steps, we obtain AHA-
xHx_. = V G J r A @ I A j H G H v
H = V A V H,
where V = V G J r , and A = A ~ I A . Ill-conditioned situations are signalled by negative elements on the main diagonal of the signature ~1, which is determined during the triangularization step.
3
THE JACOBI SVD ALGORITHM
- ORTHOGONAL
DOWNDATING
The calculation of hyperexchange transformations, together with the tracking of the signatures, may be inconvenient in certain applications, due to the additional computational burden. On the other hand, regular hyperbolic transformations are known to be prone to instabilities when the matrix to be downdated is particularly ill-conditioned, or in presence of strong outliers. An alternative approach is to use orthogonal transformations throughout [5]. But even this approach does not seem to solve the problem of instability. Recent analyses by BjSrk, Park and Eld~n [2], and Bendtsen et al. [1] in the context of QR downdating and least squares problems, show that any method (hyperbolic or orthogonal) which is based on the triangular factor and the original data matrix may lead to a much more ill-conditioned problem than using the additional information contained in the orthogonal factor. Among the techniques compared in [2] is the Gram-Schmidt (GS) downdating algorithm, which is characterized by relative simplicity of implementation and accuracy of the
Systolic Arraysfor SVD Downdating
247
results. In all the experiments conducted, the GS downdating algorithm displays a behavior comparable to the sophisticated method of corrected seminormal equation and superior to the LINPACK downdating algorithm, which is an orthogonal downdating algorithm exclusively based on the triangular factor of the QR factorization. The storage requirement is the same as for the hyperbolic downdating (note that the storage of outdated data samples is here replaced by the need to keep track of the orthogonal matrix). On the basis of these considerations, we propose here an alternative algorithm for SVD downdating which make use of orthogonal transformations only and exploit the information contained in the left singular matrices. The updating algorithm which constitutes the basis for the Jacobi SVD algorithm requires that three distinct operations be performed: a vector-by-matrix multiplication, a QR update, and a sequence of Jacobi rotations to restore diagonality. If an additional (n + 1) x (n + 1) array is available for storage of the left singular matrix Us, then during update all the left rotations involving rows i and j of R will also be propagated along the ith and j t h rows of U n. After update, the matrix U is (n + 1) x (n + 1). For the downdate consider the following. Let A =
~
= ~ - ~ H be the SVD of .4. It is desired to compute
the SVD of the downdated matrix B = U 2 V H. We have that 1 The equations above suggest the following procedure:
for i = 1 to n, apply a rotation to the ith and (i + 1)th rows of ~rn in order to zero out U(1, i); apply the conjugate of the previous rotation to rows i and (i + 1) of ~; this operation generates fill-in in positions (i, i + 1) and (i + 1, i); zero out element (i + 1, i) using a right rotation on columns i and (i + 1) of ~; apply the same rotation to columns i and (i + 1) of V. end Now ~ is upper triangular. Moreover U(1, i) = 0 for i = 1 to n. Zero out the elements of above diagonal of rows 1 to n using a sequence of Jacobi rotations. The conjugate of the left rotations are applied from the right to the columns 1 to n of U. The right rotations are also applied to the right of V. Eventually, we have ~=
u_
o
'
=
Zv
'
~=y
Note that the downdated vector is now generated as an output. The storage of the data matrix for downdating is now replaced by the storage of the U matrix, of identical size. This approach pipelines with the updating operations.
F. Lorenzelli and K. Yao
248 3.1
SVD UP/DOWNDATING IN PARALLEL
The parallel algorithm that we present here is based on the scheme proposed in [8], and we assume that the reader is familiar with its operation. In particular, the E matrix is stored in a triangular array, composed of O(n 2) locally connected processing elements, each of which has access to four memory cells (entries of ~). All the processors are capable of both column and row rotations, and the processors on the main diagonal additionally perform 2 • 2 SVD's. The particular array for the updating of the V matrix is not of particular concern in this paper, and will be neglected in the following. In the sequel, we assume that each update is followed immediately by a downdate, so that the size of the U-matrix oscillates between (n + I) x (n + i) and n x n. For these reasons, we propose to use an additional n • 1 linear array of processors, each having access to two adjacent rows of U H. juxtaposed to the diagonal of the E-array, so that the ith column of U corresponds to the ith row of E (except for U(:, n + 1)). In order to explain how the proposed array works, let us assume that the updating is completed and the (n + 1) 2 entries of the U-matrix are non-zero. Starting from the top pair of rows of U H, the rotations which sequentially zero out the first n entries of U(1, :) are generated and propagate through the matrix. If this operation starts at time slot 1, then the rotation parameters reach the diagonal elements of E at times 1, 3 , . . . , 2 n - 1. When this occurs, the same rotations are propagated row-wise through the E-matrix. As explained earlier, the left rotations produce fill-ins in the first sub diagonal of E, which can be immediately zeroed out by the diagonal processors by the use of right rotations. The rotation parameters associated with these column rotations are subsequently propagated to the columns of E and those of the V matrix. A complication is due to the fact that at the end of the procedure just explained the contents of the square U-array are not properly arranged for the subsequent update. The required transformation is shown in the diagram below, for the case n = 3, where the • symbolize generally non-zero entries. The matrix on the left shows the result of the above operations on U s , while the matrix on the right shows the ideal configuration of the square array at the beginning of the updating step. 0
x
x
x
x
x
x
0)
0
x
x
x
x
x
x
0
1
0
0
0
0
0
0
1
0
"
This problem can be easily solved by reindexing the elements of each column of U. No physical data movement is required. At the end of the downdating operation, the three arrays (containing the elements of Us, ~, and V) are ready for the update. The incoming data vector is input into the V matrix and the z = z. V product is input into the top row of the triangular array. Simultaneously a row vector w initialized at ( 0 , 0 , . . . , 0 , 1 ) is input at one end of the linear U-array. The rotation parameters which annihilate the ith element of z are computed by the diagonal processors in the triangular array, and propagated into both the ~ and U arrays. The rotation is applied to vector pairs (z, ~(i, :)) and (w, vH(i, :)). Following arguments similar to those put forward in [8], and keeping in mind that left (right) rotations involving disjoint pairs of rows (columns) commute, one can prove that the
Systolic Arrays for SVD Downdating
249
rotations relative to the different steps of QR update, downdating, and rediagonalization can be pipelined and interleaved. The pipeline rate is basically unaltered with respect to the updating SVD array of [8]. Note that because of the downdating algorithm chosen, the matrix U must be represented explicitly and not in factorized form. The scheme so far presented may be viewed as an intermediate design towards a one-dimensional SVD array of O(n) elements where each memory cell stores an entire row of the matrices ~ and U H. 4
THE HESTENES
SVD ALGORITHM-
HYPERBOLIC
DOWNDATING
Consider Brent & Luk's array performing the SVD of the (m + 1) x n matrix A0 =-
(A)
where x_ is the data row to be downdated. As suggested in [9], the eigenvalues and eigenvectors of the downdated correlation matrix A H A - xHx can be obtained by generating a sequence of unitary rotations V j such that for growing i, A, = Ao l I v_j - - .
j=o
__.v (m+l)•
o
'
(m-t.-1) X n
where ~ is an n • n diagonal matrix and U is hypernormal with respect to the signature matrix ~ = (Imo
-10 ) , (where Im is the m •
identity matrix)in the sense that _uH~u_ =
~. After convergence, one has that AHA - xHx =
x_.
r
x__
= V(~
.
.
O)
.
.
~U
.
= V
.
The hypernormality of U can be ensured as follows. The sequence of matrices Ai+l = A i ~ is generated by using plane rotations, V~, which operate on pairs of columns of Ai. The rotation parameters are selected in such a way as to make the resulting column pairs hypernormal. The procedure is iteratively executed by selecting sweeps of all the possible different column pairs, and by repeating the sweeps a sufficient number of times (until convergence). There are numerous strategies for choosing the order in which column pairs are rotated. One efficient ordering, suitable for parallel implementations, is given in [3]. 5
THE HESTENES SVD ALGORITHM-
ORTHOGONAL
DOWNDATING
As mentioned previously, hyperbolic transformations are susceptible to ill-conditioning and may generate numerical instabilities. Here, we present a possible procedure for orthogonal up/downdating and give a high level description of the required array of processors. For simplicity of up- and downdate, consider the SVD of A H (for the sliding window approach, we can assume that m is not too much larger than n). Also, assume that m is an odd number. The linear array is composed of (m + 1)/2 elements, each containing a pair of columns of A s (in practice, only (n + 1)/2 elements are required). At the end of a sweep [3], the processing elements contain the n columns of W = V~, together with the n columns of U, and m - n + 1 columns of zeros. During update, the (m + 1)st new column ~S is appended to the rightmost element and the columns of (W, 0, ~H) are orthogonalized
250
F. Lorenzelli and K. Yao
to produce the updated matrices I~ and ~'. In fact, if Q is the orthogonal matrix used for reorthogonalization,
(w
~H)
o
1 =(W
)Q
o
1 =(~v
o)
~ . (n+l) x(m+l)
For the downdating, consider the following. Let ~H = (~H, ~-H)H, and let G be the orthogonal matrix that orthogonalizes the columns of ~H. Then we have
(~ 0)(~ @)=(# 0)cc'(~ Y
Up)= ( w
o)(zQ) H = (~_ w uH),
Ztt
where Q is the matrix that orthogonalizes Y and _U = M(2 9m % 1,1 9n), M - ZQ. Notice that downdating requires two reorthogonalization steps, one on matrix Us and one on matrix Y, and is thus twice as expensive as the updating step. The reason for this is that the information on the individual matrices Z and V is unavailable, and one needs to work on the matrix W = VZ.
Acknowledgments This work was partially supported by the NSF grant NCR-8814407 and the NASA-Dryden grant 482520-23340. References
[1] C. Bendtsen, P. C. Hansen, K. Madsen, H. B. Nielsen, and M. Pmar. "Implementation of QR Up- and Downdating on a Massively Parallel Computer". Report UNIC-93-13, UNI.C [2] A. BjSrk, H. Park, and L. Eld4n. "Accurate Downdating of Least Squares Solutions". SIAM J. Mat. An.~ Appl., 15(1994), to appear. [3] R. P. Brent and F. Luk. "The Solution of Singular-Value and Symmetric Eigenvalue Problems on Multiprocessor Arrays". SIAM J. Sci. Star. Comput., Vol. 6, No. 1, pp. 69-84, 1985. [4] W. Ferzali and J. G. Proakis. "Adaptive SVD Algorithms for Covariance Matrix Eigenstructure Computation". In Proc. ICASSP, 1990. [5] G. H. Golub and C. F. Van Loan. "Matrix Computations". John Hopkins, 2nd ed. 1989. [6] S.-F. Hsieh. "On Recursive Least-Squares Filtering Algorithms and Implementations". PhD Thesis, UCLA, 1990. [7] M. Moonen, P. Van Dooren, and J. Vandewalle. "A Singular Value Decomposition Updating Algorithm for Subspace Tracking". SIAM J. Mat. An.~ Appl., 13(4):1015-1038, October 1992. [8] M. Moonen, P. Van Dooren, and J. Vandewalle. "A Systolic Array for SVD Updating". SIAM J. Mat. An. ~ Appl., 14(2):353-371, April 1993. [9] R. Onn and A. O. Steinhardt. "The Hyperbolic Singular Value Decomposition and Applications". IEEE Tr. SP, 39(7):1575-1588, July 1991. [10] G. W. Stewart. "A Jacobi-like Algorithm for Computing the Schur Decomposition of a Nonhermitian Matrix". SIAM J. Sci. Star. Comp., 6(4):853-863, 1985.
SVD AND SIGNAL PROCESSING, III Algorithms, Architectures and Applications M. Moonen and B. De Moor (Editors) 9 1995 Elsevier Science B.V. All rights reserved.
251
SUBSPACE SEPARATION BY DISCRETIZATIONS OF DOUBLE BRACKET FLOWS
K. HOPER, J. GOTZE, S. PAUL
Technical University of Munich Institute of Network Theory and Circuit Design D-80290 Munich Germany knh u @n ws. e- tech n ik. tu- m ue nche n. de
ABSTRACT. A method for the separation of the signal and noise subspaces of a given data matrix is presented. The algorithm is derived by a problem adapted discretization process of an equivalent dynamical system. The dynamical system belongs to the class of isospectral matrix flow equations. A matrix valued differential equation, whose time evolution converges for t ~ oc to block diagonal form is considered, i.e., only the cross-terms, correlating signal and noise subspaces, are removed. The iterative scheme is performed by computing some highly regular orthogonal matrix-vector multiplications. The algorithm essentially works like a Jacobi-type method. An updating scheme is also discussed. KEYWORDS. Double Bracket flow, gradient flow, block diagonalization, principal eigenspace, Jacobi-type methods.
1
INTRODUCTION
Given a data matrix H E ~nxm (n > m), a frequently encountered problem in signal processing (DOA-estimation, harmonic retrieval, system identification) is the separation of the column space of H into signal and noise subspaces. The SVD of H is the most robust and reliable tool for this task. The signal subspace is defined by the right singular vectors corresponding to the large singular values. Computing the SVD of H, however, is computationally expensive and yields much more information than necessary to separate the signal and the noise subspaces. In order to avoid the computationally expensive SVD, various methods for determining the subspaces in a computationally less expensive way have been published. Among these methods are the Rank-Revealing QRD [2], the URV decomposition [11] and the SVD-updating algorithms [8, 9] which can be considered as
K. Haper et al.
252
approximate SVD's. Other methods not based on the SVD have also been proposed, e.g. the Schur-type method presented in [4]. Another quite obvious approach is the block diagonalization of H, i.e.
H = U
0 0
H22~ 0
V T ,
where U E O(n), V E O(m), and Hllo~ E/R dxd, H22r E / R m-dxm-d have dimensions due to the dimensions of signal and noise subspaces, respectively. It is not straight forward, however, to extend known linear algebra algorithms such, that any block diagonalization of an arbitrary matrix is obtained. In this paper an algorithm for block diagonalizing a given matrix by orthogonal transformations is derived. This is achieved by a problem adapted discretization process of an equivalent dynamical system. The matrix flow is Brockett's double bracket flow [1] H = [H, [H, N]], choosing the matrix N appropriately with respect to the subspace separation problem. Essentially, after discretization, this results in a Jacobitype method which only works on the cross terms H12 E 1Rdxm-d (assuming a preparatory QRD). Therefore, only d ( m - d ) rotations are required per sweep, while the standard Jacobitype methods apply m ( m - 1)/2 rotations. The algorithm can be considered as a method for maximizing an objective function of a continuous-time gradient flow. It is a specific example of a gradient ascent method using geodesic interpolation (see [5], [7], and [10] for related work). An updating scheme for this algorithm is very similar to the SVD-updating scheme of [8]. Again, the updating scheme only operates on the cross terms.
2
DYNAMICAL
SYSTEM FOR BLOCK DIAGONALIZATION
In this section a dynamical system for block diagonalizing symmetric matrices with distinct eigenvalues is briefly presented. We shall state the necessary propositions without proofs. For convenience, we have decided to present the results for symmetric matrices (covariance matrices). The extension to arbitrary n x m data matrices (SVD) is straightforward. The isospectral matrix flow is governed by
= [H,[H,N]],
(1)
and the associated flow on orthogonal matrices by 6) = O[H,N], with H = H T E IR re•
(2)
O(t) E O(m), [X, Y] d*d X Y - Y X , and N = diag(Id, 0,~-d).
Proposition 2.1
Eq.(1) is the gradient flow H = grad f ( H ) of the function f ( H ) = - t r H n . Eq.(2) is the gradient flow 0 = grad r of the function r = - t r NOTHoO + tr H0 = f ( n ) . R e m a r k The gradient flows in Proposition 2.1 are defined with respect to different metrics. See [5] for details. P r o p o s i t i o n 2.2 For t ~ oo the time dependent H(t) of (1) with initial H(O) converges
Discretizations o f Double Bracket flows
253
to block diagonal form:
H(O)=
[Hl1(0)
H12(0) 1 t--*r162 [Hl1~ 0 H~(O) H~(O) ' 0 H~
with Hll E 1Rdxd, H12 E 1Rdxm-d, and H22 E ~ m - d • Assuming a(Hllcr a(H22or = {}, for all )q E a ( H ~ ) and for all )~j e a(H22~r holds )~i > )~j. For t ~ c~ the time dependent O(t) o./'(2) with initial 0(0) e O(m) converges to (|162162 02~r with 0Tr162162162 = Id and 0Tr162
= Im-d. It holds
span(OlcclHOxcr = 01r162162162 s p a n { q 1 , . . . , q d l H q i = qi)q, Ai E a(H11~r span(O2cclH02r162- 02r162162162 s p a n { q d + l , . . . , q m I H q i = qiAi, Ai e a(H22r162 Proof: see e.g. [7].
3
DISCRETIZATION
SCHEME FOR BLOCK DIAGONALIZATION
A discretization scheme for the above gradient flow is now presented. Taking into account that at each time step only an orthogonal transformation in a specific 2-dimensional plane of the matrix H is performed, a modification of the cyclic-by-row Jacobi method results. This method works with the full group SO(2) or equivalently, the rotation angles r e [-7r/2, ~r/2]. Excluding certain pathological initial conditions (saddle points) which in general do not occur for subspace separation problems, this algorithm is obviously globally convergent. Furthermore local quadratic convergence for the case of full diagonalization (including multiple eigenvalues) was recently proved using methods from global analysis [6] (see also the contribution by U. Helmke to this workshop). Here we borrowed from [3] the very readable MATLAB-like algorithmic language and notation. A l g o r i t h m 3.1 Given a symmetric H E
j~m• and p, q E 1N that satisfy 1 < p < d and d + 1 j, t~,j = Xid if i > j,
Yii, =~Xi,i,1 Yi,i = ~Xi,i,
and l~,j = Xi,j if i < j, (1) and Y~,j = 0 if i < j.
L e m m a 1: If
X -1, then
Y =
dY 9 = -~-~(Y)=-Y/(Y If X = c T c is a Cholesky decomposition of a non singular matrix X, then dC .](.
u p p h ( C _ T ~ ( c _ l )C,
If X = Q R is a QR decomposition of X, and X is non singular, then Q, =
~r~(X)
=
Q w ( Q T X R -1) and
where co(Z) = lowh(Z) - ( lowh(Z))T and p(Z) = ( lowh(Z))T 4- upph(Z).
If X = V A V T is an EVD of a symmetric matrix X with distinct eigenvalues, then A
= =
~(.X) rz(~:)
= =
d]ag(VT.Xy)and v~(vr~:v)
where #(Z) is an elementwise scaling of the elements of Z, by #i,j(Z) = zi,j/(Aj - Ai) if i r j and/~i,i(zi,i) = O. If X has repeated eigenvalues Ai = Aj over a finite time interval, V is not uniquely defined, and the corresponding entries in I~i,j(vT/(v), can be replaced by arbitrary values as long as # ( v T x v ) is kept skew symmetric. If X has repeated eigenvalues Ai = Aj at an isolated time instant to, and if the eigenvectors have a differentiable time evolution, #i,j(to) is found by continuous extension of #i,j(t). Similar formulas exist for the LU decomposition. A similar formula for the SVD can be derived easily, applying the formula for the EVD to x T x and X X T. We only state the well known result for the evolution of the singular values: If X = U E V T is an SVD of a matrix X with distinct singular values, then dE 9 = ~-~(x)= d~g(UrXV)
Parallel Algorithms for Subspace Tracking 3
A CONTINUOUS
TIME SUBSPACE TRACKING
261
ALGORITHM
Subspace tracking consists in the adaptive estimation of the column space of a slowly timevarying n x Jr matrix M, given a signal z, generated by x(t) = Ms(t) + n(t), where s(t) is a source signal with non singular correlation matrix E{s(t)s(t)T}, and n(t)is additive white noise, with variance a 2. We study the following algorithm
fi = 7 ( z z T A - 2A upph(ATzz T A))
(2)
where z E R '~ is an input signal, the first tr columns of A E I~nxn span an estimation of the signal subspace (the other columns are not needed for the adaptive computation of the first tr columns), and 7 is a (possibly time dependent) scalar. The algorithm is clearly a continuous time version of the neural stochastic gradient algorithm as described in discrete time by 0ja[2] as
AA(k-
1)
= 7 ( k ) { x ( k ) z ( k ) T A ( k - 1) - 2 A ( k - 1 ) u p p h ( A ( k - 1 ) T z ( k ) z ( k ) T A ( k - 1))}
(3)
where 7(k) is a scalar. It can also be derived as a continuous time version of the spherical subspace tracker, a discrete time algorithm, related to the well known SVD approach but with reduced complexity, in which the signal singular values as well as the noise singular values are averaged at each step [3]. In continuous time one can work with (a generalization of) the SVD, or more conveniently, with the EVD of N = fto~ z(r)z(r)Te-;~(t-")dr = V A V T. The tr first columns of V are an estimate for the signal subspace. Exact tracking of V, applying 1emma 1 to = zz T - AN, leads to lk = V # ( v T z z T v ) . Parallel realization of this algorithm would require the number of links between different cells (storing matrix entries) to be proportional to the problem dimension n. But, as in the discrete case, constant averaging of the signal eigenvalues and the noise eigenvalues reduces the complexity. Application of lemma 1 (with continuous averaging of the eigenvalues) now results in different algorithms, depending on the choice of the arbitrary components in #. These different possibilities correspond to representations by different bases of the signal and noise spaces. (Our choice does not correspond to the choice of [3]). A possible solution, with ~i,j(xi,j) = -#j,i(xi,j) = x i , j / ( A s - An)if i < j and #i,i(xi,i) = 0, yields #(X) = 1 / ( ~ , - ~ . ) ( X - 2 upph(X)), and
ft = 7 ( x x T A - 2A upph(ATxxTA)) where 7 = A8-1 A,~ and ~s = ~1llATxll2 - AAs and A,~ = ~1liATxll2 - AA,~.
(4)
This corresponds to (2) with an adaptive factor 7. From lemma 1 the algorithm can also be regarded as a stochastic QR-flow. Whereas the QR flow, given by (~ = Qw(QTAQ), tracks the Q-factor of a matrix B, obeying B = N B , to find the eigenvalue decomposition of N, equation (2) tracks the Q-factor of a matrix B, obeying B = 7xxTB. The.algorithm also corresponds to a square root version of a stochastic double bracket flow Z = 72[Z, [Z, xxT]], where Z = AA T.
J. Dehaene et al.
262
4
ANALYSIS
In this section we briefly show how the continuous time background of the algorithm can also shed new light on the analysis of its behavior. Lemma 1 provides an easy interpretation of the stability of the orthogonality of A and of the convergence of the estimated subspace to the right solution in the stationary case, and with asymptotically vanishing noise. Application of lemma 1 to (2) to find out the evolution of the singular values of A yields = ( ~ r ~ ) ~ ( ~ _ ~), where u is the corresponding left singular vector. Checking the sign of d one easily sees that a converges to 1, keeping A orthogonal (under mild conditions for x). Furthermore, applying the same lemma 1 for the singular values of .MTA, where M = orth(M) spans the (stationary) exact subspace, gives the evolution of the cosines of the canonical angles between the estimated and the exact subspace, in the absence of noise. A simple substitution cos 0 = a gives the evolution of the canonical angles.
= _ ( ~ r M r ~ )~ ~i~(20), where ~ is the corresponding right singular vector of MTA. Generalizations of these formula for non stationary and noisy signals will be published elsewhere.
5
PARALLEL REALIZATION
An interesting aspect of many continuous time algorithms is that they have a simple parallel signal flow graph. In [4] we have shown how continuous time algorithms of the form A = ~i Ti, where Ti axe terms of the types given below, can be realized as an array of identical cells (uniform parallel realization). terms of type I: terms of type II: terms of type III:
T = ed T T = tri(eeT)A
T = tri(edTA-1)A
or T = A t r i ( d f T) or T = A t r i ( A - l e d T)
where A must be upper triangular for terms of type III, and tri stands for one of the triangular parts defined in (1), and must also be upper triangular for terms of type III. The vectors e, e E lt~m can be of the form (AAT)kx or (AAT)kAy, and d, f e it(n can be of the form (ATA)ky or (ATA)kATx, where k >_ 0 is an integer, and x E lir and/or y E Ii~m are external inputs. All three types can be interpreted as adaptive neural networks in the sense of [2]. For the parallel realization of these systems the matrix A is represented as an array of analog cells storing the entries of A. Vectors are realized as signals in the rows or the columns of the array. A component xi, ci or ei (i = 1 , . . . , m ) of a vector x, e or e is realized as a signal, available to all cells (i, k) in row i of the array. The vectors x, e, e, are said to be available in the rows. Similarly, vectors y, d, or f, are available in the columns. The inputs x and/or y are supplied externally. The other vectors are obtained through matrix vector multiplication in a straightforward way. The presence of upph amounts to the need of partial sums of these matrix vector products. For triangular matrices A one can
Parallel Algorithms for Subspace Tracking
263
also easily compute A - I x or A - T z by back-substitution [4]. As an example fig. 1 shows the realization of algorithm (2) with 3' = 1.
I
i ~l,j = 0
Oi,j
Xi
dj ai,j
Xi
,1 -- 0 ~
Xi
(Xi -- ~i,j -- ~ i , j + l )
,j +1 -:
/ r},+l,j =1
rIi,j + aid xi rI.+l,j
~i,j + a i j d j
l
dj dj = rln+l,j
i Figure 1: cell (i,j) and boundary cells for the realization of ~{ = x x T A - 2 A upph(ATxxTA)
6
INTEGRATION
In the previous section, we desribed some continuous time systems with a straightforward parallelization. The next problem is to convert these algorithms into discrete time algorithms. We will do this by exact integration of the continuous algorithm for a piecewise constant input signal. Each new value of the input corresponds to a new discrete iteration step. The integration essentially amounts to using lemma 1 in the opposite direction. The discrete formula and its parallel realization can also be found by exploiting the analogy with an inverse updating algorithm [6] for recursive least squares estimation (RLS). For RLS, the Cholesky factor R or the inverse Cholesky factor S = R -1 of the matrix N = ft_~o z(r)x(r)Te-~(t-~)dr is to be tracked. Using lemma 1, one finds
k = upph(R-TzzTR-1)R - ~tg
(5)
and again with lemma 1, :i = - S u p p h ( S T z z T s ) +
~S.
(6)
264
J. Dehaene et al.
When supplied with piecewise constant inputs, these algorithms give the same solutions as discrete algorithms for RLS. The first term of algorithm (5) is of type III and the first term of algorithm (6) is of type II. The second term of both algorithms has a trivial realization. Algorithm (5) is a continuous time limit of the well known Gentleman Kung array for QRupdating [5]. Algorithm (6) is a continuous time limit of the systolic algorithm described in [6]. The resemblance of (6) and (2), suggests that a similar systolic array exists for a discrete version of (2). It turns out that a (discrete) systolic signal flow graph can be obtained, from which a linear pipelined array can be derived easily. The following theorem gives the exact integration of (2) for constant input. T h e o r e m 1 If A0 = A(0), and if z e ~
is constant and
= ~(t) = ~ v ( ~ r ~ yo~~(~)d~) v - 1 , . _ T A_ B = B(t) = Ao + x-,~x~ ,~o
S=S(t)= i c f ( I + ~v2-1ATxxTAo) 0 where icf stands for inverse Cholesky factor A = A(t)= BS then p(to) = 1 b = ~xTxv B(to) = Ao B = 7 x T x B S(to) = I $ = -27Supph(ATzxTA) A(t0) = Ao fl = 7 ( z z T A - 2 A u p p h ( A T x z T A ) ) These formulas are easily verified, by derivation, using lemma 1, and the formulas z TB = uxTAo and x T A = uxTAoS.
The formula for u depends on an assumption of the evolution of 7. A simple formula is obtained assuming ~ 1 = zTx (compare with (4)) , resulting in u(t) = 1 + xTxTt. Every step of the corresponding discrete algorithm consists in calculating a new matrix A' = A(Ta) (where Ta is a sampling period) from the old matrix A = A(0), with a new given input z. For parallel realization, many variations are possible. A two dimensional signal flow graph for updating A, with a signal flow similar to the continuous parallel realization is shown in figure 2. The array consists of n x ~ cells ( i , j ) storing the entries of A, an extra column to the left with cells (i, 0), and an extra row at the bottom with cells (n + 1,j). First, to obtain B = B(Ta) from A = A0, v-1 xxTAo should be added to A0. The vector yT _ xTAo is accumulated by the signals r/, as each cell ( i , j ) calculates ~i+l,j = ~i,j + z i a i j . The quantity x T x is accumulated by the signals &in the cells (i, 0). Using the external input 7, u = 1 + x T x T t and a = ~Y--1 are calculated in cell (n + 1, 0) and a is passed to all cells (n + 1,j) of the bottom row. There ~ = ay is calculated and sent upwards to the cens (i,j). With this information, each cell ( i , j ) can already add xi~/j to aid to obtain bi,j. To obtain A' = A(Ts), B is to be multiplied by S. By analogy with the inverse updating array for RLS [6], this is done as follows. One can prove that S = [II0] G 1 . . . G n [yTI -
1 vp
--~]G~
['?
-~ where Gi, i = 1 , . . . , n are Givens rotations, such that
. . . G~ =
[0...01y~+~... Y.l~+x]
(7)
265
Parallel Algorithms for Subspace Tracking
~l,j = 0
51 = 0
flj
5~
cj, sj ai,j
Y
i,j Xi
J (i,O)
> zi
] /
x~
>
(ij)
Xi
$=05
~+1
r/i+a,j ~)j cj, sj
~n+l
7]n+l,j yj
Cj,
8j
I
(n+lj) Cj+l Figure 2" signal flow of a systolic array for the discrete time algorithm
where ~ = x-,/~x, v2-1 and each successive Gi affects only components i and n 4- 1 of the vector it is acting on. That is, post-multiplying with S means adding an extra column of zeros, applying n successive Givens rotations that roll [ y T _ ~ ] into its last component and dropping the extra column. The extra column is realized by the signals r in fig 2. To calculate the Givens rotations, cell (n 4- 1, 0) calculates el = - - ~ and every cell (n 4- 1,j) u
calculates cj sj and ej+l from yj and ej such that cj2 4- sj2 = 1 and
[yj ej]
cj
-sj
sj
cj
= [0 er
The quantities cj and sj are passed upwards such that each cell (i,j) can calculate the updated a~,j = ai,j(Ts) and ~i,j+l from bi,j - ai,j + xi,jfli,j and r by
[a~,j ~i,j+l] = [bi,j ~i,j]
cj -sj
sj cj
266
J. Dehaene et al.
As their is no data flow from right to left, the algorithm can be pipelined in a straightforward way on a linear processor array storing the columns of A. Obtaining a pipelined two-dimensional array is more difficult. One way is to apply the same manipulations as used in [6] to obtain a pipelined array for RLS. This works for the Givens rotations but to calculate B from A in the retimed array, products of inputs x at different time instants are needed, which leads to a more complicated array. A second way is to store and update an LU-decomposition of A. The resulting continuous time algorithm can be derived, using lemma 1. And a discrete version is obtained using an approach similar as above. Unfortunately, the algorithm breaks down, when for some A, the LU-decomposition does not exist. However, the example illustrates that our approach to derive discrete parallel algorithms also works in other cases than the present one. It follows from the correspondence with the continuous algorithm, that A(0) need not be orthogonal, as A automatically converges to an orthogonal matrix. Whereas explicit orthogonalizations would need O(n 2) operations, the multiplication with S (as n Givens rotations) can be regarded as a cheaper orthogonalization, which works slower if A is far from orthogonal, but is sufficient to restore orthogonality in every step if A(0) is orthogonal. The complexity of the algorithm is the same as for the spherical subspace tracker. But whereas in [3] a different representation of the signal and noise subspaces was introduced to obtain this lower complexity, this is not necessary for the given algorithm. As a consequence, the columns of A can be interpreted as approximations of the eigenvectors of N. Using these approximations one can also estimate the eigenvalues, as an exponentially weighted average of xTA.j. These estimates can be used to estimate ~ if unknown and to calculate a time step 7 inspired by (4). References
[1] L. Ljung, "Analysis of Recursive Stochastic Algorithms", IEEE Trans. on Automatic Control, AC-22 (1977) 551-575. [2] E. Oja, "Principal Components, Minor Components, and Linear Neural Networks", Neural Networks 5 (1992) 927-935. [3] R.D. DeGroat, "Non-iterative subspace tracking", IEEE Transaction on Signal Processing 40 no.3 (1992) 571-577. [4] Dehaene J., Vandewalle J., "Uniform local implementation on networks and arrays of continuous time matrix algorithms for adaptive signal processing", report 93-34I, ESAT-SISTA, K.U.Leuven, and shorter version to appear in proceedings of MTNS 93, symposium on the Mathematical Theory of Networks and Systems, Regensburg, Germany, 2-6 August, 1993. [5] W.M. Gentleman and H.T. Kung, "Matrix triangularization by systolic arrays", RealTime Signal Processing IV, Proc. SPIE 298 (1981) 19-26. [6] M. Moonen and J.G. McWhirter, "A systolic array for recursive least squares by inverse updating", Electronics Letters, 29 No. 13 (1993) 1217-1218.
SVD AND SIGNAL PROCESSING, III Algorithms, Architectures and Applications M. Moonen and B. De Moor (Editors) 9 1995 Elsevier Science B.V. All rights reserved.
STABLE JACOBI SVD UPDATING ORTHOGONAL MATRIX
267
BY FACTORIZATION
OF
THE
F. VANPOUCKE, M. MOONEN
Katholieke Universiteit Leuven Dept. of Electrical Engineering, ESA T Kard. Mercierlaan 9~, 3001 Leuven, Belgium { Filiep. Van, oucke, Marc.Moonen} @esat.kuleuven.ac. be E. D E P R E T T E R E
Delft University of Technology Dept. of Electrical Engineering Mekelweg 4, 2628 CD Delft, The Netherlands ed@dutentb, et. tudelft, nl
ABSTRACT. A novel algorithm is presented for updating the singular value decomposition in parallel. It is an improvement upon a Jacobi-type SVD updating algorithm, where now the exact orthogonality of the matrix of short singular vectors is guaranteed by means of a minimal factorization in terms of angles. Its orthogonality is known to be crucial for the numerical stability of the overall algorithm. The factored approach has also advantages with respect to parallel implementation. We derive a triangular array of rotation cells, performing an orthogonal m a t r i x - vector multiplication, and a modified Jacobi array for SVD updating. Both arrays can be built with CORDIC processors since the algorithms make exclusive use of Givens rotations. KEYWORDS. Adaptive singular value decomposition, systolic arrays.
1
INTRODUCTION
AND PRELIMINARIES
The singular value decomposition (SVD) of a matrix X E ~NxM is defined as
X = U. E. V T, where U E ~NxM, V E ~MxM are orthogonal and E E ]~MxM is a diagonal matrix holding the singular values of X. This matrix decomposition is central in the theory of model based signal processing, systems identification and control. Because of the availability of robust
F. Vanpoucke et al.
268
Al$orithm 1 V[o] '-" Rio]
'-
IM OMxM
f o r k = 1,...,o0 1. Input new observation vector z[k ] ~T T z[k ] ~- z[~].V[k_~] 2. QR updating
ki0 l
]
+_
G M + I - i [ M + 1T
II
"
A.R~}_I]
i=l
3. SVD steps M-1
(3a) R[k] +--
(~ M - i i M + l - i ~r
H "t l i---1
M-1
9kt l"
II i'-I M-1
(3b) V[k] *--
& i]i + l
-t 1
•ili+1
i-1
endfor
algorithms with excellent numerical properties, it is also an important algorithmic building block. In real-time applications the data vectors Z[k] E n~M a r e processed as soon as they become available. The data matrix X[k] is often defined recursively X[k] =
[ A'X[k-I] ] x(~] '
kxM
The real scalar A < 1 is an exponential weighting factor which deemphasizes older data. In the above applications, the SVD of X[k] has to be tracked. Unfortunately, an exact update the SVD at each time instant k requires O(M 3) operations. For many real-time applications this computational cost is a serious impediment. Therefore, approximate algorithms for SVD updating have been developed which trade accuracy for computational complexity. A promising algorithm is the Jacobi-type SVD updating algorithm [1]. It is reprinted as Algorithm 1 and computes an orthogonal decomposition
xt j = vt j. RE j. where U[k] E nt k• V[k] E n~M x M a r e orthogonal but now R[k] is an upper triangular matrix which is nearly diagonal (i.e., close to the true ~[k]). Since U[k] grows in size, the algorithm only keeps track of R[k] and V[k], which is sufficient for most applications. First the incoming vector z[k] is multiplied by the orthogonal matrix V[k-1]. The second step is a QR decomposition updating step. The transformed vector Z[k] is worked into the weighted triangular matrix A 9R[k_l] by means of a sequence of M Givens rotation 2 i l M + l E n~( M + I ) x ( M + I ) Such a Givens rotation matrix Qilj is a square matrices [2] f'-'[k]
Stable Jacobi SVD Updating matrix embedding of a 2 x 2 rotation operating on identical to the identity matrix except for 4 elements _QilJ(i,j) = QilJ(j, i)= sin(a ilj) for some angle a ilj. nearly diagonal structure of R[k]. Therefore the third of M -
i row and column rotations (~i[i+1 ~'-'[k] , ~i[i+l ~[k] E
269
rows/columns i and j, i.e., Qilj is QilJ(i,i) = QilJ(j,j) = cos(ailj) and The QR updating step degrades the step consists in applying a sequence
]~MxM)
to R[k] to reduce the size of
the off-diagonal elements again. Finally the column rotations ~i[i+l [k] have to be applied to V[k-1]. For details on how to compute the rotation angles, we refer to the original paper P]. A high level graphical representation (signal flow graph or SFG) of the algorithm is given in Figure 1. The upper square frame represents a memoryless operator which computes the matrix - vector product x[k]-.T= X[k]T 9V[k-1] (step 1), and the matrix - matrix product ~k] = ~k-1] ' @[k] (step 3). The heavy dots (here and in subsequent figures) represent delay operators. The lower triangular frame represents an operator which performs the QR updating (step 2) and the SVD steps (step 3). A systolic Jacobi array implementing this SFG is described in [3]. Algorithm 1 has one numerical shortcoming. It suffers from round-off error accumulation in the orthogonal matrix V[k]. At each time instant V[k-1] is multiplied by a sequence of 9 9 i+l 9 M - 1 Givens rotations (I)i[lk] . Rounding errors in these multiplications will perturb V[k] in a stochastic manner. These errors do not decay, but keep on accumulating. This can easily be verified experimentally. The orthogonality of V[k] is known to be crucial for the overall numerical stability and accuracy [1]. Therefore, an unbounded increase of the deviation of orthonormality is clearly unacceptable 9
The algorithm can be stabilized by including a reorthogonalization step based on symmetric Gram-Schmidt orthogonalization [1]. At each time the rows are V[k] are reorthogonalized by 2 • 2 transformations. However, this method does not guarantee orthonormality at each iteration. In combination with exponential weighting it only keeps the deviation sufficiently low and bounded. Secondly, the resulting systolic implementation is rather tricky and inefficient. An alternative to keep a matrix orthogonal is to describe it by means of a set of rotation angles. In applications where the matrix is constant, e.g., orthogonal filters, this parameterization has been used extensively. However, in adaptive signal processing, updating the angles is a non-trivial issue. In section 2 it is reviewed how the orthogonality of the V[k] matrix can be preserved by parameterizing the matrix as a finite sequence of Givens rotations. In addition, the corresponding triarray for orthogonal matrix-vector multiplication is presented. In section 3 a rotation method is described to update the rotation parameters of V[k] without explicit computation of V[k]. This method is the major contribution of this chapter. Section 4 combines these results into a modified systolic Jacobi array for SVD updating.
F. Vanpoucke et al.
270
Yt- vtl z[k]
~.T [~]
~ _ t'[k-1] ~[k] ~ - - - ~ R[k]
Figure 1: SFG of Algorithm 1 for SVD updating. 2
ORTHOGONAL
MATRIX-
VECTOR
PRODUCT
In this section we show how to factor the matrix V[~] as a finite chain of Givens rotations, each determined by an angle a. If all computations are performed on the angles, the matrix V[k] can never leave the manifold of orthogonal matrices. Rounding errors will now perturb the stored rotation angles, but by construction the perturbed V[k] is still orthogonal. First we derive a unique factorization of an arbitrary orthogonal matrix V. Secondly we use this factorization to compute the product x[k T ] 'V[k-1] This leads to an elegant systolic array for orthogonal matrix - vector multiplication. L e m m a 1 Let V be a real orthogonal matrix (V T . V = IM). Then V can be factored uniquely into a product of M . ( M - 1)/2 Givens rotations Qilj and a signature matrix S, i.e., v ---
II
o
9s ,
\ i=a j=i+a
where S is equal to the identity matrix of size M, except that the last diagonal entry is 4-1. Example For a 3 x 3 orthogonal matrix, the factorization is given by V - Q112. Q1[3. Q213. S. To construct the factorization, it is sufficient to apply the weU-known Givens method for QR decomposition [2].
271
Stable J a c o b i S V D U p d a t i n g
Ixxx xxx xxx]~ _ Ixx0 xx x xx x I ~_ [100 xx ~ xx ~ ]Q23 _ [100 0~1 0~1 v
s
At each stage we need to compute a ilj such that after rotation, v~i is zeroed. The angle a ilj is unique if the convention is taken that vii is non-negative. If both vii = vii = 0 , we define a i[j -- O. After zeroing all off-diagonal elements in a column, the diagonal entry equals 1 since the columns of an orthogonal matrix have unit-norm. The same argument holds true for the rows. Finally, the sign of the (M, M)-th entry of 5' is not controlled by the algorithm. It is positive or negative depending on the sign of the determinant of V. n
Comments 1. One can always choose a matrix of short singular vectors with positive determinant such that the signature matrix can be omitted. 2. The ideal hardware component for fast computation of the angles, given a matrix V, is a CORDIC processor [4] in angle accumulation mode. The signal flow graph (SFG) for an orthogonal matrix-vector product Z[k ]~T __ X[k]T "V[k-1] in factored form is now shown in Figure 2. The triangular graph consists of M . ( M - 1)/2 nodes, having local and regular interconnections. The functionality of each node is the same. It stores the rotation Qilj and applies it to its input pair coming in from the top and from the left. Its output data pair is propagated to the bottom and to the right respectively. Again the most efficient implementation of a node is a CORDIC processor in vector rotation mode. Mapping the SFG into a systolic array is trivial, since the data flow is unidirectional. It suffices to insert a delay cell on all dependencies cut by the dashed lines in Figure 2. The pipelining period of the systolic array is one cycle. This systolic array has already been derived in the context of computing the QR decomposition of an arbitrary square matrix [5]. There, on feeding in the matrix, the nodes first compute their rotation angles a ilj and the triangular matrix R is output. Once the angles are fixed, the array operates as in Figure 2. The array also bears resemblance to the well-known Gentleman-Kung array for QR updating. Note however that in Figure 2 the rotation angles are resident in the cells, whereas in the Gentleman-Kung array they are propagated through the array. Except for guaranteed orthogonality, the factorization has the additional benefit that the scalar multiplications in step 1 are eliminated. The whole SVD updating algorithm now consists exclusively of Givens rotations. This regularity is important in view of a hardware realization. It allows to construct a systolic array for SVD updating only using CORDIC processors, provided that the updating of V[k-1] (step 3) can be done in factored form. This is the topic of the next section.
F. Vanpoucke et al.
272
/'
/
/__I_L-~
/'__I_E-~
/"
/
'
i (r-a~
Xl
2•qut 3 X out i
/
1 ~
in
out
,"
/ ,...,"1~ ''-ff ' . / ~/ '-1~~
x2
/ z~"' L
x3
= QilJ T .
in] xi zjin
x4
Figure 2" SFG of the factored orthogonal matrix-vector multiplication (M = 4). 3
UPDATING
THE ANGLES
In step 3 of Algorithm 1, the V[k_l] m a t r i x is post-multiplied by a sequence of M - 1 Givens 9 ,~ilj rotations r ili+x . In this section we present an O ( M 2) method to update the angles ~'[k-1] directly, without explicit computation of the V-matrix. The updating matrix 1 (~ is the product of rotations on neighboring columns, (I) =
M-1 H (I)i{i+l" i=1
Each transformation of the form V ~ V. ~ili+1 will alter several rotation angles. Starting from the tail, a rotation ~ilj is worked backwards into the factorization. It interacts with the preceding rotations ~klt in a way which depends on the relative position of the coordinate planes defined by their indices. Three types of transformations have to be considered.
i. THE
I N D E X PAIRS (k,l) A N D
(i,j)A R E
DISJOINT.
In this case the rotation matrices Qklt and CqJ commute since they affect different rows or columns, i.e.,
~ilj . Qklt = Qklt . r 2. THE I N D E X PAIRS (k,l) A N D
(i,j)
A R E EQUAL 9
Here the rotation angles of Qilj and r
simply add together, i.e.,
3. THE INDEX PAIRS ( k , l ) AND ( i , j ) SHARE A COMMON INDEX. 1The time index k is omitted for notational convenience.
Stable Jacobi SVD Updating
273
This is the complicated case. Let k = j. Generically, the matrices Qjlt and ~ilj do not commute and it is even impossible to calculate an equivalent pair of rotations such that ~i.lj .Qj.lt = Qjll. r However, reordering the indices becomes possible if a third rotation, Qilt, is taken into account. The sequence of 3 Givens rotations in the (i, l), (j,/), (i,j)-planes, defines a rotation in the 3-dimensional ( i, j, /)-space,
ViJt = Qilt. Qjlt. (~ilj. The 3-dimensional rotation can also be represented by a different set of three Givens rotations by choosing another ordering of the coordinate planes. ViJt = (~ ~,lJ . Qi, lt . QJl,.
There is no simple trigonometric expression for the mapping from the former to the latter set of angles. A natural algorithm is to compute V ijt explicitly and refactor it. The computational complexity of this 3 x 3 core problem is relatively low and independent of the matrix dimension M. 1 0 0] 0 1 0 0 0 1 u._
y_,
x
x
0
x
x
0
0
0
1
Q3lZ )
xxxxxx xx 0]o Exxx xx x xx x]o:
J
/3
Ix~ xx x xx - [100 xx ~ xx
E10001 ~ 01 J
It is even sufficient to compute only two columns of V ijt. When selecting the first and last column, the operation count is optimized to 7 rotations over a given angle (vector rotation), and 3 rotations in which a coordinate is zeroed (angle accumulation). On a CORDIC processor, both operations have the same complexity. Below the course of the computations in the 4 • 4 example is detailed for the first updating rotation r The numbers between braces refer to the respective transformation types above. In the first line, ~112 can be commuted with Q3[4 (type 1). To interchange r with Q214, Qll4 must be adjacent to Q214. Therefore, Qll4 is commuted with Q213. Then the equivalent set of rotations in the (1, 2, 4)-space is determined (type 3). The same operations axe then repeated in the (1,2,3)-space. On the last line the angles in the (1,2)-plane are summed (type 2). V.r
_
Q112.Q11a.Ql14.Q213.Q214.Q314.(~l12
=
Ql12.Ql13. Qll4. Q213. Q214. ~112. Qal4
(T1)
=
Ql12. Qlla.Q213. Ql14. Q214. (I)112. Q314
(T1)
274
F. Vanpoucke et al.
ili+z
= JAI
ol I'+ l
" T
iji+l = Q[~+I . eili+l out ..., ..
........,O214
Qilj
"'t............ D314
=
OU~
] I
r
r
I)iJi+l out
out
0!+11/ L
"v t'n.
JBV
sn
a weave candidate and we refer the algorithmic equivalence transformation step as the weave step. PROOF The weave candidate defines an elementary Jacobi-step X ' ,- J(O)X J(7~)T (2), which is associative because J(O) and J(7~) are unconditional matrices; the weave step exploits this property. []
Slice S~ is the, two times, right shifted slice of slice S~ (9). Application of a sequence of dependency preserving transformation steps reorder S~ into slice S~.
~g = .~" (k) 4
k) O~'~(k)~'~
I1
o~'~(k r ~
- 1)~3214(k
1~1~(k)
(i0)
Here the first weave candidate announces itself (boxed). The critical path ordering of the slice shows that the critical path is of length 2 ( n - 1) and the operators in the weave candidate are all on a critical path of the slice.
PRECEDENCE GRAPH The occurrences and relationships between the weave candidates (r, c , k - l I in the DG define a precedence graph (PG). Figure 2 gives an example of the precedence graph of weave candidates for a matrix of order n, with n = 7. In this figure A K stands for relative sweep distance k - l. co
3 4
AK
32
Figure 2: Weave candidate PG for n = 7. Any feasible cut-set of the PG defines a partial ordered set of weave candidates which includes the root candidate. The weave steps in this set, when applied to the given program (slice) will transform it into an input-output equivalent program (slice). The dependence graph of the initial cyclic by rows algorithm is regular [1]. The aim of 'weaving' is to decrease
286 the program's critical path and, moreover, to arrive, if possible, at an equivalent program that again is regular. The following proposition gives (without proof) the conditions. P r o p o s i t i o n 4.3 Let prog be the cyclic by rows SVD algorithm of order n with operators 0 rlr+x (k), 8rlr+l(k) and ~lc+x(k) (cf. program 3). Let DG be the dependence graph o f p r o g
Let p be any point in the 3-D integral space with coordinate axes r, c and A K. Let x and y be integral scalars ( z , y E Z ) a n d put )~T = [1,-1, z] and a = x - y . Consider the hyperplane partition hp(z, y) " )~Tp < a This partition with 1 < z < n - 3 and 1 < y < n - 2 defines feasible cut-sets. Application of the weave steps in a partition hp(x, y) to DG yield an input-output equivalent dependence graph DG(z, y) which is, moreover, regular together with the original one. The critical path of DG(z, y) is at most of the same length of the critical path of DG; more precisely, if DG has a critical path of length 2(n - 1) then the critical path of DG(x, y) is 2d with 4 ro(t- 1) Generate a new auxiliary eigencomponent by computing
v_.,(t)(t)=z__,(,_~)+x(t)/ll~(,_x)+~(t)ll END
and
A,(,)(t)=AN(t)
Table 2: An adaptive algorithm for both rank and subspace tracking We note that when we estimate rs(t), the last n - r(t - 1) noise eigenvalues are identical to AN(t). This makes AIC(k) and MDL(k) to be strictly increasing in k when k >_ r ( t - 1). Hence we only need to compute AIC(k) and MDL(k) for k _ a2 _> 9.. >__an _> 0. In discrete ill-posed problems, the matrix K has a cluster of singular values near zero, and the size of this cluster increases when the dimension is increased. The least squares solution to (1) is the solution to the problem m~n Ilgx - YlI2,
(3)
and can be written as
XLSQ =
E
- - Vi,
i=10"i
(4)
where ai = uTy. In XLSQ, e r r o r in the directions corresponding to small singular values is greatly magnified and usually overwhelms the information contained in the directions corresponding to larger singular values. Regularization methods differ only in how they choose to filter out the effects of these errors. To explain these choices, we will follow the development of I-Iansen and O'Leary [9]. In T i k h o n o v r e g u l a r i z a t i o n [13] we solve the minimization problem
~ n { IIg~ - YII] + ~ll~ll~ }.
(5)
The parameter A controls the weight given to minimization of []zl[ 2 relative to minimization of the residual norm. Sometimes a seminorm IlL xl12 is substituted, where L typically is a discrete approximation to some derivative operator; by transformation of coordinates, this problem can be reduced to one involving minimizing the standard 2-norm [1]. In the t r u n c a t e d S V D regularization method [8, 16], one truncates the summation in (4) at an upper limit k < n, before the small singular values start to dominate. Certain iterative methods for solving the least squares problem (3) also have regularization properties. The c o n j u g a t e g r a d i e n t family of methods minimizes the least squares function over an expanding sequence of subspaces ~k = span{KTy, ( K T K ) K T y , ..., ( K T K ) k - I K T y } . The LSQR implementation of Paige and Saunders [11] is well suited to ill-posed problems. Like the least squares formulation (3), the solutions produced by these regularization methods are solutions to minimization problems. There are two complementary views" they minimize the size of the solution x subject to keeping the size of the residual r = K x - y less than some fixed value, and in a dual sense they minimize the size of r subject to keeping
317
The SVD in Image Restoration
the size of x less than some value M(A), monotonically nondecreasing as the regularization parameter A decreases. The way that "size" is measured varies from method to method, but many of these methods are defined in terms of the norms induced by the bases of the singular value decomposition. Although the 2-norm is invariant with respect to the choice of an orthonormal basis, other norms do not share this property. We will denote the p-norm in the SVD basis by I1" lip, defined by
Ilzllp_= I1~11,
if x =
Ilrllp_=
~ivi;
11711,+ IIr•
if r = ~ T i u i + ra..
i=1
i=1
Hansen and O'Leary [9] verify the following characterizations: Method
II minimizes !
domain
Tikhonov Truncated SVD
Ilrl12
Ilrll~ Ilrll~
{~: 11~112~ M(A)} {x: Ilxlk ~ M(A)} {x: Ilxll~ ~ M(A)}
Ilrl12
{z:x e X:k}
LSQR
]
The "smoothness" of the solutions in the lp family for a particular value of M(A) decreases as p increases. The truncated SVD (11) solution has no components in directions corresponding to small singular values. The Tikhonov (12) solution has small components in these directions. The lp (p > 2) solutions have larger components, and the ler solution has components of size comparable to those in the directions corresponding to large singular values. These methods can also be generalized to weighted lp norms. From this discussion we see that the choice of regularization method is a choice of a pair of functions, one measuring the size of the residual and the other measuring the size of the solution vector. The functions determine precisely how the effects of error are damped. The choice of the regularization parameter determines the size of the solution vector.
3
TOTAL LEAST SQUARES METHODS
All of the regularization methods considered so far have the weakness that they ignore errors in the kernel matrix K. In reality, such errors may be large, due to discretization error, numerical quadrature error, and departure of the behavior of the true measuring device from its mathematical model. We can account for errors in the kernel matrix as well as the right-hand side by using a total least squares criteria (or errors in variables method) [5, 15] for measuring error. In total least squares, we seek to solve the problem
rain II(K, y ) - (k, ~)IIF x,g
subject to
~ = ~" x .
The solution to this problem can be expressed in terms of the singular value decomposition of the matrix ( K , y) n+l
(g, y)= ~ i=1
#ifii~ T.
D.P. O'Leary
318
We partition the matrix V, whose columns are vi, as P ."
q ;
(
where p is the largest integer so that #p > #p+l. Then the total least squares solution vector is defined by If the problem is ill-conditioned, then the total least squares solution can be unduly affected by noise, and some regularization is necessary. In analogy to least squares, we can add the constraint rank((~', #)) ~ k + 1
to produce a t r u n c a t e d total least squares solution, or add the constraint
Ilxll < M(~) to produce a T i k h o n o v total least squares solution. Some properties of the truncated total least squares solution for ill-posed problems are discussed by Fierro, Golub, Hansen, and O'Leary [2], while the Tikhonov total least squares sohtion is being studied by Golub, Hansen, and 0'Leary [3]. 4
COMPUTATIONAL
ISSUES
For large matrices K, the S V D is too expensive to compute, but it is well known that anything defined by the S V D can be approximated through quantities computed in Lanczos bidiagonalization in the L S Q R algorithm [11]: K V = UB,
where V T V = Ikxk, U T U = l(k+i)x(k+1), B = bidiagonal(k+1)xk.
The approximation will be good if singular values drop off rapidly and only the large ones matter, as is c o m m o n in ill-posed problems. The problem is projected onto the subspace spanned by the columns of V, and either the least squares or the total least squares solutions can be computed on this subspace. Regularization can be added as necessary; see, for example, [10].
The Lanczos bidiagonalization iteration is inexpensive if K is sparse or structured, since it depends only on matrix-vector products plus vector operations. 5
NUMERICAL
EXPERIMENTS
W e illustratethe effects of different regularization methods using a trivialtest problem, the 16 • 16 image shown in the figure. The kernel matrix K was constructed to model Gaussian blurring: the value for pixel (i,k) was constructed by summing the 9 nearest neighbors (j,l) weighted by e -'I((i-j)~+(~-02). This matrix K is a spacially-invariant idealization of a physical process. W e assume that the true physical process has some spacial dependence,
The SVD in Image Restoration
319
represented by error drawn from a uniform distribution on [-am, am] added to each matrix element. The true image was convolved with K plus these errors to produce a blurred image, and then Gaussian noise e, with standard deviation ar times the largest element in the original image, was added to the blurred image. This resulting image, along with the idealized kernel matrix K, form the data for the problem. Much work has been done on how to choose the regularization parameter [14] (or decide when to stop an iterative method) using methods such as the discrepancy principle [6, w generalized cross-validation [4], the L-curve [7], etc. We sidestep the important question of how to choose this parameter by using an utterly impractical but fair regularization criterion" set the regularization parameter, truncation parameter, or iteration count to make the norm of the answer equal to the norm of the image. The choice of norm is the one appropriate to the method: 11 for truncated SVD, 12 for Tikhonov least squares, etc. Before plotting the answers in Matlab, we set all negative pixel values to zero. For each problem, the results of seven reconstructions were computed: 9 The truncated SVD method, based on the 11 norm in the SVD basis. 9 Tikhonov regularization, based on the 12 norm in the SVD basis. 9 The lo~ method, based on the loo norm in the SVD basis. 9 The least squares solution from Lanczos bidiagonalization of the matrix K. 9 The total least squares solution from Lanczos bidiagonalization of the matrix K. 9 The least squares solution from Lanczos bidiagonalization of K preconditioned by a Toeplitz approximation to K. 9 The total least squares solution from Lanczos bidiagonalization of K preconditioned by a Toeplitz approximation to K. Numerical experiments were run with a,~ = 10 -1, 10 -3, 10 -5, and ar = 10 -1, 10 -3, 10 -s. Typical results are shown in the figures. Several conclusions emerged: 9 The truncated SVD solution is useful in high-noise situations, but has limited usefulness when noise is small. 9 The Tikhonov and loo solutions are useful when the noise is small. 9 For noise of 10 -1, the Lanczos algorithm quickly reproduces the Tikhonov solution (in fewer than 40 iterations), but for noise of 10 -3 or less, the cost is high (greater than 130 iterations). 9 Preconditioning can significantly reduce the number of iterations. 9 The Lanczos total least squares solutions are quite close to the Lanczos least squares solutions.
320
D.P. O'Leary
References
[1] L. Eld~n, Algorithms for regularization of ill-conditioned least squares problems, BIT, 17 (1977), pp. 134-145. [2] It. Fierro, G. H. Golub, P. C. Hansen, and D. P. O'Leary, Regularization by Truncated Total Least Squares, in Proc. of the Fifth SIAM Conference on Applied Linear Algebra, J. G. Lewis, ed., SIAM Press, Philadelphia, 1994, pp. 250-254. [3] G. H. Golub, P. C. Hansen, and D. P. O'Leary, Tikhonov Regularization and Total Least Squares, Tech. ttept., Computer Science Dept., University of Maryland, to appear. [4] Gene H. Golub, Michael Heath, and Grace Wahba, Generalized Cross-Validation as a Method for Choosing a Good Ridge Parameter, Technometrics, 21 (1979), pp. 215-223. [5] G. H. Golub and C. F. Van Loan, An Analysis of the Total Least Squares Problem, SIAM J. Numer. Anal., 17 (1980), pp. 883-893. [6] C. W. Groetsch, The Theory of Tikhonov Regularization for Fredholm Integral Equations of the First Kind, Pitman, Boston, 1984. [7] P. C. Hansen, Analysis of Discrete Ill-Posed Problems by Means of the L-Curve, SIAM Review, 34 (1992), pp. 561-580. [8] P. C. Hansen, The Truncated SVD as a Method for Regularization, BIT, 27 (1987), pp. 354-553. [9] P. C. Hansen and D. P. O'Leary, The use of the L-curve in the regularization of discrete ill-posed problems, SIAM J. on Sci. Statist. Comp., 14 (1993), pp. 1487-1503. [10] Dianne P. O'Leary and John A. Simmons, A Bidiagonalization-Regularization Procedure for Large Scale Discretizations of Ill-Posed Problems, SIAM J. on Sci. and Statist. Comp., 2 (1981), pp. 474-488. [11] C. C. Paige and M. A. Saunders, LSQR: An algorithm for sparse equations and sparse least squares, ACM Transactions on Mathematical Software, 8 (1982), pp. 43-71. [12] J. Skilling and S. F. Gull, Algorithms and applications, in Maximum-Entropy and Bayesian Methods in Inverse Problems, C. It. Smith and W. T. Grandy Jr., eds., D. tteidel Pub. Co., Boston, 1985, pp. 83-132. [13] A. N. Tikhonov and V. Y. Arsenin, Solutions of Ill-Posed Problems, Wiley, NY, 1977. [14] D. M. Titterington, Common structure of smoothing techniques in statistics, International Statistics Review, 53, 1985, pp. 141-170. [15] S. Van Huffel and J. Vanderwalle, The Total Least Squares Problem - Computational Aspects and Analysis, SIAM Press, Philadelphia, 1991. [16] J. M. Varah, Pitfalls in the numerical solution of linear ill-posed problems, SIAM 3. on Scientific and Statistical Computing, 4 (1983), pp. 164-176.
321
The SVD in Image Restoration Blurred image
Originalimage
........
Trunc. SVD image
~::=----=:!
i,,":,--'_':))ill!)i I I | l l m l l l l l l l | l i l l i i l n l l
~.:.~mm:
:~ ~
m m l l ~ 1
Lanczos LS
LT E 4j ':]
~ ) m m m n m m m m ~ im ~ = m m u n ~ = ~ : u ~ .|~m~mm~ u m m m ~ ~ m m m m
;i):)iiiiill mmn~= ~.~m~mmm m ~ = ~ r ~m=m, =m~m~ =mmi
m~
l ~ m m m m m . : ~ . m m ~
Prec. Lanczos TLS
Prec. Lanczos LS I n ]
nil'Ill
lW
II
m ~ m m * m m ~ m m m ~
I-infinity image
Ul
.... . =...~:., :-," 9 ~ 9~ ~ # - . ......
.....
.., .
i~.,i... ~ , ~ ~,m~, sigma_.m - I .~-I, sioma_r - I .~-5
Blurred image
Trunc. SVD image
~ ~=mmnmnmmumnl Im m n m m m m m m l m m lnmm mmimmumml immmm mmmmmmmm
~mma
:i mmm~
~ m m m ~ m ~ a mm~mmmma
lmm~mmmmu~ =mm.~m .......... ~ iNN
nmm
Ir
i
.....
llmi |
Lanczos LS
~ma
~a
~Imm
u i : . m n m
m m m
....
Lanczos TLS m m m~ 9 m)~
mmm mm ~nmmm~mm
mn~ =mm
m m ~ : m m ~ m ~ m
mmm
~mm
m
,: ~ m ~ m
=::~:=m = ~ : r:.,~;::] , n , m , I, m , ,}....~ | .... i .
Prec. Lanczos TLS lmm~mmmm
mum m
~m
~ m m ~
~..
mm
~
m m~
~
Im::mm~m::u: mm ~m=m
m
mm
m
mm
IIIU~
=
~
9
~l
II.'.~.I."=~II
~mm
m
m~~ iN
i
; ~m'
|m~
sigma_rn" 1 .e-5, sigma_.r, 1 .e-1
9
I I , - I I I R ~
|. m m ~
~mmm m~mm,
~ m m ~ l
I ml !
D.P. O'Leary
322
Blurred image
Original image
"-:-:Jii:::::"::: ~::,U_.liiiiiiil
im~ummmmmm~mm( im)~ ~mmmm~mml ~mm~::~,mmmmmmm~ ~ m m ~ ~ m m m m l l~ma~ ~ ..... m ~ m m ( mm~m..~:~s~mm~.l
~::~m~mmmmmmm~
--
Tikhonov image
Lanczos TLS
Lanczos LS
~ ~
~:.~~/
',',: ',
i u ~..:~l ! i i I ! L~:~r:::IV::::~.S I m
:..... ~+.
. ~ H _ ~ ....
Lal .............::T"I 9 m--..;
~
4
~Tl[~
(o)l
Prec. Lanczos TLS
i
:~=~::i~:|:: u~m~ mm iu~ :mm~i,~mim,mmu
ml
mummmm~m:-~.mm::l m~mm
=smi~u
mnmmm
~
[,lr:l
u i
l~,:i:l:::li::l:5.1
I
mm.mm
m~l
i:!:!:~::~i
sigma_m = 1.e-I, sigma_r = I .e-1
Original image
Trunc. SVD image
Bluffed image . . . . '..~. ~ | l l l i n i l
~ ~ _ : i -. ~./~
....
/~..:.; ,,
:)?
:':||!|E|||||)
nmm iamunimm! mmmm iimnmmnm mmmmm mmmmmmn mm mmm mmmm| mm mmmm mmmm mmmmmmmmm mmm mmmm mmmm am mmmmmmmmmmmm mmmmmmmmmmmmm!
Lanczos TLS
Tikhonov image
:==:==========: i= mmunmm=mmu tnnu nnuunnnn, ~uuuu mnunnnn, immllu mmmmum
tmm mmm ,mmmm =mmmmm tmmmmammmammm tammmmmmmmmamm,
I
mmmmmmmmmmmmmu
Inuunmnnnmunnn
m~~
m mmmmmmmmmmm mmm mmmmmmmm| mmmm mmmmmmm| mmmmm mmmmmml mm mmm mmmml ~mm mmmm mmm|
in mnnnnnnnmu uumn umumummm immmu mammmml luunnm muumuu imu ann nmun imm Rmnm mmm
nnnn munnnnnl unnnu mnmmmul un mum mumml nn -mnnm mmml
:FT;N~
11~:
l i B | m a n | i l i u m imiinanmmiB|mml
||mnmmm|mm|mml
nnmnnnnnnnun mmmmmmmmmmmnm!
sigma..m = I.e-5, sigma..r = 1.e-5
SVD AND SIGNAL PROCESSING, III Algorithms, Architectures and Applications M. Moonen and B. De Moor (Editors) 9 1995 Elsevier Science B.V. All rights reserved.
TWO DIMENSIONAL COMPRESSION
ZERO
ERROR
323
MODELING
FOR IMAGE
J. SKOWRONSKI, I. DOLOGLOU
Laboratoire des Signaux et Syst~mes, CNRS-ESE Plateau de Moulon, 91192 Gif-sur- Yvette France skowronski @lss.supelec,fr ABSTRACT. A zero error modeling technique for two dimensional signals is presented. It is pointed out that a rank reduction algorithm applied to the signal renders the modeling physically meaningful in the sense that the estimated exponential sinusoides coincide with the real ones. Image compression is performed as an application of the present method. KEYWORDS. Zero error modeling, rank reduction, spectral estimation, prony, image compression.
1
INTRODUCTION
The estimation of one and two dimensional exponential sinusoids is an important problem in many signal processing applications. This paper presents an optimal representation of a discrete two dimensional signal using these functions, according to the following expression, 2
2
I(m, n) = ~ ~ Aija~Z? sin(12jm + win +r
(1)
i=1 j=l
where I(m, n) denotes a sample of the signal. The major problem concerning this representation is the accurate determination of the attenuations ai,~j and the frequencies wi, llj. The application of a conventional least square modeling approach may yield an approximation of these parameters. The amplitudes Aij and the phases r of the exponential sinusoids can then be obtained by Prony's method. Since, in general, the modeling error will not vanish completely, there will be an error made on the attenuation and frequency estimates, causing important distortions to the signal reconstructed according to (1). As a remedy to this problem, a zero modeling method has been proposed in [1] for monodimensional signals, which can be found at the meeting point of LPC and SVD analysis. When choosing the model order p sufficiently large, that is p = N , where N is the
J. Skowronski and L Dologlou
324
number of samples in the signal, the covariance matrix C ~ + 1 of order N + 1 has always a zero eigenvalue, meaning that the corresponding eigenvector provides a zero modeling error. Consequently, the exponential sinusoids, obtained from this eigenvector, allow a perfect modeling of the signal. In order to obtain a physically meaningful representation of the signal, using the above approach, it is necessary to dispose of a reduced rank signal (rank(C~.+l) < N). Since, in general, signals are not of reduced rank an algorithm was proposed in [3] [2] to optimally reduce the rank of one and two dimensional signals. However, in practice, this algorithm only provides an approximately reduced rank signal, implying a small but non zero modeling error. This, in turn, does not permit a conventional modeling approach to determine accurately the exponential factors and frequencies. Only the zero error modeling technique can do so. In the following, the outlined modeling procedure is first detailed for 1D signals and then generalized for images. Section 2 describes the zero error modeling approach which leads to a signal representation given by equation (1). In section 3 the rank of a signal is defined and the importance of a rank reduction of the signal is explained. Such a rank reduction algorithm is presented in section 4. The paper is concluded with an application of the present method to image compression.
2
SIGNAL
MODELING
The determination of the frequencies and the exponential factors is basically a spectral estimation task and much work has been done on this subject for monodimensional [5] and bidimensional signals [6]. These methods are all based on a linear prediction model. In the monodimensional case the model can be written as follows: htSp = e t (2) where h ~ = [hi,h2,..., hp+l] is the vector of prediction coefficients,Sp the signal observation matrix and e t the modeling error. For a 1D signal l(n),n = 1,...,N, the signal observation matrix Sp, which is of the Hankel form, is given by:
sp =
I(1) 1(2)
:
1(2) I(3)
:
I(p + 1) x(p + 2)
... "..
I(N-p) I ( N - p + 1) :
...
X(N)
...
(3)
Let us consider the solution of the SVD problem, that is the minimization of e 2 in : A(hth - i) e2 = e t e = h S p St p h +~ (4) This solution is given by the eigenvector hmi,~ of the signal covariance matrix Cp+1 = SpS~, which corresponds to the smallest eigenvalue Ami,~ = e2min of Cp+l [5]. In general, Amin is small but nonzero, which results in a modeling that is not exact. Consequently, there will be an error made on the estimation of the exponential factors and frequencies. This inaccuracy, especially the one on the attenuations, yields fairlyimportant distortionsto the estimates of the exponential sinusoids, which is the reason why Prony's method gives poor
2-D Zero Error Modeling for Image Compression
325
results 9 The only way of eliminating the modeling error is to increase the order of prediction p. In [1], it was pointed out, that for a prediction order p = N, N being the number of samples, C ~ + 1 = S ~ S ~ becomes singular. The eigenvector hmi,~ corresponding to the zero eigenvalue yields therefore zeros that exactly describe the N samples of the signal
l(n).
In the following, this zero error modeling approach is generalized to twodimensional signals I ( m , n), m = 1 , . . . , M, n = 1 , . . . , N. For this purpose we consider the modeling of a multichannel signal (k channels) as proposed in [2]. Consequently, a unique prediction equation describes each one of the k channels, according to the following expression, ti Yi = 1 ... k (5) h St p i= e where h t denotes the common vector of prediction coefficients, S~ the signal observation matrix of the i-th channel and e ti the associated modeling error. The above k equations may be expressed as follows, (6) hiS v = [et , e ~ , . . . , e ~ ] = e' with Sp --
I(i, 1)
I(1,2)
...
I(1,2)
I(1,3)
...
9
9
I(1,p+l)
. ,
I(1,p+2)
i(k, 1) i(k, 2)
1(1,N-p)
I(1,N-p+
9
9
9
0
9
.
.
9
...
z(k, 2) z(k, a)
9
l)
I(1,N)
...
9
...
i(k, N - p) I(k, N - p + 1)
...
I(k, N)
9
9
O
(7)
9
I ( k , p + 1) I ( k , p + 2 )
As in the monodimensional case, we only obtain a global prediction error e equal to zero, if Sp is singular, that is, if rank(Sp) 174 were replaced by 0. The result is an image in which the ocean surface reflectivity is essentially removed, and which therefore gives a much sharper view of the underlying reefs. The pixels in this data set can be classified into 2p -- 64 subgroups using 174 as the cutoff
L.P. Ammann
340
for outliers. In this case 339209 (57.51%) of the pixels are classified into the main group, which consists of those pixels that are not outliers in any principal component. The second largest subgroup is the set of pixels that are outliers in all principal components. This group contains 39955 (6.77%) of the pixels. The next two largest subgroups are those pixels that are outliers just in the sixth and just in the third principal components, containing 29338 (4.97%) and 26701 (4.53%) pixels, respectively. Images constructed using the techniques described here are available via anonymous ftp from ftp.utdallas.edu in the directory, /pub/satellite. These images are in PPM format, which can be viewed by a variety of imaging programs. Images in this format can be converted to other formats by the publicly available PBMPLUS tools. 6
EXTENSIONS
Because of the increasing popularity of Geographic Information System (GIS) packages among researchers in Geology, Social Science, and Environmental Science, a number of researchers are developing multivariate databases in raster coordinates to take advantage of the capabilities of these packages to organize and display such data. These databases have the same basic structure as remote sensing data sets: each record corresponds to a geographic location within a lattice and has associated with it a vector of observations. Although the values of these observations are not necessarily integers in the interval 0-255, RSVD can be applied to these databases, and the output of RSVD can be put into the same format as for remote sensing data. The image construction and classification techniques described above can be performed on this output, which can provide visualization and analytical tools to search for any structure that may be present in these databases. References
[1]
L.P. Ammann. Robust singular value decompositions- a new approach to projection pursuit. J.A.S.A. 27, pp. 579-594, 1993.
[2]
E.M. Dowling, L.P. Ammann, R.D. DeGroat. A TQR.-iteration based adaptive SVD for real time angle and frequency tracking. IEEE Trans. Signal Processing, 42, pp. 914-926, 1994.
[3] [4]
S.A. Drury. Image Interpretation in Geology. Allen and Unwin, 1987.
[5]
F.I~. Hampel, E.M. Ronchetti, P.J. Rousseeuw, W.A. Stahel. Robust Statistics, The Approach Based on Influence Functions. J. Wiley and Sons, 1986.
[6]
J.J. Jensen. Introductory digital image processing, a remote sensing perspective. Prentice-Hall, 1986.
[7]
P.J. ttousseeuw, A.N. Leroy. Robust regression and outlier detection. John Wiley and Sons, 1987.
G.H. Golub, C.F. Van Loan. Matrix computations. North Oxford Academic Publishing Co., Johns Hopkins Press, 1988.
SVD AND SIGNAL PROCESSING, III Algorithms, Architectures and Applications M. Moonen and B. De Moor (Editors) 9 1995 Elsevier Science B.V. All rights reserved.
SVD FOR LINEAR
INVERSE
341
PROBLEMS
M. BERTERO
University of Genoa, Department of Physics Via Dodecaneso, 33 16146 Genova, Italy bertero@genova. infn. it C. DE MOL
University of Brussels, Department of Mathematics Campus Plaine C.P.217, Bd du Triomphe 1050 Brussels, Belgium
[email protected] ABSTRACT. Singular Value Decompositions have proved very useful in getting efficient solvers for linear inverse problems. Regularized solutions can be computed as filtered expansions on the singular system of the integral operator to be inverted. In a few special instances of interest in optical imaging, we show that the relevant singular systems exhibit a series of nice analytic properties. We first consider the inversion of a finite Laplace transform, i.e. of a Laplace transform where the solution or the data have finite support. The corresponding singular functions are shown to be also the eigenfunctions of a second-order differential operator closely related to the Legendre operator. This remarkable fact allows to derive further properties, which are similar to some well-known properties of the prolate spheroidal wave functions (derived by Slepian and Pollack for the case of the finite Fourier transform inversion). The second compact linear integral operator we consider describes the one-dimensional coherent imaging properties of a confocal microscope. Its singular functions have very simple analytic expressions and moreover, the determination of the generalized inverse of this integral operator reduces to the inversion of an infinite matrix whose inverse can also be derived analytically. KEYWORDS. Inverse problems. Laplace transform inversion, optical imaging. 1
INTRODUCTION
The interest for inversion methods is growing in parallel with the development of remote sensing techniques, sophisticated imaging devices and computer-assisted instruments, which
M. Bertero and C. De Mol
342
require clever processing and interpretation of the measurements. In most modern imaging or scattering experiments, indeed, the measured data provide only indirect information about the searched-for parameters or characteristics and one has then to solve a so-called inverse problem. In many situations, the model relating these parameters to the data is linear or can be linearized. Moreover, the linear inverse problem is often equivalent to the inversion of an integral operator, i.e. to the solution of an integral equation of the first kind: K(z,y)
f(u)
(1)
dy = g(x) .
The integral kernel K(x, y) describes the effect of the instrument, medium, propagation, etc. For example, in space-invariant optical instruments, g ( z , y ) = S(z - y) where S(z) is the impulse response of the instrument. The inverse problem consisting in the restoration of the object f from the image g is then a deconvolution problem. In another type of applications, it is required to resolve exponential relaxation rates. This happens e. g. when trying to discriminate the sizes of small particles undergoing Brownian motion by the technique of photon correlation spectroscopy [1]. If all the particles are identical, the output of the correlator is a decreasing exponential function with a relaxation constant inversely proportional to the particle size. Hence, for an inhomogeneous population, a superposition of such exponential patterns is observed, and the determination of the size distribution requires the inversion of a Laplace transform, or in other words, the solution of eq. (1) with K ( x , y ) = e x p ( - x y ) . Notice that in this case the unknown function has clearly a finite support because some bounds on the particle sizes can be given a priori on the basis of physical grounds. In a similar way, in optical imaging, it is often possible to assess a priori the support of the observed specimen. To overcome the non-uniqueness and stability problems inherent in inverse problems, it is of uttermost importance to take explicitly into account all the available a priori knowledge about the solution and, in particular, of such finite-support constraint. This explains why many of the integral operators one has to invert for practical purposes are in fact compact operators. In the next section, we recall some basic features concerning the inversion of compact bounded linear operators, stressing in particular the usefulness of their singular systems. In the rest of the paper, we present some properties of the singular system of two particular operators we studied in view of solving practical inverse problems. In Section 3, we consider the previously mentioned Laplace transform on a finite interval and in Section 4, an integral operator describing the imaging capabilities of a confocal microscope. We focus here on some mathematical results which we believe to present some interest independently of the original applications we were considering.
REGULARIZED SOLUTIONS FILTERED SVD
OF LINEAR
INVERSE
PROBLEMS
BY
In agreement with the previous discussion, we assume that the relationship between the unknown characteristic f and the data g can be described by a known bounded linear operator A (e. g. an integral operator) so that the linear inverse problem we consider
SVD for Linear Inverse Problems
343
consists in solving the operator equation
A f =g
(2)
where f and g belong to Hilbert spaces, say respectively F and G. We assume moreover that A is a compact operator. The linear inverse problem is then ill-posed, because either the inverse operator A -1 does not exist (the null space N(A) of A is not trivial) or A -1 exists but is unbounded, so that the solution does not depend continuously on the data. When N(A) # {0}, a common practice for restoring uniqueness is to select the so-called minimal norm or normal solution belonging to the orthogonal subspace N(A) • On the other hand, a solution of eq. (2) exists if and only if the data g belong to the range R(A) of the operator. This, however, is unlikely to happen for noisy data and hence, usually, one looks for a pseudo- or least-squares solution of eq. (2), i.e. for a function providing a best fit of the data, say minimizing I ] A f - gila. The least-squares solution that has minimal norm is called the generalized solution f t of eq. (2). The operator A t : G --. F, which maps the data g on the corresponding generalized solution ft, is called the generalized inverse of the operator A. The generalized inverse of a compact operator is either unbounded or finite-rank (see e. g. [9] or [7] for more details). A natural framework for analyzing the inverse problem is provided by the singular system of the operator A. To fix the notations, let us recall that this singular system {ak; uk, vk}, k = 0, 1,2, . . . , solves the following coupled equations
A vk = ak Uk ;
A* u k - ak Vk.
(3)
The singular values O'k, real and positive by definition, will be ordered as follows: a0 _> al > a2 > "... For a compact operator with infinite rank, we have limk~oo O'k -- 0. The singular functions or vectors uk, eigenvectors of AA*, constitute an orthonormal basis in the closure of R(A) whereas the singular functions or vectors Vk, eigenvectors of A'A, form an orthonormal basis in N(A) • We can then write the following expansion
A f = ~ ak (f, vk)F uk k
(4)
which generalizes the classical Singular Value Decomposition (SVD) of matrices. Accordingly, the generalized solution ft admits the following expansion in terms of the singular system: 1 k The above expression is clearly meaningless for noisy data when At is unbounded. Moreover, even ill the finite-rank case, when the sum is finite, the determination of f t is likely to be unstable, the stability being governed by the ratio between the largest and the smallest of the singular values of A, i.e. by the so-called condition number. When this number is too large (ill-conditioning) or infinite (ill-posedness), one has to resort to some form of regularization to get numerically stable approximations of the generalized solution ft. Notice that finite-dimensional problems arising from the discretization of underlying illposed problems are likely to be ill-conditioned. A classical way of regularizing the problem is provided by spectral windowing, which amounts to take as approximate solution a filtered
M. Bertero and C. De Mol
344
version of the singular-system expansion:
.~
f-
W~,k
Z-~k
(g' Uk)a Vk
(6)
k
where the coefficients W~,k depend on a real positive number a, called the regularization parameter. They should lie between 0 and 1 and satisfy the following condition r~m w . k
~-~0+
= l
(7)
to ensure that f converges to f t when a ---, 0 +. A simple choice for the filter coefficients corresponds to sharp truncation: W~,k=
1 if k _ [l/a] 0 ifk>[1/a]
(8)
where [l/a] denotes the largest integer contained in 1/a. This method is also known as numerical filtering. The so-called regularized solution lives then in a finite-dimensional subspace and the number of terms in the truncated expansion (6) represents the number of independent "pieces of information" about the object which can be reliably (i.e. stably) retrieved from the data. Therefore, it is sometimes referred to as the number of degrees of freedom. There are many other possible choices for the filters Wa,k (see e.g. [7], [9]), such as a triangular filter, discrete equivalents of classical spectral windows (e.g. Hanning's window) or the classical Tikhonov filter: Wa,k
-=
2~
(9)
ak+a To get a proper regularization method one has still to give a rule for choosing the regularization parameter a according to the noise level. Roughly speaking, one has to ensure the convergence of the regularized solution to the true f t when the noise vanishes. In practice, one needs a proper control of noise propagation, namely a good trade-off between fidelity (goodness of fit) and stability of the solution. Many different methods have been proposed for the choice of the regularization parameter (see e. g. [7], [9]). Let us just mention here that a good "rule of thumb" is to take a in (9) of the order of the "noise-to-signal" ratio. Such an empirical choice can be fully justified in the framework of stochastic regularization methods (see e. g. [7]). When in a practical application the number of significant terms in the expansion (6) is not too high, the computation of a regularized solution can be done simply by implementing formula (6) numerically. This requires of course the computation of the singular system, which may be quite time-consuming. For a specific application, however, (i.e. for a given operator A), this computation is done once for all and the implementation of (6) with a prestored singular system results in very fast reconstruction algorithms. Numerical approximations of the singular system are obtained by collocation methods. Let us still observe that in experiments, the detection scheme imposes a natural discretization in data space (the collocation points being just the measurement points) whereas the discretization in the solution space is more arbitrary and may be done only at the last stage, just for numerical purposes.
SVD for Linear Inverse Problems 3
THE FINITE
LAPLACE
345
TRANSFORM
We define the finite Laplace transform as follows
(L f ) ( p ) =
e - ' t f ( t ) dt
;
c
0 (and/or c > 0), then s L2(a, b) --. L2(c, d)is compact and injective. As already recalled in Section 2, the inverse operator 12-1 can be given in terms of the singular system
of s {ak; uk, vk). Some of the properties of s are similar to those of the finite Fourier transform defined by -T e-i~tf(t) dt
(.T'f)(w) -
;
-f~ < w < +12.
(ii)
This operator is also compact and its singular functions are related in a simple way to the so-called linear prolate spheroidal wave functions (LPSWF) considered by Slepian and Pollack [11]. Indeed, after trivial rescaling, both operators .T'*.T"and .T'.T'* coincide with the operator studied in [11], namely -
(~cf)(x) =
1
y)]
r ( x - y)
f ( y ) dy
;
- 1 < x < +1
(12)
where c = 12T. Its eigenfunctions, the LPSWF, are denoted by Ck(C,X) and the corresponding eigenvalues by Ak(c). Hence, the singular values of the finite Fourier transform ~" are ~/~k(C). As shown in [11], the eigenfunctions Ck(C, x) are also the eigenfunctions of the following second-order linear self-adjoint differential operator:
(Dcf)(x) =
-[(1 - x2)f'(x)] ' + c2x2 f ( x ) .
(13)
As emphasized in [10] and [13], this is a remarkable fact, which allows to establish a series of nice properties of the singular functions of 9v, as for example, that they have exactly k zeroes in their respective intervals of definition. Such results have been extended to the case where ( U f ) ( w ) is given only in a discrete set of equidistant points inside the interval [-12, +12]. The set of corresponding eigenfunctions are then called the discrete linear prolate spheroidal functions (DLPSWF) [12] (for more details about the singular system of .T" and its link with the LPSWF, see e. g. [7] and for a review on the prolate functions, see [13]). Similar results hold true also for the finite Laplace transform when c = 0 and d = oc. Then, if a > 0, Z: is a compact operator from L2(a, b)into L2(0, ce). It is easy to show [1] that its singular values depend only on the ratio 7 = b/a. It is therefore convenient, by changing variables, to transform the interval [a, b] into [1, 7] and to study the finite Laplace transform in the following standardized form
(s
=
e - p t f ( t ) dt
;
0 _< p < c~ .
(14)
= s163 is a "finite Stieltjes transform" given by
The operator s
(s^
/;
=
Jl~ tf+( ss)
ds
;
1 < t < 7. - -
(15)
M. Bertero and C. De Mol
346 A
It has been shown [3] that s (D.~f)($)
-
-[(t 2 -
commutes with the following differential operator
1)("/2-
t2)ft(t)]
! + 2(t 2 -
1)f(t) .
(16)
The appropriate self-adjoint extension of this operator is obtained by requiring that the functions in the domain of D.~ are bounded at the boundary points t = 1 and t = 3' (notice A that the differential equation (D~f)(z) = # f(z) has five regular singular points in the complex plane, one being the point at infinity). In the limit 7 ~ 1, the operator (16) reduces, after a change of variable, to the Legendre operator [3]. From the commutation property, it follows ^that any singular function vk(t) is simultaneously an eigenfunction of s = s163 and of D~:
It is then possible to prove that [3] all the eigenvalues a~ have multiplicity 1 and that the ordering of the vk corresponding to increasing values of #k coincides with their ordering corresponding to decreasing values of o"k2 . As a corollary, vk(t) has exactly k zeroes inside the interval [1, 7] (the boundary points cannot be zeroes); moreover, the usual interlacing property of the zeroes of the eigenfunctions of a differential operator holds true for the singular functions Vk. Finally, it is seen that the vk are alternately "even" and "odd" functions in the following sense:
,,~(~)= (_l)k
t
~k(t).
(is)
In particular, when k is odd, then t = ~ is a zero of vk(t). As concerns the singular functions uk(p) it is possible to prove that they are eigenfunctions of the following fourthorder differential operator
( ~ g ) ( p ) = [p~g"(p)]"- (1 + ~)[#9'(p)]' + (~p~ - 2)g(p)
(19)
which commutes with the operator s = s The above properties have been exploited in [4] to compute the singular system of the finite Laplace transform. The case of a solution with infinite support (0, oo) and data in a finite interval (c, d), c > 0, can be treated in a similar way, by exchanging the operators s and s
4
AN IMAGING
OPERATOR
IN CONFOCAL
MICROSCOPY
In a very simplified case, the imaging properties of a confocal microscope are described by the following integral operator
(Af)(x) = f / 5
sinc(x - y) sinc(y) f(y) dy
; x e T~
(20)
where sinc(x) = sin(rx)/(rx). Notice that the object is first multiplied by the sinc-like light pattern of the illumination lens and then convolved by the sinc-like impulse response of the imaging lens. The operator A is compact in L2(T~). Its nun space N(A) is not trivial and can be completely characterized as follows. Let PW2~r(Tt) be the Paley-Wiener space of the L2-functions whose Fourier transform is zero outside the interval [ - 2 r , 2r]. This is a dosed subspace of L2(7~) whose elements are entire functions which can be represented
347 by means of the Whittaker-Shannon expansion
PW(2~
Let us denote now by the subspace of PW2,~(n) containing all the functions r~TXT(+)(Td,) the vanishing at the sampling points x = 0 and x = 4-).,4-3,+~.,...,1 5 and by r,,2~. subspace of all the functions that are zero at the sampling points x = 4-1, 4-2, 4-3,.... We have Then the following result holds true [2]
PW2,~(Tz)=PW(~
PW(+)(Tz).
N(A) = PWi~(7"v,.)(~PW~~162 ; N(A) • =/-'~,':~.(+)(TO).
(22)
The functions belonging to N(A) "t are represented by the following sampling expansion +co f(x) = ~ f(xm) sinc[2(x- Xm)] (23) m---cr
with x0 = 0, Xm -- sgn(m) ( I m l - 89 m -- 4-1, 4-2,..., where sgn(m) denotes the sign of m. The null space of the adjoint operator is characterized as follows
N(A*) = PW~(7"r ; N(A*) l = PW,~(7"r .
(24)
Hence the singular functions vk(x) belong to PW(+)(Td.) whereas the singular functions Uk(X) belong to PW,~(Tt). h very nice result proved by Gori and Guattari [8] is that these singular functions, as well as the corresponding singular values, have very simple analytic expressions. When k is odd (k = 21 + 1; l = 0, 1, 2,...), we have l = o, ~ , 2 , . . .
a2z+l = r ( 2 l + 1) '
u2l+z(x) = ~
sinc x
v2,+l(X)=sinc[2(x
2
(25)
2
'
21+12) ] - s i n c [ 2 ( x + 2 1 + 1 2
)] "
(27)
On the other hand, when k is even (k = 2l; l = 0, 1,2,...), denoting by ~t the positive solutions of the transcendental equation t a n ( ~ / 2 ) = 2//3 we have v~ ~:~ = ~ - ,
u2/(x)=Niv~2[sinc(x1 ~z(~) = - - si~c(~) u~(~)
(28)
2~-)-sinc(x+2~)]
'
(29) (a0)
O'2l
where N~ = (4 +/3~)/~r(8 +/3~). Another nice analytic result concerning the integral operator (20) is the following. Con. . . . (+) sider the restriction of A to the subspace/~w~ (7~), i.e. to the subspace of the functions represented by the sampling expansion (23). This restriction is isomorphic to the following
M. Bertero and C. De Mol
348
infinite-dimensional matrix 1 Ano = ~ 6n0, n = 0, 4-1, :t:2,...
1
(31)
( - 1 ) "+~
A,,m = 2 r 2 x m ( n - x,~) '
n = 0,4-1,4-2,...;
m = 4-1,+2,...
(32)
with xm as defined above. Then this matrix is invertible and its inverse is given by [6] (A-l)0,, = 2 ( - 1 ) n,
n = 0, 4-1,:t=2,...
(33)
(A_1)m.,- (_l)n+ I
2n , n=0,4-1,• m=4-1,4-2,... (34) n- xm This analytic result has provided us with a useful insight for analyzing related inverse problem in confocal microscopy and for estimating the resolution enhancement one could obtain with a multi-detector scheme (see [2], [5] and [6]).
References [1] M. Bertero, P. Boccacci and E. R. Pike. On the recovery and resolution of exponential relaxation rates from experimental data. Proc. R. Soc. Zond. A 383, pp 15-29, 1982. [2] M. Bertero, C. De Mol, E. R. Pike and J. G. Walker. Resolution in diffraction-limited imaging, a singular value analysis. IV. Optica Acta 31, pp 923-946, 1984. [3] M. Bertero and F. A. Grfinbaum. Commuting differential operators for the finite Laplace transform. Inverse Problems 1, pp 181-192, 1985. [4] M. Bertero, F. A. Grfinbaum and L. Rebolia. Spectral properties of a differential operator related to the inversion of the finite Laplace transform. Inverse Problems 2, pp 131-139, 1986. [5] M. Bertero, P. Brianzi and E. R. Pike. Superresolution in confocal scanning microscopy. Inverse Problems 3, pp 195-212, 1987. [6] M. Bertero, C. De Mol and E. R. Pike. Analytic inversion formula for confocal scanning microscopy. J. Opt. Soc. Am. A 4, pp 1748-1750, 1987. [7] M. Bertero. Linear inverse and ill-posed problems. In : P. W. Hawkes (Ed.), Advances in Electronics and Electron Physics 75, Academic Press, New York, pp 1-120, 1989. [8] F. Gori and G. Guattari. Signal restoration for linear systems with weighted inputs. Singular value analysis for two cases of low-pass filtering. Inverse Problems 1, pp 67-85, 1985. [9] C. W. Groetsch. The theory of Tikhonov regularization for Fredholm equations of the first kind. Pitman, Boston, 1984. [10] F. A. Griinbaum. Some mathematical problems motivated by medical imaging. In : G. Talenti (Ed.), Inverse Problems, Lecture Notes in Mathematics vol. 1225, Springer-Verlag, Berlin, pp 113-141, 1986. [11] D. Slepian and H. O. Pollack. Prolate spheroidal wave functions, Fourier analysis and uncertainty, I. Bell Syst. Tech. J. 40, pp 43-64, 1961. [12] D. Slepian. Prolate spheroidal wave functions, Fourier analysis and uncertainty, V: The discrete case. Bell Syst. Tech. J. 57, pp 1371-1430, 1978. [13] D. Slepian. Some comments on Fourier analysis, uncertainty and modeling. SIAM Review 25, pp 379-393, 1983.
SVD AND SIGNAL PROCESSING, III Algorithms, Architectures and Applications M. Moonen and B. De Moor (Editors) 9 1995 Elsevier Science B.V. All rights reserved.
349
FITTING OF CIRCLES AND ELLIPSES LEAST SQUARES SOLUTION W. GANDER, R. STREBEL
Institut fSr Wissenschaftliches Rechnen EidgenSssische Technische Hochschule CH-8092 Ziirich Switzerland [email protected], [email protected] G.H. GOLUB
Computer Science Departement Stanford University Stanford, California 94305, U.S.A. golub @sccm. stanford, edu ABSTRACT. Fitting ellipses to given points in the plane is a problem that arises in many application areas, e.g. computer graphics, coordinate metrology, petroleum engineering, statistics. In the past, algorithms have been given which fit circles and ellipses in some least squares sense without minimizing the geometric distance to the given points. In this article, we first present algorithms that compute the ellipse, for which the sum of the squares of the distances to the given points is minimal. Note that the solution of this non-linear least squares problem is generally expensive. Thus, in the second part, we give an overview of linear least squares solutions which minimize the distance in some algebraic sense. Given only a few points, we can see that the geometric solution often differs significantly from algebraic solutions. Third, we refine the algebraic method by iteratively solving weighted linear least squares. A criterion based on the singular value decomposition is shown to be essential for the quality of the approximation to the exact geometric solution. KEYWORDS. Least squares, curve fitting, singular value decomposition.
1
PRELIMINARIES
Ellipses may be represented in algebraic form F(x)=xWAx+bWx+c--0
(1)
W. Gander et al.
350
with A symmetric and positive definite. Alternatively, ellipses may be represented in parametric form x(r
= Q(~)
~in
+ ~
for r e [0,2~[,
(2)
where z is the center and a, b are the axes of the ellipse. The orthogonal matrix Q rotates the figure by some angle ~. Ellipses, for which the sum of the squares of the distances to the given points is minimal will be referred to as "best fit" or "geometric fit", and the algorithms will be called "geometric". Determining the parameters of the algebraic equation F(x) = 0 in the least squares sense will be denoted by "algebraic fit" and the algorithms will be called "algebraic". We will further look at "iterative algebraic" solutions, which determine a sequence (uk)k=0..r162 of parameter vectors by solving weighted or constrained algebraic equations F(x; uk) = 0 in the least squares sense. We define the following notation: The 2-norm ]1" ]]2 of vectors and matrices will simply be denoted by ]l" II. 2
GEOMETRIC
SOLUTION
Given points (xi)i=l...m, and representing the ellipse by some parameter vector u, we then seek to solve m
d~(u) ~ = n~n,
(3)
i--1
where dl is the geometric distance of xi from the ellipse. We may solve this problem using either the parametric or algebraic form. 2.1
PARAMETRIC FORM
Using the parametric form (2), di may be expressed
di = ~in [Ixi - x(r
u)]l,
and thus the minimization problem (3) may be written Ilxi
-
x(r
u)ll ~ -
i=1
rain
.
r ...r
This is equivalent to solving the nonlinear least squares problem g~ = x~ - x(r
u) ~ 0
for i = 1 . . . m .
(4)
Thus we have 2m nonlinear equations for m + 5 unknowns (r ...r zl, z2). This problem may be solved using the well-known Gauss-Newton method [6], which solves a
Fitting of Circles and Ellipses
351
linear least squares problem with the Jacobian of the nonlinear problem in each iteration step. We may exploit the special structure of the Jacobian for (4) by multiplication from the left with the block diagonal matrix - d i a g ( Q ( a ) ) , and then reordering the equations (gll,gl2...gml,gm2) to (gll...gml,gl2...gin2). The coefficient matrix J then looks - a sin r
- b sin r 9 9
. 9
- a sin Cm b cos r
- b sin Cm a COS r
" ' .
b cos r
cos r .
cos a .
.
.
cos r sin r
cos a - sin a
9
.
.
.
.
.
a cos r
sin a
.
sin r
- sin a
sin a cos a
"
cos a
We may efficiently compute the Q R decomposition of ] by m Givens rotations and a Q R decomposition of the last 5 columns. 2.2
ALGEBRAIC FORM
Using the algebraic form (1), the problem may be stated as follows m
Ildill 2
-
min
=
0
where
(5)
i--1
F(Xi + di; u)
for i = 1 . . . m .
(6)
The vector u contains the parameters A, b and c, and di is the distance vector of xi to the ellipse. The Orthogonal Distance Regression (odr) algorithm provides an elegant solution to this problem using Levenberg-Marquardt. See [2] for a description of the algorithm, and [1] for an implementation in FORTRAN 9 Like in (4), the number of unknowns involved is the number of model parameters plus the number of data points. Although odr is generally applicable, the algorithm has a per step computational effort similar to the method applied in the parametric form. 2.3
CONCLUSION
Several methods for the solution of (3) exist, including for instance the Newton-method applied to (4). Unfortunately, all these algorithms are prohibitively expensive compared to simple algebraic solutions. We will examine in the next sections if we get sufficiently close to the geometric solution with algebraic fits. 3
ALGEBRAIC
SOLUTION
Let us consider the algebraic representation (1) with the parameter vector u = (a11, a12, a22, bi, b2,
r149
(7)
352
W. Gander et al.
1)
To fit an ellipse, we need to compute u from given points (xi)i=l...m. We obtain a linear system of equations
Z21
2ZllX12
X22
Zll
Z12
:
:
:
:
;r,m l
Xm2
9
Bu=
2l Xm
2XmlXm 2 T'm2 2
9
u~0.
1
To obtain a non-trivial solution, we have to impose some constraint on u. For the constraint Ilull = 1, the solution vector ~ is the right singular vector associated with the smallest singular value of B. The disadvantage of the constraint Ilull = 1 is its non-invariance for Euclidian coordinate transformations. For this reason Bookstein [3] recommended using the constraint
A12 4" A22 : a121 4- 2a122 + a22 = 1,
(8)
where ~I and )~2 are the eigenvalues of A. While [3] describes a solution based on eigenvalue decomposition, we may solve the same problem more efficiently and accurately with a singular value decomposition as described in [5]. If we define vectors V
=
( b l , b 2 , c) w
W
=
(all,V/'2a12,a22)
T ,
the constraint (8) may be written
Ilwll- 1. The reordered system then looks
( :(xll...... ) x121 : ) XllXl2 : x2) ~0.
'
Xml Xm2 1 X21 V/2XmlXm2 X22
The Q R decomposition of S leads to the equivalent system
which may be solved in following steps. First solve for w R22 w ~ 0
where [[w[[ = 1
using the singular value decomposition of R22 , and then v = - R l l -1R12w. Note that the problem
where,,u,,- 1
Fitting of Circles and Ellipses
353
is equivalent to the generalized total least squares problem finding a m a t r i x $2 such that rank ( Sl $2 )
0.0 for i = 1, 2, ...,n. We have then the following solutions for equation (9). 4.1
GENERAL SOLUTIONS
The general solutions of the weights w E ~ n with II w 112~: 0 are given by r$
imp+l
where p is an index such that dp > dp+l . . . . . dn, {Pi}~=p+l are eigenvectors corresponding to the smallest eigenvalues {di}~=p+l of M, and {ai}n=p+l are arbitrary coefficients. The corresponding rationality of equation (18) is given by
-- ~/R(w)--II
M . w 0 I1=-- dp+l . . . . .
dn,
(19)
i.e., the smallest eigenvalues of M. Equation (18) gives the interpolation or exact solutions of equation (9) if the rank r of M satisfies r < n, i.e., di = 0 for i = p + 1, p + 2, ..., n. It gives the best fitting solutions if the rank r of M satisfies r = n, i.e., dn ~ 0. 4.2
SOLUTIONS FOR POSITIVE WEIGHTS
Equation (18) provides general solutions with possibly negative weights. As in practical applications, negative weights may introduce singularities and are not expected, we study practical algorithms for a set of positive weights in the eigenspace of M. We first check if the general solutions contain positive alternatives. If there exists w E IEn-p such that all the elements of w are positive, w is then a set of positive interpolation or best fitting weights, w can be computed from the following minimization algorithm, { min~ II w - w~ II~ subject to: wz _< wi _< w~,
(20)
where w = ~n=p+l ~iPi and w~, > wz > 0.0 are positive upper and lower bounds for the weights. The objective function of equation (20) guarantees a set of stable solutions. If the positive interpolation or best fitting solutions do not exist, one can still achieve a set of best fitting positive weights. The basic strategy is the following. We first look for the best subspace of ~ n that contains positive weights and a set of feasible solutions in this subspace. Starting from this feasible solution, we try to optimize the weights in this subspace in the sense of least value of R(w). The meaning of the best is twofold. On the one hand, the maximum rationality inside this subspace is the smallest compared with other subspaces containing positive weights. On the other hand, this subspace is the largest one compared with others for the same maximum rationality.
372
W. Ma and J. P. Kruth
To find the best subspace containing positive weights and a set of feasible solutions in this subspace, let q be an index such that dq > dq+l = ... = dp. Furthermore, let l = n - q and w = pn. We try to move w into ]Et f3 JR.n+. If it is successful, following the minimax theorem [5], ]E ! is then the best subspace containing positive weights and w is a feasible solution in this subspace. Otherwise, I is incremented to include a new group of eigen~,ectors corresponding to the next larger eigenvalues, i.e. to include a new eigen subspace, and the searching process is continued till the objective is satisfied. The algorithm used to move w into ]Et N R n+ is linear programming. The positive constraints used are Wl < wi < wu for rt i = 1, 2, ..., n with w = ~i=~-t+i/3ipi. When we have the best subspace ]El and a feasible solution in it, the following minimization problem can be performed to find a set of best fitting solutions in this subspace. ~y,n B2d2 R ( ~ ) = m i n ~ z..,,.._,+, ~ -,
Subject to:
E;'.._,+, ~~
(21)
w# < wi < w~,,
where w = ~ = r , - i + i 13iPi. The objective function of equation (21)is derived by introducing w into equation (13). According to the maximal theorem of R(w) [6, 10, 11], the rationality 7 of w is bounded by /x",n
(42d2
~/ z.~i~_~-l+i H'i 2 i < d~-t+x. I ~
7 =II M . w o 112= V ~ 5
SOME
EXAMPLES
OF NURBS
(22)
IDENTIFICATION
The techniques developed for the identification of NURBS curves and surfaces from discrete points using SVD and SEVD have been applied to a number of industrial applications. They are typical examples in mechanical engineering for reverse engineering where a CAD model has to be created from free-form and hand-made physical parts [10]. The physical parts are first measured by coordinate measuring machines, laser scanners or other digitizing equipments. A CAD model is then created from the measured points of the physical models. Fig. 1 illustrates the process for transforming some measured points into a single NURBS surface. Fig. 1 (a) shows the measured points for an individual surface and Fig. 1 (b) shows the created surface model in wire frame. Fig. 2 illustrates some other examples of NURBS
XXXX .-'~,, X
_ 9. ~ " .
~,
.
X
X--XX"
^
~, X
,,, "%, X
^
"X
X
^
X X
)'(
X x
X
a: some measured points for a single surface
""
b: the fitted NURBS surface and fitting errors
Figure 1: NURBS surface identification from measured points identification from measured points. In this figure, only surfaces are displayed and they are
SVD and SEVD for NURBS Identification
a: a closed NURBS surface
b: an open NURBS surface (3 duplicates)
c: a rotational NURBS surface
d: some NURBS surfaces for a car wheel
373
Figure 2: Some other examples for NURBS identification shown in shaded images. Fig. 2 (a)-(c) show respectively a general closed NURBS surface, a general open NURBS surface in three duplicates, and a rotational NURBS surface. Fig. 2 (d) shows a partial CAD model of a car wheel defined by NURBS surfaces.
6
CONCLUSIONS
This paper presents SVD and SEVD applications in computer-aided geometric design (CAGD) for the identification of NURBS curves and surfaces from discrete points. Both general solutions and positive solutions are studied. Some practical algorithms and industrial examples are presented.
Acknowledgements This research is sponsored in part by the European Union through a Brite-Euram project under contract number BREU-CT91-0542, and by the Katholieke Universiteit Leuven through a doctoral scholarship.
374
W. Ma and J. P. Kruth
References
[1] C. de Boor. On calculating with B-splines. Journal of Approximation Theory, 6, pp 50-62, 1972. [2] M. G. Cox. The Numerical Evaluation of B-Splines. Journal of the Institute of Mathematics and its Applications, 10, pp 134-149, 1972. [3] G. Farin. From Conics to NURBS. IEEE Computer Graphics 8J Applications, 12, pp 78-86, September 1992. [4] G. Farin. Curves and Surfaces for Computer-Aided Geometric Design, A Practical Guide, Second Edition. Academic Press Inc., 1990. [5] G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore and London, 1989. [6] E. X. Jiang, K. M. Gao and J. K. Wu. Linear Algebra. People's Educational Publishers, Shanghai, 1978. [7] J. P. Kruth and W. Ma. CAD Modelling of Sculptured Surfaces From Digitized Data of Coordinate Measuring Machines. Proc. of the 4th Int. Syrup. on Dimensional Metrology in Production and Quality Control. pp 371-387, Tampere, Finland, June 22-25, 1992. [8] S. Lang. Linear Algebra. Addison-Wesley Publishing Company, Reading, Massachusetts, 1977. [9] E. T. Y. Lee. Choosing Nodes in Parametric Curve Interpolation. Computer-Aided Design. Vol. 21, Nr. 6, pp 363-370, August 1989. [10] W. Ma. NURBS-Based CAD Modelling from Measured Points of Physical Models. Ph.D. Dissertation, Katholieke Universiteit Leuven, Belgium, 1994. [11] W. Ma and J. P. Kruth. NURBS Curve and Surface Fitting and Interpolation. To appear in: M. Daehlen, T. Lyche and L. L. Schumaker (eds), Mathematical Methods in Computer Aided Geometric Design. Academic Press, Ltd., Boston, 1995. [12] W. Ma and J. P. Kruth. Mathematical Modelling of Free-Form Curves and Surfaces from Discrete Points with NURBS. In: P. J. Laurent, A. Le M4haut4 and L. L. Schumaker, (eds), Curves and Surfaces in Geometric Design. A. K. Peters, Ltd., Wellesley, Mass., 1994. [13] W. Ma and J. P. Kruth. Parametrization of Randomly Measured Points for the Least Squares Fitting of B-spline Curves and Surfaces. Accepted for publication in ComputerAided Design, 1994. [14] L. Piegl. On NURBS: A Survey. IEEE Computer Graphics ~ Applications. 11, pp 55-71, January 1991. [15] J. H. Wilkinson. The Algebraic Eigenvalue Problem. Oxford University Press, 1965.
SVD AND SIGNAL PROCESSING, III Algorithms, Architectures and Applications M. Moonen and B. De Moor (Editors) 1995 Elsevier Science B.V.
A TETRADIC APPLICATION
DECOMPOSITION TO THE
SOURCE
375
OF 4 T H - O R D E R SEPARATION
TENSORS PROBLEM
:
J.-F. CARDOSO
Tdldcom Paris / URA 820 / Gdr TdSI Tdldcom Paris, 46 rue Barrault 7563~ Paris, France. cardoso@sig, enst. fr
A B S T R A C T . Two results are presented on a SVD-like decomposition of 4th-order tensors. This is motivated by an array processing problem: consider an array of m sensors listening at n independent narrow band sources; the 4th-order cumulants of the array output form a 4th-order rank-deficient symmetric tensor which has a tetradic structure. Finding a tetradic decomposition of this tensor is equivalent to identify the spatial transfert function of the system which is a matrix whose knowledge allows to recover the source signals. We first show that when a 4th-order tensor is a sum of independant tetrads, this tetradic structure is essentially unique. This is to be contrasted with the second order case, where it is weel known that.dyadic decompositions are not unique unless some constraints are put on the dyads (like orthogonality, for instance). Hence this first result is equivalent to an identifiability property. Our second result is that (under a 'locality' condition), symmetric and rank-n 4th-order tensors necessarily are a sum of n tetrads. This result is needed because the sample cumulant tensor being only an approximation to the true cumulant tensor, is not exactly a sum of tetrads. Our result implies that the sample cumulants can be 'enhanced' to the closest tetradic cumulants by alternatively forcing their rank-deficiency and symmetry. A simple algorithm is described to this effect. Its output is an enhanced statistic, from which blind identification is obtained deterministically. This leads to a source separation algorithm based only on the 4th-order cumulants, which is equivalent to robust statistic matching without the need for an explicit optimization procedure.
KEYWORDS.Tetradic decomposition, super-symmetric tensors, alternate projections (POCS), signal enhancement, cumulants, high order statistics, source separation.
376
1 1.1
J. -F. C a r d o s o
FOURTH-ORDER
TENSORS
AND THE TETRADIC
STRUCTURE
D E F I N I T I O N S AND NOTATIONS
In this paper, we consider tensors on an m-dimensional complex vector space ~'. For the sake of brievity, we abusively call 'matrices' the once covariant once contravariant tensors and simply 'tensors' the twice covariant twice contravariant tensors. Unless explicitly stated, all vectors, matrices, tensors are related to the vector space ~'. We use the following convention: in some orthonormal basis, a generic vector v has components vi for 1 _< i _< m, the ( i , j ) - t h entry of a generic matrix R has components r iJ for 1 < _ i, j _< m. Similarly a generic tensor jt for 1 < i , j , k , l m where it is only required that the dyads constructed from the columns of A are linealry independent matrices [2]. In the following, we call 'n-tetradic tensors' these tensors which are a sum of n tetrads constructed from linearly independent vectors. For later use, we note that, if a tensor Q has the tetradic decomposition (6), then, a direct substitution of (6) in (1) yields the image by Q of any matrix M as
Q(M) = AAMA H
with AM = D i a g ( d l , . . . , d n )
dp def ~
a~ ap ml.
(7)
kl
In particular, if M is hermitian, the diagonal terms of hermitian. 2
UNICITY
AND COMPUTATION
A M are
real and Q ( M ) then also is
OF A TETRADIC
DECOMPOSITION
At this stage, we may state a major difference between the 2nd- and 4th-order cases. While the decomposition of a 2nd-order tensor in the form (5) is not unique (as noted above), the decomposition of a 4th-order tensor having the form (6) is, under mild conditions, essentially unique. By 'essentially unique', we mean that the columns of A are detremined up to a permutation and that each column of A is determined up to complex scalar factor. Note that this is just the same degree of indetermination observed in the case of the eigendecomposition of a normal matrix with distinct eigenvalues. In this case, the eigenvectors are not ordered (unless the eigenvalues are sorted according to some additional convention) and they are determined up to a phase term if they are normed to unity. We have the following property. P r o p o s i t i o n 1 The tetradic structure (6) is essentially unique if matrix A has full column rank and kp ~ 0 for 1 [
I
1
I
Figure 3: Unfolding of the (I • J • K)-tensor A to an (I • JK)-matrix
A._.A(IxJK).
The core tensor is obtained by bringing the matrices in Eq. (3) to the other side: = A xl U t x2 V t x3W t
(7)
The way of cMculation and the ordering constraint on the core tensor show that the HO SVD obeys analog unicity properties as its matrix equivalent: in the generic case, the singular vectors are determined up to the sign. When the sign of a singular vector is changed, the sign of the corresponding submatrix in ~ alters too.
3
3.1
APPLICATION
TO INDEPENDENT
COMPONENT
ANALYSIS
OUTLINEOF THE PROCEDURE
We consider the noise-free version of Eq. (i): Y = MX
(8)
The separation problem will be solved by factorisation of the transfer matrix: M = TQ
(9)
in which T is regular and Q is orthogonal. In a first step T will be determined from the second-order statistics of the output Y. The resulting degree of freedom, the orthogonal factor Q, is recovered from the higher-order statistics of Y.
388 3.2
L. De Lathauwer et al.
STEP OFY
i: DETERMINATION
OF
T FROM
THE
SECOND-ORDER
STATISTICS
The covariance C Y of Y is given by C2y = M C X M t
(10)
in which the covariance C x of X is diagonal, since we claim that the source signals are uncorrelated. Assuming that the source signals have unity variance, we get: C2Y = MMt
(I 1)
This assumption means just a scaling of the columns of M and is not detrimental to the method's generality: it is clear that M can at most be determined up to a scaling and a permutation of its columns. We can conclude from Eq. (11) that M can be determined, up to an orthogonal factor Q, from a congruence transformation of C Y" C2Y = M M t = ( T Q ) ( T Q ) t = T T t
(12)
This equation shows that only the column space of M can be identified when just secondorder statistical information on Y is used and no extra constraints are added. In order to solve the initial problem one has to resort to the higher-order statistics of Y. When one sticks to a mere second-order approach, it is common to make the solution essentially unique by selecting a matrix with orthonormal columns - in which the extra constraint generally has no physical meaning. This corresponds to the well-known concept of PCA [10]. In the framework of ICA, the PCA-procedure can be considered as one alternative to perform the pre-whitening. Algebraically, this is realized by computing the EVD of C Y" C~ = E D 2 E t = ( E D ) ( E D ) t
(13)
When the output covariance is estimated as $C_2^Y = A_Y A_Y^t$, in which $A_Y$ is an (I × N)-dimensional dataset containing N realizations of Y, then the factor (ED) can be obtained in a numerically more reliable way from the SVD of $A_Y$ [9].
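As a small illustration of this step, the sketch below (our own example, not part of the original paper; variable names are hypothetical and real-valued data is assumed) obtains the whitening factor (ED) from the SVD of the data matrix rather than from the EVD of the sample covariance:

```python
import numpy as np

def whitening_factor(A_Y):
    """Whitening factor F = E*D for the I x N data matrix A_Y, so that
    F @ F.T equals the sample covariance A_Y A_Y^t / N."""
    N = A_Y.shape[1]
    U, s, Vt = np.linalg.svd(A_Y, full_matrices=False)   # A_Y = U diag(s) Vt
    return U * (s / np.sqrt(N))                           # E = U, D = diag(s)/sqrt(N)

# quick check on a toy 2 x 2 mixture of unit-variance sources
rng = np.random.default_rng(0)
M = np.array([[1.0, 0.5], [0.2, 1.0]])
A_Y = M @ rng.standard_normal((2, 10000))
F = whitening_factor(A_Y)                                 # plays the role of T = ED
W = np.linalg.inv(F)
print(np.round(W @ (A_Y @ A_Y.T / 10000) @ W.T, 2))       # approximately the identity
```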
3.3 STEP 2: DETERMINATION OF Q FROM THE HIGHER-ORDER STATISTICS OF Y
The third-order cumulant $C_3^Y$ of Y, defined by the element-wise expectation

$c_{ijk}^Y = E\{Y_i Y_j Y_k\}$
(14)
is related to the third-order cumulant $C_3^X$ of the source vector X in the following way:

$C_3^Y = C_3^X \times_1 M \times_2 M \times_3 M$
(15)
as can easily be verified by combining Eqs. (14) and (8). In Eq. (15) $C_3^X$ is diagonal, since we assume that the source signals are also higher-order independent ([13], [14]). Substitution
of Eq. (9) in Eq. (15) yields

$\mathcal{B} = C_3^X \times_1 Q \times_2 Q \times_3 Q$
(16)
in which the tensor $\mathcal{B}$ is defined as:

$\mathcal{B} \stackrel{\mathrm{def}}{=} C_3^Y \times_1 T^{-1} \times_2 T^{-1} \times_3 T^{-1}$
(17)
Hence, due to the unicity property discussed above, Q can be obtained from the HOSVD of $\mathcal{B}$. (Eq. (16) is, for the tensor $\mathcal{B}$, the third-order equivalent of the eigenvalue decomposition of a symmetric matrix.)
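To make the two-step procedure concrete, here is a compact numerical sketch (our own illustration, not the authors' implementation; the source distributions and variable names are assumptions, and only the mode-1 factor of the HOSVD is computed, which suffices for a square mixture):

```python
import numpy as np

rng = np.random.default_rng(1)
K, N = 2, 200000
# zero-mean, unit-variance sources with distinct, nonzero third-order cumulants
X = np.vstack([rng.gamma(1.0, size=N) - 1.0,
               (rng.gamma(4.0, size=N) - 4.0) / 2.0])
M = np.array([[1.0, 0.6], [-0.4, 1.0]])
Y = M @ X

# step 1: pre-whitening, T = E*D from the second-order statistics of Y
E, d, _ = np.linalg.svd(Y / np.sqrt(N), full_matrices=False)
T = E * d
Z = np.linalg.solve(T, Y)                      # Z = T^{-1} Y = Q X, Q (nearly) orthogonal

# step 2: third-order cumulant tensor of the whitened data (zero mean => moments)
B = np.einsum('in,jn,kn->ijk', Z, Z, Z) / N

# mode-1 singular vectors of the unfolded tensor give Q up to sign/permutation
Q_est, _, _ = np.linalg.svd(B.reshape(K, K * K), full_matrices=False)
M_est = T @ Q_est                              # Eq. (9)
print(np.round(np.linalg.solve(M_est, M), 2))  # approximately a signed permutation
```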
The transfer matrix M is finally given by Eq. (9).

3.4 DISCUSSION
We want to stress the conceptual importance of the new approach. It reveals an important symmetry when comparing the problems of PCA and ICA. In "classical" second-order statistics, the problem of interest is to remove the correlation from data measured after linear transfer of independent source signals. The key tool to realize this comes from "classical" linear algebra: it is the matrix SVD. More recently, researchers also aimed at the removal of higher-order dependence, which is a problem of Higher-Order Statistics. We proved that one can resort to a tool from multilinear algebra, which is precisely the generalization of the SVD for higher-order tensors.

4 CONCLUSION
We generalized the Singular Value Decomposition of matrices to the higher-order case. It was shown that this decomposition provides a new conceptual approach to solve the blind source separation problem in Higher-Order Statistics. References
[1] J.-F. Cardoso, P. Comon. Tensor-based independent component analysis. Signal Processing V : Theories and Applications, pp 673-676, 1990. [2] J.-F. Cardoso. A tetradic decomposition of 4th-order tensors. Application to the source separation problem. In : B. De Moor, M. Moonen (editors). SVD and signal processing, III : algorithms, applications and architectures. Elsevier Science Publishers, North Holland, Amsterdam, 1995. [3] J.-F. Cardoso, A. Souloumiac. An efficient technique for blind separation of complex sources. Proc. IEEE SP workshop on higher-order statistics, Lake Tahoe, U.S.A. pp 275-279, 1993. [4] P. Comon. Independent component analysis, A New Concept? Signal Processing, special issue on higher-order statistics. 36 (3), pp 287-314, 1994.
[5]
L. De Lathauwer, B. De Moor, J. Vandewalle. A singular value decomposition for higher-order tensors. Proc. ATHOS workshop on system identification and high-order statistics, Nice, France, September 1993.
[6]
L. De Lathauwer, B. De Moor, J. Vandewalle. The higher-order singular value decomposition. To be submitted to: SIAM Journal on Matrix Analysis and Applications.
[7]
B. De Moor, M. Moonen (editors). SVD and signal processing, III: algorithms, applications and architectures. Elsevier Science Publishers, North Holland, Amsterdam, 1995.
[8]
E.F. Deprettere (editor). SVD and signal processing : algorithms, applications and architectures. Elsevier Science Publishers, North Holland, Amsterdam, 1988.
[9]
G.H. Golub, C.F. Van Loan. Matrix computations. North Oxford Academic Publishing Co., Johns Hopkins Press, 1988.
[10]
I.T. Jolliffe. Principal component analysis. Springer, New York, 1986.
[11] D.C. Kay. Theory and problems of tensor calculus. McGraw-Hill, 1988. [12] M. Marcus. Finite dimensional multilinear algebra. Dekker, New York, 1975. [13] J.M. Mendel. Tutorial on higher-order statistics (spectra) in signal processing and system theory: theoretical results and some applications. Proceedings of the IEEE. 79 (3), pp 278-305, 1991.
[14]
C.L. Nikias, J.M. Mendel. Signal processing with higher-order spectra. IEEE Signal Processing Magazine. July 1993, pp 10-37.
[15]
M. Schmutz. Optimal and suboptimal separable expansions for 3D-signal processing. Pattern Recognition Letters. 8, pp 217-220, 1988.
[16]
L.R. Tucker. The extension of factor analysis to three-dimensional matrices. In : H. Gulliksen, N. Frederiksen (editors). Contributions to mathematical psychology. Holt, Rinehart & Winston, pp 109-127, 1964.
[17]
R. Vaccaro (editor). SVD and signal processing, II : algorithms, analysis and applications. Elsevier Science Publishers, North Holland, Amsterdam, 1991.
[18]
J. Vandewalle, B. De Moor. On the use of the singular value decomposition in identification and signal processing. In : G. Golub, P. Van Dooren (editors). Numerical linear algebra, digital signal processing and parallel algorithms. NATO ASI Series, F70, pp 321-360, 1991.
SVD AND SIGNAL PROCESSING, III Algorithms, Architectures and Applications M. Moonen and B. De Moor (Editors) © 1995 Elsevier Science B.V. All rights reserved.
BANDPASS FILTERING FOR THE HTLS ESTIMATION ALGORITHM: DESIGN, EVALUATION AND SVD ANALYSIS

H. CHEN, S. VAN HUFFEL, J. VANDEWALLE
Katholieke Universiteit Leuven
Kard. Mercierlaan 94, 3001 Leuven, Belgium
[email protected]
ABSTRACT. The research on the parameter estimation of a sum of K exponentially damped sinusoids has led to the development of many estimation algorithms. In some applications, however, it is desired to prefilter the input data in order to reduce the noise and enhance the parameters of interest. In this paper, we present a prefiltering technique in which a filter matrix is multiplied with the original data matrix prior to applying a subspace and SVD-based method. Two filter matrices are proposed, respectively, for the FIR and IIR prefiltering. A theoretical analysis on a special case of two exponentially damped sinusoids is given, which reveals the relationship between the singular values/vectors of the prefiltered and original data matrices. KEYWORDS. Exponentially damped sinusoids, parameter estimation, frequency-selective, bandpass filter, SVD, total least squares, state space.
1 INTRODUCTION
The estimation of the parameters of a sum of K exponentially damped sinusoids from noisy data has led to the development of many algorithms, such as the linear prediction (LP) method [1], the matrix pencil (MP) method [2] and Kung et al.'s method and its variant (HSVD and HTLS) [3, 4, 5]. The essence of these subspace and SVD-based methods lies in the SVD truncation, sometimes called SVD filtering since it filters out part of the noise by truncating the SVD of a Hankel/Toeplitz data matrix to rank K and discarding the non-significant singular values and vectors. However, it is not always satisfactory especially at low signal-to-noise ratios (SNR). Moreover, in some applications only a few (less than K) sinusoids are of interest. To further combat noise and/or enhance sinusoids of interest,
it is frequently desired to prefilter the input data in order to improve the resolution and accuracy of the estimated parameters of interest. It should be noted that the SVD filtering of the subspace-oriented estimation methods cited above is based on the rank-K property of a Hankel/Toeplitz data matrix in the absence of noise. As such, a special prefiltering technique is required since a Hankel/Toeplitz matrix constructed from the filtered data, instead of the original noiseless data, does not possess the rank-K property anymore [6]. In the following sections, a prefiltering technique is first described in which a full-rank filter matrix is multiplied with the Hankel/Toeplitz data matrix. The prefiltered data matrix, or the product obtained thereof, keeps the rank-K property and can readily be processed by an SVD-based estimation algorithm, e.g. HTLS in this paper. Next, we present two filter matrices that implement the finite impulse response (FIR) and infinite impulse response (IIR) filters, respectively. Subsequently, a theoretical analysis of a special case of two exponentially damped sinusoids is given revealing the relationship between the singular values/vectors of the prefiltered and original data matrices.
2 PREFILTERED HTLS
The estimation algorithm used here is HTLS, which assumes that the data samples $x_n$ are modeled as follows:

$x_n = \sum_{k=1}^{K} c_k \exp[-d_k n\Delta t + j(2\pi f_k n\Delta t)] = \sum_{k=1}^{K} c_k z_k^n, \qquad n = 0, 1, \ldots, N-1$
(1)
where $c_k$ is the complex-valued linear parameter, $d_k$ (damping factor) and $f_k$ (frequency) are the nonlinear parameters of the kth peak, $j = \sqrt{-1}$, $z_k = \exp[-d_k\Delta t + j 2\pi f_k\Delta t]$ is the pole of the signal and $\Delta t$ is the constant sample interval. Obviously, $f_s = 1/\Delta t$ is the sampling frequency. HTLS is a subspace-based method, which first arranges the data points in a Hankel matrix $X_{L\times M}$ as follows
$X = \begin{bmatrix} x_0 & x_1 & \cdots & x_{M-1} \\ x_1 & x_2 & \cdots & x_M \\ \vdots & \vdots & & \vdots \\ x_{L-1} & x_L & \cdots & x_{N-1} \end{bmatrix}$
(2)
where L > K, M > K, N = L + M - 1, and then performs an SVD filtering
$X = U_K \Sigma_K V_K^H$
(3)
where $U_K$ and $V_K$ are the first K left and right singular vectors, and $\Sigma_K$ is the diagonal matrix composed of the first K singular values. Finally $z_k$ is retrieved using the rotational invariance property of $U_K$ or $V_K$ and the TLS technique

$U_K^{\uparrow} = U_K^{\downarrow} Z_1, \qquad V_K^{\uparrow} = V_K^{\downarrow} Z_2^H$
(4)
where the up (down) arrow stands for deleting the top (bottom) row, the superscript H denotes Hermitian conjugate, and the eigenvalues of both Z1 and Z2 are the signal poles Zk. This method gives better accuracy compared to Kung's method as shown in [5]. Once
the poles and hence the nonlinear parameters are determined, the linear parameters can be found by the least squares method. By prefiltered HTLS we mean that a full-rank filter matrix H is multiplied with the Hankel data matrix before applying the estimation algorithm HTLS. As formulated below, a right (resp., left) multiplication by a full-rank matrix $H_r$ (resp., $H_l$) of appropriate dimensions retains the rank-K and rotational-invariance properties of $U_K$ (resp., $V_K$) in the absence of noise:

$\tilde{X} = X H_r = \tilde{U}_K \tilde{\Sigma}_K \tilde{V}_K^H = \sum_{k=1}^{K} \tilde{\sigma}_k \tilde{u}_k \tilde{v}_k^H, \qquad \tilde{U}_K^{\uparrow} = \tilde{U}_K^{\downarrow} Z_1$

resp.,

$\tilde{X} = H_l X = \tilde{U}_K \tilde{\Sigma}_K \tilde{V}_K^H = \sum_{k=1}^{K} \tilde{\sigma}_k \tilde{u}_k \tilde{v}_k^H, \qquad \tilde{V}_K^{\uparrow} = \tilde{V}_K^{\downarrow} Z_2^H$

The eigenvalues of $Z_1$ (resp., $Z_2$) are the poles of the signal. The tilde denotes the prefiltered counterpart throughout the paper. Since a right (resp., left) multiplication by a full-rank square matrix does not change the column (resp., row) space of X, there exists a unitary matrix $\Omega$ such that $\tilde{U} = U\Omega$ (resp., $\tilde{V} = V\Omega$). Furthermore, it can be proven that $\Omega = \begin{bmatrix} \Omega_1 & 0 \\ 0 & \Omega_2 \end{bmatrix}$ with $\Omega_1 \in \mathbb{C}^{K\times K}$.
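For illustration, a minimal numerical sketch of the basic (unfiltered) HTLS step (our own example; ordinary least squares is used in place of TLS for brevity, and all variable names are hypothetical):

```python
import numpy as np

K, N, dt = 2, 128, 0.001
c, d, f = np.array([1.0, 0.7]), np.array([20.0, 50.0]), np.array([100.0, 130.0])
z_true = np.exp((-d + 2j * np.pi * f) * dt)               # signal poles z_k
n = np.arange(N)
x = (c[None, :] * z_true[None, :] ** n[:, None]).sum(axis=1)

L = N // 2                                                 # L > K and M = N - L + 1 > K
X = np.array([x[i:i + N - L + 1] for i in range(L)])       # L x M Hankel matrix, Eq. (2)

U, s, Vh = np.linalg.svd(X, full_matrices=False)
UK = U[:, :K]                                              # rank-K SVD truncation, Eq. (3)
# shift invariance: UK with top row deleted = (UK with bottom row deleted) @ Z1
Z1, *_ = np.linalg.lstsq(UK[:-1, :], UK[1:, :], rcond=None)
print(np.sort_complex(np.linalg.eigvals(Z1)))              # estimated poles
print(np.sort_complex(z_true))
```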
3 FIR AND IIR FILTER MATRICES
For a linear filtering process, a filter matrix H can be found such that $y = Hx \in \mathbb{C}^{p\times 1}$ is the filtered data vector if $x \in \mathbb{C}^{p\times 1}$ is the original data vector. A full-rank Toeplitz filter matrix $H_F$, shown below, is set up from the impulse response of an FIR filter $h_1, h_2, \ldots, h_q$ to implement the linear convolution with zero-padding:

$H_F = \begin{bmatrix} h_r & \cdots & h_1 & 0 & \cdots & & 0 \\ \vdots & \ddots & & \ddots & \ddots & & \vdots \\ h_q & & \ddots & & \ddots & \ddots & \\ 0 & \ddots & & \ddots & & \ddots & 0 \\ \vdots & & \ddots & & \ddots & & h_1 \\ & & & \ddots & & \ddots & \vdots \\ 0 & \cdots & & 0 & h_q & \cdots & h_r \end{bmatrix} \in \mathbb{C}^{p\times p}, \qquad \text{where } r = \left\lfloor \frac{q+1}{2} \right\rfloor$
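A small sketch of such a convolution matrix (our own construction; the exact row offset used for delay compensation is an assumption, as discussed above):

```python
import numpy as np

def fir_filter_matrix(h, p):
    """p x p banded Toeplitz matrix applying zero-padded convolution with the
    FIR impulse response h, centred by r = floor((q+1)/2) taps."""
    q = len(h)
    r = (q + 1) // 2
    H = np.zeros((p, p))
    for i in range(p):
        for k in range(q):
            j = i + (r - 1) - k            # column holding coefficient h[k] in row i
            if 0 <= j < p:
                H[i, j] = h[k]
    return H

print(fir_filter_matrix(np.array([0.25, 0.5, 0.25]), 6))   # simple lowpass example
```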
Given the backward recursion of an rth order IIR filter

$a_r y_n = b_r x_n + b_{r-1} x_{n+1} + \cdots + b_0 x_{n+r} - a_{r-1} y_{n+1} - \cdots - a_1 y_{n+r-1} - a_0 y_{n+r}, \qquad a_r \neq 0$

with the initial conditions $x_{p+1} = x_{p+2} = \cdots = x_{p+r} = 0$, $y_{p+1} = y_{p+2} = \cdots = y_{p+r} = 0$, we can construct a full-rank upper-triangular filter matrix $H_I$

$H_I = \begin{bmatrix} g_1 & g_2 & \cdots & g_p \\ 0 & g_1 & \cdots & g_{p-1} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & g_1 \end{bmatrix} \in \mathbb{C}^{p\times p}$
(5)
with the elements

$g_1 = b_r/a_r,$
$g_i = (b_{r-i+1} - a_{r-1}g_{i-1} - a_{r-2}g_{i-2} - \cdots - a_{r-i+1}g_1)/a_r, \qquad 2 \leq i \leq r+1,$
$g_i = (-a_{r-1}g_{i-1} - a_{r-2}g_{i-2} - \cdots - a_0 g_{i-r})/a_r, \qquad i > r+1.$
(17)
then $\Phi_p = A^{(N+p,p)}$ is the state transition matrix over one period for phase p, $\mathcal{R}_p$ is the phase-p reachability matrix, $\mathcal{R}_p = [A^{(p+N,p+1)}B_{[p]}, \cdots, B_{[p+N-1]}]$,
(18)
$\mathcal{O}_p$ the phase-p observability matrix,

$\mathcal{O}_p = \begin{bmatrix} C_{[p]} \\ C_{[p+1]} A^{(p+1,p)} \\ \vdots \\ C_{[p+N-1]} A^{(p+N-1,p)} \end{bmatrix}$
(19)

and $\mathcal{M}_p$ a lower triangular matrix of Markov parameter blocks: for $i, j = 1, 2, \ldots, N-1$,

$[\mathcal{M}_p]_{ij} = \begin{cases} 0 & i < j \\ D_{[p+i]} & i = j \\ C_{[p+i]} A^{(p+i,\,p+j+1)} B_{[p+j]} & i > j \end{cases}$
(20)
Any LTI system realization or identification scheme may be used to find the monodromy systems (16), i.e., the Op, TCr, O r and M r matrices. However this misses the point, as then products of the matrices Ai, Bi, and Ci (i = 0 , . . . , N - 1) are identified. A difficult factorization problem would still have to be solved. This problem can be circumvented by exploiting the structure of the subspace algorithm. As described in [2,3] it generates a state trajectory realization first and then the coefficients are found to fit the state with the input and output . Since there is no need for the monodromy system other than generation of the state, we go directly to solving for the time-periodic coefficients.
4 EXTENSION OF THE SUBSPACE ALGORITHM
Let us assume that the true system is represented by the state space description in equation (1) with period N. With the sequence aP{y}, define a vector of T = kN sequences
Y~ = [{y}~, { y h + ~ , . . . , {Y}~+kN-~]'.
(21)
and define hop similarly. The induced input-output relation is
y~ = V~x~ + ~ u ~
(22)
for each p = t (mod N). The matrices ~p and ~ ' r are respectively the observability and Toeplitz matrix of the Markov parameters of the lifted system (16). To shift to a finite segment of real input-output data, the truncated (after l-th entry) sequences {u}t and {Y}t are replaced by the m x 1, respectively p x l matrices u[t] and y[t]. Similarly/gt and Yt are replaced by the data matrices in [2] and [3]. Denote these respectively by L/It] and y[t]. Finally, x[t] replaces {x}t. Only y[t] and/A[t] are used in the state realization scheme to generate x[t]. In fact, neither O"'p, M'p nor the system of equations in (3) are ever explicitly generated. They are, however, implicitly used in the state realization technique.
4.1
SINGLE PHASE STATE RECONSTRUCTION
Consider (22), and let us concentrate on one phase of the system. As there is no risk of confusion, we shall suppress the explicit dependency on the phase designator p. We shall assume that full row rank is enjoyed by/4, x (persistent excitation), and ~ (observability).
E.L Verriest and J.A. Kullstam
428
Let H denote the concatenation of/4 and y
As shown in [2,3], concatenation of the outputs contributes n new dimensions to the space spanned by the inputs i.e. rank (H) = rank(/4)+ rank(x). These n new dimensions are attributed to the effects of the state and are used to infer a state realization. Now let U[t]
/41
H2 =
=
+ kN] The state realization is generated from the relation / / 1 -~
y[t]
=
Yl
'
y[t
Y2
.
(24)
rowspan(x2) = rowspan(H1) N rowspan(H2).
(25)
Therefore, the main computational task of the subspace algorithm in generating a state trajectory is to find this rowspace intersection, for which we refer to [2] and [3].

4.2 IMPLEMENTATION
We now discuss some aspects of a practical nature relating to the implementation of the algorithm. We start with the data record of the input-output data d(t) =
u(t) u(t)
, for t=
0 , 1 , . . . , M < cr We proceed as follows. Let d[t] = [d(t), d(t + N ) , . . . , d(t + iN)]. Use the notation
d~[p] = d~[p] =
dip] dip + kN]
(26) (2T)
For each phase p form the data matrices/tl and H2 by interleaving the input and output vectors, i.e. dl[p] d2[p]
~[p] =
d~[p.+ 1]
,
H~[p] =
d2[p.+ 1]
.
(2S)
dl[p + k N - 1] d2[p + k N - 1] This is a rearrangement of the rows from our previous definition of Hx and H2. Clearly, all of the arguments made with the previous definition still hold 9 Next, find x2[p] by the method described in [2] and [3], i.e.,
(29) Note that d[p + N] x2[p + N] = Ur
](
)
9
d[p+(k+~lN-
o
(30)
11
this trick will become important in the reconstruction step and motivates the interleaving 9 While we will not go into any details, we state that H should contain many more columns than rows. It needs to contain enough information about the system to reconstruct a state 9
Realization of Discrete-time Periodic Systems
429
The periodicity will shrink the width by a factor of N, so a moderately long sequence of data is required in order for the method to work properly (the required amount of data grows by a factor of N² since we must ensure that H will be at least square). At this point we have generated a state trajectory realization for each phase from the input-output data. The states, inputs and outputs are still not interleaved but, as it will turn out, this is the natural representation for finding the system matrices.

4.3 RECONSTRUCTION OF THE SYSTEM EQUATIONS
We arrive at the last step: the creation of a linear time periodic state space model. We seek the matrix coefficients Ap, Bp, Cp and Dp. At our disposal are the input, output and state trajectories. The u2[p] and Y2[P] are, of course, given at the outset, x2[p], for p = 1 , 2 , . . . , N - 1 are defined by formula (29). Also, x[N] comes from equation (30) with
p=O. For each p = 0 , 1 , . . . , N - 1 solve the over-determined system of equations Y2[P]
=
C,
D,
u2[p]
in the total least squares sense [1]. Since known quantities on the right-hand side are just as reliable as those on the left, the use of standard least squares may not be well motivated.
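A minimal sketch of this per-phase fit (our own illustration with synthetic data; the classical SVD-based TLS formula is used, see [1]):

```python
import numpy as np

def tls(A, B):
    """Total least squares solution X of A X ~= B via the SVD of [A B]."""
    na = A.shape[1]
    V = np.linalg.svd(np.hstack([A, B]), full_matrices=True)[2].T
    V12, V22 = V[:na, na:], V[na:, na:]
    return -V12 @ np.linalg.inv(V22)

def phase_matrices(x_next, x, u, y):
    """[A_p B_p; C_p D_p] from snapshot matrices of one phase."""
    Theta = tls(np.vstack([x, u]).T, np.vstack([x_next, y]).T).T
    n = x.shape[0]
    return Theta[:n, :n], Theta[:n, n:], Theta[n:, :n], Theta[n:, n:]

# noise-free check: the matrices of one phase are recovered exactly
rng = np.random.default_rng(0)
n, m, p, j = 3, 1, 2, 200
A, B = 0.5 * rng.standard_normal((n, n)), rng.standard_normal((n, m))
C, D = rng.standard_normal((p, n)), rng.standard_normal((p, m))
x, u = rng.standard_normal((n, j)), rng.standard_normal((m, j))
x_next, y = A @ x + B @ u, C @ x + D @ u
A_est, B_est, C_est, D_est = phase_matrices(x_next, x, u, y)
print(np.allclose(A_est, A), np.allclose(C_est, C))
```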
5 SIMULATION RESULTS
We checked if the noise insensitivity of the subspace algorithm in the time-invariant case is also enjoyed by the periodic extension. To explore sensitivity, we used a 4-th order linear system with period 3, which was such that its response to a unit step at every phase was a unit step. While this is artificial, it makes interpretation of the results easier. Unit variance white Gaussian plant and measurement noise were added to obscure the true signal. To ensure excitation of all modes, a Bernoulli( 89 sequence of { - 1 , + 1} was chosen as an input. Based on 800 data points (M), a 4th order model was then identified using the periodic extension to the sub-space model presented in this paper. The figures compare the step response of the exact and the identified model. As can be seen from figure 1, the noise level is substantial. The model identified presents a reasonable approximation of the dynamics as seen in figure 2. As noise is reduced, the identified model quickly nears the true model. 6
CONCLUSION
We have described a simple strategy for state-space identification of periodic systems. The data record must contain several full cycles in order to exploit the periodic structure. Therefore, only short to moderate period lengths are feasible. The main development is the restructuring of the problem using the monodromy system formulations. Empirical tests show that even when moderate amounts of noise are present the algorithm
430
E.L Verriest and J.A. Kullstam
produces a good approximation. The subspace method is much easier to implement than a polynomial matrix fraction to state-space realization technique. Therefore, the subspace algorithm can be used to obtain a reduced order model or it can be used as an identification scheme on data generated by real world systems. Note that the state trajectory realization defines N individual monodromy matrices. While the periodicity of the underlying system forces a correlation on these monodromy matrices (e.g. they all have the same eigenvalues), this information is disregarded in the creation of a state. It is, however, implicit in the construction of the LTP system matrices. In the case of high noise, especially when inputs are highly correlated with past outputs e.g. feedback situations, the subspace algorithm may not perform well. Also, in the high noise case a large data record will be needed to extract accurate information. The subspace algorithm is numerically expensive. As an alternative the stochastic identification method described in [4] can be used. From correlation estimates of past and future input-output data a state trajectory estimate can be derived by:canonical correlation analysis. This state estimate can then be used in the second part of the algorithm to determine the state-space equation parameters. References
[1] S. Van Huffel, J. Vandewalle. The total least squares problem: computation aspects and analysis. SIAM 1991. [2] B. De Moor, M. Moonen, L. Vandenberghe, and J. Vandewalle. The Application of the Canonical Correlation Concept to the Identification of Linear State Space Models. Analysis and Optimization of Systems, A. Bensoussan, J. Lions (Eds.), Springer-Verlag, Lecture Notes in Control and Information Sciences, Vol. 111, 1988. [3] M. Moonen, B. De Moor, L. Vandenberghe, and J. Vandewalle. On- and Off-line identification of linear state-space models. Int. Journal of Control 49, No. 1, 1989, pp. 219-232. [4] W. Larimore. System Identification, Reduced-Order Filtering and Modelling Via Canonical Variate Analysis. Proc. American Control Conference, H.S. Rao and T. Dorato, Eds., 1983, pp. 445-451. New York: IEEE. [5] B. Park and E. Verriest. Canonical Forms for Discrete Linear Periodically TimeVarying System and Applications to Control. Proceedings of the 28th IEEE Conference on Decision and Control, Tampa, Florida, 1989 pp. 1220-1225. [6] D. Alpay, P. Dewilde, and H. Dym. Lossless Inverse Scattering and reproducing kernels for upper triangular operators. In: I. Gohberg (Ed.), Extension and Interpolation of Linear Operators and Matrix Functions, Birkhs Verlag, 1990. [7] O.M. Grasselli, A. Tornambe and S. Longhi. A Polynomial Approach to Deriving a State-Space Model of a Periodic Process Described by Difference Equations. Proceedings of the 2-nd International Symposium on Implicit and Robust Systems, Warsau, Poland, Aug. 1991.
431
Realization of Discrete-time Periodic Systems
[8] P.B. Park and E.I. Verriest. Time-Frequency Transfer Function and Realization Algorithm for Discrete Periodic Linear Systems. Proceedings of the 32nd IEEE Conf. Dec. Control, San Antonio, TX, Dec. 1993, pp. 2401-2402. [9] E.I. Verriest and P.B. Park. Periodic Systems Realization Theory with Applications. Proc. 1st IFAC Workshop on New Trends in Design of Control Systems, Sept. 7-10, 1994, Smolenice, Slovak Republic.
Figure 1: Noisy and noiseless step responses. Typical step response of the noisy system (solid line) and the step response of the noiseless system (dashed line).
Figure 2: Identified and true step responses. The step response of the 4th order system identified from noisy data (solid line) and the step response of the noiseless system (dashed line).
This Page Intentionally Left Blank
SVD AND SIGNAL PROCESSING, III Algorithms, Architectures and Applications M. Moonen and B. De Moor (Editors) © 1995 Elsevier Science B.V. All rights reserved.
433
CANONICAL CORRELATION ANALYSIS OF THE DETERMINISTIC REALIZATION PROBLEM
J. A. RAMOS
11200 N.W. 43rd Street, Coral Springs, FL 33065
[email protected]

E. I. VERRIEST
Georgia Tech Lorraine, Technopôle Metz 2000
2 et 3, rue Marconi, 57070 Metz, France
[email protected]
ABSTRACT. Recently, a fairly simple state-space system identification algorithm has been introduced, which uses I/O data directly. This new algorithm falls in the class of subspace methods and resembles a well known Markov parameter based realization algorithm. The heart of the subspace algorithm is the computation of a state vector sequence from the intersection of two subspaces. Once this state vector sequence is obtained, the system matrices can be easily obtained by solving an over-determined linear system of equations. It has been shown that this new subspace algorithm is equivalent to a canonical correlation analysis on certain data matrices. In this paper we apply the classical theory of canonical correlation analysis to the deterministic identification problem. It is shown that the first n pairs of canonical eigenvectors contain all the information necessary to compute the state vector sequence in the original algorithm. We further show the connections with forwardbackwards models, a duality property common to canonical correlation based identification algorithms. This is useful for computing balanced realizations (an optional step). It is shown that, unlike the stochastic realization problem, certain dual covariance constraints have to be satisfied in order for the state sequence to be admissible. Thus, forcing the optimal solution to be of a canonical correlation analysis type. Algorithms that do not satisfy all the constraints are only approximate. Finally, we derive deterministic realization algorithms using ideas from their stochastic realization counterparts. KEYWORDS. Deterministic realization theory, canonical correlations, balanced realizations, GSVD (QPQ-SVD, PQ-SVD, P-SVD).
J.A. Ramos and E.L Verriest
434 1
INTRODUCTION
Consider the linear, discrete, time invariant, multi-variable system with forward state space representation

$x_{k+1} = A x_k + B u_k$ (1)
$y_k = C x_k + D u_k$ (2)

where $u_k \in \mathbb{R}^m$, $y_k \in \mathbb{R}^l$, and $x_k \in \mathbb{R}^n$ denote, respectively, the input, output, and state vectors at time k, and n corresponds to the minimal system order. Furthermore, [A, B, C, D] are the unknown system matrices of appropriate dimensions, to be identified (up to a similarity transformation) by means of a recorded I/O sequence $\{u_k\}_{k=1}^{N}$ and $\{y_k\}_{k=1}^{N}$. Let the past ($H_p$) and future ($H_f$) I/O data Hankel matrices be as defined in [3, 6]
Hp -
[ .---U, ] -
[ Hankel{uk, Hankel{yk, uk+l,...,Uk+i+j_2} Yk+l,...,yk+i+j-2}
and 1ti
= 9
~
where {U~,, U!) e ~rn, x~, {yp,y!) E ~R~ixj , and j ~> i. Let also the state vector sequences Xp and X / b e defined as
I
],
I
Then, it is shown in [3, 6] that there exists matrices U12 E ~R(m/+~/)x(2~/-n) , and 7" E ~Rnx(2~i-n) such that
~(mi+~i)•
]
U22 E
f(! = T UT Hp = - T UT H! where f(l = T X f , for some similarity transformation T. The system matrices [A, B, C',/)] are then obtained by solving an over-determined linear system of equations. 2
CANONICAL CORRELATION ANALYSIS
Suppose ZII p = MTHp is an oversized (mi + n) • j state vector sequence corresponding to a non-minimum system. Then, we know that there is an n th order sub-system that has minimum dimension and at the same time is both observable and controllable. The idea then is to compress the state vector sequence so that the non-observable/non-controllable states are deleted. That is, find an n • (n -k mi) compression matrix 7", such that ffl = 7"MTHp" In essence, this data compression operation resembles a principal component extraction, an idea popularized in system theory by [1, 8]. To fix the ideas, we need the output and state equations in subspace form [3, 6], i.e.,
Yp = r Xp + HT Up
(3)
HT U!
(4)
X I = A i x p + A Up
(5)
= r xj +
where HT E ~ixmi is a lower triangular block Toeplitz matrix with the Markov param-
eters, while r E ~tixn and A E ~,~xmi axe, respectively, the observability and reversed controllability matrices. From the subspace equations (3)-(5), one can show that both Hp and H I have dual representations. That is,
Up lip
]
--
YI
[_~s]x~
Omixn Imixmi r HT
Omixmi ] c a Oa x mi
Omixn Omi•
Imixmi J
rA
rA ~
,4
Up
0
Xp lp
-
C1 ]32
UI
.Ab 0 B~ B2
XIIp
=
(6)
HT
Similarly, by reversing the direction of time, we get lip f
=
y]
rb O~i•
HbT O..•
Oa,,,~i I,~i•
Up
r
Oiixmi
HT
UI
(7)
US
where the subscript/superscript b denotes backwards, in relation to a backwards model, and rb = r A -~ and H b = H T - F A - i A are, respectively, the backwards observability matrix and the upper triangular Toeplitz matrix of backwards Markov parameters. The aforementioned duality stems from the fact that by reversing the direction of time and exchanging the roles of Hp and HI, we get another representation dual to (7), i.e.,
YI
r
HT
Otixmi
yp
rb
o~i•
H~r
A
0
Xll I
(8)
Up
Let us now formally define the (n + mi) • j "state plus input" sequences
XI
X p l , = [ X, u~
'
x s ls =
us
[ XI '
x s l, =
v~
Let us also define the following covariance matrices j-~oo 3
A :
UI
j--oolim13 Xfl U fp
[ % Iv; ] -
~pf
ZI
Ay,
Alp Ap
,.oo 7 H ,IHf 7Z] We should point out that rank{rip} = rank{n]} = rank{~]p} = mi 4- n, thus, the data covaxiance matrices are rank deficient [2, 5].
It is rather tedious but straight forward to show that
n~ ~ n~ - ~s = ~ ( ~ ; i r ~ , s n~ ~ n z - n ,
_ r~s)~
= ~ (ay~AT~a. - A,)(~)~
(9) (10)
If we now pre-multiply (9) and (10) by their respective orthogonal complements, we get
J.A. Ramos and E.L Verriest
436
where
~-(~) ~
~
~'•162
- [ - ~'~ ~ 1 ~ ' ~ ]
for nonsingular matrices (P, Pb) E ~ti• P=
r•
'
Pb=
whose general form can be shown to be
rb•
where (.)• denotes orthogonal complement and (.)~ denotes pseudo inverse. One can further show that there exists matrices U and V such that Np and N! can be decomposed as
~7
=
.~-
[v~
=
.~.
-rtA~lrt ]
N~ = A• = A• where U and V are the canonical solutions we are looking for, i.e., they satisfy
2s
(la)
= u r n , = vrHs
It is well known [3, 13] that if dim{spanrow{Hp}Nspanrow{H ! }} = n, then the canonical angles between the subspaces spanned by the rows of Hp and H! satisfy : 81 = 82 . . . . . 8,, = 0, and 8n+1 >_ 8n+2 >_ " " >_ 8n+mi >_ O. This result states that the first n canonical correlations (si = cos{0i}) are equal to one. Therefore, we can write (11)-(12) as a pair of generalized eigenvalue-eigenvector problems of the form
where S = diag{si = 1}~=1 are the first n canonical correlations associated with the canonical eigenvectors U = [u~, u 2 , . . . , un] and Y = [v~, v2,..., v,~].
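As a numerical aside (our own sketch, not the GSVD-based algorithms discussed later), the canonical correlations between the row spaces of Hp and Hf can be checked directly from orthonormal row bases and a single SVD, in the spirit of [2]; on noise-free data the first n values come out equal to one:

```python
import numpy as np

def canonical_correlations(Hp, Hf):
    """Canonical correlations (cosines of the principal angles) between
    the row spaces of Hp and Hf."""
    def rowbasis(H):
        _, s, Vt = np.linalg.svd(H, full_matrices=False)
        return Vt[s > 1e-12 * s[0]]
    return np.linalg.svd(rowbasis(Hp) @ rowbasis(Hf).T, compute_uv=False)

# noise-free SISO example (n = 2): the first n correlations come out as one
rng = np.random.default_rng(0)
n, i, j = 2, 4, 400
A = np.array([[0.7, 0.2], [-0.1, 0.5]])
B, C, D = rng.standard_normal((n, 1)), rng.standard_normal((1, n)), rng.standard_normal((1, 1))
u = rng.standard_normal((1, 2 * i + j))
x, ys = np.zeros(n), []
for k in range(2 * i + j):
    ys.append((C @ x + D @ u[:, k]).item())
    x = A @ x + B @ u[:, k]
y = np.array(ys).reshape(1, -1)

def block_hankel(w, start):                 # i block rows, j columns
    return np.vstack([w[:, start + t: start + t + j] for t in range(i)])

Hp = np.vstack([block_hankel(u, 0), block_hankel(y, 0)])
Hf = np.vstack([block_hankel(u, i), block_hankel(y, i)])
print(np.round(canonical_correlations(Hp, Hf)[:4], 3))
```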
2.1 PROPERTIES OF THE CANONICAL MATRICES [U, V! Let us now form the (rai + n) • j pair of variables to which we will perform the canonical correlations. First, recall from (7)-(8) that
x/l/ - At
H/,
xtl~ = A~ Hp
Let us alsodefinethe followingtransformedvariables Z/I. = IX/I
~-
=PbX!
l,,
zsls=
['X/]
~)
= PX!
is
where ~ j = TXs for ~ny ~imnarity transformation m~trix T, ~', ~nd ~) ~re tr~n~orm~tion~ to Up and U!, respectively, and P and Pb have the general form 7)= Then ;Ell~, and ;Ell! form a full rank pair of canonical variates defined as z~,l, = M r H , , Z~, b, = L r H ,
(is)
437
Canonical Correlation Analysis
where M T = 79b .A~ and L T = P .A~ are the canonical transformations which are to be determined from the data matrices Hp and H I. One can further show the connection of the canonical variables to a pair of forwardbackwards models, a concept well developed in stochastic realization theory [4]. From r s
At=[-FtHT rt
and ~4bT, one can show that Imixmi
]
Omixgi
O,,,.ix.a
l,.i x,.i
Im~x,~i
O,,-,ixti
where Fb=FA Hb =
-i HT-FA-iA HT-
=
Fb
A
= HT+FAb Ab = - A -i A
Notice that H~ is upper triangular, corresponding to an anti-causal backwards system. It is then rather straight forward to show that the above relationships are associated with a pair of forward-backwards models of the form" Forward M o d e l xk+l
=
A xk + B uk
Yk
=
Cxk+Duk
Backwards M o d e l zk-1
=
Ab zk + Bb uk
Yk
--
Cb Zk + Db uk
where, for the backwards model, Ab = A -1, Bb = - A -1 B, Cb = C A -1, Db = D
-
CA -1B,
and Zk-1 -- Xk.
Furthermore, from (13) one can show that f(. f H T - uTT~,p - v T ' ~ . f p
Thus, we have the following system of equations 7-lip
(17)
~ y
from which we get the optimality conditions
v~t~.T,~ u,', v~tn~ u~ n ~
-
~,]
=
- u~] =
UT
=
vTT-I# Tt~
VT
--
uT']-~T~]
o.•
(18)
o,,•
This can easily be implemented using a Q R decomposition.
(20) (21)
J.A. Ramos and E.L Verriest
438 3
BALANCED
REALIZATIONS
Here we will show that the orthogonal complement matrices Np and N! satisfy certain constraints that lead to the balanced conditions similarly derived in [7]. First, let us point out that Np and N! satisfy the following system of equations
Now. since the two systems of equations are equivalent, one needs to solve only one system. Let [ then from (22) we have
IT4 ]= [ NTI-NT ],
T,
T2r + T4rA ~ =
T1 + T2HT + T4EA T3 + T4HT
=
0~•
(24)
0~i•
(25)
= 0tixmi (26) Since the product r & is similarity invariant, one can specify a similarity transformation directly from the equations. That is, from (25) we can specify T* = T4F as the pseudoinverse of the compression matrix. Substituting this result in the above system of equations leads to a dual relationship of the form T t = (T2T4T3 - I'1)A~. One can then find the compression matrix by specifying the following balancing constraints
(Tt)T(TT4 )TT2T* = Abe, T(T2T4T3 - Tx)(T2T4T3 - T1)TT T =
(27) (28)
Abal
where Abat is a diagonal matrix.
] --
Let us finally remark that (24)-(26) can also be expressed as
] 9[
I-r
[0~,•
(29)
where rl:2i and H)r:2i are extensions of r and HT to include 2i block rows. Since the system matrices [A, B, C, D] are contained in the unknowns, one may find other interesting canonical forms, leading perhaps to alternative constrained system identification algorithms. 4
GENERALIZED
DETERMINISTIC
IDENTIFICATION
PROBLEM
The deterministic realization problem can be seen as a classical multivariate analysis problem, where one has a pair of data matrices and would like to transform them to a coordinate system where they have maximum correlation [3, 5, 10, 11]. This leads to a number of alternatives for modeling the deterministic realization problem. Due to lack of space we shall only present a summary of the results. Let us write the main decomposition as follows :
Lr o l l
=
Aj
(30)
M TOpM
=
Ap
(31)
7-lln =
Of LA~ 89
(32)
where S = d*ag{si}i=l " mi+,, , Ap and A / are diagonal matrices, and 0p and 0 f are to be defined below. From the above solution we can specify the constraints according to the
Canonical Correlation Analysis
439
following algorithms [12]"
Case 1" Canonical Correlation Analysis QPQ-SVD of {Hf, H f , Hp, H / } _ .r~ < 1 0 < t~,tji:l __ Op = 7Zp
0~
=
U VT
= =
~f first n rows o f L T first n rows o f M T
Case 2: Principal Components of Instrumental Variables PQ-SVD of { HI, Hp, H T, } 0 < {si}m=il+n < no upper bound (variances) 0p = ~p O] = I(mi+~)• UT = aTSM T V T = aT1L T where al is the result of a second approximation, i.e,
Due to lack of space we will not go into any details as to the GSVD part of the algorithms. Instead, we will refer the interested reader to the references [3, 12] where an algorithmic approach has been elegantly outlined. Finally, the computation of L and M can also be done from the ordinary SVD by performing the following computations 9 First, compute the SVD's of Hp and H / Hp = VpSpVT , H / - Vf S f Vf then compute the SVD of the product of the V factors
vfv Set
m f -- S
= usv and Ap = S. Finally, compute L and M as 1
LT= A~uTsTIu~,
5
MT=
1
AgvTs;1U [
CONCLUSIONS
We have studied the deterministic realization problem from a classical canonical correlation analysis point of view. We have shown that there exists an inherent duality between a forward-backwards pair of models. Furthermore, we have derived the canonical weights as a function of system matrices such as the observability and controllability matrices. This introduces some flexibility in designing constrained system identification algorithms that may require a given canonical form. For the balancing conditions it was shown how to derive the constraints explicitly. However, for other canonical forms, this remains an open problem. Finally, we derived some optimality conditions and showed how to correct for these when using Principal Components based algorithms.
440
J.A. Ramos and E.L Verriest
References
[1] K.S. Arun, D.V. Bhaskar Rao, and S.Y. Kung A new predictive efficiency criterion for approximate stochastic realization. Proc. 22nd IEEE Conf. Decision Contr., San Antonio, TX, pp. 1353-1355, 1983. [2] A. Bjork and G. H. Golub. Numerical methods for computing angles between linear subspaces. Math. Comp., Vol. 27, pp. 579-594, 1973. [3] B. De Moor. Mathematical concepts and techniques for modeling of static and dynamic systems. Ph. D. dissertation, Katholieke Universiteit Leuven, Leuven, Belgium, 1988. [4] U.B. Desai, D. Pal, and R. Kirkpatrick. A realization approach to stochastic model reduction. Int. Journal of Control 42, No. 4, pp. 821-838, 1985. [5] D. G. Kabe. On some multivariate statistical methodology with applications to statistics, psychology, and mathematical programming. The Journal of the Industrial Mathematics Society, Vol. 35, Part 1, pp. 1-18, 1985. [6] M. Moonen, B. De Moor, L. Vandenberghe, J. Vandewalle. On- and off-line identification of linear state space models. Int. Journal of Control 49, No. 1, pp. 219-232, 1989. [7] M. Moonen and J. A. Ramos. A subspace algorithm for balanced state-space system identification. IEEE Trans. Automat. Contr. AC-38, No. 11, pp. 1727-1729, 1993. [8] J.B. Moore. Principal component analysis in linear systems: controllability, observability, and model reduction. IEEE Trans. Automat. Contr. AC-26, No. 1, pp. 17-32, 1981. [9] C.T. Mullis and R.A. Roberts. Synthesis of minimum roundoff noise in fixed point digital filters. IEEE Trans. Circuits and Systems CAS-23, No. 9, pp. 551-562, 1976.
[10]
J. A. Ramos and E. I. Verriest. A unifying tool for comparing stochastic realization algorithms and model reduction techniques. Proc. of the 1984 Automatic Control Conference, San Diego, CA, pp. 150-155, June 1984.
[11]
P. Robert and Y. Escoufier. A unifying tool for linear multivariate statistical methods: the RV-coefficient. Applied Statistics, 25, NO. 3, pp. 257-265, 1976.
[12]
J. Suykens, B. De Moor, and J. Vandewalle GSVD-based stochastic realization. ESAT Laboratory, Katholieke Universiteit Leuven, Leuven, Belgium. Internal report No. ESAT-SISTA 1990-03, July 1990.
[13]
D. S. Watkins. Fundamentals of Matrix Computations. John Wiley & Sons, 1991.
SVD AND SIGNAL PROCESSING, III Algorithms, Architectures and Applications M. Moonen and B. De Moor (Editors) © 1995 Elsevier Science B.V. All rights reserved.
AN UPDATING ALGORITHM FOR ON-LINE MIMO SYSTEM IDENTIFICATION
M. STEWART
Coordinated Science Laboratory, University of Illinois, Urbana, IL, USA
[email protected]

P. VAN DOOREN
CESAME, Université Catholique de Louvain, Louvain-la-Neuve, Belgium
[email protected]

ABSTRACT. This paper describes the application of a generalized URV decomposition to an on-line system identification algorithm. The algorithm updates estimates of a state space model with O(n²) complexity.

KEYWORDS. Identification, MIMO systems, updating, URV.
1
INTRODUCTION
Identification of a state space model for a MIMO system from input/output data is a computationally intensive problem. A reliable algorithm was given in [1], but as presented it depends on the SVD to make crucial rank decisions and identify subspaces. Unfortunately, there have been no exact algorithms proposed for updating the SVD when input/output measurements are added which are faster than O(n3). An approximate approach to SVD updating which might be considered for use here was developed in [2]. However the problem is really more difficult than just updating one SVD. What is desired is the intersection of the range spaces of two matrices. In [1] this is computed using two SVD's, and they both must be updated simultaneously. The URV decomposition, [4], is an easily updated decomposition which, in some applications, may be used to replace the SVD. The fact that an intersection of the range spaces
M. Stewart and P. Van Dooren
442
of two matrices is required suggests that a generalization of the URV decomposition along the lines of [3] might be helpful. Such a decomposition was introduced in [5] along with an O(n 2) updating algorithm. This paper will give a brief description of the decomposition and show how it can be used as part of an on-line identification algorithm. Given a sequence of m x 1 input vectors, u(k), we assume that the sequence of l • 1 output vectors, y(k), are generated from the state space equations,
(1)
x(k + 1) = Akz(k) + Bku(k) y(k) = Ckz(k) + Dku(k).
Assuming we have observations of the input and output vectors, the identification problem is to find an order, n, and time-varying matrices, {Ak, Bk, Ck, Dk}, which satisfy (1) for some n x 1 state sequence, x(k). Generally it is assumed that the state space model is slowly time-varying. We then wish to provide an algorithm which will track the model. The algorithm uses the same basic approach developed in [1]. It can be summarized in two steps: find an estimate of the state sequence and then obtain the system matrices from the least squares problem
[ x(k+i+j-1) ... y(k + i + j - 2) ... Cj Dj
x(k+i+l) y(k + i)
] Wj-l =
u(k+i+j-2)
...
(2)
u(k+i)
Wj-1,
where Wj is a diagonal weighting matrix defined by
wj=
[1o ,~wj_~ o
for lal < 1 and W1 = 1. The index k is the time at which observations begin and k + i + j - 1 is the time at which the latest observations have been made. Indices k and i are fixed, but j grows as observations are made. To keep the notation compact, the indexing of the system matrices will show only the dependence on j, though {Aj,Bj, Cj, Dj} will depend on observations up to u(k + i + j - 1) and y(k + i + j - 1) An appropriate exponentially weighted state vector sequence can be determined from the intersection of the row space of two Toeplitz matrices 9 Define the (m + l)i x j block Toeplitz matrix
T(k) =
u(k + j - 1)
u(k + j - 2)
u(k + j - 1)
y(k + j - 2)
9
u(k+j+i-2) y(k+j+i-2)
... ...
,
u(k+j+i-3) y(k+j+i-3)
u(k) y(k) ,
... ...
u(k+i-1) y(k+i-1)
If T1 = T(k) and 7'2 = T(k + i) then in the time invariant case, the intersection of the row spaces of T1 and T2 generically has dimension n, the order of the model (1) generating y(k) from u(k). [1] If the rows of some X form a basis for the intersection then the columns of X are a
On-line MIMO System Identification
443
sequence of state vectors for a time invariant model generating y(k) from u(k). A proof of this fact can be found in [1]. If we compute the intersection of the row spaces of T1Wj and T2Wj and let X denote the basis for this space, then we use X as the exponentially weighted state vector sequence,
X = [ x(k+i+j-1)
x(k+i+j-2)
...
x(k+i) ]Wj.
(3)
The decomposition of [5] can be used to track the intersection of the row spaces. The contribution of this paper is to show how the system matrices can be obtained efficiently at the same time.
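For orientation, the quantity being tracked can be computed offline with two SVDs as below (our own sketch with toy matrices; this is the batch computation that the updating decomposition is designed to avoid, not the O(n²) algorithm itself):

```python
import numpy as np

def rowspace_intersection(T1, T2, tol=1e-10):
    """Rows spanning the intersection of the row spaces of T1 and T2."""
    def rowbasis(T):
        _, s, Vt = np.linalg.svd(T, full_matrices=False)
        return Vt[s > tol * s[0]]
    V1, V2 = rowbasis(T1), rowbasis(T2)
    U, s, _ = np.linalg.svd(V1 @ V2.T)        # principal angles between the spaces
    return (U.T @ V1)[s > 1.0 - 1e-8]         # directions with cos(theta) ~ 1

# toy check: two 3 x 8 matrices sharing exactly one row direction
rng = np.random.default_rng(0)
common = rng.standard_normal((1, 8))
T1 = np.vstack([common, rng.standard_normal((2, 8))])
T2 = np.vstack([common, rng.standard_normal((2, 8))])
print(rowspace_intersection(T1, T2).shape[0])   # 1
```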
2
THE DECOMPOSITION
This section will deal with the T matrices in transposed form so that the problem becomes one of tracking column spaces as rows are added. The decomposition has the form
o
v2
Rn
E12
0
E22
0 0 0 0
E32 E42 F52 0
S13 R14 E24 0 F34 0 0 0 0 0 0
R23
E15 E25 E35 F45 0 0
(4)
where R n , R23 and R14 are upper triangular and full rank. R l l and R23 are square. Each F block is an upper triangular matrix with norm less than the tolerance. Each E block is an arbitrary matrix with norm less than the tolerance. The S block is an arbitrary matrix. If this is the case, then the decomposition gives estimates of the range spaces of WiT T and WiT T. In fact, it can be shown that if the E and F blocks are zero, then the first columns, U1, of U corresponding to the number of columns in R14 are a basis for the intersection of the range space of WiT T and WjT T. Details concerning decompositions of this type can be found in [3]. If we partition V2 in a manner which matches the decomposition
[
]
and assume that the E and F blocks are zero then
Villi4 = W~TTV24 and the full rank property of R14 imply that WjT2TV24 is also a basis for the intersection. This fact makes it possible to avoid storing U and will avoid the problem of growing memory storage as rows are added to WiT1T and WiT T. Details on updating the decomposition can be found in [5]. A brief summary of relevant features will be given here. We assume that the decomposition has already been computed and we are interested in having the decomposition for the same matrices, but with an added row. The process can be initialized by setting U and V to the identity and letting
444
M. Stewart and P. Van Dooren
the decomposition equal zero. If two rows, aT and bT are added to WiT T and WiT T respectively and each row of the old matrix is weighted by 0 < a < 1 then we wish to restore the form of (4) to
[1 ~ 0 UT
o~WjTT
Ir
aWjT T
0
V2
l
"
(5)
This looks much like (4), but with an additional row along the top. The problem is to update the orthogonal matrices U and V to restore the structure of the decomposition and to deal with possible rank changes in the R matrices. The key feature of the algorithm as presented in [5] which has a bearing on the system identification algorithm is that the structure of (5) can be restored by applying plane rotations from the left which operate on adjacent rows and plane rotations from the right which operate on adjacent columns. The approach is similar to that of UP~V updating as given in [4]. The algorithm can be broken into two stages. The first updates the overall structure of the decomposition when new rows are added to WiT T and WiT T. After the update, the decomposition has the same general form, but the triangular R matrices are possibly larger and might no longer have full rank. The second stage looks for small singular values of the R blocks and recursively deflates these blocks using the scheme described in [4] until they have full rank.
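The elementary operation in both stages is a plane (Givens) rotation acting on two adjacent rows or columns; a minimal sketch of that primitive (our own code, not the full updating algorithm):

```python
import numpy as np

def givens(a, b):
    """c, s with  [c s; -s c] @ [a; b] = [r; 0]."""
    r = np.hypot(a, b)
    return (1.0, 0.0) if r == 0.0 else (a / r, b / r)

def rotate_rows(M, i, c, s):
    """Apply the plane rotation to adjacent rows i and i+1 of M (in place)."""
    M[[i, i + 1], :] = np.array([[c, s], [-s, c]]) @ M[[i, i + 1], :]

# zero a subdiagonal element created below a triangular block
M = np.triu(np.arange(1.0, 17.0).reshape(4, 4))
M[2, 1] = 3.0
c, s = givens(M[1, 1], M[2, 1])
rotate_rows(M, 1, c, s)
print(abs(M[2, 1]) < 1e-12)    # True
```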
3
UPDATING THE SYSTEM MATRICES
As mentioned earlier, the intersection of the range spaces of WiT T and WiT T is given by WjTTV24 which also gives an estimate of the exponentially weighted state vectors, (3). Thus the least squares problem can be written as
where
U(j)= [ u ( k + i + j - 1 )
u(k+i+j-2)
...
u(k+i) ]T
Y(j)= [ y(k+i+j-2)
y(k+i+j-3)
...
y(k+i-1)
and ]T.
We will give an updating scheme for the Q R decomposition associated with the least squares problem which can carried out in conjunction with the decomposition updating to provide a solution to the system identification problem. It would be nice if the updating could be performed in O(n2), and in a sense this is possible: Unfortunately there is a problem: The system matrices will be updated through several intermediate stages during the process of updating the decomposition of WiT T and WiT T. If at one of these stages R14 is ill conditioned, as would be expected prior to a deflation, then the least squares problem will also be ill conditioned. The connection between the conditioning of R14 and that of the least squares problem is obvious since R14 is, neglecting the small elements, the
On-line MIMO System Identification
445
n • n principle submatrix of the QR decomposition for (6). This temporary ill conditioning cart introduce large errors in the system matrices as updating is carried out. Thus there are two possible approaches to updating the system matrices presented in this paper. The first is the numerically safe approach of updating the QR decomposition for (6) and then do a back substitution to get {Aj, Bj, Cj, Dj}. This avoids large errors due to a temporarily ill conditioned R14, but it is, unfortunately, an O(n 3) process. The other possibility is art O(n2) algorithm which updates {Aj, Bj, Cj, Dj} as the generalized URV decomposition is updated. Both approaches require the
QR factorization of
I so that will be dealt with first. As the generalized URV decomposition is updated, the size of WjTTV24 can change due to changes in the size of R14 during the updating. This can be dealt with by computing the QR decomposition of the expanded matrix, P= We then obtain the required R factor as the (n + m) • (n + m) principal submatrix of the expanded R. Suppose a row is added to
WiTT and WiTT. This corresponds to a row being added to
P, o o
QT Qr
=
~R
,
(v)
where
is a square orthogonal matrix and Q1 has the same number of columns as P. If P has full column rank, then Q1 is art orthogonal basis for the range space of P. Otherwise, range of P is contained in the range of Q1. To deal with the right hand side, we define
and keep track of the matrix
[0 o]
QTs. When a row is added we get
~QTS1"
(s/
Before any updating is done on the generalized URV decomposition, we can apply a standard QR updating to (7). The rotations which accomplish this are applied to (8) at the same time. Since it is not necessary to store Q, the memory required by this approach does not grow with time. Once the Q R decomposition has been updated, the generalized URV updating can be
M. Stewartand P. Van Dooren
446
performed. Each time a right rotation is performed and V2 is updated, a corresponding right rotation is performed on P and S. The rotation performed on P destroys the QR decomposition of P. Since all of the right rotations which are used to update the generalized UI~V decomposition operate on adjacent columns, there are clearly three ways in which the QR decomposition can be damaged. The simplest is when the update to V2 only affects one of the matrices V23, V24, or V25. In this case the rotation operates on two adjacent columns of P and hence merely creates a single nonzero element on the sub diagonal of the R factor of P. To zero this element requires just one left rotation which is applied to both P and S. The other possibilities are when the update to V2 affects the last column of V24 and the first column of V2s or the first column of V24 and the last column of V23. Since they do not correspond to adjacent columns of P, they create more nonzeros than the first case. To restore the QR decomposition after one such right rotation is an O(n2) process. Fortunately, the number of these rotations is bounded independently of n, so that the overall process is still O(n2). It can easily be shown that it is possible to deal with changes in the block sizes of P and S due to changes in the sizes of R14 and R23 in O(n2) by using similar techniques. Once the URV updating has been completed and the QR decomposition of P has been maintained, we have t h e / / f a c t o r for the least squares problem in the form of the (n + m) x (n + m) principal submatrix of the R factor of P. Similarly if we take as the right hand side the (n + m) x (n + l) principal submatrix of QT1s, then we can do a triangular backsolve to find the system matrices. There are three sorts of updates which must be performed on the least squares solution. The first is to deal with a new row which is added to the problem. If we look at the submatrix of P,
and the submatrix of S S1
([
that define the least squares problem
AT cT P1 B~ D~ =$1, then when a row is added, we would like to find the solution to the least squares problem
pT]
AT. cT]
,sT ]
The normal equations are
AT cT Using the Sherman-Morrison-Woodbury formula, the solution can be written as
[A T.~. C]aT T. = (P1Tp1)_lpTs1 + x (
pTx sT_ xTP1TS1) .
On-line MIMO System Identification
447
where x = (pTp1)-lp. The new least squares solution is just a rank one modification of the old solution. This modification can be computed in O(n 2) flops using the R factor for P1, the updating of which was described earlier. Once the new row has been incorporated into the least squares solution, the QR decomposition for P with the new row added can be computed as described earlier. When that has been done, the generalized URV updating can commence with the QR decomposition being updated as V2 is changed. The final part of the identification problem is to update {Aj,Bj, Cj, Dj} as V2 is changed and the partitioning is changed. Rotations which only affect V23 and V25 will not affect the least squares solution. The simplest case in which something actually has to be done is one in which the rotation affects only V24. Suppose some rotation V is applied to the state estimate portions of P1 and $1. Then the normal equations become
0 Im
pTp1
V
oIm
0
B
V
=
OIm
0
pT.,cI
V 0 OIt
so the new solution is
AT. c T.
V
O ]T
[ 0
It
"
The same rotation which is applied to P1 and 5'1 can be applied to the right and left of the old solution to get the new system matrices. Because the right rotations involved in updating the generalized URV decomposition always act on adjacent columns, we need only make special consideration of the case in which a rotation acts at the boundaries of one of the blocks of V:. It turns out that such rotations always occur when there is a change in the size of the R14 block. They can be dealt with by viewing the process as adding either the last column of V23 or the first column of V25 to V24 and then performing a rotation which acts purely within II24. All that we need to deal with are rotations which act solely within one of the blocks V23, V24 or V25, which has already been covered, and the process of adding or deleting a column from the least squares problem. Each time R14 grows, we must bring a column from the W.iT2TV23 or WjTTV25 into the WjTTV24 block of P. Since the WjTTV24 and the U(j) blocks of P define, the least squares problem, this amounts to adding a column to the least squares problem. The same thing applies to S. Similarly, whenever a column is removed from R14, the least squares problem shrinks by a column. Suppose we have the
QR
decomposition of/)1 with the column p appended, [Rll
r12]
0
0
0
and we have a solution to the least squares problem
M. Stewart and P. Van Dooren
448 Then the solution satisfies 0
r22
xT1
x22
--
q T S 1 qTs
"
/,From this we get four equations which can be used to update the solution
~11X12 "~- r12X22 --" Q1Ts
= qTs
and T22X22 _. qT.
/,From the first of these equations, it is clear that if we wish to delete a column, we get a solution to the new least squares problem P1X = $1 of x =
+
(9)
This can easily be computed in O(n2). The reverse process, that of going from a solution, X, of the smaller problem, to the solution of the larger problem makes use of all four of the equations. First z22 can be computed from the last equation, x21 from the third, x12 from the second, and Xll using (9). The necessary products QITs, qTs1 and qTs will be available from the part of the identification algorithm which updates the Q R decomposition and the transformed right hand side. Again, the whole process can be carried out in O(n 2) flops. Acknowledgments This research was supported by the National Science Foundation under Grant CCI~ 9209349 References
[1] M. Moonen, B. De Moor, L. Vandenberghe and J. Vandewalle. On- and Off-line Identification of Linear State-space Models. Int. J. Control 49, pp 219-232, 1989. [2] M. Moonen. Jacobi-type Updating Algorithms for Signal Processing, Systems Identification and Control. PhD Thesis, Katholieke Universiteit Leuven, 1990. [3] C. C. Paige. Some Aspects of Generalized QR Factorizations. In : M. G. Cox and S. Hammarling (Eds.), Reliable Numerical Computation, Oxford Univ. Press, pp. 73-91, 1990. [4] G. W. Stewart. An Updating Algorithm for Subspace Tracking. IEEE Transactions on Signal Processing 40, pp. 1535-1541, 1992. [5] M. Stewart and P. Van Dooren, A QURV Updating Algorithm. In : J. G. Lewis (Ed.), Proceedings of the Fifth SIAM Conference on Applied Linear Algebra, SIAM, pp. 269273, 1994.
SVD AND SIGNAL PROCESSING, III Algorithms, Architectures and Applications M. Moonen and B. De Moor (Editors) © 1995 Elsevier Science B.V. All rights reserved.
449
SUBSPACE TECHNIQUES IN BLIND MOBILE RADIO CHANNEL IDENTIFICATION AND EQUALIZATION USING FRACTIONAL SPACING AND/OR MULTIPLE ANTENNAS
D.T.M. SLOCK
Mobile Communications Department, Eurecom Institute
2229 route des Crêtes, BP 193, F-06904 Sophia Antipolis Cedex, France
[email protected]
ABSTRACT. Equalization for digital communications constitutes a very particular blind deconvolution problem in that the received signal is cyclostationary. Oversampling (w.r.t. the symbol rate) of the cyclostationary received signal leads to a stationary vector-valued signal (polyphase representation). Oversampling also leads to a fractionally-spaced channel model and equalizer. In the polyphase representation, channel and equalizer can be considered as an analysis and synthesis filter bank. Zero-forcing (ZF) equalization corresponds to a perfect-reconstruction filter bank. We show that in the oversampling case FIR (Finite Impulse Response) ZF equalizers exist for a FIR channel. In the polyphase representation, the noise-free multichannel power spectral density matrix has rank one and the channel can be found as the (minimum-phase) spectral factor. The multichannel linear prediction of the noiseless received signal becomes singular eventually, reminiscent of the single-channel prediction of a sum of sinusoids. As a result, a ZF equalizer can be determined from the received signal second-order statistics by linear prediction in thenoise-free case, and by using a Pisarenko-style modification when there is additive noise. In the given data case, Music (subspace) or Maximum Likelihood techniques can be applied. We also present some Cramer-Rao bounds and compare them to the case of channel identification using a training sequence.
KEYWORDS. Subspace techniques, multichannel linear prediction, eigendecomposition, blind equalization, channel identification, cyclostationarity, MUSIC, maximum likelihood.
1 FRACTIONALLY-SPACED CHANNELS AND EQUALIZERS, AND FILTER BANKS
Consider linear digital modulation over a linear channel with additive Gaussian noise so that the received signal can be written as
y(t) = Σ_k a_k h(t - kT) + v(t)   (1)
where the a_k are the transmitted symbols, T is the symbol period, and h(t) is the (overall) channel impulse response. Assuming the {a_k} and {v(t)} to be (wide-sense) stationary,
the process {y(t)} is (wide-sense) cyclostationary with period T. If {y(t)} is sampled with period T, the sampled process is (wide-sense) stationary and its second-order statistics contain no information about the phase of the channel. Tong, Xu and Kailath [6] have proposed to oversample the received signal with a period Δ = T/m, m > 1. This leads to m symbol-spaced channels. The results presented here generalize the results in [3], where an oversampling factor m = 2 was considered. As an alternative to oversampling, multiple channels could also arise from the use of multiple antennas. Corresponding to each antenna signal, there is a channel impulse response. Each antenna signal could furthermore be oversampled. The total number of symbol rate channels is then the product of the number of antennas and the oversampling factor. In what follows, we use the terminology of the case of one antenna. The case of multiple synchronous transmitting sources is treated in [4]. We assume the channel to be FIR with duration of approximately NT. With an oversampling factor m, the sampling instants for the received signal in (1) are t_0 + T(k + (j-1)/m) for integer k and j = 1, 2, ..., m. We introduce the polyphase description of the received signal: y_j(k) = y(t_0 + T(k + (j-1)/m)) for j = 1, ..., m are the m phases of the received signal, and similarly for the channel impulse response and the additive noise. The oversampled received signal can now be represented in vector form at the symbol rate as

y(k) = Σ_{i=0}^{N-1} h(i) a_{k-i} + v(k) = H_N A_N(k) + v(k),   (2)

y(k) = [y_1(k) ... y_m(k)]^T,  v(k) = [v_1(k) ... v_m(k)]^T,  h(k) = [h_1(k) ... h_m(k)]^T,
H_N = [h(0) ... h(N-1)],  A_N(k) = [a_k ... a_{k-N+1}]^H,

where superscript H denotes Hermitian transpose. We formalize the finite duration NT assumption of the channel as follows:

(AFIR):  h(0) ≠ 0,  h(N-1) ≠ 0,  and  h(i) = 0  for i < 0 or i ≥ N.   (3)
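To make the polyphase notation concrete, the following NumPy sketch (all parameters are illustrative assumptions, not taken from the paper) builds the m symbol-rate phases y_j(k) of an oversampled, noise-free channel output, i.e., the vector signal y(k) of (2).

```python
import numpy as np

# Minimal sketch of the polyphase representation in (1)-(2); m, N and the
# channel below are assumed values for illustration only.
rng = np.random.default_rng(0)
m, N, K = 2, 4, 200                      # oversampling factor, channel length, symbols
a = rng.choice([-1.0, 1.0], size=K)      # transmitted symbols a_k
h = rng.standard_normal((m, N))          # h_j(i): phase j of the T/m-spaced channel

# One vector sample y(k) per symbol period: y_j(k) = sum_i h_j(i) a_{k-i}
y = np.zeros((m, K))
for k in range(K):
    for i in range(N):
        if k - i >= 0:
            y[:, k] += h[:, i] * a[k - i]

H_N = h                                  # columns h(i) form the blocks of H_N = [h(0) ... h(N-1)]
print(y.shape, H_N.shape)
```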
The z-transform of the channel response at the sampling rate T/m is H(z) = Σ_{j=1}^{m} z^{-(j-1)} H_j(z^m). Similarly, consider a fractionally-spaced (T/m) equalizer whose z-transform can also be decomposed into its polyphase components: F(z) = Σ_{j=1}^{m} z^{(j-1)} F_j(z^m), see Fig. 1.
Although this equalizer is slightly noncausal, this does not cause a problem because the discrete-time filter is not a sampled version of an underlying continuous-time function. In fact, a particular equalizer phase z^{(j-1)} F_j(z^m) follows in cascade the corresponding channel phase z^{-(j-1)} H_j(z^m), so that the cascade F_j(z^m) H_j(z^m) is causal. We assume the equalizer phases to be causal and FIR of length L: F_j(z) = Σ_{k=0}^{L-1} f_j(k) z^{-k}, j = 1, ..., m.
Figure 1: Polyphase representation of the T/m fractionally-spaced channel and equalizer for m = 2.

2 FIR ZERO-FORCING (ZF) EQUALIZATION
We introduce f(k) = [f_1(k) ... f_m(k)], F_L = [f(0) ... f(L-1)], H(z) = Σ_{k=0}^{N-1} h(k) z^{-k} and F(z) = Σ_{k=0}^{L-1} f(k) z^{-k}. The condition for the equalizer to be ZF is F(z)H(z) = z^{-n}, where n = 0, 1, ..., N+L-2. The ZF condition can be written in the time domain as
F_L T_L(H_N) = [0 ... 0 1 0 ... 0]   (4)
where the 1 is in the (n+1)st position and T_M(x) is a (block) Toeplitz matrix with M (block) rows and [x 0_{p×(M-1)}] as first (block) row (p is the number of rows in x). (4) is a system of L+N-1 equations in Lm unknowns. To be able to equalize, we need to choose the equalizer length L such that the system of equations (4) is exactly determined or underdetermined. Hence

L ≥ L̲ = ⌈(N-1)/(m-1)⌉.   (5)
We assume that H_N has full rank if N ≥ m. If not, it is still possible to go through the developments we consider below, but many singularities will appear, and the non-singular part will behave in the same way as if we had a reduced number of channels, equal to the row rank of H_N. Reduced rank in H_N can be detected by inspecting the rank of E y(k) y^H(k). If a reduced rank in H_N is detected, the best way to proceed (also when quantities are estimated from data) is to preprocess the data y(k) by transforming them into new data of dimension equal to the row rank of H_N. The matrix T_L(H_N) is a generalized Sylvester matrix. It can be shown that for L ≥ L̲ it has full column rank if the FIR assumption (3) is satisfied and if H(z) ≠ 0, ∀z, or in other words if the H_j(z) have no zeros in common. This condition coincides with the identifiability condition of Tong et al. on H(z) mentioned earlier. Assuming T_L(H_N) to have full column rank, the nullspace of T_L^H(H_N) has dimension L(m-1) - N + 1. If we take the entries of any vector in this nullspace as equalizer coefficients, then the equalizer output is zero, regardless of the transmitted symbols.
To find a ZF equalizer (corresponding to some delay n), it suffices to take an equalizer length equal to L̲. We can arbitrarily fix L̲(m-1) - N + 1 equalizer coefficients (e.g. take L̲(m-1) - N + 1 equalizer phases of length L̲-1 only). The remaining L̲ + N - 1 coefficients can be found from (4) if H(z) ≠ 0, ∀z. This shows that in the oversampled case, a FIR equalizer suffices for ZF equalization! With an oversampling factor m = N, the minimal required total number of equalizer coefficients N is found (L̲ = 1).
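The following hedged NumPy sketch solves the ZF condition (4) numerically for a randomly drawn channel; the dimensions, delay n and channel are illustrative assumptions rather than values from the paper.

```python
import numpy as np

# Solve F_L T_L(H_N) = e_n^T (eq. (4)) by least squares for an assumed channel.
rng = np.random.default_rng(2)
m, N = 3, 6
h = rng.standard_normal((m, N))
L = int(np.ceil((N - 1) / (m - 1)))          # minimal equalizer length L_

def block_toeplitz_TL(h, L):
    """T_L(H_N): L block rows, first block row [H_N 0]; size (L*m) x (L+N-1)."""
    m, N = h.shape
    T = np.zeros((L * m, L + N - 1))
    for r in range(L):
        T[r * m:(r + 1) * m, r:r + N] = h
    return T

T = block_toeplitz_TL(h, L)
n = 0                                        # reconstruction delay
e = np.zeros(L + N - 1); e[n] = 1.0
# F_L T = e^T  <=>  T^T F_L^T = e; lstsq returns a (minimum-norm) solution
F, *_ = np.linalg.lstsq(T.T, e, rcond=None)
print(np.allclose(F @ T, e, atol=1e-8))      # True: a FIR ZF equalizer exists
```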
3 CHANNEL IDENTIFICATION FROM SECOND-ORDER STATISTICS: FREQUENCY DOMAIN APPROACH
Consider the noise-free case and let the transmitted symbols be uncorrelated with variance σ_a^2. Then the power spectral density matrix of the stationary vector process y(k) is

S_{yy}(z) = σ_a^2 H(z) H^H(z^{-*}).   (6)
The following spectral factorization result has been brought to our attention by Loubaton [1]. Let K(z) be an m × 1 rational transfer function that is causal and stable. Then K(z) is called minimum-phase if K(z) ≠ 0, |z| > 1. S_{yy}(z) is a rational m × m spectral density matrix of rank 1. Then there exists a rational m × 1 transfer function K(z) that is causal, stable, minimum-phase, unique up to a unitary constant, of (minimal) McMillan degree deg(K) = ½ deg(S_{yy}), such that

S_{yy}(z) = K(z) K^H(z^{-*}).   (7)
In our case, S_{yy} is polynomial (FIR channel) and H(z) is minimum-phase since we assume H(z) ≠ 0, ∀z. Hence, the spectral factor K(z) identifies the channel,

K(z) = α_c e^{jφ_c} H(z),   (8)

up to a constant α_c e^{jφ_c}. So channel identification from second-order statistics is simply a multivariate MA spectral factorization problem.
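As a quick numerical check of (6), the sketch below (with an assumed random complex channel and unit symbol variance) verifies that the noise-free spectral density matrix has rank one at every frequency, which is what makes the spectral factorization above possible.

```python
import numpy as np

# Check that S_yy(e^{jw}) = sigma_a^2 H(w) H(w)^H is rank one at all frequencies.
rng = np.random.default_rng(3)
m, N, sigma_a2 = 2, 4, 1.0
h = rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N))

for w in np.linspace(0, 2 * np.pi, 8, endpoint=False):
    Hw = (h * np.exp(-1j * w * np.arange(N))).sum(axis=1)   # H(e^{jw}), m x 1
    Syy = sigma_a2 * np.outer(Hw, Hw.conj())                 # m x m spectral density
    ev = np.linalg.eigvalsh(Syy)                             # ascending eigenvalues
    assert ev[-1] > 1e-12 and np.all(ev[:-1] < 1e-10)        # exactly one nonzero
print("S_yy(e^{jw}) has rank one at all tested frequencies")
```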
4 GRAM-SCHMIDT ORTHOGONALIZATION, TRIANGULAR FACTORIZATION AND LINEAR PREDICTION

4.1 UDL FACTORIZATION OF THE INVERSE COVARIANCE MATRIX
Consider a vector of zero-mean random variables Y = [y_1^H y_2^H ... y_M^H]^H. We shall introduce the notation Y_{1:M} = Y. Consider Gram-Schmidt orthogonalization of the components of Y. We can determine the linear least-squares (lls) estimate ŷ_i of y_i given Y_{1:i-1} and the associated estimation error ỹ_i as

ŷ_i = ŷ_{i|1:i-1} = R_{y_i Y_{1:i-1}} R_{Y_{1:i-1} Y_{1:i-1}}^{-1} Y_{1:i-1},   ỹ_i = ỹ_{i|1:i-1} = y_i - ŷ_i,   (9)

where R_{ab} = E a b^H for two random column vectors a and b. The Gram-Schmidt orthogonalization is a recursive process, starting with ỹ_1 = y_1. We can write the resulting relation as

L Y = Ỹ   (10)
where L is a unit-diagonal lower triangular matrix. The first i-1 elements in row i of L are -R_{y_i Y_{1:i-1}} R_{Y_{1:i-1} Y_{1:i-1}}^{-1}. From (10), we obtain

E (LY)(LY)^H = E Ỹ Ỹ^H   ⇒   L R_{YY} L^H = D = R_{ỸỸ}.   (11)

D is indeed a diagonal matrix since the ỹ_i are decorrelated. Equation (11) can be rewritten as the UDL triangular factorization of R_{YY}^{-1}:

R_{YY}^{-1} = L^H D^{-1} L.   (12)
If Y is filled up with consecutive samples of a random process, Y = [y^H(k) y^H(k-1) ... y^H(k-M+1)]^H, then the ỹ_i become backward prediction errors of order i-1, the corresponding rows in L are backward prediction filters, and the corresponding diagonal elements in D are backward prediction error variances. If the process is stationary, then R_{YY} is Toeplitz and the backward prediction error filters and variances (and hence the UDL factorization of R_{YY}^{-1}) can be determined using a fast algorithm, the Levinson algorithm. If Y is filled up in a different order, i.e. Y = [y^H(k) y^H(k+1) ... y^H(k+M-1)]^H, then the backward prediction quantities become forward prediction quantities, which for the prediction error filters and variances are the same as the backward quantities if the process y(.) is scalar valued. If the process y(.) is vector valued (say with m components), then there are two ways to proceed. We can do the Gram-Schmidt procedure to orthogonalize a vector component with respect to previous vector components. In this way, we obtain successively ỹ_i = ỹ_{i|1:i-1}. Applied to the multichannel time-series case, we obtain vector-valued prediction errors and multichannel prediction quantities. The UDL factorization of R_{YY}^{-1} now becomes

R_{YY}^{-1} = L'^H D'^{-1} L'   (13)
in which L' and D' are block matrices with m × m blocks. L' is block lower triangular and its block rows contain the multichannel prediction error filter coefficients. The diagonal blocks in particular are I_m. D' is block diagonal, the diagonal containing the m × m prediction error variances D'_i. Alternatively, we can carry out the Gram-Schmidt factorization scalar component by scalar component. This is multichannel linear prediction with sequential processing of the channels. By doing so, we obtain the (genuine) UDL factorization

R_{YY}^{-1} = L^H D^{-1} L   (14)
in which L is again lower triangular with unit diagonal and D is diagonal. When the vector process y(.) is stationary, R_{YY} is block Toeplitz and appropriate versions of the multichannel Levinson algorithm can generate both triangular factorizations in a fast way. The relationship between the two triangular factorizations (13) and (14) is the following. Consider the UDL factorizations of the inverses of the blocks D'_i on the diagonal of D':

D'_i{}^{-1} = L''_i{}^H D''_i{}^{-1} L''_i   (15)

and let

L'' = blockdiag{L''_1, ..., L''_M},   D'' = blockdiag{D''_1, ..., D''_M}.   (16)
Then we get from (13)-(16)

R_{YY}^{-1} = L^H D^{-1} L = L'^H L''^H D''^{-1} L'' L'   ⇒   L = L'' L',  D = D'',   (17)
by uniqueness of UDL triangular factorizations. If the matrix R_{YY} is singular, then there exist linear relationships between certain components of Y. As a result, certain components y_i will be perfectly predictable from the previous components and their resulting orthogonalized version ỹ_i will be zero. The corresponding diagonal entry in D will hence be zero also. For the orthogonalization of the following components, we don't need this ỹ_i. As a result, the entries under the diagonal in the corresponding column of L can be taken to be zero. The (linearly independent) row vectors in L that correspond to zeros in D are vectors that span the null space of R_{YY}. The number of non-zero elements in D equals the rank of R_{YY}.
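The sequential orthogonalization of this subsection can be illustrated directly on a covariance matrix. The sketch below (generic random covariance, not data from the paper) builds the unit-diagonal lower triangular L row by row from the lls predictors of (9) and checks that L R L^H is diagonal, cf. (11).

```python
import numpy as np

# Gram-Schmidt orthogonalization of the components of Y via its covariance R_YY.
rng = np.random.default_rng(4)
M = 6
A = rng.standard_normal((M, M))
R = A @ A.T                                   # an assumed nonsingular covariance R_YY

L = np.eye(M)
for i in range(1, M):
    c = np.linalg.solve(R[:i, :i], R[:i, i])  # lls predictor of y_i from y_1..y_{i-1}
    L[i, :i] = -c                             # row i of L: [-predictor, 1, 0, ...]
D = L @ R @ L.T                               # diagonal matrix of prediction error variances
print(np.allclose(D, np.diag(np.diag(D))))    # True, cf. (11)
```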
4.2 LDU FACTORIZATION OF A COVARIANCE MATRIX
Assume at first that R_{YY} is nonsingular. Since the ỹ_i form just an orthogonal basis for the space spanned by the y_i, Y can be perfectly estimated from Ỹ. Expressing that the covariance matrix of the error in estimating Y from Ỹ is zero leads to

0 = R_{YY} - R_{YỸ} R_{ỸỸ}^{-1} R_{ỸY}   ⇒   R_{YY} = (R_{YỸ} R_{ỸỸ}^{-1}) R_{ỸỸ} (R_{ỸỸ}^{-1} R_{ỸY}) = U^H D U,   (18)
where D is the same diagonal matrix as in (14) and U = L^{-H} is a unit-diagonal upper triangular matrix. (18) is the LDU triangular factorization of R_{YY}. In the stationary multichannel time-series case, R_{YY} is block Toeplitz and the rows of U and the diagonal elements of D can be computed in a fast way using a sequential processing version of the multichannel Schur algorithm. When R_{YY} is singular, then D will contain a number of zeros, equal to the dimension of the nullspace of R_{YY}. Let J be a selection matrix (the rows of J are rows of the identity matrix) that selects the nonzero elements of D, so that J D J^H is a diagonal matrix that contains the consecutive non-zero diagonal elements of D. Then we can write

R_{YY} = (JU)^H (J D J^H) (JU),   (19)

which is a modified LDU triangular factorization of the singular R_{YY}. (JU)^H is a modified lower triangular matrix, its columns being a subset of the columns of the lower triangular matrix U^H. A modified version of the Schur algorithm to compute the generalized LDU factorization of a singular block Toeplitz matrix R_{YY} has been recently proposed in [7].
5 SIGNAL AND NOISE SUBSPACES
Consider now the measured data with additive independent white noise v(k) with zero mean, and assume E v(k) v^H(k) = σ_v^2 I_m with unknown variance σ_v^2 (in the complex case, real and imaginary parts are assumed to be uncorrelated; colored noise with known correlation structure but unknown variance could equally well be handled). A vector of L measured
data can be expressed as

Y_L(k) = T_L(H_N) A_{L+N-1}(k) + V_L(k),   (20)
where Y_L(k) = [y^H(k) ... y^H(k-L+1)]^H and V_L(k) is similarly defined. Therefore, the structure of the covariance matrix of the received signal y(k) is

R_L^Y = E Y_L(k) Y_L^H(k) = T_L(H_N) R_{L+N-1}^A T_L^H(H_N) + σ_v^2 I_{mL},   (21)
where R_M^A = E A_M(k) A_M^H(k). We assume R_M^A to be nonsingular for any M. For L ≥ L̲, and assuming the FIR assumption (3) and the no-zeros assumption, H(z) ≠ 0 ∀z, to hold, T_L(H_N) has full column rank and σ_v^2 can be identified as the smallest eigenvalue of R_L^Y. Replacing R_L^Y by R_L^Y - σ_v^2 I_{mL} gives us the covariance matrix for noise-free data. Given the structure of R_L^Y in (21), the column space of T_L(H_N) is called the signal subspace and its orthogonal complement the noise subspace. Consider the eigendecomposition of R_L^Y, of which the real positive eigenvalues are ordered in descending order:

R_L^Y = Σ_{i=1}^{L+N-1} λ_i V_i V_i^H + Σ_{i=L+N}^{mL} λ_i V_i V_i^H = V_S Λ_S V_S^H + V_N Λ_N V_N^H,   (22)

where Λ_N = σ_v^2 I_{(m-1)L-N+1} (see (21)). The sets of eigenvectors V_S and V_N are orthogonal, V_S^H V_N = 0, and λ_i > σ_v^2, i = 1, ..., L+N-1. We then have the following equivalent descriptions of the signal and noise subspaces
Range{V_S} = Range{T_L(H_N)},   V_N^H T_L(H_N) = 0.   (23)
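The following NumPy sketch illustrates (21)-(23) for an assumed random channel and white symbols: the smallest eigenvalues of R_L^Y equal σ_v^2, and the corresponding eigenvectors span the noise subspace, orthogonal to T_L(H_N).

```python
import numpy as np

# Signal/noise subspace structure of R^Y_L (all parameters are illustrative).
rng = np.random.default_rng(5)
m, N, sigma_a2, sigma_v2 = 3, 4, 1.0, 0.1
h = rng.standard_normal((m, N))
L = 4                                                  # L >= ceil((N-1)/(m-1))

def block_toeplitz_TL(h, L):
    m, N = h.shape
    T = np.zeros((L * m, L + N - 1))
    for r in range(L):
        T[r * m:(r + 1) * m, r:r + N] = h
    return T

T = block_toeplitz_TL(h, L)
R = sigma_a2 * T @ T.T + sigma_v2 * np.eye(m * L)      # eq. (21) with R^A = sigma_a2 * I
lam, V = np.linalg.eigh(R)                             # ascending eigenvalues
noise_dim = m * L - (L + N - 1)
Vn = V[:, :noise_dim]                                  # noise-subspace eigenvectors
print(np.allclose(lam[:noise_dim], sigma_v2))          # smallest eigenvalues = sigma_v^2
print(np.allclose(Vn.T @ T, 0, atol=1e-8))             # V_N^H T_L(H_N) = 0, cf. (23)
```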
6 ZF EQUALIZER AND NOISE SUBSPACE DETERMINATION FROM SECOND-ORDER STATISTICS BY MULTICHANNEL LINEAR PREDICTION
We consider now the noiseless covariance matrix, or equivalently assume noise-free data: v(t) ≡ 0. We shall also assume the transmitted symbols to be uncorrelated, R_M^A = σ_a^2 I_M, though the noise subspace parameterization we shall obtain also holds when the transmitted symbols are correlated. Consider now the Gram-Schmidt orthogonalization of the consecutive (scalar) elements in the vector Y_L(k). We start building the UDL factorization of (R_L^Y)^{-1} and obtain the consecutive prediction error filters and variances. No singularities are encountered until we arrive at block row L̲, in which we treat the elements of y(k-L̲+1). From the full column rank of T_{L̲}(H_N), we infer that we will get m̃ = mL̲ - (L̲+N-1) ∈ {0, 1, ..., m-2} singularities. If m̃ > 0, then the following scalar components of Y become zero after orthogonalization: ỹ_i(k-L̲+1) = 0, i = m+1-m̃, ..., m. So the corresponding elements in the diagonal factor D are also zero. We shall call the corresponding rows in the triangular factor L singular prediction filters. For L = L̲+1, T_{L̲+1}(H_N) has m more rows than T_{L̲}(H_N) but only one more column. Hence the (column) rank increases by one only. As a result, ỹ_1(k-L̲) is not zero
in general, while ỹ_i(k-L̲) = 0, i = 2, ..., m (we assume h_1(N-1) ≠ 0; the ordering of the channels can always be permuted to make this true since h(N-1) ≠ 0). Furthermore, since T_{L̲}(H_N) has full column rank, the orthogonalization of y_1(k-L̲) w.r.t. Y_{L̲}(k) is the same as the orthogonalization of y_1(k-L̲) w.r.t. A_{L̲+N-1}(k). Hence, since the a_k are assumed to be uncorrelated, only the component of y_1(k-L̲) along a_{k-L̲-N+1} remains: ỹ_1(k-L̲) = h_1(N-1) a_{k-L̲-N+1}. This means that the corresponding prediction filter is (proportional to) a ZF equalizer! Since the prediction error is white, a further increase in the length of the prediction span will not improve the prediction. Hence ỹ_1(k-L) = h_1(N-1) a_{k-L-N+1}, L ≥ L̲, and the prediction filters in the corresponding rows of L will be appropriately shifted versions of the prediction filter in row mL̲ + 1. Similarly, for the prediction errors that are zero, a further increase of the length of the prediction span cannot possibly improve the prediction. Hence ỹ_i(k-L) = 0, i = 2, ..., m, L ≥ L̲. The singular prediction filters further down in L are appropriately shifted versions of the first m-1 singular prediction filters. Furthermore, the entries in these first m-1 singular prediction filters that appear under the 1's ("diagonal" elements) are zero, for reasons we explained before in the general orthogonalization context. So we get a (rank one) white prediction error with a finite prediction order. Hence the channel output process y(k) is autoregressive. Due to the structure of the remaining rows in L being shifted versions of the first ZF equalizer and the first m-1 singular prediction filters, after a finite "transient", L becomes a banded lower triangular block Toeplitz matrix. Consider now L > L̲ and let us collect all consecutive singular prediction filters in the triangular factor L into a ((m-1)(L-L̲)+m̃) × (mL) matrix 𝒢_L. The row space of 𝒢_L is the (transpose of the) noise subspace. Indeed, every singular prediction filter belongs to the noise subspace since 𝒢_L T_L(H_N) = 0, all rows in 𝒢_L are linearly independent since they are a subset of the rows of a unit-diagonal triangular matrix, and the number of rows in 𝒢_L equals the noise subspace dimension. 𝒢_L is a banded block Toeplitz matrix of which the first m-1-m̃ rows have been omitted. 𝒢_L is in fact parameterized by the first m-1 singular prediction filters. Let us collect the nontrivial entries in these m-1 singular prediction filters into a column vector G_N. So we can write 𝒢_L(G_N). The length of G_N can be calculated to be
m̃((L̲-1)m + m - m̃) + (m-1-m̃)((L̲-1)m + m - m̃ + 1) = mN - 1,   (24)

which equals the actual number of degrees of freedom in H_N (the channel can only be determined up to a scalar factor, hence the -1). So 𝒢_L(G_N) represents a minimal linear parameterization of the noise subspace.
7 CHANNEL IDENTIFICATION BY COVARIANCE MATRIX TRIANGULAR FACTORIZATION
Consider now the triangular factorization of R_L^Y. Since R_L^Y is singular, we shall end up with a factorization of the form (19). For L = L̲, we have exactly m̃ singularities. Going from L = L̲ to L = L̲+1, the rank increases by only one and only one column gets added to (JU)^H. Since the corresponding nonzero orthogonalized variable is ỹ_1(k-L̲) = h_1(N-1) a_{k-L̲-N+1},
the corresponding column in the factor (JU)^H of R_L^Y is

E Y_L(k) ỹ_1^H(k-L̲) = h_1^H(N-1) σ_a^2 [0_{1×mL̲}  h^H(N-1) ... h^H(0)  0 ...]^H,   (25)
which hence contains the channel impulse response, apart from a multiplicative factor. For the remaining columns of (JU)^H, we have ỹ_1(k-L) = h_1(N-1) a_{k-L-N+1}, L ≥ L̲, and hence the remaining columns of (JU)^H are just shifted-down versions of the column in (25). Hence, after a finite transient, (JU)^H becomes a lower triangular block Toeplitz matrix. The elements of this block Toeplitz matrix are a certain multiple of the multichannel impulse response coefficients. Since the channel output is obviously a multichannel moving average process, R_L^Y and its triangular factor (JU)^H are banded. This is the time-domain equivalent of the frequency-domain spectral factorization result. In the frequency domain, the channel is obtained as the minimum-phase spectral factor of the power spectral density matrix. In the time domain, the channel is obtained by triangular factorization of the covariance matrix. Due to a combination of the FIR assumption and the singularity of the power spectral density matrix, this time-domain factorization reaches a steady state after a finite number of recursions. Recall that in the single-channel case (MA process), the minimum-phase spectral factor can also be obtained from the triangular factorization of the covariance matrix. However, the factorization has to be pursued until infinity for convergence of the last line of the triangular factor to the minimum-phase spectral factor to occur.
8 CHANNEL ESTIMATION FROM AN ESTIMATED COVARIANCE SEQUENCE BY SUBSPACE FITTING
See [5] for a discussion of this approach.
9 CHANNEL ESTIMATION FROM DATA USING DETERMINISTIC ML
The transmitted symbols a_k are considered deterministic; the stochastic part is considered to come only from the additive noise, which we shall assume Gaussian and white with zero mean and unknown variance σ_v^2. We assume the data Y_M(k) to be available. The maximization of the likelihood function boils down to the following least-squares problem

min_{H_N, A_{M+N-1}(k)}  || Y_M(k) - T_M(H_N) A_{M+N-1}(k) ||_2^2.   (26)
The optimization problem in (26) is separable. Eliminating A_{M+N-1}(k) in terms of H_N, we get

min_{H_N}  || P^⊥_{T_M(H_N)} Y_M(k) ||_2^2   (27)
subject to a nontriviality constraint on H_N. In order to find an attractive iterative procedure for solving this optimization problem, we should work with a minimal parameterization of the noise subspace, which we have obtained before. Indeed,

P^⊥_{T_M(H_N)} = P_{𝒢_M^H(G_N)}.   (28)
The number of degrees of freedom in H_N and in G_N is mN-1 in both cases (the proper scaling factor cannot be determined). So H_N can be uniquely determined from G_N and vice versa. Hence, we can reformulate the optimization problem in (27) as

min_{G_N}  || P_{𝒢_M^H(G_N)} Y_M(k) ||_2^2.   (29)

Due to the (almost) block Toeplitz character of 𝒢_M, the product 𝒢_M Y_M(k) represents a convolution. Due to the commutativity of convolution, we can write 𝒢_M(G_N) Y_M(k) = 𝒴_N(Y_M(k)) [1 G_N^H]^H for some properly structured 𝒴_N(Y_M(k)). This leads us to rewrite (29) as

min_{G_N}  [1 G_N^H] 𝒴_N^H(Y_M(k)) (𝒢_M(G_N) 𝒢_M^H(G_N))^{-1} 𝒴_N(Y_M(k)) [1 G_N^H]^H.   (30)
This optimization problem can now easily be solved iteratively in such a way that in each iteration a quadratic problem appears [2]. An initial estimate may be obtained from the subspace fitting approach discussed above. Such an initial estimate is consistent, and hence one iteration of (30) will be sufficient to generate an estimate that is asymptotically equivalent to the global optimizer of (30). Cramer-Rao bounds have been obtained and analyzed in [5].
References
[1] Ph. Loubaton. "Egalisation autodidacte multi-capteurs et systèmes multivariables". GDR 134 (Signal Processing) working document, February 1994, France.
[2] L.L. Scharf. Statistical Signal Processing. Addison-Wesley, Reading, MA, 1991.
[3] D.T.M. Slock. "Blind Fractionally-Spaced Equalization, Perfect-Reconstruction Filter Banks and Multichannel Linear Prediction". In Proc. ICASSP 94 Conf., Adelaide, Australia, April 1994.
[4] D.T.M. Slock. "Blind Joint Equalization of Multiple Synchronous Mobile Users Using Oversampling and/or Multiple Antennas". In Proc. 28th Asilomar Conf. on Signals, Systems and Computers, Pacific Grove, CA, Oct. 31 - Nov. 2, 1994.
[5] D.T.M. Slock and C.B. Papadias. "Blind Fractionally-Spaced Equalization Based on Cyclostationarity". In Proc. Vehicular Technology Conf., Stockholm, Sweden, June 1994.
[6] L. Tong, G. Xu, and T. Kailath. "A New Approach to Blind Identification and Equalization of Multipath Channels". In Proc. of the 25th Asilomar Conference on Signals, Systems & Computers, pages 856-860, Pacific Grove, CA, Nov. 1991.
[7] K. Gallivan, S. Thirumalai, and P. Van Dooren. "A Block Toeplitz Look-Ahead Schur Algorithm". In Proc. 3rd International Workshop on SVD and Signal Processing, Leuven, Belgium, Aug. 22-25, 1994.
REDUCTION OF GENERAL BROAD-BAND NOISE IN SPEECH BY TRUNCATED QSVD: IMPLEMENTATION ASPECTS
S.H. JENSEN
ESAT - Department of Electrical Engineering, Katholieke Universiteit Leuven, Kardinaal Mercierlaan 94, B-3001 Heverlee, Belgium. [email protected]
P.C. HANSEN
UNI-C, Building 304, Technical University of Denmark, DK-2800 Lyngby, Denmark. [email protected]
S.D. HANSEN, J.A. SORENSEN
Electronics Institute, Building 349, Technical University of Denmark, DK-2800 Lyngby, Denmark. {sdh,jaas}@ei.dtu.dk
ABSTRACT. In many speech processing applications an appropriate filter is needed to remove the noise. The truncated SVD technique has a noise filtering effect and, provided that the noise is white, it can be applied directly in noise reduction algorithms. However, for non-white broad-band noise a pre-whitening operation is necessary. In this paper, we focus on implementation aspects of a newly proposed QSVD-based algorithm for reduction of general broad-band noise in speech. A distinctive advantage of the algorithm is that the prewhitening operation is an integral part of the algorithm, and this is essential in connection with updating issues in real-time applications. We compare the existing implementation (based on the QSVD) with an implementation based on the ULLV decomposition that can be updated at a low cost. KEYWORDS. Speech processing, speech enhancement, noise reduction, quotient singular value decomposition, ULLV decomposition.
1 INTRODUCTION
At a noisy site, e.g., the cabin of a moving vehicle, speech communication is affected by the presence of acoustic noise. This effect is particularly serious when linear predictive coding (LPC) [7] is used for the digital representation of speech signals at low bit rates as, for instance, in digital mobile communication. Low-frequency acoustic noise severely affects the estimated LPC spectrum in both the low- and high-frequency regions. Consequently, the intelligibility of digitized speech using LPC often falls below the minimum acceptable level. In [4], we described an algorithm, based on the quotient singular value decomposition (QSVD) [2], for reduction of general broad-band noise in speech. Our algorithm, referred to as the truncated QSVD algorithm hereafter, first arranges a segment of speech-plus-noise samples and a segment of noise samples in two separate Hankel matrices, say H and N. Then it computes the QSVD of (H, N) and modifies H by filtering and truncation, and finally it restores the Hankel structure of the modified H. In this way, the pre-whitening operation becomes an integral part of the algorithm. The resulting modified data segment can be considered as enhanced speech. In this paper, we focus on implementation aspects of the truncated QSVD algorithm. Section 2 summarizes the truncated QSVD algorithm. Section 3 addresses some updating issues related to the algorithm and suggests an implementation by means of a related decomposition, the rank-revealing ULLV decomposition [6]. Section 4 contains experiments that compare the truncated QSVD algorithm with an implementation based on the ULLV decomposition.
2 THE TRUNCATED QSVD ALGORITHM
We consider a noisy signal vector of N samples:

x = [x_0, x_1, ..., x_{N-1}]^T,   (1)

and we assume that the noise is additive,

x = x̄ + n,   (2)

where x̄ contains the signal component and n represents the noise. From x we construct the following L × M Hankel matrix H, where M + L = N + 1 and L ≥ M:
H = [ x_0      x_1      ...  x_{M-1}
      x_1      x_2      ...  x_M
      ...      ...           ...
      x_{L-1}  x_L      ...  x_{N-1} ].   (3)
We can always write H as

H = H̄ + N,   (4)

where H̄ and N represent, respectively, the Hankel matrices derived from x̄ and n in (2). Moreover, we assume that H̄ is rank deficient, rank(H̄) = K < M, and that H and N have full rank, rank(H) = rank(N) = M. These assumptions are, e.g., satisfied when the samples x̄_i of x̄ consist of a sum of K sinusoids, and the samples n_i of n consist of white noise
or non-white broad-band noise. A sinusoidal model has often been attributed to speech signals, cf. [9]. Our interest in the ordinary SVD is to formulate a general noise reduction algorithm that can be applied directly to the data matrix H = H̄ + N, and give reliable estimates of H̄. Provided that the signal is uncorrelated with the noise, in the sense that H̄ is orthogonal to N: H̄^T N = 0, and the noise is white, in the sense that N has orthogonal columns and every column of N has norm σ_noise: N^T N = σ_noise^2 I, then the MV estimate of H̄ can be found [1] from the SVD of H:

H = U diag(σ_1, ..., σ_M) V^T,   (5)

where U and V are matrices with orthonormal columns. Specifically, by setting the K × K filter matrix
F = diag( 1 - σ_noise^2/σ_1^2, ..., 1 - σ_noise^2/σ_K^2 ),   (6)

the MV estimate of H̄ is given by

H̄_est = U ( F diag(σ_1, ..., σ_K)   0 ;  0   0 ) V^T.   (7)
Unfortunately, H̄_est is not Hankel. So, to obtain a signal vector x̂ corresponding to the MV estimates, we need to make a Hankel matrix approximation to H̄_est. A simple way to compute a Hankel matrix approximation is to arithmetically average every antidiagonal of H̄_est, and put each average value as a common element in the corresponding antidiagonal of a new Hankel matrix of the same dimension [10]. If the noise is broad-band but not white, N^T N ≠ σ_noise^2 I, then a pre-whitening matrix R^{-1} can always be applied to H. From N, which is constructed from x in "silent" periods in the speech, we can compute R via a QR decomposition,

N = Q R.   (8)
In the colored-noise case, we then consider the matrix

X = H R^{-1} = H̄ R^{-1} + N R^{-1};   (9)

the pre-whitening operation does not change the nature of the linear model, while it diagonalizes the covariance matrix of the noise, as shown by (N R^{-1})^T (N R^{-1}) = Q^T Q = I. It follows that the MV estimate of H̄ R^{-1} can be found by applying the same procedure as outlined above. The only modification is that the MV estimate of H̄ R^{-1} should be de-whitened before restoring the Hankel structure. The MV estimate is, of course, not the only possible estimate of H̄ or H̄ R^{-1}. For example, by setting
F = diag( (1 - σ_noise^2/σ_1^2)^{1/2}, ..., (1 - σ_noise^2/σ_K^2)^{1/2} ),   (10)

we obtain the well-known estimate used in [5], and by setting

F = I_K,   (11)
we obtain the classical LS estimate. We are now in a position to formulate the general SVD-based noise reduction algorithm using conventional pre-whitening.

Algorithm 1
Input: (H, N) and K. Output: H̄.
1. Compute the QR decomposition of N:
   N = Q R.   (12)
2. Perform a pre-whitening of H:
   X = H R^{-1}.   (13)
3. Compute the SVD of X:
   X = U diag(σ_1, ..., σ_M) V^T.   (14)
4. Truncate X to rank K and filter diag(σ_1, ..., σ_K) by F:
   X_est = U ( F diag(σ_1, ..., σ_K)   0 ;  0   0 ) V^T.   (15)
5. Perform a de-whitening of X_est:
   Z = X_est R.   (16)
6. Compute H̄ from Z by arithmetic averaging along its antidiagonals:
   H̄ = [ x̄_0      x̄_1   ...  x̄_{M-1}
         x̄_1      x̄_2   ...  x̄_M
         ...      ...         ...
         x̄_{L-1}  x̄_L   ...  x̄_{N-1} ],   (17)
   where
   x̄_i = (1/(β - α + 1)) Σ_{k=α}^{β} Z(i-k+2, k),   (18)
   with α = max(1, i - L + 2) and β = min(M, i + 1).

In principle, one should repeat Steps 2-6 until H̄ converges to a Hankel matrix with exact rank K. In practice, we do not perform this iteration. A major disadvantage of Algorithm 1 is that the explicit use of the matrix R^{-1} may result in loss of accuracy in the data. Moreover, it is complicated to update the matrix X = H R^{-1} when H and N are updated, e.g. in a recursive algorithm. The explicit use of R, and also the QR decomposition of N, can be avoided by working directly with H and N using the QSVD of the pair of matrices (H, N), which delivers the required factorization
463
without forming quotients and products. To see that this is true, write the QSVD of the matrix pair (H, N) as H = U diag(~l, ~2,..., ~M) | -1,
N - V diag(#l, # 2 , . . . , #M) | -1,
(19)
where U and V are matrices with orthonormal columns, and | is nonsingular. In addition, we use that R. = QTN. By substituting this and (19) into X = H R -1 we obtain X = H i t -1 = U diag(~x/#l,..., ~M/#M)(QTv)T;
(20)
i.e., U, diag(~l/#t,...,~M/~M) and Q T v in the QSVD of ( H , N ) are identical to the SVD of X = HR. -1, with ai = ~i/#i. Accordingly, Algorithm 1 can be reformulated by means of the QSVD as outlined below: Algorithm 2 (Truncated QSVD)
Input: (H, N) and K. Output: I:I. 1. Compute the QSVD of (H, N): H = U diag(~l,...,~M)|
N = V diag(#l,...,#M) |
(21)
2. Truncate H to rank K and filter diag(~l,...,~g) by F:
I'[Iest---- U
(Fdiag(~o''"~K)
oO) 0 _ 1.
(22)
3. Compute I:I from Hest by arithmetic averaging along its antidiagonals, cf. Step 6 in Algorithm 1. Notice that the pre-whitening operation is now an integral part of the algorithm, not a separate step. We mention in passing that a similar use of the truncated QSVD is suggested in connection with regularization problems in [3]. 3
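As a concrete illustration, here is a minimal NumPy sketch of Algorithm 1 (not the authors' MATLAB implementation): QR-based pre-whitening, truncated SVD with the MV filter of (6), de-whitening, and antidiagonal averaging. The synthetic signal, noise level and all dimensions below are assumptions for demonstration only.

```python
import numpy as np

def hankel(x, L, M):
    """L x M Hankel matrix from the samples x (L + M = len(x) + 1)."""
    return np.array([[x[l + m] for m in range(M)] for l in range(L)])

def truncated_svd_denoise(H, Nmat, K):
    _, R = np.linalg.qr(Nmat)                         # step 1: N = Q R
    X = H @ np.linalg.inv(R)                          # step 2: pre-whitening
    U, s, Vt = np.linalg.svd(X, full_matrices=False)  # step 3: SVD
    s2_noise = np.mean(s[K:] ** 2)                    # estimated noise power
    f = 1.0 - s2_noise / s[:K] ** 2                   # MV filter (6)
    Xest = (U[:, :K] * (f * s[:K])) @ Vt[:K, :]       # step 4: truncate and filter
    Z = Xest @ R                                      # step 5: de-whitening
    L, M = H.shape                                    # step 6: antidiagonal averaging
    x_hat = np.zeros(L + M - 1)
    for i in range(L + M - 1):
        ms = range(max(0, i - L + 1), min(M, i + 1))
        x_hat[i] = np.mean([Z[i - m, m] for m in ms])
    return x_hat

# Tiny usage example on synthetic data:
rng = np.random.default_rng(7)
n = np.arange(40)
clean = np.exp(-0.02 * n) * np.cos(0.5 * n)
noise = 0.1 * rng.standard_normal(80)
H = hankel(clean + noise[:40], 21, 20)
Nm = hankel(noise[40:], 21, 20)
print(truncated_svd_denoise(H, Nm, K=2)[:5])
```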
3 IMPLEMENTATION ASPECTS
In real-time signal processing, it is desirable to update matrix decompositions instead of recomputing them. In connection with the truncated QSVD algorithm, we see from (22) that we, in general, need to update U, diag(ξ_1, ξ_2, ..., ξ_M), diag(μ_1, μ_2, ..., μ_M), and Θ. Notice that F in (6) and (10) requires the first K singular values of H R^{-1}, i.e., the first K quotient singular values of (H, N), and the quantity σ_noise^2, which can be estimated as σ_noise^2 = (M - K)^{-1} Σ_{i=K+1}^{M} σ_i^2. Algorithms for updating the QSVD are a topic of current research. In any case, the QSVD would be difficult to update and may not be the best choice in a practical application. A promising alternative is to use the rank-revealing ULLV decomposition, which can be updated at a low cost. In this decomposition, the matrix pair (H, N) is written as

H = Ũ L L_1 W^T,   N = Ṽ L_1 W^T,   (23)
where Ũ, Ṽ, and W are matrices with orthonormal columns, while L and L_1 are lower triangular matrices. In particular,

L = ( L_K  0 ;  G  E ),   (24)

where L_K is K × K with σ_min(L_K) ≈ σ_K, and ||G||_F^2 + ||E||_F^2 ≈ σ_{K+1}^2 + ... + σ_M^2; σ_min(L_K) denotes the smallest singular value of L_K, and ||G||_F (and ||E||_F) is the Frobenius norm of G (and E). By analogy with (20), we can then write

H R^{-1} = Ũ L (Q^T Ṽ)^T;   (25)

i.e., the rank K of H R^{-1} is displayed in the matrix L in that L_K is well conditioned and ||G||_F and ||E||_F are small. In addition, Ũ_1 = [ũ_1, ..., ũ_K] (and Ũ_2 = [ũ_{K+1}, ..., ũ_M]) represents approximately the same space as U_1 = [u_1, ..., u_K] (and U_2 = [u_{K+1}, ..., u_M]). The same is true for Ṽ and V. Hence, the ULLV decomposition in (23) yields essentially the same rank and subspace information as the QSVD does, and the approximate subspaces are typically very accurate. The advantage of the ULLV decomposition is that the matrices H and N can be recursively updated in an efficient way, which will lead to an adaptive algorithm for reduction of broad-band noise in speech. The implementation of the truncated QSVD algorithm by means of the ULLV decomposition is straightforward when the LS estimate is used. In this case, Step 2 in Algorithm 2 becomes:

H̄_LS = Ũ ( L_K  0 ;  0  0 ) L_1 W^T.   (26)
4
EXPERIMENTS
Algorithm 2 (Truncated QSVD) with Step 2 implemented respectively by the QSVD and the ULLV decomposition were programmed in MATLAB; the QSVD was computed using a stable QSVD algorithm implemented along the lines described in [11] and the ULLV decomposition was computed using an adaptive ULLV-decomposition algorithm implemented along the lines described in [6]. The output of the truncated QSVD algorithm implemented respectively by the QSVD and the ULLV decomposition was compared by the R.MS log spectral distortion, which is widely used in many speech processing systems. Define f to be the normalized frequency. Let 17"~10(f)l be the 10'th order LPC model spectrum of the original speech segment ~ = [~'0, 21,...Z,g-1] T, and let ~10(f) be the 10'th order LPC model spectrum of the the enhanced speech segment ~ = [xo,~l,...'I'N-1] T.
Reduction of General Broad-band Noise in Speech (b)
(a) 20
9
9
9
20 123 "O
.~
465
10
(D t-
I -20' 0
20
c~
\ ' ' ' 1000 2000 3000 f r e q u e n c y in Hz
9
0
E -10
4000
0
lo'oo
x;oo
3ooo
4000
3ooo
4ooo
f r e q u e n c y in H z (d)
(c) 9
20
9
m 10 ._c
m 1o ._ "Q o
-o
0
r
r
0~ t~
-]0
E -10 -20 o
looo
2ooo
3ooo
f r e q u e n c y in Hz
4ooo
-2o o
l o~o
2o~o
f r e q u e n c y in H z
Figure 1: LPC model spectrum of (a) segment containing noise-free voiced speech sounds. (b) segment containing noisy speech; SNR = 5 dB. (c) segment containing enhanced speech; QSVD. (d) segment containing enhanced speech; ULLV decomposition.
Then the RMS log spectral distortion, in dB, is defined as [/1/2 ] 1/2 . d2 = 20 tJ_l/2(log 17~1o(f)1 - log 17S[lo(f)l)2df[.
(27)
We used real speech signals sampled at 8 kHz, noise with spectral density of the form
S(f) = (4s- - cos(27rf)) -1, and (N,M,K) = (160,20,14). Our experiments show that for the LS estimate, the algorithm implemented by means of ULLV decomposition computes the enhanced speech segment such that d2 in most cases is less that 1 dB above d2 obtained with the algorithm implemented by means of QSVD. A difference of 1 dB is normally not audible. In Figure 1 we show an example where the speech segment contains voiced speech sounds and the signal-to-noise ration (SNR) is 5 dB. We see that the LPC spectrum of the enhanced speech segment matches the LPC spectrum of the noise-free speech segment much more closely in the regions near the peaks (formants) than the noisy one does. We also see that the enhanced speech segment obtained with the ULLV decomposition (d2 = 3.04 dB) is close to the segment obtained with the QSVD (d2 = 2.67 dB). This shows that the ULLV decomposition is a promising method in noise reduction of speech signals.
Acknowledgements
Søren Holdt Jensen's research was supported by the European Community Research Program HCM, contract no. ERBCHBI-CT92-0182. We thank Franklin Luk of Rensselaer Polytechnic Institute and Sanzheng Qiao of McMaster University for MATLAB routines for computing the ULLV decomposition.
References
[1] B. De Moor. The singular value decomposition and long and short spaces of noisy matrices. IEEE Trans. Signal Processing 41, pp 2826-2838, 1993.
[2] B. De Moor and H. Zha. A tree of generalizations of the ordinary singular value decomposition. Lin. Alg. and its Applic. 147, pp 469-500, 1991.
[3] P.C. Hansen. Regularization, GSVD and truncated GSVD. BIT 29, pp 491-504, 1989.
[4] S.H. Jensen, P.C. Hansen, S.D. Hansen, and J.A. Sørensen. Reduction of broad-band noise in speech by truncated QSVD. Report ESAT-SISTA/TR 1994-16, Dept. Electrical Engineering, Katholieke Universiteit Leuven, March 1994 (21 pages); revised version of Report UNIC-93-06, UNI-C, July 1993 (29 pages); IEEE Trans. Speech and Audio Processing.
[5] S.Y. Kung, K.S. Arun, and D.V.B. Rao. State-space and singular value decomposition-based approximation methods for the harmonic retrieval problem. J. Opt. Soc. Am. 73, pp 1799-1811, 1983.
[6] F.T. Luk and S. Qiao. A new matrix decomposition for signal processing. In: M.S. Moonen, G.H. Golub, and B.L.R. De Moor (Eds.), Linear algebra for large scale and real-time applications. Kluwer Academic Publishers, Dordrecht, The Netherlands, pp 241-247, 1993.
[7] J.D. Markel and A.H. Gray Jr. Linear prediction of speech. Springer-Verlag, New York, N.Y., 1976.
[8] R. Mathias and G.W. Stewart. A block QR algorithm and the singular value decomposition. Lin. Alg. and its Appl. 182, pp 91-100, 1993.
[9] R.J. McAulay and T.F. Quatieri. Speech analysis/synthesis based on a sinusoidal representation. IEEE Trans. Acoust., Speech, Signal Processing 34, pp 744-754, 1986.
[10] S. Van Huffel. Enhanced resolution based on minimum variance estimation and exponential data modeling. Signal Processing 33, pp 333-355, 1993.
[11] C.F. Van Loan. Computing the CS and generalized singular value decomposition. Numer. Math. 46, pp 479-491, 1985.
SVD-BASED MODELLING OF MEDICAL NMR SIGNALS
R. DE BEER, D. VAN ORMONDT, F.T.A.W. WAJER, Delft Univ. of Technology, Applied Physics Laboratory, P.O. Box 5046, 2600 GA Delft, The Netherlands. [email protected]. S. CAVASSILA, D. GRAVERON-DEMILLY, Univ. Claude Bernard Lyon I, Laboratoire RMN, 69622 Villeurbanne Cedex, France. [email protected]. S. VAN HUFFEL, Katholieke Univ. Leuven, ESAT Laboratory, 3001 Heverlee, Belgium. [email protected].
ABSTRACT. A Magnetic Resonance (MR) scanner enables one to noninvasively detect and quantify biochemical substances at selected positions in patients. The MR signal detected by the scanner comprises a number of damped sinusoids in the time domain. Each chemical substance contributes at least one sinusoid, at a specific frequency. Often, metabolite quantification is severely hampered by interference with strong MR signals of water and other substances (e.g. fat) resident in the human body. The damping function of the interfering sinusoids is nonexponential, which aggravates the situation. We investigate subtraction of interfering sinusoids by means of SVD-based state space modelling. Among other things, modelling of all details of the signal is attempted (zero-error modelling). Rank criteria for three alternative Hankel data matrices guaranteeing zero-error modelling, are provided. Measures to meet the criteria are proposed and tested on simulated signals. Although zero-error modelling can indeed be achieved, it turns out that such modelling does not guarantee exact separation of the unwanted sinusoids from the wanted sinusoids. KEYWORDS. SVD, magnetic modelling
resonance
spectroscopy, in vivo, state space, zero-error
468 1
R. de Beer et al.
INTRODUCTION
A Magnetic Resonance (MR) scanner enables one to noninvasively detect and quantify biochemical substances at selected positions in patients [1]. Often the signal processing attendant on quantification is hampered by interference from MR signals of water and other substances (e.g. fat) resident in the human body. Removal of such signals prior to the quantification step is desired. Fig. 1 shows a typical example, transformed from the measurement domain (= time domain) to the frequency domain by FFT for display reasons. See caption of Fig.1 for details. |
--
!
water
~r
-0.05
0
0.05
0.1
a
!
!
0.15
0.2
0.25
c0/2~ Figure 1: State space processing of an in vivo MR time domain signal from the right parietal lobe of a human brain. The results are displayed in the frequency domain by applying FFT and plotting the real part. a) The spectral region of interest of the raw data minus the two first data points which are corrupted. A huge unwanted peak of nondescript (i.e., no model function available) shape originating from water dominates, b) Spectrum after subtraction of the water contribution as parametrized with seven sinusoids using the SVD-based state space algorithm of Kung et al. [2]. In addition, two relatively strongly damped sinusoids were subtracted at w / 2 r = 0.101 and 0.252, WNyquist = ~. (Note that the mentioned omission of initial datapoints causes perturbation of the baseline that may be hard to distinguish from genuine MR features. This is a mere display problem.) Apparently, Kung et al.'s SVD-based state space method [2] is capable of modelling nondescript (i.e., model function unknown) components in terms of exponentially damped sinusoids. Other examples of this feat were shown in the recent past [3]. However, several important aspects of modelling nondescript signals are yet to be resolved. First, criteria guaranteeing success of the modelling seem lacking. This is crucial in the context of automated processing of large numbers of signals. Success is defined here as the degree to
469
SVD-based Modelling o f Medical NMR Signals
which all details of the data can be accounted for. Ref.[4] advocates 'zero-error modelling', which aims to reduce the residue of model minus data to zero. Second, little is known about the extent to which successful modelling yields the true physical parameters. This is important for establishing whether removal of unwanted components from the signal by subtracting the related part of the model does affect the wanted components. In the present contribution we address both aspects 9 2
METHOD
2.1
PRELIMINARIES
For reasons of space, we assume that the reader is familiar with the essence of Kung et al.'s SVD-based state space method [2, 5, 6, 7]. We recall that the starting point of the method is rearrangement of the complex-valued data xn, n = 0 , 1 , . . . , N - 1, into an L x M Hankel data matrix X whose elements are Xl,m = Xl+m-2, l = 1, 2 , . . . , L, and m = 1, 2 , . . . , M, and L + M = N + 1, i.e.
=
X
XO
;r,1
Xl
X2
X2
9 9"
XM-1
999
XM
~
. 9
(I)
.
XL-1
XL
999
XN-1
Next, X is subjected to SVD, according to X = U A V H. If the signal xn comprises K exponentially damped sinusoids, Ck exp[(c~k + iwk)n], and no noise, the rank of X equals K (k = 1 , 2 , . . . , K , with K 1, for one or more values of k, the model function must be adapted according to Eq.(6). The latter is required only in the second stage where the amplitudes are estimated. Once zero-error modelling has been achieved, the next task is to identify those sinusoids that represent the original wanted signal, and to evaluate the extent to which the unwanted components affect the estimates of the wanted parameters. 3
RESULTS AND DISCUSSION
The method described above was tried on a noiseless simulated signal comprising a single exponentially damped sinusoid perturbed by the 2 outliers e, shown in Fig.2. Two cases, differing in frequency of the sinusoid, were considered. Details about the chosen parameters
472
R. de Beer et al.
and the results of the zero-error modelling are listed in Tablel. For w/21r = 0.1230, the singular values of the 17x 16 regularized data matrix Xr are )q = 7.644, A2 = 1.366, )~3 through Als = 1.000, )q6 = 0.876, which indicates that the rank of Xr is well defined and indeed full. Nearly the same singular values are found for w/2~r = 0.15625. The small residues quoted
1
9176
0.5
9
0. -0.5
-~0
9
o
9 i..". 9
9
9
9~
i
~
~o
~o
~0
n Figure 2: Simulated signal, comprising one noiseless, exponentially damped, sinusoid and two outliers of magnitude e, at n = 15 and 31, indicated by arrows. Table 1 lists the results of zero-error modelling for e = 1 and two different frequencies of the sinusoid. in the footnote of Table 1 show that zero-error modelling can be achieved. For both cases, the original sinusoid can be located at k = 6. Since all parameters have been estimated, subtraction of unwanted components on the basis of zero-error modelling is feasible. At the same time, this particular example shows limitations of the procedure. The estimates for k = 6 can be seen to deviate somewhat from the true parameters, listed immediately underneath. The deviations depend on the frequency, which is related to the fact that the power spectrum of the regularizing signal peaks at w/2~r = m / 1 6 , m = - 8 , - 7 , . . . , 7. For m = 2, w/2~r = 0.12500, which is near 0.1230; for m = 2.5, w/21r = 0.15625. Apparently, the parameters are better when the frequency of the wanted sinusoid is close to that of an unwanted sinusoid. Note the 'reverse damping' and small initial amplitude for k = 5, w / 2 r = 0.15625. At present we can not yet offer quantitative explanations of the observed phenomena. An analytic state space solution for the simple signal of Fig.2 is sought. Finally, we point out that in real-world cases, it may be difficult to distinguish between a wanted component and an unwanted component occupying the same frequency region. It is unfortunate that prior knowledge about the wanted part of the signal, such as relative frequencies and phases, cannot yet be exploited by SVD-based state space modelling. 4
SUMMARY 9
AND CONCLUSIONS
Zero-error state space modelling is possible if the data matrix satisfies certain rank criteria. (Proof omitted for reasons of space.)
9 A procedure for imposing the rank criteria (regularization) on the data matrix of an arbitrary signal, is devised. This regularization does not entail addition of random noise to the data. 9 Our state space solution is not restricted to exponentially damped sinusoids.
SVD-based Modelling of Medical NMR Signals
9
473
Zero-error state space modelling is capable of quantification of nondescript unwanted components in MR time domain signals. The spectrum of the unwanted components strongly overlaps with that of wanted components. The perturbation of parameter estimates of the wanted components is studied.
Table 1: Results of zero-error a SVD-based state space modelling for a single exponentially damped sinusoid augmented by two unit outliers as indicated in Fig.1. The true values of the sinusoid are listed below line k = 6. L = 17, M = 16, N = 32. ck = [ck[ exp(i~k). Two frequencies are considered: w/27r = 0.1230, 0.15625, with WNyquist 71". • 1. ----
;0)
"--
k
I~kl
~o~
~k
~k/2~
Ickl
e , ~ ~1l o g x-VT=~
=
can be introduced, so that the equation can be written in the form: g = LF
(13)
where L F is the vector whose components are given by: (LF)n = (Y,r
n = 1,...,N
(14)
N As L is a finite rank operator, it is possible to introduce its singular system {an; v,.,, un}n=l [2] so that:
I,v, = an un
Z* un = an vn
(15)
From this definition it immediately follows that the singular values and the singular vectors axe respectively the square root of the eigenvalues and the eigenvectors of the operator LL* which can be decomposed in the form: (16)
LL* = G T w
with W the weight matrix defined by: n,m = 1,...,N
Wnm = 6nmWm
(17)
and G the Gram matrix whose (n, m) entry is given by:
Grim = (r162
n,m = 1,...,N
(lS)
The elements of the Gram matrix have been computed analytically [12]; the result is: f o r m > n: Gmn =
4
3 log
for m = n: 1 G.~.~ - (em)s 8 log 2 F o r m < n:
1+~
2
1- ~
en(em) :z
log e__.~_+~ Em En(Em)2
=~
(20)
Bremsstrahlung Spectra Emitted by Solar Plasma 2.2
479
LAPLACE TRANSFORM
It is well-known that the inversion of the Laplace transform is a severely ill-posed problem [1]. It follows that, in order to obtain stable estimates of the solution, it may be necessary to include apriori information on the solution in the regularisation algorithm. An example of this fact is given by the reconstruction of functions which are assumed to be zero in the origin. In this case it is useful to choose, as the source space X, the Sobolev space Hi(O, a) endowed with the scalar product:
(/, g)x =
/'(y)g'(y)dy
(22)
and with the property: f(0) = 0
(23)
The linear inverse problem with discrete data (10) can be written again in the operator form (13), with L f = (f, Cn)x, though, this time, the functions Cn are given by the solution of the boundary value problem:
r
= o
r
= 0
(24)
One obtains explicitly: 1 1
exp(-eny) - Y exp(-ena)
= 1,..., N
(25)
and so the (n, m) entry of the Gram matrix is given by: 1
"{
1
exp[-(en+ era)a]+
+
+ aexp[-(en + era)a] + en -I- em 1 1 +--exp[-(en + era)a]- - - e x p ( - e n a ) + f-m
s
1 1 exp(-ema) "~ +--exp[-(en + era)a]- e--~
J
(-n
3
NUMERICAL
RESULTS
The knowledge of the singular system of the two integral operators allows to apply Tikhonov regularisation [13] to the reconstruction both of the averaged distribution function F ( E ) and of the differential emission measure ~(T). Tikhonov regularisation solution is defined as the function which minimises the functional:
~m[f] =
IILf-
gll~" + AIIfll~
(27)
We remark that, if X is the Sobolev space Hi(0, a), then the minimisation of (27) implies the minimisation of the functional:
9 ~[f] =
IILI- gll}
+ A
/i If'(Y)12dY
(28)
480
M. Piana
in the subset of the functions satisfying condition (23). In general, the regularised solution can be represented in terms of the singular system of the integral operator, explicitly [3]: N
Orn
f~ = ~ ~ + ~ (g, u~)~
(29)
r~=l
The choice of the regularisation parameter )~ is a crucial problem in Tikhonov method. To this aim, there exist several criteria whose efficiency sensitively depends on the particular problem which is studied. In this paper two methods have been applied [9]: Morozov's Discrepancy Principle, according to which the best )~is given by the solution of the equation: IIL/~ - glIY - ~
(30)
(where ~ is the RMS error on the data) and the Generalised Cross Validation, which provides the value of A minimising the function: V(A) = ( N - 1 T r [ I - A(A)])-2(N-11[[I- A(A)]gll 2)
(31)
It is interesting to note that, owing to their iU-posedness, the solutions of the inverse problems (9) and (10) call be known ollly with infinite uncertainty. Nevertheless it is possible to estimate the propagation error from the data to the regularised solution by calculating the so called confidence limit which is obtained by performing several regularised reconstructions corresponding with different realisations of the data vector computed by modifying the real data with random components with zero mean and variance equal to unity. The result is a "confidence strip" whose upper and lower borders are the confidence limits of the reconstructed function. As regards the estimate of the resolution, it is sufficient to observe that not all the singular functions significatively contribute to the regularised solution through equation (29). More precisely, this sum can be truncated at the value n = M so that the relative variation between the truncated solution: M
o"n
f ~ = ~ ~ + ~(g, u~)~
(32)
r~=l
and (29) is less than the relative error on the data. As the singular function of order n has n - 1 zeroes in (0, oo), it follows that the regularised solution cannot contain details in the interval between two adjacent zeroes of the last significant singular function. Besides several simulated cases, Tikhonov regularisation has been applied to a representative sample (figure 1) of the temporal series of spectra recorded by Germanium detectors during the solar flare of 27 June 1980 [11], in order to recover both the averaged electron distribution function and the differential emission measure. Figure 2a represents the "confidence strip" corresponding to equation (9); the value of the regularisation parameter is obtained by GCV and the error and resolution bars have been plotted on the strip at some discrete points, representing the geometric means between adjacent zeroes of the last significant singular function. In figure 2b, there is another reconstruction of F ( E ) , in which the value of A has been chosen by Discrepancy Principle. As one can see, GCV is characterised by undersmoothing properties while Discrepancy Principle provides oversmoothed reconstructions; this different performance between the two methods is even more
Bremsstrahlung Spectra Emitted by Solar Plasma
I
103
102
'
I
'
'
'
I
'
'
'
!
-
I I
I I
100
I
! I I
4.*
I
I
II
10 "1
iI
JO
3 o,
'
,
c_
L
I
o
I01
I
>.
'
I
,
3
'
481
'IIllI
10_ 2
lu"31.0
-
i
,
i
i
i
50
|
|
|
I
100
'
I
(keY]
e
Figure 1: Real data vector from the HIREX instrument for the June 27 1980 solar flare
-
I
I
i
i
I
i
i i I
I
i
|
I
i
i
I
iii
I
i
i
b
104
103 3
~"
102
L
L
Ld
100 i
20
,
,
, , i ill
80 100
E [keV]
i
200
i
i
20
i
|,
,,,I
i
80 100
200
l
E [keV]
Figure 2: Regularised reconstruction of the electron averaged distribution function F ( E ) : (a) A = 2.4x10 -6 (GCV). (b) A = 1.13x10 -5 (Discrepancy Principle).
482
M. Piana
evident in the case of Laplace inversion, where the reconstruction provided by GCV is still completely unstable; on the contrary, figure 3 shows the regularised differential emission measure obtained by Tikhonov technique, applied in the Sobolev space Hi(0, a), when the regularisation parameter is chosen by Discrepancy Principle. 10-1
::' '1
I
'
'
' ' ' '"1
f!11111
10-2
_. . . . P. w 4.,
........
: ~'?~'~"I'.%~
10-3
10-4
_
o
h
.,~
10"5
i--
I0 -{~
( ~ \ \ ~ .
10-7
"lIIlIil ,I
10 7
.......
I
10 8 T {K}
........
I
10 9
Figure 3: Regularised reconstruction of the differential emission measure ~(T); A = 6.7x10 -1~ (Discrepancy Principle) .
4
COMMENTS AND CONCLUSIONS
The study of ill-posed problems is usually developed by projecting the integral equation onto a finit dimensional space and then by treating the corresponding ill-conditioned linear system with opportune numerical methods. On the contrary, in this paper, we have considered discrete data but we have maintained the solution in infinite-dimensional spaces; this approach has been favoured by the possibility to analytically compute the Gram matrix both in the case of the Volterra operator in L2(0, oc) and in the case of the Laplace operator in Hi(0, a). Tikhonov regularisation in L2(0, c~) seems to be efficient in order to recover the averaged electron distribution function F ( E ) from the real data of figure 1. In particular, from a phenomenological point of view, the two reconstructions obtained by GCV and Discrepancy Principle put in evidence a double power law, typical of flare manifestations, with a spectral slope at electron energy E _ 40 keV. Nevertheless, in the GCV regularised solution it is possible to note a further spectral slope at E ~ 80 keV. The inversion of the Laplace transform is a more severely ill-conditioned problem and so, in order to exhibit a physically significant reconstruction of the differential emission measure, it has been necessary to adopt higher order smoothness assumptions. Then Tikhonov regularisation has been applied in a Sobolev space with the prescription that the solution is zero at the origin. The regularised ~(T) form comprises two components: the first one shows a distribution of relatively cool material with a peak at T _ 107 ~ the second
one is an "ultra hot" component peaking at 4.5x10 s ~ Such a ~(T) structure has numerous interpretations in the field of the theoretical modelisation of magnetic reconnection mechanisms typical of solar flares.
Acknowledgements It is a pleasure to thank R.P. Lin and C. Johns for providing the experimental data. This work has been partly supported by Consorzio INFM.
References
[1] M. Bertero, P. Brianzi and E.R. Pike. On the recovery and resolution of exponential relaxation rates from experimental data. III. The effect of sampling and truncation of data on the Laplace transform inversion. Proc. R. Soc. Lond. A 398, pp 23-44, 1985.
[2] M. Bertero, C. De Mol and E.R. Pike. Linear inverse problems with discrete data. I: General formulation and singular system analysis. Inverse Problems 1, pp 301-330, 1985.
[3] M. Bertero, C. De Mol and E.R. Pike. Linear inverse problems with discrete data: II. Stability and regularisation. Inverse Problems 4, pp 573-594, 1988.
[4] J.C. Brown. The deduction of energy spectra of non-thermal electrons in flares from the observed dynamic spectra of hard X-ray bursts. Solar Phys. 18, pp 489-502, 1971.
[5] J.C. Brown and A.G. Emslie. Analytic limits on the forms of spectra possible from optically thin collisional bremsstrahlung source models. Astrophys. J. 331, pp 554-564, 1988.
[6] J.C. Brown and D.F. Smith. Solar flares. Rep. Prog. Phys. 43, pp 125-197, 1980.
[7] R. Courant and D. Hilbert. Methods of mathematical physics. Interscience, New York, 1989.
[8] I.J.D. Craig and J.C. Brown. Inverse problems in astronomy. Adam Hilger, Bristol, 1986.
[9] A.R. Davies. Optimality in regularisation. In: M. Bertero and E.R. Pike (Eds.), Inverse problems in scattering and imaging, Adam Hilger, Bristol, pp 393-410, 1992.
[10] H.W. Koch and J.W. Motz. Bremsstrahlung cross section formulas and related data. Rev. Mod. Phys. 31, pp 920-955, 1959.
[11] R.P. Lin and R.A. Schwartz. High spectral resolution measurements of a solar flare hard X-ray burst. Astrophys. J. 312, pp 462-474, 1987.
[12] M. Piana. Inversion of bremsstrahlung spectra emitted by solar plasma. Astron. Astrophys. 288, pp 949-959, 1994.
[13] A.N. Tikhonov. Solution of incorrectly formulated problems and the regularisation method. Sov. Math. Dokl. 4, pp 1035-1038, 1963.
AUTHORS INDEX
Ammann L.P., 333 Barlow J.L., 167 Beltrami E., 5 Berry M.W., 123 Bertero M., 341 Boley D., 3 Boley D., 295 Bunch J.R., 175 Cardoso J.-F., 375 Cavassila S., 467 Chen H., 391 de Beer R., 467 De Lathauwer L., 383 De Mol C., 341 De Moor B.L.R., 61 De Moor B.L.R., 383 DeGroat R.D., 227 DeGroat R.D., 235 Dehaene J., 259 Deprettere E.F., 267 Deprettere E.F., 277 Dewilde P., 209 Dologlou I., 323 Dowling E.M., 227 Dowling E.M., 235 Drmač Z., 107 Drmač Z., 115 Fierro R.D., 175 Fierro R.D., 183 Fu Z., 235 Gallivan K., 199 Gander W., 349 Gersemsky F., 287 Golub G.H., 139 Golub G.H., 349 Götze J., 251 Graveron-Demilly D., 467
Hanke M., 131 Hansen P.C., 131 Hansen P.C., 183 Hansen P.C., 459 Hansen S.D., 459 Helmke U., 33 Hosur S., 295 Hüper K., 251 Jensen S.H., 459 Ks B., 207 Kirsteins I.P., 415 Kruth J.P., 367 Kullstam J.A., 423 Lemmerling P., 191 Linebarger D.A., 227 Linebarger D.A., 235 Lorenzelli F., 243 Luk F.T., 305 Ma W., 367 Moonen M., 259 Moonen M., 267 O'Leary D.P., 315 Otte D., 357 Pan C.-T., 157 Park H., 399 Paul S., 251 Piana M., 475 Qiao S., 149 Ramos J.A., 433 Rosen J.B., 399 Shah A.A., 407 Skowronski J., 323 Slock D.T.M., 449 Solna K., 139 Sorensen D.C., 21 Sorensen J.A., 459 Stewart M., 441
Strebel R., 349 Tang P.T.P., 157 Tewfik A.H., 295 Thao N.T., 79 Thirumalai S., 199 Tufts D.W., 407 Tufts D.W., 415 Vaccaro R.J., 407 van Dijk H.W., 277 Van Dooren P., 139 Van Dooren P., 199 Van Dooren P., 441 Van Huffel S., 191 Van Huffel S., 391 Van Huffel S., 399 Van Huffel S., 467 van Ormondt D., 467 Vandevoorde D., 305 Vandewalle J., 259 Vandewalle J., 383 Vandewalle J., 391 Vanhamme L., 191 Vanpoucke F., 267 Varadhan S., 123 Verriest E.I., 423 Verriest E.I., 433 Veselić K., 115 Vetterli M., 79 von Matt U., 99 Wayer F.T.A.W., 467 Yang B., 287 Yao K., 243 Ye H., 227 Yoon P.A., 167 Zha H., 167