LARGE SCALE EIGENVALUE PROBLEMS
NORTH-HOLLAND MATHEMATICS STUDIES
NORTH-HOLLAND -AMSTERDAM
127
NEW YORK OXFORD *TOKYO
LARGE SCALE EIGENVALUE PROBLEMS Proceedings of the IBM Europe Institute Workshop on Large Scale Eigenvalue Problems held in Oberlech, Austria, July 8-12, 1985
Edited by:
Jane CULLUM and
Ralph A. WILLOUGHBY Mathematical Sciences Department IBM Thomas J. Watson Research Center Yorktown Heights, New York, U.S.A.
1986
NORTH-HOLLAND -AMSTERDAM
NEW YORK
OXFORD *TOKYO
© Elsevier Science Publishers B.V., 1986. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.
ISBN: 0 444 70074 9
Publishers:
ELSEVIER SCIENCE PUBLISHERS B.V. P.O. Box 1991 1000 BZ Amsterdam The Netherlands
Sole distributors for the U.S.A. and Canada:
ELSEVIER SCIENCE PUBLISHING COMPANY, INC. 52 Vanderbilt Avenue New York, N.Y. 10017 U.S.A.
Library of Congress Cataloging-in-Publication Data
IBM Europe Institute on Large Scale Eigenvalue Problems (1985 : Oberlech, Austria) Large scale eigenvalue problems. (North-Holland mathematics studies ; 127) Includes bibliographies and index. 1. Eigenvalues--Congresses. 2. Eigenvalues--Data processing--Congresses. I. Cullum, Jane K., 1938- . II. Willoughby, Ralph A. III. Title. IV. Series. QA193.I26 1985 512.9'434 86-13544 ISBN 0-444-70074-9
PRINTED IN THE NETHERLANDS
PREFACE
The papers which are contained in this book were presented at the IBM Europe Institute Workshop on Large Scale Eigenvalue Problems which was held at Oberlech, Austria, July 8-12,
1985. This Workshop was one in a series of summer workshops sponsored by the IBM World Trade Corporation for European scientists.
The unifying theme for this Workshop was ‘Large Scale Eigenvalue Problems’. The papers contained in this volume are representative of the broad spectrum of current research on such problems. The papers fall in four principal categories:
(1) Novel algorithms for solving large eigenvalue problems
(2) Use of novel computer architectures, vector and parallel
(3) Computationally-relevant theoretical analyses
(4) Science and engineering problems where large scale eigenelement computations have provided new insight.
Most of the papers in this volume are readily accessible to the reader who has some knowledge of mathematics. A few of the papers require more mathematical knowledge. In each case, additional papers on these subjects are available from the authors of these papers. The interested reader can obtain such reprints by writing to the appropriate authors. A complete list of the names and addresses of the authors is included at the end of this book. A corresponding list
of the Workshop speakers who were not able to submit papers is also included. Interested readers should consult both lists.
Jane Cullum Ralph A. Willoughby Program Organizers April 1986
TABLE OF CONTENTS
Introduction to the Proceedings . . . . . . . . . . . . . . . . . . . . . . . . . .   1
    Jane Cullum and Ralph A. Willoughby

High Performance Computers and Algorithms from Linear Algebra . . . . . . . . . . .  15
    Jack J. Dongarra and Daniel C. Sorensen

The Impact of Parallel Architectures on the Solution of Eigenvalue Problems . . . .  31
    Ilse C.F. Ipsen and Youcef Saad

Computing the Singular Value Decomposition on a Ring of Array Processors . . . . .  51
    Christian Bischof and Charles Van Loan

Quantum Dynamics with the Recursive Residue Generation Method:
Improved Algorithm for Chain Propagators . . . . . . . . . . . . . . . . . . . . .  67
    Robert E. Wyatt and David S. Scott

Eigenvalue Problems and Algorithms in Structural Engineering . . . . . . . . . . .  81
    Roger G. Grimes, John G. Lewis and Horst D. Simon

A Generalised Eigenvalue Problem and the Lanczos Algorithm . . . . . . . . . . . .  95
    Thomas Ericsson

Numerical Path Following and Eigenvalue Criteria for Branch Switching . . . . . . . 121
    Yong Feng Zhou and Axel Ruhe

The Lanczos Algorithm in Molecular Dynamics: Calculation of Spectral Densities . . 143
    Giorgio Moro and Jack H. Freed

Investigation of Nuclear Dynamics in Molecules by Means of the Lanczos Algorithm . 163
    Erwin Haller and Horst Koppel

Examples of Eigenvalue/Vector Use in Electric Power System Problems . . . . . . . . 181
    James E. Van Ness

A Practical Procedure for Computing Eigenvalues of Large Sparse
Nonsymmetric Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
    Jane Cullum and Ralph A. Willoughby

Computing the Complex Eigenvalue Spectrum for Resistive Magnetohydrodynamics . . . 241
    Wolfgang Kerner

Ill-Conditioned Eigenproblems . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
    Francoise Chatelin

Stably Computing the Kronecker Structure and Reducing Subspaces of
Singular Pencils A - λB for Uncertain Data . . . . . . . . . . . . . . . . . . . . 283
    James Demmel and Bo Kågström

Addresses of Authors and Other Workshop Speakers . . . . . . . . . . . . . . . . . 325

Index to Proceedings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
Large Scale Eigenvalue Problems, J. Cullum and R.A. Willoughby (Editors) © Elsevier Science Publishers B.V. (North-Holland), 1986
INTRODUCTION TO PROCEEDINGS Jane Cullum Ralph A. Willoughby IBM T. J. Watson Research Center Yorktown Heights, New York 10598 USA
We provide a brief summary of each paper contained in this volume, indicate some of the relationships between these papers, and give a few additional references for the interested reader.
The papers included in this volume can be classified into the following four categories. (1) Novel algorithms for solving large eigenvalue problems
See the papers by Zhou and Ruhe, by Kerner, and by Cullum and Willoughby.
(2) Use of novel architectures for solving eigenvalue problems
See the papers by Dongarra and Sorensen, by Ipsen and Saad, and by Bischof and Van Loan. The paper by Dongarra and Sorensen addresses the question of restructuring both the EISPACK [1977] and the LINPACK [1979] library routines for novel architectures.
(3) Computationally-relevant theoretical analyses
See the papers by Demmel and Kågström, by Chatelin, and by Ericsson.
(4) Examples from science and engineering where large scale eigenvalue and eigenvector computations have provided new insight into fundamental properties and characteristics of physical systems, both those existing in nature and those which have been constructed artificially. See the papers by Grimes, Lewis, and Simon, by Van Ness, by Kerner, by Moro and Freed, and by Haller and Koppel.
Most of the currently active areas of research in modal analysis are represented in this volume. With the exception of the three papers dealing with novel architectures which are presented first, the papers are presented in an ordering which takes us from the 'easiest' problems, the real symmetric eigenvalue problems, to the most difficult ones, the computation of the Kronecker canonical forms of general matrix pencils.
Many engineering and scientific applications yield very large matrices. Historically, the sizes of the matrices which must be used have grown as the computing power has grown. Therefore, there is much interest in understanding how to exploit the new vector and parallel architectures in such computations. The first paper by Dongarra and Sorensen addresses two basic questions dealing with such architectures. First, they look at the types of computers which are currently available and at those which should be available within the next few years. They then examine the question: how do we exploit such architectures for linear algebra computations? They include introductory descriptions of classifications for the various arithmetic engines and storage hierarchies. This discussion is followed by a table of advanced computers, proposed and existing, together with tables of characteristics of these machines.
Dongarra and Sorensen include some discussion of data communication and how that relates to algorithm performance. The cost of algorithm execution can be dominated by the amount of memory traffic rather than by the number of floating point operations involved. A performance classification is given for algorithms on a vector computer: scalar, vector, and super-vector. Data management and synchronization add to the complications in designing algorithms for parallel computers.
The authors also address such issues as program complexity, robustness, ease of use, and portability, each of which plays an important role in the analysis. Certain basic vector and matrix-vector operations are fundamental to many linear algebra algorithms and these are examined very carefully. The authors contend that it is possible to achieve a reasonable fraction
of the peak performance on a wide variety of different architectures through the use of program modules that handle certain basic procedures. These matrix-vector modules form an excellent basis for constructing linear algebra programs for vector and parallel processors. With this philosophy, the machine dependent code is isolated to a few modules. The basic routines provided
in the linear equation solving package, LINPACK [1979], and in the EISPACK library [1977] for eigenvalue and eigenvector computations, achieve vector but not super-vector performance.
The paper by Ipsen and Saad presents a brief survey of recent research on multiprocessor (parallel) architectures and on algorithms for numerical linear algebra computations which take advantage of such architectures. The emphasis is on algorithms for solving symmetric eigenvalue problems. Basic terminology is introduced, and data communication problems such as start-up times and synchronization are discussed. Three loosely coupled architectures are considered: a processor ring, a two-dimensional processor grid, and the hypercube.
The paper by Bischof and Van Loan looks at one of the parallel architectures which is available today, the LCAP configuration designed by Enrico Clementi of IBM, and at the problem of implementing a block Jacobi singular value decomposition algorithm on LCAP. LCAP consists of ten Floating Point Systems FPS-164/Max array processors connected in a ring structure via large bulk memories. The algorithm is a block generalization of a parallel Jacobi scheme which appeared in the paper by Brent, Luk, and Van Loan [1985]. The parallel procedure developed in the Bischof and Van Loan paper could also be applied to the real symmetric eigenvalue problem. The authors, however, did not achieve the speedups which others had predicted. This paper is a good illustration of the difficulties and considerations encountered in translating an idea for a parallel algorithm into a practical procedure.
Many of the papers in this volume use procedures which rest upon the so-called Lanczos recursion. For more details and background information on this recursion, the reader is referred to Parlett [1980] and Cullum and Willoughby [1985]. Both of these books contain bibliographies with references to much of the recent research on Lanczos procedures. A brief survey of recent research in this area is contained in Cullum and Willoughby [1985b].
The paper by Wyatt and Scott provides an example of the use of a real symmetric Lanczos procedure. The objective is to compute time dependent quantal transition probabilities. These transition probabilities are obtainable from differences of survival amplitudes for surviving in a particular state at time t given that we started in that state at time t=0. These survival amplitudes can be computed if all of the eigenvectors of an associated Hamiltonian operator are known. However, an eigenvector decomposition of this operator cannot be obtained easily.
The Lanczos algorithm provides real symmetric tridiagonal representations of this Hamiltonian which reduce (at least theoretically) the survival amplitude computations to computations of the eigenvalues of tridiagonal matrices and computations of the first components of the eigenvectors of these tridiagonal matrices. Both of these computations are reasonable. Specifically, Wyatt and Scott compute survival amplitudes of the form

$$A(t) = \sum_{\alpha=1}^{M} |s_{1\alpha}|^2 e^{-iE_\alpha t} \qquad (1.1)$$

where M is the size of the Lanczos matrix being used and s_{1α} denotes the first component of the eigenvector of that tridiagonal matrix corresponding to the eigenvalue E_α.
Wyatt and Scott use the Lanczos recursion with no reorthogonalization because they have very large matrices, and therefore the amount of computer storage which would be required by the Lanczos methods which require reorthogonalization would be too large. However, if the Lanczos vectors are not reorthogonalized, then extra or 'spurious' eigenvalues can appear among the eigenvalues of the Lanczos matrices. These are not genuine representations of the eigenvalues of the original matrix, and these eigenvalues must be handled appropriately if the results are to have any validity. Such eigenvalues can however be identified very easily, see Cullum and Willoughby [1985], and in earlier papers Wyatt and coauthors were using this identification test. However, the main point of this current paper is that for their particular application it is not necessary to sort out these Lanczos eigenvalues. All of the computed quantities can be used in their computations and they still get correct results. They demonstrate this numerically. A partial explanation for this follows directly from the characterization of these spurious eigenvalues given in Cullum and Willoughby [1985]. The spurious eigenvalues are eigenvalues of a particular submatrix of the Lanczos matrix being considered, and because of this the first components of their eigenvectors are pathologically small. Therefore, their contributions to the sum in Eqn. (1.1) are pathologically small.
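The identification test can be sketched in a few lines. The following is a minimal illustration in modern Python/NumPy notation (the function names and tolerance are ours, not from the papers): run the plain Lanczos recursion with no reorthogonalization, then flag as spurious every eigenvalue of the tridiagonal Lanczos matrix T that is also an eigenvalue of the submatrix obtained by deleting the first row and column of T.

```python
import numpy as np

def lanczos(A, v, m):
    # Plain symmetric Lanczos with no reorthogonalization: returns the
    # diagonal (alpha) and off-diagonal (beta) of the m x m tridiagonal
    # Lanczos matrix generated from the starting vector v.
    n = A.shape[0]
    alpha, beta = np.zeros(m), np.zeros(m - 1)
    v = v / np.linalg.norm(v)
    v_prev, b = np.zeros(n), 0.0
    for j in range(m):
        w = A @ v - b * v_prev
        alpha[j] = v @ w
        w = w - alpha[j] * v
        if j < m - 1:
            b = np.linalg.norm(w)
            beta[j] = b
            v_prev, v = v, w / b
    return alpha, beta

def identify_spurious(alpha, beta, tol=1e-8):
    # Cullum-Willoughby test: an eigenvalue of T that is matched (to within
    # tol) by an eigenvalue of T with its first row and column deleted is
    # flagged as spurious.
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    ev = np.linalg.eigvalsh(T)
    ev_sub = np.linalg.eigvalsh(T[1:, 1:])
    spurious = np.array([np.abs(ev_sub - e).min() < tol for e in ev])
    return ev, spurious
```

In practice m is taken much smaller than the matrix size; the flagged eigenvalues are discarded, or, as Wyatt and Scott observe for their application, harmlessly retained because the first eigenvector components that weight them are pathologically small.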
Algorithms exist for computing eigenvalues of large, real symmetric generalized eigenvalue problems Ax = λBx where A and B are real symmetric and B is positive definite. However, there are many open questions regarding this problem when neither A nor B is definite. Grimes, Lewis and Simon focus on this problem. They first provide a survey of the types of eigenvalue/eigenvector problems encountered in structural engineering problems. They then
discuss extensions of procedures designed for the standard real symmetric eigenvalue problem to such problems. When B is not definite, these 'symmetric' problems are genuinely nonsymmetric and many numerical difficulties can be experienced in trying to solve them. Unfortunately, in many structural problems, B is only positive semidefinite.
Grimes, Lewis and Simon outline the two most common classes of structural engineering problems, vibration and buckling analyses. In vibration analyses, the higher frequency modes of vibration are not important because it is unlikely that they will be excited. In buckling analyses usually only the smallest positive eigenvalue and corresponding eigenvector are required. Because of the slow convergence of such eigenvalues in the numerical algorithms designed thus far, it is typical to factor one or more of the matrices involved in the eigenvalue computations and to use such factorizations to transform the desired eigenvalues into eigenvalues with dominant magnitudes. When matrices A - σB are factored, the method being used is called a shift-and-invert method.
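The spectral transformation behind shift-and-invert is easy to verify numerically: if Ax = λBx, then (A - σB)⁻¹Bx = x/(λ - σ), so the eigenvalues of the pencil closest to the shift σ become the eigenvalues of largest magnitude of the transformed operator, which are exactly the ones that power and Lanczos iterations find fastest. A small self-contained check (Python/NumPy; the random pencil and shift are our own illustration, with B positive definite for simplicity):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

# A hypothetical small symmetric pencil A x = lambda B x, with B symmetric
# positive definite so that the eigenvalues are real.
M = rng.standard_normal((n, n))
B = M @ M.T + n * np.eye(n)
A = rng.standard_normal((n, n))
A = (A + A.T) / 2

# Reference eigenvalues of the pencil via the reduction B = L L^T.
L = np.linalg.cholesky(B)
Linv = np.linalg.inv(L)
lam = np.linalg.eigvalsh(Linv @ A @ Linv.T)

# Shift-and-invert operator: the eigenvalues of (A - sigma B)^{-1} B are
# 1/(lam - sigma), so eigenvalues of the pencil near the shift sigma are
# mapped to the dominant ones.
sigma = lam[0] + 0.1
op = np.linalg.solve(A - sigma * B, B)
mu = np.linalg.eigvals(op).real
print(np.allclose(np.sort(mu), np.sort(1.0 / (lam - sigma))))
```

Production codes never form an explicit inverse; they factor A - σB once and apply the transformed operator through triangular solves.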
Grimes, Lewis, and Simon also list some of the practical considerations which must be faced by any numerical analyst designing algorithms which are to be used within the constraints of structural engineering packages. For example, typically there are restrictions on the way the required data is stored and can be accessed. For this reason block versions of modal algorithms have a number of desirable features for structural engineering calculations. A detailed discussion of a block Lanczos procedure for the problem Ax = λBx, where A and B are real symmetric and B is positive semidefinite, will appear shortly. See reference 7 in the Grimes, Lewis and Simon paper. The authors point out that in some situations it is necessary to use a model which involves large nonsymmetric matrices and/or solve nonlinear eigenvalue problems. However, satisfactory eigenvalue procedures for these nonsymmetric structural problems have not yet been devised. This is an open area for research.
The paper by Ericsson derives some of the computationally-important theoretical properties of generalized eigenvalue problems Kx = λMx, where K and M are real symmetric matrices. In the first part of his paper he develops the analysis which he needs for examining three types of procedures for computing eigenvalues: (1) inverse iteration; (2) power methods; and (3) Lanczos methods. In most of his paper he assumes that K is a nonsingular matrix and that the pencil of matrices (K - λM) is nonsingular. Equivalently, this means that K and M do not have
a common null vector. He proves, however, that any theorem which is valid under those assumptions must also be valid when K is singular, so that there is no loss of generality in his arguments.
Under these conditions these generalized problems can behave very nonsymmetrically and Ericsson illustrates that type of behavior with examples. He then focuses on the problem for
M a positive semidefinite matrix. This type of problem is encountered frequently in structural engineering problems. He uses the analysis which he has developed to look at the ability of the three procedures listed above to compute good approximations to the eigenvectors of the given generalized problem. This analysis points out a basic difficulty with both Lanczos methods and with power methods, namely keeping the eigenvector approximations in the proper part of the space. Since M is singular, the generalized eigenvalue problem has infinite eigenvalues. If the starting vector in the Lanczos procedure contains a nonzero projection on the subspace spanned by the eigenvectors corresponding to these infinite eigenvalues, then this projection may grow and produce significant errors in the resulting Ritz vectors computed. This is a serious problem which must be dealt with numerically. Ericsson also shows that this problem can happen when
M is nonsingular but very ill-conditioned. He also addresses the question of obtaining error estimates for computed Ritz vectors in the case that M is singular.
As mentioned in the Grimes, Lewis and Simon paper, a more accurate representation of a particular structural problem may require the solution of a nonlinear eigenvalue problem: find u and λ such that G(u, λ) = 0, where G is a nonlinear vector function, u is a vector of the same dimension, and λ is a scalar.
Zhou and Ruhe examine such problems, not only in the context of solving nonlinear eigenvalue problems, but as a general path following problem. Typically, the manifold of solutions consists of a curve or path as illustrated in several figures in this paper. Bifurcation points in this curve, places where several paths meet, and turning points, where the curve is tangent to the hyperplane λ = c (c constant), are of interest. These are points where the Jacobian of G with respect to u is singular. In the linear problem there is a bifurcation point at each eigenvalue. They propose a modification to the Euler-Newton path following algorithm which uses the solution of a linear eigenproblem to give both a prediction of the position of singular points and the direction of bifurcating branches. Several examples illustrating this idea are included.
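The Euler-Newton predictor-corrector idea can be illustrated on a toy scalar problem (our own example, far smaller than the problems Zhou and Ruhe treat). Here the solution set of G(u, λ) = u² + λ² - 1 = 0 is the unit circle, with turning points at λ = ±1 where G_u is singular; a pseudo-arclength constraint keeps the bordered Newton system nonsingular there, so the path following passes smoothly through the turning points:

```python
import numpy as np

def G(u, lam):
    # Toy residual: the solution manifold is the unit circle u^2 + lam^2 = 1.
    return u ** 2 + lam ** 2 - 1.0

def G_u(u, lam):
    return 2.0 * u

def G_lam(u, lam):
    return 2.0 * lam

def euler_newton_step(u, lam, du, dlam, ds, newton_iters=10):
    # Euler predictor along the unit tangent (du, dlam), then Newton
    # correction of the bordered system
    #   G(u1, lam1) = 0,
    #   du*(u1 - u) + dlam*(lam1 - lam) - ds = 0   (pseudo-arclength).
    u1, lam1 = u + ds * du, lam + ds * dlam
    for _ in range(newton_iters):
        r = np.array([G(u1, lam1),
                      du * (u1 - u) + dlam * (lam1 - lam) - ds])
        J = np.array([[G_u(u1, lam1), G_lam(u1, lam1)],
                      [du, dlam]])
        step = np.linalg.solve(J, -r)
        u1, lam1 = u1 + step[0], lam1 + step[1]
    # New tangent: orthogonal to the gradient of G, oriented consistently
    # with the previous tangent so the march does not reverse direction.
    t = np.linalg.solve(np.array([[G_u(u1, lam1), G_lam(u1, lam1)],
                                  [du, dlam]]),
                        np.array([0.0, 1.0]))
    t = t / np.linalg.norm(t)
    return u1, lam1, t[0], t[1]

# Follow the path from (u, lam) = (1, 0), initial tangent (0, 1).
u, lam, du, dlam = 1.0, 0.0, 0.0, 1.0
path = [(u, lam)]
for _ in range(60):
    u, lam, du, dlam = euler_newton_step(u, lam, du, dlam, ds=0.1)
    path.append((u, lam))
```

Natural continuation in λ alone would fail at the turning points; the arclength parametrization is what carries the iteration around them.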
The next level of difficulty in dealing with algorithms for solving eigenvalue problems is to design procedures which are applicable to complex symmetric problems. For diagonalizable, complex symmetric matrices one can write down a Lanczos recursion which is completely analogous to the real symmetric recursion. The left and the right eigenvectors of a complex symmetric matrix are identical so only one set of Lanczos vectors has to be generated. As is shown in Wilkinson [1965], in the general nonsymmetric case, it is necessary to replace the single Lanczos recursion which is used in the real symmetric case by a set of two such recursions. One of these recursions uses the given matrix and the other recursion uses the transpose of the given matrix. The papers by Haller and Koppel and by Moro and Freed describe applications where eigenvalue and eigenvector computations are used to obtain basic physical properties of molecular systems and the matrices involved are complex symmetric. Haller and Koppel consider both real symmetric and complex symmetric matrices.
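A sketch of that single complex symmetric recursion (Python/NumPy; our own illustration, not code from any of the papers). The only change from the real symmetric recursion is that the Hermitian inner product is replaced throughout by the unconjugated bilinear form xᵀy; since that form can vanish on nonzero complex vectors, the recursion can in principle break down:

```python
import numpy as np

def complex_symmetric_lanczos(A, v, m):
    # Lanczos for complex symmetric A (A == A.T, NOT A == A.conj().T).
    # The Hermitian inner product is replaced by the unconjugated bilinear
    # form x^T y; a single set of vectors V is generated, and in exact
    # arithmetic V^T V = I and A V = V T with T complex symmetric tridiagonal.
    n = A.shape[0]
    alpha = np.zeros(m, dtype=complex)
    beta = np.zeros(m - 1, dtype=complex)
    V = np.zeros((n, m), dtype=complex)
    v = v.astype(complex)
    v = v / np.sqrt(v @ v)        # quasi-normalization; v @ v may be complex
    v_prev = np.zeros(n, dtype=complex)
    b = 0.0
    for j in range(m):
        V[:, j] = v
        w = A @ v - b * v_prev
        alpha[j] = v @ w          # unconjugated: v.T @ w
        w = w - alpha[j] * v
        if j < m - 1:
            b = np.sqrt(w @ w)    # complex; b == 0 signals serious breakdown
            beta[j] = b
            v_prev, v = v, w / b
    return alpha, beta, V
```

The tridiagonal matrices produced are themselves complex symmetric, which is why only one set of Lanczos vectors is needed.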
Moro and Freed are studying molecular motion. The information obtained from spectroscopic or scattering techniques yields only macroscopic responses to external perturbing influences. The objective in these studies is, however, to understand the basic mechanisms controlling the molecular motion. Moro and Freed describe the connections between experimental measurements, spectral density computations, and the identification of these basic mechanisms.
In practice, different theoretical models for the underlying mechanisms are assumed and then comparisons of the resulting macroscopic quantities are made with experimental measurements. These comparisons require spectral densities.
Under certain assumptions, the computations of the spectral densities can be reduced to the computation of the effect of the resolvent of a certain operator on certain vectors. The form
of the operator and of the particular vectors depends upon the system being studied. The authors develop these relationships. They then show how the Lanczos algorithm can be used to obtain a 'tridiagonal' representation of the operator, and how this representation can be used to reduce the required spectral density computations to computations of continued fractions whose coefficients are simply the entries of the tridiagonal matrices generated by the Lanczos procedure.
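That reduction can be stated concretely: in exact arithmetic the resolvent matrix element (v, (zI - A)⁻¹v) equals a continued fraction whose coefficients are precisely the Lanczos α's and β's, and the spectral density is proportional to -Im of this quantity as z approaches the real axis. A minimal numerical check for a real symmetric A (Python/NumPy; our own sketch, with our own function names):

```python
import numpy as np

def lanczos_coeffs(A, v, m):
    # Plain symmetric Lanczos started from the unit vector v: returns the
    # diagonal (alpha) and off-diagonal (beta) of the tridiagonal
    # representation of A on the Krylov space.
    n = A.shape[0]
    alpha, beta = np.zeros(m), np.zeros(m - 1)
    v = v / np.linalg.norm(v)
    v_prev, b = np.zeros(n), 0.0
    for j in range(m):
        w = A @ v - b * v_prev
        alpha[j] = v @ w
        w = w - alpha[j] * v
        if j < m - 1:
            b = np.linalg.norm(w)
            beta[j] = b
            v_prev, v = v, w / b
    return alpha, beta

def resolvent_cf(z, alpha, beta):
    # Continued-fraction evaluation of (v, (zI - A)^{-1} v), built bottom-up:
    #   1 / (z - alpha_1 - beta_1^2 / (z - alpha_2 - beta_2^2 / (...)))
    f = 0.0
    for j in range(len(alpha) - 1, 0, -1):
        f = beta[j - 1] ** 2 / (z - alpha[j] - f)
    return 1.0 / (z - alpha[0] - f)
```

Sweeping z = E + iε over a grid of energies E, with ε a small broadening, then yields the spectral density directly from the tridiagonal coefficients, without ever forming eigenvectors.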
The main part of the Moro and Freed paper considers problems where the operator can be symmetrized so that it is either a real symmetric or a complex symmetric operator. In both of
these cases the Lanczos recursions reduce to a single recursion and the Lanczos tridiagonal matrices are symmetric, either real symmetric or complex symmetric. In the last section of their paper the authors extend some of their ideas to more general nonsymmetric operators.
Haller and Koppel are also looking at problems in molecular dynamics. They have attacked the very difficult and interesting problem of modeling the vibronic coupling in polyatomic molecules. They have obtained a model which reproduces gross features of complex experimental spectra. From this they can make several inferences. The basic computation which is required is the determination of the spectral distribution. Because of the sizes of the matrices involved, it is not possible to use standard eigenelement algorithms for these computations. The Lanczos algorithm with no reorthogonalization plays a critical role in their computations. In this paper the authors consider matrices up to size 40800. However, they want to consider matrices of size up to 10^6. They use the computed results to support their theoretical models.
Electric power systems problems yield some of the most difficult nonsymmetric eigenvalue/eigenvector problems. Van Ness summarizes and illustrates these types of problems. Modern power systems consist of many generating stations and load centers connected together by an electrical transmission system. Small disturbances in such systems can be studied by using eigenanalysis on linearizations of the system equations around some nominal operating state. The objective of this analysis is to determine whether or not the linearized system has any eigenvalues with positive real parts, and to determine the sensitivities of such eigenvalues and of the eigenvalues with small negative real parts, to perturbations of parameters in the model. Sensitivity analysis requires the computation of eigenvectors. A history of the study of several oscillation problems in power systems is presented.
Nonsymmetric problems are considered in the paper by Cullum and Willoughby. The objective is to devise a Lanczos procedure for computing eigenvalues of large, sparse, diagonalizable, nonsymmetric matrices. The authors propose a two-sided Lanczos procedure with no reorthogonalization which uses both the given matrix A and its transpose. Two sets of Lanczos vectors are generated and the Lanczos matrices are chosen such that they are complex symmetric and tridiagonal matrices. In exact arithmetic the Lanczos vectors generated are biorthogonal. A generalization of the QL algorithm is used to compute the eigenvalues of these matrices. A generalization of the identification test for spurious eigenvalues which was used in
the case of real symmetric problems is found to apply equally well here. Several properties of complex symmetric tridiagonal matrices are derived. Numerical results on matrices of size n = 104 to size n = 2961 demonstrate the strengths of this procedure and the type of convergence which can be expected. For the problems up to size n = 656, the Lanczos results are compared directly with the corresponding results obtained using the relevant EISPACK subroutines. All arithmetic is complex even if the starting matrix is real. Because there is no reorthogonalization, this procedure can be used on very large problems and can be used to compute more than just a few of the extreme eigenvalues of largest magnitude.
Kerner addresses the question of the stability of plasmas which are confined magnetically. Such plasmas play a key role in the research on controlled nuclear fusion. This is an application where large scale eigenvalue and eigenvector computations provide new insight into basic physical behavior. The most dangerous instabilities in a plasma are macroscopic in nature and can be described by the basic resistive magnetohydrodynamic model.
A well-chosen discretization transforms this model into generalized eigenvalue problems Ax = λBx, where A is a general matrix and B is a real symmetric, positive definite matrix. The eigenvalues and eigenvectors of these systems provide knowledge about the behavior of the plasma. Of particular interest are the Alfvén modes, and the author studies the effects of the resistivity upon these modes and upon the sound modes. These eigenvalues are found to lie on curves in the plane, and the eigenvalue and eigenvector computations are performed by using a path following technique which uses inverse iteration and a continuation method. Convergence is demonstrated by performing these computations over finer mesh sizes.
Central to the successful computation of eigenelements are both the theoretical stability of the given problem with respect to perturbations in the data and the numerical stability of the algorithm being used to perform the computations. Chatelin addresses both questions. She focuses on defective eigenvalues. Only multiple eigenvalues can be defective. An eigenvalue is defective if its multiplicity as a root of the characteristic polynomial of the given matrix is larger than the dimension of the subspace of eigenvectors associated with that eigenvalue.
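The smallest example of a defective eigenvalue is a 2 x 2 Jordan block, which also shows the sensitivity that makes these problems hard: a perturbation of size ε in the matrix splits the double eigenvalue by about sqrt(ε) (Python/NumPy sketch, our own illustration):

```python
import numpy as np

# Jordan block: characteristic polynomial (x - 2)^2, so the eigenvalue 2
# has algebraic multiplicity 2 ...
A = np.array([[2.0, 1.0],
              [0.0, 2.0]])
eigenvalues = np.linalg.eigvals(A)

# ... but A - 2I has rank 1, so the space of eigenvectors is only
# one-dimensional: the eigenvalue is defective.
geometric_multiplicity = 2 - np.linalg.matrix_rank(A - 2.0 * np.eye(2))

# Sensitivity: an eps perturbation splits the double eigenvalue into
# approximately 2 +/- sqrt(eps).
eps = 1e-8
perturbed = np.linalg.eigvals(A + np.array([[0.0, 0.0], [eps, 0.0]]))
```

This square-root response, a split of about 10⁻⁴ from a 10⁻⁸ perturbation here, is the ill-conditioning that any stable algorithm for defective eigenvalues must contend with.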
Chatelin first looks at the question of condition numbers of eigenvalues and of invariant subspaces. She then shows that the method of simultaneous inverse iteration is not stable if the
subspace being computed corresponds to a defective eigenvalue. She proposes a particular modification of a block Newton method which is stable.
Demmel and Kågström present algorithms and error bounds for computing the Kronecker canonical form (KCF) of matrix pencils A - λB, where A and B can be rectangular matrices. For the standard eigenvalue problem, Ax = λx, the Jordan canonical form (JCF) provides insight into the behavior of the system under perturbations in the matrix A. The KCF is a generalization of the JCF which can be used to provide similar insight into the generalized eigenvalue problem Ax = λBx and into general systems involving pencils of matrices A - λB where A and B may be rectangular. The KCF is obtained by applying left and right nonsingular transformations which simultaneously reduce the A and B matrices to block diagonal matrices. Each of these diagonal blocks may have one of three forms. These forms together with the matrix transformations characterize the subspace associated with each block.
Computing features of the KCF can be an ill-posed problem; that is, small changes in the data may result in large changes in the answers. By restricting the class of perturbations allowed, Demmel and Kågström look at the question of how much the Kronecker structure can change when perturbations are made in the matrices. These perturbations can occur because of roundoff errors or because of uncertainties in the input data. They then analyze the errors incurred in algorithms for computing Kronecker structures. They are particularly interested in singular pencils, for example when det(A - λB) ≡ 0. The error bounds obtained can be used for determining the accuracy of computed Kronecker features. Their results have applications to control and linear systems, and these relationships are discussed. Wilkinson [1979] provides additional introductory comments.
Several other talks were given at the Austrian Workshop which, for a variety of reasons, are not included in this volume. In particular, some of them have already been published elsewhere. G. W. Stewart opened the workshop with a survey of the basic theory and algorithms used in eigenelement analysis and computation. Much of this material can be found, for example, in his book Stewart [1973] and in the book by Golub and Van Loan [1983]. Later in the Workshop program, Stewart presented an algorithm for doing simultaneous iterations on the ring processor, ZMOB, which resides at the University of Maryland.
Kosal Pandey described the Hermitian eigenelement problems which arise in his studies of surface properties of various materials. His results are directly applicable to semiconductor materials. Physical and chemical properties of a surface are determined by its surface structure. Two basic questions addressed are: (1) determine the atomic structure at the surfaces of such materials; and (2) determine basic physical characteristics such as how the surface will react with various chemicals. Pandey obtains this type of information by computing eigenfunctions of Schrödinger's equation. Using these computations, he has shown that the accepted buckling reconstruction mechanism for the configuration of atoms at surfaces is valid only for heteropolar surfaces. He has proposed an alternative π-bonding model for homopolar surfaces which fits well with both theoretical arguments and with experimental data. For more details on this work the reader is referred to Pandey [1983, 1983b].
In some applications it is necessary to compute the eigenelements of matrices obtained by simple modifications of a given matrix, for example, a rank one modification. In other applications one or more of the eigenvalues of a system are specified a priori and the user is asked to determine a matrix with those eigenvalues. Gene Golub surveyed some of the work on such problems. Much of his talk is contained in the references Golub [1973] and Boley and Golub [1978].
A key question in any eigenelement computation is: how do we know that the answers obtained are meaningful? Beresford Parlett presented the material contained in the paper Kahan, Parlett, and Jiang [1982]. This paper includes error estimates for nonsymmetric problems which are applicable, for example, to Lanczos algorithms for nonsymmetric problems. Parlett gave two talks. The second talk surveyed recent research on the question of maintaining semiorthogonality of the Lanczos vectors generated by a Lanczos recursion. The interested reader can find much of this material in the book, Parlett [1980].
A complete list of the authors and coauthors with their full addresses is included at the beginning of this book. A corresponding list of the speakers who do not have papers in this volume is contained at the end of this book. The interested reader can obtain additional references by contacting the authors and speakers directly.
J. Cullum and R.A. Willoughby
REFERENCES

D. Boley and G. H. Golub (1978), Inverse eigenvalue problems for band matrices, Proc. 7th Biennial Conf. Univ. Dundee: Lecture Notes in Math. 630, Springer, Berlin, 23-31.

R. Brent, F. Luk, and C. Van Loan (1985), Computing the singular value decomposition using mesh connected processors, J. VLSI and Computer Systems, 1, 242-270.

J. Cullum and R. A. Willoughby (1985), Lanczos Algorithms for Large Symmetric Eigenvalue Computations, Vol. 1, Theory and Vol. 2, Programs. Progress in Scientific Computing Series, Vol. 3 and Vol. 4, Eds. S. Abarbanel, R. Glowinski, G. Golub, P. Henrici, and H. O. Kreiss, Birkhäuser-Boston.

J. Cullum and R. A. Willoughby (1985b), A survey of Lanczos procedures for very large real 'symmetric' eigenvalue problems, J. Comput. Appl. Math., 12/13, 37-60.

J. J. Dongarra, J. R. Bunch, C. B. Moler, and G. W. Stewart (1979), LINPACK Users' Guide, SIAM Publ., Philadelphia.

B. S. Garbow, J. M. Boyle, J. J. Dongarra, and C. B. Moler (1977), Matrix Eigensystem Routines - EISPACK Guide Extension, Lecture Notes in Computer Science, Vol. 51, Springer-Verlag, Berlin.

G. H. Golub (1973), Some modified matrix eigenvalue problems, SIAM Review, 15, 318-334.

G. H. Golub and C. F. Van Loan (1983), Matrix Computations, The Johns Hopkins University Press, Baltimore.

W. Kahan, B. N. Parlett, and E. Jiang (1982), Residual bounds on approximate eigensystems of nonnormal matrices, SIAM J. Numer. Anal., 19, 470-484.

K. C. Pandey (1983), Reconstruction of semiconductor surfaces: Si(111)-2x1, Si(111)-7x7, and GaAs(110), J. Vacuum Science and Technology, Vol. A1(2), 1099-1100.

K. C. Pandey (1983b), Theory of semiconductor surface reconstruction: Si(111)-7x7, Si(111)-2x1, and GaAs(110), Physica B, 117-118, 761-766.

B. N. Parlett (1980), The Symmetric Eigenvalue Problem, Prentice-Hall, Englewood Cliffs, NJ.

G. W. Stewart (1973), Introduction to Matrix Computations, Academic Press, New York.

J. H. Wilkinson (1965), The Algebraic Eigenvalue Problem, Oxford University Press, New York.
J. H. Wilkinson (1979), Kronecker's canonical form and the QZ algorithm, Linear Algebra Appl., 28, 285-303.
Large Scale Eigenvalue Problems
J. Cullum and R.A. Willoughby (Editors)
Elsevier Science Publishers B.V. (North-Holland), 1986
High Performance Computers and Algorithms from Linear Algebra

J.J. Dongarra and D.C. Sorensen
Mathematics and Computer Science Division Argonne National Laboratory 9700 South Cass Avenue Argonne, Illinois 60439
1. Introduction

Within the last ten years many who work on the development of numerical algorithms have come to realize the need to get directly involved in the software development process. Issues such as robustness, ease of use, and portability are now standard in any discussion of numerical algorithm design and implementation. New and exotic architectures are evolving which depend on the technology of concurrent processing, shared memory, pipelining, and vector components to increase performance capabilities. Within this new computing environment the portability issue, in particular, can be very challenging. One feels compelled to structure algorithms that are tuned to particular hardware features in order to exploit these new capabilities; yet the sheer number of different machines appearing makes this approach intractable. It is very tempting to assume that an unavoidable byproduct of portability will be an unacceptable degradation in performance on any specific machine architecture. Nevertheless, we contend that it is possible to achieve a reasonable fraction of the performance of a wide variety of different architectures through the use of certain programming constructs. Complete portability is an impossible goal at this point in time, but it is possible to achieve a level of transportability through the isolation of machine-dependent code within certain modules. Such an approach is essential, in our view, to even begin to address the portability problem.
2. Advanced Computer Architectures

In the past few years there has been an unprecedented explosion in the number of computers in the marketplace. This explosion has been fueled partly by the availability of powerful and cheap building blocks and partly by the availability of venture capital. We will examine a number of these machines that offer high performance through the use of vector and parallel processing. A much-referenced and useful taxonomy of computer architectures was given by Flynn [8]. He divided machines into four categories:
(i) SISD - single instruction stream, single data stream
(ii) SIMD - single instruction stream, multiple data stream
(iii) MISD - multiple instruction stream, single data stream
(iv) MIMD - multiple instruction stream, multiple data stream
Although these categories give a helpful coarse division, we find immediately on examining current machines that the situation is more complicated, with some architectures exhibiting aspects of more than one category. Many of today's machines are really hybrid designs. For example, the CRAY X-MP has up to four processors (MIMD), but each processor uses pipelining (SIMD) for vectorization. Moreover, where there are multiple processors, the memory can be local, global, or a combination of these. There may or may not be caches and virtual memory systems, and the interconnections can be by crossbar switches, multiple bus-connected systems, time-shared bus systems, etc. With this caveat on the difficulty of classifying machines, we list below the machines considered in this report, grouping those with similar architectural features.
scalar
    pipelined (e.g., 7600, 3090)
    parallel pipelined, microcoded
        FPS 164
        FPS 264
        Multiflow
        STAR ST-100

vector
    memory to memory
        CDC CYBER 205
    register to register
        American Super.
        Convex C-1
        CRAY-1
        CRAY X-MP-1
        Amdahl 500, 1100, 1200, 1400 (Fujitsu VP-50, 100, 200, 400)
        Galaxy YH-1
        Hitachi S-810
        NEC SX-1, SX-2
        Scientific Computer Systems
    cache based register to register
        Alliant FX/1

parallel
    global memory
        bus connect
            Alliant FX/8 (vector capability)
            Elxsi 6400
            Encore Multimax
            Flex/32
            IP-1
            Sequent Balance 8000
        direct connect
            Amer. Super. (vector capability)
            CRAY-2 (vector capability)
            CRAY-3 (vector capability)
            CRAY X-MP-2/4 (vector capability)
            Denelcor HEP-1
    local memory
        hypercube
            Ametek System 14
            Intel iPSC
            NCUBE
            Connection Machine
        butterfly
            BBN Butterfly
        ring-bus
            CDC CYBERPLUS
        lattice
            Goodyear MPP
            ICL DAP
        dataflow
            Loral DATAFLO
    multilevel memory
        ETA-10 (vector capability)
        Myrias 4000
A more empirical subdivision can be made on the basis of cost. We split the machines into two classes: those costing over $1 million and those under $1 million. The former group is usually classed as supercomputers, the latter as high-performance engines. With this subdivision, we can summarize the machines in the following tables.

Table 1. Machines costing over $1M (base system)

Machine                        Word length   OS         Maximum rate   Memory       Number
                                                        in MFLOPS      in Mbytes    of proc.
Amdahl 1400 (Fujitsu VP-400)   32/64         Own/UNIX   1142           256          1
CRAY-1                         64            Own        160            32           1
CRAY X-MP                      64            Own/UNIX   210/proc       128          1,2,4
CRAY-2                         64            UNIX       500/proc       2000         4
CRAY-3                         64            UNIX       1000/proc      16000        16
CDC CYBER 205                  32/64         Own        400            32           1
CDC CYBERPLUS                  32/64         Own        100/proc       4(a)
Denelcor HEP-1                 32/64         UNIX       10/PEM         16/PEM       16(b)
ETA-10                         32/64         Own/UNIX   1250/proc      2048(c)      2,4,6,8
Hitachi S-810/20               32/64         Own        840            256          1
Myrias 4000                    32/64/128     UNIX       ???            512/crate    1024/crate
NEC SX-2                       32/64         Own        1300           256(d)       1
(a) Memory per processor.
(b) 64 processes possible for each PEM; however, effective parallelism per PEM is 8-10.
(c) Also 32 Mwords of local memory with each processor.
(d) Also a 2-Gbyte extended memory.

The actual price of the systems in Table 1 is very dependent on the configuration, with most manufacturers offering systems in the $5 million to $20 million range. All use ECL logic with LSI (except the CRAY-1 in SSI, and the CRAY X-MP and HEP in MSI), and all use pipelining and/or multiple functional units to achieve vectorization/parallelization within each processor. For the multiple-processor systems, the form of synchronization varies: event handling on the CRAYs, asynchronous variables on the HEP, send/receive on the CYBERPLUS. The CRAY-3 and ETA-10 are not yet available. Both the Amdahl and Hitachi systems are IBM System/370 compatible.
In Table 2 we summarize machines in the lower price category.

Table 2. Machines costing under $1M

Machine                  Chip                  Parallelism   Connection
Alliant FX/8             WTL 1064/1065         8 + vector    crossbar (reg to cache) and
                         plus 10 gate arrays                 bus (cache to memory)
Ametek System 14         80286/80287           256           hypercube
Amer. Super. Comp.       ECL                                 (vector)
Axiom                    LSI                   1             (scalar)
BBN Butterfly            68020/68881           256           butterfly
Thinking Machines        VLSI                  64000         hypercube
  Connection Machine
Convex C-1               gate array            vector        (vector)
Elxsi 6400               ECL                   12            bus
Encore Multimax          32032/32081           20            bus
Flex/32                  32032/32081           20            bus
FPS 164                  LSI                   1             (scalar)
FPS 264                  ECL                   1             (scalar)
FPS 164+MAX              VLSI                  16            bus
FPS 5000                 VLSI                  4             bus
FPS MP32                 VLSI                  3             bus
ICL DAP                  ECL                   1024          near-neighbor
Intel iPSC               80286/80287           128           hypercube
IP-1                     ????                  8             cross-bar
Loral DATAFLO            32016/32081           256           bus
Goodyear MPP             VLSI                  16384         near-neighbor
Multiflow                gate array            8             (scalar)
NCUBE                    custom VLSI           1024          hypercube
SCS-40                   ECL/LSI               vector        (vector)
Sequent Balance 8000     32032/32081           12            bus
Star ST-100              VLSI                  1             (scalar)
Because of the widely differing architectures of the machines in Table 2, it is not really advisable to give one or even two values for the memory. In some instances there is an identifiable global memory; in others there is a fixed amount of memory per processor. Additionally, it may be possible to configure memory either as local or global. A value for the
maximum speed is even less meaningful than in Table 1, since a high megaflop rate is not necessarily the objective of the machines in Table 2, and the actual speed will depend heavily on the algorithm and application. In the other aspects quoted in Table 1, all the machines in Table 2 are very similar. All machines have both 32- and 64-bit arithmetic hardware, with most of them adhering closely to the IEEE standard; the exceptions are the FPSs and the SCS (all 64 bit), the DAP, MPP, and Connection Machine (all bit-slice, the first two supporting variable-precision floating point), and the Star (32 bit). Also, all machines have a version of UNIX as their operating system, except the FPS machines (host system), American Supercomputer and SCS (COS), and Star and Ametek (own system).

The machines listed above are representative implementations of advanced computer architectures. Such architectures involve various aspects of vector, parallel, and parallel-vector capabilities. These notions and their implications for the design of software are discussed briefly in this section. We begin with the most basic of these, vector computers.

The current generation of vector computers exploits several advanced concepts to enhance performance over conventional computers:

    fast cycle time,
    vector instructions to reduce the number of instructions interpreted,
    pipelining to utilize a functional unit fully and to deliver one result per cycle,
    chaining to overlap functional unit execution, and
    overlapping to execute more than one independent vector instruction concurrently.

Current vector computers typically provide for "simultaneous" execution of a number of elementwise operations through pipelining. Pipelining generally takes the approach of splitting the function to be performed into smaller pieces or stages and allocating separate hardware to each of these stages.
With this mechanism several instances of the same operation may be executed simultaneously, with each instance being in a different stage of the operation. The goal of pipelined functional units is clearly performance. After some initial startup time, which depends on the number of stages (called the length of the pipeline, or pipe length), the functional unit can turn out one result per clock period as long as a new pair of operands is supplied to the first stage every clock period. Thus, the rate is independent of the length of the pipeline and depends only on the rate at which operands are fed into the pipeline. Therefore, if two vectors of length k are to be added, and if the floating point adder requires 3 clock periods to complete, it would take 3 + k clock periods to add the two vectors
together, as opposed to 3 * k clock periods in a conventional computer.

Another feature that is used to achieve high rates of execution is chaining. Chaining is a technique whereby the output register of one vector instruction is the same as one of the input registers for the next vector instruction. If the instructions use separate functional units, the hardware will start the second vector operation during the clock period when the first result from the first operation is just leaving its functional unit. A copy of the result is forwarded directly to the second functional unit and the first execution of the second vector is started. The net result is that the execution of both vector operations takes only the second functional unit startup time longer than the first vector operation. The effect is that of having a new instruction which performs the combined operations of the two functional units that have been chained together. On the CRAY, in addition to the arithmetic operations, vector loads from memory to vector registers can be chained with other arithmetic operations. It is also possible to overlap operations if the two operations are independent. If an addition and an independent multiplication operation are to be processed, the execution of the second independent operation begins one cycle after the first operation has started.

The key to utilizing a high performance computer effectively is to avoid unnecessary memory references. In most computers, data flows from memory into and out of registers, and from registers into and out of functional units, which perform the given instructions on the data. Performance of algorithms can be dominated by the amount of memory traffic, rather than by the number of floating point operations involved. The movement of data between memory and registers can be as costly as arithmetic operations on the data.
This provides considerable motivation to restructure existing algorithms and to devise new algorithms that minimize data movement. Many of the algorithms in linear algebra can be expressed in terms of a SAXPY operation: y <- y + a*x, i.e., adding a multiple a of a vector x to another vector y. This results in three vector memory references for each two vector floating point operations. If this operation comprises the body of an inner loop which updates the same vector y many times, then a considerable amount of unnecessary data movement will occur. Usually, a SAXPY occurring in an inner loop indicates that the algorithm may be recast in terms of some matrix-vector operation, such as y <- y + M*x, which is just a sequence of SAXPYs involving the columns of the matrix M and the corresponding components of the vector x. The advantage of this is that the vector y and the columns of M have a fixed length throughout. This makes it relatively easy to recognize automatically that only the columns of M need be moved into registers, while the result y is accumulated in a vector register, avoiding two of the three memory references in the innermost loop. This also allows chaining to occur on vector machines, and results in a factor of three increase in performance on the CRAY 1. The cost of the algorithm in these cases is determined not by floating point operations, but by memory references.

Programs that properly use all of the features mentioned above will fully exploit the potential of a vector machine. These features, when used to varying degrees, give rise to
three basic modes of execution: scalar, vector, and super-vector [1]. To provide a feeling for the difference in execution rates, we give the following table for execution rates on a CRAY 1:

Mode of Execution    Rate of Execution
Scalar               0 - 10 MFLOPS
Vector               10 - 50 MFLOPS
Super-vector         50 - 160 MFLOPS
These rates represent, more or less, the upper end of their range. We define the term MFLOPS to be a rate of execution representing millions of floating point operations (additions or multiplications) performed per second. The basic difference between scalar and vector performance is the use of vector instructions. The difference between vector and super-vector performance hinges upon avoiding unnecessary movement of data between vector registers and memory. The CRAY 1 is limited in the sense that there is only one path between memory and the vector registers. This creates a bottleneck if a program loads a vector from memory, performs some arithmetic operations, and then stores the results. While the load and arithmetic can proceed simultaneously as a chained operation, the store is not started until that chained operation is fully completed. Most algorithms in linear algebra can be easily vectorized. However, to gain the most out of a machine like the CRAY 1, such vectorization is usually not enough. In order to achieve top performance, the scope of the vectorization must be expanded to facilitate chaining and minimization of data movement in addition to utilizing vector operations. Recasting the algorithms in terms of matrix vector operations makes it easy for a vectorizing compiler to achieve these goals. This is primarily due to the fact that the results of the operation can be retained in a register and need not be stored back to memory, thus eliminating the bottleneck. Moreover, when the compiler is not successful, it is reasonable to hand tune these operations, perhaps in assembly language, since there are so few of them and since they involve simple operations on regular data structures. These modules and their usage in the recasting of algorithms for linear algebra are discussed in detail in the next section. With their use, the resulting codes achieve super-vector performance levels on a wide variety of vector architectures. 
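The memory-reference argument behind super-vector performance can be made concrete by counting vector-length memory transfers. The sketch below is ours, not from the paper (illustrative Python rather than the original Fortran): it updates y by a sequence of column SAXPYs and by the equivalent matrix-vector form y <- y + M*x, tallying references under the register model described in the text (three per SAXPY versus one load per column of M plus one load and one store of y).

```python
def saxpy(a, x, y):
    """y <- y + a*x, elementwise (one SAXPY: load x, load y, store y)."""
    return [yi + a * xi for xi, yi in zip(x, y)]

def update_by_saxpys(M, x, y):
    """y <- y + M x expressed as one SAXPY per column of M.
    Vector memory traffic: 3 references (column, y in, y out) per column."""
    refs = 0
    for j in range(len(x)):
        col = [M[i][j] for i in range(len(y))]
        y = saxpy(x[j], col, y)
        refs += 3
    return y, refs

def update_by_matvec(M, x, y):
    """y <- y + M x with y held in a register throughout:
    one reference per column of M, plus one load and one store of y."""
    acc = list(y)                      # load y once
    for j in range(len(x)):            # load each column once
        for i in range(len(y)):
            acc[i] += M[i][j] * x[j]
    refs = len(x) + 2                  # n column loads + load/store of y
    return acc, refs

M = [[1, 2], [3, 4]]
x = [1, 1]
y0 = [0, 0]
y1, r1 = update_by_saxpys(M, x, y0)   # same result both ways,
y2, r2 = update_by_matvec(M, x, y0)   # but fewer references for the matvec form
```

Both forms produce identical arithmetic; only the access pattern differs, which is exactly the property the text exploits to reach super-vector speed.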
Moreover, these modules have also proved to be effective in the use of parallel architectures. Vector architectures exploit parallelism at the lowest level of computation. They require very regular data structures (i.e., rectangular arrays) and large amounts of computation in order to be effective. The next level of parallelism is to have individual scalar processors execute serial instruction streams simultaneously on a shared data structure. A typical example would be the simultaneous execution of a loop body for various values of the loop index. This is the capability provided by a parallel processor. Along with this increased functionality comes a burden. If these independent processors are to work
together on the same computation, they must be able to communicate partial results to each other, and this requires a synchronization mechanism. Synchronization introduces overhead in terms of machine utilization that is unrelated to the primary computation. It also requires new programming techniques that are not well understood at the moment. While this situation is obviously more general than that of a vector processor, many of the same principles apply.

Typically, a parallel processor with globally shared memory must employ some sort of interconnection network so that all processors may access all of the shared memory. There must also be an arbitration mechanism within this memory access scheme to handle cases where two processors attempt to access the same memory location at the same time. These two requirements obviously have the effect of increasing the memory access time over that of a single processor accessing a dedicated memory of the same type. Usually, this increase is substantial, especially if the processor and memory in question are at the high end of the performance spectrum. Again, memory access and data movement dominate the computations in these machines. Achieving near peak performance on such computers relies upon the same principle: one must devise algorithms that minimize data movement and reuse data that has been moved from globally shared memory to local processor memory.

The effects of efficient data management on the performance of a parallel processor can be very dramatic. For example, the performance of the Denelcor HEP computer may be increased by a factor of ten through efficient use of its very large (2K word) register set [12]. The modules again aid in accomplishing this memory management. Moreover, they provide a way to make effective use of the parallel processing capabilities in a manner that is transparent to the user of the software.
This means that the user does not need to wrestle with the problems of synchronization in order to make effective use of the parallel processor.

The two types of parallelism we have just discussed are combined when vector rather than serial processors are used to construct a parallel computer. These machines are able to execute independent loop bodies which employ vector instructions. The most powerful computers that exist today are of this type. They include the CRAY X-MP line and a new high-performance "mini-super" FX/8 computer manufactured by Alliant. The problems with using such computers efficiently are of course more difficult than those encountered with each type individually. Synchronization overhead becomes more significant when compared to a vector operation rather than a scalar operation. Blocking loops to exploit outer-level parallelism may conflict with vector length, etc.

Finally, a third level of complication is added when parallel-vector machines are interconnected to achieve yet another level of parallelism. This is the case for the CEDAR architecture being developed at the Center for Supercomputing Research and Development at the University of Illinois at Urbana. Such a computer is intended to solve large applications problems which naturally split into loosely coupled parts which may be solved efficiently on the cluster of parallel-vector processors.
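As a toy illustration of the loop-level parallelism and synchronization overhead discussed above (our sketch, not from the paper), the code below executes a loop body independently for different blocks of the index range on separate workers, then synchronizes to combine the partial results. The join over the futures is the synchronization point whose cost, relative to the work per block, determines the granularity of the task.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_dot(x, y, lo, hi):
    """Loop body executed independently for one block of the index range."""
    return sum(x[i] * y[i] for i in range(lo, hi))

def parallel_dot(x, y, workers=4):
    """Dot product with the index range partitioned across workers.
    Larger blocks mean coarser granularity and fewer synchronizations."""
    n = len(x)
    step = (n + workers - 1) // workers
    blocks = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(partial_dot, x, y, lo, hi) for lo, hi in blocks]
        # Synchronization point: wait for every worker, then combine results.
        return sum(f.result() for f in futures)

x = list(range(100))
y = [1.0] * 100
total = parallel_dot(x, y)   # same value a serial loop would produce
```

The answer is independent of the partitioning; only the balance between useful work and synchronization changes, which is the trade-off the text describes.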
3. Performance of Software for Dense Matrix Factorization

We are interested in examining the performance of linear algebra algorithms on large-scale scientific vector processors and on emerging parallel processors. In many applications, linear algebra calculations consume large quantities of computer time. If substantial improvements can be found for the linear algebra part, a significant reduction in overall execution time will be realized. We are motivated to look for alternative formulations of standard algorithms, as implemented in software packages such as LINPACK [4] and EISPACK [9,11], because of their wide usage in general and their poor performance on vector computers. As mentioned earlier, we are also motivated to restructure in a way that will allow these packages to be easily transported to new computers of radically different design. If this can be accomplished without serious loss of efficiency, there are many advantages. In this section we report on some experience with restructuring these linear algebra packages in terms of the high level modules proposed in [5]. This experience verifies that performance increases are achieved on vector machines and that this modular approach offers a viable solution to the transportability issue. This restructuring often improves performance on conventional computers and does not degrade the performance on any computer we are aware of.
Both of the packages have been designed in a portable, robust fashion, so they will run in any Fortran environment. The routines in LINPACK even go so far as to use a set of vector subprograms called the BLAS [10] to carry out most of the arithmetic operations. The EISPACK routines do not explicitly make reference to vector routines, but they have a high degree of vector operations which most vectorizing compilers detect. The routines from both packages should therefore be well suited for execution on vector computers. As we shall see, however, the Fortran programs from LINPACK and EISPACK do not attain the highest execution rate possible on a CRAY 1 [3]. While these programs exhibit a high degree of vectorization, the construction which leads to super-vector performance is in most cases not present. We will examine how the algorithms can be constructed and modified to enhance performance without sacrificing clarity or resorting to assembly language.
To give a feeling for the difference between various computers, both vector and conventional, a timing study for the solution of a 100x100 system of equations has been carried out on many different computers [3]. The LINPACK routines were used in the solution without modification.
Solving a System of Linear Equations with LINPACK(a) in Full Precision(b)

Computer            OS/Compiler(c)                 Ratio(d)   MFLOPS(e)   Time (secs)
CRAY X-MP-1         CFT (Coded BLAS)               .36        33          .021
CDC Cyber 205       FTN (Coded BLAS)               .48        25          .027
CRAY 1-S            CFT (Coded BLAS)               .54        23          .030
CRAY X-MP-1         CFT (Rolled BLAS)              .57        21          .032
Fujitsu VP-200      Fortran 77 (Comp directive)    .64        19          .036
Fujitsu VP-200      Fortran 77 (Rolled BLAS)       .72        17          .040
Hitachi S-810/20    FORT77/HAP (Rolled BLAS)       .74        17          .042
CRAY 1-S            CFT (Rolled BLAS)              1          12          .056
CDC Cyber 205       FTN (Rolled BLAS)              1.5        8.4         .082
NAS 9060 w/VPF      VS opt=2 (Coded BLAS)          1.8        6.8         .101

(a) LINPACK routines SGEFA and SGESL were used for single precision, and routines DGEFA and DGESL were used for double precision. These routines perform standard LU decomposition with partial pivoting and backsubstitution.
(b) Full Precision implies the use of (approximately) 64-bit arithmetic, e.g., CDC single precision or IBM double precision.
(c) OS/Compiler refers to the operating system and compiler used; (Coded BLAS) refers to the use of assembly language coding of the BLAS, (Rolled BLAS) refers to a Fortran version with single-statement, simple loops, and (Comp directive) refers to the use of compiler directives to set the maximum vector length.
(d) Ratio is the number of times faster or slower a particular machine configuration is when compared to the CRAY 1-S using a Fortran coding for the BLAS.
(e) MFLOPS is a rate of execution: the number of millions of floating point operations completed per second. For solving a system of n equations, approximately 2/3 n^3 + 2n^2 operations are performed (we count both additions and multiplications).
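Using the operation count quoted in footnote (e), the MFLOPS entries in these tables can be recovered from the order of the system and the elapsed time. A small sketch of the arithmetic (ours, not from the report):

```python
def lu_solve_flops(n):
    """Approximate flop count for factoring and solving an n x n system:
    2/3 n^3 + 2 n^2, counting both additions and multiplications."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2

def mflops(n, seconds):
    """Execution rate in millions of floating point operations per second."""
    return lu_solve_flops(n) / seconds / 1.0e6

# For n = 100 the count is about 687,000 flops, so a 0.021-second
# solution corresponds to roughly 33 MFLOPS, matching the table's top row.
rate = mflops(100, 0.021)
```

The same formula with n = 300 (about 18.2 million flops) reproduces the rates in the second timing table below.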
The LINPACK routines used to generate the timings in the previous table do not reflect the true performance of high-performance computers. A different implementation of the solution of linear equations, presented in a report by Dongarra and Eisenstat [1], better describes the performance of such machines. That algorithm is based on matrix-vector operations rather than just vector operations. This restructuring allows the various compilers to take advantage of the features described in Section 2. It is important to note that the
numerical properties of the algorithm have not been altered by this restructuring. The number of floating point operations required and the roundoff errors produced by both algorithms are exactly the same; only the way in which the matrix elements are accessed differs. As before, a Fortran program was run, and the time to complete the solution for a matrix of order 300 is reported. All runs are for full precision.
Solving a System of Linear Equations Using the Vector Unrolling Technique

Computer             OS/Compiler                     MFLOPS   Time (secs)
CRAY X-MP-4 '        CFT (Coded ISAMAX)              356      .051
CRAY X-MP-2          CFT (Coded ISAMAX)              257      .076
Fujitsu VP-200       Fortran 77 (Comp directive)     220      .083
Fujitsu VP-200       Fortran 77                      183      .099
CRAY X-MP-2 *        CFT                             161      .113
Hitachi S-810/20     FORT77/HAP                      158      .115
CRAY X-MP-1 +        CFT (Coded ISAMAX)              134      .136
CRAY X-MP-1          CFT                             106      .172
CRAY 1-M             CFT (Coded ISAMAX)              83       .215
CRAY 1-S             CFT (Coded ISAMAX)              76       .236
CRAY 1-M             CFT                             69       .259
CRAY 1-S             CFT                             66       .273
CDC Cyber 205        ftn 200 opt=1                   31       .59
NAS 9060 w/VPF       VS opt=2 (Coded BLAS)           9.7      1.9

Comments:
' These timings are for four processors, with manual changes to use parallel features.
* These timings are for two processors, with manual changes to use parallel features.
+ These timings are for one processor.
Similar techniques of recasting matrix decomposition algorithms in terms of matrix-vector operations have provided significant improvements in the performance of algorithms for the eigenvalue problem. In a paper by Dongarra, Kaufman, and Hammarling
[6], many of the routines in the EISPACK collection have been restructured to use matrix-vector primitives, resulting in improvements by a factor of two to three in performance over the standard implementation on vector computers such as the CRAY X-MP, Hitachi S-810/20, and Fujitsu VP-200.

Using matrix-vector operations as primitives in constructing algorithms can also play an important role in achieving performance on multiprocessor systems with minimal recoding effort. Again, the recoding is restricted to the relatively simple modules, and the numerical properties of the algorithms are not altered as the codes are retargeted for a new machine. This feature takes on added importance as the complexity of the algorithms reaches the level required for some of the more difficult eigenvalue calculations.

A number of factors influence the performance of an algorithm in multiprocessing. These include the degree of parallelism, process synchronization overhead, load balancing, interprocessor memory contention, and the modifications needed to separate the parallel parts of an algorithm. A parallel algorithm must partition the work to be done into tasks or processes which can execute concurrently in order to exploit the computational advantages offered by a parallel computer. These cooperating processes usually have to communicate with each other, for example to claim a unique identifier or to follow data dependency rules. This communication takes place at synchronization points within the instruction streams defining the process. The amount of work, in terms of the number of instructions that may be performed between synchronization points, is referred to as the granularity of a task. The need to synchronize and to communicate before and after parallel work will greatly impact the overall execution time of the program. Since the processors have to wait for one another instead of doing useful computation, it is obviously better to minimize that overhead.
In the situation where segments of parallel code are executing in vector mode, typically at ten to twenty times the speed of scalar mode, granularity becomes an even more important issue, since communication mechanisms are implemented in scalar mode.
Granularity is also closely related to the degree of parallelism, which is defined to be the percentage of time spent in the parallel portion of the code. Typically, a small granularity job means that parallelism occurs at inner loop levels (although not necessarily the innermost loop). In this case, even the loop setup time in outer loops becomes significant, not to mention the frequent need for task synchronization. Matrix vector operations offer the proper level of modularity for achieving both performance and transportability across a wide range of computer architectures. Evidence has already been given for a variety of vector architectures. We shall present further evidence in the following sections concerning their suitability for parallel architectures. In addition to the computational evidence, there are several reasons which support the use of these modules. We can easily construct the standard algorithms in linear algebra out of these types of modules. The matrix vector operations are simple and yet encompass enough computation that they can be vectorized and also parallelized at a reasonable level of granularity [1,2,3,12]. Finally, these modules can be constructed in such a way that they hide all of the machine specific intrinsics required to invoke parallel computation, thereby shielding the user from any changes to the library which are machine specific.

High Performance Computers and Algorithms from Linear Algebra
4. Structure of the Algorithms

In this section we discuss the way algorithms may be restructured to take advantage of the modules introduced above. Typical recasting that occurs within LINPACK and EISPACK type subroutines is discussed here. We begin with definitions and a description of the efficient implementation of the modules themselves.

4.1 The Modules

Only three modules are required for the recasting of LINPACK in a way that achieves the super-vector performance reported above. They are

    z = Mw            (matrix x vector),
    M = M - wz^T      (rank one modification),

and

    Tz = w            (solve a triangular system).
Efficient coding of these three routines is all that is needed to transport the entire package from one machine to another while retaining close to top performance. We shall describe some of the considerations that are important when coding the matrix vector product module. The other modules require similar techniques. For a vector machine such as the CRAY-1 the matrix vector operation should be coded in the form

(4.1.1)    for j = 1,2,...,n
               y(*) <- y(*) + M(*,j) x(j) .

In (4.1.1) the * in the first entry indicates a column operation: a vector register is reserved for the result while the columns of M are successively read into vector registers, multiplied by the corresponding component of x, and then added to the result register in place. In terms of the ratio of data movement to floating point operations this arrangement is most favorable: it involves one vector move for every two vector floating point operations. Comparing this to the three vector moves required for the same two floating point operations when a sequence of SAXPY operations is used shows the advantage of the matrix vector operation. This arrangement is perhaps inappropriate for a parallel machine, because one would have to synchronize the access to y by each of the processes, and this would cause busy waiting to occur.

J.J. Dongarra and D.C. Sorensen

One might do better to partition the vector y and the rows of the matrix M into blocks
    ( y_1 )     ( y_1 )     ( M_1 )
    ( y_2 )     ( y_2 )     ( M_2 )
    (  .  )  =  (  .  )  +  (  .  ) x
    (  .  )     (  .  )     (  .  )
    ( y_k )     ( y_k )     ( M_k )

and self-schedule individual vector operations on each of the blocks in parallel:

    y_i <- y_i + M_i x ,    for i = 1,2,...,k.
That is, the subproblem indexed by i is picked up by a processor as it becomes available, and the entire matrix vector product is reported done when all of these subproblems have been completed. If the parallel machine has vector capabilities on each of the processors, this partitioning introduces short vectors and defeats the potential of the vector capabilities for small to medium size matrices. A better way to partition in this case is by columns,

    y <- y + M_1 x_1 + M_2 x_2 + ... + M_k x_k ,

with M = (M_1, M_2, ..., M_k) and the vector x partitioned conformally.
Again, subproblems are computed by individual processors. However, in this scheme, we must either synchronize the contributions when adding in each term M_i x_i, or write each of them into a temporary location and hold them until all are complete before adding them to get the final result. This scheme does prove to be effective for increasing the performance of the factorization subroutines on the smaller (order less than 100) matrices. One can easily see this if the data access scheme for LU decomposition shown in Figure 4.1.1 is studied. We see that during the final stages of the factorization vector lengths become short regardless of matrix size. For the smaller matrices, subproblems with vector lengths that are below a certain performance level represent a larger percentage of the calculation. This problem is magnified when the row-wise partitioning is used.
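The self-scheduling scheme for the row-partitioned matrix vector product can be sketched as follows. This is our illustrative Python, not the paper's Fortran; the function name `self_scheduled_matvec` and the use of a shared work queue to stand in for the machine's task-claiming mechanism are assumptions. Because the row blocks are disjoint, no synchronisation on y is needed.

```python
import threading
import queue

def self_scheduled_matvec(M, x, num_workers=4, block_rows=2):
    """Compute y = M x by self-scheduling row blocks across workers.

    Each worker repeatedly claims the next unprocessed row block i from a
    shared queue and computes y_i = M_i x.  Since the blocks are disjoint,
    no two workers ever touch the same entries of y.
    """
    n = len(M)
    y = [0.0] * n
    work = queue.Queue()
    for start in range(0, n, block_rows):
        work.put((start, min(start + block_rows, n)))

    def worker():
        while True:
            try:
                lo, hi = work.get_nowait()   # claim the next row block
            except queue.Empty:
                return                       # no work left
            for i in range(lo, hi):
                y[i] = sum(M[i][j] * x[j] for j in range(len(x)))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return y
```

The matrix vector product is "reported done" when all workers find the queue empty and the joins complete, mirroring the completion test described in the text.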
4.2 Recasting LINPACK subroutines

We now turn to some examples of how to use the modules to obtain various standard matrix factorizations. We begin with the LU decomposition of a general nonsingular matrix. Restructuring the algorithm in terms of the basic modules described above is not so obvious in the case of LU decomposition. The approach described here is inspired by the work of Fong and Jordan [7]. They produced an assembly language code for LU decomposition for the CRAY-1. This code differed significantly in structure from those commonly in use because it did not modify the entire k-th reduced submatrix at each step
but only the k-th column of that matrix. This step was essentially a matrix-vector multiplication. Dongarra and Eisenstat [1] showed how to restructure the Fong and Jordan implementation explicitly in terms of matrix-vector operations and were able to achieve nearly the same performance from a FORTRAN code as Fong and Jordan had done with their assembly language implementation. The pattern of data references for factoring a square matrix A into PA = LU (with P a permutation matrix, L unit lower triangular, U upper triangular) is shown below in Figure 4.1.1.
Figure 4.1.1. LU Data References (diagram: regions of the matrix accessed in STEP 1 and STEP 2).
At the k-th step of this algorithm, a matrix formed from columns 1 through k-1 and rows k through n is multiplied by a vector constructed from the k-th column, rows 1 through k-1, with the result added to the k-th column, rows k through n. The second part of the k-th step involves a vector-matrix product, where the vector is constructed from the k-th row, columns 1 through k-1, and the matrix from rows 1 through k-1 and columns k+1 through n, with the result added to the k-th row, columns k+1 through n. One can construct the factorization by analyzing the way in which the various pieces of the factorization interact. Let us consider decomposition of the matrix A into its LU factorization with the matrix partitioned in the following way:

    A = ( A11    a12   A13   )     L = ( L11    0    0   )     U = ( U11   u12   U13   )
        ( a21^T  a22   a23^T )         ( l21^T  1    0   )         (  0    u22   u23^T )
        ( A31    a32   A33   )         ( L31   l32  L33  )         (  0     0    U33   )
Multiplying L and U together and equating terms with A, we have, in particular,

    a12 = L11 u12 ,        a21^T = l21^T U11 ,        a22 = l21^T u12 + u22 ,
    a23^T = l21^T U13 + u23^T ,        a32 = L31 u12 + l32 u22 .
We can now construct the various factorizations for LU decomposition by determining how to form the unknown parts of L and U given various parts of A, L, and U. For example, given the triangular matrices L11 and U11, we can construct the vectors l21 and u12 and the scalar u22 by forming

    u12 = L11^{-1} a12 ,    l21^T = a21^T U11^{-1} ,    and    u22 = a22 - l21^T u12 .

Since these operations deal with the triangular matrices L11 and U11, they can be expressed in terms of solving triangular systems of equations.
Given the rectangular matrices L31 and U13, and the vectors l21 and u12, we can form the vectors l32 and u23 and the scalar u22 by forming

    u23^T = a23^T - l21^T U13 ,    u22 = a22 - l21^T u12 ,    and    l32 = (a32 - L31 u12)/u22 .

Since these operations deal with rectangular matrices and vectors, they can be expressed in terms of simple matrix-vector operations.
Given the triangular matrix L11, the rectangular matrix L31, and the vector l21, we can construct the vectors u12 and l32 and the scalar u22 by forming

    u12 = L11^{-1} a12 ,    u22 = a22 - l21^T u12 ,    l32 = (a32 - L31 u12)/u22 .
These operations deal with a triangular solve and a matrix vector multiply. The same ideas for use of high-level modules can be applied to other algorithms, including matrix multiply, Cholesky decomposition, and QR factorization.
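The organisation just described, in which step k touches only the k-th column and k-th row through triangular solves and matrix-vector style operations, can be sketched serially. This is our illustrative Python, not the LINPACK Fortran; pivoting is omitted for clarity, so the code assumes nonzero pivots, and the helper name `lu_matvec` is hypothetical.

```python
def lu_matvec(A):
    """LU decomposition A = L U without pivoting (illustration only).

    At step k, the k-th row of U and the k-th column of L are each formed
    from dot products with previously computed rows and columns, i.e.
    matrix-vector style updates, rather than updating the whole reduced
    submatrix.  Returns (L, U) with L unit lower triangular.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # k-th row of U: u_kj = a_kj - (row k of L, cols < k) . (col j of U)
        for j in range(k, n):
            U[k][j] = A[k][j] - sum(L[k][p] * U[p][j] for p in range(k))
        L[k][k] = 1.0
        # k-th column of L: one matrix-vector style sweep over rows k+1..n-1
        for i in range(k + 1, n):
            L[i][k] = (A[i][k] - sum(L[i][p] * U[p][k] for p in range(k))) / U[k][k]
    return L, U
```

Each inner `sum` is exactly the kind of dot-product/matrix-vector work that the text's modules package up for vector and parallel hardware.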
For the Cholesky decomposition the matrix dealt with is symmetric and positive definite. The factorization is of the form

    A = LL^T ,

where A = A^T and A is positive definite. If we assume the algorithm proceeds as in LU decomposition, but reference only the lower triangular part of the matrix, we have an algorithm based on matrix-vector operations which accomplishes the desired factorization.
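A corresponding sketch for Cholesky, again our illustrative Python rather than anything from LINPACK itself: it references only the lower triangle of A, proceeds column by column as in the LU scheme above, and assumes A is symmetric positive definite (the helper name is hypothetical).

```python
def cholesky_matvec(A):
    """Cholesky factorisation A = L L^T of a symmetric positive definite
    matrix, referencing only the lower triangular part of A.

    Column k of L is produced by matrix-vector style dot products with the
    previously computed columns, mirroring the LU organisation above.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # diagonal entry: l_kk = sqrt(a_kk - sum_p l_kp^2)
        s = A[k][k] - sum(L[k][p] ** 2 for p in range(k))
        L[k][k] = s ** 0.5
        # below-diagonal entries of column k, one dot product per row
        for i in range(k + 1, n):
            L[i][k] = (A[i][k] - sum(L[i][p] * L[k][p] for p in range(k))) / L[k][k]
    return L
```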
The final method we shall discuss is the QR factorization using Householder transformations. Given a real m x n matrix A, the routine must produce an m x m orthogonal matrix Q and an m x n upper triangular matrix R such that A = QR. Householder's method consists of constructing a sequence of transformations of the form

(4.2.1)    I - αww^T ,    where α w^T w = 2.
The vector w is constructed to transform the first column of a given matrix into a multiple of the first coordinate vector e1. At the k-th stage of the algorithm one has a matrix A_k whose first k-1 columns are already upper triangular, and w_k is constructed such that

(4.2.2)    (I - α_k w_k w_k^T) A_k = A_{k+1} ,

with the k-th column of A_{k+1} zero below the diagonal. The factorization is then updated to the form

    A = Q_k A_{k+1} ,    with    Q_k = Q_{k-1} (I - α_k w_k w_k^T) .
However, this product is not explicitly formed, since it is available in product form if we simply record the vectors w in place of the columns they have been used to annihilate. This is the basic algorithm used in LINPACK [4] for computing the QR factorization of a matrix. This algorithm may be coded in terms of two of the modules. To see this, just note that the operation of applying a transformation shown on the left hand side of (4.2.2) above may be broken into two steps:

(4.2.3)    z^T = w^T A        (vector x matrix)
           A = A - α w z^T    (rank one modification).
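The two-step application of each Householder transformation can be sketched as follows. This is our illustrative Python, not the LINPACK Fortran; for clarity Q is accumulated explicitly rather than storing the w vectors in place of the annihilated columns, and the function names are hypothetical.

```python
def apply_reflector(M, w, alpha, row0):
    """Apply I - alpha*w*w^T to rows row0.. of M via the two modules:
    z^T = w^T M (vector x matrix), then M <- M - alpha*w*z^T (rank one)."""
    ncols = len(M[0])
    z = [sum(w[i - row0] * M[i][j] for i in range(row0, len(M))) for j in range(ncols)]
    for i in range(row0, len(M)):
        for j in range(ncols):
            M[i][j] -= alpha * w[i - row0] * z[j]

def householder_qr(A):
    """QR factorization A = Q R by Householder transformations (sketch)."""
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    Q = [[float(i == j) for j in range(m)] for i in range(m)]  # accumulates Q^T
    for k in range(min(m - 1, n)):
        x = [R[i][k] for i in range(k, m)]
        norm = sum(v * v for v in x) ** 0.5
        if norm == 0.0:
            continue
        w = x[:]
        w[0] += norm if x[0] >= 0 else -norm   # sign chosen to avoid cancellation
        alpha = 2.0 / sum(v * v for v in w)    # so that alpha * w^T w = 2
        apply_reflector(R, w, alpha, k)
        apply_reflector(Q, w, alpha, k)
    # Q currently holds the product of the reflectors, i.e. Q^T; transpose it
    return [[Q[j][i] for j in range(m)] for i in range(m)], R
```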
4.3 Restructuring EISPACK subroutines

As we have seen, all of the main routines of LINPACK can be expressed in terms of the three modules described in Section 4.1. The same type of restructuring may be used to obtain efficient performance from EISPACK subroutines. A detailed description of this may be found in [6]. In the following discussion we just outline some of the basic ideas
used there. Many of the algorithms implemented in EISPACK have the following form:

Algorithm (4.3.1):
    For i = 1, 2, ...
        Generate matrix T_i
        Perform transformation A_{i+1} <- T_i A_i T_i^{-1}
    End.
Because we are applying similarity transformations, the eigenvalues of A_{i+1} are those of A_i. Since the application of these similarity transformations represents the bulk of the work, it is important to have efficient methods for this operation. The main difference between this situation and that encountered with linear equations is that these transformations are applied from both sides. The transformation matrices T_i used in Algorithm (4.3.1) are of different types depending upon the particular algorithm. The simplest are the stabilized elementary transformation matrices, which have the form T = LP, where P is a permutation matrix, required to maintain numerical stability [9,11,13], and L has the form

    L = I + we_1^T .

The inverse of L has the same structure as L and may be written in terms of a rank one modification of the identity in the following way:

    L^{-1} = I - we_1^T ,

with e_1^T w = 0. If we put A_i in partitioned form, with E denoting a block lying only in the columns touched by the transformation, C a block lying only in the rows it touches, and D the block lying in both, then forming L^{-1} A_i L updates these blocks as

    E <- E + Ewe_1^T ,
    C <- C - wc^T ,
    D <- D - wd^T + Dwe_1^T - (d^T w)we_1^T ,
where c^T = e_1^T C, d^T = e_1^T D, and e_1 is the first co-ordinate vector (of appropriate size). The appropriate module to use, therefore, is the rank one modification. However, more can be done with the rank two correction that takes place in the modification of the matrix D above. In most of the algorithms the transformation matrices T_i are Householder matrices of the form (4.2.1) shown above. This results in a rank two correction that might also be expressed as a sequence of two rank one corrections. Thus, it would be straightforward to arrange the similarity transformation as two successive applications of the scheme (4.2.3) discussed above. However, more can be done with a rank two correction, as we now show. First, suppose that we wish to form (I - αww^T)A(I - βuu^T), where for a similarity transformation α = β and w = u. We may replace the two rank one updates by a single rank two update using the following algorithm.

Algorithm 4.3.2:
    1. v^T = w^T A
    2. x = Au
    3. y^T = v^T - β(w^T x)u^T
    4. Replace A by A - βxu^T - αwy^T

As a second example, applicable to the linear equation setting, suppose that we wish to form (I - αww^T)(I - βuu^T)A; then, as with Algorithm 4.3.2, we might proceed as follows.

Algorithm 4.3.3:
    1. v^T = w^T A
    2. x^T = u^T A
    3. y^T = v^T - β(w^T u)x^T
    4. Replace A by A - βux^T - αwy^T

In both cases we can see that Steps 1 and 2 can be achieved by calls to the matrix vector and vector matrix modules. Step 3 is a simple vector operation, and Step 4 is now a rank-two correction: one gets four vector memory references for each four vector floating point operations (rather than the three vector memory references for every two vector floating point operations, as in Step 2 of (4.2.3)). These techniques have been used quite successfully to increase the performance of EISPACK on various vector and parallel machines. The results of these modifications are reported in full detail in [6].
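Algorithm 4.3.2 can be checked directly against the explicit product. The sketch below is our Python rendering of the four steps (with α, β spelled `alpha`, `beta`; the function name is hypothetical), not code from EISPACK.

```python
def rank_two_similarity(A, w, u, alpha, beta):
    """Algorithm 4.3.2: form (I - alpha*w*w^T) A (I - beta*u*u^T) using
    one rank-two correction in place of two rank-one updates."""
    n = len(A)
    v = [sum(w[i] * A[i][j] for i in range(n)) for j in range(n)]  # 1. v^T = w^T A
    x = [sum(A[i][j] * u[j] for j in range(n)) for i in range(n)]  # 2. x = A u
    wtx = sum(w[i] * x[i] for i in range(n))
    y = [v[j] - beta * wtx * u[j] for j in range(n)]               # 3. y^T = v^T - beta (w^T x) u^T
    # 4. Replace A by A - beta x u^T - alpha w y^T (rank-two correction)
    return [[A[i][j] - beta * x[i] * u[j] - alpha * w[i] * y[j] for j in range(n)]
            for i in range(n)]
```

Steps 1 and 2 are the vector-matrix and matrix-vector modules, step 3 is a vector operation, and step 4 is the single rank-two correction described in the text.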
As a typical example of the performance increase possible with these techniques we offer the following table.
Comparison of EISPACK to Matrix Vector Version
(Speedup of matrix vector versions over the EISPACK routines; all versions in Fortran.)

Routine              Order 50   Order 100   Machine
ELMHES                  1.5        2.2      CRAY-1
ORTHES                  2.5        2.5      CRAY-1
ELMBAK                  2.2        2.6      CRAY-1
ORTBAK                  3.6        3.3      CRAY-1
TRED1                   1.5        1.5      CRAY X-MP-1
TRBAK1                  4.2        3.7      CRAY X-MP-1
TRED2                   1.6        1.6      CRAY X-MP-1
SVD (no vectors)        1.7        2.0      Hitachi S-810/20
SVD (with vectors)      1.6        1.7      Hitachi S-810/20
REDUC                   1.8        2.2      Fujitsu VP-200
REBAK                   4.4        5.8      Fujitsu VP-200
5. Conclusions

As multiprocessor designs proliferate, research efforts should focus on "generic" algorithms that can be easily transported across various architectures. If a code has been written in terms of high level synchronization and data management primitives that are expected to be supported by every member of the model of computation, then only these primitives need to be customized to a particular realization. A very high level of transportability may be achieved through automating the transformation of these primitives. The benefit to software maintenance, particularly for large codes, is in the isolation of synchronization and data management peculiarities. This desire for portability is often at odds with the need to efficiently exploit the capabilities of a particular architecture. Nevertheless, algorithms should not be intrinsically designed for a particular machine. One should be prepared to give up a marginal amount of efficiency in trade for reduced manpower requirements to use and maintain software. There are many possibilities for implementation of the general ideas that are briefly
described above. We are certainly not in a position to recommend a particular implementation with any degree of finality. However, we already have experience indicating the feasibility of both of the approaches discussed here. We believe that a high level of transportability as described above can be achieved without seriously degrading potential performance. We would like to encourage others to consider the challenge of producing transportable software that will be efficient on these new machines.
6. References

[1] J.J. Dongarra and S.C. Eisenstat, Squeezing the Most out of an Algorithm in CRAY Fortran, ACM Trans. Math. Software, Vol. 10, No. 3, 1984.
[2] J.J. Dongarra and D.C. Sorensen, A Fully Parallel Algorithm for the Symmetric Eigenvalue Problem, Argonne National Laboratory, Report MCS/TM 62, January 1986.
[3] J.J. Dongarra, Performance of Various Computers Using Standard Linear Equations Software in a Fortran Environment, Argonne National Laboratory Report MCS-TM-23, updated August 1984.
[4] J.J. Dongarra, J.R. Bunch, C.B. Moler, and G.W. Stewart, LINPACK Users' Guide, SIAM Publications, Philadelphia, 1979.
[5] J.J. Dongarra, J. Du Croz, S. Hammarling, and R.J. Hanson, A Proposal for an Extended Set of Fortran Basic Linear Algebra Subroutines, Argonne National Laboratory Report MCS/TM 41, Revision 1, October 1985.
[6] J.J. Dongarra, L. Kaufman, and S. Hammarling, Squeezing the Most out of Eigenvalue Solvers on High-Performance Computers, Argonne National Laboratory Report ANL MCS-TM 46, January 1985; to appear in Linear Algebra and Its Applications.
[7] K. Fong and T.L. Jordan, Some Linear Algebra Algorithms and Their Performance on CRAY-1, Los Alamos Scientific Laboratory, UC-32, June 1977.
[8] M.J. Flynn, Very High-Speed Computing Systems, Proc. IEEE, Vol. 54, pp. 1901-1909, 1966.
[9] B.S. Garbow, J.M. Boyle, J.J. Dongarra, and C.B. Moler, Matrix Eigensystem Routines - EISPACK Guide Extension, Lecture Notes in Computer Science, Vol. 51, Springer-Verlag, Berlin, 1977.
[10] C. Lawson, R. Hanson, D. Kincaid, and F. Krogh, Basic Linear Algebra Subprograms for Fortran Usage, ACM Trans. Math. Software, 5 (1979), 308-323.
[11] B.T. Smith, J.M. Boyle, J.J. Dongarra, B.S. Garbow, Y. Ikebe, V.C. Klema, and C.B. Moler, Matrix Eigensystem Routines - EISPACK Guide, Lecture Notes in Computer Science, Vol. 6, 2nd edition, Springer-Verlag, Berlin, 1976.
[12] D.C. Sorensen, Buffering for Vector Performance on a Pipelined MIMD Machine, Parallel Computing, Vol. 1, pp. 143-164, 1984.
[13] J.H. Wilkinson, The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, 1965.
Work supported in part by the Applied Mathematical Sciences subprogram of the Office of Energy Research, U.S. Department of Energy, under Contracts W-31-109-Eng-38 and DE-AC05-84OR21400.
Large Scale Eigenvalue Problems
J. Cullum and R.A. Willoughby (Editors)
© Elsevier Science Publishers B.V. (North-Holland), 1986
THE IMPACT OF PARALLEL ARCHITECTURES ON THE SOLUTION OF EIGENVALUE PROBLEMS

Ilse C.F. Ipsen, Youcef Saad
Department of Computer Science, Yale University, New Haven, Connecticut, U.S.A.
This paper presents a short survey of recent work on parallel implementations of Numerical Linear Algebra algorithms, with emphasis on those relating to the solution of the symmetric eigenvalue problem on loosely coupled multiprocessor architectures. The vital operations in the formulation of most eigenvalue algorithms are matrix vector multiplication, matrix transposition, and linear system solution. Their implementations on several representative multiprocessor systems will be described, as well as parallel implementations of the following classes of eigenvalue methods: QR, bisection, divide-and-conquer, and the Lanczos algorithm.

1. Introduction

Undoubtedly the most significant impact on research in Scientific Computation, and Numerical Linear Algebra in particular, has come about with the advent of vector and parallel computation. This paper presents a short survey of recent work on parallel implementations of Numerical Linear Algebra algorithms, with emphasis on those relating to the solution of the symmetric eigenvalue problem on loosely coupled multiprocessor architectures. Although the concept of parallel computation was well understood in the early years of electronic computing (already Babbage recognised it as a powerful means for speeding up the multiplication of two numbers [4]), it intermittently had to give way to the largely sequential von Neumann computer. The recent turn to parallel computer architectures is motivated by the serious limitations inherent in the von Neumann model, the most important one being the limits to miniaturisation, imposed by physical constraints, which put a bound on the maximum speed of a logical circuit. Consequently, the only means for increasing computing speed by orders of magnitude is to resort to parallelism. In a parallel computer (also called a 'multiprocessor') different processors share the computations involved in the solution of a problem.
To this end, the computation must be decomposed into tasks which can be performed simultaneously by the different processors, and organised co-operation among the processors must be established by means of synchronisation and data exchange. The selection of an algorithm, its decomposition into separate computational tasks and their subsequent assignment to particular processors, as well as the physical channels and protocols by means of which the processors communicate, are among the many factors leading to a multitude of parallel implementations for any one problem. The above decisions are aggravated by the need for (or perhaps the absence of) adequate performance measures. Even if a certain multiprocessor machine is already specified, one still faces the problem of having to decompose a particular algorithm into tasks with the objective of gaining maximal speed-up and a balanced work-load for all processors. Before that, however, reliable criteria for evaluating and comparing the performance of different implementations of an algorithm on that machine are indispensable. It is also necessary, of course, to be able to
compare implementations of different algorithms on one machine, as well as implementations of different algorithms on different machines. These issues are far from being resolved. One of the reasons is that a fair assessment of two architectures must be based on the availability of adequate hardware as well as software. Yet, due to the absence of systematic design techniques, the development of software is and will undoubtedly continue to lag behind hardware development. Section 2 presents a brief characterisation of popular parallel architectures and justifies our preference for a loosely coupled multiprocessor architecture. The choice of machine model in turn influences the choice of a parallel algorithm for solving a particular problem. Extensive surveys of parallel algorithms can be found in the articles by Heller [35], Sameh [69, 70, 71], and Ortega and Voigt [60]. Unlike in the single processor case, the performance of a parallel algorithm is judged not only by its arithmetic speed but, equally, by the time required to exchange data and coordinate co-operation among processors; understanding of this aspect has only started [29, 30, 61]. Accordingly, we present a collection of parallel algorithms for basic Linear Algebra tasks in Section 3 and their application to the parallel solution of eigenvalue problems in Section 4. To conclude, the last section casts a glance at some novel techniques which promise to alleviate the complex problem of parallel algorithm development.

2. Architectures
There exist quite a few classifications of multiprocessor architectures, and we will employ two of them. The first one distinguishes architectures by the way processors relate their instructions to the data, while the second one groups machines according to the structure of their communication environment. The two most important categories of parallel architectures are Single Instruction Stream Multiple Data Stream (SIMD) and Multiple Instruction Stream Multiple Data Stream (MIMD) machines [24]. SIMD machines initiate a single stream of vector instructions, which may be realised by pipelining in one processor or by operating arrays of processors [38]. Examples include the CRAY-1, the ICL-DAP [23] and the ILLIAC IV [80]. MIMD machines simultaneously initiate different instructions on (necessarily) different data streams; essentially all multiprocessor configurations are included in this class [38]. Among the MIMD machines one can in turn differentiate between two types:
• Shared memory models: processors have very little local or 'private' memory; they exchange data and co-operate by accessing a global shared memory.
• Distributed memory models: there is no global memory, but processors possess a significant amount of local memory (with no access to other processors' local memory); there are physical interconnections between certain pairs of processors, and data and control information is transferred from one processor to another along a path of these interconnections.
2.1. The Shared Memory Model

The shared memory model is frequently implemented by connecting k processors to k memories via a large switching network, see Figure 1 (this switching network may be replaced by a global bus when the number of processors k is small). Thus the memory can be viewed as split into k 'banks', and shared among the k processors. Variations on this scheme are numerous, but the essential features here are the switching network and the shared memory; examples include the Ultracomputer developed at NYU [32], which uses an Omega network. Programming is greatly facilitated due to transparent data access (from the user's point of view data are stored in one large memory readily accessible to any processor) and the ability of the switching network to simulate any interconnection topology. However, memory conflicts can lead to degraded performance, and the shared memory models cannot easily take advantage of proximity of data in problems with local (data) dependences; these questions
are addressed in [26, 79]. Furthermore, the switching network becomes exceedingly complex as the number of processors and memories increases: the connection of N processors to N memories in general requires a total of O(N log2 N) identical 2 x 2 switches.

Figure 1: A Tightly Coupled Shared Memory Machine.

2.2. The Distributed Memory Model
In the distributed memory model, the processors are identical and the processor interconnections form a regular topology; examples are depicted in Figures 2, 3 and 4. There is no tight global synchronisation, and the computations are data driven (ie, a computation in a particular processor is performed only when the needed operands become available). Examples include the finite element machine [47], tree machines [12], the cosmic cube [78] and systolic arrays [51]. Clearly, one of the most important advantages of the second class of architectures is its ability to exploit locality of data dependences in order to keep communication costs to a minimum. Thus, a two-dimensional processor grid as in Figure 3 is perfectly suitable for solving discretised elliptic partial differential equations (eg, by assigning each grid point to a corresponding processor) because iterative methods for solving the resulting linear systems require only interaction between adjacent grid points. Hence, an efficient general purpose multiprocessor must have powerful mapping capabilities, ie, it must be able to easily emulate many common topologies such as grids or linear arrays.

2.3. Hypercube-based Architectures

The 'hypercube' (boolean cube, n-cube), a distributed memory machine, constitutes an excellent compromise between a linear array and a completely interconnected network of processors. It offers a rich interconnection structure with large bandwidth, logarithmic diameter, and the ability to simulate every realistic architecture with small overhead. This explains the growing interest in hypercube-based parallel machines; commercially available machines at this point (last quarter of 1985) are the 128-processor INTEL iPSC/d7, the 1024-processor NCUBE/Ten, the 64000-processor (bit-sliced) Connection Machine from Thinking Machines, and the soon-to-be-available 256-processor Ametek/System 14 [19].
The topology of a hypercube is best described by a simple recursion: a hypercube of dimension 1 consists of two connected processors, and a hypercube of dimension n+1 is made up of two identical subcubes of dimension n by connecting processors in corresponding positions; an illustration is given in Figure 4, which shows a four-dimensional cube constructed from two three-dimensional cubes. Topological characterisations of the hypercube, in particular with respect to embeddings of different graphs in the hypercube, are investigated in [6, 67, 65].
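The recursion above is equivalent to a well-known labelling: number the 2^n processors in binary and connect two of them exactly when their labels differ in one bit (joining corresponding nodes of two n-cubes flips one new bit). A small sketch of this standard construction (the function name is ours):

```python
def hypercube_edges(n):
    """Edges of the n-dimensional hypercube on nodes 0 .. 2^n - 1.

    Two nodes are connected exactly when their binary labels differ in a
    single bit; the condition x < x ^ (1 << b) lists each edge once.
    """
    return [(x, x ^ (1 << b))
            for x in range(2 ** n)
            for b in range(n)
            if x < (x ^ (1 << b))]
```

Each node has degree n and there are n * 2^(n-1) edges, consistent with the logarithmic diameter mentioned in the text.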
Figure 2: A Processor Ring Consisting of Eight Processors.
Figure 3: A 4 x 4 Multiprocessor Grid.

3. Parallel Algorithms for Basic Linear Algebra Computations
When analysing the complexity of parallel methods in numerical linear algebra, one must bear in mind that the total time required to run an algorithm on a multiprocessor system depends not only on pure arithmetic time but also on the time needed for exchanging data among processors. This implies a great richness in the class of algorithms, in terms of the assignments of tasks to processors and the assumed topology of the processor communication network: for a particular task it is now important when its input data become available as results of preceding calculations, in which processor they are located, and how long it will take to move them to the requesting processor.
Figure 4: A Hypercube of Dimension 4.

In many practical applications the number of processors k will usually be much smaller than the 'problem size' N (eg, the order of the matrix), and a large variety of algorithms can be found by choosing different ways of assigning matrix elements to the processors. We must take into account that times for data transfer are not negligible and may, in fact, dominate the times for actual arithmetic. A fairly general and yet simple communication model is proposed in [43], and the algorithms are characterised and compared with respect to their requirements for arithmetic as well as communication. It is assumed that any processor is capable of writing to one of its directly connected neighbours while reading from the other. For purposes of estimating the computation time, processors are considered to work in lock step, where one step corresponds to the computation time of the slowest processor, which, in particular, implies that identical tasks will take an equal amount of time if started simultaneously on different processors. This assumption is by no means restrictive and its sole purpose is to simplify the complexity analysis and its results. As a matter of fact, most of the parallel algorithms proposed so far can be viewed as SIMD methods, in the sense that a typical parallel loop comprises k identical tasks to be executed in parallel. It is further assumed that communication and arithmetic are not overlapped, which is the case, for instance, when processors are not equipped with I/O co-processors (ie, processors solely devoted to performing input and output). Yet even when this is not true, it is important to have a realistic measure of what exactly constitutes communication and what computation time, in order to judge the efficiency of an algorithm.
We define communication (or data transfer) time as the time to execute an algorithm (in lock step mode) under the assumption that arithmetic can be done in zero time (that is, the arithmetic unit is regarded to be infinitely fast). Arithmetic time can then be defined analogously. The corresponding computation time is at most double of the one resulting from overlapped computation and communication [61]. If processor interconnections are capable of transmitting R words per second, then the inverse is denoted by r . In general, each transfer of a data packet is associated with a constant start-up (set-up) time of @, which is independent of the size (the number of words) per packet. Often, the start-up times are (much) larger than the elemental transfer times, that is, /3 >> r . The time to send a packet of size N from a processor to its neighbour is t~ = N r . On a single processor, a linear combination of two vectors of length N takes time t~ = 7 N w , where 7 is the pipe fill time (it is zero for non-pipelined machines), w the time for one scalar operation and 7 2 w (again, the start-up time dominates the elemental operation time).
I. C.F. Ipsen and Y. Saad
For any algorithm, the sum of its transfer and arithmetic times, t_C + t_A, is simply called its computation time. The following section gives a short overview (with no claims of being complete) over possible implementations of basic Linear Algebra operations on the three loosely coupled architectures. The efficiency of parallel eigenvalue methods depends crucially on the implementation of data transfers, matrix transposition, matrix vector multiplication, and linear system solution.

3.1. Algorithms for the Processor Ring
A multiprocessor ring is one of the simplest interconnection schemes and yet it is one of the most cost-effective architectures when it comes to bridging the gap between future supercomputers and current vector computers. As suggested in [77], a small number of inexpensive off-the-shelf standard array processors can easily be connected in a ring, yielding a machine with the computing power of a CRAY-1. As mentioned before, rings can be emulated without difficulty by most loosely coupled architectures. Time complexities of elementary data transfers and dense matrix vector multiplication on a processor ring are discussed in [43]. These operations are used to implement various algorithms for solution of dense linear systems by Gaussian elimination on a processor ring. Three ways of assigning matrix elements to particular processors are considered: by rows, by columns and by diagonals. A summary of the results obtained follows. The communication times are low order terms compared to the arithmetic operation times when the number of processors k is small compared to the order of the matrix N. Both the arithmetic times and the communication times of the triangular system solution methods are low order terms in comparison with those of Gaussian elimination. Allocating non-adjacent rows or columns of the matrix to a processor results in better arithmetic performance but worse communication performance. Allocation of non-adjacent diagonals to a processor results in poor overall performance. The overhead of pivoting is small compared with the cost of Gaussian elimination; it is however of the same order of magnitude as that of triangular system solution. Pivoting is less expensive for the row-oriented schemes. In [61], lower bounds on the communication complexity for dense Gaussian elimination on bus and ring oriented architectures are shown: the communication time is of order at least O(N^2), independent of the number of processors.
Gaussian elimination for dense systems on a multiprocessor ring is discussed in [71]. Lawrie and Sameh [53] present a technique for solving symmetric positive-definite banded systems, which is a generalisation of a method for tridiagonal system solution on multiprocessors; it takes advantage of different alignment networks for allocating data to the memories of particular processors. Implementation of the Cholesky factorisation on a ring is discussed in [34]. New algorithms and implementations for the solution of symmetric positive-definite systems are given in [16], as well as minimal time implementations for Toeplitz matrices on linear systolic arrays, which can be easily adapted to loosely-coupled systems. The literature on solution of linear systems arising from partial differential equations is extensive; the reader is referred to the survey by Ortega and Voigt [60]. Iterative methods on rings or linear arrays have been considered by Saad and Sameh [62], Saad, Sameh, and Saylor [63] and more recently by Johnsson, Saad and Schultz [46, 68]. In [68] it was shown that a speed-up of up to O(n) can be achieved when the system arises from an elliptic partial differential equation. Although implementations of direct sparse matrix techniques on vector and parallel computers have been considered by Duff [22, 21], there is very little work dealing with parallel implementations of sparse direct solutions; parallel nested dissection for finite element problems is discussed in [GANNON].
Parallel Architectures and the Solution of Eigenvalue Problems
3.2. Algorithms for the Two-Dimensional Processor Grid
A lower bound for the complexity of communication in dense Gaussian elimination on a two-dimensional grid of processors is O(N^2/√k) + O(N√k) [61] when no overlapping of successive steps in Gaussian elimination takes place, and O(N^2/k) + O(√k) for pipelined algorithms. In the spirit of the 'wavefront concept' made popular by S.Y. Kung [50], data flow algorithms for dense Cholesky factorisation are developed in [59]. Six different implementations of the dense Cholesky factorisation, depending on the arrangement of the three loop indices, on a processor grid are discussed and compared in [27, 34]. The preferred variant turns out to be a computation of the Cholesky factor by columns, whereby previously computed columns are accessed columnwise. The idea for transposing dense and banded matrices on two-dimensional architectures was first developed for systolic arrays in [13, 40] and carries over right away to multiprocessor systems; similar ideas can be found in [58].
3.3. Algorithms for the Hypercube

The hypercube topology has been the focus of much recent research in parallel computation; since it can easily emulate many other architectures, the first task is the assignment of data to the processors so as to optimise processor utilisation. Saad and Schultz [65] establish general properties of the hypercube, while Bhatt and Ipsen [6] present algorithms for efficient embeddings of trees onto hypercubes for potential employment in adaptive numerical computations. Chan and Saad [14] propose a mapping of the grid points onto the hypercube so as to minimise processor communication in Multigrid methods. Saad and Schultz [66] discuss the problem of solving banded linear systems on ring, mesh and hypercube architectures. It is concluded that the concept of one best algorithm for a given architecture is no longer valid. For instance, consider a simple banded linear system of half-bandwidth ν and order N. It is not realistic to assume, as often done, that the half-bandwidth ν matches exactly the number of processors k, ie, that ν = k or that ν^2 = k. In reality the total number of available processors k is fixed, and one has to determine the best way of solving the system for different values of ν and N. In the extreme case of a tridiagonal matrix, where ν = 1, Gaussian elimination is not parallelisable, and therefore should be excluded, while the cyclic reduction algorithm [44], which is not advantageous for sequential machines, is highly parallel and should be selected. At the other extreme, when the half-bandwidth ν is very large with respect to k, simple banded Gaussian elimination with the rows of the matrix uniformly distributed over the processors performs best [66]. Similar observations can be made for ADI methods [46, 64].
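The claim that a hypercube easily emulates a ring rests on the binary-reflected Gray code: consecutive codes differ in exactly one bit, so ring neighbours map to hypercube neighbours. A short sketch (our own illustration, not from the papers cited):

```python
def gray(i):
    # Binary-reflected Gray code: gray(i) and gray(i+1) differ in one bit.
    return i ^ (i >> 1)

def ring_to_hypercube(k):
    """Map ring node i (0 .. 2^k - 1) to hypercube node gray(i)."""
    return [gray(i) for i in range(2 ** k)]

def hamming(a, b):
    return bin(a ^ b).count("1")

# Every ring edge (i, i+1 mod 2^k) becomes a hypercube edge (Hamming distance 1):
nodes = ring_to_hypercube(4)
assert all(hamming(nodes[i], nodes[(i + 1) % 16]) == 1 for i in range(16))
```

The wrap-around edge also works, since gray(2^k - 1) and gray(0) = 0 differ only in the top bit.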
Thus, it appears that in the standard software packages of tomorrow one will find several different codes for solving the same problem: according to parameters like size of the problem, number of floating point operations per second and communication bandwidth, the program would dynamically choose the best alternative.

4. Eigenvalue Algorithms
So far, most of the work on parallel computation of eigenvalues for (symmetric) matrices has concentrated on the development of systolic array algorithms and hardware, mainly with signal processing applications in mind.

4.1. Methods for (Small) Dense Matrices

There is a variety of systolic array implementations for the symmetric eigenvalue problem based on the shifted QR algorithm with Givens' rotations (the exception is [45], which makes use of Householder transformations). One can use either the arrays for orthogonal factorisations [1, 7, 31, 36, 37, 54] (see [5, 41] for the implementation of 'Fast' rotations) or the ones constructed specifically for the solution of the tridiagonal [36, 57], positive-definite
tridiagonal [28] or banded [75] eigenvalue problem; however no satisfactory way for an efficient shift computation has been found. Other designs for symmetric tridiagonal eigenvalue computations include a doubling version of the QR method [3], methods based on isospectral flows [2], and Newton's method or bisection [75]. Often a second, different set of arrays is required to reduce by similarity transformations the original matrix to tridiagonal or banded form [36, 74, 75]. Earlier papers for the solution of the tridiagonal eigenvalue problem on more general parallel architectures suggest the use of multiple Sturm sequences and bisection [49], and computation of the QR iteration via recurrence equations [72]. Parallel implementations of Jacobi's method to compute the eigenvalues of dense symmetric matrices have been considered in [49, 52] as early as 1971. Implementations for the ICL DAP [23] can be found in [56], and improved versions for systolic array implementations [9] seem to be most promising in terms of actual physical realisation since no shift computations are necessary and the architectures are simple; a modified algorithm is presented in [76] to deal with problems whose size does not match the number of available processors. A systolic array that solves the generalized eigenvalue problem for dense matrices via the QZ algorithm is discussed in [8]. Suggestions for parallel computation of certain instances of the generalized and the nonsymmetric eigenvalue problem, as well as for Lanczos method, are presented in [70]. The divide-and-conquer approach to the tridiagonal eigenvalue problem introduced by Cuppen [15] has been advocated for implementations on tree machines [48], shared-memory architectures [20] and the hypercube [42].
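The Sturm sequence/bisection approach of [49] parallelises naturally, since each eigenvalue can be located independently of the others. A serial sketch (the interface and safeguards below are our own, not from [49]):

```python
import numpy as np

def count_below(a, b, x):
    """Number of eigenvalues of the symmetric tridiagonal matrix with
    diagonal a and off-diagonal b that are smaller than x, via the
    Sturm sequence d_i = (a_i - x) - b_{i-1}^2 / d_{i-1}."""
    count, d = 0, 1.0
    for i in range(len(a)):
        d = (a[i] - x) - (b[i - 1] ** 2 / d if i > 0 else 0.0)
        if d == 0.0:
            d = -1e-30            # safeguard against an exact zero pivot
        if d < 0:
            count += 1
    return count

def kth_eigenvalue(a, b, k, tol=1e-12):
    """Bisection for the k-th smallest eigenvalue (k = 0, 1, ...)."""
    n = len(a)
    r = max(abs(a[i]) + (abs(b[i - 1]) if i > 0 else 0.0)
            + (abs(b[i]) if i < n - 1 else 0.0) for i in range(n))
    lo, hi = -r, r                # Gershgorin interval contains the spectrum
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_below(a, b, mid) <= k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In a parallel setting each processor simply runs `kth_eigenvalue` for its own set of indices k, with no communication at all.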
In signal processing applications, the preferred approach is to design arrays that compute the singular values of the Cholesky factors of the covariance matrix directly, instead of employing arrays for computing the eigenvalues of the symmetric positive-definite covariance matrix itself. Examples include arrays based on the Jacobi method [10, 11, 55] and on Givens' rotations [36, 39, 76]. The iterative reduction to triangular form of a square matrix by Schur rotations is suggested in [59] for an architecture similar to the systolic array in [55]. For general two-dimensional processor grids, a dataflow algorithm similar to the one for Cholesky factorisation is developed [59] for congruence transformations.

4.2. Methods for Large Sparse Matrices

Generally speaking, large sparse eigenvalue problems have rarely been examined from the parallel point of view, since the parallel algorithms are either trivial extensions of those for solving linear systems or obvious adaptations of sequential algorithms. For instance, a good shift-and-invert technique can be implemented for Lanczos algorithm if the inner loops, which consist of system solutions, can be efficiently parallelised. As a second example, the Subspace Iteration method is a perfectly parallelisable method and might actually constitute a reasonable choice in multiprocessor environments despite its sluggishness on sequential machines. Although actual comparisons between parallel implementations of this technique and Lanczos method in the symmetric case are in order, we still expect Lanczos algorithm to be superior. Along different lines, Sameh and Wisniewski [73] and Wisniewski [81] propose an approach based on trace minimisation for solving the generalised eigenvalue problem Ax = λBx, where the trace of W^T AW is minimised over all N × M matrices W that are B-orthonormal. An interesting idea by Grimes et al.
[33] is the use of Lanczos algorithm, the preferred method for solution of large sparse symmetric eigenvalue problems, to solve even relatively small and dense problems. The tests conducted on a Cray X-MP/24 show that Lanczos algorithm performs better on dense matrices of order 219 to 1496 than the Cray-optimised EISPACK routines. Thus, vector and parallel machines may result in unexpected changes regarding the range of applicability of classical algorithms. Large sparse nonsymmetric eigenvalue problems are crucial in the analysis of complex dynamic systems. Due to nonlinearities, the numerical nature of these problems is so intractable that scientists and engineers often abandon them and resort to simplified models.
In this situation the use of massively parallel computing could be a decisive factor, as it might enable the solution of presently intractable problems. Nonsymmetric eigenvalue methods will benefit very strongly from an increase in computational power and parallelism. It is hoped that these difficult problems will be tackled once reasonably reliable and user-friendly parallel machines appear on the market.

5. Outlook
For multiprocessor systems with regular communication topologies, such as the ones discussed here, novel techniques promise 'automatic' design of many algorithms. Progress in this area has been made especially for the implementation of algorithms on systolic arrays (surveys can be found in [18, 25]). These approaches are easily adaptable to more general multiprocessor architectures with regular communication topologies. Given sets of recurrence equations (eg, a FORTRAN program consisting of several sets of nested loops), the design methodology developed in [17, 18] delivers the description of provably optimal, regular, parallel architectures for their implementation. Upon specification of certain constraints for the resulting architecture, eg, avoidance of broadcasting or restriction to nearest-neighbour communication, the methodology generates an optimal mapping onto a given architecture. Application of this design methodology will allow the numerical analyst to concentrate on the algorithm development, particularly on modifications for improved numerical behaviour or pipelinability, rather than on the mapping or implementation process.
Acknowledgements

The work presented in this paper was supported by the Office of Naval Research under contracts N00014-82-K-0184 and N00014-85-K-0461.
References

[1] Ahmed, H.M., Delosme, J.-M. and Morf, M., Highly Concurrent Computing Structures for Matrix Arithmetic and Signal Processing, IEEE Computer, 15 (1982), pp. 65-82.
[2] Ang, P.H., Delosme, J.-M. and Morf, M., Concurrent Implementation of Matrix Eigenvalue Decomposition based on Isospectral Flows, Proc. 27th Ann. Symp. of SPIE, 1983.
[3] Ang, P.H. and Morf, M., Concurrent Array Processor for Fast Eigenvalue Computations, Proc. 1984 IEEE ICASSP, 1984, pp. 34A.2.1-4.
[4] Babbage, H.P., Babbage's Analytical Engine, Mon. Not. Roy. Astron. Soc., 70 (1910), pp. 517-26.
[5] Barlow, J.L. and Ipsen, I.C.F., Scaled Givens Rotations for the Solution of Linear Least Squares Problems on Systolic Arrays, SIAM J. Sci. Stat. Comp., (1986).
[6] Bhatt, S.N. and Ipsen, I.C.F., Embedding Trees in the Hypercube, Research Report 443, Dept Computer Science, Yale University, 1985. Submitted for publication.
[7] Bojanczyk, A., Brent, R.P. and Kung, H.T., Numerically Stable Solution of Dense Systems of Linear Equations Using Mesh-Connected Processors, SIAM J. Sci. Stat. Comp., 5 (1984), pp. 95-104.
[8] Boley, D., A Parallel Method for the Generalized Eigenvalue Problem, Technical Report 84-21, Dept Computer Science, University of Minnesota, 1984.
[9] Brent, R.P. and Luk, F.T., The Solution of Singular-Value and Symmetric Eigenvalue Problems on Multiprocessor Arrays, SIAM J. Sci. Stat. Comput., 6 (1985), pp. 69-84.
[10] Brent, R.P., Luk, F.T. and Van Loan, C.F., Computation of the Singular Value Decomposition Using Mesh-Connected Processors, J. VLSI Computer Systems, 1 (1984). To appear.
[11] Brent, R.P., Luk, F.T. and Van Loan, C.F., Computation of the Generalized Singular Value Decomposition Using Mesh-Connected Processors, Proc. SPIE Symp. 431 (Real Time Signal Processing VI), 1983, pp. 66-71.
[12] Browning, S.A., The Tree Machine: A Highly Concurrent Computing Environment, Technical Report TR-3760, Dept Computer Science, California Institute of Technology, 1980.
[13] Cappello, P.R. and Steiglitz, K.,
the following algorithm computes orthogonal U and V such that OFF(U^T A V) ≤ ε ||A||_F.

Algorithm 2.1

While ( OFF(A) > ε ||A||_F )
    For rotationset = 1 : k-1
        Processor P_i (i = 1:N) does the following:
            Solves subproblem (2i-1, 2i), thereby generating orthogonal U(i) and V(i).
            Broadcasts U(i) to P_1, ..., P_{i-1}, P_{i+1}, ..., P_N.
            Receives U(1), ..., U(i-1), U(i+1), ..., U(N).
            Performs the updates:
                [A_{2i-1}, A_{2i}] <- diag(U(1), ..., U(N))^T [A_{2i-1}, A_{2i}] V(i)
                [V_{2i-1}, V_{2i}] <- [V_{2i-1}, V_{2i}] V(i)
                [U_{2i-1}, U_{2i}] <- [U_{2i-1}, U_{2i}] U(i)
        Global communications:
            A <- AP,   U <- UP,   V <- VP
    end
end
It is assumed that a threshold criterion τ and a subproblem parameter ρ are part of the subproblem solution procedure.
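A serial sketch of the sweep structure of Algorithm 2.1 may be helpful. The simplifications below are ours: all block pairs are visited cyclically on one processor, every subproblem is solved with a full SVD rather than the procedures of Section 3, and no threshold test is used.

```python
import numpy as np

def off(A, bs):
    """OFF(A): Frobenius norm of A with its diagonal blocks (size bs) zeroed."""
    M = A.copy()
    for i in range(0, A.shape[0], bs):
        M[i:i + bs, i:i + bs] = 0.0
    return np.linalg.norm(M)

def block_jacobi_svd(A, bs, eps=1e-10, max_sweeps=50):
    """Return U, S, V with U.T @ A0 @ V = S nearly block diagonal, by
    cyclically solving all (i, j) block subproblems with a full SVD."""
    A = A.copy()
    n = A.shape[0]
    U, V = np.eye(n), np.eye(n)
    normA = np.linalg.norm(A)
    for _ in range(max_sweeps):
        if off(A, bs) <= eps * normA:
            break
        for i in range(0, n, bs):
            for j in range(i + bs, n, bs):
                idx = np.r_[i:i + bs, j:j + bs]
                # subproblem (i, j): diagonalize the 2bs-by-2bs pivot
                Us, _, Vst = np.linalg.svd(A[np.ix_(idx, idx)])
                A[idx, :] = Us.T @ A[idx, :]      # left rotation
                A[:, idx] = A[:, idx] @ Vst.T     # right rotation
                U[:, idx] = U[:, idx] @ Us        # accumulate U
                V[:, idx] = V[:, idx] @ Vst.T     # accumulate V
    return U, A, V
```

Note that `np.linalg.svd` sorts singular values, so the subproblem transformations need not be close to the identity; Section 3 discusses why that matters.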
Singular Value Decomposition on a Ring of Array Processors
With this breezy development of the parallel block Jacobi SVD algorithm, we are ready to
look at some important practical details.
3. SOLVING THE SUBPROBLEMS
In the typical subproblem we are presented with a submatrix

    A_s = [ A_11  A_12 ]
          [ A_21  A_22 ]

with block columns of width n_1 and n_2, and must choose orthogonal U_s and V_s such that

    B = U_s^T A_s V_s = [ B_11  B_12 ]
                        [ B_21  B_22 ]

satisfies

    ||B_12||_F^2 + ||B_21||_F^2  ≤  ρ [ ||A_12||_F^2 + ||A_21||_F^2 ]        (3.1)

for some fixed ρ < 1. We mention two distinct approaches to this problem.
Method 1. (Partial SVD via Row-Cyclic Jacobi)

Use the row cyclic Jacobi procedure (Algorithm 1.1) to compute U_s and V_s such that (3.1) holds. That is, keep sweeping until A_s is sufficiently close to 2-by-2 block diagonal form. What is nice about this approach is that it can exploit the fact that the subproblems are increasingly diagonal as the overall iteration progresses: A_s more diagonal implies fewer sweeps needed to satisfy (3.1). On the other hand, Jacobi requires square matrices and so A_s may have to be padded with zero columns. There are some subtleties associated with this; see Brent, Luk, and Van Loan [1985].
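The kernel of any such Jacobi sweep is the SVD of a 2-by-2 matrix by a pair of rotations. A sketch of the standard two-step construction (symmetrize, then diagonalize) in our own code, not the authors':

```python
import numpy as np

def svd_2x2(B):
    """SVD of a 2x2 matrix via two rotations: a left rotation R1 making
    R1.T @ B symmetric, then a Jacobi rotation J diagonalizing the result.
    Returns (U, s, V) with B = U @ np.diag(s) @ V.T and s >= 0."""
    w, x = B[0]
    y, z = B[1]
    # Left rotation (angle alpha) so that S = R1.T @ B is symmetric:
    alpha = np.arctan2(y - x, w + z)
    c1, s1 = np.cos(alpha), np.sin(alpha)
    R1 = np.array([[c1, -s1], [s1, c1]])
    S = R1.T @ B
    # Jacobi rotation (angle theta) zeroing the off-diagonal of S:
    theta = 0.5 * np.arctan2(2.0 * S[0, 1], S[0, 0] - S[1, 1])
    c, s = np.cos(theta), np.sin(theta)
    J = np.array([[c, -s], [s, c]])
    D = J.T @ S @ J
    U, V = R1 @ J, J
    sing = np.diag(D).copy()
    for i in range(2):            # make the singular values non-negative
        if sing[i] < 0:
            sing[i] = -sing[i]
            U[:, i] = -U[:, i]
    return U, sing, V
```

When B is nearly diagonal both rotation angles are small, which is exactly the close-to-identity behaviour discussed later in this section.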
C. Bischof and C. Van Loan
Method 2. (Golub-Reinsch SVD with Bidiagonalization Pause)

For (n_1 = m_1 = n_2 = m_2 = 4),

    U_B^T A_s V_B =  [ x x 0 0 0 0 0 0 ]
                     [ 0 x x 0 0 0 0 0 ]
                     [ 0 0 x x 0 0 0 0 ]
                     [ 0 0 0 x b 0 0 0 ]
                     [ 0 0 0 0 x x x x ]
                     [ 0 0 0 0 x x x x ]
                     [ 0 0 0 0 x x x x ]
                     [ 0 0 0 0 x x x x ]

where U_B and V_B are products of Householder transformations. Note that the "b" entry is all that prevents the reduced matrix from being block diagonal. This suggests that if |b| is small enough, then A_s is sufficiently close to block diagonal form and we set U_s = U_B and V_s = V_B. If |b| is too large, we complete the bidiagonalization and proceed with the iterative portion of the Golub-Reinsch algorithm, terminating as soon as the absolute value of the current (n_1, n_1 + 1) entry is sufficiently small.

In contrast to Method 1, Method 2 can handle rectangular problems more gracefully. But note that in the rectangular case, diagonalization of A_s does not correspond to block diagonalization:

    [ x 0 0 0 ]
    [ 0 x 0 0 ]
    [ 0 0 x 0 ]
    [ 0 0 0 x ]
    [ 0 0 0 0 ]
    [ 0 0 0 0 ]

The situation is remedied by some obvious permutations of U_s and V_s.
After some experimentation we settled on the following subproblem procedure (illustrated by the m_1 = m_2 = 4, n_1 = n_2 = 3 case):
Subproblem Procedure

Step 1. Compute orthogonal Q so that

    Q^T A_s =  [ x x x x x x ]
               [ 0 x x x x x ]
               [ 0 0 x x x x ]
               [ 0 0 0 0 0 0 ]
               [ 0 0 0 x x x ]
               [ 0 0 0 0 x x ]
               [ 0 0 0 0 0 x ]
               [ 0 0 0 0 0 0 ]

This is a slight variation of the LINPACK QR factorization. For long rectangular problems Chan [1982] has demonstrated the advisability of preceding the SVD with a QR factorization. The reason for the "split" R has to do with obtaining close-to-identity U_s and V_s in the final stages of the iteration. We also mention that after the first rotation set is performed all subsequent subproblems have diagonal blocks, a fact that our split QR routine exploits.

Step 2. Glue the pieces of R together and compute the SVD. Steps are taken to insure that U_s and V_s are close to the identity whenever that is possible through permutation.

Step 3. Form U_s out of Q and U, and form V_s out of V.
It is perhaps worth dwelling on the need for close-to-identity transformations. This corresponds to the choosing of small rotation angles in the scalar case. A brief example serves to illustrate why this is important. Suppose k = 4, and that each block in A is 2-by-2. Suppose that A had the following form
    [ x ε ε ε ε ε ε ε ]
    [ ε x ε ε ε ε ε ε ]
    [ ε ε x ε ε ε ε ε ]
    [ ε ε ε x ε ε ε ε ]
    [ ε ε ε ε x ε ε ε ]
    [ ε ε ε ε ε x ε ε ]
    [ x x ε ε ε ε x ε ]
    [ x x ε ε ε ε ε x ]

where ε denotes a small entry. Suppose that the U_s and V_s for subproblem (1,2) are close to the 4-by-4 identity but that in subproblem (3,4) U_s = V_s ≈ [e_3, e_4, e_1, e_2]. It then follows that A gets updated to
A gets updated to X
E
E
E
E
E
E
E
E
X
E
E
E
E
E
E
E
E
X
E
E
E
E
E
E
E
E
X
E
E
E
E
X
X
E
E
X
E
E
E
X
X
E
E
E
X
E
E
E
E
E
E
E
E
X
E
E
E
E
E
E
E
E
X
To get ready for the next rotation set, which involves rotations (1,4) and (2,3), the block columns (A
+
and
rows
of
A
are
shuffled
[El, E,, E,, E,IT A [El, E,, E4, E2I) giving
according
to
the
parallel
ordering
    [ x ε ε ε ε ε ε ε ]
    [ ε x ε ε ε ε ε ε ]
    [ ε ε x ε ε ε ε ε ]
    [ ε ε ε x ε ε ε ε ]
    [ ε ε ε ε x ε ε ε ]
    [ ε ε ε ε ε x ε ε ]
    [ x x ε ε ε ε x ε ]
    [ x x ε ε ε ε ε x ]
Thus the position of the non-negligible off-diagonal block remains fixed, thereby slowing convergence immeasurably. (We stumbled into these observations when we encountered a fairly small example that required about 40 block sweeps to converge.)
To guard against this we have incorporated a low-overhead heuristic procedure in Step 3 that permutes all large entries in U and V to the diagonal.
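The paper does not spell the heuristic out; one low-overhead possibility, entirely our guess at the flavour of such a procedure, is a greedy pass that assigns to each row the largest remaining column:

```python
import numpy as np

def permute_toward_identity(U):
    """Return (Up, perm) where Up = U[:, perm] has its largest-magnitude
    entries moved toward the diagonal. Greedy heuristic (hypothetical,
    not the authors' procedure): rows are processed in decreasing order
    of their largest absolute entry, each claiming its best free column."""
    n = U.shape[0]
    perm = [-1] * n
    free = set(range(n))
    for i in np.argsort(-np.abs(U).max(axis=1)):
        j = max(free, key=lambda col: abs(U[i, col]))
        perm[i] = j
        free.remove(j)
    return U[:, perm], perm
```

In Step 3 the chosen permutation would of course have to be applied consistently to U, V and the subproblem's singular values; that bookkeeping is omitted here.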
4. IMPLEMENTATION ON THE IBM KINGSTON LCAP SYSTEM
The IBM Kingston Loosely Coupled Array Processor (LCAP) system consists of ten FPS-164/MAX array processors connected in a ring via five bulk memories manufactured by Scientific Computing Associates (SCA). Each bulk memory is attached to four array processors (AP's) and each AP is attached to a pair of bulk memories. This allows for considerable flexibility. For example, six AP's can be allocated to one user and four to another. Communication is via some message passing primitives provided by the SCA system. To the user these primitives look like calls to Fortran subroutines and are processed by a precompiler.
There are two types of communication required by Algorithm 2.1. Associated with each rotation set is a broadcast. Each AP must send an orthogonal matrix to every other AP in the ring. This is accomplished in merry-go-round fashion. The U(i) are passed around the ring (clockwise, for example) and are applied to the housed block columns at each "stop". After the update, the A matrix must be redistributed around the ring. Here, the nearest neighbor topology is particularly convenient.
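The merry-go-round broadcast can be simulated in a few lines (a toy model of the communication pattern, not the SCA primitives):

```python
def merry_go_round(local_items):
    """Simulate the ring broadcast: in each step every processor passes its
    current visitor clockwise and records the one it receives. Returns, per
    processor, the list of items it has seen (its own first). local_items[i]
    stands in for the orthogonal matrix U(i) held by AP i."""
    n = len(local_items)
    seen = [[u] for u in local_items]
    visitor = list(local_items)
    for _ in range(n - 1):
        # every AP sends its visitor to the next AP simultaneously
        visitor = [visitor[(i - 1) % n] for i in range(n)]
        for i in range(n):
            seen[i].append(visitor[i])
    return seen

# After n-1 steps each of the n processors has seen every U(j) exactly once:
assert all(sorted(s) == list(range(6)) for s in merry_go_round(list(range(6))))
```

Each of the n-1 steps is a nearest-neighbour transfer, so the whole broadcast costs n-1 message start-ups per processor.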
By "piggybacking" information on the U(i) as they circulate around the ring, important global information such as OFF(A) and ε ||A||_F can be made available to each AP. This, of course, is critical in order to automate termination and the threshold testing.
We have run several examples and have gathered as much timing information as the LCAP system permits. Our results for a 636 × 96 example run on a 6-processor ring are fairly typical. Approximately 5 percent of the execution time was devoted to communication. Seven block sweeps were required and the overall performance rate was in the neighborhood of 3.6 Mflops. It is impossible to compare this with a single processor run as the problem would not fit into the fast memory of a single AP. However, it is clear that we are not achieving significant speed-up as the FPS-164 (without MAX boards) has a peak performance rating of 11 Mflops. (The AP's in the LCAP system are each scheduled to have two MAX boards, but these were not available at the time of our benchmarks.)
Since the block Jacobi procedure is rich in matrix-matrix multiplication, we expect more favorable performance when the LCAP MAX boards are in place. It is clear that communication costs diminish as problem size grows. When m and n are several hundred, communication costs are quite insignificant.
5. CONCLUSIONS
The problem, as we see it, concerns the algorithm itself. In terms of the amount of arithmetic, two block sweeps are equivalent to one complete Golub-Reinsch algorithm. Thus, it is critical that the number of block sweeps be kept to a minimum. The problems we ran require between 6 and 10 block sweeps in order to reduce OFF(A) to a small multiple of ε ||A||_F, where ε is the machine precision. Thus, we suspect that it will be difficult to implement a parallel block Jacobi SVD procedure that has a decided advantage over a single processor LINPACK SVD routine. The situation is even worse for scalar Jacobi procedures, where the communication/computation ratio is less favorable.
The only way to rescue block Jacobi is with an ultrafast update procedure. Fortunately, updating is much richer in matrix multiplication than the subproblem solution procedure. Thus,
our subsequent work on the LCAP system will involve making optimum use of the MAX boards which are tailored to the matrix multiplication problem.
Acknowledgements. We wish to thank Dr. Enrico Clementi of IBM Kingston who invited us to use the LCAP system. We are also indebted to his research group whose cooperation and friendliness made our experiments possible.
6. REFERENCES
[1] R. Brent and F. Luk (1985), The solution of singular value and symmetric eigenproblems on multiprocessor arrays, SIAM J. Scientific and Statistical Computing, 6, 69-84.
[2] R. Brent, F. Luk, and C. Van Loan (1985), Computation of the singular value decomposition using mesh connected processors, J. VLSI and Computer Systems, 1, 242-270.
[3] T. Chan (1982), An improved algorithm for computing the singular value decomposition, ACM Trans. Math. Software, 8, 72-83.
[4] G. Forsythe and P. Henrici (1960), The cyclic Jacobi method for computing the principal values of a complex matrix, Trans. Amer. Math. Soc., 94, 1-23.
[5] G. H. Golub and C. Van Loan (1983), Matrix Computations, Johns Hopkins University Press, Baltimore, Md.
[6] E. Hansen (1960), On Jacobi methods and block-Jacobi methods for computing matrix eigenvalues, Ph.D. Thesis, Stanford University, Stanford, Calif.
[7] E. Hansen (1962), On quasicyclic Jacobi methods, J. ACM, 9, 118-135.
[8] P. Henrici (1958), On the speed of convergence of cyclic and quasicyclic Jacobi methods for computing the eigenvalues of Hermitian matrices, SIAM J. Applied Math., 6, 144-162.
[9] C.G.J. Jacobi (1846), Über ein leichtes Verfahren, die in der Theorie der Säcularstörungen vorkommenden Gleichungen numerisch aufzulösen, Crelle's Journal, 30, 51-94.
[10] H. Rutishauser (1966), The Jacobi method for real symmetric matrices, Numer. Math., 16, 205-223.
[11] A. Schönhage (1964), On the quadratic convergence of the Jacobi process, Numer. Math., 6, 410-412.
[12] C. Van Loan (1985), The block Jacobi method for computing the singular value decomposition, Technical Report TR85-680, Department of Computer Science, Cornell University, Ithaca, NY 14853.
Large Scale Eigenvalue Problems J. Cullum and R.A. Willoughby (Editors) © Elsevier Science Publishers B.V. (North-Holland), 1986
QUANTUM DYNAMICS WITH THE RECURSIVE RESIDUE GENERATION METHOD: IMPROVED ALGORITHM FOR CHAIN PROPAGATORS*

Robert E. Wyatt
Department of Chemistry and Institute for Theoretical Chemistry
The University of Texas
Austin, Texas U.S.A.

David S. Scott**
Department of Computer Sciences
University of Texas
Austin, Texas U.S.A.

*Supported in part by grants from the National Science Foundation and the Robert A. Welch Foundation.
**Current Address: Intel Scientific Computers, 15201 N.W. Greenbriar Parkway, Beaverton, OR 97006 U.S.A.
A new approach is presented for the computation of quantum mechanical time-dependent transition probabilities in systems with a large number of states. Following use of the Lanczos algorithm to produce a tridiagonal representation of the perturbed Hamiltonian, a modification of the QL algorithm is introduced to compute eigenvalues and the first row (only) of the eigenvector matrix of T. All of the eigenvalues and eigenvector coefficients are used to compute transition amplitudes, even though roundoff error during the Lanczos recursion causes spurious eigenvalues to appear. This contrasts with the original version of the recursive residue generation method (RRGM), where a condensed eigenvalue list was produced (via the Cullum-Willoughby procedure) before computing squares of eigenvector coefficients. Eigenvalues, residues, and time-dependent transition probabilities computed from the two methods are found to be equivalent.
INTRODUCTION

Intramolecular energy transfer, molecular multiphoton excitation and dissociation, spectroscopic pump and probe experiments, quantum beats, and multiquantum NMR are a sample of the diverse chemical phenomena which require time-dependent quantal transition probabilities for their prediction or interpretation. Until recently, the largest dynamical calculations on Class VI supercomputers were limited to fewer than 10^3 states. However, with the recently developed recursive residue generation method (RRGM), selected calculations on systems with 10^4-10^5 states became possible. Currently, the RRGM is being employed to study multiphoton excitation pathways in small molecules, electronic absorption and resonance Raman spectra, and to calculate quantities in nonequilibrium statistical mechanics, including time correlation functions.
R.E. Wyatt and D.S. Scott
The RRGM is based upon use of the Lanczos algorithm to tridiagonalize the matrix representation of the total (perturbed) Hamiltonian. From eigenvalues of the tridiagonal matrix, and those of the reduced matrix (one fewer row and column), squares of eigenvector coefficients (residues) are computed. The eigenvalues and residues are then used to evaluate transition amplitudes and probabilities. A well known feature of the Lanczos method is that spurious eigenvalues, in the terminology of Cullum and Willoughby, may be produced, due to the propagation of roundoff error. Comparison of the eigenvalue lists from the original and reduced tridiagonal matrices allows one to eliminate all of the spurious eigenvalues. In the present study, we propose and test a different route from the tridiagonal matrix to the eigenvalues and residues. The new method eliminates the need to compare lists of eigenvalues. All eigenvalues and residues, even spurious ones, are used to compute transition probabilities. In addition, the new method is conceptually and computationally simpler than the original version of the RRGM. Over the past fifteen years, recursion methods have been widely used for a number of problems in quantum physics. Prominent among these studies are applications to the electronic structure and optical properties of disordered solids. In fact, the RRGM was largely inspired by work during the 1970s of the Cambridge solid state physics group. In Sec. II, the computation of state-to-state transition amplitudes is reduced to the evaluation of survivabilities of two different initial states in linear chains. Evaluation of the chain propagators, using a modification of the QL algorithm, is presented in Sec. III. Numerical aspects of both the new method and the original RRGM are described and compared in Secs. IV and V. Results from the computational study of a model but realistic molecular system are in Sec. VI, and a summary is presented in Sec. VII.
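The Lanczos tridiagonalization that the RRGM starts from can be sketched in a few lines. This is our own minimal version; full reorthogonalization is included so that the small test below is clean, whereas the RRGM runs the plain three-term recursion, and it is precisely the resulting loss of orthogonality that lets spurious eigenvalues appear.

```python
import numpy as np

def lanczos(H, u1, m):
    """m steps of the Lanczos recursion starting from vector u1.
    Returns (alpha, beta): the diagonal and off-diagonal of the Jacobi
    (tridiagonal) matrix T representing H in the Lanczos basis."""
    n = H.shape[0]
    Q = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(m - 1)
    q = u1 / np.linalg.norm(u1)
    for j in range(m):
        Q[:, j] = q
        w = H @ q
        alpha[j] = q @ w
        # full reorthogonalization against all previous Lanczos vectors
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
        if j < m - 1:
            beta[j] = np.linalg.norm(w)
            q = w / beta[j]
    return alpha, beta
```

With m equal to the dimension of H (and a generic starting vector), T is orthogonally similar to H, so its eigenvalues reproduce the spectrum exactly.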
II. DYNAMICS OF LINEAR CHAINS
For times t < 0, the quantum system is defined by a Hamiltonian operator H_0 and has stationary eigenstates |m>, |n>, ... with eigenenergies E_{0m}, E_{0n}, ... At time t = 0, a time-independent perturbation is switched on, such that the Hamiltonian becomes H = H_0 + V. The states |m>, |n>, ... are no longer stationary, and transitions occur between them. Starting from state m at t = 0, this state evolves into the state exp[-iHt]|m> at time t. We say that the operator exp[-iHt], a time propagator, acts on |m> and converts it to the state at time t. At each time, the evolved state, by design, satisfies the time-dependent Schrodinger equation (we will always work in a system of units where ħ = 1):

H {e^{-iHt}|m>} = i (∂/∂t) {e^{-iHt}|m>}
Now, at time t, the amplitude for finding the evolved state in some other state |n> is just
Quantum Dynamics with the Recursive Residue Generation Method
the projection of exp[-iHt]|m> upon |n>,

A_{nm}(t) = <n|e^{-iHt}|m>

and the transition probability is the absolute square of this, P_{nm}(t) = A*_{nm}(t) A_{nm}(t). Computing A_{nm}(t) is the big challenge in chemical dynamics!
In order to compute the time-dependent transition amplitude between initial state m at time t = 0 and another state n at time t, we first define two transition vectors

|u> = (|m> + |n>)/√2,   |v> = (|m> - |n>)/√2     (1)

The quantum mechanical transition amplitude is then

A_{mn}(t) = <n|e^{-iHt}|m> = (1/2)[<u|e^{-iHt}|u> - <v|e^{-iHt}|v>]     (2)
Note that the transition amplitude is the difference between two survival amplitudes: for example, <u|exp(-iHt)|u> is the amplitude for surviving in state |u> at time t if we start there at t = 0. If the eigenvectors of H were known, then each survival amplitude could be easily calculated. Since (we assume an N-dimensional space)

e^{-iHt} = Σ_{α=1}^{N} |α> e^{-iE_α t} <α|     (3)

where H|α> = E_α|α>, the survival amplitude in state |u> is

S_u(t) = <u|e^{-iHt}|u> = Σ_{α=1}^{N} <α|u>^2 e^{-iE_α t}     (4)
where <α|u>^2 is referred to as a residue; it is the residue of the Green operator (Z-H)^{-1} with respect to the vector |u> at the simple pole E_α:

<u| (Z-H)^{-1} |u> = Σ_{α=1}^{N} <u|α><α|u> / (Z - E_α)
The problem with this approach is that (for most systems) we are unable to calculate the eigenvalues and eigenvectors needed to evaluate Eq. (4). This is because N is very large (>10^3). In the recursive approach to computing S_u(t), we define a new basis |u_1>, |u_2>, ..., with the starting vector |u_1> = |u> (see Eq. (1)). Given |u_1>, the Lanczos algorithm generates a new basis |u_2>, |u_3>, ..., so as to form a Jacobi tridiagonal matrix representation of H,
R.E. Wyatt and D.S. Scott
[ α_1  β_1  0    ...
  β_1  α_2  β_2  ...
  0    β_2  α_3  ...
  ...                ]     (5)
where α_i and β_i are self-energies and nearest neighbor coupling energies in the one-dimensional chain representing H. The diagonal elements or self-energies {α_i} are effective masses for 'pseudo-particles' in the chain, while the coupling elements {β_i} provide for linkage between nearest neighbor pseudo-particles. A high value for α_k impedes the flow of probability (or energy) down the chain, while a high value for β_k accelerates this flow. The chain of coupled pseudo-particles is disordered in the sense that the self-energies are all different from one another.
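The recursion that builds the chain parameters {α_i} and {β_i} can be sketched in a few lines. This is a generic illustration of the Lanczos three-term recursion, not the authors' implementation; the small symmetric matrix standing in for the Hamiltonian, and its entries, are hypothetical.

```python
import math

def lanczos(matvec, v1, steps):
    """Three-term Lanczos recursion: returns the self-energies alpha
    and couplings beta of the tridiagonal (chain) representation."""
    n = len(v1)
    nrm = math.sqrt(sum(x * x for x in v1))
    v = [x / nrm for x in v1]            # |u_1>
    v_prev = [0.0] * n
    alpha, beta = [], []
    b = 0.0
    for _ in range(steps):
        w = matvec(v)
        w = [w[i] - b * v_prev[i] for i in range(n)]
        a = sum(w[i] * v[i] for i in range(n))    # alpha_k = <u_k|H|u_k>
        alpha.append(a)
        w = [w[i] - a * v[i] for i in range(n)]
        b = math.sqrt(sum(x * x for x in w))
        if b == 0.0:                     # invariant subspace found
            break
        beta.append(b)
        v_prev, v = v, [x / b for x in w]
    return alpha, beta

# A tiny symmetric stand-in for the Hamiltonian (hypothetical numbers)
H = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 4.0]]
mv = lambda x: [sum(H[i][j] * x[j] for j in range(3)) for i in range(3)]

alpha, beta = lanczos(mv, [1.0, 0.0, 0.0], 3)
# After a full set of steps the tridiagonal matrix is orthogonally
# similar to H, so the traces agree:
print(abs(sum(alpha) - (2.0 + 3.0 + 4.0)) < 1e-10)   # -> True
```

In practice the matrix-vector product is the only place the large sparse Hamiltonian enters, which is what makes the recursion attractive for large N.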
Letting u_1 denote the column vector (N elements) representing |u_1>, the survival amplitude is simply given by the (1,1) element of the propagator:

S_u(t) = u_1^T · exp(-iHt) · u_1     (6)

As a result, the m→n transition amplitude in Eq. (2) is computed from the difference between two matrix propagators:

A_{mn}(t) = (1/2)[u_1^T·exp(-iHt)·u_1 - v_1^T·exp(-iHt)·v_1]     (7)
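As a concrete illustration of Eqs. (1)-(7), consider a two-state model where the eigenpairs of H are available in closed form. The matrix elements a, b, c below are hypothetical, chosen only to show the bookkeeping from eigenpairs and residues to a transition probability.

```python
import cmath, math

# Hypothetical two-state Hamiltonian H = H0 + V in the {|m>, |n>} basis
a, b, c = 1.0, 0.4, 2.0                     # H = [[a, b], [b, c]]
th = 0.5 * math.atan2(2 * b, a - c)         # rotation angle diagonalising H
E = [a*math.cos(th)**2 + c*math.sin(th)**2 + b*math.sin(2*th),
     a*math.sin(th)**2 + c*math.cos(th)**2 - b*math.sin(2*th)]
vecs = [[math.cos(th), math.sin(th)],       # eigenvector |alpha_1>
        [-math.sin(th), math.cos(th)]]      # eigenvector |alpha_2>

def survival(u, t):
    # Eq. (4): S_u(t) = sum_alpha <alpha|u>^2 exp(-i E_alpha t)
    return sum((v[0]*u[0] + v[1]*u[1])**2 * cmath.exp(-1j * Ea * t)
               for v, Ea in zip(vecs, E))

s = 1.0 / math.sqrt(2.0)
u = [s, s]        # |u> = (|m> + |n>)/sqrt(2), Eq. (1)
w = [s, -s]       # |v> = (|m> - |n>)/sqrt(2)

def A_mn(t):
    # Eq. (2)/(7): transition amplitude as a difference of survival amplitudes
    return 0.5 * (survival(u, t) - survival(w, t))

P = abs(A_mn(1.7))**2            # transition probability at an arbitrary time
print(abs(A_mn(0.0)) < 1e-12, 0.0 <= P <= 1.0)   # -> True True
```

The residues <α|u>^2 are exactly the squared eigenvector coefficients; in the RRGM they would come from the tridiagonal chain rather than from a full diagonalisation.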
In order to evaluate the two survival amplitudes, we will first focus upon the u-chain, with state vectors u_1, u_2, ... Although the Hamiltonian matrix is a very large sparse matrix (N rows and columns), if we perform M recursion steps, then the tridiagonal matrix will be an M×M matrix. In practice, M <

If λ_i(A) > 0, i = 1, ..., n, we say that A is positive definite. If λ_i(A) ≥ 0 for all i, A is positive semidefinite. If A or -A is positive definite, A is definite. If λ_i(A) may be arbitrary, A is indefinite. If A is positive definite, x^T A y defines an inner product, and ||x||_A = (x^T A x)^{1/2} a vector norm. If x^T A y = 0, we say that x and y are A-orthogonal. The norm having A = I is denoted ||·||. The subordinate matrix norm is denoted in the same way, so ||B|| = (max λ(B^T B))^{1/2}. A⁺ is the Moore-Penrose pseudo-inverse of A. If A_i, i = 1, ..., n, are square matrices, C = diag(A_1, ..., A_n) denotes the block diagonal matrix where C_{ij} is the zero matrix for all i ≠ j, and C_{ii} = A_i.
Since we deal with K and M throughout this paper, we will give some often used factorisations:
A Generalised Eigenvalue Problem and the Lanczos Algorithm
M has the eigendecomposition M(R, N) = (R, N) diag(Ω^2, 0), where the square matrix (R, N) is orthogonal, Ω = diag(σ_1, ..., σ_r), r = rank(M), and σ_i > 0. (R stands for Range space part, and N for Null space part.) If K is nonsingular then

[Ω R^T
 N^T  ] K^{-1} [R Ω, N] = [H_1   H_2
                           H_2^T H_3],

where H_3 is a square matrix of order null(M). H_1 has the eigendecomposition H_1(Q_R, Q_N) = (Q_R, Q_N) diag(Λ^{-1}, 0), where (Q_R, Q_N) is orthogonal.
C = K^{-1}M has the Jordan normal form CS = SJ. The projections P_X, P_N and the subspaces X, N are defined in section 2.3.

2. Some Basic Properties of (K, M). In this section we will give criteria for when the pencil (K, M) is singular, defective etc. Some of the theorems are well-known, but we have included them for completeness. These subjects are also dealt with in [5], [7], for example, but from a different point of view. As before we assume that K and M ∈ ℝ^{n×n}, and that M is positive semidefinite and singular. We assume that K is symmetric, but otherwise arbitrary. To make things easier to follow, we will first transform Kx = λMx into an equivalent problem where the transformed M has a simple form.
Lemma 2.1. Kx_i = λ_i M x_i can be transformed into K̂x̂_i = λ_i M̂ x̂_i, where K̂, M̂ are real and symmetric, and M̂ = diag(I, 0) with rank(M̂) = rank(M). If x_i^T M x_j = δ_{ij} then x̂_i^T M̂ x̂_j = δ_{ij}.

Proof. Using the factorisation of M (please see section 1.1) with U = (R, N), Δ = diag(Ω, I), then

M = U Δ diag(I, 0) Δ U^T

and

Kx = λMx ⟺ Δ^{-1} U^T K U Δ^{-1} · Δ U^T x = λ diag(I, 0) Δ U^T x ⟺ K̂x̂ = λ M̂ x̂,

with K̂ = Δ^{-1} U^T K U Δ^{-1}, x̂ = Δ U^T x. ∎

Assume M = diag(I, 0) and partition K as

K = [K_1   K_2
     K_2^T K_3]

and let K(a) = K + aM, a ∈ ℝ.
Lemma 2.2. If

[K_2
 K_3]

has full column rank then there is an a such that K(a) is nonsingular.

Proof. Assume there is no such a, and let Q = diag(I, U), where U^T K_3 U = diag(Λ, 0), where Λ is nonsingular and U is orthogonal. Let K_2 U = (A, B). Having assumed that Q^T K(a) Q is singular for all a, there must exist a nonzero w^T = (x^T, y^T, z^T) such that Q^T K(a) Q w = 0, or
T. Ericsson
(K_1 + aI)x + Ay + Bz = 0     (1)
A^T x + Λy = 0                (2)
B^T x = 0                     (3)

(2) gives A Λ^{-1} A^T x + Ay = 0. This, together with (1) and (3), gives x^T(K_1 + aI)x - x^T A Λ^{-1} A^T x = 0, so

x^T(K_1 - A Λ^{-1} A^T)x = -a x^T x.

Now, take a such that |a| > max |λ(K_1 - A Λ^{-1} A^T)|; then x = 0 is necessary (since -a would be a Rayleigh quotient otherwise). x = 0 and (2) ⇒ y = 0, so (1) gives Bz = 0, and since w ≠ 0, z ≠ 0, B cannot have full column rank. ∎
Using lemma 2.2 it is easy to prove:
Theorem 2.3. (K, M) singular ⟺

[K_2
 K_3]

does not have full column rank. (Which is also equivalent to K and M having a nontrivial intersection between their null spaces.)
We leave the proof to the reader. We will now characterise the definite case ((K, M) is said to be a definite pencil if there exist real numbers α and β such that αK + βM is a definite matrix). If (K, M) is nondefective, that does not imply that the pencil is definite. Let, for example,

K = diag(1, 1, -1),  M = diag(1, 0, 0);

then αK + βM = diag(α+β, α, -α), which is indefinite for all α, β. The pencil is not defective, however. We have the following theorem.
Theorem 2.4. (K, M) definite ⟺ K_3 definite.

Proof. If (K, M) is definite, then K(a) is definite for some a, and in particular K_3 must be definite.
If, on the other hand, K_3 is definite (let us assume it is positive definite), take a > |min λ(K_1)| (so K_1 + aI is positive definite). The block LDL^T decomposition of K(a) is given by K(a) = LDL^T, where D has the diagonal blocks K_1 + aI and K_3 - K_2^T(K_1 + aI)^{-1}K_2. Since K(a) and D are congruent we can, by taking a large enough, make the D_22 block positive definite (the D_11 block is already positive definite). ∎

One special case is the one treated in the following corollary.
Corollary 2.5. If (K, M) is nonsingular and if K is positive semidefinite then (K, M) is definite.

Proof. It is obviously true if K_3 is definite. Assume K_3 is positive semidefinite and singular. We now prove:

[K_2
 K_3]

has full column rank and K_3 is singular ⇒ K has negative and positive eigenvalues. (So if K is positive semidefinite, K_3 must be positive definite.) Take
x ≠ 0 such that K_3 x = 0, and let z^T = (σ x^T K_2^T, x^T); then

z^T K z = σ^2 x^T K_2^T K_1 K_2 x + 2σ ||K_2 x||^2.

Since K_2 x ≠ 0 (we would have rank deficiency otherwise), z^T K z can take any value with a suitable choice of σ. ∎
All the above results hold also when M is nondiagonal (M is still real, symmetric, and positive semidefinite). To generalise the theorems we use the matrix N ∈ ℝ^{n×null(M)} introduced before (see section 1.1) and reverse the transformation in lemma 2.1. Our two theorems then become:

(K, M) singular ⟺ KN does not have full column rank     (2.6 a)
(K, M) definite ⟺ N^T K N definite                      (2.6 b)
In the coming sections we will not look at the case when (K, M) is singular. One way to avoid the singular case is to assume that K is nonsingular. Another advantage that results from the assumption is that we can study the transformed problem Cz = νz, where C = K^{-1}M and ν = λ^{-1}. To justify that assumption we need, however, to show that K being nonsingular does not restrict our problem too much, and that the theorems holding for a nonsingular K also are true for a singular K and nonsingular (K, M). This is done in the next two paragraphs. In the next chapter we will construct the Jordan normal form, CS = SJ, of the matrix C = K^{-1}M. It will be shown that J = diag(Λ^{-1}, B), where Λ is diagonal and nonsingular, and B is a block diagonal matrix having

[0 1
 0 0]

and 0 blocks on the diagonal. S can be partitioned as S = (X, R), where X^T M X = I, so that KX = MXΛ. For the B-block we have KRB = MR. Now, assume that (K, M) is nonsingular and that K is singular. Then there exists an a such that K + aM is nonsingular. From the Jordan normal form of (K + aM)^{-1}M, (K + aM)^{-1}MS = SJ, we get KX = MX(Λ - aI) and KRB = MR (since B(I - aB)^{-1} = B). So the only change we can get, when allowing a singular K, is that (K, M) can have one or more zero eigenvalues. As we have seen, N^T K N is an important quantity, and since M has not changed (when forming K(a)), N^T K(a) N = N^T K N.
Putting these things (and those in the next section) together, we get the following picture:
KN
 |
 |-- rank deficient:  (K, M) is singular
 |       (we will not analyse this branch further)
 |
 `-- full rank:  (K, M) is nonsingular
        |
        N^T K N
         |
         |-- definite:      (K, M) is definite
         |
         `-- not definite:  (K, M) is not definite
                |
                N^T K N
                 |
                 |-- singular:     (K, M) is defective
                 |
                 `-- nonsingular:  (K, M) is not defective
2.1. The Jordan Normal Form of K^{-1}M. In the sections that remain we will assume that K is nonsingular and that M is positive semidefinite and singular. So our problem Kx = λMx can be written as Cx = νx, where ν = λ^{-1}, C = K^{-1}M. We know that M is singular, so some ν-eigenvalues must be zero (corresponding to infinite λ-eigenvalues). In this section we will construct the Jordan normal form of C, CS = SJ, where S is nonsingular and J is block diagonal. To start with we assume that M = diag(I, 0). Suppose now that M = diag(I, 0) and

K^{-1} = H = [H_1   H_2
              H_2^T H_3];

then

C = K^{-1}M = [H_1   0
               H_2^T 0],

where H_1 has order r = rank(M), and we can state our main theorem in this section.

Theorem 2.7. C has a zero eigenvalue of algebraic multiplicity null(M) + null(H_1) and geometric multiplicity null(M). The order of the Jordan blocks, corresponding to the defective eigenvalues, is two. The remaining m = n - (null(H_1) + null(M)) nonzero eigenvalues are nondefective, and we can find diagonal Λ ∈ ℝ^{m×m} and X ∈ ℝ^{n×m} such that KX = MXΛ and X^T M X = I.

Proof. The eigenvalues of C are λ_k(H_1) and null(M) zeroes (looking at the diagonal blocks C_11, C_22). Suppose that H_1 has the spectral factorisation

H_1(Q_R, Q_N) = (Q_R, Q_N) diag(Λ^{-1}, 0),
where (Q_R, Q_N) is orthogonal, and Λ = diag(λ_1, ..., λ_s), for some s = rank(H_1), and where the λ_i ≠ 0. Then CX = XΛ^{-1}, where

X = [Q_R
     H_2^T Q_R Λ].

Obviously X^T M X = I, and KX = MXΛ. This takes care of the nonzero eigenvalues of C. The zero eigenvalues (corresponding to the C_22 block) have eigenvectors

[0
 U],

where U is any nonsingular matrix of order null(M). If H_1 is nonsingular this would be it. But if H_1 is singular, we cannot find eigenvectors corresponding to these extra zero eigenvalues, since if v = (v_1^T, v_2^T)^T, then Cv = 0 implies H_1 v_1 = 0 and H_2^T v_1 = 0, which gives v_1 = 0 (otherwise H would be singular, a contradiction). And v_2 must be zero too, since we have used all these vectors (for U above).

Another way to see the same thing is by looking at the principal vectors in the Jordan normal form. The principal vectors of grade one, corresponding to the zero eigenvalues, are given by

n(C) = r([0
          I]).

The principal vectors of grade two are given by

n(C^2) - n(C) = r([Q_N
                   0  ]).

There are no principal vectors of higher grade, since C^3 and C^2 have the same null space.

It follows that the Jordan normal form, CS = SJ, is given by S = (X, N) and J = diag(Λ^{-1}, 0) in the nondefective case, and S = (X, R) (for some R) and J = diag(Λ^{-1}, B), where

B = diag(B_1, ..., B_t, 0),   B_i = [0 1
                                     0 0],

in the defective case.
Let us now look at a problem where M ≠ diag(I, 0). By reversing the transformation of the general eigenvalue problem, we get the corresponding H_1 = Ω R^T K^{-1} R Ω, and so null(R^T K^{-1} R) is the interesting quantity. We can use the following lemma to characterise null(H_1) in terms of K, rather than K^{-1}.
Lemma 2.8. Let Q = (Q_1, Q_2) be an orthogonal matrix, and assume that A is nonsingular; then

null(Q_1^T A^{-1} Q_1) = null(Q_2^T A Q_2).

Proof. Q_1^T A^{-1} Q_1 and Q_2^T A Q_2 have the same nullity. ∎
We can now reformulate our theorem:
Corollary 2.9. Given Kx = λMx (M not necessarily equal to diag(I, 0)), there are null(M) + null(N^T K N) infinite eigenvalues (zero λ^{-1}), of which null(N^T K N) are defective. The remaining eigenvalues are finite, nonzero, and nondefective, and we can find Λ and X such that KX = MXΛ, X^T M X = I.
Example. Let K_1, K_2, and M_1 ∈ ℝ^{n×n}, M_1 positive definite, K_1 symmetric; then with

all the eigenvalues are infinite (and located in 2-by-2 blocks). In this case

Let M = diag(1, 1, 0, 0) and

K = [1 0 0 0
     0 1 1 0
     0 0 1 1
             ]

That there are null(M) + null(N^T K N) infinite eigenvalues, of which null(N^T K N) lack eigenvectors, should be interpreted in terms of zero eigenvalues of C. We can then avoid a discussion of eigenvalues at plus and minus infinity, as in the following example.
K = diag(1, 1, -1), M = diag(2, 0, 0). Here the eigenvalues are 1/2, +∞, -∞, but C = diag(2, 0, 0) simply has the eigenvalues 2, 0, 0.
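For diagonal K and M this correspondence between infinite λ-eigenvalues and zero eigenvalues of C is immediate entrywise; the following short sketch just replays the example numerically.

```python
K = [1.0, 1.0, -1.0]            # K = diag(1, 1, -1)
M = [2.0, 0.0, 0.0]             # M = diag(2, 0, 0)

# C = K^{-1} M is diagonal with entries nu_i = mu_i / kappa_i
nu = [m / k for k, m in zip(K, M)]

# Finite eigenvalues lambda = 1/nu; the zero nu-entries stand in for the
# "eigenvalues at plus and minus infinity" of the pencil (K, M).
lam = [1.0 / v for v in nu if v != 0.0]
print(lam)                               # -> [0.5]
print(sum(1 for v in nu if v == 0.0))    # -> 2 zero eigenvalues of C
```

Working with the zero eigenvalues of C avoids having to distinguish +∞ from -∞ for the pencil itself.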
2.2. Another Word on Definite Pencils. We have already seen that (K, M) nondefective does not imply that the pencil is definite, i.e. that a linear combination αK + βM, α^2 + β^2 = 1, is positive definite. Let us now consider the equivalent pencil

(K̃, M̃) = (-βK + αM, αK + βM).

Theorem 2.10. If (K̃, M̃) is a positive definite pencil, then K̃S̃ = M̃S̃J̃, S̃^T M̃ S̃ = I, where S̃ = S diag((αΛ+βI)^{-1/2}, (αN^T K N)^{-1/2}) and J̃ = diag((αΛ+βI)^{-1}(αI-βΛ), -(β/α)I).

Proof.

K̃S = (-βK + αM)S = -βKS + αKSJ = KS(-βI + αJ)
M̃S = (αK + βM)S = αKS + βKSJ = KS(αI + βJ)

Since (K̃, M̃) is definite, αI + βJ must be nonsingular, and hence

J̃ = (αI + βJ)^{-1}(-βI + αJ) = diag((αΛ+βI)^{-1}(αI-βΛ), -(β/α)I).

Further S^T M̃ S = diag(αΛ+βI, αN^T K N), so with D = diag((αΛ+βI)^{-1/2}, (αN^T K N)^{-1/2}) and S̃ = SD, we get S̃^T M̃ S̃ = I and J̃D = DJ̃, which proves the theorem. ∎
2.3. Projections.
Since we will refer to r(X) quite often we introduce the following notation:

X = r(X).

If (K, M) is nondefective it will later be shown that X = r((K-μM)^{-1}M). The null space of (K-μM)^{-1}M is denoted N. Obviously N = n(M). N and X are orthogonal with respect to K (if K is indefinite we mean that x^T K y = 0 for all x ∈ X and y ∈ N). This follows from KX = MXΛ and y^T M = 0 if y ∈ N. Also X ⊕ N = ℝ^n. This is easily seen from the proof of theorem 2.7, or by the fact that if S = (X, N) then

S^{-T} = (MX(X^T M X)^{-1}, KN(N^T K N)^{-1}).
We also introduce the oblique projections

P_X = X(X^T K X)^{-1} X^T K = X X^T M
P_N = N(N^T K N)^{-1} N^T K

We have P_X(Xa + Nb) = Xa, P_N(Xa + Nb) = Nb. Also P_X + P_N = I (since P_X + P_N = (X, N)S^{-1} = I), P_X^2 = P_X, P_N^2 = P_N, and P_X^T K P_N = 0.
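These identities are easy to check numerically. The 2×2 pencil below is a hypothetical example chosen so that M is singular: for K = [[2, 1], [1, 2]] and M = diag(1, 0) one finds the single finite eigenpair λ = 1.5, x = (1, -1/2)^T with x^T M x = 1, and N = n(M) spanned by (0, 1)^T.

```python
# Hypothetical singular pencil: K = [[2, 1], [1, 2]], M = diag(1, 0).
K = [[2.0, 1.0], [1.0, 2.0]]
M = [[1.0, 0.0], [0.0, 0.0]]
x = [1.0, -0.5]                  # K x = 1.5 M x, x^T M x = 1
n = [0.0, 1.0]                   # spans n(M)

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

xtM = [sum(x[k] * M[k][j] for k in range(2)) for j in range(2)]   # x^T M
ntK = [sum(n[k] * K[k][j] for k in range(2)) for j in range(2)]   # n^T K
ntKn = sum(ntK[j] * n[j] for j in range(2))                        # n^T K n

PX = [[x[i] * xtM[j] for j in range(2)] for i in range(2)]         # X X^T M
PN = [[n[i] * ntK[j] / ntKn for j in range(2)] for i in range(2)]  # N(N^T K N)^{-1} N^T K

I2 = [[PX[i][j] + PN[i][j] for j in range(2)] for i in range(2)]
print(I2 == [[1.0, 0.0], [0.0, 1.0]])   # P_X + P_N = I
print(mul(PX, PX) == PX)                 # P_X is idempotent
```

Note that PX and PN are oblique (not symmetric) projections: they are orthogonal with respect to K, not with respect to I.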
Example. In the following examples it can be seen how X is made up of components in r(M) and n(M). Note that eigenvectors, in general, should have components in n(M). Note also that X and N are orthogonal to each other with respect to K and not with respect to I.

1) Let K = I and M = diag(1, 0); then λ = 1 and X = span((1, 0)^T).

2) then λ = 1.5 and X = span((1, -1/2)^T).

3) then λ = 1.5 and X = span((1, -1/2, 0)^T).
If (K, M) is defective then we will show that X = r((K-μM)^{-1}M(K-σM)^{-1}M). It is no longer true that X ⊕ n(M) = ℝ^n (it is easily seen from the proof of theorem 2.7: we miss the principal vectors of grade 2). If we, however, define N = n((K-μM)^{-1}M(K-σM)^{-1}M), we have X ⊕ N = ℝ^n. We can no longer define P_N using the formula above (since N^T K N is singular), but the definition for P_X still holds, and so P_N = I - P_X.
3. Some Algorithms. In this section we will look at some standard algorithms (inverse iteration, the power method, and a Lanczos algorithm) for solving the generalised eigenvalue problem. We will take the algorithms for a positive definite M, and then see how they behave when M is singular. As we will see, the major problem will be to keep vectors in the right subspace X.
3.1. Inverse Iteration. The algorithm is given by (see [4], page 317):

Given y_1 do
For k = 1, 2, ...
    (K - μ_k M) z = M y_k
    y_{k+1} = z / ||z||
    choose μ_{k+1}

In one step we have z = (K-μM)^{-1}My (we assume that K-μM is nonsingular). The algorithm is self-correcting (given y_1 ∉ X, then y_k ∈ X for k > 1 or 2), as shown in the following theorem.
Theorem 3.1. r((K-μM)^{-1}M) = X if (K, M) is nondefective;
r((K-μM)^{-1}M(K-σM)^{-1}M) = X if (K, M) is defective.

Proof. Let C = K^{-1}M and let C = SJS^{-1} be the Jordan normal form of C. Then

(K-μM)^{-1}M = (I-μC)^{-1}C = S(I-μJ)^{-1}JS^{-1} = S diag((Λ-μI)^{-1}, 0) S^{-1}.

In the defective case we get in the same way:

(K-μM)^{-1}M(K-σM)^{-1}M = S(I-μJ)^{-1}J(I-σJ)^{-1}J S^{-1} = S diag(((Λ-μI)(Λ-σI))^{-1}, 0) S^{-1},

since, with B = [0 1; 0 0], (I-μB)^{-1}B(I-σB)^{-1}B = 0. ∎

This implies that if y_1 has a nonzero component in X, after at most two iterations inverse iteration will work as the standard algorithm (when M is positive definite).
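The self-correcting behaviour can be watched in a two-line experiment: one step z = (K-μM)^{-1}My sends an arbitrary y into X. The pencil below is a hypothetical 2×2 example (K = [[2, 1], [1, 2]], M = diag(1, 0), for which X = span((1, -1/2)^T) with eigenvalue λ = 1.5); the small linear solve is done by Cramer's rule.

```python
# One step of inverse iteration: z = (K - mu*M)^{-1} M y.
mu = 0.0
A = [[2.0 - mu, 1.0], [1.0, 2.0]]        # K - mu*M (M_11 = 1)
y = [0.3, -1.7]                          # arbitrary starting vector

My = [y[0], 0.0]                         # M y with M = diag(1, 0)
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
z = [(My[0] * A[1][1] - A[0][1] * My[1]) / det,    # Cramer's rule
     (A[0][0] * My[1] - My[0] * A[1][0]) / det]

# z lies in X: it is a multiple of (1, -1/2)
print(abs(z[1] / z[0] + 0.5) < 1e-12)    # -> True
```

Any starting y with a nonzero component in X produces the same direction; the component in n(M) is annihilated by the multiplication with M.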
3.2. The Power Iteration. The algorithm is given by (see [4], page 317). In this section we will assume that the problem is nondefective.

Given y_1 do
For k = 1, 2, ...
    M z = (K - μ_k M) y_k
    y_{k+1} = z / ||z||
    choose μ_{k+1}

In one step we have Mz = (K-μM)y, say. Now, if M is singular this equation need not have a solution. In fact, using theorem 3.1 (assuming K-μM is nonsingular), we see that regardless of what the possible solution z is, the known vector y must lie in X. If we have access to (K-μM)^{-1}M we can produce a vector in the right space (X). But if we may use the inverted operator we may switch to inverse iteration anyhow. If we do not have access to the inverted operator, we may consider an iteration having the equation z = A(K-μM)y as its main step. One may be tempted to try to use the pseudo-inverse, A = M⁺, of M (if M is diagonal, for example). That choice will not work in general, since M⁺(K-μM)y ∈ r(M), and we do not get the required components in N. The right choice is to take A = XX^T, since

X X^T (K-μM) y = X(Λ X^T M - μ X^T M) y = X(Λ-μI) X^T M y

and any y can be written Sa, for some a. This gives

X^T M S a = X^T M (X, N) a = (I, 0) a

and we would pick out the right part of y. The drawback with this approach is, of course, that we do not know X, and there is no way we can find X from M alone. We will return to the matrix XX^T in section 4, but it should be mentioned that it is a generalised inverse (though not, in general, the pseudo-inverse) of M.
3.3. The Lanczos Algorithm. We will, in this paragraph, study the Lanczos algorithm for the generalised eigenvalue problem (see [4] for more details). We will, however, start by looking at the algorithm for a shifted and inverted standard problem, i.e. given A real and symmetric,

(A - μI)^{-1} z = (λ - μ)^{-1} z  ⟺  Az = λz,

assuming that the inverse exists. The Lanczos algorithm for this problem can be written:

Given v_1, v_1^T v_1 = 1, compute
For i = 1 to p do
    u = (A - μI)^{-1} v_i
    if (i > 1) then u = u - β_{i-1} v_{i-1}
    α_i = v_i^T u
    u = u - α_i v_i
    β_i = ||u||
    if (β_i = 0) then
        v_{i+1} = 0
        stop
    else
        v_{i+1} = u / β_i
    endif

The algorithm produces, in exact arithmetic, V = (v_1, ..., v_p), with V^T V = I, and a tridiagonal matrix T,

T = [α_1 β_1
     β_1 α_2 β_2
          .   .    .
              .    .    β_{p-1}
                β_{p-1} α_p    ],

such that

(A - μI)^{-1} V = V T + β_p v_{p+1} e_p^T.

If (s, ν) is an eigenpair of T, i.e. Ts = νs, then ||(A-μI)^{-1} V s - ν V s|| = |β_p s_p|.
Suppose that M is positive definite; then the problem Kx = λMx is equivalent to Az = λz, where z = M^{1/2}x and A = M^{-1/2}KM^{-1/2}. Let us now reverse this transformation in the Lanczos algorithm above by introducing the new vectors v_i := M^{-1/2}v_i and d_i := M^{-1/2}u. This gives us the following algorithm in the M-inner product.
Given v_1, v_1^T M v_1 = 1, compute
For i = 1 to p do
    d_i = (K - μM)^{-1} M v_i
    if (i > 1) then d_i = d_i - β_{i-1} v_{i-1}
    α_i = v_i^T M d_i
    d_i = d_i - α_i v_i
    β_i = ||d_i||_M
    if (β_i = 0) then stop
    else v_{i+1} = d_i / β_i
    endif

Again we get tridiagonal T and V such that

V^T M V = I                                    (1)
(K - μM)^{-1} M V = V T + d_p e_p^T            (2)

If (s, ν) is an eigenpair of T, then

||(K-μM)^{-1} M V s - ν V s||_M = |s_p| ||d_p||_M     (3)
We will now use the above algorithm when M is singular. Formulas (1) and (2) will still hold, but (3) will not give us the whole truth (since M does not define a norm over the whole space). It should also be noted that β_p = 0 is possible even if d_p ≠ 0 (if d_p ∈ N). If this is the case, v_{p+1} is not defined, but d_p is. (The only reason we have an index on d_p is to be able to refer to the vector. In an implementation we would use the same vector, as shown in the Lanczos algorithm for A.) If β_p > 0 then d_p = v_{p+1} β_p. We will use T and V in the coming sections without specific reference to p. If we must refer to submatrices of T or V we use T_j to denote the leading principal submatrix, of order j, of T, and V_j to denote the first j columns of V (so in particular T = T_p and V = V_p). Sections 3.3.1-3.3.4.2 deal with the nondefective case. The defective case is treated in section 3.3.5.
3.3.1. Contamination of Vectors. We start this section with an example.

Example. Let K = diag(λ_1, 1, ..., 1), M = diag(1, ..., 1, 0), and take v_1^T = (1, 0, ..., 0, ε). This gives α_1 = (λ_1-μ)^{-1}, β_1 = 0, and d_1 = -(λ_1-μ)^{-1} ε e_n. This means that the eigenvalue is exact, but the Ritz vector Vs = v_1 can be arbitrarily bad depending on ε. The Ritz vector can be refined, and in this case Vs + α_1^{-1}d_1 = e_1 is the exact eigenvector. The cause of this problem is that v_1 ∉ X, i.e. in this case P_N v_1 = ε e_n. We will examine the cause of this problem and try to find a cure for it in the following sections.
We will start by examining how components not being in X may affect the algorithm in the nondefective case. As was proved for inverse iteration, (K-μM)^{-1}My, for any y, will always produce a vector in X; hence

Corollary 3.2. If v_1 ∈ X then all the v_i and d_i lie in X, so that Vs and Vs + γd_p lie in X, for any s and γ.
Proof. Using induction and the fact that

d_k = (K-μM)^{-1}M v_k - α_k v_k - β_{k-1} v_{k-1}

gives the proof. ∎

We will now look at how α_i, β_i, v_i, and d_p change when we use a starting vector v_1 ∉ X.

Theorem 3.3. Suppose that α_i, β_i, v_i, i = 1, ..., p, and d_p have been produced by the algorithm using v_1 ∈ X. Let ᾱ_i, β̄_i, v̄_i, i = 1, ..., p, and d̄_p denote the corresponding quantities produced by the algorithm using the starting vector v̄_1 = v_1 + z, where z ∈ N, z ≠ 0. Then

ᾱ_i = α_i,  β̄_i = β_i,  v̄_i = v_i + γ_i z,  i = 1, ..., p,
d̄_p = d_p + δ_p z,

where

γ_0 = 0,  γ_1 = 1,  γ_i = -(α_{i-1}γ_{i-1} + β_{i-2}γ_{i-2})/β_{i-1},
δ_p = -(α_p γ_p + β_{p-1} γ_{p-1}).

If β_k ≠ 0, then δ_k = γ_{k+1} β_k.

Proof. α_k and β_k are formed from expressions of the form a_k^T M (K-μM)^{-1} M a_k and (b_k^T M b_k)^{1/2} for some vectors a_k and b_k. Since Mu = 0 for any u ∈ N, α_k and β_k are not affected by components in N. To prove the recursion use

d̄_k = (K-μM)^{-1}M v̄_k - ᾱ_k v̄_k - β̄_{k-1} v̄_{k-1},

which gives δ_k = -α_k γ_k - β_{k-1} γ_{k-1}. We leave it to the reader to fill in the details. ∎

To see more clearly how δ_k behaves, we express δ_k in terms of the eigenpairs of T_k.
Theorem 3.5. Let T_k have the spectral decomposition T_k s_i = ν_i s_i, i = 1, ..., k; then

δ_k = p(0) (s_{1i} s_{ki} p'(ν_i))^{-1},  1 ≤ i ≤ k,

where p(ν) = Π_{j=1}^{k} (ν - ν_j).

Proof. The recursion in theorem 3.3 gives δ_k = p(0)(β_1 ··· β_{k-1})^{-1}. [4], page 129, gives

s_{1i} s_{ki} p'(ν_i) = β_1 ··· β_{k-1} > 0,  1 ≤ i ≤ k,

and since p'(ν_i) = Π_{j=1, j≠i}^{k} (ν_i - ν_j) we get

δ_k = p(0)(s_{1i} s_{ki} p'(ν_i))^{-1},

which proves the theorem. ∎

This expression for δ_k holds for all i ≤ k, so let us pick an i such that ν_i has settled down (converged), so |s_{ki}| is small. Let us also assume that ν_i is well separated from the other eigenvalues and that s_{1i} = (Vs_i)^T M v_1 (the component of the Ritz vector in v_1) is not particularly small. (All these assumptions are quite reasonable in a practical setting.) Then δ_k = s_{ki}^{-1} τ_k for some τ_k, where it follows from the assumptions that τ_k behaves reasonably, and so |δ_k| can become quite large. Let us now look at how the contamination affects a Ritz vector ȳ = V̄s.
Theorem 3.6. Let g^T = (γ_1, ..., γ_p); then

(a) Tg = -δ_p e_p
(b) ȳ = V̄s = Vs + z g^T s = y - δ_p s_p ν^{-1} z  (ν ≠ 0).

Proof. We know that V̄ = V + z g^T (the error is always in the same direction). Using this and (K-μM)^{-1}MV = VT + d_p e_p^T we get

(K-μM)^{-1}M V̄ = V̄T + d̄_p e_p^T + z(g^T T + δ_p e_p^T),

proving (a). (b) follows directly from (a). ∎

From (b) we see that the Ritz vector ȳ is not so badly affected by z as the v̄_k. In fact, good Ritz vectors (smaller |s_p| and usually larger |ν|) are less affected than bad ones. In the next section we will show that we can get rid of the contamination of the Ritz vectors altogether by adding a suitable multiple of the next Lanczos vector.
3.3.2. Correcting the Ritz Vector. We will now look at an alternative to V̄s. Is there any linear combination V̄a, or V̄a + γd̄_p, that does not have the unwanted z-component?
T. Ericsson
110
Theorem 3.7. The vector

ŷ = V̄s + s_p ν^{-1} d̄_p,  ν ≠ 0,

has no component along z, and gives a residual (K-ωM)ŷ ∈ r(M) for any ω. Forming this vector is equivalent to taking one step of inverse iteration:

ŷ = ν^{-1}(K-μM)^{-1}M V̄s.

Proof. Using (K-μM)^{-1}M V̄ = V̄T + d̄_p e_p^T and Ts = νs gives

(K-μM)^{-1}M V̄s = ν V̄s + s_p d̄_p = ν ŷ ∈ X.

For the residual, (K-μM)ŷ = ν^{-1}M V̄s, so

(K-ωM)ŷ = (K-μM)ŷ + (μ-ω)Mŷ = M(ν^{-1} V̄s + (μ-ω)ŷ). ∎
(The vector ŷ was introduced in [2], but for a different purpose.) When computing ŷ in a program there are better ways of doing it than using the formula above. One reason is that standard programs (like the ones in EISPACK) may produce an s_p that is completely wrong when |s_p| is small. See [1] for more details. It should be noted that ŷ is not the Rayleigh-Ritz approximation to (K-μM, M) from span(V̄, d̄_p) or span(ȳ, d̄_p) (for more details about these topics see [1]). Nor is P_X ȳ = τŷ in general (τ ∈ ℝ). Using (K-μM)^{-1}M V̄s = νŷ we have

P_X ȳ = ν X(Λ-μI) X^T M ŷ.

However, given ȳ = V̄s ∉ X and d̄_p, ŷ = ȳ + s_p ν^{-1} d̄_p is the only (up to a scalar factor) linear combination of ȳ and d̄_p that lies in X. We may also note that it is impossible to find linearly independent t_1, ..., t_p such that

V̄ t_i ∈ X,  i = 1, ..., p,

since this implies that g^T t_i is zero for all t_i, which is impossible unless g = 0. The reason it works with V̄s + s_p ν^{-1} d̄_p is because we take a linear combination of p+1 vectors and get p vectors in X. It would, of course, not be possible to get p+1 vectors in X by using V̄_{p+1}. We have unfortunately not found any practical way to refine the v̄_k. One would like to hit the Lanczos vectors with P_X = I - P_N = I - N(N^T K N)^{-1}N^T K, but the expense is usually too high.
We end this section with a look at the case when T is singular. Since T is unreduced, the eigenvalues are distinct, and so if T is singular, ν = 0 is a simple eigenvalue. Due to interlacing, two consecutive T_j cannot both be singular. Assume T_k is singular; then from lemma 3.4 it follows that δ_k = 0 (so d̄_k ∈ X) and Tg = 0 (from 3.6 a), so g is an eigenvector corresponding to ν_1 = 0, say. Hence V̄s_i = Vs_i + z g^T s_i = Vs_i ∈ X provided i > 1, and there is no need to correct these Ritz vectors. The Ritz vector V̄s_1 does not lie in X (but it is of little practical interest). Singular T can, of course, be produced by the Lanczos algorithm. One example is given by K = diag(Λ, -Λ, I), M = diag(I, I, 0), μ = 0, and v_1^T = (a^T, a^T, b^T). Then all α_k = 0, which implies that all T_j of odd order will be singular. (Proof: Let A be a square matrix satisfying a_{ij} = 0 if i+j is even. Let C = diag(1, -1, 1, -1, ...); then CAC = -A, so if Az = λz then A(Cz) = -λ(Cz), and in our tridiagonal case this implies that eigenvalues occur in ± pairs.) For a numerical example see section 3.3.4.
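The ± pairing implies that every odd-order leading principal minor of a zero-diagonal tridiagonal matrix vanishes, which is easy to verify with the standard three-term determinant recursion d_k = α_k d_{k-1} - β_{k-1}^2 d_{k-2}; the β values below are arbitrary.

```python
# Zero-diagonal symmetric tridiagonal: with all alpha_k = 0 the
# recursion d_k = alpha_k d_{k-1} - beta_{k-1}^2 d_{k-2} collapses to
# d_k = -beta_{k-1}^2 d_{k-2}, so every odd-order minor vanishes.
beta = [0.7, 1.3, 0.2, 2.1, 0.9]        # arbitrary off-diagonal entries
det = [1.0, 0.0]                         # orders 0 and 1 (alpha_1 = 0)
for k in range(2, 7):
    det.append(-beta[k - 2] ** 2 * det[k - 2])

print(all(det[k] == 0.0 for k in range(1, 7, 2)))   # odd orders singular
print(all(det[k] != 0.0 for k in range(0, 7, 2)))   # even orders not
```

This matches the interlacing remark above: the singular T_j of odd order are always separated by nonsingular ones of even order.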
3.3.3. The Nonsingular Case. In this section we will treat the case when M is nonsingular but ill conditioned. We will limit the study to one simple problem, and we start by listing some of the properties of the problem. Let

K = [K_1  q
     q^T  σ],   M = diag(I, ε).

We assume that ||K_1|| ≤ 1, ||q|| = 1, |σ| ≤ 1, and that 0 < ε << 1. It is not difficult to show that (K, M) has one large eigenvalue ≈ σ/ε (with corresponding eigenvector ẑ, say) and n-1 eigenvalues in [-2, 2]. If we partition ẑ^T = (ẑ_R^T, ẑ_N), ẑ_N ∈ ℝ (the notation ẑ_R, ẑ_N should remind about range and null space in the singular case), we can show that

|ẑ_N| / ||ẑ_R|| ≈ |σ|/ε.

This means that ẑ is very rich in the eigenvector of M corresponding to λ(M) = ε. This is not true for eigenvectors corresponding to the small eigenvalues of (K, M). Here

|z_N| / ||z_R|| ≤ 1/(1-2ε).
As will be seen in the example in the next section, ẑ will function as N did when M was singular. When M was singular we had P_N = N(N^T K N)^{-1} N^T K and |γ_k| = ||P_N ṽ_k|| / ||P_N ṽ_1||. Now, when M is nonsingular, we define the projection

P_ẑ = ẑ(ẑ^T M ẑ)^{-1} ẑ^T M = ẑ ẑ^T M,   ẑ^T M ẑ = 1,

and, provided ẑ^T M v_1 ≠ 0,

|γ_k| = ||P_ẑ v_k|| / ||P_ẑ v_1|| = |ẑ^T M v_k| |ẑ^T M v_1|^{-1}

(sign(γ_k) is defined below). Let c = V^T M ẑ, so with g^T = (γ_1, ..., γ_p) we have g = c (ẑ^T M v_1)^{-1}, and

||c|| = ||V^T M ẑ|| ≤ ||V^T M^{1/2}|| ||M^{1/2} ẑ|| = 1.

||g|| is not bounded in the same way, since ||g|| = ||c|| |ẑ^T M v_1|^{-1} can be made arbitrarily large by taking v_1 almost M-orthogonal to ẑ. Unlike the singular case, where the norm of the contamination ||P_N ṽ_k|| = |γ_k| ||z|| is not bounded (||z|| is arbitrary), it is bounded in the nonsingular case:

||P_ẑ v_k|| = |ẑ^T M v_k| ||ẑ|| ≤ ||ẑ|| ≈ (min λ(M))^{-1/2}.

Even though ||c|| ≤ 1, V can contain large numbers. This follows from the fact that V^T M V = I, and so ||v_k|| can be as large as (min λ(M))^{-1/2}. If ||v_k|| is large then the vector must have a large component along ẑ. The above conclusions can be generalised to the case when M has more than one small eigenvalue (if KX = MXΛ, X^T M X = I, then there is at least one x_k such that ||x_k|| ≥ (n min λ(M))^{-1/2}).

Another way to see the similarity between the two cases is by using (K-μM)^{-1}MV = VT + β_p v_{p+1} e_p^T, which gives (multiply by ẑ^T M)

(T - (λ̂-μ)^{-1} I) c = -β_p (ẑ^T M v_{p+1}) e_p.

Taking c = g ẑ^T M v_1 (defining sign(γ_k)),

(T - (λ̂-μ)^{-1} I) g = -β_p ẑ^T M v_{p+1} (ẑ^T M v_1)^{-1} e_p,

which should be compared with the singular case (lemma 3.6 a).
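The projection P_ẑ = ẑ ẑ^T M can be checked directly; the sketch below (with an arbitrary SPD, ill conditioned M as an assumed stand-in) confirms that it is idempotent and M-self-adjoint, i.e. an M-orthogonal projection onto span{ẑ}:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
M = np.diag(rng.uniform(0.5, 2.0, n))
M[-1, -1] = 1e-4                      # ill conditioned, like the model problem

z = rng.standard_normal(n)
z = z / np.sqrt(z @ M @ z)            # normalise so that z^T M z = 1
P = np.outer(z, z) @ M                # P = z z^T M

idempotent = np.allclose(P @ P, P)
m_selfadjoint = np.allclose(M @ P, (M @ P).T)
fixes_z = np.allclose(P @ z, z)
```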
T. Ericsson
Multiplying (K-μM)^{-1}MV = VT + β_p v_{p+1} e_p^T by ẑ^T M |ẑ^T M v_1|^{-1} and s we get (with y = Vs) an expression for the relative growth of ẑ in the Ritz vectors:

||P_ẑ y|| / ||P_ẑ v_1|| = |g^T s| = |γ_{p+1} β_p s_p| |ν - (λ̂-μ)^{-1}|^{-1} ≈ |γ_{p+1} β_p s_p| |ν|^{-1},

where ≈ holds for extreme ν. This result can be compared with theorem 3.6 (b). When using ŷ instead of y the component along ẑ will decrease (but not be deleted, as when M is singular). Using ẑ^T M (K-μM)^{-1} y = (λ̂-μ)^{-1} ẑ^T M y we get (provided ẑ^T M y ≠ 0)

η = ||P_ẑ ŷ|| (||P_ẑ y|| ||ŷ||_M)^{-1} = |ẑ^T M ŷ| (|ẑ^T M y| ||ŷ||_M)^{-1} = (|λ̂-μ| ||ŷ||_M)^{-1} ≈ |λ-μ| |λ̂-μ|^{-1},

where the ≈ holds if ν has converged (so λ ≈ μ + ν^{-1} for some eigenvalue λ). For our special example we would thus get η ≈ ε |λ-μ| |σ|^{-1}, quite a substantial reduction. We would also like to point out that similar phenomena can occur when using standard Lanczos on the standard eigenvalue problem Ax = λx, since we can transform the generalised problem if M is positive definite. An example is presented in the next section.
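The basic relation (K-μM)^{-1}MV = VT + β_p v_{p+1} e_p^T used above can be reproduced with a small M-inner-product Lanczos run. The sketch below (full reorthogonalisation, μ = 0, and the 30×30 example of the next section with ε = 10^{-4}) is our own reimplementation under those assumptions, not the Matlab code of [3]:

```python
import numpy as np

n, p, eps = 30, 15, 1.0e-4
K = np.zeros((n, n))
K[:28, :28] = np.diag([1.0 / (4 + i) for i in range(1, 29)])
K[28:, 28:] = np.array([[1.0, 1.0], [1.0, 2.0]])
M = np.diag(np.r_[np.ones(29), eps])

C = np.linalg.solve(K, M)             # (K - mu M)^{-1} M with mu = 0

rng = np.random.default_rng(0)
V = np.zeros((n, p + 1))
alpha, beta = np.zeros(p), np.zeros(p)
v = rng.standard_normal(n)
V[:, 0] = v / np.sqrt(v @ M @ v)      # normalise so v^T M v = 1
for j in range(p):
    w = C @ V[:, j]
    alpha[j] = V[:, j] @ M @ w
    # full reorthogonalisation against all previous Lanczos vectors (M-inner product)
    w -= V[:, :j + 1] @ (V[:, :j + 1].T @ (M @ w))
    beta[j] = np.sqrt(w @ M @ w)
    V[:, j + 1] = w / beta[j]

T = np.diag(alpha) + np.diag(beta[:p - 1], 1) + np.diag(beta[:p - 1], -1)
resid = np.linalg.norm(C @ V[:, :p] - V[:, :p] @ T
                       - beta[p - 1] * np.outer(V[:, p], np.eye(p)[:, p - 1]))
orth = np.linalg.norm(V[:, :p].T @ M @ V[:, :p] - np.eye(p))
```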
3.3.4. Two Examples.
In this section we present two examples. In the first M is singular, and in the second M is nonsingular and moderately ill conditioned. The examples are completely artificial and have been included only to give the reader some idea of how the Lanczos algorithm may behave. In both examples n = 30 and
K = diag( 5^{-1}, 6^{-1}, ..., 32^{-1}, [ 1 1 ] ),    M = diag( I_28, E ),    E = [ 1 0 ]
                                        [ 1 2 ]                                  [ 0 ε ] ,

and the starting vector was v_1^T = τ(1, 1, ..., 1, -0.5+ξ) (where τ had been chosen so that v_1^T M v_1 = 1). So (K, M) has eigenpairs ((4+i)^{-1}, e_i), i = 1, ..., 28. For the remaining ones we have to look at the subproblem

(*)   [ 1 1 ] x = λ [ 1 0 ] x .
      [ 1 2 ]       [ 0 ε ]
The results below were computed on a VAX in double precision (relative machine precision ≈ 1.4·10^{-17}) using a Lanczos algorithm with full reorthogonalisation implemented in Matlab [3]. We ran the algorithm for 15 steps, i.e. p = 15. The tridiagonal matrices satisfied 5·10^{-2} < |α_k| < 2 and 1.6·10^{-2} ≤ β_k ≤ 0.36.
3.3.4.1. Singular M. Here ε = 0, so the subproblem (*) has eigenpairs (1/2, (1, -0.5)^T) and (∞, (0, 1)^T). The starting vector had equal components of all eigenvectors corresponding to finite eigenvalues. The contamination is z = ξτ e_30. Since we know that P_N ṽ_k = γ_k z, the size of ξ is not important if we just wish to find the γ_k. (In practice we would like to have a small |ξ|, of course, i.e. we would like to have a v_1 that lies in X. As we have seen (theorem 3.1), this can be accomplished by hitting any vector by (K-μM)^{-1}M. Using
this technique on a computer would give a ξ depending on the relative machine precision and the methods involved in forming (K-μM)^{-1}M r, for some r.) Below are listed the eigenvalues ν_k of T and the absolute values |s_pk| of the bottom elements of the corresponding eigenvectors (k, the index, equals the number of the row in the table). The third column contains the absolute values of the growth factors (the γ_k). The last column contains ||P_N ỹ_k|| / ||P_N ỹ_1||, i.e. the relative growth of the contamination in the Ritz vectors. (All the values have been put in one table, although they should not necessarily be compared row-wise. A γ_k is not related to any specific ν_i, for example.)
ν_k          |s_pk|        |γ_k|         ||P_N ỹ_k|| / ||P_N ỹ_1||

3.1533d-02   7.5432d-02   1.0000d+00   4.4725d+04
3.4399d-02   1.8132d-01   3.8661d-01   9.8550d+04
3.9433d-02   2.7978d-01   1.7416d+00   1.3265d+05
4.6556d-02   3.6344d-01   3.8896d+00   1.4596d+05
5.5515d-02   4.2417d-01   8.8019d+00   1.4286d+05
6.5894d-02   4.5223d-01   2.0639d+01   1.2831d+05
7.7111d-02   4.3532d-01   5.0314d+01   1.0555d+05
8.8464d-02   3.5642d-01   1.2781d+02   7.5329d+04
9.9504d-02   2.1151d-01   3.3887d+02   3.9743d+04
1.1108d-01   7.2107d-02   9.3915d+02   1.2137d+04
1.2500d-01   1.2412d-02   2.7241d+03   1.8566d+03
1.4286d-01   1.0479d-03   8.2803d+03   1.3714d+02
1.6667d-01   3.8827d-05   2.6409d+04   4.3556d+00
2.0000d-01   4.4893d-07   8.8501d+04   4.1967d-02
2.0000d+00   2.2286d-23   3.1213d+05   2.0834d-19
                          1.1606d+06

(The |γ_k| column has one entry more than the others, since p = 15 steps produce the 16 Lanczos vectors v_1, ..., v_16.)
In theory P_N ŷ = 0, but in this case it is of the order of 10^{-11} to 10^{-12}; that is not of any real significance, however, since the problem is so trivial. Looking at columns three and four it can be seen how the Lanczos vectors and the Ritz vectors are affected (theorems 3.5 and 3.6). In particular, it can be seen that the good Ritz vectors are not affected very much.
3.3.4.2. Nonsingular M. Here one change was made: now ε = 10^{-4}. The eigenpairs of the subproblem (*) are

(4.999875·10^{-1}, (9.999875·10^{-1}, -5.000062·10^{-1})^T)  and  (2.000050·10^{4}, (5.000062·10^{-3}, 9.999875·10^{1})^T).

(This can be compared with the theory in section 3.3.3: σε^{-1} = 2·10^4 and |z_N| / ||z_R|| ≈ 2·10^4.) The ξ in v_1 does matter in this case (since it is measured by the M-norm), and we took ξ = 0.4. The table below consists of ν_k and |s_pk| as before. The third column contains the growth factors as defined in section 3.3.3. The next two columns contain the relative growths of ẑ in the Ritz vectors and in the normalised ŷ-vectors, i.e. using the notation in section 3.3.3 the columns contain ||P_ẑ y_k|| / ||P_ẑ v_1|| and ||P_ẑ ŷ_k|| / ||P_ẑ v_1|| (where ŷ has been normalised so that ||ŷ||_M = 1).
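These subproblem numbers are easy to verify. The sketch below solves (*) with ε = 10^{-4} and M-normalises the eigenvectors (the M-normalisation is an assumption on our part, but it reproduces the printed components):

```python
import numpy as np

eps = 1.0e-4
K2 = np.array([[1.0, 1.0], [1.0, 2.0]])
E2 = np.diag([1.0, eps])

lam, Z = np.linalg.eig(np.linalg.solve(E2, K2))
order = np.argsort(lam)
small, large = lam[order[0]], lam[order[1]]

z_small = Z[:, order[0]]
z_small = z_small / np.sqrt(z_small @ E2 @ z_small)   # z^T M z = 1
z_large = Z[:, order[1]]
z_large = z_large / np.sqrt(z_large @ E2 @ z_large)
```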
ν_k          |s_pk|        |γ_k|         ||P_ẑ y_k|| / ||P_ẑ v_1||   ||P_ẑ ŷ_k|| / ||P_ẑ v_1||

5.0880d-05   1.2911d-02   1.0000d+00   1.3463d+03   2.9890d+02
3.1753d-02   9.5617d-02   3.8646d-01   2.7726d-01   4.3600d-04
3.5610d-02   2.1961d-01   1.7402d+00   5.6772d-01   7.9278d-04
4.2256d-02   3.2745d-01   3.8842d+00   7.1321d-01   8.3667d-04
5.1327d-02   4.0956d-01   8.7839d+00   7.3425d-01   7.0076d-04
6.2274d-02   4.5697d-01   2.0580d+01   6.7511d-01   5.3787d-04
7.4399d-02   4.5933d-01   5.0098d+01   5.6793d-01   3.7959d-04
8.6874d-02   4.0196d-01   1.2652d+02   4.2558d-01   2.4418d-04
9.8957d-02   2.7082d-01   3.2382d+02   2.5171d-01   1.2704d-04
1.1101d-01   1.1076d-01   7.2302d+02   9.1768d-02   4.1327d-05
1.2500d-01   2.2542d-02   9.1762d+02   1.6585d-02   6.6339d-06
1.4286d-01   2.1628d-03   5.3057d+02   1.3922d-03   4.8726d-07
1.6667d-01   8.8950d-05   1.9481d+02   4.9077d-05   1.4723d-08
2.0000d-01   1.1252d-06   6.2529d+01   5.1731d-07   1.2932d-10
2.0001d+00   7.5966d-23   1.8886d+01   3.4917d-24   8.7288d-29
                          5.4101d+00
Comparing the two examples we see that the γ_k are roughly the same for small values of k. For larger values the boundedness forces the γ_k to decrease in the nonsingular case. If we had taken a smaller ξ the values would have followed each other for even larger values of k. Looking at the two last columns one can see how much better ŷ is. From section 3.3.3 it follows that the quotient between values in the two columns is roughly 5·10^{-5} ν^{-1} (except for the first row). (To get the absolute values of the growths (instead of the relative values) use ||P_ẑ v_1|| ≈ 7.4279·10^{-2} or |ẑ^T M v_1| ≈ 7.4280·10^{-4}.)
3.3.5. The Defective Case. In this section we will briefly look at the more complicated situation when (K, M) is defective. To make things easier to follow we assume that M = diag(I, 0). We may now have unwanted components not only in N(M) but also in R(M). The following theorem describes how these components may grow in the ṽ_k. The ~-notation used is the same as in theorem 3.3. To make the notation easier we have assumed that β_p ≠ 0.
Since δ_k/γ_k is not constant, the contamination does not stay in the same direction; instead

P_N ṽ_k = [ 0   ]
          [ w_k ] ,   w_k ∈ span(H_2 Q_N a, b),

where γ_k and δ_k are the coefficients of w_k in this basis. From theorem 3.3 we see that the recurrence for the γ_k coincides with the earlier recurrence for γ_k, so with g = (γ_1, ..., γ_p)^T, Tg = -β_p γ_{p+1} e_p. Theorem 3.8 now gives

β_{k-1} δ_{k-1} + α_k δ_k + β_k δ_{k+1} = γ_k,

or, with d = (δ_1, ..., δ_p)^T, Td = g - β_p δ_{p+1} e_p. Using this we could compute the P_N ỹ's. As before, the starting vector should be chosen in X, which can be accomplished by hitting any vector twice by (K-μM)^{-1}M. To refine the Ritz vectors it is not enough to add a multiple of v_{p+1} (that will only ensure that the refined vector lies in R((K-μM)^{-1}M)). If we, however, run the Lanczos algorithm p+1 steps and compute the Ritz vector y = V_p s from V_p, then
(1)   (K-μM)^{-1}M y = ν y + β_p s_p v_{p+1}

(2)   (K-μM)^{-1}M V_{p+1} = V_{p+1} T_{p+1} + β_{p+1} v_{p+2} e_{p+1}^T
so ((K-μM)^{-1}M)^2 y = ν (K-μM)^{-1}M y + β_p s_p (K-μM)^{-1}M v_{p+1}, which can be computed from (1) and (2).
4. A Choice of Norm. When M is positive definite, we have several standard bounds in the M^{-1}-norm, e.g. for arbitrary y ≠ 0 and μ there is an eigenvalue λ (where Kx = λMx) such that

|λ-μ| ||My||_{M^{-1}} ≤ ||(K-μM)y||_{M^{-1}}

(see [4], Ch. 15). In our case, when M is singular, this does not hold, of course. It is possible to find relevant norms, ||·||_W say, also in this case, but that is the subject of a coming report. We will, however, quote (without proofs) some of the more practical results from that report, and in particular those where W is singular (only positive semidefinite). In that case we have a seminorm (there are vectors x ≠ 0 such that ||x||_W = 0).

When M is singular a natural choice of W would, perhaps, be M^+, the pseudo-inverse of M. There is, however, a more natural choice. If M is positive definite, then X^T M X = I, so M^{-1} = X X^T. So one might guess that ||r||_{XX^T} = (r^T X X^T r)^{1/2} could be a reasonable alternative also in the case when M is singular. It is not difficult to prove that W = X X^T satisfies

W M W = W,   M W M = M

when (K, M) is nondefective (W M W = W holds in the defective case). So W is a generalised inverse, but since WM and MW usually are not symmetric matrices, W is not in general the pseudo-inverse.
To justify this latter choice of W let us quote the following theorem.

Theorem 4.1. Given any y ∈ R^n and real μ, there is λ = λ(K, M) such that

|λ-μ| ||My||_{XX^T} ≤ ||(K-μM)y||_{XX^T}.
This is not, of course, a particularly practical result, since we do not usually know X. We cannot replace XX^T by M^+ in general, as the following example shows.

Example. Let

K = [ 2 1 ]      M = [ 1 0 ]
    [ 1 1 ] ,        [ 0 0 ] ,

and y = (-1, 2-μ)^T. Then

|λ-μ| ||My||_{M^+} = |1-μ|   and   ||(K-μM)y||_{M^+} = 0,

and the norm inequality does not hold unless μ = 1.
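Both the generalised-inverse identities for W = XX^T and the failure of M^+ in this example can be confirmed numerically; here X consists of the single M-normalised eigenvector of the pencil above:

```python
import numpy as np

K = np.array([[2.0, 1.0], [1.0, 1.0]])
M = np.diag([1.0, 0.0])

# the only finite eigenpair: K x = lam M x with lam = 1, x = (1, -1), x^T M x = 1
lam, x = 1.0, np.array([1.0, -1.0])
W = np.outer(x, x)                       # W = X X^T

seminorm = lambda r, B: np.sqrt(max(r @ B @ r, 0.0))
mu = 0.3
y = np.array([-1.0, 2.0 - mu])
Mp = np.diag([1.0, 0.0])                 # pseudo-inverse of M

lhs_W = abs(lam - mu) * seminorm(M @ y, W)
rhs_W = seminorm((K - mu * M) @ y, W)    # theorem 4.1 holds for W
lhs_Mp = abs(lam - mu) * seminorm(M @ y, Mp)
rhs_Mp = seminorm((K - mu * M) @ y, Mp)  # but the M+ version fails
```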
If we, however, restrict the choice of y we can regain the inequality.

Corollary 4.2. If y ∈ X then

|λ-μ| ||My||_{M^+} ≤ ||(K-μM)y||_{M^+}.

If y ∈ X then both My and (K-μM)y lie in R(M) = R(M^+). It may be noted that this bound holds in the defective case too. The norm of the residual can be computed in practice, at least when M is diagonal. ||·||_{M^+} defines a norm on X. In the following theorem we also use (x, y)_M = x^T M y, which defines an inner product on X. Using this inner product we can get a decomposition of y into z and e, where z is the eigenvector of the eigenvalue closest to μ and e is a vector orthogonal to z ((z, e)_M = 0). Assuming that y, z, and e have unit length ((z, z)_M = 1, etc.) we can define the error-angle φ such that y = cos φ z + sin φ e. With these definitions we can prove theorems similar to those in [4], page 222.
Theorem 4.3. Let y ∈ X, where y^T M y = 1, and let μ be given. Let λ be the eigenvalue closest to μ and define the gap γ = min_{λ_i ≠ λ} |λ_i - μ|. Let y = cos φ z + sin φ e, where Kz = λMz, z^T M z = 1, e^T M e = 1, and e^T M z = 0. Then the following bound holds for sin φ:

|sin φ| ≤ γ^{-1} ||(K-μM)y||_{M^+}.

If μ equals the Rayleigh quotient, y^T K y, it is also true that

|λ-μ| ≤ γ^{-1} ||(K-μM)y||^2_{M^+}.

These bounds also hold in the defective case. Using the above theorems we can give bounds on the accuracy of the approximations computed by the Lanczos algorithm. Before applying these bounds to the Lanczos case we prove the following theorem, in which the residual of the inverted problem is used to construct another vector for the noninverted problem. (This is a slight generalisation of how the vector ŷ was constructed in theorem 3.7.) From now on we will only deal with the nondefective problem.
Theorem 4.4. Let r = (K-μM)^{-1}My - νy, y^T M y = 1, y^T M r = 0. Take

ŷ = νy + r

and let

η(ω) = ||(K-(μ+ω^{-1})M)ŷ||_{M^+} / ||ŷ||_M.

Then η(ω) is minimised for ω_0 = ν + ||r||^2_M ν^{-1}, and the minimum value, η(ω_0), is

||r||_M (ν^2 + ||r||^2_M)^{-1}.

Proof. Let us first note that

(ν^2 + ||r||^2_M)^{1/2} = ||Mŷ||_{M^+} = ||ŷ||_M = ||(K-μM)^{-1}My||_M ≠ 0

since y^T M y = 1. From the definition of r̂ = (K-(μ+ω^{-1})M)ŷ we see that My = (K-μM)ŷ, so

(K-(μ+ω^{-1})M)ŷ = My - ω^{-1}Mŷ = (1 - νω^{-1})My - ω^{-1}Mr,

which gives

||r̂||^2_{M^+} = (1 - νω^{-1})^2 + ω^{-2} r^T M r
            = (ω^{-1}(ν^2 + r^T M r)^{1/2} - ν(ν^2 + r^T M r)^{-1/2})^2 + 1 - ν^2(ν^2 + r^T M r)^{-1},

which is minimised for ω_0 = ν + ν^{-1} r^T M r, giving ||r̂||^2_{M^+} = r^T M r (ν^2 + r^T M r)^{-1}. (Of course ω_0^{-1} = ŷ^T(K-μM)ŷ / ŷ^T M ŷ. Note that ν = y^T M (K-μM)^{-1} M y.)

(μ+ω_0^{-1}, ŷ) need not be a better approximation than (μ+ν^{-1}, y). So let us find out when

(*)   η(ω_0) = ||(K-(μ+ω_0^{-1})M)ŷ||_{M^+} / ||ŷ||_M ≤ ||(K-(μ+ν^{-1})M)y||_{M^+}.

Now (K-(μ+ν^{-1})M)y = -ν^{-1}(K-μM)r, and (*) is equivalent to

||r||_M (ν^2 + ||r||^2_M)^{-1} ≤ ||(K-μM)r||_{M^+} |ν|^{-1},

which is equivalent to

|ω_0|^{-1} ≤ ||(K-μM)r||_{M^+} / ||r||_M.
In other words, for the inequality (*) to hold, r should have large components in the eigenvectors corresponding to λ(K, M) far from μ, or |ω_0| should be large, i.e. y should approximate an eigenvector corresponding to an eigenvalue close to μ. (In the same way as was shown in section 3.3.3 we get, provided λ_k ≠ μ+ω_0^{-1}, a factor |λ-μ| |λ_k-μ|^{-1}, which shows how ŷ is affected by the other eigenvectors.)
Example. Let K = diag(5, 10, 100), M = I, and μ = 0. Take y^T = (0, 10, 1)·101^{-1/2}. In this case we get a much better approximation using ŷ^T = (0, 10, 0.1)τ (for some τ). Furthermore, 10-ν^{-1} ≈ -9.0·10^{-2}, 10-ω_0^{-1} ≈ -9.0·10^{-3}, ||(K-(μ+ν^{-1})I)y|| ||y||^{-1} ≈ 8.9, and ||(K-(μ+ω_0^{-1})I)ŷ|| ||ŷ||^{-1} ≈ 0.90. If we, however, pick y^T = (1, 10, 0)·101^{-1/2} the picture changes completely. Now 10-ν^{-1} ≈ 9.8·10^{-2}, 10-ω_0^{-1} ≈ 0.19, and ŷ^T = (2, 10, 0)τ. ||(K-(μ+ν^{-1})I)y|| ||y||^{-1} ≈ 0.50, and ||(K-(μ+ω_0^{-1})I)ŷ|| ||ŷ||^{-1} ≈ 0.96.

We now apply the theorem to the approximations produced by the Lanczos algorithm in the nondefective case (the notation follows the theorems above). In this case y = Vs and ||r||_M = β_p |s_p| for some s, so ω_0 = ν + β_p^2 s_p^2 ν^{-1} and the corresponding η(ω_0) = β_p |s_p| (ν^2 + β_p^2 s_p^2)^{-1}. We can also see that the changes, when using ŷ, will not be so dramatic (except for the contamination), since ||ŷ - νy||_M = ||r||_M = β_p |s_p|, which would be quite small. Also

|ω_0 - ν| = ||r||^2_M |ν|^{-1} = β_p^2 s_p^2 |ν|^{-1},
which usually is even less. Since y ∈ X we can apply 4.2 and 4.3 and get:

Corollary 4.5. Let ρ = β_p |s_p| (ν^2 + β_p^2 s_p^2)^{-1}; then |sin φ| is bounded by ργ^{-1}, and there is an eigenvalue λ that satisfies

|λ - (μ+ω_0^{-1})| ≤ min(ρ, ρ^2 γ^{-1}).
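The numbers in the worked example earlier in this section can be reproduced directly from the definitions in theorem 4.4:

```python
import numpy as np

K = np.diag([5.0, 10.0, 100.0])          # M = I, mu = 0
y = np.array([0.0, 10.0, 1.0]) / np.sqrt(101.0)

Cy = np.linalg.solve(K, y)               # (K - mu M)^{-1} M y
nu = y @ Cy
r = Cy - nu * y                          # residual of the inverted problem
w0 = nu + (r @ r) / nu                   # optimal shift from theorem 4.4
yhat = nu * y + r                        # improved vector, proportional to (0, 10, 0.1)

res_y = np.linalg.norm((K - (1 / nu) * np.eye(3)) @ y) / np.linalg.norm(y)
res_yhat = np.linalg.norm((K - (1 / w0) * np.eye(3)) @ yhat) / np.linalg.norm(yhat)
```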
Acknowledgements. This work was completed during a visit to CPAM at the University of California, Berkeley. I thank Professor B.N. Parlett for making my stay so pleasant and interesting. This research was supported by the Swedish Natural Science Research Council.
5. References.
[1] Ericsson, T., Jensen, P.S., Nour-Omid, B., and Parlett, B.N., How to implement the spectral transformation, Math. of Comp., to appear.
A Generalised Eigenvalue Problem and the Lanczos Algorithm
119
[2] Ericsson, T. and Ruhe, A., The spectral transformation Lanczos method for the numerical solution of large sparse generalised eigenvalue problems, Math. of Comp. 35:152 (1980) 1251-1268.
[3] Moler, C., An interactive matrix laboratory, Tech. rep., Univ. of New Mexico (1980, 2nd ed.).
[4] Parlett, B.N., The symmetric eigenvalue problem (Prentice-Hall Inc., 1980).
[5] Uhlig, F., A recurring theorem about pairs of quadratic forms and extensions: A survey, Lin. Alg. Appl. 25 (1979) 219-237.
[6] Wilkinson, J.H., The algebraic eigenvalue problem (Oxford U.P., 1965).
[7] Wilkinson, J.H., Kronecker's canonical form and the QZ algorithm, Lin. Alg. Appl. 28 (1979) 285-303.
This manuscript was prepared using troff and eqn and printed on a Versatec printer using an 11 point Hershey font.
Large Scale Eigenvalue Problems
J. Cullum and R.A. Willoughby (Editors)
© Elsevier Science Publishers B.V. (North-Holland), 1986
NUMERICAL PATH FOLLOWING AND EIGENVALUE CRITERIA FOR BRANCH SWITCHING

Yong Feng Zhou* and Axel Ruhe**
*Wuhan Digital Engineering Institute, P.O. Box 223, Wuchang, Wuhan, People's Republic of China
**Department of Computer Science, Chalmers University of Technology, 412 96 Goteborg, Sweden
Abstract: Methods for numerical path following for nonlinear eigenvalue problems are studied. Euler-Newton continuation along curves parameterized by a semiarclength is described. Criteria for localizing singular points (turning points or bifurcations) by means of a linear eigenproblem are introduced. It is found that a nonlinear version of the spectral transformation used for linear symmetric eigenproblems gives a surprisingly accurate prediction of the position of a singular point and the direction of bifurcating branches. Practical applications are discussed and numerical examples are reported.
In the present contribution we will seek solutions of

(1.1)   g(u,λ) = 0

where u and the nonlinear function g are vectors of n dimensions, and λ is a real parameter. Such problems are termed nonlinear eigenvalue problems or path following problems. The manifold of solutions to (1.1) is in general of one dimension, and we get a curve or path to follow. When several paths meet we get branch points, most often termed bifurcations.
Let us first consider some examples. The (linear) eigenvalue problem

(1.2)   g(u,λ) ≡ (A-λB)u,

where A and B are symmetric matrices and B is positive definite, has one solution path

Γ_0 = { u = 0, λ arbitrary }

which we call the trivial solution. At each eigenvalue λ = λ_k there is a bifurcation, with a straight line path setting out in the direction of the eigenvector,

Γ_k = { u = u_k c, λ = λ_k },

where u_k is an eigenvector, (A - λ_k B)u_k = 0, and c is arbitrary. See figure 1.1.
Figure 1.1
When g is nonlinear, it is often normalized so that g(0,0) = 0, and we call the branch starting at the origin the trivial branch. We now get curved branches, and may get a more complicated structure with secondary bifurcations. Turning points, where a solution has the hyperplane λ = constant as a tangent plane, are also of interest. The solution manifold may also be disconnected into several parts or isolas. See figure 1.2.
Figure 1.2

One important source of nonlinear eigenvalue problems is large deformation and buckling problems in mechanics, see e.g. [2]. Consider the very simplest case of Euler buckling of a rod,

(1.3)   [B(s)θ']' + p sin θ = 0.

Here θ stands for the angular deflection at the point s, and B(s) is the stiffness at the point, giving the bending moment. The load p, with which the rod is compressed, is the eigenvalue parameter. The trivial solution is θ = 0, corresponding to a straight rod. If we let p increase from 0, the rod keeps straight until p = λ_1, where it may buckle into the first buckled state (a U-shaped rod). If it does not buckle it stays straight until p = λ_2, where it may buckle into the second buckled state (an S-shaped rod), and so on; see figure 1.3.
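Linearizing (1.3) around θ = 0 gives Bθ'' + pθ = 0, so the buckling loads of the linearization are eigenvalues of a one-dimensional differential operator. A finite-difference sketch (assuming B ≡ 1 on [0,1] and, for simplicity, θ(0) = θ(1) = 0 boundary values, which are our own illustrative choices) recovers the loads p_k = (kπ)²:

```python
import numpy as np

n = 200
h = 1.0 / n
# second-difference approximation of -d^2/ds^2 on (0, 1) with zero boundary values
A = (np.diag(2.0 * np.ones(n - 1))
     + np.diag(-np.ones(n - 2), 1)
     + np.diag(-np.ones(n - 2), -1)) / h**2

p_loads = np.sort(np.linalg.eigvalsh(A))   # approximations to (k pi)^2
p1 = p_loads[0]                            # first buckling load of the linearization
```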
124
YongFengZhouandA. Ruhe
Figure 1.3

In a practical situation, the occurrence of buckling is governed by small imperfections in the material or load application, leading to a more gradual onset of buckling, as the dotted line in figure 1.3 indicates. We see that bifurcation is non generic, but still of interest in the mathematical study of the mechanical problem. Another important source of bifurcation problems is the study of steady states of dynamical systems. As a simple case consider a chemical system

(1.4)   c_t = F(c) + k Δc,

where c is a vector of concentrations of different species, F(c) a nonlinear function, corresponding to chemical reactions, and Δc is the Laplace operator, corresponding to diffusion. At steady state c_t = 0, and we can study the shape of the final solution vector c for different values of the diffusivity k, which we now use as eigenvalue parameter. See [3].
Finally, homotopy methods for minimization lead to a special type of path following [1]. Consider the problem

(1.5)   min_u φ(u),

where φ is a real valued function. Suppose that we know the minimum u_0 of another function φ_0, considered as an approximation to u_*. In the simplest case we can define the path of functions

ψ(u,λ) := λ(φ(u) - φ(u_0)) + (1-λ)(φ_0(u) - φ_0(u_0)).

For λ = 0 the minimum is at u = u_0, and for λ = 1 it is at u = u_*, the sought minimum of (1.5). Along the path it holds that

(1.6)   ψ_u(u,λ) = 0.

One way of finding u_* is now to follow a path of solutions of (1.6) from λ = 0 to λ = 1. Such a path may not exist if turning points occur, but there is a technique of complexification which gives branches bifurcating out in the complex plane and then joining back again. See figure 1.4.
Figure 1.4
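A one-dimensional sketch of this homotopy, with assumed model functions φ(u) = u⁴/4 - u (minimised at u_* = 1) and φ₀(u) = (u-2)²/2 (minimised at u_0 = 2), steps λ from 0 to 1 and solves ψ_u(u,λ) = 0 by Newton at each step; here the path has no turning points, since ψ_uu = 3λu² + (1-λ) > 0:

```python
import numpy as np

phi_p = lambda u: u**3 - 1.0      # derivative of phi(u)  = u^4/4 - u
phi0_p = lambda u: u - 2.0        # derivative of phi0(u) = (u - 2)^2 / 2

u = 2.0                           # start at the known minimum of phi0
for lam in np.linspace(0.0, 1.0, 21):
    for _ in range(50):           # Newton iteration on psi_u(u, lam) = 0
        f = lam * phi_p(u) + (1.0 - lam) * phi0_p(u)
        fp = lam * 3.0 * u**2 + (1.0 - lam)
        du = -f / fp
        u += du
        if abs(du) < 1e-14:
            break
```

At λ = 1 the iterate has been carried to the sought minimum u_* = 1.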
The purpose of our work in this area is to use ideas that have proved successful for eigenvalue computation for linear problems (see [8]) to construct algorithms for path following. This report is just a first tiny step on that way, but the results that have shown up have encouraged us to go further.

We continue in section 2 by describing traditional algorithms for path following, essentially the Euler-Newton continuation method. In section 3 we describe how exact factorizations of the matrix of partial derivatives at one point can be used as a preconditioning in an iterative solution of the linear systems of the Newton iteration at later points. In section 4 we arrive at the problem that is our primary concern, locating singular points. We show how the solution of a linear eigenproblem gives both a prediction of the position of the singular points and the direction of the bifurcating branches. We conclude with some numerical examples in section 5.
We seek the solution of the nonlinear eigenvalue problem

(2.1)   G(u,λ) = 0,

where u ∈ R^n, λ ∈ R^1, G is a nonlinear transformation from R^n × R^1 into R^n, and we assume that dim N(G_u) = 1 if G_u is singular. The standard approach is to use λ, one of the naturally occurring parameters of the problem, as the parameter defining solution arcs, u(λ). Those (u_0, λ_0) satisfying (2.1) for which the derivative G_u is nonsingular are regular points on the solution arc. If G_u(u_0,λ_0) is singular, (u_0,λ_0) is a singular point. To further distinguish singular points: if G_λ ∉ R(G_u) it is a turning point; if G_λ ∈ R(G_u) it is a bifurcation point.

If we use Euler-Newton continuation on (2.1) directly, this procedure may fail or encounter difficulties as a singular point is approached. See [11] for further details. These difficulties are overcome, for the case of turning points, if we introduce another parameter s and impose some additional constraint or normalization on the solution. Replace (2.1) by

(2.2)   G(u,λ) = 0,   N(u,λ,s) = 0,

and get an equivalent inflated system

(2.3)   F(x,s) = 0,   x^T = (u^T, λ).

For (2.3) the Euler-Newton continuation is:

Step 1. (Find tangent) Solve

(2.4)   F_x(x(s_0), s_0) ẋ(s_0) = -F_s(x(s_0), s_0).
Step 2. (Euler step) Set

(2.5)   x^0(s) = x(s_0) + (s-s_0) ẋ(s_0) = x(s_0) + Δs_0 ẋ(s_0).

Step 3. (Newton iteration) Solve

(2.6)   F_x(x^ν(s), s) (x^{ν+1}(s) - x^ν(s)) = -F(x^ν(s), s)

and iterate for ν = 0, 1, .... Here ẋ etc. express derivatives with respect to s.
Some different normalizations are:

(2.7)   N_1(x,s) = ∫_{s_0}^{s} ||ẋ(τ)|| dτ - (s-s_0)
        N_2(x,s) = ẋ(s_0)^T (x(s) - x(s_0)) - (s-s_0)        (see [11])
        N_3(x,s) = e_k^T (x(s) - x(s_0)) - (s-s_0)           (see [14])
        N_4(x,s) = ||x(s) - x(s_0)||^2 - (s-s_0)^2           (see [6])

See figure 2.1.

Figure 2.1
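The three steps above, with the pseudoarclength normalization N_2, can be sketched on the scalar test problem G(u,λ) = u² + λ² - 1, whose solution path is the unit circle with turning points at (0, ±1); this toy setup is our own illustration, not one of the test problems of this paper:

```python
import numpy as np

def G(x): u, lam = x; return u * u + lam * lam - 1.0
def Gx(x): u, lam = x; return np.array([2.0 * u, 2.0 * lam])   # row [G_u, G_lam]

ds = 0.1
x = np.array([0.0, 1.0])            # a point on the path
xdot = np.array([1.0, 0.0])         # rough initial tangent
path = [x.copy()]
for _ in range(45):
    # Step 1: tangent t from [G_x; xdot_old^T] t = [0; 1], then normalise
    t = np.linalg.solve(np.vstack([Gx(x), xdot]), np.array([0.0, 1.0]))
    xdot = t / np.linalg.norm(t)
    # Step 2: Euler predictor (2.5)
    x0, x = x, x + ds * xdot
    # Step 3: Newton corrector (2.6) on the inflated system with N_2
    for _ in range(25):
        F = np.array([G(x), xdot @ (x - x0) - ds])
        x = x + np.linalg.solve(np.vstack([Gx(x), xdot]), -F)
        if abs(G(x)) < 1e-13:
            break
    path.append(x.copy())
path = np.array(path)
max_residual = max(abs(G(p)) for p in path)
```

The iteration marches straight through both turning points, where plain λ-parameterized continuation would break down because G_u is singular there.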
The true arc length N_1 is difficult to evaluate. H.B. Keller [11] and W.C. Rheinboldt [14] have used the pseudo arc lengths N_2 and N_3 to form their algorithms, respectively.

Solving a linear system by a direct method in each iteration is unfavorable for large problems. Different iterations have been proposed. For example, T.F. Chan and Y. Saad [5] have used an Incomplete Orthogonalization Method with preconditioning, H. Weber [18] has used an MG technique to solve a nonlinear elliptic eigenvalue problem, and T.F. Chan and K.R. Jackson [4] have proposed a new preconditioning technique.

3. Mixed Method for System Solution in Numerical Path Following.

The Incomplete Orthogonalization Method (IOM), as proposed by Y. Saad [16], is an iterative procedure for solving a system. For large problems the main work in each iteration of IOM is one matrix-vector multiplication. The number of iterations is related to the condition number of the coefficient matrix. Therefore some preconditioning technique is needed. Suppose that the original system is

(3.1)   Ax = b.

We want to find a nonsingular matrix M so that the system

(3.2)   M^{-1}Ax = M^{-1}b,

which is equivalent to (3.1), can be solved in fewer iterations if

(3.3)   κ(M^{-1}A) < κ(A).
We have chosen to LU factorize the matrices G_u and F_x at some selected points, and then use those LU factorizations as preconditioning in a sequence of path following steps. When we have come far from the factorization point, many iterations will be needed in IOM, and then we select a new factorization point.
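A minimal sketch of this reuse of a factorization, with a simple preconditioned residual-correction iteration standing in for IOM and an explicit inverse standing in for the stored LU factors (both are simplifying assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
A0 = np.diag(np.linspace(1.0, 10.0, n)) + 0.01 * rng.standard_normal((n, n))
A1 = A0 + 0.01 * rng.standard_normal((n, n))   # Jacobian a few path steps later
b = rng.standard_normal(n)

A0inv = np.linalg.inv(A0)      # stands in for the LU factors computed at x0
x = np.zeros(n)
its = 0
for _ in range(200):
    r = b - A1 @ x
    if np.linalg.norm(r) < 1e-10:
        break
    x = x + A0inv @ r          # preconditioned correction, M^{-1} = A0^{-1}
    its += 1
final_residual = np.linalg.norm(A1 @ x - b)
```

Since A1 is still close to the factorization point, M^{-1}A1 ≈ I and the iteration converges in a handful of steps; as the path moves away the count grows, which is the signal to refactorize.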
W e h a v e , q u i t e a r b i t r a r i l y , chosen t o
p r e s c r i b e d v a l u e %ax
refactorize
when ION needed a
i t e r a t i o n s t o converge. I n o u r t e s t s
sax was
chosen a s 15 i n most c a s e s . The convergence of t h e o u t e r Newton i t e r a t i o n ( 2 . 6 ) i s a f f e c t e d by t h e s t e p l e n g t h chosen i n 12.5). and R.
C.D.
H e i j e r and W.C.
Rheinboldt 191
Seydel [ I 7 1 have s t u d i e d t h i s problem. W e have used a method
s i m i l a r t o t h a t proposed by R .
Seydel [ I 7 1
because it i s s i m p l e and
has shown t o be e f f e c t i v e . Our s t r a t e g y i s : (1)
(1) The desired number of outer Newton iterations (DNI) for (2.6) is given a priori, and the first step length Δs_0 is known.
(2) For i = 1, 2, ..., set Δs_{i+1} := Δs_i · DNI/NI_i, where NI_i is the actual number of Newton iterations that was needed to reach x_i.
(3) If the solution of (2.6) does not converge after many iterations (e.g. 15), we set Δs_i := Δs_i/2 and go back to the Euler step (2.5).
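The step-length control above can be sketched as follows; the names (newton, dni, ni_max) are illustrative, and the corrector is reduced to a callback.

```python
# Sketch of the step-length strategy: "newton" stands in for the corrector
# iteration (2.6) and returns (converged, number_of_iterations).

def next_step_length(ds, ni, dni):
    """Step (2): rescale so the corrector needs about DNI iterations."""
    return ds * dni / max(ni, 1)

def continuation_step(ds, newton, dni=4, ni_max=15):
    """Run one predictor-corrector step, halving ds on failure (step (3))."""
    while True:
        converged, ni = newton(ds)
        if converged and ni <= ni_max:
            return next_step_length(ds, ni, dni), ni
        ds /= 2.0            # retreat and redo the Euler predictor (2.5)
```

A step that converges in fewer than DNI iterations thus enlarges the next step, and an expensive step shrinks it, keeping the corrector cost roughly constant along the path.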
(4.1) Algorithm A: determining turning points in continuation.

It is an obvious fact that λ'(s) = 0 at a turning point. In numerical continuation we get x'(s) by solving (2.4) at each point. We monitor the sign of its last component λ'(s). If λ'(s)·λ'(s+1) < 0, there is at least one turning point on the curve segment (s, s+1). Then, if needed, we can search for the singular point until |λ'(s*)| ≤ δ.
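A minimal sketch of this sign test, with bisection standing in for whatever refinement one prefers; lam_dot is a hypothetical callback returning the last tangent component λ'(s).

```python
# Sketch of Algorithm A: a sign change of the last tangent component
# lambda'(s) between consecutive continuation points brackets a turning
# point; bisection then refines it.  lam_dot is an illustrative callback.

def bracket_turning_points(lam_dot, s_values):
    """Return the intervals (s_i, s_{i+1}) on which lambda' changes sign."""
    vals = [lam_dot(s) for s in s_values]
    return [(s_values[i], s_values[i + 1])
            for i in range(len(vals) - 1) if vals[i] * vals[i + 1] < 0]

def refine_turning_point(lam_dot, s0, s1, tol=1e-10):
    """Bisect for lambda'(s*) = 0 inside a bracketing interval."""
    f0 = lam_dot(s0)
    while s1 - s0 > tol:
        mid = 0.5 * (s0 + s1)
        if f0 * lam_dot(mid) <= 0:
            s1 = mid
        else:
            s0, f0 = mid, lam_dot(mid)
    return 0.5 * (s0 + s1)
```

For a simple fold such as λ(s) = 1 − (s−1)², the bracket (0.5, 1.5) is detected and the refinement converges to s* = 1.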
(4.2) Algorithm B: determining singular points by means of inverse iteration.

At a general singular point λ'(s) does not necessarily change sign, so now the smallest eigenvalue λ_min of G_u is computed by the inverse power method at each point; this idea formed the core of Algorithm B.
= δ_{j,j'} ,    (3.16)

starting from the φ_1 and ψ_1 calculated from the right-hand and left-hand vectors of the spectral density of eq. (3.2), according to the following relations:

(3.17a)
(3.17b)

Instead of eq. (3.4) we now have a generating equation for each type of basis function:

(3.18a)
(3.18b)

where the projection operator P_n is written as

(3.19)

and the complex coefficient β_{n+1} is determined by the normalization condition <ψ_{n+1}|φ_{n+1}> = 1. It is easily shown that, in this new basis, Γ may again be represented by a symmetric, but in general complex, tridiagonal matrix T with coefficients β_j as off-diagonal elements. This leads to the following three-term recursive relations:

(3.20a)
(3.20b)

with

(3.21)
G. Moro and J.H. Freed

The continued fraction which appears at the right-hand side of eq. (3.10) can also be used for representing the frequency dependence of spectral densities calculated with operators which are not self-adjoint. An alternative form for J(ω) is obtained from the solution of the eigenvalue problem for T:
TQ = QΛ    (3.23)
Reference 29 reports on a modification of the QR algorithm for complex symmetric matrices which can be applied to this problem. The frequency dependence of J(ω) can then be explicitly written as follows:
J(ω)/⟨δA* δB⟩ = Σ_k Q_{1,k}^2 / (iω + Λ_k)    (3.24)
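In practice J(ω) is evaluated directly from the truncated continued fraction rather than through the eigenvalue expansion of eq. (3.24). A hedged sketch follows: eq. (3.10) itself is not reproduced in this excerpt, so the conventional normalization J(ω) = 1/(iω + α_1 − β_2²/(iω + α_2 − ...)) is assumed.

```python
# Sketch of evaluating the spectral density from the truncated continued
# fraction.  alpha[0..n-1] are the diagonal and beta[1..n-1] the
# off-diagonal coefficients of the (possibly complex) tridiagonal matrix T;
# beta[0] is unused.  The normalization 1/(i*w + alpha_1 - ...) is an
# assumed convention, since eq. (3.10) is not reproduced in this excerpt.

def spectral_density(alpha, beta, w):
    z = complex(0.0, w)
    tail = 0.0 + 0.0j
    for k in reversed(range(1, len(alpha))):   # fold the fraction bottom-up
        tail = beta[k] ** 2 / (z + alpha[k] - tail)
    return 1.0 / (z + alpha[0] - tail)
```

Complex coefficients, as produced by the complex symmetric recursion, are handled with no extra work, and no diagonalization of T is ever needed.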
However this relation is more useful in displaying how the eigenvalues of the starting matrix enter into J(ω) than for practical purposes. In fact it is convenient to calculate J(ω) directly from the continued fraction without any matrix diagonalization. It should be emphasized that the existence of the continued fraction representation of the spectral density is not assured if it is generated by means of a biorthonormal set of basis functions. It could happen that the following scalar product [cf. eqs. (3.18)] vanishes:
(3.25)

so that the normalization of ψ_{n+1} and φ_{n+1} according to eq. (3.16) is no longer possible. When the Lanczos method is used with a self-adjoint Γ for generating orthogonal basis functions, a similar situation is found only if the right-hand side of eq. (3.4) vanishes. In this case the operator Γ is factored with respect to the subspace spanned by the functions φ_1, φ_2, ..., φ_n, and the continued fraction truncated at the n-th term represents the spectral density completely. Of course, with biorthonormal basis functions, it would also be legitimate to truncate the continued fraction at the n-th term if (1−P_n)Γφ_n, or (1−P_n)Γ†ψ_n, or both vanish. However their scalar product could vanish simply because they are orthogonal, and in this case it would be impossible to derive a continued fraction representation of the spectral density. The spectral density of eq. (3.2) with ⟨P^{1/2}δA|P^{1/2}δB⟩ = 0 constitutes an obvious example of such a situation. Normally, in the calculation of autocorrelation functions relevant to spectroscopic observables, such anomalous behavior is not found. It is however advisable to check the magnitude of the norms of the functions in the scalar product of eq. (3.25) when it equals zero. As in the simple case treated at the beginning of this chapter, the coefficients of the continued fraction are computed by the use of the recursive relations eqs. (3.20), with the φ_j's and ψ_j's expanded in a given set of orthonormal basis functions f_j. Recalling the definitions eq. (3.11) and eq. (3.13) for x_n, the recursive relations may be written as follows:

(3.26a)
(3.26b)

where x_n is the column matrix constructed with the expansion coefficients of ψ_n, and:
The Lanczos Algorithm in Molecular Dynamics

α_{n+1} = (x̃_{n+1})^t M x_{n+1}    (3.27)

(x̃_{n+1})^t x_{n+1} = 1    (3.28)
The computer implementation of these relations requires the storage of four vectors, as well as two multiplications of a square matrix by a column matrix at each iteration. The computational effort is nearly doubled with respect to the Lanczos algorithm with orthonormal basis functions. There is however an important exception with complex symmetric matrices M if the starting vectors are complex conjugate. That is, for n = 1, we let:

x̃_n = x_n*    (3.29)

Then it is easily shown from eqs. (3.26) that eq. (3.29) will be valid for all values of n, provided M = M^t. Therefore only the recursive relation (3.26a) needs to be explicitly computed, and the normalization condition eq. (3.28) becomes:

x_n^t x_n = 1    (3.30)
This is equivalent to the implementation of equation (3.12) with a "Euclidean form" of the scalar product, in spite of the complex number algebra for the matrix operations. Equation (3.30) is equivalent to the pseudo-norm that we have previously introduced [6]. It should be noted here that often problems involving operators Γ which are not self-adjoint, such as the stochastic Liouville operators considered in magnetic resonance experiments, can be described by complex symmetric matrices if the basis functions f_j are properly chosen [6,21]. If the time evolution operator Γ is not "symmetrized", then it will not, in general, be possible to represent it by a complex symmetric matrix for which eq. (3.29) is true, and methods based upon biorthonormal spaces, as outlined above, become essential. This approach has been discussed in detail by Wassam [30]. Many aspects of the general analysis of the factors influencing the computer performance of the (real symmetric) Lanczos algorithm [3,4] can also be applied to our type of problem. In particular the sparsity of the matrix is crucial in determining the efficiency of the method from both the point of view of computer time and of memory needed. Usually the matrix representations of the time evolution operators considered in the previous chapter have few non-vanishing elements.
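The complex symmetric shortcut of eqs. (3.29)-(3.30) can be sketched as follows. This is an illustrative implementation, with the matrix stored as nested lists and the pseudo-norm x^t x taken without conjugation.

```python
import cmath

# Sketch of the Lanczos recursion for a complex symmetric matrix M (equal
# to its transpose, not Hermitian): a single vector sequence is propagated,
# and the "Euclidean" pseudo-norm x^t x (no conjugation) replaces the usual
# inner product, as in eqs. (3.29)-(3.30).

def matvec(m, x):
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]

def pseudo_dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))    # note: no conjugation

def lanczos_complex_symmetric(m, v, steps):
    """Return (alphas, betas) of the complex symmetric tridiagonal T."""
    norm = cmath.sqrt(pseudo_dot(v, v))            # pseudo-norm, may be complex
    x = [vi / norm for vi in v]
    x_prev = [0.0] * len(v)
    alphas, betas, beta = [], [], 0.0
    for _ in range(steps):
        w = matvec(m, x)
        alpha = pseudo_dot(x, w)
        alphas.append(alpha)
        w = [wi - alpha * xi - beta * pi
             for wi, xi, pi in zip(w, x, x_prev)]
        beta = cmath.sqrt(pseudo_dot(w, w))
        betas.append(beta)
        if abs(beta) < 1e-13:                      # invariant subspace found
            break
        x_prev, x = x, [wi / beta for wi in w]
    return alphas, betas
```

For a real symmetric M this reduces to the standard algorithm; for a genuinely complex symmetric M the pseudo-norm can vanish for a nonzero vector, which is the breakdown case discussed above.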
Values around 10-20% are typical sparsities, but it can be as low as a few percent. Moreover, in general, one finds that there is a decrease in the relative number of non-zero elements when one increases the degrees of freedom included in the time evolution operator Γ. In other words, the efficiency of the Lanczos algorithm is enhanced in problems with large-size matrices. It should be emphasized that the application of the Lanczos algorithm considered here differs from its standard use in numerical analysis in terms of the quantity to be computed. Normally the Lanczos algorithm is considered in the framework of the calculation of eigenvalues, while we need the spectral density as a function of frequency, which is well described by the continued fraction eq. (3.10) (but see Sect. IV). Correspondingly the convergence of the numerical method must be considered in a different manner. In particular we must evaluate how, by increasing the number of steps of the Lanczos algorithm, i.e. the number of terms of the continued fraction, the spectral lineshape I(ω) of eq. (2.12) approaches its converged form. In Figure 2 a typical ESR absorption spectrum is displayed.
Figure 2. ESR absorption spectrum of a paramagnetic spin probe. The magnetic and motional parameters are the same as case I in Table I of reference 6.

In numerical analysis, the convergence of the Lanczos algorithm with respect to the individual eigenvalues has been considered in detail [31,32]. We are not aware of any general criterion for the convergence of the spectral density. We have found it convenient to use the following phenomenological definition of relative error E_n for the spectral lineshape computed with n terms of the continued fraction [6]:

(3.31)
where I(ω) is the converged spectral lineshape. We think this definition of E_n is a useful one, because it is a measure of the overall difference between I_n(ω) and I(ω). Using this quantity, we can define the sufficient number of steps n_s as the smallest n which assures an error E_n less than the required accuracy for the spectral lineshape. In general E_n decreases with n, but it may not be strictly monotonic (see reference 6 for some typical trends). For a typical required accuracy, n_s is found to be much less than the dimension N of the matrix [6]. For large-size problems, n_s is typically of the order of N/5 or less. Of course this contributes to the overall efficiency of the Lanczos algorithm by reducing the number of iterations. In some problems the spectral density is dominated by only a few eigenvalues. That is, in eq. (3.24) only a few of the weighting factors Q_{1,k}^2 are not negligible. In these cases the Lanczos algorithm reproduces with comparable accuracy the spectral lineshape and the dominant eigenvalues. There are other situations, like the slow motional ESR spectrum displayed in Figure 2, where the lineshape is a complicated function which must be accounted for by a large collection of eigenvalues. In such cases the Lanczos algorithm is more efficient in reproducing the overall shape of I(ω) than in computing the eigenvalues [6].
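As an illustration of such a convergence test (the exact definition of E_n from reference 6 is not reproduced in this excerpt; a normalized integrated absolute difference on a frequency grid is assumed here):

```python
# Illustrative convergence test in the spirit of the phenomenological E_n:
# the n-term lineshape is compared with the converged one on a frequency
# grid.  The precise definition used in reference 6 is not reproduced in
# this excerpt; a normalized integrated absolute difference is assumed.

def relative_error(i_n, i_conv):
    num = sum(abs(a - b) for a, b in zip(i_n, i_conv))
    den = sum(abs(b) for b in i_conv)
    return num / den

def sufficient_steps(lineshapes, i_conv, accuracy):
    """Smallest n (counting from 1) with E_n below the required accuracy."""
    for n, i_n in enumerate(lineshapes, start=1):
        if relative_error(i_n, i_conv) < accuracy:
            return n
    return None
```

Since E_n need not be strictly monotonic, scanning from n = 1 upward (rather than bisecting) is the safe way to locate n_s.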
Figure 3 illustrates this fact by displaying the computed eigenvalues of the ESR problem considered in Fig. 2. The dots indicate the exact eigenvalues of the starting matrix M, which has a dimension equal to 42. The crosses represent the 16 eigenvalues of the tridiagonal matrix which approximate the lineshape to the required accuracy. From the Figure it is clear that there is no simple relation between overall accuracy in the lineshape function and accuracy in the approximate eigenvalues. Most of them, in fact, cannot be simply associated on a one-to-one basis with particular exact eigenvalues. Even when this is possible, the error in the approximate eigenvalues is far greater than the accuracy
Figure 3. Distribution of the exact (dots) and approximate (crosses) eigenvalues relative to the ESR spectrum displayed in Figure 2.
for the full spectrum (with the only exception being the eigenvalue having the greatest imaginary part). We find, therefore, that the Lanczos algorithm generates continued fractions which tend to optimize the overall shape of the spectrum rather than sets of eigenvalues. While at first such a statement might appear contradictory, it is based on the fact that the spectral density is usually dominated by the eigenvalues of small real part, and the Lanczos algorithm is able to approximate them individually, or to provide an "average" of a cluster of eigenvalues sufficient to represent the spectral density. A qualitative justification of this behavior is implicit in the so-called method of moments [23], since the subspaces generated by the Lanczos algorithm tend to approximate the overall behavior of Γ by reproducing its first moments with respect to the starting vectors. We mention that in the types of problems we have been discussing, there have almost never been significant effects due to round-off error, which, however, is the main weakness of the Lanczos algorithm in the calculation of eigenvalues. Qualitatively, this can be related to the fact that the spectral density converges much sooner than the corresponding eigenvalues, so that the loss of orthogonality is not yet significant. It is known, in fact, that spurious
eigenvalues appear after a large enough number of Lanczos steps have been calculated in order to obtain eigenvalues which are very close to their exact values [32]. As noted above, this implies many more steps than are needed for the convergence of the spectral densities. On the other hand, only a theory, which is still lacking, that is specifically designed to analyze the convergence of the spectral densities could give a quantitative estimate of the effects of the round-off error. The last part of this chapter will be devoted to a discussion of the choice of basis functions for representing the time evolution operator. First of all, there is a truncation problem, since the functions f_j(z), which form a
complete set to represent the Hilbert space associated with the functions of the stochastic variables z, are in general infinite in number. In computer calculations one can only handle finite matrices. Therefore one must truncate the matrix M, i.e. one represents the operator Γ in a finite basis set of functions f_1, f_2, ..., f_N, under the hypothesis that the remaining functions lead to negligible contributions. The truncation of the basis set leads to incorrect features in the spectral density, while the computer calculation could become exceedingly unwieldy with too large a basis set. In the absence of theoretical estimates of the error in I(ω) caused by the truncation, the convergence must be verified directly from the computed results by comparing the I_N(ω) obtained with increasing values of N. This is in the same spirit as the analysis of convergence with respect to the number of steps of the Lanczos algorithm. Again we can use eq. (3.31) as a measure of the truncation error. Clearly, as the number of stochastic variables increases, the truncation problem becomes more difficult and more time consuming.
Usually, in a given problem, one has a choice amongst different types of basis functions. The best basis functions would be those which allow the most efficient truncation, thus yielding the smallest matrix to be handled by means of the Lanczos algorithm. There are no simple guidelines for such a choice, apart from the obvious rule that all the symmetries of the time evolution operator should be taken into account. (Other features in selecting a basis set include the desirability of maximizing the sparsity of the matrix and the ease of calculating the matrix elements.) Only by experience with each particular class of problems can one feel confident in selecting an optimal set of basis functions. In some problems the desire for a minimal basis set suggests the use of non-orthogonal functions. This is likely to happen when dealing with physical systems characterized by mean potentials which confine the stochastic variables z around some stable states or conformations. Correspondingly, small amplitude motions around the stable states (librations) and transitions amongst different states become the relevant dynamical processes. This type of problem is commonly encountered in such fields of research as chemical kinetics in condensed phases [33-35] or in the study of conformational dynamics of chain molecules or polymers [36-38].
From an analysis of the asymptotic behaviour of the solutions of the diffusion equation, it has been shown that such problems can be conveniently solved with non-orthogonal basis functions of the following type [39-42]:

(3.32)

As long as one is able to mimic efficiently, by means of the functions g_j, the fundamental processes occurring in hindered systems, very few elements of such a basis set are needed in the calculation of the spectral density. The role played by the factor P^{1/2} in eq. (3.32) should be emphasized, since it allows one to express P^{1/2}δA and P^{1/2}δB of eq. (3.2) simply as linear combinations of properly chosen functions f_j. On the other hand, with orthonormal basis functions one must expand P^{1/2} in a large set of functions, as a consequence of its sharp distribution around the stable states of the system. (Alternatively one may introduce finite difference or finite element methods to choose "localized" basis sets [43].) Of course the implementation of the Lanczos algorithm must be changed when passing from orthonormal to non-orthogonal basis functions. One may write the matrix form for the spectral density eq. (3.2) by considering the representation of the operator Γ and of the functions P^{1/2}δA and P^{1/2}δB in such a non-orthogonal basis. One quickly sees that the calculation of the spectral density is now closely related to the generalized eigenvalue problem Ax = λBx [28].
Alternatively, we can start with the recursive relations of eq. (3.7) [or eq.
(3.18)] and implement them with non-orthogonal basis functions, in order to calculate the coefficients of the tridiagonal matrix. We shall discuss in detail this second route using the recursive relation eq. (3.7), i.e. for spectral densities of eq. (3.2) characterized by A = B and a self-adjoint operator Γ. One finds that the matrix recursive relation eq. (3.12) continues to hold if the generic column matrix x_n represents the expansion coefficients of ψ_n on the non-orthogonal basis set according to eq. (3.13), and if the matrix M is implicitly defined by the following relation:

Γ f_j = Σ_k M_{kj} f_k    (3.33)
One does need to modify the normalization condition and the calculation of the diagonal coefficients, according to the following relations:

y_n^t x_n = 1    (3.34)

(3.35)

where y_n is an auxiliary column matrix calculated from the normalization matrix

S_{jk} = <f_j|f_k>    (3.36)

and the array x_n according to the equation:

y_n = S x_n    (3.37)
The operations of the standard Lanczos algorithm are easily modified to the present case [39,42]. In particular, during an iterative cycle one must store three arrays to represent the vectors x_n, x_{n-1} and y_n. By comparison with the standard Lanczos algorithm, the storage needed is then increased by one square matrix (S) and one array (y_n), while the computational effort is increased by the multiplication of a square matrix by a column matrix at each step, because of the calculation of y_n from eq. (3.37). Therefore, the use of non-orthogonal basis functions is convenient only when it allows a considerable reduction of the size of the matrices. This is generally the case in physical problems characterized by strong hindering potentials [42]. Unlike the case with orthogonal functions, the truncation of the basis to the first N elements cannot be carried out simply by neglecting the elements of M outside the first N×N block. As a matter of fact, the use of a finite basis set which defines an N-dimensional subspace E_N is equivalent to considering in eq. (3.2) the function P^{1/2}δA and the operator Γ projected onto E_N according to the following projector P_N:
P_N = Σ_{j,k=1}^{N} |f_j⟩ (S^{-1})_{jk} ⟨f_k|    (3.38)

Correspondingly, in the recursive relation eq. (3.7), one must substitute the operator Γ by its projected form P_N Γ P_N, and the matrix M is implicitly defined by the following equation:

P_N Γ f_j = Σ_k M_{kj} f_k    (3.40)

After substitution of the projection operator of eq. (3.38), one obtains:
M = S^{-1} R    (3.41)

with R constructed with the matrix elements of Γ:

R_{jk} = <f_j|Γ f_k>    (3.42)

We note that eq. (3.41) indicates that the calculation of M is more complicated than for the case of orthonormal basis sets. There are several methods for the calculation of M. First, M can be computed directly from eq. (3.41), but this would require a large amount of computation time because of the inversion of S. Secondly, as suggested by Jones and coworkers [44,45], one can implement the recursive relation (3.12) by calculating at each step the array S^{-1}Rx_n from the solution of the linear system of equations with Rx_n as known coefficients. Thirdly, in some diffusional problems, one can write Γf_j as a linear combination of basis functions by considering explicitly the operator form of Γ [39,42]. Thus the elements of M are derived by projecting out, according to eq. (3.38), only those functions which do not belong to E_N [42].
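The second route above can be sketched as follows; this is an illustrative reconstruction, with Gaussian elimination standing in for whatever factorization of S one would compute once and reuse.

```python
# Sketch of the second route (solve with S rather than inverting it): at
# each Lanczos step, M x_n = S^{-1} R x_n is obtained by solving the linear
# system S v = R x_n with R x_n as the known right-hand side.

def solve(s, b):
    """Solve S v = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] + [bi] for row, bi in zip(s, b)]    # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n + 1):
                a[i][j] -= f * a[k][j]
    v = [0.0] * n
    for i in reversed(range(n)):
        v[i] = (a[i][n] - sum(a[i][j] * v[j]
                              for j in range(i + 1, n))) / a[i][i]
    return v

def apply_m(s, r, x):
    """One application of M = S^{-1} R (eq. (3.41)) without forming S^{-1}."""
    rx = [sum(rij * xj for rij, xj in zip(row, x)) for row in r]
    return solve(s, rx)
```

Since S is fixed over all iterations, in a production code one would factorize it once and reuse the factors at every step, so the extra cost per step is a single matrix-vector product plus two triangular solves.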
SUMMARY

We have, in this review, outlined how the Lanczos algorithm is capable of playing a significant role in the calculation of spectral densities that arise in the study of molecular dynamics. This is, in part, due to its computational value and also to its close relationship to important theoretical methods in statistical physics. We have pointed out that these problems can often be represented by complex symmetric matrices, and the generalization of the Lanczos algorithm to such matrices has been generally successful. Further work is clearly needed in establishing a better understanding of how the Lanczos algorithm effectively projects out a useful representation of the spectral densities with much less effort than is required to obtain a good set of eigenvalues. As problems become more complicated, and the matrix representations become larger, there is concern for careful selection of basis vectors, including effective means of "pruning" out unnecessary basis vectors. Also, problems due to accumulated round-off can become more serious. Thus efficient techniques for partial re-orthogonalization may be called for [4]. In systems with strong trapping potentials, the use of non-orthogonal functions looks promising.
So far, however, the method has been tested only in cases where the spectral densities can still be readily calculated with the use of standard orthonormal basis functions [39,42]. The application to challenging problems dependent on several degrees of freedom requires a search for optimal basis functions; otherwise, the approach is straightforward to apply. Although we have emphasized in this work the calculation of spectral densities, what we have said here generalizes very nicely to the analysis of time domain experiments on molecular dynamics. In the context of linear response theory, the Fourier transform of the spectral density (or frequency spectrum) is just the time correlation function (or time domain response). Thus, many modern time-domain experiments may be described by first calculating the spectral density by the above methods and then using FFT routines to obtain the associated correlation functions. This equivalence again emphasizes the role of the Lanczos algorithm in selecting out and approximately representing the eigenvalues of small real part, i.e. the slowly decaying components, which usually dominate the time-domain experiments. There is, however, a special case of time domain experiments: viz. the spin echo (and its optical analogues). These experiments may be thought of, to a first approximation, as canceling out the effects of the imaginary parts of the eigen-
v a l u e s t h a t c o n t r i b u t e t o t h e t i m e domain response and t h e r e b y t o p r o v i d e g r e a t s e n s i t i v i t y i n t h e experiment t o t h e r e a l p a r t s . A n a l y s i s o f spin-echo s p e c t r o scopy by t h e Lanczos a l g o r i t h m has met w i t h some success [46,47], because, as n o t e d above, t h e e i g e n v a l u e s o f small r e a l p a r t t e n d t o dominate t h e experimenta l o b s e r v a t i o n s , and t h e y are t h e ones t h a t are a t l e a s t r o u g h l y approximated by t h e Lanczos a l g o r i t h m . However, f u r t h e r c o m p u t a t i o n a l developments along t h i s l i n e would have t o address how t o o b t a i n b e t t e r e s t i m a t e s o f t h e s e " s m a l l e r " e i g e n v a l u e s by improvements on t h e b a s i c Lanczos t e c h n i q u e . ACKNOWLEDGMENTS T h i s work was supported by NSF S o l i d S t a t e Chemistry Grant DMR 81-02047 and NSF Grant CHE 8319826 (JHF) and by t h e I t a l i a n Research C o u n c i l (CNR) t h r o u g h i t s Centro S t u d i S u g l i S t a t i M o l e c o l a r i R a d i c a l i c i ed E c c i t a t i (GM). REFERENCES
1. R. Kubo, J. Phys. Soc. Japan 12 (1957) 570.
2. C. Lanczos, J. Res. Natn. Bur. Stand. 45 (1950) 255; 49 (1952) 33.
3. B.N. Parlett, The Symmetric Eigenvalue Problem (Prentice-Hall, Englewood Cliffs, N.J., 1980).
4. J. Cullum and R. Willoughby, Lanczos Algorithms for Large Symmetric Eigenvalue Computations, Vol. I: Theory (Birkhäuser, Basel, 1985).
5. G. Moro and J.H. Freed, J. Phys. Chem. 84 (1980) 2837.
6. G. Moro and J.H. Freed, J. Chem. Phys. 74 (1981) 3757.
7. G. Moro and J.H. Freed, J. Chem. Phys. 75 (1981) 3157.
8. H. Mori, Prog. Theor. Phys. 34 (1965) 399.
9. J.T. Hynes and J.M. Deutch, in: Physical Chemistry, Vol. 11B, eds. H. Eyring, D. Henderson and W. Jost (Academic Press, New York, 1975).
10. L.E. Reichl, A Modern Course in Statistical Physics (U. of Texas Press, Austin, 1980).
11. B.J. Berne, in: Physical Chemistry, Vol. 8B, eds. H. Eyring, D. Henderson and W. Jost (Academic Press, New York, 1975).
12. J.P. Ryckaert and A. Bellemans, Chem. Phys. Lett. 30 (1975) 123.
13. L.P. Hwang and J.H. Freed, J. Chem. Phys. 63 (1975) 118.
14. G. Moro and P.L. Nordio, J. Phys. Chem. 89 (1985) 597.
15. M. Fixman and K. Rider, J. Chem. Phys. 51 (1969) 2425.
16. A.E. Stillman and J.H. Freed, J. Chem. Phys. 72 (1980) 550.
17. J.H. Freed, in: Stochastic Processes, Formalism and Applications, eds. G.S. Agarwal and S. Dattagupta (Springer, Berlin, 1983), p. 220.
18. L. Van Hove, Phys. Rev. 95 (1954) 249.
19. R. Kubo, Adv. Chem. Phys. 16 (1969) 101.
20. J.H. Freed, in: Electron Spin Relaxation in Liquids, eds. L.T. Muus and P.W. Atkins (Plenum, New York, 1972), p. 387.
21. E. Meirovitch, D. Igner, E. Igner, G. Moro and J.H. Freed, J. Chem. Phys. 77 (1982) 3915.
22. C.F. Polnaszek, G.V. Bruno and J.H. Freed, J. Chem. Phys. 58 (1973) 3189.
23. Yu.V. Vorobyev, Method of Moments in Applied Mathematics (Gordon and Breach, New York, 1965).
24. H.S. Wall, Analytic Theory of Continued Fractions (Van Nostrand, Princeton, 1948).
25. G.H. Golub and R. Underwood, in: Mathematical Software III (Academic Press, New York, 1977), p. 361.
26. G.P. Zientara and J.H. Freed, in: Proceedings of the Ninth International Conference on Liquid Crystals (Bangalore, India, 1982).
27. L. Fox, Introduction to Numerical Linear Algebra (Oxford University, New York, 1965).
28. J. Wilkinson, The Algebraic Eigenvalue Problem (Oxford, London, 1965).
29. R.G. Gordon and T. Messenger, in: Electron Spin Relaxation in Liquids, eds. L.T. Muus and P.W. Atkins (Plenum, New York, 1972), p. 219.
30. W.A. Wassam, J. Chem. Phys. 82 (1985) 3371, 2286.
31. C.C. Paige, J. Inst. Maths. Appl. 10 (1972) 373; 18 (1976) 341.
32. W. Kahan and B.N. Parlett, in: Sparse Matrix Computations, eds. J. Bunch and D. Rose (Academic Press, New York, 1976), p. 131.
33. H.A. Kramers, Physica 7 (1940) 284.
34. J.L. Skinner and P.G. Wolynes, J. Chem. Phys. 69 (1978) 2143.
35. D.G. Truhlar, W.L. Hase and J.T. Hynes, J. Phys. Chem. 87 (1983) 2664.
36. D. Chandler, J. Chem. Phys. 68 (1978) 2959.
37. M. Fixman, J. Chem. Phys. 69 (1978) 1527; 1538.
38. E. Helfand and J. Skolnick, J. Chem. Phys. 77 (1982) 5714.
39. G. Moro and P.L. Nordio, Chem. Phys. Lett. 96 (1983) 192.
40. G. Moro and P.L. Nordio, Mol. Phys. 56 (1985), in press.
41. G. Moro and P.L. Nordio, Mol. Phys., in press.
42. G. Moro, submitted to Chem. Phys.
43. A.E. Stillman, G.P. Zientara and J.H. Freed, J. Chem. Phys. 71 (1979) 113.
44. R. Jones and T. King, Phil. Mag. B47 (1983) 481.
45. R. Jones, in: The Recursion Method and Its Applications, eds. D.G. Pettifor and D.L. Weaire (Springer, Berlin, 1985), p. 132.
46. L.J. Schwartz, A.E. Stillman and J.H. Freed, J. Chem. Phys. 77 (1982) 5410.
47. G.L. Millhauser and J.H. Freed, J. Chem. Phys. 81 (1984) 37.
Large Scale Eigenvalue Problems, J. Cullum and R.A. Willoughby (Editors), © Elsevier Science Publishers B.V. (North-Holland), 1986
INVESTIGATION OF NUCLEAR DYNAMICS IN MOLECULES BY MEANS OF THE LANCZOS ALGORITHM

Erwin Haller
IBM Program Product Development Center, Sindelfingen, Germany

Horst Köppel
Theoretische Chemie, University of Heidelberg, Heidelberg, Germany
The study of multi-mode nuclear dynamics on the coupled potential energy surfaces of molecules has become a challenging theoretical and computational task. In this contribution we report on our main results in this field with particular emphasis on the computational problems encountered and on the role played by the Lanczos Algorithm. The computation of the spectral intensity distribution amounts to the diagonalization of real symmetric matrices which are very large but sparse. A straightforward modification of the standard Lanczos Algorithm allows the diagonalization of complex symmetric matrices. Within our model this is used to investigate the mixing of the electronic species in the vibronic eigenstates.
INTRODUCTION

The investigation of the vibrational structure in the electronic spectrum of a molecule allows for a very detailed insight into its configuration and the underlying dynamic processes. Especially the interaction of electronic and vibrational motion in polyatomic molecules, commonly termed "vibronic interaction", often yields pronounced characteristic structures in the corresponding spectra [1]. In this contribution we report on our main results in the field of nuclear dynamics in polyatomic molecules with particular emphasis on the computational problems encountered. In the first section a brief account is given of some basic questions associated with molecular spectroscopy. Next, a model will be presented which allows for a theoretical investigation of vibronic interactions in molecular spectra. The transformation of the physical problem to a numerical one is discussed in the following section. The Lanczos Algorithm is shown to be ideally suited to diagonalize the model Hamiltonian, which is represented as a large sparse matrix. Finally, several illustrative results of our calculations will be presented for specific molecules.
PHYSICAL BACKGROUND
The main constituents of the internal energy of a molecule stem from the electronic motion and from the nuclear motion. The latter is conveniently separated into vibrational and rotational motion. These contributions differ significantly in magnitude: typical electronic energies lie in the region of the visible and the ultraviolet radiation (1 ... 10 eV). Vibrational excitations require energies of the order of a tenth of an electron volt, which corresponds to the infrared region of the electromagnetic spectrum. Thus, vibrational transitions are one to two orders of magnitude below the energies of electronic transitions. Another two orders of magnitude below, in the microwave region, excitations of the rotational degrees of freedom are observed.
Guided by this fact it is obvious that one aims at a separate treatment of the effects associated with the electronic, the vibrational and the rotational motion. Indeed, this so-called adiabatic concept found wide application and success in the study of diatomic molecules [2]. Spectra of diatomic molecules are usually very easy to survey: electronic bands are well separated from each other and show a very regular vibrational fine structure. The latter emerges from different vibrational states of the electronic states involved. Finally, each vibrational band shows a sub-structure due to rotational excitations. This approach is appealing in so far as it allows for a concise interpretation of molecular spectra in terms of individual quantum numbers for electronic, vibrational and rotational states [2]. The situation changes dramatically when we study polyatomic molecules. The fact that we are dealing here with several vibrational degrees of freedom implies not only a quantitative change but rather a qualitative one. Since now the electronic levels can vary as a function of several nuclear coordinates, the probability that two adjacent electronic surfaces come very close or even coincide with each other becomes very high. Thus electronic energy differences become comparable with the vibrational ones, the nuclei cease to be confined to a single potential energy surface and it is no longer justified to treat the electronic motion and the nuclear motion separately. The combined action of electronic and vibrational phenomena is commonly termed "vibronic coupling", the "jumping" of the nuclei between different potential
energy surfaces is called a "nonadiabatic" effect [1,3]. In addition, the various vibrational degrees of freedom generally cannot be treated separately. This is known as multimode coupling and is observed in many polyatomic molecules [4]. Consequently, the complexity of the theoretical description of the spectra of polyatomic molecules is considerably higher compared to diatomic molecules. Particularly, it will be seen that the vibronic lines of the spectrum of a polyatomic molecule often appear completely erratic, if not to say chaotic. Generally, no simple rules can be found which determine the positions of the spectral lines and their intensities. Vibronic effects are not only of interest in the various branches of spectroscopy such as photoelectron spectroscopy, absorption spectroscopy, emission spectroscopy or predissociation. Rather, they can play an important role also in chemical processes, in collisions involving molecules, in resonance Raman scattering and in decay processes such as radiative or nonradiative decays of excited states of molecules. Here, however, we shall concentrate on absorption and photoelectron spectroscopy and touch briefly upon decay processes.
THE THEORETICAL MODEL

The Hamiltonian H is the operator of the energy in a quantum system. In the case of molecules it is conveniently divided into three parts [5]:

    H = T_N + T_e + U(q,Q) .    (1)

T_N and T_e are the kinetic energies of the nuclei and the electrons, respectively. The total potential energy U(q,Q) comprises the mutual repulsion of the electrons, the mutual repulsion of the nuclei and the attracting potential between electrons and nuclei. The coordinates of the electrons and the nuclei are represented by q and Q, respectively. The eigenvalues {E_m} and the corresponding eigenfunctions {Ψ_m} are the solutions of the Schrödinger equation

    H Ψ_m = E_m Ψ_m .    (2)

Since the solution is difficult to obtain directly, it has become customary [5] to write the full molecular wavefunction Ψ_m as

    Ψ_m(q,Q) = Σ_i Φ_i(q;Q) χ_i^(m)(Q) ,    (3)
where the electronic wavefunctions Φ_i are solutions of the molecular Hamiltonian (1) for fixed nuclear geometry (i.e., discarding the nuclear kinetic energy T_N). The eigenproblem (2) is then reduced, for given wavefunctions Φ_i, to a set of coupled differential equations for the vibrational wavefunctions χ_i^(m) (rotational motion will be ignored in the following). We mention that the adiabatic approximation, where the nuclear motion is governed by a single, well-defined potential energy surface, amounts to retaining a single term in the sum of eq. (3) [3,5].

We seek a simple model Hamiltonian describing the vibrational motion [4]. To that end we use so-called diabatic electronic wavefunctions Φ_i [6,7]. These are characterized by the requirement that they vary smoothly with the nuclear coordinates even in regions of strong nonadiabatic effects. Due to this smoothness we can assume a very simple expansion of the matrix elements of H:

    H_ij = (T_N + V_0) δ_ij + (ΔV)_ij .    (4)

For the first two diagonal terms we adopt the harmonic approximation

    V_0 = (1/2) Σ_s ω_s Q_s²

and the third term is expanded up to first order in the nuclear displacements,

    (ΔV)_ii = E_i + Σ_s κ_s^(i) Q_s .

E_i denotes the vertical transition energy of the electronic state labelled "i". The non-diagonal matrix elements of H which accomplish the coupling between the diabatic states are also expanded up to first order in the nuclear coordinates:

    (ΔV)_ij = Σ_s λ_s^(ij) Q_s    (i ≠ j).

Together, these expansions constitute the vibronic coupling Hamiltonian

    H_ij = (T_N + V_0 + E_i + Σ_s κ_s^(i) Q_s) δ_ij + (1 - δ_ij) Σ_s λ_s^(ij) Q_s .    (5)

Group theoretical considerations show that only molecular vibrations of certain symmetry types can couple two given electronic states:

    Γ_s ⊃ Γ_i × Γ_i  (for κ_s^(i)),    Γ_s ⊃ Γ_i × Γ_j  (for λ_s^(ij)).    (6)
Γ stands here for the irreducible representation of the vibrational modes and of the diabatic states in the symmetry group of a particular molecule. Due to these symmetry restrictions several terms in this formal expansion vanish. For instance, only totally symmetric vibrations will appear in the diagonal of the matrix Hamiltonian. When the interacting electronic states have different symmetries (and are nondegenerate), only non-totally symmetric vibrations will appear in the non-diagonal elements of H. Two remarks on the adiabatic potential energy surfaces of this model should be added here. By definition, these are the eigenvalues of the static terms of the Hamiltonian (V_0 δ_ij + (ΔV)_ij). (A) Since we have not constructed our diabatic basis explicitly, we must determine the parameters of the model Hamiltonian a posteriori. This is done by adjusting the potential energy surfaces of our model to ab initio calculated potential energy surfaces of realistic molecules. Subsequently, the parameter values derived are usually readjusted within the error range of the ab initio calculation in order to improve the agreement with experiment [4].
(B) In the case of two electronic states, for example, degeneracies of the adiabatic potential energy surfaces emerge when the two diagonal elements become equal and the non-diagonal element vanishes. These two conditions, in general, define a subspace of dimension N-2 for N nuclear degrees of freedom and will thus, in general, not be fulfilled in diatomic molecules (N = 1). Since the degeneracy is lifted in first order with the distance from the subspace, the two surfaces form there a multi-dimensional double cone. Therefore this topology is commonly termed "conical intersection" [8,9]. Near conical intersections the nonadiabatic effects are known to be very strong [4].
As stated above we are mainly interested in the vibronic effects in molecular spectra. Given the complete set of eigenvalues {E_m} and eigenvectors {Ψ_m} of the Hamiltonian, the spectral distribution is calculated according to Fermi's Golden Rule [5]:

    P(E) = 2π Σ_m |⟨Ψ_m| T |Ψ_0⟩|² δ(E - E_m) .    (7)
One gets a sequence of δ-peaks positioned at the eigenvalues and weighted with the squared first components of the corresponding eigenvectors (see below). τ_i is the matrix element of the transition operator T between diabatic wavefunctions and may be termed transition amplitude. It will be taken as some arbitrary constant in the following [4]. In experimental spectra one, of course, never finds δ-shaped peaks. Among the various mechanisms for line broadening we mention the natural line width due to spontaneous emission, limited experimental resolution and Doppler broadening [2]. To account for these effects we convolute our line spectrum (7) with Lorentzians of suitable width γ:

    δ(E - E_m)  →  L_γ(E - E_m) = (1/π) (γ/2) / [ (E - E_m)² + (γ/2)² ] .    (8)
So far we have discussed particularly the physical aspects of the model used. The next step to be done now is the diagonalization of our model Hamiltonian. In spite of the simplicity of the model, analytical solutions of the Schrödinger equation can be found for very few special cases only. Likewise, quantum mechanical perturbation theory fails severely in most cases. The general vibronic coupling problem, therefore, requires numerical methods for its solution.
THE NUMERICAL PROBLEM
We seek the vibrational functions χ_i^(m) as linear combinations of products of harmonic oscillator wavefunctions, one for each vibrational mode considered. The oscillator wavefunctions are chosen to be eigenfunctions of a "reference" state described by the Hamiltonian H_0 = T_N + V_0 [4]. By the variational principle [5], the solution of the Schrödinger equation (2) is then converted to the diagonalization of an infinite dimensional secular matrix for the expansion coefficients. This secular matrix must be truncated to a finite dimension by introducing maximal occupation numbers for the vibrational modes. Furthermore, there are often vibronic symmetries present which permit a rearrangement of the matrix to block-diagonal form. Figure 1 shows the typical structure of such a sub-block. Here, two electronic states and two vibrational modes are considered, one mode entering only in the diagonal elements of H, the other only in the off-diagonal element. The points indicate in which way the matrix would have to be augmented if the number of basis functions for each mode were to be increased.
[Figure 1: Structure of the secular matrix]
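For orientation, the size and sparsity of such a truncated product basis can be tallied directly: the matrix order is the product of the per-mode basis sizes, and each row carries only a handful of non-zero couplings. The sketch below uses hypothetical truncation numbers (N_i = 10 basis functions for each of L = 3 modes):

```python
import numpy as np

# hypothetical truncation: L = 3 modes, N_i = 10 basis functions each
N_i = [10, 10, 10]
L = len(N_i)
N = int(np.prod(N_i))       # order of the secular matrix
M = N // max(N_i)           # approximate band-width
nnz = 2 * N * L             # rough count of non-zero elements
fill = nnz / N**2           # fraction of nonvanishing elements

print(N, M, nnz, fill)
```

Already at this modest truncation the fill fraction is well below one percent, which is what makes iterative methods attractive here.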
In general, retaining N_i basis functions for the i-th mode (1 ≤ i ≤ L), one gets a banded matrix of order N = Π_i N_i with a band-width M = N/N_i(max) and roughly 2NL non-zero elements [4]. In typical applications N_i ≈ 10, so that already for three modes the dimension N is in the range 10³ to 10⁴ while the percentage of nonvanishing elements is below 1%. These numbers make it clear that the Lanczos algorithm is a nearly ideal tool to accomplish the diagonalization of the secular matrix [10] (also note the regular pattern of the matrix elements in Fig. 1, which permits the design of efficient matrix-vector multiplication routines required for the Lanczos iteration). Nevertheless, due to the banded structure of the secular matrix, standard methods like Jacobi rotations can also be carried rather far and permit a check of the Lanczos results in a few relevant cases.

We end this section with a remark on our choice of the starting vector p_1 in the Lanczos algorithm. Whereas in principle this choice is rather arbitrary, in our case it is fixed by the following reasoning. With the arrangement of the secular matrix as in Fig. 1, the spectral intensities in eq. (7) are, apart from the irrelevant overall constant τ, given by the first components of the eigenvectors. By simple reasoning one can show that the choice

    p_1 = (1, 0, 0, ..., 0)^T

guarantees that the first elements of the eigenvectors of the tridiagonal matrix T_m generated by the Lanczos algorithm are the same as those of the original matrix. Thus, it is not necessary to store all the Lanczos vectors and it suffices to compute only the first components of the eigenvectors of T_m.
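The scheme just described can be sketched compactly. The fragment below is our own illustration, with a small random symmetric matrix standing in for the secular matrix: it runs the Lanczos recursion from p_1 = e_1 and reads the stick intensities off the first components of the eigenvectors of T_m. For robustness at this toy size it stores the Lanczos vectors and fully reorthogonalizes, which the large-scale calculations described here deliberately avoid:

```python
import numpy as np

def lanczos_first_components(H, m):
    """Lanczos started from p_1 = e_1; returns the Ritz values and the
    squared first components of the eigenvectors of T_m.

    Full reorthogonalization against the stored Lanczos vectors is a
    simplification for this small demonstration only.
    """
    n = H.shape[0]
    Q = np.zeros((n, m))
    Q[0, 0] = 1.0                      # starting vector e_1
    alphas = np.zeros(m)
    betas = np.zeros(m - 1)
    for j in range(m):
        w = H @ Q[:, j]
        alphas[j] = Q[:, j] @ w
        # project out all previous Lanczos vectors (removes the alpha and
        # beta components and any accumulated roundoff)
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
        if j < m - 1:
            betas[j] = np.linalg.norm(w)
            Q[:, j + 1] = w / betas[j]
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    theta, S = np.linalg.eigh(T)
    return theta, S[0, :] ** 2         # only first components are needed

# stand-in secular matrix: small random symmetric (illustration only)
rng = np.random.default_rng(1)
A = rng.standard_normal((60, 60)); A = (A + A.T) / 2
theta, weights = lanczos_first_components(A, 60)

# with m = n the Ritz values match the exact eigenvalues, and the weights
# equal the squared first components of the eigenvectors of A itself
E, V = np.linalg.eigh(A)
print(np.max(np.abs(theta - E)), np.max(np.abs(weights - V[0, :] ** 2)))
```

Because the weights are the squared entries of one row of an orthogonal matrix, they sum to one; the overall transition-amplitude constant is reinstated only at plotting time.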
RESULTS AND DISCUSSION

We now present and discuss some selected results obtained with the Lanczos algorithm. Fig. 2 shows the second band in the photoelectron (PE) spectrum of ethylene as found experimentally (Fig. 2a) and according to the theoretical calculation (Fig. 2b). In drawing Fig. 2b the vibronic coupling Hamiltonian (5) with four vibrational modes has been employed [11]. Treating one mode in the convolution approximation [4,11] leads to a secular matrix of order 3600 on which 3600 Lanczos iteration steps have been performed. The envelope of the calculated line spectrum is seen to be in fair agreement with experiment. The reliability of the underlying line structure is further corroborated by ab initio data for the parameters appearing in eq. (5) which agree with those used for Fig. 2b within their error limits.

[Figure 2: Second photoelectron band of ethylene; panels: (a) experiment, (b) theory, (c) Condon approximation; abscissa: ionization energy in eV]

To appreciate the physical significance of the calculated spectrum we show in Fig. 2c the result of another calculation with the same parameters but where the nuclear motion is artificially confined to the upper potential energy surface, i.e. the nonadiabatic effects are suppressed in the calculation [11]. This spectrum exhibits regular vibrational progressions, as is familiar from spectra of diatomic molecules, but is seen to bear little resemblance to either the full calculation or the experiment. Even the very number of spectral lines comes out much too low, by roughly two orders of magnitude. These
strong
nonadiabatic e f f e c t s , o f
which t h e comparison between F i g s . 2b and c g i v e s evidence, can be t r a c e d t o t h e occurrence o f a c o n i c a l i n t e r s e c t i o n between a d i a b a t i c p o t e n t i a l energy s u r f a c e s . They a r e a genuine multi-mode e f f e c t and v a n i s h i n a single-mode d e s c r i p t i o n o f t h e problem [ll]. I n t h e p r e s e n t example i t p r o v e s p o s s i b l e , though w i t h c o n s i d e r a b l e e f f o r t , t o p e r f o r m a f u l l m a t r i x d i a g o n a l i s a t i o n w i t h t h e method o f J a c o b i r o t a t i o n s and t h u s check t h e Lanczos r e s u l t . The outcome o f t h i s check i s v e r y g r a t i f y i n g , indeed. The s p e c t r a l envelope and p r a c t i c a l l y t h e whole l i n e s t r u c t u r e ( e x c e p t f o r some t i n y l i n e s i n t h e h i g h energy p a r t o f t h e spectrum) c o i n c i d e t o w i t h i n drawing accuracy i n b o t h c a l c u l a t i o n s . A more q u a n t i t a t i v e comparison i s g i v e n i n t h e appendix. While f o r t h e envelope t h e good convergence o f t h e Lanczos scheme c o u l d be expected because o f t h e c l o s e r e l a t i o n between t h e Lanczos procedure and t h e method o f moments, t h e e q u a l l y good convergence on t h e l i n e s t r u c t u r e seems e s p e c i a l l y noteworthy. On t h e o t h e r hand, t h e computing e f f o r t i n c r e a s e s s t r o n g l y f o r t h e method o f Jacobi r o t a t i o n s : b o t h t h e r e q u i r e d s t o r a g e space and t h e CPU t i m e go up by almost two o r d e r s o f magnitude. These a r e t y p i c a l numbers f o r multi-mode v i b r o n i c problems and i l l u s t r a t e t h e power o f t h e Lanczos method f o r o u r purposes. As a s i m i l a r example we p r e s e n t i n F i g . 3 e x p e r i m e n t a l and t h e o r e t i c a l r e s u l t s on t h e v i s i b l e a b s o r p t i o n spectrum o f NO2 [12].
I n a f i r s t approximation t h i s
spectrum can be taken as t h e sum o f two d i f f e r e n t e l e c t r o n i c t r a n s i t i o n s , each o f which l e a d s t o a t w o - s t a t e three-mode v i b r o n i c c o u p l i n g problem. To a v o i d t r u n c a t i o n e r r o r s we have, however, t o i n c l u d e more b a s i s f u n c t i o n s t h a n i n e t h y l e n e and a r r i v e a t m a t r i x dimensions o f 18630 ( F i g . 2 b ) and 24000 ( F i g . 2 c ) T h i s makes t h e Lanczos a l g o r i t h m an i n d i s p e n s a b l e t o o l f o r d i a g o n a l i z a t i o n . P e r f o r m i n g 10000 v i z . 3000 i t e r a t i o n s t e p s one o b t a i n s t h e s p e c t r a d i s p l a y e d i n F i g . 2b and 2c, r e s p e c t i v e l y [12].
The e x p e r i m e n t a l r e c o r d i n g o f F i g . 2a
i s t h e weighted sum o f t h e two p a r t i a l s p e c t r a c o r r e s p o n d i n g t o F i g s . 2b and c w i t h some unknown w e i g h t c o e f f i c i e n t s . One sees t h a t t h e c o m p l e x i t y o f t h e v i s i b l e a b s o r p t i o n spectrum o f N02, f o r which i t has become famous among s c i e n t i s t s [13],
i s caused by t h e t r a n s i t i o n d i s p l a y e d i n F i g . 2b, t h e s o - c a l l e d ZB2-2A1
Investigatiori of Nuclear Dynamics in Molecules
b
1
'
"
'
'
~
"
~
"
'
~
'
~
~
'
"
1
"
'
'
c
I
Figure 3
The v i s i b l e a b s o r p t i o n spectrum o f NO2
'
'
173
114
E. Haller and H. Kappel
t r a n s i t i o n , w h i l e t h e s o - c a l l e d 2 B 1 - Z A 1 t r a n s i t i o n o f F i g . 2c has a r a t h e r regular v i b r o n i c structure. This d i f f e r e n c e i s p r e c i s e l y i n l i n e w i t h the e n e r g e t i c p o s i t i o n of a c o n i c a l i n t e r s e c t i o n which i s l o w - l y i n g i n t h e B2 v i b r o n i c m a n i f o l d b u t h i g h - l y i n g i n t h e B1 v i b r o n i c m a n i f o l d and l e a d s t o strong non-adiabatic effects
( o n l y ) i n t h e B2 v i b r o n i c species, c o r r e s p o n d i n g
t o F i g . 2b [12]. Another i n s t r u c t i v e example i s p r o v i d e d by t h e f i r s t PE band o f BF3 [14]. The 1A;
ground s t a t e of BF;
i n t e r a c t s w i t h t h e second e x c i t e d 3E’ e l e c t r o n i c
s t a t e through two degenerate v i b r a t i o n a l modes w h i l e t o t a l l y symmetric modes p r o v e t o be n e g l i g i b l e f o r t h e c o u p l i n g mechanism. T h i s i n t e r a c t i o n l e a d s t o a s t r o n g v i b r a t i o n a l e x c i t a t i o n i n t h e f i r s t PE band as shown i n t h e l e f t panel o f F i g . 4. For comparison, we d i s p l a y i n t h e r i g h t p a r t o f t h e f i g u r e t h e s p e c t r a where o n l y one o f t h e two v i b r a t i o n a l modes ("3 o r "4)
i s retained.
A p p a r e n t l y , t h e f u l l spectrum i s v e r y f a r from b e i n g t h e c o n v o l u t i o n o f t h e two single-mode s p e c t r a which shows t h a t t h e s t r o n g v i b r a t i o n a l e x c i t a t i o n c h a r a c t e r i s t i c o f t h e two-mode c a l c u l a t i o n i s a mode-mixing e f f e c t . I t can be t r a c e d t o t h e shape o f t h e a d i a b a t i c p o t e n t i a l energy surfaces [14]. I n drawing F i g . 4 ( l e f t p a n e l ) 500 Lanczos i t e r a t i o n s t e p s have been performed on a s e c u l a r m a t r i x o f o r d e r 40800. The convergence has been a s c e r t a i n e d by v e r i f y i n g t h a t t h e spectrum i s v i r t u a l l y t h e same as t h a t o b t a i n e d a f t e r 400 Lanczos i t e r a t i o n steps ( e x c e p t again f o r a few m i n o r l i n e s i n t h e h i g h energy p a r t o f t h e spectrum). T h i s i n d i c a t e s a n o t h e r a s p e c t i n which t h e Lanczos procedure i s i d e a l l y s u i t e d f o r o u r purposes: t h e s i m p l e r t h e v i b r o n i c l i n e s t r u c t u r e t h e s m a l l e r i s t h e number o f Lanczos s t e p s r e q u i r e d t o converge on i t To g a i n f u r t h e r i n s i g h t i n t o t h e s t r e n g t h o f t h e n o n a d i a b a t i c e f f e c t s we a l s o c a l c u l a t e t h e quenching o r w e i g h t c o e f f i c i e n t s qm. These a r e d e f i n e d as t h e percentage c h a r a c t e r of t h e upper e l e c t r o n i c s t a t e , termed No. 2, i n t h e v i b r o n i c e i g e n s t a t e s (we c o n f i n e o u r s e l v e s here t o two i n t e r a c t i n g e l e c t r o n i c s t a t e s ) [4].
Using the notation of eq. (3) we can write
Apart from their theoretical interest, the quenching coefficients are also related to the radiative decay of the vibronic states, at least in a well defined limiting case [4].
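The convergence check described above, comparing the Lanczos spectrum after different numbers of iteration steps, can be sketched as follows. This is a minimal numpy illustration on an invented stand-in for a secular matrix (its size, couplings and step counts are hypothetical), not the code actually used for Fig. 4.

```python
import numpy as np

def lanczos_spectrum(A, v0, steps):
    """Lanczos tridiagonalization of the symmetric matrix A, started from the
    state v0 (full reorthogonalization, affordable at this toy size).
    Returns line positions (Ritz values) and intensities |<v0|Ritz_i>|^2 of
    the approximate spectral distribution of v0."""
    n = A.shape[0]
    Q = np.zeros((n, steps))
    alpha = np.zeros(steps)
    beta = np.zeros(steps - 1)
    q = v0 / np.linalg.norm(v0)
    for j in range(steps):
        Q[:, j] = q
        w = A @ q
        alpha[j] = q @ w
        # orthogonalize against all previous Lanczos vectors
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
        if j < steps - 1:
            beta[j] = np.linalg.norm(w)
            q = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    energies, S = np.linalg.eigh(T)
    return energies, S[0, :] ** 2    # overlap of v0 with each Ritz state

# Invented model: increasing diagonal energies with a weak coupling.
rng = np.random.default_rng(1)
n = 200
A = np.diag(np.linspace(0.0, 10.0, n))
A += 0.1 * (np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1))
v0 = rng.standard_normal(n)

E1, I1 = lanczos_spectrum(A, v0, 60)   # analogous to the 400-step run
E2, I2 = lanczos_spectrum(A, v0, 90)   # analogous to the 500-step run
# The strong low-energy lines should already agree between the two runs.
```

Comparing the two runs line by line mirrors the convergence criterion used for Fig. 4: when the spectrum no longer changes with additional steps, it is taken as converged.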
Investigation of Nuclear Dynamics in Molecules
Figure 4. First photoelectron band of BF3 (intensity versus energy in eV; left panel: exact calculation).
E. Haller and H. Koppel
To obtain the qm without calculating the full eigenvectors we use the following simple trick. We augment the vibronic Hamiltonian H by the so-called radiative damping matrix Γ, taken here as -iγ times the projector onto electronic state No. 2. If the (real) quantity γ is much smaller than the nearest-neighbor spacings of the original Hamiltonian H, the real parts of the eigenvalues will not be affected by the perturbation Γ, and the imaginary parts γm will, from perturbation-theoretic arguments, be given by

    γm = qm γ

The qm can thus be obtained from the eigenvalues of the complex symmetric Hamiltonian H + Γ for sufficiently small values of the quantity γ.
We have generalized the Lanczos algorithm for complex symmetric matrices [10], as described also in other contributions to this volume, and have applied it to the examples of ethylene and NO2 discussed above. With similar numbers of matrix dimensions and Lanczos iteration steps as before we have found 147 and 175 converged quenching coefficients, respectively [4,12].
Figure 5 displays the results as histograms, i.e. plots the number of vibronic states with quenching constants in a given interval. In the absence of vibronic coupling these numbers would be either zero or one, corresponding to vibrational levels of the lower or upper electronic states. We see from Fig. 5 that the vibronic coupling leads to a strong mixture of the electronic species in the vibronic states. Especially in the ethylene cation the distribution is very narrow and the mixing almost complete. In the example of NO2, on the other hand, the distribution rather shows a bimodal behavior and a preferential electronic character can still be identified. This shows more clearly than the spectra the different strength of the nonadiabatic effects in the two examples. Concerning the radiative decay, the departure of the qm from unity leads to anomalously long radiative decay times [4,12] which have been well documented experimentally for NO2 [13].
CONCLUSIONS

In this contribution, we have presented some illustrative examples of vibronic coupling systems in small polyatomic molecules. A simple model Hamiltonian
Figure 5. Quenching factor histograms for C2H4+ and NO2.
has been used in the calculations which nevertheless reproduces the gross features of complex experimental spectra correctly. The examples give evidence that the usual adiabatic separation of electronic and nuclear motions, whereby the nuclei move on well defined potential energy surfaces, may fail completely: especially when different electronic surfaces intersect each other, the nuclei may interconvert freely between these surfaces. They also show that the dynamic problem is intrinsically a multi-mode problem. The ensuing enormous dimension of the secular matrix, its high degree of sparsity and the regular pattern of nonzero elements make the Lanczos algorithm a highly effective means of solving the quantum mechanical eigenvalue problem. This is further corroborated by the observation that the Lanczos iteration produces relevant information on the spectral intensity distribution far before a numerical convergence in the proper sense is achieved. In future applications one would like to treat matrices of still larger dimension (10^6, say) and, in a slightly different context, also obtain selected eigenvectors in an arbitrary range of the eigenvalue spectrum. The role of the Lanczos or related algorithms should become ever more important, and efficient implementations for the above purposes are of great interest.
ACKNOWLEDGEMENT

The authors wish to express their gratitude to L.S. Cederbaum and W. Domcke for a fruitful collaboration on the problems discussed in this article.
APPENDIX

We wish to compare two line sequences characterized by energies Ei and intensities Ii, one sequence (without superscript) being exact, the other (superscript L) being generated with the aid of the Lanczos iteration scheme. To have a meaningful correlation between the individual lines we consider only those pairs of lines where the energies differ by less than a third of the nearest neighbor distance of the exact sequence and the intensities differ by less than a factor of two:
The correspondingly restricted sum of exact intensities
is called the confidence index, since it measures that part of the spectral intensity which is covered by the comparison (both sequences of intensities are taken to be normalized to unity). Being more interested in strong than in weak lines, we introduce the weighted error
This quantity is not yet meaningful by itself but should be related to a typical line spacing such as the weighted average of spacings
(here the sum is not restricted and the prime on the summation sign has therefore been omitted). As the criterion for the quality of the Lanczos spectrum we use the weighted relative error
    E = F/D                                                          (17)
The following Table I lists the confidence index C and the weighted relative error E for the secular problem of ethylene (Fig. 2b) with a dimension of 3600 and three different numbers of Lanczos iteration steps. One sees that even after 1800 iterations the error E is quite small, but this is not very significant because only 58% of the spectral intensity has actually been compared. After 5400 iterations, on the other hand, the Lanczos spectrum can be considered as converged for our purposes. It should be stressed that these estimates are rather conservative and that from the visual impression even the spectrum after 1800 iterations looks quite satisfactory, while that for 3600 iterations looks almost indistinguishable from the exact one.
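The comparison of this Appendix can be sketched in code as follows. The pairing criterion (energies within a third of the nearest-neighbor distance, intensities within a factor of two) follows the text literally; the precise intensity-weighted forms taken here for the error F and the average spacing D are assumptions, since only their verbal description is given above.

```python
import numpy as np

def spectrum_quality(E, I, EL, IL):
    """Confidence index C and weighted relative error E = F/D, eq. (17),
    for an exact line sequence (E, I) and a Lanczos sequence (EL, IL).
    F is taken as the intensity-weighted sum of energy deviations over the
    matched pairs, D as the intensity-weighted average spacing of the exact
    sequence; both forms are assumptions made for this sketch."""
    I = I / I.sum()
    IL = IL / IL.sum()                  # both normalized to unity
    order = np.argsort(E)
    E, I = E[order], I[order]
    spac = np.diff(E)
    # nearest-neighbor distance of the exact sequence, per line
    nn = np.minimum(np.r_[spac, np.inf], np.r_[np.inf, spac])
    # pair each exact line with the closest Lanczos line
    j = np.array([np.argmin(np.abs(EL - e)) for e in E])
    close = np.abs(EL[j] - E) < nn / 3.0
    similar = (IL[j] < 2.0 * I) & (IL[j] > I / 2.0)
    ok = close & similar                # the restricted (primed) index set
    C = I[ok].sum()                     # confidence index
    F = np.sum(I[ok] * np.abs(EL[j[ok]] - E[ok]))   # weighted error (assumed)
    D = np.sum(I[:-1] * spac)           # weighted average spacing (assumed)
    return C, F / D

# tiny synthetic check: a slightly perturbed copy of the exact sequence
E = np.array([1.0, 2.0, 3.5, 5.0])
I = np.array([0.4, 0.3, 0.2, 0.1])
C, err = spectrum_quality(E, I, E + 0.01, I * 1.1)
```

Applied to two runs with different step counts, C and E quantify in one pair of numbers what the visual comparison of spectra shows qualitatively.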
Table I. Confidence index C and weighted relative error E for ethylene (dimension 3600)

    Iterations      C       E
    1800           0.58    0.020
    3600           0.92    0.0045
    5400           0.99    0.00083
REFERENCES

1) See, for example, I.B. Bersuker, The Jahn-Teller Effect and Vibronic Interactions in Modern Chemistry (Plenum Press, New York, 1984)
2) G. Herzberg, Spectra of Diatomic Molecules (Van Nostrand, New York, 1950)
3) G. Herzberg, Electronic Spectra and Electronic Structure of Polyatomic Molecules (Van Nostrand, New York, 1966)
4) H. Koppel, W. Domcke and L.S. Cederbaum, Multimode molecular dynamics beyond the Born-Oppenheimer approximation, Adv. Chem. Phys. 57 (1984) 59
5) A.S. Davydov, Quantum Mechanics (Pergamon Press, New York, 1965)
6) H.C. Longuet-Higgins, Some recent developments in the theory of molecular energy levels, Advan. Spectrosc. 2 (1961) 429
7) W. Lichten, Resonant charge exchange in atomic collisions, Phys. Rev. 131 (1963) 229
8) G. Herzberg and H.C. Longuet-Higgins, Intersection of potential energy surfaces in polyatomic molecules, Discuss. Faraday Soc. 35 (1963) 77
9) T. Carrington, The geometry of intersecting potential surfaces, Accts. Chem. Res. 7 (1974) 20
10) E. Haller, Mehrmodendynamik bei konischen Durchschneidungen von Potentialflachen (Thesis, University of Heidelberg, 1984)
11) H. Koppel, L.S. Cederbaum and W. Domcke, Strong nonadiabatic effects and conical intersections in molecular spectroscopy and unimolecular decay: C2H4+, J. Chem. Phys. 77 (1982) 2014
12) E. Haller, H. Koppel and L.S. Cederbaum, The visible absorption spectrum of NO2: a three-mode nuclear dynamics investigation, J. Mol. Spectrosc. 111 (1985) 377
13) D.K. Hsu, D.L. Monts and R.N. Zare, Spectral Atlas of Nitrogen Dioxide (Academic Press, New York, 1978)
14) E. Haller, H. Koppel, L.S. Cederbaum, W. von Niessen and G. Bieri, Multimode Jahn-Teller and pseudo Jahn-Teller effects in BF3+, J. Chem. Phys. 78 (1983) 1359
Large Scale Eigenvalue Problems
J. Cullum and R.A. Willoughby (Editors)
© Elsevier Science Publishers B.V. (North-Holland), 1986
EXAMPLES OF EIGENVALUE/VECTOR USE IN ELECTRIC POWER SYSTEM PROBLEMS

James E. Van Ness
Electrical Engineering and Computer Science Department
Northwestern University
Evanston, Illinois, U.S.A.
Major failures of electric power systems usually are the result of instabilities in the system. Small dynamic disturbances can be studied by linearizing the system equations and using eigenvalues/vectors. If the power system is represented in complete detail, the order of the resulting system will be in the tens of thousands, which cannot be handled. Present methods have proven useful and effective in solving real problems, but they have just increased the desire to study systems of higher order.
INTRODUCTION

A modern power system presents a challenge to the control engineer that arises primarily from its huge size. A complete representation of the system would lead to differential equations whose order would be measured in the tens of thousands. Even with many simplifying assumptions and combining of separate units into equivalents, the dynamic system that results will be represented by a set of differential equations of order several hundred. The basic techniques that are currently being used to study these systems have been known for years but only applied to relatively small systems. This paper describes methods of studying larger systems and gives examples of studies that have been made.

A large digital computer program called PALS [1] has been developed to apply the methods that will be described here. The program does lose some efficiency once the size of the system gets over 300th order. This still is several orders of magnitude larger than the typical system studied in a course in feedback control.
DESCRIPTION OF THE SYSTEM

A modern power system consists of many generating stations and load centers connected together by an electrical transmission system. In many studies of the dynamics of this type of system, the time constants of the generating stations are found to be the controlling ones in the overall system response. The transients involved are so slow that the transmission network can be considered to be in steady-state operation, and the transmission line transients can be ignored. Likewise, the loads are normally represented by their steady-state characteristics. This in part may be due to lack of sufficient knowledge to represent them more adequately. Thus, the first thing to be considered is the dynamic characteristics of the generating station.
Figure 1. A Typical Generating Station
A typical representation of a generating station is shown in Fig. 1. Of course, there are many variations in generating stations depending upon the type of equipment used and when they were built. This figure does show the major elements involved. The synchronous machine is represented by the block in the upper right-hand corner labeled "machine equations". In the example to be shown later in this paper, these machine equations will consist of one first-order differential equation and two algebraic equations. Again, this could change if a different representation of the synchronous machine were to be used. The voltage regulator and exciter for this synchronous machine are shown in the upper left-hand corner of Fig. 1. The input to this regulator is the terminal voltage ET, and the output is the voltage applied to the field of the machine, EF. The actual voltage regulator and exciter undoubtedly include limiters and saturation effects in the exciter which would make the overall system non-linear. For the types of studies that are described in this paper, the voltage regulator and exciter system are linearized around an operating point and represented as shown in Fig. 1.
The mechanical characteristics of the generating system are shown in the lower row of blocks. The two blocks on the right-hand side represent the rotating inertia and damping of the synchronous machine. One variable represents the variation of the actual frequency or speed of the machine from a reference frequency, and the angle δ represents the angle of the machine with respect to the other machines on the system. The three blocks on the left are a representation of the governor and turbine characteristics for one type of steam turbine prime mover. The variable YD actually represents the power developed by the turbine. In the diagram this has the electrical power that is fed to the network subtracted from it, so that the difference is the accelerating power available for the turbine-generator combination. Again, the governor loop could involve non-linear terms that would be linearized around an operating point for the representation given here.

The interaction of this generating unit with the electrical transmission network is represented by the four variables shown in the upper right-hand corner. The magnitude of the terminal voltage ET and its angle θT are a result of the actions of the voltage regulator and governor systems. Depending upon the condition of the other generating units and the transmission system, a certain real and reactive power will flow between the generating system and the electrical transmission network. The values of the real power P and the reactive power Q are shown as inputs into the generating unit representation, as they do affect the response of this unit. The actual relationship between these variables for machine k of N machines on the network is

    P_k = Σ_n E_k E_n Y_kn cos(θ_k - θ_n - α_kn)
    Q_k = Σ_n E_k E_n Y_kn sin(θ_k - θ_n - α_kn)                  (1)

where the Y's are the magnitudes of the elements of the admittance matrix of the transmission network, the α's are the angles of the admittances, and the other variables are as described above. These non-linear equations are linearized around an operating point for the transmission system and will be represented in matrix form as a set of linear algebraic equations.

THE STABILITY PROBLEM
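As background for the linearized studies of this section, the admittance-form power relations of Equation 1 can be evaluated and differentiated numerically. The three-machine data, the sign convention and the finite-difference linearization below are all invented for illustration; this is a sketch, not the method of the PALS program.

```python
import numpy as np

def network_power(Em, th, Y, alpha):
    """Real and reactive power injections for machines on a network, in the
    admittance-magnitude/angle form of Equation 1 (one common sign and
    angle convention, assumed here)."""
    dth = th[:, None] - th[None, :] - alpha
    P = Em * np.sum(Em[None, :] * Y * np.cos(dth), axis=1)
    Q = Em * np.sum(Em[None, :] * Y * np.sin(dth), axis=1)
    return P, Q

def linearize(Em0, th0, Y, alpha, h=1e-6):
    """Numerical linearization around an operating point: the Jacobian of
    (P, Q) with respect to (E, theta), i.e. the set of linear algebraic
    equations representing the network in small-disturbance studies."""
    N = len(Em0)
    x0 = np.r_[Em0, th0]
    def f(x):
        P, Q = network_power(x[:N], x[N:], Y, alpha)
        return np.r_[P, Q]
    J = np.empty((2 * N, 2 * N))
    for i in range(2 * N):
        xp = x0.copy(); xp[i] += h
        xm = x0.copy(); xm[i] -= h
        J[:, i] = (f(xp) - f(xm)) / (2 * h)   # central difference
    return J

# invented three-machine network data
Y = np.array([[5.0, 2.0, 1.0],
              [2.0, 4.0, 1.5],
              [1.0, 1.5, 3.0]])
alpha = np.full((3, 3), 1.4)
np.fill_diagonal(alpha, -1.4)
J = linearize(np.array([1.0, 1.02, 0.98]), np.array([0.0, 0.1, -0.05]), Y, alpha)
```

The resulting Jacobian plays the role of the linear algebraic network equations that are coupled to the machine differential equations in the studies that follow.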
Modern power systems, such as those in the United States, are interconnected over large areas so that there may be over 500 generating units, similar to those described in the previous section, interacting with each other. Two classes of stability problems are usually defined. One is classically called the transient stability problem and involves large disturbances of the system. These usually are brought about by a fault on a transmission line, or some similar major disturbance. In these problems the non-linear characteristics of the transmission system, as given in Equation 1, are extremely important. These problems usually are solved by a step-by-step integration of the differential equations describing the system. These studies are expensive and time consuming, but essential to the operation of large power systems. In the United States, the northeast blackouts in November of 1965 and July of 1977 were examples of the type of situation where there was a large disturbance and it was necessary to take the non-linearity of the system into account.

In this paper a different type of stability problem is considered: the response of the system to small disturbances. If the disturbances are considered small, then the system can be represented by a set of linearized equations and linear system theory can be applied. One of the major applications of this method is the study of the spontaneous oscillations that occur in some power systems [2]. When they occur, they may build up to large magnitudes. They can be predicted by studying the linearized set of equations. The technique to be used consists of examining the eigenvalues of the characteristic equations and determining by their position on the complex plane the condition of the system.

The basic technique used is one of the first methods presented in a beginning course in feedback control theory. The problems encountered in trying to apply this method arose from the very large size of the system, and thus the order of the equations that needed to be solved. It should be noted that in the systems studied so far the critical modes of oscillation for the overall system were a function of the interaction of various generating stations, and not due to one station by itself. Thus, the necessary information could not be obtained by considering a single machine operating into a fixed load.

In addition to finding the eigenvalues for large systems such as the one described here, the methods that have been developed can be used to find the root loci or eigenvalue loci of the system and to find the sensitivities of the eigenvalues to the various parameters in the system. These tools have been most useful in determining changes that can be made to improve the system's response. The eigenvectors are needed to find the sensitivities and are thus available for many other types of studies on the system. Some of these will be described in this paper.
APPLICATION TO THE STABILITY PROBLEM

Several large power systems which contain a high proportion of hydroelectric generating capacity have experienced spontaneous oscillations during normal operation. Difficulties of this type that have been experienced in the western United States have been very well documented in various published papers. In a paper published in 1963 [2], A. R. Benson and D. G. Wohlgemuth described in great detail the oscillations that had occurred on the Northwest Power Pool during the period from 1955 to the early 1960's. In the discussions published with that paper, other systems reported similar difficulties. Later in the 1960's, when the Pacific northwest was interconnected with the rest of the western United States to form one system, similar oscillations were observed throughout the total western United States. Some of the work that has been done using first steam and then hydro-generating units to damp out these oscillations has been reported in a series of papers by Schleif and other authors [3-5].

In the earlier difficulties, which involved only the Pacific northwest, the frequency of these oscillations was approximately 3 cycles per minute. Many times they would build up spontaneously, last for several cycles, and then die away. At other times, however, the oscillations would build up to a level and duration such that corrective action would have to be taken. This involved either placing additional damping in the governors, blocking the governors, or blocking the gates on the generating units which appeared to be contributing to the oscillations. The oscillations that occurred when the Pacific northwest was interconnected with the power systems in California, Arizona, and adjacent territories were of a similar nature except that the frequencies sometimes varied up to 6 cycles per minute. Most of these earlier oscillations are believed to have been caused by interactions between the governors of the various units involved. Most recent cases, however, have been caused by interactions involving the high speed voltage regulators now being used on some of the generating units.

In the early stages of this work it was desired to have an example system which could be used to test the programs being developed. Working with the staff of the Bonneville Power Administration and the Corps of Engineers, a simplified 8-generating-station representation of the Pacific northwest was developed. Many of the parameters in the models used were evaluated on the basis of the intuitive judgment of the engineers who had worked for a long time with this system. Also, many generating stations were either ignored or lumped together to make one of the eight equivalents that were used. The results from this very crude model predicted the observed oscillations and gave a great deal of encouragement for the further development of these methods. Later work showed that there was a large element of pure luck in this original study because of the inaccuracies in the crude model that was used. However, the results of this early study will be used here as an example of the type of study that can be made.

The example [6] contains 8 hydroelectric generating stations whose block diagrams are of the form shown in Fig. 2. In this study the effect of the voltage regulator and synchronous machine equations were neglected, so that the upper part of the diagram in Fig. 1 was not used. The governor and turbine representation in Fig. 2 are based on a hydro-turbine and thus differ from those given for a steam turbine in Fig. 1. Also, control signals are sent to the individual generating units from a central load frequency control system. These signals are shown coming in from the left of the diagram. In this particular study, only one of the stations has proportional control, so that only one Kp is non-zero.

Figure 2. Block diagram of hydro-turbine and generator.

The stations are interconnected through an electrical transmission network and a control system which measures the power at designated points on the system and from this determines the control signals Pc and PA for each station. The resulting system is of 52nd order.

The computer program gives a list of all of the eigenvalues of the system and also plots them using a printer-plotting routine. This plot is shown in Fig. 3. Since many of the roots fall close together and near the origin, the resolution of this type of plot is not great enough to separate them. However, their exact values could be found from the listed values. Note the difference in the horizontal and vertical scales on this plot. This is typical of hydroelectric systems. For the set of parameters used in this example, one pair of eigenvalues is just into the right half-plane. It is of interest that this pair of eigenvalues seems to correspond to a mode of oscillation that sometimes builds up spontaneously on the actual system.

Figure 3. Plot of eigenvalues of sample problem.
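Scanning a computed eigenvalue list for roots just into the right half-plane, and converting each oscillatory mode to cycles per minute (the unit in which the Pacific northwest oscillations above were quoted), can be sketched as follows; the sample spectrum is invented to resemble the situation of Fig. 3.

```python
import numpy as np

def unstable_modes(eigs):
    """Scan an eigenvalue list (as produced by a program like PALS) for
    modes in the right half-plane; report one member of each complex pair
    together with its frequency in cycles per minute."""
    out = []
    for lam in eigs:
        if lam.real > 0 and lam.imag > 0:
            cpm = 60.0 * lam.imag / (2.0 * np.pi)   # rad/s -> cycles/min
            out.append((lam, cpm))
    return out

# invented spectrum: well-damped roots plus one pair marginally unstable
# at roughly 3 cycles per minute
eigs = np.array([-2.1 + 0.0j,
                 -0.8 + 5.2j, -0.8 - 5.2j,
                  0.004 + 0.31j, 0.004 - 0.31j])
for lam, cpm in unstable_modes(eigs):
    print(f"unstable mode at {complex(lam):.3f}: {cpm:.2f} cycles/min")
```

A mode with a small positive real part and a frequency of a few cycles per minute is exactly the signature of the spontaneous oscillations discussed in this section.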
THE POWERTON 6 STUDY

On May 12, 1976, real and reactive power oscillations occurred on the Commonwealth Edison system. The source of the oscillations was shown to be Powerton Unit 6. Powerton Station consists of two 992 MVA tandem compound, 3600 rpm turbine-generators which were installed in the 1970's. The station is located approximately 260 km (160 miles) southwest of Chicago, outside of Pekin, Illinois. Power is transmitted to the Chicago area over four 345 kV lines. The generation and transmission connections are shown in Figure 4. Each unit is equipped with a General Electric ALTERREX* excitation system which has a response ratio (ESR) of 2.0 and a General Electric Model LA202 power system stabilizer. Both units are equipped with electrohydraulic speed-control (EHC) systems.

Prior to experiencing the oscillations, both Powerton Units were operating at approximately 820 MW and 25 kV. The line to Dresden (0302) and a line to Goodings Grove (0303) were out of service due to maintenance, and the bus tie breaker was open. Additionally, neither power system stabilizer was in service, as they were awaiting a change in the input transducer and final field testing.

The station operator was attempting to increase the load on Powerton #6 from 820 MW
J. E. Van Ness

[Figure 4. Generation and Transmission Connections of May 12, 1976.]
to 830 MW when real and reactive power oscillations began at Powerton and were observed at other Commonwealth Edison generating stations and on some tie lines. The oscillations decayed when Powerton #6 power output was reduced. The oscillations were noticed on three successive attempts to increase load and decayed each time load was reduced. The oscillations observed on switchboard metering at Powerton were approximately 30 to 60 MW peak to peak and 10 Mvar on Unit #6, and approximately 5 MW and 5 Mvar on Unit #5. The frequency of the oscillations was approximately 1 Hz.

Subsequent tests showed that when the oscillations appeared, they could always be eliminated by either reducing the unit's power output, increasing the unit's excitation, or switching the automatic voltage regulator out of service. Further analysis showed that altering some of the excitation system parameters could also stabilize Powerton #6. The major changes were in the forward loop and feedback gains.

Following correction of the problem by resetting certain voltage regulator gains and time constants and replacing part of the voltage regulator controls, a post-incident analysis was started to simulate the incident and see if the problem could have been predicted. Several models of the system were formed using the PALS Program, the largest being 177th order. This analytical study duplicated field observations of Powerton Unit #6 in the operating range of 800 to 850 MW and 24.5 kV to 25.5 kV (.98 to 1.02 per unit). The analytical and field observations were in good agreement both in the qualitative sense (stability vs. instability) and quantitatively (matching the dominant frequency). The detailed results were presented in reference 7.
OTHER APPLICATIONS

While these methods were developed for the study of the stability of a large power system, they have been applied to several other problems. One interesting problem arose in the design of large accelerators for use in nuclear physics. These accelerators have a periodic power requirement similar to the form shown in Fig. 5. In this particular waveform the base power is 100 megawatts, so that the total power swing over the two second period is 282 megawatts. This is one of the more severe cases. In the past, this fluctuation in power has been accommodated by using motor generator sets with large flywheels to serve as energy storage devices. The question was raised as to whether such a load could be connected directly to a modern power system without exciting oscillations in the system that would be undesirable.
[Figure 5. Waveform of load. Power vs. time in seconds.]
Several methods of analyzing this problem were tried, involving both direct digital simulation and analog simulation of the system. The most successful and flexible was a method using the eigenvalues and eigenvectors of the system [8,9]. The basic method was one that is presented in a first course in circuit theory. The waveform shown in Fig. 5 was broken into its harmonic components. The response of the system to each of the components could then be expressed simply in terms of the eigenvalues and eigenvectors. The response of the various variables in the system was reconstructed by adding together the response of those variables to
each of the harmonics. The method was applied twice, to two different locations in the United States. In one case it was decided that the resulting variations in voltage and frequency would not be acceptable, while in the second case they were. For this second study the power systems in the midwestern United States were approximated by a 116th order system. The PALS Program was used to form the system matrix and to find its eigenvalues and eigenvectors. Using these eigenvalues and eigenvectors, various waveforms of the load were studied for their effect on the system. To be sure that the worst case had been considered, the period of the waveform was varied so that specific modes of the overall system, including its voltage regulators and governors, would be excited. The major advantage of this approach was the ability to identify these critical modes and to easily study the effect when they were directly excited.
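The harmonic-superposition calculation described above can be sketched in a few lines. This is an illustrative reconstruction, not the PALS code; the function name, the use of the FFT for the harmonic decomposition, and the test system are assumptions of this sketch.

```python
import numpy as np

def periodic_response(A, b, load, T, n_harmonics=20, n_t=512):
    """Steady-state response of x' = A x + b u(t) to a T-periodic load u(t),
    built up harmonic-by-harmonic from the eigendecomposition of A.
    Assumes A is diagonalizable and has no eigenvalue at any j*w_k."""
    lam, X = np.linalg.eig(A)              # eigenvalues, right eigenvectors
    part = np.linalg.solve(X, b)           # modal participation X^{-1} b
    t = np.linspace(0.0, T, n_t, endpoint=False)
    c = np.fft.fft(load(t)) / n_t          # Fourier coefficients of the load
    x = np.zeros((len(b), n_t), dtype=complex)
    for k in range(-n_harmonics, n_harmonics + 1):
        w = 2.0 * np.pi * k / T            # harmonic frequency (rad/s)
        # each mode responds to c_k e^{jwt} with gain part_i / (jw - lam_i)
        x += np.outer(X @ (part / (1j * w - lam)), c[k] * np.exp(1j * w * t))
    return t, x.real
```

Because the response is summed mode by mode, it is easy to see which eigenvalues dominate the response at each harmonic, which is the advantage the text describes.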
Another interesting problem involved the effect of the sampling rate when a digital computer was used to give load-frequency control of a power system [10,11]. The resulting model involved both differential and difference equations. These were transformed into a set of difference equations which were studied for stability. The transformations were based on the eigenvalues and eigenvectors of the system. Some of the subroutines in the PALS Program were developed as part of the study of this problem. It was found that certain modes of oscillation of the system might be excited if the sampling rate of the digital computer were improperly chosen. However, it also was found that if the system designer was aware that this could happen, he could avoid those critical sampling rates so that this would not be a problem on the system.

In recent years another type of problem has been of great concern to power system engineers.
In long ac transmission lines it is common to insert series capacitors to increase the power transmitting capability of the line. The resulting series LC circuit may have a resonant frequency of 40 Hz, for example. If this frequency corresponds to one of the natural frequencies of the spring-mass system formed by the generator and its turbines, an oscillation may result which has broken the shaft between the generator and the turbines. Eigenvalue analysis is one approach used for this problem, but since the transmission system can no longer be represented by its steady-state equation, the resulting system of equations is of much higher order than those described earlier in this paper. For more information on this problem, called the subsynchronous resonance problem, see reference 12.
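For reference, the resonant frequency of a series LC circuit is f = 1/(2π√(LC)). The inductance and capacitance values below are hypothetical, chosen only to reproduce a figure near the 40 Hz mentioned above; they are not from the paper.

```python
import math

# Hypothetical line inductance and series-capacitor values (not from the paper),
# chosen so that the series resonance comes out near 40 Hz.
L = 0.40                                        # henries
C = 3.96e-5                                     # farads
f = 1.0 / (2.0 * math.pi * math.sqrt(L * C))    # series resonant frequency, Hz
```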
REFERENCES

1. J. E. Van Ness, "PALS - A Program for Analyzing Linear Systems," A report to the Bonneville Power Administration, published at Northwestern University, March 1969.

2. A. R. Benson and D. G. Wohlgemuth, "System Frequency Stability in the Pacific Northwest," IEEE Trans. on Power Apparatus and Systems, No. 64, pp. 765-773, February 1963.

3. F. R. Schleif and J. H. White, "Damping for the Northwest-Southwest Tie-line Oscillations - An Analogue Study," IEEE Trans. on Power Apparatus and Systems, Vol. PAS-85, No. 12, pp. 1239-1247, December 1966.

4. F. R. Schleif, G. E. Martin, R. R. Angell, "Damping of System Oscillations with a Hydrogenerating Unit," IEEE Trans. on Power Apparatus and Systems, Vol. PAS-86, No. 4, pp. 438-442, April 1967.

5. F. R. Schleif, H. D. Hunkins, G. E. Martin, E. E. Hattan, "Excitation Control to Improve Powerline Stability," IEEE Trans. on Power Apparatus and Systems, Vol. PAS-87, No. 6, pp. 1426-1434, June 1968.

6. A. R. Benson, W. F. Tinney, and D. G. Wohlgemuth, "Analysis of Dynamic Response of Electric Power Systems to Small Disturbances," Proc. 1965 Power Industry Computer Applications Conf., pp. 247-259.

7. J. E. Van Ness, F. M. Brasch, G. L. Landgren, S. T. Nauman, "Analytical Investigation of Dynamic Instability Occurring at Powerton Station," IEEE Trans. on Power Apparatus and Systems, Vol. PAS-99, No. 4, July/August 1980, pp. 1386-95.

8. J. E. Van Ness, "Response of Large Power Systems to Cyclic Load Variations," IEEE Trans. on Power Apparatus and Systems, Vol. PAS-85, No. 7, July 1966, pp. 723-727.

9. J. E. Van Ness, J. A. Pinnello, "Dynamic Response of a Large Power System to a Cyclic Load Produced by a Nuclear Accelerator," IEEE Trans. on Power Apparatus and Systems, Vol. PAS-90, No. 4, July 1971, pp. 1856-1962.

10. F. P. Imad and J. E. Van Ness, "Finding the Stability and Sensitivity of Large Sampled Systems," IEEE Trans. on Automatic Control, Vol. AC-12, No. 4, August 1967, pp. 442-445.

11. J. E. Van Ness and R. Rajagopalan, "Effect of Digital Sampling Rate on System Stability," Proc. 5th Power Industry Computer Applications Conf., Pittsburgh, Pennsylvania, May 1967, pp. 41-46.

12. IEEE Committee Report, "A Bibliography for the Study of Subsynchronous Resonance Between Rotating Machines and Power Systems," IEEE Trans. on Power Apparatus and Systems, Vol. PAS-95, No. 1, Jan/Feb 1976, pp. 216-218. First Supplement, Vol. PAS-98, No. 6, Nov/Dec 1979, pp. 1872-1875.
Large Scale Eigenvalue Problems
J. Cullum and R.A. Willoughby (Editors)
© Elsevier Science Publishers B.V. (North-Holland), 1986
A PRACTICAL PROCEDURE FOR COMPUTING EIGENVALUES OF LARGE SPARSE NONSYMMETRIC MATRICES
Jane Cullum and Ralph A. Willoughby IBM T.J. Watson Research Center, P.O. Box 218 Yorktown Heights, New York 10598 U.S.A.
We propose a Lanczos procedure with no reorthogonalization for computing eigenvalues of very large nonsymmetric matrices. (This procedure can also be used to compute corresponding eigenvectors, but that issue will be dealt with in a separate paper.) Such computations are, for example, central to transient stability analyses of electrical power systems and to determining parameters for iterative schemes for the numerical solution of partial differential equations. Numerical results for several large matrices are presented to demonstrate the effectiveness of this procedure.
1. INTRODUCTION
Economical procedures for computing eigenvalues and eigenvectors of very large but sparse real symmetric matrices exist. See for example Cullum and Willoughby [1] and Parlett and Scott [2]. However, comparable progress in the computation of eigenvalues and eigenvectors of nonsymmetric matrices has not yet been reported. The algorithms currently available for such computations, see for example Stewart and Jennings [3] and Saad [4], are very useful but somewhat limited in the amount of spectral information which they can obtain. The simultaneous iteration procedure described in [3] can be used to compute a few of those eigenvalues of the given matrix which are largest in magnitude. The procedures in [4] are based on an iterative Arnoldi's method and use Hessenberg matrices as approximations to the original matrix. Neither type of procedure modifies the given matrix A; both use only products of the form Ax. The procedures discussed in [1] are based upon variants of the basic Lanczos recursion. For a given real symmetric matrix A and a starting vector v_1 (which is typically generated randomly), this recursion can be used (at least theoretically) to generate orthonormal bases for the Krylov subspaces

    K_m(A, v_1) = span{v_1, A v_1, ... , A^(m-1) v_1}

for m = 1, 2, ... , associated with A and v_1.
This generation produces a family of real symmetric tridiagonal matrices Tm which (in exact arithmetic) represent orthogonal projections of the given original matrix A onto the corresponding Krylov subspaces. In these procedures, approximations to the eigenvalues of the given
matrix A are obtained by computing eigenvalues of one or more of the Lanczos matrices generated, and then selecting some subset of the computed eigenvalues as approximations to eigenvalues of A. The nominal justification for this type of procedure is the "fact" that the Lanczos matrices are projection matrices for A, and thus the computed eigenvalues of these matrices T_m are eigenvalues of the operators obtained from A by restricting A to the Krylov subspaces K_m. However, in practice this justification is not applicable, because the orthogonality upon which this argument is based does not persist in finite precision arithmetic.
In general the basic Lanczos procedure will not function properly without some modification. Various modifications have been proposed. They are discussed briefly in Chapter 2 of [1]. The interested reader is referred to [1] for detailed descriptions of one class of such algorithms.
Because of the observed efficiencies and speed which have been obtained using Lanczos procedures on real symmetric matrices, there have recently been attempts to devise Lanczos algorithms for computing eigenvalues of large nonsymmetric matrices. These are based upon the basic two-sided Lanczos recursion for nonsymmetric matrices which is given in Equations (2.1)-(2.2) below. See in particular Parlett, Taylor and Liu [5]. This paper and its predecessor do not contain very many numerical results. However, they indicate that Lanczos procedures may also be very effective tools for nonsymmetric problems.
In attempting to construct a nonsymmetric Lanczos procedure from the basic equations given in Eqns(2.1)-(2.2), the algorithm designer must be aware of two possible difficulties. First (see Section 2), it is possible (at least theoretically) that a normalizing factor which is used in the recursions may vanish. If this were to happen, then the standard Lanczos recursions would have to be terminated. The second possible difficulty is that the roundoff errors caused by the finite precision arithmetic may destroy the theoretical relationship between the original matrix and the Lanczos matrices being generated. The Lanczos procedure proposed in reference [5] is specifically designed to mollify the first possible difficulty. These authors modified the basic two-sided Lanczos recursion to incorporate an analog of the block pivoting for stability which was used by Bunch and Parlett [6] in the solution of systems of linear equations. Reference [5] states that they address the second possible difficulty by the continuous reorthogonalization or biorthogonalization of the Lanczos vectors as they are generated by the recursions.
Modal computations for nonsymmetric matrices are inherently much more difficult than similar computations for real symmetric matrices. Real symmetric matrices are always diagonalizable. Any real symmetric matrix always has a full set of eigenvectors. Moreover, these eigenvectors can always be chosen to be orthogonal vectors. In fact, the eigenvalue computations for real symmetric matrices are 'perfectly' conditioned. Furthermore, all of the eigenvalues are real, and all of the eigenvectors are real vectors.
In general, nonsymmetric matrices A have two sets of eigenvectors, right eigenvectors X = {x_1, ... , x_J} such that AX = XΛ, and left eigenvectors Z = {z_1, ... , z_J} such that A^T Z = ZΛ. The matrix Λ is diagonal, and its nonzero entries are eigenvalues of A. X and Z are (real) biorthogonal, that is, X^T Z = I, but the number J of right or left eigenvectors may be less than the order of A. Furthermore, the individual sets of vectors X and Z need not and probably do not form orthogonal sets of vectors.
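A small numeric illustration may help; the 3×3 matrix is hypothetical. For a diagonalizable A, scaling the left eigenvectors as Z = X^(-T) gives exactly the biorthogonality X^T Z = I described above.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 5.0]])          # distinct eigenvalues: diagonalizable
lam, X = np.linalg.eig(A)                # right eigenvectors: A X = X Lam
Z = np.linalg.inv(X).T                   # left eigenvectors, scaled so X^T Z = I
Lam = np.diag(lam)
assert np.allclose(A @ X, X @ Lam)       # right eigenvector equation
assert np.allclose(A.T @ Z, Z @ Lam)     # left eigenvector equation A^T Z = Z Lam
assert np.allclose(X.T @ Z, np.eye(3))   # (real) biorthogonality
```

Note that neither X nor Z is an orthogonal matrix here; only the cross products X^T Z collapse to the identity.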
DEFINITION 1.1. Any nxn matrix A which does not have a complete set of right and left eigenvectors is called a defective or nondiagonalizable matrix.
Every real symmetric matrix is nondefective. Any matrix with distinct eigenvalues is nondefective (diagonalizable). For any diagonalizable matrix A there exists a nonsingular matrix X and a diagonal matrix Λ such that

    AX = XΛ.    (1.1)

Clearly the columns of X in Eqn(1.1) are right eigenvectors of A, and the diagonal entries of Λ are the eigenvalues of A. The interested reader should see, for example, Wilkinson [7, Chapter 1] for more background information on defective matrices and for examples of the effects of defectiveness upon any numerical procedure for computing eigenvalues. We restrict our considerations to nondefective nonsymmetric matrices. We do not make any claims for defective matrices.
In this paper we present a Lanczos procedure with no reorthogonalization for computing eigenvalues of large nondefective, nonsymmetric matrices. In Section 2 we first restate the general two-sided Lanczos recursion formulas for nonsymmetric matrices given, for example, in Wilkinson [7, Chapter 6]. We then state a general procedure for using these recursions to obtain approximations to eigenvalues of a nondefective nonsymmetric matrix. In Section 3 we then outline our proposed algorithm, which uses complex arithmetic, and derive some fundamental relationships valid in exact arithmetic. In practice when finite precision arithmetic is used, the basic Lanczos procedures as stated in Sections 2 and 3 do not possess the biorthogonality relationship given in Lemma 3.1. In order to obtain a practical procedure, it is necessary to modify the basic procedure. The modification which we use is given in Section 4 along with a justification for it. The behavior of this modified procedure on several large and medium-size problems is then illustrated in Section 5. Computations for small and medium size test problems yielded eigenvalue approximations with accuracies comparable to those obtained using the relevant subroutines contained in the EISPACK Library [8]. One of these examples is given in Cullum and Willoughby [9].
One unusual aspect of our procedure is that the computations are in complex arithmetic even when the original matrix is real. In fact all of the examples we present are real nonsymmetric matrices. Typically in the literature, see for example, EISPACK [8], such computations are modified so that the arithmetic is all real.
2. GENERAL TWO-SIDED LANCZOS RECURSION
Wilkinson [7] gives the following general two-sided Lanczos recursion for any nonsymmetric matrix A. Please refer to Chapter 6 in [7] for additional details. Specifically, let A be a real n×n matrix. Let v_1 and w_1 be two n×1 vectors with Euclidean "inner product" v_1^T w_1 = 1. Note that in general we will take these starting vectors to be complex vectors.

For i = 1, 2, ... , M use the following recursions to define Lanczos vectors W_m = {w_1, ... , w_m} and V_m = {v_1, ... , v_m} and scalars γ_{i+1}, β_{i+1}, and α_i such that

    γ_{i+1} v_{i+1} = A v_i - α_i v_i - β_i v_{i-1},        (2.1)
    β_{i+1} w_{i+1} = A^T w_i - α_i w_i - γ_i w_{i-1},      (2.2)

where

    α_i = w_i^T A v_i,   γ_i = w_{i-1}^T A v_i,   and   β_i = v_{i-1}^T A^T w_i.
The coefficients α_i, β_i, γ_i are chosen such that for each 1 ≤ j ≤ M, the sets of Lanczos vectors V_j = {v_1, ... , v_j} and W_j = {w_1, ... , w_j} are (real) biorthogonal. That is, for each j, V_j^T W_j = I_j. Therefore the corresponding vectors α_i v_i and α_i w_i are, respectively, the real biorthogonal projections of the vectors A v_i and A^T w_i onto the most recently-generated Lanczos v-vector v_i and w-vector w_i. Similarly, the corresponding vectors γ_i v_{i-1} and β_i w_{i-1} are, respectively, the corresponding real biorthogonal projections of the vectors A v_i and A^T w_i onto the next most recently-generated Lanczos v-vector v_{i-1} and w-vector w_{i-1}.
The corresponding tridiagonal Lanczos matrices are defined by the scalars α_i, β_{i+1}, and γ_{i+1}. That is, for each m = 1, 2, ... , T_m is the m×m tridiagonal matrix with entries

    T_m(i,i) = α_i,   T_m(i,i+1) = β_{i+1},   T_m(i+1,i) = γ_{i+1}.    (2.3)
Since these recursions only explicitly biorthogonalize v_{i+1} and w_{i+1}, respectively, w.r.t. w_i, w_{i-1} and v_i, v_{i-1}, it is necessary to prove that in fact w_j^T v_k = 0 for any j ≠ k. We do not give a general proof of this fact here. However, in the next section we give a very similar proof for the particular variant of Eqns(2.1)-(2.2) which we use in our procedure.
Since the two sets of vectors W_j and V_j are biorthogonal (in exact arithmetic), we have that for j = 1, 2, ... the Lanczos matrices in Eqn(2.3) are just biorthogonal projections of A onto the subspaces V_j. That is,

    T_j = W_j^T A V_j.    (2.4)
We can use the recursions defined in Eqns(2.1)-(2.2) to define the following basic Lanczos procedure for computing eigenvalues of nonsymmetric matrices.
BASIC LANCZOS EIGENVALUE PROCEDURE

STEP 1. Given a matrix A and a user-specified maximum size M, use the Lanczos recursions in Eqns(2.1)-(2.2) to generate a Lanczos matrix T_M of order M.

STEP 2. For some user-specified m ≤ M compute the eigenvalues of T_m. Compute error estimates for each of these eigenvalues and select those eigenvalues with sufficiently small error estimates as approximations to eigenvalues of A.

STEP 3. If convergence is observed on all of the eigenvalues of interest, terminate the computations. Otherwise enlarge the Lanczos matrix T_m and repeat Step 2.

This basic Lanczos procedure replaces the direct computation of the eigenvalues of the given matrix A by an analogous computation of eigenvalues of the associated tridiagonal Lanczos matrices T_m. Observe that the Lanczos recursions do not explicitly modify the given matrix A, and only the two most recently-generated pairs of Lanczos vectors v_k and w_k for k = i, i-1 are needed at each stage in the recursions. The storage requirements are very small when A is sparse or of a form such that the matrix-vector multiplies can be generated cheaply w.r.t. storage and time requirements.
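A bare-bones sketch of Step 1, using the recursions of Eqns(2.1)-(2.2), follows. How the product γ_{i+1}β_{i+1} is split between the two scalars is a free choice not fixed by the recursion; the symmetric square-root split used here is an assumption of this sketch (it anticipates the choice made in Section 3), and no reorthogonalization or error estimation is included.

```python
import numpy as np

def two_sided_lanczos(A, v1, w1, M):
    """Generate the order-M tridiagonal Lanczos matrix T_M of Eqn(2.3)
    from the two-sided recursions (2.1)-(2.2). The symmetric split
    beta = gamma = sqrt(s^T r) is one possible normalization."""
    n = A.shape[0]
    v = np.array(v1, dtype=complex)
    w = np.array(w1, dtype=complex)
    v = v / (w @ v)                     # rescale so that w_1^T v_1 = 1
    v_prev = np.zeros(n, dtype=complex)
    w_prev = np.zeros(n, dtype=complex)
    beta = gamma = 0.0
    alphas, betas, gammas = [], [], []
    for i in range(M):
        alpha = w @ (A @ v)             # alpha_i = w_i^T A v_i
        alphas.append(alpha)
        if i == M - 1:
            break
        r = A @ v - alpha * v - beta * v_prev       # rhs of (2.1)
        s = A.T @ w - alpha * w - gamma * w_prev    # rhs of (2.2)
        prod = s @ r                    # = gamma_{i+1} * beta_{i+1}
        if prod == 0:                   # exact breakdown: recursion must stop
            break
        beta = np.sqrt(prod + 0j)       # symmetric split (complex in general)
        gamma = prod / beta
        betas.append(beta)
        gammas.append(gamma)
        v_prev, w_prev = v, w
        v, w = r / gamma, s / beta      # v_{i+1} and w_{i+1}
    k = len(alphas)
    return (np.diag(alphas) + np.diag(betas[: k - 1], 1)
            + np.diag(gammas[: k - 1], -1))
```

Only the two most recent vector pairs are kept, matching the small storage requirement noted above; A itself is touched only through the products Av and A^T w.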
Theoretically, it is easy to demonstrate that for each i we have the following relationship between the scalars γ_{i+1} and β_{i+1}:

    γ_{i+1} β_{i+1} = (A v_i - α_i v_i - β_i v_{i-1})^T (A^T w_i - α_i w_i - γ_i w_{i-1}).    (2.5)

In other words, for each i the product of these two parameters must be equal to the "inner product" of the two right-hand sides of Eqns(2.1)-(2.2). Furthermore observe that we have used a real "inner product" as Wilkinson [7] does, not the Hermitian inner product. Since there exist complex vectors, for example x = (1, i), for which x^T x = 0 even though x ≠ 0, this is obviously only a pseudo-inner product when we apply it to complex vectors.

Observe that this "inner product" can vanish (this would also be true even if we used a Hermitian inner product). If it vanishes, then the recursion cannot be continued in its current form. However, our experience and the experiences of others lead us to believe that this is a rare event. In fact, even though Parlett, Taylor and Liu [5] are specifically addressing this potential difficulty, they were able to construct only one example where this type of failure was observed in practice, and that occurred only for a very special choice of starting vector. In fact the test matrix which they were considering was itself very special in that it was a persymmetric matrix.
DEFINITION 2.1. A matrix A is persymmetric if and only if the associated matrix B = AJ, where J(i,j) = 1 for i + j = n + 1 and J(i,j) = 0 otherwise, is symmetric.

In other words, a matrix is persymmetric if and only if it is symmetric w.r.t. its cross diagonal. Observe that if A is persymmetric then A^T = JAJ. In fact one can show that for this type of matrix two Lanczos recursions are not necessary; one will suffice. In the next section we present a specialization of the above general Lanczos recursion which seems to have very nice numerical properties.
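A quick numeric check of the two equivalent characterizations of persymmetry in Definition 2.1 (the 4×4 example is hypothetical; J is the exchange matrix defined above):

```python
import numpy as np

n = 4
J = np.fliplr(np.eye(n))        # J(i,j) = 1 when i + j = n + 1 (1-based), else 0
S = np.arange(1.0, n * n + 1).reshape(n, n)
B = (S + S.T) / 2.0             # an arbitrary symmetric matrix
A = B @ J                       # then AJ = B J J = B, so A is persymmetric
assert np.allclose(A @ J, (A @ J).T)    # Definition 2.1: B = AJ is symmetric
assert np.allclose(A.T, J @ A @ J)      # equivalently, A^T = J A J
```

Since right-multiplying any symmetric B by J yields a persymmetric matrix, such matrices are easy to construct for testing.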
3. PROPOSED LANCZOS EIGENVALUE ALGORITHM
We propose a double-vector, nonsymmetric Lanczos procedure with no reorthogonalization based upon the variant of Eqns(2.1)-(2.2) which is contained in Eqns(3.2). Nominally, we have three sets of scalars α_i, β_i, and γ_i. We can however reduce this number to two sets of parameters if we make the choice

    γ_{i+1} = β_{i+1}.    (3.1)

See Equations (2.1)-(2.2). The penalty for doing this is that even if we start with a real matrix and real starting vectors, depending upon the particular matrix we can quickly obtain complex scalars and vectors. Furthermore, in our procedure we will always set v_1 = w_1. With this choice we will see that the only nonsymmetry in the Lanczos matrix generation procedure will be that which is a consequence of the nonsymmetry of A.

First we prove that with this particular choice for the scalars γ_{i+1} and β_{i+1}, the biorthogonality of the Lanczos vectors is achieved (in exact arithmetic). With these choices, Eqns(2.1)-(2.2) reduce to
biorthogonality of the Lanczos vectors is achieved (in exact arithmetic). With these choices, Eqns(2.1)-(2.2) reduce to j3i+lvi+l= Avi - aivi - piviPl= T
/3i+lwj+l = A wi ai = v ai
v (ai T
+
w ai )/2
- L Y ~ W-, /3iwi_, = ti+I and
2
T
= (ri+lti+l) w
T
T
= wi (Avi - /3ivi-l)and ai = vi (A
wi -
The definition of α_i is not what is given in Eqns(2.2), but is a theoretically equivalent expression which is analogous to the modification which was recommended by Paige [10] for the real symmetric case.
These definitions yield symmetric but typically complex Lanczos matrices, even if the starting vector is a real vector. Specifically, T_m is the complex symmetric tridiagonal matrix with entries

    T_m(i,i) = α_i   and   T_m(i,i+1) = T_m(i+1,i) = β_{i+1}.    (3.3)
See Chapters 3 and 6 of Volume 1 of Cullum and Willoughby [1] for more details about complex symmetric tridiagonal matrices. Several of the results in [1] are repeated here. First we demonstrate that the Lanczos vectors are biorthogonal.
We should observe that for the special case when A is a complex symmetric matrix, the recursions given by Eqns(2.1)-(2.2) and (3.1) reduce to a single recursion. Formally, this is the 'same' recursion as that which is used in the real symmetric case. However, when A is complex symmetric, the Lanczos vectors v_i and the Lanczos matrices T_m are all complex-valued. See Chapter 6 of Volume 1 and Chapter 7 of Volume 2 of reference [1] for more details.
LEMMA 3.1. (Exact Arithmetic). Let A be any n×n nonsymmetric matrix with q distinct eigenvalues. Let v_1 and w_1 be any n×1 vectors such that w_1^T v_1 = 1. Then, assuming for each i that w_i^T v_i ≠ 0, the v-vectors and the w-vectors generated using Eqns(3.2) are biorthogonal. Specifically, for any j ≠ k,

    w_j^T v_k = 0.    (3.5)
PROOF. Clearly, by construction w_2^T v_1 = v_2^T w_1 = 0.
[FIGURE 9. SAA1891, REAL PART, EIGVAL]
[FIGURE 11. ELN900, REAL PART, EIGVAL]
[FIGURE 12. ELN900, REAL PART, EIGVAL]
[FIGURE 14. ELN900, REAL PART, EIGVAL]
[FIGURE 16. ELN900, REAL PART, EIGVAL]
[FIGURE 17. ELN900, REAL PART, EIGVAL]
[FIGURE 18. ELN900, REAL PART, EIGVAL]
[FIGURE 19. ELN900, REAL PART, EIGVAL]
[FIGURE 20. ELN900, REAL PART, EIGVAL]
[FIGURE 21. ELN900, REAL PART, EIGVAL]
[FIGURE 22. ELN900, REAL PART, EIGVAL]
[FIGURE 23. ELN900, REAL PART, EIGVAL]
[FIGURE 24. ELN900, REAL PART, EIGVAL]
[FIGURE 25. ELN4763A, REAL PART, EIGVAL]
[FIGURE 27. ELN4763B, REAL PART, EIGVAL]
7.
Computing Eigenvalues of’Large Sparse Nonsymmetric Matrices
235
60 3
50 __ 3
__ g 30 __ 2 w20 ._ m 40
Y 0
In
&
10 ._
2>
0..
L U
-20 _-
w
-30
*
u
;-10 __ 3
Q
' U
__ __
-40
3 3
-50 ._ -60 -
20
;
-200
-180 -160 -140 -120
-80
-100
-40
-60
T
-20
0
20
3
*
3
>
w
3
> > +
v
>>w>g> >
+
-20
1
-20
-1 5
-10
FIGURE 30.
-5
0
VANNESS, REAL PART, EIGVAL
10 I
5
J. Cullum and R.A. Willoughby
236
u IL
::
z
3
15 __ 10..
W
h)
m
5.3
3
*
3
3.3
%
3
+, :> +
>
* -2 0
-10
-15
-20
FIGURE 31.
-5 0 VANNESS, REAL PART, EIGVAL
+
E
> > )t
LI1
U
x
M
>
+ >
+
5
4
10
>>
H
3 .2 _.
1 .. 0
I
237
Computing Eigenvahtes of Large Sparse Nonsymmetric Matrices
. Oh -.l
-.08
-.06
-.04 FIGURE 33.
-.02 0. .02 .04 VANNESS, REAL PART, EIGVAL
4
.06
.08
.01 ,008 Y 0
.006
2 W
.004
U
a
m
Y
.002
2 >
0.
E
=
+
>
3
-.002
I-
w CII
-.004
-.a06 -.008
-.01
9
l.'Ol 1.'03
1.'05 1.’07 1.’09 1.’11 1.’13 1.’15 1.’17 1.’19 1.21 FIGURE 34. CHUCK6OC, REAL PART, EIGVAL
238
J. Cullurn and R.A. Willoughby
U
g
r
2 1
.008..
.OM.. .004..
.a02 ._ +
0.2m&
w
-.002..
U
-.OM..
a
-.01,
-
El
E
a
-
%
W
2
6)
= 2
6
3
3 3
f
3
3 3
3
-.004..
-.008 __
.008.. .006 __ + .004.. .002..
o..&
-.002.. -.004 __
+
-.OM..
-.008..
g -.a,
El
t
> >>
> >>
+ I Y
a a
> >>
> >>
> >>
> >>
+
-.01
I2 u
-.015
.5
.51
.52
.53 .54 .55 .56 .57 .58 .59 FIWRE 37. CHUCK60C, REAL PART, EIGVAL
.6
.61
.62
Computing Eigenvalices of Large Sparse Nonsymrnetrie Matrices
+ > >>
U
> a
a
> >>
> >>
+
+
6; -.005
+
>>.
239
-.Ol/
+
h
P-4
-.OK]
.5
.51
.52
.53 .54 FIGURE 38.
.55 .56 .57 .58 .59 CHUCK60C, REAL PART, EIGVAL
.6
.61
.62
+ X h
+
+> *>+
>+>>
+ a
g
H
-.0151
.5
.51
.52
.53 .54 FIGURE 39.
.55 .56 .57 .58 .59 CHUCK60C, REAL PART, EIGVAL
.6
.61
.62
.6
.61
.62
... -
-.015
15
.51
.52
.53 .54 FIGURE 40.
.55 .56 .57 .58 .59 CHUCK60C, REAL PART, EIGVAL
J. Cullurn and R.A. Willoughby
240
Wbl 4;
.006 .004
>
+
+
>
+
m a
r
-.oIJ
0.
+4
:: U
a
Wh)
*
=
.a02
.a04
.006
FIGURE 41.
.008
>
>
+
.a1
.012
.a14
.016
CHUCK6OC, REAL PART, EIGVAL
.a18
.01 .008 -006 .004 .002 3
a
-.004
a
a -.008 a a -.a11
E.
0 U
0.
=
2
0.
W_ (0
‘R
>
.004
.006 FIGURE 42.
.008
.a12
.01 .a14 CHUCK60C, REAL PART, EIGVAL
.Ol-
.008.. .006 .. .004.. .002..
2
.a02
= =
3%
3
2 -.002.. -.004.. -.006.. g -.008.. m a -.MA
t
FIGURE 43. CHUCK60C, REAL PART, EIGVAL
.016
.a18
Large Scale Eigenvalue Problems, J. Cullum and R.A. Willoughby (Editors), © Elsevier Science Publishers B.V. (North-Holland), 1986
241
COMPUTING THE COMPLEX EIGENVALUE SPECTRUM
FOR RESISTIVE MAGNETOHYDRODYNAMICS

W. KERNER
Max-Planck-Institut für Plasmaphysik
Euratom Association, D-8046 Garching, Fed. Rep. of Germany
The spectrum of resistive MHD is evaluated by applying the Galerkin method in conjunction with finite elements. This leads to the general eigenvalue problem Ax = \lambda Bx, where A is a general non-Hermitian and B a symmetric positive-definite matrix. As this is a stiff problem, large matrix dimensions evolve. The QR algorithm can only be applied for a coarse grid. The fine grids necessary are treated by applying inverse vector iteration. Specific eigenvalue curves in the complex plane are obtained. By applying a continuation procedure it is possible by inverse vector iteration to map out in succession complete branches of the spectrum, e.g. resistive Alfven modes, for matrix dimensions of up to 3,742.
1. Introduction
Many problems in physics and engineering concern the oscillations and the stability of a given system, thus requiring evaluation of eigenvalues. The Schrodinger equation in quantum mechanics is a famous example of an eigenvalue problem where the energy levels are determined by a self-adjoint operator. In general, the physical model includes dissipation, and overstable or damped normal modes evolve which are described by complex eigenvalues. Such a non-Hermitian eigenvalue problem arises in dissipative magnetohydrodynamics (MHD). The study of linearized motion has significantly contributed to the understanding of ideal and resistive MHD plasma phenomena such as stability, wave propagation and heating. The most complete picture is obtained by means of a normal-mode analysis.
In the context of computational physics the characteristics of the set of equations have to be understood before the discretization can be set up and complex eigenvalues can be evaluated. In fusion test devices such as tokamaks or stellarators the plasma is confined by a strong magnetic field with a specific toroidal structure ("magnetic confinement"). For details about fusion devices and about the MHD model we refer to textbooks, such as Refs. /1,2/. The MHD model combines the fluid equations with Maxwell's equations. Consequently, the plasma exhibits characteristics of an ordinary fluid, such as sound waves, as well as special features due to the magnetic field, such as Alfven waves. The ideal, i.e. non-dissipative, MHD spectrum of normal modes has three completely different branches: the fast and slow magnetoacoustic waves and the Alfven waves. The frequency of the fast magnetoacoustic waves tends to infinity with
increasingly shorter perpendicular wave structure. The sound and Alfven branches usually form continua with zero frequency as end point. Since the fusion plasma is very hot, the resistivity assumes small values. It is therefore appropriate to treat the dissipation as a small perturbation to a self-adjoint operator. Point eigenvalues of the Hermitian system experience only a small change, mostly damping proportional to the resistivity. But the continua are drastically changed even for small resistivity and, in addition, new instabilities occur which cause the plasma to break away from the magnetic field. The discussion so far has revealed that the problem has quite different spatial and temporal scales which require appropriate and efficient numerical techniques, and that fine grids are necessary for accurate numerical solution. In this respect our model is quite different from the Schrodinger equation, which solves only for a scalar wave function. Consequently, the paper is organized to address the different aspects involved, namely the physical interpretation, the numerical accuracy and the solution of the complex eigenvalue problem. The author has tried to make the material presented interesting in various respects: The reader is encouraged to tackle a non-Hermitian problem himself by studying the method presented. Secondly, the properties of the discretization are made evident by discussing the numerical accuracy in terms of convergence studies using increasingly finer meshes and by interpreting the results, and, last but not least, the eigenvalue pattern found may be regarded as attractive and exciting. It has been attempted to discuss the different topics somewhat independently, so that readers can pick out the parts of most interest to them.
While accurate and very efficient solvers are available for treating the self-adjoint eigenvalue problem - we refer to the excellent books of Wilkinson /3,4/, Parlett /5/ and Golub and van Loan /6/ - the situation is much more difficult for non-Hermitian matrices. Let us assume that the chosen discretization does not lead to defective matrices. The QR algorithm can then be applied to compute all the eigenvalues of the system /3,4,7/. However, this algorithm destroys the band structure of the matrices and produces full matrices. The storage and CPU time requirements eventually put a limit on the dimension d of the matrices, e.g. on the CRAY-1 of the IPP Computer Centre d has to be less than 600. However, much finer grids are necessary in the resistive MHD model for finding the stability limits and scaling properties. Inverse vector iteration is a very efficient method of computing selected eigenvalues and eigenvectors of general matrices. It preserves the band structure and thus allows very large matrices to be treated. The slow convergence, which is sometimes considered a severe drawback, is strongly improved by a suitable complex shift. Fast convergence is found by restarting the iteration with a new shift. With a continuation procedure in a relevant parameter, only a few shifts are needed to obtain the result. In the case of Hermitian matrices Sylvester's theorem (see, for example, Ref. /5/) yields the number of eigenvalues in a given real interval, and every desired eigenvalue can be found by the bisection method. A generalization of this theorem for general matrices does not exist, and therefore inverse vector iteration cannot be used as a black box to compute all the eigenvalues in a given complex domain. But this also holds for subspace iteration /8/ or for the Lanczos algorithm /9,10/.
In practice, we did not find this drawback very restrictive; the results obtained from a coarse mesh by means of the QR algorithm or from a fine mesh by continuation provide a good guess at a suitable shift. Inspection of details of the eigenfunctions, such as the number of radial oscillations, even makes it possible to compute
eigenvalues in a certain domain of the complex \lambda-plane successively. The matrices of the complex eigenvalue problem have a characteristic block-tridiagonal structure, the opposite of a random sparsity pattern. Large matrices with a dimension of up to 3,742 are treated and extension to even larger matrix dimension is discussed. The paper is organized as follows: The physical model appropriate to simulating the plasma behaviour in magnetic confinement experiments is described in Sec. 2. Section 3 contains the numerical method. The Galerkin method used in conjunction with finite elements culminates in numerical solution of the complex eigenvalue problem Ax = \lambda Bx, where A is a general matrix and B is Hermitian and positive-definite. The solution of the eigenvalue problem is given in Sec. 4. The algorithm of the inverse vector iteration is presented. The continuation procedure is introduced as defined by a homotopy method. The CPU time and storage requirements are discussed. The results covering the specific properties of the normal-mode spectrum of resistive MHD are displayed in Sec. 5. The variety of the eigenvalue branches is stressed as well as their physical interpretation. Finally, Sec. 6 contains the discussion and conclusion.

2. Physical model
The objective of controlled thermonuclear fusion research is to derive nuclear energy from the fusion of light nuclei, such as deuterium and tritium, on an economic basis. At the high temperatures involved the gas is ionized; this ionized state of matter is called the fourth state or the plasma state. In order to obtain a sufficient number of fusion processes in a high-temperature plasma with T > 10 keV, the product of the particle density and the confinement time should exceed the value given by the Lawson criterion.
Fig. 1: Tokamak schematic: A toroidal current is induced in the plasma, which acts as the second loop of a transformer. This current creates a poloidal magnetic field, which together with the main toroidal field establishes an equilibrium and which heats the plasma by ohmic heating.
The concept of magnetic confinement utilizes the fact that ions gyrate around magnetic field lines, i.e. are tied to the field. Naturally, many instabilities tend to destroy favourable confinement configurations. In the tokamak device a toroidal current is induced in the plasma, which produces a poloidal magnetic field. Together with the toroidal field, this current yields an equilibrium and also heats the plasma. The principle of a tokamak is shown in Fig. 1. The ensemble of particles exhibits collective behaviour. The gross macroscopic properties are of special interest. The plasma is described in terms of single-fluid theory. The resistive MHD equations read in normalized, dimensionless form:

equation of motion

    \rho ( \partial v / \partial t + v \cdot \nabla v ) = - \nabla P + ( \nabla \times B ) \times B ,    (1)

Maxwell - Ohm

    \partial B / \partial t = \nabla \times ( v \times B ) - \nabla \times ( \eta \nabla \times B ) ,    (2)

adiabatic law

    \partial P / \partial t + v \cdot \nabla P + \gamma P \nabla \cdot v = 0 ,    (3)

Maxwell

    \nabla \cdot B = 0 .    (4)

Here \rho denotes the density, v the velocity, B the magnetic field, P the pressure and \eta the resistivity; \gamma is the ratio of the specific heats. Note that the assumption of incompressibility, \nabla \cdot v = 0, is not made. The adiabatic law is adopted for the equation of state since the dissipation, which is proportional to \eta, is considered to be small. Since the resistive modes rapidly oscillate, the compressible set of equations is appropriate. The fast and slow magnetoacoustic waves are retained. These equations are now linearized around a static equilibrium characterized by \partial / \partial t = 0 and v_0 = 0. The equilibrium is then determined by the equation
    \nabla P_0 = ( \nabla \times B_0 ) \times B_0 .    (5)
In straight geometry static, ideal equilibria can be interpreted as resistive equilibria if \nabla \times ( \eta \nabla \times B_0 ) = 0, with the consequence that \eta j_z = E_z = const. In toroidal geometry a resistive equilibrium is only possible with flow, i.e. v_0 \neq 0. This flow, however, is proportional to \eta and hence very small. Here we take the simplest approach of a constant resistivity \eta_0 instead of a constant E_z. This simplification does not constitute any restriction on unstable modes, since the resistivity decouples the fluid from the magnetic field in localized regions where the perturbation matches the field. But also the results for stable modes are only insignificantly changed by using constant resistivity. This model thus gives the basic features of resistive modes, since we are interested in phenomena which scale as \eta^{3/5} or \eta^{1/3} (and as \eta^{1/2}, like the resistive Alfven modes). For a circular cylinder the equilibrium quantities only have an r-dependence. With the usual cylindrical coordinates r, \theta, z, the equilibrium is determined by the equation

    dP_0/dr = - ( B_\theta / r ) \, d( r B_\theta )/dr - B_z \, dB_z/dr .    (6)
With two profiles given, eq. (6) can be solved to give the remaining one. A class of realistic tokamak-like equilibria with peaked current density j_z and constant toroidal field is given by

    j_z(r) = j_0 ( 1 - r^2 )^\nu ,  B_z = 1 .    (7)

The constant j_0 is adjusted to vary q(0), where the safety factor is defined as

    q(r) = k r B_z / B_\theta .    (8)

The ratio of the safety factor on surface and on axis is q(a)/q(0) = \nu + 1. For \nu = 1 the profiles assume the form

    B_z(r) = 1 ,  B_\theta(r) = ( j_0 / 4 ) r ( 2 - r^2 ) ,    (9)

with P_0(r) following from eq. (6).
To simulate a plasma - vacuum - wall system, it is only necessary to give the resistivity in the "vacuum" a sufficiently large value and hence a small value for the current. The following separation ansatz is suitable for the perturbed quantities:

    f(r, \theta, z; t) = f(r) \exp( i ( m \theta + k z ) + \lambda t ) ,    (10)

where \lambda is the eigenvalue. The growth rate \lambda_R is then defined as the real part of \lambda, i.e. \lambda_R = Re(\lambda). With k = 2\pi / L defining a periodicity length, a tokamak with large aspect ratio is simulated, n corresponding to the toroidal mode number; m is the poloidal mode number. In ideal MHD \lambda is either real or purely imaginary, which leads to exponentially growing unstable modes or purely oscillating waves. With resistivity included, the frequency can become complex. The equations for the perturbed quantities v, p and b read
    \lambda \rho_0 v = - \nabla p + ( \nabla \times b ) \times B_0 + ( \nabla \times B_0 ) \times b ,    (11)

    \lambda p = - v \cdot \nabla P_0 - \gamma P_0 \nabla \cdot v ,    (12)

    \lambda b = \nabla \times ( v \times B_0 ) - \nabla \times ( \eta \nabla \times b ) .    (13)
The divergence condition, eq. (4), for the perturbed field, \nabla \cdot b = 0, is used to eliminate b_\theta provided m \neq 0. The perturbed resistivity is set to zero, thus eliminating the rippling mode. Finally, we discuss the boundary conditions. It is assumed that the plasma is surrounded by a perfectly conducting wall, which implies the following conditions at the wall:
    v_r(a) = 0 ,    (14a)

    b_r(a) = 0 .    (14b)

For finite resistivity in the plasma the Maxwell equations require that the tangential component of the electric field vanish at the wall. This implies

    d( r b_\theta )/dr \,|_{r=a} = 0 .    (14c)

On the axis r = 0 all the quantities are regular.
3. Numerical method
The set of linearized resistive MHD equations is solved by the finite-element method. A state vector u which contains the perturbed velocity, pressure and magnetic field is introduced:

    u^T = ( v_r, v_\theta, v_z, p, b_r, b_z ) .

In order to reduce the order of derivatives and to obtain the weak form, we take the inner product of eqs. (11-13) with the weighting function v, which has to be sufficiently smooth, and integrate over the plasma volume. In the Galerkin method used here the adjoint function v satisfies the same boundary conditions as u. The linear operator in eqs. (11-13) is represented by matrices R and S with spatial dependence only, where in S only the diagonal elements are non-zero and R contains differential operators and equilibrium quantities. The set of equations then reads

    Ru = \lambda Su .    (15)

The vector u(r) is a weak solution if for any function v(r) of the admissible Sobolev space satisfying the boundary conditions the scalar product satisfies

    (Ru, v) = \lambda (Su, v) .    (16)

The components of u are approximated by a finite linear combination of local expansion functions or shape functions:

    u_k(r) = \sum_j a_j^k \phi_j^k(r) .
Higher-order elements are used, namely cubic Hermite elements for the radial velocity and field components v_r and b_r, and quadratic finite elements for v_\theta, v_z, p and b_z. This introduces two orthogonal shape functions per interval, raising the number of unknowns to 2N + 2, where N denotes the number of radial intervals. With this choice, the transverse divergence can be made to vanish exactly in every interval, and the divergence of b as well. It has been established that this scheme yields for the discretized
spectrum a pollution-free approximation to the true eigenvalue spectrum. Only with this choice of the finite elements can accurate results, as presented later, be obtained. Details of the discussion of this discretization are found in Refs. /11-14/. The error introduced in the differential equations through the approximation for u_k(r) is orthogonal to every expansion function. The Galerkin method eventually leads to the general eigenvalue problem

    Ax = \lambda Bx .    (17)
For details of the numerical method we refer to Ref. /15/.
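The chain from Galerkin discretization to a generalized eigenvalue problem can be illustrated on a much simpler model than the MHD system. The sketch below is not the paper's discretization: it assumes linear elements (instead of the cubic Hermite/quadratic pair used here) applied to -u'' = \lambda u on (0,1) with u(0) = u(1) = 0, whose smallest exact eigenvalue is \pi^2:

```python
import numpy as np

# Galerkin sketch on a model problem (assumption: linear elements for
# -u'' = lambda*u, not the cubic/quadratic MHD elements of the paper).
def galerkin_matrices(N):
    """Return stiffness A and mass B for N intervals on (0,1)."""
    h = 1.0 / N
    n = N - 1                        # interior nodes only
    A = np.zeros((n, n))
    B = np.zeros((n, n))
    for i in range(n):
        A[i, i] = 2.0 / h            # assembled element contributions
        B[i, i] = 4.0 * h / 6.0
        if i + 1 < n:
            A[i, i + 1] = A[i + 1, i] = -1.0 / h
            B[i, i + 1] = B[i + 1, i] = h / 6.0
    return A, B

A, B = galerkin_matrices(50)
# A x = lambda B x with B symmetric positive-definite: reduce to a
# standard symmetric problem via the Cholesky factor of B.
L = np.linalg.cholesky(B)
C = np.linalg.solve(L, np.linalg.solve(L, A).T).T   # inv(L) A inv(L)^T
lam = np.sort(np.linalg.eigvalsh(C))
print(lam[0])   # close to pi^2 = 9.8696...
```

The Cholesky reduction is only usable here because B is symmetric positive-definite, the same property the paper relies on; with a non-Hermitian A the reduced matrix C is general and its eigenvalues may be complex.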
4. Eigenvalue problem

4.1. Algorithm
Since the QR algorithm produces full matrices, it can only be applied up to relatively small matrix dimensions; the resulting coarse discretization of the operator does not yield sufficiently accurate results. Therefore, a method has to be found which preserves the structure of the matrices and allows a fine mesh. We choose inverse vector iteration in conjunction with a continuation procedure. At first the idea of continuation is explained in the context of a mapping sequence and then the vector iteration is discussed. The general eigenvalue problem

    Ax = \lambda Bx    (18)
has to be solved, the eigenvalue \lambda and the eigenvector x of the expansion coefficients being, in general, complex. A is a general, non-Hermitian matrix and B is symmetric and positive-definite. Since in our problem A and B are real matrices, the eigenvalues occur in complex conjugate pairs. A and B have block-tridiagonal structure with a bandwidth b = 48. The dimension of the matrices is given by d = 12N - 2, where N is the number of radial intervals and is usually quite large. In the algorithm presented the band structure of A and B, which usually occurs in a finite difference or finite-element discretization, is preserved and utilized. It is assumed that the system (18) can be approximated continuously by a sequence
    A_k x_k = \lambda_k B_k x_k ,  k = 1, 2, \ldots ,    (18b)

where A_k and B_k represent the discretization of the operator with increasingly finer mesh; the dimension of the matrices is d_k = 12 N_k - 2.
Inverse vector iteration (see below) is used to solve the system (18b) for given k. The start value is taken from the previous step, \lambda_k^{(0)} = \lambda_{k-1}, and the start vector x_k^{(0)} is obtained from x_{k-1} by interpolation. The iteration is terminated if the change in the eigenvalues is less than a given tolerance. Of course, any other parameter of the system can be used for continuation.
Then instead of the dimension of the matrices the elements themselves change. In this fashion knowledge of a relevant part of the spectrum at one point in parameter space is used to explore new regions.

4.2. Inverse vector iteration
The initial value \lambda_0 is considered as an approximation to the eigenvalue of the system (18), i.e.

    \lambda = \lambda_0 + \delta .

With the shift \lambda_0

It is therefore reasonable to organize the algorithm to perform the operations successively with increasing radial label i, 1 \le i \le N. Given the shifted matrix A' = A - \lambda_0 B, which is easily composed by using only sub-blocks of size b_s, the factorization is performed blockwise. A' is again decomposed into a product of triangular matrices (eq. (24)):
    A' = A - \lambda_0 B = LU .
Note that the first and last rows have only two blocks. The factorization can then be chosen to yield the following algorithm for computing the lower and upper triangular blocks L_i and U_i:    (31)
which is performed with the LINPACK routine CGEFA. The evaluation of the quadratic systems requires solution of
by means of the CGESL routine. Eventually, the matrices W_i are obtained by solving
    L_i W_i = A_{i,i+1} ,  i = 1, 2, \ldots, N-1 .    (33)
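The blockwise factorization and the associated forward and backward sweeps can be sketched with dense sub-blocks standing in for the out-of-core version described here (a block-Thomas recursion; the block size 3, the random data and the diagonal dominance below are illustrative assumptions, and pivoting is omitted, as in the discussion above):

```python
import numpy as np

def block_thomas_solve(D, S, T, b):
    """Solve a block-tridiagonal system with diagonal blocks D[i],
    sub-diagonal blocks S[i] (below D[i+1]) and super-diagonal blocks
    T[i] (right of D[i]); U[i] and W[i] play the roles of the
    factorization blocks of the text."""
    N = len(D)
    U, W = [None] * N, [None] * (N - 1)
    U[0] = D[0]
    for i in range(1, N):
        W[i - 1] = np.linalg.solve(U[i - 1], T[i - 1])  # U W = T
        U[i] = D[i] - S[i - 1] @ W[i - 1]
    y = [np.linalg.solve(U[0], b[0])]                   # forward sweep
    for i in range(1, N):
        y.append(np.linalg.solve(U[i], b[i] - S[i - 1] @ y[i - 1]))
    x = [None] * N                                      # backward sweep
    x[N - 1] = y[N - 1]
    for i in range(N - 2, -1, -1):
        x[i] = y[i] - W[i] @ x[i + 1]
    return np.concatenate(x)

rng = np.random.default_rng(0)
bs, N = 3, 4
D = [10.0 * np.eye(bs) + rng.standard_normal((bs, bs)) for _ in range(N)]
S = [rng.standard_normal((bs, bs)) for _ in range(N - 1)]
T = [rng.standard_normal((bs, bs)) for _ in range(N - 1)]
b = [rng.standard_normal(bs) for _ in range(N)]

# assemble the dense matrix only to check the block solver
Adense = np.zeros((bs * N, bs * N))
for i in range(N):
    Adense[i*bs:(i+1)*bs, i*bs:(i+1)*bs] = D[i]
    if i + 1 < N:
        Adense[(i+1)*bs:(i+2)*bs, i*bs:(i+1)*bs] = S[i]
        Adense[i*bs:(i+1)*bs, (i+1)*bs:(i+2)*bs] = T[i]
x = block_thomas_solve(D, S, T, b)
print(np.allclose(Adense @ x, np.concatenate(b)))   # True
```

Only three block rows are live at any moment, which is exactly the property the paper exploits to keep the factorization on disk and a small working set in fast memory.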
The factorization can then be performed with just the three rows i-1, i and i+1. These sub-blocks composing the shifted matrix A' as well as its factorization are stored on disk. The storage can be reduced to only three blocks kept simultaneously in the fast memory and by making access to the others through I/O from disk if required. The organization of the pivoting is left out in this discussion. The inverse vector iteration with evaluation of new vectors x_m, y_m, p_m and q_m at each step is logically organized like the storage-improved algorithm described above. These vectors of length d are partitioned into N parts and stored
accordingly. The matrix multiplication then involves three matrix sub-blocks acting on three vector fractions. If the efficiency of the I/O from and to disk is put aside, this out-of-core algorithm is more economic because fewer data are involved. Owing to the triangular form of the L_i and U_i, the evaluation of the blocks W_i only calls for forward solution in contrast to the usual forward and backward substitution. For the decomposition of N rows of blocks with block size b_s = 12,

    N_P = ( 2 + 2/3 + 1 ) N b_s^3 = ( 11/3 ) N b_s^3    (34)

operations are required, whereas the LINPACK routine CGBFA involves
Furthermore, the number of operations for one iteration solving for a new vector x is given by
which is smaller than the number for the LINPACK routine, CGBSL:
The final performance depends, of course, on the degree of vectorization achieved. The ratio of CPU time spent on the decomposition versus that of m iterations is
Again, it is evident that most of the CPU time is spent on the factorization. Therefore, good performance is achieved by successive mesh refinement with increasing dimension of the system, d_1 < d_2 < \ldots < d_m. It is emphasized that the evaluation of the Rayleigh quotient
does not require any additional I/O if these vector products are evaluated piecewise together with the evaluation of the corresponding vectors. Finally, an overall optimization of the algorithm requires a well-balanced ratio of CPU versus I/O operations. Since I/O operations have to be paid for, it is not advisable for a fast vector computer to work with few data in the fast memory and to rely on many I/O operations. Optimal performance is achieved by partitioning the available core into two pieces and by keeping as many blocks in memory as fit into one part. The following blocks are then read into the second part. Data transfer can be sped up by using different channels. The algorithm is tuned to perform I/O during execution. The sparseness within sub-blocks can be utilized to speed up this algorithm further. In particular, the symmetry and sparseness of B are used to reduce the storage requirement for B as well as to optimize the evaluation of the product y_m = B x_m. Results obtained by means of this out-of-core algorithm will be given elsewhere.
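Stripped of the out-of-core organization, the iteration itself can be sketched as follows. This is a dense stand-in under stated assumptions: the single banded factorization of A - \lambda_0 B is replaced by repeated dense solves, and the 3x3 pencil is illustrative, not the MHD matrix pair:

```python
import numpy as np

def inverse_vector_iteration(A, B, shift, x0, tol=1e-12, maxit=100):
    """Inverse vector iteration for A x = lambda B x with a fixed
    (possibly complex) shift; the eigenvalue estimate is the
    generalized Rayleigh quotient."""
    Ashift = A - shift * B          # factorized once in the real code
    x = x0 / np.linalg.norm(x0)
    lam = shift
    for _ in range(maxit):
        x = np.linalg.solve(Ashift, B @ x)
        x /= np.linalg.norm(x)
        lam_new = (x.conj() @ (A @ x)) / (x.conj() @ (B @ x))
        if abs(lam_new - lam) < tol * max(1.0, abs(lam_new)):
            return lam_new, x
        lam = lam_new
    return lam, x

# illustrative 3x3 pencil; the eigenvalue nearest the shift is found
A = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
B = np.eye(3)
lam, x = inverse_vector_iteration(A, B, shift=0.5, x0=np.ones(3))
print(lam)   # 2 - sqrt(2) = 0.5857...
```

The convergence factor per step is the ratio of the distances from the shift to the nearest and next-nearest eigenvalues, which is why a well-placed complex shift, and a restart with a new shift, accelerate the iteration so strongly.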
5. Results

The applications display the essential features of the resistive MHD spectrum. Moreover, the results have to establish that the chosen numerical method, in particular the special discretization using higher-order finite elements, is a good approximation of the exact spectrum. This involves careful inspection of the computed results using convergence studies and also comparison with exact analytical findings based on asymptotic boundary layer theory or WKBJ analysis.

5.1. Ideal MHD

The first application is aimed, naturally, at testing the performance of the new method by reproducing known results from ideal MHD. The entire spectrum of a plasma column with constant toroidal magnetic field and constant toroidal current density is an interesting case. The equilibrium is specified by \nu = 0 in eq. (7), yielding a parabolic pressure profile and a constant safety factor q.
Fig. 2: Complete, ideal (\eta = 0) spectrum of the constant current equilibrium (\nu = 0 in eq. (7)). The square of the eigenvalues (\lambda = i\omega) is plotted versus the safety factor with n = 1, m = -2 and k = 0.1. Three different branches occur, namely fast magnetoacoustic, Alfven and slow magnetoacoustic waves. Negative values for \omega^2 indicate exponentially growing instabilities. The entire spectrum is well resolved and no spurious eigenvalues due to numerical coupling of different branches occur, i.e. no "pollution".
In Fig. 2 the spectrum is displayed as a function of the safety factor. The square of the eigenfrequency (\lambda = i\omega) is plotted, positive values of \omega^2 corresponding to stable modes and negative values to exponentially growing unstable ones. Three parts of the spectrum can be clearly distinguished, namely the discrete fast modes, the Alfven modes, which for this equilibrium form a discrete set of modes, and the slow-mode continuum. If nq is sufficiently close to -m, the Alfven modes become unstable, as can be seen from Fig. 2, and for nq = -m there are infinitely many unstable modes. Our approach yields at this point as many instabilities as
correspond to the entire Alfven class, namely 1/3 of the spectrum modes. This result holds for all mesh sizes. The spectrum presented is in complete agreement with that of Chance et al. /19/, indicating that we can reproduce the spectrum without "pollution" /13,14/, especially the marginal points, in agreement with analytical results. It is emphasized that our results are obtained from a non-self-adjoint operator in conjunction with cubic and quadratic finite elements, and those of Ref. /19/ from a completely different self-adjoint operator in conjunction with linear and piecewise constant elements.
5.2. Resistive instabilities
As pointed out in the introduction, point eigenvalues of ideal MHD, such as the fast modes or instabilities, experience only a small change proportional to the resistivity. This property is verified numerically, but these resistive normal modes are not really of interest. With finite resistivity, the magnetic field is no longer frozen into the fluid. Such resistive instabilities are studied for realistic tokamak-like equilibria with peaked current density and constant toroidal field with \nu = 1 in eq. (7), and hence q(a)/q(0) = 2; it is given explicitly in eq. (9). We concentrate on the m = 2 mode.

[Eigenfunction plots a)-d); the caption is only partly legible: a) the fourth mode of the upper branch, b) the tenth mode of the lower branch, c) the last mode with oscillation, i.e. finite imaginary part, d) the twelfth mode of the upper branch. Note that the real and imaginary parts of the eigenfunctions are similar in structure and equal in magnitude.]
Purely damped modes emerge from a second branch point on the negative real axis of the \lambda-plane. The eigenfunctions are Bessel function-like with practically constant amplitude but an
increasing number of radial nodes away from this branch point towards the t,wo accumulation points X = 0 an d X = -co. If th e numerical resolution is not good enough, completely false results are obtained for normal modes with eigenvalues between t h e two branch points - like t.hose shown in Fig. 8b. Adequately representing both t h e radial oscillations and the amplitude modulation requires a much finer grid t han t h a t for resolving the purely damped modes. Only with t h e reported fine grids of N = 300 is one able to understand t h e numerical results near the branch points. T h e smaller the resistivity, t h e more eigenvalues lie on the 0. T h e ideal ( r j = 0) curve, b u t t h e curve itself becomes independent of r j in t h e limit r j Alfven continuum is approximated only a t t h e two end points in the limit of r j 4 0 by modes where t h e eigenfunction is peaked in a sniall layer at r = 0 an d r = 1.0. This laver width 6 decreases with r j as h ,I'/~. Figures Y a and 9d display t.he eigenfunctions of two cases
with almost the same eigenvalue but for two different values of the resistivity. The structure of the eigenfunction is similar, but for smaller resistivity more radial oscillations occur in a finite radial domain.
Controlled thermonuclear fusion research is aimed at achieving on earth the economic exploitation of fusion energy, which is the source of energy in the stars. For this purpose the plasma contained in magnetic confinement devices is being extensively studied experimentally and theoretically. The most dangerous instabilities which limit the operation of discharges are macroscopic. These can be described by the MHD model. The typical time scale of such gross instabilities in tokamaks ranges from microseconds to milliseconds. The properties of the linearized motion around an equilibrium state described by resistive MHD are of special interest and have prompted the numerical normal-mode analysis. Since dissipation is included, the Galerkin procedure yields a non-variational form leading to the general eigenvalue problem Ax = λBx, where A is a general and B a symmetric, positive-definite matrix. This allows a numerical search for stable plasma equilibria, but the stable part of the spectrum is also relevant for heating.

The first configuration studied is the constant-current equilibrium without resistivity. The plot of the eigenvalues, either real or purely imaginary, clearly displays three different branches, namely fast and slow magnetoacoustic waves and Alfvén waves. The accumulation point of the fast modes tends to infinity; the slow and Alfvén modes usually form continua approaching zero frequency. The main result is that the discretization based on cubic and quadratic finite elements gives an optimal approximation for the entire spectrum. A less sophisticated discretization yields false eigenvalues due to numerical coupling of the different branches ("pollution").

The resistivity has small values for typical tokamak discharges. Then the influence of resistivity on the ideal fast modes and on ideal instabilities is weak and is therefore not of interest. Important are new, resistive instabilities which decouple the fluid from the magnetic field. The problem has quite different spatial and temporal scales, such as small resistive layers, which require a fine grid for accurate numerical approximation. In the applications, pressure- and current-driven resistive instabilities have been discussed, including overstable modes, i.e. complex eigenvalues.
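The general eigenvalue problem Ax = λBx described above, with a general A and a symmetric positive-definite B, can be explored with a standard dense solver; a minimal sketch (the matrices here are made up for illustration, not taken from the MHD discretization):

```python
import numpy as np
from scipy.linalg import eig

# Ax = lambda Bx with a general (nonsymmetric) A and a symmetric
# positive-definite B, the structure produced by the Galerkin procedure.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))          # general matrix
M = rng.standard_normal((4, 4))
B = M @ M.T + 4.0 * np.eye(4)            # symmetric positive definite
w, V = eig(A, B)                         # generalized eigenpairs
# each computed pair satisfies the pencil equation A v = w B v:
for i in range(4):
    assert np.allclose(A @ V[:, i], w[i] * (B @ V[:, i]))
```

Because B is positive definite, all eigenvalues are finite; complex (overstable) eigenvalues are returned as complex entries of `w`.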
Definition 2.2. The global spectral condition number for σ is spc(σ) = cond2(V) ||X_σ||. The spectral condition number for the corresponding invariant subspace M is spc(M) = ||Z'||_F.
We recall that ||X_σ|| = ||(cos Θ)^(-1)|| = (cos θ_max)^(-1), and that ||Z'|| depends on δ^(-1), cond2(V), and cond2(V̄) (see Example 2.3 below). When A is Hermitian or normal, X_σ = Q, V is unitary, cond2(V) = 1, Θ = 0, and ||Z'|| = ||Z'||_F = δ^(-1). We illustrate the dependence of ||Z'|| upon δ^(-1), cond2(V), and cond2(V̄) by the following example.

Example 2.3. Let
F. Chatelin
A = [ a  c  0  0  0
      0  1  d  0  0
      0  0  1  d  0
      0  0  0  1  d
      0  0  0  0  1 ].

Clearly, δ = |a - 1|, cond2(V) depends on |c|, and cond2(V̄) depends on |d|. We set α = ||Z'||_F and let a = 0.8, so δ = 0.2.

i) Let c = 1/d.

Table 2.1: α for various values of c and d.

ii) Let c = 1, d = -1.

Table 2.2: α for various values of δ.
Ill Conditioned Eigenproblems
Table 2.1 (Table 2.2) shows that α increases when c or d (δ^(-1)) increases. There are similar results for matrices B = [ a  c ; 0  b ], a ≠ b.
A
Example 2.4. Consider the particular case σ = {λ}, where λ has multiplicity m. If λ is semi-simple, B = λI_m and cond2(V) = 1. But if λ is defective, cond2(V) may be large (see Example 2.2) and λ may be ill-conditioned without ||X_σ|| being large. When cond2(V) ||X_σ|| is moderate, a defective eigenvalue is globally well-conditioned, in contrast with what happens when it is treated individually (see Example 2.1). cond2(V) is large when there exists a matrix close to B with a defective eigenvalue having an almost degenerate Jordan basis.
Example 2.5. The matrix A = [ 1  10^4 ; 0  0 ] has eigenvalues {1, 0} and eigenvectors V = [ 1  1 ; 0  -10^(-4) ], with cond2(V) ~ 10^4. It is easily checked that a nearby matrix is defective with double eigenvalue 1/2 and a Jordan basis V' with cond2(V') ~ 10^4. The departure from normality of A is of order 10^4.
A
2.3. BALANCING OF A NONNORMAL MATRIX.
The quantity ||X_σ|| is not invariant under a diagonal similarity transformation on A. It is therefore advisable to balance the matrix. The relevant ||X_σ|| is the one which corresponds to the matrix Δ^(-1)AΔ such that ||Δ^(-1)AΔ|| is close to its minimum.

Example 2.6. Let
A = [ 1  10^4 ; 0  0 ]  and  Δ = diag(1, 10^(-4)); then A' = Δ^(-1)AΔ = [ 1  1 ; 0  0 ]. The eigenvectors of A are

X = [ 1  1 ; 0  -10^(-4) ]

and those of A' are

X' = Δ^(-1)X = [ 1  1 ; 0  -1 ].
Balancing A with Δ has decreased cond2(X), as well as the departure from normality of A.
A
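A numerical illustration of this balancing effect, using the matrices as reconstructed in Example 2.6 (the helper name is ours):

```python
import numpy as np

def eigvec_cond(M):
    """2-norm condition number of the eigenvector matrix of M."""
    _, V = np.linalg.eig(M)
    return np.linalg.cond(V)

A = np.array([[1.0, 1e4],
              [0.0, 0.0]])
D = np.diag([1.0, 1e-4])
A_bal = np.linalg.inv(D) @ A @ D      # D^{-1} A D = [[1, 1], [0, 0]]

# balancing shrinks the eigenvector condition number dramatically:
assert eigvec_cond(A) > 1e3           # nearly parallel eigenvectors
assert eigvec_cond(A_bal) < 10.0      # well separated after balancing
```

This is exactly what LAPACK-style balancing (a diagonal similarity) buys before a nonsymmetric eigensolve.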
2.4. GROUPING THE EIGENVALUES.
To compute ill-conditioned eigenvalues and/or eigenvectors, one tries to group (i.e. treat simultaneously) the eigenvalues on which a perturbation has the most effect, and/or the associated eigenvectors. This should be done so as to decrease as much as possible the corresponding spectral condition numbers.
When A is normal, spc(σ) = 1 and spc(M) = δ^(-1). Hence, grouping close eigenvalues will decrease spc(M). When A is nonnormal, spc(σ) = cond2(V) ||X_σ|| and spc(M) = ||Z'||_F, which depends on cond2(V), cond2(V̄) and δ^(-1). By grouping certain eigenvalues, one may decrease δ^(-1) and ||X_σ||, but cond2(V) and cond2(V̄) remain unchanged.
Example 2.7 (Stewart [1972]). Consider

A = [ 1  10^4  0 ; 0  0  0 ; 0  0  .5 ].

It has eigenvalues {1, 0, .5}. The first two are ill-conditioned, whereas the corresponding eigenvectors (1, 0, 0)^T and (1, -10^(-4), 0)^T are well-conditioned. Indeed

A' = [ 1  10^4  0 ; 1.1×10^(-5)  0  0 ; 0  0  .5 ]

has eigenvalues {1.1, -0.1, .5}. The first two eigenvectors are approximately (1, 10^(-5), 0)^T and (1, -1.1×10^(-4), 0)^T.
We group the two ill-conditioned eigenvalues: σ = {0, 1}. We find spc(σ) ~ 10^4 and spc(M) ~ 10^4, where M = lin(e1, e2) is the invariant subspace associated with σ. Observe that grouping the eigenvalues has not decreased the spectral condition numbers, since

B = [ 1  10^4 ; 0  0 ]

has eigenvectors

V = [ 1  1 ; 0  -10^(-4) ]

with cond2(V) ~ 10^4. One may check that, for A', the basis X' in the invariant subspace M', normalized by Q*X' = I with Q = [e1, e2], is given by X' = XB. One may comment that almost parallel but well-conditioned eigenvectors generate an ill-conditioned invariant subspace.
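The individual eigenvalue condition numbers in this Stewart-type example can be checked numerically: with unit left and right eigenvectors y_i, x_i, the standard individual condition number is 1/|y_i* x_i| (the matrix entries follow the reconstruction above):

```python
import numpy as np
from scipy.linalg import eig

A = np.array([[1.0, 1e4, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.5]])
w, VL, VR = eig(A, left=True, right=True)   # unit left/right eigenvectors
conds = {}
for i in range(3):
    s = abs(VL[:, i].conj() @ VR[:, i])     # |y_i* x_i|
    conds[round(w[i].real, 6)] = 1.0 / s

# eigenvalues 1 and 0 are ill-conditioned, .5 is perfectly conditioned:
assert conds[1.0] > 1e3 and conds[0.0] > 1e3
assert conds[0.5] < 1.01
```

The eigenvectors for 1 and 0 are individually harmless; it is the near-parallelism of the pair that makes the eigenvalues, and the spanned invariant subspace, sensitive.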
The reader will find computational criteria to decide when and where to group eigenvalues in the paper by Demmel and Kågström in these Proceedings.
3. SIMULTANEOUS NEWTON'S ITERATIONS
We study the relationship between simultaneous inverse iterations and Newton's iteration in the case of a defective multiple eigenvalue. To set up the framework, we start with a simple eigenvalue.

3.1. λ IS SIMPLE.
Let λ be a simple eigenvalue of A. Then for any y, the eigenvector x normalized by y*x = 1 satisfies

F(x) = Ax - x(y*Ax) = 0.   (3.1)

Let u be a starting vector and let y be given such that y*x ≠ 0. The normalization y*x = 1 is linear, in contrast with what is usually done (x*x = 1) (see Anselone-Rall [1968]). Newton's iteration on Equation (3.1) is given by

x_0 = u/(y*u),  z = x_{k+1} - x_k,

where z satisfies

(I - x_k y*)Az - z(y*Ax_k) = -F(x_k),  k ≥ 0.

This is well known to be equivalent to a right Rayleigh quotient iteration (see Chatelin [1984]).
Newton’s method is expensive so we consider the modified Newton’s iteration, where scalar chosen close to A, and u is a starting vector.
- Xk, (I - Xky*)AZ - U Z = - F(Xk), k 2 0 ,
xg = u/y*u, z = Xk+,
or, equivalently
a
is a
Proposition 3.1. The modified Newton’s iteration given by Equations (3.2) is equivalent to the inverse iteration on A.
Proof. The vectors q_k and x_k defined by these iterations are easily seen to be parallel. They correspond respectively to the normalizations ||q_k|| = 1 and y*x_k = 1.
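A small numerical check of this equivalence (the matrix, shift, and starting vectors below are made up; only the directions of the iterates are compared):

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((5, 5))
A = (G + G.T) / 2                       # symmetric, so the spectrum is real
sigma = 0.3                             # plays the role of sigma in (3.2)
u = rng.standard_normal(5)
y = rng.standard_normal(5)

q = u / np.linalg.norm(u)               # inverse iteration iterate
x = u / (y @ u)                         # modified Newton iterate
I = np.eye(5)
for _ in range(6):
    q = np.linalg.solve(A - sigma * I, q)
    q /= np.linalg.norm(q)
    F = A @ x - x * (y @ A @ x)         # F(x) = Ax - x(y*Ax)
    J = (I - np.outer(x, y)) @ A - sigma * I
    x = x + np.linalg.solve(J, -F)
    x /= np.linalg.norm(x)              # rescale; only the direction matters

# the two sequences of iterates are parallel:
assert abs(q @ x) > 1 - 1e-6
```

Both recursions produce (up to scaling) the vector (A - σI)^(-k) u, which is why the directions agree at every step.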
The two methods are mathematically equivalent but not numerically. Indeed, Equations (3.2) are a defect correction method (see Stetter [1978]) and higher accuracy can be achieved if the residual F(x_k) is computed with higher accuracy (see Dongarra et al. [1983] for an implementation). We now turn to a multiple eigenvalue λ, with invariant subspace M.

3.2. SIMULTANEOUS INVERSE ITERATIONS.
Let σ be close to λ. Let U = [u_1, ..., u_m] be a set of m independent vectors. The method of simultaneous inverse iterations can be written as the following recursion:

U = Q_0 R_0,  Q_0*Q_0 = I, where R_0 is an upper triangular matrix (Schmidt factorization);   (3.3)
(A - σI)Y_{k+1} = Q_k,  Y_{k+1} = Q_{k+1} R_{k+1},  k ≥ 0.

For all the theorems stated in the rest of the paper, the proofs are omitted (for length's sake). They can be found in Chatelin [1986].
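A direct transcription of recursion (3.3); the test matrix, shift, and starting block are ours:

```python
import numpy as np

def simultaneous_inverse_iteration(A, sigma, U, steps=20):
    Q, _ = np.linalg.qr(U)                    # U = Q_0 R_0 (Schmidt factorization)
    M = A - sigma * np.eye(A.shape[0])
    for _ in range(steps):
        Y = np.linalg.solve(M, Q)             # (A - sigma I) Y_{k+1} = Q_k
        Q, _ = np.linalg.qr(Y)                # Y_{k+1} = Q_{k+1} R_{k+1}
    return Q

A = np.diag([1.0, 1.01, 5.0, 9.0])
A[0, 1] = 0.1                                 # make A nonnormal
rng = np.random.default_rng(2)
Q = simultaneous_inverse_iteration(A, 0.9, rng.standard_normal((4, 2)))
# Q now spans the invariant subspace of the two eigenvalues nearest 0.9:
residual = np.linalg.norm(A @ Q - Q @ (Q.T @ A @ Q))
assert residual < 1e-8
```

With the two wanted eigenvalues (1 and 1.01) close together and well separated from the rest, the block converges quickly; the instability discussed next only appears for defective clusters.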
Lemma 3.2. If spc(M) is moderate, the error incurred while solving (A - σI)Y = U lies mainly in the wanted invariant subspace M.
Theorem 3.3. Let Q be an orthonormal basis in M. If λ is defective, the m vectors Z = [z_1, ..., z_m] which are the solutions of (A - σI)Z = Q are almost dependent.
Corollary 3.4. If λ is defective, the simultaneous inverse iterations are unstable.
Proof. The basis Y_k computed by Equation (3.3) is almost degenerate, since the basis Q_k tends to Q. □
3.3. SIMULTANEOUS NEWTON’S ITERATIONS
Let λ be a multiple eigenvalue of A and let M be the corresponding invariant subspace of A of dimension m. A basis X in M normalized by Y*X = I_m is a solution of the quadratic equation

F(X) = AX - X(Y*AX) = 0.   (3.4)

Conversely, if B = Y*AX is regular, the solution X of Equation (3.4) is a basis in M normalized by Y*X = I_m. B is an m × m matrix having a multiple eigenvalue. If λ is semi-simple, B = λI_m, but if λ is defective, B is not normal.

Remark. By a change of variable, Equation (3.4) can be transformed into an algebraic Riccati equation, often used in control theory (see Demmel [1985] and Arnold-Laub [1984]).
3.4. MODIFIED NEWTON’S METHOD
The following modified Newton's method is mathematically equivalent to the simultaneous iterations in Equations (3.3), in the sense of Proposition 3.5 below:

X_0 = U,  Y*U = I_m,  Z = X_{k+1} - X_k,  (I - X_k Y*)AZ - σZ = -F(X_k),  k ≥ 0.   (3.5)
Proposition 3.5. The bases X_k and Q_k, computed respectively by the methods in Equations (3.5) and (3.3), span the same subspace (A - σI)^(-k)S, where S = lin{U}.
To study the convergence of the method defined by Equations (3.5), one may compare it with the simplified Newton's iteration

(I - X_k Y*)AZ - Z B̄ = -F(X_k),  k ≥ 0,   (3.6)

with B̄ = Y*AU. We can interpret Equation (3.5) as the replacement of B̄ in Equation (3.6) by σI. This is legitimate when ||B̄ - σI|| is very small. Then the question arises: how well is B̄ approximated by σI? If B̄ is diagonalizable, then we set B̄ = WΛW^(-1), and

||B̄ - σI||_2 = ||W(Λ - σI)W^(-1)|| ≤ cond2(W) max_i |λ_i - σ|,

where sp(B̄) = {λ_1, ..., λ_m}.
The following proposition demonstrates, however, that when λ is defective, we cannot expect to be able to approximate B̄ by σI. In this proposition U is an orthonormal set of starting vectors and X is a basis for the invariant subspace corresponding to λ.

Proposition 3.6. If λ is defective and B̄ is diagonalizable, then cond2(W) is necessarily large when ||U - X|| is small enough.
The main interest of the simultaneous iterations method given in Equations (3.3) is that it keeps fixed the matrix A - σI of the system to be solved at each step. This same purpose can be achieved by the following modification of the modified Newton's method in Equations (3.5), which is stable when λ is defective.
3.5. A STABLE MODIFIED NEWTON’S METHOD
As we have just seen, when λ is defective, B̄ is not well represented by σI, and this fact accounts for the instability of the simultaneous iterations method in Equations (3.3). We can, however, obtain a stable method by introducing a Schur decomposition.

Consider the Schur decomposition B̄ = QTQ*, where T = diag(λ_i) + N and N is strictly triangular. We set T̂ = σI + N and B̂ = QT̂Q*; then ||B̄ - B̂|| = max_i |λ_i - σ|. This quantity is small for any value of cond2(W) when ||X - U|| is small enough. We are led to the following modified Newton's method, which is stable when λ is defective:

X_0 = U,  Z = X_{k+1} - X_k,  (I - X_k Y*)AZ - Z B̂ = -F(X_k),  k ≥ 0.   (3.7)

The Sylvester equation in (3.7) yields systems with the fixed matrix A - σI to solve (see Golub et al. [1979] for an efficient algorithm). A natural choice for σ in this context is the arithmetic mean λ̂ = (1/m) Σ_{i=1}^{m} λ_i.
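The Sylvester equation in (3.7) is small (m columns) and can be solved with a standard Bartels-Stewart-type routine; a sketch with made-up data, using SciPy's solver:

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
Bhat = np.array([[0.5, 1.0],
                 [0.0, 0.5]])                  # sigma*I + N, with sigma = 0.5
C = rng.standard_normal((6, 2))
Z = solve_sylvester(A, -Bhat, C)               # solves A Z + Z (-Bhat) = C
assert np.allclose(A @ Z - Z @ Bhat, C, atol=1e-6)
```

Because B̂ = σI + (strictly triangular part), back-substitution on its triangular structure reduces the solve to repeated systems with the single fixed matrix A - σI, which is the efficiency point made above.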
REFERENCES
[1] Ahués, M.; Alvizu, J.; Chatelin, F. (1986) Efficient computation of a group of close eigenvalues for integral operators. Proc. IMACS World Congress, Oslo, 4-9 August 85, North-Holland, Amsterdam (to appear).
[2] Anselone, P.M.; Rall, L.B. (1968) The solution of characteristic value-vector problems by Newton's method. Numer. Math. 11, 38-45.
[3] Arnold, W.; Laub, A. (1984) Generalized eigenproblem algorithms and software for the algebraic Riccati equations. Proc. IEEE, 72, (12).
[4] Björck, Å.; Golub, G.H. (1973) Numerical methods for computing angles between linear subspaces. Math. Comp. 27, 579-594.
[5] Chatelin, F. (1984) Simultaneous Newton's iteration for the eigenproblem. Computing, Suppl. 5, 67-74.
[6] Chatelin, F. (1986) Valeurs propres de matrices. Masson, Paris (to appear).
[7] Davis, C.; Kahan, W. (1968) The rotation of eigenvectors by a perturbation. III. SIAM J. Numer. Anal. 7, 1-46.
[8] Demmel, J. (1985) Three methods for refining estimates of invariant subspaces. Tech. Rep. 185, Computer Science Dept., Courant Institute, New York University, New York.
[9] Demmel, J.; Kågström, B. (1986) Stably computing the Kronecker structure and reducing subspaces of singular pencils A - λB for uncertain data, these Proceedings.
[10] Dongarra, J.J.; Moler, C.B.; Wilkinson, J.H. (1983) Improving the accuracy of computed eigenvalues and eigenvectors. SIAM J. Numer. Anal. 20, 23-45.
[11] Golub, G.H.; Nash, S.; Van Loan, C. (1979) A Hessenberg-Schur form for the problem AX + XB = C. IEEE Trans. Aut. Control AC-24, 909-913.
[12] Golub, G.H.; Van Loan, C. (1984) Matrix computations. North Oxford Academic, Oxford.
[13] Peters, G.; Wilkinson, J.H. (1979) Inverse iteration, ill-conditioned equations and Newton's method. SIAM Rev. 21, 339-360.
[14] Rosenblum, M. (1956) On the operator equation BX - XA = Q. Duke Math. J. 23, 263-269.
[15] Stetter, H.J. (1978) The defect correction principle and discretization methods. Numer. Math. 29, 425-443.
[16] Stewart, G.W. (1971) Error bounds for invariant subspaces of closed operators. SIAM J. Numer. Anal. 8, 796-808.
[17] Varah, J.M. (1979) On the separation of two matrices. SIAM J. Numer. Anal. 16, 216-222.
Large Scale Eigenvalue Problems, J. Cullum and R.A. Willoughby (Editors) © Elsevier Science Publishers B.V. (North-Holland), 1986
Stably Computing the Kronecker Structure and Reducing Subspaces of Singular Pencils A - λB for Uncertain Data

James Demmel
Courant Institute
251 Mercer St.
New York, NY 10012
USA

Bo Kågström
Institute of Information Processing
University of Umeå
S-901 87 Umeå
Sweden
We present algorithms and error bounds for computing the Kronecker structure (generalized eigenvalues and eigenvectors) of matrix pencils A - λB. Applications of matrix pencils to descriptor systems, singular systems of differential equations, and state-space realizations of linear systems demand that we accurately compute features of the Kronecker structure which are mathematically ill-posed and potentially numerically unstable. In this paper we show this can be accomplished efficiently and with computable error bounds.

1. Introduction
1.1 Motivation and short summary
During the last few years there has been an increasing interest in the numerical treatment of general matrix pencils A - λB and the computation of the Kronecker canonical form, or KCF. The main reason for this interest is that in many applications, e.g. linear systems theory [40], descriptor systems [1, 24, 31] and singular systems of differential equations [4], [46], problems are modeled in terms of linear matrix pencils. Given information about the KCF, questions about the existence and the unicity of solutions, the state of a system, or even explicit solutions can easily be answered. Algorithms have been proposed by several authors (see section 4) for computing the KCF. These algorithms share the property of backward stability: they compute the KCF (or some of its features discussed in section 1.2) for a pencil C - λD which lies some small (or at least monitorable) distance from the problem supplied as input, A - λB. A natural question to ask is how such a perturbation, measured as ε = ||(A, B) - (C, D)|| (be it roundoff, a user supplied estimate of measurement error, or a user supplied error tolerance), can affect the KCF or related quantities being computed. In particular a perturbation analysis where the size of the perturbation is a parameter (not necessarily small) is needed to rigorously determine the effects of the uncertainty in the data (see sections 4 and 5). It is not at all obvious that such a perturbation analysis is possible at all, since the mathematical problem is frequently unstable, so that arbitrarily small perturbations in the data may cause large changes in the answer. For some applications (e.g. in linear systems theory) it is important to calculate unstable features of the KCF because they still contain pertinent physical information.
In fact one sometimes wants to compute as unstable (nongeneric) a KCF as possible, because this in a certain sense allows low dimensional approximations to high dimensional problems, in particular low
J. Demmel and B. Kågström
dimensional approximate minimal realizations of linear systems ([39], [48]; see also sections 2.3 and 2.4). In this setting one wants to know what nongeneric pencils lie within a distance ε of a given pencil A - λB, in order to pick the one supplying the best approximation (see section 4.5). Before we go into further details we outline the rest of the paper. In section 1.2 we collect the basic algebraic theory of linear matrix pencils, for example regular and singular pencils, the KCF, minimal indices, and generalized subspaces (deflating and reducing subspaces). In section 1.3 we define other notation. Section 2 describes some applications where general matrix pencils A - λB appear, for example descriptor systems (section 2.1), singular systems of differential equations (section 2.2), and state-space realizations of (generalized) linear systems (sections 2.3 and 2.4). Section 3 considers algorithms for computing the Kronecker structure and its relevant features like reducing subspaces. The section starts with a geometric interpretation of a singular pencil in canonical form (KCF) and introduces the RGSVD and RGQZD algorithms ([19, 20]) that are reviewed in section 3.2. In section 3.3 the transformation of a singular pencil to a generalized upper triangular form (GUPTRI) is discussed. A new more efficient implementation of the RGQZD algorithm, which is used as the basic decomposition in GUPTRI, is also presented. Section 3.4 is concluded with a short review of other approaches and algorithms, notably Kublanovskaya [16, 17], Van Dooren [39, 41], and Wilkinson [46]. In section 4 the perturbation theory of singular pencils is formally introduced. The section begins by reviewing the perturbation theory for regular pencils (section 4.1). Section 4.2 discusses why the singular case is harder than the regular case. Perturbation bounds for pairs of reducing subspaces and the spectrum of the regular part are presented (sections 4.3 and 4.4).
For a more complete presentation and proofs the reader is referred to [11]. Finally in section 4.5 we make some interpretations of the perturbation results to linear systems theory by deriving perturbation bounds for the controllable subspace and uncontrollable modes. We illustrate the perturbation results with two numerical examples. Armed with the perturbation analysis for singular pencils, section 5 is devoted to analyzing the error of standard algorithms for computing the Kronecker structure from section 3. An upper bound for the distance from the input pencil A - λB to the nearest pencil C - λD with the computed Kronecker structure, and perturbation bounds for computed pairs of reducing subspaces, are presented (sections 5.1 and 5.2, respectively). In section 6 we present some numerical examples (one generic and one nongeneric pencil) and assess the computed results by using the theory from sections 4 and 5. Finally in section 7 we give some conclusions and outline directions for our future work.

1.2 Algebraic theory for singular pencils
In this section we outline the algebraic theory of singular pencils in terms of the Kronecker Canonical Form (KCF). Suppose A, B, S and T are m by n complex matrices. If there are nonsingular matrices P and Q, where P is m by m and Q is n by n, such that

A - λB = P(S - λT)Q^(-1)   (1.1)

then we say the pencils A - λB and S - λT are equivalent and that P(·)Q^(-1) is an
Stably Computing the Kronecker Structure
equivalence transformation. Our goal is to find P and Q so that S - λT is in a particularly simple form, block diagonal: S = diag(S_11, ..., S_bb) and T = diag(T_11, ..., T_bb). We can group the columns of P into blocks corresponding to the blocks of S - λT: P = [P_1 | ... | P_b], where P_i is m by m_i, m_i being the number of rows of S_ii - λT_ii. Similarly, we can group the columns of Q into blocks corresponding to the blocks of S - λT: Q = [Q_1 | ... | Q_b], where Q_i is n by n_i, n_i being the number of columns of S_ii - λT_ii. The diagonal blocks S_ii - λT_ii contain information about the generalized eigenstructure of the pencil A - λB, and P_i and Q_i contain information about the corresponding generalized eigenspaces. One canonical decomposition of the form (1.1) is the Kronecker Canonical Form [12], where each block S_ii - λT_ii must be of one of the following forms:
J_k(λ_0) - λI_k: this is simply a Jordan block; λ_0 is called a finite eigenvalue of A - λB.

I_k - λJ_k(0): this block corresponds to an infinite eigenvalue of multiplicity equal to the dimension of the block. The blocks of finite and infinite eigenvalues together constitute the regular part of the pencil.

L_k, the k by k+1 bidiagonal pencil with -λ on the diagonal and 1 on the superdiagonal: this block is called a singular block of minimal right (or column) index k. It has a one dimensional right null space for any λ.

L_j^T, the j+1 by j transpose of an L_j block: this block is called a singular block of minimal left (or row) index j. It has a one dimensional left null space for any λ. The left and right singular blocks together constitute the singular part of the pencil.

If a pencil A - λB has only a regular part, it is called regular. A - λB is regular if and only if it is square and its determinant det(A - λB) is not identically zero. Otherwise, there is at least one singular block L_k or L_j^T in the KCF of A - λB and it is called singular. In the regular case, A - λB has n generalized eigenvalues, which may be finite or infinite. The diagonal blocks of S - λT partition the spectrum of A - λB as follows:
σ = σ(A - λB) = ∪_{i=1}^{b} σ(S_ii - λT_ii) = ∪_{i=1}^{b} σ_i .
The subspaces spanned by P_i and Q_i are called left and right deflating subspaces of A - λB corresponding to the part of the spectrum σ_i [32, 41]. As shown in [41], a pair of subspaces P and Q is deflating for A - λB if P = AQ + BQ and dim(Q) = dim(P). They are the generalization of invariant subspaces for the standard eigenvalue problem A - λI to the regular pencil case: Q is a (right) invariant subspace of A if Q = AQ + Q, i.e. AQ ⊆ Q. Deflating subspaces are determined uniquely by the requirement that S - λT in (1.1) be block diagonal: different choices of the P_i or Q_i submatrices will span the same spaces P_i and Q_i. The situation is not as simple in the singular case. The following example shows that the spaces spanned by the P_i and Q_i may no longer all be well defined:
As x grows large, the space spanned by Q_2 (the last column of Q) can become arbitrarily close to the space spanned by Q_1 (the first two columns of Q). Similarly the space spanned by P_2 (the last column of P) can become arbitrarily close to the space spanned by P_1 (the first column of P). Thus, we must modify the notion of deflating subspace used in the regular case, since these subspaces no longer all have unique definitions. The correct concept to use is reducing subspace, as introduced in [41]. P and Q are reducing subspaces for A - λB if P = AQ + BQ and dim(P) = dim(Q) - dim(N_r), where N_r is the right null space of A - λB over the field of rational functions in λ. It is easy to express dim(N_r) in terms of the KCF of A - λB: it is the number of L_k blocks in the KCF [41]. In the example above, N_r is one dimensional and spanned by [1, λ, 0]^T. In this example the nontrivial pair of reducing subspaces are spanned by P_1 and Q_1 and are well defined. In terms of the KCF, we may define reducing subspaces as follows. Assume in (1.1) that S - λT is in KCF, with diagonal blocks S_11 - λT_11 through S_rr - λT_rr of the type L_k (right singular blocks), S_{r+1,r+1} - λT_{r+1,r+1} through S_{r+reg,r+reg} - λT_{r+reg,r+reg} regular blocks (Jordan blocks or blocks with a single infinite eigenvalue), and the remaining S_ii - λT_ii blocks of the type L_j^T (left singular blocks). Assume for simplicity that there is one 1 by 1 regular block for each distinct eigenvalue (finite or infinite). Then there are exactly 2^reg pairs of reducing subspaces, each one corresponding to a distinct subset of the reg eigenvalues. If P, Q are such a pair, Q is spanned by the columns of Q_1, ..., Q_r and those Q_i that correspond to the eigenvalues in the chosen subset. P is spanned by the columns of P_1, ..., P_r and those P_i corresponding to the chosen eigenvalues.
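For the regular case, a deflating subspace for a chosen part of the spectrum can be computed from an ordered generalized Schur (QZ) decomposition; a minimal sketch with a hand-picked regular pencil, using SciPy's `ordqz`:

```python
import numpy as np
from scipy.linalg import ordqz

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
B = np.eye(2)
# reorder so the eigenvalue closest to 2 comes first on the diagonal
AA, BB, alpha, beta, Q, Z = ordqz(A, B, sort=lambda a, b: abs(a / b - 2.0) < 0.5)
q1 = Z[:, :1]                       # basis of the right deflating subspace
# for this one-dimensional subspace, A q1 = 2 B q1:
assert np.allclose(A @ q1, 2.0 * (B @ q1))
```

For singular pencils no such off-the-shelf routine applies directly, which is exactly the gap the reducing-subspace machinery and the GUPTRI algorithm below are meant to fill.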
The smallest subset of eigenvalues is the null set, yielding P and Q of minimum possible dimension over all pairs of reducing subspaces; this pair is called the pair of minimal reducing subspaces. Similarly, the largest subset of eigenvalues is the whole set, yielding P and Q of maximal dimension; this pair is called the pair of maximal reducing subspaces. These pairs will later play a central role in applications. In any event, given A - λB, any pair of reducing subspaces is uniquely identified by specifying what subset of eigenvalues it corresponds to. We illustrate these concepts with the following 5 by 7 example:

P^(-1)(A - λB)Q = diag(L_0, L_1, J_2(0), 2J_1(∞)) =

[ 0  -λ  1   0   0  0  0 ]
[ 0   0  0  -λ   1  0  0 ]
[ 0   0  0   0  -λ  0  0 ]
[ 0   0  0   0   0  1  0 ]
[ 0   0  0   0   0  0  1 ]

Letting e_i denote the unit column vector with a 1 at position i and zeros elsewhere, we see that the minimal reducing subspaces are given by

Q_min = span{e_1, e_2, e_3} and P_min = span{e_1}

and the maximal reducing subspaces are given by

Q_max = span{e_1, e_2, e_3, e_4, e_5, e_6, e_7} and P_max = span{e_1, e_2, e_3, e_4, e_5}.

By choosing two different subsets of the spectrum of A - λB, we may get different pairs of reducing subspaces of the same dimensions. Thus Q = span{e_1, e_2, e_3, e_4, e_5} and P = span{e_1, e_2, e_3} correspond to the double eigenvalue at 0, and Q = span{e_1, e_2, e_3, e_6, e_7} and P = span{e_1, e_4, e_5} correspond to the double eigenvalue at ∞. This concludes our summary of the purely algebraic properties of pencils. Since we are interested in computing with them as well, we will now describe some of their analytic properties, and explain why computing the KCF of a singular pencil may be an ill-posed problem. Suppose first that A - λB is a square pencil. It turns out almost all square pencils are regular; in other words, almost any arbitrarily small perturbation of a square singular pencil will make it regular. We describe this situation by saying that a generic square pencil is regular. Similarly, if A - λB is an m by n nonsquare pencil, it turns out almost all m by n pencils have the same KCF; in particular they almost all have KCFs consisting of L_k blocks alone (if m < n)

and Q_max = span([Q_r, Q_reg])
Q = [Q_r, Q_reg]. A further partition of Q_reg = [Q_0, Q_f, Q_∞] and P_reg = [P_0, P_f, P_∞] gives us for example that
P = span([P_r, P_0, P_f]) and Q = span([Q_r, Q_0, Q_f]) is a pair of reducing subspaces corresponding to the finite eigenvalues of A - λB. If, for example, we want a pair of reducing subspaces associated with the right minimal indices and all eigenvalues in the left complex plane Re λ < 0 (stable modes in connection with linear systems, see sections 2.3-2.4), we ask for an ordering of the eigenvalues of A_f - λB_f such that these eigenvalues will be in the upper leftmost (north-west) corner of A_f - λB_f. Then we just have to partition Q_f and P_f accordingly in order to produce a pair of reducing subspaces associated with the stable modes. However, if for any reason we want to mix the zero and/or the infinite eigenvalues together with the nonzero eigenvalues, the computed Jordan block structure of A_0 - λB_0 and A_∞ - λB_∞ will be destroyed. Such a reordering can optionally be imposed directly when reducing A - λB to GUPTRI form or by postprocessing. The GUPTRI decomposition (3.14) is based on the RGQZD algorithm (see section 3.2). It utilizes two variants of a GQZD version of the reduction theorem 3.3. The first one, RZSTR, is a new more efficient implementation of this reduction theorem and computes the right minimal indices (structure) and the Jordan structure of the zero eigenvalues of A - λB. One step of deflation in RZSTR is computed in the following way. Let A^(i) - λB^(i) denote the pencil we are to decompose in step i via the unitary transformations P_i and Q_i such that P_i*(A^(i) - λB^(i))Q_i takes the deflated form below. We will use the following notation in the algorithm:
<U, Σ, V> := svd(A) computes the singular value decomposition A = UΣV* of A, with the singular values on the diagonal of Σ = diag(σ_k, ..., σ_1) in decreasing order. If one of U, Σ or V is "nil", that means it need not be computed. n := column_nullity(Σ, epsu) is the number of singular values of Σ which are less than or equal to epsu. <Q, R> := qr(A) computes the QR decomposition A = QR.

Algorithm RZSTR-kernel:
Step 1. Compress the columns of A^(i) to give n_i:
1.1 <nil, Σ_A^(i), V_A^(i)> := svd(A^(i)); n_i := column_nullity(Σ_A^(i), epsu)
1.2 Apply V_A^(i) to A^(i) and B^(i): [0 A_2] := A^(i)V_A^(i); [B_1 B_2] := B^(i)V_A^(i) (B_1 and the zero block to the left of A_2 each consist of n_i columns)
Step 2. Compress the columns of B_1 to give n_i - r_i:
2.1 <nil, Σ_B^(i), V_B^(i)> := svd(B_1); n_i - r_i := column_nullity(Σ_B^(i), epsu)
2.2 Apply V_B^(i) to B_1: [0 B_0] := B_1 V_B^(i) (B_0 has r_i columns), and build Q_i, the final right transformation in step i:

Q_i := V_A^(i) [ V_B^(i)  0 ; 0  I ]

Step 3. Triangularize [B_0, B_2] to give P_i, the final left transformation in step i:
3.1 <P_i, R_i> := qr([B_0, B_2]), giving [ *  * ; 0  B^(i+1) ] := R_i
3.2 Apply P_i to A_2
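In NumPy terms, one kernel step amounts to two rank decisions by SVD. A sketch that computes only the structure indices n_i, r_i (function names and the epsu default are ours, not from the paper):

```python
import numpy as np

def column_nullity_and_V(M, epsu):
    """Column nullity of M (an epsu-rank decision) and a V with the
    null directions ordered first, so that M @ V = [0 | M2]."""
    _, s, Vh = np.linalg.svd(M)
    V = Vh.conj().T[:, ::-1]                 # reverse: small singular values first
    nullity = M.shape[1] - int(np.sum(s > epsu))
    return nullity, V

def rzstr_step(A, B, epsu=1e-10):
    """One deflation step of the RZSTR kernel; returns (n_i, r_i)."""
    n, VA = column_nullity_and_V(A, epsu)    # step 1: compress columns of A
    B1 = (B @ VA)[:, :n]                     # first n columns of B V_A
    n_minus_r, _ = column_nullity_and_V(B1, epsu)  # step 2: compress columns of B1
    return n, n - n_minus_r

# the 1-by-2 pencil L_1 = [-lambda, 1]: A = [0 1], B = [1 0]
A, B = np.array([[0.0, 1.0]]), np.array([[1.0, 0.0]])
assert rzstr_step(A, B) == (1, 1)            # n_1 = 1, r_1 = 1
```

A full implementation would also accumulate the unitary transformations Q_i and P_i and recurse on the deflated pencil, as the kernel above prescribes.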
Next we apply the RZSTR-kernel to A^(i+1) - λB^(i+1), and so on until n_{i+1} = 0, or n_{i+1} ≠ 0 but r_{i+1} = 0, as in theorem 3.3. In steps 1.1 and 2.1 above we use epsu (a bound on the uncertainty in A and B that is given by the user) as a tolerance for deleting small singular values when computing nullities. All singular values smaller than epsu are deleted, but we always insist on a gap between the singular values we interpret as zeros and nonzeros, respectively. In section 5 we further discuss the influence of finite arithmetic on computed results. The second decomposition, LISTR, computes the left minimal structure and the Jordan structure of the infinite eigenvalues similarly to RZSTR by working on B - μA. The main difference is that column compressions are replaced by row compressions and we start working from the lower rightmost (south-east) corner of the pencil. As we said earlier it is also possible to apply RZSTR to B* - μA* to get the left minimal and infinity structures (see section 3.1). However, RZSTR working on B* - μA* will produce a block lower triangular form of A - λB, so in order to keep GUPTRI block upper triangular we make use of LISTR. In summary, the GUPTRI decomposition is computed by the following 6 reductions:

1: Apply RZSTR to A - λB. Gives the right minimal structure and the Jordan structure of the zero eigenvalue in A_0r - λB_0r:
ror-ABor* I
J. Dernmel and B. Khgstrom
3 00
0
P1*(A-hB)Q1 =
A1-XB1
The block structure of A0r − λB0r is exposed by the structure indices ni and ri (column nullities, see theorems 3.1 and 3.3).

2: Separate the right and zero structures by applying RZSTR to B0r − μA0r. Gives the right minimal structure in Ar − λBr:

    P2*(A0r − λB0r)Q2 = [ Ar − λBr       *      ]
                        [     0      A2 − λB2   ]
Insist on the same right minimal indices as in reduction 1.

3: Apply RZSTR to A2 − λB2, which contains the zero eigenvalues of A − λB, giving

    P3*(A2 − λB2)Q3 = A0 − λB0 .

Note that reduction 2 destroys the block structure of the zero eigenvalues, but this reduction gives it back by insisting on the same Jordan structure as in reduction 1. Reductions 1 to 3 in summary:

    diag(I, P3*, I) · diag(P2*, I) · P1* (A − λB) Q1 · diag(Q2, I) · diag(I, Q3, I) =

        [ Ar − λBr      *          *      ]
        [    0      A0 − λB0       *      ]
        [    0          0      A1 − λB1   ]
The pencil A1 − λB1 comes from reduction 1 and contains the possible left minimal structure and the remaining regular part of A − λB.

4: Apply LISTR to A1 − λB1. Since A1 − λB1 cannot have any zero eigenvalues, the reduction produces the block structure associated with the left indices in A1 − λB1:

    P4*(A1 − λB1)Q4 = [ A2 − λB2       *      ]
                      [     0      Al − λBl   ]

5: Apply LISTR to B2 − μA2. Gives the Jordan structure of the infinite eigenvalues in A1 − λB1:

    P5*(A2 − λB2)Q5 = [ A3 − λB3       *      ]
                      [     0      A∞ − λB∞   ]

6: Apply the QZ algorithm [27] to A3 − λB3, and a reordering process to the resulting upper triangular pencil, to give the desired pencil Af − λBf:

    P6*(A3 − λB3)Q6 = Af − λBf
Let αi and βi be the diagonal elements of Af and Bf, respectively. Then the finite eigenvalues of A − λB are given by αi/βi. Reductions 4 to 6 in summary:

    diag(P6*, I) · diag(P5*, I) · P4* (A1 − λB1) Q4 · diag(Q5, I) · diag(Q6, I) =

        [ Af − λBf      *          *      ]
        [    0      A∞ − λB∞       *      ]
        [    0          0      Al − λBl   ]
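The claim about the finite eigenvalues is easy to check numerically. The following small NumPy sketch uses a made-up upper triangular pencil standing in for the output of the QZ step (reduction 6); the particular entries are arbitrary.

```python
import numpy as np

# A made-up upper triangular pencil Af - lambda*Bf, as produced by QZ.
Af = np.array([[2., 5., 1.],
               [0., 3., 7.],
               [0., 0., 1.]])
Bf = np.array([[1., 2., 0.],
               [0., 2., 1.],
               [0., 0., 1.]])
alphas, betas = np.diag(Af), np.diag(Bf)
eigs = alphas / betas   # the finite eigenvalues alpha_i / beta_i
# Cross-check: with Bf nonsingular these are the eigenvalues of inv(Bf) @ Af.
check = np.linalg.eigvals(np.linalg.solve(Bf, Af))
```

Here the diagonal ratios give the spectrum {2, 3/2, 1}, agreeing with the eigenvalues of inv(Bf) Af.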
Stably Computing the Kronecker Structure

Note that the transformation matrices Pi and Qi in the 6 reductions above are all unitary, and the composite transformations P and Q in GUPTRI are given by

    Q = Q1 · diag(Q2, I) · diag(I, Q3, I) · diag(I, Q4) · diag(I, Q5, I) · diag(I, Q6, I)      (3.15a)

    P = P1 · diag(P2, I) · diag(I, P3, I) · diag(I, P4) · diag(I, P5, I) · diag(I, P6, I)      (3.15b)
where the identity matrices in (3.15a-b) are of appropriate sizes.

3.4 Some history and other approaches

During the last years we have seen an intensive study of computational aspects of spectral characteristics of general matrix pencils A − λB. Already in 1977 Vera Kublanovskaya presented her first paper (in Russian) on the AB-algorithm for handling spectral problems of linear matrix pencils. For more recent publications see [16, 17]. The AB-algorithm computes two sequences of matrices {Ak} and {Bk} satisfying

    Ak Bk+1 = Bk Ak+1 ,   k = 0, 1, ... ;   A0 = A ,  B0 = B

where Ak+1 and Bk+1 are blocks (one of them upper triangular) of the nullspace of the augmented matrix Ck = [Ak  Bk] in the following way:

    N(Ck) = R( [  Bk+1 ] )
               [ −Ak+1 ]
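A single AB step can be realized via an SVD null-space computation, in the spirit of the SVD-based variant of the algorithm; this is a sketch, `ab_step` is not a name from the paper, and no triangularization of the blocks is attempted here.

```python
import numpy as np

def ab_step(Ak, Bk, tol=1e-12):
    """One AB-algorithm step via the SVD (an illustrative sketch).

    The columns of [B_{k+1}; -A_{k+1}] form an orthonormal basis of
    N([Ak, Bk]), which enforces the relation Ak B_{k+1} = Bk A_{k+1}.
    """
    n = Ak.shape[1]
    Ck = np.hstack([Ak, Bk])
    _, s, Vh = np.linalg.svd(Ck)
    s = np.concatenate([s, np.zeros(2 * n - s.size)])   # pad to 2n values
    null_dim = int(np.sum(s <= tol))
    Z = Vh.conj().T[:, 2 * n - null_dim:]   # trailing right singular vectors
    return -Z[n:, :], Z[:n, :]              # A_{k+1}, B_{k+1}
```

For any Ak, Bk the returned pair satisfies Ak @ Bk1 = Bk @ Ak1 up to the deletion tolerance.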
By applying the AB-algorithm to a singular pencil A − λB we get the right minimal indices and the Jordan structure of the zero eigenvalues. Different ways to compute N(Ck) give rise to different algorithms. Kublanovskaya presents the AB-algorithm in terms of the QR (or LR) decomposition. In [18] a modified AB-SVD algorithm is suggested that more stably computes the range/nullspace separations in terms of the singular value decomposition.

In 1979 both Van Dooren and Wilkinson presented papers on computing the KCF. The paper [46] has already been mentioned in sections 2.1-2; there the KCF is derived in algorithmic terms, starting from a singular system of differential equations. In [39] Van Dooren presents an algorithm for computing the Kronecker structure which is a straightforward generalization of Kublanovskaya's algorithm for determining the Jordan structure of A − λI, as further developed by Golub and Wilkinson [15] and Kågström and Ruhe [21, 22]. His reduction is obtained under unitary equivalence transformations and is similar in form to the one obtained from the RGQZD algorithm. The main difference is that in RZSTR and LISTR we compute ni and ni − ri (dim(N(A(i−1))) and dim(N(A(i−1)) ∩ N(B(i−1))), respectively) from two column compressions, while Van Dooren computes ni and ri from one column compression and one row compression. Thus Van Dooren never computes a basis for the common nullspace of A(i−1) and B(i−1). The operation counts for one deflation step in Van Dooren's algorithm and in RZSTR are of the same order. For example, if m = n the first deflation step costs 2n³ + O(n²) operations for Van Dooren and n³ + O(n²) for RZSTR. Note that both algorithms use singular value decompositions for the critical rank/nullity decisions. If one considers less stable and robust procedures for determining a range/nullspace separation (like the QR or LR decompositions), it is possible to formulate faster variants of our algorithms. However, this is not recommended, except possibly in cases where we know for physical or modeling reasons that the underlying pencil must have a certain Kronecker structure.
4. Perturbation Theory for Pencils

4.1 Review of the Regular Case

The perturbation theory of regular pencils shares many of the features of the theory for the standard eigenproblem A − λI. In particular, for sufficiently small smooth perturbations, the eigenvalues and eigenvectors (or deflating subspaces) of a regular pencil also perturb smoothly. One can prove generalizations of both the Bauer-Fike ([6, 10, 11, 36]) and Gerschgorin theorems [33] to regular pencils. In order to deal with infinite eigenvalues, one generally uses the chordal metric

    χ(λ, λ') = |λ − λ'| / ( (1 + λ²)^(1/2) (1 + λ'²)^(1/2) )

to measure the distance between eigenvalues, since this extends smoothly to the case λ = ∞.
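The chordal metric is easy to implement directly; a small sketch for real (possibly infinite) eigenvalues, taking the infinite cases as the limits of the formula above:

```python
import math

def chordal(x, y):
    """Chordal distance chi(x, y) between two eigenvalues, with x or y
    allowed to be infinite (the one-point infinity of a pencil)."""
    if math.isinf(x) and math.isinf(y):
        return 0.0
    if math.isinf(x):
        return 1.0 / math.sqrt(1.0 + y * y)
    if math.isinf(y):
        return 1.0 / math.sqrt(1.0 + x * x)
    return abs(x - y) / (math.sqrt(1.0 + x * x) * math.sqrt(1.0 + y * y))
```

Note that χ is bounded by 1, so an infinite eigenvalue sits at the finite distance 1/(1 + λ²)^(1/2) from any finite λ, which is what makes the metric convenient here.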
In our context we are interested in bounds on the perturbations of eigenvalues and deflating subspaces when the size of the perturbation ε is a (not necessarily small) parameter, possibly supplied by the user as an estimate of measurement or other error. Thus, we would like to compute a decomposition of the form

    A − λB = P(S − λT)Q⁻¹ ,   S − λT block diagonal as in (1.1),

which supplies as much information as possible about all matrices in a ball

    P(ε) = { (A+E) − λ(B+F) : ‖(E,F)‖ ≤ ε } .
To illustrate and motivate our approach to regular pencils, we indicate how we would decompose P(ε) for various values of ε, where A − λB is given by

    A − λB = [ 1  0  0 ]     [ η  0  0 ]
             [ 0  0  0 ] − λ [ 0  1  0 ]
             [ 0  0  η ]     [ 0  0  1 ]

where η is a small number. It is easy to see that σ, the spectrum of A − λB, is σ = {1/η, 0, η}. The three deflating subspaces corresponding to these eigenvalues are spanned by the three columns of the 3 by 3 identity matrix. For ε sufficiently small, the spectrum of any pencil in P(ε) will contain 3 points, one each inside disjoint sets centered at 1/η, 0 and η. In fact, we can draw 3 disjoint closed curves surrounding 3 disjoint regions, one around each λ ∈ σ, such that each pencil in P(ε) has exactly one eigenvalue in the region surrounded by each closed curve. Similarly, the three deflating subspaces corresponding to each eigenvalue remain close to orthogonal. Thus, for ε sufficiently small, we partition σ into three sets, σ1 = {1/η}, σ2 = {0} and σ3 = {η}.
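The claimed spectrum, and the size of the perturbation that merges two of the eigenvalues, can be checked numerically; the particular values of η and ξ below are arbitrary choices for illustration.

```python
import numpy as np

eta = 1e-3
A = np.diag([1.0, 0.0, eta])
B = np.diag([eta, 1.0, 1.0])
# B is nonsingular, so the eigenvalues of A - lambda*B are those of
# inv(B) @ A: the diagonal ratios 1/eta, 0 and eta.
eigs = np.sort(np.linalg.eigvals(np.linalg.solve(B, A)).real)

# A perturbation of norm about eta/sqrt(2) (xi arbitrarily small, nonzero)
# moves both 0 and eta to a double eigenvalue at eta/2:
xi = 1e-12
Ap = np.diag([1.0, eta / 2, eta / 2])
Ap[1, 2] = xi
dist = np.linalg.norm(Ap - A)
pert = np.sort(np.linalg.eigvals(np.linalg.solve(B, Ap)).real)
```

Here `dist` comes out close to η/√2, matching the distance at which the two eigenvalue regions can no longer be separated.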
As ε increases to η/√2, it becomes impossible to draw three such curves, because there is a pencil almost within distance η/√2 of A − λB with a double eigenvalue at η/2:

    [ 1   0    0  ]     [ η  0  0 ]
    [ 0  η/2   ξ  ] − λ [ 0  1  0 ]
    [ 0   0   η/2 ]     [ 0  0  1 ]

where ξ is an arbitrarily small nonzero quantity. Furthermore, there are no longer three independent deflating subspaces, because the ξ causes the two deflating subspaces originally belonging to 0 and η to merge into a single two-dimensional deflating subspace. We can, however, draw two disjoint closed curves, one around
1/η and the other around 0 and η, such that every pencil in P(ε) has one eigenvalue inside the curve around 1/η and two eigenvalues inside the other curve. In this case we partition σ = {1/η, 0, η} into σ1 = {1/η} and σ2 = {0, η}.

As ε increases to 1, it is no longer possible to draw two disjoint closed curves, but merely one around all three eigenvalues, since it is possible to find a pencil inside P(ε) with η and 1/η having merged into a single eigenvalue near 1, as well as another pencil inside P(ε) where 0 and η have merged into a single eigenvalue near η/2. In this case we cannot partition σ into any smaller sets.

This example motivates the definition of a stable decomposition of a regular pencil: the decomposition in (1.1) is stable if the entries of P, Q, S and T are continuous and bounded functions of the entries of A and B as A − λB varies inside P(ε). In particular, we insist that the dimensions ni of the Sii − λTii remain constant for A − λB in P(ε). This corresponds to partitioning σ = ∪_{i=1}^{b} σi into disjoint pieces which remain disjoint for A − λB in P(ε). We illustrated this disjointness in the example by surrounding each σi by its own disjoint closed curve. For numerical reasons we will also insist that the matrices P and Q in (1.1) have their condition numbers bounded by some (user specified) threshold TOL for all pencils in P(ε). This is equivalent to insisting that the deflating subspaces belonging to different σi not contain vectors pointing in nearly parallel directions. In the above example, as ε grows to η/√2, there are pencils in P(ε) where the deflating subspaces belonging to η and 0 become nearly parallel:
    [ 1     0     0  ]     [ η  0  0 ]
    [ 0  η/2+δ    ξ  ] − λ [ 0  1  0 ]
    [ 0     0    η/2 ]     [ 0  0  1 ]

The two right deflating subspaces in question are spanned by [0, 1, 0]ᵀ and [0, −ξ/δ, 1]ᵀ, respectively, which become nearly parallel as δ approaches 0. The numerical reason for constraining the condition numbers of P and Q is that they indicate approximately how much accuracy we expect to lose in computing the decomposition (1.1) [8]. Therefore the user might wish to specify a maximum condition number TOL he is willing to tolerate in a stable decomposition, as well as specifying the uncertainty ε in his data.

With this introduction, we can state our perturbation theorem for regular pencils: it is a criterion for deciding whether a decomposition (1.1) is stable or not.

Theorem 4.1: Let A − λB, ε and TOL be given. Let σ = ∪_{i=1}^{b} σi be some partitioning of σ into disjoint sets. Define χi for 1 ≤ i ≤ b as

    χi = ε · max(pi, qi) / min( Difu(σi, σ−σi), Difl(σi, σ−σi) )

where pi, qi, Difu and Difl will be explained below. The corresponding decomposition (1.1) is stable if the following two criteria are satisfied:

    max_{1≤i≤b} χi < 1        (4.1)

and

    b · max_{1≤i≤b} max(pi, qi) ≤ TOL .        (4.3)
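The Dif quantities appearing in the theorem can be computed as the smallest singular value of a Sylvester-type operator. The following dense Kronecker-product construction is one standard way to do this, practical only for small blocks; `dif` is an illustrative name, not a routine from the paper.

```python
import numpy as np

def dif(A11, B11, A22, B22):
    """Smallest singular value of the operator
    (X, Y) -> (A11 X - Y A22, B11 X - Y B22),
    a Dif-type separation between the pencils A11 - l*B11 and A22 - l*B22."""
    k, l = A11.shape[0], A22.shape[0]
    # Matrix of the operator acting on [vec(X); vec(Y)].
    Z = np.block([
        [np.kron(np.eye(l), A11), -np.kron(A22.T, np.eye(k))],
        [np.kron(np.eye(l), B11), -np.kron(B22.T, np.eye(k))],
    ])
    return np.linalg.svd(Z, compute_uv=False).min()
```

For the 1 by 1 blocks A11 = 0, A22 = 1 with B11 = B22 = 1 (eigenvalues 0 and 1), this gives (√5 − 1)/2, and it drops to 0 as soon as the two blocks share an eigenvalue.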
For a proof and further discussion see [11]. The criterion (4.1) is due essentially to Stewart [32]. If we have no constraint on the condition numbers (i.e. TOL = ∞), then there is a stronger test for stability (see [11] for details). The quantities pi, qi, Difu(σi, σ−σi) and Difl(σi, σ−σi) play a central role in the analysis of singular pencils, and will be discussed further in the next section. For now let us say that Difu(σi, σ−σi) and Difl(σi, σ−σi) measure the "distance" between the eigenvalues in σi and the remainder in σ−σi, and that pi and qi measure the "velocity" with which the eigenvalues move under perturbations. Thus the factor multiplying ε in (4.1) is velocity over distance, or the reciprocal of how large a perturbation is needed to make an eigenvalue of σi coalesce with one from σ−σi. This bound is fairly tight, and in fact the factor b·max(pi, qi) in (4.3) is essentially the least possible value for the maximum of κ(P) and κ(Q), where P and Q block diagonalize A − λB, the center of P(ε). The quantities Difu, Difl, pi and qi may be straightforwardly computed using standard software packages. Also, nearly best conditioned block diagonalizing P and Q in (1.1) can be computed. Therefore, it is possible to computationally verify conditions (4.1) to (4.3), and so to determine whether or not a decomposition is stable as defined above, as well as to compute the decomposition.

4.2 Why is the Singular Case Harder than the Regular Case?

Our analysis of the regular case in the last section depended on the fact that eigenvalues and deflating subspaces of regular pencils generally change continuously under continuous changes of the pencil. This situation is completely different for singular pencils. As mentioned in section 1.2, the features of a nongeneric singular pencil may change abruptly under arbitrarily small changes in the pencil entries.
In this section we illustrate this with some examples. As stated in section 1.2, almost any arbitrarily small perturbation of a square singular pencil will make it regular. The singular pencils form a lower dimensional surface called a proper variety in the space of all pencils. A variety is the solution set of a set of polynomial equations; in this case the polynomial equations are obtained by equating the coefficients of all the different powers of λ in det(A − λB) to zero [44]. In fact, given a square singular pencil one can find arbitrarily small perturbations such that the resulting regular pencil has its eigenvalues at arbitrary preassigned points in the extended complex plane [46]. For example, the pencil

    [ −λ   1   0 ]
    [  0  −λ   1 ]
    [  0   0   0 ]

is singular, and it is easy to see that the perturbed pencil

    [ −λ   1    0   ]
    [  0  −λ    1   ]
    [  a   b  c+dλ  ]

has determinant dλ³ + cλ² + bλ + a. Clearly we can choose a, b, c and d arbitrarily small so that this polynomial has any three roots desired.
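The determinant claim is easy to verify numerically; here a, b, c, d are arbitrary small values, with the perturbation placed in the last row of A and in the (3,3) entry of B so that the combined pencil has the displayed form.

```python
import numpy as np

a, b, c, d = 1e-4, -2e-4, 3e-4, 5e-4
A0 = np.array([[0., 1., 0.], [0., 0., 1.], [0., 0., 0.]])
B0 = np.array([[1., 0., 0.], [0., 1., 0.], [0., 0., 0.]])
A = A0.copy(); A[2, :] = [a, b, c]      # perturb the last row of A
B = B0.copy(); B[2, 2] = -d             # and the (3,3) entry of B
for lam in (0.0, 0.7, -1.3, 2.0):
    # the unperturbed pencil is singular: det(A0 - lam*B0) = 0 for all lam
    assert abs(np.linalg.det(A0 - lam * B0)) < 1e-12
    # the perturbed pencil has det = d*lam^3 + c*lam^2 + b*lam + a
    det = np.linalg.det(A - lam * B)
    poly = d * lam**3 + c * lam**2 + b * lam + a
    assert np.isclose(det, poly, rtol=1e-6)
```

Choosing a, b, c, d as the coefficients of any desired cubic (scaled down as far as one likes) then places the three eigenvalues of the regularized pencil wherever one wants.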
Similarly, almost all nonsquare pencils have the same KCF, which will consist entirely of Lk blocks if there are more columns than rows, and Lkᵀ blocks if there are more rows than columns. For example, the following pencil is nongeneric because it has a regular part with an eigenvalue at 1:

    [ 0  1  0 ]     [ 1  0  0 ]
    [ 0  0  1 ] − λ [ 0  0  1 ]

but the following perturbation

    [ 0  1  0 ]     [ 1  0  0 ]
    [ 0  0  1 ] − λ [ 0  α  1 ]
has a generic KCF consisting of a single L2 block for any nonzero α. In control theory, the fact that a generic nonsquare pencil has no regular part implies that a generic control system is completely controllable and observable (see section 2.3).

4.3 Perturbation Results for Pairs of Reducing Subspaces

How do we do perturbation theory for nongeneric pencils in the face of the results of the last section? Clearly, we need to make some restrictions on the nature of the perturbations we allow, so that the structure we wish to preserve changes smoothly. In particular, we will assume that our unperturbed pencil A − λB has a pair of left and right reducing subspaces P and Q, and that the perturbed pencil (A+E) − λ(B+F) has reducing subspaces PEF and QEF of the same dimensions as P and Q, respectively. Under these assumptions we will ask how much PEF and QEF can differ from the unperturbed P and Q as a function of ‖(E,F)‖ ≡ ε:

Theorem 4.2: Let P and Q be the left and right reducing subspaces of A − λB corresponding to the subset σ1 of the set of eigenvalues σ of A − λB. Let ε = ‖(E,F)‖. Then if (A+E) − λ(B+F) has reducing subspaces PEF and QEF of the same dimensions as P and Q, respectively, where

    χ = ε · max(p, q) / min( Difu(σ1, σ−σ1), Difl(σ1, σ−σ1) )

(where Difu(σ1, σ−σ1), Difl(σ1, σ−σ1), p and q will be defined below), then one of the following two cases must hold:

Case 1: θmax(P, PEF) ≤ arctan( χ (p + (p² − 1)^(1/2)) ) and θmax(Q, QEF) ≤ arctan( χ (q + (q² − 1)^(1/2)) ). If χ